State of the project:

I need to return to this and think more about why the policy doesn’t seem to be improving. Initially it appears to learn a rolling locomotion strategy, but this decays into a sliding, jerking motion with more training. It might recover given enough training time, but I have a feeling I could do a better job specifying the observations or actions. I plan on returning to this after some work running RL policies on the Jetson for the WalkBot.

The next thing might be an observation of the CM positions of the supports relative to the core?

Introduction

The purpose of this project is to recreate the tensegrity control research being conducted by NASA for experimental planetary rovers. I started by creating a simple tensegrity structure where spring forces were used at the ends of structural rods. I want to try using the Isaac Gym tendon component for this instead. It is probably a more optimized implementation than what I was attempting. The ShadowHand example will be a good resource for this (it is also the only example project that uses the tendon objects).

  • URDF doesn’t support tendons. I’m either going to need to implement them while creating the actors or learn how to use XML <mujoco> type of robot definitions… There’s always something new to learn.

  • I don’t think I can specify tendons after actor creation. No way around it: I need to use the MuJoCo-style robot descriptor!

    • Isaac Gym does not support multiple free bodies for a given actor… So I can’t specify the rods of the tensegrity with tendons between them. I tried:

      • 3 prismatic joints

      • 3 prismatic joints + 3 rotational joints

      • 3 rotational joints + 3 prismatic joints

        • This one almost works… Ghost forces and explosions still happen frequently…

    • Forces / positions sometimes explode to NaN

    • Ghost forces act on the ‘free’ bodies in this setup

  • Back to manually applying spring forces

    • Modifications for springs to only work in tension and not compression seem to have improved the stability of the tensegrity bot.

    • Spring lengths can be modified to actuate the tensegrity bot:

    • Video to the right shows how modifying the spring lengths can actuate the bot

      • Spring_Multiplier = cos(loopcount/100)*0.8 + 1.0

      • Spring_Length = Rest_Length*Spring_Multiplier

    • To start, the RL agent’s action will likely take the place of the spring multiplier; I think this should allow it to learn to locomote. There is likely still a lot of setup work to be done, as this isn’t as simple as loading CubeBot or WalkBot.
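As a concrete sketch of the tension-only spring behavior and the multiplier-based actuation described above (the function and variable names here are my own, not from the project code):

```python
import numpy as np

def spring_force(p1, p2, rest_length, spring_coff=50.0, multiplier=1.0):
    """Force on endpoint p1 from a tension-only spring to p2.

    `multiplier` scales the rest length, which is how the bot is actuated:
    a multiplier < 1 shortens the spring and pulls the endpoints together.
    """
    delta = p2 - p1
    length = np.linalg.norm(delta)
    target = rest_length * multiplier
    stretch = length - target
    if stretch <= 0.0:                 # slack cable: no compression force
        return np.zeros(3)
    direction = delta / length
    return spring_coff * stretch * direction   # pulls p1 toward p2

# Open-loop actuation from the notes: the multiplier oscillates around 1.0
loopcount = 0
multiplier = np.cos(loopcount / 100.0) * 0.8 + 1.0   # stays in [0.2, 1.8]
f = spring_force(np.zeros(3), np.array([0.0, 0.0, 0.2]),
                 rest_length=0.1, multiplier=multiplier)
```

The tension-only check (returning zero force when the spring is slack) is the modification noted above that improved the stability of the bot.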

Defining the Tensegrity Bot

I needed a concise, understandable way to specify the configuration of the tensegrity bot. I created a section in the training YAML files to achieve this:

tensegrityParams:
  numSupports: 6       
  seperationDist: 0.05 
  spawnHeight: 0.15  
  spring_coff: 50
  damping_coff: 0.99
  spring_length_change_factor: 0.5 # The % a spring can change from its rest length

  supportNames: ["X1", "X2", "Y1", "Y2", "Z1", "Z2"]
  positions: [[0.0, 1.0, 0.0], #This is multiplied by seperation distance
              [0.0, -1.0, 0.0], #Spawn height is added to the Z dim
              [0.0, 0.0, 1.0],   # These are basically the init positions
              [0.0, 0.0, -1.0],
              [1.0, 0.0, 0.0],
              [-1.0, 0.0, 0.0]]
  orientations: [[0, -1.57075, 0], #rotations in zyx euler angles (rads)
                [0, -1.57075, 0],
                [-1.57075, 0, 0],
                [-1.57075, 0, 0],
                [0, 0, 0],
                [0, 0, 0]]
  connections: [["X1", "T", "Y1", "B", 0.1], # Connections to X1 Top 
                ["X1", "T", "Y2", "B", 0.1],
                ["X1", "T", "Z1", "T", 0.1],
                ["X1", "T", "Z1", "B", 0.1], ...

Here we can see how the tensegrity is specified. The rods are named, and a relative position and orientation needs to be specified for each. After that, a connection list is created where elements of the list are:

  • [Support1 Name, Support1 Connection Location, Support2 Name, Support2 Connection Location, Connection Length]

These lists are parsed in the Tensegrity VecTask class when the agent is created.
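A minimal sketch of how the connection list might be parsed into endpoint index pairs (the function name, indexing scheme, and data layout here are my own assumptions, not the actual VecTask code):

```python
# Hypothetical parsed form of the yaml section above (truncated)
tensegrity_params = {
    "supportNames": ["X1", "X2", "Y1", "Y2", "Z1", "Z2"],
    "connections": [
        ["X1", "T", "Y1", "B", 0.1],
        ["X1", "T", "Y2", "B", 0.1],
    ],
}

def parse_connections(params):
    """Turn each [name1, loc1, name2, loc2, length] entry into index pairs.

    Assumes each rod contributes two attachment points: index 2*i for the
    top ('T') endpoint and 2*i + 1 for the bottom ('B') endpoint.
    """
    name_to_idx = {n: i for i, n in enumerate(params["supportNames"])}
    springs = []
    for n1, loc1, n2, loc2, rest_len in params["connections"]:
        a = 2 * name_to_idx[n1] + (0 if loc1 == "T" else 1)
        b = 2 * name_to_idx[n2] + (0 if loc2 == "T" else 1)
        springs.append((a, b, rest_len))
    return springs

springs = parse_connections(tensegrity_params)
# springs[0] connects X1's top endpoint to Y1's bottom endpoint
```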

Tensegrity 6

The 3-bar linkage tensegrity was very likely to enter a state where it could no longer actuate and would flatten to the ground. It has served its purpose as a proof of concept, and I’ll be moving forward to a 6-bar tensegrity. This shape is much less likely to come apart and offers more degrees of freedom to actuate.

Tensegrity 6 with center ‘Core’ attached to support midpoints.

Tests:

  1. Vanilla

    1. My first attempt

  2. Compressing Observations Space

    1. Trying to get a smaller representation of the robot

  3. Adding a ‘Core’

  4. Larger NN?

    1. The current NN has 3 layers [128, 128, 128]. I’m not 100% sure if this is large enough…

      1. Shadow Hand uses [512, 512, 256, 128]

      2. Anymal uses [256, 128, 64]

      3. Humanoid uses [400, 200, 100]

        1. I tried this size just to see if there was a difference. Both NN sizes plateaued at the same reward level. I think that might be pointing me in the direction of fixing the observation space.

  5. Trying Different observations

pink: NN = [128, 128, 128]

cyan: NN = [400, 200, 100]

The Observations and Actions:

  • Full state of each support rod

    • I was excited to see if any policy could be learned so my first attempt at an observation space was the full state information for each supporting rod.

      • Each rigid body state consists of thirteen (13) float variables

        • Position (3), Orientation Quaternion (4), Linear Velocity (3), Angular Velocity (3)

    • Additionally, there was the goal position (3), a normalized vector to the goal (3), and the selected actions (24)

    • The observation space consists of 108 floats (6 rods × 13 + 3 + 3 + 24)

  • Averaged state of each support rod

    • To reduce the observation space, I thought an average of the rod states would be useful. Position, linear velocity, and angular velocity were trivial, but I needed to do a bit of reading to understand how to average quaternions. Unfortunately, because the support rods are not constrained to rotate about their local ‘Z’ axis, the average quaternion was not a good estimate of the overall orientation. I tried adding torques to keep the support rods near their starting Z orientation, but this led to instabilities in the model. I instead opted to attach another object I’ve called the core. The core is attached by springs to each of the support rod midpoints and provides an easy-to-access reference that approximates the average of the rod states.

  • Core for average state reference

    • The execution seemed solid, but the training performance was notably worse than with the full 108-float observation space. This is odd to me: after discovering the unconstrained Z rotation, I assumed the orientation information would have been useless. Maybe the observation needs some representation of the overall configuration for the agent to determine a good policy.

  • Contact Sensors

    • Isaac Gym gives us access to the net contact force tensor. Using this, we can create an observation for which bodies are currently in contact with the ground. This might help the control policy; I’m not 100% sure yet.
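A small sketch of how the contact observation could be built from per-body net contact forces (the function name and threshold are my own assumptions; in Isaac Gym the force array would come from the net contact force tensor):

```python
import numpy as np

def contact_flags(net_contact_forces, threshold=1.0):
    """Boolean per-body contact observation from net contact forces.

    `net_contact_forces` has shape (num_bodies, 3). A body is flagged as
    in contact when its net force magnitude exceeds `threshold` (N); the
    threshold filters out small solver noise on airborne bodies.
    """
    magnitudes = np.linalg.norm(net_contact_forces, axis=-1)
    return (magnitudes > threshold).astype(np.float32)

forces = np.array([[0.0, 0.0, 9.8],   # body resting on the ground
                   [0.1, 0.0, 0.0]])  # airborne body (noise only)
flags = contact_flags(forces)  # -> [1.0, 0.0]
```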

Some Training Results

So far, training hasn’t produced a policy that looks great. The following shows the training reward over time for some of the trials I’ve run:

Red : Observation of all the support rod states

Cyan : Removed the orientation observation because I believed it to be unreliable due to the unconstrained Z rotations. Apparently PPO disagrees with my assessment

Pink : ‘Core’ replaces state observation of the individual support rods

I’m a little surprised that the core as a source of state information didn’t produce better results. I think this means that another observation is needed, one that gives more insight into the current configuration of the tensegrity bot. I’m thinking spring connection length might not be a bad next thing to try.

Next modifications to try:

  • Low pass filter on actions to prevent instantaneous spring length changes

  • Change the action to something more physically viable

    • Changing the length of each rod would reduce the action space from (24) to (6)

    • This also seems like something that would be more viable to build
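The low-pass filter idea above can be sketched as a simple first-order exponential filter on the action vector (class name and `alpha` value are my own; the cutoff would need tuning against the 1/1000 s simulation dt):

```python
import numpy as np

class ActionFilter:
    """First-order low-pass filter to smooth spring-length actions.

    alpha close to 1.0 means heavy smoothing; alpha = 0 passes actions
    through unchanged. This prevents instantaneous spring length changes
    by spreading each action step over several simulation steps.
    """
    def __init__(self, num_actions, alpha=0.9):
        self.alpha = alpha
        self.state = np.zeros(num_actions)

    def __call__(self, action):
        self.state = self.alpha * self.state + (1.0 - self.alpha) * action
        return self.state

filt = ActionFilter(num_actions=24, alpha=0.9)
step1 = filt(np.ones(24))   # a unit step is applied gradually: 0.1
step2 = filt(np.ones(24))   # then 0.19, and so on toward 1.0
```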

Averaging Quaternions

  • https://hal.inria.fr/hal-01513263/file/bare_jrnl.pdf

  • http://www.malcolmdshuster.com/FC_Lerner-SADC-ThreeAxis_MDSscan.pdf

  • http://www.acsu.buffalo.edu/%7Ejohnc/ave_quat07.pdf
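The eigenvector-based averaging method from the references above (often attributed to Markley et al.) can be sketched as follows; the function name is my own:

```python
import numpy as np

def average_quaternion(quats, weights=None):
    """Weighted quaternion average via the largest-eigenvector method.

    quats: (N, 4) array of unit quaternions. The average is the eigenvector
    of M = sum(w_i * q_i q_i^T) with the largest eigenvalue; because q and
    -q contribute the same outer product, the sign ambiguity of quaternions
    is handled automatically.
    """
    quats = np.asarray(quats, dtype=np.float64)
    if weights is None:
        weights = np.ones(len(quats))
    M = np.zeros((4, 4))
    for q, w in zip(quats, weights):
        M += w * np.outer(q, q)
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    return eigvecs[:, -1]                  # eigenvector of largest eigenvalue

# Two copies of the same rotation with opposite signs still average cleanly
q = np.array([0.0, 0.0, 0.0, 1.0])
avg = average_quaternion(np.stack([q, -q]))   # equals q up to sign
```

Note that, as described above, this average is only meaningful when the rotations are clustered; the unconstrained Z rotation of the support rods is exactly the case where it breaks down.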


Notes

  • Training is notably slower than in the previous projects I’ve worked on. This is probably due to the spring forces being calculated manually outside the physics engine. Notably, adding the core and its six associated springs increased the training time significantly.

  • The stiffer the springs (larger spring forces), the smaller the dt needed to prevent the simulation from becoming unstable

    • This also is a function of the spring length

    • dt is currently set to 1/1000 s. I wouldn’t want to go too much smaller than this as it impacts the training time significantly.

  • There is an option to enable/disable the contact force tensor. By default this was set to off in the PPO.yaml file. It took a while to discover why the contact forces worked in my test code but not during training…