Work in Progress

Introduction

This section will cover the work of moving the previous work over to the Jetson. To start, we are going to need to modify the observations to something that we can actually measure. An IMU is probably the best stand-in for the state information used previously. I’ll need to develop an IMU model, work on getting the hardware together, and figure out how deployment will work. I think I may start with a simple demo project where the angle of a single leg will be controlled with a policy learned in Isaac Gym. If I can get that working, I believe the rest of this project will be an engineering exercise.

Testing Policy Deployment

For this test I will use a single leg and a target end effector position. In the simulator, a policy will be learned that has the following observations :

  • Joint DOF Positions

  • Joint DOF Velocities

  • Target End Position

The reward will be given for minimizing the distance between the foot position and the target position. When deployed I should be able to drive the leg to desired positions by supplying the target coordinates manually.

Modifying the Walkbot Code

  • I’ve removed all the left leg components from the URDF for this test. It’s starting to look like what I have on my workbench.

  • I’ll start by copying the walking training files and renaming them for this new project. I’m just calling it Walkbot_IKRL. I expect the end result to look like the solution one would get with inverse kinematics.

  • Most of the training parameters can stay the same. The changes are :

    • Observation

      • Reduced to 12 observations:

        • Joint Angles (3), Joint Velocities (3), Goal End Position (3), Actions (3)

    • Actions

      • Reduced to 3 actions since we are only controlling one leg right now. Actions will be target positions for now

    • Rewards :

      • Reward will be something like 1 - norm(end_pos - goal_pos)^2

        • I’m using -norm(end_pos - goal_pos)^2; no need for the positive portion of the reward

        • I added a -1 penalty when norm(dof_vel) > 20 to force it to regulate its speed. In the future I’ll probably want to add something more intelligent than this, but for now it works

        • I also give +1 reward for reaching the goal

          • Reaching the goal means reducing the distance to the goal below the goal threshold

      • The setup was promoting behaviors I didn’t like. The goal was reset and points were awarded for reaching the destination position. This promoted very rapid movements between positions. I decided that the goal should be moving constantly and points would be awarded for staying within the vicinity of the goal.

      • I added penalties for the following (a rough sketch of the combined reward and foot-position computation appears after this list)

        • Large action values

          • actions_cost_scale * torch.sum(actions**2, dim=-1)

        • Large energy values (very similar to the action penalty)

          • energy_cost_scale * torch.sum(torch.abs(actions * dof_vel), dim=-1)

        • DOF at limit of travel penalty

          • joints_at_limit_cost_scale * torch.sum((torch.abs(obs_buf[:, 0:3]) - 0.98) / 0.02 * (torch.abs(obs_buf[:, 0:3]) > 0.98), dim=-1)

    • Asset Options

      • asset_options.fix_base_link = True

    • baseInitState (in the yaml config file)

      • rot: [0.7071, 0.0, 0.0, 0.7071] # So it’s oriented like it is on my workbench

    • I needed to do a transform from the 3rd servo joint to the tip of the foot

      • Turn the Quaternions into rotation matrices,

      • Rotate a vector representing the offset from the servo2 origin to the tip of the foot

      • Add the servo2 origin to the result

  • This is also a good time to get the torque control mode working. The simplified environment might help me debug the issues I was having during the WalkBot training
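
To tie the reward terms and the foot-tip transform together, here is a rough sketch of how they could be computed each physics step. The names (servo2_quat, servo2_pos, foot_offset, goal_pos, dof_vel, actions, obs_buf, goal_threshold, and the cost scales) are placeholders for illustration rather than the exact variables in my environment, and the quaternion is assumed to be in Isaac Gym’s (x, y, z, w) ordering.

import torch

def quat_to_rot(q):
    # (N, 4) quaternions in (x, y, z, w) order -> (N, 3, 3) rotation matrices
    x, y, z, w = q[:, 0], q[:, 1], q[:, 2], q[:, 3]
    return torch.stack([
        1 - 2 * (y * y + z * z), 2 * (x * y - z * w),     2 * (x * z + y * w),
        2 * (x * y + z * w),     1 - 2 * (x * x + z * z), 2 * (y * z - x * w),
        2 * (x * z - y * w),     2 * (y * z + x * w),     1 - 2 * (x * x + y * y),
    ], dim=-1).view(-1, 3, 3)

# Foot tip: rotate the servo2 -> foot-tip offset into the world frame, then add the servo2 origin
rot = quat_to_rot(servo2_quat)                                   # (num_envs, 3, 3)
end_pos = servo2_pos + (rot @ foot_offset.unsqueeze(-1)).squeeze(-1)

# Reward terms described above (scales and threshold are placeholders)
dist = torch.norm(end_pos - goal_pos, dim=-1)
reward = -dist ** 2                                              # distance penalty
reward += (dist < goal_threshold).float()                        # +1 for being at the goal
reward -= (torch.norm(dof_vel, dim=-1) > 20.0).float()           # -1 speed penalty
reward -= actions_cost_scale * torch.sum(actions ** 2, dim=-1)
reward -= energy_cost_scale * torch.sum(torch.abs(actions * dof_vel), dim=-1)
reward -= joints_at_limit_cost_scale * torch.sum(
    (torch.abs(obs_buf[:, 0:3]) - 0.98) / 0.02 * (torch.abs(obs_buf[:, 0:3]) > 0.98), dim=-1)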

Some Results

Early on it seems to produce an OK policy, but as training continues it seems to decay and I’m not sure why. It seems to be learning behaviors that go against the reward metrics. It is also very likely to learn behaviors where actions fluctuate between extreme values, causing twitchy, jerky motions that I wouldn’t want in the real robot. I’ve tried curbing this with penalties scaled by actions, penalties scaled by dof_velocity, and penalties scaled by power usage. None of these seem to help at this point.

  • The fluctuations seem to happen when the goal is outside of the reach of the end effector. When it is in range it seems fine

  • The goal position observation wasn’t correctly constrained to [-1, 1]


Interesting/Unexpected Behaviors

  • Early on I forgot to set the collision filters correctly, so the goal object could interact with the servo leg. Because they could interact, the policy learned that a high-scoring strategy was to capture the goal to prevent it from drifting away.

    • Collision filters were changed so environment interactions wouldn’t occur

  • After removing collisions I noticed another strange behavior. As the goal drifted away, the leg would retreat in the opposite direction. This would reduce the reward initially, since the distance to the goal was increasing. HOWEVER: I had set it up so that if the goal got a certain distance from the foot, it would receive a velocity vector in the direction of the table. By moving in the opposite direction, the leg was forcing the goal to move back into range sooner than it would have otherwise.

    • This metric was switched to reference a ‘home position’ that the agent could not manipulate

  • The agent should not have any ability to manipulate the goal, but it keeps discovering things I had not thought about. It was able to move the end point outside the envelope the goal was allowed to be in. In some situations this could force the goal to get locked to the foot, since my logic tried to push the ball towards the foot whenever it was getting too far from the platform. These are the smartest stupid machines I’ve made :-p

    • The ‘kick’ logic now moves the ball towards the home point and not the end effector (sketched below)
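
For concreteness, the corrected ‘kick’ logic looks roughly like this; the names (goal_pos, goal_vel, home_pos, kick_radius, kick_speed) are placeholders rather than my exact variables:

import torch

# Push the goal back toward a fixed home position (not the foot) when it drifts too far
offset = home_pos - goal_pos                                  # (num_envs, 3)
dist_from_home = torch.norm(offset, dim=-1, keepdim=True)     # (num_envs, 1)
too_far = dist_from_home > kick_radius                        # per-env boolean mask
kick_vel = offset / (dist_from_home + 1e-8) * kick_speed      # unit direction toward home, scaled
goal_vel = torch.where(too_far, kick_vel, goal_vel)           # only kick the envs that drifted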

Installing Isaac Gym on the Jetson

Wish me Luck!

  • Check JetPack version : sudo apt-cache show nvidia-jetpack

    • Currently on version 4.6-b199

  • Download the Isaac Gym files from : https://developer.nvidia.com/isaac-gym

    • first install attempt failed, couldn’t find skbuild

      • pip3 install scikit-build

    • second attempt failed, couldn’t find torch>=1.8.0

      • There are some directions on how to get torch on the Jetson here : https://forums.developer.nvidia.com/t/pytorch-for-jetson-version-1-10-now-available/72048

        • This took more than 45 minutes to complete; I probably could have skipped the torchvision part, which comprised the majority of this time

        • Jetson froze while testing immediately after the install completed

        • It appears to be working, although creating tensors caused the system to freeze up for a minute or two…. not sure if this will end up being an issue later on…. (see the quick sanity check sketched after this list)

    • Still can’t find torch

      • Upgrade pip : pip install -U pip

    • INSTALL SUCCESSFUL!!!
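
For anyone repeating this, a quick sanity check along these lines is enough to confirm the install; the CUDA tensor creation at the end is the kind of thing that made the Jetson hang for a minute or two:

import torch

print(torch.__version__)            # needs to report 1.8.0 or newer for Isaac Gym's requirement
print(torch.cuda.is_available())    # True if the Jetson's GPU is visible to torch
x = torch.ones(3, device='cuda')    # creating a CUDA tensor is where the freeze showed up
print(x * 2)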

OK, now what? No way this will run simulation / training… so how do I test this? I guess let’s try running the joint_monkey.py example and see what happens.

That didn’t work… I’m trying to figure out the path forward now…

I need to get the Torch model running on the Jetson. Isaac Gym really isn’t necessary for that since I’ll be getting my observations from the robot’s sensors anyways… I’ll need to do some research on how to leverage the network I made on the computer here. I’ll be back…

Update :

I’m currently pulling the thread of the RL_Games library. I think a better understanding of how this framework is set up will help me get this network running on hardware. I definitely don’t need the Isaac Gym environment on the Jetson…

  • Can I make a stand alone python project that loads the network trained using Isaac Gym?

    • If I can do this, and feed it some generated data, I think this is all that needs to run on the Jetson.

  • If I can’t do the above, I need to find a way to just create the network and load the weights from training. Since Torch is used as the ML backend, I know this is possible. I’d need to mess around with the code of RL_Games with either path.

  • I think I want to try the HER (Hindsight Experience Replay) technique anyways, so I’m going to need to learn how this code is structured to modify the algorithms to implement this

RL Games

RL Games is the RL framework NVIDIA has chosen for Isaac Gym. The code seems very well implemented, but the documentation is a bit lacking. :(

Notes :

  • It seems the code is kicked off by a script called runner.py

    • runner.py loads the config yaml file

    • it instantiates a Runner object from rl_games.torch_runner

    • the runner loads the config file

    • the runner runs

    • the runner is wrapped by ray (used for code parallelization)

  • next important thing seems like torch_runner.py

    • Contains the algo_factory, player_factory and algo_observer

    • Has methods :

      • load_config

        • loads the config file…

      • load

        • calls load_config.

        • saves the default config. I assume so we can change settings and restore them later or something… not sure

      • run_train

        • agent is created

        • agent is restored from a checkpoint (if provided)

        • agent.train() is called

      • run_play

        • player is created with self.create_player()

      • create_player

        • self.player_factory.create(…)

          • idk why this needed its own function…

      • reset

        • pass

      • run

        • calls run_train or run_play

    • I think the player_factory might be the direction I need to go to run the policy on hardware

      • registers players.PpoPlayerContinuous(**kwargs)

        • Builds the network!!!

        • gets actions from observations

        • can load the model from checkpoints

      • Derived from BasePlayer

        • Creates the Environment

        • This is where the run() function is defined

    • I think the algo_factory is where I’ll find some more threads to pull to implement HER

      • registers a2c_continuous.A2CAgent(**kwargs)

        • Similar to player except with the learning algorithm functions

          • Updates Epoch

          • Calc Gradients

          • Etc…

        • Derived from a2c_common.ContinuousA2CBase

          • Here is where the experience buffer is instantiated!

            • The experience buffer is updated in play_steps(self)

          • Contains the Vec_Env object

import yaml
from rl_games.torch_runner import Runner

# 'stream' is the opened yaml config file; 'args' carries the command-line options (e.g. train/play/checkpoint)
config = yaml.safe_load(stream)

runner = Runner()
try:
    runner.load(config)
except yaml.YAMLError as exc:
    print(exc)

runner.run(args)
 
def load(self, yaml_conf):
    self.default_config = yaml_conf['params']
    self.load_config(params=copy.deepcopy(self.default_config))

def run_train(self, load_path=None):
    print('Started to train')
    agent = self.algo_factory.create(self.algo_name, base_name='run', params=self.params)
    if load_path is not None:
        agent.restore(load_path)
    agent.train()

def run_play(self, load_path=None):
    print('Started to play')
    player = self.create_player()
    if load_path is not None:
        player.restore(load_path)
    player.run()

Checking in :

  • The path to run this on the Jetson involves

    • Making a new environment specifically for the Jetson configuration

      • This should derive from the configuration files used for training so they have identical parameters

      • Load the model and run with num_envs = 1

        • I think depending on how I write the code, this will need to be enforced as there will only be 1 physical robot.

So… how do I make a new environment?

  • I ended up using a lot of the framework from isaacgym. It was easy to modify the classes and keep as many things as similar as possible to facilitate loading models trained with Isaac… I’m still fighting the Jetson to run RL_Games. (A rough stub of what this environment could look like is sketched below.)
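
As a sketch of what that could look like, here is a minimal gym-style stub mirroring the observation/action layout from training. The class name and the helper methods (_read_sensors, _send_servo_targets) are hypothetical; the real version borrows structure from the isaacgym environment classes and wires step/reset to the robot’s sensors and servos.

import numpy as np
import gym
from gym import spaces

class WalkbotHardwareEnv(gym.Env):
    # Single-robot stand-in for the simulated env; same 12-obs / 3-action layout as training
    def __init__(self, cfg=None):
        self.num_envs = 1                                    # only one physical robot
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(12,), dtype=np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, shape=(3,), dtype=np.float32)

    def reset(self):
        return self._read_sensors()

    def step(self, action):
        self._send_servo_targets(action)                     # write target positions to the servos
        obs = self._read_sensors()
        return obs, 0.0, False, {}                           # reward/done are unused at inference time

    def _read_sensors(self):
        # Placeholder: joint angles (3), joint velocities (3), goal position (3), last actions (3)
        return np.zeros(12, dtype=np.float32)

    def _send_servo_targets(self, action):
        pass                                                 # placeholder for Dynamixel writes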

I broke something… starting with a fresh JetPack 4.6 SD card

This is going to be setup documentation

  • Getting pytorch installed (method from above)

  • Getting Dynamixel servo control code working (forgot to back this up :( )

  • Installing rl_games

    • This was a pain. I tried many methods to get the dependency ‘Ray’ working but I couldn’t build this software on the Jetson. After digging through the rl_games code I realized this dependency wasn’t very important to the work I was doing.

    • The only modification needed was removing the Ray references from the vecenv class. I made a new file called vecenv_mod, removed the Ray dependency from install.py, and ran python setup.py install --user to install the modified rl_games.

    • I’ve successfully loaded the trained model and was able to feed the NN fake observation data to get actions (a rough sketch of this is below).

  • I need to finish the HER implementation, get a better model for the RL_IK I was doing above, and then try running it on hardware. I feel like I’m pretty close
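
For reference, the inference-only path on the Jetson looks roughly like this. It assumes the Jetson environment is already registered with the modified rl_games (the vecenv_mod changes above), the config and checkpoint paths are placeholders, and the get_action call is the part most likely to differ between rl_games versions, so treat this as a sketch rather than working code.

import yaml
import torch
from rl_games.torch_runner import Runner

# Build the runner from the same params used for training so the network matches the checkpoint
with open('walkbot_ikrl_train.yaml') as stream:              # placeholder config path
    config = yaml.safe_load(stream)

runner = Runner()
runner.load(config)

# Create just the player (no training machinery) and restore the trained weights
player = runner.create_player()
player.restore('nn/Walkbot_IKRL.pth')                        # placeholder checkpoint path

# Fake observation: joint angles (3), joint velocities (3), goal position (3), last actions (3)
obs = torch.zeros(12)
action = player.get_action(obs, True)                        # method name/obs shape may vary by rl_games version
print(action)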