Work in Progress
Introduction
This section will cover the work of moving the previous work over to the Jetson. To start, we are going to need to modify the observations to something that we can actually measure. An IMU is probably the best stand-in for the state information used previously. I’ll need to develop an IMU model, work on getting the hardware together, and figure out how deployment will work. I think I may start with a simple demo project where the angle of a single leg will be controlled with a policy learned in Isaac Gym. If I can get that working, I believe the rest of this project will be an engineering exercise.
Testing Policy Deployment
For this test I will use a single leg and a target end effector position. In the simulator, a policy will be learned that has the following observations:
Joint DOF Positions
Joint DOF Velocities
Target End Position
The reward will be given for minimizing the distance between the foot position and the target position. When deployed I should be able to drive the leg to desired positions by supplying the target coordinates manually.
Modifying the Walkbot Code
I’ve removed all the left leg components from the URDF for this test. It’s starting to look like what I have on my workbench.
I’ll start by copying the walking training files and renaming them for this new project. I’m just calling it Walkbot_IKRL. I expect the end result to look like the solution one would get with inverse kinematics.
Most of the training parameters can stay the same. The changes are:
Observation
Reduced to 12 observations:
Joint Angles (3), Joint Velocities (3), Goal End Position (3), Actions (3)
Actions
Reduced to 3 actions since we are only controlling one leg right now. Actions will be target positions for now.
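Putting the observation list and the actions together, the 12-element observation buffer gets assembled roughly like this (just a sketch; dof_pos, dof_vel, goal_pos and prev_actions are placeholder names for the tensors held by the task class):

import torch

def compute_observations(dof_pos, dof_vel, goal_pos, prev_actions):
    # each input is a (num_envs, 3) tensor; 3 + 3 + 3 + 3 = 12 observations
    return torch.cat([dof_pos, dof_vel, goal_pos, prev_actions], dim=-1)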
Rewards:
Reward will be something like 1 - norm(end_pos - goal_pos)^2
I’m using -norm(end_pos - goal_pos)^2; there's no need for the positive portion of the reward
I added a -1 penalty when norm(dof_vel) > 20 to force it to regulate its speed. In the future I’ll probably want to add something more intelligent than this, but for now it works
I also give +1 reward for reaching the goal
Reaching the goal means reducing the distance to the goal below the goal threshold
The setup was promoting behaviors I didn’t like. The goal was reset and points were awarded for reaching the destination position. This promoted very rapid movements between positions. I decided that the goal should be moving constantly and points would be awarded for staying within the vicinity of the goal.
I added penalties for the following (a sketch combining these with the reward terms above follows this list):
Large action values
actions_cost_scale * torch.sum(actions**2, dim=-1)
Large energy values (very similar to the action penalty)
energy_cost_scale * torch.sum(torch.abs(actions * dof_vel), dim=-1)
DOF at limit of travel penalty
joints_at_limit_cost_scale * torch.sum((torch.abs(obs_buf[:, 0:3]) - 0.98) / 0.02 * (torch.abs(obs_buf[:, 0:3]) > 0.98), dim=-1)
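Putting the pieces above together, a minimal sketch of the reward calculation (the scale and threshold values are placeholders rather than the exact numbers I'm using, and end_pos / goal_pos / dof_vel are the usual per-environment tensors):

import torch

def compute_reward(end_pos, goal_pos, dof_vel, actions, obs_buf,
                   goal_threshold=0.02, vel_limit=20.0,
                   actions_cost_scale=0.01, energy_cost_scale=0.01,
                   joints_at_limit_cost_scale=0.1):
    dist = torch.norm(end_pos - goal_pos, dim=-1)
    reward = -dist ** 2                                         # stay near the goal
    reward += 1.0 * (dist < goal_threshold)                     # bonus for reaching the goal
    reward -= 1.0 * (torch.norm(dof_vel, dim=-1) > vel_limit)   # crude speed regulation
    reward -= actions_cost_scale * torch.sum(actions ** 2, dim=-1)
    reward -= energy_cost_scale * torch.sum(torch.abs(actions * dof_vel), dim=-1)
    # obs_buf[:, 0:3] holds the normalized joint angles
    at_limit = (torch.abs(obs_buf[:, 0:3]) - 0.98) / 0.02 * (torch.abs(obs_buf[:, 0:3]) > 0.98)
    reward -= joints_at_limit_cost_scale * torch.sum(at_limit, dim=-1)
    return reward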
Asset Options
asset_options.fix_base_link = True
baseInitState (in the yaml config file)
rot: [0.7071, 0.0, 0.0, 0.7071] # so it's oriented the way the robot sits on my workbench
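For context, this is roughly where the asset option lands in the task code (a sketch following the usual Isaac Gym task pattern; the URDF file name is a placeholder). The rot quaternion above is a 90 degree rotation about X in Isaac Gym's (x, y, z, w) convention.

from isaacgym import gymapi

def load_leg_asset(gym, sim, asset_root, asset_file="walkbot_ikrl.urdf"):
    asset_options = gymapi.AssetOptions()
    asset_options.fix_base_link = True   # the leg is clamped to the bench, so fix the base
    return gym.load_asset(sim, asset_root, asset_file, asset_options)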
I needed to do a transform from the 3rd servo joint to the tip of the foot (a rough sketch follows this list):
Turn the quaternions into rotation matrices
Rotate a vector that is the length from the servo2 origin to the tip of the foot
Add the servo2 origin to the result
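A rough sketch of that transform in torch, following the steps above (quaternions in Isaac Gym's (x, y, z, w) order; servo2_pos, servo2_quat and foot_offset are placeholder names for the rigid body state and the measured offset from the servo2 origin to the foot tip):

import torch

def quat_to_rot_matrix(q):
    # q is (N, 4) in (x, y, z, w) order
    x, y, z, w = q[:, 0], q[:, 1], q[:, 2], q[:, 3]
    return torch.stack([
        1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y),
        2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x),
        2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)
    ], dim=-1).view(-1, 3, 3)

def foot_tip_position(servo2_pos, servo2_quat, foot_offset):
    # rotate the fixed offset vector into the world frame, then add the joint origin
    R = quat_to_rot_matrix(servo2_quat)
    rotated = torch.matmul(R, foot_offset.expand(R.shape[0], 3).unsqueeze(-1)).squeeze(-1)
    return servo2_pos + rotated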
This is also a good time to get the torque control mode working. The simplified environment might help me debug the issues I was having during the WalkBot training
Some Results
Early on it seems to produce an OK policy, but as training continues it seems to decay and I’m not sure why. It seems to be learning behaviors that are against the reward metrics. It is also very likely to learn behaviors where actions fluctuate between extreme values, causing twitchy, jerky motions that I wouldn’t want on the real robot. I’ve tried curbing this with penalties scaled by actions, penalties scaled by dof_velocity, and penalties scaled by power usage. None of these seem to help at this point.
The fluctuations seem to happen when the goal is outside of the reach of the end effector. When it is in range it seems fine.
The goal position observation wasn’t correctly constrained to [-1, 1]
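The fix is just a scale and clamp; something along these lines, where goal_limit is a placeholder for the radius of the volume the goal is allowed to drift in:

import torch

def normalize_goal(goal_pos, goal_limit=0.3):
    # scale by the allowed envelope and clamp so the observation stays in [-1, 1]
    return torch.clamp(goal_pos / goal_limit, -1.0, 1.0)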
Interesting/Unexpected Behaviors
Early on I forgot to set the collision filters correctly, so the goal object could interact with the servo leg. Because they could interact, the policy learned that a high-scoring strategy was to capture the goal to prevent it from drifting away.
Collision filters were changed so environment interactions wouldn’t occur
After removing collisions I noticed another strange behavior. As the goal drifted away, the leg would retreat in the opposite direction. This would reduce the rewards initially, as the distance to the goal was increasing. HOWEVER: I had set it up so that if the goal got a certain distance from the foot, it would receive a velocity vector in the direction of the table. By moving in the opposite direction, the leg was forcing the goal to move back into range sooner than it would have otherwise.
This metric was switched to reference a ‘home position’ that the agent could not manipulate
The agent should not have any ability to manipulate the goal but it keeps discovering things I had not thought about. It was able to move the end point outside the envelope the goal was allowed to be in. In some situations this could force the goal to get locked to the foot since my logic tried to push the ball towards the foot whenever it was getting too far from the platform. These are the smartest stupid machines I’ve made :-p
The ‘kick’ logic now moves the ball towards the home point and not the end effector.
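Roughly, the kick logic now looks like this (home_pos, kick_radius and kick_speed are placeholder names; the real values live in the task config):

import torch

def kick_goal_velocity(goal_pos, home_pos, kick_radius=0.4, kick_speed=0.1):
    # if the goal has drifted too far from the fixed home point,
    # push it back towards home rather than towards the end effector
    to_home = home_pos - goal_pos
    dist = torch.norm(to_home, dim=-1, keepdim=True)
    direction = to_home / torch.clamp(dist, min=1e-6)
    return (dist > kick_radius).float() * kick_speed * direction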
Installing Isaac Gym on the Jetson
Wish me Luck!
Check JetPack version : sudo apt-cache show nvidia-jetpack
Currently on version 4.6-b199
Download the Isaac Gym files from : https://developer.nvidia.com/isaac-gym
First install attempt failed; it couldn’t find skbuild
pip3 install scikit-build
Second attempt failed; it couldn’t find torch>=1.8.0
There are some directions on how to get torch on the jetson here : https://forums.developer.nvidia.com/t/pytorch-for-jetson-version-1-10-now-available/72048
This took more than 45 minutes to complete; I probably could have skipped the torchvision part, which took the majority of that time
The Jetson froze while testing immediately after the install completed
It appears to be working, although creating tensors caused the system to freeze up for a minute or two… not sure if this will end up being an issue later on…
Still can’t find torch
Upgrade pip : pip install -U pip
INSTALL SUCCESSFUL!!!
Ok, now what? No way this will run simulation / training… so how do I test this? I guess let’s try running the joint_monkey.py example and see what happens
That didn’t work… I’m trying to figure out the path forward now…
I need to get the Torch model running on the Jetson. Isaac Gym really isn’t necessary for that since I’ll be getting my observations from the robot’s sensors anyway… I’ll need to do some research on how to leverage the network I made on the computer here. I’ll be back…
Update:
I’m currently pulling the thread of the RL_Games library. I think a better understanding of how this framework is set up will help me get this network running on hardware. I definitely don’t need the Isaac Gym environment on the Jetson…
Can I make a stand alone python project that loads the network trained using Isaac Gym?
If I can do this, and feed it some generated data, I think this is all that needs to run on the Jetson.
If I can’t do the above, I need to find a way to just create the network and load the weights from training. Since Torch is used as the ML backend, I know this is possible. I’d need to mess around with the code of RL_Games with either path.
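Before committing to either path, a low-effort check is to open a checkpoint with plain torch and look at what rl_games actually saved. The path here is just an example, and I'm assuming the network weights sit under a 'model' key, hence printing the keys first:

import torch

# map to CPU so this also works on a machine without the training GPU setup
checkpoint = torch.load('runs/Walkbot_IKRL/nn/Walkbot_IKRL.pth', map_location='cpu')
print(list(checkpoint.keys()))

# if the weights are stored under 'model', list the layer names and shapes
if 'model' in checkpoint:
    for name, tensor in checkpoint['model'].items():
        print(name, tuple(tensor.shape))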
I think I want to try the HER (Hindsight Experience Replay) technique anyway, so I’m going to need to learn how this code is structured in order to modify the algorithms and implement it
RL Games
RL Games is the RL framework NVIDIA has chosen for Isaac Gym. The code seems very well implemented but the documentation is a bit lacking. :(
Notes:
It seems the code is kicked off by a script called runner.py
runner.py loads the config yaml file
it instantiates a Runner object from rl_games.torch_runner
the runner loads the config file
the runner runs
the runner is wrapped by ray (used for code parallelization)
next important thing seems like torch_runner.py
Contains the algo_factory, player_factory and algo_observer
Has methods:
load_config
loads the config file…
load
calls load_config.
saves the default config. I assume so we can change settings and restore them or something… not sure
run_train
agent is created
agent is restored from a checkpoint (if provided)
agent.train() is called
run_play
player is created with self.create_player()
create_player
self.player_factory.create(…)
idk why this needed its own function…
reset
pass
run
calls run_train or run_play
I think the player_factory might be the direction I need to go to run the policy on hardware
registers players.PpoPlayerContinuous(**kwargs)
Builds the network!!!
gets actions from observations
can load the model from checkpoints
Derived from BasePlayer
Creates the Environment
This is where the run() function is defined
I think the algo_factory is where I’ll find some more threads to pull to implement HER
registers a2c_continuous.A2CAgent(**kwargs)
Similar to player except with the learning algorithm functions
Updates Epoch
Calc Gradients
Etc.
Derived from a2c_common.ContinuousA2CBase
Here is where the experience buffer is instantiated!
ERB is updated in play_steps(self)
Contains the Vec_Env object
config = yaml.safe_load(stream)
runner = Runner()
try:
    runner.load(config)
except yaml.YAMLError as exc:
    print(exc)
runner.run(args)
def load(self, yaml_conf):
    self.default_config = yaml_conf['params']
    self.load_config(params=copy.deepcopy(self.default_config))

def run_train(self, load_path=None):
    print('Started to train')
    agent = self.algo_factory.create(self.algo_name, base_name='run', params=self.params)
    if load_path is not None:
        agent.restore(load_path)
    agent.train()

def run_play(self, load_path=None):
    print('Started to play')
    player = self.create_player()
    if load_path is not None:
        player.restore(load_path)
    player.run()
Checking in:
The path to run this on the Jetson involves:
Making a new environment specifically for the Jetson configuration
This should derive from the configuration files used for training so they have identical parameters
Load the model and run with num_envs = 1
I think depending on how I write the code, this will need to be enforced as there will only be 1 physical robot.
So… how do I make a new environment?
I ended up using a lot of the framework from isaacgym. It was easy to modify the classes and keep as many things as similar as possible, to make loading models trained with Isaac easier… I’m still fighting the Jetson to run RL_Games
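As a sketch of what I mean by a Jetson-specific environment (class and method names are mine, not something rl_games requires verbatim; the observation/action sizes match the Walkbot_IKRL task above, and the real version will talk to the servos and IMU instead of returning zeros):

import gym
import numpy as np
from gym import spaces

class WalkbotHardwareEnv(gym.Env):
    # stand-in environment for the single physical robot (num_envs = 1)
    def __init__(self):
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(12,), dtype=np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, shape=(3,), dtype=np.float32)

    def reset(self):
        # on hardware this will read the servos / IMU; zeros for now
        return np.zeros(12, dtype=np.float32)

    def step(self, action):
        # on hardware this will command the servos, then read back the new state
        obs = np.zeros(12, dtype=np.float32)
        return obs, 0.0, False, {}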
I broke something… starting a fresh Jetpack 4.6 SD card
This is going to be setup documentation:
Getting pytorch installed (method from above)
Getting Dynamixel servo control code working (forgot to back this up :( )
Installing rl_games
This was a pain. I tried many methods to get the dependency ‘Ray’ working but I couldn’t build this software on the Jetson. After digging through the rl_games code I realized this dependency wasn’t very important to the work I was doing.
The only modification needed was removing Ray references from the vecenv class. I made a new file called vecenv_mod, removed the ray dependency from install.py, and ran python setup.py install --user to install the modified rl_games.
I’ve successfully loaded the trained model and was able to feed the NN fake observation data to get actions.
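For reference, the rough shape of that test. It goes through rl_games' Runner and player the same way runner.py does; exact constructor and config details vary between rl_games versions, the yaml/checkpoint paths are just examples, and it assumes the Jetson environment is registered under the env name in the config, so treat it as a sketch:

import yaml
import torch
from rl_games.torch_runner import Runner

# build the runner from the same training yaml used on the desktop
with open('cfg/train/Walkbot_IKRL.yaml', 'r') as stream:
    config = yaml.safe_load(stream)

runner = Runner()
runner.load(config)

# create_player() builds the same network architecture used in training
player = runner.create_player()
player.restore('runs/Walkbot_IKRL/nn/Walkbot_IKRL.pth')

# feed a fake 12-element observation and ask the policy for an action
# (may need .to(player.device) depending on how the config sets the device)
obs = torch.zeros(12, dtype=torch.float32)
action = player.get_action(obs, True)   # True = deterministic action
print(action)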
I need to finish the HER implementation, get a better model for the RL_IK I was doing above, and then try running it on hardware. I feel like I’m pretty close.