ML-Agents is a Unity add-on that enables Unity to be used as a simulation environment for reinforcement learning problems. It is a powerful tool that allows for quick physical environment setup and agent training with very minimal work.
Currently the only drawback is the accuracy of the physics simulation, which can be offloaded to external physics engines to model more complex physical phenomena.
My exploration of ML-Agents has led me to create a simple drone simulation. Using the PPO algorithm, the agent was able to learn how to stabilize and fly a drone within a matter of hours.
I’m currently looking into how to improve the physical simulation of the environment for drone-related work. I’ve run across a GitHub repository that seems to be doing just that: https://github.com/uzh-rpg/flightmare. Additionally, I’ve been looking into more mainstream simulated environments like PyBullet and MuJoCo.
Observations:
1. Initial reward shaping for control was based on the deviation of the drone’s vertical vector from the world’s vertical vector. Because this reward was effectively all penalty and the correct control policy was hard to learn, the algorithm discovered that the best way to maximize score was to die as fast as possible.
2. I tried offsetting this by creating a reward for simply staying alive, a universal basic reward. This helped the drones learn to survive longer, but it probably wasn’t the best reward to add: instead of learning to stay upright, they tended to spin in tight circles to stay alive as long as possible.
3. I finally clipped the negative portion of the orientation reward and removed the ‘stay alive’ reward, and was able to get a policy that kept the drones upright.
4. To facilitate learning to move, targets were randomly generated. The agent was given a speed and the location of the target, and a reward was granted for maintaining the appropriate velocity vector and for reaching the target (a sketch of this reward design follows the list).
5. The method in bullet (4) allowed the agent to learn how to stay upright while maintaining arbitrary velocity vectors. This policy was the basis for the robot control that would later be extended with higher-level functions like navigation.
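To make the reward shaping in bullets (3) and (4) concrete, here is a minimal Python sketch of the final design. This is illustrative rather than the actual Unity C# agent code: the function names, the arrival radius and bonus, the linear velocity-error falloff, and the 0.5 weighting are all assumptions introduced for the example.

```python
import numpy as np

def upright_reward(drone_up: np.ndarray) -> float:
    """Orientation term from bullet (3): cosine between the drone's up
    vector and the world's up vector, with the negative portion clipped."""
    world_up = np.array([0.0, 1.0, 0.0])  # Unity's world up is +Y
    cos_tilt = float(np.dot(drone_up, world_up))  # in [-1, 1] for unit vectors
    return max(0.0, cos_tilt)  # clip away the negative portion

def velocity_reward(velocity: np.ndarray, position: np.ndarray,
                    target: np.ndarray, target_speed: float) -> float:
    """Movement term from bullet (4): reward tracking the desired velocity
    (the commanded speed, pointed at the target) plus an arrival bonus.
    The 1 m radius, the bonus of 10, and the falloff are assumed values."""
    to_target = target - position
    distance = float(np.linalg.norm(to_target))
    if distance < 1.0:   # hypothetical "target reached" radius
        return 10.0      # hypothetical arrival bonus
    desired_velocity = target_speed * to_target / distance
    error = float(np.linalg.norm(velocity - desired_velocity))
    # Also clipped at zero so a bad velocity never makes ending the
    # episode early look better than continuing to fly.
    return max(0.0, 1.0 - error / target_speed)

def step_reward(drone_up, velocity, position, target, target_speed) -> float:
    """Per-step reward: upright term plus a weighted velocity-tracking term."""
    return upright_reward(drone_up) + 0.5 * velocity_reward(
        velocity, position, target, target_speed)
```

Keeping both terms clipped at zero means the worst per-step reward is 0 rather than negative, so surviving longer can never lower the return; that removes the incentive to crash early without needing the explicit ‘stay alive’ bonus that the spinning behavior in bullet (2) exploited.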