Deep Reinforcement Learning for Unreal
The purpose of the plugin is to let you create your own gym environment within Unreal and connect it to your Reinforcement Learning (RL) algorithm through a TCP connection. This way, you can easily use a framework like Stable Baselines to train an AI to control a pawn in the Unreal Engine.
In order for your customized gym environment defined in Unreal to communicate with your RL algorithm, a "Bridge" environment is needed. This environment's sole purpose is to redirect information between Unreal and your algorithm.
You can therefore define your action space and observation space within the Unreal editor (only Discrete and Box are supported for now; note that shape is not supported within the Box space, so arrays must be reshaped manually).
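Since a Box space carries no shape information, a multi-dimensional observation has to travel as a flat float array and be restored on the Python side. A minimal sketch, assuming a hypothetical 3x4 grid observation:

```python
import numpy as np

# A 2-D observation (e.g. a hypothetical 3x4 grid) must arrive as a flat
# float array, since the plugin's Box space does not carry a shape.
# The Python side can restore the intended layout with numpy:
flat_obs = [0.0] * 12                      # what Unreal sends: a 1-D float array
grid = np.asarray(flat_obs).reshape(3, 4)  # manual reshape to the known layout
```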
With Deep Reinforcement Learning for Unreal, the Environment can easily be designed within the Unreal Engine. It can even be an existing level.
Once set up, an external agent can automatically learn a policy using the Gym interface for Unreal Engine provided with the plugin.
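Conceptually, the bridge environment on the Python side simply forwards each call to Unreal and relays the answer back to the RL algorithm. The sketch below is purely illustrative: the `BridgeEnv` class, the `transport` object, and the message keys are hypothetical stand-ins, not the plugin's actual API.

```python
# Conceptual sketch of a "bridge" gym-style environment. The class name,
# transport object and message format are assumptions for illustration.

class BridgeEnv:
    """Forwards reset/step calls to Unreal over some transport (e.g. a TCP
    socket wrapper) and relays the (state, reward, done) answer back."""

    def __init__(self, transport):
        self.transport = transport

    def reset(self):
        # Ask Unreal to reset the environment, then return the initial state.
        self.transport.send({"reset": True})
        return self.transport.recv()["state"]

    def step(self, action):
        # Send the chosen action, receive the resulting (state, reward, done).
        self.transport.send({"action": list(action)})
        reply = self.transport.recv()
        return reply["state"], reply["reward"], reply["done"], {}
```

With such a class, an RL framework sees a regular gym-like environment and never needs to know that the real logic runs inside the Unreal Engine.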
It is an easy, fast and efficient way to design AI for your games.
The plugin is extremely powerful, but it requires a bit of reading/training before benefiting from its full potential. This documentation is divided into three main parts:
General explanations about the principles behind the plugin
The general documentation on how to create an environment (Unreal Engine side and DRL - Python - side)
A tutorial showing the application of the plugin on a concrete use-case
It's strongly advised to read the whole documentation before using the plugin.
Here is how things work:
More specifically, the Environment part of Unreal Engine is split into 3 categories:
The Interface part, which implements the core functions common to any environment (not necessarily Gym environments)
The abstract class part: this is the most important part; it basically handles everything for you so that you only have to define what matters
The customized part: this is where you implement the logic behind your learning
Thus, you only have to define the functions Reset, GetState, GetReward, IsDone and PerformAction within your customized environment. Using inheritance, the step function defined in the abstract class will handle the rest.
Here are the responsibilities of each of these functions:
GetState: Gives the state of the environment when called (this is what will be returned as the current state in the bridge environment's step function). The only format supported right now is a one-dimensional array, so it returns a one-dimensional float array. This is also why you can't provide a matrix inside a Box space for now.
IsDone: Indicates if the currentState is an end of episode (this is what will be returned to indicate the end of an episode in the bridge environment’s step function)
GetReward: Indicates the reward associated with the current state (this is what will be returned as the reward in the bridge environment's step function)
Reset: Resets the environment and prepares it to start a new episode
PerformAction: Performs the action decided by the AI on the environment (move to the left, shoot, stop moving…)
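To make the division of labor concrete, here is a hedged sketch of what the abstract class's step function does with the five functions you implement. The method names mirror the plugin's, but this toy counter environment is purely illustrative.

```python
# Toy illustration of how an abstract step() composes the five functions
# you implement (Reset, PerformAction, GetState, GetReward, IsDone).
# The environment itself (a counter moving toward 3.0) is hypothetical.

class ToyEnvironment:
    def reset(self):                       # Reset: prepare a new episode
        self.position = 0.0

    def perform_action(self, action):      # PerformAction: apply the AI's choice
        self.position += action[0]         # actions always arrive as an array

    def get_state(self):                   # GetState: a 1-D float array
        return [self.position]

    def get_reward(self):                  # GetReward: reward for current state
        return 1.0 if self.position >= 3.0 else 0.0

    def is_done(self):                     # IsDone: end-of-episode flag
        return self.position >= 3.0

    def step(self, action):                # what the abstract class provides
        self.perform_action(action)
        return self.get_state(), self.get_reward(), self.is_done()
```

You never write `step` yourself; inheriting from the abstract class gives you this composition for free, exactly as described above.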
The AIInput property contains an "Action" field, which is an array of floats holding the actions sent by Python. Be aware that it is always an array: even if your action is a single value, you still have to access it through the array.
The EnvironmentState property must be filled inside the implementations of GetState, GetReward and IsDone. Its fields are "State", "Reward" and "bIsDone"; don't forget to fill each of them in the appropriate function.
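As an illustration, the two records exchanged between the sides could be mirrored in Python as below. The field names follow this documentation; the exact C++ layout and types are assumptions.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative Python mirror of the two records the plugin works with.
# Field names follow the documentation; the actual C++ structs may differ.

@dataclass
class AIInput:
    action: List[float] = field(default_factory=list)  # always an array, even for a single value
    reset: bool = False                                # the "bReset" flag that triggers Reset

@dataclass
class EnvironmentState:
    state: List[float] = field(default_factory=list)   # filled inside GetState
    reward: float = 0.0                                # filled inside GetReward
    is_done: bool = False                              # filled inside IsDone
```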
How to set up a scene
Once the customized environment is configured, you have to configure your scene:
Add an instance of your customized environment class inside the scene
Add an instance of the AIServer class inside the scene
You can create your environment either entirely in C++ or using blueprints. Either way, it must be a class inheriting from EnvironmentAbstract.
Then, assign the AIServer reference to your environment (as well as any references you defined that depend on your project)
You can use the provided widget to have the tools allowing you to manage the server creation and connection, or you can implement your own system. Either way, you simply have to use the "CreateServerAndWaitForConnection" function.
This function takes an IP address, a port and a timeout value as parameters. The timeout indicates how long the Unreal server will wait for a connection from Python. When calling this function, you have to start your Python script manually, or you can call the "StartPythonProcess" blueprint node to start it automatically (however, you won't have access to the logs of your Python script in that case).
You only have two things to configure on this side:
Run the game with the learning algorithm
Once everything is configured, just start your game, initiate a connection with a call to "CreateServerAndWaitForConnection" whenever you want, and start your Python script. It should work, and you will see your agent start to learn.
Example of the resolution of a problem using blueprints
Here is the problem we want to solve:
A car is crossing an intersection at the same time as a pedestrian. The car must not collide with the pedestrian while crossing this intersection.
To solve this example, we first have to model the problem in the Unreal Engine. This was done with two pawns, one for the car and one for the pedestrian. The agent of the environment is the car.
Once this is done, we have to create a class deriving from EnvironmentAbstract:
We can then configure our environment, which implies:
Creation of the action space and the observation space
Implementation of the GetState, GetReward, IsDone, PerformAction and Reset functions
Here's how to create the spaces:
The agent can observe the following information:
The pedestrian's location on the X-axis
The car's location on the Y-axis
The car's speed
The chosen bounds are fairly arbitrary; they depend on where we placed our pawns.
The available actions are the following:
The AI decides how much to accelerate or decelerate. This value is added to the base speed the car has at the start (300).
Both of these spaces need to be continuous over their interval, which is why we use a Box type for both.
Implementing the environment's functions
Now that the spaces are defined, we can implement all the needed functions:
PerformAction: this method will be called at every step and will depend on what the AI sent. The information regarding the Action chosen by the AI is stored inside the AIInput property defined in the EnvironmentAbstract class. You have to use it in order to apply the action to your agent.
Remember that an action is always defined as an array of floats, so we have to access the first element of this array to get an action that is defined by a single value.
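In this example, the logic of PerformAction boils down to reading the acceleration from index 0 and adding it to the car's base speed of 300. A sketch, with the helper name being an assumption:

```python
BASE_SPEED = 300.0  # the base speed the car has at the start (from the example)

# Hypothetical Python equivalent of this example's PerformAction logic.
def apply_action(current_speed, ai_input_action):
    # The action arrives as a float array, so even a single acceleration
    # value must be read through index 0.
    acceleration = ai_input_action[0]
    return current_speed + acceleration
```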
Reset: this method is called when the field "bReset" within the AIInput private property is set to true. It resets the environment and prepares it for a new episode. In our case, we want the car and the pedestrian to be placed back before the intersection. For better learning, we also want this location to be randomized so that our AI does not learn one specific case.
The Reset functions that are called reside inside the car's and pedestrian's blueprints. They simply do the job described earlier.
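The randomized part of the reset can be sketched as follows. The coordinate ranges are hypothetical; the point is only that each episode starts from a slightly different configuration so the policy cannot memorize a single case.

```python
import random

# Illustrative reset logic: both pawns are placed back before the
# intersection at a randomized position. The ranges are hypothetical.

def reset_positions(rng=random):
    car_y = rng.uniform(-1200.0, -800.0)         # somewhere before the intersection
    pedestrian_x = rng.uniform(-1200.0, -800.0)  # likewise for the pedestrian
    return car_y, pedestrian_x
```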
The GetState, GetReward and IsDone functions: their purpose is to provide the information normally returned by the step function of a gym environment, that is "state, reward, done". To do so, a property of type EnvironmentState is provided inside the EnvironmentAbstract class; it contains these fields. The fields State, bIsDone and Reward must therefore be filled inside these functions; this is mandatory.
We query the pedestrian's X-axis value, the car's Y-axis value and the car's speed, and add them to the State array. Remember that the State array is always an array of floats, even if your observations are single values.
This function may look complicated, but that is only because of the blueprint layout: it really is just a succession of if statements that check various pieces of information about the environment and assign a reward accordingly.
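That chain of if statements could look like the sketch below. The conditions and reward values are illustrative assumptions, not taken from the actual blueprint.

```python
# Hypothetical Python equivalent of the blueprint's reward chain.
# The conditions and magnitudes are illustrative, not the real values.

def get_reward(collided, crossed_intersection):
    if collided:
        return -100.0   # hitting the pedestrian is heavily penalized
    if crossed_intersection:
        return 100.0    # successfully crossing is rewarded
    return -0.1         # small per-step penalty to encourage progress
```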
Setting up the scene
We now have to place our actors in the scene. These actors are:
An instance of the AIServer class
An instance of the created EnvironmentAbstract derived blueprint
We can therefore create an AIServer blueprint and place an instance of it in the scene, as well as an instance of our newly created environment blueprint.
Then, we need to give the required references to our customized environment. These references are:
A car pawn reference
A pedestrian pawn reference
An AIServer actor reference
Setting up the Python side
Two simple things are required:
Setting up the server
This part is very simple and fast. You can use the provided widget to start and stop your server, or you can implement your own system using the CreateServerAndWaitForConnection blueprint node. You don't have to worry about stopping the server; this is done automatically when you stop the game (but you can stop it prematurely with the StopServer node if you want).
This node is used to create the socket and initiate the TCP connection with Python. Once this function is called, you have to start your Python process either manually or using the StartPythonProcess node (however you won’t be able to access the Python logs with this method).
Now that everything is configured, if we start the game, click the "Create Server And Wait For Connection" button on the UI and start our Python process, a new episode will start and the learning will begin.
How to create your environment using C++
It’s exactly the same procedure, just create a class deriving from EnvironmentAbstract instead of a blueprint and implement PerformAction, Reset, GetState, GetReward and IsDone inside it.