Deep Reinforcement Learning for Unreal


The purpose of the plugin is to let you create your own gym environment within Unreal and connect it to your Reinforcement Learning (RL) algorithm through a TCP connection. This way, you can easily use a framework like Stable Baselines to train an AI to control a pawn in the Unreal Engine.

In order for your customized gym environment defined in Unreal to communicate with your RL algorithm, a "Bridge" environment is needed. This environment's sole purpose is to relay information between Unreal and your algorithm.

You can therefore define your action space and observation space within the Unreal editor. Only Discrete and Box spaces are supported for now; note that the shape parameter is not supported for Box spaces, so multi-dimensional arrays must be reshaped manually.

With Deep Reinforcement Learning for Unreal, the environment can easily be designed within the Unreal Engine. It can even be an existing level.

Once set up, an external agent can automatically learn a policy using the Gym interface for Unreal Engine provided with the plugin.

It is an easy, fast and efficient way to design AI for your games.

The plugin is extremely powerful, but it requires a bit of reading/training before you can benefit from its full potential. This documentation is divided into three main parts:

It's strongly advised to read the whole documentation before using the plugin.

Summary

General explanations

Here is how things work:

More specifically, the Environment part of Unreal Engine is split into three categories:

Here are the abstract class responsibilities:


Here is the pseudocode of the step function:


If (AIInput.Reset)
    Reset()
    EnvironmentState.State = GetState()
    SendInfoToPython(EnvironmentState)
    Return

PerformAction()
EnvironmentState.State = GetState()
EnvironmentState.Reward = GetReward()
EnvironmentState.bIsDone = IsDone()
SendInfoToPython(EnvironmentState)
Return

Thus, you only have to define the functions Reset, GetState, GetReward, IsDone and PerformAction within your customized environment. Through inheritance, the step function defined in the abstract class will handle the rest.

Here are the responsibilities of each of these functions:


To implement these functions, two properties are provided in the abstract class: AIInput and EnvironmentState. These properties are instances of two different structures: AIInput represents the action decided by Python, while EnvironmentState carries the information about the current state of the environment.


About AIInput:

It contains an "Action" field, an array of floats holding the actions sent by Python. Be aware that it is always an array: even if your action is a single value, you still have to access it through the array.

About EnvironmentState:

You must fill it inside the implementations of GetState, GetReward and IsDone. Its fields are "State", "bIsDone" and "Reward"; don't forget to fill each field in the corresponding function.
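
For illustration, here is a minimal C++ sketch of how these two properties might be used. The AMyEnvironment class and the method signatures are assumptions based on the step pseudocode above; only the function names and the AIInput/EnvironmentState fields come from the plugin.

    void AMyEnvironment::PerformAction()
    {
        // The action is always an array of floats, even for a single value.
        const float Action = AIInput.Action[0];
        // ... apply the action to your agent ...
    }

    bool AMyEnvironment::IsDone()
    {
        // GetState, GetReward and IsDone each fill their EnvironmentState field.
        EnvironmentState.bIsDone = false; // replace with your end-of-episode test
        return EnvironmentState.bIsDone;
    }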

How to set up a scene

Once the customized environment is configured, you have to set up your scene:

You can create your environment either entirely in C++ or using Blueprints. Either way, it must be a class inheriting from EnvironmentAbstract.

Then, assign the AIServer reference to your environment (as well as any references you defined that are specific to your project).

Server configuration

You can use the provided widget, which gives you the tools to manage server creation and connection, or you can implement your own system. Either way, you simply have to use the "CreateServerAndWaitForConnection" function.

This function takes an IP address, a port and a timeout value as parameters. The timeout indicates how long the Unreal server will wait for the Python side to connect. After calling this function, you have to start your Python script manually, or you can call the "StartPythonProcess" blueprint node to start it automatically (in that case, however, you won't have access to the logs of your Python script).
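
For reference, a call from C++ might look like the following sketch. The address, port and timeout values are placeholders, and the exact parameter types (and the fact that the function is called on the AIServer reference) are assumptions:

    // Listen on the local address and wait up to 30 seconds for Python to connect.
    AIServer->CreateServerAndWaitForConnection(TEXT("127.0.0.1"), 8888, 30.f);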

Python configuration

You only have two things to configure on this side:

Run the game with the learning algorithm

Once everything is configured, just start your game, initiate a connection with a call to "CreateServerAndWaitForConnection" whenever you want, and start your Python script. Everything should now work, and you will see your agent start to learn.

Example of solving a problem using Blueprints

Here is the problem we want to solve:

A car is crossing an intersection at the same time as a pedestrian. The car must not collide with the pedestrian while crossing this intersection.

To solve this example, we first have to model the problem in the Unreal Engine. This was done using two pawns, one for the car and one for the pedestrian. The agent of the environment is the car.

Environment creation

Once this is done, we have to create a class deriving from EnvironmentAbstract:

We can then configure our environment, which implies:

Here's how to create the spaces:

The pieces of information that the agent can observe are the following:

The chosen bounds are somewhat arbitrary; they depend on where we placed our pawns.

The different Actions are the following:

The AI decides how much to accelerate or decelerate. This value is added to the base speed the car has on start (300).

Both of these spaces need to be continuous over their interval, which is why we use a Box type for both.

Implementation of the environment's functions

Now that the spaces are defined, we can implement all the needed functions:

PerformAction: this method will be called at every step and will depend on what the AI sent. The information regarding the Action chosen by the AI is stored inside the AIInput property defined in the EnvironmentAbstract class. You have to use it in order to apply the action to your agent.

Remember that an Action is always defined as an array of floats; we therefore have to access the first element of this array to get our single-valued action.
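
In textual form, the Blueprint logic amounts to something like the following sketch. The Car reference is the one given to the environment; the SetSpeed helper is hypothetical, and only the Action array and the base speed of 300 come from the example:

    void ACarEnvironment::PerformAction()
    {
        // Single-valued action, still delivered as an array of floats.
        const float Acceleration = AIInput.Action[0];
        // Add the chosen acceleration to the car's base speed of 300.
        Car->SetSpeed(300.f + Acceleration); // hypothetical pawn helper
    }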

Reset: this method is called when the field "bReset" within the AIInput property is set to true. It resets the environment and prepares it for a new episode. In our case, we want the car and the pedestrian to be placed back before the intersection. For better learning, we also want this starting location to be randomized so that our AI does not learn one specific case.

The Reset functions which are called reside inside the car and pedestrian's blueprints. They simply do the job described above.
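
A rough C++ equivalent of that Blueprint logic could look like this sketch; the ResetToStart helpers and the offset range are hypothetical, while the randomized placement before the intersection is the part taken from the example:

    void ACarEnvironment::Reset()
    {
        // Place both pawns back before the intersection with a random
        // offset, so the policy does not overfit one starting position.
        Car->ResetToStart(FMath::FRandRange(-200.f, 200.f));        // hypothetical helper
        Pedestrian->ResetToStart(FMath::FRandRange(-200.f, 200.f)); // hypothetical helper
    }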

The functions GetState, GetReward and IsDone: these functions provide the information normally returned by the step function of a gym environment, that is "state, reward, done". To do so, a property of type EnvironmentState is provided inside the EnvironmentAbstract class; it contains these fields. The fields State, bIsDone and Reward must be filled inside these functions; this is mandatory.

GetState implementation:

We query the X axis value of the pedestrian, the Y axis value of the car as well as its speed, and append them to the State array. Remember that the State array is always an array of floats, even if your observations are single values.
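
Expressed in C++, the same logic might read as follows. GetSpeed is a hypothetical accessor and the signatures are assumptions; the three observations are the ones listed above:

    TArray<float> ACarEnvironment::GetState()
    {
        EnvironmentState.State = {
            static_cast<float>(Pedestrian->GetActorLocation().X), // pedestrian X position
            static_cast<float>(Car->GetActorLocation().Y),        // car Y position
            Car->GetSpeed()                                       // hypothetical speed accessor
        };
        return EnvironmentState.State;
    }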

IsDone implementation:

GetReward: 

This function looks complicated, but that is simply due to the Blueprint layout: it really is just a succession of if statements that check various pieces of information about the environment and assign a reward accordingly.
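
As a purely illustrative sketch, that succession of if statements could look like this in C++; the conditions, helpers and reward values here are placeholders, not the exact ones from the Blueprint:

    float ACarEnvironment::GetReward()
    {
        float Reward = 0.f;
        if (Car->HasHitPedestrian())        // hypothetical helper
        {
            Reward -= 100.f;                // heavy penalty for a collision
        }
        if (Car->HasCrossedIntersection())  // hypothetical helper
        {
            Reward += 100.f;                // bonus for completing the crossing
        }
        EnvironmentState.Reward = Reward;
        return Reward;
    }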

Setting up the scene

We now have to place our actors in the scene. These actors are:

We can therefore create an AIServer blueprint and place an instance of it within the scene as well as an instance of our newly created environment’s blueprint.

Then, we need to give the required references to our customized environment. These references are:

Setting up the Python side

Two simple things are required:


The required files are provided with the plugin. The dependencies gym and stable-baselines3[extra] must be installed to make this example work.
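
Assuming a standard Python setup, they can typically be installed with pip (the quotes keep some shells from interpreting the brackets):

    pip install gym "stable-baselines3[extra]"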

Setting up the server

This part is very simple and fast. You can use the provided widget to start and stop your server, or you can implement your own system using the CreateServerAndWaitForConnection blueprint node. You don't have to worry about stopping the server; this is done automatically when you stop the game (but you can stop it prematurely with the StopServer node if you want).

This node creates the socket and initiates the TCP connection with Python. Once it has been called, you have to start your Python process either manually or using the StartPythonProcess node (however, you won't be able to access the Python logs with this method).

Now that everything is configured, if we start the game, click the "Create Server And Wait For Connection" button on the UI and start our Python process, a new episode will start and the learning will begin.

How to create your environment using C++

It's exactly the same procedure: just create a class deriving from EnvironmentAbstract instead of a blueprint, and implement PerformAction, Reset, GetState, GetReward and IsDone inside it.
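
Here is a minimal sketch of such a class. The UCLASS boilerplate, the A prefix and the method signatures are assumptions (the return types follow the step pseudocode shown earlier); only the base class and the five function names come from the plugin.

    #include "EnvironmentAbstract.h"
    #include "MyEnvironment.generated.h"

    UCLASS()
    class AMyEnvironment : public AEnvironmentAbstract
    {
        GENERATED_BODY()

    public:
        virtual void PerformAction() override
        {
            const float Action = AIInput.Action[0]; // actions arrive as an array of floats
            // ... apply the action to your agent ...
        }

        virtual void Reset() override
        {
            // ... put the environment back into a valid starting state ...
        }

        virtual TArray<float> GetState() override
        {
            EnvironmentState.State = { 0.f }; // replace with your observations
            return EnvironmentState.State;
        }

        virtual float GetReward() override
        {
            EnvironmentState.Reward = 0.f; // replace with your reward logic
            return EnvironmentState.Reward;
        }

        virtual bool IsDone() override
        {
            EnvironmentState.bIsDone = false; // replace with your end-of-episode condition
            return EnvironmentState.bIsDone;
        }
    };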