Running our application

For now, let's start using our application with our default parameters. Simply click on the Start button and the learning will commence. Once this is complete, you will be able to click on the Show Solution button, and the learned path will be animated from start to finish.

Clicking on Start will begin the learning stage, which continues until the black object reaches its goal:

Here, as the learning progresses, we send the output to ReflectInsight so that we can see what the algorithm is doing internally. For each iteration, you can watch the different object positions being evaluated, along with their actions and rewards:
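The chapter doesn't reproduce the logging call itself, so here is a minimal sketch of how the per-iteration output could be sent to ReflectInsight. It assumes the ReflectSoftware.Insight NuGet package and a hypothetical LogStep helper; the actual application may log differently:

using ReflectSoftware.Insight;

// hypothetical helper - logs one learning step to the ReflectInsight viewer
private void LogStep(int iteration, int state, int action, double reward)
{
    // RILogManager.Default is ReflectInsight's default logger instance
    RILogManager.Default.SendDebug(
        $"Iteration {iteration}: state={state}, action={action}, reward={reward}");
}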

Once the learning is complete, we can click on the Show Solution button to replay the final solution. When complete, the black object will sit atop the red object:

Now let's look at the code behind our application. As highlighted previously, there are two learning methods. Here's what the Q-learning version looks like:

int iteration = 0;
TabuSearchExploration tabuPolicy = (TabuSearchExploration)qLearning.ExplorationPolicy;
EpsilonGreedyExploration explorationPolicy = (EpsilonGreedyExploration)tabuPolicy.BasePolicy;

while ((!needToStop) && (iteration < learningIterations))
{
    // decrease the exploration and learning rates linearly as learning progresses
    explorationPolicy.Epsilon = explorationRate - ((double)iteration / learningIterations) * explorationRate;
    qLearning.LearningRate = learningRate - ((double)iteration / learningIterations) * learningRate;
    tabuPolicy.ResetTabuList();

    // reset the agent to its starting position for this episode
    var agentCurrentX = agentStartX;
    var agentCurrentY = agentStartY;

    int steps = 0;
    while ((!needToStop) && ((agentCurrentX != agentStopX) || (agentCurrentY != agentStopY)))
    {
        steps++;
        // get the current state, choose an action, take it, and collect the reward
        int currentState = GetStateNumber(agentCurrentX, agentCurrentY);
        int action = qLearning.GetAction(currentState);
        double reward = UpdateAgentPosition(ref agentCurrentX, ref agentCurrentY, action);
        int nextState = GetStateNumber(agentCurrentX, agentCurrentY);

        // do learning of the agent - update its Q-function, set the Tabu action
        qLearning.UpdateState(currentState, action, reward, nextState);
        // make the opposite of the move just taken tabu so the agent doesn't immediately backtrack
        tabuPolicy.SetTabuAction((action + 2) % 4, 1);
    }

    System.Diagnostics.Debug.WriteLine(steps);
    iteration++;

    SetText(iterationBox, iteration.ToString());
}
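Both loops rely on two helpers that aren't shown here: GetStateNumber, which turns the agent's grid coordinates into a single state index, and UpdateAgentPosition, which moves the agent and returns the reward. As an illustration only, a minimal sketch of the first helper (assuming one discrete state per cell and a mapWidth field holding the number of columns) could look like this:

// hypothetical sketch - assumes one discrete state per grid cell
private int GetStateNumber(int x, int y)
{
    // enumerate the cells row by row so every (x, y) pair gets a unique index
    return y * mapWidth + x;
}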

How does SARSA learning differ? SARSA is on-policy: it updates the value of a state-action pair using the action that is actually chosen in the next state, whereas Q-learning is off-policy and always updates towards the best available next action. Let's take a look at the while loop of SARSA learning to see this in code:

int iteration = 0;
TabuSearchExploration tabuPolicy = (TabuSearchExploration)sarsa.ExplorationPolicy;
EpsilonGreedyExploration explorationPolicy = (EpsilonGreedyExploration)tabuPolicy.BasePolicy;

while ((!needToStop) && (iteration < learningIterations))
{
    // decrease the exploration and learning rates linearly as learning progresses
    explorationPolicy.Epsilon = explorationRate - ((double)iteration / learningIterations) * explorationRate;
    sarsa.LearningRate = learningRate - ((double)iteration / learningIterations) * learningRate;
    tabuPolicy.ResetTabuList();

    // reset the agent and take the first step before entering the loop
    var agentCurrentX = agentStartX;
    var agentCurrentY = agentStartY;
    int steps = 1;
    int previousState = GetStateNumber(agentCurrentX, agentCurrentY);
    int previousAction = sarsa.GetAction(previousState);
    double reward = UpdateAgentPosition(ref agentCurrentX, ref agentCurrentY, previousAction);

    while ((!needToStop) && ((agentCurrentX != agentStopX) || (agentCurrentY != agentStopY)))
    {
        steps++;

        // make the opposite of the previous move tabu so the agent doesn't immediately backtrack
        tabuPolicy.SetTabuAction((previousAction + 2) % 4, 1);

        // choose the next action, then update using the state-action pair actually taken
        int nextState = GetStateNumber(agentCurrentX, agentCurrentY);
        int nextAction = sarsa.GetAction(nextState);
        sarsa.UpdateState(previousState, previousAction, reward, nextState, nextAction);
        reward = UpdateAgentPosition(ref agentCurrentX, ref agentCurrentY, nextAction);
        previousState = nextState;
        previousAction = nextAction;
    }

    if (!needToStop)
    {
        // the goal state is terminal, so update it without a next action
        sarsa.UpdateState(previousState, previousAction, reward);
    }

    System.Diagnostics.Debug.WriteLine(steps);

    iteration++;

    SetText(iterationBox, iteration.ToString());
}
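UpdateAgentPosition is the other helper both loops depend on: it moves the agent according to the chosen action and returns the reward for that move. The (action + 2) % 4 expression used for the tabu list only makes sense if opposite directions are two apart, so the sketch below assumes actions 0 to 3 mean up, right, down, and left, and uses hypothetical moveReward, wallReward, and goalReward fields; the real helper may differ:

// hypothetical sketch - assumes actions 0..3 map to up, right, down, left
private double UpdateAgentPosition(ref int x, ref int y, int action)
{
    int newX = x, newY = y;

    switch (action)
    {
        case 0: newY--; break;   // up
        case 1: newX++; break;   // right
        case 2: newY++; break;   // down
        case 3: newX--; break;   // left
    }

    // bumping into the border or a wall leaves the agent where it was
    if ((newX < 0) || (newX >= mapWidth) || (newY < 0) || (newY >= mapHeight) ||
        (map[newY, newX] != 0))
        return wallReward;

    x = newX;
    y = newY;

    // reaching the goal earns the big reward; any other move costs a small penalty
    return ((x == agentStopX) && (y == agentStopY)) ? goalReward : moveReward;
}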

Our last step is to animate the solution, which lets us confirm that the algorithm actually achieved its goal. Here is the code:

TabuSearchExploration tabuPolicy;

if (qLearning != null)
    tabuPolicy = (TabuSearchExploration)qLearning.ExplorationPolicy;
else if (sarsa != null)
    tabuPolicy = (TabuSearchExploration)sarsa.ExplorationPolicy;
else
    throw new Exception();

// turn off exploration so the agent follows only the learned policy
var explorationPolicy = (EpsilonGreedyExploration)tabuPolicy?.BasePolicy;
explorationPolicy.Epsilon = 0;
tabuPolicy?.ResetTabuList();

// place the agent at the start, copy the map into the display buffer,
// and mark the start (2) and goal (3) cells
int agentCurrentX = agentStartX, agentCurrentY = agentStartY;
Array.Copy(map, mapToDisplay, mapWidth * mapHeight);
mapToDisplay[agentStartY, agentStartX] = 2;
mapToDisplay[agentStopY, agentStopX] = 3;

And here is our while loop where all the magic happens!

while (!needToStop)
{
    // show the current map and pause so the movement is visible
    cellWorld.Map = mapToDisplay;
    Thread.Sleep(200);

    // if the goal has been reached, restart the animation from the start cell
    if ((agentCurrentX == agentStopX) && (agentCurrentY == agentStopY))
    {
        mapToDisplay[agentStartY, agentStartX] = 2;
        mapToDisplay[agentStopY, agentStopX] = 3;
        agentCurrentX = agentStartX;
        agentCurrentY = agentStartY;
        cellWorld.Map = mapToDisplay;
        Thread.Sleep(200);
    }

    // clear the agent's old cell, ask the learned policy for the next action, and move
    mapToDisplay[agentCurrentY, agentCurrentX] = 0;
    int currentState = GetStateNumber(agentCurrentX, agentCurrentY);
    int action = qLearning?.GetAction(currentState) ?? sarsa.GetAction(currentState);
    UpdateAgentPosition(ref agentCurrentX, ref agentCurrentY, action);
    mapToDisplay[agentCurrentY, agentCurrentX] = 2;
}

Let's break this down into more digestible sections. The first thing we do is establish our tabu policy. If you are not familiar with tabu search, it is designed to enhance the performance of a local search by relaxing its rules: at each step, accepting a move that makes things worse is allowed when no improving move (one with a reward) is available.

Additionally, prohibitions (tabu) are put in place to ensure that the algorithm does not return to previously visited solutions.

TabuSearchExploration tabuPolicy;

if (qLearning != null)
    tabuPolicy = (TabuSearchExploration)qLearning.ExplorationPolicy;
else if (sarsa != null)
    tabuPolicy = (TabuSearchExploration)sarsa.ExplorationPolicy;
else
    throw new Exception();

var explorationPolicy = (EpsilonGreedyExploration)tabuPolicy?.BasePolicy;
explorationPolicy.Epsilon = 0;
tabuPolicy?.ResetTabuList();
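For this cast to succeed, the learner must have been created with a TabuSearchExploration policy wrapping an epsilon-greedy policy in the first place. As a reference point only, here is a minimal sketch of that setup, assuming the AForge.MachineLearning classes used above, four possible actions, and one state per map cell:

// sketch of the learner construction - assumes an explorationRate field and
// the AForge.MachineLearning QLearning/Sarsa constructors (states, actions, policy)
var greedyPolicy = new EpsilonGreedyExploration(explorationRate);
var wrappedTabuPolicy = new TabuSearchExploration(4, greedyPolicy);      // 4 possible actions
qLearning = new QLearning(mapWidth * mapHeight, 4, wrappedTabuPolicy);
// or, when SARSA is selected:
// sarsa = new Sarsa(mapWidth * mapHeight, 4, wrappedTabuPolicy);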

Next, we have to position our agent and prepare the map.
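These are the corresponding lines from the listing above; they place the agent at its starting cell, copy the underlying map into the display buffer, and mark the start (2) and goal (3) cells:

int agentCurrentX = agentStartX, agentCurrentY = agentStartY;
Array.Copy(map, mapToDisplay, mapWidth * mapHeight);
mapToDisplay[agentStartY, agentStartX] = 2;
mapToDisplay[agentStopY, agentStopX] = 3;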

Here is our main execution loop, which will show the animated solution:

while (!needToStop)
{
    cellWorld.Map = mapToDisplay;
    Thread.Sleep(200);

    if ((agentCurrentX == agentStopX) && (agentCurrentY == agentStopY))
    {
        mapToDisplay[agentStartY, agentStartX] = 2;
        mapToDisplay[agentStopY, agentStopX] = 3;
        agentCurrentX = agentStartX;
        agentCurrentY = agentStartY;
        cellWorld.Map = mapToDisplay;
        Thread.Sleep(200);
    }

    mapToDisplay[agentCurrentY, agentCurrentX] = 0;
    int currentState = GetStateNumber(agentCurrentX, agentCurrentY);
    int action = qLearning?.GetAction(currentState) ?? sarsa.GetAction(currentState);
    UpdateAgentPosition(ref agentCurrentX, ref agentCurrentY, action);
    mapToDisplay[agentCurrentY, agentCurrentX] = 2;
}