<< JointMeta << Science << Machine Learning

>> (Experiment) Designing a Generic Unsupervised Learning Component: Knowledge dll

Index
 


>> 2. Knowledge dll System

OVERALL DESIGN

With simplicity in mind, the Knowledge dll is the generic AI component designed for this project. It is based on two seemingly flawed premises. These premises were chosen because they would lead the project to bring about a greater understanding of AI. They are also easily to implement. These premises are:

  • Experimenting with random behavior will lead to understanding
    • Flawed, because apparent random behavior comes through a enormous collection of facts gathered by humans over a time of many years. In short, humans are far worse than computers are generating anything random
  • The best way to solve a problem is have some rigid rules to follow
    • Flawed because it does not lead to generalization, but workable when we have a very small dataset
However, these concepts have been implemented to create a simple learning system that is quite effective, as will be revealed.




The List Memory

The Knowledge dll works by being provided with a history of experience, and from that history, deriving what the best action to take to reach a "win state". Therefore it needs to be provided with information about the environment and opportunities to respond to this information with actions of its own.


Actions

This AI component does not learn from observation, but by experimentation. For example, if an operation is performed and that information is reported back to the AI as part of its observations, the AI cannot learn from that. Instead it has a limited number of actions that it can experiment with, and from the consequences of that experimentation and based on sensory data, it can derive sequences of response that will lead to win states.


Recognition of Win Situation

When the AI has won or lost (succeeded or failed), the software using the AI reports to the Knowledge dll whether it has won or lost. This becomes important as the list memory grows larger, as the system can base its plans for success, and guesses (when it has no plans) on its previous experience, i.e. where in the past it succeeded or failed.


Outline of Strategizing

A more detailed view of the exact algorithm used is presented below in the code design section.

All the strategizing and planning takes place when the system is asked to make a move. When the AI makes a move it follows these steps:

  1. It calculates the best plans (based on previous experience of wins and losses)
  2. It finds which plans are most appropriate for the current situation (based on the observations of the world the AI is applied to)
  3. It selects the most appropriate, most effective plan
  4. It picks the next action from that plan

If there is no data for the system to learn from, the AI picks a random choice and stores that choice in it's memory.


CODE DESIGN

The functions shown here are only those that contribute to the AI logic. All others are concerned with data structures and are not explained here.

The public functions that are concerned with AI, and are presented first are:

  • stateUpdate - the function to store any environmental data (the statement of the world) reported from the host application
  • effectRegistration - the function to inform the AI that a state affecting the AI, such as a win or loose has been reached
  • move - the function in which the AI is allowed to calculate a new move. It returns an action ID to be executed by the host application.
    • Calls pickBestFit function, which calls the similar function. Then nextAction is called the find the next action in the plan.

There are two major variables that are used repeatedly in these functions:

  • shortHistory - stores all actions and events that have happened in the game since the last win or lose (everything from the current game).
  • log - contains the list of all actions, events, and effects for the entire list memory.


Workings of the public Knowledge::stateUpdate function

Summary

The stateUpdate function is called to inform the AI that something has happened in the world it is interacting with. This could be an visible action, a movement, a location, or any information that the AI should base its understanding of the environment upon. Once this information arrives, the data is added to the list-memory


Workings of the public Knowledge::effectRegistration function

Summary

The effectRegistration function is called to inform the AI that a state affecting the AI, such as a win or loose has been reached. If the effect registered is -2, this indicates a win, a -1 indicates a loss. At this point, this information is not considered at all - it is simply added to the list memory.


Workings of the public Knowledge::move function

Summary

Calling the move function allows the AI to make a move, and it will return an action ID which is the action it wants to be executed.

Technique

 

      int noClearAction = 0;

      int response = -1;

      ostringstream ss;

 

Based on the environmental data collected since the AI last won or lost (shortHistory), the best plan to reach a win state is chosen.

 

            plan = pickBestFit(shortHistory);

 

If a plan wasn’t found, it is recorded as such, otherwise, the chosen plan is compared to the current history, the next step is chosen from the plan, which is then stored in the variable response.

 

      if (plan.length() == 0)

      {

            noClearAction = 1;

      }

      else

      {

            response = nextAction(shortHistory, plan);

      }

 

If no action was found, or no good step was found in the plan, we pick a random action. A potential addition here is the avoidance of plans that have failed before.

 

      if (noClearAction || response == -1)

      {

            response = actions[(rand() % actionTotal)];

            //response = checkNotPartOfFailedStratergy(response);

      }

     

The response is replaced with a random choice every 3/100 times. This is done to encourage the system to experiment with different ideas even when it has a solution. Ideally, this kind of thinking should occur based on environmental data alone, and if this project is extended, this will be one of the first parts of the algorithm to be altered.

 

      if ((rand() % 100) < 4) // && certainty == LOW

      {

            response = actions[(rand() % actionTotal)];

            //response = checkNotPartOfFailedStratergy(response);      

      }

 

A record is kept of the chosen response, and the action code is returned to the application using the AI.

 

      addItem(response);

      return response;

 

 



Workings of the Knowledge::pickBestFit function

Summary

The pickBestFit function takes the history since the last win or lose state (or game start), stored in shortHistory, and finds the best solution based on successful and failed plans already experienced.

Note: Obviously this function could be much faster if it simply maintained the data already collected in tree form.

Technique

                Received data

shortHistory – the actions and events occurring during the current game (since the last win or lose event).

 

This function begins by looping through the number of lines in the entire history (all games ever played) in the vector array log.

 

      int scount = 0;

 

      for (int i = 0; i < itemCount; i = i + 1)

      {

 

Are we analyzing a place in the log where the game was restarted? (-1 is the restart/lose code – the human player has won)

 

            if (log[i] == -1)

            {

As the game we are studying is a unsuccessful one (-1 indicates a loss) we score negatively against this and all strategies that we have already discovered, that are similar to this one.

 

                  noNewStrategy = 0;

 

Search through all the strategies that we have found so far.

 

                  for (int j = 0; j < scount; j = j + 1)

                  {

 

If we find a similar strategy, we subtract two[1] from its score. This means that if we have seen a strategy win and then loose, we make sure it is not used again. This is the adaptability functionality.

The logic here for finding a similar strategy is:

                (strategy under investigation) IS SUBSTRING (strategy in list) AND

NOT((strategy in list) IS SUBSTRING (strategy under investigation))

 

if (similar(trackStrategy,

workableStrategyDefns[j], 1, 0))

                        {

                              workableStrategyScore[j] =

workableStrategyScore[j] - 2;

                        }

 

We reset our memory of the current strategy, as we don’t need to remember failed strategies unless they (above) are failed versions of successful strategies.

 

                  trackStrategy = "";

 

            }

 

Have we reached a win state in the current strategy?

 

else if (log[i] == -2)

            {

 

As the strategy we have found is a successful one, we need to remember it. However, if we have already encountered the winning strategy then encountering it again increases that strategy’s value (as winning more than once is a great achievement)

 

                  for (int j = 0; j < scount; j = j + 1)

                  {

 

If we find a similar winning strategy, we add one to its score.

The logic here for finding a similar strategy is:

                (strategy under investigation) IS SUBSTRING (strategy in list) AND

NOT((strategy in list) IS SUBSTRING (strategy under investigation))

 

                        if (similar(trackStrategy,

workableStrategyDefns[j], 1, 0))

                        {

                              workableStrategyScore[j] =

workableStrategyScore[j] + 1;

                        }

                  }

 

The new strategy is added to the list even if a similar strategy is found. The reason for this is that the similar function finds similar strategies – if this strategy is more efficient, it should be kept even if it is similar to a winning strategy (it could be better).

 

                  workableStrategyDefns.push_back(trackStrategy);

                  workableStrategyScore.push_back(1);

                  scount = scount + 1;

 

We clear our memory of the current strategy.

                  trackStrategy = "";

            }

            else

            {

If the state is neither a win or loose, we simply add the data to the current study of the strategy (strings are used as a legacy of the Perl version of this software).

 

                  trackStrategy.append(intToString(log[i]));

                  trackStrategy.append("|");

            }

      }

 

Now that all the winning strategies have been found and have been scored, we need to sort them all.

 

sortedStrategies = workableStrategyScore;

      sort(sortedStrategies, sortedStrategiesIndex);

(pseudocode)

 

We search through the sorted strategies until we find one that’s suitable, then we return it.

 

      int score, index;

      for (int i = 0; i < scount; i = i + 1)

      {

            score = sortedStrategies[i];

            index = sortedStrategiesIndex[i];

           

Does the strategy work? More specifically, does it map to our current shortHistory? If so it’s usable, and we return it.

            if (works(shortHistory, workableStrategyDefns[index]) && score > 0)

            {

                  return workableStrategyDefns[index];

            }

      }

     

Otherwise we return the strategy as empty.

     

      return "";

 

 

Workings of the Knowledge::similar function

Summary

The similar function takes two strategies stored in strings s1 and s2, and then compares how different they are from one another. Cost1 indicates the price of a difference in string s1, and cost2 indicates the price of a difference in string s2.

Use of Function

This function is useful when comparing strategies (stored as lists), especially if we throw in random elements. This allows a strategy with a slightly different step to be counted the same as another strategy that we have some history on. It means we can take partial knowledge and assume it's the same as something we already know.

Technique

Received data

s1 – list of actions

s2 – list of actions

cost1 – the cost of a move in string s1

cost2 – the cost of a move in string s2

 

First, a reject difference is set: this is, the maximum difference between the two lists of actions before we reject them outright as being different:

     

      // a third of the length seems to be a good scale for difference

      int maxdiff = items / 3;

C++ code

 

Then, the function then enters a loop through the actions in list s1 and list s2. In each iteration of the loop the following logic is executed:

 

1.       Is the difference greater than the maximum acceptable difference? If so, the two lists are not similar

 

            if (difference > maxdiff)

            {

                  return 0;

            }

C++ code

 

2.       Are we at the end of either of the two lists and we have actually traversed something? (one list could be empty - the other might have lots of items in it) Then the two lists are similar.

 

// are we at the end of either of the lists?

if ((s1i >= items) || (s2i >= items2))

            {

                  if ((difference <= maxdiff) && !foundNothing)

                  {

                        // yes these two are similar

                        return 1;

                  }

                  // otherwise exit loop (the strings are not similar)

                  slide = 0;

            }

C++ code

 

3.       Are the two list items the same? Then the point of synchronization is  remembered, and we shift along in both lists

 

            if (s1L[s1i] == s2L[s2i])

            {

                  difference = difference + 0;

                  lastSync1 = s1i;

                  lastSync2 = s2i;

                  indexer = 0;

                  foundNothing = 0;

                       

                  s2i = s2i + 1;

                  s1i = s1i + 1;

            }

C++ code

 

4.       Are the two list items not the same? Then we shift one space along list two. If we’re shifting more than the maximum difference, or off the end of the list, we go back and start shifting along list 1. This is to allow desynchronized lists:

 



 

maxdiff = 6 / 3 = 2

Therefore, these two lists are similar (D on list 1 was skipped – if the cost1 is 1)

 

                  else

                  {

// we skip one on s2

                        s2i = s2i + 1;

                        indexer = indexer + 1;

                       

// If we’re finding no match on our sliding scale, then we backtrack and study comparison the other way.

                        if (indexer > maxdiff || indexer > items2)

                        {

                              s1i = lastSync1 + 1;

                              s2i = lastSync2;

                             

// not strictly true (not last sync) but stops infinite loops

                              lastSync1 = s1i;

                              lastSync2 = s2i;

                             

// we allow substrings (up to cost maxdiff)

                              if (!foundNothing)

                              {

                                    difference = difference + cost1;

                              }

                        }

                        else

                        {

                              // we allow substrings in

                              if (!foundNothing)

                              {

                                    difference = difference + cost2;

                              }

                        }

                  }

C++ code

 

 

Workings of the Knowledge::nextAction function

Summary

The nextAction function is provided with the history of actions in the current game, and the plan for the game. By comparing these two lists, it selects the next action to be performed.

Use of Function

This function is called from the move function to determine what step should be taken next.

Technique

Received data

shortHistory – the history of actions in the current game

plan – a tried and tested plan for the current game

 

Break apart the shortHistory and the plan

 

      StringTokenizer shSTOK = StringTokenizer(shortHistory, "|");

      StringTokenizer pSTOK = StringTokenizer(plan, "|");

     

      vector<int> sh = vectorize(shSTOK);

      vector<int> p = vectorize(pSTOK);

     

      int pval;

      int pi = 0;

      int shi = 0;

      int lastPSync = 0;

 

Go through the short history and the plan at the same pace, until we reach a point on the plan which hasn’t been executed.

     

      while (shi < (int)sh.size() && pi < ((int)p.size() + 5))

      {

            // this exists to allow us to overrun a bit

            if (pi < (int)p.size())

            {

                  pval = p[pi];

            }

            else

            {

                  pval = 0;

            }

 

If something matches on both the short history and the plan we move forward one space along each one. We also remember that this was a point of synchronization.

 

            if (sh[shi] == pval)

            {

                  lastPSync = pi;

                  pi = pi + 1;

                  shi = shi + 1;

            }

            else

            {

 

If something doesn’t match on either one, we skip forward on the plan.

 

                  pi = pi + 1;

 

If we have not matched and we are already two spaces off from the existing plan, we sync back to the plan.

 

// we map back a bit   

                  if ((pi - shi) > 2)

                  {

                        pi = pi + 1;

                        // not really a sync but good enough

                        lastPSync = pi;

                  }

            }

      }

The synchronization synchronizes all elements, we are only interested in the next action we can perform. Therefore, we move forward from the last sync point and find the next action.

 

      for (int m = lastPSync; m < (int)p.size(); m = m + 1)

      {

Check if the plan element is an action, if so, return it

     

            if (isAction(p[m]))

            {

                  return p[m];

            }

      }

     

Otherwise we return -1 which means there’s a problem.

 

      return -1;



>> Next

[1] A bird in hand is worth two in the bush (Anon)

 
 

Email

Creative Commons License
This work is licensed under a Creative Commons License.