|
>> (Experiment) Designing a Generic Unsupervised Learning Component: Knowledge dll Index >> 2. Knowledge dll System OVERALL DESIGN With simplicity in mind, the Knowledge dll is the generic AI component designed for this project. It is based on two seemingly flawed premises. These premises were chosen because they would lead the project to bring about a greater understanding of AI. They are also easily to implement. These premises are:
The List Memory The Knowledge dll works by being provided with a history of experience, and from that history, deriving what the best action to take to reach a "win state". Therefore it needs to be provided with information about the environment and opportunities to respond to this information with actions of its own. Actions This AI component does not learn from observation, but by experimentation. For example, if an operation is performed and that information is reported back to the AI as part of its observations, the AI cannot learn from that. Instead it has a limited number of actions that it can experiment with, and from the consequences of that experimentation and based on sensory data, it can derive sequences of response that will lead to win states. Recognition of Win Situation When the AI has won or lost (succeeded or failed), the software using the AI reports to the Knowledge dll whether it has won or lost. This becomes important as the list memory grows larger, as the system can base its plans for success, and guesses (when it has no plans) on its previous experience, i.e. where in the past it succeeded or failed. Outline of Strategizing A more detailed view of the exact algorithm used is presented below in the code design section. All the strategizing and planning takes place when the system is asked to make a move. When the AI makes a move it follows these steps:
If there is no data for the system to learn from, the AI picks a random choice and stores that choice in it's memory. CODE DESIGN The functions shown here are only those that contribute to the AI logic. All others are concerned with data structures and are not explained here. The public functions that are concerned with AI, and are presented first are:
There are two major variables that are used repeatedly in these functions:
Workings of the public Knowledge::stateUpdate function Summary The stateUpdate function is called to inform the AI that something has happened in the world it is interacting with. This could be an visible action, a movement, a location, or any information that the AI should base its understanding of the environment upon. Once this information arrives, the data is added to the list-memory Workings of the public Knowledge::effectRegistration function Summary The effectRegistration function is called to inform the AI that a state affecting the AI, such as a win or loose has been reached. If the effect registered is -2, this indicates a win, a -1 indicates a loss. At this point, this information is not considered at all - it is simply added to the list memory. Workings of the public Knowledge::move function Summary Calling the move function allows the AI to make a move, and it will return an action ID which is the action it wants to be executed.
Technique int
noClearAction = 0; int
response = -1; ostringstream
ss; Based
on the environmental data collected since the AI last won or lost (shortHistory), the best plan to reach a win state is
chosen. plan = pickBestFit(shortHistory); If
a plan wasn’t found, it is recorded as such, otherwise, the chosen plan is
compared to the current history, the next step is chosen from the plan, which
is then stored in the variable response. if
(plan.length() == 0) { noClearAction = 1; } else { response = nextAction(shortHistory, plan); } If
no action was found, or no good step was found in the plan, we pick a random
action. A potential addition here is the avoidance of plans that have failed
before. if
(noClearAction || response == -1) { response =
actions[(rand() % actionTotal)]; //response
= checkNotPartOfFailedStratergy(response); } The
response is replaced with a random choice every 3/100 times. This is done to
encourage the system to experiment with different ideas even when it has a
solution. Ideally, this kind of thinking should occur based on environmental
data alone, and if this project is extended, this will be one of the first
parts of the algorithm to be altered. if
((rand() % 100) < 4) // && certainty ==
LOW { response =
actions[(rand() % actionTotal)]; //response
= checkNotPartOfFailedStratergy(response); } A
record is kept of the chosen response, and the action code is returned to the
application using the AI. addItem(response); return
response; Workings of the Knowledge::pickBestFit function Summary The pickBestFit function takes the history since the last win or lose state (or game start), stored in shortHistory, and finds the best solution based on successful and failed plans already experienced. Note: Obviously this function could be much faster if it simply maintained the data already collected in tree form. Technique Received data shortHistory – the actions and
events occurring during the current game (since the last win or lose event). This
function begins by looping through the number of lines in the entire history
(all games ever played) in the vector array log. int scount = 0; for (int i
= 0; i < itemCount; i = i + 1) { Are
we analyzing a place in the log where the game was restarted? (-1 is the
restart/lose code – the human player has won) if
(log[i] == -1) { As
the game we are studying is a unsuccessful one (-1
indicates a loss) we score negatively against this and all strategies that we
have already discovered, that are similar to this one. noNewStrategy
= 0; Search
through all the strategies that we have found so far. for
(int j = 0; j < scount; j = j + 1) { If
we find a similar strategy, we subtract two[1]
from its score. This means that if we have seen a strategy win and then loose,
we make sure it is not used again. This is the adaptability functionality. The
logic here for finding a similar strategy is: (strategy under investigation) IS SUBSTRING (strategy in list) AND NOT((strategy
in list) IS SUBSTRING (strategy under
investigation)) if (similar(trackStrategy, workableStrategyDefns[j], 1, 0)) { workableStrategyScore[j] = workableStrategyScore[j] - 2; } We
reset our memory of the current strategy, as we don’t need to remember failed
strategies unless they (above) are failed versions of successful strategies. trackStrategy
= ""; } Have
we reached a win state in the current strategy? else if (log[i] == -2) { As
the strategy we have found is a successful one, we need to remember it.
However, if we have already encountered the winning strategy then encountering
it again increases that strategy’s value (as winning more than once is a great
achievement) for
(int j = 0; j < scount; j = j + 1) { If
we find a similar winning strategy, we add one to its score. The
logic here for finding a similar strategy is: (strategy under investigation) IS SUBSTRING (strategy in list) AND NOT((strategy
in list) IS SUBSTRING (strategy under
investigation)) if
(similar(trackStrategy, workableStrategyDefns[j], 1, 0)) { workableStrategyScore[j] = workableStrategyScore[j] + 1; } } The
new strategy is added to the list even if a similar strategy is found. The
reason for this is that the similar
function finds similar strategies – if this strategy is more efficient, it
should be kept even if it is similar to a winning strategy (it could be
better). workableStrategyDefns.push_back(trackStrategy); workableStrategyScore.push_back(1); scount
= scount + 1; We
clear our memory of the current strategy. trackStrategy
= ""; } else { If
the state is neither a win or loose, we simply add the data to the current
study of the strategy (strings are used as a legacy of the Perl version of this
software). trackStrategy.append(intToString(log[i])); trackStrategy.append("|"); } } Now
that all the winning strategies have been found and have been scored, we need
to sort them all. sortedStrategies = workableStrategyScore; sort(sortedStrategies,
sortedStrategiesIndex); (pseudocode) We
search through the sorted strategies until we find one that’s suitable, then we return it. int
score, index; for (int i
= 0; i < scount; i = i + 1) { score = sortedStrategies[i]; index = sortedStrategiesIndex[i]; Does
the strategy work? More specifically, does it map to our current shortHistory? If
so it’s usable, and we return it. if
(works(shortHistory, workableStrategyDefns[index])
&& score > 0) { return
workableStrategyDefns[index]; } } Otherwise
we return the strategy as empty. return
""; Workings of the Knowledge::similar function Summary The similar function takes two strategies stored in strings s1 and s2, and then compares how different they are from one another. Cost1 indicates the price of a difference in string s1, and cost2 indicates the price of a difference in string s2. Use of Function This function is useful when comparing strategies (stored as lists), especially if we throw in random elements. This allows a strategy with a slightly different step to be counted the same as another strategy that we have some history on. It means we can take partial knowledge and assume it's the same as something we already know. Technique Received
data s1
– list of actions s2
– list of actions cost1
– the cost of a move in string s1 cost2
– the cost of a move in string s2 First, a reject
difference is set: this is, the maximum difference between the two lists of
actions before we reject them outright as being different: // a third of the length seems to be
a good scale for difference int
maxdiff = items / 3; C++ code Then,
the function then enters a loop through the actions in list s1 and list s2. In each iteration of the loop the following logic is
executed: 1.
Is
the difference greater than the maximum acceptable difference? If so, the two
lists are not similar if (difference > maxdiff) { return
0; } C++ code 2.
Are
we at the end of either of the two lists and we have actually traversed
something? (one list could be empty - the other might have lots of items in it)
Then the two lists are similar. // are we at the end of either of the lists? if ((s1i >= items) || (s2i >= items2)) { if
((difference <= maxdiff) && !foundNothing) { // yes these two
are similar return
1; } // otherwise exit loop
(the strings are not similar) slide = 0; } C++ code 3.
Are
the two list items the same? Then the point of synchronization is remembered, and we shift along in both lists if
(s1L[s1i] == s2L[s2i]) { difference = difference +
0; lastSync1 = s1i; lastSync2 = s2i; indexer = 0; foundNothing
= 0; s2i = s2i + 1; s1i = s1i + 1; } C++ code 4.
Are
the two list items not the same? Then
we shift one space along list two. If we’re shifting more than the maximum
difference, or off the end of the list, we go back and start shifting along
list 1. This is to allow desynchronized lists: ![]() maxdiff = 6 / 3 = 2 Therefore,
these two lists are similar (D on list 1 was skipped – if the cost1 is 1) else { // we skip one on s2 s2i = s2i + 1; indexer = indexer +
1; // If we’re finding no match on our sliding scale, then we
backtrack and study comparison the other way. if
(indexer > maxdiff || indexer > items2) { s1i = lastSync1 + 1; s2i = lastSync2; // not strictly true (not last sync) but stops infinite loops lastSync1 = s1i; lastSync2 = s2i; // we allow substrings (up to cost maxdiff) if (!foundNothing) { difference
= difference + cost1; } } else { // we allow
substrings in if (!foundNothing) { difference
= difference + cost2; } } } C++ code Workings of the Knowledge::nextAction function Summary The nextAction function is provided with the history of actions in the current game, and the plan for the game. By comparing these two lists, it selects the next action to be performed. Use of Function This function is called from the move function to determine what step should be taken next. Technique Received
data shortHistory – the history of
actions in the current game plan – a tried and
tested plan for the current game Break apart the shortHistory and
the plan StringTokenizer shSTOK
= StringTokenizer(shortHistory, "|"); StringTokenizer pSTOK
= StringTokenizer(plan, "|"); vector<int> sh = vectorize(shSTOK); vector<int> p = vectorize(pSTOK); int
pval; int
pi = 0; int
shi = 0; int
lastPSync = 0; Go through the short
history and the plan at the same pace, until we reach a point on the plan which
hasn’t been executed. while
(shi < (int)sh.size() && pi < ((int)p.size() + 5)) { // this exists to allow us to
overrun a bit if
(pi < (int)p.size()) { pval
= p[pi]; } else { pval
= 0; } If something
matches on both the short history and the plan we move forward one space along
each one. We also remember that this was a point of synchronization. if
(sh[shi] == pval) { lastPSync
= pi; pi = pi + 1; shi = shi + 1; } else { If
something doesn’t match on either one, we skip forward on the plan. pi = pi + 1; If we have not
matched and we are already two spaces off from the existing plan, we sync back
to the plan. // we map back a bit if
((pi - shi) > 2) { pi = pi + 1; // not really a sync but good enough lastPSync
= pi; } } } The synchronization
synchronizes all elements, we are only interested in
the next action we can perform. Therefore, we move forward from the last sync
point and find the next action. for (int m = lastPSync;
m < (int)p.size(); m = m + 1) { Check
if the plan element is an action, if so, return it if
(isAction(p[m])) { return
p[m]; } } Otherwise we return
-1 which means there’s a problem. return
-1; >> Next [1] A bird in hand is worth two in the bush (Anon) |
