
>> (Experiment) Designing a Generic Unsupervised Learning Component: Knowledge dll

 


>> 4. Performance

IN THE MACHINE ROOM EXPERIMENTAL GAME

We first look at the entertainment value of this AI component, and then we extend this to a more formal investigation as to the capabilities of the AI.


Gameplay/Entertainment Value (observations of gamers)

It is not entirely possible to quantify the success of the Knowledge dll component based on gameplay, as this depends upon many other factors, most importantly the design of the game itself.
However, in tests with gamers aged 14-19, the entertainment value of continuously trying to outwit the computer was enough to keep each person playing for about 20 minutes - a considerable success for such a simple game. The game was less appealing to women, who were more excited by the fear caused by the darkness and sound effects of the sample game environment.

The aspect of the AI that most appealed to players was how the Knowledge dll came up with different winning strategies each time and did not fall back on anything recognizable. It seemed a more human AI - it was more playful.

In the room game the Knowledge dll is very effective because:

  • There are not very many choices of actions to take, and it is simple to find winning combinations
  • The player is unpredictable, therefore the ability to reject failed strategies and then to adapt becomes very useful

When controlling the machines in the room, the random experimentation is sufficient to find a combination that will prevent the player from winning. The more that the game is played, the more skilled the software becomes. As it is relatively easy to

When players first attempt the game, they are not yet familiar with the objects in the room and therefore tend to be caught easily. Once they become more cautious, the AI has to reject strategies that worked at the beginning; this extended the game's interest further, as the player learned how to outwit the traps controlled by the AI.


Sample Winning Strategies

These strategies were derived automatically by the Knowledge dll.

A simple winning strategy:

1002 (first partition)
1007 (first cage)


Closing the first partition and opening the first cage traps the user with the released creature.

Another strategy (more foolproof):

1004 (stairs)
1002 (middle partition)
1006 (middle cage)
1005 (first cage)
1009 (no operation)
1007 (third cage)


The first move here, the stairs, has no effect on the player - the Knowledge dll is unaware of this, but doesn't care, because this strategy works anyway.


Sample Learning Process

This is a real example of the learning process using the "total plan" approach, where the location in the room is always reported to be the same.

On the first attempt the AI tried the following actions (one every 5 seconds):
1006, 1004, 1005, 1009
And the game responded:
-1 (user win)

On the second attempt the AI is luckier:
1002, 1007, 1009, 1001
And the game responded:
-2 (user dead)

So it tries it again:
1002, 1007, 1009, 1001
But nothing happens, so it starts to pick randomly:
1004, 1005, 1002, 1009, 1006, 1002, 1008, 1003
And the game responded:
-2 (user dead)

Now, the user goes again, and the AI tries
1002, 1007, 1009, 1001, 1004, 1005, 1002, 1009, 1006
And the game responded:
-1 (user win)
This time the user has won against a strategy that had worked before. It is now unlikely that this strategy will be used again. The first four actions *may* be reused, as they recur in both of the previous successful attempts.

In this example the location in the room was not factored in - doing so simply adds points where the strategy branches depending upon the room location data.
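
The learning process above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the Knowledge dll's actual code: the class `TotalPlanLearner`, its method names, and the rule of dropping a stored strategy as soon as the player beats it are all hypothetical. Only the operation codes and the result codes -1 (user win) and -2 (user dead) come from the example.

```python
import random

USER_WIN, USER_DEAD = -1, -2           # result codes from the example above
OPERATIONS = list(range(1001, 1010))   # 1001..1009, the codes seen in the example


class TotalPlanLearner:
    """Hypothetical sketch: keep sequences that beat the player, drop beaten ones."""

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.successful = []   # action sequences that ended with USER_DEAD

    def next_action(self, played_so_far):
        """Continue a remembered strategy while it matches, else pick at random."""
        n = len(played_so_far)
        for strategy in self.successful:
            if len(strategy) > n and strategy[:n] == played_so_far:
                return strategy[n]
        return self.rng.choice(OPERATIONS)

    def record(self, played, result):
        """Store a winning sequence, or reject the strategy that was being followed."""
        if result == USER_DEAD:
            if played not in self.successful:
                self.successful.append(list(played))
        elif result == USER_WIN:
            n = len(played)
            # Assumption: a strategy is dropped when the player wins while it is in use.
            self.successful = [s for s in self.successful if s[:n] != played]


learner = TotalPlanLearner(random.Random(0))
learner.record([1006, 1004, 1005, 1009], USER_WIN)    # first attempt: nothing learned
learner.record([1002, 1007, 1009, 1001], USER_DEAD)   # second attempt: remembered
first_move = learner.next_action([])                  # replays the strategy: 1002
```

Replaying the rest of the trace with this sketch keeps the four-action sequence and drops the longer one once the user beats it, which is one simple way to obtain the behavior described above.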


Experiment

Process of Evaluation

To demonstrate the use of the Knowledge dll in game programming it is necessary to work with it in a game environment. It would be possible to provide fixed input data, or to use the console version to see how quickly the software can consistently win. However, this does not take into account the very slight variations in a human player's timing, which can enormously change an outcome in a computer game. For example, if a player can dodge a creature by one meter in a game environment, how can this be factored into a scientific evaluation of the learning capabilities?

The only practical solution is the repeated testing of the Knowledge dll in the Machine Room Game, played to very strict behaviors. In addition, the visual environment allows for a greater insight into the thinking of the AI, and awareness of what possibilities exist for its improvement.

The premise I base this on is that the challenge of practical application gives rise to more questions than a theoretical example typically presents.

Strict Rules of Human Play

  • The evaluation proceeds until the player loses three times in a row. At this point the game is considered to be 'mostly unbeatable'. Three consecutive losses indicate that the system has found a strategy that allows it to win the game consistently. The AI's wins must result from the success of the strategy, not from the player's bad luck.
  • Play must be as aggressive as possible - the creatures must be dodged as well as possible, by letting them follow across the space and then moving around the edge of the rooms. If the player is repeatedly caught on the stairs while the bridge is unaffected, the bridge must be tried.
  • Always advance across the room, never hold back (it doesn't help anyway)
  • Always choose to cross the bridge rather than take the steps
  • Always climb steps to avoid creatures if the bridge is dangerous or down

Data Collected

The data that has been collected includes:

  • The number of trials: the total number of wins and losses before the 'mostly unbeatable' state occurs
  • The number of wins until the 'mostly unbeatable' state occurs
  • The number of losses until the 'mostly unbeatable' state occurs
  • The number of steps in the winning strategy
    • Of the three consecutive losses that lead to the 'mostly unbeatable' state, only the last two are considered, and the count from the strategy with the fewest steps is used. This is because the first time the strategy is seen the user is taken by surprise, and the number of steps is likely to be incorrect.
  • The total number of operations (machine movements) run in the level, to reach the 'mostly unbeatable' state.
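
The stopping rule and the collected figures can be expressed compactly. The sketch below is an illustration only: the function name `summarise`, the `(outcome, operations)` record format, and the sample data are invented for this example and are not taken from the experiment. Outcomes are from the player's point of view, as in the lists above.

```python
def summarise(trials):
    """Summarise one attempt.

    trials: list of (outcome, operations) in play order, where outcome is
    "win" or "loss" for the player and operations is the number of machine
    movements run in that trial.
    Returns None if three consecutive losses never occur.
    """
    streak = 0
    for index, (outcome, _) in enumerate(trials):
        streak = streak + 1 if outcome == "loss" else 0
        if streak == 3:                      # 'mostly unbeatable' reached
            played = trials[: index + 1]
            break
    else:
        return None

    wins = sum(1 for outcome, _ in played if outcome == "win")
    return {
        "trials": len(played),
        "wins": wins,
        "losses": len(played) - wins,
        # Shorter of the last two losing trials; the first of the three is skipped.
        "winning_strategy_steps": min(ops for _, ops in played[-2:]),
        "operations": sum(ops for _, ops in played),
    }


# Invented sample data, for illustration only.
sample = [("win", 6), ("loss", 9), ("win", 7), ("loss", 10), ("loss", 8), ("loss", 8)]
summary = summarise(sample)
# summary == {"trials": 6, "wins": 2, "losses": 4,
#             "winning_strategy_steps": 8, "operations": 48}
```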

Potential Inferences

The numbers of wins and losses indicate how strategic the system is. If the number of losses is only three, the AI is less interesting to the player, because the behavior is simply random until a successful strategy is found. If the player learns to dodge one strategy, the system must adapt and restart experimentation. It is that adaptation that is entertaining and makes the 'computer' seem more human. In trials of the entertainment value of the game, the impression that the computer was as devious as the human player made the game an exciting challenge.

Nevertheless, the aim of the AI is to defeat the player three times in a row. Therefore, the lower the number of player wins, the more effective the AI is.

It is important to consider the number of steps in the winning strategy. If this number is very high, it indicates that there is considerable redundancy, or that what we have is not a strategy, but a brute force solution.

A higher number of operations performed over the entire game cycle indicates a less efficient system.


Results of Experiment

195 attempts were performed on the Machine Room game using varying strategies, with the total number of operations counted at 2294.

Five Second Total Plan (no location information, random ideas)
In this state the software is only guessing sequences of events that will lead to a win-state.
Every five seconds the Knowledge dll is permitted to make a move in the game.
No feedback is recorded except a win or lose state.

It is visually clear that the number of trials corresponds to the number of operations performed. Ideally the "Operations vs. Total Trials" plot should curve upwards, as the AI adapts towards the behavior of the user. This happens only a couple of times. The diamond point furthest to the right represents an attempt where the system failed to come up with a strategy until 18 attempts had passed - an outcome that suggests randomness and not learning (and would be unsuitable in a much more complex system).

[Graph 1: Operations vs. Total Trials, Five Second Total Plan]

The data from the trials shows that the winning strategies were relatively efficient: the average winning strategy is under 9 operations, which indicates that the strategies are true strategies and not brute force methods. There are 10 possible operations; assuming several of these are repeated (and therefore ignored), and considering that the events in a strategy must be synchronized with the player's movement across the room, these strategies have meaning beyond randomness.

 

             Total Trials   Winning Strategy Steps   Operations   Wins   Losses   Notes
Attempt 1    6              8                        47           3      3        trapped in pit
Attempt 2    5              8                        36           1      4        closes first partition
Attempt 3    16             13                       120          11     5        trapped on stairs
Attempt 4    3              10                       38           0      3        trapped in pit
Attempt 5    4              7                        37           1      3        closes first partition, opens two cages
Attempt 6    10             6                        111          5      7        eventually trapped on stairs
Attempt 7    7              11                       59           4      3
Attempt 8    5              6                        41           2      3
Attempt 9    22             9                        214          18     3        two partitions and the pit
Attempt 10   3              11                       30           0      3        second partition
Total        81             89                       733          45     37
Average      8.1            8.9                      73.3         4.5    3.7

 

The Five Second Total plan technique works by allowing the Knowledge dll to submit an action every five seconds. To see how effective this is compared to brute force methods (where everything happens at once) the game was changed to request an action every two and every three seconds. The results are shown below.

It is clearly noticeable that the number of operations has increased considerably when the time delay between each operation is reduced. This is to be expected. But are the strategies any better?

[Graph 2: results for the two and three second delays]

It turns out that it takes longer for the player to be defeated using the two or three second delays, as the AI becomes its own worst enemy. As it takes time for the partitions to open again once they are closed, the very frequent actions mean that a lot of steps in the strategies are irrelevant. It also means that the stairs in the Machine Room are moving most of the time - and they become a useful place for the player to hide, where the creatures cannot reach him or her. This leaves the AI throwing every operation at the user without having any effect.
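
The effect of a machine still being busy when the next operation arrives can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the function `wasted_operations`, the six second `busy_seconds` value, and the sample sequence are hypothetical and were not measured in the Machine Room game.

```python
def wasted_operations(sequence, delay_seconds, busy_seconds):
    """Count operations sent to a machine that is still finishing its last move."""
    free_at = {}   # machine code -> time at which it can act again
    wasted = 0
    for step, machine in enumerate(sequence, start=1):
        now = step * delay_seconds            # one operation per tick
        if now < free_at.get(machine, 0):
            wasted += 1                       # machine busy: this step has no effect
        else:
            free_at[machine] = now + busy_seconds
    return wasted


# Hypothetical sequence alternating between a partition and a cage.
plan = [1002, 1002, 1007, 1002, 1007, 1002]
slow = wasted_operations(plan, delay_seconds=5, busy_seconds=6)   # 1 wasted step
fast = wasted_operations(plan, delay_seconds=2, busy_seconds=6)   # 3 wasted steps
```

With the same sequence, the shorter delay wastes more steps, which matches the observation that frequent actions fill a strategy with irrelevant operations.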

Delay        Average Operations per Attempt
5 seconds    73.3
3 seconds    138
2 seconds    146

The situation is even worse if you look at the number of operations in the winning strategies:

Delay        Average Number of Steps in Strategy that Leads to 'mostly unbeatable'
5 seconds    8.9
3 seconds    13
2 seconds    17.6

When the AI is allowed to run an operation every two seconds it wastes a great many operations, which makes it impractical for real world problems and not entertaining as part of a video game. This implies that one of the most important aspects of good AI is its ability to choose wisely the timescale relative to its task.

Event State Reaction
Event State Reaction allows the Knowledge dll to operate two machines every time the player enters a new region of the room. This means that there has to be some relation between where the user is located and exactly which devices are controlled.
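
A rough sketch of this event triggered mode is given below. It is not the Knowledge dll's implementation: the class `EventStateReactor`, the region names, and the rule for keeping or discarding choices at the end of a trial are assumptions made for illustration. Only the idea of operating two machines per region entry comes from the description above.

```python
import random

MACHINES = list(range(1001, 1010))   # operation codes as used earlier in this section


class EventStateReactor:
    """Hypothetical sketch: pick two machines whenever the player enters a region."""

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.plan = {}      # region -> pair of machines that has worked before
        self.current = {}   # choices made during the trial in progress

    def on_region_entered(self, region):
        """Reuse the remembered pair for this region, else try two random machines."""
        pair = self.plan.get(region)
        if pair is None:
            pair = tuple(self.rng.sample(MACHINES, 2))
        self.current[region] = pair
        return pair

    def on_trial_end(self, player_lost):
        """Keep this trial's choices if they beat the player, otherwise forget them."""
        if player_lost:
            self.plan.update(self.current)
        else:
            for region in self.current:
                self.plan.pop(region, None)
        self.current = {}


reactor = EventStateReactor(random.Random(1))
pair = reactor.on_region_entered("entrance")   # "entrance" is an invented region name
reactor.on_trial_end(player_lost=True)
repeat = reactor.on_region_entered("entrance") # same pair as before
```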

Event State Reaction produces a very scattered graph of the number of operations vs. the total trials. This is because Event State Reaction is cleverer at coming up with strategies that work. This is probably due more to this mode's ability to run operations than to any ability to think for itself, as shown by the lack of adaptation involved - very few strategies were successfully beaten by the player (as opposed to several with the five second total plan).

The fact that there is little or no adaptation means that this method, in this particular implementation, is less entertaining for the player. In addition, as the environment is less chaotic, the true strengths of the Knowledge dll - adaptation, and taking successful strategies and extending them - can never be used. This means that the AI seems far less human and, consequently, less fun to play against.
For truly intelligent computer players in video games, it is necessary for them to think in real-time (if the game is set in real time), rather than being event triggered.

[Graph 3: Operations vs. Total Trials, Event State Reaction]

There is one benefit to Event State Reaction. As it is very conservative in the number of operations used, the winning strategies are much shorter than those of the five second total plan. They usually have few or no redundant operations. Compare 3.6, the average number of steps in the event triggered approach, to 8.9, the average number of steps in the five second total plan approach.

This, as well as the low number of operations (41.2 average vs. 73.3 average) indicates that despite making a less entertaining game, learning from triggered randomness yields a far more efficient response than from being allowed to move after a fixed period of time.
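
The averages quoted in this comparison can be recomputed from the per-attempt values in the two results tables of this section. The short script below does only that arithmetic; the variable names are arbitrary.

```python
# Per-attempt values copied from the two results tables in this section.
FIVE_SECOND = {
    "steps":      [8, 8, 13, 10, 7, 6, 11, 6, 9, 11],
    "operations": [47, 36, 120, 38, 37, 111, 59, 41, 214, 30],
}
EVENT_STATE = {
    "steps":      [4, 2, 2, 4, 4, 4, 4, 6, 4, 2],
    "operations": [48, 24, 6, 54, 44, 42, 60, 68, 28, 38],
}


def mean(values):
    return sum(values) / len(values)


for column in ("steps", "operations"):
    timed = mean(FIVE_SECOND[column])
    triggered = mean(EVENT_STATE[column])
    print(f"{column}: {timed:.1f} vs {triggered:.1f} "
          f"(event state is {triggered / timed:.0%} of the timed figure)")
```

On these figures the event triggered approach needs roughly 40% of the strategy steps and roughly 56% of the operations of the five second total plan.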

 

             Total Trials   Winning Strategy Steps   Operations   Wins   Losses   Notes
Attempt 1    10             4                        48           7      3        first partition, first cage
Attempt 2    7              2                        24           4      3        first partition, second cage
Attempt 3    3              2                        6            0      3        first partition, second cage
Attempt 4    5              4                        54           3      3        first partition, second cage
Attempt 5    6              4                        44           2      4        second partition, first cage
Attempt 6    7              4                        42           4      3        chases onto bridge
Attempt 7    10             4                        60           6      4        partition 1 and 2, cage 2
Attempt 8    9              6                        68           6      3        caught on stairs
Attempt 9    4              4                        28           1      3
Attempt 10   7              2                        38           4      3        trapped stairs/first partition
Total        68             36                       412          37     32
Average      6.8            3.6                      41.2         3.7    3.2

 

The best solution would be to combine triggered responses and location information with a system that is allowed to move when it wants to. Leaving the triggering aside, if location information is combined with a frequency based response (five second total plan) the Machine Room game becomes much more interesting. Whether the player walks slowly, walks quickly, or runs, the AI learns a different strategy to deal with that particular situation. This has the huge advantage that if a human behaves differently, formerly successful strategies are not forgotten. For example, if someone can beat one strategy by moving around the room in a slightly different way, the AI remembers the formerly successful strategy and is ready to use it again if necessary. The ability to store multiple strategies makes this mode seem the most intelligent.
The only drawback to this approach is that it means the learning time is increased considerably, to deal with all kinds of branches of scenarios.
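
One way to picture the storage of multiple strategies is a memory keyed by the situation the AI observes. The sketch below is purely illustrative: the class `StrategyMemory`, the context labels such as "walking" and "running", and the choice to demote rather than delete a beaten strategy are assumptions, not details of the Knowledge dll.

```python
from collections import defaultdict


class StrategyMemory:
    """Hypothetical store of strategies per observed context, best first."""

    def __init__(self):
        self._by_context = defaultdict(list)

    def remember(self, context, strategy):
        """Put a strategy that just worked at the front of its context."""
        strategy = tuple(strategy)
        strategies = self._by_context[context]
        if strategy in strategies:
            strategies.remove(strategy)
        strategies.insert(0, strategy)

    def demote(self, context, strategy):
        """Move a beaten strategy to the back instead of forgetting it."""
        strategy = tuple(strategy)
        strategies = self._by_context[context]
        if strategy in strategies:
            strategies.remove(strategy)
            strategies.append(strategy)

    def best(self, context):
        """Return the preferred strategy for a context, or None if none is known."""
        strategies = self._by_context.get(context)
        return strategies[0] if strategies else None


memory = StrategyMemory()
memory.remember("walking", [1002, 1007])
memory.remember("walking", [1002, 1006])
memory.demote("walking", [1002, 1006])    # the player found a way around it
fallback = memory.best("walking")         # the earlier strategy is used again
```

Each extra context multiplies the number of strategies to be learned, which is the increase in learning time noted above.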


Experimental Conclusions

Learning how to solve any problem - in this case, learning to defeat the player in the Machine Room - benefits from more information about the environment. In this example, the system was mostly blind as to what was really happening in the room - but this relates back to Plato's cave conundrum - are we really interfacing with the world, or are we interfacing with an interface?

Nonetheless, because the Knowledge dll is blind when it deals with the room, it is unable to generalize its knowledge to solve other problems. In order to become more intelligent the AI must be able to recognize elements (e.g. partitions in the game) as being the same as other elements, and therefore be able to guess to some extent what will happen when they are used.

It is true that the unsupervised learning used in the Machine Room game is what makes it so entertaining - it is the AI's incompetence at the beginning, and watching it become more educated. If the game knew that the monster was a monster, it would know to release it, and that would make the game less fun. If any learning systems are used for entertainment value in a game, whether they are to fight or to communicate with, it is important that they learn how things work from a very basic level. The AI systems must be ignorant of the very basic facts of life (e.g. gravity) to create an entertaining separation that makes them seem like a true life form.



This work is licensed under a Creative Commons License.