(Experiment) Designing a Generic Unsupervised Learning Component: Knowledge dll
4. Performance in the Machine Room Experimental Game
We first look at the entertainment value of this AI component, and then extend this to a more formal investigation of the capabilities of the AI.
Gameplay/Entertainment Value (observations of gamers)
The success of the Knowledge dll component cannot be fully quantified from gameplay alone, as this depends upon many other factors, most importantly the design of the game itself. The aspect of the AI that most appealed to the players was that the Knowledge dll came up with different winning strategies each time and did not resort to anything recognizable. It seemed a more human AI: it was more playful. In the room game the Knowledge dll is very effective because:
When controlling the machines in the room, random experimentation is sufficient to find a combination that will prevent the player from winning.
The more the game is played, the more skilled the software becomes.
As it is relatively easy to
When a player first attempts the game, they are not completely familiar with the objects in the room and therefore tend to be easily caught at the beginning. Once the player became more cautious, the AI had to reject strategies that had worked earlier; this extended the game's interest further, as the player learned how to outwit the traps controlled by the AI.
Sample Winning Strategies
These strategies were derived automatically by the Knowledge dll.
Sample Learning Process
This is a real example of the learning process using the "total plan" approach, where the location in the room is always reported to be the same.
In this example the location in the room was not factored in; doing so simply adds places where the strategy branches off depending upon the room location data.
Experiment
Process of Evaluation
To demonstrate the use of the Knowledge dll in game programming it is necessary to work with it in a game environment. It would be possible to provide fixed input data, or to use the console version to see how quickly the software can consistently win. However, this does not take into account the very slight variations in timing by a human player, which can enormously change an outcome in a computer game. For example, if a player can dodge a creature by one meter in a game environment, how can this be factored into a scientific evaluation of the learning capabilities? The only practical solution is repeated testing of the Knowledge dll in the Machine Room Game, played according to very strict rules of behavior. In addition, the visual environment allows greater insight into the thinking of the AI, and awareness of what possibilities exist for its improvement. The premise I base this on is that the challenge of practical application gives rise to more questions than a theoretical example typically presents.
Strict Rules of Human Play
Data Collected
The data that has been collected includes:
Potential Inferences
The numbers of wins and losses indicate the level of strategy of the system. If the number of losses is only three, the AI is less interesting to the player, because the behavior is simply random until a successful strategy is found. If the player learns to dodge one strategy, the system must adapt and restart experimentation. It is that adaptation that is entertaining and makes the 'computer' seem more human. In trials of the entertainment value of the game, the impression that the computer was as devious as the human player made the game an exciting challenge. Nevertheless, the aim of the AI is to reach three consecutive wins. Therefore, the lower the number of wins, the more effective the AI is.
It is also important to consider the number of steps in the winning strategy. If this number is very high, it indicates considerable redundancy, or that what we have is not a strategy but a brute force solution. A higher number of operations performed over the entire game cycle indicates a less efficient system.
Results of Experiment
195 attempts, using varying strategies, were performed on the Machine Room game, with the total number of operations counted at 2294.
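As a rough illustration of how these measures could be tallied, the sketch below finds the attempt at which the AI first completes three consecutive wins and derives the average number of operations per attempt from the totals reported above. The function name `attempts_until_streak` and the sample list of outcomes are hypothetical and are not taken from the Knowledge dll; only the figures 195 and 2294 come from the experiment.

```python
from typing import Optional, Sequence


def attempts_until_streak(ai_won: Sequence[bool], streak: int = 3) -> Optional[int]:
    """Return the 1-based attempt at which the AI first completes `streak`
    consecutive wins, or None if it never does."""
    run = 0
    for attempt, won in enumerate(ai_won, start=1):
        run = run + 1 if won else 0  # a loss resets the streak
        if run == streak:
            return attempt
    return None


# Hypothetical trial (True = the AI won the attempt): the player beats the
# second win, so the AI has to adapt before completing its streak.
sample_trial = [False, True, True, False, True, True, True]
print(attempts_until_streak(sample_trial))  # 7

# Totals reported in the text.
TOTAL_ATTEMPTS = 195
TOTAL_OPERATIONS = 2294
avg_ops_per_attempt = TOTAL_OPERATIONS / TOTAL_ATTEMPTS
print(round(avg_ops_per_attempt, 2))  # 11.76
```

Note that this average is per attempt; a trial that runs until three consecutive wins spans several attempts, so per-trial operation counts are correspondingly higher.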
It is visually clear that the number of trials corresponds to the number of operations performed. Ideally, the "Operations vs. Total Trials" graph should curve upwards as the AI adapts to the behavior of the user. This happens only a couple of times. The diamond point furthest to the right represents an attempt where the system failed to come up with a strategy until 18 attempts had passed, an outcome that suggests randomness rather than learning (and one that would be unsuitable in a much more complex system).
The data from the trials shows that the winning strategies were relatively efficient: the average winning strategy used fewer than 9 operations, which indicates that these are true strategies and not brute force methods. There are 10 possible operations; given that several of these are repeated (and therefore ignored), and that the events in a strategy must be synchronized in time (as the player is moving across the room), these strategies have meaning beyond randomness.
The Five Second Total plan technique works by allowing the Knowledge dll to submit an action every five seconds. To see how effective this is compared to brute force methods (where everything happens at once), the game was changed to request an action every two seconds and every three seconds. The results are shown below. The number of operations increases considerably when the time delay between operations is reduced. This is to be expected. But are the strategies any better?
It turns out that it takes longer for the player to be defeated using the two or three second delays, as the AI becomes its own worst enemy. Because it takes time for the partitions to open again once they are closed, the very frequent actions mean that many steps in the strategies are irrelevant. It also means that the stairs in the Machine Room are being moved most of the time, and they become a useful place for the player to hide, where the creatures cannot reach him/her. This leaves the AI throwing all its operations at the user without having any effect.
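The effect of a device that needs time to recover can be illustrated with a deliberately simplified sketch. It assumes a single device, a hypothetical recovery time of 4 seconds and a hypothetical 30 second round; none of these values come from the Machine Room game, and `count_wasted` is an invented helper. An action issued while the device is still recovering is counted as wasted.

```python
from typing import Tuple


def count_wasted(interval_s: int, duration_s: int, cooldown_s: int) -> Tuple[int, int]:
    """Issue one action on a single device every `interval_s` seconds for
    `duration_s` seconds. Return (actions issued, actions wasted), where an
    action is wasted if the device has not yet recovered from the last one."""
    issued = 0
    wasted = 0
    ready_at = 0  # time at which the device can respond again
    t = 0
    while t < duration_s:
        issued += 1
        if t < ready_at:
            wasted += 1  # device still recovering: the action has no effect
        else:
            ready_at = t + cooldown_s
        t += interval_s
    return issued, wasted


# Assumed values: 30 second round, 4 second recovery time.
for interval in (5, 3, 2):
    print(interval, count_wasted(interval, duration_s=30, cooldown_s=4))
# 5 (6, 0)
# 3 (10, 5)
# 2 (15, 7)
```

Under these assumptions, roughly half of the actions issued at the two and three second intervals have no effect, while none are wasted at five seconds. This is in line with the observation that frequent actions fill the strategies with irrelevant steps.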
The situation is even worse if you look at the number of operations in the winning strategies:
When the AI is allowed to run an operation every two seconds it expends a very wasteful number of operations, which makes it impractical for real world problems and not entertaining as part of a video game. This implies that one of the most important aspects of good AI is its ability to choose wisely the timescale relative to its task.
Event State Reaction allows the Knowledge dll to operate two machines every time the player enters a new region of the room. This means that there has to be some relation between where the user is located and exactly which devices are controlled. Event State Reaction produces a very scattered graph of the number of operations vs. the total trials. This is because Event State Reaction is better at coming up with strategies that work. This probably reflects this mode's ability to run more operations rather than an ability to think for itself. This is shown by the lack of adaptation involved: there are very few strategies that were successfully beaten by the player (as opposed to several, as occurs with the five second total plan). The fact that there is little or no adaptation means that this method, in this particular implementation, is less entertaining for the player. In addition, as the environment is less chaotic, the true strength of the Knowledge dll (adapting, and taking successful strategies and extending them) can never be used. This means that the AI seems far less human and, consequently, less fun to play against.
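A minimal sketch of the Event State Reaction idea, under stated assumptions, might look as follows. The class `EventStateController`, its method names, the region names and the operation names are all hypothetical; the text only states that two machines are operated when the player enters a new region and that there are 10 possible operations. Remembering one pair of operations per region is an assumption about how location and devices could be related, not a description of the Knowledge dll internals.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical operation names; the text only says there are 10 operations.
OPERATIONS = [f"operation_{i}" for i in range(10)]


class EventStateController:
    """Assumed sketch: run two operations whenever the player enters a new region."""

    def __init__(self, rng: random.Random) -> None:
        self.rng = rng
        self.plan: Dict[str, Tuple[str, str]] = {}  # region -> pair of operations
        self.last_region: Optional[str] = None

    def on_player_region(self, region: str) -> List[str]:
        """Return the operations to run now (empty if the region is unchanged)."""
        if region == self.last_region:
            return []  # no new event, so nothing is triggered
        self.last_region = region
        if region not in self.plan:
            # Random experimentation: pick two distinct operations for this region.
            first, second = self.rng.sample(OPERATIONS, 2)
            self.plan[region] = (first, second)
        return list(self.plan[region])

    def on_player_won(self) -> None:
        """The current plan failed, so discard it and experiment again."""
        self.plan.clear()
        self.last_region = None


controller = EventStateController(random.Random(0))
for region in ["door", "door", "centre", "door"]:
    print(region, controller.on_player_region(region))
```

Because operations are only issued when the region changes, the total number of operations is bounded by how often the player moves between regions, not by the passage of time.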
There is one benefit to event state reaction. As we are very conservative in terms of the number of operations used, the winning strategies are much, much shorter than those with the five second total plan. They usually have few or no redundant operations. Compare 3.6, the average number of steps in the event triggered approach, to 8.9, the average number of steps in the five second total plan approach. This, together with the lower number of operations (41.2 average vs. 73.3 average), indicates that, despite making a less entertaining game, learning from triggered randomness yields a far more efficient response than being allowed to move after a fixed period of time.
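To make the size of the difference concrete, the ratios between the two approaches can be computed directly from the averages quoted above. The variable names are illustrative only.

```python
# Averages quoted in the text.
steps_total_plan, steps_event = 8.9, 3.6   # steps in the winning strategy
ops_total_plan, ops_event = 73.3, 41.2     # operations performed

step_ratio = steps_total_plan / steps_event
ops_ratio = ops_total_plan / ops_event

print(f"{step_ratio:.2f}")  # 2.47
print(f"{ops_ratio:.2f}")   # 1.78
```

On these figures, the five second total plan needs roughly two and a half times as many steps in its winning strategies and a little under twice as many operations overall.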
The best solution would be to combine triggered responses and location information with a system that is allowed to move when it wants to. Setting the triggering aside, if location information is combined with a frequency based response (the five second total plan), the Machine Room game becomes much more interesting. Whether the player walks slowly, walks quickly, or runs, the AI learns a different strategy to deal with that particular situation. This has the huge advantage that if a human behaves differently, formerly successful strategies are not forgotten. For example, if someone can beat one strategy by moving around the room in a slightly different way, the AI remembers the formerly successful strategy and is ready to use it again if necessary. The ability to store multiple strategies makes this mode seem the most intelligent.
Experimental Conclusions
Learning how to solve any problem, and in this case learning to defeat the player in the Machine Room problem, benefits from more information about the environment. In this example, the system was mostly blind as to what was really happening in the room, but this relates back to Plato's cave conundrum: are we really interfacing with the world, or are we interfacing with an interface? Nonetheless, because the Knowledge dll is blind when it deals with the room, it is unable to generalize its knowledge to solve other problems. In order to become more intelligent the AI must be able to recognize elements (e.g. partitions in the game) as being the same as other elements, and therefore be able to guess, to some extent, what is going to happen when they are used. It is true that the unsupervised learning used in the Machine Room game is what makes it so entertaining: the appeal lies in the AI's incompetence at the beginning and in watching it become more educated. If the game knew that the monster was a monster, it would know to release it, and that would make the game less fun.
If any learning systems are used for entertainment value in a game, whether they are to fight or to communicate with, it is important that they learn how things work from a very basic level. The AI systems must be ignorant of the very basic facts of life (e.g. gravity) in order to create an entertaining separation that makes them seem like a true life form.
