
Build a Game AI - Machine Learning for Hackers

Yes, I beat it. Did that impress you? What if I built an AI to beat it for me, would that impress you? Hello World, welcome to Sirajology! In this episode we're going to build an AI to beat a bunch of Atari games.

Games have a long history as a testbed for AI, ever since the days of Pong. Traditionally, game programmers have taken a reductionist approach to building AI: they reduce the simulated world to a model and have the AI act on prior knowledge of that model. And it worked out, for the most part. I guess. Not really. But what if we want to build an AI that can be used in several different types of game worlds? All the world models are different, so we couldn't feed it just one world model. Instead of modeling the world, we need to model the mind. We want to create an AI that can become a pro at any game we throw at it.

So in thinking about this problem, we have to ask ourselves: what is the dopest way to do this? Well, the London-based startup DeepMind already did this in 2015. DeepMind's goal is to create artificial general intelligence, that's one algorithm that can solve any problem with human-level thinking or greater. They reached an important milestone by creating an algorithm that was able to master 49 different Atari games with no game-specific hyperparameter tuning whatsoever. Google snapped them up like yooooooooo. The algorithm is called the Deep Q Learner, and it was recently made open source on GitHub. It takes only two inputs: the raw pixels of the game and the game score. That's it. Based on just that, it has to complete its objective: maximize the score. Let's dive into how this works, since we'll want to recreate their results.

First, it uses a deep convolutional neural network to interpret the pixels. This is a type of neural network inspired by how our visual cortex operates, and it expects images as inputs. Images are high-dimensional data, so we need to reduce the number of connections each neuron has to avoid overfitting. Overfitting, by the way, is when your model is too complex: it has too many parameters, so it's overly tuned to the data you've given it and won't generalize well to any new dataset. So unlike a regular neural network, a convolutional network's layers are stacked in three dimensions, and this makes it easy to connect each neuron ONLY to neurons in its local region instead of to every single other neuron. Each layer acts as a detection filter for the presence of specific features in an image, and the layers get increasingly abstract in their feature representation. So the first layer could detect a simple feature like edges, the next layer would use those edges to detect simple shapes, and the next one would use those shapes to detect something even more complex, like Kanye. These hierarchical layers of abstraction are what neural nets do really well.
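To make that concrete, here's a minimal sketch of a DQN-style convolutional network using TensorFlow's Keras API. The layer sizes loosely follow DeepMind's published Atari architecture, but treat the exact numbers and names here as illustrative assumptions rather than the repo's actual code.

```python
import tensorflow as tf

# A minimal sketch of a DQN-style convolutional network (layer sizes are
# illustrative assumptions, loosely following DeepMind's Atari paper).
# Input: a stack of 4 preprocessed 84x84 grayscale frames.
# Output: one estimated Q-value per possible action.
def build_q_network(num_actions):
    return tf.keras.Sequential([
        # Each convolutional layer connects a neuron only to a small local
        # patch of the previous layer, which keeps parameters in check.
        tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu",
                               input_shape=(84, 84, 4)),
        tf.keras.layers.Conv2D(64, 4, strides=2, activation="relu"),
        tf.keras.layers.Conv2D(64, 3, strides=1, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation="relu"),
        # One output per action: higher value = more expected future score.
        tf.keras.layers.Dense(num_actions),
    ])

model = build_q_network(num_actions=6)  # Space Invaders has 6 actions
model.summary()
```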
So once it has interpreted the pixels, the AI needs to act on that knowledge in some way. In a previous episode we talked about supervised and unsupervised learning. But wait (there is another, and his name is John Cena): it's called reinforcement learning. Reinforcement learning is all about trial and error. It's about teaching an AI to select actions that maximize future rewards. It's similar to how you would train a dog: if the dog fetches the ball, you give it a treat; if it doesn't, you withhold the treat. So while the game is running, at each time step the AI executes an action based on what it observes, and it may or may not receive a reward. If it does receive a reward, we adjust our weights so that the AI will be likely to take a similar action in the future.

Q-learning is the type of reinforcement learning that learns the optimal action-selection behavior, or policy, for the AI without having a prior model of the environment. So based on the current game state, like an enemy spaceship being in shooting distance, the AI will eventually know to take the action of shooting it. This mapping of state to action is its policy, and it gets better and better with training (there's a bare-bones sketch of the update rule at the end of this section). Deep Q also uses something called experience replay, which means the AI also learns from a dataset of its past experiences. This is inspired by how our hippocampus works: it replays past experiences during rest periods, like when we sleep.

So we're going to build our game bot in just 10 lines of Python using a combination of TensorFlow and Gym. TensorFlow is Google's ML library, which we'll use to create the convolutional neural net, and Gym is OpenAI's ML library, which we'll use to create our reinforcement learning algorithm and set up our environment. Oh, if you haven't heard, OpenAI is a non-profit AI research lab focused on creating AGI in an open-source way. They've got a billion bucks pledged from people like Elon Musk, so yeah. Elon Musk.

Let's start off by importing our dependencies. Environment is our helper class that will initialize our game environment. In our case this will be Space Invaders, but we can easily switch that out for a whole host of different environments. Gym is very modular; OpenAI wants it to be a gym for AI agents to train in and get better. You can submit your algorithm to their site for an evaluation, and they'll 'score' it against a set of metrics server-side. The more generalized the algorithm, the better, and everybody's attempts can be viewed online, so it makes sharing and collaborating a whole lot easier. I approve. We'll also want to import our deep Q network helper class, to observe the game, and our training class, to initialize the reinforcement learning.

Once we've imported our dependencies, we can go ahead and initialize our environment, setting the parameter to Space Invaders, and then initialize our agent using our DQN helper class with the environment and environment type as the parameters. Once we have that, we can start training by running the trainer class with the agent as the parameter. First, this will populate our initial replay memory with 50,000 plays so we have a little experience to train with. Then it will initialize our convolutional neural network to start reading in pixels, and our Q-learning algorithm to start updating our agent's decisions based on the pixels it receives.

This is an implementation of the classic "agent-environment loop". Each timestep, the agent chooses an action, and the environment returns an observation and a reward. The observation is raw pixel data, which we can feed into our convolutional network, and the reward is a number we can use to improve our next actions. Gym neatly returns these values to us via the step function, which we've wrapped in the Environment helper class (a bare-bones version of this loop is sketched below). During training, our algorithm will periodically save the weights to a file in the models directory, so we'll always have at least a partially trained model.

Expect it to take a few days to fully train this to human level. Once we've started training, we can start the game with the play function of our agent object. We can go ahead and run this in the terminal, and the Space Invaders window should pop up; we'll start seeing the AI attempt to play the game. Here's roughly what the whole script looks like.
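The module and class names below are assumptions reconstructed from the walkthrough above (the repo's actual identifiers may differ), so read this as a sketch of the script's shape rather than its exact code.

```python
# A condensed sketch of the bot script. The helper modules and their
# APIs (environment, dqn, trainer) are assumptions based on the
# description above, not the repo's exact code.
from environment import Environment  # hypothetical Gym-wrapping helper
from dqn import DQN                  # hypothetical deep Q-network agent
from trainer import Trainer          # hypothetical training loop

env = Environment('SpaceInvaders-v0')   # swap in any Atari environment here
agent = DQN(env, env_type='atari')      # conv net + Q-learning under the hood
trainer = Trainer(agent)                # fills replay memory, then trains
trainer.train()                         # periodically saves weights to models/
agent.play()                            # watch the AI attempt the game
```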
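And here's the underlying agent-environment loop using nothing but Gym's public API (as it worked around the time of this video), with a random agent standing in for the DQN. Swap the random action for the agent's choice and you have the real bot.

```python
import gym

# The classic agent-environment loop with a random stand-in agent.
# Uses the original gym API, where step() returns four values.
env = gym.make('SpaceInvaders-v0')
for episode in range(3):
    observation = env.reset()       # raw pixels, shape (210, 160, 3)
    done, score = False, 0
    while not done:
        env.render()                          # pop up the game window
        action = env.action_space.sample()    # replace with the DQN's choice
        observation, reward, done, info = env.step(action)
        score += reward                       # the game score is the reward signal
    print('episode', episode, 'score', score)
env.close()
```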
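Finally, the update rule mentioned earlier. This is a deliberately tiny, framework-free sketch of tabular Q-learning with an experience-replay buffer; the hyperparameter values and function names are illustrative assumptions, and the real Deep Q Learner replaces the lookup table with the convolutional network from before.

```python
import random
from collections import defaultdict, deque

ALPHA = 0.1     # learning rate (illustrative assumption)
GAMMA = 0.99    # discount factor: how much we value future rewards

Q = defaultdict(float)               # Q[(state, action)] -> expected score
replay_memory = deque(maxlen=50000)  # experience replay buffer

def remember(state, action, reward, next_state):
    """Store one transition so it can be replayed later, like the
    hippocampus replaying experiences during rest."""
    replay_memory.append((state, action, reward, next_state))

def replay(actions, batch_size=32):
    """Sample past transitions and apply the Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    batch = random.sample(replay_memory, min(batch_size, len(replay_memory)))
    for state, action, reward, next_state in batch:
        best_next = max(Q[(next_state, a)] for a in actions)
        target = reward + GAMMA * best_next
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```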
When you run it, the AI will be hilariously bad at first, but it will slowly get better with time. (terminator) We can see in the terminal a set of metrics periodically printed out, so we can watch how the agent is doing as time progresses. The AI will get more difficult to defeat the longer you train it, and ideally you can apply it to any game you create. Video games and other simulated environments are the perfect testing grounds for building AI, since you can easily observe an agent's behavior visually. For more info, check out the links down below, and please subscribe for more machine learning videos. For now I've gotta go fix a runtime error, so thanks for watching.
