Integrating Artificial Intelligence and simulation modelling
Headlines make it clear that AI is a hot topic… What does that mean for simulation?
“AI is one of the most important things humanity is working on. It is more profound than electricity or fire.” – Sundar Pichai (Google CEO)
“The average AI job commands six figures - here are the top 15 companies hiring talent right now” – CNBC
“Artificial intelligence is likely to make a career in finance, medicine or law a lot less lucrative.” – Entrepreneur
“The trajectory of AI and its influence on society is only beginning.” – Satya Nadella (Microsoft CEO)
Artificial Intelligence is not a single technology. Here are five key areas that PwC (PricewaterhouseCoopers, an accounting and consulting firm) is focusing on. We consider simulation to be an AI technology.
Reinforcement Learning Components
- Machine Learning
- Deep Learning
Machine Learning: Using algorithms to learn from data and solve business problems without being explicitly programmed.
Deep Learning: Leveraging cutting-edge machine learning algorithms inspired by artificial neural networks, especially for unstructured data.
Simulation: Developing models of real-world processes and testing their performance/success under various scenarios.
Data at Scale: Using distributed computing and machine learning tools to analyse terabytes of data.
Natural Language Understanding: Understanding human speech and text through the application of computer science, AI, and computational linguistics.
Simulation is tightly coupled to other AI technologies
ML and DL models are created by training them on large data sets. Often, it is too dangerous, expensive, or otherwise impractical to create training data from real-world experience. Simulation must then be used to train and test the ML/DL models.
Modelling describes how the structure of a system drives its behavior. When AI is a pervasive structural element in real-world systems, it needs to be represented in the structure of our models.
AI tools can improve how we create simulations by increasing our understanding of how models behave across a wide range of conditions and by facilitating policy analysis, calibration and insight.
Role of simulation in AI: Deep Reinforcement Learning
Reinforcement Learning is an area of machine learning focused on teaching a computer an optimal decision policy over time using rewards and penalties as signals.
A. The agent’s goal is to select actions that maximize the future reward.
B. It is a representative model of how humans learn from experience.
C. It lies in between supervised and unsupervised learning.
D. We implement Deep Reinforcement Learning for more complex problems using deep neural networks.
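To make point A concrete, here is a minimal sketch of how "future reward" is commonly scored as a discounted sum. The discount factor gamma and the example reward values are assumptions for illustration; they do not come from the presentation.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of future rewards, each discounted by gamma per time step.

    gamma is an assumed discount factor (0 <= gamma <= 1); the source
    does not state one.
    """
    total = 0.0
    # Work backwards so each step folds in the already-discounted future.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episode: a small penalty per move, then a reward at the end.
episode_rewards = [-1, -1, -1, 10]
print(discounted_return(episode_rewards, gamma=0.9))
```

With gamma close to 1 the agent values distant rewards almost as much as immediate ones; with gamma close to 0 it only cares about the next step.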
Anybody who’s taken physics would be able to write, using AnyLogic, a simulation model of those cases. But in a reinforcement learning technique, what we would use is some sort of machine learning model that would, by trial and error, learn what the right rule was: a simple carrot-and-stick approach.
So it will randomly try different things; for those trials that succeed it gets a reward, and for those trials that fail it gets punished. Over time it learns that it should do one thing and not the other, and in that way it learns the rules.
That means you don’t have to write any code for the rule itself; you don’t have to write the rule. Of course, you have to write a boatload of code to set up the reinforcement learning.
AnyLogic package delivery maze: reinforcement learning
Deep reinforcement learning these days is still largely demonstrated on grid world models. In a grid world model, what we do is lay out what we would call a state-action space (where the laser pointer is), where every square is essentially a state and there is some set of actions.
The actions are very simple: for any grid square that the car is in, it can only go up, down, left or right, the same as we saw in the other example.
Now this setup is a little bit different:
- The car always starts in what’s going to be grid zero-two, and it has to pick up a package that one can see in grid three. It has to deliver it to the destination, which is a checkered flag, and it has to learn how to do this.
- This simulation has to run maybe three thousand times for it to learn the right route. For it to get rewarded, it has to pick up the package and get it to the destination.
- If one looks at the layout, one can see how this training process works: every square has a score for each action, called the Q score (Q for quality).
- There are four actions, and along each side of the square one can see what the score is. During the training process (it might have been hard to see at this distance) those scores were changing as the vehicle went through trial and error.
- The square was colored red when all the scores were bad, and it turned green once there were good moves to be made.
- The way this training process works is that, for every state the car finds itself in, it can try one of the four moves. By trial and error the quality of each move becomes established, and eventually it will find a good path. The fascinating thing is that, if you implement this right, it is guaranteed to converge and guaranteed to find a best path, which is fairly remarkable.
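The training process described above can be sketched as tabular Q-learning in a few dozen lines. This is an illustrative stand-in, not the AnyLogic model from the talk: the 4x4 grid size, the package and flag positions, the reward values, and the learning parameters (alpha, gamma, epsilon) are all assumptions. Only the start square, the four moves, and the roughly three thousand training runs follow the description.

```python
import random

SIZE = 4                                        # assumed 4x4 grid
START, PACKAGE, GOAL = (0, 2), (3, 0), (3, 3)   # package/flag squares are hypothetical
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right


def step(state, action):
    """Apply one move and return (next_state, reward, done)."""
    row, col, has_package = state
    d_row, d_col = ACTIONS[action]
    # A move off the grid leaves the car where it is.
    row = min(max(row + d_row, 0), SIZE - 1)
    col = min(max(col + d_col, 0), SIZE - 1)
    if (row, col) == PACKAGE:
        has_package = True
    if has_package and (row, col) == GOAL:
        return (row, col, has_package), 10.0, True   # assumed delivery reward
    return (row, col, has_package), -0.1, False      # assumed small cost per move


def train(episodes=3000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a Q score for every (state, action) pair by trial and error."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> Q score; unseen pairs count as 0.0
    for _ in range(episodes):
        state = (START[0], START[1], False)
        for _ in range(200):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.randrange(4)            # explore: random move
            else:                                    # exploit: best known move
                action = max(range(4), key=lambda a: q.get((state, a), 0.0))
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(
                q.get((next_state, a), 0.0) for a in range(4))
            old = q.get((state, action), 0.0)
            # Nudge the old score toward reward + discounted best next score.
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = next_state
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the highest-scoring move from the start square."""
    state = (START[0], START[1], False)
    path = [state[:2]]
    for _ in range(max_steps):
        action = max(range(4), key=lambda a: q.get((state, a), 0.0))
        state, _, done = step(state, action)
        path.append(state[:2])
        if done:
            break
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

The state includes a "has package" flag in addition to the square, because the best move from a given square differs before and after pickup.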
The working phases
Exploration phase: At first it was moving randomly, trying things at random. Then, as the training period progressed, it would exploit the historical learning and more and more choose the best move. That’s how it incrementally learns through the process.
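One common way to get this shift from exploring to exploiting is to shrink the probability of a random move as training progresses. The sketch below assumes a linear schedule and assumed start and end values; the presentation does not say which schedule was used.

```python
def epsilon_at(episode, total_episodes, start=1.0, end=0.05):
    """Probability of a random move at a given episode.

    Linear annealing from `start` to `end` is an assumption; other
    schedules (for example exponential decay) are equally plausible.
    """
    fraction = min(episode / max(total_episodes - 1, 1), 1.0)
    return start + fraction * (end - start)


# Early episodes are almost fully random; late ones mostly pick the best move.
for episode in (0, 1500, 2999):
    print(episode, round(epsilon_at(episode, 3000), 3))
```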
How does it work?
(Deep Q algorithm and other algorithms)
Deep Q Network: The neural network acts as a value function for a particular state and outputs a Q-value for each action. The action that yields the highest value is chosen by the agent and then performed in the environment.
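As a rough illustration of "one Q-value per action, pick the highest", the sketch below uses a single linear layer as a toy stand-in for the neural network. The function names, weights and state values are all hypothetical.

```python
def q_network(state, weights):
    """Toy stand-in for the network: one linear layer, one output per action."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def choose_action(state, weights):
    """Return the index of the action with the highest Q-value."""
    q_values = q_network(state, weights)
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical weights for four actions over a two-feature state.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
state = [0.2, 0.7]
print(choose_action(state, weights))
```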
Deep Q Network attributes:
- Bellman update: Applied iteratively to train the neural network to represent the Q function.
- Prioritized experience replay: Experiences are stored in a replay memory, ranked by importance, and sampled in mini-batches during training.
- Exploration-exploitation: ε-greedy exploration is used. With probability ε, choose a random action; otherwise go with the “greedy” action.
- Target networks: Training is stabilized with a second network whose weights are only updated every X training periods.
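The four attributes above can each be reduced to a small helper. This is a simplified sketch in plain Python rather than a DL4J implementation: the discount factor, the sync interval (the "X" above), and the idea of passing priorities in as ready-made numbers are assumptions, and network weights are represented as a flat list.

```python
import random

GAMMA = 0.9        # assumed discount factor
SYNC_EVERY = 100   # the "X" above; this value is an assumption


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def bellman_target(reward, next_q_values, done, gamma=GAMMA):
    """Training target for Q(s, a): r, plus the discounted best next value
    (as estimated by the target network) unless the episode has ended."""
    return reward if done else reward + gamma * max(next_q_values)


def sample_prioritized(memory, priorities, batch_size, rng):
    """Draw a mini-batch, favouring experiences with larger priority.

    Samples with replacement; how priorities are computed is left open here.
    """
    return rng.choices(memory, weights=priorities, k=batch_size)


def maybe_sync(step, online_weights, target_weights, every=SYNC_EVERY):
    """Copy the online weights into the target network every `every` steps."""
    if step % every == 0:
        target_weights[:] = online_weights
    return target_weights
```

In a training loop these would be called in order: choose an action with `epsilon_greedy`, store the experience, draw a batch with `sample_prioritized`, fit the network toward `bellman_target`, and call `maybe_sync` once per step.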
Specialized AI toolsets are fine for developing specific components, but to model entire systems we want to integrate AI directly into AnyLogic.
Python is well known as the prominent language for data science toolkits. However, we needed a Java-based library to achieve our goal of integrating AI into an AnyLogic simulation.
We looked for a library that met these requirements:
- Java-based, to allow for integration with AnyLogic.
- Well supported by an engaged community, to keep up with AI trends.
- Designed for scale, as our applications are likely to be for big businesses.
- Interfaces with a variety of data stores, as is likely in a commercial scenario.
With these requirements, we landed on Deeplearning4j (DL4J). Deeplearning4j is a commercial-grade library designed for data scientists in business environments. It is always recommended to research libraries for each use case.
Here are some DL4J highlights:
- JVM-based distributed deep learning framework.
- DL4J leverages ND4J for data management, which is also distributed.
- Integrates with Hadoop, Spark and Kafka.
- Strong documentation and community.
A common application for simulation is to develop “optimal” decision rules for agents in a complex system. This is a problem that can be tackled with AI technologies combined with simulation.
- Many business problems consist of multiple decision makers either collaborating or competing towards a particular goal.
- PwC is working with a team at a large car company that is looking to roll out autonomous vehicles (AVs) for the purpose of transporting customers.
- The delivered solution used a very complex AnyLogic AB/DE (agent-based/discrete-event) model in which simulated vehicles follow hand-coded rules to make their decisions.
- As an experiment, we used “Deep Reinforcement Learning” to train the AVs to maximize fleet efficiency while satisfying customer trip demand.
Making it real: Moving beyond grid world
To explore using AI in a strategy setting, we are experimenting with training an AI to play the consumer market game.
The state-action space for this simple strategy game is larger and more complex than most “Grid Worlds”. Also, specifying the reward to guide training is more difficult.
“Red company” was replaced by a neural network (NN) while the other two remained rule-based. The AI was trained by playing a 90-day game 1,000 times.
So far, the results are mixed
- We ran four experiments with different competitive behaviors:
  - Promotion at 5-day intervals.
  - 3 promotions at 15-day intervals.
  - Promotion the day after red promotes.
  - Promotion whenever share drops below 30%.
- The AI is trained “from scratch” for each experiment.
- The AI wins if it has the highest profit at the end of the game.
- Part of the issue appears to be inadequate simulation model quality.
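The four rule-based competitor behaviors listed above could be written as simple yes/no rules like the ones below. This is one possible reading only: the function names, the argument list, and in particular the choice of days 15, 30 and 45 for "3 promotions at 15-day intervals" are assumptions, not details from the experiments.

```python
def every_five_days(day, red_promoted_yesterday, share):
    """Promote on every fifth day of the game."""
    return day % 5 == 0


def three_at_fifteen_days(day, red_promoted_yesterday, share):
    """Promote three times, 15 days apart (days 15, 30, 45 are assumed)."""
    return day in (15, 30, 45)


def day_after_red(day, red_promoted_yesterday, share):
    """Promote on the day after the red company promoted."""
    return red_promoted_yesterday


def when_share_low(day, red_promoted_yesterday, share):
    """Promote whenever market share is below 30%."""
    return share < 0.30


COMPETITOR_RULES = [every_five_days, three_at_fifteen_days,
                    day_after_red, when_share_low]

# Example: which rules would fire on day 30 with a 25% share,
# given that red did not promote the day before?
print([rule.__name__ for rule in COMPETITOR_RULES if rule(30, False, 0.25)])
```

Keeping every rule behind the same signature makes it easy to swap one competitor behavior for another between experiments.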
Moving from “grid world” to a real-life strategy application is challenging.
- The “state-action results” space is much more abstract and complex in a strategy application.
- For training purposes, the strategy question needs to be structured as a game. This can be difficult to do in a way that maintains realism.
- As the state-action space grows, the number of training iterations grows rapidly, placing a premium on compute performance.
- The causal dynamics of the underlying system model must be correct.
- AI is opaque, making the identification of issues and the diagnosis of causes difficult.
- If in doubt, do not over engineer.