Unlike Artificial Intelligence (AI) or Machine Learning (ML), Deep Learning (DL) is a relatively new term that came into wide use around 2010. However, the underlying technology is not new. Deep Learning is based on Artificial Neural Networks (ANNs), which are nearly as old as AI itself. The original idea behind AI was to make computers as smart as the human brain, and that is what an ANN tries to do by mimicking the brain. The recent hype around DL, especially since 2010, exists because the idea finally seems close to fruition. This dream-like scenario has been made possible by two recent technological developments: GPUs, which supply the computing power, and Big Data, which supplies the examples to learn from.
So what are ANNs anyway, and how do they work? An ANN is a network of neurons connected by synapses. A neuron is a unit of calculation, either in the form of a physical device or an algorithm. A synapse, much like its biological counterpart, is a connection between neurons, with one addition: it applies a weight to the result calculated by one neuron before propagating it to the next. Together these neurons and synapses form a connectionist system. This connectionist system learns to calculate its output by considering examples, without any task-specific programming, much like a child learns to recognize a dog by seeing multiple examples of what a dog looks like.
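As a rough illustration of a neuron and its weighted synapses, here is a minimal Python sketch. The function name `neuron`, the bias term and the sigmoid activation are assumptions made for the example; the text above does not prescribe any particular formula.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weigh each incoming value, sum, then squash.

    `weights` plays the role of the synapses: one weight per incoming value.
    The sigmoid activation and the bias are illustrative assumptions.
    """
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid keeps the output in (0, 1)

# Two incoming signals, each passed through its own weighted synapse.
output = neuron(inputs=[0.5, 0.8], weights=[0.4, -0.2], bias=0.1)
print(output)
```

A positive weight strengthens the influence of an input and a negative weight suppresses it. Learning, in this picture, amounts to finding good values for the weights.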
Although ANNs are loosely based on how the brain's neurons work, they differ from the brain in complexity and in operation. A typical large ANN consists of thousands to millions of neurons, while the human brain has tens of billions. Also, unlike the brain, neurons in an ANN are organized in layers. Say, for example, you have an image and want to identify whether it shows a cat. The first layer might focus just on edges, another on colors and yet another on the physical dimensions of the cat. Every layer applies its weights to the input and passes the result to the next layer. The final cat or no-cat result is calculated from the collective weights of all layers.
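The layer-by-layer flow can be sketched in a few lines of Python. Everything here is hypothetical: the three input numbers stand in for image features, the weights are made up rather than learned, and the sigmoid activation is an assumption. The point is only to show how each layer's output becomes the next layer's input.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Run one layer. Each row of `weights` holds one neuron's incoming weights."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    """Pass the input through every layer in order."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical network: 3 inputs -> 2 hidden neurons -> 1 output neuron.
layers = [
    ([[0.2, -0.5, 0.1], [0.7, 0.3, -0.4]], [0.0, 0.1]),  # hidden layer
    ([[0.6, -0.3]], [0.05]),                              # output layer
]
cat_score = network_forward([0.9, 0.1, 0.4], layers)[0]
print("cat" if cat_score > 0.5 else "no cat", cat_score)
```

With untrained weights like these the verdict is meaningless. It only becomes useful once the weights have been learned from examples.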
Every ANN typically has an input layer, at least one hidden layer and an output layer. The hidden layers are where the magic happens. In Deep Learning there are multiple hidden layers, hence the term "deep". The problem with a shallow network is limited expressive power: with no hidden layer it can learn only a simple, essentially linear mapping of input to output, and with a single hidden layer it often needs an impractically large number of neurons to capture complex patterns. This is not ideal, since we want the system to learn, correct itself and then relearn while acting on the same input. With each layer focusing on one aspect, and its weights being adjusted during training based on the errors in the final output, the entire connectionist system gets better and better over time.
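The learn, correct and relearn cycle can be illustrated with a deliberately tiny Python sketch: a single weight that is corrected repeatedly on the same input. This is not a deep network, and the input, target and learning rate are made-up values; it only shows how repeated correction shrinks the error.

```python
def predict(weight, x):
    return weight * x

weight = 0.0           # start out knowing nothing
x, target = 2.0, 1.0   # hypothetical input and the output we want for it
learning_rate = 0.1    # assumed step size for each correction

errors = []
for _ in range(20):
    error = target - predict(weight, x)   # how wrong the current guess is
    errors.append(abs(error))
    weight += learning_rate * error * x   # nudge the weight to reduce the error

print(errors[0], errors[-1], weight)
```

Each pass makes the error smaller than the one before, and the weight settles near 0.5, the value that maps the input 2.0 to the target 1.0. A real network applies the same kind of correction to many weights across all of its layers.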
Every Deep Learning system works in two phases: training and inference. A useful analogy for understanding these phases is a graduate's transition from university to professional work. Training is what happens at university, where the student is exposed to a lot of knowledge. However, while the student learns a lot, that knowledge is often based on textbook examples rather than on a real-world problem that needs to be solved. The latter, more useful activity, solving a real-world problem, happens in the second phase, inference. In the training phase the ANN is exposed to a lot of data. Once training has taken place, the ANN uses inference to make educated guesses about new, previously unseen, real-world data.
Let us return to our cat image example. In the training phase the ANN is exposed to a lot of images, each labeled as cat or not cat. It learns to recognize the shape of a cat from different angles and in different situations. This is called supervised learning: the ANN is shown and told how to identify cats, much like you would teach your two-year-old. During inference the ANN uses what it has learnt to make an educated guess about whether a particular image shows a cat, much like when your kid sees a new cat on the street and immediately concludes that it’s a cat because you have shown her the one at home.
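The two phases can be sketched end to end in Python with a toy stand-in for the cat example. Real systems learn from image pixels with many layers. Here, as a simplifying assumption, each animal is reduced to two hypothetical hand-made features (ear pointiness and whisker length, both scaled from 0 to 1) and the "network" is a single sigmoid neuron. The function names, features and numbers are all illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features):
    """Inference: estimate how likely it is that the features describe a cat."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def train(examples, epochs=1000, learning_rate=0.5):
    """Training: repeatedly correct the weights using labeled examples."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = label - predict(weights, bias, features)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias

# Supervised learning: every example comes with its answer (1 = cat, 0 = not cat).
training_data = [
    ([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.7, 0.7], 1),
    ([0.2, 0.1], 0), ([0.1, 0.3], 0), ([0.3, 0.2], 0),
]
weights, bias = train(training_data)

# Inference on an animal that was never part of the training data.
new_animal = [0.85, 0.75]
is_cat = predict(weights, bias, new_animal) > 0.5
print("cat" if is_cat else "not a cat")
```

Training is the expensive part, since it loops over the labeled data many times. Inference is a single cheap calculation with the weights that training produced.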