Deep Learning evolution
History of Deep Learning
▶️ Deep Learning History and Recent Timeline
1998 LeNet
Back in 1998, Yann LeCun was using convolutional neural networks to recognize handwritten digits.
Both the neural networks and the datasets were small at that time.
2012 ImageNet
In 2012, AlexNet won the ImageNet challenge. This marked the start of the modern deep learning era, in which three factors were combined:
- Big datasets
- GPU compute power
- Algorithmic advances
2019 GPT-2
In 2017, the paper "Attention Is All You Need" was published, introducing the Transformer architecture, which is the basis of modern natural language models. In 2019, OpenAI released GPT-2, a 1.5-billion-parameter model trained on 8 million web pages.
One year later, GPT-3 was released, a 175-billion-parameter model. Due to its enormous size, it was only accessible through an API.
2022 DALL·E 2 and Stable Diffusion
2022 was the year of big breakthroughs in generative models. DALL·E 2 was released by OpenAI, and Stable Diffusion was open-sourced months later by Stability AI. These models can generate high-quality images from text descriptions, thus combining language and vision.
Trends in Deep Learning
Datasets and models are growing exponentially.
Trends in datasets
Figure: Epoch AI, Trends in Training Dataset Sizes
The plots show exponential growth in the size of the datasets used to train deep learning models.
Trends in model size
Figure: Epoch AI, Trends in Model Sizes
Since 2018, the model size of notable machine learning systems has grown ten times faster than before. After 2020, growth has not been entirely continuous: there was a jump of one order of magnitude that persists to this day. This is relevant for forecasting model size and thus AI capabilities.
We can see that model size also grows exponentially.
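To make "exponential" concrete, here is a minimal worked formula. The six-month doubling time below is purely an illustrative assumption; the actual doubling time varies across studies and eras. A quantity growing exponentially with doubling time $T$ satisfies

$$
N(t) = N_0 \cdot 2^{t/T},
$$

where $N_0$ is the initial size. With $T = 6$ months, a model grows by $2^{4} = 16\times$ in two years and by $2^{10} \approx 1000\times$ (three orders of magnitude) in five years, which is why small differences in the growth rate matter so much for forecasting.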
The future of Deep Learning
Clearly there is a trend towards bigger and bigger models and datasets. However, at the same time, in 2022 we have seen that a model of just 2 GB can generate high-quality images from text.
The combination of vision and language has enabled unprecedented levels of generalization. Although the generalization problem is not yet solved, this recipe of training on big multimodal datasets of text and images seems to be a promising path to follow.