The history of machine learning

Ediacaran

Fundamental research.

Cambrian

Designing, implementing, and toying around with all of that, deliberately using little memory and CPU power, and no Internet, because the systems were designed to work offline only.

Overview

1960-1980

1980-2000

Carboniferous

Trying the same things with more memory, more CPU power, and the Internet, because, well, it made sense.

2000-2010

~2010

The deep learning revolution.

First production-grade applications
  • Natural language processing (NLP): Speech and handwriting recognition, machine translation, etc.
  • Image classification and object detection.
  • Bioinformatics: Drug discovery and toxicology.
Events

2014-2017

The earliest “large” language models were built with recurrent architectures such as the long short-term memory (LSTM).
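
A minimal sketch of such a recurrent language model, assuming PyTorch; the vocabulary size, layer widths, and class name are illustrative and not taken from any particular historical system:

# Minimal sketch of an LSTM language model, assuming PyTorch is installed.
# Vocabulary size, embedding width, and hidden size are illustrative only.
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer tensor
        x = self.embed(token_ids)      # (batch, seq_len, embed_dim)
        out, _ = self.lstm(x)          # hidden state at every position
        return self.head(out)          # logits over the next token

# Toy usage: next-token logits for a random batch.
model = LSTMLanguageModel()
tokens = torch.randint(0, 10_000, (2, 16))
logits = model(tokens)                 # shape (2, 16, 10_000)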

2017 Attention Is All You Need


Attention Is All You Need, https://arxiv.org/abs/1706.03762

https://youtu.be/b76gsOSkHB4?t=855
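
The core operation introduced by the paper is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V. A minimal NumPy sketch of that formula (the shapes and random toy inputs are illustrative only):

# Scaled dot-product attention as defined in "Attention Is All You Need":
#   Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
# Shapes and the random toy inputs below are illustrative only.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted sum of the values

Q = np.random.randn(4, 8)   # 4 query positions, d_k = 8
K = np.random.randn(6, 8)   # 6 key positions
V = np.random.randn(6, 8)
out = scaled_dot_product_attention(Q, K, V)          # shape (4, 8)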

2018-

Todo: Ran out of fuel. Please pick up the torch.

Language models and word embeddings become large language models. LLMs can acquire embodied knowledge of the syntax, semantics, and “ontology” inherent in human language corpora, but also the inaccuracies and biases present in those corpora.

Image generation leads to artificial intelligence art (https://en.wikipedia.org/wiki/Artificial_intelligence_art), and beyond. Prompt engineering becomes a thing.

  • 2018: Contemporary LLMs. The main architectures as of 2023 are of two types (see the sketch after this list):
    • BERT is a bidirectional Transformer.
    • GPT is a unidirectional (“autoregressive”) Transformer.
  • 2018: GPT-1, https://en.wikipedia.org/wiki/Generative_pre-trained_transformer
  • 2019: GPT-2, https://en.wikipedia.org/wiki/GPT-2
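
A practical way to see the difference between the two families is the attention mask: a BERT-style encoder lets every position attend to the whole sequence, while a GPT-style decoder masks out future positions. A rough, illustrative sketch of the two masks (NumPy, toy sequence length):

# Illustrative sketch of the masking difference between the two families.
# BERT-style (bidirectional): every position may attend to every other position.
# GPT-style (autoregressive): position i may only attend to positions <= i.
import numpy as np

seq_len = 5
bidirectional_mask = np.ones((seq_len, seq_len), dtype=bool)    # BERT-style
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))  # GPT-style

print(causal_mask.astype(int))
# [[1 0 0 0 0]
#  [1 1 0 0 0]
#  [1 1 1 0 0]
#  [1 1 1 1 0]
#  [1 1 1 1 1]]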

2020

Unprecedented size. Just download the whole WWW through Common Crawl and friends and feed it in. With GPT-3, OpenAI built a very advanced auto-complete.
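
“Auto-complete” here means autoregressive next-token prediction: given a prefix, the model repeatedly emits a likely continuation and appends it to the context. A toy, dependency-free sketch of that loop, with a hypothetical hard-coded bigram table standing in for a trained model:

# Toy illustration of autoregressive "auto-complete": repeatedly predict the
# next token from the current context. The hard-coded bigram table below is a
# hypothetical stand-in for a trained model such as GPT-3.
bigram_model = {
    "the": "cat",
    "cat": "sat",
    "sat": "on",
    "on": "the",
}

def autocomplete(prompt, steps=6):
    tokens = prompt.split()
    for _ in range(steps):
        next_token = bigram_model.get(tokens[-1], "<eos>")
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return " ".join(tokens)

print(autocomplete("the"))  # "the cat sat on the cat sat"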

https://youtu.be/b76gsOSkHB4?t=1060

2021-

Appendix

Deep learning