AI Transformers and the Self-Attention Mechanism


Original: 5/20/21
Revised: no

   Transformers have been the most interesting recent development in neural networks, especially in language applications, where neural networks initially did not enjoy the same success they had in computer vision. Moreover, large models built around language transformers have shown the ability to "understand" other kinds of data, not just text. A transformer is an ingenious neural network architecture built around the concept of attention. The architecture also meshes nicely with that most prized computer science concept, parallel processing, which allows transformers to be trained in reasonable time on massive datasets. This short note will eventually grow into a more detailed introduction to transformers; until then, this is a good introduction:
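
   As a taste of what the longer note will cover, here is a minimal NumPy sketch of single-head scaled dot-product self-attention, the core operation inside a transformer. The function name, the toy dimensions, and the random weights are all illustrative assumptions, not taken from any particular model or library:

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        # X is (n, d): one row per token. Wq, Wk, Wv are learned (d, d) projections.
        Q = X @ Wq                                # queries: what each token is looking for
        K = X @ Wk                                # keys: what each token offers
        V = X @ Wv                                # values: the content that gets mixed
        scores = Q @ K.T / np.sqrt(K.shape[-1])  # (n, n) pairwise relevance, scaled
        # Row-wise softmax: how strongly each token attends to every other token.
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        return w @ V                              # each output row is a weighted mix of values

    # Toy usage: a "sentence" of 4 tokens, each an 8-dimensional vector.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 8))
    Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
    out = self_attention(X, Wq, Wk, Wv)           # shape (4, 8)

   Note that the whole sequence is handled by a few matrix products, with no token-by-token recurrence; that is precisely what lets transformers exploit parallel hardware during training.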