LeCun, Y., L. Bottou, Y. Bengio, and P. Haffner. “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, 1998, pp. 2278–2324. Describes the pathbreaking application of convolutional neural networks to handwriting recognition by a Bell Labs group led by Bengio’s co-awardee Yann LeCun. The paper introduced graph transformer networks, a new approach that makes it possible to train networks composed of specialized modules end to end. It uses Bell Labs’ successful check recognition system as its main example.
Bengio, Y., R. Ducharme, P. Vincent, and C. Jauvin, “A neural probabilistic language model,” Journal of Machine Learning Research, vol. 3, 2003, pp. 1137–1155. A landmark paper in the application of neural networks to natural language processing. Because natural text often includes unique word sequences, training a network to recognize patterns is hard. As Bengio puts it, the paper “introduced high-dimension word embeddings as a representation of word meaning,” which allowed networks to recognize similarities between new sentences and training sentences with similar meanings.
Bahdanau, D., K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv:1409.0473 [cs, stat], 2016. http://arxiv.org/abs/1409.0473 Introduced the attention mechanism for machine translation, which lets a network focus on only the relevant parts of the source sentence at each stage of the translation, taking the context of each word into account. Attention mechanisms are now at the heart of state-of-the-art natural language processing systems based on machine learning.
Goodfellow, I., J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” Advances in Neural Information Processing Systems, vol. 27, 2014, pp. 2672–2680. Introduced the Generative Adversarial Network (GAN), an idea that was rapidly put into practice by teams around the world. Whereas most networks were designed to recognize patterns, a generative network learns to generate objects that resemble those in the training set. The technique is “adversarial” because a network learning to generate plausible fakes can be trained against another network learning to identify fakes, allowing for unsupervised learning.
LeCun, Y., Y. Bengio, and G. E. Hinton, “Deep learning,” Nature, vol. 521, 2015, pp. 436–444. A recent and accessible summary of the methods that LeCun and his co-winners termed “deep learning” because of their reliance on neural networks with multiple specialized layers of neurons between input and output nodes. It addressed a surge of interest in their work following the successful demonstration of these methods for object categorization, face identification, and speech recognition.
Goodfellow, I., Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016. Widely adopted textbook, written to satisfy the huge demand for up-to-date and authoritative coverage of the new techniques. Freely available online at https://www.deeplearningbook.org/.