
Recent advances in machine learning have lowered the barriers to creating and using ML models, but understanding what these models are doing has only become more difficult. We discuss these technological advances with little understanding of how they work, and we struggle to develop a comfortable intuition for new functionality.

In this report, authors Austin Eovito and Marina Danilevsky from IBM focus on how to think about neural network-based language model architectures. They guide you through various models (neural networks, RNN/LSTM, encoder-decoder, attention/transformers) to convey a sense of their abilities without getting entangled in the complex details. The report uses simple examples of how humans approach language in specific applications to explore and compare how different neural network-based language models work.

This report will empower you to better understand how machines understand language.

  • Dive deep into the basic task of a language model to predict the next word, and use it as a lens to understand neural network language models
  • Explore encoder-decoder architecture through abstractive text summarization
  • Use machine translation to understand the attention mechanism and transformer architecture
  • Examine the current state of machine language understanding to discern what these language models are good at, as well as their risks and weaknesses

Table of Contents

  1. Introduction: What Is It like to Be a Language Model?
    1. Why Do You Need to Read This Report?
    2. What Is a Language Model?
    3. What Does a Language Model Do?
    4. Are Language Models like Humans?
    5. How Does a Language Model Learn?
    6. How Does an LM Represent Language?
    7. Road Map of the Rest of the Report
  2. Meet the Neural Model Family
    1. What Do Humans Want to Remember?
    2. Do Machines Dream of Electric Cake?
    3. Neural Networks for Language Modeling
    4. Predicting the Next Word with Neural Networks
    5. RNN: Adapting Neural Architecture to Language Modeling
    6. LSTM: Adding a Separate Memory Structure to RNN
    7. On Bidirectionality
    8. Considerations on the Use of RNNs
    9. Key Takeaways
  3. Two Heads Are Better than One: Encoder-Decoder Architecture
    1. How Do Humans Summarize?
    2. Encoder-Decoders for Language Modeling
    3. Considerations on the Use of Encoder-Decoder Architecture
    4. Key Takeaways
  4. Choosing What to Care About: Attention and Transformers
    1. Communicating Across Languages
    2. Attention for Language Modeling: Concentrating on Only the Relevant Input
    3. Is Attention Interpretable?
    4. Transformer Architecture for Language Modeling
    5. Multihead Attention
    6. Considerations on the Use of Transformer Architecture
    7. Key Takeaways
  5. Machine Language Understanding
    1. What Do Language Models Understand?
    2. Language Models Are Not Always Enough
    3. With Great Power Comes Great Responsibility
    4. Final Takeaways