Appendix V — Answers to the Questions

Chapter 1, What are Transformers?

  1. We are still in the Third Industrial Revolution. (True/False)

    False. Eras in history indeed overlap. However, the Third Industrial Revolution focused on making the world digital. The Fourth Industrial Revolution has begun to connect everything to everything else: systems, machines, bots, robots, algorithms, and more.

  2. The Fourth Industrial Revolution is connecting everything to everything else. (True/False)

    True. This leads to an increasing amount of automated decisions that formerly required human intervention.

  3. Industry 4.0 developers will sometimes have no AI development to do. (True/False)

    True. In some projects, AI will be an online service that requires no development.

  4. Industry 4.0 developers might have to implement transformers from scratch. (True/False)

    True. In some projects, standard online services or APIs might not satisfy the needs of a project. In those cases, a developer will have to customize a model significantly or build one from scratch.

  5. It’s not necessary to learn more than one transformer ecosystem, such as Hugging Face, for example. (True/False)

    False. A corporation’s policy might be to work only with Google Cloud AI or Microsoft Azure AI. Hugging Face might be a tool used in another company. You can’t know in advance and, in most cases, cannot decide.

  6. A ready-to-use transformer API can satisfy all needs. (True/False)

    True if it is effective. False if the transformer model is not well trained.

  7. A company will accept the transformer ecosystem a developer knows best. (True/False)

    False. A company may or may not accept what a developer suggests. Therefore, it’s safer to cover as many bases as possible.

  8. Cloud transformers have become mainstream. (True/False)

    True.

  9. A transformer project can be run on a laptop. (True/False)

    True for a prototype, for example. False for a project involving thousands of users.

  10. Industry 4.0 AI specialists will have to be more flexible. (True/False)

    True.

Chapter 2, Getting Started with the Architecture of the Transformer Model

  1. NLP transduction can encode and decode text representations. (True/False)

    True. NLP transduction converts sequences (written or oral) into numerical representations, processes them, and decodes the results back into text.

  2. Natural Language Understanding (NLU) is a subset of Natural Language Processing (NLP). (True/False)

    True.

  3. Language modeling algorithms generate probable sequences of words based on input sequences. (True/False)

    True.

  4. A transformer is a customized LSTM with a CNN layer. (True/False)

    False. A transformer does not contain an LSTM or a CNN at all.

  5. A transformer does not contain LSTM or CNN layers. (True/False)

    True.

  6. Attention examines all the tokens in a sequence, not just the last one. (True/False)

    True.

  7. A transformer does not use positional encoding. (True/False)

    False. A transformer uses positional encoding to preserve the order of the tokens (a sketch of the sinusoidal encoding follows this question list).

  8. A transformer contains a feedforward network. (True/False)

    True.

  9. The masked multi-head attention component of the decoder of a transformer prevents the algorithm, while parsing a given position, from seeing the subsequent positions of the sequence being processed. (True/False)

    True.

  10. Transformers can analyze long-distance dependencies better than LSTMs. (True/False)

    True.
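
  To illustrate the answer to question 7, here is a minimal sketch of the sinusoidal positional encoding used in the original Transformer, written with NumPy and assuming the original model dimension of 512:

    import numpy as np

    def positional_encoding(max_len, d_model=512):
        # PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
        # PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
        pe = np.zeros((max_len, d_model))
        position = np.arange(max_len)[:, np.newaxis]
        div_term = np.power(10000.0, np.arange(0, d_model, 2) / d_model)
        pe[:, 0::2] = np.sin(position / div_term)   # even dimensions
        pe[:, 1::2] = np.cos(position / div_term)   # odd dimensions
        return pe

    # The encoding is added to the word embeddings so that each token
    # carries information about its position in the sequence.
    print(positional_encoding(max_len=10).shape)   # (10, 512)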

Chapter 3, Fine-Tuning BERT Models

  1. BERT stands for Bidirectional Encoder Representations from Transformers. (True/False)

    True.

  2. BERT is a two-step framework. Step 1 is pretraining. Step 2 is fine-tuning. (True/False)

    True.

  3. Fine-tuning a BERT model implies training parameters from scratch. (True/False)

    False. BERT fine-tuning is initialized with the trained parameters of pretraining.

  4. BERT only pretrains using all downstream tasks. (True/False)

    False. BERT pretrains on Masked Language Modeling and Next Sentence Prediction, not on downstream tasks.

  5. BERT pretrains on Masked Language Modeling (MLM). (True/False)

    True. (A sketch of MLM at inference time follows this question list.)

  6. BERT pretrains on Next Sentence Prediction (NSP). (True/False)

    True.

  7. BERT pretrains on mathematical functions. (True/False)

    False.

  8. A question-answer task is a downstream task. (True/False)

    True.

  9. A BERT pretraining model does not require tokenization. (True/False)

    False. A BERT model requires tokenized input, for example, WordPiece tokens.

  10. Fine-tuning a BERT model takes less time than pretraining. (True/False)

    True.
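
  To illustrate the answer to question 5, here is a minimal sketch of Masked Language Modeling at inference time, assuming the Hugging Face transformers library and the public bert-base-uncased checkpoint: the pretrained model predicts the most probable words for the [MASK] position.

    from transformers import pipeline

    # bert-base-uncased was pretrained with Masked Language Modeling (MLM),
    # so it can fill in a [MASK] token in a sentence.
    fill_mask = pipeline("fill-mask", model="bert-base-uncased")

    for prediction in fill_mask("The actor [MASK] the role perfectly."):
        print(f"{prediction['token_str']:>12}  score={prediction['score']:.3f}")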

Chapter 4, Pretraining a RoBERTa Model from Scratch

  1. RoBERTa uses a byte-level byte-pair encoding tokenizer. (True/False)

    True.

  2. A trained Hugging Face tokenizer produces merges.txt and vocab.json. (True/False)

    True. (A tokenizer-training sketch follows this question list.)

  3. RoBERTa does not use token-type IDs. (True/False)

    True.

  4. DistilBERT has 6 layers and 12 heads. (True/False)

    True.

  5. A transformer model with 80 million parameters is enormous. (True/False)

    False. A model with 80 million parameters is relatively small.

  6. We cannot train a tokenizer. (True/False)

    False. A tokenizer can be trained.

  7. A BERT-like model has six decoder layers. (True/False)

    False. A BERT-like model contains encoder layers, not decoder layers; the model built in this chapter has six encoder layers.

  8. MLM predicts the word hidden by a mask token in a sentence. (True/False)

    True.

  9. A BERT-like model has no self-attention sublayers. (True/False)

    False. A BERT-like model has self-attention sublayers.

  10. Data collators are helpful for backpropagation. (True/False)

    True. A data collator prepares batches and, for MLM, creates the masked tokens, so the loss can be computed and backpropagated.
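
  To illustrate the answers to questions 1, 2, and 6, here is a minimal sketch of training a byte-level BPE tokenizer with the Hugging Face tokenizers library; the training file and output directory names are placeholders:

    import os
    from tokenizers import ByteLevelBPETokenizer

    # Train a byte-level byte-pair encoding tokenizer on a raw text corpus.
    tokenizer = ByteLevelBPETokenizer()
    tokenizer.train(
        files=["corpus.txt"],          # placeholder training file
        vocab_size=52_000,
        min_frequency=2,
        special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
    )

    # save_model() writes the two files a RoBERTa-style model expects:
    # vocab.json (the token-to-id map) and merges.txt (the BPE merge rules).
    os.makedirs("my_tokenizer", exist_ok=True)   # placeholder output directory
    tokenizer.save_model("my_tokenizer")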

Chapter 5, Downstream NLP Tasks with Transformers

  1. Machine intelligence uses the same data as humans to make predictions. (True/False)

    True and False.

    True. In some cases, machine intelligence surpasses humans when processing massive amounts of data to extract meaning and perform a range of tasks that would take centuries for humans to process.

    False. For NLU, humans have access to more information through their senses. Machine intelligence relies on what humans provide for all types of media.

  2. SuperGLUE is more difficult than GLUE for NLP models. (True/False)

    True.

  3. BoolQ expects a binary answer. (True/False)

    True.

  4. WiC stands for Words in Context. (True/False)

    True.

  5. Recognizing Textual Entailment (RTE) detects whether one sequence entails another sequence. (True/False)

    True.

  6. A Winograd schema predicts whether a verb is spelled correctly. (True/False)

    False. Winograd schemas mainly apply to pronoun disambiguation.

  7. Transformer models now occupy the top ranks of GLUE and SuperGLUE. (True/False)

    True.

  8. Human Baseline Standards are not defined once and for all. They were made tougher to attain by SuperGLUE. (True/False)

    True.

  9. Transformer models will never beat SuperGLUE human baseline standards. (True/False)

    True and False.

    False. Transformer models beat human baselines for GLUE and will do the same for SuperGLUE in the future.

    True. We will keep setting higher benchmark standards as we progress in the field of NLU.

  10. Variants of transformer models have outperformed RNN and CNN models. (True/False)

    True. But you never know what will happen in the future in AI!

Chapter 6, Machine Translation with the Transformer

  1. Machine translation has now exceeded human baselines. (True/False)

    False. Machine translation is one of the most challenging NLP ML tasks.

  2. Machine translation requires large datasets. (True/False)

    True.

  3. There is no need to compare transformer models using the same datasets. (True/False)

    False. The only way to compare different models is to use the same datasets.

  4. BLEU is the French word for blue and is the acronym of an NLP metric. (True/False)

    True. BLEU stands for Bilingual Evaluation Understudy; bleu is also the French word for blue, which makes the acronym easy to remember.

  5. Smoothing techniques enhance BLEU. (True/False)

    True. Chencherry smoothing, for example, prevents harsh zero scores and gives a fairer evaluation of candidate translations. (A scoring sketch follows this question list.)

  6. German-English is the same as English-German for machine translation. (True/False)

    False. Representing German and then translating into another language is not the same process as representing English and translating into another language. The language structures are not the same.

  7. The original Transformer multi-head attention sublayer has two heads. (True/False)

    False. Each attention sublayer has eight heads.

  8. The original Transformer encoder has six layers. (True/False)

    True.

  9. The original Transformer encoder has six layers but only two decoder layers. (True/False)

    False. There are six decoder layers.

  10. You can train transformers without decoders. (True/False)

    True. The architecture of BERT only contains encoders.
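
  To illustrate the answers to questions 4 and 5, here is a minimal sketch, assuming NLTK is installed, that scores a candidate translation with BLEU, with and without Chencherry smoothing:

    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    reference = [["the", "cat", "sat", "on", "the", "mat"]]
    candidate = ["the", "cat", "is", "on", "the", "mat"]

    # Raw BLEU can collapse to zero when a higher-order n-gram has no match.
    raw_score = sentence_bleu(reference, candidate)

    # Chencherry smoothing avoids harsh zero scores on short sentences.
    chencherry = SmoothingFunction()
    smoothed_score = sentence_bleu(reference, candidate,
                                   smoothing_function=chencherry.method1)

    print(f"raw BLEU: {raw_score:.4f}  smoothed BLEU: {smoothed_score:.4f}")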

Chapter 7, The Rise of Suprahuman Transformers with GPT-3 Engines

  1. A zero-shot method trains the parameters once. (True/False)

    False. In a zero-shot setting, no parameters are trained or updated; the pretrained model is used as-is.

  2. Gradient updates are performed when running zero-shot models. (True/False)

    False.

  3. GPT models only have a decoder stack. (True/False)

    True.

  4. It is impossible to train a 117M GPT model on a local machine. (True/False)

    False. We trained one in this chapter.

  5. It is impossible to train the GPT-2 model with a specific dataset. (True/False)

    False. We trained one in this chapter.

  6. A GPT-2 model cannot be conditioned to generate text. (True/False)

    False. We implemented this in this chapter.

  7. A GPT-2 model can analyze the context of input and produce completion content. (True/False)

    True. (A text-generation sketch follows this question list.)

  8. We cannot interact with a 345M-parameter GPT model on a machine with fewer than eight GPUs. (True/False)

    False. We interacted with a model of this size in this chapter.

  9. Supercomputers with 285,000 CPUs do not exist. (True/False)

    False.

  10. Supercomputers with thousands of GPUs are game-changers in AI. (True/False)

    True. Thanks to this, we will be able to build models with increasing numbers of parameters and connections.
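
  To illustrate the answers to questions 6 and 7, here is a minimal sketch, assuming the Hugging Face transformers library and the smallest public gpt2 checkpoint, that conditions GPT-2 with a context and lets it generate completions:

    from transformers import pipeline

    # The smallest public gpt2 checkpoint can run on a laptop CPU.
    generator = pipeline("text-generation", model="gpt2")

    # The prompt conditions the model; it completes the text from that context.
    completions = generator(
        "Industry 4.0 connects machines, bots, and algorithms so that",
        max_length=50,
        num_return_sequences=2,
        do_sample=True,
    )

    for completion in completions:
        print(completion["generated_text"], "\n")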

Chapter 8, Applying Transformers to Legal and Financial Documents for AI Text Summarization

  1. T5 models only have encoder stacks like BERT models. (True/False)

    False.

  2. T5 models have both encoder and decoder stacks. (True/False)

    True.

  3. T5 models use relative positional encoding, not absolute positional encoding. (True/False)

    True.

  4. Text-to-text models are only designed for summarization. (True/False)

    False.

  5. Text-to-text models apply a prefix to the input sequence that determines the NLP task. (True/False)

    True. (A prefix sketch follows this question list.)

  6. T5 models require specific hyperparameters for each task. (True/False)

    False.

  7. One of the advantages of text-to-text models is that they use the same hyperparameters for all NLP tasks. (True/False)

    True.

  8. T5 transformers do not contain a feedforward network. (True/False)

    False.

  9. Hugging Face is a framework that makes transformers easier to implement. (True/False)

    True.

  10. OpenAI’s transformer engines are game-changers. (True/False)

    True. OpenAI has produced a wide range of ready-to-use engines, such as Codex (language to code) or Davinci (a general-purpose engine).
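
  To illustrate the answers to questions 5 and 7, here is a minimal sketch, assuming the Hugging Face transformers library and the public t5-small checkpoint, in which the prefix added to the input text is all that selects the task:

    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    text = ("The law of contract requires an offer, acceptance, and "
            "consideration for an agreement to be legally binding.")

    # The "summarize:" prefix tells the text-to-text model which task to perform.
    inputs = tokenizer("summarize: " + text, return_tensors="pt")
    summary_ids = model.generate(inputs.input_ids, max_length=30)
    print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))

    # The same model, with the same hyperparameters, translates with another prefix.
    inputs = tokenizer("translate English to French: The contract is signed.",
                       return_tensors="pt")
    translation_ids = model.generate(inputs.input_ids, max_length=30)
    print(tokenizer.decode(translation_ids[0], skip_special_tokens=True))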

Chapter 9, Matching Tokenizers and Datasets

  1. A tokenized dictionary contains every word that exists in a language. (True/False)

    False.

  2. Pretrained tokenizers can encode any dataset. (True/False)

    False.

  3. It is good practice to check a database before using it. (True/False)

    True.

  4. It is good practice to eliminate obscene data from datasets. (True/False)

    True.

  5. It is good practice to delete data containing discriminatory assertions. (True/False)

    True.

  6. Raw datasets might sometimes produce relationships between noisy content and useful content. (True/False)

    True.

  7. A standard pretrained tokenizer contains the English vocabulary of the past 700 years. (True/False)

    False.

  8. Old English can create problems when encoding data with a tokenizer trained in modern English. (True/False)

    True. (A tokenization sketch follows this question list.)

  9. Medical and other types of jargon can create problems when encoding data with a tokenizer trained in modern English. (True/False)

    True.

  10. Controlling the output of the encoded data produced by a pretrained tokenizer is good practice. (True/False)

    True.
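
  To illustrate the answers to questions 7 to 10, here is a minimal sketch, assuming the public bert-base-uncased tokenizer, that shows how words outside the tokenizer's modern-English vocabulary are fragmented into subword pieces:

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    # Common modern words stay intact; archaic or specialized words are split
    # into subword pieces marked with "##", which is worth checking before training.
    for word in ["knowledge", "knoweth", "hydroxychloroquine"]:
        print(f"{word:>20} -> {tokenizer.tokenize(word)}")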

Chapter 10, Semantic Role Labeling with BERT-Based Transformers

  1. Semantic Role Labeling (SRL) is a text generation task. (True/False)

    False.

  2. A predicate is a noun. (True/False)

    False. In SRL, the predicate is a verb, not a noun.

  3. A verb is a predicate. (True/False)

    True.

  4. Arguments can describe who and what is doing something. (True/False)

    True.

  5. A modifier can be an adverb. (True/False)

    True.

  6. A modifier can be a location. (True/False)

    True.

  7. A BERT-based model contains encoder and decoder stacks. (True/False)

    False. A BERT-based model only contains encoder stacks, not decoder stacks.

  8. A BERT-based SRL model has standard input formats. (True/False)

    True.

  9. Transformers can solve any SRL task. (True/False)

    False.

Chapter 11, Let Your Data Do the Talking: Story, Questions, and Answers

  1. A trained transformer model can answer any question. (True/False)

    False.

  2. Question-answering requires no further research. It is perfect as it is. (True/False)

    False.

  3. Named Entity Recognition (NER) can provide useful information when looking for meaningful questions. (True/False)

    True.

  4. Semantic Role Labeling (SRL) is useless when preparing questions. (True/False)

    False.

  5. A question generator is an excellent way to produce questions. (True/False)

    True.

  6. Implementing question-answering requires careful project management. (True/False)

    True.

  7. ELECTRA models have the same architecture as GPT-2. (True/False)

    False. ELECTRA has a BERT-like encoder architecture, not GPT-2's decoder-only architecture.

  8. ELECTRA models have the same architecture as BERT but are trained as discriminators. (True/False)

    True.

  9. NER can recognize a location and label it as I-LOC. (True/False)

    True.

  10. NER can recognize a person and label that person as I-PER. (True/False)

    True. (An NER sketch follows this question list.)
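
  To illustrate the answers to questions 3, 9, and 10, here is a minimal sketch, assuming the Hugging Face transformers library (its default NER checkpoint is a BERT model fine-tuned on CoNLL-2003), that labels persons and locations with tags such as I-PER and I-LOC:

    from transformers import pipeline

    # The default NER pipeline loads a BERT model fine-tuned on CoNLL-2003.
    ner = pipeline("ner")

    for entity in ner("Marie Curie moved from Warsaw to Paris."):
        print(f"{entity['word']:>10}  {entity['entity']}  score={entity['score']:.3f}")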

Chapter 12, Detecting Customer Emotions to Make Predictions

  1. It is not necessary to pretrain transformers for sentiment analysis. (True/False)

    False.

  2. A sentence is always positive or negative. It cannot be neutral. (True/False)

    False.

  3. The principle of compositionality signifies that a transformer must grasp every part of a sentence to understand it. (True/False)

    True.

  4. RoBERTa-large was designed to improve the pretraining process of transformer models. (True/False)

    True.

  5. A transformer can provide feedback that informs us of whether a customer is satisfied or not. (True/False)

    True. (A sentiment-analysis sketch follows this question list.)

  6. If the sentiment analysis of a product or service is consistently negative, it helps us make appropriate decisions to improve our offer. (True/False)

    True.

  7. If a model fails to provide a good result on a task, it requires more training or fine-tuning before changing models. (True/False)

    True.
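
  To illustrate the answers to questions 5 and 6, here is a minimal sketch, assuming the Hugging Face transformers library (its default sentiment checkpoint is DistilBERT fine-tuned on SST-2), that turns customer feedback into labels and confidence scores:

    from transformers import pipeline

    # The default sentiment-analysis checkpoint is DistilBERT fine-tuned on SST-2.
    classifier = pipeline("sentiment-analysis")

    reviews = [
        "The delivery was fast and the product works perfectly.",
        "The support team never answered my emails.",
    ]

    for review, result in zip(reviews, classifier(reviews)):
        print(f"{result['label']:>8}  score={result['score']:.3f}  {review}")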

Chapter 13, Analyzing Fake News with Transformers

  1. News labeled as fake news is always fake. (True/False)

    False.

  2. News that everybody agrees with is always accurate. (True/False)

    False.

  3. Transformers can be used to run sentiment analysis on Tweets. (True/False)

    True.

  4. Key entities can be extracted from Facebook messages with a DistilBERT model running NER. (True/False)

    True.

  5. Key verbs can be identified from YouTube chats with BERT-based models running SRL. (True/False)

    True.

  6. Emotional reactions are a natural first response to fake news. (True/False)

    True.

  7. A rational approach to fake news can help clarify one’s position. (True/False)

    True.

  8. Connecting transformers to reliable websites can help somebody understand why some news is fake. (True/False)

    True.

  9. Transformers can make summaries of reliable websites to help us understand some of the topics labeled as fake news. (True/False)

    True.

  10. You can change the world if you use AI for the good of us all. (True/False)

    True.

Chapter 14, Interpreting Black Box Transformer Models

  1. BERTViz only shows the output of the last layer of the BERT model. (True/False)

    False. BERTViz displays the outputs of all the layers.

  2. BERTViz shows the attention heads of each layer of a BERT model. (True/False)

    True.

  3. BERTViz shows how the tokens relate to each other. (True/False)

    True.

  4. LIT shows the inner workings of the attention heads like BERTViz. (True/False)

    False. However, LIT makes non-probing predictions.

  5. Probing is a way for an algorithm to predict language representations. (True/False)

    True.

  6. NER is a probing task. (True/False)

    True.

  7. PCA and UMAP are non-probing tasks. (True/False)

    True.

  8. LIME is model-agnostic. (True/False)

    True.

  9. Transformers deepen the relationships of the tokens layer by layer. (True/False)

    True.

  10. Visual transformer model interpretation adds a new dimension to interpretable AI. (True/False)

    True.

Chapter 15, From NLP to Task-Agnostic Transformer Models

  1. Reformer transformer models don’t contain encoders. (True/False)

    False. Reformer transformer models contain encoders.

  2. Reformer transformer models don’t contain decoders. (True/False)

    False. Reformer transformer models contain encoders and decoders.

  3. The inputs are stored layer by layer in Reformer models. (True/False)

    False. The inputs are not stored; they are recomputed at each layer when needed, thus saving memory.

  4. DeBERTa transformer models disentangle content and positions. (True/False)

    True.

  5. It is necessary to test the hundreds of pretrained transformer models before choosing one for a project. (True/False)

    True and False. You can try all of the models, or you can choose a very reliable model and implement it to fit your needs.

  6. The latest transformer model is always the best. (True/False)

    True and False. A lot of research is being produced on transformers, but some experimental models are short-lived. Sometimes, though, the latest model exceeds the performance of preceding models.

  7. It is better to have one transformer model per NLP task than one multi-task transformer model. (True/False)

    True and False. This is a personal decision you will have to make. Risk assessment is a critical aspect of a project.

  8. A transformer model always needs to be fine-tuned. (True/False)

    False. GPT-3 engines are zero-shot models.

  9. OpenAI GPT-3 engines can perform a wide range of NLP tasks without fine-tuning. (True/False)

    True.

  10. It is always better to implement an AI algorithm on a local server. (True/False)

    False. It depends on your project. It’s a risk assessment you will have to make.

Chapter 16, The Emergence of Transformer-Driven Copilots

  1. AI copilots that can generate code automatically do not exist. (True/False)

    False. GitHub Copilot, for example, is now in production.

  2. AI copilots will never replace humans. (True/False)

    True and False. AI will take over many tasks in sales, support, maintenance, and other domains. However, many complex tasks will still require human intervention.

  3. GPT-3 engines can only do one task. (True/False)

    False. GPT-3 engines can do a wide variety of tasks.

  4. Transformers can be trained to be recommenders. (True/False)

    True. Transformers have gone from language sequences to sequences in many domains.

  5. Transformers can only process language. (True/False)

    False. Once transformers are trained for language sequences, they can analyze many other types of sequences.

  6. A transformer sequence can only contain words. (True/False)

    False. Tokens are converted into numerical representations; a transformer works on sequences of numbers, which can represent words, images, or other data.

  7. Vision transformers cannot equal CNNs. (True/False)

    False. Transformers are deep neural networks that can equal CNNs in computer vision.

  8. AI robots with computer vision do not exist. (True/False)

    False. For example, robots with computer vision have begun to surface in military applications.

  9. It is impossible to produce Python source code automatically. (True/False)

    False. Microsoft and OpenAI have joined forces to produce a copilot that can write Python code with us or for us.

  10. We might one day become the copilots of robots. (True/False)

    This could be true or false. This remains a challenge for humans, bots, and robots in an ever-growing AI ecosystem.

Chapter 17, The Consolidation of Suprahuman Transformers with OpenAI’s ChatGPT and GPT-4

  1. GPT-4 is sentient. (True/False)

    False. GPT-4 is a mathematical algorithm. It does not need to be sentient to learn statistical patterns to do a wide variety of tasks.

  2. ChatGPT can replace a human expert. (True/False)

    False. ChatGPT can produce results based on its datasets. However, it cannot make subject matter expert (SME) decisions.

  3. GPT-4 can generate source code for any task. (True/False)

    False. GPT-4 can generate source code for many tasks. However, for complex problems, human intervention is required.

  4. Advanced prompt engineering is intuitive. (True/False)

    False. Advanced prompt engineering has become a skill based on in-depth knowledge of transformers. It involves building knowledge bases, working with the various object types of the completion APIs, and understanding the many models available.

  5. The most advanced transformer, such as GPT-4, is the best model to use. (True/False)

    False. It is like choosing a car: the most expensive and fastest model is not necessarily the one you need. Some projects might not require the most powerful transformer model, only one that is sufficient for the task.

  6. Developing applications with transformers will require no training since copilots such as GPT-4 can do the job. (True/False)

    False. Copilots are helpers. Transformer models can sometimes write a complete function or program. However, for more complex programs, human intervention is necessary.

  7. GPT-4 will be the last OpenAI transformer model because it has reached the limit of AI. (True/False)

    False. OpenAI will no doubt produce better models, as will its competitors.

Join our book’s Discord space

Join the book’s Discord workspace:

https://www.packt.link/Transformers
