Transformers for Natural Language Processing
Second Edition
Build, train, and fine-tune deep neural network architectures for NLP with Python, Hugging Face, and OpenAI’s GPT-3, ChatGPT, and GPT-4
Copyright © 2022 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Producer: Tushar Gupta
Acquisition Editor – Peer Reviews: Saby Dsilva
Project Editor: Janice Gonsalves
Content Development Editor: Bhavesh Amin
Copy Editor: Safis Editing
Technical Editor: Karan Sonawane
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Presentation Designer: Pranit Padwal
First published: January 2021
Second edition: March 2022
Production reference: 5270423
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80324-733-5
In less than four years, Transformers took the NLP community by storm, breaking every record achieved in the previous 30 years. Models such as BERT, T5, and GPT now constitute the fundamental building blocks for new applications in everything from computer vision to speech recognition to translation to protein sequencing to writing code. For this reason, Stanford has recently introduced the term foundation models to define a set of large language models based on giant pre-trained transformers. All of this progress is thanks to a few simple ideas.
This book is a reference for everyone interested in understanding how transformers work, both from a theoretical and from a practical perspective. The author does a tremendous job of explaining how to use transformers step by step with a hands-on approach. After reading this book, you will be ready to use this state-of-the-art set of techniques to empower your deep learning applications. In particular, this book gives a solid background on the architecture of transformers before covering, in detail, popular models such as BERT, RoBERTa, T5, and GPT-3. It also explains many use cases that transformers can cover: text summarization, image labeling, question answering, sentiment analysis, and fake news analysis.
If these topics interest you, then this is definitely a worthwhile book. The first edition has a permanent place on my desk, and the second edition will join it.
Antonio Gulli
Engineering Director for the Office of the CTO, Google
Denis Rothman graduated from Sorbonne University and Paris Diderot University, designing one of the first patented encoding and embedding systems. He authored one of the first patented AI cognitive robots and bots. He began his career delivering Natural Language Processing (NLP) chatbots for Moët et Chandon and an AI tactical defense optimizer for Airbus (formerly Aerospatiale). Denis then authored an AI resource optimizer for IBM and luxury brands, leading to an Advanced Planning and Scheduling (APS) solution used worldwide.
I want to thank the corporations that trusted me from the start to deliver artificial intelligence solutions and shared the risks of continuous innovation. I also want to thank my family, who always believed I would make it.
George is a Ph.D. candidate at the University of North Texas in the Department of Computer Science, where he also earned his master's degree in computer science. He received his bachelor's degree in electrical engineering in his home country, Romania.
He worked for 10 months at TCF Bank, where he helped build the machine learning operations framework for automatic model deployment and monitoring. He completed three internships at State Farm as a data scientist and machine learning engineer, and worked as a data scientist and machine learning engineer at the University of North Texas' High-Performance Computing Center for 2 years. He has been working in the research field of natural language processing for 5 years, spending the last 3 years working with transformer models. His research interests are in dialogue generation with persona.
He was a technical reviewer for the first edition of Transformers for Natural Language Processing by Denis Rothman.
He is currently working toward his doctoral thesis in causal dialogue generation with persona.
In his free time, George likes to share his knowledge of state-of-the-art language models with tutorials and articles and help other researchers in the field of NLP.
Join the book’s Discord workspace:
https://www.packt.link/Transformers