Chapter 2. Using the OpenAI API

Even though GPT-3 is the most sophisticated and complex language model in the world, its capabilities are abstracted as a simple “text-in, text-out” interface to end users. This chapter will get you started with using that interface, the Playground, and cover the technical nuances of the GPT-3 API, because it is always the details that reveal the true gems.

Warning

In order to work through this chapter, you will need access to the GPT-3 API. We showed you how to apply in Chapter 1; if you haven’t done that, please do so now. When your access is granted, come back. We’ll be here!

Components of the OpenAI API

Components of API
Figure 2-4. Components of API

Figure 2-4 summarizes the components; here is a one-line description of each:

Execution engine:

Determines the language model used to generate the completion.

Response Length:

The response length sets a limit on how much text the API includes in its completion.

Temperature & Top P:

The temperature dial controls the randomness of the response, represented as a range from 0 to 1. Top P controls how many random results the model should consider for completion, as suggested by the temperature dial; it determines the scope of randomness.

Frequency & Presence Penalty:

The frequency penalty decreases the likelihood that the model will repeat the same line verbatim by “punishing” it. The presence penalty increases the likelihood that it will talk about new topics.

Best Of:

The Best Of parameter lets you specify the number of completions (n) to generate on the server side; the API returns the best of those n completions.

Stop Sequence:

A stop sequence is a set of characters that signals the API to stop generating completions.

Inject Start & Restart Text:

The inject start text and inject restart text parameters allow you to insert text at the beginning or end of the completion, respectively.

Show Probabilities:

Show Probabilities lets you debug the text prompt by showing the probability of tokens that the model can generate for a given input.

That’s a quick overview of the components of the GPT-3 API. We will discuss each of them in more detail throughout the chapter.

Execution Engine

The execution engine determines the language model used for execution. Choosing the right engine is the key to determining your model’s capabilities and, in turn, getting the right output. GPT-3 comes with four execution engines of varying sizes and capabilities: Davinci, Ada, Babbage, and Curie. Davinci is the most powerful and the Playground’s default.

Response Length

The response length sets a limit on how much text the API includes in its completion. Because OpenAI charges by the length of text generated per API call (as noted, this is translated into tokens, or numeric representations of words), response length is a crucial parameter for anyone on a budget. A higher response length will use more tokens and cost more. For example, if you are doing a classification task, it is not a good idea to set the response length dial to 100: the API could generate irrelevant text and use extra tokens that will incur charges on your account.

Temperature and Top P

The temperature dial controls the randomness of the response, represented as a range from 0 to 1. A lower temperature means the model plays it safe, sticking close to the most likely completion; a higher value means the model weighs a wider range of possible responses that could fit the context before producing the result.

Top P controls how many random results the model should consider for completion, as suggested by the temperature dial; it determines the scope of randomness. Top P’s range is from 0 to 1. A value close to zero means the random responses will be limited to a small fraction of the most likely options: for example, if the value is 0.1, only responses within the top 10% of the probability mass will be considered for completion. This makes the engine more deterministic, meaning it will tend to generate the same output for a given input text. If the value is set to 1, the API will consider all responses for completion, taking risks and coming up with creative responses. A lower value limits randomness; a higher value expands its horizons.

Temperature and Top P have a very significant effect on output, and it can be confusing to get your head around when and how to use them to get the output you want. The two interact: changing the value of one will affect the behavior of the other. So, by setting Top P to 1, you can allow the model to unleash its creativity by exploring the entire spectrum of responses and control the randomness with the temperature dial.

Note

We always advise changing either Top P or temperature and keeping the other dial set at 1.

Large language models rely on probabilistic approaches rather than conventional logic. They can generate a variety of responses for the same input, depending on how you set the model’s parameters. The model tries to find the best probabilistic match within the universe of data it has been trained on, instead of looking for a perfect solution every time.

GPT-3’s universe of training data is almost all the content on the Internet, so it can generate a wide variety of completions for the same prompt. That’s where temperature and Top P, sometimes called the “creativity dials,” come in: you can tune them to generate more natural or abstract responses with an element of playful creativity.

Let’s say you are going to use GPT-3 to generate names for your startup. You can set the temperature dial to a higher level to get the most creative response. When we were spending days and nights trying to come up with the perfect name for our startup, we dialed up the temperature. GPT-3 came to the rescue and helped us to arrive at a name we love: Kairos Data Labs.

On other occasions, your task might require little to no creativity: classification and question-answering tasks, for example. For these, keep the temperature lower.

Let’s look at a simple classification example that categorizes companies into general buckets or categories based on their names. As you can see in Figure 2-5, we have again used temperature to control the degree of randomness. You can also do this by changing Top P while keeping the temperature dial set to 1.

Temperature component
Figure 2-5. Temperature component
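
If you are calling the API programmatically rather than through the Playground, these dials map directly onto request parameters. Here is a minimal sketch in Python using the requests library and the Completions endpoint signature described later in this chapter; the engine name, prompt, and parameter values are illustrative assumptions, not a prescription.

import os
import requests

API_KEY = os.environ["OPENAI_API_KEY"]  # assumes your key is in an environment variable
ENGINE = "davinci"                       # illustrative engine choice

response = requests.post(
    f"https://api.openai.com/v1/engines/{ENGINE}/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "prompt": "Suggest three names for a data analytics startup:",
        "max_tokens": 64,     # response length
        "temperature": 0.9,   # high temperature for a creative naming task
        "top_p": 1,           # keep Top P at 1 and steer randomness with temperature
    },
)
print(response.json()["choices"][0]["text"])

For a classification task you would instead drop the temperature close to 0, keeping everything else the same.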

Frequency and Presence Penalties

Like the temperature and Top P dials, the frequency penalty and presence penalty dials consider text prompts (the previous completion plus the new input) instead of internal model parameters when deciding on output. Existing text thus influences the new completions. The frequency penalty decreases the likelihood that the model will repeat the same line verbatim by “punishing” it. The presence penalty increases the likelihood that it will talk about new topics.

These come in handy when you want to prevent the same completion text from being repeated across multiple completions. Although these dials are similar, there is one important distinction. The frequency penalty is applied if the suggested text output is repeated (for example, the model used the exact token in previous completions or during the same session) and the model chooses an old output over a new one. The presence penalty is applied if a token is present in a given text at all.
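
As a sketch, the two penalties are passed as numeric request parameters alongside the other dials; the values below are illustrative assumptions rather than recommended settings. (A combined example that also shows Best Of follows the next section.)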

Best Of

GPT-3 allows you to generate multiple completions on the server side and evaluate them behind the scenes, so it can provide you with the best probabilistic result. Using the “best of” parameter, you can specify the number of completions (n) to generate on the server side. The model will return the best of n completions.

This allows you to evaluate multiple prompt completions in a single API call rather than calling the API repeatedly to check the quality of different completions for the same input. However, using “best of” is expensive: it costs n times the tokens in the prompt. For example, if you select the best-of value as 2, then you will be charged double the tokens present in the input prompt, because in the backend the API will generate two completions and show you the best one.

We advise keeping response length minimal while generating multiple prompts using the best-of parameter to avoid additional charges.
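
Here is a hedged sketch of a request that combines the penalty dials from the previous section with the best_of parameter, keeping the response length small as advised; the engine, prompt, and values are assumptions for illustration.

import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/engines/davinci/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "prompt": "Write a one-line slogan for a note-taking app:",
        "max_tokens": 20,           # keep the response length minimal
        "best_of": 3,               # generate 3 completions server-side, return the best one
        "frequency_penalty": 0.5,   # discourage verbatim repetition
        "presence_penalty": 0.5,    # nudge the model toward new topics
    },
)
print(resp.json()["choices"][0]["text"])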

Stop Sequence

A stop sequence is a set of characters that signals the API to stop generating completions. This helps avoid using unnecessary tokens, an essential cost-saving feature for regular users.

You can provide up to 4 sequences for the API to stop generating further tokens.

Let’s look at the example language translation task in Figure 2-6 to understand how the stop sequence works. In this example, English phrases are being translated into French. We use the restart sequence “English:” as a stop sequence: whenever the API encounters that phrase, it will stop generating new tokens.

Stop Sequence component
Figure 2-6. Stop Sequence component
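
Outside the Playground, the same behavior comes from the stop parameter of the Completions endpoint. The sketch below is illustrative: the engine name and prompt text are assumptions, and the stop list mirrors the “English:” restart sequence used in Figure 2-6.

import os
import requests

prompt = (
    "English: How are you?\n"
    "French: Comment allez-vous?\n"
    "English: Where is the train station?\n"
    "French:"
)

resp = requests.post(
    "https://api.openai.com/v1/engines/davinci/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "prompt": prompt,
        "max_tokens": 60,
        "temperature": 0,
        "stop": ["English:", "\n\n"],  # up to 4 sequences; generation halts when one appears
    },
)
print(resp.json()["choices"][0]["text"])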

Inject Start Text and Inject Restart Text

The inject start text and inject restart text parameters allow you to insert text at the beginning or end of the completion, respectively. You can use them to keep a desired pattern going. Often, these settings work in tandem with the stop sequence, as in our example. The prompt has the pattern where an English sentence is provided with the prefix “English:” (the restart text) and the translated output is generated with the prefix “French:” (the start text). As a result, anyone can easily distinguish between the two and create a training prompt that both the model and the user can clearly comprehend.

Whenever we run the model on such prompts, it automatically injects the start text “French:” before the output and the restart text “English:” before the next input, so that this pattern is sustained.

Show Probabilities

The “show probabilities” parameter is at the bottom of the Playground’s settings pane. In conventional software engineering, developers use a debugger to troubleshoot (debug) a piece of code. You can use the “show probabilities” parameter to debug your text prompt. Whenever you select this parameter, you will see highlighted text. Hovering over it with the cursor will show a list of tokens that the model can generate for the particular input specified, with their respective probabilities.

You can use this parameter to examine your options. In addition, it can make it easier to see alternatives that might be more effective. The “show probabilities” parameter has three settings:

Most Likely

Lists the tokens most likely to be considered for completion, in decreasing order of probability.

Least Likely

Lists the tokens least likely to be considered for completion, in decreasing order of probability.

Full Spectrum

Shows the entire universe of tokens that could be selected for completion.

Let’s look at this parameter in the context of a simple prompt. We want to start the output sentence with a simple, well-known phrase: “Once upon a time”. We provide the API with the prompt “Once upon a” and then check the Most Likely option in the Show Probabilities tab.

Show Probabilities component   most likely tokens
Figure 2-7. Show Probabilities component – most likely tokens

As Figure 2-7 shows, it generates “time” as the response. Because we have set the “show probabilities” parameter to “Most Likely”, the API shows not only the response but a list of possible options along with their probabilities.
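
The programmatic equivalent of “show probabilities” is the logprobs parameter of the Completions endpoint, which returns the log probabilities of the most likely tokens at each position. Here is a minimal sketch; the engine and parameter values are assumptions, and the exact response structure is worth double-checking against the API reference.

import math
import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/engines/davinci/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "prompt": "Once upon a",
        "max_tokens": 1,
        "temperature": 0,
        "logprobs": 5,   # return the 5 most likely tokens with their log probabilities
    },
)
choice = resp.json()["choices"][0]
print("Completion:", choice["text"])
for token, logprob in choice["logprobs"]["top_logprobs"][0].items():
    print(f"{token!r}: {math.exp(logprob):.2%}")  # convert log probability to a percentage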

Now that you’ve had an overview, let’s look at these components in more detail.

Execution Engines

As noted above, the OpenAI API offers four different execution engines, differentiated by number of parameters and performance capabilities. Execution engines are what powers the OpenAI API, serving as “autoML” solutions that provide automated ML methods and processes to make machine learning available for non-experts. They are easy to configure and adapt to a given dataset and task.

The four primary execution engines are named after famous scientists, in alphabetical order: Ada (named after Ada Lovelace), Babbage (named after Charles Babbage), Curie (named after Madame Marie Curie), and Davinci (named after Leonardo da Vinci). Let’s take a deep dive into each of them to understand when to use which engine when working with GPT-3.

Davinci

Davinci is the largest execution engine and the default when you first open the Playground. It can do anything the other engines can, often with fewer instructions and better outcomes. However, the trade-off is that it costs more per API call and is slower than the other engines. You might want to use other engines to optimize costs and run times.

Note

When testing new ideas and prompts, we recommend starting with Davinci because of its superior capabilities. Experimenting with Davinci is a great way to find out what the API is capable of doing. You can then slowly move down the ladder to optimize budgets and runtimes as you become comfortable with the problem statement. Once you have an idea of what you want to accomplish, you can either stay with Davinci (if cost and speed are not a concern) or you can move on to Curie or other less costly engines and try to optimize the output around its capabilities. You can use OpenAI’s Comparison Tool to generate an Excel spreadsheet that compares engines’ outputs, settings and response times.

Davinci should be your first choice for tasks that require understanding of the content, like summarizing meeting notes or generating creative ad copy. It’s great at solving logic problems and explaining the motives of fictional characters. It can write a story! Davinci has been able to solve some of the most challenging AI problems involving cause and effect.

Curie

Curie aims to find an optimal balance between power and speed, which is very important for performing high-frequency tasks like classification on a very large scale or putting a model into production.

Curie is also quite good at answering questions, performing Q&As, and serving as a general-purpose chatbot. For instance, if you are building a customer-support chatbot, you might choose Curie to serve high-volume requests faster.

While Davinci is stronger at analyzing complicated texts, Curie can perform with low latency and lightning-fast speed. It is always sensible to figure out what your use case is and do a cost-benefit analysis before choosing Davinci over Curie in production.

Babbage

Babbage is faster than Curie but not capable of performing tasks that involve understanding complex intent. However, it is quite capable and is preferable when it comes to semantic search rankings and analyzing how well documents match up with search queries. It’s less expensive than Curie and Davinci and is a preferred choice for simple problems involving frequent API calls.

Ada

Ada is the fastest and least expensive of all the engines available. It can perform simple tasks that do not require a nuanced understanding of context, like parsing text, correcting grammar, or simple classification. It is often possible to improve Ada’s performance by providing more context with the input.

Instruct Model Series

Building on the four primary models, OpenAI has also launched an Instruct series of models that are better at understanding instructions and following them to produce the specific output you need. All you have to do is tell the model what you want it to do, and it will do its best to fulfill your instructions.

Currently there are two instruct series models, davinci-instruct-beta and curie-instruct-beta, built on top of the base models Davinci and Curie. With clear instructions, these models can produce better results than their base counterparts. This series is important in bridging the gap between how humans think and how models operate.

Endpoints

The Playground is a graphical web interface that calls the OpenAI API behind the scenes, but there are several other ways to call the API. To do this, you will need to get familiar with its endpoints: the remote APIs that communicate back and forth when they are called. In this section, you will get familiar with the functionality and usage of eight API endpoints.

List Engines

The list engine API, also known as the “metadata API,” provides a list of available engines along with specific metadata associated with each engine, such as owner and availability. To access it, you can hit the following URI with the HTTP GET method without passing any request parameters:

GET https://api.openai.com/v1/engines
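
For example, here is a quick sketch of calling this endpoint from Python with the requests library, assuming your API key is stored in the OPENAI_API_KEY environment variable; the response fields shown are based on the metadata described above and are worth verifying against the API reference.

import os
import requests

resp = requests.get(
    "https://api.openai.com/v1/engines",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
)
for engine in resp.json()["data"]:
    print(engine["id"])  # e.g. ada, babbage, curie, davinci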

Retrieve Engines

When you provide an engine name to the retrieve engine API, it returns detailed metadata about that engine. To access it, hit the following URI with the HTTP GET method without passing any request parameters:

GET https://api.openai.com/v1/engines/{engine_id}

Completion

Completion is GPT-3’s most famous and widely used API endpoint. It simply takes in the text prompt as input and returns the completed response as output. It uses the HTTP POST method and requires an engine ID as part of the URI path. As part of the HTTP Body, the Completions endpoint accepts several of the additional parameters discussed in the previous section. Its signature is:

POST https://api.openai.com/v1/engines/{engine_id}/completions
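
As a sketch, you could wrap this endpoint in a small helper function with Python’s requests library; the engine default and parameter values are illustrative assumptions, not part of any official client.

import os
import requests

def complete(prompt, engine="davinci", **params):
    """Call the Completions endpoint and return the first generated text."""
    resp = requests.post(
        f"https://api.openai.com/v1/engines/{engine}/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"prompt": prompt, **params},
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"]

print(complete("Once upon a time", max_tokens=32, temperature=0.7))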

Semantic Search

The semantic search endpoint allows you to provide a query in natural language to search a set of documents, which can be words, sentences, paragraphs, or even longer texts. It will score and rank the documents based on how semantically related they are to the input query. For example, if you provide the documents ["school", "hospital", "park"] and the query "the doctor", you’ll get a different similarity score for each document.

The similarity score is a positive score that usually ranges from 0 to 300 (but can sometimes go higher), where a score above 200 usually indicates that the document is semantically similar to the query. The higher the similarity score, the more semantically similar the document is to the query (in this example, “hospital” will be most similar to “the doctor”). You can provide up to 200 documents as part of your request to the API.1

Following is the signature for the semantic search endpoint:

POST https://api.openai.com/v1/engines/{engine_id}/search
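
Here is a sketch of a request for the documents-and-query example above; the engine choice and the response field names are assumptions based on the description here, so treat them as illustrative and check the API reference.

import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/engines/davinci/search",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "documents": ["school", "hospital", "park"],
        "query": "the doctor",
    },
)
for result in resp.json()["data"]:
    # each result holds the index of a document and its similarity score
    print(result["document"], result["score"])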

Files Endpoint

The files endpoint can be used across different features, such as Answers, Classification, and Semantic Search. It uploads documents or files to OpenAI storage, which is then accessible throughout the API. The same endpoint can be used with different signatures to perform the following tasks (a combined sketch follows the four signatures):

List Files

It simply returns the list of files that belong to the user’s organization or that are linked to a particular user’s account. It’s an HTTP GET call which doesn’t require any parameters to be passed with the request.

GET https://api.openai.com/v1/files

Upload Files

It uploads a file containing documents to be used across various endpoints. The documents are stored in the internal space OpenAI allocates for the user’s organization. It is an HTTP POST call that requires the file path to be added to the API request.

POST https://api.openai.com/v1/files

Retrieve File

It returns information about a specific file when you provide the file ID as the request parameter. Following is the signature for the Retrieve endpoint:

GET https://api.openai.com/v1/files/{file_id}

Delete File

It deletes a specific file when you provide the file ID as the request parameter. Following is the signature for the Delete endpoint:

DELETE https://api.openai.com/v1/files/{file_id}
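
Here is a hedged sketch of the four calls with Python’s requests library. The file name and the purpose value are illustrative assumptions; the API expects a purpose describing how the uploaded file will be used.

import os
import requests

headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
base = "https://api.openai.com/v1/files"

# Upload Files: send the file contents plus a purpose describing its intended use
with open("documents.jsonl", "rb") as f:
    uploaded = requests.post(
        base, headers=headers, files={"file": f}, data={"purpose": "search"}
    ).json()
file_id = uploaded["id"]

# List Files
print(requests.get(base, headers=headers).json()["data"])

# Retrieve File
print(requests.get(f"{base}/{file_id}", headers=headers).json())

# Delete File
requests.delete(f"{base}/{file_id}", headers=headers)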

Classification (Beta)

The classification endpoint lets you leverage a labeled set of examples without fine-tuning. It classifies the query using the provided examples, thereby avoiding fine-tuning and, in turn, eliminating the need for hyperparameter tuning. You can use it for virtually any machine learning classification task.

The endpoint provides an easy-to-configure “autoML” solution that can easily adapt to a changing label schema. You can provide up to 200 labeled examples as part of the request, or reference a pre-uploaded file at query time. In addition to the URI path, the API endpoint requires a model and a query, along with examples. Its signature is:

POST https://api.openai.com/v1/classifications
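
A sketch of a Classifications request with a handful of in-line labeled examples follows. The model choice, example texts, and exact field names are assumptions intended to illustrate the shape of the call; since this endpoint is in beta, check the documentation for the current schema.

import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/classifications",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "curie",
        "query": "The new update is fantastic!",
        "examples": [
            ["Great speed, terrible battery life", "Negative"],
            ["I love how light this laptop is", "Positive"],
        ],
        "labels": ["Positive", "Negative"],
    },
)
print(resp.json()["label"])  # the predicted label for the query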

Answers (Beta)

GPT-3’s question-answering endpoint is still in beta as of this writing in late 2021. When given a question, it generates answers based on information provided in a set of documents or training examples.

For example, if you want to implement a QA endpoint on a set of PDFs, you just upload them using the files endpoint and provide the file ID with the request parameters. The answers endpoint will use those files as the context to answer any query. It also allows you to steer the model’s contextual tone and responses by providing a list of (question, answer) pairs in the form of training examples. It first searches the provided documents or examples to find the relevant context, then combines it with relevant examples and questions to generate a response. Its signature is:

POST https://api.openai.com/v1/answers
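
As with the classification sketch, here is an illustrative Answers request using in-line documents and a single training example; the field names and values are assumptions about this beta endpoint, so verify them against the documentation before relying on them.

import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/answers",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "curie",
        "question": "When will the book be released?",
        "documents": ["The book will be released by O'Reilly in early 2022."],
        "examples_context": "The API was opened to the public in 2021.",
        "examples": [["When did the API open to the public?", "In 2021."]],
        "max_tokens": 20,
    },
)
print(resp.json()["answers"][0])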

GPT-3 Fine-tuning

OpenAI’s research on Process for Adapting Language Models to Society (PALMS) led it to launch a first-of-its-kind fine-tuning endpoint that allows you to get more out of GPT-3 than was previously possible. GPT-3 has been pre-trained on a vast amount of text from the open internet. When given a prompt with just a few examples, it can often intuit what task you are trying to perform and generate a plausible completion. This is called “few-shot learning,” as you learned in Chapter 1.

Fine-tuning is about tweaking the whole model so that it performs every time in the way you wish it to perform. Unlike few-shot learning, which requires you to give models a few examples every time you want to perform a certain task, fine-tuning is about providing many more examples of a desired output. This provides more stable performance over a broader spectrum of inputs, handy when putting a model into production.

Once a model has been fine-tuned, you won’t need to provide examples in the prompt anymore. This saves costs, decreases response times, and increases the quality and reliability of the outputs.

At a very high level, using the fine-tuning endpoint involves the following steps:

  • Prepare new training data and upload it to the OpenAI server

  • Fine-tune the existing models with the new training data

  • Use the fine-tuned model

Prepare and upload training data

Training data is what the model takes in as input for fine-tuning. Your training data must be a JSONL document, where each line is a prompt-completion pair corresponding to a training example. For model fine-tuning you can provide an arbitrary number of examples, though it is highly recommended that you create a value-targeted dataset that gives the model quality data and wide representation. Fine-tuning improves performance with more examples, so the more examples you provide, the better the outcome.

Your JSONL document should look something like this:

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
…

Here, the prompt text should include the exact prompt you want to complete, and the ideal generated text should include an example of the desired completion that you want GPT-3 to generate.

You can use OpenAI’s CLI data preparation tool to easily convert your data into this file format. The CLI data preparation tool accepts files in different formats, with the only requirement that they contain a prompt and a completion column/key. You can pass a CSV, TSV, XLSX, JSON, or JSONL file, and it will save the output into a JSONL file ready for fine-tuning. To do so, use the following command:

openai tools fine_tunes.prepare_data -f <LOCAL_FILE>

Where LOCAL_FILE is the file you prepared for conversion.

Train a new fine-tuned model

Once you prepare your training data as described above, you can move on to the fine-tuning job with the help of the OpenAI CLI. For that, you need the following command:

openai api fine_tunes.create -t <TRAIN_FILE_ID_OR_PATH> -m <BASE_MODEL>

Where BASE_MODEL is the name of the base model you’re starting from (ada, babbage, or curie). Running this command does several things:

  • Uploads the file using the files endpoint (as discussed earlier in this chapter);

  • Fine-tunes the model using the request configuration from the command;

  • Streams the event logs until the fine-tuning job is completed.

Log streaming is helpful for understanding what’s happening in real time and for responding to any incidents or failures as they happen. The job may take anywhere from minutes to hours, depending on the number of jobs in the queue and the size of your dataset.

Using the fine-tuned model

Once the model is successfully fine-tuned, you can start using it! You can now specify this model as a parameter to the Completion Endpoint, and make requests to it using the Playground.

Note

After the fine-tuning job completes, it may take several minutes for your model to become ready to handle requests. If completion requests to your model time out, it is likely because your model is still being loaded. If this happens, try again in a few minutes.

You can start making requests by passing the model name as the model parameter of a completion request using the following command:

openai api completions.create -m <FINE_TUNED_MODEL> -p <YOUR_PROMPT>

Here, FINE_TUNED_MODEL is the name of your fine-tuned model and YOUR_PROMPT is the prompt you want to complete in this request.

You can continue to use all the Completion Endpoint parameters that were discussed in this chapter, like temperature, frequency_penalty, presence_penalty, etc., on these requests to the newly fine-tuned model as well.

Note

No engine is specified on these requests. This is the intended design and something that OpenAI plans on standardizing across other API endpoints in the future.
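
The same request can be made from Python. Here is a sketch using the requests library and the engine-less completions URI described in the note above; the model name is a placeholder for whatever name your fine-tuning job returned, and the parameter values are illustrative.

import os
import requests

FINE_TUNED_MODEL = "<FINE_TUNED_MODEL>"  # the name returned by your fine-tuning job

resp = requests.post(
    "https://api.openai.com/v1/completions",   # note: no engine in the path
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": FINE_TUNED_MODEL,
        "prompt": "Your prompt here",
        "max_tokens": 64,
        "temperature": 0.5,
    },
)
print(resp.json()["choices"][0]["text"])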

Tokens

Before diving deeper into how different prompts consume tokens, let’s look more closely at what a token is.

We’ve told you that tokens are numerical representations of words or characters. Using tokens as a standard measure, GPT-3 can handle training prompts from a few words to entire documents.

For regular English text, 1 token consists of approximately 4 characters. That translates to roughly ¾ of a word, so 100 tokens will contain approximately 75 words. As a point of reference, the collected works of Shakespeare consist of about 900,000 words, roughly translating to 1.2 million tokens.

To maintain the latency of API calls, OpenAI imposes a limit of 2,048 tokens (approximately 1,500 words) for prompts and completions.

To further understand how tokens are calculated and consumed in the context of GPT-3, and to stay within the limits set by the API, let us walk you through the ways you can measure token count.

In the Playground, as you enter text into the interface, you can see the token count update in real time in the footer at the bottom right. It displays the number of tokens that will be consumed by the text prompt after you hit the submit button. You can use it to monitor your token consumption every time you interact with the Playground (see Figure 2-8).

Token Count in the Playground
Figure 2-8. Token Count in the Playground

The other way to measure token consumption is the GPT-3 Tokenizer tool (Figure 2-9), which lets you visualize how words are broken into tokens. You can interact with the Tokenizer via a simple text box: write the prompt text, and the Tokenizer will show you the token and character counts along with a detailed visualization.

Tokenizer by OpenAI
Figure 2-9. Tokenizer by OpenAI

To integrate the token count metric into your API calls to different endpoints, you can pass the logprobs and echo attributes along with the API request to get the full list of tokens consumed.
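
For example, here is a sketch that counts the tokens in a prompt without generating any new text, using echo to return the prompt’s own tokens and logprobs to expose them; the engine choice and the exact response layout are assumptions to verify against the API reference.

import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/engines/ada/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "prompt": "The collected works of Shakespeare",
        "max_tokens": 0,   # generate nothing new
        "echo": True,      # echo the prompt back in the completion
        "logprobs": 0,     # include token-level details for the echoed prompt
    },
)
tokens = resp.json()["choices"][0]["logprobs"]["tokens"]
print(len(tokens), "tokens:", tokens)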

In the next section we will cover how tokens are priced based on the different execution engines.

Pricing

In the last section we talked about tokens, the smallest fungible unit OpenAI uses to determine pricing for API calls. Tokens allow greater flexibility than counting the words or sentences used in the training prompt, and thanks to their sheer granularity they can easily be processed and used to measure pricing for a wide range of prompts.

Every time you call the API, whether from the Playground or programmatically, behind the scenes the API calculates the number of tokens used in the training prompt along with the generated completion and charges each call on the basis of the total number of tokens used.

At the time of writing this book, the pricing is as follows:

Table 2-1. OpenAI API pricing by model
Model                      Price per 1K tokens
Davinci (most powerful)    $0.0600
Curie                      $0.0060
Babbage                    $0.0012
Ada (fastest)              $0.0008

The company works on a “pay as you go” cloud pricing model. For up-to-date pricing, please check https://beta.openai.com/pricing.

OpenAI generally charges a flat fee per 1k tokens and the fee may vary depending on the type of execution engine used in the API call. Davinci is the most powerful and expensive, while Curie, Ada and Babbage are cheaper and faster.
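
As a quick back-of-the-envelope sketch, you can estimate the cost of a call by multiplying the total token count (prompt plus completion) by the per-1K-token price of the engine; the prices in the code below simply restate Table 2-1, and the token counts are made-up inputs.

# Per-1K-token prices from Table 2-1
PRICE_PER_1K = {"davinci": 0.06, "curie": 0.006, "babbage": 0.0012, "ada": 0.0008}

def estimate_cost(prompt_tokens, completion_tokens, engine="davinci"):
    """Rough cost estimate in dollars for a single API call."""
    total_tokens = prompt_tokens + completion_tokens
    return total_tokens / 1000 * PRICE_PER_1K[engine]

# A 500-token prompt with a 100-token completion on Davinci:
print(f"${estimate_cost(500, 100, 'davinci'):.4f}")  # $0.0360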

Instead of monitoring the tokens for each of the API calls, OpenAI provides a reporting dashboard to monitor daily cumulative token usage. Depending on your usage, it may look something like Figure 2-10.

API Usage Dashboard
Figure 2-10. API Usage Dashboard

In Figure 2-10 you can see a bar graph showing daily token consumption from API usage. The dashboard helps you monitor token usage and pricing for your organization, so you can regulate API usage and stay within your budget. There is also an option to monitor cumulative usage and a breakdown of token count per API call. This should give you enough flexibility to create policies around token consumption and pricing for your organization. Now that you understand the ins and outs of the Playground and the API, let’s take a look at GPT-3’s performance on typical language modelling tasks.

Performance on Conventional NLP Tasks

GPT-3 is a highly advanced and sophisticated successor to earlier NLP models, built and trained using core NLP approaches and deep neural networks. For any AI-based modeling approach, performance is evaluated in the following way: first you train the model for a specific task (like classification, Q&A, or text generation) on training data; then you verify the model’s performance on test data (unseen data).

In a similar way, there is a standard set of NLP benchmarks for evaluating the performance of NLP models and coming up with a relative model ranking or comparison. This comparison, or relative ranking, allows you to pick and choose the best model for a specific NLP task (business problem).

Conventional NLP Tasks
Figure 2-11. Conventional NLP Tasks

In this section we will discuss the performance of GPT-3 on some standard NLP tasks and compare it with the performance of similar models on respective NLP tasks.

Text Classification

Text classification is the process of categorizing the text into organized groups. By using NLP, text classification can automatically analyze text and then assign a set of predefined tags or categories based on its context.

Text classification involves analyzing the text provided as input and assigning it a label, score, or another attribute that characterizes it. Some common examples of text classification are sentiment analysis, entity recognition, and anomaly or outlier detection, which we discuss later in this section.

You can use a number of approaches to get GPT-3 to classify text, ranging from zero-shot classification (where you don’t give the model any examples) to single-shot and few-shot classification (where you show the model some examples).

Zero-Shot Classification

Modern artificial intelligence has long aimed to develop models that can perform predictive functions on data they have never seen before. This important research area is called zero-shot learning. Similarly, zero-shot classification is a classification task where no prior training or fine-tuning is required for the model to classify a piece of text. GPT-3 currently produces results for unseen data that are either better than or on par with state-of-the-art AI models fine-tuned for that specific purpose. In order to perform zero-shot classification with GPT-3, we must provide it with a compatible prompt. We will discuss prompt engineering in more detail later in the book.

Here is an example of a zero-shot classification where the goal is to perform a fact-check analysis to determine if the information included in the tweet is correct or incorrect. Figure 2-12 shows a pretty impressive information correctness classification result based on a zero-shot example.

Zero shot classification example
Figure 2-12. Zero shot classification example

And here is our prompt:

Analyse the tweet in terms of information correctness.
Tweet: “More than 50% of global scientists don’t believe in climate change.”
Analysis:

And the output:

The tweet is incorrect.
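
The same prompt can be sent programmatically. Here is a hedged sketch with the requests library; the engine and parameter values are assumptions, with a low temperature because classification needs little creativity.

import os
import requests

prompt = (
    "Analyse the tweet in terms of information correctness.\n"
    'Tweet: "More than 50% of global scientists don\'t believe in climate change."\n'
    "Analysis:"
)

resp = requests.post(
    "https://api.openai.com/v1/engines/davinci/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={"prompt": prompt, "max_tokens": 20, "temperature": 0, "stop": ["\n"]},
)
print(resp.json()["choices"][0]["text"].strip())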

Single Shot/ Few Shots Classification

The other approach to text classification is to condition the model on a single training example or a handful of examples, also known as single-shot or few-shot text classification. When you provide examples of how to classify text, the model can learn about the object categories based on the samples you provide. This is a superset of zero-shot classification that allows you to classify text by providing the model with a few examples. It can be useful particularly for downstream use cases that require some level of context setting.

Let’s look at the following example of a few-shot classification. We are asking the model to perform tweet sentiment analysis and giving it three tweet examples to illustrate each of the possible labels: positive, neutral, and negative. As you can see in Figure 2-13, the model, equipped with this context from just a few examples, is able to perform sentiment analysis of the next tweet with ease.

Note

When you recreate prompt examples from the book, or create your own, make sure your prompt has adequate line spacing. An additional line after a paragraph can result in a very different outcome, so play with that and see what works best for you.

Here is our prompt:

Analyse the tweet in terms of its sentiment. Depending on the sentiment, classify it as positive, neutral, or negative.
Tweet: "I'm seriously worried that super intelligent AI will be disappointed in humans."
Sentiment analysis(positive, neutral, negative): negative
Tweet: "I cannot wait for super intelligent AI to emerge and deepen our understanding of the Universe."
Sentiment analysis(positive, neutral, negative): positive
Tweet: "I think it is neither super likely nor super unlikely that the super intelligent AI will emerge one day."
Sentiment analysis(positive, neutral, negative): neutral
Tweet: "Super intelligent AI is going to be the most exciting discovery in human history."
Sentiment analysis(positive, neutral, negative): 


And the output:

positive
Few shots classification example
Figure 2-13. Few shots classification example

Batch Classification

Now that you understand few-shot classification with GPT-3, let’s dive deeper into batch classification, which allows you to classify input samples in batches in a single API call instead of classifying just one example per call. It is suitable for applications where you want to classify multiple examples in a single go, just like the tweet sentiment analysis task we examined, but analysing a few tweets in a row.

As with few-shot classification, you want to provide enough context for the model to achieve the desired result, but in a batch configuration format. Here, we define the different categories of tweet sentiment classification using various examples in the batch configuration format. Then we ask the model to analyze the next batch of tweets.

Here is our prompt:

Analyse tweets in terms of their sentiment. Depending on their sentiment, classify them as positive, neutral, or negative.
Tweet: "I'm seriously worried that super intelligent AI will be disappointed in humans."
Sentiment analysis(positive, neutral, negative): negative
###
Tweet: "I cannot wait for super intelligent AI to emerge and deepen our understanding of the Universe."
Sentiment analysis(positive, neutral, negative): positive
###
Tweet: "I think it is neither super likely nor super unlikely that the super intelligent AI will emerge one day."
Sentiment analysis(positive, neutral, negative): neutral
###
Tweet: "Super intelligent AI is going to be the most exciting discovery in human history."
Sentiment analysis(positive, neutral, negative): positive
###
Tweet:
1. "I'm seriously worried that super intelligent AI will be disappointed in humans."
2. "I cannot wait for super intelligent AI to emerge and deepen our understanding of the Universe."
3. "I think it is neither super likely nor super unlikely that the super intelligent AI will emerge one day."
4. "Super intelligent AI is going to be the most exciting discovery in human history."
5. "This is the latest report on the state of the AI"
Tweet sentiment:
1. negative
2. positive
3. neutral
4. positive
5. neutral
Tweet:
1. "I can't stand bad techno music"
2. "This is a tweet"
3. "I can't wait for going to the Moon!!!"
4. "AI is super cute ❤️"
5. "Got VERY ANGRY now!!! ?"
Tweet sentiment:
1.

And the output:

1. negative
2. neutral
3. positive
4. positive
5. negative

As you can see in Figure 2-14, the model recreated the batch sentiment analysis format and classified the tweets successfully. Now let’s move on to see how it performs at the Named Entity Recognition tasks.

Batch classification example  Part 1
Figure 2-14. Batch classification example (Part-1)
Batch classification example  Part 2
Figure 2-15. Batch classification example (Part-2)

Named Entity Recognition

Named entity recognition is an information extraction task that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, expressions of time, quantities, monetary values, percentages, etc.

NER helps make responses more personalized and relevant, but the current state-of-the-art approaches require massive amounts of data for training before you can even start with prediction. GPT-3, on the other hand, can work out of the box for recognizing general entities like people, places, and organizations without humans providing even a single training example.

In the following example we used the davinci-instruct-beta version of the model, which was in beta at the time of writing this book; OpenAI gathers prompts submitted to it to train and improve future API models. We gave it a simple task: extracting contact information from an example email. It successfully completed the task on the first attempt (Figure 2-16).

Here is our input:

Extract the name and mailing address from this email:
Shubham,
It was great to talk to you the other day!
I'm SO looking forward to start working on our book.
Here's my address 1307 Roosevelt Street, San Francisco CA	94107
Best,
Sandra Kublik
Name and mailing address:

And the output:

Sandra Kublik
1307 Roosevelt Street, San Francisco CA 94107
NER example
Figure 2-16. NER example

Text Summarization

Text summarization is a technique for generating a concise and accurate summary of lengthy texts while focusing on the sections that convey useful information, without losing the overall meaning. GPT-3-based text summarization aims to transform lengthy pieces of text into their condensed tl;dr2 versions. Such tasks are generally difficult and costly to accomplish manually. With GPT-3, it is a matter of a single input and a few seconds!

NLP models can be trained to comprehend documents and identify the sections that convey important facts and information before producing the required summarized texts. However, such models need a large amount of training samples before they can learn the context and start summarizing the unseen input.

GPT-3 solves this problem with its zero-shot and few-shot approach to text summarization. With GPT-3, there are multiple ways you can go about summarizing text, including basic summaries, one-line summaries, and grade-level summaries, depending on your use case. Let’s have a quick walkthrough of these approaches.

Text summarization example
Figure 2-17. Text summarization example

Most of the time the model is able to generate decent results in the form of a review summary, but sometimes it can output irrelevant results depending on the prior context. To avoid getting unwanted results, you can set the Best Of parameter to 3, which will always give you the best of three results generated by the API. In the example shown in Figure 2-17, after a few tries and minor parameter tweaking, we got decent results.

Here is our prompt:

The problem is that when it works, it works, but when it doesn’t, it really doesn’t. Luckily, the game’s low points like the horrible binary stealth system and the bugs and lack of QoL are either fixable or vastly outweighed by the good bits, and the overall tally still ends up being much higher than the average of many other games out there. That’s helped a lot by the moment-to-moment gameplay, which pays off just as much as the story beats; the movement system lets you climb almost any short object, vehicles are weighty to drive, and have fantastic interior and exterior designs, and guns feel properly loud and heavy. Its story and narrative kept me glued to the screen, and everything – from sex scenes and romance options to Keanu Reeves trying to kill you or help you throughout the game – are remarkably well done. Cyberpunk 2077 is a game crafted with love, and it shows.
tl;dr: 

And the output:

Cyberpunk 2077 is a game crafted with love, and it shows. The story is great, the gameplay is great, the worldbuilding is great. It's not perfect by any means, but it's still one of the best games I've played in years. 
The Good: 
- Great story and characters 
- Amazing worldbuilding 
- Excellent gunplay and movement system 
The Bad: 
- Binary stealth system (sometimes you're seen no matter what) - Bugs 

Since many people spend hours per day reading and writing emails, summarizing them is a widely desired use case for GPT-3. Let’s see how GPT-3 does with summarizing a three-paragraph email into one crisp line (Figure 2-18).

Email summarization example
Figure 2-18. Email summarization example

To achieve the desired result, we pasted the full email and then simply added “one-sentence summary:” at the end. We also included a “.” stop sequence to tell the model that it should stop its summary generation after a single sentence.

Our prompt:

Hey Greg,
We are Sandra and Shubham. We are the co-authors of a GPT-3 based book that aims to be a go-to manual for API users. It will provide the readers with the knowledge to get started with GPT-3 as well as the context of the growing ecosystem surrounding it. The book will be released by O’Reilly publishing house by early 2022.
We’d like to connect and explore the possibility of collaboration with OpenAI to evangelize access to API for people from all backgrounds and enable them to build exciting new solutions using GPT-3. We would appreciate your input to ensure we accurately represent OpenAI's stance on GPT-3 and inform our readers on the upcoming advancements/features of the API.
It will be great if we can catch up sometime this week to discuss it further?
Best,
S&S
one-sentence summary:

And the output:

We’re writing a book on GPT-3 and would like to connect with OpenAI to get a better understanding of the API and the future developments

Text Generation

Before the introduction of GPT-3, there was a general understanding that AI models were capable of carrying on short conversations with humans, answering specific questions, or handling specific tasks. However, the models were not sophisticated enough to handle complicated text generation tasks, and they started to lose track whenever they encountered something complex or abstract.

In the complicated world of natural language generation, GPT-3 has shaken the notion of language models being limited to trivial tasks. Text generation is the greatest strength of GPT-3. It is capable of generating textual content that is almost indistinguishable from human-written text. GPT-3 is trained on billions of words scraped from the internet to generate text in response to a variety of prompts. It generates an average of 4.5 billion words per day, according to OpenAI.

In the next example, we experiment with using GPT-3 to create content for a personal productivity app startup. We gave the model only minimal context. It generated many of the examples in Figure 2-19 on the first take.

Article Generation

Article generation example
Figure 2-19. Article generation example

Our prompt:

Here are the top 5 tips for personal productivity.

And the output:

1. Develop a routine 
Work can be overwhelming. You don’t need to remember every little detail. By having a routine, you will be able to work more efficiently. 
2. Avoid multitasking 
If you are multitasking, you are bound to miss something. It is better to work on one task at a time. 
3. Take breaks 
It is true that if you do not take breaks, you will be more productive. However, if you take breaks too often you will get tired and stressed. If you work more effectively, you will shorten the time needed to complete a task. 
4. Learn to delegate 
If you have a lot of work to do, you should not do everything on your own. Delegate some of the work to others. 
5. Prioritize 
Prioritize what you have to do and do the most important tasks first. Don’t worry about the small things.

Social media Post Generation

Social media post generation example
Figure 2-20. Social media post generation example

Here is our prompt:

Write a social media post for Facebook, Twitter, and Instagram about launching a new beta version of the personal productivity app.
Facebook post:

And the output:

“After extensive user testing, we are happy to announce that our latest update is available for download! Our team has worked hard to create the best version of our app yet. Download it now for free at www.app.com.” 
Twitter post: 
“We're rolling out an updated version of your favorite productivity app! Now available on iOS and Android. #productivity #beta”
Instagram post: 
“We just released a new beta version of your favorite productivity app! Download it today for free on iOS and Android. #productivity #beta”

Summary

In this chapter we covered the OpenAI Playground, prompt engineering, and the different components of the OpenAI API, followed by Playground examples covering the major NLP tasks. By now, you have an understanding of how the API works in tandem with its different components and how to use the Playground as a base to design and experiment with different training prompts.

In the next chapter, we will walk you through how to use GPT-3 with different programming languages to integrate the API within your product or build a completely new application from scratch.

1 For more than 200 documents, OpenAI offers a beta API.

2 A longstanding Internet abbreviation for “too long; didn’t read.”
