Preparing dataset

In the previous approach, we stored the dataset in a CSV file with two columns holding question and answer pairs. Here we need the data in a different format: each question is paired with its intent, as shown in the following screenshot. For example, the query 'hello' is labeled with the intent 'greet'. We label every question with its respective intent in the same way.

Once all the question variants and their intents are ready, we need to label the entities. As shown in the figure, this example has a 'location' entity with the value 'centre' and a 'cuisine' entity with the value 'Mexican'.

To feed this data to Rasa, we need to store it in a specific JSON format, which looks like this:

# intent_list : Only intent part
[
  {
    "text": "hey",
    "intent": "greet"
  },
  {
    "text": "hello",
    "intent": "greet"
  }
]
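
If the labeled queries already sit in a two-column CSV file, the intent list can be generated instead of typed by hand. The following is a minimal sketch, not part of Rasa itself: the column names text and intent, the helper name build_intent_list, and the sample rows are all assumptions made for illustration.

```python
import csv
import io


def build_intent_list(csv_file):
    """Turn rows with 'text' and 'intent' columns into intent examples.

    The column names are an assumption; adjust them to match your CSV.
    """
    reader = csv.DictReader(csv_file)
    return [
        {"text": row["text"].strip(), "intent": row["intent"].strip()}
        for row in reader
    ]


# In-memory stand-in for a CSV file on disk (hypothetical sample data).
sample_csv = io.StringIO("text,intent\nhey,greet\nhello,greet\n")
intent_list = build_intent_list(sample_csv)
```

For a real file, the same helper could be called with an open file handle, for example one returned by open() with newline="".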

# entity_list : Intent with entities
[
  {
    "text": "show me indian restaurants",
    "intent": "restaurant_search",
    "entities": [
      {
        "start": 8,
        "end": 14,
        "value": "indian",
        "entity": "cuisine"
      }
    ]
  }
]
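
Counting character offsets by hand is error-prone, so it can help to compute them. The sketch below assumes that start and end follow Python slice conventions, so that text[start:end] equals the entity value. The helper name make_entity_example is hypothetical, and the lookup only finds the first occurrence of each value in the text.

```python
def make_entity_example(text, intent, entities):
    """Build one entity example; `entities` maps entity name -> value.

    Offsets are computed so that text[start:end] == value
    (end is exclusive, as in a Python slice).
    """
    annotated = []
    for name, value in entities.items():
        start = text.find(value)  # first occurrence only
        if start == -1:
            raise ValueError(f"{value!r} not found in {text!r}")
        annotated.append({
            "start": start,
            "end": start + len(value),
            "value": value,
            "entity": name,
        })
    return {"text": text, "intent": intent, "entities": annotated}


example = make_entity_example(
    "show me indian restaurants", "restaurant_search", {"cuisine": "indian"}
)
entity = example["entities"][0]
# Sanity check: the slice must reproduce the labeled value.
assert example["text"][entity["start"]:entity["end"]] == entity["value"]
```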

The final version of the JSON should have this structure:

{
  "rasa_nlu_data": {
    "entity_examples": [entity_list],
    "intent_examples": [intent_list]
  }
}

To simplify this, there is an online tool where you can enter and annotate all the data and then download it as JSON. You can run the editor locally by following the instructions at https://github.com/RasaHQ/rasa-nlu-trainer, or simply use the online version at https://rasahq.github.io/rasa-nlu-trainer/.

Save this JSON file as restaurant.json in the current working directory.
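
If you build the lists in Python rather than with the online tool, the file can be assembled and written with the standard json module. This is a minimal sketch under the structure shown above; the two short lists are placeholder data, and reloading the file at the end is only a check that the output is valid JSON.

```python
import json

# Placeholder data; in practice these lists hold all labeled examples.
intent_list = [
    {"text": "hey", "intent": "greet"},
    {"text": "hello", "intent": "greet"},
]
entity_list = [
    {
        "text": "show me indian restaurants",
        "intent": "restaurant_search",
        "entities": [
            # text[8:14] == "indian"
            {"start": 8, "end": 14, "value": "indian", "entity": "cuisine"}
        ],
    }
]

training_data = {
    "rasa_nlu_data": {
        "entity_examples": entity_list,
        "intent_examples": intent_list,
    }
}

# Write the file into the current working directory.
with open("restaurant.json", "w", encoding="utf-8") as f:
    json.dump(training_data, f, indent=2)

# Reload to confirm the file parses as valid JSON.
with open("restaurant.json", encoding="utf-8") as f:
    reloaded = json.load(f)
```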
