BBC dataset

In 2006, Greene and Cunningham collected the BBC dataset to study a particular document-clustering challenge using support vector machines. The dataset consists of 2,225 documents from the BBC News website, corresponding to stories from 2004 to 2005 in five topical areas: business, entertainment, politics, sport, and technology. The dataset is available at http://mlg.ucd.ie/datasets/bbc.html.

We can download the raw text files under the Dataset: BBC section. The website also offers an already preprocessed version of the dataset, but for this example we want to process the raw text ourselves. The ZIP archive contains five folders, one per topic, and each document is placed in the folder of its topic, as shown in the following screenshot.
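Because the topic label of each document is simply the name of the folder that contains it, the raw files can be turned into (label, text) pairs by walking the directory tree. The following Python sketch assumes the archive has been extracted to a local folder with one sub-folder per topic holding `.txt` files. The function names `load_labeled_documents` and `topic_counts` are illustrative, not part of any library, and reading with `errors="replace"` is a defensive assumption in case a file is not valid UTF-8.

```python
from collections import Counter
from pathlib import Path


def load_labeled_documents(root):
    """Read a corpus laid out as root/<topic>/<document>.txt.

    Returns a list of (topic, text) pairs; the topic label is the name
    of the folder that holds each document (layout is an assumption).
    """
    root = Path(root)
    documents = []
    # Each sub-directory of root is treated as one topic; plain files at
    # the top level (for example, a README) are skipped.
    for topic_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Only .txt files are treated as documents, in sorted order.
        for doc_path in sorted(topic_dir.glob("*.txt")):
            # Assumption: decode as UTF-8 and replace undecodable bytes
            # instead of failing on the whole corpus.
            text = doc_path.read_text(encoding="utf-8", errors="replace")
            documents.append((topic_dir.name, text))
    return documents


def topic_counts(documents):
    """Count how many documents belong to each topic label."""
    return Counter(topic for topic, _ in documents)
```

For instance, `topic_counts(load_labeled_documents("bbc"))`, where `bbc` is a hypothetical path to the extracted folder, should list the five topics, with counts summing to 2,225 if the archive matches the description above. Keeping the label in the folder name means no separate annotation file is needed.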

Now, let's build a topic classifier.
