For analysis, you’ll commonly store tweets in:
CSV files—A file format that we introduced in the “Files and Exceptions” chapter.
pandas DataFrames in memory—CSV files can be loaded easily into DataFrames for cleaning and manipulation.
SQL databases—Such as MySQL, a free and open-source relational database management system (RDBMS).
NoSQL databases—Twitter returns tweets as JSON documents, so the natural way to store them is in a NoSQL JSON document database, such as MongoDB. Tweepy generally hides the JSON from the developer. If you’d like to manipulate the JSON directly, use the techniques we present in the “Big Data: Hadoop, Spark, NoSQL and IoT Databases” chapter, where we’ll look at the PyMongo library.
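The storage options above can be sketched with only the Python standard library. This is a minimal illustration under stated assumptions, not the chapter's own code: the sample tweets and their field names (id, screen_name, text) are hypothetical, the helper function names are made up for this example, and the built-in sqlite3 module stands in for a server-based RDBMS such as MySQL. JSON Lines text stands in for the documents you would insert into a JSON document database.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample records; real tweets carry many more fields.
tweets = [
    {"id": 1, "screen_name": "nasa", "text": "Launch day!"},
    {"id": 2, "screen_name": "python_dev", "text": "CSV, SQL or JSON?"},
]

FIELDS = ["id", "screen_name", "text"]


def tweets_to_csv(records):
    """Return the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def tweets_to_sqlite(records):
    """Insert the records into an in-memory SQLite table.

    SQLite is an assumption here; it stands in for MySQL so the
    example runs without a database server.
    """
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE tweets (id INTEGER PRIMARY KEY, screen_name TEXT, text TEXT)"
    )
    connection.executemany(
        "INSERT INTO tweets VALUES (:id, :screen_name, :text)", records
    )
    connection.commit()
    return connection


def tweets_to_json_lines(records):
    """Return one JSON document per line, the shape a document store expects."""
    return "\n".join(json.dumps(record) for record in records)


csv_text = tweets_to_csv(tweets)
connection = tweets_to_sqlite(tweets)
json_lines = tweets_to_json_lines(tweets)
```

CSV text written this way can later be saved to a file and loaded into a DataFrame with pandas' read_csv function. The SQL statements are simple enough that they should carry over to another RDBMS with little change, although the connection code and the placeholder syntax differ from driver to driver.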