Challenges

Flickr is one of the longest-standing social networks, and it has evolved over the years. Much like Flickr, we have progressed through this book and evolved our methods and techniques across chapters. Flickr presented its own set of challenges; the following is a quick summary:

  • API response objects: Flickr has a well-documented and regularly updated set of APIs which provide access to most of its publicly usable content. The challenge comes from the design and response format of these APIs. While the Flickr engineers must have put a lot of thought into the design, it poses difficulties for analytical use cases. Data related to a single entity is often spread across several API methods, so assembling a complete record takes multiple calls. Along the same lines, the response objects are deeply nested and require some thought and creativity before the data can be preprocessed and used for any analysis. Moreover, any change to the APIs may require extensive rework of the extraction and preprocessing code.
  • Lack of standard packages/libraries: The presence of packages/libraries from a social network platform or a third party doesn't just make life easy; it also helps keep things modular and maintain a separation of concerns. For a person working on data science related use cases, playing around with APIs is a basic requirement, yet standard packages speed up overall use case development and help us stay focused on solving business problems. Flickr provides a standard set of libraries for a variety of languages, though R is missing from the list. There are a couple of third-party R packages, but most of them are outdated or provide limited functionality. You can take this up as a challenge: come up with your own solutions using the learnings from this chapter and give back to the community.
  • Data quality: Data quality is a common pain point for any analytical/machine-learning/data science related use case. In the case of Flickr, apart from user-related information, a lot of information (read EXIF) is extracted by the platform from the uploaded photos. Since the EXIF standard only recommends its fields rather than mandating them, many attributes are missing or their values are not standardized. One should check for such gaps and inconsistencies before using this data.
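
The first and third points can be made concrete with a minimal sketch, written in Python for brevity. It assumes a hand-written sample that only loosely imitates a nested photo/EXIF response; the field names (`photo`, `exif`, `tag`, `raw`, `_content`) and the helpers `flatten`, `clean_value`, and `exif_record` are illustrative assumptions, not part of any Flickr library. The idea is to flatten nested objects into dotted keys and to turn missing or blank EXIF values into an explicit `None`, so that gaps are visible before analysis.

```python
from typing import Any, Dict, List, Optional


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into one dict with dotted keys."""
    flat: Dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        # Leaf value: drop the trailing dot from the accumulated prefix.
        flat[prefix.rstrip(".")] = obj
    return flat


def clean_value(value: Any) -> Optional[str]:
    """Return a stripped string, or None when the value is missing or blank."""
    if value is None:
        return None
    text = str(value).strip()
    return text if text else None


def exif_record(entries: List[dict], wanted_tags: List[str]) -> Dict[str, Optional[str]]:
    """Build one row with a fixed set of tags; absent tags become None."""
    found: Dict[str, Optional[str]] = {}
    for entry in entries:
        raw = entry.get("raw", {})
        content = raw.get("_content") if isinstance(raw, dict) else raw
        found[entry.get("tag")] = clean_value(content)
    return {tag: found.get(tag) for tag in wanted_tags}


# Hypothetical sample, hand-written for illustration only.
sample_response = {
    "photo": {
        "id": "123",
        "exif": [
            {"tag": "Make", "raw": {"_content": " Canon "}},
            {"tag": "ISO", "raw": {"_content": ""}},
        ],
    },
    "stat": "ok",
}

flat = flatten(sample_response)
record = exif_record(sample_response["photo"]["exif"], ["Make", "ISO", "FocalLength"])
```

Here `flat` holds keys such as `photo.exif.0.tag`, which map directly onto data frame columns, while `record` always carries the same three tags, with `None` marking both the blank `ISO` value and the absent `FocalLength` tag.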

Apart from these, the remaining challenges are minor and mostly related to logic, programming, or the use case itself; they can be solved with a bit of creativity and, of course, the Internet.
