Hadoop to MongoDB pipeline

An alternative to the MongoDB Connector for Hadoop is to use the programming language of our choice to export data from Hadoop and then write it into MongoDB, using either the low-level driver or an ODM, as described in previous chapters.

For example, in Ruby there are a few options:

  • The webhdfs gem (on GitHub), which uses the WebHDFS or the HttpFS Hadoop API to fetch data from HDFS
  • System calls, using the Hadoop command-line tool and Ruby's system() call

In Python, we can use:

  • HdfsCLI, which uses the WebHDFS or the HttpFS Hadoop API
  • libhdfs, a JNI-based native C wrapper around the HDFS Java client

All of these options require an intermediate server between our Hadoop infrastructure and our MongoDB server, but in exchange they allow for more flexibility in the extract, transform, load (ETL) process of exporting and importing data.
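
As a rough illustration of the HdfsCLI option, the following is a minimal sketch of such a pipeline, intended to run on the intermediate server. It assumes the hdfs (HdfsCLI) and pymongo packages are installed and that the source file is a simple CSV with a header row and no embedded newlines. The NameNode URL, user, file path, database, and collection names are all hypothetical placeholders, not values from this chapter.

```python
import csv
from itertools import islice


def parse_rows(lines):
    """Transform step: turn CSV lines (header first) into dicts for MongoDB."""
    for row in csv.DictReader(lines):
        yield dict(row)


def batches(items, size):
    """Group an iterable into lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def export_to_mongodb(hdfs_client, hdfs_path, collection, batch_size=1000):
    """Stream one HDFS file into a MongoDB collection; return documents inserted."""
    inserted = 0
    # HdfsCLI yields decoded strings split on the delimiter when both
    # `encoding` and `delimiter` are given, so the file is never fully in memory.
    with hdfs_client.read(hdfs_path, encoding="utf-8", delimiter="\n") as lines:
        for chunk in batches(parse_rows(lines), batch_size):
            result = collection.insert_many(chunk, ordered=False)
            inserted += len(result.inserted_ids)
    return inserted


def main():
    # Imported here so the helpers above can be used without either package.
    from hdfs import InsecureClient
    from pymongo import MongoClient

    # Hypothetical endpoints and names; adjust to the actual environment.
    hdfs_client = InsecureClient("http://namenode.example.com:9870", user="hadoop")
    collection = MongoClient("mongodb://localhost:27017")["analytics"]["events"]
    count = export_to_mongodb(hdfs_client, "/data/events.csv", collection)
    print(f"Inserted {count} documents")
```

Calling main() needs a reachable WebHDFS or HttpFS endpoint and a running MongoDB server. Keeping the transform step in parse_rows separate from the I/O is one way to get the ETL flexibility mentioned above, since the parsing logic can be changed or tested without touching either cluster.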
