Is big data for me?

Perhaps you are working on a file that is hundreds of megabytes in size, and your current data analysis tool is performing too slowly. Since big data typically involves gigabytes or even terabytes of storage, you might be wondering: is Apache Hadoop for me?

Note that Hadoop implements one general computation model, MapReduce: a map function is applied to every record in your data, and a reduce step then aggregates the individual results. You can often achieve the same grouping and counting with programming languages such as SQL or Python, and writing that code yourself lets you express the computational flow more directly.
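For example, the canonical map-then-reduce job, grouping records by a key and counting them, fits in a few lines of plain Python. This is a minimal sketch, assuming a hypothetical orders.csv file with a country column:

    import csv
    from collections import Counter

    counts = Counter()
    with open("orders.csv", newline="") as f:
        for row in csv.DictReader(f):    # "map": visit every record
            counts[row["country"]] += 1  # "reduce": aggregate per key

    for country, n in counts.most_common():
        print(country, n)

On a file of hundreds of megabytes, a loop like this typically finishes in seconds, with no cluster required.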

Alternatively, you might consider migrating your data analysis to pandas or R. Both are powerful and can handle gigabytes of data when coded efficiently, without leaking memory. Most commercial SQL servers are also up to the task. Moreover, memory and storage are now affordable enough that you can run heavy computations on a local workstation.
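One common trick for keeping pandas within memory limits is to stream the file in chunks rather than loading it whole. Here is a sketch under the same assumption of a hypothetical orders.csv; the chunk size of one million rows is an illustrative choice, not a recommendation:

    import pandas as pd

    total = None
    for chunk in pd.read_csv("orders.csv", chunksize=1_000_000):
        part = chunk.groupby("country").size()          # count within this chunk
        total = part if total is None else total.add(part, fill_value=0)

    print(total.sort_values(ascending=False))

Because only one chunk is resident at a time, peak memory stays roughly proportional to the chunk size rather than the file size.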

If your data storage runs into terabytes, however, these single-machine tools fall short. At that scale, Apache Hadoop and alternative big data frameworks such as Apache Spark are your best hope for scalability, affordability, and fault tolerance.
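To give a sense of the transition, the same grouping and counting in Spark looks much like the pandas version but runs across a cluster. This sketch uses PySpark and assumes the data now lives as CSV files under a hypothetical HDFS path:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("country-counts").getOrCreate()

    # Spark distributes the scan and aggregation across the cluster
    df = spark.read.csv("hdfs:///data/orders/*.csv", header=True)
    df.groupBy("country").count().orderBy("count", ascending=False).show()

    spark.stop()

The code stays short because the framework, not your program, handles partitioning the data, scheduling the work, and recovering from failed machines.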
