Job engine

Hive supports running jobs on different engines, and the choice of engine also affects overall performance. However, switching engines is a bigger change than adjusting the other settings. It also requires a service restart rather than taking effect temporarily in a command-line session. Here is the syntax to set the engine, followed by details on each engine:

SET hive.execution.engine=<engine>; -- <engine> = mr|tez|spark 
  • mr: This is the default engine, MapReduce. It is deprecated as of Hive v2.0.0.
  • tez: Tez (http://tez.apache.org/) is an application framework built on YARN that can execute complex directed acyclic graphs (DAGs) for general data-processing tasks. Tez further splits map and reduce jobs into smaller tasks and combines them flexibly and efficiently for execution. Tez is considered a flexible and powerful successor to the MapReduce framework. It is production-ready and is widely used in place of the mr engine.
  • spark: Spark is another general-purpose big data framework. Its component, Spark SQL, supports a subset of HQL and provides similar syntax. By running Hive over Spark, Hive can leverage Spark's in-memory computing model as well as Hive's mature cost-based optimizer. However, Hive over Spark requires manual configuration and still lacks solid production use cases. For more details on Hive over Spark, refer to the wiki page at https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started.
  • mr3: MR3 (https://mr3.postech.ac.kr/) is another, experimental engine. It is similar to Tez but offers a simpler design, better performance, and more features. MR3 is documented as production-ready and supports all major features of Tez, such as Kerberos-based security, authentication and authorization, fault tolerance, and recovery. However, it still lacks solid production use cases, best practices for production deployment, and support in the CDH or HDP distributions.
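
As a minimal sketch of the SET syntax above, the statements below first read back the configured engine (in Beeline or the Hive CLI, a key with no value prints its current setting) and then select Tez. The Hive on Spark lines are left commented out; the two Spark properties are illustrative values in the style of the Hive on Spark getting-started page, assume Spark is already installed on the cluster, and are not tuned for any real deployment.

```sql
-- Print the engine currently configured (a key with no value shows its setting)
SET hive.execution.engine;

-- Select Tez (assumes Tez is already installed and configured on the cluster)
SET hive.execution.engine=tez;

-- Alternatively, select Spark. Hive on Spark also needs Spark settings such as
-- these (illustrative values only, not tuned for any particular cluster):
-- SET hive.execution.engine=spark;
-- SET spark.executor.memory=512m;
-- SET spark.serializer=org.apache.spark.serializer.KryoSerializer;
```

Reading the property back after a change is a quick way to confirm which engine subsequent queries will be submitted to.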

Live Long And Process (LLAP) functionality was added in Hive v2.0.0. It combines a long-running query service with intelligent in-memory caching to deliver fast queries. Together with a job engine, LLAP provides a hybrid execution model that improves overall Hive performance. LLAP runs through Apache Slider (https://slider.incubator.apache.org/) and currently works only with Tez; support for other engines is planned. Recent HDP releases provide LLAP support through Tez.
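
Because LLAP sits on top of Tez, routing a query through it is a matter of combining the engine setting with LLAP's own execution-mode properties. The following is a minimal sketch, assuming Hive 2.x property names and LLAP daemons that are already running on the cluster (for example, launched through Slider); it is not a complete LLAP deployment recipe.

```sql
-- LLAP currently works only with Tez, so Tez must be the engine
SET hive.execution.engine=tez;

-- Run query fragments in the LLAP daemons instead of ordinary Tez containers
SET hive.execution.mode=llap;

-- Let LLAP execute all eligible fragments (other accepted values include none and map)
SET hive.llap.execution.mode=all;
```

Setting hive.execution.mode back to container returns the session to plain Tez execution without LLAP.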