Vectorization optimization

Vectorization optimization processes a larger batch of data at the same time rather than one row at a time, thus significantly reducing computing overhead. Each batch consists of a column vector that is usually an array of primitive types. Operations are performed on the entire column vector, which improves the instruction pipelines and cache use. Files must be stored in the ORC format in order to use vectorization. For more details on vectorization, please refer to the Hive Wiki (https://cwiki.apache.org/confluence/display/Hive/Vectorized+Query+Execution). To enable vectorization, we need to use the following setting:

> SET hive.vectorized.execution.enabled=true; -- default false
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.188.200.46