Random sampling

Random sampling uses the rand() function and the LIMIT keyword to get a sample of the data, as shown in the following example. The DISTRIBUTE BY and SORT BY clauses are used here so that rows are spread randomly across the reducers and ordered randomly within each reducer, which keeps the work efficiently distributed. ORDER BY rand() can achieve the same result, but it performs poorly because Hive executes ORDER BY as a global sort through a single reducer:

> SELECT name FROM employee_hr 
> DISTRIBUTE BY rand() SORT BY rand() LIMIT 2;
+--------+
|  name  |
+--------+
| Will   |
| Steven |
+--------+
2 rows selected (52.399 seconds)
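The mechanics of this query can be illustrated with a small Python toy model. It is a sketch under simplifying assumptions and not Hive's actual execution. The function name distribute_sort_limit_sample, its num_reducers and seed parameters, and the sample names are hypothetical. The final LIMIT is modeled simply as taking the first k rows of the reducers' combined output.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def distribute_sort_limit_sample(
    rows: Sequence[T],
    k: int,
    num_reducers: int = 3,
    seed: Optional[int] = None,
) -> List[T]:
    """Hypothetical toy model of DISTRIBUTE BY rand() SORT BY rand() LIMIT k.

    This simplified simulation is for illustration only; it is not Hive's code.
    """
    if k < 0 or num_reducers < 1:
        raise ValueError("k must be >= 0 and num_reducers must be >= 1")
    rng = random.Random(seed)

    # DISTRIBUTE BY rand(): route every row to a randomly chosen reducer.
    reducers: List[List[T]] = [[] for _ in range(num_reducers)]
    for row in rows:
        reducers[rng.randrange(num_reducers)].append(row)

    # SORT BY rand(): each reducer sorts only its own rows, by a fresh random key.
    for bucket in reducers:
        keyed = [(rng.random(), row) for row in bucket]
        keyed.sort(key=lambda pair: pair[0])
        bucket[:] = [row for _, row in keyed]

    # LIMIT k (simplified): keep the first k rows of the combined reducer output.
    combined = [row for bucket in reducers for row in bucket]
    return combined[:k]


if __name__ == "__main__":
    # Hypothetical sample data, not the contents of the employee_hr table.
    names = ["Michael", "Will", "Shelley", "Lucy", "Steven"]
    print(distribute_sort_limit_sample(names, k=2, seed=7))
```

In this model, each reducer sorts only its own share of the rows, so no single task has to sort the full data set.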