The UCI machine learning repository

We can access the UCI machine learning repository by navigating to https://archive.ics.uci.edu/ml/. So, what is the UCI machine learning repository? UCI stands for the University of California, Irvine, which hosts the repository, and it is a very useful resource for getting free, openly available datasets for machine learning. Although PySpark is not primarily a machine learning tool, the repository gives us a chance to get big datasets that help us test out the functions of PySpark.

Let's take a look at the KDD Cup 1999 dataset, which we will download, and then we will load the whole dataset into PySpark.
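
As a rough sketch of the download-then-load step just described, the following shows one way it might look. The download URL, the helper names (`local_path_for`, `download_if_missing`, `load_into_spark`, `main`), and the choice to read the data as an RDD of text lines are all assumptions made for illustration; the text itself does not specify them.

```python
import os
import urllib.request

# Assumed download location for the full KDD Cup 1999 dataset;
# the text does not give a URL for the file itself.
KDD_URL = "http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data.gz"


def local_path_for(url, data_dir="."):
    """Return the local path where the file behind `url` is stored."""
    return os.path.join(data_dir, os.path.basename(url))


def download_if_missing(url, data_dir="."):
    """Download `url` into `data_dir` unless the file already exists."""
    path = local_path_for(url, data_dir)
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path


def load_into_spark(path):
    """Load the file at `path` as an RDD with one element per line."""
    # Imported here so the download helpers work without PySpark installed.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    # textFile reads gzip-compressed input without a separate unpack step.
    return sc.textFile(path)


def main():
    """Hypothetical entry point: fetch the data, load it, count the rows."""
    path = download_if_missing(KDD_URL)
    raw_data = load_into_spark(path)
    print(raw_data.count())
```

Calling `main()` would download the archive into the current directory on the first run and reuse the local copy afterwards. Counting the lines forces Spark to actually read the file, which makes it a simple check that the load worked.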
