Lending Club provides all available loan applications and their results publicly. The data for years 2007-2012 and 2013-2014 can be directly downloaded from https://www.lendingclub.com/info/download-data.action.
Download the DECLINED LOAN DATA, as shown in the following screenshot:
The downloads contain the files LoanStats3a.CSV and LoanStats3b.CSV.
The data contains approximately 230k rows, split into two sections:
- Loans that meet the credit policy: 168k
- Loans that do not meet the credit policy: 62k (note that the dataset is imbalanced)
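To make the two-section layout and the class imbalance concrete, here is a minimal Python sketch that splits the rows into the two sections and computes the share of the smaller class. It assumes the second section starts at a line whose first field begins with the text "Loans that do not meet the credit policy". That marker text is an assumption, so check it against your own download. The names `split_sections`, `minority_share`, and `SECTION_MARKER` are hypothetical.

```python
import csv
from typing import Iterable, List, Tuple

# Assumption: the second section begins at a line whose first field starts
# with this text. Check your own download and adjust if it differs.
SECTION_MARKER = "Loans that do not meet the credit policy"


def split_sections(
    lines: Iterable[str], marker: str = SECTION_MARKER
) -> Tuple[List[List[str]], List[List[str]]]:
    """Split CSV lines into (meets_policy, does_not_meet_policy) data rows.

    The first non-blank, non-marker row is treated as the column header and
    is not returned. Any download comment line must be removed beforehand.
    """
    meets: List[List[str]] = []
    does_not: List[List[str]] = []
    current = meets
    header_seen = False
    for row in csv.reader(lines):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines and rows with only empty fields
        if row[0].startswith(marker):
            current = does_not  # the marker line switches to the second section
            continue
        if not header_seen:
            header_seen = True  # skip the column header row
            continue
        current.append(row)
    return meets, does_not


def minority_share(majority: int, minority: int) -> float:
    """Fraction of all rows that belong to the smaller class."""
    return minority / (majority + minority)
```

With the approximate counts quoted above, `minority_share(168_000, 62_000)` is about 0.27. Roughly a quarter of the rows fall into the "does not meet the credit policy" class, which is why the dataset counts as imbalanced.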
As always, it is advisable to look at the data first by viewing a sample row, or perhaps the first 10 rows. Given the size of this dataset, we can open it in Excel to see what a row looks like:
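You may not have Excel available, or the file may be too large to open comfortably. In either case, a minimal standard-library Python sketch can print the first few rows instead. `head_rows` and `head_file` are hypothetical helper names. The UTF-8 default is an assumption, so pass a different `encoding` if your download needs one.

```python
import csv
from itertools import islice
from typing import Iterable, List


def head_rows(lines: Iterable[str], n: int = 10) -> List[List[str]]:
    """Parse and return at most the first n CSV rows from an iterable of lines."""
    # islice stops pulling from the reader after n rows, so the rest is never parsed
    return list(islice(csv.reader(lines), n))


def head_file(path: str, n: int = 10, encoding: str = "utf-8") -> List[List[str]]:
    """Open a CSV file and return its first n parsed rows."""
    # newline="" lets the csv module handle quoted fields that contain line breaks
    with open(path, newline="", encoding=encoding) as f:
        return head_rows(f, n)
```

For example, `for row in head_file("LoanStats3a.CSV"): print(row)` shows the first 10 rows. This is also a quick way to spot a non-data first line before loading the file anywhere else.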
Be careful: the downloaded file can start with a comment line added by the Lending Club download system. The simplest fix is to remove this line manually before loading the file into Spark.
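If you would rather not edit a large file by hand, the same cleanup can be scripted. This is a minimal sketch that assumes the comment occupies exactly the first physical line, hence the `skip=1` default. Inspect the first rows to confirm this before running it. `skip_leading_lines` and `strip_comment_line` are hypothetical helper names, and UTF-8 is an assumed encoding.

```python
from itertools import islice
from typing import Iterable, Iterator


def skip_leading_lines(lines: Iterable[str], skip: int = 1) -> Iterator[str]:
    """Return an iterator over every line after the first `skip` lines."""
    return islice(lines, skip, None)


def strip_comment_line(
    in_path: str, out_path: str, skip: int = 1, encoding: str = "utf-8"
) -> None:
    """Write a copy of in_path to out_path without its first `skip` lines.

    Lines are streamed one at a time, so the whole file is never held in memory.
    """
    # newline="" preserves the original line endings in the copy
    with open(in_path, newline="", encoding=encoding) as src, open(
        out_path, "w", newline="", encoding=encoding
    ) as dst:
        dst.writelines(skip_leading_lines(src, skip))
```

For example, `strip_comment_line("LoanStats3a.CSV", "LoanStats3a_clean.csv")` writes a cleaned copy. The output filename is only illustrative. You can then load the copy with Spark's CSV reader with the `header` option enabled, so that the first remaining line supplies the column names.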