A few more big data use cases

Every big data project in a financial organization must go through the same standard practices, which are as follows:

  • Understand the business needs and build a cross-functional team to support your short-, mid-, and long-term goals.
  • Ensure that the platform supports the growth of the project. Make sure that the infrastructure is configured appropriately for your performance requirements and caters to both current and possible future projects.
  • The first project will obviously have a lot of visibility, so make sure you demonstrate the return on investment as soon as possible to secure executive sponsorship for current and future projects.
  • Hadoop is still evolving, so make sure that your team is well trained and that a significant budget is allocated not just for initial training, but also for periodic training on new products and releases. Users and operations staff must also be well trained to use the platform effectively.

I will now discuss more financial big data use cases with proposed solutions. Feel free to tweak them according to your requirements; as you know, there is never just one right answer in technology.

Use case – fraud again

There are many kinds of fraudulent activity within an investment bank, and they must be detected proactively. If you follow the news, you will know that regulatory fines can be significant enough to pull down quarterly profit projections and share prices. Some common fraud schemes are as follows:

  • Stocks of small companies with a market capitalization under $250 million, normally traded over the counter (OTC), are promoted with misleading information and sold to the public. The brokers dump these shares as soon as the prices peak.
  • Abusive short selling normally consists of non-value-added speculative trades: the stocks are sold without being borrowed, and without any intention to borrow them.
  • Penny stocks are those with a per-share value of $5 or less. Traders buy such stocks at obviously low prices, promote them with misleading prospects, and then sell them at inflated prices.
  • Market manipulation and insider trading.

Solution

Well, no marks for guessing: if you recall the project on debit/credit card transaction fraud that I discussed in Chapter 6, Getting Experienced, this use case is very similar.

  • Build the detection model using historical trades, according to your requirements. You can filter by market capitalization, by share values of $5 or less for penny stocks, and so on.
  • For insider trading and market manipulation, filter the trades in question, group them by trader, and analyze the trading patterns.
  • Build the outlier detection algorithm. There are many to choose from, and the right one depends on your detailed requirements; a minimal sketch follows this list.
  • If fraud detection is needed in real time, use Storm or Spark; otherwise, batch-mode MapReduce using Java, Pig, or Hive should be good enough.
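As an illustration of the outlier detection step, here is a minimal sketch using the Spark DataFrame API. It flags trades that deviate sharply from each trader's own historical pattern using a simple z-score rule; the input path and column names (trader_id, notional) are assumptions for illustration:

    # A minimal z-score outlier detector over trades, grouped by trader.
    # The HDFS path and column names are illustrative assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("trade-outliers").getOrCreate()
    trades = spark.read.parquet("hdfs:///data/trades")

    # Per-trader mean and standard deviation of trade size
    stats = trades.groupBy("trader_id").agg(
        F.mean("notional").alias("mu"),
        F.stddev("notional").alias("sigma"))

    # Flag trades more than three standard deviations from the trader's norm
    flagged = (trades.join(stats, "trader_id")
               .filter(F.col("sigma") > 0)
               .withColumn("z", (F.col("notional") - F.col("mu")) / F.col("sigma"))
               .filter(F.abs(F.col("z")) > 3))

    flagged.write.parquet("hdfs:///data/flagged_trades")

A z-score rule is only one of many outlier detectors; depending on your requirements, density-based or clustering-based methods may suit your data better.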

Use case – customer complaints

The top priority for any bank is to minimize the customer attrition rate, which depends on customer satisfaction. We know that customers choose banks based on their service levels. Customer satisfaction data is very large and includes unstructured data such as telephone recordings and e-mails. For an accurate analysis, the solution must be based on the complete dataset rather than a small sample, as each customer is different.

Solution

Many factors behind customer dissatisfaction need to be analyzed, such as geographic and socio-economic classification. The correlation between satisfaction and these parameters needs to be identified, and we need to choose the following:

  • Algorithm: Clustering and classification of customer complaints. We can group them by region and socio-economic factors, extract the top complaints, and analyze their impact; a minimal sketch follows this list.
  • Technology: Hadoop as the platform, Mahout or Spark MLlib for machine learning, and programming in Java or Python.
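As an illustration of the clustering step, here is a minimal Spark MLlib sketch that groups free-text complaints into themes using KMeans over TF-IDF features. The input path, the schema (region, complaint_text), and the choice of k are assumptions for illustration:

    # Cluster complaint text into themes with KMeans over TF-IDF features.
    # Input path, schema, and k are illustrative assumptions.
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import Tokenizer, HashingTF, IDF
    from pyspark.ml.clustering import KMeans

    spark = SparkSession.builder.appName("complaint-clusters").getOrCreate()
    complaints = spark.read.json("hdfs:///data/complaints")

    pipeline = Pipeline(stages=[
        Tokenizer(inputCol="complaint_text", outputCol="words"),
        HashingTF(inputCol="words", outputCol="tf"),
        IDF(inputCol="tf", outputCol="features"),
        KMeans(k=10, seed=42),  # ten complaint themes; tune k to your data
    ])

    model = pipeline.fit(complaints)
    clustered = model.transform(complaints)

    # Inspect the dominant complaint themes per region
    clustered.groupBy("region", "prediction").count().show()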

Use case – algorithmic trading

Algorithmic trading is the use of computer programs to submit trading orders, with the algorithms deciding every aspect of an order, such as timing, price, and quantity.

To develop the algorithm, we need to do the following:

  • Backtest the algorithm with historical price data and fine-tune it for accuracy; a minimal backtest sketch follows this list
  • Remember that the more parameters you use, the more sophisticated the algorithm becomes
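Here is a minimal backtesting sketch in plain Python. The strategy (a long-only moving-average crossover) and its parameter values are illustrative assumptions, not a production trading algorithm:

    # Backtest a long-only fast/slow moving-average crossover on a price
    # series. Strategy and parameter values are illustrative assumptions.

    def moving_average(prices, window):
        return [sum(prices[i - window + 1:i + 1]) / window
                for i in range(window - 1, len(prices))]

    def backtest(prices, fast=5, slow=20):
        """Return the total P&L of trading one unit on crossovers."""
        fast_ma = moving_average(prices, fast)
        slow_ma = moving_average(prices, slow)
        offset = slow - fast  # aligns fast_ma with slow_ma indices
        pnl, position, entry = 0.0, 0, 0.0
        for i in range(len(slow_ma)):
            price = prices[i + slow - 1]
            if fast_ma[i + offset] > slow_ma[i] and position == 0:
                position, entry = 1, price   # golden cross: buy
            elif fast_ma[i + offset] < slow_ma[i] and position == 1:
                pnl += price - entry         # death cross: sell
                position = 0
        return pnl

Running this function over a grid of (fast, slow) values is exactly the parameter sweep that the MapReduce solution below parallelizes.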

Solution

Algorithmic trading is not new to investment banks, but Hadoop MapReduce has made large-scale backtesting a bit easier. The solution contains two MapReduce phases, as described next (a sketch follows the list):

  • The first MapReduce phase will take the large daily price dataset together with a given set of parameters, and output the performance of each parameter set
  • The second MapReduce phase will take the parameter performance data as its input and output the top-performing parameter set
  • The top-performing parameter set can then be used for real-time automated trading
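To show the data flow, here is a minimal sketch of the two phases as plain Python functions, reusing the backtest function from the earlier sketch. In production, these would be MapReduce jobs written in Java, Pig, or Hive (or Spark transformations) running over the full price history:

    # Phase 1 map: evaluate each parameter set on one day's prices,
    # emitting (parameter_set, daily_pnl) pairs.
    def phase1_map(day_prices, param_grid):
        for params in param_grid:
            yield tuple(sorted(params.items())), backtest(day_prices, **params)

    # Phase 1 reduce: sum the daily P&L per parameter set.
    def phase1_reduce(pairs):
        totals = {}
        for key, pnl in pairs:
            totals[key] = totals.get(key, 0.0) + pnl
        return totals

    # Phase 2: select the top-performing parameter set.
    def phase2_top(totals):
        return max(totals.items(), key=lambda kv: kv[1])

Here, param_grid could be something like [{'fast': f, 'slow': s} for f in (5, 10) for s in (20, 50)], with the map phase run over each trading day in parallel.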

Use case – forex trading

There is a large quantity of forex trading data, but due to hardware limitations it can only be visualized by day, week, or month. For better statistical insight into the currency exchange market, we need to be able to collect, analyze, and visualize the streaming data in real time.

Solution

This use case is the opposite of fraud detection: a recommender. For fraud detection, we build a cluster of transaction patterns and identify the outliers; here, instead of identifying outliers, we find similar items and recommend them.

  • Build the detection model using historical intraday/daily forex/equity/other data
  • Build the recommendation algorithm; there are many to choose from, and the right one depends on your detailed requirements (a similarity sketch follows this list)
  • Use Storm or Spark for real-time stream processing
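As an illustration of the similarity step at the heart of a recommender, here is a minimal plain-Python sketch that ranks instruments by the cosine similarity of their return series. In practice, you would use a Mahout or Spark MLlib recommender over the full dataset; the function names here are illustrative:

    # Rank candidate instruments by cosine similarity of their return
    # series to a target instrument. A toy stand-in for a real recommender.
    import math

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def recommend(target_returns, candidates, top_n=3):
        """candidates: dict of instrument -> return series (same length)."""
        ranked = sorted(candidates.items(),
                        key=lambda kv: cosine(target_returns, kv[1]),
                        reverse=True)
        return [name for name, _ in ranked[:top_n]]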

Use case – social media based trading

Predict the stock price based on tweet volumes, news, or Facebook messages. The hypothesis is that if there is more positive public interest in a stock or industry sector, the demand to buy those stocks will increase, and vice versa.

Solution

Use statistical analysis, that is, the correlation between stock price and social media interest, and predict the price using the following points:

  • Use Facebook and Twitter APIs to load your data into Hadoop.
  • Once the data is cleaned and filtered, use the Mahout libraries to predict the stock price. Most of the algorithms are based on correlation similarity, distance similarity, or regression; a sketch of the correlation step follows this list.
  • If you are using Spark, it provides the MLlib library for machine learning, although it is not yet as advanced as Mahout.
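Here is a minimal plain-Python sketch of the correlation step: computing the Pearson correlation between a daily sentiment series and the next day's stock returns. The toy series, and the choice of aligning sentiment with next-day returns, are illustrative assumptions:

    # Pearson correlation between a daily sentiment series and next-day
    # stock returns. The toy data below is illustrative only.
    import math

    def pearson(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = math.sqrt(sum((a - mx) ** 2 for a in x))
        sy = math.sqrt(sum((b - my) ** 2 for b in y))
        return cov / (sx * sy) if sx and sy else 0.0

    sentiment = [0.2, 0.5, -0.1, 0.4, 0.3]                  # toy daily scores
    next_day_returns = [0.010, 0.020, -0.005, 0.015, 0.010]
    print(pearson(sentiment, next_day_returns))  # near +1 supports the hypothesis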

Use case – no big data

One can ask whether there is any benefit at all in using Hadoop or Apache open source products to process small and medium-sized data, such as:

  • Fewer than 100,000 activities and trades per day
  • Small reference data such as risk type, market, currency, customers, employees, trading desks, classifications, and so on

Solution

The answer is yes. Even if the data access performance is not as good as you might like, keeping the data on Hadoop is at least cheaper, in terms of both hardware and software.

There is a definite benefit if any of the following points hold true:

  • The processing of small or medium-sized data is part of a bigger Hadoop ecosystem. For example, if the processing of trades from most of the large systems has been migrated to Hadoop, it makes sense to include the smaller trading data as well, centralizing the data on a single Hadoop platform.
  • If the external data is referenced frequently from Hadoop jobs, then there is no harm in keeping a local copy on Hadoop; a broadcast-join sketch follows this list.
  • If performance is your main concern, use Spark to process the small and medium-sized data.
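As an illustration of the second point, here is a minimal Spark sketch that keeps a small reference table on Hadoop and broadcast-joins it to a large trade dataset, so the small table is shipped to every executor instead of being shuffled. Paths and column names are assumptions for illustration:

    # Broadcast-join a small reference table to a large trade dataset.
    # Paths and column names are illustrative assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("ref-data-join").getOrCreate()

    trades = spark.read.parquet("hdfs:///data/trades")             # large
    currencies = spark.read.parquet("hdfs:///refdata/currencies")  # small local copy

    # Ship the small table to every executor; no shuffle of the large table
    enriched = trades.join(broadcast(currencies), "currency_code")
    enriched.write.parquet("hdfs:///data/trades_enriched")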