The architecture of the project

A simple e-commerce site today has at least the following entities:

  • Products/items and their metadata
  • Customers/users and their metadata
  • News items/blog posts related to the products or from editorial
  • User reviews associated with products
  • User ratings associated with products

Beyond these, a complete e-commerce enterprise requires many more systems. The following diagram highlights them:

[Figure: The architecture of the project]

Because our objective is to build a recommender system and not a complete e-commerce site, we will narrow our focus to a minimum set of requirements. Here are the software components that we will need:

  • Persistent/structured data storage
  • A queuing mechanism
  • Search support

For data storage, we can use MongoDB, which we have already covered in a previous chapter. Because MongoDB is a NoSQL store, we need to design the schema carefully so that different entities, such as a user and their reviews, can be joined to form a complete product profile.
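To make the schema concern concrete, here is a minimal sketch in plain Python, with dictionaries standing in for MongoDB documents. The collection layout, the field names (`product_id`, `user_id`, `rating`), and the helper `build_product_profile` are all assumptions made for illustration; they are not a schema prescribed by this chapter. The idea is that reviews reference users and products by id, and the application combines them into one product profile.

```python
# Hypothetical documents, as they might be stored in three collections.
# Reviews reference users and products by id instead of embedding them.
products = [{"_id": "p1", "title": "Espresso machine"}]
users = [{"_id": "u1", "name": "Ana"}, {"_id": "u2", "name": "Ben"}]
reviews = [
    {"_id": "r1", "product_id": "p1", "user_id": "u1", "rating": 5, "text": "Great crema"},
    {"_id": "r2", "product_id": "p1", "user_id": "u2", "rating": 3, "text": "Noisy"},
]


def build_product_profile(product_id, products, users, reviews):
    """Combine a product with its reviews and reviewers (an application-side join)."""
    product = next(p for p in products if p["_id"] == product_id)
    users_by_id = {u["_id"]: u for u in users}
    product_reviews = [r for r in reviews if r["product_id"] == product_id]
    ratings = [r["rating"] for r in product_reviews]
    return {
        "product_id": product["_id"],
        "title": product["title"],
        # None signals "no ratings yet" rather than a misleading 0.
        "average_rating": sum(ratings) / len(ratings) if ratings else None,
        "reviews": [
            {"user": users_by_id[r["user_id"]]["name"], "rating": r["rating"], "text": r["text"]}
            for r in product_reviews
        ],
    }


profile = build_product_profile("p1", products, users, reviews)
```

Keeping reviews in their own collection keeps each document small, at the cost of doing the join ourselves. Embedding reviews inside the product document would be the opposite trade-off.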

A queuing mechanism is used to process the data as it arrives in a streaming fashion. It is also important when independent components, such as a search indexer, a recommendation engine, an e-mailer service, and so on, all want to process the same data in parallel. We can use Apache Kafka for this purpose. Since we have already covered Apache Kafka in a previous chapter, let's stick to that.
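The following toy sketch illustrates why a log-style queue lets several components consume the same data independently. It is an in-memory stand-in written for illustration, not the Apache Kafka API: the class `InMemoryTopic` and the group names are hypothetical. It mimics one idea only, that each consumer group keeps its own read position in a shared append-only log.

```python
from collections import defaultdict


class InMemoryTopic:
    """Toy stand-in for a topic: an append-only log plus one read offset per consumer group."""

    def __init__(self):
        self.log = []
        self.offsets = defaultdict(int)  # group name -> index of the next unread event

    def publish(self, event):
        self.log.append(event)

    def poll(self, group):
        """Return the events this group has not seen yet and advance its offset."""
        start = self.offsets[group]
        self.offsets[group] = len(self.log)
        return self.log[start:]


topic = InMemoryTopic()
topic.publish({"type": "review", "product_id": "p1", "rating": 5})
topic.publish({"type": "rating", "product_id": "p2", "rating": 4})

# Two independent components read the same events without affecting each other.
indexed = [e["product_id"] for e in topic.poll("search-indexer")]
seen_by_recommender = [e["product_id"] for e in topic.poll("recommender")]

# A later event is delivered only once to a group that is already caught up.
topic.publish({"type": "review", "product_id": "p3", "rating": 2})
indexed_later = [e["product_id"] for e in topic.poll("search-indexer")]
```

In this sketch the consumers run one after another in a single process. With a real queue they would typically be separate processes reading concurrently.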

For search, we can use a popular search technology such as Elasticsearch or Apache Solr, or just plain Apache Lucene. Although MongoDB also supports search queries, its search support is not as extensive as that of Elasticsearch or Apache Solr. To set up Elasticsearch (or Apache Solr), you can refer to their project pages.

We will also go through Elasticsearch setup in Chapter 7, Enhancing the User Experience.

The following is the architecture of the application that we will build:

[Figure: The architecture of the project]

Batch versus online

As shown in the architecture diagram of our application, data about the different interactions happening in the system is captured as soon as they take place. It is routed via the queuing mechanism to the storage and indexing components (we have ignored other components such as e-mail and payment for now).

When this data finally reaches the recommender system component, one of two things happens. Either the recommender system learns from it instantly, which gives online recommendations, or it waits for a specified time (maybe hours or days) and then regenerates the recommendations, which is batch processing, also called batching. There are two common reasons to batch: either not enough data is available yet, so it makes no sense to run the recommender algorithm, or the recommender algorithm itself is very expensive to run.
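The difference between the two modes can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a real recommender: the "model" is just a popularity count, the class names `OnlineRecommender` and `BatchRecommender` are hypothetical, and the batch trigger is a number of buffered events rather than the hours or days mentioned above.

```python
from collections import Counter


class OnlineRecommender:
    """Updates its model on every event, so recommendations are always current."""

    def __init__(self):
        self.counts = Counter()

    def observe(self, product_id):
        self.counts[product_id] += 1

    def recommend(self, n=1):
        return [p for p, _ in self.counts.most_common(n)]


class BatchRecommender:
    """Buffers events and updates its model only once a full batch has arrived."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = []
        self.counts = Counter()

    def observe(self, product_id):
        self.buffer.append(product_id)
        if len(self.buffer) >= self.batch_size:
            # Fold the whole batch into the model in one go; this stands in
            # for an expensive retraining run.
            self.counts.update(self.buffer)
            self.buffer.clear()

    def recommend(self, n=1):
        return [p for p, _ in self.counts.most_common(n)]


online, batch = OnlineRecommender(), BatchRecommender(batch_size=3)
for product_id in ["p1", "p2"]:
    online.observe(product_id)
    batch.observe(product_id)

online_early = online.recommend()  # already reflects the two events
batch_early = batch.recommend()    # still empty: the batch is not full yet

for recommender in (online, batch):
    recommender.observe("p2")      # third event completes the batch

online_late = online.recommend()
batch_late = batch.recommend()
```

After two events the online variant can already recommend something, while the batch variant has nothing to offer until the third event triggers its update. From that point on, both agree.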
