26
Queries: Domain Reporting

WHAT’S IN THIS CHAPTER?

  • Guidance on building reports that aren’t overly coupled to domain structure
  • Building reports using existing domain services
  • Building reports that bypass the domain and hit the database directly
  • Building reports in applications that use event sourcing
  • A discussion of the trade-offs between reports that belong to a single bounded context and reports that integrate data from multiple bounded contexts

Wrox.com Code Downloads for This Chapter

The wrox.com code downloads for this chapter are found at www.wrox.com/go/domaindrivendesign on the Download Code tab. The code is in the Chapter 26 download and individually named according to the names throughout the chapter.

Software systems are built to support the needs of the business. Not only do these needs include revenue-generating functionality, but they also include the ability to assess how well the business is performing. This is the role of reports: to track important metrics and key performance indicators (KPIs) like sales, financial targets, and customer satisfaction. As you’ve seen so far in this book, there are many ways to build a system when applying Domain-Driven Design (DDD). Equally, there are many ways to implement reporting.

Choosing how to implement reporting in your applications involves considering familiar trade-offs: speed of development, maintainability, performance, and even scalability. Sometimes you can simply create a new web page that reuses all your existing code. Other times, when you have distributed bounded contexts, you may need to create an entirely separate reporting bounded context that subscribes to events from many other bounded contexts and stores all the information locally. This chapter aims to present you with a variety of options to make you familiar with the trade-offs and better equip you to make diligent decisions on the projects you are involved in.

Domain Reporting Within a Bounded Context

Having all your data collocated inside the same bounded context is the least difficult reporting scenario. It’s also an indication that you have correctly identified the boundaries for your bounded contexts, because reports are usually intended for a specific department within a business, which should map onto a single bounded context. One of the biggest decisions you have to make when you don’t have to worry about distribution is whether to use your domain code when generating reports.

Deriving Reports from Domain Objects

To build a web page displaying reports, you can use existing domain code to build a view model. If you need to build a page quickly, this is usually the first option to consider, because it can require the least amount of effort. The big trade-off is performance, because you have less control over the query that is made to your datastore. If performance isn’t much of an issue but development speed is, this can be the perfect choice.

In the following examples, you learn how to create a Dealership Performance Report, which an automotive franchise uses to track the performance of its dealerships. First, you see an example using basic mappings, and then you see an alternative implementation using the mediator design pattern.

Using Simple Mappings

Probably the quickest way to build a report is to take domain objects and map their properties onto a view model that provides data and presentation logic required to build the view. Given the domain objects shown in Listing 26-1, which are already used in other parts of the application, you can easily populate the view model shown in Listing 26-2 by simply mapping across the properties shown in Listing 26-3.




As you can see from the code in Listings 26-1 to 26-3, you can easily create the report by using existing domain objects. One of the biggest decisions you need to make is where to locate your mapping logic. You can put it directly in application services or controllers, you can create dedicated mapper classes, or you can do the mapping inside the view model in the constructor or via a static factory method. Listing 26-4 shows how this solution looks using an application service called DealershipReportBuilder to build the report.
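As a rough sketch of the shape Listings 26-1 to 26-4 take (the book's listings are C#; this JavaScript approximation uses hypothetical names and is not the book's exact code), the mapping approach looks something like this:

```javascript
// Hypothetical domain objects standing in for Listing 26-1.
class Dealership {
  constructor(id, name) { this.id = id; this.name = name; }
}
class DealershipPerformanceDetails {
  constructor(target, actual) { this.target = target; this.actual = actual; }
}

// View model (cf. Listing 26-2): data plus presentation logic for the view.
class DealershipSummary {
  constructor(name, target, actual) {
    this.name = name;
    this.target = target;
    this.actual = actual;
  }
  get onTarget() { return this.actual >= this.target; }
}

// Application service (cf. Listing 26-4): one repository lookup per dealership
// ID, then a straight property-to-property mapping (cf. Listing 26-3).
class DealershipReportBuilder {
  constructor(dealershipRepository, performanceRepository) {
    this.dealerships = dealershipRepository;
    this.performance = performanceRepository;
  }
  build(dealershipIds) {
    return dealershipIds.map(id => {
      const dealership = this.dealerships.findBy(id); // repository query 1
      const details = this.performance.findBy(id);    // repository query 2
      return new DealershipSummary(dealership.name, details.target, details.actual);
    });
  }
}
```

Note how each dealership ID triggers its own repository calls inside the loop; that is exactly where the performance cost discussed next comes from.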

Unfortunately, the benefits do come with a cost. Reusing the domain objects from Listing 26-1 made light work of creating the report. However, the logic that builds the view model in Listing 26-4 carries a potentially big performance hit because of it. For each dealership ID, the dealership and its performance details are retrieved, which can mean up to three object-relational mapper (ORM)-generated database queries. In a system that has ten dealerships, that can be thirty database queries, which may have severe consequences under peak load. Performance isn’t always critical, but in reporting scenarios like this, it’s important to have an idea of how much you are giving away and how that might hurt you in production, especially when ORMs with features like lazy loading are involved.

Another problem with using mappings is that you may need to expose additional properties on domain objects that you would probably prefer to keep private and internal to the domain. In turn, this increases the opportunity for the service layer to become coupled to domain structure. To reduce unwanted coupling between the service layer and the domain, you may want to consider other patterns, like mediator.

Using the Mediator Pattern

To create a view model that contains all the relevant information for a report but isn’t overly coupled to domain structure, you can use the mediator design pattern. With this pattern, you pass your view model into a mediator, which is itself passed into the domain. The domain objects then interact with the mediator, which updates the view model accordingly. This doesn’t break layering, because the mediator implements an interface that belongs in the domain.

As part of an alternative implementation of the dealership performance report, Listing 26-5 shows the mediator interface along with modified domain objects that no longer expose their structure, but instead provide a method that accepts and interacts with a mediator.

IDealershipAssessment is the mediator interface in Listing 26-5. Whenever a concrete implementation of the mediator is passed into the Populate() method of DealershipPerformanceTargets or DealershipPerformanceActuals, fields are set on the mediator using the values from private instance variables. The benefit of doing this is that those private variables are not exposed outside the domain. Without the coupling, they are free to change. This is in direct contrast to the previous example. The implementation of the mediator shown in Listing 26-6 attempts to clarify this.

In Listing 26-6, the DealershipAssessmentMediator wraps a DealershipPerformanceStatus view model. When the mediator is passed into the domain objects, those domain objects set properties on the mediator. In turn, the mediator sets properties on the DealershipPerformanceStatus view model it encapsulates. This is also how the domain and the view model remain decoupled in a similar fashion to the mapping approach.
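A minimal sketch of this arrangement, in JavaScript with hypothetical names rather than the chapter's C# from Listings 26-5 and 26-6:

```javascript
// Domain objects expose no getters; they push values into the mediator.
// Fields prefixed with _ are private by convention.
class DealershipPerformanceTargets {
  constructor(annualTarget) { this._annualTarget = annualTarget; }
  populate(mediator) { mediator.annualTarget(this._annualTarget); }
}
class DealershipPerformanceActuals {
  constructor(salesToDate) { this._salesToDate = salesToDate; }
  populate(mediator) { mediator.salesToDate(this._salesToDate); }
}

// View model the report will render.
class DealershipPerformanceStatus {
  constructor() { this.target = 0; this.actual = 0; }
  get onTarget() { return this.actual >= this.target; }
}

// Mediator: satisfies the interface the domain expects and updates the view
// model it wraps, so the domain never sees the view model's structure.
class DealershipAssessmentMediator {
  constructor(viewModel) { this.viewModel = viewModel; }
  annualTarget(value) { this.viewModel.target = value; }
  salesToDate(value) { this.viewModel.actual = value; }
}
```

The mediator is the only class that knows both the domain-facing interface and the view model's shape, so either side can change without touching the other.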

Deciding when to use the mediator comes down to experience, judgment, and a few key criteria. If you find yourself wanting to expose private domain state, the mediator should be high on your list of considerations. However, if your domain is still growing and the extra complexity of a mediator is not needed, it’s likely to be a suboptimal choice. Performance-critical reports are another scenario in which you may want to avoid the mediator pattern due to the lack of low-level control. Where performance is a significant factor, you may want to consider going directly to the datastore.

Going Directly to the Datastore

When performance and efficiency are important, or when going through layers of complexity and mappings is not desired, many DDD practitioners pull data for their reports directly from the database. In applications that use CQRS, dedicated, denormalized copies of the data are created for each report that needs them. When applications don’t use CQRS, it’s common to query the datastore using raw data access technologies like ADO.NET. But it’s also common to use low-level features of ORMs, such as NHibernate’s HQL.

In this section, you see an example of querying a project’s main datastore with an ad-hoc reporting query. After that, you see an example of querying a denormalized copy of the data (a view cache), used specifically for reporting. Each of these examples involves creating a loyalty report for an online sports store. This report indicates to the business how successful its loyalty program is. Table 26.1 shows the format of the loyalty report.

TABLE 26.1 Display Format of the Loyalty Report

Points (per $) Net Profit (% of Overall) Sign-Ups Purchases (% of Overall)
Month A
Month B

Understanding how much profit the loyalty scheme is generating is the most important requirement of the loyalty report. As Table 26.1 shows, this is achieved by showing what percentage of overall profit came from the loyalty scheme for a given month. As part of their loyalty-optimization strategy, and to compete with rival companies, the online sports store often adjusts the number of points awarded. Using the loyalty report, the business can draw inferences about how changing this ratio affects the overall success of the scheme. Finally, no report would be complete without vanity metrics, so the loyalty report shows the number of loyalty scheme sign-ups as well.

Querying a Datastore

Building reports by directly querying a datastore gives you greater control and the ability to write efficient queries. To generate the loyalty report shown in Table 26.1, a SQL query may need to join and pull in data from a number of tables, including orders, users, loyaltyAccounts, loyaltySettings, and maybe even more. Many teams find that trusting an ORM to perform complex queries with lots of joins like this is a recipe for disaster. As a result, Micro-ORMs have become very popular because they provide some of the benefits Big-ORMs bring, yet they cut out a lot of the complexity. Micro-ORMs are a lower level of abstraction than Big-ORMs, providing you with more control over your queries and a better opportunity to make them fast and efficient.

LISTING 26-7 shows an application service that uses Dapper (https://code.google.com/p/dapper-dot-net/), a concise Micro-ORM, to run a SQL query directly against the project’s main SQL database without involving the domain. You can also see the definition of the view and database models being mapped to and from in Listing 26-8.



Dapper adds the Query&lt;T&gt; extension method onto the native ADO.NET SqlConnection, as shown in Listing 26-7. Query&lt;T&gt; maps the results of the query you pass onto an object of type T that it creates for you. But the important point in Listing 26-7 is that the developer is completely in control of the SQL being executed. In performance-critical reporting scenarios, low-level data access circumvents the inefficiency associated with ORMs. Unfortunately, greater control comes at the cost of potential concept duplication.
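To give a feel for this, here is the kind of hand-written query and row-to-view-model mapping involved, sketched in JavaScript rather than the chapter's C#; the table and column names are guesses for illustration, not the book's actual schema:

```javascript
// Hand-written SQL in the spirit of Listing 26-7's profit query: with direct
// access, you control the joins and the aggregation (SQLite dialect assumed).
const profitQuery = `
  SELECT strftime('%Y-%m', o.placedOn)                      AS month,
         SUM(CASE WHEN la.id IS NOT NULL THEN o.netProfit
                  ELSE 0 END) * 100.0 / SUM(o.netProfit)    AS loyaltyProfitPercent
  FROM orders o
  LEFT JOIN loyaltyAccounts la ON la.userId = o.userId
  GROUP BY month`;

// You also own the mapping from raw result rows to the view model.
function toLoyaltyReportRows(rows) {
  return rows.map(r => ({
    month: r.month,
    loyaltyProfitPercent: Math.round(r.loyaltyProfitPercent * 10) / 10,
  }));
}
```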


One of the challenges with direct-datastore queries is duplication. Some domain entities have computed properties. If you look at the SQL for the profitQuery in Listing 26-7, you can see the percentage of loyalty net profit being calculated against overall net profit in the same period. This calculation is likely to occur somewhere in the domain model as well. That is a risk and a violation of the don’t repeat yourself (DRY) principle: if the calculation were to change for any reason, both the SQL query in the report and the domain logic would need to be updated, something that can easily be overlooked or forgotten.

Duplicating domain logic anywhere is not ideal. Clearly, if you update the logic in the domain and forget to update the logic in the datastore query, you can end up with problems that annoy users or give the business completely wrong numbers. If that risk worries you, you may want to consider storing the value of the computed property instead. In this scenario, you calculate the value of the computed property and save it to the database whenever there is an update. However, if the database is updated in multiple places, you need to recompute the value in each of those places or use a database trigger.

Reading Denormalized View Caches

Sometimes, even directly querying the database with handcrafted SQL can be inefficient. For this reason, some DDD practitioners choose to create view/report-specific denormalized copies of the data (view caches). This is close in concept and implementation to CQRS. Whenever an update occurs, the main database is updated, but so are the relevant denormalized view caches. Figure 26.1 shows how you can implement this pattern for the loyalty report.



FIGURE 26.1 Denormalized view cache for the loyalty report.

As orders are placed and new users sign up (manifested as method calls, commands, domain events, and so on), the domain is invoked as usual. When creating denormalized views, though, the updates usually follow one path to the main database and at least one other via a denormalizer to the denormalized view cache, as per Figure 26.1. The denormalizer’s job is usually to flatten the data so that queries are simple SQL select statements. Listing 26-9 shows an alternative LoyaltyReportBuilder that pulls in data from a denormalized view, emphasizing just how simple the query can be, by offloading the complexity to a denormalizer.

As you can see in Listing 26-9, the complexity is massively reduced to just a single SQL select without joins. This is all thanks to the extra up-front effort of denormalizing the data. You have to decide whether that effort provides enough of a reduction in complexity or enough of a performance improvement before using this approach. You can mix and match where appropriate on your projects, though.
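The split of work between the denormalizer and the query can be sketched as follows (an in-memory JavaScript illustration with hypothetical names; in the chapter the view cache is a flat SQL table):

```javascript
// Denormalizer for the loyalty view cache of Figure 26.1: updates arrive as
// events alongside the write to the main database, and are flattened into
// one query-ready row per month.
class LoyaltyViewCacheDenormalizer {
  constructor() { this.rows = new Map(); } // month -> flat row
  handleOrderPlaced(event) {
    const row = this.rows.get(event.month) ||
      { month: event.month, loyaltyProfit: 0, overallProfit: 0 };
    row.overallProfit += event.netProfit;
    if (event.isLoyaltyMember) row.loyaltyProfit += event.netProfit;
    this.rows.set(event.month, row);
  }
  // The report's "query" is now a trivial single-row lookup: no joins.
  queryMonth(month) {
    const row = this.rows.get(month);
    return { ...row, loyaltyPercent: (row.loyaltyProfit / row.overallProfit) * 100 };
  }
}
```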

Building Projections from Event Streams

Applications that use event sourcing, which was introduced in Chapter 22, “Event Sourcing,” require a different technique to generate reports because they don’t store the current representation of the application state. Instead, event-sourced applications rely on a feature called projections. Projections are really just queries against event streams that produce some desired state or new streams, based on the contents of the events in the original stream.

Projection usage in a reporting context will be demonstrated in the following examples, where projections are used to create a health care diagnosis report. A health care authority uses this report to track the number of diagnoses made for certain medical conditions on a monthly basis. You can see the format of this report in Table 26.2.

TABLE 26.2 Health Care Diagnosis Report Format

02/2014 03/2014 04/2014 05/2014
Total % Total % Total % Total %
Diagnosis A
Diagnosis B

As Table 26.2 shows, each row in the health care diagnosis report tracks the number of times a diagnosis is made each month. For each month, the number of diagnoses made is shown alongside its percentage relative to all diagnoses made in that month. Using this report, the staff at the Health Care Foundation can look for trends in certain diagnoses. This may help them understand seasonal differences or correlate changes with other events such as the introduction of new vaccines or medical practices.

To implement this report, each monthly summary, for each diagnosis, is created as a new event stream with the naming format diagnosis-{diagnosisId}_{month}. These new streams are created from a projection that operates on a single event stream containing every diagnosis (the “diagnoses” stream). Figure 26.2 illustrates this process.


FIGURE 26.2 Projecting the “diagnoses” event stream onto event streams representing the monthly summary of each diagnosis.

For each diagnosis, there is a stream per month containing all the relevant events, as Figure 26.2 shows. As an example, all diagnoses made for the diagnosis with ID dg1 in February 2014 are projected into the stream diagnosis-dg1_201402. This means that when you come to build the report, all you have to do is count the number of events in a stream to get the total for that month. As you can see, using projections involves a similar philosophy to creating denormalized view caches: all the hard work is done up front to reduce the complexity involved in reading the data.

Setting Up ES for Projections

To work through the examples in this section, you need Event Store v3 rc2 to take advantage of newer projection capabilities. So you need to download the Event Store (http://download.geteventstore.com/binaries/EventStore-OSS-Win-v3.0.0-rc2.zip), extract it into a folder of your choice, and then run the following start-up command from PowerShell (as Administrator) from inside the directory you extracted the Event Store to:


.\EventStore.SingleNode.exe --db .\ESData --run-projections=all

Once the Event Store is started, you need to make a few changes to its configuration that enable some projection features. You can make these changes by navigating to the Projections tab in your browser (http://localhost:2113/projections) and starting the projections $by_category and $stream_by_category.


Creating Reporting Projections

Projections are created using JavaScript, which can either be posted to the Hypertext Transfer Protocol (HTTP) application programming interface (API) or manually entered into the admin website. This example uses the latter approach, which you can carry out by first navigating to the Projections tab and then choosing New Projection. Because the projection you need to create groups all events for a diagnosis by month, it is called DiagnosesByMonth. The code for this projection is shown in Listing 26-10 and needs to be added to the Source input editor. When creating the DiagnosesByMonth projection, you need to select Continuous mode and check the Emit Enabled check box. Once this is complete, you can click Post to create the projection.

The Event Store applies projections to each event in a stream, so the JavaScript in Listing 26-10 is applied to each event in the diagnoses stream. Each of those events creates a reference to the event in another stream, one that represents all diagnoses with the same diagnosisId in the same month. This is the process that was illustrated in Figure 26.2. The Event Store supports this projection behavior with its linkTo() function. linkTo() adds a reference to the event passed in as the second argument onto the stream whose name matches the first argument, creating that stream if necessary. Therefore, projections do not actually copy events; they just create references, or pointers.
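Listing 26-10's actual source is Event Store projection JavaScript (roughly fromStream('diagnoses').when({...}) calling linkTo()), which only runs inside the Event Store itself. To make the behavior concrete, here is a runnable simulation of what that projection does, with assumed event and timestamp shapes:

```javascript
// Simulates partitioning one "diagnoses" stream into per-diagnosis,
// per-month streams, as the DiagnosesByMonth projection does.
function projectDiagnosesByMonth(diagnosesStream) {
  const streams = new Map(); // stream name -> array of event references
  for (const event of diagnosesStream) {
    // '2014-02-03' -> '201402'
    const month = event.timestamp.slice(0, 7).replace('-', '');
    const streamName = `diagnosis-${event.diagnosisId}_${month}`;
    if (!streams.has(streamName)) streams.set(streamName, []);
    // Like linkTo(), store a reference to the event, not a copy of it.
    streams.get(streamName).push(event.eventId);
  }
  return streams;
}
```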

To check that the projection has worked, you can navigate to the Event Store’s Streams tab and observe the names of the newly created streams. You should see streams of the format diagnosis-{diagnosisId}_{month}, such as diagnosis-d13_201402. If you click one of these streams, you see pointers to events that reside in the diagnoses stream, which the projection is based on.

Counting the Number of Events in a Stream

Each row in the report needs to show the number of diagnoses made in each month. As discussed previously, these totals are just the number of events in each stream created by the projection in Listing 26-10. One approach for querying the size of an event stream is to create another projection. It’s an approach that many recommend, and it will be used in this example.

To create the projection that counts the monthly total for each diagnosis, you need to use the JavaScript in Listing 26-11. To follow along with this example, name this projection DiagnosesByMonthCounts. It should again use the continuous mode, but you can leave Emit Enabled unchecked. The projection is then ready to be created by clicking Post.

In the Event Store, categories are streams that have the same prefix, where a prefix is a string of text preceding a hyphen. So all the streams beginning diagnosis- created by the first projection are in the diagnosis category. Categories enable the behavior of the projection in Listing 26-11: foreachStream() operates on each stream in a category, so the projection goes through each stream in the diagnosis category and counts how many events there are. This count is stored in the projection’s state. You can confirm this by querying the state for the projection, making sure to supply the name of the stream you want the state for as the value of the partition parameter. For example, a request for http://localhost:2113/projection/DiagnosesByMonthCounts/state?partition=diagnosis-dg1_201402 gets the count of all events in that stream in the following format:
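Again, Listing 26-11 is Event Store projection JavaScript that only runs inside the Event Store; this runnable simulation shows the effect of foreachStream()-style counting over the per-stream partitions created earlier:

```javascript
// Simulates the DiagnosesByMonthCounts projection: one { count } state
// object per partition (stream name) in the category.
function countEventsPerStream(streams) {
  const state = new Map(); // partition -> { count }
  for (const [streamName, events] of streams) {
    state.set(streamName, { count: events.length });
  }
  return state;
}
```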


{
   "count": 1
}

Creating As Many Streams As Required

With the Event Store, creating streams is usually a cheap operation, as was mentioned in Chapter 22. So getting the total number of diagnoses made in any given month is an opportunity to use projections that create further streams. Creating these streams follows the same pattern as the last two projections. First, events can be partitioned by month using the JavaScript shown in Listing 26-12, using the same settings as the DiagnosesByMonth projection. To follow along with this example, call this projection Months.

Once you have run the Months projection, you then just need to sum up the numbers in each stream in the same way as the DiagnosesByMonthCounts projection. You can see the code for this projection in Listing 26-13. It should look familiar. Once this projection is running, all the streams that are needed to build the report will be in place.

Building a Report from Streams and Projections

With a set of event streams containing all the needed data, building the report is reduced to a series of HTTP calls (or interactions with a client library) and mapping between objects. An application service called HealthcareReportBuilder demonstrates that in this final part of the current example. Listing 26-14 shows the initial version of the HealthcareReportBuilder containing the high-level logic required to build the report.

To build the HealthcareReport, the HealthcareReportBuilder starts by calculating each month in the specified date range. For each of those months, it first fetches the total number of diagnoses from the Event Store, with the call to FetchMonthlyTotalsFromES() whose implementation is shown in Listing 26-15.

To get the total for each month, the code in Listing 26-15 constructs a URL for the MonthsCounts projection’s state resource. The name of the stream containing all the diagnoses for that month is used as the partition value. In response, the Event Store API returns the count as JSON. You can see this JSON response being mapped onto a DiagnosisCount, which is a data transfer object (DTO) that matches the structure of the JSON response, as Listing 26-16 shows. This object’s Count property is then stored as the count for the month. When there are no diagnoses for a given month, there are no count values either. The code sets a value of zero in those cases.
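Sketched in JavaScript rather than the chapter's C#, the URL construction and zero-defaulting described for Listings 26-15 and 26-16 come down to something like this (the partition naming is illustrative):

```javascript
// Build the state URL for the MonthsCounts projection, partitioned by the
// stream holding all diagnoses for the month.
function monthlyTotalUrl(month) {
  return 'http://localhost:2113/projection/MonthsCounts/state' +
         `?partition=month-${month}`; // partition name is an assumption
}

// Map the JSON state response onto a count, defaulting to zero when the
// projection has no state for that month (no diagnoses were made).
function parseCount(responseBody) {
  if (!responseBody) return 0;
  const state = JSON.parse(responseBody); // e.g. '{"count":1}'
  return state.count || 0;
}
```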

After obtaining the overall totals for each month, the HealthcareReportBuilder then gets the monthly total for each diagnosis. Before querying the Event Store, though, it carries out an intermediate step with the call to BuildQueriesFor(). BuildQueriesFor() creates a collection of strongly typed DTOs of the format shown in Listing 26-17 to make the code more expressive.

After creating the collection of DiagnosisQueries, the HealthcareReportBuilder uses them to finally query the Event Store for the monthly totals for each diagnosis, with the call to BuildMonthlySummariesFor(). The implementation of this is similar to FetchMonthlyTotalsFromES() in that the actual hard work is querying the Event Store and mapping the response, as shown in Listing 26-18.

Aside from fetching the totals, BuildMonthlySummariesFor() calculates the percentages, using the monthly total previously fetched, and maps the results onto a DiagnosisSummary. Upon completion, each DiagnosisSummary is mapped onto the HealthcareReport view model, as shown in Listing 26-14. All the hard work is then complete, and you can render the report.
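The percentage arithmetic just described can be reduced to a pure function (the DiagnosisSummary shape here is a guess at the book's DTO, not its exact definition):

```javascript
// Combine a diagnosis's monthly count with the overall monthly total to
// produce one cell of the report in Table 26.2.
function buildMonthlySummary(diagnosisId, month, diagnosisCount, monthlyTotal) {
  return {
    diagnosisId,
    month,
    total: diagnosisCount,
    // Guard the zero-total months that were defaulted to zero earlier.
    percentage: monthlyTotal === 0 ? 0 : (diagnosisCount / monthlyTotal) * 100,
  };
}
```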


Domain Reporting Across Bounded Contexts

Unfortunately, producing reports is not always as easy as querying a single datastore. When you have a distributed system, such as those discussed in Part II, “Strategic Patterns: Patterns for Distributed Domain-Driven Design,” each bounded context has its own datastore(s), which requires additional work on your part to produce the reports. This section outlines two approaches that rely on techniques presented in earlier chapters. One approach is to use the event-driven principles of Chapter 12 to create a dedicated reporting bounded context that subscribes to lots of events and stores all the information it needs locally in a single database. Sometimes, though, you can get away with a much lighter approach, using the UI composition techniques outlined in Chapter 23, “Composing Applications.”

Composed UI

Combining data from multiple bounded contexts to form a report can work, but usually only when most of the processing can be carried out in distinct phases, each by a single bounded context. Any other supporting information, like translating IDs to names, can also be carried out afterward by querying the bounded context that owns the source of the lookup. A territorial record label comparison report can be used to demonstrate this. An online music streaming organization can use this report to show the popularity of each record label in a variety of countries. Popularity is a measure of the combined total of streams and downloads for every song belonging to a record label. Table 26.3 shows the layout of a territorial record label comparison report.

TABLE 26.3 Territorial Record Label Comparison Report

North America Europe Asia
Record Label 1
Record Label 2
Record Label 3

One of the big challenges involved in producing the territorial record label comparison report is that streaming and downloads are completely independent parts of the business, each with its own bounded context. So to get the total of streams and downloads for each record label, the information from each of those bounded contexts needs to be combined, as Figure 26.3 shows.


FIGURE 26.3 Aggregating data from multiple bounded contexts into a single report.

Fortunately, the aggregation can occur in distinct phases. Total downloads can be retrieved from the Downloads bounded context. At the same time, the total number of streams for each label can be retrieved from the Streaming bounded context. Using client- or server-side aggregation, as demonstrated in Chapter 23, the totals for each record label in each territory can easily be combined.
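The aggregation phase itself is simple once each bounded context has returned its own totals; a sketch of the server-side combination step in Figure 26.3 (the shapes of the per-context results are illustrative):

```javascript
// Each bounded context returns totals keyed by record label and territory;
// the composition layer just sums the two result sets.
function composePopularityReport(downloadTotals, streamingTotals) {
  const report = new Map(); // 'label|territory' -> combined popularity
  for (const totals of [downloadTotals, streamingTotals]) {
    for (const { label, territory, total } of totals) {
      const key = `${label}|${territory}`;
      report.set(key, (report.get(key) || 0) + total);
    }
  }
  return report;
}
```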

Separate Reporting Context

For reasons of performance, efficiency, or convenience, having all your data for reporting live inside the same datastore may be an important criterion. One example of this is data warehousing, in which the business wants to slice and dice all its data in new ways as it seeks to uncover insights. Often a business employs data scientists to carry out this important role. Having read Part II, “Strategic Patterns: Patterns for Distributed Domain-Driven Design,” about building distributed bounded contexts, you know that by default this is not possible due to each bounded context having its own datastore(s) and being loosely coupled to the others. But you also know that bounded contexts communicate with events, opening the possibility to create a special reporting context that subscribes to events from many bounded contexts, enabling it to gather all the data it needs.

Implementing a reporting context can vary drastically in scope and implementation. In the simplest case, it may be like any other bounded context in that it subscribes to events and stores them in a SQL database, as shown in Figure 26.4. At the other end of the scale, it may be pushing data through a variety of database technologies, recommendation engines, and machine learning algorithms, similar to Netflix (http://techblog.netflix.com/2013/01/hadoop-platform-as-service-in-cloud.html), as Figure 26.5 illustrates.


FIGURE 26.4 Standard reporting context.


FIGURE 26.5 Complex data-processing reporting context.
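In its simplest form, the standard reporting context of Figure 26.4 is just an event subscriber that flattens incoming events into local, report-friendly state; a minimal sketch with illustrative event names and shapes:

```javascript
// A reporting context subscribes to events published by other bounded
// contexts and maintains its own local, query-ready representation.
class ReportingContext {
  constructor() { this.salesByMonth = new Map(); }
  handle(event) {
    if (event.type === 'OrderPlaced') {
      const current = this.salesByMonth.get(event.month) || 0;
      this.salesByMonth.set(event.month, current + event.amount);
    }
    // ...handlers for events from other bounded contexts would go here...
  }
}
```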

To learn more about reporting and business intelligence in an event-driven Service Oriented Architecture (SOA) system, Arnon Rotem-Gal-Oz has published a detailed article on the InfoQ website (http://www.infoq.com/articles/BI-and-SOA).

The Salient Points

  • Reports can be created using a variety of tools and technologies that avoid domain coupling.
  • Some reports operate on data within a bounded context, but some may need to query data from multiple bounded contexts.
  • Mapping from domain objects onto view models is often the quickest approach, but it provides little control over low-level data access.
  • Design patterns like the mediator pattern can be used to build reports or to juggle trade-offs such as coupling.
  • It’s okay to go directly to the datastore if you need queries to be efficient, but duplication of concepts and violation of DRY are concerns to be mindful of.
  • Querying the main database and querying denormalized view caches are two direct data access approaches.
  • Denormalized view caches move all the hard work into the denormalization process in return for simpler queries.
  • Using projections of event streams also trades off background processing in favor of simpler reads.
  • You can query data from multiple bounded contexts by using UI composition in some cases and a separate reporting context in others.