10

Tuning Performance

In this chapter, we’ll cover the various methods available to tune performance when designing Salesforce applications on Lightning Platform. We’ll revisit several concepts we’ve already looked at, diving a little deeper into some scenarios to demonstrate the practical application of the techniques highlighted. By the end of this chapter, you’ll be familiar with the various performance tuning techniques available when designing scalable solutions on the Salesforce Platform, with particular regard to data performance.

In this chapter, we’ll cover the following topics:

  • Skinny tables and other Large Data Volume (LDV) mitigation techniques
  • Query and search optimization
  • Task locks
  • Primary Key (PK) chunking to improve performance
  • Loading massive amounts of data

We’ll begin by looking at LDV mitigation techniques and tuning our Salesforce implementations when working with massive amounts of data.

Skinny tables and other LDV mitigation techniques

Skinny tables are an LDV mitigation technique that is useful when users experience slow read operations for queries against the Salesforce database. This is because the underlying (virtual) database schema for a standard object is actually a join across two tables: one containing the object’s standard fields and one containing its custom fields. For example, a query for data from a standard object such as Account that includes both standard and custom fields of the Account object may benefit from a skinny table when performance is impacted.

As mentioned in Chapter 6, Understanding Large Data Volumes, skinny tables are for read operations only, in order to speed up queries for the fields that the skinny table contains. Whether a query should use a skinny table or not is determined behind the scenes by the Salesforce application layer. Skinny tables do not contain any soft-deleted records, and therefore can only be used to speed up the performance of read-only queries against records that aren’t deleted.

Skinny tables are enabled by Salesforce support upon the creation of a support case, requiring you to provide the appropriate justification. Salesforce support won’t simply create a skinny table on request; they will first work with you to understand why performance is slow. It’s also worth noting that skinny tables can contain a maximum of 100 columns.

Skinny tables can be created on all custom objects and on the following standard objects:

  • Account
  • Contact
  • Opportunity
  • Lead
  • Case

Let’s revisit our visual example of a skinny table to show how the skinny table construct works:

Figure 10.1 – Skinny table construct

As we can see, skinny tables are created to remove joins across standard and custom fields when issuing read-only queries to read data from a single object. Skinny tables are best enabled when a table has tens of millions of records and are useful for speeding up report performance for your users.
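To make this concrete, consider the kind of read-only query a skinny table is typically requested for. The snippet below is a minimal illustration only; Region__c and Risk_Score__c are hypothetical custom fields, and whether a skinny table is actually used for such a query is decided transparently by the Salesforce application layer.

# A hypothetical query mixing Account standard fields (Name, Industry) and
# custom fields (Region__c, Risk_Score__c). Without a skinny table, this read
# spans the standard-field and custom-field tables; a skinny table containing
# these four fields removes that join.
report_query = """
SELECT Name, Industry, Region__c, Risk_Score__c
FROM Account
WHERE Region__c = 'EMEA'
"""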

With skinny tables now covered, let’s look at selective filter conditions and how they can be used when working to improve Salesforce Object Query Language (SOQL) query performance.

Selective filter conditions

In order to improve SOQL query performance, selective filter conditions can be utilized. You can determine whether a specific filter condition is selective by running the query through the Developer Console’s Query Plan tool and reviewing the statistics it returns. You can work with Salesforce support to apply an index to a selective field in order to speed up query and reporting performance. In short, queries will perform more efficiently when indexes are applied to selective filter condition fields. Within the Salesforce application logic, the Lightning Platform query optimizer determines filter condition selectivity for a SOQL query, report, or list view when queries are issued.

When the filter conditions in reports, SOQL queries, and list views are selective, and indexes have been applied to those fields, the Lightning Platform query optimizer doesn’t have to perform full dataset scans when looking for the target data, so performance is optimal in such use cases.
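If you prefer to check selectivity programmatically rather than through the Developer Console, the REST API exposes the same query plan information via the explain parameter of the query resource. The following Python sketch is illustrative only; the instance URL, access token, and the Status__c custom field are assumptions.

import requests

# Ask the REST API for the query plan of a SOQL statement (the same data the
# Developer Console's Query Plan tool displays). Values below are placeholders.
INSTANCE_URL = "https://yourInstance.my.salesforce.com"
ACCESS_TOKEN = "<OAuth access token>"

soql = "SELECT Id FROM Account WHERE Status__c = 'Active'"
response = requests.get(
    f"{INSTANCE_URL}/services/data/v57.0/query/",
    params={"explain": soql},
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)

for plan in response.json().get("plans", []):
    # A relativeCost below 1.0 indicates the filter is selective enough for
    # the query optimizer to consider an index for the leading operation.
    print(plan["leadingOperationType"], plan["relativeCost"])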

Divisions

Divisions are a special Salesforce data performance feature that can be used to partition data and reduce the number of records returned by SOQL queries. This effectively increases performance for users running queries and reports. In order to enable Divisions, you need to raise a case with Salesforce support.

With skinny tables and other LDV mitigation techniques now revisited, let’s look at the optimization of searches and queries.

Query and search optimization

As we explored in Chapter 6, Understanding Large Data Volumes, Salesforce performs searches in a two-part process. The first part is the creation of a result set that is then used to perform the actual search. Let’s quickly remind ourselves how a result set is produced. When a search is invoked (be it through the Salesforce user interface, a Salesforce API, or Apex code), Salesforce will first search the indexes that have been created for the appropriate records. Taking those results, Salesforce will then apply access permissions, search limits, and any other filters or filter logic to narrow down the results into a result set. That result set is then used to retrieve the matching records from the underlying database.

The way to speed up searches is to think carefully about indexing the fields used in search filters. As seen in Chapter 6, Understanding Large Data Volumes, several indexes are applied automatically out of the box. When combined with custom indexes (which are enabled with assistance from Salesforce support), search performance can be increased.

As a reminder, when Salesforce performs an indexed search, standard indexes are used if the filter condition of the query matches less than 30% of the first million records (and less than 15% of additional records), up to a maximum of 1 million records.

Custom indexes (you would have worked with Salesforce support to enable these) are used where the query filter matches less than 10% of the total records, up to a maximum of 333,333 records.
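As a quick worked example of these thresholds, assume an object holding 3 million records (the object size here is an assumption for illustration only):

# Standard index threshold: 30% of the first million records plus 15% of the
# remainder, capped at 1 million records.
total_records = 3_000_000
standard_threshold = min(
    0.30 * 1_000_000 + 0.15 * (total_records - 1_000_000), 1_000_000
)  # 600,000 records

# Custom index threshold: 10% of all records, capped at 333,333 records.
custom_threshold = min(0.10 * total_records, 333_333)  # 300,000 records

print(int(standard_threshold), int(custom_threshold))

So, on this 3 million-record object, a filter matching fewer than 600,000 records can use a standard index, and one matching fewer than 300,000 records can use a custom index.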

Now we’ve covered the optimizations we can make to queries and searching, let’s next look at task locks.

Task locks

When loading data, record locking can occur due to several issues. When dealing with tasks, it’s worth noting a few conditions where records associated with the task records are locked. Specifically, these are as follows:

  • When a task is inserted, the Account record, along with the records referenced by the WhoId and WhatId fields, is locked, but only if the status of the task record is not Completed, and the activity date is set (it’s not equal to null).
  • When a task is updated, the Account record, along with the records referenced by the WhoId and WhatId fields, is locked, irrespective of task record values.
  • When a task is deleted, the Account record, along with the records referenced by the WhoId and WhatId fields, is locked, irrespective of task record values.

When we break down the record associations available for a task record, there are two main fields in play—namely, the WhoId and WhatId fields. WhoId is a Contact or a Lead record, and WhatId is any other type of record, such as Account or Opportunity.

With WhoId referencing a Contact record, a parent Account record is referenced through the relationship between the Contact and Account records. WhatId, if not referencing an Account record directly, will be another object record that ultimately resolves to an Account record. With this information understood, task records can be organized by the Account they ultimately reference and loaded in discrete batches in parallel using the Bulk API, as sketched below.
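The following Python sketch shows one way to apply this grouping before submitting parallel Bulk API batches. It is a simplified illustration: resolve_account_id and the shape of the source rows are assumptions, and the account resolution would in practice come from the WhoId (Contact to Account) or WhatId relationships in your source data.

from collections import defaultdict

def resolve_account_id(task_row):
    # Hypothetical helper: return the Account record that this task row
    # ultimately resolves to via its WhoId/WhatId references.
    return task_row["UltimateAccountId"]

def build_task_batches(source_tasks, batch_size=10_000):
    # Group task rows by their ultimate Account so that all tasks touching a
    # given Account land in the same batch, avoiding parallel lock contention.
    by_account = defaultdict(list)
    for row in source_tasks:
        by_account[resolve_account_id(row)].append(row)

    batches, current = [], []
    for rows in by_account.values():
        if current and len(current) + len(rows) > batch_size:
            batches.append(current)
            current = []
        current.extend(rows)
    if current:
        batches.append(current)
    return batches

Each returned batch can then be submitted as a separate parallel Bulk API batch; an account whose tasks exceed the batch size is a candidate for the serial approach described next.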

Another way to load tasks, which may well be applicable if the ultimate parent Account record is the same, is by means of a serial job (one that is processed sequentially). This will be necessary when there are more than 10,000 task records associated with a single Account record, or where lock contentions may occur due to the relationships between referenced objects.

If jobs are being scheduled as part of a nightly batch or similar, the Salesforce best practice would be to ensure that there is a time buffer built into the batch job executions. This is to ensure that record overlaps with scheduled jobs that run too close together don’t lock records and cause contention.

With task locks now looked at, let’s turn to querying large amounts of data and how PK chunking can be used to improve performance.

PK chunking to improve performance

PK chunking is designed as a mechanism to allow the data of an entire Salesforce table to be extracted—for example, as part of a backup routine. PK chunking effectively adds record ID ranges as WHERE clause filters so that data is queried from a Salesforce entity in batches.

In general, if an object in Salesforce has more than 10 million rows, you should use PK chunking when exporting its data. If you are finding that querying for data times out regularly, use PK chunking.

Given that PK chunking effectively separates one big query into separate queries by adding a WHERE clause and using a range of ordered IDs, the batch size can be set. This is defaulted to 100,000 (as in, 100,000 records will be returned by default for each batch) but can be as high as 250,000. Therefore, for a 10 million-row entity, a batch size of 250,000 would result in 40 data batches being returned.
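The arithmetic behind that figure is straightforward; here is a minimal check of the numbers quoted above:

import math

rows, chunk_size = 10_000_000, 250_000
# 10,000,000 IDs split into ranges of 250,000 gives 40 chunked batches.
print(math.ceil(rows / chunk_size))  # 40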

In Chapter 7, Data Migration, we walked through a practical example of how PK chunking works. It should be noted that it is possible to use filtered queries with PK chunking (as in, they already have a WHERE clause). In such scenarios, the number of records returned from each sub-query may be less than the specified batch (or chunk) size. It’s also worth noting that because the ID ranges are calculated across the whole table, they include soft-deleted records; as these are omitted from the query results, the number of records returned may be smaller than the specified chunk size for that particular sub-query.

When viewing a PK-chunked job in Salesforce, the original batch (the one containing the full query) will show a state of NOT_PROCESSED once the chunked batches have been created. Any batches that fail will have a state of FAILED, but subsequent batches will still be processed as part of the overall job.

Important note

When filtering as part of your job where PK chunking is enabled, using either an ORDER BY clause or a filter on an ID field effectively disables PK chunking, and no performance benefit will be realized.

To start a PK-chunked batch job, send the Sforce-Enable-PKChunking header with the programmatic request. Here’s an example of this:

Sforce-Enable-PKChunking: chunkSize=250000; startRow=001A0000000abcd

In this example, the chunk size has been set to 250000, and the starting ID of the query is 001A0000000abcd.
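As a rough illustration, the header is simply supplied alongside the usual request headers when creating a Bulk API (version 1.0) query job. The instance URL and session ID below are placeholders, and error handling is omitted.

import requests

INSTANCE_URL = "https://yourInstance.my.salesforce.com"
SESSION_ID = "<session ID from your login flow>"

job_xml = """<?xml version="1.0" encoding="UTF-8"?>
<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">
  <operation>query</operation>
  <object>Account</object>
  <contentType>CSV</contentType>
</jobInfo>"""

response = requests.post(
    f"{INSTANCE_URL}/services/async/57.0/job",
    data=job_xml,
    headers={
        "X-SFDC-Session": SESSION_ID,
        "Content-Type": "application/xml; charset=UTF-8",
        # Enable PK chunking with a 250,000-record chunk size.
        "Sforce-Enable-PKChunking": "chunkSize=250000",
    },
)
print(response.status_code, response.text)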

With querying of large amounts of data and how PK chunking can be used to improve the performance of such operations now understood, let’s look at moving massive amounts of data the other way: into Salesforce.

Loading massive amounts of data

When loading lots and lots of data into the Salesforce Platform, we’re essentially concerned with how we can get as much data as possible into our Salesforce instance, reliably, in the shortest time possible. Let’s imagine we have 20 million records to load. Thinking in terms of serial versus parallel processing, we can view our loading scenario in two ways:

  • Loading 20 million records sequentially (serial)
  • Loading 20 million records in parallel, by breaking down the 20 million records into smaller batches, inserting them in parallel, and taking less time

To load lots of data quickly, we need to optimize our parallel data loads. There are several steps we can take before loading any data to speed up the load operations. Deferring sharing calculations until after the load operation completes will mean sharing recalculations are run once rather than for every record batch being processed. Disabling any logic that may run when the data is loaded—such as workflow rules, flows, and triggers—will ensure that valuable processing time isn’t spent running these automations during the load. Of course, this relies on the data being inserted in a preprocessed state: because the automations don’t run, certain field value updates and the like won’t be invoked and will therefore need to be handled manually offline.

Next, let’s think about the types of skews that may occur. For example, are we going to have an individual Account record with more than 10,000 child records? Are we going to have an owner of more than 10,000 records? Are we going to have more than 10,000 lookups to an individual record?

By thinking about the various types of skews, we can plan for and mitigate them when loading our data. We can spread child records across several Account records to mitigate account skew, and use several record owners to mitigate ownership skew, removing the role from those owners (although don’t do this without a proper impact analysis), and so on. Planning for the known effects of the expected resultant state of our data helps massively when loading lots of data; a simple pre-load check for parent skew is sketched below.
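For example, a pre-load check might count child rows per parent in the source file and flag any Account that would exceed the 10,000-child guideline. The sketch below assumes the source rows expose an AccountId column:

from collections import Counter

def find_skewed_parents(source_rows, threshold=10_000):
    # Count child records per parent Account and return any parent whose
    # child count exceeds the skew threshold.
    counts = Counter(row["AccountId"] for row in source_rows)
    return {account_id: n for account_id, n in counts.items() if n > threshold}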

Record locks should be considered next. Knowing how records will be locked depending on the type of record being loaded means source data can be divided up, ordered, and loaded in discrete, parallel batches. Pay attention to task-locking behavior and the behavior of detail records for Master/Detail relationships.

Lastly, batch sizes can be tweaked depending on the amount of data. For example, loading 20 million records as a single sequential operation versus 200 batches of 100,000 records will result in varying degrees of performance, record-locking behavior, and data skew. With the correct planning and consideration given to disabling automation, deferring sharing calculations, and so on, millions of records can be loaded using the Bulk API to great effect.
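A rough sketch of that batching step is shown below; the records list is assumed to already be preprocessed (automation disabled, sharing calculations deferred), and each yielded batch would be submitted to the Bulk API for parallel processing.

def split_into_batches(records, batch_size=100_000):
    # Divide the full dataset into fixed-size batches; 20 million records at
    # 100,000 per batch yields 200 batches for a parallel load.
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]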

Now we’ve looked at the techniques that can be used when loading massive amounts of data into Salesforce, let’s summarize what we’ve covered in this chapter.

Summary

In this chapter, we’ve covered a lot of varying concepts, all related to the improvement of performance when working with data in Salesforce. We started with revisiting LDV and covering skinny tables. We then moved on to the optimization of search and queries, and then looked at task locks and their impact on performance.

Looking at moving large amounts of data out of and into Salesforce was next on our agenda for this chapter. To that end, we covered PK chunking and then looked at moving data into Salesforce in a performant way, even when loading millions of records.

In the next chapter, we’ll look at how data movement in and out of Salesforce can be used in data backup and restore scenarios.

Practice questions

Test your knowledge of the topics and concepts covered in this chapter by attempting to answer the following questions:

  1. Which data API facilitates the asynchronous processing of data in parallel batches?
  2. Which LDV mitigation technique can be used to partition data and reduce the number of records returned by SOQL queries?
  3. Which header is required for PK chunking to be enabled?
  4. Which is considered the fastest mode for data loads with the Bulk API—Serial or Parallel?
  5. When Salesforce performs an indexed search, standard indexes are used if the filter condition of the query matches what?
  6. Custom indexes are used where the query filter matches what?
  7. When mitigating ownership skew for users that must have a role, where should that role reside in the role hierarchy?
  8. True or false? Salesforce support is needed to enable skinny tables.
  9. True or false? When mitigating ownership skew, you should consider removing a role associated with users owning 10,000 records or more.
  10. True or false? When loading lots of data, it is not considered best practice to disable automation such as workflow rules and triggers.

Answers

How did you get on attempting the practice questions? Check your answers here:

  1. The Bulk API
  2. Divisions
  3. Sforce-Enable-PKChunking
  4. Parallel
  5. Less than 30% of the first million records (and less than 15% of additional records), up to a maximum of 1 million records
  6. Less than 10% of the total records, up to a maximum of 333,333 records
  7. At the top
  8. True
  9. True
  10. False
