Concurrent

Over the last decade, we have been hearing more and more about concurrency. If you have never used a language with first-class concurrency support before, you may be wondering what all the fuss is about. In this section, we will cover why concurrency matters in the context of web applications and how Phoenix developers leverage it to build fast, performant applications. First let’s talk about the different types of concurrency.

Types of Concurrency

For our purposes, let’s think of concurrency as a web application’s ability to process two or more web requests at the same time. The simplest way to handle multiple requests is by executing them one right after the other, but that strategy isn’t very efficient. To process a request, most web apps need to perform I/O such as making database requests or communicating with an external API. While you’re waiting for those external services to complete, you could start working on the next request. This is I/O concurrency. Most programming languages provide I/O concurrency out of the box or via libraries. Sometimes, however, the I/O concurrency abstraction ends up leaking to the developer, who must write code in a confusing way, with callbacks of some form.

Another type of concurrency is multi-core concurrency, which focuses on the CPU. If your machine has more than one core, one core processes one request while a second core processes another one. For the rest of this discussion, we will use machines with four cores in our examples; multi-core machines are commonplace today, as even smartwatches have multiple cores.

There are two main ways to leverage multi-core concurrency:

  • With an operating system process per core: If your machine has four cores, you will start four different instances of your web application.

  • With user space routines: If your machine has four cores, you start a single instance of your web application that is capable of using all cores efficiently.

The downside of using operating system processes is that those four instances cannot share memory. This solution typically leads to higher resource usage and more complex solutions.

Thanks to the Erlang VM, Elixir provides I/O concurrency without callbacks, with user-space multi-core concurrency. In a nutshell, this means Elixir developers write code in the simplest and most straightforward fashion and the virtual machine takes care of using all of the resources, both CPU and I/O, for you. The result is a better application. Let’s talk about why.
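To make this concrete, here is a small sketch of callback-free I/O concurrency using Elixir's Task module. The `slow_call` function is a stand-in we invented for a real database or API request; only the `Task` calls are real APIs:

```elixir
# Sketch: overlap two simulated I/O calls with Task. The sleep stands in
# for waiting on the network; no real external service is involved.
slow_call = fn result ->
  Process.sleep(100)
  result
end

# Start both "requests" concurrently; neither blocks the other.
task_a = Task.async(fn -> slow_call.(:dashboard) end)
task_b = Task.async(fn -> slow_call.(:profile) end)

# Await both. Total wall time is roughly 100ms, not 200ms, and the code
# reads top to bottom, with no callbacks in sight.
results = {Task.await(task_a), Task.await(task_b)}
IO.inspect(results)  # => {:dashboard, :profile}
```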

Simpler Solutions

One issue with concurrency via operating system processes is the poor resource usage. Each core needs a separate instance of your application. If you have 16 cores, you need 16 instances, each on its own memory space.

With user space concurrency, you always start a single instance of your application. As you receive new requests, they are naturally spread across all cores. Furthermore, they all share the same memory. These benefits may seem a little abstract, so let’s make them concrete by looking at one specific problem: a common dashboard.

Imagine each user in your application has a dashboard. The data in this dashboard takes around 200ms to load and occupies about 100kB in memory. Since we want to provide a good experience to users, we decide to cache this data. Suppose your web application supports only operating-system-process concurrency. That means each application instance needs to keep its own cache. For ten thousand (10,000) active users, that’s 1GB of dashboard cache per instance. For 16 cores with 16 instances, that’s 16GB of cache, and that’s only for the dashboard data. Furthermore, since each instance keeps a separate cache, each one must warm up independently, so hit rates are lower after every startup.

To save memory and improve hit rates, you may decide to move the data to an external caching system, such as Redis or memcached. This external cache adds complexity to both development and deployment, because you now have a new external dependency. Your application is much faster than it would be if you were querying the database directly, but every time users access the dashboard, your application still needs to go over the network, load the cached data, and deserialize it.

In Elixir, since we start a single web application instance across all cores, we have a single cache of 1GB, shared across all cores, regardless of whether the machine has 1, 4, or 16 cores. We don’t need to add external dependencies and we can serve the dashboard as quickly as possible because we don’t need to go over the network.
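As a sketch of what such a shared cache can look like, Erlang's ETS tables (available directly from Elixir) give every process on the node fast access to the same in-memory data. The table name, key, and dashboard map below are illustrative, not part of any real application:

```elixir
# Sketch: a shared in-memory cache using ETS. Every Elixir process on
# this node reads the same table, so there is one cache per machine,
# not one per core. Names and data are illustrative.
:ets.new(:dashboard_cache, [:named_table, :public, read_concurrency: true])

# Any process can write an entry...
:ets.insert(:dashboard_cache, {:user_42, %{pageviews: 100}})

# ...and any other process can read it, with no network hop and no
# serialization step.
case :ets.lookup(:dashboard_cache, :user_42) do
  [{_key, dashboard}] -> IO.inspect(dashboard)
  [] -> IO.puts("cache miss: load from the database")
end
```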

Does this mean Elixir eliminates the need for caching systems? Certainly not. For example, if you run a large number of machines in production, you may still want an external caching system as a fallback to the local one. We just don’t need external cache systems nearly as often. Elixir developers typically get a lot of mileage out of their servers without resorting to external caching. For example, Bleacher Report was able to replace 150 instances running Ruby on Rails with 5 Phoenix instances, which were proven to handle eight times their average load at a fraction of the cost.[3]

And while this is just one example, we have the option to make similar trade-offs at different times in our stacks. For simple asynchronous processing, you don’t need a background job framework. For real-time messaging across nodes, you don’t need an external queue system. We may still use those tools, but Elixir developers don’t need to reach for them as often as other developers might. We can avoid or delay buying into complex solutions, spending more time on domain and business logic.

Performance for Developers

Developers are users too. Elixir’s concurrency can have a dramatic impact on our experience as we write software. When we compile software, run tests, or even fetch dependencies, Elixir uses all of the cores in your machine, and those shorter cycles stack up over the course of a day.

Here is a fun story. In its early versions, Elixir started as many tests concurrently as there were cores in your machine. For instance, on a four-core machine, it would run at most four tests at the same time. This is a great choice if your tests are mostly CPU-bound.

However, for web applications, it is most likely that your tests are also waiting on I/O, due to the database or external systems. Based on this insight, the Elixir team bumped the default number of concurrent tests to double the number of cores. The result? Users reported their test suites became 20%-30% faster. Overall, it is not uncommon for us to hear about web applications running thousands of tests in under 10 seconds.
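In your own suites, this concurrency is opt-in per test case via ExUnit's `async: true` option. The module and test below are an invented minimal sketch:

```elixir
# Sketch: opting a test case into ExUnit's concurrent runner. Cases
# marked async: true run concurrently with other async cases; the
# module name and test body are illustrative.
ExUnit.start(autorun: false)

defmodule MyApp.DashboardTest do
  use ExUnit.Case, async: true

  test "increments pageviews" do
    assert 100 + 1 == 101
  end
end

ExUnit.run()
```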

But Concurrency Is Hard

You may have heard that concurrency is hard and we don’t dispute that. We do claim that traditional languages make concurrency considerably harder than it should be. Many of the issues with concurrency in traditional programming languages come from in-memory race conditions, caused by mutability.

Let’s take an example. If you have two user space routines trying to remove an element from the same list, you can have a segmentation fault or similarly scary error, as those routines may change the same address in memory at the same time. This means developers need to track where all of the state is and how it is changing across multiple routines.

In functional programming languages, such as Elixir, the data is immutable. If you want to remove an element from a list, you don’t change that list in memory. You create a new list instead. That means as a functional developer, you don’t need to be concerned with bugs that are caused by concurrent access to memory. You’ll deal only with concurrency issues that are natural to your domain.
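A quick illustration of that point: "removing" an element builds a new list and leaves the original untouched, so a concurrent reader of the old list is never surprised.

```elixir
# Immutability in practice: List.delete/2 returns a new list rather
# than mutating the old one in place.
list = [1, 2, 3]
shorter = List.delete(list, 2)

IO.inspect(shorter)  # => [1, 3]
IO.inspect(list)     # => [1, 2, 3] -- the original is unchanged
```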

For example, what is the issue with this code sample?

 product = get_product_from_the_database(id)
 product = set_product_pageviews(product, get_product_pageviews(product) + 1)
 update_product_in_the_database(product)

Consider a product with 100 pageviews. Now imagine two requests are happening at the same time. Each request reads the product from the database, sees that the counter is 100, increments the counter to 101, and updates the product in the database. When both requests are done, the end result could be 101 in the database while we expected it to be 102. This is a race condition that will happen regardless of the programming language you are using. Different databases will have different solutions to the problem. The simplest one is to perform the increment atomically in the database.
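To see the shape of the fix, here is a sketch where an Agent stands in for the database row (an assumption for illustration; real code would use the database's own atomic increment, such as an `UPDATE` setting `pageviews = pageviews + 1`):

```elixir
# Sketch: an Agent holding the pageview counter in place of a database row.
{:ok, counter} = Agent.start_link(fn -> 100 end)

# Racy read-modify-write: two concurrent callers could both read 100
# and both write 101, losing an increment -- the same race as the
# code sample above.
racy_increment = fn ->
  views = Agent.get(counter, & &1)
  Agent.update(counter, fn _ -> views + 1 end)
end

# Atomic increment: the read and the write happen in one step inside
# the Agent, so no interleaving can lose an update.
atomic_increment = fn -> Agent.update(counter, &(&1 + 1)) end

racy_increment.()
atomic_increment.()
IO.inspect(Agent.get(counter, & &1))  # => 102
```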

Therefore, when talking about web applications, concurrency issues are natural. Using a language like Elixir and a framework such as Phoenix makes all of the difference in the world. When your chosen environment is equipped with excellent tools to reason about concurrency, you’ll have all of the tools you need to grow as a developer and improve your reasoning about concurrency in the wild.

In Elixir, our user-space abstraction for concurrency is also called processes, but do not confuse them with operating system processes. Elixir processes are abstractions inside the Erlang VM that are very cheap and very lightweight. Here is how you can start 100,000 of them in a couple of seconds:

 for _ <- 1..100_000 do
   spawn(fn -> Process.sleep(:infinity) end)
 end

From now on, when you read the word process, you should think about Elixir’s lightweight processes rather than operating system processes. That’s enough about concurrency for now but we will be sure to revisit this topic later.
