This chapter covers
Every computer program performs one or more computations, and these computations are usually performed sequentially. That is, the current computation has to complete before the next one starts. For example, consider a simple calculation, (2 + 2) x 4, in which the addition must be computed first, to give 4, followed by the multiplication, to give 16. In that example, the calculation is performed sequentially.
Concurrency allows more than one computation to make progress without waiting for all other computations to complete. This form of computing is useful in many situations, such as in an I/O application like the chat application you developed in chapter 3. If executed sequentially, such applications waste time waiting on input or output operations to complete. Concurrency allows this time to be used for another task, drastically reducing the execution time of the application. You learned about concurrency in chapter 3; in this chapter, you’ll learn about a related concept called parallelism.
Nim offers many built-in facilities for concurrency and parallelism including asynchronous I/O features in the form of futures and await, spawn for creating new threads, and more. You’ve already seen some of these used in chapter 3.
Parallelism in Nim is still evolving, which means that the features described in this chapter may change or be replaced by more-robust features. But the core concepts of parallelism in Nim should remain the same, and what you’ll learn in this chapter will be applicable to other programming languages as well.
In addition to showing you Nim’s parallelism features, this chapter will lead you through the implementation of a simple parser, which will show you different methods for creating parsers. Toward the end of the chapter, you’ll optimize the parser so it’s concurrent and can be run in parallel on multiple CPU cores.
Nowadays, almost all OSs support multitasking, the ability to perform multiple tasks over a certain period of time. A task is usually known as a process, which is an instance of a computer program being executed. Each CPU executes only a single process at a time, but multitasking allows the OS to change the process that’s currently being executed on the CPU without having to wait for the process to finish its execution. Figure 6.1 shows how two processes are executed concurrently on a multitasking OS.
Because CPUs are extremely fast, process A can be executed for 1 nanosecond, followed by process B for 2 nanoseconds, followed by process A for another nanosecond.[1] This gives the impression of multiple processes being executed at the same time, even though a CPU can only execute a single instruction at a time. This apparent simultaneous execution of multiple processes is called concurrency.
For simplicity, I’ll ignore the time taken for a context switch here.
In recent years, multicore CPUs have become popular. This kind of CPU consists of two or more independent units that can run multiple instructions simultaneously. This allows a multitasking OS to run two or more processes at the same time in parallel. Figure 6.2 shows how two processes are executed in parallel on a dual-core CPU.
Unlike a single-core CPU, a dual-core CPU can actually execute two processes at the same time. This type of execution is called parallelism, and it can only be achieved on multiple physical CPUs or via a simultaneous multithreading (SMT) technology such as Intel’s Hyper-Threading (HT) Technology. Remember that despite the apparent similarities between concurrency and parallelism, the two are not the same.
In addition to processes, the OS also manages the execution of threads. A thread is a component of a process, and more than one can exist within the same process. It can be executed concurrently or in parallel, just like a process, although unlike processes, threads share resources such as memory among each other.
To make use of the full power of a multicore CPU, CPU-intensive computations must be parallelized. This can be done by using multiple processes, although threads are more appropriate for computations that require a large amount of data to be shared.
The asynchronous await that you saw used in chapter 3 is strictly concurrent. Because the asynchronous code always runs on a single thread, it isn’t parallel, which means that it can’t currently use the full power of multicore CPUs.
It’s very likely that a future version of Nim will include an asynchronous await that’s parallel.
Unlike asynchronous await, spawn is parallel and has been designed specifically for CPU-intensive computations that can benefit from being executed on multicore CPUs.
Some programming languages, such as Python and Ruby, don’t support thread-level parallelism due to a global interpreter lock in their interpreter. This prevents applications that use threads from using the full power of multicore CPUs. There are ways around this limitation, but they require the use of processes that aren’t as flexible as threads.
Now that you’ve learned the difference between concurrency and parallelism, you’re ready to learn how to use threads in Nim.
In Nim, there are two modules for working with threads. The threads module (http://nim-lang.org/docs/threads.html) exposes the ability to create threads manually. Threads created this way immediately execute a specified procedure and run for the duration of that procedure’s runtime. There’s also the threadpool module (http://nim-lang.org/docs/threadpool.html), which implements a thread pool. It exposes spawn, which adds a specified procedure to the thread pool’s task queue. The act of spawning a procedure doesn’t mean it will be running in a separate thread immediately, though. The creation of threads is managed entirely by the thread pool.
The sections that follow will explain all about the two different threading modules, so don’t feel overwhelmed by the new terms I just introduced.
In this section, we’ll look at the threads module. But before we start, I should explain how threads work in Nim. In particular, you need to know what garbage collector safety (GC safety) is in Nim. There’s a very important distinction between the way threads work in Nim and in most other programming languages. Each of Nim’s threads has its own isolated memory heap. Sharing of memory between threads is restricted, which helps to prevent race conditions and improves efficiency.
Efficiency is also improved by each thread having its own garbage collector. Other implementations of threads that share memory need to pause all threads while the garbage collector does its business. This can add problematic pauses to the application.
Let’s look at how this threading model works in practice. The following listing shows a code sample that doesn’t compile.
var data = "Hello World"

proc showData() {.thread.} =
  echo(data)

var thread: Thread[void]
createThread[void](thread, showData)
joinThread(thread)
The threads module is a part of the implicitly imported system module, so you don’t need to import it explicitly.
This example illustrates what’s disallowed by the GC safety mechanism in Nim, and you’ll see later on how to fix this example so that it compiles.
Save the code in listing 6.1 as listing01.nim, and then execute nim c --threads:on listing01.nim to compile it. The --threads:on flag is necessary to enable thread support. You should see an error similar to this:
listing01.nim(3, 6) Error: 'showData' is not GC-safe as it accesses 'data' which is a global using GC'ed memory
This error describes the problem fairly well. The global variable data has been created in the main thread, so it belongs to the main thread’s memory. The showData thread can’t access another thread’s memory, and if it attempts to, it’s not considered GC safe by the compiler. The compiler refuses to execute threads that aren’t GC safe.
A procedure is considered GC safe by the compiler as long as it doesn't access any global variables that contain garbage-collected memory. An assignment or any other sort of mutation also counts as an access and is disallowed. Garbage-collected memory includes variables of types such as string, seq[T], and ref types.
There are other ways of sharing memory between threads that are GC safe. You may, for example, pass the contents of data as one of the parameters to showData. The following listing shows how to pass data as a parameter to a thread.
var data = "Hello World"

proc showData(param: string) {.thread.} =
  echo(param)

var thread: Thread[string]
createThread[string](thread, showData, data)
joinThread(thread)
Save the code in listing 6.2 as listing2.nim, and then compile it using nim c --threads:on listing2.nim. The compilation should be successful, and running the program should display "Hello World".
The createThread procedure can only pass one variable to the thread that it’s creating. In order to pass multiple separate pieces of data to the thread, you must define a new type to hold the data. The following listing shows how this can be done.
type
  ThreadData = tuple[param: string, param2: int]

var data = "Hello World"

proc showData(data: ThreadData) {.thread.} =
  echo(data.param, data.param2)

var thread: Thread[ThreadData]
createThread[ThreadData](thread, showData, (param: data, param2: 10))
joinThread(thread)
The threads created in the previous listings don’t do very much. Let’s examine the execution of these threads and see what happens when two threads are created at the same time and are instructed to display a few lines of text. In the following examples, two series of integers are displayed.
var data = "Hello World"

proc countData(param: string) {.thread.} =
  for i in 0 ..< param.len:
    stdout.write($i)
  echo()

var threads: array[2, Thread[string]]
createThread[string](threads[0], countData, data)
createThread[string](threads[1], countData, data)
joinThreads(threads)
Save the code in listing 6.4 as listing3.nim, and then compile and run it. Listing 6.5 shows what the output will look like in most cases, and listing 6.6 shows what it may sometimes look like instead.
001122334455667788991010
012345678910 012345678910
The execution of the threads depends entirely on the OS and computer used. On my machine, the output in listing 6.5 likely happens as a result of the two threads running in parallel on two CPU cores, whereas the output in listing 6.6 is a result of the first thread finishing before the second thread even starts. Your system may show different results. Figure 6.3 shows what the execution for both the first and second sets of results looks like.
Threads created using the threads module are fairly resource intensive. They consume a lot of memory, so creating a large number of them is inefficient. They're useful if you need full control over the threads that your application is using, but for most use cases the threadpool module is superior. Let's take a look at how the threadpool module works.
The main purpose of using multiple threads is to parallelize your code. CPU-intensive computations should make use of as much CPU power as possible, which includes using the power of all the cores in a multicore CPU.
A single thread can use the power of a single CPU core. To use the power of all the cores, you could simply create one thread per core. The biggest problem then is making sure that those threads are all busy. You might have 100 tasks that don’t all take the same amount of time to complete, and distributing them across the threads isn’t a trivial job.
Alternatively, one thread per task could be created. But this creates problems of its own, in part because thread creation is very expensive. A large number of threads will consume a lot of memory due to OS overhead.
The threadpool module implements an abstraction that manages the distribution of tasks over a number of threads. The threads themselves are also managed by the thread pool.
The spawn command allows tasks, in the form of procedure invocations, to be added to the thread pool, which then executes the tasks in one of the threads it manages. The thread pool ensures that the tasks keep all the threads busy so that the CPU’s power is utilized in the best way possible. Figure 6.4 shows how the thread pool manages tasks under the hood.
The spawn procedure accepts an expression, which in most cases is a procedure call. spawn returns a value of the type FlowVar[T] that holds the return value of the procedure that was called. This is an advantage in comparison to the threads module, where threads can’t return any values.
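As a minimal sketch of this (the double procedure is my own example, not from the book's listings), spawning a procedure that returns a value and reading the result looks like the following. Compile with the --threads:on flag.

```nim
import threadpool

proc double(x: int): int = x * 2

# spawn returns immediately with a FlowVar[int]; the ^ operator
# blocks until the spawned task has produced its result.
let answer: FlowVar[int] = spawn double(21)
let value = ^answer
echo(value)  # prints 42

sync()
```

The type annotation on answer is only for illustration; it can be inferred.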
The following listing shows the spawn equivalent of the code in listing 6.4.
import threadpool

var data = "Hello World"

proc countData(param: string) =
  for i in 0 ..< param.len:
    stdout.write($i)
  echo()

spawn countData(data)
spawn countData(data)
sync()
Save the code in listing 6.7 as listing4.nim, and then compile and run it. Keep in mind that the --threads:on flag still needs to be specified. The output should be mostly the same as the output shown in listings 6.5 and 6.6.
Procedures executed using spawn also have to be GC safe.
Let’s look at an example that shows how to retrieve the return values from a spawned procedure. This involves dealing with the FlowVar[T] type.
FlowVar[T] can be thought of as a container similar to the Future[T] type, which you used in chapter 3. At first, the container has nothing inside it. When the spawned procedure is executed in a separate thread, it returns a value sometime in the future. When that happens, the returned value is put into the FlowVar container.
The following listing shows the readLine procedure from chapter 3, which uses a while loop to read text from the terminal without blocking.
import threadpool, os

let lineFlowVar = spawn stdin.readLine()
while not lineFlowVar.isReady:
  echo("No input received.")
  echo("Will check again in 3 seconds.")
  sleep(3000)
echo("Input received: ", ^lineFlowVar)
Save listing 6.8 as listing5.nim, and then compile and run it. The application will wait until you enter some input into the terminal. It will check for input every 3 seconds.
Using the FlowVar type is straightforward. You can read the value inside it with the ^ operator, but this operator will block the thread it’s used in until the FlowVar it’s called on contains a value. You can check whether a FlowVar contains a value by using the isReady procedure. Listing 6.8 checks whether the lineFlowVar variable contains a value periodically, every 3 seconds.
Keep in mind that listing 6.8 is meant to demonstrate how the FlowVar[T] works. It’s not meant to be very practical, because the program will only check for input every 3 seconds.
In this example, you could just as well call readLine on the main thread, since there’s nothing else running on it. A more realistic example might replace the sleep(3000) statement with another procedure that does some useful work on the main thread. For example, you might draw your application’s user interface or call the asynchronous I/O event loop’s poll procedure, as in chapter 3.
The way exceptions behave in separate threads may surprise you. When a thread crashes with an unhandled exception, the application crashes with it, whether or not you read the value of the FlowVar.
This behavior will change in a future version of Nim, so that exceptions aren’t raised unless you read the value of the FlowVar.
The following listing shows this behavior in action.
import threadpool

proc crash(): string =
  raise newException(Exception, "Crash")

let lineFlowVar = spawn crash()
sync()
Save listing 6.9 as listing6.nim, and then compile and run it. You should see a traceback in the output pointing you to the raise statement in the crash procedure.
The raises pragma can be used to ensure that your threads handle all exceptions. To make use of it, you can define the crash procedure like so: proc crash(): string {.raises: [].} = .... This will mark the crash procedure as raising no exceptions. Exceptions that are allowed to be raised by the procedure can be specified in the square brackets.
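A small sketch of the pragma in action (the greet procedure is my own illustration, not part of the page-counts code):

```nim
# The empty raises list makes the compiler verify that greet
# cannot raise any exception.
proc greet(name: string): string {.raises: [].} =
  result = "Hello, " & name

echo(greet("Nim"))

# By contrast, a procedure that raises is rejected at compile time:
# proc crash(): string {.raises: [].} =
#   raise newException(Exception, "Crash")  # Error: can raise Exception
```

The commented-out crash procedure shows the failing case: with an empty raises list, the compiler reports an error at the raise statement.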
In summary, the simplicity of both passing arguments to a spawned procedure and receiving the procedure’s result makes spawn good for tasks that have a relatively short runtime. Such tasks typically produce results at the end of their execution, and as such don’t need to communicate with other threads until their execution stops.
For long-running tasks that need to communicate with other threads periodically, the createThread procedure defined in the threads module should be used instead.
Now that you know how to use threads in Nim, let’s look at a practical example of how they can be used. The example in this section involves parsers and shows a practical use case involving Nim’s concurrency and parallelism features.
A lot of data is generated every day from many different sources and intended for many different applications. Computers are very useful tools for processing this data, but in order for that data to be consumed, the computers must understand the format the data is stored in.
A parser is a software component that takes data as input and builds a data structure out of it. The input data is typically in the form of text. In chapter 3, you looked at the JSON data format and at how it was parsed, using the json module, into a data structure that could then be queried for specific information.
There often comes a time when you need to write a custom parser for a simple data format. There are many ways such a task can be tackled in Nim.
In this section, I’ll show you how to write a parser for Wikipedia’s page-view data.[2] This data is useful for many different applications, but in this section we’ll create an application that will find the most popular page in the English Wikipedia. In the process, you’ll do the following:
Wikipedia recently introduced a Pageview API (https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI) that supplements the raw page-view data and makes finding the most popular page in the English Wikipedia much easier. If you’re writing an application that needs to find the most popular pages on Wikipedia, you may want to use the API instead. Parsing the raw data manually is less efficient, but you’ll find the example applicable to other tasks.
At the end of this section, I’ll also show you how to parallelize the parser, allowing it to perform better on systems with multicore CPUs.
The raw page-count data can be downloaded from Wikipedia here: https://dumps.wikimedia.org/other/pagecounts-all-sites/.
The data files are organized into specific years and months. For example, the page-count data for January 2016 is available at https://dumps.wikimedia.org/other/pagecounts-all-sites/2016/2016-01/. The data is then further subdivided into days and hours. Each file at the preceding URL represents the visitors within a single hour. The files are all gzipped to reduce their size.
Download the following file and then extract it: https://dumps.wikimedia.org/other/pagecounts-all-sites/2016/2016-01/pagecounts-20160101-050000.gz.
On Windows, you may need to install 7-Zip or another application for extracting gzipped archives.
The file may take a while to download, depending on your internet speed. It’s around 92 MB before extraction, and around 428 MB after extraction, so it’s a fairly large file. The parser will need to be efficient to parse that file in a timely manner.
The file is filled with lines of text separated by newline characters. Each line of text consists of the following four fields separated by spaces:
domain_code page_title count_views total_response_size
domain_code contains an abbreviated domain name; for example, en.wikipedia.org is abbreviated as en. page_title contains the title of the page requested; for example, Dublin for http://en.wikipedia.org/wiki/Dublin. count_views contains the number of times the page has been viewed within the hour. Finally, total_response_size is the number of bytes that have been transferred due to requests for that page.
For example, consider the following line:
en Nim_(programming_language) 1 70231
This means that there was one request to http://en.wikipedia.org/wiki/Nim_(programming_language) that accounted in total for 70,231 response bytes.
The file I asked you to download is one of the smaller files from January 2016. It contains data about the Wikipedia pages visited from January 1, 2016, 4:00 a.m. UTC, to January 1, 2016, 5:00 a.m. UTC.
There are many different options when it comes to parsing the page-counts format. I’ll show you how to implement a parser using three different methods: regular expressions, the split procedure, and the parseutils module.
A common way to parse data is using regular expressions (regexes), and if you’ve ever dealt with string processing in any way, you’ve likely come across them. Regular expressions are very popular, and often when developers need to parse a string, they immediately jump to using regular expressions.
Regular expressions are by no means a magical solution to every parsing problem. For example, writing a regular expression to parse arbitrary HTML is virtually impossible.[3] But for parsing a simple data format like the Wikipedia page-counts format, regular expressions work well.
That doesn’t stop people from trying: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags.
Explaining regular expressions in depth is beyond the scope of this chapter. If you aren’t familiar with them, I encourage you to read up on them online.
Regular expressions are supported in Nim via the re module. It defines procedures and types for using regular expressions to parse and manipulate strings.
The re module is an impure module, which means it depends on an external C library. In re’s case, the C library is called PCRE, and it must be installed alongside your application for your application to function properly.
Let’s focus on parsing a single line first. The following listing shows how you can do that with the re module.
import re

let pattern = re"([^\s]+)\s([^\s]+)\s(\d+)\s(\d+)"
var line = "en Nim_(programming_language) 1 70231"
var matches: array[4, string]
let start = find(line, pattern, matches)
doAssert start == 0
doAssert matches[0] == "en"
doAssert matches[1] == "Nim_(programming_language)"
doAssert matches[2] == "1"
doAssert matches[3] == "70231"
echo("Parsed successfully!")
Constructing a regular expression is an expensive operation. When you’re performing multiple regex matches with the same regular expression, make sure you reuse the value returned by the re constructor.
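Reusing a compiled pattern might look like the following small sketch of mine, using the contains procedure from the re module:

```nim
import re

# Compile the regular expression once...
let digits = re"\d+"

# ...and reuse it across many matches, instead of rebuilding
# the pattern on every iteration.
for s in ["abc 123", "no digits here", "42"]:
  echo(s, ": ", s.contains(digits))
```

Moving the re constructor out of the loop is the key point; the loop body only performs the (cheap) match.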
Save listing 6.10 as listing7.nim, and then compile and run it. The program should compile and run successfully, displaying “Parsed successfully!”
If the program exits with an error similar to could not load: pcre.dll, you’re missing the PCRE library and must install it.
The code for parsing strings with regular expressions is straightforward. As long as you know how to create regular expressions, you should have no trouble using it.
The re module also includes other procedures for parsing and manipulating strings. For example, you can replace matched substrings using the replace procedure. Take a look at the documentation for the re module for more information (http://nim-lang.org/docs/re.html).
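For instance, a quick sketch using replace to strip the two numeric fields from a page-counts line (the pattern here is my own illustration):

```nim
import re

let line = "en Nim_(programming_language) 1 70231"
# Remove every space-prefixed run of digits from the line.
echo(line.replace(re" \d+", ""))  # prints: en Nim_(programming_language)
```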
You can also parse data manually in many different ways. This approach provides multiple advantages but also a few disadvantages. The biggest advantage over using regular expressions is that your application will have no dependency on the PCRE library. Manual parsing also makes it easier to control the parsing process. In some cases, the biggest disadvantage is that it takes more code to parse data manually.
For such a simple data format as the Wikipedia page-counts format, you can use the split procedure defined in the strutils module. The following listing shows how split can be used to parse “en Nim_(programming_language) 1 70231.”
import strutils

var line = "en Nim_(programming_language) 1 70231"
var matches = line.split()
doAssert matches[0] == "en"
doAssert matches[1] == "Nim_(programming_language)"
doAssert matches[2] == "1"
doAssert matches[3] == "70231"
This solution will work very well for this use case, but for more-complex data formats, you may want a solution that’s more flexible. The most flexible way to parse a string is to iterate over every character in that string using a while loop. This method of parsing is very verbose, but it’s useful in certain circumstances, such as when parsing more-complex data formats like HTML. Nim’s parseutils module defines procedures that make parsing using such methods much easier.
The following listing shows how the parseutils module can be used to parse “en Nim_(programming_language) 1 70231.”
import parseutils

var line = "en Nim_(programming_language) 1 70231"
var i = 0

var domainCode = ""
i.inc parseUntil(line, domainCode, {' '}, i)
i.inc

var pageTitle = ""
i.inc parseUntil(line, pageTitle, {' '}, i)
i.inc

var countViews = 0
i.inc parseInt(line, countViews, i)
i.inc

var totalSize = 0
i.inc parseInt(line, totalSize, i)

doAssert domainCode == "en"
doAssert pageTitle == "Nim_(programming_language)"
doAssert countViews == 1
doAssert totalSize == 70231
The code in listing 6.12 is far more complex than the previous listing, but it allows for far greater flexibility.
The parseutils module also defines many other procedures that are useful for parsing. They’re mostly convenience wrappers over a while loop. For example, the equivalent code for i.inc parseUntil(line, domainCode, {' '}, i) is the following:
while line[i] != ' ':
  domainCode.add(line[i])
  i.inc
Because of the flexibility of this parser, the code can parse the last two fields into integers in a single step. That’s instead of having to first separate the fields and then parse the integer, which is inefficient.
In summary, the split procedure is the simplest approach, but it’s slower than parseutils because it needs to create a sequence and new strings to hold the matches. In comparison, the parsing code that uses parseutils only needs to create two new strings and two new integers; there’s no overhead associated with the creation of a sequence.
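The difference can be measured with a rough micro-benchmark along these lines (a sketch of mine; absolute timings vary by machine and compiler flags, so treat the numbers only as a relative comparison, and compile with -d:release):

```nim
import times, strutils, parseutils

let line = "en Nim_(programming_language) 1 70231"

# Time the split-based approach.
var t = cpuTime()
for _ in 1 .. 100_000:
  discard line.split()
echo("split:      ", cpuTime() - t)

# Time the parseutils-based approach, reusing the same buffers
# on every iteration.
t = cpuTime()
var domainCode = ""
var pageTitle = ""
var countViews, totalSize = 0
for _ in 1 .. 100_000:
  var i = 0
  domainCode.setLen(0)
  i.inc parseUntil(line, domainCode, {' '}, i)
  i.inc
  pageTitle.setLen(0)
  i.inc parseUntil(line, pageTitle, {' '}, i)
  i.inc
  i.inc parseInt(line, countViews, i)
  i.inc
  i.inc parseInt(line, totalSize, i)
echo("parseutils: ", cpuTime() - t)
```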
The regex parsing code is also simpler than parseutils, but it suffers from the PCRE dependency and is also slower than the parseutils parser.
This makes the parseutils parser the best solution for this use case, even though it’s slightly more complex and significantly more verbose. Its speed will come in handy when parsing the 7,156,099 lines in the pagecounts-20160101-050000 file.
The Wikipedia page-counts files are large. Each measures around 500 MB and contains around 10 million lines of data. The pagecounts-20160101-050000 file that I asked you to download measures 428 MB and contains 7,156,099 lines of page-count data.
In order to parse this file efficiently, you’ll need to consume the file in fragments. Reading the full file into your program’s memory would consume at least 428 MB of RAM, and the actual consumption would likely be far larger due to various overheads. That’s why it’s a good idea to read large files by breaking them up into conveniently sized, smaller fragments, otherwise known as chunks.
Nim defines an iterator called lines that iterates over each line in a file. This iterator doesn’t need to copy the full file’s contents into the program’s memory, which makes it very efficient. The lines iterator is defined in the system module.
The following listing shows how the lines iterator can be used to read lines from the pagecounts-20160101-050000 file.
import os

proc readPageCounts(filename: string) =
  for line in filename.lines:
    echo(line)

when isMainModule:
  const file = "pagecounts-20160101-050000"
  let filename = getCurrentDir() / file
  readPageCounts(filename)
Save listing 6.13 as sequential_counts.nim, and then compile and run it. The program will take around a minute to execute because it will display each line of the page-counts file. You may terminate it by pressing Ctrl-C. As it runs, you can observe the memory usage, which should remain low.
You can now add the parsing code from listing 6.12 to the code in listing 6.13. Listing 6.14 shows how the parser can be integrated into the code from listing 6.13.
import os, parseutils

proc parse(line: string, domainCode, pageTitle: var string,
           countViews, totalSize: var int) =
  var i = 0
  domainCode.setLen(0)
  i.inc parseUntil(line, domainCode, {' '}, i)
  i.inc
  pageTitle.setLen(0)
  i.inc parseUntil(line, pageTitle, {' '}, i)
  i.inc
  countViews = 0
  i.inc parseInt(line, countViews, i)
  i.inc
  totalSize = 0
  i.inc parseInt(line, totalSize, i)

proc readPageCounts(filename: string) =
  var domainCode = ""
  var pageTitle = ""
  var countViews = 0
  var totalSize = 0
  for line in filename.lines:
    parse(line, domainCode, pageTitle, countViews, totalSize)
    echo("Title: ", pageTitle)

when isMainModule:
  const file = "pagecounts-20160101-050000"
  let filename = getCurrentDir() / file
  readPageCounts(filename)
Replace the code in sequential_counts.nim with the code in listing 6.14. The following listing shows what some of the output from sequential_counts.nim may look like.
... Title: List_of_digital_terrestrial_television_channels_(UK) Title: List_of_diglossic_regions Title: List_of_dignitaries_at_the_state_funeral_of_John_F._Kennedy Title: List_of_dimensionless_quantities Title: List_of_diners Title: List_of_dinosaur_genera Title: List_of_dinosaur_specimens_with_nicknames Title: List_of_dinosaurs ...
The code in listing 6.14 employs a number of optimizations. First, the biggest slowdowns in Nim applications are often caused by too many variables being allocated and deallocated. The parse procedure could return the parsed tokens, but that would result in a new string being allocated for each iteration. Instead, the parse procedure here accepts a mutable reference to two strings and two ints, which it then fills with the parsed tokens. A file that takes 9.3 seconds to parse without this optimization takes 7.8 seconds to parse with the optimization. That’s a difference of 1.5 seconds.
The use of setLen is another optimization. It ensures that the string isn’t reallocated but is instead reused. The parse procedure is executed at least 7 million times, so a tiny optimization creates massive gains in total execution speed.
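The effect of setLen can be seen in isolation in this small sketch (my own example, not part of the parser):

```nim
# buf keeps its allocated capacity across iterations; setLen(0)
# only resets the length, so add() reuses the same memory rather
# than allocating a fresh string each time.
var buf = ""
for line in ["ab cd", "ef gh"]:
  buf.setLen(0)
  for c in line:
    if c == ' ': break
    buf.add(c)
  echo(buf)  # prints "ab", then "ef"
```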
Now that the parsing code has been introduced, all that's left is to find the most popular article on the English Wikipedia. The following listing shows the finished sequential_counts application.
import os, parseutils

proc parse(line: string, domainCode, pageTitle: var string,
           countViews, totalSize: var int) =
  var i = 0
  domainCode.setLen(0)
  i.inc parseUntil(line, domainCode, {' '}, i)
  i.inc
  pageTitle.setLen(0)
  i.inc parseUntil(line, pageTitle, {' '}, i)
  i.inc
  countViews = 0
  i.inc parseInt(line, countViews, i)
  i.inc
  totalSize = 0
  i.inc parseInt(line, totalSize, i)

proc readPageCounts(filename: string) =
  var domainCode = ""
  var pageTitle = ""
  var countViews = 0
  var totalSize = 0
  var mostPopular = ("", "", 0, 0)
  for line in filename.lines:
    parse(line, domainCode, pageTitle, countViews, totalSize)
    if domainCode == "en" and countViews > mostPopular[2]:
      mostPopular = (domainCode, pageTitle, countViews, totalSize)
  echo("Most popular is: ", mostPopular)

when isMainModule:
  const file = "pagecounts-20160101-050000"
  let filename = getCurrentDir() / file
  readPageCounts(filename)
Ensure that you compile sequential_counts.nim in release mode by passing the -d:release flag to the Nim compiler. Without that flag, the execution time of the application will be significantly longer.
Replace the contents of sequential_counts.nim with the code in listing 6.16, and then compile it in release mode and run it. After a few seconds, you should see output similar to the following.
Most popular is: (Field0: en, Field1: Main_Page, Field2: 271165, Field3: 4791147476)
The most popular page in the English Wikipedia is in fact the main page! This makes a lot of sense, and although it’s obvious in hindsight, it’s trivial to edit the code you’ve written to find more-interesting statistics. I challenge you to edit sequential_counts.nim and play around with the data. You can try finding the top-10 most popular pages in the English Wikipedia, or you can download different page-counts files and compare the results.
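As a starting point for that challenge, here's one possible sketch of a top-N variant. The names topN and PageCount are my own, and the data is inlined rather than read from a page-counts file; in your version you'd collect the tuples while iterating over the file's lines.

```nim
import algorithm

type PageCount = tuple[pageTitle: string, countViews: int]

# Keep only the n entries with the highest view counts.
proc topN(pages: seq[PageCount], n: int): seq[PageCount] =
  result = pages
  result.sort(proc (a, b: PageCount): int = cmp(b.countViews, a.countViews))
  result.setLen(min(n, result.len))

let pages: seq[PageCount] = @[
  ("Main_Page", 271165),
  ("Nim_(programming_language)", 100),
  ("Concurrency", 512)
]
let top2 = topN(pages, 2)
doAssert top2[0].pageTitle == "Main_Page"
doAssert top2[1].pageTitle == "Concurrency"
```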
You should now have a good understanding of how you can parse data effectively. You’ve learned what bottlenecks you should look out for in your Nim applications and how to fix them. The next step is to parallelize this parser so that its execution time is even lower on multicore CPUs.
In order for the program to be parallel, it must make use of threads. As mentioned previously, there are two ways that threads can be created in Nim: using the threads module, or using the threadpool module. Both will work, but the threadpool module is more appropriate for this program.
Before we parallelize the code, let’s measure how long sequential_counts takes to execute.
This can be done very easily on UNIX-like OSs by using the time command. Executing time ./sequential_counts should output the execution time of sequential_counts.nim. On a MacBook Pro with an SSD and a dual-core 2.7 GHz Intel Core i5 CPU, which includes hyperthreading, the execution time is about 2.8 seconds.
On Windows, you’ll need to open a new Windows PowerShell window, and then use the Measure-Command command to measure the execution time. Executing Measure-Command {./sequential_counts.exe} should output the execution time.
The program currently runs in a single thread and is very CPU-intensive. This means its speed can be significantly improved by making it parallel.
Create a new parallel_counts.nim file. This is the file that we’ll populate with code from now on.
How can the threadpool module be used to parallelize this code? You may be tempted to spawn the parse procedure, but this won’t work because it needs var parameters that can’t safely be passed to a spawned procedure. It also wouldn’t help much, because a single call to parse is relatively quick.
Before you can parallelize this code, you must first change the way that the page-counts file is read. Instead of reading each line separately, you need to read the file in larger fragments. But what size fragment should you read?
Consider the following scenario. The page-counts file begins with the following lines:
en Main_Page 123 1234567
en Nim_(programming_language) 100 12415551
If the fragment size is so small that only "en Main_Page" is read, the program will fail, because the fragment doesn't contain a complete line: the view-count and size fields are missing.
Alternatively, a fragment might contain valid data at the start, but it may end with a line that was not fully read, such as "en Main_Page 123 1234567 en Nim_". This data will need to be split after every newline character ("\n"), and each line will need to be parsed separately. The last line in this example will lead to an error, because it's incomplete. A solution is to find where the last full line ends, and then defer parsing the partially read line until the next fragment of the file is read.
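The "find where the last full line ends" step can be sketched in isolation. The completeLen helper below is hypothetical; the final program inlines this logic rather than defining a separate procedure.

```nim
# Return the length of the fragment up to and including the last
# newline; everything after that point is an incomplete line.
proc completeLen(fragment: string): int =
  result = fragment.len
  while result > 0 and fragment[result - 1] != '\n':
    dec result

let fragment = "en Main_Page 123 1234567\nen Nim_"
doAssert fragment[0 ..< completeLen(fragment)] == "en Main_Page 123 1234567\n"
doAssert fragment[completeLen(fragment) .. ^1] == "en Nim_"
```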
Here’s how parallel_counts.nim should work:

1. Open the page-counts file.
2. Read a fragment of up to chunkSize characters from the file.
3. Find the last newline character in the fragment, and set aside everything after it for the next iteration.
4. Spawn a parseChunk thread, passing it the fragment up to and including the last newline.
5. Repeat from step 2 until the whole file has been read.
6. Retrieve the results from the spawned threads and display the most popular page.
The term chunk is synonymous with the term fragment, and throughout this chapter both will be used interchangeably. A slice means a subset of the full data, such as a substring.
Listings 6.18, 6.19, and 6.20 show different sections of a parallel_counts.nim file that implements this solution.
Listing 6.18 starts with the top section of the file, which is not much different from the sequential version. This section includes the import statement, some new type definitions, and the original parse procedure. A new Stats type is defined to store page-count statistics about a specific page; this type will be used to store the most popular page in each spawned procedure. The Stats type will be returned from the spawned procedure, so it must be a ref type because spawn currently can’t spawn procedures that return custom value types. A new procedure called newStats is also defined, which constructs a new empty Stats object. There’s also the definition of $, which converts a Stats type to a string.
import os, parseutils, threadpool, strutils

type
  Stats = ref object
    domainCode, pageTitle: string
    countViews, totalSize: int

proc newStats(): Stats =
  Stats(domainCode: "", pageTitle: "", countViews: 0, totalSize: 0)

proc `$`(stats: Stats): string =
  "(domainCode: $#, pageTitle: $#, countViews: $#, totalSize: $#)" % [
    stats.domainCode, stats.pageTitle, $stats.countViews, $stats.totalSize
  ]

proc parse(line: string, domainCode, pageTitle: var string,
    countViews, totalSize: var int) =
  if line.len == 0: return
  var i = 0

  domainCode.setLen(0)
  i.inc parseUntil(line, domainCode, {' '}, i)
  i.inc

  pageTitle.setLen(0)
  i.inc parseUntil(line, pageTitle, {' '}, i)
  i.inc

  countViews = 0
  i.inc parseInt(line, countViews, i)
  i.inc

  totalSize = 0
  i.inc parseInt(line, totalSize, i)
Listing 6.19 shows the middle section of the parallel_counts.nim file. It defines a new procedure called parseChunk, which takes a string parameter called chunk and returns the most popular English Wikipedia page in that fragment. The fragment consists of multiple lines of page-count data.
The procedure begins by initializing the result variable; the return type is a ref type that must be initialized so that it’s not nil. The rest of the procedure is similar to the readPageCounts procedure in the sequential_counts.nim file. It defines four variables to store the parsed tokens, and then it iterates through the lines in the chunk using the splitLines iterator, and parses each of the lines.
proc parseChunk(chunk: string): Stats =
  result = newStats()
  var domainCode = ""
  var pageTitle = ""
  var countViews = 0
  var totalSize = 0
  for line in splitLines(chunk):
    parse(line, domainCode, pageTitle, countViews, totalSize)
    if domainCode == "en" and countViews > result.countViews:
      result = Stats(domainCode: domainCode, pageTitle: pageTitle,
          countViews: countViews, totalSize: totalSize)
Listing 6.20 shows the readPageCounts procedure, which has been modified significantly since the last time you saw it in listing 6.16. It now takes an optional parameter called chunkSize that determines how many characters it should read each iteration. But the procedure’s implementation is what differs most. The file is opened manually using the open procedure, followed by definitions of variables required to properly store the results of the fragment-reading process.
The fragment-reading process is complicated by the fact that the code needs to keep track of unfinished lines. It does so by moving backwards through the contents of buffer, which stores the fragment temporarily, until it finds a newline character. The buffer string is then sliced from the start of the fragment to the end of the last full line in the fragment. The resulting slice is then passed to the parseChunk procedure, which is spawned in a new thread using spawn.
The end of the fragment that hasn’t yet been parsed is then moved to the beginning of the buffer. In the next iteration, the length of the characters that will be read will be chunkSize minus the length of the buffer that wasn’t parsed in the last iteration.
proc readPageCounts(filename: string, chunkSize = 1_000_000) =
  var file = open(filename)
  var responses = newSeq[FlowVar[Stats]]()
  var buffer = newString(chunkSize)
  var oldBufferLen = 0
  while not endOfFile(file):
    let reqSize = chunkSize - oldBufferLen
    let readSize = file.readChars(buffer, oldBufferLen, reqSize) + oldBufferLen
    var chunkLen = readSize
    while chunkLen > 0 and buffer[chunkLen - 1] notin NewLines:
      chunkLen.dec

    responses.add(spawn parseChunk(buffer[0 ..< chunkLen]))

    oldBufferLen = readSize - chunkLen
    buffer[0 ..< oldBufferLen] = buffer[readSize - oldBufferLen .. ^1]

  var mostPopular = newStats()
  for resp in responses:
    let statistic = ^resp
    if statistic.countViews > mostPopular.countViews:
      mostPopular = statistic

  echo("Most popular is: ", mostPopular)
  file.close()

when isMainModule:
  const file = "pagecounts-20160101-050000"
  let filename = getCurrentDir() / file
  readPageCounts(filename)
The parallel version is unfortunately more complex, but the complexity is mostly confined to the readPageCounts procedure, where the fragment-reading algorithm accounts for most of it. In terms of line count, though, the parallel version is only about twice as long as the sequential one.
Merge listings 6.18, 6.19, and 6.20 into a single parallel_counts.nim file. Then compile and run the program. Make sure you pass both the --threads:on flag as well as the -d:release flag to Nim when compiling. Measure the execution time using the techniques described in section 6.4.1.
On a MacBook Pro with an SSD and a dual core 2.7 GHz Intel Core i5 CPU that includes hyperthreading, the execution time is about 1.2 seconds, which is less than half of the 2.8 seconds that the sequential version took to execute. That’s a considerable difference!
On UNIX-like systems, the time command allows you to verify that the parallel version is in fact parallel by looking at its CPU usage. For example, the time command outputs ./parallel_counts 4.30s user 0.25s system 364% cpu 1.251 total, showing that parallel_counts was using 364% of the available CPU. In comparison, sequential_counts almost always shows around 99% CPU usage. This high CPU usage percentage proves that parallel_counts is using all cores together with hyperthreading.
Now that you’ve seen how to parallelize a parser, you should have a better idea about how to parallelize Nim code in general. The last sections of this chapter will teach you about race conditions and how to avoid them.
You don’t typically need to worry about race conditions when writing concurrent code in Nim because of the restriction that Nim puts on GC-safe procedures: memory belonging to another thread can’t be accessed in a spawned procedure or a procedure marked using the {.thread.} pragma.
A race condition occurs when two or more threads attempt to read and write to a shared resource at the same time. Such behavior can result in unpredictable results that often are difficult to debug. This is one of the reasons why Nim prevents the sharing of some resources between threads. Nim instead prefers data to be shared using alternative methods such as channels, which prevent race conditions.
Sometimes these methods aren’t appropriate for certain use cases, such as when lots of data needs to be modified by the thread. Because of this, Nim also supports shared memory. Sharing memory via global variables is easy as long as you only want to share value types. Sharing reference types is much harder because you must make use of Nim’s manual memory-management procedures.
Using shared memory is risky because it increases the chances for race conditions in your code. Also, you must manage the memory yourself. I advise you to only use shared memory if you’re certain that it’s required and if you know what you’re doing. In future versions of Nim, using shared memory will likely become safer and much easier.
Listing 6.21 implements a simple program that increments the value of a global variable inside two threads running in parallel. The result is a race condition.
import threadpool

var counter = 0

proc increment(x: int) =
  for i in 0 ..< x:
    var value = counter
    value.inc
    counter = value

spawn increment(10_000)
spawn increment(10_000)
sync()
echo(counter)
In this example, the increment procedure is GC safe because the global variable counter it accesses is of type int, which is a value type. The increment procedure increments the global counter variable x times. The procedure is spawned twice, which means that there will be two increment procedures executing at the same time. The fact that they’re both reading, incrementing, and then writing the incremented value to the global counter variable in discrete steps means that some increments may be missed.
Value types, such as integers, can exist on the stack (or in the executable’s data section if the value is stored in a global variable), but reference types such as string, seq[T] and ref T can’t. Nim supports the sharing of reference types, but it won’t manage the memory for you. This may change in a future version of Nim, but currently you must use a procedure called allocShared defined in the system module to allocate shared memory manually.
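As a taste of what's involved, here's a minimal single-threaded sketch of manual shared allocation. Note that you, not the garbage collector, are responsible for calling deallocShared when you're done.

```nim
# allocShared0 allocates zeroed memory from the shared heap;
# any thread may safely point at it, but we must free it ourselves.
var counter = cast[ptr int](allocShared0(sizeof(int)))
doAssert counter[] == 0     # allocShared0 zeroes the memory
counter[] = 42
doAssert counter[] == 42
deallocShared(counter)      # manual cleanup; no GC will do this for us
```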
Save listing 6.21 as race_condition.nim, and then compile it without the -d:release flag and run it. Run it a couple of times and note the results. The results should appear random, and should almost never display the expected value of 20,000. Figure 6.5 shows what the execution of listing 6.21 looks like.
Preventing race conditions is very important because whenever a bug occurs due to a race condition, it’s almost always nondeterministic. The bug will be very difficult to reproduce, and once it is reproduced, debugging it will be even harder because the mere act of doing so may cause the bug to disappear.
Now that you know what race conditions are, let’s look at ways to prevent them.
Just like most languages, Nim provides synchronization mechanisms to ensure that resources are only used by a single thread at a time.
One of these mechanisms is a lock. It enforces limits on access to a resource, and it’s usually paired with a single resource. Before that resource is accessed, the lock is acquired, and after the resource is accessed, it’s released. Other threads that try to access the same resource must attempt to acquire the same lock, and if the lock has already been acquired by another thread, the acquire operation will block the thread until the lock is released. This ensures that only one thread has access to the resource.
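The acquire/release pattern looks like this using Nim's locks module. This is a single-threaded sketch just to show the API shape (compile with --threads:on); with only one thread, acquire never has to wait.

```nim
import locks

var lock: Lock
initLock(lock)

var resource = 0

acquire(lock)     # blocks until no other thread holds the lock
resource.inc      # the shared resource may be safely accessed here
release(lock)     # let any waiting threads proceed

doAssert resource == 1
deinitLock(lock)
```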
Locks work very well, but they aren’t assigned to any variables by default. They can be assigned using a guard. When a variable is guarded with a specific lock, the compiler will ensure that the lock is locked before allowing access. Any other access will result in a compile-time error.
The following listing shows how a new Lock, together with a guard, can be defined.
import threadpool, locks

var counterLock: Lock
initLock(counterLock)
var counter {.guard: counterLock.} = 0

proc increment(x: int) =
  for i in 0 ..< x:
    var value = counter
    value.inc
    counter = value

spawn increment(10_000)
spawn increment(10_000)
sync()
echo(counter)
Save listing 6.22 as unguarded_access.nim, and then compile it. The compilation should fail with “unguarded_access.nim(9, 17) Error: unguarded access: counter.” This is because the counter variable is protected by the guard, which ensures that any access to counter must occur after the counterLock lock is locked. Let’s fix this error by locking the counterLock lock.
import threadpool, locks

var counterLock: Lock
initLock(counterLock)
var counter {.guard: counterLock.} = 0

proc increment(x: int) =
  for i in 0 ..< x:
    withLock counterLock:
      var value = counter
      value.inc
      counter = value

spawn increment(10_000)
spawn increment(10_000)
sync()
echo(counter)
Save the code in listing 6.23 as parallel_incrementer.nim, and then compile and run it. The file should compile successfully and its output should always be 20000, which means that the race condition is fixed! The fact that the compiler verifies that every guarded variable is locked properly ensures the safe execution of the code. It also helps prevent bugs from appearing accidentally in the future, when new code is added or existing code is changed.
Despite all Nim’s efforts to make locks as safe as possible, they may not always be the safest choice. And for some use cases, they may simply be inappropriate, such as when threads share very few resources. Channels offer an alternative form of synchronization that allows threads to send and receive messages between each other.
A channel is an implementation of a queue—a first-in-first-out (FIFO) data structure. This means that the first value to be added to the channel will be the first one to be removed. The best way to visualize such a data structure is to imagine yourself queuing for food at a cafeteria. The first person to queue is also the first person to get their food. Figure 6.6 shows a representation of a FIFO channel.
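The FIFO behavior is easy to verify with a small single-threaded sketch (channels still require compiling with --threads:on):

```nim
var chan: Channel[string]
open(chan)

# Messages come out in exactly the order they went in.
chan.send("first in line")
chan.send("second in line")
doAssert chan.recv() == "first in line"
doAssert chan.recv() == "second in line"
close(chan)
```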
Nim implements channels in the channels module of the standard library. This module is part of system, so it doesn’t need to be explicitly imported.
A channel is created as a global variable, allowing every thread to send and receive messages through it. Once a channel is defined, it must be initialized using the open procedure. Listing 6.24 defines and initializes a new chan variable of type Channel [string]. You can specify any type inside the square brackets, including your own custom types.
var chan: Channel[string]
open(chan)
Values can be sent using the send procedure and received using the recv procedure. The following listing shows how to use both procedures.
import os, threadpool

var chan: Channel[string]
open(chan)

proc sayHello() =
  sleep(1000)
  chan.send("Hello!")

spawn sayHello()
doAssert chan.recv() == "Hello!"
The recv procedure will block until a message is received. You can use the tryRecv procedure to get nonblocking behavior; it returns a tuple consisting of a Boolean, which indicates whether or not data was received, and the actual data.
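For example, here's a sketch using tryRecv on an empty and then a nonempty channel (again compiled with --threads:on). The returned tuple's fields are named dataAvailable and msg:

```nim
var chan: Channel[int]
open(chan)

# Nothing has been sent yet, so tryRecv returns immediately
# with dataAvailable set to false instead of blocking.
let empty = chan.tryRecv()
doAssert not empty.dataAvailable

chan.send(42)
let filled = chan.tryRecv()
doAssert filled.dataAvailable and filled.msg == 42
close(chan)
```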
To give you a better idea of how channels work, let’s implement listing 6.23 with channels instead of locks. The following listing shows parallel_incrementer.nim implemented using channels.
import threadpool

var resultChan: Channel[int]
open(resultChan)

proc increment(x: int) =
  var counter = 0
  for i in 0 ..< x:
    counter.inc
  resultChan.send(counter)

spawn increment(10_000)
spawn increment(10_000)
sync()

var total = 0
for i in 0 ..< resultChan.peek:
  total.inc resultChan.recv()
echo(total)
The global counter variable is replaced by a global resultChan channel. The increment procedure increments a local counter variable x times, and then it sends counter’s value through the channel. This is done in two different threads.
The main thread waits for the two threads to finish, at which point it reads the messages that have been sent to the resultChan. Figure 6.7 shows the execution of listing 6.26.