Chapter 4. On scalars

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 4. On scalars

It requires a very unusual mind to undertake the analysis of the obvious.

Alfred North Whitehead^[1]

¹ Science and the Modern World (Macmillan, 1925).

This chapter covers

Understanding precision
Trying to be rational
When to use keywords
Symbolic resolution
Regular expressions—the second problem

So far, we’ve covered a somewhat eclectic mix of fun and practical concerns. This brings us to a point where we can dive deeper into a fundamental topic: how Clojure deals with scalar values, including numeric, symbolic, and regular expression values, and how they behave as data and sometimes as code.

A scalar data type represents a singular value of one of the following types: number, symbol, keyword, string, or character. Most of the use cases for Clojure’s scalar data types will be familiar to you, but there are nuances that should be observed. Clojure’s scalar data types exist in an interesting conceptual space. Because of Clojure’s symbiotic relationship with the host language it’s embedded in, some of the scalar type behaviors walk a conceptual line between pure Clojure semantics and host semantics. This chapter provides a rundown of some of the common uses of Clojure’s scalar data types as well as some pitfalls you might encounter. In most cases, Clojure will shield you from the quirks of its host, but there are times when they’ll demand attention. We’ll talk about their limitations and possible mitigation techniques. Additionally, we’ll address the age-old topic of Lisp-1 versus Lisp-2 implementations and how Clojure approaches the matter. Finally, we’ll talk briefly about Clojure’s regular expression literals and how they’re typically used.

We’ll first cover matters of numerical precision and how the Java Virtual Machine works to thwart your attempts at mathematical nirvana.

4.1. Understanding precision

Numbers in Clojure are by default as precise as they need to be. Given enough memory, you could store the value of pi accurately up to a billion places and beyond; in practice, values that large are rarely needed. But it’s sometimes important to provide perfect accuracy at less-precise values. When dealing with raw Clojure functions and forms, it’s a trivial matter to ensure such accuracy; it’s handled automatically. Because Clojure encourages interoperability with its host platform, the matter of accuracy becomes less than certain. This section will talk about real matters of precision related to Clojure’s support for the Java Virtual Machine. As it pertains to programming languages, numerical precision^[2] is proportional to the mechanisms used for storing numerical representations. The Java language specification describes the internal representation of its primitive types, thus limiting their precision. Depending on the class of application specialization, a programmer could go an entire career and never be affected by Java’s precision limitations. But many industries require perfect accuracy of arbitrarily precise computations, and here Clojure can provide a great boon; but with this power come some pointy edges, as we’ll discuss shortly.

² As opposed to arithmetic precision.

4.1.1. Truncation

Truncation refers to limiting accuracy for a floating-point number based on a deficiency in the corresponding representation. When a number is truncated, its precision is limited such that the maximum number of digits of accuracy is bound by the number of bits that can “fit” into the storage space allowed by its representation. For floating-point values, Clojure truncates by default. Therefore, if high precision is required for your floating-point operations, then explicit typing is required, as seen with the use of the M literal in the following:

(let [imadeuapi 3.14159265358979323846264338327950288419716939937M]
  (println (class imadeuapi))
  imadeuapi)
; java.math.BigDecimal
;=> 3.14159265358979323846264338327950288419716939937M

(let [butieatedit 3.14159265358979323846264338327950288419716939937]
  (println (class butieatedit))
  butieatedit)
; java.lang.Double
;=> 3.141592653589793

As we show, the local butieatedit is truncated because the default Java double type is insufficient. On the other hand, imadeuapi uses Clojure’s literal notation, a suffix character M, to declare a value as requiring arbitrary decimal representation. This is one possible way to mitigate truncation for an immensely large range of values, but as we’ll explore in section 4.2, it’s not a guarantee of perfect precision.

4.1.2. Promotion

Clojure is able to detect when overflow occurs, and it promotes the value to a numerical representation that can accommodate larger values. In many cases, promotion results in the usage of a pair of classes used to hold exceptionally large values. This promotion in Clojure is automatic, because the primary focus is first correctness of numerical values, then raw speed. It’s important to remember that this promotion will occur, as shown in the following snippet, and your code should accommodate this certainty:^[3]

³ In the example, it’s important to realize that the actual class of the value is changing, so any functions or methods reliant on specific types might not work as expected.

Java has a bevy of contexts under which automatic type conversion occurs, so we advise you to familiarize yourself with them (Lindholm 1999) when dealing with Java native libraries.

4.1.3. Overflow

Integer and long values in Java are subject to overflow errors. When a numeric calculation results in a value that’s larger than 32 bits of representation will allow, the bits of storage wrap around. When you’re operating in Clojure, overflow isn’t an issue for most cases, thanks to promotion. But when you’re dealing with numeric operations on primitive types, overflow can occur. Fortunately, in these instances an exception will occur rather than propagating inaccuracies:

(+ Long/MAX_VALUE Long/MAX_VALUE)
;=> java.lang.ArithmeticException: integer overflow

Clojure provides a class of unchecked integer and long mathematical operations that assume their arguments are primitive types. These unchecked operations will overflow if given excessively large values:

(unchecked-add (Long/MAX_VALUE) (Long/MAX_VALUE))
;=> -2

You should take care with unchecked operations, because there’s no way to detect overflowing values and no reliable way to return from them. Use the unchecked functions only when overflow is desired.

4.1.4. Underflow

Underflow is the inverse of overflow, where a number is so small that its value collapses into zero. Here are simple examples of underflow for floats and doubles:

(float 0.0000000000000000000000000000000000000000000001)
;=> 0.0

1.0E-430
;=> 0.0

Underflow presents a danger similar to overflow, except that it occurs only with floating-point numbers.

4.1.5. Rounding errors

When the representation of a floating-point value isn’t sufficient for storing its actual value, then rounding errors will occur (Goldberg 1991). Rounding errors are an especially insidious numerical inaccuracy, because they have a habit of propagating throughout a computation and/or build over time, leading to difficulties in debugging. There’s a famous case involving the failure of a Patriot missile caused by a rounding error, resulting in the death of 28 U.S. soldiers in the first Gulf War (Skeel 1992). This occurred due to a rounding error in the representation of a count register’s update interval. The timer register was meant to update once every 0.1 seconds, but because the hardware couldn’t represent 0.1 directly, an approximation was used instead. Tragically, the approximation used was subject to rounding error. Therefore, over the course of 100 hours, the rounding accumulated into a timing error of approximately 0.34 seconds:

In the case of the Patriot missile, the deviation of 0.34 seconds was enough to cause a catastrophic software error, resulting in the missile’s ineffectiveness. When human lives are at stake, the inaccuracies wrought from rounding errors are unacceptable. For the most part, Clojure can maintain arithmetic accuracies in a certain range, but you shouldn’t take for granted that such will be the case when interacting with Java libraries.

One way to contribute to rounding errors is to introduce doubles and floats into an operation. In Clojure, any computation involving even a single double results in a value that’s a double:

(+ 0.1M 0.1M 0.1M 0.1 0.1M 0.1M 0.1M 0.1M 0.1M 0.1M)
;=> 0.9999999999999999

Can you spot the double?

This discussion was Java-centric, but Clojure’s ultimate goal is to be platform agnostic, and the problem of numerical consistency across platforms is a nontrivial matter. It’s still unknown whether the preceding points will be universal across host platforms, so bear in mind that they should be reexamined when using Clojure outside the context of the JVM. Now that we’ve identified the root issues when dealing with numbers in Clojure, we’ll dive into a successful mitigation technique for dealing with them: rationals.

4.2. Trying to be rational

Clojure provides a data type representing a rational number, and all of its core mathematical functions operate with rational numbers. A rational number is a fraction consisting of an arbitrary-precision integer numerator and denominator of the form 22/7. Clojure’s rational type allows for arbitrarily large numerators and denominators. We won’t go into depth about the limitations of floating-point operations, but the problem can be summarized simply. Given a finite representation of an infinitely large set, a determination must be made which finite subset is represented. In the case of standard floating-point numbers as representations of real numbers, the distribution of represented numbers is logarithmic (Kuki 1973) and not one for one. What does this mean in practice? That requiring more accuracy in your floating-point operations increases the probability that the corresponding representation won’t be available. In these circumstances, you have to settle for approximations. But Clojure’s rational number type provides a way to retain perfect accuracy when needed.

4.2.1. Why be rational?

Of course, Clojure provides a decimal type that’s boundless relative to your computer memory, so why wouldn’t you use that? In short, you can, but decimal operations can be easily corrupted—especially when you’re working with existing Java libraries (Kahan 1998), taking and returning primitive types. Additionally, in the case of Java, its underlying BigDecimal class is finite in that it uses a 32-bit integer to represent the number of digits to the right of the decimal place. This can represent an extremely large range of values perfectly, but it’s still subject to error:

1.0E-430000000M
;=> 1.0E-430000000M

1.0E-4300000000M
;=> java.lang.RuntimeException: java.lang.NumberFormatException

Even if you manage to ensure that your BigDecimal values are free from floating-point corruption, you can never protect them from themselves. At some point, a floating-point calculation will encounter a number such as 2/3 that always requires rounding, leading to subtle yet propagating errors. Finally, floating-point arithmetic is neither associative nor distributive, which may lead to the shocking result that many floating-point calculations are dependent on the order in which they’re carried out:

As shown, by selectively wrapping parentheses, we’ve changed the order of operations and, amazingly, changed the answer! Therefore, for absolutely precise calculations, rationals are the best choice.^[4]

⁴ In the case of irrational numbers like pi, all bets are off.

4.2.2. How to be rational

Aside from the rational data type, Clojure provides functions that can help maintain your sanity: ratio?, rational?, and rationalize. Taking apart rationals is also a trivial matter.

The best way to ensure that your calculations remain as accurate as possible is to do them all using rational numbers. As shown next, the shocking results from using floating-point numbers have been eliminated:

You can use rational? to check whether a given number is a rational and then use rationalize to convert it to one. There are a few rules of thumb to remember if you want to maintain perfect accuracy in your computations:

Never use Java math libraries unless they return results of BigDecimal, and even then be suspicious.
Don’t rationalize values that are Java float or double primitives.
If you must write your own high-precision calculations, do so with rationals.
Only convert to a floating-point representation as a last resort.

Finally, you can extract the constituent parts of a rational using the numerator and denominator functions:

(numerator (/ 123 10))
;=> 123
(denominator (/ 123 10))
;=> 10

You might never need perfect accuracy in your calculations. When you do, Clojure provides tools for maintaining sanity, but the responsibility to maintain rigor lies with you.

4.2.3. Caveats of rationality

Like any programming language feature, Clojure’s rational type has its share of trade-offs. The calculation of rational math, although accurate, isn’t nearly as fast as with floats or doubles. Each operation in rational math has an overhead cost (such as finding the least common denominator) that should be accounted for. It does you no good to use rational operations if speed is a greater concern than accuracy.

That covers the numerical scalars, so we’ll move on to two data types that you may not be familiar with unless you come from a background in the Lisp family of languages: keywords and symbols.

4.3. When to use keywords

If you’ll recall from section 2.1.6, keywords are self-evaluating types that are prefixed by one or more colons, as shown next:

:a-keyword

;;=> :a-keyword

::also-a-keyword

;;=> :user/also-a-keyword

Knowing how to type a keyword is important,^[5] but the purpose of Clojure keywords can sometimes lead to confusion for first-time Clojure programmers, because their analogue isn’t often found^[6] in other languages. This section attempts to alleviate the confusion and provides some tips for how keywords are typically used.

⁵ You’ll notice that the third keyword has two colons in the front. This is a qualified keyword and is discussed in section 4.3.2.

⁶ Ruby has a symbol type that acts, looks, and is used similarly to Clojure keywords.

4.3.1. Applications of keywords

Keywords always refer to themselves, whereas symbols don’t. What this means is that the keyword :magma always has the value :magma, whereas the symbol ruins may refer to any legal Clojure value or reference.

As keys

In Clojure code in the wild, keywords are almost always used as map keys:

(def population {:zombies 2700, :humans 9})

(get population :zombies)
;=> 2700

(println (/ (get population :zombies)
            (get population :humans))
         "zombies per capita")
; 300 zombies per capita

The use of the get function does what you might expect. That is, get takes a collection and attempts to look up its keyed value. But there’s a special property of keywords that makes this lookup more concise.

As functions

So far, we’ve somewhat oversimplified the capabilities of keywords. Although they’re self-evaluating, they’re also sometimes evaluated as functions. When a keyword appears in the function position, it’s treated as a function. Therefore, another important reason to use keywords as map keys is that they’re also functions that take a map as an argument to perform lookups of themselves:

(:zombies population)
;=> 2700

(println (/ (:zombies population)
            (:humans population))
         "zombies per capita")
; 300 zombies per capita

Using keywords as map-lookup functions leads to much more concise code.

As enumerations

Often, Clojure code uses keywords as enumeration values, such as :small, :medium, and :large. This provides a nice visual delineation in the source code.

As multimethod dispatch values

Because keywords are used often as enumerations, they’re ideal candidates for dispatch values for multimethods, which we’ll explore in more detail in section 9.2.

As directives

Another common use for keywords is to provide a directive to a function, multi-method, or macro. A simple way to illustrate this is to imagine a simple function pour, shown in listing 4.1, which takes two numbers and returns a lazy sequence of the range of those numbers. But there’s also a mode for this function that takes a keyword :toujours, which instead returns an infinite lazy range starting with the first number and continuing “forever.”

Listing 4.1. Using a keyword as a function directive

An illustrative bonus with pour is that the macro cond uses a directive :else to mark the default conditional case. In this case, cond uses the fact that the keyword :else is truthy; any keyword (or truthy value) would work just as well.

4.3.2. Qualifying your keywords

Keywords don’t belong to any specific namespace, although they may appear to if you start them with two colons rather than only one:

::not-in-ns
;=> :user/not-in-ns

When Clojure sees a double colon, it assumes that the programmer wants a qualified, or prefixed keyword. Because this example doesn’t specify an exact prefix, Clojure uses the current namespace name—in this case, user—to automatically qualify the keyword. But the prefix portion of the keyword marked as :user/ only looks like it’s denoting a namespace, when in fact it’s a prefix gathered from the current namespace by the Clojure reader. This may seem like a distinction without a difference—we’ll show how the prefix is arbitrary in relation to existing namespaces. First, you create a new namespace and manually create a prefixed keyword:

(ns another)

another=> :user/in-another

;=> :user/in-another

This example creates a namespace another and a keyword :user/in-another that appears to belong to the preexisting user namespace but in fact is only prefixed to look that way. The prefix on a keyword is arbitrary and in no way associates it with a given namespace as far as Clojure is concerned. You can even create a keyword with a prefix that’s not named the same as any existing namespace:

:haunted/name
;=> :haunted/name

You create a keyword :haunted/name, showing that the prefix doesn’t have to be an existing namespace name. But the fact that keywords aren’t members of any given namespace doesn’t mean namespace-qualifying them is pointless. Instead, it’s often clearer to do so, especially when a namespace aggregates a specific functionality and its keywords are meaningful in that context.

Separating the plumbing from the domain

Even though qualified keywords can have any arbitrary prefix, sometimes it’s useful to use namespaces to provide special information for keywords. In a namespace named crypto, the keywords ::rsa and ::blowfish make sense as being namespace qualified. Similarly, if you create a namespace aquarium, then using ::blowfish in it is contextually meaningful. Likewise, when adding metadata to structures, you should consider using qualified keywords as keys and directives if their intention is domain oriented. Consider the following code:

(defn do-blowfish [directive]
  (case directive
    :aquarium/blowfish (println "feed the fish")
    :crypto/blowfish   (println "encode the message")
    :blowfish          (println "not sure what to do")))
(ns crypto)
(user/do-blowfish :blowfish)
; not sure what to do

(user/do-blowfish ::blowfish)
; encode the message
(ns aquarium)
(user/do-blowfish :blowfish)
; not sure what to do
(user/do-blowfish ::blowfish)
; feed the fish

When switching to different namespaces using ns, you can use the namespacequalified keyword syntax to ensure that the correct domain-specific code path is executed.

Namespace qualification is especially important when you’re creating ad hoc hierarchies and defining multimethods, both discussed in section XREF ch09lev1sec2.

4.4. Symbolic resolution

In the previous section, we covered the differences between symbols and keywords. Whereas keywords are fairly straightforward, symbols abide by a slightly more complicated system for lookup resolution.

Symbols in Clojure are roughly analogous to identifiers in many other languages—words that refer to other things. In a nutshell, symbols are primarily used to provide a name for a given value. But in Clojure, symbols can also be referred to directly, by using the symbol or quote function or the ' special operator. Symbols tend to be discrete entities from one lexical contour (or scope) to another, and often even in a single contour. Unlike keywords, symbols aren’t unique based solely on name alone, as you can see in the following:

(identical? 'goat 'goat)
;=> false

identical? returns false in this example because each goat symbol is a discrete object that only happens to share a name and therefore a symbolic representation. But that name is the basis for symbol equality:

(= 'goat 'goat)
;=> true

(name 'goat)
"goat"

The identical? function in Clojure only ever returns true when the symbols are the same object:

(let [x 'goat, y x]
  (identical? x y))

;=> true

In the preceding example, x is also a symbol; but when evaluated in the (identical? x y) form, it returns the symbol goat, which is being stored on the runtime call stack. The question arises: why not make two identically named symbols the same object? The answer lies in metadata, which we discuss next.

4.4.1. Metadata

Clojure lets you attach metadata to various objects, but for now we’ll focus on attaching metadata to symbols. The with-meta function takes an object and a map and returns another object of the same type with the metadata attached. Equally named symbols often aren’t the same instance because each can have its own unique metadata:

(let [x (with-meta 'goat {:ornery true})
      y (with-meta 'goat {:ornery false})]
  [(= x y)
   (identical? x y)
   (meta x)
   (meta y)])

;=> [true false {:ornery true} {:ornery false}]

The two locals x and y both hold an equal symbol 'goat, but they’re different instances, each containing separate metadata maps obtained with the meta function. So you see, symbol equality depends on neither metadata nor identity. This equality semantic isn’t limited to symbols but is pervasive in Clojure, as we’ll demonstrate throughout this book. You’ll find that keywords can’t hold metadata^[7] because any equally named keyword is the same object.

⁷ Java class instances, including strings, can’t hold metadata either.

4.4.2. Symbols and namespaces

Like keywords, symbols don’t belong to any specific namespace. Take, for example, the following code:

(ns where-is)
(def a-symbol 'where-am-i)

a-symbol
;=> where-am-i

(resolve 'a-symbol)
;=> #'where-is/a-symbol

`a-symbol
;=> where-is/a-symbol

The initial evaluation of a-symbol shows the expected value where-am-i. But attempting to resolve the symbol using resolve and using syntax-quote returns what looks like (as printed at the REPL) a namespace-qualified symbol. This is because a symbol’s qualification is a characteristic of evaluation and not necessarily inherent in the symbol. This also applies to symbols qualified with class names. This evaluation behavior will prove beneficial when we discuss macros in chapter 8, but for now we can summarize the overarching idea known as Lisp-1 (Gabriel 2001).

4.4.3. Lisp-1

Clojure is what’s known as a Lisp-1,^[8] which in simple terms means it uses the same name resolution for function and value bindings. In a Lisp-2 programming language like Common Lisp, these name resolutions are performed differently depending on the context of the symbol, be it in a function-call position or a function-argument position. There are many arguments for and against both Lisp-1 and Lisp-2, but one downside of Lisp-1 bears consideration. Because the same name-resolution scheme is used for functions and their arguments, there’s a real possibility of shadowing existing functions with other locals or vars. Name shadowing isn’t necessarily bad if done thoughtfully, but if done accidentally it can lead to unexpected and obscure errors. You should take care when naming locals and defining new functions so that name-shadowing complications can be avoided.

⁸ It’s more like a Lisp-1.5, because like keywords, symbols can be used as functions when placed in the function-call position. This doesn’t have anything to do with name resolution, however, so it’s a minor point.

Because name-shadowing errors tend to be rare, the benefits of a simplified mechanism for calling and passing first-class functions far outweigh the detriments. Clojure’s adoption of a Lisp-1 resolution scheme makes for cleaner implementations and therefore highlights the solution rather than muddying the waters with the nuances of symbolic lookup. For example, the best function highlights this perfectly in the way that it takes the greater-than function > and calls it in its body as f:

(defn best [f xs]
  (reduce #(if (f % %2) % %2) xs))

(best > [1 3 4 2 7 5 3])
;=> 7

A similar function body using a Lisp-2 language requires the intervention of another function (in this case, funcall) responsible for invoking the function explicitly. Likewise, passing any function requires the use of a qualifying tag marking it as a function object, as shown here:

;; This is Common Lisp and NOT Clojure code
(defun best (f xs)
  (reduce #'(lambda (l r)
               (if (funcall f l r) l r))
          xs))

(best #'> '(1 3 4 2 7 5 3))
;=> 7

This section isn’t intended to champion the cause of Lisp-1 over Lisp-2, but rather to highlight the differences between the two. Many of the design decisions in Clojure provide succinctness in implementation, and Lisp-1 is no exception. The preference for Lisp-1 versus Lisp-2 typically boils down to matters of style and taste; by all practical measures, they’re equivalent.

Having covered the two symbolic scalar types, we now move into a type that you’re (for better or worse) likely familiar with: the regular expression.

4.5. Regular expressions—the second problem

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

Jamie Zawinski^[9]

⁹ Usenet post, 12 August 1997, alt.religion.emacs.

Regular expressions are a powerful and compact way to find specific patterns in text strings. Although we sympathize with Zawinski’s attitude and appreciate his wit, sometimes regular expressions are a useful tool to have on hand. The full capabilities of regular expressions (or regexes) are well beyond the scope of this section (Friedl 1997), but we’ll look at some of the ways Clojure uses Java’s regex capabilities.

Java’s regular-expression engine is reasonably powerful, supporting Unicode and features such as reluctant quantifiers and look-around clauses. Clojure doesn’t try to reinvent the wheel and instead provides special syntax for literal Java regex patterns plus a few functions to help Java’s regex capabilities fit better with the rest of Clojure.

4.5.1. Syntax

A literal regular expression in Clojure^[10] looks like this:

¹⁰ This section is Clojure-specific. ClojureScript’s regular expressions are notably different, and we’ll defer discussing them until chapter 13.

#"an example pattern"

This produces^[11] a compiled regex object that can be used either directly with Java interop method calls or with any of the Clojure regex functions described later:

¹¹ Literal regex patterns are compiled to java.util.regex.Pattern instances at read-time. This means, for example, if you use a literal regex in a loop, it’s not recompiled each time through the loop, but just once when the surrounding code is compiled.

(class #"example")
;=> java.util.regex.Pattern

Although the pattern is surrounded with double quotes like string literals, the way things are escaped within the quotes isn’t the same. If you’ve written regexes in Java, you know that any backslashes intended for consumption by the regex compiler must be doubled, as shown in the following compile call. This isn’t necessary in Clojure regex literals, as shown by the undoubled return value:

(java.util.regex.Pattern/compile "\d")
;=> #"d"

In short, the only rules you need to know for embedding unusual literal characters or predefined character classes are listed in the Javadoc for Pattern.^[12]

¹² See the online Pattern reference at http://mng.bz/qp77.

Regular expressions accept option flags, shown in table 4.1, that can make a pattern case-insensitive or enable multiline mode. Clojure’s regex literals starting with (?<flag>) set the mode for the rest of the pattern. For example, the pattern #"(?i)yo" matches the strings “yo”, “yO”, “Yo”, and “YO”.

Table 4.1. Flags that can be used in Clojure regular-expression patterns, along with their long name and a description of what they do. See Java’s documentation for the `java.util.regex.Pattern` class for more details.

Flag	Flag name	Description
d	UNIX_LINES	., ^, and $ match only the Unix line terminator ' '.
i	CASE_INSENSITIVE	ASCII characters are matched without regard to uppercase or lowercase.
x	COMMENTS	Whitespace and comments in the pattern are ignored.
m	MULTILINE	^ and $ match near line terminators instead of only at the beginning or end of the entire input string.
s	DOTALL	. matches any character including the line terminator.
u	UNICODE_CASE	Causes the i flag to use Unicode case insensitivity instead of ASCII.

4.5.2. Regular-expression functions

The re-seq function is Clojure’s regex workhorse. It returns a lazy seq of all matches in a string, which means it can be used to efficiently test whether a string matches or to find all matches in a string or a mapped file:

(re-seq #"w+" "one-two/three")
;=> ("one" "two" "three")

The preceding regular expression has no capturing groups, so each match in the returned seq is a string. A capturing group (subsegments that are accessible via the returned match object) in the regex causes each returned item to be a vector:

(re-seq #"w*(w)" "one-two/three")
;=> (["one" "e"] ["two" "o"] ["three" "e"])

Now that we’ve looked at some nice functions you can use, we’ll talk about one object you shouldn’t.

4.5.3. Beware of mutable matchers

Java’s regular-expression engine includes a Matcher object that mutates in a non-thread-safe way as it walks through a string finding matches. This object is exposed by Clojure via the re-matcher function and can be used as an argument to re-groups and the single-parameter form of re-find. We highly recommend avoiding all of these unless you’re certain you know what you’re doing. These dangerous functions are used internally by the implementations of some of the recommended functions described earlier, but in each case they’re careful to disallow access to the Matcher object they use. Use matchers at your own risk, or better yet don’t use them directly at all.^[13]

¹³ The clojure.string namespace has a bevy of functions that are handy for using regular expressions.

4.6. Summary

Clojure’s scalar types generally work as expected, but its numerical types have the potential to cause frustration in certain situations. Although you may rarely encounter issues with numerical precision, keeping in mind the circumstances under which they occur may prove useful in the future. Given its inherent arbitrary-precision big decimal and rational numerics, Clojure provides the tools for perfectly accurate calculations. Keywords serve many purposes and are ubiquitous in Clojure code. When dealing directly with symbols, Clojure’s nature as a Lisp-1 defines the nature of how symbolic resolution occurs. Finally, Clojure provides regular expressions as first-class data types, and their usage is encouraged where appropriate.

As you might have speculated, this chapter was nice and short due to the relative simplicity of scalar types. In the following chapter, we’ll step it up a notch or 10 when covering Clojure’s composite data types. Although scalars are interesting and deeper than expected, the next chapter will start you on your way to understanding Clojure’s true goal: providing a sane approach to application state.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 4. On scalars

Create new playlist

Sign In

Sign Up

Chapter 4. On scalars

4.1. Understanding precision

4.1.1. Truncation

4.1.2. Promotion

4.1.3. Overflow

4.1.4. Underflow

4.1.5. Rounding errors

4.2. Trying to be rational

4.2.1. Why be rational?

4.2.2. How to be rational

4.2.3. Caveats of rationality

4.3. When to use keywords

4.3.1. Applications of keywords

As keys

As functions

As enumerations

As multimethod dispatch values

As directives

Listing 4.1. Using a keyword as a function directive

4.3.2. Qualifying your keywords

4.4. Symbolic resolution

4.4.1. Metadata

4.4.2. Symbols and namespaces

4.4.3. Lisp-1

4.5. Regular expressions—the second problem

4.5.1. Syntax

Table 4.1. Flags that can be used in Clojure regular-expression patterns, along with their long name and a description of what they do. See Java’s documentation for the java.util.regex.Pattern class for more details.

4.5.2. Regular-expression functions

4.5.3. Beware of mutable matchers

4.6. Summary

Table of Contents for
Chapter 4. On scalars

Table 4.1. Flags that can be used in Clojure regular-expression patterns, along with their long name and a description of what they do. See Java’s documentation for the `java.util.regex.Pattern` class for more details.