CHAPTER 3: Learning the Ropes of Julia

The goal of this chapter is to get you acquainted with the specifics of the language, in order to gain enough mastery to use it for the problems we’ll be working through. Again, it is assumed that you are familiar with the basics of programming and have some experience in other programming languages. This way, we will be able to focus more on the data science applications of Julia.

This chapter will also be a useful reference for the language when you develop your own scripts. You will learn enough to be able to understand the language’s logic and figure out things yourself when needed. This chapter covers the following topics:

  • Data types
  • Arrays
  • Dictionaries
  • Basic commands and functions
  • Loops and conditionals.

Let us now dive into this exciting new technology and see how we can unveil its potential within the data science domain. If you are already familiar with the Julia language, feel free to skip to Chapter 5, or just focus on the exercises at the end of this chapter as well as those of the next one.

Data Types

Let’s delve into Julia programming by looking at the building blocks of data, usually referred to as types. Every variable in Julia belongs to a particular type, such as integer, string, array, etc. Still, certain types (like matrix or vector, for example) are not as straightforward as you may expect, and can be subtypes of other types (in this case the array type).

Although optional, defining the type of a variable tells Julia that any value assigned to that variable will be converted to that type. This is particularly useful when creating complex programs, where ambiguity often translates into errors and unexpected results. If you don’t define the type of a variable, Julia will assume the simplest type that fits the value of that variable (if there is no value, it will just use the generic type Any). Let’s clarify all this with a couple of examples:

In[1]: x = 123              #A

Out[1]: 123                 #B

In[2]: y = "hello world!"   #C

Out[2]: "hello world!"      #D

#A Assign the value 123 to variable x

#B Julia interprets this as an Integer and stores it into x

#C Assign the value “hello world!” to variable y

#D Julia interprets this as a String and stores it into y

You can discover the type of a variable using the typeof() command. So for the above examples’ variables:

In[3]: typeof(x)

Out[3]: Int64      #A

In[4]: typeof(y)

Out[4]: ASCIIString

#A This could be Int32 as well, depending on your computer

Int64 is a subcategory (or subtype) of the integer type, while ASCIIString is a specific case (or subtype) of the string type.
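
The subtype relationships described above can be checked directly. The short sketch below assumes the abstract type names Integer and AbstractString as the parents of the integer and string types; the variables follow the earlier examples.

```julia
x = 123
y = "hello world!"

# <: asks whether one type is a subtype of another
println(typeof(x) <: Integer)        # the concrete type of x (Int64 or Int32) sits under Integer

# isa() asks the same question about a value rather than a type
println(isa(y, AbstractString))      # every kind of string sits under AbstractString
println(isa(x, AbstractString))      # an integer is not a string
```

Checks like these are handy when a function should accept any integer or any string, regardless of the exact subtype.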

You can define the type of a variable using the double colon (x::Int64), as we will see shortly. For now, let’s just convert values from one type to another using the corresponding constructor functions, which have the same names as their types. For example, the function Int32() will transform whatever you give it into a 32-bit integer (Int32 type), provided that the conversion is possible. So, building on the example above:

In[5]: z = Int32(x)

Out[5]: 123

In[6]: typeof(z)

Out[6]: Int32

Naturally, not all types can convert from one to another:

In[7]: w = Int32("whatever")

Out[7]: ERROR: invalid base 10 digit 'w' in "whatever"
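
As a minimal sketch of conversions that succeed and fail, the snippet below uses parse() for text and a try/catch block for a value that does not fit the target type. The variable names are illustrative, and the example is kept to calls that should behave the same way across Julia versions.

```julia
x = 123
z = Int32(x)                 # the value fits, so the conversion succeeds
n = parse(Int32, "123")      # text holding digits is parsed rather than converted

# A value that cannot be represented in the target type raises an error,
# which can be caught instead of stopping the program
converted = try
    Int8(300)                # 300 lies outside the Int8 range of -128 to 127
    true
catch err
    false
end

println(typeof(z), " ", n, " ", converted)
```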

Below is a list of the main types found in Julia with corresponding examples.

Data Type     Sample Values
Int8          98, -123
Int32         2134112, -2199996
Int64         123123123123121, -1234123451234
Float32       12312312.3223, -12312312.3223
Float64       12332523452345.345233343, -123333312312.3223232
Bool          true, false (notice that the contents of this type of variable are always lowercase in Julia)
Char          'a', '?'
String        "some word or sentence", " "
BigInt        3454893223743457239848953894985240398349234435234532
BigFloat      3454893223743457239848953894985240398349234435234532.3432
Array         [1, 2322433423, 0.12312312, false, 'c', "whatever"]

To get a better understanding of types, we highly recommend that you spend some time playing with these data types in the REPL (which is short for “Read, Evaluate, Print, Loop” and refers to the interactive interface that is now common among most scripting languages). Pay close attention to the Char and String types, as their literals look similar: use single quotes (' ') for the former and double quotes (" ") for the latter.

The BigInt and BigFloat types have no limit on the value of their contained numbers, making them ideal for handling arbitrarily large numbers. They do take a toll on memory usage, though, so it’s best not to be too liberal with the use of these types. If you do plan to use them, make sure that you initialize the corresponding variables accordingly. For instance:

In[8]: x = BigInt()

As BigInt and BigFloat are special types, they cannot be defined with the double colon notation (::), so you will have to use the BigInt() and BigFloat() constructors respectively. When dealing with small values (between -128 and 127), use Int8 as it’s more frugal in terms of computer resources and is particularly useful for counter variables and many other cases dealing with small integer values (e.g. indexes).
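
To see why the arbitrary-precision types matter, here is a small sketch contrasting fixed-width overflow with BigInt arithmetic. It assumes the big() conversion function as a convenient way of obtaining a BigInt; the variable names are illustrative.

```julia
a = typemax(Int64) + 1        # fixed-width integers silently wrap around
b = big(typemax(Int64)) + 1   # a BigInt simply grows as needed
c = big(2)^100                # far beyond the range of Int64

println(a)
println(b)
println(c)
```

The wrap-around in the first line produces no error or warning, which is the main reason to reach for BigInt when values may exceed the Int64 range.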

Arrays

Array basics

Arrays are fundamental in Julia. They allow you to handle collections of any type of data, as well as combinations of different types. Just like in other languages such as R and Python, indexing an array takes place using square brackets, which are also used for defining a set of values as an array. So, for the array p = [1, 2322433423, 0.12312312, false, 'c', "whatever"], you can access its third element (the float number, in this case) by typing p[3]:

In[9]: p = [1, 2322433423, 0.12312312, false, 'c', "whatever"];

    p[3]

Out[9]: 0.12312312

Unlike many other languages (e.g. C#), indexing in Julia starts at 1 instead of 0; if you try to access the first element of an array by typing p[0] you’ll get a BoundsError. You’ll obtain the same error if you try to access something beyond the last element of an array. The index of an array always needs to be an integer (although you can use the Boolean value true as well, to access the first element of the array). To access the last element of an array, you can use the pseudo-index end:

In[10]: p[end]

Out[10]: "whatever"

This is particularly useful if you don’t know the exact length of an array, which is common when you add and remove elements in that array. Just like in other languages, arrays in Julia are mutable structures, making them relatively slower than immutable ones such as tuples or certain types of dictionaries. So if you are opting for versatility (e.g. in the case of a variable, or a list of coefficients), arrays are the way to go. Should you wish to initialize an array so that you can populate it at a later time, you merely need to provide Julia with the data type of its elements and its dimensions. Say that you want to have an array with three rows and four columns, for example, so that you can store Int64 values in it. Simply type:

In[11]: Z = Array(Int64, 3, 4)

Out[11]: 3x4 Array{Int64,2}:

34359738376 0 1 3

2147483649 4 1 4

   0 5 2 5

An array’s contents upon initialization are whatever happens to dwell in that part of the memory that Julia allocates to it. It is uncommon that you find such an array full of zeros. If you wish to have a more generic array that you will use to store all kinds of variables (including other arrays), you need to use the “Any” type when you initialize it:

In[12]: Z = Array(Any, 3, 1)

Out[12]: 3x1 Array{Any,2}:

#undef

#undef

#undef

Such an array is going to be full of undefined values (referred to as #undef). You cannot use these with any calculations, numeric or otherwise, so be sure to allocate some meaningful data to this array before using it (to avoid an error message).
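
If leftover memory contents or undefined values are a concern, one option is to create the array already filled with a known value. The sketch below assumes the zeros() and fill() functions for this purpose; note also that releases from 1.0 onwards write the uninitialized constructor as Array{Int64}(undef, 3, 4) rather than Array(Int64, 3, 4).

```julia
Z = zeros(Int64, 3, 4)        # 3x4 matrix, every entry guaranteed to be 0
F = fill(-1, 3, 4)            # same shape, every entry set to a chosen marker value

println(sum(Z))
println(size(F))
```

A marker value such as -1 makes it easy to spot cells that were never populated.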

Accessing multiple elements in an array

You can access several elements of an array at once using a range or an array of integers as an index. So, if you need to get the first three elements of p, you just need to type p[1:3] (note that 1:3 is an inclusive range for all integers between and including 1 and 3):

In[13]: p[1:3]

Out[13]: 3-element Array{Any,1}:

  1

2322433423

  0.123123

As sometimes the exact length of the array is unknown, it is handy to refer to the last element of the array as “end,” as we saw earlier. So, if you want to get the last three elements of p, type p[(end-2):end], with or without the parentheses (including them simply makes the code easier to understand).

Using an array of integers as an index is very similar. If you want to get only the first and the fourth element of p, you just need to type p[[1,4]]. Note the double brackets used here: the outermost are for referencing the p array, while the innermost are used to define the array of the indexes 1 and 4:

In[14]: p[[1,4]]

Out[14]: 2-element Array{Any,1}:

   1

false

In practice, you would store the indexes of interest to you in an array–let’s call it ind–and access the elements of interest using p[ind]. This makes for cleaner and more intuitive code.
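
The indexing patterns of this section can be put together in a short sketch. The names ind, picked and tail are illustrative only.

```julia
p = [1, 2322433423, 0.12312312, false, 'c', "whatever"]

ind = [1, 4]                 # indexes of interest, kept in their own variable
picked = p[ind]              # first and fourth elements
tail = p[end-2:end]          # last three elements, whatever the length of p

println(picked)
println(tail)
```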

Multidimensional arrays

For arrays that have more than one dimension, you must provide as many indexes as the number of dimensions. For example, say that A is a 3x4 matrix, which we can build and populate as follows:

In[15]: A = Array(Int64, 3,4); A[:] = 1:12; A

Out[15]: 3x4 Array{Int64,2}:

1 4 7 10

2 5 8 11

3 6 9 12

To obtain the third element in the second row, you need to type A[2,3]. Now, if you wish to get all the elements of the third row, you can do this by typing A[3,:] (you can also achieve this with A[3,1:end] but it is more cumbersome). If you want to access all elements in the matrix you would type A[:,:], though typing A would yield the same result, as we saw in the above example. By the way, if you wish to obtain the contents of A as a single column, just type A[:]. The result will be a one-dimensional array, with the elements taken column by column.
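
Here is a compact sketch of the matrix indexing described above. It builds the matrix with reshape() so that it does not depend on a particular array constructor, and it wraps the row in vec() because the shape returned by A[3,:] differs between Julia versions; both choices are assumptions made for portability.

```julia
A = reshape(collect(1:12), 3, 4)   # filled column by column, like A[:] = 1:12

println(A[2, 3])                   # row 2, column 3
row3 = vec(A[3, :])                # third row, flattened to one dimension
flat = A[:]                        # whole matrix as a one-dimensional array

println(row3)
println(flat == collect(1:12))
```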

Dictionaries

As the name suggests, a dictionary is a simple look-up table used for organizing different types of data. Just like an array, it can contain all kinds of data types, though usually there are two types in any given dictionary: one for the keys and one for the values. Unlike an array, a dictionary doesn’t have to have numeric indexes. Instead, it has an index of any type (usually referred to as a key). The data corresponding to that key is referred to as its value. Julia implements this data structure with an object called Dict, which provides a mapping between its keys and its values: {key1 => value1, key2 => value2, …, keyN => valueN}. The => operator is used mainly in this structure, and is shorthand for constructing a Pair. It is completely different from the >= operator, which is the “greater than or equal to” comparison. You can easily create a dictionary as follows:

In[16]: a = Dict()            #A

Out[16]: Dict{Any,Any} with 0 entries

In[17]: b = Dict("one" => 1, "two" => 2, "three" => 3, "four" => 4)   #B

Out[17]: Dict{ASCIIString,Int64} with 4 entries:

"two" => 2

"four" => 4

"one" => 1

"three" => 3

#A This creates an empty dictionary

#B This creates a dictionary with predefined entries (it is still mutable). Note that this format works only from version 0.4 onwards

To look up a value in a dictionary, just type its name and the key as an index, just as you would index an array, except with a key in place of an integer:

In[18]: b["three"]

Out[18]: 3

Naturally, if the key doesn’t exist in the dictionary, Julia throws an error:

In[19]: b["five"]

Out[19]: ERROR: key not found: "five"

in getindex at dict.jl:617

Dictionaries are useful for certain cases where you need to access data in a more intuitive manner, but don’t need to access ranges of data, such as in a database table containing book titles and their ISBNs.
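
A minimal sketch of everyday dictionary use follows: adding an entry and checking for a key before looking it up. It assumes the haskey() function for the membership test and reuses the dictionary b from above.

```julia
b = Dict("one" => 1, "two" => 2, "three" => 3, "four" => 4)

b["five"] = 5                    # assigning to a new key adds an entry

if haskey(b, "six")              # check before indexing to avoid the key error
    println(b["six"])
else
    println("no entry for six")
end

println(length(b))
```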

Basic Commands and Functions

We’ll continue our journey into Julia with a few basic commands and functions, which can make types more meaningful and useful. Get comfortable with each of these so you can create your own use cases. Every command will produce some kind of response, to show you that Julia has acknowledged it. To avoid this acknowledgment, you can use a semicolon right after the command, leading instead to another prompt.

print(), println()

Syntax: print(var1, var2, …, varN), where all of the arguments are optional and can be variables of any type. println() has exactly the same syntax.

Although the REPL makes it easy to view the values of variables when you enter their names, this perk is rarely available in real-world scenarios. Instead, you can use print() and println(). These functions barely need an introduction as they are essentially identical across various high-level languages.

The print() function simply prints a variable at the terminal, right after the previously printed data; this allows you to save space and customize how you view your data. The println() function prints a variable followed by a newline, ensuring that whatever is printed afterwards will start on a new line. You can use print() and println() with a number of variables (e.g. print(x,y,z), print(x, " ", y)) as follows:

In[20]: x = 123; y = 345; print(x, " ", y)

Out[20]: 123 345

In[21]: print(x,y); print("!")

Out[21]: 123345!

In[22]: println(x); println(y); print("!")

Out[22]: 123

345

!

All variables used with print() and println() are turned into strings, which are then concatenated and treated as a single string. These functions are particularly useful for debugging and for presenting the results of a program.

typemax(), typemin()

Syntax: typemax(DataType), typemin(DataType)

These commands provide you with some useful information about the limits of certain numeric types (e.g. Int32, Float64, etc.). For example:

In[23]: typemax(Int64)

Out[23]: 9223372036854775807

In[24]: typemin(Int32)

Out[24]: -2147483648

In[25]: typemax(Float64)

Out[25]: Inf

Finding the min and max values of a data type is particularly handy when you are dealing with numbers of high absolute value and you want to conserve memory. Although a single Float64 doesn’t utilize much memory on its own, imagine the impact if you were using a large array comprised of such variables.
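
One way to put these limits to work is to test whether a value fits a smaller type before choosing it. The helper fits() below is hypothetical (it is not part of Julia) and is only a sketch of the idea.

```julia
# Hypothetical helper: true when value v can be stored in integer type T
fits(T, v) = typemin(T) <= v <= typemax(T)

println(fits(Int8, 100))
println(fits(Int8, 200))
println(fits(Int32, 2134112))
println(sizeof(Int8), " vs ", sizeof(Int64))   # bytes used per value
```

For a large array, the difference in bytes per value is multiplied by the number of elements, which is where the memory savings come from.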

collect()

Syntax: collect(ElementType, X), where X is any data type that corresponds to a kind of range (usually referred to as a “collection”), and ElementType is the type of elements of X that you wish to obtain (this parameter is usually omitted).

This handy function allows you to obtain all the elements of a given object in array form. It is particularly useful if you are dealing with objects that were developed to improve Julia’s performance (e.g. ranges) but are counter-intuitive to high-level users (since they are unlikely to have encountered such objects elsewhere). For example:

In[26]: 1:5

Out[26]: 1:5

In[27]: collect(1:5)

Out[27]: 5-element Array{Int64,1}:

1

2

3

4

5

show()

Syntax: show(X), where X is any data type (usually an array or dictionary).

This useful function allows you to view the contents of an array without all the metadata that accompanies it, saving you space on the terminal. The contents of the array are shown horizontally, making it particularly handy for large arrays, which tend to be abbreviated when you try to view them otherwise. For example:

In[28]: show([x y])

Out[28]: [123 345]

In[29]: a = collect(1:50); show(a)

Out[29]: [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50]

linspace()

Syntax: linspace(StartPoint, EndPoint, NumberOfPoints), where the NumberOfPoints argument is optional and defaults to 50. All arguments are floats or integers.

When you want to plot a mathematical function, you often need an array of equidistant values for your independent variable. These can be provided by the linspace() function. When run with two arguments, a and b, it yields a list of 50 values (including a and b) with each consecutive value having the same distance to the next. This output takes the form of a special object called linspace, but you can view its elements using the collect() function. For example, show(collect(linspace(0,10))) will yield the following:

[0.0,0.20408163265306123,0.40816326530612246,0.6122448979591837, ..., 10.0]

If you want to specify the number of points in this array, you can add a third argument c (always an integer), denoting just that. For example, show(collect(linspace(0,10,6))) will yield:

[0.0,2.0,4.0,6.0,8.0,10.0]
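
To make the arithmetic behind equidistant points explicit, the sketch below computes them by hand with a comprehension. The helper equidistant() is hypothetical and written only for illustration; it is also a portable fallback, since releases from 1.0 onwards replaced linspace() with range(a, stop=b, length=n).

```julia
# Hypothetical helper: n equidistant points from a to b, both included
equidistant(a, b, n) = [a + (b - a) * (i - 1) / (n - 1) for i in 1:n]

pts = equidistant(0, 10, 6)
println(pts)
```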

Mathematical Functions

round()

Syntax: round(var, DecimalPlaces), where var is the numeric variable you want to round to the nearest integer, and DecimalPlaces is the number of decimal places to take into account (this parameter is optional and has a default value of 0, rounding to the nearest integer).

As the name suggests, this function rounds off a given number (usually of the float type) to the nearest integer. The output of the function is of the same type as the input:

In[30]: round(123.45)

Out[30]: 123.0

In[31]: round(100.69)

Out[31]: 101.0

Although int() does the same kind of rounding (for zero decimal places), it is no longer supported (the current version still recognizes it, but it will most likely throw an error in future releases of the language). You can also customize round() to give you the number of decimal places you want:

In[32]: round(100.69, 1)

Out[32]: 100.7

In[33]: round(123.45, -1)

Out[33]: 120.0

Since round() is used on float variables, it returns a float too. If you want an int instead, you need to specify that as an argument:

In[34]: round(Int64, 19.39)

Out[34]: 19

This is particularly useful if you plan to use the output of this function as part of a range or an index to a matrix or multi-dimensional array.
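
As a small illustration of using a rounded value as an index, the sketch below picks the element located roughly 37% of the way through an array. The array and the fraction are arbitrary choices made for the example.

```julia
v = collect(10:10:100)                     # ten values: 10, 20, ..., 100
pos = round(Int64, 0.37 * length(v))       # 3.7 is rounded to the integer 4

println(v[pos])
```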

rand(), randn()

Syntax: rand(type, dimension1, dimension2, …, dimensionN), where type is the data type you wish to have the output in (default is float) and dimensionX is the number of random values in the X dimension of the output. With a single dimension the output is a vector; with no dimensions at all, a single random value is returned. The type argument is optional. randn() shares the same syntax, with the only difference being that it doesn’t have the type argument. rand() yields numbers that follow the uniform distribution on [0, 1), while randn() yields numbers that follow the normal distribution N(0,1).

These are a couple of particularly useful functions, especially if you plan to do simulations in your analyses. What they do is provide you with a random float. In the case of rand() it is between 0 and 1 and follows the uniform distribution; in randn() it is a number drawn from the normal distribution with mean = 0 and standard deviation = 1. If you want a series of random numbers, you can add an integer argument to the function. For example:

In[35]: show(rand(10))

Out[35]: [0.7730573763699315,0.5244000402202329,0.7087464615493806,0.30694152302474875,0.052097051188102705,0.7553963677335493,0.27754039163886635,0.3651389712487374,0.2772384170629354,0.9607152514021782]

If you want a matrix of random numbers, just add an extra integer in the arguments, to denote the length of the second dimension. For example:

In[36]: show(rand(5,3))

Out[36]: [0.9819193447719754 0.32051953795789445 0.16868830612754793

0.5650335899407546 0.6752962982347646 0.6440294745246324

0.3682684190774339 0.06726933651330436 0.5005871456892146

0.5592698510697376 0.8277375991607441 0.6975769949167918

0.7225171655795466 0.7698193892868241 0.4455584310168279]

Yet, we don’t always need floats between 0 and 1. We often require integers or Booleans. When you require random integers, just add a type argument before the integer ones:

In[37]: show(rand(Int8, 10))

Out[37]: Int8[-111,6,0,-91,105,123,-76,-62,127,25]

If you require random Boolean values, you can use the rand() function with the Bool type as its first parameter. For example:

In[38]: show(rand(Bool, 10))

Out[38]: Bool[false,true,true,true,true,false,true,true,false,true]

It is often useful to have an array of integers between two given values. This can be accomplished by using a range as the first argument of the rand() function. For example, rand(1:6,10) will provide an array of 10 integers between 1 and 6:

In[39]: show(rand(1:6,10))

Out[39]: [5,2,3,2,3,1,4,5,1,2]

This form of rand() relies on multiple dispatch: because the first argument is a range rather than a type, Julia selects a different method behind the scenes. It is particularly helpful for simulating stochastic processes. Also, rand() always provides results based on the uniform distribution. Should you need something that follows the bell curve, randn() is the function you would use:

In[40]: show(randn(10))

Out[40]: [-0.0900864435078182,1.0365011168586151,-1.0610943900829333,1.8400571395469378,-1.2169491862799908,1.318463768859766,-0.14255638153224454,0.6416070324451357,0.28893583730900324,1.2388310266681493]

If you require a few random numbers that follow N(40, 5), i.e. a normal distribution with mean 40 and standard deviation 5, you can type the following:

In[41]: show(40 + 5*randn(10))

Out[41]: [43.52248877988562,29.776468140230627,40.83084217842971,39.88325340176333,38.296440507642934,43.05294839551039,50.35013128871701,45.07288143568174,50.14614332268907,38.70116850375828]

Naturally, these results will be different every time you run either one of these functions. To ensure that the same sequence of random numbers is always going to be used, you can set the seed of the random number generator that Julia uses (the seed needs to be a number between 0 and 2147483647):

In[42]: srand(12345)

In[43]: show(randn(6))

Out[43]: [1.1723579360378058,0.852707459143324,0.4155652148774136,0.5164248452398443,0.6857588179217985,0.2822721070914419]

In[44]: srand(12345)

In[45]: show(randn(6))

Out[45]: [1.1723579360378058,0.852707459143324,0.4155652148774136,0.5164248452398443,0.6857588179217985,0.2822721070914419]

To create random numbers that follow any form of the normal distribution, apply a linear transformation to the outputs of the randn() function: multiply by the standard deviation and add the mean. Say, for example, that we require ten random numbers stemming from a distribution having an average μ = 40 and standard deviation σ = 10. In this case we need to type the following:

In[46]: show(10*randn(10) + 40)

Out[46]: [47.44568331404422,40.059083907359195,46.264414526722625,48.298928513379664,35.18788151280244,37.511899124747664,40.29235176013608,38.0263169187607,27.877534893160544,23.25912751967609]
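
A quick sanity check of the linear-transformation idea is to draw a large sample and compare its mean and standard deviation with the targets. The sketch below computes both by hand so that it does not rely on any statistics functions; the sample size and variable names are arbitrary.

```julia
mu, sigma, n = 40, 10, 100000
s = mu .+ sigma .* randn(n)               # scale and shift N(0,1) samples

m = sum(s) / n                            # sample mean
sd = sqrt(sum((s .- m) .^ 2) / (n - 1))   # sample standard deviation

println(m, " ", sd)
```

With a sample this large, the two printed values should land close to 40 and 10 respectively, though they will differ slightly on every run.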

sum()

Syntax: sum(A, dimension), where A is the array containing the values to be summed up, and dimension is the dimension along which the summing up takes place (this argument is optional; if it is omitted, all the elements of A are added together).

This function barely needs any explanation as it is pretty much the same across most programming languages (including spreadsheet software, like Excel). Still, it’s worth describing, as it’s commonly used. The key thing to remember is that it takes arrays as its main input. For example:

In[47]: sum([1,2,3,4])

Out[47]: 10

For larger collections of data, such as matrices, you can use this function with an additional parameter: an extra argument (integer) to denote which dimension you want to calculate the sum on. For example, say you have a 3x4 2-D matrix A, containing the integers between 1 and 12:

In[48]: A = Array(Int64, 3,4); A[:] = 1:12; show(A)

Out[48]: [1 4 7 10

2 5 8 11

3 6 9 12]

If you type sum(A) you’ll get the sum of all the elements of A. To get the sum of all the rows (i.e. sum across dimension 1), you would need to type sum(A,1), while for the sum across all the columns, sum(A,2) would do the trick:

In[49]: sum(A,1)

Out[49]: 1x4 Array{Int64,2}:

6 15 24 33

In[50]: sum(A,2)

Out[50]: 3x1 Array{Int64,2}:

22

26

30

The arrays you put into the sum() function don’t have to be composed of integers or floats only. You can also add up Boolean arrays, as “true” is equivalent to 1 in Julia. For example:

In[51]: sum([true, false, true, true, false])

Out[51]: 3

This is the only case where sum() will yield a result of a different type than its inputs.
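
Summing Booleans is a convenient way to count how many elements satisfy a condition. The sketch below counts the positive entries of an array; the array is the one used elsewhere in this chapter and the name positives is illustrative.

```julia
x = [23, 1583, 0, 953, 10, -3232, -123]

positives = sum(x .> 0)        # each true counts as 1
println(positives)
println(sum(x))
```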

mean()

Syntax: mean(A, dimension), where A is the array containing the values to be averaged, and dimension is the dimension along which the averaging takes place (this argument is optional; if it is omitted, all the elements of A are averaged together).

This is another well-known function that remains consistent across various programming platforms. As you may have guessed, it just provides the arithmetic mean of an array. The values in that array need to be of the number type (e.g. floats, integers, real, or complex numbers) or Booleans. If these values are of the number type, the output is either a float, a real, or a complex number (depending on the exact type of the inputs), while in the case of Booleans, the result is always a float. Here are a few examples of this function in action:

In[52]: mean([1,2,3])

Out[52]: 2.0

In[53]: mean([1.34, pi])

Out[53]: 2.2407963267948965

In[54]: mean([true, false])

Out[54]: 0.5

The same additional arguments of sum() apply to this function too: mean(A,1) will yield the average of all the rows of matrix A.

Array and Dictionary Functions

in

Syntax: V in A, where V is a value that may or may not exist in the Array A.

This is a handy command for searching an array for a particular value. Say, for instance, that you have the Array x = [23, 1583, 0, 953, 10, -3232, -123] and you are interested to see whether 1234 and 10 exist within it. You can perform these checks using the in command:

In[55]: 1234 in x

Out[55]: false

In[56]: 10 in x

Out[56]: true

This command works with all kinds of arrays and always provides a Boolean as its output. Although you can use this to search for a particular character in a string, there are better ways to accomplish that, as we’ll see in Chapter 4.
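
The in command is not limited to arrays. The following sketch shows a few equivalent or related forms; the dictionary b is a shortened version of the one used earlier.

```julia
x = [23, 1583, 0, 953, 10, -3232, -123]
b = Dict("one" => 1, "two" => 2)

println(10 in x)               # infix form
println(in(1234, x))           # same check written as a function call
println(3 in 1:5)              # ranges are searchable too
println("one" in keys(b))      # membership among a dictionary's keys
```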

append!()

Syntax: append!(Array1, Array2), where Array1 and Array2 are arrays of the same dimensionality. Array2 can be a single cell (1x1 array).

This is a useful function for merging existing arrays. These arrays can have values of any type. For example:

In[57]: a = ["some phrase", 1234]; b = [43.321, 'z', false];

In[58]: append!(a,b); show(a)

Out[58]: Any["some phrase",1234,43.321,'z',false]

Note the exclamation mark right before the parentheses. Functions with this feature make changes to their first argument, so keep that in mind when using them. It logically follows that there is no need to use another variable for their output (although you can if you want to).

pop!()

Syntax: pop!(D, K, default), where D is a dictionary, K is the key we are searching for, and default is the value to return if the key is not present in the dictionary. The final argument is optional. Alternatively, pop!(A), where A is an array (or any other type of collection apart from a dictionary, since dictionaries require the syntax described previously). Although pop!() works with arrays, there will be performance issues when working with large arrays.

When dealing with dictionaries you often need to fetch an element while simultaneously removing it. This can be accomplished with the pop!() function. This function is agnostic of the values of that dictionary, so it is versatile. Take for example the following scenario:

In[59]: z = Dict("w" => 25, "q" => 0, "a" => true, "b" => 10, "x" => -3.34);

In[60]: pop!(z, "a")

Out[60]: true

In[61]: z

Out[61]: Dict{ASCIIString,Any} with 4 entries:

"w" => 25

"q" => 0

"b" => 10

"x" => -3.34

In[62]: pop!(z, "whatever", -1)

Out[62]: -1

Note that -1 appears because the element “whatever” doesn’t exist in the dictionary z. We could put anything else in its place, such as a whole string like “can’t find this element!” if we prefer.
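
For completeness, here is a sketch of both forms of pop!() side by side: the array form, which removes the last element, and the dictionary form with a default. The variable names are illustrative.

```julia
stack = [1, 2, 3]
top = pop!(stack)                    # removes and returns the last element
println(top, " ", stack)

d = Dict("a" => 1)
fallback = pop!(d, "zzz", -1)        # key absent, so the default comes back
println(fallback, " ", length(d))    # the dictionary itself is unchanged
```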

push!()

Syntax: push!(A, V), where A is a one-dimensional array, and V is a value. Just like in the case of pop!, we recommend that you don’t use this function, particularly for larger arrays.

The push!() function is something of an opposite to pop!(), as it augments an existing array with a new element. So, if we wanted to add the element 12345 to an array z, we’d run the following:

In[63]: z = [235, "something", true, -3232.34, 'd'];

In[64]: push!(z,12345)

Out[64]: 6-element Array{Any,1}:

235

"something"

true

-3232.34

'd'

12345

splice!()

Syntax: splice!(A, ind, a), where A is an array (or collection in general), ind is the index you are interested in retrieving, and a is the replacement value (optional).

The splice!() function is a generalization of pop!(): instead of retrieving the last element of the collection, it fetches any element you wish. The desired element is defined by the ind variable (an integer). Once the function is applied to the collection A (usually a one-dimensional array), it automatically removes that element from the collection.

If you wish to preserve the size of A, you can put something in the place of the index you remove (usually something you can easily identify, like a special character, or the value -1 for numeric collections). This is possible by using the third parameter, a, which is entirely optional. So, in the z array from earlier, you can take away its fifth value (the 'd' character) by typing the following:

In[65]: splice!(z, 5)

Out[65]: 'd'

In[66]: show(z)

Out[66]: Any[235,"something",true,-3232.34,12345]

You can also replace the value true with something else, say '~', since that’s a character not often encountered; you can use it in your application to mean “the value in this index has been used already.” All this is possible through the following command:

In[67]: splice!(z, 3, '~')

Out[67]: true

In[68]: show(z)

Out[68]: Any[235,"something",'~',-3232.34,12345]
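
The index passed to splice!() can also be a range, in which case several elements are removed at once. The sketch below shows this along with the replacement form on a purely numeric array; the array contents are arbitrary.

```julia
v = [10, 20, 30, 40, 50]

removed = splice!(v, 2:3)      # take out the second and third elements together
println(removed)
println(v)

old = splice!(v, 1, -1)        # swap the first element for a marker value
println(old)
println(v)
```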

insert!()

Syntax: insert!(A, ind, a), where A is an array (or collection in general), ind is the index at which the new element will be placed, and a is the value to insert.

This function is similar to splice!(), sharing the same argument layout. The difference is that it doesn’t remove anything from the collection it is applied to, nor does it have any optional arguments. As its name suggests, it inserts a value a into a given collection A, at the index ind. So, if we wish to put the value "Julia rocks!" as the fourth element of our previous Array z, we just need to type the following:

In[69]: insert!(z, 4, "Julia rocks!")

Out[69]: 6-element Array{Any,1}:

235

"something"

'~'

"Julia rocks!"

-3232.34

12345

Naturally, all the elements from the fourth position on will be shifted forward, increasing the array’s length by one.

sort(), sort!()

Syntax: sort(A, dim, rev, …), where A is an array, dim is the dimension upon which the sorting will take place (in the case of a multi-dimensional array), and rev is a Boolean parameter for getting the results in reverse order (default = “false”, i.e. smallest to largest).

This is a handy function, particularly when you are dealing with alphanumeric data. As the name suggests and as you may already know from other languages, sort() takes an array of data and orders it using one of the many sorting algorithms (the default is QuickSort for all numeric arrays and MergeSort for all other arrays). If you don’t intend to keep the original version of the array, you can use the sort!() function, which does the same thing but replaces the original array as well. Let’s try to sort the array x = [23, 1583, 0, 953, 10, -3232, -123] using these functions:

In[70]: x = [23, 1583, 0, 953, 10, -3232, -123];

In[71]: show(sort(x))

Out[71]: [-3232, -123, 0, 10, 23, 953, 1583]

In[72]: show(x)

Out[72]: [23, 1583, 0, 953, 10, -3232, -123]

In[73]: sort!(x); show(x)

Out[73]: [-3232, -123, 0, 10, 23, 953, 1583]

If you prefer to sort your array from largest to smallest, you’ll need to use the rev parameter of the function: sort(x, rev=true). Naturally, sort() works well with strings too:

In[74]: show(sort(["Smith", "Brown", "Black", "Anderson", "Johnson", "Howe", "Holmes", "Patel", "Jones"]))

Out[74]: ASCIIString["Anderson", "Black", "Brown", "Holmes", "Howe", "Johnson", "Jones", "Patel", "Smith"]
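The rev parameter mentioned above can be sketched as follows. The example also uses the by keyword, which is not covered in the text but is part of the standard sort() interface. The surnames array is just illustrative data, and a recent Julia 1.x release is assumed.

```julia
x = [23, 1583, 0, 953, 10, -3232, -123]
descending = sort(x, rev=true)            # largest to smallest; x itself is untouched

# Illustrative data: each surname has a different length
surnames = ["Anderson", "Howe", "Patel", "Holmes"]
by_length = sort(surnames, by=length)     # order by number of characters

println(descending)
println(by_length)
```

Since sort() returns a new array, x keeps its original order here; only sort!() would change it.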

get()

Syntax: get(D, K, default), where D is the name of the dictionary you wish to access, K is the key you are querying, and default is the value to return if the key is not found in the dictionary (to avoid getting an error message). All three arguments are required.

Sometimes the key you are looking for doesn’t exist in a particular dictionary. To avoid error messages, you can set a default value for Julia to return whenever that happens. You can do that as follows:

In[75]: b = Dict("one" => 1, "two" => 2, "three" => 3, "four" => 4);

In[76]: get(b, "two", "String not found!")

Out[76]: 2

In[77]: get(b, "whatever", "String not found!")

Out[77]: "String not found!"
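A common use of the default value is counting things. The sketch below is one possible way to do this, assuming a recent Julia 1.x release; the function name count_words and the sample words are hypothetical and serve only as an illustration.

```julia
# Hypothetical helper: count how often each word appears in a collection
function count_words(words)
    counts = Dict{String,Int}()
    for w in words
        # If w has not been seen yet, get() returns 0 instead of raising an error
        counts[w] = get(counts, w, 0) + 1
    end
    return counts
end

counts = count_words(["a", "b", "a", "c", "a"])
println(get(counts, "a", 0))    # a word that is present
println(get(counts, "z", 0))    # a word that is absent, so the default is returned
```

Without the default, looking up a missing key with counts[w] would raise a KeyError.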

keys(), values()

Syntax: keys(D) and values(D), where D is the name of the dictionary you wish to access.

You can access all the keys and all the values of a dictionary using keys() and values() respectively:

In[77]: b = Dict("one" => 1, "two" => 2, "three" => 3, "four" => 4);

In[78]: keys(b)

Out[78]: Base.KeyIterator for a Dict{ASCIIString,Int64} with 4 entries. Keys:

"one"

"two"

"three"

"four"

In[79]: values(b)

Out[79]: Base.ValueIterator for a Dict{ASCIIString,Int64} with 4 entries. Values:

1

2

3

4
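Both functions return iterators rather than arrays. Here is a minimal sketch of how they are typically consumed, assuming a recent Julia 1.x release. Keep in mind that a Dict does not guarantee any particular ordering of its entries, which is why the keys are sorted explicitly below.

```julia
b = Dict("one" => 1, "two" => 2, "three" => 3, "four" => 4)

ks = sort(collect(keys(b)))    # turn the key iterator into an array, then sort it
total = sum(values(b))         # iterators can be passed straight to functions like sum()

println(ks)
println(total)

# Iterating over the dictionary itself yields key-value pairs
for (k, v) in b
    println(k, " => ", v)
end
```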

length(), size()

Syntax: length(X), where X is an array, dictionary, or string (this also works on the number and Boolean types, but always yields “1” in those cases).

This is by far the most commonly used function when handling arrays, as it provides the number of elements in a given array (in the form of an integer). Let’s take Array x used previously, as well as a 4x5 matrix of random numbers:

In[80]: x = [23, 1583, 0, 953, 10, -3232, -123];

In[81]: length(x)

Out[81]: 7

In[82]: length(rand(4,5))

Out[82]: 20

This function can also be used for finding the size of a given string, in terms of characters. So, if you want to see how long the string “Julia rocks!” is, for example:

In[83]: y = "Julia rocks!"

In[84]: length(y)

Out[84]: 12
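While length() gives the total number of elements, size() reports the extent of an array along each of its dimensions. The following is a minimal sketch of the difference, assuming a recent Julia 1.x release; M is just an illustrative variable name.

```julia
M = rand(4, 5)          # a 4x5 matrix of random numbers

println(length(M))      # total number of elements
println(size(M))        # a tuple: (number of rows, number of columns)
println(size(M, 1))     # number of rows only
println(size(M, 2))     # number of columns only
```

For a one-dimensional array, size() returns a tuple with a single entry, equal to the array's length.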

Miscellaneous Functions

time()

Syntax: time()

If you ever wonder how many seconds have passed since the Unix epoch began (i.e. midnight of January 1, 1970), time() is there to help you answer this question. The actual number probably won’t make much of a difference in your life (unless you were born that day, in which case it would be great!), but having an accurate time stamp (with microsecond precision) can be useful at times. This function doesn’t need any arguments and always yields a float:

In[85]: t = time()

Out[85]: 1.443729720687e9

Unfortunately, this raw number is not user-friendly. Nevertheless, it is useful for the applications it is designed for (mainly benchmarking code performance). In fact, one of the most widely used programming commands in Julia, @time, is based on this function. Without this command it would be much harder to measure the performance of a Julia program from within the language.
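To illustrate the benchmarking use case, here is a rough sketch of timing a piece of code by taking the difference of two time() calls. The helper elapsed_seconds is hypothetical (it is not part of Julia), and the workload being timed is arbitrary.

```julia
# Hypothetical helper: run f() once and return the wall-clock seconds it took
function elapsed_seconds(f)
    t0 = time()         # time stamp before the work
    f()
    return time() - t0  # difference between the two time stamps
end

# Time an arbitrary workload: summing a million random numbers
dt = elapsed_seconds(() -> sum(rand(10^6)))
println("elapsed: ", dt, " seconds")
```

The first call to any function includes compilation time, so for meaningful measurements it is best to time a second run.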

Conditionals

if-else statements

This type of statement, often referred to as conditional evaluation, is essential for the majority of algorithms (data science and otherwise). In essence, an if-else statement allows you to execute certain pieces of code when a given condition holds true, and other pieces of code otherwise. This gives you a lot of flexibility and allows for more elaborate programming structures, particularly if you use a combination of such statements together (nested if-statements). Here are a couple of examples to illustrate the use of if-else statements:

In[99]: x = 2; y = 1; c = 0; d = 0;

In[100]: if x >= 1

     c += 1

    else

     d += 1

    end;

In[101]: show([c, d])

Out[101]: [1,0]

In[102]: if x == 2

      c += 1

      if y < 0

        d += 1

      else

        d -= 1

      end

    end;

In[103]: show([c, d])

Out[103]: [2,-1]

The else clause is optional. Also, the semicolons are not necessary, but they help avoid any confusion from the outputs of the conditionals, since there are two variables involved. You can always merge the else clause with additional if-statements, allowing for more powerful filters using the elseif command:

In[104]: x = 0; c = 0; d = 0;

In[105]: if x > 0

      c += 1

    elseif x == 0

      d += 1

    else

      println("x is negative")

    end

In[106]: show([c, d])

Out[106]: [0,1]
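The same if-elseif-else pattern is often wrapped in a function, so that it can be reused for different values. Below is a minimal sketch of that idea; the function name describe_sign is made up for this illustration.

```julia
# Hypothetical helper: classify a number by its sign
function describe_sign(x)
    if x > 0
        return "positive"
    elseif x == 0
        return "zero"
    else
        return "negative"
    end
end

println(describe_sign(2))
println(describe_sign(0))
println(describe_sign(-3.5))
```

Only the first branch whose condition holds is executed; the remaining ones are skipped.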

You can abbreviate this whole if-else structure using what is called the ternary operator, which is useful when the end result of the statement is the assignment of a value to a variable. The ternary operator takes the form variable = condition ? (value if condition is “true”) : (value if condition is “false”). The parentheses are included for clarification and are entirely optional. For example, these two snippets of code are identical in function:

Snippet 1

In[107]: x = 123;

In[108]: if x > 0

      "x is positive"

    else

      "x is not positive"

    end

Out[108]: "x is positive"

Snippet 2

In[109]: x = 123; result = x > 0 ? "x is positive" : "x is not positive"

Out[109]: "x is positive"

If x is negative, the same conditional will yield a different result:

In[110]: x = -123; result = x > 0 ? "x is positive" : "x is not positive"

Out[110]: "x is not positive"

Note that the ternary operator can be nested as well:

In[111]: result = x > 0 ? "x is positive" : x < 0 ? "x is negative" : "x is zero"
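To check how the nested ternary operator behaves for all three cases, you could wrap it in a one-line function, as in the sketch below. The name sign_label is hypothetical, and the inner parentheses are added purely for readability.

```julia
# Hypothetical one-line function built on the nested ternary operator
sign_label(x) = x > 0 ? "x is positive" : (x < 0 ? "x is negative" : "x is zero")

println(sign_label(123))
println(sign_label(-123))
println(sign_label(0))
```

Nesting more than two levels deep quickly becomes hard to read, in which case an if-elseif-else block is usually the clearer choice.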

string()

Syntax: string(var1, var2, …, varN), where varX is a variable of any type. All arguments are optional, though the function is meaningful if there is at least one present.

Transforming a data type into a string can be accomplished using the string() function:

In[86]: string(2134)

Out[86]: "2134"

Furthermore, string() can also concatenate several variables together, after converting each one of them into a string:

In[87]: string(1234, true, "Smith", ' ', 53.3)

Out[87]: "1234trueSmith 53.3"

This function is particularly useful when preparing data for IO processing. It also makes formatting easier and allows you to handle special characters effectively. We’ll look more into string-related functions in the following chapter.
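As a small illustration of the formatting use case, here is a sketch that builds the same message in three ways. String interpolation (the $ syntax) and the * operator are not covered above; they are standard Julia features shown here for comparison. The variable names are arbitrary.

```julia
name = "Smith"; age = 53

s1 = string(name, " is ", age, " years old")        # string() converts and concatenates
s2 = "$name is $age years old"                      # string interpolation
s3 = name * " is " * string(age) * " years old"     # * joins strings only, so age is converted first

println(s1)
println(s1 == s2 == s3)
```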

map()

Syntax: map(fun, arr), where fun is a function that you want to apply on every element of array arr. This makes for more elegant code and is crucial for more advanced programming structures (e.g. parallelization).

Since you’ll frequently need to apply transformations to your data, the creators of Julia came up with a function that does just that. This is comparable to the map() function in Python (as well as the apply() function found in the Graphlab Create Python API) and the lapply() and sapply() functions in R. Here is an example:

In[88]: show(map(length, ["this", "is", "a", "map()", "test"]))

Out[88]: [4, 2, 1, 5, 4]

In essence, this application of the map() function calculates the length of each string in the given array. The result is going to be an array, too.

Since Julia is inherently fast, this function provides little performance benefit. However, it can be handy if you are used to this kind of programming structure from pre-Julia languages.
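The function passed to map() does not have to be a built-in one. The sketch below uses an anonymous function (the x -> ... syntax, which is standard Julia but not introduced above) and compares map() with an equivalent array comprehension. The variable names are arbitrary.

```julia
words = ["this", "is", "a", "map()", "test"]

lens = map(length, words)                  # apply a built-in function to each element
squares = map(x -> x^2, [1, 2, 3, 4])      # apply an anonymous function to each element
lens_loop = [length(w) for w in words]     # the same result as lens, via a comprehension

println(lens)
println(squares)
println(lens == lens_loop)
```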

VERSION

Syntax: VERSION (this is a constant rather than a function, so it is used without parentheses).

As the name of this command suggests, this is the simplest way to view the version number of the Julia kernel you are running (whether you are accessing Julia through the REPL, or via an IDE). This information is usually not crucial, but if you are using older packages (that may be outdated) or very new ones, it is useful to ensure that they can run properly.
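In Julia, VERSION is a constant of type VersionNumber, so it can be compared against a version literal written with the v"..." syntax. Here is a minimal sketch of such a check; the 1.0.0 threshold is an arbitrary choice for this illustration.

```julia
println(VERSION)    # the exact value depends on your installation

# Version numbers compare component by component (major, minor, patch)
if VERSION >= v"1.0.0"
    println("Running Julia 1.0 or newer")
else
    println("Running a pre-1.0 release")
end
```

A check like this is handy at the top of a script that relies on features introduced in a particular release.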

Operators, Loops and Conditionals

In this section we’ll take a look at how for-loops, while-loops, and if-else statements are implemented in Julia. Before we begin, however, it will be useful to see how Julia uses operators. Without these, both iteration and conditional structures would not be possible.

Operators

Operators are logical functions that take two arguments (of the same type) and provide a Boolean as an output. Some of these operators can be chained together, yielding more complex logical structures. Operators are fundamental for all kinds of meaningful programs, and are essential for creating non-trivial algorithms that provide useful results. There are generally two types of operators: the alphanumeric ones and the logical ones. The former are used to compare a pair of numeric, string, or character variables; the latter apply only to Booleans. All operators, however, yield a Boolean as a result.

Alphanumeric operators (<, >, ==, <=, >=, !=)

Syntax: A < B, where A and B are variables of the same data type. This applies to all of these operators.

Alphanumeric operators are used to perform many types of comparisons. For example, a < 5, a == 2.3, b > -12312413211121, a <= “something”, b >= ‘c’, and a != b + 4 are all cases of alphanumeric operations. The only catch is that the variables involved must be comparable to each other. For instance, in the first case (a<5), a must be convertible to an integer (even if it is not an integer per se), otherwise the application of the operator would yield an error. When applying these operators to string or character types, the comparisons are based on the alphabetical order:

In[89]: "alice" < "bob"

Out[89]: true

In[90]: "eve" < "bob"

Out[90]: false

The case of a letter plays a role in these comparisons as well, since uppercase letters are considered “smaller” in value compared to lowercase ones. This is due to the fact that if you convert a character to an integer (based on the ASCII system), it yields a specific value. All uppercase letters appear first on the corresponding table, giving them smaller values.
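The effect of letter case can be verified with a short sketch like the following, which assumes a recent Julia 1.x release. The words being compared are arbitrary; lowercase() is a standard Julia function used here to make the comparison case-insensitive.

```julia
println("Zebra" < "apple")                         # uppercase 'Z' comes before lowercase 'a'
println(Int('Z'), " ", Int('a'))                   # the integer codes behind that ordering
println(lowercase("Zebra") < lowercase("apple"))   # comparing on equal footing
```

If you need a purely alphabetical ordering, converting both sides to the same case first, as in the last line, is a simple workaround.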

Logical operators (&&, ||)

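Syntax: A && B, where A and B are Boolean variables. || has the same syntax too. The && and & operators differ under the hood: && short-circuits (B is evaluated only if A is true), while & always evaluates both arguments and binds more tightly than the comparison operators. As long as each test is wrapped in parentheses, they can be used interchangeably for this kind of application. The same goes for the || and | operators.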

The && and || operators correspond to the AND and OR logical functions, which are complementary and particularly useful when performing different tests on the variables. These tests must yield Boolean variables, since they only work on this type. The && operator yields the value true only if both of its arguments are true, yielding false in all other cases. For example, if x is an integer variable, you can find out if it is between 1 and 100 by employing the following operation: (x > 1) && (x < 100) or (x > 1) & (x < 100):

In[91]: x = 50; y = -120; z = 323;

In[92]: (x > 1) && (x < 100)

Out[92]: true

In[93]: (y > 1) && (y < 100)

Out[93]: false

In[94]: (z > 1) && (z < 100)

Out[94]: false

The parentheses are not essential, but they make the whole expression more comprehensible. This is particularly useful when you use several of these operators in a sequence, e.g. (x > 1) && (y > 1) && (z != 0).

The || operator works similarly to &&, but it yields “true” if either one of its arguments (or both of them) is true, while it yields “false” if both of its arguments are false. For example, (x <= -1) || (x >= 1) will cover all cases where x is at least 1 in absolute value:

In[95]: x = 0.1; y = 12.1;

In[96]: (x <= -1) || (x >= 1)

Out[96]: false

In[97]: (y <= -1) || (y >= 1)

Out[97]: true

Operators can also be nested to create even more sophisticated structures, through the use of additional parentheses: ((x > 1) && (z > 1)) || ((x == 0) && (y != 1)).
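Two details of these operators are worth a quick sketch: comparisons can be chained directly, and && evaluates its second argument only when the first one is true. The names in_range, v and first_is_positive are hypothetical and used only for this illustration.

```julia
# Chained comparison: equivalent to (1 < x) && (x < 100)
in_range(x) = 1 < x < 100

# Short-circuiting: the right-hand side runs only if the left-hand side is true
v = Int[]                                        # an empty array of integers
first_is_positive = !isempty(v) && v[1] > 0      # v[1] is never evaluated here

println(in_range(50))
println(in_range(-120))
println(first_is_positive)
```

Because v is empty, accessing v[1] would raise an error; the short-circuit behavior of && is what keeps this expression safe.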

Loops

Loops, in general, allow you to perform repetitions of the commands you choose, change the values of variables, and dig into your data, without having to write too many lines of code. Although they are seldom used in high-level languages like Matlab, where they are inefficient, in Julia they are lightning-fast and effective. This is because all the code is compiled in a low-level form that the computer understands, instead of just being interpreted, like in the case of Matlab.

for-loops

This is the most common type of loop, and probably the simplest. In essence, a for-loop involves iterating a variable over a given range and repeating everything in that loop for every value of this iteration. Julia implements this type of loop as follows:

for v = x:s:y

  [some code]

end

where v is the name of the variable, x is the first value it takes, y is the upper bound of the range, and s is the step (usually this is omitted, having a default value of 1). These parameters are typically integers, although other numeric types (e.g. floats) work as well. With all that in mind, take a look at the following for-loop and try to figure out what it does to the Int64 variable s (not to be confused with the step above), which is initialized to 0.

In[97]: s = 0

    for i = 1:2:10     #1

      s += i           #2

      println("s = ", s)

    end

#1 Repeat for values of i ranging from 1 to 10, with a step of 2 (i.e. only odd numbers in that range)

#2 Just like pretty much every other programming language, Julia uses a += b, a -= b, a *= b, etc. as shortcuts for a = a + b, a = a - b, a = a * b, etc. respectively.

As there are five odd numbers in the range 1:10, the code in the for-loop repeats five times. With each iteration s increases in value by the corresponding number, and its updated value is printed on a separate line (1, 4, 9, 16, 25). So at the end of the loop s has taken the value 25. You can track the value of variable s as this script is executed.
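To trace the loop step by step, the sketch below wraps it in a function (a common practice in Julia, since it sidesteps variable scoping issues when the code is run as a script) and prints both i and the running value of s. The function name running_odd_sum is made up for this illustration.

```julia
# Hypothetical wrapper around the loop, printing i and s at every iteration
function running_odd_sum()
    s = 0
    for i = 1:2:10          # i takes the values 1, 3, 5, 7, 9
        s += i
        println("i = ", i, ", s = ", s)
    end
    return s
end

total = running_odd_sum()
println(total)              # 1 + 3 + 5 + 7 + 9
```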

while-loops

The while-loop is similar to the for-loop but more open-ended, as the terminating condition of the loop is a logical expression. This condition comprises one or more of the aforementioned operators, and as long as it holds “true,” the loop continues. The general structure is as follows:

while condition

  [some code]

end

The condition usually includes a variable that is intricately connected to the code in the loop. It is important to ensure that the condition changes its value at some point (i.e. it becomes “false”), otherwise the code in the loop will be running relentlessly (infinite loop). Below is an example of a valid while-loop, building on the variable c which is initialized to 1:

In[98]: c = 1

    while c < 100

      println(c)

      c *= 2

    end

This brief program basically doubles the value of c until it reaches or exceeds 100, printing it along the way (the last value printed is 64). If c had been initialized differently (say to -1) this loop would never end. Also, you may encounter while-loops that start with while true, which could make them infinite if we are not careful. Even in these cases there are workarounds making them a viable programming strategy, as we’ll see later on in this chapter.
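Here is a variation of the same idea as a minimal sketch: instead of printing the values, the loop collects them into an array so that they can be inspected afterwards. The function name powers_of_two_below and its limit parameter are hypothetical.

```julia
# Hypothetical helper: collect the successive doublings of 1 that stay below limit
function powers_of_two_below(limit)
    c = 1
    out = Int[]
    while c < limit         # the condition is re-evaluated before every iteration
        push!(out, c)
        c *= 2              # without this update the loop would never terminate
    end
    return out
end

println(powers_of_two_below(100))
```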

break command

There are times when we don’t need to go through all the iterations in a loop (particularly if we want to optimize the performance of an algorithm). In such cases, we can escape the loop using the break command. This is usually done using an if statement, as in the example that follows. Here, Julia scans a one-dimensional array X until it finds an element equal to -1, in which case it prints the corresponding index (i) and escapes the loop, because of the break command:

In[113]: X = [1, 4, -3, 0, -1, 12]

    for i = 1:length(X)

      if X[i] == -1

        println(i)

        break

      end

    end
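The same search can be packaged as a reusable function, as in the sketch below. The name first_index_of is hypothetical, and returning 0 when nothing is found is merely a convention assumed for this example.

```julia
# Hypothetical helper: index of the first element equal to target, or 0 if there is none
function first_index_of(X, target)
    found = 0                   # 0 signals "not found" (an assumed convention)
    for i = 1:length(X)
        if X[i] == target
            found = i
            break               # stop scanning at the first match
        end
    end
    return found
end

println(first_index_of([1, 4, -3, 0, -1, 12], -1))
println(first_index_of([1, 4, -3, 0, -1, 12], 99))
```

Thanks to the break command, the elements after the first match are never examined.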

Summary

  • Data types are important in Julia, as they allow for better performance and less ambiguity in the functions and programs developed.
  • You can convert data from one type to another using the target data type’s name as a function (e.g. Int64() for converting something into the Int64 type).
  • Unlike Python and most other programming languages, Julia’s indexing starts with 1, instead of 0.

Chapter Challenge

  1. Have you checked out the useful tutorials and reference material on Julia in Appendix B?
  2. Is it better to use a function implemented in Julia (optimized for code performance) or to call the corresponding function from another language?
  3. Say you want to create a list of the various (unique) words encountered in a text, along with their corresponding counts. What kind of data structure would you use to store this, to make the data most easily accessible afterwards?
  4. Does it make sense to define the exact type of each input parameter in a function? Could this backfire?