In the preface to this book, we mentioned the triangle representing the pillars of information technology. Chapters 1 and 2 dealt with two of these pillars: computer hardware on the one hand, and means of communication on the other. This chapter deals with the third pillar, probably the most important, because the concepts of algorithm and software are at the heart of computing.
A computer system is a set of hardware and software components put together to collaborate in the execution of an application.
A computer system can be described as a stack of layers, each one resting on the underlying layer and providing new and more elaborate functions. Adapted interfaces allow these layers to communicate with each other.
Figure 3.1 is a simplified representation of a computer system.
Hardware is composed of processors, memories, disks and peripheral units (screen, mouse, printer, etc.). The operation of the hardware is based on instructions composed only of 0s and 1s; the hardware therefore constitutes the lowest layer of abstraction.
The operating system (the core and utilities that provide basic services to the user) is the central element of the system. This layer will support the communication between applications and hardware, and ensure the execution of the instructions it has received from the application. Special languages will be used to build the operating system programs.
The application is what the user manipulates. The user does not, in general, need to know the operating system, let alone the hardware layer. The user sees the application as a virtual machine which will manage the communications with the underlying layer, the operating system. This virtual machine will also be able to operate thanks to programs built with languages.
We will see that some system architectures, for example distributed systems, require additional layers that are often called middleware.
Software is therefore a very large set of computer programs, each with specific characteristics and functions. The way of programming will be very different if it is a question of writing a web page, the driver of a peripheral device, the tasks of an industrial robot or the user interface of a smartphone: programming environment, language(s), security constraints, etc.
This chapter aims to describe this diversity of programming languages and software at one level or another of this layered structure, as well as the main aspects of software development. It will end with a presentation of the ways in which software is protected and distributed.
We discussed algorithms in Chapter 1, and remember that an algorithm is simply a way of describing, in great detail, how to do something “mechanically”. Algorithms can be written without any link to computer science, as shown with the example of the recipe!
The algorithm must accurately translate what you want to do, the “intent”. And there are always several ways to do this translation, so several ways to build an algorithm to solve a well-defined problem. However, there is almost always a better way, according to various criteria (speed, required memory space, etc.).
We have already described an algorithm to compute the GCD (greatest common divisor) of two numbers. We now want to calculate the factorial F of a natural number N. It is the product of the strictly positive integers less than or equal to N, and it is 1 if N = 0. For example, the factorial of 5 is 1 x 2 x 3 x 4 x 5 = 120. The direct method is based on the observation that the factorial of N is equal to N multiplied by the factorial of (N – 1). The algorithm can then be described as follows (Figure 3.2):
This algorithm does not check that the number N is not negative. If, by some misfortune, a negative N is read even though the statement says that N is positive or zero, the algorithm will never terminate; there will be what is called a bug. Adding this test to the algorithm would therefore not be useless.
This algorithm can be expressed in a programming language taking into account the specific syntax of this language and the four ingredients: variables, instructions that execute sequentially, tests, and loops.
Variables can be seen as boxes, for example memory areas, in which values can be put. Each variable, here F and N, has a value at a given time. These variables can be complex, and we can find (beyond numerical values and character strings) lists, tables, graphs and containers corresponding to images, sounds, etc.
An instruction or sequence of instructions transforms a state, that is, the value of variables. In our example, the value of variables N and F is modified by instructions.
In the sequences, we can introduce branches from tests (if the condition is true then I do this, if not I do that) and loops (repeat this as many times as necessary). In our case, there is a test (Is N equal to zero?) and a loop, which is the repetition of three instructions. We could simplify this writing with a loop instruction like:
FOR N RANGING FROM N TO 1, CALCULATE F=N*F
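The algorithm above, with its test and its loop, can be sketched in Python. This is a minimal illustration following the text's variables F and N, with the negative-input guard suggested earlier added in:

```python
def factorial(n):
    # Guard against negative input: without this test, the
    # computation described in the text would never terminate.
    if n < 0:
        raise ValueError("N must be positive or zero")
    f = 1            # F starts at 1, the factorial of 0
    while n > 0:     # loop: repeat until N reaches 0
        f = n * f    # CALCULATE F = N * F
        n = n - 1    # move from N to N - 1
    return f

print(factorial(5))  # 120
```

The three ingredients of the text are all visible: a variable holding a state (f), a test (n < 0, n > 0) and a loop repeating two instructions.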
The algorithm is thus the abstract skeleton of the computer program, independent of the particular coding mode, which will allow its effective implementation within a computer system.
The processing required from a computer is generally complex, and it is impossible to describe it with a single global algorithm. The computer program that translates it can contain hundreds of thousands or even millions of lines. We will break down the problem into less complex subproblems that will facilitate the structuring and writing of the program.
A subprogram is a block of instructions performing a certain processing. It has a name and is executed when it is called upon. This processing involves data, the values of which may change from one call of the subprogram to another. The data transmitted to the subprogram by the main program are the parameters of the subprogram.
There are two types of subprograms:
Subprograms offer the means to create relatively independent programming modules that can be developed, checked separately, and reused as many times as desired. You can also use subprograms written by colleagues or available in program libraries; there are many of them, often adapted to particular contexts.
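A subprogram with parameters can be illustrated in Python; `power` is an invented example, not one from the text:

```python
def power(base, exponent):
    """A subprogram: a named block of instructions with parameters
    whose values may change from one call to another."""
    result = 1
    for _ in range(exponent):
        result *= base
    return result

# The main program transmits the parameters when it calls the subprogram,
# and the same module is reused as many times as desired.
print(power(2, 10))  # 1024
print(power(3, 4))   # 81
```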
The algorithm describes precisely what I want to do. My goal is that this description becomes an operational realization: a database of my music records (categories, musicians, location on my shelves, etc.), a web page describing my family tree, the calculation of the factorial of a positive or null integer. I need to translate this algorithm by writing a computer program in a language that I know, or that I could learn, and that could eventually become an executable program by the computer system I will have access to.
The problem is that the computer’s processor only knows how to process sequences of 0s and 1s, so it has to be told in binary code which operations to execute; these operations are broadly of four types:
If I do not “speak” machine language, the only one that the processor understands, I will need an automatic translation system between my programming language and the language the computer understands. This is the role of particular programs called compilers or interpreters.
A programming language is a conventional notation for formulating algorithms and producing computer programs that apply them. In a similar way to the English language, a programming language is composed of an alphabet, a vocabulary, grammar rules and meanings. All languages must follow a precise syntax; when learning a language, we must first learn the syntax.
Depending on the programming language’s level of abstraction, a language can be said to be “low level”, consisting of elementary instructions close to the machine (assembler, for example), or “high level”, consisting of more abstract instructions that can be translated into a certain number of elementary instructions (Fortran, Pascal, C++, Lisp, etc.).
Computer languages are to be considered at several levels of abstraction, from the lowest, closest to the electronic components, to the highest, closest to the user, more independent of the hardware.
Hundreds, even thousands of languages have been defined and implemented. There are many reasons for this:
A few hundred languages are probably still in use, and a few dozen are used to a large extent.
The success of a language depends on several factors, including
Figure 3.3 presents a hierarchical approach to the main programming languages. We refer to “basic” rather than “low-level” software because the whole computer system and application software rest on it, taking into account the interfaces, of which compilers are the main element.
Basic or “low-level” languages are used to control the machine’s operation. A distinction is made between machine language and the languages associated with the development of the operating system.
A processor, by design of its architecture, knows only a reduced number of instructions, called its instruction set, which allows it to execute arithmetic and logical operations, to move data between memory and registers, to perform branching, etc.
Since it only includes instructions coded in binary (0 and 1), a program intended for it will have to be written in the machine language specific to its architecture. The “words” of a machine language are instructions, each of which triggers an action on the part of the processor. These words are composed of 0 and 1, for example 10110000 01100001.
How do we directly program the instructions available on a processor? Writing a program using machine language is a challenge and assembly languages have been created to simplify this work. An assembly language provides a symbolic representation of a machine language. An instruction in this language corresponds to an instruction in the machine language.
In the assembly language, an instruction consists of the mnemonic code of the type of operation and the symbolic representations of the operands.
For example, the instruction:
MOV DESTINATION, ORIGIN
will copy the data at the ORIGIN address into the DESTINATION address.
A special program, called the assembler, translates the program written in assembly language (source code) into machine language (object code). There can be several assembly languages for the same processor, each with its own syntax (e.g. Intel, AT&T or Microsoft syntax for the x86 processor), and as many assemblers as assembly languages. The resulting machine program is, of course, the same.
In short, the processor only understands instructions written in its machine language; these instructions can be written in an assembly language and then translated by an assembler.
The assembly language is the basic language used to write the programs that constitute the Operating System (OS), which has two essential roles:
It is composed of two main parts:
Without going into detail, the main functions of the OS can be summarized as follows:
Let us continue this layered approach with Figure 3.4, which illustrates the OS’s role.
The operating system somehow masks the complexity of the lower level, the hardware. It thus presents a kind of virtual machine to its user. And this will become increasingly important with the growing complexity of computer systems, especially the space taken up by distributed systems.
Operating systems have, of course, evolved considerably over time.
The early OSs were very simple: the program was read (usually on punched cards), the computer ran it and printed the results, then moved on to the next program. The sequencing of jobs was done by a program called a monitor, or batch processing system, and this was the first form of an operating system. This mode was called monoprogramming (managing only one program at a time), and one of its faults was poor use of the CPU: while a card was being read or a line printed, the CPU waited, because computers of the time could only do one thing at a time.
Innovations in the management of peripheral units and an increase in memory size enabled a new execution scheme, multiprogramming, which consists of making several programs coexist simultaneously in memory and taking advantage of the dead times of one program (e.g. input–output operations) to “advance” the other programs. In early multiprogramming systems, the memory was divided into fixed-size areas in which user programs were loaded (this was the case in the MFT version, Multiprogramming with a Fixed number of Tasks, of IBM’s OS/360). Later, the limits of these zones, in variable number, became dynamically adjustable (MFT was replaced by MVT, Multiprogramming with a Variable number of Tasks), which allowed better use of the memory.
Users still did not have the ability to interact with the program they were asking the machine to run. Time-sharing systems transformed the way computers are managed. The principle of time-sharing is simple: it takes advantage of the speed difference between a computer, by then clocked at the microsecond, and a human whose interaction time, including reflection and command keystrokes, is several seconds. Once a user’s request has been processed, the computer moves on to the next user’s request, then to the next, and so on. Virtually all current systems incorporate the concept of time-sharing.
One can imagine that the operating system becomes more complex, especially since it is necessary to manage the memory which is limited in size and is shared between several users. We will therefore discover the so-called virtual memory technology that allows a computer to compensate for the lack of physical memory by transferring data pages from the central memory to an external storage (disk, for example). When the OS needs certain pages to run the program, it frees up some space in the main memory by transferring pages that are not needed at that moment so that it can load the necessary parts of the running program. This process is called paging.
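As a rough illustration only (not how a real OS is implemented), the paging mechanism can be simulated in Python with a first-in-first-out eviction policy; the frame count and the sequence of page references are invented for the example:

```python
from collections import deque

def simulate_paging(references, frame_count):
    """Count page faults under a simple FIFO page-replacement policy."""
    frames = deque()                  # pages currently in main memory
    faults = 0
    for page in references:
        if page not in frames:        # page fault: page is on disk
            faults += 1
            if len(frames) == frame_count:
                frames.popleft()      # evict the page loaded longest ago
            frames.append(page)       # load the needed page into memory
    return faults

# A program touching pages 0..4 while only 3 frames of physical memory exist.
print(simulate_paging([0, 1, 2, 3, 0, 1, 4], 3))  # 7 page faults
```

Real systems use more sophisticated replacement policies (approximations of least-recently-used, for instance), but the principle is the same: the limited physical memory holds only the pages needed at the moment.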
Some operating systems must meet strong constraints; this is the case for real-time systems. These are systems whose correctness depends not only on the values of the results produced but also on the time frame in which the results are produced. The system must respond to external stimuli within a specified delay. This is the case in the control of vehicles (including aircraft), robotics, industrial process control, etc.
After multiprogramming systems, then time sharing or real time, the arrival of distributed systems has complicated the architecture of operating systems. A distributed system is a set of computers, independent, networked and cooperating to take advantage of the various physical resources (processors, memories, etc.) available. It is not easy to make this distributed system appear to a user as a single, coherent system. A layer of software (middleware) is interposed between the operating system and the elements of the distributed application. It is responsible for managing heterogeneity (programming languages, operating systems used, etc.) and will offer one or more communication services between the elements forming the application.
Finally, the widespread use of embedded systems in almost all everyday devices has given rise to the development of systems adapted to specific devices and applications: mobile telephony (smartphones), sensor networks, automotive, avionics, etc. These systems are generally smaller and simpler than general-purpose systems.
Let us take a look at a few dates that have marked the history of operating systems:
Over the years, operating systems that did not disappear experienced new versions, made necessary by technological developments (architectures, hardware, etc.). New OSs have appeared, in particular to meet the demand for mobile equipment and the explosion of the market for connected objects.
We have situated OSs between hardware and applications, so it makes sense to look at the broad categories of OSs based on hardware and application characteristics.
Today, the main families of “universal systems” are Unix and its derivatives (including Linux and macOS) on the one hand, and Microsoft’s Windows on the other.
UNIX occupies a central place in the history of IT. This OS is not aimed at the general public but at organizations, their servers and workstations.
The Unix system is a multi-user, multi-tasking operating system developed from 1970 by Ken Thompson and Dennis Ritchie at AT&T’s Bell Labs. First written in assembly language, it was then rewritten in C, which made it possible to modify it before compiling it and thus facilitated its porting to various processors.
In 1977, two branches of development were created: the branch commercialized by AT&T which was to become System V of UNIX System Labs (USL) and the BSD branch (Berkeley Software Distribution) distributed free of charge by the University of California.
Various versions of Unix were developed by major manufacturers, including:
The large number of Unix variants, each with its own specificities, has allowed Unix systems to be used in many different environments. However, this diversity posed serious problems of communication between systems, which led manufacturers to turn to GNU/Linux.
In 1991, Linus Torvalds, a student at the University of Helsinki (Finland), started to develop a Unix kernel for his own use and published the source code on the Internet. Many developers joined this project and gradually turned it into a full operating system. This system took the name Linux, in reference to the name of its creator (Linux is the contraction of Linus and Unix).
Linux is therefore, from the outset, an operating system kernel that enables management of the execution of applications on any computer equipment. Today, the complete Linux system, including kernel and utilities, is freely available with its source code to anyone wishing to use, modify and redistribute it (GNU or General Public License; we will specify this notion of license in section 3.7.2).
In particular, thanks to its open-source model (freely available source code), Linux has made it possible to give rise to hundreds of Linux OSs or distributions, that is, complete operating systems based on this kernel. These distributions can be designed to play an OS role on a PC, a connected object, a server, etc. Among the most popular Linux distributions are Linux Mint, Linux Ubuntu, Red Hat, Debian, Suse Linux, Kali Linux and Arch Linux, each of which has its own specificities.
Linux has conquered many consumer devices: music players, ISP boxes, personal assistants, GPS, phones, etc. The 500 most powerful supercomputers in the world are now also running on this system. It is still very much in the minority on personal computers (less than 2%) because its use requires good computer skills, even though the open-source community has produced a large number of software programs that can be used in many areas (office automation, multimedia, Internet, etc.).
In the end, Linux continues to evolve thanks to a very strong community of users around the world.
More than 30 years ago, on November 20, 1985, Microsoft (founded in 1975 by Bill Gates and Paul Allen) introduced Windows 1.01. It was more of a GUI (Graphical User Interface) than an OS because it was based on DOS, the system Microsoft had acquired and then supplied to IBM.
Successive versions of Windows, the most recent (as of 2019) being Windows 10 available since July 2015, have made it a very complete system whose popularity, due in part to the Microsoft Office product line (Word, Excel, etc.), has continued to grow. Windows now equips hundreds of millions of PCs (compatible with the range of personal computers derived from the IBM PC) and remains the undisputed leader in this market. According to the NetMarketShare site, Windows equips 88% of personal computers (end of 2018).
Microsoft has also developed Windows Server, which is an operating system in its own right, intended for use in companies. It allows you to administer large computer equipment, manage user accounts, group policies, messaging, etc.
Apple’s first operating system, designed to equip its personal computers, came to fruition in 1984. In 1997, the name Mac OS (Macintosh Operating System) appeared. With Mac OS X, whose first version was released in 2001 and whose versions have followed one another since, the system joined the Unix family of operating systems.
In 2016, Apple chose to rename OS X to macOS in order to harmonize the name with the other OSs of the brand (iOS, watchOS, tvOS). The most recent version (October 2019) is macOS Catalina (OS 10.15).
According to the same NetMarketShare site, at the end of 2018, Apple’s OS equipped nearly 10% of personal computers.
Like Microsoft, Apple has developed a server version of macOS called macOS Server.
Smartphones and tablets have specific operating systems that must take into account the mobility of users, the limitations of mobile terminals in terms of computing power, memory capacity and energy autonomy. Systems have had to become lighter to adapt to these new constraints.
Nearly all manufacturers have started to develop an OS for their products. We can cite Symbian OS, which was used by Samsung, Motorola, Sony Ericsson but abandoned in 2013, as well as BlackBerry OS and Windows 10 Mobile from Microsoft (these two being no longer developed but still benefiting from maintenance), and finally two systems that equip phones and tablets today: Android and iOS.
Android, based on Linux, was developed by Google, and it dominates the market (86.8% market share in 2018 according to IDC Research) and is used by the majority of manufacturers. iOS, derived from macOS, equips Apple’s hardware, with a 13.2% market share.
Some predict a convergence of laptops and tablets (hybrid computers). The first products available in 2018 are based on the OS of computers (Windows 10) or on the OS of tablets (iOS for Apple). In any case, it is very unlikely that new OSs will be developed in this context.
As mentioned, real-time systems (RTOS for Real-Time Operating System) must respond in a finite and specified time to stimuli generated by the outside world. We will also talk about embedded systems for these components that integrate software and hardware and provide critical functionality with continuous interaction with their physical environment.
Two categories of real-time systems can be distinguished: “hard” real-time systems (the response of the system in time is vital, so it must be reliable and available in all circumstances), and “soft” real-time systems (time constraint overruns are tolerable within certain limits). They are generally broken down into subsystems with interacting tasks or processes, and their core must react very quickly to external events.
There are many real-time systems; here are four of them:
Connected objects, such as smartphones, need an OS to function, but with two strong constraints: use as little memory as possible and consume as little power as possible.
There is not yet a dominant OS for all categories of connected objects, as the spectrum of sizes, uses and hardware configurations is so wide. However, the Linux kernel is, again, widely present and some real-time OSs may have lighter versions adapted to connected objects.
In late 2016, Google announced Android Things, an Android-derived platform designed to run on low-power devices. This announcement should be seen in a context where major international groups are seeking to capture the emerging market of the Internet of Things, a strategy that sometimes relies on a specific OS. While Microsoft is pinning all its hopes on Windows 10 IoT, Huawei seems to be pinning its hopes on LiteOS, an ultralight system (some cite a size of 10 Kbytes) for connected objects, while Samsung supports Tizen, based on Linux and developed in collaboration with Intel.
Remember that the “high-level” programming languages are located in the layer above the operating system layer. They therefore essentially concern applications.
We will discuss some of these languages, positioning them according to the principles that have guided their development. The programming paradigm will be discussed as a way to approach computer programming and to deal with solutions to problems and their formulation in an appropriate programming language. Each paradigm is defined by a set of programming concepts.
There are four main programming paradigms:
Here are some of the languages that have marked or still mark the history of programming. They have, of course, evolved over time (versions, normalizations, etc.).
The building blocks of imperative programming are instruction sequences, loops and branches. We talk about sequential execution: the program tells the machine precisely what it must do (imperative orders), and efficiency is sought.
It is, historically, the basis of the first programming languages. A few of these languages are worth retaining.
Fortran (Formula Translator) was created in 1957 by John Backus at IBM. It was intended to provide scientists with a simple means of moving from arithmetic expressions to an effective program. It is very efficient in the field of numerical computation and has been the subject of several standardizations over the decades to exploit the new possibilities of computers (vectorization, coprocessors, parallelism). It is still used in major scientific applications.
Cobol (Common Business Oriented Language) was created in 1959 in response to a request from the US government for a manufacturer-independent programming language to manage the US administration. It was designed to create business applications and added the concepts of record and structured files. It is still used today, after several standardized revisions, for example in banks.
PL/1 (Programming Language number 1) was developed by IBM in the early 1960s. Its objective was to be universal and to be able to replace languages for scientific (Fortran) or commercial (Cobol) purposes. The programmers of each of these two languages did not appreciate the additions made for the other language, which did not allow PL/1 to take the place it should have taken (I myself greatly appreciated the efficiency of PL/1 at the end of the 1970s!).
C is an imperative and general-purpose language that was created in 1972 at Bell Laboratories, by Dennis Ritchie and Ken Thompson, for the development of the Unix system. Some consider it a lower level language than the three previous languages because its instructions are closer to the machine and the user has to program certain processes that are automatically supported in the higher level languages. The power of C – fast, portable and general-purpose – has ensured its continued success. It has influenced many newer languages including C++, Java and PHP.
Without being exhaustive, we should also mention languages that have had a significant place in the past: Algol (Algorithmic Oriented Language), created in 1958 but which has not been commercially successful; Pascal, a descendant of Algol created in 1968 by Niklaus Wirth; APL (A Programming Language), created in 1962 by Ken Iverson and which has been widely used in applied mathematics.
A functional language is a language in which the notion of function (procedure, subprogram) is central. A program is no longer a series of instructions to be executed but a composition of functions. This definition covers so-called pure functional languages, in which any calculation is performed by means of function calls (Haskell or Coq, for example) and imperative functional languages (Lisp, ML).
Their proximity to mathematical formalism is a major asset of these languages: it makes it easier than in the imperative paradigm to prove that programs achieve what they were designed for.
Lisp (List Processing) is the oldest family of programming languages that is both imperative and functional. Lisp was born out of the need for a symbolic programming language, suited to reasoning on symbols rather than to processing numerical information. It was created in 1958 at MIT by John McCarthy, one of the founders of artificial intelligence. Lisp consists of functions to be evaluated and not of procedures to be executed like procedural languages. Lisp was followed by many versions like Common Lisp and is used in industry (robotics), aeronautics (NASA), etc.
Caml (Categorical Abstract Machine Language) is a general-purpose programming language in the category of functional languages; its OCaml variant also lends itself to imperative and object-oriented programming. A descendant of the ML (Meta Language) language created by Robin Milner in the 1970s at the University of Edinburgh, it has been developed since 1985 by Inria and is designed to create secure and reliable programs. The functional style brings the Caml language closer to mathematical writing. Numerous extensions of Caml allow the language to cover various concepts.
Haskell is a pure functional programming language, created in 1990 by a committee of language theory researchers interested in functional languages and “lazy” evaluation (the program will not perform the functions until it is forced to provide the results). Haskell has evolved a lot and now has very efficient compilers like GHC (Glasgow Haskell Compiler).
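Python is not a functional language in the sense of Haskell or Caml, but it can give a taste of this style: computation expressed as the application and composition of functions, rather than as a sequence of state changes. A small sketch:

```python
from functools import reduce

# The factorial of n expressed as a reduction over 1..n:
# no variable is modified along the way.
def factorial(n):
    return reduce(lambda acc, k: acc * k, range(1, n + 1), 1)

# map applies a function to each element of a list,
# producing a new list instead of mutating the old one.
squares = list(map(lambda x: x * x, [1, 2, 3, 4]))

print(factorial(5))  # 120
print(squares)       # [1, 4, 9, 16]
```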
Object-Oriented Programming (OOP) consists of the definition and assembly of software bricks called objects. The definition (or structure) of an object describes the characteristics of the object, an object (or instance) being the concrete realization of the object’s structure. We can say that OOP is a way of developing an application that consists of representing (also called modeling) a computer application in the form of objects with properties that can interact with each other. Various languages allow this approach, although the most important are not limited to this paradigm, which was introduced in the 1970s by Alan Kay, who worked in the Xerox laboratories in Palo Alto.
Ada, the first version of which dates back to 1983, was designed following a call for tenders from the US Department of Defense (DoD), according to very strict specifications, by a French team led by Jean Ichbiah. The Ada 95 version was an internationally standardized object-oriented language. It is used in real-time and embedded systems that require a high level of reliability and security: avionics, military uses, etc.
C++ was created in 1983 by Bjarne Stroustrup. An extension of the C language, it allows programming under multiple paradigms. Like C, it is considered a rather “low-level” language because it is closer to the operation of the machine, which makes it particularly efficient. Its good performance and compatibility with C make it one of the most widely used programming languages for performance-critical applications.
Java, created by Sun Microsystems, was officially released in 1995. It has many advantages, such as portability: a program written in Java can be run on different platforms such as Windows, macOS and Linux. This portability is due to the fact that the language is not compiled into machine code but into an intermediate language called bytecode, which is then executed by a Java Virtual Machine (JVM); only this virtual machine changes depending on the system. The Java language is popular mainly because it is the basis of most networked applications. Java technology can be found in all areas: laptops, game consoles, scientific supercomputers, cell phones, etc.
Python is an interpreted, portable, dynamic, extensible, free language that allows a modular and object-oriented approach to programming. Python has been developed since 1989 and many volunteer contributors have improved it over time. Python is suitable for scripts of a dozen lines as well as complex projects of tens of thousands of lines. Python’s syntax is very simple and, combined with advanced data types (lists, dictionaries, etc.), leads to programs that are both very compact and very readable.
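The compactness the text mentions can be seen in a few lines combining the two data types cited, lists and dictionaries (an invented example):

```python
# Count how many records fall into each category.
records = [("Kind of Blue", "jazz"), ("Abbey Road", "rock"),
           ("A Love Supreme", "jazz")]

counts = {}
for title, category in records:
    counts[category] = counts.get(category, 0) + 1

print(counts)  # {'jazz': 2, 'rock': 1}
```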
C# is an object-oriented language created by Microsoft in 2002. Derived from C++ and very close to Java, it is intended for developing a wide range of applications, especially web applications. It is part of a larger set known as the Microsoft .NET Framework. C# is increasingly seen as a competitor to the Java language.
Other languages have been created, thanks to research work and to respond to new needs.
Prolog (an acronym for the French for logical programming: programmation logique) was created by Alain Colmerauer and Philippe Roussel in the early 1970s. It is a declarative programming language for solving logical problems. The principle of logical programming is to describe the statement of a problem by a set of expressions and logical links and allow the compiler to transform it into a sequence of instructions. Prolog is used in artificial intelligence and computer language processing.
Esterel is a precursor of synchronous reactive languages for applications whose function is to constantly interact with their environment (aircraft, robotics, autonomous vehicles, etc.). Based on signals, it was created in the 1980s by Gérard Berry’s team at Sophia-Antipolis. Together with Lustre, another synchronous reactive language, it became SCADE, which is developed industrially by the company Esterel Technologies.
Finally, even though we are not talking about programming languages in the strict sense of the word, we should mention two languages associated with the Web: HTML (Hypertext Markup Language) already mentioned in Chapter 2, and PHP (Hypertext Preprocessor), created by Rasmus Lerdorf in 1994, a scripting language mainly used to produce dynamic web pages.
We have seen some languages, but there are many others, and new languages are appearing.
The IEEE (Institute of Electrical and Electronics Engineers), a reference organization in the field of information and communication technologies, publishes a list each year of the languages considered to be the most important. Figure 3.5 gives the IEEE 2018 list.
We note that the applications associated with communicating computing have a very strong impact in this ranking.
Software is commonly classified into two categories: system software and application software.
System software includes, in addition to the operating system, utility software. While closely related to the operating system, it is not part of it. These utilities include file managers (backup, compression, archiving, version management, etc.), disk managers (defragmentation, cleaning, etc.), security software (antivirus), communication software (browser, search engine, e-mail), PDF or audio file reading software, etc.
Although the utility software that comes with operating systems is becoming increasingly comprehensive and sophisticated, users often install third-party utility software as a replacement for or in addition to the utility software that comes with the operating system.
Unlike system software, application software (often referred to as apps) is chosen by the user to meet his or her own needs or to perform special processing. Included among these apps are the following:
The development of software often represents a significant investment involving many people, from project managers to IT specialists and users. The aim is therefore to produce software with a high level of quality. For their part, software companies are looking for a standard to guarantee the quality of the products they offer to potential customers. But what are the criteria for defining quality? Organizations developing large software products have been working on this subject for a long time.
The ISO 9126 standard, “Software engineering – Product quality”, defines and describes a series of quality characteristics of a software product. It was published in 1991 and revised in 2001. The SQuaRE (Software Quality Requirements and Evaluation) standard, defined since 2005, is the successor to the ISO 9126 standard.
The SQuaRE model consists of eight characteristics, broken down into subcharacteristics, each of which must have a precise definition: functional suitability, performance efficiency, compatibility, usability, reliability, security, maintainability and portability. If you, reader of this book, are developing software, think about these characteristics.
The term software development is used to refer to all activities related to the creation of software and the programs that make it up, programming being the part corresponding to the writing of the programs themselves. The main stages of software production, whatever they may be, are the following:
Software development specialists have always sought to formalize methods to make this work more efficient and provide a result in line with the needs expressed by the user (the client). Here are two methods that show two slightly different approaches.
The V-model is a well-known organizational method that was adapted to computers in the 1980s. It is one of the first methods learned at school and it is still used today.
The V-model is a cycle composed of three main phases: design (from the analysis of the needs expressed by the user to the detailed design), implementation (coding and unit tests of the different modules), and validation (the software components are integrated into the final solution to check that the integration does not cause any anomalies). The product is then tested against the functional specifications and validated before going into production.
The V-model method is based on the principle that the client already knows all the features he or she needs and will not change his or her mind during development. In addition, it is often during implementation that conceptual problems are identified. The main flaw of this method is therefore its lack of flexibility. It is largely for these reasons that many IT projects are stopped along the way, while others end up costing far more and taking far longer than planned, all while offering fewer features than required.
To give flexibility to the development process, the AGILE method proposes a completely different approach: project management by iterative cycles. This approach considers that the need to which the software must respond cannot be fixed once and for all, and proposes, on the contrary, to adapt the software to changes as they arise.
The customer and the service provider will define together an overall objective to be reached, as well as the organization and functioning of the project team. Then, several steps leading to this objective are determined. Finally, each of these steps will be divided into tasks to be handled by the development team. The software is thus developed step by step, each of which must be validated before moving on to the next step, until the overall objective is reached.
The AGILE method is flexible (users and developers work together throughout the project); it is reassuring, because the client sees developments as they happen; and it is fast, because the customer can start using the system even before all the functionalities are implemented.
The production of software is complex, and poor quality (insufficient verification and validation, cost and time overruns, etc.) can have serious consequences. Numerous failures have clearly shown that this production must be based on systematic, rigorous and measurable procedures to ensure that the specification corresponds to the client’s real needs and that the deadlines and costs allocated to the production are respected.
Software engineering can be defined as the set of methods, techniques and tools dedicated to the design, development and maintenance of computer software. We could say that it is the art of building software industrially.
Software engineering is not only concerned with the technical aspects of analysis and design, but also with the entire software lifecycle – needs analysis, design, maintenance, testing, certification, standardization, deployment – as well as organizational aspects (team building, process management, forecasting and monitoring of costs and deadlines). Project management is a complement to software engineering.
Various standardization bodies (AFNOR, IEEE, etc.) have published recommendations and standards to be applied in software engineering, and many software engineering training courses exist and are tending to multiply.
We have seen that software is everywhere. In particular, we find it in the embedded systems that have now invaded our daily lives: smartphones and tablets, car navigation, bank cards, multimedia (photo, video, music), etc. This software sometimes contains millions of lines of code, and we are obliged to trust it (when I press the brake pedal of my car, I do not wonder whether the software that manages braking has been well designed and built!).
In all the development cycles seen above, the test phases are crucial. Indeed, a software malfunction can have catastrophic consequences and, moreover, it will generate high costs since the defect will have to be corrected.
The term “bug” designates both a defect introduced into the software during one of the phases of the development cycle and an anomaly in the expected behavior. The term comes from a well-known anecdote: in 1947, an insect got stuck in a relay of the Mark II computer at Harvard University, causing a failure!
One of the most famous quality defects is the one that affected Flight 501, the inaugural flight of the European launcher Ariane 5, which took place on June 4, 1996. It resulted in a failure, caused by a malfunction in the data processing, which saw the rocket explode in flight only 36.7 seconds after lift-off. Several software quality defects were at the origin of this spectacular failure. The portion of code responsible had in fact been taken from an older system (in this case Ariane 4), and was unfortunately not adapted to the power of its successor. The cost of this mistake was several hundred million euros!
The automotive industry is not spared from defects in its embedded software. For example, in 2013, and based on an expert report on the software managing the acceleration system, a US jury judged for the first time that design problems in the embedded software on the Toyota Camry were responsible for the uncontrolled acceleration of the vehicle, which resulted in the death of a passenger.
The verification of software based on numerical models (avionics, weather, etc.) is particularly tricky because of the accuracy of the data handled (number of digits, rounding, etc.). From the implementation of arithmetic in programs to the actual specification of tasks, the sources of error are multiple. For example, the Mars Climate Orbiter probe crashed on Mars in September 1999 due to a staggering blunder: some of the software designers assumed that the unit of measurement was the meter, and others assumed that it was the foot.
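The kind of unit mix-up behind the Mars Climate Orbiter loss can be sketched in a few lines of Python. This is a purely illustrative reconstruction, not the actual probe software; the function names and values are invented, and only the conversion factor is real.

```python
# Illustrative sketch (NOT the real Mars Climate Orbiter code):
# one team's module returns an impulse in pound-force seconds...
def ground_software_impulse():
    return 100.0  # value expressed in pound-force seconds (lbf*s)

# ...while another module silently assumes newton-seconds (N*s).
def onboard_correction(impulse_newton_seconds):
    return impulse_newton_seconds * 0.5  # some metric-based computation

LBF_S_TO_N_S = 4.44822  # 1 lbf*s = 4.44822 N*s (real conversion factor)

wrong = onboard_correction(ground_software_impulse())  # units mixed up
right = onboard_correction(ground_software_impulse() * LBF_S_TO_N_S)
print(wrong, right)  # the two results differ by a factor of about 4.45
```

Nothing in the code itself signals the error: both functions are individually correct, and only an explicit specification of the units exchanged at the interface could have caught it.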
But mistakes are not always human. A compiler translates a program into machine language, but a bug in the compiler can produce incorrect machine code. The same is true of operating systems, which are very complex and therefore prone to bugs (you may have noticed this with the OS of your personal computer). Errors can even occur in the processors themselves, as was the case with the floating-point division operation (in which dividend and divisor are real numbers, not integers) of the Intel Pentium processor in 1994.
Readers interested in other examples of defects that have had serious consequences can refer to Gérard Berry’s excellent book, L’Hyperpuissance de l’informatique (Berry 2017).
Experts in the field estimate that in Europe, losses due to programming errors cost more than 100 billion euros per year.
Software testing is the process of analyzing a program against its specifications with the intention of detecting possible anomalies in order to validate it. It is the oldest and still the most widely used verification method. In the software development process, almost half of the effort is spent on verification testing (does the system do its job properly?) and validation testing (does the system produced correspond to the customer’s needs?).
There are generally two types of tests: static and dynamic. Static testing is done without running the software (in part or in full); this is the case for code or specification inspection reviews, done by specialists. Dynamic testing, on the contrary, consists of running the system with real input data; this is the type we will now describe.
Regardless of its type, testing is effective at finding errors, but it cannot prove that there are none.
Without going into detail about the classification of tests, we will limit ourselves to giving the four levels of testing identified by the French Committee for Software Testing (Comité français du test logiciel, CFTL): component (testing of small parts of the code, separately), integration (testing of a set of parts of the code that cooperate, testing of modules together), system (testing of the entire system, inspecting its functionality) and acceptance (acceptance by the user).
It is also necessary to run non-regression tests, which verify that the correction of errors has not affected the parts already tested (the tests already executed must be systematically repeated).
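The component level, and the idea of systematically re-running tests after each fix, can be illustrated with a minimal test written using Python’s standard unittest module (the function under test and its test cases are invented for the example):

```python
import unittest

# A small component under test (hypothetical example).
def apply_discount(price, rate):
    """Return the price after applying a discount rate between 0 and 1."""
    if not 0 <= rate <= 1:
        raise ValueError("rate must be between 0 and 1")
    return round(price * (1 - rate), 2)

class TestApplyDiscount(unittest.TestCase):
    # Component-level test: the function is checked in isolation.
    def test_nominal(self):
        self.assertEqual(apply_discount(100.0, 0.2), 80.0)

    # An invalid input must be rejected, not silently accepted.
    def test_invalid_rate(self):
        with self.assertRaises(ValueError):
            apply_discount(100.0, 1.5)

# Re-running this suite after every bug fix is non-regression testing.
result = unittest.main(exit=False, argv=["component-tests"])
```

Integration, system and acceptance tests follow the same logic at larger scales, exercising assemblies of components rather than a single function.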
Manual or automatic tests? We can distinguish:
These verification and validation activities are most often done under the responsibility of a testing professional because the program developer is not in the best position to verify the program’s quality.
A program usually has a very large number of possible executions and it is impossible to validate all of them. It is therefore necessary to define a test strategy to be as relevant as possible. This strategy must be integrated into the software development process. It depends on the criticality of the software (seriousness of the consequences of a possible anomaly) and the cost of software development; it defines the resources required (human resources, tools) and the criteria used to establish whether each element of the software has succeeded or failed.
The test plan identifies the objectives and the means to carry out the tests. It allows the technical organization of the tests, defining what will be tested, why to test, how and when the tests will be carried out, and who will test. The goal is to establish the order in which each component is ready, tested individually, and integrated with the other components of the system. It also serves as a validation document for the final quality of the software and is part of the project contract documents, along with the technical specifications or functional requirements. It is designed by the test manager and validated by the project manager.
Testing, no matter how extensive, cannot guarantee that a software program is bug-free, especially if it is a “big” piece of software in terms of lines of code.
Software publishers are well aware of this and therefore include clauses in user licenses that exclude the guarantee of conformity. The user therefore accepts the software as is.
Specific software developers, for their part, often offer free bug fixes for a limited period of time.
A legal action based on the existence of a latent defect can only be invoked if the purchaser proves that the malfunction is due to a defect that is both hidden and serious, which in the IT field can be difficult. The tests therefore do not guarantee anything.
We will see another approach, that of formal methods, which tries to guarantee the absence of bugs.
Software is everywhere and can run to millions of lines of code. Its vulnerability to design and programming errors as well as to malicious use requires that its verification and validation be pushed to the limit, as failures can put human lives at risk.
The difficulty of making programs correct is a well-known problem. The first approach, the oldest one, which we have just covered, relies on software engineering and the use of tests. However, the cost of these tests can be very high, and they provide no guarantees.
Another approach, much more scientific, is to use mathematical tools to specify, validate and verify computer systems. These mathematically based approaches are commonly referred to as formal methods. Researchers have long imagined giving a mathematical meaning to a program, the choice of mathematics being explained by the need for rigor and precision in describing the behavior of complex systems.
Formal methods are computer techniques of great rigor that, with the help of specialized languages and logical rules, ensure (ideally) the absence of any defect in computer programs. The main elements of a formal method are:
Generally, a formal specification describes what is to be done (what?) without saying how it will be done (how?). It prescribes a property that is deemed necessary to obtain. Its statement may be a function, capability, characteristic or limitation that a system, product or process must satisfy.
A simple specification consists of defining the types of data that will be manipulated in the software, which will enable checking their consistency. For example, the cosine function cannot be applied to character data. Checking that the software manipulates the data in coherence with its type allows errors to be detected, even before the software itself is executed.
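This kind of type-based consistency check can be sketched in Python, whose type annotations let a static checker such as mypy reject an inconsistent call before the program is ever run (the function name is ours):

```python
import math

# Type annotations act as a lightweight formal specification:
# safe_cos declares that it accepts a float and returns a float.
def safe_cos(x: float) -> float:
    return math.cos(x)

print(safe_cos(0.0))  # consistent with the declared type: prints 1.0

# safe_cos("hello")   # a static checker (e.g. mypy) flags this call
#                     # before execution; at runtime it would raise
#                     # a TypeError, since cosine is undefined on text.
```

The annotations change nothing about how the program runs; their value is that the inconsistency can be detected by analysis alone, without executing the software.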
Formal methods can be applied at different stages of the system development process, from specification through verification to final implementation. Automation of the refinement process speeds up software development by entrusting a tool with the mechanical transformation of a complete abstract model into an installable model.
Formal verification of a system also assumes that the tools used, such as a compiler, have been formally verified. With this in mind, Xavier Leroy and his team at Inria developed the CompCert compiler for the C language, entirely written and proven with the Coq software, a tool for writing and verifying programs also developed at Inria under the direction of Gérard Huet.
This approach can be summarized with the three stages presented in Figure 3.8, for which verification tools can be used.
This approach using formal methods challenges traditional development cycles, and in particular the V-model described above.
Invented by the Frenchman Jean-Raymond Abrial in the mid-1990s, the formal B method is an approach that makes it possible to specify and design software while ensuring its safety and reliability. The whole specification, design and coding process is thus based on the construction of a certain number of mathematical proofs.
The formal B method designates the set comprising the B language, refinement, proofs and associated tools. It has been successfully used for several industrial applications, and includes several development steps:
The embedded software of the automatic control system for line 14 of the Paris metro (METEOR) was thus developed using the B method. Since its first tests in 1997, its safety systems (86,000 lines of code, which have been proven to comply with the original specifications) have not experienced any failures.
Even though industrial tools are now available to implement this approach with a sufficient degree of maturity, the use of formal methods is still not widespread and their large-scale use will still require investment in research, as well as in the training of computer scientists.
Certification is a procedure by which a software publisher obtains an attestation of compliance with a quality standard. In general, certification authorizes the use of a label, such as the NF label. Certification bodies are independent and follow long and rigorous processes before awarding certification to a software product, whereas a certificate issued by the software publisher itself simply means that it has paid particular attention to compliance with standards in the design of its software.
Measuring software quality is more complex than it seems. First, it is necessary to clearly define what the word quality means for each company, each team and each piece of software being measured. Once the quality objectives have been clearly established, the next step is to know how to measure them. Different models exist for measuring quality, but none, to my knowledge, has defined a clear standard to provide official certification.
It is not simple to measure the quality of a word processing software package, for example, from the characteristics of the SQuaRE model. However, this has not prevented the implementation of certifications for specific software that must comply with very strong constraints, particularly legal ones. The NF software certification is undoubtedly the best known and is a mark for software professionals.
Since the entry into force of the VAT anti-fraud law on January 1, 2018, publishers and users of cash register software have not been able to miss this certification. Cash management software has the obligation to meet the conditions of inalterability, security, conservation and archiving of data. The certification, in this particular case as well as in the case of accounting software, first focuses on the characteristics “functional adequacy”, “reliability” and “security”.
In another field, that of health, the Haute autorité de santé in France is in charge of establishing certification procedures for healthcare professionals’ software (prescription assistance software, dispensing assistance software, database on medicines). In particular, it draws up reference systems containing all the requirements to be met.
The work carried out by numerous research teams should lead to metrics that will enable the precise conditions for certification to be defined with a global quality reference system. However, development through formal methods is a recognized approach.
Software is therefore a set of programs accompanied by everything necessary to make it operational. The creators of software are, for the most part, companies called software publishers. The software market represents considerable economic stakes. According to Syntec Numérique, one of France’s professional associations in the digital industry, it represented more than 13 billion euros for France in 2016.
Software is a work that has a cost (the means used to create it) and a value (that stems from the service it provides, from the expertise contained within the source code). It is useful to understand how software can be protected (against piracy or counterfeiting, for example) and under what conditions it can be used by others.
So we are going to cover a little bit of software law, trying not to be too boring, before describing the main modes of distribution.
In France, the Intellectual Property Code (Code de la propriété intellectuelle, CPI) is the legal corpus that brings together all the laws and regulations relating to intellectual property and aimed at protecting “works of the mind”. It is divided into two branches: literary and artistic property (which includes copyright) and industrial property (patents, trademarks, designs).
Software is protected in France by copyright, which has been adapted to take the technical nature of software into account. Ideas are not protected under French law; this is the case for algorithms. By contrast, programs (source code and object code) are protected, as is the user documentation.
Copyright confers two types of rights: moral rights, attached to the person of the author, and economic rights, which govern the exploitation of the work.
Economic rights have been adapted for software. The essential point is that these rights are automatically transferred to the employer when the authors have developed the software as part of an employed activity. It is the employer who alone decides on the life of the work (distribution, choice of license, etc.). Authors who are not paid employees remain holders of the economic rights on their software works and can transfer these rights to third parties under the conditions of their choice.
The protection is automatic; it is acquired as soon as the software is created. If one wishes to consolidate this protection in the event of litigation, it is prudent to pre-constitute proof of authorship and precedence. This can be done, for example, by filing with a notary, or with a specialized association such as the French Agency for the Protection of Programs (Agence pour la protection des programmes, APP).
Buying implies a transfer of ownership (you buy a car, it becomes your property). When you purchase or download software onto your personal computer, you do not buy the software, even when it is free of charge; you sign a contract allowing you to use it. The license is a contract offer from the supplier, which defines the conditions of use of the software (so it is important to read the content of the license, even if we do not always do so!). Anyone using, copying, modifying or distributing software without the explicit permission of the holder of the economic rights is guilty of counterfeiting.
There are many types of software licenses:
The notion of free software was first described in the early 1980s by Richard Stallman, a researcher at MIT, who then formalized and popularized it with the GNU project and the Free Software Foundation (FSF).
Free software has four freedoms in common: the freedom to run the program for any purpose; the freedom to study how it works and to modify it; the freedom to redistribute copies; and the freedom to distribute modified versions to the community.
Free software does not necessarily mean free of charge; the FSF’s maxim on this point is “free as in free speech, not as in free beer”. However, charging for the software must not restrict the freedoms provided by the license.
Born in 1998 from a split in the free software community in order to conduct a policy deemed more adapted to economic and technical realities, the open-source movement defends the freedom to access the sources of the programs used, in order to achieve a software economy dependent solely on the sale of services and no longer on the sale of user licenses.
However, it is not easy to tell the difference between free software and open source. According to Richard Stallman, the fundamental difference between the two concepts lies in their philosophy: “Open source is a development methodology; free software is a social movement.” In practice, most open-source licenses meet the FSF’s criteria for free software, the various subtleties that distinguish them being mainly philosophical and commercial.
There are many free or open-source licenses. The most widely used is the GNU GPL (General Public License). Many free or open-source licenses are close to each other; some are compatible with each other, others have real philosophical differences. In particular, some licenses have a “contaminating” (copyleft) effect. For example, if a developer integrates a programming library published under GPL version 2, he or she must publish his or her software under the same license, and therefore publish his or her sources. Hence the importance of reading the license that comes with the software!
Software represents a considerable market both economically and in terms of jobs. Here are a few figures to better measure the importance of this sector.
According to Gartner Inc., global corporate spending on software was $419 billion in 2018. These expenditures included licenses acquired from software publishers (companies that design, develop and market software products) and development done in-house or outsourced to DSCs (Digital Services Companies). The United States largely dominates these activities, with IBM and Microsoft being the top two producers.
Cloud computing (Software as a Service) is disrupting the world of software publishers as more and more companies are renting software services on the Cloud instead of purchasing expensive licenses. The revenue associated with these services is estimated to account for around 10% of global software spending.
Note that the open-source market is in very good health according to ReportBuyer. Its worldwide revenues are expected to increase from $11.40 billion in 2017 to $32.95 billion in 2022, an average annual growth rate of 23.65%.
In their report on the top 250 French software publishers (TOP 250 des éditeurs de logiciels français, 2019), Syntec Numérique and EY showed that this sector was particularly dynamic. Indeed, the market is booming with a 12% increase compared to 2016, representing a total of 16 billion euros in revenues for the 300 companies consulted. In addition, this strong dynamic led to the creation of 12,700 jobs between 2016 and 2018. Innovation is at the heart of the development strategy of software publishers, as they devote 14% of their revenues to R&D. Leading the way are Dassault Systèmes (3D design, digital 3D modeling), Criteo (Internet advertising targeting for brands and advertisers) and Ubisoft (video games).