7

Working with Files and Streams

One of the most important parts of the C++ standard library is the input/output (I/O), stream-based library that enables developers to work with files, memory streams, or other types of I/O devices. The first part of this chapter provides solutions to some common stream operations, such as reading and writing data, localization settings, and manipulating the input and output of a stream. The second part of the chapter explores the new C++17 filesystem library that enables developers to perform operations with the filesystem and its objects, such as files and directories.

The recipes covered in this chapter are as follows:

  • Reading and writing raw data from/to binary files
  • Reading and writing objects from/to binary files
  • Using localized settings for streams
  • Using I/O manipulators to control the output of a stream
  • Using monetary I/O manipulators
  • Using time I/O manipulators
  • Working with filesystem paths
  • Creating, copying, and deleting files and directories
  • Removing content from a file
  • Checking the properties of an existing file or directory
  • Enumerating the content of a directory
  • Finding a file

We will start the chapter with a couple of recipes on how to serialize and deserialize data to/from files.

Reading and writing raw data from/to binary files

Some of the data that programs work with must be persisted to disk in various ways, including storing it in a database or in flat files, either as text or binary data. This recipe, and the next one, focus on persisting and loading both raw data and objects to and from binary files. In this context, raw data means unstructured data, and, in this recipe, we will consider writing and reading the content of a buffer (that is, a contiguous sequence of memory) that can either be an array, an std::vector, or an std::array.

Getting ready

For this recipe, you should be familiar with the standard stream I/O library, although some explanations, to the extent that is required to understand this recipe, are provided next. You should also be familiar with the differences between binary and text files.

In this recipe, we will use the ofstream and ifstream classes, which are available in the namespace std in the <fstream> header.

How to do it...

To write the content of a buffer (in our example, an std::vector) to a binary file, you should perform the following steps:

  1. Open a file stream for writing in binary mode by creating an instance of the std::ofstream class:
    std::ofstream ofile("sample.bin", std::ios::binary);
    
  2. Ensure that the file is actually open before writing data to the file:
    if(ofile.is_open())
    {
      // streamed file operations
    }
    
  3. Write the data to the file by providing a pointer to the array of characters and the number of characters to write. In the following example, we write the content of a local vector; however, typically, this data comes from a different context:
    std::vector<unsigned char> output {0,1,2,3,4,5,6,7,8,9};
    ofile.write(reinterpret_cast<char*>(output.data()),
                output.size());
    
  4. Optionally, you can flush the content of the stream's output buffer to the actual disk file by calling the flush() method. This causes the uncommitted changes in the stream to be synchronized with the external destination, which, in this case, is a disk file.
  5. Close the stream by calling close(). This, in turn, calls flush(), making the preceding step unnecessary in most contexts:
    ofile.close();
    

To read the entire content of a binary file to a buffer, you should perform the following steps:

  1. Open a file stream to read from a file in binary mode by creating an instance of the std::ifstream class:
    std::ifstream ifile("sample.bin", std::ios::binary);
    
  2. Ensure that the file is actually open before reading data from it:
    if(ifile.is_open())
    {
      // streamed file operations
    }
    
  3. Determine the length of the file by positioning the input position indicator at the end of the file, reading its value, and then moving the indicator back to the beginning:
    ifile.seekg(0, std::ios_base::end);
    auto length = ifile.tellg();
    ifile.seekg(0, std::ios_base::beg);
    
  4. Allocate memory to read the content of the file:
    std::vector<unsigned char> input;
    input.resize(static_cast<size_t>(length));
    
  5. Read the content of the file to the allocated buffer by providing a pointer to the array of characters for receiving the data and the number of characters to read:
    ifile.read(reinterpret_cast<char*>(input.data()), length);
    
  6. Check that the read operation is completed successfully:
    auto success = !ifile.fail() && length == ifile.gcount();
    
  7. Finally, close the file stream:
    ifile.close();
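The write and read steps above can be combined into one small round trip. The following is a minimal sketch, not a definitive implementation; the round_trip() helper and the file name passed to it are hypothetical names introduced only for illustration:

```cpp
#include <cstddef>
#include <fstream>
#include <vector>

// Hypothetical helper: writes the bytes to `path`, reads them back, and
// returns what was read (an empty vector if any step fails).
std::vector<unsigned char> round_trip(
  char const* path, std::vector<unsigned char> const& output)
{
  {
    std::ofstream ofile(path, std::ios::binary);
    if (!ofile.is_open()) return {};
    ofile.write(reinterpret_cast<char const*>(output.data()),
                static_cast<std::streamsize>(output.size()));
    ofile.close();                 // flushes the buffer to the file
    if (ofile.fail()) return {};
  }

  std::ifstream ifile(path, std::ios::binary);
  if (!ifile.is_open()) return {};

  // Determine the file length, then rewind to the beginning.
  ifile.seekg(0, std::ios_base::end);
  auto const length = static_cast<std::streamoff>(ifile.tellg());
  ifile.seekg(0, std::ios_base::beg);
  if (length < 0) return {};       // tellg() reports failure as -1

  std::vector<unsigned char> input(static_cast<std::size_t>(length));
  ifile.read(reinterpret_cast<char*>(input.data()), length);
  if (ifile.fail() || ifile.gcount() != length) return {};
  return input;
}
```

Calling round_trip("sample.bin", output) with the vector from the earlier step should return a vector equal to output when both operations succeed.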
    

How it works...

The standard stream-based I/O library provides classes that implement high-level input, output, or combined input and output operations on files, strings, and character arrays; manipulators that control how these streams behave; and several predefined stream objects (cin/wcin, cout/wcout, cerr/wcerr, and clog/wclog).

These streams are implemented as class templates, and, for files, the library provides several classes:

  • basic_filebuf implements the I/O operations for a raw file and is similar in semantics to a C FILE stream.
  • basic_ifstream implements the high-level file stream input operations defined by the basic_istream stream interface, internally using a basic_filebuf object.
  • basic_ofstream implements the high-level file stream output operations defined by the basic_ostream stream interface, internally using a basic_filebuf object.
  • basic_fstream implements the high-level file stream input and output operations defined by the basic_iostream stream interface, internally using a basic_filebuf object.

These classes are represented in the following class diagram to better understand their relationship:

Figure 7.1: Stream class diagram

Notice that this diagram also features several classes designed to work with a string-based stream. These streams, however, will not be discussed here.

Several typedefs for the class templates mentioned earlier are also defined in the <fstream> header, in the std namespace. ofstream and ifstream are the type synonyms used in the preceding examples:

typedef basic_ifstream<char>    ifstream;
typedef basic_ifstream<wchar_t> wifstream;
typedef basic_ofstream<char>    ofstream;
typedef basic_ofstream<wchar_t> wofstream;
typedef basic_fstream<char>     fstream;
typedef basic_fstream<wchar_t>  wfstream;

In the previous section, you saw how we can write and read raw data to and from a file stream. Now, we'll cover this process in more detail.

To write data to a file, we instantiated an object of the type std::ofstream. In the constructor, we passed the name of the file to be opened and the stream's open mode, for which we specified std::ios::binary to indicate binary mode. Opening the file like this discards the previous file content. If you want to append content to an existing file, you should also use the flag std::ios::app (that is, std::ios::app | std::ios::binary). This constructor internally calls open() on its underlying raw file object, that is, a basic_filebuf object. If this operation fails, the failbit is set. To check whether the stream has been successfully associated with a file device, we used is_open() (this internally calls the method with the same name from the underlying basic_filebuf). Writing data to the file stream is done using the write() method, which takes a pointer to the string of characters to write and the number of characters to write. Since this method operates with strings of characters, a reinterpret_cast is necessary if data is of another type, such as unsigned char in our example. If the write operation fails, the stream's badbit is set; an std::ios_base::failure exception is thrown only if exceptions have been enabled for that state with the stream's exceptions() method. However, data is not written directly to the file device but stored in the buffer of the basic_filebuf object. To write it to the file, the buffer needs to be flushed, which is done by calling flush(). This is done automatically when closing the file stream, as shown in the preceding example.
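The two ways of detecting a failed write, checking the stream state or opting in to exceptions with exceptions(), can be sketched as follows. The helper names write_checked() and write_throwing() are hypothetical and exist only for this illustration:

```cpp
#include <fstream>
#include <ios>

// Hypothetical helper: reports failure through the stream state bits.
bool write_checked(char const* path, char const* data, std::streamsize size)
{
  std::ofstream ofile(path, std::ios::binary);
  if (!ofile.is_open()) return false;
  ofile.write(data, size);
  ofile.close();          // flushes; sets failbit if closing fails
  return !ofile.fail();   // fail() is true if failbit or badbit is set
}

// Hypothetical helper: opts in to exceptions before opening the file.
bool write_throwing(char const* path, char const* data, std::streamsize size)
{
  std::ofstream ofile;
  ofile.exceptions(std::ios::failbit | std::ios::badbit);
  try
  {
    ofile.open(path, std::ios::binary);  // throws if the file cannot be opened
    ofile.write(data, size);             // throws if the write fails
    ofile.close();
    return true;
  }
  catch (std::ios_base::failure const&)
  {
    return false;
  }
}
```

Without the call to exceptions(), the catch block in the second function would never be reached, because streams do not throw by default.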

To read data from a file, we instantiated an object of type std::ifstream. In the constructor, we passed the same kind of arguments that we used for opening the file to write: the name of the file and the open mode, that is, std::ios::binary. The constructor internally calls open() on the underlying std::basic_filebuf object. To check whether the stream has been successfully associated with a file device, we use is_open() (this internally calls the method with the same name from the underlying basic_filebuf). In this example, we read the entire content of the file to a memory buffer, in particular, an std::vector. Before we can read the data, we must know the size of the file in order to allocate a buffer that is large enough to hold that data. To do this, we used seekg() to move the input position indicator to the end of the file.

Then, we called tellg() to return the current position, which, in this case, indicates the size of the file, in bytes, and then we moved the input position indicator to the beginning of the file to be able to start reading from the beginning. Calling seekg() to move the position indicator to the end can be avoided by opening the file with the position indicator moved directly to the end. This can be achieved by using the std::ios::ate opening flag in the constructor (or the open() method). After allocating enough memory for the content of the file, we copied the data from the file into memory using the read() method. This takes a pointer to the string of characters that receives the data read from the stream and the number of characters to be read. Since the stream operates on characters, a reinterpret_cast expression is necessary if the buffer contains other types of data, such as unsigned char in our example.

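If an error occurs, or if fewer characters than requested are available, this operation sets the failbit (together with the eofbit when the end of the file is reached). An std::ios_base::failure exception is thrown only if exceptions have been enabled with the stream's exceptions() method. To determine the number of characters that have been successfully read from the stream, we can use the gcount() method. Upon completing the read operation, we close the file stream.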

The operations shown in these examples are the minimum ones required to write and read data to and from file streams. It is important, though, that you check whether the operations succeeded and catch any exceptions that could occur.

The example code discussed so far in this recipe can be reorganized in the form of two general functions for writing and reading data to and from a file:

bool write_data(char const * const filename,
                char const * const data,
                size_t const size)
{
  auto success = false;
  std::ofstream ofile(filename, std::ios::binary);
  if(ofile.is_open())
  {
    try
    {
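      ofile.write(data, size);
      // failures are reported through the stream state unless
      // exceptions were enabled with ofile.exceptions()
      success = !ofile.fail();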
    }
    catch(std::ios_base::failure &)
    {
      // handle the error
    }
    ofile.close();
  }
  return success;
}
size_t read_data(char const * const filename,
                 std::function<char*(size_t const)> allocator)
{
  size_t readbytes = 0;
  std::ifstream ifile(filename, std::ios::ate | std::ios::binary);
  if(ifile.is_open())
  {
    auto length = static_cast<size_t>(ifile.tellg());
    ifile.seekg(0, std::ios_base::beg);
    auto buffer = allocator(length);
    try
    {
      ifile.read(buffer, length);
      readbytes = static_cast<size_t>(ifile.gcount());
    }
    catch (std::ios_base::failure &)
    {
      // handle the error
    }
    ifile.close();
  }
  return readbytes;
}

write_data() is a function that takes the name of a file, a pointer to an array of characters, and the length of this array as arguments and writes the characters to the specified file. read_data() is a function that takes the name of a file and a function that allocates a buffer, and reads the entire content of the file to the buffer that is returned by the allocator function. The following is an example of how these functions can be used:

std::vector<unsigned char> output {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
std::vector<unsigned char> input;
if(write_data("sample.bin",
              reinterpret_cast<char*>(output.data()),
              output.size()))
{
  if(read_data("sample.bin",
               [&input](size_t const length) {
    input.resize(length);
    return reinterpret_cast<char*>(input.data());}) > 0)
  {
    std::cout << (output == input ? "equal": "not equal")
              << '\n';
  }
}

Alternatively, we could use a dynamically allocated buffer, instead of the std::vector; the changes required for this are small in the overall example:

std::vector<unsigned char> output {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
unsigned char* input = nullptr;
size_t readb = 0;
if(write_data("sample.bin",
              reinterpret_cast<char*>(output.data()),
              output.size()))
{
  if((readb = read_data(
     "sample.bin",
     [&input](size_t const length) {
       input = new unsigned char[length];
       return reinterpret_cast<char*>(input); })) > 0)
  {
    auto cmp = memcmp(output.data(), input, output.size());
    std::cout << (cmp == 0 ? "equal": "not equal")
              << '\n';
  }
}
delete [] input;

However, this alternative is only provided to show that read_data() can be used with different kinds of input buffers. It is recommended that you avoid the explicit dynamic allocation of memory whenever possible.

There's more...

The way of reading data from a file to memory, as shown in this recipe, is only one of several. The following is a list of possible alternatives for reading data from a file stream:

  • Initializing an std::vector directly using std::istreambuf_iterator iterators (similarly, this can be used with std::string):
    std::vector<unsigned char> input;
    std::ifstream ifile("sample.bin", std::ios::binary);
    if(ifile.is_open())
    {
      input = std::vector<unsigned char>(
        std::istreambuf_iterator<char>(ifile),
        std::istreambuf_iterator<char>());
      ifile.close();
    }
    
  • Assigning the content of an std::vector from std::istreambuf_iterator iterators:
    std::vector<unsigned char> input;
    std::ifstream ifile("sample.bin", std::ios::binary);
    if(ifile.is_open())
    {
      ifile.seekg(0, std::ios_base::end);
      auto length = ifile.tellg();
      ifile.seekg(0, std::ios_base::beg);
      input.reserve(static_cast<size_t>(length));
      input.assign(
        std::istreambuf_iterator<char>(ifile),
        std::istreambuf_iterator<char>());
      ifile.close();
    }
    
  • Copying the content of the file stream to a vector using std::istreambuf_iterator iterators and an std::back_inserter adapter to write to the end of the vector:
    std::vector<unsigned char> input;
    std::ifstream ifile("sample.bin", std::ios::binary);
    if(ifile.is_open())
    {
      ifile.seekg(0, std::ios_base::end);
      auto length = ifile.tellg();
      ifile.seekg(0, std::ios_base::beg);
      input.reserve(static_cast<size_t>(length));
      std::copy(std::istreambuf_iterator<char>(ifile),
                std::istreambuf_iterator<char>(),
                std::back_inserter(input));
      ifile.close();
    }
    

Compared to these alternatives, however, the method described in the How to do it... section is typically the fastest one, even though the alternatives may look more appealing from an object-oriented perspective. It is beyond the scope of this recipe to compare the performance of these alternatives, but you can try it as an exercise.

See also

  • Reading and writing objects from/to binary files to learn how to serialize and deserialize objects to and from binary files
  • Using I/O manipulators to control the output of a stream to learn about the use of helper functions, called manipulators, that control input and output streams using the << and >> stream operators

Reading and writing objects from/to binary files

In the previous recipe, we learned how to write and read raw data (that is, unstructured data) to and from a file. Many times, however, we must persist and load objects instead. Writing and reading in the manner shown in the previous recipe works for POD types only. For anything else, we must explicitly decide what is actually written or read, since writing or reading pointers, virtual tables (vtables), and any sort of metadata is not only irrelevant but also semantically wrong. These operations are commonly referred to as serialization and deserialization. In this recipe, we will learn how to serialize and deserialize both POD and non-POD types to and from binary files.

Getting ready

For the examples in this recipe, we will use the foo and foopod classes, as follows:

class foo
{
  int i;
  char c;
  std::string s;
public:
  foo(int const i = 0, char const c = 0, std::string const & s = {}):
    i(i), c(c), s(s)
  {}
  foo(foo const &) = default;
  foo& operator=(foo const &) = default;
  bool operator==(foo const & rhv) const
  {
    return i == rhv.i &&
           c == rhv.c &&
           s == rhv.s;
  }
  bool operator!=(foo const & rhv) const
  {
    return !(*this == rhv);
  }
};
struct foopod
{
  bool a;
  char b;
  int c[2];
};
bool operator==(foopod const & f1, foopod const & f2)
{
  return f1.a == f2.a && f1.b == f2.b &&
         f1.c[0] == f2.c[0] && f1.c[1] == f2.c[1];
}

It is recommended that you first read the previous recipe, Reading and writing raw data from/to binary files, before you continue. You should also know what POD (a type that is both trivial and has a standard layout) and non-POD types are and how operators can be overloaded. You can check the closing notes of the Using type traits to query properties of types recipe, in Chapter 6, General-Purpose Utilities, for further details on POD types.

How to do it...

To serialize/deserialize POD types that do not contain pointers, use ofstream::write() and ifstream::read(), as shown in the previous recipe:

  • Serialize objects to a binary file using ofstream and the write() method:
    std::vector<foopod> output {
      {true, '1', {1, 2}},
      {true, '2', {3, 4}},
      {false, '3', {4, 5}}
    };
    std::ofstream ofile("sample.bin", std::ios::binary);
    if(ofile.is_open())
    {
      for(auto const & value : output)
      {
        ofile.write(reinterpret_cast<const char*>(&value),
                    sizeof(value));
      }
      ofile.close();
    }
    
  • Deserialize objects from a binary file using ifstream and the read() method:
    std::vector<foopod> input;
    std::ifstream ifile("sample.bin", std::ios::binary);
    if(ifile.is_open())
    {
      while(true)
      {
        foopod value;
        ifile.read(reinterpret_cast<char*>(&value),
                   sizeof(value));
        if(ifile.fail() || ifile.eof()) break;
        input.push_back(value);
      }
      ifile.close();
    }
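Because this technique is only valid for types that can be copied byte by byte, it can help to let the compiler enforce that requirement. The following sketch assumes two hypothetical function templates, write_pods() and read_pods(), and a hypothetical sample_pod type that mirrors foopod; it writes the whole vector with a single write() call. Note that the resulting file includes padding bytes and depends on the platform's byte order, so it is only suitable for reading back on a compatible platform:

```cpp
#include <fstream>
#include <type_traits>
#include <vector>

// Hypothetical type with the same layout as foopod.
struct sample_pod
{
  bool a;
  char b;
  int c[2];
};

template <typename T>
bool write_pods(char const* path, std::vector<T> const& values)
{
  static_assert(std::is_trivially_copyable<T>::value,
                "raw serialization requires a trivially copyable type");
  std::ofstream ofile(path, std::ios::binary);
  if (!ofile.is_open()) return false;
  // The elements of a vector are contiguous, so one call is enough.
  ofile.write(reinterpret_cast<char const*>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(T)));
  ofile.close();
  return !ofile.fail();
}

template <typename T>
std::vector<T> read_pods(char const* path)
{
  static_assert(std::is_trivially_copyable<T>::value,
                "raw deserialization requires a trivially copyable type");
  std::vector<T> result;
  std::ifstream ifile(path, std::ios::binary);
  T value{};
  // read() returns the stream, which is false once a read fails.
  while (ifile.read(reinterpret_cast<char*>(&value), sizeof(value)))
    result.push_back(value);
  return result;
}
```

Instantiating either template with a type such as std::string fails at compile time, which is the point of the static_assert.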
    

To serialize non-POD types (or POD types that contain pointers), you must explicitly write the value of the data members to a file, and to deserialize, you must explicitly read from the file to the data members in the same order. To demonstrate this, we will consider the foo class that we defined earlier:

  • Add a member function called write() to serialize objects of this class. The method takes a reference to an ofstream and returns a bool indicating whether the operation was successful or not:
    bool write(std::ofstream& ofile) const
    {
      ofile.write(reinterpret_cast<const char*>(&i), sizeof(i));
      ofile.write(&c, sizeof(c));
      auto size = static_cast<int>(s.size());
      ofile.write(reinterpret_cast<char*>(&size), sizeof(size));
      ofile.write(s.data(), s.size());
      return !ofile.fail();
    }
    
  • Add a member function, called read(), to deserialize the objects of this class. This method takes a reference to an ifstream and returns a bool indicating whether the operation was successful or not:
    bool read(std::ifstream& ifile)
    {
      ifile.read(reinterpret_cast<char*>(&i), sizeof(i));
      ifile.read(&c, sizeof(c));
      auto size {0};
      ifile.read(reinterpret_cast<char*>(&size), sizeof(size));
      s.resize(size);
      ifile.read(s.data(), size); // data() is valid even if s is empty
      return !ifile.fail();
    }
    

An alternative to the write() and read() member functions demonstrated earlier is to overload operator<< and operator>>. To do this, you should perform the following steps:

  1. Add friend declarations for the non-member operator<< and operator>> to the class to be serialized/deserialized (in this case, the foo class):
    friend std::ofstream& operator<<(std::ofstream& ofile,
                                     foo const& f);
    friend std::ifstream& operator>>(std::ifstream& ifile,
                                     foo& f);
    
  2. Overload operator<< for your class:
    std::ofstream& operator<<(std::ofstream& ofile, foo const& f)
    {
      ofile.write(reinterpret_cast<const char*>(&f.i),
                  sizeof(f.i));
      ofile.write(&f.c, sizeof(f.c));
      auto size = static_cast<int>(f.s.size());
      ofile.write(reinterpret_cast<char*>(&size), sizeof(size));
      ofile.write(f.s.data(), f.s.size());
      return ofile;
    }
    
  3. Overload operator>> for your class:
    std::ifstream& operator>>(std::ifstream& ifile, foo& f)
    {
      ifile.read(reinterpret_cast<char*>(&f.i), sizeof(f.i));
      ifile.read(&f.c, sizeof(f.c));
      auto size {0};
      ifile.read(reinterpret_cast<char*>(&size), sizeof(size));
      f.s.resize(size);
      ifile.read(f.s.data(), size); // data() is valid even if f.s is empty
      return ifile;
    }
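The same length-prefixed layout can be written against the std::ostream and std::istream base classes, which makes the operators usable with any stream, not only file streams. The following is a sketch under stated assumptions: record is a hypothetical class shaped like foo but with public members, and the fixed-width std::uint32_t length prefix is a design choice of this example rather than something the recipe prescribes:

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical class, similar in shape to foo.
struct record
{
  int i = 0;
  char c = 0;
  std::string s;
};

std::ostream& operator<<(std::ostream& os, record const& r)
{
  os.write(reinterpret_cast<char const*>(&r.i), sizeof(r.i));
  os.write(&r.c, sizeof(r.c));
  auto const size = static_cast<std::uint32_t>(r.s.size());
  os.write(reinterpret_cast<char const*>(&size), sizeof(size));
  os.write(r.s.data(), static_cast<std::streamsize>(size));
  return os;
}

std::istream& operator>>(std::istream& is, record& r)
{
  record tmp;
  std::uint32_t size = 0;
  is.read(reinterpret_cast<char*>(&tmp.i), sizeof(tmp.i));
  is.read(&tmp.c, sizeof(tmp.c));
  // Resize the string only if the length prefix was read successfully.
  if (is.read(reinterpret_cast<char*>(&size), sizeof(size)))
  {
    tmp.s.resize(size);
    if (size > 0)
      is.read(&tmp.s[0], static_cast<std::streamsize>(size));
  }
  if (is) r = std::move(tmp);  // leave r untouched on failure
  return is;
}
```

Reading into a temporary object means a truncated input does not leave the target half-updated. A production version would also want to reject implausibly large length prefixes before resizing the string.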
    

How it works...

Regardless of whether we serialize the entire object (for POD types) or only parts of it, we use the same stream classes that we discussed in the previous recipe: ofstream for output file streams and ifstream for input file streams. Details about writing and reading data using these standard classes have been discussed in that recipe and will not be reiterated here.

When you serialize and deserialize objects to and from files, you should avoid writing the values of the pointers to a file. Additionally, you must not read pointer values from the file since these represent memory addresses and are meaningless across processes and even in the same process some moments later. Instead, you should write the data referred to by a pointer and read data into the objects referred to by a pointer.

This is a general principle, and, in practice, you may encounter situations where a source may have multiple pointers to the same object; in this case, you might want to write only one copy and also handle the reading in a corresponding manner.

If the objects you want to serialize are of the POD type, you can do it just like we did when we discussed raw data. In the example in this recipe, we serialized a sequence of objects of the foopod type. When we deserialize, we read from the file stream in a loop until the end of the file is read or a failure occurs. The way we read, in this case, may look counterintuitive, but doing it differently may lead to the duplication of the last read value:

  • Reading is done in an infinite loop
  • A read operation is performed in the loop
  • A check for a failure or the end of file is performed, and if either of them has occurred, the infinite loop is exited
  • The value is added to the input sequence and the looping continues

If reading is done using a loop with an exit condition that checks the end of the file bit, that is, while(!ifile.eof()), the last value will be added to the input sequence twice. The reason for this is that upon reading the last value, the end of the file has not yet been encountered (as that is a mark beyond the last byte of the file). The end of the file mark is only reached at the next read attempt, which, therefore, sets the eofbit of the stream. However, the input variable still has the last value since it hasn't been overwritten with anything, and this is added to the input vector for a second time.
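The difference between the two loop shapes can be reproduced with a few integers. This is a minimal sketch; the helper names write_ints(), read_ints_eof_loop(), and read_ints_checked() are hypothetical and used only for illustration:

```cpp
#include <fstream>
#include <vector>

// Hypothetical helper: writes the values as raw ints to `path`.
bool write_ints(char const* path, std::vector<int> const& values)
{
  std::ofstream ofile(path, std::ios::binary);
  if (!ofile.is_open()) return false;
  for (int const v : values)
    ofile.write(reinterpret_cast<char const*>(&v), sizeof(v));
  ofile.close();
  return !ofile.fail();
}

// Problematic: the end-of-file state is tested before the read.
std::vector<int> read_ints_eof_loop(char const* path)
{
  std::vector<int> result;
  std::ifstream ifile(path, std::ios::binary);
  int value = 0;
  while (ifile.is_open() && !ifile.eof())
  {
    ifile.read(reinterpret_cast<char*>(&value), sizeof(value));
    result.push_back(value);  // also runs after the final, failed read
  }
  return result;
}

// Correct: read first, then test the stream state.
std::vector<int> read_ints_checked(char const* path)
{
  std::vector<int> result;
  std::ifstream ifile(path, std::ios::binary);
  int value = 0;
  while (ifile.read(reinterpret_cast<char*>(&value), sizeof(value)))
    result.push_back(value);
  return result;
}
```

For a file that contains the values 1, 2, and 3, the first reader produces four elements, with 3 appearing twice, while the second produces exactly the three stored values.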

If the objects you want to serialize and deserialize are of non-POD types, writing/reading these objects as raw data is not possible. For instance, such an object may contain a pointer to a virtual table. Writing that pointer to a file does not cause problems, even though the value is useless; however, reading it from a file, and, therefore, overwriting the virtual table pointer of an object, will have catastrophic effects on the object and the program.

When serializing/deserializing non-POD types, there are various alternatives, and some of them have been discussed in the previous section. They either provide explicit methods for writing and reading or overload the standard << and >> operators. The second approach has an advantage in that it enables the use of your class in generic code, where objects are written and read to and from stream files using these operators.

When you plan to serialize and deserialize your objects, consider versioning your data from the very beginning to avoid problems if the structure of your data changes over time. How versioning should be done is beyond the scope of this recipe.

See also

  • Reading and writing raw data from/to binary files to learn how to write and read unstructured data to binary files
  • Using I/O manipulators to control the output of a stream to learn about the use of helper functions, called manipulators, that control input and output streams using the << and >> stream operators

Using localized settings for streams

How writing or reading to and from streams is performed may depend on the language and regional settings. Examples include writing and parsing numbers, time values, or monetary values, or comparing (collating) strings. The C++ I/O library provides a general-purpose mechanism for handling internationalization features through locales and facets. In this recipe, you will learn how to use locales to control the behavior of input/output streams.

Getting ready

All of the examples in this recipe use the std::cout predefined console stream object. However, the same applies to all I/O stream objects. Also, in these recipe examples, we will use the following objects and lambda function:

auto now = std::chrono::system_clock::now();
auto stime = std::chrono::system_clock::to_time_t(now);
auto ltime = std::localtime(&stime);
std::vector<std::string> names
  {"John", "adele", "Øivind", "François", "Robert", "Åke"};
auto sort_and_print = [](std::vector<std::string> v,
                         std::locale const & loc)
{
  std::sort(v.begin(), v.end(), loc);
  for (auto const & s : v) std::cout << s << ' ';
  std::cout << '\n';
};

The locale names used in this recipe (en_US.utf8, de_DE.utf8, and so on) are the ones that are used on UNIX systems. The following table lists their equivalents for Windows systems:

UNIX          Windows
en_US.utf8    English_US.1252
en_GB.utf8    English_UK.1252
de_DE.utf8    German_Germany.1252
sv_SE.utf8    Swedish_Sweden.1252

How to do it...

To control the localization settings of a stream, you must do the following:

  • Use the std::locale class to represent the localization settings. There are various ways in which to construct locale objects, including the following:
    • Default construct it to use the global locale (by default, the C locale at the program startup)
    • From a locale name, such as C, POSIX, en_US.utf8, and so on, if supported by the operating system
    • From another locale, except for a specified facet
    • From another locale, except for all of the facets from a specified category that are copied from another specified locale:
      // default construct
      auto loc_def = std::locale {};
      // from a name
      auto loc_us = std::locale {"en_US.utf8"};
      // from another locale except for a facet
      auto loc1 = std::locale {loc_def,
                               new std::collate<wchar_t>};
      // from another locale, except the facets in a category
      auto loc2 = std::locale {loc_def, loc_us,
                               std::locale::collate};
      
  • To get a copy of the default C locale, use the std::locale::classic() static method:
    auto loc = std::locale::classic();
    
  • To change the default locale that is copied every time a locale is default-constructed, use the std::locale::global() static method:
    std::locale::global(std::locale("en_US.utf8"));
    
  • Use the imbue() method to change the current locale of an I/O stream:
    std::cout.imbue(std::locale("en_US.utf8"));
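Constructing a locale from a name throws an std::runtime_error when the name is not supported on the current system, so code that relies on named locales may want a fallback. The following sketch assumes a hypothetical helper called locale_or_classic(); falling back to the classic locale is a choice made for this example:

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the named locale if the system supports
// it, or the classic "C" locale otherwise.
std::locale locale_or_classic(std::string const& name)
{
  try
  {
    return std::locale(name);
  }
  catch (std::runtime_error const&)
  {
    return std::locale::classic();
  }
}
```

A stream could then be configured with std::cout.imbue(locale_or_classic("en_US.utf8")) without risking an unhandled exception on systems where that locale is not installed.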
    

The following list shows examples of using various locales:

  • Use a particular locale, indicated by its name. In this example, the locale is for German:
    auto loc = std::locale("de_DE.utf8");
    std::cout.imbue(loc);
    std::cout << 1000.50 << '\n';
    // 1.000,5
    std::cout << std::showbase << std::put_money(1050)
              << '\n';
    // 10,50 €
    std::cout << std::put_time(ltime, "%c") << '\n';
    // So 04 Dez 2016 17:54:06 JST
    sort_and_print(names, loc);
    // adele Åke François John Øivind Robert
    
  • Use a locale that corresponds to the user settings (as defined in the system). This is done by constructing an std::locale object from an empty string:
    auto loc = std::locale("");
    std::cout.imbue(loc);
    std::cout << 1000.50 << '\n';
    // 1,000.5
    std::cout << std::showbase << std::put_money(1050)
              << '\n';
    // $10.50
    std::cout << std::put_time(ltime, "%c") << '\n';
    // Sun 04 Dec 2016 05:54:06 PM JST
    sort_and_print(names, loc);
    // adele Åke François John Øivind Robert
    
  • Set and use the global locale:
    std::locale::global(std::locale("sv_SE.utf8")); // set global
    auto loc = std::locale{};                       // use global
    std::cout.imbue(loc);
    std::cout << 1000.50 << '\n';
    // 1 000,5
    std::cout << std::showbase << std::put_money(1050)
              << '\n';
    // 10,50 kr
    std::cout << std::put_time(ltime, "%c") << '\n';
    // sön 4 dec 2016 18:02:29
    sort_and_print(names, loc);
    // adele François John Robert Åke Øivind
    
  • Use the default C locale:
    auto loc = std::locale::classic();
    std::cout.imbue(loc);
    std::cout << 1000.50 << '\n';
    // 1000.5
    std::cout << std::showbase << std::put_money(1050)
              << '\n';
    // 1050
    std::cout << std::put_time(ltime, "%c") << '\n';
    // Sun Dec 4 17:55:14 2016
    sort_and_print(names, loc);
    // François John Robert adele Åke Øivind
    

How it works...

A locale object does not actually store localized settings. A locale is a heterogeneous container of facets. A facet is an object that defines the localization and internationalization settings. The standard defines a list of facets that each locale must contain. In addition to this, a locale can contain any other user-defined facets. The following is a list of all standard-defined facets:

  • std::collate<char>
  • std::collate<wchar_t>
  • std::ctype<char>
  • std::ctype<wchar_t>
  • std::codecvt<char,char,mbstate_t>
  • std::codecvt<char16_t,char,mbstate_t>
  • std::codecvt<char32_t,char,mbstate_t>
  • std::codecvt<wchar_t,char,mbstate_t>
  • std::moneypunct<char>
  • std::moneypunct<char,true>
  • std::moneypunct<wchar_t>
  • std::moneypunct<wchar_t,true>
  • std::money_get<char>
  • std::money_get<wchar_t>
  • std::money_put<char>
  • std::money_put<wchar_t>
  • std::numpunct<char>
  • std::numpunct<wchar_t>
  • std::num_get<char>
  • std::num_get<wchar_t>
  • std::num_put<char>
  • std::num_put<wchar_t>
  • std::time_get<char>
  • std::time_get<wchar_t>
  • std::time_put<char>
  • std::time_put<wchar_t>
  • std::messages<char>
  • std::messages<wchar_t>

It is beyond the scope of this recipe to go through this list and discuss all of these facets. However, we could mention that std::money_get is a facet that encapsulates the rules for parsing monetary values from character streams, while std::money_put is a facet that encapsulates the rules for formatting monetary values as strings. In a similar manner, std::time_get encapsulates rules for date and time parsing, while std::time_put encapsulates rules for date and time formatting. These will form the subject of the next couple of recipes.

A locale is an immutable object containing immutable facet objects. Locales are implemented as a reference-counted array of reference-counted pointers to facets. The array is indexed by std::locale::id, and all facets must be derived from the base class std::locale::facet and must have a public static member of the std::locale::id type, called id.

It is only possible to create a locale object using one of the overloaded constructors or with the combine() method, which, as the name implies, combines the current locale with a new compile-time identifiable facet and returns a new locale object. On the other hand, it is possible to determine whether a locale contains a particular facet using the std::has_facet() function template, or to obtain a reference to a facet implemented by a particular locale using the std::use_facet() function template.

In the preceding examples, we sorted a vector of strings and passed a locale object as the third argument to the std::sort() general algorithm. This third argument is supposed to be a comparison function object. Passing a locale object works because std::locale has an operator() that lexicographically compares two strings using its collate facet. This is actually the only localization functionality that is directly provided by std::locale; however, what this does is invoke the collate facet's compare() method that performs the string comparison based on the facet's rules.

Every program has a global locale created when the program starts. The content of this global locale is copied into every default-constructed locale. The global locale can be replaced using the static method std::locale::global(). By default, the global locale is the C locale, which is a locale equivalent to ANSI C's locale with the same name. This locale was created to handle simple English texts, and it is the default one in C++ that provides compatibility with C. A reference to this locale can be obtained with the static method std::locale::classic().

By default, all streams use the classic locale to write or parse text. However, it is possible to change the locale used by a stream using the stream's imbue() method. This is a member of the std::ios_base class that is the base for all I/O streams. A companion member is the getloc() method, which returns a copy of the current stream's locale.

In the preceding examples, we changed the locale for the std::cout stream object. In practice, you may want to set the same locale for all stream objects associated with the standard C streams: cin, cout, cerr, and clog (or wcin, wcout, wcerr, and wclog).

See also

  • Using I/O manipulators to control the output of a stream to learn about the use of helper functions, called manipulators, that control input and output streams using the << and >> stream operators
  • Using monetary I/O manipulators to learn how to use standard manipulators to write and read monetary values
  • Using time I/O manipulators to learn how to use standard manipulators to write and read date and time values

Using I/O manipulators to control the output of a stream

Apart from the stream-based I/O library, the standard library provides a series of helper functions, called manipulators, that control the input and output streams using operator<< and operator>>. In this recipe, we will look at some of these manipulators and demonstrate their use through some examples that format the output to the console. We will continue covering more manipulators in the upcoming recipes.

Getting ready

The I/O manipulators are available in the std namespace in the headers <ios>, <istream>, <ostream>, and <iomanip>. In this recipe, we will only discuss some of the manipulators from <ios> and <iomanip>.

How to do it...

The following manipulators can be used to control the output or input of a stream:

  • boolalpha and noboolalpha enable and disable the textual representation of Booleans:
    std::cout << std::boolalpha << true << '\n';    // true
    std::cout << false << '\n';                     // false
    std::cout << std::noboolalpha << false << '\n'; // 0
    
  • left, right, and internal affect the alignment of the fill characters; left and right affect all text, but internal affects only the integer, floating point, and monetary output:
    std::cout << std::right << std::setw(10) << "right\n";
    std::cout << std::setw(10) << "text\n";
    std::cout << std::left << std::setw(10) << "left\n";
    
  • fixed, scientific, hexfloat, and defaultfloat change the formatting used for floating-point types (for both the input and output streams). The latter two have only been available since C++11:
    std::cout << std::fixed << 0.25 << '\n';
    // 0.250000
    std::cout << std::scientific << 0.25 << '\n';
    // 2.500000e-01
    std::cout << std::hexfloat << 0.25 << '\n';
    // 0x1p-2
    std::cout << std::defaultfloat << 0.25 << '\n';
    // 0.25
    
  • dec, hex, and oct control the base that is used for the integer types (in both the input and output streams):
    std::cout << std::oct << 42 << '\n'; // 52
    std::cout << std::hex << 42 << '\n'; // 2a
    std::cout << std::dec << 42 << '\n'; // 42
    
  • setw changes the width of the next input or output field. The default width is 0.
  • setfill changes the fill character for the output stream; this is the character used to pad subsequent fields until the specified width is reached. The default fill character is a space:
    std::cout << std::right
              << std::setfill('.') << std::setw(10)
              << "right" << '\n';
    // .....right
    
  • setprecision changes the decimal precision (how many digits are generated) for the floating-point types in both the input and output streams. The default precision is 6:
    std::cout << std::fixed << std::setprecision(2) << 12.345
              << '\n';
    // 12.35
    

How it works...

All of the I/O manipulators listed earlier, with the exception of setw, which applies only to the next input or output field, have a persistent effect on the stream: all subsequent writing or reading operations use the last specified format until another manipulator changes it.

Some of these manipulators are called without arguments. Examples include boolalpha/noboolalpha or dec/hex/oct. These manipulators are functions that take a single argument, that is, a reference to a stream, and return a reference to the same stream:

std::ios_base& hex(std::ios_base& str);

Expressions, such as std::cout << std::hex, are possible because both basic_ostream::operator<< and basic_istream::operator>> have special overloads that take a pointer to these functions.

Other manipulators, including some that are not mentioned here, are invoked with arguments. These manipulators are functions that take one or more arguments and return an object of an unspecified type:

template<class CharT>
/*unspecified*/ setfill(CharT c);

To better demonstrate the use of these manipulators, we will consider two examples that format output to the console.

In the first example, we will list the table of contents of a book with the following requirements:

  • The chapter number is right-aligned and shown with Roman numerals.
  • The chapter title is left-aligned and the remaining space until the page number is filled with dots.
  • The page number of the chapter is right-aligned.

For this example, we will use the following classes and helper function:

struct Chapter
{
  int Number;
  std::string Title;
  int Page;
};
struct BookPart
{
  std::string Title;
  std::vector<Chapter> Chapters;
};
struct Book
{
  std::string Title;
  std::vector<BookPart> Parts;
};
std::string to_roman(unsigned int value)
{
  struct roman_t { unsigned int value; char const* numeral; };
  const static roman_t rarr[13] =
  {
    {1000, "M"}, {900, "CM"}, {500, "D"}, {400, "CD"},
    {100, "C"}, { 90, "XC"}, { 50, "L"}, { 40, "XL"},
    { 10, "X"}, { 9, "IX"}, { 5, "V"}, { 4, "IV"},
    { 1, "I"}
  };
  std::string result;
  for (auto const & number : rarr)
  {
    while (value >= number.value)
    {
      result += number.numeral;
      value -= number.value;
    }
  }
  return result;
}

The print_toc() function, as shown in the following code snippet, takes a Book as its argument and prints its content to the console according to the specified requirements. For this purpose, we use the following:

  • std::left and std::right specify the text alignment
  • std::setw specifies the width of each output field
  • std::setfill specifies the fill character (a blank space for the chapter number and a dot for the chapter title)

The implementation of the print_toc() function is listed here:

void print_toc(Book const & book)
{
  std::cout << book.Title << '\n';
  for(auto const & part : book.Parts)
  {
    std::cout << std::left << std::setw(15) << std::setfill(' ')
              << part.Title << '\n';
    std::cout << std::left << std::setw(15) << std::setfill('-')
              << '-' << '\n';
    for(auto const & chapter : part.Chapters)
    {
      std::cout << std::right << std::setw(4) << std::setfill(' ')
                << to_roman(chapter.Number) << ' ';
      std::cout << std::left << std::setw(35) << std::setfill('.')
                << chapter.Title;
      std::cout << std::right << std::setw(3) << std::setfill('.')
                << chapter.Page << '\n';
    }
  }
}

The following example uses this function with a Book object describing the table of contents from the book The Fellowship of the Ring:

auto book = Book
{
  "THE FELLOWSHIP OF THE RING"s,
  {
    {
      "BOOK ONE"s,
      {
        {1, "A Long-expected Party"s, 21},
        {2, "The Shadow of the Past"s, 42},
        {3, "Three Is Company"s, 65},
        {4, "A Short Cut to Mushrooms"s, 86},
        {5, "A Conspiracy Unmasked"s, 98},
        {6, "The Old Forest"s, 109},
        {7, "In the House of Tom Bombadil"s, 123},
        {8, "Fog on the Barrow-downs"s, 135},
        {9, "At the Sign of The Prancing Pony"s, 149},
        {10, "Strider"s, 163},
        {11, "A Knife in the Dark"s, 176},
        {12, "Flight to the Ford"s, 197},
      },
    },
    {
      "BOOK TWO"s,
      {
        {1, "Many Meetings"s, 219},
        {2, "The Council of Elrond"s, 239},
        {3, "The Ring Goes South"s, 272},
        {4, "A Journey in the Dark"s, 295},
        {5, "The Bridge of Khazad-dum"s, 321},
        {6, "Lothlorien"s, 333},
        {7, "The Mirror of Galadriel"s, 353},
        {8, "Farewell to Lorien"s, 367},
        {9, "The Great River"s, 380},
        {10, "The Breaking of the Fellowship"s, 390},
      },
    },
  }
};
print_toc(book);

In this case, the output is as follows:

THE FELLOWSHIP OF THE RING
BOOK ONE
---------------
   I A Long-expected Party...............21
  II The Shadow of the Past..............42
 III Three Is Company....................65
  IV A Short Cut to Mushrooms............86
   V A Conspiracy Unmasked...............98
  VI The Old Forest.....................109
 VII In the House of Tom Bombadil.......123
VIII Fog on the Barrow-downs............135
  IX At the Sign of The Prancing Pony...149
   X Strider............................163
  XI A Knife in the Dark................176
 XII Flight to the Ford.................197
BOOK TWO
---------------
   I Many Meetings......................219
  II The Council of Elrond..............239
 III The Ring Goes South................272
  IV A Journey in the Dark..............295
   V The Bridge of Khazad-dum...........321
  VI Lothlorien.........................333
 VII The Mirror of Galadriel............353
VIII Farewell to Lorien.................367
  IX The Great River....................380
   X The Breaking of the Fellowship.....390

For the second example, our goal is to output a table that lists the largest companies in the world by revenue. The table will have columns for the company name, the industry, the revenue (in USD billions), the increase/decrease in revenue growth, the revenue growth, the number of employees, and the country of origin. For this example, we will use the following class:

struct Company
{
  std::string Name;
  std::string Industry;
  double      Revenue;
  bool        RevenueIncrease;
  double      Growth;
  int         Employees;
  std::string Country;
};

The print_companies() function in the following code snippet uses several additional manipulators to the ones shown in the previous example:

  • std::boolalpha displays Boolean values as true and false instead of 1 and 0.
  • std::fixed indicates a fixed floating-point representation, and then std::defaultfloat reverts to the default floating-point representation.
  • std::setprecision specifies the number of decimal digits to be displayed in the output. Together with std::fixed, this is used to indicate a fixed representation with one decimal digit for the Growth field.

The implementation of the print_companies() function is listed here:

void print_companies(std::vector<Company> const & companies)
{
  for(auto const & company : companies)
  {
    std::cout << std::left << std::setw(26) << std::setfill(' ')
              << company.Name;
    std::cout << std::left << std::setw(18) << std::setfill(' ')
              << company.Industry;
    std::cout << std::left << std::setw(5) << std::setfill(' ')
              << company.Revenue;
    std::cout << std::left << std::setw(5) << std::setfill(' ')
              << std::boolalpha << company.RevenueIncrease
              << std::noboolalpha;
    std::cout << std::right << std::setw(5) << std::setfill(' ')
              << std::fixed << std::setprecision(1) << company.Growth
              << std::defaultfloat << std::setprecision(6) << ' ';
    std::cout << std::right << std::setw(8) << std::setfill(' ')
              << company.Employees << ' ';
    std::cout << std::left << std::setw(2) << std::setfill(' ')
              << company.Country
              << '\n';
  }
}

The following is an example of calling this function. The source of the data shown here is Wikipedia (https://en.wikipedia.org/wiki/List_of_largest_companies_by_revenue, as of 2016):

std::vector<Company> companies
{
  {"Walmart"s, "Retail"s, 482, false, 0.71,
    2300000, "US"s},
  {"State Grid"s, "Electric utility"s, 330, false, 2.91,
    927839, "China"s},
  {"Saudi Aramco"s, "Oil and gas"s, 311, true, 40.11,
    65266, "SA"s},
  {"China National Petroleum"s, "Oil and gas"s, 299,
    false, 30.21, 1589508, "China"s},
  {"Sinopec Group"s, "Oil and gas"s, 294, false, 34.11,
    810538, "China"s},
};
print_companies(companies);

In this case, the output has a table-based format, as follows:

Walmart                   Retail            482  false  0.7  2300000 US
State Grid                Electric utility  330  false  2.9   927839 China
Saudi Aramco              Oil and gas       311  true  40.1    65266 SA
China National Petroleum  Oil and gas       299  false 30.2  1589508 China
Sinopec Group             Oil and gas       294  false 34.1   810538 China

As an exercise, you can try adding a table heading or even a grid line to precede these lines for a better tabulation of the data.

See also

  • Reading and writing raw data from/to binary files to learn how to write and read unstructured data to binary files
  • Using monetary I/O manipulators to learn how to use standard manipulators to write and read monetary values
  • Using time I/O manipulators to learn how to use standard manipulators to write and read date and time values

Using monetary I/O manipulators

In the previous recipe, we looked at some of the manipulators that can be used to control input and output streams. The manipulators that we discussed were related to numeric values and text values. In this recipe, we will look at how to use standard manipulators to write and read monetary values.

Getting ready

You should now be familiar with locales and how to set them for a stream. This topic was discussed in the Using localized settings for streams recipe. It is recommended that you read that recipe before continuing.

The manipulators discussed in this recipe are available in the std namespace, in the <iomanip> header.

How to do it...

To write a monetary value to an output stream, you should do the following:

  • Set the desired locale for controlling the monetary format:
    std::cout.imbue(std::locale("en_GB.utf8"));
    
  • Use either a long double or a std::basic_string value for the amount:
    long double mon = 12345.67;
    std::string smon = "12345.67";
    
  • Use a std::put_money manipulator with a single argument, the monetary value, to display the value using the currency symbol (if any is available):
    std::cout << std::showbase << std::put_money(mon)
              << '\n'; // £123.46
    std::cout << std::showbase << std::put_money(smon)
              << '\n'; // £123.45 (only the leading digits of the string are used)
    
  • Use std::put_money with two arguments, the monetary value and a Boolean flag set to true, to indicate the use of an international currency string:
    std::cout << std::showbase << std::put_money(mon, true)
              << '\n'; // GBP 123.46
    std::cout << std::showbase << std::put_money(smon, true)
              << '\n'; // GBP 123.45 (only the leading digits of the string are used)
    

To read a monetary value from an input stream, you should do the following:

  • Set the desired locale for controlling the monetary format:
    std::istringstream stext("$123.45 123.45 USD");
    stext.imbue(std::locale("en_US.utf8"));
    
  • Use either a long double or std::basic_string value to read the amount from the input stream:
    long double v1;
    std::string v2;
    
  • Use std::get_money() with a single argument, the variable where the monetary value is to be written, if a currency symbol might be used in the input stream:
    stext >> std::get_money(v1) >> std::get_money(v2);
    // v1 = 12345, v2 = "12345"
    
  • Use std::get_money() with two arguments, the variable where the monetary value is to be written and a Boolean flag set to true, to indicate the presence of an international currency string:
    stext >> std::get_money(v1, true) >> std::get_money(v2, true);
    // v1 = 0, v2 = "12345"
    

How it works...

The put_money() and get_money() manipulators are very similar. They are both function templates that take an argument representing either the monetary value to be written to the output stream or a variable to hold the monetary value read from an input stream, and a second, optional parameter, to indicate whether an international currency string is used. The default alternative is the currency symbol, if one is available. put_money() uses the std::money_put facet of the stream's locale to output a monetary value, and get_money() uses the std::money_get facet to parse a monetary value. Both manipulator function templates return an object of an unspecified type. These functions do not throw exceptions:

template <class MoneyT>
/*unspecified*/ put_money(const MoneyT& mon, bool intl = false);
template <class MoneyT>
/*unspecified*/ get_money(MoneyT& mon, bool intl = false);

Both of these manipulator functions require the monetary value to be either a long double or a std::basic_string.

However, it is important to note that monetary values are stored as integral numbers of the smallest denomination of the currency defined by the locale in use. Considering US dollars as that currency, $100.00 is stored as 10000.0, and 1 cent, that is, $0.01, is stored as 1.0.

When writing a monetary value to an output stream, it is important to use the std::showbase manipulator if you want to display the currency symbol or the international currency string. This is normally used to indicate the prefix of a numeric base (such as 0x for hexadecimal); however, for monetary values, it is used to indicate whether the currency symbol/string should be displayed or not. The following snippet provides an example:

// print 123.46
std::cout << std::put_money(12345.67) << '\n';
// print £123.46
std::cout << std::showbase << std::put_money(12345.67) << '\n';

In the preceding snippet, the first line will just print the numerical value representing a currency amount, 123.46, while the second line will print the same numerical value but preceded by the currency symbol.

See also

  • Using I/O manipulators to control the output of a stream to learn about the use of helper functions, called manipulators, that control input and output streams using the << and >> stream operators
  • Using time I/O manipulators to learn how to use standard manipulators to write and read date and time values

Using time I/O manipulators

Similar to the monetary I/O manipulators that we discussed in the previous recipe, the C++11 standard provides manipulators that control the writing and reading of time values to and from streams, where time values are represented in the form of an std::tm object that holds a calendar date and time. In this recipe, you will learn how to use these time manipulators.

Getting ready

Time values used by the time I/O manipulators are expressed in std::tm values. You should be familiar with this structure from the <ctime> header.

You should also be familiar with locales and how to set them for a stream. This topic was discussed in the Using localized settings for streams recipe. It is recommended that you read that recipe before continuing.

The manipulators discussed in this recipe are available in the std namespace, in the <iomanip> header.

How to do it...

To write a time value to an output stream, you should perform the following steps:

  1. Obtain a calendar date and time value corresponding to a given time. There are various ways in which to do this. The following shows two alternative ways to convert the current time to a local time that is expressed as a calendar date and time:
    // Option 1: start from std::chrono::system_clock
    auto now = std::chrono::system_clock::now();
    auto stime = std::chrono::system_clock::to_time_t(now);
    auto ltime = std::localtime(&stime);
    // Option 2 (an alternative to the lines above): start from std::time
    auto ttime = std::time(nullptr);
    auto ltime = std::localtime(&ttime);
    
  2. Use std::put_time() to supply a pointer to the std::tm object, representing the calendar date and time, and a pointer to a null-terminated character string, representing the format. The C++11 standard provides a long list of formats that can be used; this list can be consulted at http://en.cppreference.com/w/cpp/io/manip/put_time.
  3. To write a standard date and time string according to the settings of a specific locale, first set the locale for the stream by calling imbue() and then use the std::put_time() manipulator:
    std::cout.imbue(std::locale("en_GB.utf8"));
    std::cout << std::put_time(ltime, "%c") << '\n';
    // Sun 04 Dec 2016 05:26:47 JST
    

The following list shows some examples of supported time formats:

  • ISO 8601 date format "%F" or "%Y-%m-%d":
    std::cout << std::put_time(ltime, "%F") << '\n';
    // 2016-12-04
    
  • ISO 8601 time format "%T":
    std::cout << std::put_time(ltime, "%T") << '\n';
    // 05:26:47
    
  • ISO 8601 combined date and time with UTC offset format "%FT%T%z":
    std::cout << std::put_time(ltime, "%FT%T%z") << '\n';
    // 2016-12-04T05:26:47+0900
    
  • ISO 8601 week format "%Y-W%V":
    std::cout << std::put_time(ltime, "%Y-W%V") << '\n';
    // 2016-W48
    
  • ISO 8601 date with week number format "%Y-W%V-%u":
    std::cout << std::put_time(ltime, "%Y-W%V-%u") << '\n';
    // 2016-W48-7
    
  • ISO 8601 ordinal date format "%Y-%j":
    std::cout << std::put_time(ltime, "%Y-%j") << '\n';
    // 2016-339
    

To read a time value from an input stream, you should perform the following steps:

  1. Declare an object of the std::tm type to hold the time value read from the stream:
    auto time = std::tm {};
    
  2. Use std::get_time() to supply a pointer to the std::tm object, which will hold the time value, and a pointer to a null-terminated character string, which represents the format. The list of possible formats can be consulted at http://en.cppreference.com/w/cpp/io/manip/get_time. The following example parses an ISO 8601 combined date and time value:
    std::istringstream stext("2016-12-04T05:26:47+0900");
    stext >> std::get_time(&time, "%Y-%m-%dT%H:%M:%S");
    if (!stext.fail()) { /* do something */ }
    
  3. To read a standard date and time string according to the settings of a specific locale, first set the locale for the stream by calling imbue() and then use the std::get_time() manipulator:
    std::istringstream stext("Sun 04 Dec 2016 05:35:30 JST");
    stext.imbue(std::locale("en_GB.utf8"));
    stext >> std::get_time(&time, "%c");
    if (stext.fail()) { /* do something else */ }
    

How it works...

The two manipulators for time values, put_time() and get_time(), are very similar: they are both function templates with two arguments. The first argument is a pointer to an std::tm object representing the calendar date and time that holds the value to be written to the stream or the value that is read from the stream. The second argument is a pointer to a null-terminated character string representing the format of the time text. put_time() uses the std::time_put facet of the stream's locale to output a date and time value, and get_time() uses the std::time_get facet to parse a date and time value. Both manipulator function templates return an object of an unspecified type. These functions do not throw exceptions:

template<class CharT>
/*unspecified*/ put_time(const std::tm* tmb, const CharT* fmt);
template<class CharT>
/*unspecified*/ get_time(std::tm* tmb, const CharT* fmt);

The string that results from using put_time() to write a date and time value to an output stream is the same as the one that results from a call to std::strftime() or std::wcsftime().

The standard defines a long list of available conversion specifiers that compose the format string. These specifiers are prefixed with a %, and, in some cases, the % is followed by an E or an O modifier. Some of them are also equivalent; for instance, %F is equivalent to %Y-%m-%d (this is the ISO 8601 date format), and %T is equivalent to %H:%M:%S (this is the ISO 8601 time format). The examples in this recipe mention only a few of the conversion specifiers, referring to ISO 8601 date and time formats. For the complete list of conversion specifiers, refer to the C++ standard or follow the links that were mentioned earlier.

It is important to note that not all of the conversion specifiers supported by put_time() are also supported by get_time(). Examples include the z (offset from UTC in the ISO 8601 format) and Z (time zone name or abbreviation) specifiers, which can only be used with put_time(). This is demonstrated in the following snippet:

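auto time = std::tm {};
std::istringstream stext1("2016-12-04T05:26:47+0900");
stext1 >> std::get_time(&time, "%Y-%m-%dT%H:%M:%S%z"); // fails
// a failed extraction leaves the stream in a failed state,
// so the second attempt uses a fresh stream
std::istringstream stext2("2016-12-04T05:26:47+0900");
stext2 >> std::get_time(&time, "%Y-%m-%dT%H:%M:%S");   // OK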

The text represented by some conversion specifiers is locale-dependent. All specifiers that use the E or O modifier are locale-dependent. To set a particular locale for the stream, use the imbue() method, as demonstrated in the examples in the How to do it... section.

See also

  • Using I/O manipulators to control the output of a stream to learn about the use of helper functions, called manipulators, that control input and output streams using the << and >> stream operators
  • Using monetary I/O manipulators to learn how to use standard manipulators to write and read monetary values

Working with filesystem paths

An important addition to the C++17 standard is the filesystem library that enables us to work with paths, files, and directories in hierarchical filesystems (such as Windows or POSIX filesystems). This standard library has been developed based on the boost.filesystem library. In the next few recipes, we will explore those features of the library that enable us to perform operations with files and directories, such as creating, moving, or deleting them, but also querying properties and searching. It is important, however, to first look at how this library handles paths.

Getting ready

For this recipe, we will consider most of the examples using Windows paths. In the accompanying code, all examples have both Windows and POSIX alternatives.

The filesystem library is available in the std::filesystem namespace, in the <filesystem> header. To simplify the code, we will use the following namespace alias in all of the examples:

namespace fs = std::filesystem;

A path to a filesystem component (file, directory, hard link, or soft link) is represented by the path class.

How to do it...

The following is a list of the most common operations on paths:

  • Create a path using the constructor, the assignment operator, or the assign() method:
    // Windows
    auto path = fs::path{"C:\\Users\\Marius\\Documents"};
    // POSIX
    auto path = fs::path{ "/home/marius/docs" };
    
  • Append elements to a path by including a directory separator using the member operator /=, the non-member operator /, or the append() method:
    path /= "Book";
    path = path / "Modern" / "Cpp";
    path.append("Programming");
    // Windows: C:\Users\Marius\Documents\Book\Modern\Cpp\Programming
    // POSIX:   /home/marius/docs/Book/Modern/Cpp/Programming
    
  • Concatenate elements to a path without including a directory separator by using the member operator += or the concat() method:
    auto path = fs::path{ "C:\\Users\\Marius\\Documents" };
    path += "\\Book";
    path.concat("\\Modern");
    // path = C:\Users\Marius\Documents\Book\Modern
    
  • Decompose the elements of a path into its parts, such as the root name, root directory, parent path, filename, extension, and so on, using member functions such as root_name(), root_directory(), filename(), stem(), extension(), and so on (all of them are shown in the following example):
    auto path =
      fs::path{"C:\\Users\\Marius\\Documents\\sample.file.txt"};
    std::cout
      << "root: "        << path.root_name() << '\n'
      << "root dir: "    << path.root_directory() << '\n'
      << "root path: "   << path.root_path() << '\n'
      << "rel path: "    << path.relative_path() << '\n'
      << "parent path: " << path.parent_path() << '\n'
      << "filename: "    << path.filename() << '\n'
      << "stem: "        << path.stem() << '\n'
      << "extension: "   << path.extension() << '\n';
    
  • Query whether parts of a path are available using member functions such as has_root_name(), has_root_directory(), has_filename(), has_stem(), and has_extension() (all of them are shown in the following example):
    auto path =
      fs::path{"C:\\Users\\Marius\\Documents\\sample.file.txt"};
    std::cout
      << "has root: "        << path.has_root_name() << '\n'
      << "has root dir: "    << path.has_root_directory() << '\n'
      << "has root path: "   << path.has_root_path() << '\n'
      << "has rel path: "    << path.has_relative_path() << '\n'
      << "has parent path: " << path.has_parent_path() << '\n'
      << "has filename: "    << path.has_filename() << '\n'
      << "has stem: "        << path.has_stem() << '\n'
      << "has extension: "   << path.has_extension() << '\n';
    
  • Check whether a path is relative or absolute:
    auto path1 = fs::path{ "C:\\Users\\Marius\\Documents" };
    auto path2 = fs::path{ "marius\\temp" };
    std::cout
      << "absolute: " << path1.is_absolute() << '\n'
      << "absolute: " << path2.is_absolute() << '\n';
    
  • Modify individual parts of the path, such as the filename with replace_filename() and remove_filename(), and the extension with replace_extension():
    auto path =
      fs::path{"C:\\Users\\Marius\\Documents\\sample.file.txt"};
    path.replace_filename("output");
    path.replace_extension(".log");
    // path = C:\Users\Marius\Documents\output.log
    path.remove_filename();
    // path = C:\Users\Marius\Documents\ (the trailing separator is kept)
    
  • Convert the directory separator to the system-preferred separator:
    // Windows
    auto path = fs::path{"Users/Marius/Documents"};
    path.make_preferred();
    // path = Users\Marius\Documents
    // POSIX
    auto path = fs::path{ "\\home\\marius\\docs" };
    path.make_preferred();
    // path = \home\marius\docs (unchanged, because \ is not a
    // directory separator on POSIX systems)
    

How it works...

The std::filesystem::path class models paths to filesystem components. However, it only handles the syntax and does not validate the existence of a component (such as a file or a directory) represented by the path.
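The following is a minimal sketch of this distinction. The helper names filename_of and refers_to_existing_object, as well as the sample path, are hypothetical and used only for illustration. Decomposing a path works even when nothing exists at that location, whereas checking for existence is a separate query against the filesystem:

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: extracts the filename component of a path.
// This is a purely lexical operation; the filesystem is never consulted.
std::string filename_of(std::string const & text)
{
  return fs::path{ text }.filename().string();
}

// Hypothetical helper: asks the filesystem whether the path refers to
// an existing object. Errors are reported through err and treated as
// "does not exist" here.
bool refers_to_existing_object(std::string const & text)
{
  auto err = std::error_code{};
  return fs::exists(fs::path{ text }, err);
}
```

For a path such as no_such_dir/missing.txt, the first function still yields missing.txt, while the second one reports that no such object exists.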

The library defines a portable, generic syntax for paths that can accommodate various filesystems, such as POSIX or Windows, including the Microsoft Windows Universal Naming Convention (UNC) format. The two differ in several key aspects:

  • POSIX systems have a single tree, no root name, a single root directory called /, and a single current directory. Additionally, they use / as the directory separator. Paths are represented as null-terminated strings of char encoded as UTF-8.
  • Windows systems have multiple trees, each with a root name (such as C:), a root directory (such as \), and a current directory (such as C:\Windows\System32). Paths are represented as null-terminated strings of wide characters encoded as UTF-16.

A pathname, as defined in the filesystem library, has the following syntax:

  • An optional root name (C: or //localhost)
  • An optional root directory
  • Zero or more filenames (which may refer to a file, a directory, a hard link, or a symbolic link) or directory separators

There are two special filenames that are recognized: the single dot (.), which represents the current directory, and the double dot (..), which represents the parent directory. The directory separator can be repeated, in which case it is treated as a single separator (in other words, /home////docs is the same as /home/docs). A path that has no redundant current directory name (.), no redundant parent directory name (..), and no redundant directory separators is said to be in a normal form.
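As a small sketch of what the normal form looks like in practice, the path class provides the lexically_normal() member function, which removes redundant separators, dots, and dot-dots without touching the filesystem. The helper name normal_form is hypothetical; generic_string() is used so that the result is spelled with / on every platform:

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: returns the normal form of a path, in the generic
// format (with / as the directory separator). The conversion is purely
// lexical, so symbolic links are not resolved.
std::string normal_form(fs::path const & p)
{
  return p.lexically_normal().generic_string();
}
```

With this helper, /home////docs becomes /home/docs, and /home/marius/./docs/../temp becomes /home/marius/temp.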

The path operations presented in the previous section are the most common operations with paths. However, the path class defines additional querying and modifying methods, iterators, non-member comparison operators, and more. The following sample iterates through the parts of a path and prints them to the console:

auto path =
  fs::path{ "C:\\Users\\Marius\\Documents\\sample.file.txt" };
for (auto const & part : path)
{
  std::cout << part << '\n';
}

The following listing represents its result:

C:
Users
Marius
Documents
sample.file.txt

In this example, sample.file.txt is the filename. This is the part from the last directory separator to the end of the path, and it is what the filename() member function returns for the given path. The extension for this file is .txt, which is the string returned by the extension() member function. To retrieve the filename without an extension, another member function called stem() is available. Here, the string returned by this method is sample.file. For all of these methods, and for all of the other decomposition methods, there is a corresponding querying method with the same name and the prefix has_, such as has_filename(), has_stem(), and has_extension(). All of these methods return a bool value to indicate whether the path has the corresponding part.
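The sketch below, which is only an illustration, shows how stem(), extension(), and has_extension() behave for a few filenames. The name_parts structure and the split_filename function are hypothetical names introduced for this example. Note that a filename that starts with a dot and contains no other dot (such as .profile) is treated as having no extension:

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical aggregate holding the parts of a filename
struct name_parts
{
  std::string stem;
  std::string extension;
  bool        has_extension;
};

// Hypothetical helper: splits the filename of a path into stem and
// extension using the decomposition and query member functions
name_parts split_filename(fs::path const & p)
{
  return { p.stem().string(),
           p.extension().string(),
           p.has_extension() };
}
```

For docs/sample.file.txt the stem is sample.file and the extension is .txt, while for docs/archive and docs/.profile the extension is empty and has_extension() returns false.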

See also

  • Creating, copying, and deleting files and directories to learn how to perform these basic operations with files and directories independently of the filesystem in use
  • Checking the properties of an existing file or directory to learn how to query the properties of files and directories, such as the type, permissions, file times, and more

Creating, copying, and deleting files and directories

Operations with files, such as copying, moving, and deleting, or with directories, such as creating, renaming, and deleting, are all supported by the filesystem library. Files and directories are identified using a path (which can be absolute, canonical, or relative), a topic that was covered in the previous recipes. In this recipe, we will look at what the standard functions for the previously mentioned operations are and how they work.

Getting ready

Before going forward, you should read the Working with filesystem paths recipe. The introductory notes from that recipe also apply here. However, all of the examples in this recipe are platform-independent.

For all of the following examples, we will use the following variables, and assume the current path is C:\Users\Marius\Documents on Windows and /home/marius/docs for a POSIX system:

auto err = std::error_code{};
auto basepath = fs::current_path();
auto path = basepath / "temp";
auto filepath = path / "sample.txt";

We will also assume the presence of a file called sample.txt in the temp subdirectory of the current path (such as C:\Users\Marius\Documents\temp\sample.txt or /home/marius/docs/temp/sample.txt).

How to do it...

Use the following library functions to perform operations with directories:

  • To create a new directory, use create_directory(). This method does nothing if the directory already exists; however, it does not create directories recursively:
    auto success = fs::create_directory(path, err);
    
  • To create new directories recursively, use create_directories():
    auto temp = path / "tmp1" / "tmp2" / "tmp3";
    auto success = fs::create_directories(temp, err);
    
  • To move an existing directory, use rename():
    auto temp = path / "tmp1" / "tmp2" / "tmp3";
    auto newtemp = path / "tmp1" / "tmp3";
    fs::rename(temp, newtemp, err);
    if (err) std::cout << err.message() << '\n';
    
  • To rename an existing directory, also use rename():
    auto temp = path / "tmp1" / "tmp3";
    auto newtemp = path / "tmp1" / "tmp4";
    fs::rename(temp, newtemp, err);
    if (err) std::cout << err.message() << '\n';
    
  • To copy an existing directory, use copy(). To recursively copy the entire content of a directory, use the copy_options::recursive flag:
    fs::copy(path, basepath / "temp2",
             fs::copy_options::recursive, err);
    if (err) std::cout << err.message() << '\n';
    
  • To create a symbolic link to a directory, use create_directory_symlink():
    auto linkdir = basepath / "templink";
    fs::create_directory_symlink(path, linkdir, err);
    if (err) std::cout << err.message() << '\n';
    
  • To remove an empty directory, use remove():
    auto temp = path / "tmp1" / "tmp4";
    auto success = fs::remove(temp, err);
    
  • To remove the entire content of a directory recursively, and the directory itself, use remove_all():
    auto success = fs::remove_all(path, err) !=
                   static_cast<std::uintmax_t>(-1);
    

Use the following library functions to perform operations with files:

  • To copy a file, use copy() or copy_file(). The next section explains the difference between the two:
    auto success = fs::copy_file(filepath, path / "sample.bak", err);
    if (!success) std::cout << err.message() << '\n';
    fs::copy(filepath, path / "sample.cpy", err);
    if (err) std::cout << err.message() << '\n';
    
  • To rename a file, use rename():
    auto newpath = path / "sample.log";
    fs::rename(filepath, newpath, err);
    if (err) std::cout << err.message() << '\n';
    
  • To move a file, use rename():
    auto newpath = path / "sample.log";
    fs::rename(newpath, path / "tmp1" / "sample.log", err);
    if (err) std::cout << err.message() << '\n';
    
  • To create a symbolic link to a file, use create_symlink():
    auto linkpath = path / "sample.txt.link";
    fs::create_symlink(filepath, linkpath, err);
    if (err) std::cout << err.message() << '\n';
    
  • To delete a file, use remove():
    auto success = fs::remove(path / "sample.cpy", err);
    if (!success) std::cout << err.message() << '\n';
    

How it works...

All of the functions mentioned in this recipe, and other similar functions that are not discussed here, have multiple overloads that can be grouped into two categories:

  • Overloads that take, as the last argument, a reference to an std::error_code: these overloads do not throw an exception (they are defined with the noexcept specification). Instead, they set the value of the error_code object to the operating system error code if an operating system error has occurred. If no such error has occurred, then the clear() method on the error_code object is called to reset any possible previously set code.
  • Overloads that do not take the last argument of the std::error_code type: these overloads throw exceptions if errors occur. If an operating system error occurs, they throw an std::filesystem::filesystem_error exception. On the other hand, if memory allocation fails, these functions throw an std::bad_alloc exception.

All the examples in the previous section used the overload that does not throw exceptions but, instead, sets a code when an error occurs. Some functions return a bool to indicate a success or a failure. You can check whether the error_code object holds the code of an error by either checking whether the value of the error code, returned by the method value(), is different from zero, or by using the conversion operator bool, which returns true for the same case and false otherwise. To retrieve the explanatory string for the error code, use the message() method.
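To make the difference between the two categories of overloads concrete, the following sketch performs the same operation in both styles. The function names create_dir_with_code and create_dir_throwing are hypothetical; each returns an empty string on success or a description of the error otherwise:

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Non-throwing style: the error is reported through the error_code
// passed as the last argument
std::string create_dir_with_code(fs::path const & dir)
{
  auto err = std::error_code{};
  fs::create_directory(dir, err);
  // the conversion to bool yields true when an error code is set
  return err ? err.message() : std::string{};
}

// Throwing style: operating system errors are reported with a
// filesystem_error exception
std::string create_dir_throwing(fs::path const & dir)
{
  try
  {
    fs::create_directory(dir);
    return {};
  }
  catch (fs::filesystem_error const & e)
  {
    return e.what();
  }
}
```

Since create_directory() does not create directories recursively, calling either function with a path whose parent directory does not exist is a simple way to see the error reporting in action.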

Some filesystem library functions are common for both files and directories. This is the case for rename(), remove(), and copy(). The working details of each of these functions can be complex, especially in the case of copy(), and are beyond the scope of this recipe. You should refer to the reference documentation if you need to perform anything other than the simple operations covered here.

When it comes to copying files, there are two functions that can be used: copy() and copy_file(). These have equivalent overloads with identical signatures and seemingly work the same way. However, there is an important difference (other than the fact that copy() also works for directories): copy_file() follows symbolic links. To avoid doing that and, instead, copy the actual symbolic link, you must use either copy_symlink() or copy() with the copy_options::copy_symlinks flag. Both the copy() and copy_file() functions have an overload that takes an argument of the std::filesystem::copy_options type, which defines how the operation should be performed. copy_options is a scoped enum with the following enumerators (the standard leaves the underlying values unspecified, so the numeric values shown here are those of one particular implementation):

enum class copy_options
{
  none = 0,
  skip_existing = 1,
  overwrite_existing = 2,
  update_existing = 4,
  recursive = 8,
  copy_symlinks = 16,
  skip_symlinks = 32,
  directories_only = 64,
  create_symlinks = 128,
  create_hard_links = 256
};

The following table defines how each of these flags affects a copy operation, either with copy() or copy_file(). The table is taken from the 27.10.10.4 paragraph of the C++17 standard:

Option group controlling copy_file function effects for existing target files
  none                 (Default) Error; file already exists
  skip_existing        Do not overwrite existing file; do not report an error
  overwrite_existing   Overwrite the existing file
  update_existing      Overwrite the existing file if it is older than the replacement file

Option group controlling copy function effects for subdirectories
  none                 (Default) Do not copy subdirectories
  recursive            Recursively copy subdirectories and their contents

Option group controlling copy function effects for symbolic links
  none                 (Default) Follow symbolic links
  copy_symlinks        Copy symbolic links as symbolic links rather than copying the files that they point to
  skip_symlinks        Ignore symbolic links

Option group controlling copy function effects for choosing the form of copying
  none                 (Default) Copy contents
  directories_only     Copy the directory structure only, do not copy non-directory files
  create_symlinks      Make symbolic links instead of copies of files; the source path will be an absolute path unless the destination path is in the current directory
  create_hard_links    Make hard links instead of copies of files
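The first option group can be explored with a short sketch such as the following one. The helpers write_line, read_line, and try_copy are hypothetical names introduced only for this illustration; try_copy reports, as text, whether copy_file() copied the file, skipped it, or failed:

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: replaces the content of a file with a single line
void write_line(fs::path const & file, std::string const & text)
{
  std::ofstream out(file, std::ios::trunc);
  out << text << '\n';
}

// Hypothetical helper: reads the first line of a file
std::string read_line(fs::path const & file)
{
  std::ifstream in(file);
  auto line = std::string{};
  std::getline(in, line);
  return line;
}

// Hypothetical helper: copies one file over another with the given
// options and describes the outcome
std::string try_copy(fs::path const & from,
                     fs::path const & to,
                     fs::copy_options const options)
{
  auto err = std::error_code{};
  // copy_file() returns true only if the file was actually copied
  auto const copied = fs::copy_file(from, to, options, err);
  if (err) return "error";
  return copied ? "copied" : "skipped";
}
```

When the target file already exists, calling try_copy() with copy_options::none is expected to report an error, with skip_existing to leave the target untouched, and with overwrite_existing to replace its content.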

Another aspect that should be mentioned is related to symbolic links: create_directory_symlink() creates a symbolic link to a directory, whereas create_symlink() creates symbolic links to either files or directories. On POSIX systems, the two are identical when it comes to directories. On other systems (such as Windows), symbolic links to directories are created differently than symbolic links to files. Therefore, it is recommended that you use create_directory_symlink() for directories in order to write code that works correctly on all systems.

When you perform operations with files and directories, such as the ones described in this recipe, and you use the overloads that may throw exceptions, ensure that you wrap the calls in a try-catch block. Regardless of the type of overload used, you should check the success of the operation and take appropriate action in the case of a failure.

See also

  • Working with filesystem paths to learn about the C++17 standard support for filesystem paths
  • Removing content from a file to explore the possible ways of removing parts of the content of a file
  • Checking the properties of an existing file or directory to learn how to query the properties of files and directories, such as the type, permissions, file times, and more

Removing content from a file

Operations such as copying, renaming, moving, or deleting files are directly provided by the filesystem library. However, when it comes to removing content from a file, you must perform explicit actions.

Regardless of whether you need to do this for text or binary files, you must implement the following pattern:

  1. Create a temporary file.
  2. Copy only the content that you want from the original file to the temporary file.
  3. Delete the original file.
  4. Rename/move the temporary file to the name/location of the original file.

In this recipe, we will learn how to implement this pattern for a text file.

Getting ready

For the purpose of this recipe, we will consider removing empty lines, or lines that start with a semicolon (;), from a text file. For this example, we will have an initial file, called sample.dat, that contains the names of Shakespeare's plays but also empty lines and lines that start with a semicolon. The following is a partial listing of this file (from the beginning):

;Shakespeare's plays, listed by genre
;TRAGEDIES
Troilus and Cressida
Coriolanus
Titus Andronicus
Romeo and Juliet
Timon of Athens
Julius Caesar

The code samples listed in the next section use the following variables:

auto path = fs::current_path();
auto filepath = path / "sample.dat";
auto temppath = path / "sample.tmp";
auto err = std::error_code{};

We will learn how to put this pattern into code in the following section.

How to do it...

Perform the following operations to remove content from a file:

  1. Open the file for reading:
    std::ifstream in(filepath);
    if (!in.is_open())
    {
      std::cout << "File could not be opened!" << '\n';
      return;
    }
    
  2. Open another temporary file for writing; if the file already exists, truncate its content:
    std::ofstream out(temppath, std::ios::trunc);
    if (!out.is_open())
    {
      std::cout << "Temporary file could not be created!"
                << '\n';
      return;
    }
    
  3. Read, line by line, from the input file and copy the selected content to the output file:
    auto line = std::string{};
    while (std::getline(in, line))
    {
      if (!line.empty() && line.at(0) != ';')
      {
        out << line << '\n';
      }
    }
    
  4. Close both the input and output files:
    in.close();
    out.close();
    
  5. Delete the original file:
    auto success = fs::remove(filepath, err);
    if(!success || err)
    {
      std::cout << err.message() << '\n';
      return;
    }
    
  6. Rename/move the temporary file to the name/location of the original file:
    fs::rename(temppath, filepath, err);
    if (err)
    {
      std::cout << err.message() << '\n';
    }
    

How it works...

The pattern described here is the same for binary files too; however, for our convenience, we are only discussing an example with text files. The temporary file in this example is in the same directory as the original file. Alternatively, this can be located in a separate directory, such as a user temporary directory. To get a path to a temporary directory, you can use std::filesystem::temp_directory_path(). On Windows systems, this function returns the same directory as GetTempPath(). On POSIX systems, it returns the path specified in one of the environment variables TMPDIR, TMP, TEMP, or TEMPDIR; or, if none of them are available, it returns the path /tmp.
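As a brief sketch of the alternative location, the following hypothetical helper, temp_path_for, builds a path for the temporary file inside the system temporary directory, reusing the name of the original file with a .tmp suffix (the suffix is an arbitrary choice made for this example):

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: returns a path in the system temporary directory
// that is named after the original file, with ".tmp" appended
fs::path temp_path_for(fs::path const & original)
{
  auto name = original.filename();
  name += ".tmp"; // operator+= appends to the path without a separator
  return fs::temp_directory_path() / name;
}
```

One caveat should be kept in mind with this approach: if the temporary directory is located on a different volume or filesystem than the original file, rename() may fail, in which case copying the temporary file over the original one and then removing the temporary file is a possible fallback.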

How content from the original file is copied to the temporary file varies from one case to another, depending on what needs to be copied. In the preceding example, we have copied entire lines, unless they are empty or start with a semicolon. For this purpose, we read the content of the original file, line by line, using std::getline() until there are no more lines to read. After all the necessary content has been copied, the files should be closed, so they can be moved or deleted.

To complete the operation, there are three options:

  • Delete the original file and rename the temporary file to the same name as the original one, if they are in the same directory, or move the temporary file to the original file location, if they are in different directories. This is the approach taken in this recipe. For this, we used the remove() function to delete the original file and rename() to rename the temporary file to the original filename.
  • Copy the content of the temporary file to the original file (for this, you can use either the copy() or copy_file() functions) and then delete the temporary file (use remove() for this).
  • Rename the original file (for instance, changing the extension or the name) and then use the original filename to rename/move the temporary file.

If you take the first approach mentioned here, then you must make sure that the temporary file that is later replacing the original file has the same file permissions as the original file; otherwise, depending on the context of your solution, it can lead to problems.
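Putting the steps of this recipe together, a minimal sketch of the first approach could look like the following. The function name remove_comment_lines is hypothetical, and error handling is reduced to returning false. Before replacing the original file, the sketch copies its permissions to the temporary file using status() and permissions():

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: removes empty lines and lines that start with a
// semicolon from a text file, using a temporary file placed next to it
bool remove_comment_lines(fs::path const & filepath)
{
  auto temppath = filepath;
  temppath += ".tmp";

  {
    std::ifstream in(filepath);
    std::ofstream out(temppath, std::ios::trunc);
    if (!in.is_open() || !out.is_open()) return false;

    auto line = std::string{};
    while (std::getline(in, line))
    {
      if (!line.empty() && line.front() != ';')
        out << line << '\n';
    }
  } // both streams are closed here, before the files are touched

  auto err = std::error_code{};

  // give the temporary file the permissions of the original file
  auto const original = fs::status(filepath, err);
  if (err) return false;
  fs::permissions(temppath, original.permissions(), err);
  if (err) return false;

  // delete the original and move the temporary file in its place
  if (!fs::remove(filepath, err) || err) return false;
  fs::rename(temppath, filepath, err);
  return !err;
}
```

The streams are declared in an inner scope so that they are closed automatically before the files are removed or renamed, which has the same effect as the explicit calls to close() shown earlier.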

See also

  • Creating, copying, and deleting files and directories to learn how to perform these basic operations with files and directories independently of the filesystem in use

Checking the properties of an existing file or directory

The filesystem library provides functions and types that enable developers to check for the existence of a filesystem object, such as a file or directory, its properties, such as the type (the file, directory, symbolic link, and more), the last write time, permissions, and more. In this recipe, we will look at what these types and functions are and how they can be used.

Getting ready

For the following code samples, we will use the namespace alias fs for the std::filesystem namespace. The filesystem library is available in the header with the same name, <filesystem>. Also, we will use the variables shown here, path for the path of a file and err for receiving potential operating system error codes from the filesystem APIs:

auto path = fs::current_path() / "main.cpp";
auto err = std::error_code{};

The function to_time_t, shown here, is also used in this recipe:

  template <typename TP>
  std::time_t to_time_t(TP tp)
  {
     using namespace std::chrono;
     auto sctp = time_point_cast<system_clock::duration>(
       tp - TP::clock::now() + system_clock::now());
     return system_clock::to_time_t(sctp);
  }

Before continuing with this recipe, you should read the Working with filesystem paths recipe.

How to do it...

Use the following library functions to retrieve information about filesystem objects:

  • To check whether a path refers to an existing filesystem object, use exists():
    auto exists = fs::exists(path, err);
    std::cout << "file exists: " << std::boolalpha
              << exists << '\n';
    
  • To check whether two different paths refer to the same filesystem object, use equivalent():
    auto same = fs::equivalent(path,
                   fs::current_path() / "." / "main.cpp");
    std::cout << "equivalent: " << same << '\n';
    
  • To retrieve the size of a file in bytes, use file_size():
    auto size = fs::file_size(path, err);
    std::cout << "file size: " << size << '\n';
    
  • To retrieve the count of hard links to a filesystem object, use hard_link_count():
    auto links = fs::hard_link_count(path, err);
    if(links != static_cast<std::uintmax_t>(-1))
      std::cout << "hard links: " << links << '\n';
    else
      std::cout << "hard links: error" << '\n';
    
  • To retrieve or set the last modification time for a filesystem object, use last_write_time():
    auto lwt = fs::last_write_time(path, err);
    auto time = to_time_t(lwt);
    auto localtime = std::localtime(&time);
    std::cout << "last write time: "
              << std::put_time(localtime, "%c") << '\n';
    
  • To retrieve the file attributes, such as the type and permissions (as if returned by the POSIX stat function), use the status() function. This function follows symbolic links. To retrieve the file attributes of a symbolic link without following it, use symlink_status():
    auto print_perm = [](fs::perms p)
    {
      std::cout
        << ((p & fs::perms::owner_read) != fs::perms::none ?
           "r" : "-")
        << ((p & fs::perms::owner_write) != fs::perms::none ?
           "w" : "-")
        << ((p & fs::perms::owner_exec) != fs::perms::none ?
           "x" : "-")
        << ((p & fs::perms::group_read) != fs::perms::none ?
           "r" : "-")
        << ((p & fs::perms::group_write) != fs::perms::none ?
           "w" : "-")
        << ((p & fs::perms::group_exec) != fs::perms::none ?
           "x" : "-")
        << ((p & fs::perms::others_read) != fs::perms::none ?
           "r" : "-")
        << ((p & fs::perms::others_write) != fs::perms::none ?
           "w" : "-")
        << ((p & fs::perms::others_exec) != fs::perms::none ?
           "x" : "-")
        << '\n';
    };
    auto status = fs::status(path, err);
    std::cout << "type: " << static_cast<int>(status.type()) << '\n';
    std::cout << "permissions: ";
    print_perm(status.permissions());
    
  • To check whether a path refers to a particular type of filesystem object, such as a file, directory, symbolic link, and so on, use the functions is_regular_file(), is_directory(), is_symlink(), and so on:
    std::cout << "regular file? " <<
              fs::is_regular_file(path, err) << '\n';
    std::cout << "directory? " <<
              fs::is_directory(path, err) << '\n';
    std::cout << "char file? " <<
              fs::is_character_file(path, err) << '\n';
    std::cout << "symlink? " <<
              fs::is_symlink(path, err) << '\n';
    

How it works...

These functions, used to retrieve information about the filesystem files and directories, are, in general, simple and straightforward. However, some considerations are necessary:

  • Checking whether a filesystem object exists can be done using exists(), either by passing the path or an std::filesystem::file_status object that was previously retrieved using the status() function.
  • The equivalent() function determines whether two filesystem objects have the same status, as retrieved by the function status(). If neither path exists, or if both exist but neither is a file, directory, or symbolic link, then the function returns an error. Hard links to the same file object are equivalent. A symbolic link and its target are also equivalent.
  • The file_size() function can only be used to determine the size of regular files and symbolic links that target a regular file. For any other types of file objects, such as directories, this function fails. This function returns the size of the file in bytes; if an error occurs, the overload that takes an std::error_code returns static_cast<std::uintmax_t>(-1). If you want to determine whether a file is empty, you can use the is_empty() function. This works for regular files and directories.
  • The last_write_time() function has two sets of overloads: one that is used to retrieve the last modification time of the filesystem object, and one that is used to set the last modification time. Time is indicated by a std::filesystem::file_time_type object, which is basically a type alias for std::chrono::time_point. The following example changes the last write time for a file to 30 minutes earlier than its previous value:
    using namespace std::chrono_literals;
    auto lwt = fs::last_write_time(path, err);
    fs::last_write_time(path, lwt - 30min);
    
  • The status() function determines the type and permissions of a filesystem object. However, if the file is a symbolic link, the information returned is about the target of the symbolic link. To retrieve information about the symbolic link itself, the symlink_status() function must be used. Permissions are defined as an enumeration, std::filesystem::perms. Not all the enumerators of this scoped enum represent individual permissions; some of them, such as mask and unknown, are special values. The permissions() function can be used to modify the permissions of a file or a directory. How the given permission bits are applied is controlled by a separate scoped enum, std::filesystem::perm_options: replace (the default) replaces the existing permissions, add adds the given bits, and remove clears them. (The add_perms and remove_perms enumerators of perms existed only in the earlier Filesystem Technical Specification and are not part of C++17.) The following example adds all permissions to the owner and user group of a file:
    fs::permissions(
      path,
      fs::perms::owner_all | fs::perms::group_all,
      fs::perm_options::add,
      err);
    
  • To determine the type of a filesystem object, such as a file, directory, or symbolic link, there are two options available: retrieve the file status and then check the type property, or use one of the available filesystem functions, such as is_regular_file(), is_symlink(), or is_directory(). The following examples that check whether a path refers to a regular file are equivalent:
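    // Option 1: retrieve the status and check its type
    auto s = fs::status(path, err);
    auto isfile1 = s.type() == std::filesystem::file_type::regular;
    // Option 2: use the dedicated query function
    auto isfile2 = fs::is_regular_file(path, err);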
    

All of the functions discussed in this recipe have an overload that throws exceptions if an error occurs, and an overload that does not throw but returns an error code via a function parameter. Most of the examples in this recipe used the latter approach. More information about these sets of overloads can be found in the Creating, copying, and deleting files and directories recipe.

See also

  • Working with filesystem paths to learn about the C++17-standard support for filesystem paths
  • Creating, copying, and deleting files and directories to learn how to perform these basic operations with files and directories independently of the filesystem in use
  • Enumerating the content of a directory to learn how to iterate through the files and subdirectories of a directory

Enumerating the content of a directory

So far in this chapter, we have looked at many of the functionalities provided by the filesystem library, such as working with paths, performing operations with files and directories (creating, moving, renaming, deleting, and so on), and querying or modifying properties. Another useful functionality when working with the filesystem is to iterate through the content of a directory. The filesystem library provides two directory iterators, one called directory_iterator, which iterates the content of a directory, and one called recursive_directory_iterator, which recursively iterates the content of a directory and its subdirectories. In this recipe, we will learn how to use them.

Getting ready

For this recipe, we will consider a directory with the following structure:

test/
├──data/
│ ├──input.dat
│ └──output.dat
├──file_1.txt
├──file_2.txt
└──file_3.log

In this recipe, we will work with filesystem paths and check the properties of a filesystem object. Therefore, it is recommended that you first read the Working with filesystem paths and Checking the properties of an existing file or directory recipes.

How to do it...

Use the following patterns to enumerate the content of a directory:

  • To iterate only the content of a directory without recursively visiting its subdirectories, use directory_iterator:
    void visit_directory(fs::path const & dir)
    {
      if (fs::exists(dir) && fs::is_directory(dir))
      {
        for (auto const & entry : fs::directory_iterator(dir))
        {
          auto filename = entry.path().filename();
          if (fs::is_directory(entry.status()))
            std::cout << "[+]" << filename << '\n';
          // status() follows symbolic links, so the link itself
          // must be checked with symlink_status()
          else if (fs::is_symlink(entry.symlink_status()))
            std::cout << "[>]" << filename << '\n';
          else if (fs::is_regular_file(entry.status()))
            std::cout << " " << filename << '\n';
          else
            std::cout << "[?]" << filename << '\n';
        }
      }
    }
    
  • To iterate all the content of a directory, including its subdirectories, use recursive_directory_iterator when the order of processing the entries does not matter:
    void visit_directory_rec(fs::path const & dir)
    {
      if (fs::exists(dir) && fs::is_directory(dir))
      {
        for (auto const & entry :
             fs::recursive_directory_iterator(dir))
        {
          auto filename = entry.path().filename();
          if (fs::is_directory(entry.status()))
            std::cout << "[+]" << filename << '\n';
          // status() follows symbolic links, so the link itself
          // must be checked with symlink_status()
          else if (fs::is_symlink(entry.symlink_status()))
            std::cout << "[>]" << filename << '\n';
          else if (fs::is_regular_file(entry.status()))
            std::cout << " " << filename << '\n';
          else
            std::cout << "[?]" << filename << '\n';
        }
      }
    }
    
  • To iterate all the content of a directory, including its subdirectories, in a structured manner, such as traversing a tree, use a function similar to the one in the first example, which uses directory_iterator to iterate the content of a directory. However, instead, call it recursively for each subdirectory:
    void visit_directory(
      fs::path const & dir,
      bool const recursive = false,
      unsigned int const level = 0)
    {
      if (fs::exists(dir) && fs::is_directory(dir))
      {
        auto lead = std::string(level*3, ' ');
        for (auto const & entry : fs::directory_iterator(dir))
        {
          auto filename = entry.path().filename();
          if (fs::is_directory(entry.status()))
          {
            std::cout << lead << "[+]" << filename << '\n';
            if(recursive)
              visit_directory(entry, recursive, level+1);
          }
          // status() follows symbolic links, so the link itself
          // must be checked with symlink_status()
          else if (fs::is_symlink(entry.symlink_status()))
            std::cout << lead << "[>]" << filename << '\n';
          else if (fs::is_regular_file(entry.status()))
            std::cout << lead << " " << filename << '\n';
          else
            std::cout << lead << "[?]" << filename << '\n';
        }
      }
    }
    

How it works...

Both directory_iterator and recursive_directory_iterator are input iterators that iterate over the entries of a directory. The difference is that the first one does not visit the subdirectories recursively, while the second one, as its name implies, does. They both share a similar behavior:

  • The order of iteration is unspecified.
  • Each directory entry is visited only once.
  • The special paths dot (.) and dot-dot (..) are skipped.
  • A default-constructed iterator is the end iterator and two end iterators are always equal.
  • When an iterator is incremented past the last directory entry, it becomes equal to the end iterator.
  • The standard does not specify what happens if a directory entry is added to or deleted from the iterated directory after the iterator has been created.
  • The standard defines the non-member functions begin() and end() for both directory_iterator and recursive_directory_iterator, which enables us to use these iterators in range-based for loops, as shown in the examples earlier.

Both iterators have overloaded constructors. Some overloads of the recursive_directory_iterator constructor take an argument of the std::filesystem::directory_options type, which specifies additional options for the iteration:

  • none: This is the default; symbolic links to directories are not followed, and a permission denied condition is reported as an error.
  • follow_directory_symlink: This specifies that the iteration should follow symbolic links to directories instead of serving the link itself.
  • skip_permission_denied: This specifies that the iteration should ignore and skip the directories that would otherwise trigger an access denied error.
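Because directory_options is a bitmask type, the options can be combined with the | operator. The following sketch shows one way to do that; list_all() is a hypothetical helper introduced only for this illustration, and it assumes that dir is an existing directory:

```cpp
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Collects the paths of all the entries below dir.
// list_all is a hypothetical name used only for this illustration.
std::vector<fs::path> list_all(fs::path const & dir)
{
  // follow links to directories and silently skip unreadable directories
  auto const options =
    fs::directory_options::follow_directory_symlink |
    fs::directory_options::skip_permission_denied;

  auto result = std::vector<fs::path>{};
  for (auto const & entry : fs::recursive_directory_iterator(dir, options))
  {
    result.push_back(entry.path());
  }
  return result;
}
```

Keep in mind that following directory links can make the iteration revisit content, or even fail to terminate, if the links form a cycle, so this combination is best used on directory trees whose structure you know.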

The elements that both directory iterators point to are of the directory_entry type. The path() member function returns the path of the filesystem object represented by this object. The status of the filesystem object can be retrieved with the member functions status() and symlink_status() for symbolic links.

The preceding examples follow a common pattern:

  • Verify that the path to iterate actually exists.
  • Use a range-based for loop to iterate all the entries of a directory.
  • Use one of the two directory iterators available in the filesystem library, depending on the way the iteration is supposed to be done.
  • Process each entry according to the requirements.

In our examples, we simply printed the names of the directory entries to the console. It is important to note, as we specified earlier, that the content of the directory is iterated in an unspecified order. If you want to process the content in a structured manner, such as showing subdirectories and their entries indented (for this particular case) or in a tree (in other types of applications), then using recursive_directory_iterator is not appropriate. Instead, you should use directory_iterator in a function that is called recursively from the iteration, for each subdirectory, as shown in the last example from the previous section.
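If a deterministic order is needed on top of the unspecified iteration order, one option is to collect the entries first and sort them before processing. The following sketch assumes that dir is an existing directory; sorted_entries() is a hypothetical helper name:

```cpp
#include <algorithm>
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Returns the direct entries of dir sorted by path.
// sorted_entries is a hypothetical name used only for this illustration.
std::vector<fs::path> sorted_entries(fs::path const & dir)
{
  auto entries = std::vector<fs::path>{};
  for (auto const & entry : fs::directory_iterator(dir))
  {
    entries.push_back(entry.path());
  }
  // fs::path provides operator<, which compares paths element by element
  std::sort(entries.begin(), entries.end());
  return entries;
}
```

The same approach could be used inside a recursive function to print each level in a stable order.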

Considering the directory structure presented at the beginning of this recipe (relative to the current path), we get the following output when using the recursive iterator:

visit_directory_rec(fs::current_path() / "test");
[+]data
   input.dat
   output.dat
   file_1.txt
   file_2.txt
   file_3.log

On the other hand, when using the recursive function from the third example, as shown in the following listing, the output is displayed ordered on sublevels, as intended:

visit_directory(fs::current_path() / "test", true);
[+]data
      input.dat
      output.dat
   file_1.txt
   file_2.txt
   file_3.log

Remember that the visit_directory_rec() function is a non-recursive function that uses recursive_directory_iterator, while the visit_directory() function is a recursive function that uses directory_iterator. This example should help you to understand the difference between the two iterators.

There's more...

In the previous recipe, Checking the properties of an existing file or directory, we discussed, among other things, the file_size() function that returns the size of a file in bytes. However, this function fails if the specified path is a directory. To determine the size of a directory, we need to iterate recursively through its content, retrieve the size of the regular files or symbolic links, and add them together. We must also handle failures for individual entries: the overload of file_size() that takes an std::error_code argument does not throw but returns -1 cast to an std::uintmax_t in the case of an error. This value, indicating a failure, should not be added to the total size of a directory.

Consider the following function to exemplify this case:

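std::uintmax_t dir_size(fs::path const & path)
{
  // -1 cast to std::uintmax_t signals that path is not an existing directory
  auto size = static_cast<std::uintmax_t>(-1);
  if (fs::exists(path) && fs::is_directory(path))
  {
    // start from zero so that the error marker is not part of the sum
    size = 0;
    for (auto const & entry : fs::recursive_directory_iterator(path))
    {
      if (fs::is_regular_file(entry.status()) ||
          fs::is_symlink(entry.symlink_status()))
      {
        auto err = std::error_code{};
        // the error_code overload returns -1 instead of throwing
        auto filesize = fs::file_size(entry, err);
        if (filesize != static_cast<std::uintmax_t>(-1))
          size += filesize;
      }
    }
  }
  return size;
}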

The preceding dir_size() function returns the total size of all the files in a directory (recursively), or -1, as a std::uintmax_t, if the path does not exist or is not a directory.

See also

  • Checking the properties of an existing file or directory to learn how to query the properties of files and directories, such as the type, permissions, file times, and more
  • Finding a file to learn how to search for files based on their name, extension, or other properties

Finding a file

In the previous recipe, we learned how we can use directory_iterator and recursive_directory_iterator to enumerate the content of a directory. Displaying the content of a directory, as we did in the previous recipe, is only one of the scenarios in which this is needed. The other major scenario is when searching for particular entries in a directory, such as files with a particular name, extension, and so on. In this recipe, we will demonstrate how we can use the directory iterators and the iterating patterns shown earlier to find files that match a given criterion.

Getting ready

You should read the previous recipe, Enumerating the content of a directory, for details about directory iterators. In this recipe, we will also use the same test directory structure that was presented in the previous recipe.

How to do it...

To find files that match particular criteria, use the following pattern:

  1. Use recursive_directory_iterator to iterate through all the entries of a directory and recursively through its subdirectories.
  2. Consider regular files (and any other types of files you may need to process).
  3. Use a function object (such as a lambda expression) to filter only the files that match your criteria.
  4. Add the selected entries to a range (such as a vector).

This pattern is exemplified in the find_files() function shown here:

std::vector<fs::path> find_files(
    fs::path const & dir,
    std::function<bool(fs::path const&)> filter)
{
  auto result = std::vector<fs::path>{};
  if (fs::exists(dir) && fs::is_directory(dir))
  {
    for (auto const & entry :
      fs::recursive_directory_iterator(
        dir,
        fs::directory_options::follow_directory_symlink))
    {
      if (fs::is_regular_file(entry) &&
          filter(entry))
      {
        result.push_back(entry);
      }
    }
  }
  return result;
}

How it works...

When we want to find files in a directory, the structure of the directory and the order in which its entries, including subdirectories, are visited are probably not important. Therefore, we can use recursive_directory_iterator to iterate through the entries.

The function find_files() takes two arguments: a path and a function wrapper that is used to select the entries that should be returned. The return type is a vector of filesystem::path, although it could also be a vector of filesystem::directory_entry. By default, a recursive directory iterator does not follow symbolic links to directories; it returns the link itself and does not descend into the target. In this example, we change that behavior by using a constructor overload that has an argument of the type filesystem::directory_options and passing follow_directory_symlink.

In the preceding example, we only consider the regular files and ignore the other types of filesystem objects. The predicate is applied to the directory entry, and, if it returns true, the entry is added to the result.

The following example uses the find_files() function to find all of the files in the test directory that start with the prefix file_:

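auto results = find_files(
          fs::current_path() / "test",
          [](fs::path const & p) {
  // look only at the filename, not at the whole path
  auto filename = p.filename().wstring();
  // rfind with position 0 matches only at the start of the string
  return filename.rfind(L"file_", 0) == 0;
});
for (auto const & path : results)
{
  std::cout << path << '\n';
}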

The output of executing this program, with paths relative to the current path, is as follows:

test\file_1.txt
test\file_2.txt
test\file_3.log

A second example shows how to find files that have a particular extension, in this case, the extension .dat:

auto results = find_files(
       fs::current_path() / "test",
       [](fs::path const & p) {
         return p.extension() == L".dat";});
for (auto const & path : results)
{
  std::cout << path << '\n';
}

The output, again relative to the current path, is shown here:

test\data\input.dat
test\data\output.dat

These two examples are very similar. The only thing that is different is the code in the lambda function, which checks the path received as an argument.
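Since only the predicate changes, filters can also be built by small factory functions and reused. The following sketch creates a predicate that accepts any of several extensions; has_any_extension() is a hypothetical helper that is not part of the recipe, and it compares extensions case-sensitively:

```cpp
#include <algorithm>
#include <filesystem>
#include <functional>
#include <string>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Builds a predicate that accepts paths with one of the given extensions.
// has_any_extension is a hypothetical name used only for this illustration.
std::function<bool(fs::path const &)> has_any_extension(
    std::vector<std::string> extensions)
{
  return [exts = std::move(extensions)](fs::path const & p) {
    // extension() includes the leading dot, for example ".dat"
    auto const ext = p.extension().string();
    return std::find(exts.begin(), exts.end(), ext) != exts.end();
  };
}
```

The returned object has the signature expected by the filter parameter of find_files(), so a call such as find_files(fs::current_path() / "test", has_any_extension({".dat", ".log"})) should select both kinds of files.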

See also

  • Checking the properties of an existing file or directory to learn how to query the properties of files and directories, such as the type, permissions, file times, and more
  • Enumerating the content of a directory to learn how to iterate through the files and subdirectories of a directory