Reading text files in memory

The Files class comes with two methods that can read an entire text file into memory. One of them is List<String> readAllLines(Path path, Charset cs):

List<String> lines = Files.readAllLines(
    chineseFile, StandardCharsets.UTF_16);

Moreover, starting with JDK 11, we can read the entire content into a String via Files.readString(Path path, Charset cs):

String content = Files.readString(
    chineseFile, StandardCharsets.UTF_16);
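
To try both methods without an existing file, the following minimal sketch writes a small UTF-16 file and reads it back. The class name ReadSmallFileDemo, the helper names, and the temporary file are assumptions made for illustration; the temporary file merely stands in for chineseFile:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ReadSmallFileDemo {

    // Hypothetical helper: writes the text to a temporary UTF-16 file
    static Path writeSample(String text) {
        try {
            Path file = Files.createTempFile("sample", ".txt");
            file.toFile().deleteOnExit();
            Files.writeString(file, text, StandardCharsets.UTF_16);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the whole file as a list of lines (line terminators are dropped)
    static List<String> readLines(Path file) {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_16);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the whole file as a single String (JDK 11+)
    static String readWhole(Path file) {
        try {
            return Files.readString(file, StandardCharsets.UTF_16);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source independent of the compiler's encoding
        Path file = writeSample("\u4f60\u597d\n\u4e16\u754c\n");

        System.out.println("Lines: " + readLines(file).size());
        System.out.println("Chars: " + readWhole(file).length());
    }
}
```

Both methods buffer the complete file content on the heap, which is why this approach only suits files that comfortably fit in memory.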

While these methods are very convenient for relatively small files, they are a poor choice for large ones. Loading a large file into memory consumes a lot of heap and is prone to OutOfMemoryError. For huge files (for example, 200 GB), we can rely on memory-mapped files (MappedByteBuffer) instead. A MappedByteBuffer allows us to create and modify huge files and treat them as very big arrays. They look like they are in memory, even if they are not, because the mapping is handled at the native level:

// or use, Files.newByteChannel()
try (FileChannel fileChannel = FileChannel.open(chineseFile,
        EnumSet.of(StandardOpenOption.READ))) {

    MappedByteBuffer mbBuffer = fileChannel.map(
        FileChannel.MapMode.READ_ONLY, 0, fileChannel.size());

    if (mbBuffer != null) {
        String bufferContent
            = StandardCharsets.UTF_16.decode(mbBuffer).toString();

        System.out.println(bufferContent);
        mbBuffer.clear();
    }
}
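
As a self-contained illustration of the same idea, the sketch below maps a small temporary file and decodes it. The class name MappedReadDemo and its helpers are hypothetical, and the size check reflects the fact that FileChannel.map() rejects regions larger than Integer.MAX_VALUE bytes:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedReadDemo {

    // Hypothetical helper: writes the text to a temporary UTF-16 file
    static Path writeSample(String text) {
        try {
            Path file = Files.createTempFile("mapped", ".txt");
            file.toFile().deleteOnExit();
            Files.writeString(file, text, StandardCharsets.UTF_16);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Maps the whole file read-only and decodes it as UTF-16
    static String readMapped(Path file) {
        try (FileChannel channel
                = FileChannel.open(file, StandardOpenOption.READ)) {

            long size = channel.size();
            if (size > Integer.MAX_VALUE) {
                // A single mapping cannot cover more than Integer.MAX_VALUE bytes
                throw new IllegalArgumentException(
                    "File too large for a single mapping: " + size);
            }

            MappedByteBuffer buffer = channel.map(
                FileChannel.MapMode.READ_ONLY, 0, size);

            // Decoding copies the characters onto the heap,
            // so this step is only sensible for modest sizes
            return StandardCharsets.UTF_16.decode(buffer).toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeSample("\u4f60\u597d, mapped world");
        System.out.println(readMapped(file).length());
    }
}
```

Note that decoding the complete buffer into a String gives up the memory advantage of the mapping; for large files, it is usually better to process the buffer piece by piece.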

A single mapping cannot exceed Integer.MAX_VALUE bytes (about 2 GB), so for huge files it is advisable to map and traverse the file in fixed-size chunks, as follows:

private static final int MAP_SIZE = 5242880; // 5 MB in bytes

try (FileChannel fileChannel = FileChannel.open(chineseFile,
        EnumSet.of(StandardOpenOption.READ))) {

    // long, not int: the position must be able to exceed 2 GB
    long position = 0;
    long length = fileChannel.size();

    while (position < length) {
        long remaining = length - position;
        int bytestomap = (int) Math.min(MAP_SIZE, remaining);

        MappedByteBuffer mbBuffer = fileChannel.map(
            FileChannel.MapMode.READ_ONLY, position, bytestomap);

        ... // do something with the current buffer

        position += bytestomap;
    }
}
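
To make the chunked traversal concrete, here is a minimal sketch that fills in the "do something" step with a CRC32 checksum, chosen only because it is easy to verify against an in-memory computation. The class name ChunkedMapDemo, the helper methods, and the checksum task are assumptions for illustration, not part of the original example:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.CRC32;

public class ChunkedMapDemo {

    // Hypothetical helper: writes the bytes to a temporary file
    static Path writeBytes(byte[] data) {
        try {
            Path file = Files.createTempFile("chunked", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, data);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reference value computed entirely in memory
    static long checksumInMemory(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    // Maps the file in windows of at most mapSize bytes and
    // feeds each window to the checksum
    static long checksumInChunks(Path file, int mapSize) {
        if (mapSize <= 0) {
            throw new IllegalArgumentException("mapSize must be positive");
        }

        CRC32 crc = new CRC32();
        try (FileChannel channel
                = FileChannel.open(file, StandardOpenOption.READ)) {

            long position = 0; // long, so files beyond 2 GB work
            long length = channel.size();

            while (position < length) {
                long remaining = length - position;
                int bytesToMap = (int) Math.min(mapSize, remaining);

                MappedByteBuffer buffer = channel.map(
                    FileChannel.MapMode.READ_ONLY, position, bytesToMap);

                crc.update(buffer); // consumes the whole window

                position += bytesToMap;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] data = new byte[10_000];
        java.util.Arrays.fill(data, (byte) 7);
        Path file = writeBytes(data);

        // Three windows: 4096 + 4096 + 1808 bytes
        System.out.println(
            checksumInChunks(file, 4096) == checksumInMemory(data));
    }
}
```

When the chunks hold text rather than raw bytes, keep in mind that a character may straddle a chunk boundary; decoding each window independently is only safe if the boundaries are aligned with the encoding.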

JDK 13 prepared the ground for non-volatile MappedByteBuffers, which were delivered in JDK 14 (JEP 352).