The variable length values region

Finally, the variable length values region, the last region of an UnsafeRow object, holds variable-sized fields such as strings. As mentioned earlier, the slot that belongs to such a field in the fixed length values region does not hold the value itself but a pointer: the offset of the value within the row, together with its length. Because both offset and length are stored in the fixed length values region, a variable-sized value can be addressed easily.

For those of you interested in how these offsets are calculated, let's have a look at some code. Note that this is Java code, since UnsafeRow is implemented in Java rather than Scala:

private long getFieldOffset(int ordinal) {
    return baseOffset + bitSetWidthInBytes + ordinal * 8L;
}

The method getFieldOffset returns the byte offset of a fixed-size field. It starts from the baseOffset of the UnsafeRow object itself, adds the size of the first region (bitSetWidthInBytes), and finally jumps to the start of the correct slot by adding ordinal, the number/ID of the field, multiplied by eight. We multiply by eight because every slot in the fixed length values region is exactly eight bytes wide.
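To make the arithmetic concrete, here is a minimal standalone sketch of the same calculation. It is not the actual UnsafeRow code: the class name FieldOffsetSketch is hypothetical, the row state is passed in as parameters instead of being read from fields, the baseOffset of 16 is just an assumed example value, and the bit set width is assumed to be one bit per field rounded up to whole 8-byte words.

```java
public class FieldOffsetSketch {

    // Assumption: the null-tracking bit set uses one bit per field,
    // rounded up to a whole number of 8-byte words.
    static int bitSetWidthInBytes(int numFields) {
        return ((numFields + 63) / 64) * 8;
    }

    // Same arithmetic as getFieldOffset, with the row state passed in explicitly.
    static long fieldOffset(long baseOffset, int numFields, int ordinal) {
        return baseOffset + bitSetWidthInBytes(numFields) + ordinal * 8L;
    }

    public static void main(String[] args) {
        long baseOffset = 16L; // hypothetical example value, not taken from a real row
        int numFields = 3;
        for (int ordinal = 0; ordinal < numFields; ordinal++) {
            System.out.println("field " + ordinal + " -> byte offset "
                    + fieldOffset(baseOffset, numFields, ordinal));
        }
    }
}
```

With these assumed values, the three slots start at byte offsets 24, 32, and 40: an 8-byte bit set follows the base offset, and each slot is eight bytes further along than the previous one.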

Slightly more complicated is addressing variable sized fields. Let's have a look at the corresponding Java code:

public UTF8String getUTF8String(int ordinal) {
    if (isNullAt(ordinal)) return null;
    final long offsetAndSize = getLong(ordinal);
    final int offset = (int) (offsetAndSize >> 32);
    final int size = (int) offsetAndSize;
    return UTF8String.fromAddress(baseObject, baseOffset + offset, size);
}

First, we check whether the value is null, because then we can stop right away and return null. Then, offsetAndSize is obtained by reading an ordinary long from the field's slot in the fixed size region; for a variable-sized field, this slot stores the offset and length of the value in the variable length values region. Next, this long value has to be split into two int values that contain offset and size separately: right-shifting offsetAndSize by 32 bits and casting the result to int yields the offset (the upper 32 bits), whereas casting offsetAndSize to int directly yields the size (the lower 32 bits). Finally, using these address values, a UTF8String object is created from the raw bytes within the UnsafeRow object.
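The bit manipulation can be tried out in isolation. The following sketch is only an illustration under stated assumptions: the class OffsetAndSizeSketch and its helpers pack, offsetOf, sizeOf, and read are hypothetical names, a plain byte array stands in for the row's memory, and java.lang.String replaces UTF8String so that the example runs without Spark on the classpath.

```java
import java.nio.charset.StandardCharsets;

public class OffsetAndSizeSketch {

    // Pack the offset into the upper 32 bits and the size into the lower 32 bits.
    static long pack(int offset, int size) {
        return ((long) offset << 32) | (size & 0xFFFFFFFFL);
    }

    // Upper 32 bits: same expression as in getUTF8String.
    static int offsetOf(long offsetAndSize) {
        return (int) (offsetAndSize >> 32);
    }

    // Lower 32 bits: the narrowing cast drops the upper half.
    static int sizeOf(long offsetAndSize) {
        return (int) offsetAndSize;
    }

    // Stand-in for UTF8String.fromAddress: decode `size` bytes starting at `offset`.
    static String read(byte[] row, long offsetAndSize) {
        return new String(row, offsetOf(offsetAndSize), sizeOf(offsetAndSize),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical one-field row: 8-byte bit set, one 8-byte slot, then the payload.
        byte[] row = new byte[32];
        byte[] payload = "spark".getBytes(StandardCharsets.UTF_8);
        int offset = 16; // first byte after the bit set and the single slot
        System.arraycopy(payload, 0, row, offset, payload.length);

        long slot = pack(offset, payload.length);
        System.out.println(read(row, slot)); // prints "spark"
    }
}
```

Packing both values into one long is what allows a variable-sized field to occupy the same eight-byte slot as any fixed-size field, so getFieldOffset works unchanged for both kinds of field.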
