Tuning the data structures

The first way to reduce extra memory usage is to avoid Java features that impose overhead. For example, pointer-based data structures and wrapper objects both carry nontrivial per-object costs. The following suggestions can help you tune your source code toward more memory-efficient data structures.

First, design your data structures to prefer arrays of objects and primitive types instead of the standard Java or Scala collection classes, such as Set, List, Queue, ArrayList, Vector, LinkedList, PriorityQueue, HashSet, LinkedHashSet, and TreeSet. These collection classes box their elements and add per-entry wrapper objects and pointers, whereas a primitive array stores its contents in one contiguous block.
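As a minimal sketch of this first suggestion, the following Java snippet contrasts a boxed `ArrayList<Integer>` with a primitive `int[]`. The class and method names are illustrative, not from any particular library; the point is that the primitive array avoids one `Integer` object header per element.

```java
import java.util.ArrayList;
import java.util.List;

public class PrimitiveArrays {
    // Boxed version: every element is a separate Integer object (object
    // header plus value), reached through a reference in the ArrayList's
    // backing array.
    static long sumBoxed(List<Integer> values) {
        long sum = 0;
        for (Integer v : values) sum += v;
        return sum;
    }

    // Primitive version: one contiguous int[] block, 4 bytes per element,
    // with no per-element object headers or pointer indirection.
    static long sumPrimitive(int[] values) {
        long sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000;
        List<Integer> boxed = new ArrayList<>();
        int[] primitive = new int[n];
        for (int i = 0; i < n; i++) {
            boxed.add(i);
            primitive[i] = i;
        }
        System.out.println(sumBoxed(boxed));         // 499500
        System.out.println(sumPrimitive(primitive)); // 499500
    }
}
```

Both versions compute the same result; the difference is purely in memory layout, which matters when millions of such elements are cached or shuffled.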

Second, when possible, avoid nested structures with many small objects and pointers; this makes your code both more memory-efficient and more concise. Third, consider using numeric IDs, or sometimes enumeration objects, rather than strings for keys. This is recommended because, as we have already stated, a single Java String object carries an extra overhead of about 40 bytes. Finally, if you have less than 32 GB of main memory (that is, RAM), set the JVM flag -XX:+UseCompressedOops to make pointers 4 bytes instead of 8.

The preceding flag can be set via SPARK_HOME/conf/spark-env.sh.template: just rename the file to spark-env.sh and set the value there.
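A minimal sketch of that setup is shown below. It assumes a standard Spark directory layout; note that spark-env.sh mainly configures Spark daemons, so for the driver and executors the equivalent settings usually go in spark-defaults.conf via the extraJavaOptions properties.

```shell
cd "$SPARK_HOME/conf"
cp spark-env.sh.template spark-env.sh

# Pass the flag to Spark daemons through spark-env.sh:
echo 'SPARK_DAEMON_JAVA_OPTS="-XX:+UseCompressedOops"' >> spark-env.sh

# For the driver and executors, set the JVM flag in spark-defaults.conf:
echo 'spark.driver.extraJavaOptions   -XX:+UseCompressedOops' >> spark-defaults.conf
echo 'spark.executor.extraJavaOptions -XX:+UseCompressedOops' >> spark-defaults.conf
```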