Optimizing Memory

The Xonai profiling page can be used to estimate how much memory an application is using and how much of the execution time is spent in Xonai. These estimates can be used to derive a more suitable memory configuration for the application.

Example

The application below peaked at 3.6G of memory per task, and most of the task execution time was spent in Xonai rather than the JVM.

Xonai profiling page with application memory

Assuming the application uses, for example, 4 cores (spark.executor.cores=4), the total executor memory needed for tasks alone should peak at around 15G.
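
This estimate comes from scaling the per-task peak by the number of tasks that can run concurrently on each executor:

4 cores × 3.6G peak per task ≈ 14.4G, rounded up to roughly 15G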

Additionally, the following must be considered before determining a configuration:

  • At least 1G JVM executor memory per core is recommended for serious applications, even if most of the application is executed by Xonai.

  • Additional memory overhead should also be taken into account, such as the fraction defined by spark.executor.memoryOverheadFactor, which Xonai defaults to 18.75% and subtracts from spark.executor.memoryOverhead to reserve off-heap memory for the JVM (see the rough breakdown after this list).
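
Applied to the example above, these guidelines translate roughly into the figures used in the configuration below:

4 cores × 1G = 4G minimum JVM executor memory
~15G task memory estimate + ~3G off-heap reserved for the JVM ≈ 18G executor memory overhead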

With this in mind, the following memory configuration is likely to be optimal:

--conf spark.executor.memory=6g
--conf spark.executor.memoryOverhead=18g

  • The 18G assigned to spark.executor.memoryOverhead comes from the 15G estimate obtained from the metrics plus an additional 3G of off-heap memory reserved for the JVM.

  • Executor JVM memory is set to 6G, which is comfortably above the recommended minimum of 1G per core (4G for 4 cores).

  • The total memory per executor is 24G (6G + 18G).
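
For reference, a spark-submit invocation applying this configuration might look like the sketch below; the application JAR name is a placeholder, and any other options (including those enabling the Xonai Accelerator) are omitted:

spark-submit \
  --conf spark.executor.cores=4 \
  --conf spark.executor.memory=6g \
  --conf spark.executor.memoryOverhead=18g \
  your-application.jar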

Troubleshooting

Setting Spark properties incorrectly, such as assigning too little executor memory overhead, will result in the application failing to initialize, and the Xonai Accelerator will report the cause in the logs. All initialization errors can be found on the errors reference page.

Spark or the resource manager itself may also abort execution if the JVM memory overhead is too low, for example:

Container killed by YARN for exceeding physical memory limits. Consider boosting spark.executor.memoryOverhead

In this case, spark.executor.memoryOverheadFactor should be increased to reserve more off-heap memory for the JVM.
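
For example, the factor could be raised above the 18.75% default; the value below is purely illustrative and should be tuned to the application's observed overhead:

--conf spark.executor.memoryOverheadFactor=0.25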