Overview

The Xonai Accelerator plugs into Apache Spark 3.1+ via plugin interface, which allows injecting a custom backend for Spark APIs, such as SQL and DataFrame, and execute applications faster without requiring application code changes.

The Xonai Accelerator backend generates code for supported SQL operations with a custom DSL and MLIR-based compiler purpose-built to optimize data analytics programs. This replaces the default Java code generation and JVM compiler if the SQL operation is supported (see compatibility reference), otherwise it will fallback to the default Spark Catalyst engine and convert data between engines.

The Xonai Accelerator does not interfere with query planning, execution model or any other mechanism other than replacing the Catalyst engine at execution time only for the purpose of doing batch processing. This is a design principle in order to avoid breaking existing enterprise-grade applications or requiring tuning them for a custom backend.

The mandatory requirements to run the Xonai Accelerator are:

  • Apache Spark 3.1+

  • Compatible Spark runtimes:

    • Amazon EMR

    • Databricks

    • Google Dataproc

    • Open-source distribution of Apache Spark 3.1+

  • Access to Xonai Accelerator JARs

  • Access to Spark configuration properties (may not be possible in specific managed platforms)

You may deploy additional components that work together with the Xonai Accelerator to complement it.

Supported Spark Runtimes

The Xonai Accelerator is distributed as a set of JARs, one for each supported Spark runtime as shown in the following table:

Spark Version

Runtime

Open Source

Amazon EMR

Databricks

Google Dataproc

3.1.1

(6.3.X)

3.1.2

(6.4.0, 6.5.0)

(9.1 LTS)

3.1.3

(2.0.X)

3.2.0

(6.6.0)

3.2.1

(6.7.0)

(10.4 LTS)

3.2.2

3.2.3

3.2.4

3.3.0

(6.8.X, 6.9.X)

(11.3 LTS)

3.3.1

(6.10.X)

3.3.2

(6.11.X)

(12.2 LTS)

(2.1.X)

3.3.3

3.3.4

3.4.0

(6.12.0)

3.4.1

(6.13.0, 6.14.0, 6.15.0)

(13.3 LTS)

3.4.2

3.4.3

3.4.4

3.5.0

(7.0.0, 7.1.0)

(14.3 LTS, 15.4 LTS)

3.5.1

(7.2.0, 7.3.0)

(2.2.X)

3.5.2

(7.4.0, 7.5.0)

3.5.3

(7.6.0, 7.7.0)

3.5.4

(7.8.0)

3.5.5

Each JAR uses the following naming convention:

xonai-spark-plugin-<package>-<runtime>-<release>-<channel>-<platform>-<arch>.jar

Depending on your execution environment, each tag may be named as listed in the following table:

Package <package>

Label

Open-source distribution

oss

Amazon EMR

emr

Databricks

dbx

Google Dataproc

gdp

Supported Linux Distributions

The Xonai Accelerator works on any glibc-2.17-or-later-based amd64 or arm64 Linux distribution. A non-exhaustive list includes, but is not limited to:

  • Debian >= 8

  • Ubuntu >= 14.06

  • Red Hat Enterprise Linux >= 7

  • CentOS >= 7

Supported JDKs and Scala

The Xonai Accelerator is compatible with JDK 1.8, 11 and 17. All JARs are built with Scala 2.12.

Plugin Activation

The Xonai Accelerator can be activated on a per-application basis via Spark 3 configuration just by adding the following properties:

--jars <scheme>://<path>/xonai-spark-plugin-<package>-<runtime>-<release>-<channel>-<platform>-<arch>.jar
--conf spark.plugins=com.xonai.spark.SQLPlugin

For optimal performance, memory configuration may be required as per next section.

Memory Modes

The Xonai Accelerator uses off-heap memory to process data and it can operate in one of three modes: off-heap, overhead, and dynamic.

The default memory mode is selected based on existing properties and can be overridden with spark.xonai.memory.mode. The amount of memory Xonai can use is also automatically computed and can be overridden with spark.xonai.memory.size.

Info

The default values of the properties above are printed to the driver logs and visible in the Environment tab of the Spark UI.

Off-Heap

Xonai uses the Spark off-heap management to acquire memory meaning that Spark and Xonai share a memory pool. This mode is selected by default if spark.memory.offHeap.size and spark.memory.offHeap.enabled are specified.

The default usage looks as the following:

--conf spark.memory.offHeap.size=30g
--conf spark.memory.offHeap.enabled=true

Overhead

Xonai uses part of the executor memory overhead with a memory pool separate from Spark. This mode is selected by default if spark.executor.memoryOverhead is specified. The default executor memory overhead is 10% of spark.executor.memory (18% for EMR).

For Xonai to run optimally, the executor memory (JVM) must be reduced and the memory overhead increased. For example, if an application has the following memory configuration:

--conf spark.executor.memory=30g
--conf spark.executor.memoryOverhead=5g

It can be modified to:

--conf spark.executor.memory=10g
--conf spark.executor.memoryOverhead=25g

By default, the Xonai memory size is around overhead - memory * overhead factor with a default factor of 0.2. For the example above the result is spark.xonai.memory.size=23g.

The overhead memory mode is recommended when most or all operations in an application are supported by Xonai. Otherwise, non-supported operations may underperform due to limited memory.

Dynamic

Xonai monitors memory allocation by the JVM and it uses non-allocated memory. This mode is selected by default if no other mode is selected. As the JVM grows in size the Xonai memory size decreases.

This memory mode can be configured using:

--conf spark.xonai.memory.mode=dynamic
--conf spark.xonai.memory.size=1g

In this mode, the value of spark.xonai.memory.size defines a lower-bound and defaults to 50% of the executor memory overhead. If the dynamic memory size is low the plugin will stop to offload operations to the Xonai engine.

The dynamic memory mode is the most conservative and it is recommended when testing the plugin for the first time or when using the plugin in applications with frequent code changes.


Last update: Apr 02, 2025