Overview¶
The Xonai Accelerator plugs into Apache Spark 3.1+ through Spark's plugin interface, which allows a custom backend to be injected for Spark APIs such as SQL and DataFrame, so applications run faster without any application code changes.
The Xonai Accelerator backend generates code for supported SQL operations with a custom DSL and an MLIR-based compiler purpose-built to optimize data analytics programs. When a SQL operation is supported (see compatibility reference), this replaces the default Java code generation and JVM compiler; otherwise, execution falls back to the default Spark Catalyst engine and data is converted between the two engines.
The Xonai Accelerator does not interfere with query planning, the execution model, or any other Spark mechanism: it only replaces the Catalyst engine at execution time for batch processing. This is a deliberate design principle, intended to avoid breaking existing enterprise-grade applications or requiring them to be tuned for a custom backend.
The mandatory requirements to run the Xonai Accelerator are:

- Apache Spark 3.1+
- Compatible Spark runtimes:
    - Amazon EMR
    - Databricks
    - Google Dataproc
    - Open-source distribution of Apache Spark 3.1+
- Access to Xonai Accelerator JARs
- Access to Spark configuration properties (may not be possible in specific managed platforms)
You may deploy additional components that work together with the Xonai Accelerator to complement it.
Supported Spark Runtimes¶
The Xonai Accelerator is distributed as a set of JARs, one for each supported Spark runtime as shown in the following table:
Spark Version | Open Source | Amazon EMR | Databricks | Google Dataproc |
---|---|---|---|---|
3.1.1 | | (6.3.X) | | |
3.1.2 | | (6.4.0, 6.5.0) | (9.1 LTS) | |
3.1.3 | | | | (2.0.X) |
3.2.0 | | (6.6.0) | | |
3.2.1 | | (6.7.0) | (10.4 LTS) | |
3.2.2 | | | | |
3.2.3 | | | | |
3.2.4 | | | | |
3.3.0 | | (6.8.X, 6.9.X) | (11.3 LTS) | |
3.3.1 | | (6.10.X) | | |
3.3.2 | | (6.11.X) | (12.2 LTS) | (2.1.X) |
3.3.3 | | | | |
3.3.4 | | | | |
3.4.0 | | (6.12.0) | | |
3.4.1 | | (6.13.0, 6.14.0, 6.15.0) | (13.3 LTS) | |
3.4.2 | | | | |
3.4.3 | | | | |
3.4.4 | | | | |
3.5.0 | | (7.0.0, 7.1.0) | (14.3 LTS, 15.4 LTS) | |
3.5.1 | | (7.2.0, 7.3.0) | | (2.2.X) |
3.5.2 | | (7.4.0, 7.5.0) | | |
3.5.3 | | (7.6.0, 7.7.0) | | |
3.5.4 | | (7.8.0) | | |
3.5.5 | | | | |
Each JAR uses the following naming convention:
xonai-spark-plugin-<package>-<runtime>-<release>-<channel>-<platform>-<arch>.jar
Depending on your execution environment, each tag may be named as listed in the following table:
Package <package> | Label |
---|---|
Open-source distribution | oss |
Amazon EMR | emr |
Databricks | dbx |
Google Dataproc | gdp |
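As an illustration of the naming convention, the following minimal sketch fills in the template from its tags. The helper `jar_name` and the `PACKAGE_LABELS` mapping are hypothetical names introduced here, not part of the Xonai distribution. Only the `<package>` labels are taken from the table above, so the remaining tags are left as literal placeholders.

```python
# Package labels as listed in the table above.
PACKAGE_LABELS = {
    "Open-source distribution": "oss",
    "Amazon EMR": "emr",
    "Databricks": "dbx",
    "Google Dataproc": "gdp",
}


def jar_name(package, runtime, release, channel, platform, arch):
    """Fill in the documented JAR naming convention (hypothetical helper)."""
    if package not in PACKAGE_LABELS.values():
        raise ValueError(f"unknown package label: {package!r}")
    return (
        f"xonai-spark-plugin-{package}-{runtime}-{release}"
        f"-{channel}-{platform}-{arch}.jar"
    )


# Only <package> is resolved here; the other tag values are not documented
# in this section, so they stay as placeholders.
print(jar_name(PACKAGE_LABELS["Amazon EMR"], "<runtime>", "<release>",
               "<channel>", "<platform>", "<arch>"))
```

Validating the package label up front catches a mistyped label before a misnamed JAR path reaches the Spark configuration.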
Supported Linux Distributions¶
The Xonai Accelerator works on any amd64 or arm64 Linux distribution based on glibc 2.17 or later, including but not limited to:
Debian >= 8
Ubuntu >= 14.04
Red Hat Enterprise Linux >= 7
CentOS >= 7
Supported JDKs and Scala¶
The Xonai Accelerator is compatible with JDK 1.8, 11 and 17. All JARs are built with Scala 2.12.
Plugin Activation¶
The Xonai Accelerator can be activated on a per-application basis via Spark 3 configuration just by adding the following properties:
--jars <scheme>://<path>/xonai-spark-plugin-<package>-<runtime>-<release>-<channel>-<platform>-<arch>.jar
--conf spark.plugins=com.xonai.spark.SQLPlugin
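For applications launched from a script, the two options above can be assembled programmatically. The sketch below is one possible way to do so. The helper names `activation_args` and `spark_submit_command` and the application file `my_app.py` are hypothetical, and the JAR location is left as the same placeholder used above.

```python
import shlex

# Plugin class name as given in the properties above.
PLUGIN_CLASS = "com.xonai.spark.SQLPlugin"


def activation_args(jar_uri):
    """Return the spark-submit options that activate the plugin."""
    return ["--jars", jar_uri, "--conf", f"spark.plugins={PLUGIN_CLASS}"]


def spark_submit_command(jar_uri, app, *app_args):
    """Build a full spark-submit argument list for one application."""
    return ["spark-submit", *activation_args(jar_uri), app, *app_args]


# Placeholder JAR location; substitute the JAR matching your runtime.
jar = ("<scheme>://<path>/xonai-spark-plugin-<package>-<runtime>-<release>"
       "-<channel>-<platform>-<arch>.jar")
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(spark_submit_command(jar, "my_app.py")))
```

Because the options are passed per invocation, other applications on the same cluster remain unaffected.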
For optimal performance, memory configuration may be required, as described in the next section.
Memory Modes¶
The Xonai Accelerator uses off-heap memory to process data and can operate in one of three modes: off-heap, overhead, and dynamic.
The default memory mode is selected based on existing properties and can be overridden with spark.xonai.memory.mode. The amount of memory Xonai can use is also computed automatically and can be overridden with spark.xonai.memory.size.
Info
The default values of the properties above are printed to the driver logs and visible in the Environment tab of the Spark UI.
Off-Heap¶
Xonai acquires memory through Spark's off-heap memory management, meaning that Spark and Xonai share a memory pool. This mode is selected by default if spark.memory.offHeap.size and spark.memory.offHeap.enabled are specified.
Typical usage looks like the following:
--conf spark.memory.offHeap.size=30g
--conf spark.memory.offHeap.enabled=true
Overhead¶
Xonai uses part of the executor memory overhead, with a memory pool separate from Spark. This mode is selected by default if spark.executor.memoryOverhead is specified. The default executor memory overhead is 10% of spark.executor.memory (18% for EMR).
For Xonai to run optimally, the executor memory (JVM) must be reduced and the memory overhead increased. For example, if an application has the following memory configuration:
--conf spark.executor.memory=30g
--conf spark.executor.memoryOverhead=5g
It can be modified to:
--conf spark.executor.memory=10g
--conf spark.executor.memoryOverhead=25g
By default, the Xonai memory size is approximately overhead - memory * overhead factor, with a default factor of 0.2. For the example above, the result is 25g - 10g * 0.2, that is, spark.xonai.memory.size=23g.
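The default sizing rule above can be checked with a few lines of arithmetic. This is a minimal sketch of the stated formula only. The function name `xonai_overhead_memory_gb` is hypothetical, and since the text describes the result as approximate, the value the plugin actually computes may differ slightly.

```python
def xonai_overhead_memory_gb(executor_memory_gb, memory_overhead_gb, factor=0.2):
    """Approximate default Xonai memory size in overhead mode, in GB.

    Implements: overhead - memory * overhead factor (default factor 0.2).
    """
    return memory_overhead_gb - executor_memory_gb * factor


# Modified configuration from the example: memory=10g, memoryOverhead=25g.
print(xonai_overhead_memory_gb(10, 25))  # 23.0

# Original configuration from the example: memory=30g, memoryOverhead=5g.
# The formula yields a negative value, which illustrates why the JVM memory
# has to be reduced and the overhead increased.
print(xonai_overhead_memory_gb(30, 5))   # -1.0
```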
The overhead memory mode is recommended when most or all operations in an application are supported by Xonai. Otherwise, non-supported operations may underperform due to limited memory.
Dynamic¶
Xonai monitors the JVM's memory allocation and uses the memory the JVM has not allocated. This mode is selected by default if no other mode is selected. As the JVM grows in size, the Xonai memory size decreases.
This memory mode can be configured using:
--conf spark.xonai.memory.mode=dynamic
--conf spark.xonai.memory.size=1g
In this mode, the value of spark.xonai.memory.size defines a lower bound and defaults to 50% of the executor memory overhead. If the dynamic memory size is low, the plugin stops offloading operations to the Xonai engine.
The dynamic memory mode is the most conservative and it is recommended when testing the plugin for the first time or when using the plugin in applications with frequent code changes.
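Taken together, the default-selection rules from the three mode sections can be sketched as a small decision function. This is an illustration under stated assumptions, not the plugin's actual logic. The function name `default_memory_mode` is hypothetical, the mode strings "off-heap" and "overhead" are assumed (only "dynamic" appears above as a configuration value), "specified" is taken to mean that the property is present, and the precedence between off-heap and overhead when both are configured is not documented here.

```python
def default_memory_mode(conf):
    """Pick a memory mode from a dict of Spark properties (hypothetical helper)."""
    # An explicit override always wins.
    mode = conf.get("spark.xonai.memory.mode")
    if mode:
        return mode
    # Off-heap: both Spark off-heap properties are specified.
    # Assumption: off-heap is checked before overhead.
    if "spark.memory.offHeap.size" in conf and "spark.memory.offHeap.enabled" in conf:
        return "off-heap"
    # Overhead: the executor memory overhead is specified.
    if "spark.executor.memoryOverhead" in conf:
        return "overhead"
    # Dynamic: no other mode was selected.
    return "dynamic"


print(default_memory_mode({"spark.executor.memoryOverhead": "25g"}))
print(default_memory_mode({}))
```

The values actually selected are printed to the driver logs and shown in the Environment tab of the Spark UI, which is the reliable way to confirm the mode in use.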