Overview
The Xonai Accelerator plugs into Apache Spark 3.1+ through the Spark plugin interface. This interface allows a custom backend to be injected for Spark APIs such as SQL and DataFrame, so applications execute faster without any changes to application code.
The Xonai Accelerator backend generates code for supported SQL operations using a custom DSL and an MLIR-based compiler purpose-built to optimize data analytics programs. When a SQL operation is supported (see the compatibility reference), this replaces Spark's default Java code generation and JVM compilation. Otherwise, execution falls back to the default Spark Catalyst engine, and data is converted between the two engines.
The Xonai Accelerator does not interfere with query planning, the execution model or any other Spark mechanism. Its only effect is to replace the Catalyst engine at execution time for batch processing. This is a deliberate design principle: it avoids breaking existing enterprise-grade applications and avoids requiring them to be tuned for a custom backend.
The mandatory requirements to run the Xonai Accelerator are:
- Apache Spark 3.1+
- A compatible Spark runtime:
  - Amazon EMR
  - Databricks
  - Google Dataproc
  - Open-source distribution of Apache Spark 3.1+
- Access to the Xonai Accelerator JARs
- Access to Spark configuration properties (this may not be possible on some managed platforms)
Optionally, you can deploy other components that work together with the Xonai Accelerator to complement it.
Supported Spark Runtimes
The Xonai Accelerator is distributed as a set of JARs, one for each supported Spark runtime as shown in the following table:
| Spark Version | Open Source | Amazon EMR | Databricks | Google Dataproc |
|---|---|---|---|---|
| 3.1.1 | ✓ | 6.3.X | | |
| 3.1.2 | ✓ | 6.4.0, 6.5.0 | 9.1 LTS | |
| 3.1.3 | ✓ | | | 2.0.X |
| 3.2.0 | ✓ | 6.6.0 | | |
| 3.2.1 | ✓ | 6.7.0 | 10.4 LTS | |
| 3.2.2 | ✓ | | | |
| 3.2.3 | ✓ | | | |
| 3.2.4 | ✓ | | | |
| 3.3.0 | ✓ | 6.8.X, 6.9.X | 11.3 LTS | |
| 3.3.1 | ✓ | 6.10.X | | |
| 3.3.2 | ✓ | 6.11.X | 12.2 LTS | 2.1.X |
| 3.3.3 | ✓ | | | |
| 3.4.0 | ✓ | 6.12.X | | |
| 3.4.1 | ✓ | | 13.3 LTS | |
| 3.4.2 | ✓ | | | |
| 3.4.3 | ✓ | | | |
| 3.5.0 | ✓ | | 14.3 LTS | 2.2.X |
| 3.5.1 | ✓ | | | |
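The table can be read as a lookup from a managed runtime release to the Spark version that release ships with. The following is a minimal sketch that encodes the Amazon EMR, Databricks and Google Dataproc entries from the table. It reuses the `<package>` labels (`emr`, `dbx`, `gdp`) from the JAR naming convention as keys. The helper `spark_version_for` is hypothetical and not part of any Xonai tooling.

```python
from typing import Optional

# (runtime label, runtime release) -> Spark version, transcribed from the table.
# A trailing ".X" means "any patch release", as in the table.
RUNTIME_TO_SPARK = {
    ("emr", "6.3.X"): "3.1.1",
    ("emr", "6.4.0"): "3.1.2",
    ("emr", "6.5.0"): "3.1.2",
    ("emr", "6.6.0"): "3.2.0",
    ("emr", "6.7.0"): "3.2.1",
    ("emr", "6.8.X"): "3.3.0",
    ("emr", "6.9.X"): "3.3.0",
    ("emr", "6.10.X"): "3.3.1",
    ("emr", "6.11.X"): "3.3.2",
    ("emr", "6.12.X"): "3.4.0",
    ("dbx", "9.1 LTS"): "3.1.2",
    ("dbx", "10.4 LTS"): "3.2.1",
    ("dbx", "11.3 LTS"): "3.3.0",
    ("dbx", "12.2 LTS"): "3.3.2",
    ("dbx", "13.3 LTS"): "3.4.1",
    ("dbx", "14.3 LTS"): "3.5.0",
    ("gdp", "2.0.X"): "3.1.3",
    ("gdp", "2.1.X"): "3.3.2",
    ("gdp", "2.2.X"): "3.5.0",
}


def _matches(pattern: str, release: str) -> bool:
    """Return True if `release` matches `pattern`; a trailing '.X' matches any patch."""
    if pattern.endswith(".X"):
        # Keep the trailing dot so "6.1.X" cannot match "6.10.2".
        return release.startswith(pattern[:-1])
    return release == pattern


def spark_version_for(runtime: str, release: str) -> Optional[str]:
    """Return the Spark version listed for a runtime release, or None if it is not listed."""
    for (label, pattern), spark in RUNTIME_TO_SPARK.items():
        if label == runtime and _matches(pattern, release):
            return spark
    return None


if __name__ == "__main__":
    print(spark_version_for("emr", "6.10.1"))  # prints 3.3.1
```

A release that returns `None` is simply not listed in the table above. It is not necessarily unsupported on the open-source distribution.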
Each JAR uses the following naming convention:
xonai-spark-plugin-<package>-<runtime>-<release>-<channel>-<platform>-<arch>.jar
Depending on your execution environment, each tag takes one of the values listed in the following tables:
| Package (<package>) | Label |
|---|---|
| Open-source distribution | oss |
| Amazon EMR | emr |
| Databricks | dbx |
| Google Dataproc | gdp |
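The naming convention can be expressed as a small formatting helper. In this sketch only the `<package>` labels come from the table above. The runtime, release, channel, platform and architecture arguments are passed through unchecked, because their allowed values are not listed here. The function `jar_name` is hypothetical.

```python
# <package> labels, taken from the table above.
PACKAGE_LABELS = {
    "Open-source distribution": "oss",
    "Amazon EMR": "emr",
    "Databricks": "dbx",
    "Google Dataproc": "gdp",
}


def jar_name(package: str, runtime: str, release: str, channel: str,
             platform_name: str, arch: str) -> str:
    """Assemble a file name following
    xonai-spark-plugin-<package>-<runtime>-<release>-<channel>-<platform>-<arch>.jar
    """
    if package not in PACKAGE_LABELS.values():
        raise ValueError(f"unknown package label: {package!r}")
    tags = [package, runtime, release, channel, platform_name, arch]
    return "xonai-spark-plugin-" + "-".join(tags) + ".jar"


if __name__ == "__main__":
    # Placeholder values for every tag except <package>.
    print(jar_name("emr", "RUNTIME", "RELEASE", "CHANNEL", "PLATFORM", "ARCH"))
```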
Supported Linux Distributions
The Xonai Accelerator works on any amd64 or arm64 Linux distribution based on glibc 2.17 or later. A non-exhaustive list includes:
- Debian >= 8
- Ubuntu >= 14.04
- Red Hat Enterprise Linux >= 7
- CentOS >= 7
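A host can be checked against these two requirements with Python's standard `platform` module. `platform.libc_ver()` reports the C library name and version, and `platform.machine()` reports the CPU architecture (for example `x86_64` or `aarch64` on Linux). This is a minimal sketch that checks only glibc version and architecture. The function names are illustrative.

```python
import platform

MIN_GLIBC = (2, 17)
# Linux usually reports amd64 as "x86_64" and arm64 as "aarch64".
SUPPORTED_ARCHES = {"x86_64", "amd64", "aarch64", "arm64"}


def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '2.31' into a comparable tuple (2, 31)."""
    return tuple(int(part) for part in text.split("."))


def is_supported(libc: str, libc_version: str, machine: str) -> bool:
    """Return True for glibc >= 2.17 on an amd64 or arm64 machine."""
    if libc != "glibc" or not libc_version:
        return False  # non-glibc systems (e.g. musl) report an empty libc
    return (parse_version(libc_version) >= MIN_GLIBC
            and machine.lower() in SUPPORTED_ARCHES)


if __name__ == "__main__":
    libc, version = platform.libc_ver()
    print(is_supported(libc, version, platform.machine()))
```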
Supported JDKs and Scala
The Xonai Accelerator is compatible with JDK 1.8, 11 and 17. All JARs are built with Scala 2.12.
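JDK 8 reports its version in the legacy `1.8.0_NNN` form of the `java.version` system property, while JDK 9 and later start with the major version (for example `11.0.21` or `17`). The following is a minimal sketch of a compatibility check that handles both forms. It also checks that a Scala version belongs to the 2.12 line. The function names are hypothetical.

```python
import re

SUPPORTED_JDKS = {8, 11, 17}


def java_major(version: str) -> int:
    """Extract the Java major version from a java.version string."""
    parts = version.split(".")
    first = parts[0]
    if first == "1" and len(parts) > 1:
        first = parts[1]  # legacy scheme: "1.8.0_392" means Java 8
    match = re.match(r"\d+", first)  # tolerates suffixes such as "17-ea"
    if match is None:
        raise ValueError(f"unrecognized Java version: {version!r}")
    return int(match.group())


def is_supported_jdk(version: str) -> bool:
    return java_major(version) in SUPPORTED_JDKS


def is_supported_scala(version: str) -> bool:
    """The JARs are built with Scala 2.12, so the Spark build must use 2.12.x."""
    return version == "2.12" or version.startswith("2.12.")


if __name__ == "__main__":
    print(is_supported_jdk("1.8.0_392"), is_supported_scala("2.12.15"))
```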
Plugin Activation
The Xonai Accelerator can be activated on a per-application basis through Spark 3 configuration by adding the following options:
--jars <scheme>://<path>/xonai-spark-plugin-<package>-<runtime>-<release>-<channel>-<platform>-<arch>.jar
--conf spark.plugins=com.xonai.spark.SQLPlugin
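The following is a minimal sketch that assembles these options into a `spark-submit` argument list. Spark documents `spark.plugins` as a comma-separated list of plugin class names. If an application already sets it, the Xonai class should therefore be appended rather than replacing the existing value. The JAR URI and application file in the example are hypothetical placeholders, and the helper functions are illustrative.

```python
import shlex

PLUGIN_CLASS = "com.xonai.spark.SQLPlugin"


def plugins_value(existing: str = "") -> str:
    """Append the Xonai plugin to any plugins already listed in spark.plugins."""
    names = [name.strip() for name in existing.split(",") if name.strip()]
    if PLUGIN_CLASS not in names:
        names.append(PLUGIN_CLASS)
    return ",".join(names)


def submit_args(jar_uri: str, app: str, *app_args: str,
                existing_plugins: str = "") -> list:
    """Build a spark-submit command that activates the plugin for one application."""
    return [
        "spark-submit",
        "--jars", jar_uri,
        "--conf", f"spark.plugins={plugins_value(existing_plugins)}",
        app,
        *app_args,
    ]


if __name__ == "__main__":
    # Hypothetical JAR location and application: replace with real values.
    cmd = submit_args("s3://example-bucket/jars/xonai-spark-plugin.jar", "my_job.py")
    print(shlex.join(cmd))
```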
The Xonai Accelerator processes data in off-heap memory. To run optimally, it requires moving a fraction of the executor memory to the spark.executor.memoryOverhead property.
For example, if an application has the following memory configuration:
--conf spark.executor.memory=30g
--conf spark.executor.memoryOverhead=5g
It can be modified to the following, which keeps the combined executor memory at 35g:
--conf spark.executor.memory=10g
--conf spark.executor.memoryOverhead=25g
If the operations that peak in memory usage are supported by the Xonai Accelerator, and therefore run off-heap, a significant fraction of the JVM executor memory can be reassigned to the memory overhead property.
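The example above moves 20g from the JVM heap to the overhead while keeping the sum of the two settings at 35g. The following is a minimal sketch of that arithmetic. It assumes the per-executor memory request is the heap plus the overhead. Other settings, such as `spark.executor.pyspark.memory`, also add to the request and are ignored here. How much to move depends on the workload, and the helper names are hypothetical.

```python
def rebalance(executor_memory_gb: int, overhead_gb: int, move_gb: int) -> tuple:
    """Move `move_gb` from on-heap executor memory to spark.executor.memoryOverhead.

    The sum heap + overhead is unchanged, so each executor requests the same
    amount of memory as before (ignoring other memory settings).
    """
    if not 0 <= move_gb < executor_memory_gb:
        raise ValueError("some JVM heap must remain for the executor")
    return executor_memory_gb - move_gb, overhead_gb + move_gb


def as_conf(executor_memory_gb: int, overhead_gb: int) -> list:
    """Render the two settings as spark-submit --conf options."""
    return [
        f"--conf spark.executor.memory={executor_memory_gb}g",
        f"--conf spark.executor.memoryOverhead={overhead_gb}g",
    ]


if __name__ == "__main__":
    heap, overhead = rebalance(30, 5, 20)  # the 30g/5g example above
    print("\n".join(as_conf(heap, overhead)))
```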