Overview

The Xonai Accelerator plugs into Apache Spark 3.1+ through Spark's plugin interface. This interface allows a custom backend to be injected for Spark APIs such as SQL and DataFrame, so applications execute faster without any changes to application code.

The Xonai Accelerator backend generates code for supported SQL operations using a custom DSL and an MLIR-based compiler purpose-built to optimize data analytics programs. For supported operations (see the compatibility reference), this replaces Spark's default Java code generation and JVM compilation. Unsupported operations fall back to the default Spark Catalyst engine, and data is converted between the two engines.
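The fallback behavior can be pictured with a small model. The following sketch is a conceptual illustration only and does not reflect how the Xonai Accelerator is implemented. It treats a query plan as a linear chain of operators (real Spark plans are trees) and uses a hypothetical set of supported operator names. It then assigns each operator to an engine and counts the boundaries where data would have to be converted between engines.

```python
# Conceptual model only. The operator names and the SUPPORTED set are
# hypothetical; the real list of supported operations is in the
# compatibility reference.
SUPPORTED = {"Scan", "Filter", "Project", "HashAggregate"}  # hypothetical


def assign_engines(plan):
    """Map each operator to 'xonai' if supported, otherwise 'catalyst'."""
    return [(op, "xonai" if op in SUPPORTED else "catalyst") for op in plan]


def count_conversions(assignment):
    """Count boundaries where adjacent operators run on different engines."""
    engines = [engine for _, engine in assignment]
    return sum(1 for a, b in zip(engines, engines[1:]) if a != b)


plan = ["Scan", "Filter", "Window", "Project", "HashAggregate"]
assignment = assign_engines(plan)
print(assignment)
print(count_conversions(assignment))  # 2: into Catalyst for Window, then back
```

The model shows why a single unsupported operator in the middle of a plan costs two conversions rather than one.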

The Xonai Accelerator does not interfere with query planning, the execution model, or any other Spark mechanism. Its only action is to replace the Catalyst engine at execution time to perform batch processing. This is a deliberate design principle: it avoids breaking existing enterprise-grade applications and avoids requiring them to be tuned for a custom backend.

The mandatory requirements to run the Xonai Accelerator are:

  • Apache Spark 3.1+

  • Compatible Spark runtimes:

    • Amazon EMR

    • Databricks

    • Google Dataproc

    • Open-source distribution of Apache Spark 3.1+

  • Access to Xonai Accelerator JARs

  • Access to Spark configuration properties (this may not be possible on some managed platforms)

You may also deploy additional components that work alongside the Xonai Accelerator to complement it.

Supported Spark Runtimes

The Xonai Accelerator is distributed as a set of JARs, one for each supported Spark runtime as shown in the following table:

Spark Version   Open Source   Amazon EMR      Databricks   Google Dataproc
3.1.1           ✓             6.3.X           -            -
3.1.2           ✓             6.4.0, 6.5.0    9.1 LTS      -
3.1.3           ✓             -               -            2.0.X
3.2.0           ✓             6.6.0           -            -
3.2.1           ✓             6.7.0           10.4 LTS     -
3.2.2           ✓             -               -            -
3.2.3           ✓             -               -            -
3.2.4           ✓             -               -            -
3.3.0           ✓             6.8.X, 6.9.X    11.3 LTS     -
3.3.1           ✓             6.10.X          -            -
3.3.2           ✓             6.11.X          12.2 LTS     2.1.X
3.3.3           ✓             -               -            -
3.4.0           ✓             6.12.X          -            -
3.4.1           ✓             -               13.3 LTS     -
3.4.2           ✓             -               -            -
3.4.3           ✓             -               -            -
3.5.0           ✓             -               14.3 LTS     2.2.X
3.5.1           ✓             -               -            -
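When deployments are automated, the runtime table can be encoded as a lookup that maps a vendor runtime release to its Spark version, which then determines the JAR to use. The following is a minimal sketch that assumes the table data above. The runtime keys ("emr", "databricks", "dataproc") and the function names are hypothetical. An "X" in a release pattern is treated as a wildcard for one version component.

```python
# Vendor runtime releases mapped to the Spark version they ship, transcribed
# from the supported runtimes table. Keys and names here are illustrative.
RUNTIME_SPARK_VERSIONS = {
    "emr": {
        "6.3.X": "3.1.1", "6.4.0": "3.1.2", "6.5.0": "3.1.2",
        "6.6.0": "3.2.0", "6.7.0": "3.2.1", "6.8.X": "3.3.0",
        "6.9.X": "3.3.0", "6.10.X": "3.3.1", "6.11.X": "3.3.2",
        "6.12.X": "3.4.0",
    },
    "databricks": {
        "9.1 LTS": "3.1.2", "10.4 LTS": "3.2.1", "11.3 LTS": "3.3.0",
        "12.2 LTS": "3.3.2", "13.3 LTS": "3.4.1", "14.3 LTS": "3.5.0",
    },
    "dataproc": {"2.0.X": "3.1.3", "2.1.X": "3.3.2", "2.2.X": "3.5.0"},
}


def _matches(pattern, release):
    """True if release matches pattern, where 'X' matches any one component."""
    p_parts, r_parts = pattern.split("."), release.split(".")
    if len(p_parts) != len(r_parts):
        return False
    return all(p == "X" or p == r for p, r in zip(p_parts, r_parts))


def spark_version_for(runtime, release):
    """Return the Spark version for a vendor runtime release, or None."""
    for pattern, spark_version in RUNTIME_SPARK_VERSIONS.get(runtime, {}).items():
        if _matches(pattern, release):
            return spark_version
    return None


print(spark_version_for("emr", "6.11.1"))           # 3.3.2
print(spark_version_for("databricks", "13.3 LTS"))  # 3.4.1
print(spark_version_for("emr", "7.0.0"))            # None (not in the table)
```

A result of None signals a runtime release that the table does not list.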

Each JAR uses the following naming convention:

xonai-spark-plugin-<package>-<runtime>-<release>-<channel>-<platform>-<arch>.jar

The value of each tag depends on your execution environment. The <package> tag takes one of the following values:

Package                    Label (<package>)
Open-source distribution   oss
Amazon EMR                 emr
Databricks                 dbx
Google Dataproc            gdp
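The naming convention can be expressed as a small helper that assembles a JAR file name from its tags. This is a sketch under stated assumptions. Only the package labels come from this page, and the helper name `jar_name` is hypothetical. The values of the other tags are not listed here, so the example passes the literal placeholders, which reproduces the documented pattern.

```python
# Package labels from the <package> table.
PACKAGE_LABELS = {
    "Open-source distribution": "oss",
    "Amazon EMR": "emr",
    "Databricks": "dbx",
    "Google Dataproc": "gdp",
}


def jar_name(package, runtime, release, channel, platform, arch):
    """Assemble a file name that follows the documented naming convention."""
    label = PACKAGE_LABELS[package]  # KeyError for an unknown package
    return (
        f"xonai-spark-plugin-{label}-{runtime}-{release}-"
        f"{channel}-{platform}-{arch}.jar"
    )


# Placeholders stand in for tag values that are not documented on this page.
print(jar_name("Databricks", "<runtime>", "<release>", "<channel>",
               "<platform>", "<arch>"))
```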

Supported Linux Distributions

The Xonai Accelerator works on any amd64 or arm64 Linux distribution based on glibc 2.17 or later, including but not limited to:

  • Debian >= 8

  • Ubuntu >= 14.04

  • Red Hat Enterprise Linux >= 7

  • CentOS >= 7
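A node can be checked against these requirements before the plugin is deployed. The following is a minimal sketch that uses Python's standard `platform` module. The function name and the set of accepted machine names are assumptions. On Linux, `platform.machine()` typically reports "x86_64" or "aarch64", so the sketch accepts those names as well as "amd64" and "arm64". The versions are compared as integer tuples, which means glibc 2.9 correctly counts as older than 2.17.

```python
import platform

MIN_GLIBC = (2, 17)
# Kernel-style and Debian-style names for the two supported architectures.
SUPPORTED_MACHINES = {"x86_64", "amd64", "aarch64", "arm64"}


def is_supported_host(system, libc_name, libc_version, machine):
    """Check the documented OS, glibc and CPU architecture requirements."""
    if system != "Linux" or libc_name != "glibc" or not libc_version:
        return False
    # Compare numerically: (2, 9) < (2, 17), unlike the strings "2.9" > "2.17".
    parts = tuple(int(p) for p in libc_version.split(".")[:2])
    return parts >= MIN_GLIBC and machine.lower() in SUPPORTED_MACHINES


if __name__ == "__main__":
    # libc_ver() returns ('', '') when the C library cannot be identified.
    libc_name, libc_version = platform.libc_ver()
    print(is_supported_host(platform.system(), libc_name, libc_version,
                            platform.machine()))
```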

Supported JDKs and Scala

The Xonai Accelerator is compatible with JDK 1.8, 11 and 17. All JARs are built with Scala 2.12.
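Java 8 reports its version in the legacy "1.8.0_NNN" form, whereas Java 11 and 17 report forms such as "11.0.21" or "17". A version check therefore has to normalize both forms to a major number. The following is a minimal sketch that works on the value of the `java.version` system property. The function names are hypothetical.

```python
import re

SUPPORTED_JDK_MAJORS = {8, 11, 17}


def java_major_version(version):
    """Return the major Java version from a java.version string."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized Java version: {version!r}")
    first, second = match.group(1), match.group(2)
    # Legacy scheme: "1.8.0_392" means Java 8.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def is_supported_jdk(version):
    return java_major_version(version) in SUPPORTED_JDK_MAJORS


print(java_major_version("1.8.0_392"), java_major_version("17.0.9"))  # 8 17
```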

Plugin Activation

The Xonai Accelerator can be activated on a per-application basis through Spark 3 configuration by adding the following options:

--jars <scheme>://<path>/xonai-spark-plugin-<package>-<runtime>-<release>-<channel>-<platform>-<arch>.jar
--conf spark.plugins=com.xonai.spark.SQLPlugin
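`spark.plugins` accepts a comma-separated list of plugin class names. If an application already sets this property, passing a second `--conf spark.plugins=...` replaces the earlier value instead of adding to it. The following is a minimal sketch of a launcher-side helper that merges the Xonai plugin into any existing value and builds the two options. The helper names and the JAR URI are hypothetical. If the application also passes `--jars`, its comma-separated list would need to be merged in the same way.

```python
XONAI_PLUGIN = "com.xonai.spark.SQLPlugin"


def with_xonai_plugin(existing_plugins=None):
    """Append the Xonai plugin to a comma-separated spark.plugins value."""
    plugins = [p.strip() for p in (existing_plugins or "").split(",") if p.strip()]
    if XONAI_PLUGIN not in plugins:
        plugins.append(XONAI_PLUGIN)
    return ",".join(plugins)


def activation_args(jar_uri, existing_plugins=None):
    """Build spark-submit arguments that activate the plugin."""
    return [
        "--jars", jar_uri,
        "--conf", f"spark.plugins={with_xonai_plugin(existing_plugins)}",
    ]


# Hypothetical JAR location and pre-existing plugin class.
print(activation_args("s3://my-bucket/jars/xonai.jar", "com.example.MyPlugin"))
```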

The Xonai Accelerator processes data in off-heap memory. To run optimally, it therefore requires moving a fraction of the executor memory (spark.executor.memory) to the spark.executor.memoryOverhead property.

For example, if an application has the following memory configuration:

--conf spark.executor.memory=30g
--conf spark.executor.memoryOverhead=5g

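It can be changed to the following, which moves 20g from the JVM heap to the memory overhead and keeps the combined total at 35g: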

--conf spark.executor.memory=10g
--conf spark.executor.memoryOverhead=25g

If the Xonai Accelerator supports the operations where memory usage peaks, a significant fraction of the JVM executor memory can be reassigned to the memory overhead property.
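The rebalancing in the example above is simple arithmetic: an amount is subtracted from spark.executor.memory and added to spark.executor.memoryOverhead, so the sum of the two stays the same. The following is a minimal sketch that handles only the "m" and "g" size suffixes. The helper names are hypothetical, and choosing how much to shift remains a tuning decision for each workload.

```python
import re


def parse_mib(size):
    """Parse a Spark memory string such as '30g' or '512m' into MiB."""
    match = re.fullmatch(r"(\d+)([mMgG])", size.strip())
    if match is None:
        raise ValueError(f"unsupported size: {size!r}")
    value, unit = int(match.group(1)), match.group(2).lower()
    return value * 1024 if unit == "g" else value


def format_mib(mib):
    """Render MiB as whole gibibytes when possible, otherwise as MiB."""
    return f"{mib // 1024}g" if mib % 1024 == 0 else f"{mib}m"


def shift_to_overhead(memory, overhead, shift):
    """Move `shift` from executor memory to overhead, preserving the total."""
    memory_mib, overhead_mib, shift_mib = map(parse_mib, (memory, overhead, shift))
    if shift_mib > memory_mib:
        raise ValueError("cannot shift more than spark.executor.memory")
    return {
        "spark.executor.memory": format_mib(memory_mib - shift_mib),
        "spark.executor.memoryOverhead": format_mib(overhead_mib + shift_mib),
    }


# Reproduces the example: 30g + 5g becomes 10g + 25g.
for key, value in shift_to_overhead("30g", "5g", "20g").items():
    print(f"--conf {key}={value}")
```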