Overview

Xonai helps data-driven organizations drastically reduce the cost of petabyte-scale Apache Spark pipelines without requiring changes to code, infrastructure or platform.

Xonai addresses the demand for reducing ballooning data infrastructure expenses with technology that is both effective and non-disruptive to established environments and workflows. It does so by maintaining full API compatibility with Apache Spark and accelerating data processing on commodity hardware available in public and private cloud environments, such as Intel, AMD and ARM processors.

Xonai Accelerator

The Xonai Accelerator is the core component that reduces the cost of resource-intensive Spark applications. It is a plugin for Apache Spark 3 that is activated through Spark properties.

Activating the Xonai Accelerator in a Spark application takes nothing more than the following job submission:

spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --jars <scheme>://<path>/xonai-spark-plugin-<package>-<runtime>-<release>-<channel>-<platform>-<arch>.jar \
  --conf spark.plugins=com.xonai.spark.SQLPlugin \
  $SPARK_HOME/examples/jars/spark-examples.jar 1000

The Xonai Accelerator JAR and plugin class are set in the job submission, and any memory configuration is passed the same way as additional Spark properties. The process is the same regardless of the Spark runtime (open-source, Databricks, EMR and others).
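For scripted submissions, the same command can be assembled programmatically. The following is a minimal sketch in Python, not part of the Xonai tooling: the helper `build_submit_command` and both JAR paths are hypothetical placeholders, and `spark.executor.memory` is a standard Spark property used here only to illustrate passing extra configuration (this page does not name the memory properties the plugin needs).

```python
import shlex

# Plugin class taken from the spark-submit example above.
XONAI_PLUGIN_CLASS = "com.xonai.spark.SQLPlugin"


def build_submit_command(plugin_jar, app_jar, main_class, app_args=(), extra_conf=None):
    """Return a spark-submit argument list with the plugin activated.

    Hypothetical helper for illustration; it only assembles the arguments.
    """
    conf = {"spark.plugins": XONAI_PLUGIN_CLASS}
    conf.update(extra_conf or {})

    cmd = ["spark-submit", "--class", main_class, "--jars", plugin_jar]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_jar)
    cmd += [str(arg) for arg in app_args]
    return cmd


cmd = build_submit_command(
    plugin_jar="/path/to/xonai-spark-plugin.jar",  # placeholder path
    app_jar="/path/to/spark-examples.jar",         # placeholder path
    main_class="org.apache.spark.examples.SparkPi",
    app_args=[1000],
    extra_conf={"spark.executor.memory": "8g"},    # illustrative value only
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run` or to a scheduler that expects an argument vector.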

The Xonai Accelerator prefixes the physical plan nodes it supports with Xon, and those nodes are the ones that run accelerated. A simple example with TPC-H Q6:

== Physical Plan ==
AdaptiveSparkPlan (27)
+- == Final Plan ==
   ColumnarToRow (18)
   +- * XonHashAggregate (15)
      +- XonCoalesceBatches (13)
         +- ShuffleQueryStage (11)
            +- XonExchange (10)
               +- XonCoalesceBatches (9)
                  +- * XonHashAggregate (6)
                     +- * XonProject (5)
                        +- * XonFilter (4)
                           +- XonParquetScan (2)
                              +- Scan parquet (1)
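To check how much of a plan is accelerated, the node names can be extracted from the plan text and split by the Xon prefix. The sketch below is a hedged illustration in Python: it assumes the plan has been captured as a string (for example from the output of `EXPLAIN FORMATTED`), and the helper names `plan_nodes` and `split_accelerated` are hypothetical, not part of any Xonai or Spark API.

```python
import re

# Plan text copied from the TPC-H Q6 example above.
PLAN = """\
== Physical Plan ==
AdaptiveSparkPlan (27)
+- == Final Plan ==
   ColumnarToRow (18)
   +- * XonHashAggregate (15)
      +- XonCoalesceBatches (13)
         +- ShuffleQueryStage (11)
            +- XonExchange (10)
               +- XonCoalesceBatches (9)
                  +- * XonHashAggregate (6)
                     +- * XonProject (5)
                        +- * XonFilter (4)
                           +- XonParquetScan (2)
                              +- Scan parquet (1)
"""

# A node line ends with "<name> (<id>)"; header lines such as
# "== Final Plan ==" carry no id and are skipped.
NODE_RE = re.compile(r"([A-Za-z][A-Za-z ]*?)\s*\((\d+)\)\s*$")


def plan_nodes(plan_text):
    """Return (name, id) pairs for every node line in the plan text."""
    nodes = []
    for line in plan_text.splitlines():
        match = NODE_RE.search(line)
        if match:
            nodes.append((match.group(1), int(match.group(2))))
    return nodes


def split_accelerated(nodes):
    """Split nodes into those prefixed with Xon and the rest."""
    accelerated = [n for n in nodes if n[0].startswith("Xon")]
    other = [n for n in nodes if not n[0].startswith("Xon")]
    return accelerated, other


accelerated, other = split_accelerated(plan_nodes(PLAN))
print("accelerated:", [name for name, _ in accelerated])
print("not accelerated:", [name for name, _ in other])
```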

Reduced Spark Costs

Xonai establishes a new benchmark in petabyte-scale ETL with technology that taps into the full potential of the hardware.

Rather than following the common legacy approach of a runtime with faster implementations of individual plan steps, Xonai uses a new type of compiler and a unified DSL designed from the ground up. Together they eliminate the many layers of indirection inherent in data analytics by combining steps into a minimal set of optimized kernels, which maximizes throughput.

The primary features behind reducing the cost of petabyte-scale ETL by up to 80% are summarized below.

Query Plan Fusion

Query plan operators (e.g. Filters, Aggregates, Joins) are represented in a canonical loop format in our internal DSL, which makes it possible to fuse them into a minimal set of optimized kernels. This eliminates the inefficiencies of running individual operations in sequence, where data is moved between steps and the steps cannot be optimized together.

The benefit of this core optimization is particularly pronounced in complex stages with many operations, as these are transformed into a few vectorized loops.
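The idea of fusion can be illustrated with a toy filter, project and aggregate pipeline shaped like TPC-H Q6. This is a conceptual sketch in plain Python and not how the Xonai compiler or DSL is implemented: the data, the integer units (discount in percent, price in cents) and the function names are assumptions made for the example.

```python
# Each row: (quantity, discount_percent, price_cents). Made-up sample data.
ROWS = [
    (10, 6, 10000),
    (30, 6, 20000),   # rejected by the quantity predicate
    (5, 5, 40000),
    (20, 10, 5000),   # rejected by the discount predicate
]


def unfused(rows):
    """Run each operator separately, materializing an intermediate per step."""
    filtered = [r for r in rows if r[0] < 24 and 5 <= r[1] <= 7]   # Filter
    projected = [price * disc for (_, disc, price) in filtered]    # Project
    return sum(projected)                                          # Aggregate


def fused(rows):
    """Run the same three operators in a single loop with no intermediates."""
    total = 0
    for qty, disc, price in rows:
        if qty < 24 and 5 <= disc <= 7:
            total += price * disc
    return total


print(unfused(ROWS), fused(ROWS))
```

Both functions return the same value. The fused version reads each row once and keeps only a running total, which is the property that lets a compiler turn a multi-operator stage into one tight loop.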

Up to 4X Faster Aggregations

The Xonai Accelerator has matured to handle the most resource-intensive applications, which aggregate hundreds of billions of rows, typically use complex aggregate functions, are memory-intensive and may spill data to disk.

Acceleration is delivered by a fast, compiler-generated vectorized loop, which also benefits from the plan fusion described above because preceding steps may be combined into the aggregate loop.

Memory usage is reduced by repartitioning and sorting data in a fast, cache-efficient way at the machine instruction level while aggregating. This replaces the default Spark shuffle repartitioning and sort manager while guaranteeing the same order of results.

Activating the Xonai Shuffle Manager can reduce memory usage by up to 50% and improves performance through integrated aggregate repartitioning and sorting, while greatly reducing the likelihood of spilling to disk.
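As a rough intuition for combining aggregation with repartitioning and sorting, the toy sketch below assigns each key to a partition in the same pass that accumulates its sum, then emits every partition in key order. It is only an illustration under simplifying assumptions (integer keys, modulo partitioning, a hypothetical `partitioned_sum` helper) and does not reflect the machine-level implementation described above.

```python
from collections import defaultdict


def partitioned_sum(rows, num_partitions):
    """Sum values per key while bucketing keys into partitions in one pass."""
    partitions = [defaultdict(int) for _ in range(num_partitions)]
    for key, value in rows:
        # Partition assignment and aggregation happen together, so no
        # separate shuffle step has to hold the unaggregated rows.
        partitions[key % num_partitions][key] += value
    # Emit each partition with its keys in sorted order.
    return [sorted(part.items()) for part in partitions]


rows = [(7, 1), (2, 5), (7, 3), (4, 2), (2, 1), (9, 4)]
result = partitioned_sum(rows, num_partitions=2)
print(result)
```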

Up to 6X Faster Cache Serializer

The Xonai Accelerator enables a fast cache serializer by default, measured to be up to 6X faster in production environments than the default Spark cache mechanism. The lz4 compression scheme is used by default, as in Spark; zstd and uncompressed schemes are also available and deliver more performance at the cost of more memory.

Last update: Jun 17, 2025