Compatibility Reference

Apache Spark supports a large number of operations and data types for structured data processing. This page documents how the data transformation results produced by the Xonai Accelerator compare with those of Spark, and details the support status of each operation.

Compatibility of Results

The Xonai Accelerator produces the same data transformation results as Apache Spark, except in cases where Spark itself does not guarantee deterministic results or where the behavior is left to the implementation to define.

In practice, this means that inconsistency between results is already present in Spark itself, even if the user is unaware of it. The following sections document specific cases where this occurs but is nevertheless expected.

Floating Point Arithmetic

Floating-point calculations are never expected to be exact because of the fundamental limitation of representing continuous real numbers with a fixed number of bits.

The Xonai Accelerator supports the same 4-byte single-precision and 8-byte double-precision floating-point types as Spark, but the two engines may produce results with very small discrepancies because the order in which machine instructions are executed is implementation-defined.

For example, Spark itself may produce inconsistent results for the same application if the JDK version is changed, either because each JDK may emit floating-point instructions in a different order or because of changes in math-related builtins.
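The effect of evaluation order can be reproduced outside of Spark. The following is a minimal Python sketch, assuming a platform where Python floats are IEEE 754 doubles (the same width as Spark's DoubleType); the three values are arbitrary and chosen only for illustration.

```python
import math

# Three doubles added in two different orders.
a, b, c = 0.1, 0.2, 0.3

left_first = (a + b) + c    # add a and b first
right_first = a + (b + c)   # add b and c first

# Floating-point addition is not associative, so two mathematically
# equal expressions can differ in their last bits.
print(left_first == right_first)

# When validating results across engines, compare with a tolerance
# instead of testing for exact equality.
print(math.isclose(left_first, right_first, rel_tol=1e-9))
```

A tolerance-based comparison like the one in the last line is usually the appropriate way to check floating-point results produced by two different engines.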

When maintaining arithmetic precision is critical, such as in currency calculations, the Spark Decimal type should be used instead.
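As a rough illustration of why a decimal type avoids this problem, the sketch below uses Python's `decimal` module as a stand-in. It is not Spark's Decimal implementation, and the prices are made-up values, but the principle is the same: decimal fractions are stored exactly, so the order of addition does not change the sum.

```python
from decimal import Decimal

# Hypothetical currency amounts, stored as exact decimal values.
prices = [Decimal("0.10"), Decimal("0.20"), Decimal("0.30")]

# Sum the same values in two different orders.
forward = sum(prices, Decimal("0"))
backward = sum(reversed(prices), Decimal("0"))

# Both orders give exactly the same total.
print(forward == backward)
print(forward)
```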

Ordering of Results

Rows with the same sorting or grouping values may be returned in a different order than in default Spark, but the SQL standard is always adhered to.

This means that the order of aggregation results is inherently unspecified, because the order of elements in the underlying hash table is implementation-defined. The same applies to other operations, such as sort-merge join.
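When comparing aggregation output from two engines, it therefore helps to ignore row order. The following is a minimal Python sketch of such a comparison; the rows and the `same_rows` helper are hypothetical and only illustrate the idea.

```python
from collections import Counter

# Hypothetical output of the same GROUP BY query from two engines:
# identical rows, returned in a different order.
engine_a = [("apples", 3), ("pears", 5), ("plums", 2)]
engine_b = [("plums", 2), ("apples", 3), ("pears", 5)]

def same_rows(left, right):
    """Return True if both results hold the same rows, ignoring order.

    Counter keeps duplicates, so repeated rows must match in count too.
    """
    return Counter(left) == Counter(right)

print(engine_a == engine_b)           # order-sensitive comparison
print(same_rows(engine_a, engine_b))  # order-insensitive comparison
```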

When sorting, both Spark and the Xonai Accelerator comply with the SQL standard and do not guarantee a stable sort, meaning that rows with equal sort keys may be returned in a different order by different Spark engines.
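If a fully reproducible order is required, the sort key can be extended until no two rows tie. The Python sketch below illustrates this with made-up rows; Python's own sort is stable, so the two input orders here simply stand in for two engines that deliver tied rows in different orders.

```python
# Hypothetical (customer, amount) rows; "bob" and "alice" tie on amount.
rows_a = [("bob", 10), ("alice", 10), ("carol", 7)]
rows_b = [("alice", 10), ("carol", 7), ("bob", 10)]  # same rows, other order

# Sorting on amount alone: both outputs are valid, but the tied rows
# come out in whatever order they arrived.
by_amount_a = sorted(rows_a, key=lambda r: r[1])
by_amount_b = sorted(rows_b, key=lambda r: r[1])
print(by_amount_a == by_amount_b)

# Adding the customer name as a tie-breaker makes the order total,
# so every valid sort yields the same sequence.
total_a = sorted(rows_a, key=lambda r: (r[1], r[0]))
total_b = sorted(rows_b, key=lambda r: (r[1], r[0]))
print(total_a == total_b)
```

In SQL terms, this corresponds to listing additional columns in the ORDER BY clause until the combination is unique per row.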

Backporting Bug Fixes

When the Xonai Accelerator adds support for a new Spark release, it also backports bug fixes from that release, so that results-related bugs are fixed on previous Spark versions as well (see this as an example).


Operation Support

The Xonai Accelerator is regularly updated to support new operations, while components not yet supported will simply fall back to the default Spark execution engine.
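The fallback behavior can be pictured with a toy model. The Python sketch below is purely illustrative and is not how the accelerator is implemented: the `ACCELERATED` set and the `assign_engines` helper are hypothetical, and the operator names are borrowed from the tables on this page only to make the example concrete.

```python
# Hypothetical set of operators handled natively in this toy model.
ACCELERATED = {"FilterExec", "ProjectExec", "HashAggregateExec"}

def assign_engines(plan_nodes):
    """Map each plan node name to the engine that runs it in this model.

    Nodes in ACCELERATED go to the accelerator; everything else falls
    back to the default Spark execution engine.
    """
    return {
        node: "accelerator" if node in ACCELERATED else "spark"
        for node in plan_nodes
    }

# A made-up plan mixing supported and unsupported operators.
plan = ["FilterExec", "WindowExec", "ProjectExec"]
assignment = assign_engines(plan)
for node, engine in assignment.items():
    print(f"{node}: {engine}")
```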

This section documents the support status of every operation and is updated with every new release.

Support Status Symbols

The following table describes the meaning of each support status symbol.

Symbol

Description

Supported

Unsupported at the moment

Not applicable (type does not apply to the corresponding plan or expression)

Undetermined

Data Types

The following table lists each abbreviated type name used in the support tables, the corresponding Spark type, and a brief description of the type. See the official Spark documentation for more information.

| Type Name | Spark Type | Description |
|---|---|---|
| byte | ByteType | Represents 1-byte signed integer numbers. |
| short | ShortType | Represents 2-byte signed integer numbers. |
| int | IntegerType | Represents 4-byte signed integer numbers. |
| long | LongType | Represents 8-byte signed integer numbers. |
| float | FloatType | Represents 4-byte single-precision floating point numbers. |
| double | DoubleType | Represents 8-byte double-precision floating point numbers. |
| decimal | DecimalType | Represents arbitrary-precision signed decimal numbers. |
| string | StringType, VarcharType, CharType | Represents character string values. |
| bin | BinaryType | Represents byte sequence values. |
| bool | BooleanType | Represents boolean values. |
| tstamp | TimestampType | Represents values with year, month, day, hour, minute, and second fields. |
| date | DateType | Represents values with year, month and day fields, without a time zone. |
| calendar | CalendarIntervalType | Represents calendar intervals. |
| array | ArrayType | Represents a sequence of elements of a specific type. |
| map | MapType | Represents a set of key-value pairs. |
| struct | StructType | Represents a sequence of named fields. |
| udt | - | User-defined types and Java objects (non-standard SQL types). |

SparkPlan or Executor Nodes

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| AggregateInPandasExec | To be supported soon |
| ArrowEvalPythonExec | To be supported soon |
| BroadcastExchangeExec | Input/Result |
| BroadcastHashJoinExec | Input 1, Input 2, Result |
| BroadcastNestedLoopJoinExec | To be supported soon |
| CartesianProductExec | To be supported soon |
| CoalesceExec | To be supported soon |
| CollectLimitExec | To be supported soon |
| DataWritingCommandExec | To be supported soon |
| ExpandExec | Input/Result |
| FilterExec | Input/Result |
| FlatMapGroupsInPandasExec | To be supported soon |
| GenerateExec | Input/Result |
| GlobalLimitExec | To be supported soon |
| HashAggregateExec | Input/Result |
| InMemoryTableScanExec | To be supported soon |
| LocalLimitExec | To be supported soon |
| MapInPandasExec | To be supported soon |
| ObjectHashAggregateExec | Input/Result |
| ProjectExec | Input/Result |
| RangeExec | To be supported soon |
| SampleExec | To be supported soon |
| ShuffleExchangeExec | Input/Result |
| ShuffledHashJoinExec | Input 1, Input 2, Result |
| SortAggregateExec | Input/Result |
| SortExec | Input/Result |
| SortMergeJoinExec | Input 1, Input 2, Result |
| TakeOrderedAndProjectExec | Input/Result |
| UnionExec | Input/Result |
| WindowExec | To be supported soon |
| WindowInPandasExec | To be supported soon |

Expressions and SQL Functions

Aggregate Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| ApproxCountDistinctForIntervals | To be supported soon |
| ApproximatePercentile | To be supported soon |
| Average | Input, Result |
| BitAndAgg | Input/Result |
| BitOrAgg | Input/Result |
| BitXorAgg | Input/Result |
| StddevPop | Input/Result |
| StddevSamp | Input/Result |
| VariancePop | Input/Result |
| VarianceSamp | Input/Result |
| Skewness | Input/Result |
| Kurtosis | Input/Result |
| CollectList | To be supported soon |
| CollectSet | To be supported soon |
| Corr | To be supported soon |
| Count | Input, Result |
| CountIf | To be supported soon |
| CountMinSketchAgg | To be supported soon |
| CovPopulation | Input 1, Input 2, Result |
| CovSample | Input 1, Input 2, Result |
| First | Input/Result |
| HyperLogLogPlusPlus | To be supported soon |
| AggregateExpression | aggFunc, filter, Result |
| Last | To be supported soon |
| Max | Input/Result |
| MaxBy | To be supported soon |
| MinBy | To be supported soon |
| Min | Input/Result |
| Percentile | To be supported soon |
| PivotFirst | To be supported soon |
| Sum | Input, Result |

spark-alchemy Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| HyperLogLogInitSimpleAgg | Input/Result |
| HyperLogLogCardinality | Input, Result |

Arithmetic Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| Abs | Input/Result |
| UnaryPositive | Input/Result |
| UnaryMinus | Input/Result |
| Add | Input 1, Input 2, Result |
| Subtract | Input 1, Input 2, Result |
| Multiply | Input 1, Input 2, Result |
| Divide | Input 1, Input 2, Result |
| IntegralDivide | Input 1, Input 2, Result |
| Remainder | Input 1, Input 2, Result |
| Pmod | Input 1, Input 2, Result |

Array Type Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| CreateArray | To be supported soon |
| GetArrayItem | To be supported soon |
| Concat | To be supported soon |
| Sequence | To be supported soon |
| Slice | To be supported soon |
| Flatten | To be supported soon |
| Shuffle | To be supported soon |
| Reverse | To be supported soon |
| SortArray | To be supported soon |
| ArraysZip | To be supported soon |
| ArrayContains | To be supported soon |
| ArraysOverlap | To be supported soon |
| ArrayJoin | To be supported soon |
| ArrayMin | To be supported soon |
| ArrayMax | To be supported soon |
| ArrayPosition | To be supported soon |
| ArrayRepeat | To be supported soon |
| ArrayRemove | To be supported soon |
| ArrayDistinct | To be supported soon |
| ArrayUnion | To be supported soon |
| ArrayIntersect | To be supported soon |
| ArrayExcept | To be supported soon |
| GetArrayStructFields | To be supported soon |
| ArrayTransform | To be supported soon |
| ArraySort | To be supported soon |
| ArrayFilter | To be supported soon |
| ArrayExists | To be supported soon |
| ArrayForAll | To be supported soon |
| ArrayAggregate | To be supported soon |
| ZipWith | To be supported soon |

Bitwise Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| BitwiseAnd | Input 1, Input 2, Result |
| BitwiseOr | Input 1, Input 2, Result |
| BitwiseXor | Input 1, Input 2, Result |
| BitwiseNot | Input/Result |
| BitwiseCount | To be supported soon |

Core Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| Cast | Input/Result |

Collection Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| Size | Input, Result |
| ElementAt | To be supported soon |

Conditional Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| If | predicate, trueValue, falseValue, Result |
| CaseWhen | else, then, Result |

Constraint Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| KnownNotNull | To be supported soon |

CSV Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| CsvToStructs | To be supported soon |
| SchemaOfCsv | To be supported soon |
| StructsToCsv | To be supported soon |

Datetime Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| CurrentTimeZone | To be supported soon |
| CurrentDate | To be supported soon |
| CurrentTimestamp | To be supported soon |
| Now | To be supported soon |
| CurrentBatchTimestamp | To be supported soon |
| DateAdd | Input 1, Input 2, Result |
| DateSub | Input 1, Input 2, Result |
| Hour | To be supported soon |
| Minute | To be supported soon |
| Second | To be supported soon |
| SecondWithFraction | To be supported soon |
| DayOfYear | Input, Result |
| DateFromUnixDate | Input, Result |
| UnixDate | Input, Result |
| SecondsToTimestamp | Input, Result |
| MillisToTimestamp | Input, Result |
| MicrosToTimestamp | Input, Result |
| UnixSeconds | Input, Result |
| UnixMillis | Input, Result |
| UnixMicros | Input, Result |
| Year | Input, Result |
| YearOfWeek | To be supported soon |
| Quarter | To be supported soon |
| Month | Input, Result |
| DayOfMonth | To be supported soon |
| DayOfWeek | Input, Result |
| WeekDay | Input, Result |
| WeekOfYear | To be supported soon |
| DateFormatClass | To be supported soon |
| ToUnixTimestamp | To be supported soon |
| UnixTimestamp | To be supported soon |
| FromUnixTime | To be supported soon |
| LastDay | Input/Result |
| NextDay | To be supported soon |
| TimeAdd | To be supported soon |
| DatetimeSub | To be supported soon |
| DateAddInterval | To be supported soon |
| FromUTCTimestamp | To be supported soon |
| ToUTCTimestamp | To be supported soon |
| AddMonths | Input 1, Input 2, Result |
| MonthsBetween | To be supported soon |
| ParseToDate | To be supported soon |
| ParseToTimestamp | To be supported soon |
| TruncDate | To be supported soon |
| TruncTimestamp | To be supported soon |
| DateDiff | Input 1, Input 2, Result |
| MakeDate | To be supported soon |
| MakeTimestamp | To be supported soon |
| Extract | To be supported soon |
| SubtractTimestamps | To be supported soon |
| SubtractDates | To be supported soon |

Decimal Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| UnscaledValue | To be supported soon |
| MakeDecimal | To be supported soon |
| CheckOverflow | To be supported soon |
| CheckOverflowInSum | To be supported soon |

Generator Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| UserDefinedGenerator | To be supported soon |
| Stack | To be supported soon |
| ReplicateRows | To be supported soon |
| GeneratorOuter | To be supported soon |
| Explode | Input, Result |
| PosExplode | To be supported soon |
| Inline | To be supported soon |

Hash Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| Md5 | To be supported soon |
| Sha2 | To be supported soon |
| Sha1 | To be supported soon |
| Crc32 | To be supported soon |
| Murmur3Hash | Input, Result |
| XxHash64 | Input, Result |
| HiveHash | To be supported soon |

Input File Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| InputFileName | To be supported soon |
| InputFileBlockStart | To be supported soon |
| InputFileBlockLength | To be supported soon |

Interval Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| ExtractIntervalYears | To be supported soon |
| ExtractIntervalMonths | To be supported soon |
| ExtractIntervalDays | To be supported soon |
| ExtractIntervalHours | To be supported soon |
| ExtractIntervalMinutes | To be supported soon |
| ExtractIntervalSeconds | To be supported soon |
| MultiplyInterval | To be supported soon |
| DivideInterval | To be supported soon |
| MakeInterval | To be supported soon |

JSON Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| GetJsonObject | To be supported soon |
| JsonTuple | To be supported soon |
| JsonToStructs | To be supported soon |
| StructsToJson | To be supported soon |
| SchemaOfJson | To be supported soon |
| LengthOfJsonArray | To be supported soon |
| JsonObjectKeys | To be supported soon |

Lambda Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| NamedLambdaVariable | To be supported soon |
| LambdaFunction | To be supported soon |

Map Type Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| CreateMap | To be supported soon |
| GetMapValue | To be supported soon |
| MapFromArrays | To be supported soon |
| StringToMap | To be supported soon |
| MapKeys | To be supported soon |
| MapValues | To be supported soon |
| MapEntries | To be supported soon |
| MapConcat | To be supported soon |
| MapFromEntries | To be supported soon |
| MapFilter | To be supported soon |
| TransformKeys | To be supported soon |
| TransformValues | To be supported soon |
| MapZipWith | To be supported soon |

Math Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| Acos | To be supported soon |
| Asin | To be supported soon |
| Atan | Input/Result |
| Cbrt | To be supported soon |
| Ceil | Input, Result |
| Cos | Input/Result |
| Cosh | To be supported soon |
| Acosh | To be supported soon |
| Conv | To be supported soon |
| Exp | Input/Result |
| Expm1 | Input/Result |
| Floor | Input, Result |
| Factorial | Input, Result |
| Log | Input/Result |
| Log2 | Input/Result |
| Log10 | Input/Result |
| Log1p | Input/Result |
| Rint | Input/Result |
| Signum | Input/Result |
| Sin | Input/Result |
| Sinh | To be supported soon |
| Asinh | Input/Result |
| Sqrt | Input/Result |
| Tan | To be supported soon |
| Cot | To be supported soon |
| Tanh | Input/Result |
| Atanh | Input/Result |
| ToDegrees | Input/Result |
| ToRadians | Input/Result |
| Bin | To be supported soon |
| Hex | To be supported soon |
| Unhex | To be supported soon |
| Atan2 | Input 1, Input 2, Result |
| Pow | Input 1, Input 2, Result |
| ShiftLeft | Input 1, Input 2, Result |
| ShiftRight | Input 1, Input 2, Result |
| ShiftRightUnsigned | Input 1, Input 2, Result |
| Hypot | To be supported soon |
| Logarithm | Input 1, Input 2, Result |
| Round | To be supported soon |
| BRound | To be supported soon |
| WidthBucket | To be supported soon |
| NormalizeNaNAndZero | Input/Result |

Miscellaneous Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| SparkVersion | To be supported soon |
| SparkPartitionID | To be supported soon |
| TypeOf | To be supported soon |
| Uuid | To be supported soon |
| Rand | To be supported soon |
| Randn | To be supported soon |
| MonotonicallyIncreasingID | To be supported soon |
| PrintToStderr | To be supported soon |
| RaiseError | To be supported soon |
| AssertTrue | To be supported soon |
| PythonUDF | To be supported soon |
| ScalaUDF | To be supported soon |

Null Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| Coalesce | Input/Result |
| NullIf | To be supported soon |
| Nvl | To be supported soon |
| Nvl2 | To be supported soon |
| IsNaN | Input, Result |
| NaNvl | Input 1, Input 2, Result |
| IsNull | Input, Result |
| IsNotNull | Input, Result |
| AtLeastNNonNulls | To be supported soon |

Ordering Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| SortOrder | child, sameOrderExpressions, Result |
| Least | To be supported soon |
| Greatest | To be supported soon |

Predicate Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| Not | Input/Result |
| InSubquery | To be supported soon |
| InSet | Input, Result |
| And | Input 1, Input 2, Result |
| Or | Input 1, Input 2, Result |
| EqualTo | Input 1, Input 2, Result |
| EqualNullSafe | Input 1, Input 2, Result |
| LessThan | Input 1, Input 2, Result |
| LessThanOrEqual | Input 1, Input 2, Result |
| GreaterThan | Input 1, Input 2, Result |
| GreaterThanOrEqual | Input 1, Input 2, Result |
| In | value, list, Result |

Regex Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| Like | Input 1, Input 2, Result |
| LikeAll | To be supported soon |
| NotLikeAll | To be supported soon |
| LikeAny | To be supported soon |
| NotLikeAny | To be supported soon |
| RLike | To be supported soon |
| StringSplit | To be supported soon |
| RegExpReplace | To be supported soon |
| RegExpExtract | To be supported soon |
| RegExpExtractAll | To be supported soon |

String Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| ConcatWs | sep, strings, Result |
| Elt | To be supported soon |
| Upper | Input/Result |
| Lower | Input/Result |
| Contains | Input 1, Input 2, Result |
| StartsWith | Input 1, Input 2, Result |
| EndsWith | Input 1, Input 2, Result |
| StringReplace | To be supported soon |
| Overlay | To be supported soon |
| StringTranslate | To be supported soon |
| FindInSet | To be supported soon |
| StringTrim | To be supported soon |
| StringTrimLeft | To be supported soon |
| StringTrimRight | To be supported soon |
| StringInstr | Input 1, Input 2, Result |
| SubstringIndex | To be supported soon |
| StringLocate | To be supported soon |
| StringLPad | To be supported soon |
| StringRPad | To be supported soon |
| ParseUrl | To be supported soon |
| FormatString | To be supported soon |
| InitCap | To be supported soon |
| StringRepeat | To be supported soon |
| StringSpace | Input, Result |
| Substring | str, pos, len, Result |
| Right | To be supported soon |
| Left | To be supported soon |
| Length | Input, Result |
| BitLength | Input, Result |
| OctetLength | Input, Result |
| Levenshtein | To be supported soon |
| SoundEx | To be supported soon |
| Ascii | Input, Result |
| Chr | Input, Result |
| Base64 | To be supported soon |
| UnBase64 | To be supported soon |
| Decode | To be supported soon |
| Encode | To be supported soon |
| FormatNumber | To be supported soon |
| Sentences | To be supported soon |

Struct Type Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| CreateNamedStruct | To be supported soon |
| GetStructField | Input, Result |

Window Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| RowNumber | To be supported soon |
| CumeDist | To be supported soon |
| NthValue | To be supported soon |
| NTile | To be supported soon |
| Rank | To be supported soon |
| DenseRank | To be supported soon |
| PercentRank | To be supported soon |
| PreciseTimestampConversion | To be supported soon |

XML Expressions

Type columns: Numeric Types (byte, short, int, long, float, double, decimal); Misc. Types (string, bin, bool, null); Date/Time Types (tstamp, date, calendar); Complex Types (array, map, struct, udt).

| Expression | Param(s) |
|---|---|
| XPathBoolean | To be supported soon |
| XPathShort | To be supported soon |
| XPathInt | To be supported soon |
| XPathLong | To be supported soon |
| XPathFloat | To be supported soon |
| XPathDouble | To be supported soon |
| XPathString | To be supported soon |
| XPathList | To be supported soon |