table functions. A scalar function takes a set of values and returns a single value (or NULL), an aggregate function takes a column of values and returns a single value, whereas a table function returns a (relational) table comprising zero or more rows, each with one or more columns. SDF allows you to define almost any imaginable user-defined function. The following sections describe how to add user-defined function signatures.
Scalar functions
SDF distinguishes scalar functions along several dimensions:

- They can have fixed or varying arity, i.e., number of parameters.
- They can have one definition or be overloaded, i.e., have many definitions.
- They can be monomorphic, i.e., defined for one type only, or polymorphic, i.e., defined to operate on all types.
- They can be first-order, i.e., take only data as parameters, or higher-order, i.e., also take functions as parameters.

We explain each of these concepts and the authoring of the corresponding function signature in turn.
Fixed Arity
The function `substr` is a typical scalar function. It takes two arguments of the types `varchar` and `int`, respectively. We say `substr` has fixed arity two. The expression `substr(text, start)` returns a `varchar`.

In SDF we write `substr`’s function signature as follows:
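(The YAML below is a minimal sketch, assuming a layout with `name`, `kind`, `parameters`, and `returns` fields; these field names are our illustration, not necessarily the exact SDF schema.)

```yaml
# Hypothetical sketch of a fixed-arity scalar signature.
function:
  name: substr
  kind: scalar          # the function-kind field
  parameters:
    - datatype: varchar # the input string
    - datatype: int     # the start position
  returns:
    datatype: varchar
```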
The function-kind field distinguishes scalar functions from `aggregate` and `table` functions, which we will introduce later.
Overloading
The function `substr` is overloaded. You can call it with two parameters, as we saw above, or with three parameters, e.g. `substr(string, start, length)`. To allow for a second definition of `substr`, you have to declare one of them to be an `overload`:
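(As before, the exact YAML layout is an assumption; only the `overload` flag discussed here is taken from the text.)

```yaml
# Hypothetical sketch of an overloaded definition of substr.
function:
  name: substr
  kind: scalar
  overload: true        # marks this as an additional definition of substr
  parameters:
    - datatype: varchar # the input string
    - datatype: int     # the start position
    - datatype: int     # the length of the substring
  returns:
    datatype: varchar
```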
The default value of `overload` is false.
Uniform Variadic Arity
Many functions in SQL can take an arbitrary number of arguments. For instance, `concat` concatenates an arbitrary number of strings. We say `concat` is uniform variadic, that is, the function can have one or more parameters, but all of them must have the same type. In SDF we write:
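(A sketch under the same assumed layout; how SDF actually marks variadic parameters is not shown in this excerpt, so the `variadic` field below is illustrative.)

```yaml
# Hypothetical sketch of a uniform variadic signature.
function:
  name: concat
  kind: scalar
  variadic: uniform     # one or more parameters, all of the same type
  parameters:
    - datatype: varchar
  returns:
    datatype: varchar
```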
Generic types and functions
Modern SQL does not only have primitive types like `int` and `varchar`, but also composite types like `array`, `map`, and `struct`. These composite types are generic, so we can build arrays of ints, arrays of varchars, or arrays of arrays of ints, etc. SDF denotes the type of

- arrays of element type T as `array<T>`, where T is either a concrete type, like `varchar`, or a type variable, which we denote here by `$1`, `$2`, etc.
- generic maps of key type T to value type S as `map<T, S>`, where T and S can either be concrete types or type variables.
- named structs of n fields using the syntax `struct<x1 T1, …, xn Tn>`, where each Ti can either be a concrete type or a type variable. Some SQL variants, like Spark, allow dropping the names xi.

Here, for example, is an overload of `concat` that allows concatenating arrays of arbitrary but fixed element type:
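(Another illustrative sketch; the type expression `array<$1>` follows the notation above, while the surrounding YAML fields remain assumptions.)

```yaml
# Hypothetical sketch of a generic, uniform variadic overload of concat.
function:
  name: concat
  kind: scalar
  overload: true
  variadic: uniform
  parameters:
    - datatype: array<$1>   # $1 is a type variable shared by all arguments
  returns:
    datatype: array<$1>
```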
Higher-order functions
Trino and Spark support higher-order function applications. As an example, Trino’s `transform` function applies a function pointwise to an array, e.g., the expression `transform(array[1,2,3], x -> x*x)` returns in Trino the value `array[1,4,9]`. In SDF we capture the signature of such a function via the `lambda<A, R>` type constructor, where A stands for the argument type and R for the return type. The function `transform` thus has the following type signature:
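(The `lambda<A, R>` notation is from the text; the YAML framing around it is again an assumption.)

```yaml
# Hypothetical sketch of a higher-order signature for transform.
function:
  name: transform
  kind: scalar
  parameters:
    - datatype: array<$1>        # the input array
    - datatype: lambda<$1, $2>   # the function applied to each element
  returns:
    datatype: array<$2>
```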
Trino’s `array_sort` function, for instance, takes an array as the first parameter and a comparison function as the second.
Bounded types
Sometimes functions don’t work over all types, but can only be applied to a limited set of types. We call this set the type parameter’s bounds. The `abs` function, for instance, can only be applied to numeric types and to intervals. So we define `abs` as a type-parameter-bounded function:
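(How SDF spells out the bound itself is not shown in this excerpt, so the sketch below only records the bound as a comment; all field names are illustrative.)

```yaml
# Hypothetical sketch of a type-parameter-bounded signature.
# Assumed constraint: $1 may only be instantiated with a numeric type or an interval.
function:
  name: abs
  kind: scalar
  parameters:
    - datatype: $1
  returns:
    datatype: $1
```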
Aggregate and window functions
Aggregate functions take a column of values and return a scalar value. For instance, `select sum(x) from values (1),(2),(3) T(x)` computes the sum of the `x` column, here `6`. While the implementation might be non-trivial, the signature of an aggregate function is easy: we just specify the function kind as aggregate, and the type of the aggregate function is its column type. Thus for `sum` we get the trivial signature:
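(A sketch assuming the same layout, with the function kind switched to aggregate; the choice of `bigint` as the column type is illustrative.)

```yaml
# Hypothetical sketch of an aggregate signature.
function:
  name: sum
  kind: aggregate        # function kind set to aggregate instead of scalar
  parameters:
    - datatype: bigint   # the type of the aggregated column
  returns:
    datatype: bigint
```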
Further features
Dialect
All of SDF’s function definitions are dialect-specific. We specify the dialect as part of the function signature. So to define `substr` as available for Trino, we write:
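(The `dialect` field reflects the text; the rest of the layout remains an assumption.)

```yaml
# Hypothetical sketch of a dialect-specific signature.
function:
  name: substr
  kind: scalar
  dialect: trino        # this definition applies to the Trino dialect
  parameters:
    - datatype: varchar
    - datatype: int
  returns:
    datatype: varchar
```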
Volatility
SDF distinguishes three kinds of functions:

- Pure functions, like `substr(..)`, return the same output for the same input.
- Stable functions, like `now()`, return the same output within one query execution.
- Non-deterministic functions, like `first()`, might return a different output for the same input.
The default volatility is `pure`. Knowing the volatility is key for applying optimizations, like common subexpression elimination.
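(A sketch showing where such a volatility declaration could live; the `volatility` field name and the return type are assumptions.)

```yaml
# Hypothetical sketch of a signature with an explicit volatility.
function:
  name: now
  kind: scalar
  dialect: trino
  volatility: stable    # pure | stable | non-deterministic; pure is the default
  parameters: []
  returns:
    datatype: timestamp
```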
Non-uniform variadic functions
Spark offers several functions that are variadic but not uniform. For instance, `map(1, 'a', 2, 'b', ...)` builds a `map<int, varchar>`. On closer inspection we observe that every odd position (a key) has one type and every even position (a value) another. In SDF we specify `map` to be of type:
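(How SDF really encodes the key/value alternation is not shown here; the sketch below just pairs the two alternating types, with field names assumed.)

```yaml
# Hypothetical sketch of a non-uniform variadic signature for Spark's map.
function:
  name: map
  kind: scalar
  dialect: spark
  variadic: non-uniform   # arguments alternate between the two parameter types
  parameters:
    - datatype: $1        # keys: odd argument positions
    - datatype: $2        # values: even argument positions
  returns:
    datatype: map<$1, $2>
```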
Three-Part Function Names
SDF supports multi-part function names by specifying the catalog and schema these functions belong to in the name. The result is a fully qualified name for the function. This pattern is often seen in Snowflake, where functions are materialized to a database and schema.
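(The fully qualified name below, `my_db.my_schema.my_udf`, is a made-up example, and the YAML layout is the same assumed one as above.)

```yaml
# Hypothetical sketch of a three-part function name.
function:
  name: my_db.my_schema.my_udf   # catalog.schema.function
  kind: scalar
  dialect: snowflake
  parameters:
    - datatype: varchar
  returns:
    datatype: varchar
```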