SDF allows you to author user defined functions using function signatures in sdf.yml blocks. Types are used to statically analyze SQL, provide label propagation and lineage computation.
substr
is a typical scalar function. It takes two arguments of the types varchar
and int
, respectively.
We say substr
has fixed arity two. The expression substr(text,start)
returns a varchar
.
In SDF we write substr
’s function signature as follows:
function-kind
field.
aggregate
, and table
functions, which we will introduce later.
substr
is overloaded. You can call it with two parameters, as we saw above, or with three parameters, e.g. substr(string, start, length)
.
To allow for a second defintion of substr
you have to declare one of them to be an overload
.
overload
is false.
concat
allows to concat an arbitrary number strings.
We say concat
is uniform variadic, that is the function can have one or more parameters but all must have the same type. In SDF we write:
int
and varchar
, but also composite types like array
, map
, and struct
. These composite types are generic,
so that we can build arrays of ints, arrays of varchar or array of array of ints, etc.array\<
T\>
where T is either a concrete type, like varchar
, or a type variable, which we denote
here by $1
, $2
, etc.map\<
T,
S\>
, where T and S can either be concrete types or type variables.struct\<x
1 T
1, … x
n T
n\>
, where each T
i can either be concrete types or a type variable. Some SQL variants, like Spark, allow to drop the names x
i.concat
that allows concatenating arrays of arbitrary but fixed type:
transform
function applies a function pointwise to an array,
e.g., the expression transform(array[1,2,3], x -> x*x)
returns in Trino the value array[1,4,9]
.
In SDF we capture the signature of function via the lambda\<
A,
T\>
type constructor, where A stands for the argument type and R for the return type.
The function transform
thus has the following type signature:
array_sort
function, for instance, takes an array as the first parameter and a comparison operator as the second.
abs
function for instance can only be applied to number types and to intervals.
So we define abs
as a type parameter bounded function.
select sum(x) from values (1),(2),(3) T(x)
computes the sum of the x
column, here 5
.
While the implementation might be non-trivial the signature of aggregate functions is easy. We just specify the functions kind as aggregate, the type of the aggregate function is its column type.
Thus for sum
we get the trivial signature:
substr
as available for Trino we write:
substr(..)
return the same output for the same input.now()
return the same output within one query execution.first()
, might return a different output for the same input.pure
. Knowing the volaitity is key for applying optimizations, like comon subexpression elimination.
map(1,'a',2, 'b', ...) builds a
map<int,varchar>`. By closer inspection we observe that every even position has one type, every odd position another.
In SDF we specify map to be of type