> ## Documentation Index
> Fetch the complete documentation index at: https://docs.sdf.com/llms.txt
> Use this file to discover all available pages before exploring further.

# User Defined Functions

> SDF allows you to author user defined functions using function signatures in sdf.yml blocks. Types are used to statically analyze SQL, provide label propagation and lineage computation.

User-defined functions allow you to extend a database by adding a new function, that can be evaluated in SQL. The SQL standard distinguishes between scalar, aggregate and\
table functions. A scalar function takes a set of values and returns only a single value (or NULL), an aggregate function takes a column of values and return
a single value, whereas a table function returns a (relational) table comprising zero or more rows, each row with one or more columns.

SDF allows you to define almost any imagenable user-defined function. The following sections define how to add user defined function signatures.

## Scalar functions

SDF distinguishes scalar functions in various dimensions:

* They can have fixed or varying arity, i.e number of parameters
* They can have one defintion or be overloaded, i.e. have many defintions.
* They can be monomorphic, i.e., defined for one type only, or polymorphic, i.e. defined to operate on all types
* They can be first-order, i.e., take only data as parameters, or higher order, take also functions as parameter.
  We explain each of these concepts and the authoring of the corresponding function signature in turn.

### Fixed Arity

The function `substr` is a typical scalar function. It takes two arguments of the types `varchar` and `int`, respectively.
We say `substr` has **fixed arity** two. The expression `substr(text,start)` returns  a `varchar`.

In SDF we write `substr`'s function signature as follows:

```yaml theme={null}
function:
  name: substr
  parameters: 
    - datatype: varchar
    - datatype: int
  returns: 
    datatype: varchar
```

By default function definitions are for scalar functions. If we want to make a function kind explicit we add the `function-kind` field.

```yaml theme={null}
function:
  name: substr
  function-kind: scalar
  ...
```

Alternative function kinds are `aggregate`, and `table` functions, which we will introduce later.

### Overloading

The function `substr` is overloaded. You can call it with two parameters, as we saw above, or with three parameters, e.g. `substr(string, start, length)`.
To allow for a second defintion of `substr` you have to declare one of them to be an `overload`.

```yaml theme={null}
function:
  name: substr
  overload: true
  parameters: 
    - datatype: varchar
    - datatype: int
    - datatype: int
  volatility: pure
  returns: 
    datatype: varchar
```

The default for `overload` is false.

### Uniform Variadic Arity

Many functions in SQL can take an arbitrary number of arguments. For instance `concat` allows to concat an arbitrary number strings.
We say `concat` is **uniform variadic**, that is the function can have one or more parameters but all must have the same type. In SDF we write:

```yaml theme={null}
function:
  name: concat
  variadic: uniform
  parameters: 
    - datatype: varchar
  returns: 
    datatype: varchar
```

### Generic types and functions

Modern SQL does not only have primitive types like `int` and `varchar`, but also composite types like `array`, `map`, and `struct`. These composite types are generic,
so that we can build arrays of ints, arrays of varchar or array of array of ints, etc.\
SDF denotes the type of

* **arrays of type** *T* as `array\<`*T*`\>` where T is either a concrete type, like `varchar`, or a **type variable**, which we denote
  here  by  `$1`, `$2`, etc.
* **generic maps** of key type *T* to value type *S* by `map\<`*T*`,`*S*`\>`, where *T* and *S* can either be concrete types or type variables.
* **named structs** of *n*-fields using the syntax `struct\<x`*1*` T`*1*`, … x`*n*` T`*n*`\>`, where each `T`*i* can either be concrete types or a type variable. Some SQL variants, like Spark, allow to drop the names `x`*i*.

With these notation in hand, we can now define an overloaded version of `concat` that allows concatenating arrays of arbitrary but fixed type:

```yaml theme={null}
function:
  name: concat
  variadic: uniform
  overload: true
  parameters: 
    - datatype: array\<$1\>
  returns: 
    datatype: array\<$1\>
```

### Higher-order functions

Trino and Spark support higher order function applications. As an example, Trino's  `transform` function applies a function pointwise to an array,
e.g., the expression `transform(array[1,2,3], x -> x*x)` returns in Trino the value `array[1,4,9]`.

In SDF we capture the **signature of function** via the `lambda\<`*A*`,`*T*`\>` type constructor, where *A* stands for the argument type and *R* for the return type.
The function `transform` thus has the following type signature:

```yaml theme={null}
function:
  name: transform
  parameters: 
    - datatype: array\<$1\>
    - datatype: lambda\<$1, $2\>
  volatility: pure
  kind: scalar
  returns: 
    datatype: array\<$2\>
```

Lambda functions can also take tuples as argument. The `array_sort` function, for instance, takes an array as the first parameter and a comparison operator as the second.

```yaml theme={null}
function:
  name: array_sort
  parameters: 
    - datatype: array\<$1\>
    - datatype: lambda\<tuple\<$1,$1\>,int\>
  returns: 
    datatype: array\<$1\>
```

### Bounded types

Sometimes functions don't work over all types, but can only applied to a limited set of types.
We call this set **type parameter bounds**. The `abs` function for instance can only be applied to number types and to intervals.
So we define `abs` as a type parameter bounded function.

```yaml theme={null}
function:
  name: abs
  parameters: 
    - datatype: $1
  returns: 
    datatype: $1
  with:
    - bound: $1
      datatypes:
       - number
       - interval
```

## Aggregate and window functions

Aggregate functions take a column of values and return a scalar value. For instance `select sum(x) from values (1),(2),(3) T(x)` computes the sum of the `x` column, here `5`.

While the implementation might be non-trivial the signature of aggregate functions is easy. We just specify the functions kind as aggregate, the type of the aggregate function is its column type.
Thus for `sum` we get the trivial signature:

```yaml theme={null}
function:
  name: sum
  function-kind: aggregate
    parameters: 
    - datatype: $1
  returns: 
    datatype: $1
  with:
    - bound: $1
      datatypes:
       - number
```

## Further features

### Dialect

All of SDF's function definitions are dialect specific. We specify the dialect as part of the function signature.
So to define `substr` as available for Trino we write:

```yaml theme={null}
function:
  name: substr
  dialect: trino
  parameters: 
    - datatype: varchar
    - datatype: int
  returns: 
    datatype: varchar
```

### Volatility

SDF distinguishes three kinds of functions.

* **Pure** functions, like `substr(..)` return the same output for the same input.
* **Stable** functions, like `now()` return the same output within one query execution.
* **Non-deterministic** functions, like `first()`, might return a different output for the same input.

We specify the volatility of each function using the volatility field, e.g.

```yaml theme={null}
function:
  name: now
  volatility: stable
  parameters: 
  returns: 
    datatype: datetime
```

The default volatility is `pure`. Knowing the volaitity is key for applying optimizations, like comon subexpression elimination.

### Non-uniform variadic functions

Spark offers several functions that are variadic but not uniform. For instance `map(1,'a',2, 'b', ...) builds a `map\<int,varchar>\`. By closer inspection we observe that every *even* position has one type, every *odd* position another.
In SDF we specify map to be of type

```yaml theme={null}
function:
  name: map
  variadic: even-odd
  parameters: 
    - datatype: $1
    - datatype: $2
  returns: 
    datatype: map\<$1,$2\>
  with:
    - bound: $1
      datatypes:
       - number
```

### Three-Part Function Names

SDF supports multi-part function names by specifying the catalog and schema these functions belong to in the name. The result is a fully qualified name for the function.

```yaml theme={null}
function:
  name: functions_db.functions_schema.my_custom_udf
  parameters: 
    - datatype: varchar
    - datatype: int
  returns: 
    datatype: varchar
```

This function can then be invoked in SQL like so:

```sql theme={null}
SELECT functions_db.functions_schema.my_custom_udf('hello', 1);
```

<Note>
  This pattern is often seen in Snowflake as functions are materialized to a database and schema.
</Note>

## Outlook

SDF function blocks allow to specify almost all of SQL's function signatures. Going forward we also want to enable you to write implementations, in SQL, Python and Rust. Stay tuned for the continuation of this story!
