This document outlines the concept of indexing within the SDF CLI and walks through a specific example that shows how to take advantage of this feature.

Indexing relies on files being organized in a directory structure such as catalog/schema/table-name. Following this structure, SDF auto-applies fully qualified names to the DDLs, allowing for faster search and compilation.
Within SDF, the process of indexing begins with the first sdf compile command. This creates an index of the local filesystem and workspace, allowing files and dependencies to be mapped. The default index type, unless specified within the workspace, is none.
This example uses the csv_123 sample workspace. If this workspace is not already set up, it can be created with the sdf new --sample csv_123 command.

Basic Index
With indexing set to none, every time you run sdf compile, SDF begins a lookup to find the SQL defining the table in question. SDF parses a query, determines that it depends on another table, and attempts to find that table. With indexing set appropriately, SDF knows exactly where the required query files are stored. By comparison, without indexing, or with indexing set to none, SDF has to scan until it locates the query it needs.

In this example, the current directory is set to:

Indexing at Scale with Catalog-Schema-Table-Name
Using none for indexing works at reasonable scale, but for large organizations with thousands of tables, using a different index type can drastically speed up the process.

Three other options are the following:

catalog-schema-table-name
schema-table-name
table-name
To use the first option, set the index type to catalog-schema-table-name and define defaults for the catalog and schema:
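A minimal sketch of what such a configuration might look like in workspace.sdf.yml. The placement of the index key on each includes entry and the catalog and schema values (sdf and pub) are assumptions for illustration; verify the key names against your own workspace and SDF version.

```yaml
# workspace.sdf.yml (fragment; other workspace settings omitted)
workspace:
  name: csv_123
  includes:
    - path: src
      index: catalog-schema-table-name   # assumed per-path index setting
    - path: local
      index: catalog-schema-table-name
  defaults:
    catalog: sdf   # hypothetical catalog name
    schema: pub    # hypothetical schema name
```

Under these assumed values, a hypothetical table sdf.pub.main would be expected at src/sdf/pub/main.sql.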
Using catalog-schema-table-name as the index type requires that files under src and local are structured in directories like <catalog>/<schema>/<table-name>.sql.

It is important to add and update the catalog field within defaults of the workspace.sdf.yml file to reflect the appropriate catalog SDF will use for indexing.

Running sdf compile after making these changes and updating the index improves the total run time, reducing it to less than 0.1 seconds.

Indexing with Schema-Table-Name
If your files are structured as <schema>/<table-name>, then you are able to use the index type schema-table-name.

To test this index type, update the workspace.sdf.yml to reflect the new index type and update the folder directories to match. It is important to set the catalog within defaults on each of the includes paths. For example, add the catalog default to each path, src and local.
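A minimal sketch of per-path catalog defaults with the schema-table-name index type. The nesting of defaults under each includes entry follows the description above; the catalog value sdf is a hypothetical placeholder, and the exact key layout should be checked against your SDF version.

```yaml
# workspace.sdf.yml (fragment; other workspace settings omitted)
workspace:
  name: csv_123
  includes:
    - path: src
      index: schema-table-name
      defaults:
        catalog: sdf   # hypothetical catalog name, set per path
    - path: local
      index: schema-table-name
      defaults:
        catalog: sdf   # repeated so files under local resolve to the same catalog
```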
If the default catalog is not set in the top-level workspace block, the catalog will default to the workspace name.