> ## Documentation Index
> Fetch the complete documentation index at: https://docs.sdf.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Executing on Data from S3

> SDF DB can execute queries on data stored in S3 parquet, CSV, JSON, gzip, and more

## Overview

This guide will walk you through the steps of creating an AWS S3
integration with SDF to analyze world population data.

## Guide to walking through the AWS S3 example

<Steps>
  <Step title="Create Hello World S3 Workspace from Example">
    Begin by running the following command to create an SDF workspace from the example
    `hello_world_s3`.

    ```shell theme={null}
    sdf new --sample hello_world_s3 && cd hello_world_s3
    ```

    This command will generate a new workspace containing a two environments - one configured to run data locally, the other configured to pull data down
    from an SDF-hosted public S3 bucket `s3://sdf-datasets`.

    The project structure looks like this:

    <div className="bg-[#0F1117] dark:bg-codeblock rounded-xl dark:ring-1 dark:ring-gray-800/50 relative">
      <pre style={{ fontFamily: 'monospace', backgroundColor: 'transparent' }} className="language-shell">
        <code className="language-shell">
          .
          ├── local
          │   ├── pop.csv
          │   ├── popdata.sql
          │   └── world\_metrics.sql
          ├── remote
          │   ├── q1.sql
          │   └── un\_pop\_data.sql
          └── workspace.sdf.yml

          3 directories, 6 files
        </code>
      </pre>
    </div>
  </Step>

  <Step title="Build using local population data">
    To start, let's run the queries on the local CSV data packaged into the sample. To do this, we can use the local environment defined in the `workspace.sdf.yml`.

    The local environment is configured to run the queries on the `pop.csv` file in the `local` directory. Let's run our queries to get a sense for the data we're working with.

    ```shell theme={null}
    sdf run --show all
    ```

    <div className="bg-[#0F1117] dark:bg-codeblock rounded-xl dark:ring-1 dark:ring-gray-800/50 relative">
      <pre style={{ fontFamily: 'monospace', backgroundColor: 'transparent' }} className="language-shell">
        <code className="language-shell">
          Working set 1 .sdf file
             Finished run in 1.596 secs
        </code>
      </pre>
    </div>

    <Note>
      Since the default environment specified in the `workspace.sdf.yml` is `local` we don't need specify the environment (`-e`) in the compile command.
    </Note>
  </Step>

  <Step title="Authenticate SDF using a local AWS Profile">
    Nice! We're run the queries locally. Let's move on to running them on remote data pulled down from S3.

    Let's start by authenticating SDF to AWS. This will be used to pull data from our public bucket (`s3://sdfdatasets`),
    so this can be any AWS profile you have available locally.

    <Note>
      If you have not set up an AWS profile yet, follow the instructions on the
      [Getting Started with S3 Guide](/integrations/aws/S3/getting-started)
    </Note>

    Run the following command:

    ```cmd theme={null}
    sdf auth login aws --profile <profile>
    ```

    <Info>
      This will create a new [credential](/reference/sdf-yml#block-credential) in a `~/.sdf/`
      directory in the root of your system. This credential will be used to authenticate with
      AWS services. By default, the credential's name is default. As such, the credential does
      not need to be explicitly referenced in the integrations configuration below.
    </Info>

    To validate the connection, run:

    ```shell theme={null}
    sdf auth status
    ```
  </Step>

  <Step title="Run using the remote environment">
    Now that we've successfully authenticated SDF with AWS, we can try running in the remote environment.

    This command will pull down the data from remote, then run the queries on the downloaded data locally. It will then cache the results and the downloaded data,
    so that subsequent runs will be much faster.

    Let's try it!

    ```shell theme={null}
    sdf run -e remote --show all
    ```

    <div className="bg-[#0F1117] dark:bg-codeblock rounded-xl dark:ring-1 dark:ring-gray-800/50 relative">
      <pre style={{ fontFamily: 'monospace', backgroundColor: 'transparent' }} className="language-shell">
        <code className="language-shell">
          Working set 2 model files, 1 .sdf file
              Running hello\_world.pub.un\_pop\_data (./remote/un\_pop\_data.sql)
          Downloading hello\_world.pub.un\_pop\_data  (s3://sdfdatasets/hello-world/world\_population\_full.csv)
              Running hello\_world.pub.q1 (./remote/q1.sql)
             Finished 2 models \[2 succeeded] in 2.233 secs

          Table hello\_world.pub.un\_pop\_data
          ┌───────┬───────────┬───────────────────┬───────────────┬─────────────────┬───────┬──────┬──────────────────────┬──────────────────────────────────┬────────────┐
          │ Index ┆ Variant   ┆ region\_or\_country ┆ Location\_Code ┆ ISO3\_Alpha\_code ┆ Type  ┆ Year ┆ Population\_Thousands ┆ Population\_Density\_Per\_Square\_KM ┆ Median\_Age │
          ╞═══════╪═══════════╪═══════════════════╪═══════════════╪═════════════════╪═══════╪══════╪══════════════════════╪══════════════════════════════════╪════════════╡
          │ 1     ┆ Estimates ┆ WORLD             ┆ 900           ┆                 ┆ World ┆ 1950 ┆  2 477 675           ┆ 19.2                             ┆ 22.2       │
          │ 2     ┆ Estimates ┆ WORLD             ┆ 900           ┆                 ┆ World ┆ 1951 ┆  2 520 970           ┆ 19.5                             ┆ 22.2       │
          │ 3     ┆ Estimates ┆ WORLD             ┆ 900           ┆                 ┆ World ┆ 1952 ┆  2 565 291           ┆ 19.9                             ┆ 22.1       │
          │ 4     ┆ Estimates ┆ WORLD             ┆ 900           ┆                 ┆ World ┆ 1953 ┆  2 615 251           ┆ 20.2                             ┆ 22.0       │
          │ 5     ┆ Estimates ┆ WORLD             ┆ 900           ┆                 ┆ World ┆ 1954 ┆  2 665 307           ┆ 20.6                             ┆ 22.0       │
          │ 6     ┆ Estimates ┆ WORLD             ┆ 900           ┆                 ┆ World ┆ 1955 ┆  2 718 652           ┆ 21.1                             ┆ 21.8       │
          │ 7     ┆ Estimates ┆ WORLD             ┆ 900           ┆                 ┆ World ┆ 1956 ┆  2 773 493           ┆ 21.5                             ┆ 21.7       │
          │ 8     ┆ Estimates ┆ WORLD             ┆ 900           ┆                 ┆ World ┆ 1957 ┆  2 828 513           ┆ 21.9                             ┆ 21.6       │
          │ 9     ┆ Estimates ┆ WORLD             ┆ 900           ┆                 ┆ World ┆ 1958 ┆  2 887 221           ┆ 22.4                             ┆ 21.5       │
          │ 10    ┆ Estimates ┆ WORLD             ┆ 900           ┆                 ┆ World ┆ 1959 ┆  2 944 995           ┆ 22.8                             ┆ 21.5       │
          └───────┴───────────┴───────────────────┴───────────────┴─────────────────┴───────┴──────┴──────────────────────┴──────────────────────────────────┴────────────┘
          20595 rows, showing only 10 rows.
            Run with --limit 0 to show all rows.

          Table hello\_world.pub.q1
          ┌───────────────────┬─────────────────┬──────────────────────────────────┐
          │ region\_or\_country ┆ ISO3\_Alpha\_code ┆ Population\_Density\_Per\_Square\_KM │
          ╞═══════════════════╪═════════════════╪══════════════════════════════════╡
          │ Greenland         ┆                 ┆ 0.1                              │
          └───────────────────┴─────────────────┴──────────────────────────────────┘
          1 rows.
        </code>
      </pre>
    </div>

    Great! We've showcased how SDF can pull down remote data for local execution.

    <Tip>
      Run it again, and notice how much faster it is while using the cache :)
    </Tip>
  </Step>
</Steps>
