File

Supermetal ingests files from object stores (S3, GCS, Azure Blob) and filesystems. Format, compression, and CSV dialect are detected from the files themselves. Syncs are incremental, processing only changed files. Every file streams through the pipeline without local disk or a file size cap.

Supported Backends

Backend	URL Format
Amazon S3	`s3://bucket/path`
Google Cloud Storage	`gs://bucket/path`
Azure Blob Storage	`az://container/path`
Local Filesystem	`file:///absolute/path`
SFTP (coming soon)	`sftp://host/path`
FTP (coming soon)	`ftp://host/path`

S3-compatible stores (MinIO, Cloudflare R2) use the S3 backend with a custom endpoint.
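The URL schemes in the table map one-to-one onto backends. A minimal Python sketch of that lookup, assuming the scheme alone selects the backend; `BACKENDS` and `parse_source_url` are hypothetical names used for illustration and are not part of Supermetal:

```python
from urllib.parse import urlsplit

# Hypothetical lookup built from the currently supported rows of the table above.
BACKENDS = {
    "s3": "Amazon S3",
    "gs": "Google Cloud Storage",
    "az": "Azure Blob Storage",
    "file": "Local Filesystem",
}

def parse_source_url(url):
    """Split a source URL into (backend, bucket_or_container, path)."""
    parts = urlsplit(url)
    if parts.scheme not in BACKENDS:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    # For file:// URLs the netloc is empty and the path is already absolute.
    return BACKENDS[parts.scheme], parts.netloc, parts.path
```

For example, `parse_source_url("s3://bucket/path")` yields the S3 backend, the bucket name `bucket`, and the path `/path`.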

Supported Formats

Format	Extensions
Parquet	`.parquet`
CSV	`.csv`, `.tsv`, `.psv`
JSON (coming soon)	`.json`, `.jsonl`, `.ndjson`
Avro (coming soon)	`.avro`
Excel (coming soon)	`.xlsx`, `.xls`

Compression

Compression	Extensions
Gzip	`.gz`, `.gzip`
Zstandard	`.zst`, `.zstd`
Bzip2	`.bz2`
XZ	`.xz`
LZMA	`.lzma`
Brotli	`.br`
Deflate	`.deflate`
Zlib	`.zz`

Compression is detected from the file extension. data.csv.gz is detected as gzip compressed CSV.
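As an illustration of extension-based detection, the following Python sketch strips a compression suffix first and then looks at what remains. It is an assumption-laden simplification: the `detect` helper and its two lookup tables are hypothetical, and the format lookup by extension is only a stand-in for whatever detection Supermetal performs on the file itself.

```python
# Extension tables copied from the Supported Formats and Compression tables above.
COMPRESSION = {
    ".gz": "gzip", ".gzip": "gzip", ".zst": "zstd", ".zstd": "zstd",
    ".bz2": "bzip2", ".xz": "xz", ".lzma": "lzma", ".br": "brotli",
    ".deflate": "deflate", ".zz": "zlib",
}
FORMATS = {".parquet": "parquet", ".csv": "csv", ".tsv": "csv", ".psv": "csv"}

def detect(filename):
    """Return (format, compression); either may be None when unrecognized."""
    name = filename.lower()
    compression = None
    for ext, codec in COMPRESSION.items():
        if name.endswith(ext):
            compression = codec
            name = name[: -len(ext)]  # peel off the compression suffix
            break
    fmt = next((f for ext, f in FORMATS.items() if name.endswith(ext)), None)
    return fmt, compression
```

With this sketch, `detect("data.csv.gz")` reports gzip-compressed CSV, matching the example in the text.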

Archives

Archive	Extensions
ZIP	`.zip`
TAR	`.tar`, `.tar.gz`, `.tgz`, `.tar.bz2`, `.tbz2`, `.tar.xz`, `.txz`, `.tar.zst`, `.tzst`, `.tar.lzma`, `.tlz`

Prerequisites

Read access to the object store or filesystem holding your files. The Setup tabs cover credentials per backend.
Network connectivity from the Supermetal agent to the source endpoint.
Write access, only when post processing deletes or moves files after sync.

Setup

Configure AWS S3

Create IAM Policy

Create an IAM policy with read only access to your bucket. Attach this policy to an IAM user or role.

Using the AWS Console:

Navigate to the AWS IAM Console.
Go to Policies and click "Create policy".
Select "JSON" and paste the policy document below.

Using the AWS CLI:

Save the policy document below as policy.json and run:

aws iam create-policy \
    --policy-name supermetal-file-source-policy \
    --policy-document file://policy.json

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "SupermetalFileSourcePolicy",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::your-bucket/*",
                "arn:aws:s3:::your-bucket"
            ]
        }
    ]
}

Post processing

If using post processing (delete or move files after sync), add s3:DeleteObject and s3:PutObject to the policy.
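If you generate the policy file from a script, a sketch like the following Python can produce both variants. The `build_policy` helper is hypothetical; the actions and resource ARNs are the ones listed above.

```python
import json

def build_policy(bucket, post_processing=False):
    """Return the IAM policy document as a dict for the given bucket."""
    actions = ["s3:GetObject", "s3:ListBucket"]
    if post_processing:
        # Extra actions needed when files are deleted or moved after sync.
        actions += ["s3:DeleteObject", "s3:PutObject"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "SupermetalFileSourcePolicy",
                "Effect": "Allow",
                "Action": actions,
                "Resource": [f"arn:aws:s3:::{bucket}/*", f"arn:aws:s3:::{bucket}"],
            }
        ],
    }

if __name__ == "__main__":
    # Redirect this output to policy.json before running `aws iam create-policy`.
    print(json.dumps(build_policy("your-bucket", post_processing=True), indent=4))
```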

Connection details

You need the following:

Bucket name
Region
Access key ID (optional)
Secret access key (optional)

Instance Profile

When running on AWS (EC2, ECS, EKS), you can use an instance profile or IAM role instead of access keys. Attach the policy to your instance role and leave the access key fields empty.

S3 compatible stores

For S3-compatible stores (MinIO, Cloudflare R2, Wasabi), also specify the custom endpoint URL.

Configure Google Cloud Storage

Create Service Account

Navigate to the Google Cloud Console.
Go to IAM & Admin > Service Accounts.
Click "Create Service Account".
Enter a name (e.g., "supermetal-file-source").
Grant the "Storage Object Viewer" role.
If using post processing, also grant "Storage Object Admin".
Click "Done".

Generate Key File

Select the service account you created.
Go to the "Keys" tab.
Click "Add Key" > "Create new key".
Select "JSON" and click "Create".
Save the downloaded JSON key file securely.

Connection details

You need the following:

Service account JSON key file
Bucket name

Configure Azure Blob Storage

Create SAS Token

Navigate to your storage account in the Azure Portal.
Go to Containers and select your container.
Click "Shared access tokens".
Configure the SAS settings:
- Permissions: Read, List
- Set start and expiry time
Click "Generate SAS token and URL".
Copy the SAS token.

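Alternatively, generate the token with the Azure CLI. The date command below uses GNU date syntax:

end=$(date -u -d "1 year" '+%Y-%m-%dT%H:%MZ')

az storage container generate-sas \
    --name your-container \
    --account-name your-storage-account \
    --permissions rl \
    --expiry "$end" \
    --output tsv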
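On systems without GNU date (macOS, for instance), the expiry timestamp can be computed elsewhere and passed to `--expiry`. A small Python sketch, assuming 365 days is an acceptable approximation of "1 year"; `sas_expiry` is a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone

def sas_expiry(now, days=365):
    """Format an expiry timestamp `days` after `now`, in UTC, as YYYY-MM-DDTHH:MMZ."""
    expiry = (now + timedelta(days=days)).astimezone(timezone.utc)
    return expiry.strftime("%Y-%m-%dT%H:%MZ")

if __name__ == "__main__":
    print(sas_expiry(datetime.now(timezone.utc)))
```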

Post processing

If using post processing, add Write and Delete permissions to the SAS token.

Connection details

You need the following:

Storage account name
Container name
SAS token

Configure Local Filesystem

Specify the absolute path to the directory containing your files.

file:///path/to/your/files

Ensure the Supermetal agent has read access to the directory.

File Selection

Option	Description	Example
`glob_patterns`	Files to include	`**/*.parquet` (all parquet files), `data/2024/**/*.csv` (CSV under data/2024/)
`exclude_patterns`	Files to skip	`**/_temporary/**`, `**/.staging/**`
`start_date`	Ignore files modified before this timestamp	`2024-01-01T00:00:00Z`
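To make the interaction of the three options concrete, here is a rough Python sketch. It assumes conventional glob semantics (`*` stays within one path segment, `**` spans directories) and that exclusions take precedence over inclusions; neither is confirmed by this page, and `glob_to_regex` and `is_selected` are hypothetical helpers.

```python
import re
from datetime import datetime, timezone

def glob_to_regex(pattern):
    """Translate a glob into a regex: '**/' spans directories, '*' stays in one segment."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")        # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")     # anything except '/'
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))

def is_selected(path, modified, glob_patterns, exclude_patterns=(), start_date=None):
    """Apply start_date, then exclude_patterns, then glob_patterns."""
    if start_date is not None and modified < start_date:
        return False
    if any(glob_to_regex(p).fullmatch(path) for p in exclude_patterns):
        return False
    return any(glob_to_regex(p).fullmatch(path) for p in glob_patterns)
```

Under these assumptions, a file at `data/_temporary/a.parquet` is skipped even though it matches `**/*.parquet`, because the exclusion is checked first.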

Table Mapping

Auto Table Mapping

Each file becomes its own table. The table name is derived from the filename, with an optional prefix.

prefix: "raw_"

Source Files

s3://exports/
├── customers.parquet
├── products.parquet
└── transactions.parquet

→

Destination Tables

├── raw_customers
├── raw_products
└── raw_transactions
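The mapping in this example can be sketched in a few lines of Python. This assumes the table name is the filename up to its first dot, which fits the example but is not stated on this page; `auto_table_name` is a hypothetical helper.

```python
from pathlib import PurePosixPath

def auto_table_name(path, prefix=""):
    """Derive a table name from a file path and an optional prefix."""
    filename = PurePosixPath(path).name
    # Assumption: everything after the first dot is extension (e.g. ".csv.gz").
    return prefix + filename.split(".", 1)[0]
```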

Single Table Mapping

All files load into one destination table.

destination: "orders"

Source Files

s3://vendor-data/
├── orders_jan.csv
├── orders_feb.csv
└── orders_mar.csv

→

Destination Tables

└── orders

Dynamic Table Mapping

Extract the table name from the file path using named regex capture groups, then substitute the captured values into the template.

pattern: "(?P<entity>[^/]+)/(?P<year>[0-9]{4})/.*"
template: "{entity}_{year}"

Source Files

s3://datalake/
├── sales/2024/q1.parquet
├── sales/2024/q2.parquet
├── orders/2024/q1.parquet
└── orders/2024/q2.parquet

→

Destination Tables

├── sales_2024
└── orders_2024
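The pattern and template above use named groups, so the same mapping can be reproduced with Python's `re` module as a quick way to check a pattern before deploying it. This sketch assumes the pattern is matched against the path relative to the bucket root and must match the whole path; `dynamic_table_name` is a hypothetical helper.

```python
import re

PATTERN = r"(?P<entity>[^/]+)/(?P<year>[0-9]{4})/.*"
TEMPLATE = "{entity}_{year}"

def dynamic_table_name(relative_path, pattern=PATTERN, template=TEMPLATE):
    """Return the destination table for a path, or None if the pattern does not match."""
    match = re.fullmatch(pattern, relative_path)
    if match is None:
        return None
    # Named capture groups become the template's placeholders.
    return template.format(**match.groupdict())
```

Both `sales/2024/q1.parquet` and `sales/2024/q2.parquet` resolve to `sales_2024`, so the two files land in one table.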

Format Options

CSV Options

All options are detected automatically, including column data types (string, integer, float, boolean, date, timestamp).

Option	Description
`has_header`	First row contains column names
`delimiter`	Field separator (e.g., `,`, `\t`, `|`)
`quote`	Character used to quote field values
`escape`	Character used to escape special characters
`comment`	Lines starting with this character are skipped
`terminator`	Line ending (e.g., `\n`, `\r\n`)
`null_values`	Strings treated as NULL (e.g., `["NULL", "\\N", ""]`)
`encoding`	Character encoding
`skip_rows`	Number of rows to skip before header
`allow_jagged_rows`	Allow rows with fewer columns, fill missing with NULL
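To show how several of these options combine, here is a simplified Python sketch built on the standard `csv` module. It only illustrates the described semantics of `delimiter`, `has_header`, `skip_rows`, `null_values`, and `allow_jagged_rows`; it is not Supermetal's parser, `read_rows` is a hypothetical helper, and splitting on lines means quoted multi-line fields are not handled.

```python
import csv

def read_rows(text, delimiter=",", has_header=True, skip_rows=0,
              null_values=("NULL", "\\N", ""), allow_jagged_rows=False):
    """Parse CSV text into (header, rows), applying a few of the options above."""
    lines = text.splitlines()[skip_rows:]  # skip_rows: drop lines before the header
    parsed = list(csv.reader(lines, delimiter=delimiter))
    if not parsed:
        return [], []
    if has_header:
        header, data = parsed[0], parsed[1:]
    else:
        # Hypothetical fallback column names when there is no header row.
        header = [f"column_{i + 1}" for i in range(len(parsed[0]))]
        data = parsed
    rows = []
    for row in data:
        if len(row) < len(header):
            if not allow_jagged_rows:
                raise ValueError(f"row has {len(row)} fields, expected {len(header)}")
            row = row + [None] * (len(header) - len(row))  # pad short rows with NULL
        # null_values: map the configured strings to None (NULL).
        rows.append([None if value in null_values else value for value in row])
    return header, rows
```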

Encoding

Detected automatically if not specified. Supports encodings from the WHATWG Encoding Standard: UTF-8, UTF-16LE, UTF-16BE, ISO-8859-1 through ISO-8859-16, Windows-1250 through Windows-1258, GBK, GB18030, Big5, EUC-JP, EUC-KR, Shift_JIS, ISO-2022-JP, KOI8-R, KOI8-U, and others.

Parquet Options

No configuration required. Schema and compression are read from file metadata.

Polling

Supermetal discovers new and modified files by polling the source location, every 60 seconds by default. Setting the interval to 0 disables polling and runs the sync once.
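The polling behaviour can be pictured as a loop that lists the source, processes whatever changed since the last pass, and sleeps. The Python sketch below assumes change detection compares a per-file marker such as a modification time or ETag, which this page does not specify; `changed_files`, `run`, and their parameters are hypothetical.

```python
import time

def changed_files(seen, listing):
    """Return paths that are new or whose marker changed, and record them in `seen`."""
    changed = [path for path, marker in sorted(listing.items())
               if seen.get(path) != marker]
    seen.update(listing)
    return changed

def run(list_files, process, poll_interval_seconds=60):
    """Poll until stopped; an interval of 0 runs a single pass and returns."""
    seen = {}
    while True:
        for path in changed_files(seen, list_files()):
            process(path)
        if poll_interval_seconds == 0:
            break
        time.sleep(poll_interval_seconds)
```

Here `list_files` is any callable returning a mapping of path to marker, so the same loop works for an object store listing or a directory walk.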

Error Handling

When a file fails to process, the sync either logs the error, skips the file, and continues (the default), or stops on the first error.
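The two modes differ only in what happens after the error is logged. A minimal Python sketch, in which `process_all` and the `on_error` values "skip" and "stop" are hypothetical names rather than Supermetal settings:

```python
import logging

def process_all(files, process, on_error="skip"):
    """Process each file; return the list of files that failed and were skipped."""
    failed = []
    for path in files:
        try:
            process(path)
        except Exception:
            logging.exception("failed to process %s", path)
            if on_error == "stop":
                raise  # stop on the first error
            failed.append(path)  # skip this file and continue
    return failed
```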

Post Processing

After successful processing, source files can be left in place (default), deleted, or moved to another path, for example from s3://bucket/inbox/ to s3://bucket/processed/. Post processing is best effort (a failed delete or move does not fail the sync) and requires write access to the source bucket.
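"Best effort" means a failure in this step is reported but never propagated. The Python sketch below shows the idea on a local filesystem only; `post_process` and its `action` values are hypothetical, and an object store would use copy and delete calls instead of `shutil.move`.

```python
import logging
import shutil
from pathlib import Path

def post_process(path, action="none", target_dir=None):
    """Delete or move a processed file; return False instead of raising on failure."""
    try:
        if action == "delete":
            Path(path).unlink()
        elif action == "move":
            if target_dir is None:
                raise OSError("move requires a target directory")
            Path(target_dir).mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(Path(target_dir) / Path(path).name))
        return True
    except OSError as exc:
        # Best effort: log and carry on so the sync itself does not fail.
        logging.warning("post processing failed for %s: %s", path, exc)
        return False
```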

Limitations

Nested archives are not supported (e.g., a .tar.gz containing a .zip).
Object store API rate limits apply (S3, GCS, Azure). Supermetal retries throttled requests automatically.
