# Iceberg Setup
## Prerequisites
You need a catalog (REST, Glue, or S3 Tables), storage credentials for where data files will be written (S3 or GCS), and a target namespace for table creation.
## Setup
### Catalog
Configure the Iceberg catalog where table metadata is stored.
| Field | Description |
|---|---|
| URI | Catalog endpoint (e.g., https://catalog.example.com) |
| Warehouse | Storage location identifier |
| Authentication | OAuth2, Bearer, Basic, or SigV4 |
Authentication methods:
| Method | Use Case |
|---|---|
| OAuth2 | Production environments with token endpoint, client ID/secret |
| Bearer | Service accounts, CI/CD with static token |
| Basic | Development, JDBC catalogs with username/password |
| SigV4 | AWS services requiring request signing (region, service) |
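Putting the catalog fields and an authentication method together, a REST catalog configuration might look like the following sketch. All values (the endpoint, warehouse name, and credentials) are placeholders, and the exact property names depend on the client you use:

```python
# Hypothetical REST catalog configuration illustrating the fields above.
# The URI, warehouse, and credentials are placeholder values.
rest_catalog_config = {
    "uri": "https://catalog.example.com",  # catalog endpoint
    "warehouse": "my_warehouse",           # storage location identifier
    # OAuth2 (production): client ID and secret exchanged at a token endpoint
    "credential": "my-client-id:my-client-secret",
    # Bearer alternative for service accounts / CI-CD (use instead of OAuth2):
    # "token": "<static-bearer-token>",
}
```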
For AWS Glue catalogs:

| Field | Description |
|---|---|
| Warehouse | S3 location (e.g., s3://my-bucket/warehouse) |
| Region | AWS region |
| Catalog ID | AWS account ID (optional) |
| Credentials | Access key and secret |
For Amazon S3 Tables catalogs:

| Field | Description |
|---|---|
| Table Bucket ARN | S3 Tables bucket ARN |
| Region | AWS region |
| Credentials | Access key and secret |
### Target Namespace
Tables are created under this namespace. For multi-level namespaces, use comma-separated values: `my_database, my_schema` creates tables under `my_database.my_schema`.
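The comma-separated convention can be sketched with a small hypothetical helper (not part of any shipped API) that splits the value into namespace levels:

```python
def parse_namespace(raw: str) -> tuple[str, ...]:
    """Split a comma-separated namespace string into its levels.

    Hypothetical helper illustrating the convention described above;
    whitespace around each level is stripped.
    """
    return tuple(part.strip() for part in raw.split(","))

# "my_database, my_schema" -> tables live under my_database.my_schema
levels = parse_namespace("my_database, my_schema")
print(".".join(levels))  # my_database.my_schema
```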
### Storage Credentials
Credentials for writing Parquet data files to cloud storage.
For S3 (and S3-compatible storage):

| Field | Description |
|---|---|
| Access Key ID | AWS access key |
| Secret Access Key | AWS secret key |
| Region | AWS region (e.g., us-east-1) |
| Endpoint | Custom endpoint for S3-compatible storage |
| Path Style Access | Enable for MinIO and similar |
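For an S3-compatible store such as MinIO, the fields above might be filled in as in this sketch. The endpoint and keys are placeholders; path-style access puts the bucket name in the URL path (`http://host/bucket/key`) rather than in the hostname, which MinIO and similar services expect:

```python
# Hypothetical S3-compatible (e.g., MinIO) storage credentials.
# Endpoint and keys are placeholders, not real credentials.
s3_storage_config = {
    "access_key_id": "minioadmin",
    "secret_access_key": "minioadmin",
    "region": "us-east-1",
    "endpoint": "http://localhost:9000",  # custom endpoint for the store
    "path_style_access": True,            # required by MinIO and similar
}
```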
For GCS:

| Field | Description |
|---|---|
| Credentials JSON | Service account key (base64-encoded) |
| Project ID | GCP project identifier |
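Since the service account key must be supplied base64-encoded, the encoding step can be sketched as follows. The key shown is a stub; a real key comes from the GCP console:

```python
import base64
import json

# Stub service-account key; a real one is downloaded from the GCP console.
key = {"type": "service_account", "project_id": "my-gcp-project"}

# The Credentials JSON field expects the key base64-encoded:
encoded = base64.b64encode(json.dumps(key).encode("utf-8")).decode("ascii")

# Round-trip to confirm the encoding is reversible:
assert json.loads(base64.b64decode(encoded)) == key
```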
### Write Options
Control how data is written to Iceberg tables. See Write Modes for details on Merge on Read vs Append.
| Field | Default | Description |
|---|---|---|
| Spec Version | V3 | Iceberg table format version |
| Write Mode | Merge on Read | How updates and deletes are handled |
| Delete Mode | Soft | For Merge on Read: Soft preserves audit trail, Hard removes rows |
| Truncate Table if exists | Off | Remove existing data before snapshot sync |
| Metadata Compression | Gzip | Compression for Iceberg metadata files |
| Flush Interval | 10000 ms | How often buffered writes are committed to the table |
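The defaults above can be summarized as a configuration sketch (key names here are illustrative, not the exact option names used by the target):

```python
# Hypothetical write-options sketch mirroring the defaults in the table above.
write_options = {
    "spec_version": "V3",            # Iceberg table format version
    "write_mode": "merge_on_read",   # updates/deletes handled via delete files
    "delete_mode": "soft",           # soft delete preserves an audit trail
    "truncate_table_if_exists": False,
    "metadata_compression": "gzip",  # compression for Iceberg metadata files
    "flush_interval_ms": 10_000,     # commit roughly every 10 seconds
}
```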
### Parquet Settings
Configure the Parquet file format. Defaults work well for most workloads.
| Field | Default | Description |
|---|---|---|
| Compression | Zstd | Zstd, Snappy, Gzip, Lz4Raw, Brotli, or Uncompressed |
| Compression Level | 3 | Zstd (1-22), Gzip (0-9), or Brotli (0-11) |
| Target File Size | 512 MB | Files roll when exceeding this size |
| Parquet Version | V1 | V1 for compatibility, V2 for better encoding |
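The compression-level ranges in the table depend on the codec, which a small illustrative check (a hypothetical helper, not part of any shipped API) makes concrete:

```python
# Valid compression-level ranges per codec, as listed above.
COMPRESSION_LEVELS = {
    "zstd": range(1, 23),    # 1-22
    "gzip": range(0, 10),    # 0-9
    "brotli": range(0, 12),  # 0-11
}

def level_is_valid(codec: str, level: int) -> bool:
    """Check a compression level against its codec's allowed range.

    Codecs without a tunable level (Snappy, Lz4Raw, Uncompressed)
    accept no level at all.
    """
    return codec in COMPRESSION_LEVELS and level in COMPRESSION_LEVELS[codec]

print(level_is_valid("zstd", 3))   # True (the default)
print(level_is_valid("gzip", 12))  # False
```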