
Iceberg Setup

Prerequisites

You need a catalog (REST, Glue, or S3 Tables), storage credentials for where data files will be written (S3 or GCS), and a target namespace for table creation.


Setup

Catalog

Configure the Iceberg catalog where table metadata is stored.

REST catalog:

| Field | Description |
|---|---|
| URI | Catalog endpoint (e.g., https://catalog.example.com) |
| Warehouse | Storage location identifier |
| Authentication | OAuth2, Bearer, Basic, or SigV4 |

Authentication methods:

| Method | Use Case |
|---|---|
| OAuth2 | Production environments with token endpoint, client ID/secret |
| Bearer | Service accounts, CI/CD with static token |
| Basic | Development, JDBC catalogs with username/password |
| SigV4 | AWS services requiring request signing (region, service) |

Glue catalog:

| Field | Description |
|---|---|
| Warehouse | S3 location (e.g., s3://my-bucket/warehouse) |
| Region | AWS region |
| Catalog ID | AWS account ID (optional) |
| Credentials | Access key and secret |

S3 Tables catalog:

| Field | Description |
|---|---|
| Table Bucket ARN | S3 Tables bucket ARN |
| Region | AWS region |
| Credentials | Access key and secret |
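To make the Bearer and Basic methods above concrete, here is a minimal sketch of the standard HTTP `Authorization` header each one corresponds to. The helper names and sample credentials are hypothetical, and this only illustrates the generic HTTP schemes; how the product builds its requests internally is not documented here.

```python
import base64


def bearer_header(token: str) -> dict[str, str]:
    """Build the Authorization header for a static bearer token."""
    return {"Authorization": f"Bearer {token}"}


def basic_header(username: str, password: str) -> dict[str, str]:
    """Build the Authorization header for HTTP Basic auth (RFC 7617)."""
    # Basic auth is the base64 encoding of "username:password".
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


# Placeholder credentials for illustration only.
print(bearer_header("example-token"))
print(basic_header("dev_user", "dev_password"))
```

Basic auth only encodes the credentials and does not encrypt them, which is consistent with the table reserving it for development use.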

Target Namespace

Tables are created under this namespace. For multi-level namespaces, list the levels as comma-separated values: for example, "my_database, my_schema" creates tables under my_database.my_schema.
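The mapping from a comma-separated value to a multi-level namespace can be sketched as follows. The function names are hypothetical, and trimming whitespace around each level is an assumption based on the example above, not confirmed product behavior.

```python
def parse_namespace(value: str) -> tuple[str, ...]:
    """Split a comma-separated namespace value into its levels."""
    # Assumption: whitespace around each level is ignored.
    levels = tuple(part.strip() for part in value.split(","))
    if any(not level for level in levels):
        raise ValueError(f"empty namespace level in {value!r}")
    return levels


def dotted(levels: tuple[str, ...]) -> str:
    """Render namespace levels in dotted form."""
    return ".".join(levels)


levels = parse_namespace("my_database, my_schema")
print(levels)          # ('my_database', 'my_schema')
print(dotted(levels))  # my_database.my_schema
```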

Storage Credentials

Credentials for writing Parquet data files to cloud storage.

S3:

| Field | Description |
|---|---|
| Access Key ID | AWS access key |
| Secret Access Key | AWS secret key |
| Region | AWS region (e.g., us-east-1) |
| Endpoint | Custom endpoint for S3-compatible storage |
| Path Style Access | Enable for MinIO and similar |

GCS:

| Field | Description |
|---|---|
| Credentials JSON | Service account key (base64-encoded) |
| Project ID | GCP project identifier |
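Since the GCS Credentials JSON field expects a base64-encoded service account key, the following sketch shows one way to produce that value. It assumes standard (not URL-safe) base64 over the full JSON document, which the page does not specify; the function name and the sample key contents are placeholders.

```python
import base64
import json


def encode_service_account_key(key_json: str) -> str:
    """Validate a service account key and return it base64-encoded."""
    # Parse first so malformed JSON fails here rather than at sync time.
    json.loads(key_json)
    return base64.b64encode(key_json.encode("utf-8")).decode("ascii")


# Placeholder key; a real key file contains more fields.
sample_key = json.dumps({"type": "service_account", "project_id": "my-project"})
encoded = encode_service_account_key(sample_key)
print(encoded)
```

In practice the input would be read from the downloaded key file, for example with `pathlib.Path("key.json").read_text()`, where the file name is again a placeholder.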

Write Options

Control how data is written to Iceberg tables. See Write Modes for details on Merge on Read vs Append.

| Field | Default | Description |
|---|---|---|
| Spec Version | V3 | Iceberg table format version |
| Write Mode | Merge on Read | How updates and deletes are handled |
| Delete Mode | Soft | For Merge on Read: Soft preserves audit trail, Hard removes rows |
| Truncate Table if exists | Off | Remove existing data before snapshot sync |
| Metadata Compression | Gzip | Compression for Iceberg metadata files |
| Flush Interval | 10000 ms | Commit frequency |
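Because the flush interval sets the commit frequency, a quick calculation gives an upper bound on how many commits a table can receive per day. This is a rough sketch that assumes at most one commit per table per flush interval; the function name is hypothetical.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def max_commits_per_day(flush_interval_ms: int) -> int:
    """Upper bound on daily commits, assuming one commit per interval."""
    if flush_interval_ms <= 0:
        raise ValueError("flush interval must be positive")
    return MS_PER_DAY // flush_interval_ms


print(max_commits_per_day(10_000))  # 8640 at the default 10000 ms
print(max_commits_per_day(60_000))  # 1440 at one commit per minute
```

Each Iceberg commit produces a new table snapshot, so a longer interval trades data freshness for less metadata growth.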

Parquet Settings

Configure the Parquet file format. Defaults work well for most workloads.

| Field | Default | Description |
|---|---|---|
| Compression | Zstd | Zstd, Snappy, Gzip, Lz4Raw, Brotli, or Uncompressed |
| Compression Level | 3 | Zstd (1-22), Gzip (0-9), or Brotli (0-11) |
| Target File Size | 512 MB | Files roll when exceeding this size |
| Parquet Version | V1 | V1 for compatibility, V2 for better encoding |
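The codec and level ranges in the table can be checked before a configuration is saved. The sketch below uses a hypothetical helper; treating a level supplied for a codec without levels (Snappy, Lz4Raw, Uncompressed) as invalid is an assumption, since the page does not say whether such a value is rejected or ignored.

```python
from typing import Optional

# Level ranges as listed in the Parquet Settings table.
LEVEL_RANGES = {"zstd": (1, 22), "gzip": (0, 9), "brotli": (0, 11)}
CODECS_WITHOUT_LEVEL = {"snappy", "lz4raw", "uncompressed"}


def is_valid_compression(codec: str, level: Optional[int] = None) -> bool:
    """Return True if the codec is known and the level fits its range."""
    name = codec.lower()
    if name in CODECS_WITHOUT_LEVEL:
        # Assumption: these codecs accept no level at all.
        return level is None
    if name not in LEVEL_RANGES:
        return False
    if level is None:
        return True
    low, high = LEVEL_RANGES[name]
    return low <= level <= high


print(is_valid_compression("Zstd", 3))     # True (the default setting)
print(is_valid_compression("Brotli", 12))  # False (above the 0-11 range)
```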
