| Feature | Notes |
| :--- | :--- |
| Bulk Insert | Uses `COPY FROM ... FORMAT BINARY` to load data in bulk, providing far higher throughput than row-by-row inserts. |
| Transactional Consistency | Guarantees that synced data reflects a consistent point in the upstream database's timeline, respecting transaction boundaries to maintain data integrity. |
| Schema Evolution | Automatic schema evolution to handle changes in source table structures, including new columns and data type modifications. |
| Hard Deletes | Row deletion is handled by issuing a `DELETE` statement, ensuring deleted records are removed from the target database. |
***
## Prerequisites
Before you begin, ensure you have:
* **Supported PostgreSQL Version:** PostgreSQL 9.4 or higher.
* [Generic PostgreSQL](https://www.postgresql.org/)
* [AWS RDS PostgreSQL](https://aws.amazon.com/rds/postgresql/)
* [AWS Aurora PostgreSQL](https://aws.amazon.com/rds/aurora/)
* [Google Cloud PostgreSQL](https://cloud.google.com/sql/postgresql)
* [Azure PostgreSQL](https://azure.microsoft.com/en-us/products/postgresql)
* **Database Admin Access:** Sufficient privileges to create users and grant permissions.
* **Network Connectivity:** Ensure that Supermetal's services can reach your PostgreSQL database. You may need to configure security groups.
***
## Setup
Log in to your PostgreSQL database and run the following:
```sql
-- Create a user for Supermetal
CREATE USER supermetal_user WITH PASSWORD 'strong-password';
-- Grant permissions
-- allows supermetal to connect to the database, create schemas and (temp) tables
GRANT CONNECT, CREATE, TEMPORARY ON DATABASE target_database TO supermetal_user;
```
Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the target in Supermetal.
Replace the placeholders in the script with your specific values:
* `target_database`: The name of your PostgreSQL database
If your destination schema has foreign key constraints, tables loaded in parallel may temporarily violate them until all referenced rows arrive. To defer constraint checks during writes, grant elevated privileges:
```sql
-- AWS RDS
GRANT rds_superuser TO supermetal_user;
-- Standard PostgreSQL (run as superuser)
ALTER USER supermetal_user WITH SUPERUSER;
```
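To illustrate why elevated privileges are needed: with superuser rights, a session can suspend trigger-based constraint enforcement (including foreign-key checks) while loading. The sketch below shows the standard PostgreSQL mechanism for this; it is not necessarily what Supermetal uses internally.

```sql
-- Illustration only: a superuser session can skip trigger-based
-- enforcement (including foreign-key checks) during a bulk load
SET session_replication_role = replica;
-- ... load tables in any order ...
SET session_replication_role = DEFAULT;
```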
For large snapshots, tune checkpoint frequency to maximize COPY throughput and avoid I/O storms. A checkpoint triggers when either of the following limits is reached first.
* **`max_wal_size`**: Maximum WAL size before checkpoint. COPY generates WAL rapidly, and small values (default: 1GB) cause frequent checkpoints that throttle performance. Increase to 16GB - 64GB to allow bulk loads to complete with fewer I/O interruptions.
* **`checkpoint_timeout`**: Maximum time between checkpoints. Default (5min) triggers checkpoints before large snapshots complete. Increase to 30min - 1h to prevent premature interruptions during extended loads.
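These settings can be raised ahead of a large initial load and reverted afterwards. The values below are illustrative; tune them for your hardware:

```sql
-- Raise checkpoint limits before a large bulk load (illustrative values)
ALTER SYSTEM SET max_wal_size = '32GB';
ALTER SYSTEM SET checkpoint_timeout = '30min';
SELECT pg_reload_conf();  -- both settings apply without a server restart
```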
## Data Types Mapping
| Apache Arrow DataType | PostgreSQL Type | Notes |
| :-------------------- | :----------------- | :---- |
| `Int8` | `SMALLINT` | |
| `Int16` | `SMALLINT` | |
| `Int32` | `INTEGER` | |
| `Int64` | `BIGINT` | |
| `UInt8` | `SMALLINT` | |
| `UInt16` | `INTEGER` | |
| `UInt32` | `BIGINT` | |
| `UInt64` | `NUMERIC` | |
| `Float16` | `REAL` | |
| `Float32` | `REAL` | |
| `Float64` | `DOUBLE PRECISION` | |
| `Decimal128(p,s)` | `NUMERIC(p,s)` | |
| `Decimal256(p,s)` | `NUMERIC(p,s)` | |

| Apache Arrow DataType | PostgreSQL Type | Notes |
| :------------------------- | :-------------- | :--------------------------------------------------------------------------------------------- |
| `Date32` | `DATE` | |
| `Date64` | `DATE` | |
| `Time32` | `TIME` | |
| `Time64` | `TIME` | |
| `Time64(ns)` | `BIGINT` | Nanosecond precision stored as integer because Postgres does not support nanosecond precision |
| `Timestamp(s/ms/us, [tz])` | `TIMESTAMP[Z]` | |
| `Timestamp(ns, [tz])` | `BIGINT` | Nanosecond timestamps stored as integer because Postgres does not support nanosecond precision |
| `Interval` | `INTERVAL` | |
| `Duration` | Not supported | Durations are explicitly not supported |

| Apache Arrow DataType | PostgreSQL Type | Notes |
| :-------------------- | :-------------- | :---- |
| `Utf8`, `LargeUtf8` | `TEXT` | |

| Apache Arrow DataType | PostgreSQL Type | Notes |
| :-------------------- | :-------------- | :------------------- |
| `Boolean` | `BOOLEAN` | Boolean (true/false) |

| Apache Arrow DataType | PostgreSQL Type | Notes |
| :--------------------------------------------------------------------------------------------------------- | :-------------- | :---- |
| `Utf8` [JSON Extension](https://arrow.apache.org/docs/format/CanonicalExtensions.html#json) (`arrow.json`) | `JSONB` | |
| `Struct` | Not supported | |
| `Map` | Not supported | |

| Apache Arrow DataType | PostgreSQL Type | Notes |
| :----------------------------------------- | :-------------- | :---- |
| `Binary`, `LargeBinary`, `FixedSizeBinary` | `BYTEA` | |

| Apache Arrow DataType | PostgreSQL Type | Notes |
| :-------------------------------------------- | :-------------- | :----------------------------------------------------------- |
| `List`, `LargeList`, `FixedSizeList` | `T[]` | Only one-dimensional arrays of primitive types are supported |
# PostgreSQL Setup (/docs/main/targets/postgres/setup)
## Prerequisites
Before you begin, ensure you have:
* **Supported PostgreSQL Version:** PostgreSQL 9.4 or higher.
* [Generic PostgreSQL](https://www.postgresql.org/)
* [AWS RDS PostgreSQL](https://aws.amazon.com/rds/postgresql/)
* [AWS Aurora PostgreSQL](https://aws.amazon.com/rds/aurora/)
* [Google Cloud PostgreSQL](https://cloud.google.com/sql/postgresql)
* [Azure PostgreSQL](https://azure.microsoft.com/en-us/products/postgresql)
* **Database Admin Access:** Sufficient privileges to create users and grant permissions.
* **Network Connectivity:** Ensure that Supermetal's services can reach your PostgreSQL database. You may need to configure security groups.
***
## Setup
Log in to your PostgreSQL database and run the following:
```sql
-- Create a user for Supermetal
CREATE USER supermetal_user WITH PASSWORD 'strong-password';
-- Grant permissions
-- allows supermetal to connect to the database, create schemas and (temp) tables
GRANT CONNECT, CREATE, TEMPORARY ON DATABASE target_database TO supermetal_user;
```
Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the target in Supermetal.
Replace the placeholders in the script with your specific values:
* `target_database`: The name of your PostgreSQL database
If your destination schema has foreign key constraints, tables loaded in parallel may temporarily violate them until all referenced rows arrive. To defer constraint checks during writes, grant elevated privileges:
```sql
-- AWS RDS
GRANT rds_superuser TO supermetal_user;
-- Standard PostgreSQL (run as superuser)
ALTER USER supermetal_user WITH SUPERUSER;
```
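To illustrate why elevated privileges are needed: with superuser rights, a session can suspend trigger-based constraint enforcement (including foreign-key checks) while loading. The sketch below shows the standard PostgreSQL mechanism for this; it is not necessarily what Supermetal uses internally.

```sql
-- Illustration only: a superuser session can skip trigger-based
-- enforcement (including foreign-key checks) during a bulk load
SET session_replication_role = replica;
-- ... load tables in any order ...
SET session_replication_role = DEFAULT;
```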
For large snapshots, tune checkpoint frequency to maximize COPY throughput and avoid I/O storms. A checkpoint triggers when either of the following limits is reached first.
* **`max_wal_size`**: Maximum WAL size before checkpoint. COPY generates WAL rapidly, and small values (default: 1GB) cause frequent checkpoints that throttle performance. Increase to 16GB - 64GB to allow bulk loads to complete with fewer I/O interruptions.
* **`checkpoint_timeout`**: Maximum time between checkpoints. Default (5min) triggers checkpoints before large snapshots complete. Increase to 30min - 1h to prevent premature interruptions during extended loads.
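These settings can be raised ahead of a large initial load and reverted afterwards. The values below are illustrative; tune them for your hardware:

```sql
-- Raise checkpoint limits before a large bulk load (illustrative values)
ALTER SYSTEM SET max_wal_size = '32GB';
ALTER SYSTEM SET checkpoint_timeout = '30min';
SELECT pg_reload_conf();  -- both settings apply without a server restart
```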
# Snowflake (/docs/main/targets/snowflake)
import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Callout } from 'fumadocs-ui/components/callout';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
Snowflake is a popular cloud data warehouse that separates compute and storage.
This guide walks you through configuring your Snowflake warehouse to work seamlessly with Supermetal.
***
## Features
| Feature | Notes |
| :--- | :--- |
| Bulk Insert | Uses the [new vectorized parquet scanner](https://www.snowflake.com/en/engineering-blog/loading-terabytes-into-snowflake-speeds-feeds-techniques/) to efficiently load data. |
| Transactional Consistency | Guarantees that synced data reflects a consistent point in the upstream database's timeline, respecting transaction boundaries. |
| Schema Evolution | Automatic schema evolution to handle changes in source table structures. |
| Soft Deletes | Row deletion is handled by setting the `_sm_deleted` boolean metadata column rather than removing the row. |
***
## Prerequisites
Before you begin, ensure you have:
* **Supported Snowflake Implementations:**
* AWS
* Azure
Snowflake on GCP is not yet supported.
* **Database Admin Access:** Sufficient privileges to create users and grant permissions. This might require `ACCOUNTADMIN` / `SYSADMIN` / `SECURITYADMIN` privileges.
***
## Setup
### Create a Role
```sql
begin;
set supermetal_role = 'SUPERMETAL_ROLE';
-- create Supermetal role
use role ACCOUNTADMIN;
create role if not exists identifier($supermetal_role);
grant role identifier($supermetal_role) to role SYSADMIN;
-- grant permissions to create database
grant CREATE DATABASE ON ACCOUNT TO ROLE identifier($supermetal_role);
commit;
```
### Create a User
```sql
begin;
set supermetal_role = 'SUPERMETAL_ROLE';
set supermetal_username = 'SUPERMETAL_USER';
set supermetal_password = 'strong-password';
-- create Supermetal user
use role ACCOUNTADMIN;
create user if not exists identifier($supermetal_username)
default_role = $supermetal_role
password = $supermetal_password;
grant role identifier($supermetal_role) to user identifier($supermetal_username);
commit;
```
Replace `'strong-password'` with a secure, unique password for `SUPERMETAL_USER`. Store this password securely, as you'll need it when configuring the target in Supermetal.
Snowflake is [deprecating password authentication](https://community.snowflake.com/s/article/FAQ-Snowflake-to-block-password-only-logins). Please use other authentication methods instead.
### Create a User
```sql
begin;
set supermetal_role = 'SUPERMETAL_ROLE';
set supermetal_username = 'SUPERMETAL_USER';
-- create Supermetal user
use role ACCOUNTADMIN;
create user if not exists identifier($supermetal_username)
default_role = $supermetal_role
type = SERVICE;
grant role identifier($supermetal_role) to user identifier($supermetal_username);
commit;
```
### Create a Key Pair
Follow the Snowflake [documentation](https://docs.snowflake.com/en/user-guide/key-pair-auth#configuring-key-pair-authentication) to:
* Generate a private key and a public key
* Assign the public key to the Supermetal user: `ALTER USER SUPERMETAL_USER SET RSA_PUBLIC_KEY='MIIBIjANBgkqh...';`
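The linked guide walks through key generation; as a sketch, an unencrypted key pair can be generated with OpenSSL as follows (prefer an encrypted private key in production):

```shell
# Generate a 2048-bit RSA private key in PKCS#8 format (unencrypted)
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt
# Derive the public key to paste into ALTER USER ... SET RSA_PUBLIC_KEY
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
```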
### (Optional) Create a Warehouse
```sql
begin;
set supermetal_role = 'SUPERMETAL_ROLE';
set supermetal_username = 'SUPERMETAL_USER';
set supermetal_warehouse = 'SUPERMETAL_WAREHOUSE';
-- change role to sysadmin to create warehouse
use role sysadmin;
-- create Supermetal warehouse
create warehouse if not exists identifier($supermetal_warehouse)
warehouse_size = xsmall
warehouse_type = standard
auto_suspend = 60
auto_resume = true
initially_suspended = true;
-- uncomment to use Generation 2 Warehouse if supported
-- ALTER WAREHOUSE identifier($supermetal_warehouse) SET RESOURCE_CONSTRAINT = STANDARD_GEN_2;
-- grant Supermetal warehouse access
grant USAGE
on warehouse identifier($supermetal_warehouse)
to role identifier($supermetal_role);
commit;
```
***
## Transactions
When enabled, Supermetal uses Snowflake's [transactions](https://docs.snowflake.com/en/sql-reference/transactions) to apply changes atomically across multiple tables. This preserves transaction boundaries from your source database when possible, ensuring that all changes within a single source transaction are either fully applied or not applied at all.
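Conceptually, every change from one source transaction is replayed inside one Snowflake transaction (the tables below are hypothetical):

```sql
BEGIN;
-- all changes from a single source transaction...
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
INSERT INTO audit_log (account_id, delta) VALUES (1, -100);
COMMIT; -- ...become visible atomically, or not at all on failure
```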
***
## Data Types Mapping
| Apache Arrow DataType | Snowflake Type | Notes |
| :----------------------------------------------------------------------- | :------------- | :------------------------- |
| `Int8`, `Int16`, `Int32`, `Int64`, `UInt8`, `UInt16`, `UInt32`, `UInt64` | `NUMBER` | |
| `Float16`, `Float32`, `Float64` | `FLOAT` | |
| `Decimal128(p,s)` | `NUMBER(p,s)` | if `p <= 38` and `s <= 37` |
| `Decimal256(p,s)` | `VARCHAR` | if `p > 38` |

| Apache Arrow DataType | Snowflake Type | Notes |
| :--------------------- | :----------------- | :---- |
| `Date32`, `Date64` | `DATE` | |
| `Time32(ms)` | `TIME(0)` | |
| `Time64(us)` | `TIME(6)` | |
| `Time64(ns)` | `TIME(9)` | |
| `Timestamp(s, [tz])` | `TIMESTAMP_NTZ(0)` | |
| `Timestamp(ms, [tz])` | `TIMESTAMP_NTZ(3)` | |
| `Timestamp(us, [tz])` | `TIMESTAMP_NTZ(6)` | |
| `Timestamp(ns, [tz])` | `TIMESTAMP_NTZ(9)` | |
| `Duration`, `Interval` | `VARCHAR` | |

| Apache Arrow DataType | Snowflake Type | Notes |
| :-------------------- | :------------- | :---- |
| `Utf8`, `LargeUtf8` | `VARCHAR` | |

| Apache Arrow DataType | Snowflake Type | Notes |
| :-------------------- | :------------- | :---- |
| `Boolean` | `BOOLEAN` | |

| Apache Arrow DataType | Snowflake Type | Notes |
| :--------------------------------------------------------------------------------------------------------- | :------------- | :---- |
| `Struct` | `VARIANT` | |
| `Map` | `VARIANT` | |
| `Utf8` [JSON Extension](https://arrow.apache.org/docs/format/CanonicalExtensions.html#json) (`arrow.json`) | `VARIANT` | |

| Apache Arrow DataType | Snowflake Type | Notes |
| :----------------------------------------- | :------------- | :---- |
| `Binary`, `LargeBinary`, `FixedSizeBinary` | `BINARY` | |

| Apache Arrow DataType | Snowflake Type | Notes |
| :----------------------------------- | :------------- | :---- |
| `List`, `LargeList`, `FixedSizeList` | `VARIANT` | |
VARCHAR & BINARY data types are limited to 16MB and might be truncated ([TRUNCATECOLUMNS](https://docs.snowflake.com/en/sql-reference/sql/copy-into-table#copy-options-copyoptions) option).
***
# Snowflake Setup (/docs/main/targets/snowflake/setup)
## Prerequisites
Before you begin, ensure you have:
* **Supported Snowflake Implementations:**
* AWS
* Azure
Snowflake on GCP is not yet supported.
* **Database Admin Access:** Sufficient privileges to create users and grant permissions. This might require `ACCOUNTADMIN` / `SYSADMIN` / `SECURITYADMIN` privileges.
***
## Setup
### Create a Role
```sql
begin;
set supermetal_role = 'SUPERMETAL_ROLE';
-- create Supermetal role
use role ACCOUNTADMIN;
create role if not exists identifier($supermetal_role);
grant role identifier($supermetal_role) to role SYSADMIN;
-- grant permissions to create database
grant CREATE DATABASE ON ACCOUNT TO ROLE identifier($supermetal_role);
commit;
```
### Create a User
```sql
begin;
set supermetal_role = 'SUPERMETAL_ROLE';
set supermetal_username = 'SUPERMETAL_USER';
set supermetal_password = 'strong-password';
-- create Supermetal user
use role ACCOUNTADMIN;
create user if not exists identifier($supermetal_username)
default_role = $supermetal_role
password = $supermetal_password;
grant role identifier($supermetal_role) to user identifier($supermetal_username);
commit;
```
Replace `'strong-password'` with a secure, unique password for `SUPERMETAL_USER`. Store this password securely, as you'll need it when configuring the target in Supermetal.
Snowflake is [deprecating password authentication](https://community.snowflake.com/s/article/FAQ-Snowflake-to-block-password-only-logins). Please use other authentication methods instead.
### Create a User
```sql
begin;
set supermetal_role = 'SUPERMETAL_ROLE';
set supermetal_username = 'SUPERMETAL_USER';
-- create Supermetal user
use role ACCOUNTADMIN;
create user if not exists identifier($supermetal_username)
default_role = $supermetal_role
type = SERVICE;
grant role identifier($supermetal_role) to user identifier($supermetal_username);
commit;
```
### Create a Key Pair
Follow the Snowflake [documentation](https://docs.snowflake.com/en/user-guide/key-pair-auth#configuring-key-pair-authentication) to:
* Generate a private key and a public key
* Assign the public key to the Supermetal user: `ALTER USER SUPERMETAL_USER SET RSA_PUBLIC_KEY='MIIBIjANBgkqh...';`
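The linked guide walks through key generation; as a sketch, an unencrypted key pair can be generated with OpenSSL as follows (prefer an encrypted private key in production):

```shell
# Generate a 2048-bit RSA private key in PKCS#8 format (unencrypted)
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt
# Derive the public key to paste into ALTER USER ... SET RSA_PUBLIC_KEY
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
```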
### (Optional) Create a Warehouse
```sql
begin;
set supermetal_role = 'SUPERMETAL_ROLE';
set supermetal_username = 'SUPERMETAL_USER';
set supermetal_warehouse = 'SUPERMETAL_WAREHOUSE';
-- change role to sysadmin to create warehouse
use role sysadmin;
-- create Supermetal warehouse
create warehouse if not exists identifier($supermetal_warehouse)
warehouse_size = xsmall
warehouse_type = standard
auto_suspend = 60
auto_resume = true
initially_suspended = true;
-- uncomment to use Generation 2 Warehouse if supported
-- ALTER WAREHOUSE identifier($supermetal_warehouse) SET RESOURCE_CONSTRAINT = STANDARD_GEN_2;
-- grant Supermetal warehouse access
grant USAGE
on warehouse identifier($supermetal_warehouse)
to role identifier($supermetal_role);
commit;
```
# File (/docs/main/sources/file)
import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Callout } from 'fumadocs-ui/components/callout';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
FileSource ingests files from object stores (S3, GCS, Azure Blob) and filesystems. It handles format detection, compression, and incremental syncs automatically.
***
## Features
| Feature | Notes |
| :--- | :--- |
| Zero-Config | File type, format, and compression are detected from the file extension. CSV dialect (delimiter, quote character, encoding) and column data types are inferred automatically. |
| Change Detection | Only changed files are processed on each sync. |
| Streaming Architecture | Files are processed in a streaming fashion: no local disk required, no file size limits. |
| Flexible Table Mapping | Map all files to one table, derive table names from filenames, or extract them from paths using regex. |
***
## Supported Backends
| Backend | URL Format |
| :------------------- | :---------------------- |
| Amazon S3 | `s3://bucket/path` |
| Google Cloud Storage | `gs://bucket/path` |
| Azure Blob Storage | `az://container/path` |
| Local Filesystem | `file:///absolute/path` |
| SFTP (coming soon) | `sftp://host/path` |
| FTP (coming soon) | `ftp://host/path` |
S3-compatible stores (MinIO, Cloudflare R2) use the S3 backend with a custom endpoint.
***
## Supported Formats
| Format | Extensions |
| :------------------ | :--------------------------- |
| Parquet | `.parquet` |
| CSV | `.csv`, `.tsv`, `.psv` |
| JSON (coming soon) | `.json`, `.jsonl`, `.ndjson` |
| Avro (coming soon) | `.avro` |
| Excel (coming soon) | `.xlsx`, `.xls` |
***
## Compression
| Compression | Extensions |
| :---------- | :-------------- |
| Gzip | `.gz`, `.gzip` |
| Zstandard | `.zst`, `.zstd` |
| Bzip2 | `.bz2` |
| XZ | `.xz` |
| LZMA | `.lzma` |
| Brotli | `.br` |
| Deflate | `.deflate` |
| Zlib | `.zz` |
Compression is detected from file extension. Example: `data.csv.gz` is detected as gzip-compressed CSV.
***
## Archives
| Archive | Extensions |
| :------ | :---------------------------------------------------------------------------------------------------------- |
| ZIP | `.zip` |
| TAR | `.tar`, `.tar.gz`, `.tgz`, `.tar.bz2`, `.tbz2`, `.tar.xz`, `.txz`, `.tar.zst`, `.tzst`, `.tar.lzma`, `.tlz` |
***
## Prerequisites
Before you begin, ensure you have:
* **Source Access**: Read access to the object store, filesystem, or remote server containing your files
* **Credentials**:
* AWS S3: IAM user or role with `s3:GetObject` and `s3:ListBucket` permissions
* GCS: Service account with Storage Object Viewer role
* Azure: SAS token with Read and List permissions
* Local Filesystem: Read permissions on the directory
* SFTP/FTP (coming soon): Username and password or SSH key
* **Network Connectivity**: Supermetal agent can reach the source endpoint
* **(Optional) Write Access**: Required only if using post-processing (delete or move files after sync)
***
## Setup
### Configure AWS S3
#### Create IAM Policy
Create an IAM policy with read-only access to your bucket, then attach it to an IAM user or role. Use this policy document:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SupermetalFileSourcePolicy",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket/*",
        "arn:aws:s3:::your-bucket"
      ]
    }
  ]
}
```
To create the policy in the console:
* Navigate to the [AWS IAM Console](https://console.aws.amazon.com/iam/).
* Go to Policies and click "Create policy".
* Select "JSON" and paste the policy document above.
Or with the AWS CLI, save the document to `policy.json` and run:
```bash
aws iam create-policy \
  --policy-name supermetal-file-source-policy \
  --policy-document file://policy.json
```
If using post-processing (delete or move files after sync), add `s3:DeleteObject` and `s3:PutObject` to the policy.
#### Configure File Source in Supermetal
You need the following connection details:
* Bucket name
* Region
* Access key ID (optional)
* Secret access key (optional)
When running on AWS (EC2, ECS, EKS), you can use an instance profile or IAM role instead of access keys. Attach the policy to your instance role and leave the access key fields empty.
For S3-compatible stores (MinIO, Cloudflare R2, Wasabi), also specify the custom endpoint URL.
### Configure Google Cloud Storage
#### Create Service Account
* Navigate to the [Google Cloud Console](https://console.cloud.google.com/).
* Go to IAM & Admin > Service Accounts.
* Click "Create Service Account".
* Enter a name (e.g., "supermetal-file-source").
* Grant the "Storage Object Viewer" role.
* If using post-processing, also grant "Storage Object Admin".
* Click "Done".
#### Generate Key File
* Select the service account you created.
* Go to the "Keys" tab.
* Click "Add Key" > "Create new key".
* Select "JSON" and click "Create".
* Save the downloaded JSON key file securely.
#### Configure File Source in Supermetal
You need the following:
* Service account JSON key file
* Bucket name
### Configure Azure Blob Storage
#### Create SAS Token
* Navigate to your storage account in the [Azure Portal](https://portal.azure.com/).
* Go to Containers and select your container.
* Click "Shared access tokens".
* Configure the SAS settings:
* Permissions: Read, List
* Set start and expiry time
* Click "Generate SAS token and URL".
* Copy the SAS token.
Alternatively, generate the token with the Azure CLI:
```bash
end=$(date -u -d "1 year" '+%Y-%m-%dT%H:%MZ')
az storage container generate-sas \
--name your-container \
--account-name your-storage-account \
--permissions rl \
--expiry $end \
--output tsv
```
If using post-processing, add Write and Delete permissions to the SAS token.
#### Configure File Source in Supermetal
You need the following:
* Storage account name
* Container name
* SAS token
### Configure Local Filesystem
Specify the absolute path to the directory containing your files:
```
file:///path/to/your/files
```
Ensure the Supermetal agent has read access to the directory.
***
## File Selection
| Option | Description | Example |
| :----------------- | :------------------------------------------ | :------------------------------------------------------------------------------ |
| `glob_patterns` | Files to include | `**/*.parquet` (all parquet files), `data/2024/**/*.csv` (CSV under data/2024/) |
| `exclude_patterns` | Files to skip | `**/_temporary/**`, `**/.staging/**` |
| `start_date` | Ignore files modified before this timestamp | `2024-01-01T00:00:00Z` |
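As a sketch, these options might be combined as follows (field names are from the table above; the exact configuration syntax depends on how the source is configured in Supermetal):
```
glob_patterns:
  - "**/*.parquet"
  - "data/2024/**/*.csv"
exclude_patterns:
  - "**/_temporary/**"
start_date: "2024-01-01T00:00:00Z"
```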
***
## Table Mapping
### Auto Table Mapping
Each file becomes its own table. Table name derived from filename.
```
prefix: "raw_"
```
**Source Files**
```
s3://exports/
├── customers.parquet
├── products.parquet
└── transactions.parquet
```
→
**Destination Tables**
```
├── raw_customers
├── raw_products
└── raw_transactions
```
### Single Table Mapping
All files load into one destination table.
```
destination: "orders"
```
**Source Files**
```
s3://vendor-data/
├── orders_jan.csv
├── orders_feb.csv
└── orders_mar.csv
```
→
**Destination Tables**
```
└── orders
```
### Dynamic Table Mapping
Extract table name from file path using regex capture groups.
```
pattern: "(?P<entity>[^/]+)/(?P<year>[0-9]{4})/.*"
template: "{entity}_{year}"
```
**Source Files**
```
s3://datalake/
├── sales/2024/q1.parquet
├── sales/2024/q2.parquet
├── orders/2024/q1.parquet
└── orders/2024/q2.parquet
```
→
**Destination Tables**
```
├── sales_2024
└── orders_2024
```
***
## Format Options
### CSV Options
All options are auto-detected, including column data types (string, integer, float, boolean, date, timestamp).
| Option | Description |
| :------------------ | :---------------------------------------------------- |
| `has_header` | First row contains column names |
| `delimiter` | Field separator (e.g., `,`, `\t`, `\|`) |
| `quote` | Character used to quote field values |
| `escape` | Character used to escape special characters |
| `comment` | Lines starting with this character are skipped |
| `terminator` | Line ending (e.g., `\n`, `\r\n`) |
| `null_values` | Strings treated as NULL (e.g., `["NULL", "\\N", ""]`) |
| `encoding` | Character encoding |
| `skip_rows` | Number of rows to skip before header |
| `allow_jagged_rows` | Allow rows with fewer columns, fill missing with NULL |
### Encoding
Auto-detected if not specified. Supports encodings from the [WHATWG Encoding Standard](https://encoding.spec.whatwg.org/#concept-encoding-get):
`UTF-8`, `UTF-16LE`, `UTF-16BE`, `ISO-8859-1` through `ISO-8859-16`, `Windows-1250` through `Windows-1258`, `GBK`, `GB18030`, `Big5`, `EUC-JP`, `EUC-KR`, `Shift_JIS`, `ISO-2022-JP`, `KOI8-R`, `KOI8-U`, and others.
### Parquet Options
No configuration required. Schema and compression are read from file metadata.
***
## Polling
| Option | Description | Default |
| :----------------- | :------------------------- | :------ |
| `interval_seconds` | Seconds between file scans | `60` |
Set to `0` for a one-time sync.
***
## Error Handling
| Option | Behavior |
| :--------------- | :----------------------------------- |
| `Skip` (default) | Log error, continue with other files |
| `Fail` | Stop sync on first file error |
***
## Post-Processing
Action to take on source files after successful processing. Post-processing is best effort.
Post-processing requires write access to the source bucket.
| Action | Behavior |
| :------------- | :------------------------------------------- |
| None (default) | Leave files in place |
| Delete | Remove file after successful processing |
| Move | Move file to specified path after processing |
**Move example**: Files in `s3://bucket/inbox/` moved to `s3://bucket/processed/` after sync.
***
## Limitations
* Nested archives not supported (e.g., `.tar.gz` containing `.zip`)
* Object store API rate limits apply (S3, GCS, Azure). Supermetal retries throttled requests automatically.
# File Source Setup (/docs/main/sources/file/setup)
## Prerequisites
Before you begin, ensure you have:
* **Source Access**: Read access to the object store, filesystem, or remote server containing your files
* **Credentials**:
* AWS S3: IAM user or role with `s3:GetObject` and `s3:ListBucket` permissions
* GCS: Service account with Storage Object Viewer role
* Azure: SAS token with Read and List permissions
* Local Filesystem: Read permissions on the directory
* SFTP/FTP (coming soon): Username and password or SSH key
* **Network Connectivity**: Supermetal agent can reach the source endpoint
* **(Optional) Write Access**: Required only if using post-processing (delete or move files after sync)
***
## Setup
### Configure AWS S3
#### Create IAM Policy
Create an IAM policy with read-only access to your bucket, then attach it to an IAM user or role. Use this policy document:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SupermetalFileSourcePolicy",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket/*",
        "arn:aws:s3:::your-bucket"
      ]
    }
  ]
}
```
To create the policy in the console:
* Navigate to the [AWS IAM Console](https://console.aws.amazon.com/iam/).
* Go to Policies and click "Create policy".
* Select "JSON" and paste the policy document above.
Or with the AWS CLI, save the document to `policy.json` and run:
```bash
aws iam create-policy \
  --policy-name supermetal-file-source-policy \
  --policy-document file://policy.json
```
If using post-processing (delete or move files after sync), add `s3:DeleteObject` and `s3:PutObject` to the policy.
#### Configure File Source in Supermetal
You need the following connection details:
* Bucket name
* Region
* Access key ID (optional)
* Secret access key (optional)
When running on AWS (EC2, ECS, EKS), you can use an instance profile or IAM role instead of access keys. Attach the policy to your instance role and leave the access key fields empty.
For S3-compatible stores (MinIO, Cloudflare R2, Wasabi), also specify the custom endpoint URL.
### Configure Google Cloud Storage
#### Create Service Account
* Navigate to the [Google Cloud Console](https://console.cloud.google.com/).
* Go to IAM & Admin > Service Accounts.
* Click "Create Service Account".
* Enter a name (e.g., "supermetal-file-source").
* Grant the "Storage Object Viewer" role.
* If using post-processing, also grant "Storage Object Admin".
* Click "Done".
#### Generate Key File
* Select the service account you created.
* Go to the "Keys" tab.
* Click "Add Key" > "Create new key".
* Select "JSON" and click "Create".
* Save the downloaded JSON key file securely.
#### Configure File Source in Supermetal
You need the following:
* Service account JSON key file
* Bucket name
### Configure Azure Blob Storage
#### Create SAS Token
* Navigate to your storage account in the [Azure Portal](https://portal.azure.com/).
* Go to Containers and select your container.
* Click "Shared access tokens".
* Configure the SAS settings:
* Permissions: Read, List
* Set start and expiry time
* Click "Generate SAS token and URL".
* Copy the SAS token.
Alternatively, generate the token with the Azure CLI:
```bash
end=$(date -u -d "1 year" '+%Y-%m-%dT%H:%MZ')
az storage container generate-sas \
--name your-container \
--account-name your-storage-account \
--permissions rl \
--expiry $end \
--output tsv
```
If using post-processing, add Write and Delete permissions to the SAS token.
#### Configure File Source in Supermetal
You need the following:
* Storage account name
* Container name
* SAS token
### Configure Local Filesystem
Specify the absolute path to the directory containing your files:
```
file:///path/to/your/files
```
Ensure the Supermetal agent has read access to the directory.
***
## File Selection
| Option | Description | Example |
| :----------------- | :------------------------------------------ | :------------------------------------------------------------------------------ |
| `glob_patterns` | Files to include | `**/*.parquet` (all parquet files), `data/2024/**/*.csv` (CSV under data/2024/) |
| `exclude_patterns` | Files to skip | `**/_temporary/**`, `**/.staging/**` |
| `start_date` | Ignore files modified before this timestamp | `2024-01-01T00:00:00Z` |
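As a sketch, these options might be combined as follows (field names are from the table above; the exact configuration syntax depends on how the source is configured in Supermetal):
```
glob_patterns:
  - "**/*.parquet"
  - "data/2024/**/*.csv"
exclude_patterns:
  - "**/_temporary/**"
start_date: "2024-01-01T00:00:00Z"
```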
***
## Table Mapping
### Auto Table Mapping
Each file becomes its own table. Table name derived from filename.
```
prefix: "raw_"
```
**Source Files**
```
s3://exports/
├── customers.parquet
├── products.parquet
└── transactions.parquet
```
→
**Destination Tables**
```
├── raw_customers
├── raw_products
└── raw_transactions
```
### Single Table Mapping
All files load into one destination table.
```
destination: "orders"
```
**Source Files**
```
s3://vendor-data/
├── orders_jan.csv
├── orders_feb.csv
└── orders_mar.csv
```
→
**Destination Tables**
```
└── orders
```
### Dynamic Table Mapping
Extract table name from file path using regex capture groups.
```
pattern: "(?P<entity>[^/]+)/(?P<year>[0-9]{4})/.*"
template: "{entity}_{year}"
```
**Source Files**
```
s3://datalake/
├── sales/2024/q1.parquet
├── sales/2024/q2.parquet
├── orders/2024/q1.parquet
└── orders/2024/q2.parquet
```
→
**Destination Tables**
```
├── sales_2024
└── orders_2024
```
***
## Format Options
### CSV Options
All options are auto-detected, including column data types (string, integer, float, boolean, date, timestamp).
| Option | Description |
| :------------------ | :---------------------------------------------------- |
| `has_header` | First row contains column names |
| `delimiter` | Field separator (e.g., `,`, `\t`, `\|`) |
| `quote` | Character used to quote field values |
| `escape` | Character used to escape special characters |
| `comment` | Lines starting with this character are skipped |
| `terminator` | Line ending (e.g., `\n`, `\r\n`) |
| `null_values` | Strings treated as NULL (e.g., `["NULL", "\\N", ""]`) |
| `encoding` | Character encoding |
| `skip_rows` | Number of rows to skip before header |
| `allow_jagged_rows` | Allow rows with fewer columns, fill missing with NULL |
### Encoding
Auto-detected if not specified. Supports encodings from the [WHATWG Encoding Standard](https://encoding.spec.whatwg.org/#concept-encoding-get):
`UTF-8`, `UTF-16LE`, `UTF-16BE`, `ISO-8859-1` through `ISO-8859-16`, `Windows-1250` through `Windows-1258`, `GBK`, `GB18030`, `Big5`, `EUC-JP`, `EUC-KR`, `Shift_JIS`, `ISO-2022-JP`, `KOI8-R`, `KOI8-U`, and others.
### Parquet Options
No configuration required. Schema and compression are read from file metadata.
***
## Polling
| Option | Description | Default |
| :----------------- | :------------------------- | :------ |
| `interval_seconds` | Seconds between file scans | `60` |
Set to `0` for a one-time sync.
***
## Error Handling
| Option | Behavior |
| :--------------- | :----------------------------------- |
| `Skip` (default) | Log error, continue with other files |
| `Fail` | Stop sync on first file error |
***
## Post-Processing
Action to take on source files after successful processing. Post-processing is best effort.
Post-processing requires write access to the source bucket.
| Action | Behavior |
| :------------- | :------------------------------------------- |
| None (default) | Leave files in place |
| Delete | Remove file after successful processing |
| Move | Move file to specified path after processing |
**Move example**: Files in `s3://bucket/inbox/` moved to `s3://bucket/processed/` after sync.
***
## Limitations
* Nested archives not supported (e.g., `.tar.gz` containing `.zip`)
* Object store API rate limits apply (S3, GCS, Azure). Supermetal retries throttled requests automatically.
# MongoDB (/docs/main/sources/mongo)
import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Callout } from 'fumadocs-ui/components/callout';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
import { ImageZoom } from 'fumadocs-ui/components/image-zoom';
import { Steps, Step } from 'fumadocs-ui/components/steps';
Supermetal's MongoDB integration leverages MongoDB's native change streams capability to provide efficient, reliable data extraction with minimal impact on your database. This integration enables both initial snapshot capture and continuous replication of changes.
This guide covers the features, prerequisites, and configuration steps required to connect MongoDB with Supermetal.
***
## Replication Modes
Supermetal offers two replication modes to handle MongoDB's flexible document model:
**Schema Mode**: Automatically infers and evolves a strongly-typed schema from your documents. Each field becomes a typed column in the target, enabling native SQL queries and type-safe analytics. Best for structured data and analytics workloads.
**Schemaless Mode**: Preserves documents as JSON in a two-column format (`_id`, `document`). Supports [parallelized snapshots](#parallelized-snapshots) for faster initial loads. Ideal for highly variable document structures or when you need to preserve the original document format for downstream JSON processing.
*Click each tab to see how a MongoDB document is replicated in each mode.*
```json
{
"_id": ObjectId("507f1f77bcf86cd799439011"),
"name": "Alice",
"email": "alice@example.com",
"age": 30,
"is_active": true,
"tags": ["admin", "user"]
}
```
| Column | Value | Type |
| :---------- | :------------------------- | :------ |
| `_id` | `507f1f77bcf86cd799439011` | Utf8 |
| `name` | `Alice` | Utf8 |
| `email` | `alice@example.com` | Utf8 |
| `age` | `30` | Int32 |
| `is_active` | `true` | Boolean |
| `tags` | `["admin","user"]` | Utf8 |
Arrays are serialized as JSON strings to avoid schema conflicts when element types vary across documents.
| Column | Value |
| :--------- | :----------------------------------------------------------------------------------------------- |
| `_id` | `507f1f77bcf86cd799439011` |
| `document` | `{"name":"Alice","email":"alice@example.com","age":30,"is_active":true,"tags":["admin","user"]}` |
***
## Features
|
Feature
|
Notes
|
|
Initial Data Sync
|
Performs fast initial data loads by chunking large collections and reading each chunk concurrently using multiple connections. Parallelized snapshots are available in Schemaless Mode.
|
|
Change Data Capture
|
Captures data changes (inserts, updates, deletes) in real-time using MongoDB's native [Change Streams](https://www.mongodb.com/docs/manual/changeStreams/) functionality. Change streams provide a time-ordered sequence of change events that occur within your MongoDB deployment.
Supermetal creates a single change stream at the database level rather than per collection, reducing resource usage and following MongoDB's [performance recommendations](https://www.mongodb.com/docs/manual/changeStreams/#change-stream-performance-considerations).
|
|
Schema Evolution
|
In Schema Mode, Supermetal automatically infers and enforces a strongly-typed schema backed by Apache Arrow:
**Precise Type Mapping**: Backed by Apache Arrow's rich type system, Supermetal maintains precise type mapping for MongoDB BSON types to Arrow data types
**Schema Evolution**: Detects new fields during both snapshot and CDC phases, seamlessly merging them into a unified schema
**Downstream Compatibility**: Ensures schema consistency when writing to structured targets like databases and data warehouses
|
|
Catalog Support
|
Ability to configure which collections to sync through the catalog interface.
|
|
Document Flattening
|
Optionally converts nested MongoDB documents into a flat, analytics-friendly structure using a double-underscore (`__`) notation. Available in both Schema Mode and Schemaless Mode.
**Example:**
```javascript
// Original MongoDB document
{
"_id": ObjectId("..."),
"user": {
"name": "John",
"address": {
"city": "San Francisco",
"country": "USA"
}
},
"tags": ["tag1", "tag2"]
}
// Flattened representation
{
"_id": "...",
"user__name": "John",
"user__address__city": "San Francisco",
"user__address__country": "USA",
"tags": ["tag1", "tag2"]
}
```
|
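The `__` flattening above is a straightforward recursive walk over nested documents; a minimal sketch (not Supermetal's implementation — arrays are left intact, matching the example):

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into double-underscore keys; lists stay as-is."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}__{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # recurse into nested documents
        else:
            flat[name] = value
    return flat

doc = {
    "_id": "...",
    "user": {"name": "John", "address": {"city": "San Francisco", "country": "USA"}},
    "tags": ["tag1", "tag2"],
}
print(flatten(doc))
# {'_id': '...', 'user__name': 'John', 'user__address__city': 'San Francisco',
#  'user__address__country': 'USA', 'tags': ['tag1', 'tag2']}
```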
***
## Prerequisites
Before you begin, ensure you have:
* **MongoDB Requirements:**
* **Version:** MongoDB 4.0 or higher (required for change streams support)
* **Supported Deployments:**
* `MongoDB Community/Enterprise Server` (replica set configuration)
* `MongoDB Atlas`
Change streams require a replica set deployment. If you're using a standalone server, you must convert it to a single-node replica set to use change streams.
* **Deployment Requirements:**
* **Replica Set**: Your MongoDB deployment must be configured as a replica set
* **Read Concern Majority**: Supermetal uses read concern "majority" to ensure consistent reads
* **Database Permissions**: User with appropriate permissions (see [Setup](#setup))
* **Network Connectivity**: Ensure Supermetal can reach your MongoDB deployment (default port: 27017)
* **TLS/SSL Support**: Supermetal supports both unencrypted and TLS/SSL encrypted connections
* **MongoDB Atlas Requirements**:
* **Network Access**: Configure network access rules to allow Supermetal to connect
* **Connection String**: Use the connection string format that includes all replica set members
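For example, a replica-set connection string typically lists every member and names the replica set (the hostnames and `rs0` below are placeholders):

```
mongodb://supermetal_user:strong-password@mongo1.example.com:27017,mongo2.example.com:27017,mongo3.example.com:27017/?replicaSet=rs0
```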
***
## Setup
### Permissions Overview
Supermetal requires a dedicated MongoDB user with appropriate permissions to read data and access change streams. The recommended approach is to create a dedicated read-only user.
| MongoDB Deployment | Minimum Required Permissions |
| :----------------- | :-------------------------------------------------- |
| Self-managed | `read` role on the database to replicate from |
| MongoDB Atlas | `readAnyDatabase` role |
### Create a Dedicated Read-Only MongoDB User
#### Connect to your MongoDB instance using the mongo shell with admin privileges: \[!toc]
```bash
mongosh --host <host> --port <port> -u <admin_user> -p <admin_password> --authenticationDatabase admin
```
Replace the placeholder values in the command with your actual information:
* `<host>`: Your MongoDB server hostname or IP address
* `<port>`: MongoDB port (default is 27017)
* `<admin_user>`: Username with admin privileges
* `<admin_password>`: Password for the admin user
#### Create a dedicated user for Supermetal: \[!toc]
```javascript
use admin
db.createUser({
user: "supermetal_user",
pwd: "strong-password",
roles: [
{ role: "read", db: "target-database" }
]
})
```
Replace the placeholder values in the script with your actual information:
* `strong-password`: Replace with a secure, unique password for the `supermetal_user`.
* `target-database`: The name of the database you want to replicate from.
#### Log in to your MongoDB Atlas account \[!toc]
Visit [cloud.mongodb.com](https://cloud.mongodb.com/) to access your account.
#### Select your project and navigate to the **Database Access** section \[!toc]
#### Click the "Add New Database User" button \[!toc]
#### Enter a username and password \[!toc]
#### Under "Database User Privileges", select "Built-in Role" and choose "Read Any Database" \[!toc]
#### Optionally, you can restrict access to specific clusters that contain the databases you want to replicate \[!toc]
#### Click "Add User" to create the user with the selected permissions \[!toc]
### Configure MongoDB Source in Supermetal
You are now ready to configure MongoDB as a source within Supermetal. When configuring the source, select your preferred replication mode:
* **Schema Mode**: Infers strongly-typed schemas from documents, creating typed columns in the target. Best for analytics workloads.
* **Schemaless Mode**: Preserves documents as JSON in a two-column format (`_id`, `document`). Best for variable document structures.
***
## Data Types Mapping
The following type mappings apply to **Schema Mode** replication. In **Schemaless Mode**, all documents are stored as JSON in a two-column format (`_id: Utf8`, `document: Json`).
| MongoDB BSON Type(s) | Apache Arrow DataType | Notes |
| :------------------- | :-------------------- | :----------------------------------------------------------------------------------------------------- |
| `Double` | `Float64` | Non-finite values (NaN, Infinity) are converted to null. |
| `Int32` | `Int32` | |
| `Int64` | `Int64` | |
| `Decimal128` | `Utf8` | Preserved as string to maintain exact precision and handle MongoDB's variable decimal precision/scale. |
| MongoDB BSON Type(s) | Apache Arrow DataType | Notes |
| :------------------- | :-------------------- | :---- |
| `Boolean` | `Boolean` | |
| MongoDB BSON Type(s) | Apache Arrow DataType | Notes |
| :------------------- | :------------------------------ | :---------------------------------------------------------------------------------------------------------- |
| `DateTime` | `Timestamp(Millisecond, "UTC")` | Serialized as RFC3339 format with UTC timezone. |
| `Timestamp` | `Utf8` | MongoDB internal oplog timestamp (seconds + ordinal). Serialized as JSON: `{"t": seconds, "i": increment}`. |
| MongoDB BSON Type(s) | Apache Arrow DataType | Notes |
| :------------------------ | :-------------------- | :-------------------------------------- |
| `String` | `Utf8` | |
| `Symbol` | `Utf8` | Deprecated MongoDB type. |
| `RegularExpression` | `Utf8` | Pattern string only. |
| `JavaScriptCode` | `Utf8` | Code string only. |
| `JavaScriptCodeWithScope` | `Utf8` | Serialized as JSON with code and scope. |
| MongoDB BSON Type(s) | Apache Arrow DataType | Notes |
| :------------------- | :-------------------- | :--------------------------------------------------------- |
| `Binary` | `Utf8` | Encoded as hexadecimal string for lossless representation. |
| MongoDB BSON Type(s) | Apache Arrow DataType | Notes |
| :------------------- | :-------------------- | :-------------------------------------------------------------------------------------------------- |
| `Array` | `Utf8` | Arrays are stringified as JSON to avoid schema conflicts when element types vary across documents. |
| `Document` | `Utf8` (JSON) | Nested documents serialized as JSON strings. Empty documents do not contribute to schema inference. |
| MongoDB BSON Type(s) | Apache Arrow DataType | Notes |
| :------------------- | :-------------------- | :---------------------------------------------------- |
| `ObjectId` | `Utf8` | Converted to 24-character hex string. |
| `DbPointer` | `Utf8` | Legacy MongoDB type, serialized as JSON. |
| `Null` | *(no column)* | Null values do not contribute to schema inference. |
| `MinKey` | `Utf8` | Serialized as `{"$minKey":1}`. |
| `MaxKey` | `Utf8` | Serialized as `{"$maxKey":1}`. |
| `Undefined` | `Utf8` | Deprecated type, serialized as `{"$undefined":true}`. |
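A few of the conversions above are easy to mirror with plain-Python stand-ins for BSON values (illustrative only, not Supermetal's code):

```python
import json
import math

def to_arrow_value(value):
    """Mirror a few rows of the mapping table using plain-Python stand-ins."""
    if isinstance(value, bytes):         # Binary -> lossless hex string
        return value.hex()
    if isinstance(value, float):         # Double -> null for NaN/Infinity
        return value if math.isfinite(value) else None
    if isinstance(value, (list, dict)):  # Array/Document -> JSON string
        return json.dumps(value, separators=(",", ":"))
    return value

print(to_arrow_value(b"\xde\xad"))        # dead
print(to_arrow_value(float("inf")))       # None
print(to_arrow_value(["admin", "user"]))  # ["admin","user"]
```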
# MongoDB Setup (/docs/main/sources/mongo/setup)
## Prerequisites
Before you begin, ensure you have:
* **MongoDB Requirements:**
* **Version:** MongoDB 4.0 or higher (required for change streams support)
* **Supported Deployments:**
* `MongoDB Community/Enterprise Server` (replica set configuration)
* `MongoDB Atlas`
Change streams require a replica set deployment. If you're using a standalone server, you must convert it to a single-node replica set to use change streams.
* **Deployment Requirements:**
* **Replica Set**: Your MongoDB deployment must be configured as a replica set
* **Read Concern Majority**: Supermetal uses read concern "majority" to ensure consistent reads
* **Database Permissions**: User with appropriate permissions (see [Setup](#setup))
* **Network Connectivity**: Ensure Supermetal can reach your MongoDB deployment (default port: 27017)
* **TLS/SSL Support**: Supermetal supports both unencrypted and TLS/SSL encrypted connections
* **MongoDB Atlas Requirements**:
* **Network Access**: Configure network access rules to allow Supermetal to connect
* **Connection String**: Use the connection string format that includes all replica set members
***
## Setup
### Permissions Overview
Supermetal requires a dedicated MongoDB user with appropriate permissions to read data and access change streams. The recommended approach is to create a dedicated read-only user.
| MongoDB Deployment | Minimum Required Permissions |
| :----------------- | :-------------------------------------------------- |
| Self-managed | `read` role on the database to replicate from |
| MongoDB Atlas | `readAnyDatabase` role |
### Create a Dedicated Read-Only MongoDB User
#### Connect to your MongoDB instance using the mongo shell with admin privileges: \[!toc]
```bash
mongosh --host <host> --port <port> -u <admin_user> -p <admin_password> --authenticationDatabase admin
```
Replace the placeholder values in the command with your actual information:
* `<host>`: Your MongoDB server hostname or IP address
* `<port>`: MongoDB port (default is 27017)
* `<admin_user>`: Username with admin privileges
* `<admin_password>`: Password for the admin user
#### Create a dedicated user for Supermetal: \[!toc]
```javascript
use admin
db.createUser({
user: "supermetal_user",
pwd: "strong-password",
roles: [
{ role: "read", db: "target-database" }
]
})
```
Replace the placeholder values in the script with your actual information:
* `strong-password`: Replace with a secure, unique password for the `supermetal_user`.
* `target-database`: The name of the database you want to replicate from.
#### Log in to your MongoDB Atlas account \[!toc]
Visit [cloud.mongodb.com](https://cloud.mongodb.com/) to access your account.
#### Select your project and navigate to the **Database Access** section \[!toc]
#### Click the "Add New Database User" button \[!toc]
#### Enter a username and password \[!toc]
#### Under "Database User Privileges", select "Built-in Role" and choose "Read Any Database" \[!toc]
#### Optionally, you can restrict access to specific clusters that contain the databases you want to replicate \[!toc]
#### Click "Add User" to create the user with the selected permissions \[!toc]
### Configure MongoDB Source in Supermetal
You are now ready to configure MongoDB as a source within Supermetal. When configuring the source, select your preferred replication mode:
* **Schema Mode**: Infers strongly-typed schemas from documents, creating typed columns in the target. Best for analytics workloads.
* **Schemaless Mode**: Preserves documents as JSON in a two-column format (`_id`, `document`). Best for variable document structures.
# Oracle (/docs/main/sources/oracle)
import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Steps, Step } from 'fumadocs-ui/components/steps';
import { Callout } from 'fumadocs-ui/components/callout';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
Supermetal supports real-time change data capture (CDC) from Oracle using the native [LogMiner](https://docs.oracle.com/en/database/oracle/oracle-database/23/sutil/oracle-logminer-utility.html) utility to query the redo and archive logs.
This guide walks you through configuring your Oracle database to work seamlessly with Supermetal.
***
## Features
|
Feature
|
Notes
|
|
Initial Data Sync
|
Performs fast initial data loads (snapshots) and historical backfills by chunking data within each table and reading each chunk concurrently using multiple connections.
|
|
Change Data Capture
|
Captures data changes (inserts, updates, deletes) using Oracle's native [LogMiner](https://docs.oracle.com/en/database/oracle/oracle-database/23/sutil/oracle-logminer-utility.html) utility. LogMiner reads the database redo and archive logs to extract change data with guaranteed transactional consistency.
|
|
Transactional Consistency
|
Guarantees that the data synced reflects a consistent point in the upstream database's timeline, respecting transaction boundaries through LogMiner's redo log analysis.
|
|
Catalog Support
|
Ability to configure which schemas, tables and columns to sync.
|
|
Multi-tenant Architecture Support
|
Works with Oracle Container Database (CDB) and Pluggable Databases (PDB).
|
***
## Limitations
* [LogMiner](https://docs.oracle.com/en/database/oracle/oracle-database/23/sutil/oracle-logminer-utility.html#GUID-7594F0D7-0ACD-46E6-BD61-2751136ECDB4):
* Objects (tables or columns) with names longer than 30 characters are not supported.
* LOB data types: Partially supported. LogMiner only logs column values for LOB (XML, CLOB, NCLOB, and BLOB) data types if they are explicitly set or changed in a SQL statement.
* Schema evolution support depends on LogMiner dictionary strategy:
* **Redo logs**: Supported.
* **Online catalog**: Partial support, limited to backwards-compatible changes.
***
## Prerequisites
Before you begin, ensure you have:
* **Supported Oracle Version:** Oracle Database 12c or higher
Standard standalone Oracle database without container architecture.
| Mode | Description |
| -------------- | ---------------------------------------- |
| **All PDBs** | Capture changes from all PDBs |
| **Single PDB** | Capture changes from a specific PDB only |
Capture changes by directly connecting to a pluggable database. Requires Oracle 19c Update 10 or later.
**Oracle RAC (Real Application Clusters)** is not currently supported.
* **Supported Platforms:**
* **Self-Managed Oracle**: Generic Oracle databases on-premises or self-hosted in the cloud
* **AWS RDS Oracle**: Amazon RDS for Oracle instances (single tenant and multi-tenant)
* **Database Admin Access:** Database master or admin user with sufficient privileges to create users, grant permissions, and modify database configuration parameters
* **Network Connectivity:** Ensure that Supermetal's services can reach your Oracle database. You may need to configure security groups
***
## Setup
Follow these steps to configure your Oracle source:
### Create a user & grant permissions
```sql
-- Connect to your Oracle database as an Admin user
-- Create a dedicated user for Supermetal
CREATE USER SUPERMETAL_USER IDENTIFIED BY "strong-password";
-- Grant session and catalog permissions
GRANT CREATE SESSION TO SUPERMETAL_USER;
GRANT EXECUTE_CATALOG_ROLE TO SUPERMETAL_USER;
-- Grant LogMiner permissions
GRANT LOGMINING TO SUPERMETAL_USER;
-- Grant table access
GRANT SELECT ANY TABLE TO SUPERMETAL_USER;
GRANT SELECT ANY TRANSACTION TO SUPERMETAL_USER;
-- Grant access to LogMiner dictionary views
GRANT SELECT ON V_$DATABASE TO SUPERMETAL_USER;
GRANT SELECT ON V_$ARCHIVED_LOG TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGMNR_LOGS TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOG TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGFILE TO SUPERMETAL_USER;
GRANT SELECT ON V_$TRANSACTION TO SUPERMETAL_USER;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO SUPERMETAL_USER;
-- Required when using redo logs dictionary strategy (Oracle 19cU10+)
GRANT SELECT ON DBA_LOGMNR_DICTIONARY_BUILDLOG TO SUPERMETAL_USER;
-- Grant access to system views for metadata
GRANT SELECT ON DBA_OBJECTS TO SUPERMETAL_USER;
GRANT SELECT ON DBA_EXTENTS TO SUPERMETAL_USER;
-- Grant execute permissions for LogMiner
GRANT EXECUTE ON DBMS_LOGMNR TO SUPERMETAL_USER;
GRANT EXECUTE ON DBMS_LOGMNR_D TO SUPERMETAL_USER;
-- If TDE (Transparent Data Encryption) is enabled, also grant:
-- GRANT SELECT ON DBA_TABLESPACES TO SUPERMETAL_USER;
-- GRANT SELECT ON DBA_ENCRYPTED_COLUMNS TO SUPERMETAL_USER;
```
```sql
-- Connect to your Oracle database as an Admin user
-- Switch to the container (root) database
ALTER SESSION SET CONTAINER=CDB$ROOT;
-- Create a common user (must start with C##)
CREATE USER C##SUPERMETAL_USER IDENTIFIED BY "strong-password" CONTAINER=ALL;
-- Grant session and container permissions
GRANT CREATE SESSION, ALTER SESSION, SET CONTAINER TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT EXECUTE_CATALOG_ROLE TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Configure user to access all containers
ALTER USER C##SUPERMETAL_USER SET CONTAINER_DATA=ALL CONTAINER=CURRENT;
-- Grant LogMiner permissions
GRANT LOGMINING TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Grant table access across all containers
GRANT SELECT ANY TABLE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ANY TRANSACTION TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Grant access to LogMiner dictionary views
GRANT SELECT ON V_$DATABASE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVED_LOG TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_LOGS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOG TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGFILE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$TRANSACTION TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$CONTAINERS TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Required when using redo logs dictionary strategy (Oracle 19cU10+)
GRANT SELECT ON DBA_LOGMNR_DICTIONARY_BUILDLOG TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Grant access to system views for metadata
GRANT SELECT ON DBA_OBJECTS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON DBA_EXTENTS TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Grant execute permissions for LogMiner
GRANT EXECUTE ON DBMS_LOGMNR TO C##SUPERMETAL_USER;
GRANT EXECUTE ON DBMS_LOGMNR_D TO C##SUPERMETAL_USER;
-- If TDE is enabled, also grant:
-- GRANT SELECT ON DBA_TABLESPACES TO C##SUPERMETAL_USER CONTAINER=ALL;
-- GRANT SELECT ON DBA_ENCRYPTED_COLUMNS TO C##SUPERMETAL_USER CONTAINER=ALL;
```
```sql
-- Connect to CDB root as an Admin user
ALTER SESSION SET CONTAINER=CDB$ROOT;
-- Create a common user (must start with C##)
CREATE USER C##SUPERMETAL_USER IDENTIFIED BY "strong-password" CONTAINER=ALL;
-- Grant session and container permissions
GRANT CREATE SESSION, ALTER SESSION, SET CONTAINER TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT EXECUTE_CATALOG_ROLE TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Configure user to access specific PDB (replace PDB_NAME with your PDB)
ALTER USER C##SUPERMETAL_USER SET CONTAINER_DATA=(CDB$ROOT, PDB_NAME) CONTAINER=CURRENT;
-- Grant LogMiner permissions
GRANT LOGMINING TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Switch to the specific PDB
ALTER SESSION SET CONTAINER=PDB_NAME;
-- Grant table access in the PDB
GRANT SELECT ANY TABLE TO C##SUPERMETAL_USER;
GRANT SELECT ANY TRANSACTION TO C##SUPERMETAL_USER;
-- Switch back to CDB root
ALTER SESSION SET CONTAINER=CDB$ROOT;
-- Grant access to LogMiner dictionary views
GRANT SELECT ON V_$DATABASE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVED_LOG TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_LOGS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOG TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGFILE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$TRANSACTION TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$CONTAINERS TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Required when using redo logs dictionary strategy (Oracle 19cU10+)
GRANT SELECT ON DBA_LOGMNR_DICTIONARY_BUILDLOG TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Grant access to system views for metadata
GRANT SELECT ON DBA_OBJECTS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON DBA_EXTENTS TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Grant execute permissions for LogMiner
GRANT EXECUTE ON DBMS_LOGMNR TO C##SUPERMETAL_USER;
GRANT EXECUTE ON DBMS_LOGMNR_D TO C##SUPERMETAL_USER;
-- If TDE is enabled, also grant:
-- GRANT SELECT ON DBA_TABLESPACES TO C##SUPERMETAL_USER CONTAINER=ALL;
-- GRANT SELECT ON DBA_ENCRYPTED_COLUMNS TO C##SUPERMETAL_USER CONTAINER=ALL;
```
```sql
-- Connect directly to the PDB as an Admin user
-- Requires Oracle 19c Update 10 or later
-- Create a local user in the PDB
CREATE USER SUPERMETAL_USER IDENTIFIED BY "strong-password";
-- Grant session and catalog permissions
GRANT CREATE SESSION TO SUPERMETAL_USER;
GRANT EXECUTE_CATALOG_ROLE TO SUPERMETAL_USER;
-- Grant LogMiner permissions
GRANT LOGMINING TO SUPERMETAL_USER;
-- Grant table access
GRANT SELECT ANY TABLE TO SUPERMETAL_USER;
GRANT SELECT ANY TRANSACTION TO SUPERMETAL_USER;
-- Grant access to LogMiner dictionary views
GRANT SELECT ON V_$DATABASE TO SUPERMETAL_USER;
GRANT SELECT ON V_$ARCHIVED_LOG TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGMNR_LOGS TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOG TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGFILE TO SUPERMETAL_USER;
GRANT SELECT ON V_$TRANSACTION TO SUPERMETAL_USER;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO SUPERMETAL_USER;
-- Required when using redo logs dictionary strategy (Oracle 19cU10+)
GRANT SELECT ON DBA_LOGMNR_DICTIONARY_BUILDLOG TO SUPERMETAL_USER;
-- Grant access to system views for metadata
GRANT SELECT ON DBA_OBJECTS TO SUPERMETAL_USER;
GRANT SELECT ON DBA_EXTENTS TO SUPERMETAL_USER;
-- Grant execute permissions for LogMiner
GRANT EXECUTE ON DBMS_LOGMNR TO SUPERMETAL_USER;
GRANT EXECUTE ON DBMS_LOGMNR_D TO SUPERMETAL_USER;
-- If TDE is enabled, also grant:
-- GRANT SELECT ON DBA_TABLESPACES TO SUPERMETAL_USER;
-- GRANT SELECT ON DBA_ENCRYPTED_COLUMNS TO SUPERMETAL_USER;
```
```sql
-- Connect as the master user
-- Create a user for Supermetal
CREATE USER SUPERMETAL_USER IDENTIFIED BY "strong-password";
GRANT CONNECT TO SUPERMETAL_USER;
GRANT CREATE SESSION TO SUPERMETAL_USER;
-- Grant LogMiner and catalog permissions
GRANT LOGMINING TO SUPERMETAL_USER;
GRANT EXECUTE_CATALOG_ROLE TO SUPERMETAL_USER;
-- Grant table access
GRANT SELECT ANY TRANSACTION TO SUPERMETAL_USER;
GRANT SELECT ANY TABLE TO SUPERMETAL_USER;
-- Use RDS procedures to grant access to system objects
BEGIN
-- Required when using redo logs dictionary strategy (Oracle 19cU10+)
rdsadmin.rdsadmin_util.grant_sys_object('DBA_LOGMNR_DICTIONARY_BUILDLOG','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$DATABASE','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$ARCHIVED_LOG','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_CONTENTS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_LOGS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOG','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGFILE','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$TRANSACTION','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$ARCHIVE_DEST_STATUS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$PARAMETER','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('DBMS_LOGMNR','SUPERMETAL_USER','EXECUTE');
rdsadmin.rdsadmin_util.grant_sys_object('DBMS_LOGMNR_D','SUPERMETAL_USER','EXECUTE');
rdsadmin.rdsadmin_util.grant_sys_object('DBA_OBJECTS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('DBA_EXTENTS','SUPERMETAL_USER','SELECT');
END;
/
-- If TDE is enabled, also grant:
-- BEGIN
-- rdsadmin.rdsadmin_util.grant_sys_object('DBA_TABLESPACES','SUPERMETAL_USER','SELECT');
-- rdsadmin.rdsadmin_util.grant_sys_object('DBA_ENCRYPTED_COLUMNS','SUPERMETAL_USER','SELECT');
-- END;
-- /
```
```sql
-- Connect to the PDB as the master user
-- Requires Oracle 19c Update 10 or later on RDS
-- Create a local user in the PDB
CREATE USER SUPERMETAL_USER IDENTIFIED BY "strong-password";
GRANT CONNECT TO SUPERMETAL_USER;
GRANT CREATE SESSION TO SUPERMETAL_USER;
-- Grant LogMiner and catalog permissions
GRANT LOGMINING TO SUPERMETAL_USER;
GRANT EXECUTE_CATALOG_ROLE TO SUPERMETAL_USER;
-- Grant table access
GRANT SELECT ANY TRANSACTION TO SUPERMETAL_USER;
GRANT SELECT ANY TABLE TO SUPERMETAL_USER;
-- Use RDS procedures to grant access to system objects
BEGIN
rdsadmin.rdsadmin_util.grant_sys_object('DBA_LOGMNR_DICTIONARY_BUILDLOG','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$DATABASE','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$ARCHIVED_LOG','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_CONTENTS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_LOGS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOG','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGFILE','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$TRANSACTION','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$ARCHIVE_DEST_STATUS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$PARAMETER','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('DBMS_LOGMNR','SUPERMETAL_USER','EXECUTE');
rdsadmin.rdsadmin_util.grant_sys_object('DBMS_LOGMNR_D','SUPERMETAL_USER','EXECUTE');
rdsadmin.rdsadmin_util.grant_sys_object('DBA_OBJECTS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('DBA_EXTENTS','SUPERMETAL_USER','SELECT');
END;
/
-- If TDE is enabled, also grant:
-- BEGIN
-- rdsadmin.rdsadmin_util.grant_sys_object('DBA_TABLESPACES','SUPERMETAL_USER','SELECT');
-- rdsadmin.rdsadmin_util.grant_sys_object('DBA_ENCRYPTED_COLUMNS','SUPERMETAL_USER','SELECT');
-- END;
-- /
```
Oracle's default profiles often impose low session limits, which can constrain Supermetal's parallel processing when snapshotting your database.
For optimal performance, set your `SESSIONS_PER_USER` limit to the smaller of:
* The number of Supermetal agent cores, or
* The number of database cores (available from `v$parameter.cpu_count`)
In any case, we recommend a limit of at least 10 sessions:
```sql
-- Check current database CPU count
SELECT VALUE FROM v$parameter WHERE NAME = 'cpu_count';
-- Create profile with appropriate session limit
CREATE PROFILE SUPERMETAL_PROFILE LIMIT SESSIONS_PER_USER 10;
-- Apply profile to Supermetal user
ALTER USER SUPERMETAL_USER PROFILE SUPERMETAL_PROFILE;
```
For CDB environments, apply the profile to your common user instead:
```sql
ALTER USER C##SUPERMETAL_USER PROFILE SUPERMETAL_PROFILE;
```
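To confirm the limit took effect, you can query `dba_profiles` (a quick check; the profile name matches the one created above):

```sql
-- Verify the SESSIONS_PER_USER limit on the Supermetal profile
SELECT profile, resource_name, limit
FROM dba_profiles
WHERE profile = 'SUPERMETAL_PROFILE'
  AND resource_name = 'SESSIONS_PER_USER';
```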
Replace `"strong-password"` with a secure, unique password for `SUPERMETAL_USER`. Store this password securely, as you'll need it when configuring the source in Supermetal.
### Setup LogMiner
#### Archive Log Mode
Oracle's LogMiner requires that your database is running in ARCHIVELOG mode. You can check and enable it as follows:
```sql
-- Check if database is in ARCHIVELOG mode
SELECT log_mode FROM v$database;
-- If not in ARCHIVELOG mode, enable it
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
```
[Changing log mode](https://docs.oracle.com/en/database/oracle/oracle-database/23/sutil/oracle-logminer-utility.html#GUID-7594F0D7-0ACD-46E6-BD61-2751136ECDB4) requires a restart of the Oracle database.
Instruct Oracle [RMAN](https://docs.oracle.com/en/database/oracle/oracle-database/23/sutil/oracle-logminer-utility.html#GUID-7594F0D7-0ACD-46E6-BD61-2751136ECDB4) to retain archived logs for at least 3 days:
```sql
RMAN> CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 3 DAYS;
```
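To confirm the policy is in place, you can query the RMAN configuration from SQL*Plus (an empty result means the default retention policy is still in effect):

```sql
-- Verify the RMAN retention policy
SELECT name, value
FROM v$rman_configuration
WHERE name = 'RETENTION POLICY';
```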
For optimal LogMiner performance, ensure your redo logs are appropriately sized:
```sql
-- Check current redo log configuration
SELECT group#, bytes/1024/1024 AS size_mb FROM v$log;
```
For production databases, each redo log file should be at least 500 MB. If your logs are smaller, work with your DBA to increase their size.
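As a sketch, one common resizing approach is to add larger redo log groups and retire the undersized ones once they go inactive. The group numbers and 1G size below are illustrative; with Oracle Managed Files, `SIZE` alone suffices, otherwise specify file paths:

```sql
-- Add new, larger groups (numbers and sizes are examples; adjust to your setup)
ALTER DATABASE ADD LOGFILE GROUP 4 SIZE 1G;
ALTER DATABASE ADD LOGFILE GROUP 5 SIZE 1G;
ALTER DATABASE ADD LOGFILE GROUP 6 SIZE 1G;

-- Cycle the logs until the old groups show STATUS = 'INACTIVE'
ALTER SYSTEM SWITCH LOGFILE;
SELECT group#, status FROM v$log;

-- Drop each old group once it is INACTIVE
ALTER DATABASE DROP LOGFILE GROUP 1;
```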
```sql
-- Check if database is in ARCHIVELOG mode
SELECT log_mode FROM v$database;
-- If not in ARCHIVELOG mode, enable it by turning on backups in AWS RDS console
```
AWS RDS Oracle instances are automatically configured with ARCHIVELOG mode enabled when backups are enabled.
Use the `set_configuration` procedure to retain archived logs for at least 3 days (72 hours):
```sql
BEGIN
rdsadmin.rdsadmin_util.set_configuration('archivelog retention hours', 72);
END;
/
```
Adjust this value based on your backup and recovery requirements.
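To review the instance's current settings, including the retention you just set, you can call the companion `show_configuration` procedure:

```sql
-- Print the current RDS configuration, including archivelog retention hours
SET SERVEROUTPUT ON;
EXEC rdsadmin.rdsadmin_util.show_configuration;
```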
#### Database Supplemental Logging
LogMiner requires supplemental logging to capture the necessary information for CDC:
* **Minimal**: Lower redo log usage but requires table-level supplemental logging configuration for each table.
* **Full**: Higher redo log usage but eliminates the need for table-level configuration.
```sql
-- Enable minimal supplemental logging at the database level
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
-- Then configure table-level supplemental logging for each table to be synced
-- Repeat for each table to be synced:
ALTER TABLE schema_name.table_name ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
```
```sql
-- Enable full supplemental logging at the database level
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
```
For more reliable CDC, consider enabling force logging at the database level:
```sql
ALTER DATABASE FORCE LOGGING;
```
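Whichever option you choose, you can verify the resulting logging configuration with a single query against `v$database`:

```sql
-- Expect YES for the options you enabled (minimal supplemental logging
-- may also report IMPLICIT when full supplemental logging is on)
SELECT supplemental_log_data_min,
       supplemental_log_data_all,
       force_logging
FROM v$database;
```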
```sql
-- Enable full supplemental logging using RDS procedures
BEGIN
rdsadmin.rdsadmin_util.alter_supplemental_logging(
p_action => 'ADD',
p_type => 'ALL'
);
END;
/
```
For more reliable CDC, enable force logging using the RDS-specific procedure:
```sql
BEGIN
rdsadmin.rdsadmin_util.force_logging(p_enable => true);
END;
/
```
This ensures all database operations are logged.
***
## Data Types Mapping
| Oracle Type(s) | Apache Arrow DataType | Notes |
| :--------------------------------- | :-------------------- | :----------------------------------------- |
| `NUMBER(p, s)` where p,s specified | `Decimal128(p, s)` | Constrained precision and scale. |
| `NUMBER` (identity column) | `Decimal128(38, 0)` | Primary keys |
| `NUMBER` (unconstrained) | `Utf8` | Unconstrained precision and scale |
| `NUMBER(p, s)` where s \< 0 | `Utf8` | Negative scale numbers |
| `BINARY_FLOAT` | `Float32` | 32-bit IEEE 754 single precision |
| `BINARY_DOUBLE` | `Float64` | 64-bit IEEE 754 double precision |
| `FLOAT(p)` where p ≤ 20 | `Float32` | Binary precision up to \~6 decimal digits |
| `FLOAT(p)` where 20 \< p ≤ 49 | `Float64` | Binary precision up to \~15 decimal digits |
| `FLOAT(p)` where p > 49 | `Utf8` | High-precision floats preserved as strings |
| Oracle Type(s) | Apache Arrow DataType | Notes |
| :------------- | :-------------------- | :---- |
| `BOOLEAN` | *Unsupported* | |
| Oracle Type(s) | Apache Arrow DataType | Notes |
| :---------------------------- | :-------------------------- | :---------------------------------------------------------------------------------------------- |
| `DATE` | `Timestamp(s)` | Oracle DATE includes both date and time components |
| `TIMESTAMP(p)` | `Timestamp(s/ms/μs)` | Precision based on p: 0 → seconds; 1–3 → milliseconds; 4–6 → microseconds |
| `TIMESTAMP(p)` where p ≥ 7 | `Utf8` | Nanosecond-range precision preserved as strings |
| `TIMESTAMP(p) WITH TIME ZONE` | `Timestamp(s/ms/μs, "UTC")` | Same precision levels as above, normalized to UTC |
| `TIMESTAMP(p) WITH LOCAL TIME ZONE` | `Timestamp(s/ms/μs)` | Same precision levels as above, normalized to UTC |
| `INTERVAL YEAR(2) TO MONTH` | `Interval(YearMonth)` | Stores duration as months (years×12 + months) |
| `INTERVAL YEAR TO MONTH` | `Utf8` | |
| `INTERVAL DAY TO SECOND` | `Utf8` | |
| Oracle Type(s) | Apache Arrow DataType | Notes |
| :------------- | :-------------------- | :---------------------------------------------------------------------------------------------------- |
| `CHAR` | `Utf8` | Fixed-length character data |
| `NCHAR` | `Utf8` | Fixed-length Unicode character data |
| `VARCHAR2` | `Utf8` | Variable-length character data. Supports extended string size (MAX\_STRING\_SIZE = EXTENDED). |
| `NVARCHAR2` | `Utf8` | Variable-length Unicode character data. Supports extended string size (MAX\_STRING\_SIZE = EXTENDED). |
| `CLOB` | `LargeUtf8` | Character Large Object for text > 2GB |
| `NCLOB` | `LargeUtf8` | National Character Large Object for Unicode > 2GB |
| `LONG` | `LargeUtf8` | Legacy long text type (deprecated in Oracle) |
| `ROWID` | `Utf8` | Oracle's physical row address |
| Oracle Type(s) | Apache Arrow DataType | Notes |
| :------------- | :-------------------- | :------------------------------------------------------------------------------- |
| `RAW` | `Binary` | Fixed-length binary data. Supports extended size (MAX\_STRING\_SIZE = EXTENDED). |
| `BLOB` | `LargeBinary` | Binary Large Object for binary data > 2GB |
| `BFILE` | `LargeBinary` | Binary file locator for external files |
| `LONG RAW` | `LargeBinary` | Legacy long binary type (deprecated in Oracle) |
| Oracle Type(s) | Apache Arrow DataType | Notes |
| :------------- | :-------------------- | :---------------------------------------------------------------------- |
| `XMLTYPE` | `Utf8` | Mapped to `Utf8`. An extension type for XML is defined, similar to JSON. |
| `JSON` | *Unsupported* | |
| Oracle Type(s) | Apache Arrow DataType | Notes |
| :------------- | :-------------------- | :----------------------------- |
| `OBJECT` | *Unsupported* | Object types are not supported |
# Oracle Setup (/docs/main/sources/oracle/setup)
## Prerequisites
Before you begin, ensure you have:
* **Supported Oracle Version:** Oracle Database 12c or higher
Standard standalone Oracle database without container architecture.
| Mode | Description |
| -------------- | ---------------------------------------- |
| **All PDBs** | Capture changes from all PDBs |
| **Single PDB** | Capture changes from a specific PDB only |
Capture changes by directly connecting to a pluggable database. Requires Oracle 19c Update 10 or later.
**Oracle RAC (Real Application Clusters)** is not currently supported.
* **Supported Platforms:**
* **Self-Managed Oracle**: Generic Oracle databases on-premises or self-hosted in the cloud
* **AWS RDS Oracle**: Amazon RDS for Oracle instances (single tenant and multi-tenant)
* **Database Admin Access:** Database master or admin user with sufficient privileges to create users, grant permissions, and modify database configuration parameters
* **Network Connectivity:** Ensure that Supermetal's services can reach your Oracle database. You may need to configure security groups
***
## Setup
Follow these steps to configure your Oracle source:
### Create a user & grant permissions
```sql
-- Connect to your Oracle database as an Admin user
-- Create a dedicated user for Supermetal
CREATE USER SUPERMETAL_USER IDENTIFIED BY "strong-password";
-- Grant session and catalog permissions
GRANT CREATE SESSION TO SUPERMETAL_USER;
GRANT EXECUTE_CATALOG_ROLE TO SUPERMETAL_USER;
-- Grant LogMiner permissions
GRANT LOGMINING TO SUPERMETAL_USER;
-- Grant table access
GRANT SELECT ANY TABLE TO SUPERMETAL_USER;
GRANT SELECT ANY TRANSACTION TO SUPERMETAL_USER;
-- Grant access to LogMiner dictionary views
GRANT SELECT ON V_$DATABASE TO SUPERMETAL_USER;
GRANT SELECT ON V_$ARCHIVED_LOG TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGMNR_LOGS TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOG TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGFILE TO SUPERMETAL_USER;
GRANT SELECT ON V_$TRANSACTION TO SUPERMETAL_USER;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO SUPERMETAL_USER;
-- Required when using redo logs dictionary strategy (Oracle 19cU10+)
GRANT SELECT ON DBA_LOGMNR_DICTIONARY_BUILDLOG TO SUPERMETAL_USER;
-- Grant access to system views for metadata
GRANT SELECT ON DBA_OBJECTS TO SUPERMETAL_USER;
GRANT SELECT ON DBA_EXTENTS TO SUPERMETAL_USER;
-- Grant execute permissions for LogMiner
GRANT EXECUTE ON DBMS_LOGMNR TO SUPERMETAL_USER;
GRANT EXECUTE ON DBMS_LOGMNR_D TO SUPERMETAL_USER;
-- If TDE (Transparent Data Encryption) is enabled, also grant:
-- GRANT SELECT ON DBA_TABLESPACES TO SUPERMETAL_USER;
-- GRANT SELECT ON DBA_ENCRYPTED_COLUMNS TO SUPERMETAL_USER;
```
```sql
-- Connect to your Oracle database as an Admin user
-- Switch to the container (root) database
ALTER SESSION SET CONTAINER=CDB$ROOT;
-- Create a common user (must start with C##)
CREATE USER C##SUPERMETAL_USER IDENTIFIED BY "strong-password" CONTAINER=ALL;
-- Grant session and container permissions
GRANT CREATE SESSION, ALTER SESSION, SET CONTAINER TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT EXECUTE_CATALOG_ROLE TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Configure user to access all containers
ALTER USER C##SUPERMETAL_USER SET CONTAINER_DATA=ALL CONTAINER=CURRENT;
-- Grant LogMiner permissions
GRANT LOGMINING TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Grant table access across all containers
GRANT SELECT ANY TABLE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ANY TRANSACTION TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Grant access to LogMiner dictionary views
GRANT SELECT ON V_$DATABASE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVED_LOG TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_LOGS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOG TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGFILE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$TRANSACTION TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$CONTAINERS TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Required when using redo logs dictionary strategy (Oracle 19cU10+)
GRANT SELECT ON DBA_LOGMNR_DICTIONARY_BUILDLOG TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Grant access to system views for metadata
GRANT SELECT ON DBA_OBJECTS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON DBA_EXTENTS TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Grant execute permissions for LogMiner
GRANT EXECUTE ON DBMS_LOGMNR TO C##SUPERMETAL_USER;
GRANT EXECUTE ON DBMS_LOGMNR_D TO C##SUPERMETAL_USER;
-- If TDE is enabled, also grant:
-- GRANT SELECT ON DBA_TABLESPACES TO C##SUPERMETAL_USER CONTAINER=ALL;
-- GRANT SELECT ON DBA_ENCRYPTED_COLUMNS TO C##SUPERMETAL_USER CONTAINER=ALL;
```
```sql
-- Connect to CDB root as an Admin user
ALTER SESSION SET CONTAINER=CDB$ROOT;
-- Create a common user (must start with C##)
CREATE USER C##SUPERMETAL_USER IDENTIFIED BY "strong-password" CONTAINER=ALL;
-- Grant session and container permissions
GRANT CREATE SESSION, ALTER SESSION, SET CONTAINER TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT EXECUTE_CATALOG_ROLE TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Configure user to access specific PDB (replace PDB_NAME with your PDB)
ALTER USER C##SUPERMETAL_USER SET CONTAINER_DATA=(CDB$ROOT, PDB_NAME) CONTAINER=CURRENT;
-- Grant LogMiner permissions
GRANT LOGMINING TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Switch to the specific PDB
ALTER SESSION SET CONTAINER=PDB_NAME;
-- Grant table access in the PDB
GRANT SELECT ANY TABLE TO C##SUPERMETAL_USER;
GRANT SELECT ANY TRANSACTION TO C##SUPERMETAL_USER;
-- Switch back to CDB root
ALTER SESSION SET CONTAINER=CDB$ROOT;
-- Grant access to LogMiner dictionary views
GRANT SELECT ON V_$DATABASE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVED_LOG TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_LOGS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOG TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGFILE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$TRANSACTION TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$CONTAINERS TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Required when using redo logs dictionary strategy (Oracle 19cU10+)
GRANT SELECT ON DBA_LOGMNR_DICTIONARY_BUILDLOG TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Grant access to system views for metadata
GRANT SELECT ON DBA_OBJECTS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON DBA_EXTENTS TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Grant execute permissions for LogMiner
GRANT EXECUTE ON DBMS_LOGMNR TO C##SUPERMETAL_USER;
GRANT EXECUTE ON DBMS_LOGMNR_D TO C##SUPERMETAL_USER;
-- If TDE is enabled, also grant:
-- GRANT SELECT ON DBA_TABLESPACES TO C##SUPERMETAL_USER CONTAINER=ALL;
-- GRANT SELECT ON DBA_ENCRYPTED_COLUMNS TO C##SUPERMETAL_USER CONTAINER=ALL;
```
```sql
-- Connect directly to the PDB as an Admin user
-- Requires Oracle 19c Update 10 or later
-- Create a local user in the PDB
CREATE USER SUPERMETAL_USER IDENTIFIED BY "strong-password";
-- Grant session and catalog permissions
GRANT CREATE SESSION TO SUPERMETAL_USER;
GRANT EXECUTE_CATALOG_ROLE TO SUPERMETAL_USER;
-- Grant LogMiner permissions
GRANT LOGMINING TO SUPERMETAL_USER;
-- Grant table access
GRANT SELECT ANY TABLE TO SUPERMETAL_USER;
GRANT SELECT ANY TRANSACTION TO SUPERMETAL_USER;
-- Grant access to LogMiner dictionary views
GRANT SELECT ON V_$DATABASE TO SUPERMETAL_USER;
GRANT SELECT ON V_$ARCHIVED_LOG TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGMNR_LOGS TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOG TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGFILE TO SUPERMETAL_USER;
GRANT SELECT ON V_$TRANSACTION TO SUPERMETAL_USER;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO SUPERMETAL_USER;
-- Required when using redo logs dictionary strategy (Oracle 19cU10+)
GRANT SELECT ON DBA_LOGMNR_DICTIONARY_BUILDLOG TO SUPERMETAL_USER;
-- Grant access to system views for metadata
GRANT SELECT ON DBA_OBJECTS TO SUPERMETAL_USER;
GRANT SELECT ON DBA_EXTENTS TO SUPERMETAL_USER;
-- Grant execute permissions for LogMiner
GRANT EXECUTE ON DBMS_LOGMNR TO SUPERMETAL_USER;
GRANT EXECUTE ON DBMS_LOGMNR_D TO SUPERMETAL_USER;
-- If TDE is enabled, also grant:
-- GRANT SELECT ON DBA_TABLESPACES TO SUPERMETAL_USER;
-- GRANT SELECT ON DBA_ENCRYPTED_COLUMNS TO SUPERMETAL_USER;
```
```sql
-- Connect as the master user
-- Create a user for Supermetal
CREATE USER SUPERMETAL_USER IDENTIFIED BY "strong-password";
GRANT CONNECT TO SUPERMETAL_USER;
GRANT CREATE SESSION TO SUPERMETAL_USER;
-- Grant LogMiner and catalog permissions
GRANT LOGMINING TO SUPERMETAL_USER;
GRANT EXECUTE_CATALOG_ROLE TO SUPERMETAL_USER;
-- Grant table access
GRANT SELECT ANY TRANSACTION TO SUPERMETAL_USER;
GRANT SELECT ANY TABLE TO SUPERMETAL_USER;
-- Use RDS procedures to grant access to system objects
BEGIN
-- Required when using redo logs dictionary strategy (Oracle 19cU10+)
rdsadmin.rdsadmin_util.grant_sys_object('DBA_LOGMNR_DICTIONARY_BUILDLOG','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$DATABASE','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$ARCHIVED_LOG','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_CONTENTS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_LOGS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOG','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGFILE','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$TRANSACTION','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$ARCHIVE_DEST_STATUS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$PARAMETER','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('DBMS_LOGMNR','SUPERMETAL_USER','EXECUTE');
rdsadmin.rdsadmin_util.grant_sys_object('DBMS_LOGMNR_D','SUPERMETAL_USER','EXECUTE');
rdsadmin.rdsadmin_util.grant_sys_object('DBA_OBJECTS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('DBA_EXTENTS','SUPERMETAL_USER','SELECT');
END;
/
-- If TDE is enabled, also grant:
-- BEGIN
-- rdsadmin.rdsadmin_util.grant_sys_object('DBA_TABLESPACES','SUPERMETAL_USER','SELECT');
-- rdsadmin.rdsadmin_util.grant_sys_object('DBA_ENCRYPTED_COLUMNS','SUPERMETAL_USER','SELECT');
-- END;
-- /
```
```sql
-- Connect to the PDB as the master user
-- Requires Oracle 19c Update 10 or later on RDS
-- Create a local user in the PDB
CREATE USER SUPERMETAL_USER IDENTIFIED BY "strong-password";
GRANT CONNECT TO SUPERMETAL_USER;
GRANT CREATE SESSION TO SUPERMETAL_USER;
-- Grant LogMiner and catalog permissions
GRANT LOGMINING TO SUPERMETAL_USER;
GRANT EXECUTE_CATALOG_ROLE TO SUPERMETAL_USER;
-- Grant table access
GRANT SELECT ANY TRANSACTION TO SUPERMETAL_USER;
GRANT SELECT ANY TABLE TO SUPERMETAL_USER;
-- Use RDS procedures to grant access to system objects
BEGIN
rdsadmin.rdsadmin_util.grant_sys_object('DBA_LOGMNR_DICTIONARY_BUILDLOG','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$DATABASE','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$ARCHIVED_LOG','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_CONTENTS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_LOGS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOG','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGFILE','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$TRANSACTION','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$ARCHIVE_DEST_STATUS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$PARAMETER','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('DBMS_LOGMNR','SUPERMETAL_USER','EXECUTE');
rdsadmin.rdsadmin_util.grant_sys_object('DBMS_LOGMNR_D','SUPERMETAL_USER','EXECUTE');
rdsadmin.rdsadmin_util.grant_sys_object('DBA_OBJECTS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('DBA_EXTENTS','SUPERMETAL_USER','SELECT');
END;
/
-- If TDE is enabled, also grant:
-- BEGIN
-- rdsadmin.rdsadmin_util.grant_sys_object('DBA_TABLESPACES','SUPERMETAL_USER','SELECT');
-- rdsadmin.rdsadmin_util.grant_sys_object('DBA_ENCRYPTED_COLUMNS','SUPERMETAL_USER','SELECT');
-- END;
-- /
```
Oracle's default profiles often have low session limits, which can constrain Supermetal's parallel processing when snapshotting your database.
For optimal performance, set your `SESSIONS_PER_USER` limit to the smaller of:
* The number of Supermetal agent cores, or
* The number of database cores (available from `v$parameter.cpu_count`)
In any case, we recommend a limit of at least 10 sessions:
```sql
-- Check current database CPU count
SELECT VALUE FROM v$parameter WHERE NAME = 'cpu_count';
-- Create profile with appropriate session limit
CREATE PROFILE SUPERMETAL_PROFILE LIMIT SESSIONS_PER_USER 10;
-- Apply profile to Supermetal user
ALTER USER SUPERMETAL_USER PROFILE SUPERMETAL_PROFILE;
```
For CDB environments, apply the profile to your common user instead:
```sql
ALTER USER C##SUPERMETAL_USER PROFILE SUPERMETAL_PROFILE;
```
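To confirm the limit is in place, query `dba_profiles` (a quick check; the profile name matches the one created above):

```sql
-- Verify the SESSIONS_PER_USER limit on the Supermetal profile
SELECT profile, resource_name, limit
FROM dba_profiles
WHERE profile = 'SUPERMETAL_PROFILE'
  AND resource_name = 'SESSIONS_PER_USER';
```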
Replace `"strong-password"` with a secure, unique password for `SUPERMETAL_USER`. Store this password securely, as you'll need it when configuring the source in Supermetal.
### Setup LogMiner
#### Archive Log Mode
Oracle's LogMiner requires that your database is running in ARCHIVELOG mode. You can check and enable it as follows:
```sql
-- Check if database is in ARCHIVELOG mode
SELECT log_mode FROM v$database;
-- If not in ARCHIVELOG mode, enable it
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
```
[Changing log mode](https://docs.oracle.com/en/database/oracle/oracle-database/23/sutil/oracle-logminer-utility.html#GUID-7594F0D7-0ACD-46E6-BD61-2751136ECDB4) requires a restart of the Oracle database.
Instruct Oracle [RMAN](https://docs.oracle.com/en/database/oracle/oracle-database/23/sutil/oracle-logminer-utility.html#GUID-7594F0D7-0ACD-46E6-BD61-2751136ECDB4) to retain archived logs for at least 3 days:
```sql
RMAN> CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 3 DAYS;
```
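You can confirm the policy from SQL*Plus by querying the RMAN configuration view (an empty result means the default retention policy is still in effect):

```sql
-- Verify the RMAN retention policy
SELECT name, value
FROM v$rman_configuration
WHERE name = 'RETENTION POLICY';
```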
For optimal LogMiner performance, ensure your redo logs are appropriately sized:
```sql
-- Check current redo log configuration
SELECT group#, bytes/1024/1024 AS size_mb FROM v$log;
```
For production databases, each redo log file should be at least 500 MB. If your logs are smaller, work with your DBA to increase their size.
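One common resizing approach, sketched below, is to add larger redo log groups and retire the undersized ones once they go inactive. The group numbers and 1G size are illustrative; with Oracle Managed Files, `SIZE` alone suffices, otherwise specify file paths:

```sql
-- Add new, larger groups (numbers and sizes are examples; adjust to your setup)
ALTER DATABASE ADD LOGFILE GROUP 4 SIZE 1G;
ALTER DATABASE ADD LOGFILE GROUP 5 SIZE 1G;
ALTER DATABASE ADD LOGFILE GROUP 6 SIZE 1G;

-- Cycle the logs until the old groups show STATUS = 'INACTIVE'
ALTER SYSTEM SWITCH LOGFILE;
SELECT group#, status FROM v$log;

-- Drop each old group once it is INACTIVE
ALTER DATABASE DROP LOGFILE GROUP 1;
```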
```sql
-- Check if database is in ARCHIVELOG mode
SELECT log_mode FROM v$database;
-- If not in ARCHIVELOG mode, enable it by turning on backups in AWS RDS console
```
AWS RDS Oracle instances are automatically configured with ARCHIVELOG mode enabled when backups are enabled.
Use the `set_configuration` procedure to retain archived logs for at least 3 days (72 hours):
```sql
BEGIN
rdsadmin.rdsadmin_util.set_configuration('archivelog retention hours', 72);
END;
/
```
Adjust this value based on your backup and recovery requirements.
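To review the instance's current settings, including the retention you just set, call the companion `show_configuration` procedure:

```sql
-- Print the current RDS configuration, including archivelog retention hours
SET SERVEROUTPUT ON;
EXEC rdsadmin.rdsadmin_util.show_configuration;
```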
#### Database Supplemental Logging
LogMiner requires supplemental logging to capture the necessary information for CDC:
* **Minimal**: Lower redo log usage but requires table-level supplemental logging configuration for each table.
* **Full**: Higher redo log usage but eliminates the need for table-level configuration.
```sql
-- Enable minimal supplemental logging at the database level
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
-- Then configure table-level supplemental logging for each table to be synced
-- Repeat for each table to be synced:
ALTER TABLE schema_name.table_name ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
```
```sql
-- Enable full supplemental logging at the database level
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
```
For more reliable CDC, consider enabling force logging at the database level:
```sql
ALTER DATABASE FORCE LOGGING;
```
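Whichever option you choose, you can verify the resulting logging configuration with a single query against `v$database`:

```sql
-- Expect YES for the options you enabled (minimal supplemental logging
-- may also report IMPLICIT when full supplemental logging is on)
SELECT supplemental_log_data_min,
       supplemental_log_data_all,
       force_logging
FROM v$database;
```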
```sql
-- Enable full supplemental logging using RDS procedures
BEGIN
rdsadmin.rdsadmin_util.alter_supplemental_logging(
p_action => 'ADD',
p_type => 'ALL'
);
END;
/
```
For more reliable CDC, enable force logging using the RDS-specific procedure:
```sql
BEGIN
rdsadmin.rdsadmin_util.force_logging(p_enable => true);
END;
/
```
This ensures all database operations are logged.
# Databricks (/docs/main/targets/databricks)
import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Callout } from 'fumadocs-ui/components/callout';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
Databricks is a unified data platform that combines key features of data lakes and data warehouses.
This guide walks you through configuring your Databricks platform to work seamlessly with Supermetal.
***
## Features
| Feature | Notes |
| :--------------- | :----------------------------------------------------------------------------- |
| Schema Evolution | Automatic schema evolution |
| Soft Delete(s) | Row deletion is handled by setting the `_sm_deleted` boolean metadata column. |
***
## Prerequisites
Before you begin, ensure you have:
* **Supported Databricks Implementations:**
* [AWS](https://www.databricks.com/product/aws)
* [Azure](https://www.databricks.com/product/azure)
* [GCP](https://www.databricks.com/product/google-cloud)
* [Serverless Workspace](https://docs.databricks.com/aws/en/admin/workspace/serverless-workspaces)
* **Unity Catalog:** Unity Catalog enabled on your Databricks workspace.
* **SQL Warehouse:** [Serverless SQL Warehouse](https://docs.databricks.com/aws/en/compute/sql-warehouse/).
***
## Setup
### Configure Authentication
### Create a Service Principal
Follow the Databricks [documentation](https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m) to create a Service Principal.
Note down the Client ID and Client Secret.
### Create a Personal Access Token
Follow the Databricks [documentation](https://docs.databricks.com/aws/en/dev-tools/auth/pat) to create a PAT.
Note down the personal access token.
### Create a SQL Warehouse
Log in to the Databricks console to create a new SQL warehouse or use an existing one.
* Go to SQL > SQL warehouses > Create SQL warehouse
* Fill in and select the required fields, such as:
* Name
* Warehouse Size (2X Small)
* Warehouse Type (Serverless)
* Auto Stop (10 minutes)
* Scaling Min & Max (1)
* Unity Catalog (Enabled)
* etc.
* Click Create
* Once created, click on the Connection Details Tab
Note down the following details:
* Server Hostname (your-workspace.cloud.databricks.com)
* Warehouse ID (0123456789abcdef)
### Configure a Catalog
Log in to the Databricks console and choose a catalog (or create a new one by following the [Databricks documentation](https://docs.databricks.com/aws/en/data-governance/unity-catalog/get-started#overview-of-unity-catalog-setup)).
* From Databricks workspace console, navigate to Data
* Choose a Catalog (`my_catalog`)
### Create a Volume
Supermetal uses the configured volume as a temporary staging area.
Follow the steps from the [Databricks documentation](https://docs.databricks.com/aws/en/volumes/utility-commands#create-a-volume).
* From Databricks workspace console, navigate to Catalog
* Choose the catalog from the step above (`my_catalog`)
* Search or browse for the schema that you want to add the volume to and select it.
* Click on Create Volume and specify a Name.
* Click Create
```sql
CREATE VOLUME my_catalog.my_schema.my_volume;
```
Note down the following details:
* Catalog Name (my\_catalog)
* Volume Path (/Volumes/my\_catalog/my\_schema/my\_volume)
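If you prefer to verify the volume from the SQL editor rather than the UI, these standard Unity Catalog commands should work (the names match the examples above):

```sql
-- List volumes in the schema and browse the new volume's (empty) root
SHOW VOLUMES IN my_catalog.my_schema;
LIST '/Volumes/my_catalog/my_schema/my_volume';
```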
***
## Data Types Mapping
| Apache Arrow DataType | Databricks Type | Notes |
| :----------------------------------- | :--------------- | :------------------------------------------------------- |
| `Int8` | `TINYINT` | |
| `Int16` | `SMALLINT` | |
| `Int32` | `INT` | |
| `Int64` | `BIGINT` | |
| `UInt8` | `SMALLINT` | Promoted to signed 16-bit |
| `UInt16` | `INT` | Promoted to signed 32-bit |
| `UInt32` | `BIGINT` | Promoted to signed 64-bit |
| `UInt64` | `DECIMAL(20, 0)` | Mapped to decimal to preserve full unsigned 64-bit range |
| `Float16` | `FLOAT` | Upcast to Float32 in Parquet |
| `Float32` | `FLOAT` | |
| `Float64` | `DOUBLE` | |
| `Decimal128(p, s)` where p ≤ 38 | `DECIMAL(p, s)` | |
| `Decimal128(p, s)` where p > 38 | `STRING` | Precision exceeds Databricks maximum of 38 |
| `Decimal256(p, s)` where p ≤ 38 | `DECIMAL(p, s)` | Downcast to Decimal128 in Parquet |
| `Decimal256(p, s)` where p > 38 | `STRING` | Precision exceeds Databricks maximum of 38 |
| Apache Arrow DataType | Databricks Type |
| :-------------------- | :-------------- |
| `Boolean` | `BOOLEAN` |
| Apache Arrow DataType | Databricks Type | Notes |
| :-------------------- | :-------------- | :--------------------------------------------------------------- |
| `Date32` | `DATE` | |
| `Date64` | `DATE` | Converted to Date32 in Parquet |
| `Timestamp(s, tz)` | `TIMESTAMP_NTZ` | Converted to Timestamp(ms) in Parquet for proper annotation |
| `Timestamp(ms, tz)` | `TIMESTAMP_NTZ` | |
| `Timestamp(μs, tz)` | `TIMESTAMP_NTZ` | Databricks supports microsecond precision |
| `Timestamp(ns, tz)` | `TIMESTAMP_NTZ` | Converted to Timestamp(μs) in Parquet (Databricks max precision) |
| `Time32`, `Time64` | `STRING` | Databricks does not support TIME types |
| `Interval` | `STRING` | Databricks cannot read INTERVAL from Parquet |
| Apache Arrow DataType | Databricks Type |
| :-------------------- | :-------------- |
| `Utf8`, `LargeUtf8` | `STRING` |
| Apache Arrow DataType | Databricks Type |
| :---------------------- | :-------------- |
| `Binary`, `LargeBinary` | `BINARY` |
| Apache Arrow DataType | Databricks Type | Notes |
| :--------------------------------------------------------------------------------------------------------- | :-------------- | :---------------------------------------- |
| `Utf8` [JSON Extension](https://arrow.apache.org/docs/format/CanonicalExtensions.html#json) (`arrow.json`) | `STRING` | `VARIANT` will be supported in the future |
| Apache Arrow DataType | Databricks Type | Notes |
| :-------------------------------------------- | :-------------- | :----------------------------------- |
| `List`, `LargeList`, `FixedSizeList` | `ARRAY` | Element type T is recursively mapped |
***
# Databricks Setup (/docs/main/targets/databricks/setup)
## Prerequisites
Before you begin, ensure you have:
* **Supported Databricks Implementations:**
* [AWS](https://www.databricks.com/product/aws)
* [Azure](https://www.databricks.com/product/azure)
* [GCP](https://www.databricks.com/product/google-cloud)
* [Serverless Workspace](https://docs.databricks.com/aws/en/admin/workspace/serverless-workspaces)
* **Unity Catalog:** Unity Catalog enabled on your Databricks workspace.
* **SQL Warehouse:** [Serverless SQL Warehouse](https://docs.databricks.com/aws/en/compute/sql-warehouse/).
***
## Setup
### Configure Authentication
### Create a Service Principal
Follow the Databricks [documentation](https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m) to create a Service Principal.
Note down the Client ID and Client Secret.
### Create a Personal Access Token
Follow the Databricks [documentation](https://docs.databricks.com/aws/en/dev-tools/auth/pat) to create a PAT.
Note down the personal access token.
### Create a SQL Warehouse
Log in to the Databricks console to create a new SQL warehouse or use an existing one.
* Go to SQL > SQL warehouses > Create SQL warehouse
* Fill in and select the required fields, such as:
* Name
* Warehouse Size (2X Small)
* Warehouse Type (Serverless)
* Auto Stop (10 minutes)
* Scaling Min & Max (1)
* Unity Catalog (Enabled)
* etc.
* Click Create
* Once created, click on the Connection Details Tab
Note down the following details:
* Server Hostname (your-workspace.cloud.databricks.com)
* Warehouse ID (0123456789abcdef)
### Configure a Catalog
Log in to the Databricks console and choose a catalog (or create a new one by following the [Databricks documentation](https://docs.databricks.com/aws/en/data-governance/unity-catalog/get-started#overview-of-unity-catalog-setup)).
* From Databricks workspace console, navigate to Data
* Choose a Catalog (`my_catalog`)
### Create a Volume
Supermetal uses the configured volume as a temporary staging area.
Follow the steps from the [Databricks documentation](https://docs.databricks.com/aws/en/volumes/utility-commands#create-a-volume).
* From Databricks workspace console, navigate to Catalog
* Choose the catalog from the step above (`my_catalog`)
* Search or browse for the schema that you want to add the volume to and select it.
* Click on Create Volume and specify a Name.
* Click Create
```sql
CREATE VOLUME my_catalog.my_schema.my_volume;
```
Note down the following details:
* Catalog Name (my\_catalog)
* Volume Path (/Volumes/my\_catalog/my\_schema/my\_volume)
# Kafka (/docs/main/targets/kafka)
Replicate data to Apache Kafka with Debezium-compatible or Supermetal-native message formats.
* **Two formats**: Debezium for existing CDC consumers, Supermetal for minimal post-processing
* **Transactions**: Optional atomic writes aligned to source database transaction boundaries
***
## Message Formats
### Debezium
Fully compatible with Debezium's output format for seamless integration with existing CDC consumers.
See [Debezium change events](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-events).
**Use when**: You have existing Debezium consumers, need before/after images, or require full source metadata.
```json
{
"payload": {
"before": null,
"after": {
"id": 1,
"first_name": "Anne",
"last_name": "Kretchmar",
"email": "annek@noanswer.org"
},
"source": {
"version": "3.4.1.Final",
"connector": "postgresql",
"name": "PostgreSQL_server",
"ts_ms": 1559033904863,
"ts_us": 1559033904863123,
"ts_ns": 1559033904863123000,
"snapshot": true,
"db": "postgres",
"sequence": "[\"24023119\",\"24023128\"]",
"schema": "public",
"table": "customers",
"txId": 555,
"lsn": 24023128,
"xmin": null
},
"op": "c",
"ts_ms": 1559033904863,
"ts_us": 1559033904863841,
"ts_ns": 1559033904863841257
}
}
```
### Supermetal
Upsert-based format requiring little to no post-processing.
Compact payload with [minimal metadata](https://docs.supermetal.io/docs/main/faq/#metadata-columns).
**Use when**: You don't need Debezium compatibility and want minimal post-processing.
```json
{
"id": 1,
"first_name": "Anne",
"last_name": "Kretchmar",
"email": "annek@noanswer.org",
"_sm_version": 1559033904863841257,
"_sm_deleted": false
}
```
| Field | Purpose |
| :------------ | :------------------------------------------------- |
| `_sm_version` | Monotonic timestamp for deduplication and ordering |
| `_sm_deleted` | Soft delete flag (`true` on delete operations) |
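Because every event is a self-contained upsert, a downstream job can apply a batch with a single `MERGE`, keeping only the newest version of each row. This is an illustrative sketch, not part of Supermetal: `customers` and `staged_events` are hypothetical tables, and `staged_events` is assumed to hold at most one (latest) event per `id`:

```sql
MERGE INTO customers AS t
USING staged_events AS s
  ON t.id = s.id
-- Remove rows whose newest event is a delete
WHEN MATCHED AND s._sm_version > t._sm_version AND s._sm_deleted THEN
  DELETE
-- Apply newer upserts, skipping stale (out-of-order) events
WHEN MATCHED AND s._sm_version > t._sm_version THEN
  UPDATE SET first_name  = s.first_name,
             last_name   = s.last_name,
             email       = s.email,
             _sm_version = s._sm_version
WHEN NOT MATCHED AND NOT s._sm_deleted THEN
  INSERT (id, first_name, last_name, email, _sm_version)
  VALUES (s.id, s.first_name, s.last_name, s.email, s._sm_version);
```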
***
## Prerequisites
* **Kafka cluster**: Self-hosted, Confluent Cloud, Amazon MSK, Redpanda, or any Kafka-compatible broker
* **Credentials**: Username/password or mTLS certificates (if authentication enabled)
* **Schema Registry**: Confluent Schema Registry URL (if using Avro)
PostgreSQL and MySQL sources are currently supported with this target. Additional sources are coming soon.
***
## Setup
### Connection
* **Brokers**: Comma-separated bootstrap servers (e.g., `broker1:9092,broker2:9092`)
* **Security Protocol**: `Plaintext`, `Ssl`, `SaslPlaintext`, `SaslSsl`
* **SASL Mechanism**: `Plain`, `ScramSha256`, `ScramSha512` (if SASL enabled)
* **Username / Password**: For SASL authentication
### Message Format
* **Format Type**: `Debezium` or `Supermetal`
* **Value SerDe**: `JSON` (no registry) or `Avro` (requires Schema Registry)
* **Key SerDe**: Optional—defaults to JSON with primary key columns
If using Avro, configure the **Schema Registry** URL and credentials.
### Topics
Set **Topic Name Template** to control routing. Default: `supermetal.{database_or_schema}.{table}`
Configure **partitions** and **replication factor** for auto-created topics. Ensure your Kafka principal has CreateTopic permissions, or pre-create topics manually.
***
## Debezium Options
| Option | Debezium Config | Default | Description |
| :------------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------ | :--------------------------------------------- |
| Include Before | - | `true` | Include `before` image in update/delete events |
| Tombstones on Delete | [`tombstones.on.delete`](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-property-tombstones-on-delete) | `true` | Emit null-value record after deletes |
| Emit TX Metadata | [`provide.transaction.metadata`](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-property-provide-transaction-metadata) | `false` | Publish transaction begin/end events |
| TX Metadata Topic | [`topic.transaction`](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-property-topic-transaction) | `transaction` | Topic name for transaction metadata |
| Skipped Operations | [`skipped.operations`](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-property-skipped-operations) | `t` | Operations to skip (c, u, d, t, none) |
### Headers
| Header | Default | Description |
| :---------- | :------ | :------------------------------------------------ |
| `op` | off | Operation type (c, r, u, d) |
| `db` | off | Database name |
| `schema` | off | Schema name |
| `tbl` | off | Table name |
| `context` | on | `__debezium.context.*` headers |
| `pk_update` | on | `__debezium.newkey` / `__debezium.oldkey` headers |
**Example**: With `prefix: "dbz"`, `op: true`, `source_template: "{schema}.{table}"`, `context: false`:
```
dbz.op: "c"
dbz.source: "public.accounts"
```
### Data Type Handling
| Option | Debezium Config | Default | Values |
| :--------------- | :------------------------------------------------------------------------------------------------------------------------------------------------- | :--------- | :-------------------------------------------------------------------------------------------- |
| Decimal Handling | [`decimal.handling.mode`](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-property-decimal-handling-mode) | `Precise` | `Precise`, `Double`, `String` |
| Binary Handling | [`binary.handling.mode`](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-property-binary-handling-mode) | `Bytes` | `Bytes`, `Base64`, `Base64UrlSafe`, `Hex` |
| Time Precision | [`time.precision.mode`](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-property-time-precision-mode) | `Adaptive` | `Adaptive`, `AdaptiveTimeMicroseconds`, `Connect`, `IsoString`, `Microseconds`, `Nanoseconds` |
***
## SerDe
| SerDe | Notes |
| :---- | :------------------------------------------------------------------------------------ |
| JSON | No registry required. Optionally use Schema Registry or embed schema in each message. |
| Avro | Requires Confluent Schema Registry. |
Separate serializers for values and keys. Key SerDe is optional—defaults to JSON with primary key columns.
***
## Topics
Each table maps to a Kafka topic via the **Topic Name Template**.
Default: `supermetal.{database_or_schema}.{table}`
### Placeholders
| Placeholder | Description |
| :--------------------- | :------------------------------------ |
| `{database}` | Source database name |
| `{schema}` | Source schema name |
| `{table}` | Table name |
| `{database_or_schema}` | Database if present, otherwise schema |
### Naming Mode
| Mode | Description |
| :-------- | :----------------------------------------- |
| `Default` | Replace invalid characters with underscore |
| `Unicode` | Encode invalid characters as `_uXXXX` |
### Creation
| Option | Default | Description |
| :----------------- | :--------- | :---------------------- |
| Partitions | `1` | Number of partitions |
| Replication Factor | `1` | Number of replicas |
| Operation Timeout | `30000` ms | Admin operation timeout |
***
## Transactions
Align Kafka delivery with source database transaction boundaries. Consumers never observe partial transactions.
| Setting | Default | Description |
| :---------------- | :--------- | :------------------------------------------------ |
| Enabled | `false` | Enable transactional producer |
| Transactional ID | auto | Custom `transactional.id` (auto-derived if empty) |
| Timeout | `60000` ms | `transaction.timeout.ms` (broker-side) |
| Operation Timeout | `30000` ms | Client-side deadline for commit/abort |
Transactions require `acks=All` and enable idempotence automatically.
Consumers must set `isolation.level=read_committed` to only see committed transactions:
```java
props.put("isolation.level", "read_committed");
```
***
Apache® and Apache Kafka® are either registered trademarks or trademarks of the Apache Software Foundation. Debezium is an unregistered trademark of the Commonhaus Foundation. No endorsement by these organizations is implied by the use of these marks.
# Kafka Setup (/docs/main/targets/kafka/setup)
## Prerequisites
* **Kafka cluster**: Self-hosted, Confluent Cloud, Amazon MSK, Redpanda, or any Kafka-compatible broker
* **Credentials**: Username/password or mTLS certificates (if authentication enabled)
* **Schema Registry**: Confluent Schema Registry URL (if using Avro)
PostgreSQL and MySQL sources are currently supported with this target. Additional sources are coming soon.
***
## Setup
### Connection
* **Brokers**: Comma-separated bootstrap servers (e.g., `broker1:9092,broker2:9092`)
* **Security Protocol**: `Plaintext`, `Ssl`, `SaslPlaintext`, `SaslSsl`
* **SASL Mechanism**: `Plain`, `ScramSha256`, `ScramSha512` (if SASL enabled)
* **Username / Password**: For SASL authentication
### Message Format
* **Format Type**: `Debezium` or `Supermetal`
* **Value SerDe**: `JSON` (no registry) or `Avro` (requires Schema Registry)
* **Key SerDe**: Optional—defaults to JSON with primary key columns
If using Avro, configure the **Schema Registry** URL and credentials.
### Topics
Set **Topic Name Template** to control routing. Default: `supermetal.{database_or_schema}.{table}`
Configure **partitions** and **replication factor** for auto-created topics. Ensure your Kafka principal has CreateTopic permissions, or pre-create topics manually.
***
## Debezium Options
| Option | Debezium Config | Default | Description |
| :------------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------ | :--------------------------------------------- |
| Include Before | - | `true` | Include `before` image in update/delete events |
| Tombstones on Delete | [`tombstones.on.delete`](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-property-tombstones-on-delete) | `true` | Emit null-value record after deletes |
| Emit TX Metadata | [`provide.transaction.metadata`](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-property-provide-transaction-metadata) | `false` | Publish transaction begin/end events |
| TX Metadata Topic | [`topic.transaction`](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-property-topic-transaction) | `transaction` | Topic name for transaction metadata |
| Skipped Operations | [`skipped.operations`](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-property-skipped-operations) | `t` | Operations to skip (c, u, d, t, none) |
### Headers
| Header | Default | Description |
| :---------- | :------ | :------------------------------------------------ |
| `op` | off | Operation type (c, r, u, d) |
| `db` | off | Database name |
| `schema` | off | Schema name |
| `tbl` | off | Table name |
| `context` | on | `__debezium.context.*` headers |
| `pk_update` | on | `__debezium.newkey` / `__debezium.oldkey` headers |
**Example**: With `prefix: "dbz"`, `op: true`, `source_template: "{schema}.{table}"`, `context: false`:
```
dbz.op: "c"
dbz.source: "public.accounts"
```
### Data Type Handling
| Option | Debezium Config | Default | Values |
| :--------------- | :------------------------------------------------------------------------------------------------------------------------------------------------- | :--------- | :-------------------------------------------------------------------------------------------- |
| Decimal Handling | [`decimal.handling.mode`](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-property-decimal-handling-mode) | `Precise` | `Precise`, `Double`, `String` |
| Binary Handling | [`binary.handling.mode`](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-property-binary-handling-mode) | `Bytes` | `Bytes`, `Base64`, `Base64UrlSafe`, `Hex` |
| Time Precision | [`time.precision.mode`](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-property-time-precision-mode) | `Adaptive` | `Adaptive`, `AdaptiveTimeMicroseconds`, `Connect`, `IsoString`, `Microseconds`, `Nanoseconds` |
***
## SerDe
| SerDe | Notes |
| :---- | :------------------------------------------------------------------------------------ |
| JSON | No registry required. Optionally use Schema Registry or embed schema in each message. |
| Avro | Requires Confluent Schema Registry. |
Separate serializers for values and keys. Key SerDe is optional—defaults to JSON with primary key columns.
***
## Topics
Each table maps to a Kafka topic via the **Topic Name Template**.
Default: `supermetal.{database_or_schema}.{table}`
### Placeholders
| Placeholder | Description |
| :--------------------- | :------------------------------------ |
| `{database}` | Source database name |
| `{schema}` | Source schema name |
| `{table}` | Table name |
| `{database_or_schema}` | Database if present, otherwise schema |
### Naming Mode
| Mode | Description |
| :-------- | :----------------------------------------- |
| `Default` | Replace invalid characters with underscore |
| `Unicode` | Encode invalid characters as `_uXXXX` |
### Creation
| Option | Default | Description |
| :----------------- | :--------- | :---------------------- |
| Partitions | `1` | Number of partitions |
| Replication Factor | `1` | Number of replicas |
| Operation Timeout | `30000` ms | Admin operation timeout |
***
## Transactions
Align Kafka delivery with source database transaction boundaries. Consumers never observe partial transactions.
| Setting | Default | Description |
| :---------------- | :--------- | :------------------------------------------------ |
| Enabled | `false` | Enable transactional producer |
| Transactional ID | auto | Custom `transactional.id` (auto-derived if empty) |
| Timeout | `60000` ms | `transaction.timeout.ms` (broker-side) |
| Operation Timeout | `30000` ms | Client-side deadline for commit/abort |
Transactions require `acks=All` and enable idempotence automatically.
Consumers must set `isolation.level=read_committed` to only see committed transactions:
```java
props.put("isolation.level", "read_committed");
```
# SQL Server (/docs/main/sources/sqlserver)
import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Callout } from 'fumadocs-ui/components/callout';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
import {
CreateAdminUser,
CreateMinimalUser,
CreateDBOwnerUser,
EnableCDCTable,
StandardEnableCDCDB,
StandardConfigureCDCJob,
RDSEnableCDCDB,
RDSConfigureCDCJob,
AzureConfigureCDCJob,
AzureEnableCDCDB
} from '@supermetal/docs-components';
SQL Server's Change Data Capture (CDC) through capture tables enables reliable, low-impact change capture for data integration. Supermetal builds on this foundation to deliver efficient data extraction and replication with minimal source database overhead.
This guide covers features and configuration steps required to connect SQL Server with Supermetal, focusing on CDC setup and permissions.
***
## Features
|
Feature
|
Notes
|
|
Initial Data Sync
|
Performs fast initial data loads (snapshots) and historical backfills by chunking data within each table and reading each chunk concurrently using multiple connections:
**Physical Partitioning**: Automatically detects and utilizes existing SQL Server table partitions, extracting data from each partition in parallel
**Dynamic Range Partitioning**: For non-partitioned tables, identifies suitable partition columns (clustered indexes or primary keys) and creates optimal query boundaries based on data distribution
|
|
Change Data Capture
|
Captures data changes (inserts, updates, deletes) in real-time using SQL Server's native [CDC functionality](https://learn.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-data-capture-sql-server). Leverages the [SQL Server Agent](https://learn.microsoft.com/en-us/sql/ssms/agent/sql-server-agent) to asynchronously populate dedicated [CDC capture tables](https://learn.microsoft.com/en-us/sql/relational-databases/system-tables/change-data-capture-tables-transact-sql) from transaction log entries.
|
|
CDC Processing Modes
|
**Standard CDC Mode**: Processes changes from individual CDC capture tables with minimal latency. This mode delivers lower latency data replication and is suitable when you don't need strict transactional consistency across multiple tables.
**Strictly Transactional CDC Mode**: When enabled (`transactional_cdc_enabled: true`), Supermetal guarantees that all changes across multiple tables are processed while preserving their original transaction boundaries. This ensures that related data changes stay grouped together exactly as they were in the source database—for example, when an order and its line items were updated together, they will remain grouped together during replication.
|
|
Configurable CDC Processing
|
Fine-tune CDC behavior with configurable parameters for permissions/roles, polling intervals, maximum scan cycles, and batch sizes.
|
|
Catalog Support
|
Ability to configure which schemas, tables and columns to sync.
|
***
## Prerequisites
Before you begin, ensure you have:
* **SQL Server Requirements:**
* **Version:** SQL Server 2016 or higher
* **Supported Editions:**
* `Standard SQL Server` (Enterprise, Standard, Personal, Azure SQL Managed Instance)
* `Azure SQL Database`
* `Amazon RDS for SQL Server`
CDC is not supported on `Azure Synapse`, `Azure SQL Edge`, or `Azure Synapse Serverless`.
* **SQL Server Agent:** The [SQL Server Agent](https://learn.microsoft.com/en-us/sql/ssms/agent/sql-server-agent) service must be running, as it manages the CDC capture jobs.
* **Setup Requirements:**
* **Database Permissions:** `sysadmin` or `db_owner` role for CDC setup (see [Setup](#setup) for role-specific instructions)
* **Network Connectivity:** Ensure Supermetal can reach your SQL Server (default port: 1433)
***
## Setup
### Permissions Overview
Supermetal's SQL Server integration provides different levels of Change Data Capture (CDC) automation depending on your SQL Server edition and user permissions. The table below summarizes the automation capabilities:
| SQL Server Edition | User Role | Automation Capabilities |
| :----------------- | :----------- | :---------------------------------------------------------------------------------------------- |
| Standard | `sysadmin` | Full CDC workflow automation (database CDC enablement, table CDC enablement, job configuration) |
| Standard | `db_owner` | Partial automation (table CDC enablement, job configuration) |
| Standard | Regular user | Limited automation (requires manual setup by a privileged user) |
| Azure SQL Database | `db_owner` | Full CDC workflow automation (all CDC operations) |
| Azure SQL Database | Regular user | Limited automation (requires manual setup by a privileged user) |
| AWS RDS | `db_owner` | Partial automation (table CDC enablement, job configuration) |
| AWS RDS | Regular user | Limited automation (requires manual setup by a privileged user) |
We recommend creating a `db_owner` user for all SQL Server editions, as it provides the best tradeoff between automation and permissions.
### Setup Instructions
#### SYSADMIN Role \[!toc]
The `sysadmin` server role provides the highest level of permissions, enabling Supermetal to fully automate the CDC workflow, including enabling CDC at the database level, configuring tables for CDC, and managing CDC jobs.
The `sysadmin` role is only available on Standard SQL Server editions. Azure SQL Database and AWS RDS restrict the use of `sysadmin` privileges.
#### Create a SYSADMIN User \[!toc]
AWS RDS for SQL Server does not allow the creation of `sysadmin` users. Please use the DB\_OWNER role instead.
Azure SQL Database does not support the `sysadmin` role. Please use the DB\_OWNER role instead.
Replace `'strong-password'` with a secure, unique password for the Supermetal user. Store this password securely, as you'll need it when configuring the source in Supermetal.
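If you prefer to run the statements by hand, a minimal sketch looks like this (the login name `supermetal_user` is an assumption; adjust to match your environment):

```sql
-- Create a server-level login for Supermetal
CREATE LOGIN supermetal_user WITH PASSWORD = 'strong-password';

-- Grant the sysadmin server role (Standard SQL Server editions only)
ALTER SERVER ROLE sysadmin ADD MEMBER supermetal_user;
```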
#### DB\_OWNER Role \[!toc]
The `db_owner` database role provides sufficient permissions for most CDC operations except enabling CDC at the database level on Standard SQL Server and AWS RDS (this step must be performed by a `sysadmin`). On Azure SQL Database, `db_owner` can automate the entire CDC workflow.
#### Create a DB\_OWNER User \[!toc]
Replace the placeholder values in the script with your actual information:
* `'strong-password'`: Replace with a secure, unique password for the `supermetal_user`.
* `DATABASE_NAME`: The name of the database you want to replicate from.
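As a rough sketch, creating such a user involves a login, a database user mapped to it, and membership in `db_owner` (on Azure SQL Database, connect directly to the target database rather than using `USE`):

```sql
-- Create a server-level login for Supermetal
CREATE LOGIN supermetal_user WITH PASSWORD = 'strong-password';

-- Create a database user mapped to the login and add it to db_owner
USE DATABASE_NAME;
CREATE USER supermetal_user FOR LOGIN supermetal_user;
ALTER ROLE db_owner ADD MEMBER supermetal_user;
```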
#### Enable Snapshot Isolation \[!toc]
Enable snapshot isolation on the database to ensure consistent reads during CDC operations. This step must be performed by a `sysadmin` user.
Not required for Azure SQL Database. Supermetal automatically manages snapshot isolation enablement.
Replace `DATABASE_NAME` with the name of the database you want to replicate from.
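The statement behind this step is a single `ALTER DATABASE`:

```sql
-- Allow transactions to use snapshot isolation in this database
ALTER DATABASE DATABASE_NAME SET ALLOW_SNAPSHOT_ISOLATION ON;
```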
#### Enable CDC at Database Level \[!toc]
This step is required for Standard SQL Server and AWS RDS. It creates the CDC schema and starts the SQL Server Agent jobs necessary for CDC. This step must be performed by a user with `sysadmin` privileges.
Not required for Azure SQL Database. Supermetal automatically manages CDC enablement at the database level.
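On Standard SQL Server this is done with `sys.sp_cdc_enable_db`; AWS RDS exposes an RDS-specific wrapper instead:

```sql
-- Standard SQL Server (run as sysadmin, in the database being replicated)
USE DATABASE_NAME;
EXEC sys.sp_cdc_enable_db;

-- AWS RDS for SQL Server (run from any database)
EXEC msdb.dbo.rds_cdc_enable_db 'DATABASE_NAME';
```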
#### MINIMAL\_ACCESS\_USER Role \[!toc]
A minimal access user has limited permissions and requires a user with higher privileges (such as `sysadmin` or `db_owner`) to set up the CDC environment. After setup, the minimal access user can be used for ongoing replication.
#### Create a Minimal Access User \[!toc]
Replace the placeholder values in the script with your actual information:
* `'strong-password'`: Replace with a secure, unique password for the `supermetal_user`.
* `DATABASE_NAME`: The name of the database you want to replicate from.
* `SCHEMA_NAME`: The schema of the table you want to replicate from.
* `TABLE_NAME`: The name of the table you want to replicate from.
#### Enable Snapshot Isolation \[!toc]
Enable snapshot isolation on the database to ensure consistent reads during CDC operations. This step must be performed by a user with appropriate privileges (typically `sysadmin` or `db_owner`).
Replace `DATABASE_NAME` with the name of the database you want to replicate from.
#### Enable CDC at Database Level \[!toc]
This step must be performed by a user with appropriate privileges (typically `sysadmin` for Standard and RDS, `db_owner` for Azure).
If this user does not have `SELECT` permission on the database schemas, you must explicitly grant `SELECT` on the \[cdc] schema:
```sql
GRANT SELECT ON SCHEMA::[cdc] TO [supermetal_user];
```
#### Enable CDC for Tables
The following query creates a capture instance for a specific table. This step must be performed by a user with appropriate privileges.
Replace the placeholder values in the script with your actual information:
* `SCHEMA_NAME`: The schema of the table you want to replicate from.
* `TABLE_NAME`: The name of the table you want to replicate from.
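Under the hood this calls `sys.sp_cdc_enable_table`, roughly as follows (run in the database being replicated):

```sql
EXEC sys.sp_cdc_enable_table
  @source_schema = N'SCHEMA_NAME',
  @source_name   = N'TABLE_NAME',
  @role_name     = NULL;  -- NULL skips the gating role; pass a role name to restrict access
```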
#### Configure CDC Job Parameters (Optional)
You can customize the CDC job parameters to optimize performance based on your workload.
Replace the parameter values with appropriate values for your environment:
* `INSERT_MAX_SCANS_VALUE`: Maximum number of scan cycles (default: 10)
* `INSERT_MAX_TRANS_VALUE`: Maximum number of transactions (default: 500)
* `INSERT_POLL_INTERVAL_SECS_VALUE`: Polling interval in seconds (default: 5)
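These parameters map onto `sys.sp_cdc_change_job`; a sketch with the default values shown above (the capture job must be restarted for changes to take effect):

```sql
EXEC sys.sp_cdc_change_job
  @job_type        = N'capture',
  @maxscans        = 10,   -- INSERT_MAX_SCANS_VALUE
  @maxtrans        = 500,  -- INSERT_MAX_TRANS_VALUE
  @pollinginterval = 5;    -- INSERT_POLL_INTERVAL_SECS_VALUE

-- Restart the capture job so the new parameters take effect
EXEC sys.sp_cdc_stop_job @job_type = N'capture';
EXEC sys.sp_cdc_start_job @job_type = N'capture';
```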
***
## Data Types Mapping
| SQL Server Type(s) | Apache Arrow DataType | Notes |
| :----------------- | :-------------------------------------- | :--------------------------------------------------------------------------- |
| `TINYINT` | `UInt8` | |
| `SMALLINT` | `Int16` | |
| `INT` | `Int32` | |
| `BIGINT` | `Int64` | |
| `SMALLMONEY` | `Decimal128(10, 4)` | Fixed precision with 4 decimal places. |
| `MONEY` | `Decimal128(19, 4)` | Fixed precision with 4 decimal places. |
| `DECIMAL(p,s)` | `Decimal128(p, s)` / `Decimal256(p, s)` | Uses SQL Server precision and scale directly. Decimal256 for precision > 38. |
| `NUMERIC(p,s)` | `Decimal128(p, s)` / `Decimal256(p, s)` | Same as DECIMAL. Default precision is 18. |
| `REAL` | `Float64` | |
| `FLOAT` | `Float64` | |
| SQL Server Type(s) | Apache Arrow DataType | Notes |
| :----------------- | :-------------------- | :---- |
| `BIT` | `Boolean` | |
| SQL Server Type(s) | Apache Arrow DataType | Notes |
| :-------------------- | :--------------------- | :------------------------------------------------------------------------ |
| `DATE` | `Date32` | Days since UNIX epoch. |
| `TIME` | `Time64(ns)` | Nanoseconds since midnight. |
| `SMALLDATETIME` | `Timestamp(s)` | Second-level precision, minute-level accuracy (rounds to nearest minute). |
| `DATETIME` | `Timestamp(ms)` | Millisecond precision, 3.33ms accuracy. |
| `DATETIME2(0)` | `Timestamp(s)` | Second precision. |
| `DATETIME2(1-3)` | `Timestamp(ms)` | Millisecond precision (1-3 fractional digits). |
| `DATETIME2(4-6)` | `Timestamp(μs)` | Microsecond precision (4-6 fractional digits). |
| `DATETIME2(7)` | `Utf8` | Mapped to string to preserve full range (years 1-9999). |
| `DATETIMEOFFSET(0)` | `Timestamp(s, "UTC")` | Second precision, normalized to UTC. |
| `DATETIMEOFFSET(1-3)` | `Timestamp(ms, "UTC")` | Millisecond precision (1-3 fractional digits), normalized to UTC. |
| `DATETIMEOFFSET(4-6)` | `Timestamp(μs, "UTC")` | Microsecond precision (4-6 fractional digits), normalized to UTC. |
| `DATETIMEOFFSET(7)` | `Utf8` | Mapped to string to preserve full range (years 0001-9999). |
| SQL Server Type(s) | Apache Arrow DataType | Notes |
| :------------------ | :-------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `CHAR`, `VARCHAR` | `Utf8` | |
| `NCHAR`, `NVARCHAR` | `Utf8` | Unicode character strings. |
| `TEXT` | `Utf8` | Legacy large text type ([deprecated](https://learn.microsoft.com/en-us/sql/t-sql/data-types/ntext-text-and-image-transact-sql)). Use VARCHAR(MAX) instead. |
| `NTEXT` | `Utf8` | Legacy large Unicode text type ([deprecated](https://learn.microsoft.com/en-us/sql/t-sql/data-types/ntext-text-and-image-transact-sql)). Use NVARCHAR(MAX) instead. |
| `UNIQUEIDENTIFIER` | `Utf8` | UUID/GUID represented as a string. |
| SQL Server Type(s) | Apache Arrow DataType | Notes |
| :-------------------- | :-------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `BINARY`, `VARBINARY` | `Binary` | |
| `ROWVERSION` | `Binary` | Automatically incremented binary identifier. See [rowversion](https://learn.microsoft.com/en-us/sql/t-sql/data-types/rowversion-transact-sql). |
| `IMAGE` | `Binary` | Legacy large binary type ([deprecated](https://learn.microsoft.com/en-us/sql/t-sql/data-types/ntext-text-and-image-transact-sql)). Use VARBINARY(MAX) instead. |
| SQL Server Type(s) | Apache Arrow DataType |
| :----------------- | :-------------------- |
| `XML` | `Utf8` |
| `GEOGRAPHY` | *Unsupported* |
| `GEOMETRY` | *Unsupported* |
| `HIERARCHYID` | *Unsupported* |
***
# SQL Server Setup (/docs/main/sources/sqlserver/setup)
## Prerequisites
Before you begin, ensure you have:
* **SQL Server Requirements:**
* **Version:** SQL Server 2016 or higher
* **Supported Editions:**
* `Standard SQL Server` (Enterprise, Standard, Personal, Azure SQL Managed Instance)
* `Azure SQL Database`
* `Amazon RDS for SQL Server`
CDC is not supported on `Azure Synapse`, `Azure SQL Edge`, or `Azure Synapse Serverless`.
* **SQL Server Agent:** The [SQL Server Agent](https://learn.microsoft.com/en-us/sql/ssms/agent/sql-server-agent) service must be running, as it manages the CDC capture jobs.
* **Setup Requirements:**
* **Database Permissions:** `sysadmin` or `db_owner` role for CDC setup (see [Setup](#setup) for role-specific instructions)
* **Network Connectivity:** Ensure Supermetal can reach your SQL Server (default port: 1433)
***
## Setup
### Permissions Overview
Supermetal's SQL Server integration provides different levels of Change Data Capture (CDC) automation depending on your SQL Server edition and user permissions. The table below summarizes the automation capabilities:
| SQL Server Edition | User Role | Automation Capabilities |
| :----------------- | :----------- | :---------------------------------------------------------------------------------------------- |
| Standard | `sysadmin` | Full CDC workflow automation (database CDC enablement, table CDC enablement, job configuration) |
| Standard | `db_owner` | Partial automation (table CDC enablement, job configuration) |
| Standard | Regular user | Limited automation (requires manual setup by a privileged user) |
| Azure SQL Database | `db_owner` | Full CDC workflow automation (all CDC operations) |
| Azure SQL Database | Regular user | Limited automation (requires manual setup by a privileged user) |
| AWS RDS | `db_owner` | Partial automation (table CDC enablement, job configuration) |
| AWS RDS | Regular user | Limited automation (requires manual setup by a privileged user) |
We recommend creating a `db_owner` user for all SQL Server editions, as it provides the best tradeoff between automation and permissions.
### Setup Instructions
#### SYSADMIN Role \[!toc]
The `sysadmin` server role provides the highest level of permissions, enabling Supermetal to fully automate the CDC workflow, including enabling CDC at the database level, configuring tables for CDC, and managing CDC jobs.
The `sysadmin` role is only available on Standard SQL Server editions. Azure SQL Database and AWS RDS restrict the use of `sysadmin` privileges.
#### Create a SYSADMIN User \[!toc]
AWS RDS for SQL Server does not allow the creation of `sysadmin` users. Please use the DB\_OWNER role instead.
Azure SQL Database does not support the `sysadmin` role. Please use the DB\_OWNER role instead.
Replace `'strong-password'` with a secure, unique password for the Supermetal user. Store this password securely, as you'll need it when configuring the source in Supermetal.
#### DB\_OWNER Role \[!toc]
The `db_owner` database role provides sufficient permissions for most CDC operations except enabling CDC at the database level on Standard SQL Server and AWS RDS (this step must be performed by a `sysadmin`). On Azure SQL Database, `db_owner` can automate the entire CDC workflow.
#### Create a DB\_OWNER User \[!toc]
Replace the placeholder values in the script with your actual information:
* `'strong-password'`: Replace with a secure, unique password for the `supermetal_user`.
* `DATABASE_NAME`: The name of the database you want to replicate from.
#### Enable Snapshot Isolation \[!toc]
Enable snapshot isolation on the database to ensure consistent reads during CDC operations. This step must be performed by a `sysadmin` user.
Not required for Azure SQL Database. Supermetal automatically manages snapshot isolation enablement.
Replace `DATABASE_NAME` with the name of the database you want to replicate from.
#### Enable CDC at Database Level \[!toc]
This step is required for Standard SQL Server and AWS RDS. It creates the CDC schema and starts the SQL Server Agent jobs necessary for CDC. This step must be performed by a user with `sysadmin` privileges.
Not required for Azure SQL Database. Supermetal automatically manages CDC enablement at the database level.
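A minimal sketch of this step is below. The first form is for Standard SQL Server (run in the database you want to replicate); AWS RDS exposes the same capability through Amazon's wrapper procedure:

```sql
-- Standard SQL Server: enable CDC for the database
USE [DATABASE_NAME];
EXEC sys.sp_cdc_enable_db;

-- AWS RDS: use the RDS-provided wrapper instead
EXEC msdb.dbo.rds_cdc_enable_db 'DATABASE_NAME';
```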
#### MINIMAL\_ACCESS\_USER Role \[!toc]
A minimal access user has limited permissions and requires a user with higher privileges (such as `sysadmin` or `db_owner`) to set up the CDC environment. After setup, the minimal access user can be used for ongoing replication.
#### Create a Minimal Access User \[!toc]
Replace the placeholder values in the script with your actual information:
* `'strong-password'`: Replace with a secure, unique password for the `supermetal_user`.
* `DATABASE_NAME`: The name of the database you want to replicate from.
* `SCHEMA_NAME`: The schema of the table you want to replicate from.
* `TABLE_NAME`: The name of the table you want to replicate from.
#### Enable Snapshot Isolation \[!toc]
Enable snapshot isolation on the database to ensure consistent reads during CDC operations. This step must be performed by a user with appropriate privileges (typically `sysadmin` or `db_owner`).
Replace `DATABASE_NAME` with the name of the database you want to replicate from.
#### Enable CDC at Database Level \[!toc]
This step must be performed by a user with appropriate privileges (typically `sysadmin` for Standard and RDS, `db_owner` for Azure).
If this user does not have `SELECT` permission on the database schema, you must explicitly grant `SELECT` on the `[cdc]` schema:
```sql
GRANT SELECT ON SCHEMA::[cdc] TO [supermetal_user];
```
#### Enable CDC for Tables
The following query creates a capture instance for a specific table. This step must be performed by a user with appropriate privileges.
Replace the placeholder values in the script with your actual information:
* `SCHEMA_NAME`: The schema of the table you want to replicate from.
* `TABLE_NAME`: The name of the table you want to replicate from.
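A minimal sketch of such a capture-instance command, using the placeholders above (setting `@role_name = NULL`, which skips gating change-data access behind a database role, is an assumption — supply a role name if you need finer-grained access control):

```sql
USE [DATABASE_NAME];
-- Create a capture instance for one table
EXEC sys.sp_cdc_enable_table
    @source_schema = N'SCHEMA_NAME',
    @source_name   = N'TABLE_NAME',
    @role_name     = NULL;
```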
#### Configure CDC Job Parameters (Optional)
You can customize the CDC job parameters to optimize performance based on your workload.
Replace the parameter values with appropriate values for your environment:
* `INSERT_MAX_SCANS_VALUE`: Maximum number of scan cycles (default: 10)
* `INSERT_MAX_TRANS_VALUE`: Maximum number of transactions (default: 500)
* `INSERT_POLL_INTERVAL_SECS_VALUE`: Polling interval in seconds (default: 5)
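As a hedged sketch, these parameters map onto SQL Server's `sys.sp_cdc_change_job` procedure; the capture job must be restarted for new values to take effect:

```sql
-- Tune the CDC capture job (values shown are the documented defaults)
EXEC sys.sp_cdc_change_job
    @job_type        = N'capture',
    @maxscans        = 10,   -- INSERT_MAX_SCANS_VALUE
    @maxtrans        = 500,  -- INSERT_MAX_TRANS_VALUE
    @pollinginterval = 5;    -- INSERT_POLL_INTERVAL_SECS_VALUE

-- Restart the capture job so the new values take effect
EXEC sys.sp_cdc_stop_job  @job_type = N'capture';
EXEC sys.sp_cdc_start_job @job_type = N'capture';
```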
# PostgreSQL (/docs/main/sources/pg)
import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Callout } from 'fumadocs-ui/components/callout';
import { Steps, Step } from 'fumadocs-ui/components/steps';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
PostgreSQL (Postgres) is a powerful open-source relational database that supports logical replication, enabling real-time Change Data Capture (CDC).
This guide walks you through configuring your PostgreSQL database to work seamlessly with Supermetal.
***
## Features
| Feature | Notes |
| ------- | ----- |
| Initial Data Sync | Performs fast initial data loads (snapshots) and historical backfills by chunking data within each table and reading each chunk concurrently using multiple connections. |
| Change Data Capture | Captures data changes (inserts, updates, deletes) in real-time using PostgreSQL's native logical replication for minimal database impact and guaranteed consistency. |
| Transactional Consistency | Guarantees that the data synced reflects a consistent point in the upstream database's timeline, respecting transaction boundaries. |
| Catalog Support | Ability to configure which schemas, tables and columns to sync. |
***
## Prerequisites
Before you begin, ensure you have:
* **Supported PostgreSQL Version:** PostgreSQL version 9.4 or higher.
* **Database Admin Access:** Sufficient privileges to create users, grant permissions, and modify database configuration parameters. This might require `superuser` privileges or specific cloud provider roles (e.g., `rds_superuser` on AWS RDS, `azure_pg_admin` on Azure).
* **Network Connectivity:** Ensure that Supermetal's services can reach your PostgreSQL database. You may need to configure firewall rules, security groups, or VPC peering.
***
## Setup
Supermetal supports two replication types for PostgreSQL sources:
* **Snapshot**: One-time or periodic full table snapshots.
* **Logical Replication**: Real-time change data capture (CDC) with initial snapshot.
Follow these steps to configure PostgreSQL for snapshot-only replication:
### Create Read-Only User
Create a dedicated database user for Supermetal with read-only permissions.
```sql
-- Create user with a secure password
CREATE USER supermetal_user WITH PASSWORD 'strong-password';
```
Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the source in Supermetal.
### Grant Table Access
Grant SELECT permissions on the schemas and tables you want to replicate.
```sql
-- Grant schema access
GRANT USAGE ON SCHEMA public TO supermetal_user;
-- Grant select permission on all current tables in the schema
GRANT SELECT ON ALL TABLES IN SCHEMA public TO supermetal_user;
-- Ensure select permission is granted on any new tables created in the future
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO supermetal_user;
-- (Optional but Recommended) Bypass Row-Level Security (RLS)
-- This ensures Supermetal reads all data, regardless of RLS policies
-- applied for application users. Only grant if necessary and understood.
ALTER USER supermetal_user BYPASSRLS;
```
Repeat for each schema being replicated. Replace `public` with your schema name.
### Configure PostgreSQL Source in Supermetal
You are now ready to configure PostgreSQL as a source within Supermetal. Select **Snapshot** as the replication type when configuring the source.
Follow these steps to configure PostgreSQL for logical replication with CDC:
### Configure User and Permissions
Create a dedicated database user for Supermetal with the necessary permissions for replication and data access.
#### Create Replication User
```sql
-- Create user with minimal required privileges
CREATE USER supermetal_user
WITH PASSWORD 'strong-password'
REPLICATION;
```
```sql
-- Create user and grant RDS replication role
CREATE USER supermetal_user
WITH PASSWORD 'strong-password';
GRANT rds_replication TO supermetal_user;
```
```sql
-- Create replication user
CREATE USER supermetal_user
WITH PASSWORD 'strong-password';
ALTER USER supermetal_user WITH REPLICATION;
```
```sql
-- Create replication user
CREATE USER supermetal_user
WITH PASSWORD 'strong-password';
ALTER ROLE supermetal_user WITH REPLICATION;
```
Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the source in Supermetal.
#### Grant Table Access
```sql
-- Current and future table access
GRANT USAGE ON SCHEMA public TO supermetal_user;
-- Grant select permission on all current tables in the schema
GRANT SELECT ON ALL TABLES IN SCHEMA public TO supermetal_user;
-- Ensure select permission is granted on any new tables created in the future
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO supermetal_user;
-- (Optional but Recommended) Bypass Row-Level Security (RLS)
-- This ensures Supermetal syncs all data, regardless of RLS policies
-- applied for application users. Only grant if necessary and understood.
ALTER USER supermetal_user BYPASSRLS;
```
Repeat for each schema being replicated. Replace `public` with your schema name.
### Configure Replication Settings (WAL)
Logical replication relies on PostgreSQL's Write-Ahead Log (WAL). You need to configure specific parameters to enable it.
#### Understanding WAL Settings
These settings must be configured for logical replication to work:
* `wal_level`: Must be set to `logical`. This adds the necessary information to the WAL for logical decoding.
* `max_wal_senders`: Defines the maximum number of concurrent replication connections the server can support. Supermetal uses these connections for CDC.
* `max_replication_slots`: Defines the maximum number of replication slots the server can manage. Each Supermetal CDC connection requires a replication slot to track its progress and prevent the server from purging WAL files needed for replication. This should generally be equal to or greater than max\_wal\_senders.
#### Platform-Specific Configuration
Apply the settings based on your PostgreSQL environment.
1. Check current settings:
```sql
-- View all replication settings
SELECT name, setting, unit, context
FROM pg_settings
WHERE name IN (
'wal_level', 'max_wal_senders', 'max_replication_slots'
);
```
2. Configure settings:
```sql
ALTER SYSTEM SET wal_level = 'logical';
ALTER SYSTEM SET max_wal_senders = 10; -- Adjust for your tables
ALTER SYSTEM SET max_replication_slots = 10; -- Match max_wal_senders
```
3. Restart PostgreSQL
Modify the DB Parameter Group:
1. Go to your RDS instance configuration.
2. Find the associated DB Parameter Group and modify it (or create a new one and associate it).
3. Set the following parameters:
1. `rds.logical_replication` = 1 (This enables logical replication on RDS)
2. `wal_level` = logical (Often set automatically when rds.logical\_replication is 1)
3. `max_wal_senders` = 10 (Adjust as needed)
4. `max_replication_slots` = 10 (Match or exceed max\_wal\_senders)
4. Apply to RDS instance and reboot: Applying changes to a Parameter Group that includes static parameters like wal\_level requires rebooting the RDS instance.
1. Navigate to your Cloud SQL for PostgreSQL instance in the Google Cloud Console and click Edit.
2. Go to the Flags section.
3. Add or modify the following flags:
   1. `cloudsql.logical_decoding` = on
   2. `wal_sender_timeout` = 0 (Optional but recommended. Setting to 0 disables the timeout, preventing replication issues during long transactions or initial syncs. Default is 60 seconds.)
4. Automatic management: Cloud SQL automatically manages `wal_level`, `max_wal_senders`, and `max_replication_slots` when `cloudsql.logical_decoding` is enabled. You do not need to set these manually.
5. Restart the instance to apply the changes.
1. Navigate to your Azure Database for PostgreSQL instance in the Azure Portal.
2. Go to Server parameters. Set the following parameters:
1. `wal_level` = logical
2. `max_wal_senders` = 10 (Adjust as needed)
3. `max_replication_slots` = 10 (Match or exceed max\_wal\_senders)
4. `wal_sender_timeout` = 0 (Optional but recommended, prevents timeouts. Default might be 60000ms)
3. Save and Restart Server: Changes to these parameters require a server restart to take effect. Azure Portal will prompt you to save and restart.
Changes to WAL parameters require a server restart.
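After the restart, you can confirm the settings took effect with a quick check:

```sql
SHOW wal_level;              -- expected: logical
SHOW max_wal_senders;
SHOW max_replication_slots;
```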
### Configure Publications
Publications define which tables are replicated. Choose between replicating all tables or specific ones.
```sql
-- Option 1: Create a publication for ALL tables
CREATE PUBLICATION supermetal_publication FOR ALL TABLES;
-- Option 2: For warehouse targets, to sync partitioned tables as root, include publish_via_partition_root flag (https://www.postgresql.org/docs/current/sql-createpublication.html#SQL-CREATEPUBLICATION-PARAMS-WITH-PUBLISH-VIA-PARTITION-ROOT)
CREATE PUBLICATION supermetal_publication FOR ALL TABLES WITH (publish_via_partition_root);
```
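If you prefer to replicate only specific tables rather than all of them, a publication can list them explicitly (the table names below are hypothetical placeholders):

```sql
-- Create a publication for specific tables only
CREATE PUBLICATION supermetal_publication FOR TABLE public.orders, public.customers;

-- Tables added later must be attached explicitly
ALTER PUBLICATION supermetal_publication ADD TABLE public.invoices;
```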
### Configure PostgreSQL Source in Supermetal
You are now ready to configure PostgreSQL as a source within Supermetal. Select **Logical Replication** as the replication type when configuring the source.
## High Availability
Supermetal supports fast failover for clustered PostgreSQL databases (e.g., AWS Aurora, RDS Multi-AZ).
* **Use Cluster Endpoints:** Use the **Cluster Writer Endpoint** (e.g., `my-cluster.cluster-xyz.us-east-1.rds.amazonaws.com`) rather than instance addresses.
* **Automatic Recovery:** During logical replication, Supermetal detects failovers and resumes streaming from the new primary (typically \< 10 seconds).
Transient connection errors (network timeouts, server restarts, failovers) are retried with backoff for up to 5 minutes (configurable via `retry_window_seconds`).
***
## Data Types Mapping
| PostgreSQL Type(s) | Apache Arrow DataType | Notes |
| :----------------- | :-------------------- | :---- |
| `BOOL` | `Boolean` | |
| PostgreSQL Type(s) | Apache Arrow DataType | Notes |
| :------------------------ | :-------------------------------------- | :----------------------------------------------------------------------------------------------- |
| `INT2` | `Int16` | |
| `INT4` | `Int32` | |
| `INT8`, `MONEY` | `Int64` | `MONEY` is mapped as a fixed-point integer. |
| `FLOAT4` | `Float32` | Single precision floating-point. |
| `FLOAT8` | `Float64` | Double precision floating-point. |
| `NUMERIC(p, s)` | `Decimal128(p, s)` / `Decimal256(p, s)` | Uses PostgreSQL precision (`p`) and scale (`s`) directly to ensure exact numeric representation. |
| `NUMERIC` (unconstrained) | `Utf8` | Unconstrained NUMERIC defaults to `Utf8` as precision/scale are undefined. |
| `OID` | `UInt32` | Object Identifier. |
| `PG_LSN` | `UInt64` | PostgreSQL Log Sequence Number. |
| PostgreSQL Type(s) | Apache Arrow DataType | Notes |
| :------------------- | :--------------------- | :--------------------------------------------------------------------------------------- |
| `DATE` | `Date32` | Days since UNIX epoch. Precision preserved at the day level. |
| `TIME` | `Time64(µs)` | Microseconds since midnight. Precision preserved at the microsecond level. |
| `TIMESTAMP` | `Timestamp(µs)` | Microseconds since UNIX epoch, timezone naive. Precision preserved at microsecond level. |
| `TIMESTAMPTZ` | `Timestamp(µs, "UTC")` | Microseconds since UNIX epoch, assumed UTC. Precision preserved at microsecond level. |
| `TIMETZ`, `INTERVAL` | `Utf8` | Temporal types with string representation in Arrow. |
| PostgreSQL Type(s) | Apache Arrow DataType | Notes |
| :----------------------------------------------------- | :-------------------- | :----------------------------------------------------------------------- |
| `CHAR`, `VARCHAR`, `TEXT`, `BPCHAR`, `NAME`, `UNKNOWN` | `Utf8` | General-purpose text types. |
| `citext`, `ltree`, `lquery`, `ltxtquery` | `Utf8` | Special text types (checked by name in code). |
| `BIT`, `VARBIT` | `Utf8` | Bit string types represented as text. |
| `UUID` | `Utf8` | Universally Unique Identifier. |
| `ENUM` | `Utf8` | Enumerated types represented as their text labels. |
| `INET`, `CIDR`, `MACADDR`, `MACADDR8` | `Utf8` | Network address types. |
| `TSVECTOR` | `Utf8` | Text search vector type represented as text. Positions are not preserved. |
| PostgreSQL Type(s) | Apache Arrow DataType | Notes |
| :----------------- | :-------------------- | :-------------------------- |
| `BYTEA` | `Binary` | Variable-length byte array. |
| PostgreSQL Type(s) | Apache Arrow DataType | Notes |
| :----------------- | :-------------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `JSON`, `JSONB` | `Utf8` | Mapped to `Utf8`. Using the [Canonical JSON Extension](https://arrow.apache.org/docs/format/CanonicalExtensions.html#json) (`arrow.json`) is recommended for interoperability, allowing downstream systems to recognize the content as JSON. This metadata signals to downstream supermetal targets (like Snowflake, Clickhouse, etc.) that the string contains JSON, allowing them to map it to their appropriate semi-structured types (e.g., VARIANT, JSON). |
| `XML` | `Utf8` | Mapped to `Utf8`. A canonical extension type for XML might also exist or be defined depending on specific library support, similar to JSON. |
| PostgreSQL Type(s) | Apache Arrow DataType | Notes |
| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :--------------------------- | :---------------------------------------------------------- |
| `BOOL[]` | `List` | Base types map as per their categories. |
| `BYTEA[]` | `List` | |
| `INT2[]` | `List` | |
| `INT4[]` | `List` | |
| `INT8[]`, `MONEY[]` | `List` | |
| `OID[]` | `List` | |
| `PG_LSN[]` | `List` | |
| `FLOAT4[]` | `List` | |
| `FLOAT8[]` | `List` | |
| `NUMERIC(p, s)[]` | `List` | Precision and scale preserved within list elements. |
| `DATE[]` | `List` | |
| `TIME[]` | `List` | |
| `TIMESTAMP[]` | `List` | |
| `TIMESTAMPTZ[]` | `List` | |
| `CHAR[]`, `VARCHAR[]`, `TEXT[]`, etc., `_citext`, `_ltree`, etc., `BIT[]`, `VARBIT[]`, `TIMETZ[]`, `INTERVAL[]`, `XML[]`, `UUID[]`, `INET[]`, `CIDR[]`, `MACADDR8[]`, `MACADDR[]`, `JSON[]`, `JSONB[]`, `NUMERIC[]` (unconstrained) | `List` | Includes arrays of types mapped to `Utf8`. |
| Multi-dimensional Arrays (>1 dim) | *Unsupported* | Multi-dimensional arrays (dimension > 1) are not supported. |
| PostgreSQL Type(s) | Apache Arrow DataType | Notes |
| :------------------------------------------------------------------------ | :-------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `INT4RANGE`, `INT8RANGE`, `NUMRANGE`, `TSRANGE`, `TSTZRANGE`, `DATERANGE` | `Utf8` | Range types are serialized to JSON format using the [Canonical JSON Extension](https://arrow.apache.org/docs/format/CanonicalExtensions.html#json) (`arrow.json`). |
| Multirange types | *Unsupported* | Multirange types are not supported. |
| PostgreSQL Type(s) | Apache Arrow DataType | Notes |
| :-------------------- | :--------------------- | :--------------------------------------------------------------------------------------------------------------------- |
| `DOMAIN(base_type)` | *Mapped as base\_type* | Domains are resolved to their underlying base type. |
| Composite Types (ROW) | `Utf8` | Composite types (structs/rows) are not explicitly mapped and default to a string (e.g., JSON) representation (`Utf8`). |
| PostgreSQL Type(s) | Apache Arrow DataType | Notes |
| :---------------------------------------------------------- | :-------------------- | :------------------------------- |
| `POINT`, `LSEG`, `PATH`, `BOX`, `POLYGON`, `LINE`, `CIRCLE` | *Unsupported* | Geometric types are unsupported. |
When replicating from PostgreSQL to PostgreSQL, Supermetal replicates the source PostgreSQL data types 1:1, preserving the exact type from source to target when possible.
***
# PostgreSQL Setup (/docs/main/sources/pg/setup)
## Prerequisites
Before you begin, ensure you have:
* **Supported PostgreSQL Version:** PostgreSQL version 9.4 or higher.
* **Database Admin Access:** Sufficient privileges to create users, grant permissions, and modify database configuration parameters. This might require `superuser` privileges or specific cloud provider roles (e.g., `rds_superuser` on AWS RDS, `azure_pg_admin` on Azure).
* **Network Connectivity:** Ensure that Supermetal's services can reach your PostgreSQL database. You may need to configure firewall rules, security groups, or VPC peering.
***
## Setup
Supermetal supports two replication types for PostgreSQL sources:
* **Snapshot**: One-time or periodic full table snapshots.
* **Logical Replication**: Real-time change data capture (CDC) with initial snapshot.
Follow these steps to configure PostgreSQL for snapshot-only replication:
### Create Read-Only User
Create a dedicated database user for Supermetal with read-only permissions.
```sql
-- Create user with a secure password
CREATE USER supermetal_user WITH PASSWORD 'strong-password';
```
Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the source in Supermetal.
### Grant Table Access
Grant SELECT permissions on the schemas and tables you want to replicate.
```sql
-- Grant schema access
GRANT USAGE ON SCHEMA public TO supermetal_user;
-- Grant select permission on all current tables in the schema
GRANT SELECT ON ALL TABLES IN SCHEMA public TO supermetal_user;
-- Ensure select permission is granted on any new tables created in the future
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO supermetal_user;
-- (Optional but Recommended) Bypass Row-Level Security (RLS)
-- This ensures Supermetal reads all data, regardless of RLS policies
-- applied for application users. Only grant if necessary and understood.
ALTER USER supermetal_user BYPASSRLS;
```
Repeat for each schema being replicated. Replace `public` with your schema name.
### Configure PostgreSQL Source in Supermetal
You are now ready to configure PostgreSQL as a source within Supermetal. Select **Snapshot** as the replication type when configuring the source.
Follow these steps to configure PostgreSQL for logical replication with CDC:
### Configure User and Permissions
Create a dedicated database user for Supermetal with the necessary permissions for replication and data access.
#### Create Replication User
```sql
-- Create user with minimal required privileges
CREATE USER supermetal_user
WITH PASSWORD 'strong-password'
REPLICATION;
```
```sql
-- Create user and grant RDS replication role
CREATE USER supermetal_user
WITH PASSWORD 'strong-password';
GRANT rds_replication TO supermetal_user;
```
```sql
-- Create replication user
CREATE USER supermetal_user
WITH PASSWORD 'strong-password';
ALTER USER supermetal_user WITH REPLICATION;
```
```sql
-- Create replication user
CREATE USER supermetal_user
WITH PASSWORD 'strong-password';
ALTER ROLE supermetal_user WITH REPLICATION;
```
Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the source in Supermetal.
#### Grant Table Access
```sql
-- Current and future table access
GRANT USAGE ON SCHEMA public TO supermetal_user;
-- Grant select permission on all current tables in the schema
GRANT SELECT ON ALL TABLES IN SCHEMA public TO supermetal_user;
-- Ensure select permission is granted on any new tables created in the future
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO supermetal_user;
-- (Optional but Recommended) Bypass Row-Level Security (RLS)
-- This ensures Supermetal syncs all data, regardless of RLS policies
-- applied for application users. Only grant if necessary and understood.
ALTER USER supermetal_user BYPASSRLS;
```
Repeat for each schema being replicated. Replace `public` with your schema name.
### Configure Replication Settings (WAL)
Logical replication relies on PostgreSQL's Write-Ahead Log (WAL). You need to configure specific parameters to enable it.
#### Understanding WAL Settings
These settings must be configured for logical replication to work:
* `wal_level`: Must be set to `logical`. This adds the necessary information to the WAL for logical decoding.
* `max_wal_senders`: Defines the maximum number of concurrent replication connections the server can support. Supermetal uses these connections for CDC.
* `max_replication_slots`: Defines the maximum number of replication slots the server can manage. Each Supermetal CDC connection requires a replication slot to track its progress and prevent the server from purging WAL files needed for replication. This should generally be equal to or greater than max\_wal\_senders.
#### Platform-Specific Configuration
Apply the settings based on your PostgreSQL environment.
1. Check current settings:
```sql
-- View all replication settings
SELECT name, setting, unit, context
FROM pg_settings
WHERE name IN (
'wal_level', 'max_wal_senders', 'max_replication_slots'
);
```
2. Configure settings:
```sql
ALTER SYSTEM SET wal_level = 'logical';
ALTER SYSTEM SET max_wal_senders = 10; -- Adjust for your tables
ALTER SYSTEM SET max_replication_slots = 10; -- Match max_wal_senders
```
3. Restart PostgreSQL
Modify the DB Parameter Group:
1. Go to your RDS instance configuration.
2. Find the associated DB Parameter Group and modify it (or create a new one and associate it).
3. Set the following parameters:
1. `rds.logical_replication` = 1 (This enables logical replication on RDS)
2. `wal_level` = logical (Often set automatically when rds.logical\_replication is 1)
3. `max_wal_senders` = 10 (Adjust as needed)
4. `max_replication_slots` = 10 (Match or exceed max\_wal\_senders)
4. Apply to RDS instance and reboot: Applying changes to a Parameter Group that includes static parameters like wal\_level requires rebooting the RDS instance.
1. Navigate to your Cloud SQL for PostgreSQL instance in the Google Cloud Console and click Edit.
2. Go to the Flags section.
3. Add or modify the following flags:
   1. `cloudsql.logical_decoding` = on
   2. `wal_sender_timeout` = 0 (Optional but recommended. Setting to 0 disables the timeout, preventing replication issues during long transactions or initial syncs. Default is 60 seconds.)
4. Automatic management: Cloud SQL automatically manages `wal_level`, `max_wal_senders`, and `max_replication_slots` when `cloudsql.logical_decoding` is enabled. You do not need to set these manually.
5. Restart the instance to apply the changes.
1. Navigate to your Azure Database for PostgreSQL instance in the Azure Portal.
2. Go to Server parameters. Set the following parameters:
1. `wal_level` = logical
2. `max_wal_senders` = 10 (Adjust as needed)
3. `max_replication_slots` = 10 (Match or exceed max\_wal\_senders)
4. `wal_sender_timeout` = 0 (Optional but recommended, prevents timeouts. Default might be 60000ms)
3. Save and Restart Server: Changes to these parameters require a server restart to take effect. Azure Portal will prompt you to save and restart.
Changes to WAL parameters require a server restart.
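After the restart, you can confirm the settings took effect with a quick check:

```sql
SHOW wal_level;              -- expected: logical
SHOW max_wal_senders;
SHOW max_replication_slots;
```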
### Configure Publications
Publications define which tables are replicated. Choose between replicating all tables or specific ones.
```sql
-- Option 1: Create a publication for ALL tables
CREATE PUBLICATION supermetal_publication FOR ALL TABLES;
-- Option 2: For warehouse targets, to sync partitioned tables as root, include publish_via_partition_root flag (https://www.postgresql.org/docs/current/sql-createpublication.html#SQL-CREATEPUBLICATION-PARAMS-WITH-PUBLISH-VIA-PARTITION-ROOT)
CREATE PUBLICATION supermetal_publication FOR ALL TABLES WITH (publish_via_partition_root);
```
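If you prefer to replicate only specific tables rather than all of them, a publication can list them explicitly (the table names below are hypothetical placeholders):

```sql
-- Create a publication for specific tables only
CREATE PUBLICATION supermetal_publication FOR TABLE public.orders, public.customers;

-- Tables added later must be attached explicitly
ALTER PUBLICATION supermetal_publication ADD TABLE public.invoices;
```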
### Configure PostgreSQL Source in Supermetal
You are now ready to configure PostgreSQL as a source within Supermetal. Select **Logical Replication** as the replication type when configuring the source.
## High Availability
Supermetal supports fast failover for clustered PostgreSQL databases (e.g., AWS Aurora, RDS Multi-AZ).
* **Use Cluster Endpoints:** Use the **Cluster Writer Endpoint** (e.g., `my-cluster.cluster-xyz.us-east-1.rds.amazonaws.com`) rather than instance addresses.
* **Automatic Recovery:** During logical replication, Supermetal detects failovers and resumes streaming from the new primary (typically \< 10 seconds).
Transient connection errors (network timeouts, server restarts, failovers) are retried with backoff for up to 5 minutes (configurable via `retry_window_seconds`).
# Monitoring (/docs/main/concepts/deployment/monitoring)
Supermetal provides real-time monitoring of its replication processes, offering deep insights into health and performance. All metrics and logs are recorded using the [OpenTelemetry (OTEL)](https://opentelemetry.io/) vendor-neutral standard.
OTEL ensures that you can seamlessly integrate Supermetal's observability data with your preferred monitoring and logging tools, such as [Datadog](https://docs.datadoghq.com/opentelemetry/), [Splunk](https://docs.splunk.com/Observability/gdi/opentelemetry/opentelemetry.html), or any [other OTEL-compatible](https://opentelemetry.io/ecosystem/vendors/) platform. Key observability features include:
* [**Comprehensive Metrics:**](/docs/main/concepts/deployment/monitoring/metrics) Detailed metrics covering data throughput, latency, resource utilization, and operational events.
* **Structured Logging:** Rich, structured logs that provide context for events and errors, facilitating easier debugging and analysis.
## Export
You can configure the agent to send this telemetry data to your preferred OTLP-compatible backend. The method for configuring the OTLP exporter endpoint varies depending on your Supermetal deployment model.
### Local / Self-Hosted
When running the Supermetal agent locally, you can configure the OTLP exporter endpoint and other parameters using standard OpenTelemetry environment variables. Key variables include:
* `OTEL_EXPORTER_OTLP_ENDPOINT`: Specifies the target URL for the OTLP collector (e.g., `http://localhost:4317` for gRPC or `http://localhost:4318/v1/metrics` for HTTP/protobuf metrics).
* `OTEL_EXPORTER_OTLP_HEADERS`: Allows you to specify headers, such as authentication tokens (e.g., `Authorization=Bearer `).
For detailed information on OTLP exporter configuration, refer to the official [OpenTelemetry SDK configuration documentation](https://opentelemetry.io/docs/languages/sdk-configuration/otlp-exporter/).
Set these environment variables in the environment where your Supermetal agent is running.
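For example, a minimal shell setup pointing the agent at a local OTLP collector might look like this (the endpoint URL and bearer token are placeholders — substitute your own):

```shell
# Target a local OTLP collector over gRPC
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"

# Attach an auth header if your backend requires one (placeholder token)
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer <your-token>"
```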
### BYOC / Hybrid
For deployments managed by the Supermetal Control Plane, including Bring Your Own Cloud (BYOC), the configuration of OTLP exporter endpoints for your agents is managed centrally through the Supermetal Console.
Navigate to settings within the Supermetal Console to specify the OTLP endpoint URL and any necessary authentication headers for your chosen monitoring backend. The Control Plane then propagates this configuration to the agents running in your environment.
# Metrics (/docs/main/concepts/deployment/monitoring/metrics)
## Connector Metrics
These metrics track data flow and performance through the ETL pipeline from source to target systems.
| Category | Metric Name | Description | Unit | Type |
| ----------------- | --------------------------------------- | ---------------------------------------------------------------------------------------------------- | ---- | --------------- |
| Data | `rows.in` | Number of rows read | 1 | Counter (u64) |
| | `rows.out` | Number of rows written | 1 | Counter (u64) |
| | `bytes.in` | Number of bytes read | By | Counter (u64) |
| | `bytes.out` | Number of bytes written | By | Counter (u64) |
| Latency | `latency.total` | Latency from source event timestamp to event being processed & written to target. End-to-end latency | ns | Histogram (u64) |
| | `latency.in` | Latency from source event timestamp to when the event was received | ns | Histogram (u64) |
| | `latency.out` | Latency from when the event was received to when event written to target | ns | Histogram (u64) |
| | `latency.in.process` | Latency to process source event | ns | Histogram (u64) |
| | `latency.out.process` | Latency to process event before writing to target | ns | Histogram (u64) |
| Buffer | | | | |
| ↳ Data | `buffer.rows.in` | Number of rows read from the buffer | 1 | Counter (u64) |
| | `buffer.rows.out` | Number of rows written to the buffer | 1 | Counter (u64) |
| | `buffer.bytes.in` | Number of bytes read from the buffer | By | Counter (u64) |
| | `buffer.bytes.out` | Number of bytes written to the buffer | By | Counter (u64) |
| ↳ Latency | `buffer.latency.total` | Latency from source event timestamp to event being processed & written to buffer | ns | Histogram (u64) |
| | `buffer.latency.in` | Latency from source event timestamp to when the event was received by buffer | ns | Histogram (u64) |
| | `buffer.latency.out` | Latency from when the event was received to when event written to buffer | ns | Histogram (u64) |
| | `buffer.latency.in.process` | Latency to process event after it was received by buffer | ns | Histogram (u64) |
| | `buffer.latency.out.process` | Latency to process event before writing to buffer | ns | Histogram (u64) |
| ↳ Object Store | `buffer.object_store.files.force-flush` | Number of times buffer forced a flush (on no writes) | 1 | Counter (u64) |
| | `buffer.object_store.files.abort` | Failed file operations due to errors | 1 | Counter (u64) |
| | `buffer.object_store.files.create` | Successfully created new file in object storage | 1 | Counter (u64) |
| | `buffer.object_store.files.delete` | Files removed from object storage | 1 | Counter (u64) |
| | `buffer.object_store.files.get` | Files retrieved from object storage | 1 | Counter (u64) |
| Snapshot Progress | `progress.snapshot` | Current progress of initial snapshot load (0.0 to 100.0) | % | Gauge (f64) |
| | `buffer.progress.snapshot` | Current progress of buffer snapshot operations (0.0 to 100.0) | % | Gauge (f64) |
## Process Metrics
System-level metrics for the Supermetal process following [OpenTelemetry semantic conventions](https://opentelemetry.io/docs/specs/semconv/system/process-metrics/). These help monitor resource usage and system health.
| Category | Metric Name | Description | Unit | Type |
| ----------------- | ------------------------------------ | --------------------------------------------------------- | ---- | ------------------- |
| Process CPU | `process.cpu.time` | Total CPU time consumed by the process since startup | s | Counter (u64) |
| | `process.cpu.utilization` | Current CPU usage as ratio (0.0=idle, 1.0=fully utilized) | 1 | Gauge (f64) |
| Process Memory | `process.memory.usage` | Physical memory (RSS) currently used by the process | By | UpDownCounter (i64) |
| | `process.memory.virtual` | Virtual memory currently allocated by the process | By | UpDownCounter (i64) |
| Process Resources | `process.open_file_descriptor_count` | Currently open file descriptors (monitor for fd leaks) | 1 | UpDownCounter (i64) |
| | `process.thread_count` | Currently active threads (monitor for thread leaks) | 1 | UpDownCounter (i64) |
| | `process.disk.io` | Total disk I/O bytes with `direction` label (read/write) | By | Counter (u64) |
## Runtime Metrics
Tokio async runtime metrics providing insights into task scheduling, worker thread utilization, and async I/O performance.
| Category | Metric Name | Description | Unit | Type |
| ----------------------- | ------------------------------------- | ------------------------------------------------------------------------------ | ---- | ------------- |
| Runtime Tasks | `runtime.global_queue_depth` | Tasks waiting in global queue (high values indicate backpressure) | 1 | Gauge (u64) |
| | `runtime.alive_tasks` | Currently executing async tasks | 1 | Gauge (u64) |
| | `runtime.spawned_tasks` | Total async tasks created since startup | 1 | Counter (u64) |
| | `runtime.budget_forced_yield_count` | Tasks forced to yield due to execution time limits (indicates CPU-bound tasks) | 1 | Counter (u64) |
| Runtime Workers | `runtime.num_workers` | Total worker threads in the runtime thread pool | 1 | Gauge (u64) |
| | `runtime.blocking_queue_depth` | Tasks waiting for blocking thread pool | 1 | Gauge (u64) |
| | `runtime.num_blocking_threads` | Currently active blocking threads | 1 | Gauge (u64) |
| | `runtime.num_idle_blocking_threads` | Idle blocking threads available for work | 1 | Gauge (u64) |
| | `runtime.io_driver_ready_count` | I/O events processed by the runtime (network/file operations) | 1 | Counter (u64) |
| Worker Metrics (opt-in) | `runtime.worker_local_queue_depth` | Tasks in each worker's local queue (set `ENABLE_WORKER_METRICS=1`) | 1 | Gauge (u64) |
| | `runtime.worker_local_schedule_count` | Tasks scheduled locally per worker (indicates work distribution) | 1 | Counter (u64) |
| | `runtime.worker_mean_poll_time` | Average task execution time per worker | ns | Gauge (u64) |
| | `runtime.worker_noop_count` | Idle polls per worker (high values indicate low workload) | 1 | Counter (u64) |
| | `runtime.worker_overflow_count` | Local queue overflows per worker (tasks moved to global queue) | 1 | Counter (u64) |
| | `runtime.worker_park_count` | Times worker thread went idle waiting for tasks | 1 | Counter (u64) |
| | `runtime.worker_poll_count` | Total task polls per worker | 1 | Counter (u64) |
| | `runtime.worker_steal_count` | Tasks stolen from other workers (indicates load balancing) | 1 | Counter (u64) |
| | `runtime.worker_steal_operations` | Steal attempts per worker (successful + failed) | 1 | Counter (u64) |
| | `runtime.worker_total_busy_duration` | Cumulative execution time per worker | ns | Counter (u64) |
# MySQL (/docs/main/sources/mysql)
import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Callout } from 'fumadocs-ui/components/callout';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
MySQL's binary log (binlog) provides a reliable foundation for Change Data Capture and data replication. Supermetal builds on this capability to deliver efficient data extraction with minimal database overhead, leveraging Global Transaction Identifiers (GTIDs) for guaranteed consistency.
This guide covers the features, prerequisites, and configuration steps required to connect MySQL with Supermetal, focusing mainly on binlog setup and permissions.
***
## Features
| Feature | Notes |
| ------- | ----- |
| Initial Data Sync | Performs fast initial data loads (snapshots) and historical backfills by chunking data within each table and reading each chunk concurrently using multiple connections.<br />**Physical Partitioning**: Automatically detects and utilizes existing MySQL table partitions, extracting data from each partition in parallel.<br />**Dynamic Range Partitioning**: For non-partitioned tables with primary keys, creates optimized query boundaries based on data distribution for parallel extraction. |
| Change Data Capture | Captures data changes in real time using MySQL's native [binary log](https://dev.mysql.com/doc/refman/8.0/en/binary-log.html) with minimal database impact.<br />**GTID-based Consistency**: Uses [Global Transaction Identifiers](https://dev.mysql.com/doc/refman/8.0/en/replication-gtids.html) to maintain a consistent replication position and preserve transaction boundaries, ensuring reliable recovery and preventing data loss.<br />Supermetal validates the GTID position against the server's purged GTIDs to ensure continuity, preventing replication from a position that is no longer available in the binary log. |
| Schema Evolution | Precise mapping of MySQL types to Apache Arrow with specialized handling for complex types, including DECIMAL (up to 76 digits of precision), SET types (as arrays), and microsecond-precision temporal types. Schema evolution is tracked to ensure consistent downstream target compatibility. |
| Transactional Consistency | Guarantees that the data synced reflects a consistent point in the upstream database's timeline, respecting transaction boundaries through GTID-based replication. |
| Catalog Support | Ability to configure which schemas, tables, and columns to sync. |
***
## Prerequisites
Before you begin, ensure you have:
* **MySQL Requirements:**
* **Version:** MySQL 8.0 or later with GTID mode enabled
* **Supported Deployments:**
* `Self-Hosted MySQL`
* `Amazon RDS for MySQL` (Coming Soon)
* `Azure Database for MySQL` (Coming Soon)
GTID mode must be enabled on your MySQL server for Change Data Capture functionality. See [MySQL GTID Documentation](https://dev.mysql.com/doc/refman/8.0/en/replication-gtids.html) for details.
* **Binary Log Requirements:**
* **Binary Log Format**: Must be set to `ROW` format
* **Binary Log Retention**: Configure appropriate retention period (recommended: at least 3 days)
* **Deployment Requirements:**
* **Database Permissions**: User with `SELECT`, `REPLICATION SLAVE`, and `REPLICATION CLIENT` privileges
* **Network Connectivity**: Ensure Supermetal can reach your MySQL server (default port: 3306)
***
## Setup
### Permissions Overview
Supermetal requires a dedicated MySQL user with appropriate permissions to read data and access the binary log for CDC.
| Required Permission | Purpose |
| :------------------- | :------------------------------------------------------ |
| `SELECT` | Read data from tables during snapshot and CDC |
| `REPLICATION SLAVE` | Access the binary log for CDC |
| `REPLICATION CLIENT` | Query server status information including GTID position |
### Setup Instructions
### Self-Hosted MySQL Setup \[!toc]
#### Enable Binary Logging and GTID Mode \[!toc]
Edit your MySQL configuration file (`my.cnf` or `my.ini`):
```ini
[mysqld]
# Server identification (must be unique in your infrastructure)
server-id = 223344
# Enable binary logging
log_bin = mysql-bin
binlog_format = ROW
binlog_row_image = FULL
binlog_row_metadata = FULL
# Enable GTID mode
gtid_mode = ON
enforce_gtid_consistency = ON
# Set binary log retention (3 days)
binlog_expire_logs_seconds = 259200
```
* `server-id`: Must be unique for each server in your replication topology. Can be any non-zero value between 1 and 4294967295.
* `log_bin`: Base name for binary log files.
* `binlog_format`: Must be set to ROW for CDC to work properly.
* `binlog_row_image`: Must be set to FULL to capture complete before/after row images, which is required for CDC.
* `binlog_row_metadata`: Must be set to FULL to include complete metadata in binary log events.
* `gtid_mode` and `enforce_gtid_consistency`: Required for reliable replication position tracking.
* `binlog_expire_logs_seconds`: Controls how long binary logs are retained before automatic deletion.
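The retention value above is simple arithmetic; as a sanity check, the seconds for a given retention window in days can be computed as:

```shell
# Compute binlog_expire_logs_seconds for a retention window in days.
# 3 days (the recommended minimum above) yields 259200 seconds.
DAYS=3
echo $((DAYS * 24 * 60 * 60))  # prints 259200
```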
#### Create a Dedicated User for Replication \[!toc]
Connect to MySQL as a privileged user and run:
```sql
CREATE USER 'supermetal_user'@'%' IDENTIFIED BY 'strong-password';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'supermetal_user'@'%';
FLUSH PRIVILEGES;
```
Replace the following variables with your own values:
* `supermetal_user`: Username for Supermetal access
* `strong-password`: Secure password for the Supermetal user
#### Verify Configuration \[!toc]
Verify binary logging and GTID configuration:
```sql
SHOW VARIABLES LIKE 'gtid_mode';
SHOW VARIABLES LIKE 'enforce_gtid_consistency';
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';
SHOW VARIABLES LIKE 'binlog_row_image';
SHOW VARIABLES LIKE 'binlog_row_metadata';
SHOW VARIABLES LIKE 'server_id';
```
The output should show:
* `gtid_mode`: ON
* `enforce_gtid_consistency`: ON
* `log_bin`: ON
* `binlog_format`: ROW
* `binlog_row_image`: FULL
* `binlog_row_metadata`: FULL
* `server_id`: A non-zero value
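The individual `SHOW VARIABLES` checks above can also be collapsed into a single query via the `mysql` CLI; the host and user below are placeholders for your own connection details:

```shell
# One-shot check of all CDC-related settings (replace host and user).
# Note: @@log_bin reports 1 (rather than ON) when binary logging is enabled.
mysql -h your-mysql-host -u supermetal_user -p -e \
  "SELECT @@gtid_mode, @@enforce_gtid_consistency, @@log_bin,
          @@binlog_format, @@binlog_row_image, @@binlog_row_metadata,
          @@server_id\G"
```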
### Amazon RDS MySQL Setup \[!toc]
#### Create a Parameter Group for GTID Mode \[!toc]
1. In the AWS RDS console, navigate to "Parameter groups"
2. Click "Create parameter group"
3. Select the MySQL engine and version that matches your RDS instance
4. Enter a name and description (e.g., "supermetal-cdc-params")
5. Click "Create"
6. Select the newly created parameter group and modify the following parameters:
* `gtid-mode`: ON
* `enforce_gtid_consistency`: ON
* `binlog_format`: ROW
* `binlog_row_image`: FULL
* `binlog_row_metadata`: FULL
In RDS, some parameters cannot be modified, including `server-id`. RDS automatically assigns a unique server-id to each instance, so you don't need to configure this manually.
#### Apply the Parameter Group to Your RDS Instance \[!toc]
1. Navigate to your RDS instances and select your MySQL instance
2. Click "Modify"
3. Under "Additional configuration", select the parameter group you created
4. Choose "Apply immediately" or schedule for your next maintenance window
5. Click "Continue" and confirm the changes
#### Set Binary Log Retention Period \[!toc]
Connect to your RDS MySQL instance and run:
```sql
CALL mysql.rds_set_configuration('binlog retention hours', 72);
```
Amazon RDS uses `binlog retention hours` instead of the standard `binlog_expire_logs_seconds` parameter. The default value is 0 (immediate removal), so this step is essential.
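To confirm the retention setting took effect, RDS provides a companion procedure, `mysql.rds_show_configuration`; the endpoint and user below are placeholders for your own instance details:

```shell
# Shows current RDS-level settings, including 'binlog retention hours'.
mysql -h your-instance.abc123.us-east-1.rds.amazonaws.com -u admin -p -e \
  "CALL mysql.rds_show_configuration;"
```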
#### Create a User for Replication \[!toc]
```sql
CREATE USER 'supermetal_user'@'%' IDENTIFIED BY 'strong-password';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'supermetal_user'@'%';
FLUSH PRIVILEGES;
```
Replace the following variables with your own values:
* `supermetal_user`: Username for Supermetal access
* `strong-password`: Secure password for the Supermetal user
### Azure Database for MySQL Setup \[!toc]
#### Configure Server Parameters \[!toc]
1. In the Azure portal, navigate to your MySQL server resource
2. Select "Server parameters" under the Settings section
3. Find and set the following parameters:
* `gtid_mode`: ON
* `enforce_gtid_consistency`: ON
* `binlog_format`: ROW
* `binlog_row_image`: FULL
* `binlog_row_metadata`: FULL
4. Click "Save" to apply the changes
5. Restart your Azure Database for MySQL server if prompted
Some parameters might be read-only in Azure Database for MySQL, particularly in Flexible Server configurations. The `server-id` is automatically assigned by Azure and cannot be modified.
#### Create a User for Replication \[!toc]
Connect to your Azure MySQL instance and run:
```sql
CREATE USER 'supermetal_user'@'%' IDENTIFIED BY 'strong-password';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'supermetal_user'@'%';
FLUSH PRIVILEGES;
```
Replace the following variables with your own values:
* `supermetal_user`: Username for Supermetal access
* `strong-password`: Secure password for the Supermetal user
Ensure your Azure MySQL server's firewall rules allow connections from the Supermetal service IP addresses.
***
## Data Types Mapping
Supermetal automatically maps MySQL data types to Apache Arrow data types according to the following mapping:
| MySQL Type | Arrow Type | Notes |
| ----------- | ------------------------------------ | ------------------------------------------------------------------------------------------- |
| `TINYINT` | `Int8` / `UInt8` | Signed/unsigned determined by column definition |
| `SMALLINT` | `Int16` / `UInt16` | Signed/unsigned determined by column definition |
| `MEDIUMINT` | `Int32` / `UInt32` | Signed/unsigned determined by column definition |
| `INT` | `Int32` / `UInt32` | Signed/unsigned determined by column definition |
| `BIGINT` | `Int64` / `UInt64` | Signed/unsigned determined by column definition |
| `FLOAT` | `Float32` | |
| `DOUBLE` | `Float64` | |
| `DECIMAL` | `Decimal128` / `Decimal256` / `Utf8` | `Decimal128` for precision ≤ 38, `Decimal256` for precision ≤ 76, `Utf8` for precision > 76 |
| MySQL Type | Arrow Type | Notes |
| ---------- | ------------ | ------------------------------------------------------------------------------ |
| `CHAR` | `Utf8` | |
| `VARCHAR` | `Utf8` | |
| `TEXT` | `Utf8` | Includes all TEXT variants (TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT) |
| `ENUM` | `Utf8` | Original enum values preserved as strings |
| `SET` | `List` | Converted to arrays of strings (e.g., 'value1,value2' → \["value1", "value2"]) |
| `JSON` | `Utf8` | Preserved as JSON string |
| MySQL Type | Arrow Type | Notes |
| ----------- | ---------- | ------------------------------------------------------------------------------- |
| `BINARY` | `Binary` | |
| `VARBINARY` | `Binary` | |
| `BLOB` | `Binary` | Includes all BLOB variants (TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB) |
| `BIT` | `Utf8` | Converted to binary string representation (e.g., BIT(4) value 5 becomes "0101") |
| MySQL Type | Arrow Type | Notes |
| ----------- | -------------------------- | ------------------------------------------------------------------------------------------- |
| `DATE` | `Date32` | |
| `TIME` | `IntervalDayTime` / `Utf8` | `IntervalDayTime` for precision ≤ 3 (milliseconds), `Utf8` for precision > 3 (microseconds) |
| `DATETIME` | `Timestamp(Microsecond)` | Full microsecond precision without timezone |
| `TIMESTAMP` | `Timestamp(Microsecond)` | Full microsecond precision, stored in UTC internally by MySQL |
| `YEAR` | `UInt16` | |
| MySQL Type | Arrow Type |
| ----------------- | ------------- |
| `GEOMETRY` | *Unsupported* |
| `POINT` | *Unsupported* |
| `LINESTRING` | *Unsupported* |
| `POLYGON` | *Unsupported* |
| `MULTIPOINT` | *Unsupported* |
| `MULTILINESTRING` | *Unsupported* |
| `MULTIPOLYGON` | *Unsupported* |
| `GEOMCOLLECTION` | *Unsupported* |
| `VECTOR` | *Unsupported* |
Columns with unsupported data types are automatically excluded from replication. Tables containing only unsupported column types cannot be replicated.
# MySQL Setup (/docs/main/sources/mysql/setup)
## Prerequisites
Before you begin, ensure you have:
* **MySQL Requirements:**
* **Version:** MySQL 8.0 or later with GTID mode enabled
* **Supported Deployments:**
* `Self-Hosted MySQL`
* `Amazon RDS for MySQL` (Coming Soon)
* `Azure Database for MySQL` (Coming Soon)
GTID mode must be enabled on your MySQL server for Change Data Capture functionality. See [MySQL GTID Documentation](https://dev.mysql.com/doc/refman/8.0/en/replication-gtids.html) for details.
* **Binary Log Requirements:**
* **Binary Log Format**: Must be set to `ROW` format
* **Binary Log Retention**: Configure appropriate retention period (recommended: at least 3 days)
* **Deployment Requirements:**
* **Database Permissions**: User with `SELECT`, `REPLICATION SLAVE`, and `REPLICATION CLIENT` privileges
* **Network Connectivity**: Ensure Supermetal can reach your MySQL server (default port: 3306)
***
## Setup
### Permissions Overview
Supermetal requires a dedicated MySQL user with appropriate permissions to read data and access the binary log for CDC.
| Required Permission | Purpose |
| :------------------- | :------------------------------------------------------ |
| `SELECT` | Read data from tables during snapshot and CDC |
| `REPLICATION SLAVE` | Access the binary log for CDC |
| `REPLICATION CLIENT` | Query server status information including GTID position |
### Setup Instructions
### Self-Hosted MySQL Setup \[!toc]
#### Enable Binary Logging and GTID Mode \[!toc]
Edit your MySQL configuration file (`my.cnf` or `my.ini`):
```ini
[mysqld]
# Server identification (must be unique in your infrastructure)
server-id = 223344
# Enable binary logging
log_bin = mysql-bin
binlog_format = ROW
binlog_row_image = FULL
binlog_row_metadata = FULL
# Enable GTID mode
gtid_mode = ON
enforce_gtid_consistency = ON
# Set binary log retention (3 days)
binlog_expire_logs_seconds = 259200
```
* `server-id`: Must be unique for each server in your replication topology. Can be any non-zero value between 1 and 4294967295.
* `log_bin`: Base name for binary log files.
* `binlog_format`: Must be set to ROW for CDC to work properly.
* `binlog_row_image`: Must be set to FULL to capture complete before/after row images, which is required for CDC.
* `binlog_row_metadata`: Must be set to FULL to include complete metadata in binary log events.
* `gtid_mode` and `enforce_gtid_consistency`: Required for reliable replication position tracking.
* `binlog_expire_logs_seconds`: Controls how long binary logs are retained before automatic deletion.
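The retention value above is simple arithmetic; as a sanity check, the seconds for a given retention window in days can be computed as:

```shell
# Compute binlog_expire_logs_seconds for a retention window in days.
# 3 days (the recommended minimum above) yields 259200 seconds.
DAYS=3
echo $((DAYS * 24 * 60 * 60))  # prints 259200
```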
#### Create a Dedicated User for Replication \[!toc]
Connect to MySQL as a privileged user and run:
```sql
CREATE USER 'supermetal_user'@'%' IDENTIFIED BY 'strong-password';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'supermetal_user'@'%';
FLUSH PRIVILEGES;
```
Replace the following variables with your own values:
* `supermetal_user`: Username for Supermetal access
* `strong-password`: Secure password for the Supermetal user
#### Verify Configuration \[!toc]
Verify binary logging and GTID configuration:
```sql
SHOW VARIABLES LIKE 'gtid_mode';
SHOW VARIABLES LIKE 'enforce_gtid_consistency';
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';
SHOW VARIABLES LIKE 'binlog_row_image';
SHOW VARIABLES LIKE 'binlog_row_metadata';
SHOW VARIABLES LIKE 'server_id';
```
The output should show:
* `gtid_mode`: ON
* `enforce_gtid_consistency`: ON
* `log_bin`: ON
* `binlog_format`: ROW
* `binlog_row_image`: FULL
* `binlog_row_metadata`: FULL
* `server_id`: A non-zero value
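The individual `SHOW VARIABLES` checks above can also be collapsed into a single query via the `mysql` CLI; the host and user below are placeholders for your own connection details:

```shell
# One-shot check of all CDC-related settings (replace host and user).
# Note: @@log_bin reports 1 (rather than ON) when binary logging is enabled.
mysql -h your-mysql-host -u supermetal_user -p -e \
  "SELECT @@gtid_mode, @@enforce_gtid_consistency, @@log_bin,
          @@binlog_format, @@binlog_row_image, @@binlog_row_metadata,
          @@server_id\G"
```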
### Amazon RDS MySQL Setup \[!toc]
#### Create a Parameter Group for GTID Mode \[!toc]
1. In the AWS RDS console, navigate to "Parameter groups"
2. Click "Create parameter group"
3. Select the MySQL engine and version that matches your RDS instance
4. Enter a name and description (e.g., "supermetal-cdc-params")
5. Click "Create"
6. Select the newly created parameter group and modify the following parameters:
* `gtid-mode`: ON
* `enforce_gtid_consistency`: ON
* `binlog_format`: ROW
* `binlog_row_image`: FULL
* `binlog_row_metadata`: FULL
In RDS, some parameters cannot be modified, including `server-id`. RDS automatically assigns a unique server-id to each instance, so you don't need to configure this manually.
#### Apply the Parameter Group to Your RDS Instance \[!toc]
1. Navigate to your RDS instances and select your MySQL instance
2. Click "Modify"
3. Under "Additional configuration", select the parameter group you created
4. Choose "Apply immediately" or schedule for your next maintenance window
5. Click "Continue" and confirm the changes
#### Set Binary Log Retention Period \[!toc]
Connect to your RDS MySQL instance and run:
```sql
CALL mysql.rds_set_configuration('binlog retention hours', 72);
```
Amazon RDS uses `binlog retention hours` instead of the standard `binlog_expire_logs_seconds` parameter. The default value is 0 (immediate removal), so this step is essential.
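To confirm the retention setting took effect, RDS provides a companion procedure, `mysql.rds_show_configuration`; the endpoint and user below are placeholders for your own instance details:

```shell
# Shows current RDS-level settings, including 'binlog retention hours'.
mysql -h your-instance.abc123.us-east-1.rds.amazonaws.com -u admin -p -e \
  "CALL mysql.rds_show_configuration;"
```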
#### Create a User for Replication \[!toc]
```sql
CREATE USER 'supermetal_user'@'%' IDENTIFIED BY 'strong-password';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'supermetal_user'@'%';
FLUSH PRIVILEGES;
```
Replace the following variables with your own values:
* `supermetal_user`: Username for Supermetal access
* `strong-password`: Secure password for the Supermetal user
### Azure Database for MySQL Setup \[!toc]
#### Configure Server Parameters \[!toc]
1. In the Azure portal, navigate to your MySQL server resource
2. Select "Server parameters" under the Settings section
3. Find and set the following parameters:
* `gtid_mode`: ON
* `enforce_gtid_consistency`: ON
* `binlog_format`: ROW
* `binlog_row_image`: FULL
* `binlog_row_metadata`: FULL
4. Click "Save" to apply the changes
5. Restart your Azure Database for MySQL server if prompted
Some parameters might be read-only in Azure Database for MySQL, particularly in Flexible Server configurations. The `server-id` is automatically assigned by Azure and cannot be modified.
#### Create a User for Replication \[!toc]
Connect to your Azure MySQL instance and run:
```sql
CREATE USER 'supermetal_user'@'%' IDENTIFIED BY 'strong-password';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'supermetal_user'@'%';
FLUSH PRIVILEGES;
```
Replace the following variables with your own values:
* `supermetal_user`: Username for Supermetal access
* `strong-password`: Secure password for the Supermetal user
Ensure your Azure MySQL server's firewall rules allow connections from the Supermetal service IP addresses.
# AWS (/docs/main/concepts/deployment/byoc/aws)
import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Step, Steps } from 'fumadocs-ui/components/steps';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
import ByocAwsDiagram from '../../../../../../app/diagrams/byoc_aws';
Follow this guide to provision Supermetal BYOC (Bring Your Own Cloud), which deploys Supermetal's data replication agents in your own virtual cloud environment, maintaining full control and ownership over your data.
This provides an additional layer of security and isolation. Supermetal handles provisioning, operations, and maintenance.
***
## Overview
| Phase | Resource | Description |
| ---------------------- | -------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Bootstrap** | IAM Role | `SupermetalControlPlaneRole`: Assumed by Supermetal's control plane for orchestration of resources in your AWS account. |
| | IAM Role | `SupermetalAgentTaskRole`: IAM Role for Supermetal Agent ECS tasks, granting necessary permissions for agent operations (e.g., S3, Secrets Manager, KMS). |
| | IAM Role | `SupermetalTaskExecutionRole`: IAM Role for ECS task execution, allowing ECS to pull ECR images for agents and send logs to CloudWatch. |
| | IAM Role | `SupermetalAgentEC2InstanceRole`: IAM Role for EC2 instances that host Supermetal Agents (used with EC2-backed ECS capacity providers). |
| | IAM Instance Profile | `SupermetalAgentEC2InstanceProfile`: Instance profile associated with `SupermetalAgentEC2InstanceRole`, attached to EC2 instances hosting agents. |
| | IAM Role | `SupermetalValidationLambdaRole`: IAM Role for Supermetal Lambda functions used for validation tasks (e.g., connector credential validation, network reachability). |
| | Interface VPC Endpoint | `MetadataVPCEndpoint`: An AWS PrivateLink Interface VPC Endpoint for private communication between Supermetal agents and the Supermetal control plane. |
| | Security Group | `MetadataVPCEndpointSecurityGroup`: Associated with the `MetadataVPCEndpoint`, controls inbound traffic to the endpoint. |
| | Security Group | `SupermetalAgentMetadataSecurityGroup`: Allows Supermetal agents (running in your VPC) to connect to the `MetadataVPCEndpoint` on HTTPS port 443. |
| | (Optional) KMS Key & Alias | `SupermetalKMSKey` (and `SupermetalKMSKeyAlias` like `alias/supermetal-key-${VpcId}`): AWS KMS Customer Managed Key (CMK) for encrypting sensitive connector credentials. A new key is created if an existing `CredentialsKeyArn` is not provided. |
| **Registration** | Secret | An AWS Secrets Manager secret is created to store credentials required by the Supermetal control plane or agents, such as for ECR access or metadata service authentication. |
| **Connector Creation** | ECS Cluster | An Amazon ECS cluster (e.g., `supermetal-cluster-${VpcId}`) is created to host the Supermetal agent services. |
| | ECS Service | An Amazon ECS service is deployed within the cluster to run and manage Supermetal agent tasks. |
| | ECS Capacity Provider | An ECS Capacity Provider, typically backed by an EC2 Auto Scaling Group (ASG), is set up to manage the compute resources for the agents. |
* **Data Sovereignty:** Your primary data (being replicated) is processed entirely within your AWS VPC and is never sent to the Supermetal Control Plane.
* **Secure Communication:** All metadata communication between Supermetal agents in your VPC and the Supermetal Control Plane occurs securely over AWS PrivateLink via an Interface VPC Endpoint. This traffic is encrypted in transit using **TLS 1.2+**.
* **Credential Protection:** Sensitive connector credentials provided to Supermetal are encrypted using **your** AWS KMS Customer Managed Key (`CredentialsKeyArn`) before being stored by the Supermetal Control Plane. Only the Supermetal Agents decrypt these credentials at runtime within your environment. Supermetal Control Plane cannot decrypt these credentials.
* **IAM Security:** The cross-account role (`SupermetalControlPlaneRole`) assumed by Supermetal utilizes an `ExternalId` to protect against the "confused deputy" problem. All IAM roles are designed with the principle of least privilege to perform only their required functions.
***
## Prerequisites
To successfully deploy Supermetal BYOC in your AWS environment, please ensure you have the following:
#### Supermetal Account Identifier
Supermetal account identifier (obtainable from the Supermetal Console under Settings).
* `SUPERMETAL_ACCOUNT_ID`
#### AWS Account & Permissions
* IAM User or Role with sufficient permissions (e.g., `AWSAdministratorAccess`) to bootstrap the Supermetal BYOC environment using either [CloudFormation or Terraform](#deployment_method).
#### Deployment Parameters
* **AWS Account ID:** The AWS Account ID of the target deployment account (e.g., `123456789012`).
* **AWS Region(s):** The AWS Region(s) of the target deployment account (e.g., `us-east-1`, `eu-west-2`).
* **VPC ID (`VpcId`):** The ID of an existing VPC within your chosen AWS Region where the bootstrap infrastructure and Supermetal agents will be deployed to on connector creation.
* **Subnet IDs (`SubnetIds`):** A list of (private) subnet IDs within your specified `VpcId`. Supermetal agents will be deployed across these subnets for high availability. Ensure these subnets belong to the same availability zones as your source (and target) databases.
* **(Optional) KMS Key for Credentials (`CredentialsKeyArn`):** Decide if you'll use an existing AWS KMS Customer Managed Key (CMK) for encrypting sensitive connector credentials. If you have an existing CMK, have its ARN ready. If left blank, the bootstrap process will create a new KMS key specifically for Supermetal.
***
## Setup
Setup is a two-phase process: first bootstrap the environment, then create a connector to deploy the data plane / agents in your VPC.
#### Environment Bootstrap
The bootstrap process uses an AWS CloudFormation template or a Terraform script provided by Supermetal to provision the necessary resources, including [IAM roles](#iam-roles) and a private endpoint.
These IAM roles are used by Supermetal to automatically register your cloud environment with the Supermetal control plane over the private endpoint, and to automatically deploy Supermetal data plane / agents in your VPC.
The initial environment bootstrap process does not provision any compute resources or the data plane / agents. Once the environment bootstrap is complete, you can create connectors to deploy Supermetal data plane / agents on-demand in your VPC.
You can use the AWS Management Console or the AWS CLI to deploy the bootstrap CloudFormation template.
#### Obtain CloudFormation Template \[!toc]
* Obtain the latest CloudFormation template file from the Supermetal Console, the [template from appendix](#cloudformation-template) is for reference.
#### Deploy CloudFormation Template \[!toc]
#### Deploy with AWS Management Console \[!toc]
#### Prepare and Initiate Stack Creation \[!toc]
* Navigate to the **AWS CloudFormation console** in your AWS account.
* Ensure you are in the correct **AWS Region** for your deployment.
* Click **"Create stack"** and select **"With new resources (standard)"**.
* Under "Prerequisite - Prepare template", choose **"Template is ready"**.
* Under "Specify template", select **"Upload a template file"**.
* Click **"Choose file"** and upload the CloudFormation template file you obtained.
* Click **"Next"**.
#### Specify Stack Details \[!toc]
* On the "Specify stack details" page, enter a **"Stack name"** (e.g., `supermetal-bootstrap`).
* In the **"Parameters"** section, provide values for each parameter.
* Click **"Next"**.
Refer to the [Deployment Parameters](#deployment-parameters) section above for detailed explanations of each.
* `SupermetalAccountId`
* `VpcId`
* `SubnetIds`
* `CredentialsKeyArn`
#### Review and Create \[!toc]
* On the "Review" page, carefully **review all your settings** and parameter values.
* Scroll to the bottom and find the **"Capabilities"** section.
* **Important:** You **must** check the box that says: "I acknowledge that AWS CloudFormation might create IAM resources with custom names." (The exact wording might vary slightly).
* Click **"Create stack"** (or "Submit" depending on the AWS console version).
#### Deploy with AWS CLI \[!toc]
* AWS variables necessary to authenticate. Use either:
* `AWS_PROFILE`
* `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
* Verify access by running `aws sts get-caller-identity --region <your-region>`. See the [AWS CLI reference](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html).
#### Deploy Stack \[!toc]
Use the `aws cloudformation deploy` command to create the stack.
```bash
aws cloudformation deploy \
--template-file /path/to/your/supermetal-bootstrap-cf-template.yaml \
--stack-name supermetal-bootstrap \
--capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \
--parameter-overrides \
SupermetalAccountId="$SUPERMETAL_ACCOUNT_ID" \
VpcId="$VPC_ID" \
SubnetIds="$SUBNET_IDS" \
CredentialsKeyArn="$CREDENTIALS_KEY_ARN" # Optional: Provide a Customer Managed Key (CMK) ARN for encrypting connector credentials
```
Refer to the [Deployment Parameters](#deployment-parameters) section above for detailed explanations of each.
* `SUPERMETAL_ACCOUNT_ID`
* `VPC_ID`
* `SUBNET_IDS`
* `CREDENTIALS_KEY_ARN`
#### Verification \[!toc]
* Monitor the CloudFormation stack creation process in the AWS CloudFormation console. It may take several minutes to complete.
* Once the CloudFormation stack status shows `CREATE_COMPLETE`, Supermetal's Control Plane automatically registers the new environment.
* Monitor the Environments page in your Supermetal Console. The environment should show up with a "Ready" status within a few minutes after the CloudFormation stack completes.
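If you prefer to monitor from the CLI instead of the console, you can check the stack status directly. This sketch assumes the stack name `supermetal-bootstrap` used in the deployment step:

```shell
# Print the current stack status (e.g., CREATE_IN_PROGRESS, CREATE_COMPLETE)
aws cloudformation describe-stacks \
  --stack-name supermetal-bootstrap \
  --query 'Stacks[0].StackStatus' \
  --output text

# Or block until creation finishes; exits non-zero if the stack fails
aws cloudformation wait stack-create-complete \
  --stack-name supermetal-bootstrap
```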
You can use the Terraform CLI to deploy the bootstrap module.
#### Obtain Terraform Script \[!toc]
* Obtain the latest Terraform script from the Supermetal Console. The Terraform code in the [appendix](#environment-bootstrap-scripts) can be used as a reference or saved as local files (e.g., `main.tf`, `variables.tf`, `outputs.tf`).
#### Configure Terraform \[!toc]
* Ensure you have [Terraform installed](https://learn.hashicorp.com/tutorials/terraform/install-cli) (version 1.0 or later).
* Set up your AWS credentials for Terraform. This typically involves configuring the AWS CLI and ensuring your environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`) or AWS profile (`AWS_PROFILE`) are correctly set.
* Verify access by running `aws sts get-caller-identity --region <your-region>`. Refer to the [AWS provider documentation](https://registry.terraform.io/providers/hashicorp/aws/latest/docs#authentication-and-configuration) for more details.
#### Prepare Terraform Files \[!toc]
* Create a new directory for your Terraform configuration.
* If using the appendix script, save the Terraform code into files within this directory (e.g., `main.tf`).
* Create a `terraform.tfvars` file in the same directory to specify the required variable values. Alternatively, you can pass variables via the command line.
**Example `terraform.tfvars`:**
```tfvars
vpc_id = "vpc-xxxxxxxxxxxxxxxxx"
private_subnet_ids = ["subnet-xxxxxxxxxxxxxxxxx", "subnet-yyyyyyyyyyyyyyyyy"]
external_id = "your-supermetal-provided-external-id"
supermetal_control_plane_aws_account_id = "supermetal-aws-account-id" # Provided by Supermetal
supermetal_metadata_service_endpoint_name = "com.amazonaws.vpce.your-region.vpce-svc-xxxxxxxxxxxxxxxxx" # Provided by Supermetal
# credentials_key_arn = "arn:aws:kms:your-region:your-account-id:key/your-key-id" # Optional: Only if using an existing KMS CMK
```
Refer to the [Deployment Parameters](#deployment-parameters) section and the variable descriptions within the Terraform script for detailed explanations of each.
* `vpc_id`
* `private_subnet_ids`
* `external_id`
* `supermetal_control_plane_aws_account_id`
* `supermetal_metadata_service_endpoint_name`
* `credentials_key_arn` (Optional)
#### Deploy with Terraform CLI \[!toc]
* Navigate to your Terraform configuration directory in your terminal.
* Initialize Terraform:
```bash
terraform init
```
* Review the execution plan:
```bash
terraform plan
```
* Apply the configuration:
```bash
terraform apply
```
Confirm the action by typing `yes` when prompted.
If you didn't use a `.tfvars` file, you can pass variables directly with the `apply` command. Ensure you provide values for all required variables:
```bash
terraform apply \
-var="vpc_id=vpc-xxxxxxxxxxxxxxxxx" \
-var="private_subnet_ids=[\"subnet-xxxxxxxxxxxxxxxxx\", \"subnet-yyyyyyyyyyyyyyyyy\"]" \
-var="external_id=YOUR_EXTERNAL_ID_FROM_SUPERMETAL_CONSOLE" \
-var="supermetal_control_plane_aws_account_id=SUPERMETAL_CONTROL_PLANE_AWS_ACCOUNT_ID" \
-var="supermetal_metadata_service_endpoint_name=COM_AMAZONAWS_VPCE_YOUR_REGION_VPCE_SVC_XXXXXXXXXXXXXX" \
# Optional: Provide this if you are using an existing KMS CMK.
# If omitted and 'credentials_key_arn' is not set in a .tfvars file, a new KMS key will be created.
# -var="credentials_key_arn=arn:aws:kms:your-region:your-account-id:key/your-key-id"
```
Replace placeholder values (e.g., `vpc-xxxxxxxxxxxxxxxxx`, `YOUR_EXTERNAL_ID_FROM_SUPERMETAL_CONSOLE`) with your actual values.
#### Verification \[!toc]
* Monitor the `terraform apply` command output. It will indicate when the resources have been successfully created.
* Once the Terraform apply is complete, Supermetal's Control Plane automatically registers the new environment.
* Monitor the Environments page in your Supermetal Console. The environment should show up with a "Ready" status within a few minutes after Terraform completes.
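You can also confirm the created resources from the CLI. `terraform output` lists the outputs defined by the module, and `terraform state show` displays a single resource; the resource name below is taken from the appendix script:

```shell
# List all outputs from the applied configuration
terraform output

# Inspect one managed resource in detail
terraform state show aws_iam_role.supermetal_control_plane_role
```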
***
#### Create Connector(s) to deploy Supermetal agents
Once the environment bootstrap is complete, you are ready to create a connector, which deploys Supermetal agents in your VPC.
#### Select source database \[!toc]
Follow the connector setup instructions in the Supermetal Console to select a source database; Supermetal can optionally [auto-discover](#database_discovery) available source databases.
#### Create Security Group and Configure Source Database Access \[!toc]
To give Supermetal access to your source database and to validate network reachability and connection credentials, you need to create a security group and configure access to the source database.
The Supermetal Console lists pre-filled prerequisite steps for creating the security group and configuring database access, similar to the steps below.
```bash
# Create the security group
SG_ID=$(aws ec2 create-security-group \
  --group-name supermetal-agent-sg \
  --description "Security group for Supermetal" \
  --vpc-id $VPC_ID \
  --output text --query 'GroupId')
# Add required outbound rules
# 1. For metadata VPC endpoint (created during environment bootstrap)
aws ec2 authorize-security-group-egress \
--group-id $SG_ID \
--protocol tcp \
--port 443 \
--source-group $(aws ec2 describe-security-groups \
--filters Name=group-name,Values=supermetal-metadata-sg Name=vpc-id,Values=$VPC_ID \
--query 'SecurityGroups[0].GroupId' --output text)
# 2. For database access
aws ec2 authorize-security-group-egress \
--group-id $SG_ID \
--protocol tcp \
--port $DATABASE_PORT \
--source-group $DATABASE_SECURITY_GROUP_ID
# Add database access rule to your database's security group
aws ec2 authorize-security-group-ingress \
--group-id $DATABASE_SECURITY_GROUP_ID \
--protocol tcp \
--port $DATABASE_PORT \
--source-group $SG_ID
```
* `VPC_ID`: The VPC ID of your environment. Refer to the [Deployment Parameters](#deployment-parameters) section
* `DATABASE_SECURITY_GROUP_ID`: The security group ID of your database
* `DATABASE_PORT`: The port number your database listens on (e.g., 5432 for PostgreSQL)
* The metadata endpoint security group (`supermetal-metadata-sg`) is automatically looked up using the VPC ID
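Before proceeding, you can double-check the rules that were just attached. This assumes the `SG_ID` variable set by the script above:

```shell
# Inspect the egress rules attached to the new agent security group
aws ec2 describe-security-groups \
  --group-ids $SG_ID \
  --query 'SecurityGroups[0].IpPermissionsEgress'
```

Both outbound rules (port 443 to the metadata endpoint security group and your database port to the database security group) should appear in the output.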
#### Create Security Group and Configure Database Access \[!toc]
1. Navigate to the **AWS EC2 Console** > **Security Groups**
2. Click **Create security group**
3. Configure the basic details:
   * **Security group name:** `supermetal-agent-sg`
   * **Description:** "Security group for Supermetal agents"
   * **VPC:** Select the same VPC used for your Supermetal environment
4. Leave **Inbound rules** empty (no rules needed)
5. **Outbound rules (required):**
* Add rule for metadata access:
* **Type:** HTTPS
* **Port:** 443
* **Destination:** Security group named `supermetal-metadata-sg` (created during environment bootstrap)
* Add rule for database access:
* **Type:** Custom TCP
* **Port:** Your database port
* **Destination:** Your database's security group
6. Click **Create security group** and note the Security Group ID
#### Configure Database Access \[!toc]
7. Locate your database's security group
8. Edit its inbound rules:
* **Type:** Custom TCP
* **Port:** Your database port (e.g., 5432 for PostgreSQL)
* **Source:** Select the `supermetal-agent-sg` security group
9. Save the rules
#### CloudFormation Template for Security Groups \[!toc]
```yaml
Parameters:
  VpcId:
    Type: AWS::EC2::VPC::Id
    Description: VPC where the security group will be created
  DatabaseSecurityGroupId:
    Type: AWS::EC2::SecurityGroup::Id
    Description: Security Group ID of your database
  DatabasePort:
    Type: Number
    Description: Port number your database listens on (e.g., 5432 for PostgreSQL)
    MinValue: 1
    MaxValue: 65535
  MetadataSecurityGroupId:
    Type: AWS::EC2::SecurityGroup::Id
    Description: >-
      Security Group ID of the metadata endpoint security group
      (supermetal-metadata-sg, created during environment bootstrap).
      CloudFormation cannot look up existing resources by name, so this
      value must be passed in as a parameter.
Resources:
  # Create the Supermetal Agent Security Group
  SupermetalAgentSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for Supermetal agents
      VpcId: !Ref VpcId
      SecurityGroupEgress:
        # Required outbound rules (egress rules use DestinationSecurityGroupId)
        - Description: Allow access to metadata service
          IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          DestinationSecurityGroupId: !Ref MetadataSecurityGroupId
        - Description: Allow access to database
          IpProtocol: tcp
          FromPort: !Ref DatabasePort
          ToPort: !Ref DatabasePort
          DestinationSecurityGroupId: !Ref DatabaseSecurityGroupId
  # Add the ingress rule to the database security group
  DatabaseSecurityGroupIngress:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref DatabaseSecurityGroupId
      IpProtocol: tcp
      FromPort: !Ref DatabasePort
      ToPort: !Ref DatabasePort
      SourceSecurityGroupId: !Ref SupermetalAgentSecurityGroup
Outputs:
  AgentSecurityGroupId:
    Description: ID of the Supermetal Agent Security Group
    Value: !Ref SupermetalAgentSecurityGroup
```
This template creates the agent security group and configures database access in one deployment.
* `VpcId`: The VPC ID of your environment. Refer to the [Deployment Parameters](#deployment-parameters) section
* `DatabaseSecurityGroupId`: The security group ID of your database
* `DatabasePort`: The port number your database listens on (e.g., 5432 for PostgreSQL)
* `MetadataSecurityGroupId`: The ID of the metadata endpoint security group (`supermetal-metadata-sg`); look it up in the EC2 console or via `aws ec2 describe-security-groups`
#### Create Security Group and Configure Database Access \[!toc]
```terraform
# Look up the metadata endpoint security group
data "aws_security_group" "metadata_endpoint" {
  name   = "supermetal-metadata-sg"
  vpc_id = var.vpc_id
}
# Create the agent security group
resource "aws_security_group" "supermetal_agent_sg" {
  description = "Security group for Supermetal agents"
  vpc_id      = var.vpc_id
  # Required outbound rules (inline rules reference other groups via security_groups)
  egress {
    description     = "Allow access to metadata service"
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    security_groups = [data.aws_security_group.metadata_endpoint.id]
  }
  egress {
    description     = "Allow access to database"
    from_port       = var.database_port
    to_port         = var.database_port
    protocol        = "tcp"
    security_groups = [var.database_security_group_id]
  }
}
# Add database access rule
resource "aws_security_group_rule" "allow_supermetal_agent_to_db" {
type = "ingress"
from_port = var.database_port
to_port = var.database_port
protocol = "tcp"
source_security_group_id = aws_security_group.supermetal_agent_sg.id
security_group_id = var.database_security_group_id
}
# Output the security group ID for reference
output "supermetal_agent_sg_id" {
value = aws_security_group.supermetal_agent_sg.id
}
```
* `vpc_id`: The VPC ID of your environment. Refer to the [Deployment Parameters](#deployment-parameters) section
* `database_security_group_id`: The security group ID of your database
* `database_port`: The port number your database listens on (e.g., 5432 for PostgreSQL)
* The metadata endpoint security group (`supermetal-metadata-sg`) is automatically looked up using the VPC ID
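Assuming you save the snippet above as `main.tf` in its own directory, together with matching variable declarations for `vpc_id`, `database_security_group_id`, and `database_port`, it can be applied the same way as the environment bootstrap module (placeholder IDs shown):

```shell
terraform init
terraform apply \
  -var="vpc_id=vpc-xxxxxxxxxxxxxxxxx" \
  -var="database_security_group_id=sg-xxxxxxxxxxxxxxxxx" \
  -var="database_port=5432"
```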
#### Validate Source database \[!toc]
Follow the setup instructions in the Supermetal Console to input the created security group ID for the source database and validate the network reachability and connection credentials.
#### Repeat for Target database \[!toc]
Repeat the same steps for the target database.
#### Finalize \[!toc]
Once the source and target database connections are validated, you can finalize the connector setup, which launches the Supermetal agents in your VPC.
***
## Appendix
### Environment Bootstrap Scripts
Resources used to bootstrap the environment
```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: >-
CloudFormation template to set up the Supermetal BYOC environment in your AWS account.
This template creates IAM roles, a VPC Endpoint for communication with the Supermetal Control Plane and an optional KMS key.
Parameters:
VpcId:
Type: AWS::EC2::VPC::Id
Description: (Required) The ID of your existing VPC for agent deployment.
PrivateSubnetIds:
    Type: List<AWS::EC2::Subnet::Id>
Description: >-
(Required) Comma-separated list of private subnet IDs within the VpcId
where agents and the VPC endpoint will be deployed.
CredentialsKeyArn:
Type: String
Description: >-
(Optional) ARN of your existing KMS CMK for encrypting connector credentials.
If left blank, a new KMS key will be created.
Default: ""
ExternalId:
Type: String
Description: >-
(Required) A unique ID provided by the Supermetal Console.
Used for securing the cross-account IAM role.
SupermetalControlPlaneAWSAccountId:
Type: String
Description: The AWS Account ID of the Supermetal Control Plane.
NoEcho: true
SupermetalMetadataServiceEndpointName:
Type: String
Description: >-
The VPC Endpoint Service Name for the Supermetal Metadata Service (e.g., com.amazonaws.vpce.us-east-1.vpce-svc-xxxxxxxxxxxxxxxxx).
Conditions:
CreateKMSKey: !Equals [!Ref CredentialsKeyArn, ""]
Resources:
# IAM Role for Supermetal Control Plane
SupermetalControlPlaneRole:
Type: AWS::IAM::Role
Properties:
RoleName: !Sub "SupermetalControlPlaneRole-${VpcId}"
Tags:
- Key: Name
Value: !Sub "SupermetalControlPlaneRole-${VpcId}"
- Key: supermetal:byoc:vpc-id
Value: !Ref VpcId
- Key: supermetal:byoc:stack-name
Value: !Ref AWS::StackName
- Key: supermetal:byoc:external-id
Value: !Ref ExternalId
AssumeRolePolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal:
AWS: !Sub arn:aws:iam::${SupermetalControlPlaneAWSAccountId}:root
Action: sts:AssumeRole
Condition:
StringEquals:
sts:ExternalId: !Ref ExternalId
ManagedPolicyArns:
- !Ref SupermetalControlPlanePolicy
# Managed Policy for SupermetalControlPlaneRole
SupermetalControlPlanePolicy:
Type: AWS::IAM::ManagedPolicy
Properties:
ManagedPolicyName: !Sub "SupermetalControlPlanePolicy-${VpcId}"
PolicyDocument:
Version: '2012-10-17'
Statement:
# ECS Cluster Management
- Effect: Allow
Action:
- ecs:CreateCluster
- ecs:DeleteCluster
- ecs:DescribeClusters
Resource: "*" # Scoped to clusters with name prefix 'supermetal-' via condition
# Note: Conditions like 'ecs:cluster' on CreateCluster primarily apply to actions on existing resources.
# They do not strictly restrict naming during creation itself due to IAM evaluation timing.
Condition:
StringLike:
ecs:cluster: "supermetal-*"
# ECS Task and Service Management
- Effect: Allow
Action:
- ecs:CreateService
- ecs:UpdateService
- ecs:DeleteService
- ecs:RegisterTaskDefinition
- ecs:DeregisterTaskDefinition
- ecs:DescribeServices
- ecs:DescribeTasks
- ecs:RunTask
- ecs:StopTask
- ecs:ListTasks
- ecs:ListServices
- ecs:DescribeTaskDefinition
- ecs:ListContainerInstances
- ecs:DescribeContainerInstances
Resource: "*" # Scoped to clusters with name prefix 'supermetal-' via condition
Condition:
StringLike:
ecs:cluster: "supermetal-*"
# ECS Capacity Provider Management
- Effect: Allow
Action:
- ecs:CreateCapacityProvider
- ecs:DeleteCapacityProvider
- ecs:DescribeCapacityProviders
- ecs:UpdateCapacityProvider
- ecs:PutClusterCapacityProviders
Resource: "*" # Scoped to capacity providers with name prefix 'supermetal-' via condition
# Note: Conditions like 'ecs:capacity-provider' on CreateCapacityProvider primarily apply to actions on existing resources.
# They do not strictly restrict naming during creation itself due to IAM evaluation timing.
Condition:
StringLike:
ecs:capacity-provider: "supermetal-*"
# EC2 Capacity Management
- Effect: Allow
Action:
- autoscaling:CreateAutoScalingGroup
- autoscaling:UpdateAutoScalingGroup
- autoscaling:DeleteAutoScalingGroup
- autoscaling:DescribeAutoScalingGroups
- autoscaling:DescribeScalingActivities
- autoscaling:SetDesiredCapacity
- ec2:CreateLaunchTemplate
- ec2:DeleteLaunchTemplate
- ec2:DescribeLaunchTemplates
- ec2:DescribeLaunchTemplateVersions
- ec2:CreateTags
- ec2:DescribeInstances
Resource: "*"
Condition:
StringLike:
aws:ResourceTag/supermetal: "*"
# Lambda for Connector Validation
- Effect: Allow
Action:
- lambda:InvokeFunction
- lambda:CreateFunction
- lambda:DeleteFunction
- lambda:GetFunctionConfiguration
- lambda:UpdateFunctionConfiguration
- lambda:AddPermission
- lambda:RemovePermission
Resource: !Sub arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:supermetal-validation-*
# Lambda VPC Access
- Effect: Allow
Action:
- ec2:CreateNetworkInterface
- ec2:DescribeNetworkInterfaces
- ec2:DeleteNetworkInterface
- ec2:AssignPrivateIpAddresses
- ec2:UnassignPrivateIpAddresses
Resource: "*" # Scoped to specified VPC via condition
Condition:
StringEquals:
ec2:VpcId: !Ref VpcId
# Database Discovery
- Effect: Allow
Action:
- rds:DescribeDBInstances
- rds:DescribeDBClusters
- redshift:DescribeClusters
Resource: "*" # Scoped to specified VPC via condition
# Note: While ec2:VpcId is the correct key, RDS/Redshift Describe actions may not fully support VPC-based IAM filtering.
# The control plane might list all regional resources; further filtering may be needed application-side.
Condition:
StringEquals:
ec2:VpcId: !Ref VpcId
# S3 Buffer Management
- Effect: Allow
Action:
- s3:CreateBucket
- s3:DeleteBucket
- s3:PutBucketTagging
- s3:GetBucketTagging
- s3:PutEncryptionConfiguration
- s3:GetEncryptionConfiguration
Resource: !Sub arn:aws:s3:::supermetal-*
# Security Group Management
- Effect: Allow
Action:
- ec2:DescribeSecurityGroups
Resource: "*"
Condition:
StringEquals:
ec2:VpcId: !Ref VpcId
# Secrets Management
- Effect: Allow
Action:
- secretsmanager:CreateSecret
- secretsmanager:DeleteSecret
- secretsmanager:GetSecretValue
- secretsmanager:PutSecretValue
- secretsmanager:UpdateSecret
- secretsmanager:TagResource
Resource: !Sub arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:supermetal-*
# KMS Encryption
- Effect: Allow
Action: kms:Encrypt
Resource: !If [CreateKMSKey, !GetAtt SupermetalKMSKey.Arn, !Ref CredentialsKeyArn]
# IAM Role Management
- Effect: Allow
Action: iam:PassRole
Resource:
- !GetAtt SupermetalAgentTaskRole.Arn
- !GetAtt SupermetalValidationLambdaRole.Arn
- !GetAtt SupermetalTaskExecutionRole.Arn
- !GetAtt SupermetalAgentEC2InstanceRole.Arn
Condition:
StringEquals:
iam:PassedToService:
- ecs-tasks.amazonaws.com
- lambda.amazonaws.com
- ec2.amazonaws.com
# IAM Role for Lambda Validation Function
SupermetalValidationLambdaRole:
Type: AWS::IAM::Role
Properties:
RoleName: !Sub "SupermetalValidationLambdaRole-${VpcId}"
Tags:
- Key: Name
Value: !Sub "SupermetalValidationLambdaRole-${VpcId}"
- Key: supermetal:byoc:vpc-id
Value: !Ref VpcId
- Key: supermetal:byoc:stack-name
Value: !Ref AWS::StackName
- Key: supermetal:byoc:external-id
Value: !Ref ExternalId
AssumeRolePolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal:
Service: lambda.amazonaws.com
Action: sts:AssumeRole
ManagedPolicyArns:
- arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole
Path: /
# IAM Role for Supermetal Agent Tasks
SupermetalAgentTaskRole:
Type: AWS::IAM::Role
Properties:
RoleName: !Sub "SupermetalAgentTaskRole-${VpcId}"
Tags:
- Key: Name
Value: !Sub "SupermetalAgentTaskRole-${VpcId}"
- Key: supermetal:byoc:vpc-id
Value: !Ref VpcId
- Key: supermetal:byoc:stack-name
Value: !Ref AWS::StackName
- Key: supermetal:byoc:external-id
Value: !Ref ExternalId
AssumeRolePolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal:
Service: ecs-tasks.amazonaws.com
Action: sts:AssumeRole
Condition:
ArnLike:
aws:SourceArn: !Sub arn:aws:ecs:${AWS::Region}:${AWS::AccountId}:task/supermetal-*
StringEquals:
aws:SourceAccount: !Ref AWS::AccountId
ManagedPolicyArns:
- !Ref SupermetalAgentTaskPolicy
Path: /
# IAM Role for ECS Task Execution
SupermetalTaskExecutionRole:
Type: AWS::IAM::Role
Properties:
RoleName: !Sub "SupermetalTaskExecutionRole-${VpcId}"
Tags:
- Key: Name
Value: !Sub "SupermetalTaskExecutionRole-${VpcId}"
- Key: supermetal:byoc:vpc-id
Value: !Ref VpcId
- Key: supermetal:byoc:stack-name
Value: !Ref AWS::StackName
- Key: supermetal:byoc:external-id
Value: !Ref ExternalId
AssumeRolePolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal:
Service: ecs-tasks.amazonaws.com
Action: sts:AssumeRole
Condition:
StringEquals:
aws:SourceAccount: !Ref AWS::AccountId
ArnLike:
aws:SourceArn: !Sub arn:aws:ecs:${AWS::Region}:${AWS::AccountId}:task-definition/supermetal*:*
ManagedPolicyArns:
- arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
- !Ref SupermetalTaskExecutionPolicy
Path: /
# Managed Policy for SupermetalAgentTaskRole
SupermetalAgentTaskPolicy:
Type: AWS::IAM::ManagedPolicy
Properties:
ManagedPolicyName: !Sub "SupermetalAgentTaskPolicy-${VpcId}"
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- kms:Decrypt
- kms:Encrypt
Resource: !If [CreateKMSKey, !GetAtt SupermetalKMSKey.Arn, !Ref CredentialsKeyArn]
# Secrets Access
- Effect: Allow
Action:
- secretsmanager:GetSecretValue
Resource: !Sub arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:supermetal-*
# S3 Buffer Access
- Effect: Allow
Action:
- s3:PutObject
- s3:GetObject
- s3:DeleteObject
- s3:ListBucket
Resource:
- !Sub arn:aws:s3:::supermetal-*
- !Sub arn:aws:s3:::supermetal-*/*
# Managed Policy for SupermetalTaskExecutionRole
SupermetalTaskExecutionPolicy:
Type: AWS::IAM::ManagedPolicy
Properties:
ManagedPolicyName: !Sub "SupermetalTaskExecutionPolicy-${VpcId}"
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- secretsmanager:GetSecretValue
Resource: !Sub arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:supermetal-*
- Effect: Allow
Action:
- kms:Decrypt
Resource: !If [CreateKMSKey, !GetAtt SupermetalKMSKey.Arn, !Ref CredentialsKeyArn]
# KMS Key for Connector Credentials
SupermetalKMSKey:
Type: AWS::KMS::Key
Condition: CreateKMSKey
Properties:
Description: KMS key for encrypting Supermetal connector credentials
Tags:
- Key: Name
Value: !Sub "SupermetalKMSKey-${VpcId}"
- Key: supermetal:byoc:vpc-id
Value: !Ref VpcId
- Key: supermetal:byoc:stack-name
Value: !Ref AWS::StackName
- Key: supermetal:byoc:external-id
Value: !Ref ExternalId
EnableKeyRotation: true
KeyPolicy:
Version: '2012-10-17'
Statement:
- Sid: Enable IAM User Permissions
Effect: Allow
Principal:
AWS: !Sub arn:aws:iam::${AWS::AccountId}:root
Action: kms:*
Resource: '*' # Key policy applies to this key only
- Sid: Allow Supermetal Control Plane to Encrypt
Effect: Allow
Principal:
AWS: !GetAtt SupermetalControlPlaneRole.Arn
Action: kms:Encrypt
Resource: '*' # Key policy applies to this key only
- Sid: Allow Supermetal Agents to Decrypt/Encrypt
Effect: Allow
Principal:
AWS: !GetAtt SupermetalAgentTaskRole.Arn
Action:
- kms:Decrypt
- kms:Encrypt
Resource: '*' # Key policy applies to this key only
SupermetalKMSKeyAlias:
Type: AWS::KMS::Alias
Condition: CreateKMSKey
Properties:
AliasName: !Sub "alias/supermetal-key-${VpcId}"
TargetKeyId: !Ref SupermetalKMSKey
  SupermetalAgentEC2InstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      # AWS::IAM::InstanceProfile does not support the Tags property
      InstanceProfileName: !Sub "SupermetalAgentEC2InstanceProfile-${VpcId}"
      Path: /
      Roles:
        - !Ref SupermetalAgentEC2InstanceRole
SupermetalAgentEC2InstanceRole:
Type: AWS::IAM::Role
Properties:
RoleName: !Sub "SupermetalAgentEC2InstanceRole-${VpcId}"
Tags:
- Key: Name
Value: !Sub "SupermetalAgentEC2InstanceRole-${VpcId}"
- Key: supermetal:byoc:vpc-id
Value: !Ref VpcId
- Key: supermetal:byoc:stack-name
Value: !Ref AWS::StackName
- Key: supermetal:byoc:external-id
Value: !Ref ExternalId
AssumeRolePolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal:
Service: ec2.amazonaws.com
Action: sts:AssumeRole
ManagedPolicyArns:
- arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role
- !Ref SupermetalAgentEC2InstancePolicy
Path: /
# Managed Policy for EC2 Instance Role
SupermetalAgentEC2InstancePolicy:
Type: AWS::IAM::ManagedPolicy
Properties:
ManagedPolicyName: !Sub "SupermetalAgentEC2InstancePolicy-${VpcId}"
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- ecr:GetAuthorizationToken
Resource: "*"
- Effect: Allow
Action:
- ecr:BatchGetImage
- ecr:GetDownloadUrlForLayer
Resource: !Sub arn:aws:ecr:${AWS::Region}:${SupermetalControlPlaneAWSAccountId}:repository/agent/*
SupermetalAgentMetadataSecurityGroup:
Type: AWS::EC2::SecurityGroup
Properties:
GroupName: !Sub "supermetal-agent-metadata-sg-${VpcId}"
GroupDescription: Security group for Supermetal Agents
VpcId: !Ref VpcId
Tags:
- Key: Name
Value: !Sub "supermetal-agent-metadata-sg-${VpcId}"
- Key: supermetal:byoc:vpc-id
Value: !Ref VpcId
- Key: supermetal:byoc:stack-name
Value: !Ref AWS::StackName
- Key: supermetal:byoc:external-id
Value: !Ref ExternalId
# VPC Endpoint for Supermetal Metadata Service
MetadataVPCEndpoint:
Type: AWS::EC2::VPCEndpoint
Properties:
ServiceName: !Ref SupermetalMetadataServiceEndpointName
VpcId: !Ref VpcId
SubnetIds: !Ref PrivateSubnetIds
VpcEndpointType: Interface
PrivateDnsEnabled: true
SecurityGroupIds:
- !Ref MetadataVPCEndpointSecurityGroup
Tags:
- Key: Name
Value: !Sub "MetadataVPCEndpoint-${VpcId}"
- Key: supermetal:byoc:vpc-id
Value: !Ref VpcId
- Key: supermetal:byoc:stack-name
Value: !Ref AWS::StackName
- Key: supermetal:byoc:external-id
Value: !Ref ExternalId
MetadataVPCEndpointSecurityGroup:
Type: AWS::EC2::SecurityGroup
Properties:
GroupName: !Sub "supermetal-metadata-vpce-sg-${VpcId}"
GroupDescription: Security group for Supermetal metadata VPC Endpoint
VpcId: !Ref VpcId
SecurityGroupIngress: # Allow agents to connect to the endpoint
- IpProtocol: tcp
FromPort: 443 # HTTPS
ToPort: 443
SourceSecurityGroupId: !Ref SupermetalAgentMetadataSecurityGroup
Tags:
- Key: Name
Value: !Sub "supermetal-metadata-vpce-sg-${VpcId}"
- Key: supermetal:byoc:vpc-id
Value: !Ref VpcId
- Key: supermetal:byoc:stack-name
Value: !Ref AWS::StackName
- Key: supermetal:byoc:external-id
Value: !Ref ExternalId
Outputs:
SupermetalControlPlaneRoleArn:
Description: ARN of the IAM Role for Supermetal Control Plane
Value: !GetAtt SupermetalControlPlaneRole.Arn
SupermetalAgentTaskRoleArn:
Description: ARN of the IAM Role for Supermetal Agent Tasks
Value: !GetAtt SupermetalAgentTaskRole.Arn
SupermetalKMSKeyArnOutput:
Description: ARN of the KMS Key for encrypting credentials
Value: !If [CreateKMSKey, !GetAtt SupermetalKMSKey.Arn, !Ref CredentialsKeyArn]
SupermetalTaskExecutionRoleArn:
Description: ARN of the IAM Role for ECS Task Execution
Value: !GetAtt SupermetalTaskExecutionRole.Arn
MetadataVPCEndpointId:
Description: ID of the VPC Endpoint for Supermetal Metadata Service
Value: !Ref MetadataVPCEndpoint
SupermetalAgentMetadataSecurityGroupId:
Description: ID of the Security Group for Supermetal Agents (for Metadata Endpoint communication)
Value: !GetAtt SupermetalAgentMetadataSecurityGroup.GroupId
SupermetalValidationLambdaRoleArn:
Description: ARN of the IAM Role for Supermetal Validation Lambda functions
Value: !GetAtt SupermetalValidationLambdaRole.Arn
SupermetalAgentEC2InstanceRoleArn:
Description: ARN of the IAM Role for Supermetal Agent EC2 Instances
Value: !GetAtt SupermetalAgentEC2InstanceRole.Arn
SupermetalAgentEC2InstanceProfileArn:
Description: ARN of the IAM Instance Profile for Supermetal Agent EC2 Instances
Value: !GetAtt SupermetalAgentEC2InstanceProfile.Arn
```
```terraform
# This sets up the Supermetal BYOC environment in your AWS account.
# It creates IAM roles, a VPC Endpoint for communication with the Supermetal Control Plane,
# and an optional KMS key.
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0" # Specify a compatible AWS provider version
}
}
}
# ------------------------------------------------------------------------------
# Input Variables
# ------------------------------------------------------------------------------
variable "vpc_id" {
description = "(Required) The ID of your existing VPC for agent deployment."
type = string
}
variable "private_subnet_ids" {
description = "(Required) List of private subnet IDs within the VpcId where agents and the VPC endpoint will be deployed."
type = list(string)
}
variable "credentials_key_arn" {
description = "(Optional) ARN of your existing KMS CMK for encrypting connector credentials. If left blank, a new KMS key will be created."
type = string
default = ""
}
variable "external_id" {
description = "(Required) A unique ID provided by the Supermetal Console. Used for securing the cross-account IAM role."
type = string
}
variable "supermetal_control_plane_aws_account_id" {
description = "The AWS Account ID of the Supermetal Control Plane."
type = string
}
variable "supermetal_metadata_service_endpoint_name" {
description = "The VPC Endpoint Service Name for the Supermetal Metadata Service (e.g., com.amazonaws.vpce.us-east-1.vpce-svc-xxxxxxxxxxxxxxxxx)."
type = string
}
locals {
create_kms_key = var.credentials_key_arn == ""
common_tags = {
"supermetal:byoc:vpc-id" = var.vpc_id
"supermetal:byoc:stack-name" = "terraform-supermetal-byoc" # Terraform doesn't have a direct AWS::StackName equivalent, using a placeholder
"supermetal:byoc:external-id" = var.external_id
}
}
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}
# ------------------------------------------------------------------------------
# Resources
# ------------------------------------------------------------------------------
# IAM Role for Supermetal Control Plane
resource "aws_iam_role" "supermetal_control_plane_role" {
name = "SupermetalControlPlaneRole-${var.vpc_id}"
tags = merge(local.common_tags, {
Name = "SupermetalControlPlaneRole-${var.vpc_id}"
})
assume_role_policy = jsonencode({
Version = "2012-10-17",
Statement = [
{
Effect = "Allow",
Principal = {
AWS = "arn:aws:iam::${var.supermetal_control_plane_aws_account_id}:root"
},
Action = "sts:AssumeRole",
Condition = {
StringEquals = {
"sts:ExternalId" = var.external_id
}
}
}
]
})
}
# Managed Policy for SupermetalControlPlaneRole
resource "aws_iam_policy" "supermetal_control_plane_policy" {
name = "SupermetalControlPlanePolicy-${var.vpc_id}"
description = "Policy for Supermetal Control Plane Role"
policy = jsonencode({
Version = "2012-10-17",
Statement = [
# ECS Cluster Management
{
Effect = "Allow",
Action = [
"ecs:CreateCluster",
"ecs:DeleteCluster",
"ecs:DescribeClusters"
],
Resource = "*", # Scoped to clusters with name prefix 'supermetal-' via condition
Condition = {
StringLike = {
"ecs:cluster" = "supermetal-*"
}
}
},
# ECS Task and Service Management
{
Effect = "Allow",
Action = [
"ecs:CreateService",
"ecs:UpdateService",
"ecs:DeleteService",
"ecs:RegisterTaskDefinition",
"ecs:DeregisterTaskDefinition",
"ecs:DescribeServices",
"ecs:DescribeTasks",
"ecs:RunTask",
"ecs:StopTask",
"ecs:ListTasks",
"ecs:ListServices",
"ecs:DescribeTaskDefinition",
"ecs:ListContainerInstances",
"ecs:DescribeContainerInstances"
],
Resource = "*", # Scoped to clusters with name prefix 'supermetal-' via condition
Condition = {
StringLike = {
"ecs:cluster" = "supermetal-*"
}
}
},
# ECS Capacity Provider Management
{
Effect = "Allow",
Action = [
"ecs:CreateCapacityProvider",
"ecs:DeleteCapacityProvider",
"ecs:DescribeCapacityProviders",
"ecs:UpdateCapacityProvider",
"ecs:PutClusterCapacityProviders"
],
Resource = "*", # Scoped to capacity providers with name prefix 'supermetal-' via condition
Condition = {
StringLike = {
"ecs:capacity-provider" = "supermetal-*"
}
}
},
# EC2 Capacity Management
{
Effect = "Allow",
Action = [
"autoscaling:CreateAutoScalingGroup",
"autoscaling:UpdateAutoScalingGroup",
"autoscaling:DeleteAutoScalingGroup",
"autoscaling:DescribeAutoScalingGroups",
"autoscaling:DescribeScalingActivities",
"autoscaling:SetDesiredCapacity",
"ec2:CreateLaunchTemplate",
"ec2:DeleteLaunchTemplate",
"ec2:DescribeLaunchTemplates",
"ec2:DescribeLaunchTemplateVersions",
"ec2:CreateTags",
"ec2:DescribeInstances"
],
Resource = "*",
Condition = {
StringLike = {
"aws:ResourceTag/supermetal" = "*"
}
}
},
# Lambda for Connector Validation
{
Effect = "Allow",
Action = [
"lambda:InvokeFunction",
"lambda:CreateFunction",
"lambda:DeleteFunction",
"lambda:GetFunctionConfiguration",
"lambda:UpdateFunctionConfiguration",
"lambda:AddPermission",
"lambda:RemovePermission"
],
Resource = "arn:aws:lambda:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:function:supermetal-validation-*"
},
# Lambda VPC Access
{
Effect = "Allow",
Action = [
"ec2:CreateNetworkInterface",
"ec2:DescribeNetworkInterfaces",
"ec2:DeleteNetworkInterface",
"ec2:AssignPrivateIpAddresses",
"ec2:UnassignPrivateIpAddresses"
],
Resource = "*", # Scoped to specified VPC via condition
Condition = {
StringEquals = {
"ec2:VpcId" = var.vpc_id
}
}
},
# Database Discovery
{
Effect = "Allow",
Action = [
"rds:DescribeDBInstances",
"rds:DescribeDBClusters",
"redshift:DescribeClusters"
],
Resource = "*", # Scoped to specified VPC via condition
Condition = {
StringEquals = {
"ec2:VpcId" = var.vpc_id
}
}
},
# S3 Buffer Management
{
Effect = "Allow",
Action = [
"s3:CreateBucket",
"s3:DeleteBucket",
"s3:PutBucketTagging",
"s3:GetBucketTagging",
"s3:PutEncryptionConfiguration",
"s3:GetEncryptionConfiguration"
],
Resource = "arn:aws:s3:::supermetal-*"
},
# Security Group Management
{
Effect = "Allow",
Action = [
"ec2:DescribeSecurityGroups"
],
Resource = "*",
Condition = {
StringEquals = {
"ec2:VpcId" = var.vpc_id
}
}
},
# Secrets Management
{
Effect = "Allow",
Action = [
"secretsmanager:CreateSecret",
"secretsmanager:DeleteSecret",
"secretsmanager:GetSecretValue",
"secretsmanager:PutSecretValue",
"secretsmanager:UpdateSecret",
"secretsmanager:TagResource"
],
Resource = "arn:aws:secretsmanager:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:secret:supermetal-*"
},
# KMS Encryption
{
Effect = "Allow",
Action = "kms:Encrypt",
Resource = local.create_kms_key ? aws_kms_key.supermetal_kms_key[0].arn : var.credentials_key_arn
},
# IAM Role Management
{
Effect = "Allow",
Action = "iam:PassRole",
Resource = [
aws_iam_role.supermetal_agent_task_role.arn,
aws_iam_role.supermetal_validation_lambda_role.arn,
aws_iam_role.supermetal_task_execution_role.arn,
aws_iam_role.supermetal_agent_ec2_instance_role.arn
],
Condition = {
StringEquals = {
"iam:PassedToService" = [
"ecs-tasks.amazonaws.com",
"lambda.amazonaws.com",
"ec2.amazonaws.com"
]
}
}
}
]
})
}
resource "aws_iam_role_policy_attachment" "supermetal_control_plane_role_policy_attach" {
role = aws_iam_role.supermetal_control_plane_role.name
policy_arn = aws_iam_policy.supermetal_control_plane_policy.arn
}
# IAM Role for Lambda Validation Function
resource "aws_iam_role" "supermetal_validation_lambda_role" {
name = "SupermetalValidationLambdaRole-${var.vpc_id}"
path = "/"
tags = merge(local.common_tags, {
Name = "SupermetalValidationLambdaRole-${var.vpc_id}"
})
assume_role_policy = jsonencode({
Version = "2012-10-17",
Statement = [
{
Effect = "Allow",
Principal = {
Service = "lambda.amazonaws.com"
},
Action = "sts:AssumeRole"
}
]
})
}
resource "aws_iam_role_policy_attachment" "supermetal_validation_lambda_role_vpc_access_policy_attach" {
role = aws_iam_role.supermetal_validation_lambda_role.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"
}
# IAM Role for Supermetal Agent Tasks
resource "aws_iam_role" "supermetal_agent_task_role" {
name = "SupermetalAgentTaskRole-${var.vpc_id}"
path = "/"
tags = merge(local.common_tags, {
Name = "SupermetalAgentTaskRole-${var.vpc_id}"
})
assume_role_policy = jsonencode({
Version = "2012-10-17",
Statement = [
{
Effect = "Allow",
Principal = {
Service = "ecs-tasks.amazonaws.com"
},
Action = "sts:AssumeRole",
Condition = {
ArnLike = {
"aws:SourceArn" = "arn:aws:ecs:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:task/supermetal-*"
},
StringEquals = {
"aws:SourceAccount" = data.aws_caller_identity.current.account_id
}
}
}
]
})
}
# Managed Policy for SupermetalAgentTaskRole
resource "aws_iam_policy" "supermetal_agent_task_policy" {
name = "SupermetalAgentTaskPolicy-${var.vpc_id}"
description = "Policy for Supermetal Agent Task Role"
policy = jsonencode({
Version = "2012-10-17",
Statement = [
{
Effect = "Allow",
Action = [
"kms:Decrypt",
"kms:Encrypt"
],
Resource = local.create_kms_key ? aws_kms_key.supermetal_kms_key[0].arn : var.credentials_key_arn
},
# Secrets Access
{
Effect = "Allow",
Action = [
"secretsmanager:GetSecretValue"
],
Resource = "arn:aws:secretsmanager:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:secret:supermetal-*"
},
# S3 Buffer Access
{
Effect = "Allow",
Action = [
"s3:PutObject",
"s3:GetObject",
"s3:DeleteObject",
"s3:ListBucket"
],
Resource = [
"arn:aws:s3:::supermetal-*",
"arn:aws:s3:::supermetal-*/*"
]
}
]
})
}
resource "aws_iam_role_policy_attachment" "supermetal_agent_task_role_policy_attach" {
role = aws_iam_role.supermetal_agent_task_role.name
policy_arn = aws_iam_policy.supermetal_agent_task_policy.arn
}
# IAM Role for ECS Task Execution
resource "aws_iam_role" "supermetal_task_execution_role" {
name = "SupermetalTaskExecutionRole-${var.vpc_id}"
path = "/"
tags = merge(local.common_tags, {
Name = "SupermetalTaskExecutionRole-${var.vpc_id}"
})
assume_role_policy = jsonencode({
Version = "2012-10-17",
Statement = [
{
Effect = "Allow",
Principal = {
Service = "ecs-tasks.amazonaws.com"
},
Action = "sts:AssumeRole",
Condition = {
StringEquals = {
"aws:SourceAccount" = data.aws_caller_identity.current.account_id
},
ArnLike = {
"aws:SourceArn" = "arn:aws:ecs:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:task-definition/supermetal*:*"
}
}
}
]
})
}
resource "aws_iam_role_policy_attachment" "supermetal_task_execution_role_ecs_policy_attach" {
role = aws_iam_role.supermetal_task_execution_role.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
# Managed Policy for SupermetalTaskExecutionRole
resource "aws_iam_policy" "supermetal_task_execution_policy" {
name = "SupermetalTaskExecutionPolicy-${var.vpc_id}"
description = "Policy for Supermetal Task Execution Role"
policy = jsonencode({
Version = "2012-10-17",
Statement = [
{
Effect = "Allow",
Action = [
"secretsmanager:GetSecretValue"
],
Resource = "arn:aws:secretsmanager:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:secret:supermetal-*"
},
{
Effect = "Allow",
Action = [
"kms:Decrypt"
],
Resource = local.create_kms_key ? aws_kms_key.supermetal_kms_key[0].arn : var.credentials_key_arn
}
]
})
}
resource "aws_iam_role_policy_attachment" "supermetal_task_execution_role_custom_policy_attach" {
role = aws_iam_role.supermetal_task_execution_role.name
policy_arn = aws_iam_policy.supermetal_task_execution_policy.arn
}
# KMS Key for Connector Credentials
resource "aws_kms_key" "supermetal_kms_key" {
count = local.create_kms_key ? 1 : 0
description = "KMS key for encrypting Supermetal connector credentials"
enable_key_rotation = true
tags = merge(local.common_tags, {
Name = "SupermetalKMSKey-${var.vpc_id}"
})
policy = jsonencode({
Version = "2012-10-17",
Statement = [
{
Sid = "Enable IAM User Permissions",
Effect = "Allow",
Principal = {
AWS = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
},
Action = "kms:*",
Resource = "*"
},
{
Sid = "Allow Supermetal Control Plane to Encrypt",
Effect = "Allow",
Principal = {
AWS = aws_iam_role.supermetal_control_plane_role.arn
},
Action = "kms:Encrypt",
Resource = "*"
},
{
Sid = "Allow Supermetal Agents to Decrypt/Encrypt",
Effect = "Allow",
Principal = {
AWS = aws_iam_role.supermetal_agent_task_role.arn
},
Action = [
"kms:Decrypt",
"kms:Encrypt"
],
Resource = "*"
}
]
})
}
resource "aws_kms_alias" "supermetal_kms_key_alias" {
count = local.create_kms_key ? 1 : 0
name = "alias/supermetal-key-${var.vpc_id}"
target_key_id = aws_kms_key.supermetal_kms_key[0].key_id
}
# IAM Role for Supermetal Agent EC2 Instances
resource "aws_iam_role" "supermetal_agent_ec2_instance_role" {
name = "SupermetalAgentEC2InstanceRole-${var.vpc_id}"
path = "/"
tags = merge(local.common_tags, {
Name = "SupermetalAgentEC2InstanceRole-${var.vpc_id}"
})
assume_role_policy = jsonencode({
Version = "2012-10-17",
Statement = [
{
Effect = "Allow",
Principal = {
Service = "ec2.amazonaws.com"
},
Action = "sts:AssumeRole"
}
]
})
}
# Managed Policy for EC2 Instance Role
resource "aws_iam_policy" "supermetal_agent_ec2_instance_policy" {
name = "SupermetalAgentEC2InstancePolicy-${var.vpc_id}"
description = "Policy for Supermetal EC2 Instance Role"
policy = jsonencode({
Version = "2012-10-17",
Statement = [
{
Effect = "Allow",
Action = [
"ecr:GetAuthorizationToken"
],
Resource = "*"
},
{
Effect = "Allow",
Action = [
"ecr:BatchGetImage",
"ecr:GetDownloadUrlForLayer"
],
Resource = "arn:aws:ecr:${data.aws_region.current.name}:${var.supermetal_control_plane_aws_account_id}:repository/agent/*"
}
]
})
}
resource "aws_iam_role_policy_attachment" "supermetal_agent_ec2_instance_role_ecs_policy_attach" {
role = aws_iam_role.supermetal_agent_ec2_instance_role.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
}
resource "aws_iam_role_policy_attachment" "supermetal_agent_ec2_instance_role_custom_policy_attach" {
role = aws_iam_role.supermetal_agent_ec2_instance_role.name
policy_arn = aws_iam_policy.supermetal_agent_ec2_instance_policy.arn
}
resource "aws_iam_instance_profile" "supermetal_agent_ec2_instance_profile" {
name = "SupermetalAgentEC2InstanceProfile-${var.vpc_id}"
role = aws_iam_role.supermetal_agent_ec2_instance_role.name
path = "/"
tags = merge(local.common_tags, {
Name = "SupermetalAgentEC2InstanceProfile-${var.vpc_id}"
})
}
# Security Group for Supermetal Agents (for Metadata Endpoint communication)
resource "aws_security_group" "supermetal_agent_metadata_sg" {
name = "supermetal-agent-metadata-sg-${var.vpc_id}"
description = "Security group for Supermetal Agents"
vpc_id = var.vpc_id
tags = merge(local.common_tags, {
Name = "supermetal-agent-metadata-sg-${var.vpc_id}"
})
}
# Security Group for Metadata VPC Endpoint
resource "aws_security_group" "metadata_vpc_endpoint_sg" {
name = "supermetal-metadata-vpce-sg-${var.vpc_id}"
description = "Security group for Supermetal metadata VPC Endpoint"
vpc_id = var.vpc_id
tags = merge(local.common_tags, {
Name = "supermetal-metadata-vpce-sg-${var.vpc_id}"
})
ingress {
description = "Allow agents to connect to the endpoint (HTTPS)"
from_port = 443
to_port = 443
protocol = "tcp"
security_groups = [aws_security_group.supermetal_agent_metadata_sg.id]
}
}
# VPC Endpoint for Supermetal Metadata Service
resource "aws_vpc_endpoint" "metadata_vpc_endpoint" {
vpc_id = var.vpc_id
service_name = var.supermetal_metadata_service_endpoint_name
vpc_endpoint_type = "Interface"
subnet_ids = var.private_subnet_ids
private_dns_enabled = true
security_group_ids = [
aws_security_group.metadata_vpc_endpoint_sg.id
]
tags = merge(local.common_tags, {
Name = "MetadataVPCEndpoint-${var.vpc_id}"
})
}
# ------------------------------------------------------------------------------
# Outputs
# ------------------------------------------------------------------------------
output "supermetal_control_plane_role_arn" {
description = "ARN of the IAM Role for Supermetal Control Plane"
value = aws_iam_role.supermetal_control_plane_role.arn
}
output "supermetal_agent_task_role_arn" {
description = "ARN of the IAM Role for Supermetal Agent Tasks"
value = aws_iam_role.supermetal_agent_task_role.arn
}
output "supermetal_kms_key_arn_output" {
description = "ARN of the KMS Key for encrypting credentials"
value = local.create_kms_key ? aws_kms_key.supermetal_kms_key[0].arn : var.credentials_key_arn
}
output "supermetal_task_execution_role_arn" {
description = "ARN of the IAM Role for ECS Task Execution"
value = aws_iam_role.supermetal_task_execution_role.arn
}
output "metadata_vpc_endpoint_id" {
description = "ID of the VPC Endpoint for Supermetal Metadata Service"
value = aws_vpc_endpoint.metadata_vpc_endpoint.id
}
output "supermetal_agent_metadata_security_group_id" {
description = "ID of the Security Group for Supermetal Agents (for Metadata Endpoint communication)"
value = aws_security_group.supermetal_agent_metadata_sg.id
}
output "supermetal_validation_lambda_role_arn" {
description = "ARN of the IAM Role for Supermetal Validation Lambda functions"
value = aws_iam_role.supermetal_validation_lambda_role.arn
}
output "supermetal_agent_ec2_instance_role_arn" {
description = "ARN of the IAM Role for Supermetal Agent EC2 Instances"
value = aws_iam_role.supermetal_agent_ec2_instance_role.arn
}
output "supermetal_agent_ec2_instance_profile_arn" {
description = "ARN of the IAM Instance Profile for Supermetal Agent EC2 Instances"
value = aws_iam_instance_profile.supermetal_agent_ec2_instance_profile.arn
}
```
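The script's inputs correspond to the `var.` references above. A minimal `terraform.tfvars` sketch with placeholder values — substitute the identifiers from your environment and from the Supermetal Console:

```hcl
# Example terraform.tfvars — all values below are placeholders
vpc_id                                    = "vpc-0123456789abcdef0"
private_subnet_ids                        = ["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"]
external_id                               = "<external-id-from-supermetal-console>"
supermetal_control_plane_aws_account_id   = "<supermetal-aws-account-id>"
supermetal_metadata_service_endpoint_name = "<metadata-vpc-endpoint-service-name>"

# Optional: bring your own KMS key for connector credentials.
# Leave unset to have the script create a dedicated key.
# credentials_key_arn = "arn:aws:kms:us-east-1:111122223333:key/0000aaaa-bb11-cc22-dd33-eeee4444ffff"
```

Apply with the standard Terraform workflow (`terraform init`, `terraform plan`, `terraform apply`).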
***
### IAM Roles & Policies
Bootstrap creates several IAM roles and policies with specific permissions necessary for Supermetal BYOC to function securely.
These roles ensure that the Supermetal Control Plane can manage resources in your AWS account on your behalf, and that the Supermetal agents have the necessary permissions to operate.
The principle of least privilege is applied to scope down permissions as much as possible.
Key roles created and their purposes:
#### SupermetalControlPlaneRole
```json
{
"SupermetalControlPlaneRole": {
"Type": "AWS::IAM::Role",
"Properties": {
"RoleName": "SupermetalControlPlaneRole-${VPC_ID}",
"Tags": [
{ "Key": "Name", "Value": "SupermetalControlPlaneRole-${VPC_ID}" },
{ "Key": "supermetal:byoc:vpc-id", "Value": "${VPC_ID}" },
{ "Key": "supermetal:byoc:stack-name", "Value": "${STACK_NAME}" },
{ "Key": "supermetal:byoc:external-id", "Value": "${EXTERNAL_ID}" }
],
"AssumeRolePolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": { "AWS": "arn:aws:iam::${SUPERMETAL_CONTROL_PLANE_AWS_ACCOUNT_ID}:root" },
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": { "sts:ExternalId": "${EXTERNAL_ID}" }
}
}
]
},
"ManagedPolicyArns": [
"arn:aws:iam::${AWS_ACCOUNT_ID}:policy/SupermetalControlPlanePolicy-${VPC_ID}"
]
}
}
```
```json
{
"SupermetalControlPlanePolicy": {
"Type": "AWS::IAM::ManagedPolicy",
"Properties": {
"ManagedPolicyName": "SupermetalControlPlanePolicy-${VPC_ID}",
"PolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["ecs:CreateCluster", "ecs:DeleteCluster", "ecs:DescribeClusters"],
"Resource": "*",
"Condition": { "StringLike": { "ecs:cluster": "supermetal-*" } }
},
{
"Effect": "Allow",
"Action": [
"ecs:CreateService", "ecs:UpdateService", "ecs:DeleteService", "ecs:RegisterTaskDefinition",
"ecs:DeregisterTaskDefinition", "ecs:DescribeServices", "ecs:DescribeTasks", "ecs:RunTask",
"ecs:StopTask", "ecs:ListTasks", "ecs:ListServices", "ecs:DescribeTaskDefinition",
"ecs:ListContainerInstances", "ecs:DescribeContainerInstances"
],
"Resource": "*",
"Condition": { "StringLike": { "ecs:cluster": "supermetal-*" } }
},
{
"Effect": "Allow",
"Action": [
"ecs:CreateCapacityProvider", "ecs:DeleteCapacityProvider", "ecs:DescribeCapacityProviders",
"ecs:UpdateCapacityProvider", "ecs:PutClusterCapacityProviders"
],
"Resource": "*",
"Condition": { "StringLike": {"ecs:capacity-provider": "supermetal-*"} }
},
{
"Effect": "Allow",
"Action": [
"autoscaling:CreateAutoScalingGroup", "autoscaling:UpdateAutoScalingGroup", "autoscaling:DeleteAutoScalingGroup",
"autoscaling:DescribeAutoScalingGroups", "autoscaling:DescribeScalingActivities", "autoscaling:SetDesiredCapacity",
"ec2:CreateLaunchTemplate", "ec2:DeleteLaunchTemplate", "ec2:DescribeLaunchTemplates",
"ec2:DescribeLaunchTemplateVersions", "ec2:CreateTags", "ec2:DescribeInstances"
],
"Resource": "*",
"Condition": { "StringLike": { "aws:ResourceTag/supermetal": "*" } }
},
{
"Effect": "Allow",
"Action": [
"lambda:InvokeFunction", "lambda:CreateFunction", "lambda:DeleteFunction",
"lambda:GetFunctionConfiguration", "lambda:UpdateFunctionConfiguration",
"lambda:AddPermission", "lambda:RemovePermission"
],
"Resource": "arn:aws:lambda:${AWS_REGION}:${AWS_ACCOUNT_ID}:function:supermetal-validation-*"
},
{
"Effect": "Allow",
"Action": [
"ec2:CreateNetworkInterface", "ec2:DescribeNetworkInterfaces", "ec2:DeleteNetworkInterface",
"ec2:AssignPrivateIpAddresses", "ec2:UnassignPrivateIpAddresses"
],
"Resource": "*",
"Condition": { "StringEquals": { "ec2:VpcId": "${VPC_ID}"} }
},
{
"Effect": "Allow",
"Action": ["rds:DescribeDBInstances", "rds:DescribeDBClusters", "redshift:DescribeClusters"],
"Resource": "*",
"Condition": { "StringEquals": { "ec2:VpcId": "${VPC_ID}"} }
},
{
"Effect": "Allow",
"Action": [
"s3:CreateBucket", "s3:DeleteBucket", "s3:PutBucketTagging", "s3:GetBucketTagging",
"s3:PutEncryptionConfiguration", "s3:GetEncryptionConfiguration"
],
"Resource": "arn:aws:s3:::supermetal-*"
},
{
"Effect": "Allow",
"Action": "ec2:DescribeSecurityGroups",
"Resource": "*",
"Condition": { "StringEquals": { "ec2:VpcId": "${VPC_ID}"} }
},
{
"Effect": "Allow",
"Action": [
"secretsmanager:CreateSecret", "secretsmanager:DeleteSecret", "secretsmanager:GetSecretValue",
"secretsmanager:PutSecretValue", "secretsmanager:UpdateSecret", "secretsmanager:TagResource"
],
"Resource": "arn:aws:secretsmanager:${AWS_REGION}:${AWS_ACCOUNT_ID}:secret:supermetal-*"
},
{
"Effect": "Allow",
"Action": "kms:Encrypt",
"Resource": "${CUSTOMER_CREDENTIALS_KMS_KEY_ARN}"
},
{
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": [
"${SUPERMETAL_AGENT_TASK_ROLE_ARN}",
"${SUPERMETAL_VALIDATION_LAMBDA_ROLE_ARN}",
"${SUPERMETAL_TASK_EXECUTION_ROLE_ARN}",
"${SUPERMETAL_AGENT_EC2_INSTANCE_ROLE_ARN}"
],
"Condition": {
"StringEquals": {
"iam:PassedToService": ["ecs-tasks.amazonaws.com", "lambda.amazonaws.com", "ec2.amazonaws.com"]
}
}
}
]
}
}
}
}
```
This role is assumed by the Supermetal Control Plane. It is granted permissions to orchestrate and manage the lifecycle of resources required for the Supermetal data plane within your AWS account.
* **ECS Resources**
* Create and manage ECS clusters, services, tasks, and capacity providers (e.g., `ecs:CreateCluster`, `ecs:DeleteCluster`, `ecs:CreateService`, `ecs:RunTask`)
* Resources are restricted to those tagged or named with `supermetal-*` prefix
* **EC2 and Auto Scaling**
* Manage EC2 instances and launch templates (`ec2:CreateLaunchTemplate`, `ec2:DescribeInstances`, `ec2:CreateTags`)
* Control Auto Scaling groups (`autoscaling:CreateAutoScalingGroup`, `autoscaling:UpdateAutoScalingGroup`)
* Access is conditioned on the resources carrying a `supermetal` tag (`aws:ResourceTag/supermetal`)
* Network interface management for VPC access (`ec2:CreateNetworkInterface`, `ec2:DescribeNetworkInterfaces`)
* Security group discovery (`ec2:DescribeSecurityGroups`) for network configuration
* **Lambda Functions**
* Deploy and manage validation functions named `supermetal-validation-*` (`lambda:CreateFunction`, `lambda:InvokeFunction`, `lambda:DeleteFunction`)
* VPC access permissions for Lambda functions
* Pre-deployment validation and auxiliary tasks within your VPC
* **Database Access**
* Read-only discovery of potential data sources (`rds:DescribeDBInstances`, `rds:DescribeDBClusters`, `redshift:DescribeClusters`)
* Scoped to resources within specified VPC (`ec2:VpcId`)
* **Storage and Data Management**
* S3 bucket management (`s3:CreateBucket`, `s3:DeleteBucket`, `s3:PutBucketTagging`) for buckets prefixed with `supermetal-*`
* Used for data buffering and staging per connector
* **Security and Access Control**
* Secrets Manager: Full control over secrets prefixed with `supermetal-*` (`secretsmanager:CreateSecret`, `secretsmanager:GetSecretValue`)
* KMS: Encryption permissions (`kms:Encrypt`) on specified KMS keys
* IAM: Pass roles to AWS services (`iam:PassRole`) for ECS tasks, Lambda functions, and EC2 instances
***
#### SupermetalAgentTaskRole
```json
{
"SupermetalAgentTaskRole": {
"Type": "AWS::IAM::Role",
"Properties": {
"RoleName": "SupermetalAgentTaskRole-${VPC_ID}",
"Tags": [
{ "Key": "Name", "Value": "SupermetalAgentTaskRole-${VPC_ID}" },
{ "Key": "supermetal:byoc:vpc-id", "Value": "${VPC_ID}" },
{ "Key": "supermetal:byoc:stack-name", "Value": "${STACK_NAME}" },
{ "Key": "supermetal:byoc:external-id", "Value": "${EXTERNAL_ID}" }
],
"AssumeRolePolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": { "Service": "ecs-tasks.amazonaws.com" },
"Action": "sts:AssumeRole",
"Condition": {
"ArnLike": { "aws:SourceArn": "arn:aws:ecs:${AWS_REGION}:${AWS_ACCOUNT_ID}:task/supermetal-*" },
"StringEquals": { "aws:SourceAccount": "${AWS_ACCOUNT_ID}" }
}
}
]
},
"ManagedPolicyArns": [
"arn:aws:iam::${AWS_ACCOUNT_ID}:policy/SupermetalAgentTaskPolicy-${VPC_ID}"
],
"Path": "/"
}
}
}
```
```json
{
"SupermetalAgentTaskPolicy": {
"Type": "AWS::IAM::ManagedPolicy",
"Properties": {
"ManagedPolicyName": "SupermetalAgentTaskPolicy-${VPC_ID}",
"PolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["kms:Decrypt", "kms:Encrypt"],
"Resource": "${CUSTOMER_CREDENTIALS_KMS_KEY_ARN}"
},
{
"Effect": "Allow",
"Action": ["secretsmanager:GetSecretValue"],
"Resource": "arn:aws:secretsmanager:${AWS_REGION}:${AWS_ACCOUNT_ID}:secret:supermetal-*"
},
{
"Effect": "Allow",
"Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject", "s3:ListBucket"],
"Resource": [
"arn:aws:s3:::supermetal-*",
"arn:aws:s3:::supermetal-*/*"
]
}
]
}
}
}
}
```
This IAM role is assumed by the Supermetal agent tasks running on ECS within your AWS account. It grants the agent the necessary permissions to perform its data processing duties:
* **Data Security and Encryption**
* KMS: Encrypt and decrypt operations (`kms:Decrypt`, `kms:Encrypt`) on specified KMS key
* Used for connector credential decryption
* **Configuration Management**
* Secrets Manager: Access to secrets (`secretsmanager:GetSecretValue`) prefixed with `supermetal-*`
* Retrieves the API keys used to reach the Supermetal control plane and to authenticate to the ECR repository that hosts agent images
* **Data Processing**
* S3: Full object operations on `supermetal-*` buckets (`s3:PutObject`, `s3:GetObject`, `s3:DeleteObject`, `s3:ListBucket`)
* Used to buffer data in the per-connector buckets created by the control plane
***
#### SupermetalTaskExecutionRole
```json
{
"SupermetalTaskExecutionRole": {
"Type": "AWS::IAM::Role",
"Properties": {
"RoleName": "SupermetalTaskExecutionRole-${VPC_ID}",
"Tags": [
{ "Key": "Name", "Value": "SupermetalTaskExecutionRole-${VPC_ID}" },
{ "Key": "supermetal:byoc:vpc-id", "Value": "${VPC_ID}" },
{ "Key": "supermetal:byoc:stack-name", "Value": "${STACK_NAME}" },
{ "Key": "supermetal:byoc:external-id", "Value": "${EXTERNAL_ID}" }
],
"AssumeRolePolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": { "Service": "ecs-tasks.amazonaws.com" },
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": { "aws:SourceAccount": "${AWS_ACCOUNT_ID}" },
"ArnLike": { "aws:SourceArn": "arn:aws:ecs:${AWS_REGION}:${AWS_ACCOUNT_ID}:task-definition/supermetal*:*" }
}
}
]
},
"ManagedPolicyArns": [
"arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy",
"arn:aws:iam::${AWS_ACCOUNT_ID}:policy/SupermetalTaskExecutionPolicy-${VPC_ID}"
],
"Path": "/"
}
}
}
```
```json
{
"SupermetalTaskExecutionPolicy": {
"Type": "AWS::IAM::ManagedPolicy",
"Properties": {
"ManagedPolicyName": "SupermetalTaskExecutionPolicy-${VPC_ID}",
"PolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["secretsmanager:GetSecretValue"],
"Resource": "arn:aws:secretsmanager:${AWS_REGION}:${AWS_ACCOUNT_ID}:secret:supermetal-*"
},
{
"Effect": "Allow",
"Action": ["kms:Decrypt"],
"Resource": "${CUSTOMER_CREDENTIALS_KMS_KEY_ARN}"
}
]
}
}
}
}
```
This role is used by Amazon ECS to launch your agent tasks. It has two sets of permissions:
* **Core ECS Operations**
* Uses AWS managed policy `arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy`
* Pull container images from Amazon ECR
* Write container logs to CloudWatch Logs
* **Agent Configuration**
* Custom policy `SupermetalTaskExecutionPolicy` provides:
* Access to Secrets Manager (`secretsmanager:GetSecretValue`) for fetching Control Plane API secrets
* KMS decryption (`kms:Decrypt`) to decrypt connector credentials
* Makes configuration available to agent containers via environment variables
***
#### SupermetalAgentEC2InstanceRole
```json
{
"SupermetalAgentEC2InstanceRole": {
"Type": "AWS::IAM::Role",
"Properties": {
"RoleName": "SupermetalAgentEC2InstanceRole-${VPC_ID}",
"Tags": [
{ "Key": "Name", "Value": "SupermetalAgentEC2InstanceRole-${VPC_ID}" },
{ "Key": "supermetal:byoc:vpc-id", "Value": "${VPC_ID}" },
{ "Key": "supermetal:byoc:stack-name", "Value": "${STACK_NAME}" },
{ "Key": "supermetal:byoc:external-id", "Value": "${EXTERNAL_ID}" }
],
"AssumeRolePolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": { "Service": "ec2.amazonaws.com" },
"Action": "sts:AssumeRole"
}
]
},
"ManagedPolicyArns": [
"arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role",
"arn:aws:iam::${AWS_ACCOUNT_ID}:policy/SupermetalAgentEC2InstancePolicy-${VPC_ID}"
],
"Path": "/"
}
}
}
```
```json
{
"SupermetalAgentEC2InstancePolicy": {
"Type": "AWS::IAM::ManagedPolicy",
"Properties": {
"ManagedPolicyName": "SupermetalAgentEC2InstancePolicy-${VPC_ID}",
"PolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["ecr:GetAuthorizationToken"],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": ["ecr:BatchGetImage", "ecr:GetDownloadUrlForLayer"],
"Resource": "arn:aws:ecr:${AWS_REGION}:${SUPERMETAL_CONTROL_PLANE_AWS_ACCOUNT_ID}:repository/agent/*"
}
]
}
}
}
}
```
This role is attached to EC2 instances that form your ECS cluster's capacity when using the EC2 launch type. It has two sets of permissions:
* **Core ECS Instance Operations**
* Uses AWS managed policy `arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role`
* Register with ECS cluster
* Pull common container images
* Send instance logs to CloudWatch Logs
* **Supermetal Agent Access**
* Custom policy `SupermetalAgentEC2InstancePolicy` provides:
* ECR authentication (`ecr:GetAuthorizationToken`)
* Image pulling from Supermetal's ECR repository (`ecr:BatchGetImage`, `ecr:GetDownloadUrlForLayer`)
* Enables instances to pull agent container images from Supermetal's control plane
***
#### SupermetalValidationLambdaRole
```json
{
"SupermetalValidationLambdaRole": {
"Type": "AWS::IAM::Role",
"Properties": {
"RoleName": "SupermetalValidationLambdaRole-${VPC_ID}",
"Tags": [
{ "Key": "Name", "Value": "SupermetalValidationLambdaRole-${VPC_ID}" },
{ "Key": "supermetal:byoc:vpc-id", "Value": "${VPC_ID}" },
{ "Key": "supermetal:byoc:stack-name", "Value": "${STACK_NAME}" },
{ "Key": "supermetal:byoc:external-id", "Value": "${EXTERNAL_ID}" }
],
"AssumeRolePolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": { "Service": "lambda.amazonaws.com" },
"Action": "sts:AssumeRole"
}
]
},
"ManagedPolicyArns": [
"arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"
],
"Path": "/"
}
}
}
```
This role is assumed by Lambda functions that perform pre-deployment validation tasks. It provides:
* **VPC Access and Networking**
* Uses AWS managed policy `arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole`
* Create and manage elastic network interfaces
* Access resources within your VPC
* Send function logs to CloudWatch
# Azure (/docs/main/concepts/deployment/byoc/azure)
import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Step, Steps } from 'fumadocs-ui/components/steps';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
import ByocAzureDiagram from '../../../../../../app/diagrams/byoc_azure';
Follow this guide to provision Supermetal BYOC (Bring Your Own Cloud), which deploys Supermetal's data replication agents in your own Azure environment so that you retain full control and ownership of your data.
This provides an additional layer of security and isolation. Supermetal handles provisioning, operations, and maintenance.
***
## Overview
Supermetal's BYOC model for Azure separates responsibilities between Supermetal's control plane and your Azure environment. The data replication agents run within your Azure Virtual Network (VNet), giving you full control over your data while Supermetal manages the agent lifecycle.
| Phase | Resource | Description |
| ---------------------- | -------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Bootstrap** | Azure AD Service Principal | Represents Supermetal's application and allows it to authenticate to your Azure AD tenant with specific permissions. |
| | Azure Role Definition | Defines the precise set of permissions Supermetal requires to manage resources in your Azure subscription. Designed for least privilege. |
| | Azure Role Assignment | Assigns the Custom Role to Supermetal's Service Principal, scoped to a designated Resource Group. |
| | Azure Key Vault | Securely stores secrets and keys used by Supermetal for bootstrap information and by agents for runtime credentials. You can use your own Customer-Managed Keys (CMK). |
| | Private Endpoint | An Azure Private Endpoint for secure communication between Supermetal agents and the Supermetal control plane via Azure Private Link. |
| | Network Security Group Rules | Security rules to allow necessary traffic for the Supermetal agents and Private Endpoint. |
| **Registration** | Key Vault Secrets | Secrets stored in your Azure Key Vault for agent bootstrap and operation. |
| **Connector Creation** | Virtual Machine Scale Set (VMSS) | Hosts the Supermetal data replication agents with dedicated hardware capacity. |
| | User-Assigned Managed Identity | Assigned to the VMSS instances, granting agents permissions to access necessary Azure resources. |
| | Azure Blob Storage | Used by agents for temporary buffering or staging data if required by the replication process. |
## Prerequisites
To successfully deploy Supermetal BYOC in your Azure environment, please ensure you have the following:
#### Supermetal Account Identifiers
You'll need the following Supermetal Azure identifiers (obtainable from the Supermetal Console under Settings):
* `supermetal_tenant_id`: Tenant ID of Supermetal's Azure AD
* `supermetal_app_id`: The Application (Client) ID of Supermetal's Azure AD application registration
* `supermetal_principal_id`: The Object ID of Supermetal's service principal (required for role assignments)
* `external_id`: A unique identifier for securing cross-tenant access
#### Azure Account & Permissions
* An Azure subscription with Owner or User Access Administrator permissions to create service principals and assign roles
#### Deployment Parameters
* **Azure Subscription ID (`subscription_id`):** The ID of your Azure subscription where resources will be deployed
* **Azure Region (`location`):** The Azure region for deployment (e.g., `eastus2`, `westeurope`)
* **Resource Group (`resource_group_name`):** The name of a Resource Group where Supermetal will deploy resources
* **VNet ID (`vnet_id`):** The resource ID of your existing Virtual Network where Supermetal agents will be deployed
* **Subnet IDs (`private_subnet_ids`):** A list of subnet resource IDs within your specified VNet. Supermetal agents will be deployed in these subnets. Ensure these subnets are in the same region as your source (and target) databases
* **(Optional) Key Vault ID (`credentials_key_vault_id`):** Decide if you'll use an existing Azure Key Vault for storing sensitive connector credentials. If you have an existing Key Vault, have its resource ID ready. If left blank, the bootstrap process will create a new Key Vault specifically for Supermetal
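Several of these parameters are full Azure resource IDs. As a sanity check while assembling them, the resource name and resource group can be read back out of an ID with plain shell string operations (the ID below is a made-up placeholder, not a real subscription):

```shell
# Placeholder VNet resource ID (illustrative only)
VNET_ID="/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/network-rg/providers/Microsoft.Network/virtualNetworks/production-vnet"

# The VNet name is the last path segment
VNET_NAME="${VNET_ID##*/}"

# The resource group is the segment following "resourceGroups"
VNET_RG="${VNET_ID#*/resourceGroups/}"
VNET_RG="${VNET_RG%%/*}"

echo "vnet=${VNET_NAME} resource_group=${VNET_RG}"  # → vnet=production-vnet resource_group=network-rg
```

The bootstrap Terraform script in the appendix performs the same parsing internally, so a mismatch here usually means a malformed ID.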
***
## Setup
Setup is a two-phase process: first bootstrap the environment, then create connectors to deploy the data plane / agents in your Azure VNet.
#### Environment Bootstrap
The bootstrap process uses a Terraform script provided by Supermetal to provision the necessary resources, including the Service Principal, Role, and private endpoint.
These resources enable Supermetal to automatically register your Azure environment with the Supermetal control plane over the private endpoint, and to deploy Supermetal data plane / agents in your VNet.
The initial environment bootstrap process does not provision any compute resources or the data plane / agents. Once the environment bootstrap is complete, you can create connectors to deploy Supermetal data plane / agents on-demand in your VNet.
#### Obtain Terraform Script \[!toc]
* Obtain the latest Terraform script from the Supermetal Console. The Terraform code in the [appendix](#environment-bootstrap-scripts) can be used as a reference.
#### Configure Terraform \[!toc]
* Ensure you have [Terraform installed](https://learn.hashicorp.com/tutorials/terraform/install-cli) (version 1.0 or later).
* Set up your Azure credentials for Terraform. This typically involves authenticating with the Azure CLI using `az login` and ensuring you have the appropriate permissions.
* Verify access by running `az account show`. You should see your Azure subscription details.
#### Prepare Terraform Files \[!toc]
* Create a new directory for your Terraform configuration.
* If using the appendix script, save the Terraform code into files within this directory (e.g., `main.tf`).
* Create a `terraform.tfvars` file in the same directory to specify the required variable values. Alternatively, you can pass variables via the command line.
**Example `terraform.tfvars`:**
```tfvars
# Supermetal identifiers (provided by Supermetal)
supermetal_tenant_id = "00000000-0000-0000-0000-000000000000"
supermetal_app_id = "00000000-0000-0000-0000-000000000000"
supermetal_principal_id = "00000000-0000-0000-0000-000000000000"
external_id = "unique-external-id-from-supermetal"
# Azure environment parameters
subscription_id = "00000000-0000-0000-0000-000000000000"
location = "eastus2"
resource_group_name = "supermetal-byoc-rg"
vnet_id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/network-rg/providers/Microsoft.Network/virtualNetworks/production-vnet"
private_subnet_ids = [
"/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/network-rg/providers/Microsoft.Network/virtualNetworks/production-vnet/subnets/data-subnet",
"/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/network-rg/providers/Microsoft.Network/virtualNetworks/production-vnet/subnets/private-endpoint-subnet"
]
# credentials_key_vault_id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/security-rg/providers/Microsoft.KeyVault/vaults/my-existing-keyvault" # Optional
```
Refer to the [Deployment Parameters](#deployment-parameters) section and the variable descriptions within the Terraform script for detailed explanations of each.
* `supermetal_tenant_id` - Tenant ID of Supermetal's Azure AD
* `supermetal_app_id` - Application (Client) ID of Supermetal's Azure AD application
* `supermetal_principal_id` - Object ID of Supermetal's service principal
* `external_id` - Unique identifier for securing cross-tenant access
* `subscription_id` - Your Azure subscription ID
* `location` - Azure region for deployment
* `resource_group_name` - Resource group name for Supermetal resources
* `vnet_id` - Resource ID of your existing VNet
* `private_subnet_ids` - List of subnet resource IDs within your VNet
* `credentials_key_vault_id` (Optional) - Resource ID of your existing Key Vault
#### Deploy with Terraform CLI \[!toc]
* Navigate to your Terraform configuration directory in your terminal.
* Initialize Terraform:
```bash
terraform init
```
* Review the execution plan with your tfvars file:
```bash
terraform plan -var-file=terraform.tfvars
```
* Apply the configuration using your tfvars file:
```bash
terraform apply -var-file=terraform.tfvars
```
Confirm the action by typing `yes` when prompted.
#### Verification \[!toc]
* Monitor the `terraform apply` command output. It will indicate when the resources have been successfully created.
* Once the Terraform apply is complete, Supermetal's Control Plane automatically performs a registration process for the new environment.
* Monitor the Environments page in your Supermetal Console. The environment should show up with a "Ready" status within a few minutes after Terraform completes.
#### Create Connector(s) to deploy Supermetal agents
With the environment bootstrap complete, you are ready to create a connector that deploys Supermetal agents in your Azure environment.
#### Select source database \[!toc]
Follow the connector setup instructions in the Supermetal Console to select a source database. Supermetal can optionally [auto discover](#database_discovery) available source databases.
#### Configure Database Network Security Group \[!toc]
To give Supermetal access to your source database and to validate network reachability and connection credentials, ensure your database's network security group allows connections from the Supermetal agent subnet.
The Supermetal Console lists pre-filled prerequisite steps for configuring database access, similar to the steps below.
```bash
# Set variables for your environment
RESGROUP="your-resource-group-name"
CONNECTOR_NAME="connector1" # Unique identifier for this connector
NSG_NAME="supermetal-${CONNECTOR_NAME}-nsg" # Unique NSG name for this connector
DB_NSG_NAME="your-db-nsg-name"
VNET_NAME="your-vnet-name"
AGENT_SUBNET_NAME="your-agent-subnet-name" # Subnet where Supermetal agents will be deployed
LOCATION="eastus2"
DB_PORT=5432 # For PostgreSQL
SUPERMETAL_METADATA_ENDPOINT="Provided-By-Supermetal" # Specific endpoint for Supermetal metadata service
# Create the NSG for Supermetal agents
az network nsg create \
--resource-group $RESGROUP \
--name $NSG_NAME \
--location $LOCATION
# Add outbound rule for metadata private endpoint access
az network nsg rule create \
--resource-group $RESGROUP \
--nsg-name $NSG_NAME \
--name "AllowMetadataAccess" \
--priority 100 \
--direction "Outbound" \
--access "Allow" \
--protocol "Tcp" \
--destination-port-ranges 443 \
--destination-address-prefixes $SUPERMETAL_METADATA_ENDPOINT \
--description "Allow access to Supermetal metadata service"
# Add outbound rule for database access
az network nsg rule create \
--resource-group $RESGROUP \
--nsg-name $NSG_NAME \
--name "AllowDatabaseAccess" \
--priority 110 \
--direction "Outbound" \
--access "Allow" \
--protocol "Tcp" \
--destination-port-ranges $DB_PORT \
--destination-address-prefixes "YOUR_DATABASE_SUBNET_PREFIX" \
--description "Allow access to database"
# Add inbound rule to database NSG to allow access from Supermetal agent
az network nsg rule create \
--resource-group $RESGROUP \
--nsg-name $DB_NSG_NAME \
--name "AllowSupermetalAgentAccess" \
--priority 100 \
--direction "Inbound" \
--access "Allow" \
--protocol "Tcp" \
--source-address-prefixes "YOUR_AGENT_SUBNET_PREFIX" \
--destination-port-ranges $DB_PORT \
--description "Allow Supermetal agent access to database"
# Associate the NSG with the agent subnet
az network vnet subnet update \
--resource-group $RESGROUP \
--vnet-name $VNET_NAME \
--name $AGENT_SUBNET_NAME \
--network-security-group $NSG_NAME
```
* `RESGROUP`: Your Azure resource group name where the NSGs will be created
* `CONNECTOR_NAME`: A unique identifier for this connector (important if you're deploying multiple connectors)
* `VNET_NAME`: The name of your Azure Virtual Network
* `AGENT_SUBNET_NAME`: The name of the subnet where Supermetal agents will be deployed
* `YOUR_DATABASE_SUBNET_PREFIX`: CIDR range of your database subnet (e.g., "10.0.2.0/24")
* `YOUR_AGENT_SUBNET_PREFIX`: CIDR range of the subnet where Supermetal agents will be deployed
* `DB_PORT`: The port number your database listens on (e.g., 5432 for PostgreSQL)
* `SUPERMETAL_METADATA_ENDPOINT`: Specific endpoint for the Supermetal metadata service (provided by Supermetal)
#### Create Network Security Group and Configure Database Access \[!toc]
1. Navigate to the **Azure Portal** > **Network Security Groups**
2. Click **Create network security group**
3. Configure the basic details:
* **Resource group:** Select your resource group
* **Name:** "supermetal-`${connector-name}`-nsg" (e.g., "supermetal-connector1-nsg")
* **Region:** Select the same region as your Azure resources
4. Click **Review + create** and then **Create**
#### Configure Outbound Rules \[!toc]
5. Once created, navigate to the new NSG
6. Select **Outbound security rules**
7. Click **Add** to create an outbound rule for metadata access:
* **Destination:** IP Addresses
* **Destination IP addresses:** Supermetal-provided metadata endpoint address
* **Service:** HTTPS
* **Port ranges:** 443
* **Priority:** 100
* **Name:** "AllowMetadataAccess"
8. Add another outbound rule for database access:
* **Destination:** IP Addresses
* **Destination IP addresses/CIDR ranges:** Your database subnet CIDR
* **Service:** Custom
* **Protocol:** TCP
* **Port ranges:** Your database port (e.g., 5432)
* **Priority:** 110
* **Name:** "AllowDatabaseAccess"
#### Configure Database NSG Inbound Rule \[!toc]
9. Navigate to your database's NSG
10. Select **Inbound security rules**
11. Click **Add** to create a new rule:
* **Source:** IP Addresses
* **Source IP addresses/CIDR ranges:** Your agent subnet CIDR
* **Service:** Custom
* **Protocol:** TCP
* **Port ranges:** Your database port (e.g., 5432)
* **Priority:** 100
* **Name:** "AllowSupermetal`${ConnectorName}`Access" (e.g., "AllowSupermetalConnector1Access")
12. Save the rule
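The rule name in step 11 follows the same convention the Terraform variant builds with `title()`. As a sketch, assuming a single all-lowercase connector name, it can be derived in POSIX shell:

```shell
# Hypothetical connector name; yields the inbound rule name used in step 11
CONNECTOR_NAME="connector1"

# Capitalize the first letter (the single-word equivalent of Terraform's title())
FIRST=$(printf '%s' "$CONNECTOR_NAME" | cut -c1 | tr '[:lower:]' '[:upper:]')
REST=$(printf '%s' "$CONNECTOR_NAME" | cut -c2-)
RULE_NAME="AllowSupermetal${FIRST}${REST}Access"

echo "$RULE_NAME"  # → AllowSupermetalConnector1Access
```

Keeping the connector name in the rule name matters when you deploy multiple connectors against the same database NSG, since rule names must be unique within an NSG.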
#### Associate NSG with Subnet \[!toc]
13. Navigate to **Virtual Networks** > your VNet > **Subnets**
14. Click on the subnet where Supermetal agents will be deployed
15. Under **Network security group**, select the NSG you created earlier
16. Click **Save**
#### Create Network Security Group and Configure Database Access \[!toc]
```terraform
# Variables for connector-specific resources
variable "connector_name" {
description = "Unique identifier for this connector (used in resource naming)"
type = string
default = "connector1"
}
variable "vnet_name" {
description = "Name of your virtual network"
type = string
}
variable "agent_subnet_name" {
description = "Name of the subnet where Supermetal agents will be deployed"
type = string
}
variable "supermetal_metadata_endpoint" {
description = "Specific endpoint for Supermetal metadata service (provided by Supermetal)"
type = string
}
# Remaining inputs referenced by the resources below
variable "location" {
description = "Azure region for deployment"
type = string
}
variable "resource_group_name" {
description = "Resource group for the agent NSG"
type = string
}
variable "database_port" {
description = "Port your database listens on (e.g., 5432 for PostgreSQL)"
type = string
default = "5432"
}
variable "database_subnet_cidr" {
description = "CIDR range of your database subnet (e.g., 10.0.2.0/24)"
type = string
}
variable "agent_subnet_cidr" {
description = "CIDR range of the subnet where Supermetal agents will be deployed"
type = string
}
variable "database_nsg_name" {
description = "Name of your existing database NSG"
type = string
}
variable "database_resource_group_name" {
description = "Resource group containing your database NSG"
type = string
}
# Create the agent network security group with unique name for this connector
resource "azurerm_network_security_group" "supermetal_agent_nsg" {
name = "supermetal-${var.connector_name}-nsg"
location = var.location
resource_group_name = var.resource_group_name
# Outbound rule for metadata private endpoint access
security_rule {
name = "AllowMetadataAccess"
priority = 100
direction = "Outbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_range = "443"
source_address_prefix = "*"
destination_address_prefix = var.supermetal_metadata_endpoint
description = "Allow access to Supermetal metadata service"
}
# Outbound rule for database access
security_rule {
name = "AllowDatabaseAccess"
priority = 110
direction = "Outbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_range = var.database_port
source_address_prefix = "*"
destination_address_prefix = var.database_subnet_cidr
description = "Allow access to database"
}
}
# Look up existing database NSG
data "azurerm_network_security_group" "database_nsg" {
name = var.database_nsg_name
resource_group_name = var.database_resource_group_name
}
# Add inbound rule to database NSG
resource "azurerm_network_security_rule" "allow_supermetal_agent" {
name = "AllowSupermetal${title(var.connector_name)}Access"
priority = 100
direction = "Inbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_range = var.database_port
source_address_prefix = var.agent_subnet_cidr
destination_address_prefix = "*"
resource_group_name = var.database_resource_group_name
network_security_group_name = var.database_nsg_name
}
# Associate the NSG with the agent subnet
data "azurerm_virtual_network" "vnet" {
name = var.vnet_name
resource_group_name = var.resource_group_name
}
data "azurerm_subnet" "agent_subnet" {
name = var.agent_subnet_name
virtual_network_name = var.vnet_name
resource_group_name = var.resource_group_name
}
resource "azurerm_subnet_network_security_group_association" "nsg_association" {
subnet_id = data.azurerm_subnet.agent_subnet.id
network_security_group_id = azurerm_network_security_group.supermetal_agent_nsg.id
}
# Output the NSG ID for reference
output "supermetal_agent_nsg_id" {
value = azurerm_network_security_group.supermetal_agent_nsg.id
}
```
* `connector_name`: Unique identifier for this connector (important if you're deploying multiple connectors)
* `resource_group_name`: Your Azure resource group name
* `location`: Azure region for deployment
* `vnet_name`: The name of your Azure Virtual Network
* `agent_subnet_name`: The name of the subnet where Supermetal agents will be deployed
* `database_subnet_cidr`: CIDR range of your database subnet (e.g., "10.0.2.0/24")
* `agent_subnet_cidr`: CIDR range of the subnet where Supermetal agents will be deployed
* `database_nsg_name`: The name of your existing database NSG
* `database_resource_group_name`: The resource group containing your database NSG
* `database_port`: The port number your database listens on (e.g., 5432 for PostgreSQL)
* `supermetal_metadata_endpoint`: Specific endpoint for Supermetal metadata service (provided by Supermetal)
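Putting these variables together, a hypothetical `terraform.tfvars` for the connector NSG configuration might look like the following (all values are placeholders to replace with your own):

```tfvars
connector_name               = "connector1"
resource_group_name          = "supermetal-byoc-rg"
location                     = "eastus2"
vnet_name                    = "production-vnet"
agent_subnet_name            = "data-subnet"
agent_subnet_cidr            = "10.0.1.0/24"
database_subnet_cidr         = "10.0.2.0/24"
database_nsg_name            = "your-db-nsg-name"
database_resource_group_name = "database-rg"
database_port                = "5432"
supermetal_metadata_endpoint = "Provided-By-Supermetal"
```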
#### Validate Source database \[!toc]
Follow the setup instructions in the Supermetal Console to input the created NSG ID for the source database and validate the network reachability and connection credentials.
#### Repeat for Target database \[!toc]
Repeat the same steps for the target database.
#### Finalize \[!toc]
Once the source and target database connections are validated, you can finalize the connector setup which will launch the Supermetal agents in your Azure VNet.
***
## Appendix
### Environment Bootstrap Scripts
Supermetal provides Terraform scripts for bootstrapping your Azure environment.
The scripts create all necessary resources and permissions for Supermetal to manage the data replication agents in your Azure subscription.
```terraform
terraform {
required_version = ">= 1.0"
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "~> 3.0"
}
azuread = {
source = "hashicorp/azuread"
version = "~> 2.0"
}
}
}
provider "azurerm" {
features {
key_vault {
purge_soft_delete_on_destroy = false
recover_soft_deleted_key_vaults = true
}
}
}
provider "azuread" {}
# =============================================================================
# Data Sources
# =============================================================================
data "azurerm_client_config" "current" {}
data "azurerm_subscription" "current" {}
# =============================================================================
# Input Variables
# =============================================================================
variable "resource_group_name" {
description = "Name of the Resource Group where Supermetal will deploy resources"
type = string
validation {
condition = can(regex("^[a-zA-Z0-9-_]+$", var.resource_group_name))
error_message = "Resource group name must contain only alphanumeric characters, hyphens, and underscores."
}
}
variable "location" {
description = "Azure region for deployment (e.g., eastus, westeurope)"
type = string
}
variable "vnet_id" {
description = "Resource ID of your existing Virtual Network (e.g., /subscriptions/.../resourceGroups/.../providers/Microsoft.Network/virtualNetworks/...)"
type = string
}
variable "private_subnet_ids" {
description = "List of private subnet Resource IDs where agents and private endpoints will be deployed"
type = list(string)
validation {
condition = length(var.private_subnet_ids) > 0
error_message = "At least one private subnet must be provided."
}
}
variable "credentials_key_vault_id" {
description = "Resource ID of existing Key Vault for storing credentials. If empty, a new Key Vault will be created"
type = string
default = ""
}
variable "external_id" {
description = "Unique identifier provided by Supermetal Console for securing cross-tenant access"
type = string
sensitive = true
}
variable "supermetal_tenant_id" {
description = "Azure AD Tenant ID of Supermetal Control Plane"
type = string
}
variable "supermetal_app_id" {
description = "Azure AD Application ID of Supermetal Control Plane"
type = string
}
variable "supermetal_principal_id" {
description = "Object ID of Supermetal's Service Principal"
type = string
}
variable "supermetal_metadata_service_id" {
description = "Resource ID of Supermetal Metadata Service's Private Link Service"
type = string
}
variable "supermetal_container_registry_name" {
description = "Name of Supermetal's container registry for agent images"
type = string
}
# =============================================================================
# Local Values
# =============================================================================
locals {
# Parse VNet information
vnet_parts = split("/", var.vnet_id)
vnet_name = element(local.vnet_parts, length(local.vnet_parts) - 1)
vnet_resource_group = element(local.vnet_parts, index(local.vnet_parts, "resourceGroups") + 1)
# Subnet selection
primary_subnet_id = var.private_subnet_ids[0]
# Key Vault configuration
create_key_vault = var.credentials_key_vault_id == ""
key_vault_id = local.create_key_vault ? azurerm_key_vault.credentials[0].id : var.credentials_key_vault_id
key_vault_name = local.create_key_vault ? "supermetal-kv-${substr(md5(var.resource_group_name), 0, 8)}" : ""
existing_kv_rg = local.create_key_vault ? "" : split("/", var.credentials_key_vault_id)[4]
existing_kv_name = local.create_key_vault ? "" : split("/", var.credentials_key_vault_id)[8]
# Common tags
common_tags = {
"supermetal:byoc:external-id" = var.external_id
"supermetal:byoc:resource-group" = var.resource_group_name
"supermetal:byoc:stack-name" = "supermetal-bootstrap-${var.resource_group_name}"
"supermetal:managed" = "true"
"Environment" = "Production"
}
}
# =============================================================================
# Resource Group Data Source
# =============================================================================
data "azurerm_resource_group" "target" {
name = var.resource_group_name
}
# =============================================================================
# Networking Data Sources
# =============================================================================
data "azurerm_virtual_network" "vnet" {
name = local.vnet_name
resource_group_name = local.vnet_resource_group
}
data "azurerm_subnet" "primary" {
name = element(split("/", local.primary_subnet_id), length(split("/", local.primary_subnet_id)) - 1)
virtual_network_name = local.vnet_name
resource_group_name = local.vnet_resource_group
}
# =============================================================================
# Custom Role Definition for Supermetal Control Plane
# =============================================================================
resource "azurerm_role_definition" "supermetal_control_plane" {
name = "SupermetalControlPlaneRole-${var.resource_group_name}"
scope = data.azurerm_subscription.current.id
description = "Allows Supermetal Control Plane to manage resources in customer subscription"
permissions {
actions = [
# ===== Virtual Machine Scale Sets =====
"Microsoft.Compute/virtualMachineScaleSets/read",
"Microsoft.Compute/virtualMachineScaleSets/write",
"Microsoft.Compute/virtualMachineScaleSets/delete",
"Microsoft.Compute/virtualMachineScaleSets/start/action",
"Microsoft.Compute/virtualMachineScaleSets/powerOff/action",
"Microsoft.Compute/virtualMachineScaleSets/restart/action",
"Microsoft.Compute/virtualMachineScaleSets/deallocate/action",
"Microsoft.Compute/virtualMachineScaleSets/manualUpgrade/action",
"Microsoft.Compute/virtualMachineScaleSets/scale/action",
"Microsoft.Compute/virtualMachineScaleSets/instanceView/read",
"Microsoft.Compute/virtualMachineScaleSets/skus/read",
# ===== VMSS Extensions (for Docker/Agent setup) =====
"Microsoft.Compute/virtualMachineScaleSets/extensions/read",
"Microsoft.Compute/virtualMachineScaleSets/extensions/write",
"Microsoft.Compute/virtualMachineScaleSets/extensions/delete",
# ===== Virtual Machines (VMSS instances) =====
"Microsoft.Compute/virtualMachines/read",
"Microsoft.Compute/virtualMachines/instanceView/read",
# ===== Managed Disks =====
"Microsoft.Compute/disks/read",
"Microsoft.Compute/disks/write",
"Microsoft.Compute/disks/delete",
# ===== Container Instances (for validation) =====
"Microsoft.ContainerInstance/containerGroups/read",
"Microsoft.ContainerInstance/containerGroups/write",
"Microsoft.ContainerInstance/containerGroups/delete",
"Microsoft.ContainerInstance/containerGroups/restart/action",
"Microsoft.ContainerInstance/containerGroups/stop/action",
"Microsoft.ContainerInstance/containerGroups/start/action",
"Microsoft.ContainerInstance/containerGroups/containers/logs/read",
"Microsoft.ContainerInstance/containerGroups/containers/exec/action",
# ===== Container Registry Access =====
"Microsoft.ContainerRegistry/registries/pull/read",
"Microsoft.ContainerRegistry/registries/artifacts/read",
"Microsoft.ContainerRegistry/registries/metadata/read",
# ===== Storage Accounts =====
"Microsoft.Storage/storageAccounts/read",
"Microsoft.Storage/storageAccounts/write",
"Microsoft.Storage/storageAccounts/delete",
"Microsoft.Storage/storageAccounts/listkeys/action",
"Microsoft.Storage/storageAccounts/listAccountSas/action",
"Microsoft.Storage/storageAccounts/blobServices/containers/read",
"Microsoft.Storage/storageAccounts/blobServices/containers/write",
"Microsoft.Storage/storageAccounts/blobServices/containers/delete",
# ===== Networking =====
"Microsoft.Network/virtualNetworks/subnets/join/action",
"Microsoft.Network/networkInterfaces/read",
"Microsoft.Network/networkSecurityGroups/read",
"Microsoft.Network/networkSecurityGroups/join/action",
# ===== Key Vault Management =====
"Microsoft.KeyVault/vaults/read",
# ===== Database Discovery (Read-only) - Scoped to resource group =====
"Microsoft.Sql/servers/read",
"Microsoft.Sql/servers/databases/read",
"Microsoft.DBforPostgreSQL/servers/read",
"Microsoft.DBforPostgreSQL/flexibleServers/read",
"Microsoft.DBforMySQL/servers/read",
"Microsoft.DBforMySQL/flexibleServers/read",
"Microsoft.DBforMariaDB/servers/read"
]
not_actions = []
data_actions = [
# ===== Key Vault Data Operations (scoped to supermetal-* items) =====
"Microsoft.KeyVault/vaults/secrets/getSecret/action",
"Microsoft.KeyVault/vaults/secrets/setSecret/action",
"Microsoft.KeyVault/vaults/keys/encrypt/action",
"Microsoft.KeyVault/vaults/keys/decrypt/action",
# ===== Storage Data Operations =====
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write",
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete",
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action"
]
not_data_actions = []
}
assignable_scopes = [
data.azurerm_resource_group.target.id
]
}
# =============================================================================
# Role Assignment for Supermetal Control Plane
# =============================================================================
resource "azurerm_role_assignment" "supermetal_control_plane" {
scope = data.azurerm_resource_group.target.id
role_definition_id = azurerm_role_definition.supermetal_control_plane.role_definition_resource_id
principal_id = var.supermetal_principal_id
principal_type = "ServicePrincipal"
# Condition to validate this is the correct service principal
condition = <<-EOT
(
@Request[Microsoft.Authorization/roleAssignments:principalId] == '${var.supermetal_principal_id}'
)
AND
(
@Request[Microsoft.Authorization/roleAssignments:principalType] == 'ServicePrincipal'
)
EOT
condition_version = "2.0"
description = "Supermetal Control Plane access with External ID: ${var.external_id}"
}
# =============================================================================
# User-Assigned Managed Identities
# =============================================================================
# Identity for Supermetal Agents (VMSS)
resource "azurerm_user_assigned_identity" "agent" {
name = "supermetal-agent-identity-${var.resource_group_name}"
location = var.location
resource_group_name = var.resource_group_name
tags = local.common_tags
}
# Identity for Validation Containers (ACI)
resource "azurerm_user_assigned_identity" "validation" {
name = "supermetal-validation-identity-${var.resource_group_name}"
location = var.location
resource_group_name = var.resource_group_name
tags = local.common_tags
}
# =============================================================================
# Key Vault for Credentials
# =============================================================================
resource "azurerm_key_vault" "credentials" {
count = local.create_key_vault ? 1 : 0
name = local.key_vault_name
location = var.location
resource_group_name = var.resource_group_name
tenant_id = data.azurerm_client_config.current.tenant_id
sku_name = "standard"
soft_delete_retention_days = 90
purge_protection_enabled = true
enable_rbac_authorization = false
network_acls {
default_action = "Deny"
bypass = "AzureServices"
ip_rules = []
virtual_network_subnet_ids = var.private_subnet_ids
}
tags = local.common_tags
}
# =============================================================================
# Key Vault Access Policies
# =============================================================================
# Access for Supermetal Control Plane (encrypt only)
resource "azurerm_key_vault_access_policy" "control_plane" {
key_vault_id = local.key_vault_id
tenant_id = var.supermetal_tenant_id
object_id = var.supermetal_principal_id
key_permissions = [
"Get",
"List",
"Encrypt"
]
secret_permissions = [
"Set",
"List"
]
}
# Access for Agent Identity (decrypt)
resource "azurerm_key_vault_access_policy" "agent" {
key_vault_id = local.key_vault_id
tenant_id = data.azurerm_client_config.current.tenant_id
object_id = azurerm_user_assigned_identity.agent.principal_id
key_permissions = [
"Get",
"Decrypt"
]
secret_permissions = [
"Get"
]
}
# Access for Validation Identity
resource "azurerm_key_vault_access_policy" "validation" {
key_vault_id = local.key_vault_id
tenant_id = data.azurerm_client_config.current.tenant_id
object_id = azurerm_user_assigned_identity.validation.principal_id
key_permissions = [
"Get",
"Decrypt"
]
secret_permissions = [
"Get"
]
}
# =============================================================================
# Network Security Groups
# =============================================================================
# NSG for Agent Metadata Access
resource "azurerm_network_security_group" "agent_metadata" {
name = "supermetal-agent-metadata-nsg-${var.resource_group_name}"
location = var.location
resource_group_name = var.resource_group_name
security_rule {
name = "AllowHttpsToMetadataEndpoint"
priority = 100
direction = "Outbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_range = "443"
source_address_prefix = "VirtualNetwork"
destination_address_prefix = "PrivateEndpoint"
description = "Allow HTTPS traffic to Supermetal metadata private endpoint"
}
security_rule {
name = "AllowAzureStorage"
priority = 110
direction = "Outbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_range = "443"
source_address_prefix = "VirtualNetwork"
destination_address_prefix = "Storage.${var.location}"
description = "Allow access to Azure Storage for buffer operations"
}
security_rule {
name = "AllowAzureKeyVault"
priority = 120
direction = "Outbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_range = "443"
source_address_prefix = "VirtualNetwork"
destination_address_prefix = "AzureKeyVault.${var.location}"
description = "Allow access to Azure Key Vault for credential retrieval"
}
tags = local.common_tags
}
# =============================================================================
# Private Endpoint for Metadata Service
# =============================================================================
resource "azurerm_private_endpoint" "metadata" {
name = "supermetal-metadata-endpoint-${var.resource_group_name}"
location = var.location
resource_group_name = var.resource_group_name
subnet_id = local.primary_subnet_id
private_service_connection {
name = "supermetal-metadata-connection"
private_connection_resource_id = var.supermetal_metadata_service_id
is_manual_connection = false
subresource_names = ["metadata"]
}
tags = local.common_tags
}
# =============================================================================
# Container Registry Access Role Assignments
# =============================================================================
# Role assignment for agent identity to pull images
resource "azurerm_role_assignment" "agent_acr_pull" {
scope = "/subscriptions/${var.supermetal_tenant_id}/resourceGroups/supermetal-control-plane/providers/Microsoft.ContainerRegistry/registries/${var.supermetal_container_registry_name}"
role_definition_name = "AcrPull"
principal_id = azurerm_user_assigned_identity.agent.principal_id
principal_type = "ServicePrincipal"
}
# Role assignment for validation identity to pull images
resource "azurerm_role_assignment" "validation_acr_pull" {
scope = "/subscriptions/${var.supermetal_tenant_id}/resourceGroups/supermetal-control-plane/providers/Microsoft.ContainerRegistry/registries/${var.supermetal_container_registry_name}"
role_definition_name = "AcrPull"
principal_id = azurerm_user_assigned_identity.validation.principal_id
principal_type = "ServicePrincipal"
}
# =============================================================================
# Outputs
# =============================================================================
output "control_plane_role_definition_id" {
description = "ID of the custom role definition for Supermetal Control Plane"
value = azurerm_role_definition.supermetal_control_plane.role_definition_resource_id
}
output "control_plane_role_assignment_id" {
description = "ID of the role assignment for Supermetal Control Plane"
value = azurerm_role_assignment.supermetal_control_plane.id
}
output "agent_identity_resource_id" {
description = "Resource ID of the Managed Identity for Supermetal agents"
value = azurerm_user_assigned_identity.agent.id
}
output "agent_identity_client_id" {
description = "Client ID of the Managed Identity for Supermetal agents"
value = azurerm_user_assigned_identity.agent.client_id
}
output "agent_identity_principal_id" {
description = "Principal ID of the Managed Identity for Supermetal agents"
value = azurerm_user_assigned_identity.agent.principal_id
}
output "validation_identity_resource_id" {
description = "Resource ID of the Managed Identity for validation containers"
value = azurerm_user_assigned_identity.validation.id
}
output "validation_identity_client_id" {
description = "Client ID of the Managed Identity for validation containers"
value = azurerm_user_assigned_identity.validation.client_id
}
output "key_vault_id" {
description = "Resource ID of the Key Vault for credentials"
value = local.key_vault_id
}
output "key_vault_uri" {
description = "URI of the Key Vault for credentials"
value = local.create_key_vault ? azurerm_key_vault.credentials[0].vault_uri : "https://${local.existing_kv_name}.vault.azure.net/"
}
output "metadata_endpoint_id" {
description = "Resource ID of the Private Endpoint for Supermetal Metadata Service"
value = azurerm_private_endpoint.metadata.id
}
output "metadata_endpoint_ip" {
description = "Private IP address of the Metadata Service endpoint"
value = azurerm_private_endpoint.metadata.private_service_connection[0].private_ip_address
}
output "agent_metadata_nsg_id" {
description = "Resource ID of the NSG for agent metadata access"
value = azurerm_network_security_group.agent_metadata.id
}
output "bootstrap_timestamp" {
description = "Timestamp when bootstrap was completed"
value = timestamp()
}
```
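After `terraform apply`, the outputs above can be consumed from other Terraform configurations, for example via a remote state data source. The sketch below is illustrative only: the backend type and state path are assumptions, so adjust them to wherever your bootstrap state actually lives.
```hcl
# Illustrative: reading the bootstrap outputs from another configuration.
data "terraform_remote_state" "supermetal_bootstrap" {
  backend = "local"

  config = {
    path = "../supermetal-bootstrap/terraform.tfstate"
  }
}

locals {
  # Client ID the agents present when authenticating as their managed identity
  agent_client_id = data.terraform_remote_state.supermetal_bootstrap.outputs.agent_identity_client_id

  # Key Vault URI used when retrieving connector credentials
  key_vault_uri = data.terraform_remote_state.supermetal_bootstrap.outputs.key_vault_uri
}
```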
***
### Azure Roles & Permissions
Bootstrap creates several Azure roles and managed identities with specific permissions necessary for Supermetal BYOC to function securely.
These roles ensure that the Supermetal Control Plane can manage resources in your Azure subscription on your behalf, and that the Supermetal agents have the necessary permissions to operate.
The principle of least privilege is applied to scope down permissions as much as possible.
Key roles created and their purposes:
#### Supermetal Control Plane Role
```hcl
resource "azurerm_role_definition" "supermetal_control_plane" {
name = "SupermetalControlPlaneRole-${var.resource_group_name}"
scope = data.azurerm_subscription.current.id
description = "Allows Supermetal Control Plane to manage resources in customer subscription"
permissions {
actions = [
# ===== Virtual Machine Scale Sets =====
"Microsoft.Compute/virtualMachineScaleSets/read",
"Microsoft.Compute/virtualMachineScaleSets/write",
"Microsoft.Compute/virtualMachineScaleSets/delete",
"Microsoft.Compute/virtualMachineScaleSets/start/action",
"Microsoft.Compute/virtualMachineScaleSets/powerOff/action",
"Microsoft.Compute/virtualMachineScaleSets/restart/action",
"Microsoft.Compute/virtualMachineScaleSets/deallocate/action",
"Microsoft.Compute/virtualMachineScaleSets/manualUpgrade/action",
"Microsoft.Compute/virtualMachineScaleSets/scale/action",
"Microsoft.Compute/virtualMachineScaleSets/instanceView/read",
"Microsoft.Compute/virtualMachineScaleSets/skus/read",
# ===== VMSS Extensions (for Docker/Agent setup) =====
"Microsoft.Compute/virtualMachineScaleSets/extensions/read",
"Microsoft.Compute/virtualMachineScaleSets/extensions/write",
"Microsoft.Compute/virtualMachineScaleSets/extensions/delete",
# ===== Virtual Machines (VMSS instances) =====
"Microsoft.Compute/virtualMachines/read",
"Microsoft.Compute/virtualMachines/instanceView/read",
# ===== Managed Disks =====
"Microsoft.Compute/disks/read",
"Microsoft.Compute/disks/write",
"Microsoft.Compute/disks/delete",
# ===== Container Instances (for validation) =====
"Microsoft.ContainerInstance/containerGroups/read",
"Microsoft.ContainerInstance/containerGroups/write",
"Microsoft.ContainerInstance/containerGroups/delete",
"Microsoft.ContainerInstance/containerGroups/restart/action",
"Microsoft.ContainerInstance/containerGroups/stop/action",
"Microsoft.ContainerInstance/containerGroups/start/action",
"Microsoft.ContainerInstance/containerGroups/containers/logs/read",
"Microsoft.ContainerInstance/containerGroups/containers/exec/action",
# ===== Container Registry Access =====
"Microsoft.ContainerRegistry/registries/pull/read",
"Microsoft.ContainerRegistry/registries/artifacts/read",
"Microsoft.ContainerRegistry/registries/metadata/read",
# ===== Storage Accounts =====
"Microsoft.Storage/storageAccounts/read",
"Microsoft.Storage/storageAccounts/write",
"Microsoft.Storage/storageAccounts/delete",
"Microsoft.Storage/storageAccounts/listkeys/action",
"Microsoft.Storage/storageAccounts/listAccountSas/action",
"Microsoft.Storage/storageAccounts/blobServices/containers/read",
"Microsoft.Storage/storageAccounts/blobServices/containers/write",
"Microsoft.Storage/storageAccounts/blobServices/containers/delete",
# ===== Networking =====
"Microsoft.Network/virtualNetworks/subnets/join/action",
"Microsoft.Network/networkInterfaces/read",
"Microsoft.Network/networkSecurityGroups/read",
"Microsoft.Network/networkSecurityGroups/join/action",
# ===== Key Vault Management =====
"Microsoft.KeyVault/vaults/read",
# ===== Database Discovery (Read-only) =====
"Microsoft.Sql/servers/read",
"Microsoft.Sql/servers/databases/read",
"Microsoft.DBforPostgreSQL/servers/read",
"Microsoft.DBforPostgreSQL/flexibleServers/read",
"Microsoft.DBforMySQL/servers/read",
"Microsoft.DBforMySQL/flexibleServers/read",
"Microsoft.DBforMariaDB/servers/read"
]
not_actions = []
data_actions = [
# ===== Key Vault Data Operations (scoped to supermetal-* items) =====
"Microsoft.KeyVault/vaults/secrets/getSecret/action",
"Microsoft.KeyVault/vaults/secrets/setSecret/action",
"Microsoft.KeyVault/vaults/keys/encrypt/action",
"Microsoft.KeyVault/vaults/keys/decrypt/action",
# ===== Storage Data Operations =====
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write",
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete",
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action"
]
not_data_actions = []
}
assignable_scopes = [
data.azurerm_resource_group.target.id
]
}
```
This role is assigned to the Supermetal Service Principal. It enables the Supermetal Control Plane to orchestrate and manage the lifecycle of resources required for the Supermetal data plane within your Azure subscription.
* **Virtual Machine Scale Sets Management**
* Create, read, update, and delete VMSS resources (`Microsoft.Compute/virtualMachineScaleSets/*`)
* Control instance lifecycle (start, stop, restart, scale) for agent deployment
* Manage VMSS extensions for Docker/agent setup
* Read-only access to VM instances (`Microsoft.Compute/virtualMachines/read`)
* **Container Instances and Registry**
* Full management of Container Instances for validation tasks (`Microsoft.ContainerInstance/containerGroups/*`)
* Pull access to container registry for agent images (`Microsoft.ContainerRegistry/registries/pull/read`)
* Used for deploying validation containers
* **Storage and Data Management**
* Storage Account management (`Microsoft.Storage/storageAccounts/*`)
* Used for data buffering and staging per connector
* **Networking**
  * Join subnets for resource deployment (`Microsoft.Network/virtualNetworks/subnets/join/action`)
  * Read network interfaces attached to deployed resources (`Microsoft.Network/networkInterfaces/read`)
  * Read and join network security groups (`Microsoft.Network/networkSecurityGroups/read`, `Microsoft.Network/networkSecurityGroups/join/action`)
* **Security and Access Control**
* Key Vault read access (`Microsoft.KeyVault/vaults/read`)
* Key Vault data operations (get/set secrets, encrypt/decrypt with keys)
* Restricted to resource group scope specified during bootstrap
* **Database Discovery**
* Read-only access to database resources (`Microsoft.Sql/servers/read`, `Microsoft.DBforPostgreSQL/*/read`, etc.)
* Limited to discovery capabilities for validating connections
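For reference, the bootstrap binds this definition to the Supermetal Service Principal with a role assignment along the lines below (this is the resource behind the `control_plane_role_assignment_id` output; the variable carrying the principal's object ID is an assumption here):
```hcl
# Sketch of the role assignment for the custom Control Plane role.
# var.supermetal_service_principal_object_id is an illustrative name.
resource "azurerm_role_assignment" "supermetal_control_plane" {
  scope              = data.azurerm_resource_group.target.id
  role_definition_id = azurerm_role_definition.supermetal_control_plane.role_definition_resource_id
  principal_id       = var.supermetal_service_principal_object_id
  principal_type     = "ServicePrincipal"
}
```
Scoping the assignment to the target resource group, rather than the subscription, keeps the blast radius consistent with the role's `assignable_scopes`.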
***
#### Agent and Validation Identities
```hcl
# Identity for Supermetal Agents (VMSS)
resource "azurerm_user_assigned_identity" "agent" {
name = "supermetal-agent-identity-${var.resource_group_name}"
location = var.location
resource_group_name = var.resource_group_name
tags = local.common_tags
}
# Access for Agent Identity (decrypt)
resource "azurerm_key_vault_access_policy" "agent" {
key_vault_id = local.key_vault_id
tenant_id = data.azurerm_client_config.current.tenant_id
object_id = azurerm_user_assigned_identity.agent.principal_id
key_permissions = [
"Get",
"Decrypt"
]
secret_permissions = [
"Get"
]
}
# Role assignment for agent identity to pull images
resource "azurerm_role_assignment" "agent_acr_pull" {
scope = "/subscriptions/${var.supermetal_tenant_id}/resourceGroups/supermetal-control-plane/providers/Microsoft.ContainerRegistry/registries/${var.supermetal_container_registry_name}"
role_definition_name = "AcrPull"
principal_id = azurerm_user_assigned_identity.agent.principal_id
principal_type = "ServicePrincipal"
}
```
This User-Assigned Managed Identity is attached to the VMSS instances running Supermetal agents. It provides the following permissions:
* **Key Vault Access**
* Get and decrypt operations on keys (`Get`, `Decrypt`)
* Retrieve secrets from Key Vault (`Get`)
* Used to securely access connection credentials
* **Container Registry Access**
* AcrPull role on Supermetal's container registry
* Enables pulling agent container images from Supermetal's registry
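The bootstrap only creates the identity; the VMSS itself is provisioned later by the Control Plane. The sketch below shows how the identity could end up attached to that scale set; all resource names, the SSH key variable, and the subnet variable are illustrative assumptions, not part of the bootstrap script.
```hcl
# Illustrative: a VMSS carrying the agent identity. Names and
# var.agent_ssh_public_key / var.agent_subnet_id are assumptions.
resource "azurerm_linux_virtual_machine_scale_set" "supermetal_agents" {
  name                = "supermetal-agents"
  resource_group_name = var.resource_group_name
  location            = var.location
  sku                 = "Standard_D2s_v3"
  instances           = 1
  admin_username      = "supermetal"

  admin_ssh_key {
    username   = "supermetal"
    public_key = var.agent_ssh_public_key
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts-gen2"
    version   = "latest"
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  network_interface {
    name    = "primary"
    primary = true

    ip_configuration {
      name      = "internal"
      primary   = true
      subnet_id = var.agent_subnet_id
    }
  }

  # The user-assigned identity gives instances Key Vault Get/Decrypt and
  # AcrPull without any credentials baked into the image.
  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.agent.id]
  }
}
```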
```hcl
# Identity for Validation Containers (ACI)
resource "azurerm_user_assigned_identity" "validation" {
name = "supermetal-validation-identity-${var.resource_group_name}"
location = var.location
resource_group_name = var.resource_group_name
tags = local.common_tags
}
# Access for Validation Identity
resource "azurerm_key_vault_access_policy" "validation" {
key_vault_id = local.key_vault_id
tenant_id = data.azurerm_client_config.current.tenant_id
object_id = azurerm_user_assigned_identity.validation.principal_id
key_permissions = [
"Get",
"Decrypt"
]
secret_permissions = [
"Get"
]
}
# Role assignment for validation identity to pull images
resource "azurerm_role_assignment" "validation_acr_pull" {
scope = "/subscriptions/${var.supermetal_tenant_id}/resourceGroups/supermetal-control-plane/providers/Microsoft.ContainerRegistry/registries/${var.supermetal_container_registry_name}"
role_definition_name = "AcrPull"
principal_id = azurerm_user_assigned_identity.validation.principal_id
principal_type = "ServicePrincipal"
}
```
This User-Assigned Managed Identity is used by Azure Container Instances (ACI) that may be deployed for validation tasks such as testing database connectivity or network reachability.
* **Key Vault Access**
* Similar permissions to the Agent identity (Get, Decrypt)
* Retrieves necessary connection credentials for validation
* **Container Registry Access**
* AcrPull role on Supermetal's container registry
* Enables pulling validation container images
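A validation container group launched by the Control Plane would carry this identity in a similar way. The following is a minimal sketch only; the group name, image reference, and sizing are assumptions.
```hcl
# Illustrative: an ACI group using the validation identity to pull its
# image and read credentials. Image name and sizing are assumptions.
resource "azurerm_container_group" "validation" {
  name                = "supermetal-validation"
  location            = var.location
  resource_group_name = var.resource_group_name
  os_type             = "Linux"

  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.validation.id]
  }

  image_registry_credential {
    server                    = "${var.supermetal_container_registry_name}.azurecr.io"
    user_assigned_identity_id = azurerm_user_assigned_identity.validation.id
  }

  container {
    name   = "validator"
    image  = "${var.supermetal_container_registry_name}.azurecr.io/validator:latest"
    cpu    = "0.5"
    memory = "1.0"
  }
}
```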
# BYOC (/docs/main/concepts/deployment/byoc)
import { Icons } from '../../../../../../app/icons';
import SupermetalHighLevelBYOCArchitectureDiagram from '../../../../../../app/diagrams/byoc';
The Bring Your Own Cloud (BYOC) / Hybrid model offers a balance between the ease of a managed service and the security and control of self-hosting. It separates Supermetal into two distinct components: a Control Plane managed by Supermetal, and a Data Plane that runs entirely within your cloud environment.
### Overview
* **Control Plane:** This is the centralized management and infrastructure orchestration layer hosted by Supermetal. It handles metadata, monitoring, and the entire connector lifecycle.
* **Data Plane:** This consists of Supermetal agents deployed within your Virtual Private Cloud (VPC) on your chosen cloud provider (AWS, Azure, GCP). The Data Plane handles all data processing and movement.
**Crucially, customer data never leaves your VPC or network boundary. The Control Plane only interacts with metadata.**
### Control Plane
The Supermetal-hosted Control Plane is responsible for:
* **Connector Lifecycle Management:**
* Storing connector definitions and configurations.
* Managing encrypted credentials securely.
* Maintaining connector state (e.g., WAL positions, LSNs for CDC).
* Orchestrating provisioning and de-provisioning workflows for agents in the Data Plane.
* Managing scaling, failover, and high availability of Data Plane agents.
* **Operational Interface:**
* Providing a centralized web console & REST APIs for creating, managing, and monitoring all connectors across your organization.
* **Analytics and Insights:**
* Aggregating logs, metrics, and performance analytics from all Data Plane agents.
* **Security and Governance:**
* Implementing SSO and Role-Based Access Control (RBAC) for managing users & permissions.
* Maintaining audit logs for all management operations.
### Data Plane
The Data Plane runs entirely within your cloud infrastructure (e.g., your AWS VPC, Azure VNet, or GCP VPC) and consists of:
* **Supermetal Agents:** These are the same lightweight, efficient agents used in the self-hosted model. They perform the actual data extraction from sources and loading to targets.
* **Network Security:** Operates entirely within your VPC's security boundaries, adhering to your network ACLs, security groups, and private networking configurations.
* **Data Confinement:** Customer data is processed and moved directly between sources and targets within the customer's VPC. It *never* traverses the public internet to the Control Plane or leaves the customer's designated network environment.
* **Metadata Exchange:** Only essential metadata (e.g., connector state, operational logs, performance metrics) is securely transmitted to the Control Plane using private endpoints for management and observability.
### Benefits of BYOC / Hybrid Deployment
This split, two-plane architecture offers significant advantages:
* **Data Sovereignty and Security:** Ensures that sensitive data remains within your secure cloud environment and network boundaries at all times, simplifying compliance with regulations like GDPR, HIPAA, and CCPA.
* **Reduced Egress Costs:** By processing data within your VPC, especially when sources and targets are in the same cloud region, you can significantly reduce or eliminate costly data egress charges.
* **Leverage Existing Infrastructure and Security:** Utilize your established cloud security posture, including IAM roles, security groups, VPC endpoints, private links, and monitoring tools.
* **Optimized Performance:** Data transfer occurs over your private network, potentially offering lower latency and higher bandwidth than solutions routing data externally.
* **Simplified Management:** The Supermetal-managed Control Plane offloads the operational burden of managing the orchestration, monitoring, and update mechanisms for the core replication logic, allowing your teams to focus on data integration tasks rather than infrastructure management.
* **Centralized Control with Distributed Execution:** Gain a unified view and control over all your data pipelines via the Control Plane, while the actual data processing happens securely and efficiently within your environment.
### Next Steps
|