| Feature | Notes |
| :------ | :---- |
| Bulk Insert | Uses `COPY FROM ... FORMAT BINARY` to load data in bulk, providing far better throughput for large transfers than row-by-row inserts. |
| Transactional Consistency | Guarantees that synced data reflects a consistent point in the upstream database's timeline, respecting transaction boundaries to maintain data integrity. |
| Schema Evolution | Automatic schema evolution handles changes in source table structures, including new columns and data type modifications. |
| Hard Deletes | Row deletions are applied with `DELETE` statements, ensuring deleted records are properly removed from the target database. |
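For reference, the bulk load path is equivalent to PostgreSQL's binary `COPY` protocol; in SQL form it looks roughly like this (the table name is illustrative):

```sql
-- Binary COPY is the fastest ingest path PostgreSQL offers for bulk data
COPY target_table FROM STDIN WITH (FORMAT binary);
```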
***
## Prerequisites
Before you begin, ensure you have:
* **Supported PostgreSQL Version:** PostgreSQL 9.4 or higher.
* [Generic PostgreSQL](https://www.postgresql.org/)
* [AWS RDS PostgreSQL](https://aws.amazon.com/rds/postgresql/)
* [AWS Aurora PostgreSQL](https://aws.amazon.com/rds/aurora/)
* [Google Cloud PostgreSQL](https://cloud.google.com/sql/postgresql)
* [Azure PostgreSQL](https://azure.microsoft.com/en-us/products/postgresql)
* **Database Admin Access:** Sufficient privileges to create users and grant permissions.
* **Network Connectivity:** Ensure that Supermetal's services can reach your PostgreSQL database. You may need to configure security groups.
***
## Setup
Log in to your PostgreSQL database and run the following:
```sql
-- Create a user for Supermetal
CREATE USER supermetal_user WITH PASSWORD 'strong-password';
-- Grant permissions
-- allows supermetal to connect to the database, create schemas and (temp) tables
GRANT CONNECT, CREATE, TEMPORARY ON DATABASE target_database TO supermetal_user;
```
Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the target in Supermetal.
Replace the placeholders in the script with your specific values:
* `target_database`: The name of your PostgreSQL database
For large snapshots, tune checkpoint frequency to maximize `COPY` throughput and avoid I/O storms. A checkpoint is triggered as soon as either of the following limits is reached:
* **`max_wal_size`**: Maximum WAL size before checkpoint. COPY generates WAL rapidly, and small values (default: 1GB) cause frequent checkpoints that throttle performance. Increase to 16GB - 64GB to allow bulk loads to complete with fewer I/O interruptions.
* **`checkpoint_timeout`**: Maximum time between checkpoints. Default (5min) triggers checkpoints before large snapshots complete. Increase to 30min - 1h to prevent premature interruptions during extended loads.
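Both settings can be applied without a restart; a minimal sketch (the values are illustrative, size them for your workload):

```sql
-- Raise checkpoint limits before a large initial load (illustrative values)
ALTER SYSTEM SET max_wal_size = '32GB';
ALTER SYSTEM SET checkpoint_timeout = '30min';
-- Both parameters are reloadable, so no server restart is needed
SELECT pg_reload_conf();
```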
## Data Types Mapping
| Apache Arrow DataType | PostgreSQL Type | Notes |
| :-------------------- | :----------------- | :---- |
| `Int8` | `SMALLINT` | |
| `Int16` | `SMALLINT` | |
| `Int32` | `INTEGER` | |
| `Int64` | `BIGINT` | |
| `UInt8` | `SMALLINT` | |
| `UInt16` | `INTEGER` | |
| `UInt32` | `BIGINT` | |
| `UInt64` | `NUMERIC` | |
| `Float16` | `REAL` | |
| `Float32` | `REAL` | |
| `Float64` | `DOUBLE PRECISION` | |
| `Decimal128(p,s)` | `NUMERIC(p,s)` | |
| `Decimal256(p,s)` | `NUMERIC(p,s)` | |
| Apache Arrow DataType | PostgreSQL Type | Notes |
| :------------------------- | :-------------- | :--------------------------------------------------------------------------------------------- |
| `Date32` | `DATE` | |
| `Date64` | `DATE` | |
| `Time32` | `TIME` | |
| `Time64` | `TIME` | |
| `Time64(ns)` | `BIGINT` | Nanosecond precision stored as integer because Postgres does not support nanosecond precision |
| `Timestamp(s/ms/us, [tz])` | `TIMESTAMP[Z]` | |
| `Timestamp(ns, [tz])` | `BIGINT` | Nanosecond timestamps stored as integer because Postgres does not support nanosecond precision |
| `Interval` | `INTERVAL` | |
| `Duration` | Not supported | Durations are explicitly not supported |
| Apache Arrow DataType | PostgreSQL Type | Notes |
| :-------------------- | :-------------- | :---- |
| `Utf8`, `LargeUtf8` | `TEXT` | |
| Apache Arrow DataType | PostgreSQL Type | Notes |
| :-------------------- | :-------------- | :------------------- |
| `Boolean` | `BOOLEAN` | Boolean (true/false) |
| Apache Arrow DataType | PostgreSQL Type | Notes |
| :--------------------------------------------------------------------------------------------------------- | :-------------- | :---- |
| `Utf8` [JSON Extension](https://arrow.apache.org/docs/format/CanonicalExtensions.html#json) (`arrow.json`) | `JSONB` | |
| `Struct` | Not supported | |
| `Map` | Not supported | |
| Apache Arrow DataType | PostgreSQL Type | Notes |
| :----------------------------------------- | :-------------- | :---- |
| `Binary`, `LargeBinary`, `FixedSizeBinary` | `BYTEA` | |
| Apache Arrow DataType | PostgreSQL Type | Notes |
| :-------------------------------------------- | :-------------- | :----------------------------------------------------------- |
| `List`, `LargeList`, `FixedSizeList` | `T[]` | Only one-dimensional arrays of primitive types are supported |
# PostgreSQL Setup (/docs/main/targets/postgres/setup)
## Prerequisites
Before you begin, ensure you have:
* **Supported PostgreSQL Version:** PostgreSQL 9.4 or higher.
* [Generic PostgreSQL](https://www.postgresql.org/)
* [AWS RDS PostgreSQL](https://aws.amazon.com/rds/postgresql/)
* [AWS Aurora PostgreSQL](https://aws.amazon.com/rds/aurora/)
* [Google Cloud PostgreSQL](https://cloud.google.com/sql/postgresql)
* [Azure PostgreSQL](https://azure.microsoft.com/en-us/products/postgresql)
* **Database Admin Access:** Sufficient privileges to create users and grant permissions.
* **Network Connectivity:** Ensure that Supermetal's services can reach your PostgreSQL database. You may need to configure security groups.
***
## Setup
Log in to your PostgreSQL database and run the following:
```sql
-- Create a user for Supermetal
CREATE USER supermetal_user WITH PASSWORD 'strong-password';
-- Grant permissions
-- allows supermetal to connect to the database, create schemas and (temp) tables
GRANT CONNECT, CREATE, TEMPORARY ON DATABASE target_database TO supermetal_user;
```
Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the target in Supermetal.
Replace the placeholders in the script with your specific values:
* `target_database`: The name of your PostgreSQL database
For large snapshots, tune checkpoint frequency to maximize `COPY` throughput and avoid I/O storms. A checkpoint is triggered as soon as either of the following limits is reached:
* **`max_wal_size`**: Maximum WAL size before checkpoint. COPY generates WAL rapidly, and small values (default: 1GB) cause frequent checkpoints that throttle performance. Increase to 16GB - 64GB to allow bulk loads to complete with fewer I/O interruptions.
* **`checkpoint_timeout`**: Maximum time between checkpoints. Default (5min) triggers checkpoints before large snapshots complete. Increase to 30min - 1h to prevent premature interruptions during extended loads.
# Databricks (/docs/main/targets/databricks)
import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Callout } from 'fumadocs-ui/components/callout';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
Databricks is a unified data platform that combines key features of data lakes and data warehouses.
This guide walks you through configuring your Databricks platform to work seamlessly with Supermetal.
***
## Features
| Feature | Notes |
| :------ | :---- |
| Schema Evolution | Automatic schema evolution |
| Soft Delete(s) | Row deletion is handled by setting a `_sm_deleted` boolean metadata column. |
***
## Prerequisites
Before you begin, ensure you have:
* **Supported Databricks Implementations:**
* [AWS](https://www.databricks.com/product/aws)
* [Azure](https://www.databricks.com/product/azure)
* [GCP](https://www.databricks.com/product/google-cloud)
* [Serverless Workspace](https://docs.databricks.com/aws/en/admin/workspace/serverless-workspaces)
* **Unity Catalog:** Unity Catalog enabled on your Databricks workspace.
* **SQL Warehouse:** [Serverless SQL Warehouse](https://docs.databricks.com/aws/en/compute/sql-warehouse/).
***
## Setup
### Configure Authentication
### Create a Service Principal
Follow the Databricks [documentation](https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m) to create a Service Principal.
Note down the Client ID and Client Secret.
### Create a Personal Access Token
Follow the Databricks [documentation](https://docs.databricks.com/aws/en/dev-tools/auth/pat) to create a PAT.
Note down the personal access token.
### Create a SQL Warehouse
Log in to the Databricks console to create a SQL Warehouse or use an existing one.
* Go to SQL > SQL warehouses > Create SQL warehouse
* Fill in and select the required fields, such as:
* Name
* Warehouse Size (2X Small)
* Warehouse Type (Serverless)
* Auto Stop (10 minutes)
* Scaling Min & Max (1)
* Unity Catalog (Enabled)
* etc.
* Click Create
* Once created, click on the Connection Details Tab
Note down the following details:
* Server Hostname (your-workspace.cloud.databricks.com)
* Warehouse ID (0123456789abcdef)
### Configure a Catalog
Log in to the Databricks console to choose a Catalog (or set up a new one following the [Databricks documentation](https://docs.databricks.com/aws/en/data-governance/unity-catalog/get-started#overview-of-unity-catalog-setup)).
* From Databricks workspace console, navigate to Data
* Choose a Catalog (`my_catalog`)
### Create a Volume
Supermetal uses the configured volume as a temporary staging area.
Follow the steps from the [Databricks documentation](https://docs.databricks.com/aws/en/volumes/utility-commands#create-a-volume).
* From Databricks workspace console, navigate to Catalog
* Choose the Catalog from the step above (`my_catalog`)
* Search or browse for the schema that you want to add the volume to and select it.
* Click on Create Volume and specify a Name.
* Click Create
```sql
CREATE VOLUME my_catalog.my_schema.my_volume;
```
Note down the following details:
* Catalog Name (my\_catalog)
* Volume Path (/Volumes/my\_catalog/my\_schema/my\_volume)
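Depending on your workspace defaults, the service principal may also need Unity Catalog grants on the catalog, schema, and volume. A sketch using standard Unity Catalog privileges; `<application-id>` is a placeholder for your service principal's application ID:

```sql
-- Illustrative grants; run as a catalog owner or metastore admin
GRANT USE CATALOG ON CATALOG my_catalog TO `<application-id>`;
GRANT USE SCHEMA, CREATE TABLE ON SCHEMA my_catalog.my_schema TO `<application-id>`;
GRANT READ VOLUME, WRITE VOLUME ON VOLUME my_catalog.my_schema.my_volume TO `<application-id>`;
```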
***
## Data Types Mapping
| Apache Arrow DataType | Databricks Type | Notes |
| :----------------------------------- | :--------------- | :------------------------------------------------------- |
| `Int8` | `TINYINT` | |
| `Int16` | `SMALLINT` | |
| `Int32` | `INT` | |
| `Int64` | `BIGINT` | |
| `UInt8` | `SMALLINT` | Promoted to signed 16-bit |
| `UInt16` | `INT` | Promoted to signed 32-bit |
| `UInt32` | `BIGINT` | Promoted to signed 64-bit |
| `UInt64` | `DECIMAL(20, 0)` | Mapped to decimal to preserve full unsigned 64-bit range |
| `Float16` | `FLOAT` | Upcast to Float32 in Parquet |
| `Float32` | `FLOAT` | |
| `Float64` | `DOUBLE` | |
| `Decimal128(p, s)` where p ≤ 38 | `DECIMAL(p, s)` | |
| `Decimal128(p, s)` where p > 38 | `STRING` | Precision exceeds Databricks maximum of 38 |
| `Decimal256(p, s)` where p ≤ 38 | `DECIMAL(p, s)` | Downcast to Decimal128 in Parquet |
| `Decimal256(p, s)` where p > 38 | `STRING` | Precision exceeds Databricks maximum of 38 |
| Apache Arrow DataType | Databricks Type |
| :-------------------- | :-------------- |
| `Boolean` | `BOOLEAN` |
| Apache Arrow DataType | Databricks Type | Notes |
| :-------------------- | :-------------- | :--------------------------------------------------------------- |
| `Date32` | `DATE` | |
| `Date64` | `DATE` | Converted to Date32 in Parquet |
| `Timestamp(s, tz)` | `TIMESTAMP_NTZ` | Converted to Timestamp(ms) in Parquet for proper annotation |
| `Timestamp(ms, tz)` | `TIMESTAMP_NTZ` | |
| `Timestamp(μs, tz)` | `TIMESTAMP_NTZ` | Databricks supports microsecond precision |
| `Timestamp(ns, tz)` | `TIMESTAMP_NTZ` | Converted to Timestamp(μs) in Parquet (Databricks max precision) |
| `Time32`, `Time64` | `STRING` | Databricks does not support TIME types |
| `Interval` | `STRING` | Databricks cannot read INTERVAL from Parquet |
| Apache Arrow DataType | Databricks Type |
| :-------------------- | :-------------- |
| `Utf8`, `LargeUtf8` | `STRING` |
| Apache Arrow DataType | Databricks Type |
| :---------------------- | :-------------- |
| `Binary`, `LargeBinary` | `BINARY` |
| Apache Arrow DataType | Databricks Type | Notes |
| :--------------------------------------------------------------------------------------------------------- | :-------------- | :---------------------------------------- |
| `Utf8` [JSON Extension](https://arrow.apache.org/docs/format/CanonicalExtensions.html#json) (`arrow.json`) | `STRING` | `VARIANT` will be supported in the future |
| Apache Arrow DataType | Databricks Type | Notes |
| :-------------------------------------------- | :-------------- | :----------------------------------- |
| `List`, `LargeList`, `FixedSizeList` | `ARRAY` | Element type T is recursively mapped |
***
# Databricks Setup (/docs/main/targets/databricks/setup)
## Prerequisites
Before you begin, ensure you have:
* **Supported Databricks Implementations:**
* [AWS](https://www.databricks.com/product/aws)
* [Azure](https://www.databricks.com/product/azure)
* [GCP](https://www.databricks.com/product/google-cloud)
* [Serverless Workspace](https://docs.databricks.com/aws/en/admin/workspace/serverless-workspaces)
* **Unity Catalog:** Unity Catalog enabled on your Databricks workspace.
* **SQL Warehouse:** [Serverless SQL Warehouse](https://docs.databricks.com/aws/en/compute/sql-warehouse/).
***
## Setup
### Configure Authentication
### Create a Service Principal
Follow the Databricks [documentation](https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m) to create a Service Principal.
Note down the Client ID and Client Secret.
### Create a Personal Access Token
Follow the Databricks [documentation](https://docs.databricks.com/aws/en/dev-tools/auth/pat) to create a PAT.
Note down the personal access token.
### Create a SQL Warehouse
Log in to the Databricks console to create a SQL Warehouse or use an existing one.
* Go to SQL > SQL warehouses > Create SQL warehouse
* Fill in and select the required fields, such as:
* Name
* Warehouse Size (2X Small)
* Warehouse Type (Serverless)
* Auto Stop (10 minutes)
* Scaling Min & Max (1)
* Unity Catalog (Enabled)
* etc.
* Click Create
* Once created, click on the Connection Details Tab
Note down the following details:
* Server Hostname (your-workspace.cloud.databricks.com)
* Warehouse ID (0123456789abcdef)
### Configure a Catalog
Log in to the Databricks console to choose a Catalog (or set up a new one following the [Databricks documentation](https://docs.databricks.com/aws/en/data-governance/unity-catalog/get-started#overview-of-unity-catalog-setup)).
* From Databricks workspace console, navigate to Data
* Choose a Catalog (`my_catalog`)
### Create a Volume
Supermetal uses the configured volume as a temporary staging area.
Follow the steps from the [Databricks documentation](https://docs.databricks.com/aws/en/volumes/utility-commands#create-a-volume).
* From Databricks workspace console, navigate to Catalog
* Choose the Catalog from the step above (`my_catalog`)
* Search or browse for the schema that you want to add the volume to and select it.
* Click on Create Volume and specify a Name.
* Click Create
```sql
CREATE VOLUME my_catalog.my_schema.my_volume;
```
Note down the following details:
* Catalog Name (my\_catalog)
* Volume Path (/Volumes/my\_catalog/my\_schema/my\_volume)
# MongoDB (/docs/main/sources/mongo)
import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Callout } from 'fumadocs-ui/components/callout';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
import { ImageZoom } from 'fumadocs-ui/components/image-zoom';
Supermetal's MongoDB integration leverages MongoDB's native change streams capability to provide efficient, reliable data extraction with minimal impact on your database. This integration enables both initial snapshot capture and continuous replication of changes through a strongly-typed schema approach.
This guide covers the features, prerequisites, and configuration steps required to connect MongoDB with Supermetal.
***
## Features
| Feature | Notes |
| :------ | :---- |
| Initial Data Sync | Fast, memory-efficient initial data loading through optimized MongoDB query patterns and streaming data processing. |
| Change Data Capture | Captures data changes (inserts, updates, deletes) in real time using MongoDB's native [Change Streams](https://www.mongodb.com/docs/manual/changeStreams/) functionality, which provides a time-ordered sequence of change events within your MongoDB deployment. Supermetal creates a single change stream at the database level rather than per collection, reducing resource usage and following MongoDB's [performance recommendations](https://www.mongodb.com/docs/manual/changeStreams/#change-stream-performance-considerations). |
| Schema Evolution | Unlike traditional MongoDB ETL tools that treat all data as unstructured JSON, Supermetal automatically infers and enforces a strongly-typed schema backed by Apache Arrow. **Precise Type Mapping**: backed by Arrow's rich type system, Supermetal internally maintains precise mappings from MongoDB BSON types to Arrow data types. **Schema Evolution**: detects new fields and type changes during both snapshot and CDC phases, continuously merging them into a unified schema. **Downstream Compatibility**: ensures schema consistency when writing to structured targets like databases and data warehouses. |
| Catalog Support | Ability to configure which collections to sync through the catalog interface. |
| Document Flattening | Converts nested MongoDB documents into a flat, analytics-friendly structure using double-underscore (`__`) notation. See the example below. |

**Example:**

```javascript
// Original MongoDB document
{
  "_id": ObjectId("..."),
  "user": {
    "name": "John",
    "address": {
      "city": "San Francisco",
      "country": "USA"
    }
  },
  "tags": ["tag1", "tag2"]
}

// Flattened representation
{
  "_id": "...",
  "user__name": "John",
  "user__address__city": "San Francisco",
  "user__address__country": "USA",
  "tags": ["tag1", "tag2"]
}
```
***
## Prerequisites
Before you begin, ensure you have:
* **MongoDB Requirements:**
* **Version:** MongoDB 4.0 or higher (required for change streams support)
* **Supported Deployments:**
* `MongoDB Community/Enterprise Server` (replica set configuration)
* `MongoDB Atlas`
Change streams require a replica set deployment. If you're using a standalone server, you must convert it to a single-node replica set to use change streams.
* **Deployment Requirements:**
* **Replica Set**: Your MongoDB deployment must be configured as a replica set
* **Read Concern Majority**: Supermetal uses read concern "majority" to ensure consistent reads
* **Database Permissions**: User with appropriate permissions (see [Setup](#setup))
* **Network Connectivity**: Ensure Supermetal can reach your MongoDB deployment (default port: 27017)
* **TLS/SSL Support**: Supermetal supports both unencrypted and TLS/SSL encrypted connections
* **MongoDB Atlas Requirements**:
* **Network Access**: Configure network access rules to allow Supermetal to connect
* **Connection String**: Use the connection string format that includes all replica set members
***
## Setup
### Permissions Overview
Supermetal requires a dedicated MongoDB user with appropriate permissions to read data and access change streams. The recommended approach is to create a dedicated read-only user.
| MongoDB Deployment | Minimum Required Permissions |
| :----------------- | :-------------------------------------------------- |
| Self-managed | `read` role on the database you replicate from |
| MongoDB Atlas | `readAnyDatabase` role |
### Create a Dedicated Read-Only MongoDB User
#### Connect to your MongoDB instance using the mongo shell with admin privileges: \[!toc]
```bash
mongosh --host <hostname> --port <port> -u <username> -p <password> --authenticationDatabase admin
```
Replace the placeholder values in the command with your actual information:
* `<hostname>`: Your MongoDB server hostname or IP address
* `<port>`: MongoDB port (default is 27017)
* `<username>`: Username with admin privileges
* `<password>`: Password for the admin user
#### Create a dedicated user for Supermetal: \[!toc]
```javascript
use admin
db.createUser({
user: "supermetal_user",
pwd: "strong-password",
roles: [
{ role: "read", db: "target-database" }
]
})
```
Replace the placeholder values in the script with your actual information:
* `strong-password`: Replace with a secure, unique password for the `supermetal_user`.
* `target-database`: The name of the database you want to replicate from.
#### Log in to your MongoDB Atlas account \[!toc]
Visit [cloud.mongodb.com](https://cloud.mongodb.com/) to access your account.
#### Select your project and navigate to the **Database Access** section \[!toc]
#### Click the "Add New Database User" button \[!toc]
#### Enter a username and password \[!toc]
#### Under "Database User Privileges", select "Built-in Role" and choose "Read Any Database" \[!toc]
#### Optionally, you can restrict access to specific clusters that contain the databases you want to replicate \[!toc]
#### Click "Add User" to create the user with the selected permissions \[!toc]
***
## Data Types Mapping
Supermetal automatically maps MongoDB BSON types to Apache Arrow data types according to the following mapping:
| MongoDB BSON Type(s) | Apache Arrow DataType | Notes |
| :------------------- | :-------------------- | :----------------------------------------------------------------------------------------------------------------- |
| `Double` | `Float64` | |
| `Int32` | `Int32` | |
| `Int64` | `Int64` | |
| `Decimal128` | `Utf8` | Preserved as string to maintain exact precision and handle MongoDB's variable decimal precision/scale requirements |
| MongoDB BSON Type(s) | Apache Arrow DataType | Notes |
| :------------------- | :----------------------- | :------------------------------ |
| `DateTime` | `Timestamp(Millisecond)` | |
| `Timestamp` | `Timestamp(Millisecond)` | MongoDB internal timestamp type |
| MongoDB BSON Type(s) | Apache Arrow DataType | Notes |
| :------------------------ | :-------------------- | :--------------------------------- |
| `String` | `Utf8` | |
| `Symbol` | `Utf8` | Deprecated MongoDB type |
| `RegularExpression` | `Utf8` | Converted to string representation |
| `JavaScriptCode` | `Utf8` | Converted to string representation |
| `JavaScriptCodeWithScope` | `Utf8` | Converted to string representation |
| MongoDB BSON Type(s) | Apache Arrow DataType | Notes |
| :------------------- | :-------------------- | :---- |
| `Boolean` | `Boolean` | |
| MongoDB BSON Type(s) | Apache Arrow DataType | Notes |
| :----------------------------- | :-------------------- | :--------------------------------------------- |
| `Array` (same data type) | `List` | Only for arrays with homogeneous element types |
| `Array` (different data types) | `Json` | Heterogeneous arrays are converted to JSON |
| `Document` | `Json` | Nested documents represented as JSON |
| MongoDB BSON Type(s) | Apache Arrow DataType | Notes |
| :------------------- | :-------------------- | :-------------------------------------------------------------------------- |
| `Binary` | `Utf8` | Encoded as hexadecimal string for lossless representation and compatibility |
| MongoDB BSON Type(s) | Apache Arrow DataType | Notes |
| :------------------- | :-------------------- | :------------------------------------- |
| `ObjectId` | `Utf8` | Converted to hex string representation |
| `DbPointer` | `Utf8` | Legacy MongoDB type |
| `Null` | `Null` | Represented as null in target columns |
| `MinKey` | `Utf8` | Special MongoDB comparison type |
| `MaxKey` | `Utf8` | Special MongoDB comparison type |
| `Undefined` | `Utf8` | Deprecated MongoDB type |
# MongoDB Setup (/docs/main/sources/mongo/setup)
## Prerequisites
Before you begin, ensure you have:
* **MongoDB Requirements:**
* **Version:** MongoDB 4.0 or higher (required for change streams support)
* **Supported Deployments:**
* `MongoDB Community/Enterprise Server` (replica set configuration)
* `MongoDB Atlas`
Change streams require a replica set deployment. If you're using a standalone server, you must convert it to a single-node replica set to use change streams.
* **Deployment Requirements:**
* **Replica Set**: Your MongoDB deployment must be configured as a replica set
* **Read Concern Majority**: Supermetal uses read concern "majority" to ensure consistent reads
* **Database Permissions**: User with appropriate permissions (see [Setup](#setup))
* **Network Connectivity**: Ensure Supermetal can reach your MongoDB deployment (default port: 27017)
* **TLS/SSL Support**: Supermetal supports both unencrypted and TLS/SSL encrypted connections
* **MongoDB Atlas Requirements**:
* **Network Access**: Configure network access rules to allow Supermetal to connect
* **Connection String**: Use the connection string format that includes all replica set members
***
## Setup
### Permissions Overview
Supermetal requires a dedicated MongoDB user with appropriate permissions to read data and access change streams. The recommended approach is to create a dedicated read-only user.
| MongoDB Deployment | Minimum Required Permissions |
| :----------------- | :-------------------------------------------------- |
| Self-managed | `read` role on the database you replicate from |
| MongoDB Atlas | `readAnyDatabase` role |
### Create a Dedicated Read-Only MongoDB User
#### Connect to your MongoDB instance using the mongo shell with admin privileges: \[!toc]
```bash
mongosh --host <hostname> --port <port> -u <username> -p <password> --authenticationDatabase admin
```
Replace the placeholder values in the command with your actual information:
* `<hostname>`: Your MongoDB server hostname or IP address
* `<port>`: MongoDB port (default is 27017)
* `<username>`: Username with admin privileges
* `<password>`: Password for the admin user
#### Create a dedicated user for Supermetal: \[!toc]
```javascript
use admin
db.createUser({
user: "supermetal_user",
pwd: "strong-password",
roles: [
{ role: "read", db: "target-database" }
]
})
```
Replace the placeholder values in the script with your actual information:
* `strong-password`: Replace with a secure, unique password for the `supermetal_user`.
* `target-database`: The name of the database you want to replicate from.
#### Log in to your MongoDB Atlas account \[!toc]
Visit [cloud.mongodb.com](https://cloud.mongodb.com/) to access your account.
#### Select your project and navigate to the **Database Access** section \[!toc]
#### Click the "Add New Database User" button \[!toc]
#### Enter a username and password \[!toc]
#### Under "Database User Privileges", select "Built-in Role" and choose "Read Any Database" \[!toc]
#### Optionally, you can restrict access to specific clusters that contain the databases you want to replicate \[!toc]
#### Click "Add User" to create the user with the selected permissions \[!toc]
# Clickhouse (/docs/main/targets/clickhouse)
import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Callout } from 'fumadocs-ui/components/callout';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
import { ImageZoom } from 'fumadocs-ui/components/image-zoom';
ClickHouse is a high-performance, column-oriented SQL database management system for online analytical processing (OLAP).
This guide walks you through configuring your ClickHouse database to work seamlessly with Supermetal.
***
## Features
| Feature | Notes |
| :------ | :---- |
| ReplacingMergeTree | De-duplicates data using ClickHouse's `ReplacingMergeTree` engine for tables with primary keys. |
| Async Inserts | Optionally enable [async inserts](https://clickhouse.com/docs/optimize/asynchronous-inserts). |
| Schema Evolution | Automatic schema evolution |
| Soft Delete(s) | Row deletion is handled by setting a `_sm_deleted` boolean metadata column. |
***
## Prerequisites
Before you begin, ensure you have:
* **Supported ClickHouse Implementations:**
* ClickHouse Cloud
* ClickHouse Server:
* 21.8 or higher.
* Ensure that the ClickHouse HTTP interface (default port 8123) is enabled.
[ClickHouse Cluster](https://clickhouse.com/docs/architecture/cluster-deployment) deployment is not yet supported.
* **Database Admin Access:** Sufficient privileges to create users and grant permissions. This might require `admin` privileges.
* **Network Connectivity:** Ensure that Supermetal's services can reach your ClickHouse database. You may need to configure security groups.
***
## Setup
### Create a user & grant permissions
#### Visit SQL console \[!toc]
Log in to [ClickHouse Cloud](https://clickhouse.cloud/) to access the SQL console and select your service.
#### Create a user \[!toc]
```sql
CREATE USER IF NOT EXISTS 'supermetal_user' IDENTIFIED WITH sha256_password BY 'strong-password';
```
Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the target in Supermetal.
#### Create database \[!toc]
```sql
CREATE DATABASE IF NOT EXISTS target_database;
```
Replace `target_database` with the name of the database you want to use for storing data.
#### Grant permissions \[!toc]
```sql
GRANT CREATE DATABASE ON target_database.*, CREATE TABLE ON target_database.*, CREATE TEMPORARY TABLE ON *.*, S3 ON *.*, AZURE ON *.*, INSERT ON target_database.*, ALTER TABLE ON target_database.*, SELECT ON target_database.*, TRUNCATE ON target_database.* TO 'supermetal_user';
```
Replace `target_database` with the name of the database created in the previous step.
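You can confirm the grants took effect with:

```sql
SHOW GRANTS FOR 'supermetal_user';
```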
#### Connection Details \[!toc]
You'll need the following connection details to configure the target in Supermetal:
* HTTPS URL (e.g. `https://..clickhouse.cloud:8443`)
* Username and password you created
You may need to add the IP address of the machine running Supermetal to the ClickHouse Cloud IP whitelist from the Service Settings -> Security -> IP Access List.
#### Create a user \[!toc]
Log in to the ClickHouse server as an admin user to create a user for Supermetal.
```sql
CREATE USER IF NOT EXISTS 'supermetal_user' IDENTIFIED WITH sha256_password BY 'strong-password';
```
Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the target in Supermetal.
#### Create database \[!toc]
```sql
CREATE DATABASE IF NOT EXISTS target_database;
```
Replace `target_database` with the name of the database you want to use for storing data.
#### Grant permissions \[!toc]
```sql
GRANT CREATE DATABASE ON target_database.*, CREATE TABLE ON target_database.*, CREATE TEMPORARY TABLE ON *.*, S3 ON *.*, AZURE ON *.*, INSERT ON target_database.*, ALTER TABLE ON target_database.*, SELECT ON target_database.*, TRUNCATE ON target_database.* TO 'supermetal_user';
```
Replace `target_database` with the name of the database created in the previous step.
#### Connection Details \[!toc]
You'll need the following connection details to configure the target in Supermetal:
* HTTP interface URL (e.g. `https://hostname:port`)
* Username and password you created
***
## Table Engine(s)
Supermetal automatically creates tables in ClickHouse with appropriate engines:
* Tables without primary keys use the `MergeTree` engine
* Tables with primary keys use the `ReplacingMergeTree` engine with version tracking (`_sm_version`)
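For illustration, DDL for a table with primary key `id` might take roughly this shape (all columns except the `_sm_*` metadata columns are hypothetical):

```sql
-- Sketch of the kind of table Supermetal creates for a primary-keyed source table
CREATE TABLE target_database.orders
(
    `id` Int64,
    `amount` Decimal(38, 9),
    `_sm_version` UInt64,   -- version column used by ReplacingMergeTree for de-duplication
    `_sm_deleted` Bool      -- soft-delete marker
)
ENGINE = ReplacingMergeTree(`_sm_version`)
ORDER BY `id`;
```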
***
## Data Types Mapping
| Apache Arrow DataType | ClickHouse Type | Notes |
| :-------------------- | :-------------- | :---- |
| `Int8` | `Int8` | |
| `Int16` | `Int16` | |
| `Int32` | `Int32` | |
| `Int64` | `Int64` | |
| `UInt8` | `UInt8` | |
| `UInt16` | `UInt16` | |
| `UInt32` | `UInt32` | |
| `UInt64` | `UInt64` | |
| `Float16` | `Float32` | |
| `Float32` | `Float32` | |
| `Float64` | `Float64` | |
| `Decimal128(p,s)` | `Decimal(p,s)` | |
| `Decimal256(p,s)` | `Decimal(p,s)` | |
| Apache Arrow DataType | ClickHouse Type | Notes |
| :--------------------- | :-------------------- | :-------------------------- |
| `Date32` | `Date32` | |
| `Date64` | `DateTime64(3)` | |
| `Timestamp(s, [tz])` | `DateTime(0, [tz])` | |
| `Timestamp(ms, [tz])` | `DateTime64(3, [tz])` | |
| `Timestamp(us, [tz])` | `DateTime64(6, [tz])` | |
| `Timestamp(ns, [tz])` | `DateTime64(9, [tz])` | |
| `Time32(ms)` | `DateTime64(3)` | |
| `Time64(us)` | `DateTime64(6)` | |
| `Time64(ns)` | `DateTime64(9)` | |
| `Duration`, `Interval` | `String` | Not supported by ClickHouse |
| Apache Arrow DataType | ClickHouse Type | Notes |
| :-------------------- | :-------------- | :---- |
| `Utf8`, `LargeUtf8` | `String` | |
| Apache Arrow DataType | ClickHouse Type | Notes |
| :-------------------- | :-------------- | :---- |
| `Boolean` | `Boolean` | |
| Apache Arrow DataType | ClickHouse Type | Notes |
| :--------------------------------------------------------------------------------------------------------- | :-------------- | :------------------------------------------------------------------------------------------------------------------------------------------ |
| `Struct` | `Tuple` | |
| `Map` | `Map` | |
| `Utf8` [JSON Extension](https://arrow.apache.org/docs/format/CanonicalExtensions.html#json) (`arrow.json`) | ~~`JSON`~~ | [JSON Data Type](https://clickhouse.com/docs/sql-reference/data-types/newjson) is not supported (as it cannot ingest arbitrary json arrays) |
| Apache Arrow DataType | ClickHouse Type | Notes |
| :----------------------------------------- | :-------------- | :-------------------------------------- |
| `Binary`, `LargeBinary`, `FixedSizeBinary` | `String` | ClickHouse does not have a binary type |
| Apache Arrow DataType | ClickHouse Type | Notes |
| :-------------------------------------------- | :-------------- | :---- |
| `List`, `LargeList`, `FixedSizeList` | `Array(T)` | |
Nullable columns are marked as nullable in ClickHouse using the `Nullable(Type)` wrapper.
Supermetal preserves ClickHouse's [Arrow data type mappings](https://clickhouse.com/docs/interfaces/formats/Arrow#data-types-matching).
# ClickHouse Setup (/docs/main/targets/clickhouse/setup)
## Prerequisites
Before you begin, ensure you have:
* **Supported ClickHouse Implementations:**
* ClickHouse Cloud
* ClickHouse Server:
* 21.8 or higher.
* Ensure that the ClickHouse HTTP interface (default port 8123) is enabled.
[ClickHouse Cluster](https://clickhouse.com/docs/architecture/cluster-deployment) deployment is not yet supported.
* **Database Admin Access:** Sufficient privileges to create users and grant permissions. This might require `admin` privileges.
* **Network Connectivity:** Ensure that Supermetal's services can reach your ClickHouse database. You may need to configure security groups.
***
## Setup
### Create a user & grant permissions
#### Visit SQL console \[!toc]
Log in to [ClickHouse Cloud](https://clickhouse.cloud/) to access the SQL console and select your service.
#### Create a user \[!toc]
```sql
CREATE USER IF NOT EXISTS 'supermetal_user' IDENTIFIED WITH sha256_password BY 'strong-password';
```
Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the target in Supermetal.
#### Create database \[!toc]
```sql
CREATE DATABASE IF NOT EXISTS target_database;
```
Replace `target_database` with the name of the database you want to use for storing data.
#### Grant permissions \[!toc]
```sql
GRANT CREATE DATABASE ON target_database.*, CREATE TABLE ON target_database.*, CREATE TEMPORARY TABLE ON *.*, S3 ON *.*, AZURE ON *.*, INSERT ON target_database.*, ALTER TABLE ON target_database.*, SELECT ON target_database.*, TRUNCATE ON target_database.* TO 'supermetal_user';
```
Replace `target_database` with the name of the database created in the previous step.
#### Connection Details \[!toc]
You'll need the following connection details to configure the target in Supermetal:
* HTTPS URL (e.g. `https://..clickhouse.cloud:8443`)
* Username and password you created
You may need to add the IP address of the machine running Supermetal to the ClickHouse Cloud IP whitelist from the Service Settings -> Security -> IP Access List.
#### Create a user \[!toc]
Log in to the ClickHouse server as an admin user to create a user for Supermetal.
```sql
CREATE USER IF NOT EXISTS 'supermetal_user' IDENTIFIED WITH sha256_password BY 'strong-password';
```
Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the target in Supermetal.
#### Create database \[!toc]
```sql
CREATE DATABASE IF NOT EXISTS target_database;
```
Replace `target_database` with the name of the database you want to use for storing data.
#### Grant permissions \[!toc]
```sql
GRANT CREATE DATABASE ON target_database.*, CREATE TABLE ON target_database.*, CREATE TEMPORARY TABLE ON *.*, S3 ON *.*, AZURE ON *.*, INSERT ON target_database.*, ALTER TABLE ON target_database.*, SELECT ON target_database.*, TRUNCATE ON target_database.* TO 'supermetal_user';
```
Replace `target_database` with the name of the database created in the previous step.
#### Connection Details \[!toc]
You'll need the following connection details to configure the target in Supermetal:
* HTTP interface URL (e.g. `https://hostname:port`)
* Username and password you created
# Architecture (/docs/main/concepts/architecture)
import SupermetalHighLevelArchitectureDiagram from '../../../../../app/diagrams/high_level_architecture';
import ConsistentPoint from '../../../../../app/diagrams/consistent_point';
import Snapshot from '../../../../../app/diagrams/snapshot';
Supermetal's architecture centers around three core principles: preserving data accuracy through Apache Arrow's type system, maintaining transactional consistency across the entire pipeline, and processing everything within a single process to massively reduce serialization overhead.
This document covers how Supermetal establishes consistent points in source databases, parallelizes snapshot operations, handles continuous CDC, and manages state to ensure reliable data delivery.
***
## Overview
Supermetal operates as a single, self-contained process that manages the entire data pipeline from source to target.
Supermetal is built on two primary technologies:
1. **Rust**: A systems programming language chosen for its performance, memory safety, and lack of garbage collection. Rust provides:
* C/C++-level predictable performance without the memory safety issues
* Efficient concurrency model
* Strong type system preventing common bugs
2. **Apache Arrow**: A columnar in-memory format serving as the universal data representation within Supermetal:
* Optimized for fast data access and processing
* Rich type system that preserves data accuracy
* Zero-copy read capabilities
* Efficient serialization to other downstream formats
The combination of Rust and Arrow creates a powerful synergy: Rust provides the safe, high-performance execution environment, while Arrow provides efficient, standardized data representation.
***
## Key Differentiators
### Data Format
Supermetal uses Apache Arrow's rich type system, ensuring:
* Precise representation of database types
* Preservation of precision, scale, signedness
* Robust schema evolution
This contrasts with conventional tools relying on JSON, CSV, or even Avro, which often collapse or simplify types to a set of base super types leading to data storage and compute inefficiencies.
### Protocol Efficiency
Supermetal runs in a single process, moving data in a streaming manner using Apache Arrow record batches. This eliminates:
* Inter Process Communication (IPC)
* Intermediate Serialization/deserialization overhead
* Text-based protocol inefficiencies (JSON parsing)
* Excessive network hops for data exchange
This contrasts with conventional tools that decouple source and target into distinct processes using IPC to exchange data in text-based formats (JSON).
### Transactional Consistency
Supermetal ensures that at any given point, target state reflects a consistent point-in-time from the upstream source database:
* Maintains transactional boundaries from the source system
* When a transaction modifies multiple tables, changes are applied such that the target reflects a consistent state
* Can batch multiple transactions together and apply them as a single atomic transaction on target for better throughput
### Infrastructure Simplicity
Supermetal runs as a single, self-contained agent with no external dependencies and separates data plane from control plane.
This contrasts with conventional tools that often require additional components such as:
* Message brokers (Kafka)
* External runtimes (JVM)
* Complex container orchestration
***
## Connector
A Supermetal deployment typically involves one or more connectors, each responsible for moving data from a specific source to a specific target.
The architecture is designed for streaming data, processing changes in real-time from source to target.
### Replication phases
The Supermetal connector manages the end-to-end process of Change Data Capture (CDC), from initial data synchronization to continuous, real-time updates.
This typically involves the following phases:
* [**Establishing a Consistent Point:**](#establishing-a-consistent-point) Establishes a consistent point in the source database (e.g., a specific WAL position or LSN)
* [**Initial Snapshot:**](#parallel-snapshot-processing) Captures all existing tables and data
* [**Real time Change Data Capture (CDC):**](#continuous-change-data-capture-cdc) Continuously captures changes by reading the database write-ahead log (WAL)
* [**Target Optimized Loading & Deduplication:**](#target-optimized-loading--deduplication) Applies captured changes to target database using target-specific (bulk load) optimized formats and deduplicates while maintaining transactional integrity if the target database supports it.
### Establishing a Consistent Point
When a connector starts, Supermetal invokes a source-specific workflow that performs the following steps:
1. Establishes a consistent point in the source database (e.g., a specific WAL position or LSN)
2. Discovers all tables to be replicated and their schema
3. Performs the initial snapshot of existing data.
4. Switches to reading the WAL from the previously established consistent point.
The established consistent point is recorded (checkpoint) in the [state store](#state-management) only after the snapshot step is complete. This approach ensures that no transactions are missed during the transition from snapshot to CDC, providing complete data consistency.
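As a concrete illustration in PostgreSQL terms (not a statement of Supermetal's internals): creating a logical replication slot pins the WAL position that CDC will later resume from, and its exported snapshot lets snapshot queries read exactly that point in time.

```sql
-- Creating a logical slot returns the consistent LSN that CDC starts from
SELECT slot_name, lsn
FROM pg_create_logical_replication_slot('supermetal_slot', 'pgoutput');
-- Clean up an unused slot with: SELECT pg_drop_replication_slot('supermetal_slot');
```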
### Parallel Snapshot Processing
Supermetal dramatically reduces initial synchronization time through advanced parallelization:
* **Inter-Table Parallelization**: Multiple tables are snapshotted simultaneously
* **Intra-Table Parallelization**: Large tables are automatically split into chunks that are processed in parallel
* **Streaming Approach**: Each chunk is processed in a streaming fashion preventing memory overload even for large tables
This parallelization strategy can reduce snapshot times by orders of magnitude compared to traditional tools, especially for large databases with many tables.
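Conceptually, intra-table parallelism splits a table into key ranges that separate connections scan concurrently; a hypothetical pair of chunk queries over a primary key:

```sql
-- Two chunks of one snapshot, read over independent connections
SELECT * FROM orders WHERE id >= 1      AND id < 500000;
SELECT * FROM orders WHERE id >= 500000 AND id < 1000000;
```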
### Continuous Change Data Capture (CDC)
After completing the initial snapshot, Supermetal:
* Begins reading the WAL from the consistent point established earlier
* Decodes log entries into changes (inserts, updates, deletes)
* Maintains transaction boundaries to ensure consistency
* Provides near real-time propagation of changes to the target
* Records (checkpoint) progress in the [state store](#state-management) to resume processing in failure scenarios or process restarts.
### Target-Optimized Loading & Deduplication
On the target side, Supermetal intelligently optimizes the data loading strategy:
* Automatically selects the most efficient data format for each target database
* Uses columnar formats (e.g., Parquet) for warehouses like Snowflake, Databricks, and ClickHouse that ingest them efficiently, and binary formats through bulk-loading APIs (PostgreSQL, MySQL, SQL Server) when available
* Leverages direct integration between object stores and data warehouses to perform direct loads (zero copy) from the files in [object store buffer](#buffer).
* Applies changes in a transactionally consistent manner if the target database supports it
* Transparently handles deduplication (merge/upsert operations) by tracking metadata about the source transactions
### Buffer
Supermetal includes an optional buffering mechanism to stage data between source and target components:
* Can buffer and write batched transactions to object storage (e.g., S3, GCS, Azure Blob) or any filesystem
* Decouples source and target, allowing them to operate at independent rates
* Prevents backpressure on the source when it produces data faster than the target can consume
* Enhances resilience against target unavailability
### State Management
Supermetal decouples state tracking from the data plane and ensures reliability and recovery through efficient state management:
* **Transaction-Aware Checkpointing**: Tracks progress at transaction boundaries end-to-end (source to destination).
* **Automatic Recovery**: After any interruption or crash, automatically resumes from the last successfully processed transaction
* **Consistent Resumption**: When restarting, continues from exactly where it left off without data loss or duplication
* **Minimal Reprocessing**: Only processes data that hasn't been successfully delivered to the target
***
## Data Model
The choice of data model is fundamental to Supermetal's performance and data accuracy.
### Apache Arrow as Universal Format
Supermetal uses Apache Arrow RecordBatches as its in-memory representation for data flowing through the system.
Source components read data into Arrow RecordBatches, which can then be forwarded in a zero copy manner to sinks or efficiently converted to a downstream format optimal for the target system, such as Parquet.
Arrow's columnar structure stores data column by column rather than row by row. This approach provides significant advantages for data movement and ETL processes, including more efficient memory usage, better compression potential, and faster conversion to target formats.
The layout is particularly well-suited for data integration, as it allows for fast zero-copy data exchange while preserving accuracy through Arrow's rich type system.
### Schema Mapping and Evolution
Supermetal maintains precise mappings between source/target database types and Arrow data types.
* **Accurate Type Representation**: Each source and target database type maps to a corresponding Arrow type that preserves its characteristics (precision, scale, signedness, temporal resolution)
* **Robust Schema Evolution**: When a source schema changes, Supermetal leverages Arrow's schema capabilities to detect these changes, propagate them through the pipeline, and adapt target systems accordingly
## Concurrency Model
Supermetal implements a concurrency model built on the asynchronous Tokio runtime that scales naturally with incoming data volume, allowing it to efficiently handle multiple data streams simultaneously.
### Streaming, Task-Based Processing
Data flows through Supermetal in a streaming fashion:
* **Real-Time Processing**: Data is processed as it becomes available rather than in large batch jobs
* **Chunked Processing**: Information moves through the system in chunks (RecordBatches) of configurable size
* **End-to-End Streaming**: From initial capture to target loading, the entire pipeline operates in a streaming manner
**Benefits of this Concurrency Model:**
* **Low Latency**: Minimizes the delay between source changes and target updates
* **Memory Efficiency**: Streaming approach avoids buffering entire datasets in memory
* **Flexible Throughput**: Automatically adapts to varying data volumes and rates
* **Scalability**: A single Supermetal instance can efficiently handle connections to numerous sources and targets
# Snowflake (/docs/main/targets/snowflake)
import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Callout } from 'fumadocs-ui/components/callout';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
Snowflake is a popular cloud data warehouse that separates compute and storage.
This guide walks you through configuring your Snowflake warehouse to work seamlessly with Supermetal.
***
## Features
| Feature | Notes |
| :------ | :---- |
| Bulk Insert | Uses the [new vectorized parquet scanner](https://www.snowflake.com/en/engineering-blog/loading-terabytes-into-snowflake-speeds-feeds-techniques/) to efficiently load data. |
| Transactionally Consistent | Guarantees that synced data reflects a consistent point in the upstream database's timeline, respecting transaction boundaries. |
| Schema Evolution | Automatic schema evolution |
| Soft Delete(s) | Row deletion is handled by setting a `_sm_deleted` boolean metadata column. |
***
## Prerequisites
Before you begin, ensure you have:
* **Supported Snowflake Implementations:**
* AWS
* Azure
Snowflake on GCP is not yet supported.
* **Database Admin Access:** Sufficient privileges to create users and grant permissions. This might require `ACCOUNTADMIN` / `SYSADMIN` / `SECURITYADMIN` privileges.
***
## Setup
### Create a Role
```sql
begin;
set supermetal_role = 'SUPERMETAL_ROLE';
-- create Supermetal role
use role ACCOUNTADMIN;
create role if not exists identifier($supermetal_role);
grant role identifier($supermetal_role) to role SYSADMIN;
-- grant permissions to create database
grant CREATE DATABASE ON ACCOUNT TO ROLE identifier($supermetal_role);
commit;
```
### Create a User
```sql
begin;
set supermetal_role = 'SUPERMETAL_ROLE';
set supermetal_username = 'SUPERMETAL_USER';
set supermetal_password = 'strong-password';
-- create Supermetal user
use role ACCOUNTADMIN;
create user if not exists identifier($supermetal_username)
default_role = $supermetal_role
password = $supermetal_password;
grant role identifier($supermetal_role) to user identifier($supermetal_username);
commit;
```
Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the target in Supermetal.
Snowflake is [deprecating password authentication](https://community.snowflake.com/s/article/FAQ-Snowflake-to-block-password-only-logins). Please use other authentication methods instead.
### Create a User
```sql
begin;
set supermetal_role = 'SUPERMETAL_ROLE';
set supermetal_username = 'SUPERMETAL_USER';
-- create Supermetal user
use role ACCOUNTADMIN;
create user if not exists identifier($supermetal_username)
default_role = $supermetal_role
type = SERVICE;
grant role identifier($supermetal_role) to user identifier($supermetal_username);
commit;
```
### Create a Key Pair
Follow the Snowflake [documentation](https://docs.snowflake.com/en/user-guide/key-pair-auth#configuring-key-pair-authentication) to:
* Generate a private key and a public key
* Assign the public key to the Supermetal user: `ALTER USER SUPERMETAL_USER SET RSA_PUBLIC_KEY='MIIBIjANBgkqh...';`
### (Optional) Create a Warehouse
```sql
begin;
set supermetal_role = 'SUPERMETAL_ROLE';
set supermetal_username = 'SUPERMETAL_USER';
set supermetal_warehouse = 'SUPERMETAL_WAREHOUSE';
-- change role to sysadmin to create warehouse
use role sysadmin;
-- create Supermetal warehouse
create warehouse if not exists identifier($supermetal_warehouse)
warehouse_size = xsmall
warehouse_type = standard
auto_suspend = 60
auto_resume = true
initially_suspended = true;
-- uncomment to use Generation 2 Warehouse if supported
-- ALTER WAREHOUSE identifier($supermetal_warehouse) SET RESOURCE_CONSTRAINT = STANDARD_GEN_2;
-- grant Supermetal warehouse access
grant USAGE
on warehouse identifier($supermetal_warehouse)
to role identifier($supermetal_role);
commit;
```
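To confirm the role, user, and warehouse are wired together, you can run:

```sql
show grants to role SUPERMETAL_ROLE;
show warehouses like 'SUPERMETAL_WAREHOUSE';
```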
***
## Transactions
When enabled, Supermetal uses Snowflake's [transactions](https://docs.snowflake.com/en/sql-reference/transactions) to apply changes atomically across multiple tables. This preserves transaction boundaries from your source database when possible, ensuring that all changes within a single source transaction are either fully applied or not applied at all.
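For example, two changes originating from one upstream transaction commit or roll back together on Snowflake (table names are illustrative):

```sql
begin;
-- both statements came from a single source transaction
update orders set status = 'shipped' where order_id = 42;
insert into order_events (order_id, event) values (42, 'shipped');
commit;
```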
***
## Data Types Mapping
| Apache Arrow DataType | Snowflake Type | Notes |
| :----------------------------------------------------------------------- | :------------- | :------------------------- |
| `Int8`, `Int16`, `Int32`, `Int64`, `UInt8`, `UInt16`, `UInt32`, `UInt64` | `NUMBER` | |
| `Float16`, `Float32`, `Float64` | `FLOAT` | |
| `Decimal128(p,s)` | `NUMBER(p,s)` | if `p <= 38` and `s <= 37` |
| `Decimal256(p,s)` | `VARCHAR` | if `p > 38` |
| Apache Arrow DataType | Snowflake Type | Notes |
| :--------------------- | :----------------- | :---- |
| `Date32`, `Date64` | `DATE` | |
| `Time32(ms)` | `TIME(0)` | |
| `Time64(us)` | `TIME(6)` | |
| `Time64(ns)` | `TIME(9)` | |
| `Timestamp(s, [tz])` | `TIMESTAMP_NTZ(0)` | |
| `Timestamp(ms, [tz])` | `TIMESTAMP_NTZ(3)` | |
| `Timestamp(us, [tz])` | `TIMESTAMP_NTZ(6)` | |
| `Timestamp(ns, [tz])` | `TIMESTAMP_NTZ(9)` | |
| `Duration`, `Interval` | `VARCHAR` | |
| Apache Arrow DataType | Snowflake Type | Notes |
| :-------------------- | :------------- | :---- |
| `Utf8`, `LargeUtf8` | `VARCHAR` | |
| Apache Arrow DataType | Snowflake Type | Notes |
| :-------------------- | :------------- | :---- |
| `Boolean` | `BOOLEAN` | |
| Apache Arrow DataType | Snowflake Type | Notes |
| :--------------------------------------------------------------------------------------------------------- | :------------- | :---- |
| `Struct` | `VARIANT` | |
| `Map` | `VARIANT` | |
| `Utf8` [JSON Extension](https://arrow.apache.org/docs/format/CanonicalExtensions.html#json) (`arrow.json`) | `VARIANT` | |
| Apache Arrow DataType | Snowflake Type | Notes |
| :----------------------------------------- | :------------- | :---- |
| `Binary`, `LargeBinary`, `FixedSizeBinary` | `BINARY` | |
| Apache Arrow DataType | Snowflake Type | Notes |
| :----------------------------------- | :------------- | :---- |
| `List`, `LargeList`, `FixedSizeList` | `VARIANT` | |
VARCHAR & BINARY data types are limited to 16MB and might be truncated ([TRUNCATECOLUMNS](https://docs.snowflake.com/en/sql-reference/sql/copy-into-table#copy-options-copyoptions) option).
***
# Snowflake Setup (/docs/main/targets/snowflake/setup)
## Prerequisites
Before you begin, ensure you have:
* **Supported Snowflake Implementations:**
* AWS
* Azure
Snowflake on GCP is not yet supported.
* **Database Admin Access:** Sufficient privileges to create users and grant permissions. This might require `ACCOUNTADMIN` / `SYSADMIN` / `SECURITYADMIN` privileges.
***
## Setup
### Create a Role
```sql
begin;
set supermetal_role = 'SUPERMETAL_ROLE';
-- create Supermetal role
use role ACCOUNTADMIN;
create role if not exists identifier($supermetal_role);
grant role identifier($supermetal_role) to role SYSADMIN;
-- grant permissions to create database
grant CREATE DATABASE ON ACCOUNT TO ROLE identifier($supermetal_role);
commit;
```
### Create a User
```sql
begin;
set supermetal_role = 'SUPERMETAL_ROLE';
set supermetal_username = 'SUPERMETAL_USER';
set supermetal_password = 'strong-password';
-- create Supermetal user
use role ACCOUNTADMIN;
create user if not exists identifier($supermetal_username)
default_role = $supermetal_role
password = $supermetal_password;
grant role identifier($supermetal_role) to user identifier($supermetal_username);
commit;
```
Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the target in Supermetal.
Snowflake is [deprecating password authentication](https://community.snowflake.com/s/article/FAQ-Snowflake-to-block-password-only-logins). Please use other authentication methods instead.
### Create a User
```sql
begin;
set supermetal_role = 'SUPERMETAL_ROLE';
set supermetal_username = 'SUPERMETAL_USER';
-- create Supermetal user
use role ACCOUNTADMIN;
create user if not exists identifier($supermetal_username)
default_role = $supermetal_role
type = SERVICE;
grant role identifier($supermetal_role) to user identifier($supermetal_username);
commit;
```
### Create a Key Pair
Follow the Snowflake [documentation](https://docs.snowflake.com/en/user-guide/key-pair-auth#configuring-key-pair-authentication) to:
* Generate a private key and a public key
* Assign the public key to the Supermetal user: `ALTER USER SUPERMETAL_USER SET RSA_PUBLIC_KEY='MIIBIjANBgkqh...';`
### (Optional) Create a Warehouse
```sql
begin;
set supermetal_role = 'SUPERMETAL_ROLE';
set supermetal_username = 'SUPERMETAL_USER';
set supermetal_warehouse = 'SUPERMETAL_WAREHOUSE';
-- change role to sysadmin to create warehouse
use role sysadmin;
-- create Supermetal warehouse
create warehouse if not exists identifier($supermetal_warehouse)
warehouse_size = xsmall
warehouse_type = standard
auto_suspend = 60
auto_resume = true
initially_suspended = true;
-- uncomment to use Generation 2 Warehouse if supported
-- ALTER WAREHOUSE identifier($supermetal_warehouse) SET RESOURCE_CONSTRAINT = STANDARD_GEN_2;
-- grant Supermetal warehouse access
grant USAGE
on warehouse identifier($supermetal_warehouse)
to role identifier($supermetal_role);
commit;
```
# SQL Server (/docs/main/sources/sqlserver)
import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Callout } from 'fumadocs-ui/components/callout';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
import {
CreateAdminUser,
CreateMinimalUser,
CreateDBOwnerUser,
EnableCDCTable,
StandardEnableCDCDB,
StandardConfigureCDCJob,
RDSEnableCDCDB,
RDSConfigureCDCJob,
AzureConfigureCDCJob,
AzureEnableCDCDB
} from '@supermetal/docs-components';
SQL Server's Change Data Capture (CDC) through capture tables enables reliable, low impact change capture for data integration. Supermetal builds on this foundation to deliver efficient data extraction and replication with minimal source database overhead.
This guide covers features and configuration steps required to connect SQL Server with Supermetal, focusing on CDC setup and permissions.
***
## Features
|
Feature
|
Notes
|
|
Initial Data Sync
|
Performs fast initial data loads (snapshots) and historical backfills by chunking data within each table and reading each chunk concurrently using multiple connections:
**Physical Partitioning**: Automatically detects and utilizes existing SQL Server table partitions, extracting data from each partition in parallel
**Dynamic Range Partitioning**: For non-partitioned tables, identifies suitable partition columns (clustered indexes or primary keys) and creates optimal query boundaries based on data distribution
|
|
Change Data Capture
|
Captures data changes (inserts, updates, deletes) in real-time using SQL Server's native [CDC functionality](https://learn.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-data-capture-sql-server). Leverages the [SQL Server Agent](https://learn.microsoft.com/en-us/sql/ssms/agent/sql-server-agent) to asynchronously populate dedicated [CDC capture tables](https://learn.microsoft.com/en-us/sql/relational-databases/system-tables/change-data-capture-tables-transact-sql) from transaction log entries.
|
|
CDC Processing Modes
|
**Standard CDC Mode**: Processes changes from individual CDC capture tables with minimal latency. This mode delivers lower latency data replication and is suitable when you don't need strict transactional consistency across multiple tables.
**Strictly Transactional CDC Mode**: When enabled (`transactional_cdc_enabled: true`), Supermetal guarantees that all changes across multiple tables are processed while preserving their original transaction boundaries. This ensures that related data changes stay grouped together exactly as they were in the source database—for example, when an order and its line items were updated together, they will remain grouped together during replication.
|
|
Configurable CDC Processing
|
Fine-tune CDC behavior with configurable parameters for permissions/roles, polling intervals, maximum scan cycles, and batch sizes.
|
|
Catalog Support
|
Ability to configure which schemas, tables and columns to sync.
|
***
## Prerequisites
Before you begin, ensure you have:
* **SQL Server Requirements:**
* **Version:** SQL Server 2016 or higher
* **Supported Editions:**
* `Standard SQL Server` (Enterprise, Standard, Personal, Azure SQL Managed Instance)
* `Azure SQL Database`
* `Amazon RDS for SQL Server`
CDC is not supported on `Azure Synapse`, `Azure SQL Edge`, or `Azure Synapse Serverless`.
* **SQL Server Agent:** The [SQL Server Agent](https://learn.microsoft.com/en-us/sql/ssms/agent/sql-server-agent) service must be running, as it manages the CDC capture jobs.
* **Setup Requirements:**
* **Database Permissions:** `sysadmin` or `db_owner` role for CDC setup (see [Setup](#setup) for role-specific instructions)
* **Network Connectivity:** Ensure Supermetal can reach your SQL Server (default port: 1433)
***
## Setup
### Permissions Overview
Supermetal's SQL Server integration provides different levels of Change Data Capture (CDC) automation depending on your SQL Server edition and user permissions. The table below summarizes the automation capabilities:
| SQL Server Edition | User Role | Automation Capabilities |
| :----------------- | :----------- | :---------------------------------------------------------------------------------------------- |
| Standard | `sysadmin` | Full CDC workflow automation (database CDC enablement, table CDC enablement, job configuration) |
| Standard | `db_owner` | Partial automation (table CDC enablement, job configuration) |
| Standard | Regular user | Limited automation (requires manual setup by a privileged user) |
| Azure SQL Database | `db_owner` | Full CDC workflow automation (all CDC operations) |
| Azure SQL Database | Regular user | Limited automation (requires manual setup by a privileged user) |
| AWS RDS | `db_owner` | Partial automation (table CDC enablement, job configuration) |
| AWS RDS | Regular user | Limited automation (requires manual setup by a privileged user) |
We recommend creating a `db_owner` user for all SQL Server editions, as it provides the best tradeoff between automation and permissions.
### Setup Instructions
#### SYSADMIN Role \[!toc]
The `sysadmin` server role provides the highest level of permissions, enabling Supermetal to fully automate the CDC workflow, including enabling CDC at the database level, configuring tables for CDC, and managing CDC jobs.
The `sysadmin` role is only available on Standard SQL Server editions. Azure SQL Database and AWS RDS restrict the use of `sysadmin` privileges.
#### Create a SYSADMIN User \[!toc]
AWS RDS for SQL Server does not allow the creation of `sysadmin` users. Please use the DB\_OWNER role instead.
Azure SQL Database does not support the `sysadmin` role. Please use the DB\_OWNER role instead.
Replace `'strong-password'` with a secure, unique password for the Supermetal user. Store this password securely, as you'll need it when configuring the source in Supermetal.
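A minimal sketch of what such a setup script does, assuming the login name `supermetal_user`:
```sql
-- Create a server login and grant it the sysadmin server role
CREATE LOGIN supermetal_user WITH PASSWORD = 'strong-password';
ALTER SERVER ROLE sysadmin ADD MEMBER supermetal_user;
```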
#### DB\_OWNER Role \[!toc]
The `db_owner` database role provides sufficient permissions for most CDC operations except enabling CDC at the database level on Standard SQL Server and AWS RDS (this step must be performed by a `sysadmin`). On Azure SQL Database, `db_owner` can automate the entire CDC workflow.
#### Create a DB\_OWNER User \[!toc]
Replace the placeholder values in the script with your actual information:
* `'strong-password'`: Replace with a secure, unique password for the `supermetal_user`.
* `DATABASE_NAME`: The name of the database you want to replicate from.
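A minimal sketch for Standard SQL Server, assuming the login name `supermetal_user` (on Azure SQL Database, connect directly to the target database rather than using `USE`):
```sql
-- Create a login, map it to a database user, and add it to db_owner
CREATE LOGIN supermetal_user WITH PASSWORD = 'strong-password';
GO
USE DATABASE_NAME;
GO
CREATE USER supermetal_user FOR LOGIN supermetal_user;
ALTER ROLE db_owner ADD MEMBER supermetal_user;
```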
#### Enable Snapshot Isolation \[!toc]
Enable snapshot isolation on the database to ensure consistent reads during CDC operations. This step must be performed by a `sysadmin` user.
Not required for Azure SQL Database. Supermetal automatically manages snapshot isolation enablement.
Replace `DATABASE_NAME` with the name of the database you want to replicate from.
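For reference, snapshot isolation is enabled with:
```sql
-- Allow transactions in this database to use snapshot isolation
ALTER DATABASE DATABASE_NAME SET ALLOW_SNAPSHOT_ISOLATION ON;
```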
#### Enable CDC at Database Level \[!toc]
This step is required for Standard SQL Server and AWS RDS. It creates the CDC schema and starts the SQL Server Agent jobs necessary for CDC. This step must be performed by a user with `sysadmin` privileges.
Not required for Azure SQL Database. Supermetal automatically manages CDC enablement at the database level.
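For reference, the native procedures for this step are:
```sql
-- Standard SQL Server: run inside the target database as sysadmin
USE DATABASE_NAME;
GO
EXEC sys.sp_cdc_enable_db;
-- AWS RDS for SQL Server exposes a wrapper procedure instead:
-- EXEC msdb.dbo.rds_cdc_enable_db 'DATABASE_NAME';
```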
#### MINIMAL\_ACCESS\_USER Role \[!toc]
A minimal access user has limited permissions and requires a user with higher privileges (such as `sysadmin` or `db_owner`) to set up the CDC environment. After setup, the minimal access user can be used for ongoing replication.
#### Create a Minimal Access User \[!toc]
Replace the placeholder values in the script with your actual information:
* `'strong-password'`: Replace with a secure, unique password for the `supermetal_user`.
* `DATABASE_NAME`: The name of the database you want to replicate from.
* `SCHEMA_NAME`: The schema of the table you want to replicate from.
* `TABLE_NAME`: The name of the table you want to replicate from.
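A minimal sketch of such a user, assuming the login name `supermetal_user` and the placeholders above (the `cdc` schema grant in the later step completes the permissions):
```sql
-- Create a login and database user with read access to the replicated table
CREATE LOGIN supermetal_user WITH PASSWORD = 'strong-password';
GO
USE DATABASE_NAME;
GO
CREATE USER supermetal_user FOR LOGIN supermetal_user;
GRANT SELECT ON [SCHEMA_NAME].[TABLE_NAME] TO supermetal_user;
```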
#### Enable Snapshot Isolation \[!toc]
Enable snapshot isolation on the database to ensure consistent reads during CDC operations. This step must be performed by a user with appropriate privileges (typically `sysadmin` or `db_owner`).
Replace `DATABASE_NAME` with the name of the database you want to replicate from.
#### Enable CDC at Database Level \[!toc]
This step must be performed by a user with appropriate privileges (typically `sysadmin` for Standard and RDS, `db_owner` for Azure).
If this user does not have SELECT permission on the database schema, explicitly grant SELECT on the `cdc` schema:
```sql
GRANT SELECT on SCHEMA::[cdc] to [supermetal_user];
```
#### Enable CDC for Tables
The following query creates a capture instance for a specific table. This step must be performed by a user with appropriate privileges.
Replace the placeholder values in the script with your actual information:
* `SCHEMA_NAME`: The schema of the table you want to replicate from.
* `TABLE_NAME`: The name of the table you want to replicate from.
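For reference, a sketch of the native procedure using the placeholders above:
```sql
-- Create a CDC capture instance for one table (run inside the database)
EXEC sys.sp_cdc_enable_table
    @source_schema = N'SCHEMA_NAME',
    @source_name   = N'TABLE_NAME',
    @role_name     = NULL; -- NULL skips the gating role; name a role to restrict access
```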
#### Configure CDC Job Parameters (Optional)
You can customize the CDC job parameters to optimize performance based on your workload.
Replace the parameter values with appropriate values for your environment:
* `INSERT_MAX_SCANS_VALUE`: Maximum number of scan cycles (default: 10)
* `INSERT_MAX_TRANS_VALUE`: Maximum number of transactions (default: 500)
* `INSERT_POLL_INTERVAL_SECS_VALUE`: Polling interval in seconds (default: 5)
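For reference, a sketch of tuning the capture job through the native procedure; restarting the job applies the new values:
```sql
-- Update the capture job parameters, then restart the job
EXEC sys.sp_cdc_change_job
    @job_type        = N'capture',
    @maxscans        = INSERT_MAX_SCANS_VALUE,
    @maxtrans        = INSERT_MAX_TRANS_VALUE,
    @pollinginterval = INSERT_POLL_INTERVAL_SECS_VALUE;
EXEC sys.sp_cdc_stop_job  @job_type = N'capture';
EXEC sys.sp_cdc_start_job @job_type = N'capture';
```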
***
## Data Types Mapping
| SQL Server Type(s) | Apache Arrow DataType | Notes |
| :----------------- | :-------------------------------------- | :--------------------------------------------------------------------------- |
| `TINYINT` | `UInt8` | |
| `SMALLINT` | `Int16` | |
| `INT` | `Int32` | |
| `BIGINT` | `Int64` | |
| `SMALLMONEY` | `Decimal128(10, 4)` | Fixed precision with 4 decimal places. |
| `MONEY` | `Decimal128(19, 4)` | Fixed precision with 4 decimal places. |
| `DECIMAL(p,s)` | `Decimal128(p, s)` / `Decimal256(p, s)` | Uses SQL Server precision and scale directly. Decimal256 for precision > 38. |
| `NUMERIC(p,s)` | `Decimal128(p, s)` / `Decimal256(p, s)` | Same as DECIMAL. Default precision is 18. |
| `REAL` | `Float64` | |
| `FLOAT` | `Float64` | |
| SQL Server Type(s) | Apache Arrow DataType | Notes |
| :----------------- | :-------------------- | :---- |
| `BIT` | `Boolean` | |
| SQL Server Type(s) | Apache Arrow DataType | Notes |
| :-------------------- | :--------------------- | :------------------------------------------------------------------------ |
| `DATE` | `Date32` | Days since UNIX epoch. |
| `TIME` | `Time64(ns)` | Nanoseconds since midnight. |
| `SMALLDATETIME` | `Timestamp(s)` | Second-level precision, minute-level accuracy (rounds to nearest minute). |
| `DATETIME` | `Timestamp(ms)` | Millisecond precision, 3.33ms accuracy. |
| `DATETIME2(0)` | `Timestamp(s)` | Second precision. |
| `DATETIME2(1-3)` | `Timestamp(ms)` | Millisecond precision (1-3 fractional digits). |
| `DATETIME2(4-6)` | `Timestamp(μs)` | Microsecond precision (4-6 fractional digits). |
| `DATETIME2(7)` | `Utf8` | Mapped to string to preserve full range (years 1-9999). |
| `DATETIMEOFFSET(0)` | `Timestamp(s, "UTC")` | Second precision, normalized to UTC. |
| `DATETIMEOFFSET(1-3)` | `Timestamp(ms, "UTC")` | Millisecond precision (1-3 fractional digits), normalized to UTC. |
| `DATETIMEOFFSET(4-6)` | `Timestamp(μs, "UTC")` | Microsecond precision (4-6 fractional digits), normalized to UTC. |
| `DATETIMEOFFSET(7)` | `Utf8` | Mapped to string to preserve full range (years 0001-9999). |
| SQL Server Type(s) | Apache Arrow DataType | Notes |
| :------------------ | :-------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `CHAR`, `VARCHAR` | `Utf8` | |
| `NCHAR`, `NVARCHAR` | `Utf8` | Unicode character strings. |
| `TEXT` | `Utf8` | Legacy large text type ([deprecated](https://learn.microsoft.com/en-us/sql/t-sql/data-types/ntext-text-and-image-transact-sql)). Use VARCHAR(MAX) instead. |
| `NTEXT` | `Utf8` | Legacy large Unicode text type ([deprecated](https://learn.microsoft.com/en-us/sql/t-sql/data-types/ntext-text-and-image-transact-sql)). Use NVARCHAR(MAX) instead. |
| `UNIQUEIDENTIFIER` | `Utf8` | UUID/GUID represented as a string. |
| SQL Server Type(s) | Apache Arrow DataType | Notes |
| :-------------------- | :-------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `BINARY`, `VARBINARY` | `Binary` | |
| `ROWVERSION` | `Binary` | Automatically incremented binary identifier. See [rowversion](https://learn.microsoft.com/en-us/sql/t-sql/data-types/rowversion-transact-sql). |
| `IMAGE` | `Binary` | Legacy large binary type ([deprecated](https://learn.microsoft.com/en-us/sql/t-sql/data-types/ntext-text-and-image-transact-sql)). Use VARBINARY(MAX) instead. |
| SQL Server Type(s) | Apache Arrow DataType |
| :----------------- | :-------------------- |
| `XML` | `Utf8` |
| `GEOGRAPHY` | *Unsupported* |
| `GEOMETRY` | *Unsupported* |
| `HIERARCHYID` | *Unsupported* |
***
# SQL Server Setup (/docs/main/sources/sqlserver/setup)
## Prerequisites
Before you begin, ensure you have:
* **SQL Server Requirements:**
* **Version:** SQL Server 2016 or higher
* **Supported Editions:**
* `Standard SQL Server` (Enterprise, Standard, Personal, Azure SQL Managed Instance)
* `Azure SQL Database`
* `Amazon RDS for SQL Server`
CDC is not supported on `Azure Synapse`, `Azure SQL Edge`, or `Azure Synapse Serverless`.
* **SQL Server Agent:** The [SQL Server Agent](https://learn.microsoft.com/en-us/sql/ssms/agent/sql-server-agent) service must be running, as it manages the CDC capture jobs.
* **Setup Requirements:**
* **Database Permissions:** `sysadmin` or `db_owner` role for CDC setup (see [Setup](#setup) for role-specific instructions)
* **Network Connectivity:** Ensure Supermetal can reach your SQL Server (default port: 1433)
***
## Setup
### Permissions Overview
Supermetal's SQL Server integration provides different levels of Change Data Capture (CDC) automation depending on your SQL Server edition and user permissions. The table below summarizes the automation capabilities:
| SQL Server Edition | User Role | Automation Capabilities |
| :----------------- | :----------- | :---------------------------------------------------------------------------------------------- |
| Standard | `sysadmin` | Full CDC workflow automation (database CDC enablement, table CDC enablement, job configuration) |
| Standard | `db_owner` | Partial automation (table CDC enablement, job configuration) |
| Standard | Regular user | Limited automation (requires manual setup by a privileged user) |
| Azure SQL Database | `db_owner` | Full CDC workflow automation (all CDC operations) |
| Azure SQL Database | Regular user | Limited automation (requires manual setup by a privileged user) |
| AWS RDS | `db_owner` | Partial automation (table CDC enablement, job configuration) |
| AWS RDS | Regular user | Limited automation (requires manual setup by a privileged user) |
We recommend creating a `db_owner` user for all SQL Server editions, as it provides the best tradeoff between automation and permissions.
### Setup Instructions
#### SYSADMIN Role \[!toc]
The `sysadmin` server role provides the highest level of permissions, enabling Supermetal to fully automate the CDC workflow, including enabling CDC at the database level, configuring tables for CDC, and managing CDC jobs.
The `sysadmin` role is only available on Standard SQL Server editions. Azure SQL Database and AWS RDS restrict the use of `sysadmin` privileges.
#### Create a SYSADMIN User \[!toc]
AWS RDS for SQL Server does not allow the creation of `sysadmin` users. Please use the DB\_OWNER role instead.
Azure SQL Database does not support the `sysadmin` role. Please use the DB\_OWNER role instead.
Replace `'strong-password'` with a secure, unique password for the Supermetal user. Store this password securely, as you'll need it when configuring the source in Supermetal.
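A minimal sketch of what such a setup script does, assuming the login name `supermetal_user`:
```sql
-- Create a server login and grant it the sysadmin server role
CREATE LOGIN supermetal_user WITH PASSWORD = 'strong-password';
ALTER SERVER ROLE sysadmin ADD MEMBER supermetal_user;
```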
#### DB\_OWNER Role \[!toc]
The `db_owner` database role provides sufficient permissions for most CDC operations except enabling CDC at the database level on Standard SQL Server and AWS RDS (this step must be performed by a `sysadmin`). On Azure SQL Database, `db_owner` can automate the entire CDC workflow.
#### Create a DB\_OWNER User \[!toc]
Replace the placeholder values in the script with your actual information:
* `'strong-password'`: Replace with a secure, unique password for the `supermetal_user`.
* `DATABASE_NAME`: The name of the database you want to replicate from.
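A minimal sketch for Standard SQL Server, assuming the login name `supermetal_user` (on Azure SQL Database, connect directly to the target database rather than using `USE`):
```sql
-- Create a login, map it to a database user, and add it to db_owner
CREATE LOGIN supermetal_user WITH PASSWORD = 'strong-password';
GO
USE DATABASE_NAME;
GO
CREATE USER supermetal_user FOR LOGIN supermetal_user;
ALTER ROLE db_owner ADD MEMBER supermetal_user;
```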
#### Enable Snapshot Isolation \[!toc]
Enable snapshot isolation on the database to ensure consistent reads during CDC operations. This step must be performed by a `sysadmin` user.
Not required for Azure SQL Database. Supermetal automatically manages snapshot isolation enablement.
Replace `DATABASE_NAME` with the name of the database you want to replicate from.
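For reference, snapshot isolation is enabled with:
```sql
-- Allow transactions in this database to use snapshot isolation
ALTER DATABASE DATABASE_NAME SET ALLOW_SNAPSHOT_ISOLATION ON;
```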
#### Enable CDC at Database Level \[!toc]
This step is required for Standard SQL Server and AWS RDS. It creates the CDC schema and starts the SQL Server Agent jobs necessary for CDC. This step must be performed by a user with `sysadmin` privileges.
Not required for Azure SQL Database. Supermetal automatically manages CDC enablement at the database level.
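For reference, the native procedures for this step are:
```sql
-- Standard SQL Server: run inside the target database as sysadmin
USE DATABASE_NAME;
GO
EXEC sys.sp_cdc_enable_db;
-- AWS RDS for SQL Server exposes a wrapper procedure instead:
-- EXEC msdb.dbo.rds_cdc_enable_db 'DATABASE_NAME';
```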
#### MINIMAL\_ACCESS\_USER Role \[!toc]
A minimal access user has limited permissions and requires a user with higher privileges (such as `sysadmin` or `db_owner`) to set up the CDC environment. After setup, the minimal access user can be used for ongoing replication.
#### Create a Minimal Access User \[!toc]
Replace the placeholder values in the script with your actual information:
* `'strong-password'`: Replace with a secure, unique password for the `supermetal_user`.
* `DATABASE_NAME`: The name of the database you want to replicate from.
* `SCHEMA_NAME`: The schema of the table you want to replicate from.
* `TABLE_NAME`: The name of the table you want to replicate from.
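A minimal sketch of such a user, assuming the login name `supermetal_user` and the placeholders above (the `cdc` schema grant in the later step completes the permissions):
```sql
-- Create a login and database user with read access to the replicated table
CREATE LOGIN supermetal_user WITH PASSWORD = 'strong-password';
GO
USE DATABASE_NAME;
GO
CREATE USER supermetal_user FOR LOGIN supermetal_user;
GRANT SELECT ON [SCHEMA_NAME].[TABLE_NAME] TO supermetal_user;
```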
#### Enable Snapshot Isolation \[!toc]
Enable snapshot isolation on the database to ensure consistent reads during CDC operations. This step must be performed by a user with appropriate privileges (typically `sysadmin` or `db_owner`).
Replace `DATABASE_NAME` with the name of the database you want to replicate from.
#### Enable CDC at Database Level \[!toc]
This step must be performed by a user with appropriate privileges (typically `sysadmin` for Standard and RDS, `db_owner` for Azure).
If this user does not have SELECT permission on the database schema, explicitly grant SELECT on the `cdc` schema:
```sql
GRANT SELECT on SCHEMA::[cdc] to [supermetal_user];
```
#### Enable CDC for Tables
The following query creates a capture instance for a specific table. This step must be performed by a user with appropriate privileges.
Replace the placeholder values in the script with your actual information:
* `SCHEMA_NAME`: The schema of the table you want to replicate from.
* `TABLE_NAME`: The name of the table you want to replicate from.
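For reference, a sketch of the native procedure using the placeholders above:
```sql
-- Create a CDC capture instance for one table (run inside the database)
EXEC sys.sp_cdc_enable_table
    @source_schema = N'SCHEMA_NAME',
    @source_name   = N'TABLE_NAME',
    @role_name     = NULL; -- NULL skips the gating role; name a role to restrict access
```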
#### Configure CDC Job Parameters (Optional)
You can customize the CDC job parameters to optimize performance based on your workload.
Replace the parameter values with appropriate values for your environment:
* `INSERT_MAX_SCANS_VALUE`: Maximum number of scan cycles (default: 10)
* `INSERT_MAX_TRANS_VALUE`: Maximum number of transactions (default: 500)
* `INSERT_POLL_INTERVAL_SECS_VALUE`: Polling interval in seconds (default: 5)
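For reference, a sketch of tuning the capture job through the native procedure; restarting the job applies the new values:
```sql
-- Update the capture job parameters, then restart the job
EXEC sys.sp_cdc_change_job
    @job_type        = N'capture',
    @maxscans        = INSERT_MAX_SCANS_VALUE,
    @maxtrans        = INSERT_MAX_TRANS_VALUE,
    @pollinginterval = INSERT_POLL_INTERVAL_SECS_VALUE;
EXEC sys.sp_cdc_stop_job  @job_type = N'capture';
EXEC sys.sp_cdc_start_job @job_type = N'capture';
```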
# MySQL (/docs/main/sources/mysql)
import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Callout } from 'fumadocs-ui/components/callout';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
MySQL's binary log (binlog) provides a reliable foundation for Change Data Capture and data replication. Supermetal builds on this capability to deliver efficient data extraction with minimal database overhead, leveraging Global Transaction Identifiers (GTIDs) for guaranteed consistency.
This guide covers the features, prerequisites, and configuration steps required to connect MySQL with Supermetal, focusing mainly on binlog setup and permissions.
***
## Features
|
Feature
|
Notes
|
|
Initial Data Sync
|
Performs fast initial data loads (snapshots) and historical backfills by chunking data within each table and reading each chunk concurrently using multiple connections:
**Physical Partitioning**: Automatically detects and utilizes existing MySQL table partitions, extracting data from each partition in parallel
**Dynamic Range Partitioning**: For non-partitioned tables with primary keys, creates optimized query boundaries based on data distribution for parallel extraction
|
|
Change Data Capture
|
Captures data changes in real-time using MySQL's native [binary log](https://dev.mysql.com/doc/refman/8.0/en/binary-log.html) with minimal database impact.
**GTID-based Consistency**: Uses [Global Transaction Identifiers](https://dev.mysql.com/doc/refman/8.0/en/replication-gtids.html) to maintain consistent replication position and preserve transaction boundaries, ensuring reliable recovery and preventing data loss.
Supermetal validates the GTID position against the server's purged GTIDs to ensure continuity, preventing replication from a position that's no longer available in the binary log.
|
|
Schema Evolution
|
Precise mapping of MySQL types to Apache Arrow with specialized handling for complex types, including DECIMAL (up to 76 digits precision), SET types (as arrays) and microsecond-precision temporal types. Schema evolution is properly tracked to ensure consistent downstream target compatibility.
|
|
Transactional Consistency
|
Guarantees that the data synced reflects a consistent point in the upstream database's timeline, respecting transaction boundaries through GTID-based replication.
|
|
Catalog Support
|
Ability to configure which schemas, tables and columns to sync.
|
***
## Prerequisites
Before you begin, ensure you have:
* **MySQL Requirements:**
* **Version:** MySQL 8.0 or later with GTID mode enabled
* **Supported Deployments:**
* `Self-Hosted MySQL`
* `Amazon RDS for MySQL` (Coming Soon)
* `Azure Database for MySQL` (Coming Soon)
GTID mode must be enabled on your MySQL server for Change Data Capture functionality. See [MySQL GTID Documentation](https://dev.mysql.com/doc/refman/8.0/en/replication-gtids.html) for details.
* **Binary Log Requirements:**
* **Binary Log Format**: Must be set to `ROW` format
* **Binary Log Retention**: Configure appropriate retention period (recommended: at least 3 days)
* **Deployment Requirements:**
* **Database Permissions**: User with `SELECT`, `REPLICATION SLAVE`, and `REPLICATION CLIENT` privileges
* **Network Connectivity**: Ensure Supermetal can reach your MySQL server (default port: 3306)
***
## Setup
### Permissions Overview
Supermetal requires a dedicated MySQL user with appropriate permissions to read data and access the binary log for CDC.
| Required Permission | Purpose |
| :------------------- | :------------------------------------------------------ |
| `SELECT` | Read data from tables during snapshot and CDC |
| `REPLICATION SLAVE` | Access the binary log for CDC |
| `REPLICATION CLIENT` | Query server status information including GTID position |
### Setup Instructions
### Self-Hosted MySQL Setup \[!toc]
#### Enable Binary Logging and GTID Mode \[!toc]
Edit your MySQL configuration file (`my.cnf` or `my.ini`):
```ini
[mysqld]
# Server identification (must be unique in your infrastructure)
server-id = 223344
# Enable binary logging
log_bin = mysql-bin
binlog_format = ROW
binlog_row_image = FULL
binlog_row_metadata = FULL
# Enable GTID mode
gtid_mode = ON
enforce_gtid_consistency = ON
# Set binary log retention (3 days)
binlog_expire_logs_seconds = 259200
```
* `server-id`: Must be unique for each server in your replication topology. Can be any non-zero value between 1 and 4294967295.
* `log_bin`: Base name for binary log files.
* `binlog_format`: Must be set to ROW for CDC to work properly.
* `binlog_row_image`: Must be set to FULL to capture complete before/after row images, which is required for CDC.
* `binlog_row_metadata`: Must be set to FULL to include complete metadata in binary log events.
* `gtid_mode` and `enforce_gtid_consistency`: Required for reliable replication position tracking.
* `binlog_expire_logs_seconds`: Controls how long binary logs are retained before automatic deletion.
#### Create a Dedicated User for Replication \[!toc]
Connect to MySQL as a privileged user and run:
```sql
CREATE USER 'supermetal_user'@'%' IDENTIFIED BY 'strong-password';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'supermetal_user'@'%';
FLUSH PRIVILEGES;
```
Replace the following variables with your own values:
* `supermetal_user`: Username for Supermetal access
* `strong-password`: Secure password for the Supermetal user
#### Verify Configuration \[!toc]
Verify binary logging and GTID configuration:
```sql
SHOW VARIABLES LIKE 'gtid_mode';
SHOW VARIABLES LIKE 'enforce_gtid_consistency';
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';
SHOW VARIABLES LIKE 'binlog_row_image';
SHOW VARIABLES LIKE 'binlog_row_metadata';
SHOW VARIABLES LIKE 'server_id';
```
The output should show:
* `gtid_mode`: ON
* `enforce_gtid_consistency`: ON
* `log_bin`: ON
* `binlog_format`: ROW
* `binlog_row_image`: FULL
* `binlog_row_metadata`: FULL
* `server_id`: A non-zero value
### Amazon RDS MySQL Setup \[!toc]
#### Create a Parameter Group for GTID Mode \[!toc]
1. In the AWS RDS console, navigate to "Parameter groups"
2. Click "Create parameter group"
3. Select the MySQL engine and version that matches your RDS instance
4. Enter a name and description (e.g., "supermetal-cdc-params")
5. Click "Create"
6. Select the newly created parameter group and modify the following parameters:
* `gtid-mode`: ON
* `enforce_gtid_consistency`: ON
* `binlog_format`: ROW
* `binlog_row_image`: FULL
* `binlog_row_metadata`: FULL
In RDS, some parameters cannot be modified, including `server-id`. RDS automatically assigns a unique server-id to each instance, so you don't need to configure this manually.
#### Apply the Parameter Group to Your RDS Instance \[!toc]
1. Navigate to your RDS instances and select your MySQL instance
2. Click "Modify"
3. Under "Additional configuration", select the parameter group you created
4. Choose "Apply immediately" or schedule for your next maintenance window
5. Click "Continue" and confirm the changes
#### Set Binary Log Retention Period \[!toc]
Connect to your RDS MySQL instance and run:
```sql
CALL mysql.rds_set_configuration('binlog retention hours', 72);
```
Amazon RDS uses `binlog retention hours` instead of the standard `binlog_expire_logs_seconds` parameter. The default value is 0 (immediate removal), so this step is essential.
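You can verify the new retention value with the companion procedure:
```sql
-- Show current RDS-level settings, including binlog retention hours
CALL mysql.rds_show_configuration;
```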
#### Create a User for Replication \[!toc]
```sql
CREATE USER 'supermetal_user'@'%' IDENTIFIED BY 'strong-password';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'supermetal_user'@'%';
FLUSH PRIVILEGES;
```
Replace the following variables with your own values:
* `supermetal_user`: Username for Supermetal access
* `strong-password`: Secure password for the Supermetal user
### Azure Database for MySQL Setup \[!toc]
#### Configure Server Parameters \[!toc]
1. In the Azure portal, navigate to your MySQL server resource
2. Select "Server parameters" under the Settings section
3. Find and set the following parameters:
* `gtid_mode`: ON
* `enforce_gtid_consistency`: ON
* `binlog_format`: ROW
* `binlog_row_image`: FULL
* `binlog_row_metadata`: FULL
4. Click "Save" to apply the changes
5. Restart your Azure Database for MySQL server if prompted
Some parameters might be read-only in Azure Database for MySQL, particularly in Flexible Server configurations. The `server-id` is automatically assigned by Azure and cannot be modified.
#### Create a User for Replication \[!toc]
Connect to your Azure MySQL instance and run:
```sql
CREATE USER 'supermetal_user'@'%' IDENTIFIED BY 'strong-password';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'supermetal_user'@'%';
FLUSH PRIVILEGES;
```
Replace the following variables with your own values:
* `supermetal_user`: Username for Supermetal access
* `strong-password`: Secure password for the Supermetal user
Ensure your Azure MySQL server's firewall rules allow connections from the Supermetal service IP addresses.
***
## Data Types Mapping
Supermetal automatically maps MySQL data types to Apache Arrow data types according to the following mapping:
| MySQL Type | Arrow Type | Notes |
| ----------- | ------------------------------------ | ------------------------------------------------------------------------------------------- |
| `TINYINT` | `Int8` / `UInt8` | Signed/unsigned determined by column definition |
| `SMALLINT` | `Int16` / `UInt16` | Signed/unsigned determined by column definition |
| `MEDIUMINT` | `Int32` / `UInt32` | Signed/unsigned determined by column definition |
| `INT` | `Int32` / `UInt32` | Signed/unsigned determined by column definition |
| `BIGINT` | `Int64` / `UInt64` | Signed/unsigned determined by column definition |
| `FLOAT` | `Float32` | |
| `DOUBLE` | `Float64` | |
| `DECIMAL` | `Decimal128` / `Decimal256` / `Utf8` | `Decimal128` for precision ≤ 38, `Decimal256` for precision ≤ 76, `Utf8` for precision > 76 |
| MySQL Type | Arrow Type | Notes |
| ---------- | ------------ | ------------------------------------------------------------------------------ |
| `CHAR` | `Utf8` | |
| `VARCHAR` | `Utf8` | |
| `TEXT` | `Utf8` | Includes all TEXT variants (TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT) |
| `ENUM` | `Utf8` | Original enum values preserved as strings |
| `SET` | `List` | Converted to arrays of strings (e.g., 'value1,value2' → \["value1", "value2"]) |
| `JSON` | `Utf8` | Preserved as JSON string |
| MySQL Type | Arrow Type | Notes |
| ----------- | ---------- | ------------------------------------------------------------------------------- |
| `BINARY` | `Binary` | |
| `VARBINARY` | `Binary` | |
| `BLOB` | `Binary` | Includes all BLOB variants (TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB) |
| `BIT` | `Utf8` | Converted to binary string representation (e.g., BIT(4) value 5 becomes "0101") |
| MySQL Type | Arrow Type | Notes |
| ----------- | -------------------------- | ------------------------------------------------------------------------------------------- |
| `DATE` | `Date32` | |
| `TIME` | `IntervalDayTime` / `Utf8` | `IntervalDayTime` for precision ≤ 3 (milliseconds), `Utf8` for precision > 3 (microseconds) |
| `DATETIME` | `Timestamp(Microsecond)` | Full microsecond precision without timezone |
| `TIMESTAMP` | `Timestamp(Microsecond)` | Full microsecond precision, stored in UTC internally by MySQL |
| `YEAR` | `UInt16` | |
| MySQL Type | Arrow Type |
| ----------------- | ------------- |
| `GEOMETRY` | *Unsupported* |
| `POINT` | *Unsupported* |
| `LINESTRING` | *Unsupported* |
| `POLYGON` | *Unsupported* |
| `MULTIPOINT` | *Unsupported* |
| `MULTILINESTRING` | *Unsupported* |
| `MULTIPOLYGON` | *Unsupported* |
| `GEOMCOLLECTION` | *Unsupported* |
| `VECTOR` | *Unsupported* |
Columns with unsupported data types are automatically excluded from replication. Tables containing only unsupported column types cannot be replicated.
# MySQL Setup (/docs/main/sources/mysql/setup)
## Prerequisites
Before you begin, ensure you have:
* **MySQL Requirements:**
* **Version:** MySQL 8.0 or later with GTID mode enabled
* **Supported Deployments:**
* `Self-Hosted MySQL`
* `Amazon RDS for MySQL` (Coming Soon)
* `Azure Database for MySQL` (Coming Soon)
GTID mode must be enabled on your MySQL server for Change Data Capture functionality. See [MySQL GTID Documentation](https://dev.mysql.com/doc/refman/8.0/en/replication-gtids.html) for details.
* **Binary Log Requirements:**
* **Binary Log Format**: Must be set to `ROW` format
* **Binary Log Retention**: Configure appropriate retention period (recommended: at least 3 days)
* **Deployment Requirements:**
* **Database Permissions**: User with `SELECT`, `REPLICATION SLAVE`, and `REPLICATION CLIENT` privileges
* **Network Connectivity**: Ensure Supermetal can reach your MySQL server (default port: 3306)
***
## Setup
### Permissions Overview
Supermetal requires a dedicated MySQL user with appropriate permissions to read data and access the binary log for CDC.
| Required Permission | Purpose |
| :------------------- | :------------------------------------------------------ |
| `SELECT` | Read data from tables during snapshot and CDC |
| `REPLICATION SLAVE` | Access the binary log for CDC |
| `REPLICATION CLIENT` | Query server status information including GTID position |
### Setup Instructions
### Self-Hosted MySQL Setup \[!toc]
#### Enable Binary Logging and GTID Mode \[!toc]
Edit your MySQL configuration file (`my.cnf` or `my.ini`):
```ini
[mysqld]
# Server identification (must be unique in your infrastructure)
server-id = 223344
# Enable binary logging
log_bin = mysql-bin
binlog_format = ROW
binlog_row_image = FULL
binlog_row_metadata = FULL
# Enable GTID mode
gtid_mode = ON
enforce_gtid_consistency = ON
# Set binary log retention (3 days)
binlog_expire_logs_seconds = 259200
```
* `server-id`: Must be unique for each server in your replication topology. Can be any non-zero value between 1 and 4294967295.
* `log_bin`: Base name for binary log files.
* `binlog_format`: Must be set to ROW for CDC to work properly.
* `binlog_row_image`: Must be set to FULL to capture complete before/after row images, which is required for CDC.
* `binlog_row_metadata`: Must be set to FULL to include complete metadata in binary log events.
* `gtid_mode` and `enforce_gtid_consistency`: Required for reliable replication position tracking.
* `binlog_expire_logs_seconds`: Controls how long binary logs are retained before automatic deletion.
#### Create a Dedicated User for Replication \[!toc]
Connect to MySQL as a privileged user and run:
```sql
CREATE USER 'supermetal_user'@'%' IDENTIFIED BY 'strong-password';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'supermetal_user'@'%';
FLUSH PRIVILEGES;
```
Replace the following variables with your own values:
* `supermetal_user`: Username for Supermetal access
* `strong-password`: Secure password for the Supermetal user
#### Verify Configuration \[!toc]
Verify binary logging and GTID configuration:
```sql
SHOW VARIABLES LIKE 'gtid_mode';
SHOW VARIABLES LIKE 'enforce_gtid_consistency';
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';
SHOW VARIABLES LIKE 'binlog_row_image';
SHOW VARIABLES LIKE 'binlog_row_metadata';
SHOW VARIABLES LIKE 'server_id';
```
The output should show:
* `gtid_mode`: ON
* `enforce_gtid_consistency`: ON
* `log_bin`: ON
* `binlog_format`: ROW
* `binlog_row_image`: FULL
* `binlog_row_metadata`: FULL
* `server_id`: A non-zero value
### Amazon RDS MySQL Setup \[!toc]
#### Create a Parameter Group for GTID Mode \[!toc]
1. In the AWS RDS console, navigate to "Parameter groups"
2. Click "Create parameter group"
3. Select the MySQL engine and version that matches your RDS instance
4. Enter a name and description (e.g., "supermetal-cdc-params")
5. Click "Create"
6. Select the newly created parameter group and modify the following parameters:
* `gtid-mode`: ON
* `enforce_gtid_consistency`: ON
* `binlog_format`: ROW
* `binlog_row_image`: FULL
* `binlog_row_metadata`: FULL
In RDS, some parameters cannot be modified, including `server-id`. RDS automatically assigns a unique server-id to each instance, so you don't need to configure this manually.
#### Apply the Parameter Group to Your RDS Instance \[!toc]
1. Navigate to your RDS instances and select your MySQL instance
2. Click "Modify"
3. Under "Additional configuration", select the parameter group you created
4. Choose "Apply immediately" or schedule for your next maintenance window
5. Click "Continue" and confirm the changes
#### Set Binary Log Retention Period \[!toc]
Connect to your RDS MySQL instance and run:
```sql
CALL mysql.rds_set_configuration('binlog retention hours', 72);
```
Amazon RDS uses `binlog retention hours` instead of the standard `binlog_expire_logs_seconds` parameter. The default value is 0 (immediate removal), so this step is essential.
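You can verify the new retention value with the companion procedure:
```sql
-- Show current RDS-level settings, including binlog retention hours
CALL mysql.rds_show_configuration;
```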
#### Create a User for Replication \[!toc]
```sql
CREATE USER 'supermetal_user'@'%' IDENTIFIED BY 'strong-password';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'supermetal_user'@'%';
FLUSH PRIVILEGES;
```
Replace the following variables with your own values:
* `supermetal_user`: Username for Supermetal access
* `strong-password`: Secure password for the Supermetal user
### Azure Database for MySQL Setup \[!toc]
#### Configure Server Parameters \[!toc]
1. In the Azure portal, navigate to your MySQL server resource
2. Select "Server parameters" under the Settings section
3. Find and set the following parameters:
* `gtid_mode`: ON
* `enforce_gtid_consistency`: ON
* `binlog_format`: ROW
* `binlog_row_image`: FULL
* `binlog_row_metadata`: FULL
4. Click "Save" to apply the changes
5. Restart your Azure Database for MySQL server if prompted
Some parameters might be read-only in Azure Database for MySQL, particularly in Flexible Server configurations. The `server-id` is automatically assigned by Azure and cannot be modified.
#### Create a User for Replication \[!toc]
Connect to your Azure MySQL instance and run:
```sql
CREATE USER 'supermetal_user'@'%' IDENTIFIED BY 'strong-password';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'supermetal_user'@'%';
FLUSH PRIVILEGES;
```
Replace the following variables with your own values:
* `supermetal_user`: Username for Supermetal access
* `strong-password`: Secure password for the Supermetal user
Ensure your Azure MySQL server's firewall rules allow connections from the Supermetal service IP addresses.
# PostgreSQL (/docs/main/sources/pg)
import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Callout } from 'fumadocs-ui/components/callout';
import { Steps, Step } from 'fumadocs-ui/components/steps';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
PostgreSQL (Postgres) is a powerful open-source relational database that supports logical replication, enabling real-time Change Data Capture (CDC).
This guide walks you through configuring your PostgreSQL database to work seamlessly with Supermetal.
***
## Features
|
Feature
|
Notes
|
|
Initial Data Sync
|
Performs fast initial data loads (snapshots) and historical backfills by chunking data within each table and reading each chunk concurrently using multiple connections.
|
|
Change Data Capture
|
Captures data changes (inserts, updates, deletes) in real-time using PostgreSQL's native logical replication for minimal database impact and guaranteed consistency.
|
|
Transactional Consistency
|
Guarantees that the data synced reflects a consistent point in the upstream database's timeline, respecting transaction boundaries.
|
|
Catalog Support
|
Ability to configure which schemas, tables and columns to sync.
|
***
## Prerequisites
Before you begin, ensure you have:
* **Supported PostgreSQL Version:** PostgreSQL version 9.4 or higher.
* **Database Admin Access:** Sufficient privileges to create users, grant permissions, and modify database configuration parameters. This might require `superuser` privileges or specific cloud provider roles (e.g., `rds_superuser` on AWS RDS, `azure_pg_admin` on Azure).
* **Network Connectivity:** Ensure that Supermetal's services can reach your PostgreSQL database. You may need to configure firewall rules, security groups, or VPC peering.
***
## Setup
Supermetal supports two replication types for PostgreSQL sources:
* **Snapshot**: One-time or periodic full table snapshots.
* **Logical Replication**: Real-time change data capture (CDC) with initial snapshot.
Follow these steps to configure PostgreSQL for snapshot-only replication:
### Create Read-Only User
Create a dedicated database user for Supermetal with read-only permissions.
```sql
-- Create user with a secure password
CREATE USER supermetal_user WITH PASSWORD 'strong-password';
```
Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the source in Supermetal.
### Grant Table Access
Grant SELECT permissions on the schemas and tables you want to replicate.
```sql
-- Grant schema access
GRANT USAGE ON SCHEMA public TO supermetal_user;
-- Grant select permission on all current tables in the schema
GRANT SELECT ON ALL TABLES IN SCHEMA public TO supermetal_user;
-- Ensure select permission is granted on any new tables created in the future
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO supermetal_user;
-- (Optional but Recommended) Bypass Row-Level Security (RLS)
-- This ensures Supermetal reads all data, regardless of RLS policies
-- applied for application users. Only grant if necessary and understood.
ALTER USER supermetal_user BYPASSRLS;
```
Repeat for each schema being replicated. Replace `public` with your schema name.
### Configure PostgreSQL Source in Supermetal
You are now ready to configure PostgreSQL as a source within Supermetal. Select **Snapshot** as the replication type when configuring the source.
To configure PostgreSQL for logical replication with CDC instead, follow these steps:
### Configure User and Permissions
Create a dedicated database user for Supermetal with the necessary permissions for replication and data access.
#### Create Replication User
The variants below reflect different PostgreSQL platforms; for example, the `rds_replication` grant is specific to AWS RDS. Use the one that matches your environment.
```sql
-- Create user with minimal required privileges
CREATE USER supermetal_user
WITH PASSWORD 'strong-password'
REPLICATION;
```
```sql
-- Create user and grant RDS replication role
CREATE USER supermetal_user
WITH PASSWORD 'strong-password';
GRANT rds_replication TO supermetal_user;
```
```sql
-- Create replication user
CREATE USER supermetal_user
WITH PASSWORD 'strong-password';
ALTER USER supermetal_user WITH REPLICATION;
```
```sql
-- Create replication user
CREATE USER supermetal_user
WITH PASSWORD 'strong-password';
ALTER ROLE supermetal_user WITH REPLICATION;
```
Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the source in Supermetal.
#### Grant Table Access
```sql
-- Current and future table access
GRANT USAGE ON SCHEMA public TO supermetal_user;
-- Grant select permission on all current tables in the schema
GRANT SELECT ON ALL TABLES IN SCHEMA public TO supermetal_user;
-- Ensure select permission is granted on any new tables created in the future
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO supermetal_user;
-- (Optional but Recommended) Bypass Row-Level Security (RLS)
-- This ensures Supermetal syncs all data, regardless of RLS policies
-- applied for application users. Only grant if necessary and understood.
ALTER USER supermetal_user BYPASSRLS;
```
Repeat for each schema being replicated. Replace `public` with your schema name.
### Configure Replication Settings (WAL)
Logical replication relies on PostgreSQL's Write-Ahead Log (WAL). You need to configure specific parameters to enable it.
#### Understanding WAL Settings
These settings must be configured for logical replication to work:
* `wal_level`: Must be set to `logical`. This adds the necessary information to the WAL for logical decoding.
* `max_wal_senders`: Defines the maximum number of concurrent replication connections the server can support. Supermetal uses these connections for CDC.
* `max_replication_slots`: Defines the maximum number of replication slots the server can manage. Each Supermetal CDC connection requires a replication slot to track its progress and prevent the server from purging WAL files needed for replication. This should generally be equal to or greater than max\_wal\_senders.
#### Platform-Specific Configuration
Apply the settings based on your PostgreSQL environment.
1. Check current settings:
```sql
-- View all replication settings
SELECT name, setting, unit, context
FROM pg_settings
WHERE name IN (
'wal_level', 'max_wal_senders', 'max_replication_slots'
);
```
2. Configure settings:
```sql
ALTER SYSTEM SET wal_level = 'logical';
ALTER SYSTEM SET max_wal_senders = 10; -- Adjust for your tables
ALTER SYSTEM SET max_replication_slots = 10; -- Match max_wal_senders
```
3. Restart PostgreSQL
Modify the DB Parameter Group:
1. Go to your RDS instance configuration.
2. Find the associated DB Parameter Group and modify it (or create a new one and associate it).
3. Set the following parameters:
1. `rds.logical_replication` = 1 (This enables logical replication on RDS)
2. `wal_level` = logical (Often set automatically when rds.logical\_replication is 1)
3. `max_wal_senders` = 10 (Adjust as needed)
4. `max_replication_slots` = 10 (Match or exceed max\_wal\_senders)
4. Apply to RDS instance and reboot: Applying changes to a Parameter Group that includes static parameters like wal\_level requires rebooting the RDS instance.
1. Enable the logical decoding flag: navigate to your Cloud SQL for PostgreSQL instance in the Google Cloud Console and click Edit.
2. Go to the Flags section.
3. Add or modify the following flags:
   1. `cloudsql.logical_decoding` = on
   2. `wal_sender_timeout` = 0 (Optional but recommended. Setting to 0 disables the timeout, preventing replication issues during long transactions or initial syncs. Default is 60 seconds.)
4. Automatic Management: Cloud SQL automatically manages `wal_level`, `max_wal_senders`, and `max_replication_slots` when `cloudsql.logical_decoding` is enabled. You do not need to set these manually.
5. Restart your instance to apply the changes.
1. Navigate to your Azure Database for PostgreSQL instance in the Azure Portal.
2. Go to Server parameters. Set the following parameters:
1. `wal_level` = logical
2. `max_wal_senders` = 10 (Adjust as needed)
3. `max_replication_slots` = 10 (Match or exceed max\_wal\_senders)
4. `wal_sender_timeout` = 0 (Optional but recommended, prevents timeouts. Default might be 60000ms)
3. Save and Restart Server: Changes to these parameters require a server restart to take effect. Azure Portal will prompt you to save and restart.
Changes to WAL parameters require a server restart.
### Configure Publications
Publications define which tables are replicated. Choose between replicating all tables or specific ones.
```sql
-- Option 1: Create a publication for ALL tables
CREATE PUBLICATION supermetal_publication FOR ALL TABLES;
-- Option 2: For warehouse targets, to sync partitioned tables as root, include publish_via_partition_root flag (https://www.postgresql.org/docs/current/sql-createpublication.html#SQL-CREATEPUBLICATION-PARAMS-WITH-PUBLISH-VIA-PARTITION-ROOT)
CREATE PUBLICATION supermetal_publication FOR ALL TABLES WITH (publish_via_partition_root);
```
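To confirm the publication exists and covers the expected tables, you can query the catalog views:
```sql
-- List publications and the tables each one covers
SELECT pubname, puballtables FROM pg_publication;
SELECT * FROM pg_publication_tables WHERE pubname = 'supermetal_publication';
```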
### Configure PostgreSQL Source in Supermetal
You are now ready to configure PostgreSQL as a source within Supermetal. Select **Logical Replication** as the replication type when configuring the source.
***
## Data Types Mapping
| PostgreSQL Type(s) | Apache Arrow DataType | Notes |
| :----------------- | :-------------------- | :---- |
| `BOOL` | `Boolean` | |
| PostgreSQL Type(s) | Apache Arrow DataType | Notes |
| :------------------------ | :-------------------------------------- | :----------------------------------------------------------------------------------------------- |
| `INT2` | `Int16` | |
| `INT4` | `Int32` | |
| `INT8`, `MONEY` | `Int64` | `MONEY` is mapped as a fixed-point integer. |
| `FLOAT4` | `Float32` | Single precision floating-point. |
| `FLOAT8` | `Float64` | Double precision floating-point. |
| `NUMERIC(p, s)` | `Decimal128(p, s)` / `Decimal256(p, s)` | Uses PostgreSQL precision (`p`) and scale (`s`) directly to ensure exact numeric representation. |
| `NUMERIC` (unconstrained) | `Utf8` | Unconstrained NUMERIC defaults to `Utf8` as precision/scale are undefined. |
| `OID` | `UInt32` | Object Identifier. |
| `PG_LSN` | `UInt64` | PostgreSQL Log Sequence Number. |
| PostgreSQL Type(s) | Apache Arrow DataType | Notes |
| :------------------- | :--------------------- | :--------------------------------------------------------------------------------------- |
| `DATE` | `Date32` | Days since UNIX epoch. Precision preserved at the day level. |
| `TIME` | `Time64(µs)` | Microseconds since midnight. Precision preserved at the microsecond level. |
| `TIMESTAMP` | `Timestamp(µs)` | Microseconds since UNIX epoch, timezone naive. Precision preserved at microsecond level. |
| `TIMESTAMPTZ` | `Timestamp(µs, "UTC")` | Microseconds since UNIX epoch, assumed UTC. Precision preserved at microsecond level. |
| `TIMETZ`, `INTERVAL` | `Utf8` | Temporal types with string representation in Arrow. |
| PostgreSQL Type(s) | Apache Arrow DataType | Notes |
| :----------------------------------------------------- | :-------------------- | :----------------------------------------------------------------------- |
| `CHAR`, `VARCHAR`, `TEXT`, `BPCHAR`, `NAME`, `UNKNOWN` | `Utf8` | General-purpose text types. |
| `citext`, `ltree`, `lquery`, `ltxtquery` | `Utf8` | Special text types (checked by name in code). |
| `BIT`, `VARBIT` | `Utf8` | Bit string types represented as text. |
| `UUID` | `Utf8` | Universally Unique Identifier. |
| `ENUM` | `Utf8` | Enumerated types represented as their text labels. |
| `INET`, `CIDR`, `MACADDR`, `MACADDR8` | `Utf8` | Network address types. |
| `TSVECTOR` | `Utf8` | Text search vector type represented as text. Positions are not preserved. |
| PostgreSQL Type(s) | Apache Arrow DataType | Notes |
| :----------------- | :-------------------- | :-------------------------- |
| `BYTEA` | `Binary` | Variable-length byte array. |
| PostgreSQL Type(s) | Apache Arrow DataType | Notes |
| :----------------- | :-------------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `JSON`, `JSONB` | `Utf8` | Mapped to `Utf8`. Using the [Canonical JSON Extension](https://arrow.apache.org/docs/format/CanonicalExtensions.html#json) (`arrow.json`) is recommended for interoperability, allowing downstream systems to recognize the content as JSON. This metadata signals to downstream supermetal targets (like Snowflake, Clickhouse, etc.) that the string contains JSON, allowing them to map it to their appropriate semi-structured types (e.g., VARIANT, JSON). |
| `XML` | `Utf8` | Mapped to `Utf8`. A canonical extension type for XML might also exist or be defined depending on specific library support, similar to JSON. |
| PostgreSQL Type(s) | Apache Arrow DataType | Notes |
| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :--------------------------- | :---------------------------------------------------------- |
| `BOOL[]` | `List` | Base types map as per their categories. |
| `BYTEA[]` | `List` | |
| `INT2[]` | `List` | |
| `INT4[]` | `List` | |
| `INT8[]`, `MONEY[]` | `List` | |
| `OID[]` | `List` | |
| `PG_LSN[]` | `List` | |
| `FLOAT4[]` | `List` | |
| `FLOAT8[]` | `List` | |
| `NUMERIC(p, s)[]` | `List` | Precision and scale preserved within list elements. |
| `DATE[]` | `List` | |
| `TIME[]` | `List` | |
| `TIMESTAMP[]` | `List` | |
| `TIMESTAMPTZ[]` | `List` | |
| `CHAR[]`, `VARCHAR[]`, `TEXT[]`, etc., `_citext`, `_ltree`, etc., `BIT[]`, `VARBIT[]`, `TIMETZ[]`, `INTERVAL[]`, `XML[]`, `UUID[]`, `INET[]`, `CIDR[]`, `MACADDR8[]`, `MACADDR[]`, `JSON[]`, `JSONB[]`, `NUMERIC[]` (unconstrained) | `List` | Includes arrays of types mapped to `Utf8`. |
| Multi-dimensional Arrays (>1 dim) | *Unsupported* | Multi-dimensional arrays (dimension > 1) are not supported. |
| PostgreSQL Type(s) | Apache Arrow DataType | Notes |
| :------------------------------------------------------------------------ | :-------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `INT4RANGE`, `INT8RANGE`, `NUMRANGE`, `TSRANGE`, `TSTZRANGE`, `DATERANGE` | `Utf8` | Range types are serialized to JSON format using the [Canonical JSON Extension](https://arrow.apache.org/docs/format/CanonicalExtensions.html#json) (`arrow.json`). |
| Multirange types | *Unsupported* | Multirange types are not supported. |
| PostgreSQL Type(s) | Apache Arrow DataType | Notes |
| :-------------------- | :--------------------- | :--------------------------------------------------------------------------------------------------------------------- |
| `DOMAIN(base_type)` | *Mapped as base\_type* | Domains are resolved to their underlying base type. |
| Composite Types (ROW) | `Utf8` | Composite types (structs/rows) are not explicitly mapped and default to a string (e.g., JSON) representation (`Utf8`). |
| PostgreSQL Type(s) | Apache Arrow DataType | Notes |
| :---------------------------------------------------------- | :-------------------- | :------------------------------- |
| `POINT`, `LSEG`, `PATH`, `BOX`, `POLYGON`, `LINE`, `CIRCLE` | *Unsupported* | Geometric types are unsupported. |
When replicating from PostgreSQL to PostgreSQL, Supermetal replicates the source PostgreSQL data types 1:1, preserving the exact type from source to target when possible.
***
# PostgreSQL Setup (/docs/main/sources/pg/setup)
## Prerequisites
Before you begin, ensure you have:
* **Supported PostgreSQL Version:** PostgreSQL version 9.4 or higher.
* **Database Admin Access:** Sufficient privileges to create users, grant permissions, and modify database configuration parameters. This might require `superuser` privileges or specific cloud provider roles (e.g., `rds_superuser` on AWS RDS, `azure_pg_admin` on Azure).
* **Network Connectivity:** Ensure that Supermetal's services can reach your PostgreSQL database. You may need to configure firewall rules, security groups, or VPC peering.
***
## Setup
Supermetal supports two replication types for PostgreSQL sources:
* **Snapshot**: One-time or periodic full table snapshots.
* **Logical Replication**: Real-time change data capture (CDC) with initial snapshot.
Follow these steps to configure PostgreSQL for snapshot-only replication:
### Create Read-Only User
Create a dedicated database user for Supermetal with read-only permissions.
```sql
-- Create user with a secure password
CREATE USER supermetal_user WITH PASSWORD 'strong-password';
```
Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the source in Supermetal.
### Grant Table Access
Grant SELECT permissions on the schemas and tables you want to replicate.
```sql
-- Grant schema access
GRANT USAGE ON SCHEMA public TO supermetal_user;
-- Grant select permission on all current tables in the schema
GRANT SELECT ON ALL TABLES IN SCHEMA public TO supermetal_user;
-- Ensure select permission is granted on any new tables created in the future
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO supermetal_user;
-- (Optional but Recommended) Bypass Row-Level Security (RLS)
-- This ensures Supermetal reads all data, regardless of RLS policies
-- applied for application users. Only grant if necessary and understood.
ALTER USER supermetal_user BYPASSRLS;
```
Repeat for each schema being replicated. Replace `public` with your schema name.
### Configure PostgreSQL Source in Supermetal
You are now ready to configure PostgreSQL as a source within Supermetal. Select **Snapshot** as the replication type when configuring the source.
To configure PostgreSQL for logical replication with CDC instead, follow these steps:
### Configure User and Permissions
Create a dedicated database user for Supermetal with the necessary permissions for replication and data access.
#### Create Replication User
```sql
-- Create user with minimal required privileges
CREATE USER supermetal_user
WITH PASSWORD 'strong-password'
REPLICATION;
```
```sql
-- Create user and grant RDS replication role
CREATE USER supermetal_user
WITH PASSWORD 'strong-password';
GRANT rds_replication TO supermetal_user;
```
```sql
-- Create replication user
CREATE USER supermetal_user
WITH PASSWORD 'strong-password';
ALTER USER supermetal_user WITH REPLICATION;
```
```sql
-- Create replication user
CREATE USER supermetal_user
WITH PASSWORD 'strong-password';
ALTER ROLE supermetal_user WITH REPLICATION;
```
Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the source in Supermetal.
#### Grant Table Access
```sql
-- Current and future table access
GRANT USAGE ON SCHEMA public TO supermetal_user;
-- Grant select permission on all current tables in the schema
GRANT SELECT ON ALL TABLES IN SCHEMA public TO supermetal_user;
-- Ensure select permission is granted on any new tables created in the future
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO supermetal_user;
-- (Optional but Recommended) Bypass Row-Level Security (RLS)
-- This ensures Supermetal syncs all data, regardless of RLS policies
-- applied for application users. Only grant if necessary and understood.
ALTER USER supermetal_user BYPASSRLS;
```
Repeat for each schema being replicated. Replace `public` with your schema name.
### Configure Replication Settings (WAL)
Logical replication relies on PostgreSQL's Write-Ahead Log (WAL). You need to configure specific parameters to enable it.
#### Understanding WAL Settings
These settings must be configured for logical replication to work:
* `wal_level`: Must be set to `logical`. This adds the necessary information to the WAL for logical decoding.
* `max_wal_senders`: Defines the maximum number of concurrent replication connections the server can support. Supermetal uses these connections for CDC.
* `max_replication_slots`: Defines the maximum number of replication slots the server can manage. Each Supermetal CDC connection requires a replication slot to track its progress and prevent the server from purging WAL files needed for replication. This should generally be equal to or greater than max\_wal\_senders.
#### Platform-Specific Configuration
Apply the settings based on your PostgreSQL environment.
1. Check current settings:
```sql
-- View all replication settings
SELECT name, setting, unit, context
FROM pg_settings
WHERE name IN (
'wal_level', 'max_wal_senders', 'max_replication_slots'
);
```
2. Configure settings:
```sql
ALTER SYSTEM SET wal_level = 'logical';
ALTER SYSTEM SET max_wal_senders = 10; -- Adjust based on concurrent replication connections
ALTER SYSTEM SET max_replication_slots = 10; -- Match max_wal_senders
```
3. Restart PostgreSQL
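4. Optionally, verify the settings after the restart:
```sql
-- Confirm the new values took effect
SHOW wal_level;              -- expect 'logical'
SHOW max_wal_senders;
SHOW max_replication_slots;
```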
Modify the DB Parameter Group:
1. Go to your RDS instance configuration.
2. Find the associated DB Parameter Group and modify it (or create a new one and associate it).
3. Set the following parameters:
1. `rds.logical_replication` = 1 (This enables logical replication on RDS)
2. `wal_level` = logical (Often set automatically when rds.logical\_replication is 1)
3. `max_wal_senders` = 10 (Adjust as needed)
4. `max_replication_slots` = 10 (Match or exceed max\_wal\_senders)
4. Apply to RDS instance and reboot: Applying changes to a Parameter Group that includes static parameters like wal\_level requires rebooting the RDS instance.
1. Navigate to your Cloud SQL for PostgreSQL instance in the Google Cloud Console and click Edit.
2. Go to the Flags section.
3. Add or modify the following flags:
   1. `cloudsql.logical_decoding` = on
   2. `wal_sender_timeout` = 0 (Optional but recommended. Setting to 0 disables the timeout, preventing replication issues during long transactions or initial syncs. Default is 60 seconds.)
4. Automatic Management: Cloud SQL automatically manages `wal_level`, `max_wal_senders`, and `max_replication_slots` when `cloudsql.logical_decoding` is enabled. You do not need to set these manually.
5. Restart your instance to apply the changes.
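If you manage the instance from the command line, the same flags can be set with `gcloud`; a sketch, assuming your instance is named `my-postgres-instance` (note that `--database-flags` replaces the instance's entire flag list, so include any existing flags as well):
```bash
# Set the logical decoding flags (this overwrites all previously set flags)
gcloud sql instances patch my-postgres-instance \
  --database-flags=cloudsql.logical_decoding=on,wal_sender_timeout=0
```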
1. Navigate to your Azure Database for PostgreSQL instance in the Azure Portal.
2. Go to Server parameters. Set the following parameters:
1. `wal_level` = logical
2. `max_wal_senders` = 10 (Adjust as needed)
3. `max_replication_slots` = 10 (Match or exceed max\_wal\_senders)
4. `wal_sender_timeout` = 0 (Optional but recommended, prevents timeouts. Default might be 60000ms)
3. Save and Restart Server: Changes to these parameters require a server restart to take effect. Azure Portal will prompt you to save and restart.
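These parameters can also be set from the Azure CLI; a sketch, assuming a Flexible Server named `my-server` in resource group `my-resource-group`:
```bash
# Set WAL parameters on Azure Database for PostgreSQL (Flexible Server)
az postgres flexible-server parameter set \
  --resource-group my-resource-group --server-name my-server \
  --name wal_level --value logical
az postgres flexible-server parameter set \
  --resource-group my-resource-group --server-name my-server \
  --name max_wal_senders --value 10
```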
Changes to WAL parameters require a server restart.
### Configure Publications
Publications define which tables are replicated. Choose between replicating all tables or specific ones.
```sql
-- Option 1: Create a publication for ALL tables
CREATE PUBLICATION supermetal_publication FOR ALL TABLES;
-- Option 2: For warehouse targets, to sync partitioned tables as root, include publish_via_partition_root flag (https://www.postgresql.org/docs/current/sql-createpublication.html#SQL-CREATEPUBLICATION-PARAMS-WITH-PUBLISH-VIA-PARTITION-ROOT)
CREATE PUBLICATION supermetal_publication FOR ALL TABLES WITH (publish_via_partition_root);
```
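To confirm the publication was created and inspect which tables it covers, you can query the catalog:
```sql
-- Verify the publication and its settings
SELECT pubname, puballtables, pubinsert, pubupdate, pubdelete
FROM pg_publication;
-- List the tables included in the publication
SELECT * FROM pg_publication_tables
WHERE pubname = 'supermetal_publication';
```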
### Configure PostgreSQL Source in Supermetal
You are now ready to configure PostgreSQL as a source within Supermetal. Select **Logical Replication** as the replication type when configuring the source.
# Oracle (/docs/main/sources/oracle)
import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Steps, Step } from 'fumadocs-ui/components/steps';
import { Callout } from 'fumadocs-ui/components/callout';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
Supermetal supports real-time change data capture (CDC) from Oracle using the native [LogMiner](https://docs.oracle.com/en/database/oracle/oracle-database/23/sutil/oracle-logminer-utility.html) utility to query the database redo logs.
This guide walks you through configuring your Oracle database to work seamlessly with Supermetal.
***
## Features
|
Feature
|
Notes
|
|
Initial Data Sync
|
Performs fast initial data loads (snapshots) and historical backfills by chunking data within each table and reading each chunk concurrently using multiple connections.
|
|
Change Data Capture
|
Captures data changes (inserts, updates, deletes) using Oracle's native [LogMiner](https://docs.oracle.com/en/database/oracle/oracle-database/23/sutil/oracle-logminer-utility.html) utility. LogMiner reads the database redo and archive logs to extract change data with guaranteed transactional consistency.
|
|
Transactional Consistency
|
Guarantees that the data synced reflects a consistent point in the upstream database's timeline, respecting transaction boundaries through LogMiner's redo log analysis.
|
|
Catalog Support
|
Ability to configure which schemas, tables and columns to sync.
|
|
Multi-tenant Architecture Support
|
Works with Oracle Container Database (CDB) and Pluggable Databases (PDB).
|
***
## Limitations
* [LogMiner](https://docs.oracle.com/en/database/oracle/oracle-database/23/sutil/oracle-logminer-utility.html#GUID-7594F0D7-0ACD-46E6-BD61-2751136ECDB4):
* Objects (tables or columns) with names longer than 30 characters are not supported.
* LOB data types: Partially supported. LogMiner only logs column values for LOB (XML, CLOB, NCLOB, and BLOB) data types if they are explicitly set or changed in a SQL statement.
* Schema evolution support depends on LogMiner dictionary strategy:
* **Redo logs**: Supported.
* **Online catalog**: Partial support, limited to backwards-compatible changes.
***
## Prerequisites
Before you begin, ensure you have:
* **Supported Oracle Version:** Oracle Database 12c or higher
* **Non-CDB**: Standard standalone Oracle database without container architecture.
* **CDB (Container Database)**:
  | Mode | Description |
  | -------------- | ---------------------------------------- |
  | **All PDBs** | Capture changes from all PDBs |
  | **Single PDB** | Capture changes from a specific PDB only |
* **Direct PDB connection**: Capture changes by directly connecting to a pluggable database. Requires Oracle 19c Update 10 or later.
**Oracle RAC (Real Application Clusters)** is not currently supported.
* **Supported Platforms:**
* **Self-Managed Oracle**: Generic Oracle databases on-premises or self-hosted in the cloud
* **AWS RDS Oracle**: Amazon RDS for Oracle instances (single tenant and multi-tenant)
* **Database Admin Access:** Database master or admin user with sufficient privileges to create users, grant permissions, and modify database configuration parameters
* **Network Connectivity:** Ensure that Supermetal's services can reach your Oracle database. You may need to configure security groups
***
## Setup
Follow these steps to configure your Oracle source:
### Create a user & grant permissions
```sql
-- Connect to your Oracle database as an Admin user
-- Create a dedicated user for Supermetal
CREATE USER SUPERMETAL_USER IDENTIFIED BY "strong-password";
-- Grant session and catalog permissions
GRANT CREATE SESSION TO SUPERMETAL_USER;
GRANT EXECUTE_CATALOG_ROLE TO SUPERMETAL_USER;
-- Grant LogMiner permissions
GRANT LOGMINING TO SUPERMETAL_USER;
-- Grant table access
GRANT SELECT ANY TABLE TO SUPERMETAL_USER;
GRANT SELECT ANY TRANSACTION TO SUPERMETAL_USER;
-- Grant access to LogMiner dictionary views
GRANT SELECT ON V_$DATABASE TO SUPERMETAL_USER;
GRANT SELECT ON V_$ARCHIVED_LOG TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGMNR_LOGS TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOG TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGFILE TO SUPERMETAL_USER;
GRANT SELECT ON V_$TRANSACTION TO SUPERMETAL_USER;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO SUPERMETAL_USER;
-- Required when using redo logs dictionary strategy (Oracle 19cU10+)
GRANT SELECT ON DBA_LOGMNR_DICTIONARY_BUILDLOG TO SUPERMETAL_USER;
-- Grant access to system views for metadata
GRANT SELECT ON DBA_OBJECTS TO SUPERMETAL_USER;
GRANT SELECT ON DBA_EXTENTS TO SUPERMETAL_USER;
-- Grant execute permissions for LogMiner
GRANT EXECUTE ON DBMS_LOGMNR TO SUPERMETAL_USER;
GRANT EXECUTE ON DBMS_LOGMNR_D TO SUPERMETAL_USER;
-- If TDE (Transparent Data Encryption) is enabled, also grant:
-- GRANT SELECT ON DBA_TABLESPACES TO SUPERMETAL_USER;
-- GRANT SELECT ON DBA_ENCRYPTED_COLUMNS TO SUPERMETAL_USER;
```
```sql
-- Connect to your Oracle database as an Admin user
-- Switch to the container (root) database
ALTER SESSION SET CONTAINER=CDB$ROOT;
-- Create a common user (must start with C##)
CREATE USER C##SUPERMETAL_USER IDENTIFIED BY "strong-password" CONTAINER=ALL;
-- Grant session and container permissions
GRANT CREATE SESSION, ALTER SESSION, SET CONTAINER TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT EXECUTE_CATALOG_ROLE TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Configure user to access all containers
ALTER USER C##SUPERMETAL_USER SET CONTAINER_DATA=ALL CONTAINER=CURRENT;
-- Grant LogMiner permissions
GRANT LOGMINING TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Grant table access across all containers
GRANT SELECT ANY TABLE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ANY TRANSACTION TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Grant access to LogMiner dictionary views
GRANT SELECT ON V_$DATABASE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVED_LOG TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_LOGS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOG TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGFILE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$TRANSACTION TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$CONTAINERS TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Required when using redo logs dictionary strategy (Oracle 19cU10+)
GRANT SELECT ON DBA_LOGMNR_DICTIONARY_BUILDLOG TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Grant access to system views for metadata
GRANT SELECT ON DBA_OBJECTS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON DBA_EXTENTS TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Grant execute permissions for LogMiner
GRANT EXECUTE ON DBMS_LOGMNR TO C##SUPERMETAL_USER;
GRANT EXECUTE ON DBMS_LOGMNR_D TO C##SUPERMETAL_USER;
-- If TDE is enabled, also grant:
-- GRANT SELECT ON DBA_TABLESPACES TO C##SUPERMETAL_USER CONTAINER=ALL;
-- GRANT SELECT ON DBA_ENCRYPTED_COLUMNS TO C##SUPERMETAL_USER CONTAINER=ALL;
```
```sql
-- Connect to CDB root as an Admin user
ALTER SESSION SET CONTAINER=CDB$ROOT;
-- Create a common user (must start with C##)
CREATE USER C##SUPERMETAL_USER IDENTIFIED BY "strong-password" CONTAINER=ALL;
-- Grant session and container permissions
GRANT CREATE SESSION, ALTER SESSION, SET CONTAINER TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT EXECUTE_CATALOG_ROLE TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Configure user to access specific PDB (replace PDB_NAME with your PDB)
ALTER USER C##SUPERMETAL_USER SET CONTAINER_DATA=(CDB$ROOT, PDB_NAME) CONTAINER=CURRENT;
-- Grant LogMiner permissions
GRANT LOGMINING TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Switch to the specific PDB
ALTER SESSION SET CONTAINER=PDB_NAME;
-- Grant table access in the PDB
GRANT SELECT ANY TABLE TO C##SUPERMETAL_USER;
GRANT SELECT ANY TRANSACTION TO C##SUPERMETAL_USER;
-- Switch back to CDB root
ALTER SESSION SET CONTAINER=CDB$ROOT;
-- Grant access to LogMiner dictionary views
GRANT SELECT ON V_$DATABASE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVED_LOG TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_LOGS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOG TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGFILE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$TRANSACTION TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$CONTAINERS TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Required when using redo logs dictionary strategy (Oracle 19cU10+)
GRANT SELECT ON DBA_LOGMNR_DICTIONARY_BUILDLOG TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Grant access to system views for metadata
GRANT SELECT ON DBA_OBJECTS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON DBA_EXTENTS TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Grant execute permissions for LogMiner
GRANT EXECUTE ON DBMS_LOGMNR TO C##SUPERMETAL_USER;
GRANT EXECUTE ON DBMS_LOGMNR_D TO C##SUPERMETAL_USER;
-- If TDE is enabled, also grant:
-- GRANT SELECT ON DBA_TABLESPACES TO C##SUPERMETAL_USER CONTAINER=ALL;
-- GRANT SELECT ON DBA_ENCRYPTED_COLUMNS TO C##SUPERMETAL_USER CONTAINER=ALL;
```
```sql
-- Connect directly to the PDB as an Admin user
-- Requires Oracle 19c Update 10 or later
-- Create a local user in the PDB
CREATE USER SUPERMETAL_USER IDENTIFIED BY "strong-password";
-- Grant session and catalog permissions
GRANT CREATE SESSION TO SUPERMETAL_USER;
GRANT EXECUTE_CATALOG_ROLE TO SUPERMETAL_USER;
-- Grant LogMiner permissions
GRANT LOGMINING TO SUPERMETAL_USER;
-- Grant table access
GRANT SELECT ANY TABLE TO SUPERMETAL_USER;
GRANT SELECT ANY TRANSACTION TO SUPERMETAL_USER;
-- Grant access to LogMiner dictionary views
GRANT SELECT ON V_$DATABASE TO SUPERMETAL_USER;
GRANT SELECT ON V_$ARCHIVED_LOG TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGMNR_LOGS TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOG TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGFILE TO SUPERMETAL_USER;
GRANT SELECT ON V_$TRANSACTION TO SUPERMETAL_USER;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO SUPERMETAL_USER;
-- Required when using redo logs dictionary strategy (Oracle 19cU10+)
GRANT SELECT ON DBA_LOGMNR_DICTIONARY_BUILDLOG TO SUPERMETAL_USER;
-- Grant access to system views for metadata
GRANT SELECT ON DBA_OBJECTS TO SUPERMETAL_USER;
GRANT SELECT ON DBA_EXTENTS TO SUPERMETAL_USER;
-- Grant execute permissions for LogMiner
GRANT EXECUTE ON DBMS_LOGMNR TO SUPERMETAL_USER;
GRANT EXECUTE ON DBMS_LOGMNR_D TO SUPERMETAL_USER;
-- If TDE is enabled, also grant:
-- GRANT SELECT ON DBA_TABLESPACES TO SUPERMETAL_USER;
-- GRANT SELECT ON DBA_ENCRYPTED_COLUMNS TO SUPERMETAL_USER;
```
```sql
-- Connect as the master user
-- Create a user for Supermetal
CREATE USER SUPERMETAL_USER IDENTIFIED BY "strong-password";
GRANT CONNECT TO SUPERMETAL_USER;
GRANT CREATE SESSION TO SUPERMETAL_USER;
-- Grant LogMiner and catalog permissions
GRANT LOGMINING TO SUPERMETAL_USER;
GRANT EXECUTE_CATALOG_ROLE TO SUPERMETAL_USER;
-- Grant table access
GRANT SELECT ANY TRANSACTION TO SUPERMETAL_USER;
GRANT SELECT ANY TABLE TO SUPERMETAL_USER;
-- Use RDS procedures to grant access to system objects
BEGIN
-- Required when using redo logs dictionary strategy (Oracle 19cU10+)
rdsadmin.rdsadmin_util.grant_sys_object('DBA_LOGMNR_DICTIONARY_BUILDLOG','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$DATABASE','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$ARCHIVED_LOG','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_CONTENTS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_LOGS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOG','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGFILE','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$TRANSACTION','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$ARCHIVE_DEST_STATUS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$PARAMETER','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('DBMS_LOGMNR','SUPERMETAL_USER','EXECUTE');
rdsadmin.rdsadmin_util.grant_sys_object('DBMS_LOGMNR_D','SUPERMETAL_USER','EXECUTE');
rdsadmin.rdsadmin_util.grant_sys_object('DBA_OBJECTS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('DBA_EXTENTS','SUPERMETAL_USER','SELECT');
END;
/
-- If TDE is enabled, also grant:
-- BEGIN
-- rdsadmin.rdsadmin_util.grant_sys_object('DBA_TABLESPACES','SUPERMETAL_USER','SELECT');
-- rdsadmin.rdsadmin_util.grant_sys_object('DBA_ENCRYPTED_COLUMNS','SUPERMETAL_USER','SELECT');
-- END;
-- /
```
```sql
-- Connect to the PDB as the master user
-- Requires Oracle 19c Update 10 or later on RDS
-- Create a local user in the PDB
CREATE USER SUPERMETAL_USER IDENTIFIED BY "strong-password";
GRANT CONNECT TO SUPERMETAL_USER;
GRANT CREATE SESSION TO SUPERMETAL_USER;
-- Grant LogMiner and catalog permissions
GRANT LOGMINING TO SUPERMETAL_USER;
GRANT EXECUTE_CATALOG_ROLE TO SUPERMETAL_USER;
-- Grant table access
GRANT SELECT ANY TRANSACTION TO SUPERMETAL_USER;
GRANT SELECT ANY TABLE TO SUPERMETAL_USER;
-- Use RDS procedures to grant access to system objects
BEGIN
rdsadmin.rdsadmin_util.grant_sys_object('DBA_LOGMNR_DICTIONARY_BUILDLOG','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$DATABASE','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$ARCHIVED_LOG','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_CONTENTS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_LOGS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOG','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGFILE','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$TRANSACTION','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$ARCHIVE_DEST_STATUS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$PARAMETER','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('DBMS_LOGMNR','SUPERMETAL_USER','EXECUTE');
rdsadmin.rdsadmin_util.grant_sys_object('DBMS_LOGMNR_D','SUPERMETAL_USER','EXECUTE');
rdsadmin.rdsadmin_util.grant_sys_object('DBA_OBJECTS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('DBA_EXTENTS','SUPERMETAL_USER','SELECT');
END;
/
-- If TDE is enabled, also grant:
-- BEGIN
-- rdsadmin.rdsadmin_util.grant_sys_object('DBA_TABLESPACES','SUPERMETAL_USER','SELECT');
-- rdsadmin.rdsadmin_util.grant_sys_object('DBA_ENCRYPTED_COLUMNS','SUPERMETAL_USER','SELECT');
-- END;
-- /
```
Oracle's default profiles often have low session limits which may impact Supermetal's parallel processing capabilities when snapshotting your database.
For optimal performance, set your `SESSIONS_PER_USER` limit to the smaller of:
* The number of Supermetal agent cores, or
* The number of database cores (available from `v$parameter.cpu_count`)
We recommend setting this value to at least 10 sessions:
```sql
-- Check current database CPU count
SELECT VALUE FROM v$parameter WHERE NAME = 'cpu_count';
-- Create profile with appropriate session limit
CREATE PROFILE SUPERMETAL_PROFILE LIMIT SESSIONS_PER_USER 10;
-- Apply profile to Supermetal user
ALTER USER SUPERMETAL_USER PROFILE SUPERMETAL_PROFILE;
```
For CDB environments, apply the profile to your common user instead:
```sql
ALTER USER C##SUPERMETAL_USER PROFILE SUPERMETAL_PROFILE;
```
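To confirm the limit is in place, you can query `DBA_PROFILES`:
```sql
-- Check the session limit applied to the profile
SELECT profile, resource_name, limit
FROM dba_profiles
WHERE profile = 'SUPERMETAL_PROFILE'
  AND resource_name = 'SESSIONS_PER_USER';
```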
Replace `"strong-password"` with a secure, unique password for the `SUPERMETAL_USER` account. Store this password securely, as you'll need it when configuring the source in Supermetal.
### Setup LogMiner
#### Archive Log Mode
Oracle's LogMiner requires that your database is running in ARCHIVELOG mode. You can check and enable it as follows:
```sql
-- Check if database is in ARCHIVELOG mode
SELECT log_mode FROM v$database;
-- If not in ARCHIVELOG mode, enable it
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
```
[Changing log mode](https://docs.oracle.com/en/database/oracle/oracle-database/23/sutil/oracle-logminer-utility.html#GUID-7594F0D7-0ACD-46E6-BD61-2751136ECDB4) requires a restart of the Oracle database.
Instruct Oracle [RMAN](https://docs.oracle.com/en/database/oracle/oracle-database/23/sutil/oracle-logminer-utility.html#GUID-7594F0D7-0ACD-46E6-BD61-2751136ECDB4) to retain archive logs for at least 3 days.
```sql
RMAN> CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 3 DAYS;
```
For optimal LogMiner performance, ensure your redo logs are appropriately sized.
```sql
-- Check current redo log configuration
SELECT group#, bytes/1024/1024 AS size_mb FROM v$log;
```
Each redo log file should be at least 500MB for production databases.
If your logs are smaller, consider increasing their size by contacting your DBA.
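Resizing is typically done by adding new, larger log groups and dropping the old ones once they become inactive. A sketch, with illustrative group numbers and file paths:
```sql
-- Add new, larger redo log groups (group numbers and paths are placeholders)
ALTER DATABASE ADD LOGFILE GROUP 4 ('/u01/oradata/ORCL/redo04.log') SIZE 1G;
ALTER DATABASE ADD LOGFILE GROUP 5 ('/u01/oradata/ORCL/redo05.log') SIZE 1G;
-- Force a log switch so the old groups become INACTIVE, then drop them
ALTER SYSTEM SWITCH LOGFILE;
-- ALTER DATABASE DROP LOGFILE GROUP 1;
```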
```sql
-- Check if database is in ARCHIVELOG mode
SELECT log_mode FROM v$database;
-- If not in ARCHIVELOG mode, enable it by turning on backups in AWS RDS console
```
AWS RDS Oracle instances are automatically configured with ARCHIVELOG mode enabled when backups are enabled.
Use the `set_configuration` procedure to set archive log retention to at least 3 days (72 hours):
```sql
BEGIN
rdsadmin.rdsadmin_util.set_configuration('archivelog retention hours', 72);
END;
/
```
Adjust this value based on your backup and recovery requirements.
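To confirm the retention setting on RDS, you can query the `rdsadmin` configuration view:
```sql
-- Show current RDS configuration, including archivelog retention hours
SELECT name, value, description FROM rdsadmin.rds_configuration;
```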
#### Database Supplemental Logging
LogMiner requires supplemental logging to capture the necessary information for CDC:
* **Minimal**: Lower redo log usage but requires table-level supplemental logging configuration for each table.
* **Full**: Higher redo log usage but eliminates the need for table-level configuration.
```sql
-- Enable minimal supplemental logging at the database level
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
-- Then configure table-level supplemental logging for each table to be synced
-- Repeat for each table to be synced:
ALTER TABLE schema_name.table_name ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
```
```sql
-- Enable full supplemental logging at the database level
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
```
For more reliable CDC, consider enabling force logging at the database level:
```sql
ALTER DATABASE FORCE LOGGING;
```
```sql
-- Enable full supplemental logging using RDS procedures
BEGIN
rdsadmin.rdsadmin_util.alter_supplemental_logging(
p_action => 'ADD',
p_type => 'ALL'
);
END;
/
```
For more reliable CDC, enable force logging using the RDS specific procedure:
```sql
BEGIN
rdsadmin.rdsadmin_util.force_logging(p_enable => true);
END;
/
```
This ensures all database operations are logged.
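You can verify the logging configuration at any time from `V$DATABASE`:
```sql
-- MIN should be YES; ALL is YES only with full supplemental logging;
-- FORCE_LOGGING reads YES when force logging is enabled
SELECT supplemental_log_data_min, supplemental_log_data_all, force_logging
FROM v$database;
```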
***
## Data Types Mapping
| Oracle Type(s) | Apache Arrow DataType | Notes |
| :--------------------------------- | :-------------------- | :----------------------------------------- |
| `NUMBER(p, s)` where p,s specified | `Decimal128(p, s)` | Constrained precision and scale. |
| `NUMBER` (identity column) | `Decimal128(38, 0)` | Primary keys |
| `NUMBER` (unconstrained) | `Utf8` | Unconstrained precision and scale |
| `NUMBER(p, s)` where s \< 0 | `Utf8` | Negative scale numbers |
| `BINARY_FLOAT` | `Float32` | 32-bit IEEE 754 single precision |
| `BINARY_DOUBLE` | `Float64` | 64-bit IEEE 754 double precision |
| `FLOAT(p)` where p ≤ 20 | `Float32` | Binary precision up to \~6 decimal digits |
| `FLOAT(p)` where 20 \< p ≤ 49 | `Float64` | Binary precision up to \~15 decimal digits |
| `FLOAT(p)` where p > 49 | `Utf8` | High-precision floats preserved as strings |
| Oracle Type(s) | Apache Arrow DataType | Notes |
| :------------- | :-------------------- | :---- |
| `BOOLEAN` | *Unsupported* | |
| Oracle Type(s) | Apache Arrow DataType | Notes |
| :---------------------------- | :-------------------------- | :---------------------------------------------------------------------------------------------- |
| `DATE` | `Timestamp(s)` | Oracle DATE includes both date and time components |
| `TIMESTAMP(p)` | `Timestamp(s/ms/μs)` | Precision based on p: seconds (p=0), milliseconds (p=1-3), microseconds (p=4-6) |
| `TIMESTAMP(ns)` | `Utf8` | |
| `TIMESTAMP(p) WITH TIME ZONE` | `Timestamp(s/ms/μs, "UTC")` | Same precision levels as above, normalized to UTC |
| `TIMESTAMP(p) WITH LOCAL TZ` | `Timestamp(s/ms/μs)` | Same precision levels as above, UTC timezone |
| `INTERVAL YEAR(2) TO MONTH` | `Interval(YearMonth)` | Stores duration as months (years×12 + months) |
| `INTERVAL YEAR TO MONTH` | `Utf8` | |
| `INTERVAL DAY TO SECOND` | `Utf8` | |
| Oracle Type(s) | Apache Arrow DataType | Notes |
| :------------- | :-------------------- | :---------------------------------------------------------------------------------------------------- |
| `CHAR` | `Utf8` | Fixed-length character data |
| `NCHAR` | `Utf8` | Fixed-length Unicode character data |
| `VARCHAR2` | `Utf8` | Variable-length character data. Supports extended string size (MAX\_STRING\_SIZE = EXTENDED). |
| `NVARCHAR2` | `Utf8` | Variable-length Unicode character data. Supports extended string size (MAX\_STRING\_SIZE = EXTENDED). |
| `CLOB` | `LargeUtf8` | Character Large Object; mapped to `LargeUtf8` to support text larger than 2GB |
| `NCLOB` | `LargeUtf8` | National Character Large Object (Unicode); mapped to `LargeUtf8` to support text larger than 2GB |
| `LONG` | `LargeUtf8` | Legacy long text type (deprecated in Oracle) |
| `ROWID` | `Utf8` | Oracle's physical row address |
| Oracle Type(s) | Apache Arrow DataType | Notes |
| :------------- | :-------------------- | :------------------------------------------------------------------------------- |
| `RAW` | `Binary` | Fixed-length binary data. Supports extended size (MAX\_STRING\_SIZE = EXTENDED). |
| `BLOB` | `LargeBinary` | Binary Large Object; mapped to `LargeBinary` to support data larger than 2GB |
| `BFILE` | `LargeBinary` | Binary file locator for external files |
| `LONG RAW` | `LargeBinary` | Legacy long binary type (deprecated in Oracle) |
| Oracle Type(s) | Apache Arrow DataType | Notes |
| :------------- | :-------------------- | :---------------------------------------------------------------------- |
| `XMLTYPE` | `Utf8` | Mapped to `Utf8`. An extension type for XML is defined, similar to JSON. |
| `JSON` | *Unsupported* | |
| Oracle Type(s) | Apache Arrow DataType | Notes |
| :------------- | :-------------------- | :----------------------------- |
| `OBJECT` | *Unsupported* | Object types are not supported |
# Oracle Setup (/docs/main/sources/oracle/setup)
## Prerequisites
Before you begin, ensure you have:
* **Supported Oracle Version:** Oracle Database 12c or higher
* **Non-CDB**: Standard standalone Oracle database without container architecture.
* **CDB (Container Database)**:
  | Mode | Description |
  | -------------- | ---------------------------------------- |
  | **All PDBs** | Capture changes from all PDBs |
  | **Single PDB** | Capture changes from a specific PDB only |
* **Direct PDB connection**: Capture changes by directly connecting to a pluggable database. Requires Oracle 19c Update 10 or later.
**Oracle RAC (Real Application Clusters)** is not currently supported.
* **Supported Platforms:**
* **Self-Managed Oracle**: Generic Oracle databases on-premises or self-hosted in the cloud
* **AWS RDS Oracle**: Amazon RDS for Oracle instances (single tenant and multi-tenant)
* **Database Admin Access:** Database master or admin user with sufficient privileges to create users, grant permissions, and modify database configuration parameters
* **Network Connectivity:** Ensure that Supermetal's services can reach your Oracle database. You may need to configure security groups
***
## Setup
Follow these steps to configure your Oracle source:
### Create a user & grant permissions
```sql
-- Connect to your Oracle database as an Admin user
-- Create a dedicated user for Supermetal
CREATE USER SUPERMETAL_USER IDENTIFIED BY "strong-password";
-- Grant session and catalog permissions
GRANT CREATE SESSION TO SUPERMETAL_USER;
GRANT EXECUTE_CATALOG_ROLE TO SUPERMETAL_USER;
-- Grant LogMiner permissions
GRANT LOGMINING TO SUPERMETAL_USER;
-- Grant table access
GRANT SELECT ANY TABLE TO SUPERMETAL_USER;
GRANT SELECT ANY TRANSACTION TO SUPERMETAL_USER;
-- Grant access to LogMiner dictionary views
GRANT SELECT ON V_$DATABASE TO SUPERMETAL_USER;
GRANT SELECT ON V_$ARCHIVED_LOG TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGMNR_LOGS TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOG TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGFILE TO SUPERMETAL_USER;
GRANT SELECT ON V_$TRANSACTION TO SUPERMETAL_USER;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO SUPERMETAL_USER;
-- Required when using redo logs dictionary strategy (Oracle 19cU10+)
GRANT SELECT ON DBA_LOGMNR_DICTIONARY_BUILDLOG TO SUPERMETAL_USER;
-- Grant access to system views for metadata
GRANT SELECT ON DBA_OBJECTS TO SUPERMETAL_USER;
GRANT SELECT ON DBA_EXTENTS TO SUPERMETAL_USER;
-- Grant execute permissions for LogMiner
GRANT EXECUTE ON DBMS_LOGMNR TO SUPERMETAL_USER;
GRANT EXECUTE ON DBMS_LOGMNR_D TO SUPERMETAL_USER;
-- If TDE (Transparent Data Encryption) is enabled, also grant:
-- GRANT SELECT ON DBA_TABLESPACES TO SUPERMETAL_USER;
-- GRANT SELECT ON DBA_ENCRYPTED_COLUMNS TO SUPERMETAL_USER;
```
```sql
-- Connect to your Oracle database as an Admin user
-- Switch to the container (root) database
ALTER SESSION SET CONTAINER=CDB$ROOT;
-- Create a common user (must start with C##)
CREATE USER C##SUPERMETAL_USER IDENTIFIED BY "strong-password" CONTAINER=ALL;
-- Grant session and container permissions
GRANT CREATE SESSION, ALTER SESSION, SET CONTAINER TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT EXECUTE_CATALOG_ROLE TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Configure user to access all containers
ALTER USER C##SUPERMETAL_USER SET CONTAINER_DATA=ALL CONTAINER=CURRENT;
-- Grant LogMiner permissions
GRANT LOGMINING TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Grant table access across all containers
GRANT SELECT ANY TABLE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ANY TRANSACTION TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Grant access to LogMiner dictionary views
GRANT SELECT ON V_$DATABASE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVED_LOG TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_LOGS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOG TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGFILE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$TRANSACTION TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$CONTAINERS TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Required when using redo logs dictionary strategy (Oracle 19cU10+)
GRANT SELECT ON DBA_LOGMNR_DICTIONARY_BUILDLOG TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Grant access to system views for metadata
GRANT SELECT ON DBA_OBJECTS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON DBA_EXTENTS TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Grant execute permissions for LogMiner
GRANT EXECUTE ON DBMS_LOGMNR TO C##SUPERMETAL_USER;
GRANT EXECUTE ON DBMS_LOGMNR_D TO C##SUPERMETAL_USER;
-- If TDE is enabled, also grant:
-- GRANT SELECT ON DBA_TABLESPACES TO C##SUPERMETAL_USER CONTAINER=ALL;
-- GRANT SELECT ON DBA_ENCRYPTED_COLUMNS TO C##SUPERMETAL_USER CONTAINER=ALL;
```
```sql
-- Connect to CDB root as an Admin user
ALTER SESSION SET CONTAINER=CDB$ROOT;
-- Create a common user (must start with C##)
CREATE USER C##SUPERMETAL_USER IDENTIFIED BY "strong-password" CONTAINER=ALL;
-- Grant session and container permissions
GRANT CREATE SESSION, ALTER SESSION, SET CONTAINER TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT EXECUTE_CATALOG_ROLE TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Configure user to access specific PDB (replace PDB_NAME with your PDB)
ALTER USER C##SUPERMETAL_USER SET CONTAINER_DATA=(CDB$ROOT, PDB_NAME) CONTAINER=CURRENT;
-- Grant LogMiner permissions
GRANT LOGMINING TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Switch to the specific PDB
ALTER SESSION SET CONTAINER=PDB_NAME;
-- Grant table access in the PDB
GRANT SELECT ANY TABLE TO C##SUPERMETAL_USER;
GRANT SELECT ANY TRANSACTION TO C##SUPERMETAL_USER;
-- Switch back to CDB root
ALTER SESSION SET CONTAINER=CDB$ROOT;
-- Grant access to LogMiner dictionary views
GRANT SELECT ON V_$DATABASE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVED_LOG TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_LOGS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOG TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGFILE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$TRANSACTION TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$CONTAINERS TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Required when using redo logs dictionary strategy (Oracle 19cU10+)
GRANT SELECT ON DBA_LOGMNR_DICTIONARY_BUILDLOG TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Grant access to system views for metadata
GRANT SELECT ON DBA_OBJECTS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON DBA_EXTENTS TO C##SUPERMETAL_USER CONTAINER=ALL;
-- Grant execute permissions for LogMiner
GRANT EXECUTE ON DBMS_LOGMNR TO C##SUPERMETAL_USER;
GRANT EXECUTE ON DBMS_LOGMNR_D TO C##SUPERMETAL_USER;
-- If TDE is enabled, also grant:
-- GRANT SELECT ON DBA_TABLESPACES TO C##SUPERMETAL_USER CONTAINER=ALL;
-- GRANT SELECT ON DBA_ENCRYPTED_COLUMNS TO C##SUPERMETAL_USER CONTAINER=ALL;
```
```sql
-- Connect directly to the PDB as an Admin user
-- Requires Oracle 19c Update 10 or later
-- Create a local user in the PDB
CREATE USER SUPERMETAL_USER IDENTIFIED BY "strong-password";
-- Grant session and catalog permissions
GRANT CREATE SESSION TO SUPERMETAL_USER;
GRANT EXECUTE_CATALOG_ROLE TO SUPERMETAL_USER;
-- Grant LogMiner permissions
GRANT LOGMINING TO SUPERMETAL_USER;
-- Grant table access
GRANT SELECT ANY TABLE TO SUPERMETAL_USER;
GRANT SELECT ANY TRANSACTION TO SUPERMETAL_USER;
-- Grant access to LogMiner dictionary views
GRANT SELECT ON V_$DATABASE TO SUPERMETAL_USER;
GRANT SELECT ON V_$ARCHIVED_LOG TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGMNR_LOGS TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOG TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGFILE TO SUPERMETAL_USER;
GRANT SELECT ON V_$TRANSACTION TO SUPERMETAL_USER;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO SUPERMETAL_USER;
-- Required when using redo logs dictionary strategy (Oracle 19cU10+)
GRANT SELECT ON DBA_LOGMNR_DICTIONARY_BUILDLOG TO SUPERMETAL_USER;
-- Grant access to system views for metadata
GRANT SELECT ON DBA_OBJECTS TO SUPERMETAL_USER;
GRANT SELECT ON DBA_EXTENTS TO SUPERMETAL_USER;
-- Grant execute permissions for LogMiner
GRANT EXECUTE ON DBMS_LOGMNR TO SUPERMETAL_USER;
GRANT EXECUTE ON DBMS_LOGMNR_D TO SUPERMETAL_USER;
-- If TDE is enabled, also grant:
-- GRANT SELECT ON DBA_TABLESPACES TO SUPERMETAL_USER;
-- GRANT SELECT ON DBA_ENCRYPTED_COLUMNS TO SUPERMETAL_USER;
```
```sql
-- Connect as the master user
-- Create a user for Supermetal
CREATE USER SUPERMETAL_USER IDENTIFIED BY "strong-password";
GRANT CONNECT TO SUPERMETAL_USER;
GRANT CREATE SESSION TO SUPERMETAL_USER;
-- Grant LogMiner and catalog permissions
GRANT LOGMINING TO SUPERMETAL_USER;
GRANT EXECUTE_CATALOG_ROLE TO SUPERMETAL_USER;
-- Grant table access
GRANT SELECT ANY TRANSACTION TO SUPERMETAL_USER;
GRANT SELECT ANY TABLE TO SUPERMETAL_USER;
-- Use RDS procedures to grant access to system objects
BEGIN
-- Required when using redo logs dictionary strategy (Oracle 19cU10+)
rdsadmin.rdsadmin_util.grant_sys_object('DBA_LOGMNR_DICTIONARY_BUILDLOG','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$DATABASE','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$ARCHIVED_LOG','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_CONTENTS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_LOGS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOG','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGFILE','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$TRANSACTION','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$ARCHIVE_DEST_STATUS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$PARAMETER','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('DBMS_LOGMNR','SUPERMETAL_USER','EXECUTE');
rdsadmin.rdsadmin_util.grant_sys_object('DBMS_LOGMNR_D','SUPERMETAL_USER','EXECUTE');
rdsadmin.rdsadmin_util.grant_sys_object('DBA_OBJECTS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('DBA_EXTENTS','SUPERMETAL_USER','SELECT');
END;
/
-- If TDE is enabled, also grant:
-- BEGIN
-- rdsadmin.rdsadmin_util.grant_sys_object('DBA_TABLESPACES','SUPERMETAL_USER','SELECT');
-- rdsadmin.rdsadmin_util.grant_sys_object('DBA_ENCRYPTED_COLUMNS','SUPERMETAL_USER','SELECT');
-- END;
-- /
```
```sql
-- Connect to the PDB as the master user
-- Requires Oracle 19c Update 10 or later on RDS
-- Create a local user in the PDB
CREATE USER SUPERMETAL_USER IDENTIFIED BY "strong-password";
GRANT CONNECT TO SUPERMETAL_USER;
GRANT CREATE SESSION TO SUPERMETAL_USER;
-- Grant LogMiner and catalog permissions
GRANT LOGMINING TO SUPERMETAL_USER;
GRANT EXECUTE_CATALOG_ROLE TO SUPERMETAL_USER;
-- Grant table access
GRANT SELECT ANY TRANSACTION TO SUPERMETAL_USER;
GRANT SELECT ANY TABLE TO SUPERMETAL_USER;
-- Use RDS procedures to grant access to system objects
BEGIN
rdsadmin.rdsadmin_util.grant_sys_object('DBA_LOGMNR_DICTIONARY_BUILDLOG','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$DATABASE','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$ARCHIVED_LOG','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_CONTENTS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_LOGS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOG','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGFILE','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$TRANSACTION','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$ARCHIVE_DEST_STATUS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('V_$PARAMETER','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('DBMS_LOGMNR','SUPERMETAL_USER','EXECUTE');
rdsadmin.rdsadmin_util.grant_sys_object('DBMS_LOGMNR_D','SUPERMETAL_USER','EXECUTE');
rdsadmin.rdsadmin_util.grant_sys_object('DBA_OBJECTS','SUPERMETAL_USER','SELECT');
rdsadmin.rdsadmin_util.grant_sys_object('DBA_EXTENTS','SUPERMETAL_USER','SELECT');
END;
/
-- If TDE is enabled, also grant:
-- BEGIN
-- rdsadmin.rdsadmin_util.grant_sys_object('DBA_TABLESPACES','SUPERMETAL_USER','SELECT');
-- rdsadmin.rdsadmin_util.grant_sys_object('DBA_ENCRYPTED_COLUMNS','SUPERMETAL_USER','SELECT');
-- END;
-- /
```
Oracle's default profiles often have low session limits which may impact Supermetal's parallel processing capabilities when snapshotting your database.
For optimal performance, set your `SESSIONS_PER_USER` limit to the smaller of:
* The number of Supermetal agent cores, or
* The number of database cores (available from `v$parameter.cpu_count`)
We recommend setting this value to at least 10 sessions:
```sql
-- Check current database CPU count
SELECT VALUE FROM v$parameter WHERE NAME = 'cpu_count';
-- Create profile with appropriate session limit
CREATE PROFILE SUPERMETAL_PROFILE LIMIT SESSIONS_PER_USER 10;
-- Apply profile to Supermetal user
ALTER USER SUPERMETAL_USER PROFILE SUPERMETAL_PROFILE;
```
For CDB environments, apply the profile to your common user instead:
```sql
ALTER USER C##SUPERMETAL_USER PROFILE SUPERMETAL_PROFILE;
```
Replace `"strong-password"` with a secure, unique password for the `SUPERMETAL_USER` account. Store this password securely, as you'll need it when configuring the source in Supermetal.
### Setup LogMiner
#### Archive Log Mode
Oracle's LogMiner requires that your database is running in ARCHIVELOG mode. You can check and enable it as follows:
```sql
-- Check if database is in ARCHIVELOG mode
SELECT log_mode FROM v$database;
-- If not in ARCHIVELOG mode, enable it
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
```
[Changing log mode](https://docs.oracle.com/en/database/oracle/oracle-database/23/sutil/oracle-logminer-utility.html#GUID-7594F0D7-0ACD-46E6-BD61-2751136ECDB4) requires a restart of the Oracle database.
Instruct Oracle [RMAN](https://docs.oracle.com/en/database/oracle/oracle-database/23/sutil/oracle-logminer-utility.html#GUID-7594F0D7-0ACD-46E6-BD61-2751136ECDB4) to retain archive logs for at least 3 days.
```sql
RMAN> CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 3 DAYS;
```
For optimal LogMiner performance, ensure your redo logs are appropriately sized.
```sql
-- Check current redo log configuration
SELECT group#, bytes/1024/1024 AS size_mb FROM v$log;
```
Each redo log file should be at least 500MB for production databases.
If your logs are smaller, consider increasing their size by contacting your DBA.
```sql
-- Check if database is in ARCHIVELOG mode
SELECT log_mode FROM v$database;
-- If not in ARCHIVELOG mode, enable it by turning on backups in AWS RDS console
```
AWS RDS Oracle instances are automatically configured with ARCHIVELOG mode enabled when backups are enabled.
Use the `set_configuration` procedure to set archive log retention to at least 3 days (72 hours):
```sql
BEGIN
rdsadmin.rdsadmin_util.set_configuration('archivelog retention hours', 72);
END;
/
```
Adjust this value based on your backup and recovery requirements.
#### Database Supplemental Logging
LogMiner requires supplemental logging to capture the necessary information for CDC:
* **Minimal**: Lower redo log usage but requires table-level supplemental logging configuration for each table.
* **Full**: Higher redo log usage but eliminates the need for table-level configuration.
```sql
-- Enable minimal supplemental logging at the database level
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
-- Then configure table-level supplemental logging for each table to be synced
-- Repeat for each table to be synced:
ALTER TABLE schema_name.table_name ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
```
```sql
-- Enable full supplemental logging at the database level
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
```
For more reliable CDC, consider enabling force logging at the database level:
```sql
ALTER DATABASE FORCE LOGGING;
```
```sql
-- Enable full supplemental logging using RDS procedures
BEGIN
rdsadmin.rdsadmin_util.alter_supplemental_logging(
p_action => 'ADD',
p_type => 'ALL'
);
END;
/
```
For more reliable CDC, enable force logging using the RDS specific procedure:
```sql
BEGIN
rdsadmin.rdsadmin_util.force_logging(p_enable => true);
END;
/
```
This ensures all database operations are logged.
# AWS (/docs/main/concepts/deployment/byoc/aws)
import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Step, Steps } from 'fumadocs-ui/components/steps';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
import ByocAwsDiagram from '../../../../../../app/diagrams/byoc_aws';
Follow this guide to provision Supermetal BYOC (Bring Your Own Cloud), which deploys Supermetal's data replication agents in your own virtual cloud environment, maintaining full control and ownership over your data.
This provides an additional layer of security and isolation. Supermetal handles provisioning, operations, and maintenance.
***
## Overview
| Phase | Resource | Description |
| ---------------------- | -------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Bootstrap** | IAM Role | `SupermetalControlPlaneRole`: Assumed by Supermetal's control plane for orchestration of resources in your AWS account. |
| | IAM Role | `SupermetalAgentTaskRole`: IAM Role for Supermetal Agent ECS tasks, granting necessary permissions for agent operations (e.g., S3, Secrets Manager, KMS). |
| | IAM Role | `SupermetalTaskExecutionRole`: IAM Role for ECS task execution, allowing ECS to pull ECR images for agents and send logs to CloudWatch. |
| | IAM Role | `SupermetalAgentEC2InstanceRole`: IAM Role for EC2 instances that host Supermetal Agents (used with EC2-backed ECS capacity providers). |
| | IAM Instance Profile | `SupermetalAgentEC2InstanceProfile`: Instance profile associated with `SupermetalAgentEC2InstanceRole`, attached to EC2 instances hosting agents. |
| | IAM Role | `SupermetalValidationLambdaRole`: IAM Role for Supermetal Lambda functions used for validation tasks (e.g., connector credential validation, network reachability). |
| | Interface VPC Endpoint | `MetadataVPCEndpoint`: An AWS PrivateLink Interface VPC Endpoint for private communication between Supermetal agents and the Supermetal control plane. |
| | Security Group | `MetadataVPCEndpointSecurityGroup`: Associated with the `MetadataVPCEndpoint`, controls inbound traffic to the endpoint. |
| | Security Group | `SupermetalAgentMetadataSecurityGroup`: Allows Supermetal agents (running in your VPC) to connect to the `MetadataVPCEndpoint` on HTTPS port 443. |
| | (Optional) KMS Key & Alias | `SupermetalKMSKey` (and `SupermetalKMSKeyAlias` like `alias/supermetal-key-${VpcId}`): AWS KMS Customer Managed Key (CMK) for encrypting sensitive connector credentials. A new key is created if an existing `CredentialsKeyArn` is not provided. |
| **Registration** | Secret | An AWS Secrets Manager secret is created to store credentials required by the Supermetal control plane or agents, such as for ECR access or metadata service authentication. |
| **Connector Creation** | ECS Cluster | An Amazon ECS cluster (e.g., `supermetal-cluster-${VpcId}`) is created to host the Supermetal agent services. |
| | ECS Service | An Amazon ECS service is deployed within the cluster to run and manage Supermetal agent tasks. |
| | ECS Capacity Provider | An ECS Capacity Provider, typically backed by an EC2 Auto Scaling Group (ASG), is set up to manage the compute resources for the agents. |
* **Data Sovereignty:** Your primary data (being replicated) is processed entirely within your AWS VPC and never leaves your environment to the Supermetal Control Plane.
* **Secure Communication:** All metadata communication between Supermetal agents in your VPC and the Supermetal Control Plane occurs securely over AWS PrivateLink via an Interface VPC Endpoint. This traffic is encrypted in transit using **TLS 1.2+**.
* **Credential Protection:** Sensitive connector credentials provided to Supermetal are encrypted using **your** AWS KMS Customer Managed Key (`CredentialsKeyArn`) before being stored by the Supermetal Control Plane. Only the Supermetal Agents decrypt these credentials at runtime within your environment. Supermetal Control Plane cannot decrypt these credentials.
* **IAM Security:** The cross-account role (`SupermetalControlPlaneRole`) assumed by Supermetal utilizes an `ExternalId` to protect against the "confused deputy" problem. All IAM roles are designed with the principle of least privilege to perform only their required functions.
***
## Prerequisites
To successfully deploy Supermetal BYOC in your AWS environment, please ensure you have the following:
#### Supermetal Account Identifier
Supermetal account identifier (obtainable from the Supermetal Console under Settings).
* `SUPERMETAL_ACCOUNT_ID`
#### AWS Account & Permissions
* IAM User or Role with sufficient permissions (e.g., `AWSAdministratorAccess`) to bootstrap the Supermetal BYOC environment using either [CloudFormation or Terraform](#deployment_method).
#### Deployment Parameters
* **AWS Account ID:** The AWS Account ID of the target deployment account (e.g., `123456789012`).
* **AWS Region(s):** The AWS Region(s) of the target deployment account (e.g., `us-east-1`, `eu-west-2`).
* **VPC ID (`VpcId`):** The ID of an existing VPC within your chosen AWS Region where the bootstrap infrastructure and Supermetal agents will be deployed on connector creation.
* **Subnet IDs (`SubnetIds`):** A list of (private) subnet IDs within your specified `VpcId`. Supermetal agents will be deployed across these subnets for high availability. Ensure these subnets belong to the same availability zones as your source (and target) databases.
* **(Optional) KMS Key for Credentials (`CredentialsKeyArn`):** Decide if you'll use an existing AWS KMS Customer Managed Key (CMK) for encrypting sensitive connector credentials. If you have an existing CMK, have its ARN ready. If left blank, the bootstrap process will create a new KMS key specifically for Supermetal.
***
## Setup
Setup is a two-phase process: first, bootstrap the environment; then, create a connector to deploy the data plane / agents in your VPC.
#### Environment Bootstrap
The bootstrap process uses an AWS CloudFormation template or a Terraform script provided by Supermetal to provision the necessary resources, including [IAM roles](#iam-roles) and a private endpoint.
These IAM roles are used by Supermetal to automatically register your cloud environment with the Supermetal control plane over the private endpoint, and to automatically deploy Supermetal data plane / agents in your VPC.
The initial environment bootstrap process does not provision any compute resources or the data plane / agents. Once the environment bootstrap is complete, you can create connectors to deploy Supermetal data plane / agents on-demand in your VPC.
You can use the AWS Management Console or the AWS CLI to deploy the bootstrap CloudFormation template.
#### Obtain CloudFormation Template \[!toc]
* Obtain the latest CloudFormation template file from the Supermetal Console; the [template in the appendix](#cloudformation-template) is for reference.
#### Deploy CloudFormation Template \[!toc]
#### Deploy with AWS Management Console \[!toc]
#### Prepare and Initiate Stack Creation \[!toc]
* Navigate to the **AWS CloudFormation console** in your AWS account.
* Ensure you are in the correct **AWS Region** for your deployment.
* Click **"Create stack"** and select **"With new resources (standard)"**.
* Under "Prerequisite - Prepare template", choose **"Template is ready"**.
* Under "Specify template", select **"Upload a template file"**.
* Click **"Choose file"** and upload the CloudFormation template file you obtained.
* Click **"Next"**.
#### Specify Stack Details \[!toc]
* On the "Specify stack details" page, enter a **"Stack name"** (e.g., `supermetal-bootstrap`).
* In the **"Parameters"** section, provide values for each parameter.
* Click **"Next"**.
Refer to the [Deployment Parameters](#deployment-parameters) section above for detailed explanations of each.
* `SupermetalAccountId`
* `VpcId`
* `SubnetIds`
* `CredentialsKeyArn`
#### Review and Create \[!toc]
* On the "Review" page, carefully **review all your settings** and parameter values.
* Scroll to the bottom and find the **"Capabilities"** section.
* **Important:** You **must** check the box that says: "I acknowledge that AWS CloudFormation might create IAM resources with custom names." (The exact wording might vary slightly).
* Click **"Create stack"** (or "Submit" depending on the AWS console version).
#### Deploy with AWS CLI \[!toc]
* Set the AWS environment variables necessary to authenticate. Use either:
* `AWS_PROFILE`
* `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
* Verify access by running `aws sts get-caller-identity --region <your-region>`. See the [AWS CLI reference](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html).
#### Deploy Stack \[!toc]
Use the `aws cloudformation deploy` command to create the stack.
```bash
aws cloudformation deploy \
--template-file /path/to/your/supermetal-bootstrap-cf-template.yaml \
--stack-name supermetal-bootstrap \
--capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \
--parameter-overrides \
SupermetalAccountId="$SUPERMETAL_ACCOUNT_ID" \
VpcId="$VPC_ID" \
SubnetIds="$SUBNET_IDS" \
CredentialsKeyArn="$CREDENTIALS_KEY_ARN" # Optional: Provide a Customer Managed Key (CMK) ARN for encrypting connector credentials
```
Refer to the [Deployment Parameters](#deployment-parameters) section above for detailed explanations of each.
* `SUPERMETAL_ACCOUNT_ID`
* `VPC_ID`
* `SUBNET_IDS`
* `CREDENTIALS_KEY_ARN`
#### Verification \[!toc]
* Monitor the CloudFormation stack creation process in the AWS CloudFormation console. It may take several minutes to complete.
* Once the CloudFormation stack status shows `CREATE_COMPLETE`, Supermetal's Control Plane automatically registers the new environment.
* Monitor the Environments page in your Supermetal Console. The environment should show up with a "Ready" status within a few minutes after the CloudFormation stack completes.
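You can also poll the stack status from the CLI, for example:
```bash
# Poll the bootstrap stack status until it reports CREATE_COMPLETE
aws cloudformation describe-stacks \
  --stack-name supermetal-bootstrap \
  --query 'Stacks[0].StackStatus' --output text
```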
You can use the Terraform CLI to deploy the bootstrap module.
#### Obtain Terraform Script \[!toc]
* Obtain the latest Terraform script from the Supermetal Console. The Terraform code in the [appendix](#environment-bootstrap-scripts) can be used as a reference or saved as local files (e.g., `main.tf`, `variables.tf`, `outputs.tf`).
#### Configure Terraform \[!toc]
* Ensure you have [Terraform installed](https://learn.hashicorp.com/tutorials/terraform/install-cli) (version 1.0 or later).
* Set up your AWS credentials for Terraform. This typically involves configuring the AWS CLI and ensuring your environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`) or AWS profile (`AWS_PROFILE`) are correctly set.
* Verify access by running `aws sts get-caller-identity --region <your-region>`. Refer to the [AWS provider documentation](https://registry.terraform.io/providers/hashicorp/aws/latest/docs#authentication-and-configuration) for more details.
#### Prepare Terraform Files \[!toc]
* Create a new directory for your Terraform configuration.
* If using the appendix script, save the Terraform code into files within this directory (e.g., `main.tf`).
* Create a `terraform.tfvars` file in the same directory to specify the required variable values. Alternatively, you can pass variables via the command line.
**Example `terraform.tfvars`:**
```tfvars
vpc_id = "vpc-xxxxxxxxxxxxxxxxx"
private_subnet_ids = ["subnet-xxxxxxxxxxxxxxxxx", "subnet-yyyyyyyyyyyyyyyyy"]
external_id = "your-supermetal-provided-external-id"
supermetal_control_plane_aws_account_id = "supermetal-aws-account-id" # Provided by Supermetal
supermetal_metadata_service_endpoint_name = "com.amazonaws.vpce.your-region.vpce-svc-xxxxxxxxxxxxxxxxx" # Provided by Supermetal
# credentials_key_arn = "arn:aws:kms:your-region:your-account-id:key/your-key-id" # Optional: Only if using an existing KMS CMK
```
Refer to the [Deployment Parameters](#deployment-parameters) section and the variable descriptions within the Terraform script for detailed explanations of each.
* `vpc_id`
* `private_subnet_ids`
* `external_id`
* `supermetal_control_plane_aws_account_id`
* `supermetal_metadata_service_endpoint_name`
* `credentials_key_arn` (Optional)
#### Deploy with Terraform CLI \[!toc]
* Navigate to your Terraform configuration directory in your terminal.
* Initialize Terraform:
```bash
terraform init
```
* Review the execution plan:
```bash
terraform plan
```
* Apply the configuration:
```bash
terraform apply
```
Confirm the action by typing `yes` when prompted.
If you didn't use a `.tfvars` file, you can pass variables directly with the `apply` command. Ensure you provide values for all required variables:
```bash
terraform apply \
-var="vpc_id=vpc-xxxxxxxxxxxxxxxxx" \
-var="private_subnet_ids=[\"subnet-xxxxxxxxxxxxxxxxx\", \"subnet-yyyyyyyyyyyyyyyyy\"]" \
-var="external_id=YOUR_EXTERNAL_ID_FROM_SUPERMETAL_CONSOLE" \
-var="supermetal_control_plane_aws_account_id=SUPERMETAL_CONTROL_PLANE_AWS_ACCOUNT_ID" \
-var="supermetal_metadata_service_endpoint_name=COM_AMAZONAWS_VPCE_YOUR_REGION_VPCE_SVC_XXXXXXXXXXXXXX" \
# Optional: Provide this if you are using an existing KMS CMK.
# If omitted and 'credentials_key_arn' is not set in a .tfvars file, a new KMS key will be created.
# -var="credentials_key_arn=arn:aws:kms:your-region:your-account-id:key/your-key-id"
```
Replace placeholder values (e.g., `vpc-xxxxxxxxxxxxxxxxx`, `YOUR_EXTERNAL_ID_FROM_SUPERMETAL_CONSOLE`) with your actual values.
#### Verification \[!toc]
* Monitor the `terraform apply` command output. It will indicate when the resources have been successfully created.
* Once the Terraform apply is complete, Supermetal's Control Plane automatically performs a registration process for the new environment.
* Monitor the Environments page in your Supermetal Console. The environment should show up with a "Ready" status within a few minutes after Terraform completes.
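You can also read the recorded outputs from the CLI, for example the control plane role ARN defined in the bootstrap script:
```bash
# List every output of the applied configuration
terraform output

# Read a single output value
terraform output supermetal_control_plane_role_arn
```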
***
#### Create Connector(s) to deploy Supermetal agents
Once the environment bootstrap is complete, you are ready to create a connector, which deploys Supermetal agents in your VPC.
#### Select source database \[!toc]
Follow the connector setup instructions in the Supermetal Console to select a source database. Supermetal can optionally [auto-discover](#database_discovery) available source databases.
#### Create Security Group and Configure Source Database Access \[!toc]
To give Supermetal access to your source database, and to validate network reachability and connection credentials, you need to create a security group and configure access on the database side.
The Supermetal Console lists pre-filled prerequisite steps for this, similar to the steps below.
```bash
# Create the security group (later steps refer to it as 'supermetal-agent-sg')
SG_ID=$(aws ec2 create-security-group \
  --group-name supermetal-agent-sg \
  --description "Security group for Supermetal" \
  --vpc-id $VPC_ID \
  --output text --query 'GroupId')
# Look up the metadata endpoint security group (created during environment bootstrap)
METADATA_SG_ID=$(aws ec2 describe-security-groups \
  --filters Name=group-name,Values=supermetal-metadata-sg Name=vpc-id,Values=$VPC_ID \
  --query 'SecurityGroups[0].GroupId' --output text)
# Add required outbound rules
# (authorize-security-group-egress requires --ip-permissions; the
#  --protocol/--port/--source-group shorthand is ingress-only)
# 1. For the metadata VPC endpoint
aws ec2 authorize-security-group-egress \
  --group-id $SG_ID \
  --ip-permissions "IpProtocol=tcp,FromPort=443,ToPort=443,UserIdGroupPairs=[{GroupId=$METADATA_SG_ID}]"
# 2. For database access
aws ec2 authorize-security-group-egress \
  --group-id $SG_ID \
  --ip-permissions "IpProtocol=tcp,FromPort=$DATABASE_PORT,ToPort=$DATABASE_PORT,UserIdGroupPairs=[{GroupId=$DATABASE_SECURITY_GROUP_ID}]"
# Add database access rule to your database's security group
aws ec2 authorize-security-group-ingress \
  --group-id $DATABASE_SECURITY_GROUP_ID \
  --protocol tcp \
  --port $DATABASE_PORT \
  --source-group $SG_ID
```
* `VPC_ID`: The VPC ID of your environment. Refer to the [Deployment Parameters](#deployment-parameters) section
* `DATABASE_SECURITY_GROUP_ID`: The security group ID of your database
* `DATABASE_PORT`: The port number your database listens on (e.g., 5432 for PostgreSQL)
* The metadata endpoint security group (`supermetal-metadata-sg`) is automatically looked up using the VPC ID
#### Create Security Group and Configure Database Access \[!toc]
1. Navigate to the **AWS EC2 Console** > **Security Groups**
2. Click **Create security group**
3. Configure the basic details:
    * **Security group name:** `supermetal-agent-sg` (later steps reference this name)
    * **Description:** "Security group for Supermetal agents"
* **VPC:** Select the same VPC used for your Supermetal environment
4. Leave **Inbound rules** empty (no rules needed)
5. **Outbound rules (required):**
* Add rule for metadata access:
* **Type:** HTTPS
* **Port:** 443
* **Destination:** Security group named `supermetal-metadata-sg` (created during environment bootstrap)
* Add rule for database access:
* **Type:** Custom TCP
* **Port:** Your database port
* **Destination:** Your database's security group
6. Click **Create security group** and note the Security Group ID
#### Configure Database Access \[!toc]
7. Locate your database's security group
8. Edit its inbound rules:
* **Type:** Custom TCP
* **Port:** Your database port (e.g., 5432 for PostgreSQL)
* **Source:** Select the `supermetal-agent-sg` security group
9. Save the rules
#### CloudFormation Template for Security Groups \[!toc]
```yaml
Parameters:
  VpcId:
    Type: AWS::EC2::VPC::Id
    Description: VPC where the security group will be created
  DatabaseSecurityGroupId:
    Type: AWS::EC2::SecurityGroup::Id
    Description: Security Group ID of your database
  MetadataSecurityGroupId:
    Type: AWS::EC2::SecurityGroup::Id
    Description: >-
      Security Group ID of the metadata endpoint group (supermetal-metadata-sg)
      created during environment bootstrap
  DatabasePort:
    Type: Number
    Description: Port number your database listens on (e.g., 5432 for PostgreSQL)
    MinValue: 1
    MaxValue: 65535
Resources:
  # Create the Supermetal Agent Security Group
  SupermetalAgentSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for Supermetal agents
      VpcId: !Ref VpcId
      SecurityGroupEgress:
        # Required outbound rules (egress rules use DestinationSecurityGroupId)
        - Description: Allow access to metadata service
          IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          DestinationSecurityGroupId: !Ref MetadataSecurityGroupId
        - Description: Allow access to database
          IpProtocol: tcp
          FromPort: !Ref DatabasePort
          ToPort: !Ref DatabasePort
          DestinationSecurityGroupId: !Ref DatabaseSecurityGroupId
  # Add the ingress rule to the database security group
  DatabaseSecurityGroupIngress:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref DatabaseSecurityGroupId
      IpProtocol: tcp
      FromPort: !Ref DatabasePort
      ToPort: !Ref DatabasePort
      SourceSecurityGroupId: !Ref SupermetalAgentSecurityGroup
Outputs:
  AgentSecurityGroupId:
    Description: ID of the Supermetal Agent Security Group
    Value: !Ref SupermetalAgentSecurityGroup
```
This template creates the agent security group and configures database access in a single deployment.
* `VpcId`: The VPC ID of your environment. Refer to the [Deployment Parameters](#deployment-parameters) section
* `DatabaseSecurityGroupId`: The security group ID of your database
* `DatabasePort`: The port number your database listens on (e.g., 5432 for PostgreSQL)
* `MetadataSecurityGroupId`: The ID of the metadata endpoint security group (`supermetal-metadata-sg`) created during environment bootstrap. CloudFormation cannot look up existing resources by name, so this ID is passed in as a parameter; the example below shows how to fetch it with the CLI.
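As an illustration, deploying this template could look like the following; the template file name is arbitrary, and the metadata security group ID is looked up first:
```bash
# Look up the metadata endpoint security group created during bootstrap
METADATA_SG_ID=$(aws ec2 describe-security-groups \
  --filters Name=group-name,Values=supermetal-metadata-sg Name=vpc-id,Values=$VPC_ID \
  --query 'SecurityGroups[0].GroupId' --output text)

aws cloudformation deploy \
  --template-file supermetal-agent-sg.yaml \
  --stack-name supermetal-agent-sg \
  --parameter-overrides \
    VpcId="$VPC_ID" \
    DatabaseSecurityGroupId="$DATABASE_SECURITY_GROUP_ID" \
    MetadataSecurityGroupId="$METADATA_SG_ID" \
    DatabasePort="$DATABASE_PORT"
```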
#### Create Security Group and Configure Database Access \[!toc]
```terraform
# Input variables (declare these here, or supply them from a parent module)
variable "vpc_id" {
  type = string
}
variable "database_security_group_id" {
  type = string
}
variable "database_port" {
  type = number
}
# Look up the metadata endpoint security group (created during environment bootstrap)
data "aws_security_group" "metadata_endpoint" {
  name   = "supermetal-metadata-sg"
  vpc_id = var.vpc_id
}
# Create the agent security group
resource "aws_security_group" "supermetal_agent_sg" {
  description = "Security group for Supermetal agents"
  vpc_id      = var.vpc_id
  # Required outbound rules (inline egress blocks take 'security_groups')
  egress {
    description     = "Allow access to metadata service"
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    security_groups = [data.aws_security_group.metadata_endpoint.id]
  }
  egress {
    description     = "Allow access to database"
    from_port       = var.database_port
    to_port         = var.database_port
    protocol        = "tcp"
    security_groups = [var.database_security_group_id]
  }
}
# Add database access rule
resource "aws_security_group_rule" "allow_supermetal_agent_to_db" {
  type                     = "ingress"
  from_port                = var.database_port
  to_port                  = var.database_port
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.supermetal_agent_sg.id
  security_group_id        = var.database_security_group_id
}
# Output the security group ID for reference
output "supermetal_agent_sg_id" {
  value = aws_security_group.supermetal_agent_sg.id
}
```
* `vpc_id`: The VPC ID of your environment. Refer to the [Deployment Parameters](#deployment-parameters) section
* `database_security_group_id`: The security group ID of your database
* `database_port`: The port number your database listens on (e.g., 5432 for PostgreSQL)
* The metadata endpoint security group (`supermetal-metadata-sg`) is automatically looked up using the VPC ID
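A minimal invocation of this configuration might look like the following (all values are placeholders):
```bash
terraform init
terraform apply \
  -var="vpc_id=vpc-xxxxxxxxxxxxxxxxx" \
  -var="database_security_group_id=sg-xxxxxxxxxxxxxxxxx" \
  -var="database_port=5432"
```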
#### Validate Source database \[!toc]
Follow the setup instructions in the Supermetal Console to enter the security group ID you created for the source database, then validate network reachability and connection credentials.
#### Repeat for Target database \[!toc]
Repeat the same steps for the target database.
#### Finalize \[!toc]
Once the source and target database connections are validated, you can finalize the connector setup, which launches the Supermetal agents in your VPC.
***
## Appendix
### Environment Bootstrap Scripts
Resources used to bootstrap the environment
```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: >-
CloudFormation template to set up the Supermetal BYOC environment in your AWS account.
This template creates IAM roles, a VPC Endpoint for communication with the Supermetal Control Plane and an optional KMS key.
Parameters:
VpcId:
Type: AWS::EC2::VPC::Id
Description: (Required) The ID of your existing VPC for agent deployment.
PrivateSubnetIds:
    Type: List<AWS::EC2::Subnet::Id>
Description: >-
(Required) Comma-separated list of private subnet IDs within the VpcId
where agents and the VPC endpoint will be deployed.
CredentialsKeyArn:
Type: String
Description: >-
(Optional) ARN of your existing KMS CMK for encrypting connector credentials.
If left blank, a new KMS key will be created.
Default: ""
ExternalId:
Type: String
Description: >-
(Required) A unique ID provided by the Supermetal Console.
Used for securing the cross-account IAM role.
SupermetalControlPlaneAWSAccountId:
Type: String
Description: The AWS Account ID of the Supermetal Control Plane.
NoEcho: true
SupermetalMetadataServiceEndpointName:
Type: String
Description: >-
The VPC Endpoint Service Name for the Supermetal Metadata Service (e.g., com.amazonaws.vpce.us-east-1.vpce-svc-xxxxxxxxxxxxxxxxx).
Conditions:
CreateKMSKey: !Equals [!Ref CredentialsKeyArn, ""]
Resources:
# IAM Role for Supermetal Control Plane
SupermetalControlPlaneRole:
Type: AWS::IAM::Role
Properties:
RoleName: !Sub "SupermetalControlPlaneRole-${VpcId}"
Tags:
- Key: Name
Value: !Sub "SupermetalControlPlaneRole-${VpcId}"
- Key: supermetal:byoc:vpc-id
Value: !Ref VpcId
- Key: supermetal:byoc:stack-name
Value: !Ref AWS::StackName
- Key: supermetal:byoc:external-id
Value: !Ref ExternalId
AssumeRolePolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal:
AWS: !Sub arn:aws:iam::${SupermetalControlPlaneAWSAccountId}:root
Action: sts:AssumeRole
Condition:
StringEquals:
sts:ExternalId: !Ref ExternalId
ManagedPolicyArns:
- !Ref SupermetalControlPlanePolicy
# Managed Policy for SupermetalControlPlaneRole
SupermetalControlPlanePolicy:
Type: AWS::IAM::ManagedPolicy
Properties:
ManagedPolicyName: !Sub "SupermetalControlPlanePolicy-${VpcId}"
PolicyDocument:
Version: '2012-10-17'
Statement:
# ECS Cluster Management
- Effect: Allow
Action:
- ecs:CreateCluster
- ecs:DeleteCluster
- ecs:DescribeClusters
Resource: "*" # Scoped to clusters with name prefix 'supermetal-' via condition
# Note: Conditions like 'ecs:cluster' on CreateCluster primarily apply to actions on existing resources.
# They do not strictly restrict naming during creation itself due to IAM evaluation timing.
Condition:
StringLike:
ecs:cluster: "supermetal-*"
# ECS Task and Service Management
- Effect: Allow
Action:
- ecs:CreateService
- ecs:UpdateService
- ecs:DeleteService
- ecs:RegisterTaskDefinition
- ecs:DeregisterTaskDefinition
- ecs:DescribeServices
- ecs:DescribeTasks
- ecs:RunTask
- ecs:StopTask
- ecs:ListTasks
- ecs:ListServices
- ecs:DescribeTaskDefinition
- ecs:ListContainerInstances
- ecs:DescribeContainerInstances
Resource: "*" # Scoped to clusters with name prefix 'supermetal-' via condition
Condition:
StringLike:
ecs:cluster: "supermetal-*"
# ECS Capacity Provider Management
- Effect: Allow
Action:
- ecs:CreateCapacityProvider
- ecs:DeleteCapacityProvider
- ecs:DescribeCapacityProviders
- ecs:UpdateCapacityProvider
- ecs:PutClusterCapacityProviders
Resource: "*" # Scoped to capacity providers with name prefix 'supermetal-' via condition
# Note: Conditions like 'ecs:capacity-provider' on CreateCapacityProvider primarily apply to actions on existing resources.
# They do not strictly restrict naming during creation itself due to IAM evaluation timing.
Condition:
StringLike:
ecs:capacity-provider: "supermetal-*"
# EC2 Capacity Management
- Effect: Allow
Action:
- autoscaling:CreateAutoScalingGroup
- autoscaling:UpdateAutoScalingGroup
- autoscaling:DeleteAutoScalingGroup
- autoscaling:DescribeAutoScalingGroups
- autoscaling:DescribeScalingActivities
- autoscaling:SetDesiredCapacity
- ec2:CreateLaunchTemplate
- ec2:DeleteLaunchTemplate
- ec2:DescribeLaunchTemplates
- ec2:DescribeLaunchTemplateVersions
- ec2:CreateTags
- ec2:DescribeInstances
Resource: "*"
Condition:
StringLike:
aws:ResourceTag/supermetal: "*"
# Lambda for Connector Validation
- Effect: Allow
Action:
- lambda:InvokeFunction
- lambda:CreateFunction
- lambda:DeleteFunction
- lambda:GetFunctionConfiguration
- lambda:UpdateFunctionConfiguration
- lambda:AddPermission
- lambda:RemovePermission
Resource: !Sub arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:supermetal-validation-*
# Lambda VPC Access
- Effect: Allow
Action:
- ec2:CreateNetworkInterface
- ec2:DescribeNetworkInterfaces
- ec2:DeleteNetworkInterface
- ec2:AssignPrivateIpAddresses
- ec2:UnassignPrivateIpAddresses
Resource: "*" # Scoped to specified VPC via condition
Condition:
StringEquals:
ec2:VpcId: !Ref VpcId
# Database Discovery
- Effect: Allow
Action:
- rds:DescribeDBInstances
- rds:DescribeDBClusters
- redshift:DescribeClusters
Resource: "*" # Scoped to specified VPC via condition
# Note: While ec2:VpcId is the correct key, RDS/Redshift Describe actions may not fully support VPC-based IAM filtering.
# The control plane might list all regional resources; further filtering may be needed application-side.
Condition:
StringEquals:
ec2:VpcId: !Ref VpcId
# S3 Buffer Management
- Effect: Allow
Action:
- s3:CreateBucket
- s3:DeleteBucket
- s3:PutBucketTagging
- s3:GetBucketTagging
- s3:PutEncryptionConfiguration
- s3:GetEncryptionConfiguration
Resource: !Sub arn:aws:s3:::supermetal-*
# Security Group Management
- Effect: Allow
Action:
- ec2:DescribeSecurityGroups
Resource: "*"
Condition:
StringEquals:
ec2:VpcId: !Ref VpcId
# Secrets Management
- Effect: Allow
Action:
- secretsmanager:CreateSecret
- secretsmanager:DeleteSecret
- secretsmanager:GetSecretValue
- secretsmanager:PutSecretValue
- secretsmanager:UpdateSecret
- secretsmanager:TagResource
Resource: !Sub arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:supermetal-*
# KMS Encryption
- Effect: Allow
Action: kms:Encrypt
Resource: !If [CreateKMSKey, !GetAtt SupermetalKMSKey.Arn, !Ref CredentialsKeyArn]
# IAM Role Management
- Effect: Allow
Action: iam:PassRole
Resource:
- !GetAtt SupermetalAgentTaskRole.Arn
- !GetAtt SupermetalValidationLambdaRole.Arn
- !GetAtt SupermetalTaskExecutionRole.Arn
- !GetAtt SupermetalAgentEC2InstanceRole.Arn
Condition:
StringEquals:
iam:PassedToService:
- ecs-tasks.amazonaws.com
- lambda.amazonaws.com
- ec2.amazonaws.com
# IAM Role for Lambda Validation Function
SupermetalValidationLambdaRole:
Type: AWS::IAM::Role
Properties:
RoleName: !Sub "SupermetalValidationLambdaRole-${VpcId}"
Tags:
- Key: Name
Value: !Sub "SupermetalValidationLambdaRole-${VpcId}"
- Key: supermetal:byoc:vpc-id
Value: !Ref VpcId
- Key: supermetal:byoc:stack-name
Value: !Ref AWS::StackName
- Key: supermetal:byoc:external-id
Value: !Ref ExternalId
AssumeRolePolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal:
Service: lambda.amazonaws.com
Action: sts:AssumeRole
ManagedPolicyArns:
- arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole
Path: /
# IAM Role for Supermetal Agent Tasks
SupermetalAgentTaskRole:
Type: AWS::IAM::Role
Properties:
RoleName: !Sub "SupermetalAgentTaskRole-${VpcId}"
Tags:
- Key: Name
Value: !Sub "SupermetalAgentTaskRole-${VpcId}"
- Key: supermetal:byoc:vpc-id
Value: !Ref VpcId
- Key: supermetal:byoc:stack-name
Value: !Ref AWS::StackName
- Key: supermetal:byoc:external-id
Value: !Ref ExternalId
AssumeRolePolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal:
Service: ecs-tasks.amazonaws.com
Action: sts:AssumeRole
Condition:
ArnLike:
aws:SourceArn: !Sub arn:aws:ecs:${AWS::Region}:${AWS::AccountId}:task/supermetal-*
StringEquals:
aws:SourceAccount: !Ref AWS::AccountId
ManagedPolicyArns:
- !Ref SupermetalAgentTaskPolicy
Path: /
# IAM Role for ECS Task Execution
SupermetalTaskExecutionRole:
Type: AWS::IAM::Role
Properties:
RoleName: !Sub "SupermetalTaskExecutionRole-${VpcId}"
Tags:
- Key: Name
Value: !Sub "SupermetalTaskExecutionRole-${VpcId}"
- Key: supermetal:byoc:vpc-id
Value: !Ref VpcId
- Key: supermetal:byoc:stack-name
Value: !Ref AWS::StackName
- Key: supermetal:byoc:external-id
Value: !Ref ExternalId
AssumeRolePolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal:
Service: ecs-tasks.amazonaws.com
Action: sts:AssumeRole
Condition:
StringEquals:
aws:SourceAccount: !Ref AWS::AccountId
ArnLike:
aws:SourceArn: !Sub arn:aws:ecs:${AWS::Region}:${AWS::AccountId}:task-definition/supermetal*:*
ManagedPolicyArns:
- arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
- !Ref SupermetalTaskExecutionPolicy
Path: /
# Managed Policy for SupermetalAgentTaskRole
SupermetalAgentTaskPolicy:
Type: AWS::IAM::ManagedPolicy
Properties:
ManagedPolicyName: !Sub "SupermetalAgentTaskPolicy-${VpcId}"
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- kms:Decrypt
- kms:Encrypt
Resource: !If [CreateKMSKey, !GetAtt SupermetalKMSKey.Arn, !Ref CredentialsKeyArn]
# Secrets Access
- Effect: Allow
Action:
- secretsmanager:GetSecretValue
Resource: !Sub arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:supermetal-*
# S3 Buffer Access
- Effect: Allow
Action:
- s3:PutObject
- s3:GetObject
- s3:DeleteObject
- s3:ListBucket
Resource:
- !Sub arn:aws:s3:::supermetal-*
- !Sub arn:aws:s3:::supermetal-*/*
# Managed Policy for SupermetalTaskExecutionRole
SupermetalTaskExecutionPolicy:
Type: AWS::IAM::ManagedPolicy
Properties:
ManagedPolicyName: !Sub "SupermetalTaskExecutionPolicy-${VpcId}"
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- secretsmanager:GetSecretValue
Resource: !Sub arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:supermetal-*
- Effect: Allow
Action:
- kms:Decrypt
Resource: !If [CreateKMSKey, !GetAtt SupermetalKMSKey.Arn, !Ref CredentialsKeyArn]
# KMS Key for Connector Credentials
SupermetalKMSKey:
Type: AWS::KMS::Key
Condition: CreateKMSKey
Properties:
Description: KMS key for encrypting Supermetal connector credentials
Tags:
- Key: Name
Value: !Sub "SupermetalKMSKey-${VpcId}"
- Key: supermetal:byoc:vpc-id
Value: !Ref VpcId
- Key: supermetal:byoc:stack-name
Value: !Ref AWS::StackName
- Key: supermetal:byoc:external-id
Value: !Ref ExternalId
EnableKeyRotation: true
KeyPolicy:
Version: '2012-10-17'
Statement:
- Sid: Enable IAM User Permissions
Effect: Allow
Principal:
AWS: !Sub arn:aws:iam::${AWS::AccountId}:root
Action: kms:*
Resource: '*' # Key policy applies to this key only
- Sid: Allow Supermetal Control Plane to Encrypt
Effect: Allow
Principal:
AWS: !GetAtt SupermetalControlPlaneRole.Arn
Action: kms:Encrypt
Resource: '*' # Key policy applies to this key only
- Sid: Allow Supermetal Agents to Decrypt/Encrypt
Effect: Allow
Principal:
AWS: !GetAtt SupermetalAgentTaskRole.Arn
Action:
- kms:Decrypt
- kms:Encrypt
Resource: '*' # Key policy applies to this key only
SupermetalKMSKeyAlias:
Type: AWS::KMS::Alias
Condition: CreateKMSKey
Properties:
AliasName: !Sub "alias/supermetal-key-${VpcId}"
TargetKeyId: !Ref SupermetalKMSKey
SupermetalAgentEC2InstanceProfile:
Type: AWS::IAM::InstanceProfile
Properties:
InstanceProfileName: !Sub "SupermetalAgentEC2InstanceProfile-${VpcId}"
Tags:
- Key: Name
Value: !Sub "SupermetalAgentEC2InstanceProfile-${VpcId}"
- Key: supermetal:byoc:vpc-id
Value: !Ref VpcId
- Key: supermetal:byoc:stack-name
Value: !Ref AWS::StackName
- Key: supermetal:byoc:external-id
Value: !Ref ExternalId
Path: /
Roles:
- !Ref SupermetalAgentEC2InstanceRole
SupermetalAgentEC2InstanceRole:
Type: AWS::IAM::Role
Properties:
RoleName: !Sub "SupermetalAgentEC2InstanceRole-${VpcId}"
Tags:
- Key: Name
Value: !Sub "SupermetalAgentEC2InstanceRole-${VpcId}"
- Key: supermetal:byoc:vpc-id
Value: !Ref VpcId
- Key: supermetal:byoc:stack-name
Value: !Ref AWS::StackName
- Key: supermetal:byoc:external-id
Value: !Ref ExternalId
AssumeRolePolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal:
Service: ec2.amazonaws.com
Action: sts:AssumeRole
ManagedPolicyArns:
- arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role
- !Ref SupermetalAgentEC2InstancePolicy
Path: /
# Managed Policy for EC2 Instance Role
SupermetalAgentEC2InstancePolicy:
Type: AWS::IAM::ManagedPolicy
Properties:
ManagedPolicyName: !Sub "SupermetalAgentEC2InstancePolicy-${VpcId}"
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- ecr:GetAuthorizationToken
Resource: "*"
- Effect: Allow
Action:
- ecr:BatchGetImage
- ecr:GetDownloadUrlForLayer
Resource: !Sub arn:aws:ecr:${AWS::Region}:${SupermetalControlPlaneAWSAccountId}:repository/agent/*
SupermetalAgentMetadataSecurityGroup:
Type: AWS::EC2::SecurityGroup
Properties:
GroupName: !Sub "supermetal-agent-metadata-sg-${VpcId}"
GroupDescription: Security group for Supermetal Agents
VpcId: !Ref VpcId
Tags:
- Key: Name
Value: !Sub "supermetal-agent-metadata-sg-${VpcId}"
- Key: supermetal:byoc:vpc-id
Value: !Ref VpcId
- Key: supermetal:byoc:stack-name
Value: !Ref AWS::StackName
- Key: supermetal:byoc:external-id
Value: !Ref ExternalId
# VPC Endpoint for Supermetal Metadata Service
MetadataVPCEndpoint:
Type: AWS::EC2::VPCEndpoint
Properties:
ServiceName: !Ref SupermetalMetadataServiceEndpointName
VpcId: !Ref VpcId
SubnetIds: !Ref PrivateSubnetIds
VpcEndpointType: Interface
PrivateDnsEnabled: true
SecurityGroupIds:
- !Ref MetadataVPCEndpointSecurityGroup
Tags:
- Key: Name
Value: !Sub "MetadataVPCEndpoint-${VpcId}"
- Key: supermetal:byoc:vpc-id
Value: !Ref VpcId
- Key: supermetal:byoc:stack-name
Value: !Ref AWS::StackName
- Key: supermetal:byoc:external-id
Value: !Ref ExternalId
MetadataVPCEndpointSecurityGroup:
Type: AWS::EC2::SecurityGroup
Properties:
GroupName: !Sub "supermetal-metadata-vpce-sg-${VpcId}"
GroupDescription: Security group for Supermetal metadata VPC Endpoint
VpcId: !Ref VpcId
SecurityGroupIngress: # Allow agents to connect to the endpoint
- IpProtocol: tcp
FromPort: 443 # HTTPS
ToPort: 443
SourceSecurityGroupId: !Ref SupermetalAgentMetadataSecurityGroup
Tags:
- Key: Name
Value: !Sub "supermetal-metadata-vpce-sg-${VpcId}"
- Key: supermetal:byoc:vpc-id
Value: !Ref VpcId
- Key: supermetal:byoc:stack-name
Value: !Ref AWS::StackName
- Key: supermetal:byoc:external-id
Value: !Ref ExternalId
Outputs:
SupermetalControlPlaneRoleArn:
Description: ARN of the IAM Role for Supermetal Control Plane
Value: !GetAtt SupermetalControlPlaneRole.Arn
SupermetalAgentTaskRoleArn:
Description: ARN of the IAM Role for Supermetal Agent Tasks
Value: !GetAtt SupermetalAgentTaskRole.Arn
SupermetalKMSKeyArnOutput:
Description: ARN of the KMS Key for encrypting credentials
Value: !If [CreateKMSKey, !GetAtt SupermetalKMSKey.Arn, !Ref CredentialsKeyArn]
SupermetalTaskExecutionRoleArn:
Description: ARN of the IAM Role for ECS Task Execution
Value: !GetAtt SupermetalTaskExecutionRole.Arn
MetadataVPCEndpointId:
Description: ID of the VPC Endpoint for Supermetal Metadata Service
Value: !Ref MetadataVPCEndpoint
SupermetalAgentMetadataSecurityGroupId:
Description: ID of the Security Group for Supermetal Agents (for Metadata Endpoint communication)
Value: !GetAtt SupermetalAgentMetadataSecurityGroup.GroupId
SupermetalValidationLambdaRoleArn:
Description: ARN of the IAM Role for Supermetal Validation Lambda functions
Value: !GetAtt SupermetalValidationLambdaRole.Arn
SupermetalAgentEC2InstanceRoleArn:
Description: ARN of the IAM Role for Supermetal Agent EC2 Instances
Value: !GetAtt SupermetalAgentEC2InstanceRole.Arn
SupermetalAgentEC2InstanceProfileArn:
Description: ARN of the IAM Instance Profile for Supermetal Agent EC2 Instances
Value: !GetAtt SupermetalAgentEC2InstanceProfile.Arn
```
```terraform
# This sets up the Supermetal BYOC environment in your AWS account.
# It creates IAM roles, a VPC Endpoint for communication with the Supermetal Control Plane,
# and an optional KMS key.
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0" # Specify a compatible AWS provider version
}
}
}
# ------------------------------------------------------------------------------
# Input Variables
# ------------------------------------------------------------------------------
variable "vpc_id" {
description = "(Required) The ID of your existing VPC for agent deployment."
type = string
}
variable "private_subnet_ids" {
description = "(Required) List of private subnet IDs within the VpcId where agents and the VPC endpoint will be deployed."
type = list(string)
}
variable "credentials_key_arn" {
description = "(Optional) ARN of your existing KMS CMK for encrypting connector credentials. If left blank, a new KMS key will be created."
type = string
default = ""
}
variable "external_id" {
description = "(Required) A unique ID provided by the Supermetal Console. Used for securing the cross-account IAM role."
type = string
}
variable "supermetal_control_plane_aws_account_id" {
description = "The AWS Account ID of the Supermetal Control Plane."
type = string
}
variable "supermetal_metadata_service_endpoint_name" {
description = "The VPC Endpoint Service Name for the Supermetal Metadata Service (e.g., com.amazonaws.vpce.us-east-1.vpce-svc-xxxxxxxxxxxxxxxxx)."
type = string
}
locals {
create_kms_key = var.credentials_key_arn == ""
common_tags = {
"supermetal:byoc:vpc-id" = var.vpc_id
"supermetal:byoc:stack-name" = "terraform-supermetal-byoc" # Terraform doesn't have a direct AWS::StackName equivalent, using a placeholder
"supermetal:byoc:external-id" = var.external_id
}
}
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}
# ------------------------------------------------------------------------------
# Resources
# ------------------------------------------------------------------------------
# IAM Role for Supermetal Control Plane
resource "aws_iam_role" "supermetal_control_plane_role" {
name = "SupermetalControlPlaneRole-${var.vpc_id}"
tags = merge(local.common_tags, {
Name = "SupermetalControlPlaneRole-${var.vpc_id}"
})
assume_role_policy = jsonencode({
Version = "2012-10-17",
Statement = [
{
Effect = "Allow",
Principal = {
AWS = "arn:aws:iam::${var.supermetal_control_plane_aws_account_id}:root"
},
Action = "sts:AssumeRole",
Condition = {
StringEquals = {
"sts:ExternalId" = var.external_id
}
}
}
]
})
}
# Managed Policy for SupermetalControlPlaneRole
resource "aws_iam_policy" "supermetal_control_plane_policy" {
name = "SupermetalControlPlanePolicy-${var.vpc_id}"
description = "Policy for Supermetal Control Plane Role"
policy = jsonencode({
Version = "2012-10-17",
Statement = [
# ECS Cluster Management
{
Effect = "Allow",
Action = [
"ecs:CreateCluster",
"ecs:DeleteCluster",
"ecs:DescribeClusters"
],
Resource = "*", # Scoped to clusters with name prefix 'supermetal-' via condition
Condition = {
StringLike = {
"ecs:cluster" = "supermetal-*"
}
}
},
# ECS Task and Service Management
{
Effect = "Allow",
Action = [
"ecs:CreateService",
"ecs:UpdateService",
"ecs:DeleteService",
"ecs:RegisterTaskDefinition",
"ecs:DeregisterTaskDefinition",
"ecs:DescribeServices",
"ecs:DescribeTasks",
"ecs:RunTask",
"ecs:StopTask",
"ecs:ListTasks",
"ecs:ListServices",
"ecs:DescribeTaskDefinition",
"ecs:ListContainerInstances",
"ecs:DescribeContainerInstances"
],
Resource = "*", # Scoped to clusters with name prefix 'supermetal-' via condition
Condition = {
StringLike = {
"ecs:cluster" = "supermetal-*"
}
}
},
# ECS Capacity Provider Management
{
Effect = "Allow",
Action = [
"ecs:CreateCapacityProvider",
"ecs:DeleteCapacityProvider",
"ecs:DescribeCapacityProviders",
"ecs:UpdateCapacityProvider",
"ecs:PutClusterCapacityProviders"
],
Resource = "*", # Scoped to capacity providers with name prefix 'supermetal-' via condition
Condition = {
StringLike = {
"ecs:capacity-provider" = "supermetal-*"
}
}
},
# EC2 Capacity Management
{
Effect = "Allow",
Action = [
"autoscaling:CreateAutoScalingGroup",
"autoscaling:UpdateAutoScalingGroup",
"autoscaling:DeleteAutoScalingGroup",
"autoscaling:DescribeAutoScalingGroups",
"autoscaling:DescribeScalingActivities",
"autoscaling:SetDesiredCapacity",
"ec2:CreateLaunchTemplate",
"ec2:DeleteLaunchTemplate",
"ec2:DescribeLaunchTemplates",
"ec2:DescribeLaunchTemplateVersions",
"ec2:CreateTags",
"ec2:DescribeInstances"
],
Resource = "*",
Condition = {
StringLike = {
"aws:ResourceTag/supermetal" = "*"
}
}
},
# Lambda for Connector Validation
{
Effect = "Allow",
Action = [
"lambda:InvokeFunction",
"lambda:CreateFunction",
"lambda:DeleteFunction",
"lambda:GetFunctionConfiguration",
"lambda:UpdateFunctionConfiguration",
"lambda:AddPermission",
"lambda:RemovePermission"
],
Resource = "arn:aws:lambda:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:function:supermetal-validation-*"
},
# Lambda VPC Access
{
Effect = "Allow",
Action = [
"ec2:CreateNetworkInterface",
"ec2:DescribeNetworkInterfaces",
"ec2:DeleteNetworkInterface",
"ec2:AssignPrivateIpAddresses",
"ec2:UnassignPrivateIpAddresses"
],
Resource = "*", # Scoped to specified VPC via condition
Condition = {
StringEquals = {
"ec2:VpcId" = var.vpc_id
}
}
},
# Database Discovery
{
Effect = "Allow",
Action = [
"rds:DescribeDBInstances",
"rds:DescribeDBClusters",
"redshift:DescribeClusters"
],
Resource = "*", # Scoped to specified VPC via condition
Condition = {
StringEquals = {
"ec2:VpcId" = var.vpc_id
}
}
},
# S3 Buffer Management
{
Effect = "Allow",
Action = [
"s3:CreateBucket",
"s3:DeleteBucket",
"s3:PutBucketTagging",
"s3:GetBucketTagging",
"s3:PutEncryptionConfiguration",
"s3:GetEncryptionConfiguration"
],
Resource = "arn:aws:s3:::supermetal-*"
},
# Security Group Management
{
Effect = "Allow",
Action = [
"ec2:DescribeSecurityGroups"
],
Resource = "*",
Condition = {
StringEquals = {
"ec2:VpcId" = var.vpc_id
}
}
},
# Secrets Management
{
Effect = "Allow",
Action = [
"secretsmanager:CreateSecret",
"secretsmanager:DeleteSecret",
"secretsmanager:GetSecretValue",
"secretsmanager:PutSecretValue",
"secretsmanager:UpdateSecret",
"secretsmanager:TagResource"
],
Resource = "arn:aws:secretsmanager:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:secret:supermetal-*"
},
# KMS Encryption
{
Effect = "Allow",
Action = "kms:Encrypt",
Resource = local.create_kms_key ? aws_kms_key.supermetal_kms_key[0].arn : var.credentials_key_arn
},
# IAM Role Management
{
Effect = "Allow",
Action = "iam:PassRole",
Resource = [
aws_iam_role.supermetal_agent_task_role.arn,
aws_iam_role.supermetal_validation_lambda_role.arn,
aws_iam_role.supermetal_task_execution_role.arn,
aws_iam_role.supermetal_agent_ec2_instance_role.arn
],
Condition = {
StringEquals = {
"iam:PassedToService" = [
"ecs-tasks.amazonaws.com",
"lambda.amazonaws.com",
"ec2.amazonaws.com"
]
}
}
}
]
})
}
resource "aws_iam_role_policy_attachment" "supermetal_control_plane_role_policy_attach" {
role = aws_iam_role.supermetal_control_plane_role.name
policy_arn = aws_iam_policy.supermetal_control_plane_policy.arn
}
# IAM Role for Lambda Validation Function
resource "aws_iam_role" "supermetal_validation_lambda_role" {
name = "SupermetalValidationLambdaRole-${var.vpc_id}"
path = "/"
tags = merge(local.common_tags, {
Name = "SupermetalValidationLambdaRole-${var.vpc_id}"
})
assume_role_policy = jsonencode({
Version = "2012-10-17",
Statement = [
{
Effect = "Allow",
Principal = {
Service = "lambda.amazonaws.com"
},
Action = "sts:AssumeRole"
}
]
})
}
resource "aws_iam_role_policy_attachment" "supermetal_validation_lambda_role_vpc_access_policy_attach" {
role = aws_iam_role.supermetal_validation_lambda_role.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"
}
# IAM Role for Supermetal Agent Tasks
resource "aws_iam_role" "supermetal_agent_task_role" {
name = "SupermetalAgentTaskRole-${var.vpc_id}"
path = "/"
tags = merge(local.common_tags, {
Name = "SupermetalAgentTaskRole-${var.vpc_id}"
})
assume_role_policy = jsonencode({
Version = "2012-10-17",
Statement = [
{
Effect = "Allow",
Principal = {
Service = "ecs-tasks.amazonaws.com"
},
Action = "sts:AssumeRole",
Condition = {
ArnLike = {
"aws:SourceArn" = "arn:aws:ecs:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:task/supermetal-*"
},
StringEquals = {
"aws:SourceAccount" = data.aws_caller_identity.current.account_id
}
}
}
]
})
}
# Managed Policy for SupermetalAgentTaskRole
resource "aws_iam_policy" "supermetal_agent_task_policy" {
name = "SupermetalAgentTaskPolicy-${var.vpc_id}"
description = "Policy for Supermetal Agent Task Role"
policy = jsonencode({
Version = "2012-10-17",
Statement = [
{
Effect = "Allow",
Action = [
"kms:Decrypt",
"kms:Encrypt"
],
Resource = local.create_kms_key ? aws_kms_key.supermetal_kms_key[0].arn : var.credentials_key_arn
},
# Secrets Access
{
Effect = "Allow",
Action = [
"secretsmanager:GetSecretValue"
],
Resource = "arn:aws:secretsmanager:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:secret:supermetal-*"
},
# S3 Buffer Access
{
Effect = "Allow",
Action = [
"s3:PutObject",
"s3:GetObject",
"s3:DeleteObject",
"s3:ListBucket"
],
Resource = [
"arn:aws:s3:::supermetal-*",
"arn:aws:s3:::supermetal-*/*"
]
}
]
})
}
resource "aws_iam_role_policy_attachment" "supermetal_agent_task_role_policy_attach" {
role = aws_iam_role.supermetal_agent_task_role.name
policy_arn = aws_iam_policy.supermetal_agent_task_policy.arn
}
# IAM Role for ECS Task Execution
resource "aws_iam_role" "supermetal_task_execution_role" {
name = "SupermetalTaskExecutionRole-${var.vpc_id}"
path = "/"
tags = merge(local.common_tags, {
Name = "SupermetalTaskExecutionRole-${var.vpc_id}"
})
assume_role_policy = jsonencode({
Version = "2012-10-17",
Statement = [
{
Effect = "Allow",
Principal = {
Service = "ecs-tasks.amazonaws.com"
},
Action = "sts:AssumeRole",
Condition = {
StringEquals = {
"aws:SourceAccount" = data.aws_caller_identity.current.account_id
},
ArnLike = {
"aws:SourceArn" = "arn:aws:ecs:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:task-definition/supermetal*:*"
}
}
}
]
})
}
resource "aws_iam_role_policy_attachment" "supermetal_task_execution_role_ecs_policy_attach" {
role = aws_iam_role.supermetal_task_execution_role.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
# Managed Policy for SupermetalTaskExecutionRole
resource "aws_iam_policy" "supermetal_task_execution_policy" {
name = "SupermetalTaskExecutionPolicy-${var.vpc_id}"
description = "Policy for Supermetal Task Execution Role"
policy = jsonencode({
Version = "2012-10-17",
Statement = [
{
Effect = "Allow",
Action = [
"secretsmanager:GetSecretValue"
],
Resource = "arn:aws:secretsmanager:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:secret:supermetal-*"
},
{
Effect = "Allow",
Action = [
"kms:Decrypt"
],
Resource = local.create_kms_key ? aws_kms_key.supermetal_kms_key[0].arn : var.credentials_key_arn
}
]
})
}
resource "aws_iam_role_policy_attachment" "supermetal_task_execution_role_custom_policy_attach" {
role = aws_iam_role.supermetal_task_execution_role.name
policy_arn = aws_iam_policy.supermetal_task_execution_policy.arn
}
# KMS Key for Connector Credentials
resource "aws_kms_key" "supermetal_kms_key" {
count = local.create_kms_key ? 1 : 0
description = "KMS key for encrypting Supermetal connector credentials"
enable_key_rotation = true
tags = merge(local.common_tags, {
Name = "SupermetalKMSKey-${var.vpc_id}"
})
policy = jsonencode({
Version = "2012-10-17",
Statement = [
{
Sid = "Enable IAM User Permissions",
Effect = "Allow",
Principal = {
AWS = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
},
Action = "kms:*",
Resource = "*"
},
{
Sid = "Allow Supermetal Control Plane to Encrypt",
Effect = "Allow",
Principal = {
AWS = aws_iam_role.supermetal_control_plane_role.arn
},
Action = "kms:Encrypt",
Resource = "*"
},
{
Sid = "Allow Supermetal Agents to Decrypt/Encrypt",
Effect = "Allow",
Principal = {
AWS = aws_iam_role.supermetal_agent_task_role.arn
},
Action = [
"kms:Decrypt",
"kms:Encrypt"
],
Resource = "*"
}
]
})
}
resource "aws_kms_alias" "supermetal_kms_key_alias" {
count = local.create_kms_key ? 1 : 0
name = "alias/supermetal-key-${var.vpc_id}"
target_key_id = aws_kms_key.supermetal_kms_key[0].key_id
}
# IAM Role for Supermetal Agent EC2 Instances
resource "aws_iam_role" "supermetal_agent_ec2_instance_role" {
name = "SupermetalAgentEC2InstanceRole-${var.vpc_id}"
path = "/"
tags = merge(local.common_tags, {
Name = "SupermetalAgentEC2InstanceRole-${var.vpc_id}"
})
assume_role_policy = jsonencode({
Version = "2012-10-17",
Statement = [
{
Effect = "Allow",
Principal = {
Service = "ec2.amazonaws.com"
},
Action = "sts:AssumeRole"
}
]
})
}
# Managed Policy for EC2 Instance Role
resource "aws_iam_policy" "supermetal_agent_ec2_instance_policy" {
name = "SupermetalAgentEC2InstancePolicy-${var.vpc_id}"
description = "Policy for Supermetal EC2 Instance Role"
policy = jsonencode({
Version = "2012-10-17",
Statement = [
{
Effect = "Allow",
Action = [
"ecr:GetAuthorizationToken"
],
Resource = "*"
},
{
Effect = "Allow",
Action = [
"ecr:BatchGetImage",
"ecr:GetDownloadUrlForLayer"
],
Resource = "arn:aws:ecr:${data.aws_region.current.name}:${var.supermetal_control_plane_aws_account_id}:repository/agent/*"
}
]
})
}
resource "aws_iam_role_policy_attachment" "supermetal_agent_ec2_instance_role_ecs_policy_attach" {
role = aws_iam_role.supermetal_agent_ec2_instance_role.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
}
resource "aws_iam_role_policy_attachment" "supermetal_agent_ec2_instance_role_custom_policy_attach" {
role = aws_iam_role.supermetal_agent_ec2_instance_role.name
policy_arn = aws_iam_policy.supermetal_agent_ec2_instance_policy.arn
}
resource "aws_iam_instance_profile" "supermetal_agent_ec2_instance_profile" {
name = "SupermetalAgentEC2InstanceProfile-${var.vpc_id}"
role = aws_iam_role.supermetal_agent_ec2_instance_role.name
path = "/"
tags = merge(local.common_tags, {
Name = "SupermetalAgentEC2InstanceProfile-${var.vpc_id}"
})
}
# Security Group for Supermetal Agents (for Metadata Endpoint communication)
resource "aws_security_group" "supermetal_agent_metadata_sg" {
name = "supermetal-agent-metadata-sg-${var.vpc_id}"
description = "Security group for Supermetal Agents"
vpc_id = var.vpc_id
tags = merge(local.common_tags, {
Name = "supermetal-agent-metadata-sg-${var.vpc_id}"
})
}
# Security Group for Metadata VPC Endpoint
resource "aws_security_group" "metadata_vpc_endpoint_sg" {
name = "supermetal-metadata-vpce-sg-${var.vpc_id}"
description = "Security group for Supermetal metadata VPC Endpoint"
vpc_id = var.vpc_id
tags = merge(local.common_tags, {
Name = "supermetal-metadata-vpce-sg-${var.vpc_id}"
})
ingress {
description = "Allow agents to connect to the endpoint (HTTPS)"
from_port = 443
to_port = 443
protocol = "tcp"
security_groups = [aws_security_group.supermetal_agent_metadata_sg.id]
}
}
# VPC Endpoint for Supermetal Metadata Service
resource "aws_vpc_endpoint" "metadata_vpc_endpoint" {
vpc_id = var.vpc_id
service_name = var.supermetal_metadata_service_endpoint_name
vpc_endpoint_type = "Interface"
subnet_ids = var.private_subnet_ids
private_dns_enabled = true
security_group_ids = [
aws_security_group.metadata_vpc_endpoint_sg.id
]
tags = merge(local.common_tags, {
Name = "MetadataVPCEndpoint-${var.vpc_id}"
})
}
# ------------------------------------------------------------------------------
# Outputs
# ------------------------------------------------------------------------------
output "supermetal_control_plane_role_arn" {
description = "ARN of the IAM Role for Supermetal Control Plane"
value = aws_iam_role.supermetal_control_plane_role.arn
}
output "supermetal_agent_task_role_arn" {
description = "ARN of the IAM Role for Supermetal Agent Tasks"
value = aws_iam_role.supermetal_agent_task_role.arn
}
output "supermetal_kms_key_arn_output" {
description = "ARN of the KMS Key for encrypting credentials"
value = local.create_kms_key ? aws_kms_key.supermetal_kms_key[0].arn : var.credentials_key_arn
}
output "supermetal_task_execution_role_arn" {
description = "ARN of the IAM Role for ECS Task Execution"
value = aws_iam_role.supermetal_task_execution_role.arn
}
output "metadata_vpc_endpoint_id" {
description = "ID of the VPC Endpoint for Supermetal Metadata Service"
value = aws_vpc_endpoint.metadata_vpc_endpoint.id
}
output "supermetal_agent_metadata_security_group_id" {
description = "ID of the Security Group for Supermetal Agents (for Metadata Endpoint communication)"
value = aws_security_group.supermetal_agent_metadata_sg.id
}
output "supermetal_validation_lambda_role_arn" {
description = "ARN of the IAM Role for Supermetal Validation Lambda functions"
value = aws_iam_role.supermetal_validation_lambda_role.arn
}
output "supermetal_agent_ec2_instance_role_arn" {
description = "ARN of the IAM Role for Supermetal Agent EC2 Instances"
value = aws_iam_role.supermetal_agent_ec2_instance_role.arn
}
output "supermetal_agent_ec2_instance_profile_arn" {
description = "ARN of the IAM Instance Profile for Supermetal Agent EC2 Instances"
value = aws_iam_instance_profile.supermetal_agent_ec2_instance_profile.arn
}
```
***
### IAM Roles & Policies
The environment bootstrap creates several IAM roles and policies, each granting only the permissions Supermetal BYOC needs to function securely.
These roles ensure that the Supermetal Control Plane can manage resources in your AWS account on your behalf, and that the Supermetal agents have the necessary permissions to operate.
The principle of least privilege is applied to scope down permissions as much as possible.
Key roles created and their purposes:
#### SupermetalControlPlaneRole
```json
{
"SupermetalControlPlaneRole": {
"Type": "AWS::IAM::Role",
"Properties": {
"RoleName": "SupermetalControlPlaneRole-${VPC_ID}",
"Tags": [
{ "Key": "Name", "Value": "SupermetalControlPlaneRole-${VPC_ID}" },
{ "Key": "supermetal:byoc:vpc-id", "Value": "${VPC_ID}" },
{ "Key": "supermetal:byoc:stack-name", "Value": "${STACK_NAME}" },
{ "Key": "supermetal:byoc:external-id", "Value": "${EXTERNAL_ID}" }
],
"AssumeRolePolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": { "AWS": "arn:aws:iam::${SUPERMETAL_CONTROL_PLANE_AWS_ACCOUNT_ID}:root" },
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": { "sts:ExternalId": "${EXTERNAL_ID}" }
}
}
]
},
"ManagedPolicyArns": [
"arn:aws:iam::${AWS_ACCOUNT_ID}:policy/SupermetalControlPlanePolicy-${VPC_ID}"
]
}
}
```
```json
{
"SupermetalControlPlanePolicy": {
"Type": "AWS::IAM::ManagedPolicy",
"Properties": {
"ManagedPolicyName": "SupermetalControlPlanePolicy-${VPC_ID}",
"PolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["ecs:CreateCluster", "ecs:DeleteCluster", "ecs:DescribeClusters"],
"Resource": "*",
"Condition": { "StringLike": { "ecs:cluster": "supermetal-*" } }
},
{
"Effect": "Allow",
"Action": [
"ecs:CreateService", "ecs:UpdateService", "ecs:DeleteService", "ecs:RegisterTaskDefinition",
"ecs:DeregisterTaskDefinition", "ecs:DescribeServices", "ecs:DescribeTasks", "ecs:RunTask",
"ecs:StopTask", "ecs:ListTasks", "ecs:ListServices", "ecs:DescribeTaskDefinition",
"ecs:ListContainerInstances", "ecs:DescribeContainerInstances"
],
"Resource": "*",
"Condition": { "StringLike": { "ecs:cluster": "supermetal-*" } }
},
{
"Effect": "Allow",
"Action": [
"ecs:CreateCapacityProvider", "ecs:DeleteCapacityProvider", "ecs:DescribeCapacityProviders",
"ecs:UpdateCapacityProvider", "ecs:PutClusterCapacityProviders"
],
"Resource": "*",
"Condition": { "StringLike": {"ecs:capacity-provider": "supermetal-*"} }
},
{
"Effect": "Allow",
"Action": [
"autoscaling:CreateAutoScalingGroup", "autoscaling:UpdateAutoScalingGroup", "autoscaling:DeleteAutoScalingGroup",
"autoscaling:DescribeAutoScalingGroups", "autoscaling:DescribeScalingActivities", "autoscaling:SetDesiredCapacity",
"ec2:CreateLaunchTemplate", "ec2:DeleteLaunchTemplate", "ec2:DescribeLaunchTemplates",
"ec2:DescribeLaunchTemplateVersions", "ec2:CreateTags", "ec2:DescribeInstances"
],
"Resource": "*",
"Condition": { "StringLike": { "aws:ResourceTag/supermetal": "*" } }
},
{
"Effect": "Allow",
"Action": [
"lambda:InvokeFunction", "lambda:CreateFunction", "lambda:DeleteFunction",
"lambda:GetFunctionConfiguration", "lambda:UpdateFunctionConfiguration",
"lambda:AddPermission", "lambda:RemovePermission"
],
"Resource": "arn:aws:lambda:${AWS_REGION}:${AWS_ACCOUNT_ID}:function:supermetal-validation-*"
},
{
"Effect": "Allow",
"Action": [
"ec2:CreateNetworkInterface", "ec2:DescribeNetworkInterfaces", "ec2:DeleteNetworkInterface",
"ec2:AssignPrivateIpAddresses", "ec2:UnassignPrivateIpAddresses"
],
"Resource": "*",
"Condition": { "StringEquals": { "ec2:VpcId": "${VPC_ID}"} }
},
{
"Effect": "Allow",
"Action": ["rds:DescribeDBInstances", "rds:DescribeDBClusters", "redshift:DescribeClusters"],
"Resource": "*",
"Condition": { "StringEquals": { "ec2:VpcId": "${VPC_ID}"} }
},
{
"Effect": "Allow",
"Action": [
"s3:CreateBucket", "s3:DeleteBucket", "s3:PutBucketTagging", "s3:GetBucketTagging",
"s3:PutEncryptionConfiguration", "s3:GetEncryptionConfiguration"
],
"Resource": "arn:aws:s3:::supermetal-*"
},
{
"Effect": "Allow",
"Action": "ec2:DescribeSecurityGroups",
"Resource": "*",
"Condition": { "StringEquals": { "ec2:VpcId": "${VPC_ID}"} }
},
{
"Effect": "Allow",
"Action": [
"secretsmanager:CreateSecret", "secretsmanager:DeleteSecret", "secretsmanager:GetSecretValue",
"secretsmanager:PutSecretValue", "secretsmanager:UpdateSecret", "secretsmanager:TagResource"
],
"Resource": "arn:aws:secretsmanager:${AWS_REGION}:${AWS_ACCOUNT_ID}:secret:supermetal-*"
},
{
"Effect": "Allow",
"Action": "kms:Encrypt",
"Resource": "${CUSTOMER_CREDENTIALS_KMS_KEY_ARN}"
},
{
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": [
"${SUPERMETAL_AGENT_TASK_ROLE_ARN}",
"${SUPERMETAL_VALIDATION_LAMBDA_ROLE_ARN}",
"${SUPERMETAL_TASK_EXECUTION_ROLE_ARN}",
"${SUPERMETAL_AGENT_EC2_INSTANCE_ROLE_ARN}"
],
"Condition": {
"StringEquals": {
"iam:PassedToService": ["ecs-tasks.amazonaws.com", "lambda.amazonaws.com", "ec2.amazonaws.com"]
}
}
}
]
}
}
}
}
```
This role is assumed by the Supermetal Control Plane. It is granted permissions to orchestrate and manage the lifecycle of resources required for the Supermetal data plane within your AWS account.
* **ECS Resources**
* Create and manage ECS clusters, services, tasks, and capacity providers (e.g., `ecs:CreateCluster`, `ecs:DeleteCluster`, `ecs:CreateService`, `ecs:RunTask`)
* Resources are restricted to those tagged or named with `supermetal-*` prefix
* **EC2 and Auto Scaling**
* Manage EC2 instances and launch templates (`ec2:CreateLaunchTemplate`, `ec2:DescribeInstances`, `ec2:CreateTags`)
* Control Auto Scaling groups (`autoscaling:CreateAutoScalingGroup`, `autoscaling:UpdateAutoScalingGroup`)
* Resources are typically conditioned on tags like `supermetal:*`
* Network interface management for VPC access (`ec2:CreateNetworkInterface`, `ec2:DescribeNetworkInterfaces`)
* Security group discovery (`ec2:DescribeSecurityGroups`) for network configuration
* **Lambda Functions**
* Deploy and manage validation functions named `supermetal-validation-*` (`lambda:CreateFunction`, `lambda:InvokeFunction`, `lambda:DeleteFunction`)
* VPC access permissions for Lambda functions
* Pre-deployment validation and auxiliary tasks within your VPC
* **Database Access**
* Read-only discovery of potential data sources (`rds:DescribeDBInstances`, `rds:DescribeDBClusters`, `redshift:DescribeClusters`)
* Scoped to resources within specified VPC (`ec2:VpcId`)
* **Storage and Data Management**
* S3 bucket management (`s3:CreateBucket`, `s3:DeleteBucket`, `s3:PutBucketTagging`) for buckets prefixed with `supermetal-*`
* Used for data buffering and staging per connector
* **Security and Access Control**
* Secrets Manager: Full control over secrets prefixed with `supermetal-*` (`secretsmanager:CreateSecret`, `secretsmanager:GetSecretValue`)
* KMS: Encryption permissions (`kms:Encrypt`) on specified KMS keys
* IAM: Pass roles to AWS services (`iam:PassRole`) for ECS tasks, Lambda functions, and EC2 instances
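To verify the deployed role's trust policy, including the external ID condition, you can inspect it with the CLI (the role name is suffixed with your VPC ID):
```bash
aws iam get-role \
  --role-name "SupermetalControlPlaneRole-$VPC_ID" \
  --query 'Role.AssumeRolePolicyDocument'
```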
***
#### SupermetalAgentTaskRole
```json
{
"SupermetalAgentTaskRole": {
"Type": "AWS::IAM::Role",
"Properties": {
"RoleName": "SupermetalAgentTaskRole-${VPC_ID}",
"Tags": [
{ "Key": "Name", "Value": "SupermetalAgentTaskRole-${VPC_ID}" },
{ "Key": "supermetal:byoc:vpc-id", "Value": "${VPC_ID}" },
{ "Key": "supermetal:byoc:stack-name", "Value": "${STACK_NAME}" },
{ "Key": "supermetal:byoc:external-id", "Value": "${EXTERNAL_ID}" }
],
"AssumeRolePolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": { "Service": "ecs-tasks.amazonaws.com" },
"Action": "sts:AssumeRole",
"Condition": {
"ArnLike": { "aws:SourceArn": "arn:aws:ecs:${AWS_REGION}:${AWS_ACCOUNT_ID}:task/supermetal-*" },
"StringEquals": { "aws:SourceAccount": "${AWS_ACCOUNT_ID}" }
}
}
]
},
"ManagedPolicyArns": [
"arn:aws:iam::${AWS_ACCOUNT_ID}:policy/SupermetalAgentTaskPolicy-${VPC_ID}"
],
"Path": "/"
}
}
}
```
```json
{
"SupermetalAgentTaskPolicy": {
"Type": "AWS::IAM::ManagedPolicy",
"Properties": {
"ManagedPolicyName": "SupermetalAgentTaskPolicy-${VPC_ID}",
"PolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["kms:Decrypt", "kms:Encrypt"],
"Resource": "${CUSTOMER_CREDENTIALS_KMS_KEY_ARN}"
},
{
"Effect": "Allow",
"Action": ["secretsmanager:GetSecretValue"],
"Resource": "arn:aws:secretsmanager:${AWS_REGION}:${AWS_ACCOUNT_ID}:secret:supermetal-*"
},
{
"Effect": "Allow",
"Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject", "s3:ListBucket"],
"Resource": [
"arn:aws:s3:::supermetal-*",
"arn:aws:s3:::supermetal-*/*"
]
}
]
}
}
}
}
```
This IAM role is assumed by the Supermetal agent tasks running on ECS within your AWS account. It grants the agent the necessary permissions to perform its data processing duties:
* **Data Security and Encryption**
* KMS: Encrypt and decrypt operations (`kms:Decrypt`, `kms:Encrypt`) on specified KMS key
* Used for connector credential decryption
* **Configuration Management**
* Secrets Manager: Access to secrets (`secretsmanager:GetSecretValue`) prefixed with `supermetal-*`
  * Retrieves the API keys used to reach the Supermetal control plane and to authenticate to the ECR repository for agent images
* **Data Processing**
* S3: Full object operations on `supermetal-*` buckets (`s3:PutObject`, `s3:GetObject`, `s3:DeleteObject`, `s3:ListBucket`)
  * Used to buffer data in the per-connector buckets provisioned by the control plane
***
#### SupermetalTaskExecutionRole
```json
{
"SupermetalTaskExecutionRole": {
"Type": "AWS::IAM::Role",
"Properties": {
"RoleName": "SupermetalTaskExecutionRole-${VPC_ID}",
"Tags": [
{ "Key": "Name", "Value": "SupermetalTaskExecutionRole-${VPC_ID}" },
{ "Key": "supermetal:byoc:vpc-id", "Value": "${VPC_ID}" },
{ "Key": "supermetal:byoc:stack-name", "Value": "${STACK_NAME}" },
{ "Key": "supermetal:byoc:external-id", "Value": "${EXTERNAL_ID}" }
],
"AssumeRolePolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": { "Service": "ecs-tasks.amazonaws.com" },
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": { "aws:SourceAccount": "${AWS_ACCOUNT_ID}" },
"ArnLike": { "aws:SourceArn": "arn:aws:ecs:${AWS_REGION}:${AWS_ACCOUNT_ID}:task-definition/supermetal*:*" }
}
}
]
},
"ManagedPolicyArns": [
"arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy",
"arn:aws:iam::${AWS_ACCOUNT_ID}:policy/SupermetalTaskExecutionPolicy-${VPC_ID}"
],
"Path": "/"
}
}
}
```
```json
{
"SupermetalTaskExecutionPolicy": {
"Type": "AWS::IAM::ManagedPolicy",
"Properties": {
"ManagedPolicyName": "SupermetalTaskExecutionPolicy-${VPC_ID}",
"PolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["secretsmanager:GetSecretValue"],
"Resource": "arn:aws:secretsmanager:${AWS_REGION}:${AWS_ACCOUNT_ID}:secret:supermetal-*"
},
{
"Effect": "Allow",
"Action": ["kms:Decrypt"],
"Resource": "${CUSTOMER_CREDENTIALS_KMS_KEY_ARN}"
}
]
}
}
}
}
```
This role is used by Amazon ECS to launch your agent tasks. It has two sets of permissions:
* **Core ECS Operations**
* Uses AWS managed policy `arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy`
* Pull container images from Amazon ECR
* **Agent Configuration**
* Custom policy `SupermetalTaskExecutionPolicy` provides:
* Access to Secrets Manager (`secretsmanager:GetSecretValue`) for fetching Control Plane API secrets
* KMS decryption (`kms:Decrypt`) to decrypt connector credentials
* Makes configuration available to agent containers via environment variables
***
#### SupermetalAgentEC2InstanceRole
```json
{
"SupermetalAgentEC2InstanceRole": {
"Type": "AWS::IAM::Role",
"Properties": {
"RoleName": "SupermetalAgentEC2InstanceRole-${VPC_ID}",
"Tags": [
{ "Key": "Name", "Value": "SupermetalAgentEC2InstanceRole-${VPC_ID}" },
{ "Key": "supermetal:byoc:vpc-id", "Value": "${VPC_ID}" },
{ "Key": "supermetal:byoc:stack-name", "Value": "${STACK_NAME}" },
{ "Key": "supermetal:byoc:external-id", "Value": "${EXTERNAL_ID}" }
],
"AssumeRolePolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": { "Service": "ec2.amazonaws.com" },
"Action": "sts:AssumeRole"
}
]
},
"ManagedPolicyArns": [
"arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role",
"arn:aws:iam::${AWS_ACCOUNT_ID}:policy/SupermetalAgentEC2InstancePolicy-${VPC_ID}"
],
"Path": "/"
}
}
}
```
```json
{
"SupermetalAgentEC2InstancePolicy": {
"Type": "AWS::IAM::ManagedPolicy",
"Properties": {
"ManagedPolicyName": "SupermetalAgentEC2InstancePolicy-${VPC_ID}",
"PolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["ecr:GetAuthorizationToken"],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": ["ecr:BatchGetImage", "ecr:GetDownloadUrlForLayer"],
"Resource": "arn:aws:ecr:${AWS_REGION}:${SUPERMETAL_CONTROL_PLANE_AWS_ACCOUNT_ID}:repository/agent/*"
}
]
}
}
}
}
```
This role is attached to EC2 instances that form your ECS cluster's capacity when using the EC2 launch type. It has two sets of permissions:
* **Core ECS Instance Operations**
* Uses AWS managed policy `arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role`
* Register with ECS cluster
* Pull common container images
* Send instance logs
* **Supermetal Agent Access**
* Custom policy `SupermetalAgentEC2InstancePolicy` provides:
* ECR authentication (`ecr:GetAuthorizationToken`)
* Image pulling from Supermetal's ECR repository (`ecr:BatchGetImage`, `ecr:GetDownloadUrlForLayer`)
* Enables instances to pull agent container images from Supermetal's control plane
***
#### SupermetalValidationLambdaRole
```json
{
"SupermetalValidationLambdaRole": {
"Type": "AWS::IAM::Role",
"Properties": {
"RoleName": "SupermetalValidationLambdaRole-${VPC_ID}",
"Tags": [
{ "Key": "Name", "Value": "SupermetalValidationLambdaRole-${VPC_ID}" },
{ "Key": "supermetal:byoc:vpc-id", "Value": "${VPC_ID}" },
{ "Key": "supermetal:byoc:stack-name", "Value": "${STACK_NAME}" },
{ "Key": "supermetal:byoc:external-id", "Value": "${EXTERNAL_ID}" }
],
"AssumeRolePolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": { "Service": "lambda.amazonaws.com" },
"Action": "sts:AssumeRole"
}
]
},
"ManagedPolicyArns": [
"arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"
],
"Path": "/"
}
}
}
```
This role is assumed by Lambda functions that perform pre-deployment validation tasks. It provides:
* **VPC Access and Networking**
* Uses AWS managed policy `arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole`
* Create and manage elastic network interfaces
* Access resources within your VPC
* Send function logs to CloudWatch
# Azure (/docs/main/concepts/deployment/byoc/azure)
import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Step, Steps } from 'fumadocs-ui/components/steps';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
import ByocAzureDiagram from '../../../../../../app/diagrams/byoc_azure';
Follow this guide to provision Supermetal BYOC (Bring Your Own Cloud), which deploys Supermetal's data replication agents in your own Azure environment so you maintain full control and ownership over your data.
This provides an additional layer of security and isolation. Supermetal handles provisioning, operations, and maintenance.
***
## Overview
Supermetal's BYOC model for Azure separates responsibilities between Supermetal's control plane and your Azure environment. The data replication agents run within your Azure Virtual Network (VNet), giving you full control over your data while Supermetal manages the agent lifecycle.
| Phase | Resource | Description |
| ---------------------- | -------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Bootstrap** | Azure AD Service Principal | Represents Supermetal's application and allows it to authenticate to your Azure AD tenant with specific permissions. |
| | Azure Role Definition | Defines the precise set of permissions Supermetal requires to manage resources in your Azure subscription. Designed for least privilege. |
| | Azure Role Assignment | Assigns the Custom Role to Supermetal's Service Principal, scoped to a designated Resource Group. |
| | Azure Key Vault | Securely stores secrets and keys used by Supermetal for bootstrap information and by agents for runtime credentials. You can use your own Customer-Managed Keys (CMK). |
| | Private Endpoint | An Azure Private Endpoint for secure communication between Supermetal agents and the Supermetal control plane via Azure Private Link. |
| | Network Security Group Rules | Security rules to allow necessary traffic for the Supermetal agents and Private Endpoint. |
| **Registration** | Key Vault Secrets | Secrets stored in your Azure Key Vault for agent bootstrap and operation. |
| **Connector Creation** | Virtual Machine Scale Set (VMSS) | Hosts the Supermetal data replication agents with dedicated hardware capacity. |
| | User-Assigned Managed Identity | Assigned to the VMSS instances, granting agents permissions to access necessary Azure resources. |
| | Azure Blob Storage | Used by agents for temporary buffering or staging data if required by the replication process. |
## Prerequisites
To successfully deploy Supermetal BYOC in your Azure environment, please ensure you have the following:
#### Supermetal Account Identifiers
You'll need the following Supermetal Azure identifiers (obtainable from the Supermetal Console under Settings):
* `supermetal_tenant_id`: Tenant ID of Supermetal's Azure AD
* `supermetal_app_id`: The Application (Client) ID of Supermetal's Azure AD application registration
* `supermetal_principal_id`: The Object ID of Supermetal's service principal (required for role assignments)
* `external_id`: A unique identifier for securing cross-tenant access
#### Azure Account & Permissions
* An Azure subscription with Owner or User Access Administrator permissions to create service principals and assign roles
#### Deployment Parameters
* **Azure Subscription ID (`subscription_id`):** The ID of your Azure subscription where resources will be deployed
* **Azure Region (`location`):** The Azure region for deployment (e.g., `eastus2`, `westeurope`)
* **Resource Group (`resource_group_name`):** The name of a Resource Group where Supermetal will deploy resources
* **VNet ID (`vnet_id`):** The resource ID of your existing Virtual Network where Supermetal agents will be deployed
* **Subnet IDs (`private_subnet_ids`):** A list of subnet resource IDs within your specified VNet. Supermetal agents will be deployed in these subnets. Ensure these subnets are in the same region as your source (and target) databases
* **(Optional) Key Vault ID (`credentials_key_vault_id`):** Decide if you'll use an existing Azure Key Vault for storing sensitive connector credentials. If you have an existing Key Vault, have its resource ID ready. If left blank, the bootstrap process will create a new Key Vault specifically for Supermetal
***
## Setup
Setup is a two-phase process: first bootstrap the environment, then create connectors to deploy the data plane / agents in your Azure VNet.
#### Environment Bootstrap
The bootstrap process uses a Terraform script provided by Supermetal to provision the necessary resources, including the Service Principal, Role, and private endpoint.
These resources enable Supermetal to automatically register your Azure environment with the Supermetal control plane over the private endpoint, and to deploy Supermetal data plane / agents in your VNet.
The initial environment bootstrap process does not provision any compute resources or the data plane / agents. Once the environment bootstrap is complete, you can create connectors to deploy Supermetal data plane / agents on-demand in your VNet.
#### Obtain Terraform Script \[!toc]
* Obtain the latest Terraform script from the Supermetal Console. The Terraform code in the [appendix](#environment-bootstrap-scripts) can be used as a reference.
#### Configure Terraform \[!toc]
* Ensure you have [Terraform installed](https://learn.hashicorp.com/tutorials/terraform/install-cli) (version 1.0 or later).
* Set up your Azure credentials for Terraform. This typically involves authenticating with the Azure CLI using `az login` and ensuring you have the appropriate permissions.
* Verify access by running `az account show`. You should see your Azure subscription details.
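A typical authentication flow looks like this (the subscription ID is a placeholder):
```bash
# Authenticate and point the Azure CLI at the subscription Terraform will target
az login
az account set --subscription "00000000-0000-0000-0000-000000000000"

# Confirm the active subscription before running Terraform
az account show --output table
```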
#### Prepare Terraform Files \[!toc]
* Create a new directory for your Terraform configuration.
* If using the appendix script, save the Terraform code into files within this directory (e.g., `main.tf`).
* Create a `terraform.tfvars` file in the same directory to specify the required variable values. Alternatively, you can pass variables via the command line.
**Example `terraform.tfvars`:**
```tfvars
# Supermetal identifiers (provided by Supermetal)
supermetal_tenant_id = "00000000-0000-0000-0000-000000000000"
supermetal_app_id = "00000000-0000-0000-0000-000000000000"
supermetal_principal_id = "00000000-0000-0000-0000-000000000000"
external_id = "unique-external-id-from-supermetal"
# Azure environment parameters
subscription_id = "00000000-0000-0000-0000-000000000000"
location = "eastus2"
resource_group_name = "supermetal-byoc-rg"
vnet_id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/network-rg/providers/Microsoft.Network/virtualNetworks/production-vnet"
private_subnet_ids = [
"/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/network-rg/providers/Microsoft.Network/virtualNetworks/production-vnet/subnets/data-subnet",
"/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/network-rg/providers/Microsoft.Network/virtualNetworks/production-vnet/subnets/private-endpoint-subnet"
]
# credentials_key_vault_id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/security-rg/providers/Microsoft.KeyVault/vaults/my-existing-keyvault" # Optional
```
Refer to the [Deployment Parameters](#deployment-parameters) section and the variable descriptions within the Terraform script for detailed explanations of each.
* `supermetal_tenant_id` - Tenant ID of Supermetal's Azure AD
* `supermetal_app_id` - Application (Client) ID of Supermetal's Azure AD application
* `supermetal_principal_id` - Object ID of Supermetal's service principal
* `external_id` - Unique identifier for securing cross-tenant access
* `subscription_id` - Your Azure subscription ID
* `location` - Azure region for deployment
* `resource_group_name` - Resource group name for Supermetal resources
* `vnet_id` - Resource ID of your existing VNet
* `private_subnet_ids` - List of subnet resource IDs within your VNet
* `credentials_key_vault_id` (Optional) - Resource ID of your existing Key Vault
#### Deploy with Terraform CLI \[!toc]
* Navigate to your Terraform configuration directory in your terminal.
* Initialize Terraform:
```bash
terraform init
```
* Review the execution plan with your tfvars file:
```bash
terraform plan -var-file=terraform.tfvars
```
* Apply the configuration using your tfvars file:
```bash
terraform apply -var-file=terraform.tfvars
```
Confirm the action by typing `yes` when prompted.
#### Verification \[!toc]
* Monitor the `terraform apply` command output. It will indicate when the resources have been successfully created.
* Once the Terraform apply is complete, Supermetal's Control Plane automatically performs a registration process for the new environment.
* Monitor the Environments page in your Supermetal Console. The environment should show up with a "Ready" status within a few minutes after Terraform completes.
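You can also sanity-check the provisioned resources directly; a minimal example, assuming the resource group name from the sample `terraform.tfvars`:
```bash
# Inspect the values recorded in Terraform state (e.g., private endpoint ID and IP)
terraform output

# List the resources created in the Supermetal resource group
az resource list \
  --resource-group supermetal-byoc-rg \
  --query "[].{name:name, type:type}" \
  --output table
```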
#### Create Connector(s) to deploy Supermetal agents
Once the environment bootstrap is complete, you are ready to create a connector to deploy Supermetal agents in your Azure environment.
#### Select source database \[!toc]
Follow the connector setup instructions in the Supermetal Console to select a source database. Supermetal can optionally [auto discover](#database_discovery) available source databases.
#### Configure Database Network Security Group \[!toc]
To give Supermetal access to your source database and to validate network reachability and connection credentials, you need to ensure your database's network security group allows connections from the Supermetal agent subnet.
The Supermetal Console lists pre-filled instructions for configuring database access as part of the prerequisites, similar to the steps shown below.
```bash
# Set variables for your environment
RESGROUP="your-resource-group-name"
CONNECTOR_NAME="connector1" # Unique identifier for this connector
NSG_NAME="supermetal-${CONNECTOR_NAME}-nsg" # Unique NSG name for this connector
DB_NSG_NAME="your-db-nsg-name"
VNET_NAME="your-vnet-name"
AGENT_SUBNET_NAME="your-agent-subnet-name" # Subnet where Supermetal agents will be deployed
LOCATION="eastus2"
DB_PORT=5432 # For PostgreSQL
SUPERMETAL_METADATA_ENDPOINT="Provided-By-Supermetal" # Specific endpoint for Supermetal metadata service
# Create the NSG for Supermetal agents
az network nsg create \
--resource-group $RESGROUP \
--name $NSG_NAME \
--location $LOCATION
# Add outbound rule for metadata private endpoint access
az network nsg rule create \
--resource-group $RESGROUP \
--nsg-name $NSG_NAME \
--name "AllowMetadataAccess" \
--priority 100 \
--direction "Outbound" \
--access "Allow" \
--protocol "Tcp" \
--destination-port-ranges 443 \
--destination-address-prefixes $SUPERMETAL_METADATA_ENDPOINT \
--description "Allow access to Supermetal metadata service"
# Add outbound rule for database access
az network nsg rule create \
--resource-group $RESGROUP \
--nsg-name $NSG_NAME \
--name "AllowDatabaseAccess" \
--priority 110 \
--direction "Outbound" \
--access "Allow" \
--protocol "Tcp" \
--destination-port-ranges $DB_PORT \
--destination-address-prefixes "YOUR_DATABASE_SUBNET_PREFIX" \
--description "Allow access to database"
# Add inbound rule to database NSG to allow access from Supermetal agent
az network nsg rule create \
--resource-group $RESGROUP \
--nsg-name $DB_NSG_NAME \
--name "AllowSupermetalAgentAccess" \
--priority 100 \
--direction "Inbound" \
--access "Allow" \
--protocol "Tcp" \
--source-address-prefixes "YOUR_AGENT_SUBNET_PREFIX" \
--destination-port-ranges $DB_PORT \
--description "Allow Supermetal agent access to database"
# Associate the NSG with the agent subnet
az network vnet subnet update \
--resource-group $RESGROUP \
--vnet-name $VNET_NAME \
--name $AGENT_SUBNET_NAME \
--network-security-group $NSG_NAME
```
* `RESGROUP`: Your Azure resource group name where the NSGs will be created
* `CONNECTOR_NAME`: A unique identifier for this connector (important if you're deploying multiple connectors)
* `VNET_NAME`: The name of your Azure Virtual Network
* `AGENT_SUBNET_NAME`: The name of the subnet where Supermetal agents will be deployed
* `YOUR_DATABASE_SUBNET_PREFIX`: CIDR range of your database subnet (e.g., "10.0.2.0/24")
* `YOUR_AGENT_SUBNET_PREFIX`: CIDR range of the subnet where Supermetal agents will be deployed
* `DB_PORT`: The port number your database listens on (e.g., 5432 for PostgreSQL)
* `SUPERMETAL_METADATA_ENDPOINT`: Specific endpoint for the Supermetal metadata service (provided by Supermetal)
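After running the script, you can confirm the rules were created as intended, reusing the variables set above:
```bash
# Review the outbound rules on the agent NSG
az network nsg rule list \
  --resource-group $RESGROUP \
  --nsg-name $NSG_NAME \
  --output table

# Confirm the inbound rule landed on the database NSG
az network nsg rule show \
  --resource-group $RESGROUP \
  --nsg-name $DB_NSG_NAME \
  --name "AllowSupermetalAgentAccess"
```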
#### Create Network Security Group and Configure Database Access \[!toc]
1. Navigate to the **Azure Portal** > **Network Security Groups**
2. Click **Create network security group**
3. Configure the basic details:
* **Resource group:** Select your resource group
* **Name:** "supermetal-`${connector-name}`-nsg" (e.g., "supermetal-connector1-nsg")
* **Region:** Select the same region as your Azure resources
4. Click **Review + create** and then **Create**
#### Configure Outbound Rules \[!toc]
5. Once created, navigate to the new NSG
6. Select **Outbound security rules**
7. Click **Add** to create an outbound rule for metadata access:
* **Destination:** IP Addresses
* **Destination IP addresses:** Supermetal-provided metadata endpoint address
* **Service:** HTTPS
* **Port ranges:** 443
* **Priority:** 100
* **Name:** "AllowMetadataAccess"
8. Add another outbound rule for database access:
* **Destination:** IP Addresses
* **Destination IP addresses/CIDR ranges:** Your database subnet CIDR
* **Service:** Custom
* **Protocol:** TCP
* **Port ranges:** Your database port (e.g., 5432)
* **Priority:** 110
* **Name:** "AllowDatabaseAccess"
#### Configure Database NSG Inbound Rule \[!toc]
9. Navigate to your database's NSG
10. Select **Inbound security rules**
11. Click **Add** to create a new rule:
* **Source:** IP Addresses
* **Source IP addresses/CIDR ranges:** Your agent subnet CIDR
* **Service:** Custom
* **Protocol:** TCP
* **Port ranges:** Your database port (e.g., 5432)
* **Priority:** 100
* **Name:** "AllowSupermetal`${ConnectorName}`Access" (e.g., "AllowSupermetalConnector1Access")
12. Save the rule
#### Associate NSG with Subnet \[!toc]
13. Navigate to **Virtual Networks** > your VNet > **Subnets**
14. Click on the subnet where Supermetal agents will be deployed
15. Under **Network security group**, select the NSG you created earlier
16. Click **Save**
#### Create Network Security Group and Configure Database Access \[!toc]
```terraform
# Variables for connector-specific resources
variable "connector_name" {
description = "Unique identifier for this connector (used in resource naming)"
type = string
default = "connector1"
}
variable "vnet_name" {
description = "Name of your virtual network"
type = string
}
variable "agent_subnet_name" {
description = "Name of the subnet where Supermetal agents will be deployed"
type = string
}
variable "supermetal_metadata_endpoint" {
description = "Specific endpoint for Supermetal metadata service (provided by Supermetal)"
type = string
}
# Create the agent network security group with unique name for this connector
resource "azurerm_network_security_group" "supermetal_agent_nsg" {
name = "supermetal-${var.connector_name}-nsg"
location = var.location
resource_group_name = var.resource_group_name
# Outbound rule for metadata private endpoint access
security_rule {
name = "AllowMetadataAccess"
priority = 100
direction = "Outbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_range = "443"
source_address_prefix = "*"
destination_address_prefix = var.supermetal_metadata_endpoint
description = "Allow access to Supermetal metadata service"
}
# Outbound rule for database access
security_rule {
name = "AllowDatabaseAccess"
priority = 110
direction = "Outbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_range = var.database_port
source_address_prefix = "*"
destination_address_prefix = var.database_subnet_cidr
description = "Allow access to database"
}
}
# Look up existing database NSG
data "azurerm_network_security_group" "database_nsg" {
name = var.database_nsg_name
resource_group_name = var.database_resource_group_name
}
# Add inbound rule to database NSG
resource "azurerm_network_security_rule" "allow_supermetal_agent" {
name = "AllowSupermetal${title(var.connector_name)}Access"
priority = 100
direction = "Inbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_range = var.database_port
source_address_prefix = var.agent_subnet_cidr
destination_address_prefix = "*"
resource_group_name = var.database_resource_group_name
network_security_group_name = var.database_nsg_name
}
# Associate the NSG with the agent subnet
data "azurerm_virtual_network" "vnet" {
name = var.vnet_name
resource_group_name = var.resource_group_name
}
data "azurerm_subnet" "agent_subnet" {
name = var.agent_subnet_name
virtual_network_name = var.vnet_name
resource_group_name = var.resource_group_name
}
resource "azurerm_subnet_network_security_group_association" "nsg_association" {
subnet_id = data.azurerm_subnet.agent_subnet.id
network_security_group_id = azurerm_network_security_group.supermetal_agent_nsg.id
}
# Output the NSG ID for reference
output "supermetal_agent_nsg_id" {
value = azurerm_network_security_group.supermetal_agent_nsg.id
}
```
* `connector_name`: Unique identifier for this connector (important if you're deploying multiple connectors)
* `resource_group_name`: Your Azure resource group name
* `location`: Azure region for deployment
* `vnet_name`: The name of your Azure Virtual Network
* `agent_subnet_name`: The name of the subnet where Supermetal agents will be deployed
* `database_subnet_cidr`: CIDR range of your database subnet (e.g., "10.0.2.0/24")
* `agent_subnet_cidr`: CIDR range of the subnet where Supermetal agents will be deployed
* `database_nsg_name`: The name of your existing database NSG
* `database_resource_group_name`: The resource group containing your database NSG
* `database_port`: The port number your database listens on (e.g., 5432 for PostgreSQL)
* `supermetal_metadata_endpoint`: Specific endpoint for Supermetal metadata service (provided by Supermetal)
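As a sketch of supplying these variables on the command line instead of a tfvars file (all values are placeholders):
```bash
terraform plan \
  -var 'connector_name=connector1' \
  -var 'resource_group_name=supermetal-byoc-rg' \
  -var 'location=eastus2' \
  -var 'vnet_name=production-vnet' \
  -var 'agent_subnet_name=data-subnet' \
  -var 'database_subnet_cidr=10.0.2.0/24' \
  -var 'agent_subnet_cidr=10.0.1.0/24' \
  -var 'database_nsg_name=your-db-nsg-name' \
  -var 'database_resource_group_name=database-rg' \
  -var 'database_port=5432' \
  -var 'supermetal_metadata_endpoint=<provided-by-supermetal>'
```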
#### Validate source database \[!toc]
Follow the setup instructions in the Supermetal Console to input the created NSG ID for the source database and validate the network reachability and connection credentials.
#### Repeat for target database \[!toc]
Repeat the same steps for the target database.
#### Finalize \[!toc]
Once the source and target database connections are validated, you can finalize the connector setup which will launch the Supermetal agents in your Azure VNet.
***
## Appendix
### Environment Bootstrap Scripts
Supermetal provides Terraform scripts for bootstrapping your Azure environment.
The scripts create all necessary resources and permissions for Supermetal to manage the data replication agents in your Azure subscription.
```terraform
terraform {
required_version = ">= 1.0"
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "~> 3.0"
}
azuread = {
source = "hashicorp/azuread"
version = "~> 2.0"
}
}
}
provider "azurerm" {
features {
key_vault {
purge_soft_delete_on_destroy = false
recover_soft_deleted_key_vaults = true
}
}
}
provider "azuread" {}
# =============================================================================
# Data Sources
# =============================================================================
data "azurerm_client_config" "current" {}
data "azurerm_subscription" "current" {}
# =============================================================================
# Input Variables
# =============================================================================
variable "resource_group_name" {
description = "Name of the Resource Group where Supermetal will deploy resources"
type = string
validation {
condition = can(regex("^[a-zA-Z0-9-_]+$", var.resource_group_name))
error_message = "Resource group name must contain only alphanumeric characters, hyphens, and underscores."
}
}
variable "location" {
description = "Azure region for deployment (e.g., eastus, westeurope)"
type = string
}
variable "vnet_id" {
description = "Resource ID of your existing Virtual Network (e.g., /subscriptions/.../resourceGroups/.../providers/Microsoft.Network/virtualNetworks/...)"
type = string
}
variable "private_subnet_ids" {
description = "List of private subnet Resource IDs where agents and private endpoints will be deployed"
type = list(string)
validation {
condition = length(var.private_subnet_ids) > 0
error_message = "At least one private subnet must be provided."
}
}
variable "credentials_key_vault_id" {
description = "Resource ID of existing Key Vault for storing credentials. If empty, a new Key Vault will be created"
type = string
default = ""
}
variable "external_id" {
description = "Unique identifier provided by Supermetal Console for securing cross-tenant access"
type = string
sensitive = true
}
variable "supermetal_tenant_id" {
description = "Azure AD Tenant ID of Supermetal Control Plane"
type = string
}
variable "supermetal_app_id" {
description = "Azure AD Application ID of Supermetal Control Plane"
type = string
}
variable "supermetal_principal_id" {
description = "Object ID of Supermetal's Service Principal"
type = string
}
variable "supermetal_metadata_service_id" {
description = "Resource ID of Supermetal Metadata Service's Private Link Service"
type = string
}
variable "supermetal_container_registry_name" {
description = "Name of Supermetal's container registry for agent images"
type = string
}
# =============================================================================
# Local Values
# =============================================================================
locals {
# Parse VNet information
vnet_parts = split("/", var.vnet_id)
vnet_name = element(local.vnet_parts, length(local.vnet_parts) - 1)
vnet_resource_group = element(local.vnet_parts, index(local.vnet_parts, "resourceGroups") + 1)
# Subnet selection
primary_subnet_id = var.private_subnet_ids[0]
# Key Vault configuration
create_key_vault = var.credentials_key_vault_id == ""
key_vault_id = local.create_key_vault ? azurerm_key_vault.credentials[0].id : var.credentials_key_vault_id
key_vault_name = local.create_key_vault ? "supermetal-kv-${substr(md5(var.resource_group_name), 0, 8)}" : ""
existing_kv_rg = local.create_key_vault ? "" : split("/", var.credentials_key_vault_id)[4]
existing_kv_name = local.create_key_vault ? "" : split("/", var.credentials_key_vault_id)[8]
# Common tags
common_tags = {
"supermetal:byoc:external-id" = var.external_id
"supermetal:byoc:resource-group" = var.resource_group_name
"supermetal:byoc:stack-name" = "supermetal-bootstrap-${var.resource_group_name}"
"supermetal:managed" = "true"
"Environment" = "Production"
}
}
# =============================================================================
# Resource Group Data Source
# =============================================================================
data "azurerm_resource_group" "target" {
name = var.resource_group_name
}
# =============================================================================
# Networking Data Sources
# =============================================================================
data "azurerm_virtual_network" "vnet" {
name = local.vnet_name
resource_group_name = local.vnet_resource_group
}
data "azurerm_subnet" "primary" {
name = element(split("/", local.primary_subnet_id), length(split("/", local.primary_subnet_id)) - 1)
virtual_network_name = local.vnet_name
resource_group_name = local.vnet_resource_group
}
# =============================================================================
# Custom Role Definition for Supermetal Control Plane
# =============================================================================
resource "azurerm_role_definition" "supermetal_control_plane" {
name = "SupermetalControlPlaneRole-${var.resource_group_name}"
scope = data.azurerm_subscription.current.id
description = "Allows Supermetal Control Plane to manage resources in customer subscription"
permissions {
actions = [
# ===== Virtual Machine Scale Sets =====
"Microsoft.Compute/virtualMachineScaleSets/read",
"Microsoft.Compute/virtualMachineScaleSets/write",
"Microsoft.Compute/virtualMachineScaleSets/delete",
"Microsoft.Compute/virtualMachineScaleSets/start/action",
"Microsoft.Compute/virtualMachineScaleSets/powerOff/action",
"Microsoft.Compute/virtualMachineScaleSets/restart/action",
"Microsoft.Compute/virtualMachineScaleSets/deallocate/action",
"Microsoft.Compute/virtualMachineScaleSets/manualUpgrade/action",
"Microsoft.Compute/virtualMachineScaleSets/scale/action",
"Microsoft.Compute/virtualMachineScaleSets/instanceView/read",
"Microsoft.Compute/virtualMachineScaleSets/skus/read",
# ===== VMSS Extensions (for Docker/Agent setup) =====
"Microsoft.Compute/virtualMachineScaleSets/extensions/read",
"Microsoft.Compute/virtualMachineScaleSets/extensions/write",
"Microsoft.Compute/virtualMachineScaleSets/extensions/delete",
# ===== Virtual Machines (VMSS instances) =====
"Microsoft.Compute/virtualMachines/read",
"Microsoft.Compute/virtualMachines/instanceView/read",
# ===== Managed Disks =====
"Microsoft.Compute/disks/read",
"Microsoft.Compute/disks/write",
"Microsoft.Compute/disks/delete",
# ===== Container Instances (for validation) =====
"Microsoft.ContainerInstance/containerGroups/read",
"Microsoft.ContainerInstance/containerGroups/write",
"Microsoft.ContainerInstance/containerGroups/delete",
"Microsoft.ContainerInstance/containerGroups/restart/action",
"Microsoft.ContainerInstance/containerGroups/stop/action",
"Microsoft.ContainerInstance/containerGroups/start/action",
"Microsoft.ContainerInstance/containerGroups/containers/logs/read",
"Microsoft.ContainerInstance/containerGroups/containers/exec/action",
# ===== Container Registry Access =====
"Microsoft.ContainerRegistry/registries/pull/read",
"Microsoft.ContainerRegistry/registries/artifacts/read",
"Microsoft.ContainerRegistry/registries/metadata/read",
# ===== Storage Accounts =====
"Microsoft.Storage/storageAccounts/read",
"Microsoft.Storage/storageAccounts/write",
"Microsoft.Storage/storageAccounts/delete",
"Microsoft.Storage/storageAccounts/listkeys/action",
"Microsoft.Storage/storageAccounts/listAccountSas/action",
"Microsoft.Storage/storageAccounts/blobServices/containers/read",
"Microsoft.Storage/storageAccounts/blobServices/containers/write",
"Microsoft.Storage/storageAccounts/blobServices/containers/delete",
# ===== Networking =====
"Microsoft.Network/virtualNetworks/subnets/join/action"
"Microsoft.Network/networkInterfaces/read"
"Microsoft.Network/networkSecurityGroups/read"
"Microsoft.Network/networkSecurityGroups/join/action"
# ===== Key Vault Management =====
"Microsoft.KeyVault/vaults/read",
# ===== Database Discovery (Read-only) - Scoped to resource group =====
"Microsoft.Sql/servers/read",
"Microsoft.Sql/servers/databases/read",
"Microsoft.DBforPostgreSQL/servers/read",
"Microsoft.DBforPostgreSQL/flexibleServers/read",
"Microsoft.DBforMySQL/servers/read",
"Microsoft.DBforMySQL/flexibleServers/read",
"Microsoft.DBforMariaDB/servers/read"
]
not_actions = []
data_actions = [
# ===== Key Vault Data Operations (scoped to supermetal-* items) =====
"Microsoft.KeyVault/vaults/secrets/getSecret/action",
"Microsoft.KeyVault/vaults/secrets/setSecret/action",
"Microsoft.KeyVault/vaults/keys/encrypt/action",
"Microsoft.KeyVault/vaults/keys/decrypt/action",
# ===== Storage Data Operations =====
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write",
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete",
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action"
]
not_data_actions = []
}
assignable_scopes = [
data.azurerm_resource_group.target.id
]
}
# =============================================================================
# Role Assignment for Supermetal Control Plane
# =============================================================================
resource "azurerm_role_assignment" "supermetal_control_plane" {
scope = data.azurerm_resource_group.target.id
role_definition_id = azurerm_role_definition.supermetal_control_plane.role_definition_resource_id
principal_id = var.supermetal_principal_id
principal_type = "ServicePrincipal"
# Condition to validate this is the correct service principal
condition = <<-EOT
(
@Request[Microsoft.Authorization/roleAssignments:principalId] == '${var.supermetal_principal_id}'
)
AND
(
@Request[Microsoft.Authorization/roleAssignments:principalType] == 'ServicePrincipal'
)
EOT
condition_version = "2.0"
description = "Supermetal Control Plane access with External ID: ${var.external_id}"
}
# =============================================================================
# User-Assigned Managed Identities
# =============================================================================
# Identity for Supermetal Agents (VMSS)
resource "azurerm_user_assigned_identity" "agent" {
name = "supermetal-agent-identity-${var.resource_group_name}"
location = var.location
resource_group_name = var.resource_group_name
tags = local.common_tags
}
# Identity for Validation Containers (ACI)
resource "azurerm_user_assigned_identity" "validation" {
name = "supermetal-validation-identity-${var.resource_group_name}"
location = var.location
resource_group_name = var.resource_group_name
tags = local.common_tags
}
# =============================================================================
# Key Vault for Credentials
# =============================================================================
resource "azurerm_key_vault" "credentials" {
count = local.create_key_vault ? 1 : 0
name = local.key_vault_name
location = var.location
resource_group_name = var.resource_group_name
tenant_id = data.azurerm_client_config.current.tenant_id
sku_name = "standard"
soft_delete_retention_days = 90
purge_protection_enabled = true
enable_rbac_authorization = false
network_acls {
default_action = "Deny"
bypass = "AzureServices"
ip_rules = []
virtual_network_subnet_ids = var.private_subnet_ids
}
tags = local.common_tags
}
# =============================================================================
# Key Vault Access Policies
# =============================================================================
# Access for Supermetal Control Plane (encrypt only)
resource "azurerm_key_vault_access_policy" "control_plane" {
key_vault_id = local.key_vault_id
tenant_id = var.supermetal_tenant_id
object_id = var.supermetal_principal_id
key_permissions = [
"Get",
"List",
"Encrypt"
]
secret_permissions = [
"Set",
"List"
]
}
# Access for Agent Identity (decrypt)
resource "azurerm_key_vault_access_policy" "agent" {
key_vault_id = local.key_vault_id
tenant_id = data.azurerm_client_config.current.tenant_id
object_id = azurerm_user_assigned_identity.agent.principal_id
key_permissions = [
"Get",
"Decrypt"
]
secret_permissions = [
"Get"
]
}
# Access for Validation Identity
resource "azurerm_key_vault_access_policy" "validation" {
key_vault_id = local.key_vault_id
tenant_id = data.azurerm_client_config.current.tenant_id
object_id = azurerm_user_assigned_identity.validation.principal_id
key_permissions = [
"Get",
"Decrypt"
]
secret_permissions = [
"Get"
]
}
# =============================================================================
# Network Security Groups
# =============================================================================
# NSG for Agent Metadata Access
resource "azurerm_network_security_group" "agent_metadata" {
name = "supermetal-agent-metadata-nsg-${var.resource_group_name}"
location = var.location
resource_group_name = var.resource_group_name
security_rule {
name = "AllowHttpsToMetadataEndpoint"
priority = 100
direction = "Outbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_range = "443"
source_address_prefix = "VirtualNetwork"
destination_address_prefix = "PrivateEndpoint"
description = "Allow HTTPS traffic to Supermetal metadata private endpoint"
}
security_rule {
name = "AllowAzureStorage"
priority = 110
direction = "Outbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_range = "443"
source_address_prefix = "VirtualNetwork"
destination_address_prefix = "Storage.${var.location}"
description = "Allow access to Azure Storage for buffer operations"
}
security_rule {
name = "AllowAzureKeyVault"
priority = 120
direction = "Outbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_range = "443"
source_address_prefix = "VirtualNetwork"
destination_address_prefix = "AzureKeyVault.${var.location}"
description = "Allow access to Azure Key Vault for credential retrieval"
}
tags = local.common_tags
}
# =============================================================================
# Private Endpoint for Metadata Service
# =============================================================================
resource "azurerm_private_endpoint" "metadata" {
name = "supermetal-metadata-endpoint-${var.resource_group_name}"
location = var.location
resource_group_name = var.resource_group_name
subnet_id = local.primary_subnet_id
private_service_connection {
name = "supermetal-metadata-connection"
private_connection_resource_id = var.supermetal_metadata_service_id
is_manual_connection = false
subresource_names = ["metadata"]
}
tags = local.common_tags
}
# =============================================================================
# Container Registry Access Role Assignments
# =============================================================================
# Role assignment for agent identity to pull images
resource "azurerm_role_assignment" "agent_acr_pull" {
scope = "/subscriptions/${var.supermetal_tenant_id}/resourceGroups/supermetal-control-plane/providers/Microsoft.ContainerRegistry/registries/${var.supermetal_container_registry_name}"
role_definition_name = "AcrPull"
principal_id = azurerm_user_assigned_identity.agent.principal_id
principal_type = "ServicePrincipal"
}
# Role assignment for validation identity to pull images
resource "azurerm_role_assignment" "validation_acr_pull" {
scope = "/subscriptions/${var.supermetal_tenant_id}/resourceGroups/supermetal-control-plane/providers/Microsoft.ContainerRegistry/registries/${var.supermetal_container_registry_name}"
role_definition_name = "AcrPull"
principal_id = azurerm_user_assigned_identity.validation.principal_id
principal_type = "ServicePrincipal"
}
# =============================================================================
# Outputs
# =============================================================================
output "control_plane_role_definition_id" {
description = "ID of the custom role definition for Supermetal Control Plane"
value = azurerm_role_definition.supermetal_control_plane.role_definition_resource_id
}
output "control_plane_role_assignment_id" {
description = "ID of the role assignment for Supermetal Control Plane"
value = azurerm_role_assignment.supermetal_control_plane.id
}
output "agent_identity_resource_id" {
description = "Resource ID of the Managed Identity for Supermetal agents"
value = azurerm_user_assigned_identity.agent.id
}
output "agent_identity_client_id" {
description = "Client ID of the Managed Identity for Supermetal agents"
value = azurerm_user_assigned_identity.agent.client_id
}
output "agent_identity_principal_id" {
description = "Principal ID of the Managed Identity for Supermetal agents"
value = azurerm_user_assigned_identity.agent.principal_id
}
output "validation_identity_resource_id" {
description = "Resource ID of the Managed Identity for validation containers"
value = azurerm_user_assigned_identity.validation.id
}
output "validation_identity_client_id" {
description = "Client ID of the Managed Identity for validation containers"
value = azurerm_user_assigned_identity.validation.client_id
}
output "key_vault_id" {
description = "Resource ID of the Key Vault for credentials"
value = local.key_vault_id
}
output "key_vault_uri" {
description = "URI of the Key Vault for credentials"
value = local.create_key_vault ? azurerm_key_vault.credentials[0].vault_uri : "https://${local.existing_kv_name}.vault.azure.net/"
}
output "metadata_endpoint_id" {
description = "Resource ID of the Private Endpoint for Supermetal Metadata Service"
value = azurerm_private_endpoint.metadata.id
}
output "metadata_endpoint_ip" {
description = "Private IP address of the Metadata Service endpoint"
value = azurerm_private_endpoint.metadata.private_service_connection[0].private_ip_address
}
output "agent_metadata_nsg_id" {
description = "Resource ID of the NSG for agent metadata access"
value = azurerm_network_security_group.agent_metadata.id
}
output "bootstrap_timestamp" {
description = "Timestamp when bootstrap was completed"
value = timestamp()
}
```
***
### Azure Roles & Permissions
Bootstrap creates several Azure roles and managed identities with specific permissions necessary for Supermetal BYOC to function securely.
These roles ensure that the Supermetal Control Plane can manage resources in your Azure subscription on your behalf, and that the Supermetal agents have the necessary permissions to operate.
The principle of least privilege is applied to scope down permissions as much as possible.
Key roles created and their purposes:
#### Supermetal Control Plane Role
```hcl
resource "azurerm_role_definition" "supermetal_control_plane" {
name = "SupermetalControlPlaneRole-${var.resource_group_name}"
scope = data.azurerm_subscription.current.id
description = "Allows Supermetal Control Plane to manage resources in customer subscription"
permissions {
actions = [
# ===== Virtual Machine Scale Sets =====
"Microsoft.Compute/virtualMachineScaleSets/read",
"Microsoft.Compute/virtualMachineScaleSets/write",
"Microsoft.Compute/virtualMachineScaleSets/delete",
"Microsoft.Compute/virtualMachineScaleSets/start/action",
"Microsoft.Compute/virtualMachineScaleSets/powerOff/action",
"Microsoft.Compute/virtualMachineScaleSets/restart/action",
"Microsoft.Compute/virtualMachineScaleSets/deallocate/action",
"Microsoft.Compute/virtualMachineScaleSets/manualUpgrade/action",
"Microsoft.Compute/virtualMachineScaleSets/scale/action",
"Microsoft.Compute/virtualMachineScaleSets/instanceView/read",
"Microsoft.Compute/virtualMachineScaleSets/skus/read",
# ===== VMSS Extensions (for Docker/Agent setup) =====
"Microsoft.Compute/virtualMachineScaleSets/extensions/read",
"Microsoft.Compute/virtualMachineScaleSets/extensions/write",
"Microsoft.Compute/virtualMachineScaleSets/extensions/delete",
# ===== Virtual Machines (VMSS instances) =====
"Microsoft.Compute/virtualMachines/read",
"Microsoft.Compute/virtualMachines/instanceView/read",
# ===== Managed Disks =====
"Microsoft.Compute/disks/read",
"Microsoft.Compute/disks/write",
"Microsoft.Compute/disks/delete",
# ===== Container Instances (for validation) =====
"Microsoft.ContainerInstance/containerGroups/read",
"Microsoft.ContainerInstance/containerGroups/write",
"Microsoft.ContainerInstance/containerGroups/delete",
"Microsoft.ContainerInstance/containerGroups/restart/action",
"Microsoft.ContainerInstance/containerGroups/stop/action",
"Microsoft.ContainerInstance/containerGroups/start/action",
"Microsoft.ContainerInstance/containerGroups/containers/logs/read",
"Microsoft.ContainerInstance/containerGroups/containers/exec/action",
# ===== Container Registry Access =====
"Microsoft.ContainerRegistry/registries/pull/read",
"Microsoft.ContainerRegistry/registries/artifacts/read",
"Microsoft.ContainerRegistry/registries/metadata/read",
# ===== Storage Accounts =====
"Microsoft.Storage/storageAccounts/read",
"Microsoft.Storage/storageAccounts/write",
"Microsoft.Storage/storageAccounts/delete",
"Microsoft.Storage/storageAccounts/listkeys/action",
"Microsoft.Storage/storageAccounts/listAccountSas/action",
"Microsoft.Storage/storageAccounts/blobServices/containers/read",
"Microsoft.Storage/storageAccounts/blobServices/containers/write",
"Microsoft.Storage/storageAccounts/blobServices/containers/delete",
# ===== Networking =====
"Microsoft.Network/virtualNetworks/subnets/join/action"
"Microsoft.Network/networkInterfaces/read"
"Microsoft.Network/networkSecurityGroups/read"
"Microsoft.Network/networkSecurityGroups/join/action"
# ===== Key Vault Management =====
"Microsoft.KeyVault/vaults/read",
# ===== Database Discovery (Read-only) =====
"Microsoft.Sql/servers/read",
"Microsoft.Sql/servers/databases/read",
"Microsoft.DBforPostgreSQL/servers/read",
"Microsoft.DBforPostgreSQL/flexibleServers/read",
"Microsoft.DBforMySQL/servers/read",
"Microsoft.DBforMySQL/flexibleServers/read",
"Microsoft.DBforMariaDB/servers/read"
]
not_actions = []
data_actions = [
# ===== Key Vault Data Operations (scoped to supermetal-* items) =====
"Microsoft.KeyVault/vaults/secrets/getSecret/action",
"Microsoft.KeyVault/vaults/secrets/setSecret/action",
"Microsoft.KeyVault/vaults/keys/encrypt/action",
"Microsoft.KeyVault/vaults/keys/decrypt/action",
# ===== Storage Data Operations =====
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write",
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete",
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action"
]
not_data_actions = []
}
assignable_scopes = [
data.azurerm_resource_group.target.id
]
}
```
This role is assigned to the Supermetal Service Principal. It enables the Supermetal Control Plane to orchestrate and manage the lifecycle of resources required for the Supermetal data plane within your Azure subscription.
* **Virtual Machine Scale Sets Management**
* Create, read, update, and delete VMSS resources (`Microsoft.Compute/virtualMachineScaleSets/*`)
* Control instance lifecycle (start, stop, restart, scale) for agent deployment
* Manage VMSS extensions for Docker/agent setup
* Read-only access to VM instances (`Microsoft.Compute/virtualMachines/read`)
* **Container Instances and Registry**
* Full management of Container Instances for validation tasks (`Microsoft.ContainerInstance/containerGroups/*`)
* Pull access to container registry for agent images (`Microsoft.ContainerRegistry/registries/pull/read`)
* Used for deploying validation containers
* **Storage and Data Management**
* Storage Account management (`Microsoft.Storage/storageAccounts/*`)
* Used for data buffering and staging per connector
* **Networking**
* Read VNet and subnet information (`Microsoft.Network/virtualNetworks/*/read`)
* Join subnets for resource deployment (`Microsoft.Network/virtualNetworks/subnets/join/action`)
* Read and join network security groups
* **Security and Access Control**
* Key Vault read access (`Microsoft.KeyVault/vaults/read`)
* Key Vault data operations (get/set secrets, encrypt/decrypt with keys)
* Restricted to resource group scope specified during bootstrap
* **Database Discovery**
* Read-only access to database resources (`Microsoft.Sql/servers/read`, `Microsoft.DBforPostgreSQL/*/read`, etc.)
* Limited to discovery capabilities for validating connections
***
#### Agent and Validation Identities
```hcl
# Identity for Supermetal Agents (VMSS)
resource "azurerm_user_assigned_identity" "agent" {
name = "supermetal-agent-identity-${var.resource_group_name}"
location = var.location
resource_group_name = var.resource_group_name
tags = local.common_tags
}
# Access for Agent Identity (decrypt)
resource "azurerm_key_vault_access_policy" "agent" {
key_vault_id = local.key_vault_id
tenant_id = data.azurerm_client_config.current.tenant_id
object_id = azurerm_user_assigned_identity.agent.principal_id
key_permissions = [
"Get",
"Decrypt"
]
secret_permissions = [
"Get"
]
}
# Role assignment for agent identity to pull images
resource "azurerm_role_assignment" "agent_acr_pull" {
scope = "/subscriptions/${var.supermetal_tenant_id}/resourceGroups/supermetal-control-plane/providers/Microsoft.ContainerRegistry/registries/${var.supermetal_container_registry_name}"
role_definition_name = "AcrPull"
principal_id = azurerm_user_assigned_identity.agent.principal_id
principal_type = "ServicePrincipal"
}
```
This User-Assigned Managed Identity is attached to the VMSS instances running Supermetal agents. It provides the following permissions:
* **Key Vault Access**
* Get and decrypt operations on keys (`Get`, `Decrypt`)
* Retrieve secrets from Key Vault (`Get`)
* Used to securely access connection credentials
* **Container Registry Access**
* AcrPull role on Supermetal's container registry
* Enables pulling agent container images from Supermetal's registry
```hcl
# Identity for Validation Containers (ACI)
resource "azurerm_user_assigned_identity" "validation" {
name = "supermetal-validation-identity-${var.resource_group_name}"
location = var.location
resource_group_name = var.resource_group_name
tags = local.common_tags
}
# Access for Validation Identity
resource "azurerm_key_vault_access_policy" "validation" {
key_vault_id = local.key_vault_id
tenant_id = data.azurerm_client_config.current.tenant_id
object_id = azurerm_user_assigned_identity.validation.principal_id
key_permissions = [
"Get",
"Decrypt"
]
secret_permissions = [
"Get"
]
}
# Role assignment for validation identity to pull images
resource "azurerm_role_assignment" "validation_acr_pull" {
scope = "/subscriptions/${var.supermetal_tenant_id}/resourceGroups/supermetal-control-plane/providers/Microsoft.ContainerRegistry/registries/${var.supermetal_container_registry_name}"
role_definition_name = "AcrPull"
principal_id = azurerm_user_assigned_identity.validation.principal_id
principal_type = "ServicePrincipal"
}
```
This User-Assigned Managed Identity is used by Azure Container Instances (ACI) that may be deployed for validation tasks such as testing database connectivity or network reachability.
* **Key Vault Access**
* Similar permissions to the Agent identity (Get, Decrypt)
* Retrieves necessary connection credentials for validation
* **Container Registry Access**
* AcrPull role on Supermetal's container registry
* Enables pulling validation container images
# BYOC (/docs/main/concepts/deployment/byoc)
import { Icons } from '../../../../../../app/icons';
import SupermetalHighLevelBYOCArchitectureDiagram from '../../../../../../app/diagrams/byoc';
The Bring Your Own Cloud (BYOC) / Hybrid model offers a balance between the ease of a managed service and the security and control of self-hosting. It separates Supermetal into two distinct components: a Control Plane managed by Supermetal, and a Data Plane that runs entirely within your cloud environment.
### Overview
* **Control Plane:** This is the centralized management and infrastructure orchestration layer hosted by Supermetal. It handles metadata, monitoring, and the entire connector lifecycle.
* **Data Plane:** This consists of Supermetal agents deployed within your Virtual Private Cloud (VPC) on your chosen cloud provider (AWS, Azure, GCP). The Data Plane handles all data processing and movement.
**Crucially, customer data never leaves your VPC or network boundary. The Control Plane only interacts with metadata.**
### Control Plane
The Supermetal-hosted Control Plane is responsible for:
* **Connector Lifecycle Management:**
* Storing connector definitions and configurations.
* Managing encrypted credentials securely.
* Maintaining connector state (e.g., WAL positions, LSNs for CDC).
* Orchestrating provisioning and de-provisioning workflows for agents in the Data Plane.
* Scaling, failover, and high availability of data plane agents.
* **Operational Interface:**
* Providing a centralized web console & REST APIs for creating, managing, and monitoring all connectors across your organization.
* **Analytics and Insights:**
* Aggregating logs, metrics, and performance analytics from all Data Plane agents.
* **Security and Governance:**
* Implementing SSO and Role-Based Access Control (RBAC) for managing users and permissions.
* Maintaining audit logs for all management operations.
### Data Plane
The Data Plane runs entirely within your cloud infrastructure (e.g., your AWS VPC, Azure VNet, or GCP VPC) and consists of:
* **Supermetal Agents:** These are the same lightweight, efficient agents used in the self-hosted model. They perform the actual data extraction from sources and loading to targets.
* **Network Security:** Operates entirely within your VPC's security boundaries, adhering to your network ACLs, security groups, and private networking configurations.
* **Data Confinement:** Customer data is processed and moved directly between sources and targets within the customer's VPC. It *never* traverses the public internet to the Control Plane or leaves the customer's designated network environment.
* **Metadata Exchange:** Only essential metadata (e.g., connector state, operational logs, performance metrics) is securely transmitted to the Control Plane using private endpoints for management and observability.
### Benefits of BYOC / Hybrid Deployment
This two-plane architecture offers significant advantages:
* **Data Sovereignty and Security:** Ensures that sensitive data remains within your secure cloud environment and network boundaries at all times, simplifying compliance with regulations like GDPR, HIPAA, and CCPA.
* **Reduced Egress Costs:** By processing data within your VPC, especially when sources and targets are in the same cloud region, you can significantly reduce or eliminate costly data egress charges.
* **Leverage Existing Infrastructure and Security:** Utilize your established cloud security posture, including IAM roles, security groups, VPC endpoints, private links, and monitoring tools.
* **Optimized Performance:** Data transfer occurs over your private network, potentially offering lower latency and higher bandwidth than solutions routing data externally.
* **Simplified Management:** The Supermetal-managed Control Plane offloads the operational burden of managing the orchestration, monitoring, and update mechanisms for the core replication logic, allowing your teams to focus on data integration tasks rather than infrastructure management.
* **Centralized Control with Distributed Execution:** Gain a unified view and control over all your data pipelines via the Control Plane, while the actual data processing happens securely and efficiently within your environment.
# Monitoring (/docs/main/concepts/deployment/monitoring)
Supermetal provides real-time monitoring of its replication processes, offering deep insights into health and performance. All metrics and logs are recorded using the [OpenTelemetry (OTEL)](https://opentelemetry.io/) vendor-neutral standard.
OTEL ensures that you can seamlessly integrate Supermetal's observability data with your preferred monitoring and logging tools, such as [Datadog](https://docs.datadoghq.com/opentelemetry/), [Splunk](https://docs.splunk.com/Observability/gdi/opentelemetry/opentelemetry.html), or any [other OTEL-compatible](https://opentelemetry.io/ecosystem/vendors/) platform. Key observability features include:
* [**Comprehensive Metrics:**](/docs/main/concepts/deployment/monitoring/metrics) Detailed metrics covering data throughput, latency, resource utilization, and operational events.
* **Structured Logging:** Rich, structured logs that provide context for events and errors, facilitating easier debugging and analysis.
## Export
You can configure the agent to send this telemetry data to your preferred OTLP-compatible backend. The method for configuring the OTLP exporter endpoint varies depending on your Supermetal deployment model.
### Local / Self-Hosted
When running the Supermetal agent locally, you can configure the OTLP exporter endpoint and other parameters using standard OpenTelemetry environment variables. Key variables include:
* `OTEL_EXPORTER_OTLP_ENDPOINT`: Specifies the target URL for the OTLP collector (e.g., `http://localhost:4317` for gRPC or `http://localhost:4318/v1/metrics` for HTTP/protobuf metrics).
* `OTEL_EXPORTER_OTLP_HEADERS`: Allows you to specify headers, such as authentication tokens (e.g., `Authorization=Bearer <token>`).
For detailed information on OTLP exporter configuration, refer to the official [OpenTelemetry SDK configuration documentation](https://opentelemetry.io/docs/languages/sdk-configuration/otlp-exporter/).
Set these environment variables in the environment where your Supermetal agent is running.
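For example, to export to a local OpenTelemetry Collector over gRPC (endpoint and token are placeholders):
```bash
# Point the agent's OTLP exporter at your collector
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"

# Optional: pass authentication headers for your backend
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer <token>"
```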
### BYOC / Hybrid
For deployments managed by the Supermetal Control Plane, including Bring Your Own Cloud (BYOC), the configuration of OTLP exporter endpoints for your agents is managed centrally through the Supermetal Console.
Navigate to settings within the Supermetal Console to specify the OTLP endpoint URL and any necessary authentication headers for your chosen monitoring backend. The Control Plane then propagates this configuration to the agents running in your environment, which export their telemetry to your backend.
# Metrics (/docs/main/concepts/deployment/monitoring/metrics)
## Connector Metrics
These metrics track data flow and performance through the ETL pipeline from source to target systems.
| Category | Metric Name | Description | Unit | Type |
| ----------------- | --------------------------------------- | ---------------------------------------------------------------------------------------------------- | ---- | --------------- |
| Data | `rows.in` | Number of rows read | 1 | Counter (u64) |
| | `rows.out` | Number of rows written | 1 | Counter (u64) |
| | `bytes.in` | Number of bytes read | By | Counter (u64) |
| | `bytes.out` | Number of bytes written | By | Counter (u64) |
| Latency           | `latency.total`                          | End-to-end latency: from source event timestamp to the event being processed & written to target      | ns   | Histogram (u64) |
| | `latency.in` | Latency from source event timestamp to when the event was received | ns | Histogram (u64) |
| | `latency.out` | Latency from when the event was received to when event written to target | ns | Histogram (u64) |
| | `latency.in.process` | Latency to process source event | ns | Histogram (u64) |
| | `latency.out.process` | Latency to process event before writing to target | ns | Histogram (u64) |
| Buffer | | | | |
| ↳ Data | `buffer.rows.in` | Number of rows read from the buffer | 1 | Counter (u64) |
| | `buffer.rows.out` | Number of rows written to the buffer | 1 | Counter (u64) |
| | `buffer.bytes.in` | Number of bytes read from the buffer | By | Counter (u64) |
| | `buffer.bytes.out` | Number of bytes written to the buffer | By | Counter (u64) |
| ↳ Latency | `buffer.latency.total` | Latency from source event timestamp to event being processed & written to buffer | ns | Histogram (u64) |
| | `buffer.latency.in` | Latency from source event timestamp to when the event was received by buffer | ns | Histogram (u64) |
| | `buffer.latency.out` | Latency from when the event was received to when event written to buffer | ns | Histogram (u64) |
|                   | `buffer.latency.in.process`              | Latency to process event received by the buffer                                                       | ns   | Histogram (u64) |
| | `buffer.latency.out.process` | Latency to process event before writing to buffer | ns | Histogram (u64) |
| ↳ Object Store | `buffer.object_store.files.force-flush` | Number of times buffer forced a flush (on no writes) | 1 | Counter (u64) |
| | `buffer.object_store.files.abort` | Failed file operations due to errors | 1 | Counter (u64) |
| | `buffer.object_store.files.create` | Successfully created new file in object storage | 1 | Counter (u64) |
| | `buffer.object_store.files.delete` | Files removed from object storage | 1 | Counter (u64) |
| | `buffer.object_store.files.get` | Files retrieved from object storage | 1 | Counter (u64) |
| Snapshot Progress | `progress.snapshot` | Current progress of initial snapshot load (0.0 to 100.0) | % | Gauge (f64) |
| | `buffer.progress.snapshot` | Current progress of buffer snapshot operations (0.0 to 100.0) | % | Gauge (f64) |
## Process Metrics
System-level metrics for the Supermetal process following [OpenTelemetry semantic conventions](https://opentelemetry.io/docs/specs/semconv/system/process-metrics/). These help monitor resource usage and system health.
| Category | Metric Name | Description | Unit | Type |
| ----------------- | ------------------------------------ | --------------------------------------------------------- | ---- | ------------------- |
| Process CPU | `process.cpu.time` | Total CPU time consumed by the process since startup | s | Counter (u64) |
| | `process.cpu.utilization` | Current CPU usage as ratio (0.0=idle, 1.0=fully utilized) | 1 | Gauge (f64) |
| Process Memory | `process.memory.usage` | Physical memory (RSS) currently used by the process | By | UpDownCounter (i64) |
| | `process.memory.virtual` | Virtual memory currently allocated by the process | By | UpDownCounter (i64) |
| Process Resources | `process.open_file_descriptor_count` | Currently open file descriptors (monitor for fd leaks) | 1 | UpDownCounter (i64) |
| | `process.thread_count` | Currently active threads (monitor for thread leaks) | 1 | UpDownCounter (i64) |
| | `process.disk.io` | Total disk I/O bytes with `direction` label (read/write) | By | Counter (u64) |
## Runtime Metrics
Tokio async runtime metrics providing insights into task scheduling, worker thread utilization, and async I/O performance.
| Category | Metric Name | Description | Unit | Type |
| ----------------------- | ------------------------------------- | ------------------------------------------------------------------------------ | ---- | ------------- |
| Runtime Tasks | `runtime.global_queue_depth` | Tasks waiting in global queue (high values indicate backpressure) | 1 | Gauge (u64) |
| | `runtime.alive_tasks` | Currently executing async tasks | 1 | Gauge (u64) |
| | `runtime.spawned_tasks` | Total async tasks created since startup | 1 | Counter (u64) |
| | `runtime.budget_forced_yield_count` | Tasks forced to yield due to execution time limits (indicates CPU-bound tasks) | 1 | Counter (u64) |
| Runtime Workers | `runtime.num_workers` | Total worker threads in the runtime thread pool | 1 | Gauge (u64) |
| | `runtime.blocking_queue_depth` | Tasks waiting for blocking thread pool | 1 | Gauge (u64) |
| | `runtime.num_blocking_threads` | Currently active blocking threads | 1 | Gauge (u64) |
| | `runtime.num_idle_blocking_threads` | Idle blocking threads available for work | 1 | Gauge (u64) |
| | `runtime.io_driver_ready_count` | I/O events processed by the runtime (network/file operations) | 1 | Counter (u64) |
| Worker Metrics (opt-in) | `runtime.worker_local_queue_depth` | Tasks in each worker's local queue (set `ENABLE_WORKER_METRICS=1`) | 1 | Gauge (u64) |
| | `runtime.worker_local_schedule_count` | Tasks scheduled locally per worker (indicates work distribution) | 1 | Counter (u64) |
| | `runtime.worker_mean_poll_time` | Average task execution time per worker | ns | Gauge (u64) |
| | `runtime.worker_noop_count` | Idle polls per worker (high values indicate low workload) | 1 | Counter (u64) |
| | `runtime.worker_overflow_count` | Local queue overflows per worker (tasks moved to global queue) | 1 | Counter (u64) |
| | `runtime.worker_park_count` | Times worker thread went idle waiting for tasks | 1 | Counter (u64) |
| | `runtime.worker_poll_count` | Total task polls per worker | 1 | Counter (u64) |
| | `runtime.worker_steal_count` | Tasks stolen from other workers (indicates load balancing) | 1 | Counter (u64) |
| | `runtime.worker_steal_operations` | Steal attempts per worker (successful + failed) | 1 | Counter (u64) |
| | `runtime.worker_total_busy_duration` | Cumulative execution time per worker | ns | Counter (u64) |
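Per-worker metrics are opt-in; enable them in the agent's environment, for example:
```bash
# Opt in to per-worker Tokio runtime metrics
export ENABLE_WORKER_METRICS=1
```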