# Create a new connector (/docs/api/createConnector)

Create a new connector with a specified ID.

# Debug connector configuration (/docs/api/debugConnector)

Validate and debug a connector configuration without storing it.

# Get connector by ID (/docs/api/getConnector)

Retrieve a specific connector by its ID with status information.

# Get connector history (/docs/api/getConnectorHistory)

Retrieve the configuration history for a specific connector.

# Overview (/docs/api)

# List all connectors (/docs/api/listConnectors)

Retrieve a list of all connectors with their status.

# Send command to connector (/docs/api/sendConnectorCommand)

Send a control command (start, stop, pause, resume) to a specific connector.

# Validate buffer configuration (/docs/api/validateBuffer)

Tests connection and permissions for a buffer configuration.

# Validate sink connector configuration (/docs/api/validateSink)

Tests connection, authentication, permissions, and configuration for the target.

# Validate source connector configuration (/docs/api/validateSource)

Tests connection, authentication, permissions, and configuration for the source.

# FAQ (/docs/main/faq)

import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';

***

## Connector

Supermetal's `flush_interval_ms` ([Advanced Settings](/docs/main/quickstart/#advanced-settings-optional)) controls how frequently Supermetal flushes buffered changes to the target database.

Defaults:

* `5,000ms` (5 seconds) for **transactional** targets
* `10,000ms` (10 seconds) for **warehouse** targets

| `flush_interval_ms` | Data Freshness | Impact |
| ------------------- | --------------- | ------ |
| **Lower values** | Fresher data | More target database compute operations (e.g., `MERGE`s)<br />More Buffer Object Storage write API calls (e.g., `PUT`s) |
| **Higher values** | Less fresh data | Fewer target database compute operations (e.g., `MERGE`s)<br />Fewer Buffer Object Storage write API calls (e.g., `PUT`s) |
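For example, to favor data freshness on a transactional target you could lower the interval to `1,000ms`. The snippet below is a minimal sketch of what this looks like in a connector's advanced settings; the surrounding JSON shape is illustrative, not a published schema:

```json
{
  "advanced_settings": {
    "flush_interval_ms": 1000
  }
}
```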
#### Overview \[!toc]

| Metric | **Transactional-to-Transactional¹** | **Transactional-to-Warehouse** |
| ----------------------- | ------------------------------------------------ | ------------------------------ |
| **Snapshot Throughput** | Up to `150` GB/hour<br />(`45,000+` rows/second) | Up to `300+` GB/hour<br />(`70,000+` rows/second) |
| **CDC Throughput** | Up to `35,000` operations/second | Up to `80,000` operations/second |
| **End-to-End Latency** | `P50`: Sub-second¹<br />`P95`: Under 2 seconds¹ | `P50`: `15-30` seconds<br />`P95`: Under `60` seconds |

#### End-to-End CDC Latency \[!toc]

*Measured at a sustained load of `20,000` operations/second*

| Percentile | **Transactional Targets¹** | **Warehouse Targets** |
| :--------- | :------------------------- | :--------------------- |
| P50 | `0.6` seconds | `20` seconds |
| P90 | `1.2` seconds | `35` seconds |
| P95 | `1.8` seconds | `45` seconds |
| P99 | `3.2` seconds | `60` seconds |

¹ *Sub-second latency requires tuning for data freshness. This is achieved with a non-default [`flush_interval_ms`](/docs/main/faq/#sync-frequency) setting of `1,000ms` (1 second).*

*Performance metrics are based on internal benchmarks replicating TPC-C-like workloads from transactional sources to various transactional and warehouse targets on comparable medium-sized cloud instances. Actual performance may vary based on data types, row size, network conditions, and source/target system load.*
***

## Source

[Catalog configuration](/docs/main/quickstart/#configure-catalog-optional) controls table and column filtering during replication.

**Filtering behavior**:

* **Default**: All discoverable tables and columns are replicated if no selections are made.
* **Selection**: When specific tables or columns are selected, include-list filtering is applied.

The filtering system operates at both the table and column levels, allowing granular control over the replication scope. Any database object not explicitly selected (when selections are made) is excluded from replication.

Note: Primary keys are always force-included and cannot be filtered out.

***

## Target

Supermetal injects the following system metadata columns into target tables:

| Column | Data Type | Function |
| ------------- | --------------------------- | -------- |
| `_sm_deleted` | `BOOLEAN` | Tombstone marker for logical deletes. `TRUE` indicates the row was deleted in the source system but preserved in the target for historical tracking. |
| `_sm_version` | `TIMESTAMP(ns)` or `BIGINT` | Monotonically increasing version identifier derived from the source transaction commit timestamp. Used for deduplication. |

Target configuration determines the destination structure for replicated tables:

| Parameter | Function |
| ----------------- | -------- |
| `target_database` | Specifies the destination database. |
| `target_schema` | Controls the schema and table creation strategy:<br />• When specified: all tables are written to this single schema.<br />• When unspecified: preserves the source schema structure, creating schemas as needed if the target supports schema creation. |

Supermetal automatically creates any non-existent databases or schemas required for replication.
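Because deletes can be tombstoned rather than physically removed on targets that use soft deletes, downstream consumers typically filter them out at query time. An illustrative query (the `orders` table name is hypothetical; `_sm_deleted` is the metadata column described above):

```sql
-- Read only rows that still exist in the source system,
-- skipping tombstoned (logically deleted) rows.
SELECT *
FROM orders
WHERE _sm_deleted = FALSE;
```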
Schema migrations are controlled by the `migration_strategy` parameter, which takes a list of allowed schema migration operations:

| Migration Operation | Function |
| ------------------- | ------------------------------------------- |
| `Add` | Allows adding new columns to tables. |
| `Drop` | Allows dropping columns from tables. |
| `Datatype` | Allows changing the data type of columns. |
| `Nullability` | Allows changing the nullability of columns. |
| `Order` | Allows changing the order of columns. |

All migration operations are **disabled by default**.
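Putting the target parameters together, a configuration that writes all tables to a single schema and permits only additive column and data type changes could look like the sketch below. The field names follow the parameters documented above; the exact configuration-file shape is illustrative:

```json
{
  "target_database": "analytics",
  "target_schema": "supermetal",
  "migration_strategy": ["Add", "Datatype"]
}
```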
*** ## Buffer Supermetal supports the following buffer types: | Buffer Type | Function | | ------------- | ----------------------------------------------------------------------- | | Object Stores | Buffers data in object stores (e.g., S3, GCS, Azure Blob) | | Local (Disk) | Buffers data locally on disk. Perfect for low latency (NVMe) use cases. | Supermetal's buffer determines the storage file format (Arrow, ArrowStream, Parquet, CSV, JSON) depending on the target database's ingestion capabilities, defaulting to Parquet. S3 Express One Zone provides single-AZ, low-latency object storage with reduced API latencies and higher throughput capabilities. When configured, Supermetal utilizes S3 Express for buffer file storage. # Supermetal (/docs/main) Supermetal is a change data capture (CDC) platform that synchronizes data between databases, data warehouses, and lakehouses while preserving transactional consistency. It handles both low latency streaming and large scale batch data movement with exceptional compute efficiency. Supermetal deploys as a single binary with no external dependencies, making it simple to operate across on-premise, hybrid, and multi-cloud environments. ## Key Features * **High-Performance Architecture**: Leverages Rust's bare-metal performance and Apache Arrow's columnar format to achieve high throughput and low latency * **Operational Simplicity**: Ships as a single, self-contained binary (agent) with no external dependencies, making deployment and management straightforward * **Resource Efficiency**: Process terabytes of data on modest hardware (e.g., 2 vCPU, 8GB RAM EC2 instance) * **Rich Data Type Support**: Apache Arrow's expressive type system ensures accurate data representation between diverse source and target systems * **Reliable Replication**: Built-in state management and checkpointing mechanisms guarantee at-least-once data delivery and graceful recovery ## Use Cases Supermetal excels in scenarios requiring: * **Fraud detection**: Monitor transaction streams for anomalous patterns and policy violations * **Inventory management**: Synchronize stock levels across multiple systems and channels * **Customer 360 views**: Aggregate customer data from disparate systems for unified analysis * **Compliance reporting**: Maintain real-time audit logs for regulatory and governance requirements * **Event-driven microservices**: Enable immediate downstream processing when source data changes ## Next Steps * Learn about [Supermetal's Architecture](/docs/main/concepts/architecture) * Explore [Deployment Options](/docs/main/concepts/deployment) # Quickstart (/docs/main/quickstart) import { Step, Steps } from 'fumadocs-ui/components/steps'; import { ImageZoom } from 'fumadocs-ui/components/image-zoom'; Get your first data replication pipeline running with Supermetal in minutes. This guide walks you through creating a connector from source to target using the web interface. Before starting, ensure you have [installed and started the Supermetal agent](/docs/main/concepts/deployment/local) locally. The agent should be running and accessible at `http://localhost:3000`. ## Creating Your First Connector ### Intro Screen After accessing the Supermetal dashboard at `http://localhost:3000`, you will see the introduction screen. Click **"Create New Connection"** to begin setting up your first connector. ### Select Source Database Choose your source database type from the available options. 
### Configure Source Connection

Provide the connection details for your source database, including hostname, port, database name, and authentication credentials.

### Validate Source Connection

Supermetal will validate your source connection, running the necessary configuration, permission, and connectivity checks. This ensures the agent can successfully connect to and read from your source database.

### Configure Catalog (Optional)

Choose which tables and columns to replicate from your source database. By default, all discoverable schemas, tables, and columns are selected.

### Select Target Database

Choose the target database/data warehouse where the data will be replicated.

### Configure Target Connection

Provide the connection details for your target. The configuration fields will vary depending on your selected target type.

### Validate Target Connection

Similar to the source validation, Supermetal will verify the target connection and ensure the agent has the necessary permissions to write data to your target database.

### Configure Buffer and Advanced Settings (Optional)

Select a buffer configuration for temporary data storage during replication. To get started, you can choose **Local**, which requires an absolute file path, or configure cloud storage like S3/Azure Blob Storage for production workloads. Optionally, configure the flush interval for CDC.

### Validate and Create Connector

Select `Validate` to have Supermetal verify the required permissions for the configured buffer. Select `Create Connector` and start your sync!

### Connector Created

Congratulations! Your connector has been successfully created and is ready to start replicating data. The final screen shows your connector details and status.

### View Connector Configs (Optional)

Click on the wrench icon in the top-right corner to view your source, target, and replication settings.

## Monitoring Your Connector

Your connector will appear in the main dashboard where you can:

* Monitor replication status and performance metrics
* View real-time sync progress
* Pause, resume, or modify connector settings

## Next Steps

Now that you have a working connector, explore these topics to get the most out of Supermetal:

* [Understanding Supermetal's Architecture](/docs/main/concepts/architecture)
* [Buffer Configuration Options](/docs/main/buffer)
* [Production Deployment Strategies](/docs/main/concepts/deployment)

If you encounter any issues during setup, check our source and target database setup guides for detailed configuration instructions specific to your database types.

# Azure Blob Storage (/docs/main/buffer/abs)

Supermetal can stage your data in an Azure Blob Storage container before writing it into your target database. This guide walks you through configuring an Azure Blob Storage container as a Supermetal buffer. Azure Blob Storage Buffer is automatically managed by Supermetal Control Plane.

## Prerequisites

Before you begin, ensure you have an Azure account with permissions to create and manage:

* Storage accounts
* Blob containers
* Access policies

***

## Setup

### Create a Storage Account and Container

Sign in to the [Azure Portal](https://portal.azure.com/) and follow these steps:

* Go to **Storage accounts** in the Azure services section.
* Click **+ Create** to create a new storage account.
* Fill in the required details:
  * Select your Subscription and Resource Group.
  * Enter a unique Storage account name.
  * Choose a Region (select the same region as your Supermetal agent).
  * Select Performance tier and Redundancy options based on your needs.
* Click **Review + create** and then **Create** to deploy the storage account.
* Once created, navigate to the storage account.
* Go to **Containers** in the left menu.
* Click **+ Container** to create a new container.
* Enter a name for the container (e.g., "supermetal-buffer").
* Select an access level (recommended: Private).
* Click **Create**.

```bash
# Log in to Azure
az login

# Create a resource group (if needed)
az group create --name MyResourceGroup --location EastUS

# Create storage account
az storage account create \
  --name <storage-account-name> \
  --resource-group <resource-group> \
  --location <region> \
  --sku Standard_LRS

# Create container
az storage container create \
  --name supermetal-buffer \
  --account-name <storage-account-name>
```

Replace:

* `<resource-group>` with your Azure resource group name.
* `<storage-account-name>` with a globally unique storage account name.
* `<region>` with your Azure region (e.g., eastus, westeurope).
* `supermetal-buffer` with the name of the container you want to create.

The storage account region should match the region of your Supermetal agent to avoid cross-region networking costs and latency.

### Create a Shared Access Signature (SAS)

* Navigate to your storage account in the Azure Portal.
* Go to **Containers** in the left menu.
* Click on the container you created (e.g., "supermetal-buffer").
* Click on **Shared access tokens** at the top of the container page.
* Configure the SAS settings:
  * Select the required permissions (Read, Write, Delete, List).
  * Set the start and expiry time as needed (note that Azure requires an expiry date).
  * Select the allowed protocols.
* Click **Generate SAS token and URL**.
* Copy the generated **SAS token** or **Blob SAS URL** and save it securely.

```bash
# Generate container-specific SAS token
end=`date -u -d "1 year" '+%Y-%m-%dT%H:%MZ'`
az storage container generate-sas \
  --name supermetal-buffer \
  --account-name <storage-account-name> \
  --permissions rwdl \
  --expiry $end \
  --output tsv
```

Replace:

* `<storage-account-name>` with your storage account name.
* The expiry date sets the token to expire in 1 year. You can set a longer duration if needed, but Azure always requires an expiry date.
* `supermetal-buffer` with the name of the container you created.

You'll need the following connection details to configure the buffer in Supermetal:

* Storage account name
* Container name
* SAS token
* Region
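Before plugging these values into Supermetal, you can optionally sanity-check that the SAS token grants the expected container access. This verification step is a suggestion, not part of the official setup:

```bash
# List blobs in the container using only the SAS token; a result without
# an authorization error confirms the token can reach the container.
az storage blob list \
  --container-name supermetal-buffer \
  --account-name <storage-account-name> \
  --sas-token "<sas-token>" \
  --output table
```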
# Local (/docs/main/buffer/local)

Supermetal can stage your data in a local filesystem directory before writing it into your target database. This guide walks you through configuring a local filesystem directory as a Supermetal buffer. Local Buffer is automatically managed by Supermetal Control Plane.

## Prerequisites

Before you begin, ensure you have a local filesystem directory with:

* Read and write permissions.
* Sufficient disk space to store the data.

***

## Setup

Specify the local directory path in the Supermetal Console while creating the connector. If the directory does not exist, Supermetal will create it.

If running in a cloud environment, ensure that the filesystem directory is backed by persistent storage (e.g., EBS, GCP Persistent Disk, Azure Disk) to avoid data loss.

***

# AWS S3 (/docs/main/buffer/s3)

Supermetal can stage your data in an AWS S3 bucket before writing it into your target database. This guide walks you through configuring an AWS S3 bucket as a Supermetal buffer. AWS S3 Buffer is automatically managed by Supermetal Control Plane.
## Prerequisites

Before you begin, ensure you have an AWS account with permissions to create and manage:

* S3 buckets.
* IAM roles and policies.

***

## Setup

### Create an S3 bucket

Sign in to the [AWS Management Console](https://console.aws.amazon.com/) and follow the [AWS documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html) to create an S3 bucket.

* Go to the [AWS S3 Console](https://console.aws.amazon.com/s3/).
* In the navigation bar at the top of the page, choose the name of the currently displayed AWS Region. Next, choose the Region in which you want to create a bucket.
* In the left navigation pane, choose General purpose buckets.
* Click on "Create bucket".
* Enter a unique bucket name (`supermetal-s3-buffer`).
* Configure optional settings as per your requirements.
* Click on "Create bucket".

```bash
aws s3api create-bucket \
  --bucket supermetal-s3-buffer \
  --region <region> \
  --create-bucket-configuration LocationConstraint=<region>
```

Replace:

* `<region>` with the region of the bucket you want to create.
* `supermetal-s3-buffer` with the name of the bucket you want to create.

The bucket region must match the region of your Supermetal agent to avoid cross-region networking costs and latency.

### Create IAM policy

* Navigate to the [AWS IAM Console](https://console.aws.amazon.com/iam/).
* Go to policies and click on "Create policy".
* Select "JSON" and paste the following policy document:

Create a policy document file from the following template and run the following command:

```bash
aws iam create-policy \
  --policy-name <policy-name> \
  --policy-document file://<policy-document-path>
```

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SupermetalS3BufferPolicy",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload"
      ],
      "Resource": [
        "arn:aws:s3:::supermetal-s3-buffer/*",
        "arn:aws:s3:::supermetal-s3-buffer"
      ]
    }
  ]
}
```

Replace `supermetal-s3-buffer` with the name of the bucket you want to create.

### Create IAM user

* Navigate to the [AWS IAM Console](https://console.aws.amazon.com/iam/).
* Go to users and click on "Create user".
* Enter a name for the user (e.g., "supermetal-s3-user").
* Select Attach policies directly and attach the policy you created in the previous step.
* Click "Create user".
* Select the user and click on "Security credentials".
* Click on "Create access key".
* Select "CLI / Programmatic Access" and click on "Create access key".

```bash
aws iam create-user \
  --user-name supermetal-s3-user

aws iam attach-user-policy \
  --user-name supermetal-s3-user \
  --policy-arn <policy-arn>

aws iam create-access-key \
  --user-name supermetal-s3-user
```

Replace `<policy-arn>` with the ARN of the policy you created in the previous step (`arn:aws:iam::<account-id>:policy/supermetal-s3-user-policy`).

You'll need the following connection details to configure the buffer in Supermetal:

* Access key
* Secret access key
* Bucket name
* Region
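As an optional sanity check (not part of the official setup), you can confirm the new access key can write to and list the bucket before configuring Supermetal:

```bash
# Use the new credentials for this shell session
export AWS_ACCESS_KEY_ID=<access-key>
export AWS_SECRET_ACCESS_KEY=<secret-access-key>

# Write, list, and delete a test object to exercise the policy's permissions
echo "test" | aws s3 cp - s3://supermetal-s3-buffer/connectivity-check.txt
aws s3 ls s3://supermetal-s3-buffer/
aws s3 rm s3://supermetal-s3-buffer/connectivity-check.txt
```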
# Deployment (/docs/main/concepts/deployment)

Supermetal offers two primary deployment models: Self-hosted and BYOC (Bring Your Own Cloud). Both models leverage Supermetal's efficient, single-agent architecture while providing different management options.

### Local / Self-Hosted

The self-hosted deployment model is the most straightforward option for users to get started:

* **Complete Control**: Manage all aspects of your Supermetal agent
* **Network Isolation**: Run within your existing network boundary
* **Infrastructure Choice**: Deploy on-premise, in private clouds, or on public cloud VMs
* **Simplified Operations**: Single agent with embedded management UI
* **Cost Efficiency**: Use your existing infrastructure

### BYOC / Hybrid

The BYOC (Bring Your Own Cloud) model offers a balance between the ease of a managed service and the security/control of self-hosting.
It separates Supermetal into two distinct components: a Control Plane managed by Supermetal, and a Data Plane that runs entirely within your cloud environment.

* **Data Sovereignty**: All data remains within your VPC/cloud account
* **Managed Experience**: Control plane handles operations and management
* **Simplified Operations**: Streamlined monitoring and maintenance
* **Cloud Efficiency**: Eliminate egress costs when source and target are in the same region
* **Security Compliance**: Easier to meet regulatory requirements

### Next Steps

* [Local / Self-Hosted](/docs/main/concepts/deployment/local)
* [BYOC / Hybrid](/docs/main/concepts/deployment/byoc)

# Local / Self-Hosted (/docs/main/concepts/deployment/local)

import { Tab, Tabs } from 'fumadocs-ui/components/tabs';

Self-hosted deployment gives you complete control over data sovereignty and infrastructure. The Supermetal agent runs as a single binary with zero runtime dependencies. Use this deployment method when you need data to stay within your infrastructure or are operating in restricted network environments.
### Installation

Use the following installation script to download the binary for your platform.

```bash
curl -fsSL https://trial.supermetal.io/install.sh | sh
```

```powershell
iwr -useb https://trial.supermetal.io/install.ps1 | iex
```

Download directly from the web interface: [trial.supermetal.io](https://trial.supermetal.io/#download)
### Run the Agent

Run the following command to start the Supermetal agent, which serves a local server for the Supermetal dashboard UI:

```bash
supermetal --server
```

Running the agent will prompt you for a passphrase that is used to encrypt connector credentials locally. You will need to re-enter the same passphrase each time you restart the agent.
### Access the embedded web interface

Visit [http://localhost:3000](http://localhost:3000) to access the Supermetal dashboard and start creating connectors.
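The agent's embedded server also exposes the JSON management APIs documented in the [API reference](/docs/api), so connectors can be scripted as well as managed from the UI. A hedged sketch — the endpoint path below is hypothetical; consult the API reference for the actual routes:

```bash
# Hypothetical endpoint: list all connectors and their status on the local agent.
curl -s http://localhost:3000/api/connectors
```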
# Connector Management (/docs/main/concepts/deployment/management) Supermetal agents can be managed through several methods, with availability depending on your deployment model. The table below outlines the management options available for each deployment type: | Management Option | Local / Self-Hosted | BYOC / Hybrid | | --------------------------------------------- | ---------------------- | --------------------------------------- | | [**Web Console**](#web-console) | ✅ Embedded web console | ✅ Centralized Control Plane web console | | [**JSON APIs**](#json-apis) | ✅ | ✅ | | [**Configuration File**](#configuration-file) | ✅ | ❌ | ## Management Options ### Web Console The web interface provides intuitive management capabilities including: * Connector creation, configuration, and lifecycle management (start, stop, pause, resume) * Real-time monitoring of connector status, throughput, latency, and error rates ### JSON APIs A comprehensive RESTful API enables programmatic control over agent(s), which is ideal for: * Automating the creation, management, and scaling of connectors * Integrating Supermetal into existing CI/CD pipelines and orchestration frameworks * Building custom dashboards or operational tools ### Configuration File Agents are configured using structured files (e.g., YAML or JSON), which allows for defining sources, targets, and detailed replication parameters for each connector # Security (/docs/main/concepts/security) Supermetal's security architecture is designed to protect your data while maintaining the flexibility of our Bring Your Own Cloud (BYOC) model. This document provides information on how we secure both the Supermetal Control Plane and Data Plane components. ## Security Architecture Supermetal employs a secure, split-architecture model that separates our responsibilities from yours: * **Control Plane**: Managed by Supermetal, providing orchestration, monitoring, and management capabilities * **Data Plane**: Deployed in your cloud environment (AWS or Azure), processing your data within your security perimeter ## Authentication & Access Control ### Control Plane Authentication Supermetal's Control Plane implements multiple layers of authentication to protect your account and data: | Authentication Method | Description | | ------------------------------------------------------- | ------------------------------------------------------------------------------------------ | | **User Authentication** | Access to the Supermetal Cloud Console requires strong user authentication | | [**SSO Integration**](/docs/main/concepts/security/sso) | Support for SAML/OIDC providers like Okta, Azure AD / Microsoft Entra ID, Google Workspace | | **Multi-Factor Authentication** | Optional but recommended MFA for additional account security | | **API Authentication** | API keys for programmatic access with time-based expiration | ### Role-Based Access Control (RBAC) Supermetal's access control system enables precise permission management: * **Predefined Roles**: Admin, Operator, Viewer roles with appropriate permissions * **Custom Roles**: Create roles with specific permissions for your organization's needs * **Resource-Level Permissions**: Control access to specific connections, and environments ## Network Security ### Control Plane Network Security * All Control Plane services operate within private subnets with strict security controls * TLS 1.2+ for all communications with the Control Plane ### Data Plane Network Security The Data Plane operates entirely within your cloud 
environment: * **VPC/VNet Isolation**: The Supermetal Agent runs within your private virtual network * **Private Connectivity Options**: * AWS: PrivateLink for secure connectivity to the Control Plane * Azure: Private Link for secure connectivity to the Control Plane ## Data Security & Encryption ### Encryption in Transit * **Browser to Console**: HTTPS with TLS 1.2+ * **Control Plane to Data Plane**: TLS 1.2+ over private connections * **Data Plane to Databases**: TLS connections for database communication ### Encryption at Rest * **Control Plane**: All configuration data encrypted at rest using AWS KMS keys (AES-GCM-256) * **Data Plane**: * Buffer bucket/container is encrypted using your cloud provider's KMS/Key Vault encryption * Customer-managed keys (CMKs) support for enhanced control ### Data Processing * Data remains within your cloud environment * No persistent data storage in the Supermetal Control Plane * Only metadata and telemetry are sent to the Control Plane ## Secrets Management Supermetal implements a secure approach to credential management: * **Zero Knowledge Design**: Supermetal control plane stores encrypted database credentials, only the data plane agents have access to decrypt the credentials * **Cloud Provider Integration**: * AWS: Integration with Secrets Manager for credential retrieval * Azure: Integration with Key Vault for secure credential storage ## Shared Responsibility Model Supermetal operates on a shared responsibility model for security in our BYOC deployments. This model clearly defines which security aspects are managed by Supermetal and which are the customer's responsibility.
| Category | Security Area | Supermetal Responsibility | Customer Responsibility | | ---------------------------------- | ------------------------------------------ | :-----------------------: | :---------------------: | | **Infrastructure & Compute** | Control Plane infrastructure | ✓ | | | | Data Plane infrastructure | ✓ | | | **Software & Updates** | Control Plane security | ✓ | | | | Data Plane security | ✓ | | | | Control Plane updates | ✓ | | | | Data Plane updates | ✓ (provides) | ✓ (applies) | | **Network Security** | Network security for Control Plane | ✓ | | | | Network security for Data Plane | ✓ | ✓ (allow access) | | | VPC/VNet configuration | ✓ (requirements) | ✓ (implementation) | | | Private Link/Endpoint (Control Plane side) | ✓ | | | | Private Link/Endpoint (Customer side) | | ✓ | | **Identity & Access Management** | Control Plane authentication | ✓ | | | | User access management | ✓ (platform controls) | ✓ (user assignment) | | | IAM roles/policies for Agent | ✓ (guidance) | ✓ (implementation) | | | Database access controls | | ✓ | | **Data Security** | Encryption of Control Plane data | ✓ | | | | Encryption of Data Plane / Agent buffer | ✓ (provides) | ✓ (applies) | | | Database security | | ✓ | | | Data residency controls | ✓ (options) | ✓ (enforcement) | | | Database credentials management | ✓ (encryption) | ✓ (management) | | **Monitoring & Incident Response** | Monitoring of Control Plane | ✓ | | | | Monitoring of Data Plane | ✓ (agent health) | ✓ (environment) | | | Security incident response (Control Plane) | ✓ | | | | Security incident response (Data Plane) | ✓ (assistance) | ✓ (primary) | | | Audit logging for Control Plane | ✓ | | | | Audit logging for Data Plane | ✓ (agent logs) | ✓ (infrastructure) | | **Compliance & Governance** | Regulatory compliance documentation | ✓ (platform) | ✓ (overall) | | | Compliance certifications | ✓ (platform) | | | | Security policies and procedures | ✓ (platform) | ✓ (organization) | | | Risk assessments | ✓ (platform) | ✓ (implementation) |
* **Supermetal is responsible for** the security of the Control Plane, the Agent security, platform authentication, encryption of Control Plane data, and providing security requirements and guidance for the Data Plane. * **Customers are responsible for** securing their cloud environment, implementing IAM roles/policies, database security, and providing the cloud infrastructure where Supermetal manages the Data Plane components. * **Shared responsibilities include**: * **Network security for Data Plane**: Supermetal manages the security of Data Plane components, while customers configure their cloud environment to allow appropriate access. * **Data Plane monitoring**: Supermetal monitors agent health and functionality, while customers monitor the infrastructure environment. * **Security updates**: Supermetal provides secure updates, while customers are responsible for applying them. * **Encryption**: Supermetal designs encryption requirements, while customers implement them in their environment. # Single Sign-on (SSO) (/docs/main/concepts/security/sso) ## Overview Supermetal supports SAML 2.0-based Single Sign-on (SSO) for enterprise authentication. SSO allows organizations to: * Manage user access to Supermetal through their existing identity provider * Enforce consistent security policies across applications * Enable multi-factor authentication through your identity provider ## Supported Identity Providers Supermetal supports SAML 2.0 integration with any identity provider, including: * [Google Workspace (formerly G Suite)](https://workspace.google.com/) * [Microsoft Entra ID (formerly Azure AD)](https://entra.microsoft.com/) * [Okta](https://www.okta.com/) ## Configuration Process To set up SSO with Supermetal, you'll need to configure both your identity provider and the Supermetal Web Console. You'll need these common settings regardless of which provider you use: ### Supermetal Service Provider Details Provide these values to your identity provider: | Setting | Value | | -------------------------------- | --------------------------------------------- | | **ACS URL** / **Callback URL** | `https://console.supermetal.io/saml/callback` | | **Entity ID** / **Audience URI** | `https://console.supermetal.io` | | **Start URL** / **Sign-on URL** | `https://console.supermetal.io/login` | | **Name ID Format** | `Email Address` | ### Identity Provider Details You'll need to gather these values from your identity provider to configure Supermetal: | Setting | Description | | --------------------- | --------------------------------------------------------------- | | **SAML SSO URL** | The endpoint where Supermetal will send authentication requests | | **SAML Entity ID** | The unique identifier for your identity provider | | **X.509 Certificate** | The certificate to verify SAML responses | ## Provider Configuration
### Prerequisites * Super Admin access to your Google Workspace account * Administrator access to Supermetal
### Google Workspace Configuration 1. In your Google Admin console, go to **Apps** > **Web and mobile apps** > **Add App** > **Search for SAML app**. 2. Select **Add custom SAML app**. 3. Provide a name for the application (e.g., "Supermetal") and optionally upload the Supermetal logo. 4. On the **Service provider details** screen, enter the Supermetal service provider details from the table above. 5. For **Name ID Format**, select **EMAIL**. 6. For **Name ID**, select **Basic Information** > **Primary email**. 7. Download the SSO URL and certificate from Google Workspace. 8. Assign the application to the appropriate users or organizational units.
### Supermetal Configuration 1. In the Supermetal Web Console, navigate to **Account Settings** > **Authentication** > **Configure SSO**. 2. Enter the details from Google Workspace: * **SAML SSO URL**: The URL provided by Google * **SAML Entity ID**: The Entity ID provided by Google * **X.509 Certificate**: The certificate downloaded from Google 3. Save the configuration.
### Prerequisites * Global Administrator access to your Microsoft Entra ID * Administrator access to Supermetal
### Microsoft Entra ID Configuration 1. In the Microsoft Entra admin center, go to **Enterprise applications** > **New application**. 2. Select **Create your own application**. 3. Enter a name for the application (e.g., "Supermetal"), select **Integrate any other application you don't find in the gallery (Non-gallery)** and click **Create**. 4. Under **Set up single sign-on**, select **SAML**. 5. In the **Basic SAML Configuration** section, enter the Supermetal service provider details from the table above. 6. In the **User Attributes & Claims** section, ensure that the NameID claim is set to user.mail or user.userprincipalname. 7. Download the Federation Metadata XML file or note the Login URL, Azure AD Identifier, and Logout URL. 8. Download the Certificate (Base64) from the SAML Signing Certificate section. 9. Assign users and groups to the application.
### Supermetal Configuration 1. In the Supermetal Web Console, navigate to **Account Settings** > **Authentication** > **Configure SSO**. 2. Enter the details from Entra ID: * **SAML SSO URL**: The Login URL from Entra ID * **SAML Entity ID**: The Azure AD Identifier * **X.509 Certificate**: The Base64 certificate downloaded from Entra ID 3. Save the configuration.
### Prerequisites * Administrator access to your Okta account * Administrator access to Supermetal
### Okta Configuration 1. In your Okta Admin Dashboard, go to **Applications** > **Applications**. 2. Click **Add Application** > **Create New App**. 3. Select **Web** as the platform and **SAML 2.0** as the sign-on method. 4. Enter a name for the application (e.g., "Supermetal"). 5. In the **SAML Settings** section, enter the Supermetal service provider details from the table above. 6. For **Name ID format**, select **EmailAddress**. 7. For **Application username**, select **Email**. 8. Configure any additional attributes if needed. 9. After saving, go to the **Sign On** tab of the application and click **View Setup Instructions**. 10. Copy the Identity Provider Single Sign-On URL, Identity Provider Issuer, and download the X.509 Certificate. 11. Assign users to the application.
### Supermetal Configuration 1. In the Supermetal Web Console, navigate to **Account Settings** > **Authentication** > **Configure SSO**. 2. Enter the details from Okta: * **SAML SSO URL**: The Identity Provider Single Sign-On URL from Okta * **SAML Entity ID**: The Identity Provider Issuer from Okta * **X.509 Certificate**: The X.509 Certificate downloaded from Okta 3. Save the configuration.
## Testing and Enabling SSO After configuring SSO with your identity provider: 1. In the Supermetal Web Console, navigate to **Account Settings** > **Authentication** > **SSO Settings**. 2. Click **Test Configuration** to verify the integration works properly. 3. Once testing is successful, enable SSO for your organization by selecting the appropriate option: * **SSO Optional**: Users can sign in with either SSO or username/password * **SSO Required**: All users must authenticate using SSO # PostgreSQL (/docs/main/targets/postgres) import { Tab, Tabs } from 'fumadocs-ui/components/tabs'; import { Callout } from 'fumadocs-ui/components/callout'; import { Accordion, Accordions } from 'fumadocs-ui/components/accordion'; PostgreSQL (Postgres) is a powerful open-source relational database. This guide walks you through configuring your PostgreSQL database to work seamlessly with Supermetal. *** ## Features
| Feature | Notes |
| ------------------------- | ----- |
| Bulk Insert | Uses `COPY FROM ... FORMAT BINARY` to efficiently load data in bulk compared to row-by-row updates, providing optimal throughput for large data transfers. |
| Transactional Consistency | Guarantees that the data synced reflects a consistent point in the upstream database's timeline, respecting transaction boundaries to maintain data integrity. |
| Schema Evolution | Automatic schema evolution to handle changes in source table structures, including new columns and data type modifications. |
| Hard Deletes | Row deletion is handled by issuing a `DELETE` statement, ensuring that deleted records are properly removed from the target database. |
***

## Prerequisites

Before you begin, ensure you have:

* **Supported PostgreSQL Version:** PostgreSQL 9.4 or higher.
  * [Generic PostgreSQL](https://www.postgresql.org/)
  * [AWS RDS PostgreSQL](https://aws.amazon.com/rds/postgresql/)
  * [AWS Aurora PostgreSQL](https://aws.amazon.com/rds/aurora/)
  * [Google Cloud PostgreSQL](https://cloud.google.com/sql/postgresql)
  * [Azure PostgreSQL](https://azure.microsoft.com/en-us/products/postgresql)
* **Database Admin Access:** Sufficient privileges to create users and grant permissions.
* **Network Connectivity:** Ensure that Supermetal's services can reach your PostgreSQL database. You may need to configure security groups.

***

## Setup

Log in to your PostgreSQL database and run the following:

```sql
-- Create a user for Supermetal
CREATE USER supermetal_user WITH PASSWORD 'strong-password';

-- Grant permissions: allows Supermetal to connect to the database,
-- create schemas, and create (temp) tables
GRANT CONNECT, CREATE, TEMPORARY ON DATABASE target_database TO supermetal_user;
```

Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the target in Supermetal.

Replace the placeholders in the script with your specific values:

* `target_database`: The name of your PostgreSQL database.

For large snapshots, optimize checkpoint frequency to maximize COPY throughput and avoid I/O storms. Checkpoints trigger when either limit is reached first.

* **`max_wal_size`**: Maximum WAL size before a checkpoint. COPY generates WAL rapidly, and small values (default: 1GB) cause frequent checkpoints that throttle performance. Increase to 16GB-64GB to allow bulk loads to complete with fewer I/O interruptions.
* **`checkpoint_timeout`**: Maximum time between checkpoints. The default (5min) triggers checkpoints before large snapshots complete. Increase to 30min-1h to prevent premature interruptions during extended loads.
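One way to apply these settings is shown below. This is a sketch assuming superuser access on a self-managed instance; the `32GB`/`30min` values are examples within the ranges suggested above, and managed services such as RDS require changing these via parameter groups instead:

```sql
-- Raise checkpoint limits ahead of a large snapshot
ALTER SYSTEM SET max_wal_size = '32GB';
ALTER SYSTEM SET checkpoint_timeout = '30min';

-- Both settings take effect on a configuration reload; no restart needed
SELECT pg_reload_conf();
```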
## Data Types Mapping

| Apache Arrow DataType | PostgreSQL Type | Notes |
| :-------------------- | :----------------- | :---- |
| `Int8` | `SMALLINT` | |
| `Int16` | `SMALLINT` | |
| `Int32` | `INTEGER` | |
| `Int64` | `BIGINT` | |
| `UInt8` | `SMALLINT` | |
| `UInt16` | `INTEGER` | |
| `UInt32` | `BIGINT` | |
| `UInt64` | `NUMERIC` | |
| `Float16` | `REAL` | |
| `Float32` | `REAL` | |
| `Float64` | `DOUBLE PRECISION` | |
| `Decimal128(p,s)` | `NUMERIC(p,s)` | |
| `Decimal256(p,s)` | `NUMERIC(p,s)` | |

| Apache Arrow DataType | PostgreSQL Type | Notes |
| :------------------------- | :-------------- | :---- |
| `Date32` | `DATE` | |
| `Date64` | `DATE` | |
| `Time32` | `TIME` | |
| `Time64` | `TIME` | |
| `Time64(ns)` | `BIGINT` | Nanosecond precision stored as integer because Postgres does not support nanosecond precision |
| `Timestamp(s/ms/us, [tz])` | `TIMESTAMP[Z]` | |
| `Timestamp(ns, [tz])` | `BIGINT` | Nanosecond timestamps stored as integer because Postgres does not support nanosecond precision |
| `Interval` | `INTERVAL` | |
| `Duration` | Not supported | Durations are explicitly not supported |

| Apache Arrow DataType | PostgreSQL Type | Notes |
| :-------------------- | :-------------- | :---- |
| `Utf8`, `LargeUtf8` | `TEXT` | |

| Apache Arrow DataType | PostgreSQL Type | Notes |
| :-------------------- | :-------------- | :------------------- |
| `Boolean` | `BOOLEAN` | Boolean (true/false) |

| Apache Arrow DataType | PostgreSQL Type | Notes |
| :--- | :-------------- | :---- |
| `Utf8` [JSON Extension](https://arrow.apache.org/docs/format/CanonicalExtensions.html#json) (`arrow.json`) | `JSONB` | |
| `Struct` | Not supported | |
| `Map` | Not supported | |

| Apache Arrow DataType | PostgreSQL Type | Notes |
| :----------------------------------------- | :-------------- | :---- |
| `Binary`, `LargeBinary`, `FixedSizeBinary` | `BYTEA` | |

| Apache Arrow DataType | PostgreSQL Type | Notes |
| :----------------------------------- | :-------------- | :----------------------------------------------------------- |
| `List`, `LargeList`, `FixedSizeList` | `T[]` | Only one-dimensional arrays of primitive types are supported |

# Databricks (/docs/main/targets/databricks)

import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Callout } from 'fumadocs-ui/components/callout';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';

Databricks is a unified data platform that combines key features of data lakes and data warehouses. This guide walks you through configuring your Databricks platform to work seamlessly with Supermetal.

***

## Features
| Feature | Notes |
| ---------------- | ----- |
| Schema Evolution | Automatic schema evolution |
| Soft Delete(s) | Row deletion is handled by setting the `_sm_deleted` boolean metadata column. |
***

## Prerequisites

Before you begin, ensure you have:

* **Supported Databricks Implementations:**
  * [AWS](https://www.databricks.com/product/aws)
  * [Azure](https://www.databricks.com/product/azure)
  * [GCP](https://www.databricks.com/product/google-cloud)
  * [Serverless Workspace](https://docs.databricks.com/aws/en/admin/workspace/serverless-workspaces)
* **Unity Catalog:** Unity Catalog enabled on your Databricks workspace.
* **SQL Warehouse:** [Serverless SQL Warehouse](https://docs.databricks.com/aws/en/compute/sql-warehouse/).

***

## Setup

### Configure Authentication

### Create a Service Principal

Follow the Databricks [documentation](https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m) to create a Service Principal. Note down the Client ID and Client Secret.

### Create a Personal Access Token

Follow the Databricks [documentation](https://docs.databricks.com/aws/en/dev-tools/auth/pat) to create a PAT. Note down the personal access token.

### Create a SQL Warehouse

Log in to the Databricks console to create or use an existing SQL Warehouse.

* Go to SQL > SQL warehouses > Create SQL warehouse
* Fill in and select the required fields, such as:
  * Name
  * Warehouse Size (2X Small)
  * Warehouse Type (Serverless)
  * Auto Stop (10 minutes)
  * Scaling Min & Max (1)
  * Unity Catalog (Enabled)
* Click Create
* Once created, click on the Connection Details tab

Note down the following details:

* Server Hostname (your-workspace.cloud.databricks.com)
* Warehouse ID (0123456789abcdef)

### Configure a Catalog

Log in to the Databricks console to choose a Catalog (or set up a new one following the [Databricks documentation](https://docs.databricks.com/aws/en/data-governance/unity-catalog/get-started#overview-of-unity-catalog-setup)).

* From the Databricks workspace console, navigate to Data
* Choose a Catalog (`my_catalog`)

### Create a Volume

Supermetal uses the configured volume as a temporary stage. Follow the steps in the [Databricks documentation](https://docs.databricks.com/aws/en/volumes/utility-commands#create-a-volume).

* From the Databricks workspace console, navigate to Catalog
* Choose the Catalog from the step above (`my_catalog`)
* Search or browse for the schema that you want to add the volume to and select it.
* Click on Create Volume and specify a Name.
* Click Create

```sql
CREATE VOLUME my_catalog.my_schema.my_volume;
```

Note down the following details:

* Catalog Name (`my_catalog`)
* Volume Path (`/Volumes/my_catalog/my_schema/my_volume`)
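The service principal needs enough Unity Catalog privileges to create and write tables in the chosen catalog and to stage files in the volume. The exact grants Supermetal requires are not enumerated here, so treat the following as a hypothetical starting point and adjust it to your governance policy:

```sql
-- Hypothetical minimal grants for the Supermetal service principal
GRANT USE CATALOG, USE SCHEMA, CREATE SCHEMA, CREATE TABLE, MODIFY
  ON CATALOG my_catalog TO `<service-principal-application-id>`;

GRANT READ VOLUME, WRITE VOLUME
  ON VOLUME my_catalog.my_schema.my_volume TO `<service-principal-application-id>`;
```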
***

## Data Types Mapping

| Apache Arrow DataType | Databricks Type | Notes |
| :------------------------------ | :--------------- | :-------------------------------------------------------- |
| `Int8` | `TINYINT` | |
| `Int16` | `SMALLINT` | |
| `Int32` | `INT` | |
| `Int64` | `BIGINT` | |
| `UInt8` | `SMALLINT` | Promoted to signed 16-bit |
| `UInt16` | `INT` | Promoted to signed 32-bit |
| `UInt32` | `BIGINT` | Promoted to signed 64-bit |
| `UInt64` | `DECIMAL(20, 0)` | Mapped to decimal to preserve full unsigned 64-bit range |
| `Float16` | `FLOAT` | Upcast to Float32 in Parquet |
| `Float32` | `FLOAT` | |
| `Float64` | `DOUBLE` | |
| `Decimal128(p, s)` where p ≤ 38 | `DECIMAL(p, s)` | |
| `Decimal128(p, s)` where p > 38 | `STRING` | Precision exceeds Databricks maximum of 38 |
| `Decimal256(p, s)` where p ≤ 38 | `DECIMAL(p, s)` | Downcast to Decimal128 in Parquet |
| `Decimal256(p, s)` where p > 38 | `STRING` | Precision exceeds Databricks maximum of 38 |
| Apache Arrow DataType | Databricks Type |
| :-------------------- | :-------------- |
| `Boolean` | `BOOLEAN` |

| Apache Arrow DataType | Databricks Type | Notes |
| :-------------------- | :-------------- | :---------------------------------------------------------------- |
| `Date32` | `DATE` | |
| `Date64` | `DATE` | Converted to Date32 in Parquet |
| `Timestamp(s, tz)` | `TIMESTAMP_NTZ` | Converted to Timestamp(ms) in Parquet for proper annotation |
| `Timestamp(ms, tz)` | `TIMESTAMP_NTZ` | |
| `Timestamp(μs, tz)` | `TIMESTAMP_NTZ` | Databricks supports microsecond precision |
| `Timestamp(ns, tz)` | `TIMESTAMP_NTZ` | Converted to Timestamp(μs) in Parquet (Databricks max precision) |
| `Time32`, `Time64` | `STRING` | Databricks does not support TIME types |
| `Interval` | `STRING` | Databricks cannot read INTERVAL from Parquet |

| Apache Arrow DataType | Databricks Type |
| :-------------------- | :-------------- |
| `Utf8`, `LargeUtf8` | `STRING` |

| Apache Arrow DataType | Databricks Type |
| :---------------------- | :-------------- |
| `Binary`, `LargeBinary` | `BINARY` |

| Apache Arrow DataType | Databricks Type | Notes |
| :--- | :-------------- | :---------------------------------------- |
| `Utf8` [JSON Extension](https://arrow.apache.org/docs/format/CanonicalExtensions.html#json) (`arrow.json`) | `STRING` | `VARIANT` will be supported in the future |

| Apache Arrow DataType | Databricks Type | Notes |
| :----------------------------------- | :-------------- | :----------------------------------- |
| `List`, `LargeList`, `FixedSizeList` | `ARRAY<T>` | Element type T is recursively mapped |
***

# Databricks Setup (/docs/main/targets/databricks/setup)

## Prerequisites

Before you begin, ensure you have:

* **Supported Databricks Implementations:**
  * [AWS](https://www.databricks.com/product/aws)
  * [Azure](https://www.databricks.com/product/azure)
  * [GCP](https://www.databricks.com/product/google-cloud)
  * [Serverless Workspace](https://docs.databricks.com/aws/en/admin/workspace/serverless-workspaces)
* **Unity Catalog:** Unity Catalog enabled on your Databricks workspace.
* **SQL Warehouse:** [Serverless SQL Warehouse](https://docs.databricks.com/aws/en/compute/sql-warehouse/).

***

## Setup

### Configure Authentication

### Create a Service Principal

Follow the Databricks [documentation](https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m) to create a Service Principal. Note down the Client ID and Client Secret.

### Create a Personal Access Token

Follow the Databricks [documentation](https://docs.databricks.com/aws/en/dev-tools/auth/pat) to create a PAT. Note down the personal access token.

### Create a SQL Warehouse

Log in to the Databricks console to create a new SQL Warehouse or use an existing one.

* Go to SQL > SQL warehouses > Create SQL warehouse
* Fill in and select the required fields, such as:
  * Name
  * Warehouse Size (2X Small)
  * Warehouse Type (Serverless)
  * Auto Stop (10 minutes)
  * Scaling Min & Max (1)
  * Unity Catalog (Enabled)
* Click Create
* Once created, click the Connection Details tab

Note down the following details:

* Server Hostname (your-workspace.cloud.databricks.com)
* Warehouse ID (0123456789abcdef)

### Configure a Catalog

Log in to the Databricks console to choose a Catalog (or create a new one by following the [Databricks documentation](https://docs.databricks.com/aws/en/data-governance/unity-catalog/get-started#overview-of-unity-catalog-setup)).

* From the Databricks workspace console, navigate to Data
* Choose a Catalog (`my_catalog`)

### Create a Volume

Supermetal uses the configured volume as a temporary staging area. Follow the steps in the [Databricks documentation](https://docs.databricks.com/aws/en/volumes/utility-commands#create-a-volume).

* From the Databricks workspace console, navigate to Catalog
* Choose the Catalog from the previous step (`my_catalog`)
* Search or browse for the schema that you want to add the volume to and select it.
* Click Create Volume and specify a name.
* Click Create

```sql
CREATE VOLUME my_catalog.my_schema.my_volume;
```

Note down the following details:

* Catalog Name (my\_catalog)
* Volume Path (/Volumes/my\_catalog/my\_schema/my\_volume)

# MongoDB (/docs/main/sources/mongo)

import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Callout } from 'fumadocs-ui/components/callout';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
import { ImageZoom } from 'fumadocs-ui/components/image-zoom';

Supermetal's MongoDB integration leverages MongoDB's native change streams capability to provide efficient, reliable data extraction with minimal impact on your database. This integration enables both initial snapshot capture and continuous replication of changes through a strongly-typed schema approach.

This guide covers the features, prerequisites, and configuration steps required to connect MongoDB with Supermetal.

***

## Features
**Initial Data Sync**: Fast, memory-efficient initial data loading through optimized MongoDB query patterns and streaming data processing.

**Change Data Capture**: Captures data changes (inserts, updates, deletes) in real-time using MongoDB's native [Change Streams](https://www.mongodb.com/docs/manual/changeStreams/) functionality. Change streams provide a time-ordered sequence of change events that occur within your MongoDB deployment. Supermetal creates a single change stream at the database level rather than per collection, reducing resource usage and following MongoDB's [performance recommendations](https://www.mongodb.com/docs/manual/changeStreams/#change-stream-performance-considerations).

**Schema Evolution**: Unlike traditional MongoDB ETL tools that treat all data as unstructured JSON, Supermetal automatically infers and enforces a strongly-typed schema backed by Apache Arrow:

* **Precise Type Mapping**: Backed by Apache Arrow's rich type system, Supermetal internally maintains a precise mapping from MongoDB BSON types to Arrow data types
* **Schema Evolution**: Detects new fields and type changes during both snapshot and CDC phases, seamlessly merging them into a unified schema continuously
* **Downstream Compatibility**: Ensures schema consistency when writing to structured targets like databases and data warehouses

**Catalog Support**: Ability to configure which collections to sync through the catalog interface.

**Document Flattening**: Converts nested MongoDB documents into a flat, analytics-friendly structure using double-underscore (`__`) notation.

**Example:**

```javascript
// Original MongoDB document
{
  "_id": ObjectId("..."),
  "user": {
    "name": "John",
    "address": {
      "city": "San Francisco",
      "country": "USA"
    }
  },
  "tags": ["tag1", "tag2"]
}

// Flattened representation
{
  "_id": "...",
  "user__name": "John",
  "user__address__city": "San Francisco",
  "user__address__country": "USA",
  "tags": ["tag1", "tag2"]
}
```
***

## Prerequisites

Before you begin, ensure you have:

* **MongoDB Requirements:**
  * **Version:** MongoDB 4.0 or higher (required for change streams support)
  * **Supported Deployments:**
    * `MongoDB Community/Enterprise Server` (replica set configuration)
    * `MongoDB Atlas`

Change streams require a replica set deployment. If you're using a standalone server, you must convert it to a single-node replica set to use change streams.

* **Deployment Requirements:**
  * **Replica Set**: Your MongoDB deployment must be configured as a replica set
  * **Read Concern Majority**: Supermetal uses read concern "majority" to ensure consistent reads
  * **Database Permissions**: User with appropriate permissions (see [Setup](#setup))
  * **Network Connectivity**: Ensure Supermetal can reach your MongoDB deployment (default port: 27017)
  * **TLS/SSL Support**: Supermetal supports both unencrypted and TLS/SSL encrypted connections
* **MongoDB Atlas Requirements**:
  * **Network Access**: Configure network access rules to allow Supermetal to connect
  * **Connection String**: Use the connection string format that includes all replica set members

***

## Setup

### Permissions Overview

Supermetal requires a dedicated MongoDB user with appropriate permissions to read data and access change streams. The recommended approach is to create a dedicated read-only user.

| MongoDB Deployment | Minimum Required Permissions |
| :-- | :-- |
| Self-managed | `read` role on the database to replicate from |
| MongoDB Atlas | `readAnyDatabase` role |

### Create a Dedicated Read-Only MongoDB User
#### Connect to your MongoDB instance using the mongo shell with admin privileges: \[!toc]

```bash
mongosh --host <host> --port <port> -u <admin-user> -p <admin-password> --authenticationDatabase admin
```

Replace the placeholder values in the command with your actual information:

* `<host>`: Your MongoDB server hostname or IP address
* `<port>`: MongoDB port (default is 27017)
* `<admin-user>`: Username with admin privileges
* `<admin-password>`: Password for the admin user
#### Create a dedicated user for Supermetal: \[!toc]

```javascript
use admin
db.createUser({
  user: "supermetal_user",
  pwd: "strong-password",
  roles: [
    { role: "read", db: "target-database" }
  ]
})
```

Replace the placeholder values in the script with your actual information:

* `strong-password`: Replace with a secure, unique password for the `supermetal_user`.
* `target-database`: The name of the database you want to replicate from.
#### Log in to your MongoDB Atlas account \[!toc] Visit [cloud.mongodb.com](https://cloud.mongodb.com/) to access your account.
#### Select your project and navigate to the **Database Access** section \[!toc]
#### Click the "Add New Database User" button \[!toc]
#### Enter a username and password \[!toc]
#### Under "Database User Privileges", select "Built-in Role" and choose "Read Any Database" \[!toc]
#### Optionally, you can restrict access to specific clusters that contain the databases you want to replicate \[!toc]
#### Click "Add User" to create the user with the selected permissions \[!toc]
*** ## Data Types Mapping Supermetal automatically maps MongoDB BSON types to Apache Arrow data types according to the following mapping: | MongoDB BSON Type(s) | Apache Arrow DataType | Notes | | :------------------- | :-------------------- | :----------------------------------------------------------------------------------------------------------------- | | `Double` | `Float64` | | | `Int32` | `Int32` | | | `Int64` | `Int64` | | | `Decimal128` | `Utf8` | Preserved as string to maintain exact precision and handle MongoDB's variable decimal precision/scale requirements | | MongoDB BSON Type(s) | Apache Arrow DataType | Notes | | :------------------- | :----------------------- | :------------------------------ | | `DateTime` | `Timestamp(Millisecond)` | | | `Timestamp` | `Timestamp(Millisecond)` | MongoDB internal timestamp type | | MongoDB BSON Type(s) | Apache Arrow DataType | Notes | | :------------------------ | :-------------------- | :--------------------------------- | | `String` | `Utf8` | | | `Symbol` | `Utf8` | Deprecated MongoDB type | | `RegularExpression` | `Utf8` | Converted to string representation | | `JavaScriptCode` | `Utf8` | Converted to string representation | | `JavaScriptCodeWithScope` | `Utf8` | Converted to string representation | | MongoDB BSON Type(s) | Apache Arrow DataType | Notes | | :------------------- | :-------------------- | :---- | | `Boolean` | `Boolean` | | | MongoDB BSON Type(s) | Apache Arrow DataType | Notes | | :----------------------------- | :-------------------- | :--------------------------------------------- | | `Array` (same data type) | `List` | Only for arrays with homogeneous element types | | `Array` (different data types) | `Json` | Heterogeneous arrays are converted to JSON | | `Document` | `Json` | Nested documents represented as JSON | | MongoDB BSON Type(s) | Apache Arrow DataType | Notes | | :------------------- | :-------------------- | :-------------------------------------------------------------------------- | | `Binary` | `Utf8` | Encoded as hexadecimal string for lossless representation and compatibility | | MongoDB BSON Type(s) | Apache Arrow DataType | Notes | | :------------------- | :-------------------- | :------------------------------------- | | `ObjectId` | `Utf8` | Converted to hex string representation | | `DbPointer` | `Utf8` | Legacy MongoDB type | | `Null` | `Null` | Represented as null in target columns | | `MinKey` | `Utf8` | Special MongoDB comparison type | | `MaxKey` | `Utf8` | Special MongoDB comparison type | | `Undefined` | `Utf8` | Deprecated MongoDB type | # MongoDB Setup (/docs/main/sources/mongo/setup) ## Prerequisites Before you begin, ensure you have: * **MongoDB Requirements:** * **Version:** MongoDB 4.0 or higher (required for change streams support) * **Supported Deployments:** * `MongoDB Community/Enterprise Server` (replica set configuration) * `MongoDB Atlas` Change streams require a replica set deployment. If you're using a standalone server, you must convert it to a single-node replica set to use change streams. 
* **Deployment Requirements:**
  * **Replica Set**: Your MongoDB deployment must be configured as a replica set
  * **Read Concern Majority**: Supermetal uses read concern "majority" to ensure consistent reads
  * **Database Permissions**: User with appropriate permissions (see [Setup](#setup))
  * **Network Connectivity**: Ensure Supermetal can reach your MongoDB deployment (default port: 27017)
  * **TLS/SSL Support**: Supermetal supports both unencrypted and TLS/SSL encrypted connections
* **MongoDB Atlas Requirements**:
  * **Network Access**: Configure network access rules to allow Supermetal to connect
  * **Connection String**: Use the connection string format that includes all replica set members

***

## Setup

### Permissions Overview

Supermetal requires a dedicated MongoDB user with appropriate permissions to read data and access change streams. The recommended approach is to create a dedicated read-only user.

| MongoDB Deployment | Minimum Required Permissions |
| :-- | :-- |
| Self-managed | `read` role on the database to replicate from |
| MongoDB Atlas | `readAnyDatabase` role |

### Create a Dedicated Read-Only MongoDB User
#### Connect to your MongoDB instance using the mongo shell with admin privileges: \[!toc]

```bash
mongosh --host <host> --port <port> -u <admin-user> -p <admin-password> --authenticationDatabase admin
```

Replace the placeholder values in the command with your actual information:

* `<host>`: Your MongoDB server hostname or IP address
* `<port>`: MongoDB port (default is 27017)
* `<admin-user>`: Username with admin privileges
* `<admin-password>`: Password for the admin user
#### Create a dedicated user for Supermetal: \[!toc]

```javascript
use admin
db.createUser({
  user: "supermetal_user",
  pwd: "strong-password",
  roles: [
    { role: "read", db: "target-database" }
  ]
})
```

Replace the placeholder values in the script with your actual information:

* `strong-password`: Replace with a secure, unique password for the `supermetal_user`.
* `target-database`: The name of the database you want to replicate from.
#### Log in to your MongoDB Atlas account \[!toc] Visit [cloud.mongodb.com](https://cloud.mongodb.com/) to access your account.
#### Select your project and navigate to the **Database Access** section \[!toc]
#### Click the "Add New Database User" button \[!toc]
#### Enter a username and password \[!toc]
#### Under "Database User Privileges", select "Built-in Role" and choose "Read Any Database" \[!toc]
#### Optionally, you can restrict access to specific clusters that contain the databases you want to replicate \[!toc]
#### Click "Add User" to create the user with the selected permissions \[!toc]
# ClickHouse (/docs/main/targets/clickhouse)

import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Callout } from 'fumadocs-ui/components/callout';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
import { ImageZoom } from 'fumadocs-ui/components/image-zoom';

ClickHouse is a high-performance, column-oriented SQL database management system for online analytical processing (OLAP).

This guide walks you through configuring your ClickHouse database to work seamlessly with Supermetal.

***

## Features
**ReplacingMergeTree**: De-duplicates data using ClickHouse's ReplacingMergeTree engine for tables with primary keys.

**Async Inserts**: Optionally enable [async inserts](https://clickhouse.com/docs/optimize/asynchronous-inserts).

**Schema Evolution**: Automatic schema evolution.

**Soft Delete(s)**: Row deletion is handled by setting the `_sm_deleted` boolean metadata column.
*** ## Prerequisites Before you begin, ensure you have: * **Supported ClickHouse Implementations:** * ClickHouse Cloud * ClickHouse Server: * 21.8 or higher. * Ensure that the ClickHouse HTTP interface (default port 8123) is enabled. [ClickHouse Cluster](https://clickhouse.com/docs/architecture/cluster-deployment) deployment is not yet supported. * **Database Admin Access:** Sufficient privileges to create users, grant permissions. This might require `admin` privileges. * **Network Connectivity:** Ensure that Supermetal's services can reach your ClickHouse database. You may need to configure security groups. *** ## Setup ### Create a user & grant permissions
#### Visit SQL console \[!toc] Log in to [ClickHouse Cloud](https://clickhouse.cloud/) to access the SQL console and select your service.
#### Create a user \[!toc] ```sql CREATE USER IF NOT EXISTS 'supermetal_user' IDENTIFIED WITH sha256_password BY 'strong-password'; ``` Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the target in Supermetal.
#### Create database \[!toc] ```sql CREATE DATABASE IF NOT EXISTS target_database; ``` Replace `target_database` with the name of the database you want to use for storing data.
#### Grant permissions \[!toc] ```sql GRANT CREATE DATABASE ON target_database.*, CREATE TABLE ON target_database.*, CREATE TEMPORARY TABLE ON *.*, S3 ON *.*, AZURE ON *.*, INSERT ON target_database.*, ALTER TABLE ON target_database.*, SELECT ON target_database.*, TRUNCATE ON target_database.* TO 'supermetal_user'; ``` Replace `target_database` with the name of the database created in the previous step.
#### Connection Details \[!toc]

You'll need the following connection details to configure the target in Supermetal:

* HTTPS URL (e.g. `https://<service>.<region>.clickhouse.cloud:8443`)
* Username and password you created

You may need to add the IP address of the machine running Supermetal to the ClickHouse Cloud IP whitelist from the Service Settings -> Security -> IP Access List.
#### Create a user \[!toc]

Log in to the ClickHouse server as an admin user to create a user for Supermetal.

```sql
CREATE USER IF NOT EXISTS 'supermetal_user' IDENTIFIED WITH sha256_password BY 'strong-password';
```

Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the target in Supermetal.
#### Create database \[!toc] ```sql CREATE DATABASE IF NOT EXISTS target_database; ``` Replace `target_database` with the name of the database you want to use for storing data.
#### Grant permissions \[!toc] ```sql GRANT CREATE DATABASE ON target_database.*, CREATE TABLE ON target_database.*, CREATE TEMPORARY TABLE ON *.*, S3 ON *.*, AZURE ON *.*, INSERT ON target_database.*, ALTER TABLE ON target_database.*, SELECT ON target_database.*, TRUNCATE ON target_database.* TO 'supermetal_user'; ``` Replace `target_database` with the name of the database created in the previous step.
#### Connection Details \[!toc] You'll need the following connection details to configure the target in Supermetal: * HTTP interface URL (e.g. `https://hostname:port`) * Username and password you created
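Before configuring the target, you can optionally sanity-check the new user from the SQL console. A quick verification sketch, assuming the user and database names used above:

```sql
-- Verify the privileges granted to the Supermetal user
SHOW GRANTS FOR supermetal_user;

-- Confirm the target database exists and is visible
SHOW DATABASES;
```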
*** ## Table Engine(s) Supermetal automatically creates tables in ClickHouse with appropriate engines * Tables without primary keys use the `MergeTree` engine * Tables with primary keys use the `ReplacingMergeTree` engine with version tracking (`_sm_version`) *** ## Data Types Mapping | Apache Arrow DataType | ClickHouse Type | Notes | | :-------------------- | :-------------- | :---- | | `Int8` | `Int8` | | | `Int16` | `Int16` | | | `Int32` | `Int32` | | | `Int64` | `Int64` | | | `UInt8` | `UInt8` | | | `UInt16` | `UInt16` | | | `UInt32` | `UInt32` | | | `UInt64` | `UInt64` | | | `Float16` | `Float32` | | | `Float32` | `Float32` | | | `Float64` | `Float64` | | | `Decimal128(p,s)` | `Decimal(p,s)` | | | `Decimal256(p,s)` | `Decimal(p,s)` | | | Apache Arrow DataType | ClickHouse Type | Notes | | :--------------------- | :-------------------- | :-------------------------- | | `Date32` | `Date32` | | | `Date64` | `DateTime64(3)` | | | `Timestamp(s, [tz])` | `DateTime(0, [tz])` | | | `Timestamp(ms, [tz])` | `DateTime64(3, [tz])` | | | `Timestamp(us, [tz])` | `DateTime64(6, [tz])` | | | `Timestamp(ns, [tz])` | `DateTime64(9, [tz])` | | | `Time32(ms)` | `DateTime64(3)` | | | `Time64(us)` | `DateTime64(6)` | | | `Time64(ns)` | `DateTime64(9)` | | | `Duration`, `Interval` | `String` | Not supported by ClickHouse | | Apache Arrow DataType | ClickHouse Type | Notes | | :-------------------- | :-------------- | :---- | | `Utf8`, `LargeUtf8` | `String` | | | Apache Arrow DataType | ClickHouse Type | Notes | | :-------------------- | :-------------- | :---- | | `Boolean` | `Boolean` | | | Apache Arrow DataType | ClickHouse Type | Notes | | :--------------------------------------------------------------------------------------------------------- | :-------------- | :------------------------------------------------------------------------------------------------------------------------------------------ | | `Struct` | `Tuple` | | | `Map` | `Map` | | | `Utf8` [JSON Extension](https://arrow.apache.org/docs/format/CanonicalExtensions.html#json) (`arrow.json`) | ~~`JSON`~~ | [JSON Data Type](https://clickhouse.com/docs/sql-reference/data-types/newjson) is not supported (as it cannot ingest arbitrary json arrays) | | Apache Arrow DataType | ClickHouse Type | Notes | | :----------------------------------------- | :-------------- | :-------------------------------------- | | `Binary`, `LargeBinary`, `FixedSizeBinary` | `String` | ClickHouse does not have a binary type | | Apache Arrow DataType | ClickHouse Type | Notes | | :-------------------------------------------- | :-------------- | :---- | | `List`, `LargeList`, `FixedSizeList` | `Array(T)` | | Nullable columns are marked as nullable in ClickHouse using the `Nullable(Type)` wrapper. Supermetal preserves ClickHouse's [Arrow data type mappings](https://clickhouse.com/docs/interfaces/formats/Arrow#data-types-matching). # ClickHouse Setup (/docs/main/targets/clickhouse/setup) ## Prerequisites Before you begin, ensure you have: * **Supported ClickHouse Implementations:** * ClickHouse Cloud * ClickHouse Server: * 21.8 or higher. * Ensure that the ClickHouse HTTP interface (default port 8123) is enabled. [ClickHouse Cluster](https://clickhouse.com/docs/architecture/cluster-deployment) deployment is not yet supported. * **Database Admin Access:** Sufficient privileges to create users, grant permissions. This might require `admin` privileges. * **Network Connectivity:** Ensure that Supermetal's services can reach your ClickHouse database. 
You may need to configure security groups. *** ## Setup ### Create a user & grant permissions
#### Visit SQL console \[!toc] Log in to [ClickHouse Cloud](https://clickhouse.cloud/) to access the SQL console and select your service.
#### Create a user \[!toc] ```sql CREATE USER IF NOT EXISTS 'supermetal_user' IDENTIFIED WITH sha256_password BY 'strong-password'; ``` Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the target in Supermetal.
#### Create database \[!toc] ```sql CREATE DATABASE IF NOT EXISTS target_database; ``` Replace `target_database` with the name of the database you want to use for storing data.
#### Grant permissions \[!toc] ```sql GRANT CREATE DATABASE ON target_database.*, CREATE TABLE ON target_database.*, CREATE TEMPORARY TABLE ON *.*, S3 ON *.*, AZURE ON *.*, INSERT ON target_database.*, ALTER TABLE ON target_database.*, SELECT ON target_database.*, TRUNCATE ON target_database.* TO 'supermetal_user'; ``` Replace `target_database` with the name of the database created in the previous step.
#### Connection Details \[!toc]

You'll need the following connection details to configure the target in Supermetal:

* HTTPS URL (e.g. `https://<service>.<region>.clickhouse.cloud:8443`)
* Username and password you created

You may need to add the IP address of the machine running Supermetal to the ClickHouse Cloud IP whitelist from the Service Settings -> Security -> IP Access List.
#### Create a user \[!toc]

Log in to the ClickHouse server as an admin user to create a user for Supermetal.

```sql
CREATE USER IF NOT EXISTS 'supermetal_user' IDENTIFIED WITH sha256_password BY 'strong-password';
```

Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the target in Supermetal.
#### Create database \[!toc] ```sql CREATE DATABASE IF NOT EXISTS target_database; ``` Replace `target_database` with the name of the database you want to use for storing data.
#### Grant permissions \[!toc] ```sql GRANT CREATE DATABASE ON target_database.*, CREATE TABLE ON target_database.*, CREATE TEMPORARY TABLE ON *.*, S3 ON *.*, AZURE ON *.*, INSERT ON target_database.*, ALTER TABLE ON target_database.*, SELECT ON target_database.*, TRUNCATE ON target_database.* TO 'supermetal_user'; ``` Replace `target_database` with the name of the database created in the previous step.
#### Connection Details \[!toc] You'll need the following connection details to configure the target in Supermetal: * HTTP interface URL (e.g. `https://hostname:port`) * Username and password you created
# Architecture (/docs/main/concepts/architecture) import SupermetalHighLevelArchitectureDiagram from '../../../../../app/diagrams/high_level_architecture'; import ConsistentPoint from '../../../../../app/diagrams/consistent_point'; import Snapshot from '../../../../../app/diagrams/snapshot'; Supermetal's architecture centers around three core principles: preserving data accuracy through Apache Arrow's type system, maintaining transactional consistency across the entire pipeline, and processing everything within a single process to massively reduce serialization overhead. This document covers how Supermetal establishes consistent points in source databases, parallelizes snapshot operations, handles continuous CDC, and manages state to ensure reliable data delivery. *** ## Overview Supermetal operates as a single, self-contained process that manages the entire data pipeline from source to target. Supermetal is built on two primary technologies: 1. **Rust**: A systems programming language chosen for its performance, memory safety, and lack of garbage collection. Rust provides: * C/C++-level predictable performance without the memory safety issues * Efficient concurrency model * Strong type system preventing common bugs 2. **Apache Arrow**: A columnar in-memory format serving as the universal data representation within Supermetal: * Optimized for fast data access and processing * Rich type system that preserves data accuracy * Zero-copy read capabilities * Efficient serialization to other downstream formats The combination of Rust and Arrow creates a powerful synergy: Rust provides the safe, high-performance execution environment, while Arrow provides efficient, standardized data representation. *** ## Key Differentiators ### Data Format Supermetal uses Apache Arrow's rich type system, ensuring: * Precise representation of database types * Preservation of precision, scale, signedness * Robust schema evolution This contrasts with conventional tools relying on JSON, CSV, or even Avro, which often collapse or simplify types to a set of base super types leading to data storage and compute inefficiencies. ### Protocol Efficiency Supermetal runs in a single process, moving data in a streaming manner using Apache Arrow record batches. This eliminates: * Inter Process Communication (IPC) * Intermediate Serialization/deserialization overhead * Text-based protocol inefficiencies (JSON parsing) * Excessive network hops for data exchange This contrasts with conventional tools that decouple source and target into distinct processes using IPC to exchange data in text-based formats (JSON). ### Transactional Consistency Supermetal ensures that at any given point, target state reflects a consistent point-in-time from the upstream source database: * Maintains transactional boundaries from the source system * When a transaction modifies multiple tables, changes are applied such that the target reflects a consistent state * Can batch multiple transactions together and apply them as a single atomic transaction on target for better throughput ### Infrastructure Simplicity Supermetal runs as a single, self-contained agent with no external dependencies and separates data plane from control plane. 
This contrasts with conventional tools that often require additional components such as:

* Message brokers (Kafka)
* External runtimes (JVM)
* Complex container orchestration

***

## Connector

A Supermetal deployment typically involves one or more connectors, each responsible for moving data from a specific source to a specific target. The architecture is designed for streaming data, processing changes in real-time from source to target.

### Replication phases

The Supermetal connector manages the end-to-end process of Change Data Capture (CDC), from initial data synchronization to continuous, real-time updates. This typically involves a two-phase approach (an initial snapshot followed by continuous CDC), broken down into the following stages:

* [**Establishing a Consistent Point:**](#establishing-a-consistent-point) Establishes a consistent point in the source database (e.g., a specific WAL position or LSN)
* [**Initial Snapshot:**](#parallel-snapshot-processing) Captures all existing tables and data
* [**Real-time Change Data Capture (CDC):**](#continuous-change-data-capture-cdc) Continuously captures changes by reading the database write-ahead log (WAL)
* [**Target-Optimized Loading & Deduplication:**](#target-optimized-loading--deduplication) Applies captured changes to the target database using target-optimized (bulk-load) formats and deduplicates them while maintaining transactional integrity if the target database supports it.

### Establishing a Consistent Point

When a connector starts, Supermetal invokes a source-database-specific workflow that performs the following steps:

1. Establishes a consistent point in the source database (e.g., a specific WAL position or LSN)
2. Discovers all tables to be replicated and their schema
3. Performs the initial snapshot of existing data.
4. Switches to reading the WAL from the previously established consistent point.

The established consistent point is recorded (checkpointed) in the [state store](#state-management) only after the snapshot step is complete. This approach ensures that no transactions are missed during the transition from snapshot to CDC, providing complete data consistency.

### Parallel Snapshot Processing

Supermetal dramatically reduces initial synchronization time through advanced parallelization:

* **Inter-Table Parallelization**: Multiple tables are snapshotted simultaneously
* **Intra-Table Parallelization**: Large tables are automatically split into chunks that are processed in parallel
* **Streaming Approach**: Each chunk is processed in a streaming fashion, preventing memory overload even for large tables

This parallelization strategy can reduce snapshot times by orders of magnitude compared to traditional tools, especially for large databases with many tables.

### Continuous Change Data Capture (CDC)

After completing the initial snapshot, Supermetal:

* Begins reading the WAL from the consistent point established earlier
* Decodes log entries into changes (inserts, updates, deletes)
* Maintains transaction boundaries to ensure consistency
* Provides near real-time propagation of changes to the target
* Records progress (checkpoints) in the [state store](#state-management) so processing can resume after failures or process restarts.
### Target-Optimized Loading & Deduplication

On the target side, Supermetal intelligently optimizes the data loading strategy:

* Automatically selects the most efficient data format for each target database
* Uses columnar formats (e.g., Parquet) for warehouses such as Snowflake, Databricks, and ClickHouse that ingest them efficiently, and native binary formats for transactional targets (Postgres, MySQL, SQL Server) through bulk-loading APIs when available
* Leverages direct integration between object stores and data warehouses to perform direct (zero-copy) loads from the files in the [object store buffer](#buffer)
* Applies changes in a transactionally consistent manner if the target database supports it
* Transparently handles deduplication (merge/upsert operations) by tracking metadata about the source transactions

### Buffer

Supermetal includes an optional buffering mechanism to stage data between source and target components:

* Can buffer and write batched transactions to object storage (e.g., S3, GCS, Azure Blob) or any filesystem
* Decouples source and target, allowing them to operate at independent rates
* Prevents backpressure on the source when it produces data faster than the target can consume
* Enhances resilience against target unavailability

### State Management

Supermetal decouples state tracking from the data plane and ensures reliability and recovery through efficient state management:

* **Transaction-Aware Checkpointing**: Tracks progress at transaction boundaries end-to-end (source to destination).
* **Automatic Recovery**: After any interruption or crash, automatically resumes from the last successfully processed transaction
* **Consistent Resumption**: When restarting, continues from exactly where it left off without data loss or duplication
* **Minimal Reprocessing**: Only processes data that hasn't been successfully delivered to the target

***

## Data Model

The choice of data model is fundamental to Supermetal's performance and data accuracy.

### Apache Arrow as Universal Format

Supermetal uses Apache Arrow RecordBatches as its in-memory representation for data flowing through the system. Source components read data into Arrow RecordBatches, which can then be forwarded in a zero-copy manner to sinks or efficiently converted to a downstream format optimal for the target system, such as Parquet.

Arrow's columnar structure stores data column by column rather than row by row. This approach provides significant advantages for data movement and ETL processes, including more efficient memory usage, better compression potential, and faster conversion to target formats. The layout is particularly well-suited for data integration, as it allows for fast zero-copy data exchange while preserving accuracy through Arrow's rich type system.

### Schema Mapping and Evolution

Supermetal maintains precise mappings between source/target database types and Arrow data types.

* **Accurate Type Representation**: Each source and target database type maps to a corresponding Arrow type that preserves its characteristics (precision, scale, signedness, temporal resolution)
* **Robust Schema Evolution**: When a source schema changes, Supermetal leverages Arrow's schema capabilities to detect these changes, propagate them through the pipeline, and adapt target systems accordingly

## Concurrency Model

Supermetal implements a concurrency model built on the asynchronous Tokio runtime that scales naturally with incoming data volume, allowing it to efficiently handle multiple data streams simultaneously.
### Streaming, Task-Based Processing Data flows through Supermetal in a streaming fashion: * **Real-Time Processing**: Data is processed as it becomes available rather than in large batch jobs * **Chunked Processing**: Information moves through the system in chunks (RecordBatches) of configurable size * **End-to-End Streaming**: From initial capture to target loading, the entire pipeline operates in a streaming manner **Benefits of this Concurrency Model:** * **Low Latency**: Minimizes the delay between source changes and target updates * **Memory Efficiency**: Streaming approach avoids buffering entire datasets in memory * **Flexible Throughput**: Automatically adapts to varying data volumes and rates * **Scalability**: A single Supermetal instance can efficiently handle connections to numerous sources and targets # Snowflake (/docs/main/targets/snowflake) import { Tab, Tabs } from 'fumadocs-ui/components/tabs'; import { Callout } from 'fumadocs-ui/components/callout'; import { Accordion, Accordions } from 'fumadocs-ui/components/accordion'; Snowflake is a popular cloud data warehouse that separates compute and storage. This guide walks you through configuring your Snowflake warehouse to work seamlessly with Supermetal. *** ## Features
**Bulk Insert**: Uses the [new vectorized parquet scanner](https://www.snowflake.com/en/engineering-blog/loading-terabytes-into-snowflake-speeds-feeds-techniques/) to efficiently load data.

**Transactionally Consistent**: Guarantees that the data synced reflects a consistent point in the upstream database's timeline, respecting transaction boundaries.

**Schema Evolution**: Automatic schema evolution.

**Soft Delete(s)**: Row deletion is handled by setting the `_sm_deleted` boolean metadata column.
***

## Prerequisites

Before you begin, ensure you have:

* **Supported Snowflake Implementations:**
  * AWS
  * Azure

Snowflake on GCP is not yet supported.

* **Database Admin Access:** Sufficient privileges to create users, grant permissions. This might require `ACCOUNTADMIN` / `SYSADMIN` / `SECURITYADMIN` privileges.

***

## Setup

### Create a Role

```sql
begin;

set supermetal_role = 'SUPERMETAL_ROLE';

-- create Supermetal role
use role ACCOUNTADMIN;
create role if not exists identifier($supermetal_role);
grant role identifier($supermetal_role) to role SYSADMIN;

-- grant permissions to create database
grant CREATE DATABASE ON ACCOUNT TO ROLE identifier($supermetal_role);

commit;
```

### Create a User

```sql
begin;

set supermetal_role = 'SUPERMETAL_ROLE';
set supermetal_username = 'SUPERMETAL_USER';
set supermetal_password = 'strong-password';

-- create Supermetal user
use role ACCOUNTADMIN;
create user if not exists identifier($supermetal_username)
  default_role = $supermetal_role
  password = $supermetal_password;
grant role identifier($supermetal_role) to user identifier($supermetal_username);

commit;
```

Replace `'strong-password'` with a secure, unique password for the Supermetal user. Store this password securely, as you'll need it when configuring the target in Supermetal.

Snowflake is [deprecating password authentication](https://community.snowflake.com/s/article/FAQ-Snowflake-to-block-password-only-logins). Please use other authentication methods instead.

### Create a User

```sql
begin;

set supermetal_role = 'SUPERMETAL_ROLE';
set supermetal_username = 'SUPERMETAL_USER';

-- create Supermetal user
use role ACCOUNTADMIN;
create user if not exists identifier($supermetal_username)
  default_role = $supermetal_role
  type = SERVICE;
grant role identifier($supermetal_role) to user identifier($supermetal_username);

commit;
```

### Create a Key Pair

Follow the Snowflake [documentation](https://docs.snowflake.com/en/user-guide/key-pair-auth#configuring-key-pair-authentication) to:

* Generate a private key and a public key
* Assign the key pair to the Supermetal user: `ALTER USER SUPERMETAL_USER SET RSA_PUBLIC_KEY='MIIBIjANBgkqh...';`

### (Optional) Create a Warehouse

```sql
begin;

set supermetal_role = 'SUPERMETAL_ROLE';
set supermetal_username = 'SUPERMETAL_USER';
set supermetal_warehouse = 'SUPERMETAL_WAREHOUSE';

-- change role to sysadmin to create warehouse
use role sysadmin;

-- create Supermetal warehouse
create warehouse if not exists identifier($supermetal_warehouse)
  warehouse_size = xsmall
  warehouse_type = standard
  auto_suspend = 60
  auto_resume = true
  initially_suspended = true;

-- uncomment to use Generation 2 Warehouse if supported
-- ALTER WAREHOUSE identifier($supermetal_warehouse) SET RESOURCE_CONSTRAINT = STANDARD_GEN_2;

-- grant Supermetal warehouse access
grant USAGE on warehouse identifier($supermetal_warehouse) to role identifier($supermetal_role);

commit;
```

***

## Transactions

When enabled, Supermetal uses Snowflake's [transactions](https://docs.snowflake.com/en/sql-reference/transactions) to apply changes atomically across multiple tables. This preserves transaction boundaries from your source database when possible, ensuring that all changes within a single source transaction are either fully applied or not applied at all.
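To illustrate the pattern (this is a hand-written sketch with hypothetical table names, not the exact statements Supermetal generates), changes captured from one source transaction are applied inside a single Snowflake transaction:

```sql
-- Sketch: changes from one source transaction are applied together, so
-- readers never observe an order without its line items.
BEGIN;

MERGE INTO orders USING staged_orders
  ON orders.id = staged_orders.id
  WHEN MATCHED THEN UPDATE SET status = staged_orders.status
  WHEN NOT MATCHED THEN INSERT (id, status)
    VALUES (staged_orders.id, staged_orders.status);

MERGE INTO order_items USING staged_order_items
  ON order_items.id = staged_order_items.id
  WHEN MATCHED THEN UPDATE SET qty = staged_order_items.qty
  WHEN NOT MATCHED THEN INSERT (id, order_id, qty)
    VALUES (staged_order_items.id, staged_order_items.order_id, staged_order_items.qty);

COMMIT;
```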
*** ## Data Types Mapping | Apache Arrow DataType | Snowflake Type | Notes | | :----------------------------------------------------------------------- | :------------- | :------------------------- | | `Int8`, `Int16`, `Int32`, `Int64`, `UInt8`, `UInt16`, `UInt32`, `UInt64` | `NUMBER` | | | `Float16`, `Float32`, `Float64` | `FLOAT` | | | `Decimal128(p,s)` | `NUMBER(p,s)` | if `p <= 38` and `s <= 37` | | `Decimal256(p,s)` | `VARCHAR` | if `p > 38` | | Apache Arrow DataType | Snowflake Type | Notes | | :--------------------- | :----------------- | :---- | | `Date32`, `Date64` | `DATE` | | | `Time32(ms)` | `TIME(0)` | | | `Time64(us)` | `TIME(6)` | | | `Time64(ns)` | `TIME(9)` | | | `Timestamp(s, [tz])` | `TIMESTAMP_NTZ(0)` | | | `Timestamp(ms, [tz])` | `TIMESTAMP_NTZ(3)` | | | `Timestamp(us, [tz])` | `TIMESTAMP_NTZ(6)` | | | `Timestamp(ns, [tz])` | `TIMESTAMP_NTZ(9)` | | | `Duration`, `Interval` | `VARCHAR` | | | Apache Arrow DataType | Snowflake Type | Notes | | :-------------------- | :------------- | :---- | | `Utf8`, `LargeUtf8` | `VARCHAR` | | | Apache Arrow DataType | Snowflake Type | Notes | | :-------------------- | :------------- | :---- | | `Boolean` | `BOOLEAN` | | | Apache Arrow DataType | Snowflake Type | Notes | | :--------------------------------------------------------------------------------------------------------- | :------------- | :---- | | `Struct` | `VARIANT` | | | `Map` | `VARIANT` | | | `Utf8` [JSON Extension](https://arrow.apache.org/docs/format/CanonicalExtensions.html#json) (`arrow.json`) | `VARIANT` | | | Apache Arrow DataType | Snowflake Type | Notes | | :----------------------------------------- | :------------- | :---- | | `Binary`, `LargeBinary`, `FixedSizeBinary` | `BINARY` | | | Apache Arrow DataType | Snowflake Type | Notes | | :----------------------------------- | :------------- | :---- | | `List`, `LargeList`, `FixedSizeList` | `VARIANT` | | VARCHAR & BINARY data types are limited to 16MB and might be truncated ([TRUNCATECOLUMNS](https://docs.snowflake.com/en/sql-reference/sql/copy-into-table#copy-options-copyoptions) option). *** # Snowflake Setup (/docs/main/targets/snowflake/setup) ## Prerequisites Before you begin, ensure you have: * **Supported Snowflake Implementations:** * AWS * Azure Snowflake on GCP is not yet supported. * **Database Admin Access:** Sufficient privileges to create users, grant permissions. This might require `ACCOUNTADMIN` / `SYSADMIN` / `SECURITYADMIN` privileges. *** ## Setup ### Create a Role ```sql begin; set supermetal_role = 'SUPERMETAL_ROLE'; -- create Supermetal role use role ACCOUNTADMIN; create role if not exists identifier($supermetal_role); grant role identifier($supermetal_role) to role SYSADMIN; -- grant permissions to create database grant CREATE DATABASE ON ACCOUNT TO ROLE identifier($supermetal_role); commit; ``` ### Create a User ```sql begin; set supermetal_role = 'SUPERMETAL_ROLE'; set supermetal_username = 'SUPERMETAL_USER'; set supermetal_password = 'strong-password'; -- create Supermetal user use role ACCOUNTADMIN; create user if not exists identifier($supermetal_username) default_role = $supermetal_role password = $supermetal_password; grant role identifier($supermetal_role) to user identifier($supermetal_username); commit; ``` Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the target in Supermetal. 
Snowflake is [deprecating password authentication](https://community.snowflake.com/s/article/FAQ-Snowflake-to-block-password-only-logins). Please use other authentication methods instead.

### Create a User

```sql
begin;

set supermetal_role = 'SUPERMETAL_ROLE';
set supermetal_username = 'SUPERMETAL_USER';

-- create Supermetal user
use role ACCOUNTADMIN;
create user if not exists identifier($supermetal_username)
  default_role = $supermetal_role
  type = SERVICE;
grant role identifier($supermetal_role) to user identifier($supermetal_username);

commit;
```

### Create a Key Pair

Follow the Snowflake [documentation](https://docs.snowflake.com/en/user-guide/key-pair-auth#configuring-key-pair-authentication) to:

* Generate a private key and a public key
* Assign the key pair to the Supermetal user: `ALTER USER SUPERMETAL_USER SET RSA_PUBLIC_KEY='MIIBIjANBgkqh...';`

### (Optional) Create a Warehouse

```sql
begin;

set supermetal_role = 'SUPERMETAL_ROLE';
set supermetal_username = 'SUPERMETAL_USER';
set supermetal_warehouse = 'SUPERMETAL_WAREHOUSE';

-- change role to sysadmin to create warehouse
use role sysadmin;

-- create Supermetal warehouse
create warehouse if not exists identifier($supermetal_warehouse)
  warehouse_size = xsmall
  warehouse_type = standard
  auto_suspend = 60
  auto_resume = true
  initially_suspended = true;

-- uncomment to use Generation 2 Warehouse if supported
-- ALTER WAREHOUSE identifier($supermetal_warehouse) SET RESOURCE_CONSTRAINT = STANDARD_GEN_2;

-- grant Supermetal warehouse access
grant USAGE on warehouse identifier($supermetal_warehouse) to role identifier($supermetal_role);

commit;
```

# SQL Server (/docs/main/sources/sqlserver)

import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Callout } from 'fumadocs-ui/components/callout';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
import { CreateAdminUser, CreateMinimalUser, CreateDBOwnerUser, EnableCDCTable, StandardEnableCDCDB, StandardConfigureCDCJob, RDSEnableCDCDB, RDSConfigureCDCJob, AzureConfigureCDCJob, AzureEnableCDCDB } from '@supermetal/docs-components';

SQL Server's Change Data Capture (CDC) through capture tables enables reliable, low-impact change capture for data integration. Supermetal builds on this foundation to deliver efficient data extraction and replication with minimal source database overhead.

This guide covers features and configuration steps required to connect SQL Server with Supermetal, focusing on CDC setup and permissions.

***

## Features
**Initial Data Sync**: Performs fast initial data loads (snapshots) and historical backfills by chunking data within each table and reading each chunk concurrently using multiple connections:

* **Physical Partitioning**: Automatically detects and utilizes existing SQL Server table partitions, extracting data from each partition in parallel
* **Dynamic Range Partitioning**: For non-partitioned tables, identifies suitable partition columns (clustered indexes or primary keys) and creates optimal query boundaries based on data distribution

**Change Data Capture**: Captures data changes (inserts, updates, deletes) in real-time using SQL Server's native [CDC functionality](https://learn.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-data-capture-sql-server). Leverages the [SQL Server Agent](https://learn.microsoft.com/en-us/sql/ssms/agent/sql-server-agent) to asynchronously populate dedicated [CDC capture tables](https://learn.microsoft.com/en-us/sql/relational-databases/system-tables/change-data-capture-tables-transact-sql) from transaction log entries.

**CDC Processing Modes**:

* **Standard CDC Mode**: Processes changes from individual CDC capture tables with minimal latency. This mode delivers lower-latency data replication and is suitable when you don't need strict transactional consistency across multiple tables.
* **Strictly Transactional CDC Mode**: When enabled (`transactional_cdc_enabled: true`), Supermetal guarantees that all changes across multiple tables are processed while preserving their original transaction boundaries. This ensures that related data changes stay grouped together exactly as they were in the source database. For example, when an order and its line items were updated together, they remain grouped together during replication.

**Configurable CDC Processing**: Fine-tune CDC behavior with configurable parameters for permissions/roles, polling intervals, maximum scan cycles, and batch sizes.

**Catalog Support**: Ability to configure which schemas, tables and columns to sync.
***

## Prerequisites

Before you begin, ensure you have:

* **SQL Server Requirements:**
  * **Version:** SQL Server 2016 or higher
  * **Supported Editions:**
    * `Standard SQL Server` (Enterprise, Standard, Personal, Azure SQL Managed Instance)
    * `Azure SQL Database`
    * `Amazon RDS for SQL Server`

CDC is not supported on `Azure Synapse`, `Azure SQL Edge`, or `Azure Synapse Serverless`.

* **SQL Server Agent:** The [SQL Server Agent](https://learn.microsoft.com/en-us/sql/ssms/agent/sql-server-agent) service must be running, as it manages the CDC capture jobs.
* **Setup Requirements:**
  * **Database Permissions:** `sysadmin` or `db_owner` role for CDC setup (see [Setup](#setup) for role-specific instructions)
  * **Network Connectivity:** Ensure Supermetal can reach your SQL Server (default port: 1433)

***

## Setup

### Permissions Overview

Supermetal's SQL Server integration provides different levels of Change Data Capture (CDC) automation depending on your SQL Server edition and user permissions. The table below summarizes the automation capabilities:

| SQL Server Edition | User Role | Automation Capabilities |
| :-- | :-- | :-- |
| Standard | `sysadmin` | Full CDC workflow automation (database CDC enablement, table CDC enablement, job configuration) |
| Standard | `db_owner` | Partial automation (table CDC enablement, job configuration) |
| Standard | Regular user | Limited automation (requires manual setup by a privileged user) |
| Azure SQL Database | `db_owner` | Full CDC workflow automation (all CDC operations) |
| Azure SQL Database | Regular user | Limited automation (requires manual setup by a privileged user) |
| AWS RDS | `db_owner` | Partial automation (table CDC enablement, job configuration) |
| AWS RDS | Regular user | Limited automation (requires manual setup by a privileged user) |

We recommend creating a `db_owner` user for all SQL Server editions as it provides the best tradeoff between automation and permissions.

### Setup Instructions

#### SYSADMIN Role \[!toc]

The `sysadmin` server role provides the highest level of permissions, enabling Supermetal to fully automate the CDC workflow, including enabling CDC at the database level, configuring tables for CDC, and managing CDC jobs.

The `sysadmin` role is only available on Standard SQL Server editions. Azure SQL Database and AWS RDS restrict the use of `sysadmin` privileges.
#### Create a SYSADMIN User \[!toc] AWS RDS for SQL Server does not allow the creation of `sysadmin` users. Please use the DB\_OWNER role instead. Azure SQL Database does not support the `sysadmin` role. Please use the DB\_OWNER role instead. Replace `'strong-password'` with a secure, unique password for the Supermetal user. Store this password securely, as you'll need it when configuring the source in Supermetal.
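For reference, a representative T-SQL sketch of this step (the login name and password are placeholders following the conventions in this guide; the script rendered in the docs UI may differ):

```sql
-- Sketch: create a Supermetal login with the sysadmin server role
-- (Standard SQL Server only; not permitted on Azure SQL Database or AWS RDS).
CREATE LOGIN supermetal_user WITH PASSWORD = 'strong-password';
ALTER SERVER ROLE sysadmin ADD MEMBER supermetal_user;
```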
#### DB\_OWNER Role \[!toc] The `db_owner` database role provides sufficient permissions for most CDC operations except enabling CDC at the database level on Standard SQL Server and AWS RDS (this step must be performed by a `sysadmin`). On Azure SQL Database, `db_owner` can automate the entire CDC workflow.
#### Create a DB\_OWNER User \[!toc] Replace the placeholder values in the script with your actual information: * `'strong-password'`: Replace with a secure, unique password for the `supermetal_user`. * `DATABASE_NAME`: The name of the database you want to replicate from.
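A representative T-SQL sketch of this step, using the placeholders described above (the script rendered in the docs UI may differ):

```sql
-- Sketch: create a login, map it to a user in the replicated database,
-- and add that user to the db_owner role.
CREATE LOGIN supermetal_user WITH PASSWORD = 'strong-password';
GO
USE DATABASE_NAME;
GO
CREATE USER supermetal_user FOR LOGIN supermetal_user;
ALTER ROLE db_owner ADD MEMBER supermetal_user;
```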
#### Enable Snapshot Isolation \[!toc]

Enable snapshot isolation on the database to ensure consistent reads during CDC operations. This step must be performed by a `sysadmin` user.

Not required for Azure SQL Database. Supermetal automatically manages snapshot isolation enablement.

Replace `DATABASE_NAME` with the name of the database you want to replicate from.
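This is the standard T-SQL for that setting, using the `DATABASE_NAME` placeholder above:

```sql
-- Enable snapshot isolation so snapshot reads see a consistent view of the data.
ALTER DATABASE DATABASE_NAME SET ALLOW_SNAPSHOT_ISOLATION ON;
```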
#### Enable CDC at Database Level \[!toc] This step is required for Standard SQL Server and AWS RDS. It creates the CDC schema and starts the SQL Server Agent jobs necessary for CDC. This step must be performed by a user with `sysadmin` privileges. Not required for Azure SQL Database. Supermetal automatically manages CDC enablement at the database level.
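Database-level CDC enablement uses SQL Server's documented stored procedure; a minimal sketch with the `DATABASE_NAME` placeholder:

```sql
-- Enable CDC for the database: creates the cdc schema, CDC system tables,
-- and the SQL Server Agent capture and cleanup jobs.
USE DATABASE_NAME;
GO
EXEC sys.sp_cdc_enable_db;
```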
#### MINIMAL\_ACCESS\_USER Role \[!toc] A minimal access user has limited permissions and requires a user with higher privileges (such as `sysadmin` or `db_owner`) to set up the CDC environment. After setup, the minimal access user can be used for ongoing replication.
#### Create a Minimal Access User \[!toc] Replace the placeholder values in the script with your actual information: * `'strong-password'`: Replace with a secure, unique password for the `supermetal_user`. * `DATABASE_NAME`: The name of the database you want to replicate from. * `SCHEMA_NAME`: The schema of the table you want to replicate from. * `TABLE_NAME`: The name of the table you want to replicate from.
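A representative T-SQL sketch of a minimal read-only user (the exact grants in the script rendered in the docs UI may differ; the `cdc` schema grant mirrors the note in the steps below):

```sql
-- Sketch: read-only access to the replicated table and the CDC capture tables.
CREATE LOGIN supermetal_user WITH PASSWORD = 'strong-password';
GO
USE DATABASE_NAME;
GO
CREATE USER supermetal_user FOR LOGIN supermetal_user;
GRANT SELECT ON OBJECT::SCHEMA_NAME.TABLE_NAME TO supermetal_user;
GRANT SELECT ON SCHEMA::[cdc] TO supermetal_user;
```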
#### Enable Snapshot Isolation \[!toc] Enable snapshot isolation on the database to ensure consistent reads during CDC operations. This step must be performed by a user with appropriate privileges (typically `sysadmin` or `db_owner`). Replace `DATABASE_NAME` with the name of the database you want to replicate from.
#### Enable CDC at Database Level \[!toc]

This step must be performed by a user with appropriate privileges (typically `sysadmin` for Standard and RDS, `db_owner` for Azure).

If this user does not have SELECT permission on the database schema, you need to explicitly grant SELECT permission on the \[cdc] schema:

```sql
GRANT SELECT ON SCHEMA::[cdc] TO [supermetal_user];
```
#### Enable CDC for Tables The following query creates a capture instance for a specific table. This step must be performed by a user with appropriate privileges. Replace the placeholder values in the script with your actual information: * `SCHEMA_NAME`: The schema of the table you want to replicate from. * `TABLE_NAME`: The name of the table you want to replicate from.
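Table-level enablement uses SQL Server's documented procedure; a minimal sketch with the placeholders above (`@role_name = NULL` skips gating access to change data behind a database role):

```sql
USE DATABASE_NAME;
GO
-- Create a CDC capture instance for one table.
EXEC sys.sp_cdc_enable_table
  @source_schema = N'SCHEMA_NAME',
  @source_name   = N'TABLE_NAME',
  @role_name     = NULL;
```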
#### Configure CDC Job Parameters (Optional) You can customize the CDC job parameters to optimize performance based on your workload. Replace the parameter values with appropriate values for your environment: * `INSERT_MAX_SCANS_VALUE`: Maximum number of scan cycles (default: 10) * `INSERT_MAX_TRANS_VALUE`: Maximum number of transactions (default: 500) * `INSERT_POLL_INTERVAL_SECS_VALUE`: Polling interval in seconds (default: 5)
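Job tuning goes through SQL Server's documented procedures; a sketch using the defaults listed above (the capture job must be restarted for new values to take effect):

```sql
EXEC sys.sp_cdc_change_job
  @job_type        = N'capture',
  @maxscans        = 10,   -- INSERT_MAX_SCANS_VALUE
  @maxtrans        = 500,  -- INSERT_MAX_TRANS_VALUE
  @pollinginterval = 5;    -- INSERT_POLL_INTERVAL_SECS_VALUE

-- Restart the capture job so the new parameters are picked up.
EXEC sys.sp_cdc_stop_job  @job_type = N'capture';
EXEC sys.sp_cdc_start_job @job_type = N'capture';
```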
***

## Data Types Mapping

| SQL Server Type(s) | Apache Arrow DataType | Notes |
| :-- | :-- | :-- |
| `TINYINT` | `UInt8` | |
| `SMALLINT` | `Int16` | |
| `INT` | `Int32` | |
| `BIGINT` | `Int64` | |
| `SMALLMONEY` | `Decimal128(10, 4)` | Fixed precision with 4 decimal places. |
| `MONEY` | `Decimal128(19, 4)` | Fixed precision with 4 decimal places. |
| `DECIMAL(p,s)` | `Decimal128(p, s)` / `Decimal256(p, s)` | Uses SQL Server precision and scale directly. Decimal256 for precision > 38. |
| `NUMERIC(p,s)` | `Decimal128(p, s)` / `Decimal256(p, s)` | Same as DECIMAL. Default precision is 18. |
| `REAL` | `Float64` | |
| `FLOAT` | `Float64` | |

| SQL Server Type(s) | Apache Arrow DataType | Notes |
| :-- | :-- | :-- |
| `BIT` | `Boolean` | |

| SQL Server Type(s) | Apache Arrow DataType | Notes |
| :-- | :-- | :-- |
| `DATE` | `Date32` | Days since UNIX epoch. |
| `TIME` | `Time64(ns)` | Nanoseconds since midnight. |
| `SMALLDATETIME` | `Timestamp(s)` | Second-level precision, minute-level accuracy (rounds to nearest minute). |
| `DATETIME` | `Timestamp(ms)` | Millisecond precision, 3.33ms accuracy. |
| `DATETIME2(0)` | `Timestamp(s)` | Second precision. |
| `DATETIME2(1-3)` | `Timestamp(ms)` | Millisecond precision (1-3 fractional digits). |
| `DATETIME2(4-6)` | `Timestamp(μs)` | Microsecond precision (4-6 fractional digits). |
| `DATETIME2(7)` | `Utf8` | Mapped to string to preserve full range (years 1-9999). |
| `DATETIMEOFFSET(0)` | `Timestamp(s, "UTC")` | Second precision, normalized to UTC. |
| `DATETIMEOFFSET(1-3)` | `Timestamp(ms, "UTC")` | Millisecond precision (1-3 fractional digits), normalized to UTC. |
| `DATETIMEOFFSET(4-6)` | `Timestamp(μs, "UTC")` | Microsecond precision (4-6 fractional digits), normalized to UTC. |
| `DATETIMEOFFSET(7)` | `Utf8` | Mapped to string to preserve full range (years 0001-9999). |

| SQL Server Type(s) | Apache Arrow DataType | Notes |
| :-- | :-- | :-- |
| `CHAR`, `VARCHAR` | `Utf8` | |
| `NCHAR`, `NVARCHAR` | `Utf8` | Unicode character strings. |
| `TEXT` | `Utf8` | Legacy large text type ([deprecated](https://learn.microsoft.com/en-us/sql/t-sql/data-types/ntext-text-and-image-transact-sql)). Use VARCHAR(MAX) instead. |
| `NTEXT` | `Utf8` | Legacy large Unicode text type ([deprecated](https://learn.microsoft.com/en-us/sql/t-sql/data-types/ntext-text-and-image-transact-sql)). Use NVARCHAR(MAX) instead. |
| `UNIQUEIDENTIFIER` | `Utf8` | UUID/GUID represented as a string. |

| SQL Server Type(s) | Apache Arrow DataType | Notes |
| :-- | :-- | :-- |
| `BINARY`, `VARBINARY` | `Binary` | |
| `ROWVERSION` | `Binary` | Automatically incremented binary identifier. See [rowversion](https://learn.microsoft.com/en-us/sql/t-sql/data-types/rowversion-transact-sql). |
| `IMAGE` | `Binary` | Legacy large binary type ([deprecated](https://learn.microsoft.com/en-us/sql/t-sql/data-types/ntext-text-and-image-transact-sql)). Use VARBINARY(MAX) instead. |

| SQL Server Type(s) | Apache Arrow DataType |
| :-- | :-- |
| `XML` | `Utf8` |
| `GEOGRAPHY` | *Unsupported* |
| `GEOMETRY` | *Unsupported* |
| `HIERARCHYID` | *Unsupported* |

***

# SQL Server Setup (/docs/main/sources/sqlserver/setup)

## Prerequisites

Before you begin, ensure you have:

* **SQL Server Requirements:**
  * **Version:** SQL Server 2016 or higher
  * **Supported Editions:**
    * `Standard SQL Server` (Enterprise, Standard, Personal, Azure SQL Managed Instance)
    * `Azure SQL Database`
    * `Amazon RDS for SQL Server`

CDC is not supported on `Azure Synapse`, `Azure SQL Edge`, or `Azure Synapse Serverless`.

* **SQL Server Agent:** The [SQL Server Agent](https://learn.microsoft.com/en-us/sql/ssms/agent/sql-server-agent) service must be running, as it manages the CDC capture jobs.
* **Setup Requirements:**
  * **Database Permissions:** `sysadmin` or `db_owner` role for CDC setup (see [Setup](#setup) for role-specific instructions)
  * **Network Connectivity:** Ensure Supermetal can reach your SQL Server (default port: 1433)

***

## Setup

### Permissions Overview

Supermetal's SQL Server integration provides different levels of Change Data Capture (CDC) automation depending on your SQL Server edition and user permissions. The table below summarizes the automation capabilities:

| SQL Server Edition | User Role | Automation Capabilities |
| :-- | :-- | :-- |
| Standard | `sysadmin` | Full CDC workflow automation (database CDC enablement, table CDC enablement, job configuration) |
| Standard | `db_owner` | Partial automation (table CDC enablement, job configuration) |
| Standard | Regular user | Limited automation (requires manual setup by a privileged user) |
| Azure SQL Database | `db_owner` | Full CDC workflow automation (all CDC operations) |
| Azure SQL Database | Regular user | Limited automation (requires manual setup by a privileged user) |
| AWS RDS | `db_owner` | Partial automation (table CDC enablement, job configuration) |
| AWS RDS | Regular user | Limited automation (requires manual setup by a privileged user) |

We recommend creating a `db_owner` user for all SQL Server editions as it provides the best tradeoff between automation and permissions.

### Setup Instructions

#### SYSADMIN Role \[!toc]

The `sysadmin` server role provides the highest level of permissions, enabling Supermetal to fully automate the CDC workflow, including enabling CDC at the database level, configuring tables for CDC, and managing CDC jobs.

The `sysadmin` role is only available on Standard SQL Server editions. Azure SQL Database and AWS RDS restrict the use of `sysadmin` privileges.
#### Create a SYSADMIN User \[!toc] AWS RDS for SQL Server does not allow the creation of `sysadmin` users, and Azure SQL Database does not support the `sysadmin` role; on both platforms, use the DB\_OWNER role instead. Replace `'strong-password'` with a secure, unique password for the Supermetal user. Store this password securely, as you'll need it when configuring the source in Supermetal.
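The creation script is not included in this export; a minimal sketch, assuming the `supermetal_user` login name used elsewhere in this guide:

```sql
-- Create a SQL Server login for Supermetal and add it to the sysadmin server role
CREATE LOGIN supermetal_user WITH PASSWORD = 'strong-password';
ALTER SERVER ROLE sysadmin ADD MEMBER supermetal_user;
```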
#### DB\_OWNER Role \[!toc] The `db_owner` database role provides sufficient permissions for most CDC operations except enabling CDC at the database level on Standard SQL Server and AWS RDS (this step must be performed by a `sysadmin`). On Azure SQL Database, `db_owner` can automate the entire CDC workflow.
#### Create a DB\_OWNER User \[!toc] Replace the placeholder values in the script with your actual information: * `'strong-password'`: Replace with a secure, unique password for the `supermetal_user`. * `DATABASE_NAME`: The name of the database you want to replicate from.
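The script itself is not included in this export; a minimal sketch for Standard SQL Server and AWS RDS (on Azure SQL Database you would instead create a contained user with `CREATE USER ... WITH PASSWORD`):

```sql
-- Create a login, map it into the source database, and add it to db_owner
CREATE LOGIN supermetal_user WITH PASSWORD = 'strong-password';
GO
USE DATABASE_NAME;
GO
CREATE USER supermetal_user FOR LOGIN supermetal_user;
ALTER ROLE db_owner ADD MEMBER supermetal_user;
```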
#### Enable Snapshot Isolation \[!toc] Enable snapshot isolation on the database to ensure consistent reads during CDC operations. This step must be performed by a `sysadmin` user. Not required for Azure SQL Database, where Supermetal automatically manages snapshot isolation enablement. Replace `DATABASE_NAME` with the name of the database you want to replicate from.
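A sketch of the command, to be run by the `sysadmin` user:

```sql
ALTER DATABASE DATABASE_NAME SET ALLOW_SNAPSHOT_ISOLATION ON;
```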
#### Enable CDC at Database Level \[!toc] This step is required for Standard SQL Server and AWS RDS and must be performed by a user with `sysadmin` privileges. It creates the CDC schema and starts the SQL Server Agent jobs necessary for CDC. Not required for Azure SQL Database, where Supermetal automatically manages CDC enablement at the database level.
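The exact command is not shown in this export; a sketch, assuming the standard CDC procedures (`sys.sp_cdc_enable_db` on Standard SQL Server, the RDS-provided `msdb.dbo.rds_cdc_enable_db` on AWS RDS):

```sql
-- Standard SQL Server: run inside the source database as sysadmin
USE DATABASE_NAME;
EXEC sys.sp_cdc_enable_db;

-- AWS RDS for SQL Server: use the RDS-provided procedure instead
-- EXEC msdb.dbo.rds_cdc_enable_db 'DATABASE_NAME';
```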
#### MINIMAL\_ACCESS\_USER Role \[!toc] A minimal access user has limited permissions and requires a user with higher privileges (such as `sysadmin` or `db_owner`) to set up the CDC environment. After setup, the minimal access user can be used for ongoing replication.
#### Create a Minimal Access User \[!toc] Replace the placeholder values in the script with your actual information: * `'strong-password'`: Replace with a secure, unique password for the `supermetal_user`. * `DATABASE_NAME`: The name of the database you want to replicate from. * `SCHEMA_NAME`: The schema of the table you want to replicate from. * `TABLE_NAME`: The name of the table you want to replicate from.
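The script is not included in this export; a minimal sketch of a least-privilege setup (the exact set of grants Supermetal requires is an assumption here):

```sql
CREATE LOGIN supermetal_user WITH PASSWORD = 'strong-password';
GO
USE DATABASE_NAME;
GO
CREATE USER supermetal_user FOR LOGIN supermetal_user;
-- Read access to the replicated table and to the CDC change tables
GRANT SELECT ON OBJECT::SCHEMA_NAME.TABLE_NAME TO supermetal_user;
GRANT SELECT ON SCHEMA::[cdc] TO supermetal_user;
```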
#### Enable Snapshot Isolation \[!toc] Enable snapshot isolation on the database to ensure consistent reads during CDC operations. This step must be performed by a user with appropriate privileges (typically `sysadmin` or `db_owner`). Replace `DATABASE_NAME` with the name of the database you want to replicate from.
#### Enable CDC at Database Level \[!toc] This step must be performed by a user with appropriate privileges (typically `sysadmin` for Standard SQL Server and AWS RDS, `db_owner` for Azure SQL Database). If the replication user does not have SELECT permission on the `[cdc]` schema, grant it explicitly:

```sql
GRANT SELECT ON SCHEMA::[cdc] TO [supermetal_user];
```
#### Enable CDC for Tables \[!toc] The following query creates a capture instance for a specific table. This step must be performed by a user with appropriate privileges. Replace the placeholder values in the script with your actual information: * `SCHEMA_NAME`: The schema of the table you want to replicate from. * `TABLE_NAME`: The name of the table you want to replicate from.
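A minimal sketch using `sys.sp_cdc_enable_table`, run inside the source database (`@role_name = NULL` disables the gating role and is an assumption; set it per your security policy):

```sql
EXEC sys.sp_cdc_enable_table
    @source_schema = N'SCHEMA_NAME',
    @source_name   = N'TABLE_NAME',
    @role_name     = NULL;
```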
#### Configure CDC Job Parameters (Optional) You can customize the CDC job parameters to optimize performance based on your workload. Replace the parameter values with appropriate values for your environment: * `INSERT_MAX_SCANS_VALUE`: Maximum number of scan cycles (default: 10) * `INSERT_MAX_TRANS_VALUE`: Maximum number of transactions (default: 500) * `INSERT_POLL_INTERVAL_SECS_VALUE`: Polling interval in seconds (default: 5)
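This export omits the script itself. On SQL Server, these parameters correspond to `sys.sp_cdc_change_job`; a minimal sketch, assuming the placeholders map to `@maxscans`, `@maxtrans`, and `@pollinginterval`:

```sql
-- Tune the CDC capture job (values shown are the defaults)
EXEC sys.sp_cdc_change_job
    @job_type        = N'capture',
    @maxscans        = 10,   -- INSERT_MAX_SCANS_VALUE
    @maxtrans        = 500,  -- INSERT_MAX_TRANS_VALUE
    @pollinginterval = 5;    -- INSERT_POLL_INTERVAL_SECS_VALUE

-- Restart the capture job so the new values take effect
EXEC sys.sp_cdc_stop_job @job_type = N'capture';
EXEC sys.sp_cdc_start_job @job_type = N'capture';
```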
# MySQL (/docs/main/sources/mysql) import { Tab, Tabs } from 'fumadocs-ui/components/tabs'; import { Callout } from 'fumadocs-ui/components/callout'; import { Accordion, Accordions } from 'fumadocs-ui/components/accordion'; MySQL's binary log (binlog) provides a reliable foundation for Change Data Capture and data replication. Supermetal builds on this capability to deliver efficient data extraction with minimal database overhead, leveraging Global Transaction Identifiers (GTIDs) for guaranteed consistency. This guide covers the features, prerequisites, and configuration steps required to connect MySQL with Supermetal, focusing mainly on binlog setup and permissions. *** ## Features
Feature Notes
Initial Data Sync Performs fast initial data loads (snapshots) and historical backfills by chunking data within each table and reading each chunk concurrently using multiple connections: **Physical Partitioning**: Automatically detects and utilizes existing MySQL table partitions, extracting data from each partition in parallel **Dynamic Range Partitioning**: For non-partitioned tables with primary keys, creates optimized query boundaries based on data distribution for parallel extraction
Change Data Capture Captures data changes in real-time using MySQL's native [binary log](https://dev.mysql.com/doc/refman/8.0/en/binary-log.html) with minimal database impact. **GTID-based Consistency**: Uses [Global Transaction Identifiers](https://dev.mysql.com/doc/refman/8.0/en/replication-gtids.html) to maintain consistent replication position and preserve transaction boundaries, ensuring reliable recovery and preventing data loss. Supermetal validates the GTID position against the server's purged GTIDs to ensure continuity, preventing replication from a position that's no longer available in the binary log.
Schema Evolution Precise mapping of MySQL types to Apache Arrow with specialized handling for complex types, including DECIMAL (up to 76 digits precision), SET types (as arrays) and microsecond-precision temporal types. Schema evolution is properly tracked to ensure consistent downstream target compatibility.
Transactional Consistency Guarantees that the data synced reflects a consistent point in the upstream database's timeline, respecting transaction boundaries through GTID-based replication.
Catalog Support Ability to configure which schemas, tables and columns to sync.
*** ## Prerequisites Before you begin, ensure you have: * **MySQL Requirements:** * **Version:** MySQL 8.0 or later with GTID mode enabled * **Supported Deployments:** * `Self-Hosted MySQL` * `Amazon RDS for MySQL` (Coming Soon) * `Azure Database for MySQL` (Coming Soon) GTID mode must be enabled on your MySQL server for Change Data Capture functionality. See [MySQL GTID Documentation](https://dev.mysql.com/doc/refman/8.0/en/replication-gtids.html) for details. * **Binary Log Requirements:** * **Binary Log Format**: Must be set to `ROW` format * **Binary Log Retention**: Configure appropriate retention period (recommended: at least 3 days) * **Deployment Requirements:** * **Database Permissions**: User with `SELECT`, `REPLICATION SLAVE`, and `REPLICATION CLIENT` privileges * **Network Connectivity**: Ensure Supermetal can reach your MySQL server (default port: 3306) *** ## Setup ### Permissions Overview Supermetal requires a dedicated MySQL user with appropriate permissions to read data and access the binary log for CDC. | Required Permission | Purpose | | :------------------- | :------------------------------------------------------ | | `SELECT` | Read data from tables during snapshot and CDC | | `REPLICATION SLAVE` | Access the binary log for CDC | | `REPLICATION CLIENT` | Query server status information including GTID position | ### Setup Instructions ### Self-Hosted MySQL Setup \[!toc]
#### Enable Binary Logging and GTID Mode \[!toc] Edit your MySQL configuration file (`my.cnf` or `my.ini`): ```ini [mysqld] # Server identification (must be unique in your infrastructure) server-id = 223344 # Enable binary logging log_bin = mysql-bin binlog_format = ROW binlog_row_image = FULL binlog_row_metadata = FULL # Enable GTID mode gtid_mode = ON enforce_gtid_consistency = ON # Set binary log retention (3 days) binlog_expire_logs_seconds = 259200 ``` * `server-id`: Must be unique for each server in your replication topology. Can be any non-zero value between 1 and 4294967295. * `log_bin`: Base name for binary log files. * `binlog_format`: Must be set to ROW for CDC to work properly. * `binlog_row_image`: Must be set to FULL to capture complete before/after row images, which is required for CDC. * `binlog_row_metadata`: Must be set to FULL to include complete metadata in binary log events. * `gtid_mode` and `enforce_gtid_consistency`: Required for reliable replication position tracking. * `binlog_expire_logs_seconds`: Controls how long binary logs are retained before automatic deletion.
#### Create a Dedicated User for Replication \[!toc] Connect to MySQL as a privileged user and run: ```sql CREATE USER 'supermetal_user'@'%' IDENTIFIED BY 'strong-password'; GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'supermetal_user'@'%'; FLUSH PRIVILEGES; ``` Replace the following variables with your own values: * `supermetal_user`: Username for Supermetal access * `strong-password`: Secure password for the Supermetal user
#### Verify Configuration \[!toc] Verify binary logging and GTID configuration: ```sql SHOW VARIABLES LIKE 'gtid_mode'; SHOW VARIABLES LIKE 'enforce_gtid_consistency'; SHOW VARIABLES LIKE 'log_bin'; SHOW VARIABLES LIKE 'binlog_format'; SHOW VARIABLES LIKE 'binlog_row_image'; SHOW VARIABLES LIKE 'binlog_row_metadata'; SHOW VARIABLES LIKE 'server_id'; ``` The output should show: * `gtid_mode`: ON * `enforce_gtid_consistency`: ON * `log_bin`: ON * `binlog_format`: ROW * `binlog_row_image`: FULL * `binlog_row_metadata`: FULL * `server_id`: A non-zero value
### Amazon RDS MySQL Setup \[!toc]
#### Create a Parameter Group for GTID Mode \[!toc] 1. In the AWS RDS console, navigate to "Parameter groups" 2. Click "Create parameter group" 3. Select the MySQL engine and version that matches your RDS instance 4. Enter a name and description (e.g., "supermetal-cdc-params") 5. Click "Create" 6. Select the newly created parameter group and modify the following parameters: * `gtid-mode`: ON * `enforce_gtid_consistency`: ON * `binlog_format`: ROW * `binlog_row_image`: FULL * `binlog_row_metadata`: FULL In RDS, some parameters cannot be modified, including `server-id`. RDS automatically assigns a unique server-id to each instance, so you don't need to configure this manually.
#### Apply the Parameter Group to Your RDS Instance \[!toc] 1. Navigate to your RDS instances and select your MySQL instance 2. Click "Modify" 3. Under "Additional configuration", select the parameter group you created 4. Choose "Apply immediately" or schedule for your next maintenance window 5. Click "Continue" and confirm the changes
#### Set Binary Log Retention Period \[!toc] Connect to your RDS MySQL instance and run: ```sql CALL mysql.rds_set_configuration('binlog retention hours', 72); ``` Amazon RDS uses `binlog retention hours` instead of the standard `binlog_expire_logs_seconds` parameter. The default value is 0 (immediate removal), so this step is essential.
#### Create a User for Replication \[!toc] ```sql CREATE USER 'supermetal_user'@'%' IDENTIFIED BY 'strong-password'; GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'supermetal_user'@'%'; FLUSH PRIVILEGES; ``` Replace the following variables with your own values: * `supermetal_user`: Username for Supermetal access * `strong-password`: Secure password for the Supermetal user
### Azure Database for MySQL Setup \[!toc]
#### Configure Server Parameters \[!toc] 1. In the Azure portal, navigate to your MySQL server resource 2. Select "Server parameters" under the Settings section 3. Find and set the following parameters: * `gtid_mode`: ON * `enforce_gtid_consistency`: ON * `binlog_format`: ROW * `binlog_row_image`: FULL * `binlog_row_metadata`: FULL 4. Click "Save" to apply the changes 5. Restart your Azure Database for MySQL server if prompted Some parameters might be read-only in Azure Database for MySQL, particularly in Flexible Server configurations. The `server-id` is automatically assigned by Azure and cannot be modified.
#### Create a User for Replication \[!toc] Connect to your Azure MySQL instance and run: ```sql CREATE USER 'supermetal_user'@'%' IDENTIFIED BY 'strong-password'; GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'supermetal_user'@'%'; FLUSH PRIVILEGES; ``` Replace the following variables with your own values: * `supermetal_user`: Username for Supermetal access * `strong-password`: Secure password for the Supermetal user Ensure your Azure MySQL server's firewall rules allow connections from the Supermetal service IP addresses.
*** ## Data Types Mapping Supermetal automatically maps MySQL data types to Apache Arrow data types according to the following mapping: | MySQL Type | Arrow Type | Notes | | ----------- | ------------------------------------ | ------------------------------------------------------------------------------------------- | | `TINYINT` | `Int8` / `UInt8` | Signed/unsigned determined by column definition | | `SMALLINT` | `Int16` / `UInt16` | Signed/unsigned determined by column definition | | `MEDIUMINT` | `Int32` / `UInt32` | Signed/unsigned determined by column definition | | `INT` | `Int32` / `UInt32` | Signed/unsigned determined by column definition | | `BIGINT` | `Int64` / `UInt64` | Signed/unsigned determined by column definition | | `FLOAT` | `Float32` | | | `DOUBLE` | `Float64` | | | `DECIMAL` | `Decimal128` / `Decimal256` / `Utf8` | `Decimal128` for precision ≤ 38, `Decimal256` for precision ≤ 76, `Utf8` for precision > 76 | | MySQL Type | Arrow Type | Notes | | ---------- | ------------ | ------------------------------------------------------------------------------ | | `CHAR` | `Utf8` | | | `VARCHAR` | `Utf8` | | | `TEXT` | `Utf8` | Includes all TEXT variants (TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT) | | `ENUM` | `Utf8` | Original enum values preserved as strings | | `SET` | `List` | Converted to arrays of strings (e.g., 'value1,value2' → \["value1", "value2"]) | | `JSON` | `Utf8` | Preserved as JSON string | | MySQL Type | Arrow Type | Notes | | ----------- | ---------- | ------------------------------------------------------------------------------- | | `BINARY` | `Binary` | | | `VARBINARY` | `Binary` | | | `BLOB` | `Binary` | Includes all BLOB variants (TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB) | | `BIT` | `Utf8` | Converted to binary string representation (e.g., BIT(4) value 5 becomes "0101") | | MySQL Type | Arrow Type | Notes | | ----------- | -------------------------- | ------------------------------------------------------------------------------------------- | | `DATE` | `Date32` | | | `TIME` | `IntervalDayTime` / `Utf8` | `IntervalDayTime` for precision ≤ 3 (milliseconds), `Utf8` for precision > 3 (microseconds) | | `DATETIME` | `Timestamp(Microsecond)` | Full microsecond precision without timezone | | `TIMESTAMP` | `Timestamp(Microsecond)` | Full microsecond precision, stored in UTC internally by MySQL | | `YEAR` | `UInt16` | | | MySQL Type | Arrow Type | | ----------------- | ------------- | | `GEOMETRY` | *Unsupported* | | `POINT` | *Unsupported* | | `LINESTRING` | *Unsupported* | | `POLYGON` | *Unsupported* | | `MULTIPOINT` | *Unsupported* | | `MULTILINESTRING` | *Unsupported* | | `MULTIPOLYGON` | *Unsupported* | | `GEOMCOLLECTION` | *Unsupported* | | `VECTOR` | *Unsupported* | Columns with unsupported data types are automatically excluded from replication. Tables containing only unsupported column types cannot be replicated. # MySQL Setup (/docs/main/sources/mysql/setup) ## Prerequisites Before you begin, ensure you have: * **MySQL Requirements:** * **Version:** MySQL 8.0 or later with GTID mode enabled * **Supported Deployments:** * `Self-Hosted MySQL` * `Amazon RDS for MySQL` (Coming Soon) * `Azure Database for MySQL` (Coming Soon) GTID mode must be enabled on your MySQL server for Change Data Capture functionality. See [MySQL GTID Documentation](https://dev.mysql.com/doc/refman/8.0/en/replication-gtids.html) for details. 
* **Binary Log Requirements:** * **Binary Log Format**: Must be set to `ROW` format * **Binary Log Retention**: Configure appropriate retention period (recommended: at least 3 days) * **Deployment Requirements:** * **Database Permissions**: User with `SELECT`, `REPLICATION SLAVE`, and `REPLICATION CLIENT` privileges * **Network Connectivity**: Ensure Supermetal can reach your MySQL server (default port: 3306) *** ## Setup ### Permissions Overview Supermetal requires a dedicated MySQL user with appropriate permissions to read data and access the binary log for CDC. | Required Permission | Purpose | | :------------------- | :------------------------------------------------------ | | `SELECT` | Read data from tables during snapshot and CDC | | `REPLICATION SLAVE` | Access the binary log for CDC | | `REPLICATION CLIENT` | Query server status information including GTID position | ### Setup Instructions ### Self-Hosted MySQL Setup \[!toc]
#### Enable Binary Logging and GTID Mode \[!toc] Edit your MySQL configuration file (`my.cnf` or `my.ini`): ```ini [mysqld] # Server identification (must be unique in your infrastructure) server-id = 223344 # Enable binary logging log_bin = mysql-bin binlog_format = ROW binlog_row_image = FULL binlog_row_metadata = FULL # Enable GTID mode gtid_mode = ON enforce_gtid_consistency = ON # Set binary log retention (3 days) binlog_expire_logs_seconds = 259200 ``` * `server-id`: Must be unique for each server in your replication topology. Can be any non-zero value between 1 and 4294967295. * `log_bin`: Base name for binary log files. * `binlog_format`: Must be set to ROW for CDC to work properly. * `binlog_row_image`: Must be set to FULL to capture complete before/after row images, which is required for CDC. * `binlog_row_metadata`: Must be set to FULL to include complete metadata in binary log events. * `gtid_mode` and `enforce_gtid_consistency`: Required for reliable replication position tracking. * `binlog_expire_logs_seconds`: Controls how long binary logs are retained before automatic deletion.
#### Create a Dedicated User for Replication \[!toc] Connect to MySQL as a privileged user and run: ```sql CREATE USER 'supermetal_user'@'%' IDENTIFIED BY 'strong-password'; GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'supermetal_user'@'%'; FLUSH PRIVILEGES; ``` Replace the following variables with your own values: * `supermetal_user`: Username for Supermetal access * `strong-password`: Secure password for the Supermetal user
#### Verify Configuration \[!toc] Verify binary logging and GTID configuration: ```sql SHOW VARIABLES LIKE 'gtid_mode'; SHOW VARIABLES LIKE 'enforce_gtid_consistency'; SHOW VARIABLES LIKE 'log_bin'; SHOW VARIABLES LIKE 'binlog_format'; SHOW VARIABLES LIKE 'binlog_row_image'; SHOW VARIABLES LIKE 'binlog_row_metadata'; SHOW VARIABLES LIKE 'server_id'; ``` The output should show: * `gtid_mode`: ON * `enforce_gtid_consistency`: ON * `log_bin`: ON * `binlog_format`: ROW * `binlog_row_image`: FULL * `binlog_row_metadata`: FULL * `server_id`: A non-zero value
### Amazon RDS MySQL Setup \[!toc]
#### Create a Parameter Group for GTID Mode \[!toc] 1. In the AWS RDS console, navigate to "Parameter groups" 2. Click "Create parameter group" 3. Select the MySQL engine and version that matches your RDS instance 4. Enter a name and description (e.g., "supermetal-cdc-params") 5. Click "Create" 6. Select the newly created parameter group and modify the following parameters: * `gtid-mode`: ON * `enforce_gtid_consistency`: ON * `binlog_format`: ROW * `binlog_row_image`: FULL * `binlog_row_metadata`: FULL In RDS, some parameters cannot be modified, including `server-id`. RDS automatically assigns a unique server-id to each instance, so you don't need to configure this manually.
#### Apply the Parameter Group to Your RDS Instance \[!toc] 1. Navigate to your RDS instances and select your MySQL instance 2. Click "Modify" 3. Under "Additional configuration", select the parameter group you created 4. Choose "Apply immediately" or schedule for your next maintenance window 5. Click "Continue" and confirm the changes
#### Set Binary Log Retention Period \[!toc] Connect to your RDS MySQL instance and run: ```sql CALL mysql.rds_set_configuration('binlog retention hours', 72); ``` Amazon RDS uses `binlog retention hours` instead of the standard `binlog_expire_logs_seconds` parameter. The default value is 0 (immediate removal), so this step is essential.
#### Create a User for Replication \[!toc] ```sql CREATE USER 'supermetal_user'@'%' IDENTIFIED BY 'strong-password'; GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'supermetal_user'@'%'; FLUSH PRIVILEGES; ``` Replace the following variables with your own values: * `supermetal_user`: Username for Supermetal access * `strong-password`: Secure password for the Supermetal user
### Azure Database for MySQL Setup \[!toc]
#### Configure Server Parameters \[!toc] 1. In the Azure portal, navigate to your MySQL server resource 2. Select "Server parameters" under the Settings section 3. Find and set the following parameters: * `gtid_mode`: ON * `enforce_gtid_consistency`: ON * `binlog_format`: ROW * `binlog_row_image`: FULL * `binlog_row_metadata`: FULL 4. Click "Save" to apply the changes 5. Restart your Azure Database for MySQL server if prompted Some parameters might be read-only in Azure Database for MySQL, particularly in Flexible Server configurations. The `server-id` is automatically assigned by Azure and cannot be modified.
#### Create a User for Replication \[!toc] Connect to your Azure MySQL instance and run: ```sql CREATE USER 'supermetal_user'@'%' IDENTIFIED BY 'strong-password'; GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'supermetal_user'@'%'; FLUSH PRIVILEGES; ``` Replace the following variables with your own values: * `supermetal_user`: Username for Supermetal access * `strong-password`: Secure password for the Supermetal user Ensure your Azure MySQL server's firewall rules allow connections from the Supermetal service IP addresses.
# PostgreSQL (/docs/main/sources/pg) import { Tab, Tabs } from 'fumadocs-ui/components/tabs'; import { Callout } from 'fumadocs-ui/components/callout'; import { Steps, Step } from 'fumadocs-ui/components/steps'; import { Accordion, Accordions } from 'fumadocs-ui/components/accordion'; PostgreSQL (Postgres) is a powerful open-source relational database that supports logical replication, enabling real-time Change Data Capture (CDC). This guide walks you through configuring your PostgreSQL database to work seamlessly with Supermetal. *** ## Features
Feature Notes
Initial Data Sync Performs fast initial data loads (snapshots) and historical backfills by chunking data within each table and reading each chunk concurrently using multiple connections.
Change Data Capture Captures data changes (inserts, updates, deletes) in real-time using PostgreSQL's native logical replication for minimal database impact and guaranteed consistency.
Transactional Consistency Guarantees that the data synced reflects a consistent point in the upstream database's timeline, respecting transaction boundaries.
Catalog Support Ability to configure which schemas, tables and columns to sync.
*** ## Prerequisites Before you begin, ensure you have: * **Supported PostgreSQL Version:** PostgreSQL version 9.4 or higher. * **Database Admin Access:** Sufficient privileges to create users, grant permissions, and modify database configuration parameters. This might require `superuser` privileges or specific cloud provider roles (e.g., `rds_superuser` on AWS RDS, `azure_pg_admin` on Azure). * **Network Connectivity:** Ensure that Supermetal's services can reach your PostgreSQL database. You may need to configure firewall rules, security groups, or VPC peering. *** ## Setup Supermetal supports two replication types for PostgreSQL sources: * **Snapshot**: One-time or periodic full table snapshots. * **Logical Replication**: Real-time change data capture (CDC) with initial snapshot. Follow these steps to configure PostgreSQL for snapshot-only replication: ### Create Read-Only User Create a dedicated database user for Supermetal with read-only permissions. ```sql -- Create user with a secure password CREATE USER supermetal_user WITH PASSWORD 'strong-password'; ``` Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the source in Supermetal. ### Grant Table Access Grant SELECT permissions on the schemas and tables you want to replicate. ```sql -- Grant schema access GRANT USAGE ON SCHEMA public TO supermetal_user; -- Grant select permission on all current tables in the schema GRANT SELECT ON ALL TABLES IN SCHEMA public TO supermetal_user; -- Ensure select permission is granted on any new tables created in the future ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO supermetal_user; -- (Optional but Recommended) Bypass Row-Level Security (RLS) -- This ensures Supermetal reads all data, regardless of RLS policies -- applied for application users. Only grant if necessary and understood. ALTER USER supermetal_user BYPASSRLS; ``` Repeat for each schema being replicated. Replace `public` with your schema name. ### Configure PostgreSQL Source in Supermetal You are now ready to configure PostgreSQL as a source within Supermetal. Select **Snapshot** as the replication type when configuring the source. Follow these steps to configure PostgreSQL for logical replication with CDC: ### Configure User and Permissions Create a dedicated database user for Supermetal with the necessary permissions for replication and data access. #### Create Replication User ```sql -- Create user with minimal required privileges CREATE USER supermetal_user WITH PASSWORD 'strong-password' REPLICATION; ``` ```sql -- Create user and grant RDS replication role CREATE USER supermetal_user WITH PASSWORD 'strong-password'; GRANT rds_replication TO supermetal_user; ``` ```sql -- Create replication user CREATE USER supermetal_user WITH PASSWORD 'strong-password'; ALTER USER supermetal_user WITH REPLICATION; ``` ```sql -- Create replication user CREATE USER supermetal_user WITH PASSWORD 'strong-password'; ALTER ROLE supermetal_user WITH REPLICATION; ``` Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the source in Supermetal. 
#### Grant Table Access ```sql -- Current and future table access GRANT USAGE ON SCHEMA public TO supermetal_user; -- Grant select permission on all current tables in the schema GRANT SELECT ON ALL TABLES IN SCHEMA public TO supermetal_user; -- Ensure select permission is granted on any new tables created in the future ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO supermetal_user; -- (Optional but Recommended) Bypass Row-Level Security (RLS) -- This ensures Supermetal syncs all data, regardless of RLS policies -- applied for application users. Only grant if necessary and understood. ALTER USER supermetal_user BYPASSRLS; ``` Repeat for each schema being replicated. Replace `public` with your schema name. ### Configure Replication Settings (WAL) Logical replication relies on PostgreSQL's Write-Ahead Log (WAL). You need to configure specific parameters to enable it. #### Understanding WAL Settings These settings must be configured for logical replication to work: * `wal_level`: Must be set to `logical`. This adds the necessary information to the WAL for logical decoding. * `max_wal_senders`: Defines the maximum number of concurrent replication connections the server can support. Supermetal uses these connections for CDC. * `max_replication_slots`: Defines the maximum number of replication slots the server can manage. Each Supermetal CDC connection requires a replication slot to track its progress and prevent the server from purging WAL files needed for replication. This should generally be equal to or greater than max\_wal\_senders. #### Platform-Specific Configuration Apply the settings based on your PostgreSQL environment. 1. Check current settings: ```sql -- View all replication settings SELECT name, setting, unit, context FROM pg_settings WHERE name IN ( 'wal_level', 'max_wal_senders', 'max_replication_slots' ); ``` 2. Configure settings: ```sql ALTER SYSTEM SET wal_level = 'logical'; ALTER SYSTEM SET max_wal_senders = 10; -- Adjust for your tables ALTER SYSTEM SET max_replication_slots = 10; -- Match max_wal_senders ``` 3. Restart PostgreSQL Modify the DB Parameter Group: 1. Go to your RDS instance configuration. 2. Find the associated DB Parameter Group and modify it (or create a new one and associate it). 3. Set the following parameters: 1. `rds.logical_replication` = 1 (This enables logical replication on RDS) 2. `wal_level` = logical (Often set automatically when rds.logical\_replication is 1) 3. `max_wal_senders` = 10 (Adjust as needed) 4. `max_replication_slots` = 10 (Match or exceed max\_wal\_senders) 4. Apply to RDS instance and reboot: Applying changes to a Parameter Group that includes static parameters like wal\_level requires rebooting the RDS instance. 1. Enable Logical Decoding Flag: 2. Navigate to your Cloud SQL for PostgreSQL instance in the Google Cloud Console. Click Edit. 3. Go to the Flags section. 4. Add or modify the following flags: 1. `cloudsql.logical_decoding` = on 2. `wal_sender_timeout` = 0 (Optional but recommended. Setting to 0 disables the timeout, preventing replication issues during long transactions or initial syncs. Default is 60 seconds.) 5. Automatic Management: Cloud SQL automatically manages `wal_level`, `max_wal_senders`, and `max_replication_slots` when `cloudsql.logical_decoding` is enabled. You do not need to set these manually. 6. You'll need to restart your instance to update your instance with the changes. 1. Navigate to your Azure Database for PostgreSQL instance in the Azure Portal. 2. Go to Server parameters. 
Set the following parameters: 1. `wal_level` = logical 2. `max_wal_senders` = 10 (Adjust as needed) 3. `max_replication_slots` = 10 (Match or exceed max\_wal\_senders) 4. `wal_sender_timeout` = 0 (Optional but recommended, prevents timeouts. Default might be 60000ms) 3. Save and Restart Server: Changes to these parameters require a server restart to take effect. Azure Portal will prompt you to save and restart. Changes to WAL parameters require a server restart. ### Configure Publications Publications define which tables are replicated. Choose between replicating all tables or specific ones. ```sql -- Option 1: Create a publication for ALL tables CREATE PUBLICATION supermetal_publication FOR ALL TABLES; -- Option 2: For warehouse targets, to sync partitioned tables as root, include publish_via_partition_root flag (https://www.postgresql.org/docs/current/sql-createpublication.html#SQL-CREATEPUBLICATION-PARAMS-WITH-PUBLISH-VIA-PARTITION-ROOT) CREATE PUBLICATION supermetal_publication FOR ALL TABLES WITH (publish_via_partition_root); ``` ### Configure PostgreSQL Source in Supermetal You are now ready to configure PostgreSQL as a source within Supermetal. Select **Logical Replication** as the replication type when configuring the source. *** ## Data Types Mapping | PostgreSQL Type(s) | Apache Arrow DataType | Notes | | :----------------- | :-------------------- | :---- | | `BOOL` | `Boolean` | | | PostgreSQL Type(s) | Apache Arrow DataType | Notes | | :------------------------ | :-------------------------------------- | :----------------------------------------------------------------------------------------------- | | `INT2` | `Int16` | | | `INT4` | `Int32` | | | `INT8`, `MONEY` | `Int64` | `MONEY` is mapped as a fixed-point integer. | | `FLOAT4` | `Float32` | Single precision floating-point. | | `FLOAT8` | `Float64` | Double precision floating-point. | | `NUMERIC(p, s)` | `Decimal128(p, s)` / `Decimal256(p, s)` | Uses PostgreSQL precision (`p`) and scale (`s`) directly to ensure exact numeric representation. | | `NUMERIC` (unconstrained) | `Utf8` | Unconstrained NUMERIC defaults to `Utf8` as precision/scale are undefined. | | `OID` | `UInt32` | Object Identifier. | | `PG_LSN` | `UInt64` | PostgreSQL Log Sequence Number. | | PostgreSQL Type(s) | Apache Arrow DataType | Notes | | :------------------- | :--------------------- | :--------------------------------------------------------------------------------------- | | `DATE` | `Date32` | Days since UNIX epoch. Precision preserved at the day level. | | `TIME` | `Time64(µs)` | Microseconds since midnight. Precision preserved at the microsecond level. | | `TIMESTAMP` | `Timestamp(µs)` | Microseconds since UNIX epoch, timezone naive. Precision preserved at microsecond level. | | `TIMESTAMPTZ` | `Timestamp(µs, "UTC")` | Microseconds since UNIX epoch, assumed UTC. Precision preserved at microsecond level. | | `TIMETZ`, `INTERVAL` | `Utf8` | Temporal types with string representation in Arrow. | | PostgreSQL Type(s) | Apache Arrow DataType | Notes | | :----------------------------------------------------- | :-------------------- | :----------------------------------------------------------------------- | | `CHAR`, `VARCHAR`, `TEXT`, `BPCHAR`, `NAME`, `UNKNOWN` | `Utf8` | General-purpose text types. | | `citext`, `ltree`, `lquery`, `ltxtquery` | `Utf8` | Special text types (checked by name in code). | | `BIT`, `VARBIT` | `Utf8` | Bit string types represented as text. | | `UUID` | `Utf8` | Universally Unique Identifier. 
| | `ENUM` | `Utf8` | Enumerated types represented as their text labels. | | `INET`, `CIDR`, `MACADDR`, `MACADDR8` | `Utf8` | Network address types. | | `TSVECTOR` | `Utf8` | Text search vector type represented as text. Positions are not preserved | | PostgreSQL Type(s) | Apache Arrow DataType | Notes | | :----------------- | :-------------------- | :-------------------------- | | `BYTEA` | `Binary` | Variable-length byte array. | | PostgreSQL Type(s) | Apache Arrow DataType | Notes | | :----------------- | :-------------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `JSON`, `JSONB` | `Utf8` | Mapped to `Utf8`. Using the [Canonical JSON Extension](https://arrow.apache.org/docs/format/CanonicalExtensions.html#json) (`arrow.json`) is recommended for interoperability, allowing downstream systems to recognize the content as JSON. This metadata signals to downstream supermetal targets (like Snowflake, Clickhouse, etc.) that the string contains JSON, allowing them to map it to their appropriate semi-structured types (e.g., VARIANT, JSON). | | `XML` | `Utf8` | Mapped to `Utf8`. A canonical extension type for XML might also exist or be defined depending on specific library support, similar to JSON. | | PostgreSQL Type(s) | Apache Arrow DataType | Notes | | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :--------------------------- | :---------------------------------------------------------- | | `BOOL[]` | `List` | Base types map as per their categories. | | `BYTEA[]` | `List` | | | `INT2[]` | `List` | | | `INT4[]` | `List` | | | `INT8[]`, `MONEY[]` | `List` | | | `OID[]` | `List` | | | `PG_LSN[]` | `List` | | | `FLOAT4[]` | `List` | | | `FLOAT8[]` | `List` | | | `NUMERIC(p, s)[]` | `List` | Precision and scale preserved within list elements. | | `DATE[]` | `List` | | | `TIME[]` | `List` | | | `TIMESTAMP[]` | `List` | | | `TIMESTAMPTZ[]` | `List` | | | `CHAR[]`, `VARCHAR[]`, `TEXT[]`, etc., `_citext`, `_ltree`, etc., `BIT[]`, `VARBIT[]`, `TIMETZ[]`, `INTERVAL[]`, `XML[]`, `UUID[]`, `INET[]`, `CIDR[]`, `MACADDR8[]`, `MACADDR[]`, `JSON[]`, `JSONB[]`, `NUMERIC[]` (unconstrained) | `List` | Includes arrays of types mapped to `Utf8`. | | Multi-dimensional Arrays (>1 dim) | *Unsupported* | Multi-dimensional arrays (dimension > 1) are not supported. | | PostgreSQL Type(s) | Apache Arrow DataType | Notes | | :------------------------------------------------------------------------ | :-------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `INT4RANGE`, `INT8RANGE`, `NUMRANGE`, `TSRANGE`, `TSTZRANGE`, `DATERANGE` | `Utf8` | Range types are serialized to JSON format using the [Canonical JSON Extension](https://arrow.apache.org/docs/format/CanonicalExtensions.html#json) (`arrow.json`). | | Multirange types | *Unsupported* | Multirange types are not supported. 
| | PostgreSQL Type(s) | Apache Arrow DataType | Notes | | :-------------------- | :--------------------- | :--------------------------------------------------------------------------------------------------------------------- | | `DOMAIN(base_type)` | *Mapped as base\_type* | Domains are resolved to their underlying base type. | | Composite Types (ROW) | `Utf8` | Composite types (structs/rows) are not explicitly mapped and default to a string (e.g., JSON) representation (`Utf8`). | | PostgreSQL Type(s) | Apache Arrow DataType | Notes | | :---------------------------------------------------------- | :-------------------- | :------------------------------- | | `POINT`, `LSEG`, `PATH`, `BOX`, `POLYGON`, `LINE`, `CIRCLE` | *Unsupported* | Geometric types are unsupported. | When replicating from PostgreSQL to PostgreSQL, Supermetal replicates the source PostgreSQL data types 1:1, preserving the exact type from source to target when possible. *** # PostgreSQL Setup (/docs/main/sources/pg/setup) ## Prerequisites Before you begin, ensure you have: * **Supported PostgreSQL Version:** PostgreSQL version 9.4 or higher. * **Database Admin Access:** Sufficient privileges to create users, grant permissions, and modify database configuration parameters. This might require `superuser` privileges or specific cloud provider roles (e.g., `rds_superuser` on AWS RDS, `azure_pg_admin` on Azure). * **Network Connectivity:** Ensure that Supermetal's services can reach your PostgreSQL database. You may need to configure firewall rules, security groups, or VPC peering. *** ## Setup Supermetal supports two replication types for PostgreSQL sources: * **Snapshot**: One-time or periodic full table snapshots. * **Logical Replication**: Real-time change data capture (CDC) with initial snapshot. Follow these steps to configure PostgreSQL for snapshot-only replication: ### Create Read-Only User Create a dedicated database user for Supermetal with read-only permissions. ```sql -- Create user with a secure password CREATE USER supermetal_user WITH PASSWORD 'strong-password'; ``` Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the source in Supermetal. ### Grant Table Access Grant SELECT permissions on the schemas and tables you want to replicate. ```sql -- Grant schema access GRANT USAGE ON SCHEMA public TO supermetal_user; -- Grant select permission on all current tables in the schema GRANT SELECT ON ALL TABLES IN SCHEMA public TO supermetal_user; -- Ensure select permission is granted on any new tables created in the future ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO supermetal_user; -- (Optional but Recommended) Bypass Row-Level Security (RLS) -- This ensures Supermetal reads all data, regardless of RLS policies -- applied for application users. Only grant if necessary and understood. ALTER USER supermetal_user BYPASSRLS; ``` Repeat for each schema being replicated. Replace `public` with your schema name. ### Configure PostgreSQL Source in Supermetal You are now ready to configure PostgreSQL as a source within Supermetal. Select **Snapshot** as the replication type when configuring the source. Follow these steps to configure PostgreSQL for logical replication with CDC: ### Configure User and Permissions Create a dedicated database user for Supermetal with the necessary permissions for replication and data access. 
#### Create Replication User ```sql -- Create user with minimal required privileges CREATE USER supermetal_user WITH PASSWORD 'strong-password' REPLICATION; ``` ```sql -- Create user and grant RDS replication role CREATE USER supermetal_user WITH PASSWORD 'strong-password'; GRANT rds_replication TO supermetal_user; ``` ```sql -- Create replication user CREATE USER supermetal_user WITH PASSWORD 'strong-password'; ALTER USER supermetal_user WITH REPLICATION; ``` ```sql -- Create replication user CREATE USER supermetal_user WITH PASSWORD 'strong-password'; ALTER ROLE supermetal_user WITH REPLICATION; ``` Replace `'strong-password'` with a secure, unique password for the `supermetal_user`. Store this password securely, as you'll need it when configuring the source in Supermetal. #### Grant Table Access ```sql -- Current and future table access GRANT USAGE ON SCHEMA public TO supermetal_user; -- Grant select permission on all current tables in the schema GRANT SELECT ON ALL TABLES IN SCHEMA public TO supermetal_user; -- Ensure select permission is granted on any new tables created in the future ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO supermetal_user; -- (Optional but Recommended) Bypass Row-Level Security (RLS) -- This ensures Supermetal syncs all data, regardless of RLS policies -- applied for application users. Only grant if necessary and understood. ALTER USER supermetal_user BYPASSRLS; ``` Repeat for each schema being replicated. Replace `public` with your schema name. ### Configure Replication Settings (WAL) Logical replication relies on PostgreSQL's Write-Ahead Log (WAL). You need to configure specific parameters to enable it. #### Understanding WAL Settings These settings must be configured for logical replication to work: * `wal_level`: Must be set to `logical`. This adds the necessary information to the WAL for logical decoding. * `max_wal_senders`: Defines the maximum number of concurrent replication connections the server can support. Supermetal uses these connections for CDC. * `max_replication_slots`: Defines the maximum number of replication slots the server can manage. Each Supermetal CDC connection requires a replication slot to track its progress and prevent the server from purging WAL files needed for replication. This should generally be equal to or greater than max\_wal\_senders. #### Platform-Specific Configuration Apply the settings based on your PostgreSQL environment. 1. Check current settings: ```sql -- View all replication settings SELECT name, setting, unit, context FROM pg_settings WHERE name IN ( 'wal_level', 'max_wal_senders', 'max_replication_slots' ); ``` 2. Configure settings: ```sql ALTER SYSTEM SET wal_level = 'logical'; ALTER SYSTEM SET max_wal_senders = 10; -- Adjust for your tables ALTER SYSTEM SET max_replication_slots = 10; -- Match max_wal_senders ``` 3. Restart PostgreSQL Modify the DB Parameter Group: 1. Go to your RDS instance configuration. 2. Find the associated DB Parameter Group and modify it (or create a new one and associate it). 3. Set the following parameters: 1. `rds.logical_replication` = 1 (This enables logical replication on RDS) 2. `wal_level` = logical (Often set automatically when rds.logical\_replication is 1) 3. `max_wal_senders` = 10 (Adjust as needed) 4. `max_replication_slots` = 10 (Match or exceed max\_wal\_senders) 4. Apply to RDS instance and reboot: Applying changes to a Parameter Group that includes static parameters like wal\_level requires rebooting the RDS instance. 1. 
Enable Logical Decoding Flag: 2. Navigate to your Cloud SQL for PostgreSQL instance in the Google Cloud Console. Click Edit. 3. Go to the Flags section. 4. Add or modify the following flags: 1. `cloudsql.logical_decoding` = on 2. `wal_sender_timeout` = 0 (Optional but recommended. Setting to 0 disables the timeout, preventing replication issues during long transactions or initial syncs. Default is 60 seconds.) 5. Automatic Management: Cloud SQL automatically manages `wal_level`, `max_wal_senders`, and `max_replication_slots` when `cloudsql.logical_decoding` is enabled. You do not need to set these manually. 6. You'll need to restart your instance to update your instance with the changes. 1. Navigate to your Azure Database for PostgreSQL instance in the Azure Portal. 2. Go to Server parameters. Set the following parameters: 1. `wal_level` = logical 2. `max_wal_senders` = 10 (Adjust as needed) 3. `max_replication_slots` = 10 (Match or exceed max\_wal\_senders) 4. `wal_sender_timeout` = 0 (Optional but recommended, prevents timeouts. Default might be 60000ms) 3. Save and Restart Server: Changes to these parameters require a server restart to take effect. Azure Portal will prompt you to save and restart. Changes to WAL parameters require a server restart. ### Configure Publications Publications define which tables are replicated. Choose between replicating all tables or specific ones. ```sql -- Option 1: Create a publication for ALL tables CREATE PUBLICATION supermetal_publication FOR ALL TABLES; -- Option 2: For warehouse targets, to sync partitioned tables as root, include publish_via_partition_root flag (https://www.postgresql.org/docs/current/sql-createpublication.html#SQL-CREATEPUBLICATION-PARAMS-WITH-PUBLISH-VIA-PARTITION-ROOT) CREATE PUBLICATION supermetal_publication FOR ALL TABLES WITH (publish_via_partition_root); ``` ### Configure PostgreSQL Source in Supermetal You are now ready to configure PostgreSQL as a source within Supermetal. Select **Logical Replication** as the replication type when configuring the source. # Oracle (/docs/main/sources/oracle) import { Tab, Tabs } from 'fumadocs-ui/components/tabs'; import { Steps, Step } from 'fumadocs-ui/components/steps'; import { Callout } from 'fumadocs-ui/components/callout'; import { Accordion, Accordions } from 'fumadocs-ui/components/accordion'; Supermetal supports real-time change data capture (CDC) on Oracle using the native [LogMiner](https://docs.oracle.com/en/database/oracle/oracle-database/23/sutil/oracle-logminer-utility.html) utility to query the database redo logs. This guide walks you through configuring your Oracle database to work seamlessly with Supermetal. *** ## Features
Feature Notes
Initial Data Sync Performs fast initial data loads (snapshots) and historical backfills by chunking data within each table and reading each chunk concurrently using multiple connections.
Change Data Capture Captures data changes (inserts, updates, deletes) using Oracle's native [LogMiner](https://docs.oracle.com/en/database/oracle/oracle-database/23/sutil/oracle-logminer-utility.html) utility. LogMiner reads the database redo and archive logs to extract change data with guaranteed transactional consistency.
Transactional Consistency Guarantees that the data synced reflects a consistent point in the upstream database's timeline, respecting transaction boundaries through LogMiner's redo log analysis.
Catalog Support Ability to configure which schemas, tables and columns to sync.
Multi-tenant Architecture Support Works with Oracle Container Database (CDB) and Pluggable Databases (PDB).
***

## Limitations

* [LogMiner](https://docs.oracle.com/en/database/oracle/oracle-database/23/sutil/oracle-logminer-utility.html#GUID-7594F0D7-0ACD-46E6-BD61-2751136ECDB4):
  * Objects (tables or columns) with names longer than 30 characters are not supported.
  * LOB data types: Partially supported. LogMiner only logs column values for LOB (XML, CLOB, NCLOB, and BLOB) data types if they are explicitly set or changed in a SQL statement.
  * Schema evolution support depends on the LogMiner dictionary strategy:
    * **Redo logs**: Supported.
    * **Online catalog**: Partial support, limited to backwards-compatible changes.

***

## Prerequisites

Before you begin, ensure you have:

* **Supported Oracle Version:** Oracle Database 12c or higher, in one of the following architectures:
  * **Non-CDB**: Standard standalone Oracle database without container architecture.
  * **Multitenant (CDB)**, in one of two modes:

    | Mode | Description |
    | -------------- | ---------------------------------------- |
    | **All PDBs** | Capture changes from all PDBs |
    | **Single PDB** | Capture changes from a specific PDB only |
  * **Direct PDB connection**: Capture changes by directly connecting to a pluggable database. Requires Oracle 19c Update 10 or later.

  **Oracle RAC (Real Application Clusters)** is not currently supported.
* **Supported Platforms:**
  * **Self-Managed Oracle**: Generic Oracle databases on-premises or self-hosted in the cloud
  * **AWS RDS Oracle**: Amazon RDS for Oracle instances (single tenant and multi-tenant)
* **Database Admin Access:** Database master or admin user with sufficient privileges to create users, grant permissions, and modify database configuration parameters
* **Network Connectivity:** Ensure that Supermetal's services can reach your Oracle database. You may need to configure security groups.

***

## Setup

Follow these steps to configure your Oracle source:

### Create a user & grant permissions

```sql
-- Connect to your Oracle database as an Admin user

-- Create a dedicated user for Supermetal
CREATE USER SUPERMETAL_USER IDENTIFIED BY "strong-password";

-- Grant session and catalog permissions
GRANT CREATE SESSION TO SUPERMETAL_USER;
GRANT EXECUTE_CATALOG_ROLE TO SUPERMETAL_USER;

-- Grant LogMiner permissions
GRANT LOGMINING TO SUPERMETAL_USER;

-- Grant table access
GRANT SELECT ANY TABLE TO SUPERMETAL_USER;
GRANT SELECT ANY TRANSACTION TO SUPERMETAL_USER;

-- Grant access to LogMiner dictionary views
GRANT SELECT ON V_$DATABASE TO SUPERMETAL_USER;
GRANT SELECT ON V_$ARCHIVED_LOG TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGMNR_LOGS TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOG TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGFILE TO SUPERMETAL_USER;
GRANT SELECT ON V_$TRANSACTION TO SUPERMETAL_USER;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO SUPERMETAL_USER;

-- Required when using redo logs dictionary strategy (Oracle 19cU10+)
GRANT SELECT ON DBA_LOGMNR_DICTIONARY_BUILDLOG TO SUPERMETAL_USER;

-- Grant access to system views for metadata
GRANT SELECT ON DBA_OBJECTS TO SUPERMETAL_USER;
GRANT SELECT ON DBA_EXTENTS TO SUPERMETAL_USER;

-- Grant execute permissions for LogMiner
GRANT EXECUTE ON DBMS_LOGMNR TO SUPERMETAL_USER;
GRANT EXECUTE ON DBMS_LOGMNR_D TO SUPERMETAL_USER;

-- If TDE (Transparent Data Encryption) is enabled, also grant:
-- GRANT SELECT ON DBA_TABLESPACES TO SUPERMETAL_USER;
-- GRANT SELECT ON DBA_ENCRYPTED_COLUMNS TO SUPERMETAL_USER;
```

```sql
-- Connect to your Oracle database as an Admin user

-- Switch to the container (root) database
ALTER SESSION SET CONTAINER=CDB$ROOT;

-- Create a common user (must start with C##)
CREATE USER C##SUPERMETAL_USER IDENTIFIED BY "strong-password" CONTAINER=ALL;

-- Grant session and container permissions
GRANT CREATE SESSION, ALTER SESSION, SET CONTAINER TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT EXECUTE_CATALOG_ROLE TO C##SUPERMETAL_USER CONTAINER=ALL;

-- Configure user to access all containers
ALTER USER C##SUPERMETAL_USER SET CONTAINER_DATA=ALL CONTAINER=CURRENT;

-- Grant LogMiner permissions
GRANT LOGMINING TO C##SUPERMETAL_USER CONTAINER=ALL;

-- Grant table access across all containers
GRANT SELECT ANY TABLE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ANY TRANSACTION TO C##SUPERMETAL_USER CONTAINER=ALL;

-- Grant access to LogMiner dictionary views
GRANT SELECT ON V_$DATABASE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVED_LOG TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_LOGS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOG TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGFILE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$TRANSACTION TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$CONTAINERS TO C##SUPERMETAL_USER CONTAINER=ALL;

-- Required when using redo logs dictionary strategy (Oracle 19cU10+)
GRANT SELECT ON DBA_LOGMNR_DICTIONARY_BUILDLOG TO C##SUPERMETAL_USER CONTAINER=ALL;

-- Grant access to system views for metadata
GRANT SELECT ON DBA_OBJECTS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON DBA_EXTENTS TO C##SUPERMETAL_USER CONTAINER=ALL;

-- Grant execute permissions for LogMiner
GRANT EXECUTE ON DBMS_LOGMNR TO C##SUPERMETAL_USER;
GRANT EXECUTE ON DBMS_LOGMNR_D TO C##SUPERMETAL_USER;

-- If TDE is enabled, also grant:
-- GRANT SELECT ON DBA_TABLESPACES TO C##SUPERMETAL_USER CONTAINER=ALL;
-- GRANT SELECT ON DBA_ENCRYPTED_COLUMNS TO C##SUPERMETAL_USER CONTAINER=ALL;
```

```sql
-- Connect to CDB root as an Admin user
ALTER SESSION SET CONTAINER=CDB$ROOT;

-- Create a common user (must start with C##)
CREATE USER C##SUPERMETAL_USER IDENTIFIED BY "strong-password" CONTAINER=ALL;

-- Grant session and container permissions
GRANT CREATE SESSION, ALTER SESSION, SET CONTAINER TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT EXECUTE_CATALOG_ROLE TO C##SUPERMETAL_USER CONTAINER=ALL;

-- Configure user to access specific PDB (replace PDB_NAME with your PDB)
ALTER USER C##SUPERMETAL_USER SET CONTAINER_DATA=(CDB$ROOT, PDB_NAME) CONTAINER=CURRENT;

-- Grant LogMiner permissions
GRANT LOGMINING TO C##SUPERMETAL_USER CONTAINER=ALL;

-- Switch to the specific PDB
ALTER SESSION SET CONTAINER=PDB_NAME;

-- Grant table access in the PDB
GRANT SELECT ANY TABLE TO C##SUPERMETAL_USER;
GRANT SELECT ANY TRANSACTION TO C##SUPERMETAL_USER;

-- Switch back to CDB root
ALTER SESSION SET CONTAINER=CDB$ROOT;

-- Grant access to LogMiner dictionary views
GRANT SELECT ON V_$DATABASE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVED_LOG TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_LOGS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOG TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$LOGFILE TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$TRANSACTION TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON V_$CONTAINERS TO C##SUPERMETAL_USER CONTAINER=ALL;

-- Required when using redo logs dictionary strategy (Oracle 19cU10+)
GRANT SELECT ON DBA_LOGMNR_DICTIONARY_BUILDLOG TO C##SUPERMETAL_USER CONTAINER=ALL;

-- Grant access to system views for metadata
GRANT SELECT ON DBA_OBJECTS TO C##SUPERMETAL_USER CONTAINER=ALL;
GRANT SELECT ON DBA_EXTENTS TO C##SUPERMETAL_USER CONTAINER=ALL;

-- Grant execute permissions for LogMiner
GRANT EXECUTE ON DBMS_LOGMNR TO C##SUPERMETAL_USER;
GRANT EXECUTE ON DBMS_LOGMNR_D TO C##SUPERMETAL_USER;

-- If TDE is enabled, also grant:
-- GRANT SELECT ON DBA_TABLESPACES TO C##SUPERMETAL_USER CONTAINER=ALL;
-- GRANT SELECT ON DBA_ENCRYPTED_COLUMNS TO C##SUPERMETAL_USER CONTAINER=ALL;
```

```sql
-- Connect directly to the PDB as an Admin user
-- Requires Oracle 19c Update 10 or later

-- Create a local user in the PDB
CREATE USER SUPERMETAL_USER IDENTIFIED BY "strong-password";

-- Grant session and catalog permissions
GRANT CREATE SESSION TO SUPERMETAL_USER;
GRANT EXECUTE_CATALOG_ROLE TO SUPERMETAL_USER;

-- Grant LogMiner permissions
GRANT LOGMINING TO SUPERMETAL_USER;

-- Grant table access
GRANT SELECT ANY TABLE TO SUPERMETAL_USER;
GRANT SELECT ANY TRANSACTION TO SUPERMETAL_USER;

-- Grant access to LogMiner dictionary views
GRANT SELECT ON V_$DATABASE TO SUPERMETAL_USER;
GRANT SELECT ON V_$ARCHIVED_LOG TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGMNR_LOGS TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOG TO SUPERMETAL_USER;
GRANT SELECT ON V_$LOGFILE TO SUPERMETAL_USER;
GRANT SELECT ON V_$TRANSACTION TO SUPERMETAL_USER;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO SUPERMETAL_USER;

-- Required when using redo logs dictionary strategy (Oracle 19cU10+)
GRANT SELECT ON DBA_LOGMNR_DICTIONARY_BUILDLOG TO SUPERMETAL_USER;

-- Grant access to system views for metadata
GRANT SELECT ON DBA_OBJECTS TO SUPERMETAL_USER;
GRANT SELECT ON DBA_EXTENTS TO SUPERMETAL_USER;

-- Grant execute permissions for LogMiner
GRANT EXECUTE ON DBMS_LOGMNR TO SUPERMETAL_USER;
GRANT EXECUTE ON DBMS_LOGMNR_D TO SUPERMETAL_USER;

-- If TDE is enabled, also grant:
-- GRANT SELECT ON DBA_TABLESPACES TO SUPERMETAL_USER;
-- GRANT SELECT ON DBA_ENCRYPTED_COLUMNS TO SUPERMETAL_USER;
```

```sql
-- Connect as the master user

-- Create a user for Supermetal
CREATE USER SUPERMETAL_USER IDENTIFIED BY "strong-password";
GRANT CONNECT TO SUPERMETAL_USER;
GRANT CREATE SESSION TO SUPERMETAL_USER;

-- Grant LogMiner and catalog permissions
GRANT LOGMINING TO SUPERMETAL_USER;
GRANT EXECUTE_CATALOG_ROLE TO SUPERMETAL_USER;

-- Grant table access
GRANT SELECT ANY TRANSACTION TO SUPERMETAL_USER;
GRANT SELECT ANY TABLE TO SUPERMETAL_USER;

-- Use RDS procedures to grant access to system objects
BEGIN
  -- Required when using redo logs dictionary strategy (Oracle 19cU10+)
  rdsadmin.rdsadmin_util.grant_sys_object('DBA_LOGMNR_DICTIONARY_BUILDLOG','SUPERMETAL_USER','SELECT');
  rdsadmin.rdsadmin_util.grant_sys_object('V_$DATABASE','SUPERMETAL_USER','SELECT');
  rdsadmin.rdsadmin_util.grant_sys_object('V_$ARCHIVED_LOG','SUPERMETAL_USER','SELECT');
  rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_CONTENTS','SUPERMETAL_USER','SELECT');
  rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_LOGS','SUPERMETAL_USER','SELECT');
  rdsadmin.rdsadmin_util.grant_sys_object('V_$LOG','SUPERMETAL_USER','SELECT');
  rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGFILE','SUPERMETAL_USER','SELECT');
  rdsadmin.rdsadmin_util.grant_sys_object('V_$TRANSACTION','SUPERMETAL_USER','SELECT');
  rdsadmin.rdsadmin_util.grant_sys_object('V_$ARCHIVE_DEST_STATUS','SUPERMETAL_USER','SELECT');
  rdsadmin.rdsadmin_util.grant_sys_object('V_$PARAMETER','SUPERMETAL_USER','SELECT');
  rdsadmin.rdsadmin_util.grant_sys_object('DBMS_LOGMNR','SUPERMETAL_USER','EXECUTE');
  rdsadmin.rdsadmin_util.grant_sys_object('DBMS_LOGMNR_D','SUPERMETAL_USER','EXECUTE');
  rdsadmin.rdsadmin_util.grant_sys_object('DBA_OBJECTS','SUPERMETAL_USER','SELECT');
  rdsadmin.rdsadmin_util.grant_sys_object('DBA_EXTENTS','SUPERMETAL_USER','SELECT');
END;
/

-- If TDE is enabled, also grant:
-- BEGIN
--   rdsadmin.rdsadmin_util.grant_sys_object('DBA_TABLESPACES','SUPERMETAL_USER','SELECT');
--   rdsadmin.rdsadmin_util.grant_sys_object('DBA_ENCRYPTED_COLUMNS','SUPERMETAL_USER','SELECT');
-- END;
-- /
```

```sql
-- Connect to the PDB as the master user
-- Requires Oracle 19c Update 10 or later on RDS

-- Create a local user in the PDB
CREATE USER SUPERMETAL_USER IDENTIFIED BY "strong-password";
GRANT CONNECT TO SUPERMETAL_USER;
GRANT CREATE SESSION TO SUPERMETAL_USER;

-- Grant LogMiner and catalog permissions
GRANT LOGMINING TO SUPERMETAL_USER;
GRANT EXECUTE_CATALOG_ROLE TO SUPERMETAL_USER;

-- Grant table access
GRANT SELECT ANY TRANSACTION TO SUPERMETAL_USER;
GRANT SELECT ANY TABLE TO SUPERMETAL_USER;

-- Use RDS procedures to grant access to system objects
BEGIN
  -- Required when using redo logs dictionary strategy (Oracle 19cU10+)
  rdsadmin.rdsadmin_util.grant_sys_object('DBA_LOGMNR_DICTIONARY_BUILDLOG','SUPERMETAL_USER','SELECT');
  rdsadmin.rdsadmin_util.grant_sys_object('V_$DATABASE','SUPERMETAL_USER','SELECT');
  rdsadmin.rdsadmin_util.grant_sys_object('V_$ARCHIVED_LOG','SUPERMETAL_USER','SELECT');
  rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_CONTENTS','SUPERMETAL_USER','SELECT');
  rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_LOGS','SUPERMETAL_USER','SELECT');
  rdsadmin.rdsadmin_util.grant_sys_object('V_$LOG','SUPERMETAL_USER','SELECT');
  rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGFILE','SUPERMETAL_USER','SELECT');
  rdsadmin.rdsadmin_util.grant_sys_object('V_$TRANSACTION','SUPERMETAL_USER','SELECT');
  rdsadmin.rdsadmin_util.grant_sys_object('V_$ARCHIVE_DEST_STATUS','SUPERMETAL_USER','SELECT');
  rdsadmin.rdsadmin_util.grant_sys_object('V_$PARAMETER','SUPERMETAL_USER','SELECT');
  rdsadmin.rdsadmin_util.grant_sys_object('DBMS_LOGMNR','SUPERMETAL_USER','EXECUTE');
  rdsadmin.rdsadmin_util.grant_sys_object('DBMS_LOGMNR_D','SUPERMETAL_USER','EXECUTE');
  rdsadmin.rdsadmin_util.grant_sys_object('DBA_OBJECTS','SUPERMETAL_USER','SELECT');
  rdsadmin.rdsadmin_util.grant_sys_object('DBA_EXTENTS','SUPERMETAL_USER','SELECT');
END;
/

-- If TDE is enabled, also grant:
-- BEGIN
--   rdsadmin.rdsadmin_util.grant_sys_object('DBA_TABLESPACES','SUPERMETAL_USER','SELECT');
--   rdsadmin.rdsadmin_util.grant_sys_object('DBA_ENCRYPTED_COLUMNS','SUPERMETAL_USER','SELECT');
-- END;
-- /
```

Oracle's default profiles often have low session limits, which may impact Supermetal's parallel processing capabilities when snapshotting your database.
For optimal performance, set your `SESSIONS_PER_USER` limit to the smaller of:

* The number of Supermetal agent cores, or
* The number of database cores (available from `v$parameter.cpu_count`)

We recommend setting this value to at least 10 sessions:

```sql
-- Check current database CPU count
SELECT VALUE FROM v$parameter WHERE NAME = 'cpu_count';

-- Create profile with appropriate session limit
CREATE PROFILE SUPERMETAL_PROFILE LIMIT SESSIONS_PER_USER 10;

-- Apply profile to Supermetal user
ALTER USER SUPERMETAL_USER PROFILE SUPERMETAL_PROFILE;
```

For CDB environments, apply the profile to your common user instead:

```sql
ALTER USER C##SUPERMETAL_USER PROFILE SUPERMETAL_PROFILE;
```

Replace `"strong-password"` with a secure, unique password for the Supermetal user. Store this password securely, as you'll need it when configuring the source in Supermetal.

### Setup LogMiner

#### Archive Log Mode

Oracle's LogMiner requires that your database is running in ARCHIVELOG mode. You can check and enable it as follows:

```sql
-- Check if database is in ARCHIVELOG mode
SELECT log_mode FROM v$database;

-- If not in ARCHIVELOG mode, enable it
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
```

[Changing log mode](https://docs.oracle.com/en/database/oracle/oracle-database/23/sutil/oracle-logminer-utility.html#GUID-7594F0D7-0ACD-46E6-BD61-2751136ECDB4) requires a restart of the Oracle database.

Instruct Oracle [RMAN](https://docs.oracle.com/en/database/oracle/oracle-database/23/sutil/oracle-logminer-utility.html#GUID-7594F0D7-0ACD-46E6-BD61-2751136ECDB4) to retain archive logs for at least 3 days:

```sql
RMAN> CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 3 DAYS;
```

For optimal LogMiner performance, ensure your redo logs are appropriately sized:

```sql
-- Check current redo log configuration
SELECT group#, bytes/1024/1024 AS size_mb FROM v$log;
```

Each redo log file should be at least 500MB for production databases. If your logs are smaller, consider increasing their size by contacting your DBA.

```sql
-- Check if database is in ARCHIVELOG mode
SELECT log_mode FROM v$database;

-- If not in ARCHIVELOG mode, enable it by turning on backups in AWS RDS console
```

AWS RDS Oracle instances are automatically configured with ARCHIVELOG mode enabled when backups are enabled. Set archive log retention to at least 3 days using the `set_configuration` procedure:

```sql
BEGIN
  rdsadmin.rdsadmin_util.set_configuration('archivelog retention hours', 72);
END;
/
```

Adjust this value based on your backup and recovery requirements.

#### Database Supplemental Logging

LogMiner requires supplemental logging to capture the necessary information for CDC:

* **Minimal**: Lower redo log usage but requires table-level supplemental logging configuration for each table.
* **Full**: Higher redo log usage but eliminates the need for table-level configuration.
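You can check the current supplemental logging state before deciding which mode to enable; both flags are exposed on the standard `v$database` view:

```sql
-- SUPPLEMENTAL_LOG_DATA_MIN = YES/IMPLICIT means minimal supplemental logging is enabled;
-- SUPPLEMENTAL_LOG_DATA_ALL = YES means full (ALL COLUMNS) supplemental logging is enabled
SELECT supplemental_log_data_min, supplemental_log_data_all FROM v$database;
```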
```sql
-- Enable minimal supplemental logging at the database level
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

-- Then configure table-level supplemental logging for each table to be synced
-- Repeat for each table to be synced:
ALTER TABLE schema_name.table_name ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
```

```sql
-- Enable full supplemental logging at the database level
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
```

For more reliable CDC, consider enabling force logging at the database level:

```sql
ALTER DATABASE FORCE LOGGING;
```

```sql
-- Enable full supplemental logging using RDS procedures
BEGIN
  rdsadmin.rdsadmin_util.alter_supplemental_logging(
    p_action => 'ADD',
    p_type => 'ALL'
  );
END;
/
```

For more reliable CDC, enable force logging using the RDS-specific procedure:

```sql
BEGIN
  rdsadmin.rdsadmin_util.force_logging(p_enable => true);
END;
/
```

This ensures all database operations are logged.

***

## Data Types Mapping

| Oracle Type(s) | Apache Arrow DataType | Notes |
| :--------------------------------- | :-------------------- | :----------------------------------------- |
| `NUMBER(p, s)` where p,s specified | `Decimal128(p, s)` | Constrained precision and scale. |
| `NUMBER` (identity column) | `Decimal128(38, 0)` | Primary keys |
| `NUMBER` (unconstrained) | `Utf8` | Unconstrained precision and scale |
| `NUMBER(p, s)` where s \< 0 | `Utf8` | Negative scale numbers |
| `BINARY_FLOAT` | `Float32` | 32-bit IEEE 754 single precision |
| `BINARY_DOUBLE` | `Float64` | 64-bit IEEE 754 double precision |
| `FLOAT(p)` where p ≤ 20 | `Float32` | Binary precision up to \~6 decimal digits |
| `FLOAT(p)` where 20 \< p ≤ 49 | `Float64` | Binary precision up to \~15 decimal digits |
| `FLOAT(p)` where p > 49 | `Utf8` | High-precision floats preserved as strings |

| Oracle Type(s) | Apache Arrow DataType | Notes |
| :------------- | :-------------------- | :---- |
| `BOOLEAN` | *Unsupported* | |

| Oracle Type(s) | Apache Arrow DataType | Notes |
| :---------------------------- | :-------------------------- | :------------------------------------------------------------------------------- |
| `DATE` | `Timestamp(s)` | Oracle DATE includes both date and time components |
| `TIMESTAMP(p)` | `Timestamp(s/ms/μs)` | Precision based on p:<br />p=0: seconds<br />p=1-3: milliseconds<br />p=4-6: microseconds |
| `TIMESTAMP(ns)` | `Utf8` | |
| `TIMESTAMP(p) WITH TIME ZONE` | `Timestamp(s/ms/μs, "UTC")` | Same precision levels as above, normalized to UTC |
| `TIMESTAMP(p) WITH LOCAL TZ` | `Timestamp(s/ms/μs)` | Same precision levels as above, UTC timezone |
| `INTERVAL YEAR(2) TO MONTH` | `Interval(YearMonth)` | Stores duration as months (years×12 + months) |
| `INTERVAL YEAR TO MONTH` | `Utf8` | |
| `INTERVAL DAY TO SECOND` | `Utf8` | |

| Oracle Type(s) | Apache Arrow DataType | Notes |
| :------------- | :-------------------- | :---------------------------------------------------------------------------------------------------- |
| `CHAR` | `Utf8` | Fixed-length character data |
| `NCHAR` | `Utf8` | Fixed-length Unicode character data |
| `VARCHAR2` | `Utf8` | Variable-length character data. Supports extended string size (MAX\_STRING\_SIZE = EXTENDED). |
| `NVARCHAR2` | `Utf8` | Variable-length Unicode character data. Supports extended string size (MAX\_STRING\_SIZE = EXTENDED). |
| `CLOB` | `LargeUtf8` | Character Large Object for text > 2GB |
| `NCLOB` | `LargeUtf8` | National Character Large Object for Unicode > 2GB |
| `LONG` | `LargeUtf8` | Legacy long text type (deprecated in Oracle) |
| `ROWID` | `Utf8` | Oracle's physical row address |

| Oracle Type(s) | Apache Arrow DataType | Notes |
| :------------- | :-------------------- | :------------------------------------------------------------------------------- |
| `RAW` | `Binary` | Fixed-length binary data. Supports extended size (MAX\_STRING\_SIZE = EXTENDED). |
| `BLOB` | `LargeBinary` | Binary Large Object for binary data > 2GB |
| `BFILE` | `LargeBinary` | Binary file locator for external files |
| `LONG RAW` | `LargeBinary` | Legacy long binary type (deprecated in Oracle) |

| Oracle Type(s) | Apache Arrow DataType | Notes |
| :------------- | :-------------------- | :----------------------------------------------------------------------- |
| `XMLTYPE` | `Utf8` | Mapped to `Utf8`. An extension type for XML is defined, similar to JSON. |
| `JSON` | *Unsupported* | |

| Oracle Type(s) | Apache Arrow DataType | Notes |
| :------------- | :-------------------- | :----------------------------- |
| `OBJECT` | *Unsupported* | Object types are not supported |
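As a worked example of the mappings above, consider a hypothetical table (the column comments show the Arrow type each column resolves to under the tables in this section):

```sql
CREATE TABLE orders (
  order_id    NUMBER(10,2),   -- Decimal128(10, 2): precision and scale specified
  total       NUMBER,         -- Utf8: unconstrained precision and scale
  placed_at   TIMESTAMP(3),   -- Timestamp(ms): p=1-3 maps to milliseconds
  updated_on  DATE,           -- Timestamp(s): Oracle DATE carries date and time
  notes       CLOB            -- LargeUtf8
);

-- Unconstrained NUMBER columns are preserved as strings; to find them ahead of
-- time, you can query the standard ALL_TAB_COLUMNS dictionary view:
SELECT owner, table_name, column_name
FROM all_tab_columns
WHERE data_type = 'NUMBER' AND data_precision IS NULL AND data_scale IS NULL;
```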
# AWS (/docs/main/concepts/deployment/byoc/aws)

import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
import { Step, Steps } from 'fumadocs-ui/components/steps';
import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
import ByocAwsDiagram from '../../../../../../app/diagrams/byoc_aws';

Follow this guide to provision Supermetal BYOC (Bring Your Own Cloud), which deploys Supermetal's data replication agents in your own virtual cloud environment, maintaining full control and ownership over your data. This provides an additional layer of security and isolation. Supermetal handles provisioning, operations, and maintenance.
***

## Overview

| Phase | Resource | Description |
| ---------------------- | -------------------------- | ----------- |
| **Bootstrap** | IAM Role | `SupermetalControlPlaneRole`: Assumed by Supermetal's control plane for orchestration of resources in your AWS account. |
| | IAM Role | `SupermetalAgentTaskRole`: IAM Role for Supermetal Agent ECS tasks, granting necessary permissions for agent operations (e.g., S3, Secrets Manager, KMS). |
| | IAM Role | `SupermetalTaskExecutionRole`: IAM Role for ECS task execution, allowing ECS to pull ECR images for agents and send logs to CloudWatch. |
| | IAM Role | `SupermetalAgentEC2InstanceRole`: IAM Role for EC2 instances that host Supermetal Agents (used with EC2-backed ECS capacity providers). |
| | IAM Instance Profile | `SupermetalAgentEC2InstanceProfile`: Instance profile associated with `SupermetalAgentEC2InstanceRole`, attached to EC2 instances hosting agents. |
| | IAM Role | `SupermetalValidationLambdaRole`: IAM Role for Supermetal Lambda functions used for validation tasks (e.g., connector credential validation, network reachability). |
| | Interface VPC Endpoint | `MetadataVPCEndpoint`: An AWS PrivateLink Interface VPC Endpoint for private communication between Supermetal agents and the Supermetal control plane. |
| | Security Group | `MetadataVPCEndpointSecurityGroup`: Associated with the `MetadataVPCEndpoint`, controls inbound traffic to the endpoint. |
| | Security Group | `SupermetalAgentMetadataSecurityGroup`: Allows Supermetal agents (running in your VPC) to connect to the `MetadataVPCEndpoint` on HTTPS port 443. |
| | (Optional) KMS Key & Alias | `SupermetalKMSKey` (and `SupermetalKMSKeyAlias` like `alias/supermetal-key-${VpcId}`): AWS KMS Customer Managed Key (CMK) for encrypting sensitive connector credentials. A new key is created if an existing `CredentialsKeyArn` is not provided. |
| **Registration** | Secret | An AWS Secrets Manager secret is created to store credentials required by the Supermetal control plane or agents, such as for ECR access or metadata service authentication. |
| **Connector Creation** | ECS Cluster | An Amazon ECS cluster (e.g., `supermetal-cluster-${VpcId}`) is created to host the Supermetal agent services. |
| | ECS Service | An Amazon ECS service is deployed within the cluster to run and manage Supermetal agent tasks. |
| | ECS Capacity Provider | An ECS Capacity Provider, typically backed by an EC2 Auto Scaling Group (ASG), is set up to manage the compute resources for the agents. |

* **Data Sovereignty:** Your primary data (being replicated) is processed entirely within your AWS VPC and never leaves your environment to the Supermetal Control Plane.
* **Secure Communication:** All metadata communication between Supermetal agents in your VPC and the Supermetal Control Plane occurs securely over AWS PrivateLink via an Interface VPC Endpoint. This traffic is encrypted in transit using **TLS 1.2+**.
* **Credential Protection:** Sensitive connector credentials provided to Supermetal are encrypted using **your** AWS KMS Customer Managed Key (`CredentialsKeyArn`) before being stored by the Supermetal Control Plane. Only the Supermetal Agents decrypt these credentials at runtime within your environment. Supermetal Control Plane cannot decrypt these credentials.
* **IAM Security:** The cross-account role (`SupermetalControlPlaneRole`) assumed by Supermetal utilizes an `ExternalId` to protect against the "confused deputy" problem. All IAM roles are designed with the principle of least privilege to perform only their required functions.

***

## Prerequisites

To successfully deploy Supermetal BYOC in your AWS environment, please ensure you have the following:
#### Supermetal Account Identifier

Your Supermetal account identifier, obtainable from the Supermetal Console under Settings.

* `SUPERMETAL_ACCOUNT_ID`
#### AWS Account & Permissions * IAM User or Role with sufficient permissions (e.g., `AWSAdministratorAccess`) to bootstrap the Supermetal BYOC environment using either the [CloudFormation or the Terraform](#deployment_method).
#### Deployment Parameters

* **AWS Account ID:** The AWS Account ID of the target deployment account (e.g., `123456789012`).
* **AWS Region(s):** The AWS Region(s) of the target deployment account (e.g., `us-east-1`, `eu-west-2`).
* **VPC ID (`VpcId`):** The ID of an existing VPC within your chosen AWS Region where the bootstrap infrastructure and Supermetal agents will be deployed on connector creation.
* **Subnet IDs (`SubnetIds`):** A list of (private) subnet IDs within your specified `VpcId`. Supermetal agents will be deployed across these subnets for high availability. Ensure these subnets belong to the same availability zones as your source (and target) databases.
* **(Optional) KMS Key for Credentials (`CredentialsKeyArn`):** Decide if you'll use an existing AWS KMS Customer Managed Key (CMK) for encrypting sensitive connector credentials. If you have an existing CMK, have its ARN ready. If left blank, the bootstrap process will create a new KMS key specifically for Supermetal.

***
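If you use the AWS CLI, you can look up candidate values for the VPC and subnet parameters above (illustrative commands; the VPC ID below is a placeholder):

```bash
# List VPCs in the target region with their CIDR ranges
aws ec2 describe-vpcs \
  --query 'Vpcs[].{Id:VpcId,Cidr:CidrBlock}' --output table

# List subnets (with availability zones) in a candidate VPC
aws ec2 describe-subnets \
  --filters Name=vpc-id,Values=vpc-xxxxxxxxxxxxxxxxx \
  --query 'Subnets[].{Id:SubnetId,Az:AvailabilityZone}' --output table
```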
## Setup

Setup is a two-phase process: first you bootstrap the environment, then you create a connector to deploy the data plane / agents in your VPC.
#### Environment Bootstrap

The bootstrap process uses an AWS CloudFormation template or a Terraform script provided by Supermetal to provision the necessary resources, including [IAM roles](#iam-roles) and a private endpoint. These IAM roles are used by Supermetal to automatically register your cloud environment with the Supermetal control plane over the private endpoint, and to automatically deploy Supermetal data plane / agents in your VPC.

The initial environment bootstrap process does not provision any compute resources or the data plane / agents. Once the environment bootstrap is complete, you can create connectors to deploy Supermetal data plane / agents on-demand in your VPC.

You can use the AWS Management Console or the AWS CLI to deploy the bootstrap CloudFormation template.

#### Obtain CloudFormation Template \[!toc]

* Obtain the latest CloudFormation template file from the Supermetal Console; the [template from the appendix](#cloudformation-template) is for reference.

#### Deploy CloudFormation Template \[!toc]

#### Deploy with AWS Management Console \[!toc]

#### Prepare and Initiate Stack Creation \[!toc]

* Navigate to the **AWS CloudFormation console** in your AWS account.
* Ensure you are in the correct **AWS Region** for your deployment.
* Click **"Create stack"** and select **"With new resources (standard)"**.
* Under "Prerequisite - Prepare template", choose **"Template is ready"**.
* Under "Specify template", select **"Upload a template file"**.
* Click **"Choose file"** and upload the CloudFormation template file you obtained.
* Click **"Next"**.

#### Specify Stack Details \[!toc]

* On the "Specify stack details" page, enter a **"Stack name"** (e.g., `supermetal-bootstrap`).
* In the **"Parameters"** section, provide values for each parameter.
* Click **"Next"**.

Refer to the [Deployment Parameters](#deployment-parameters) section above for detailed explanations of each.

* `SupermetalAccountId`
* `VpcId`
* `SubnetIds`
* `CredentialsKeyArn`

#### Review and Create \[!toc]

* On the "Review" page, carefully **review all your settings** and parameter values.
* Scroll to the bottom and find the **"Capabilities"** section.
* **Important:** You **must** check the box that says: "I acknowledge that AWS CloudFormation might create IAM resources with custom names." (The exact wording might vary slightly.)
* Click **"Create stack"** (or "Submit" depending on the AWS console version).

#### Deploy with AWS CLI \[!toc]

* Set the AWS variables necessary to authenticate. Use either:
  * `AWS_PROFILE`
  * `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
* Verify access by running `aws sts get-caller-identity --region <your-region>`. See the [AWS CLI reference](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html).

#### Deploy Stack \[!toc]

Use the `aws cloudformation deploy` command to create the stack.

```bash
aws cloudformation deploy \
  --template-file /path/to/your/supermetal-bootstrap-cf-template.yaml \
  --stack-name supermetal-bootstrap \
  --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \
  --parameter-overrides \
    SupermetalAccountId="$SUPERMETAL_ACCOUNT_ID" \
    VpcId="$VPC_ID" \
    SubnetIds="$SUBNET_IDS" \
    CredentialsKeyArn="$CREDENTIALS_KEY_ARN" # Optional: Provide a Customer Managed Key (CMK) ARN for encrypting connector credentials
```

Refer to the [Deployment Parameters](#deployment-parameters) section above for detailed explanations of each.
* `SUPERMETAL_ACCOUNT_ID`
* `VPC_ID`
* `SUBNET_IDS`
* `CREDENTIALS_KEY_ARN`

#### Verification \[!toc]

* Monitor the CloudFormation stack creation process in the AWS CloudFormation console. It may take several minutes to complete.
* Once the CloudFormation stack status shows `CREATE_COMPLETE`, Supermetal's Control Plane automatically performs a registration process for the new environment.
* Monitor the Environments page in your Supermetal Console. The environment should show up with a "Ready" status within a few minutes after the CloudFormation stack completes.

You can use the Terraform CLI to deploy the bootstrap module.

#### Obtain Terraform Script \[!toc]

* Obtain the latest Terraform script from the Supermetal Console. The Terraform code in the [appendix](#environment-bootstrap-scripts) can be used as a reference or saved as local files (e.g., `main.tf`, `variables.tf`, `outputs.tf`).

#### Configure Terraform \[!toc]

* Ensure you have [Terraform installed](https://learn.hashicorp.com/tutorials/terraform/install-cli) (version 1.0 or later).
* Set up your AWS credentials for Terraform. This typically involves configuring the AWS CLI and ensuring your environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`) or AWS profile (`AWS_PROFILE`) are correctly set.
* Verify access by running `aws sts get-caller-identity --region <your-region>`. Refer to the [AWS provider documentation](https://registry.terraform.io/providers/hashicorp/aws/latest/docs#authentication-and-configuration) for more details.

#### Prepare Terraform Files \[!toc]

* Create a new directory for your Terraform configuration.
* If using the appendix script, save the Terraform code into files within this directory (e.g., `main.tf`).
* Create a `terraform.tfvars` file in the same directory to specify the required variable values. Alternatively, you can pass variables via the command line.

**Example `terraform.tfvars`:**

```tfvars
vpc_id = "vpc-xxxxxxxxxxxxxxxxx"
private_subnet_ids = ["subnet-xxxxxxxxxxxxxxxxx", "subnet-yyyyyyyyyyyyyyyyy"]
external_id = "your-supermetal-provided-external-id"
supermetal_control_plane_aws_account_id = "supermetal-aws-account-id" # Provided by Supermetal
supermetal_metadata_service_endpoint_name = "com.amazonaws.vpce.your-region.vpce-svc-xxxxxxxxxxxxxxxxx" # Provided by Supermetal
# credentials_key_arn = "arn:aws:kms:your-region:your-account-id:key/your-key-id" # Optional: Only if using an existing KMS CMK
```

Refer to the [Deployment Parameters](#deployment-parameters) section and the variable descriptions within the Terraform script for detailed explanations of each.

* `vpc_id`
* `private_subnet_ids`
* `external_id`
* `supermetal_control_plane_aws_account_id`
* `supermetal_metadata_service_endpoint_name`
* `credentials_key_arn` (Optional)

#### Deploy with Terraform CLI \[!toc]

* Navigate to your Terraform configuration directory in your terminal.
* Initialize Terraform:

  ```bash
  terraform init
  ```
* Review the execution plan:

  ```bash
  terraform plan
  ```
* Apply the configuration:

  ```bash
  terraform apply
  ```

  Confirm the action by typing `yes` when prompted.

If you didn't use a `.tfvars` file, you can pass variables directly with the `apply` command.
Ensure you provide values for all required variables:

```bash
terraform apply \
  -var="vpc_id=vpc-xxxxxxxxxxxxxxxxx" \
  -var="private_subnet_ids=[\"subnet-xxxxxxxxxxxxxxxxx\", \"subnet-yyyyyyyyyyyyyyyyy\"]" \
  -var="external_id=YOUR_EXTERNAL_ID_FROM_SUPERMETAL_CONSOLE" \
  -var="supermetal_control_plane_aws_account_id=SUPERMETAL_CONTROL_PLANE_AWS_ACCOUNT_ID" \
  -var="supermetal_metadata_service_endpoint_name=COM_AMAZONAWS_VPCE_YOUR_REGION_VPCE_SVC_XXXXXXXXXXXXXX"

# Optional: Add this -var if you are using an existing KMS CMK.
# If omitted and 'credentials_key_arn' is not set in a .tfvars file, a new KMS key will be created.
# -var="credentials_key_arn=arn:aws:kms:your-region:your-account-id:key/your-key-id"
```

Replace placeholder values (e.g., `vpc-xxxxxxxxxxxxxxxxx`, `YOUR_EXTERNAL_ID_FROM_SUPERMETAL_CONSOLE`) with your actual values.

#### Verification \[!toc]

* Monitor the `terraform apply` command output. It will indicate when the resources have been successfully created.
* Once the Terraform apply is complete, Supermetal's Control Plane automatically performs a registration process for the new environment.
* Monitor the Environments page in your Supermetal Console. The environment should show up with a "Ready" status within a few minutes after Terraform completes.

***
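Beyond the Console status, you can spot-check the provisioned resources from the CLI (illustrative commands; the role name follows the `-${VpcId}` suffix convention used by the bootstrap template, and the VPC ID is a placeholder):

```bash
# Confirm the cross-account control plane role was created
aws iam get-role --role-name SupermetalControlPlaneRole-vpc-xxxxxxxxxxxxxxxxx

# Confirm the metadata interface endpoint is up in your VPC
aws ec2 describe-vpc-endpoints \
  --filters Name=vpc-id,Values=vpc-xxxxxxxxxxxxxxxxx \
  --query 'VpcEndpoints[].{Id:VpcEndpointId,State:State,Service:ServiceName}'
```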
#### Create Connector(s) to deploy Supermetal agents

Once the environment bootstrap is complete, you are ready to create a connector to deploy Supermetal agents in your VPC.

#### Select source database \[!toc]

Follow the connector setup instructions in the Supermetal Console to select a source database; Supermetal can optionally [auto discover](#database_discovery) available source databases.

#### Create Security Group and Configure Source Database Access \[!toc]

To give Supermetal access to your source database and to validate network reachability and connection credentials, you need to create a security group and configure the source database access. The Supermetal Console will list pre-filled steps to create the security group and configure the database access as part of the prerequisite steps, similar to the steps listed below.

```bash
# Create the security group
SG_ID=$(aws ec2 create-security-group \
  --group-name supermetal-agent-sg \
  --description "Security group for Supermetal" \
  --vpc-id $VPC_ID \
  --output text --query 'GroupId')

# Add required outbound rules
# 1. For metadata VPC endpoint (created during environment bootstrap)
aws ec2 authorize-security-group-egress \
  --group-id $SG_ID \
  --protocol tcp \
  --port 443 \
  --source-group $(aws ec2 describe-security-groups \
    --filters Name=group-name,Values=supermetal-metadata-sg Name=vpc-id,Values=$VPC_ID \
    --query 'SecurityGroups[0].GroupId' --output text)

# 2. For database access
aws ec2 authorize-security-group-egress \
  --group-id $SG_ID \
  --protocol tcp \
  --port $DATABASE_PORT \
  --source-group $DATABASE_SECURITY_GROUP_ID

# Add database access rule to your database's security group
aws ec2 authorize-security-group-ingress \
  --group-id $DATABASE_SECURITY_GROUP_ID \
  --protocol tcp \
  --port $DATABASE_PORT \
  --source-group $SG_ID
```

* `VPC_ID`: The VPC ID of your environment. Refer to the [Deployment Parameters](#deployment-parameters) section
* `DATABASE_SECURITY_GROUP_ID`: The security group ID of your database
* `DATABASE_PORT`: The port number your database listens on (e.g., 5432 for PostgreSQL)
* The metadata endpoint security group (`supermetal-metadata-sg`) is automatically looked up using the VPC ID

#### Create Security Group and Configure Database Access \[!toc]

1. Navigate to the **AWS EC2 Console** > **Security Groups**
2. Click **Create security group**
3. Configure the basic details:
   * **Security group name:** `supermetal-agent-sg`
   * **Description:** "Security group for Supermetal agents"
   * **VPC:** Select the same VPC used for your Supermetal environment
4. Leave **Inbound rules** empty (no rules needed)
5. **Outbound rules (required):**
   * Add rule for metadata access:
     * **Type:** HTTPS
     * **Port:** 443
     * **Destination:** Security group named `supermetal-metadata-sg` (created during environment bootstrap)
   * Add rule for database access:
     * **Type:** Custom TCP
     * **Port:** Your database port
     * **Destination:** Your database's security group
6. Click **Create security group** and note the Security Group ID

#### Configure Database Access \[!toc]

7. Locate your database's security group
8. Edit its inbound rules:
   * **Type:** Custom TCP
   * **Port:** Your database port (e.g., 5432 for PostgreSQL)
   * **Source:** Select the `supermetal-agent-sg` security group
9. Save the rules

#### CloudFormation Template for Security Groups \[!toc]

```yaml
Parameters:
  VpcId:
    Type: AWS::EC2::VPC::Id
    Description: VPC where the security groups will be created
  DatabaseSecurityGroupId:
    Type: AWS::EC2::SecurityGroup::Id
    Description: Security Group ID of your database
  MetadataSecurityGroupId:
    Type: AWS::EC2::SecurityGroup::Id
    Description: Security Group ID of the metadata endpoint security group (supermetal-metadata-sg) created during environment bootstrap
  DatabasePort:
    Type: Number
    Description: Port number your database listens on (e.g., 5432 for PostgreSQL)
    MinValue: 1
    MaxValue: 65535

Resources:
  # Create the Supermetal Agent Security Group
  SupermetalAgentSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for Supermetal agents
      VpcId: !Ref VpcId
      SecurityGroupEgress:
        # Required outbound rules
        - Description: Allow access to metadata service
          IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          DestinationSecurityGroupId: !Ref MetadataSecurityGroupId
        - Description: Allow access to database
          IpProtocol: tcp
          FromPort: !Ref DatabasePort
          ToPort: !Ref DatabasePort
          DestinationSecurityGroupId: !Ref DatabaseSecurityGroupId

  # Add the ingress rule to the database security group
  DatabaseSecurityGroupIngress:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref DatabaseSecurityGroupId
      IpProtocol: tcp
      FromPort: !Ref DatabasePort
      ToPort: !Ref DatabasePort
      SourceSecurityGroupId: !Ref SupermetalAgentSecurityGroup

Outputs:
  AgentSecurityGroupId:
    Description: ID of the Supermetal Agent Security Group
    Value: !Ref SupermetalAgentSecurityGroup
```

This template creates both the agent security group and configures the database access in one deployment.

* `VpcId`: The VPC ID of your environment. Refer to the [Deployment Parameters](#deployment-parameters) section
* `DatabaseSecurityGroupId`: The security group ID of your database
* `DatabasePort`: The port number your database listens on (e.g., 5432 for PostgreSQL)
* `MetadataSecurityGroupId`: The ID of the metadata endpoint security group (`supermetal-metadata-sg`); CloudFormation cannot look up a security group by name, so its ID is passed explicitly

#### Create Security Group and Configure Database Access \[!toc]

```terraform
# Look up the metadata endpoint security group by name
data "aws_security_group" "metadata_endpoint" {
  name   = "supermetal-metadata-sg"
  vpc_id = var.vpc_id
}

# Create the agent security group
resource "aws_security_group" "supermetal_agent_sg" {
  description = "Security group for Supermetal agents"
  vpc_id      = var.vpc_id

  # Required outbound rules
  egress {
    description     = "Allow access to metadata service"
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    security_groups = [data.aws_security_group.metadata_endpoint.id]
  }

  egress {
    description     = "Allow access to database"
    from_port       = var.database_port
    to_port         = var.database_port
    protocol        = "tcp"
    security_groups = [var.database_security_group_id]
  }
}

# Add database access rule
resource "aws_security_group_rule" "allow_supermetal_agent_to_db" {
  type                     = "ingress"
  from_port                = var.database_port
  to_port                  = var.database_port
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.supermetal_agent_sg.id
  security_group_id        = var.database_security_group_id
}

# Output the security group ID for reference
output "supermetal_agent_sg_id" {
  value = aws_security_group.supermetal_agent_sg.id
}
```

* `vpc_id`: The VPC ID of your environment. Refer to the [Deployment Parameters](#deployment-parameters) section
* `database_security_group_id`: The security group ID of your database
* `database_port`: The port number your database listens on (e.g., 5432 for PostgreSQL)
* The metadata endpoint security group (`supermetal-metadata-sg`) is automatically looked up using the VPC ID

#### Validate Source database \[!toc]

Follow the setup instructions in the Supermetal Console to input the created security group ID for the source database and validate the network reachability and connection credentials.

#### Repeat for Target database \[!toc]

Repeat the same steps for the target database.

#### Finalize \[!toc]

Once the source and target database connections are validated, you can finalize the connector setup, which will launch the Supermetal agents in your VPC.

***
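As a final sanity check, you can inspect the security group wiring from the CLI (illustrative; assumes the agent group was named `supermetal-agent-sg` as in the steps above):

```bash
# Inspect the agent security group's egress rules (should include the
# metadata endpoint on port 443 and your database port)
aws ec2 describe-security-groups \
  --filters Name=group-name,Values=supermetal-agent-sg Name=vpc-id,Values=$VPC_ID \
  --query 'SecurityGroups[0].IpPermissionsEgress'
```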
## Appendix ### Environment Bootstrap Scripts Resources used to bootstrap the environment ```yaml AWSTemplateFormatVersion: '2010-09-09' Description: >- CloudFormation template to set up the Supermetal BYOC environment in your AWS account. This template creates IAM roles, a VPC Endpoint for communication with the Supermetal Control Plane and an optional KMS key. Parameters: VpcId: Type: AWS::EC2::VPC::Id Description: (Required) The ID of your existing VPC for agent deployment. PrivateSubnetIds: Type: List<AWS::EC2::Subnet::Id> Description: >- (Required) Comma-separated list of private subnet IDs within the VpcId where agents and the VPC endpoint will be deployed. CredentialsKeyArn: Type: String Description: >- (Optional) ARN of your existing KMS CMK for encrypting connector credentials. If left blank, a new KMS key will be created. Default: "" ExternalId: Type: String Description: >- (Required) A unique ID provided by the Supermetal Console. Used for securing the cross-account IAM role. SupermetalControlPlaneAWSAccountId: Type: String Description: The AWS Account ID of the Supermetal Control Plane. NoEcho: true SupermetalMetadataServiceEndpointName: Type: String Description: >- The VPC Endpoint Service Name for the Supermetal Metadata Service (e.g., com.amazonaws.vpce.us-east-1.vpce-svc-xxxxxxxxxxxxxxxxx). Conditions: CreateKMSKey: !Equals [!Ref CredentialsKeyArn, ""] Resources: # IAM Role for Supermetal Control Plane SupermetalControlPlaneRole: Type: AWS::IAM::Role Properties: RoleName: !Sub "SupermetalControlPlaneRole-${VpcId}" Tags: - Key: Name Value: !Sub "SupermetalControlPlaneRole-${VpcId}" - Key: supermetal:byoc:vpc-id Value: !Ref VpcId - Key: supermetal:byoc:stack-name Value: !Ref AWS::StackName - Key: supermetal:byoc:external-id Value: !Ref ExternalId AssumeRolePolicyDocument: Version: '2012-10-17' Statement: - Effect: Allow Principal: AWS: !Sub arn:aws:iam::${SupermetalControlPlaneAWSAccountId}:root Action: sts:AssumeRole Condition: StringEquals: sts:ExternalId: !Ref ExternalId ManagedPolicyArns: - !Ref SupermetalControlPlanePolicy # Managed Policy for SupermetalControlPlaneRole SupermetalControlPlanePolicy: Type: AWS::IAM::ManagedPolicy Properties: ManagedPolicyName: !Sub "SupermetalControlPlanePolicy-${VpcId}" PolicyDocument: Version: '2012-10-17' Statement: # ECS Cluster Management - Effect: Allow Action: - ecs:CreateCluster - ecs:DeleteCluster - ecs:DescribeClusters Resource: "*" # Scoped to clusters with name prefix 'supermetal-' via condition # Note: Conditions like 'ecs:cluster' on CreateCluster primarily apply to actions on existing resources. # They do not strictly restrict naming during creation itself due to IAM evaluation timing.
Condition: StringLike: ecs:cluster: "supermetal-*" # ECS Task and Service Management - Effect: Allow Action: - ecs:CreateService - ecs:UpdateService - ecs:DeleteService - ecs:RegisterTaskDefinition - ecs:DeregisterTaskDefinition - ecs:DescribeServices - ecs:DescribeTasks - ecs:RunTask - ecs:StopTask - ecs:ListTasks - ecs:ListServices - ecs:DescribeTaskDefinition - ecs:ListContainerInstances - ecs:DescribeContainerInstances Resource: "*" # Scoped to clusters with name prefix 'supermetal-' via condition Condition: StringLike: ecs:cluster: "supermetal-*" # ECS Capacity Provider Management - Effect: Allow Action: - ecs:CreateCapacityProvider - ecs:DeleteCapacityProvider - ecs:DescribeCapacityProviders - ecs:UpdateCapacityProvider - ecs:PutClusterCapacityProviders Resource: "*" # Scoped to capacity providers with name prefix 'supermetal-' via condition # Note: Conditions like 'ecs:capacity-provider' on CreateCapacityProvider primarily apply to actions on existing resources. # They do not strictly restrict naming during creation itself due to IAM evaluation timing. Condition: StringLike: ecs:capacity-provider: "supermetal-*" # EC2 Capacity Management - Effect: Allow Action: - autoscaling:CreateAutoScalingGroup - autoscaling:UpdateAutoScalingGroup - autoscaling:DeleteAutoScalingGroup - autoscaling:DescribeAutoScalingGroups - autoscaling:DescribeScalingActivities - autoscaling:SetDesiredCapacity - ec2:CreateLaunchTemplate - ec2:DeleteLaunchTemplate - ec2:DescribeLaunchTemplates - ec2:DescribeLaunchTemplateVersions - ec2:CreateTags - ec2:DescribeInstances Resource: "*" Condition: StringLike: aws:ResourceTag/supermetal: "*" # Lambda for Connector Validation - Effect: Allow Action: - lambda:InvokeFunction - lambda:CreateFunction - lambda:DeleteFunction - lambda:GetFunctionConfiguration - lambda:UpdateFunctionConfiguration - lambda:AddPermission - lambda:RemovePermission Resource: !Sub arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:supermetal-validation-* # Lambda VPC Access - Effect: Allow Action: - ec2:CreateNetworkInterface - ec2:DescribeNetworkInterfaces - ec2:DeleteNetworkInterface - ec2:AssignPrivateIpAddresses - ec2:UnassignPrivateIpAddresses Resource: "*" # Scoped to specified VPC via condition Condition: StringEquals: ec2:VpcId: !Ref VpcId # Database Discovery - Effect: Allow Action: - rds:DescribeDBInstances - rds:DescribeDBClusters - redshift:DescribeClusters Resource: "*" # Scoped to specified VPC via condition # Note: While ec2:VpcId is the correct key, RDS/Redshift Describe actions may not fully support VPC-based IAM filtering. # The control plane might list all regional resources; further filtering may be needed application-side. 
Condition: StringEquals: ec2:VpcId: !Ref VpcId # S3 Buffer Management - Effect: Allow Action: - s3:CreateBucket - s3:DeleteBucket - s3:PutBucketTagging - s3:GetBucketTagging - s3:PutEncryptionConfiguration - s3:GetEncryptionConfiguration Resource: !Sub arn:aws:s3:::supermetal-* # Security Group Management - Effect: Allow Action: - ec2:DescribeSecurityGroups Resource: "*" Condition: StringEquals: ec2:VpcId: !Ref VpcId # Secrets Management - Effect: Allow Action: - secretsmanager:CreateSecret - secretsmanager:DeleteSecret - secretsmanager:GetSecretValue - secretsmanager:PutSecretValue - secretsmanager:UpdateSecret - secretsmanager:TagResource Resource: !Sub arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:supermetal-* # KMS Encryption - Effect: Allow Action: kms:Encrypt Resource: !If [CreateKMSKey, !GetAtt SupermetalKMSKey.Arn, !Ref CredentialsKeyArn] # IAM Role Management - Effect: Allow Action: iam:PassRole Resource: - !GetAtt SupermetalAgentTaskRole.Arn - !GetAtt SupermetalValidationLambdaRole.Arn - !GetAtt SupermetalTaskExecutionRole.Arn - !GetAtt SupermetalAgentEC2InstanceRole.Arn Condition: StringEquals: iam:PassedToService: - ecs-tasks.amazonaws.com - lambda.amazonaws.com - ec2.amazonaws.com # IAM Role for Lambda Validation Function SupermetalValidationLambdaRole: Type: AWS::IAM::Role Properties: RoleName: !Sub "SupermetalValidationLambdaRole-${VpcId}" Tags: - Key: Name Value: !Sub "SupermetalValidationLambdaRole-${VpcId}" - Key: supermetal:byoc:vpc-id Value: !Ref VpcId - Key: supermetal:byoc:stack-name Value: !Ref AWS::StackName - Key: supermetal:byoc:external-id Value: !Ref ExternalId AssumeRolePolicyDocument: Version: '2012-10-17' Statement: - Effect: Allow Principal: Service: lambda.amazonaws.com Action: sts:AssumeRole ManagedPolicyArns: - arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole Path: / # IAM Role for Supermetal Agent Tasks SupermetalAgentTaskRole: Type: AWS::IAM::Role Properties: RoleName: !Sub "SupermetalAgentTaskRole-${VpcId}" Tags: - Key: Name Value: !Sub "SupermetalAgentTaskRole-${VpcId}" - Key: supermetal:byoc:vpc-id Value: !Ref VpcId - Key: supermetal:byoc:stack-name Value: !Ref AWS::StackName - Key: supermetal:byoc:external-id Value: !Ref ExternalId AssumeRolePolicyDocument: Version: '2012-10-17' Statement: - Effect: Allow Principal: Service: ecs-tasks.amazonaws.com Action: sts:AssumeRole Condition: ArnLike: aws:SourceArn: !Sub arn:aws:ecs:${AWS::Region}:${AWS::AccountId}:task/supermetal-* StringEquals: aws:SourceAccount: !Ref AWS::AccountId ManagedPolicyArns: - !Ref SupermetalAgentTaskPolicy Path: / # IAM Role for ECS Task Execution SupermetalTaskExecutionRole: Type: AWS::IAM::Role Properties: RoleName: !Sub "SupermetalTaskExecutionRole-${VpcId}" Tags: - Key: Name Value: !Sub "SupermetalTaskExecutionRole-${VpcId}" - Key: supermetal:byoc:vpc-id Value: !Ref VpcId - Key: supermetal:byoc:stack-name Value: !Ref AWS::StackName - Key: supermetal:byoc:external-id Value: !Ref ExternalId AssumeRolePolicyDocument: Version: '2012-10-17' Statement: - Effect: Allow Principal: Service: ecs-tasks.amazonaws.com Action: sts:AssumeRole Condition: StringEquals: aws:SourceAccount: !Ref AWS::AccountId ArnLike: aws:SourceArn: !Sub arn:aws:ecs:${AWS::Region}:${AWS::AccountId}:task-definition/supermetal*:* ManagedPolicyArns: - arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy - !Ref SupermetalTaskExecutionPolicy Path: / # Managed Policy for SupermetalAgentTaskRole SupermetalAgentTaskPolicy: Type: AWS::IAM::ManagedPolicy Properties: 
ManagedPolicyName: !Sub "SupermetalAgentTaskPolicy-${VpcId}" PolicyDocument: Version: '2012-10-17' Statement: - Effect: Allow Action: - kms:Decrypt - kms:Encrypt Resource: !If [CreateKMSKey, !GetAtt SupermetalKMSKey.Arn, !Ref CredentialsKeyArn] # Secrets Access - Effect: Allow Action: - secretsmanager:GetSecretValue Resource: !Sub arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:supermetal-* # S3 Buffer Access - Effect: Allow Action: - s3:PutObject - s3:GetObject - s3:DeleteObject - s3:ListBucket Resource: - !Sub arn:aws:s3:::supermetal-* - !Sub arn:aws:s3:::supermetal-*/* # Managed Policy for SupermetalTaskExecutionRole SupermetalTaskExecutionPolicy: Type: AWS::IAM::ManagedPolicy Properties: ManagedPolicyName: !Sub "SupermetalTaskExecutionPolicy-${VpcId}" PolicyDocument: Version: '2012-10-17' Statement: - Effect: Allow Action: - secretsmanager:GetSecretValue Resource: !Sub arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:supermetal-* - Effect: Allow Action: - kms:Decrypt Resource: !If [CreateKMSKey, !GetAtt SupermetalKMSKey.Arn, !Ref CredentialsKeyArn] # KMS Key for Connector Credentials SupermetalKMSKey: Type: AWS::KMS::Key Condition: CreateKMSKey Properties: Description: KMS key for encrypting Supermetal connector credentials Tags: - Key: Name Value: !Sub "SupermetalKMSKey-${VpcId}" - Key: supermetal:byoc:vpc-id Value: !Ref VpcId - Key: supermetal:byoc:stack-name Value: !Ref AWS::StackName - Key: supermetal:byoc:external-id Value: !Ref ExternalId EnableKeyRotation: true KeyPolicy: Version: '2012-10-17' Statement: - Sid: Enable IAM User Permissions Effect: Allow Principal: AWS: !Sub arn:aws:iam::${AWS::AccountId}:root Action: kms:* Resource: '*' # Key policy applies to this key only - Sid: Allow Supermetal Control Plane to Encrypt Effect: Allow Principal: AWS: !GetAtt SupermetalControlPlaneRole.Arn Action: kms:Encrypt Resource: '*' # Key policy applies to this key only - Sid: Allow Supermetal Agents to Decrypt/Encrypt Effect: Allow Principal: AWS: !GetAtt SupermetalAgentTaskRole.Arn Action: - kms:Decrypt - kms:Encrypt Resource: '*' # Key policy applies to this key only SupermetalKMSKeyAlias: Type: AWS::KMS::Alias Condition: CreateKMSKey Properties: AliasName: !Sub "alias/supermetal-key-${VpcId}" TargetKeyId: !Ref SupermetalKMSKey SupermetalAgentEC2InstanceProfile: Type: AWS::IAM::InstanceProfile Properties: InstanceProfileName: !Sub "SupermetalAgentEC2InstanceProfile-${VpcId}" Tags: - Key: Name Value: !Sub "SupermetalAgentEC2InstanceProfile-${VpcId}" - Key: supermetal:byoc:vpc-id Value: !Ref VpcId - Key: supermetal:byoc:stack-name Value: !Ref AWS::StackName - Key: supermetal:byoc:external-id Value: !Ref ExternalId Path: / Roles: - !Ref SupermetalAgentEC2InstanceRole SupermetalAgentEC2InstanceRole: Type: AWS::IAM::Role Properties: RoleName: !Sub "SupermetalAgentEC2InstanceRole-${VpcId}" Tags: - Key: Name Value: !Sub "SupermetalAgentEC2InstanceRole-${VpcId}" - Key: supermetal:byoc:vpc-id Value: !Ref VpcId - Key: supermetal:byoc:stack-name Value: !Ref AWS::StackName - Key: supermetal:byoc:external-id Value: !Ref ExternalId AssumeRolePolicyDocument: Version: '2012-10-17' Statement: - Effect: Allow Principal: Service: ec2.amazonaws.com Action: sts:AssumeRole ManagedPolicyArns: - arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role - !Ref SupermetalAgentEC2InstancePolicy Path: / # Managed Policy for EC2 Instance Role SupermetalAgentEC2InstancePolicy: Type: AWS::IAM::ManagedPolicy Properties: ManagedPolicyName: !Sub 
"SupermetalAgentEC2InstancePolicy-${VpcId}" PolicyDocument: Version: '2012-10-17' Statement: - Effect: Allow Action: - ecr:GetAuthorizationToken Resource: "*" - Effect: Allow Action: - ecr:BatchGetImage - ecr:GetDownloadUrlForLayer Resource: !Sub arn:aws:ecr:${AWS::Region}:${SupermetalControlPlaneAWSAccountId}:repository/agent/* SupermetalAgentMetadataSecurityGroup: Type: AWS::EC2::SecurityGroup Properties: GroupName: !Sub "supermetal-agent-metadata-sg-${VpcId}" GroupDescription: Security group for Supermetal Agents VpcId: !Ref VpcId Tags: - Key: Name Value: !Sub "supermetal-agent-metadata-sg-${VpcId}" - Key: supermetal:byoc:vpc-id Value: !Ref VpcId - Key: supermetal:byoc:stack-name Value: !Ref AWS::StackName - Key: supermetal:byoc:external-id Value: !Ref ExternalId # VPC Endpoint for Supermetal Metadata Service MetadataVPCEndpoint: Type: AWS::EC2::VPCEndpoint Properties: ServiceName: !Ref SupermetalMetadataServiceEndpointName VpcId: !Ref VpcId SubnetIds: !Ref PrivateSubnetIds VpcEndpointType: Interface PrivateDnsEnabled: true SecurityGroupIds: - !Ref MetadataVPCEndpointSecurityGroup Tags: - Key: Name Value: !Sub "MetadataVPCEndpoint-${VpcId}" - Key: supermetal:byoc:vpc-id Value: !Ref VpcId - Key: supermetal:byoc:stack-name Value: !Ref AWS::StackName - Key: supermetal:byoc:external-id Value: !Ref ExternalId MetadataVPCEndpointSecurityGroup: Type: AWS::EC2::SecurityGroup Properties: GroupName: !Sub "supermetal-metadata-vpce-sg-${VpcId}" GroupDescription: Security group for Supermetal metadata VPC Endpoint VpcId: !Ref VpcId SecurityGroupIngress: # Allow agents to connect to the endpoint - IpProtocol: tcp FromPort: 443 # HTTPS ToPort: 443 SourceSecurityGroupId: !Ref SupermetalAgentMetadataSecurityGroup Tags: - Key: Name Value: !Sub "supermetal-metadata-vpce-sg-${VpcId}" - Key: supermetal:byoc:vpc-id Value: !Ref VpcId - Key: supermetal:byoc:stack-name Value: !Ref AWS::StackName - Key: supermetal:byoc:external-id Value: !Ref ExternalId Outputs: SupermetalControlPlaneRoleArn: Description: ARN of the IAM Role for Supermetal Control Plane Value: !GetAtt SupermetalControlPlaneRole.Arn SupermetalAgentTaskRoleArn: Description: ARN of the IAM Role for Supermetal Agent Tasks Value: !GetAtt SupermetalAgentTaskRole.Arn SupermetalKMSKeyArnOutput: Description: ARN of the KMS Key for encrypting credentials Value: !If [CreateKMSKey, !GetAtt SupermetalKMSKey.Arn, !Ref CredentialsKeyArn] SupermetalTaskExecutionRoleArn: Description: ARN of the IAM Role for ECS Task Execution Value: !GetAtt SupermetalTaskExecutionRole.Arn MetadataVPCEndpointId: Description: ID of the VPC Endpoint for Supermetal Metadata Service Value: !Ref MetadataVPCEndpoint SupermetalAgentMetadataSecurityGroupId: Description: ID of the Security Group for Supermetal Agents (for Metadata Endpoint communication) Value: !GetAtt SupermetalAgentMetadataSecurityGroup.GroupId SupermetalValidationLambdaRoleArn: Description: ARN of the IAM Role for Supermetal Validation Lambda functions Value: !GetAtt SupermetalValidationLambdaRole.Arn SupermetalAgentEC2InstanceRoleArn: Description: ARN of the IAM Role for Supermetal Agent EC2 Instances Value: !GetAtt SupermetalAgentEC2InstanceRole.Arn SupermetalAgentEC2InstanceProfileArn: Description: ARN of the IAM Instance Profile for Supermetal Agent EC2 Instances Value: !GetAtt SupermetalAgentEC2InstanceProfile.Arn ``` ```terraform # This sets up the Supermetal BYOC environment in your AWS account. # It creates IAM roles, a VPC Endpoint for communication with the Supermetal Control Plane, # and an optional KMS key. 
terraform { required_providers { aws = { source = "hashicorp/aws" version = "~> 5.0" # Specify a compatible AWS provider version } } } # ------------------------------------------------------------------------------ # Input Variables # ------------------------------------------------------------------------------ variable "vpc_id" { description = "(Required) The ID of your existing VPC for agent deployment." type = string } variable "private_subnet_ids" { description = "(Required) List of private subnet IDs within the VpcId where agents and the VPC endpoint will be deployed." type = list(string) } variable "credentials_key_arn" { description = "(Optional) ARN of your existing KMS CMK for encrypting connector credentials. If left blank, a new KMS key will be created." type = string default = "" } variable "external_id" { description = "(Required) A unique ID provided by the Supermetal Console. Used for securing the cross-account IAM role." type = string } variable "supermetal_control_plane_aws_account_id" { description = "The AWS Account ID of the Supermetal Control Plane." type = string } variable "supermetal_metadata_service_endpoint_name" { description = "The VPC Endpoint Service Name for the Supermetal Metadata Service (e.g., com.amazonaws.vpce.us-east-1.vpce-svc-xxxxxxxxxxxxxxxxx)." type = string } locals { create_kms_key = var.credentials_key_arn == "" common_tags = { "supermetal:byoc:vpc-id" = var.vpc_id "supermetal:byoc:stack-name" = "terraform-supermetal-byoc" # Terraform doesn't have a direct AWS::StackName equivalent, using a placeholder "supermetal:byoc:external-id" = var.external_id } } data "aws_caller_identity" "current" {} data "aws_region" "current" {} # ------------------------------------------------------------------------------ # Resources # ------------------------------------------------------------------------------ # IAM Role for Supermetal Control Plane resource "aws_iam_role" "supermetal_control_plane_role" { name = "SupermetalControlPlaneRole-${var.vpc_id}" tags = merge(local.common_tags, { Name = "SupermetalControlPlaneRole-${var.vpc_id}" }) assume_role_policy = jsonencode({ Version = "2012-10-17", Statement = [ { Effect = "Allow", Principal = { AWS = "arn:aws:iam::${var.supermetal_control_plane_aws_account_id}:root" }, Action = "sts:AssumeRole", Condition = { StringEquals = { "sts:ExternalId" = var.external_id } } } ] }) } # Managed Policy for SupermetalControlPlaneRole resource "aws_iam_policy" "supermetal_control_plane_policy" { name = "SupermetalControlPlanePolicy-${var.vpc_id}" description = "Policy for Supermetal Control Plane Role" policy = jsonencode({ Version = "2012-10-17", Statement = [ # ECS Cluster Management { Effect = "Allow", Action = [ "ecs:CreateCluster", "ecs:DeleteCluster", "ecs:DescribeClusters" ], Resource = "*", # Scoped to clusters with name prefix 'supermetal-' via condition Condition = { StringLike = { "ecs:cluster" = "supermetal-*" } } }, # ECS Task and Service Management { Effect = "Allow", Action = [ "ecs:CreateService", "ecs:UpdateService", "ecs:DeleteService", "ecs:RegisterTaskDefinition", "ecs:DeregisterTaskDefinition", "ecs:DescribeServices", "ecs:DescribeTasks", "ecs:RunTask", "ecs:StopTask", "ecs:ListTasks", "ecs:ListServices", "ecs:DescribeTaskDefinition", "ecs:ListContainerInstances", "ecs:DescribeContainerInstances" ], Resource = "*", # Scoped to clusters with name prefix 'supermetal-' via condition Condition = { StringLike = { "ecs:cluster" = "supermetal-*" } } }, # ECS Capacity Provider Management { Effect = "Allow", 
Action = [ "ecs:CreateCapacityProvider", "ecs:DeleteCapacityProvider", "ecs:DescribeCapacityProviders", "ecs:UpdateCapacityProvider", "ecs:PutClusterCapacityProviders" ], Resource = "*", # Scoped to capacity providers with name prefix 'supermetal-' via condition Condition = { StringLike = { "ecs:capacity-provider" = "supermetal-*" } } }, # EC2 Capacity Management { Effect = "Allow", Action = [ "autoscaling:CreateAutoScalingGroup", "autoscaling:UpdateAutoScalingGroup", "autoscaling:DeleteAutoScalingGroup", "autoscaling:DescribeAutoScalingGroups", "autoscaling:DescribeScalingActivities", "autoscaling:SetDesiredCapacity", "ec2:CreateLaunchTemplate", "ec2:DeleteLaunchTemplate", "ec2:DescribeLaunchTemplates", "ec2:DescribeLaunchTemplateVersions", "ec2:CreateTags", "ec2:DescribeInstances" ], Resource = "*", Condition = { StringLike = { "aws:ResourceTag/supermetal" = "*" } } }, # Lambda for Connector Validation { Effect = "Allow", Action = [ "lambda:InvokeFunction", "lambda:CreateFunction", "lambda:DeleteFunction", "lambda:GetFunctionConfiguration", "lambda:UpdateFunctionConfiguration", "lambda:AddPermission", "lambda:RemovePermission" ], Resource = "arn:aws:lambda:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:function:supermetal-validation-*" }, # Lambda VPC Access { Effect = "Allow", Action = [ "ec2:CreateNetworkInterface", "ec2:DescribeNetworkInterfaces", "ec2:DeleteNetworkInterface", "ec2:AssignPrivateIpAddresses", "ec2:UnassignPrivateIpAddresses" ], Resource = "*", # Scoped to specified VPC via condition Condition = { StringEquals = { "ec2:VpcId" = var.vpc_id } } }, # Database Discovery { Effect = "Allow", Action = [ "rds:DescribeDBInstances", "rds:DescribeDBClusters", "redshift:DescribeClusters" ], Resource = "*", # Scoped to specified VPC via condition Condition = { StringEquals = { "ec2:VpcId" = var.vpc_id } } }, # S3 Buffer Management { Effect = "Allow", Action = [ "s3:CreateBucket", "s3:DeleteBucket", "s3:PutBucketTagging", "s3:GetBucketTagging", "s3:PutEncryptionConfiguration", "s3:GetEncryptionConfiguration" ], Resource = "arn:aws:s3:::supermetal-*" }, # Security Group Management { Effect = "Allow", Action = [ "ec2:DescribeSecurityGroups" ], Resource = "*", Condition = { StringEquals = { "ec2:VpcId" = var.vpc_id } } }, # Secrets Management { Effect = "Allow", Action = [ "secretsmanager:CreateSecret", "secretsmanager:DeleteSecret", "secretsmanager:GetSecretValue", "secretsmanager:PutSecretValue", "secretsmanager:UpdateSecret", "secretsmanager:TagResource" ], Resource = "arn:aws:secretsmanager:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:secret:supermetal-*" }, # KMS Encryption { Effect = "Allow", Action = "kms:Encrypt", Resource = local.create_kms_key ? 
aws_kms_key.supermetal_kms_key[0].arn : var.credentials_key_arn }, # IAM Role Management { Effect = "Allow", Action = "iam:PassRole", Resource = [ aws_iam_role.supermetal_agent_task_role.arn, aws_iam_role.supermetal_validation_lambda_role.arn, aws_iam_role.supermetal_task_execution_role.arn, aws_iam_role.supermetal_agent_ec2_instance_role.arn ], Condition = { StringEquals = { "iam:PassedToService" = [ "ecs-tasks.amazonaws.com", "lambda.amazonaws.com", "ec2.amazonaws.com" ] } } } ] }) } resource "aws_iam_role_policy_attachment" "supermetal_control_plane_role_policy_attach" { role = aws_iam_role.supermetal_control_plane_role.name policy_arn = aws_iam_policy.supermetal_control_plane_policy.arn } # IAM Role for Lambda Validation Function resource "aws_iam_role" "supermetal_validation_lambda_role" { name = "SupermetalValidationLambdaRole-${var.vpc_id}" path = "/" tags = merge(local.common_tags, { Name = "SupermetalValidationLambdaRole-${var.vpc_id}" }) assume_role_policy = jsonencode({ Version = "2012-10-17", Statement = [ { Effect = "Allow", Principal = { Service = "lambda.amazonaws.com" }, Action = "sts:AssumeRole" } ] }) } resource "aws_iam_role_policy_attachment" "supermetal_validation_lambda_role_vpc_access_policy_attach" { role = aws_iam_role.supermetal_validation_lambda_role.name policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole" } # IAM Role for Supermetal Agent Tasks resource "aws_iam_role" "supermetal_agent_task_role" { name = "SupermetalAgentTaskRole-${var.vpc_id}" path = "/" tags = merge(local.common_tags, { Name = "SupermetalAgentTaskRole-${var.vpc_id}" }) assume_role_policy = jsonencode({ Version = "2012-10-17", Statement = [ { Effect = "Allow", Principal = { Service = "ecs-tasks.amazonaws.com" }, Action = "sts:AssumeRole", Condition = { ArnLike = { "aws:SourceArn" = "arn:aws:ecs:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:task/supermetal-*" }, StringEquals = { "aws:SourceAccount" = data.aws_caller_identity.current.account_id } } } ] }) } # Managed Policy for SupermetalAgentTaskRole resource "aws_iam_policy" "supermetal_agent_task_policy" { name = "SupermetalAgentTaskPolicy-${var.vpc_id}" description = "Policy for Supermetal Agent Task Role" policy = jsonencode({ Version = "2012-10-17", Statement = [ { Effect = "Allow", Action = [ "kms:Decrypt", "kms:Encrypt" ], Resource = local.create_kms_key ? 
aws_kms_key.supermetal_kms_key[0].arn : var.credentials_key_arn }, # Secrets Access { Effect = "Allow", Action = [ "secretsmanager:GetSecretValue" ], Resource = "arn:aws:secretsmanager:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:secret:supermetal-*" }, # S3 Buffer Access { Effect = "Allow", Action = [ "s3:PutObject", "s3:GetObject", "s3:DeleteObject", "s3:ListBucket" ], Resource = [ "arn:aws:s3:::supermetal-*", "arn:aws:s3:::supermetal-*/*" ] } ] }) } resource "aws_iam_role_policy_attachment" "supermetal_agent_task_role_policy_attach" { role = aws_iam_role.supermetal_agent_task_role.name policy_arn = aws_iam_policy.supermetal_agent_task_policy.arn } # IAM Role for ECS Task Execution resource "aws_iam_role" "supermetal_task_execution_role" { name = "SupermetalTaskExecutionRole-${var.vpc_id}" path = "/" tags = merge(local.common_tags, { Name = "SupermetalTaskExecutionRole-${var.vpc_id}" }) assume_role_policy = jsonencode({ Version = "2012-10-17", Statement = [ { Effect = "Allow", Principal = { Service = "ecs-tasks.amazonaws.com" }, Action = "sts:AssumeRole", Condition = { StringEquals = { "aws:SourceAccount" = data.aws_caller_identity.current.account_id }, ArnLike = { "aws:SourceArn" = "arn:aws:ecs:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:task-definition/supermetal*:*" } } } ] }) } resource "aws_iam_role_policy_attachment" "supermetal_task_execution_role_ecs_policy_attach" { role = aws_iam_role.supermetal_task_execution_role.name policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy" } # Managed Policy for SupermetalTaskExecutionRole resource "aws_iam_policy" "supermetal_task_execution_policy" { name = "SupermetalTaskExecutionPolicy-${var.vpc_id}" description = "Policy for Supermetal Task Execution Role" policy = jsonencode({ Version = "2012-10-17", Statement = [ { Effect = "Allow", Action = [ "secretsmanager:GetSecretValue" ], Resource = "arn:aws:secretsmanager:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:secret:supermetal-*" }, { Effect = "Allow", Action = [ "kms:Decrypt" ], Resource = local.create_kms_key ? aws_kms_key.supermetal_kms_key[0].arn : var.credentials_key_arn } ] }) } resource "aws_iam_role_policy_attachment" "supermetal_task_execution_role_custom_policy_attach" { role = aws_iam_role.supermetal_task_execution_role.name policy_arn = aws_iam_policy.supermetal_task_execution_policy.arn } # KMS Key for Connector Credentials resource "aws_kms_key" "supermetal_kms_key" { count = local.create_kms_key ? 1 : 0 description = "KMS key for encrypting Supermetal connector credentials" enable_key_rotation = true tags = merge(local.common_tags, { Name = "SupermetalKMSKey-${var.vpc_id}" }) policy = jsonencode({ Version = "2012-10-17", Statement = [ { Sid = "Enable IAM User Permissions", Effect = "Allow", Principal = { AWS = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root" }, Action = "kms:*", Resource = "*" }, { Sid = "Allow Supermetal Control Plane to Encrypt", Effect = "Allow", Principal = { AWS = aws_iam_role.supermetal_control_plane_role.arn }, Action = "kms:Encrypt", Resource = "*" }, { Sid = "Allow Supermetal Agents to Decrypt/Encrypt", Effect = "Allow", Principal = { AWS = aws_iam_role.supermetal_agent_task_role.arn }, Action = [ "kms:Decrypt", "kms:Encrypt" ], Resource = "*" } ] }) } resource "aws_kms_alias" "supermetal_kms_key_alias" { count = local.create_kms_key ? 
1 : 0 name = "alias/supermetal-key-${var.vpc_id}" target_key_id = aws_kms_key.supermetal_kms_key[0].key_id } # IAM Role for Supermetal Agent EC2 Instances resource "aws_iam_role" "supermetal_agent_ec2_instance_role" { name = "SupermetalAgentEC2InstanceRole-${var.vpc_id}" path = "/" tags = merge(local.common_tags, { Name = "SupermetalAgentEC2InstanceRole-${var.vpc_id}" }) assume_role_policy = jsonencode({ Version = "2012-10-17", Statement = [ { Effect = "Allow", Principal = { Service = "ec2.amazonaws.com" }, Action = "sts:AssumeRole" } ] }) } # Managed Policy for EC2 Instance Role resource "aws_iam_policy" "supermetal_agent_ec2_instance_policy" { name = "SupermetalAgentEC2InstancePolicy-${var.vpc_id}" description = "Policy for Supermetal EC2 Instance Role" policy = jsonencode({ Version = "2012-10-17", Statement = [ { Effect = "Allow", Action = [ "ecr:GetAuthorizationToken" ], Resource = "*" }, { Effect = "Allow", Action = [ "ecr:BatchGetImage", "ecr:GetDownloadUrlForLayer" ], Resource = "arn:aws:ecr:${data.aws_region.current.name}:${var.supermetal_control_plane_aws_account_id}:repository/agent/*" } ] }) } resource "aws_iam_role_policy_attachment" "supermetal_agent_ec2_instance_role_ecs_policy_attach" { role = aws_iam_role.supermetal_agent_ec2_instance_role.name policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role" } resource "aws_iam_role_policy_attachment" "supermetal_agent_ec2_instance_role_custom_policy_attach" { role = aws_iam_role.supermetal_agent_ec2_instance_role.name policy_arn = aws_iam_policy.supermetal_agent_ec2_instance_policy.arn } resource "aws_iam_instance_profile" "supermetal_agent_ec2_instance_profile" { name = "SupermetalAgentEC2InstanceProfile-${var.vpc_id}" role = aws_iam_role.supermetal_agent_ec2_instance_role.name path = "/" tags = merge(local.common_tags, { Name = "SupermetalAgentEC2InstanceProfile-${var.vpc_id}" }) } # Security Group for Supermetal Agents (for Metadata Endpoint communication) resource "aws_security_group" "supermetal_agent_metadata_sg" { name = "supermetal-agent-metadata-sg-${var.vpc_id}" description = "Security group for Supermetal Agents" vpc_id = var.vpc_id tags = merge(local.common_tags, { Name = "supermetal-agent-metadata-sg-${var.vpc_id}" }) } # Security Group for Metadata VPC Endpoint resource "aws_security_group" "metadata_vpc_endpoint_sg" { name = "supermetal-metadata-vpce-sg-${var.vpc_id}" description = "Security group for Supermetal metadata VPC Endpoint" vpc_id = var.vpc_id tags = merge(local.common_tags, { Name = "supermetal-metadata-vpce-sg-${var.vpc_id}" }) ingress { description = "Allow agents to connect to the endpoint (HTTPS)" from_port = 443 to_port = 443 protocol = "tcp" security_groups = [aws_security_group.supermetal_agent_metadata_sg.id] } } # VPC Endpoint for Supermetal Metadata Service resource "aws_vpc_endpoint" "metadata_vpc_endpoint" { vpc_id = var.vpc_id service_name = var.supermetal_metadata_service_endpoint_name vpc_endpoint_type = "Interface" subnet_ids = var.private_subnet_ids private_dns_enabled = true security_group_ids = [ aws_security_group.metadata_vpc_endpoint_sg.id ] tags = merge(local.common_tags, { Name = "MetadataVPCEndpoint-${var.vpc_id}" }) } # ------------------------------------------------------------------------------ # Outputs # ------------------------------------------------------------------------------ output "supermetal_control_plane_role_arn" { description = "ARN of the IAM Role for Supermetal Control Plane" value = aws_iam_role.supermetal_control_plane_role.arn 
} output "supermetal_agent_task_role_arn" { description = "ARN of the IAM Role for Supermetal Agent Tasks" value = aws_iam_role.supermetal_agent_task_role.arn } output "supermetal_kms_key_arn_output" { description = "ARN of the KMS Key for encrypting credentials" value = local.create_kms_key ? aws_kms_key.supermetal_kms_key[0].arn : var.credentials_key_arn } output "supermetal_task_execution_role_arn" { description = "ARN of the IAM Role for ECS Task Execution" value = aws_iam_role.supermetal_task_execution_role.arn } output "metadata_vpc_endpoint_id" { description = "ID of the VPC Endpoint for Supermetal Metadata Service" value = aws_vpc_endpoint.metadata_vpc_endpoint.id } output "supermetal_agent_metadata_security_group_id" { description = "ID of the Security Group for Supermetal Agents (for Metadata Endpoint communication)" value = aws_security_group.supermetal_agent_metadata_sg.id } output "supermetal_validation_lambda_role_arn" { description = "ARN of the IAM Role for Supermetal Validation Lambda functions" value = aws_iam_role.supermetal_validation_lambda_role.arn } output "supermetal_agent_ec2_instance_role_arn" { description = "ARN of the IAM Role for Supermetal Agent EC2 Instances" value = aws_iam_role.supermetal_agent_ec2_instance_role.arn } output "supermetal_agent_ec2_instance_profile_arn" { description = "ARN of the IAM Instance Profile for Supermetal Agent EC2 Instances" value = aws_iam_instance_profile.supermetal_agent_ec2_instance_profile.arn } ``` *** ### IAM Roles & Policies Bootstrap creates several IAM roles and policies with specific permissions necessary for Supermetal BYOC to function securely. These roles ensure that the Supermetal Control Plane can manage resources in your AWS account on your behalf, and that the Supermetal agents have the necessary permissions to operate. The principle of least privilege is applied to scope down permissions as much as possible. 
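After the CloudFormation stack or Terraform run completes, you can spot-check the created roles from your own account before registering the environment. A minimal check with the AWS CLI, assuming the default naming convention above and a placeholder VPC ID (substitute your own):

```bash
# Placeholder VPC ID -- use the VpcId you passed to the bootstrap template
VPC_ID="vpc-0123456789abcdef0"

# Inspect the cross-account trust policy; it should trust only the Supermetal
# Control Plane account and require your ExternalId
aws iam get-role \
  --role-name "SupermetalControlPlaneRole-${VPC_ID}" \
  --query 'Role.AssumeRolePolicyDocument'

# List the managed policies attached to the role
aws iam list-attached-role-policies \
  --role-name "SupermetalControlPlaneRole-${VPC_ID}"
```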
Key roles created and their purposes: #### SupermetalControlPlaneRole ```json { "SupermetalControlPlaneRole": { "Type": "AWS::IAM::Role", "Properties": { "RoleName": "SupermetalControlPlaneRole-${VPC_ID}", "Tags": [ { "Key": "Name", "Value": "SupermetalControlPlaneRole-${VPC_ID}" }, { "Key": "supermetal:byoc:vpc-id", "Value": "${VPC_ID}" }, { "Key": "supermetal:byoc:stack-name", "Value": "${STACK_NAME}" }, { "Key": "supermetal:byoc:external-id", "Value": "${EXTERNAL_ID}" } ], "AssumeRolePolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::${SUPERMETAL_CONTROL_PLANE_AWS_ACCOUNT_ID}:root" }, "Action": "sts:AssumeRole", "Condition": { "StringEquals": { "sts:ExternalId": "${EXTERNAL_ID}" } } } ] }, "ManagedPolicyArns": [ "arn:aws:iam::${AWS_ACCOUNT_ID}:policy/SupermetalControlPlanePolicy-${VPC_ID}" ] } } ``` ```json { "SupermetalControlPlanePolicy": { "Type": "AWS::IAM::ManagedPolicy", "Properties": { "ManagedPolicyName": "SupermetalControlPlanePolicy-${VPC_ID}", "PolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": ["ecs:CreateCluster", "ecs:DeleteCluster", "ecs:DescribeClusters"], "Resource": "*", "Condition": { "StringLike": { "ecs:cluster": "supermetal-*" } } }, { "Effect": "Allow", "Action": [ "ecs:CreateService", "ecs:UpdateService", "ecs:DeleteService", "ecs:RegisterTaskDefinition", "ecs:DeregisterTaskDefinition", "ecs:DescribeServices", "ecs:DescribeTasks", "ecs:RunTask", "ecs:StopTask", "ecs:ListTasks", "ecs:ListServices", "ecs:DescribeTaskDefinition", "ecs:ListContainerInstances", "ecs:DescribeContainerInstances" ], "Resource": "*", "Condition": { "StringLike": { "ecs:cluster": "supermetal-*" } } }, { "Effect": "Allow", "Action": [ "ecs:CreateCapacityProvider", "ecs:DeleteCapacityProvider", "ecs:DescribeCapacityProviders", "ecs:UpdateCapacityProvider", "ecs:PutClusterCapacityProviders" ], "Resource": "*", "Condition": { "StringLike": {"ecs:capacity-provider": "supermetal-*"} } }, { "Effect": "Allow", "Action": [ "autoscaling:CreateAutoScalingGroup", "autoscaling:UpdateAutoScalingGroup", "autoscaling:DeleteAutoScalingGroup", "autoscaling:DescribeAutoScalingGroups", "autoscaling:DescribeScalingActivities", "autoscaling:SetDesiredCapacity", "ec2:CreateLaunchTemplate", "ec2:DeleteLaunchTemplate", "ec2:DescribeLaunchTemplates", "ec2:DescribeLaunchTemplateVersions", "ec2:CreateTags", "ec2:DescribeInstances" ], "Resource": "*", "Condition": { "StringLike": { "aws:ResourceTag/supermetal": "*" } } }, { "Effect": "Allow", "Action": [ "lambda:InvokeFunction", "lambda:CreateFunction", "lambda:DeleteFunction", "lambda:GetFunctionConfiguration", "lambda:UpdateFunctionConfiguration", "lambda:AddPermission", "lambda:RemovePermission" ], "Resource": "arn:aws:lambda:${AWS_REGION}:${AWS_ACCOUNT_ID}:function:supermetal-validation-*" }, { "Effect": "Allow", "Action": [ "ec2:CreateNetworkInterface", "ec2:DescribeNetworkInterfaces", "ec2:DeleteNetworkInterface", "ec2:AssignPrivateIpAddresses", "ec2:UnassignPrivateIpAddresses" ], "Resource": "*", "Condition": { "StringEquals": { "ec2:VpcId": "${VPC_ID}"} } }, { "Effect": "Allow", "Action": ["rds:DescribeDBInstances", "rds:DescribeDBClusters", "redshift:DescribeClusters"], "Resource": "*", "Condition": { "StringEquals": { "ec2:VpcId": "${VPC_ID}"} } }, { "Effect": "Allow", "Action": [ "s3:CreateBucket", "s3:DeleteBucket", "s3:PutBucketTagging", "s3:GetBucketTagging", "s3:PutEncryptionConfiguration", "s3:GetEncryptionConfiguration" ], "Resource": 
"arn:aws:s3:::supermetal-*" }, { "Effect": "Allow", "Action": "ec2:DescribeSecurityGroups", "Resource": "*", "Condition": { "StringEquals": { "ec2:VpcId": "${VPC_ID}"} } }, { "Effect": "Allow", "Action": [ "secretsmanager:CreateSecret", "secretsmanager:DeleteSecret", "secretsmanager:GetSecretValue", "secretsmanager:PutSecretValue", "secretsmanager:UpdateSecret", "secretsmanager:TagResource" ], "Resource": "arn:aws:secretsmanager:${AWS_REGION}:${AWS_ACCOUNT_ID}:secret:supermetal-*" }, { "Effect": "Allow", "Action": "kms:Encrypt", "Resource": "${CUSTOMER_CREDENTIALS_KMS_KEY_ARN}" }, { "Effect": "Allow", "Action": "iam:PassRole", "Resource": [ "${SUPERMETAL_AGENT_TASK_ROLE_ARN}", "${SUPERMETAL_VALIDATION_LAMBDA_ROLE_ARN}", "${SUPERMETAL_TASK_EXECUTION_ROLE_ARN}", "${SUPERMETAL_AGENT_EC2_INSTANCE_ROLE_ARN}" ], "Condition": { "StringEquals": { "iam:PassedToService": ["ecs-tasks.amazonaws.com", "lambda.amazonaws.com", "ec2.amazonaws.com"] } } } ] } } } } ``` This role is assumed by the Supermetal Control Plane. It is granted permissions to orchestrate and manage the lifecycle of resources required for the Supermetal data plane within your AWS account. * **ECS Resources** * Create and manage ECS clusters, services, tasks, and capacity providers (e.g., `ecs:CreateCluster`, `ecs:DeleteCluster`, `ecs:CreateService`, `ecs:RunTask`) * Resources are restricted to those tagged or named with `supermetal-*` prefix * **EC2 and Auto Scaling** * Manage EC2 instances and launch templates (`ec2:CreateLaunchTemplate`, `ec2:DescribeInstances`, `ec2:CreateTags`) * Control Auto Scaling groups (`autoscaling:CreateAutoScalingGroup`, `autoscaling:UpdateAutoScalingGroup`) * Resources are typically conditioned on tags like `supermetal:*` * Network interface management for VPC access (`ec2:CreateNetworkInterface`, `ec2:DescribeNetworkInterfaces`) * Security group discovery (`ec2:DescribeSecurityGroups`) for network configuration * **Lambda Functions** * Deploy and manage validation functions named `supermetal-validation-*` (`lambda:CreateFunction`, `lambda:InvokeFunction`, `lambda:DeleteFunction`) * VPC access permissions for Lambda functions * Pre-deployment validation and auxiliary tasks within your VPC * **Database Access** * Read-only discovery of potential data sources (`rds:DescribeDBInstances`, `rds:DescribeDBClusters`, `redshift:DescribeClusters`) * Scoped to resources within specified VPC (`ec2:VpcId`) * **Storage and Data Management** * S3 bucket management (`s3:CreateBucket`, `s3:DeleteBucket`, `s3:PutBucketTagging`) for buckets prefixed with `supermetal-*` * Used for data buffering and staging per connector * **Security and Access Control** * Secrets Manager: Full control over secrets prefixed with `supermetal-*` (`secretsmanager:CreateSecret`, `secretsmanager:GetSecretValue`) * KMS: Encryption permissions (`kms:Encrypt`) on specified KMS keys * IAM: Pass roles to AWS services (`iam:PassRole`) for ECS tasks, Lambda functions, and EC2 instances *** #### SupermetalAgentRole ```json { "SupermetalAgentTaskRole": { "Type": "AWS::IAM::Role", "Properties": { "RoleName": "SupermetalAgentTaskRole-${VPC_ID}", "Tags": [ { "Key": "Name", "Value": "SupermetalAgentTaskRole-${VPC_ID}" }, { "Key": "supermetal:byoc:vpc-id", "Value": "${VPC_ID}" }, { "Key": "supermetal:byoc:stack-name", "Value": "${STACK_NAME}" }, { "Key": "supermetal:byoc:external-id", "Value": "${EXTERNAL_ID}" } ], "AssumeRolePolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "ecs-tasks.amazonaws.com" 
}, "Action": "sts:AssumeRole", "Condition": { "ArnLike": { "aws:SourceArn": "arn:aws:ecs:${AWS_REGION}:${AWS_ACCOUNT_ID}:task/supermetal-*" }, "StringEquals": { "aws:SourceAccount": "${AWS_ACCOUNT_ID}" } } } ] }, "ManagedPolicyArns": [ "arn:aws:iam::${AWS_ACCOUNT_ID}:policy/SupermetalAgentTaskPolicy-${VPC_ID}" ], "Path": "/" } } } ``` ```json { "SupermetalAgentTaskPolicy": { "Type": "AWS::IAM::ManagedPolicy", "Properties": { "ManagedPolicyName": "SupermetalAgentTaskPolicy-${VPC_ID}", "PolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": ["kms:Decrypt", "kms:Encrypt"], "Resource": "${CUSTOMER_CREDENTIALS_KMS_KEY_ARN}" }, { "Effect": "Allow", "Action": ["secretsmanager:GetSecretValue"], "Resource": "arn:aws:secretsmanager:${AWS_REGION}:${AWS_ACCOUNT_ID}:secret:supermetal-*" }, { "Effect": "Allow", "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject", "s3:ListBucket"], "Resource": [ "arn:aws:s3:::supermetal-*", "arn:aws:s3:::supermetal-*/*" ] } ] } } } } ``` This IAM role is assumed by the Supermetal agent tasks running on ECS within your AWS account. It grants the agent the necessary permissions to perform its data processing duties: * **Data Security and Encryption** * KMS: Encrypt and decrypt operations (`kms:Decrypt`, `kms:Encrypt`) on specified KMS key * Used for connector credential decryption * **Configuration Management** * Secrets Manager: Access to secrets (`secretsmanager:GetSecretValue`) prefixed with `supermetal-*` * Retrieves necessary API keys for Supermetal control plane and to authenticate to ECR repository for agent images * **Data Processing** * S3: Full object operations on `supermetal-*` buckets (`s3:PutObject`, `s3:GetObject`, `s3:DeleteObject`, `s3:ListBucket`) * Used to create buckets per connector to buffer data *** #### SupermetalTaskExecutionRole ```json { "SupermetalTaskExecutionRole": { "Type": "AWS::IAM::Role", "Properties": { "RoleName": "SupermetalTaskExecutionRole-${VPC_ID}", "Tags": [ { "Key": "Name", "Value": "SupermetalTaskExecutionRole-${VPC_ID}" }, { "Key": "supermetal:byoc:vpc-id", "Value": "${VPC_ID}" }, { "Key": "supermetal:byoc:stack-name", "Value": "${STACK_NAME}" }, { "Key": "supermetal:byoc:external-id", "Value": "${EXTERNAL_ID}" } ], "AssumeRolePolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "ecs-tasks.amazonaws.com" }, "Action": "sts:AssumeRole", "Condition": { "StringEquals": { "aws:SourceAccount": "${AWS_ACCOUNT_ID}" }, "ArnLike": { "aws:SourceArn": "arn:aws:ecs:${AWS_REGION}:${AWS_ACCOUNT_ID}:task-definition/supermetal*:*" } } } ] }, "ManagedPolicyArns": [ "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy", "arn:aws:iam::${AWS_ACCOUNT_ID}:policy/SupermetalTaskExecutionPolicy-${VPC_ID}" ], "Path": "/" } } } ``` ```json { "SupermetalTaskExecutionPolicy": { "Type": "AWS::IAM::ManagedPolicy", "Properties": { "ManagedPolicyName": "SupermetalTaskExecutionPolicy-${VPC_ID}", "PolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": ["secretsmanager:GetSecretValue"], "Resource": "arn:aws:secretsmanager:${AWS_REGION}:${AWS_ACCOUNT_ID}:secret:supermetal-*" }, { "Effect": "Allow", "Action": ["kms:Decrypt"], "Resource": "${CUSTOMER_CREDENTIALS_KMS_KEY_ARN}" } ] } } } } ``` This role is used by Amazon ECS to launch your agent tasks. 
It has two sets of permissions: * **Core ECS Operations** * Uses AWS managed policy `arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy` * Pull container images from Amazon ECR * **Agent Configuration** * Custom policy `SupermetalTaskExecutionPolicy` provides: * Access to Secrets Manager (`secretsmanager:GetSecretValue`) for fetching Control Plane API secrets * KMS decryption (`kms:Decrypt`) to decrypt connector credentials * Makes configuration available to agent containers via environment variables *** #### SupermetalAgentEC2InstanceRole ```json { "SupermetalAgentEC2InstanceRole": { "Type": "AWS::IAM::Role", "Properties": { "RoleName": "SupermetalAgentEC2InstanceRole-${VPC_ID}", "Tags": [ { "Key": "Name", "Value": "SupermetalAgentEC2InstanceRole-${VPC_ID}" }, { "Key": "supermetal:byoc:vpc-id", "Value": "${VPC_ID}" }, { "Key": "supermetal:byoc:stack-name", "Value": "${STACK_NAME}" }, { "Key": "supermetal:byoc:external-id", "Value": "${EXTERNAL_ID}" } ], "AssumeRolePolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "ec2.amazonaws.com" }, "Action": "sts:AssumeRole" } ] }, "ManagedPolicyArns": [ "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role", "arn:aws:iam::${AWS_ACCOUNT_ID}:policy/SupermetalAgentEC2InstancePolicy-${VPC_ID}" ], "Path": "/" } } } ``` ```json { "SupermetalAgentEC2InstancePolicy": { "Type": "AWS::IAM::ManagedPolicy", "Properties": { "ManagedPolicyName": "SupermetalAgentEC2InstancePolicy-${VPC_ID}", "PolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": ["ecr:GetAuthorizationToken"], "Resource": "*" }, { "Effect": "Allow", "Action": ["ecr:BatchGetImage", "ecr:GetDownloadUrlForLayer"], "Resource": "arn:aws:ecr:${AWS_REGION}:${SUPERMETAL_CONTROL_PLANE_AWS_ACCOUNT_ID}:repository/agent/*" } ] } } } } ``` This role is attached to EC2 instances that form your ECS cluster's capacity when using the EC2 launch type. It has two sets of permissions: * **Core ECS Instance Operations** * Uses AWS managed policy `arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role` * Register with ECS cluster * Pull common container images * Send instance logs * **Supermetal Agent Access** * Custom policy `SupermetalAgentEC2InstancePolicy` provides: * ECR authentication (`ecr:GetAuthorizationToken`) * Image pulling from Supermetal's ECR repository (`ecr:BatchGetImage`, `ecr:GetDownloadUrlForLayer`) * Enables instances to pull agent container images from Supermetal's control plane *** #### SupermetalValidationLambdaRole ```json { "SupermetalValidationLambdaRole": { "Type": "AWS::IAM::Role", "Properties": { "RoleName": "SupermetalValidationLambdaRole-${VPC_ID}", "Tags": [ { "Key": "Name", "Value": "SupermetalValidationLambdaRole-${VPC_ID}" }, { "Key": "supermetal:byoc:vpc-id", "Value": "${VPC_ID}" }, { "Key": "supermetal:byoc:stack-name", "Value": "${STACK_NAME}" }, { "Key": "supermetal:byoc:external-id", "Value": "${EXTERNAL_ID}" } ], "AssumeRolePolicyDocument": { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "lambda.amazonaws.com" }, "Action": "sts:AssumeRole" } ] }, "ManagedPolicyArns": [ "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole" ], "Path": "/" } } } ``` This role is assumed by Lambda functions that perform pre-deployment validation tasks. 
It provides: * **VPC Access and Networking** * Uses AWS managed policy `arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole` * Create and manage elastic network interfaces * Access resources within your VPC * Send function logs to CloudWatch # Azure (/docs/main/concepts/deployment/byoc/azure) import { Tab, Tabs } from 'fumadocs-ui/components/tabs'; import { Step, Steps } from 'fumadocs-ui/components/steps'; import { Accordion, Accordions } from 'fumadocs-ui/components/accordion'; import ByocAzureDiagram from '../../../../../../app/diagrams/byoc_azure'; Follow this guide to provision Supermetal BYOC (Bring Your Own Cloud), which deploys Supermetal's data replication agents in your own Azure Virtual Network, so you maintain full control and ownership of your data. This provides an additional layer of security and isolation. Supermetal handles provisioning, operations, and maintenance. *** ## Overview Supermetal's BYOC model for Azure separates responsibilities between Supermetal's control plane and your Azure environment. The data replication agents run within your Azure Virtual Network (VNet), giving you full control over your data while Supermetal manages the agent lifecycle. | Phase | Resource | Description | | --- | --- | --- | | **Bootstrap** | Azure AD Service Principal | Represents Supermetal's application and allows it to authenticate to your Azure AD tenant with specific permissions. | | | Azure Role Definition | Defines the precise set of permissions Supermetal requires to manage resources in your Azure subscription. Designed for least privilege. | | | Azure Role Assignment | Assigns the Custom Role to Supermetal's Service Principal, scoped to a designated Resource Group. | | | Azure Key Vault | Securely stores secrets and keys used by Supermetal for bootstrap information and by agents for runtime credentials. You can use your own Customer-Managed Keys (CMK). | | | Private Endpoint | An Azure Private Endpoint for secure communication between Supermetal agents and the Supermetal control plane via Azure Private Link. | | | Network Security Group Rules | Security rules to allow necessary traffic for the Supermetal agents and Private Endpoint. | | **Registration** | Key Vault Secrets | Secrets stored in your Azure Key Vault for agent bootstrap and operation. | | **Connector Creation** | Virtual Machine Scale Set (VMSS) | Hosts the Supermetal data replication agents with dedicated hardware capacity. | | | User-Assigned Managed Identity | Assigned to the VMSS instances, granting agents permissions to access necessary Azure resources. | | | Azure Blob Storage | Used by agents for temporary buffering or staging data if required by the replication process. | ## Prerequisites To successfully deploy Supermetal BYOC in your Azure environment, please ensure you have the following:
#### Supermetal Account Identifiers You'll need the following Supermetal Azure identifiers (obtainable from the Supermetal Console under Settings): * `supermetal_tenant_id`: Tenant ID of Supermetal's Azure AD * `supermetal_app_id`: The Application (Client) ID of Supermetal's Azure AD application registration * `supermetal_principal_id`: The Object ID of Supermetal's service principal (required for role assignments) * `external_id`: A unique identifier for securing cross-tenant access
#### Azure Account & Permissions * An Azure subscription with Owner or User Access Administrator permissions to create service principals and assign roles
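If you want to confirm this before running the bootstrap, a quick self-check with the Azure CLI (a sketch; the role names are the Azure built-in ones):

```bash
# List the roles assigned to the signed-in user in the current subscription;
# expect "Owner" or "User Access Administrator" in the output
az role assignment list \
  --assignee "$(az ad signed-in-user show --query id -o tsv)" \
  --query "[].roleDefinitionName" -o tsv
```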
#### Deployment Parameters * **Azure Subscription ID (`subscription_id`):** The ID of your Azure subscription where resources will be deployed * **Azure Region (`location`):** The Azure region for deployment (e.g., `eastus2`, `westeurope`) * **Resource Group (`resource_group_name`):** The name of a Resource Group where Supermetal will deploy resources * **VNet ID (`vnet_id`):** The resource ID of your existing Virtual Network where Supermetal agents will be deployed * **Subnet IDs (`private_subnet_ids`):** A list of subnet resource IDs within your specified VNet. Supermetal agents will be deployed in these subnets. Ensure these subnets are in the same region as your source (and target) databases * **(Optional) Key Vault ID (`credentials_key_vault_id`):** Decide if you'll use an existing Azure Key Vault for storing sensitive connector credentials. If you have an existing Key Vault, have its resource ID ready. If left blank, the bootstrap process will create a new Key Vault specifically for Supermetal ***
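A convenient way to collect the subscription, VNet, and subnet IDs above is the Azure CLI. A sketch, using the placeholder names `network-rg` and `production-vnet` from the example `terraform.tfvars` in the next section:

```bash
# Your subscription ID
az account show --query id -o tsv

# Resource ID of your existing VNet (placeholder names -- substitute your own)
az network vnet show \
  --resource-group network-rg \
  --name production-vnet \
  --query id -o tsv

# Resource IDs of the subnets in that VNet
az network vnet subnet list \
  --resource-group network-rg \
  --vnet-name production-vnet \
  --query "[].id" -o tsv
```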
## Setup Setup is a two-phase process: first bootstrap the environment, then create connectors to deploy the data plane / agents in your Azure VNet.
#### Environment Bootstrap The bootstrap process uses a Terraform script provided by Supermetal to provision the necessary resources, including the Service Principal, Role, and private endpoint. These resources enable Supermetal to automatically register your Azure environment with the Supermetal control plane over the private endpoint, and to deploy Supermetal data plane / agents in your VNet. The initial environment bootstrap process does not provision any compute resources or the data plane / agents. Once the environment bootstrap is complete, you can create connectors to deploy Supermetal data plane / agents on-demand in your VNet. #### Obtain Terraform Script \[!toc] * Obtain the latest Terraform script from the Supermetal Console. The Terraform code in the [appendix](#environment-bootstrap-scripts) can be used as a reference. #### Configure Terraform \[!toc] * Ensure you have [Terraform installed](https://learn.hashicorp.com/tutorials/terraform/install-cli) (version 1.0 or later). * Set up your Azure credentials for Terraform. This typically involves authenticating with the Azure CLI using `az login` and ensuring you have the appropriate permissions. * Verify access by running `az account show`. You should see your Azure subscription details. #### Prepare Terraform Files \[!toc] * Create a new directory for your Terraform configuration. * If using the appendix script, save the Terraform code into files within this directory (e.g., `main.tf`). * Create a `terraform.tfvars` file in the same directory to specify the required variable values. Alternatively, you can pass variables via the command line. **Example `terraform.tfvars`:** ```tfvars # Supermetal identifiers (provided by Supermetal) supermetal_tenant_id = "00000000-0000-0000-0000-000000000000" supermetal_app_id = "00000000-0000-0000-0000-000000000000" supermetal_principal_id = "00000000-0000-0000-0000-000000000000" external_id = "unique-external-id-from-supermetal" # Azure environment parameters subscription_id = "00000000-0000-0000-0000-000000000000" location = "eastus2" resource_group_name = "supermetal-byoc-rg" vnet_id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/network-rg/providers/Microsoft.Network/virtualNetworks/production-vnet" private_subnet_ids = [ "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/network-rg/providers/Microsoft.Network/virtualNetworks/production-vnet/subnets/data-subnet", "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/network-rg/providers/Microsoft.Network/virtualNetworks/production-vnet/subnets/private-endpoint-subnet" ] # credentials_key_vault_id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/security-rg/providers/Microsoft.KeyVault/vaults/my-existing-keyvault" # Optional ``` Refer to the [Deployment Parameters](#deployment-parameters) section and the variable descriptions within the Terraform script for detailed explanations of each. 
* `supermetal_tenant_id` - Tenant ID of Supermetal's Azure AD * `supermetal_app_id` - Application (Client) ID of Supermetal's Azure AD application * `supermetal_principal_id` - Object ID of Supermetal's service principal * `external_id` - Unique identifier for securing cross-tenant access * `subscription_id` - Your Azure subscription ID * `location` - Azure region for deployment * `resource_group_name` - Resource group name for Supermetal resources * `vnet_id` - Resource ID of your existing VNet * `private_subnet_ids` - List of subnet resource IDs within your VNet * `credentials_key_vault_id` (Optional) - Resource ID of your existing Key Vault #### Deploy with Terraform CLI \[!toc] * Navigate to your Terraform configuration directory in your terminal. * Initialize Terraform: ```bash terraform init ``` * Review the execution plan with your tfvars file: ```bash terraform plan -var-file=terraform.tfvars ``` * Apply the configuration using your tfvars file: ```bash terraform apply -var-file=terraform.tfvars ``` Confirm the action by typing `yes` when prompted. #### Verification \[!toc] * Monitor the `terraform apply` command output. It will indicate when the resources have been successfully created. * Once the Terraform apply is complete, Supermetal's Control Plane automatically performs a registration process for the new environment. * Monitor the Environments page in your Supermetal Console. The environment should show up with a "Ready" status within a few minutes after Terraform completes.
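You can additionally confirm the endpoint from your side. A quick check with the Azure CLI, assuming the resource group name from the example `terraform.tfvars` above:

```bash
# List private endpoints in the Supermetal resource group; the bootstrap
# endpoint should report a "Succeeded" provisioning state
az network private-endpoint list \
  --resource-group supermetal-byoc-rg \
  --query "[].{name:name, state:provisioningState}" \
  -o table
```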
#### Create Connector(s) to deploy Supermetal agents Once the environment bootstrap is complete, you are ready to create a connector to deploy Supermetal agents in your Azure environment. #### Select source database \[!toc] Follow the connector setup instructions in the Supermetal Console to select a source database. Supermetal can optionally [auto-discover](#database_discovery) available source databases. #### Configure Database Network Security Group \[!toc] To give Supermetal access to your source database and to validate network reachability and connection credentials, ensure your database's network security group allows connections from the Supermetal agent subnet. The Supermetal Console lists pre-filled steps for configuring database access as part of the prerequisites, similar to the steps below. ```bash # Set variables for your environment RESGROUP="your-resource-group-name" CONNECTOR_NAME="connector1" # Unique identifier for this connector NSG_NAME="supermetal-${CONNECTOR_NAME}-nsg" # Unique NSG name for this connector DB_NSG_NAME="your-db-nsg-name" VNET_NAME="your-vnet-name" AGENT_SUBNET_NAME="your-agent-subnet-name" # Subnet where Supermetal agents will be deployed LOCATION="eastus2" DB_PORT=5432 # For PostgreSQL SUPERMETAL_METADATA_ENDPOINT="Provided-By-Supermetal" # Specific endpoint for Supermetal metadata service # Create the NSG for Supermetal agents az network nsg create \ --resource-group $RESGROUP \ --name $NSG_NAME \ --location $LOCATION # Add outbound rule for metadata private endpoint access az network nsg rule create \ --resource-group $RESGROUP \ --nsg-name $NSG_NAME \ --name "AllowMetadataAccess" \ --priority 100 \ --direction "Outbound" \ --access "Allow" \ --protocol "Tcp" \ --destination-port-ranges 443 \ --destination-address-prefixes $SUPERMETAL_METADATA_ENDPOINT \ --description "Allow access to Supermetal metadata service" # Add outbound rule for database access az network nsg rule create \ --resource-group $RESGROUP \ --nsg-name $NSG_NAME \ --name "AllowDatabaseAccess" \ --priority 110 \ --direction "Outbound" \ --access "Allow" \ --protocol "Tcp" \ --destination-port-ranges $DB_PORT \ --destination-address-prefixes "YOUR_DATABASE_SUBNET_PREFIX" \ --description "Allow access to database" # Add inbound rule to database NSG to allow access from Supermetal agent # (assumes the database NSG is in $RESGROUP; adjust --resource-group if it lives elsewhere) az network nsg rule create \ --resource-group $RESGROUP \ --nsg-name $DB_NSG_NAME \ --name "AllowSupermetalAgentAccess" \ --priority 100 \ --direction "Inbound" \ --access "Allow" \ --protocol "Tcp" \ --source-address-prefixes "YOUR_AGENT_SUBNET_PREFIX" \ --destination-port-ranges $DB_PORT \ --description "Allow Supermetal agent access to database" # Associate the NSG with the agent subnet az network vnet subnet update \ --resource-group $RESGROUP \ --vnet-name $VNET_NAME \ --name $AGENT_SUBNET_NAME \ --network-security-group $NSG_NAME ``` * `RESGROUP`: Your Azure resource group name where the NSGs will be created * `CONNECTOR_NAME`: A unique identifier for this connector (important if you're deploying multiple connectors) * `VNET_NAME`: The name of your Azure Virtual Network * `AGENT_SUBNET_NAME`: The name of the subnet where Supermetal agents will be deployed * `YOUR_DATABASE_SUBNET_PREFIX`: CIDR range of your database subnet (e.g., "10.0.2.0/24") * `YOUR_AGENT_SUBNET_PREFIX`: CIDR range of the subnet where Supermetal agents will be deployed * `DB_PORT`: The port number your database listens on (e.g., 5432 for PostgreSQL) * `SUPERMETAL_METADATA_ENDPOINT`: Specific endpoint 
for the Supermetal metadata service (provided by Supermetal) #### Create Network Security Group and Configure Database Access \[!toc] 1. Navigate to the **Azure Portal** > **Network Security Groups** 2. Click **Create network security group** 3. Configure the basic details: * **Resource group:** Select your resource group * **Name:** "supermetal-`${connector-name}`-nsg" (e.g., "supermetal-connector1-nsg") * **Region:** Select the same region as your Azure resources 4. Click **Review + create** and then **Create** #### Configure Outbound Rules \[!toc] 5. Once created, navigate to the new NSG 6. Select **Outbound security rules** 7. Click **Add** to create an outbound rule for metadata access: * **Destination:** IP Addresses * **Destination IP addresses:** Supermetal-provided metadata endpoint address * **Service:** HTTPS * **Port ranges:** 443 * **Priority:** 100 * **Name:** "AllowMetadataAccess" 8. Add another outbound rule for database access: * **Destination:** IP Addresses * **Destination IP addresses/CIDR ranges:** Your database subnet CIDR * **Service:** Custom * **Protocol:** TCP * **Port ranges:** Your database port (e.g., 5432) * **Priority:** 110 * **Name:** "AllowDatabaseAccess" #### Configure Database NSG Inbound Rule \[!toc] 9. Navigate to your database's NSG 10. Select **Inbound security rules** 11. Click **Add** to create a new rule: * **Source:** IP Addresses * **Source IP addresses/CIDR ranges:** Your agent subnet CIDR * **Service:** Custom * **Protocol:** TCP * **Port ranges:** Your database port (e.g., 5432) * **Priority:** 100 * **Name:** "AllowSupermetal`${ConnectorName}`Access" (e.g., "AllowSupermetalConnector1Access") 12. Save the rule #### Associate NSG with Subnet \[!toc] 13. Navigate to **Virtual Networks** > your VNet > **Subnets** 14. Click on the subnet where Supermetal agents will be deployed 15. Under **Network security group**, select the NSG you created earlier 16. 
#### Create Network Security Group and Configure Database Access \[!toc]

```terraform
# Variables for connector-specific resources
variable "connector_name" {
  description = "Unique identifier for this connector (used in resource naming)"
  type        = string
  default     = "connector1"
}

variable "resource_group_name" {
  description = "Your Azure resource group name"
  type        = string
}

variable "location" {
  description = "Azure region for deployment"
  type        = string
}

variable "vnet_name" {
  description = "Name of your virtual network"
  type        = string
}

variable "agent_subnet_name" {
  description = "Name of the subnet where Supermetal agents will be deployed"
  type        = string
}

variable "database_port" {
  description = "Port number your database listens on (e.g., 5432 for PostgreSQL)"
  type        = string
}

variable "database_subnet_cidr" {
  description = "CIDR range of your database subnet (e.g., 10.0.2.0/24)"
  type        = string
}

variable "agent_subnet_cidr" {
  description = "CIDR range of the subnet where Supermetal agents will be deployed"
  type        = string
}

variable "database_nsg_name" {
  description = "Name of your existing database NSG"
  type        = string
}

variable "database_resource_group_name" {
  description = "Resource group containing your database NSG"
  type        = string
}

variable "supermetal_metadata_endpoint" {
  description = "Specific endpoint for Supermetal metadata service (provided by Supermetal)"
  type        = string
}

# Create the agent network security group with unique name for this connector
resource "azurerm_network_security_group" "supermetal_agent_nsg" {
  name                = "supermetal-${var.connector_name}-nsg"
  location            = var.location
  resource_group_name = var.resource_group_name

  # Outbound rule for metadata private endpoint access
  security_rule {
    name                       = "AllowMetadataAccess"
    priority                   = 100
    direction                  = "Outbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "443"
    source_address_prefix      = "*"
    destination_address_prefix = var.supermetal_metadata_endpoint
    description                = "Allow access to Supermetal metadata service"
  }

  # Outbound rule for database access
  security_rule {
    name                       = "AllowDatabaseAccess"
    priority                   = 110
    direction                  = "Outbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = var.database_port
    source_address_prefix      = "*"
    destination_address_prefix = var.database_subnet_cidr
    description                = "Allow access to database"
  }
}

# Look up existing database NSG
data "azurerm_network_security_group" "database_nsg" {
  name                = var.database_nsg_name
  resource_group_name = var.database_resource_group_name
}

# Add inbound rule to database NSG
resource "azurerm_network_security_rule" "allow_supermetal_agent" {
  name                        = "AllowSupermetal${title(var.connector_name)}Access"
  priority                    = 100
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = var.database_port
  source_address_prefix       = var.agent_subnet_cidr
  destination_address_prefix  = "*"
  resource_group_name         = var.database_resource_group_name
  network_security_group_name = var.database_nsg_name
}

# Associate the NSG with the agent subnet
data "azurerm_virtual_network" "vnet" {
  name                = var.vnet_name
  resource_group_name = var.resource_group_name
}

data "azurerm_subnet" "agent_subnet" {
  name                 = var.agent_subnet_name
  virtual_network_name = var.vnet_name
  resource_group_name  = var.resource_group_name
}

resource "azurerm_subnet_network_security_group_association" "nsg_association" {
  subnet_id                 = data.azurerm_subnet.agent_subnet.id
  network_security_group_id = azurerm_network_security_group.supermetal_agent_nsg.id
}

# Output the NSG ID for reference
output "supermetal_agent_nsg_id" {
  value = azurerm_network_security_group.supermetal_agent_nsg.id
}
```

* `connector_name`: Unique identifier for this connector (important if you're deploying multiple connectors)
* `resource_group_name`: Your Azure resource group name
* `location`: Azure region for deployment
* `vnet_name`: The name of your Azure Virtual Network
* `agent_subnet_name`: The name of the subnet where Supermetal agents will be deployed
* `database_subnet_cidr`: CIDR range of your database subnet (e.g., "10.0.2.0/24")
* `agent_subnet_cidr`: CIDR range of the subnet where Supermetal agents will be deployed
* `database_nsg_name`: The name of your existing database NSG
* `database_resource_group_name`: The resource group containing your database NSG
* `database_port`: The port number your database listens on (e.g., 5432 for PostgreSQL)
* `supermetal_metadata_endpoint`: Specific endpoint for Supermetal metadata service (provided by Supermetal)
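If you apply this Terraform variant standalone, an invocation might look like the following sketch; every value below is a placeholder for your environment:

```bash
# Illustrative only: supply connector-specific values on the command line
terraform init
terraform apply \
  -var 'connector_name=connector1' \
  -var 'resource_group_name=your-resource-group-name' \
  -var 'location=eastus2' \
  -var 'vnet_name=your-vnet-name' \
  -var 'agent_subnet_name=your-agent-subnet-name' \
  -var 'database_port=5432' \
  -var 'database_subnet_cidr=10.0.2.0/24' \
  -var 'agent_subnet_cidr=10.0.1.0/24' \
  -var 'database_nsg_name=your-db-nsg-name' \
  -var 'database_resource_group_name=your-db-resource-group' \
  -var 'supermetal_metadata_endpoint=Provided-By-Supermetal'
```

The `supermetal_agent_nsg_id` output is the NSG ID you enter in the Console during the validation step below.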
#### Validate source database \[!toc]

Follow the setup instructions in the Supermetal Console to enter the created NSG ID for the source database and validate network reachability and connection credentials.

#### Repeat for target database \[!toc]

Repeat the same steps for the target database.

#### Finalize \[!toc]

Once the source and target database connections are validated, you can finalize the connector setup, which launches the Supermetal agents in your Azure VNet.

***
## Appendix ### Environment Bootstrap Scripts Supermetal provides Terraform scripts for bootstrapping your Azure environment. The scripts create all necessary resources and permissions for Supermetal to manage the data replication agents in your Azure subscription. ```terraform terraform { required_version = ">= 1.0" required_providers { azurerm = { source = "hashicorp/azurerm" version = "~> 3.0" } azuread = { source = "hashicorp/azuread" version = "~> 2.0" } } } provider "azurerm" { features { key_vault { purge_soft_delete_on_destroy = false recover_soft_deleted_key_vaults = true } } } provider "azuread" {} # ============================================================================= # Data Sources # ============================================================================= data "azurerm_client_config" "current" {} data "azurerm_subscription" "current" {} # ============================================================================= # Input Variables # ============================================================================= variable "resource_group_name" { description = "Name of the Resource Group where Supermetal will deploy resources" type = string validation { condition = can(regex("^[a-zA-Z0-9-_]+$", var.resource_group_name)) error_message = "Resource group name must contain only alphanumeric characters, hyphens, and underscores." } } variable "location" { description = "Azure region for deployment (e.g., eastus, westeurope)" type = string } variable "vnet_id" { description = "Resource ID of your existing Virtual Network (e.g., /subscriptions/.../resourceGroups/.../providers/Microsoft.Network/virtualNetworks/...)" type = string } variable "private_subnet_ids" { description = "List of private subnet Resource IDs where agents and private endpoints will be deployed" type = list(string) validation { condition = length(var.private_subnet_ids) > 0 error_message = "At least one private subnet must be provided." } } variable "credentials_key_vault_id" { description = "Resource ID of existing Key Vault for storing credentials. If empty, a new Key Vault will be created" type = string default = "" } variable "external_id" { description = "Unique identifier provided by Supermetal Console for securing cross-tenant access" type = string sensitive = true } variable "supermetal_tenant_id" { description = "Azure AD Tenant ID of Supermetal Control Plane" type = string } variable "supermetal_app_id" { description = "Azure AD Application ID of Supermetal Control Plane" type = string } variable "supermetal_principal_id" { description = "Object ID of Supermetal's Service Principal" type = string } variable "supermetal_metadata_service_id" { description = "Resource ID of Supermetal Metadata Service's Private Link Service" type = string } variable "supermetal_container_registry_name" { description = "Name of Supermetal's container registry for agent images" type = string } # ============================================================================= # Local Values # ============================================================================= locals { # Parse VNet information vnet_parts = split("/", var.vnet_id) vnet_name = element(local.vnet_parts, length(local.vnet_parts) - 1) vnet_resource_group = element(local.vnet_parts, index(local.vnet_parts, "resourceGroups") + 1) # Subnet selection primary_subnet_id = var.private_subnet_ids[0] # Key Vault configuration create_key_vault = var.credentials_key_vault_id == "" key_vault_id = local.create_key_vault ? 
azurerm_key_vault.credentials[0].id : var.credentials_key_vault_id key_vault_name = local.create_key_vault ? "supermetal-kv-${substr(md5(var.resource_group_name), 0, 8)}" : "" existing_kv_rg = local.create_key_vault ? "" : split("/", var.credentials_key_vault_id)[4] existing_kv_name = local.create_key_vault ? "" : split("/", var.credentials_key_vault_id)[8] # Common tags common_tags = { "supermetal:byoc:external-id" = var.external_id "supermetal:byoc:resource-group" = var.resource_group_name "supermetal:byoc:stack-name" = "supermetal-bootstrap-${var.resource_group_name}" "supermetal:managed" = "true" "Environment" = "Production" } } # ============================================================================= # Resource Group Data Source # ============================================================================= data "azurerm_resource_group" "target" { name = var.resource_group_name } # ============================================================================= # Networking Data Sources # ============================================================================= data "azurerm_virtual_network" "vnet" { name = local.vnet_name resource_group_name = local.vnet_resource_group } data "azurerm_subnet" "primary" { name = element(split("/", local.primary_subnet_id), length(split("/", local.primary_subnet_id)) - 1) virtual_network_name = local.vnet_name resource_group_name = local.vnet_resource_group } # ============================================================================= # Custom Role Definition for Supermetal Control Plane # ============================================================================= resource "azurerm_role_definition" "supermetal_control_plane" { name = "SupermetalControlPlaneRole-${var.resource_group_name}" scope = data.azurerm_subscription.current.id description = "Allows Supermetal Control Plane to manage resources in customer subscription" permissions { actions = [ # ===== Virtual Machine Scale Sets ===== "Microsoft.Compute/virtualMachineScaleSets/read", "Microsoft.Compute/virtualMachineScaleSets/write", "Microsoft.Compute/virtualMachineScaleSets/delete", "Microsoft.Compute/virtualMachineScaleSets/start/action", "Microsoft.Compute/virtualMachineScaleSets/powerOff/action", "Microsoft.Compute/virtualMachineScaleSets/restart/action", "Microsoft.Compute/virtualMachineScaleSets/deallocate/action", "Microsoft.Compute/virtualMachineScaleSets/manualUpgrade/action", "Microsoft.Compute/virtualMachineScaleSets/scale/action", "Microsoft.Compute/virtualMachineScaleSets/instanceView/read", "Microsoft.Compute/virtualMachineScaleSets/skus/read", # ===== VMSS Extensions (for Docker/Agent setup) ===== "Microsoft.Compute/virtualMachineScaleSets/extensions/read", "Microsoft.Compute/virtualMachineScaleSets/extensions/write", "Microsoft.Compute/virtualMachineScaleSets/extensions/delete", # ===== Virtual Machines (VMSS instances) ===== "Microsoft.Compute/virtualMachines/read", "Microsoft.Compute/virtualMachines/instanceView/read", # ===== Managed Disks ===== "Microsoft.Compute/disks/read", "Microsoft.Compute/disks/write", "Microsoft.Compute/disks/delete", # ===== Container Instances (for validation) ===== "Microsoft.ContainerInstance/containerGroups/read", "Microsoft.ContainerInstance/containerGroups/write", "Microsoft.ContainerInstance/containerGroups/delete", "Microsoft.ContainerInstance/containerGroups/restart/action", "Microsoft.ContainerInstance/containerGroups/stop/action", "Microsoft.ContainerInstance/containerGroups/start/action", 
"Microsoft.ContainerInstance/containerGroups/containers/logs/read", "Microsoft.ContainerInstance/containerGroups/containers/exec/action", # ===== Container Registry Access ===== "Microsoft.ContainerRegistry/registries/pull/read", "Microsoft.ContainerRegistry/registries/artifacts/read", "Microsoft.ContainerRegistry/registries/metadata/read", # ===== Storage Accounts ===== "Microsoft.Storage/storageAccounts/read", "Microsoft.Storage/storageAccounts/write", "Microsoft.Storage/storageAccounts/delete", "Microsoft.Storage/storageAccounts/listkeys/action", "Microsoft.Storage/storageAccounts/listAccountSas/action", "Microsoft.Storage/storageAccounts/blobServices/containers/read", "Microsoft.Storage/storageAccounts/blobServices/containers/write", "Microsoft.Storage/storageAccounts/blobServices/containers/delete", # ===== Networking ===== "Microsoft.Network/virtualNetworks/subnets/join/action" "Microsoft.Network/networkInterfaces/read" "Microsoft.Network/networkSecurityGroups/read" "Microsoft.Network/networkSecurityGroups/join/action" # ===== Key Vault Management ===== "Microsoft.KeyVault/vaults/read", # ===== Database Discovery (Read-only) - Scoped to resource group ===== "Microsoft.Sql/servers/read", "Microsoft.Sql/servers/databases/read", "Microsoft.DBforPostgreSQL/servers/read", "Microsoft.DBforPostgreSQL/flexibleServers/read", "Microsoft.DBforMySQL/servers/read", "Microsoft.DBforMySQL/flexibleServers/read", "Microsoft.DBforMariaDB/servers/read" ] not_actions = [] data_actions = [ # ===== Key Vault Data Operations (scoped to supermetal-* items) ===== "Microsoft.KeyVault/vaults/secrets/getSecret/action", "Microsoft.KeyVault/vaults/secrets/setSecret/action", "Microsoft.KeyVault/vaults/keys/encrypt/action", "Microsoft.KeyVault/vaults/keys/decrypt/action", # ===== Storage Data Operations ===== "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read", "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write", "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete", "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action" ] not_data_actions = [] } assignable_scopes = [ data.azurerm_resource_group.target.id ] } # ============================================================================= # Role Assignment for Supermetal Control Plane # ============================================================================= resource "azurerm_role_assignment" "supermetal_control_plane" { scope = data.azurerm_resource_group.target.id role_definition_id = azurerm_role_definition.supermetal_control_plane.role_definition_resource_id principal_id = var.supermetal_principal_id principal_type = "ServicePrincipal" # Condition to validate this is the correct service principal condition = <<-EOT ( @Request[Microsoft.Authorization/roleAssignments:principalId] == '${var.supermetal_principal_id}' ) AND ( @Request[Microsoft.Authorization/roleAssignments:principalType] == 'ServicePrincipal' ) EOT condition_version = "2.0" description = "Supermetal Control Plane access with External ID: ${var.external_id}" } # ============================================================================= # User-Assigned Managed Identities # ============================================================================= # Identity for Supermetal Agents (VMSS) resource "azurerm_user_assigned_identity" "agent" { name = "supermetal-agent-identity-${var.resource_group_name}" location = var.location resource_group_name = var.resource_group_name tags = local.common_tags } # Identity for 
Validation Containers (ACI) resource "azurerm_user_assigned_identity" "validation" { name = "supermetal-validation-identity-${var.resource_group_name}" location = var.location resource_group_name = var.resource_group_name tags = local.common_tags } # ============================================================================= # Key Vault for Credentials # ============================================================================= resource "azurerm_key_vault" "credentials" { count = local.create_key_vault ? 1 : 0 name = local.key_vault_name location = var.location resource_group_name = var.resource_group_name tenant_id = data.azurerm_client_config.current.tenant_id sku_name = "standard" soft_delete_retention_days = 90 purge_protection_enabled = true enable_rbac_authorization = false network_acls { default_action = "Deny" bypass = "AzureServices" ip_rules = [] virtual_network_subnet_ids = var.private_subnet_ids } tags = local.common_tags } # ============================================================================= # Key Vault Access Policies # ============================================================================= # Access for Supermetal Control Plane (encrypt only) resource "azurerm_key_vault_access_policy" "control_plane" { key_vault_id = local.key_vault_id tenant_id = var.supermetal_tenant_id object_id = var.supermetal_principal_id key_permissions = [ "Get", "List", "Encrypt" ] secret_permissions = [ "Set", "List" ] } # Access for Agent Identity (decrypt) resource "azurerm_key_vault_access_policy" "agent" { key_vault_id = local.key_vault_id tenant_id = data.azurerm_client_config.current.tenant_id object_id = azurerm_user_assigned_identity.agent.principal_id key_permissions = [ "Get", "Decrypt" ] secret_permissions = [ "Get" ] } # Access for Validation Identity resource "azurerm_key_vault_access_policy" "validation" { key_vault_id = local.key_vault_id tenant_id = data.azurerm_client_config.current.tenant_id object_id = azurerm_user_assigned_identity.validation.principal_id key_permissions = [ "Get", "Decrypt" ] secret_permissions = [ "Get" ] } # ============================================================================= # Network Security Groups # ============================================================================= # NSG for Agent Metadata Access resource "azurerm_network_security_group" "agent_metadata" { name = "supermetal-agent-metadata-nsg-${var.resource_group_name}" location = var.location resource_group_name = var.resource_group_name security_rule { name = "AllowHttpsToMetadataEndpoint" priority = 100 direction = "Outbound" access = "Allow" protocol = "Tcp" source_port_range = "*" destination_port_range = "443" source_address_prefix = "VirtualNetwork" destination_address_prefix = "PrivateEndpoint" description = "Allow HTTPS traffic to Supermetal metadata private endpoint" } security_rule { name = "AllowAzureStorage" priority = 110 direction = "Outbound" access = "Allow" protocol = "Tcp" source_port_range = "*" destination_port_range = "443" source_address_prefix = "VirtualNetwork" destination_address_prefix = "Storage.${var.location}" description = "Allow access to Azure Storage for buffer operations" } security_rule { name = "AllowAzureKeyVault" priority = 120 direction = "Outbound" access = "Allow" protocol = "Tcp" source_port_range = "*" destination_port_range = "443" source_address_prefix = "VirtualNetwork" destination_address_prefix = "AzureKeyVault.${var.location}" description = "Allow access to Azure Key Vault for credential retrieval" } tags = 
local.common_tags } # ============================================================================= # Private Endpoint for Metadata Service # ============================================================================= resource "azurerm_private_endpoint" "metadata" { name = "supermetal-metadata-endpoint-${var.resource_group_name}" location = var.location resource_group_name = var.resource_group_name subnet_id = local.primary_subnet_id private_service_connection { name = "supermetal-metadata-connection" private_connection_resource_id = var.supermetal_metadata_service_id is_manual_connection = false subresource_names = ["metadata"] } tags = local.common_tags } # ============================================================================= # Container Registry Access Role Assignments # ============================================================================= # Role assignment for agent identity to pull images resource "azurerm_role_assignment" "agent_acr_pull" { scope = "/subscriptions/${var.supermetal_tenant_id}/resourceGroups/supermetal-control-plane/providers/Microsoft.ContainerRegistry/registries/${var.supermetal_container_registry_name}" role_definition_name = "AcrPull" principal_id = azurerm_user_assigned_identity.agent.principal_id principal_type = "ServicePrincipal" } # Role assignment for validation identity to pull images resource "azurerm_role_assignment" "validation_acr_pull" { scope = "/subscriptions/${var.supermetal_tenant_id}/resourceGroups/supermetal-control-plane/providers/Microsoft.ContainerRegistry/registries/${var.supermetal_container_registry_name}" role_definition_name = "AcrPull" principal_id = azurerm_user_assigned_identity.validation.principal_id principal_type = "ServicePrincipal" } # ============================================================================= # Outputs # ============================================================================= output "control_plane_role_definition_id" { description = "ID of the custom role definition for Supermetal Control Plane" value = azurerm_role_definition.supermetal_control_plane.role_definition_resource_id } output "control_plane_role_assignment_id" { description = "ID of the role assignment for Supermetal Control Plane" value = azurerm_role_assignment.supermetal_control_plane.id } output "agent_identity_resource_id" { description = "Resource ID of the Managed Identity for Supermetal agents" value = azurerm_user_assigned_identity.agent.id } output "agent_identity_client_id" { description = "Client ID of the Managed Identity for Supermetal agents" value = azurerm_user_assigned_identity.agent.client_id } output "agent_identity_principal_id" { description = "Principal ID of the Managed Identity for Supermetal agents" value = azurerm_user_assigned_identity.agent.principal_id } output "validation_identity_resource_id" { description = "Resource ID of the Managed Identity for validation containers" value = azurerm_user_assigned_identity.validation.id } output "validation_identity_client_id" { description = "Client ID of the Managed Identity for validation containers" value = azurerm_user_assigned_identity.validation.client_id } output "key_vault_id" { description = "Resource ID of the Key Vault for credentials" value = local.key_vault_id } output "key_vault_uri" { description = "URI of the Key Vault for credentials" value = local.create_key_vault ? 
azurerm_key_vault.credentials[0].vault_uri : "https://${local.existing_kv_name}.vault.azure.net/" } output "metadata_endpoint_id" { description = "Resource ID of the Private Endpoint for Supermetal Metadata Service" value = azurerm_private_endpoint.metadata.id } output "metadata_endpoint_ip" { description = "Private IP address of the Metadata Service endpoint" value = azurerm_private_endpoint.metadata.private_service_connection[0].private_ip_address } output "agent_metadata_nsg_id" { description = "Resource ID of the NSG for agent metadata access" value = azurerm_network_security_group.agent_metadata.id } output "bootstrap_timestamp" { description = "Timestamp when bootstrap was completed" value = timestamp() } ``` *** ### Azure Roles & Permissions Bootstrap creates several Azure roles and managed identities with specific permissions necessary for Supermetal BYOC to function securely. These roles ensure that the Supermetal Control Plane can manage resources in your Azure subscription on your behalf, and that the Supermetal agents have the necessary permissions to operate. The principle of least privilege is applied to scope down permissions as much as possible. Key roles created and their purposes: #### Supermetal Control Plane Role ```hcl resource "azurerm_role_definition" "supermetal_control_plane" { name = "SupermetalControlPlaneRole-${var.resource_group_name}" scope = data.azurerm_subscription.current.id description = "Allows Supermetal Control Plane to manage resources in customer subscription" permissions { actions = [ # ===== Virtual Machine Scale Sets ===== "Microsoft.Compute/virtualMachineScaleSets/read", "Microsoft.Compute/virtualMachineScaleSets/write", "Microsoft.Compute/virtualMachineScaleSets/delete", "Microsoft.Compute/virtualMachineScaleSets/start/action", "Microsoft.Compute/virtualMachineScaleSets/powerOff/action", "Microsoft.Compute/virtualMachineScaleSets/restart/action", "Microsoft.Compute/virtualMachineScaleSets/deallocate/action", "Microsoft.Compute/virtualMachineScaleSets/manualUpgrade/action", "Microsoft.Compute/virtualMachineScaleSets/scale/action", "Microsoft.Compute/virtualMachineScaleSets/instanceView/read", "Microsoft.Compute/virtualMachineScaleSets/skus/read", # ===== VMSS Extensions (for Docker/Agent setup) ===== "Microsoft.Compute/virtualMachineScaleSets/extensions/read", "Microsoft.Compute/virtualMachineScaleSets/extensions/write", "Microsoft.Compute/virtualMachineScaleSets/extensions/delete", # ===== Virtual Machines (VMSS instances) ===== "Microsoft.Compute/virtualMachines/read", "Microsoft.Compute/virtualMachines/instanceView/read", # ===== Managed Disks ===== "Microsoft.Compute/disks/read", "Microsoft.Compute/disks/write", "Microsoft.Compute/disks/delete", # ===== Container Instances (for validation) ===== "Microsoft.ContainerInstance/containerGroups/read", "Microsoft.ContainerInstance/containerGroups/write", "Microsoft.ContainerInstance/containerGroups/delete", "Microsoft.ContainerInstance/containerGroups/restart/action", "Microsoft.ContainerInstance/containerGroups/stop/action", "Microsoft.ContainerInstance/containerGroups/start/action", "Microsoft.ContainerInstance/containerGroups/containers/logs/read", "Microsoft.ContainerInstance/containerGroups/containers/exec/action", # ===== Container Registry Access ===== "Microsoft.ContainerRegistry/registries/pull/read", "Microsoft.ContainerRegistry/registries/artifacts/read", "Microsoft.ContainerRegistry/registries/metadata/read", # ===== Storage Accounts ===== "Microsoft.Storage/storageAccounts/read", 
"Microsoft.Storage/storageAccounts/write", "Microsoft.Storage/storageAccounts/delete", "Microsoft.Storage/storageAccounts/listkeys/action", "Microsoft.Storage/storageAccounts/listAccountSas/action", "Microsoft.Storage/storageAccounts/blobServices/containers/read", "Microsoft.Storage/storageAccounts/blobServices/containers/write", "Microsoft.Storage/storageAccounts/blobServices/containers/delete", # ===== Networking ===== "Microsoft.Network/virtualNetworks/subnets/join/action" "Microsoft.Network/networkInterfaces/read" "Microsoft.Network/networkSecurityGroups/read" "Microsoft.Network/networkSecurityGroups/join/action" # ===== Key Vault Management ===== "Microsoft.KeyVault/vaults/read", # ===== Database Discovery (Read-only) ===== "Microsoft.Sql/servers/read", "Microsoft.Sql/servers/databases/read", "Microsoft.DBforPostgreSQL/servers/read", "Microsoft.DBforPostgreSQL/flexibleServers/read", "Microsoft.DBforMySQL/servers/read", "Microsoft.DBforMySQL/flexibleServers/read", "Microsoft.DBforMariaDB/servers/read" ] not_actions = [] data_actions = [ # ===== Key Vault Data Operations (scoped to supermetal-* items) ===== "Microsoft.KeyVault/vaults/secrets/getSecret/action", "Microsoft.KeyVault/vaults/secrets/setSecret/action", "Microsoft.KeyVault/vaults/keys/encrypt/action", "Microsoft.KeyVault/vaults/keys/decrypt/action", # ===== Storage Data Operations ===== "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read", "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write", "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete", "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action" ] not_data_actions = [] } assignable_scopes = [ data.azurerm_resource_group.target.id ] } ``` This role is assigned to the Supermetal Service Principal. It enables the Supermetal Control Plane to orchestrate and manage the lifecycle of resources required for the Supermetal data plane within your Azure subscription. * **Virtual Machine Scale Sets Management** * Create, read, update, and delete VMSS resources (`Microsoft.Compute/virtualMachineScaleSets/*`) * Control instance lifecycle (start, stop, restart, scale) for agent deployment * Manage VMSS extensions for Docker/agent setup * Read-only access to VM instances (`Microsoft.Compute/virtualMachines/read`) * **Container Instances and Registry** * Full management of Container Instances for validation tasks (`Microsoft.ContainerInstance/containerGroups/*`) * Pull access to container registry for agent images (`Microsoft.ContainerRegistry/registries/pull/read`) * Used for deploying validation containers * **Storage and Data Management** * Storage Account management (`Microsoft.Storage/storageAccounts/*`) * Used for data buffering and staging per connector * **Networking** * Read VNet and subnet information (`Microsoft.Network/virtualNetworks/*/read`) * Join subnets for resource deployment (`Microsoft.Network/virtualNetworks/subnets/join/action`) * Read and join network security groups * **Security and Access Control** * Key Vault read access (`Microsoft.KeyVault/vaults/read`) * Key Vault data operations (get/set secrets, encrypt/decrypt with keys) * Restricted to resource group scope specified during bootstrap * **Database Discovery**
  * Read-only access to database resources (`Microsoft.Sql/servers/read`, `Microsoft.DBforPostgreSQL/*/read`, etc.)
  * Limited to discovery capabilities for validating connections

***

#### Agent and Validation Identities

```hcl
# Identity for Supermetal Agents (VMSS)
resource "azurerm_user_assigned_identity" "agent" {
  name                = "supermetal-agent-identity-${var.resource_group_name}"
  location            = var.location
  resource_group_name = var.resource_group_name

  tags = local.common_tags
}

# Access for Agent Identity (decrypt)
resource "azurerm_key_vault_access_policy" "agent" {
  key_vault_id = local.key_vault_id
  tenant_id    = data.azurerm_client_config.current.tenant_id
  object_id    = azurerm_user_assigned_identity.agent.principal_id

  key_permissions = [
    "Get",
    "Decrypt"
  ]

  secret_permissions = [
    "Get"
  ]
}

# Role assignment for agent identity to pull images
resource "azurerm_role_assignment" "agent_acr_pull" {
  scope                = "/subscriptions/${var.supermetal_tenant_id}/resourceGroups/supermetal-control-plane/providers/Microsoft.ContainerRegistry/registries/${var.supermetal_container_registry_name}"
  role_definition_name = "AcrPull"
  principal_id         = azurerm_user_assigned_identity.agent.principal_id
  principal_type       = "ServicePrincipal"
}
```

This User-Assigned Managed Identity is attached to the VMSS instances running Supermetal agents. It provides the following permissions:

* **Key Vault Access**
  * Get and decrypt operations on keys (`Get`, `Decrypt`)
  * Retrieve secrets from Key Vault (`Get`)
  * Used to securely access connection credentials
* **Container Registry Access**
  * AcrPull role on Supermetal's container registry
  * Enables pulling agent container images from Supermetal's registry

```hcl
# Identity for Validation Containers (ACI)
resource "azurerm_user_assigned_identity" "validation" {
  name                = "supermetal-validation-identity-${var.resource_group_name}"
  location            = var.location
  resource_group_name = var.resource_group_name

  tags = local.common_tags
}

# Access for Validation Identity
resource "azurerm_key_vault_access_policy" "validation" {
  key_vault_id = local.key_vault_id
  tenant_id    = data.azurerm_client_config.current.tenant_id
  object_id    = azurerm_user_assigned_identity.validation.principal_id

  key_permissions = [
    "Get",
    "Decrypt"
  ]

  secret_permissions = [
    "Get"
  ]
}

# Role assignment for validation identity to pull images
resource "azurerm_role_assignment" "validation_acr_pull" {
  scope                = "/subscriptions/${var.supermetal_tenant_id}/resourceGroups/supermetal-control-plane/providers/Microsoft.ContainerRegistry/registries/${var.supermetal_container_registry_name}"
  role_definition_name = "AcrPull"
  principal_id         = azurerm_user_assigned_identity.validation.principal_id
  principal_type       = "ServicePrincipal"
}
```

This User-Assigned Managed Identity is used by Azure Container Instances (ACI) that may be deployed for validation tasks such as testing database connectivity or network reachability.

* **Key Vault Access**
  * Similar permissions to the Agent identity (Get, Decrypt)
  * Retrieves necessary connection credentials for validation
* **Container Registry Access**
  * AcrPull role on Supermetal's container registry
  * Enables pulling validation container images
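At runtime, the VMs and containers exchange these identities for short-lived tokens via the Azure Instance Metadata Service (IMDS) rather than storing long-lived credentials. The sketch below illustrates the underlying mechanism only; the client ID, vault name, and secret name are placeholders, not names Supermetal actually uses:

```bash
# Request a Key Vault-scoped token from IMDS using the user-assigned identity
# CLIENT_ID is the identity's client ID (all values are illustrative)
TOKEN=$(curl -s -H "Metadata: true" \
  "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fvault.azure.net&client_id=$CLIENT_ID" \
  | jq -r '.access_token')

# Read a secret with the token (vault and secret names are placeholders)
curl -s -H "Authorization: Bearer $TOKEN" \
  "https://your-vault-name.vault.azure.net/secrets/supermetal-example-secret?api-version=7.4"
```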
# BYOC (/docs/main/concepts/deployment/byoc)

import { Icons } from '../../../../../../app/icons';

import SupermetalHighLevelBYOCArchitectureDiagram from '../../../../../../app/diagrams/byoc';

The Bring Your Own Cloud (BYOC) / Hybrid model offers a balance between the ease of a managed service and the security and control of self-hosting. It separates Supermetal into two distinct components: a Control Plane managed by Supermetal, and a Data Plane that runs entirely within your cloud environment.

### Overview

* **Control Plane:** This is the centralized management and infrastructure orchestration layer hosted by Supermetal. It handles metadata, monitoring, and the entire connector lifecycle.
* **Data Plane:** This consists of Supermetal agents deployed within your Virtual Private Cloud (VPC) on your chosen cloud provider (AWS, Azure, GCP). The Data Plane handles all data processing and movement.

**Crucially, customer data never leaves your VPC or network boundary. The Control Plane only interacts with metadata.**

### Control Plane

The Supermetal-hosted Control Plane is responsible for:

* **Connector Lifecycle Management:**
  * Storing connector definitions and configurations.
  * Managing encrypted credentials securely.
  * Maintaining connector state (e.g., WAL positions, LSNs for CDC).
  * Orchestrating provisioning and de-provisioning workflows for agents in the Data Plane.
  * Scaling, failover, and high availability of Data Plane agents.
* **Operational Interface:**
  * Providing a centralized web console & REST APIs for creating, managing, and monitoring all connectors across your organization.
* **Analytics and Insights:**
  * Aggregating logs, metrics, and performance analytics from all Data Plane agents.
* **Security and Governance:**
  * Implementing SSO and Role-Based Access Control (RBAC) for managing users & permissions.
  * Maintaining audit logs for all management operations.

### Data Plane

The Data Plane runs entirely within your cloud infrastructure (e.g., your AWS VPC, Azure VNet, or GCP VPC) and consists of:

* **Supermetal Agents:** These are the same lightweight, efficient agents used in the self-hosted model. They perform the actual data extraction from sources and loading to targets.
* **Network Security:** Operates entirely within your VPC's security boundaries, adhering to your network ACLs, security groups, and private networking configurations.
* **Data Confinement:** Customer data is processed and moved directly between sources and targets within the customer's VPC. It *never* traverses the public internet to the Control Plane or leaves the customer's designated network environment.
* **Metadata Exchange:** Only essential metadata (e.g., connector state, operational logs, performance metrics) is securely transmitted to the Control Plane using private endpoints for management and observability.

### Benefits of BYOC / Hybrid Deployment

This split, two-plane architecture offers significant advantages:

* **Data Sovereignty and Security:** Ensures that sensitive data remains within your secure cloud environment and network boundaries at all times, simplifying compliance with regulations like GDPR, HIPAA, and CCPA.
* **Reduced Egress Costs:** By processing data within your VPC, especially when sources and targets are in the same cloud region, you can significantly reduce or eliminate costly data egress charges.
* **Leverage Existing Infrastructure and Security:** Utilize your established cloud security posture, including IAM roles, security groups, VPC endpoints, private links, and monitoring tools.
* **Optimized Performance:** Data transfer occurs over your private network, potentially offering lower latency and higher bandwidth than solutions routing data externally.
* **Simplified Management:** The Supermetal-managed Control Plane offloads the operational burden of orchestrating, monitoring, and updating the core replication logic, allowing your teams to focus on data integration tasks rather than infrastructure management.
* **Centralized Control with Distributed Execution:** Gain a unified view and control over all your data pipelines via the Control Plane, while the actual data processing happens securely and efficiently within your environment.

# Monitoring (/docs/main/concepts/deployment/monitoring)

Supermetal provides real-time monitoring of its replication processes, offering deep insights into health and performance. All metrics and logs are recorded using the [OpenTelemetry (OTEL)](https://opentelemetry.io/) vendor-neutral standard. OTEL ensures that you can seamlessly integrate Supermetal's observability data with your preferred monitoring and logging tools, such as [Datadog](https://docs.datadoghq.com/opentelemetry/), [Splunk](https://docs.splunk.com/Observability/gdi/opentelemetry/opentelemetry.html), or any [other OTEL-compatible](https://opentelemetry.io/ecosystem/vendors/) platform.

Key observability features include:

* [**Comprehensive Metrics:**](/docs/main/concepts/deployment/monitoring/metrics) Detailed metrics covering data throughput, latency, resource utilization, and operational events.
* **Structured Logging:** Rich, structured logs that provide context for events and errors, facilitating easier debugging and analysis.

## Export

You can configure the agent to send this telemetry data to your preferred OTLP-compatible backend. The method for configuring the OTLP exporter endpoint varies depending on your Supermetal deployment model.

### Local / Self-Hosted

When running the Supermetal agent locally, you can configure the OTLP exporter endpoint and other parameters using standard OpenTelemetry environment variables. Key variables include:

* `OTEL_EXPORTER_OTLP_ENDPOINT`: Specifies the target URL for the OTLP collector (e.g., `http://localhost:4317` for gRPC or `http://localhost:4318/v1/metrics` for HTTP/protobuf metrics).
* `OTEL_EXPORTER_OTLP_HEADERS`: Allows you to specify headers, such as authentication tokens (e.g., `Authorization=Bearer <token>`).

For detailed information on OTLP exporter configuration, refer to the official [OpenTelemetry SDK configuration documentation](https://opentelemetry.io/docs/languages/sdk-configuration/otlp-exporter/). Set these environment variables in the environment where your Supermetal agent is running, as in the example below.
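A minimal sketch for a local agent exporting to a collector on the same host (the endpoint and token values are placeholders):

```bash
# Point the agent's OTLP exporter at your collector before starting the agent
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"          # gRPC collector endpoint
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer YOUR_TOKEN" # optional auth header
```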
### BYOC / Hybrid

For deployments managed by the Supermetal Control Plane, including Bring Your Own Cloud (BYOC), the OTLP exporter endpoints for your agents are configured centrally through the Supermetal Console. Navigate to settings within the Supermetal Console to specify the OTLP endpoint URL and any necessary authentication headers for your chosen monitoring backend. The Control Plane will then begin forwarding telemetry data to your configured backend.

# Metrics (/docs/main/concepts/deployment/monitoring/metrics)

## Connector Metrics

These metrics track data flow and performance through the ETL pipeline from source to target systems.

| Category | Metric Name | Description | Unit | Type |
| --- | --- | --- | --- | --- |
| Data | `rows.in` | Number of rows read | 1 | Counter (u64) |
| | `rows.out` | Number of rows written | 1 | Counter (u64) |
| | `bytes.in` | Number of bytes read | By | Counter (u64) |
| | `bytes.out` | Number of bytes written | By | Counter (u64) |
| Latency | `latency.total` | Latency from source event timestamp to the event being processed and written to the target (end-to-end latency) | ns | Histogram (u64) |
| | `latency.in` | Latency from source event timestamp to when the event was received | ns | Histogram (u64) |
| | `latency.out` | Latency from when the event was received to when it was written to the target | ns | Histogram (u64) |
| | `latency.in.process` | Latency to process a source event | ns | Histogram (u64) |
| | `latency.out.process` | Latency to process an event before writing to the target | ns | Histogram (u64) |
| Buffer | | | | |
| ↳ Data | `buffer.rows.in` | Number of rows read from the buffer | 1 | Counter (u64) |
| | `buffer.rows.out` | Number of rows written to the buffer | 1 | Counter (u64) |
| | `buffer.bytes.in` | Number of bytes read from the buffer | By | Counter (u64) |
| | `buffer.bytes.out` | Number of bytes written to the buffer | By | Counter (u64) |
| ↳ Latency | `buffer.latency.total` | Latency from source event timestamp to the event being processed and written to the buffer | ns | Histogram (u64) |
| | `buffer.latency.in` | Latency from source event timestamp to when the event was received by the buffer | ns | Histogram (u64) |
| | `buffer.latency.out` | Latency from when the event was received to when it was written to the buffer | ns | Histogram (u64) |
| | `buffer.latency.in.process` | Latency to process an event received by the buffer | ns | Histogram (u64) |
| | `buffer.latency.out.process` | Latency to process an event before writing to the buffer | ns | Histogram (u64) |
| ↳ Object Store | `buffer.object_store.files.force-flush` | Number of times the buffer forced a flush (when there are no writes) | 1 | Counter (u64) |
| | `buffer.object_store.files.abort` | Failed file operations due to errors | 1 | Counter (u64) |
| | `buffer.object_store.files.create` | Successfully created new files in object storage | 1 | Counter (u64) |
| | `buffer.object_store.files.delete` | Files removed from object storage | 1 | Counter (u64) |
| | `buffer.object_store.files.get` | Files retrieved from object storage | 1 | Counter (u64) |
| Snapshot Progress | `progress.snapshot` | Current progress of initial snapshot load (0.0 to 100.0) | % | Gauge (f64) |
| | `buffer.progress.snapshot` | Current progress of buffer snapshot operations (0.0 to 100.0) | % | Gauge (f64) |

## Process Metrics

System-level metrics for the Supermetal process following [OpenTelemetry semantic conventions](https://opentelemetry.io/docs/specs/semconv/system/process-metrics/). These help monitor resource usage and system health.
| Category | Metric Name | Description | Unit | Type | | ----------------- | ------------------------------------ | --------------------------------------------------------- | ---- | ------------------- | | Process CPU | `process.cpu.time` | Total CPU time consumed by the process since startup | s | Counter (u64) | | | `process.cpu.utilization` | Current CPU usage as ratio (0.0=idle, 1.0=fully utilized) | 1 | Gauge (f64) | | Process Memory | `process.memory.usage` | Physical memory (RSS) currently used by the process | By | UpDownCounter (i64) | | | `process.memory.virtual` | Virtual memory currently allocated by the process | By | UpDownCounter (i64) | | Process Resources | `process.open_file_descriptor_count` | Currently open file descriptors (monitor for fd leaks) | 1 | UpDownCounter (i64) | | | `process.thread_count` | Currently active threads (monitor for thread leaks) | 1 | UpDownCounter (i64) | | | `process.disk.io` | Total disk I/O bytes with `direction` label (read/write) | By | Counter (u64) | ## Runtime Metrics Tokio async runtime metrics providing insights into task scheduling, worker thread utilization, and async I/O performance. | Category | Metric Name | Description | Unit | Type | | ----------------------- | ------------------------------------- | ------------------------------------------------------------------------------ | ---- | ------------- | | Runtime Tasks | `runtime.global_queue_depth` | Tasks waiting in global queue (high values indicate backpressure) | 1 | Gauge (u64) | | | `runtime.alive_tasks` | Currently executing async tasks | 1 | Gauge (u64) | | | `runtime.spawned_tasks` | Total async tasks created since startup | 1 | Counter (u64) | | | `runtime.budget_forced_yield_count` | Tasks forced to yield due to execution time limits (indicates CPU-bound tasks) | 1 | Counter (u64) | | Runtime Workers | `runtime.num_workers` | Total worker threads in the runtime thread pool | 1 | Gauge (u64) | | | `runtime.blocking_queue_depth` | Tasks waiting for blocking thread pool | 1 | Gauge (u64) | | | `runtime.num_blocking_threads` | Currently active blocking threads | 1 | Gauge (u64) | | | `runtime.num_idle_blocking_threads` | Idle blocking threads available for work | 1 | Gauge (u64) | | | `runtime.io_driver_ready_count` | I/O events processed by the runtime (network/file operations) | 1 | Counter (u64) | | Worker Metrics (opt-in) | `runtime.worker_local_queue_depth` | Tasks in each worker's local queue (set `ENABLE_WORKER_METRICS=1`) | 1 | Gauge (u64) | | | `runtime.worker_local_schedule_count` | Tasks scheduled locally per worker (indicates work distribution) | 1 | Counter (u64) | | | `runtime.worker_mean_poll_time` | Average task execution time per worker | ns | Gauge (u64) | | | `runtime.worker_noop_count` | Idle polls per worker (high values indicate low workload) | 1 | Counter (u64) | | | `runtime.worker_overflow_count` | Local queue overflows per worker (tasks moved to global queue) | 1 | Counter (u64) | | | `runtime.worker_park_count` | Times worker thread went idle waiting for tasks | 1 | Counter (u64) | | | `runtime.worker_poll_count` | Total task polls per worker | 1 | Counter (u64) | | | `runtime.worker_steal_count` | Tasks stolen from other workers (indicates load balancing) | 1 | Counter (u64) | | | `runtime.worker_steal_operations` | Steal attempts per worker (successful + failed) | 1 | Counter (u64) | | | `runtime.worker_total_busy_duration` | Cumulative execution time per worker | ns | Counter (u64) |
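To surface the opt-in per-worker metrics, set the flag in the agent's environment before startup. A minimal sketch; the launch command is hypothetical, and the collector endpoint is a placeholder:

```bash
# Opt in to per-worker Tokio runtime metrics (higher metric cardinality)
export ENABLE_WORKER_METRICS=1
# Export alongside the rest of the agent's telemetry (endpoint is illustrative)
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
# Launch the agent in this environment (binary name/path is hypothetical)
./supermetal-agent
```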