MySQL

MySQL's binary log (binlog) provides a reliable foundation for Change Data Capture and data replication. Supermetal builds on this capability to deliver efficient data extraction with minimal database overhead, leveraging Global Transaction Identifiers (GTIDs) for guaranteed consistency.

This guide covers the features, prerequisites, and configuration steps required to connect MySQL with Supermetal mainly focusing on binlog setup and permissions.


Features

FeatureNotes
Initial Data Sync

Change Data Capture

Schema Evolution

Transactional Consistency

Catalog Support


Prerequisites

Before you begin, ensure you have:

  • MySQL Requirements:

    • Version: MySQL 8.0 or later with GTID mode enabled
    • Supported Deployments:
      • Self-Hosted MySQL
      • Amazon RDS for MySQL (Coming Soon)
      • Azure Database for MySQL (Coming Soon)

    GTID Requirement

    GTID mode must be enabled on your MySQL server for Change Data Capture functionality. See MySQL GTID Documentation for details.

  • Binary Log Requirements:

    • Binary Log Format: Must be set to ROW format
    • Binary Log Retention: Configure appropriate retention period (recommended: at least 3 days)
  • Deployment Requirements:

    • Database Permissions: User with SELECT, REPLICATION SLAVE, and REPLICATION CLIENT privileges
    • Network Connectivity: Ensure Supermetal can reach your MySQL server (default port: 3306)

Setup

Permissions Overview

Supermetal requires a dedicated MySQL user with appropriate permissions to read data and access the binary log for CDC.

Required PermissionPurpose
SELECTRead data from tables during snapshot and CDC
REPLICATION SLAVEAccess the binary log for CDC
REPLICATION CLIENTQuery server status information including GTID position

Setup Instructions

Self-Hosted MySQL Setup

Enable Binary Logging and GTID Mode

Edit your MySQL configuration file (my.cnf or my.ini):

[mysqld]
# Server identification (must be unique in your infrastructure)
server-id = 223344

# Enable binary logging
log_bin = mysql-bin
binlog_format = ROW
binlog_row_image = FULL
binlog_row_metadata = FULL
   
# Enable GTID mode
gtid_mode = ON
enforce_gtid_consistency = ON
   
# Set binary log retention (3 days)
binlog_expire_logs_seconds = 259200

Configuration Parameters

  • server-id: Must be unique for each server in your replication topology. Can be any non-zero value between 1 and 4294967295.
  • log_bin: Base name for binary log files.
  • binlog_format: Must be set to ROW for CDC to work properly.
  • binlog_row_image: Must be set to FULL to capture complete before/after row images, which is required for CDC.
  • binlog_row_metadata: Must be set to FULL to include complete metadata in binary log events.
  • gtid_mode and enforce_gtid_consistency: Required for reliable replication position tracking.
  • binlog_expire_logs_seconds: Controls how long binary logs are retained before automatic deletion.

Create a Dedicated User for Replication

Connect to MySQL as a privileged user and run:

CREATE USER 'supermetal_user'@'%' IDENTIFIED BY 'strong-password';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'supermetal_user'@'%';
FLUSH PRIVILEGES;

Script Variables

Replace the following variables with your own values:

  • supermetal_user: Username for Supermetal access
  • strong-password: Secure password for the Supermetal user

Verify Configuration

Verify binary logging and GTID configuration:

SHOW VARIABLES LIKE 'gtid_mode';
SHOW VARIABLES LIKE 'enforce_gtid_consistency';
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';
SHOW VARIABLES LIKE 'binlog_row_image';
SHOW VARIABLES LIKE 'binlog_row_metadata';
SHOW VARIABLES LIKE 'server_id';

The output should show:

  • gtid_mode: ON
  • enforce_gtid_consistency: ON
  • log_bin: ON
  • binlog_format: ROW
  • binlog_row_image: FULL
  • binlog_row_metadata: FULL
  • server_id: A non-zero value

Amazon RDS MySQL Setup

Create a Parameter Group for GTID Mode

  1. In the AWS RDS console, navigate to "Parameter groups"
  2. Click "Create parameter group"
  3. Select the MySQL engine and version that matches your RDS instance
  4. Enter a name and description (e.g., "supermetal-cdc-params")
  5. Click "Create"
  6. Select the newly created parameter group and modify the following parameters:
    • gtid-mode: ON
    • enforce_gtid_consistency: ON
    • binlog_format: ROW
    • binlog_row_image: FULL
    • binlog_row_metadata: FULL

RDS Limitations

In RDS, some parameters cannot be modified, including server-id. RDS automatically assigns a unique server-id to each instance, so you don't need to configure this manually.

Apply the Parameter Group to Your RDS Instance

  1. Navigate to your RDS instances and select your MySQL instance
  2. Click "Modify"
  3. Under "Additional configuration", select the parameter group you created
  4. Choose "Apply immediately" or schedule for your next maintenance window
  5. Click "Continue" and confirm the changes

Set Binary Log Retention Period

Connect to your RDS MySQL instance and run:

CALL mysql.rds_set_configuration('binlog retention hours', 72);

RDS Specific

Amazon RDS uses binlog retention hours instead of the standard binlog_expire_logs_seconds parameter. The default value is 0 (immediate removal), so this step is essential.

Create a User for Replication

CREATE USER 'supermetal_user'@'%' IDENTIFIED BY 'strong-password';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'supermetal_user'@'%';
FLUSH PRIVILEGES;

Script Variables

Replace the following variables with your own values:

  • supermetal_user: Username for Supermetal access
  • strong-password: Secure password for the Supermetal user

Azure Database for MySQL Setup

Configure Server Parameters

  1. In the Azure portal, navigate to your MySQL server resource
  2. Select "Server parameters" under the Settings section
  3. Find and set the following parameters:
    • gtid_mode: ON
    • enforce_gtid_consistency: ON
    • binlog_format: ROW
    • binlog_row_image: FULL
    • binlog_row_metadata: FULL
  4. Click "Save" to apply the changes
  5. Restart your Azure Database for MySQL server if prompted

Azure Limitations

Some parameters might be read-only in Azure Database for MySQL, particularly in Flexible Server configurations. The server-id is automatically assigned by Azure and cannot be modified.

Create a User for Replication

Connect to your Azure MySQL instance and run:

CREATE USER 'supermetal_user'@'%' IDENTIFIED BY 'strong-password';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'supermetal_user'@'%';
FLUSH PRIVILEGES;

Script Variables

Replace the following variables with your own values:

  • supermetal_user: Username for Supermetal access
  • strong-password: Secure password for the Supermetal user

Network Security

Ensure your Azure MySQL server's firewall rules allow connections from the Supermetal service IP addresses.


Data Types Mapping

Supermetal automatically maps MySQL data types to Apache Arrow data types according to the following mapping:

MySQL TypeArrow TypeNotes
TINYINTInt8 / UInt8Signed/unsigned determined by column definition
SMALLINTInt16 / UInt16Signed/unsigned determined by column definition
MEDIUMINTInt32 / UInt32Signed/unsigned determined by column definition
INTInt32 / UInt32Signed/unsigned determined by column definition
BIGINTInt64 / UInt64Signed/unsigned determined by column definition
FLOATFloat32
DOUBLEFloat64
DECIMALDecimal128 / Decimal256 / Utf8Decimal128 for precision ≤ 38, Decimal256 for precision ≤ 76, Utf8 for precision > 76
MySQL TypeArrow TypeNotes
CHARUtf8
VARCHARUtf8
TEXTUtf8Includes all TEXT variants (TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT)
ENUMUtf8Original enum values preserved as strings
SETList<Utf8>Converted to arrays of strings (e.g., 'value1,value2' → ["value1", "value2"])
JSONUtf8Preserved as JSON string
MySQL TypeArrow TypeNotes
BINARYBinary
VARBINARYBinary
BLOBBinaryIncludes all BLOB variants (TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB)
BITUtf8Converted to binary string representation (e.g., BIT(4) value 5 becomes "0101")
MySQL TypeArrow TypeNotes
DATEDate32
TIMEIntervalDayTime / Utf8IntervalDayTime for precision ≤ 3 (milliseconds), Utf8 for precision > 3 (microseconds)
DATETIMETimestamp(Microsecond)Full microsecond precision without timezone
TIMESTAMPTimestamp(Microsecond)Full microsecond precision, stored in UTC internally by MySQL
YEARUInt16
MySQL TypeArrow Type
GEOMETRYUnsupported
POINTUnsupported
LINESTRINGUnsupported
POLYGONUnsupported
MULTIPOINTUnsupported
MULTILINESTRINGUnsupported
MULTIPOLYGONUnsupported
GEOMCOLLECTIONUnsupported
VECTORUnsupported

Columns with unsupported data types are automatically excluded from replication. Tables containing only unsupported column types cannot be replicated.

Last updated on