MongoDB

Supermetal's MongoDB integration uses MongoDB's native change streams to extract data efficiently and reliably, with minimal impact on your database. It supports both an initial snapshot and continuous replication of changes, and maps documents to a strongly typed schema.

This guide covers the features, prerequisites, and configuration steps required to connect MongoDB with Supermetal.


Features

  • Initial Data Sync
  • Change Data Capture
  • Schema Evolution
  • Catalog Support
  • Document Flattening


Prerequisites

Before you begin, ensure you have:

  • MongoDB Requirements:

    • Version: MongoDB 4.0 or higher (required for change streams support)
    • Supported Deployments:
      • MongoDB Community/Enterprise Server (replica set configuration)
      • MongoDB Atlas

    Change Streams Requirement

    Change streams require a replica set deployment. If you're using a standalone server, you must convert it to a single-node replica set to use change streams.

  • Deployment Requirements:

    • Replica Set: Your MongoDB deployment must be configured as a replica set
    • Read Concern Majority: Supermetal uses read concern "majority" to ensure consistent reads
    • Database Permissions: User with appropriate permissions (see Setup)
    • Network Connectivity: Ensure Supermetal can reach your MongoDB deployment (default port: 27017)
    • TLS/SSL Support: Supermetal supports both unencrypted and TLS/SSL encrypted connections
  • MongoDB Atlas Requirements:

    • Network Access: Configure network access rules to allow Supermetal to connect
    • Connection String: Use the connection string format that includes all replica set members
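
As an illustration of a connection string that lists all replica set members, here is a minimal sketch in JavaScript. The helper name `buildMongoUri`, the host names, and the replica set name `rs0` are hypothetical placeholders. `replicaSet`, `authSource`, and `tls` are standard MongoDB connection string options, but whether Supermetal expects the string in exactly this form is an assumption; check your Supermetal configuration.

```javascript
// Sketch only: buildMongoUri, the hosts, and "rs0" are hypothetical placeholders.
function buildMongoUri({ user, password, hosts, replicaSet, tls = true, authSource = "admin" }) {
  if (!Array.isArray(hosts) || hosts.length === 0) {
    throw new Error("at least one replica set member is required");
  }
  // Percent-encode credentials so characters such as '@' or ':' do not break the URI.
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  const options = new URLSearchParams({
    replicaSet,          // name of the replica set
    authSource,          // database in which the user was created
    tls: String(tls),    // set to false for unencrypted connections
  });
  // Every replica set member is listed, separated by commas.
  return `mongodb://${credentials}@${hosts.join(",")}/?${options.toString()}`;
}

const uri = buildMongoUri({
  user: "supermetal_user",
  password: "p@ss:word",
  hosts: ["db0.example.com:27017", "db1.example.com:27017", "db2.example.com:27017"],
  replicaSet: "rs0",
});
console.log(uri);
```

Listing every member lets the driver discover the current primary even when one of the listed hosts is unavailable.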

Setup

Permissions Overview

Supermetal requires a MongoDB user that can read data and open change streams. We recommend creating a dedicated read-only user for this purpose.

MongoDB Deployment    Minimum Required Permissions
Self-managed          read role on the database to replicate from
MongoDB Atlas         readAnyDatabase role

Create a Dedicated Read-Only MongoDB User

Connect to your MongoDB instance with mongosh (the MongoDB Shell) as a user with admin privileges:

mongosh --host <host> --port <port> -u <admin-username> -p <admin-password> --authenticationDatabase admin

Script Variables

Replace the placeholder values in the command with your actual information:

  • <host>: Your MongoDB server hostname or IP address
  • <port>: MongoDB port (default is 27017)
  • <admin-username>: Username with admin privileges
  • <admin-password>: Password for the admin user

Create a dedicated user for Supermetal:

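// Create the user in the admin database, which becomes its authentication database
use admin
db.createUser({
  user: "supermetal_user",
  pwd: "strong-password",  // replace with a secure, unique password
  roles: [
    // read-only access to the database to replicate from
    { role: "read", db: "target-database" }
  ]
})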

Script Variables

Replace the placeholder values in the script with your actual information:

  • strong-password: Replace with a secure, unique password for the supermetal_user.
  • target-database: The name of the database you want to replicate from.
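
If you replicate from more than one database on a self-managed deployment, the same pattern extends to one `read` role per database. The sketch below assembles the argument for `db.createUser()`. The helper name `buildSupermetalUser` and the database names `sales` and `inventory` are hypothetical.

```javascript
// Sketch only: buildSupermetalUser and the database names are hypothetical.
function buildSupermetalUser(password, databases) {
  if (!Array.isArray(databases) || databases.length === 0) {
    throw new Error("at least one source database is required");
  }
  return {
    user: "supermetal_user",
    pwd: password,
    // One read-only role per source database; no write or admin roles are granted.
    roles: databases.map((db) => ({ role: "read", db })),
  };
}

const userDoc = buildSupermetalUser("strong-password", ["sales", "inventory"]);
console.log(JSON.stringify(userDoc, null, 2));
// In mongosh, after `use admin`, the document could be passed to db.createUser(userDoc).
```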

Data Types Mapping

Supermetal automatically maps MongoDB BSON types to Apache Arrow data types according to the following mapping:

MongoDB BSON Type(s)          Apache Arrow DataType   Notes
Double                        Float64
Int32                         Int32
Int64                         Int64
Decimal128                    Utf8                    Preserved as a string to maintain exact precision and handle MongoDB's variable decimal precision/scale requirements
DateTime                      Timestamp(Millisecond)
Timestamp                     Timestamp(Millisecond)  MongoDB internal timestamp type
String                        Utf8
Symbol                        Utf8                    Deprecated MongoDB type
RegularExpression             Utf8                    Converted to string representation
JavaScriptCode                Utf8                    Converted to string representation
JavaScriptCodeWithScope       Utf8                    Converted to string representation
Boolean                       Boolean
Array (same data type)        List<ElementType>       Only for arrays with homogeneous element types
Array (different data types)  Json                    Heterogeneous arrays are converted to JSON
Document                      Json                    Nested documents represented as JSON
Binary                        Utf8                    Encoded as a hexadecimal string for lossless representation and compatibility
ObjectId                      Utf8                    Converted to hex string representation
DbPointer                     Utf8                    Legacy MongoDB type
Null                          Null                    Represented as null in target columns
MinKey                        Utf8                    Special MongoDB comparison type
MaxKey                        Utf8                    Special MongoDB comparison type
Undefined                     Utf8                    Deprecated MongoDB type
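
To make the array and binary rules in the mapping above concrete, here is a minimal sketch in JavaScript. It is not Supermetal's implementation. The names `BSON_TO_ARROW` and `arrowTypeForArray` are hypothetical, the lookup covers only the scalar rows of the mapping, and the behaviour for empty arrays or for arrays of documents is not stated in this guide, so the sketch rejects those inputs rather than guessing.

```javascript
// Scalar rows transcribed from the mapping above (sketch; names are hypothetical).
const BSON_TO_ARROW = {
  Double: "Float64", Int32: "Int32", Int64: "Int64", Decimal128: "Utf8",
  DateTime: "Timestamp(Millisecond)", Timestamp: "Timestamp(Millisecond)",
  String: "Utf8", Symbol: "Utf8", RegularExpression: "Utf8",
  JavaScriptCode: "Utf8", JavaScriptCodeWithScope: "Utf8",
  Boolean: "Boolean", Binary: "Utf8", ObjectId: "Utf8", DbPointer: "Utf8",
  Null: "Null", MinKey: "Utf8", MaxKey: "Utf8", Undefined: "Utf8",
};

// Given the BSON type name of each array element, pick the Arrow type:
// List<ElementType> when all elements share one type, Json otherwise.
function arrowTypeForArray(elementBsonTypes) {
  if (elementBsonTypes.length === 0) {
    throw new Error("empty arrays are not covered by this sketch");
  }
  const distinct = new Set(elementBsonTypes);
  if (distinct.size !== 1) return "Json"; // heterogeneous array
  const [only] = distinct;
  const elementType = BSON_TO_ARROW[only];
  if (elementType === undefined) {
    throw new Error(`element type ${only} is not covered by this sketch`);
  }
  return `List<${elementType}>`;
}

// Binary values become hexadecimal strings. Lowercase output is an assumption.
const hex = Buffer.from([0xde, 0xad, 0xbe, 0xef]).toString("hex");

console.log(arrowTypeForArray(["Int32", "Int32"]));  // homogeneous array
console.log(arrowTypeForArray(["Int32", "String"])); // heterogeneous array
console.log(hex);
```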
