MongoDB

Supermetal's MongoDB integration uses MongoDB's native change streams to extract data efficiently and reliably, with minimal impact on your database. It supports both an initial snapshot and continuous replication of changes.

This guide covers the features, prerequisites, and configuration steps required to connect MongoDB with Supermetal.


Replication Modes

Supermetal offers two replication modes to handle MongoDB's flexible document model:

Schema Mode: Automatically infers and evolves a strongly-typed schema from your documents. Each field becomes a typed column in the target, enabling native SQL queries and type-safe analytics. Best for structured data and analytics workloads.

Schemaless Mode: Preserves documents as JSON in a two-column format (_id, document). Supports parallelized snapshots for faster initial loads. Ideal for highly variable document structures or when you need to preserve the original document format for downstream JSON processing.

The example below shows how the same MongoDB document is replicated in each mode.

{
  "_id": ObjectId("507f1f77bcf86cd799439011"),
  "name": "Alice",
  "email": "[email protected]",
  "age": 30,
  "is_active": true,
  "tags": ["admin", "user"]
}
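Schema Mode:

| Column | Value | Type |
| --- | --- | --- |
| _id | 507f1f77bcf86cd799439011 | Utf8 |
| name | Alice | Utf8 |
| email | [email protected] | Utf8 |
| age | 30 | Int32 |
| is_active | true | Boolean |
| tags | ["admin","user"] | Utf8 |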

Arrays

Arrays are serialized as JSON strings to avoid schema conflicts when element types vary across documents.

Schemaless Mode:

| Column | Value |
| --- | --- |
| _id | 507f1f77bcf86cd799439011 |
| document | {"name":"Alice","email":"[email protected]","age":30,"is_active":true,"tags":["admin","user"]} |
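As a rough illustration of the two row shapes, the sketch below converts a plain JavaScript object into each form. The helper names `toSchemaModeRow` and `toSchemalessRow` are hypothetical and are not part of Supermetal. The sketch assumes `_id` has already been rendered as its 24-character hex string, and it leaves out the `email` field from the example above.

```javascript
// Hypothetical helpers illustrating the two row shapes; not Supermetal's code.
// Assumes _id is already a 24-character hex string.

// Schema Mode: one column per top-level field.
// Arrays and nested documents are serialized as JSON strings.
function toSchemaModeRow(doc) {
  const row = {};
  for (const [field, value] of Object.entries(doc)) {
    const isComplex = typeof value === "object" && value !== null;
    row[field] = isComplex ? JSON.stringify(value) : value;
  }
  return row;
}

// Schemaless Mode: two columns, _id plus the remaining fields as one JSON string.
function toSchemalessRow(doc) {
  const { _id, ...rest } = doc;
  return { _id, document: JSON.stringify(rest) };
}

const exampleDoc = {
  _id: "507f1f77bcf86cd799439011",
  name: "Alice",
  age: 30,
  is_active: true,
  tags: ["admin", "user"],
};

console.log(toSchemaModeRow(exampleDoc));
console.log(toSchemalessRow(exampleDoc));
```

In the Schema Mode row, `tags` is the string `["admin","user"]` rather than a list, which matches the Utf8 type shown in the table.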

Features

  • Initial Data Sync
  • Change Data Capture
  • Schema Evolution
  • Catalog Support
  • Document Flattening

Prerequisites

Before you begin, ensure you have:

  • MongoDB Requirements:

    • Version: MongoDB 4.0 or higher (required for change streams support)
    • Supported Deployments:
      • MongoDB Community/Enterprise Server (replica set configuration)
      • MongoDB Atlas

    Change Streams Requirement

    Change streams require a replica set deployment. If you're using a standalone server, you must convert it to a single-node replica set to use change streams.

  • Deployment Requirements:

    • Replica Set: Your MongoDB deployment must be configured as a replica set
    • Read Concern Majority: Supermetal uses read concern "majority" to ensure consistent reads
    • Database Permissions: User with appropriate permissions (see Setup)
    • Network Connectivity: Ensure Supermetal can reach your MongoDB deployment (default port: 27017)
    • TLS/SSL Support: Supermetal supports both unencrypted and TLS/SSL encrypted connections
  • MongoDB Atlas Requirements:

    • Network Access: Configure network access rules to allow Supermetal to connect
    • Connection String: Use the connection string format that includes all replica set members
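The requirements above can be sketched as two small helpers. The first builds a standard `mongodb://` connection string that lists every replica set member. The second inspects the reply of MongoDB's `hello` command, which includes a `setName` field only when the server belongs to a replica set. The function names, host names, and the `authSource=admin` choice are assumptions for illustration, not Supermetal settings.

```javascript
// Hypothetical helpers; names and defaults are assumptions for illustration.

// Build a standard connection string that lists every replica set member.
function buildMongoUri({ hosts, replicaSet, user, password, tls = true }) {
  // Percent-encode credentials so characters like "@" or ":" stay valid in a URI.
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  const params = new URLSearchParams({
    replicaSet,
    authSource: "admin", // assumes the user was created in the admin database
    tls: String(tls),
  });
  return `mongodb://${credentials}@${hosts.join(",")}/?${params.toString()}`;
}

// A `hello` reply from a replica set member carries a `setName` field;
// a standalone server's reply does not.
function isReplicaSetMember(helloReply) {
  return typeof helloReply.setName === "string" && helloReply.setName.length > 0;
}

const uri = buildMongoUri({
  hosts: ["db1.example.com:27017", "db2.example.com:27017"],
  replicaSet: "rs0",
  user: "supermetal_user",
  password: "p@ss word",
});
console.log(uri);
```

In mongosh, `db.hello()` returns the reply object that `isReplicaSetMember` expects.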

Setup

Permissions Overview

Supermetal needs a MongoDB user that can read data and open change streams. We recommend creating a dedicated read-only user for this purpose.

| MongoDB Deployment | Minimum Required Permissions |
| --- | --- |
| Self-managed | read role on the MongoDB database to replicate from |
| MongoDB Atlas | readAnyDatabase role |

Create a Dedicated Read-Only MongoDB User

Connect to your MongoDB instance with mongosh (the MongoDB Shell) as a user with admin privileges:

mongosh --host <host> --port <port> -u <admin-username> -p <admin-password> --authenticationDatabase admin

Script Variables

Replace the placeholder values in the command with your actual information:

  • <host>: Your MongoDB server hostname or IP address
  • <port>: MongoDB port (default is 27017)
  • <admin-username>: Username with admin privileges
  • <admin-password>: Password for the admin user

Create a dedicated user for Supermetal:

use admin
db.createUser({
  user: "supermetal_user",
  pwd: "strong-password",
  roles: [
    { role: "read", db: "target-database" }
  ]
})

Script Variables

Replace the placeholder values in the script with your actual information:

  • strong-password: Replace with a secure, unique password for the supermetal_user.
  • target-database: The name of the database you want to replicate from.
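If you replicate from more than one database, the same user needs a read role on each of them. The sketch below builds the document for MongoDB's `createUser` database command with one read role per database. The helper name `buildCreateUserCommand` and the database names are hypothetical.

```javascript
// Hypothetical helper: builds a createUser command document that grants
// the read role on each listed database.
function buildCreateUserCommand(user, pwd, databases) {
  if (databases.length === 0) {
    throw new Error("at least one database is required");
  }
  return {
    createUser: user,
    pwd,
    roles: databases.map((db) => ({ role: "read", db })),
  };
}

const command = buildCreateUserCommand("supermetal_user", "strong-password", [
  "orders",
  "inventory",
]);
console.log(JSON.stringify(command, null, 2));
```

In mongosh, after `use admin`, a document of this shape can be passed to `db.runCommand(...)`. It is equivalent to calling `db.createUser` with the same roles.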

Configure MongoDB Source in Supermetal

You are now ready to configure MongoDB as a source within Supermetal. When configuring the source, select your preferred replication mode:

  • Schema Mode: Infers strongly-typed schemas from documents, creating typed columns in the target. Best for analytics workloads.
  • Schemaless Mode: Preserves documents as JSON in a two-column format (_id, document). Best for variable document structures.

Data Types Mapping

Schema Mode Only

The following type mappings apply to Schema Mode replication. In Schemaless Mode, all documents are stored as JSON in a two-column format (_id: Utf8, document: Json).

| MongoDB BSON Type(s) | Apache Arrow DataType | Notes |
| --- | --- | --- |
| Double | Float64 | Non-finite values (NaN, Infinity) are converted to null. |
| Int32 | Int32 | |
| Int64 | Int64 | |
| Decimal128 | Utf8 | Preserved as string to maintain exact precision and handle MongoDB's variable decimal precision/scale. |
| Boolean | Boolean | |
| DateTime | Timestamp(Millisecond, "UTC") | Serialized as RFC3339 format with UTC timezone. |
| Timestamp | Utf8 | MongoDB internal oplog timestamp (seconds + ordinal). Serialized as JSON: {"t": seconds, "i": increment}. |
| String | Utf8 | |
| Symbol | Utf8 | Deprecated MongoDB type. |
| RegularExpression | Utf8 | Pattern string only. |
| JavaScriptCode | Utf8 | Code string only. |
| JavaScriptCodeWithScope | Utf8 | Serialized as JSON with code and scope. |
| Binary | Utf8 | Encoded as hexadecimal string for lossless representation. |
| Array | Utf8 | Arrays are stringified as JSON to avoid schema conflicts when element types vary across documents. |
| Document | Utf8 (JSON) | Nested documents serialized as JSON strings. Empty documents do not contribute to schema inference. |
| ObjectId | Utf8 | Converted to 24-character hex string. |
| DbPointer | Utf8 | Legacy MongoDB type, serialized as JSON. |
| Null | (no column) | Null values do not contribute to schema inference. |
| MinKey | Utf8 | Serialized as {"$minKey":1}. |
| MaxKey | Utf8 | Serialized as {"$maxKey":1}. |
| Undefined | Utf8 | Deprecated type, serialized as {"$undefined":true}. |
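A few of the conversions in the table can be sketched as plain functions. This is only an illustration of the rules described in the Notes column, not Supermetal's implementation. The names `ARROW_TYPE`, `convertDouble`, `convertBinary`, and `convertTimestamp` are hypothetical, and the lookup covers only a subset of the table.

```javascript
// Subset of the mapping table, keyed by BSON type name. Illustration only.
const ARROW_TYPE = {
  Double: "Float64",
  Int32: "Int32",
  Int64: "Int64",
  Decimal128: "Utf8",
  Boolean: "Boolean",
  DateTime: 'Timestamp(Millisecond, "UTC")',
  String: "Utf8",
  Binary: "Utf8",
  Array: "Utf8",
  Document: "Utf8",
  ObjectId: "Utf8",
};

// Double: non-finite values (NaN, Infinity, -Infinity) become null.
function convertDouble(value) {
  return Number.isFinite(value) ? value : null;
}

// Binary: bytes are encoded as a hexadecimal string.
function convertBinary(bytes) {
  return Buffer.from(bytes).toString("hex");
}

// Timestamp: serialized as JSON with seconds (t) and increment (i).
function convertTimestamp(seconds, increment) {
  return JSON.stringify({ t: seconds, i: increment });
}

console.log(ARROW_TYPE.Decimal128);                                    // Utf8
console.log(convertDouble(NaN));                                       // null
console.log(convertBinary(new Uint8Array([0xde, 0xad, 0xbe, 0xef]))); // deadbeef
console.log(convertTimestamp(1700000000, 1));                          // {"t":1700000000,"i":1}
```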
