MongoDB
Supermetal's MongoDB integration leverages MongoDB's native change streams capability to provide efficient, reliable data extraction with minimal impact on your database. This integration enables both initial snapshot capture and continuous replication of changes.
This guide covers the features, prerequisites, and configuration steps required to connect MongoDB with Supermetal.
Replication Modes
Supermetal offers two replication modes to handle MongoDB's flexible document model:
Schema Mode: Automatically infers and evolves a strongly-typed schema from your documents. Each field becomes a typed column in the target, enabling native SQL queries and type-safe analytics. Best for structured data and analytics workloads.
Schemaless Mode: Preserves documents as JSON in a two-column format (_id, document). Supports parallelized snapshots for faster initial loads. Ideal for highly variable document structures or when you need to preserve the original document format for downstream JSON processing.
The example below shows how a sample MongoDB document is replicated in each mode.
{
"_id": ObjectId("507f1f77bcf86cd799439011"),
"name": "Alice",
"email": "[email protected]",
"age": 30,
"is_active": true,
"tags": ["admin", "user"]
}
Schema Mode
| Column | Value | Type |
|---|---|---|
| _id | 507f1f77bcf86cd799439011 | Utf8 |
| name | Alice | Utf8 |
| email | [email protected] | Utf8 |
| age | 30 | Int32 |
| is_active | true | Boolean |
| tags | ["admin","user"] | Utf8 |
Arrays
Arrays are serialized as JSON strings to avoid schema conflicts when element types vary across documents.
Schemaless Mode
| Column | Value |
|---|---|
| _id | 507f1f77bcf86cd799439011 |
| document | {"name":"Alice","email":"[email protected]","age":30,"is_active":true,"tags":["admin","user"]} |
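To make the difference between the two modes concrete, here is a minimal JavaScript sketch of how one document could be turned into each row shape. The helper names `toSchemaModeRow` and `toSchemalessRow` are hypothetical and only illustrate the output described above; they are not Supermetal functions, and the real pipeline also performs type inference that is not shown here.

```javascript
// Hypothetical helpers for illustration only; they are not Supermetal APIs.

// Schema Mode: one column per top-level field. Arrays and nested documents
// are stringified as JSON; scalar values pass through unchanged.
function toSchemaModeRow(doc) {
  const row = {};
  for (const [field, value] of Object.entries(doc)) {
    const isNested = value !== null && typeof value === "object";
    row[field] = isNested ? JSON.stringify(value) : value;
  }
  return row;
}

// Schemaless Mode: _id plus all remaining fields as a single JSON string.
function toSchemalessRow(doc) {
  const { _id, ...rest } = doc;
  return { _id: String(_id), document: JSON.stringify(rest) };
}

// The _id is written as a plain hex string instead of ObjectId(...)
// so the snippet runs without a MongoDB driver.
const sampleDoc = {
  _id: "507f1f77bcf86cd799439011",
  name: "Alice",
  age: 30,
  is_active: true,
  tags: ["admin", "user"],
};

console.log(toSchemaModeRow(sampleDoc));
console.log(toSchemalessRow(sampleDoc));
```

In the Schema Mode row, `tags` becomes a JSON string while `age` stays a number. In the Schemaless row, everything except `_id` ends up inside `document`.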
Features
| Feature | Notes |
|---|---|
| Initial Data Sync | |
| Change Data Capture | |
| Schema Evolution | |
| Catalog Support | |
| Document Flattening | |
Prerequisites
Before you begin, ensure you have:
- MongoDB Requirements:
  - Version: MongoDB 4.0 or higher (required for change streams support)
  - Supported Deployments:
    - MongoDB Community/Enterprise Server (replica set configuration)
    - MongoDB Atlas
Change Streams Requirement
Change streams require a replica set deployment. If you're using a standalone server, you must convert it to a single-node replica set to use change streams.
- Deployment Requirements:
  - Replica Set: Your MongoDB deployment must be configured as a replica set
  - Read Concern Majority: Supermetal uses read concern "majority" to ensure consistent reads
  - Database Permissions: User with appropriate permissions (see Setup)
  - Network Connectivity: Ensure Supermetal can reach your MongoDB deployment (default port: 27017)
  - TLS/SSL Support: Supermetal supports both unencrypted and TLS/SSL encrypted connections
- MongoDB Atlas Requirements:
  - Network Access: Configure network access rules to allow Supermetal to connect
  - Connection String: Use the connection string format that includes all replica set members
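The Change Streams Requirement note above says that a standalone server must be converted to a single-node replica set. The following mongosh sketch shows one common way to do that. It assumes `mongod` has already been restarted with a replica set name; the name `rs0` and the host `localhost:27017` are placeholders, not values taken from this guide.

```javascript
// Assumption: mongod was restarted with a replica set name, for example
//   mongod --replSet rs0 --port 27017 --dbpath <your-data-path>
// "rs0" and "localhost:27017" are placeholders; adjust them to your setup.
const replicaSetConfig = {
  _id: "rs0", // must match the --replSet value
  members: [{ _id: 0, host: "localhost:27017" }],
};

// `rs` and `printjson` only exist inside mongosh; the guard lets the
// snippet load in other JavaScript runtimes without error.
if (typeof rs !== "undefined") {
  rs.initiate(replicaSetConfig);
  // Shortly after initiation the single member should report PRIMARY.
  printjson(rs.status().members.map((member) => member.stateStr));
}
```

Run this once against the server, connected as a user allowed to administer the replica set. If the member state is not yet PRIMARY, calling `rs.status()` again after a few seconds usually shows the election has completed.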
Setup
Permissions Overview
Supermetal needs a MongoDB user that can read data and open change streams. The recommended approach is to create a dedicated read-only user for this purpose.
| MongoDB Deployment | Minimum Required Permissions |
|---|---|
| Self-managed | read role on the database you want to replicate from |
| MongoDB Atlas | readAnyDatabase role |
Create a Dedicated Read-Only MongoDB User
Connect to your MongoDB instance using mongosh (the MongoDB Shell) with admin privileges:
mongosh --host <host> --port <port> -u <admin-username> -p <admin-password> --authenticationDatabase admin
Script Variables
Replace the placeholder values in the command with your actual information:
- <host>: Your MongoDB server hostname or IP address
- <port>: MongoDB port (default is 27017)
- <admin-username>: Username with admin privileges
- <admin-password>: Password for the admin user
Create a dedicated user for Supermetal:
use admin
db.createUser({
  user: "supermetal_user",
  pwd: "strong-password",
  roles: [
    { role: "read", db: "target-database" }
  ]
})
Script Variables
Replace the placeholder values in the script with your actual information:
- strong-password: Replace with a secure, unique password for the supermetal_user.
- target-database: The name of the database you want to replicate from.
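Before moving on, it can be worth confirming that the new user is able to open a change stream. The sketch below is one way to check this from mongosh after reconnecting as `supermetal_user` (for example with `mongosh --host <host> --port <port> -u supermetal_user -p --authenticationDatabase admin`). The helper name `canOpenChangeStream` is hypothetical, and `target-database` is the same placeholder used above.

```javascript
// Hypothetical helper: returns true if the current user can open a
// database-level change stream on `database`, false otherwise.
function canOpenChangeStream(database) {
  try {
    const cursor = database.watch(); // needs the changeStream privilege
    cursor.tryNext();                // forces a round trip so errors surface
    cursor.close();
    return true;
  } catch (err) {
    console.log(`Change stream check failed: ${err.message}`);
    return false;
  }
}

// `db` only exists inside mongosh; the guard keeps the snippet loadable
// in other JavaScript runtimes.
if (typeof db !== "undefined") {
  const target = db.getSiblingDB("target-database");
  console.log(canOpenChangeStream(target));
}
```

A result of `false` typically points to a missing role, or to a deployment that is not running as a replica set.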
Log in to your MongoDB Atlas account
Visit cloud.mongodb.com to access your account.
Configure MongoDB Source in Supermetal
You are now ready to configure MongoDB as a source within Supermetal. When configuring the source, select your preferred replication mode:
- Schema Mode: Infers strongly-typed schemas from documents, creating typed columns in the target. Best for analytics workloads.
- Schemaless Mode: Preserves documents as JSON in a two-column format (_id, document). Best for variable document structures.
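If the source is configured with a connection string, the pieces from the earlier steps fit together roughly as sketched below. This guide does not list the exact fields Supermetal asks for, so treat it as an illustration of a standard MongoDB connection URI only. The function name `buildMongoUri`, the host names, and the replica set name are hypothetical. `authSource=admin` is included because the user above was created in the `admin` database.

```javascript
// Hypothetical helper that assembles a standard MongoDB connection URI.
// replicaSet, tls, and authSource are standard MongoDB URI options.
function buildMongoUri({ user, password, hosts, database, replicaSet, tls }) {
  // Credentials must be percent-encoded if they contain reserved characters.
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  const options = [
    `replicaSet=${encodeURIComponent(replicaSet)}`,
    `tls=${tls ? "true" : "false"}`,
    "authSource=admin", // the user was created in the admin database
  ].join("&");
  // List every replica set member, separated by commas.
  return `mongodb://${credentials}@${hosts.join(",")}/${database}?${options}`;
}

const uri = buildMongoUri({
  user: "supermetal_user",
  password: "p@ss/word", // placeholder password with reserved characters
  hosts: ["db1.example.com:27017", "db2.example.com:27017"], // placeholder hosts
  database: "target-database",
  replicaSet: "rs0", // placeholder replica set name
  tls: true,
});
console.log(uri);
```

For MongoDB Atlas, the connection string shown in the Atlas UI is normally the better starting point, since it already encodes the cluster's hosts and TLS settings.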
Data Types Mapping
Schema Mode Only
The following type mappings apply to Schema Mode replication. In Schemaless Mode, all documents are stored as JSON in a two-column format (_id: Utf8, document: Json).
| MongoDB BSON Type(s) | Apache Arrow DataType | Notes |
|---|---|---|
| Double | Float64 | Non-finite values (NaN, Infinity) are converted to null. |
| Int32 | Int32 | |
| Int64 | Int64 | |
| Decimal128 | Utf8 | Preserved as string to maintain exact precision and handle MongoDB's variable decimal precision/scale. |
| Boolean | Boolean | |
| DateTime | Timestamp(Millisecond, "UTC") | Serialized as RFC3339 format with UTC timezone. |
| Timestamp | Utf8 | MongoDB internal oplog timestamp (seconds + ordinal). Serialized as JSON: {"t": seconds, "i": increment}. |
| String | Utf8 | |
| Symbol | Utf8 | Deprecated MongoDB type. |
| RegularExpression | Utf8 | Pattern string only. |
| JavaScriptCode | Utf8 | Code string only. |
| JavaScriptCodeWithScope | Utf8 | Serialized as JSON with code and scope. |
| Binary | Utf8 | Encoded as hexadecimal string for lossless representation. |
| Array | Utf8 | Arrays are stringified as JSON to avoid schema conflicts when element types vary across documents. |
| Document | Utf8 (JSON) | Nested documents serialized as JSON strings. Empty documents do not contribute to schema inference. |
| ObjectId | Utf8 | Converted to 24-character hex string. |
| DbPointer | Utf8 | Legacy MongoDB type, serialized as JSON. |
| Null | (no column) | Null values do not contribute to schema inference. |
| MinKey | Utf8 | Serialized as {"$minKey":1}. |
| MaxKey | Utf8 | Serialized as {"$maxKey":1}. |
| Undefined | Utf8 | Deprecated type, serialized as {"$undefined":true}. |
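The mapping above can also be read as a lookup table plus a few value conversions. The JavaScript sketch below restates it in that form. The names `BSON_TO_ARROW`, `arrowTypeFor`, `doubleToFloat64`, and `binaryToHex` are hypothetical, and the lowercase hex output is an assumption, since the table does not specify letter case.

```javascript
// BSON types that the table above maps to Utf8.
const UTF8_TYPES = [
  "Decimal128", "Timestamp", "String", "Symbol", "RegularExpression",
  "JavaScriptCode", "JavaScriptCodeWithScope", "Binary", "Array", "Document",
  "ObjectId", "DbPointer", "MinKey", "MaxKey", "Undefined",
];

// Hypothetical lookup table restating the mapping; null means "no column".
const BSON_TO_ARROW = new Map([
  ["Double", "Float64"],
  ["Int32", "Int32"],
  ["Int64", "Int64"],
  ["Boolean", "Boolean"],
  ["DateTime", 'Timestamp(Millisecond, "UTC")'],
  ["Null", null],
  ...UTF8_TYPES.map((type) => [type, "Utf8"]),
]);

// Returns the Arrow type name, null for Null, or undefined for unknown names.
function arrowTypeFor(bsonType) {
  return BSON_TO_ARROW.get(bsonType);
}

// Double: non-finite values (NaN, Infinity, -Infinity) become null.
function doubleToFloat64(value) {
  return Number.isFinite(value) ? value : null;
}

// Binary: bytes encoded as a hexadecimal string (lowercase is an assumption).
function binaryToHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

console.log(arrowTypeFor("Decimal128"));
console.log(doubleToFloat64(Number.NaN));
console.log(binaryToHex(Uint8Array.of(0, 255, 16)));
```

Keeping the Utf8 types in one list makes it easy to see that only five BSON types keep a native Arrow type; everything else is carried as a string.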