PostgreSQL

PostgreSQL (Postgres) is a powerful open-source relational database that supports logical replication, enabling real-time Change Data Capture (CDC).

This guide walks you through configuring your PostgreSQL database to work seamlessly with Supermetal.


Features

FeatureNotes
Initial Data Sync

Change Data Capture

Transactional Consistency

Catalog Support


Prerequisites

Before you begin, ensure you have:

  • Supported PostgreSQL Version: PostgreSQL version 9.4 or higher.
  • Database Admin Access: Sufficient privileges to create users, grant permissions, and modify database configuration parameters. This might require superuser privileges or specific cloud provider roles (e.g., rds_superuser on AWS RDS, azure_pg_admin on Azure).
  • Network Connectivity: Ensure that Supermetal's services can reach your PostgreSQL database. You may need to configure firewall rules, security groups, or VPC peering.

Setup

Supermetal supports two replication types for PostgreSQL sources:

  • Snapshot: One-time or periodic full table snapshots.
  • Logical Replication: Real-time change data capture (CDC) with initial snapshot.


Data Types Mapping

PostgreSQL Type(s)Apache Arrow DataTypeNotes
BOOLBoolean
PostgreSQL Type(s)Apache Arrow DataTypeNotes
INT2Int16
INT4Int32
INT8, MONEYInt64MONEY is mapped as a fixed-point integer.
FLOAT4Float32Single precision floating-point.
FLOAT8Float64Double precision floating-point.
NUMERIC(p, s)Decimal128(p, s) / Decimal256(p, s)Uses PostgreSQL precision (p) and scale (s) directly to ensure exact numeric representation.
NUMERIC (unconstrained)Utf8Unconstrained NUMERIC defaults to Utf8 as precision/scale are undefined.
OIDUInt32Object Identifier.
PG_LSNUInt64PostgreSQL Log Sequence Number.
PostgreSQL Type(s)Apache Arrow DataTypeNotes
DATEDate32Days since UNIX epoch. Precision preserved at the day level.
TIMETime64(µs)Microseconds since midnight. Precision preserved at the microsecond level.
TIMESTAMPTimestamp(µs)Microseconds since UNIX epoch, timezone naive. Precision preserved at microsecond level.
TIMESTAMPTZTimestamp(µs, "UTC")Microseconds since UNIX epoch, assumed UTC. Precision preserved at microsecond level.
TIMETZ, INTERVALUtf8Temporal types with string representation in Arrow.
PostgreSQL Type(s)Apache Arrow DataTypeNotes
CHAR, VARCHAR, TEXT, BPCHAR, NAME, UNKNOWNUtf8General-purpose text types.
citext, ltree, lquery, ltxtqueryUtf8Special text types (checked by name in code).
BIT, VARBITUtf8Bit string types represented as text.
UUIDUtf8Universally Unique Identifier.
ENUMUtf8Enumerated types represented as their text labels.
INET, CIDR, MACADDR, MACADDR8Utf8Network address types.
TSVECTORUtf8Text search vector type represented as text. Positions are not preserved
PostgreSQL Type(s)Apache Arrow DataTypeNotes
BYTEABinaryVariable-length byte array.
PostgreSQL Type(s)Apache Arrow DataTypeNotes
JSON, JSONBUtf8Mapped to Utf8. Using the Canonical JSON Extension (arrow.json) is recommended for interoperability, allowing downstream systems to recognize the content as JSON. This metadata signals to downstream supermetal targets (like Snowflake, Clickhouse, etc.) that the string contains JSON, allowing them to map it to their appropriate semi-structured types (e.g., VARIANT, JSON).
XMLUtf8Mapped to Utf8. A canonical extension type for XML might also exist or be defined depending on specific library support, similar to JSON.
PostgreSQL Type(s)Apache Arrow DataTypeNotes
BOOL[]List<Boolean>Base types map as per their categories.
BYTEA[]List<Binary>
INT2[]List<Int16>
INT4[]List<Int32>
INT8[], MONEY[]List<Int64>
OID[]List<UInt32>
PG_LSN[]List<UInt64>
FLOAT4[]List<Float32>
FLOAT8[]List<Float64>
NUMERIC(p, s)[]List<Decimal(p,s)>Precision and scale preserved within list elements.
DATE[]List<Date32>
TIME[]List<Time64(µs)>
TIMESTAMP[]List<Timestamp(µs)>
TIMESTAMPTZ[]List<Timestamp(µs, "UTC")>
CHAR[], VARCHAR[], TEXT[], etc., _citext, _ltree, etc., BIT[], VARBIT[], TIMETZ[], INTERVAL[], XML[], UUID[], INET[], CIDR[], MACADDR8[], MACADDR[], JSON[], JSONB[], NUMERIC[] (unconstrained)List<Utf8>Includes arrays of types mapped to Utf8.
Multi-dimensional Arrays (>1 dim)UnsupportedMulti-dimensional arrays (dimension > 1) are not supported.
PostgreSQL Type(s)Apache Arrow DataTypeNotes
INT4RANGE, INT8RANGE, NUMRANGE, TSRANGE, TSTZRANGE, DATERANGEUtf8Range types are serialized to JSON format using the Canonical JSON Extension (arrow.json).
Multirange typesUnsupportedMultirange types are not supported.
PostgreSQL Type(s)Apache Arrow DataTypeNotes
DOMAIN(base_type)Mapped as base_typeDomains are resolved to their underlying base type.
Composite Types (ROW)Utf8Composite types (structs/rows) are not explicitly mapped and default to a string (e.g., JSON) representation (Utf8).
PostgreSQL Type(s)Apache Arrow DataTypeNotes
POINT, LSEG, PATH, BOX, POLYGON, LINE, CIRCLEUnsupportedGeometric types are unsupported.

PostgreSQL to PostgreSQL Replication

When replicating from PostgreSQL to PostgreSQL, Supermetal replicates the source PostgreSQL data types 1:1, preserving the exact type from source to target when possible.


Last updated on