PostgreSQL
PostgreSQL (Postgres) is a powerful open-source relational database. This guide walks you through configuring your PostgreSQL database to work seamlessly with Supermetal.
Features
| Feature | Notes |
|---|---|
| Bulk Insert | |
| Transactional Consistency | |
| Schema Evolution | |
| Hard Deletes |
Prerequisites
Before you begin, ensure you have:
- Supported PostgreSQL Version: PostgreSQL 9.4 or higher.
- Database Admin Access: Sufficient privileges to create users and grant permissions.
- Network Connectivity: Ensure that Supermetal's services can reach your PostgreSQL database. You may need to configure security groups.
Setup
Log in to your PostgreSQL database and run the following:
-- Create a user for Supermetal
CREATE USER supermetal_user WITH PASSWORD 'strong-password';
-- Grant permissions
-- allows supermetal to connect to the database, create schemas and (temp) tables
GRANT CONNECT, CREATE, TEMPORARY ON DATABASE target_database TO supermetal_user;Password Security
Replace 'strong-password' with a secure, unique password for the supermetal_user. Store this password securely, as you'll need it when configuring the target in Supermetal.
Script Variables
Replace the placeholders in the script with your specific values:
target_database: The name of your PostgreSQL database
Foreign Key Constraints (Optional)
If your destination schema has foreign key constraints, tables loaded in parallel may temporarily violate them until all referenced rows arrive. To defer constraint checks during writes, grant elevated privileges:
-- AWS RDS
GRANT rds_superuser TO supermetal_user;
-- Standard PostgreSQL (run as superuser)
ALTER USER supermetal_user WITH SUPERUSER;Snapshot Load Performance (Optional)
For large snapshots, optimize checkpoint frequency to maximize COPY throughput and avoid I/O storms. Checkpoints trigger when either limit is reached first.
max_wal_size: Maximum WAL size before checkpoint. COPY generates WAL rapidly, and small values (default: 1GB) cause frequent checkpoints that throttle performance. Increase to 16GB - 64GB to allow bulk loads to complete with fewer I/O interruptions.checkpoint_timeout: Maximum time between checkpoints. Default (5min) triggers checkpoints before large snapshots complete. Increase to 30min - 1h to prevent premature interruptions during extended loads.
Data Types Mapping
| Apache Arrow DataType | PostgreSQL Type | Notes |
|---|---|---|
Int8 | SMALLINT | |
Int16 | SMALLINT | |
Int32 | INTEGER | |
Int64 | BIGINT | |
UInt8 | SMALLINT | |
UInt16 | INTEGER | |
UInt32 | BIGINT | |
UInt64 | NUMERIC | |
Float16 | REAL | |
Float32 | REAL | |
Float64 | DOUBLE PRECISION | |
Decimal128(p,s) | NUMERIC(p,s) | |
Decimal256(p,s) | NUMERIC(p,s) |
| Apache Arrow DataType | PostgreSQL Type | Notes |
|---|---|---|
Date32 | DATE | |
Date64 | DATE | |
Time32 | TIME | |
Time64 | TIME | |
Time64(ns) | BIGINT | Nanosecond precision stored as integer because Postgres does not support nanosecond precision |
Timestamp(s/ms/us, [tz]) | TIMESTAMP[Z] | |
Timestamp(ns, [tz]) | BIGINT | Nanosecond timestamps stored as integer because Postgres does not support nanosecond precision |
Interval | INTERVAL | |
Duration | Not supported | Durations are explicitly not supported |
| Apache Arrow DataType | PostgreSQL Type | Notes |
|---|---|---|
Utf8, LargeUtf8 | TEXT |
| Apache Arrow DataType | PostgreSQL Type | Notes |
|---|---|---|
Boolean | BOOLEAN | Boolean (true/false) |
| Apache Arrow DataType | PostgreSQL Type | Notes |
|---|---|---|
Utf8 JSON Extension (arrow.json) | JSONB | |
Struct | Not supported | |
Map | Not supported |
| Apache Arrow DataType | PostgreSQL Type | Notes |
|---|---|---|
Binary, LargeBinary, FixedSizeBinary | BYTEA |
| Apache Arrow DataType | PostgreSQL Type | Notes |
|---|---|---|
List<T>, LargeList<T>, FixedSizeList<T> | T[] | Only one-dimensional arrays of primitive types are supported |
Last updated on