Databricks
Supermetal replicates to Databricks through a serverless SQL warehouse, loading Parquet from a Unity Catalog volume stage or directly from the object store buffer with a storage credential.
Prerequisites
- Databricks on AWS, Azure, or GCP, or a serverless workspace.
- Unity Catalog enabled on the workspace.
- A serverless SQL warehouse.
Setup
Configure Authentication
Authenticate with either a Service Principal or a Personal Access Token (PAT).
Create a Service Principal
Follow the Databricks documentation to create a Service Principal.
Connection details
Note down the Client ID and Client Secret.
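For orientation, here is a minimal sketch of how a Client ID and Client Secret are typically exchanged for an access token using the Databricks OAuth machine-to-machine flow. It only builds the request and does not send it. The hostname and credentials are placeholders, and `build_token_request` is a hypothetical helper for illustration, not part of Supermetal.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(host: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth client-credentials token request."""
    # Workspace-level token endpoint used for service principal authentication.
    url = f"https://{host}/oidc/v1/token"
    # The client ID and secret travel as HTTP Basic credentials.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Placeholder values; substitute the ones noted down above.
req = build_token_request(
    "your-workspace.cloud.databricks.com", "my-client-id", "my-client-secret"
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the exchange. Once the Client ID and Client Secret are configured, no manual token handling should be needed.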
Create a Personal Access Token
Follow the Databricks documentation to create a PAT.
Connection details
Note down the personal access token.
Create a SQL Warehouse
Log in to the Databricks console to create or reuse a SQL warehouse.
- Go to SQL > SQL warehouses > Create SQL warehouse
- Fill in the fields
  - Name
  - Warehouse Size (2X Small)
  - Warehouse Type (Serverless)
  - Auto Stop (10 minutes)
  - Scaling Min & Max (1)
  - Unity Catalog (Enabled)
- Click Create
- Once created, click on the Connection Details tab
Connection details
Note down the following:
- Server Hostname (your-workspace.cloud.databricks.com)
- Warehouse ID (0123456789abcdef)
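Databricks SQL clients generally address a warehouse by its hostname plus an HTTP path of the form `/sql/1.0/warehouses/<warehouse-id>`. The sketch below derives both from the values noted above. The helper names are hypothetical, and the 16-hex-character ID check is an assumption based on the example ID shown here.

```python
import re


def normalize_hostname(value: str) -> str:
    """Strip a scheme and trailing slash so only the bare hostname remains."""
    host = value.strip()
    for prefix in ("https://", "http://"):
        if host.startswith(prefix):
            host = host[len(prefix):]
    return host.rstrip("/")


def warehouse_http_path(warehouse_id: str) -> str:
    """Return the HTTP path for a SQL warehouse ID."""
    # Assumption: IDs look like the example above (16 lowercase hex characters).
    if not re.fullmatch(r"[0-9a-f]{16}", warehouse_id):
        raise ValueError(f"unexpected warehouse ID format: {warehouse_id!r}")
    return f"/sql/1.0/warehouses/{warehouse_id}"


print(normalize_hostname("https://your-workspace.cloud.databricks.com/"))
print(warehouse_http_path("0123456789abcdef"))
```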
Configure a Catalog
Choose a catalog, or create one following the Databricks documentation.
- From the Databricks workspace console, navigate to Data
- Choose a catalog (my_catalog)
Create a Volume
Supermetal uses the configured volume as a temporary stage.
Follow the steps from the Databricks documentation.
- From the Databricks workspace console, navigate to Catalog
- Choose the catalog from the previous step (my_catalog)
- Search or browse for the schema to add the volume to and select it
- Click Create Volume and specify a name
- Click Create
CREATE VOLUME my_catalog.my_schema.my_volume;
Connection details
Note down the following:
- Catalog Name (my_catalog)
- Volume Path (/Volumes/my_catalog/my_schema/my_volume)
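The Volume Path follows directly from the three-part volume name used in `CREATE VOLUME`. A small sketch of that derivation, using a hypothetical helper `volume_path` and assuming plain identifiers (no backtick-quoted names containing dots):

```python
def volume_path(volume_name: str) -> str:
    """Convert 'catalog.schema.volume' into its /Volumes/... path."""
    parts = volume_name.split(".")
    # A Unity Catalog volume is always addressed by exactly three name parts.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.volume, got {volume_name!r}")
    catalog, schema, volume = parts
    return f"/Volumes/{catalog}/{schema}/{volume}"


print(volume_path("my_catalog.my_schema.my_volume"))
```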
Storage Credential
By default, data stages through the configured volume. A Databricks storage credential with access to the object store buffer lets Databricks read the files directly from the buffer instead.
Data Types Mapping
| Apache Arrow DataType | Databricks Type | Notes |
|---|---|---|
| Int8 | TINYINT | |
| Int16 | SMALLINT | |
| Int32 | INT | |
| Int64 | BIGINT | |
| UInt8 | SMALLINT | Promoted to a wider signed type |
| UInt16 | INT | Promoted to a wider signed type |
| UInt32 | BIGINT | Promoted to a wider signed type |
| UInt64 | DECIMAL(20, 0) | Mapped to decimal to preserve the full unsigned range |
| Float16 | FLOAT | Upcast to Float32 in Parquet |
| Float32 | FLOAT | |
| Float64 | DOUBLE | |
| Decimal128(p, s) where p ≤ 38 | DECIMAL(p, s) | |
| Decimal128(p, s) where p > 38 | STRING | Precision exceeds the Databricks maximum of 38 |
| Decimal256(p, s) where p ≤ 38 | DECIMAL(p, s) | Downcast to Decimal128 in Parquet |
| Decimal256(p, s) where p > 38 | STRING | Precision exceeds the Databricks maximum of 38 |
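The numeric rules above can be expressed as a small lookup plus a precision check. This is an illustrative sketch only: the function name and the string-based type representation are hypothetical and do not reflect Supermetal's internal API.

```python
from typing import Optional

# Fixed-width numeric mappings taken from the table above.
_FIXED = {
    "Int8": "TINYINT", "Int16": "SMALLINT", "Int32": "INT", "Int64": "BIGINT",
    # Unsigned types are promoted to the next wider signed type.
    "UInt8": "SMALLINT", "UInt16": "INT", "UInt32": "BIGINT",
    # 2**64 - 1 has 20 digits, hence DECIMAL(20, 0).
    "UInt64": "DECIMAL(20, 0)",
    "Float16": "FLOAT", "Float32": "FLOAT", "Float64": "DOUBLE",
}
MAX_DECIMAL_PRECISION = 38  # Databricks DECIMAL precision limit


def databricks_numeric_type(
    arrow_type: str, precision: Optional[int] = None, scale: Optional[int] = None
) -> str:
    """Map an Arrow numeric type name to its Databricks type per the table."""
    if arrow_type in ("Decimal128", "Decimal256"):
        if precision is None or scale is None:
            raise ValueError("decimal types need precision and scale")
        if precision <= MAX_DECIMAL_PRECISION:
            return f"DECIMAL({precision}, {scale})"
        # Too wide for DECIMAL, so the value is carried as text.
        return "STRING"
    if arrow_type not in _FIXED:
        raise ValueError(f"not a numeric Arrow type: {arrow_type!r}")
    return _FIXED[arrow_type]


print(databricks_numeric_type("UInt64"))
print(databricks_numeric_type("Decimal256", 50, 4))
```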

| Apache Arrow DataType | Databricks Type |
|---|---|
| Boolean | BOOLEAN |

| Apache Arrow DataType | Databricks Type | Notes |
|---|---|---|
| Date32 | DATE | |
| Date64 | DATE | Converted to Date32 in Parquet |
| Timestamp(s, tz) | TIMESTAMP_NTZ | Converted to Timestamp(ms) in Parquet for proper annotation |
| Timestamp(ms, tz) | TIMESTAMP_NTZ | |
| Timestamp(µs, tz) | TIMESTAMP_NTZ | Databricks supports microsecond precision |
| Timestamp(ns, tz) | TIMESTAMP_NTZ | Converted to Timestamp(µs) in Parquet, the Databricks maximum precision |
| Time32, Time64 | STRING | Databricks has no TIME types |
| Interval | STRING | Databricks cannot read INTERVAL from Parquet |
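The timestamp unit conversions in the table above amount to integer arithmetic on the raw epoch value. The sketch below shows one way to express them. The helper name is hypothetical, and truncating (flooring) sub-microsecond digits is an assumption, since the table does not say whether nanosecond values are truncated or rounded.

```python
from typing import Tuple


def to_parquet_timestamp(value: int, unit: str) -> Tuple[int, str]:
    """Return (value, unit) as it would be written to Parquet per the table."""
    if unit == "s":
        # Seconds are widened to milliseconds; no information is lost.
        return value * 1000, "ms"
    if unit == "ns":
        # Nanoseconds are narrowed to microseconds (assumed floor division).
        return value // 1000, "us"
    if unit in ("ms", "us"):
        # Already a unit that is written as-is.
        return value, unit
    raise ValueError(f"unknown timestamp unit: {unit!r}")


print(to_parquet_timestamp(1_700_000_000, "s"))
print(to_parquet_timestamp(1_700_000_000_123_456_789, "ns"))
```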

| Apache Arrow DataType | Databricks Type |
|---|---|
| Utf8, LargeUtf8 | STRING |

| Apache Arrow DataType | Databricks Type |
|---|---|
| Binary, LargeBinary | BINARY |

| Apache Arrow DataType | Databricks Type | Notes |
|---|---|---|
| Utf8 JSON Extension (arrow.json) | STRING | VARIANT will be supported in the future |

| Apache Arrow DataType | Databricks Type | Notes |
|---|---|---|
| List<T>, LargeList<T>, FixedSizeList<T> | ARRAY<T> | Element type T is recursively mapped |
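To illustrate what "recursively mapped" means for list types, the sketch below maps nested lists by recursing into the element type. The tuple-based type representation and the `map_type` helper are hypothetical, and only a few scalar types from the tables above are included.

```python
from typing import Tuple, Union

# Hypothetical representation: a scalar is a string, a list is (kind, element).
ArrowType = Union[str, Tuple[str, "ArrowType"]]

_SCALARS = {
    "Boolean": "BOOLEAN",
    "Int32": "INT",
    "Int64": "BIGINT",
    "UInt32": "BIGINT",
    "Float64": "DOUBLE",
    "Utf8": "STRING",
}
_LIST_KINDS = ("List", "LargeList", "FixedSizeList")


def map_type(arrow_type: ArrowType) -> str:
    """Map a (possibly nested) Arrow type to a Databricks type string."""
    if isinstance(arrow_type, tuple):
        kind, element = arrow_type
        if kind not in _LIST_KINDS:
            raise ValueError(f"unsupported nested type: {kind!r}")
        # All three list flavours become ARRAY; the element is mapped recursively.
        return f"ARRAY<{map_type(element)}>"
    if arrow_type not in _SCALARS:
        raise ValueError(f"unsupported scalar type: {arrow_type!r}")
    return _SCALARS[arrow_type]


print(map_type(("List", "Utf8")))
print(map_type(("List", ("LargeList", "UInt32"))))
```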
Changelog
0.1.7
2026-06-16
Variant type support.