Databricks

Supermetal replicates to Databricks through a serverless SQL warehouse, loading Parquet from a Unity Catalog volume stage or directly from the object store buffer with a storage credential.

Prerequisites

Setup

Configure Authentication

Create a Service Principal

Follow the Databricks documentation to create a Service Principal.

Connection details

Note down the Client ID and Client Secret.

Create a Personal Access Token

Follow the Databricks documentation to create a PAT.

Connection details

Note down the personal access token.

Create a SQL Warehouse

Log in to the Databricks console to create or reuse a SQL warehouse.

  • Go to SQL > SQL warehouses > Create SQL warehouse
  • Fill in the fields
    • Name
    • Warehouse Size (2X Small)
    • Warehouse Type (Serverless)
    • Auto Stop (10 minutes)
    • Scaling Min & Max (1)
    • Unity Catalog (Enabled)
  • Click Create
  • Once created, click on the Connection Details tab

Connection details

Note down the following:

  • Server Hostname (your-workspace.cloud.databricks.com)
  • Warehouse ID (0123456789abcdef)
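
Databricks SQL clients generally address a warehouse by its server hostname plus an HTTP path of the form `/sql/1.0/warehouses/<warehouse-id>`. The sketch below shows how the two values noted above could be combined. The helper name `warehouse_connection_params` is hypothetical and not part of Supermetal or any Databricks library.

```python
def warehouse_connection_params(server_hostname: str, warehouse_id: str) -> dict:
    """Build hostname and HTTP path for a Databricks SQL warehouse (illustrative helper)."""
    host = server_hostname.strip()
    # Tolerate a pasted URL by dropping the scheme and any trailing slash.
    for prefix in ("https://", "http://"):
        if host.startswith(prefix):
            host = host[len(prefix):]
    host = host.rstrip("/")

    wid = warehouse_id.strip()
    if not host or not wid:
        raise ValueError("server hostname and warehouse ID are both required")

    return {
        "server_hostname": host,
        "http_path": f"/sql/1.0/warehouses/{wid}",
    }


params = warehouse_connection_params(
    "your-workspace.cloud.databricks.com", "0123456789abcdef"
)
print(params["server_hostname"])
print(params["http_path"])
```

If you want to check the warehouse independently of Supermetal, the same two values are what the `databricks-sql-connector` package expects as `server_hostname` and `http_path` in `sql.connect(...)`, alongside an access token.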

Configure a Catalog

Choose a catalog, or create one following the Databricks documentation.

  • From the Databricks workspace console, navigate to Catalog
  • Choose a catalog (my_catalog)

Create a Volume

Supermetal uses the configured volume as a temporary stage.

Follow the steps from the Databricks documentation.

  • From the Databricks workspace console, navigate to Catalog
  • Choose the catalog from the previous step (my_catalog)
  • Search or browse for the schema to add the volume to and select it
  • Click Create Volume and specify a name
  • Click Create

Alternatively, create the volume with SQL:

CREATE VOLUME my_catalog.my_schema.my_volume;

Connection details

Note down the following:

  • Catalog Name (my_catalog)
  • Volume Path (/Volumes/my_catalog/my_schema/my_volume)
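
The volume path follows the Unity Catalog convention `/Volumes/<catalog>/<schema>/<volume>`, so it can be derived from the three-part volume name. The following is a minimal sketch of that derivation. The function names are hypothetical, and the validation is deliberately simple (it assumes plain identifiers without dots, slashes, or backticks).

```python
def volume_path(catalog: str, schema: str, volume: str) -> str:
    """Return the /Volumes/... path for a Unity Catalog volume (illustrative helper)."""
    parts = (catalog, schema, volume)
    for part in parts:
        # Assumption: names are simple identifiers, so separators are rejected.
        if not part or "/" in part or "." in part:
            raise ValueError(f"invalid name component: {part!r}")
    return "/Volumes/" + "/".join(parts)


def volume_path_from_name(full_name: str) -> str:
    """Convert 'catalog.schema.volume' into its /Volumes/... path."""
    parts = full_name.split(".")
    if len(parts) != 3:
        raise ValueError("expected a three-part name: catalog.schema.volume")
    return volume_path(*parts)


path = volume_path_from_name("my_catalog.my_schema.my_volume")
print(path)
```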

Storage Credential

By default, data stages through the configured volume. A Databricks storage credential with access to the object store buffer lets Databricks read the files directly from the buffer instead.

Data Types Mapping

| Apache Arrow DataType | Databricks Type | Notes |
| --- | --- | --- |
| Int8 | TINYINT | |
| Int16 | SMALLINT | |
| Int32 | INT | |
| Int64 | BIGINT | |
| UInt8 | SMALLINT | Promoted to a wider signed type |
| UInt16 | INT | Promoted to a wider signed type |
| UInt32 | BIGINT | Promoted to a wider signed type |
| UInt64 | DECIMAL(20, 0) | Mapped to decimal to preserve the full unsigned range |
| Float16 | FLOAT | Upcast to Float32 in Parquet |
| Float32 | FLOAT | |
| Float64 | DOUBLE | |
| Decimal128(p, s) where p ≤ 38 | DECIMAL(p, s) | |
| Decimal128(p, s) where p > 38 | STRING | Precision exceeds the Databricks maximum of 38 |
| Decimal256(p, s) where p ≤ 38 | DECIMAL(p, s) | Downcast to Decimal128 in Parquet |
| Decimal256(p, s) where p > 38 | STRING | Precision exceeds the Databricks maximum of 38 |
| Boolean | BOOLEAN | |
| Date32 | DATE | |
| Date64 | DATE | Converted to Date32 in Parquet |
| Timestamp(s, tz) | TIMESTAMP_NTZ | Converted to Timestamp(ms) in Parquet for proper annotation |
| Timestamp(ms, tz) | TIMESTAMP_NTZ | |
| Timestamp(µs, tz) | TIMESTAMP_NTZ | Databricks supports microsecond precision |
| Timestamp(ns, tz) | TIMESTAMP_NTZ | Converted to Timestamp(µs) in Parquet, the Databricks maximum precision |
| Time32, Time64 | STRING | Databricks has no TIME types |
| Interval | STRING | Databricks cannot read INTERVAL from Parquet |
| Utf8, LargeUtf8 | STRING | |
| Binary, LargeBinary | BINARY | |
| Utf8 JSON Extension (arrow.json) | STRING | VARIANT will be supported in the future |
| List<T>, LargeList<T>, FixedSizeList<T> | ARRAY<T> | Element type T is recursively mapped |
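
A few of the rules in the mapping above are conditional rather than one-to-one: decimals depend on precision, unsigned integers are widened, and timestamps are rescaled before being written to Parquet. The sketch below restates those rules as code to make the thresholds explicit. It is an illustration of the table only, not Supermetal's implementation, and all names in it are hypothetical.

```python
# Maximum DECIMAL precision in Databricks, as stated in the table.
MAX_DECIMAL_PRECISION = 38

# Unsigned Arrow integers and the wider Databricks types they map to.
UNSIGNED_PROMOTION = {
    "UInt8": "SMALLINT",
    "UInt16": "INT",
    "UInt32": "BIGINT",
    "UInt64": "DECIMAL(20, 0)",  # 2**64 - 1 has 20 decimal digits
}

# Arrow timestamp unit -> unit written to Parquet, per the table notes.
TIMESTAMP_PARQUET_UNIT = {"s": "ms", "ms": "ms", "us": "us", "ns": "us"}


def decimal_target(precision: int, scale: int) -> str:
    """Map Decimal128/Decimal256(p, s) to a Databricks type."""
    if precision <= MAX_DECIMAL_PRECISION:
        return f"DECIMAL({precision}, {scale})"
    # Precision above 38 cannot be represented, so the value is kept as text.
    return "STRING"


print(decimal_target(38, 10))
print(decimal_target(50, 10))
print(UNSIGNED_PROMOTION["UInt64"], len(str(2**64 - 1)))
print(TIMESTAMP_PARQUET_UNIT["ns"])
```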

Changelog

0.1.7

2026-06-16

Variant type support.
