Databricks
Databricks is a unified data platform that combines key features of data lakes and data warehouses.
This guide walks you through configuring your Databricks platform to work seamlessly with Supermetal.
Features
| Feature | Notes |
|---|---|
| Schema Evolution | |
| Soft Delete(s) | |
Prerequisites
Before you begin, ensure you have:
- Supported Databricks Implementations:
- Unity Catalog: Unity Catalog must be enabled on your Databricks workspace.
- SQL Warehouse: a Serverless SQL Warehouse.
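If you are unsure whether Unity Catalog is enabled, one lightweight check (a sketch, run from a SQL editor or notebook in the workspace) is to query the current metastore; a non-empty result indicates the workspace is attached to a Unity Catalog metastore:

```sql
-- Returns the Unity Catalog metastore backing this workspace;
-- an error or empty result typically means Unity Catalog is not enabled.
SELECT current_metastore();
```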
Setup
Configure Authentication
Create a Service Principal
Follow the Databricks documentation to create a Service Principal.
Connection Details
Note down the Client ID and Client Secret.
Create a Personal Access Token
Follow the Databricks documentation to create a personal access token (PAT).
Connection Details
Note down the personal access token.
Create a SQL Warehouse
Log in to the Databricks console to create a new SQL Warehouse or use an existing one.
- Go to SQL > SQL warehouses > Create SQL warehouse
- Fill in and select the required fields, such as:
- Name
- Warehouse Size (2X Small)
- Warehouse Type (Serverless)
- Auto Stop (10 minutes)
- Scaling Min & Max (1)
- Unity Catalog (Enabled)
- etc.
- Click Create
- Once created, click on the Connection Details tab.
Connection Details
Note down the following details:
- Server Hostname (your-workspace.cloud.databricks.com)
- Warehouse ID (0123456789abcdef)
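Once the warehouse is running, a quick smoke test from the SQL editor confirms it can execute queries. The statement below is only an example and uses built-in functions:

```sql
-- Simple smoke test against the new SQL Warehouse.
SELECT current_user()    AS connected_as,
       current_catalog() AS default_catalog;
```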
Configure a Catalog
Log in to the Databricks console to choose a Catalog (or create a new one by following the Databricks documentation; a SQL sketch is shown after this list).
- From the Databricks workspace console, navigate to Data
- Choose a Catalog (my_catalog)
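If you prefer to create a dedicated catalog and schema for Supermetal rather than reuse an existing one, the following is a minimal SQL sketch (the names my_catalog and my_schema are placeholders):

```sql
-- Create a catalog and a schema to hold tables written by Supermetal.
CREATE CATALOG IF NOT EXISTS my_catalog;
CREATE SCHEMA IF NOT EXISTS my_catalog.my_schema;
```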
Create a Volume
Supermetal uses the configured volume as a temporary stage when loading data into Databricks.
Follow the steps in the Databricks documentation.
- From the Databricks workspace console, navigate to Catalog
- Choose the Catalog from the step above (my_catalog)
- Search or browse for the schema that you want to add the volume to and select it.
- Click on Create Volume and specify a Name.
- Click Create
Alternatively, you can create the volume with SQL:

```sql
CREATE VOLUME my_catalog.my_schema.my_volume;
```

Connection Details
Note down the following details:
- Catalog Name (my_catalog)
- Volume Path (/Volumes/my_catalog/my_schema/my_volume)
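Supermetal connects with the Service Principal created earlier, so that principal needs access to the catalog, schema, and staging volume. The exact privilege set is not prescribed here; the following is a hedged sketch of a typical grant set, with a placeholder UUID standing in for the Service Principal's client (application) ID:

```sql
-- Grant the Service Principal (identified by its application/client ID)
-- access to the catalog, schema, and staging volume.
GRANT USE CATALOG ON CATALOG my_catalog
  TO `00000000-0000-0000-0000-000000000000`;
GRANT USE SCHEMA, CREATE TABLE, MODIFY, SELECT ON SCHEMA my_catalog.my_schema
  TO `00000000-0000-0000-0000-000000000000`;
GRANT READ VOLUME, WRITE VOLUME ON VOLUME my_catalog.my_schema.my_volume
  TO `00000000-0000-0000-0000-000000000000`;
```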
Data Types Mapping
| Apache Arrow DataType | Databricks Type | Notes |
|---|---|---|
| Int8 | TINYINT | |
| Int16 | SMALLINT | |
| Int32 | INT | |
| Int64 | BIGINT | |
| UInt8 | SMALLINT | Promoted to signed 16-bit |
| UInt16 | INT | Promoted to signed 32-bit |
| UInt32 | BIGINT | Promoted to signed 64-bit |
| UInt64 | DECIMAL(20, 0) | Mapped to decimal to preserve full unsigned 64-bit range |
| Float16 | FLOAT | Upcast to Float32 in Parquet |
| Float32 | FLOAT | |
| Float64 | DOUBLE | |
| Decimal128(p, s) where p ≤ 38 | DECIMAL(p, s) | |
| Decimal128(p, s) where p > 38 | STRING | Precision exceeds Databricks maximum of 38 |
| Decimal256(p, s) where p ≤ 38 | DECIMAL(p, s) | Downcast to Decimal128 in Parquet |
| Decimal256(p, s) where p > 38 | STRING | Precision exceeds Databricks maximum of 38 |
| Apache Arrow DataType | Databricks Type |
|---|---|
| Boolean | BOOLEAN |
| Apache Arrow DataType | Databricks Type | Notes |
|---|---|---|
| Date32 | DATE | |
| Date64 | DATE | Converted to Date32 in Parquet |
| Timestamp(s, tz) | TIMESTAMP_NTZ | Converted to Timestamp(ms) in Parquet for proper annotation |
| Timestamp(ms, tz) | TIMESTAMP_NTZ | |
| Timestamp(μs, tz) | TIMESTAMP_NTZ | Databricks supports microsecond precision |
| Timestamp(ns, tz) | TIMESTAMP_NTZ | Converted to Timestamp(μs) in Parquet (Databricks max precision) |
| Time32, Time64 | STRING | Databricks does not support TIME types |
| Interval | STRING | Databricks cannot read INTERVAL from Parquet |
| Apache Arrow DataType | Databricks Type |
|---|---|
| Utf8, LargeUtf8 | STRING |
| Apache Arrow DataType | Databricks Type |
|---|---|
| Binary, LargeBinary | BINARY |
| Apache Arrow DataType | Databricks Type | Notes |
|---|---|---|
| Utf8 JSON Extension (arrow.json) | STRING | VARIANT will be supported in the future |
| Apache Arrow DataType | Databricks Type | Notes |
|---|---|---|
| List<T>, LargeList<T>, FixedSizeList<T> | ARRAY<T> | Element type T is recursively mapped |
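To illustrate the mapping above, a hypothetical source table with an Int64 key, a UInt64 counter, a nanosecond-precision timestamp, and a JSON payload would land in Databricks roughly as follows (a sketch only; the table and column names are examples, not Supermetal output):

```sql
CREATE TABLE my_catalog.my_schema.events (
  id          BIGINT,          -- Arrow Int64
  counter     DECIMAL(20, 0),  -- Arrow UInt64, widened to keep the unsigned range
  recorded_at TIMESTAMP_NTZ,   -- Arrow Timestamp(ns), truncated to microseconds
  payload     STRING           -- Arrow Utf8 JSON extension (arrow.json)
);
```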