BigQuery

Supermetal replicates to BigQuery through load jobs that bulk load from a Cloud Storage buffer.

Prerequisites

A Google Cloud project with billing enabled and the BigQuery API activated.
IAM access to create a service account, grant it project roles, and create datasets. This usually means roles/owner, or roles/iam.serviceAccountAdmin together with roles/resourcemanager.projectIamAdmin.
A Cloud Storage bucket in the same region as the target dataset, since BigQuery load jobs read only from GCS.

Setup

Create a Service Account

gcloud iam service-accounts create supermetal \
  --project="my-project-123" \
  --display-name="Supermetal"

In the Google Cloud Console, go to IAM & Admin, then Service Accounts, then Create service account. Name it supermetal and continue without assigning roles yet. The next step grants them.

Grant Project Roles

Supermetal creates datasets, runs load and query jobs, and reads and writes tables. Grant the service account both roles below.

PROJECT="my-project-123"
SA="supermetal@${PROJECT}.iam.gserviceaccount.com"

# Run load/query jobs and create datasets
gcloud projects add-iam-policy-binding "$PROJECT" \
  --member="serviceAccount:${SA}" \
  --role="roles/bigquery.user"

# Read and write tables and data within datasets
gcloud projects add-iam-policy-binding "$PROJECT" \
  --member="serviceAccount:${SA}" \
  --role="roles/bigquery.dataEditor"

Roles explained

roles/bigquery.user grants jobs.create (load jobs, MERGE, DDL) and datasets.create, which Supermetal uses to provision the datasets it needs. roles/bigquery.dataEditor grants the table and row level read and write needed to load and merge data.

Grant Bucket Access

BigQuery loads from the GCS buffer bucket, so grant the service account object access on it.

gcloud storage buckets add-iam-policy-binding "gs://my-buffer-bucket" \
  --member="serviceAccount:[email protected]" \
  --role="roles/storage.objectAdmin"

Region match

The buffer bucket must share a region with the target dataset. BigQuery rejects cross region load jobs.

Create and Download a Key

gcloud iam service-accounts keys create supermetal-key.json \
  --iam-account="[email protected]"

Key security

Paste the full contents of supermetal-key.json into the Service account JSON key field when configuring the target. Store the file securely and delete your local copy once configured.

Multi table transactions

Opt in. When enabled, Supermetal wraps DML across multiple tables in a BigQuery script transaction. Changes from a single source transaction apply fully or not at all.

Query Priority

Supermetal submits query jobs (DDL, MERGE, scripts) at Interactive priority by default. Load jobs are unaffected.

Priority	Use for
Interactive	Latency sensitive replication. The default.
Batch	CDC workloads that tolerate delay.

Table Naming

Target table names carry the source schema as a prefix by default, so public.users becomes public_users. Turning the schema prefix off writes to users instead, which is safe only when table names are unique across all source schemas. Independently, you can add a custom prefix or suffix to every table name (raw_users, users_log).

Data Types Mapping

Apache Arrow DataType	BigQuery Type	Notes
`Int8`, `Int16`, `Int32`, `Int64`, `UInt8`, `UInt16`, `UInt32`	`INT64`
`UInt64`	`NUMERIC`	Exceeds the signed 64 bit range.
`Float16`, `Float32`, `Float64`	`FLOAT64`
`Decimal128(p, s)`, `Decimal256(p, s)`	`NUMERIC`	When `p <= 38`, `s <= 9`, integer digits `<= 29`.
`Decimal128(p, s)`, `Decimal256(p, s)`	`BIGNUMERIC`	When `p <= 76`, `s <= 38`, integer digits `<= 38`.
`Decimal128(p, s)`, `Decimal256(p, s)`	`STRING`	Outside the `BIGNUMERIC` envelope.

Apache Arrow DataType	BigQuery Type
`Boolean`	`BOOL`

Apache Arrow DataType	BigQuery Type	Notes
`Date32`, `Date64`	`DATE`	Clamped to `0001-01-01` through `9999-12-31`.
`Time32`, `Time64`	`TIME`
`Timestamp(s\|ms\|us\|ns, [tz])`	`TIMESTAMP`	Microsecond resolution, clamped to BigQuery's supported range.

Apache Arrow DataType	BigQuery Type	Notes
`Utf8`, `LargeUtf8`, `Utf8View`	`STRING`
`Utf8` JSON extension (`arrow.json`)	`JSON`	Staged as `STRING`, then `PARSE_JSON` on merge.
`Null`	`STRING`

Apache Arrow DataType	BigQuery Type
`Binary`, `LargeBinary`, `BinaryView`, `FixedSizeBinary`	`BYTES`

Apache Arrow DataType	BigQuery Type	Notes
`Struct`	`STRUCT<...>`	Fields mapped recursively.

Apache Arrow DataType	BigQuery Type	Notes
`List`, `LargeList`, `FixedSizeList`	`ARRAY<...>`	Elements mapped recursively.