BigQueryBigQuery

Supermetal replicates to BigQuery through load jobs that bulk load from a Cloud Storage buffer.

Prerequisites

  • A Google Cloud project with billing enabled and the BigQuery API activated.
  • IAM access to create a service account, grant it project roles, and create datasets. This usually means roles/owner, or roles/iam.serviceAccountAdmin together with roles/resourcemanager.projectIamAdmin.
  • A Cloud Storage bucket in the same region as the target dataset, since BigQuery load jobs read only from GCS.

Setup

Create a Service Account

gcloud iam service-accounts create supermetal \
  --project="my-project-123" \
  --display-name="Supermetal"

In the Google Cloud Console, go to IAM & Admin, then Service Accounts, then Create service account. Name it supermetal and continue without assigning roles yet. The next step grants them.

Grant Project Roles

Supermetal creates datasets, runs load and query jobs, and reads and writes tables. Grant the service account both roles below.

PROJECT="my-project-123"
SA="supermetal@${PROJECT}.iam.gserviceaccount.com"

# Run load/query jobs and create datasets
gcloud projects add-iam-policy-binding "$PROJECT" \
  --member="serviceAccount:${SA}" \
  --role="roles/bigquery.user"

# Read and write tables and data within datasets
gcloud projects add-iam-policy-binding "$PROJECT" \
  --member="serviceAccount:${SA}" \
  --role="roles/bigquery.dataEditor"

Roles explained

roles/bigquery.user grants jobs.create (load jobs, MERGE, DDL) and datasets.create, which Supermetal uses to provision the datasets it needs. roles/bigquery.dataEditor grants the table and row level read and write needed to load and merge data.

Grant Bucket Access

BigQuery loads from the GCS buffer bucket, so grant the service account object access on it.

gcloud storage buckets add-iam-policy-binding "gs://my-buffer-bucket" \
  --member="serviceAccount:[email protected]" \
  --role="roles/storage.objectAdmin"

Region match

The buffer bucket must share a region with the target dataset. BigQuery rejects cross region load jobs.

Create and Download a Key

gcloud iam service-accounts keys create supermetal-key.json \
  --iam-account="[email protected]"

Key security

Paste the full contents of supermetal-key.json into the Service account JSON key field when configuring the target. Store the file securely and delete your local copy once configured.

Multi table transactions

Opt in. When enabled, Supermetal wraps DML across multiple tables in a BigQuery script transaction. Changes from a single source transaction apply fully or not at all.

Query Priority

Supermetal submits query jobs (DDL, MERGE, scripts) at Interactive priority by default. Load jobs are unaffected.

PriorityUse for
InteractiveLatency sensitive replication. The default.
BatchCDC workloads that tolerate delay.

Table Naming

Target table names carry the source schema as a prefix by default, so public.users becomes public_users. Turning the schema prefix off writes to users instead, which is safe only when table names are unique across all source schemas. Independently, you can add a custom prefix or suffix to every table name (raw_users, users_log).

Data Types Mapping

Apache Arrow DataTypeBigQuery TypeNotes
Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32INT64
UInt64NUMERICExceeds the signed 64 bit range.
Float16, Float32, Float64FLOAT64
Decimal128(p, s), Decimal256(p, s)NUMERICWhen p <= 38, s <= 9, integer digits <= 29.
Decimal128(p, s), Decimal256(p, s)BIGNUMERICWhen p <= 76, s <= 38, integer digits <= 38.
Decimal128(p, s), Decimal256(p, s)STRINGOutside the BIGNUMERIC envelope.
Apache Arrow DataTypeBigQuery Type
BooleanBOOL
Apache Arrow DataTypeBigQuery TypeNotes
Date32, Date64DATEClamped to 0001-01-01 through 9999-12-31.
Time32, Time64TIME
Timestamp(s|ms|us|ns, [tz])TIMESTAMPMicrosecond resolution, clamped to BigQuery's supported range.
Apache Arrow DataTypeBigQuery TypeNotes
Utf8, LargeUtf8, Utf8ViewSTRING
Utf8 JSON extension (arrow.json)JSONStaged as STRING, then PARSE_JSON on merge.
NullSTRING
Apache Arrow DataTypeBigQuery Type
Binary, LargeBinary, BinaryView, FixedSizeBinaryBYTES
Apache Arrow DataTypeBigQuery TypeNotes
StructSTRUCT<...>Fields mapped recursively.
Apache Arrow DataTypeBigQuery TypeNotes
List, LargeList, FixedSizeListARRAY<...>Elements mapped recursively.

Changelog

Last updated on

On this page