BigQuery
Supermetal replicates to BigQuery through load jobs that bulk load from a Cloud Storage buffer.
Prerequisites
- A Google Cloud project with billing enabled and the BigQuery API activated.
- IAM access to create a service account, grant it project roles, and create datasets. This usually means
roles/owner, orroles/iam.serviceAccountAdmintogether withroles/resourcemanager.projectIamAdmin. - A Cloud Storage bucket in the same region as the target dataset, since BigQuery load jobs read only from GCS.
Setup
Create a Service Account
gcloud iam service-accounts create supermetal \
--project="my-project-123" \
--display-name="Supermetal"In the Google Cloud Console, go to IAM & Admin, then Service Accounts, then Create service account. Name it supermetal and continue without assigning roles yet. The next step grants them.
Grant Project Roles
Supermetal creates datasets, runs load and query jobs, and reads and writes tables. Grant the service account both roles below.
PROJECT="my-project-123"
SA="supermetal@${PROJECT}.iam.gserviceaccount.com"
# Run load/query jobs and create datasets
gcloud projects add-iam-policy-binding "$PROJECT" \
--member="serviceAccount:${SA}" \
--role="roles/bigquery.user"
# Read and write tables and data within datasets
gcloud projects add-iam-policy-binding "$PROJECT" \
--member="serviceAccount:${SA}" \
--role="roles/bigquery.dataEditor"Roles explained
roles/bigquery.user grants jobs.create (load jobs, MERGE, DDL) and datasets.create, which Supermetal uses to provision the datasets it needs. roles/bigquery.dataEditor grants the table and row level read and write needed to load and merge data.
Grant Bucket Access
BigQuery loads from the GCS buffer bucket, so grant the service account object access on it.
gcloud storage buckets add-iam-policy-binding "gs://my-buffer-bucket" \
--member="serviceAccount:[email protected]" \
--role="roles/storage.objectAdmin"Region match
The buffer bucket must share a region with the target dataset. BigQuery rejects cross region load jobs.
Create and Download a Key
gcloud iam service-accounts keys create supermetal-key.json \
--iam-account="[email protected]"Key security
Paste the full contents of supermetal-key.json into the Service account JSON key field when configuring the target. Store the file securely and delete your local copy once configured.
Multi table transactions
Opt in. When enabled, Supermetal wraps DML across multiple tables in a BigQuery script transaction. Changes from a single source transaction apply fully or not at all.
Query Priority
Supermetal submits query jobs (DDL, MERGE, scripts) at Interactive priority by default. Load jobs are unaffected.
| Priority | Use for |
|---|---|
| Interactive | Latency sensitive replication. The default. |
| Batch | CDC workloads that tolerate delay. |
Table Naming
Target table names carry the source schema as a prefix by default, so public.users becomes public_users. Turning the schema prefix off writes to users instead, which is safe only when table names are unique across all source schemas. Independently, you can add a custom prefix or suffix to every table name (raw_users, users_log).
Data Types Mapping
| Apache Arrow DataType | BigQuery Type | Notes |
|---|---|---|
Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32 | INT64 | |
UInt64 | NUMERIC | Exceeds the signed 64 bit range. |
Float16, Float32, Float64 | FLOAT64 | |
Decimal128(p, s), Decimal256(p, s) | NUMERIC | When p <= 38, s <= 9, integer digits <= 29. |
Decimal128(p, s), Decimal256(p, s) | BIGNUMERIC | When p <= 76, s <= 38, integer digits <= 38. |
Decimal128(p, s), Decimal256(p, s) | STRING | Outside the BIGNUMERIC envelope. |
| Apache Arrow DataType | BigQuery Type |
|---|---|
Boolean | BOOL |
| Apache Arrow DataType | BigQuery Type | Notes |
|---|---|---|
Date32, Date64 | DATE | Clamped to 0001-01-01 through 9999-12-31. |
Time32, Time64 | TIME | |
Timestamp(s|ms|us|ns, [tz]) | TIMESTAMP | Microsecond resolution, clamped to BigQuery's supported range. |
| Apache Arrow DataType | BigQuery Type | Notes |
|---|---|---|
Utf8, LargeUtf8, Utf8View | STRING | |
Utf8 JSON extension (arrow.json) | JSON | Staged as STRING, then PARSE_JSON on merge. |
Null | STRING |
| Apache Arrow DataType | BigQuery Type |
|---|---|
Binary, LargeBinary, BinaryView, FixedSizeBinary | BYTES |
| Apache Arrow DataType | BigQuery Type | Notes |
|---|---|---|
Struct | STRUCT<...> | Fields mapped recursively. |
| Apache Arrow DataType | BigQuery Type | Notes |
|---|---|---|
List, LargeList, FixedSizeList | ARRAY<...> | Elements mapped recursively. |
Changelog
Last updated on