Apache Doris

Apache Doris is a real-time analytical MPP database with a MySQL-compatible interface.

Prerequisites

  • Supported Apache Doris Distributions
    • VeloDB Cloud
    • Apache Doris 4.0 or higher

      Cluster Topology

      The connector talks to the Frontend HTTP endpoint and follows Stream Load redirects to Backends. The machine running Supermetal must reach both FE and BE. If either is blocked, snapshot and CDC ingest will fail.

  • Database Admin Access. An account with Admin_priv (typically the built-in root). The setup steps below issue CREATE USER, CREATE DATABASE, and GRANT, which require this privilege.
  • Network Connectivity. Open inbound access from Supermetal to:
    • FE HTTP (default 8030). Stream Load entry point. Issues a 307 redirect to a BE.
    • BE HTTP (default 8040). The client follows the 307 directly to the BE holding the target tablet.
    • FE MySQL (default 9030). SQL, schema reflection, and INSERT WITH LABEL via the S3 TVF path. Override via fe_mysql_port.
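Before configuring the target, it can help to confirm that all three ports are reachable from the Supermetal host. The following is a minimal Python sketch, not part of Supermetal. The helper names `is_reachable` and `check_doris_ports` are hypothetical, and the ports are the defaults listed above. It only checks that a TCP connection opens; it does not verify credentials or the Stream Load redirect itself.

```python
import socket

# Default ports from the list above. Adjust if your cluster overrides them.
FE_HTTP_PORT = 8030
BE_HTTP_PORT = 8040
FE_MYSQL_PORT = 9030


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_doris_ports(fe_host, be_hosts, timeout=3.0):
    """Probe FE HTTP, FE MySQL, and each BE HTTP port.

    Returns a dict keyed by (role, host, port) with a bool per target.
    """
    targets = [
        ("fe_http", fe_host, FE_HTTP_PORT),
        ("fe_mysql", fe_host, FE_MYSQL_PORT),
    ]
    targets += [("be_http", be_host, BE_HTTP_PORT) for be_host in be_hosts]
    return {target: is_reachable(target[1], target[2], timeout) for target in targets}
```

Calling `check_doris_ports("fe.example.internal", ["be1.example.internal"])` (placeholder hosts) from the Supermetal machine returns one entry per port. Any `False` value points at a firewall or routing rule to fix before the first load.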

Setup

Create a user & grant permissions

Apache Doris (self-managed)

Connect to a Frontend

Log in to any FE as root (or another account with Admin_priv) using the MySQL protocol on port 9030.

mysql -h <fe-host> -P 9030 -u root -p

Create a user

CREATE USER 'supermetal_user' IDENTIFIED BY 'strong-password';

Create the target database

CREATE DATABASE IF NOT EXISTS target_database;

Grant privileges

GRANT SELECT_PRIV, LOAD_PRIV, ALTER_PRIV, CREATE_PRIV, DELETE_PRIV
  ON target_database.*
  TO 'supermetal_user';

Connection details

You'll need the following to configure the target in Supermetal.

  • Frontend HTTP URL (for example http://fe.example.internal:8030, or https://...:8030 if you've fronted FE with TLS)
  • Username and password you created above
  • Target database name
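As a rough illustration of how these values fit together, the sketch below splits a Frontend URL into an HTTP base and a MySQL endpoint. It is not Supermetal's implementation. The function names and the returned keys are hypothetical, and it assumes the MySQL endpoint is on the same host as the FE HTTP URL, with `fe_mysql_port` defaulting to 9030. The `/api/<db>/<table>/_stream_load` path is the one Doris documents for Stream Load.

```python
from urllib.parse import urlsplit


def derive_endpoints(fe_http_url, fe_mysql_port=9030):
    """Split an FE HTTP(S) URL into the pieces a client would need.

    Assumption: the MySQL endpoint lives on the same host as the FE HTTP URL.
    """
    parts = urlsplit(fe_http_url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"expected an http:// or https:// URL, got {fe_http_url!r}")
    if not parts.hostname:
        raise ValueError(f"no host found in {fe_http_url!r}")
    # Fall back to the scheme's standard port when the URL omits one.
    http_port = parts.port or (443 if parts.scheme == "https" else 80)
    return {
        "http_base": f"{parts.scheme}://{parts.hostname}:{http_port}",
        "mysql_host": parts.hostname,
        "mysql_port": fe_mysql_port,
        "tls": parts.scheme == "https",
    }


def stream_load_url(http_base, database, table):
    """Build the Stream Load URL for one table."""
    return f"{http_base}/api/{database}/{table}/_stream_load"
```

For example, `derive_endpoints("http://fe.example.internal:8030")` yields an HTTP base on port 8030 and a MySQL endpoint on the same host at port 9030.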

Multiple Frontends

If you run more than one FE, point Supermetal at a single URL behind a load balancer. The connector follows Stream Load's 307 redirects to BEs on its own. FE failover should be handled by your LB, not by listing FEs individually.

TLS / mTLS

For HTTPS endpoints, optionally set ssl_root_cert (private CA) and ssl_client_cert_pem + ssl_client_key_pem (mTLS). The same SSL config is applied to both the Stream Load HTTP client and the MySQL client.
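The sketch below shows one way these three settings could map onto a TLS context, using Python's standard `ssl` module. It is an illustration only, not the connector's code. It assumes each setting holds a path to a PEM file (the settings may instead carry inline PEM text), and the function name `build_ssl_context` is hypothetical.

```python
import ssl


def build_ssl_context(ssl_root_cert=None, ssl_client_cert_pem=None, ssl_client_key_pem=None):
    """Build a client-side TLS context.

    Assumption: every argument, when given, is a path to a PEM file.
    """
    # mTLS needs the certificate and its key together.
    if (ssl_client_cert_pem is None) != (ssl_client_key_pem is None):
        raise ValueError("mTLS requires both a client certificate and a client key")

    # With cafile set, only that CA is trusted; otherwise the system CAs are loaded.
    context = ssl.create_default_context(cafile=ssl_root_cert)

    if ssl_client_cert_pem is not None:
        # Present a client certificate during the handshake (mTLS).
        context.load_cert_chain(certfile=ssl_client_cert_pem, keyfile=ssl_client_key_pem)
    return context
```

Building one context and handing it to both the HTTP client and the MySQL client mirrors the behavior described above, where a single SSL config covers both connections.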

VeloDB Cloud

Open the SQL editor

Log in to VeloDB Cloud and open the SQL editor for your warehouse. Cloud warehouses expose a MySQL-compatible endpoint. The same DDL works on the web console or any MySQL client pointed at port 9030.

Create a user

CREATE USER 'supermetal_user' IDENTIFIED BY 'strong-password';

Create the target database

CREATE DATABASE IF NOT EXISTS target_database;

Grant privileges

GRANT SELECT_PRIV, LOAD_PRIV, ALTER_PRIV, CREATE_PRIV, DELETE_PRIV
  ON target_database.*
  TO 'supermetal_user';

Connection details

You'll need the following to configure the target in Supermetal.

  • Frontend HTTP(S) URL (for example https://<warehouse-id>.cloud.velodb.io:8030)
  • Username and password you created above
  • Target database name

IP Allowlist

VeloDB Cloud restricts inbound traffic by default. From the warehouse settings, add the public IP (or VPC peering range) of the machine running Supermetal to the allowlist for both the FE HTTP port and the BE HTTP port.


Table Model

| Value | Description |
|---|---|
| Auto (default) | Tables with primary keys land as Unique Key with Merge-on-Write (sequence column _sm_version). Tables without primary keys land as Duplicate Key. |
| UniqueKey | Forces Unique Key with Merge-on-Write. Requires primary keys. |
| DuplicateKey | Append-only, no deduplication. Lower write amplification when row-level updates are not needed. |
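The selection rules above can be sketched as a small decision function. This is an illustration of the documented behavior, not the connector's source. The function name `resolve_table_model` is hypothetical, and raising an error when UniqueKey is forced on a table without primary keys is an assumption about how the "requires primary keys" rule is enforced.

```python
def resolve_table_model(table_model, has_primary_key):
    """Return the Doris table model a source table would land as.

    table_model: "Auto", "UniqueKey", or "DuplicateKey" (values from the table above).
    has_primary_key: whether the source table declares a primary key.
    """
    if table_model == "Auto":
        # Primary keys enable Unique Key (Merge-on-Write); otherwise append-only.
        return "UniqueKey" if has_primary_key else "DuplicateKey"
    if table_model == "UniqueKey":
        if not has_primary_key:
            # Assumption: forcing UniqueKey without a primary key is an error.
            raise ValueError("UniqueKey requires a primary key on the source table")
        return "UniqueKey"
    if table_model == "DuplicateKey":
        return "DuplicateKey"
    raise ValueError(f"unknown table model: {table_model!r}")
```

Note that DuplicateKey can be forced even on tables with primary keys, which trades deduplication for lower write amplification.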

Binary Handling

Doris has no native binary type. BINARY payloads land in STRING. The binary_handling_mode setting controls encoding.

| Mode | Description |
|---|---|
| Bytes (default) | Raw bytes pass through. Fastest, but Doris string functions (LIKE, REGEXP, UPPER) may misbehave on non-UTF-8 content. |
| Hex | Hex-encoded on write. Decode with unhex(). |
| Base64 | Standard base64 on write. Decode with from_base64(). |
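To make the three modes concrete, the sketch below encodes the same payload each way using Python's standard library. It is an illustration, not the connector's code. The function name `encode_binary` is hypothetical, and lowercase hex output is an assumption (the connector's letter case is not stated here).

```python
import base64


def encode_binary(payload, mode="Bytes"):
    """Encode a binary payload the way each binary_handling_mode describes."""
    if mode == "Bytes":
        # Raw pass-through: the bytes are stored unchanged.
        return payload
    if mode == "Hex":
        # Two hex digits per byte (lowercase here, by assumption).
        return payload.hex()
    if mode == "Base64":
        # Standard base64 alphabet with '=' padding.
        return base64.b64encode(payload).decode("ascii")
    raise ValueError(f"unknown binary_handling_mode: {mode!r}")


payload = b"\x00\xff"              # not valid UTF-8, so Bytes mode keeps it opaque
print(encode_binary(payload, "Hex"))     # 00ff
print(encode_binary(payload, "Base64"))  # AP8=
```

Hex doubles the stored size and Base64 adds roughly a third, so Bytes remains the most compact option when string functions are not needed on the column.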

Data Types Mapping

| Apache Arrow DataType | Doris Type | Notes |
|---|---|---|
| Int8 | TINYINT | |
| Int16 | SMALLINT | |
| Int32 | INT | |
| Int64 | BIGINT | |
| UInt8, UInt16 | INT | Widened one signed level. Doris ignores Parquet's UINT_N annotation. |
| UInt32 | BIGINT | Widened one signed level. |
| UInt64 | DECIMALV3(20, 0) | BIGINT silently nulls values at or above 2⁶³. Promoted to DECIMAL. |
| Float16 | FLOAT | |
| Float32 | FLOAT | NaN and ±Inf preserved. |
| Float64 | DOUBLE | NaN and ±Inf preserved. |
| Decimal128(p, s) | DECIMALV3(p, s) | p < 38. |
| Decimal128(38, s) | STRING | Workaround for Doris 4.0/4.1 silently nulling DECIMAL(38, *) on parquet ingest. Decode with CAST AS DECIMAL on read. |
| Decimal256(p, s) | DECIMALV3(p, s) | p < 38. Narrowed to Decimal128. |
| Decimal256(p, s) | STRING | p ≥ 38. |
| Date32, Date64 | DATEV2 | Values outside 0000-01-01..9999-12-31 are nulled. |
| Timestamp(*, [tz]) | DATETIMEV2(N) | Cast to UTC. N matches source-declared precision (capped at 6). Plain Arrow timestamps without source metadata land at DATETIMEV2(6). Values before 1900-01-01 are nulled. Doris parquet ingest corrupts pre-1900 timestamps. |
| Time32, Time64 | STRING | No native TIME type. |
| Duration, Interval | STRING | No Doris equivalent. |
| Utf8, LargeUtf8, Utf8View | STRING | |
| Utf8 JSON extension (arrow.json) | JSON | |
| Boolean | BOOLEAN | |
| Binary, LargeBinary, BinaryView, FixedSizeBinary | STRING | Encoding controlled by binary_handling_mode. |
| Struct | STRUCT<...> | |
| Map | MAP<K, V> | |
| List<T>, LargeList<T>, FixedSizeList<T> | ARRAY<T> | Stream Load rejects complex parquet types. ARRAY columns require an object-store buffer so loads go through the S3 TVF. |
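A few of the rules above are easier to see as code. The sketch below restates the integer, decimal, and timestamp rows in Python. It is not the connector's implementation. All names are hypothetical, and treating naive datetimes as UTC is an assumption made only for this example.

```python
from datetime import datetime, timezone

BIGINT_MAX = 2**63 - 1   # largest value a signed 64-bit BIGINT can hold
UINT64_MAX = 2**64 - 1   # 18446744073709551615, which has 20 digits

# Integer rows of the mapping table. Unsigned types widen so every value fits.
INTEGER_MAPPING = {
    "Int8": "TINYINT", "Int16": "SMALLINT", "Int32": "INT", "Int64": "BIGINT",
    "UInt8": "INT", "UInt16": "INT", "UInt32": "BIGINT",
    "UInt64": "DECIMALV3(20, 0)",
}


def decimal_type(precision, scale):
    """Decimal128/Decimal256 rule: below precision 38 stays decimal, otherwise STRING."""
    if precision < 38:
        return f"DECIMALV3({precision}, {scale})"
    return "STRING"


def datetime_type(source_precision=None):
    """DATETIMEV2(N): N follows the source precision, capped at 6; 6 when unknown."""
    n = 6 if source_precision is None else min(source_precision, 6)
    return f"DATETIMEV2({n})"


MIN_TIMESTAMP = datetime(1900, 1, 1, tzinfo=timezone.utc)


def normalize_timestamp(value):
    """Cast to UTC and return None for values before 1900-01-01.

    Assumption: naive datetimes are already in UTC.
    """
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    utc_value = value.astimezone(timezone.utc)
    return utc_value if utc_value >= MIN_TIMESTAMP else None
```

The UInt64 row follows from the constants: `UINT64_MAX` exceeds `BIGINT_MAX`, and its 20 decimal digits are why the target is DECIMALV3(20, 0).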

Nullability

All non-primary-key columns are nullable in Doris by default. Enable preserve_source_nullability to carry NOT NULL from the source schema.
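The nullability rule can be summarized in a few lines. This is a sketch of the behavior described above, not the connector's code. The function name is hypothetical, and treating primary-key columns as always NOT NULL is inferred from the sentence above rather than stated outright.

```python
def column_is_nullable(is_primary_key, source_nullable, preserve_source_nullability=False):
    """Decide whether a target column is created as nullable."""
    if is_primary_key:
        # Assumption: key columns are always NOT NULL in the target.
        return False
    if preserve_source_nullability:
        # Carry the source schema's NULL / NOT NULL through unchanged.
        return source_nullable
    # Default: every non-key column is nullable.
    return True
```

With the default setting, a source column declared NOT NULL still lands as nullable unless it is part of the primary key.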
