Databricks cloud data source
This article describes how to set up the Databricks cloud data source.
The Databricks cloud data source is currently in Early Access and is only available to select customers. If you are interested in using Databricks as a data source, contact your Tealium Support representative.
A cloud data source connects your cloud data warehouse or database to Tealium, so you can import data as events for processing.
For more information, see About cloud data sources.
Data types
The Databricks data source supports all Databricks data types. To ensure data is imported correctly, map the Databricks data types according to the following guidelines:
| Databricks | Tealium |
|---|---|
| Numeric data types | Number attributes |
| String and binary data types | String attributes |
| Logical data types | Boolean attributes |
| Date and time data types | Date attributes |
| Arrays | Array of strings, array of numbers, or array of booleans |
| Map, struct, object, variant | String attributes |
For more information, see Databricks: Data Types (for AWS, Azure, GCP).
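As a sketch, a Databricks table that exercises these mappings might be defined as follows (the table and column names are illustrative, not required by Tealium):

```sql
-- Hypothetical table; each comment notes the Tealium attribute type the column maps to.
CREATE TABLE customer_events (
  event_count   BIGINT,              -- numeric type    -> Number attribute
  customer_name STRING,              -- string type     -> String attribute
  is_active     BOOLEAN,             -- logical type    -> Boolean attribute
  signup_date   DATE,                -- date/time type  -> Date attribute
  tags          ARRAY<STRING>,       -- array           -> Array of strings
  metadata      MAP<STRING, STRING>  -- map             -> imported as a String attribute
);
```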
Create a connection
Tealium uses a service principal to access your Databricks compute resource. Before you proceed, you must create a service principal in Databricks and generate an OAuth secret. For more information, see Databricks: Authorize access with a service principal using OAuth.
To configure a new connection, enter the following connection details:
- Hostname: The hostname of your Databricks compute resource. Examples:
  - AWS: MY_ACCOUNT.cloud.databricks.com
  - Azure: MY_ACCOUNT.azuredatabricks.net
  - GCP: MY_ACCOUNT.gcp.databricks.com
- HTTP Path: The HTTP path to your compute resource. For example: /sql/1.0/warehouses/3fbc78304284503a
- Catalog: The name of the catalog for this connection.
- Schema: The name of the schema for this connection.
- OAuth Client ID: The service principal’s UUID or Application ID.
- OAuth Client Secret: The service principal’s generated secret.
For more information, see Databricks: Compute settings (for AWS, Azure, GCP).
After you connect to Databricks, select the data source table from the Table Selection list.
Query mode
For a general overview, see Query modes.
For the Timestamp + Incrementing and Timestamp modes, the selected timestamp column must be of type TIMESTAMP_NTZ.
For more information, see Databricks: TIMESTAMP_NTZ (for AWS, Azure, GCP).
For the Incrementing mode, the selected numeric column must strictly increase with every row added. A recommended definition for an auto-increment column is:
COL1 BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1)
For more information, see Databricks CREATE TABLE (for AWS, Azure, GCP).
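Putting these requirements together, a table suitable for either query mode might be defined as follows (the table and column names are illustrative):

```sql
-- Hypothetical table supporting both query modes:
-- an identity column for Incrementing mode and a
-- TIMESTAMP_NTZ column for the Timestamp-based modes.
CREATE TABLE import_events (
  event_id  BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
  loaded_at TIMESTAMP_NTZ,
  payload   STRING
);
```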
WHERE clause
For a general overview, see SQL Query.
The WHERE clause does not support subqueries from multiple tables. To import data from multiple Databricks tables, create a view in Databricks and select the view in the data source configuration.
For more information, see Databricks: What is a view? (for AWS, Azure, GCP).
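For example, to import columns from two tables, you might create a view along these lines (the table, column, and view names are illustrative) and then select the view in the data source configuration:

```sql
-- Hypothetical view joining two tables so the data source
-- queries a single object instead of using a subquery.
CREATE VIEW combined_events AS
SELECT o.order_id, o.loaded_at, c.customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
```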
IP access list
If your Databricks workspace is restricted by IP addresses, add the Tealium IP addresses to your Databricks IP access list.
For more information, see Databricks: Manage IP access list (for AWS, Azure, GCP).
This page was last updated: June 17, 2025