Databricks cloud data source
This article describes how to set up the Databricks cloud data source.
The Databricks cloud data source is currently in Early Access and is only available to select customers. If you are interested in using Databricks as a data source, contact your Tealium Support representative.
A cloud data source connects your cloud data warehouse or database to Tealium, so you can import data as events for processing.
For more information, see About cloud data sources.
Data types
The Databricks data source supports all Databricks data types. To ensure data is imported correctly, map the Databricks data types according to the following guidelines:
| Databricks | Tealium |
|---|---|
| Numeric data types | Number attributes |
| String and binary data types | String attributes |
| Logical data types | Boolean attributes |
| Date and time data types | Date attributes |
| Arrays | Array of strings, array of numbers, or array of booleans |
| Map, struct, object, variant | String attributes |
For more information, see Databricks: Data Types (for AWS, Azure, GCP).
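As a sketch, a Databricks table that exercises these mappings might be defined as follows (the table and column names are illustrative, not required by Tealium):

```sql
-- Hypothetical table; each comment notes the Tealium attribute type the column maps to.
CREATE TABLE customer_events (
  event_count   BIGINT,              -- numeric type    -> Number attribute
  customer_name STRING,              -- string type     -> String attribute
  is_active     BOOLEAN,             -- logical type    -> Boolean attribute
  signup_date   DATE,                -- date/time type  -> Date attribute
  tags          ARRAY<STRING>,       -- array           -> Array of strings
  metadata      MAP<STRING, STRING>  -- map             -> imported as a String attribute
);
```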
Create a connection
Tealium uses a service principal to access your Databricks compute resource. Before you proceed, you must create a service principal in Databricks and generate an OAuth secret. For more information, see Databricks: Authorize access with a service principal using OAuth.
To configure a new connection, enter the following connection details:
- Hostname: The hostname of your Databricks compute resource. Examples:
  - AWS: MY_ACCOUNT.cloud.databricks.com
  - Azure: MY_ACCOUNT.azuredatabricks.net
  - GCP: MY_ACCOUNT.gcp.databricks.com
- HTTP Path: The HTTP path to your compute resource. For example: /sql/1.0/warehouses/3fbc78304284503a
- Catalog: The name of the catalog for this connection.
- Schema: The name of the schema for this connection.
- OAuth Client ID: The service principal’s UUID or Application ID.
- OAuth Client Secret: The service principal’s generated secret.
For more information, see Databricks: Compute settings (for AWS, Azure, GCP).
After you connect to Databricks, select the data source table from the Table Selection list.
Query mode
For a general overview, see Query modes.
For the Timestamp + Incrementing and Timestamp modes, the selected timestamp column must be of type TIMESTAMP_NTZ.
For more information, see Databricks: TIMESTAMP_NTZ (for AWS, Azure, GCP).
For the Incrementing mode, the selected numeric column must strictly increase with every row added. A recommended definition for an auto-increment column is:
COL1 BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1)
For more information, see Databricks CREATE TABLE (for AWS, Azure, GCP).
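Putting these requirements together, a table suitable for either query mode might be defined as follows (the table and column names are illustrative):

```sql
-- Hypothetical table supporting both query modes:
-- an identity column for Incrementing mode and a
-- TIMESTAMP_NTZ column for the Timestamp-based modes.
CREATE TABLE import_events (
  event_id  BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
  loaded_at TIMESTAMP_NTZ,
  payload   STRING
);
```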
WHERE clause
For a general overview, see SQL Query.
The WHERE clause does not support subqueries from multiple tables. To import data from multiple Databricks tables, create a view in Databricks and select the view in the data source configuration.
For more information, see Databricks: What is a view? (for AWS, Azure, GCP).
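For example, to import columns from two tables, you might create a view along these lines (the table, column, and view names are illustrative) and then select the view in the data source configuration:

```sql
-- Hypothetical view joining two tables so the data source
-- queries a single object instead of using a subquery.
CREATE VIEW combined_events AS
SELECT o.order_id, o.loaded_at, c.customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
```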
IP access list
If your Databricks workspace is restricted by IP addresses, add the Tealium IP addresses to your Databricks IP access list.
For more information, see Databricks: Manage IP access list (for AWS, Azure, GCP).
This page was last updated: June 17, 2025