Databricks cloud data source
This article describes how to set up the Databricks cloud data source.
For a general overview of setting up a cloud data source, see Manage a cloud data source.
Data types
The Databricks data source supports all Databricks data types. To ensure data is imported correctly, map the Databricks data types according to the following guidelines:
| Databricks | Tealium |
|---|---|
| Numeric data types | Number attributes |
| String and binary data types | String attributes |
| Logical data types | Boolean attributes |
| Date and time data types | Date attributes |
| Arrays | Array of strings, array of numbers, or array of booleans |
| Map, struct, object, variant | String attributes |
For more information, see Databricks: Data Types (AWS, Azure, GCP).
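Because map, struct, object, and variant columns import as string attributes, you can make that mapping explicit by serializing complex columns before import. A minimal sketch, assuming a hypothetical `events` table with a `payload` struct column (all table and column names are illustrative):

```sql
-- Hypothetical view that exposes each column as a type Tealium can map.
CREATE OR REPLACE VIEW events_for_import AS
SELECT
  event_id,                          -- numeric    -> Number attribute
  event_name,                        -- string     -> String attribute
  is_test,                           -- boolean    -> Boolean attribute
  event_time,                        -- timestamp  -> Date attribute
  to_json(payload) AS payload_json   -- struct     -> String attribute
FROM events;
```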
Multi-table joins are not currently supported. You can achieve the same result with a Databricks view. For more information, see Databricks: Create and Manage Views (AWS, Azure, GCP).
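For example, to import data that spans two tables, you could define the join in a view and select that view as the data source table. A minimal sketch with hypothetical `customers` and `orders` tables:

```sql
-- Hypothetical view that joins two tables so the data source can
-- import the combined result as a single table.
CREATE OR REPLACE VIEW customer_orders AS
SELECT
  c.customer_id,
  c.email,
  o.order_id,
  o.order_total,
  o.created_at
FROM customers c
JOIN orders o
  ON o.customer_id = c.customer_id;
```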
Create a connection
Tealium uses a service principal to access your Databricks compute resource. Before you proceed, you must create a service principal in Databricks and generate an OAuth secret. For more information, see Databricks: Authorize access with a service principal using OAuth.
Databricks personal access tokens (PAT) are not supported. For more information, see Databricks: Authenticate with Databricks personal access token (legacy) (AWS, Azure, GCP).
To configure a new connection, enter the following connection details:
- Hostname: The hostname of your Databricks compute resource. Examples:
  - AWS: `MY_ACCOUNT.cloud.databricks.com`
  - Azure: `MY_ACCOUNT.azuredatabricks.net`
  - GCP: `MY_ACCOUNT.gcp.databricks.com`
- HTTP Path: The HTTP path to your compute resource. For example: `/sql/1.0/warehouses/3fbc78304284503a`. To find the HTTP path, go to the SQL Warehouses screen in Databricks, select the warehouse where your table is located, and click Connection details.
- Catalog: The name of the catalog for this connection.
- Schema: The name of the schema for this connection.
- OAuth Client ID: The service principal’s UUID or Application ID.
- OAuth Client Secret: The service principal’s generated secret.
For more information, see Databricks: Compute settings (AWS, Azure, GCP).
After you connect to Databricks, select the data source table from the Table Selection list.
Query mode
For a general overview, see Query modes.
For Databricks, note the following requirements:
- Timestamp + Incrementing and Timestamp modes: The selected timestamp column must be of type `TIMESTAMP`. For more information, see Databricks: TIMESTAMP type (AWS, Azure, GCP).
- Incrementing mode: The selected numeric column must increment in value for every row added. A recommended definition for an auto-increment column is `COL1 BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1)`. For more information, see Databricks: CREATE TABLE (AWS, Azure, GCP).
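Putting both requirements together, a table suitable for either query mode could be defined as follows. This is an illustrative sketch; the table and column names are hypothetical:

```sql
-- Hypothetical table with an auto-incrementing key (for Incrementing
-- mode) and a TIMESTAMP column (for Timestamp-based modes).
CREATE TABLE import_events (
  id BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
  event_name STRING,
  updated_at TIMESTAMP
);
```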
WHERE clause
For a general overview, see SQL Query.
The WHERE clause does not support subqueries from multiple tables. To import data from multiple Databricks tables, create a view in Databricks and select the view in the data source configuration.
For more information, see Databricks: What is a view? (AWS, Azure, GCP).
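To illustrate the distinction, a filter that references only the selected table is supported, while a subquery against another table is not. The column and table names below are hypothetical:

```sql
-- Supported: a filter on columns of the selected table.
WHERE event_time > '2024-01-01' AND event_name = 'purchase'

-- Not supported: a subquery that reads from another table.
-- WHERE customer_id IN (SELECT customer_id FROM vip_customers)
-- Move this logic into a Databricks view and select the view instead.
```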
IP access list
If your Databricks workspace is restricted by IP addresses, add the Tealium IP addresses to your Databricks IP access list.
For more information, see Databricks: Manage IP access list (AWS, Azure, GCP).
This page was last updated: January 20, 2026