Databricks Connector Setup Guide
This article describes how to set up the Databricks connector.
This connector is only available in the new connectors interface, which you access by clicking Connectors in the left navigation.
Actions
Action Name | AudienceStream | EventStream |
---|---|---|
Send Entire Event Data | ✗ | ✓ |
Send Custom Event Data | ✗ | ✓ |
Send Entire Visitor Data | ✓ | ✗ |
Send Custom Visitor Data | ✓ | ✗ |
How it works
The Databricks connector requires two sets of connections:
- Tealium to a compatible cloud storage solution (AWS S3, Azure Blob Storage, or Google Cloud Storage).
- Databricks to that same cloud storage solution.
Tealium to cloud storage connection
Tealium requires a connection to an AWS S3, Azure Blob Storage, or Google Cloud Storage instance to list buckets and upload event and audience data into cloud storage objects and files. The Databricks connector supports the following authentication options:
- AWS S3
  - Provide an Access Key and Access Secret.
  - Provide STS (Security Token Service) credentials.
- Azure Blob Storage
  - Client credentials.
  - Authorization Code flow (SSO).
  - Shared Access Signature (SAS).
- Google Cloud Storage
  - Sign in with Google (SSO).
AWS S3 settings
Access Key and Secret credentials
To find your AWS Access Key and Secret:
- Log in to the AWS Management Console and go to the IAM (Identity and Access Management) service.
- Click Users and then Add user.
- Enter a username. For example, TealiumS3User.
- Attach policies to the user you have just created.
  - In the Permissions tab, click Attach existing policies directly.
  - Search for and attach the AmazonS3FullAccess policy, for full access. If you want to restrict access to a specific bucket, you can write a policy similar to the example below. In this example, YOUR_BUCKET_NAME is the bucket that Tealium would use to upload event and audience data into S3 objects.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucketMultipartUploads",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": [
        "arn:aws:s3:::YOUR_BUCKET_NAME",
        "arn:aws:s3:::YOUR_BUCKET_NAME/*"
      ]
    }
  ]
}
- Create the keys.
  - Go to the Security credentials tab and click Create Access Key.
  - Copy the Access Key ID and Secret Access Key and save them securely.
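Before entering the keys in the connector, you can optionally confirm that they can list and write to the bucket. The following is a minimal Python sketch, assuming the boto3 library; the key values, bucket name, and object key are placeholders.

import boto3

# Placeholder credentials created in the steps above; in practice, load them from a secure store.
s3 = boto3.client(
    "s3",
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
)

# Exercise the s3:ListBucket and s3:PutObject permissions granted by the policy above.
s3.list_objects_v2(Bucket="YOUR_BUCKET_NAME", MaxKeys=1)
s3.put_object(Bucket="YOUR_BUCKET_NAME", Key="tealium-test/connectivity-check.txt", Body=b"ok")
print("The access key can list and write to the bucket.")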
STS credentials settings
- Log in to the AWS Management Console and go to the IAM (Identity and Access Management) service.
- Click Roles and then Create role.
- For the Trusted entity type, select the AWS account.
- Select Another AWS account and specify the Tealium account ID: 757913464184.
- Optional. Check the Require external ID checkbox and specify the external ID that you want to use. External IDs can be up to 256 characters long and can include alphanumeric characters (A-Z, a-z, 0-9) and symbols, such as hyphens (-), underscores (_), and periods (.).
- Enter a name for the role. The role name must start with tealium-databricks. For example, tealium-databricks-s3-test.
- Attach policies to the role.
  - In the Permissions tab, click Attach existing policies directly.
  - Search for and attach the AmazonS3FullAccess policy, for full access. If you want to restrict access to a specific bucket, you can write a policy similar to the example below. In this example, YOUR_BUCKET_NAME is the bucket that Tealium would use to upload event and audience data into S3 objects.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucketMultipartUploads",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": [
        "arn:aws:s3:::YOUR_BUCKET_NAME",
        "arn:aws:s3:::YOUR_BUCKET_NAME/*"
      ]
    }
  ]
}
- Create a trust policy.
  - Go to the Trust relationships tab and click Edit trust relationship.
  - Ensure the trust policy restricts access to the specific external ID for the role you created and that the Tealium production account ID is 757913464184, as seen in the example below.
  - Set the EXTERNAL_ID value for the connection to Tealium. The ID can be up to 256 characters long and can include alphanumeric characters (A-Z, a-z, 0-9) and symbols, such as hyphens (-), underscores (_), and periods (.).
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::757913464184:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "EXTERNAL_ID"
        }
      }
    }
  ]
}
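The following sketch (Python with the boto3 library; the role ARN, bucket name, and external ID are placeholders) illustrates the AssumeRole call that Tealium makes with the external ID from this trust policy. Note that only principals trusted by the policy, such as the Tealium account 757913464184, can assume the role, so running this check yourself would require adding your own principal to the trust relationship.

import boto3

sts = boto3.client("sts")

# Assume the role with the external ID required by the trust policy above.
resp = sts.assume_role(
    RoleArn="arn:aws:iam::111122223333:role/tealium-databricks-s3-test",
    RoleSessionName="tealium-databricks-check",
    ExternalId="EXTERNAL_ID",
)

# Use the temporary credentials to confirm the role can reach the bucket.
creds = resp["Credentials"]
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
s3.list_objects_v2(Bucket="YOUR_BUCKET_NAME", MaxKeys=1)
print("The role can be assumed and can read the bucket.")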
Azure Blob Storage settings
Client credentials
To retrieve the tenant ID, client ID, and client secret for an application in Azure, use the following steps:
Step 1: Access Azure Portal
- Go to Azure Portal.
- Sign in with your Azure account.
Step 2: Go to the App Registration
- In the search bar at the top, enter Azure Active Directory and select it.
- In the left menu, click App registrations.
- Locate your registered application.
Step 3: Find the Tenant ID & Client ID
- Click your application.
- In the Overview section, locate the following information:
- Tenant ID (also known as Directory ID) is displayed as Directory (tenant) ID.
- Client ID (also known as Application ID) is displayed as Application (client) ID.
Step 4: Generate Client Secret
- In the left menu, go to Certificates & secrets.
- Under Client secrets, click New client secret.
- Provide a description and select an expiration time.
- Click Add.
- Once generated, copy the client secret immediately, because you won’t be able to view it again after leaving the page.
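To confirm the tenant ID, client ID, and client secret before entering them in the connector, a check like the following Python sketch can help. It assumes the azure-identity and azure-storage-blob packages; the storage account and container names are placeholders, and the application also needs a data role on the storage account (for example, Storage Blob Data Reader) for the listing to succeed.

from azure.identity import ClientSecretCredential
from azure.storage.blob import BlobServiceClient

# Values gathered in the steps above (placeholders shown here).
credential = ClientSecretCredential(
    tenant_id="YOUR_TENANT_ID",
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
)

# Connect to the storage account and list a container to confirm the
# registered application can reach Blob Storage.
service = BlobServiceClient(
    account_url="https://YOUR_STORAGE_ACCOUNT.blob.core.windows.net",
    credential=credential,
)
container = service.get_container_client("YOUR_CONTAINER")
print([blob.name for blob in container.list_blobs()][:5])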
Shared Access Signature (SAS)
To generate a Shared Access Signature (SAS) token in Azure, use the following steps:
Step 1: Access Azure Portal
- Go to the Azure Portal.
- Sign in with your Azure account.
Step 2: Go to Storage Account
- In the search bar, enter Storage accounts and select it.
- Choose the storage account where you want to generate the SAS token.
Step 3: Generate SAS Token
Option 1: Use the Azure Portal
- In the Storage Account, go to Shared access signature under the Security + networking section.
- Configure the permissions needed (Read, Write, Delete, List, etc.).
- Set the expiration date and time to define how long the token remains valid.
- Choose the allowed services (Blob, File, Queue, and Table).
- Click Generate SAS and connection string.
- Copy the SAS token or the connection string, which contains the SAS token.
Option 2: Use Azure Storage Explorer
- Open Azure Storage Explorer and sign in to your Azure account.
- Locate the storage account and right-click a Blob Container or File Share.
- Select Get Shared Access Signature.
- Configure permissions and expiration settings.
- Click Create and copy the generated SAS URL or token.
Option 3: Use Azure CLI
- Run the following command in Azure CLI to generate a SAS token:
az storage blob generate-sas \
--account-name <your-storage-account> \
--container-name <your-container> \
--name <your-blob> \
--permissions r \
--expiry 2026-04-25T12:00:00Z \
--output tsv
This will output a SAS token, which can be appended to the storage URL to provide controlled access.
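As a sketch of how the generated token is used (Python with the azure-storage-blob package; the account, container, blob, and SAS values are placeholders), appending the SAS token to the storage URL grants scoped, time-limited access without any other credentials:

from azure.storage.blob import BlobClient

# SAS token produced by the az command above (placeholder value).
sas_token = "sv=2025-01-05&se=2026-04-25T12%3A00%3A00Z&sr=b&sp=r&sig=REDACTED"

# The blob URL plus the SAS token is all that is needed for controlled access.
blob = BlobClient.from_blob_url(
    "https://YOUR_STORAGE_ACCOUNT.blob.core.windows.net/YOUR_CONTAINER/YOUR_BLOB?" + sas_token
)
print(blob.download_blob().readall()[:100])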
Authorization Code flow (SSO)
When you click Establish Connection, you’re initiating a secure authentication process known as the Authorization Code Flow. This allows your application to gain access to your Azure Blob Storage without requiring you to manually enter credentials, ensuring a seamless and secure experience.
You will see the following:
- Redirect to Sign-In: You’re temporarily directed to your organization’s Identity Provider (IdP), such as Azure Active Directory, where you log in using your existing credentials.
- Granting Consent: After authenticating, you’ll see a consent screen explaining the permissions that Tealium’s app is requesting—specifically, access to your Blob Storage.
- Secure Access to Blob Storage: Tealium’s application now has permissions to interact with your storage while maintaining Azure’s security policies.
Google Cloud Storage settings
Sign in with Google
When you click Sign in with Google, you initiate a secure authorization process that allows your application to access Google Cloud Storage with your Google account. This process ensures a seamless experience while maintaining security and control over your data.
You will see the following:
- Redirect to Google Sign-In: You are temporarily redirected to Google’s authentication page, where you log in using your Google account credentials.
- Granting Consent: After signing in, you’ll be presented with a consent screen detailing the permissions Tealium’s app is requesting—such as access to your Cloud Storage.
- Receiving an Authorization Code: After you approve, Google generates a one-time authorization code and sends it back to your application.
- Secure Access to Cloud Storage: Tealium’s application now has permissions to interact with your storage while adhering to Google’s security policies.
Databricks to AWS S3 connection
To connect Databricks to an AWS S3 instance, you must first create an IAM role in your AWS instance that can be used to create storage credentials in the Databricks instance. For more information about creating the AWS IAM role, see Databricks: Create a storage credential for connecting to AWS S3.
After the storage credentials have been created, define the external location in the AWS S3 instance that you will pull data from. For more information, see Databricks: Create an external location to connect cloud storage to Databricks.
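Once the storage credential exists, the external location can be created from a Databricks notebook or the SQL editor. The following is a minimal Python notebook sketch; the location name, credential name (tealium_s3_credential), and bucket path are placeholders, and the credential must already be backed by the IAM role described in the Databricks documentation.

# Run in a Databricks notebook attached to a Unity Catalog-enabled workspace.
spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS tealium_events_location
    URL 's3://YOUR_BUCKET_NAME/tealium/'
    WITH (STORAGE CREDENTIAL tealium_s3_credential)
    COMMENT 'Bucket that Tealium uploads event and audience data into'
""")

# Confirm Databricks can read the location through the credential.
display(dbutils.fs.ls("s3://YOUR_BUCKET_NAME/tealium/"))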
Databricks to Azure Blob Storage connection
To connect Databricks to an Azure Blob Storage instance, you need to create a storage credential using Azure service principals or managed identities. This allows Databricks to authenticate and access the Blob Storage securely. For more information, see Databricks: Create a storage credential for connecting to Azure Blob Storage.
Once the storage credentials are configured, define an external location in Azure Blob Storage that Databricks will use for reading and writing data. For more information, see Databricks: Create an external location to connect cloud storage to Databricks.
Databricks to Google Cloud Storage connection
To integrate Google Cloud Storage with Databricks, you first need to set up a service account in Google Cloud with the required permissions to access storage buckets. Then, you create a storage credential in Databricks to use this service account for authentication. For more information, see Databricks: Create a storage credential for connecting to Google Cloud Storage.
After setting up the storage credentials, you must define an external location in Google Cloud Storage, specifying the bucket and permissions necessary for Databricks to interact with the data. For more information, see Databricks: Create an external location to connect cloud storage to Databricks.
Batch Limits
This connector uses batched requests to support high-volume data transfers to the vendor. For more information, see Batched Actions. Requests are queued until one of the following thresholds is met or the profile is published:
- Maximum number of requests: 100,000
- Maximum time since oldest request: You can set a custom TTL between 1 and 60 minutes. The default value is 10 minutes.
- Maximum size of requests: 10 MB
Configuration
Go to the Connector Marketplace and add a new connector. For general instructions on how to add a connector, see About Connectors.
After adding the connector, configure the following settings:
- Cloud Solution: Select the cloud solution you are using. Available options are AWS S3, Azure Blob Storage, and Google Cloud Storage.
- Databricks Host URL: Provide the Databricks account URL. For example: https://{ACCOUNT_NAME}.cloud.databricks.com.
- Databricks Token: Provide the Databricks access token. To create an access token in Databricks, click the user avatar in Databricks and go to Settings > Developer > Access Tokens > Manage > Generate New Token.
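As an optional sanity check of the host URL and token before saving the configuration, you can call the Databricks REST API directly. The sketch below uses Python with the requests library; the host and token values are placeholders.

import requests

# Placeholders for the values entered in the connector configuration.
host = "https://YOUR_ACCOUNT.cloud.databricks.com"
token = "dapiXXXXXXXXXXXXXXXX"

# Listing the workspace root is a simple authenticated call that verifies
# both the host URL and the access token.
resp = requests.get(
    f"{host}/api/2.0/workspace/list",
    headers={"Authorization": f"Bearer {token}"},
    params={"path": "/"},
)
resp.raise_for_status()
print("Token accepted; workspace root contains", len(resp.json().get("objects", [])), "objects.")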
Authentication settings vary based on the cloud solution you use:
Amazon AWS S3
- Region: Required. Select a region.
- Authentication Type: Required. Select the authentication type for your platform:
  - Provide an Access Key and Access Secret.
    - Access Key - AWS Access Key: Required for Access Key authentication. Provide the AWS access key.
    - Access Key - AWS Secret Access Key: Required for Access Key authentication. Provide the AWS secret access key.
  - Provide STS (Security Token Service) credentials.
    - STS - Assume Role: ARN: Required for STS authentication. Provide the Amazon Resource Name (ARN) of the role to assume. For example: arn:aws:iam::222222222222:role/myrole. For more information, see AWS Identity and Access Management: Switch to an IAM role (AWS API).
    - STS - Assume Role: Session Name: Required for STS authentication. Provide the session name of the role to assume. Minimum length of 2, maximum length of 64.
    - STS - Assume Role: External ID: Required for STS authentication. Provide a third-party external identifier. For more information, see AWS Identity and Access Management: Access to AWS accounts owned by third parties.
Azure Blob Storage
- Tenant ID: The unique identifier for your Azure Active Directory instance, representing your organization.
- Authentication type: Select the authentication type. Available options are: Client credentials, Authorization Code flow (SSO), and Shared Access Signature (SAS).
- Client ID: The unique identifier assigned to your application registered in Azure Active Directory.
- Client Secret: The string that acts like a password, used by the application to authenticate with Azure Active Directory.
- Shared Access Signature: Provide the special set of query parameters that indicate how the resources may be accessed by Tealium.
- Storage account name: The unique name of your Azure Storage account, used to access storage services like Blob, File, Queue, and Table storage.
- API version: The API version compatible with your Azure Storage instance. The default version is 2025-01-05.
Google Cloud Storage
Click Sign in with Google and follow the on-screen instructions.
Create a notebook
Notebooks in Databricks are documents that contain executable code, visualizations, and narrative text. They are used for data exploration, visualization, and collaboration. When you configure the connector, you can create a new notebook by clicking Create Notebook in the configuration step.
- In the connector configuration screen, click Create Notebook.
- Enter the table name. The Schema is specified when creating the job, so do not add it to this field. The table name must follow the rules below; a small sketch after these steps shows how to check a name against them.
  - Names can include alphanumeric characters (A-Z, a-z, 0-9) and underscores (_).
  - Spaces and special characters, such as !, @, #, -, and ., are not allowed.
  - Names are case-sensitive. For example, tableName and tablename would be considered different names.
  - Names cannot start with a digit. For example, 1table is invalid.
- For Notebook Path, enter the absolute path of the notebook. For example: /Users/user@example.com/project/NOTEBOOK_NAME.
  - To locate the absolute path of the notebook in Databricks, go to your Databricks workspace and expand the Users section.
  - Click on the user and then expand the options menu.
  - Click on Copy URL/Path > Full Path. The path name will be in the following format: /Workspace/Users/myemail@company.com. Add the virtual folder and notebook name that you want, separated by a slash (/). For example, /Workspace/Users/myemail@company.com/virtualfolder/virtualsubfolder/MyNotebook.
- For Cloud bucket, select the cloud storage bucket to connect to Databricks.
- The Overwrite option indicates whether to overwrite an existing notebook in the specified workspace if one already exists.
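A minimal Python sketch of the table name check described above (the pattern and function are illustrative only, not part of the connector):

import re

# Valid: letters, digits, and underscores; must not start with a digit.
TABLE_NAME_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def is_valid_table_name(name: str) -> bool:
    """Return True if the name satisfies the rules listed above."""
    return bool(TABLE_NAME_PATTERN.match(name))

print(is_valid_table_name("tealium_events"))   # True
print(is_valid_table_name("tableName"))        # True (names are case-sensitive)
print(is_valid_table_name("1table"))           # False: cannot start with a digit
print(is_valid_table_name("my-table"))         # False: hyphens are not allowed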
Create a job
Jobs in Databricks automate running notebooks on a schedule or based on specific triggers. Jobs allow you to perform tasks, such as data processing, analysis, and reporting, at regular intervals or triggered on certain events.
- In the connector configuration screen, click Create Job.
- Enter a name for the processing job.
- For Catalog, specify a catalog from the Unity catalog to use as a destination for publishing pipeline data.
- For Target, specify the schema in the catalog above where you want to publish or update your table. Do not specify the target table here, as the table that will be used is the one specified in the notebook.
- For Notebook Path, enter the absolute path of the notebook. For example: /Users/user@example.com/project/NOTEBOOK_NAME.
  - To locate the absolute path of the notebook in Databricks, go to your Databricks workspace and expand the Users section.
  - Click on the user and then expand the options menu.
  - Click on Copy URL/Path > Full Path. The path name will be in the following format: /Workspace/Users/myemail@company.com. Add the virtual folder and notebook name that you want, separated by a slash (/). For example, /Workspace/Users/myemail@company.com/virtualfolder/virtualsubfolder/MyNotebook.
- For Cloud Bucket, select the cloud storage bucket to connect to Databricks.
- For Trigger Type, select when to process the data. Available options are:
- File Arrived: Process data every time a new file arrives.
- Scheduled: Process data on a repeating schedule that you specify.
- Cron: Process data on a repeating schedule that you define in the Cron field.
- For Start Time, specify the start time for job processing in hh:mm format. The default value for the start time is 00:00.
- For Timezone, specify the timezone in region/city format. For example, Europe/London. This field is required if you provide a start time.
- For Cron, enter the Quartz cron expression you want to use for scheduled processing. For example, 20 30 * * * ? will process files at the 20th second of the 30th minute of every hour, every day. For more information, see Quartz: Cron Trigger Tutorial.
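For reference, a Quartz cron expression has six required fields (seconds, minutes, hours, day of month, month, day of week) plus an optional year. The small illustrative snippet below (plain Python, not connector code) breaks the example expression into those fields:

# Quartz cron fields, left to right.
field_names = ["seconds", "minutes", "hours", "day_of_month", "month", "day_of_week"]
expression = "20 30 * * * ?"

print(dict(zip(field_names, expression.split())))
# {'seconds': '20', 'minutes': '30', 'hours': '*',
#  'day_of_month': '*', 'month': '*', 'day_of_week': '?'}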
Actions
The following section lists the supported parameters for each action.
Send Entire Event Data
Parameters
Parameter | Description |
---|---|
Cloud Bucket | Select the cloud bucket or provide a custom value. |
Databricks Catalog | Select the Databricks catalog or provide a custom value. |
Databricks Schema | Select the Databricks schema or provide a custom value. |
Databricks Table | Select the Databricks table or provide a custom value. |
Column to record the payload | Select the VARIANT column to record the payload. |
Column to record the Timestamp | Select the column to record the timestamp. |
Timestamp Attribute | The default sends the current timestamp for the action. Select an attribute to assign as the timestamp if you want to send a different format. If an attribute is assigned and produces an empty value, we will send the current timestamp. |
Send Custom Event Data
Parameters
Parameter | Description |
---|---|
Cloud Bucket | Select the cloud bucket or provide a custom value. |
Databricks Catalog | Select the Databricks catalog or provide a custom value. |
Databricks Schema | Select the Databricks schema or provide a custom value. |
Databricks Table | Select the Databricks table or provide a custom value. |
Event Parameters
Map parameters to the columns of the Databricks table. You must map at least one parameter.
Send Entire Visitor Data
Parameters
Parameter | Description |
---|---|
Cloud Bucket | Select the cloud bucket or provide a custom value. |
Databricks Catalog | Select the Databricks catalog or provide a custom value. |
Databricks Schema | Select the Databricks schema or provide a custom value. |
Databricks Table | Select the Databricks table or provide a custom value. |
Column to record the visitor data | Select the VARIANT column to record the visitor data. |
Column to record the Timestamp | Select the column to record the timestamp. |
Timestamp Attribute | The default sends the current timestamp for the action. Select an attribute to assign as the timestamp if you want to send a different format. If an attribute is assigned and produces an empty value, we will send the current timestamp. |
Include Current Visit Data with Visitor Data | Select to include the current visit data with visitor data. |
Send Custom Visitor Data
Parameters
Parameter | Description |
---|---|
Cloud Bucket | Select the cloud bucket or provide a custom value. |
Databricks Catalog | Select the Databricks catalog or provide a custom value. |
Databricks Schema | Select the Databricks schema or provide a custom value. |
Databricks Table | Select the Databricks table or provide a custom value. |
Visitor Parameters
Map parameters to the columns of the Databricks table. You must map at least one parameter.