Source Connector for Hive¶

This guide describes how to configure digna to connect to Hive using either the native Python connector or the ODBC driver.

All settings described below are entered on the "Create a Database Connection" screen.

[Screenshot: Create a database connection]

Native Python Driver¶

Library: PyHive
Supported Authentication: Password-based authentication only

⚠️ For other authentication methods, please use the ODBC driver.

digna Configuration (Native Driver)¶

Provide the following information in the "Create a Database Connection" screen:

Name:               Name of the connection. This is used for referencing the connection in other screens.
Technology:         Apache Hive
Host Address:       Server name or IP address
Host Port:          Port number, e.g. 10000
Database Name:      Schema that contains the source data
User Name:          Database user name
User Password:      Password for the user
Profiling Mode:     The profiling mode determines how digna processes data and calculates metrics:
                    - Standard: Metrics are calculated directly on the source tables without copying the data.
                    - Permanent: Data for the inspected day is copied into a permanent table, and metrics are calculated on the copied data.
                    - Session: Data is copied into a session or temporary table, and metrics are calculated on this temporary data.
Work Schema Name:   When using "Permanent" profiling mode, work tables will be placed in this schema.
Use ODBC:           Disabled (default)
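
To check the same host, port, schema, and credentials outside digna, a minimal PyHive sketch might look as follows. It assumes that password-based authentication corresponds to PyHive's `auth="LDAP"` mode (PyHive accepts a password only together with `LDAP` or `CUSTOM`). The helper names and the placeholder values are hypothetical, and how digna itself calls PyHive is not documented here.

```python
def build_hive_kwargs(host, port, database, username, password):
    """Map the connection-screen fields to keyword arguments for pyhive.hive.connect()."""
    return {
        "host": host,              # Host Address
        "port": int(port),         # Host Port, e.g. 10000
        "database": database,      # Database Name (schema with the source data)
        "username": username,      # User Name
        "password": password,      # User Password
        "auth": "LDAP",            # assumption: password-based auth maps to LDAP mode
    }


def check_hive_connection(**fields):
    """Open a connection, run a trivial query, and return the fetched rows."""
    from pyhive import hive  # imported lazily so the helper above works without PyHive

    conn = hive.connect(**build_hive_kwargs(**fields))
    try:
        cursor = conn.cursor()
        cursor.execute("SELECT 1")
        return cursor.fetchall()
    finally:
        conn.close()


# Example call (placeholder values, requires a reachable HiveServer2):
# check_hive_connection(host="hive.example.com", port=10000,
#                       database="default", username="hive_user", password="secret")
```

If this check fails with the same values that are entered in digna, the cause is more likely on the server or network side than in the digna configuration.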

ODBC Driver¶

The ODBC driver may support a broader range of authentication and connectivity options than the native driver. This section covers password-based authentication with the Cloudera ODBC Driver for Apache Hive.

1. Install the ODBC Driver¶

Install the Cloudera ODBC Driver for Apache Hive (or a comparable Hive ODBC driver) by following the vendor’s official installation guide.

2. Configure the ODBC Data Source¶

Follow these steps to configure a new ODBC data source using password-based authentication:

Step 1¶

[Screenshot: Step 1]

Step 2 – Test the connection¶

Enter the password and click the Test button.

[Screenshot: Step 2]

After a successful test, click the OK button.

Now you can configure digna to use the ODBC connection, either with a DSN (Data Source Name) or a DSN-less setup.

A. DSN-Based Configuration¶

digna Configuration¶

In the "Create a Database Connection" screen, provide the following:

Name:               Name of the connection. This is used for referencing the connection in other screens.
Technology:         Apache Hive
Database Name:      Schema that contains the source data
Profiling Mode:     The profiling mode determines how digna processes data and calculates metrics:
                    - Standard: Metrics are calculated directly on the source tables without copying the data.
                    - Permanent: Data for the inspected day is copied into a permanent table, and metrics are calculated on the copied data.
                    - Session: Data is copied into a session or temporary table, and metrics are calculated on this temporary data.
Work Schema Name:   When using "Permanent" profiling mode, work tables will be placed in this schema.
Use ODBC:           Enabled

ODBC Properties¶

name: "DSN",            value: "_digna_data_hdp"
name: "PWD",            value: "{your password in curly braces}"

🔹 The DSN must match the name defined in your ODBC driver configuration.
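
The curly braces around the password follow the usual ODBC connection-string convention: a value wrapped in `{...}` is taken literally, even if it contains characters such as `;` or `=`. The sketch below illustrates this under the assumption that the name/value properties are joined into a conventional `name=value;...` string. digna's internal handling is not documented here, and the function names and the DSN `my_hive_dsn` are hypothetical.

```python
def brace(value):
    """Wrap a value in curly braces; a literal '}' inside is doubled."""
    return "{" + str(value).replace("}", "}}") + "}"


def build_connection_string(properties):
    """Join (name, value) pairs into a 'name=value;name=value' string."""
    return ";".join(f"{name}={value}" for name, value in properties)


properties = [
    ("DSN", "my_hive_dsn"),          # hypothetical DSN name
    ("PWD", brace("p;ss=word")),     # password containing ';' and '='
]

print(build_connection_string(properties))
```

Without the braces, the `;` inside the password would be read as the end of the `PWD` value, and the remainder would be parsed as a separate, invalid property.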

B. DSN-less Configuration¶

digna Configuration¶

In the "Create a Database Connection" screen, provide the following:

Name:               Name of the connection. This is used for referencing the connection in other screens.
Technology:         Apache Hive
Database Name:      Schema that contains the source data
Profiling Mode:     The profiling mode determines how digna processes data and calculates metrics:
                    - Standard: Metrics are calculated directly on the source tables without copying the data.
                    - Permanent: Data for the inspected day is copied into a permanent table, and metrics are calculated on the copied data.
                    - Session: Data is copied into a session or temporary table, and metrics are calculated on this temporary data.
Work Schema Name:   When using "Permanent" profiling mode, work tables will be placed in this schema.
Use ODBC:           Enabled

ODBC Properties¶

name: "DRIVER",     value: "Cloudera ODBC Driver for Apache Hive"
name: "HOST",       value: "your server name or IP address"
name: "PORT",       value: "Port number, e.g. 10000"
name: "Schema",     value: "Schema that contains the source data"
name: "UID",        value: "your Hive user"
name: "PWD",        value: "your Hive password"
name: "AuthMech",   value: "3"
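
To verify a DSN-less property set before entering it in digna, the properties can be tried directly against the driver. The following is a minimal sketch that assumes `pyodbc` is installed and that the driver is registered under exactly the name given in `DRIVER`. The host, schema, and credentials are placeholders, and the helper names are hypothetical. In the Cloudera driver, `AuthMech=3` selects user name and password authentication.

```python
DSNLESS_PROPERTIES = {
    "DRIVER": "Cloudera ODBC Driver for Apache Hive",
    "HOST": "hive.example.com",   # placeholder server name
    "PORT": "10000",
    "Schema": "default",          # placeholder schema
    "UID": "hive_user",           # placeholder user
    "PWD": "{secret}",            # password wrapped in curly braces
    "AuthMech": "3",              # user name and password
}


def to_connection_string(properties):
    """Join the properties into a 'name=value;...' ODBC connection string."""
    return ";".join(f"{name}={value}" for name, value in properties.items())


def check_odbc_connection(properties):
    """Connect through ODBC, run a trivial query, and return the first row."""
    import pyodbc  # imported lazily; requires pyodbc and the Hive ODBC driver

    # autocommit=True because Hive connections generally do not support transactions
    conn = pyodbc.connect(to_connection_string(properties), autocommit=True)
    try:
        return conn.cursor().execute("SELECT 1").fetchone()
    finally:
        conn.close()


# Example call (requires a reachable HiveServer2):
# check_odbc_connection(DSNLESS_PROPERTIES)
```

A "data source name not found" or "driver not found" error from this check usually means the `DRIVER` value does not match the name under which the driver was installed.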