Using Trino

The analytics platform provides Trino as a service for data analysis. Trino is a distributed query engine that accesses data stored on object storage through ANSI SQL and is also used for interactive queries and analysis.

Trino is integrated with enterprise authentication and authorization automation to ensure seamless access provisioning, with access ownership at the dataset level residing with the business unit that owns the data. With Trino resource management and tuning, 95% of queries complete in less than 10 seconds, which allows interactive UIs and dashboards to fetch data directly from Trino and avoids the data duplication that can occur when building multi-purpose data cubes.

DBeaver is a universal database administration tool for managing relational and NoSQL databases. Users can connect to Trino from DBeaver to perform SQL operations on Trino tables.

Using_Trino.png
Procedure. To configure Trino with Lyve Cloud:
  1. On the left-hand menu of the Platform Dashboard, select Services and then select New Services.

    1.png
  2. In the Create a new service dialogue, complete the following:

    • Basic Settings: Configure your service by entering the following details:

      • Service type: Select Trino from the list.

      • Service name: Enter a unique service name. This name is listed on the Services page.

      • Description: Enter the description of the service.

      • Enabled: The check box is selected by default. Selecting the option allows you to configure the Common and Custom parameters for the service.

      2.png
    • Common Parameters: Configure the memory and CPU resources for the service. The platform uses the default system values if you do not enter any values.

      Note

      When setting the resource limits, keep in mind that insufficient limits might cause queries to fail.

      • Memory: Provide a minimum and maximum memory based on requirements by analyzing the cluster size, resources and available memory on nodes. Trino uses memory only within the specified limit.

      • CPU: Provide a minimum and maximum number of CPUs based on the requirement by analyzing cluster size, resources and availability on nodes. Trino uses CPU only within the specified limit.

      • Priority Class: By default, the priority is selected as Medium. You can change it to High or Low.

      • Running User: Specifies the logged-in user ID.

      • Shared: Select the checkbox to share the service with other users.

      Common_Parameters.png
    • Custom Parameters: Configure the additional custom parameters for the Trino service.

      • Replicas: Configure the number of replicas or workers for the Trino service.

      • Enable Hive: Select the check box to enable Hive. Once enabled, you must enter the following:

        • Username: Enter the username of the platform (Lyve Cloud Compute) user creating and accessing Hive Metastore.

        • Container: Select big data from the list. This is the name of the container that contains the Hive Metastore.

        • Hive Metastore path: Specify the relative path to the Hive Metastore in the configured container.

        Common_Parameters-without_advance.png
  3. Select Create Service.

Procedure. To configure the Hive connector properties for the Trino service:
  1. On the Services page, select the Trino service to edit.

  2. Select ellipses and then select Edit.

    Edit_Trino_service.png
  3. On the Edit service dialog, select the Custom Parameters tab.

  4. Expand Advanced, and in the Predefined section, select the pencil icon to edit Hive.

    Edit_Hive_-_Copy__2_.png
  5. Specify the following in the properties file:

    hive.s3.aws-access-key

      The Lyve Cloud S3 access key is a private key used to authenticate when connecting to a bucket created in Lyve Cloud. The access key is displayed when you create a new service account in Lyve Cloud. A service account contains the bucket credentials that Lyve Cloud uses to access a bucket. For more information, see Creating service accounts.

    hive.s3.aws-secret-key

      The Lyve Cloud S3 secret key is the private key password used to authenticate when connecting to a bucket created in Lyve Cloud. The secret key is displayed when you create a new service account in Lyve Cloud. For more information, see Creating service accounts.

    hive.s3.endpoint

      The Lyve Cloud S3 endpoint used to connect to a bucket created in Lyve Cloud. For more information, see the S3 API endpoints.

    hive.s3.ssl.enabled

      Uses HTTPS to communicate with the Lyve Cloud API. By default, it is set to true.

    hive.s3.path-style-access

      Uses path-style access for all requests to buckets created in Lyve Cloud. This is for S3-compatible storage that doesn't support virtual-hosted-style access. By default, it is set to false.

For more information about other properties, see S3 configuration properties.
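As an illustration, a Hive catalog properties file combining the properties above might look like the following sketch. All values are placeholders, not values from this platform; substitute the access key, secret key, and S3 endpoint from your own Lyve Cloud service account and region.

# Illustrative Hive catalog properties (placeholder values)
hive.s3.aws-access-key=<lyve-cloud-access-key>
hive.s3.aws-secret-key=<lyve-cloud-secret-key>
hive.s3.endpoint=<lyve-cloud-s3-endpoint>
hive.s3.ssl.enabled=true
# Enable path-style access only if your buckets do not support virtual-hosted-style access
hive.s3.path-style-access=true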

Procedure. To configure advanced settings for the Trino service:
  1. On the Services page, select the Trino service to edit.

  2. Select ellipses and then select Edit.

    Edit_Trino_service.png
  3. On the Edit service dialog, select the Custom Parameters tab.

  4. Expand Advanced to edit the configuration files for the Coordinator and Worker.

    edit_service-advance.png
  5. You can edit the properties files for Coordinators and Workers. Select the Coordinator or Worker tab, and select the pencil icon to edit a predefined properties file.

    The following are the predefined properties files (a brief sketch of typical entries appears after this procedure):

    • Log properties: You can set the log level. For more information, see Log Levels.

    • JVM Config: Contains the command-line options used to launch the Java Virtual Machine. For more information, see JVM Config.

    • Config Properties: You can edit the advanced configuration for the Trino server. For more information, see Config properties.

    • Catalog Properties: You can edit the catalog configuration for connectors, which are available in the catalog properties file. For more information, see Catalog Properties.

    config_files.png
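For orientation, the kinds of entries these predefined files contain look like the sketch below. These are generic Trino settings, not the exact defaults shipped with the service; adjust them to your cluster.

# log.properties - set the log level for Trino classes
io.trino=INFO

# config.properties - example advanced server settings
query.max-memory=4GB
query.max-memory-per-node=1GB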

Assign a label to a node and configure Trino to use nodes with the same label, so that SQL queries on the Trino cluster run on the intended nodes. Node labels are provided during the Trino service configuration, and you can edit these labels later.

Procedure. To assign a node label to Trino:
  1. On the Services menu, select the Trino service and select Edit.

  2. In the Edit service dialogue, verify the Basic Settings and Common Parameters and select Next Step.

  3. In the Node Selection section under Custom Parameters, select Create a new entry.

    Trino_label-1.png
  4. Specify the Key and Value of nodes, and select Save Service.

    Trino_label-2.png

Note

The values in the image are for reference.
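If you have kubectl access to the platform's application cluster, you can list the labels that already exist on nodes before choosing a Key and Value. This is a hedged aside that assumes the node-selection entries correspond to Kubernetes node labels on the platform's application cluster.

# Show each node with its current labels
kubectl get nodes --show-labels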

Install DBeaver as a prerequisite before you connect to Trino from DBeaver.

Procedure. To connect to Trino from DBeaver
  1. Start DBeaver.

  2. In the Database Navigator panel, select New Database Connection.

  3. In the Connect to a database dialog, select All and type Trino in the search field.

  4. Select the Trino logo and select Next.

    Trino.png
  5. Select the Main tab and enter the following details:

    • Host: Enter the hostname or IP address of your Trino cluster coordinator.

    • Port: Enter the port number where the Trino server listens for a connection.

    • Database/Schema: Enter the database/schema name to connect.

    • Username: Enter the username for the Lyve Cloud Analytics by Iguazio console.

    • Password: Enter the valid password to authenticate the connection to Lyve Cloud Analytics by Iguazio.

    Connect_to_DB-Main_tab.png
  6. Select Driver properties and add the following properties; these correspond to the Trino JDBC URL parameters sketched after this procedure:

    • SSL: Set SSL to True.

    • SSL Verification: Set SSL verification to None.

    Driver_properties.png
  7. Select Test Connection.

    Sucess_mess_-_Trino_withDBeaver.png

    If the JDBC driver is not already installed, the Download driver files dialog opens, showing the latest available JDBC driver. Select and download the driver.

    Download_driver.png
  8. Select Finish once the testing is completed successfully.
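Behind the scenes, DBeaver assembles these settings into a Trino JDBC URL. A minimal sketch of an equivalent URL, assuming the standard Trino JDBC driver parameters for the SSL settings above (host, port, catalog, and schema are placeholders):

jdbc:trino://<coordinator-host>:<port>/<catalog>/<schema>?SSL=true&SSLVerification=NONE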

When you create a new Trino cluster, it can be challenging to predict the number of worker nodes needed in the future. The number of worker nodes should ideally be sized to both ensure efficient performance and avoid excess costs. Scaling helps achieve this balance by adjusting the number of worker nodes as workloads change over time.

Note

The Lyve Cloud analytics platform supports static scaling, meaning the number of worker nodes is held constant while the cluster is used.

Procedure. To scale Trino services
  1. On the left-hand menu of the Platform Dashboard, select Services.

    Scaling_Trino-1.png
  2. Select the ellipses against the Trino service and select Edit.

    Edit_Trino_service.png
  3. Skip Basic Settings and Common Parameters and proceed to configure Custom Parameters.

    • In the Custom Parameters section, enter the Replicas and select Save Service.

      Scaling_Trino-4.png

      Trino scaling is complete once you save the changes.

After you install Trino, the default configuration has no security features enabled. You can enable security features for different aspects of your Trino cluster.

Authentication in Trino

You can configure a preferred authentication provider, such as LDAP.

You can secure Trino access by integrating with LDAP. After completing the integration, you can establish the Trino coordinator UI and JDBC connectivity by providing LDAP user credentials.

To enable LDAP authentication for Trino, LDAP-related configuration changes need to be made on the Trino coordinator.

  1. On the left-hand menu of the Platform Dashboard, select Services.

    1-Se.png
  2. Select the ellipses against the Trino service and select Edit.

    2-se.png
  3. Skip Basic Settings and Common Parameters and proceed to configure Custom Parameters.

    • In the Advanced section, add the ldap.properties file for the Coordinator in the Custom section.

      3-se.png

      Configure password authentication to use LDAP in ldap.properties as shown below:

      password-authenticator.name=ldap
      ldap.url=ldaps://<ldap-server>:636
      ldap.user-bind-pattern=<Refer below for usage>
      ldap.user-base-dn=OU=Sites,DC=ad,DC=com

    ldap.url

      The URL to the LDAP server. The URL scheme must be ldap:// or ldaps://. Connecting to the LDAP server without TLS enabled requires ldap.allow-insecure=true.

    ldap.user-bind-pattern

      This property can be used to specify the LDAP user bind string for password authentication. It must contain the pattern ${USER}, which is replaced by the actual username during password authentication. The property can contain multiple patterns separated by a colon. Each pattern is checked in order until a login succeeds or all logins fail.

      For example: ${USER}@corp.example.com:${USER}@corp.example.co.uk

    ldap.user-base-dn

      The base LDAP distinguished name for the user trying to connect to the server.

      For example: OU=America,DC=corp,DC=example,DC=com

Authorization based on LDAP group membership

You can restrict the set of users allowed to connect to the Trino coordinator based on their group membership by setting the optional ldap.group-auth-pattern property.

In addition to the basic LDAP authentication properties, add the following properties to the ldap.properties file.

ldap.group-auth-pattern

  This property specifies the LDAP query used for LDAP group membership authorization. The query is executed against the LDAP server and, if successful, a user distinguished name is extracted from the query result. Trino validates the user password by creating an LDAP context with the user distinguished name and user password.

For more information about authorization properties, see Authorization based on LDAP group membership.
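As an illustration only, a group membership filter for an Active Directory style server could look like the following sketch; the group name and base DN are placeholders, not values from this platform.

ldap.group-auth-pattern=(&(objectClass=person)(sAMAccountName=${USER})(memberof=CN=TrinoUsers,OU=Groups,DC=corp,DC=example,DC=com))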

Add the ldap.properties file details to the config.properties file of the Coordinator using the password-authenticator.config-files=/presto/etc/ldap.properties property:
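The sketch below shows the relevant Coordinator entries. The http-server.authentication.type line is Trino's standard switch for enabling password authentication and is an assumption here, since only the config-files property is named above.

# Enable password-based (LDAP) authentication on the coordinator (assumed standard Trino setting)
http-server.authentication.type=PASSWORD
# Point the coordinator at the ldap.properties file added in the Custom section
password-authenticator.config-files=/presto/etc/ldap.properties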

Save changes to complete LDAP integration.

Note

You must configure one step at a time: always apply the changes on the dashboard after each change and verify the results before you proceed.

Once the Trino service is launched, create a web-based shell service to use Trino from the shell and run queries.

  1. On the left-hand menu of the Platform Dashboard, select Services and then select New Services.

    1.png
  2. In the Create a new service dialogue, complete the following:

    • Basic Settings: Configure your service by entering the following details:

      • Service type: Select Web-based shell from the list.

      • Service name: Enter a unique service name. This name is listed on the Services page.

      • Description: Enter the description of the service.

      • Enabled: The check box is selected by default. Selecting the option allows you to configure the Common and Custom parameters for the service.

      2.png
    • Common Parameters: Configure the memory and CPU resources for the service. The platform uses the default system values if you do not enter any values.

      Note

      When setting the resource limits, keep in mind that insufficient limits might cause queries to fail.

      • Memory: Provide a minimum and maximum memory based on requirements by analyzing the cluster size, resources and available memory on nodes. Web-based shell uses memory only within the specified limit.

      • CPU: Provide a minimum and maximum number of CPUs based on the requirement by analyzing cluster size, resources and availability on nodes. Web-based shell uses CPU only within the specified limit.

      • Priority Class: By default, the priority is selected as Medium. You can change it to High or Low.

      • Running User: Specifies the logged-in user ID.

      • Shared: Select the checkbox to share the service with other users.

      3.png
    • Custom Parameters: Configure the additional custom parameters for the Web-based shell service.

      • Spark: Assign the Spark service for which you want a web-based shell from the drop-down list.

      • Trino: Assign the Trino service for which you want a web-based shell from the drop-down list.

      • Service Account: A Kubernetes service account which determines the permissions for using the kubectl CLI to run commands against the platform's application clusters.

      4.png
  3. Select Create Service.

Sample queries in Trino

After you create a web-based shell with the Trino service, start the service, which opens a web-based shell terminal where you can execute shell commands.

  1. Select the web-based shell with the Trino service to launch the web-based shell.

    5.png
  2. The web-based shell launches in a new tab.

    6.png
  3. Enter the trino command to run queries and inspect catalog structures (see the sketch after these steps).

    7.png
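For example, the following commands inspect the available catalogs and schemas from the web-based shell (a minimal sketch; catalog and schema names depend on your configuration):

trino> SHOW CATALOGS;
trino> SHOW SCHEMAS FROM hive;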
Example 8. Creating a schema

Create a schema with a simple query: CREATE SCHEMA hive.test_123. After the schema is created, execute SHOW CREATE SCHEMA hive.test_123 to verify the schema.

trino> create schema hive.test_123;
CREATE SCHEMA

trino> show create schema hive.test_123;
             Create Schema
------------------------------------------------------------------
 CREATE SCHEMA hive.test_123
 AUTHORIZATION USER edpadmin
 WITH (
    location = 'v3io://projects/user/hive/warehouse/test_123.db'
 )


Example 9. Creating a sample table named employee

Create a sample table named employee using the CREATE TABLE statement.

trino> CREATE TABLE IF NOT EXISTS hive.test_123.employee
    -> (eid varchar, name varchar, salary varchar, destination varchar);

CREATE TABLE

trino> SHOW CREATE TABLE hive.test_123.employee;
             Create Table

---------------------------------------
CREATE TABLE hive.test_123.employee (
    eid varchar,
    name varchar,
    salary varchar,
    destination varchar
 )
 WITH (
    format = 'ORC'
 )


Example 10. Inserting data into the table

Insert sample data into the employee table with an INSERT statement.

trino> INSERT INTO hive.test_123.employee VALUES ('1201', 'Mark','45000','Technical manager');
trino> INSERT INTO hive.test_123.employee VALUES ('1202', 'Paul','45000','Technical writer');
trino> INSERT INTO hive.test_123.employee VALUES ('1203', 'Allen','40000','Hr Admin');
trino> INSERT INTO hive.test_123.employee VALUES ('1204', 'John','30000','Op Admin');


Example 11. Viewing data in the table

View data in the table with a SELECT statement.


trino> select * from hive.test_123.employee;

 eid  | name  | salary |    destination
------+-------+--------+-------------------
 1203 | Allen | 40000  | Hr Admin
 1201 | Mark  | 45000  | Technical manager
 1202 | Paul  | 45000  | Technical writer
 1204 | John  | 30000  | Op Admin
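Optionally, you can remove the sample table and schema when you finish experimenting:

trino> DROP TABLE hive.test_123.employee;
trino> DROP SCHEMA hive.test_123;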