SQL reference for Lyve Cloud

This reference contains a description of the structured query language (SQL) elements that are supported by Lyve Cloud S3 Select.

SELECT Command

Lyve Cloud S3 Select supports only the SELECT SQL command. The following ANSI standard clauses are supported for SELECT:

SELECT list: The SELECT list names the columns, functions, and expressions that you want the query to return. The list represents the output of the query.

SELECT *
SELECT projection [ AS column_alias | column_alias ] [, ...]

The first form with * (asterisk) returns every row that passed the WHERE clause, as-is. The second form creates a row with user-defined output scalar expressions (projection) for each column.

The FROM clause: Lyve Cloud S3 Select supports the following forms of the FROM clause:

FROM table_name
FROM table_name alias
FROM table_name AS alias

Where table_name refers to the S3Object being queried. Users coming from traditional relational databases can think of this as a database schema that contains multiple views over a table.

Following standard SQL, the FROM clause creates rows that are filtered in the WHERE clause and projected in the SELECT list.

For JSON objects that are stored in Lyve Cloud S3 Select, you can also use the following forms of the FROM clause:

FROM S3Object[*].path
FROM S3Object[*].path alias
FROM S3Object[*].path AS alias

Using this form of the FROM clause, you can select from arrays or objects within a JSON object. You can specify path using one of the following forms:

  • By name (in an object): .name or ['name']

  • By index (in an array): [index]

  • By wildcard (in an object): .*

  • By wildcard (in an array): [*]

This example shows results using the following dataset and queries:

{ "created": "936864000", "dir_name": "important_docs", "files": [ {"name": "."}, {"name": ".."}, {"name": ".aws"}, {"name": "downloads"} ], "owner": "Lyve Cloud S3" }
{ "created": "936864000", "dir_name": "other_docs", "files": [ {"name": "."}, {"name": ".."}, {"name": "my stuff"}, {"name": "backup"} ], "owner": "User" }

The following query returns these results:

SELECT d.dir_name, d.files FROM S3Object[*] d

{"dir_name":"important_docs","files":[{"name":"."},{"name":".."},{"name":".aws"},{"name":"downloads"}]}
{"dir_name":"other_docs","files":[{"name":"."},{"name":".."},{"name":"my stuff"},{"name":"backup"}]}

A query that selects dir_name and owner, such as SELECT d.dir_name, d.owner FROM S3Object[*] d, returns these results:

{"dir_name":"important_docs","owner":"Lyve Cloud S3"}
{"dir_name":"other_docs","owner":"User"}

WHERE clause: The WHERE clause filters rows based on the condition. A condition is an expression that has a Boolean result. Only rows for which the condition evaluates to TRUE are returned in the result.

WHERE condition
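
For example, assuming a CSV object that has a header row with columns named city and age (hypothetical names used only for illustration), a filter might look like the following sketch. CSV values are read as strings, so the comparison casts age to INT:

SELECT s.city FROM S3Object s WHERE CAST(s.age AS INT) > 30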

LIMIT clause: The LIMIT clause limits the number of records that the query returns to the specified number.

LIMIT number
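
For example, using the same hypothetical CSV object as above, the following sketch returns at most 10 matching records:

SELECT s.city FROM S3Object s WHERE CAST(s.age AS INT) > 30 LIMIT 10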

Attribute Access: The SELECT and WHERE clauses can refer to record data using one of the methods in the following sections, depending on whether the file that is being queried is in CSV or JSON format.

  • CSV

    • Column Numbers: You can refer to the Nth column of a row with the column name _N, where N is the column position. The position count starts at 1. For example, the first column is named _1 and the second column is named _2. You can refer to a column as _N or alias._N.

      For example, _2 and myAlias._2 are both valid ways to refer to a column in the SELECT list and WHERE clause (see the combined example after this list).

    • Column Headers: For objects in CSV format that have a header row, the headers are available in the SELECT list and WHERE clause. In particular, as in traditional SQL, within SELECT and WHERE clause expressions, you can refer to the columns by alias.column_name or column_name.

  • JSON

    • Document: You can access JSON document fields as alias.name. Nested fields can also be accessed; for example, alias.name1.name2.name3.

    • List: You can access elements in a JSON list using zero-based indexes with the [] operator. For example, you can access the second element of a list as alias[1]. Accessing list elements can be combined with fields as alias.name1.name2[1].name3.

    • Examples: Consider this JSON object as a sample dataset:

      {"name": "Susan Smith",
      "org": "engineering",
      "projects":
          [
               {"project_name":"project1", "completed":false},
               {"project_name":"project2", "completed":true}
          ]
      }
    Example 3. The following query returns these results:
    Select s.name from S3Object s
    {"name":"Susan Smith"}


    Example 4. The following query returns these results:
    Select s.projects[0].project_name from S3Object s    
    {"project_name":"project1"}
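
As a combined illustration of CSV column-number access, assume a CSV object without a header row whose second column holds a department name (a hypothetical layout). The following sketch returns the first and third columns of rows whose second column equals 'engineering':

SELECT s._1, s._3 FROM S3Object s WHERE s._2 = 'engineering'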


Case Sensitivity of Header/Attribute Names

With Lyve Cloud S3 Select, you can use double quotation marks to indicate that column headers (for CSV objects) and attributes (for JSON objects) are case sensitive. Without double quotation marks, object headers/attributes are case insensitive. An error is returned in case of ambiguity. The following examples use either:

  1. Lyve Cloud S3 objects in CSV format with the specified column header(s), and with FileHeaderInfo set to "Use" for the query request.

    or

  2. Lyve Cloud S3 object in JSON format with the specified attributes.

Example 5. The object being queried has header/attribute "NAME".

The following expression successfully returns values from the object (no quotation marks: case insensitive):

SELECT s.name from S3Object s

The following expression results in a 400 error MissingHeaderName (quotation marks: case sensitive):

SELECT s."name" from S3Object s


Example 6. The Lyve Cloud S3 object being queried has one header/attribute with "NAME" and another header/attribute with "name".

The following expression results in a 400 error AmbiguousFieldName (no quotation marks: case insensitive, but there are two matches):

SELECT s.name from S3Object s

The following expression successfully returns values from the object (quotation marks: case sensitive, so it resolves the ambiguity).

SELECT s."NAME" from S3Object s


Using Reserved Keywords as User-Defined Terms

Lyve Cloud S3 Select has a set of reserved keywords that are required to run the SQL expressions used to query object content. Reserved keywords include function names, data types, operators, and so on. In some cases, user-defined terms like the column headers (for CSV files) or attributes (for JSON objects) may clash with a reserved keyword. When this happens, you must use double quotation marks to indicate that you are intentionally using a user-defined term that clashes with a reserved keyword. Otherwise, a 400 parse error will result.
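
For example, assuming a CSV object whose header row includes a column named date (a hypothetical column name that clashes with the reserved keyword date), the header must be double-quoted in the expression:

SELECT s."date" FROM S3Object s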

Scalar Expressions

Within the WHERE clause and the SELECT list, you can have SQL scalar expressions, which are expressions that return scalar values. They can take the following forms:

  • Literal: An SQL literal.

  • column_reference: A reference to a column in the form column_name or alias.column_name.

  • unary_op expression: Where unary_op is an SQL unary operator.

  • expression binary_op expression: Where binary_op is an SQL binary operator.

  • func_name: Where func_name is the name of a scalar function to invoke.

  • expression [ NOT ] BETWEEN expression AND expression

  • expression LIKE expression [ ESCAPE expression ]
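
For example, the following sketches show BETWEEN and LIKE expressions in a WHERE clause, assuming hypothetical CSV columns named price and name:

SELECT * FROM S3Object s WHERE CAST(s.price AS FLOAT) BETWEEN 10 AND 20
SELECT * FROM S3Object s WHERE s.name LIKE '%Smith%'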

Data Types

Lyve Cloud S3 Select supports several primitive data types.

Data Type Conversions

The general rule is to follow the CAST function if defined. If CAST is not defined, then all input data is treated as a string, and it must be cast into the relevant data type when necessary.
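
For example, because input values are treated as strings unless cast, an arithmetic projection needs an explicit conversion. The following sketch assumes a hypothetical column named total:

SELECT CAST(s.total AS FLOAT) / 100 FROM S3Object s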

  • bool: True or False. Example: False

  • int, integer: 8-byte signed integer in the range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. Example: 100000

  • string: UTF-8 encoded variable-length string. The default limit is one character. The maximum character limit is 2,147,483,647. Example: xyz

  • float: 8-byte floating-point number. Example: CAST(0.456 AS FLOAT)

  • decimal, numeric: Base-10 number, with a maximum precision of 38 (that is, the maximum number of significant digits), and with a scale within the range of -2^31 to 2^31-1 (that is, the base-10 exponent). Note: Lyve Cloud S3 Select ignores scale and precision when both are provided at the same time. Example: 123.456

  • timestamp: Time stamps represent a specific moment in time, always include a local offset, and are capable of arbitrary precision. In the text format, time stamps follow the W3C note on date and time formats, but they must end with the literal "T" if they do not have at least whole-day precision. Fractional seconds are allowed, with at least one digit of precision and an unlimited maximum. Local-time offsets can be represented as either hour:minute offsets from UTC, or as the literal "Z" to denote a local time of UTC. Offsets are required on timestamps with a time component and are not allowed on date values. Example: CAST('2007-04-05T14:30Z' AS TIMESTAMP)

Operators

Lyve Cloud S3 Select supports the following operators.

  • Logical operators: AND, NOT, OR

  • Comparison operators: <, >, <=, >=, =, <>, !=, BETWEEN, IN (for example: IN ('a', 'b', 'c'))

  • Pattern matching operators: LIKE, _ (matches any character), % (matches any sequence of characters)

  • Unary operators: IS NULL, IS NOT NULL

  • Math operators: Addition, subtraction, multiplication, division, and modulo (+, -, *, /, %)
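
For example, the following sketch combines comparison, pattern matching, and logical operators, assuming hypothetical CSV columns named name, city, and age:

SELECT * FROM S3Object s WHERE s.name LIKE 'A%' AND (s.city = 'Boston' OR CAST(s.age AS INT) BETWEEN 21 AND 65)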

Operator Precedence

The following table shows the operators' precedence in decreasing order.

Operator/Element    Associativity    Description
-                                    unary minus
*, /, %             left             multiplication, division, modulo
+, -                left             addition, subtraction
IN                                   set membership
BETWEEN                              range containment
LIKE                                 string pattern matching
<, >                                 less than, greater than
=                   right            equality, assignment
NOT                 right            logical negation
AND                 left             logical conjunction
OR                  left             logical disjunction
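
For example, because AND binds more tightly than OR, the following two hypothetical filters are equivalent; parentheses are only needed to force a different grouping:

WHERE s.city = 'Boston' OR s.city = 'Salem' AND CAST(s.age AS INT) > 30
WHERE s.city = 'Boston' OR (s.city = 'Salem' AND CAST(s.age AS INT) > 30)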

Reserved Keywords

The following is the list of reserved keywords for Lyve Cloud S3 Select. These include function names, data types, operators, and so on, that are required to run the SQL expressions used to query object content.

absolute, action, add, all, allocate, alter, and, any, are, as, asc, assertion, at, authorization, avg, bag, begin, between, bit, bit_length, blob, bool, boolean, both, by, cascade, cascaded, case, cast, catalog, char, char_length, character, character_length, check, clob, close, coalesce, collate, collation, column, commit, connect, connection, constraint, constraints, continue, convert, corresponding, count, create, cross, current, current_date, current_time, current_timestamp, current_user, cursor, date, day, deallocate, dec, decimal, declare, default, deferrable, deferred, delete, desc, describe, descriptor, diagnostics, disconnect, distinct, domain, double, drop, else, end, end-exec, escape, except, exception, exec, execute, exists, external, extract, false, fetch, first, float, for, foreign, found, from, full, get, global, go, goto, grant, group, having, hour, identity, immediate, in, indicator, initially, inner, input, insensitive, insert, int, integer, intersect, interval, into, is, isolation, join, key, language, last, leading, left, level, like, limit, list, local, lower, match, max, min, minute, missing, module, month, names, national, natural, nchar, next, no, not, null, nullif, numeric, octet_length, of, on, only, open, option, or, order, outer, output, overlaps, pad, partial, pivot, position, precision, prepare, preserve, primary, prior, privileges, procedure, public, read, real, references, relative, restrict, revoke, right, rollback, rows, schema, scroll, second, section, select, session, session_user, set, sexp, size, smallint, some, space, sql, sqlcode, sqlerror, sqlstate, string, struct, substring, sum, symbol, system_user, table, temporary, then, time, timestamp, timezone_hour, timezone_minute, to, trailing, transaction, translate, translation, trim, true, tuple, union, unique, unknown, unpivot, update, upper, usage, user, using, value, values, varchar, varying, view, when, whenever, where, with, work, write, year, zone

SQL Functions

Lyve Cloud S3 Select supports several SQL functions.

Aggregate Functions

Lyve Cloud S3 Select supports the following aggregate functions.

  • AVG(expression): Argument types: INT, FLOAT, DECIMAL. Return type: DECIMAL for an INT argument, FLOAT for a floating-point argument; otherwise, the same as the argument data type.

  • COUNT: Return type: INT.

  • MAX(expression): Argument types: INT, DECIMAL. Return type: same as the argument type.

  • MIN(expression): Argument types: INT, DECIMAL. Return type: same as the argument type.

  • SUM(expression): Argument types: INT, FLOAT, DOUBLE, DECIMAL. Return type: INT for an INT argument, FLOAT for a floating-point argument; otherwise, the same as the argument data type.
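
For example, assuming a hypothetical CSV object with a header row that includes category and price columns, the following sketch counts matching rows and averages their price:

SELECT COUNT(*), AVG(CAST(s.price AS FLOAT)) FROM S3Object s WHERE s.category = 'book'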

Conditional Functions

Lyve Cloud S3 Select supports the following conditional functions.

CASE

The CASE expression is a conditional expression, similar to if/then/else statements found in other languages. CASE is used to specify a result when there are multiple conditions. There are two types of CASE expressions: simple and searched.

  • In simple CASE expressions, an expression is compared with a value. When a match is found, the specified action in the THEN clause is applied. If no match is found, the action in the ELSE clause is applied.

  • In searched CASE expressions, each CASE is evaluated based on a Boolean expression, and the CASE statement returns the first matching CASE. If no matching CASEs are found among the WHEN clauses, the action in the ELSE clause is returned.

Syntax

Simple CASE statement used to match conditions:

CASE expression
WHEN value THEN result
[WHEN...]
[ELSE result]
END

Searched CASE statement used to evaluate each condition:

CASE
WHEN boolean condition THEN result
[WHEN ...]
[ELSE result]
END
Example 7. Use a simple CASE expression to replace New York City with Big Apple in a query. Replace all other city names with other.
select venuecity,
case venuecity
when 'New York City'
then 'Big Apple' else 'other'
end from venue
order by venueid desc;
venuecity        |   case
-----------------+-----------
Los Angeles      | other
New York City    | Big Apple
San Francisco    | other
Baltimore        | other
...
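
A searched CASE evaluates a Boolean condition in each WHEN clause. The following sketch assumes a hypothetical numeric column named qty:

select case
when CAST(s.qty AS INT) = 0 then 'none'
when CAST(s.qty AS INT) < 10 then 'low'
else 'ok'
end from S3Object s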


COALESCE

Evaluates the arguments in order and returns the first value that is not unknown, that is, the first value that is neither null nor missing. This function does not propagate null and missing.

Syntax

COALESCE ( expression, expression, ... )

Parameters

Expression: The target expression that the function operates on.

COALESCE(1)                -- 1
COALESCE(null)             -- null
COALESCE(null, null)       -- null
COALESCE(missing)          -- null
COALESCE(missing, missing) -- null
COALESCE(1, null)          -- 1
COALESCE(null, null, 1)    -- 1
COALESCE(null, 'string')   -- 'string'
COALESCE(missing, 1)       -- 1
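
In a query, COALESCE is typically used to supply a default when an attribute is null or missing in some records. The following sketch assumes hypothetical JSON attributes named nickname and name:

SELECT COALESCE(s.nickname, s.name, 'unknown') FROM S3Object s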

NULLIF

Given two expressions, it returns NULL if the two expressions evaluate to the same value; otherwise, it returns the result of evaluating the first expression.

Syntax

NULLIF ( expression1, expression2 )

Parameters

expression1, expression2: The target expressions that the function operates on.

NULLIF(1, 1)             -- null
NULLIF(1, 2)             -- 1
NULLIF(1.0, 1)           -- null
NULLIF(1, '1')           -- 1
NULLIF([1], [1])         -- null
NULLIF(1, NULL)          -- 1
NULLIF(NULL, 1)          -- null
NULLIF(null, null)       -- null
NULLIF(missing, null)    -- null
NULLIF(missing, missing) -- null

Conversion Functions

CAST

The CAST function converts an entity, such as an expression that evaluates to a single value, from one type to another.

Syntax

CAST ( expression AS data_type )

Parameters

  • Expression: A combination of one or more values, operators, and SQL functions that evaluate to a value.

  • data_type: The target data type, such as INT, to cast the expression to. For a list of supported data types, see Data Types.

CAST('2007-04-05T14:30Z' AS TIMESTAMP)
CAST(0.456 AS FLOAT)

Date Functions

DATE_ADD

Given a date part, a quantity, and a timestamp, returns an updated timestamp by altering the date part by the quantity.

Syntax

DATE_ADD( date_part, quantity, timestamp )

Parameters

The following table describes the parameters used to update the timestamp.

Parameter

Description

date_part

Specifies which part of the date to modify. This can be one of the following:

  • year

  • month

  • day

  • hour

  • minute

  • second

quantity

The value to apply to the updated timestamp. Positive values for quantity add to the time stamp's date_part, and negative values subtract.

Timestamp

The target timestamp that the function operates on.

DATE_ADD(year, 5, `2010-01-01T`)                -- 2015-01-01 (equivalent to 2015-01-01T)
DATE_ADD(month, 1, `2010T`)                     -- 2010-02T (result will add precision as necessary)
DATE_ADD(month, 13, `2010T`)                    -- 2011-02T
DATE_ADD(day, -1, `2017-01-10T`)                -- 2017-01-09 (equivalent to 2017-01-09T)
DATE_ADD(hour, 1, `2017T`)                      -- 2017-01-01T01:00-00:00
DATE_ADD(hour, 1, `2017-01-02T03:04Z`)          -- 2017-01-02T04:04Z
DATE_ADD(minute, 1, `2017-01-02T03:04:05.006Z`) -- 2017-01-02T03:05:05.006Z
DATE_ADD(second, 1, `2017-01-02T03:04:05.006Z`) -- 2017-01-02T03:04:06.006Z

DATE_DIFF

Given a date part and two valid timestamps, returns the difference in date parts. The return value is a negative integer when the date_part value of timestamp1 is greater than the date_part value of timestamp2. The return value is a positive integer when the date_part value of timestamp1 is less than the date_part value of timestamp2.

Syntax  

DATE_DIFF( date_part, timestamp1, timestamp2 )  

Parameters

The following table describes the parameters used to find the difference in dates.

Parameter

Description

date_part

Specifies which part of the timestamps to compare. For the definition of date_part, see DATE_ADD.

timestamp1

The first timestamp to compare.

timestamp2

The second timestamp to compare.

DATE_DIFF(year, `2010-01-01T`, `2011-01-01T`)            -- 1
DATE_DIFF(month, `2010T`, `2010-05T`)                    -- 4 (2010T is equivalent to 2010-01-01T00:00:00.000Z)
DATE_DIFF(month, `2010T`, `2011T`)                       -- 12
DATE_DIFF(month, `2011T`, `2010T`)                       -- -12
DATE_DIFF(day, `2010-01-01T23:00`, `2010-01-02T01:00`) -- 0 (need to be at least 24h apart to be 1 day apart)

EXTRACT

Given a date part and a timestamp, returns the timestamp's date part value.

Syntax

EXTRACT( date_part FROM timestamp )  

Parameters

The following table describes the parameters used to extract the timestamp.

Parameter

Description

date_part

Specifies which part of the timestamp to extract. This can be one of the following:

  • year

  • month

  • day

  • hour

  • minute

  • second

  • timezone_hour

  • timezone_minute

Timestamp

The target timestamp that the function operates on.

EXTRACT(YEAR FROM `2010-01-01T`)                           -- 2010
EXTRACT(MONTH FROM `2010T`)                                -- 1 (equivalent to 2010-01-01T00:00:00.000Z)
EXTRACT(MONTH FROM `2010-10T`)                             -- 10
EXTRACT(HOUR FROM `2017-01-02T03:04:05+07:08`)             -- 3
EXTRACT(MINUTE FROM `2017-01-02T03:04:05+07:08`)           -- 4
EXTRACT(TIMEZONE_HOUR FROM `2017-01-02T03:04:05+07:08`)    -- 7
EXTRACT(TIMEZONE_MINUTE FROM `2017-01-02T03:04:05+07:08`)  -- 8

TO_STRING

Given a timestamp and a format pattern, returns a string representation of the timestamp in the given format.

Syntax

TO_STRING ( timestamp time_format_pattern )  

Parameters

The following table describes the parameters used to represent the timestamp in a given format.

Parameter

Description

timestamp

The target timestamp that the function operates on.

time_format_pattern

A string of format pattern characters, as shown in the following examples.

TO_STRING(`1969-07-20T20:18Z`,  'MMMM d, y')                    -- "July 20, 1969"
TO_STRING(`1969-07-20T20:18Z`, 'MMM d, yyyy')                   -- "Jul 20, 1969"
TO_STRING(`1969-07-20T20:18Z`, 'M-d-yy')                        -- "7-20-69"
TO_STRING(`1969-07-20T20:18Z`, 'MM-d-y')                        -- "07-20-1969"
TO_STRING(`1969-07-20T20:18Z`, 'MMMM d, y h:m a')               -- "July 20, 1969 8:18 PM"
TO_STRING(`1969-07-20T20:18Z`, 'y-MM-dd''T''H:m:ssX')           -- "1969-07-20T20:18:00Z"
TO_STRING(`1969-07-20T20:18+08:00Z`, 'y-MM-dd''T''H:m:ssX')     -- "1969-07-20T20:18:00Z"
TO_STRING(`1969-07-20T20:18+08:00`, 'y-MM-dd''T''H:m:ssXXXX')   -- "1969-07-20T20:18:00+0800"
TO_STRING(`1969-07-20T20:18+08:00`, 'y-MM-dd''T''H:m:ssXXXXX')  -- "1969-07-20T20:18:00+08:00"

TO_TIMESTAMP  

Given a string, converts it into a timestamp.

This is the inverse operation of TO_STRING.  

Syntax

TO_TIMESTAMP ( string )  

Parameters

The following table describes the parameters used to convert a string to a timestamp.

Parameter

Description

string

The target string that the function operates on.

TO_TIMESTAMP('2007T')                         -- `2007T`
TO_TIMESTAMP('2007-02-23T12:14:33.079-08:00') -- `2007-02-23T12:14:33.079-08:00`

UTCNOW

Returns the current time in UTC as a timestamp.

Syntax

UTCNOW()  

Parameters

None

UTCNOW() -- 2017-10-13T16:02:11.123Z
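
The date functions can be combined in a filter. The following sketch keeps records whose hypothetical created attribute (an ISO 8601 timestamp stored as a string) is less than 30 days old:

SELECT * FROM S3Object s WHERE DATE_DIFF(day, CAST(s.created AS TIMESTAMP), UTCNOW()) < 30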

String Functions

CHAR_LENGTH, CHARACTER_LENGTH

Counts the number of characters in the specified string.

Syntax

CHAR_LENGTH ( string )

Parameters

The following table describes the parameters used to count the characters in a string.

Parameter

Description

string

The target string that the function operates on.

CHAR_LENGTH('')          -- 0
CHAR_LENGTH('abcdefg')   -- 7

LOWER

Given a string, it converts all uppercase characters to lowercase characters. Any non-uppercased characters remain unchanged.  

Syntax

LOWER ( string )

Parameters

The following table describes the parameters used to convert all upper case characters to lower case.

Parameter

Description

string

The target string that the function operates on.

LOWER('AbCdEfG!@#$') -- 'abcdefg!@#$'

SUBSTRING  

Given a string, a start index, and optionally a length, returns the substring from the start index up to the end of the string, or up to the length provided.

Note

The first character of the input string has index 1. If the start is <1, it is set to 1.

Syntax

SUBSTRING( string FROM start [ FOR length ] )  

Parameters

The following table describes the parameters that return the substring.

Parameter

Description

string

The target string that the function operates on.

start

The start position of the string.

length

The length of the substring to return. If not present, the substring extends to the end of the string.

SUBSTRING("123456789", 0)      -- "123456789"
SUBSTRING("123456789", 1)      -- "123456789"
SUBSTRING("123456789", 2)      -- "23456789"
SUBSTRING("123456789", -4)     -- "123456789"
SUBSTRING("123456789", 0, 999) -- "123456789" 
SUBSTRING("123456789", 1, 5)   -- "12345"

TRIM

Trims leading or trailing characters from a string. The default character to remove is ' '.  

Syntax

TRIM ( [[LEADING | TRAILING | BOTH remove_chars] FROM] string )

Parameters

The following table describes the parameters used to trim characters from a string.

Parameter

Description

string

The target string that the function operates on.

LEADING | TRAILING | BOTH

Whether to trim leading or trailing characters or both leading and trailing characters.

remove_chars

The set of characters to remove. Note that remove_chars can be a string with length >1.

This function returns the string with any characters from remove_chars removed from the beginning and end of the string.

TRIM('       foobar         ')               -- 'foobar'
TRIM('      \tfoobar\t         ')            -- '\tfoobar\t'
TRIM(LEADING FROM '       foobar         ')  -- 'foobar         '
TRIM(TRAILING FROM '       foobar         ') -- '       foobar'
TRIM(BOTH FROM '       foobar         ')     -- 'foobar'
TRIM(BOTH '12' FROM '1112211foobar22211122') -- 'foobar'

UPPER

Given a string, it converts all lowercase characters to uppercase characters. Any non-lowercased characters remain unchanged.

Syntax

UPPER ( string )

Parameters

The following table describes the parameters used to convert lowercase characters to uppercase.

Parameter

Description

string

The target string that the function operates on.

UPPER('AbCdEfG!@#$') -- 'ABCDEFG!@#$'
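
The string functions compose naturally. The following sketch normalizes a hypothetical name column by trimming whitespace and uppercasing it, skipping empty values:

SELECT UPPER(TRIM(s.name)) FROM S3Object s WHERE CHAR_LENGTH(TRIM(s.name)) > 0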