SQL reference for Lyve Cloud
This reference contains a description of the structured query language (SQL) elements that are supported by Lyve Cloud S3 Select.
SELECT Command
Lyve Cloud S3 supports only the SELECT: SQL command. The following ANSI standard clauses are supported for SELECT
:
SELECT list: The SELECT list names the columns, functions, and expressions that you want the query to return. The list represents the output of the query.
SELECT * SELECT projection [ AS column_alias | column_alias ] [, ...]
The first form with * (asterisk)
returns every row that passed the WHERE clause, as-is. The second form creates a row with user-defined output scalar expressions projection for each column.
The FROM clause: Lyve Cloud S3 Select supports the following forms of the FROM clause:
FROM table_name FROM table_name alias FROM table_name AS alias
Where table_name is referring to the S3Object being queried over. Users using traditional relational databases can use this as a database schema that contains multiple views over a table.
Following standard SQL, the FROM clause creates rows that are filtered in the WHERE clause and projected in the SELECT list.
For JSON objects that are stored in Lyve Cloud S3 Select, you can also use the following forms of the FROM clause:
FROM S3Object[*].path FROM S3Object[*].path alias FROM S3Object[*].path AS alias
Using this form of the FROM clause, you can select from arrays or objects within a JSON object. You can specify path using one of the following forms:
By name (in an object): .name or ['name']
By index (in an array): [index]
By wildcard (in an object): .*
By wildcard (in an array): [*]
This example shows results using the following dataset and queries:
{ "created": "936864000", "dir_name": "important_docs", "files": [ {"name": "."}, {"name": ".."}, {"name": ".aws"}, {"name": "downloads"} ], "owner": "Lyve Cloud S3" } { "created": "936864000", "dir_name": "other_docs", "files": [ {"name": "."}, {"name": ".."}, {"name": "my stuff"}, {"name": "backup"} ], "owner": "User" }
SELECT d.dir_name, d.files FROM S3Object[*] d
{"dir_name":"important_docs","files":[{"name":"."},{"name":".."},{"name":".Lyve Cloud"},{"name":"downloads"}]} {"dir_name":"other_docs","files":[{"name":"."},{"name":".."},{"name":"my stuff"},{"name":"backup"}]}
{"dir_name":"important_docs","owner":"LYVE CLOUD S3"} {"dir_name":"other_docs","owner":"User"}
WHERE clause: The WHERE clause filters rows based on the condition. A condition is an expression that has a Boolean result. Only rows for which the condition evaluates to TRUE are returned in the result.
WHERE condition
LIMIT clause: The LIMIT clause limits the number of records that you want the query to return based on the number.
Limit number
Attribute Access: The SELECT
and WHERE
clauses can refer to record data using one of the methods in the following sections, depending on whether the file that is being queried is in CSV or JSON format.
CSV
Column Numbers: You can refer to the Nth column of a row with the column name _N, where N is the column position. The position count starts at 1. For example, the first column is named _1 and the second column is named _2. You can refer to a column as _N or alias._N.
For example, _2 and myAlias._2 are both valid ways to refer to a column in the SELECT list and WHERE clause.
Column Headers: For objects in CSV format that have a header row, the headers are available in the SELECT list and WHERE clause. In particular, as in traditional SQL, within SELECT and WHERE clause expressions, you can refer to the columns by alias.column_name or column_name.
JSON
Document: You can access JSON document fields as alias.name. Nested fields can also be accessed; for example, alias.name1.name2.name3.
List: You can access elements in a JSON list using zero-based indexes with the [] operator. For example, you can access the second element of a list as alias[1]. Accessing list elements can be combined with fields as alias.name1.name2[1].name3.
Examples: Consider this JSON object as a sample dataset:
{"name": "Susan Smith", "org": "engineering", "projects": [ {"project_name":"project1", "completed":false}, {"project_name":"project2", "completed":true} ] }
Example 3. The following query returns these results:Select s.name from S3Object s
{"name":"Susan Smith"}
Example 4. The following query returns these results:Select s.projects[0].project_name from S3Object s
{"project_name":"project1"}
Case Sensitivity of Header/Attribute Names
With Lyve Cloud S3 Select, you can use double quotation marks to indicate that column headers (for CSV objects) and attributes (for JSON objects) are case sensitive. Without double quotation marks, object headers/attributes are case insensitive. An error is displayed in case of ambiguity. The following examples are:
Lyve Cloud S3 objects in CSV format with the specified column header(s), and with FileHeaderInfo set to "Use" for the query request.
or
Lyve Cloud S3 object in JSON format with the specified attributes.
The following expression successfully returns values from the object (no quotation marks: case insensitive):
SELECT s.name from S3Object s
The following expression results in a 400 error MissingHeaderName (quotation marks: case sensitive):
SELECT s."name" from S3Object s
The following expression results in a 400 error AmbiguousFieldName (no quotation marks: case insensitive, but there are two matches): SELECT s.name from S3Object s
The following expression successfully returns values from the object (quotation marks: case sensitive, so it resolves the ambiguity).
SELECT s."NAME" from S3Object s
Using Reserved Keywords as User-Defined Terms
Lyve Cloud S3 Select has a set of reserved keywords that are required to run the SQL expressions used to query object content. Reserved keywords include function names, data types, operators, and so on. In some cases, user-defined terms like the column headers (for CSV files) or attributes (for JSON objects) may clash with a reserved keyword. When this happens, you must use double quotation marks to indicate that you are intentionally using a user-defined term that clashes with a reserved keyword. Otherwise, a 400 parse error will result.
Scalar Expressions
Within the WHERE clause and the SELECT list, you can have SQL scalar expressions, which are expressions that return scalar values. They have the following form:
Literal: An SQL literal.
column_reference: A reference to a column in the form column_name or alias.column_name.
unary_op expression: Where unary_op unary is an SQL unary operator.
expression binary_op expression: Where binary_op is an SQL binary operator.
func_name: Where func_name is the name of a scalar function to invoke.
expression [ NOT ] BETWEEN expression AND expression
expression LIKE expression [ ESCAPE expression ]
Data Types
Lyve Cloud S3 Select support several primitive data types.
Data Type Conversions
The general rule is to follow the CAST function if defined. If CAST is not defined, then all input data is treated as a string. It must be cast into the relevant data types when necessary.
Name | Description | Examples |
---|---|---|
bool | True or False | False |
int, integer | 8-byte signed integer in the range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. | 100000 |
string | UTF8-encoded variable-length string. The default limit is one character. The maximum character limit is 2,147,483,647. | xyz |
float | 8-byte floating-point number. | CAST(0.456 AS FLOAT) |
decimal, numeric | Base-10 number, with a maximum precision of 38 (that is, the maximum number of significant digits), and with a scale within the range of -231 to 231-1 (that is, the base-10 exponent). NoteLyve Cloud S3 Select ignores scale and precision when both are provided at the same time. | 123.456 |
timestamp | Time stamps represent a specific moment in time, always include a local offset, and are capable of arbitrary precision. In the text format, time stamps follow the W3C note on date and time formats, but they must end with the literal "T" if not at least whole-day precision. Fractional seconds are allowed, with at least one digit of precision, and an unlimited maximum. Local-time offsets can be represented as either hour: minute offsets from UTC, or as the literal "Z" to denote a local time of UTC. They are required on timestamps with time and are not allowed on date values. | CAST('2007-04-05T14:30Z' AS TIMESTAMP) |
Operators
Lyve Cloud S3 support the following operators.
Operator | Elements |
---|---|
Logical Operators | AND, NOT, OR |
Comparison Operators | <, >, <=, >=, =, <>, !=, BETWEEN, IN (For example: IN ('a', 'b', 'c')) |
Pattern Matching Operators | LIKE, _ (Matches any character), % (Matches any sequence of characters) |
Unitary Operators | IS NULL, IS NOT NULL |
Math Operators | Addition, subtraction, multiplication, division, and modulo are supported. (+, -, *, /, %, ) |
Operator Precedence
The following table shows the operators' precedence in decreasing order.
Operator/Element | Associativity | Required |
---|---|---|
- | unary minus | |
*, /, % | left | multiplication, division, modulo |
+, - | left | addition, subtraction |
IN | set membership | |
BETWEEN | range containment | |
LIKE | string pattern matching | |
<> | less than, greater than | |
= | right | equality, assignment |
NOT | right | logical negation |
AND | left | logical conjunction |
OR | left | logical disjunction |
Reserved Keyword
Below is the list of reserved keywords for Lyve Cloud S3 Select. These include function names, data types, operators, etc., that is required to run the SQL expressions used to query object content.
absolute, action, add, all, allocate, alter, and, any, are, as, asc, assertion, at, authorization, avg, bag, begin, between, bit, bit_length, blob, bool, boolean, both, by, cascade, cascaded, case, cast, catalog, char, char_length, character, character_length, check, clob, close, coalesce, collate, collation, column, commit, connect, connection, constraint, constraints, continue, convert, corresponding, count, create, cross, current, current_date, current_time, current_timestamp, current_user, cursor, date, day, deallocate, dec, decimal, declare, default, deferrable, deferred, delete, desc, describe, descriptor, diagnostics, disconnect, distinct, domain, double, drop, else, end, end-exec, escape, except, exception, exec, execute, exists, external, extract, false, fetch, first, float, for, foreign, found, from, full, get, global, go, goto, grant, group, having, hour, identity, immediate, in, indicator, initially, inner, input, insensitive, insert, int, integer, intersect, interval, into, is, isolation, join, key, language, last, leading, left, level, like, limit, list, local, lower, match, max, min, minute, missing, module, month, names, national, natural, nchar, next, no, not, null, nullif, numeric, octet_length, of, on, only, open, option, or, order, outer, output, overlaps, pad, partial, pivot, position, precision, prepare, preserve, primary, prior, privileges, procedure, public, read, real, references, relative, restrict, revoke, right, rollback, rows, schema, scroll, second, section, select, session, session_user, set, sexp, size, smallint, some, space, sql, sqlcode, sqlerror, sqlstate, string, struct, substring, sum, symbol, system_user, table, temporary, then, time, timestamp, timezone_hour, timezone_minute, to, trailing, transaction, translate, translation, trim, true, tuple, union, unique, unknown, unpivot, update, upper, usage, user, using, value, values, varchar, varying, view, when, whenever, where, with, work, write, year, zone
SQL Functions
Lyve Cloud S3 Select supports several SQL functions.
Aggregate Functions
Lyve Cloud S3 Select supports the following aggregate functions.
Function | Argument Type | Return Type |
---|---|---|
AVG(expression) | INT, FLOAT, DECIMAL | DECIMAL for an INT argument, FLOAT for a floating-point argument; otherwise the same as the argument data type. |
COUNT | - | INT |
MAX(expression) | INT, DECIMAL | Same as the argument type. |
MIN(expression) | INT, DECIMAL | Same as the argument type. |
SUM(expression) | INT, FLOAT, DOUBLE, DECIMAL | INT for INT argument, FLOAT for a floating-point argument; otherwise, the same as the argument data type. |
Conditional Functions
Lyve Cloud S3 Select supports the following conditional functions.
CASE
The CASE expression is a conditional expression, similar to if/then/else statements found in other languages. CASE is used to specify a result when there are multiple conditions. There are two types of CASE expressions: simple and searched.
In simple CASE expressions, an expression is compared with a value. When a match is found, the specified action in the THEN clause is applied. If no match is found, the action in the ELSE clause is applied.
In searched CASE expressions, each CASE is evaluated based on a Boolean expression, and the CASE statement returns the first matching CASE. If no matching CASEs are found among the WHEN clauses, the action in the ELSE clause is returned.
Syntax
Simple CASE statement used to match conditions:
CASE expression WHEN value THEN result [WHEN...] [ELSE result] END
Searched CASE statement used to evaluate each condition:
CASE WHEN boolean condition THEN result [WHEN ...] [ELSE result] END
select venuecity, case venuecity when 'New York City' then 'Big Apple' else 'other' end from venue order by venueid desc; venuecity | case -----------------+----------- Los Angeles | other New York City | Big Apple San Francisco | other Baltimore | other ...
COALESCE
Evaluate the arguments in order and returns the first non-unknown, that is, the first non-null or non-missing. This function does not propagate null and missing.
Syntax
COALESCE ( expression, expression, ... )
Parameters
Expression: The target expression that the function operates on.
COALESCE(1) -- 1 COALESCE(null) -- null COALESCE(null, null) -- null COALESCE(missing) -- null COALESCE(missing, missing) -- null COALESCE(1, null) -- 1 COALESCE(null, null, 1) -- 1 COALESCE(null, 'string') -- 'string' COALESCE(missing, 1) -- 1
NULLIF
Given two expressions, it returns NULL if the two expressions evaluate to the same value; otherwise, it returns the result of evaluating the first expression.
Syntax
NULLIF ( expression1, expression2 )
Parameters
expression1, expression2: The target expressions that the function operates on.
NULLIF(1, 1) -- null NULLIF(1, 2) -- 1 NULLIF(1.0, 1) -- null NULLIF(1, '1') -- 1 NULLIF([1], [1]) -- null NULLIF(1, NULL) -- 1 NULLIF(NULL, 1) -- null NULLIF(null, null) -- null NULLIF(missing, null) -- null NULLIF(missing, missing) -- null
Conversion Functions
CAST
The CAST function converts an entity, such as an expression that evaluates to a single value, from one type to another.
Syntax
CAST ( expression AS data_type )
Parameters
Expression: A combination of one or more values, operators, and SQL functions that evaluate to a value.
data_type: The target data type, such as INT, to cast the expression to. For a list of supported data types, see Data Types.
CAST('2007-04-05T14:30Z' AS TIMESTAMP) CAST(0.456 AS FLOAT)
Date Functions
DATE_ADD
Given a date part, a quantity, and a timestamp, returns an updated timestamp by altering the date part by the quantity.
Syntax
DATE_ADD( date_part, quantity, timestamp )
Parameters
The following table describes the parameters used to update the timestamp.
Parameter | Description |
---|---|
date_part | Specifies which part of the date to modify. This can be one of the following:
|
quantity | The value to apply to the updated timestamp. Positive values for quantity add to the time stamp's date_part, and negative values subtract. |
Timestamp | The target timestamp that the function operates on. |
DATE_ADD(year, 5, `2010-01-01T`) -- 2015-01-01 (equivalent to 2015-01-01T) DATE_ADD(month, 1, `2010T`) -- 2010-02T (result will add precision as necessary) DATE_ADD(month, 13, `2010T`) -- 2011-02T DATE_ADD(day, -1, `2017-01-10T`) -- 2017-01-09 (equivalent to 2017-01-09T) DATE_ADD(hour, 1, `2017T`) -- 2017-01-01T01:00-00:00 DATE_ADD(hour, 1, `2017-01-02T03:04Z`) -- 2017-01-02T04:04Z DATE_ADD(minute, 1, `2017-01-02T03:04:05.006Z`) -- 2017-01-02T03:05:05.006Z DATE_ADD(second, 1, `2017-01-02T03:04:05.006Z`) -- 2017-01-02T03:04:06.006Z
DATE_DIFF
Given a date part and two valid timestamps returns the difference in date parts. The return value is a negative integer when the date_part value of timestamp1 is greater than the date_part value of timestamp2. The return value is a positive integer when the date_part value of timestamp1 is less than the date_part value of timestamp2.
Syntax
DATE_DIFF( date_part, timestamp1, timestamp2 )
Parameters
The following table describes the parameters used to find the difference in dates.
Parameter | Description |
---|---|
date_part | Specifies which part of the timestamps to compare. For the definition of date_part, see DATE_ADD. |
timestamp1 | The first timestamp to compare. |
timestamp2 | The second timestamp to compare. |
DATE_DIFF(year, `2010-01-01T`, `2011-01-01T`) -- 1 DATE_DIFF(year, `2010T`, `2010-05T`) -- 4 (2010T is equivalent to 2010-01-01T00:00:00.000Z) DATE_DIFF(month, `2010T`, `2011T`) -- 12 DATE_DIFF(month, `2011T`, `2010T`) -- -12 DATE_DIFF(day, `2010-01-01T23:00`, `2010-01-02T01:00`) -- 0 (need to be at least 24h apart to be 1 day apart)
EXTRACT
Given a date part and a timestamp returns the timestamp's date part value.
Syntax
EXTRACT( date_part FROM timestamp )
Parameters
The following table describes the parameters used to extract the timestamp.
Parameter | Description |
---|---|
date_part | Specifies which part of the date to modify. This can be one of the following: hour
|
Timestamp | The target timestamp that the function operates on. |
EXTRACT(YEAR FROM `2010-01-01T`) -- 2010 EXTRACT(MONTH FROM `2010T`) -- 1 (equivalent to 2010-01-01T00:00:00.000Z) EXTRACT(MONTH FROM `2010-10T`) -- 10 EXTRACT(HOUR FROM `2017-01-02T03:04:05+07:08`) -- 3 EXTRACT(MINUTE FROM `2017-01-02T03:04:05+07:08`) -- 4 EXTRACT(TIMEZONE_HOUR FROM `2017-01-02T03:04:05+07:08`) -- 7 EXTRACT(TIMEZONE_MINUTE FROM `2017-01-02T03:04:05+07:08`) -- 8
TO_STRING
Given a timestamp and a format pattern returns a string representation of the time stamp in the given format.
Syntax
TO_STRING ( timestamp time_format_pattern )
Parameters
The following table describes the parameters used to represent the timestamp in a given format
Parameter | Description |
---|---|
timestamp | The target timestamp that the function operates on. |
time_format_pattern | A string that has the following special character interpretations. |
TO_STRING(`1969-07-20T20:18Z`, 'MMMM d, y') -- "July 20, 1969" TO_STRING(`1969-07-20T20:18Z`, 'MMM d, yyyy') -- "Jul 20, 1969" TO_STRING(`1969-07-20T20:18Z`, 'M-d-yy') -- "7-20-69" TO_STRING(`1969-07-20T20:18Z`, 'MM-d-y') -- "07-20-1969" TO_STRING(`1969-07-20T20:18Z`, 'MMMM d, y h:m a') -- "July 20, 1969 8:18 PM" TO_STRING(`1969-07-20T20:18Z`, 'y-MM-dd''T''H:m:ssX') -- "1969-07-20T20:18:00Z" TO_STRING(`1969-07-20T20:18+08:00Z`, 'y-MM-dd''T''H:m:ssX') -- "1969-07-20T20:18:00Z" TO_STRING(`1969-07-20T20:18+08:00`, 'y-MM-dd''T''H:m:ssXXXX') -- "1969-07-20T20:18:00+0800" TO_STRING(`1969-07-20T20:18+08:00`, 'y-MM-dd''T''H:m:ssXXXXX') -- "1969-07-20T20:18:00+08:00"
TO_TIMESTAMP
Given a string, convert it into a timestamp.
This is the inverse operation of TO_STRING.
Syntax
TO_TIMESTAMP ( string )
Parameters
The following table describes the parameters used to convert a string to a timestamp.
Parameter | Description |
---|---|
string | The target string that the function operates on. |
TO_TIMESTAMP('2007T') -- `2007T` TO_TIMESTAMP('2007-02-23T12:14:33.079-08:00') -- `2007-02-23T12:14:33.079-08:00`
UTCNOW
It returns the current time in UTC as a time stamp.
Syntax
UTCNOW()
Parameters
None
UTCNOW() -- 2017-10-13T16:02:11.123Z
String Functions
CHAR_LENGTH, CHARACTER_LENGTH
Count the number of characters in the specified string.
Syntax
CHAR_LENGTH ( string )
Parameters
The following table describes the parameters used to count the characters in a string.
Parameter | Description |
---|---|
string | The target string that the function operates on. |
CHAR_LENGTH('') -- 0 CHAR_LENGTH('abcdefg') -- 7
LOWER
Given a string, it converts all uppercase characters to lowercase characters. Any non-uppercased characters remain unchanged.
Syntax
LOWER ( string )
Parameters
The following table describes the parameters used to convert all upper case characters to lower case.
Parameter | Description |
---|---|
string | The target string that the function operates on. |
LOWER('AbCdEfG!@#$') -- 'abcdefg!@#$'
SUBSTRING
Given a string, a start index, and optionally a length, returns the substring from the start index up to the end of the string, or up to the length provided.
Note
The first character of the input string has index 1. If the start is <1, it is set to 1.
Syntax
SUBSTRING( string FROM start [ FOR length ] )
Parameters
The following table describes the parameters that return the substring
Parameter | Description |
---|---|
string | The target string that the function operates on. |
start | The start position of the string. |
length | The length of the substring to return. If not present, proceed to the end of the string. |
SUBSTRING("123456789", 0) -- "123456789" SUBSTRING("123456789", 1) -- "123456789" SUBSTRING("123456789", 2) -- "23456789" SUBSTRING("123456789", -4) -- "123456789" SUBSTRING("123456789", 0, 999) -- "123456789" SUBSTRING("123456789", 1, 5) -- "12345"
TRIM
Trims leading or trailing characters from a string. The default character to remove is ' '.
Syntax
TRIM ( [[LEADING | TRAILING | BOTH remove_chars] FROM] string )
Parameters
The following table describes the parameters used to trim characters from a string.
Parameter | Description |
---|---|
string | The target string that the function operates on. |
LEADING | TRAILING | BOTH | Whether to trim leading or trailing characters or both leading and trailing characters. |
remove_chars | The set of characters to remove. Note that remove_chars can be a string with length >1. This function returns the string with any character from remove_chars found at the beginning or end of the string that was removed. |
TRIM(' foobar ') -- 'foobar' TRIM(' \tfoobar\t ') -- '\tfoobar\t' TRIM(LEADING FROM ' foobar ') -- 'foobar ' TRIM(TRAILING FROM ' foobar ') -- ' foobar' TRIM(BOTH FROM ' foobar ') -- 'foobar' TRIM(BOTH '12' FROM '1112211foobar22211122') -- 'foobar'
UPPER
Given a string, it converts all lowercase characters to uppercase characters. Any non-lowercased characters remain unchanged.
Syntax
UPPER ( string )
Parameters
The following table describes the parameters used to convert the lower case character to upper case
Parameter | Description |
---|---|
string | The target string that the function operates on. |
UPPER('AbCdEfG!@#$') -- 'ABCDEFG!@#$'