Lyve Cloud Documentation

Filtering and retrieving data with Lyve Cloud S3 Select

With Lyve Cloud S3 Select, you can use simple structured query language (SQL) statements to filter the contents of a Lyve Cloud S3 object and retrieve just the subset of data that you need. With this feature, you can reduce the amount of data that is transferred, resulting in lower cost and improved latency when you retrieve this data. Lyve Cloud S3 Select works on objects stored in CSV, JSON, or Apache Parquet format. It also works with objects that are compressed with GZIP or BZIP2 (for CSV and JSON objects only) and server-side encrypted objects. You can specify the format of the results as either CSV or JSON, and you can determine how the records in the result are delimited. Pass the SQL expressions to Lyve Cloud S3 in the request. Lyve Cloud S3 Select supports a subset of SQL. For more information about supported SQL elements from Lyve Cloud S3 Select, see SQL reference for Lyve Cloud.

Requirements and limitations

The following are the requirements for using Lyve Cloud S3 Select:

  • You must have the s3:GetObject permission to use S3 Select command.

  • Use HTTPS and include the customer-provided encryption key (SSE-C) if the object you query is encrypted.

The following limits apply when using Lyve Cloud S3 Select:

  • Lyve Cloud S3 Select can only emit nested data using the JSON output format.

  • The maximum length of a SQL expression is 256 KB.

  • The maximum length of a record in the input or result is 1 MB.

Additional limitations apply when using Lyve Cloud S3 Select with Parquet objects:

  • Lyve Cloud S3 Select does not support Parquet output. You must specify the output format as CSV or JSON.

  • The maximum uncompressed row group size is 256 MB.

  • Lyve Cloud S3 Select supports only columnar compression using GZIP or Snappy. Lyve Cloud S3 Select does not support whole-object compression for Parquet objects.

  • You must use the data types specified in the object's schema.

  • Selecting on a repeated field returns only the last value.

Prerequisites
  1. Install AWS CLI for Windows, MacOS, or Linux to execute all the queries using AWS CLI.

  2. Configure AWS CLI v2 with Lyve Cloud. See Using AWS CLI.

  3. Create a bucket and assign the required bucket permissions using either the Lyve Cloud console or the API.

  4. Configure the Access key and Secret Key.

Synopsis
select-object-content--bucket <value>--key <value>--expression <value>--expression-type <value>--input-serialization <value>--output-serialization <value><outfile>
Option

Options

Description

Bucket (String)

The S3 bucket.

key (string)

The object key.

expression (string)

The expression that is used to query the object.

expression-type (string)

The type of the provided expression (for example, SQL).Possible value: SQL.

input-serialization (structure)

Describes the format of the data in the object that is being queried.

  • CSV: (structure): Describes the serialization of a CSV-encoded object.

    • FileHeaderInfo: (string): Describes the first line of input. Valid values are:

      • NONE: The first line is not a header.

      • IGNORE: The first line is a header, but you can’t use the header values to indicate the column in an expression. You can use column position (such as _1, _2, …) to indicate the column (SELECT s._1 FROM OBJECT s ).

      • Use: The first line is a header, and you can use the header value to identify a column in an expression (SELECT "name" FROM OBJECT).

    • Comments: (string): A single character is used to indicate that a row should be ignored when the character is present at the start of that row. You can specify any character to indicate a comment line.

    • QuoteEscapeCharacter: (string): A single character used for escaping the quotation mark character inside an already escaped value. For example, the value “”” a, b “”” is parsed as ” a, b “.

    • RecordDelimiter: (string): A single character used to separate individual records in the input. Instead of the default value, you can specify an arbitrary delimiter.

    • FieldDelimiter: (string): A single character used to separate individual fields in a record. You can specify an arbitrary delimiter.

    • QuoteCharacter: (string): A single character is used for escaping when the field delimiter is part of the value. For example, if the value is a, b, Lyve Cloud S3 wraps this field value in quotation marks, as follows: " a , b " .

      • Type: String

      • Default: "

      • Ancestors: CSV

    • AllowQuotedRecordDelimiter: (boolean): Specifies that CSV field values may contain quoted record delimiters and such records should be allowed. Default value is FALSE. Setting this value to TRUE may lower performance.

    • CompressionType: (string): Specifies object’s compression format. Valid values: NONE, GZIP, BZIP2. Default Value: NONE.

    • JSON (structure): Specifies JSON as object’s input serialization format.

      • Type: (string)

      • The type of JSON. Valid values: Document, Lines.

    • Parquet (structure): Specifies Parquet as object’s input serialization format.

output-serialization (structure)

Describes the format of the data you want Lyve Cloud S3 to return in the response.

  • CSV: (structure): Describes the serialization of CSV-encoded Select results.

    • QuoteFields: (string): Indicates whether to use quotation marks around output fields.

      • ALWAYS: Always use quotation marks for output fields.

      • ASNEEDED: Use quotation marks for output fields when needed.

    • QuoteEscapeCharacter (string): The single character used for escaping the quote character inside an already escaped value.

    • RecordDelimiter: (string): A single character used to separate individual records in the output. Instead of the default value, you can specify an arbitrary delimiter.

    • FieldDelimiter: (string): The value used to separate individual fields in a record. You can specify an arbitrary delimiter.

    • QuoteCharacter: (string): A single character is used for escaping when the field delimiter is part of the value. For example, if the value is a, b, Lyve Cloud S3 wraps this field value in quotation marks, as follows: " a, b ".

  • JSON (structure): Specifies JSON as the request’s output serialization format.

    • RecordDelimiter (string): The value used to separate individual records in the output. If no value is specified, Lyve Cloud S3 uses a newline character (‘n’).

outfile (string)

Filename where the records will be saved.