Filtering and retrieving data with Lyve Cloud S3 Select
With Lyve Cloud S3 Select, you can use simple structured query language (SQL) statements to filter the contents of a Lyve Cloud S3 object and retrieve only the subset of data that you need. Because less data is transferred, retrieving the data costs less and has lower latency.
Lyve Cloud S3 Select works on objects stored in CSV, JSON, or Apache Parquet format. It also works with objects that are compressed with GZIP or BZIP2 (CSV and JSON objects only) and with server-side encrypted objects. You can specify the format of the results as either CSV or JSON, and you can determine how the records in the result are delimited.
You pass the SQL expression to Lyve Cloud S3 in the request. Lyve Cloud S3 Select supports a subset of SQL. For more information about the SQL elements that Lyve Cloud S3 Select supports, see SQL reference for Lyve Cloud.
Requirements and limitations
The following are the requirements for using Lyve Cloud S3 Select:
You must have the s3:GetObject permission to use the S3 Select command.
If the object you query is encrypted with a customer-provided encryption key (SSE-C), you must use HTTPS and include the encryption key in the request.
The following limits apply when using Lyve Cloud S3 Select:
Lyve Cloud S3 Select can only emit nested data using the JSON output format.
The maximum length of a SQL expression is 256 KB.
The maximum length of a record in the input or result is 1 MB.
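The size limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Lyve Cloud tooling: the helper names are hypothetical, and it assumes that the documented KB and MB values are binary multiples (1 KB = 1024 bytes) and that sizes are measured on UTF-8 encoded text.

```python
# Assumption: the documented limits are binary multiples (1 KB = 1024 bytes).
KIB = 1024

MAX_EXPRESSION_BYTES = 256 * KIB   # maximum length of a SQL expression
MAX_RECORD_BYTES = 1 * KIB * KIB   # maximum length of an input or result record


def expression_within_limit(expression: str) -> bool:
    """Return True if the UTF-8 encoded SQL expression fits the 256 KB limit."""
    return len(expression.encode("utf-8")) <= MAX_EXPRESSION_BYTES


def oversized_records(records):
    """Return the indexes of records whose UTF-8 size exceeds the 1 MB limit."""
    return [
        index
        for index, record in enumerate(records)
        if len(record.encode("utf-8")) > MAX_RECORD_BYTES
    ]
```

A check like this only catches oversized input early. The service remains the authority on what it accepts.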
Additional limitations apply when using Lyve Cloud S3 Select with Parquet objects:
Lyve Cloud S3 Select does not support Parquet output. You must specify the output format as CSV or JSON.
The maximum uncompressed row group size is 256 MB.
Lyve Cloud S3 Select supports only columnar compression using GZIP or Snappy. Lyve Cloud S3 Select does not support whole-object compression for Parquet objects.
You must use the data types specified in the object's schema.
Selecting on a repeated field returns only the last value.
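Two of the Parquet limitations can be verified from the request parameters alone. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the serialization structures use the same keys as the AWS CLI (`Parquet`, `CSV`, `JSON`, `CompressionType`).

```python
def parquet_request_problems(input_serialization: dict, output_serialization: dict) -> list:
    """Return a list of problems for a Parquet query; an empty list means none found."""
    problems = []
    if "Parquet" not in input_serialization:
        # The checks below apply only when the queried object is Parquet.
        return problems

    # Parquet output is not supported: exactly one of CSV or JSON must be requested.
    output_formats = set(output_serialization)
    if len(output_formats) != 1 or not output_formats <= {"CSV", "JSON"}:
        problems.append("output serialization must be exactly one of CSV or JSON")

    # Whole-object compression is not supported for Parquet objects.
    if input_serialization.get("CompressionType", "NONE") != "NONE":
        problems.append("whole-object compression is not supported for Parquet objects")

    return problems
```

The row group size, column compression codec, and schema data types depend on the object itself, so they are not covered here.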
Prerequisites
Install the AWS CLI for Windows, macOS, or Linux. All queries in this section are run with the AWS CLI.
Configure AWS CLI v2 with Lyve Cloud. See Using AWS CLI.
Create a bucket and assign the required bucket permissions using either the Lyve Cloud console or the API.
Configure the access key and secret key.
Synopsis
select-object-content
  --bucket <value>
  --key <value>
  --expression <value>
  --expression-type <value>
  --input-serialization <value>
  --output-serialization <value>
  <outfile>
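As an illustration of how the synopsis maps to a full command line, the sketch below assembles the arguments for `aws s3api select-object-content` without running the command. The bucket name, object key, SQL expression, and output filename are hypothetical placeholders, and the optional `endpoint_url` parameter assumes that the endpoint is not already set in your AWS CLI configuration.

```python
import json
import shlex


def build_select_command(bucket, key, expression, input_serialization,
                         output_serialization, outfile, endpoint_url=None):
    """Return the argument list for `aws s3api select-object-content`."""
    command = ["aws", "s3api", "select-object-content"]
    if endpoint_url:
        # Assumption: needed only when the CLI profile does not define the endpoint.
        command += ["--endpoint-url", endpoint_url]
    command += [
        "--bucket", bucket,
        "--key", key,
        "--expression", expression,
        "--expression-type", "SQL",
        # The CLI accepts the serialization structures as JSON strings.
        "--input-serialization", json.dumps(input_serialization),
        "--output-serialization", json.dumps(output_serialization),
        outfile,
    ]
    return command


# Hypothetical values, for illustration only.
argv = build_select_command(
    bucket="my-bucket",
    key="data/sample.csv",
    expression="SELECT * FROM S3Object s LIMIT 5",
    input_serialization={"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "NONE"},
    output_serialization={"CSV": {}},
    outfile="output.csv",
)

# Print a shell-quoted command line that can be reviewed before it is run.
print(shlex.join(argv))
```

To execute the command, the list can be passed to `subprocess.run(argv, check=True)`. Building it as a list avoids shell quoting problems with the JSON arguments.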
Options
Option | Description |
---|---|
bucket (string) | The S3 bucket. |
key (string) | The object key. |
expression (string) | The expression that is used to query the object. |
expression-type (string) | The type of the provided expression (for example, SQL). Possible value: SQL. |
input-serialization (structure) | Describes the format of the data in the object that is being queried. |
output-serialization (structure) | Describes the format of the data you want Lyve Cloud S3 to return in the response. |
outfile (string) | Filename where the records will be saved. |
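The two serialization options take structures rather than plain strings. The following sketch shows one way to build them, assuming that Lyve Cloud accepts the same structure keys as the AWS CLI (`CSV`, `JSON`, `CompressionType`, `FileHeaderInfo`, `FieldDelimiter`, `RecordDelimiter`). The helper names are hypothetical.

```python
import json


def csv_input(file_header_info="USE", compression_type="NONE", field_delimiter=","):
    """Build an input-serialization structure for a CSV object.

    file_header_info: USE, IGNORE, or NONE.
    compression_type: NONE, GZIP, or BZIP2.
    """
    return {
        "CSV": {
            "FileHeaderInfo": file_header_info,
            "FieldDelimiter": field_delimiter,
        },
        "CompressionType": compression_type,
    }


def json_output(record_delimiter="\n"):
    """Build an output-serialization structure that returns delimited JSON records."""
    return {"JSON": {"RecordDelimiter": record_delimiter}}


# Serialize the structures to the JSON strings that the CLI options expect.
input_arg = json.dumps(csv_input(compression_type="GZIP"))
output_arg = json.dumps(json_output())
print(input_arg)
print(output_arg)
```

Here a GZIP-compressed CSV object with a header row is queried, and the results are returned as JSON records separated by newlines.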