Using S3 Select
Overview of S3 Select API
Welcome to the Lyve Cloud S3 Select API reference documentation. This section describes the S3 API commands you need to run S3 Select against a bucket and perform object-level actions in Lyve Cloud.
Data stored in the cloud can become very large and difficult to manage. With the S3 Select API, Lyve Cloud users can retrieve only the elements they need from an object instead of the whole object. This reduces the amount of data transferred and can lower retrieval latency.
S3 Select runs simple SQL expressions against object data. For example, you can query S3 object data (through the Lyve Cloud Console or Analytics package) and retrieve only a subset of an object instead of the entire object. Objects can be as large as a terabyte (TB) and can be stored in CSV, JSON, or Apache Parquet format. You can use SQL clauses such as SELECT and WHERE to fetch data from objects in these formats. S3 Select also supports server-side encrypted objects and objects compressed with GZIP or BZIP2 (compression is supported for CSV and JSON objects only).
Query pushdown with S3 Select is supported with Spark, Hive, and Presto. Pushdown improves query performance because the filtering runs in the storage service, close to the data, instead of in the processing engine. Use this feature to push the computational work of filtering large data sets down from the Spark, Hive, or Presto cluster to Lyve Cloud S3. This improves performance and reduces the amount of data transferred between analytics applications and Lyve Cloud S3.
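Spark, Hive, and Presto each implement pushdown inside their own connectors, so the Python below is only a hypothetical, simplified illustration of the idea, not any engine's actual code. `to_s3_select_sql` translates a column projection plus AND-ed `(column, op, value)` filters into a single S3 Select SQL expression that could be sent to the storage service, while `filter_locally` shows the work the engine would have to do itself without pushdown. Both function names and the predicate format are assumptions made for this sketch, which handles only numbers and strings.

```python
import operator

# Comparison operators this sketch can push down (a hypothetical, deliberately small subset).
_OPS = {
    "=": operator.eq, "<>": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _quote_ident(name):
    # Double-quoted identifiers keep column names case-sensitive; embedded quotes are doubled.
    return '"' + name.replace('"', '""') + '"'


def _literal(value):
    if _is_number(value):
        return repr(value)
    return "'" + str(value).replace("'", "''") + "'"


def to_s3_select_sql(columns, predicates):
    """Translate a projection and AND-ed (column, op, value) filters into one S3 Select SQL string."""
    projection = ", ".join("s." + _quote_ident(c) for c in columns) if columns else "*"
    clauses = []
    for column, op, value in predicates:
        if op not in _OPS:
            raise ValueError(f"operator {op!r} is not pushed down in this sketch")
        ref = "s." + _quote_ident(column)
        if _is_number(value):
            ref = f"CAST({ref} AS FLOAT)"  # CSV fields arrive as strings
        clauses.append(f"{ref} {op} {_literal(value)}")
    sql = f"SELECT {projection} FROM S3Object s"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql


def filter_locally(rows, columns, predicates):
    """Without pushdown: every row is downloaded, then filtered and projected by the engine."""
    result = []
    for row in rows:
        if all(
            _OPS[op](float(row[column]) if _is_number(value) else row[column], value)
            for column, op, value in predicates
        ):
            result.append({c: row[c] for c in columns} if columns else dict(row))
    return result
```

With pushdown, only the expression produced by `to_s3_select_sql` travels to the storage service and only matching rows come back; without it, every row crosses the network before `filter_locally` discards most of them. Real engines also decide which predicates are safe to push down and keep the rest for local evaluation.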