Examples of using Lyve Cloud S3 Select on objects
Note
Run the commands using AWS CLI.
Read data from CSV
Example 1: The following example reads the data from CSV.
The following table consists of sample data:
Name | Team | Position | Height (Inches) | Weight (lbs) | Age |
---|---|---|---|---|---|
AdamDonachie | BAL | Catcher | 74 | 180 | 22.99 |
PaulBako | BAL | Catcher | 74 | 215 | 34.69 |
RamonHernandez | BAL | Catcher | 72 | 210 | 30.78 |
KevinMillar | BAL | FisherBaseman | 72 | 210 | 35.43 |
ChrisGomez | BAL | FisherBaseman | 73 | 188 | 35.71 |
BrianRoberts | BAL | SecondBaseman | 69 | 176 | 29.39 |
MiguelTejada | BAL | Shortstop | 69 | 209 | 30.77 |
MelvinMora | BAL | ThirdBaseman | 71 | 200 | 35.07 |
Copy CSV to Lyve Cloud S3 using the following command:
aws s3 cp input.csv s3a://lyve-cloud-bucket/path_to_s3_object/input.csv
Query the CSV using
s3api
with SQL dialect.Query: select * from s3object limit 10
Command:
aws s3api --endpoint=https://s3.us-east-1.Lyve Cloud.seagate.com \ select-object-content \ --bucket lyve-cloud-bucket \ --key path_to_s3_object/ input.csv \ --expression "select * from s3object " \ --expression-type 'SQL' \ --input-serialization '{"CSV": {}, "CompressionType": "NONE"}' \ --output-serialization '{"CSV": {}}' "output.csv"
Result: Run the
cat output.csv
command to see the output.
Add filter in select query.
Query: select * from s3object where Position = 'Catcher' limit 10
Command:
aws s3api --endpoint=https://s3.us-east-1.Lyve Cloud.seagate.com \ select-object-content \ --bucket lyve-cloud-bucket \ --key path_to_s3_object/ input.csv \ --expression " select * from s3object where Position ='Catcher' limit 10" \ --expression-type 'SQL' \ --input-serialization '{ "CSV": {"FileHeaderInfo": "Use" }, "CompressionType": "NONE"}' \ --output-serialization '{"CSV": {}}' "output.csv"
Result: Run the cat output.csv command to see the output.
Select a specific column from the CSV file.
Query: select Name, Team, Position from s3object limit 10
Command:
aws s3api --endpoint=https://s3.us-east-1.Lyve Cloud.seagate.com \ select-object-content \ --bucket lyve-cloud-bucket \ --key path_to_s3_object/ input.csv \ --expression "select Name, Team, Position from s3object where Position = 'Catcher' limit 10" \ --expression-type 'SQL' \ --input-serialization '{ "CSV": {"FileHeaderInfo": "Use" }, "CompressionType": "NONE"}' \ --output-serialization '{"CSV": {}}' "output.csv"
Result: Run the cat output.csv command to see the output.
Reads data from JSON
Example 2: The following example reads the data from JSON.
The following image is sample data:

Run the following command to copy JSON to Lyve Cloud S3.
aws s3 cp input.json s3a://lyve-cloud-bucket/path_to_s3_object/input.json
Query JSON using s3api with sql dialect.
Query: select * from s3object limit 10
Command:
aws s3api --endpoint=https://s3.us-east-1.Lyve Cloud.seagate.com select-object-content --bucket lyve-cloud-bucket --key path_to_s3_object/input.json --expression " select * from s3object limit 10" --expression-type 'SQL' --input-serialization '{"JSON": {"Type": "LINES"}, "CompressionType": "NONE"}' --output-serialization '{"JSON": {}}' "output.json"
Result:
Add filter in select query.
Query: select * from s3object where Position = 'Catcher' limit 10
Command:
aws s3api --endpoint=https://s3.us-east-1.Lyve Cloud.seagate.com select-object-content --bucket lyve-cloud-bucket --key path_to_s3_object/input.json --expression " select * from s3object where Position = 'Catcher' limit 10" --expression-type 'SQL' --input-serialization '{"JSON": {"Type": "LINES"}, "CompressionType": "NONE"}' --output-serialization '{"JSON": {}}' "output.json"
Result: Run the cat output.json command to see the output.
Select a specific column from the JSON file.
Query: select Name, Team, Position from s3object limit 10
Command:
aws s3api --endpoint=https://s3.us-east-1.Lyve Cloud.seagate.com select-object-content --bucket lyve-cloud-bucket --key path_to_s3_object/input.json --expression "select Name, Team, Position from s3object limit 10" --expression-type 'SQL' --input-serialization '{"JSON": {"Type": "LINES"}, "CompressionType": "NONE"}' --output-serialization '{"JSON": {}}' "output.json"
Result: Run the
cat output.json
command to see the output.
Read from PARQUET
Note
Parquet option is not enabled.