Skip to main content

Lyve Cloud Documentation

Examples of using Lyve Cloud S3 Select on objects

Note

Run the commands using AWS CLI.

Read data from CSV

Example 1: The following example reads the data from CSV.

The following table consists of sample data:

Name

Team

Position

Height (Inches)

Weight (lbs)

Age

AdamDonachie

BAL

Catcher

74

180

22.99

PaulBako

BAL

Catcher

74

215

34.69

RamonHernandez

BAL

Catcher

72

210

30.78

KevinMillar

BAL

FisherBaseman

72

210

35.43

ChrisGomez

BAL

FisherBaseman

73

188

35.71

BrianRoberts

BAL

SecondBaseman

69

176

29.39

MiguelTejada

BAL

Shortstop

69

209

30.77

MelvinMora

BAL

ThirdBaseman

71

200

35.07

  1. Copy CSV to Lyve Cloud S3 using the following command:

     aws s3 cp input.csv s3a://lyve-cloud-bucket/path_to_s3_object/input.csv
  2. Query the CSV using s3api with SQL dialect.

    • Query: select * from s3object limit 10

    • Command:

      aws s3api
      --endpoint=https://s3.us-east-1.Lyve Cloud.seagate.com \
      select-object-content \
      --bucket lyve-cloud-bucket \
      --key path_to_s3_object/ input.csv \
      --expression "select * from s3object " \
      --expression-type 'SQL' \
      --input-serialization '{"CSV": {}, "CompressionType": "NONE"}' \
      --output-serialization '{"CSV": {}}' "output.csv"
    • Result: Run the cat output.csv command to see the output.

      s3_select-Eg1-stp2.png
  3. Add filter in select query.

    • Query: select * from s3object where Position = 'Catcher' limit 10

    • Command:

      aws s3api 
      --endpoint=https://s3.us-east-1.Lyve Cloud.seagate.com \
      select-object-content \
      --bucket lyve-cloud-bucket \
      --key path_to_s3_object/ input.csv \
      --expression " select * from s3object  where Position ='Catcher' limit 10" \
      --expression-type 'SQL' \
      --input-serialization '{ "CSV": {"FileHeaderInfo": "Use" }, "CompressionType": "NONE"}' \
      --output-serialization '{"CSV": {}}' "output.csv"
    • Result: Run the cat output.csv command to see the output.

      s3_select-Eg1-stp3.png
  4. Select a specific column from the CSV file.

    • Query: select Name, Team, Position from s3object limit 10

    • Command:

      aws s3api 
      --endpoint=https://s3.us-east-1.Lyve Cloud.seagate.com \
      select-object-content \
      --bucket lyve-cloud-bucket \
      --key path_to_s3_object/ input.csv \
      --expression "select Name, Team, Position from s3object  where Position = 'Catcher' limit 10" \
      --expression-type 'SQL' \
      --input-serialization '{ "CSV": {"FileHeaderInfo": "Use" }, "CompressionType": "NONE"}' \
      --output-serialization '{"CSV": {}}' "output.csv"
    • Result: Run the cat output.csv command to see the output.

Reads data from JSON

Example 2: The following example reads the data from JSON.

The following image is sample data:

Exmple2-json.png
  1. Run the following command to copy JSON to Lyve Cloud S3.

    aws s3 cp input.json s3a://lyve-cloud-bucket/path_to_s3_object/input.json
  2. Query JSON using s3api with sql dialect.

    • Query: select * from s3object limit 10

    • Command:

      aws s3api
      --endpoint=https://s3.us-east-1.Lyve Cloud.seagate.com 
      select-object-content
      --bucket lyve-cloud-bucket
      --key path_to_s3_object/input.json 
      --expression " select * from s3object limit 10" 
      --expression-type 'SQL'
      --input-serialization '{"JSON": {"Type": "LINES"}, "CompressionType": "NONE"}'
      --output-serialization '{"JSON": {}}' "output.json"
    • Result:

      Exmple2-step_2-json.png
  3. Add filter in select query.

    • Query: select * from s3object where Position = 'Catcher' limit 10

    • Command:

      aws s3api
      --endpoint=https://s3.us-east-1.Lyve Cloud.seagate.com 
      select-object-content
      --bucket lyve-cloud-bucket
      --key path_to_s3_object/input.json 
      --expression " select * from s3object  where Position = 'Catcher' limit 10" 
      --expression-type 'SQL'
      --input-serialization '{"JSON": {"Type": "LINES"}, "CompressionType": "NONE"}'
      --output-serialization '{"JSON": {}}' "output.json"
    • Result: Run the cat output.json command to see the output.

  4. Select a specific column from the JSON file.

    • Query: select Name, Team, Position from s3object limit 10

    • Command:

      aws s3api
      --endpoint=https://s3.us-east-1.Lyve Cloud.seagate.com  
      select-object-content
      --bucket lyve-cloud-bucket
      --key path_to_s3_object/input.json 
      --expression "select Name, Team, Position from s3object limit 10" 
      --expression-type 'SQL'
      --input-serialization '{"JSON": {"Type": "LINES"}, "CompressionType": "NONE"}'
      --output-serialization '{"JSON": {}}' "output.json"
    • Result: Run the cat output.json command to see the output.

      Exmple2-step_4-json.png
Read from PARQUET

Note

Parquet option is not enabled.