Exploring Nested Data with Panoply

Nested Data Ingestion Methods

At Panoply, we've handled nested data since our inception. Nested data can come in multiple shapes and forms, including different combinations of the same types. Nested data can be:

  • Objects
  • Arrays
  • JSON strings

In this blog, I’ll take you through a journey of handling nested data in Panoply. You might see spoilers for the future as the data journey never ends 🙂.

nested data

When we introduced the Panoply solution back in 2015, we transformed all the nested data into relational structure. How? By creating additional tables with a one-to-many relationship between them and their “parent” table. These additional tables are subtables. 

Back then, this was the only way we handled nested data. The greatest advantage of the above ingestion method is that all the data is available to query directly with standard SQL. This is the main reason it was and is still the default behavior in all of our connectors.

Example:

Input:

{"id":1, "number":1, "text":"hello", "date":"2023-10-01", "nested":{"another_number":1, "another_level":{"another_date":"2023-10-02"}}}

{"id":2, "number":2, "text":"there", "date":"2023-10-01", "nested":{"another_text":"again", "another_level":{"another_date":"2023-10-02"}}}

Output: (... represents internal Panoply columns)

destination_table

id

number

text

date

1

1

hello

2023-10-01

 

2

2

there

2023-10-01

 

 

destination_table_nested

another_number

another_text

1

null

 

null

again

 

 

destination_table_nested_another_level

another_date

2023-10-02

 

2023-10-02

 



As a company that listens to our customers, we’ve always changed and adapted our features to the needs of our customers and technological capabilities. Throughout the years we’ve added a number of additional options to handle nested data and ingest it in many different ways.

Next, we’ve added the option to flatten the nested data to the parent table. This solution was (and still is) only available for objects and JSON strings. At the beginning, we only supported the ability to flatten the data one level up, but now users can flatten the data up to the top-most level (the main table). 

Example:

Input:

{"id":1, "number":1, "text":"hello", "date":"2023-10-01", "nested":{"another_number":1, "another_level":{"another_date":"2023-10-02"}}}

{"id":2, "number":2, "text":"there", "date":"2023-10-01", "nested":{"another_text":"again", "another_level":{"another_date":"2023-10-02"}}}

Output: (... represents internal Panoply columns)

destination_table

id

number

text

date

nested_another_number

nested_another_text

nested_another_level_another_date

1

1

hello

2023-10-01

1

null

2023-10-02

 

2

2

there

2023-10-01

null

again

2023-10-02

 



Both options still kept the data in a relational structure but they encountered a few edge cases. Some of them were the main driver to add more options. For example, if the user had a lot of uniquely named nested fields they would either end up having a lot of subtables (with the one-to-many ingestion approach) or a lot of columns in a single table (with the flattening approach). Because of this we’ve added the key:value ingestion option. This ingestion type converts the data from an object into an array of objects. Each object has two fields in it:

  • Key - The value of this field is the original field name
  • Value - The value of this field is the original field value

With this ingestion type, instead of creating multiple tables or columns we create a table with two columns (key & value) and multiple rows to represent a single row of the original data. This solution was available for our Redshift customers for over 5 years already but was not ported to our BigQuery solution when we originally introduced it. Spoiler alert: The key:value feature is coming soon to BigQuery. 

Example:

Input:

{"id":1, "number":1, "text":"hello", "date":"2023-10-01", "nested":{"another_number":1, "another_text":"again", "another_date":"2023-10-02"}}

Output: (... represents internal Panoply columns)

destination_table

id

number

text

date

1

1

hello

2023-10-01

 

 

Destination_table_nested

key

value

another_number

1

 

another_text

again

 

another_date

2023-10-02

 



In April 2023, we introduced to the world our Flex Connector. This connector can connect to literally any API service and ingest its data into your Panoply data warehouse. Due to the fact that this connector supports multiple different unknown responses, we knew we might encounter weird data structures that would require some pre-load modeling. These are tasks our Snap Connectors do automatically based on the source’s logic, but with the Flex Connector, we do not have this information prior to the user running the connector. 

That’s why we’ve added the option to flatten arrays. When flattening an array we are transforming each element in the array into a separate row and attach to each row the original top level row. By doing this we are actually duplicating the original data in order to accommodate the flattening of the array.

The Flex Connector is currently the only connector that supports this, but we are planning to add this feature to all the connectors in the future.

Example:

Input:

{"id":1, "number":1, "nested_array":[{"another_number":1, "another_text":"foo"},{"another_number":2, "another_text":"bar"}]}

Output: (... represents internal Panoply columns)

destination_table

id

number

nested_array_another_number

nested_array_another_text

1

1

1

foo

 

1

1

2

bar

 



We recently released the latest nested data feature - Panoply’s BigQuery users can now ingest nested data directly as a JSON data type. Thanks to the native JSON data type in BigQuery, we now allow users to ingest their nested objects and JSON strings directly as JSONs.

Example:

Input:

{"id":1, "number":1,"nested":{"another_number":1, "another_text":"again", "another_date":"2023-10-02"}}

Output: (... represents internal Panoply columns)

destination_table

id

number

nested

1

1

{"another_number":1, "another_text":"again", "another_date":"2023-10-02"}

 



Additional ingestion types in coming in the future:

  • JSON data type for arrays
  • Key:Value feature in BigQuery
  • Flattening arrays on all connectors and more than a single array
  • Array data type
  • Struct data type

As you can see there are many ways to ingest nested data in Panoply, with additional options coming in the future. Users often ask us why we have so many different options - this is a great question, and we have an answer! Every company, every user, and every piece of data has their own needs. These needs can be logical, analytical, or even simply data needs that require a certain data modeling that will fit their use case. Similar to a lot of other features, we truly believe that we should simplify the data process but still enable flexibility for the users to achieve their analytical goals.

 

Get a free consultation with a data architect to see how to build a data warehouse in minutes.
Request Demo
Read more in:
Share this post:

Contents

Work smarter, better, and faster with monthly tips and how-tos.