Panoply on Panoply - How We Leverage Our Own Platform

Written by Ilia Tenkalevich | Jun 23, 2023 8:24:23 PM

In today's business landscape, managing and analyzing vast amounts of data from various platforms and sources can be a daunting task. At Panoply, we understand these challenges firsthand. Our own data stack includes BigQuery, Redshift, Postgres, MongoDB, Google Analytics, S3, Jira, Google Cloud Storage, Salesforce, Intercom, and many more.

When it comes to gaining actionable insights, we often find ourselves piecing together information from multiple sources. This process can be time-consuming, complex, and prone to errors. However, with Panoply, we have leveraged a powerful solution of our own that addresses these challenges head-on.

Purpose

The primary goal of leveraging Panoply on Panoply is to enhance our operational efficiency and decision-making processes.

For an R&D team leader, it is crucial to have a comprehensive understanding of any issues that arise in production and their impact on the platform. To address this need, we use Panoply to collect and analyze data from various sources within our production environment. This gives us valuable insights into the nature of the issues, the affected components, the scope of the impact, and the customers involved.

Challenges

The challenges associated with managing data across multiple platforms and sources are numerous. Here are some of the key obstacles we face:

Different query languages: Each platform in our data stack has its own way of querying data, such as SQL, MongoDB queries, PromQL for Prometheus metrics, or even Excel file filtering. This diversity adds complexity and requires specialized knowledge to extract and combine data effectively.
Limited data retention: Certain platforms do not store data for extended periods, making it challenging to access historical information. This limitation hinders comprehensive analysis and restricts our ability to gain a holistic view of our data.
Security and access control: When dealing with production data, security is paramount. Limiting access to sensitive information to only a select few individuals is crucial for protecting our customers' data and maintaining compliance with regulatory requirements.
Data volume: Production data often contain significantly more information than what is required for analytical purposes. Extracting and processing only the relevant data is vital to prevent unnecessary resource consumption and optimize performance.

Panoply capabilities

By leveraging the capabilities of our own platform in three key areas, we’re uniquely prepared to address the above challenges:

Monitoring performance health: Panoply lets us monitor our own platform's performance and health. By tracking the success rate of collection jobs, we can proactively identify and address production issues as they arise. This proactive approach helps keep our platform stable, reliable, and performing well for our customers.
Analyzing Panoply customer utilization: Gaining a comprehensive understanding of how our customers utilize Panoply is crucial to ensuring their utmost satisfaction. By examining various aspects such as connector and database configurations, as well as evaluating data like connector popularity and identifying connectivity issues, we can extract valuable insights into their preferences, behaviors, and the features they consider most valuable. Armed with this information, we can consistently enhance our platform, prioritize development initiatives, customize our services to align with their evolving requirements, and effectively manage expenses.
Investigating incidents: In the event of an incident or issue, such as jobs failing to collect, investigating the root cause is crucial. However, connecting directly to production databases and instances is risky and can degrade overall system performance, so gathering information about a job can be slow and sensitive. With Panoply, we can access and query the necessary data without compromising the integrity of our production environment. This capability significantly streamlines the incident investigation process and helps us resolve issues more efficiently.

How Panoply Solves These Challenges:

Panoply offers a comprehensive solution that overcomes these challenges and empowers us to make data-driven decisions efficiently. Here's how:

Unified query language: Using Panoply, we can access all of our data through a single data warehouse with one query language: SQL. This standardization removes the need to learn each source's own query language, streamlining data retrieval and analysis.

Seamless data integration: Combining data from multiple sources is made effortless with Panoply. Through simple SQL join queries, we can easily merge datasets and derive valuable insights. This capability saves us time and eliminates the need to manually collate data from various platforms.
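As a minimal sketch of what such a join looks like, the example below uses Python's built-in sqlite3 module as a stand-in for the warehouse. The table names (connectors, collect_jobs), their columns, and the status values are hypothetical and chosen only for illustration; they are not Panoply's actual schema.

```python
import sqlite3


def build_demo_warehouse() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical tables.

    In a real warehouse these tables would have been collected from
    different sources; here they are filled with made-up rows.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE connectors (id INTEGER PRIMARY KEY, type TEXT);
        CREATE TABLE collect_jobs (
            id INTEGER PRIMARY KEY,
            connector_id INTEGER,
            status TEXT
        );
        INSERT INTO connectors VALUES (1, 'postgres'), (2, 'mongodb');
        INSERT INTO collect_jobs VALUES
            (10, 1, 'success'), (11, 1, 'failed'), (12, 2, 'success');
        """
    )
    return conn


def failed_jobs_by_connector_type(conn: sqlite3.Connection) -> list:
    """Join jobs to connectors and count failed jobs per connector type."""
    query = """
        SELECT c.type, COUNT(*) AS failed_jobs
        FROM collect_jobs AS j
        JOIN connectors AS c ON c.id = j.connector_id
        WHERE j.status = 'failed'
        GROUP BY c.type
        ORDER BY c.type
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(failed_jobs_by_connector_type(build_demo_warehouse()))
```

Once both datasets live in the same warehouse, a question that spans sources becomes a single query rather than a manual collation exercise.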

Extended data retention: Regardless of the limitations imposed by the original sources, Panoply retains all collected data for as long as we require it. This ensures we have access to a comprehensive historical record, enabling us to perform trend analysis and gain valuable long-term insights.

Robust access control: Our platform allows us to define granular access controls, ensuring that only authorized personnel can access specific tables and sensitive data. By providing fine-grained access restrictions, we can safeguard our production information while still enabling efficient data exploration and analysis.

Configurable data collection: We have the flexibility to select which data to collect and which to exclude. For instance, if a PostgreSQL table contains two columns A and B, we can exclude column B and only collect the necessary data from column A. This feature optimizes storage utilization and reduces unnecessary data transfer and exposure.
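The effect of excluding a column can be illustrated with a small Python sketch. This is not how Panoply's connectors are configured; the select_columns helper and the sample rows are hypothetical and only show the idea that excluded columns never leave the source.

```python
def select_columns(rows, exclude):
    """Return copies of the rows with the excluded columns dropped."""
    excluded = set(exclude)
    return [
        {name: value for name, value in row.items() if name not in excluded}
        for row in rows
    ]


# Made-up source rows with the two columns from the example above.
source_rows = [
    {"A": 1, "B": "sensitive-1"},
    {"A": 2, "B": "sensitive-2"},
]

# Only column A is carried over; column B is never collected.
collected = select_columns(source_rows, exclude=["B"])

if __name__ == "__main__":
    print(collected)
```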

Minimal impact on production performance: One of the key advantages of using Panoply is that queries run against the collected data in the warehouse rather than against our production systems, so they do not affect production performance. This separation ensures that our analytics dashboards and data exploration do not disrupt the smooth operation of our core business processes.

By leveraging Panoply on Panoply, we have transformed our data management and analytics workflows. We circumvented the challenges of multiple platforms, diverse query languages, limited data retention, security concerns, and performance impact. With Panoply's unified data warehouse and powerful data integration capabilities, we gained a competitive edge by efficiently extracting insights from our diverse data sources.

Steps of implementation:

Here's a detailed overview of how we achieve this:

Connecting to data sources: We utilize various Snap Connectors that our platform supports, including MongoDB, Postgres, Redshift, S3, Google Cloud Storage, and others previously mentioned. By leveraging these connectors, we collect the relevant data from our production sources efficiently and securely.
Customers' data warehouses metadata: We collect essential information about our customers' data warehouses, including details such as the warehouse type, region, and status. By monitoring this information, we can quickly identify any potential issues specific to a particular warehouse or geographical region. This data provides valuable context for understanding the scope and magnitude of any problems that may occur.
Owners of data warehouses: In addition to tracking customers' data warehouses, we track data warehouse ownership. This information helps us identify the responsible parties for each data warehouse, facilitating effective communication and collaboration during issue resolution. Understanding the ownership structure is crucial for efficient incident management and accountability.
Connectors configuration in data warehouses: We collect configuration details of the connectors within each data warehouse. While sensitive information is excluded, we collect metadata about the connectors, such as their type and settings. This data enables us to identify which connectors are involved in potential issues, providing insights into the specific components that may be causing disruptions.
Collect jobs and their status: We gather data on the number of collect jobs executed by each connector and their corresponding status. This information helps us identify any failed or incomplete jobs, indicating potential issues with data ingestion or synchronization. By monitoring the status of collect jobs, we can proactively detect anomalies and take appropriate actions to resolve them promptly.
Jobs using cloud resources: We track the cloud resources consumed by each collect job. This information helps us identify any resource-intensive jobs that may impact overall system performance. By monitoring resource usage, we can optimize job configurations and allocate resources effectively, ensuring smooth operation of our platform.
Audit trail of connector changes: Panoply maintains an audit trail of changes made to each connector. While the detailed content of the changes is not accessible for security reasons, we can track when and by whom these changes were made. This audit trail assists us in understanding any modifications or updates that might have contributed to the occurrence of issues. It facilitates troubleshooting and aids in identifying potential areas for improvement.
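To make the collect-job monitoring described above more concrete, here is a minimal Python sketch that computes a success rate per connector from job records. The record shape (connector name, status) and the status values "success" and "failed" are assumptions made for illustration, not Panoply's actual job schema.

```python
from collections import defaultdict


def success_rate_by_connector(jobs):
    """Compute the fraction of successful jobs for each connector.

    `jobs` is an iterable of (connector, status) pairs; a job counts as
    successful only when its status is exactly "success".
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for connector, status in jobs:
        totals[connector] += 1
        if status == "success":
            successes[connector] += 1
    return {name: successes[name] / totals[name] for name in totals}


# Made-up job records for two hypothetical connectors.
sample_jobs = [
    ("postgres", "success"),
    ("postgres", "failed"),
    ("s3", "success"),
    ("s3", "success"),
]

if __name__ == "__main__":
    print(success_rate_by_connector(sample_jobs))
```

A drop in this rate for one connector, combined with the warehouse metadata and audit trail, narrows down where to start looking.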

Data collection and visualization: We access and visualize the collected data using Grafana, which connects to our own Panoply data warehouse. Grafana allows us to configure multiple dashboards tailored to different analytical needs and requirements. These dashboards provide us with insights into the performance health of our platform, the status of collect jobs, and other relevant statistics.

Furthermore, we configure alarms within Grafana to notify us of any critical events or anomalies. These alarms ensure that we receive immediate notifications when specific metrics or thresholds deviate from expected values. By proactively monitoring these alarms, we can quickly identify and address any issues, minimizing their impact on the platform and our customers.
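The logic behind such a threshold alarm can be sketched in a few lines of Python. This is not Grafana's API or rule format; Grafana alarms are configured within Grafana itself. The ThresholdRule class, the metric name, and the 0.9 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdRule:
    """A hypothetical rule: fire when a value falls below `minimum`."""

    metric: str
    minimum: float


def breached(rule, samples):
    """Return the sorted names of series whose value is below the minimum.

    `samples` maps a series name (for example a connector type) to its
    latest value for the rule's metric.
    """
    return sorted(name for name, value in samples.items() if value < rule.minimum)


# Made-up latest success rates per connector.
rule = ThresholdRule(metric="collect_success_rate", minimum=0.9)
latest = {"postgres": 0.5, "s3": 1.0}

if __name__ == "__main__":
    for name in breached(rule, latest):
        print(f"ALERT: {rule.metric} for {name} is below {rule.minimum}")
```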

With Panoply, our R&D team can effectively monitor the production environment and obtain granular insights into any issues that may arise.

In the evolving landscape of data-driven decision-making, Panoply enables us to harness the full potential of our data and deliver business success and value to our customers. With its user-friendly interface, robust security features, and strong performance, Panoply is exactly what we need to make Panoply better.

Interested in learning more? Schedule a demo today.
