Empower innovation for analytics and data science with real-time operational intelligence
What is Spark?
Apache Spark is an open-source, general-purpose cluster computing system. With its in-memory data processing engine, development APIs, and support for higher-level tools, it allows data workers to efficiently execute streaming, data processing, ETL, machine learning, or SQL workloads that require fast, interactive access to datasets. With Spark running, developers everywhere can create applications that exploit Spark’s power, derive insights, and enrich their data science workloads against a single, shared dataset in Hadoop.
The Challenges with Monitoring Spark
The adoption of Spark and the associated desire to innovate with massive amounts of data have changed the requirements for monitoring tools. Two sources of challenges complicate Spark monitoring in production environments: the complexity of its multiple layers and the ephemeral nature of Spark workloads.
Complexities with Multiple Layers
The Spark architecture consists of multiple components at multiple layers, all of which come together to make a Spark application work:
Spark infrastructure, including Spark Master and Spark Worker
Applications that run on top of the infrastructure, including Spark Executors and Spark Driver
Underlying resources, including disk, CPU, network bandwidth
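Metrics from the application layer can be pulled from Spark's own monitoring REST API, which the driver exposes at `http://<driver>:4040/api/v1`. The sketch below rolls per-executor metrics up to an application-level view; the endpoint and field names reflect Spark's public API, but the payload values are illustrative samples, not data from a real cluster.

```python
import json

# Abbreviated sample of the JSON returned by Spark's monitoring REST API
# (GET http://<driver>:4040/api/v1/applications/<app-id>/executors).
# Field names match the API; the values here are illustrative only.
sample = '''
[
  {"id": "driver", "totalCores": 0, "memoryUsed": 52428800,
   "maxMemory": 434031820, "activeTasks": 0},
  {"id": "1", "totalCores": 4, "memoryUsed": 104857600,
   "maxMemory": 434031820, "activeTasks": 3}
]
'''

def summarize_executors(payload: str) -> dict:
    """Roll per-executor metrics up to an application-level summary."""
    executors = json.loads(payload)
    return {
        "executors": len(executors),
        "total_cores": sum(e["totalCores"] for e in executors),
        "active_tasks": sum(e["activeTasks"] for e in executors),
        "memory_used_mb": sum(e["memoryUsed"] for e in executors) // (1024 * 1024),
    }

print(summarize_executors(sample))
```

A monitoring agent would fetch this endpoint on an interval and emit each rolled-up value as a time series.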
Understanding the performance and utilization of Spark clusters and applications requires real-time visibility into each and every component.
Ephemeral Nature of Spark
The amount of time required to run a Spark application varies by use case. Developers may need tasks to run once a day, a few times a day, or even once a week or month. With tasks moving in and out of the environment at any given moment, it becomes increasingly difficult to track and monitor performance in production.
The SignalFx Difference
Monitoring at Every Level
When it comes to operating Spark environments at scale, understanding performance requires visibility into the dependencies among the various layers of the Spark architecture. With SignalFx, you can monitor from Spark process to node to cluster, or from job to application stage, in a single dashboard. Instant visibility across all layers of the Spark architecture gives you both the flexibility to gain a service-wide view of performance and the power to explore individual details.
For Spark admins, start with an aggregated view of your Spark cluster. Easily drill down to master processes and worker processes. Correlate performance metrics down to the specific node, and evaluate whether the application is impacted at the service level.
For Spark application developers, start with an aggregated view of your data by Spark application and user. Easily drill down to key metrics on active stages and runtimes, driver and executor utilization, and processing times for streaming applications.
Select Spark Metrics
JVM Heap Used/Committed
Worker Cores Free/Used
Spark Job Tasks
Bytes & Records
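Paired metrics such as heap used/committed or cores free/used are most useful as ratios. The helper below is a minimal sketch of turning a used/committed pair into a utilization percentage; the byte values in the example are hypothetical.

```python
def heap_utilization(heap_used: int, heap_committed: int) -> float:
    """Fraction of committed JVM heap currently in use."""
    if heap_committed <= 0:
        raise ValueError("committed heap must be positive")
    return heap_used / heap_committed

# Hypothetical sample: 1.5 GiB used out of 2 GiB committed
ratio = heap_utilization(1_610_612_736, 2_147_483_648)
print(f"{ratio:.0%}")  # → 75%
```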
Instant Visibility at Scale
There are many metrics specific to Spark, and knowing where to start and what to monitor can be difficult. Garbage collection stalls or abnormalities in memory patterns can create issues. Performance problems typically arise during shuffles. The latency and throughput of Spark applications directly impact users.
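An alerting rule for issues like GC stalls usually fires only on a sustained breach, so a single noisy sample does not page anyone. The sketch below illustrates that idea with hypothetical values for GC time as a fraction of wall-clock time; the threshold and window are assumptions, not recommendations.

```python
def sustained_breach(gc_fraction_series, threshold=0.10, min_consecutive=3):
    """Flag when GC time as a fraction of wall-clock time stays above
    `threshold` for `min_consecutive` consecutive samples -- a common
    shape for a garbage-collection stall alert."""
    streak = 0
    for value in gc_fraction_series:
        streak = streak + 1 if value > threshold else 0
        if streak >= min_consecutive:
            return True
    return False

# Hypothetical series: three consecutive samples above 10% trip the alert
print(sustained_breach([0.02, 0.12, 0.15, 0.18, 0.05]))  # → True
```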
SignalFx provides out-of-the-box insights across all the Spark metrics that matter. Built-in dashboards for Spark give you a running start on monitoring your complex, distributed environment. SignalFx also curates data from the other applications and cloud services in your environment, enabling you to embed your own best practices for monitoring and alerting across the services important to your specific use case.