Streaming Analytics for the Masses: Introducing the SignalFlow 2.0 API

Modern, distributed infrastructures supporting new applications generate vast amounts of data. This increase in data calls for powerful and intelligent monitoring systems that can operate at scale and perform real-time analysis to understand the performance and availability of any specific environment. At its core, we believe that monitoring modern infrastructures is an analytics problem.

Today, we are very excited to announce the limited access beta of the SignalFlow 2.0 API. This new API allows any SignalFx user to perform historical or real-time streaming analytics computations on all infrastructure and application data via a simple and well-documented HTTP API. It enables new and exciting ways to use SignalFx’s analytics and anomaly detection capabilities from virtually anywhere. Simply submit a SignalFlow computation program to the API and data starts streaming back to you!

The SignalFlow 2.0 API is available at https://stream.signalfx.com. Start using it today by contacting SignalFx Support at support@signalfx.com.

Analytics for Everyone

The SignalFlow 2.0 API is rooted in SignalFlow, our real-time, streaming analytics system. Until today, SignalFlow was only accessible through a private API for the sole benefit of our web interface. It powers all charts, anomaly detectors, and virtually all data egress in the SignalFx platform.

Before now, the SignalFlow language was rough and complicated, and the semantics of the returned data were undocumented and difficult to work with (even for us!). However, we knew that the ability to perform complex mathematical computations on both historical and real-time time series data with minimal latency was valuable to anyone monitoring a modern infrastructure.

Over the course of the past few months, we’ve been hard at work making SignalFlow’s capabilities accessible to a broader audience. We took the knowledge and experience gained over the last few years of building SignalFx around SignalFlow’s real-time analytics and redesigned the SignalFlow language. The SignalFlow 2.0 API is more powerful, yet easier to write and understand for all users. Inspired by Python, it feels natural to write and will look familiar to many of you.


Check out a demo of our SignalFlow 2.0 API at Monitorama 2016 PDX » 

The SignalFlow Language

SignalFlow is a data-flow-oriented language with a Python-like syntax. Language built-ins allow you to create streams from your time series. Those streams also expose methods that help you easily modify the stream. You can chain those methods to combine multiple operations before eventually publishing the stream to get its output.

Here’s a simple example that calculates the average CPU utilization by Amazon availability zone and publishes the output, which should be one time series per availability zone:

data('cpu.utilization').mean(by='aws_availability_zone').publish('cpu_by_az')

This second example gives you a one-minute moving average of the 95th percentile of CPU utilization of your us-east-1 hosts. Because stream methods return a stream themselves, you can chain stream operations.

data('cpu.utilization', filter('aws_region','us-east-1')).percentile(95).mean(over=1m).publish('cpu_p95')

The SignalFlow language really comes alive in its ability to bind variables and evaluate arbitrary mathematical expressions. The following example calculates the day-over-day rate of change, as a percentage, of the 95th percentile of CPU utilization.

now = data('cpu.utilization').percentile(95)
yesterday = now.timeshift(-1d)
(100 * (now / yesterday - 1)).publish('cpu_dod_change')
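To make the stream arithmetic concrete, here is a plain-Python sketch of the same day-over-day calculation applied to a single pair of values. This is illustrative only: SignalFlow evaluates the expression pointwise across aligned time series, and the function name and numbers below are ours, not part of the SignalFx API.

```python
def dod_change(now: float, yesterday: float) -> float:
    """100 * (now / yesterday - 1), as in the SignalFlow expression."""
    return 100 * (now / yesterday - 1)

# A value of 110 today against 100 yesterday is a +10% change.
print(round(dod_change(110.0, 100.0), 6))  # 10.0
```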

Transform streams with lambda functions:

data('speed_mph').map(lambda x: x * 1.609).publish('speed_kmh')

Finally, you can handle and publish multiple data streams at the same time:

hits = data('cache.hits').sum().publish()
misses = data('cache.misses').sum().publish()
(100 * hits / (hits + misses)).publish('hit_ratio')

For a complete reference of the SignalFlow language and to learn more about what’s possible (filtering, functions, expressions, and more), head over to the SignalFlow Overview, the Functions index, and the Stream methods index.

API Overview

To execute a SignalFlow program as a real-time, streaming computation, all you have to do is send the text of your program to the /v2/signalflow/execute API endpoint (authenticated with your session token).

$ curl --request POST \
       --header "Content-Type: text/plain" \
       --header "X-SF-Token: YOUR_TOKEN" \
       --data "data('cpu.utilization').mean().publish()" \
       "https://stream.signalfx.com/v2/signalflow/execute"
event: control-message
data: {
data:   "event" : "STREAM_START",
data:   "timestampMs" : 1461351769440
data: }
...

This /v2/signalflow/execute endpoint is designed to be as simple as possible to use. It’s a great way to get started and to power simple streaming analytics use cases.

The returned data is a Server-Sent Events (SSE) stream of JSON-encoded messages that includes the output data, events, and time series metadata, as well as information and control messages. You can specify a start timestamp, a stop timestamp, the desired resolution and maxDelay (all in milliseconds). If the stop time is in the future or not specified, you’ll receive real-time results at a cadence dictated by the compute resolution until the stop time is reached or until you close the connection, whichever occurs first.
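As a sketch of what consuming that SSE stream involves, the following minimal Python parser turns a payload like the one above into (event, JSON) pairs. It is illustrative only; a production client should use a dedicated SSE library and handle reconnection.

```python
import json

def parse_sse(raw: str):
    """Parse a Server-Sent Events payload into (event, data) pairs.

    Illustrative only: the message fields mirror the control message
    shown in the curl example above.
    """
    messages = []
    # SSE messages are separated by blank lines.
    for block in raw.strip().split("\n\n"):
        event, data_lines = None, []
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:
            messages.append((event, json.loads("\n".join(data_lines))))
    return messages

raw = (
    "event: control-message\n"
    "data: {\n"
    'data:   "event" : "STREAM_START",\n'
    'data:   "timestampMs" : 1461351769440\n'
    "data: }\n"
)
msgs = parse_sse(raw)
print(msgs[0][0], msgs[0][1]["event"])  # control-message STREAM_START
```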

For more advanced use cases, or if you plan on using the API from JavaScript in the browser, the SignalFlow 2.0 API is also available through a WebSocket connection. By establishing a WebSocket at wss://stream.signalfx.com/v2/signalflow/connect, you get a long-lived connection to SignalFx over which you can execute multiple analytics computations and receive their output concurrently via named, multiplexed channels.

The WebSocket connection also uses JSON as the wire format, except for data messages, which use a direct binary encoding sent over the WebSocket’s binary channel. This results in a 90% reduction in the required bandwidth and far fewer CPU cycles spent decoding the data than JSON would require.

Client Libraries

We’ve also improved our client libraries to make it easier to integrate with the SignalFlow 2.0 API. These libraries provide a SignalFlow API client that abstracts all the connection, transport, and encoding concerns and presents simple abstractions to execute your SignalFlow programs and work with the output. The signalfx-python and signalfx-nodejs libraries are updated, with the Go, Java, and Ruby client libraries in progress.

Here’s a short example using the Python client library (all these features are also available in the JavaScript library):

import signalfx

flow = signalfx.SignalFx().signalflow('YOUR_TOKEN')
try:
    program = "data('cpu.utilization').mean(by='aws_availability_zone').publish()"
    # You can execute as many computations as you want from the same
    # client.
    c = flow.execute(program)
    for msg in c.stream():
        if isinstance(msg, signalfx.signalflow.messages.DataMessage):
            print('@ {0}: {1}'.format(msg.logical_timestamp_ms, msg.data))
finally:
    flow.close()

The .execute() method returns a Computation object that gives you access to the stream of messages via .stream() (a generator), as well as the metadata of each output time series with .get_metadata(tsid), the state of the computation (.state), its compute resolution (.resolution), and much more.

Similar to the HTTP API, you can specify a start time, stop time, desired resolution, and desired maxDelay with the corresponding parameters to the .execute() method. Use the stream of data, events, and messages you receive from .stream() for your specific use case.
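Since the API expresses times in epoch milliseconds, a one-hour historical window can be built like this. This is a minimal sketch: the keyword argument names in the commented call are our assumptions based on the parameters described above, so check the client library’s documentation for the exact signature.

```python
from datetime import datetime, timedelta, timezone

def epoch_ms(dt: datetime) -> int:
    """Convert an aware datetime to epoch milliseconds, as the API expects."""
    return int(dt.timestamp() * 1000)

stop = datetime(2016, 4, 22, 12, 0, tzinfo=timezone.utc)
start = stop - timedelta(hours=1)
print(epoch_ms(start), epoch_ms(stop))

# Hypothetical call shape (parameter names assumed, not confirmed):
# c = flow.execute(program, start=epoch_ms(start), stop=epoch_ms(stop),
#                  resolution=60000, max_delay=10000)
```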

An Implementation Example

With the power of the SignalFlow analytics language at your fingertips and available from your favorite programming language, the sky’s the limit for what you can build: dump data as CSV, pull some numbers for a report, render charts, build your own streaming dashboards or custom visualizations, or even use data and events to trigger other systems.

One example implementation, which also makes it easier to play with and learn the SignalFlow language, is an interactive SignalFlow console:

$ pip install signalflowcli
$ signalflow
-*- SignalFx SignalFlow™ Analytics Console -*-

Enter your program and press  to execute.
SignalFlow programs may span multiple lines.
Set parameters with ". "; see current settings with "."
To stop streaming, or to exit, just press ^C.

->

This interface lets you enter SignalFlow programs and display the output as a live display, as CSV, or as a rendered graph image. Check out its source code to see how it works and to use it as a base for building your own programs and scripts.

Get Started

Needless to say, we’re looking forward to discovering all the cool things people will build by bringing real-time data streams into their projects.

There is so much more to say about the SignalFlow language and the SignalFlow 2.0 API than what can fit in a blog post. Learn more in the SignalFlow Overview and check out How to Get Started with the SignalFlow API. Also, take a look at signalflow.vim for syntax highlighting of SignalFlow programs in Vim.

We love to get feedback and learn about your use cases, cool hacks, and projects using our streaming analytics. We can’t wait to hear from you!


Find your signal today »

About the authors

Maxime Petazzoni

Maxime has been a software engineer for over 15 years. At SignalFx, Max works on the core of SignalFx: real-time, streaming SignalFlow™ Analytics. He is also the creator of MaestroNG, a container orchestrator for Docker environments.
