Extracting Data from SignalFx

While the data you send into SignalFx is visible in dashboards and charts, you can also extract past or streaming time series data that has been sent to SignalFx. You can extract “raw” data (metrics and their values), as well as data that has been processed by SignalFx analytics. You can also choose between extracting past data and extracting data as it is being streamed to SignalFx.

Why would I extract data from SignalFx?

You might want to extract data from SignalFx for a variety of reasons. Some of the most common use cases include:

  • You have an operational dashboard that pulls data from multiple sources, including SignalFx, to present a consolidated view of an environment. The tool you use for the operational dashboard has its own visualization capabilities, so all you need is the data to populate the visualizations.
  • You want to make data that’s stored in SignalFx visible to users that don’t have logins to SignalFx, such as executives or customers.
  • You want to use the metrics in SignalFx to drive automated actions, e.g. to remediate issues or to perform preventative maintenance, and you prefer not to do this using SignalFx detectors and their associated webhooks.
  • You want to analyze the data stored in SignalFx using a desktop product, or some other tool that performs different types of analytics than the SignalFx analytics engine.

How do I extract data from SignalFx?

There are three common ways to extract data from SignalFx: by using SignalFlow, SignalFx’s streaming analytics API; by using the /timeserieswindow endpoint in the SignalFx API; or from the SignalFx UI. Each method is summarized below. To use SignalFlow or /timeserieswindow, you must have an access token for the SignalFx API. For more information, see Authentication Overview.

Using SignalFlow to extract data

SignalFx provides a language called SignalFlow that is primarily used to describe computations for SignalFx’s real-time analytics engine. In addition, SignalFlow also provides a publish() stream method, which extracts data from a stream so that it is visible outside of a computation.

In most cases, you would use this method when using SignalFlow programmatically via curl or via one of our client libraries. In this document, however, we are using the SignalFlow command-line interface (CLI) for the SignalFx v2 API to provide examples of how the publish() method might be used. The CLI outputs historical or streaming data in text format to a live feed (or to the screen), to a simple graphical display, or as CSV-formatted text.

Advantages

  • SignalFlow provides powerful capabilities that let you filter data, apply analytics, and specify options for resolution, rollup, and other advanced settings.
  • In addition to past windows of data, you can also export streaming data, meaning you can stream data directly to another target as it is being sent to SignalFx.
  • You can specify relative time ranges, such as the last 15 minutes, or from 2 days ago to 1 day ago, rather than only using milliseconds since epoch.

Disadvantages

  • With great power comes a learning curve. To use SignalFlow to extract data, you must become familiar with either the SignalFx v2 API (to use it with curl or via a client library) or the CLI. To get the most out of SignalFlow, you’ll need to learn which options provide the information you need and how to build the query using the API or the CLI. This includes an understanding of maxDelay, rollups, and resolutions.

Extracting data with the /timeserieswindow endpoint

The /timeserieswindow endpoint in the SignalFx API outputs raw metric data in JSON format.

Advantages

  • This method uses a very simple API, so it’s easy to use. It is a good choice if you simply need metrics from a prior time period, without any filters or analytics applied.
  • It returns data in a standard JSON format.

Disadvantages

  • You can’t export streaming data.
  • You can export data only to JSON format, which generally means you can make use of this API only by incorporating it into a script that either parses the JSON data or pumps it elsewhere.
  • You can’t specify a rollup type; the default rollup for the type of metric being exported will be used.
  • You can’t specify a non-default resolution; only one of the resolutions at which SignalFx retains data can be used: 1000 (1s), 60000 (1m), 300000 (5m), or 3600000 (1h).
  • You can only export metrics with no filters, analytics, etc. (Note that what you can extract is not exactly the same as what was submitted, as rollups will be applied.)
  • You must express the start and end times using milliseconds since epoch, which is more cumbersome than specifying a relative time range or using a different format, such as UTC. (Note that epoch time is often expressed in seconds, so be sure to multiply by 1000 to get the time in milliseconds.)
  • There is a maximum number of datapoints that you can get back in a single query. While the maximum is quite high (currently approximately 50 million datapoints), a query could theoretically apply to more than 50 million datapoints. In this situation, an error occurs and no data is returned.
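The milliseconds-since-epoch arithmetic described above can be handled with a couple of lines of Python (a small helper of our own, not part of any SignalFx tooling):

```python
from datetime import datetime, timezone

def to_epoch_ms(dt):
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    return int(dt.timestamp() * 1000)

# 2017-03-13 13:15 UTC, the start time used in the curl examples later on
start_ms = to_epoch_ms(datetime(2017, 3, 13, 13, 15, tzinfo=timezone.utc))
```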

Downloading data from the SignalFx UI

While viewing the Data Table tab in a chart or detector in SignalFx, you can download a CSV file containing the data displayed in the table. Also, when viewing a chart, you can export the most recent 100 datapoints to a CSV file.

Advantages

  • This method is easy, as it doesn’t require any programming or use of an API.
  • You can export data that has been filtered, that has a custom rollup or resolution, or that has been processed by SignalFx analytics, if that is what is being displayed on the chart.

Disadvantages

  • You can export data only when you are working in the SignalFx UI.
  • When exporting the data table, you see data for only one point in time represented on the chart.
  • When exporting the chart, you see only the most recent 100 datapoints being displayed on the chart.
  • You can’t export streaming data.

Note: Because downloading data from the UI is self-explanatory, we won’t be discussing it in detail later in this blog.

How do I decide which method to use?

The nature of your data requirements can help you decide whether to export data by using SignalFlow, by using the /timeserieswindow API, or by downloading it from the SignalFx UI. Use the following table to determine which technique to use. In almost every case, SignalFlow is the preferred option.

If you want to:

  • Export streaming data: use SignalFlow.
  • Export a portion of the data behind the chart you are viewing onscreen: export the chart to CSV (exports the most recent 100 datapoints) or, from the Data Table tab, export to CSV (exports the data listed in the data table).
  • Export data with a relative time range (e.g. the last 15 minutes): use SignalFlow.
  • Export raw data (no analytics applied) for a specific past time range, using a default rollup and resolution: use SignalFlow or /timeserieswindow.
  • Export raw data (no analytics applied) for a specific past time range, at a rollup or resolution different from the SignalFx defaults: use SignalFlow.
  • Export data with analytics applied in a way that isn’t reflected in a chart (see note below): use SignalFlow.

 

Note on exporting data with analytics applied: For example, you might want to export the 5-minute moving average of a metric for the past hour. You don’t need to build a chart that displays a rolling average and then export the data; you can apply those analytics as part of your SignalFlow command.

Using the SignalFlow CLI

This section summarizes the syntax for using the SignalFlow CLI (command-line interface) to extract data from SignalFx, based on more detailed information here. The advantages and limitations of using SignalFlow are described above.

Note: The SignalFlow CLI is not an officially supported tool. It is intended as an example of how to use the SignalFlow analytics language through the signalfx-python client library.

The following syntax summary is taken from the SignalFlow CLI interactive help ($ signalflow --help).

usage: SignalFlow [-h] [-t TOKEN] [-x] [--api-endpoint URL]
                  [--stream-endpoint URL] [-a START] [-o STOP] [-r RESOLUTION]
                  [-d MAX-DELAY] [--output {live,csv,graph}] [--timezone TZ]
                  [program]

SignalFlow Analytics interactive command-line client

positional arguments:
  program               file to read program from (default: stdin)

optional arguments:
  -h, --help            show this help message and exit
  -t TOKEN, --token TOKEN
                        session token
  -x, --execute         force non-interactive mode
  --api-endpoint URL    override API endpoint URL
  --stream-endpoint URL
                        override stream endpoint URL
  -a START, --start START
                        start timestamp or delta (default: -1m)
  -o STOP, --stop STOP  stop timestamp or delta (default: infinity)
  -r RESOLUTION, --resolution RESOLUTION
                        compute resolution (default: auto)
  -d MAX-DELAY, --max-delay MAX-DELAY
                        maximum data wait (default: auto)
  --output {live,csv,graph}
                        default output format
  --timezone TZ         set display timezone (default: US/Pacific)

When you invoke SignalFlow, you will see the prompt ->. You can then enter a SignalFlow program (even across multiple lines) and press <Esc><Enter> to execute the program and visualize the results. Press ^C at any time to interrupt the stream, and again to exit the client. To actually extract data, you use the publish() method.

Examples and usage

In this example, we are streaming live data directly to the screen.

$ signalflow
-> data('jvm.cpu.load').mean(by='aws_availability_zone').publish()

To see current parameter settings, use the . command (press .<Esc><Enter>).

-> .
{'max_delay': None,
 'output': 'live',
 'resolution': None,
 'start': '-1m',
 'stop': None}
->

To set a parameter, use .<parameter> <value>:

-> .start -15m
-> .stop -1m
-> .
{'max_delay': None,
 'output': 'live',
 'resolution': None,
 'start': '-15m',
 'stop': '-1m'}

In this example, we are running the SignalFlow program in a file named program.txt to extract past data from 15 minutes ago to 1 minute ago, and piping the CSV-formatted output to a program named csv-to-plot.

$ signalflow --start=-15m --stop=-1m --output=csv < program.txt | csv-to-plot

Troubleshooting SignalFlow

My data is taking a long time to show up.

When you use SignalFlow, the data is processed using the full capabilities of the SignalFx analytics engine, which includes special handling of jitter and lag in data arrival times. There are two reasons the analytics engine might wait before processing a computation.

The first is max_delay, which is the amount of time we wait for delayed data before processing analytics. If not specified or set to None, the value of max_delay is determined automatically, based on SignalFx’s analysis of incoming data. To avoid delays in getting data from SignalFlow, set the max_delay parameter to 1s. This means that even if data is delayed, SignalFx will process the analytics after 1 second, without the missing data.

$ signalflow
-> .max_delay 1s

If you want to set max_delay to a longer period of time, make sure that your stop value is earlier than now by more than max_delay. For example, if you want a max_delay of 30s, use a stop value of -31s or earlier.

-> .max_delay 30s
-> .stop -31s

For more information on max delay, see Delayed datapoints.

The second reason computations might be delayed is related to job resolution. SignalFlow must wait until the end of the current resolution window before making its computation. For example, if the job resolution is 300000 (5m) and the stop value is None (or not specified), SignalFlow will wait until it has all datapoints from the current 5m time window before performing any computations.

To avoid delays, make sure your stop value is earlier than now by more than the job resolution. For example, if you are looking at data from a few months back, the resolution may be 3600000 (1h). In this case, use a stop value of -1h or earlier.

-> .stop -1h

For more information on resolution and data retention policies, see How SignalFx Chooses Data Resolution.
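Both rules, stop relative to max_delay and stop relative to the job resolution, reduce to the same arithmetic: the stop value must precede now by more than the larger of the two. A hypothetical helper (ours, not part of the CLI) makes this concrete:

```python
def latest_safe_stop_s(max_delay_s, resolution_s):
    """Latest safe stop delta, in seconds before now: the stop value must
    precede now by more than both max_delay and the job resolution."""
    return -(max(max_delay_s, resolution_s) + 1)

# A max_delay of 30s at 1s resolution calls for a stop of -31s or earlier;
# a 1h (3600s) job resolution calls for a stop of roughly -1h or earlier.
```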

When I ask for the latest data, it gives me data that is one minute old.

This issue can also be related to max delay. Instead of using a stop value of None (or not specifying a value), set the stop value to -1m.

-> .stop -1m

I’m having other problems extracting data with SignalFlow.

Our support team will be glad to help you. Simply contact support@signalfx.com.

More information on using SignalFlow

See developers.signalfx.com/docs/signalflow-overview and developers.signalfx.com/docs/getting-started-with-the-signalflow-api to learn more about working with SignalFlow.

Using /timeserieswindow

This section summarizes the syntax for using the /timeserieswindow endpoint in the SignalFx API, documented in full here. The advantages and limitations of using /timeserieswindow are described above.

Parameters

  • query (string): Elasticsearch string query that specifies the metric time series to retrieve.
  • startMs (int): Starting point of the time window within which to find datapoints, in milliseconds since Unix epoch.
  • endMs (int): Ending point of the time window within which to find datapoints, in milliseconds since Unix epoch.
  • resolution (int, optional; default is 1000): The data resolution, in milliseconds, at which to return the datapoints. Acceptable values are 1000 (1s), 60000 (1m), 300000 (5m), and 3600000 (1h).

Examples

In the following example, curl is used to extract data for the metric jvm.cpu.load from 3/13/17 13:15 to 3/13/17 13:20, at the default resolution (1000ms).

curl \
  --header "X-SF-TOKEN: YOUR_ACCESS_TOKEN" \
  --header "Content-Type: application/json" \
  --request GET \
  'https://api.signalfx.com/v1/timeserieswindow?query=sf_metric:"jvm.cpu.load"&startMs=1489410900000&endMs=1489411205000'
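When building this URL programmatically rather than pasting it into curl, the query value (with its colon and quotation marks) should be URL-encoded. Here is a sketch using only the Python standard library; it builds the URL but does not issue the request, which would require a valid access token:

```python
from urllib.parse import urlencode

def timeserieswindow_url(query, start_ms, end_ms, resolution=None):
    """Build a /timeserieswindow request URL with a URL-encoded query."""
    params = {"query": query, "startMs": start_ms, "endMs": end_ms}
    if resolution is not None:
        params["resolution"] = resolution
    return "https://api.signalfx.com/v1/timeserieswindow?" + urlencode(params)

url = timeserieswindow_url('sf_metric:"jvm.cpu.load"',
                           1489410900000, 1489411205000)
```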

In the following example, the same data is being extracted, but at 5-minute resolution. If the metric data was sent to SignalFx more frequently than once every 5 minutes, the returned data will be rolled up using the default rollup for the type of metric (gauge, counter, or cumulative counter). In this case, the average of the values received during every 5-minute period will be returned.

curl \
  --header "X-SF-TOKEN: YOUR_ACCESS_TOKEN" \
  --header "Content-Type: application/json" \
  --request GET \
  'https://api.signalfx.com/v1/timeserieswindow?query=sf_metric:"jvm.cpu.load"&startMs=1489410900000&endMs=1489411205000&resolution=300000'
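For a gauge, that default rollup is the average, so the 5-minute values returned here could be reproduced from finer-grained data by averaging each window. An illustrative sketch (our own code, not how SignalFx computes rollups internally):

```python
def rollup_average(datapoints, window_ms=300000):
    """Average [timestamp_ms, value] pairs over fixed windows, mimicking the
    default gauge rollup at a coarser resolution."""
    windows = {}
    for ts, value in datapoints:
        windows.setdefault(ts - ts % window_ms, []).append(value)
    return {start: sum(vals) / len(vals)
            for start, vals in sorted(windows.items())}
```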

Troubleshooting /timeserieswindow

When extracting data using /timeserieswindow, there are situations where expected data isn’t being returned, even though the request is syntactically correct. Causes and workarounds for these issues are discussed in the following sections.

  • No data is being returned
  • Only a subset of data is being returned

No data is being returned

There are two cases in which your request might return no data; that is, the returned data looks like this:

{"data":{},"errors" : [ ]}

If you see this response, check to make sure there is actually data in SignalFx in the specified timeframe.

  • If there is no data, choose a different time frame.
  • If there is data, the likely reason is that more than 5,000 metric time series match the query. In this case, SignalFx has chosen a subset of the total data which happens to include only null values. For troubleshooting this issue, see “Only a subset of data is being returned,” below.
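A script that consumes /timeserieswindow responses can detect the empty case before doing any further processing. A sketch assuming the response shape shown above:

```python
import json

def has_data(response_text):
    """Return True if a /timeserieswindow JSON response contains any time series."""
    body = json.loads(response_text)
    return bool(body.get("data"))
```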

Only a subset of data is being returned

If you are asking for a given metric and set of dimensions, /timeserieswindow first looks for all of the time series that match that query across all time. If more than 5,000 time series match, it returns a subset of the total time series, with no regard to whether there is data in the timeframe you’re asking for.

One way to work around this issue is to add sf_isActive:true as another filter in your query. This returns only the time series that are currently active (have reported at least one datapoint within the last 36 hours). This may or may not be appropriate, depending on the nature of your data and how you are sending it in.

If using this filter won’t work for your situation, you need to break up the query to ensure that your response doesn’t contain more than 5,000 results. For example, suppose you were asking for sf_metric:content.global AND type:billable and seeing only a subset of results. You could break the query into two queries:

  • sf_metric:content.global AND type:billable AND status:paid
  • sf_metric:content.global AND type:billable AND status:unpaid

Break up your original query as granularly as necessary to ensure that the results will match no more than 5,000 time series.
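Generating the sub-queries can itself be scripted. A sketch in which the dimension name and values (status, paid, unpaid) are the hypothetical ones from the example above:

```python
def partition_query(base_query, dimension, values):
    """Split one query into per-value sub-queries so each matches fewer
    time series."""
    return ["{} AND {}:{}".format(base_query, dimension, v) for v in values]

queries = partition_query("sf_metric:content.global AND type:billable",
                          "status", ["paid", "unpaid"])
```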


About the authors

Barbara Snyder

Barbara Snyder is an Information Architect at SignalFx. Merging an academic background in linguistics with longtime experience in the computer industry, she enjoys translating Engineering into English.
