What is Amazon ELB?

Amazon Elastic Load Balancing (ELB) lets websites and web services serve more requests by spreading incoming traffic across multiple servers, so capacity can be added as demand grows. A load balancer distributes load to keep capacity usage even, taking into account the type of service each server offers, whether each server is healthy, and the demand on each server.

Amazon ELB Monitoring

ELB problems that may cause an outage include configuration errors in the load balancer, misconfigured network or security settings, and problems with your backend service. Your monitoring tools will give you the information needed to troubleshoot and fix these issues, but the type of data and how quickly you get to an answer will vary greatly depending on which tool you use.

SignalFx offers pre-built content straight out of the box so that you can easily see aggregate data from many load balancers and instances and get rankings and distributions as well as comparisons over time.

 

Latency Over Last Minute: If load balancing latency is too high, you might lose opportunities and miss SLAs. If you only looked at average latency over time, you wouldn’t see crucial details about user experience. Visualize latency as percentiles so you have more insight into both the best and the worst performers.
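
To illustrate why percentiles reveal more than an average, here is a minimal Python sketch. The latency values are made up for illustration, and the `percentile` helper is a hypothetical nearest-rank implementation, not how SignalFx or ELB computes the chart.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies (ms) for one minute of requests: mostly fast, a few very slow.
latencies_ms = [20] * 95 + [2000] * 5

mean_ms = statistics.mean(latencies_ms)   # pulled upward by the slow requests
p50_ms = percentile(latencies_ms, 50)     # what a typical request sees
p99_ms = percentile(latencies_ms, 99)     # what the slowest requests see

print(mean_ms, p50_ms, p99_ms)
```

In this made-up sample the average sits far from both the typical request and the slowest ones, so it describes neither group well.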

Total Requests/min: An increase in requests per minute might be correlated with an increase in latency. You’re probably used to operating your service within a certain range of requests per minute. If this number falls well outside that range, there is likely a problem with routing or with the clients upstream of the load balancer.

Top LBs by Requests/min: When the number of requests is higher than you usually expect, or there’s a spike in requests, the next step towards narrowing down the cause is to determine which load balancer is affected. This chart shows which load balancer is taking the brunt of the traffic or behaving abnormally.

Highest Backend Error %: Backend errors are defined as errors between the load balancer and the server. When an error code is returned from a backend server call, ELB may retry the call. Additionally, this count includes errors returned during health checks. Thus, the number of backend errors may be higher than the number of frontend errors.

The SignalFx Difference

Unhealthy Host %: If you alert on a simple threshold of unhealthy host count, the alert will probably stop reflecting cluster health as the cluster autoscales, because the same count means something very different in a small cluster than in a large one. A smarter alert takes the size of your cluster into account and calculates the percentage of hosts that are unhealthy as a derived metric.
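
The derived metric described above can be sketched in a few lines of Python. This assumes you already have healthy and unhealthy host counts per load balancer (for example, from the CloudWatch `HealthyHostCount` and `UnHealthyHostCount` metrics); the function names and the 25% threshold are hypothetical choices for illustration.

```python
def unhealthy_host_pct(healthy, unhealthy):
    """Percentage of registered hosts that are unhealthy; 0.0 when no hosts are registered."""
    total = healthy + unhealthy
    if total == 0:
        return 0.0
    return 100.0 * unhealthy / total


def should_alert(healthy, unhealthy, threshold_pct=25.0):
    """Alert on the share of unhealthy hosts rather than on the raw count."""
    return unhealthy_host_pct(healthy, unhealthy) >= threshold_pct


# The same raw count (2 unhealthy hosts) in a small and a large cluster.
small_cluster = should_alert(healthy=2, unhealthy=2)    # 50% unhealthy
large_cluster = should_alert(healthy=38, unhealthy=2)   # 5% unhealthy
print(small_cluster, large_cluster)
```

A fixed count threshold of 2 would treat both clusters the same, while the percentage separates them.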

Aggregates & Correlations: On its own, any single Requests/min chart can reveal a moment-in-time issue that may be cause for concern. Looking at several charts together on a dashboard and applying solid alerting logic makes a stronger case for identifying a problematic trend and quickly isolating its cause before it becomes a load issue that impacts performance.

Trends Over Time: Latency over the last seven days can help you determine whether there is a concerning daily or weekly pattern in performance. You can compare this with the requests over the same period. Additionally, daily or weekly batch jobs like deployments or cleanup can affect latency. Deployments are especially important to watch because a new version of your service could make performance slower or faster.
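
A week-over-week comparison like this boils down to a percentage change against the value from seven days earlier. The sketch below is one plausible way to compute it; the function name and sample numbers are hypothetical, and it is not necessarily how the 7d Change % charts are calculated.

```python
def pct_change(current, week_ago):
    """Percentage change relative to the value seven days earlier; None if there is no baseline."""
    if week_ago == 0:
        return None
    return 100.0 * (current - week_ago) / week_ago


# Hypothetical values: average latency (ms) and requests/min now versus one week ago.
latency_change = pct_change(current=120, week_ago=100)      # latency went up
requests_change = pct_change(current=900, week_ago=1000)    # traffic went down
print(latency_change, requests_change)
```

Latency rising while requests fall, as in this made-up sample, suggests the slowdown is not explained by load alone and that something else, such as a recent deployment, is worth checking.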

Amazon ELB Metrics

# LBs
Latency Over Last Minute
LBs with Worst Average Latency (ms)
Total Requests/min
Requests/min
Top LBs by Requests/min
Top Frontend Errors/min
Highest Backend Error %
Top Backend Connection Errors/min
LBs with Highest Unhealthy Host %
Requests/min 7d Change %
Latency 7d Change %

Start Your Amazon ELB Monitoring Trial

Try SignalFx for 14 days. No credit card required.