Reasoning about AWS costs using the AWS Cost Explorer and the AWS CLI
🚂 I’ll walk you through a train of thought. Along the way, you’ll see how I discover what aspects of CloudWatch and API Gateway are generating the most charges in the example AWS account, and how I drill deeper using different filters and even a script:
- Log into the AWS Management Console
- Go to Cost Management
- Open Cost Explorer
- I like to drill down to the Last 1 Day to see what is currently generating charges in the AWS account. So select 1D and Apply:
After applying, Cost Explorer shows the last day’s charges per service.
By default, the services are ranked with the biggest dollar-eaters on top. To see what aspect of CloudWatch is having the biggest impact:
1. Filter by the service you want to get details on, in this case CloudWatch. Enter the service name, Select All (2), and Apply filters:
2. Now that you’ve filtered by the CloudWatch service, group by “Usage Type”:
3. Now you’ll see exactly what aspects of CloudWatch are eating up the most cost. In this case it’s GetMetricData (GMD).
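The same filter-and-group steps can also be run from a script. The sketch below is an illustration in Python, not part of the original walkthrough: it builds the arguments for the aws ce get-cost-and-usage CLI command and ranks the returned usage types by cost. The service value "AmazonCloudWatch", the UnblendedCost metric, and all function names are assumptions; check which service names your own account reports before relying on it.

```python
import json
import subprocess
from collections import defaultdict
from datetime import date, timedelta


def build_ce_args(service, end=None):
    """Build CLI arguments for one day of costs, filtered to one service
    and grouped by usage type (mirrors the console steps above)."""
    end = end or date.today()
    start = end - timedelta(days=1)  # Cost Explorer treats End as exclusive
    service_filter = {"Dimensions": {"Key": "SERVICE", "Values": [service]}}
    return [
        "aws", "ce", "get-cost-and-usage",
        "--time-period", f"Start={start.isoformat()},End={end.isoformat()}",
        "--granularity", "DAILY",
        "--metrics", "UnblendedCost",
        "--filter", json.dumps(service_filter),
        "--group-by", "Type=DIMENSION,Key=USAGE_TYPE",
        "--output", "json",
    ]


def rank_usage_types(response):
    """Sum the cost per usage type and return (usage_type, cost) pairs,
    most expensive first."""
    totals = defaultdict(float)
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            amount = group["Metrics"]["UnblendedCost"]["Amount"]
            totals[group["Keys"][0]] += float(amount)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def top_usage_types(service="AmazonCloudWatch"):
    """Run the CLI (needs credentials with Cost Explorer access)."""
    result = subprocess.run(build_ce_args(service), check=True,
                            capture_output=True, text=True)
    return rank_usage_types(json.loads(result.stdout))
```

Calling top_usage_types() with valid credentials should return a list whose first entry is the most expensive usage type, which is the same ranking the console shows after grouping by Usage Type.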
As I detailed in another post, it’s typically monitoring services, developed in-house or by a 3rd party, that are responsible for the GMD API requests, so lowering the frequency of those requests can reduce the GMD-Metrics bill.
One tool that may be generating GMD-Metrics charges is the AWS Instance Scheduler Solution, which makes PutMetricData requests. If you change that solution’s CloudWatch Rule to run much more frequently, say every 2 minutes, you’d end up paying proportionally more for GMD-Metrics.
PutMetricData costs the same as GetMetricData, and I don’t find anything like “PutMetric” or “PMD” in the Usage Type filter for CloudWatch. Therefore, I assume GMD costs could also include PutMetricData costs. (Add a comment if there’s a different reason for it not showing up in the filters.)
If I drill down in the same way for API Gateway, I can see that my biggest cost is for the API Gateway Cache:
That makes me ask, “Is this an API that I still want caching on, or did I just set that up a while ago as an experiment and forget to take it down?” To drill down further, I enter “cache” in the Usage Type filter, Select All, and Apply filters:
Conveniently, each Usage Type is prefixed with a region code, in this case USE1 and USE2. So the API Gateway cache costs are coming from two AWS regions: US East 1 (us-east-1) and US East 2 (us-east-2).
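If you export usage types to a script, the prefix can be turned into a region name with a small lookup. This is a minimal sketch: only the two prefixes seen in this walkthrough are included, and the example usage type string in the comment is made up for illustration.

```python
# Usage-type prefixes mapped to regions. Only the two prefixes from this
# walkthrough are listed; extend the table for other regions you use.
PREFIX_TO_REGION = {
    "USE1": "us-east-1",
    "USE2": "us-east-2",
}


def region_for_usage_type(usage_type):
    """Return the region for a usage type such as 'USE2-ExampleUsage'
    (a hypothetical name), or None when there is no prefix or the
    prefix is not in the table."""
    prefix, sep, _rest = usage_type.partition("-")
    if not sep:
        return None
    return PREFIX_TO_REGION.get(prefix)
```

Returning None for unknown prefixes, rather than guessing, makes it obvious when the table needs another entry.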
At that point, it would be easy to use tags to figure out what’s causing the cost by clicking on “Tag” in the Group By section... but only if you have good tags on all your resources.
Tag everything, so you can see exactly what apps are running up your AWS bill.
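To get a feel for how well tagged your costs are, you can group a Cost Explorer query by a tag (for example with --group-by Type=TAG,Key=app on aws ce get-cost-and-usage) and measure how much of the cost has no tag value. The sketch below assumes a hypothetical tag key named app that has been activated as a cost allocation tag, and assumes the response was requested with the UnblendedCost metric; the function names are illustrative.

```python
from collections import defaultdict


def cost_by_tag_value(response, tag_key):
    """Sum cost per tag value from a response grouped by one tag.
    Cost Explorer reports tag groups as 'key$value'; an empty value
    (for example 'app$') means the cost had no value for that tag."""
    totals = defaultdict(float)
    marker = tag_key + "$"
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            key = group["Keys"][0]
            value = key[len(marker):] if key.startswith(marker) else key
            amount = group["Metrics"]["UnblendedCost"]["Amount"]
            totals[value or "(untagged)"] += float(amount)
    return dict(totals)


def untagged_share(totals):
    """Fraction of total cost that carries no tag value (0.0 if no cost)."""
    total = sum(totals.values())
    return totals.get("(untagged)", 0.0) / total if total else 0.0
```

A high untagged share is a sign that grouping by tag will not tell you much yet, which is the situation described next.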
In my case, the APIs that had cache enabled didn’t have descriptive enough tags for me to do anything with them:
However, if you don’t have good tags, then you may need to write a little script to scan all your APIs.
Typically, when I need to write such a script, I google the following keywords: aws, [the name of the service], cli. In the case of API Gateway, that brings me to the aws apigateway reference.
Next, I just search the list of commands for “cache”. Since neither of the two cache-related commands is a “list” or “get” command (both are “flush” commands), I know I’m going to have to dig deeper.
So I look at the commands that look like they’ll list the properties of the APIs, such as get-rest-apis, and do a command-F for the word “cache” on the detail page of that command. It doesn’t show up there either.
Then I just skim through all the get-prefixed commands and realize, “Ah, the cache settings can probably be configured on a stage-by-stage basis, so they’re probably under get-stage.” Sure enough, a quick find on the get-stage page for the word “cache” reveals that cacheClusterEnabled is indeed stage-based.
Then it’s just a matter of working backwards from there: loop through all the API Gateway REST APIs in the account, and, in a nested loop, go through all the stages within each of those APIs.
In the end, here’s the script I wrote. It uses the AWS CLI to scan all the API Gateway REST APIs in a region and report which APIs have cache enabled in any of their stages:
Before running the script, I set my region (export AWS_DEFAULT_REGION=us-east-2) to one of the two regions that Cost Explorer had told me were generating API Gateway caching costs.
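For readers who prefer Python, the nested loop described above could be sketched roughly as follows. This is an illustrative approximation, not the script referred to above: it shells out to the AWS CLI, uses get-rest-apis for the outer loop, and uses get-stages (which returns the same stage fields as get-stage, for all stages of an API at once) for the inner loop. The function names and the output format are assumptions.

```python
import json
import subprocess


def run_aws(args):
    """Run an AWS CLI command and return its parsed JSON output.
    The region comes from AWS_DEFAULT_REGION, as set above."""
    result = subprocess.run(["aws", *args, "--output", "json"],
                            check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def find_cached_stages(run=run_aws):
    """Return (api_name, api_id, stage_name, cache_size) for every stage
    whose cacheClusterEnabled flag is true."""
    cached = []
    # Outer loop: every REST API in the current region.
    for api in run(["apigateway", "get-rest-apis"]).get("items", []):
        stages = run(["apigateway", "get-stages", "--rest-api-id", api["id"]])
        # Inner loop: every stage of that API (get-stages returns "item").
        for stage in stages.get("item", []):
            if stage.get("cacheClusterEnabled"):
                cached.append((api.get("name"), api["id"],
                               stage["stageName"],
                               stage.get("cacheClusterSize")))
    return cached


def main():
    """Print one line per cache-enabled stage."""
    for name, api_id, stage, size in find_cached_stages():
        print(f"{name} ({api_id}) stage={stage} cacheClusterSize={size}")
```

Call main() once per region that Cost Explorer flagged, changing AWS_DEFAULT_REGION in between. Passing the CLI runner in as a parameter is a design choice that lets you try the loop against canned data before pointing it at a real account.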
API Gateway is nearly free for little-used apps, at $3.50 per million requests in the Ohio region… until you turn on caching in API Gateway, because then you’re paying not just per request but also per hour for cache storage.
Per-request pricing is very different from continual storage costs if it’s an app no one is using. On a basic API without caching, if there are no requests, you don’t pay anything ($0/month). On a cache-enabled API, however, you still pay for storage by the hour even when there are no requests (maybe $15–$100/month).
If there are no API Gateway requests you don’t pay anything, but storage costs money regardless of whether you use it or not.
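The arithmetic behind that comparison can be sketched in a few lines. The $3.50 per million requests figure comes from the text above; the $0.02 hourly cache rate and the 730-hour month are assumptions chosen only to illustrate the shape of the bill, so check the API Gateway pricing page for your cache size and region.

```python
HOURS_PER_MONTH = 730  # assumed average month length for the estimate


def monthly_cost(requests, price_per_million=3.50, cache_hourly_rate=0.0):
    """Estimate a monthly API Gateway bill: a per-request charge plus an
    hourly cache charge that accrues whether or not anyone calls the API."""
    request_cost = requests / 1_000_000 * price_per_million
    cache_cost = cache_hourly_rate * HOURS_PER_MONTH
    return request_cost + cache_cost


# An idle API with no cache costs nothing...
idle_plain = monthly_cost(0)
# ...but the same idle API with a cache still pays for every hour.
idle_cached = monthly_cost(0, cache_hourly_rate=0.02)  # assumed rate

print(f"idle, no cache:   ${idle_plain:.2f}/month")
print(f"idle, with cache: ${idle_cached:.2f}/month")
```

With the assumed rate, the idle cached API lands at roughly the low end of the range mentioned above, while the uncached one stays at zero.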
With API Gateway’s default settings, you pay nothing until the API is invoked. Paying $5/month for cache on an app you just forgot to delete is $5/month too much, so it’s worth tracking that down using AWS Cost Explorer and the above script.
Take the time to regularly drill down into your AWS bill using the AWS Cost Explorer. Write a script if necessary, and share it with the world, so that we can reuse it.