The Costly Elephant In The Room Is a (Data)dog | Part I
The Datadog cost series:
- Part I — The costly elephant In the room is a (Data)dog
- Part II — The magic that is Datadog pricing
- Part III — Data Puppy, shrinking Datadog costs
Unless you’re living under a rock, you’ve probably heard of Datadog, one of the most loved and used observability platforms currently available.
Recently, we at Finout set out on a journey to shine a bright light on Datadog costs, and to allow organizations to better understand them, prepare for the end-of-month invoice, and optimize their bottom line.
The main reason for us to go down this rabbit hole is that the Datadog platform lacks observability into its own cost (yep, I see the irony here), making it quite challenging to reason about that cost, optimize it, and get alerted on ominous increases.
In the upcoming series of posts, I’ll cover
- In this post — why you should take an interest in your Datadog costs
- In the second part — how Datadog pricing works, and the lessons learned as a Datadog user
- In the third part — how we can reduce the Datadog end-of-month bill
This article, the first one in the series, will try to answer the most basic question of them all: why you should care about your Datadog costs.
Firstly, a disclaimer: Datadog is an AMAZING product. And although throughout this series I'm going to cover how costly it is, it's also very valuable, and the observability it provides is exceptionally good. You pay a lot, but you also get quite a lot in return.
I feel it’s important to explicitly mention it before we dive in.
Datadog Cost 101
Datadog has numerous products for various use cases, and most of them have add-ons with additional functionality, capabilities, and cost. But the general rule of thumb you need to understand is: the more you use, the more you pay.
You are billed for the hosts you monitor or profile, for the volume (GB) of logs you send and index, and so on.
If you know, or have a hunch about, how much you're going to use, you can:
- Commit to a usage level for the relevant Datadog products, and get a decent discount.
- Prepay a "base fee": an upfront payment that gets you discounted prices for the products you use, with any usage then billed against that prepaid amount. It resembles a debit card in a way.
But the key here is that you need to know how much to commit to, and then basically wait and pray you won't overuse it.
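To make that trade-off concrete, here is a minimal sketch of the commit-vs-overuse math. All the per-host rates below are invented for illustration; they are not Datadog's actual price list.

```python
# Hypothetical rates for illustration only -- NOT Datadog's real prices.
ON_DEMAND_RATE = 23.0   # $ per host per month, pay-as-you-go (assumed)
COMMITTED_RATE = 15.0   # $ per host per month within a commitment (assumed)
OVERAGE_RATE = 27.0     # $ per host per month beyond the commitment (assumed)

def monthly_cost(hosts: int, committed_hosts: int = 0) -> float:
    """Estimate the monthly host-monitoring bill under a simple commit model."""
    if committed_hosts == 0:
        return hosts * ON_DEMAND_RATE
    overage = max(0, hosts - committed_hosts)
    return committed_hosts * COMMITTED_RATE + overage * OVERAGE_RATE

# Commit well and you beat pure on-demand:
print(monthly_cost(100))                        # 2300.0
print(monthly_cost(100, committed_hosts=100))   # 1500.0
# Overshoot the commitment and overage rates eat into the discount:
print(monthly_cost(140, committed_hosts=100))   # 2580.0
```

The point of the sketch: the discount only pays off if your commitment tracks real usage, which is exactly the number that is hard to know in advance.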
Why Should We Even Care About Datadog Costs?
If you’re asking that, you are either:
- Very optimized — in which case, real kudos to you! I know how complex achieving this is.
- Still at a small scale, so your Datadog costs are limited.
- Someone who has never used Datadog, and therefore doesn't know how easy it is to end up with a very large invoice.
Let’s break it down and understand what makes Datadog different from your main cloud provider.
It’s Not Your Core Competence
First, let’s agree that a typical DD invoice can easily reach 4–10% of your cloud provider invoice. And since we know how expensive your cloud provider is, you can now understand how expensive Datadog is too.
Datadog is NOT your core competence; it’s a supporting tool. A very good one, and a very important one, but a supporting tool nonetheless. It’s not the EC2 instances that keep your business available, or the underlying storage that holds the data that gives you a competitive edge; it’s an observability platform that helps your operations organization keep the lights on and the engine going.
You are paying precisely because it’s not your core competence: you’re paying so that you can focus on building your core value.
That being said, we should be vigilant about how much we pay for services outside our core competence too. And since there’s always room for “more monitoring”, and more monitoring means more cost, where do we draw the line?
Scalable — Throughput and $ Wise
It’s crazy scalable: anything you throw at it will be ingested, stored, and made available for you to query and alert on. Sounds amazing, right?
Well, it is indeed amazing. But anything you throw at it also gets billed.
And that’s what makes it so easy for costs to get out of hand: launching a new service that sends too many logs can easily be the difference between $100 a month and $1,500 a month. This can be due to developer error, or simply because the service you’ve just launched serves so much traffic that even the minimal necessary logging causes excessive costs (this happened to me, and I’ve written about our log cost optimization effort).
Implicitly Increases With Infrastructure Scaling
As the business grows and you add more infrastructure instances (EC2 instances, serverless functions, etc.), you implicitly add resources to be monitored, which in turn increases the Datadog billed amount.
This is the expected behaviour; you just need to keep your unit economics in check, i.e. verify that the increase in monitoring cost is indeed linearly correlated with your business and infrastructure costs.
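One lightweight way to run that unit-economics check is to track monitoring spend as a fraction of infrastructure spend, month over month, and flag months where the ratio jumps. The figures and the 7% threshold below are made-up assumptions purely for illustration:

```python
# Toy sanity check: is monitoring cost growing in line with infra cost?
# All numbers and the threshold are illustrative assumptions.

def monitoring_ratio(monitoring_cost: float, infra_cost: float) -> float:
    """Monitoring spend as a fraction of infrastructure spend."""
    return monitoring_cost / infra_cost

# Month-over-month snapshots of (infra $, monitoring $) -- made-up data.
months = [(100_000, 5_000), (120_000, 6_100), (150_000, 11_000)]

for infra, monitoring in months:
    ratio = monitoring_ratio(monitoring, infra)
    # A fixed 7% ceiling is arbitrary; pick what fits your business.
    flag = "OK" if ratio <= 0.07 else "INVESTIGATE"
    print(f"infra=${infra:,} monitoring=${monitoring:,} ratio={ratio:.1%} {flag}")
```

In this made-up series the ratio holds near 5% for two months and then jumps past the ceiling, which is exactly the kind of super-linear drift worth investigating.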
Difficult To Forecast
Lastly, it’s quite hard to predict usage, and therefore to forecast the expected cost.
Ask your average developer what their service’s throughput is, and what volume of logs each request generates. I doubt they could give you the numbers, and even if they knew, there’s still the process of translating usage into estimated cost.
Or alternatively: what’s the expected number of hosts we’re going to need to support the Black Friday sale, or the Super Bowl halftime show? And should we commit to that number, or to our average regular usage?
How many additional services are we about to develop, and how much compute, serverless, log, and metric usage will they generate?
And since those numbers are hard to come by, the right commitment to buy from Datadog is also hard to come by. Most of the time, the de facto approach is to start using the product with some (or no) commitment, and then adjust the commitment according to actual usage (which is obviously forever changing).
So Why Not Just Use Datadog Usage Dashboards?
It’s true, Datadog offers usage metering that gives you a general understanding of where you stand and what you use.
But does the average developer / product owner understand what 4TB of indexed logs with 7-day retention means in terms of the cost and operability of the service?
Can we do with 3.5TB of 3 days retention?
And how much will we save if we’ll go for it?
Do they understand how this can be optimized, and whether it even should be?
And do they understand how much it costs, which portion of it is committed up front, and which gets billed on a pay-per-use pricing model?
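Answering questions like these boils down to simple arithmetic once you plug in your contract’s rates. Here is a minimal sketch using a deliberately simplified per-GB model with invented prices; your actual Datadog rates and billing dimensions will differ:

```python
# Illustrative log-cost model with invented rates; check your own contract.
INGEST_RATE = 0.10                           # $ per GB ingested (assumed)
INDEX_RATE = {3: 1.00, 7: 1.50, 15: 2.50}    # $ per GB indexed, by retention days (assumed)

def monthly_log_cost(ingested_gb: float, indexed_gb: float, retention_days: int) -> float:
    """Estimate monthly log cost: ingestion plus retention-priced indexing."""
    return ingested_gb * INGEST_RATE + indexed_gb * INDEX_RATE[retention_days]

current = monthly_log_cost(4096, 4096, 7)    # index 4TB at 7-day retention
proposed = monthly_log_cost(4096, 3584, 3)   # index only 3.5TB at 3-day retention
print(f"current:  ${current:,.2f}")
print(f"proposed: ${proposed:,.2f}")
print(f"savings:  ${current - proposed:,.2f}")
```

Under these made-up rates, trimming what you index and shortening retention cuts the monthly log bill substantially, which is why translating raw usage numbers into dollars matters.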
To Care Or Not To Care
We pay for Datadog so someone else will handle the headache of managing such a complex platform, but when the Datadog expense becomes a headache of its own, it’s time to regroup and rethink your strategy.
Through the right usage analysis, turned into usage optimizations and better commitment allocation, it’s possible to save a large chunk of the end-of-month bill. A chunk that can be in the range of tens, and even hundreds, of thousands of dollars a year.
And since, either way, once your bill gets painful enough you’re going to do the math and recalculate your commitments from time to time, why not do it better and save more?
Next up
This was an initial post on why Datadog costs are hard to reason about, and why we should care.
In the next post we’ll talk about the Datadog pricing models and their pitfalls.
As always, thoughts and comments are welcome on Twitter at @cherkaskyb