TimescaleDB is a great product, but if you plan to go with them long term, there are a few points to consider:
* They are still trying to figure out their monetization strategy. Initially, they betted on their on-premise Enterprise version, then abandoned it. Now they are pushing their cloud version.
* Even though most of their code is licensed under the Apache license, some code is under their proprietary license.
* I'm sure one can get some ideas about their development directions from their issue tracker and source code, but they don't have any public product roadmap.
* Even though the product itself is technically very stable, the version compatibility leaves a lot to be desired. There are removed features and broken APIs from version to version.
* Their commercial support terms for on-premise instances don't seem to be well defined, not publicly at least.
Timescale Co-founder here. Happy to address your concerns!
1. Monetization strategy
This funding round is actually a sign that our business model is working really well.
To quote Redpoint Ventures, who led this funding round:
"The [Timescale] team capitalized on their significant community momentum last year, with their cloud business being one of the fastest-growing database businesses we have seen in the past 20+ years." [0]
2. Licensing
Most companies (including open-source companies) actually have both open-source and proprietary software, but the proprietary software is often hidden inside private repos. The difference with Timescale is that we have made the source code for our proprietary software available (on Github), even allowing users to modify it (eg "right to repair"), and made all of our software free (ie no paid software features). [1]
3. Public product roadmap
We aim to be transparent re: product roadmap via Github, blog posts, etc, but I appreciate the feedback that we could be more transparent. Thanks!
4. Version compatibility / broken APIs
Could you say more? AFAIK the only time we "broke" (ie changed) some APIs was with TimescaleDB 2.0, and when we did so we explained why we did that (mostly to improve user experience based on feedback). We take this topic very seriously and even the decision to do so in 2.0 was not something we did lightly (and it was also made after a lot of discussion with users). More about this decision here in our docs: [2]
5. Commercial support for on-premise
We offer free support for on-premise instances via Slack (where you can often find our engineers, support team, CTO, and myself). [3] However, if you would like a higher level of support for on premise (e.g., commercial SLAs), please reach out to us directly (e.g., via the form on that same page). [3]
To follow up on Ajay's point, we really do take compatibility and stability seriously.
I believe that our "major version" upgrade from 1.x to 2.0 was the first time we changed/broke any APIs, but that involved a long beta/RC process, much documentation about the changes [1], and upgrades that were also meant to migrate seamlessly.
For example, upgrading from 1.x to 2.0 was still just running `ALTER EXTENSION timescaledb UPDATE`. The main difference was that if you were, for example, using some of our informational views in your applications, those had changed a bit. Or if you were querying internal catalogs in your app (although that is never recommended =)
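For concreteness, the in-place path is roughly the following (a minimal sketch; the exact steps for your version pair are in the upgrade docs):

```sql
-- Minimal sketch of an in-place extension upgrade. The docs recommend
-- connecting with `psql -X` so nothing in .psqlrc loads the extension first.
ALTER EXTENSION timescaledb UPDATE;

-- Confirm the installed version afterwards:
SELECT extversion FROM pg_extension WHERE extname = 'timescaledb';
```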
Even after 2.0 was launched, we did backport bug fixes to some follow-on 1.x releases, and continued to support users running 1.x on our cloud platform.
Fair point about adaptive chunking. You sound like a long-term user!
There is always a trade-off between getting features to users quickly to experiment and incrementally improve, versus always doing it very conservatively.
When we launched adaptive chunking (introduced in 0.11, deprecated in 1.2), we explicitly marked it as beta and default off, to hopefully reflect that. [1]
The approach we are now taking with Timescale Analytics [2] is to have an explicit distinction between experimental features (which will be part of a distinct "experimental" schema in the database, and must be expressly turned on with appropriate warnings) and stable features. Hopefully this can help find a good balance between stability and velocity, but feedback welcome!
Lately, I've been studying machine learning, from point zero, with a focus on time series analysis. Two months in, I've already completed a course on Python and another book on Pandas. Several hundred hours later, in the fourth chapter of a book I paid for on deep learning and time series analysis, they provided me with the most important information I needed: there is no evidence that deep learning works better than traditional statistical analysis using classical methods like SARIMA and ETS. Sure, that's great if an academic is interested in theory and hopefully making a breakthrough; however, the rest of us who are interested in applied work should stick with the classical methods.
I was going to write a lot here but I'll keep it short.
What I discovered is that everything I want to do can best be done in PostgreSQL. It's one thing to do data analysis in a Python notebook and another in an environment that works dynamically on a server. My first guess was to do the heavy lifting in Python with Numpy, Pandas, and machine learning, and have the node server -- instead of Django, and if I'm learning a new web framework it will be Phoenix -- execute the Python scripts through stdin/stdout. Since I started, I've learned that I don't need machine learning and that I can do the calculations inside PostgreSQL, sometimes orders of magnitude faster than in Python.
I'm using TimescaleDB, which provides the PostgreSQL time_bucket() function and, with chunks, should scale very well. First I tried to integrate it with Prisma in node; however, that proved to be far too difficult and convoluted. I reverted back to using TypeORM in node, and it was extremely easy to run all the boilerplate code to initialize the TimescaleDB plugin inside of migrations, which would probably be just as easy in another framework like Phoenix with Ecto. Sometimes I use SQL queries in a string literal and other times I use the query builder for more dynamic interaction with the database and to leverage some of TypeORM's other features beyond only being a connection manager.
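For anyone curious, a minimal sketch of the kind of query this enables (the `metrics` table and columns here are made up):

```sql
-- Average each device's readings into 15-minute buckets over the last day.
SELECT time_bucket('15 minutes', time) AS bucket,
       device_id,
       avg(value) AS avg_value
FROM metrics
WHERE time > now() - INTERVAL '1 day'
GROUP BY bucket, device_id
ORDER BY bucket;
```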
What I discovered -- and interestingly, someone yesterday shared a popular link to a blog post on the subject[0] -- is that for most time series analysis, Pandas isn't required and is perhaps not the fastest solution. Grokking window functions was a little difficult until I found this lecture on YouTube, Postgres Window Magic[1]. Leveraging and understanding window functions in SQL is probably the most important skill to have.
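As a hedged example of the sort of window function I mean, again on a made-up `metrics` table:

```sql
-- Change since the previous reading, per device, using lag() over an
-- ordered window (no Pandas needed for this kind of calculation).
SELECT time,
       device_id,
       value,
       value - lag(value) OVER (PARTITION BY device_id ORDER BY time) AS delta
FROM metrics
ORDER BY device_id, time;
```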
I don't need Python and Pandas for time series analysis. Using TimescaleDB and some increased knowledge of PostgreSQL, I can do time series analysis with all the same infrastructure I've been using for the past several years.
Yeah, I didn't fully understand the problem of time series forecasting. Looks like I'll probably need C or Python bindings / a bridge to do the computation.
> Since I started, I've learned that I don't need machine learning and that I can do the calculations inside PostgreSQL, sometimes orders of magnitude faster than in Python.
Remember that you are in a unique position where you know both the ML application and the specialized PostgreSQL to implement it.
The market is paying big bucks for people who have either of those skills. If you are making less than 300k/y (at the very least), move out now ;)
I prefer 'esoteric' programming languages too; they are better. But experience has taught me that the ecosystem is way more important for shipping stuff, so I stick mostly to mainstream ones.
> Most companies (including open-source companies) actually have both open-source and proprietary software, but the proprietary software is often hidden inside private repos. The difference with Timescale is that we have made the source code for our proprietary software available (on Github), even allowing users to modify it (eg "right to repair"), and made all of our software free (ie no paid software features).
So I personally like your company, but I find these sorts of marketing speak responses obnoxious. Your communications strategy here is causing brand harm, not benefit.
1) You have an open source core, with proprietary components.
2) Open source adopters get a crippled product.
3) You have a custom license for the proprietary components, which is designed to allow people to make some use of those, but it is poorly written and ambiguous (preventing many types of commercial use), not open-source compatible (preventing integration into open source projects), and requires a lawyer to review (preventing integration by smaller projects).
This feels like your Achilles' Heel.
Troll Tech tried to go down this line for years with their QPL license. And they at least had sane messaging, whereas whenever I read your messaging, it feels weaselly, and it changes week-to-week. Still, they didn't really take off until they went with a licensing system customers could trust and understand.
The standard dual-license model would be AGPL and commercial (or GPL+commercial).
* Most open source developers won't mind (or even notice) licenses, so long as they're open source and have the nice OSI and FSF logos.
* Most commercial companies won't mind paying $$$.
Commercial customers treat you like Microsoft. Open source developers treat you like community members. Hybrid customers are okay too; if I'm working on a piece of BSD code, I can use the AGPL license on your code, while commercial users of my code can buy a commercial license from you.
And if you insist on the crazy custom license, figure out the messaging. This was better than what I read before. "Proprietary with a public repo" makes more sense than previous messaging, which sounded like open source but wasn't. At that point, at least the license overdelivers rather than underdelivers. I still suspect that at some point, as an adopter who can't or won't use those components, they'll become increasingly mandatory if you ever fall on hard times. The problem is still that it makes it sound like you have open source and proprietary products. You don't. You have a product with open source and proprietary components, a confused freemium model, and not something I'd ever use without consulting a good lawyer, who in turn would tell me to stay away.
There are many other good models. You could go in the other direction and close up a bit too.
The response didn't seem marketing heavy at all and directly addressed all of the points...
I see posts like yours all the time - attacking companies who believe in open source but know that the existing situation can and will lead to abuse by huge players in the market. It sounds like they have a very happy medium.
I find posts like this very frustrating because you almost never see the same kind of standards for proprietary software. People get less pushback for closed source than they do for "90% open, with restrictions on the last 10%".
As for AGPL dual license, have you tried selling AGPL software before? Even dual licensed, you've just added a massive roadblock - and it's no better than a custom license anyways, since it implies one.
Not to mention you've completely broken any path from "I'm a free user" to "I'm a paying user" - someone has to decide upfront to pay. Given that this is the most significant "win" for open source software (try before you buy, easy to inspect and get started with) it's kind of ridiculous how often I see it suggested.
Let's say I work for a company willing to pay for software. I have a hackweek where I want to try out TimescaleDB, in a world where it's AGPL or, upon payment, dual licensed. That's now dead in the water - I can't use it, AGPL is banned at every company I've worked at and we're not going to pay for me to try it out for a hackweek project.
> You have a custom license for the proprietary components, which is designed to allow people to make some use of those, but it is poorly written and ambiguous
I disagree with both the tone and content of your whole comment, but this bit in particular stood out. I'm a very happy TimescaleDB user, and I really like the company and community around it too. Also, support on Slack is amazing and they are always responsive on GitHub. And of course, I have read the license - to me, it's easy to read, and the intent is very clear. I see no ambiguity.
The intent is pretty clear. Unfortunately, the actual legal language is as clear as mud:
"that does not expose or give access to, directly or indirectly (e.g., via a wrapper), the Timescale Data Definition Interfaces or the Timescale Data Manipulation Interfaces to any person or entity other than You or Your employees and Contractors working on Your behalf"
If I have a system, and it has an AJAX API, at what point am I violating this? Virtually every system I build provides customers with access to data via some API which uses "SELECT, INSERT, UPDATE, and DELETE" "via a wrapper." I have no intent of competing with TimescaleDB for hosting, but it's hard to argue that data interfaces don't provide some form of access to Timescale Data Manipulation Interfaces "via a wrapper."
"the customer is prohibited, either contractually or technically, from defining, redefining, or modifying the database schema or other structural aspects of database objects, such as through use of the Timescale Data Definition Interfaces, in a Timescale Database utilized by such Value Added Products or Services"
I won't even begin to get into this one.
If you build on this kind of legal language, you're taking on a legal liability the size of a moon crater.
The first line you quote is in a section defining rights for Internal Use (Section 2.1.a). If you are providing access to your customers, it's not internal use.
The second quote is about providing access to customers (Section 2.1.b). Note that it certainly allows SELECTs, INSERTs, UPDATEs, and DELETEs (those are DML operations); it prohibits you from allowing customers to do things like `CREATE TABLE` (those are DDL operations).
This is our approach to defining what it means to provide "TimescaleDB-as-a-Service" from a more technical perspective, one that hopefully a developer can grok, as opposed to just stating something about "you can't compete", which is open to broader interpretation.
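To illustrate the DML vs DDL line in plain Postgres terms (this is only a sketch of how one might technically restrict a customer-facing role to DML; the role and schema names are made up, and none of this is legal advice):

```sql
-- Hypothetical role for end customers: DML on existing tables is allowed,
-- but no DDL (no creating or altering objects in the schema).
CREATE ROLE customer_app LOGIN PASSWORD 'changeme';
GRANT USAGE ON SCHEMA app TO customer_app;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA app TO customer_app;
REVOKE CREATE ON SCHEMA app FROM customer_app;
```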
I think you've done a very good job of writing something which a developer will grok.
I think you've done a very poor job of writing something which a court will grok the same way.
I think that's where the astronomical potential liability comes in with respect to using your product.
A lot of 2.1.a versus 2.1.b will hinge on details of how a court will read ambiguous language like "not primarily database storage or operations products". I assume you wanted to say "not primarily database storage or database operations products." However, it could just as easily read "not primarily operations or database storage products." At that point, "operations" has broadly different meanings (e.g. business operations?).
And aside from that, if I'm making a medical database product, is that primarily "database storage?" Probably.
The problem with ambiguous legal language is that:
1) You, or a vulture successor, can plausibly sue anyone who does just about anything.
2) If we assume your vulture successor has a 20% chance of winning $20 million, the expected outcome is a $4 million settlement.
Which is why good lawyers avoid it. The whole document is just bad legal language.
But even if it was GOOD legal language, it wouldn't matter. The difference between a custom-form license and a standard OSI license is that competent customers need to spend a few grand on legal fees before they use yours.
I understand what you're trying to do, but every other organization that went that way eventually went with a standard license. You'd be better off doing likewise. Or if you really can't, you're better off working to make a community-recognized standard form license which is used by enough products that it has a standard, common, recognized legal understanding.
I know the risks of the AGPL, and where it will or won't hurt me. I don't know the risks of your license, except that they're obviously huge.
I realize you might not be comfortable with the license.
For others, I can share at least that it was drafted by some of the most experienced copyright & IP counsel there is, including with significant open-source licensing experience.
But anyway, we're providing it as free software, so if you don't feel comfortable with it, you are certainly free to use our Apache-2 version. Cheers!
Terms-of-service and Facebook's employment agreement were generally drafted by experienced counsel. That means they do a good job of protecting the person on the opposite side of the table, not of protecting me.
And the term isn't "free software." It's "freemium software."
It's exactly this sort of comment which makes me distrust TimescaleDB.
I don't know about the US specifically, but here in Europe the courts take a dim view of trying to weasel around wording when the intent is clear - if the intent is clear, that's the most important thing.
Even within the EU, there isn't a "here in Europe." Europe has common law jurisdictions, like Ireland, and civil law jurisdictions, like France, and there isn't uniformity.
I don't know about civil law jurisdictions, but in the US, this license is a liability bomb.
For background, I haven't used TimescaleDB before, but I've done some pretty advanced ORM work to vertically shard PG tables in Rails and I know PG pretty well, so I'm quite curious about TimescaleDB.
> * Even though most of their code is licensed under the Apache license, some code is under their proprietary license.
I don't really think this is a perfectly fair characterization. Their proprietary license is essentially "don't host a cloud database and charge for it" to stop Amazon from building TimescaleDB right into RDS, or similar.
I think it's a totally fair license without too much to worry about if they go out of business.
> * Even though the product itself is technically very stable, the version compatibility leaves a lot to be desired. There are removed features and broken APIs from version to version.
This would be my biggest worry. Upgrading Postgres is already stressful enough, having to deal with broken APIs from version to version would leave me pretty upset, though I've not heard of anyone complain about this before, so I'm not sure how much of a problem this is in practice.
> > * Even though most of their code is licensed under the Apache license, some code is under their proprietary license.
> I don't really think this is a perfectly fair characterization. Their proprietary license is essentially "don't host a cloud database and charge for it" to stop Amazon from building TimescaleDB right into RDS, or similar.
"yeah, go ahead, infringe that copyright and host a internal/for-direct-clients only database, because i guess that would be OK from their license, even though it is not explicitly allowed"
pardon the sarcasm, but i literally heard that from our lawyers today regarding another project's license, as something (quoting again) "no lawyer would ever say to their clients".
As a bit of an aside Michael, I've been pretty impressed with the quality of your team's response on HackerNews. If my skillset was a better fit for your company I'd consider applying. I hope you all get some great talent with this latest raise and I hope I get a chance to try out the product soon.
Even though it is proprietary, I appreciate the current fine print in the Timescale license compared to most other proprietary licenses. It doesn't have the scary, ambiguous language that the SSPL contains, which could apply even to small, non-cloud-provider users, and they have nice "we won't sue you" clauses that were written favorably for users.
At least that's what I think; I'd want to hear kemitchell's review of the most recent iteration of their license. I think it incorporates much of what he's discussed as the correct legal direction for open-except-for-clouds licenses, which strikes the right balance between user protections and safeguards against cloud providers.
Just in case you're not a native English speaker: The verb "to bet" is irregular and the past tense is simply "they bet" rather than "betted" (which would be far more logical).
Right, the dictionary is not your friend here. It is true that "betted" is occasionally used... but we're talking maybe 1% of usage, and mostly in older text. I recommend sticking with "bet".
People want to buy services, not software; that's why they go to the cloud in the first place. Vendors keep fighting this to their own detriment, so it's nice to see Timescale is actually giving customers what they want.
Modifying business models to optimize for success is a good thing, not a negative.
Very happy with our choice to use TimescaleDB. The idea to simply make it a Postgres extension was brilliant. The compression release was one of the cooler features I've seen in recent times. Row database for recent transactional data, columnar compressed database for historical OLAP workloads - pretty much automagically.
We're under 100GB; I'm sure vanilla Postgres would suit our needs too. However, adding TimescaleDB on top was not much of an investment, and in exchange we got an interface for operations we do often, effortless continuous aggregation, near-constant time appends, and a native way to leave data mutable for a period of time before marking it immutable and compressing it.
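For anyone curious, the setup is roughly this (a sketch with made-up table and column names; the 7-day threshold is arbitrary):

```sql
-- A hypothetical table turned into a hypertable: row-oriented for recent
-- data, compressed (columnar) once chunks are older than 7 days.
CREATE TABLE conditions (
  time        TIMESTAMPTZ NOT NULL,
  device_id   TEXT        NOT NULL,
  temperature DOUBLE PRECISION
);
SELECT create_hypertable('conditions', 'time');

ALTER TABLE conditions SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'device_id'
);
SELECT add_compression_policy('conditions', INTERVAL '7 days');
```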
The performance is a great feature, but it's also just an intuitive, familiar (pretty much just SQL) tool that makes life easier.
I was in Grand Central Tech with the Timescale folks and became friends with both Ajay and Mike. Ajay gave me a lot of good thoughts on building my startup (thanks!), and Mike is... well... just hyper smart. That is to say, I'm not surprised to read this and, for what it's worth: they deserve it, really great humans! :)
I guess this is an unpopular opinion, but I’ve found InfluxDB to be superb for being trivial to get going in a high performance way. I have never touched InfluxDB Cloud - always just InfluxDB either as an arbitrary process or container. Examples of where I’ve found InfluxDB to be more pleasant:
* InfluxDB has way better documentation on functions. For example, look up moving average by time (not points) on TimescaleDB vs InfluxDB (a rough SQL equivalent is sketched after this list). We use these more complex queries and have no problem on Influx. Going further, the number of built-in functions is impressive, along with the ability to define new functions.
* InfluxDB containers are totally self contained which is great for simple architectures. As a process, InfluxDB is a single executable thanks to Go.
* This is extremely subjective, but I find Flux easier to comprehend as a separate query vs. the use of SQL for higher-complexity functions; however, I am sure this is due to my lack of experience and know-how in writing said queries in SQL.
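For reference, here is roughly what I mean by moving average by time in SQL (a hedged sketch on a made-up `metrics` table; it needs PostgreSQL 11+ for RANGE frames with an interval offset):

```sql
-- Moving average over the trailing hour (by time, not by row count),
-- computed per device.
SELECT time,
       device_id,
       avg(value) OVER (
         PARTITION BY device_id
         ORDER BY time
         RANGE BETWEEN INTERVAL '1 hour' PRECEDING AND CURRENT ROW
       ) AS moving_avg_1h
FROM metrics;
```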
The benchmarks are interesting, showing TimescaleDB to be the clear winner in most scenarios.
For me that's nice, but it's a bigger deal to me personally that I already have Postgres and SQL experience that translates directly to TimescaleDB; I don't have to learn a new tool and query language. Development is complex enough and I have to learn too many things as it is. The older I get, the less enthusiastic I am about adding something new to the stack.
Agree totally on the "double down on what you know" point. That pays off in spades usually.
Tangentially related to that: their mongo benchmark numbers always looked odd to me. Given that I've used mongo for 10+ years for high throughput time series data without major issues, I decided to do my own benchmarks. In my testing, mongo outperformed timescale significantly both in write throughput and query performance.
This is likely in part due to the fact that I'm using well-understood internal data from real production systems, and as such my ability to be able to build performant indexes / query strategies in the database that I know best introduces a performance bias.
I always take benchmarks with a grain of salt, for this reason. And I try to lean into the tech I understand best.
Hi @spmurrayzzz, thanks for the feedback. (Timescale person)
We always strive to do the best and fairest benchmarks we can, and for that reason, all our benchmarks are fully open-source for both repeatability and improvements/contributions:
We also really did spend a lot of time investigating approaches with MongoDB, so you'll see our benchmarks actually evaluate two _different_ ways to use time-series data with MongoDB (culled & optimized from suggestions in MongoDB forums). But we always welcome feedback:
Thanks for engaging here, and congrats on the round!
I've reviewed all these resources multiple times in the past, which is what prompted me to do my own benchmarks (in which mongo outperforms both multinode and single node configurations).
Some issues I noticed:
- You're using gopkg.in/mgo.v2, which is a mongo driver that hasn't had a release in 6 years. Not sure of the general performance impact here, but my tests use mongo 4.2 with a modern node.js driver. So that's one difference.
- Your indexing strategy for mongo can easily be changed to get much better performance than the naive compound approach you took in the code (measurement > tags.hostname > timestamp).
- You didn't test the horizontal scaling path at all; this is where mongo arguably shines.
I'm glad you all open source this stuff because it helps engineering leaders make better decisions, so thank you for that. But your data does not align with my own: either with our production metrics or with structured load testing.
I also recall that when we [Timescale] first did our benchmarks vs Mongo for time-series, our use of MongoDB for time-series beat Mongo's own benchmarks :-)
That's probably not something most companies would do for benchmarking, but we take ours seriously :-)
I'm currently using InfluxDB (v1 not v2) and I've looked into switching over to Timescale DB.
Currently I'm stuck on figuring out how to get data into TimescaleDB. My company makes heavy use of Telegraf, which is a natural fit for InfluxDB, but not so much for TimescaleDB. The original pull request for the Telegraf plugin for Postgres/TimescaleDB was closed because the author was non-responsive: https://github.com/influxdata/telegraf/pull/3428
I can even write data to it using simple TCP or UDP tools like netcat or curl. And for some cases I have simple scripts that do exactly that. TimescaleDB, on the other hand, requires some sort of Postgres client.
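For context, the "Postgres client" route I've found so far looks roughly like this (a sketch; the table, columns, and file name are made up):

```sql
-- One way to feed TimescaleDB without Telegraf: pipe CSV through psql
-- from a shell, e.g.
--   cat readings.csv | psql "$DATABASE_URL" -c "\copy metrics FROM STDIN CSV"
-- Or plain inserts from any Postgres client/driver:
INSERT INTO metrics (time, host, value)
VALUES (now(), 'web-01', 0.42);
```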
What do you, or other people, use for writing data into TimescaleDB?
One of our active community members took over the effort to merge PostgreSQL/TimescaleDB support into telegraf here, so hopefully that can make progress:
Yeah, I saw that. I guess I'm just a little disappointed that nobody on the TimescaleDB team saw the process through. But I understand if you have higher priorities.
I still wonder what other people are using to feed information into TimescaleDB. I'm wondering if I should switch to a different approach, such as using Telegraf but routing the data to something else that will push data into TimescaleDB.
Don't want this to come across as overly defensive, but it was under PR review by Influx for 3+ years with little progress (first submitted in November 2017), and during that period I think we did something like 2 significant rewrites. It became a bit of a moving target against Telegraf that got harder to prioritize.
Fully agreed on having that SQL experience guiding you on a totally reasonable solution.
However, our problem space is not high-cardinality data; it more closely aligns with the first performance comparison, with 10 devices and 10 metrics. The ease of getting high performance with pre-implemented functions is great for us. Reliability is obviously a concern, and I can agree that if data is sacred, then choosing something built on Postgres is going to be a better choice.
Again, this is just our problem space: small-scale deployments on many machines with no preexisting RDBMS, low-cardinality data, etc. I think it'd be a different story if we were huge, but for us, InfluxDB provides some seriously handy features and is worth consideration if your problem is similar.
We totally hear you that usability and the developer experience is super important, especially when starting out.
One project we launched earlier this year, "Timescale Analytics", actually seeks to address exactly this, e.g., bringing more useful features and easier programmability to SQL [1], and you can see (or add to) the discussion on github [2].
Also informed by some of the super helpful functions we've seen in PromQL. And by the way, if you are interested in PromQL, we have 100% compatibility with PromQL through Promscale [3], which provides an observability platform for Prometheus data (built on TimescaleDB).
Postgres as a base is battle-tested, extremely reliable, and well understood.
Most developers are already familiar with Postgres or at least SQL.
The tooling around Postgres is basically universal.
There's huge value in an option that is literally just "install this Postgres extension and everything works and gets out of your way".
We use TimescaleDB for a handful of products. In several cases we literally just updated a DSN to point a product at TimescaleDB instead of an existing database and the project Just Worked(TM) except hundreds of times faster.
And for some of the products that we developed on TimescaleDB natively, it was more or less the same thing... give a team TimescaleDB and they're basically productive immediately. There's no learning and integrating new libraries and query languages, no time spent on ops finding new and exciting problems to solve in hosting and scaling the DB, etc.
We get all this with all the functionality and strong guarantees that Postgres provides.
You will find a lot of people (myself included) who've bet on InfluxDB and sorely regretted it afterwards. It's not even remotely close to Postgres and Timescale in reliability, and to be honest hardly production ready if you work with critical data.
Funny, I started using InfluxDB in several projects, but threw it out immediately when TimescaleDB appeared, because I thought TSDB seemed a lot more solid and well-designed. Influx seemed much more of a quick hack in comparison - I did not like the (Python?) SDK or the docs, and operations-wise it felt a little flaky compared to TimescaleDB. A disclaimer is that I've been very familiar with PostgreSQL for many years, so TSDB operations felt very intuitive to me while Influx was all new stuff - that probably made a difference.
Example: your temperature sensor is faulty and produces values like -100. You can't delete this data by using "delete from measurement where temperature < -50". You have to get all timestamps, then delete those timestamps one by one.
I think the idea and promise of Timescale are great, but the current (well, actually I tried it a year ago) state of things makes it very hard to choose Timescale over ClickHouse.
I tried to set up a simple Twitter parser for trend analysis, so I needed a few thousand counters every few seconds. While I did not encounter any performance issues, size on disk was a huge deal. I don't remember precise numbers, but ClickHouse used orders of magnitude less disk space. And while Timescale has nice things like materialized views, ClickHouse has them too. And apart from those, ClickHouse has excellent data compression algorithms for repeated key-value-type counters.
So it becomes really hard to understand why Timescale. It aims to help you with tables bigger than traditional pg can handle, but at the same time uses the same amount of space.
I think ClickHouse is underestimated as a database for time series. Many companies use it for analytics purposes (like Cloudflare [0]) and for logs processing (like Uber [1]). I'm just waiting for someone to build something outstanding for monitoring. Articles like [2] show ClickHouse's potential in this area.
Btw, ClickHouse is under the Apache 2 license, which makes it much easier to use in big companies.
I checked the documentation and I don't think I did. Looks like it has the same compression algorithms as ClickHouse, so it should be pretty close in space requirements for old chunks.
> When we launched TimescaleDB, we met a fair amount of skepticism.... The top voted Hacker News comment at the time called us, “a rather bad idea[0].”
Having been on HN long enough, what I look for during any idea/startup launch is polarization and intensity of viewpoints. If people are reacting to the idea (for better or worse), it means it's had an impact. Those are often the products that find success. A no-comment launch is far worse than one riddled with criticism.
IMO HN's classic "skepticism" is usually just engineering nerd insecurity projected outwards, with enough techno-jargon to maintain plausible deniability. Folks feel threatened by a great idea so it's safer to find some way to tear it down. Not to dismiss all feedback as projected insecurity of course.
I've been working on a rubric for evaluating HN reaction to "Show HN" launch posts:
1. Universally Negative - Either it's cryptocurrency-related, or it depends on source of negativity:
A. "I read the site and I don't know what this is" - Genuinely bad explanation of an idea that doesn't seem particularly technically interesting or challenging.
B. Criticism of superficial aspects (e.g. website, related topics) - Genuinely bad explanation of an idea that DOES seem particularly technically interesting or challenging. _(Commenters don't get the message, but are worried they'll appear ignorant if they say it.)_
C. "Nobody needs this" "Why is this a thing" - Either bad or HN is nowhere near the target audience.
D. "This is not the right way to do it" "You can just do X" - Either bad or revolutionary (and new enough that the idea hasn't clicked with anyone.)
2. Polarization -
A. If positive people are REALLY positive about it - potentially a disruptive technology, potentially ahead of its time.
B. If negative people say it's actually much harder to solve - the idea is great in principle but the only reason it hasn't already been solved is it's not possible or very difficult in practice.
3. Universal Adulation - It will transparently never make any money, it is some kind of attempt at decentralization that will never get adoption beyond hardcore nerds.
Your comment sounds like wild projection in itself. Most skepticism is based on wisdom and experience gained over years of working in the industry and noticing the patterns of 100s of past companies and projects.
Timescale when it first launched was little more than an automatic-sharding extension for Postgres with some convenience functions for handling time data. It was competing with Postgres itself which added native partitions, other sharding extensions like Citus, and an entire class of column-oriented relational databases that have become much more capable.
Timescale today is very different and has added a lot of the missing functionality to make it a very attractive database option, especially the columnstore/compression feature mentioned in that first HN comment.
I'm still confused why time-series databases are even a thing. It seems to me that time-series just means you have a date/time column plus an index on it. Which is something typical databases already do well, and like the referenced post mentioned, you could use a column store for better performance.
But I just don't see anything that makes creating an entire database design for one specific index type worthwhile...
I index many tables on my site by num_upvotes so I can find the top ranked items to show. Does this mean that I need an UpvoteDB? I don't think so.
A previous time I argued this point, it was mentioned that you rarely need to update or delete old rows. This allows you to tailor the storage solution better. However, this basically means a compressed column store, which again, doesn't really have much to do with time.
The internals are completely different. Given the collection of software technologies we possess today, you can't assemble them around a database using a row-oriented encoding and come up with something that can outperform (in space, time, and cost) the kinds of query styles that column-oriented encodings absolutely murder.
Logically they're the same thing, but engineering is about details, details that in this case could easily be a 2x to 20x budget difference for an appropriate project.
A column store can take 100 years' worth of samples occurring every 10ms that yield a constant result and, using technology we actually have, represent those ~315 billion data points on disk and in CPU using somewhere under 10 bytes.
Certain time series databases tend to be optimized towards making the most recent data readily available and quick to fetch. There are also certain filtering / compression algorithms that are run on these time series databases that only make sense in a time domain.
Also, some of these time series databases have very specific use cases, and you have to also think about the client tools associated with the database. Many of these databases sit in power plants, factories, etc., and they stream data to tools that are built to visualize or analyze the last few minutes of data and then trigger alerts based on patterns. Also, these databases are very "device" aware and integrate with other systems that represent their data in a timeseries fashion already (like a sensor). A lot of customers who needed this type of database care only about this index because their concern is record keeping and monitoring, not necessarily number crunching (this is changing though).
There are drawbacks to storing your data this way. If your primary index is time, it can be hard to merge that with something based on a coordinate system. So doing certain types of analysis is really difficult unless you replicate your data into some other database with a different index.
This is a thought exercise I've done myself, and your questions will mostly be answered by looking at the features (https://docs.timescale.com/api/latest/) that TimescaleDB provides.
> However, this basically means a compressed column store, which again, doesn't really have much to do with time.
It does though: which data do you compress? The old data. Why not let the database figure that out for you, so you don't specifically have to tell it.
Other features include (a rough SQL sketch of both follows the list):
- Continuous Aggregates: a materialized view aggregating data over time is doable, but why not let the database materialize it for you, and automatically fall back to an un-materialized query for the newest data?
- Retention: deleting (or downsampling) old data is easy to do on your own, but why not let the database do it for you according to a policy?
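A rough sketch of both, assuming a `conditions` hypertable with `time`, `device_id`, and `temperature` columns (all names and intervals here are made up):

```sql
-- Continuous aggregate: hourly averages materialized by the database.
CREATE MATERIALIZED VIEW conditions_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       device_id,
       avg(temperature) AS avg_temp
FROM conditions
GROUP BY bucket, device_id;

SELECT add_continuous_aggregate_policy('conditions_hourly',
  start_offset      => INTERVAL '3 days',
  end_offset        => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour');

-- Retention: let the database drop raw data older than 90 days.
SELECT add_retention_policy('conditions', INTERVAL '90 days');
```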
If I recall correctly, TimescaleDB is mostly some extension functions for Postgres, with indeed some specific indices that vastly speed up some often-used insert & lookup queries.
You can also just extend it with PostGIS for those really fancy schmancy geographically oriented time series queries. Pretty neat stuff running out of the box. Here's the docker implementation: https://hub.docker.com/r/timescale/timescaledb-postgis/
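A hedged sketch of what that combination can look like, assuming a made-up `vehicle_locations` hypertable with a PostGIS geometry column in SRID 4326:

```sql
-- Distance travelled per vehicle per day: time_bucket() from TimescaleDB,
-- line building and length from PostGIS (geography cast gives metres,
-- assuming lat/long geometries).
SELECT time_bucket('1 day', time) AS day,
       vehicle_id,
       ST_Length(ST_MakeLine(geom ORDER BY time)::geography) AS metres_travelled
FROM vehicle_locations
GROUP BY day, vehicle_id;
```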
Everything works reasonably well in a relational database if your data is small. As you scale up, the performance will fall off a cliff for any data model that the internals of the database kernel were not specifically designed for. No relational database kernel is optimized for time-series data models, so poor performance is just a matter of scale.
There’s also a tendency to think: “I don’t need this, so neither does anyone else”. I know I’ve been guilty of applying that logic more times than I’d like.
I think it goes to show how impossible it is to judge an idea. YC itself doesn't pretend to do this with the ~15,000 applications they go through each cohort. They try instead to look at the team, look at their progress, imagine what would need to happen for the company to succeed at the level required for them to get the returns they seek.
Founders should have a thick skin when it comes to criticism on HN, because we don't know either.
To be fair, the skepticism wasn't without merit, given the lengths TimescaleDB goes to in order to make timeseries work. From their blog entries [0][1], it is evident that they essentially shoe-horn techniques from columnar stores like Apache Druid / Kudu, and file types like Apache ORC / Parquet, into Postgres' row-based data model. Reminds me of BigTable / HBase, in a way, too.
TimescaleDB's biggest feat here is of course pulling the engineering magic rabbit out of the hat by chipping away at it for 4+ years, and effectively answering the skepticism by delivering on their promise.
Note though, Amazon Redshift is built on Postgres, and (allegedly) so is Amazon Timestream.
Technically you don't need a good idea to raise $40 million these days. /s
To me, building upon PostgreSQL was, however, a good idea. Long term, all databases gain relational features through a rather painful process of realization that RDBMSs actually did some things right. They'll skip that pain and focus on new features.
Microsoft does something similar by offering a Graph DB on top of MS SQL.
Betamax was slightly technically superior, but the reason it lost was that Sony initially limited recording times to 1 hour. This meant that most movies required 2 Betamax tapes, vs 1 for VHS. Betamax players were also much more expensive.
The point is that a lot of "bad ideas" have something major going for them.
This is awesome. In certain industries (manufacturing, energy, etc) there are companies (really 1 company) that essentially have monopolies on time series databases / historians. There has been 0 competition in that space and as a result the databases and surrounding client tools are just so awful. It'll be interesting to see if timescaledb can really enter that market and force those companies to adapt.
We actually see a bunch of startups directly going after that market leader in historian space that are building on _top_ of TimescaleDB.
So they can bring their domain expertise in process manufacturing and elsewhere, and then build on a modern, powerful platform. We're excited to see this!
We too are doing something similar. We've just started the move to Timescale for real-time energy and sensor data from industrial assets. We have a single customer with about 10TB of data, and growing, from 2 years' worth of real-time monitoring, which is stored in a mixture of table storage and SQL. Timescale on PG seems like a blessing for our future plans :)
That's awesome. Y'all are going to crush it, especially since I've seen that lots of companies have an appetite for adopting new backends, especially DBs that can work both on-prem & in the cloud.
I would love to know of some alternatives to Rockwell, Wonderware, and Schneider to propose to our customers. What startups are building on top of Timescale?
I was employed there so I don't feel entirely comfortable stating their name. Just google around for "operational historian". It was really frustrating because we had these ideas years ago (and many other ideas), but due to reasons we were not allowed to actually make the product genuinely better. Also, the product is in more places than you'd think because they partner with vendors to sell the database as a component in a larger system that the vendor packages up and sells.
Their stuff is a nightmare to use, and is insanely expensive.
Always nice when we get to just swap it out for something like Canary or even Ignition although folks are always trying to trash the Ignition Historian when it works well for most use cases people need to solve.
Yeah, Ignition’s historian isn’t meant to replace or compete with PI, it’s just meant to provide a basic historian that meets the average user’s needs.
Congrats!
I really love the approach they took to deliver value: An extension of an existing rock-solid platform (Postgres) instead of building a new server which would require a lot of time to learn and manage.
Is TimescaleDB suitable to store logs? If yes, how to architect the tables?
(Timescale engineer here). We believe so and we have customers using us for just that. We haven't created our own product for that yet (as we have for metrics -- Promscale) but it is an idea we are playing with. You may want to look at our Promscale design doc[1] for ideas on table layout.
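As a starting point (just a made-up sketch, not the Promscale schema), a simple layout could be:

```sql
-- A wide hypertable for log lines, with structured attributes in JSONB.
CREATE TABLE logs (
  time    TIMESTAMPTZ NOT NULL,
  service TEXT        NOT NULL,
  level   TEXT        NOT NULL,
  message TEXT,
  attrs   JSONB
);
SELECT create_hypertable('logs', 'time');

-- Typical access paths: per-service time ranges, and attribute lookups.
CREATE INDEX ON logs (service, time DESC);
CREATE INDEX ON logs USING GIN (attrs);
```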
I'm pretty stoked for this. Timescale's ability to use time series on a subset of tables (hypertables) really makes it an interesting choice. I've just dabbled with it, but seeing that they'll finally be improving their hosted solution makes me want to dive in deeper! Anyone have experience running large DBs on Timescale? Would you recommend it?
I haven't tried running the distributed Timescale DB - in general more parts means more things to go wrong, so using their managed cloud service for that is a good idea.
But I can attest that the single-server version is rock solid, just like PostgreSQL that it's based on. And it's free and source visible. The pace of innovation has also been really high, it just keeps getting better with every release.
One of the big items that stood out to me is the inability to migrate data from an existing table when creating a distributed hypertable. There were also some reports of significant query performance issues.
These all may improve with time of course, so watching the dev cycles will give you a good sense of that I think.
We used historical flow meter readings (we started out with 15-minute intervals, but it worked much better with a higher frequency) and used that to train Recurrent Neural Networks (RNNs) to predict which areas were likely to flood. I was the devops lead on the prototype, not the data scientist, unfortunately, so I can't give you the ins and outs, but I can tell you that we used tensorflow together with pandas/df/timescaledb. We then displayed that using plotly; all this was stuffed inside several containers. It was a great project to work on, actually. The whole setup was pretty much a joy to work with.
That is awesome. I work in flood forecasting and previously worked in NCAR/NASA. If you would be so kind, feel free to share any white papers, links etc to the project you are referring to.
I don't, I'm afraid. For starters, I contract and no longer work on the project, plus there's IP involved, etc. Also, I'm no data scientist, just a hacker really :)
The main difficulty was getting access to the data (and ensuring it was valid, as with all ML projects); luckily we managed to get that from several sources (councils, water companies, etc.). The flow data was in TimescaleDB, loaded into pandas dataframes so we could use varying frequencies of data, and we used HDF5 IIRC as well (the detail is hazy, it was a few years back now).
We did demo it to the Met Office here in the UK too. They were interested but already had their own thing cooking, so the project never really got out of prototype, but it was making accurate predictions. I think there were some other areas that might turn out to be flaky over time using this method (such as rapid changes to catchment areas), but that could maybe be factored in some way with more thought on the model and verification over a larger set/timeframe. Feel free to hit me up if you want any more detail on the tech side, but don't ask me about stats; maths fu I ain't :)
TimescaleDB looks great. I'm interested in using it, but I'm concerned about the upgrade path across major PostgreSQL versions. Logical replication is a big help when upgrading PostgreSQL across major versions while minimizing downtime. As far as I understand it, TimescaleDB doesn't support logical replication yet. Major version upgrades with TimescaleDB are obviously a solvable problem, but it probably means we'll have a more complicated upgrade path. Upgrading via logical replication is just so nice.
Congrats to the tsdb team. I tried the database extension months ago and it works perfectly for my use case standalone.
I just need to integrate it with Django, which is not easy given the current schema of the database and the way Django creates default autoincrement primary keys, but I'm sure there will be a workaround for that.
I ran into the same problem trying to pair it with Django. From what I remember, some manipulation of primary key constraints in the schema could get them to coexist (although you lose any enforcement of unique PKs, so there's the potential for things to go bad). We ended up moving to FastAPI, so don't have this problem anymore.
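For anyone hitting the same thing, the manipulation I remember was roughly this (a sketch with a made-up Django table name; TimescaleDB wants the time column in any unique constraint, including the primary key):

```sql
-- Django creates something like app_reading(id bigserial PRIMARY KEY, time, value).
-- Widen the PK to include the time column before creating the hypertable;
-- uniqueness of the bare id is no longer enforced on its own.
ALTER TABLE app_reading DROP CONSTRAINT app_reading_pkey;
ALTER TABLE app_reading ADD PRIMARY KEY (id, time);
SELECT create_hypertable('app_reading', 'time', migrate_data => true);
```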
I'm using TimescaleDB more and more. I just deployed another instance this morning for another customer that needs to store timeseries (hundreds of servers' metrics and some logs). I've had other instances in production for a year without any issues, and the compression and expiration policies are really great for this use case!
Thanks to the team.
What I want in TimescaleDB is aggregation of old values in the SAME table. It's no use when I have to do this in a separate table; the Grafana overhead will be insane.
Have you looked at Real-time aggregates in TimescaleDB? It might help address the problem you are facing:
"With real-time aggregation, when you query a continuous aggregate view, rather than just getting the pre-computed aggregate from the materialized table, the query will transparently combine this pre-computed aggregate with raw data from the hypertable that’s yet to be materialized. And, by combining raw and materialized data in this way, you get accurate and up-to-date results, while still enjoying the speedups that come from pre-computing a large portion of the result."
Standard SQL is rubbish for time series queries as it is based on set theory, which does not have order. Most SQL databases exploit that fact to increase performance. Fundamentally, kdb is based on ordered lists, which is a much better paradigm for time series data.
Have you checked questdb [1]? The data structure is arrays with data that lands in order, and SQL queries on top. The drawback was that it was difficult to deal with out-of-order data, but we have just solved this by re-ordering data on the fly in memory before it hits the disk. Performance-wise, it's probably not far from kdb itself (we will be sharing some bench results soon vs open source tsdbs).
An alternative is to build a system on top of data warehousing technology, but it's very tough; so much stuff is built on top of KDB+ that I think it will stay there for the next two decades.