博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
InfluxDB存储引擎Time Structured Merge Tree——本质上和LSM无异,只是结合了列存储压缩,其中引入fb的float压缩,字串字典压缩等...
阅读量:6113 次
发布时间:2019-06-21

本文共 3653 字,大约阅读时间需要 12 分钟。

For more than a year we’ve been talking about potentially making a storage engine purpose-built for our use case of time series data. Today I’m excited to announce that we have the first version of our new storage engine available in a nightly build for testing. We’re calling it the Time Structured Merge Tree or TSM Tree for short.

In our early testing we’ve seen up to a 45x improvement in disk space usage over 0.9.4 and we’ve written 10,000,000,000 (10B) data points (divided over 100k unique series written in 5k point batches) at a sustained rate greater than 300,000 per second on an EC2 c3.8xlarge instance. This new engine uses up to 98% less disk space to store the same data as 0.9.4 with no reduction in query performance.

In this post I’ll talk a little bit about the new engine and give pointers to more detailed write-ups and instructions on how to get started with testing it out.

Columnar storage and unlimited fields

The new storage engine is a columnar format, which means that having multiple fields won’t negatively affect query performance. For this engine we’ve also lifted the limitation on the number of fields you can have in a measurement. For instance, you could have MySQL as the thing you’re measuring and represent each of the few hundred different metrics that you gather from MySQL as separate fields.

Even though the engine isn’t optimized for updates, the new columnar format also means that it’s possible to update single fields without touching the other fields for a given data point.

Compression

We use multiple compression techniques which vary depending on the data type of the field and the precision of the timestamps. Timestamp precision matters because you can represent them down to nanosecond scale. For timestamps we use delta encoding, scaling and compression using simple8b, run-length encoding or falling back to no compression if the deltas are too large. Timestamps in which the deltas are small and regular compress best. For instance, we can get great compression on nanosecond timestamps if they’re only 10ns apart each. We’d achieve the same level of compression for second precision timestamps that are 10s apart.

We use the same delta encoding for floats mentioned in , bits for booleans, delta encoding for integers, and Snappy compression for strings. We’re also considering adding dictionary style compression for strings, which is very efficient if you have repeated strings.

Depending on the shape of your data, the total size for storage including all tag metadata can range from 2 bytes per point on the low end to more for random data. We found that random floats with second level precision in series sampled every 10 seconds take about 3 bytes per point. For reference, Graphite’s Whisper storage uses 12 bytes per point. Real world data will probably look a bit better since there are often repeated values or small deltas.

LSM Tree similarities

The new engine has similarities with LSM Trees (like LevelDB and Cassandra’s underlying storage). It has a write ahead log, index files that are read only, and it occasionally performs compactions to combine index files. We’re calling it a Time Structured Merge Tree because the index files keep contiguous blocks of time and the compactions merge those blocks into larger blocks of time.

Compression of the data improves as the index files are compacted. Once a shard becomes cold for writes it will be compacted into as few files as possible, which yield the best compression.

 

 

转自:https://www.influxdata.com/new-storage-engine-time-structured-merge-tree/

 

 

转载地址:http://pmcka.baihongyu.com/

你可能感兴趣的文章
[转]响应式表格jQuery插件 – Responsive tables
查看>>
8个3D视觉效果的HTML5动画欣赏
查看>>
C#如何在DataGridViewCell中自定义脚本编辑器
查看>>
【linux】crontab定时命令
查看>>
Android UI优化——include、merge 、ViewStub
查看>>
Office WORD如何取消开始工作右侧栏
查看>>
Android Jni调用浅述
查看>>
CodeCombat森林关卡Python代码
查看>>
第一个应用程序HelloWorld
查看>>
(二)Spring Boot 起步入门(翻译自Spring Boot官方教程文档)1.5.9.RELEASE
查看>>
Android Annotation扫盲笔记
查看>>
React 整洁代码最佳实践
查看>>
聊聊架构设计做些什么来谈如何成为架构师
查看>>
Java并发编程73道面试题及答案
查看>>
iOS知识小集·设置userAgent的那件小事
查看>>
移动端架构的几点思考
查看>>
Tomcat与Spring中的事件机制详解
查看>>
Spark综合使用及用户行为案例区域内热门商品统计分析实战-Spark商业应用实战...
查看>>
初学者自学前端须知
查看>>
Retrofit 源码剖析-深入
查看>>