Software Development

Introduction to Log structured merge (LSM) Tree


Improve Article

Save Article

Like Article

Improve Article

Save Article

Like Article

B+ Trees and LSM Trees are two basic data structures when we talk about the building blocks of Databases. B+ Trees are used when we need less search and insertion time and on the other hand, LSM trees are used when we have write-intensive datasets and reads are not that high.

This article will teach about Log Structured Merge Tree aka LSM Tree. LSM Trees are the data structure underlying many highly scalable NoSQL distributed key-value type databases such as Amazon’s DynamoDB, Cassandra, and ScyllaDB.

LSM Trees

A simple version of LSM Trees comprises 2 levels of tree-like data structure:

  • Memtable and resides completely in memory (let’s say T0)
  • SStables stored in disk (Let’s say T1)
Simple LSM Tree

Simple LSM Tree

New records are inserted into the memtable T0 component. If the insertion causes the T0 component to exceed a certain size threshold, a contiguous segment of entries is removed from T0 and merged into T1 on disk.

LSM Workflow

LSM primarily uses 3 concepts to optimize read and write operations:

  • Sorted String Table (SSTables): Data is sorted in sorted order so that whenever the data is read its time complexity will be O( Log(N) ) in the worst case, where N is the number of entries in a Database table.
    Android-UML---Algorithm-flowchart-example-(2).png

    1. SSTable

  • Memtable:
    • An in-memory structure
    • Stores data in a sorted fashion
    • Acts as a write-back cache
    • After reaching a certain size, its data is flushed as an SSTable to Database
    • As, the number of SSTable increase in Disk, and if some key is not present in the records
      • While reading that key, we need to read all the SSTables, which increases the Read Time Complexity.
      • To overcome this issue, the Bloom filter comes into the picture.
      • Bloom filter is a space-efficient data structure that can tell us if the key is missing in our Records with an accuracy rate of 99.9%.
      • To use this filter, we can add our entries to it when they are written, and check the key at the beginning of read requests in order to serve requests more efficiently when they come in first place.
Memtable representation

Memtable representation

  • Compaction:
    • As we are storing data as SSTable in the disk, let’s say there are N SSTable and each table’s size is M
    • Then worst case Read time complexity is O(N* Log(M) )
    • So, as the number of SSTables increases the Read Time Complexity also increases.
    • Also, when we are just flushing the SSTables in Database, the same Key is present in multiple SSTables
    • Here comes the use of a Compactor
    • Compactor runs in the background, merges SSTables and removes multiple rows with the same and adds the new key with the latest data, and stores them in a new merged/compacted SSTable.
Android-UML---Algorithm-flowchart-example-(4).png

3.1. SSTables flushed to Disk

Android-UML---Algorithm-flowchart-example-(5).png

3.6. Compactor compacted 2 SSTables to 1 SSTable

Conclusion:

  • Writes are stored in an in-memory tree (Memtable). Any supporting data structures (bloom filters and sparse index) are also updated if necessary.
  • When this tree becomes too large it is flushed to disk with the keys in sorted order.
  • When a read comes in we check it using the bloom filter. If the bloom filter indicates that the value is not present then we tell the client that the key could not be found. If the bloom filter means that the value is present then we begin iterating SSTables from newest to oldest.
  • LSM time complexities
    • Read Time: O(log(N)) where N is the number of records in the disk
    • Write Time: O(1) as it writes in in-memory
    • Delete Time: O(log(N)) where N is the number of records in the disk
  • LSM Trees can be modified to more efficient data structures using more than 2 filters. Some of them are bLSM, Diff-Index LSM.

Last Updated :
25 Jul, 2023

Like Article

Save Article