poster

14 downloads 425 Views 1MB Size Report
All update operations send redo log to Disk Writer. • Delete does not include value-related ... Logs are queued to be
ACM SIGMOD 2011 Programming Contest A Durable Main-Memory Index Using Flash Team greensky

Jung-Sang Ahn, KAIST The Task

(Korea Advanced Institute of Science and Technology)

Architecture Overview

• To implement high-throughput main-memory index • All completed updates must be durable using SSD

Workload Threads

• Recoverable after system crash

Key-value operations

• Stores key-value pairs

Cache hit

• Key (unique): up to 1024 byte • Value: up to 4096 byte

Hash Cache Cache miss

• Offers 5 key-value operations:

B+-Tree based Trie

• Insert, find, delete, compare & swap, and iterate (lexicographical ordered traversal)

• Should support high-concurrent accesses

Disk Writer

• All single-key operations must be atomic

Logging

• SSD is formatted using ext4 filesystem

Main Structure:

+ B -Tree

Garbage Collector

SSD

Concurrency Control

based Trie

+ B -tree

• Trie uses as a node • Key is split into 8-byte chunks

• Tree level readers-writer lock • Only downward propagation

• Each chunk is used as a key for each level of B+-tree Key

• To avoid deadlock Gets lock

chunk1 chunk2 chunk3

Releases lock

B+-tree

8 byte Search with chunk1

Key-value

Search with chunk2

Disk Writer

Search with chunk3

• All update operations send redo log to Disk Writer • Delete does not include value-related fields

Ordered traversal

Type Key length

• Skewed trees are merged into one tree • To speed up traversing common prefix

1 byte

Value length

3 byte

Key

Value

4 byte

• Logs are queued to be written in bulk • Associated operations are completed after writing is done

a

a

b

b

b

a bbaa

Stores common prefix ‘bba’ b bbab

a

Garbage Collector • Reclaims invalid logs to gather free space

a bbaa

• Recovery: redoing all written logs

b

• Victim selection: round-robin policy

bbab

Validity check

Index

Hash Cache • Maps from key to key-value object for fast lookup

Invalid: reclaim Valid: write as new log

Logs Victim