28 DuckDB Design Document
Hannes Mühleisen edited this page 2018-07-12 17:08:35 +02:00

This is the DuckDB design document. DuckDB consists of the following major components:

  1. API
  2. Parser
  3. Planner
  4. Optimiser
  5. Execution
  6. Storage

Below, ideas for individual components are described.

API

Use case

  • Embedded? Server? Or both by wrapping the embedded DB into a process?
  • How are we going to share resources (CPU, memory) with other processes?

Parser

  • Tokenises query into strings (which may be quoted), then dispatches code that consumes tokens.
  • Extensible through registrations of new tokens and code that consumes them.
  • Input: Query
  • Output: Parse tree (token-specific nodes, e.g. where node containing an expression)
  • Perhaps use this: https://github.com/lfittl/libpg_query

Planner

  • Transform parse tree into basic, stupid, runnable plan.
  • Extensible with new transformation rules so it can handle the new tokens from parser plugins.
  • Knows database schema so it can select columns and validate etc.
  • Input: Parse tree
  • Output: Query Plan, binary tree with operators (scan, project, join etc.).

Optimiser

see also:

** Thomas Neumann, César A. Galindo-Legaria: Taking the Edge off Cardinality Estimation Errors using Incremental Execution. BTW 2013: 73-92.

** Viktor Leis, Bernhard Radke, Andrey Gubichev, Alfons Kemper, Thomas Neumann: Cardinality Estimation Done Right: Index-Based Join Sampling. CIDR 2017

Execution

  • Vectorized execution engine (http://cidrdb.org/cidr2005/papers/P19.pdf)
  • Alternatively, we could implement operators in Tim's VM (supposed to be a library) and get vectorized execution (planned: plus adaptive optimizations, pluggable JIT and GPU offloading)
  • Work stealing parallelisation (Morsel-driven? https://db.in.tum.de/~leis/papers/morsels.pdf)
  • Extensible, new operators can be added
  • Input: Query plan
  • Output: Result Sets possibly

Transactions

Serializability? (less Tx/s but at least the end-user understands it)

Storage Engine

  • Single-File (The problem with single-file is storing the transactional log but if we say that we only require that for atomicity and consistency then we can easily put the log into another file)
  • MVCC (https://db.in.tum.de/~muehlbau/papers/mvcc.pdf)
  • Fixed block size to support
  • Eviction based buffer manager (https://db.in.tum.de/~leis/papers/leanstore.pdf)
  • Block-based compressions
  • Keep samples for tables around for optimizer
  • Min/Max information stored similar to samples, to restrict scan ranges/data types etc
  • Use K/V for storage?
  • Question: Are we planning to use primary/secondary indexes?
  • For primary/clustered indexes we could just sort the table (or keep it sorted), potentially use a PMA (faster inserts, decreased scanning speed)
  • Quacking - our automatic indexes?

Implementation Choices

  • C++ with auto-formatting
  • No dependencies (without a very good reason)
  • Try to minimize amount of tunable knobs
  • Basic data models are tables. Tables consists of same-length columns of a native type (?)
  • Unit tests + End-To-end. Use Travis. Re-use SQLite tests.
  • Auto-detect performance regressions?

Notes 2018-07-12

  • can we steal planner/parser somewhere? Peloton? Peloton uses this library: https://github.com/lfittl/libpg_query
  • sampling, maintain during load, hyper-log-log to estimate unique?
  • tree transformation rules with pattern matching
  • re-optimize during execution? Bonus feature
  • execution engine run pipeline from source to sink. default vectorized but pluggable
  • where is the state of the operator tree? on the stack?
  • hash tables in query state
  • do we need another representation for queries, data structure choices physical representation langugage
  • single-prefix add delete blobs, compacting, for cloud
  • single write blob, LSM, could give us multi-version
  • no indexes? enforcing sorting on pk? z-order in compaction
  • join order imdb benchmark
  • storage small data types based on global minmax/sample