DuckDB Design Document

mirror/duckdb

mirror of https://github.com/duckdb/duckdb.git synced 2026-07-27 03:54:16 +00:00

Table of Contents

This is the DuckDB design document. DuckDB consists of the following major components:

Below, ideas for individual components are described.

API

C/C++ API, similar to https://github.com/hannesmuehleisen/MonetDBLite-C/blob/master/src/embedded/embedded.h
Allows bulk access to values to support analytics tools (R/NumPy)
Allows multiple databases instances in same process
No authorisation etc, a wrapping process can do that.
No schemas, because you can just have another database.

Tokenises query into strings (which may be quoted), then dispatches code that consumes tokens.
Extensible through registrations of new tokens and code that consumes them.
Input: Query
Output: Parse tree (token-specific nodes, e.g. where node containing an expression)
Perhaps use this: https://github.com/lfittl/libpg_query

Transform parse tree into basic, stupid, runnable plan.
Extensible with new transformation rules so it can handle the new tokens from parser plugins.
Knows database schema so it can select columns and validate etc.
Input: Parse tree
Output: Query Plan, binary tree with operators (scan, project, join etc.).

Static rewriting + sampling-based optimiser. Samples base tables, runs first and second layer (scan/join) of query plan to get idea about cardinalities. Perhaps http://pages.cs.wisc.edu/~wentaowu/papers/sigmod16-reoptimization.pdf is good?

Vectorized execution engine (http://cidrdb.org/cidr2005/papers/P19.pdf)
Alternatively, we could implement operators in Tim's VM (supposed to be a library) and get vectorized execution (planned: plus adaptive optimizations, pluggable JIT and GPU offloading)
Work stealing parallelisation (Morsel-driven? https://db.in.tum.de/~leis/papers/morsels.pdf)
Extensible, new operators can be added
Input: Query plan
Output: Result Sets possibly

Serializability? (less Tx/s but at least the end-user understands it)

Single-File (The problem with single-file is storing the transactional log but if we say that we only require that for atomicity and consistency then we can easily put the log into another file)
MVCC (https://db.in.tum.de/~muehlbau/papers/mvcc.pdf)
Fixed block size to support
Eviction based buffer manager (https://db.in.tum.de/~leis/papers/leanstore.pdf)
Block-based compressions
Keep samples for tables around for optimizer
Min/Max information stored similar to samples, to restrict scan ranges/data types etc
Use K/V for storage?
Question: Are we planning to use primary/secondary indexes?
For primary/clustered indexes we could just sort the table (or keep it sorted), potentially use a PMA (faster inserts, decreased scanning speed)
Quacking - our automatic indexes?

C++ with auto-formatting
No dependencies (without a very good reason)
Try to minimize amount of tunable knobs
Basic data models are tables. Tables consists of same-length columns of a native type (?)
Unit tests + End-To-end. Use Travis. Re-use SQLite tests.
Auto-detect performance regressions?

can we steal planner/parser somewhere? Peloton? Peloton uses this library: https://github.com/lfittl/libpg_query
sampling, maintain during load, hyper-log-log to estimate unique?
tree transformation rules with pattern matching
re-optimize during execution? Bonus feature
execution engine run pipeline from source to sink. default vectorized but pluggable
where is the state of the operator tree? on the stack?
hash tables in query state
do we need another representation for queries, data structure choices physical representation langugage
single-prefix add delete blobs, compacting, for cloud
single write blob, LSM, could give us multi-version
no indexes? enforcing sorting on pk? z-order in compaction
join order imdb benchmark
storage small data types based on global minmax/sample