Ofey Chan, aka 'ofey404'
Pretending a subtitle is out there...
YouTube Playlist: 15-721 Advanced Database Systems (Spring 2020)
M. Stonebraker, et al., What Goes Around Comes Around, in Readings in Database Systems, 4th Edition, 2006 (Optional)
Main idea:
Takeaway:
Systems:
| Data model | System | Interface |
|---|---|---|
| Hierarchical tree | IBM IMS | DL/1, one record at a time, limited physical/logical independence |
| Hyperspace network | CODASYL | Navigating in the hyperspace, no physical/logical independence |
| Relational | System on VAX, IBM DB2 | SQL, QUEL… |
| Entity-Relationship | Schema normalization tools | DBA tools |
| Object-oriented | Garden and Exodus | Certain programming language |
| Object-Relational | Sybase | SQL + user-defined components |
| Semi-structured and XML | | |
A. Pavlo, et al., What’s New with NewSQL?, in SIGMOD Record (vol. 45, iss. 2), 2016 (Optional)
X. Yu, et al., Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores, in VLDB, 2014
Main idea:
Seven concurrency control algorithms from two families (2PL and timestamp ordering), evaluated on a 1024-core simulator.
Bottlenecks to scalability: lock thrashing, preemptive aborts, deadlocks, timestamp allocation, and memory copying.
| 2PL works better with | T/O works better with |
|---|---|
| low contention | higher contention |
| short transactions | longer transactions |
| key-value workloads | OLTP workloads |
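
For readers less familiar with the T/O family, the basic timestamp-ordering rule can be sketched as below. This is the textbook rule rather than code from the paper, and `Record`, `to_read`, and `to_write` are hypothetical names chosen for illustration. No locks are taken; a conflict is resolved by aborting the transaction that arrives "too late".

```python
class Abort(Exception):
    """Raised when a transaction must restart with a new timestamp."""


class Record:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0   # largest timestamp that has read this record
        self.write_ts = 0  # timestamp of the last writer


def to_read(record, ts):
    # A reader older than the last writer would see a value from its future.
    if ts < record.write_ts:
        raise Abort("record already overwritten by a newer transaction")
    record.read_ts = max(record.read_ts, ts)
    return record.value


def to_write(record, ts, value):
    # A writer older than the last reader or writer would break timestamp order.
    if ts < record.read_ts or ts < record.write_ts:
        raise Abort("a newer transaction already read or wrote this record")
    record.value = value
    record.write_ts = ts
```

Under this rule every abort is decided from the two timestamps stored on the record, so deadlock cannot occur, but each transaction needs a unique timestamp before it starts.
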
Takeaway:
| Bottleneck | Direction |
|---|---|
| timestamp allocation | hardware counter, clock, and atomic addition |
| memory allocation, copying | CPU background copier, thread-local memory pool |
| No superior scheme | switch between schemes or hybrid approach |
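
The timestamp-allocation row can be illustrated with a small sketch. Python has no user-level atomic fetch-and-add, so a lock-protected counter stands in for the atomic addition here; `AtomicCounterAllocator`, `BatchedAllocator`, and `allocate_concurrently` are hypothetical names, not taken from the paper. The batched variant is one possible way to reduce pressure on the shared counter: each thread reserves a range and hands out timestamps locally.

```python
import threading


class AtomicCounterAllocator:
    """Global counter; the lock stands in for a hardware atomic add."""

    def __init__(self):
        self._next = 0
        self._lock = threading.Lock()

    def fetch_add(self, n=1):
        # Reserve n consecutive timestamps and return the first one.
        with self._lock:
            start = self._next
            self._next += n
            return start


class BatchedAllocator:
    """Per-thread allocation that touches the shared counter once per batch."""

    def __init__(self, shared, batch_size=16):
        self._shared = shared
        self._batch_size = batch_size
        self._local = threading.local()

    def next_timestamp(self):
        state = self._local
        if getattr(state, "remaining", 0) == 0:
            # Local range exhausted (or never filled): reserve a new batch.
            state.current = self._shared.fetch_add(self._batch_size)
            state.remaining = self._batch_size
        ts = state.current
        state.current += 1
        state.remaining -= 1
        return ts


def allocate_concurrently(num_threads=4, per_thread=100):
    """Run several threads and collect every timestamp they were given."""
    allocator = BatchedAllocator(AtomicCounterAllocator())
    results = []
    results_lock = threading.Lock()

    def worker():
        mine = [allocator.next_timestamp() for _ in range(per_thread)]
        with results_lock:
            results.extend(mine)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In this sketch the timestamps are unique across threads but are not handed out in global time order, which is the price of batching.
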
System:
Workload:
Main idea:
Scaling MVCC in a modern multi-core, in-memory setting.
Key design decisions:
Takeaway:
MVTO works well for most workloads.
Transaction-level GC has a small memory footprint, which is a plus.
System:
| Configuration | CC protocol | Storage Scheme | GC | Index |
|---|---|---|---|---|
| Oracle/MySQL | MV2PL | Delta | Vacuum | Logical |
| Postgres | MV2PL/MVTO | Append-Only | Vacuum | Physical |
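
To make the "Append-Only" storage column concrete, here is a minimal sketch of a version chain with timestamp-based visibility. It is an illustration under simplifying assumptions, not the layout of any system in the table: `Version` and `AppendOnlyTuple` are hypothetical names, each version holds a full copy of the value, and the chain is ordered newest to oldest. Write-write conflict handling and the per-version read timestamp that MVTO needs are left out.

```python
from dataclasses import dataclass
from typing import Optional

INF = float("inf")


@dataclass
class Version:
    value: object
    begin_ts: float                    # timestamp of the creating transaction
    end_ts: float = INF                # timestamp at which it was superseded
    older: Optional["Version"] = None  # next (older) version in the chain


class AppendOnlyTuple:
    """One logical tuple stored as a newest-to-oldest chain of full copies."""

    def __init__(self, value, ts):
        self.head = Version(value, begin_ts=ts)

    def write(self, value, ts):
        # Append-only: never modify the old value, add a complete new version.
        if ts <= self.head.begin_ts:
            raise ValueError("write timestamp must exceed the newest version")
        self.head.end_ts = ts
        self.head = Version(value, begin_ts=ts, older=self.head)

    def read(self, ts):
        # Walk the chain until a version whose lifetime covers ts is found.
        version = self.head
        while version is not None:
            if version.begin_ts <= ts < version.end_ts:
                return version.value
            version = version.older
        return None  # the tuple did not exist yet at ts
```

A delta scheme would instead keep only the newest value in place and store the changed attributes of older versions separately, trading cheaper updates for more work when reading old snapshots.
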
Workload:
Main idea:
Update by delete, then insert.
Takeaway:
It’s good to know the internals of current database implementations. They may be simple, and outdated relative to state-of-the-art hardware.
E.g., the existing serializability validation approach described in Section 2.3 tracks the entire read set and re-checks it at the end of the transaction. That may have been a suitable approach in the disk-based era.
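
The read-set style of validation mentioned above can be sketched as follows. This is a single-threaded illustration of the general idea, not the implementation of any particular system; `Store` and `Transaction` are hypothetical names, and a real system would also need to make validation and the write phase atomic.

```python
class Store:
    """Committed state: key -> (value, version counter)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key, (None, 0))

    def put(self, key, value):
        _, version = self.get(key)
        self.data[key] = (value, version + 1)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_set = {}   # key -> version observed at first read
        self.write_set = {}  # key -> value to install at commit

    def read(self, key):
        if key in self.write_set:
            return self.write_set[key]
        value, version = self.store.get(key)
        self.read_set.setdefault(key, version)
        return value

    def write(self, key, value):
        self.write_set[key] = value

    def commit(self):
        # Validation: re-check every entry of the read set at the end.
        for key, seen in self.read_set.items():
            if self.store.get(key)[1] != seen:
                return False  # a newer version was committed: abort
        for key, value in self.write_set.items():
            self.store.put(key, value)
        return True
```

The cost of `commit` grows with the size of the read set, which is why this approach becomes expensive for read-heavy transactions in memory, where the re-check is no longer hidden behind disk latency.
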
System used:
The research was done on HyPer.
This MVCC model suits HTAP databases such as SAP HANA best. It can also be implemented in high-performance transactional systems such as H-Store/VoltDB. With it, there is little need to prefer snapshot isolation over full serializability in the future.
Workload evaluated:
J. Böttcher, et al., Scalable Garbage Collection for In-Memory MVCC Systems, in VLDB, 2019
Main idea:
Takeaway:
In-place GC makes the system more robust to skew.
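
One way to picture in-place GC is to clean a version chain at the moment it is touched, dropping every version that no active transaction can still see. The sketch below is an illustration under that assumption and not the paper's algorithm; `prune_chain` and its list-of-timestamps representation are hypothetical.

```python
def prune_chain(commit_timestamps, active_start_timestamps):
    """Return the commit timestamps of the versions that must be kept.

    commit_timestamps: commit timestamps of one tuple's versions, newest first.
    active_start_timestamps: start timestamps of running transactions.
    """
    if not commit_timestamps:
        return []
    keep = {commit_timestamps[0]}  # the newest version is always kept
    for start_ts in active_start_timestamps:
        # A snapshot at start_ts sees the newest version committed at or
        # before it; every other version is invisible to that transaction.
        for commit_ts in commit_timestamps:
            if commit_ts <= start_ts:
                keep.add(commit_ts)
                break
    return [ts for ts in commit_timestamps if ts in keep]
```

For example, with versions committed at 50, 40, 30, 20, and 10 and active readers that started at 25 and 45, this sketch keeps only 50, 40, and 20. A long-running reader therefore pins one old version per tuple instead of every version newer than its start.
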
System used:
HyPer.
Workload evaluated:
CH benchmark, a stress test for GC.
TPC-C, scalability and overhead.