| 1 | +# DuckDB in AliSQL |
| 2 | + |
| 3 | + |
| 4 | +[ [AliSQL DuckDB 引擎](./duckdb-zh.md) | [DuckDB in AliSQL](./duckdb-en.md) ] |
| 5 | + |
| 6 | +## What is DuckDB? |
| 7 | + |
| 8 | +[DuckDB](https://github.com/duckdb/duckdb) is an open-source embedded analytical database system (OLAP) designed for data analysis workloads. DuckDB is rapidly becoming a popular choice in data science, BI tools, and embedded analytics scenarios due to its key characteristics: |
| 9 | + |
| 10 | +- **Exceptional Query Performance**: On analytical queries, single-node DuckDB not only far outperforms InnoDB but also surpasses ClickHouse and SelectDB |
| 11 | +- **Excellent Compression**: DuckDB uses columnar storage and automatically selects appropriate compression algorithms based on data types, achieving very high compression ratios |
| 12 | +- **Embedded Design**: DuckDB is an embedded database system, naturally suitable for integration into MySQL |
| 13 | +- **Plugin Architecture**: DuckDB uses a plugin-based design, making it very convenient for third-party development and feature extensions |
| 14 | +- **Friendly License**: DuckDB's license allows any form of use, including commercial purposes |
| 15 | + |
| 16 | + |
| 17 | +## Why Integrate DuckDB with AliSQL? |
| 18 | + |
| 19 | +MySQL has long lacked an analytical query engine. InnoDB is designed for OLTP and excels at transactional workloads, but it is very inefficient for analytical queries. Integrating DuckDB enables: |
| 20 | + |
| 21 | +- **Hybrid Workloads**: Run both OLTP (MySQL/InnoDB) and OLAP (DuckDB) queries in a single database system |
| 22 | +- **High-Performance Analytics**: Analytical query performance improves up to **200x** compared to InnoDB |
| 23 | +- **Storage Cost Reduction**: DuckDB read replicas typically use only **20%** of the main instance's storage space due to high compression |
| 24 | +- **100% MySQL Syntax Compatibility**: No learning curve - DuckDB is integrated as a storage engine, so users continue using MySQL syntax |
| 25 | +- **Zero Additional Management Cost**: DuckDB instances are managed, operated, and monitored exactly like regular RDS MySQL instances |
| 26 | +- **One-Click Deployment**: Create DuckDB read-only instances with automatic data conversion from InnoDB to DuckDB |
| 27 | + |
| 28 | +**AliSQL** integrates **DuckDB** as a native AP engine, empowering users with high-performance, lightweight analytical capabilities while maintaining a seamless, MySQL-compatible experience. |
| 29 | + |
| 30 | + |
| 31 | +## Architecture |
| 32 | +### MySQL's Pluggable Storage Engine Architecture |
| 33 | +MySQL's pluggable storage engine architecture allows it to extend its capabilities through different storage engines: |
| 34 | + |
| 35 | + |
| 36 | + |
| 37 | +The architecture consists of four main layers: |
| 38 | +- **Runtime Layer**: Handles MySQL runtime tasks like communication, access control, system configuration, and monitoring |
| 39 | +- **Binlog Layer**: Manages binlog generation, replication, and application |
| 40 | +- **SQL Layer**: Handles SQL parsing, optimization, and execution |
| 41 | +- **Storage Engine Layer**: Manages data storage and access |
| 42 | + |
| 43 | +### DuckDB Read-Only Instance Architecture |
| 44 | + |
| 45 | + |
| 46 | + |
| 47 | +DuckDB analytical read-only instances use a read-write separation architecture: |
| 48 | +- Analytical workloads are separated from the main instance, ensuring no mutual impact |
| 49 | +- Data is replicated from the main instance via the binlog mechanism (as with regular read replicas) |
| 50 | +- InnoDB stores only metadata and system information (accounts, configurations) |
| 51 | +- All user data resides in the DuckDB engine |
| 52 | + |
| 53 | +### Query Path |
| 54 | + |
| 55 | + |
| 56 | + |
| 57 | +1. Users connect via a MySQL client |
| 58 | +2. MySQL parses the query and performs the necessary processing |
| 59 | +3. The SQL is sent to the DuckDB engine for execution |
| 60 | +4. DuckDB returns the results to the server layer |
| 61 | +5. The server layer converts the results to MySQL format and returns them to the client |
| 62 | + |
| 63 | +**Compatibility**: |
| 64 | +- Extended DuckDB's syntax parser to support MySQL-specific syntax |
| 65 | +- Rewrote numerous DuckDB functions and added many MySQL functions |
| 66 | +- Automated compatibility testing platform with ~170,000 SQL tests shows **[99% compatibility rate](https://www.alibabacloud.com/help/en/rds/apsaradb-rds-for-mysql/compatibility-of-duckdb-based-analytical-instances?spm=a2c63.p38356.help-menu-26090.d_3_4_2.6a97448exEuaFG)** |
| 67 | + |
| 68 | +### Binlog Replication Path |
| 69 | + |
| 70 | + |
| 71 | + |
| 72 | + |
| 73 | +AliSQL allows DuckDB nodes to serve as replicas via Binlog synchronization. By re-engineering the transaction commit and replay processes, AliSQL overcomes the lack of 2PC support in DuckDB, ensuring full data and metadata consistency even after abnormal crashes. |
| 74 | + |
| 75 | +**Idempotent Replay**: |
| 76 | +- Since DuckDB doesn't support two-phase commit, custom transaction commit and binlog replay processes ensure data consistency after instance crashes |
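
The source does not describe how AliSQL makes replay idempotent. One common approach, shown here only as an illustrative sketch, is to store the last applied binlog position in the same engine transaction as the data changes, so that a restart can skip events that were already committed. The example uses Python's built-in `sqlite3` as a stand-in for the analytical engine; the table names `t` and `replay_progress` and the event layout `(pos, id, v)` are hypothetical.

```python
import sqlite3


def open_replica():
    """Create a stand-in replica with a data table and a progress table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
    # Single-row table holding the last binlog position that was applied.
    conn.execute(
        "CREATE TABLE replay_progress ("
        "id INTEGER PRIMARY KEY CHECK (id = 1), pos INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO replay_progress VALUES (1, 0)")
    conn.commit()
    return conn


def applied_position(conn):
    """Return the last binlog position recorded as applied."""
    return conn.execute(
        "SELECT pos FROM replay_progress WHERE id = 1"
    ).fetchone()[0]


def replay(conn, events):
    """Apply (pos, id, v) insert events; return how many were newly applied.

    Events at or below the recorded position are skipped, so replaying the
    same binlog range twice leaves the data unchanged.
    """
    applied = 0
    for pos, row_id, value in events:
        if pos <= applied_position(conn):
            continue  # already committed before the restart
        with conn:  # data change and position update commit atomically
            conn.execute("INSERT INTO t (id, v) VALUES (?, ?)", (row_id, value))
            conn.execute(
                "UPDATE replay_progress SET pos = ? WHERE id = 1", (pos,)
            )
        applied += 1
    return applied
```

Because the position and the rows commit together, a crash between two events can never leave the position ahead of or behind the data, which is what allows a replica to restart replay from an earlier point safely.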
| 77 | + |
| 78 | +**DML Replay Optimization**: |
| 79 | +- DuckDB favors large transactions; frequent small transactions cause severe replication lag |
| 80 | +- Implemented batch replay mechanism achieving **300K rows/s** replay capability |
| 81 | +- In Sysbench testing, replication lag stays at zero, and replay throughput is even higher than InnoDB's |
| 82 | +- Batch-write optimization also applies to the primary node: with our DML optimizations, INSERT and DELETE may achieve excellent performance on the primary. |
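
The idea behind batch replay can be sketched as follows: many small source transactions are merged into fewer, larger batches before being applied. This is a minimal illustration under stated assumptions, not AliSQL's implementation; the function name `batch_transactions` and the `max_rows` threshold are hypothetical and do not correspond to any documented AliSQL variable.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str]
Transaction = List[Row]


def batch_transactions(
    transactions: Iterable[Transaction], max_rows: int
) -> Iterator[List[Row]]:
    """Merge consecutive small transactions into batches of about max_rows rows.

    Each source transaction is kept whole: a batch is flushed before adding a
    transaction that would push it past max_rows. A single transaction larger
    than max_rows therefore forms its own batch.
    """
    batch: List[Row] = []
    for txn in transactions:
        if batch and len(batch) + len(txn) > max_rows:
            yield batch
            batch = []
        batch.extend(txn)
    if batch:
        yield batch  # flush the final partial batch
```

For example, ten single-row transactions with `max_rows=4` would be applied as three batches instead of ten separate commits.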
| 83 | + |
| 84 | + |
| 85 | +### DDL Compatibility & Optimizations |
| 86 | + |
| 87 | + |
| 88 | + |
| 89 | +- Natively supported DDL uses Inplace/Instant execution |
| 90 | +- For DDL operations that DuckDB doesn't natively support (e.g., column reordering), a Copy DDL mechanism is implemented |
| 91 | +- Conversion from InnoDB to DuckDB uses multi-threaded parallel execution, reducing execution time by **7x** |
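
A copy-style DDL generally rebuilds the table: create a new table with the desired layout, copy every row across, then swap the names. The sketch below illustrates that general pattern for column reordering, using Python's built-in `sqlite3` as a stand-in engine. It is an assumption-laden illustration and not AliSQL's Copy DDL code; the helper name `reorder_columns_by_copy` is hypothetical.

```python
import sqlite3


def reorder_columns_by_copy(conn, table, new_order):
    """Rebuild `table` with its columns in `new_order` by copying all rows.

    Note: CREATE TABLE ... AS SELECT does not carry over constraints or
    indexes; a real implementation would have to recreate them.
    """
    # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk).
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if sorted(cols) != sorted(new_order):
        raise ValueError("new_order must be a permutation of existing columns")
    col_list = ", ".join(new_order)
    tmp = f"{table}__copy"
    # Run copy, drop, and rename in one transaction so the swap is atomic.
    conn.executescript(
        f"""
        BEGIN;
        CREATE TABLE {tmp} AS SELECT {col_list} FROM {table};
        DROP TABLE {table};
        ALTER TABLE {tmp} RENAME TO {table};
        COMMIT;
        """
    )
```

The cost of this approach grows with table size, which is why natively supported operations are better served by in-place or instant execution.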
| 92 | + |
| 93 | + |
| 94 | + |
| 95 | +## Performance Benchmarks |
| 96 | +**Test Environment**: |
| 97 | +- ECS Instance: 32 CPU, 128GB Memory, ESSD PL1 Cloud Disk 500GB |
| 98 | +- Benchmark: TPC-H SF100 |
| 99 | + |
| 100 | +| Query ID | DuckDB | InnoDB | ClickHouse | |
| 101 | +| --- | --- | --- | --- | |
| 102 | +|q1|0.92|1134.25|3.47| |
| 103 | +|q2|0.15|1800|1.52| |
| 104 | +|q3|0.53|802.94|3.65| |
| 105 | +|q4|0.46|1000.45|2.77| |
| 106 | +|q5|0.5|1800|5.38| |
| 107 | +|q6|0.22|566.73|0.73| |
| 108 | +|q7|0.59|1800|6.06| |
| 109 | +|q8|0.68|1800|6.99| |
| 110 | +|q9|1.44|1800|13.29| |
| 111 | +|q10|0.91|894.35|3.22| |
| 112 | +|q11|0.11|79.63|1.1| |
| 113 | +|q12|0.44|734.35|1.69| |
| 114 | +|q13|1.59|454.15|5.85| |
| 115 | +|q14|0.38|574.07|0.83| |
| 116 | +|q15|0.31|568.43|1.53| |
| 117 | +|q16|0.32|63.56|0.52| |
| 118 | +|q17|0.89|1800|7.96| |
| 119 | +|q18|1.59|1800|3.11| |
| 120 | +|q19|0.8|1800|2.96| |
| 121 | +|q20|0.51|1800|3.38| |
| 122 | +|q21|1.64|1800|OOM| |
| 123 | +|q22|0.33|361.4|4| |
| 124 | +|total|15.31|25234.31|80.01| |
| 125 | + |
| 126 | +DuckDB demonstrates significant performance advantages over InnoDB in analytical query scenarios, with up to **200x improvement**. |
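
Per-query ratios can be derived directly from the table. The sketch below copies the table values as they appear and computes column totals and the ratio of a baseline engine to DuckDB for a given query. The names `RESULTS`, `column_total`, and `speedup` are hypothetical helpers. The table does not state its units or explain the repeated value 1800; if that value is a time limit (an assumption), ratios for those rows would be lower bounds.

```python
# (DuckDB, InnoDB, ClickHouse) per query, copied from the table above.
# None marks the ClickHouse OOM entry for q21.
RESULTS = {
    "q1": (0.92, 1134.25, 3.47),
    "q2": (0.15, 1800, 1.52),
    "q3": (0.53, 802.94, 3.65),
    "q4": (0.46, 1000.45, 2.77),
    "q5": (0.5, 1800, 5.38),
    "q6": (0.22, 566.73, 0.73),
    "q7": (0.59, 1800, 6.06),
    "q8": (0.68, 1800, 6.99),
    "q9": (1.44, 1800, 13.29),
    "q10": (0.91, 894.35, 3.22),
    "q11": (0.11, 79.63, 1.1),
    "q12": (0.44, 734.35, 1.69),
    "q13": (1.59, 454.15, 5.85),
    "q14": (0.38, 574.07, 0.83),
    "q15": (0.31, 568.43, 1.53),
    "q16": (0.32, 63.56, 0.52),
    "q17": (0.89, 1800, 7.96),
    "q18": (1.59, 1800, 3.11),
    "q19": (0.8, 1800, 2.96),
    "q20": (0.51, 1800, 3.38),
    "q21": (1.64, 1800, None),
    "q22": (0.33, 361.4, 4),
}

DUCKDB, INNODB, CLICKHOUSE = 0, 1, 2


def column_total(idx):
    """Sum one engine's column, skipping entries with no result (OOM)."""
    values = (row[idx] for row in RESULTS.values() if row[idx] is not None)
    return round(sum(values), 2)


def speedup(query, baseline=INNODB):
    """Return baseline / DuckDB for one query, or None if baseline has no result."""
    row = RESULTS[query]
    if row[baseline] is None:
        return None
    return row[baseline] / row[DUCKDB]


if __name__ == "__main__":
    for name in RESULTS:
        ratio = speedup(name)
        print(f"{name}: InnoDB/DuckDB = {ratio:.1f}")
```

The column totals computed this way agree with the `total` row of the table, with the ClickHouse total excluding q21.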
| 127 | + |
| 128 | +## Try It on Alibaba Cloud |
| 129 | +You can try RDS MySQL with the DuckDB engine on Alibaba Cloud: |
| 130 | + |
| 131 | +https://help.aliyun.com/zh/rds/apsaradb-rds-for-mysql/duckdb-based-analytical-instance/ |
| 132 | + |
| 133 | + |
| 134 | +## See also |
| 135 | + |
| 136 | +- [DuckDB Variables Reference](./duckdb_variables-en.md) |
| 137 | +- [How to Set Up a DuckDB Node](./how-to-setup-duckdb-node-en.md) |
| 138 | +- [DuckDB GitHub Repository](https://github.com/duckdb/duckdb) |
| 139 | +- [Detailed Article (Chinese)](https://mp.weixin.qq.com/s/_YmlV3vPc9CksumXvXWBEw) |
| 140 | +- [AliSQL](https://github.com/alibaba/AliSQL.git) |