
Commit 57c4a2e

Update README and wiki
1 parent 5bc0c2f commit 57c4a2e

8 files changed

Lines changed: 1140 additions & 235 deletions

README.md

Lines changed: 8 additions & 3 deletions
@@ -3,6 +3,7 @@
 AliSQL is Alibaba's MySQL branch, forked from official MySQL and used extensively in Alibaba Group's production environment. It includes various performance optimizations, stability improvements, and features tailored for large-scale applications.
 
 - [AliSQL](#alisql)
+- [🚀 Quick Start (DuckDB)](#-quick-start-duckdb)
 - [Version Information](#version-information)
 - [Features](#features)
 - [Roadmap](#roadmap)
@@ -12,16 +13,20 @@ AliSQL is Alibaba's MySQL branch, forked from official MySQL and used extensivel
 - [License](#license)
 - [See Also](#see-also)
 
+## 🚀 Quick Start (DuckDB)
+
+> **Quickly build your DuckDB node:** **[How to set up a DuckDB node](./wiki/duckdb/how-to-setup-duckdb-node-en.md)**
+
 ## Version Information
 
 - **AliSQL Version**: 8.0.44 (LTS)
 - **Based on**: MySQL 8.0.44
 
 ## Features
 
-- **[DuckDB Storage Engine](./wiki/duckdb/duckdb.md)**:AliSQL integrates DuckDB as a native storage engine, allowing users to operate DuckDB with the same experience as MySQL. By leveraging AliSQL for rapid deployment of DuckDB service nodes, users can easily achieve lightweight analytical capabilities.
+- **[DuckDB Storage Engine](./wiki/duckdb/duckdb-en.md)**: AliSQL integrates DuckDB as a native storage engine, allowing users to operate DuckDB with the same experience as MySQL. By leveraging AliSQL for rapid deployment of DuckDB service nodes, users can easily achieve lightweight analytical capabilities.
 
-- **[Vector Storage](https://www.alibabacloud.com/help/en/rds/apsaradb-rds-for-mysql/vector-storage-1?spm=a2c63.p38356.help-menu-26090.d_3_3_0.6bb8d111D06xOW)**:AliSQL natively supports enterprise-grade vector processing for up to 16,383 dimensions. By integrating a highly optimized HNSW algorithm for high-performance Approximate Nearest Neighbor (ANN) search, AliSQL empowers users to build AI-driven applications—such as semantic search and recommendation systems—seamlessly using standard SQL interfaces.
+- **[Vector Storage](./wiki/vidx/vidx_readme.md)**: AliSQL natively supports enterprise-grade vector processing for up to 16,383 dimensions. By integrating a highly optimized HNSW algorithm for high-performance Approximate Nearest Neighbor (ANN) search, AliSQL empowers users to build AI-driven applications, such as semantic search and recommendation systems, seamlessly using standard SQL interfaces.
 
 ## Roadmap
 
@@ -88,7 +93,7 @@ AliSQL is based on MySQL, which is licensed under GPL-2.0. The DuckDB integratio
 
 ## See Also
 - [AliSQL Release Notes](./wiki/changes-in-alisql-8.0.44.md)
-- [DuckDB Storage Engine in AliSQL](./wiki/duckdb/duckdb.md)
+- [DuckDB Storage Engine in AliSQL](./wiki/duckdb/duckdb-en.md)
 - [Vector Index in AliSQL](./wiki/vidx/vidx_readme.md)
 - [MySQL 8.0 Documentation](https://dev.mysql.com/doc/refman/8.0/en/)
 - [MySQL 8.0 Github Repository](https://github.com/mysql/mysql-server)

wiki/duckdb/duckdb-en.md

Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
# DuckDB in AliSQL
![MySQL with DuckDB](./pic/mysql_with_duckdb.png)

[ [AliSQL DuckDB Engine (Chinese)](./duckdb-zh.md) | [DuckDB in AliSQL](./duckdb-en.md) ]

## What is DuckDB?

[DuckDB](https://github.com/duckdb/duckdb) is an open-source, embedded analytical (OLAP) database system designed for data analysis workloads. Thanks to the characteristics below, it is rapidly becoming a popular choice for data science, BI tools, and embedded analytics:

- **Exceptional Query Performance**: On analytical queries, single-node DuckDB not only far outperforms InnoDB but even outperforms ClickHouse and SelectDB
- **Excellent Compression**: DuckDB uses columnar storage and automatically selects compression algorithms suited to the data types involved, achieving very high compression ratios
- **Embedded Design**: DuckDB is an embedded database system, which makes it a natural fit for integration into MySQL
- **Plugin Architecture**: DuckDB's plugin-based design makes third-party development and feature extensions very convenient
- **Friendly License**: DuckDB is released under the MIT License, which permits any use, including commercial use
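
To make "automatically selects appropriate compression algorithms" concrete, the toy Python sketch below estimates, for one column, its size under run-length encoding (RLE), dictionary encoding, and no compression, and picks the smallest. It is not DuckDB's code: the byte-size model is a made-up assumption, and DuckDB's real compression framework supports more schemes (for example bit-packing and FSST). It does show why columnar storage compresses well: the values in one column share a type and often repeat.

```python
from typing import Dict, Sequence

# Hypothetical size model (an assumption for illustration): every stored value
# costs VALUE_BYTES, every run length or dictionary code costs CODE_BYTES.
VALUE_BYTES = 8
CODE_BYTES = 4


def raw_size(col: Sequence) -> int:
    """Uncompressed: one stored value per row."""
    return len(col) * VALUE_BYTES


def rle_size(col: Sequence) -> int:
    """Run-length encoding: one (value, run length) pair per run of equal values."""
    if not col:
        return 0
    runs = 1 + sum(1 for prev, cur in zip(col, col[1:]) if prev != cur)
    return runs * (VALUE_BYTES + CODE_BYTES)


def dict_size(col: Sequence) -> int:
    """Dictionary encoding: each distinct value stored once, plus one code per row."""
    return len(set(col)) * VALUE_BYTES + len(col) * CODE_BYTES


def choose_encoding(col: Sequence) -> str:
    """Return the encoding with the smallest estimated size for this column."""
    sizes: Dict[str, int] = {
        "raw": raw_size(col),
        "rle": rle_size(col),
        "dictionary": dict_size(col),
    }
    return min(sizes, key=sizes.get)


status = ["shipped"] * 500 + ["pending"] * 500     # long runs of one value
region = ["north", "south", "east", "west"] * 250  # few distinct values, no runs
order_id = list(range(1000))                       # every value distinct
print(choose_encoding(status), choose_encoding(region), choose_encoding(order_id))
# prints: rle dictionary raw
```

Each column gets its own choice, which is only possible because a columnar layout stores each column's values together.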

## Why Integrate DuckDB with AliSQL?

MySQL has long lacked an analytical query engine. InnoDB is designed for OLTP and excels at transactional workloads, but it is very inefficient for analytical queries. Integrating DuckDB enables:

- **Hybrid Workloads**: Run both OLTP (MySQL/InnoDB) and OLAP (DuckDB) queries in a single database system
- **High-Performance Analytics**: Analytical query performance improves by about **200x** or more compared to InnoDB
- **Storage Cost Reduction**: Thanks to high compression, DuckDB read replicas typically use only **20%** of the main instance's storage space
- **100% MySQL Syntax Compatibility**: No learning curve. DuckDB is integrated as a storage engine, so users keep using MySQL syntax
- **Zero Additional Management Cost**: DuckDB instances are managed, operated, and monitored exactly like regular RDS MySQL instances
- **One-Click Deployment**: Create DuckDB read-only instances with automatic data conversion from InnoDB to DuckDB

**AliSQL** integrates **DuckDB** as a native analytical (AP) engine, giving users high-performance, lightweight analytics while preserving a seamless, MySQL-compatible experience.


## Architecture

### MySQL's Pluggable Storage Engine Architecture

MySQL's pluggable storage engine architecture allows it to extend its capabilities through different storage engines:

![MySQL Architecture](./pic/mysql_arch.png)

The architecture consists of four main layers:
- **Runtime Layer**: Handles MySQL runtime tasks like communication, access control, system configuration, and monitoring
- **Binlog Layer**: Manages binlog generation, replication, and application
- **SQL Layer**: Handles SQL parsing, optimization, and execution
- **Storage Engine Layer**: Manages data storage and access
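
The essence of a pluggable engine is that the SQL layer talks only to an engine-neutral interface. In MySQL that interface is the C++ `handler` class (with methods such as `write_row` and `rnd_next`); the Python sketch below is a simplified, hypothetical analogue in which a row store and a column store sit behind the same two calls. `StorageEngine`, `RowEngine`, `ColumnEngine`, and `sql_layer_sum` are illustrative names, not AliSQL APIs.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List, Tuple

Row = Tuple[int, str, float]


class StorageEngine(ABC):
    """Engine-neutral interface: the only thing the toy SQL layer depends on."""

    @abstractmethod
    def write_row(self, row: Row) -> None: ...

    @abstractmethod
    def scan(self) -> Iterator[Row]: ...


class RowEngine(StorageEngine):
    """Keeps each row's fields together, as row stores such as InnoDB do."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def write_row(self, row: Row) -> None:
        self.rows.append(row)

    def scan(self) -> Iterator[Row]:
        return iter(self.rows)


class ColumnEngine(StorageEngine):
    """Keeps each column in its own array, as column stores such as DuckDB do."""

    def __init__(self) -> None:
        self.columns: List[list] = [[], [], []]

    def write_row(self, row: Row) -> None:
        for column, value in zip(self.columns, row):
            column.append(value)

    def scan(self) -> Iterator[Row]:
        # Reassemble rows from the per-column arrays.
        return zip(*self.columns)


def sql_layer_sum(engine: StorageEngine, col: int) -> float:
    """SELECT SUM(col): uses only the interface, never engine internals."""
    return sum(row[col] for row in engine.scan())


for engine in (RowEngine(), ColumnEngine()):
    engine.write_row((1, "apple", 2.5))
    engine.write_row((2, "pear", 4.0))
    print(type(engine).__name__, sql_layer_sum(engine, 2))
```

Both engines return the same answer, which is what lets AliSQL add DuckDB underneath an unchanged SQL layer in the architecture above.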

### DuckDB Read-Only Instance Architecture

![DuckDB Architecture](./pic/duckdb_arch.png)

DuckDB analytical read-only instances use a read-write separation architecture:
- Analytical workloads are separated from the main instance, so the two workloads do not affect each other
- Data is replicated from the main instance through the binlog, just as for regular read replicas
- InnoDB stores only metadata and system information (accounts, configurations)
- All user data resides in the DuckDB engine
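
From the application's side, read-write separation usually means sending writes to the primary and analytical reads to the read-only instance. The sketch below shows that routing rule on the client side; the endpoints are placeholders, the `analytical` flag is assumed to be set by the caller, and `route` is a hypothetical helper, not something AliSQL requires. Because the DuckDB instance is a binlog replica, its data can trail the primary by the replication lag.

```python
from dataclasses import dataclass

# Placeholder endpoints (hypothetical); replace with your own instance addresses.
PRIMARY = "primary.example.internal:3306"
DUCKDB_REPLICA = "duckdb-ro.example.internal:3306"

# Leading keywords that modify data or schema and therefore must run on the primary.
WRITE_KEYWORDS = {"insert", "update", "delete", "replace", "create", "alter", "drop", "truncate"}


@dataclass
class Route:
    endpoint: str
    reason: str


def route(sql: str, analytical: bool) -> Route:
    """Choose an endpoint: writes to the primary, analytical SELECTs to the DuckDB replica."""
    words = sql.split()
    first = words[0].lower() if words else ""
    if first in WRITE_KEYWORDS:
        return Route(PRIMARY, "write")
    if first == "select" and analytical:
        return Route(DUCKDB_REPLICA, "analytical read")
    return Route(PRIMARY, "other statement or point read")


print(route("SELECT region, SUM(amount) FROM orders GROUP BY region", analytical=True))
print(route("UPDATE orders SET status = 'paid' WHERE id = 7", analytical=False))
```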

### Query Path

![Query Path](./pic/query_path.png)

1. The user connects through a MySQL client
2. MySQL parses the query and performs the necessary processing
3. The SQL is sent to the DuckDB engine for execution
4. DuckDB returns the results to the server layer
5. The server layer converts the results to MySQL format and returns them to the client
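
Step 5 is where DuckDB values become something a MySQL client understands. As an illustration, the sketch below encodes one row the way the MySQL client/server text protocol does: each value as a length-encoded string, with NULL sent as the single byte `0xFB`. It covers only the row payload (no packet header) and uses `str()` as a simplified value formatter; `lenenc_int` and `encode_text_row` are hypothetical helpers, since the document does not describe how AliSQL implements this conversion.

```python
def lenenc_int(n: int) -> bytes:
    """Encode n as a MySQL length-encoded integer."""
    if n < 251:
        return bytes([n])
    if n < 1 << 16:
        return b"\xfc" + n.to_bytes(2, "little")
    if n < 1 << 24:
        return b"\xfd" + n.to_bytes(3, "little")
    return b"\xfe" + n.to_bytes(8, "little")


def encode_text_row(values) -> bytes:
    """Encode one result row as a text-protocol row payload (packet header not included)."""
    out = bytearray()
    for v in values:
        if v is None:
            out += b"\xfb"  # NULL marker in text-protocol rows
            continue
        # Simplification: real servers format floats, booleans, dates, etc. by type.
        data = v if isinstance(v, bytes) else str(v).encode("utf-8")
        out += lenenc_int(len(data)) + data
    return bytes(out)


print(encode_text_row([42, None, "ab"]).hex())  # 023432fb026162
```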

**Compatibility**:
- DuckDB's SQL parser was extended to support MySQL-specific syntax
- Many DuckDB functions were rewritten and many MySQL functions were added
- An automated compatibility testing platform running about 170,000 SQL tests reports a **[99% compatibility rate](https://www.alibabacloud.com/help/en/rds/apsaradb-rds-for-mysql/compatibility-of-duckdb-based-analytical-instances?spm=a2c63.p38356.help-menu-26090.d_3_4_2.6a97448exEuaFG)**
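
As an example of why functions needed rewriting, MySQL's `DATE_FORMAT` and strftime-style formatting (the convention DuckDB's `strftime` follows) assign different letters to the same fields: in MySQL `%i` is minutes and `%M` is the month name, whereas in strftime `%M` is minutes and `%B` is the month name. The hypothetical `mysql_to_strftime` below translates a partial set of MySQL specifiers and checks the result with Python's `datetime.strftime`; it is a sketch of the problem, not AliSQL's implementation.

```python
from datetime import datetime

# Partial mapping from MySQL DATE_FORMAT specifiers to strftime specifiers.
MYSQL_TO_STRFTIME = {
    "%Y": "%Y", "%y": "%y", "%m": "%m", "%d": "%d",
    "%H": "%H", "%h": "%I", "%I": "%I",
    "%i": "%M",  # MySQL minutes -> strftime minutes
    "%s": "%S", "%S": "%S",
    "%M": "%B",  # MySQL month name -> strftime month name
    "%b": "%b", "%W": "%A", "%a": "%a", "%p": "%p", "%j": "%j",
    "%T": "%H:%M:%S", "%%": "%%",
}


def mysql_to_strftime(fmt: str) -> str:
    """Translate a MySQL DATE_FORMAT pattern into an equivalent strftime pattern."""
    out = []
    i = 0
    while i < len(fmt):
        if fmt[i] != "%":
            out.append(fmt[i])  # literal character, copied as-is
            i += 1
            continue
        spec = fmt[i:i + 2]
        if spec not in MYSQL_TO_STRFTIME:
            raise ValueError(f"unsupported MySQL specifier: {spec!r}")
        out.append(MYSQL_TO_STRFTIME[spec])
        i += 2
    return "".join(out)


ts = datetime(2024, 3, 5, 14, 7, 9)
# Prints "Tuesday, March 05 2024 at 02:07 PM" in the default C locale.
print(ts.strftime(mysql_to_strftime("%W, %M %d %Y at %h:%i %p")))
```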

### Binlog Replication Path

![Binlog Replication](./pic/binlog_replication.png)

AliSQL lets DuckDB nodes serve as replicas through binlog replication. DuckDB does not support two-phase commit (2PC), so AliSQL re-engineers the transaction commit and binlog replay processes to keep data and metadata fully consistent, even after a crash.

**Idempotent Replay**:
- Because DuckDB does not support two-phase commit, custom transaction commit and binlog replay processes keep data consistent after an instance crash: replaying events that were already applied before the crash does not change the data
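
One common way to get idempotent replay without two-phase commit is to persist the last applied binlog position in the same engine transaction as the row changes, and after a restart to skip every transaction at or below that position. The in-memory sketch below models that idea with a non-idempotent operation (adding to a counter) so that the effect of a duplicate replay would be visible. The document does not specify AliSQL's exact mechanism; `Replica`, `apply`, and the event layout are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical binlog event: (binlog position, key, delta to add).
Event = Tuple[int, str, int]


@dataclass
class Replica:
    data: Dict[str, int] = field(default_factory=dict)
    # Last applied binlog position, committed in the same transaction as `data`.
    applied_pos: int = 0

    def apply(self, txn: List[Event]) -> bool:
        """Apply one replicated transaction; return False if it was already applied."""
        if not txn or txn[-1][0] <= self.applied_pos:
            return False  # duplicate after a crash/restart: skipping keeps replay idempotent
        new_data = dict(self.data)
        for _, key, delta in txn:
            new_data[key] = new_data.get(key, 0) + delta
        # Models one atomic commit: data and position become visible together.
        self.data, self.applied_pos = new_data, txn[-1][0]
        return True


replica = Replica()
txn = [(1, "a", 5), (2, "b", 3)]
print(replica.apply(txn))  # first delivery is applied
print(replica.apply(txn))  # redelivery after a simulated crash is skipped
print(replica.data, replica.applied_pos)
```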

**DML Replay Optimization**:
- DuckDB favors large transactions; frequent small transactions cause severe replication lag
- A batch replay mechanism achieves a replay rate of **300K rows/s**
- In Sysbench tests, replication lag stays at zero, and replay performance even exceeds that of InnoDB
- The batch-write optimization also applies to the primary node: with these DML optimizations, INSERT and DELETE can also perform very well on the primary

![Batch commit](./pic/batch_commit.png)
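
The batch replay idea can be sketched as buffering many small replicated transactions and committing them to the engine as one large transaction once a row threshold is reached. `BatchApplier`, `max_rows`, and the `commit` callback are hypothetical, and the threshold is arbitrary; a production applier would also flush on a timer and persist the binlog position with each batch, as described under **Idempotent Replay**.

```python
from typing import Callable, List, Tuple

RowChange = Tuple[str, tuple]  # hypothetical row event: (operation, row)


class BatchApplier:
    """Accumulate small replicated transactions and commit them in large batches."""

    def __init__(self, commit: Callable[[List[RowChange]], None], max_rows: int = 10_000) -> None:
        self.commit = commit
        self.max_rows = max_rows
        self.buffer: List[RowChange] = []

    def on_transaction(self, changes: List[RowChange]) -> None:
        # Transactions are appended whole, so a batch boundary never splits one.
        self.buffer.extend(changes)
        if len(self.buffer) >= self.max_rows:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.commit(self.buffer)  # one engine transaction for the whole batch
            self.buffer = []


commits: List[int] = []
applier = BatchApplier(commit=lambda rows: commits.append(len(rows)), max_rows=100)
for i in range(250):
    applier.on_transaction([("insert", (i,))])  # 250 single-row transactions
applier.flush()
print(commits)  # [100, 100, 50]: 3 engine commits instead of 250
```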

### DDL Compatibility & Optimizations

![DDL Compatibility](./pic/ddl_support.png)

- DDL that DuckDB supports natively is executed as Inplace/Instant DDL
- For DDL operations that DuckDB does not support natively (e.g., column reordering), a Copy DDL mechanism is implemented
- Conversion from InnoDB to DuckDB uses multi-threaded parallel execution, making it **7x** faster

![Copy DDL from InnoDB](./pic/parallel_copy_from_innodb.png)
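
The parallel InnoDB-to-DuckDB conversion can be pictured as splitting the primary-key range into chunks and copying each chunk on its own worker thread. In this sketch, `read_chunk` and `write_chunk` are hypothetical callbacks standing in for an InnoDB range scan and a bulk insert into DuckDB; chunk size and thread count are illustrative, and the sketch makes no claim about reproducing the 7x figure.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

KeyRange = Tuple[int, int]  # half-open [start, end) primary-key range


def split_range(min_key: int, max_key: int, chunk: int) -> List[KeyRange]:
    """Split the inclusive key range [min_key, max_key] into half-open chunks."""
    return [(s, min(s + chunk, max_key + 1)) for s in range(min_key, max_key + 1, chunk)]


def parallel_copy(min_key: int, max_key: int, chunk: int, threads: int,
                  read_chunk: Callable[[KeyRange], list],
                  write_chunk: Callable[[list], int]) -> int:
    """Copy every chunk on a thread pool; return the total number of rows written."""
    ranges = split_range(min_key, max_key, chunk)

    def copy_one(r: KeyRange) -> int:
        return write_chunk(read_chunk(r))

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(copy_one, ranges))


if __name__ == "__main__":
    source = {k: f"row-{k}" for k in range(1, 1001)}  # stand-in for an InnoDB table
    target: list = []
    lock = threading.Lock()

    def read_chunk(r: KeyRange) -> list:
        return [source[k] for k in range(*r)]

    def write_chunk(rows: list) -> int:
        with lock:  # the shared target is mutated from several threads
            target.extend(rows)
        return len(rows)

    print(parallel_copy(1, 1000, chunk=128, threads=4,
                        read_chunk=read_chunk, write_chunk=write_chunk))
```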


## Performance Benchmarks

**Test Environment**:
- ECS instance: 32 CPUs, 128 GB memory, 500 GB ESSD PL1 cloud disk
- Benchmark: TPC-H SF100

| Query ID | DuckDB (s) | InnoDB (s) | ClickHouse (s) |
| --- | --- | --- | --- |
|q1|0.92|1134.25|3.47|
|q2|0.15|1800|1.52|
|q3|0.53|802.94|3.65|
|q4|0.46|1000.45|2.77|
|q5|0.5|1800|5.38|
|q6|0.22|566.73|0.73|
|q7|0.59|1800|6.06|
|q8|0.68|1800|6.99|
|q9|1.44|1800|13.29|
|q10|0.91|894.35|3.22|
|q11|0.11|79.63|1.1|
|q12|0.44|734.35|1.69|
|q13|1.59|454.15|5.85|
|q14|0.38|574.07|0.83|
|q15|0.31|568.43|1.53|
|q16|0.32|63.56|0.52|
|q17|0.89|1800|7.96|
|q18|1.59|1800|3.11|
|q19|0.8|1800|2.96|
|q20|0.51|1800|3.38|
|q21|1.64|1800|OOM|
|q22|0.33|361.4|4|
|total|15.31|25234.31|80.01|
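
DuckDB shows a decisive advantage over InnoDB in analytical queries: it is about **200x** faster on q16, the query with the smallest gap, and more than 1,000x faster on most queries. ClickHouse ran out of memory (OOM) on q21, so its total covers the other 21 queries.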
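
The speedups can be recomputed from the table. The script below divides each InnoDB time by the DuckDB time: the smallest ratio is about 199x (q16), and 18 of the 22 queries exceed 1,000x. Ten InnoDB entries are exactly 1800, which appears to be a time cap rather than a completed run (an assumption, since the table does not say so); for those queries the computed ratio is a lower bound.

```python
# TPC-H SF100 times from the table above, as (DuckDB, InnoDB) pairs.
TIMES = {
    "q1": (0.92, 1134.25), "q2": (0.15, 1800), "q3": (0.53, 802.94),
    "q4": (0.46, 1000.45), "q5": (0.5, 1800), "q6": (0.22, 566.73),
    "q7": (0.59, 1800), "q8": (0.68, 1800), "q9": (1.44, 1800),
    "q10": (0.91, 894.35), "q11": (0.11, 79.63), "q12": (0.44, 734.35),
    "q13": (1.59, 454.15), "q14": (0.38, 574.07), "q15": (0.31, 568.43),
    "q16": (0.32, 63.56), "q17": (0.89, 1800), "q18": (1.59, 1800),
    "q19": (0.8, 1800), "q20": (0.51, 1800), "q21": (1.64, 1800),
    "q22": (0.33, 361.4),
}
INNODB_CAP = 1800  # assumption: 1800 marks an InnoDB run that hit a time limit

ratios = {q: innodb / duck for q, (duck, innodb) in TIMES.items()}
smallest = min(ratios, key=ratios.get)
over_1000x = [q for q, r in ratios.items() if r > 1000]
capped = [q for q, (_, innodb) in TIMES.items() if innodb == INNODB_CAP]
total_duck = sum(d for d, _ in TIMES.values())
total_innodb = sum(i for _, i in TIMES.values())

print(f"smallest speedup: {smallest} ({ratios[smallest]:.1f}x)")
print(f"queries over 1000x: {len(over_1000x)} of {len(TIMES)}")
print(f"capped InnoDB runs (ratio is a lower bound): {len(capped)}")
print(f"total time ratio: {total_innodb / total_duck:.0f}x")
```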

## Try It on Alibaba Cloud
You can try RDS MySQL with the DuckDB engine on Alibaba Cloud (documentation in Chinese):

https://help.aliyun.com/zh/rds/apsaradb-rds-for-mysql/duckdb-based-analytical-instance/


## See also

- [DuckDB Variables Reference](./duckdb_variables-en.md)
- [How to Set Up a DuckDB Node](./how-to-setup-duckdb-node-en.md)
- [DuckDB GitHub Repository](https://github.com/duckdb/duckdb)
- [Detailed Article (Chinese)](https://mp.weixin.qq.com/s/_YmlV3vPc9CksumXvXWBEw)
- [AliSQL](https://github.com/alibaba/AliSQL.git)
