Commit 66dcc3e

Update README.md
1 parent 1b93426 commit 66dcc3e

1 file changed

Lines changed: 24 additions & 22 deletions

README.md

@@ -389,30 +389,8 @@ print(f"Syncode augmented LLM output:\n{output}")
389389
```
390390
 
391391

392-
## How Does **SynCode** Compare to Other Constrained Decoders?
393392

394393

395-
| Tool | Regex | CFG* | Pre-Computed* | GPL* |
396-
|---------------------------------------------------- |-----------|-----------|:-------------:|------|
397-
| [`LMQL`](https://github.com/eth-sri/lmql) |||||
398-
| [`GUIDANCE`](https://github.com/guidance-ai/guidance) |||||
399-
| [`OUTLINES`](https://github.com/outlines-dev/outlines) |||||
400-
| [`PICARD`](https://github.com/ServiceNow/picard) |||||
401-
| [`SYNCHROMESH`](https://arxiv.org/abs/2201.11227) |||||
402-
| [`LLAMA.CPP`](https://github.com/ggerganov/llama.cpp) |||||
403-
| [`GCD`](https://arxiv.org/abs/2305.13971) |||||
404-
| **SynCode** | **** | **** | **** | **** |
405-
---
406-
407-
**CFG***: Guide generation with a Context Free Grammar (CFG)
408-
409-
**Pre-Computed***: Precompute masks over the vocabulary to significantly improve generation speed
410-
411-
**GPL***: Support general-purpose programming languages, which involve non-context-free fragments, such as indentation in Python and end-of-scope markers in Golang.
412-
413-
[test-img]: https://github.com/shubhamugare/llm-cfg/actions/workflows/run_tests.yml/badge.svg
414-
[tests]: https://github.com/shubhamugare/llm-cfg/actions/workflows/run_tests.yml
415-
416394
## 📜 Citation
417395
<p>
418396
<a href="https://arxiv.org/abs/2403.01632"><img src="https://img.shields.io/badge/Paper-arXiv-blue"></a>
@@ -437,6 +415,30 @@ print(f"Syncode augmented LLM output:\n{output}")
437415

438416
In the SynCode workflow, the LLM takes partial code _C<sub>k</sub>_ and generates a distribution for the next token _t<sub>k+1</sub>_. The incremental parser processes _C<sub>k</sub>_ to generate accept sequences _A_, the sequences of terminals that can follow the partial code. Simultaneously, the incremental parser computes a remainder _r_ from the partial code, representing the suffix that may change its terminal type in subsequent generations. The backbone of SynCode is the offline construction of a DFA mask store, a lookup table derived from the regular expressions that represent the terminals of the language grammar. The DFA mask store enables efficient traversal of DFA states and retrieval of the mask mapped to each state and accept sequence. SynCode walks over the DFA using the remainder and uses the mask store to compute the mask specific to each accept sequence. By taking the union of the masks for all accept sequences, SynCode obtains the set of syntactically valid tokens. The LLM iteratively generates a token _t<sub>k+1</sub>_ using the distribution and the mask, appending it to _C<sub>k</sub>_ to create the updated code _C<sub>k+1</sub>_. The process continues until the LLM returns the final code _C<sub>n</sub>_ based on the defined stop condition.
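
The mask-combination step described above can be illustrated with a toy sketch. This is not SynCode's implementation: the vocabulary, the DFA state name `q_name`, the terminal names, the hand-written `MASK_STORE`, and the helper functions `union_masks` and `constrained_argmax` are all hypothetical, and greedy selection stands in for whatever decoding strategy is actually used. It only shows how per-accept-sequence masks might be combined and applied to the model's scores.

```python
from typing import Dict, List, Sequence, Tuple

# Toy vocabulary (assumption); list indices act as token ids.
VOCAB = ["def", "(", ")", "x", "+", "1"]

# Hypothetical mask store: (DFA state reached by the remainder, accept sequence)
# -> boolean mask over VOCAB. The real store is built offline from the grammar's
# terminals; this one is hand-written purely for illustration.
MaskKey = Tuple[str, Tuple[str, ...]]
MASK_STORE: Dict[MaskKey, List[bool]] = {
    ("q_name", ("LPAR",)):       [False, True, False, False, False, False],
    ("q_name", ("PLUS", "NUM")): [False, False, False, False, True, False],
}


def union_masks(
    store: Dict[MaskKey, List[bool]],
    state: str,
    accept_sequences: Sequence[Sequence[str]],
) -> List[bool]:
    """OR together the stored masks for every accept sequence at this state."""
    mask = [False] * len(VOCAB)
    for seq in accept_sequences:
        entry = store.get((state, tuple(seq)))
        if entry is None:
            continue  # no mask recorded for this (state, sequence) pair
        mask = [a or b for a, b in zip(mask, entry)]
    return mask


def constrained_argmax(logits: Sequence[float], mask: Sequence[bool]) -> int:
    """Pick the highest-scoring token id among those the mask allows."""
    allowed = [i for i, ok in enumerate(mask) if ok]
    if not allowed:
        raise ValueError("mask rejects every token")
    return max(allowed, key=lambda i: logits[i])


# Made-up scores in which the unconstrained favourite ("def") is not allowed.
logits = [3.0, 0.5, 2.0, 1.0, 0.7, 0.1]
mask = union_masks(MASK_STORE, "q_name", [("LPAR",), ("PLUS", "NUM")])
next_token = VOCAB[constrained_argmax(logits, mask)]
print(next_token)
```

In this toy setup the union mask permits only `(` and `+`, so the highest-scoring token overall is skipped in favour of the best syntactically allowed one.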
439417

418+
## How Does **SynCode** Compare to Other Constrained Decoders?
419+
420+
421+
| Tool | Regex | CFG* | Pre-Computed* | GPL* |
422+
|---------------------------------------------------- |-----------|-----------|:-------------:|------|
423+
| [`LMQL`](https://github.com/eth-sri/lmql) |||||
424+
| [`GUIDANCE`](https://github.com/guidance-ai/guidance) |||||
425+
| [`OUTLINES`](https://github.com/outlines-dev/outlines) |||||
426+
| [`PICARD`](https://github.com/ServiceNow/picard) |||||
427+
| [`SYNCHROMESH`](https://arxiv.org/abs/2201.11227) |||||
428+
| [`LLAMA.CPP`](https://github.com/ggerganov/llama.cpp) |||||
429+
| [`GCD`](https://arxiv.org/abs/2305.13971) |||||
430+
| **SynCode** | **** | **** | **** | **** |
431+
---
432+
433+
**CFG***: Guide generation with a Context Free Grammar (CFG)
434+
435+
**Pre-Computed***: Precompute masks over the vocabulary to significantly improve generation speed
436+
437+
**GPL***: Support general-purpose programming languages, which involve non-context-free fragments, such as indentation in Python and end-of-scope markers in Golang.
438+
439+
[test-img]: https://github.com/shubhamugare/llm-cfg/actions/workflows/run_tests.yml/badge.svg
440+
[tests]: https://github.com/shubhamugare/llm-cfg/actions/workflows/run_tests.yml
441+
440442
## Contact
441443
For questions, please contact [Shubham Ugare](mailto:shubhamdugare@gmail.com).
442444
