Skip to content

Commit fcef045

Browse files
committed
Add latest article on type system optimizations
1 parent ef8f990 commit fcef045

File tree

1 file changed

+363
-0
lines changed

1 file changed

+363
-0
lines changed
Lines changed: 363 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,363 @@
1+
---
2+
layout: post
3+
title: "Lazy BDDs with eager literal intersections"
4+
authors:
5+
- José Valim
6+
category: Internals
7+
excerpt: "This article explores the latest batch of optimizations we did to set-theoretic types and their representation"
8+
---
9+
10+
# Lazy BDDs with eager literal intersections
11+
12+
In [a previous article](/blog/2025/12/02/lazier-bdds-for-set-theoretic-types/),
13+
we discussed how Elixir changed its set-theoretic types representation from
14+
Disjunctive Normal Forms (DNFs) to Lazy Binary Decision Diagrams (Lazy BDDs).
15+
16+
In a nutshell, DNFs allow us to represent unions, intersections,
17+
and negations as a flat data structure:
18+
19+
```elixir
20+
(c1 and not d1) or (c2 and not d2) or (c3 and not d3) or ...
21+
```
22+
23+
This meant that any operation between complex types was immediately
24+
flattened. For example, intersections of unions, such as
25+
`(foo or bar) and (baz or bat)`, had to be immediately flatten into the
26+
cartesian production `(foo and baz) or (foo and bat) or (bar and baz) or (bar and bat)`.
27+
Even worse, unions of differences could lead to exponential expansion.
28+
29+
Elixir v1.19 then introduced BDDs with lazy unions (in short, lazy BDDs).
30+
They are trees which allow us to represent set-theoretic operations of any
31+
arbitrary depth, without flattening them, while also detecting duplicate types.
32+
A lazy BDD has type
33+
34+
```elixir
35+
type lazy_bdd() = :top or :bottom or
36+
{type(), constrained :: lazy_bdd(), uncertain :: lazy_bdd(), dual :: lazy_bdd()}
37+
```
38+
39+
where `type()` is the representation of the actual type. For example,
40+
if the type being represented is a `tuple`, `type()` would be a list of
41+
all elements in the tuple. In literature, `type()` is known as literal.
42+
43+
Throughout this article, we will use the following notation to represent
44+
lazy BDDs:
45+
46+
```elixir
47+
B = {a, C, U, D}
48+
```
49+
50+
where `B` stands for BDD, `a` is the literal, `C` is constrained, `U`
51+
is uncertain, and `D` is dual. Semantically, the BDD above is the same as:
52+
53+
```elixir
54+
B = (a and C) or U or (not a and D)
55+
```
56+
57+
Which means the following expression, where `foo`, `bar`, `baz`,
58+
and `bat` below represent types:
59+
60+
```elixir
61+
(foo and not (bar or (baz and bat))
62+
```
63+
64+
will be stored as:
65+
66+
```elixir
67+
{foo,
68+
{bar, :bottom, :bottom,
69+
{baz, :bottom,
70+
{bat, :bottom, :bottom, :top}, :top}, :bottom, :bottom}
71+
```
72+
73+
The conversion to lazy BDDs and a few optimizations we added to their
74+
formulation in literature allowed us to type check Elixir programs faster,
75+
despite Elixir v1.19 performing more type checks than v1.18!
76+
77+
However, when working on Elixir v1.20, which introduced type inference of
78+
all constructs, we noticed some of the downsides of lazy BDDs. This article
79+
explores these downsides and how we addressed them.
80+
81+
> As with previous articles, we discuss implementation details of the
82+
> type system. You don’t need to understand these internals to use the type
83+
> system. Our goal is simply to document our progress and provide guidance
84+
> for future maintainers and implementers. Let’s get started.
85+
86+
## The trouble with laziness
87+
88+
As we described above, lazy BDDs allow us to represent set-theoretic
89+
operations at any depth. And while this is extremely useful for unions
90+
and differences, they offer a downside when it comes to intersections.
91+
For example, take this type:
92+
93+
```elixir
94+
(%Foo{} or %Bar{} or %Baz{} or %Bat{}) and %Bar{}
95+
```
96+
97+
While we could store the above as a BDD, it is also clear that the
98+
above can only be equal to `%Bar{}`. In other words, if we
99+
resolve intersections eagerly, we will most likely reduce the tree
100+
size, speeding up all future operations! To do this, we need
101+
to compute the intersection between literal types (the first element of
102+
the BDD node), rather than intersections between BDDs. So we are naming
103+
these new optimizations **eager literal intersections**.
104+
105+
## Eager literal intersections
106+
107+
Our goal is to apply intersections between literals as soon as possible
108+
as it helps us reduce the size of the tree whenever literal intersections
109+
are empty.
110+
111+
Take two BDDs:
112+
113+
```elixir
114+
B1 = {a1, C1, U1, D1}
115+
B2 = {a2, C2, U2, D2}
116+
```
117+
118+
And imagine there is a function that can compute the intersection between
119+
`a1` and `a2` (which is the intersection of literals, not BDDs). The optimization
120+
works as long as one of `C1` or `C2` are `:top`. In this case, let's choose `C2`,
121+
so we have:
122+
123+
```elixir
124+
B1 = (a1 and C1) or U1 or (not a1 and D1)
125+
B2 = a2 or U2 or (not a2 and D2)
126+
```
127+
128+
The intersection of `B1` and `B2` can be computed as:
129+
130+
```elixir
131+
B1 and (a2 or U2 or (not a2 and D2))
132+
```
133+
134+
Let's distribute it:
135+
136+
```elixir
137+
(a2 and B1) or (U2 and B1) or ((not a2 and D2) and B1)
138+
```
139+
140+
And expand the first `B1`:
141+
142+
```elixir
143+
(a2 and ((a1 and C1) or U1 or (not a1 and D1))) or
144+
(U2 and B1) or ((not a2 and D2) and B1)
145+
```
146+
147+
And now let's distribute `a2` while reordering the arguments:
148+
149+
```elixir
150+
(((a1 and a2) and C1) or (a2 and U1) or ((a2 and D1) and not a1) or
151+
(U2 and B1) or ((not a2 and D2) and B1)
152+
```
153+
154+
In the first term of the union, we have `a1 and a2` as a literal intersection.
155+
If `a1 and a2` is empty, then the whole C1 node can be eliminated.
156+
157+
Then we proceed recursively intersect `a2` with every literal node in `a2 and U1`
158+
and `a2 and D1`. And, if their literal nodes are empty, those subtrees are eliminated
159+
too. This allows us to dramatically cut down the size of tree! In our benchmarks,
160+
these optimizations allowed us to reduce type checking of a module from 10s to
161+
25ms.
162+
163+
The remaining terms of the union are `U2 and B1` and `(not a2 and D2) and B1`.
164+
If desired, we could apply the same eager literal intersection optimization to
165+
`U2 and B1` (as long as the constrained part in either `U2` or `B1 ` are `:top`).
166+
167+
This optimization has worked quite well for us, but it is important to keep in
168+
mind since BDDs are ordered and the literal intersection may create a new literal
169+
value, this optimization must be applied semantically so we can recompute the
170+
position of intersected literals in the tree. We cannot apply it when we are
171+
already traversing the tree using the general lazy BDD formulas from the previous
172+
article.
173+
174+
Finally, note this optimization may eagerly reintroduce some of the complexity seen
175+
in DNFs if applied recursively. For instance, if you have `(foo or bar) and (baz or bat)`,
176+
the recursive application of eager literal intersections will yield
177+
`(foo and baz) or (foo and bat) or (bar and baz) or (bar and bat)`. If most of those
178+
intersections are eliminated, then applying eager literal intersections is still beneficial,
179+
but that may not always be the case.
180+
181+
To discuss exactly when these trade-offs may be problematic, let's talk about open vs
182+
closed types.
183+
184+
### Open vs closed types
185+
186+
Elixir's type system can represent both open and closed maps. When you write:
187+
188+
```elixir
189+
user = %{name: "john", age: 42}
190+
```
191+
192+
We are certain the map has keys `:name` and `:age` and only those keys.
193+
We say this map is closed, as it has no other keys, and it would have
194+
the type `%{name: binary(), age: integer()}`.
195+
196+
However, when you pattern match on it:
197+
198+
```elixir
199+
def can_drive?(%{age: age}) when is_integer(age) and age >= 18
200+
```
201+
202+
Because pattern matching only validates the `age` key, a map given as
203+
argument may have other keys! Therefore, we say the map is open and
204+
it has the type `%{..., age: integer()}`. This type says the map may
205+
have any keys but we are sure the `age` is `integer()`.
206+
207+
The trouble is that, when we are intersecting two maps, because the
208+
open map is very broad, their intersection rarely eliminate entries.
209+
For example, the intersection between `%{..., age: integer()}` and
210+
`%{..., name: binary()}` is the map `%{..., name: binary(), age: integer()}`.
211+
212+
So when we have to compute the intersection between `(foo or bar) and (baz or bat)`
213+
and `foo`, `bar`, `baz`, and `bat` are open maps with different keys,
214+
then it will generate a cartesian product of all combinations! However,
215+
if they were closed maps, the end result would be empty. For this reason,
216+
we recommend applying the eager literal intersection only when the
217+
intersection will often lead to empty types.
218+
219+
## Optimizing differences
220+
221+
The difference between `B1 \ B2` can always be expressed as the intersection
222+
between `B1` and `not B2`, which is precisely how we write differences in Elixir.
223+
224+
Now take the following type:
225+
226+
```elixir
227+
{:ok, integer()} and not {:error, integer()}
228+
```
229+
230+
Currently, both `ok`-type and `error`-type are stored as nodes in the BDD.
231+
However, it is clear in the example above the types are disjoint and the
232+
result is `{:ok, integer()}`.
233+
234+
If we know the literals `a1` and `a2` are disjoint (their intersection is
235+
empty), then it is likely we can avoid adding nodes to the tree.
236+
237+
Furthermore, imagine this type:
238+
239+
```elixir
240+
{:ok, integer()} and not (not {:error, integer()})
241+
```
242+
243+
The difference will convert the negation into a positive, resulting in
244+
`{:ok, integer()} and {:error, integer()}`, which is empty. Therefore,
245+
if `B2` contains negations (which means its `D2` component is not bottom),
246+
then we can also apply the eager literal intersection optimization from
247+
the previous section.
248+
249+
We will explore both scenarios next, starting with the one where D2
250+
is bottom.
251+
252+
### When `D2` is bottom
253+
254+
We start `B1 and not B2`:
255+
256+
```elixir
257+
B1 and not B2
258+
```
259+
260+
Next let's break `B1` into `(a1 and C1) or B1_no_C1`, where `B1_no_C1` is `U1 or (not a1 and D1)`:
261+
262+
```elixir
263+
((a1 and C1) or B1_no_C1) and not B2
264+
```
265+
266+
Now we distribute the difference:
267+
268+
```elixir
269+
((a1 and C1) and not B2) or (B1_no_C1 and not B2)
270+
```
271+
272+
Let's solve the left-hand side. We know `B2 = (a2 and C2) or U2` (remember `D2` is bottom), so let's add that:
273+
274+
```elixir
275+
(a1 and C1 and not ((a2 and C2) or U2))
276+
or (B1_no_C1 and not B2)
277+
```
278+
279+
Now distribute the `not` and remove the parenthesis:
280+
281+
```elixir
282+
(a1 and C1 and not (a2 and C2) and not U2)
283+
or (B1_no_C1 and not B2)
284+
```
285+
286+
Now, **if `a1` and `a2` are disjoint**, then `a1 and not (a2 and C2)` is the same as `a1`.
287+
This happens because if `a1` and `a2` are disjoint, `a2 and C2` is a subset of `a2`,
288+
which is then also disjoint with `a1`. So the first part simplifies to `a1 and C1 and not U2`:
289+
290+
```elixir
291+
(a1 and C1 and not U2) or (B1_no_C1 and not B2)
292+
```
293+
294+
Now by expanding `B1_no_C1` into `U1 or (not a1 and D1)` and distributed
295+
the union, we get:
296+
297+
```elixir
298+
(a1 and C1 and not U2) or (U1 and not B2) or (not a1 and D1 and not B2)
299+
```
300+
301+
which is in the BDD format! So we can rewrite the difference of disjoint literals as:
302+
303+
```elixir
304+
{a1, C1 and not U2, U1 and not B2, D1 and not B2}
305+
```
306+
307+
which completely avoids adding `a2` to the BDD and can then continue recursively.
308+
309+
### When `D2` is not bottom
310+
311+
When D2 is not bottom, it means B2 has a negated component.
312+
Since B2 itself is negated when part of a difference, the D2
313+
component of B2 becomes an intersection, and we can apply the
314+
same eager literal technique we applied to intersections.
315+
316+
Once again, we start `B1 and not B2`:
317+
318+
```elixir
319+
B1 and not B2
320+
```
321+
322+
Now let's break `B2` into `(B2_no_D2 or (not a2 and D2))`, where `B2_no_D2` is `(a2 and C2) or U2`:
323+
324+
```elixir
325+
(B1) and not (B2_no_D2 or (not a2 and D2))
326+
```
327+
328+
Now we distribute the negation all the way through:
329+
330+
```elixir
331+
(B1) and (not B2_no_D2 and (a2 or not D2))
332+
```
333+
334+
And distribute `B1`'s intersection with `(a2 or not D2)`:
335+
336+
```elixir
337+
((B1 and a2) or (B1 and not D2)) and not B2_no_D2
338+
```
339+
340+
`B1 and a2` is an eager literal intersection from the previous section,
341+
which we can reuse!
342+
343+
Furthermore, notice at the end we compute the difference between
344+
`((B1 and a2) or (B1 and not D2))` and `B2_no_D2`. Given `B2_no_D2`
345+
by definition has `D2 = bottom`, we can apply the optimized
346+
difference for when D2 is bottom.
347+
348+
At the moment, we are not applying this optimization in Elixir,
349+
as the difference with negations on the right-hand side are uncommon.
350+
We may revisit this in the future.
351+
352+
## Results
353+
354+
We initially [implemented eager literal intersections as part of
355+
Elixir v1.20 release](
356+
https://github.com/elixir-lang/elixir/compare/995f7fc2c4080d2c0d1f78a7d896366b0c715178...0e3d22fd7997004ad23ff02d9ee935160869522f),
357+
which reduced the type checking time of one of the pathological cases
358+
from 10 seconds to 25ms!
359+
360+
However, our initial implementation also caused a performance regression,
361+
as we did not distinguish between open and closed maps. This regression was
362+
addressed by [applying the optimization only to closed maps](https://github.com/elixir-lang/elixir/commit/e5dc69398ef172b4a590e7e4e20f9d52b4b7ab59), as discussed
363+
in the article.

0 commit comments

Comments
 (0)