README.md
0 additions & 6 deletions
@@ -84,12 +84,6 @@ Ensure you have at least two GPU to run the full test suite:
 ```bash
 uv run pytest
 ```
-
-### On/Off Ramping Routines
-- For the first initialisation, all the GLOBAL env vars matter and will be used by the nodes to initialize.
-- When nodes join, only the `GLOBAL_ADDR` and `GLOBAL_PORT` matter. You still have to set `GLOBAL_RANK` and `GLOBAL_WORLD_SIZE` but they will be updated when the global pg initializes.
-- When a node wishes to offboard, it must call `edm._queue_leave()` and then `edm.maybe_reinit_global_pg()`. The mechanism is that it has to tell the master it is leaving and then join the the next `edm.maybe_reinit_global_pg()` in order to not deadlock the barrier for master's `_resolve_world()`. Jackmin is trying to change this behavior such that the leaving node can leave without having to `edm.maybe_reinit_global_pg()` or without `edm._queue_leave()` but they are required for now.
-
 ## Environment variables
 ### Global Store Initialization
 | Environment Variable | Description | Default Value |