
Commit 81d4e0c

docs: add local-model providers section to README
Promotes the existing `### LLM endpoint compatibility` subsection from inside `## Configuration` to a top-level `## Using local LLM providers` section, expanded with per-provider paragraphs covering Ollama, llama.cpp, and vLLM/DeepSeek per CONV7_handoff.md sec 5.4. This is the documentation half of Reviewer 2's optional recommendation O7 ("API Flexibility: Support for local models... will further improve metaScreener"). The technical capability has existed since the project's first release - any OpenAI-compatible endpoint just works once OPENAI_BASE_URL is set - but it was previously buried in a single-paragraph subsection. The new structure gives the topic visibility appropriate to its scope and provides concrete, copy-paste-ready environment variable settings for each common provider.

WHAT THE NEW SECTION COVERS:

* Opening paragraph stating that metaScreener targets any OpenAI-compatible endpoint, with a bulleted summary distinguishing hosted commercial APIs (Azure OpenAI, DeepSeek) from locally hosted models (Ollama, llama.cpp, vLLM).
* The OPENAI_BASE_URL / OPENAI_API_KEY / Model field contract, explained once at section level so per-provider paragraphs don't need to repeat it.
* ### Ollama subsection: endpoint URL, install/pull workflow, Model field guidance.
* ### llama.cpp subsection: llama-server invocation, endpoint URL, note that the Model field is informational when running llama.cpp directly (the server uses whichever model is currently loaded).
* ### vLLM and DeepSeek subsection: vLLM as a high-throughput self-hosted alternative; DeepSeek as a hosted alternative with larger context windows than GPT-4o-mini.
* Closing evidence-gating caveat (preserved VERBATIM from the previous subsection per sec 5.4: open-weight model compatibility with the evidence-gating protocol has not been formally tested; users testing local models are invited to file feedback).

WHAT WAS REMOVED:

* The previous `### LLM endpoint compatibility` subsection inside `## Configuration` (8 lines). Its content is fully absorbed into the new top-level section, with the bullet list of compatible backends restructured and expanded. The verbatim caveat is preserved word-for-word as the closing note. `## Configuration` retains its `### Environment variables` subsection unchanged; only the LLM-endpoint subsection is moved out.

INVARIANTS PRESERVED:

* Test count unchanged (no test changes): 103 passed, 1 xfailed.
* The README badge regression test added in C0 (test_readme_tested_on_badge_lists_actual_ci_platforms) still passes via the GitHub Actions CI badge present from C1.
* No code changes; no plugin changes; no test changes.

Spec: see CONV7_handoff.md sec 4 ("Add README section on local-model providers") and sec 5.4 (where it goes; verbatim caveat directive).
1 parent dce0352 commit 81d4e0c

1 file changed

Lines changed: 19 additions & 5 deletions


README.md

@@ -290,12 +290,26 @@ Tested on Windows 10 and Ubuntu 24.04 (headless, via WSL/Docker).
 
 Copy `.env.example` to `.env` and set your API key. The application will prompt for confirmation on each launch.
 
-### LLM endpoint compatibility
+## Using local LLM providers
 
-metaScreener targets any **OpenAI-compatible API endpoint**. This includes:
-- OpenAI (GPT-4o, GPT-4o-mini, etc.)
-- Azure OpenAI
-- Locally hosted models via compatible inference frameworks (e.g., Ollama, LM Studio, vLLM)
+metaScreener targets any **OpenAI-compatible API endpoint**. The default backend is OpenAI's hosted API, but the same Python client transparently supports:
+
+- **Hosted commercial APIs** — Azure OpenAI, DeepSeek, and others that mirror OpenAI's chat completions schema.
+- **Locally hosted models** — open-weight models served via compatible inference frameworks such as Ollama, llama.cpp, and vLLM.
+
+Switching providers requires no code change: set the `OPENAI_BASE_URL` environment variable to the target endpoint and ensure `OPENAI_API_KEY` is non-empty (most local servers ignore the key value but require it to be set). The **Model** field in metaScreener's EL/IL Settings panels then selects which backend model to use. Three commonly used local-model paths are described below.
+
+### Ollama
+
+[Ollama](https://ollama.com/) exposes an OpenAI-compatible chat completions endpoint at `http://localhost:11434/v1`. After installing Ollama and pulling a model (e.g., `ollama pull llama3.1`), set `OPENAI_BASE_URL=http://localhost:11434/v1` and `OPENAI_API_KEY=ollama` (or any non-empty placeholder). In the EL/IL Settings panels, set **Model** to the local model name (e.g., `llama3.1`).
+
+### llama.cpp
+
+[llama.cpp](https://github.com/ggerganov/llama.cpp)'s `llama-server` binary exposes an OpenAI-compatible endpoint at `http://localhost:8080/v1` by default. Start the server with `./llama-server --model your-model.gguf` and set `OPENAI_BASE_URL=http://localhost:8080/v1` with `OPENAI_API_KEY=llama-cpp` (or any non-empty placeholder). The **Model** field can be set to any value when running llama.cpp directly, since the server uses whichever model is currently loaded.
+
+### vLLM and DeepSeek
+
+For higher-throughput self-hosted inference, [vLLM](https://github.com/vllm-project/vllm) exposes an OpenAI-compatible API tuned for batched GPU workloads; consult the vLLM documentation for the deployment-specific `OPENAI_BASE_URL`. As a hosted alternative, [DeepSeek](https://platform.deepseek.com/) provides an OpenAI-compatible endpoint at `https://api.deepseek.com/v1` with substantially larger context windows than GPT-4o-mini, useful when working with very long records. Use your DeepSeek API key as `OPENAI_API_KEY` for the hosted route.
 
 > **Note**: open-weight model compatibility with the evidence gating protocol (which requires models to produce verbatim substring quotations) has not been formally tested. If you test with a local model, we welcome your feedback via the issue tracker.
 
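As a companion to the Ollama paragraph in the committed section, the following is a minimal sketch of the environment-variable contract expressed with the standard `openai` Python package, assuming a local Ollama server with `llama3.1` already pulled; the prompt and model name are illustrative, and the snippet is not part of the committed README text.

```python
# Sketch: point an OpenAI-compatible client at a local Ollama server.
# Assumes `pip install openai` and `ollama pull llama3.1` have been run.
import os
from openai import OpenAI

# The same two variables the README section describes; local servers
# typically ignore the key's value but still require it to be non-empty.
os.environ.setdefault("OPENAI_BASE_URL", "http://localhost:11434/v1")
os.environ.setdefault("OPENAI_API_KEY", "ollama")

client = OpenAI(
    base_url=os.environ["OPENAI_BASE_URL"],
    api_key=os.environ["OPENAI_API_KEY"],
)

# "model" plays the role of the Model field in the EL/IL Settings panels.
reply = client.chat.completions.create(
    model="llama3.1",
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
)
print(reply.choices[0].message.content)
```

If the script prints a reply, the endpoint is reachable and the Model value resolves to a model the server actually serves.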
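For the llama.cpp route, where the Model field is informational, one way to see what the running server reports is to query the model listing route. The sketch below assumes `llama-server` is already running on its default port and exposes the OpenAI-compatible `/v1/models` listing (vLLM deployments can be checked the same way by swapping in the deployment's base URL); it is an illustration, not metaScreener code.

```python
# Sketch: ask a local llama.cpp (or vLLM) server which model(s) it is serving.
# Assumes the server was started separately, e.g.:
#   ./llama-server --model your-model.gguf   # default: http://localhost:8080
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # swap for your vLLM deployment URL if applicable
    api_key="llama-cpp",                  # any non-empty placeholder
)

# Whatever IDs the server reports here are what a Model value is matched against;
# for llama.cpp this reflects the model file currently loaded.
for model in client.models.list():
    print(model.id)
```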
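For the hosted DeepSeek route, only the values change. The sketch below assumes a DeepSeek API key is available in a `DEEPSEEK_API_KEY` shell variable (a hypothetical name used here for illustration), and `deepseek-chat` is an example model name to be checked against DeepSeek's current documentation; in metaScreener itself these values would go into `OPENAI_BASE_URL` and `OPENAI_API_KEY` in `.env` rather than being hard-coded.

```python
# Sketch: the same client pointed at DeepSeek's hosted OpenAI-compatible API.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com/v1",
    api_key=os.environ["DEEPSEEK_API_KEY"],  # a real DeepSeek key, unlike the local placeholders
)

reply = client.chat.completions.create(
    model="deepseek-chat",  # illustrative; consult DeepSeek's docs for available models
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
)
print(reply.choices[0].message.content)
```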