Cron job creation should reconstruct DM origin from HERMES_SESSION_KEY when session vars are missing #10604

@KeaneYan

Bug description

When a cron job is created from a Weixin DM (and likely other DM platforms), the cron tool can persist:

{
  "deliver": "origin",
  "origin": null
}

This happens even when the gateway session still has a valid HERMES_SESSION_KEY such as:

agent:main:weixin:dm:<chat_id>

This makes deliver=origin fragile: if the explicit HERMES_SESSION_PLATFORM / HERMES_SESSION_CHAT_ID env vars are missing in the worker thread, the cron tool drops the origin metadata instead of reconstructing it from the session key.

Why this is a separate bug from the existing issues

There are already open issues/PRs around cron delivery:

  • #8848 / #9193: deliver=origin fallback missing weixin/feishu/wecom
  • #9354 / #10227: gateway worker threads lose session contextvars, so cron creation may not see HERMES_SESSION_PLATFORM / HERMES_SESSION_CHAT_ID

Those are real, but the cron tool still has a robustness gap of its own:

  • tools/cronjob_tools.py::_origin_from_env() only trusts explicit HERMES_SESSION_PLATFORM + HERMES_SESSION_CHAT_ID
  • if those are unavailable, it returns None
  • it does not attempt to reconstruct DM origin from HERMES_SESSION_KEY, even though the DM session key format is deterministic and reversible

So even with the broader gateway/contextvars fixes pending, the cron tool is currently less defensive than it could be.

Reproduction

  1. Start from a Weixin DM session.
  2. Ensure HERMES_SESSION_KEY=agent:main:weixin:dm:<chat_id> is present.
  3. Create a cron job through the cron tool.
  4. In the execution path where HERMES_SESSION_PLATFORM / HERMES_SESSION_CHAT_ID are missing, inspect ~/.hermes/cron/jobs.json.
  5. The created job is stored with:
    • deliver: "origin"
    • origin: null
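
To make step 4 easier to check, here is a small sketch that scans a parsed jobs file for entries stored with deliver "origin" and a null origin. The overall layout of jobs.json is not described in this issue, so the helper walks any nested JSON structure instead of assuming one; the names `iter_orphaned_jobs` and `load_orphaned_jobs` are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Iterator, List


def iter_orphaned_jobs(node: Any) -> Iterator[dict]:
    """Yield every dict with deliver == "origin" and a missing or null origin."""
    if isinstance(node, dict):
        if node.get("deliver") == "origin" and node.get("origin") is None:
            yield node
        # Keep descending: the file layout is an unknown, so look everywhere.
        for value in node.values():
            yield from iter_orphaned_jobs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_orphaned_jobs(item)


def load_orphaned_jobs(path: Path) -> List[dict]:
    """Return the affected entries in `path` (empty list if the file is absent)."""
    if not path.exists():
        return []
    data = json.loads(path.read_text(encoding="utf-8"))
    return list(iter_orphaned_jobs(data))
```

Calling `load_orphaned_jobs(Path.home() / ".hermes" / "cron" / "jobs.json")` after step 3 should return a non-empty list when the bug reproduces.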

Expected behavior

If explicit session vars are unavailable, _origin_from_env() should fall back to parsing DM HERMES_SESSION_KEY values of the form:

agent:main:<platform>:dm:<chat_id>[:<thread_id>]

and reconstruct:

{
  "platform": "weixin",
  "chat_id": "...",
  "thread_id": null
}

Suggested fix

In tools/cronjob_tools.py, make _origin_from_env():

  1. first read HERMES_SESSION_PLATFORM / HERMES_SESSION_CHAT_ID as it does today
  2. if missing, read HERMES_SESSION_KEY
  3. if the key matches DM format, reconstruct origin from it

Pseudo-shape:

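# Fallback: only reached when the explicit platform/chat vars are missing.
session_key = get_session_env("HERMES_SESSION_KEY")
parts = session_key.split(":") if session_key else []
# Expected shape: agent:main:<platform>:dm:<chat_id>[:<thread_id>]
is_dm_key = len(parts) >= 5 and parts[:2] == ["agent", "main"] and parts[3] == "dm"
if is_dm_key and parts[2] and parts[4]:  # skip empty platform or chat id
    return {
        "platform": parts[2],
        "chat_id": parts[4],
        "thread_id": parts[5] if len(parts) >= 6 and parts[5] else None,
    }
return None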
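
For reference, the three steps could fit together roughly as follows. This is a self-contained sketch, not the actual code in tools/cronjob_tools.py: the function name `origin_from_session_env` and the `env` mapping parameter are hypothetical stand-ins for however the real tool reads session values, and the explicit path sets thread_id to None only because this issue does not describe how the thread id is supplied there.

```python
from typing import Mapping, Optional


def origin_from_session_env(env: Mapping[str, str]) -> Optional[dict]:
    """Resolve a cron job origin from session values (illustrative only)."""
    # Step 1: explicit variables keep priority, as they do today.
    platform = env.get("HERMES_SESSION_PLATFORM")
    chat_id = env.get("HERMES_SESSION_CHAT_ID")
    if platform and chat_id:
        # Assumption: thread handling on this path is out of scope here.
        return {"platform": platform, "chat_id": chat_id, "thread_id": None}

    # Step 2: fall back to the session key.
    parts = (env.get("HERMES_SESSION_KEY") or "").split(":")

    # Step 3: accept only agent:main:<platform>:dm:<chat_id>[:<thread_id>].
    is_dm_key = (
        len(parts) >= 5
        and parts[:2] == ["agent", "main"]
        and parts[3] == "dm"
    )
    if is_dm_key and parts[2] and parts[4]:
        thread_id = parts[5] if len(parts) >= 6 and parts[5] else None
        return {"platform": parts[2], "chat_id": parts[4], "thread_id": thread_id}

    # Non-DM or malformed keys: keep the current behaviour.
    return None
```

One caveat with splitting on ":" is that a chat id which itself contains a colon would be cut at the first one, with the remainder read as a thread id. If any platform uses such ids, the fallback would need a stricter rule for that platform.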

Why this helps

This does not replace the gateway-side context propagation fix, but it makes cron creation much more resilient:

  • DM-origin cron jobs stop depending on a single env propagation path
  • origin metadata is still preserved when the session key is available
  • deliver=origin becomes safer for Weixin/Telegram/other DM platforms

Regression test ideas

Add tests for:

  1. explicit platform/chat vars still win when present
  2. Weixin DM session key reconstructs platform=weixin, chat_id=<id>, thread_id=None
  3. DM session key with thread id reconstructs the thread id correctly
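
The three cases above can be written down as a small table plus a checker that accepts any resolver. This is only a sketch of the test shape: `CASES`, `Resolver`, and `failures` are hypothetical names, the ids are made-up sample values, and the real tests would need a thin adapter that feeds these values to `_origin_from_env()` through whatever mechanism it actually reads them from.

```python
from typing import Callable, List, Mapping, Optional

Resolver = Callable[[Mapping[str, str]], Optional[dict]]

# (description, session values, fields the resolved origin must contain)
CASES = [
    (
        "explicit vars win over the session key",
        {
            "HERMES_SESSION_PLATFORM": "telegram",
            "HERMES_SESSION_CHAT_ID": "explicit-chat",
            "HERMES_SESSION_KEY": "agent:main:weixin:dm:key-chat",
        },
        {"platform": "telegram", "chat_id": "explicit-chat"},
    ),
    (
        "Weixin DM key without thread id",
        {"HERMES_SESSION_KEY": "agent:main:weixin:dm:key-chat"},
        {"platform": "weixin", "chat_id": "key-chat", "thread_id": None},
    ),
    (
        "DM key with thread id",
        {"HERMES_SESSION_KEY": "agent:main:weixin:dm:key-chat:thread-9"},
        {"platform": "weixin", "chat_id": "key-chat", "thread_id": "thread-9"},
    ),
]


def failures(resolve: Resolver) -> List[str]:
    """Return the description of every case that `resolve` gets wrong."""
    failed = []
    for description, env, expected in CASES:
        origin = resolve(env) or {}  # a None result fails every field check
        if any(origin.get(field) != value for field, value in expected.items()):
            failed.append(description)
    return failed
```

A test then reduces to asserting that `failures(adapter)` is an empty list. Comparing only the listed fields keeps the first case independent of how the explicit path fills thread_id.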
