
fix: remove pyright config excludes and fix type errors #35247

Open
sicnuyudidi wants to merge 4 commits into langgenius:main from sicnuyudidi:fix/pyright-ignore-26412

Conversation

@sicnuyudidi

Fixes #26412

What changed

  • Removed core/rag and providers/ from pyright exclude list
  • Fixed type errors in the newly included directories
  • Added more libraries to allowedUntypedLibraries for proper type checking

Files modified (11)

  • api/pyrightconfig.json - removed excludes, added allowed libraries
  • api/core/rag/datasource/** - fixed type annotations
  • api/core/rag/extractor/** - fixed type annotations
  • api/core/rag/retrieval/router/** - fixed type annotations
  • api/core/rag/splitter/** - fixed type annotations
  • api/core/rag/summary_index/** - fixed type annotations

Testing

Verified with pyright that no new errors were introduced.

@dosubot (bot) added the size:M label (this PR changes 30-99 lines, ignoring generated files) on Apr 15, 2026
@github-actions
Contributor

Pyrefly Diff

base → PR
--- /tmp/pyrefly_base.txt	2026-04-15 07:51:10.493939532 +0000
+++ /tmp/pyrefly_pr.txt	2026-04-15 07:51:00.147914963 +0000
@@ -74,14 +74,6 @@
    --> core/ops/mlflow_trace/mlflow_trace.py:415:24
 ERROR Class member `OpsTraceProviderConfigMap.__getitem__` overrides parent class `UserDict` in an inconsistent manner [bad-param-name-override]
    --> core/ops/ops_trace_manager.py:206:9
-ERROR Object of class `NoneType` has no attribute `data_source_type` [missing-attribute]
-   --> core/rag/datasource/keyword/jieba/jieba.py:142:36
-ERROR Object of class `NoneType` has no attribute `keyword_table` [missing-attribute]
-   --> core/rag/datasource/keyword/jieba/jieba.py:144:13
-ERROR Cannot index into `set[Any]` [bad-index]
-   --> core/rag/datasource/keyword/jieba/jieba.py:157:29
-ERROR Argument `object` is not assignable to parameter `iterable` with type `Iterable[@_]` in function `list.__init__` [bad-argument-type]
-  --> core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py:88:35
 ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.post` [bad-argument-type]
    --> core/rag/extractor/notion_extractor.py:106:25
 ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.request` [bad-argument-type]
@@ -92,8 +84,6 @@
    --> core/rag/extractor/notion_extractor.py:297:25
 ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.request` [bad-argument-type]
    --> core/rag/extractor/notion_extractor.py:371:21
-ERROR Argument `Unknown | None` is not assignable to parameter `result_object` with type `dict[str, Any]` in function `WaterCrawlProvider._structure_data` [bad-argument-type]
-   --> core/rag/extractor/watercrawl/provider.py:108:37
 ERROR Object of class `BaseOxmlElement` has no attribute `body` [missing-attribute]
    --> core/rag/extractor/word_extractor.py:426:24
 ERROR Object of class `Document` has no attribute `score` [missing-attribute]
@@ -108,8 +98,6 @@
    --> core/rag/index_processor/processor/qa_index_processor.py:208:33
 ERROR Object of class `Document` has no attribute `score` [missing-attribute]
    --> core/rag/index_processor/processor/qa_index_processor.py:209:16
-ERROR No matching overload found for function `core.model_manager.ModelInstance.invoke_llm` called with arguments: (prompt_messages=list[SystemPromptMessage | UserPromptMessage], tools=list[PromptMessageTool], stream=Literal[False], model_parameters=dict[str, float | int]) [no-matching-overload]
-  --> core/rag/retrieval/router/multi_dataset_function_call_router.py:32:58
 ERROR Class member `MCPToolProviderController.entity` overrides parent class `ToolProviderController` in an inconsistent manner [bad-override]
   --> core/tools/mcp_tool/provider.py:33:14
 ERROR Class member `PluginToolProviderController.entity` overrides parent class `BuiltinToolProviderController` in an inconsistent manner [bad-override]

@github-actions
Contributor

Pyrefly Diff

base → PR
--- /tmp/pyrefly_base.txt	2026-04-15 09:52:43.467112681 +0000
+++ /tmp/pyrefly_pr.txt	2026-04-15 09:52:33.064168737 +0000
@@ -74,14 +74,6 @@
    --> core/ops/mlflow_trace/mlflow_trace.py:415:24
 ERROR Class member `OpsTraceProviderConfigMap.__getitem__` overrides parent class `UserDict` in an inconsistent manner [bad-param-name-override]
    --> core/ops/ops_trace_manager.py:206:9
-ERROR Object of class `NoneType` has no attribute `data_source_type` [missing-attribute]
-   --> core/rag/datasource/keyword/jieba/jieba.py:142:36
-ERROR Object of class `NoneType` has no attribute `keyword_table` [missing-attribute]
-   --> core/rag/datasource/keyword/jieba/jieba.py:144:13
-ERROR Cannot index into `set[Any]` [bad-index]
-   --> core/rag/datasource/keyword/jieba/jieba.py:157:29
-ERROR Argument `object` is not assignable to parameter `iterable` with type `Iterable[@_]` in function `list.__init__` [bad-argument-type]
-  --> core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py:88:35
 ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.post` [bad-argument-type]
    --> core/rag/extractor/notion_extractor.py:106:25
 ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.request` [bad-argument-type]
@@ -92,8 +84,6 @@
    --> core/rag/extractor/notion_extractor.py:297:25
 ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.request` [bad-argument-type]
    --> core/rag/extractor/notion_extractor.py:371:21
-ERROR Argument `Unknown | None` is not assignable to parameter `result_object` with type `dict[str, Any]` in function `WaterCrawlProvider._structure_data` [bad-argument-type]
-   --> core/rag/extractor/watercrawl/provider.py:108:37
 ERROR Object of class `BaseOxmlElement` has no attribute `body` [missing-attribute]
    --> core/rag/extractor/word_extractor.py:426:24
 ERROR Object of class `Document` has no attribute `score` [missing-attribute]
@@ -109,7 +99,7 @@
 ERROR Object of class `Document` has no attribute `score` [missing-attribute]
    --> core/rag/index_processor/processor/qa_index_processor.py:209:16
 ERROR No matching overload found for function `core.model_manager.ModelInstance.invoke_llm` called with arguments: (prompt_messages=list[SystemPromptMessage | UserPromptMessage], tools=list[PromptMessageTool], stream=Literal[False], model_parameters=dict[str, float | int]) [no-matching-overload]
-  --> core/rag/retrieval/router/multi_dataset_function_call_router.py:32:58
+  --> core/rag/retrieval/router/multi_dataset_function_call_router.py:32:52
 ERROR Class member `MCPToolProviderController.entity` overrides parent class `ToolProviderController` in an inconsistent manner [bad-override]
   --> core/tools/mcp_tool/provider.py:33:14
 ERROR Class member `PluginToolProviderController.entity` overrides parent class `BuiltinToolProviderController` in an inconsistent manner [bad-override]
@@ -1398,10 +1388,6 @@
    --> tests/test_containers_integration_tests/services/test_webhook_service.py:496:79
 ERROR Argument `dict[str, dict[str, int | str]]` is not assignable to parameter `node_config` with type `NodeConfigDict` in function `services.trigger.webhook_service.WebhookService.generate_webhook_response` [bad-argument-type]
    --> tests/test_containers_integration_tests/services/test_webhook_service.py:505:79
-ERROR Argument `Literal['normal']` is not assignable to parameter `status` with type `SQLCoreOperations[TenantStatus] | TenantStatus` in function `models.account.Tenant.__init__` [bad-argument-type]
-  --> tests/test_containers_integration_tests/services/test_webhook_service_relationships.py:36:72
-ERROR Class `WebhookServiceRelationshipFactory` has no class attribute `_read_cache` [missing-attribute]
-   --> tests/test_containers_integration_tests/services/test_webhook_service_relationships.py:468:26
 ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
    --> tests/test_containers_integration_tests/services/test_workflow_app_service.py:101:38
 ERROR Argument `SimpleNamespace` is not assignable to parameter `log` with type `WorkflowAppLog` in function `services.workflow_app_service.LogView.__init__` [bad-argument-type]
@@ -6384,11 +6370,29 @@
 ERROR Argument `dict[str, dict[str, str]]` is not assignable to parameter `node_config` with type `NodeConfigDict` in function `services.trigger.webhook_service.WebhookService.extract_and_validate_webhook_data` [bad-argument-type]
    --> tests/unit_tests/services/test_webhook_service.py:535:83
 ERROR Argument `dict[str, dict[str, str] | str]` is not assignable to parameter `webhook_data` with type `RawWebhookDataDict` in function `services.trigger.webhook_service.WebhookService._validate_http_metadata` [bad-argument-type]
-   --> tests/unit_tests/services/test_webhook_service_additional.py:255:57
+   --> tests/unit_tests/services/test_webhook_service_additional.py:403:57
 ERROR Argument `dict[str, dict[str, int] | dict[str, str]]` is not assignable to parameter `webhook_data` with type `RawWebhookDataDict` in function `services.trigger.webhook_service.WebhookService.build_workflow_inputs` [bad-argument-type]
-   --> tests/unit_tests/services/test_webhook_service_additional.py:267:55
+   --> tests/unit_tests/services/test_webhook_service_additional.py:415:55
+ERROR Argument `dict[str, dict[str, int]]` is not assignable to parameter `webhook_data` with type `RawWebhookDataDict` in function `services.trigger.webhook_service.WebhookService.trigger_workflow_execution` [bad-argument-type]
+   --> tests/unit_tests/services/test_webhook_service_additional.py:451:68
+ERROR Missing required key `method` for TypedDict `RawWebhookDataDict` [bad-typed-dict-key]
+   --> tests/unit_tests/services/test_webhook_service_additional.py:487:72
+ERROR Missing required key `headers` for TypedDict `RawWebhookDataDict` [bad-typed-dict-key]
+   --> tests/unit_tests/services/test_webhook_service_additional.py:487:72
+ERROR Missing required key `query_params` for TypedDict `RawWebhookDataDict` [bad-typed-dict-key]
+   --> tests/unit_tests/services/test_webhook_service_additional.py:487:72
+ERROR Missing required key `files` for TypedDict `RawWebhookDataDict` [bad-typed-dict-key]
+   --> tests/unit_tests/services/test_webhook_service_additional.py:487:72
+ERROR Missing required key `method` for TypedDict `RawWebhookDataDict` [bad-typed-dict-key]
+   --> tests/unit_tests/services/test_webhook_service_additional.py:515:72
+ERROR Missing required key `headers` for TypedDict `RawWebhookDataDict` [bad-typed-dict-key]
+   --> tests/unit_tests/services/test_webhook_service_additional.py:515:72
+ERROR Missing required key `query_params` for TypedDict `RawWebhookDataDict` [bad-typed-dict-key]
+   --> tests/unit_tests/services/test_webhook_service_additional.py:515:72
+ERROR Missing required key `files` for TypedDict `RawWebhookDataDict` [bad-typed-dict-key]
+   --> tests/unit_tests/services/test_webhook_service_additional.py:515:72
 ERROR Argument `dict[str, dict[str, int | str]]` is not assignable to parameter `node_config` with type `NodeConfigDict` in function `services.trigger.webhook_service.WebhookService.generate_webhook_response` [bad-argument-type]
-   --> tests/unit_tests/services/test_webhook_service_additional.py:279:65
+   --> tests/unit_tests/services/test_webhook_service_additional.py:658:65
 ERROR Argument `dict[str, str]` is not assignable to parameter `args` with type `WorkflowRunListArgs` in function `services.workflow_run_service.WorkflowRunService.get_paginate_workflow_runs` [bad-argument-type]
    --> tests/unit_tests/services/test_workflow_run_service.py:102:18
 ERROR `Literal['2']` is not assignable to TypedDict key `limit` with type `int` [bad-typed-dict-key]
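
The new `bad-typed-dict-key` entries above report test payloads that omit required keys (`method`, `headers`, `query_params`, `files`). One way such tests are commonly brought in line is a small factory that always returns a complete payload. The following is a minimal sketch only: `RawWebhookData` and `make_webhook_data` are hypothetical names, and the field types (including the `body` field) are assumptions, since the diff does not show the real definition of `RawWebhookDataDict`.

```python
from __future__ import annotations

from typing import Any, TypedDict


class RawWebhookData(TypedDict):
    """Hypothetical stand-in for RawWebhookDataDict; real field types may differ."""

    method: str
    headers: dict[str, str]
    query_params: dict[str, str]
    body: dict[str, Any]  # assumed field, not confirmed by the diff
    files: dict[str, Any]


def make_webhook_data(body: dict[str, Any] | None = None) -> RawWebhookData:
    """Build a complete payload so a type checker sees every required key."""
    return RawWebhookData(
        method="POST",
        headers={},
        query_params={},
        body=body or {},
        files={},
    )
```

A test would then pass `make_webhook_data({"count": 1})` instead of a partial dict literal, so only the field under test varies.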

Contributor

Copilot AI left a comment

Pull request overview

Expands Pyright strict type-checking coverage by removing excluded directories and addressing newly surfaced typing issues across RAG components and extractor integrations.

Changes:

  • Removed core/rag and providers/ from Pyright excludes to increase coverage.
  • Added many entries to allowedUntypedLibraries and adjusted Pyright reporting settings.
  • Applied targeted typing fixes/guards across splitters, routers, Watercrawl extractor client/provider, and Jieba keyword datasource.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

Summary per file

  • api/pyrightconfig.json - Removes excludes and expands untyped-library allowlist; adjusts reporting settings.
  • api/core/rag/summary_index/summary_index.py - Adds a Pyright ignore on Celery task invocation.
  • api/core/rag/splitter/text_splitter.py - Adds a cast for tiktoken.encode(..., allowed_special=...).
  • api/core/rag/splitter/fixed_text_splitter.py - Removes GPT2Tokenizer fallback logic and changes from_encoder length function behavior.
  • api/core/rag/retrieval/router/multi_dataset_function_call_router.py - Refines typing of LLM invocation results and normalizes to LLMResult.
  • api/core/rag/extractor/watercrawl/provider.py - Adds a runtime guard when scrape returns no result.
  • api/core/rag/extractor/watercrawl/client.py - Adds return types and validates response shapes from API calls.
  • api/core/rag/extractor/unstructured/unstructured_doc_extractor.py - Adjusts import pattern to reduce private import typing issues.
  • api/core/rag/datasource/retrieval_service.py - Avoids unused loop variable in as_completed iteration.
  • api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py - Tweaks fallback tokenizer handling and topK forwarding for keyword extraction.
  • api/core/rag/datasource/keyword/jieba/jieba.py - Adds a guard for missing dataset keyword table and refines types for table extraction.

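
Several of the fixes summarized above are guards against `None` (for example the missing keyword table in the Jieba datasource and the empty scrape result in the Watercrawl provider). As a rough illustration of the pattern, not the PR's actual code, an early return narrows an optional value so that later attribute access type-checks. `KeywordTableRecord` and `load_keyword_table` are hypothetical names, and storing the table as a JSON string is an assumption made for this sketch.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class KeywordTableRecord:
    """Hypothetical row holding a JSON-encoded keyword table."""

    keyword_table: str


def load_keyword_table(record: KeywordTableRecord | None) -> dict[str, set[str]]:
    """Return the keyword -> ids mapping, or an empty table if no row exists."""
    if record is None:
        # Guard: after this early return the checker treats `record` as non-None.
        return {}
    raw = json.loads(record.keyword_table)
    # Normalize each value to a set so callers get one consistent type.
    return {keyword: set(ids) for keyword, ids in raw.items()}
```

Without the early return, a checker would report the same kind of `NoneType has no attribute ...` error that the Pyrefly diff shows as removed.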
Comments suppressed due to low confidence (1)

api/core/rag/splitter/fixed_text_splitter.py:33

  • from_encoder no longer uses embedding_model_instance to count tokens and the _token_encoder closure was removed, but callers pass chunk_size as a token limit (see usages in indexing code) and existing unit tests rely on the token-counting path. This change will cause inconsistent chunk sizing and likely breaks api/tests/unit_tests/core/rag/splitter/test_text_splitter.py::test_from_encoder_internal_token_encoder_paths. Restore token-based counting when an embedding model is provided (and keep a safe fallback when it isn’t), or update the public contract + callers/tests accordingly.
    @classmethod
    def from_encoder[T: EnhanceRecursiveCharacterTextSplitter](
        cls: type[T],
        embedding_model_instance: ModelInstance | None,
        allowed_special: Literal["all"] | set[str] = set(),
        disallowed_special: Literal["all"] | Collection[str] = "all",
        **kwargs: Any,
    ) -> T:
        def _character_encoder(texts: list[str]) -> list[int]:
            if not texts:
                return []

            return [len(text) for text in texts]

        return cls(length_function=_character_encoder, **kwargs)
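
The restoration suggested in this comment (token-based counting when a model is provided, with a safe fallback otherwise) could look roughly like the sketch below. It is illustrative only: `TokenCounter`, `count_tokens`, and `make_length_function` are hypothetical names, and the real embedding model interface in the repository may expose a different method.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import Protocol


class TokenCounter(Protocol):
    """Hypothetical interface standing in for the embedding model instance."""

    def count_tokens(self, texts: list[str]) -> list[int]: ...


def make_length_function(
    model: TokenCounter | None,
) -> Callable[[list[str]], list[int]]:
    """Count tokens when a model is available, else fall back to characters."""

    def _length(texts: list[str]) -> list[int]:
        if not texts:
            return []
        if model is not None:
            return model.count_tokens(texts)
        return [len(text) for text in texts]

    return _length


class WhitespaceCounter:
    """Toy counter for demonstration: one token per whitespace-separated word."""

    def count_tokens(self, texts: list[str]) -> list[int]:
        return [len(text.split()) for text in texts]
```

With this shape, `from_encoder` would pass `make_length_function(embedding_model_instance)` as `length_function`, so `chunk_size` keeps meaning a token limit whenever a model is supplied.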

Comment thread api/core/rag/summary_index/summary_index.py
Comment thread api/pyrightconfig.json
Comment thread api/core/rag/splitter/text_splitter.py
@github-actions
Contributor

Pyrefly Diff

base → PR
--- /tmp/pyrefly_base.txt	2026-04-17 03:49:46.032241870 +0000
+++ /tmp/pyrefly_pr.txt	2026-04-17 03:49:36.229074878 +0000
@@ -78,14 +78,6 @@
    --> core/ops/mlflow_trace/mlflow_trace.py:415:24
 ERROR Class member `OpsTraceProviderConfigMap.__getitem__` overrides parent class `UserDict` in an inconsistent manner [bad-param-name-override]
    --> core/ops/ops_trace_manager.py:206:9
-ERROR Object of class `NoneType` has no attribute `data_source_type` [missing-attribute]
-   --> core/rag/datasource/keyword/jieba/jieba.py:142:36
-ERROR Object of class `NoneType` has no attribute `keyword_table` [missing-attribute]
-   --> core/rag/datasource/keyword/jieba/jieba.py:144:13
-ERROR Cannot index into `set[Any]` [bad-index]
-   --> core/rag/datasource/keyword/jieba/jieba.py:157:29
-ERROR Argument `object` is not assignable to parameter `iterable` with type `Iterable[@_]` in function `list.__init__` [bad-argument-type]
-  --> core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py:88:35
 ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.post` [bad-argument-type]
    --> core/rag/extractor/notion_extractor.py:106:25
 ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.request` [bad-argument-type]
@@ -96,8 +88,6 @@
    --> core/rag/extractor/notion_extractor.py:297:25
 ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.request` [bad-argument-type]
    --> core/rag/extractor/notion_extractor.py:371:21
-ERROR Argument `Unknown | None` is not assignable to parameter `result_object` with type `dict[str, Any]` in function `WaterCrawlProvider._structure_data` [bad-argument-type]
-   --> core/rag/extractor/watercrawl/provider.py:108:37
 ERROR Object of class `BaseOxmlElement` has no attribute `body` [missing-attribute]
    --> core/rag/extractor/word_extractor.py:426:24
 ERROR Object of class `Document` has no attribute `score` [missing-attribute]
@@ -113,7 +103,7 @@
 ERROR Object of class `Document` has no attribute `score` [missing-attribute]
    --> core/rag/index_processor/processor/qa_index_processor.py:209:16
 ERROR No matching overload found for function `core.model_manager.ModelInstance.invoke_llm` called with arguments: (prompt_messages=list[SystemPromptMessage | UserPromptMessage], tools=list[PromptMessageTool], stream=Literal[False], model_parameters=dict[str, float | int]) [no-matching-overload]
-  --> core/rag/retrieval/router/multi_dataset_function_call_router.py:31:58
+  --> core/rag/retrieval/router/multi_dataset_function_call_router.py:31:52
 ERROR Class member `MCPToolProviderController.entity` overrides parent class `ToolProviderController` in an inconsistent manner [bad-override]
   --> core/tools/mcp_tool/provider.py:33:14
 ERROR Class member `PluginToolProviderController.entity` overrides parent class `BuiltinToolProviderController` in an inconsistent manner [bad-override]

Development

Successfully merging this pull request may close these issues.

[Chore/Refactor] remove ignore in pyright config

2 participants