fix: remove pyright config excludes and fix type errors#35247
Open
sicnuyudidi wants to merge 4 commits intolanggenius:mainfrom
Open
fix: remove pyright config excludes and fix type errors#35247sicnuyudidi wants to merge 4 commits intolanggenius:mainfrom
sicnuyudidi wants to merge 4 commits intolanggenius:mainfrom
Conversation
Contributor
Pyrefly Diffbase → PR--- /tmp/pyrefly_base.txt 2026-04-15 07:51:10.493939532 +0000
+++ /tmp/pyrefly_pr.txt 2026-04-15 07:51:00.147914963 +0000
@@ -74,14 +74,6 @@
--> core/ops/mlflow_trace/mlflow_trace.py:415:24
ERROR Class member `OpsTraceProviderConfigMap.__getitem__` overrides parent class `UserDict` in an inconsistent manner [bad-param-name-override]
--> core/ops/ops_trace_manager.py:206:9
-ERROR Object of class `NoneType` has no attribute `data_source_type` [missing-attribute]
- --> core/rag/datasource/keyword/jieba/jieba.py:142:36
-ERROR Object of class `NoneType` has no attribute `keyword_table` [missing-attribute]
- --> core/rag/datasource/keyword/jieba/jieba.py:144:13
-ERROR Cannot index into `set[Any]` [bad-index]
- --> core/rag/datasource/keyword/jieba/jieba.py:157:29
-ERROR Argument `object` is not assignable to parameter `iterable` with type `Iterable[@_]` in function `list.__init__` [bad-argument-type]
- --> core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py:88:35
ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.post` [bad-argument-type]
--> core/rag/extractor/notion_extractor.py:106:25
ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.request` [bad-argument-type]
@@ -92,8 +84,6 @@
--> core/rag/extractor/notion_extractor.py:297:25
ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.request` [bad-argument-type]
--> core/rag/extractor/notion_extractor.py:371:21
-ERROR Argument `Unknown | None` is not assignable to parameter `result_object` with type `dict[str, Any]` in function `WaterCrawlProvider._structure_data` [bad-argument-type]
- --> core/rag/extractor/watercrawl/provider.py:108:37
ERROR Object of class `BaseOxmlElement` has no attribute `body` [missing-attribute]
--> core/rag/extractor/word_extractor.py:426:24
ERROR Object of class `Document` has no attribute `score` [missing-attribute]
@@ -108,8 +98,6 @@
--> core/rag/index_processor/processor/qa_index_processor.py:208:33
ERROR Object of class `Document` has no attribute `score` [missing-attribute]
--> core/rag/index_processor/processor/qa_index_processor.py:209:16
-ERROR No matching overload found for function `core.model_manager.ModelInstance.invoke_llm` called with arguments: (prompt_messages=list[SystemPromptMessage | UserPromptMessage], tools=list[PromptMessageTool], stream=Literal[False], model_parameters=dict[str, float | int]) [no-matching-overload]
- --> core/rag/retrieval/router/multi_dataset_function_call_router.py:32:58
ERROR Class member `MCPToolProviderController.entity` overrides parent class `ToolProviderController` in an inconsistent manner [bad-override]
--> core/tools/mcp_tool/provider.py:33:14
ERROR Class member `PluginToolProviderController.entity` overrides parent class `BuiltinToolProviderController` in an inconsistent manner [bad-override]
|
Contributor
Pyrefly Diffbase → PR--- /tmp/pyrefly_base.txt 2026-04-15 09:52:43.467112681 +0000
+++ /tmp/pyrefly_pr.txt 2026-04-15 09:52:33.064168737 +0000
@@ -74,14 +74,6 @@
--> core/ops/mlflow_trace/mlflow_trace.py:415:24
ERROR Class member `OpsTraceProviderConfigMap.__getitem__` overrides parent class `UserDict` in an inconsistent manner [bad-param-name-override]
--> core/ops/ops_trace_manager.py:206:9
-ERROR Object of class `NoneType` has no attribute `data_source_type` [missing-attribute]
- --> core/rag/datasource/keyword/jieba/jieba.py:142:36
-ERROR Object of class `NoneType` has no attribute `keyword_table` [missing-attribute]
- --> core/rag/datasource/keyword/jieba/jieba.py:144:13
-ERROR Cannot index into `set[Any]` [bad-index]
- --> core/rag/datasource/keyword/jieba/jieba.py:157:29
-ERROR Argument `object` is not assignable to parameter `iterable` with type `Iterable[@_]` in function `list.__init__` [bad-argument-type]
- --> core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py:88:35
ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.post` [bad-argument-type]
--> core/rag/extractor/notion_extractor.py:106:25
ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.request` [bad-argument-type]
@@ -92,8 +84,6 @@
--> core/rag/extractor/notion_extractor.py:297:25
ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.request` [bad-argument-type]
--> core/rag/extractor/notion_extractor.py:371:21
-ERROR Argument `Unknown | None` is not assignable to parameter `result_object` with type `dict[str, Any]` in function `WaterCrawlProvider._structure_data` [bad-argument-type]
- --> core/rag/extractor/watercrawl/provider.py:108:37
ERROR Object of class `BaseOxmlElement` has no attribute `body` [missing-attribute]
--> core/rag/extractor/word_extractor.py:426:24
ERROR Object of class `Document` has no attribute `score` [missing-attribute]
@@ -109,7 +99,7 @@
ERROR Object of class `Document` has no attribute `score` [missing-attribute]
--> core/rag/index_processor/processor/qa_index_processor.py:209:16
ERROR No matching overload found for function `core.model_manager.ModelInstance.invoke_llm` called with arguments: (prompt_messages=list[SystemPromptMessage | UserPromptMessage], tools=list[PromptMessageTool], stream=Literal[False], model_parameters=dict[str, float | int]) [no-matching-overload]
- --> core/rag/retrieval/router/multi_dataset_function_call_router.py:32:58
+ --> core/rag/retrieval/router/multi_dataset_function_call_router.py:32:52
ERROR Class member `MCPToolProviderController.entity` overrides parent class `ToolProviderController` in an inconsistent manner [bad-override]
--> core/tools/mcp_tool/provider.py:33:14
ERROR Class member `PluginToolProviderController.entity` overrides parent class `BuiltinToolProviderController` in an inconsistent manner [bad-override]
@@ -1398,10 +1388,6 @@
--> tests/test_containers_integration_tests/services/test_webhook_service.py:496:79
ERROR Argument `dict[str, dict[str, int | str]]` is not assignable to parameter `node_config` with type `NodeConfigDict` in function `services.trigger.webhook_service.WebhookService.generate_webhook_response` [bad-argument-type]
--> tests/test_containers_integration_tests/services/test_webhook_service.py:505:79
-ERROR Argument `Literal['normal']` is not assignable to parameter `status` with type `SQLCoreOperations[TenantStatus] | TenantStatus` in function `models.account.Tenant.__init__` [bad-argument-type]
- --> tests/test_containers_integration_tests/services/test_webhook_service_relationships.py:36:72
-ERROR Class `WebhookServiceRelationshipFactory` has no class attribute `_read_cache` [missing-attribute]
- --> tests/test_containers_integration_tests/services/test_webhook_service_relationships.py:468:26
ERROR Object of class `NoneType` has no attribute `id` [missing-attribute]
--> tests/test_containers_integration_tests/services/test_workflow_app_service.py:101:38
ERROR Argument `SimpleNamespace` is not assignable to parameter `log` with type `WorkflowAppLog` in function `services.workflow_app_service.LogView.__init__` [bad-argument-type]
@@ -6384,11 +6370,29 @@
ERROR Argument `dict[str, dict[str, str]]` is not assignable to parameter `node_config` with type `NodeConfigDict` in function `services.trigger.webhook_service.WebhookService.extract_and_validate_webhook_data` [bad-argument-type]
--> tests/unit_tests/services/test_webhook_service.py:535:83
ERROR Argument `dict[str, dict[str, str] | str]` is not assignable to parameter `webhook_data` with type `RawWebhookDataDict` in function `services.trigger.webhook_service.WebhookService._validate_http_metadata` [bad-argument-type]
- --> tests/unit_tests/services/test_webhook_service_additional.py:255:57
+ --> tests/unit_tests/services/test_webhook_service_additional.py:403:57
ERROR Argument `dict[str, dict[str, int] | dict[str, str]]` is not assignable to parameter `webhook_data` with type `RawWebhookDataDict` in function `services.trigger.webhook_service.WebhookService.build_workflow_inputs` [bad-argument-type]
- --> tests/unit_tests/services/test_webhook_service_additional.py:267:55
+ --> tests/unit_tests/services/test_webhook_service_additional.py:415:55
+ERROR Argument `dict[str, dict[str, int]]` is not assignable to parameter `webhook_data` with type `RawWebhookDataDict` in function `services.trigger.webhook_service.WebhookService.trigger_workflow_execution` [bad-argument-type]
+ --> tests/unit_tests/services/test_webhook_service_additional.py:451:68
+ERROR Missing required key `method` for TypedDict `RawWebhookDataDict` [bad-typed-dict-key]
+ --> tests/unit_tests/services/test_webhook_service_additional.py:487:72
+ERROR Missing required key `headers` for TypedDict `RawWebhookDataDict` [bad-typed-dict-key]
+ --> tests/unit_tests/services/test_webhook_service_additional.py:487:72
+ERROR Missing required key `query_params` for TypedDict `RawWebhookDataDict` [bad-typed-dict-key]
+ --> tests/unit_tests/services/test_webhook_service_additional.py:487:72
+ERROR Missing required key `files` for TypedDict `RawWebhookDataDict` [bad-typed-dict-key]
+ --> tests/unit_tests/services/test_webhook_service_additional.py:487:72
+ERROR Missing required key `method` for TypedDict `RawWebhookDataDict` [bad-typed-dict-key]
+ --> tests/unit_tests/services/test_webhook_service_additional.py:515:72
+ERROR Missing required key `headers` for TypedDict `RawWebhookDataDict` [bad-typed-dict-key]
+ --> tests/unit_tests/services/test_webhook_service_additional.py:515:72
+ERROR Missing required key `query_params` for TypedDict `RawWebhookDataDict` [bad-typed-dict-key]
+ --> tests/unit_tests/services/test_webhook_service_additional.py:515:72
+ERROR Missing required key `files` for TypedDict `RawWebhookDataDict` [bad-typed-dict-key]
+ --> tests/unit_tests/services/test_webhook_service_additional.py:515:72
ERROR Argument `dict[str, dict[str, int | str]]` is not assignable to parameter `node_config` with type `NodeConfigDict` in function `services.trigger.webhook_service.WebhookService.generate_webhook_response` [bad-argument-type]
- --> tests/unit_tests/services/test_webhook_service_additional.py:279:65
+ --> tests/unit_tests/services/test_webhook_service_additional.py:658:65
ERROR Argument `dict[str, str]` is not assignable to parameter `args` with type `WorkflowRunListArgs` in function `services.workflow_run_service.WorkflowRunService.get_paginate_workflow_runs` [bad-argument-type]
--> tests/unit_tests/services/test_workflow_run_service.py:102:18
ERROR `Literal['2']` is not assignable to TypedDict key `limit` with type `int` [bad-typed-dict-key]
|
Contributor
There was a problem hiding this comment.
Pull request overview
Expands Pyright strict type-checking coverage by removing excluded directories and addressing newly surfaced typing issues across RAG components and extractor integrations.
Changes:
- Removed
core/ragandproviders/from Pyright excludes to increase coverage. - Added many entries to
allowedUntypedLibrariesand adjusted Pyright reporting settings. - Applied targeted typing fixes/guards across splitters, routers, Watercrawl extractor client/provider, and Jieba keyword datasource.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| api/pyrightconfig.json | Removes excludes and expands untyped-library allowlist; adjusts reporting settings. |
| api/core/rag/summary_index/summary_index.py | Adds a Pyright ignore on Celery task invocation. |
| api/core/rag/splitter/text_splitter.py | Adds a cast for tiktoken.encode(..., allowed_special=...). |
| api/core/rag/splitter/fixed_text_splitter.py | Removes GPT2Tokenizer fallback logic and changes from_encoder length function behavior. |
| api/core/rag/retrieval/router/multi_dataset_function_call_router.py | Refines typing of LLM invocation results and normalizes to LLMResult. |
| api/core/rag/extractor/watercrawl/provider.py | Adds a runtime guard when scrape returns no result. |
| api/core/rag/extractor/watercrawl/client.py | Adds return types and validates response shapes from API calls. |
| api/core/rag/extractor/unstructured/unstructured_doc_extractor.py | Adjusts import pattern to reduce private import typing issues. |
| api/core/rag/datasource/retrieval_service.py | Avoids unused loop variable in as_completed iteration. |
| api/core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py | Tweaks fallback tokenizer handling and topK forwarding for keyword extraction. |
| api/core/rag/datasource/keyword/jieba/jieba.py | Adds a guard for missing dataset keyword table and refines types for table extraction. |
Comments suppressed due to low confidence (1)
api/core/rag/splitter/fixed_text_splitter.py:33
from_encoderno longer usesembedding_model_instanceto count tokens and the_token_encoderclosure was removed, but callers passchunk_sizeas a token limit (see usages in indexing code) and existing unit tests rely on the token-counting path. This change will cause inconsistent chunk sizing and likely breaksapi/tests/unit_tests/core/rag/splitter/test_text_splitter.py::test_from_encoder_internal_token_encoder_paths. Restore token-based counting when an embedding model is provided (and keep a safe fallback when it isn’t), or update the public contract + callers/tests accordingly.
@classmethod
def from_encoder[T: EnhanceRecursiveCharacterTextSplitter](
cls: type[T],
embedding_model_instance: ModelInstance | None,
allowed_special: Literal["all"] | set[str] = set(),
disallowed_special: Literal["all"] | Collection[str] = "all",
**kwargs: Any,
) -> T:
def _character_encoder(texts: list[str]) -> list[int]:
if not texts:
return []
return [len(text) for text in texts]
return cls(length_function=_character_encoder, **kwargs)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Contributor
Pyrefly Diffbase → PR--- /tmp/pyrefly_base.txt 2026-04-17 03:49:46.032241870 +0000
+++ /tmp/pyrefly_pr.txt 2026-04-17 03:49:36.229074878 +0000
@@ -78,14 +78,6 @@
--> core/ops/mlflow_trace/mlflow_trace.py:415:24
ERROR Class member `OpsTraceProviderConfigMap.__getitem__` overrides parent class `UserDict` in an inconsistent manner [bad-param-name-override]
--> core/ops/ops_trace_manager.py:206:9
-ERROR Object of class `NoneType` has no attribute `data_source_type` [missing-attribute]
- --> core/rag/datasource/keyword/jieba/jieba.py:142:36
-ERROR Object of class `NoneType` has no attribute `keyword_table` [missing-attribute]
- --> core/rag/datasource/keyword/jieba/jieba.py:144:13
-ERROR Cannot index into `set[Any]` [bad-index]
- --> core/rag/datasource/keyword/jieba/jieba.py:157:29
-ERROR Argument `object` is not assignable to parameter `iterable` with type `Iterable[@_]` in function `list.__init__` [bad-argument-type]
- --> core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py:88:35
ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.post` [bad-argument-type]
--> core/rag/extractor/notion_extractor.py:106:25
ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.request` [bad-argument-type]
@@ -96,8 +88,6 @@
--> core/rag/extractor/notion_extractor.py:297:25
ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.request` [bad-argument-type]
--> core/rag/extractor/notion_extractor.py:371:21
-ERROR Argument `Unknown | None` is not assignable to parameter `result_object` with type `dict[str, Any]` in function `WaterCrawlProvider._structure_data` [bad-argument-type]
- --> core/rag/extractor/watercrawl/provider.py:108:37
ERROR Object of class `BaseOxmlElement` has no attribute `body` [missing-attribute]
--> core/rag/extractor/word_extractor.py:426:24
ERROR Object of class `Document` has no attribute `score` [missing-attribute]
@@ -113,7 +103,7 @@
ERROR Object of class `Document` has no attribute `score` [missing-attribute]
--> core/rag/index_processor/processor/qa_index_processor.py:209:16
ERROR No matching overload found for function `core.model_manager.ModelInstance.invoke_llm` called with arguments: (prompt_messages=list[SystemPromptMessage | UserPromptMessage], tools=list[PromptMessageTool], stream=Literal[False], model_parameters=dict[str, float | int]) [no-matching-overload]
- --> core/rag/retrieval/router/multi_dataset_function_call_router.py:31:58
+ --> core/rag/retrieval/router/multi_dataset_function_call_router.py:31:52
ERROR Class member `MCPToolProviderController.entity` overrides parent class `ToolProviderController` in an inconsistent manner [bad-override]
--> core/tools/mcp_tool/provider.py:33:14
ERROR Class member `PluginToolProviderController.entity` overrides parent class `BuiltinToolProviderController` in an inconsistent manner [bad-override]
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #26412
What changed
core/ragandproviders/from pyright exclude listallowedUntypedLibrariesfor proper type checkingFiles modified (11)
api/pyrightconfig.json- removed excludes, added allowed librariesapi/core/rag/datasource/**- fixed type annotationsapi/core/rag/extractor/**- fixed type annotationsapi/core/rag/router/**- fixed type annotationsapi/core/rag/splitter/**- fixed type annotationsapi/core/rag/summary_index/**- fixed type annotationsTesting
Verified with
pyrightthat no new errors were introduced.