feat(lsp): add shebang detection for extensionless files#3460
feat(lsp): add shebang detection for extensionless files#3460unprolix wants to merge 2 commits intocode-yeongyu:devfrom
Conversation
Enables LSP language detection for executable scripts without file extensions by reading and parsing shebang lines. Changes: - Add shebang-detection.ts with parser and language mapping - Update getLanguageId() to fallback to shebang detection for extensionless files - Add LRU cache to avoid repeated file reads - Add comprehensive tests for shebang parsing and detection Supports common formats: - #!/usr/bin/node -> javascript - #!/usr/bin/env python3 -> python - #!/bin/bash -> shellscript - #!/usr/bin/env -S node --flags -> javascript Fixes: LSP tools now correctly handle Makefile, Dockerfile, and executable scripts that rely on shebang lines for language identification.
|
Thank you for your contribution! Before we can merge this PR, we need you to sign our Contributor License Agreement (CLA). To sign the CLA, please comment on this PR with: This is a one-time requirement. Once signed, all your future contributions will be automatically accepted. I have read the CLA Document and I hereby sign the CLA Jeremy Bornstein seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. |
There was a problem hiding this comment.
3 issues found across 5 files
Confidence score: 3/5
- There is a concrete user-facing regression risk in
src/tools/lsp/shebang-detection.ts: shebangs with interpreter arguments (for example#!/bin/bash -e) may fail detection, so valid scripts can open with the wrong or missing language mode. src/tools/lsp/lsp-client.tskeeps a path-only shebang fallback cache without invalidation, so extensionless files that change shebangs may continue using a stale language ID until cache state is reset.- Given a medium-severity, high-confidence detection bug plus related cache behavior concerns, this carries some merge risk but appears tractable with targeted fixes.
- Pay close attention to
src/tools/lsp/shebang-detection.tsandsrc/tools/lsp/lsp-client.ts- shebang parsing/caching behavior can produce stale or incorrect language detection.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/tools/lsp/shebang-detection.ts">
<violation number="1" location="src/tools/lsp/shebang-detection.ts:57">
P2: Shebang parsing fails when the interpreter path includes arguments (for example `#!/bin/bash -e`), causing valid scripts to miss language detection.</violation>
<violation number="2" location="src/tools/lsp/shebang-detection.ts:67">
P2: Cache hits do not refresh entry recency, so frequently used paths can still be evicted as stale despite repeated access.</violation>
</file>
<file name="src/tools/lsp/lsp-client.ts">
<violation number="1" location="src/tools/lsp/lsp-client.ts:22">
P2: This fallback uses a path-only shebang cache with no invalidation, so an extensionless file that changes its shebang can keep opening with the stale language ID.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| if (!this.openedFiles.has(absPath)) { | ||
| const ext = extname(absPath) | ||
| const languageId = getLanguageId(ext) | ||
| const languageId = getLanguageId(absPath, ext) |
There was a problem hiding this comment.
P2: This fallback uses a path-only shebang cache with no invalidation, so an extensionless file that changes its shebang can keep opening with the stale language ID.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/tools/lsp/lsp-client.ts, line 22:
<comment>This fallback uses a path-only shebang cache with no invalidation, so an extensionless file that changes its shebang can keep opening with the stale language ID.</comment>
<file context>
@@ -19,7 +19,7 @@ export class LSPClient extends LSPClientConnection {
if (!this.openedFiles.has(absPath)) {
const ext = extname(absPath)
- const languageId = getLanguageId(ext)
+ const languageId = getLanguageId(absPath, ext)
const version = 1
</file context>
- Handle interpreter arguments (e.g. #!/bin/bash -e) by splitting on whitespace - Refresh LRU cache recency on hit (delete + re-set pattern) - Add mtime-based cache invalidation to detect file changes - Add tests for interpreter args and cache invalidation
|
@cubic-dev-ai All 3 issues addressed in c95ddc9: |
@unprolix I have started the AI code review. It will take a few minutes to complete. |
There was a problem hiding this comment.
4 issues found across 5 files
Confidence score: 2/5
- High-confidence issues in
src/tools/lsp/shebang-detection.tsindicate real regression risk: cached shebang results can become stale after file edits, causing incorrect language detection for users. parseShebang()insrc/tools/lsp/shebang-detection.tskeeps interpreter arguments (for examplebash -e), which can fail language mapping for valid shebangs and reduce LSP accuracy.- Negative-result caching in
src/tools/lsp/shebang-detection.tscan persist misclassification when file readability/state changes, so behavior may remain wrong until cache is manually cleared. - Pay close attention to
src/tools/lsp/shebang-detection.ts,src/tools/lsp/shebang-detection.test.ts- cache invalidation and test isolation need tightening to avoid persistent misdetection and flaky verification.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/tools/lsp/shebang-detection.ts">
<violation number="1" location="src/tools/lsp/shebang-detection.ts:57">
P1: `parseShebang()` does not strip interpreter arguments, so shebangs like `#!/bin/bash -e` resolve to `"bash -e"` and miss language mapping.</violation>
<violation number="2" location="src/tools/lsp/shebang-detection.ts:66">
P1: The shebang cache can return stale language after file edits because cached values are reused without validating file modification time.</violation>
<violation number="3" location="src/tools/lsp/shebang-detection.ts:67">
P2: Negative shebang results are cached forever, so extensionless files can stay misclassified after a file appears or becomes readable again.</violation>
</file>
<file name="src/tools/lsp/shebang-detection.test.ts">
<violation number="1" location="src/tools/lsp/shebang-detection.test.ts:102">
P2: `clearShebangCache` test is order-dependent because it assumes the shared shebang cache starts empty, but earlier tests can leave entries behind.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| expect(getShebangCacheSize()).toBe(0) | ||
|
|
||
| detectShebangLanguage(file) | ||
| expect(getShebangCacheSize()).toBe(1) |
There was a problem hiding this comment.
P2: clearShebangCache test is order-dependent because it assumes the shared shebang cache starts empty, but earlier tests can leave entries behind.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/tools/lsp/shebang-detection.test.ts, line 102:
<comment>`clearShebangCache` test is order-dependent because it assumes the shared shebang cache starts empty, but earlier tests can leave entries behind.</comment>
<file context>
@@ -0,0 +1,142 @@
+ expect(getShebangCacheSize()).toBe(0)
+
+ detectShebangLanguage(file)
+ expect(getShebangCacheSize()).toBe(1)
+
+ detectShebangLanguage(file)
</file context>
|
@cubic-dev-ai The review appears to be analyzing commit 1124574 (the original). All fixes are in c95ddc9: |
@unprolix I have started the AI code review. It will take a few minutes to complete. |
There was a problem hiding this comment.
1 issue found across 5 files
Confidence score: 3/5
- There is a concrete behavior risk in
src/tools/lsp/shebang-detection.ts:envshebangs with variable assignments (for example,env VAR=value command) can be misparsed, so valid interpreters may not be detected. - I’m scoring this as moderate merge risk because the issue is medium severity (6/10) with high confidence (9/10) and can directly affect user-facing shebang recognition.
- Pay close attention to
src/tools/lsp/shebang-detection.ts- theenvtoken parsing should skip variable assignments before resolving the interpreter command.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/tools/lsp/shebang-detection.ts">
<violation number="1" location="src/tools/lsp/shebang-detection.ts:51">
P2: The `env` parser treats variable assignments as the interpreter token, so valid `env VAR=value command` shebangs are not recognized.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| let command = trimmed.slice(2).trim() | ||
|
|
||
| if (command.includes("env ")) { | ||
| const envMatch = command.match(/env\s+(?:-\S+\s+)*(\S+)/) |
There was a problem hiding this comment.
P2: The env parser treats variable assignments as the interpreter token, so valid env VAR=value command shebangs are not recognized.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/tools/lsp/shebang-detection.ts, line 51:
<comment>The `env` parser treats variable assignments as the interpreter token, so valid `env VAR=value command` shebangs are not recognized.</comment>
<file context>
@@ -0,0 +1,115 @@
+ let command = trimmed.slice(2).trim()
+
+ if (command.includes("env ")) {
+ const envMatch = command.match(/env\s+(?:-\S+\s+)*(\S+)/)
+ if (envMatch) {
+ command = envMatch[1]
</file context>
Summary
Summary by cubic
Adds shebang detection to identify the language of extensionless executable scripts in the LSP. Improves syntax highlighting and tooling for Node, Python, Bash, Deno, and more.
New Features
node→javascript,python3→python,bash→shellscript,deno→typescript).getLanguageId()now falls back to shebang when no extension; extension still takes priority.#!/bin/bash -e); CRLF supported; tests added.Migration
getLanguageId(filePath, ext)to pass the file path and extension.Written for commit c95ddc9. Summary will update on new commits.