Supply chain attacks usually leave traces. A suspicious new dependency, an unusual network call, code that looks out of place. The Glassworm technique is different: the malicious payload is genuinely invisible. You can read the file, run a diff, review the pull request, and see nothing wrong — because the attack hides inside Unicode codepoints that render as zero-width or blank in every standard editor and terminal.
Aikido Security published a detailed analysis of a March 2026 wave that compromised over 150 GitHub repositories in a single week. The targets included projects with thousands of stars, published npm packages, and at least one VS Code extension distributed via the Open VSX Registry. This article breaks down the technique and gives you concrete tools to detect and block it.
What Glassworm Is
Glassworm is the name given to a threat actor (or coordinated campaign) that has been running invisible Unicode supply chain attacks since at least early 2025. The name refers to the attack's core property: the payload is transparent — present in the file but visually undetectable without specialized tooling.
The campaign has targeted the three main distribution channels for open-source JavaScript code:
- GitHub repositories — via pull requests with realistic-looking commits
- npm packages — published packages containing invisible encoded payloads
- VS Code extensions — distributed through the Open VSX Registry
The March 2026 wave was the largest so far. Between March 3–9, 2026, the same decoder fingerprint appeared in 151+ repositories. Compromised projects included Wasmer, Reworm (1,460 GitHub stars), OpenCode-bench, and Docz. On the npm side: @aifabrix/miso-client and @iflow-mcp/watercrawl-watercrawl-mcp. VS Code: quartz.quartz-markdown-editor v0.3.0.
The Unicode Characters Used
The attack exploits two Unicode ranges that are often described as Private Use Area (PUA) characters but are in fact variation selectors. They are assigned codepoints with the Default_Ignorable_Code_Point property, so whenever they do not form a defined variation sequence with the preceding character they render as nothing at all in virtually every font, editor, and terminal.
Variation Selectors (U+FE00 – U+FE0F)
Variation selectors are normally used to specify alternate glyph forms for emoji and CJK characters. The 16 selectors in this range (U+FE00 through U+FE0F) encode the values 0–15. When attached to an ordinary character, a variation selector is invisible — it modifies how the preceding character renders, or simply disappears if the font has no alternate form defined.
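In the Glassworm attack, each variation selector stands for one byte value instead of modifying a glyph: U+FE00 = 0, U+FE01 = 1, ..., U+FE0F = 15. These 16 codepoints cover only the lowest byte values; the remaining values come from the second range.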
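Variation Selectors Supplement (U+E0100 – U+E01EF)
This range is sometimes mislabeled as the Tags block, but the Tags block is U+E0000–U+E007F (originally intended for language tags and later deprecated for that purpose). U+E0100 through U+E01EF is the Variation Selectors Supplement, which holds VS17 through VS256 and is used mainly in ideographic variation sequences for CJK text. Its 240 characters encode the values 16–255 in the Glassworm scheme, so together with the first range a single invisible character can represent any byte. Like the basic variation selectors, these characters render as completely invisible in modern software.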
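To make the mapping concrete, here is a minimal Python sketch of the byte-to-codepoint scheme described above. The function names are hypothetical and chosen for illustration; only the two ranges and their offsets come from the decoder analyzed in this article. A sketch like this is mainly useful for building harmless test fixtures for a scanner.

```python
VS_BASE = 0xFE00        # U+FE00..U+FE0F   -> byte values 0..15
VS_SUP_BASE = 0xE0100   # U+E0100..U+E01EF -> byte values 16..255


def byte_to_invisible(value: int) -> str:
    """Map one byte value (0-255) to a single invisible codepoint."""
    if not 0 <= value <= 255:
        raise ValueError("value must fit in one byte")
    if value < 16:
        return chr(VS_BASE + value)
    return chr(VS_SUP_BASE + value - 16)


def encode_invisible(data: bytes) -> str:
    """Encode a byte string as a run of variation selectors."""
    return "".join(byte_to_invisible(b) for b in data)


hidden = encode_invisible(b"hi")
# 'h' is byte 104, so it maps to U+E0100 + 104 - 16 = U+E0158
print([f"U+{ord(ch):04X}" for ch in hidden])  # ['U+E0158', 'U+E0159']
print(len(hidden))                            # 2
```

Printing the codepoints, as the last lines do, is the only reliable way to see the result; printing the string itself shows nothing.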
Other Unicode attack characters (broader context)
Glassworm uses the two variation selector ranges for payload encoding, but the broader category of invisible Unicode attacks includes several other character types you should know about:
| Character | Codepoint | How it's abused |
|---|---|---|
| Zero-Width Space | U+200B | Breaks string matching, bypasses keyword filters |
| Zero-Width Non-Joiner | U+200C | Splits identifiers visually while keeping them tokenized together |
| Zero-Width Joiner | U+200D | Invisible character between tokens; used in homograph attacks |
| Right-to-Left Override | U+202E | Reverses display order: "cod" + U+202E + "yp.exe" shows as "codexe.py" |
| Left-to-Right Embedding | U+202A | Resets bidi direction; used to hide reversed segments |
| Pop Directional Formatting | U+202C | Closes bidi override blocks |
| Byte Order Mark | U+FEFF | Injected mid-file to confuse parsers and diff tools |
| Soft Hyphen | U+00AD | Invisible in rendered output; breaks string comparisons |
| Variation Selectors | U+FE00–U+FE0F | Glassworm payload encoding (byte values 0–15) |
| Variation Selectors Supplement | U+E0100–U+E01EF | Glassworm payload encoding (byte values 16–255) |
The bidirectional override attack (U+202E) is worth understanding separately — it predates Glassworm and targets file naming rather than payload encoding. A file named importantgpj.exe displays in Windows Explorer as importantexe.jpg because the RTL override reverses the characters that follow it. Applied to code, the same technique can make a function name appear to be something it isn't.
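The file-name example can be approximated in a few lines of Python. This is a deliberately simplified sketch: it assumes a single U+202E and reverses everything after it, whereas the real Unicode Bidirectional Algorithm has many more rules. The function name is hypothetical.

```python
RLO = "\u202E"  # Right-to-Left Override


def approximate_rlo_display(name: str) -> str:
    """Roughly approximate how a name containing one RLO is displayed.

    Simplification: everything after the first RLO is shown reversed.
    """
    head, sep, tail = name.partition(RLO)
    if not sep:
        return name
    return head + tail[::-1]


stored = "important" + RLO + "gpj.exe"
print(approximate_rlo_display(stored))  # importantexe.jpg
print(stored.endswith(".exe"))          # True: the real extension is .exe
```

The stored name still ends in .exe, which is what the operating system acts on; only the rendering changes.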
How the Glassworm Attack Works
The decoder is the signature piece of every Glassworm injection. Here is the actual pattern found in compromised repositories:
const s = v => [...v].map(w => (
w = w.codePointAt(0),
w >= 0xFE00 && w <= 0xFE0F ? w - 0xFE00 :
w >= 0xE0100 && w <= 0xE01EF ? w - 0xE0100 + 16 : null
)).filter(n => n !== null);
eval(Buffer.from(s(``)).toString('utf-8'));
Breaking this down:
- The template literal (backtick string) appears completely empty in any editor. It is not empty: it contains a sequence of invisible variation selector characters.
- The s() function iterates over every character in the string, reads its codepoint, and maps it to a byte value based on the two ranges, dropping any character outside them.
- The resulting array of numbers is treated as a byte array and passed to Buffer.from(), and toString('utf-8') reconstructs the original payload as a UTF-8 string.
- eval() executes whatever that payload contains, which can be completely arbitrary JavaScript.
The payload itself is never visible anywhere in the source file. A reviewer reading the code sees eval(Buffer.from(s(``)).toString('utf-8')) — a call to eval with an empty string. At runtime, the "empty" backtick string expands to a full malicious program.
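For analysis, the same mapping can be applied without the eval step. The following Python sketch is a port of the decoder's arithmetic that returns the hidden bytes for inspection and never executes them. The function name and the synthetic sample are assumptions made for illustration; the sample hides only the two bytes of "ok".

```python
def extract_hidden_bytes(source: str) -> bytes:
    """Apply the decoder's codepoint-to-byte mapping, without executing anything."""
    out = bytearray()
    for ch in source:
        cp = ord(ch)
        if 0xFE00 <= cp <= 0xFE0F:
            out.append(cp - 0xFE00)
        elif 0xE0100 <= cp <= 0xE01EF:
            out.append(cp - 0xE0100 + 16)
        # every other character is ignored, as in the JavaScript decoder
    return bytes(out)


# Synthetic sample: a "blank" template literal that carries the bytes of "ok"
sample = "eval(Buffer.from(s(`\U000E015F\U000E015B`)).toString('utf-8'));"
payload = extract_hidden_bytes(sample)
print(payload.decode("utf-8", errors="replace"))  # ok
```

Running this over a suspicious file shows what the payload would have been, which is usually the first step in incident triage.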
Cover commits
The social engineering layer is equally sophisticated. Glassworm injections do not arrive as a standalone "add evil.js" commit. They arrive bundled with legitimate-looking changes: documentation updates, version bumps, dependency patches, refactors that match the project's existing style. Aikido's analysis suggests the cover commits are generated by language models, given how well they mimic each project's conventions at scale.
This means standard code review — even careful code review — will miss the injection. The malicious lines look like normal code and the surrounding context is plausible.
What the payloads do
Historical Glassworm payloads (from the 2025 campaigns) used Solana's blockchain as a second-stage delivery mechanism: the initial payload fetched a script stored on-chain, making it both persistent and resistant to takedowns. Observed capabilities include:
- Credential and secret theft (environment variables, SSH keys, .env files)
- Crypto wallet token exfiltration
- Persistent backdoor installation
- Exfiltration of package contents during npm install lifecycle scripts
Where It Has Been Found
GitHub repositories
The March 2026 wave hit 151+ repositories in seven days. The attack vector was pull requests — legitimate-looking contributions from accounts with no prior history, or from accounts that had been dormant and were reactivated. GitHub's own UI does not flag invisible Unicode by default in the PR diff view.
npm packages
Published npm packages carry the same decoder pattern in their distributed JavaScript files. When a developer runs npm install, the package is installed locally with the invisible payload intact. If any of the package's lifecycle scripts (postinstall, preinstall) run code that triggers the decoder, the payload executes during installation — before the developer has even imported the package.
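Before allowing lifecycle scripts to run, it can help to list which ones a package declares. The sketch below reads a package.json string and returns only the install-time hooks. The helper name and the example manifest are hypothetical; the hook names are npm's standard install lifecycle scripts.

```python
import json

LIFECYCLE_HOOKS = ("preinstall", "install", "postinstall")


def install_time_scripts(package_json_text: str) -> dict:
    """Return the lifecycle scripts npm would run while installing a package."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts") or {}
    return {hook: scripts[hook] for hook in LIFECYCLE_HOOKS if hook in scripts}


example = '{"name": "demo", "scripts": {"postinstall": "node setup.js", "test": "jest"}}'
print(install_time_scripts(example))  # {'postinstall': 'node setup.js'}
```

Any file referenced by one of these scripts (setup.js in the example) is the first place to scan for invisible characters.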
VS Code extensions
The Open VSX Registry — the VS Code extension marketplace used by VSCodium, Eclipse Theia, and other editors — hosted a compromised extension (quartz.quartz-markdown-editor v0.3.0) from October 2025. VS Code extensions run with elevated permissions inside the editor process. A compromised extension has access to all open files, terminal sessions, and environment variables in your development environment.
How to Detect Invisible Unicode
GitHub code search
The Glassworm decoder has a specific fingerprint you can search for across GitHub:
0xFE00&&w<=0xFE0F?w-0xFE00:w>=0xE0100&&w<=0xE01EF
Paste this into GitHub's code search to find repositories containing the exact decoder pattern. You can also scope it to your organization with org:yourorg.
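The same fingerprint can be checked locally. The sketch below strips whitespace and ignores case before searching, so a reformatted copy of the decoder still matches. The function name is hypothetical, and the check inherits the fingerprint's main limitation: it assumes the attacker kept the variable name w.

```python
import re

FINGERPRINT = "0xFE00&&w<=0xFE0F?w-0xFE00:w>=0xE0100&&w<=0xE01EF"


def has_decoder_fingerprint(source: str) -> bool:
    """Check for the decoder fingerprint, ignoring whitespace and letter case."""
    squashed = re.sub(r"\s+", "", source).lower()
    return FINGERPRINT.lower() in squashed


formatted_decoder = """
const s = v => [...v].map(w => (
  w = w.codePointAt(0),
  w >= 0xFE00 && w <= 0xFE0F ? w - 0xFE00 :
  w >= 0xE0100 && w <= 0xE01EF ? w - 0xE0100 + 16 : null
)).filter(n => n !== null);
"""
print(has_decoder_fingerprint(formatted_decoder))    # True
print(has_decoder_fingerprint("const total = 42;"))  # False
```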
grep for invisible Unicode in your codebase
The most direct detection method is searching for non-ASCII characters in source files that should only contain ASCII:
# Find files containing any character outside the ASCII range
grep -rP "[^\x00-\x7F]" --include="*.js" --include="*.ts" --include="*.py" .
# Find files containing zero-width characters specifically
grep -rP "[\x{200B}-\x{200D}\x{FEFF}\x{00AD}]" --include="*.js" .
# Find both Glassworm ranges (U+FE00-U+FE0F and U+E0100-U+E01EF)
grep -rP "[\x{FE00}-\x{FE0F}\x{E0100}-\x{E01EF}]" --include="*.js" .
# Broad scan for any non-printable, non-whitespace characters
grep -rP "[^\x09\x0A\x0D\x20-\x7E]" --include="*.js" --include="*.ts" .
These commands use Perl-compatible regex (-P), which supports Unicode codepoint ranges when grep runs in a UTF-8 locale (for example LC_ALL=en_US.UTF-8). On macOS, use ggrep (from brew install grep) if the system grep does not support -P.
Python detection script
For more precise detection with context output:
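#!/usr/bin/env python3
"""Scan files or directories for invisible Unicode characters."""
import os
import sys

SUSPICIOUS = {
    0x200B: "Zero-Width Space",
    0x200C: "Zero-Width Non-Joiner",
    0x200D: "Zero-Width Joiner",
    0x202E: "Right-to-Left Override",
    0x202A: "Left-to-Right Embedding",
    0x202C: "Pop Directional Formatting",
    0xFEFF: "Byte Order Mark (mid-file)",
    0x00AD: "Soft Hyphen",
}
EXTENSIONS = ('.js', '.ts', '.py', '.mjs', '.cjs')
SKIP_DIRS = {'.git', 'node_modules', '__pycache__'}


def check_file(path):
    """Return a list of (offset, codepoint, name) for suspicious characters."""
    findings = []
    try:
        with open(path, encoding='utf-8', errors='replace') as fh:
            text = fh.read()
    except OSError:
        return findings
    for i, ch in enumerate(text):
        cp = ord(ch)
        if cp == 0xFEFF and i == 0:
            continue  # a BOM at the very start of a file is legitimate
        # Glassworm range 1: variation selectors VS1-VS16
        if 0xFE00 <= cp <= 0xFE0F:
            findings.append((i, cp, f"Variation Selector VS{cp - 0xFE00 + 1}"))
        # Glassworm range 2: Variation Selectors Supplement, VS17-VS256
        elif 0xE0100 <= cp <= 0xE01EF:
            findings.append((i, cp, f"Variation Selector VS{cp - 0xE0100 + 17}"))
        # Other known suspicious characters
        elif cp in SUSPICIOUS:
            findings.append((i, cp, SUSPICIOUS[cp]))
    return findings


def iter_source_files(target):
    """Yield the file itself, or every matching source file under a directory."""
    if os.path.isfile(target):
        yield target
        return
    for root, dirs, files in os.walk(target):
        dirs[:] = [d for d in dirs if d not in SKIP_DIRS]
        for fname in files:
            if fname.endswith(EXTENSIONS):
                yield os.path.join(root, fname)


found = False
for target in sys.argv[1:] or ['.']:
    for path in iter_source_files(target):
        hits = check_file(path)
        if hits:
            found = True
            print(f"\n[!] {path}")
            for pos, cp, name in hits[:10]:  # show at most 10 hits per file
                print(f"    pos {pos}: U+{cp:04X} ({name})")
sys.exit(1 if found else 0)  # non-zero exit lets hooks and CI fail the build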
Save as scan-unicode.py and run with python3 scan-unicode.py ./src.
git diff — seeing what editors hide
Git's diff output will show invisible characters if you configure it correctly. Standard git diff renders them as blank, but these options help:
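# Show non-printable bytes in caret and M- notation (U+FE00 appears as M-oM-8M-^@)
git diff --word-diff=plain | cat -v
# Use git's textconv mechanism to hexdump changed files
# Add to .git/config or ~/.gitconfig:
[diff "hex"]
    textconv = hexdump -C
# Then mark files in .gitattributes:
*.js diff=hex
# More targeted: search the hex stream for the UTF-8 encodings of the Glassworm ranges
# (U+FE00-U+FE0F is EF B8 80-8F; U+E0100-U+E01EF is F3 A0 84 80 through F3 A0 87 AF)
# A match on an odd nibble boundary is possible, so treat hits as leads to verify
git show HEAD:path/to/file.js | xxd -p | tr -d '\n' | grep -oE "efb88[0-9a-f]|f3a08[4-7][89ab][0-9a-f]"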
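When reviewing a pull request, it is often enough to look only at the lines a diff adds. The sketch below takes unified diff text and returns the added lines that contain hidden characters, with each one rewritten as a visible marker. All names here are hypothetical, and the range list is an assumption based on the characters discussed in this article.

```python
HIDDEN_RANGES = (
    (0xFE00, 0xFE0F),    # variation selectors
    (0xE0100, 0xE01EF),  # Variation Selectors Supplement
    (0x200B, 0x200D),    # zero-width characters
    (0x202A, 0x202E),    # bidi controls
)


def is_hidden(ch: str) -> bool:
    return any(lo <= ord(ch) <= hi for lo, hi in HIDDEN_RANGES)


def reveal(text: str) -> str:
    """Replace each hidden character with a visible <U+XXXX> marker."""
    return "".join(f"<U+{ord(ch):04X}>" if is_hidden(ch) else ch for ch in text)


def flag_added_lines(diff_text: str) -> list:
    """Return added lines (without the leading '+') that hide characters."""
    flagged = []
    for line in diff_text.split("\n"):
        # '+++' marks the file header of a unified diff, not an added line
        if line.startswith("+") and not line.startswith("+++"):
            if any(is_hidden(ch) for ch in line):
                flagged.append(reveal(line[1:]))
    return flagged


sample_diff = "+++ b/index.js\n+eval(s(`\ufe01\U000E0158`));\n-old line\n+clean line\n"
for line in flag_added_lines(sample_diff):
    print(line)  # eval(s(`<U+FE01><U+E0158>`));
```

In practice the diff text would come from a command such as git diff, captured and passed to flag_added_lines.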
VS Code settings
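VS Code has built-in Unicode highlighting. Recent versions enable part of it by default, but the defaults differ per setting and can depend on workspace trust, so set the options explicitly in your settings.json: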
{
    "editor.unicodeHighlight.invisibleCharacters": true,
    "editor.unicodeHighlight.ambiguousCharacters": true,
    "editor.unicodeHighlight.nonBasicASCII": true,
    "editor.unicodeHighlight.allowedLocales": {},
    "editor.renderControlCharacters": true
}
With these settings active, VS Code will draw a yellow highlight box around any suspicious Unicode character. Variation selectors from both Glassworm ranges will appear as highlighted rectangles rather than invisible gaps.
GitHub's built-in warning
GitHub added a Unicode warning feature after the original Trojan Source disclosure in 2021. When a file contains bidirectional Unicode control characters, GitHub's PR diff view shows a yellow warning banner: "This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below."
Important limitation: this warning covers bidi override characters (U+202E, U+202A, etc.) but does not currently flag the variation selectors used by Glassworm, in either the U+FE00–U+FE0F range or the U+E0100–U+E01EF supplement. Do not rely on GitHub's warning as your only defense.
How to Prevent It
.gitattributes — block non-ASCII in source files
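Git attributes alone cannot reject non-ASCII content. A .gitattributes file can only mark source files as text and normalize their line endings, which keeps diffs and hooks working on consistent input. The actual blocking has to happen in a hook, either pre-commit on the developer's machine or pre-receive on the server: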
# .gitattributes
# Treat JavaScript and TypeScript as text and normalize line endings
*.js text eol=lf
*.ts text eol=lf
*.mjs text eol=lf
*.cjs text eol=lf
*.py text eol=lf
For stronger enforcement, add a pre-commit hook:
#!/bin/bash
# .git/hooks/pre-commit
# Block commits containing invisible Unicode in source files
# (requires a grep with -P support and a UTF-8 locale)
PATTERN='[\x{FE00}-\x{FE0F}\x{E0100}-\x{E01EF}\x{200B}-\x{200D}\x{202A}-\x{202E}\x{FEFF}]'
status=0
# -z keeps paths that contain spaces intact
while IFS= read -r -d '' f; do
    case "$f" in
        *.js|*.ts|*.mjs|*.cjs|*.py) ;;
        *) continue ;;
    esac
    # ":$f" is the staged version of the file, not the working-tree copy
    if git show ":$f" | grep -qP "$PATTERN"; then
        echo "ERROR: $f contains suspicious Unicode characters"
        echo "Run: python3 scan-unicode.py \"$f\""
        status=1
    fi
done < <(git diff --cached --name-only -z --diff-filter=ACM)
exit $status
Make it executable: chmod +x .git/hooks/pre-commit. For team enforcement, use a tool like pre-commit (the framework) to distribute hooks via .pre-commit-config.yaml.
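If you distribute the check through the pre-commit framework, the hook entry point receives the staged file names as arguments. The following is a minimal sketch of such an entry point, assuming that calling convention; the function names and the range list are hypothetical, not part of any published hook.

```python
HIDDEN_RANGES = (
    (0xFE00, 0xFE0F),    # variation selectors
    (0xE0100, 0xE01EF),  # Variation Selectors Supplement
    (0x200B, 0x200D),    # zero-width characters
    (0x202A, 0x202E),    # bidi controls
)


def file_is_clean(path: str) -> bool:
    """True if the file contains none of the hidden characters."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        text = fh.read()
    return not any(lo <= ord(ch) <= hi for ch in text for lo, hi in HIDDEN_RANGES)


def main(paths) -> int:
    """Return 1 if any file contains hidden characters, else 0."""
    bad = [p for p in paths if not file_is_clean(p)]
    for p in bad:
        print(f"ERROR: {p} contains suspicious Unicode characters")
    return 1 if bad else 0


# In a real hook script, finish with:
#     import sys
#     raise SystemExit(main(sys.argv[1:]))
```

Returning an exit status rather than exiting inside main keeps the function easy to test on its own.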
ESLint rule for zero-width characters
ESLint's no-irregular-whitespace rule catches some invisible characters, but does not cover the full range. For comprehensive coverage, add the no-misleading-character-class rule and consider a custom rule or plugin:
// .eslintrc.js
module.exports = {
    rules: {
        "no-irregular-whitespace": ["error", {
            "skipStrings": false,
            "skipComments": false,
            "skipRegExps": false,
            "skipTemplates": false // This is what matters for Glassworm
        }],
        "no-misleading-character-class": "error",
        "no-control-regex": "error"
    }
};
Note: standard ESLint does not have a built-in rule that flags variation selectors in either Glassworm range. The custom Python script above is more reliable for Glassworm-specific detection.
Semgrep rules
Semgrep can scan for the Glassworm decoder pattern directly. Create a rule file:
# semgrep-unicode.yaml
rules:
  - id: glassworm-decoder-pattern
    # No pattern-not here: excluding single-argument Buffer.from() calls
    # would also exclude the decoder itself
    pattern: eval(Buffer.from(...).toString(...))
    message: Potential Glassworm eval-based payload decoder detected
    languages: [javascript, typescript]
    severity: ERROR
  - id: suspicious-eval-function-processed-buffer
    # The template literal only looks empty, so match any argument list
    pattern: eval(Buffer.from($F(...)).toString(...))
    message: eval() of a function-processed buffer, possible invisible Unicode payload
    languages: [javascript, typescript]
    severity: ERROR
  - id: unicode-variation-selector-in-source
    pattern-regex: "[\uFE00-\uFE0F]"
    message: Variation Selector Unicode character found, possible Glassworm encoding
    languages: [javascript, typescript, python]
    severity: WARNING
Run with: semgrep --config semgrep-unicode.yaml ./src
CI/CD integration
Add a Unicode scan step to your GitHub Actions workflow:
# .github/workflows/unicode-scan.yml
name: Unicode Security Scan
on: [push, pull_request]
jobs:
  unicode-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Scan for invisible Unicode characters
        run: |
          python3 - <<'EOF'
          import sys, os
          DANGEROUS_RANGES = [
              (0xFE00, 0xFE0F, "Variation Selector (Glassworm range)"),
              (0xE0100, 0xE01EF, "Variation Selectors Supplement (Glassworm range)"),
              (0x200B, 0x200D, "Zero-Width Character"),
              (0x202A, 0x202E, "Bidi Control Character"),
              (0xFEFF, 0xFEFF, "Byte Order Mark"),
          ]
          found = False
          for root, dirs, files in os.walk('.'):
              dirs[:] = [d for d in dirs if d not in {'.git', 'node_modules'}]
              for fname in files:
                  if not fname.endswith(('.js', '.ts', '.mjs', '.cjs', '.py')):
                      continue
                  path = os.path.join(root, fname)
                  with open(path, encoding='utf-8', errors='replace') as fh:
                      text = fh.read()
                  for i, ch in enumerate(text):
                      cp = ord(ch)
                      if cp == 0xFEFF and i == 0:
                          continue  # a BOM at the start of a file is legitimate
                      for lo, hi, name in DANGEROUS_RANGES:
                          if lo <= cp <= hi:
                              # Workflow command: shows up as an annotation on the PR
                              print(f"::error file={path},title=Invisible Unicode::{name} at position {i} (U+{cp:04X})")
                              found = True
                              break
          sys.exit(1 if found else 0)
          EOF
      - name: Semgrep scan
        uses: returntocorp/semgrep-action@v1
        with:
          config: p/security-audit
npm install defense: safe chain wrappers
For npm dependencies, the attack can trigger during postinstall lifecycle scripts. Defense options:
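# Option 1: disable lifecycle scripts entirely (breaks some packages)
npm install --ignore-scripts
# Option 2: check dependencies against known advisories (only catches packages already reported)
npm audit --audit-level=high
# Option 3: list installed files that contain the Glassworm codepoint ranges
# (legitimate emoji sequences use U+FE0F, so expect some hits; long runs are the red flag)
grep -rlP "[\x{FE00}-\x{FE0F}\x{E0100}-\x{E01EF}]" --include="*.js" node_modules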
Aikido Security also publishes an open-source CLI wrapper called Safe Chain that intercepts npm/yarn/pnpm installs and scans packages for malware signatures (including invisible Unicode patterns) before they execute.
Tooling Summary
| Tool | What it catches | Integration |
|---|---|---|
| grep -P | Any non-ASCII or specific codepoint ranges | CLI, pre-commit hook, CI |
| Python scan script (above) | Full Glassworm + bidi + ZWC ranges with context | CLI, CI pipeline |
| VS Code Unicode Highlight | Invisible chars, ambiguous chars, non-basic ASCII | Editor (settings.json) |
| Semgrep | Decoder pattern, suspicious eval() constructs | CLI, GitHub Actions |
| ESLint no-irregular-whitespace | Some zero-width chars in templates | Editor, CI |
| Aikido Safe Chain | npm package malware including invisible Unicode | npm/yarn/pnpm wrapper |
| git + hexdump | Raw bytes in committed files | Manual investigation |
| GitHub code search | Glassworm decoder fingerprint across repos | Web (manual or API) |
The Broader Trojan Source Problem
The academic paper that kicked off serious attention to invisible Unicode in source code was "Trojan Source: Invisible Vulnerabilities" (Boucher & Anderson, 2021, Cambridge University). That paper focused primarily on bidirectional override attacks — using U+202E and related characters to make code appear to execute in a different order than it actually does.
Glassworm goes further by using the variation selector ranges for full payload encoding rather than just display manipulation. The technique is more powerful because:
- The payload is completely arbitrary, not just reordered visible code
- Bidi overrides are now flagged by GitHub and many editors; variation selectors are not
- The encoding is self-contained — no second file, no import, just the invisible characters in the template literal
The Trojan Source paper prompted patches in compilers (GCC, Clang, rustc, Go) to warn on bidi control characters in source. Those patches do not help against Glassworm's variation-selector technique.
What Maintainers Should Do Right Now
- Audit open and recently merged pull requests: run the Python scan script or grep against any PRs from external contributors
- Enable VS Code Unicode highlighting: turn on the settings listed above
- Add the GitHub Actions workflow: the Unicode scan step above takes under 30 seconds and will block future injections at PR time
- Search GitHub for your repo: use the code search fingerprint to check if the decoder is present anywhere in your codebase
- Review recently published npm versions: if you published a package between March 3 and 9, 2026, download the published tarball with npm pack <name>@<version>, extract it, and run the Unicode scanner on the contents
- Check VS Code extensions: if you maintain an Open VSX or VS Code Marketplace extension, scan all JavaScript in the extension with the Python script
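A published package can also be scanned without unpacking it to disk. The sketch below reads a gzip-compressed tarball, as npm pack produces, and counts Glassworm-range characters in each JavaScript member. The function names are hypothetical, and the demo tarball is built in memory purely so the example runs on its own.

```python
import io
import tarfile


def is_glassworm_char(cp: int) -> bool:
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def scan_tarball(fileobj) -> dict:
    """Map each JavaScript member of a .tgz to its count of hidden characters."""
    report = {}
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        for member in archive.getmembers():
            if not (member.isfile() and member.name.endswith((".js", ".mjs", ".cjs"))):
                continue
            text = archive.extractfile(member).read().decode("utf-8", errors="replace")
            count = sum(1 for ch in text if is_glassworm_char(ord(ch)))
            if count:
                report[member.name] = count
    return report


def build_demo_tarball() -> io.BytesIO:
    """Create a tiny in-memory tarball with one tainted and one clean file."""
    buf = io.BytesIO()
    files = (("package/index.js", "s(`\ufe00\U000E0100`)"), ("package/ok.js", "1 + 1"))
    with tarfile.open(fileobj=buf, mode="w:gz") as archive:
        for name, body in files:
            raw = body.encode("utf-8")
            info = tarfile.TarInfo(name)
            info.size = len(raw)
            archive.addfile(info, io.BytesIO(raw))
    buf.seek(0)
    return buf


print(scan_tarball(build_demo_tarball()))  # {'package/index.js': 2}
```

For a real package, open the downloaded .tgz in binary mode and pass the file object to scan_tarball.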
Key Takeaways
- The Glassworm attack hides complete executable payloads in Unicode variation selectors (U+FE00–U+FE0F) and the Variation Selectors Supplement (U+E0100–U+E01EF), both of which render as invisible in standard editors, diffs, and terminals
- Code review cannot catch this attack without specialized tooling. The code literally looks clean
- 151+ GitHub repositories, multiple npm packages, and at least one VS Code extension were compromised in a single week in March 2026
- Detection is straightforward once you know to look: grep for the codepoint ranges, or run the Python script above
- Prevention is a CI/CD problem: add a Unicode scan step to your GitHub Actions workflow and block it at PR time
- GitHub's existing Unicode warning does not cover the Glassworm character ranges