Invisible Unicode Attacks on GitHub & npm

Supply chain attacks usually leave traces. A suspicious new dependency, an unusual network call, code that looks out of place. The Glassworm technique is different: the malicious payload is genuinely invisible. You can read the file, run a diff, review the pull request, and see nothing wrong — because the attack hides inside Unicode codepoints that render as zero-width or blank in every standard editor and terminal.

Aikido Security published a detailed analysis of a March 2026 wave that compromised over 150 GitHub repositories in a single week. The targets included projects with thousands of stars, published npm packages, and at least one VS Code extension distributed via the Open VSX Registry. This article breaks down the technique and gives you concrete tools to detect and block it.

What Glassworm Is

Glassworm is the name given to a threat actor (or coordinated campaign) that has been running invisible Unicode supply chain attacks since at least early 2025. The name refers to the attack's core property: the payload is transparent — present in the file but visually undetectable without specialized tooling.

The campaign has targeted the three main distribution channels for open-source JavaScript code:

GitHub repositories — via pull requests with realistic-looking commits
npm packages — published packages containing invisible encoded payloads
VS Code extensions — distributed through the Open VSX Registry

The March 2026 wave was the largest so far. Between March 3 and 9, 2026, the same decoder fingerprint appeared in 151+ repositories. Compromised projects included Wasmer, Reworm (1,460 GitHub stars), OpenCode-bench, and Docz. Affected npm packages included @aifabrix/miso-client and @iflow-mcp/watercrawl-watercrawl-mcp, and the affected VS Code extension was quartz.quartz-markdown-editor v0.3.0.

The Unicode Characters Used

The attack exploits two specific Unicode ranges of variation selectors. They are often described as Private Use Area (PUA) characters, but they are in fact assigned, default-ignorable codepoints: they have no glyph of their own and render as invisible in virtually every font, editor, and terminal.

Variation Selectors (U+FE00 – U+FE0F)

Variation selectors are normally used to specify alternate glyph forms for emoji and CJK characters. The 16 selectors in this range (U+FE00 through U+FE0F) encode the values 0–15. When attached to an ordinary character, a variation selector is invisible — it modifies how the preceding character renders, or simply disappears if the font has no alternate form defined.

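In the Glassworm attack, each variation selector in this range stands for one whole byte value from 0 to 15 (not a hex nibble): U+FE00 = 0, U+FE01 = 1, ..., U+FE0F = 15. Together with the supplementary range described next, every byte value from 0 to 255 maps to exactly one invisible character.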

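Variation Selectors Supplement (U+E0100 – U+E01EF)

U+E0100 through U+E01EF is the Variation Selectors Supplement block (VS17 through VS256): 240 additional selectors defined mainly for ideographic variation sequences. Despite sometimes being labeled the "Tags block", it is a different range; the real Tags block sits at U+E0000–U+E007F and was originally intended for language tags. In the Glassworm scheme these 240 characters encode the values 16–255. Like the first 16 variation selectors, they render as completely invisible in modern software.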
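
To make the mapping concrete, here is a minimal Python sketch of the byte-to-selector scheme described in the two sections above. The function names encode_invisible and decode_invisible are illustrative and do not come from the campaign's code; the block only demonstrates the arithmetic, assuming the mapping is exactly as stated (0–15 in the first range, 16–255 in the second).

```python
# Sketch of the byte <-> variation-selector mapping (illustrative names).
VS_BASE = 0xFE00        # U+FE00..U+FE0F carry byte values 0..15
VS_SUP_BASE = 0xE0100   # U+E0100..U+E01EF carry byte values 16..255


def encode_invisible(data: bytes) -> str:
    """Map each byte to exactly one invisible variation-selector character."""
    return "".join(
        chr(VS_BASE + b) if b < 16 else chr(VS_SUP_BASE + b - 16)
        for b in data
    )


def decode_invisible(text: str) -> bytes:
    """Invert the mapping, skipping every character outside the two ranges."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0xFE00 <= cp <= 0xFE0F:
            out.append(cp - VS_BASE)
        elif 0xE0100 <= cp <= 0xE01EF:
            out.append(cp - VS_SUP_BASE + 16)
    return bytes(out)


if __name__ == "__main__":
    hidden = encode_invisible(b"console.log(1)")
    # One character per byte, none of which has a visible glyph.
    print(len(hidden), "invisible characters")
    # Visible characters around the payload (here, backticks) are ignored.
    print(decode_invisible("`" + hidden + "`"))
```

Because each byte becomes one codepoint, a payload of N bytes adds N characters to the file while adding nothing a reader can see.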

Other Unicode attack characters (broader context)

Glassworm uses variation selectors for payload encoding, but the broader category of invisible Unicode attacks includes several other character types you should know about:

Character	Codepoint	How it's abused
Zero-Width Space	U+200B	Breaks string matching, bypasses keyword filters
Zero-Width Non-Joiner	U+200C	Legal inside JavaScript identifiers; creates names that look identical but are distinct
Zero-Width Joiner	U+200D	Same as ZWNJ; also hidden between characters in homograph-style spoofing
Right-to-Left Override	U+202E	Reverses display order: "cod" + U+202E + "yp.exe" displays as "codexe.py"
Left-to-Right Embedding	U+202A	Opens an embedded left-to-right run; used to hide reversed segments
Pop Directional Formatting	U+202C	Closes bidi embedding and override blocks
Byte Order Mark	U+FEFF	Injected mid-file to confuse parsers and diff tools
Soft Hyphen	U+00AD	Invisible in rendered output; breaks string comparisons
Variation Selectors	U+FE00–U+FE0F	Glassworm payload encoding (byte values 0–15)
Variation Selectors Supplement	U+E0100–U+E01EF	Glassworm payload encoding (byte values 16–255)

The bidirectional override attack (U+202E) is worth understanding separately: it predates Glassworm and targets file naming rather than payload encoding. A file named important[U+202E]gpj.exe (where [U+202E] stands for the invisible override character) displays in Windows Explorer as importantexe.jpg because the RTL override reverses the characters that follow it. Applied to code, the same technique can make a function name appear to be something it isn't.
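
The file-name example can be checked with a short Python sketch. The helper names find_bidi_controls and naive_display are made up for illustration, and naive_display only approximates rendering for the simple case of a single U+202E; real bidi layout follows the full Unicode Bidirectional Algorithm.

```python
import unicodedata

# Bidi classes of the explicit embedding, override, and isolate controls.
BIDI_CONTROL_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def find_bidi_controls(name: str):
    """Return (index, 'U+XXXX', bidi class) for each bidi control in name."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.bidirectional(ch))
        for i, ch in enumerate(name)
        if unicodedata.bidirectional(ch) in BIDI_CONTROL_CLASSES
    ]


def naive_display(name: str) -> str:
    """Approximate what a viewer shows when one U+202E is present:
    the text after the override appears reversed (simple case only)."""
    head, sep, tail = name.partition("\u202e")
    return head + tail[::-1] if sep else name


if __name__ == "__main__":
    filename = "important\u202egpj.exe"
    print(find_bidi_controls(filename))   # where the override sits
    print(naive_display(filename))        # what the user is likely to see
    print(filename.endswith(".exe"))      # what the OS actually runs
```

The stored name still ends in .exe; only the displayed order changes, which is why checks should run on the raw string rather than on what is rendered.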

How the Glassworm Attack Works

The decoder is the signature piece of every Glassworm injection. Here is the actual pattern found in compromised repositories:

const s = v => [...v].map(w => (
  w = w.codePointAt(0),
  w >= 0xFE00 && w <= 0xFE0F ? w - 0xFE00 :
  w >= 0xE0100 && w <= 0xE01EF ? w - 0xE0100 + 16 : null
)).filter(n => n !== null);

eval(Buffer.from(s(``)).toString('utf-8'));

Breaking this down:

The template literal (backtick string) appears completely empty in any editor. It is not empty: it contains a sequence of invisible variation-selector characters.
The s() function iterates over every character in the string, reads its codepoint, and maps it to a byte value (0–255) based on the two ranges; any character outside them is dropped.
The resulting array of numbers is treated as a byte array and passed to Buffer.from(), reconstructing the original payload as a UTF-8 string.
eval() executes whatever that payload contains — completely arbitrary JavaScript.

The payload itself is never visible anywhere in the source file. A reviewer reading the code sees eval(Buffer.from(s(``)).toString('utf-8')) — a call to eval with an empty string. At runtime, the "empty" backtick string expands to a full malicious program.
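
For analysis, the same decoding can be done without ever calling eval. The following Python sketch pulls hidden byte runs out of a source string so they can be inspected as data. The function name extract_hidden_payloads and the min_length threshold of 8 are assumptions made for this example (short runs are skipped because a lone selector after an emoji or CJK character is usually legitimate).

```python
import re

# One or more consecutive characters from the two variation-selector ranges.
SELECTOR_RUN = re.compile("[\uFE00-\uFE0F\U000E0100-\U000E01EF]+")


def extract_hidden_payloads(source: str, min_length: int = 8):
    """Decode each long run of selectors to bytes. Nothing is executed."""
    payloads = []
    for match in SELECTOR_RUN.finditer(source):
        run = match.group()
        if len(run) < min_length:
            continue  # likely an ordinary emoji/CJK variation sequence
        data = bytes(
            ord(ch) - 0xFE00 if ord(ch) <= 0xFE0F else ord(ch) - 0xE0100 + 16
            for ch in run
        )
        payloads.append((match.start(), data))
    return payloads


if __name__ == "__main__":
    # Build a harmless stand-in payload using the same encoding.
    hidden = "".join(chr(0xE0100 + b - 16) for b in b"process.env")
    sample = "eval(Buffer.from(s(`" + hidden + "`)).toString('utf-8'));"
    for offset, data in extract_hidden_payloads(sample):
        print(f"offset {offset}: {len(data)} hidden bytes: {data!r}")
```

Printing the bytes with repr keeps any control characters in the payload from reaching the terminal unescaped.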

Cover commits

The social engineering layer is equally sophisticated. Glassworm injections do not arrive as a standalone "add evil.js" commit. They arrive bundled with legitimate-looking changes: documentation updates, version bumps, dependency patches, refactors that match the project's existing style. Aikido's analysis suggests the cover commits are generated by language models, given how well they mimic each project's conventions at scale.

This means standard code review — even careful code review — will miss the injection. The malicious lines look like normal code and the surrounding context is plausible.

What the payloads do

Historical Glassworm payloads (from the 2025 campaigns) used Solana's blockchain as a second-stage delivery mechanism: the initial payload fetched a script stored on-chain, making it both persistent and resistant to takedowns. Observed capabilities include:

Credential and secret theft (environment variables, SSH keys, .env files)
Crypto wallet token exfiltration
Persistent backdoor installation
Exfiltration of package contents during npm install lifecycle scripts

Where It Has Been Found

GitHub repositories

The March 2026 wave hit 151+ repositories in seven days. The attack vector was pull requests: legitimate-looking contributions from accounts with no prior history, or from accounts that had been dormant and were reactivated. GitHub's PR diff view does not flag variation selectors, so the injected characters stay invisible during review.

npm packages

Published npm packages carry the same decoder pattern in their distributed JavaScript files. When a developer runs npm install, the package is installed locally with the invisible payload intact. If any of the package's lifecycle scripts (postinstall, preinstall) run code that triggers the decoder, the payload executes during installation — before the developer has even imported the package.
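
Since install-time execution depends on lifecycle scripts, it helps to know which installed packages declare any. A rough Python sketch follows; the function name find_lifecycle_scripts is illustrative, and the hook list is limited to preinstall, install, and postinstall as an assumption about which hooks matter here.

```python
import json
import os

# Scripts npm runs automatically while installing a package.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def find_lifecycle_scripts(node_modules: str):
    """Yield (manifest path, hook name, command) for each install-time script."""
    for root, _dirs, files in os.walk(node_modules):
        if "package.json" not in files:
            continue
        manifest_path = os.path.join(root, "package.json")
        try:
            with open(manifest_path, encoding="utf-8") as fh:
                manifest = json.load(fh)
        except (OSError, ValueError):
            continue  # unreadable or malformed manifest
        scripts = manifest.get("scripts") if isinstance(manifest, dict) else None
        if not isinstance(scripts, dict):
            continue
        for hook in INSTALL_HOOKS:
            if hook in scripts:
                yield manifest_path, hook, scripts[hook]


if __name__ == "__main__":
    for path, hook, command in find_lifecycle_scripts("node_modules"):
        print(f"{path}: {hook} -> {command}")
```

The output is only a starting list: a package with no lifecycle scripts can still carry the decoder and run it on first import.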

VS Code extensions

The Open VSX Registry, the VS Code extension marketplace used by VSCodium, Eclipse Theia, and other editors, hosted a compromised extension (quartz.quartz-markdown-editor v0.3.0) from October 2025. VS Code extensions are not sandboxed: they run in the editor's extension host with the full privileges of the logged-in user. A compromised extension has access to all open files, terminal sessions, and environment variables in your development environment.

How to Detect Invisible Unicode

GitHub code search

The Glassworm decoder has a specific fingerprint you can search for across GitHub:

0xFE00&&w<=0xFE0F?w-0xFE00:w>=0xE0100&&w<=0xE01EF

Paste this into GitHub's code search to find repositories containing the exact decoder pattern. You can also scope it to your organization with org:yourorg.
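
The same fingerprint can be checked locally. This is a minimal Python sketch, assuming the decoder keeps the variable name w and the same operator order as the sample shown earlier; a variant with renamed variables would slip past it. The function name has_decoder_fingerprint is hypothetical.

```python
import re

# Whitespace-free form of the decoder's range checks, as quoted above.
FINGERPRINT = "0xFE00&&w<=0xFE0F?w-0xFE00:w>=0xE0100&&w<=0xE01EF"


def has_decoder_fingerprint(source: str) -> bool:
    """True if the fingerprint appears once all whitespace is removed.
    The comparison ignores case so 0xfe00 and 0xFE00 both match."""
    compact = re.sub(r"\s+", "", source)
    return FINGERPRINT.lower() in compact.lower()


if __name__ == "__main__":
    formatted_decoder = """
    const s = v => [...v].map(w => (
      w = w.codePointAt(0),
      w >= 0xFE00 && w <= 0xFE0F ? w - 0xFE00 :
      w >= 0xE0100 && w <= 0xE01EF ? w - 0xE0100 + 16 : null
    )).filter(n => n !== null);
    """
    print(has_decoder_fingerprint(formatted_decoder))
    print(has_decoder_fingerprint("console.log('hello')"))
```

Stripping whitespace first means the check matches both the minified and the pretty-printed form of the decoder.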

grep for invisible Unicode in your codebase

The most direct detection method is searching for non-ASCII characters in source files that should only contain ASCII:

# Find lines containing any non-ASCII character
grep -rP "[^\x00-\x7F]" --include="*.js" --include="*.ts" --include="*.py" .

# Find lines containing zero-width characters specifically
grep -rP "[\x{200B}-\x{200D}\x{FEFF}\x{00AD}]" --include="*.js" .

# Find variation selectors (U+FE00-U+FE0F and U+E0100-U+E01EF), the Glassworm ranges
grep -rP "[\x{FE00}-\x{FE0F}\x{E0100}-\x{E01EF}]" --include="*.js" .

# Broad scan for anything other than tab, newline, carriage return, and printable ASCII
grep -rP "[^\x09\x0A\x0D\x20-\x7E]" --include="*.js" --include="*.ts" .

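These commands use Perl-compatible regex (-P), which supports Unicode codepoint ranges. The \x{...} escapes only match multi-byte characters when grep runs in a UTF-8 locale, so check that LANG or LC_ALL is set accordingly. On macOS, use ggrep (from brew install grep) if the system grep does not support -P.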

Python detection script

For more precise detection with context output:

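#!/usr/bin/env python3
"""Scan source files for invisible or direction-altering Unicode characters."""
import os
import sys

SUSPICIOUS = {
    0x00AD: "Soft Hyphen",
    0x200B: "Zero-Width Space",
    0x200C: "Zero-Width Non-Joiner",
    0x200D: "Zero-Width Joiner",
    0x202A: "Left-to-Right Embedding",
    0x202B: "Right-to-Left Embedding",
    0x202C: "Pop Directional Formatting",
    0x202D: "Left-to-Right Override",
    0x202E: "Right-to-Left Override",
    0xFEFF: "Byte Order Mark (mid-file)",
}
EXTENSIONS = ('.js', '.ts', '.py', '.mjs', '.cjs')
SKIP_DIRS = {'.git', 'node_modules', '__pycache__'}

def check_file(path):
    """Return (offset, codepoint, name) for each suspicious character in path."""
    findings = []
    try:
        with open(path, encoding='utf-8', errors='replace') as fh:
            text = fh.read()
    except OSError:
        return findings

    for i, ch in enumerate(text):
        cp = ord(ch)
        # Glassworm range 1: VS1-VS16 (byte values 0-15)
        if 0xFE00 <= cp <= 0xFE0F:
            findings.append((i, cp, f"Variation Selector VS{cp - 0xFE00 + 1}"))
        # Glassworm range 2: VS17-VS256 (byte values 16-255)
        elif 0xE0100 <= cp <= 0xE01EF:
            findings.append((i, cp, f"Variation Selector VS{cp - 0xE0100 + 17}"))
        # A BOM as the very first character is legitimate; skip it
        elif cp == 0xFEFF and i == 0:
            continue
        # Known suspicious characters
        elif cp in SUSPICIOUS:
            findings.append((i, cp, SUSPICIOUS[cp]))
    return findings

def iter_paths(target):
    """Yield target itself if it is a file, else matching files beneath it."""
    if os.path.isfile(target):
        yield target
        return
    for root, dirs, files in os.walk(target):
        dirs[:] = [d for d in dirs if d not in SKIP_DIRS]
        for fname in files:
            if fname.endswith(EXTENSIONS):
                yield os.path.join(root, fname)

exit_code = 0
for target in sys.argv[1:] or ['.']:
    for path in iter_paths(target):
        hits = check_file(path)
        if hits:
            exit_code = 1
            print(f"\n[!] {path}")
            for pos, cp, name in hits[:10]:  # cap output at 10 hits per file
                print(f"    pos {pos}: U+{cp:04X} ({name})")

# Non-zero exit status lets hooks and CI fail when anything is found
sys.exit(exit_code)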

Save as scan-unicode.py and run with python3 scan-unicode.py ./src.

git diff — seeing what editors hide

Git's diff output will show invisible characters if you configure it correctly. Standard git diff renders them as blank, but these options help:

# Show non-ASCII bytes in caret/M- notation (U+FE00 appears as M-oM-8M-^@)
git diff --word-diff=plain | cat -v

# Use git's built-in textconv to hexdump changed files
# Add to .git/config or ~/.gitconfig:
[diff "hex"]
    textconv = hexdump -C

# Then mark files in .gitattributes:
*.js diff=hex

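# More targeted: search for the raw UTF-8 bytes of the Glassworm ranges
# (EF B8 80-8F encodes U+FE00-FE0F; F3 A0 84-87 xx encodes U+E0100-E01EF)
git show HEAD:path/to/file.js | LC_ALL=C grep -naP '\xEF\xB8[\x80-\x8F]|\xF3\xA0[\x84-\x87][\x80-\xBF]'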
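
When reviewing a change rather than a whole file, it can be more useful to look only at added lines. Below is a Python sketch that parses unified diff text (such as the output of git diff) and reports added lines containing invisible characters. The function name scan_added_lines and the exact character set are choices made for this example, not part of any existing tool.

```python
import re

# Zero-width, bidi, BOM, soft hyphen, and both variation-selector ranges.
INVISIBLE = re.compile(
    "[\u00AD\u200B-\u200D\u202A-\u202E\u2066-\u2069\uFEFF"
    "\uFE00-\uFE0F\U000E0100-\U000E01EF]"
)
HUNK_HEADER = re.compile(r"@@ -\d+(?:,\d+)? \+(\d+)")


def scan_added_lines(diff_text: str):
    """Return (file, new line number, ['U+XXXX', ...]) for each added line
    in a unified diff that contains an invisible character."""
    findings = []
    current_file, new_lineno, in_hunk = None, 0, False
    for line in diff_text.split("\n"):
        if line.startswith("diff --git "):
            in_hunk = False                      # a new file section begins
        elif not in_hunk and line.startswith("+++ "):
            path = line[4:]
            current_file = path[2:] if path.startswith("b/") else path
        elif line.startswith("@@"):
            m = HUNK_HEADER.match(line)
            new_lineno = int(m.group(1)) if m else 0
            in_hunk = True
        elif in_hunk and line.startswith("+"):
            hits = INVISIBLE.findall(line[1:])
            if hits:
                codes = [f"U+{ord(c):04X}" for c in hits]
                findings.append((current_file, new_lineno, codes))
            new_lineno += 1
        elif in_hunk and line.startswith(" "):
            new_lineno += 1                      # context line, present in new file
    return findings


if __name__ == "__main__":
    sample = "\n".join([
        "diff --git a/app.js b/app.js",
        "--- a/app.js",
        "+++ b/app.js",
        "@@ -1,2 +1,3 @@",
        " const a = 1;",
        "+const b = `\ufe00\U000e0100`;",
        " const c = 3;",
    ])
    for finding in scan_added_lines(sample):
        print(finding)
```

In practice the diff text would come from something like git diff --cached, read as UTF-8; the inline sample is used here so the sketch runs without a repository.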

VS Code settings

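VS Code has built-in Unicode highlighting, but the default configuration is not the strictest one available; for example, non-basic-ASCII highlighting is by default limited to untrusted workspaces. Set the options explicitly in your settings.json: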

{
  "editor.unicodeHighlight.invisibleCharacters": true,
  "editor.unicodeHighlight.ambiguousCharacters": true,
  "editor.unicodeHighlight.nonBasicASCII": true,
  "editor.unicodeHighlight.allowedLocales": {},
  "editor.renderControlCharacters": true
}

With these settings active, VS Code will draw a highlight box around any suspicious Unicode character. Variation selectors from both Glassworm ranges will appear as highlighted rectangles rather than invisible gaps.

GitHub's built-in warning

GitHub added a Unicode warning feature after the original Trojan Source disclosure in 2021. When a file contains bidirectional Unicode control characters, GitHub's PR diff view shows a yellow warning banner: "This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below."

Important limitation: this warning covers bidi override characters (U+202E, U+202A, etc.) but does not currently flag the variation selectors (U+FE00–U+FE0F and U+E0100–U+E01EF) used by Glassworm. Do not rely on GitHub's warning as your only defense.

How to Prevent It

.gitattributes — block non-ASCII in source files

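Git attributes alone cannot reject non-ASCII content. What .gitattributes can do is declare source files as text and normalize their line endings, so that hooks and diff tools treat them consistently; the actual enforcement comes from a hook (pre-commit on developer machines, pre-receive on a self-hosted server):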

# .gitattributes
# Treat JavaScript and TypeScript as text and normalize line endings
*.js    text eol=lf
*.ts    text eol=lf
*.mjs   text eol=lf
*.cjs   text eol=lf
*.py    text eol=lf

For stronger enforcement, add a pre-commit hook:

#!/bin/bash
# .git/hooks/pre-commit
# Block commits containing invisible Unicode in staged source files.
# Requires GNU grep with -P support and a UTF-8 locale.

PATTERN='[\x{FE00}-\x{FE0F}\x{E0100}-\x{E01EF}\x{200B}-\x{200D}\x{202A}-\x{202E}\x{FEFF}]'
status=0

# -z with read -d '' keeps filenames containing spaces intact
while IFS= read -r -d '' f; do
  case "$f" in
    *.js|*.ts|*.mjs|*.cjs|*.py) ;;
    *) continue ;;
  esac
  # Scan the staged version (":$f"), not the working-tree copy
  if git show ":$f" | grep -qP "$PATTERN"; then
    echo "ERROR: $f contains suspicious Unicode characters"
    echo "Run: python3 scan-unicode.py $(dirname "$f")"
    status=1
  fi
done < <(git diff --cached --name-only -z --diff-filter=ACM)

exit $status

Make it executable: chmod +x .git/hooks/pre-commit. For team enforcement, use a tool like pre-commit (the framework) to distribute hooks via .pre-commit-config.yaml.

ESLint rule for zero-width characters

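ESLint's no-irregular-whitespace rule catches some invisible characters, such as the zero-width space and a mid-file byte order mark, but does not cover the full range. The no-misleading-character-class and no-control-regex rules guard against related problems inside regular expressions. For comprehensive coverage, consider a custom rule or plugin: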

// .eslintrc.js
module.exports = {
  rules: {
    "no-irregular-whitespace": ["error", {
      "skipStrings": false,
      "skipComments": false,
      "skipRegExps": false,
      "skipTemplates": false  // Also check template literals, where Glassworm hides its payload
    }],
    "no-misleading-character-class": "error",
    "no-control-regex": "error"
  }
};

Note: standard ESLint does not have a built-in rule that flags variation selectors in either Glassworm range. The custom Python script above is more reliable for Glassworm-specific detection.

Semgrep rules

Semgrep can scan for the Glassworm decoder pattern directly. Create a rule file:

# semgrep-unicode.yaml
rules:
  - id: glassworm-decoder-pattern
    patterns:
      - pattern: eval(Buffer.from(...).toString(...))
      # Ignore buffers built from a plain string literal
      - pattern-not: eval(Buffer.from("...").toString(...))
    message: Potential Glassworm eval-based payload decoder detected
    languages: [javascript, typescript]
    severity: ERROR

  - id: suspicious-eval-template-literal
    pattern: eval(Buffer.from($F(``)).toString(...))
    message: eval() with function-processed template literal — possible invisible Unicode payload
    languages: [javascript, typescript]
    severity: ERROR

  - id: variation-selector-in-source
    pattern-regex: "[\uFE00-\uFE0F\U000E0100-\U000E01EF]"
    message: Variation selector character found; possible Glassworm encoding
    languages: [javascript, typescript, python]
    severity: WARNING

Run with: semgrep --config semgrep-unicode.yaml ./src

CI/CD integration

Add a Unicode scan step to your GitHub Actions workflow:

# .github/workflows/unicode-scan.yml
name: Unicode Security Scan

on: [push, pull_request]

jobs:
  unicode-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Scan for invisible Unicode characters
        run: |
          python3 - <<'EOF'
          import sys, os

          DANGEROUS_RANGES = [
              (0xFE00, 0xFE0F, "Variation Selector (Glassworm range)"),
              (0xE0100, 0xE01EF, "Variation Selector Supplement (Glassworm range)"),
              (0x200B, 0x200D, "Zero-Width Character"),
              (0x202A, 0x202E, "Bidi Control Character"),
              (0x2066, 0x2069, "Bidi Isolate Character"),
              (0xFEFF, 0xFEFF, "Byte Order Mark (mid-file)"),
          ]

          found = False
          for root, dirs, files in os.walk('.'):
              dirs[:] = [d for d in dirs if d not in {'.git', 'node_modules'}]
              for fname in files:
                  if not fname.endswith(('.js', '.ts', '.mjs', '.cjs', '.py')):
                      continue
                  path = os.path.join(root, fname)
                  with open(path, encoding='utf-8', errors='replace') as fh:
                      text = fh.read()
                  for i, ch in enumerate(text):
                      cp = ord(ch)
                      if cp == 0xFEFF and i == 0:
                          continue  # a BOM at the start of a file is legitimate
                      for lo, hi, name in DANGEROUS_RANGES:
                          if lo <= cp <= hi:
                              print(f"::error file={path},title=Invisible Unicode::{name} at position {i} (U+{cp:04X})")
                              found = True
                              break
          sys.exit(1 if found else 0)
          EOF

      - name: Semgrep scan
        uses: returntocorp/semgrep-action@v1
        with:
          config: p/security-audit

npm install defense: safe chain wrappers

For npm dependencies, the attack can trigger during postinstall lifecycle scripts. Defense options:

# Option 1: disable lifecycle scripts entirely (breaks some packages)
npm install --ignore-scripts

# Option 2: check for published advisories (only catches already-reported packages)
npm audit --audit-level=high

# Option 3: scan installed packages for invisible Unicode after install
# (the script walks a directory tree, so pass the directory itself)
python3 scan-unicode.py node_modules

Aikido Security also publishes an open-source CLI wrapper called Safe Chain that intercepts npm/yarn/pnpm installs and scans packages for malware signatures (including invisible Unicode patterns) before they execute.

Tooling Summary

Tool	What it catches	Integration
`grep -P`	Any non-ASCII or specific codepoint ranges	CLI, pre-commit hook, CI
Python scan script (above)	Full Glassworm + bidi + ZWC ranges with context	CLI, CI pipeline
VS Code Unicode Highlight	Invisible chars, ambiguous chars, non-basic ASCII	Editor (settings.json)
Semgrep	Decoder pattern, suspicious eval() constructs	CLI, GitHub Actions
ESLint no-irregular-whitespace	Some zero-width chars in templates	Editor, CI
Aikido Safe Chain	npm package malware including invisible Unicode	npm/yarn/pnpm wrapper
git + hexdump	Raw bytes in committed files	Manual investigation
GitHub code search	Glassworm decoder fingerprint across repos	Web (manual or API)

The Broader Trojan Source Problem

The academic paper that kicked off serious attention to invisible Unicode in source code was "Trojan Source: Invisible Vulnerabilities" (Boucher & Anderson, 2021, Cambridge University). That paper focused primarily on bidirectional override attacks — using U+202E and related characters to make code appear to execute in a different order than it actually does.

Glassworm goes further by using variation selectors for full payload encoding rather than just display manipulation. The technique is more powerful because:

The payload is completely arbitrary, not just reordered visible code
Bidi overrides are now flagged by GitHub and many editors; variation selectors are not
The encoding is self-contained — no second file, no import, just the invisible characters in the template literal

The Trojan Source paper prompted patches in compilers (GCC, Clang, rustc, Go) to warn on bidi control characters in source. Those patches do not help against Glassworm's variation-selector technique.
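
The gap is visible in the Unicode character properties themselves. The Python sketch below compares a check keyed on bidi control classes (a simplified stand-in for the kind of check those warnings perform, not their actual implementation) with a check for variation selectors. The function names are illustrative.

```python
import unicodedata

# Bidi classes of the explicit embedding, override, and isolate controls.
BIDI_CONTROL_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def flagged_by_bidi_check(text: str) -> bool:
    """True if any character is an explicit bidi control."""
    return any(unicodedata.bidirectional(c) in BIDI_CONTROL_CLASSES for c in text)


def flagged_by_selector_check(text: str) -> bool:
    """True if any character lies in either variation-selector range."""
    return any(
        0xFE00 <= ord(c) <= 0xFE0F or 0xE0100 <= ord(c) <= 0xE01EF for c in text
    )


def describe(ch: str):
    """Return (codepoint, name, general category, bidi class) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch, "<unnamed>"),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
    )


if __name__ == "__main__":
    for ch in ("\u202E", "\u2066", "\uFE0F", "\U000E0100"):
        print(describe(ch))

    trojan_source_line = "if level != 'user\u202e \u2066// check admin\u2069 \u2066':"
    glassworm_line = "eval(s(`\ufe00\U000e0154`))"
    for label, line in (("trojan source", trojan_source_line), ("glassworm", glassworm_line)):
        print(label, flagged_by_bidi_check(line), flagged_by_selector_check(line))
```

Variation selectors are classified as nonspacing marks (category Mn, bidi class NSM), so a detector that looks only for bidi controls has no reason to report them.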

What Maintainers Should Do Right Now

Audit open pull requests: run the Python scan script or grep against open PRs and any recently merged PRs from external contributors
Enable VS Code Unicode highlighting: turn on the settings listed above
Add the GitHub Actions workflow: the Unicode scan step above takes under 30 seconds and will block future injections at PR time
Search GitHub for your repo: use the code search fingerprint to check if the decoder is present anywhere in your codebase
Review recently published npm versions: if you published a package between March 3 and 9, 2026, download the published tarball with npm pack <package>@<version>, extract it, and run the Unicode scanner on the extracted files (npm pack --dry-run only lists the contents)
Check VS Code extensions: if you maintain an Open VSX or VS Code Marketplace extension, scan all JavaScript in the extension with the Python script

Key Takeaways

The Glassworm attack hides complete executable payloads in Unicode variation selectors (U+FE00–U+FE0F and the supplementary range U+E0100–U+E01EF), which render as invisible in all standard tooling
Code review cannot catch this attack without specialized tooling. The code literally looks clean
151+ GitHub repositories, multiple npm packages, and at least one VS Code extension were compromised in a single week in March 2026
Detection is straightforward once you know to look: grep for the codepoint ranges, or run the Python script above
Prevention is a CI/CD problem: add a Unicode scan step to your GitHub Actions workflow and block it at PR time
GitHub's existing Unicode warning does not cover the Glassworm character ranges
