Performance Web Dev Optimization March 2026 · Andy

Web Performance Guide: Why Your Site Is Slow (And How to Fix It)

A 49MB web page hit Hacker News this week with 629 points and 288 comments. It struck a nerve because everyone recognizes the pattern: sites that bloat invisibly over years of feature additions until they're unusable on anything but fiber. This is the practical guide I wish existed — real commands, real configs, and the math behind every recommendation.

In this article

Why performance matters (more than you think)
Diagnosing the problem
Images: the biggest win
JavaScript: the other killer
CSS optimization
Fonts
Caching
Server-side wins
The audit checklist
Tooling

Why performance matters (more than you think)

Let's ground this in real numbers before touching any code.

53%

of mobile users abandon after 3 seconds

delay → 7% conversion drop (Amazon, 2012)

200ms

TTFB Google considers "good"

2.5s

LCP threshold for "good" ranking signal

Google's Core Web Vitals became a confirmed ranking signal in 2021. There are three metrics that actually matter:

LCP — Largest Contentful Paint

How long until the main content is visible. Good: <2.5s. This is usually your hero image or H1. Optimize the thing users actually see first.

CLS — Cumulative Layout Shift

How much content jumps around as the page loads. Good: <0.1. The main culprit: images without explicit width/height attributes.

INP — Interaction to Next Paint

How fast the page responds to clicks/taps. Good: <200ms. Replaced FID in 2024. Heavy JavaScript main-thread work is the killer here.

The 49MB page story isn't an edge case — it's a snapshot of how web bloat accumulates. Each library added "just in case," each unoptimized image uploaded by a non-technical editor, each analytics script that loads five more scripts. No single decision is obviously wrong; the aggregate is a disaster. Bloat is the #1 silent killer of web performance because it never announces itself.

Diagnosing the problem

Before you optimize anything, you need to know what's actually slow. Guessing is expensive. Here's how to get real data.

Lighthouse CLI (not just the browser panel)

The browser DevTools Lighthouse is useful, but inconsistent — your local machine has extensions, warm caches, and fast CPU. The CLI lets you run headless, simulate real devices, and get reproducible results:

# Install once
npm install -g lighthouse

# Run against your URL, simulate mobile on 4G
lighthouse https://yoursite.com \
  --output=html \
  --output-path=./report.html \
  --preset=perf \
  --emulated-form-factor=mobile \
  --throttling-method=simulate

# Open the report
open report.html

The output tells you your LCP, CLS, INP scores with specific elements to blame. More importantly, it tells you why each score is what it is — render-blocking resources, image size, unused JavaScript.

For CI pipelines, use --output=json and parse the categories.performance.score field. Fail the build if it drops below 0.8.

WebPageTest for real-world simulation

WebPageTest (webpagetest.org) runs from actual locations on real devices. The waterfall view is what you actually want here. Look for:

The "critical path" — which resources block the first render
Connection waterfalls — how many round trips to how many domains
Time to First Byte — is your server slow, or is it the client?
Film strip view — what does the user actually see at 1s, 2s, 3s?

Chrome DevTools Network tab

Open DevTools → Network → check "Disable cache" → hard reload. Sort by Size descending. The top entries are your targets. Key things to check:

Initiator column — which script loaded which other scripts (tracking pixels loading five sub-scripts is a common culprit)
Waterfall column — stacked resources that could be parallelized
Transfer Size vs. Resource Size — the gap is how much compression saved you (or didn't)

Coverage tab for unused JS/CSS

This one is underused and brutally revealing. Open DevTools → Command+Shift+P (or Ctrl+Shift+P) → "Coverage" → click record → reload the page → stop. You'll see exactly what percentage of each JavaScript and CSS file was never executed on page load.

Common finding

A typical marketing site loads 300KB of JavaScript on the homepage. The Coverage tab reveals that 70% of it — the product tour logic, the checkout flow, the admin dashboard code — is never touched. You're making every visitor download code they'll never run.

Images: the biggest win

On most sites, images account for 60-80% of total page weight. This is where optimizing first almost always pays off the most.

Modern formats: WebP and AVIF

The before/after math is real:

# A typical JPEG hero image
hero.jpg: 450KB

# Convert to WebP
cwebp -q 82 hero.jpg -o hero.webp
# hero.webp: 180KB  (60% smaller, visually identical)

# Convert to AVIF (even better compression, slower encode)
avifenc --min 30 --max 63 hero.jpg hero.avif
# hero.avif: 95KB  (79% smaller)

AVIF gets you smaller files but takes longer to encode and has slightly less browser support (still 93%+ as of 2026). WebP is the safe default — virtually universal support, massive savings.

Use the <picture> element to serve the right format without losing fallback support:

<picture>
  <source srcset="hero.avif" type="image/avif">
  <source srcset="hero.webp" type="image/webp">
  <img src="hero.jpg" alt="Hero image" width="1200" height="630"
       loading="lazy" decoding="async">
</picture>

srcset and sizes for responsive images

Serving a 2400px image to a 375px mobile screen is a 6x waste. srcset fixes this:

<img
  src="hero-800.webp"
  srcset="
    hero-400.webp  400w,
    hero-800.webp  800w,
    hero-1200.webp 1200w,
    hero-2400.webp 2400w
  "
  sizes="
    (max-width: 600px) 100vw,
    (max-width: 1200px) 80vw,
    1200px
  "
  alt="Hero"
  width="1200"
  height="630"
  loading="lazy"
>

The sizes attribute tells the browser how wide the image will actually render, so it can pick the right source before downloading anything. Without sizes, it defaults to 100vw — almost always wrong.

Explicit width and height to prevent CLS

This one is easy to miss. If you don't specify width and height on an image, the browser doesn't know how much space to reserve until the image loads. Everything below it shifts down. That's your CLS score tanking.

Always include the intrinsic dimensions. CSS can still control the displayed size — the HTML attributes just give the browser the aspect ratio it needs to reserve space:

<!-- Bad: browser doesn't know how tall this will be -->
<img src="photo.webp" alt="Photo" style="width: 100%">

<!-- Good: browser reserves correct space, no layout shift -->
<img src="photo.webp" alt="Photo" width="800" height="600"
     style="width: 100%; height: auto">

Lazy loading

Native lazy loading is supported everywhere and requires one attribute:

<img src="below-fold.webp" loading="lazy" width="800" height="400" alt="...">

The exception: your hero image, your logo, anything above the fold. Those should have loading="eager" (the default) or be explicitly preloaded. Lazy-loading the LCP element is a common mistake that tanks your score.

<!-- Preload the hero image so it starts loading immediately -->
<link rel="preload" as="image" href="hero.webp"
      imagesrcset="hero-400.webp 400w, hero-800.webp 800w"
      imagesizes="100vw">

Typical image optimization results

A 4.2MB homepage drops to 1.1MB after converting to WebP + adding srcset. No visual difference. LCP drops from 4.8s to 1.9s on simulated mobile 4G. That's the entire CWV picture improving from one change.

JavaScript: the other killer

JavaScript is more expensive than any other asset type. A 1MB image and a 1MB JavaScript file are not equivalent — the JS has to be parsed, compiled, and executed. On a mid-range mobile device (Moto G4, the standard benchmark), that 1MB of JS costs roughly 3–4 seconds of CPU time. The image costs a network transfer and a texture decode.

Bundle splitting

If you're using webpack, Vite, or Rollup, route-based code splitting is the first thing to enable:

// Vite config — automatic chunk splitting by route
// vite.config.js
export default {
  build: {
    rollupOptions: {
      output: {
        manualChunks: {
          // Vendor chunk: stuff that changes rarely
          vendor: ['react', 'react-dom'],
          // Separate heavy libraries
          charts: ['recharts'],
          editor: ['@codemirror/state', '@codemirror/view'],
        }
      }
    }
  }
}

With React Router or Next.js, use dynamic imports for route components so users only download the code for pages they visit:

// Instead of:
import ProductPage from './ProductPage'

// Use:
const ProductPage = React.lazy(() => import('./ProductPage'))

// Wrap with Suspense
<Suspense fallback={<Loading />}>
  <ProductPage />
</Suspense>

Tree shaking: eliminating dead code

Tree shaking removes code that's imported but never actually called. It requires ES modules (not CommonJS) to work properly. The trap: many libraries still ship CommonJS, which defeats tree shaking entirely.

# Check what's actually in your bundle
npx bundle-buddy ./dist/assets/*.js

# Or use source-map-explorer
npm install -g source-map-explorer
source-map-explorer dist/main.js

The visual treemap shows you where your bundle bytes are coming from. Lodash imported as import _ from 'lodash' drags in all 70KB even if you only use _.debounce. Fix: import debounce from 'lodash/debounce' or switch to lodash-es.

defer vs async vs type="module"

How you load scripts matters enormously for Time to Interactive:

<!-- Blocks HTML parsing. Never do this for non-critical scripts. -->
<script src="app.js"></script>

<!-- Downloads in parallel, executes when ready (order not guaranteed) -->
<script async src="analytics.js"></script>

<!-- Downloads in parallel, executes after HTML is parsed (order preserved) -->
<script defer src="app.js"></script>

<!-- ES modules: implicitly deferred, always strict mode -->
<script type="module" src="app.mjs"></script>

Use defer for your application code. Use async for analytics and ads (they don't need the DOM and don't affect each other). Never use bare <script> tags in the <head> for external files.

Find and remove unused dependencies

# depcheck scans your code and tells you what's imported vs what's in package.json
npx depcheck

# Sample output:
# Unused dependencies
# * moment        (you're using date-fns now, remove this)
# * lodash        (only used in one file, inline it)
# * react-tooltip (removed from UI, forgot to uninstall)

Every unused dependency is dead weight in your node_modules and potentially in your bundle. Run depcheck before any major performance audit.

The import cost rule

Before adding any npm dependency, check its cost at bundlephobia.com. A library that adds 200KB gzipped to your bundle is not free. Ask: can you do this with a smaller library, a built-in browser API, or 20 lines of code? The answer is often yes.

CSS optimization

Critical CSS extraction

CSS blocks rendering. The browser can't paint anything until all CSS is loaded. For the above-the-fold content, you want the critical CSS inlined in <head> so the browser can render without waiting for a stylesheet download.

# Generate critical CSS for your main page
npx critical https://yoursite.com \
  --width 1300 \
  --height 900 \
  --inline \
  --base dist/ \
  dist/index.html

The output inlines the minimum CSS needed for above-the-fold rendering, then loads the full stylesheet non-blocking. This is one of the highest-leverage LCP improvements you can make.

For the rest of your stylesheet, load it non-blocking:

<!-- Non-blocking stylesheet load -->
<link rel="preload" href="/styles.css" as="style"
      onload="this.onload=null;this.rel='stylesheet'">
<noscript><link rel="stylesheet" href="/styles.css"></noscript>

PurgeCSS: eliminate unused rules

Most projects have significant CSS dead weight — rules for components that were deleted, utility classes that are no longer used, entire vendor themes that ship thousands of selectors. PurgeCSS analyzes your HTML/JS and removes any CSS selector that doesn't appear:

# Install
npm install -D purgecss

# Run against your built files
npx purgecss \
  --css dist/styles.css \
  --content dist/**/*.html dist/**/*.js \
  --output dist/styles.purged.css

If you're using Tailwind CSS, PurgeCSS is built in — Tailwind's JIT mode only generates the classes your templates actually use. A fresh Tailwind project generates under 10KB of CSS. A misconfigured one can ship all 3MB of utility classes.

CSS animations: stick to transform and opacity

This is subtle but impactful. Animating properties that trigger layout (width, height, margin, padding, top, left) forces the browser to recalculate the entire document layout on every frame. This runs on the main thread and competes with your JavaScript.

Animating transform and opacity runs on the compositor thread — completely separate from the main thread, hardware-accelerated, silky smooth even under load:

/* Bad: triggers layout recalc every frame */
.slide-in {
  animation: slide 0.3s ease;
}
@keyframes slide {
  from { left: -100px; }
  to   { left: 0; }
}

/* Good: compositor-only, no layout impact */
.slide-in {
  animation: slide 0.3s ease;
}
@keyframes slide {
  from { transform: translateX(-100px); }
  to   { transform: translateX(0); }
}

Add will-change: transform to elements that will animate, but use it sparingly — it forces the browser to create a separate compositor layer for the element, which costs GPU memory.

Fonts

font-display: swap

Without this, the browser shows invisible text while waiting for the font to load (FOIT — Flash of Invisible Text). With font-display: swap, it shows the fallback font immediately and swaps in the web font when it arrives:

@font-face {
  font-family: 'Inter';
  src: url('/fonts/inter-regular.woff2') format('woff2');
  font-weight: 400;
  font-style: normal;
  font-display: swap; /* Show fallback immediately, swap when ready */
}

Preconnect to font servers

If you're using Google Fonts or another external font service, preconnect eliminates the DNS lookup + TLS handshake delay before the browser can even start downloading the font:

<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700&display=swap"
      rel="stylesheet">

Note the crossorigin attribute on the gstatic preconnect — required because fonts are loaded with CORS.

Self-hosting vs CDN

Self-hosting fonts eliminates the external DNS lookup entirely and gives you control over caching headers. The tradeoff: you lose the (now mostly mythical) shared cache benefit — browsers stopped sharing cross-origin caches in 2020 for privacy reasons.

Self-hosting is generally the right call for performance. Use google-webfonts-helper.herokuapp.com to download the woff2 files directly.

Subsetting fonts

A full Inter font file is 300KB. If your site is English-only, you only need the Latin character set — around 30KB. Subset with pyftsubset:

# Install fonttools
pip install fonttools brotli

# Subset to Latin characters only
pyftsubset inter-regular.ttf \
  --output-file=inter-latin.woff2 \
  --flavor=woff2 \
  --layout-features='' \
  --unicodes="U+0020-007E,U+00A0-00FF"

For Google Fonts, you can pass &text= to get a subset URL, but the proper approach is to download and self-host your subset.

Caching

Cache-Control headers

The right caching strategy depends on whether your asset filenames are hashed or stable:

# Hashed assets (app.a3f9b2c1.js, styles.8d4e1f20.css)
# These can be cached forever — the filename changes when content changes
Cache-Control: public, max-age=31536000, immutable

# HTML files — never cache these, always revalidate
Cache-Control: no-cache

# APIs — usually no caching or short TTLs
Cache-Control: no-store

# Versioned but stable path assets (images referenced by URL)
# Cache for a week, allow revalidation
Cache-Control: public, max-age=604800, stale-while-revalidate=86400

The immutable directive tells the browser "this file will never change at this URL, don't even bother checking." It eliminates the conditional GET request on repeat visits. Only use it for content-hashed assets.

nginx configuration for caching

# /etc/nginx/sites-available/yoursite

server {
    listen 443 ssl http2;
    server_name yoursite.com;

    root /var/www/yoursite;

    # HTML: no cache
    location ~* \.html$ {
        add_header Cache-Control "no-cache";
    }

    # Hashed assets: cache forever
    location ~* \.(js|css)\?v= {
        add_header Cache-Control "public, max-age=31536000, immutable";
    }

    # Hashed filenames (common webpack/vite output)
    location ~* \.[0-9a-f]{8,}\.(js|css|woff2)$ {
        add_header Cache-Control "public, max-age=31536000, immutable";
    }

    # Images: cache for a week
    location ~* \.(webp|avif|jpg|jpeg|png|gif|svg|ico)$ {
        add_header Cache-Control "public, max-age=604800, stale-while-revalidate=86400";
    }
}

You can verify these headers are set correctly using the API Tester — just enter your asset URL and check the response headers in the output.

Service workers for offline caching

For progressive web apps or any site that benefits from offline access, a service worker gives you a programmable cache. The Cache API lets you cache responses on first visit and serve them instantly on repeat visits — even offline:

// sw.js — minimal service worker for static assets
const CACHE_NAME = 'v1';
const STATIC_ASSETS = ['/', '/styles.css', '/app.js'];

self.addEventListener('install', event => {
  event.waitUntil(
    caches.open(CACHE_NAME).then(cache => cache.addAll(STATIC_ASSETS))
  );
});

self.addEventListener('fetch', event => {
  event.respondWith(
    caches.match(event.request).then(cached => {
      // Serve from cache, fetch in background to update
      const networkFetch = fetch(event.request).then(response => {
        const clone = response.clone();
        caches.open(CACHE_NAME).then(cache => cache.put(event.request, clone));
        return response;
      });
      return cached || networkFetch;
    })
  );
});

Server-side wins

HTTP/2 and HTTP/3

HTTP/2 multiplexes requests over a single connection — eliminating the "6 parallel requests per domain" limit of HTTP/1.1. HTTP/3 uses QUIC instead of TCP, which dramatically reduces latency on lossy connections (mobile networks). Both are enabled at the nginx/load-balancer level:

# nginx.conf — enable HTTP/2 and HTTP/3
server {
    listen 443 ssl http2;       # HTTP/2
    listen 443 quic reuseport;  # HTTP/3 (QUIC)
    http3 on;

    # Advertise HTTP/3 support
    add_header Alt-Svc 'h3=":443"; ma=86400';

    ssl_certificate     /etc/letsencrypt/live/yoursite.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yoursite.com/privkey.pem;
}

Compression: Brotli over gzip

Brotli consistently compresses 15-25% better than gzip for text assets (HTML, CSS, JS). It's supported by all modern browsers. On nginx:

# Install brotli module (Debian/Ubuntu)
apt install libnginx-mod-brotli

# nginx.conf
brotli on;
brotli_comp_level 6;         # 1-11, 6 is sweet spot for speed vs ratio
brotli_types
    text/html
    text/css
    text/javascript
    application/javascript
    application/json
    application/xml
    image/svg+xml
    font/woff2;

# Keep gzip as fallback for older browsers
gzip on;
gzip_vary on;
gzip_comp_level 6;
gzip_types text/plain text/css application/javascript application/json;

Check whether your server is actually compressing with:

curl -H "Accept-Encoding: br" -I https://yoursite.com/styles.css | grep content-encoding
# Should return: content-encoding: br

TTFB optimization

Time to First Byte is a server-side metric. If your TTFB is above 600ms, look at:

Database queries — N+1 queries are the most common culprit. Use query logs to find slow queries
No server-side caching — even a 60-second Redis cache on heavily-trafficked pages can cut TTFB by 95%
Geographic distance — a server in US-East serving users in Southeast Asia adds 200ms of physical latency. Use a CDN edge network.
Cold starts — serverless functions that haven't been called recently pay an initialization penalty

CDN placement

A CDN (Content Delivery Network) caches your static assets at edge locations around the world. A user in Tokyo gets your CSS from a Tokyo edge node, not from your Oregon origin server. For purely static sites, a CDN with full-site caching can get TTFB under 50ms globally.

Cloudflare's free tier is a reasonable starting point. For more control, consider Bunny CDN or Fastly. The main configuration task: make sure your Cache-Control headers are set correctly so the CDN actually caches your responses.

The audit checklist

Run through these in order — the first items are highest impact for the least effort:

Run Lighthouse CLI, note your LCP, CLS, INP scores and their specific causes
Check the Coverage tab — identify JavaScript and CSS files with >50% unused coverage
Convert all JPEG/PNG images to WebP (or AVIF for hero images)
Add explicit width and height to every img tag (eliminate CLS)
Add loading="lazy" to all images below the fold
Preload the LCP image with <link rel="preload" as="image">
Add srcset and sizes to images that appear at different widths across breakpoints
Verify all scripts in <head> have defer or async (no render-blocking scripts)
Run npx depcheck — remove unused npm packages
Check bundlephobia.com for your 5 largest dependencies — are there lighter alternatives?
Enable route-based code splitting if you have a single-page app
Run PurgeCSS — check what percentage of your CSS is actually used
Extract critical CSS and inline it; load the rest non-blocking
Replace any CSS animations on layout properties with transform/opacity equivalents
Add font-display: swap to all @font-face declarations
Add rel="preconnect" for any external font or asset domains
Set correct Cache-Control headers: immutable for hashed assets, no-cache for HTML
Enable Brotli compression in nginx (verify with curl -H "Accept-Encoding: br")
Confirm HTTP/2 is enabled (check the Protocol column in DevTools Network tab)
Measure TTFB — if >600ms, profile your server-side rendering and database queries

Typical results after a full audit

A marketing site starting at 8.2MB page weight and 42 Lighthouse performance score typically reaches 1.4MB and 88 score after running through this checklist. LCP drops from 6.2s to under 2.5s (the "good" threshold). Real-world conversion rate improvement varies, but 10-15% is documented repeatedly at this level of improvement.

Tooling

Lighthouse CI for PRs

Automate performance testing so regressions get caught before they ship:

# .github/workflows/lighthouse.yml
name: Lighthouse CI
on: [pull_request]

jobs:
  lighthouse:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: npm ci
      - name: Build
        run: npm run build
      - name: Run Lighthouse CI
        uses: treosh/lighthouse-ci-action@v11
        with:
          urls: |
            http://localhost:3000/
            http://localhost:3000/product/
          budgetPath: ./lighthouse-budget.json
          uploadArtifacts: true

// lighthouse-budget.json — fail the build if these thresholds aren't met
[
  {
    "path": "/*",
    "timings": [
      { "metric": "largest-contentful-paint", "budget": 2500 },
      { "metric": "cumulative-layout-shift",  "budget": 0.1 },
      { "metric": "interactive",              "budget": 3500 }
    ],
    "resourceSizes": [
      { "resourceType": "total",      "budget": 1000 },
      { "resourceType": "script",     "budget": 300 },
      { "resourceType": "image",      "budget": 500 }
    ]
  }
]

bundlephobia.com

Before adding any npm package, paste it into bundlephobia.com. It shows the minified + gzipped bundle size, the download time on 3G, and whether the package supports tree shaking. Make this a mandatory step in your dependency review process.

WebPageTest API

WebPageTest has an API you can integrate into CI for real-browser testing from real locations:

curl "https://www.webpagetest.org/runtest.php\
?url=https://yoursite.com\
&k=YOUR_API_KEY\
&f=json\
&location=Dulles:Chrome\
&runs=3\
&video=1" | jq '.data.testId'

web-vitals npm package for real-user monitoring

Lighthouse measures lab conditions. Real User Monitoring (RUM) captures what actual visitors experience — their device, their connection, their browser extensions:

import { onCLS, onINP, onLCP } from 'web-vitals';

function sendToAnalytics(metric) {
  // Send to your analytics endpoint
  navigator.sendBeacon('/analytics', JSON.stringify({
    name: metric.name,
    value: metric.value,
    rating: metric.rating, // 'good', 'needs-improvement', or 'poor'
    id: metric.id,
    url: location.href,
  }));
}

onCLS(sendToAnalytics);
onINP(sendToAnalytics);
onLCP(sendToAnalytics);

This gives you a distribution of real-world CWV scores across your actual users. A p75 LCP of 4.2s is much more actionable than a single Lighthouse score. You can also use GoatCounter's event API if you want a lightweight solution without a separate analytics service.

Other useful tools

source-map-explorer — treemap visualization of your JavaScript bundle contents
squoosh.app — browser-based image compression tool, great for one-off conversions
pagespeed.web.dev — Google's PageSpeed Insights with field data from the Chrome UX Report
requestmap.webperf.tools — visualize which domains your page makes requests to (great for identifying tracker bloat)
API Tester — verify your Cache-Control, Content-Encoding, and other HTTP headers are set correctly
Cron Generator — if you need to schedule regular Lighthouse CI reports or sitemap pings

The 49MB lesson

The viral 49MB page was an extreme example, but every bloated page got there the same way: one decision at a time, each one defensible in isolation. The fix isn't a single optimization — it's adding performance to your definition of "done." A PR that ships a 500KB unoptimized image should fail review the same way a PR with a security hole would. Make the checklist part of your process, not a one-time audit.