A Rust CLI that calculates the total size of a web page when loaded with all its external resources.


A high-performance Rust utility for calculating and injecting web page sizes into HTML files with advanced optimization and parallel processing.

Description

webweigh is a command-line tool that scans HTML files in a directory, calculates their total page size (including referenced assets like CSS, JavaScript, and images), and optionally injects this size information into specified HTML elements.
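To illustrate what "total page size" means, the sketch below adds the HTML file's own size to the size of each referenced asset, counting an asset only once even when the page references it several times (a browser downloads it once). The function name `page_weight` and the choice to count missing assets as 0 bytes are assumptions made for illustration. This is not webweigh's actual code.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical sketch: page weight = HTML size + size of each *unique*
/// referenced asset. Assets with no known size are counted as 0 bytes
/// (an assumption; the real tool may report them differently).
fn page_weight(html_bytes: u64, referenced: &[&str], asset_sizes: &HashMap<&str, u64>) -> u64 {
    let mut seen = HashSet::new();
    let mut total = html_bytes;
    for path in referenced {
        // `insert` returns false for an asset that was already counted.
        if seen.insert(*path) {
            total += asset_sizes.get(path).copied().unwrap_or(0);
        }
    }
    total
}
```

With a 1,000-byte page that references `/style.css` (2,000 bytes) twice and `/app.js` (500 bytes) once, the stylesheet is counted once, for a total of 3,500 bytes.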

Key capabilities:

  • Scan mode: Calculate total website size without modifying files
  • Injection mode: Update HTML files with calculated page sizes
  • Analysis mode: Detailed breakdown of page size calculations for debugging
  • Advanced asset detection: Finds assets referenced in CSS, JS, and HTML
  • Smart comment handling: Ignores commented-out code (matches browser behavior)
  • Performance optimized: Parallel processing with intelligent caching

This is a Rust rewrite of the Python script originally used by solar.lowtech-website, offering significantly improved performance and new features.

Features

Core Functionality

  • Three Operation Modes: Scan-only, injection, or analysis mode for debugging
  • Page Analysis Mode: Detailed breakdown by asset type with size calculations
  • Comprehensive Asset Detection: CSS, JavaScript, images, fonts, and dynamic assets
  • Smart Comment Stripping: Ignores commented-out code in CSS and JavaScript
  • Advanced Asset Parsing: Detects dynamically loaded assets (script.src, fetch calls, etc.)
  • Nested Dependency Resolution: Follows asset imports and dependencies
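Nested dependency resolution is essentially a graph traversal. The sketch below is a minimal version, assuming the per-asset references (for example CSS `@import` or JS `import()`) are already known and stored in a map. It walks them breadth-first and uses a visited set so that circular imports cannot loop forever. The name `collect_nested` and the map-based input are hypothetical. webweigh's own `collect_nested_assets_cached` discovers references by parsing files and may order results differently.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical sketch: starting from the assets a page references directly,
/// follow each asset's own references breadth-first. The `visited` set makes
/// cycles (a.css imports b.css imports a.css) harmless.
fn collect_nested<'a>(roots: &[&'a str], deps: &HashMap<&'a str, Vec<&'a str>>) -> Vec<&'a str> {
    let mut visited: HashSet<&'a str> = HashSet::new();
    let mut queue: VecDeque<&'a str> = roots.iter().copied().collect();
    let mut order = Vec::new();
    while let Some(asset) = queue.pop_front() {
        if !visited.insert(asset) {
            continue; // already collected through another path
        }
        order.push(asset);
        if let Some(children) = deps.get(&asset) {
            queue.extend(children.iter().copied());
        }
    }
    order
}
```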

Performance & Optimization

  • Parallel Processing: Multi-threaded file processing with Rayon
  • Intelligent Caching: File content and metadata caching to eliminate redundant I/O
  • Optimized Regex Engine: Pre-filtering with early exit patterns for 3-5x faster parsing
  • Memory Efficient: Reusable buffers and zero-allocation utilities
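Metadata caching can be pictured as a map from path to size that is filled on first access. The sketch below is a minimal, thread-safe version built on `std::sync::Mutex`, so it could be shared across worker threads such as Rayon's. The `SizeCache` type and its methods are hypothetical names. The actual caching layer in webweigh may be structured differently, for example by also caching file contents.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::Mutex;

/// Hypothetical sketch: each file's size is read from disk at most once,
/// however many pages reference it.
struct SizeCache {
    sizes: Mutex<HashMap<PathBuf, u64>>,
}

impl SizeCache {
    fn new() -> Self {
        SizeCache { sizes: Mutex::new(HashMap::new()) }
    }

    fn size_of(&self, path: &Path) -> io::Result<u64> {
        if let Some(&len) = self.sizes.lock().unwrap().get(path) {
            return Ok(len); // cache hit: no filesystem access
        }
        // Cache miss: stat the file without holding the lock, then store the result.
        let len = fs::metadata(path)?.len();
        self.sizes.lock().unwrap().insert(path.to_path_buf(), len);
        Ok(len)
    }

    fn cached_entries(&self) -> usize {
        self.sizes.lock().unwrap().len()
    }
}
```

Two threads that miss at the same time may both call `fs::metadata`. Both write the same value, so the result stays correct while the lock remains short-lived.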

User Experience

  • Flexible CSS Selectors: Target any HTML element for size injection
  • Human-Readable Output: IEC units (KiB, MiB, GiB) with precise formatting
  • Advanced Exclusion System: Regex patterns with contextual exceptions
  • Configurable Logging: Silent to trace verbosity levels
  • Base URL Handling: Strip URL prefixes for relative path calculation
  • Comprehensive Statistics: Detailed processing reports and error context
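IEC units are powers of 1024: 1 KiB = 1024 bytes, 1 MiB = 1024 KiB, and so on. The sketch below is a minimal formatter that assumes two decimal places and plain bytes below 1 KiB. The name `format_iec` and that exact output format are assumptions. webweigh's own formatting may differ.

```rust
/// Hypothetical sketch of IEC (base-1024) size formatting.
/// Assumption: "<n> B" below 1 KiB, otherwise two decimals.
fn format_iec(bytes: u64) -> String {
    const UNITS: [&str; 4] = ["KiB", "MiB", "GiB", "TiB"];
    if bytes < 1024 {
        return format!("{bytes} B");
    }
    let mut value = bytes as f64;
    let mut unit = "B";
    // Divide by 1024 until the value fits the current unit.
    for u in UNITS {
        if value < 1024.0 {
            break;
        }
        value /= 1024.0;
        unit = u;
    }
    format!("{value:.2} {unit}")
}
```

For example, `format_iec(1536)` yields `"1.50 KiB"` and `format_iec(1024 * 1024)` yields `"1.00 MiB"`.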

Installation

Binary releases

Currently, I only build for Linux x86. You can download the executable from the releases page.

Cargo

cargo install --git https://codeberg.org/Pontoporeia/webweigh

From Source

See BUILDING.md for detailed build instructions.

git clone https://codeberg.org/Pontoporeia/webweigh.git
cd webweigh
cargo install --path .

Prerequisites

  • Rust 1.70+ (recommended)
  • Cargo package manager

Usage

Basic Usage

Scan mode (calculate sizes without modifying files):

webweigh --directory /path/to/website

Injection mode (update HTML files with calculated sizes):

webweigh --directory /path/to/website --selector ".page-size"

Common Examples

Scan a website (read-only analysis):

webweigh --directory ./dist

Update page sizes in elements with class "page-size":

webweigh --directory ./dist --selector ".page-size"

Process with exclusions and exceptions:

webweigh --directory ./dist --selector ".size" \
  --exclude "portfolio" "demo" \
  --except "portfolio/index.html"

Remove base URL prefix from asset paths:

webweigh --directory ./dist --selector ".size" \
  --base-url "https://example.com"
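Conceptually, `--base-url` turns absolute links to your own site back into site-relative paths that can be resolved inside `--directory`. The sketch below is one minimal way to do that. The function name and the rule that the remainder must start with `/` (so that `https://example.com.evil.org` is not mistaken for `https://example.com`) are assumptions, not the tool's documented behavior.

```rust
/// Hypothetical sketch of `--base-url` stripping.
fn strip_base_url<'a>(url: &'a str, base: &str) -> &'a str {
    let base = base.trim_end_matches('/');
    match url.strip_prefix(base) {
        Some("") => "/",                             // the base itself is the site root
        Some(rest) if rest.starts_with('/') => rest, // ".../css/main.css" -> "/css/main.css"
        _ => url,                                    // relative path or another host: unchanged
    }
}
```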

Dry run (calculate and validate selectors but write nothing):

webweigh --directory ./dist --selector ".page-size" --dry-run

Verbose processing with detailed logs:

webweigh --directory ./dist --selector "#size-info" -vv

Silent operation (no output):

webweigh --directory ./dist --selector "body" --silent

Analyze a specific page (detailed breakdown for debugging):

webweigh --directory ./dist --analyze-page "/"
webweigh --directory ./dist --analyze-page "/portfolio/" --base-url "https://example.com"

This will show:

  • Base HTML size
  • Assets grouped by type (Stylesheets, Scripts, Images, Fonts, etc.)
  • Size and percentage contribution of each asset type
  • Detailed list of all assets sorted by size

This is useful for debugging discrepancies between webweigh's calculations and the sizes reported by browser dev tools.
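The per-type breakdown amounts to grouping assets and dividing each group's bytes by the page total. The sketch below classifies assets by file extension (the extension-to-type table is an assumption, not webweigh's exact rules) and shows the kind of divide-by-zero guard mentioned in the v0.4.0 notes: an empty page reports 0% instead of `NaN`. The names `asset_type` and `breakdown` are hypothetical.

```rust
use std::collections::BTreeMap;
use std::path::Path;

/// Hypothetical extension-based classification (assumed mapping).
fn asset_type(path: &str) -> &'static str {
    let ext = Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        .unwrap_or("")
        .to_ascii_lowercase();
    match ext.as_str() {
        "css" => "Stylesheets",
        "js" | "mjs" => "Scripts",
        "png" | "jpg" | "jpeg" | "gif" | "webp" | "avif" | "svg" => "Images",
        "woff" | "woff2" | "ttf" | "otf" => "Fonts",
        "ico" => "Icons",
        _ => "Other",
    }
}

/// Returns type -> (bytes, percentage of the total page weight).
fn breakdown(html_bytes: u64, assets: &[(&str, u64)]) -> BTreeMap<&'static str, (u64, f64)> {
    let total = html_bytes + assets.iter().map(|(_, size)| size).sum::<u64>();
    let mut by_type: BTreeMap<&'static str, u64> = BTreeMap::new();
    by_type.insert("HTML", html_bytes);
    for (path, size) in assets {
        *by_type.entry(asset_type(path)).or_insert(0) += size;
    }
    by_type
        .into_iter()
        .map(|(kind, bytes)| {
            // Guard: an empty page would otherwise divide by zero.
            let pct = if total == 0 { 0.0 } else { bytes as f64 * 100.0 / total as f64 };
            (kind, (bytes, pct))
        })
        .collect()
}
```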

Excludes & Exceptions

Exclude Examples:

# excludes an entire directory
-e static/portfolio
# several patterns can follow a single -e
-e static/portfolio templates

# regex: excludes .tmp files in static/
-e "static/.*\.tmp$" 

# regex: excludes all .html files in content/microblog/
-e "content/microblog/.*\.html$"

Exception examples:

# excludes all HTML in content/microblog/ except index.html in that directory
-e "content/microblog/.*\.html$" --except index.html

# excludes content/temp/ directory except files ending with important.txt
-e content/temp --except "important\.txt$"
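The examples above mix literal paths (`static/portfolio`, `index.html`) with regexes (`static/.*\.tmp$`). A simple way to tell them apart is to look for regex metacharacters. The sketch below shows such a heuristic, plus component-wise matching for the literal case. Both the metacharacter list (which deliberately leaves out `.` so that `index.html` stays literal) and the matching rules are assumptions, not webweigh's documented logic. Regex patterns would be handed to a regex engine such as the `regex` crate, which this std-only sketch does not do.

```rust
/// Hypothetical classification of an --exclude/--except pattern.
#[derive(Debug, PartialEq)]
enum PatternKind<'a> {
    Literal(&'a str),
    Regex(&'a str),
}

fn classify(pattern: &str) -> PatternKind<'_> {
    // '.' is intentionally absent so that "index.html" stays literal.
    const META: &[char] = &['*', '+', '?', '^', '$', '[', ']', '(', ')', '{', '}', '|', '\\'];
    if pattern.contains(META) {
        PatternKind::Regex(pattern)
    } else {
        PatternKind::Literal(pattern)
    }
}

/// Literal patterns match whole path components: "static/portfolio" matches
/// "static/portfolio/a.html" but not "static/portfolio-old/a.html".
fn literal_matches(literal: &str, path: &str) -> bool {
    let lit = literal.trim_matches('/');
    path == lit
        || path.starts_with(&format!("{lit}/"))
        || path.ends_with(&format!("/{lit}"))
        || path.contains(&format!("/{lit}/"))
}
```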

Command Line Options

Usage: webweigh [OPTIONS] --directory <DIR>

Options:
  -d, --directory <DIR>          Directory to traverse (required)
      --base-url <URL>           Base URL prefix to remove from asset paths
      --selector <SELECTOR>      CSS selector for size injection (optional — scans without modification if omitted)
      --analyze-page <PATH>      Analyze specific page with detailed breakdown (e.g., "/", "/portfolio/")
      --dry-run                  Calculate sizes and validate selectors but do not write any files
  -e, --exclude <PATTERN>...     Exclude paths/patterns (supports literal paths and regex)
      --except <PATTERN>...      Exception patterns within exclusion scope
  -v, --verbose...               Logging verbosity: -v (errors), -vv (info), -vvv (debug), -vvvv (trace)
  -s, --silent                   Suppress all output
  -h, --help                     Show help information
  -V, --version                  Show version

Verbosity Levels

  • Default: Shows only the final statistics report (no log output)
  • -v: Shows error-level logs only
  • -vv: Shows info-level logs (processing start, configuration)
  • -vvv: Shows debug-level logs (detailed processing information)
  • -vvvv: Shows trace-level logs (maximum detail)
  • --silent: No output at all (suppresses even the statistics report)
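The verbosity list above maps the number of `-v` flags to a log level. The sketch below encodes that mapping with a hypothetical `Verbosity` enum that stands in for the `log` crate's level filter. The assumption that `--silent` overrides any `-v` count is based on "suppresses all output", not on the source code.

```rust
/// Hypothetical stand-in for a log level filter.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Verbosity {
    Silent,     // --silent: nothing, not even the statistics report
    ReportOnly, // default: final statistics report only
    Error,
    Info,
    Debug,
    Trace,
}

fn verbosity(v_count: u8, silent: bool) -> Verbosity {
    if silent {
        return Verbosity::Silent; // assumed to win over any -v
    }
    match v_count {
        0 => Verbosity::ReportOnly,
        1 => Verbosity::Error,
        2 => Verbosity::Info,
        3 => Verbosity::Debug,
        _ => Verbosity::Trace, // -vvvv and beyond
    }
}
```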

Supported Assets

Direct References:

  • HTML files (.html, .htm)
  • CSS stylesheets (<link rel="stylesheet">)
  • JavaScript files (<script src>)
  • Images (<img src>, CSS url())
  • Fonts (CSS @font-face, url())
  • Icons (<link rel="icon">)

Dynamic Assets (detected in JS/CSS):

  • Dynamic script loading (script.src = "...")
  • Dynamic imports (import(), require())
  • Fetch calls (fetch("/api/data.json"))
  • CSS imports (@import url(...))
  • Asset assignments (link.href = "...")
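To show the idea behind dynamic-asset detection, the sketch below extracts string-literal arguments of `fetch(...)` calls from JavaScript source. It uses plain string scanning instead of the regexes webweigh actually relies on. It only handles single- or double-quoted literals, skips calls like `fetch(url)` whose argument is a variable, and would also match names ending in `fetch(` such as `prefetch(`. The function name `find_fetch_urls` is hypothetical.

```rust
/// Hypothetical sketch: collect the URL literals passed to fetch(...).
fn find_fetch_urls(js: &str) -> Vec<&str> {
    let mut urls = Vec::new();
    let mut rest = js;
    while let Some(pos) = rest.find("fetch(") {
        rest = &rest[pos + "fetch(".len()..];
        let trimmed = rest.trim_start();
        // Only string literals can be resolved statically.
        let Some(quote) = trimmed.chars().next().filter(|c| *c == '"' || *c == '\'') else {
            continue; // e.g. fetch(url): the argument is a variable
        };
        let body = &trimmed[1..]; // the quote is ASCII, so this slice is valid
        if let Some(end) = body.find(quote) {
            urls.push(&body[..end]);
        }
    }
    urls
}
```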

What's New in v0.4.0

Refactoring & Optimization

  • Dependency reduction: removed thiserror, tempfile, walkdir, once_cell, percent-encoding, env_logger, clap_derive/heck; stdlib or inline replacements throughout
  • CLI: switched Clap from derive to builder API, eliminating proc-macro compile overhead
  • Performance: strip_css/js_comments now avoids allocation for comment-free files; redundant Html::parse_document call eliminated; inner Rayon threshold removed from collect_nested_assets_cached
  • Correctness: display_analysis guarded against divide-by-zero; replace_by_* loops converted to plain conditionals (fixes Clippy deny-level never_loop)
  • New flag: --dry-run — calculate sizes and validate selectors without writing any files
  • Naming: module and function names aligned with §4 conventions
  • Test scaffold: browser integration test skeleton added (--features browser-tests)

What's New in v0.3.0

Major Features

  • Page Analysis Mode: New --analyze-page flag provides detailed breakdown of page size calculations

    • View assets grouped by type (Stylesheets, Scripts, Images, Fonts, Icons)
    • See size and percentage contribution of each asset type
    • Complete asset list sorted by size for easy debugging
    • Perfect for investigating discrepancies with browser dev tools
  • Smart Comment Stripping: major accuracy improvement

    • Ignores commented-out CSS (/* @import "file.css"; */)
    • Skips commented JavaScript (// import './module.js';)
    • Preserves comment-like text in strings and templates
    • Matches browser behavior: Only counts assets that are actually loaded
    • Significantly more accurate size calculations
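The sketch below is a minimal, independently written CSS comment stripper that illustrates two of the points above. It leaves `/*` inside quoted strings alone, and it returns the input unchanged without allocating (`Cow::Borrowed`) when the text contains no `/*` at all. The latter is one plausible reading of the v0.4.0 note about avoiding allocation for comment-free files. The function name is hypothetical, and the sketch handles only `/* ... */` comments and backslash escapes inside strings.

```rust
use std::borrow::Cow;

/// Hypothetical sketch: remove /* ... */ comments outside of quoted strings.
fn strip_css_block_comments(css: &str) -> Cow<'_, str> {
    if !css.contains("/*") {
        return Cow::Borrowed(css); // fast path: nothing to strip, no allocation
    }
    let mut out = String::with_capacity(css.len());
    let mut chars = css.chars().peekable();
    let mut quote: Option<char> = None; // Some(q) while inside a quoted string
    while let Some(c) = chars.next() {
        match quote {
            Some(q) => {
                out.push(c);
                if c == '\\' {
                    // Keep the escaped character verbatim (e.g. \" does not close the string).
                    if let Some(next) = chars.next() {
                        out.push(next);
                    }
                } else if c == q {
                    quote = None;
                }
            }
            None if c == '"' || c == '\'' => {
                quote = Some(c);
                out.push(c);
            }
            None if c == '/' && chars.peek() == Some(&'*') => {
                chars.next(); // consume '*'
                // Skip everything up to and including the closing "*/".
                let mut prev = '\0';
                for d in chars.by_ref() {
                    if prev == '*' && d == '/' {
                        break;
                    }
                    prev = d;
                }
            }
            None => out.push(c),
        }
    }
    Cow::Owned(out)
}
```

In `a { } /* @import "old.css"; */ b { content: "/* kept */"; }`, the `@import` inside the comment is removed, while the comment-like text inside the `content` string survives.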

Testing & Quality

  • Comprehensive Test Suite: 41 test cases including 12 new comment stripping tests
  • All Tests Passing: Verified behavior for edge cases and real-world scenarios
  • Better Accuracy: Calculations now match browser dev tools more closely

What's New in v0.2.0

Major Refactoring

  • Modular Architecture: Split monolithic code into focused modules
  • Performance Optimizations: 3-5x faster processing with intelligent caching
  • Enhanced Asset Detection: Comprehensive dynamic asset discovery

New Features

  • Scan-only Mode: Analyze websites without modifying files
  • Parallel Processing: Multi-threaded asset collection and processing
  • Advanced Caching: File content and metadata caching
  • Memory Optimization: Reduced allocations and memory reuse

Improvements

  • Better Error Handling: More descriptive error messages
  • Enhanced Exclusions: Improved regex vs literal pattern detection
  • Zero Warnings: Clean compilation with no warnings
  • Comprehensive Tests: 26 test cases covering all functionality

Dependencies

Core runtime dependencies:

  • clap: Command-line argument parsing (builder API, no proc-macro derive)
  • scraper: HTML parsing and CSS selector evaluation
  • regex: Pattern matching for asset discovery in CSS and JS
  • rayon: Work-stealing parallel processing
  • log: Logging facade (stderr logger is inlined; no env_logger required)
  • anyhow: Error handling in the binary entry point

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines on how to contribute to webweigh.

For build instructions and development setup, see BUILDING.md.

Credits

Icon: Phosphor Icons

License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).

Copyright (C) 2025 Pontoporeia

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see http://www.gnu.org/licenses/.