A high-performance Rust utility for calculating and injecting web page sizes into HTML files with advanced optimization and parallel processing.
Description
webweigh is a command-line tool that scans HTML files in a directory, calculates their total page size (including referenced assets like CSS, JavaScript, and images), and optionally injects this size information into specified HTML elements.
Key capabilities:
- Scan mode: Calculate total website size without modifying files
- Injection mode: Update HTML files with calculated page sizes
- Analysis mode: Detailed breakdown of page size calculations for debugging
- Advanced asset detection: Finds assets referenced in CSS, JS, and HTML
- Smart comment handling: Ignores commented-out code (matches browser behavior)
- Performance optimized: Parallel processing with intelligent caching
This is a Rust rewrite of the Python script originally used by solar.lowtech-website, offering significantly improved performance and new features.
Features
Core Functionality
- Triple Operation Modes: Scan-only, injection, or analysis mode for debugging
- Page Analysis Mode: Detailed breakdown by asset type with size calculations
- Comprehensive Asset Detection: CSS, JavaScript, images, fonts, and dynamic assets
- Smart Comment Stripping: Ignores commented-out code in CSS and JavaScript
- Advanced Asset Parsing: Detects dynamically loaded assets (script.src, fetch calls, etc.)
- Nested Dependency Resolution: Follows asset imports and dependencies
Performance & Optimization
- Parallel Processing: Multi-threaded file processing with Rayon
- Intelligent Caching: File content and metadata caching to eliminate redundant I/O
- Optimized Regex Engine: Pre-filtering with early exit patterns for 3-5x faster parsing
- Memory Efficient: Reusable buffers and zero-allocation utilities
User Experience
- Flexible CSS Selectors: Target any HTML element for size injection
- Human-Readable Output: IEC units (KiB, MiB, GiB) with precise formatting
- Advanced Exclusion System: Regex patterns with contextual exceptions
- Configurable Logging: Silent to trace verbosity levels
- Base URL Handling: Strip URL prefixes for relative path calculation
- Comprehensive Statistics: Detailed processing reports and error context
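The human-readable output mentioned above uses IEC binary units, where each step is a factor of 1024. The following is a minimal sketch of that conversion, not webweigh's actual formatter: the function name `format_iec` and the two-decimal precision are assumptions made for illustration.

```rust
/// Format a byte count using IEC binary units (1 KiB = 1024 B).
/// Hypothetical helper: the name and the two-decimal precision are assumptions.
fn format_iec(bytes: u64) -> String {
    const UNITS: [&str; 5] = ["B", "KiB", "MiB", "GiB", "TiB"];
    let mut value = bytes as f64;
    let mut unit = 0;
    // Divide by 1024 until the value fits the current unit or units run out.
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    if unit == 0 {
        // Plain byte counts are printed without a fractional part.
        format!("{} B", bytes)
    } else {
        format!("{:.2} {}", value, UNITS[unit])
    }
}
```

Under these assumptions, a 1536-byte page would be reported as 1.50 KiB.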
Installation
Binary releases
Currently, I only build binaries for Linux x86. You can grab the executable from the releases page.
Cargo
cargo install --git https://codeberg.org/Pontoporeia/webweigh
From Source
See BUILDING.md for detailed build instructions.
git clone https://codeberg.org/Pontoporeia/webweigh.git
cd webweigh
cargo install --path .
Prerequisites
- Rust 1.70+ (recommended)
- Cargo package manager
Usage
Basic Usage
Scan mode (calculate sizes without modifying files):
webweigh --directory /path/to/website
Injection mode (update HTML files with calculated sizes):
webweigh --directory /path/to/website --selector ".page-size"
Common Examples
Scan a website (read-only analysis):
webweigh --directory ./dist
Update page sizes in elements with class "page-size":
webweigh --directory ./dist --selector ".page-size"
Process with exclusions and exceptions:
webweigh --directory ./dist --selector ".size" \
--exclude "portfolio" "demo" \
--except "portfolio/index.html"
Remove base URL prefix from asset paths:
webweigh --directory ./dist --selector ".size" \
--base-url "https://example.com"
Dry run (calculate and validate selectors but write nothing):
webweigh --directory ./dist --selector ".page-size" --dry-run
Verbose processing with detailed logs:
webweigh --directory ./dist --selector "#size-info" -vv
Silent operation (no output):
webweigh --directory ./dist --selector "body" --silent
Analyze a specific page (detailed breakdown for debugging):
webweigh --directory ./dist --analyze-page "/"
webweigh --directory ./dist --analyze-page "/portfolio/" --base-url "https://example.com"
This will show:
- Base HTML size
- Assets grouped by type (Stylesheets, Scripts, Images, Fonts, etc.)
- Size and percentage contribution of each asset type
- Detailed list of all assets sorted by size
This is useful for debugging discrepancies between webweigh calculations and browser dev tools.
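The per-type breakdown can be pictured as a small aggregation: sum the group sizes, compute each group's share, and sort by size. The sketch below assumes the sizes have already been grouped by asset type; the function name `breakdown` and its tuple layout are hypothetical and do not reflect webweigh's internal types.

```rust
/// Hypothetical breakdown rows: (asset type label, bytes, percent of total),
/// sorted with the largest group first.
fn breakdown(groups: &[(&str, u64)]) -> Vec<(String, u64, f64)> {
    let total: u64 = groups.iter().map(|(_, size)| size).sum();
    let mut rows: Vec<(String, u64, f64)> = groups
        .iter()
        .map(|(label, size)| {
            // Guard against an empty page: a zero total would divide by zero.
            let percent = if total == 0 {
                0.0
            } else {
                *size as f64 * 100.0 / total as f64
            };
            (label.to_string(), *size, percent)
        })
        .collect();
    rows.sort_by(|a, b| b.1.cmp(&a.1));
    rows
}
```

The zero-total guard matters for pages whose assets are all missing or empty, where every share is reported as 0 instead of NaN.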
Excludes & Exceptions
Exclude Examples:
# excludes entire directory
-e static/portfolio
-e static/portfolio templates
# regex: excludes .tmp files in static/
-e "static/.*\.tmp$"
# regex: excludes all .html files in content/microblog/
-e "content/microblog/.*\.html$"
Exceptions Examples:
# excludes all HTML in content/microblog/ except index.html in that directory
-e "content/microblog/.*\.html$" --except index.html
# excludes content/temp/ directory except files ending with important.txt
-e content/temp --except "important\.txt$"
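The interaction between excludes and exceptions can be summarized as: a path is skipped when it matches an exclude pattern, unless an exception pattern rescues it. The sketch below illustrates only that decision rule. It deliberately uses literal substring and suffix matching instead of regex, and the `ExclusionRules` type is hypothetical, so it should not be read as webweigh's matching logic.

```rust
/// Hypothetical rule set using literal patterns only.
/// Assumption: the real tool also accepts regex patterns, which this sketch omits.
struct ExclusionRules<'a> {
    exclude: Vec<&'a str>,
    except: Vec<&'a str>,
}

impl ExclusionRules<'_> {
    /// A path is skipped when an exclude pattern matches it
    /// and no exception pattern rescues it.
    fn is_excluded(&self, path: &str) -> bool {
        let excluded = self.exclude.iter().any(|p| path.contains(*p));
        let rescued = self.except.iter().any(|p| path.ends_with(*p));
        excluded && !rescued
    }
}
```

With `exclude = ["content/temp"]` and `except = ["important.txt"]`, the file `content/temp/important.txt` is kept while every other file under `content/temp` is skipped.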
Command Line Options
Usage: webweigh [OPTIONS] --directory <DIR>
Options:
-d, --directory <DIR> Directory to traverse (required)
--base-url <URL> Base URL prefix to remove from asset paths
--selector <SELECTOR> CSS selector for size injection (optional — scans without modification if omitted)
--analyze-page <PATH> Analyze specific page with detailed breakdown (e.g., "/", "/portfolio/")
--dry-run Calculate sizes and validate selectors but do not write any files
-e, --exclude <PATTERN>... Exclude paths/patterns (supports literal paths and regex)
--except <PATTERN>... Exception patterns within exclusion scope
-v, --verbose... Logging verbosity: -v (errors), -vv (info), -vvv (debug), -vvvv (trace)
-s, --silent Suppress all output
-h, --help Show help information
-V, --version Show version
Verbosity Levels
- Default: Shows only the final statistics report (no log output)
- -v: Shows error-level logs only
- -vv: Shows info-level logs (processing start, configuration)
- -vvv: Shows debug-level logs (detailed processing information)
- -vvvv: Shows trace-level logs (maximum detail)
- --silent: No output at all (suppresses even the statistics report)
Supported Assets
Direct References:
- HTML files (.html, .htm)
- CSS stylesheets (<link rel="stylesheet">)
- JavaScript files (<script src>)
- Images (<img src>, CSS url())
- Fonts (CSS @font-face, url())
- Icons (<link rel="icon">)
Dynamic Assets (detected in JS/CSS):
- Dynamic script loading (script.src = "...")
- Dynamic imports (import(), require())
- Fetch calls (fetch("/api/data.json"))
- CSS imports (@import url(...))
- Asset assignments (link.href = "...")
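As an illustration of how references such as CSS url() can be discovered, here is a minimal, stdlib-only scanner. It is a sketch under stated assumptions: webweigh itself relies on the regex crate for asset discovery, the function name `extract_css_urls` is hypothetical, and skipping inline `data:` URIs is an assumption made here because they do not point at a separate file. The sketch also does not strip comments first.

```rust
/// Collect the targets of `url(...)` references in a stylesheet.
/// Hypothetical helper; a plain string scan stands in for regex matching.
fn extract_css_urls(css: &str) -> Vec<&str> {
    let mut urls = Vec::new();
    let mut rest = css;
    while let Some(start) = rest.find("url(") {
        let after = &rest[start + 4..];
        // Stop at an unterminated `url(`: there is nothing reliable to extract.
        let Some(end) = after.find(')') else { break };
        // Drop surrounding whitespace and optional single or double quotes.
        let target = after[..end]
            .trim()
            .trim_matches(|c| c == '"' || c == '\'');
        // Assumption: inline data URIs are not separate files, so skip them.
        if !target.is_empty() && !target.starts_with("data:") {
            urls.push(target);
        }
        rest = &after[end + 1..];
    }
    urls
}
```

Each returned path would then be resolved against the site directory and followed for nested references.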
What's New in v0.4.0
Refactoring & Optimization
- Dependency reduction: removed thiserror, tempfile, walkdir, once_cell, percent-encoding, env_logger, clap_derive/heck; stdlib or inline replacements throughout
- CLI: switched Clap from derive to builder API, eliminating proc-macro compile overhead
- Performance: strip_css/js_comments now avoids allocation for comment-free files; redundant Html::parse_document call eliminated; inner Rayon threshold removed from collect_nested_assets_cached
- Correctness: display_analysis guarded against divide-by-zero; replace_by_* loops converted to plain conditionals (fixes Clippy deny-level never_loop)
- New flag: --dry-run calculates sizes and validates selectors without writing any files
- Naming: module and function names aligned with §4 conventions
- Test scaffold: browser integration test skeleton added (--features browser-tests)
What's New in v0.3.0
Major Features
- Page Analysis Mode: New --analyze-page flag provides detailed breakdown of page size calculations
  - View assets grouped by type (Stylesheets, Scripts, Images, Fonts, Icons)
  - See size and percentage contribution of each asset type
  - Complete asset list sorted by size for easy debugging
  - Perfect for investigating discrepancies with browser dev tools
- Smart Comment Stripping: Revolutionary accuracy improvement
  - Ignores commented-out CSS (/* @import "file.css"; */)
  - Skips commented JavaScript (// import './module.js';)
  - Preserves comment-like text in strings and templates
  - Matches browser behavior: Only counts assets that are actually loaded
  - Significantly more accurate size calculations
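To make the comment-stripping behavior concrete, the sketch below removes /* ... */ comments from CSS while leaving quoted strings untouched, and returns a borrowed value when the input contains no comment at all. It is an illustrative sketch only: the body is not taken from webweigh, and using `Cow` for the comment-free fast path is an assumption about how the allocation could be avoided.

```rust
use std::borrow::Cow;

/// Remove `/* ... */` comments from CSS, leaving quoted strings untouched.
/// Returns `Cow::Borrowed` when the input has no comment opener, so the
/// common comment-free case does not allocate. Illustrative sketch only.
fn strip_css_comments(css: &str) -> Cow<'_, str> {
    if !css.contains("/*") {
        return Cow::Borrowed(css);
    }
    let mut out = String::with_capacity(css.len());
    let mut chars = css.chars().peekable();
    let mut quote: Option<char> = None; // Some(q) while inside a string opened by q
    while let Some(c) = chars.next() {
        match quote {
            Some(q) => {
                out.push(c);
                if c == '\\' {
                    // Copy the escaped character so an escaped quote does not end the string.
                    if let Some(escaped) = chars.next() {
                        out.push(escaped);
                    }
                } else if c == q {
                    quote = None;
                }
            }
            None if c == '"' || c == '\'' => {
                quote = Some(c);
                out.push(c);
            }
            None if c == '/' && chars.peek() == Some(&'*') => {
                chars.next(); // consume the '*' that opens the comment
                let mut prev = '\0';
                // Skip to the closing "*/", or to end of input if unterminated.
                for d in chars.by_ref() {
                    if prev == '*' && d == '/' {
                        break;
                    }
                    prev = d;
                }
            }
            None => out.push(c),
        }
    }
    Cow::Owned(out)
}
```

Running asset discovery on the stripped text means a commented-out @import is never counted, while a string such as content: "/* keep */" survives unchanged.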
Testing & Quality
- Comprehensive Test Suite: 41 test cases including 12 new comment stripping tests
- All Tests Passing: Verified behavior for edge cases and real-world scenarios
- Better Accuracy: Calculations now match browser dev tools more closely
What's New in v0.2.0
Major Refactoring
- Modular Architecture: Split monolithic code into focused modules
- Performance Optimizations: 3-5x faster processing with intelligent caching
- Enhanced Asset Detection: Comprehensive dynamic asset discovery
New Features
- Scan-only Mode: Analyze websites without modifying files
- Parallel Processing: Multi-threaded asset collection and processing
- Advanced Caching: File content and metadata caching
- Memory Optimization: Reduced allocations and memory reuse
Improvements
- Better Error Handling: More descriptive error messages
- Enhanced Exclusions: Improved regex vs literal pattern detection
- Zero Warnings: Clean compilation with no warnings
- Comprehensive Tests: 26 test cases covering all functionality
Dependencies
Core runtime dependencies:
- clap: Command-line argument parsing (builder API, no proc-macro derive)
- scraper: HTML parsing and CSS selector evaluation
- regex: Pattern matching for asset discovery in CSS and JS
- rayon: Work-stealing parallel processing
- log: Logging facade (stderr logger is inlined; no env_logger required)
- anyhow: Error handling in the binary entry point
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines on how to contribute to webweigh.
For build instructions and development setup, see BUILDING.md.
Credits
Icon: Phosphor Icons
License
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
Copyright (C) 2025 Pontoporeia
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License along with this program. If not, see http://www.gnu.org/licenses/.