Search

Taxus provides a built-in search component with client-side full-text search. The SearchBox island component uses TF-IDF (Term Frequency-Inverse Document Frequency) ranking with English stemming.

Overview

When the islands feature is enabled, the build pipeline:

Generates a search index at dist/search_index.bin
Makes the SearchBox component available for use in templates

The binary index contains:

Document metadata — Title, path, summary, tags, and categories for each page
Inverted index — Mapping from word stems to document IDs with TF-IDF scores

The index is serialized with postcard for compact storage and fast deserialization in the browser.

Enabling Search

Search requires the islands feature:

cargo run --features islands -- build --dir my-site

This generates dist/search_index.bin alongside your static files.

The SearchBox island component provides a ready-to-use search interface. Add it to any template:

<div class="search-container">
  {{ island(component="SearchBox") | safe }}
</div>

Props

Prop	Type	Default	Description
`placeholder`	string	`"Search..."`	Placeholder text for the input
`max_results`	number	`5`	Maximum number of results to display
`class`	string	`""`	Custom CSS classes to append to the outer container

Example with custom props:

{{ island(component="SearchBox", placeholder="Find content...", max_results=10, class="docs-search") | safe }}

Styling

The component renders the following CSS classes, which you can style:

Class	Element
`.search-box`	Container div
`.search-input`	Text input field
`.search-results`	Results list (`<ul>`)
`.search-result`	Individual result item (`<li>`)
`.search-result-link`	Result title link
`.search-result-summary`	Result summary text

Use the class prop to add custom classes for styling hooks:

{{ island(component="SearchBox", class="docs-search") | safe }}

Then target the custom class in your SCSS:

.docs-search .search-input {
  // Custom styles for docs search input
}

Example SCSS:

.search-container {
  max-inline-size: 48rem;
  margin-inline: auto;
  padding-inline: 1.5rem;
}

.search-input {
  font-family: var(--font-mono);
  font-size: 0.95rem;
  padding: 0.6rem 1rem;
  border-radius: 0.5rem;
  border: 1px solid var(--border);
  background-color: var(--bg-surface);
  color: var(--text);
}

.search-input:focus {
  outline: none;
  border-color: var(--accent);
  box-shadow: 0 0 0 3px var(--accent-soft);
}

.search-result {
  background-color: var(--bg-surface);
  border: 1px solid var(--border);
  border-radius: 0.5rem;
  padding: 0.75rem 1rem;
}

.search-result-link {
  font-family: var(--font-mono);
  font-weight: 600;
  color: var(--accent);
  text-decoration: none;
}

.search-result-summary {
  font-size: 0.85rem;
  color: var(--text-muted);
}

How It Works

Indexing Pipeline

Tokenization — Content is split into lowercase words, filtering out words shorter than 3 characters
Stemming — Words are reduced to their root form using the Porter stemmer (e.g., "programming" → "program")
TF-IDF Scoring — Each term gets a weight based on:
- Term Frequency (TF) — How often the term appears in a document
- Inverse Document Frequency (IDF) — How rare the term is across all documents
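
The weighting step above can be sketched as follows. This is a minimal illustration rather than Taxus's actual implementation: the function name `tf_idf_weights` is hypothetical, and the exact formulas (TF as occurrences divided by document length, IDF as `ln(N / df)`) are assumptions, since the page does not specify them. Input documents are assumed to be already tokenized and stemmed.

```rust
use std::collections::HashMap;

/// Hypothetical helper: builds an inverted index of TF-IDF weights from
/// documents that have already been tokenized and stemmed.
/// Assumes TF = occurrences / document length and IDF = ln(N / df).
pub fn tf_idf_weights(docs: &[Vec<String>]) -> HashMap<String, Vec<(u32, f32)>> {
    let total_docs = docs.len() as f32;
    let mut index: HashMap<String, Vec<(u32, f32)>> = HashMap::new();

    // Pass 1: record the term frequency of each term per document.
    for (doc_id, terms) in docs.iter().enumerate() {
        if terms.is_empty() {
            continue;
        }
        let mut counts: HashMap<&str, u32> = HashMap::new();
        for term in terms {
            *counts.entry(term.as_str()).or_insert(0) += 1;
        }
        for (term, count) in counts {
            let tf = count as f32 / terms.len() as f32;
            index
                .entry(term.to_string())
                .or_default()
                .push((doc_id as u32, tf));
        }
    }

    // Pass 2: scale every posting by the term's inverse document frequency.
    for postings in index.values_mut() {
        let idf = (total_docs / postings.len() as f32).ln();
        for (_, weight) in postings.iter_mut() {
            *weight *= idf;
        }
    }
    index
}
```

Under these assumptions, a term that appears in every document gets an IDF of `ln(1) = 0` and so contributes nothing to ranking, while rarer terms receive larger weights.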

Search Query Processing

When a user searches:

The query is tokenized and stemmed using the same process
Each stem's postings are retrieved from the index
TF-IDF scores are summed for matching documents
Results are returned sorted by relevance score
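
The query steps above can be sketched as a small scoring function. It assumes the index has the `HashMap<String, Vec<(u32, f32)>>` shape shown in the API reference and that the query has already been tokenized and stemmed; the name `rank_documents` and the tie-breaking rule are hypothetical, not part of Taxus.

```rust
use std::collections::HashMap;

/// Hypothetical helper: sums the weights of every posting that matches a
/// query stem, then returns document IDs ordered from most to least relevant.
pub fn rank_documents(
    index: &HashMap<String, Vec<(u32, f32)>>,
    query_stems: &[String],
) -> Vec<(u32, f32)> {
    let mut scores: HashMap<u32, f32> = HashMap::new();
    for stem in query_stems {
        // Stems that are missing from the index contribute nothing.
        if let Some(postings) = index.get(stem) {
            for &(doc_id, weight) in postings {
                *scores.entry(doc_id).or_insert(0.0) += weight;
            }
        }
    }

    let mut ranked: Vec<(u32, f32)> = scores.into_iter().collect();
    // Highest score first; ties fall back to document ID (an assumption)
    // so that the output order is deterministic.
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    ranked
}
```

Because scores are simply summed per document, a page that matches several query stems outranks a page that matches only one of them with a similar weight.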

Component Architecture

The SearchBox component:

Uses a 200ms debounce on input to avoid excessive queries
Requires at least 2 characters before searching
Calls the window.wasmBindings.search() function exposed by the WASM client
Relies on the WASM client to lazily load the search index on first use
Truncates results to max_results and displays them in a list
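
The minimum-length check, lazy loading, and truncation described above can be illustrated with a small standalone sketch. Everything here is a stand-in: `search_titles`, `load_index`, and the substring match are hypothetical and replace the real WASM bindings and TF-IDF ranking, and the 200ms debounce is left out because it lives in the browser event handling.

```rust
use std::sync::OnceLock;

/// Minimum query length before a search is attempted (2, per the list above).
const MIN_QUERY_CHARS: usize = 2;

/// Stand-in for the deserialized index; here just a list of page titles.
static INDEX: OnceLock<Vec<String>> = OnceLock::new();

/// Hypothetical loader: runs at most once, on the first real search.
fn load_index() -> Vec<String> {
    vec![
        "Getting Started".into(),
        "Search".into(),
        "Search Styling".into(),
    ]
}

/// Hypothetical sketch of the gate, lazy load, and truncation steps.
pub fn search_titles(query: &str, max_results: usize) -> Vec<&'static str> {
    let query = query.trim().to_lowercase();
    if query.chars().count() < MIN_QUERY_CHARS {
        // Too short: skip both the search and the index load.
        return Vec::new();
    }
    let index = INDEX.get_or_init(load_index);
    index
        .iter()
        // Substring match is a placeholder for the real TF-IDF ranking.
        .filter(|title| title.to_lowercase().contains(&query))
        .map(|title| title.as_str())
        .take(max_results)
        .collect()
}
```

In this sketch, `OnceLock::get_or_init` guarantees the loader runs only once, so short queries never pay the cost of loading the index.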

Output Format

The search index is written to dist/search_index.bin in postcard binary format.

Each SearchDocument in the results contains:

Field	Description
`id`	Unique document identifier
`title`	Page title from frontmatter
`path`	URL path (e.g., `/blog/my-post/`)
`summary`	Page summary for display
`tags`	Tags from frontmatter
`categories`	Categories from frontmatter

API Reference

`SearchDocument`

pub struct SearchDocument {
    pub id: u32,
    pub title: String,
    pub path: String,
    pub summary: String,
    pub tags: Vec<String>,
    pub categories: Vec<String>,
}

`SearchIndex`

pub struct SearchIndex {
    pub documents: Vec<SearchDocument>,
    pub index: HashMap<String, Vec<(u32, f32)>>,
}

Method	Description
`new() -> Self`	Create an empty index
`add_document(doc, content)`	Add a document with its content
`search(query) -> Vec<&SearchDocument>`	Search and return ranked results
`finalize()`	Apply IDF weighting (call after all documents have been added)
`to_bytes() -> Vec<u8>`	Serialize to binary
`from_bytes(bytes) -> Self`	Deserialize from binary

Helper Functions

pub fn tokenize(text: &str) -> Vec<String>

Splits text into lowercase tokens, filtering words shorter than 3 characters.
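
A function with this signature and documented behavior could look like the sketch below. It is a stand-in, not the Taxus source: splitting on non-alphanumeric characters is an assumption, since the page only states that tokens are lowercased and that words shorter than 3 characters are dropped.

```rust
/// Sketch of the documented behavior: lowercase the text, split it into
/// words, and drop words shorter than 3 characters.
/// Splitting on non-alphanumeric characters is an assumption.
pub fn tokenize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|word| word.chars().count() >= 3)
        .map(str::to_string)
        .collect()
}
```

With this sketch, short words such as "is" and "a" are discarded, and a hyphenated word like "Static-Site" is split into two tokens.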

pub fn stem(tokens: &[String]) -> Vec<String>

Applies English Porter stemmer to tokens.

Performance

Index size — Typically 10-30% of total content size
Deserialization — Near-instant with postcard format
Search latency — Sub-millisecond for typical queries
Lazy loading — Index is loaded only when first search is performed

Limitations

English only — Stemming is currently English-only
No phrase search — Queries are treated as bag-of-words
No highlighting — Results don't include matched snippets
