Content Moderation
Decentralized moderation is an unsolved problem. Hashiverse does not pretend to have fully solved it, but it does have a layered, principled approach that addresses most of the common failure modes without reintroducing a central authority. Each layer is independently useful; together they provide meaningful coverage.
Layer 1: Organic expiry through healing
Content in Hashiverse survives as long as clients keep fetching it. Every time a client retrieves a bucket of posts, it identifies servers that are missing posts and heals those gaps. This means popular, frequently-visited content is continuously replicated across the network's nearest servers. Content that nobody fetches stops being healed. As servers fill up and apply their LRU eviction policies, unreferenced content is quietly dropped — with no central decision, no takedown notice, no human judgment required.
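The healing pass can be sketched as follows. This is an illustrative reconstruction, not the actual Hashiverse client code: the function name `healing_plan` and the use of plain `u64` post ids and `String` server names are assumptions for the example. The idea is simply that the union of all server replies is the client's best view of the bucket, and each server's gap is the difference between that union and what it returned.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical sketch of the client-side healing pass: after fetching a
/// bucket from several servers, compute which posts each server is missing
/// so the client can push (heal) those gaps.
fn healing_plan(
    replies: &HashMap<String, HashSet<u64>>, // server -> post ids it returned
) -> HashMap<String, Vec<u64>> {
    // The union of all replies is the client's best view of the bucket.
    let full: HashSet<u64> = replies.values().flatten().copied().collect();
    replies
        .iter()
        .map(|(server, have)| {
            let mut missing: Vec<u64> = full.difference(have).copied().collect();
            missing.sort(); // deterministic order for the upload queue
            (server.clone(), missing)
        })
        .collect()
}
```

Because the plan is recomputed on every fetch, only content that clients keep requesting keeps getting healed, which is exactly the organic-expiry property described above.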
Content that endures does so because it remains interesting to someone. Content that causes acute harm — a harassment campaign, targeted disinformation — typically spikes in engagement and then fades as the episode passes. Once clients stop fetching it, healing stops, and the network naturally forgets it. Truly persistent harmful content would require persistent human interest, which is a much harder bar to sustain than simply posting something once.
Layer 2: PoW-weighted feedback
Users can signal on posts using typed feedback: positive reactions (like, love, etc.) and report categories including sensitive content, hate speech, harassment, misinformation, spam, and scam. Each signal carries a proof-of-work level. For each (post_id, feedback_type) pair, the network maintains the single highest-PoW signal seen so far and heals this maximum across servers.
Harm metrics are computed from the aggregate of report signals, weighted by their PoW. High-PoW signals from multiple independent users are very difficult to game: an adversary would need to out-compute the combined work of a community that cares about a piece of content.
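The max-merge described above can be sketched like this. The type and function names are illustrative, not the actual definitions in encoded_post_feedback.rs; the point is that healing feedback state between two servers is a per-key maximum, which is commutative and idempotent, so servers converge regardless of merge order.

```rust
use std::collections::HashMap;

/// Feedback categories, mirroring the types listed above (sketch only).
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum FeedbackType {
    Like,
    Love,
    Sensitive,
    HateSpeech,
    Harassment,
    Misinformation,
    Spam,
    Scam,
}

/// Per (post_id, feedback_type), the highest PoW level seen so far.
type FeedbackMap = HashMap<(u64, FeedbackType), u32>;

/// Healing two servers' feedback state: take the per-key maximum.
fn merge_max(local: &mut FeedbackMap, remote: &FeedbackMap) {
    for (key, &pow) in remote {
        local
            .entry(*key)
            .and_modify(|p| *p = (*p).max(pow))
            .or_insert(pow);
    }
}
```

Keeping only the maximum per pair also bounds storage: a post accumulates at most one entry per feedback type no matter how many users signal.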
CSAM reporting
Child Sexual Abuse Material (CSAM) is a distinct feedback category with a lower client-side coverage threshold than other report types — a smaller amount of accumulated PoW signal is required before the client covers the content (hides it behind a warning). However, the underlying PoW mechanics are identical to all other feedback types: reporters must still do real computational work, which is the primary sybil resistance mechanism.
This design is a deliberate trade-off. Any low-threshold, fast-action mechanism can be weaponised: a troll could falsely flag legitimate content as CSAM to censor it. The PoW requirement means mass false flagging campaigns have a real and proportional cost, making them economically unattractive at scale. Coverage is also not irreversible — the viewer retains the ability to uncover content with a strong warning — preserving user agency and preventing a single false flag from permanently silencing a post for everyone.
When a user submits a CSAM report, the client displays a confirmation dialog warning that falsely reporting CSAM is illegal in many jurisdictions and diverts law enforcement resources from genuine cases of child abuse.
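The per-category threshold logic can be sketched as follows. The numeric thresholds here are invented for illustration — only the relationship the text describes (CSAM's threshold is lower than every other report category's) is taken from the design.

```rust
/// Hypothetical client-side coverage check: content is covered (hidden
/// behind a warning) once accumulated report PoW crosses a per-category
/// threshold. Threshold values are illustrative assumptions.
fn coverage_threshold(category: &str) -> u64 {
    match category {
        "csam" => 4, // deliberately low: a small amount of signal suffices
        _ => 16,     // other report categories need more accumulated work
    }
}

fn is_covered(category: &str, accumulated_pow: u64) -> bool {
    accumulated_pow >= coverage_threshold(category)
}
```

Because covering is reversible by the viewer, a false flag that clears the low CSAM threshold degrades visibility but cannot silence a post outright.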
Source: encoded_post_feedback.rs
Layer 3: User-configurable categories
CSAM is always filtered. This is a hard-coded, non-overridable default. All other harm categories are user-configurable: violence, threats, and spam are filtered by default; adult content filtering is on by default but can be turned off for appropriate contexts. The configuration lives in the client, not on any server, so no server operator can override a user's choices.
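A minimal sketch of such a configuration, with field names assumed for illustration. One way to express "hard-coded and non-overridable" in code is for CSAM simply not to appear as a field at all: if there is no flag, there is no code path that can disable the filter.

```rust
/// Illustrative client-side filter configuration. CSAM is intentionally
/// absent: its filtering is hard-coded and cannot be represented as a
/// toggle. Field names and the Default impl are assumptions for this sketch.
#[derive(Debug, Clone)]
struct FilterConfig {
    violence: bool,
    threats: bool,
    spam: bool,
    adult_content: bool,
}

impl Default for FilterConfig {
    fn default() -> Self {
        FilterConfig {
            violence: true,      // filtered by default
            threats: true,       // filtered by default
            spam: true,          // filtered by default
            adult_content: true, // on by default, user may turn it off
        }
    }
}
```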
Layer 4: Friction-proportional revelation
Rather than hiding flagged content entirely — which would be a form of censorship — clients introduce a revelation delay proportional to the severity of the community's feedback signal:
- Low signal: 0–5 second delay before the content is shown
- Moderate signal: 10–30 second delay
- High signal: 30–60+ second delay
The delay is session-based: once a user has waited through the delay and viewed a piece of content, they are not asked again in the same session. This respects user autonomy — the content is accessible — while making casual exposure to severely flagged content practically difficult. The friction serves a similar function to the "are you sure?" dialog, scaled to community concern.
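The mapping from signal severity to delay can be sketched as below. This is an assumption-laden illustration: the normalized `severity` input, the band breakpoints, and the linear scaling within each band are all invented for the example; only the three delay ranges come from the list above.

```rust
use std::time::Duration;

/// Hypothetical mapping from a normalized community-signal severity
/// (0.0..=1.0) to a revelation delay. Bands mirror the ranges listed
/// above; exact breakpoints are assumptions.
fn revelation_delay(severity: f64) -> Duration {
    let s = severity.clamp(0.0, 1.0);
    let secs = if s < 0.33 {
        // Low signal: 0-5 s, scaled linearly within the band.
        s / 0.33 * 5.0
    } else if s < 0.66 {
        // Moderate signal: 10-30 s.
        10.0 + (s - 0.33) / 0.33 * 20.0
    } else {
        // High signal: 30-60 s (capped here; the "+" in the text means
        // a real client might grow beyond this).
        30.0 + (s - 0.66) / 0.34 * 30.0
    };
    Duration::from_secs_f64(secs)
}
```

A real client would pair this with a session-scoped set of already-revealed post ids so the delay is only paid once per session, as described above.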
Layer 5: Image restrictions in public contexts
Images in hashtag and mention buckets — contexts where content is surfaced to users who didn't specifically subscribe to the author — are restricted by default. A user's personal timeline has no such restriction. This limits the blast radius of image-based harm in discovery contexts without affecting content within chosen subscriptions.
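The context-dependent restriction reduces to a small predicate. The enum and its variant names are illustrative, not the actual bucket types; the logic simply distinguishes chosen subscriptions from discovery surfaces.

```rust
/// Contexts a post can be surfaced in. A personal timeline is a
/// subscription the user chose; hashtag and mention buckets are discovery
/// surfaces reaching users who never opted in. Sketch only.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum BucketContext {
    PersonalTimeline,
    Hashtag,
    Mention,
}

/// Images are restricted by default in discovery contexts, never in a
/// chosen subscription.
fn images_restricted(ctx: BucketContext) -> bool {
    matches!(ctx, BucketContext::Hashtag | BucketContext::Mention)
}
```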
Layer 6: On-device classifiers (pending)
This layer is planned but not yet implemented, pending sufficient maturity in client-side AI models running in the browser. The intent is for the client to run a nudity classifier locally — no content sent to a central service — and automatically apply a content warning before the community PoW feedback system has had time to accumulate signal. This is particularly valuable for newly posted content that hasn't yet been seen by enough users to trigger coverage thresholds.
The leading candidate is NSFWJS, which runs MobileNet-based classification entirely in the browser via TensorFlow.js. As model quality and browser inference performance improve, a layered approach combining NSFWJS for fast first-pass screening with a more capable model for age estimation is also under consideration.
Interaction of layers
The layers are designed to interact constructively. A piece of harmful content entering the network immediately encounters: PoW cost on submission (expensive to post at scale), image restrictions in discovery contexts, on-device classification (once Layer 6 ships), and community feedback that accumulates PoW weight over time and introduces increasing friction. Within six months, the content expires entirely. No single layer is sufficient; together they create multiple independent barriers that an adversary must overcome simultaneously.
Known gaps
Text-based harm that does not trigger image classifiers and does not accumulate community reports quickly — sophisticated disinformation, subtle grooming, context-dependent threats — is the hardest case. The six-month window means serious harm can persist for an uncomfortable length of time before expiry. These are real limitations of the architecture. Work continues on improving detection within the constraints of keeping the system decentralized and the client the arbiter of what is shown.