GhostFS
A production-grade FUSE filesystem backed by sled, written in Rust. Two personalities in one binary: Normal — a high-performance ext4-class filesystem for everyday use; Cybersec — a forensics-ready, encryption-mandatory, MAC-enforcing fortress for security professionals and incident responders.
What is GhostFS?
GhostFS is a userspace filesystem designed for HackerOS. It runs on top of FUSE and stores all metadata and data inside a sled embedded key-value store, giving it ACID-grade crash safety without requiring a dedicated block device.
It is compiled as one of two personalities selected at build time:
Normal Mode
Feature: normal
- Write-Ahead Log (WAL) for crash safety
- Extent-based block mapping (fewer metadata ops)
- HTree-style directory index (O(log n) lookup)
- Read-ahead + write-back block cache
- Content-addressable block deduplication (BLAKE3)
- zstd / lz4 / zlib transparent compression
- Per-user disk quotas
- Extended attributes (xattr)
- Optional AES-256-GCM encryption
- Rotating structured audit log
- Background self-repair worker
Cybersec Mode
Feature: cybersec
- Everything in Normal, plus:
- Encryption is mandatory — no unencrypted blocks
- Per-file Merkle integrity tree (tamper detection)
- Bell-LaPadula Mandatory Access Control labels
- Filesystem-level IDS with anomaly alerting
- Hash-chained immutable forensics log
- SIEM-exportable forensics tail
- Root bypass only within MAC trust domain
- Compartment-based access control
- Rapid-enumeration scanner detection
- Mass-delete ransomware heuristic
Layer Architecture
Build Modes
Normal Mode
# Build Normal (default)
cargo build --release
# With all compression backends
cargo build --release --features zstd,lz4,normal
Cybersec Mode
# Build Cybersec (mandatory encryption)
cargo build --no-default-features --features cybersec,zstd,lz4 --release
Mounting a Cybersec build without --key-file results in an immediate error.
Feature Matrix
Write-Ahead Journal
GhostFS Normal implements an ordered-data WAL modelled after ext4's default journalling mode. Before any block or metadata write is committed to sled, a before-image is appended to the journal. On next mount, journal::recover() replays uncommitted records to restore consistency.
Journal record format
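// journal.rs: JournalRecord
// Field types follow the (u64, usize) pairs used for (ino, block_idx) in the key layout.
pub enum JournalOp {
    WriteBlock { ino: u64, block_idx: usize, before: Option<Vec<u8>> },
    DeleteBlock { ino: u64, block_idx: usize, before: Option<Vec<u8>> },
    MetaUpdate { key: String, before: Option<Vec<u8>> },
}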
Commit barrier
Every batch write calls journal::commit_barrier() before sled::apply_batch(). This marks all pending records as committed and flushes sled to disk (fdatasync-equivalent). Only the most recent 256 journal entries are kept; older ones are pruned to keep the journal compact.
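The before-image idea can be illustrated with a small in-memory model. This is only a sketch: `Store`, `Record`, and the method bodies are hypothetical stand-ins (a `BTreeMap` plays the role of sled), and the write is applied right after logging for simplicity, which is not necessarily the order GhostFS uses internally.

```rust
use std::collections::BTreeMap;

/// One journal record: the key that was touched and its before-image.
struct Record {
    key: String,
    before: Option<Vec<u8>>, // None means the key did not exist before
    committed: bool,
}

/// In-memory stand-in for the sled store plus its journal (hypothetical).
#[derive(Default)]
struct Store {
    data: BTreeMap<String, Vec<u8>>,
    journal: Vec<Record>,
}

impl Store {
    /// Log the before-image first, then apply the write.
    fn write(&mut self, key: &str, value: &[u8]) {
        let before = self.data.get(key).cloned();
        self.journal.push(Record { key: key.to_string(), before, committed: false });
        self.data.insert(key.to_string(), value.to_vec());
    }

    /// Mark every pending record as committed.
    fn commit_barrier(&mut self) {
        for rec in &mut self.journal {
            rec.committed = true;
        }
    }

    /// Undo uncommitted writes, newest first, by restoring before-images.
    /// Returns the number of records that were rolled back.
    fn recover(&mut self) -> usize {
        let mut undone = 0;
        while matches!(self.journal.last(), Some(r) if !r.committed) {
            let rec = self.journal.pop().unwrap();
            match rec.before {
                Some(old) => {
                    self.data.insert(rec.key, old);
                }
                None => {
                    self.data.remove(&rec.key);
                }
            }
            undone += 1;
        }
        undone
    }
}
```

In this model, anything written after the last barrier disappears on recovery, while committed data is left untouched.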
The flush can be disabled by setting SYNC_ON_BARRIER = false in journal.rs. Not recommended for production.
Extent Tree
Rather than storing one sled key per block (flat mapping), GhostFS groups consecutive blocks into extents — runs of contiguous logical blocks that share a single metadata record. This mirrors ext4's extent B-tree and dramatically reduces the metadata footprint for large files.
| Scenario | Flat mapping keys | Extent mapping keys |
|---|---|---|
| 1 GiB sequential file | 262,144 | 1 (single extent) |
| Fragmented file (64 runs) | 262,144 | 64 |
| Small file 12 KiB | 3 | 1 |
Extent coalescing
When put_block() writes block N+1 immediately after an existing extent ending at N, the extent is extended in-place — no new sled key is written. This makes sequential writes extremely metadata-efficient.
// extents.rs: coalescing logic
if prev_start + ext.length as u64 == logical {
    ext.length += 1; // extend in place: O(1), no new key
    self.db.insert(ekey, bincode::serialize(&ext)?)?;
    return Ok(());
}
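To make the key-count figures in the table concrete, here is a minimal self-contained model of extent coalescing. `ExtentMap` and its `put_block` are hypothetical: a `BTreeMap` keyed by start block stands in for the `ext:<ino>:<start>` records, and merging with a following extent is deliberately left out.

```rust
use std::collections::BTreeMap;

/// A run of `length` contiguous logical blocks; the start is the map key.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Extent {
    length: u32,
}

/// Extent map for one file: start block -> extent (hypothetical model).
#[derive(Default)]
struct ExtentMap {
    extents: BTreeMap<u64, Extent>,
}

impl ExtentMap {
    /// Record that `logical` now holds data, coalescing where possible.
    fn put_block(&mut self, logical: u64) {
        // Find the extent with the greatest start <= logical.
        if let Some((&start, ext)) = self.extents.range_mut(..=logical).next_back() {
            let end = start + ext.length as u64; // one past the last block
            if logical < end {
                return; // block is already covered by this extent
            }
            if logical == end {
                ext.length += 1; // extend in place, no new key
                return;
            }
        }
        // No adjacent extent: start a new one-block extent.
        self.extents.insert(logical, Extent { length: 1 });
    }
}
```

Writing the 262,144 blocks of a 1 GiB file (at 4 KiB per block) in order leaves a single entry in this map, matching the first table row.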
HTree Directory Index
Large directories in ext4 use an HTree (hash B-tree) for O(log n) lookup. GhostFS implements the same idea via dirindex.rs: every directory entry is duplicated into a secondary index keyed by the BLAKE3 hash of the name, giving stable sort order independent of insertion sequence.
Sled key structure
For small directories (<16 entries), readdir_entries() falls back to a linear scan to avoid index overhead. The switch is transparent to callers.
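A rough model of the two lookup paths is sketched below. Everything here is an assumption made for illustration: `Directory`, `lookup`, and `LINEAR_SCAN_LIMIT` are hypothetical names, and the standard library's `DefaultHasher` stands in for BLAKE3 so the example compiles without external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

const LINEAR_SCAN_LIMIT: usize = 16; // below this size, skip the index

/// Stand-in for the BLAKE3 name hash (std-only substitute).
fn name_hash(name: &str) -> u64 {
    let mut h = DefaultHasher::new();
    name.hash(&mut h);
    h.finish()
}

#[derive(Default)]
struct Directory {
    /// Plays the role of the dir:<parent>:<name> records.
    entries: Vec<(String, u64)>,
    /// Plays the role of the didx:<parent>:<hash>:<name> records.
    index: BTreeMap<(u64, String), u64>,
}

impl Directory {
    /// Store the entry in both the primary list and the hash index.
    fn insert(&mut self, name: &str, ino: u64) {
        self.entries.push((name.to_string(), ino));
        self.index.insert((name_hash(name), name.to_string()), ino);
    }

    fn lookup(&self, name: &str) -> Option<u64> {
        if self.entries.len() < LINEAR_SCAN_LIMIT {
            // Small directory: a linear scan is cheap enough.
            return self
                .entries
                .iter()
                .find(|e| e.0.as_str() == name)
                .map(|e| e.1);
        }
        // Large directory: O(log n) lookup in the ordered index.
        self.index.get(&(name_hash(name), name.to_string())).copied()
    }
}
```

Keeping the name in the index key alongside the hash means two names with the same hash cannot overwrite each other.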
Read-ahead Cache
cache.rs provides a two-level LRU cache (10,000 inodes + 10,000 blocks ≈ 40 MiB at the default 4 KiB block size) backed by DashMap for concurrent access and an LruCache for eviction policy.
Read-ahead
After a block is fetched, Cache::read_ahead_hint(last_block) returns the next 8 block indices. The caller (read_data) can prefetch these in a background pass, hiding latency for sequential reads.
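The hint itself is simple arithmetic. The sketch below assumes a free function rather than a `Cache` method, and `clamp_to_file` is a hypothetical helper showing how a caller might discard indices past the end of the file.

```rust
const READ_AHEAD_BLOCKS: u64 = 8; // prefetch window in blocks

/// Return the indices of the blocks that follow `last_block`.
/// Indices that would overflow u64 are skipped.
fn read_ahead_hint(last_block: u64) -> Vec<u64> {
    (1..=READ_AHEAD_BLOCKS)
        .filter_map(|i| last_block.checked_add(i))
        .collect()
}

/// Drop hints that lie past the end of a file of `file_blocks` blocks.
fn clamp_to_file(hints: Vec<u64>, file_blocks: u64) -> Vec<u64> {
    hints.into_iter().filter(|&b| b < file_blocks).collect()
}
```

For example, after reading block 3 of a six-block file, the hint covers blocks 4 to 11 and the clamped list contains only blocks 4 and 5.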
Write-back coalescing
Dirty blocks are collected in dirty_blocks and flushed together via flush_dirty(). This coalesces multiple small writes into a single sled batch, reducing write amplification.
Block Deduplication
Every block written is hashed with BLAKE3 before storage. If an identical hash already exists in the dedup index, only a reference record is written — the block data is not duplicated. Reference counting ensures blocks are reclaimed only when all references are removed.
| Sled key | Value | Purpose |
|---|---|---|
| dedup:<hash> | (ino, block_idx) | Hash → first occurrence |
| ref:<ino>:<idx> | (orig_ino, orig_idx) | Back-reference |
| refcount:<ino>:<idx> | u64 | Reference counter |
| hash:<ino>:<idx> | [u8;32] | Per-block integrity hash |
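The interplay of the dedup, ref, and refcount records can be modelled in memory as follows. This is a sketch under stated assumptions: `DedupStore` and its methods are hypothetical, `HashMap`s stand in for the sled keys, and `DefaultHasher` replaces BLAKE3. Because the stand-in hash is only 64 bits wide, the sketch also compares the bytes before sharing a block.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

type BlockId = (u64, usize); // (ino, block_idx)

/// Stand-in for the BLAKE3 content hash (std-only substitute).
fn content_hash(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

#[derive(Default)]
struct DedupStore {
    dedup: HashMap<u64, BlockId>,    // models dedup:<hash>
    refs: HashMap<BlockId, BlockId>, // models ref:<ino>:<idx>
    refcount: HashMap<BlockId, u64>, // models refcount:<ino>:<idx>
    data: HashMap<BlockId, Vec<u8>>, // models data:<ino>:<block>
}

impl DedupStore {
    /// Returns true if the payload was stored, false if it was deduplicated.
    fn put_block(&mut self, id: BlockId, bytes: &[u8]) -> bool {
        let h = content_hash(bytes);
        if let Some(&origin) = self.dedup.get(&h) {
            if self.data.get(&origin).map(|d| d.as_slice()) == Some(bytes) {
                // Identical content exists: write only a reference record.
                self.refs.insert(id, origin);
                *self.refcount.entry(origin).or_insert(0) += 1;
                return false;
            }
        }
        self.dedup.insert(h, id);
        self.data.insert(id, bytes.to_vec());
        self.refcount.insert(id, 1);
        true
    }

    /// Drop one reference; the payload is reclaimed when the count hits zero.
    fn remove_block(&mut self, id: BlockId) {
        let origin = self.refs.remove(&id).unwrap_or(id);
        if let Some(count) = self.refcount.get_mut(&origin) {
            *count -= 1;
            if *count == 0 {
                self.refcount.remove(&origin);
                if let Some(bytes) = self.data.remove(&origin) {
                    self.dedup.remove(&content_hash(&bytes));
                }
            }
        }
    }
}
```

Note that the origin block's payload survives its own removal for as long as another block still references it.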
Transparent Compression
The compression pipeline sits between the block cache and the deduplication / encryption layers. Data is compressed before encryption, maximising the efficiency of both operations.
| Algorithm | Feature flag | Best for | Notes |
|---|---|---|---|
| none | always | Encrypted / already-compressed data | Zero CPU overhead |
| zlib | always | Text, source code | High ratio, moderate speed |
| zstd | zstd | General purpose | Best ratio/speed tradeoff |
| lz4 | lz4 | Hot data, real-time | Fastest decompression |
# Mount with zstd compression
ghostfs mount -d /dev/sdb1 -m /mnt/ghost --compression zstd
AES-256-GCM Encryption
Encryption uses AES-256-GCM (authenticated encryption) with a fresh random 96-bit nonce per block. The nonce is prepended to the ciphertext. The GCM authentication tag detects any tampering at the ciphertext level — this is the first line of defence, before the Merkle tree.
Key management
# Generate a 256-bit key
openssl rand -hex 32 > /etc/ghostfs/key.hex
chmod 600 /etc/ghostfs/key.hex
# Mount cybersec mode
ghostfs mount -d /dev/sdb1 -m /mnt/secure \
  --cybersecurity \
  --key-file /etc/ghostfs/key.hex \
  --compression zstd
Encryption pipeline
plaintext block
│
▼ compression (zstd/lz4/zlib)
compressed
│
▼ AES-256-GCM (random nonce, authenticated)
[nonce 12B][ciphertext][GCM tag 16B]
│
▼ sled insert("data:<ino>:<block>", ...)
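The on-disk framing shown in the diagram, a 12-byte nonce followed by the ciphertext and a 16-byte tag, can be taken apart with plain slice operations. The sketch below performs no cryptography; `StoredBlock`, `split_stored_block`, and `stored_len` are hypothetical names used only to illustrate the layout.

```rust
const NONCE_LEN: usize = 12; // 96-bit nonce prepended to the ciphertext
const TAG_LEN: usize = 16; // GCM authentication tag

/// Borrowed view of one stored block.
struct StoredBlock<'a> {
    nonce: &'a [u8],
    ciphertext: &'a [u8],
    tag: &'a [u8],
}

/// Split a raw sled value into its three parts.
/// Returns None if the value is too short to hold a nonce and a tag.
fn split_stored_block(raw: &[u8]) -> Option<StoredBlock<'_>> {
    if raw.len() < NONCE_LEN + TAG_LEN {
        return None;
    }
    let (nonce, rest) = raw.split_at(NONCE_LEN);
    let (ciphertext, tag) = rest.split_at(rest.len() - TAG_LEN);
    Some(StoredBlock { nonce, ciphertext, tag })
}

/// Size of the stored value for a compressed payload of `compressed_len` bytes.
fn stored_len(compressed_len: usize) -> usize {
    NONCE_LEN + compressed_len + TAG_LEN
}
```

Under this layout each block carries 28 bytes of overhead, so an incompressible 4096-byte block occupies 4124 bytes.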
Merkle Integrity Tree
The AES-GCM tag protects against ciphertext tampering. The Merkle tree (integrity.rs) provides an additional layer: it detects tampering with the plaintext after decryption (e.g. key compromise, sled-level manipulation).
Every block's BLAKE3 hash is stored as a Merkle leaf. Internal nodes are BLAKE3 hashes of child pairs. The root hash summarises the entire file in 32 bytes and is recomputed on every write.
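A compact way to picture the root computation is the bottom-up fold below. It is a sketch, not the integrity.rs implementation: the 64-bit `DefaultHasher` stands in for BLAKE3, and promoting an unpaired node unchanged to the next level is an assumption, since the text does not say how odd levels are handled.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

type Digest = u64; // stand-in for a 32-byte BLAKE3 digest

/// Hash one data block into a leaf digest.
fn leaf_hash(block: &[u8]) -> Digest {
    let mut h = DefaultHasher::new();
    block.hash(&mut h);
    h.finish()
}

/// Hash a pair of child digests into their parent digest.
fn node_hash(left: Digest, right: Digest) -> Digest {
    let mut h = DefaultHasher::new();
    (left, right).hash(&mut h);
    h.finish()
}

/// Fold the leaves pairwise until a single root remains.
/// Returns None for a file with no blocks.
fn merkle_root(leaves: &[Digest]) -> Option<Digest> {
    if leaves.is_empty() {
        return None;
    }
    let mut level = leaves.to_vec();
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| match pair {
                [l, r] => node_hash(*l, *r),
                [l] => *l, // unpaired node is promoted unchanged (assumption)
                _ => unreachable!("chunks(2) yields one or two items"),
            })
            .collect();
    }
    Some(level[0])
}
```

Changing any single block changes its leaf and therefore every digest on the path up to the root, which is what makes the root a compact summary of the file.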
Verification
// integrity.rs: verify_block() called on every read
let computed = blake3::hash(data);
if computed.as_bytes() != stored.as_ref() {
    return Err(HfsError::CorruptedData); // → EIO to application
}
Root export
The Merkle root for any file is accessible via IntegrityTree::root(ino). This enables external verification pipelines (e.g. comparing roots between a live mount and a forensic image).
Bell-LaPadula MAC
Mandatory Access Control enforces information flow policies regardless of filesystem permissions. Every inode has a MacLabel; every UID has a MacClearance.
Sensitivity levels
| Level | Value | Analog |
|---|---|---|
| Unclassified | 0 | Public / world-readable |
| Restricted | 1 | Internal use only |
| Confidential | 2 | Need-to-know |
| TopSecret | 3 | Compartmented intelligence |
MAC rules
| Rule | Condition for ALLOW |
|---|---|
| No-Read-Up | clearance.level ≥ label.level |
| No-Write-Down | clearance.level ≤ label.level |
| Compartments | (clearance.compartments & label.compartments) == label.compartments |
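The three rules in the table translate almost directly into code. The following is a sketch: the type names mirror those used in this section, but the `u8` compartment bitmask, the helper functions, and the decision to ignore the `trusted` flag are assumptions made for the example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum SensitivityLevel {
    Unclassified = 0,
    Restricted = 1,
    Confidential = 2,
    TopSecret = 3,
}

/// Label attached to an inode.
struct MacLabel {
    level: SensitivityLevel,
    compartments: u8, // bitmask, one bit per compartment (assumed width)
}

/// Clearance attached to a UID.
struct MacClearance {
    level: SensitivityLevel,
    compartments: u8,
    trusted: bool, // not consulted in this sketch
}

/// The subject must hold every compartment named on the label.
fn compartments_ok(c: &MacClearance, l: &MacLabel) -> bool {
    (c.compartments & l.compartments) == l.compartments
}

/// No-Read-Up: the subject may read only at or below its own level.
fn can_read(c: &MacClearance, l: &MacLabel) -> bool {
    c.level >= l.level && compartments_ok(c, l)
}

/// No-Write-Down: the subject may write only at or above its own level.
fn can_write(c: &MacClearance, l: &MacLabel) -> bool {
    c.level <= l.level && compartments_ok(c, l)
}
```

A Confidential subject can therefore write into a TopSecret file but not read it, and can read an Unclassified file but not write to it.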
Assigning labels
// Programmatic: called from a management daemon or ghostfs-admin tool
fs.mac.set_label(ino, &MacLabel {
    level: SensitivityLevel::Confidential,
    compartments: 0b0000_0011, // compartments A + B
})?;
fs.mac.set_clearance(uid, &MacClearance {
    level: SensitivityLevel::Confidential,
    compartments: 0b1111_1111, // all compartments
    trusted: false,
})?;
Intrusion Detection System
ids.rs hooks into every I/O path and applies five detection rules using per-UID sliding-window counters (60-second windows, stored in sled):
| Rule | Trigger | Alert Kind |
|---|---|---|
| Brute-force | >20 permission denials / 60 s per UID | BruteForce |
| Mass-delete | >50 unlinks / 60 s per UID | MassDelete |
| Rapid enumeration | >500 readdirs / 60 s per UID | RapidEnumeration |
| Integrity violation | Merkle hash mismatch on read | IntegrityViolation |
| MAC violation | Access denied by MAC check | MacViolation |
Alerts are persisted at ids:alert:<timestamp>:<seq> and retrievable via Ids::recent_alerts(n) for SIEM integration. All thresholds are compile-time constants in ids.rs and easy to tune.
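One plausible shape for a per-UID sliding-window counter is shown below. It is a sketch only: `SlidingCounter` is a hypothetical type that keeps event timestamps in memory, whereas the text states that the real counters are stored in sled.

```rust
use std::collections::VecDeque;

const WINDOW_SECS: u64 = 60; // length of the sliding window

/// Counts events for one UID and one rule within the last WINDOW_SECS.
struct SlidingCounter {
    events: VecDeque<u64>, // event timestamps in seconds, oldest first
    threshold: usize,
}

impl SlidingCounter {
    fn new(threshold: usize) -> Self {
        SlidingCounter { events: VecDeque::new(), threshold }
    }

    /// Record an event at time `now` and report whether the number of
    /// events inside the window now exceeds the threshold.
    fn record(&mut self, now: u64) -> bool {
        self.events.push_back(now);
        // Evict events that have fallen out of the window.
        while let Some(&oldest) = self.events.front() {
            if now.saturating_sub(oldest) >= WINDOW_SECS {
                self.events.pop_front();
            } else {
                break;
            }
        }
        self.events.len() > self.threshold
    }
}
```

With the mass-delete threshold of 50, the 51st unlink inside one window would raise an alert, and the count starts over once the earlier events are more than 60 seconds old.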
Hash-chained Forensics Log
Every filesystem operation is recorded in an append-only, hash-chained log in forensics.rs. Each entry contains the BLAKE3 hash of the previous entry — any deletion or modification of a historical record breaks the chain and is immediately detectable.
Entry structure
pub struct ForensicsEntry {
    pub seq: u64,
    pub timestamp_us: u128, // microsecond precision
    pub uid: u32,
    pub operation: String,
    pub ino: u64,
    pub name: Option<Vec<u8>>,
    pub prev_hash: [u8; 32], // chain link
    pub self_hash: [u8; 32], // BLAKE3 of all above fields
}
Chain verification
// Verify chain integrity (Ok(count) = clean, Err = tampered)
let count = fs.forensics.verify_chain()?;
println!("✓ {} forensics entries verified", count);
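How a chain check of this kind can detect both edits and deletions is sketched below. The `Entry` type is a trimmed-down, hypothetical version of ForensicsEntry, `DefaultHasher` stands in for BLAKE3, and a `prev_hash` of 0 for the first entry is an assumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

type Digest = u64; // stand-in for a 32-byte BLAKE3 digest

#[derive(Clone)]
struct Entry {
    seq: u64,
    uid: u32,
    operation: String,
    ino: u64,
    prev_hash: Digest, // chain link to the previous entry
    self_hash: Digest, // hash of all fields above
}

/// Hash every field except `self_hash`.
fn entry_hash(e: &Entry) -> Digest {
    let mut h = DefaultHasher::new();
    (e.seq, e.uid, &e.operation, e.ino, e.prev_hash).hash(&mut h);
    h.finish()
}

/// Append a new entry linked to the current tail of the log.
fn append(log: &mut Vec<Entry>, uid: u32, operation: &str, ino: u64) {
    let prev_hash = log.last().map_or(0, |e| e.self_hash);
    let mut e = Entry {
        seq: log.len() as u64,
        uid,
        operation: operation.to_string(),
        ino,
        prev_hash,
        self_hash: 0,
    };
    e.self_hash = entry_hash(&e);
    log.push(e);
}

/// Walk the chain; return the entry count, or the seq of the first bad entry.
fn verify_chain(log: &[Entry]) -> Result<usize, u64> {
    let mut expected_prev = 0;
    for e in log {
        if e.prev_hash != expected_prev || e.self_hash != entry_hash(e) {
            return Err(e.seq);
        }
        expected_prev = e.self_hash;
    }
    Ok(log.len())
}
```

Editing an entry invalidates its own hash, while deleting one leaves the following entry pointing at a hash that no longer precedes it, so both cases surface as an error at a specific sequence number.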
SIEM export
Use Forensics::tail(n) to export the last n entries for ingestion into Splunk, Elastic, or any syslog-compatible SIEM. Each entry includes µs-precision timestamps suitable for timeline correlation.
Structured Audit Log
The audit log (audit.rs) provides a fast, human-readable trail of all filesystem operations. It is separate from the forensics chain (which is immutable and cryptographically linked); the audit log rotates at 100,000 entries.
In cybersec mode, log_audit() in lib.rs writes to both the audit log (fast, mutable, rotatable) and the forensics chain (slow, immutable, hash-chained). This dual-write strategy allows operators to query recent events quickly while retaining court-admissible evidence.
Building GhostFS
# Prerequisites
apt install fuse libfuse-dev build-essential
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Clone and build: Normal (default)
git clone https://github.com/hackeros/ghostfs
cd ghostfs
cargo build --release
# Build: Cybersec
cargo build --no-default-features --features cybersec,zstd,lz4 --release
# Install
install -m 755 target/release/ghostfs /usr/local/bin/
Formatting (mkfs)
# Format a raw file or block device
dd if=/dev/zero of=/var/ghostfs.img bs=1M count=4096
ghostfs mkfs --device /var/ghostfs.img
# With custom block size (must be power of 2, ≥ 512)
ghostfs mkfs --device /dev/sdb1 --block-size 4096
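The block-size constraint mentioned in the comment above (a power of two, at least 512) is easy to check up front. The helper names below are hypothetical and only illustrate the rule; they are not taken from the ghostfs source.

```rust
/// Accept a block size only if it is a power of two and at least 512 bytes.
fn validate_block_size(size: u32) -> Result<u32, String> {
    if size < 512 {
        Err(format!("block size {} is below the 512-byte minimum", size))
    } else if !size.is_power_of_two() {
        Err(format!("block size {} is not a power of two", size))
    } else {
        Ok(size)
    }
}

/// Number of blocks needed to hold `bytes` bytes, rounding up.
fn blocks_for(bytes: u64, block_size: u32) -> u64 {
    let bs = block_size as u64;
    (bytes + bs - 1) / bs
}
```

For the 4096 MiB image created above and a 4096-byte block size, `blocks_for` gives 1,048,576 blocks.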
mkfs is destructive: it overwrites the sled database. Always back up data before re-formatting.
Mounting GhostFS
Normal mode
# Plain mount
ghostfs mount -d /var/ghostfs.img -m /mnt/ghost
# With lz4 compression and noatime
ghostfs mount -d /var/ghostfs.img -m /mnt/ghost \
  --compression lz4 --noatime
# Normal mode with optional encryption
ghostfs mount -d /var/ghostfs.img -m /mnt/ghost \
  --cybersecurity --key-file /root/key.hex
Cybersec mode
# Cybersec: encryption mandatory
ghostfs mount -d /dev/sdb1 -m /mnt/secure \
  --cybersecurity \
  --key-file /etc/ghostfs/key.hex \
  --compression zstd \
  --noatime
# Unmount
ghostfs umount -m /mnt/secure
systemd unit
# /etc/systemd/system/ghostfs.service
[Unit]
Description=GhostFS
After=local-fs.target
[Service]
ExecStart=/usr/local/bin/ghostfs mount \
-d /var/ghostfs.img -m /mnt/ghost \
--compression zstd --noatime
ExecStop=/usr/local/bin/ghostfs umount -m /mnt/ghost
Restart=on-failure
[Install]
WantedBy=multi-user.target
Sled Key Layout
All state lives inside the sled embedded database. Understanding the key layout is useful for debugging, forensics, and manual inspection.
| Key pattern | Value type | Description |
|---|---|---|
| next_ino | u64 | Next inode number |
| sb:* | mixed | Superblock fields (version, block_size, created) |
| inode:<ino> | Inode | Inode metadata |
| dir:<parent>:<name> | u64 | Directory entry → child ino |
| didx:<parent>:<hash>:<name> | u64 | HTree index entry |
| data:<ino>:<block> | [u8] | Block data (compressed, possibly encrypted) |
| ext:<ino>:<start> | Extent | Extent tree record |
| dedup:<hash> | (u64, usize) | Content hash → origin block |
| ref:<ino>:<idx> | (u64, usize) | Dedup back-reference |
| xattr:<ino>:<name> | [u8] | Extended attribute value |
| quota:<uid> | UserQuota | Per-user quota + usage |
| journal:seq:<n> | JournalRecord | WAL entry |
| audit:entry:<seq> | AuditEntry | Audit log record |
| itree:<ino>:leaf:<idx> | [u8;32] | [cybersec] Merkle leaf hash |
| itree:<ino>:root | [u8;32] | [cybersec] Merkle root |
| mac:label:<ino> | MacLabel | [cybersec] MAC label |
| mac:clearance:<uid> | MacClearance | [cybersec] Subject clearance |
| ids:stats:<uid> | UidStats | [cybersec] IDS per-UID counters |
| ids:alert:<ts>:<seq> | IdsAlert | [cybersec] IDS alert |
| forensics:seq:<n> | ForensicsEntry | [cybersec] Hash-chained log entry |
Testing Guide
Prerequisites
apt install fuse3 libfuse-dev attr
modprobe fuse
mkdir -p /tmp/gfs-db /tmp/gfs-mnt
dd if=/dev/zero of=/tmp/gfs.img bs=1M count=512
ghostfs mkfs --device /tmp/gfs.img
ghostfs mount -d /tmp/gfs.img -m /tmp/gfs-mnt --noatime &
sleep 1
mount | grep ghostfs # should show the mount
ghostfs umount -m /tmp/gfs-mnt
ghostfs mount -d /tmp/gfs.img -m /tmp/gfs-mnt --compression lz4 &
sleep 1
# Create, write, read
echo "hello ghostfs" > /tmp/gfs-mnt/test.txt
cat /tmp/gfs-mnt/test.txt
# Directory ops
mkdir -p /tmp/gfs-mnt/a/b/c
ls -la /tmp/gfs-mnt/a/b/
# Symlinks & hard links
ln -s /tmp/gfs-mnt/test.txt /tmp/gfs-mnt/link.txt
ln /tmp/gfs-mnt/test.txt /tmp/gfs-mnt/hard.txt
# xattr
setfattr -n user.comment -v "test" /tmp/gfs-mnt/test.txt
getfattr -n user.comment /tmp/gfs-mnt/test.txt
# Permissions
chmod 600 /tmp/gfs-mnt/test.txt
stat /tmp/gfs-mnt/test.txt
ghostfs umount -m /tmp/gfs-mnt

ghostfs mount -d /tmp/gfs.img -m /tmp/gfs-mnt &
GPID=$!
sleep 1
# Write a large file
dd if=/dev/urandom of=/tmp/gfs-mnt/big.bin bs=1M count=64
# Simulate crash: kill without clean unmount
kill -9 $GPID
sleep 1
# Remount: journal recovery should replay automatically
ghostfs mount -d /tmp/gfs.img -m /tmp/gfs-mnt &
sleep 1
# Verify file is intact
ls -lh /tmp/gfs-mnt/big.bin
md5sum /tmp/gfs-mnt/big.bin
ghostfs umount -m /tmp/gfs-mnt

# Build cybersec binary first
cargo build --no-default-features --features cybersec,zstd,lz4 --release
openssl rand -hex 32 > /tmp/key.hex
ghostfs mkfs --device /tmp/gfs-sec.img
ghostfs mount -d /tmp/gfs-sec.img -m /tmp/gfs-mnt \
  --cybersecurity --key-file /tmp/key.hex &
sleep 1
echo "secret data" > /tmp/gfs-mnt/secret.txt
# Verify sled data is ciphertext (should be random-looking)
strings /tmp/gfs-sec.img | grep "secret" && echo "FAIL: plaintext found" || echo "PASS: encrypted"
ghostfs umount -m /tmp/gfs-mnt

# Create a file, set MAC label = TopSecret, test as unprivileged user
# (Requires MAC label management tool; see mac.rs API)
# Expected: read from UID with Unclassified clearance → EACCES
sudo -u nobody cat /tmp/gfs-mnt/secret.txt
echo "Exit code: $?" # should be non-zero

# After some operations, verify the hash chain
# (integrate into a management daemon or CLI subcommand)
# Quick Rust test
cargo test --no-default-features --features cybersec forensics

apt install fio
ghostfs mount -d /tmp/gfs.img -m /tmp/gfs-mnt \
  --compression lz4 --noatime &
sleep 1
# Sequential write
fio --name=seqwr --ioengine=sync --rw=write --bs=1m \
  --size=512m --filename=/tmp/gfs-mnt/fio.tmp
# Sequential read
fio --name=seqrd --ioengine=sync --rw=read --bs=1m \
  --size=512m --filename=/tmp/gfs-mnt/fio.tmp
# Random 4K read/write mix
fio --name=rand4k --ioengine=sync --rw=randrw --bs=4k \
  --size=256m --filename=/tmp/gfs-mnt/fio2.tmp
ghostfs umount -m /tmp/gfs-mnt
Calamares Integration
Calamares is the HackerOS installer framework. Integrating GhostFS requires providing a filesystem module, a mount module, and optionally a pre-install hook.
Step-by-step
- Add ghostfs to the live ISO.
  Include the compiled ghostfs binary and the FUSE library in the squashfs image at /usr/local/bin/ghostfs and ensure fuse3 is in the package list (packages.conf).
- Create the filesystem module.
  Add a new module directory: /usr/lib/calamares/modules/ghostfs-mkfs/ with module.desc and a Python job script.

      # /usr/lib/calamares/modules/ghostfs-mkfs/module.desc
      ---
      type: "job"
      name: "ghostfs-mkfs"
      interface: "python"
      script: "main.py"

      # main.py
      import subprocess
      import libcalamares

      def run():
          # Partition chosen in the partitioning step
          partition = libcalamares.globalstorage.value("rootPartition")
          result = subprocess.run(
              ["ghostfs", "mkfs", "--device", partition],
              capture_output=True, text=True
          )
          if result.returncode != 0:
              return ("GhostFS mkfs failed", result.stderr)
          return None
- Create the mount module.
  Add /usr/lib/calamares/modules/ghostfs-mount/. The mount module mounts the freshly formatted partition at the Calamares root mount point so the installer can copy files.

      # main.py (mount module)
      import os
      import subprocess
      import libcalamares

      def run():
          root = libcalamares.globalstorage.value("rootMountPoint")
          device = libcalamares.globalstorage.value("rootPartition")
          os.makedirs(root, exist_ok=True)
          result = subprocess.run(
              ["ghostfs", "mount", "-d", device, "-m", root,
               "--compression", "zstd", "--noatime"],
              capture_output=True, text=True
          )
          if result.returncode != 0:
              return ("GhostFS mount failed", result.stderr)
          return None
- Add the unmount hook.
  After file copy (unpackfs), unmount GhostFS cleanly using a post-install module:

      subprocess.run(["ghostfs", "umount", "-m", root])
- Register in settings.conf.
  Add the modules in the correct sequence:

      # /etc/calamares/settings.conf (sequence section)
      sequence:
        - show:
            - welcome
            - locale
            - partition
            - users
            - summary
        - exec:
            - partition          # standard partitioning
            - ghostfs-mkfs       # ← format with GhostFS
            - ghostfs-mount      # ← mount GhostFS root
            - unpackfs           # copy squashfs to GhostFS
            - machineid
            - fstab
            - locale
            - keyboard
            - localecfg
            - users
            - displaymanager
            - networkcfg
            - hwclock
            - services-systemd
            - ghostfs-umount     # ← clean unmount
            - bootloader
        - show:
            - finished
- Generate the fstab entry.
  Calamares's fstab module must emit a GhostFS-compatible entry. Override the template in /etc/calamares/modules/fstab.conf:

      # fstab.conf
      mountOptions:
        ghostfs: "noatime,compression=zstd"

  The generated /etc/fstab line will look like:

      /dev/sda1 / ghostfs noatime,compression=zstd 0 0
- [Cybersec only] Key provisioning.
  For cybersec installations, generate and store the key during install:

      # In the ghostfs-mkfs module, cybersec variant
      import os
      import secrets

      key = secrets.token_hex(32)  # 256-bit key, hex-encoded
      key_path = "/etc/ghostfs/key.hex"
      os.makedirs("/etc/ghostfs", exist_ok=True)
      with open(key_path, "w") as f:
          f.write(key)
      os.chmod(key_path, 0o600)
      # Pass key to mount step via globalstorage
      libcalamares.globalstorage.insert("ghostfsKeyPath", key_path)
Booting from a GhostFS root requires an initramfs hook that runs ghostfs mount with the appropriate key-file before pivot_root.
CLI Reference
| Subcommand | Options | Description |
|---|---|---|
| mount | -d <device> -m <mountpoint> --compression --noatime --cybersecurity --key-file | Mount a GhostFS image or block device |
| mkfs | -d <device> --block-size --encryption | Format (initialise) a GhostFS volume |
| umount | -m <mountpoint> | Unmount a GhostFS volume (calls fusermount -u) |
Error Codes
| HfsError variant | errno | Cause |
|---|---|---|
| NoEntry | ENOENT | Inode not found in sled |
| QuotaExceeded(uid) | EDQUOT | User disk quota exceeded |
| CorruptedData | EIO | BLAKE3 / Merkle mismatch or bad GCM tag |
| InvalidArgument(s) | EINVAL | Bad parameter to filesystem call |
| CryptoError | EIO | AES-GCM encrypt/decrypt failure |
| MissingKey | EIO | Cybersec mode mounted without key |
| TimeError | EIO | SystemTime before UNIX_EPOCH |
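The table above amounts to a single match expression. The sketch below assumes the payload types (`u32` for the UID, `String` for the message) and a method named `errno`, and it hard-codes the Linux errno values so that it compiles without the libc crate.

```rust
// Linux errno values, hard-coded to keep the sketch dependency-free.
const ENOENT: i32 = 2;
const EIO: i32 = 5;
const EINVAL: i32 = 22;
const EDQUOT: i32 = 122;

#[derive(Debug)]
enum HfsError {
    NoEntry,
    QuotaExceeded(u32),      // uid (assumed payload type)
    CorruptedData,
    InvalidArgument(String), // description (assumed payload type)
    CryptoError,
    MissingKey,
    TimeError,
}

impl HfsError {
    /// Map each variant to the errno returned to the FUSE caller.
    fn errno(&self) -> i32 {
        match self {
            HfsError::NoEntry => ENOENT,
            HfsError::QuotaExceeded(_) => EDQUOT,
            HfsError::InvalidArgument(_) => EINVAL,
            HfsError::CorruptedData
            | HfsError::CryptoError
            | HfsError::MissingKey
            | HfsError::TimeError => EIO,
        }
    }
}
```

Because integrity, crypto, key, and clock failures all collapse to EIO, applications cannot tell them apart from the errno alone; the audit and forensics logs are where the distinction is visible.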
Compile-time Tunables
| Constant | File | Default | Description |
|---|---|---|---|
| FS_BLOCK_SIZE | lib.rs | 4096 | Filesystem block size in bytes |
| TTL | lib.rs | 1 s | FUSE attribute cache TTL |
| INODE_CACHE_CAP | cache.rs | 10,000 | LRU inode cache capacity |
| BLOCK_CACHE_CAP | cache.rs | 10,000 | LRU block cache capacity (~40 MiB) |
| READ_AHEAD_BLOCKS | cache.rs | 8 | Prefetch window in blocks |
| SYNC_ON_BARRIER | journal.rs | true | fdatasync on every journal commit |
| MAX_AUDIT_ENTRIES | audit.rs | 100,000 | Rotating audit log size |
| PERM_FAIL_THRESHOLD | ids.rs | 20 / 60s | IDS brute-force trigger |
| MASS_DELETE_THRESHOLD | ids.rs | 50 / 60s | IDS mass-delete (ransomware) trigger |
| ENUM_THRESHOLD | ids.rs | 500 / 60s | IDS rapid-enumeration trigger |