SciTrails
Scientific Documentation & Knowledge Representation · 2020–2024
Overview
SciTrails (originally "circuit-factology") is a framework for converting computational results into structured, publishable scientific knowledge. It addresses the reproducibility crisis by treating scientific documentation as a first-class computational artifact rather than something written up once the analysis is done.
The Problem
Scientific analyses, particularly in computational neuroscience, involve multiple interconnected steps: data processing, statistical analysis, visualization, and interpretation. These are typically managed through ad-hoc scripts, Jupyter notebooks, and manual documentation. The result is fragmented, non-reproducible work where the path from raw data to a published figure is often obscure. A small change in the model requires re-running a tangled web of scripts, usually with manual intervention.
The Solution: Declarative Fact Generation
Instead of writing imperative scripts ("do this, then that, then save a plot"), the scientist declares what knowledge they want to obtain. The framework handles the how.
This is achieved through a clear hierarchy:
- Laboratory — A consistent interface for querying the underlying dataset (the circuit model)
- Measurement — Isolated algorithmic logic for calculating a single value or generating a single plot
- Fact / Figure — A structured object binding a scientific question to its computed answer, with full provenance
- Factsheet — A thematic collection of Facts and Figures constituting a coherent report on a topic
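The hierarchy above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the framework's actual API: the class shapes, the `synapse_counts` query, the `mean_synapse_count` measurement, and the toy dataset are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


class Laboratory:
    """Hypothetical data interface: wraps a dataset and answers queries."""

    def __init__(self, synapse_counts: Dict[str, List[int]]):
        self._synapse_counts = synapse_counts

    def synapse_counts(self, region: str) -> List[int]:
        return self._synapse_counts[region]


def mean_synapse_count(lab: Laboratory, region: str) -> float:
    """Hypothetical Measurement: one isolated calculation, one value."""
    counts = lab.synapse_counts(region)
    return sum(counts) / len(counts)


@dataclass
class Fact:
    """Binds a scientific question to the measurement that answers it."""

    question: str
    measurement: Callable[..., Any]
    parameters: Dict[str, Any]
    value: Any = None

    def compute(self, lab: Laboratory) -> Any:
        self.value = self.measurement(lab, **self.parameters)
        return self.value

    @property
    def provenance(self) -> Dict[str, Any]:
        # Record which function and which arguments produced the value.
        return {
            "measurement": self.measurement.__qualname__,
            "parameters": dict(self.parameters),
        }


@dataclass
class Factsheet:
    """A thematic collection of Facts."""

    title: str
    facts: List[Fact] = field(default_factory=list)

    def generate(self, lab: Laboratory) -> Dict[str, Any]:
        return {fact.question: fact.compute(lab) for fact in self.facts}


# Toy usage with made-up numbers.
lab = Laboratory({"S1": [2, 4, 6]})
fact = Fact("Mean synapses per connection in S1?", mean_synapse_count, {"region": "S1"})
sheet = Factsheet("Connectivity", [fact])
results = sheet.generate(lab)
```

The point of the split is that the Measurement knows nothing about reports and the Factsheet knows nothing about the dataset; only the Laboratory touches the data.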
How It Works
What the scientist provides:
- A data interface — a Laboratory-like class for their specific dataset
- Measurement functions — Python functions performing core calculations
- YAML configurations — declaring desired Facts, Figures, and Factsheets
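As a rough sketch of how these three inputs might fit together, the example below registers measurement functions by name and resolves them from a configuration. The schema, the `measurement` decorator, and the `generate` function are assumptions made for illustration; the configuration is written as a Python dict shaped like what a parsed YAML file could look like, to keep the example free of third-party dependencies.

```python
from typing import Any, Callable, Dict, List

# Registry of measurement functions, keyed by the name used in configuration.
MEASUREMENTS: Dict[str, Callable[..., Any]] = {}


def measurement(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical decorator that makes a function addressable from config."""

    def register(func: Callable[..., Any]) -> Callable[..., Any]:
        MEASUREMENTS[name] = func
        return func

    return register


@measurement("cell_count")
def cell_count(dataset: Dict[str, List[float]], region: str) -> int:
    return len(dataset[region])


@measurement("mean_value")
def mean_value(dataset: Dict[str, List[float]], region: str) -> float:
    values = dataset[region]
    return sum(values) / len(values)


# Hypothetical schema: what a YAML Factsheet declaration might parse into.
CONFIG = {
    "factsheet": "Composition",
    "facts": [
        {"question": "How many cells are in S1?",
         "measurement": "cell_count",
         "parameters": {"region": "S1"}},
        {"question": "What is the mean value in S1?",
         "measurement": "mean_value",
         "parameters": {"region": "S1"}},
    ],
}


def generate(config: Dict[str, Any], dataset: Dict[str, List[float]]) -> List[Dict[str, Any]]:
    """Resolve each declared fact to its measurement and compute the answer."""
    answers = []
    for spec in config["facts"]:
        func = MEASUREMENTS[spec["measurement"]]
        value = func(dataset, **spec.get("parameters", {}))
        answers.append({"question": spec["question"], "value": value})
    return answers


answers = generate(CONFIG, {"S1": [1.0, 2.0, 3.0]})
```

Adding an analysis then means adding one entry to the configuration, and running on a new dataset means passing a different data interface to the same declarations.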
What the framework delivers:
- Automated reports — structured, human-readable documents from a single command (scitale init → setup → generate)
- Full reproducibility — the entire knowledge generation process captured in configuration and code
- Clear provenance — every number and figure linked to the code that generated it
- Scalability — add new analyses or run the suite on a new dataset with minimal changes
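One way to picture the provenance link between a number and the code that produced it is a record that stores the value next to a fingerprint of the generating function. The sketch below is an assumption about how such a record could look, not a description of the framework's internals; `fingerprint`, `with_provenance`, and `density` are hypothetical names, and a real system would more likely record a Git commit than hash bytecode.

```python
import hashlib
from typing import Any, Callable, Dict


def fingerprint(func: Callable[..., Any]) -> str:
    """Short hash of a function's compiled bytecode and constants.

    A rough stand-in for a code version identifier; it is only meant
    for simple functions without nested function definitions.
    """
    code = func.__code__
    payload = code.co_code + repr(code.co_consts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def with_provenance(func: Callable[..., Any], **parameters: Any) -> Dict[str, Any]:
    """Compute a value and attach what produced it."""
    return {
        "value": func(**parameters),
        "provenance": {
            "measurement": func.__qualname__,
            "code_fingerprint": fingerprint(func),
            "parameters": dict(parameters),
        },
    }


def density(count: int, volume_mm3: float) -> float:
    """Hypothetical measurement: cells per cubic millimetre."""
    return count / volume_mm3


# Made-up inputs, for illustration only.
record = with_provenance(density, count=1200, volume_mm3=0.5)
```

If the body of `density` changes, its fingerprint changes too, so a stale number in a report can be detected by comparing fingerprints.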
Design Principles
- Separation of content from presentation — scientific logic is independent of output format (HTML, PDF, Markdown)
- Version-controlled notebooks — unlike Jupyter notebooks, configurations are diff-friendly YAML files that work with Git
- FAIR principles — Findable, Accessible, Interoperable, Reusable knowledge artifacts
- Multi-scale organization — handles hierarchical data from brain regions down to individual synapses
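The first principle, separating content from presentation, can be illustrated by rendering the same computed facts to two formats. This is a simplified sketch using plain string formatting from the standard library rather than Jinja2 templates; the function names and the fact dictionary layout are assumptions made for the example.

```python
from html import escape
from typing import Any, Dict, List

FactList = List[Dict[str, Any]]


def render_markdown(title: str, facts: FactList) -> str:
    """Render computed facts as a Markdown document."""
    lines = [f"# {title}", ""]
    lines += [f"- **{fact['question']}** {fact['value']}" for fact in facts]
    return "\n".join(lines)


def render_html(title: str, facts: FactList) -> str:
    """Render the same facts as an HTML fragment, escaping all text."""
    items = "".join(
        f"<li><strong>{escape(fact['question'])}</strong> {escape(str(fact['value']))}</li>"
        for fact in facts
    )
    return f"<h1>{escape(title)}</h1><ul>{items}</ul>"


# The scientific content is computed once; only the renderer differs.
facts = [{"question": "How many layers?", "value": 6}]
markdown_report = render_markdown("Anatomy", facts)
html_report = render_html("Anatomy", facts)
```

Because neither renderer computes anything, supporting another output format would only require another rendering function over the same fact list.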
Comparative Positioning
SciTrails aims to combine the interactivity of Jupyter, the reproducibility of Snakemake, and the publication quality of RMarkdown, and adds declarative fact generation and provenance tracking, which none of these tools provides on its own.
Technical Stack
Python · YAML configuration · Jinja2 templating · HTML / Markdown / PDF generation · Git-based version control · HDF5 data storage