Connsense-TAP

HPC Workflow Analysis Pipeline · 2020–2024

Python · HPC · SLURM · HDF5 · Connectomics

Overview

Connsense-TAP (Topological Analysis Pipeline) is a computational framework for large-scale analysis of digitally reconstructed brain circuits on HPC clusters. It separates scientific inquiry from computational engineering — the scientist focuses on what to measure and why, while the framework automates the how.

The Problem

Four challenges make large-scale circuit analysis difficult:

  1. Scale — Millions of neurons, billions of synapses. Analysis is computationally prohibitive on a single machine.
  2. Complexity & Reproducibility — A typical analysis is a multi-stage workflow: define regions, extract data, apply transformations, run measurements. Managing parameters and intermediate results across hundreds of subtargets is a "bookkeeping headache."
  3. Accessibility — Underlying data formats (SONATA) and libraries (bluepy) are laden with informatics jargon, creating barriers for scientists whose primary goal is to ask scientific questions.
  4. Scientific Evolution — A scientist's preferred analysis changes as they learn more about their subject. The framework must track not just computations but the development of the analysis itself.

Architecture: Three Core Components

1. tap-config: Configuration as Scientific Document

Configurations are YAML files written as structured, version-controllable documents that narrate the entire study, not as parameter dumps. A pipeline.yaml defines subjects (subtargets), measurements (analyses), statistical controls, and variations (slicing). The configuration file is a primary artifact of reproducibility.
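To make the shape of such a document concrete, here is a minimal sketch of a configuration with the four sections named above, held as a Python dictionary. The section names come from the description; every key and value inside them (column names, control names, seeds) is a hypothetical placeholder, and the real pipeline.yaml schema may differ. JSON from the standard library stands in for YAML so the example runs without extra packages.

```python
import json

# Hypothetical configuration. Only the four top-level section names are taken
# from the project description; everything nested below them is invented.
PIPELINE_CONFIG = {
    "subtargets": {
        "flatmap-columns": {"description": "Columns of the flatmap"},
    },
    "analyses": {
        "simplex-counts": {"input": "adjacency"},
        "degree-distribution": {"input": "adjacency"},
    },
    "controls": {
        "shuffled-edges": {"seeds": [0, 1, 2]},
    },
    "slicing": {
        "by-layer": {"property": "layer", "values": [1, 2, 3, 4, 5, 6]},
    },
}

REQUIRED_SECTIONS = ("subtargets", "analyses", "controls", "slicing")


def missing_sections(config):
    """Return the required top-level sections that a config lacks."""
    return [name for name in REQUIRED_SECTIONS if name not in config]


def serialize(config):
    """Render the config as sorted, indented text so diffs stay small."""
    return json.dumps(config, indent=2, sort_keys=True)


print(missing_sections(PIPELINE_CONFIG))
print(serialize(PIPELINE_CONFIG))
```

Sorting keys on output is one way to keep the file stable under version control: two configurations that differ in a single parameter then differ in a single line.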

2. tap-env: Automated HPC Execution

A CLI (tap) manages the three-stage workflow: setup → launch → collect. During setup, it batches inputs, estimates computational load, balances jobs, and generates SLURM sbatch scripts. This hides parallel job management from the scientist, who runs large-scale analyses with a few commands.
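The setup stage can be illustrated with a small sketch. How the framework actually estimates load and balances jobs is not described here, so the code below swaps in a standard greedy heuristic (largest estimated cost first, always into the currently lightest batch). The function names, the cost numbers, and the run_batch.py command are hypothetical; only the #SBATCH directives and the SLURM_ARRAY_TASK_ID variable are standard SLURM.

```python
import heapq


def balance(loads, n_jobs):
    """Assign inputs to n_jobs batches, largest estimated cost first.

    loads maps an input name to its estimated cost. Each input goes to the
    batch with the smallest running total at that moment.
    """
    heap = [(0.0, job, []) for job in range(n_jobs)]
    heapq.heapify(heap)
    ordered = sorted(loads.items(), key=lambda item: (-item[1], item[0]))
    for name, cost in ordered:
        total, job, members = heapq.heappop(heap)
        members.append(name)
        heapq.heappush(heap, (total + cost, job, members))
    # Return batches ordered by job index.
    return [members for _, _, members in sorted(heap, key=lambda t: t[1])]


def sbatch_script(job_name, n_batches, command, time_limit="01:00:00"):
    """Build the text of a SLURM array-job script, one task per batch."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --array=0-{n_batches - 1}",
        f"#SBATCH --time={time_limit}",
        "#SBATCH --ntasks=1",
        # Each array task handles the batch whose index matches its task id.
        f"{command} --batch $SLURM_ARRAY_TASK_ID",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical cost estimates for four subtargets.
loads = {"col-A": 8, "col-B": 5, "col-C": 4, "col-D": 3}
batches = balance(loads, n_jobs=2)
script = sbatch_script("tap-demo", len(batches), "python run_batch.py")
print(batches)
print(script)
```

With these costs, the two batches end up with totals of 11 and 9, which is the kind of evenness that keeps array tasks from finishing at very different times.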

3. tap-store: Intelligent Data Store

Results are collected into a single, structured HDF5 file (connsense.h5), which is read through a Python interface that loads data lazily.
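The technical stack for this project lists a lazy-loading API over the HDF5 store. The sketch below shows the general pattern only: entries are registered cheaply, read on first access, and cached afterwards. The LazyStore class, the key names, and the in-memory loaders are hypothetical stand-ins; a real store would read datasets from connsense.h5 where this example returns hard-coded lists.

```python
class LazyStore:
    """Dict-like store that loads each entry on first access and caches it."""

    def __init__(self, loaders):
        # Maps a key to a zero-argument callable that produces the data.
        self._loaders = dict(loaders)
        self._cache = {}

    def keys(self):
        """List available entries without loading any of them."""
        return sorted(self._loaders)

    def __getitem__(self, key):
        if key not in self._cache:
            self._cache[key] = self._loaders[key]()
        return self._cache[key]


reads = []  # Records each simulated disk read.


def make_loader(name, value):
    """Build a loader that logs the read and returns the stored value."""
    def load():
        reads.append(name)
        return value
    return load


store = LazyStore({
    "analyses/degree/col-A": make_loader("col-A", [2, 1, 0]),
    "analyses/degree/col-B": make_loader("col-B", [1, 0]),
})

first = store["analyses/degree/col-A"]
again = store["analyses/degree/col-A"]  # Served from the cache.
print(store.keys(), reads)
```

Listing keys costs nothing, and only the entry that was requested is ever read, which matters when the store holds results for hundreds of subtargets.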

Pipeline Stages

  1. define-subtargets — Generate subvolumes as collections of node-ids (e.g., flatmap columns)
  2. extract-node-populations — Pull neuron properties (layer, position, cell type)
  3. extract-edge-populations — Create adjacency matrices for connectivity
  4. analyze-connectivity — Compute metrics (simplex counts, degree distributions), apply statistical controls
  5. collect-results — Aggregate into unified HDF5 store
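A toy version of these stages, in plain Python, may help show how they chain together. It covers defining subtargets, extracting an adjacency matrix per subtarget, computing one metric (the out-degree distribution), and collecting results into a single dictionary; node-property extraction and statistical controls are left out. All function names and the tiny circuit are hypothetical, and a dictionary stands in for the HDF5 store.

```python
from collections import Counter


def define_subtargets(node_columns):
    """Group node ids by the column each node belongs to."""
    subtargets = {}
    for node, column in node_columns.items():
        subtargets.setdefault(column, []).append(node)
    return {column: sorted(nodes) for column, nodes in subtargets.items()}


def extract_adjacency(edges, nodes):
    """Build a dense 0/1 adjacency matrix restricted to the given nodes."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        if source in index and target in index:
            matrix[index[source]][index[target]] = 1
    return matrix


def out_degree_distribution(matrix):
    """Map each out-degree to the number of nodes that have it."""
    return dict(Counter(sum(row) for row in matrix))


def run_pipeline(node_columns, edges):
    """Run every stage per subtarget and collect the results in one dict."""
    results = {}
    for name, nodes in define_subtargets(node_columns).items():
        adjacency = extract_adjacency(edges, nodes)
        results[name] = out_degree_distribution(adjacency)
    return results


# Hypothetical circuit: five neurons in two columns, five directed edges.
node_columns = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B"}
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
results = run_pipeline(node_columns, edges)
print(results)
```

The edge from neuron 3 to neuron 4 crosses columns, so it appears in neither subtarget's matrix; restricting edges to a subtarget's own nodes is what makes each subtarget an independent unit of work.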

Impact

Technical Stack

Python · SLURM / HPC orchestration · HDF5 (lazy-loading API) · YAML/JSON configuration · SONATA circuit format · NumPy / Pandas / SciPy
