FIGURE SUMMARY
Title

A database and deep learning toolbox for noise-optimized, generalized spike inference from calcium imaging

Authors
Rupprecht, P., Carta, S., Hoffmann, A., Echizen, M., Blot, A., Kwan, A.C., Dan, Y., Hofer, S.B., Kitamura, K., Helmchen, F., Friedrich, R.W.
Source
Full text @ Nat. Neurosci.

a, A large and diverse ground truth database obtained by simultaneous calcium imaging and juxtacellular recording (left) can be used 1) for the exploration of the ground truth by a user, 2) for the analysis of the out-of-dataset generalization of spike inference and 3) for the training of a supervised algorithm for spike inference. The right column refers to relevant figures. Colab Notebook refers to relevant cloud-based tools accompanying this paper. b-f, Examples of ground truth recordings with different indicators, different brain regions and species. Left: calcium signal traces (ΔF/F) are shown together with the detected action potentials (APs). Dashed lines indicate breaks during recordings. Traces are representative of recordings from different datasets (see Table 1 for detailed information). Middle: linear kernels of ΔF/F (time scale in seconds) and electrophysiological data (time scale in milliseconds) triggered by single spikes. Right: fluorescence image of the respective neuron, together with the ROI for fluorescence extraction. g, Average spike rate for each neuron of the ground truth database (log scale). 27 datasets (DS) were included in total. Datasets from inhibitory neurons comprise DS#22-27. h, Integral ΔF/F of the spike kernel (first 2 s) for each neuron. Lowest values are observed in PV+ interneurons (DS#23 and #24). See Extended Data Fig. 1 for the underlying kernels.

Training a deep network with noise-matched ground truth improves spike inference.

a, The default deep network takes as input a time window of 64 time points centered on the time point of interest. Through three convolutional layers, two pooling layers and one small dense layer, the spiking probability is extracted from the input time window and returned as a single number for each time point. b, Properties of the population data (frame rate, noise level; dashed line) are extracted and used for noise-matched resampling of existing ground truth datasets. The resampled ground truth is used to train the algorithm, resulting in calibrated spike inference of the population imaging data. c, Top: a low-noise ΔF/F trace is translated into spike rates (SR; inferred spike rates in black, ground truth in orange) more precisely when low-noise ground truth has been used for training. Bottom: a high-noise ΔF/F trace is translated into spike rates (SR; inferred spike rates in black, ground truth in orange) more precisely when high-noise ground truth has been used for training. v in units of standardized noise, % · Hz^(−1/2). d, The spike inference performance for two test conditions (low noise, v = 2, dark gray; high noise, v = 8, light gray) is optimal when training noise approximates testing noise levels. e, Correlation between predictions and ground truth is maximized if noise levels of training datasets match noise levels of testing sets. f, Relative error of predictions with respect to ground truth. g, Relative bias of predictions with respect to ground truth. Column-wise normalized versions of (e-g) are shown in Extended Data Fig. 3.
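The noise level v used in panels b-d can be estimated directly from a ΔF/F trace. The caption states only the units (% · Hz^(−1/2)), so the formula in this sketch is an assumption made for illustration: the median absolute frame-to-frame change of ΔF/F, expressed in percent and divided by the square root of the frame rate. The function name `standardized_noise_level` is hypothetical and is not taken from the CASCADE toolbox.

```python
import numpy as np

def standardized_noise_level(dff, frame_rate_hz):
    """Estimate a standardized noise level in % * Hz^(-1/2).

    dff: 1-D dF/F trace in fractional units (0.05 means 5 %).
    frame_rate_hz: imaging frame rate in Hz.
    Assumed definition (not stated in the caption): median absolute
    frame-to-frame difference, in percent, over sqrt(frame rate).
    """
    dff = np.asarray(dff, dtype=float)
    # The median is robust to sparse, large calcium transients.
    frame_to_frame = np.abs(np.diff(dff))
    return 100.0 * np.nanmedian(frame_to_frame) / np.sqrt(frame_rate_hz)

# Toy trace that jumps by 2 % between consecutive frames, imaged at 4 Hz.
toy_trace = [0.0, 0.02, 0.0, 0.02, 0.0]
toy_noise = standardized_noise_level(toy_trace, 4.0)
```

Under this assumed definition the toy trace yields a noise level of 1, since a 2 % median frame-to-frame change is divided by the square root of 4 Hz. Dividing by the square root of the frame rate makes recordings acquired at different frame rates comparable, which is what noise-matched resampling requires.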

Generalization across datasets.

The network was trained on a given dataset (indicated by the row number) and tested on each other ground truth dataset (column). Diagonal values correspond to metrics shown in Fig. 3e. ‘NAOMi’ is a model trained on simulated GCaMP6f data based on Charles et al. (2019). Rows 21-24 are networks trained on datasets with inhibitory neurons. ‘Global EXC model’ and ‘global INH model’ are globally trained on all excitatory or inhibitory datasets (except datasets #01 and the respective test dataset). a, Correlation of predictions with the ground truth. The size and color of the squares scale with correlation. b, Distribution of the performance of each trained network (row) across all other datasets (distribution across n=25 datasets for each box plot). The dashed line highlights the median of the best-performing model (‘global EXC model’). c-d, Relative error of predictions compared to the ground truth. The dashed line in (d) highlights the median of the best-performing model (‘global EXC model’). e-f, Relative bias of predictions compared to the ground truth (distribution across n=25 datasets for each box plot). All datasets were re-sampled at a frame rate of 7.5 Hz, with a standardized noise level of 2. For box plots, the median is indicated by the central line, 25th and 75th percentiles by the box, and maximum/minimum values excluding outliers (points) by the whiskers.
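The three metrics in panels a-f (correlation, relative error, relative bias) measure different things, and a small numerical sketch may help show why all three are reported. The caption does not define error and bias, so the definitions here are assumptions chosen for illustration: excess and missing inferred activity are summed, then normalized by the total ground truth activity. The function name `inference_metrics` is hypothetical.

```python
import numpy as np

def inference_metrics(predicted, ground_truth):
    """Return (correlation, relative_error, relative_bias).

    Both inputs are spike rates on the same time grid. Error and bias
    follow an assumed, illustrative definition, not the paper's exact one.
    """
    predicted = np.asarray(predicted, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    correlation = np.corrcoef(predicted, ground_truth)[0, 1]
    diff = predicted - ground_truth
    excess = diff[diff > 0].sum()      # activity inferred but not present
    missing = -diff[diff < 0].sum()    # activity present but not inferred
    total = ground_truth.sum()
    relative_error = (excess + missing) / total
    relative_bias = (excess - missing) / total
    return correlation, relative_error, relative_bias

truth = [0.0, 1.0, 2.0, 0.0]
# A prediction that doubles every rate: perfectly correlated, yet biased.
corr_scaled, err_scaled, bias_scaled = inference_metrics([0.0, 2.0, 4.0, 0.0], truth)
# A prediction that swaps two bins: unbiased in total, yet with errors.
corr_swapped, err_swapped, bias_swapped = inference_metrics([0.0, 2.0, 1.0, 0.0], truth)
```

In this toy case the scaled prediction has a correlation of 1 but a relative error and bias of 1, while the swapped prediction has zero bias and a nonzero error. Correlation alone therefore cannot show whether inferred spike rates are calibrated.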

Comparison with model-based algorithms.

a, Example calcium imaging recording and corresponding predictions from the deep-learning-based method (CASCADE) and five model-based algorithms (MLSpike, Peeling, CaImAn, Suite2p, Jewell & Witten). Respective predictions are in black, ground truth in orange. r indicates correlation of predictions with ground truth. Clear false negative detections are labeled with red arrowheads. b, Heat map of the performance (correlation) of each algorithm for each dataset and neuron, calculated at a standardized noise level of 2 % · Hz^(−1/2). All algorithms, except for CASCADE’s global EXC model (cf. Fig. 3), were tuned to the respective dataset by the mean squared error between ground truth and inferred spike rate. Arrowheads highlight the example neurons shown in Fig. 4a (black) and Extended Data Fig. 7 (grey). c, Direct comparison of performance (b) between CASCADE and other algorithms on a single-neuron basis. The difference in performance (correlation) is shown as a histogram across all neurons. ‘Global EXC model’ as defined in Fig. 3. d-f, Comparison of correlation, error and bias across all algorithms and noise levels. v in units of standardized noise, % · Hz^(−1/2). Solid/dashed lines indicate the mean across all neurons, shaded areas represent the SEM. g, Spiking activity in 2 s-bins, ground truth vs. predictions. Lines indicate medians across distributions. Algorithms are color-coded as before. Underlying distributions are shown in Fig. S10. The unity relationship is shown as a dashed line. h, Variability shared across algorithms, measured by the correlation between predictions. i, Histogram of error shared between CASCADE and MLSpike, quantified as the correlation between the unexplained variances for each neuron. Dashed line indicates the median. j, Shared median errors as illustrated in (i) for all pairs of algorithms. The smaller matrices to the right break the shared errors down into false positives and false negatives.
All quantifications were performed with ground truth datasets resampled at 7.5 Hz with a noise level of 2 unless otherwise indicated. Dataset #03 was omitted for all comparisons in Fig. 4 since the short recordings (<10 s) could not be processed by all algorithms.

Inference of spiking activity with CASCADE from population calcium imaging across >1100 neurons in adult zebrafish.

a, Multiple planes were imaged simultaneously. Similar results have been obtained in 21 fish. The ROIs are colored with the average number of inferred stimulus-evoked spikes (colorbar). Non-active neurons were left uncolored. b, Randomly selected examples of calcium traces (ΔF/F, blue), inferred spike rates (SR, black) and inferred discrete spikes, highlighting the de-noising through spike inference. c, Correlation of odor-evoked responses across trials, based on ΔF/F data during the initial 2.5 s of the odor response. d, Correlation of odor-evoked responses across trials, based on inferred spiking probabilities. e, Unsupervised detection of sequential factors (left) and their temporal ‘loading’ (bottom), shown together with the inferred spiking probabilities (center) across a subset of stimulus repetitions. The temporal loadings indicate when a given factor becomes active. All neurons were ordered according to highest activity in pattern #4, highlighting the sequential activity pattern that is evoked by stimuli at multiple times.

Inference of spiking activity with CASCADE for the Allen Brain Observatory dataset in mice.

a, Number of recorded neurons vs. standardized noise levels (in % · Hz^(−1/2)) for all experiments from excitatory (blue) and inhibitory (red) datasets; population imaging datasets in zebrafish (Fig. 5) in black for comparison. b, Example predictions from calcium data (blue). Discrete inferred spikes are shown in red below the inferred spike rates (black). See Extended Data Fig. 10 for more examples. c, Spike rates across the entire population are well described by a log-normal distribution (black fit). n = 38,466 neurons. d, Inferred spike rates across all neurons for recordings in different layers (colors) and for different transgenic driver lines of excitatory neurons. Each underlying data point is the mean spike rate across an experiment (n=336 experiments). e, Average spike rates for different stimulus conditions (x-labels) across layers (colors). Each data point is the mean spike rate across one experiment. f, Excerpt of raw ΔF/F traces of a subset of neurons of a single experiment (L2/3-Slc17a7, experiment ID ‘652989705’). Correlated noise is visible as vertical striping patterns. g, Same as (f), but with inferred spike rates. h, Average correlation between neuron pairs within an experiment (n=336 experiments), computed from raw ΔF/F traces (left) and inferred spike rates (right). For box plots, the median is indicated by the central line, 25th and 75th percentiles by the box, and maximum/minimum values excluding outliers (points) by the whiskers.
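The quantity in panel h, the average correlation between neuron pairs within one experiment, can be sketched as follows. This is a minimal illustration that assumes Pearson correlation over the full recording and an unweighted mean across all unique pairs; the caption does not specify these details, and the function name `mean_pairwise_correlation` is hypothetical.

```python
import numpy as np

def mean_pairwise_correlation(activity):
    """Average Pearson correlation over all unique neuron pairs.

    activity: array of shape (n_neurons, n_timepoints), holding either
    raw dF/F traces or inferred spike rates for one experiment.
    """
    activity = np.asarray(activity, dtype=float)
    corr = np.corrcoef(activity)                 # (n_neurons, n_neurons)
    # Upper triangle without the diagonal: each pair is counted once.
    pair_idx = np.triu_indices(corr.shape[0], k=1)
    return corr[pair_idx].mean()

# Three toy neurons: two co-vary perfectly, the third is anti-correlated.
toy_activity = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
toy_mean_corr = mean_pairwise_correlation(toy_activity)
```

Applying the same function to raw ΔF/F and to inferred spike rates from one experiment gives the two values compared in panel h. Noise shared across neurons, such as the vertical striping in panel f, would be expected to raise the ΔF/F value.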

Acknowledgments
This image is the copyrighted work of the attributed author or publisher, and ZFIN has permission only to display this image to its users. Additional permissions should be obtained from the applicable author or publisher of the image.