Functional connectomics spanning multiple areas of mouse visual cortex


Mouse lines

All procedures were approved by the Institutional Animal Care and Use Committee (IACUC) of Baylor College of Medicine. All results described here are from a single male mouse, age 65 days at onset of experiments, expressing GCaMP6s in excitatory neurons via Slc17a7-Cre65 and Ai16266 heterozygous transgenic lines (recommended and generously shared by H. Zeng at Allen Institute for Brain Science; JAX stock 023527 and 031562, respectively). In order to select this animal, 31 (12 female, 19 male) GCaMP6-expressing animals underwent surgery as described below. Of these, eight animals were chosen based on a variety of criteria including surgical success and animal recovery, the accessibility of lateral higher visual areas in the cranial window, the degree of vascular occlusion, and the success of cortical tissue block extraction and staining. Of these 8 animals, one was chosen for 40-nm slicing and EM imaging based on overall quality using these criteria.

Timeline

Mouse birth date: 19 December 2017

Surgery: 21 February 2018 (P64)

2P imaging start: 4 March 2018 (P75)

2P imaging end: 9 March 2018 (P80)

Structural stack: 12 March 2018 (P83)

Perfusion: 16 March 2018 (P87)

Surgery

Anaesthesia was induced with 3% isoflurane and maintained with 1.5–2% isoflurane during the surgical procedure. Mice were injected with 5–10 mg kg−1 ketoprofen subcutaneously at the start of the surgery. Anaesthetized mice were placed in a stereotaxic head holder (Kopf Instruments) and their body temperature was maintained at 37 °C throughout the surgery using a homeothermic blanket system (Harvard Instruments). After shaving the scalp, bupivacaine (0.05 ml, 0.5%, Marcaine) was applied subcutaneously, and after 10–20 min an approximately 1 cm2 area of skin was removed above the skull and the underlying fascia was scraped and removed. The wound margins were sealed with a thin layer of surgical glue (VetBond, 3M), and a 13-mm stainless steel washer clamped in the headbar was attached with dental cement (Dentsply Grip Cement). At this point, the mouse was removed from the stereotax and the skull was held stationary on a small platform by means of the newly attached headbar. Using a surgical drill and HP 1/2 burr, a 4-mm-diameter circular craniotomy was made centred on the border between primary visual cortex (V1) and the lateromedial visual area (LM), ~3.5 mm lateral of the midline and ~1 mm anterior to the lambda suture, followed by a durotomy. The exposed cortex was washed with artificial cerebrospinal fluid (25 mM NaCl, 5 mM KCl, 10 mM glucose, 10 mM HEPES, 2 mM CaCl2, 2 mM MgSO4) with 0.3 mg ml−1 gentamicin sulfate (Aspen Veterinary Resources). The cortical window was then sealed with a 4-mm coverslip (Warner Instruments), using cyanoacrylate glue (VetBond). The mouse was allowed to recover for 1 day prior to imaging. After imaging, the washer was released from the headbar and the mouse was returned to the home cage. Prior to surgery and throughout the imaging period, mice were singly housed and maintained on a reverse 12-h light cycle (off at 11:00, on at 23:00).

2P imaging

Mice were head-mounted above a cylindrical treadmill and calcium imaging was performed using a Chameleon Ti:Sapphire laser (Coherent) tuned to 920 nm and a large-FOV mesoscope67 equipped with a custom objective (excitation NA 0.6, collection NA 1.0, 21 mm focal length). Laser power after the objective was increased exponentially as a function of depth from the surface according to:

$$P={P}_{0}\times {{\rm{e}}}^{(z/{L}_{z})}$$

Here P is the laser power used at target depth z, P0 is the power used at the surface (not exceeding 10 mW), and Lz is the depth constant (not less than 150 μm). Maximum laser output of 115 mW was used for scans approximately 450–500 μm from the surface and below.
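For illustration, a minimal Python sketch of this depth-dependent power rule; the constants P0, Lz and the 115 mW ceiling follow the values quoted above, and the function name is hypothetical.

```python
import numpy as np

def laser_power_at_depth(z_um, p0_mw=10.0, lz_um=150.0, max_output_mw=115.0):
    """Exponential depth compensation: P = P0 * exp(z / Lz),
    capped at the maximum laser output quoted in the text."""
    p = p0_mw * np.exp(np.asarray(z_um, dtype=float) / lz_um)
    return np.minimum(p, max_output_mw)

# Example: power requested at the surface, at 300 um and at 500 um depth
print(laser_power_at_depth([0, 300, 500]))  # ~[10, 74, 115] mW
```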

Monitor positioning

Visual stimuli were presented to the left eye with a 31.8 × 56.5 cm (height × width) monitor (ASUS PB258Q) with a resolution of 1,080 × 1,920 pixels positioned 15 cm away from the eye. When the monitor is centred on and perpendicular to the surface of the eye at the closest point, this corresponds to a visual angle of ~3.8° cm−1 at the nearest point and ~0.7° cm−1 at the most remote corner of the monitor. As the craniotomy coverslip placement during surgery and the resulting mouse positioning relative to the objective were optimized for imaging quality and stability, uncontrolled variance in animal skull position relative to the washer used for head-mounting was compensated for with tailored monitor positioning on a six-dimensional monitor arm. The pitch of the monitor was kept in the vertical position for all animals, while the roll was visually matched to the roll of the animal’s head beneath the headbar by the experimenter. In order to optimize the translational monitor position for centred visual cortex stimulation with respect to the imaging FOV, we used a dot stimulus with a bright background (maximum pixel intensity) and a single dark square dot (minimum pixel intensity). Dot locations were randomly ordered from a 5 × 8 grid to tile the screen, with 15 repetitions of 200 ms presentation at each location. The final monitor position for each animal was chosen in order to centre the population receptive field of the scan field ROI on the monitor, with the yaw of the monitor visually matched to be perpendicular to and 15 cm from the nearest surface of the eye at that position. An L-bracket on a six-dimensional arm was fitted to the corner of the monitor at this location and locked in position, so that the monitor could be returned to the chosen position between scans and across days.

Imaging site selection

The craniotomy window was levelled with respect to the objective with six degrees of freedom, five of which were locked between days to allow us to return to the same imaging site using the z axis. Pixel-wise responses from a 3,000 × 3,000 μm ROI spanning the cortical window (150 μm from surface, five 600 × 3,000 μm fields, 0.2 pixels per μm) to drifting bar stimuli were used to generate a sign map for delineating visual areas68. Our target imaging site was a 1,200 × 1,100 × 500 μm volume (anteroposterior × mediolateral × radial depth) spanning layer 2 to layer 6 at the conjunction of VISp and three higher visual areas: VISlm, VISrl and VISal69. This resulted in an imaging volume that was roughly 50% VISp and 50% higher visual area (HVA). This target was chosen to maximize the number of visual areas within the reconstructed cortical volume, as well as to maximize the overlap in represented visual space. The imaging site was further optimized to minimize vascular occlusion and motion artefact, especially where the brain curves away from the skull/coverslip towards the lateral aspect.

Once the imaging volume was chosen, a second retinotopic mapping scan with the same stimulus was collected at 12.6 Hz, matching the imaging volume FOV, with four 600 × 1,100 μm fields per frame at 0.4 pixels per μm xy resolution to tile a 1,200 × 1,100 μm FOV at 2 depths (2 planes per depth, with no overlap between coplanar fields). Area boundaries on the sign map were manually annotated.

2P functional imaging

Of 19 completed scans over 6 days of imaging, 14 are described here (Extended Data Table 1). Scan placement targeted 10–15 μm increments to maximize coverage of the volume in depth.

For 11 scans, imaging was performed at 6.3 Hz, collecting eight 620 × 1,100 μm fields per frame at 0.4 pixels per μm xy resolution to tile a 1,200 × 1,100 μm (width × height) FOV at four depths (two planes per depth, 40 μm overlap between coplanar fields).

For 2 scans, imaging was performed at 8.6 Hz, collecting six 620 × 1,100 μm fields per frame at 0.4 pixels per μm xy resolution to tile a 1,200 × 1,100 μm (width × height) FOV at 3 depths (2 planes per depth, 40 μm overlap between coplanar fields).

For 1 scan, imaging was performed at 9.6 Hz, collecting four 620 × 1,000 μm fields per frame at 0.6 pixels per μm xy resolution to tile a 1,200 × 1,000 μm (width × height) FOV at 2 depths (2 planes per depth, 40 μm overlap between coplanar fields).

The higher-resolution scans were designed to enable future analysis efforts to extract signals from large apical dendrites for example using EM-Assisted Source Extraction (EASE70). In addition to locking the craniotomy window mount between days, the target imaging site was manually matched each day to preceding scans within several micrometres using structural features including horizontal blood vessels (which have a distinctive z-profile) and patterns of somata (identifiable by GCaMP6s exclusion as dark spots).

The full 2P imaging processing pipeline is available at https://github.com/cajal/pipeline. Raster correction for bidirectional scanning phase row misalignment was performed by iterative greedy search at increasing resolution for the raster phase resulting in the maximum cross-correlation between odd and even rows. Motion correction for global tissue movement was performed by shifting each frame in x and y to maximize the correlation between the cross-power spectra of a single scan frame and a template image, generated from the Gaussian-smoothed average of the Anscombe transform of the middle 2,000 frames of the scan. Neurons were automatically segmented using constrained non-negative matrix factorization and then deconvolved to extract estimates of spiking activity, within the CaImAn pipeline71. Cells were further selected by a classifier trained to separate somata from artefacts based on segmented cell masks, resulting in exclusion of 8.1% of masks. The functional data are available in a DataJoint72 database and can also be read as NWB files deposited in the DANDI data archive73.
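As an illustration of the motion-correction step, the sketch below estimates a rigid x-y shift by maximizing the correlation of the cross-power spectrum between a frame and a template, after a variance-stabilizing Anscombe transform; it is a simplified stand-in for the pipeline code linked above, not a copy of it.

```python
import numpy as np

def anscombe(x):
    # Variance-stabilizing transform used when building the template
    return 2.0 * np.sqrt(np.asarray(x, dtype=float) + 3.0 / 8.0)

def estimate_shift(frame, template, eps=1e-12):
    """Estimate the (dy, dx) shift of `frame` relative to `template`
    from the phase of the cross-power spectrum."""
    f = np.fft.fft2(frame)
    t = np.fft.fft2(template)
    cross_power = f * np.conj(t)
    cross_power /= np.abs(cross_power) + eps
    corr = np.real(np.fft.ifft2(cross_power))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    shift = np.array(peak, dtype=float)
    # Convert wrapped peak coordinates to signed shifts
    for axis, size in enumerate(corr.shape):
        if shift[axis] > size // 2:
            shift[axis] -= size
    return shift  # shift the frame by -shift to align it to the template
```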

2P structural stack

Approximately 55 min prior to collecting the stack, the mouse was injected subcutaneously with 60 μl of 8.3 mM Dextran Texas Red fluorescent dye (Invitrogen, D3329). The stack was composed of 30 repeats of three 620 × 1,300 μm (width × height) fields per depth in 2 channels (green and red, respectively), tiling a 1,400 × 1,300 μm FOV (460 μm total overlap in width) at 335 depths from 21 μm above the surface to 649 μm below the surface. The green channel average image across repetitions for each field was enhanced with local contrast normalization using a Gaussian filter to calculate the local pixel means and standard deviations. The resulting image was then Gaussian smoothed and sharpened using a Laplacian filter. Enhanced and sharpened fields were independently stitched at each depth. The resulting stitched planes were independently horizontally and vertically aligned by maximizing the correlation of the cross-power spectrum of their Fourier transformations. Finally, the resulting alignment was detrended in z using a Hann filter with a size of 60 μm to remove the influence of vessels passing through the fields. The resulting transform was applied to the original average images resulting in a structural 2P 1,412 × 1,322 × 670 μm (width × height × depth) volume at 0.5 × 0.5 × 0.5 pixels per μm resolution in both red and green channels.
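The field-enhancement step described above (local contrast normalization with Gaussian-estimated local means and standard deviations) can be sketched as follows; the kernel width is a placeholder, not the value used in the actual pipeline.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_contrast_normalize(img, sigma=20.0, eps=1e-6):
    """Subtract the local mean and divide by the local standard deviation,
    both estimated with a Gaussian filter of width `sigma` (in pixels)."""
    img = np.asarray(img, dtype=float)
    local_mean = gaussian_filter(img, sigma)
    local_sq_mean = gaussian_filter(img ** 2, sigma)
    local_std = np.sqrt(np.maximum(local_sq_mean - local_mean ** 2, 0.0))
    return (img - local_mean) / (local_std + eps)
```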

Owing to tissue deformation from day to day across such a wide FOV, some cells are recorded in more than one scan. To ensure that we count cells only once, we subsample our recorded cells based on proximity in 3D space. Functional scan fields were independently registered using an affine transformation matrix with 9 parameters estimated via gradient ascent on the correlation between the sharpened average scanning plane and the extracted plane from the sharpened stack. Using the 3D centroids of all segmented cells, we iteratively group the closest 2 cells from different scans until all pairs of cells are at least 10 μm apart or a further join produces an unrealistically tall mask (20 μm in z). Sequential registration of sections of each functional scan into the structural stack was performed to assess the level of drift in the z dimension. All scans had less than 10-μm drift over the 1.5-h recording.
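A simplified sketch of the duplicate-cell grouping rule (merging close centroids from different scans until all remaining cross-scan pairs are at least 10 μm apart, while rejecting merges that would make a group taller than 20 μm in z); helper names are hypothetical and the greedy pair ordering is an approximation of the procedure described above.

```python
import numpy as np
from scipy.spatial.distance import cdist

def group_duplicate_cells(centroids_um, scan_ids,
                          min_dist_um=10.0, max_z_extent_um=20.0):
    """Greedy grouping of putative duplicate cells recorded in different scans.
    Pairs are merged in order of increasing distance; a merge is skipped if the
    resulting group would exceed max_z_extent_um in z."""
    pts = np.asarray(centroids_um, dtype=float)   # (n, 3): x, y, z in um
    scan_ids = np.asarray(scan_ids)
    parent = np.arange(len(pts))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    d = cdist(pts, pts)
    ii, jj = np.triu_indices(len(pts), k=1)
    for k in np.argsort(d[ii, jj]):
        i, j = ii[k], jj[k]
        if d[i, j] >= min_dist_um:
            break
        ri, rj = find(i), find(j)
        if ri == rj or scan_ids[i] == scan_ids[j]:
            continue
        members = [m for m in range(len(pts)) if find(m) in (ri, rj)]
        z = pts[members, 2]
        if z.max() - z.min() <= max_z_extent_um:
            parent[rj] = ri
    return np.array([find(i) for i in range(len(pts))])  # group label per cell
```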

Fields from the FOV-matched retinotopy scan described above were registered into the stack using the same approach, and the manually annotated area masks were transformed into the stack. These area masks were extended vertically across all depths, and functional units inherit their area membership from their stack xy coordinates.

Eye and face camera

Video images of the eye and face of the mouse were captured throughout the experiment. A hot mirror (Thorlabs FM02) positioned between the animal’s left eye and the stimulus monitor was used to reflect an IR image onto a camera (Genie Nano C1920M, Teledyne Dalsa) without obscuring the visual stimulus. An infrared 940 nm LED (Thorlabs M940L2) illuminated the right side of the animal, backlighting the silhouette of the face. The positions of the mirror and camera were manually calibrated per session and focused on the pupil. The FOV was manually cropped for each session (ranging from 828 × 1,217 pixels to 1,080 × 1,920 pixels at ~20 Hz), such that it contained the superior, frontal and inferior portions of the facial silhouette as well as the left eye in its entirety. Frame times were time stamped in the behavioural clock for alignment to the stimulus and scan frame times. Video was compressed using LabVIEW’s MJPEG codec with a quality constant of 600 and the frames were stored in an AVI file.

Light diffusing from the laser during scanning through the pupil was used to capture pupil diameter and eye movements. Notably, scans using wide ranges in laser power to image both superficial and deep planes resulted in variable pupil intensity between frames. A custom semi-automated user interface in Python was built for dynamic adaptation of fitting parameters throughout the scan to maximize pupil tracking accuracy and coverage. The video was manually cropped to a rectangular region that included the entirety of the eye at all time points. The video was further manually masked to exclude high-intensity regions in the surrounding eyelids and fur. In cases where a whisker was present and occluded the pupil at some time points, a merge mask was drawn to bridge ROIs drawn on both sides of the whisker into a single ROI. For each frame, the original and filtered images were visible to the user. The filtered image was an exponentially weighted temporal running average, which underwent exponentiation, Gaussian blur, automatic Otsu thresholding into a binary image, and finally pixel-wise erosion/dilation. In cases where only one ROI was present, the contour of the binary ROI was fit with an ellipse by minimizing least-squares error, and for contours longer than the minimum contour length the xy centre and major and minor radii of the ellipse were stored. In cases where more than one ROI was present, tracking was automatically halted until the user resolved the ambiguity, or the frame was left untracked (a NaN (Not a Number) was stored). Processing parameters were under dynamic control of the user, with instructions to use the minimally sufficient parameters that resulted in reliable and continuous tracking of the pupil, as evidenced by plotting of the fitted ROI over the original image. Users could also return to previous points in the trace for re-tracking with modified processing parameters, as well as manually exclude periods of the trace in which no sufficiently reliable pupil boundary was visible for tracking.

Treadmill

The mouse was head-restrained during imaging but could walk on a treadmill. Rostro-caudal treadmill movement was measured using a rotary optical encoder (Accu-Coder 15T-01SF-2000NV1ROC-F03-S1) with a resolution of 8,000 pulses per revolution, and was recorded at ~57–100 Hz in order to extract locomotion velocity.

Stimulus composition

The stimulus was designed to cover a sufficiently large feature space to support training highly accurate models that predict neural responses to arbitrary visual stimuli11,38,74,75. Each scan stimulus was approximately 84 min in duration and comprised:

  • Oracle natural videos: 6 natural video clips, 2 from each category. 10 s each, 10 repeats per scan, 10 min total. Conserved across all scans.

  • Unique natural videos: 144 natural videos, 48 from each category. 10 s each, 1 repeat per scan, 24 min total. Unique to each scan.

  • 2× repeat natural videos: 90 natural videos, 30 from each category. 10 s each, 2 repeats (one in each half of the scan), 30 min total. Conserved across all scans.

  • Local directional parametric stimulus (Trippy): 20 seeds, 15 s each, 2 repeats (one in each half of the scan), 10 min total. 10 seeds conserved across all scans, 10 unique to each scan.

  • Global directional parametric stimulus (Monet2): 20 seeds, 15 s each, 2 repeats (one in each half of the scan), 10 min total. 10 seeds conserved across all scans, 10 unique to each scan.

Each scan was also preceded by 0.15–5.5 min with the monitor on, and followed by 8.3–21.2 min with the monitor off, in order to collect spontaneous neural activity.

Natural visual stimulus

The visual stimulus was composed of dynamic stimuli, primarily including natural video but also including generated parametric stimuli with strong local or global directional component. Natural video clips were 10 s clips from one of three categories:

  • Cinematic, from the following sources: Mad Max: Fury Road (2015), Star Wars: Episode VII—The Force Awakens (2015), The Matrix (1999), The Matrix Reloaded (2003), The Matrix Revolutions (2003), Koyaanisqatsi: Life Out of Balance (1982), Powaqqatsi: Life in Transformation (1988) and Naqoyqatsi: Life as War (2002).

  • Sports-1M collection37, with the following keywords: cycling, mountain unicycling, bicycle, BMX, cyclo-cross, cross-country cycling, road bicycle racing, downhill mountain biking, freeride, dirt jumping, slopestyle, skiing, skijoring, Alpine skiing, freestyle skiing, Greco-Roman wrestling, luge, canyoning, adventure racing, streetluge, riverboarding, snowboarding, mountainboarding, aggressive inline skating, carting, freestyle motocross, f1 powerboat racing, basketball and base jumping.

  • Rendered 3D video of first-person POV random exploration of a virtual environment with moving objects, produced in a customized version of Unreal Engine 4 with modifications that enable precise control and logging of frame timing and camera positions to ensure repeatability across multiple rendering runs. Environments and assets were purchased from Unreal Engine Marketplace. Assets chosen for diversity of appearance were translated along a piecewise linear trajectory, and rotated with a piecewise constant angular velocity. Intervals between change points were drawn from a uniform distribution from 1 to 5 s. If a moving object encountered an environmental object, it bounced off and continued along a linear trajectory reflected across the surface normal. The first-person POV camera followed the same trajectory process as the moving objects. Light sources were the default for the environment. Latent variable images were generated by re-generating the scenes and trajectories, rendering different properties, including absolute depth, object identification number and surface normals.

All natural videos were temporally resampled to 30 frames per second and converted to greyscale at 256 × 144 pixel resolution with FFmpeg (libx264 at YUV 4:2:0, 8 bit). Stimuli were automatically filtered for upper 50th percentile Lucas–Kanade optical flow and temporal contrast of the central region of each clip. All natural videos included in these experiments were further manually screened for unsuitable characteristics (for example, fragments of rendered videos in which the first-person perspective would enter a corner and become ‘trapped’ or follow an unnatural camera trajectory, or fragments of cinematic or Sports-1M videos containing screen text or other post-processing edits).

Global directional parametric stimulus

To probe neuronal tuning to orientation and direction of motion, a visual stimulus (Monet2) was designed in the form of smoothened Gaussian noise with coherent orientation and motion. In brief, an independent and identically distributed (i.i.d.) Gaussian noise video was passed through a temporal low-pass Hamming filter (4 Hz) and a 2D Gaussian spatial filter (σ = 3.0° at the nearest point on the monitor to the mouse). Each 15-s block consisted of 16 equal periods of motion along one of 16 unique directions spanning 0–360°, with a velocity of 42.8° s−1 at the nearest point on the monitor. The video was spatially filtered to introduce a spatial orientation bias perpendicular to the direction of movement by applying a bandpass Hanning filter G(ω; c) in polar coordinates in the frequency domain, where \(\omega =\phi -\theta \), \(\phi \) is the polar angle coordinate and \(\theta \) is the direction of movement. Then:

$$G(\omega \,;c)=\sqrt{c\,}H(c\omega )$$

and

$$H(\omega )=\frac{1}{2}+\frac{1}{2}\cos \omega \quad {\rm{if}}\ |\omega | < \pi ,\ {\rm{and}}\ 0\ {\rm{otherwise}}.$$

Here, c = 2.5 is an orientation selectivity coefficient. At this value, the resulting orientation kernel’s size is 72° full width at half maximum in spatial coordinates.
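In a hedged sketch, the orientation-bias kernel above can be written directly from the two equations: H is a Hann window over the angular offset from the direction of motion and G rescales it by the orientation-selectivity coefficient c. Function names are ours, not the stimulus code's.

```python
import numpy as np

def hann_angular(omega):
    """H(omega) = 1/2 + 1/2*cos(omega) for |omega| < pi, 0 otherwise."""
    omega = np.asarray(omega, dtype=float)
    return np.where(np.abs(omega) < np.pi, 0.5 + 0.5 * np.cos(omega), 0.0)

def orientation_kernel(phi, theta, c=2.5):
    """G(omega; c) = sqrt(c) * H(c * omega) with omega = phi - theta,
    where phi is the polar-angle coordinate in the frequency domain and
    theta is the direction of motion (both in radians)."""
    omega = np.angle(np.exp(1j * (phi - theta)))  # wrap to (-pi, pi]
    return np.sqrt(c) * hann_angular(c * omega)

# FWHM check: the kernel falls to half maximum at |phi - theta| = pi / (2c),
# so the full width at half maximum is pi / c, i.e. 72 degrees for c = 2.5.
print(np.degrees(np.pi / 2.5))  # ~72.0
```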

Local directional parametric stimulus (Trippy)

To probe the tuning of neurons to local spatial features including orientation, direction, and spatial and temporal frequency, the Trippy stimulus was synthesized by applying the cosine function to a smoothened noise video. In brief, a phase movie was generated as an i.i.d. uniform noise video with 4 Hz temporal bandwidth. The video was up-sampled to 60 Hz with a Hanning temporal kernel. An increasing trend of 8π s−1 was added to the video to produce drifting grating motion, while the noise component added local variations of the spatial features. The video was spatially up-sampled to the full screen with a 2D Gaussian kernel with a sigma of 5.97 cm, or 22.5° at the nearest point. The resulting stimulus yielded the local phase video of the gratings, from which all visual features can be derived analytically.
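A rough sketch of the Trippy construction under the description above (low-bandwidth uniform phase noise, a drifting-phase trend, spatial up-sampling with Gaussian smoothing, and a final cosine); the interpolation details, grid sizes and parameter values here are placeholders and differ from the production stimulus code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def trippy_sketch(duration_s=15, fps=60, low_res=(6, 10), screen=(90, 160),
                  temporal_bw_hz=4, drift_rate=8 * np.pi, sigma_px=22.0, seed=0):
    rng = np.random.default_rng(seed)
    # i.i.d. uniform phase noise at low spatial and temporal resolution
    n_low_t = int(duration_s * temporal_bw_hz)
    phase_low = rng.uniform(0, 2 * np.pi, size=(n_low_t, *low_res))
    # Temporal up-sampling to the monitor frame rate (spline interpolation
    # stands in for the Hanning-kernel up-sampling described in the text)
    n_frames = duration_s * fps
    phase = zoom(phase_low, (n_frames / n_low_t, 1, 1), order=3)
    # Add the increasing phase trend that produces grating drift
    t = np.arange(n_frames) / fps
    phase = phase + drift_rate * t[:, None, None]
    # Spatial up-sampling to the full screen followed by Gaussian smoothing
    phase = zoom(phase, (1, screen[0] / low_res[0], screen[1] / low_res[1]), order=1)
    phase = gaussian_filter(phase, sigma=(0, sigma_px, sigma_px))
    # The displayed stimulus is the cosine of the local phase
    return np.cos(phase)
```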

Stimulus alignment

A photodiode (TAOS TSL253) was sealed to the top left corner of the monitor, where stimulus sequence information was encoded in a three-level signal according to the binary encoding of the flip number assigned in order. This signal was recorded at 10 MHz on the behaviour clock (MasterClock PCIe-OSC-HSO-2 card). The signal underwent a sine convolution, allowing for local peak detection to recover the binary signal. The encoded binary signal was reconstructed for 89% of trials. A linear fit was applied to the trial timestamps in the behavioural and stimulus clocks, and the offset of that fit was applied to the data to align the two clocks, allowing linear interpolation between them.

Oracle score

We used six natural video conditions that were present in all scans and repeated ten times per scan to calculate an oracle score representing the reliability of the trace response to repeated visual stimuli. This score was computed as the jackknife mean of correlations between the leave-one-out mean across repeated presentations and the remaining trial.
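A minimal sketch of the oracle score for one unit and one repeated clip, assuming a `responses` array of shape (n_repeats, n_timepoints); in practice the mean is also taken over the six repeated clips.

```python
import numpy as np

def oracle_score(responses):
    """Jackknife mean of correlations between each trial and the mean of the
    remaining trials. `responses` has shape (n_repeats, n_timepoints)."""
    responses = np.asarray(responses, dtype=float)
    n_repeats = responses.shape[0]
    corrs = []
    for i in range(n_repeats):
        held_out = responses[i]
        leave_one_out_mean = responses[np.arange(n_repeats) != i].mean(axis=0)
        corrs.append(np.corrcoef(held_out, leave_one_out_mean)[0, 1])
    return float(np.mean(corrs))
```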

Tissue preparation

After optical imaging at Baylor College of Medicine, candidate mice were shipped via overnight air freight to the Allen Institute. All procedures were carried out in accordance with the Institutional Animal Care and Use Committee at the Allen Institute for Brain Science. All mice were housed in individually ventilated cages, 20–26 °C, 30–70% relative humidity, with a 12-h light:dark cycle. Mice were transcardially perfused with a fixative mixture of 2.5% paraformaldehyde, 1.25% glutaraldehyde, and 2 mM calcium chloride, in 0.08 M sodium cacodylate buffer, pH 7.4. After dissection, the neurophysiological recording site was identified by mapping the brain surface vasculature. A thick (1,200-μm) slice was cut with a vibratome and post-fixed in perfusate solution for 12–48 h. Slices were extensively washed and prepared for reduced osmium treatment based on the protocol of Hua et al.76. All steps were performed at room temperature, unless indicated otherwise. Osmium tetroxide (2%, 78 mM) with 8% v/v formamide (1.77 M) in 0.1 M sodium cacodylate buffer, pH 7.4, for 180 min, was the first osmication step. Potassium ferricyanide 2.5% (76 mM) in 0.1 M sodium cacodylate, 90 min, was then used to reduce the osmium. The second osmium step was at a concentration of 2% in 0.1 M sodium cacodylate, for 150 min. Samples were washed with water, then immersed in thiocarbohydrazide (TCH) for further intensification of the staining (1% TCH (94 mM) in water, 40 °C, for 50 min). After washing with water, samples were immersed in a third osmium immersion of 2% in water for 90 min. After extensive washing in water, lead aspartate (Walton’s (20 mM lead nitrate in 30 mM aspartate buffer, pH 5.5), 50 °C, 120 min) was used to enhance contrast. After two rounds of water wash steps, samples proceeded through a graded ethanol dehydration series (50%, 70%, 90% w/v in water, 30 min each at 4 °C, then 3× 100%, 30 min each at room temperature). Two rounds of 100% acetonitrile (30 min each) served as a transitional solvent step before proceeding to epoxy resin (EMS Hard Plus). A progressive resin infiltration series (1:2 resin:acetonitrile (33% v/v), 1:1 resin:acetonitrile (50% v/v), 2:1 resin:acetonitrile (66% v/v), then 2× 100% resin, each step for 24 h or more, on a gyratory shaker) was done before final embedding in 100% resin in small coffin molds. Epoxy was cured at 60 °C for 96 h before unmolding and mounting on microtome sample stubs for trimming.

The surface of the brain in the neurophysiology ROI was highly irregular, with depressions and elevations that made it impossible to trim all the resin from the surface of the cortex without removing layer 1 (L1) and some portions of layer 2 (L2). Though empty resin increases the number of folds in resulting sections, we left some resin so as to keep the upper layers (L1 and L2) intact to preserve inter-areal connectivity and the apical tufts of pyramidal neurons. Similarly, white matter was also maintained in the block to preserve inter-areal connections despite the risk of increased sectioning artefacts that then have to be corrected through proofreading.

Ultrathin sectioning

The sections were then collected at a nominal thickness of 40 nm using a modified ATUMtome63 (RMC/Boeckeler) onto 6 reels of grid tape45. The knife was cleaned every 100–500 sections, occasionally leading to the loss of a very thin partial section (<40 nm). Thermal expansion of the block as sectioning resumed post-cleaning resulted in a short series of sections substantially thicker than the nominal cutting thickness. The sectioning took place in two sessions: the first lasted 8 consecutive days on a 24-h-a-day schedule and contained sections 1 to 14,773. The loss rate in this initial session was low, but before section 7,931 there were two events that led to consecutive section loss (because of these consecutive section losses we decided not to reconstruct the region containing sections 1 to 7,931, even though the imagery was collected). The first event was due to sections being collected onto apertures with damaged films. To prevent this from happening again, we installed a camera that monitors the aperture before collection. The second event was due to an accident in which the knife bumped the block and nicked a region near the edge of the ROI. At the end of this session we started seeing differential compression between the resin and the surface of the cortex. Because this could lead to severe section artefacts, we paused to trim additional empty resin from the block and also replaced the knife. The second session lasted five consecutive days and an additional 13,199 sections were cut. Owing to the interruption, block shape changes and knife replacement, there are approximately 45 partial sections at the start of this session; importantly, these do not represent tissue loss (see stitching and alignment section). As will be described later, the EM dataset is subdivided into two subvolumes owing to sectioning and imaging events that resulted in loss of a series of sections.

TEM imaging

The parallel imaging pipeline described here63 converts a fleet of TEMs into high-throughput automated imaging systems capable of 24/7 continuous operation. It is built upon a standard JEOL 1200EXII 120 kV TEM that has been modified with customized hardware and software. The key hardware modifications include an extended column and a custom electron-sensitive scintillator. A single large-format CMOS camera outfitted with a low-distortion lens is used to grab image frames with an average acquisition time of 100 ms. The autoTEM is also equipped with a nano-positioning sample stage that offers fast, high-fidelity montaging of large tissue sections, and an advanced reel-to-reel tape translation system that accurately locates each section using index barcodes for random access on the GridTape. In order for the autoTEM system to control the state of the microscope without human intervention and ensure consistent data quality, we also developed a customized software infrastructure, piTEAM, which provides a convenient GUI-based operating system for image acquisition, a TEM image database, real-time image processing and quality control, and closed-loop feedback for error detection and system protection. During imaging, the reel-to-reel GridStage moves the tape and locates the target aperture through its barcode. The 2D montage is then acquired by raster scanning the tissue ROI. Images along with metadata files are transferred to the data storage server. We perform image quality control on all the data and reimage sections that fail the screening. Pixel sizes for all systems were calibrated within the range of 3.95–4.05 nm per pixel and the montages had a typical size of 1.2 mm × 0.82 mm. The EM dataset contains raw tile images with two different sizes because two cameras with different resolutions were used during acquisition. The most commonly used was a 20-megapixel camera that required 5,000 individual tiles to capture the 1 mm2 montage of each section. During the dataset acquisition, three autoTEMs were upgraded with 50-megapixel camera sensors, which increased the frame size and reduced the total number of tiles required per montage to ~2,600.

Volume assembly

The images in each serial section are first corrected for lens distortion effects. A nonlinear transformation of higher order is computed for each section using a set of 10 × 10 highly overlapping images collected at regular intervals during imaging64. The lens distortion correction transformations represent the dynamic distortion effects from the TEM lens system and hence require the acquisition of highly overlapping calibration montages at regular intervals. Overlapping image pairs are identified within each section and point correspondences are extracted for every pair using a feature-based approach. In our stitching and alignment pipeline, we use SIFT (scale-invariant feature transform) feature descriptors to identify and extract these point correspondences. Per-image transformation parameters are estimated by a regularized solver algorithm. The algorithm minimizes the sum of squared distances between the point correspondences of these tile images. Deforming the tiles within a section based on these transformations results in a seamless registration of the section. A down-sampled version of these stitched sections is produced for estimating a per-section transformation that roughly aligns the sections in 3D. A process similar to 2D stitching is followed here, where point correspondences are computed between pairs of sections that are within a desired distance in the z direction. The per-section transformation is then applied to all the tile images within the section to obtain a roughly aligned volume. MIPmaps are utilized throughout the stitching process for faster processing without compromising stitching quality.

The roughly aligned volume is rendered to disk for further fine alignment. The software tools used to stitch and align the dataset are available in our GitHub repository (https://github.com/AllenInstitute/asap-modules). The volume assembly process is entirely based on image metadata and transformation manipulations and is supported by the Render service (https://github.com/saalfeldlab/render).

Cracks larger than 30 μm in 34 sections were corrected by manually defining transforms. The smaller and more numerous cracks and folds in the dataset were automatically identified using convolutional networks trained on manually labelled samples at 64 × 64 × 40 nm3 image resolution. The same was done to identify voxels that were considered tissue. The rough alignment was iteratively refined in a coarse-to-fine hierarchy77, using an approach based on a convolutional network to estimate displacements between a pair of images78. Displacement fields were estimated between pairs of neighbouring sections, then combined to produce a final displacement field for each image to further transform the image stack. Alignment was first refined using 1,024 × 1,024 × 40 nm3 images, then 64 × 64 × 40 nm3 images.

The composite image of the partial sections was created using the tissue mask previously computed. Pixels in a partial section which were not included in the tissue mask were set to the value of the nearest pixel in a higher-indexed section that was considered tissue. This composite image was used for downstream processing, but not included with the released images.

Segmentation

Remaining misalignments were detected by cross-correlating patches of image in the same location between two sections, after transforming into the frequency domain and applying a high-pass filter. Combining with the tissue map previously computed, a mask was generated that sets the output of later processing steps to zero in locations with poor alignment. This is called the segmentation output mask.

Using the outlined method79, a convolutional network was trained to estimate inter-voxel affinities that represent the potential for neuronal boundaries between adjacent image voxels. A convolutional network was also trained to perform a semantic segmentation of the image for neurite classifications, including: (1) soma plus nucleus; (2) axon; (3) dendrite; (4) glia; and (5) blood vessel. Following the described methods80, both networks were applied to the entire dataset at 8 × 8 × 40 nm3 in overlapping chunks to produce a consistent prediction of the affinity and neurite classification maps. The segmentation output mask was applied to the predictions.

The affinity map was processed with a distributed watershed and clustering algorithm to produce an over-segmented image, where the watershed domains are agglomerated using single-linkage clustering with size thresholds81,82. The over-segmentation was then processed by a distributed mean affinity clustering algorithm81,82 to create the final segmentation. We augmented the standard mean affinity criterion with constraints based on segment sizes and neurite classification maps during the agglomeration process to prevent neuron-glia mergers as well as axon–dendrite and axon–soma mergers.

Synapse detection and assignment

A convolutional network was trained to predict whether a given voxel participated in a synaptic cleft. Inference on the entire dataset was processed using the methods described previously80 (using 8 × 8 × 40 nm3 images). These synaptic cleft predictions were segmented using connected components, and components smaller than 40 voxels were removed.
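A sketch of the cleft post-processing step (connected components followed by removal of components smaller than 40 voxels) using scipy.ndimage; the 0.5 threshold on the voxelwise cleft prediction is an assumption for illustration.

```python
import numpy as np
from scipy import ndimage

def segment_clefts(cleft_prob, prob_threshold=0.5, min_voxels=40):
    """Binarize the voxelwise cleft prediction, label connected components,
    and discard components smaller than `min_voxels` voxels."""
    mask = cleft_prob > prob_threshold
    labels, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labels, index=np.arange(1, n + 1))
    too_small = np.flatnonzero(sizes < min_voxels) + 1
    labels[np.isin(labels, too_small)] = 0
    return labels
```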

A separate network was trained to perform synaptic partner assignment by predicting the voxels of the synaptic partners given the synaptic cleft as an attentional signal83. This assignment network was run for each detected cleft, and coordinates of both the presynaptic and postsynaptic partner predictions were logged along with each cleft prediction.

To evaluate precision and recall, we manually identified synapses within 70 small subvolumes (n = 8,611 synapses) spread throughout the dataset84.

Nucleus detection

A convolutional network was trained to predict whether a voxel participated in a cell nucleus. Following the methods described previously80, a nucleus prediction map was produced on the entire dataset at 64 × 64 × 40 nm3. The nucleus prediction was thresholded at 0.5, and segmented using connected components.

Proofreading

Extensive manual, semi-automated, and fully automated proofreading of the segmentation data was performed by multiple teams to improve the accuracy of the neural circuit reconstruction.

Critical to enabling these coordinated proofreading activities is the central ChunkedGraph system1,85,86, which maintains a dynamic segmentation dataset and supports real-time collaborative proofreading on petascale datasets through scalable software interfaces that receive edit requests from various proofreading platforms and support querying and analysis of the edit history.

Multiple proofreading platforms and interfaces were developed and leveraged to support the large-scale proofreading activities performed by various teams at Princeton University, the Allen Institute for Brain Science, Baylor College of Medicine, the Johns Hopkins University Applied Physics Laboratory, and ariadne.ai (individual proofreaders are listed in Acknowledgements). Below we outline the methods for these major proofreading activities focused on improving the completeness of neurons within and proximal to the main cortical column, splitting of merged multi-soma objects distributed throughout the image volume, and distributed application of automated proofreading edits to split erroneously merged neuron segments.

Manual proofreading of dendritic and axonal processes

Following the methods described previously26,85,87 proofreaders from Princeton University, the Allen Institute for Brain Science, Baylor College of Medicine, and ariadne.ai used a modified version of Neuroglancer with annotation capabilities as a user interface to make manual split and merge edits to neurons with somata spatially located throughout the dataset. The choice of which neurons to proofread was based on the scientific needs of different projects, which are described in the accompanying studies4,5,7,10.

Proofreading was aided by on-demand highlighting of branch points and tips on user-defined regions of a neuron based on rapid skeletonization (https://github.com/AllenInstitute/Guidebook). This approach quickly directed proofreader attention to potential false merges and locations for extension, as well as allowed a clear record of regions of an arbor that had been evaluated.

For dendrites, we checked all branch points for correctness and all tips to see if they could be extended. False merges of simple axon fragments onto dendrites were often not corrected in the raw data, since they could be computationally filtered for analysis after skeletonization (see next section). Detached spine heads were not comprehensively proofread. Dendrites that were proofread are identified in CAVE table proofreading_status_and_strategy as status_dendrite = “true”.

For axons, we began by ‘cleaning’ axons of false merges by looking at all branch points. We then performed an extension of axonal tips; the degree of this extension depended on the scientific goals of the different projects. The different proofreading strategies were as follows:

  1. Comprehensive extension: each axon end and branch point was visited and checked to see whether it could be extended, until either its biological completion or an incomplete end was reached (incomplete ends were due either to the axon reaching the borders of the volume or to an artefact that curtailed its continuation). Label: axon_fully_extended.

  2. Substantial extension: each axon branch point was visited and checked; many, but not all, ends were visited and extended. Label: axon_partially_extended.

  3. Inter-areal extension: a subset of axons that projected either from an HVA to V1 or from V1 to an HVA were preferentially extended to look specifically at inter-areal connections. Label: axon_interareal.

  4. Local cylinder cutting: a subset of pyramidal cells were proofread within a local cylinder of 300-μm radius centred on the column featured in Schneider-Mizell et al.4. For layer 2/3 cells the cylinder had a floor at the layer 4/5 border; for layer 4 cells it had a floor at the layer 5/6 border. Any axon leaving the cylinder was cut.

  5. At least 100 synapses: axons were extended until at least 100 synapses were present on the axon, to obtain a sampling of their output connectivity profile. Label: also axon_partially_extended.

Axons that were proofread are identified in CAVE table proofreading_status_and_strategy as status_axon=‘true’ and the proofreading strategy label associated with each axon is described in the column ‘strategy_axon’.

Manual proofreading to split incorrectly merged cells

Proofreading was also performed to correctively split multi-soma objects containing more than one neuronal soma, which had been incorrectly merged in the agglomeration step of the reconstruction process. This proofreading was performed by the Johns Hopkins University Applied Physics Laboratory, Princeton University, the Allen Institute for Brain Science, and Baylor College of Medicine. These erroneously merged multi-soma objects were specifically targeted given their number, distribution throughout the volume, and subsequent impact on global neural connectivity88 (Extended Data Fig. 3). As an example, multi-soma objects comprised up to 20% of the synaptic targets for 78 excitatory cells with proofreading status ‘comprehensive extension’. Although the majority of multi-soma objects contained 2 to 25 nuclei (Extended Data Fig. 3a), one large multi-soma object contained 172 neuronal nuclei owing to its proximity to a major blood vessel present in a substantial portion of the image volume.

Different Neuroglancer web-based applications1,85,86,88 were used to perform this proofreading, but most edits were performed using NeuVue88. NeuVue enables scalable task management across dozens of concurrent users, as well as providing efficient queueing, review and execution of proofreading edits by integrating with primary data management APIs such as CAVE and PCG. Multi-soma objects used to generate proofreading tasks were originally identified using the nucleus detection table available through CAVE. Additionally, algorithms were employed in a semi-automated workflow to detect the presence of incorrect merges and propose potential corrective split locations in the segmentation for proofreaders to review and apply2.

Proofreading through automated error-detection and correction framework

Following methods described elsewhere2, automated error-detection and error-correction methods based on the Neural Decomposition (NEURD) framework were used to apply edits that split incorrectly merged axonal and dendritic segments distributed across the image volume. These automated methods leveraged graph filter and graph analysis algorithms to accurately identify errors in the reconstruction and generate corrective solutions. Validation and refinement of these methods were performed through manual review of proposed automated edits on the NeuVue platform88.

Co-registration

Transform

We initially manually matched 2,934 fiducials between the EM volume and the 2P structural dataset (1,994 somata and 942 blood vessels, mostly branch points, which are available as part of the resource). Although the fiducials cover the total volume of the dataset, below 400 µm from the surface the 2P structural dataset has a much lower signal-to-noise ratio, requiring more effort to identify somata; we therefore made use of more vascular fiducials at depth. The fiducial annotation was done using a down-sampled EM dataset with pixel sizes of 256 nm (x), 256 nm (y) and 940 nm (z).

Using the fiducials, a transform between the EM dataset and the 2P structural stack was calculated (described below). To evaluate the error of the transform we computed the distance in micrometres between the location of a fiducial after co-registration and its original location; a perfect co-registration would have residuals of 0 μm. The average residual was 3.8 μm.

For calculating the transform we introduced a staged approach to separate the gross transformation between the EM volume and the 2P space from the finer nonlinear deformations needed to get good residuals. This was done by taking advantage of the infrastructure created for the alignment of the EM dataset described above.

The full 3D transform is a list of eight transforms that fall into four groups with different purposes:

  1. The first group is a single transform that is a second-order polynomial transform between the two datasets. It serves to scale and rotate the optical dataset into EM space, followed by a single global nonlinear term, leaving an average residual of ~10 µm.

  2. The second group of transforms addresses an issue we saw in the residuals: there were systematic trends in the residuals, both positive and negative, that aligned well with the EM z axis. These trends are spaced in a way that is indicative of changing shape of the EM data on approximately the length scale between knife cleanings or tape changes. We addressed this with a transform that binned the data into z ranges and applied a further second-order polynomial to each bin. We did this in a two-step hierarchical fashion, first with 5 z bins, followed by a second step with 21 z bins. These steps removed the systematic trends in the residuals versus z and the average residuals dropped to 5.6 µm and 4.6 µm, respectively.

  3. The third group is a set of hierarchical thin plate spline transforms. We used successively finer grids of control points of even n × n × n spacing in the volume. We used 4 steps with n = [3, 5, 10, 12]. The idea here is to account for deformations on larger length scales first, so that the highest-order transforms introduce smaller changes in position. The average residuals in these steps were 3.9, 3.5, 3.1 and 2.9 µm, accomplished with average control point motions of 12.5, 7.5, 3.8 and 1.6 µm.

  4. The final group is a single thin plate spline transform. The control points for this transform are no longer an evenly spaced grid; instead, each fiducial point is assigned to be a control point. This transform minimizes the residuals almost perfectly (as it should for the control points, which are identical to the fiducials; 0.003 µm on average; Fig. 3) and accomplishes this final step by moving each data point on average another 2.9 µm. This last transform is very sensitive to error in fiducial location but provides the co-registration with minimal residuals. It is also more likely to create errors in regions with strong distortions, such as the edges of the dataset.

Since the nature of transform 4 is to effectively set the residuals to zero for the control points, we used a new measure to evaluate the error of the transform. We created 2,933 3D transforms, each time leaving out one fiducial and then evaluated the residual of the left-out point. We call this measure ‘leave-one-out’ residuals and it evaluates how well the transform does with a new point.
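The leave-one-out residual can be sketched as below; scipy's thin-plate-spline RBF interpolator is used here as a stand-in for the actual staged transform, so the numbers it produces are illustrative only.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def leave_one_out_residuals(twop_fiducials, em_fiducials):
    """For each fiducial, fit a thin-plate-spline mapping from 2P to EM space
    on all other fiducials and report the error at the held-out point
    (in the units of the EM coordinates, for example micrometres)."""
    twop = np.asarray(twop_fiducials, dtype=float)  # (n, 3)
    em = np.asarray(em_fiducials, dtype=float)      # (n, 3)
    residuals = []
    for i in range(len(em)):
        keep = np.arange(len(em)) != i
        tps = RBFInterpolator(twop[keep], em[keep], kernel='thin_plate_spline')
        predicted = tps(twop[i:i + 1])[0]
        residuals.append(np.linalg.norm(predicted - em[i]))
    return np.array(residuals)
```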

Assigning manual matches

A custom user interface was used to visualize images from both the functional data and the EM data side by side to manually associate functional ROIs with their matching EM cell counterpart and vice versa. To visualize the functional scans, summary images were generated by averaging the scan over time (average image) and correlating pixels with neighbouring pixels over time (correlation image). The product of the average and correlation images was used to clearly visualize cell body locations. Using the per-field affine registration into the 2P structural stack (Fig. 3b), a representative image of labelled vasculature corresponding to the registered field was extracted from the red channel of the stack. EM imagery and the EM nucleus segmentation were resized to 1 μm3 resolution and transformed into the 2P structural stack coordinates using the co-registration transform, allowing an EM image corresponding to the registered field to be extracted. The overlay of the extracted vessel field and extracted EM image was used to confirm local alignment of the vasculature visible in both domains. Soma identity was assessed by comparing the spatial structure of the target soma and nearby somata in the functional image to soma locations from the EM cell nuclei image. Using the tool, matchers generated a hypothesis for which EM cell nucleus matched a given functional unit or vice versa. A custom version of Neuroglancer (Seung laboratory; https://github.com/seung-lab/neuroglancer) was used to visualize the region of interest in the ground-truth EM data for match confirmation. The breakdown of the number of unique neuron matches per 2P scan is shown in Extended Data Fig. 4a. The resulting matches were uploaded to the CAVE table coregistration_manual_v4. The latest recommended manual match table can be found at https://www.microns-explorer.org/cortical-mm3#f-coreg.

Evaluating manual matches

In addition to the matches, the manual co-registration table includes two metrics that help assess confidence. The first, residual, measures the distance between the matched 2P functional unit centroid and the EM neuron soma centroid after transformation with the EM-to-2P fiducial-based transform (Extended Data Fig. 4b, top). The second, separation score, measures the difference in residuals between the match and the nearest non-matched EM neuron (Extended Data Fig. 4b, bottom). Negative separation indicates that the nearest EM neuron to the functional unit after transformation was not chosen by the matchers. Smaller residuals and larger separation scores indicate higher-confidence matches, as is the case for a majority of matches (Extended Data Fig. 4c). To help validate the manual matches, for every EM neuron that was independently matched to at least two scans, the in vivo signal correlation (correlation between trial-averaged responses to oracle stimuli) was computed between the matched unit in scan A and the matched unit in scan B. In addition, for each neuron, two control correlations were computed: the matched unit in scan A to the nearest unit not matched to the neuron in scan B, and vice versa (Extended Data Fig. 4d). As expected, the distributions of oracle scores for matched neurons and control neurons are qualitatively similar, with a slight right-shift towards higher oracle scores for matches, as higher oracle scores were prioritized for matching (Extended Data Fig. 4e). The comparison of signal correlation between matched neurons and their control counterparts exhibits a strong trend, with a clustering in the upper left quadrant and most data points positioned above the diagonal. This indicates that the matched neurons consistently have stronger signal correlations compared to their nearest counterparts, and high signal correlation overall, especially when the oracle score is larger (Extended Data Fig. 4f). Filtering by oracle score further refines the trend, highlighting that high-oracle-score neurons (score >0.2) show even more distinct separation, with matched neurons maintaining superior signal correlation values compared to the nearest-neighbour matches (Extended Data Fig. 4g,h).

Generating the fiducial-based automatch

To generate the fiducial-based automatch, we utilized the EM-to-2P co-registration transform to map all EM neuron nucleus centroids (retrieved from the CAVE table nucleus_neuron_svm) into the 2P functional space. Next, we applied the minimum weight matching algorithm for bipartite graphs89 using the linear_sum_assignment function from the scipy.optimize module90 to perform the matching. The resulting automatch table is stored in the CAVE table coregistration_auto_phase3_fwd, which also includes the associated residual and separation scores. The latest recommended fiducial-based automatch table can be found at https://www.microns-explorer.org/cortical-mm3#f-coreg.
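A sketch of the assignment step: the cost matrix is the Euclidean distance between transformed EM soma centroids and 2P unit centroids, and scipy.optimize.linear_sum_assignment returns the minimum-total-distance one-to-one matching. The helper name and the residual cutoff are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def automatch(em_centroids_2p_space, unit_centroids, max_residual_um=25.0):
    """Minimum-weight bipartite matching between EM nuclei (already transformed
    into 2P space) and functional unit centroids. Matches with a residual above
    `max_residual_um` (an illustrative cutoff) are discarded."""
    cost = cdist(em_centroids_2p_space, unit_centroids)  # pairwise distances (um)
    em_idx, unit_idx = linear_sum_assignment(cost)
    residuals = cost[em_idx, unit_idx]
    keep = residuals <= max_residual_um
    return em_idx[keep], unit_idx[keep], residuals[keep]
```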

Vessel-based co-registration and automatch

To achieve co-registration starting with the 2P structural stacks and EM segmentation and without the use of fiducials, we employed a multi-scale B-spline registration91 using only vasculature data. This non-rigid transformation method corrects the extreme nonlinear tissue distortions caused by shrinkage from 2P to EM. Both the EM segmentation and the 2P structural stack volumes were subsampled to match 1-μm voxel resolution, ensuring consistent scaling and indexing between the volumes.

Pre-processing on the vessels was necessary to address inconsistent signal quality in the 2P data, especially for vessels located deeper in the cortex, which emit lower fluorescence. A Meijering neurite filter92 was applied to the vessels, using the eigenvectors of the Hessian matrix to detect vessels effectively.

An additional filtering step mitigated discrepancies in z resolution and errors from false splits in the EM segmentation. To address the z direction smearing in 2P due to anisotropy, both the 2P and EM volumes were binarized, skeletonized and further processed by removing small isolated segments. A Gaussian filter was convolved over the skeletons, forming tubes of constant radius for co-registration. Another round of skeletonization and Gaussian filtering was applied to correct for false splits in thicker vessels.

The final co-registration was computed using SimpleITK’s B-spline algorithm93, treating the EM volume as the ‘moving’ volume. Initially, centroid alignment was achieved via template matching within a small subvolume. Despite tissue shrinkage, the volumes were locally aligned well enough to yield good correlations. The B-spline transformation was performed across multiple scales, progressing from coarse grids with strong smoothing to finer grids with minimal smoothing. The Limited-Memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimizer with 600 iterations was used, sampling 1% of the points to handle large matrices. The resulting flow field and its inverse defined how each voxel mapped between spaces.
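A compressed sketch of the SimpleITK B-spline step under the parameters quoted above (L-BFGS-B optimizer, 600 iterations, 1% random sampling); the grid size, metric choice and pre-processing are placeholders rather than the exact production settings.

```python
import numpy as np
import SimpleITK as sitk

def bspline_register(fixed_np, moving_np, mesh_size=(8, 8, 8)):
    """Register the EM-derived vessel volume (`moving`) to the 2P vessel volume
    (`fixed`), both already resampled to 1-um voxels and filtered as described."""
    fixed = sitk.GetImageFromArray(fixed_np.astype(np.float32))
    moving = sitk.GetImageFromArray(moving_np.astype(np.float32))

    tx = sitk.BSplineTransformInitializer(fixed, list(mesh_size))

    reg = sitk.ImageRegistrationMethod()
    reg.SetMetricAsCorrelation()
    reg.SetMetricSamplingStrategy(reg.RANDOM)
    reg.SetMetricSamplingPercentage(0.01)  # sample 1% of voxels
    reg.SetInterpolator(sitk.sitkLinear)
    reg.SetOptimizerAsLBFGSB(gradientConvergenceTolerance=1e-5,
                             numberOfIterations=600)
    reg.SetInitialTransform(tx, inPlace=True)

    out_tx = reg.Execute(fixed, moving)
    resampled = sitk.Resample(moving, fixed, out_tx, sitk.sitkLinear, 0.0)
    return out_tx, sitk.GetArrayFromImage(resampled)
```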

For the final step, the flow field was applied to both the EM nuclear segmentation and the 2P unit centroids. Minimum weight matching was performed (as described in ‘Generating the fiducial-based automatch’) to establish match assignments, using excitatory neuron centroids from the CAVE table aibs_metamodel_mtypes_v661_v2. The final table is uploaded to apl_functional_coreg_vess_fwd with the associated residual and separation scores. The latest recommended vessel-based automatch table can be found at https://www.microns-explorer.org/cortical-mm3#f-coreg.

Generating the fiducial-vessel agreement automatch table

To generate the fiducial-vessel agreement automatch table, first, for each table described above (coregistration_auto_phase3_fwd, apl_functional_coreg_vess_fwd), the residual and separation scores were transformed into percentiles. Then, the two tables were merged on keys: ‘session’, ‘scan_idx’, ‘field’, ‘unit_id’ and ‘target_id’.

Evaluating automatch tables

To evaluate the automatch tables, we computed precision and recall using manual matches as ground truth. To ensure a fair comparison, we first restricted both the automatch and manual match tables to only contain rows where the functional unit or EM neuron was commonly attempted. For calculating precision and recall, true positives were rows common to both tables, false positives were rows only in the automatch table, and false negatives were rows only in the manual match table. The precision-recall curves can be used to select an automatch table and/or a metric with which to threshold matches (Extended Data Fig. 5a). In addition, heat maps are provided indicating precision levels (Extended Data Fig. 5b) and the number of automatches remaining (Extended Data Fig. 5c) for jointly applied residual and separation percentile thresholds. To apply a threshold, first convert the residual and separation (named ‘score’ in the table) to percentiles. Then, for residual, apply the threshold as a maximum, taking the matches below the threshold; conversely, for separation, apply the threshold as a minimum.
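A sketch of this evaluation using pandas, assuming both tables carry the key columns listed earlier and have already been restricted to commonly attempted units/neurons.

```python
import pandas as pd

KEYS = ["session", "scan_idx", "field", "unit_id", "target_id"]

def precision_recall(auto_df: pd.DataFrame, manual_df: pd.DataFrame):
    """Treat manual matches as ground truth: true positives are rows present in
    both tables, false positives are automatch-only rows, and false negatives
    are manual-only rows."""
    auto = set(map(tuple, auto_df[KEYS].itertuples(index=False)))
    manual = set(map(tuple, manual_df[KEYS].itertuples(index=False)))
    tp = len(auto & manual)
    fp = len(auto - manual)
    fn = len(manual - auto)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall
```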

Cell classification

We analysed the nucleus segmentations for features such as volume, surface area, fraction of membrane within folds and depth in cortex. We trained a support vector machine (SVM) classifier to use these features to detect which nucleus detections were likely neurons within the volume, with 96.9% precision and 99.6% recall. This model was trained on data from an independent dataset, and the performance numbers are based on evaluating the concordance of the model with the manual cell-type calls within the volume. This model predicted 82,247 neurons within the larger subvolume. For the neurons, we extracted additional features from the somatic region of the cell, including its volume, surface area and density of synapses. Dimensionality reduction on this feature space revealed a clear separation between neurons with well-segmented somatic regions (n = 69,957) and those with fragmented segmentations or sizable merges with other objects (n = 12,290). Combining those features with the nucleus features, we trained a multi-layer perceptron classifier to distinguish excitatory from inhibitory neurons among the well-segmented subset, using 80% of the manually labelled data as a training set and 20% as a validation set to choose hyper-parameters. After running the classifier across the entire dataset, we then tested its performance by sampling an additional 350 cells (250 excitatory and 100 inhibitory). We estimate from this test that the classifier had an overall accuracy of 97%, with an estimated 96% precision and 94% recall for inhibitory calls.
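A minimal scikit-learn sketch of the excitatory/inhibitory classification step (80/20 split for training and hyper-parameter selection); the feature set and network size here are placeholders, not those of the production model.

```python
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def train_ei_classifier(features, labels, seed=0):
    """`features`: per-neuron nucleus and soma features (volume, surface area,
    synapse density, ...); `labels`: 0 = excitatory, 1 = inhibitory."""
    X_train, X_val, y_train, y_val = train_test_split(
        features, labels, test_size=0.2, stratify=labels, random_state=seed)
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000,
                      random_state=seed))
    model.fit(X_train, y_train)
    print("validation accuracy:", model.score(X_val, y_val))
    return model
```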

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.


