Algorithm Details

X-Shift (Fast KNN Estimation)

X-Shift is a population finding algorithm that can process large datasets using fast KNN estimation of cell event density and automatically arranges populations by a marker-based classification system.

X-Shift algorithm

Given a dataset (A), X-shift computes the density estimate for each data point (B). It then searches for the local density maxima in a nearest-neighbor graph, which become cluster centroids. All the remaining data points are then connected to the centroids via density-ascending paths in the graph, thus forming clusters (C). The algorithm further checks for the presence of density minima on a straight line segment between the neighboring centroids and merges them as necessary (D). This is needed to ensure that the neighboring clusters, even if they have similar phenotypes, do in fact represent unique density-separated populations. Furthermore, clusters are merged based on a fixed Mahalanobis distance threshold.

Samusik, N., Good, Z., Spitzer, M. H., Davis, K. L., & Nolan, G. P. (2016). Automated mapping of phenotype space with single-cell data. Nature methods, 13(6), 493.

GitHub Link

FLANN

Fast Library for Approximate Nearest Neighbors (FLANN) is a library for performing fast approximate nearest neighbor searches in high dimensional spaces. It contains a collection of algorithms we found to work best for nearest neighbor search and a system for automatically choosing the best algorithm and optimum parameters depending on the dataset. CodexMAV uses a flann_java port of FLANN library.

Muja, M., & Lowe, D. G. (2014). Scalable nearest neighbor algorithms for high dimensional data. IEEE transactions on pattern analysis and machine intelligence, 36(11), 2227-2240.

GitHub Link

Box Plot

A box plot is a method for graphically depicting groups of numerical data through their quartiles. Box plots may also have lines extending vertically from the boxes (whiskers) indicating variability outside the upper and lower quartiles. The spacings between the different parts of the box indicate the degree of dispersion (spread) and skewness in the data, and show outliers.

Box plot

Voronoi Diagram

A Voronoi diagram is a partitioning of a plane into regions based on distance to points in a specific subset of the plane. That set of points (called seeds, sites, or generators) is specified beforehand, and for each seed there is a corresponding region consisting of all points closer to that seed than to any other. These regions are called Voronoi cells. The Voronoi diagram of a set of points is dual to its Delaunay triangulation. CodexMAV computes Delaunay graph of cell centers to estimate which cells are interacting with each other and uses Voronoi diagrams to display the abstract-level single-cell architecture of the tissue, as described here.

Voronoi diagram

Dendrograms

A dendrogram is a diagram representing a tree. In hierarchical clustering, it illustrates the arrangement of the clusters produced by the corresponding analyses. The hierarchical clustering dendrogram shows a column of N nodes representing the initial data, and the remaining nodes represent the clusters to which the data belong, with the lines representing the distance (dissimilarity). The distance between merged clusters is monotone, increasing with the level of the merger: the height of each node in the plot is proportional to the value of the intergroup dissimilarity between its two daughters (the nodes on the right representing individual observations all plotted at zero height).

Dendrogram example

Spatial Interactions

Figure 1: Graphical representation of criteria for spatial interactions. Cells within the gray region are considered spatially interacting.

In the Figure 1, P1 = Population 1, P2 = Population 2, Min and Max Distance are user-specified values to define the proximity of spatial interactions, which also define the radius.

Spatial interaction count from P1 to P2 : Consider P1 the origin. The number of P2 cells within shaded radius (Figure 1) is the spatial interaction count between Population 1 and Population 2. For example in Figure 1, spatial interaction count from P1 to P2 = 3.

Spatial interaction count from P1 to Total cells: Consider P1 the origin. The number of Pn (n = 1, 2, 3 ….) cells within the shaded radius is the interaction count between Population 1 and total cells. For example in Figure 1, spatial interaction count from P1 to total cells = 4.

LogOdds ratio for spatial interaction from P1 to P2 = log (Element A / Element B)

where Element A = Spatial interaction count from P1 to P2 / Spatial interaction count from P1 to total cells. Element B = Number of cells in Population 2 / total cells.