Image segmentation is a crucial step in various image analysis pipelines and constitutes one of the cutting-edge areas of digital pathology. The advent of quantitative analysis has enabled the evaluation of millions of individual cells in tissues, allowing for the combined assessment of morphological features, biomarker expression, and spatial context. The recorded cells can be described as a point pattern process. However, the classical statistical approaches to point pattern processes prove unreliable in this context due to the presence of multiple irregularly-shaped interstitial cell-devoid spaces in the domain, which correspond to anatomical features (e.g. vessels, lipid vacuoles, glandular lumina) or tissue artefacts (e.g. tissue fractures), and whose coordinates are unknown. These interstitial spaces impede the accurate calculation of the domain area, resulting in biased clustering measurements. Moreover, the mistaken inclusion of empty regions of the domain can directly impact the results of hypothesis testing. The literature currently lacks any introduced bias correction method to address interstitial cell-devoid spaces. To address this gap, we propose novel resampling methods for testing spatial randomness and evaluating relationships among different cell populations. Our methods obviate the need for domain area estimation and provide non-biased clustering measurements. We created the SpaceR software (https://github.com/GBertolazzi/SpaceR) to enhance the accessibility of our methodologies.

Several approaches are employed in image segmentation, ranging from traditional techniques based on the threshold values and cellular features (including size, shape and edge) to advanced methods utilizing machine learning and deep neural networks. Cellular segmentation algorithms are critical components of image analysis processes and typically involve a series of computational steps that aim to identify the background and separate cells from each other, ultimately producing precise boundaries for each cell. Indeed, these algorithms are designed to automatically identify and delineate individual cells within an image. This task can be highly challenging due to variations in cell shapes, sizes, and staining intensities. In image analysis, older software like Imagescope traditionally allows us to evaluate signal intensity or pixel count thresholds by categorizing them as weak, moderate, or strong positivity, but cannot perform cell segmentation. Over recent years, there has been a growing emphasis on cellular segmentation, driven by the advent of new methods and a heightened focus on automation and precision. Contemporary, cutting-edge software tools have evolved from foundational image processing algorithms developed decades ago, including ImageJ, Cellsegm, and CellProfiler [1,2]. HALO is a notable addition to these recent segmentation tools, which facilitates automated tissue segmentation, serial section analysis, and TMA processing (as referenced in HALO). Using HALO, we investigated how cells are spatially distributed. Moreover, we identified different cell populations by recording signal intensities.

Classifying cells in populations based on the protein markers is a common procedure in clinical medicine and scientific research. The study of those populations helps us in tissue characterization and understanding the cell interaction behaviour. A cell population could be distributed in clusters and could preferentially aggregate or segregate with other specific populations. The recorded cell set X = (x1, x2, …, xN} can be described as a point pattern process within the domain De R2. In this context, the D domain is affected by the presence of multiple irregularly shaped interstitial cell-devoid spaces, which correspond to anatomical features (e.g. vessels, lipid vacuoles, glandular lumina) or tissue artefacts (e.g. tissue fractures), and whose coordinates are unknown (Fig. 1). These interstitial spaces limit analyses conducted using the point pattern approach; specifically, the calculation of the domain area is not accurate. Despite some authors having utilized classical clustering measurements in spatial cytometry [3, 4, 5], no bias correction method has been introduced in the literature to account for interstitial cell-devoid spaces. To address this gap, we propose novel resampling methods for testing spatial randomness and evaluating relationships among different cell populations. Our approach doesn't need the domain area estimation and provides non-biased clustering measurements.

Figure 1.

Spatial segmentation of a lung tissue using HALO software. The empty white holes are the air space of the alveolus.

Figure 1.

Spatial segmentation of a lung tissue using HALO software. The empty white holes are the air space of the alveolus.

Close modal

HALO [6] is one of the most widely adopted software tools for tissue image analysis in multiplexed staining approaches. It allows accurate quantitative tissue analyses in digital pathology, also integrating artificial intelligence [7, 8, 9]. In detail, it enables quantifying various histopathological features in digital slides with high reproducibility. The HALO platform (v3.2.1851.229, Indica Labs) image analysis tool encompasses different modules for brightfield or fluorescence microscopy. Besides the automated tissue segmentation, the different modules evaluate the spatial distribution, maintaining an interactive link between cell data and images. In our spatial cytometry dataset, we adopted a specific immunohistochemistry module for recognizing each cell's nucleus, cytoplasm, and membrane. The adopted module reports staining intensity metrics and cell coordinates within the tissue (see supplementary data).

Spatial Analysis offers the ability to identify cells’ proximity and spatial distribution in individual tissues. It determines the average distance and the number of unique neighbours between cell populations.

In this section, we present three spatial analysis methods to study spatial cell distributions and relations;

  • Spatial randomness test evaluates the presence of cell clustering behavior.

  • Spatial dependence test evaluates the presence of aggregation and segregation effects between two cell populations.

  • Distribution equality test evaluates if two cell populations are differentially distributed over the domain.

All these methods are based on resampling approaches conceptually close to a bootstrapping procedure without replacement. Therefore, the resampling under the null hypothesis will provide the null distribution of the statistics of interest.

3.1 Spatial randomness test

Let's consider a set S of N cells and a subset A of na. lls identified by a specific biomarker. Our approach evaluates the clustering degree of subset A by comparing the sample average nearest neighbour (NN) distance a. within the subset A with the distribution of the average NN distances under the hypothesis of spatial randomness.

The average NN distance of the subset A is equal to:

x¯a=1nai=1n2dia

where, dia. is the distance between the ith cell of set A and its nearest neighbour cell inside A. The empirical distribution of a under the hypothesis of spatial randomness is obtained by using the following resampling procedure:

  • K groups of na cells are randomly selected from the whole set S.

  • The average NN distance of the kth random group is calculated as.

x¯k=1nai=1n2dik

where, dik the distance between the ith cell of set k and its nearest neighbour cell inside k.

We obtain a vector of K values that compose the empirical null distribution of a. Therefore, the p-value for spatial randomness testing can be calculated as the proportion of values from the empirical distribution lower or equal to the observed average NN distance a:

p-value=1Kk=1KI(x¯kx¯a)

As an additional advantage of our approach, we can calculate an empirical version of the Clark-Evans aggregation index [10] as the ratio between the observed average NN distance and the average value of the null distribution:

R=Kxa¯k=1Kx¯k

The empirical R index has the same interpretation of the Clark-Evans index. Indeed, R < 1 indicates aggregation (i.e., clustering), whereas R > 1 indicates segregation. Finally, a value of R ≈ 1 indicates spatial randomness.

3.2 Spatial dependence test

We can extend the criterion used in the previous testing procedure to evaluate if a cell population is significantly close or far from another one. Let's consider a set S of N cells and two subsets, say A and B, identified through two different biomarkers, with AB = Ø. We want to evaluate if A cells are significantly close or far from B cells. To reach this goal, we use a resampling approach that compares the average NN distance ab from A cells to B cells with the distribution of ab under the hypothesis of spatial random distribution of the B cells. The average NN distance ab is calculated as:

x¯ab=1nai=1n2diab

where, diab is the distance between the ith cell of set A and its nearest neighbour cell of B. The empirical distribution of a under the hypothesis of spatial randomness of B cells is obtained using the following resampling procedure;

  • After excluding the A cells from the S set, K groups of nb cells are randomly selected from the remaining N-na cells.

  • The average NN distance of the kth random group is calculated as.

x¯ak=1nai=1n2diak

where, diab is the distance between the ith cell of set A and its nearest neighbour cell inside k.

This resampling procedure gives a vector of K values that composes the empirical null distribution of ab.

Considering the null distribution, the dependence test can be defined as a two-tailed test or as a one-tailed test to distinguish aggregation and segregation effects. If we want to evaluate if A cells are significantly close to B cells (aggregation effect), the p-value can be calculated as the proportion of values from the empirical distribution lower or equal to the observed average NN distance ab:

aggregationp-value=1Kk=1KI(x¯akx¯ab)

On the other hand, if we want to evaluate if A cells are significantly far from B cells (segregation effect), the p-value can be calculated as the proportion of values from the empirical distribution greater or equal to the observed average NN distance ab:

segregationp-value=1Kk=1Kl(x¯akx¯ab)

Finally, the two-tailed p-value can be calculated as:

two-tailedp-value={2Kk=1KI(x¯akx¯ab),ifx¯abM2Kk=1KI(x¯akx¯ab),otherwise

where M is the median value of the empirical null distribution. Additionally, we can calculate an interaction index through the following ratio:

Rab=KX¯abk=1KX¯ak

where Rab < 1 indicates an aggregation effect of A cells on B cells, Rab > 1 indicates a segregation effect of A cells on B cells, and Rab ≈ 1 indicates the absence of a spatial association between A and B.

It is important to underline that the spatial relation between A and B cells could be asymmetrical. Indeed, the spatial behaviour of NN cells of A could be very different from that of the NN cells of B. Therefore, the test results have to be interpreted from the point of view of the A population in terms of NN cells. In other words, we evaluate if the NN cells of A involve a significant number of B cells.

Until now, we have assumed that AB = Ø. If there are nab double-positive cells that belong to both A and B populations, some observed dab distances are equal to zero. In this case, we fix the number of double-positive cells equal to nab in each random sample. This strategy allows us to run the test despite double positive cells, making the testing procedure more conservative. Moreover, we suggest considering a part of the double-positive cells during the testing to understand their spatial relation with the other populations.

3.3 Distribution equality test

Let's consider two cell populations, say A and B, identified through two different biomarkers, with AB = Ø.

We propose a statistical test to evaluate the hypothesis of spatial distribution equality between A and B populations. A rejection of this hypothesis indicates a different spatial distribution of the two cell groups.

Our statistical approach is conceptually close to the quadrat count method [11]; the domain of interest is divided into m subareas (empty subareas are eventually excluded). Then, we calculate the difference between the number of the two types of cells present in each subarea. Statistical test C compares the observed differences with the expected differences under the hypothesis of spatial cell mixing:

C=i=1m(diE[di])2|E[di]|

where, di is the difference between the number of A cells and B cells in the ith subarea, and E[di] is the expected difference calculated as:

E[di]=pni(1p)ni

where ni is the total number of cells in the ith subarea, and p is the proportion of A cells in the whole domain, that is, p = na/(na + nb).

Although the previous formula looks familiar, the C variable doesn't follow a Chi-square distribution because the expected difference in the denominator is expressed as an absolute value. For this reason, we empirically calculate the null distribution through a resampling approach:

  • The cell groups A and B are randomly mixed K times, fixing their sizes na and nb.

  • The C statistic is calculated for each randomly mixed scenario.

Therefore, the p-value can be calculated as the proportion of values from the empirical null distribution greater than or equal to the observed statistical test value c:

p-value=1Kk=1KI(ckC)

Like any quadrat count approach, our method is influenced by the quadrat size [12]. On the other hand, an advantage of our approach is that quadrats that go out of the domain don't bias the C statistics calculation because the expected difference di is calculated based on the cell number ni in the ith subarea.

Adopting the same criteria of the previous indexes, we can calculate a dissimilarity index as:

IC=KCk=1Kck

where, IC ≠ 1 indicates that A and B are differentially distributed over the domain, and IC ≈ 1 indicates that A and B cells are equally distributed.

4.1 Simulation Analysis

We carried out simulation analyses to evaluate the p-value accuracy of the introduced statistical tests. Specifically, for each testing procedure, the null distribution of the test statistic has been obtained by calculating the statistic values among K two-dimensional random fields containing na = p · N cells of set A and N = 10.000 total cells. Then, we obtained the p-value distribution by performing K2 additional simulations, each providing a statistical test value xi, with i ∈ {1, …, K2}. Therefore, the p-value pi has been calculated by comparing the statistic value xi with the null distribution previously obtained.

Table 1, 2, and 3 show the p-value distribution properties varying the p = na/N parameter. For K≥1.000, we observed a uniform distribution of p-values in almost all simulations, with a p-value accuracy close to 0.001. The results highlight high p-value reliability for all the introduced statistical tests.

Table 1.

Properties of spatial randomness test p-value distribution over different null distribution sizes. The analysis has been carried out by varying the dimension na of the cell set on which the clustering measurements were calculated.

Distribution size# p-valuesna/NUniform distr. p-valueProp. p-val < 0.01Accuracy
K = 1.000 K2 = 49.000 0.001 0.24 0.00944 0.00056 
  0.01 0.085 0.01327 0.00327 
  0.05 0.98 0.01229 0.00229 
  0.1 0.96 0.00671 0.00329 
  0.2 0.73 0.0084 0.0016 
K = 5.000 K2 = 45.000 0.001 0.1 0.0095 5e-04 
  0.01 0.096 0.00912 0.00088 
  0.05 0.66 0.01074 0.00074 
  0.1 0.99 0.0099 1e-04 
  0.2 0.76 0.00981 0.00019 
K = 9.999 K2 = 40.000 0.001 0.11 0.01026 0.00026 
  0.01 0.13 0.01016 0.00016 
  0.05 0.4 0.01049 0.00049 
  0.1 0.91 0.01138 0.00138 
  0.2 0.4 0.00891 0.00109 
Distribution size# p-valuesna/NUniform distr. p-valueProp. p-val < 0.01Accuracy
K = 1.000 K2 = 49.000 0.001 0.24 0.00944 0.00056 
  0.01 0.085 0.01327 0.00327 
  0.05 0.98 0.01229 0.00229 
  0.1 0.96 0.00671 0.00329 
  0.2 0.73 0.0084 0.0016 
K = 5.000 K2 = 45.000 0.001 0.1 0.0095 5e-04 
  0.01 0.096 0.00912 0.00088 
  0.05 0.66 0.01074 0.00074 
  0.1 0.99 0.0099 1e-04 
  0.2 0.76 0.00981 0.00019 
K = 9.999 K2 = 40.000 0.001 0.11 0.01026 0.00026 
  0.01 0.13 0.01016 0.00016 
  0.05 0.4 0.01049 0.00049 
  0.1 0.91 0.01138 0.00138 
  0.2 0.4 0.00891 0.00109 
Table 2.

Properties of spatial dependence test p-value distribution over different null distribution sizes. Since we want to evaluate if A cells are significantly close to B cells, we have fixed the dimension na of the A set such that na/N = 0.01, and we have varied the dimension nb of the B set.

Distribution size# p-valuesnb/NUniform distr. p-valueProp. p-val < 0.01Accuracy
K = 1.000 K2 = 49.000 0.01 0.35 0.0194 0.0094 
  0.05 0.018 0.00853 0.00147 
  0.1 0.17 0.00651 0.00349 
  0.2 0.8 0.01246 0.00246 
K = 5.000 K2 = 45.000 0.01 0.34 0.01159 0.00159 
  0.05 0.035 0.00731 0.00269 
  0.1 0.38 0.00935 0.00065 
  0.2 0.83 0.01064 0.00064 
K = 9.999 K2 = 40.000 0.01 0.25 0.01122 0.00122 
  0.05 0.028 0.00737 0.00263 
  0.1 0.27 0.01104 0.00104 
  0.2 0.78 0.01139 0.00139 
Distribution size# p-valuesnb/NUniform distr. p-valueProp. p-val < 0.01Accuracy
K = 1.000 K2 = 49.000 0.01 0.35 0.0194 0.0094 
  0.05 0.018 0.00853 0.00147 
  0.1 0.17 0.00651 0.00349 
  0.2 0.8 0.01246 0.00246 
K = 5.000 K2 = 45.000 0.01 0.34 0.01159 0.00159 
  0.05 0.035 0.00731 0.00269 
  0.1 0.38 0.00935 0.00065 
  0.2 0.83 0.01064 0.00064 
K = 9.999 K2 = 40.000 0.01 0.25 0.01122 0.00122 
  0.05 0.028 0.00737 0.00263 
  0.1 0.27 0.01104 0.00104 
  0.2 0.78 0.01139 0.00139 
Table 3.

Properties of equality test p-value distribution over different null distribution sizes. Since we want to test the spatial difference between A and B sets, the analysis has been carried out by varying the size gap between the two cell sets under exam (where na = N - nb).

Distribution size# p-valuesna/NUniform distr. p-valueProp. p-val< 0.01Accuracy
K = 1.000 K2 = 49.000 0.001 0.33 0.00944 0.00056 
  0.01 0.034 0.01263 0.00263 
  0.05 0.71 0.00806 0.00194 
  0.1 0.49 0.01004 4e-05 
  0.2 0.1 0.00938 0.00062 
  0.48 0.04 0.01495 0.00495 
K = 5.000 K2 = 45.000 0.001 0.76 0.01304 0.00304 
  0.01 0.15 0.01177 0.00177 
  0.05 0.39 0.00897 0.00103 
  0.1 0.5 0.0115 0.0015 
  0.2 0.7 0.009 0.001 
  0.48 0.43 0.01053 0.00053 
K = 9.999 K2 = 40.000 0.001 0.62 0.01109 0.00109 
  0.01 0.16 0.01102 0.00102 
  0.05 0.67 0.00969 0.00031 
  0.1 0.59 0.01112 0.00112 
  0.2 0.89 0.00974 0.00026 
  0.48 0.48 0.00887 0.00113 
Distribution size# p-valuesna/NUniform distr. p-valueProp. p-val< 0.01Accuracy
K = 1.000 K2 = 49.000 0.001 0.33 0.00944 0.00056 
  0.01 0.034 0.01263 0.00263 
  0.05 0.71 0.00806 0.00194 
  0.1 0.49 0.01004 4e-05 
  0.2 0.1 0.00938 0.00062 
  0.48 0.04 0.01495 0.00495 
K = 5.000 K2 = 45.000 0.001 0.76 0.01304 0.00304 
  0.01 0.15 0.01177 0.00177 
  0.05 0.39 0.00897 0.00103 
  0.1 0.5 0.0115 0.0015 
  0.2 0.7 0.009 0.001 
  0.48 0.43 0.01053 0.00053 
K = 9.999 K2 = 40.000 0.001 0.62 0.01109 0.00109 
  0.01 0.16 0.01102 0.00102 
  0.05 0.67 0.00969 0.00031 
  0.1 0.59 0.01112 0.00112 
  0.2 0.89 0.00974 0.00026 
  0.48 0.48 0.00887 0.00113 

4.2 Analysis on real spatial cytometry data

As proof of concept, we applied our statistical tests to real spatial cytometry data from the area segmentation of lung and tonsil samples. In particular, we focused on lymph node germinal center (GC) compartments. The

GC is a specialized microenvironment formed within the B-cell follicles in secondary lymphoid tissues and is subdivided into two distinct functional compartments: the Dark Zone (DZ), the site of B-cell proliferation and somatic hypermutation (SHM); and the Light Zone (LZ), where antigens are presented to follicular T helper (Tfh) cells for selection [13].

Table 1 reports the spatial randomness test results from one lung sample (Fig. 3A) and two tonsil samples (Fig. 3B and 3C). Those samples clearly show a clustering behaviour of cell populations, while the bottom part of Table 1 reports the results from a randomized sample (Fig. 3D). As we expected, clustering p-values calculated on the random sample are not significant. Table 2 reports the results from the spatial dependence test. In particular, we have separately evaluated the aggregation effect of cells expressing the dark-zone associated B-cell marker AID on cells expressing the DNA-damage marker gH2AX and the segregation effect of AID+ B cells with CD3-expressing T cells. While the bottom part of Table 2 reports the results from a randomized sample (Fig. 3D). As we expected, the randomization eliminates the spatial dependence among cell populations.

Figure 2.

Cell spatial distribution of four different samples. Clustering measurements have been calculated for each cell type group as reported in Tab.4. A) Spatial distribution of ACE2 cells in lung tissue - B) Spatial distribution of AID and CD3 cells in tonsil tissue - C) Spatial distribution of AID and gH2AX cells in tonsil tissue - D) Spatial distribution of randomized cells from sample C.

Figure 2.

Cell spatial distribution of four different samples. Clustering measurements have been calculated for each cell type group as reported in Tab.4. A) Spatial distribution of ACE2 cells in lung tissue - B) Spatial distribution of AID and CD3 cells in tonsil tissue - C) Spatial distribution of AID and gH2AX cells in tonsil tissue - D) Spatial distribution of randomized cells from sample C.

Close modal
Figure 3.

Cell spatial distribution of two cell populations clearly separated (cell mixing p-value < 2e-04).

Figure 3.

Cell spatial distribution of two cell populations clearly separated (cell mixing p-value < 2e-04).

Close modal

Fig. 4, 5, 6, and 7 report the cell spatial distributions on which we applied the spatial equality test. In the right panel of each figure, we report the observed differences and the E[di] expected differences obtained from the quadrat count approach. In particular, Fig. 4 shows a clear cell spatial separation (p-value < 2e-04), Fig. 5 reports two cell sets artificially randomly mixed (p-value = 0.96), Fig. 6 shows two cell populations naturally mixed (p-value = 0.35), and Fig. 7 shows two cell population spatially separated (p-value = 0.0016).

Figure 4.

Cell spatial distribution of two randomized cell populations (cell mixing p-value = 0.96).

Figure 4.

Cell spatial distribution of two randomized cell populations (cell mixing p-value = 0.96).

Close modal
Figure 5.

Cell spatial distribution of two mixed cell populations (cell mixing p-value = 0.35).

Figure 5.

Cell spatial distribution of two mixed cell populations (cell mixing p-value = 0.35).

Close modal
Figure 6.

Cell spatial distribution of two separated cell populations (cell mixing p-value = 0.0016).

Figure 6.

Cell spatial distribution of two separated cell populations (cell mixing p-value = 0.0016).

Close modal

Our results provide a quantitative layout for interpreting specific biological features of the investigated tissue. Specifically, we demonstrated a spatial association between AID-expressing B cells of the dark zone, which undergo replication and mutational activity, with the DNA-damage-associated marker pgH2AX. Moreover, we could support by quantitative analysis, consistent segregation of the AID+ DZ cells with T-cells residing in the germinal center, a feature that prompts intriguing hypotheses regarding the selective regulation of T-cell presence in the DZ.

Table 4.

Cluster analysis on four separate samples shown in Fig. 2. Each sample contains different cell populations (i.e., gH2AX, AID, CD3, CD8, CD4) on which the clustering measurements have been calculated.

samplesetnaab∑x̄k/KRp-value
S1 ACE2 203 47.34 61.61 0.768 <2e-04 
S2 AID 218 62.48 84.9 0.736 <2e-04 
 CD3 264 64.58 77.57 0.832 <2e-04 
S3 AID only 988 26.88 30.08 0.894 <2e-04 
 gH2AX only 163 56.78 64.02 0.887 6e-04 
 double 102 53.88 80.69 0.668 <2e-04 
random AID only 988 29.86 30.07 0.993 0.241 
 gH2AX only 163 63.66 64.06 0.994 0.445 
 double 102 82.38 80.74 1.02 0.658 
samplesetnaab∑x̄k/KRp-value
S1 ACE2 203 47.34 61.61 0.768 <2e-04 
S2 AID 218 62.48 84.9 0.736 <2e-04 
 CD3 264 64.58 77.57 0.832 <2e-04 
S3 AID only 988 26.88 30.08 0.894 <2e-04 
 gH2AX only 163 56.78 64.02 0.887 6e-04 
 double 102 53.88 80.69 0.668 <2e-04 
random AID only 988 29.86 30.07 0.993 0.241 
 gH2AX only 163 63.66 64.06 0.994 0.445 
 double 102 82.38 80.74 1.02 0.658 
Table 5.

Results of spatial cell aggregation and segregation tests on cell distributions reported in Fig. 2. In the table, N is the total number of cells per sample used for the testing procedure.

sampleNA setnaB setnbab∑x̄k/KRabp-valueH1 hypothesis
S2 2418 AID 218 CD3 264 153.44 88.38 1.736 <2e-04 segregation 
S3 2369 AID 163 gH2AX only 163 72.14 91.61 0.787 <2e-04 aggregation 
    double 102 100.03 112.63 0.888 0.0936  
random 2369 AID 163 gH2AX only 163 64.89 63.72 1.018 0.49 two-tailed 
    double 102 81 80.24 1.009 0.71  
sampleNA setnaB setnbab∑x̄k/KRabp-valueH1 hypothesis
S2 2418 AID 218 CD3 264 153.44 88.38 1.736 <2e-04 segregation 
S3 2369 AID 163 gH2AX only 163 72.14 91.61 0.787 <2e-04 aggregation 
    double 102 100.03 112.63 0.888 0.0936  
random 2369 AID 163 gH2AX only 163 64.89 63.72 1.018 0.49 two-tailed 
    double 102 81 80.24 1.009 0.71  

This paper introduces a novel testing procedure for spatial cytometry data. The procedure is not affected by the presence of interstitial cell-devoid spaces in the domain, which typically determines a bias in the outcome of classical clustering measurements.

Even though certain authors have applied traditional point pattern techniques in spatial cytometry [3], no method has been proposed in the literature to consider the presence of interstitial cell-devoid spaces. Those spaces don't allow a reliable domain area estimation, causing inaccuracy in clustering measurements and p-value calculation. Our research significantly advances the existing literature by providing a point pattern approach that avoids the domain area estimation.

The presented methods can be used to test spatial randomness, aggregation effects, segregation effects, and spatial distribution equality, highlighting the spatial relationships among cell populations. Spatial interaction and clustering indices can be estimated to describe spatial cell behaviour.

Our methods can be applied to any point pattern analysis in which an inaccurate estimation of the domain area limits the study of sub-populations. Indeed, the elements that don't belong to the group of interest can be considered to perform the resampling and construct the null distributions.

The simulations support a high p-value accuracy for resampling numbers of 1,000 or more. The results obtained for real data provide a clear view of spatial cell behaviour, while the analysis of artificially randomized data yields not-significant p-values coherently with the null hypothesis.

This article's Supplementary data and R functions can be found on the SpaceR GitHub page (https://github.com/GBertolazzi/SpaceR). The SpaceR software allows the reader to use the R functions for analyzing his own data. The supplementary dataset can be used as an example to run the analysis reported in the manuscript.

Claudio Tripodo formulated the research problems. Giorgio Bertolazzi developed and implemented the method under the supervision of Michele Tumminello and Claudio Tripodo. Gaia Morello and Beatrice Belmonte collected the data. Giorgio Bertolazzi and Michele Tumminello analyzed the data. All the authors wrote and revised the manuscript.

This study has been supported by the Italian Ministry of Education, University and Research (MIUR) through the “PON Research and Innovation 2014-2020” to G.B.; by the National Biodiversity Future Center (NBFC) CN00000033 (CUP UNIPA B73C22000790001), and through the OBIND project N.086202000366 (CUP G29J18000700007) to M.T.; by the Italian Foundation for Cancer Research (AIRC) through the 5x1000 I.D. 22759 Grant and AIRC Accelerator Award ID. 24296 to C.T.; and by the Italian Ministry of Education, University and Research (MIUR) Grant 2017K7FSYB to C.T.

[1]
Wang
,
Z.
Cell Segmentation for Image Cytometry: Advances, Insufficiencies, and Challenges
.
Cytometry
95
(
7
),
708
711
(
2019
).
[2]
Wang
,
Z.
,
Li
,
H.
Generalizing cell segmentation and quantification
.
BMC Bioinformatics
18
,
189
(
2017
).
[3]
Parra
,
E.R.
Methods to determine and analyze the cellular spatial distribution extracted from multiplex immunofluorescence data to understand the tumor microenvironment
.
Frontiers in Molecular Sciences
8
PMC8226163 (
2021
).
[4]
Kaufmann
et al
. (
2021
)
Using the r package spatstat to assess inhibitory effects of microregional hypoxia on the infiltration of cancers of the head and neck region by cyto- toxic t lymphocytes
,
Cancers
13
(
8
),
1924
(2021).
[5]
Gong
,
C.
, et al
.
Quantitative characterization of CD8+ T cell clustering and spatial heterogeneity in solid tumors
,
Frontiers in Oncology
649
(
8
), (
2018
).
[7]
Bakouny
,
Z.
, et al
.
Integrative molecular characterization of sarcomatoid and rhabdoid renal cell carcinoma
,
Nature Communications
12
(
1
), (
2021
).
[8]
Gao
,
J.
, et al
.
Neoadjuvant PD-L1 plus CTLA-4 blockade in patients with cisplatin-ineligible operable high-risk urothelial carcinoma
,
Nature Medicine
26
,
1845
1851
(
2020
).
[9]
Goto
,
M.
, et al
.
Prognostic Impact of CXCR7 and CXCL12 Expression in Patients with Esophageal Adenocarcinoma
,
Annals Of Surgical Oncology
28
(
9
),
4943
4951
(
2021
).
[10]
Petrere Jr
,
M.
:
The variance of the index (R) of aggregation of Clark and Evans
.
Oecologia
68
,
158
159
(
1985
).
[11]
De La Cruza
,
J. L.
,
Gutiérrez
,
M.A.
:
Spatial statistics of pitting corrosion patterning: Quadrat counts and the non-homogeneous Poisson process
.
Corrosion Science
50
(
5
), pp.
1441
1448
(
2008
).
[12]
Diggle
,
P.J.
Statistical Analysis of Spatial Point Patterns
,
Academic Press
,
New York
(
1983
).
[13]
Mesin
,
L.
,
Ersching
,
J.
,
Victora
,
GD.
Germinal center B cell dynamics
.
Immunity
45
(
3
),
471
482
(
2016
).
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode.