Rank Spatially Correlated Gene Clusters¶

doHMRF()

Rank spatial correlated clusters according to correlation structure.

doHMRF(
    gobject,
    expression_values = c("normalized", "scaled", "custom"),
    spatial_network_name = "Delaunay_network",
    spatial_genes = NULL,
    spatial_dimensions = c("sdimx", "sdimy", "sdimz"),
    dim_reduction_to_use = NULL,
    dim_reduction_name = "pca",
    dimensions_to_use = 1:10,
    seed = 100,
    name = "test",
    k = 10,
    betas = c(0, 2, 50),
    tolerance = 1e-10,
    zscore = c("none", "rowcol", "colrow"),
    numinit = 100,
    python_path = NULL,
    output_folder = NULL,
    overwrite_output = TRUE
)

Arguments¶

gobject	giotto object
expression_values	expression values to use
spatial_network_name	name of spatial network to use for HMRF
spatial_genes	spatial genes to use for HMRF
spatial_dimensions	select spatial dimensions to use, default is all possible dimensions
dim_reduction_to_use	use another dimension reduction set as input
dim_reduction_name	name of dimension reduction set to use
dimensions_to_use	number of dimensions to use as input
seed	seed to fix random number generator (for creating initialization of HMRF) (-1 if no fixing)
name	name of HMRF run
k	number of HMRF domains
betas	betas to test for. three numbers: start_beta, beta_increment, num_betas e.g. c(0, 2.0, 50)
tolerance	tolerance
zscore	zscore
numinit	number of initializations
python_path	python path to use
output_folder	output folder to save results
overwrite_output	overwrite output folder

Value¶

Creates a directory with results that can be viewed with viewHMRFresults().

Returns a list type object with the following fields:

name: name used
output_data: output path
k: k used,
betas: betas used,
python_path: python_path_used

Details¶

Previously we have published a paper that describes the hidden markov random field (HMRF) method for identifying spatial patterns from spatial gene expression data. Unlike typical clustering techniques, HMRF integrates both spatial locations and the gene expression to find groups of cells that are both spatially adjacent and which are similar to each other in expression space.

A general guideline for choosing k is to use the gap-statistics by Tibshrani. The elbow point of the gap size vs. k plot usually indicates choices of k. To decide which beta one should use, the current state of algorithm repeats HMRF for a number of betas. Here are some guidelines:

if the number of genes is from 10 to 50, the recommended range is 0 to 10 at increment of 0.5. E.g. c(0,0.5,20)
if the number of genes is below 50, the recommended range is 0 to 15 at increment of 1. E.g. c(0,1,15)
if the number of genes is between 50 to 100, the range is 0 to 50 at increment of 2. E.g. c(0, 2, 25)
if the number of genes is between 100 and 500, the range is 0 to 100 at increment of 5. E.g. c(0, 5, 20)

Within the range of beta, we recommend selecting the best beta by the Bayes information criterion. This requires first performing randomization of spatial positions to generate the null distribution of log-likelihood scores for randomly distributed cells for the same range of betas. Then find the beta where the difference between the observed and the null log-likelihood is maximized. See the HMRF tutorial from the Chapter.

After running HMRF, one can further view the HMRF spatial clusters (viewHMRFresults2D), and add HMRF annotation to pDataDT table (addHMRF).

The main steps are:

Create neighborhood network based on spatial positions
Define genes of interest
Call HMRF routine to find spatial clusters

Important

HMRF is now split into two functions: initHMRF() and doHMRF()
Removed python requirement for HMRF and Java
Removed writing to files and reading from files for HMRF
initHMRF() step consists of filtering spatial genes, sampling spatial genes, and kmeans initialization
doHMRF() will perform HMRF after init step
Spatial gene functions (binspect and silhouetteRank) now automatically add spatial gene results to gene metadata table. To do so, return_gobject is added to functions.
initHMRF will automatically use spatial genes from gene metadata table for gene filtering, sampling purposes. So, binSpect/silhouetteRank should be run with return_gobject=TRUE. BinSpect/silhouetteRank will default to TRUE for return_gobject.

Example

visium=binSpect(visium, return_gobject=TRUE).

Example¶

Previous version of doHMRF()

#from seqfish+ dataset
#100 spatial genes
HMRF_spatial_genes = doHMRF(gobject = VC_test, expression_values = 'scaled', spatial_genes = my_spatial_genes, spatial_network_name = 'Delaunay_network', k = 9, betas = c(0,1,50), output_folder = paste0(hmrf_folder, '/', 'Spatial_genes/SG_top100_k9_scaled'))
## view results of HMRF
for(i in seq(0, 50, by = 1)) {
viewHMRFresults2D(gobject = VC_test,HMRFoutput = HMRF_spatial_genes,k = 9, betas_to_view = i,point_size = 2)
}
## add HMRF of interest to giotto object
VC_test = addHMRF(gobject = VC_test,HMRFoutput = HMRF_spatial_genes, k = 9, betas_to_add = c(28), hmrf_name = 'HMRF_2')

Upated version of doHMRF()

visium=silhouetteRank(visium, rbp_p=0.99, examine_top=0.1, return_gobject=T)
hmrf=initHMRF(visium, spatial_network_name="spatial_network", use_spatial_genes="silhouetteRank", k=10, nstart=100)
res=doHMRF(hmrf, betas=c(0, 10, 5))

visium = binSpect(visium, calc_hub = T, hub_min_int = 5,spatial_network_name = 'spatial_network', return_gobject=T)
hmrf2=initHMRF(visium, spatial_network_name="spatial_network", use_spatial_genes="binSpect", k=10, nstart=100)
res2=doHMRF(hmrf2, betas=c(0, 10, 5))