Spatial transcriptomics cell–cell communication analysis: CellChat v2 (Nature Protocols, IF: 16.0/Q1)

The updated software was published in Nature Protocols on September 16, 2024, in the paper "CellChat for systematic analysis of cell–cell communication from single-cell transcriptomics". The official repository is:

https://github.com/jinworks/CellChat

Installation is skipped here; it is very simple.

0. Example data

This post uses the mouse brain 10X Visium dataset from the 10x Genomics website (https://www./resources/datasets/mouse-brain-serial-section-1-sagittal-anterior-1-standard-1-0-0).

The annotation follows this Seurat tutorial (https:///seurat/articles/spatial_vignette.html).

Here we downloaded the already processed file visium_mouse_cortex_annotated.RData (all example data used in the tutorial can be downloaded from https:///projects/Example_data_for_cell-cell_communication_analysis_using_CellChat/157272).

Load the data:

rm(list=ls())
options(stringsAsFactors = FALSE)
library(Seurat)
library(CellChat)
packageVersion("CellChat")
library(patchwork)

## Load data
# Here we load a Seurat object of 10X Visium mouse cortex data and its associated cell meta data
# load("visium_mouse_cortex_annotated_full.RData")
load("visium_mouse_cortex_annotated.RData")
ls()

# show the image and annotated spots
nlevels(visium.brain)
color.use <- scPalette(nlevels(visium.brain)); 
names(color.use) <- levels(visium.brain)
color.use

# Inspect the annotated spots on the tissue section
Seurat::SpatialDimPlot(visium.brain, label = TRUE, label.size = 3, cols = color.use, pt.size.factor = 1600)

1. Data preprocessing

CellChat input requirements:

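When inferring spatially proximal cell–cell communication from spatially resolved transcriptomics data, the spatial coordinates/locations of the spot/cell centroids must be provided. In addition, to filter out cell–cell communication beyond the maximum diffusion range of molecules (e.g., ~250 µm), CellChat needs to compute centroid-to-centroid distances in micrometers. Therefore, for imaging technologies that only provide spatial coordinates in pixels, CellChat requires users to supply a conversion factor that converts the coordinates from pixels to micrometers.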

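Gene expression data (data.input): genes in rows and cells in columns; the data must be normalized (e.g., library-size normalization followed by log transformation).
Cell and sample labels (meta): a data frame of cell information used to define cell groups, including a "samples" column used to aggregate multiple samples in the analysis.
Spatial coordinates (coordinates): the spatial coordinates of each cell/spot centroid.
Spatial factors (spatial.factors): two parameters, ratio (the pixel-to-micrometer conversion factor) and tol (a tolerance factor that makes distance comparisons more robust, typically half the cell/spot size in µm).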
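To make the four inputs concrete, here is a toy base-R sketch with made-up gene names, group labels and numbers (nothing here comes from the Visium dataset). It only illustrates the expected shapes: the cells in the columns of data.input, the rows of meta and the rows of the coordinates must refer to the same cells in the same order. The imagerow/imagecol column names mirror the code used later in this post; the ratio and tol values are hypothetical.

```r
# Toy CellChat-style inputs (hypothetical values, for illustrating shapes only)
genes <- c("Igf1", "Igf1r", "Cxcl12", "Cxcr4")
cells <- paste0("spot", 1:6)

# data.input: genes in rows, cells in columns; library-size normalized, then log1p
counts <- matrix(1:24, nrow = length(genes), dimnames = list(genes, cells))
lib.size <- colSums(counts)
data.input <- log1p(sweep(counts, 2, lib.size, "/") * 1e4)

# meta: one row per cell, a `labels` column for the groups and a `samples` column
meta <- data.frame(labels = c("L2/3 IT", "L2/3 IT", "L4", "L4", "Astro", "Astro"),
                   samples = factor("sample1"),
                   row.names = cells)

# coordinates: one row per cell/spot centroid, here in pixels
coordinates <- data.frame(imagerow = c(100, 100, 200, 200, 300, 300),
                          imagecol = c(100, 250, 100, 250, 100, 250),
                          row.names = cells)

# spatial.factors: pixel-to-micrometer ratio and distance tolerance (hypothetical numbers)
spatial.factors <- data.frame(ratio = 0.73, tol = 32.5)

# All inputs must describe the same cells in the same order
stopifnot(identical(colnames(data.input), rownames(meta)),
          identical(colnames(data.input), rownames(coordinates)))
```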

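If contact-dependent or juxtacrine signaling is analyzed, users also need to provide:

contact.range: a value (in µm) giving the range of cell–cell contact, usually the cell diameter or the center-to-center distance between cells.
Alternatively, users can provide contact.knn.k to restrict contact-dependent signaling to the k nearest neighbors.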
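The two ways of defining contact neighbors can be illustrated with a small base-R sketch on a toy 3 x 3 grid whose coordinates are already in micrometers. This is not CellChat's internal code; the helper knn_contacts is hypothetical and only shows the difference between a distance cutoff (contact.range) and a fixed number of neighbors (contact.knn.k).

```r
# Toy spot centroids in micrometers: a 3 x 3 square grid with 100 um spacing (hypothetical)
coords.um <- as.matrix(expand.grid(x = c(0, 100, 200), y = c(0, 100, 200)))
d <- as.matrix(dist(coords.um))   # centroid-to-centroid Euclidean distances

# contact.range rule: two different spots are in contact if their distance <= contact.range
contact.range <- 100
in.contact <- d <= contact.range & d > 0

# contact.knn.k rule (hypothetical helper): each spot contacts its k nearest other spots
knn_contacts <- function(d, k) {
  adj <- matrix(FALSE, nrow(d), ncol(d))
  for (i in seq_len(nrow(d))) {
    others <- setdiff(order(d[i, ]), i)   # all other spots, nearest first
    adj[i, others[seq_len(k)]] <- TRUE
  }
  adj
}
in.contact.knn <- knn_contacts(d, k = 2)

rowSums(in.contact)      # depends on position: center 4, edge 3, corner 2
rowSums(in.contact.knn)  # always exactly k = 2
```

The distance rule adapts to local density, while the kNN rule fixes the number of partners per spot (and need not be symmetric).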

Gene expression data

Extract the normalized data:

Note: this dataset is not complete and is only used to demonstrate the code.

# Prepare input data for CellChat analysis
# normalized data matrix
data.input = Seurat::GetAssayData(visium.brain, slot = "data", assay = "SCT") 
data.input[1:5,1:5]
dim(data.input)

Metadata

# define the meta data: 
# a column named `samples` should be provided for spatial transcriptomics analysis, 
# which is useful for analyzing cell-cell communication by aggregating multiple samples/replicates. 
# Of note, for comparison analysis across different conditions, 
# users still need to create a CellChat object separately for each condition.  

# manually create a dataframe consisting of the cell labels
meta = data.frame(labels = Seurat::Idents(visium.brain), samples = "sample1", 
                  row.names = names(Seurat::Idents(visium.brain))) 
meta$samples <- factor(meta$samples)
# check the cell labels
unique(meta$labels) 

# check the sample labels
unique(meta$samples)

Download the scalefactors_json.json file used below from the 10x Genomics website:

# Load spatial information
# Spatial locations of spots from full (NOT high/low) resolution images are required. For 10X Visium, 
# this information is in `tissue_positions.csv`. 
spatial.locs = Seurat::GetTissueCoordinates(visium.brain, scale = NULL, cols = c("imagerow", "imagecol")) 
head(spatial.locs)

# Spatial factors of spatial coordinates
# For 10X Visium, the conversion factor of converting spatial coordinates from Pixels to Micrometers can be computed as the ratio of the theoretical spot size (i.e., 65um) over the number of pixels that span the diameter of a theoretical spot size in the full-resolution image
# (i.e., 'spot_diameter_fullres' in pixels in the 'scalefactors_json.json' file).
# Of note, the 'spot_diameter_fullres' factor is different from the `spot` in Seurat object and thus users still need to get the value from the original json file.

scalefactors = jsonlite::fromJSON(txt = file.path('../data/V1_Mouse_Brain_Sagittal_Anterior/spatial/scalefactors_json.json'))
scalefactors
scalefactors$spot_diameter_fullres
spot.size = 65 # the theoretical spot size (um) in 10X Visium
conversion.factor = spot.size/scalefactors$spot_diameter_fullres
spatial.factors = data.frame(ratio = conversion.factor, tol = spot.size/2)
spatial.factors

d.spatial <- computeCellDistance(coordinates = spatial.locs, ratio = spatial.factors$ratio, tol = spatial.factors$tol)
# this value should approximately equal 100um for 10X Visium data
min(d.spatial[d.spatial!=0])
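The conversion and the ≈100 µm sanity check can be reproduced on synthetic data. The sketch below assumes a hypothetical spot_diameter_fullres of 89.43 px and builds a toy hexagonal lattice with 100 µm spacing (the Visium center-to-center distance); it follows the same arithmetic as the code above but is not CellChat's computeCellDistance. It also hints at why scale.distance = 0.01 is used later: as I read the computeCommunProb documentation, scale.distance should be chosen so that the minimum scaled distance falls roughly in [1, 2].

```r
# Hypothetical 'spot_diameter_fullres' (pixels); read the real value from your own scalefactors_json.json
spot_diameter_fullres <- 89.43
spot.size <- 65                              # theoretical Visium spot diameter (um)
ratio <- spot.size / spot_diameter_fullres   # micrometers per full-resolution pixel
tol <- spot.size / 2                         # tolerance = half a spot diameter (um)

# Toy Visium-like hexagonal lattice: 100 um center-to-center, odd rows shifted by half a spacing
spacing <- 100
grid <- expand.grid(col = 0:3, row = 0:3)
x.um <- grid$col * spacing + ifelse(grid$row %% 2 == 1, spacing / 2, 0)
y.um <- grid$row * spacing * sqrt(3) / 2     # row pitch of a hexagonal lattice

# The same spots expressed in full-resolution pixels
coords.px <- cbind(x.um, y.um) / ratio

# Pixel distances times ratio give micrometers; the smallest non-zero one should be ~100 um
d.um <- as.matrix(dist(coords.px)) * ratio
min.d <- min(d.um[d.um != 0])

# scale.distance = 0.01 maps this minimum distance to about 1
min.d * 0.01
```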

2. Create the CellChat object

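Users can create a new CellChat object from a data matrix or a Seurat object. If the input is a Seurat object, the metadata in the object is used by default, and users must provide group.by to define the cell groups, e.g., group.by = "ident" for the default cell identities of the Seurat object.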

Note: if you load a previously computed CellChat object (version < 2.1.0), update it with updateCellChat.

## Create the CellChat object
cellchat <- createCellChat(object = data.input, meta = meta, group.by = "labels",
                           datatype = "spatial", coordinates = spatial.locs, spatial.factors = spatial.factors)

cellchat

2. Set up the ligand-receptor database

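Before analyzing cell–cell communication with CellChat, users need to set the ligand–receptor interaction database and identify overexpressed ligands or receptors.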


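CellChatDB is a literature-supported database of ligand–receptor interactions covering about 3,300 validated molecular interactions in human and mouse. Compared with the previous version, CellChatDB v2 adds more than 1,000 protein and non-protein interactions, together with functional annotations of the ligand–receptor pairs (e.g., UniProtKB keywords and subcellular location).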

## Ligand-receptor database
# use CellChatDB.human if running on human data
CellChatDB <- CellChatDB.mouse 
showDatabaseCategory(CellChatDB)

# Show the structure of the database
dplyr::glimpse(CellChatDB$interaction)

# use a subset of CellChatDB for cell-cell communication analysis
# use Secreted Signaling
CellChatDB.use <- subsetDB(CellChatDB, search = "Secreted Signaling", key = "annotation")

# Only uses the Secreted Signaling from CellChatDB v1
#  CellChatDB.use <- subsetDB(CellChatDB, search = list(c("Secreted Signaling"), c("CellChatDB v1")), key = c("annotation", "version"))

# use all CellChatDB except for "Non-protein Signaling" for cell-cell communication analysis
# CellChatDB.use <- subsetDB(CellChatDB)

# use all CellChatDB for cell-cell communication analysis
# CellChatDB.use <- CellChatDB # simply use the default CellChatDB. We do not suggest to use it in this way because CellChatDB v2 includes "Non-protein Signaling" (i.e., metabolic and synaptic signaling) that can be only estimated from gene expression data. 

# set the used database in the object
cellchat@DB <- CellChatDB.use
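Conceptually, subsetDB(CellChatDB, search = ..., key = ...) keeps the interactions whose value in the key column matches search. A toy base-R equivalent, with a three-row stand-in for CellChatDB$interaction and a hypothetical helper subset_by_key, is sketched below; the real table has many more rows and columns, and subsetDB also handles the other database components.

```r
# Toy stand-in for CellChatDB$interaction (3 illustrative rows only)
interaction <- data.frame(
  interaction_name = c("TGFB1_TGFBR1_TGFBR2", "CDH1_CDH1", "COL1A1_CD44"),
  pathway_name     = c("TGFb", "CDH", "COLLAGEN"),
  annotation       = c("Secreted Signaling", "Cell-Cell Contact", "ECM-Receptor"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose `key` column matches any of the `search` terms
subset_by_key <- function(df, search, key) df[df[[key]] %in% search, , drop = FALSE]

secreted <- subset_by_key(interaction, search = "Secreted Signaling", key = "annotation")
secreted$interaction_name
```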

3. Identify overexpressed genes in each cell group

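CellChat infers cell-state-specific communication by identifying ligands or receptors that are overexpressed in a cell group. With variable.both = FALSE (as used below), a ligand–receptor interaction is identified as overexpressed if either its ligand or its receptor is overexpressed.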
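The either/both rule can be sketched in base R with a toy ligand–receptor table and a hypothetical set of genes already flagged as overexpressed. This simplifies CellChat (for example, multi-subunit receptors are ignored) and assumes that variable.both = TRUE requires both partners to be overexpressed.

```r
# Toy ligand-receptor table and hypothetical overexpressed genes
pairs <- data.frame(ligand   = c("Igf1", "Cxcl12", "Tgfb1"),
                    receptor = c("Igf1r", "Cxcr4", "Tgfbr1"),
                    stringsAsFactors = FALSE)
overexpressed <- c("Igf1", "Cxcr4", "Cxcl12")

lig.up <- pairs$ligand %in% overexpressed
rec.up <- pairs$receptor %in% overexpressed

# variable.both = FALSE: keep a pair if EITHER the ligand or the receptor is overexpressed
keep.either <- lig.up | rec.up
# variable.both = TRUE: require BOTH the ligand and the receptor to be overexpressed
keep.both <- lig.up & rec.up

pairs[keep.either, ]   # Igf1-Igf1r and Cxcl12-Cxcr4
pairs[keep.both, ]     # only Cxcl12-Cxcr4
```

The looser rule keeps more candidate pairs, which are then scored and tested in the probability step.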

In addition, CellChat provides a function to project gene expression data onto a protein–protein interaction (PPI) network.

# subset the expression data of signaling genes for saving computation cost
cellchat <- subsetData(cellchat) # This step is necessary even if using the whole database
future::plan("multisession", workers = 4) 
cellchat <- identifyOverExpressedGenes(cellchat)
cellchat <- identifyOverExpressedInteractions(cellchat, variable.both = FALSE)
cellchat

4. Infer the cell–cell communication network

Compute the communication probability and infer the network. For what the probability means, see the post "What do prob and pval mean in CellChat cell–cell communication?"

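Quick check of the results: set nboot = 20 in computeCommunProb to get results quickly. In that case, p < 0.05 means that none of the permuted results is larger than the observed communication probability.
Adjust how average gene expression is computed: if well-known signaling pathways are not predicted, users can compute the average gene expression per cell group with type = "truncatedMean" and a lower trim value.
Adjust parameters: for data from different spatial transcriptomics technologies, users may need to adjust the scale.distance parameter; see ?computeCommunProb for details.
Settings for contact-dependent signaling:

When inferring contact-dependent or juxtacrine signaling, set contact.range (the cell diameter or the center-to-center distance) and contact.dependent = TRUE.
For data at single-cell resolution, contact.range is usually set to 10 (a typical human cell size, in µm).
For low-resolution data such as 10X Visium, contact.range should be set to the center-to-center distance between spots (e.g., 100).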
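The nboot tip follows from how permutation p-values are computed. Assuming, as described in the CellChat tutorial, that the p-value is the fraction of permuted probabilities larger than the observed one, a base-R sketch with made-up numbers shows that with nboot = 20 the smallest non-zero p-value is already 0.05:

```r
set.seed(42)
nboot <- 20
P.obs  <- 0.08                                # hypothetical observed probability for one LR pair
P.perm <- runif(nboot, min = 0, max = 0.05)   # hypothetical probabilities after shuffling labels

# p-value = fraction of permutations whose probability is larger than the observed one
pval <- sum(P.perm > P.obs) / nboot

# With nboot = 20 the only attainable p-values are 0, 0.05, 0.10, ..., 1,
# so "pval < 0.05" can only happen when pval == 0
attainable <- (0:nboot) / nboot

# A single permutation beating the observed value already gives pval = 0.05
P.perm2 <- c(P.perm[-1], 0.09)
pval2 <- sum(P.perm2 > P.obs) / nboot
c(pval = pval, pval2 = pval2)
```

For final results, a larger nboot gives finer-grained p-values at the cost of run time.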

cellchat <- computeCommunProb(cellchat, type = "truncatedMean", trim = 0.1,
                              distance.use = TRUE, interaction.range = 250, scale.distance = 0.01,
                              contact.dependent = TRUE, contact.range = 100)

cellchat <- filterCommunication(cellchat, min.cells = 10)
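Why a lower trim can rescue weakly expressed pathways is easy to see on a sparse gene. The sketch assumes that type = "truncatedMean" behaves like base R's mean(x, trim = trim) and that the default "triMean" is (Q1 + 2 * median + Q3) / 4; tri_mean is a hypothetical helper written for this comparison, and the expression values are made up.

```r
# Expression of one gene across 10 spots of a group (hypothetical): only 20% of spots express it
x <- c(rep(0, 8), 2, 3)

# triMean: (Q1 + 2 * median + Q3) / 4
tri_mean <- function(x) {
  q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
  (q[1] + 2 * q[2] + q[3]) / 4
}

c(triMean   = tri_mean(x),           # all quartiles are 0, so the average is 0
  trim_0.25 = mean(x, trim = 0.25),  # a 25% truncated mean behaves like triMean here: 0
  trim_0.10 = mean(x, trim = 0.10))  # keeps part of the non-zero tail: 0.25
```

With triMean the gene counts as not expressed in this group, so any interaction that depends on it drops out; trim = 0.1 keeps a small non-zero average.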

Compute pathway-level signaling and aggregate the communication network:

# summarizing the communication probabilities of all ligands-receptors interactions associated with each signaling pathway
cellchat <- computeCommunProbPathway(cellchat)

# calculate the aggregated network: number of interactions and total communication probability between cell groups
cellchat <- aggregateNet(cellchat)
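The two summarizing steps can be mimicked on a toy table of LR-level probabilities (hypothetical groups, pathways and values). The sketch sums probabilities per pathway, as the code comment above describes, and builds count and weight matrices between groups; in the real object the aggregated matrices are stored in cellchat@net$count and cellchat@net$weight, and CellChat's own filtering details are not reproduced here.

```r
# Toy LR-level communication probabilities (hypothetical), one row per source-target-LR pair
lr <- data.frame(
  source  = c("L4", "L4", "Astro", "Astro"),
  target  = c("Astro", "Astro", "L4", "L4"),
  pathway = c("IGF", "CXCL", "IGF", "IGF"),
  prob    = c(0.02, 0.01, 0.03, 0.04)
)

# Pathway level: sum the probabilities of all LR pairs in the same pathway and group pair
pathway.prob <- aggregate(prob ~ source + target + pathway, data = lr, FUN = sum)

# Aggregated network: number of interactions (count) and total probability (weight)
groups <- c("L4", "Astro")
src <- factor(lr$source, levels = groups)
tgt <- factor(lr$target, levels = groups)
count  <- table(src, tgt)
weight <- tapply(lr$prob, list(src, tgt), sum)
weight[is.na(weight)] <- 0   # group pairs without any interaction
```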

5. Extract results

By default, all inferred communications are extracted:

# Extract the inferred communications as a data frame
df.net <- subsetCommunication(cellchat)
#df.net <- subsetCommunication(cellchat, sources.use = c(1,2), targets.use = c(4,5))
#df.net <- subsetCommunication(cellchat, signaling = c("WNT", "TGFb"))

# All the signaling pathways showing significant communications can be accessed by 
df <- cellchat@netP$pathways

6. Visualization

The other visualizations are the same as for single-cell data and are not shown here; only plots on the spatial section are shown.

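# Spatial plot
# pathways.show must be defined first: choose pathway(s) from df (i.e., cellchat@netP$pathways)
pathways.show <- c("IGF")  # example choice; replace with a pathway listed in df
par(mfrow=c(1,1))
netVisual_aggregate(cellchat, signaling = pathways.show, layout = "spatial", edge.width.max = 2, vertex.size.max = 1, alpha.image = 0.2, vertex.label.cex = 3.5)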

Weight the circles: a bigger circle indicates larger incoming signaling.

# USER can show this information on the spatial transcriptomics when visualizing a signaling network, e.g., bigger circle indicates larger incoming signaling
# vertex.weight = "incoming" relies on the network centrality scores, so compute them first
cellchat <- netAnalysis_computeCentrality(cellchat, slot.name = "netP")
par(mfrow=c(1,1))
netVisual_aggregate(cellchat, signaling = pathways.show, layout = "spatial", edge.width.max = 2, alpha.image = 0.2, vertex.weight = "incoming", vertex.size.max = 4, vertex.label.cex = 3.5)
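As I understand it, vertex.weight = "incoming" sizes each group's circle by its incoming signaling strength, i.e., the weighted in-degree from netAnalysis_computeCentrality. The base-R sketch below computes that quantity on a made-up pathway-level probability matrix (rows are senders, columns are receivers); the linear scaling to vertex.size.max is an assumption for illustration, not CellChat's exact formula.

```r
# Toy pathway-level probability matrix: rows = sources (senders), columns = targets (receivers)
groups <- c("L2/3 IT", "L4", "Astro")
net <- matrix(c(0.00, 0.02, 0.01,
                0.03, 0.00, 0.04,
                0.00, 0.05, 0.00),
              nrow = 3, byrow = TRUE, dimnames = list(groups, groups))

# Incoming strength = total probability a group receives (column sums);
# outgoing strength = total probability a group sends (row sums)
incoming <- colSums(net)
outgoing <- rowSums(net)

# Assumed linear scaling so the strongest receiver gets the maximum circle size
vertex.size.max <- 4
vertex.size <- incoming / max(incoming) * vertex.size.max
vertex.size
```

Here L4 receives the most signal and would be drawn with the largest circle.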

Show gene expression:

# Take an input of a few genes
spatialFeaturePlot(cellchat, features = c("Igf1","Igf1r"), point.size = 0.8, color.heatmap = "Reds", direction = 1)

Show ligand and receptor expression:

cutoff <- 0.05  # expression threshold passed to the plots below

# Take an input of a ligand-receptor pair
spatialFeaturePlot(cellchat, pairLR.use = "IGF1_IGF1R", point.size = 0.5, do.binary = FALSE, cutoff = cutoff, enriched.only = FALSE, color.heatmap = "Reds", direction = 1)

Show co-expression:

# Take an input of a ligand-receptor pair and show expression in binary
spatialFeaturePlot(cellchat, pairLR.use = "IGF1_IGF1R", point.size = 1, do.binary = TRUE, cutoff = cutoff, enriched.only = FALSE, color.heatmap = "Reds", direction = 1)
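The binary view can be understood as thresholding each gene at cutoff and labeling every spot by which partners pass. The base-R sketch below uses made-up expression values for six spots; the category names are illustrative and may differ from the legend CellChat actually draws.

```r
# Hypothetical normalized expression of a ligand and its receptor in 6 spots
ligand   <- c(0.00, 0.30, 0.00, 0.80, 0.02, 0.60)
receptor <- c(0.00, 0.00, 0.40, 0.50, 0.01, 0.90)
cutoff <- 0.05

lig.on <- ligand > cutoff
rec.on <- receptor > cutoff

# Binary category per spot: which of the two genes pass the cutoff
status <- ifelse(lig.on & rec.on, "both",
          ifelse(lig.on, "ligand only",
          ifelse(rec.on, "receptor only", "none")))
table(factor(status, levels = c("none", "ligand only", "receptor only", "both")))
```

Spots labeled "both" are where the ligand and receptor are co-expressed above the cutoff.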

Finally, save the CellChat object as an RDS file:

saveRDS(cellchat, file = "cellchat_visium_mouse_cortex.rds")
