pval.type = "any" # ranking of p-values based on any comparison (default)
68
-
)
69
-
```
55
+
* Log(fold change) in the proportion of cells in which the gene is detected (counts > 0) in the cluster of interest versus the proportion of cells in which the gene is detected in the other cluster.
56
+
* Takes no account of the magnitude of gene expression.
57
+
* Positive values indicate that the gene is detected in more cells in the cluster of interest than the other cluster.
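As a rough illustration of the score described above, the following base R sketch computes a log fold change in detection proportion for a single gene. The count vectors are made up for the example, and the use of log2 with no pseudo-count is an assumption; the calculation inside `scran` may differ in detail (for example, in how it guards against proportions of zero).

```r
# Hypothetical raw counts for one gene (not real data)
counts_interest <- c(0, 2, 1, 0, 3, 1, 0, 4)  # cells in the cluster of interest
counts_other    <- c(0, 0, 1, 0, 0, 0, 2, 0)  # cells in the other cluster

# Proportion of cells in which the gene is detected (counts > 0)
prop_interest <- mean(counts_interest > 0)
prop_other    <- mean(counts_other > 0)

# Log fold change in detection proportion (log2 is an assumption here)
lfc_detected <- log2(prop_interest / prop_other)
lfc_detected
```

Only whether each count is above zero enters the calculation, so doubling every non-zero count would leave the score unchanged.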
70
58
71
59
60
+
## `scran::scoreMarkers()` function
72
61
73
-
## `findMarkers`
62
+
For each cluster the function computes the effect size scores between it and every
63
+
other cluster.
74
64
75
-
```{r, eval=FALSE}
76
-
findMarkers(
65
+
```r
66
+
scoreMarkers(
77
67
sce,
78
-
groups = sce$louvain, # clusters to compare
68
+
groups=sce$louvain15, # clusters to compare
79
69
block=sce$SampleGroup, # covariates in statistical model
80
-
test.type = "t", # t-test (default)
81
-
direction = "any", # test for either higher or lower expression (default)
* t-test: "Is the mean expression of a gene in cluster 1 and cluster 2 the same?"
96
-
97
-
* Wilcoxon rank-sum test: "It is equally likely that a randomly selected cell from cluster 1 has higher or lower expression of a gene than a randomly selected cell from cluster 2?"
98
-
99
-
* Binomial test: "Is the probability of a gene being expressed the same in cluster 1 and cluster 2?"
* **mean.X** - mean score across all pairwise comparisons.
79
+
* **min.X** - minimum score obtained across all pairwise comparisons. The most stringent summary statistic: a high score indicates upregulation relative to *all* other clusters.
80
+
* **median.X** - median score across all pairwise comparisons. More robust to outliers than the mean.
81
+
* **max.X** - maximum score obtained across all pairwise comparisons. The least stringent summary statistic: a high score only indicates that the gene is upregulated relative to *at least one* other cluster.
82
+
* **rank.X** - minimum ranking ("min-rank") of that gene's score across all pairwise comparisons. A rank of 1 indicates that the gene had the highest score in at least one of the pairwise comparisons.
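To make the summary statistics listed above concrete, here is a minimal base R sketch that derives them by hand from a small matrix of pairwise scores. The matrix, the gene names and the column labels are invented for illustration; this is not the code `scoreMarkers()` runs internally, only the same idea applied to toy numbers.

```r
# Hypothetical effect sizes for cluster 1 versus clusters 2, 3 and 4
scores <- matrix(
  c(2.0,  1.5, 0.1,
    0.5,  0.6, 0.2,
    3.0, -0.2, 0.3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("vs2", "vs3", "vs4"))
)

# Rank genes within each pairwise comparison (1 = highest score)
ranks <- apply(-scores, 2, rank, ties.method = "min")

# One row per gene, one column per summary statistic
summaries <- data.frame(
  mean.X   = rowMeans(scores),
  min.X    = apply(scores, 1, min),
  median.X = apply(scores, 1, median),
  max.X    = apply(scores, 1, max),
  rank.X   = apply(ranks, 1, min)   # best rank across comparisons
)
summaries
```

In this toy example geneC has the largest `max.X` but a negative `min.X`, so it looks like a strong marker against one cluster while being lower than at least one other. geneB never tops any comparison, which is why its `rank.X` is greater than 1.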
154
83
155
84
156
85
## So, what's really important?
157
86
158
-
* understand what are we trying to compare with the different tests (difference in mean expression, difference in probability of being expressed, probability of being highly/lowly expressed)
159
-
160
-
* It’s important to understand the underlying data
161
-
162
-
* It’s important to assess and **validate the results**
163
-
164
-
* Strictly speaking, identifying genes differentially expressed between clusters is statistically flawed, since the clusters were themselves defined based on the gene expression data itself. Validation is crucial as a follow-up from these analyses.
165
-
166
-
167
-
## Things to think about: during analysis
87
+
* Understand what we are trying to compare with the different scores:
88
+
+ difference in mean expression
89
+
+ probability of being highly/lowly expressed
90
+
+ difference in probability of being expressed
168
91
169
-
*Do not use batch-integrated expression data for differential analysis
92
+
* Strictly speaking, identifying genes differentially expressed between clusters is statistically flawed, since the clusters were themselves defined based on the gene expression data itself. Validation is crucial as a follow-up from these analyses.
170
93
171
-
* Instead, **include batch in the statistical model** (the `findMarkers()` function has the `block` argument to achieve this)
94
+
* Do not use batch-integrated expression data for calculating marker gene scores; instead, **include batch in the statistical model** (the `scoreMarkers()` function has the `block` argument to achieve this).
172
95
173
-
* Depending on the method you choose use: counts, normalised counts or log-normalized counts.
174
-
175
-
* Normalization strategy has a big influence on the results in differential expression.
176
-
177
-
* e.g comparing cell types with few expressed genes vs a cell type with many genes.
178
-
179
-
180
-
## Things to think about: after analysis
96
+
* The normalization strategy has a big influence on the estimated differences in expression between cells and between clusters.
181
97
182
98
* A lot of what you get might be noise. Take two random sets of cells, run DE, and you will probably have a few significant genes with most of the commonly used tests.
183
99
184
-
* Think of the results as hypotheses that need independent verification (e.g. microscopy, qPCR)
100
+
* It’s important to assess and **validate the results**. Think of the results as
101
+
hypotheses that need independent verification (e.g. microscopy, qPCR)
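The point about noise can be checked with a small simulation. The sketch below is an assumption-laden toy: it uses Gaussian noise in place of real expression values, a random split of cells into two groups, and an unadjusted per-gene t-test with a 0.05 threshold. None of the object names come from the course material.

```r
set.seed(42)

n_genes <- 1000
n_cells <- 100

# Pure noise: no gene truly differs between any two sets of cells
expr <- matrix(rnorm(n_genes * n_cells), nrow = n_genes)

# Randomly assign cells to two equally sized groups
group <- sample(rep(c("A", "B"), each = n_cells / 2))

# Per-gene t-test between the two random groups
pvals <- apply(expr, 1, function(g) {
  t.test(g[group == "A"], g[group == "B"])$p.value
})

# Genes called "significant" without any multiple-testing correction
n_sig <- sum(pvals < 0.05)
n_sig

# The same count after Benjamini-Hochberg adjustment, for comparison
n_sig_adj <- sum(p.adjust(pvals, method = "BH") < 0.05)
n_sig_adj
```

Because the groups are random, every gene counted in `n_sig` is a false positive, which is one reason to treat marker lists as hypotheses to be verified independently.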