targets = read.delim('http://bioinf.wehi.edu.au/edgeR/F1000Research2016/targets.txt',
stringsAsFactors=FALSE)
targets
## GEO SRA CellType Status
## MCL1.DG GSM1480297 SRR1552450 B virgin
## MCL1.DH GSM1480298 SRR1552451 B virgin
## MCL1.DI GSM1480299 SRR1552452 B pregnant
## MCL1.DJ GSM1480300 SRR1552453 B pregnant
## MCL1.DK GSM1480301 SRR1552454 B lactating
## MCL1.DL GSM1480302 SRR1552455 B lactating
## MCL1.LA GSM1480291 SRR1552444 L virgin
## MCL1.LB GSM1480292 SRR1552445 L virgin
## MCL1.LC GSM1480293 SRR1552446 L pregnant
## MCL1.LD GSM1480294 SRR1552447 L pregnant
## MCL1.LE GSM1480295 SRR1552448 L lactating
## MCL1.LF GSM1480296 SRR1552449 L lactating
To study the luminal cells in the mouse mammary gland.
group <- paste(targets$CellType, targets$Status, sep=".")
group <- factor(group)
table(group)
## group
## B.lactating B.pregnant B.virgin L.lactating L.pregnant L.virgin
## 2 2 2 2 2 2
Why do we combine the Cell Type and Status to create group?
FileURL = paste(
"http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE60450",
"format=file",
"file=GSE60450_Lactation-GenewiseCounts.txt.gz",
sep="&"
)
download.file(FileURL, "GSE60450_Lactation-GenewiseCounts.txt.gz")
GenewiseCounts <- read.delim("GSE60450_Lactation-GenewiseCounts.txt.gz",
row.names="EntrezGeneID")
colnames(GenewiseCounts) <- substring(colnames(GenewiseCounts),1,7)
dim(GenewiseCounts)
## [1] 27179 13
head(GenewiseCounts)
## Length MCL1.DG MCL1.DH MCL1.DI MCL1.DJ MCL1.DK MCL1.DL MCL1.LA
## 497097 3634 438 300 65 237 354 287 0
## 100503874 3259 1 0 1 1 0 4 0
## 100038431 1634 0 0 0 0 0 0 0
## 19888 9747 1 1 0 0 0 0 10
## 20671 3130 106 182 82 105 43 82 16
## 27395 4203 309 234 337 300 290 270 560
## MCL1.LB MCL1.LC MCL1.LD MCL1.LE MCL1.LF
## 497097 0 0 0 0 0
## 100503874 0 0 0 0 0
## 100038431 0 0 0 0 0
## 19888 3 10 2 0 0
## 20671 25 18 8 3 10
## 27395 464 489 328 307 342
DGElist
data object The edgeR package stores data in a simple list-based data object called DGElist
, main components include:
data.frame
formaty <- DGEList(genes=GenewiseCounts[,1,drop=FALSE],
GenewiseCounts[,-1],
group=group)
options(digits=3)
y$samples
## group lib.size norm.factors
## MCL1.DG B.virgin 23227641 1
## MCL1.DH B.virgin 21777891 1
## MCL1.DI B.pregnant 24100765 1
## MCL1.DJ B.pregnant 22665371 1
## MCL1.DK B.lactating 21529331 1
## MCL1.DL B.lactating 20015386 1
## MCL1.LA L.virgin 20392113 1
## MCL1.LB L.virgin 21708152 1
## MCL1.LC L.pregnant 22241607 1
## MCL1.LD L.pregnant 21988240 1
## MCL1.LE L.lactating 24723827 1
## MCL1.LF L.lactating 24657293 1
The row names of the DGElist
object y
are ENTREZIDs. We can map gene annotations to these IDs.
# source("https://bioconductor.org/biocLite.R")
# biocLite("org.Mm.eg.db")
# library(org.Mm.eg.db)
y$genes$Symbol <- mapIds(org.Mm.eg.db,
rownames(y),
keytype="ENTREZID",
column="SYMBOL")
## 'select()' returned 1:1 mapping between keys and columns
head(y$genes)
## Length Symbol
## 497097 3634 Xkr4
## 100503874 3259 Gm19938
## 100038431 1634 Gm10568
## 19888 9747 Rp1
## 20671 3130 Sox17
## 27395 4203 Mrpl15
Remove genes that don’t have official gene symbols.
y <- y[!is.na(y$genes$Symbol), ]
dim(y)
## [1] 26327 12
From biological point of view, a gene must be expressed at some mini- mal level before it is likely to be translated into a protein or to be considered biologically important. From a statistical point of view, genes with consistently low counts are very unlikely be assessed as significantly DE because low counts do not provide enough statistical evidence for a reliable judgement to be made.
Filter genes based on count-per-million(CPM) values instead of simply on raw counts. The CPM takes into account the library size.
What problem could have if the filtering is based on raw counts?
keep <- rowSums(cpm(y) > 0.5) >= 2
table(keep)
## keep
## FALSE TRUE
## 10686 15641
y <- y[keep, , keep.lib.sizes=FALSE]
Use Multi-Dimensional Scaling (MDS) plots to visualize the difference of within- and betwee-groups.
y <- calcNormFactors(y)
pch <- c(0,1,2,15,16,17)
colors <- rep(c("darkgreen", "red", "blue"), 2)
plotMDS(y, col=colors[group], pch=pch[group], xlim=c(-4, 4), ylim=c(-5,5))
legend("topleft", legend=levels(group), pch=pch, col=colors, ncol=2)
Use Mean-Difference (MD) plots to visualize log-fold change between two libraries against the average log-expression.
what are those diagnoal lines in the MD plots?
par(mfrow=c(1,2))
plotMD(y, column=1, ylim=c(-10, 10))
plotMD(y, column=11, ylim=c(-10, 10))
plotMD(y[y$counts[,1] < 3, ], column = 1, ylab="read counts < 3")
plotMD(y[y$counts[,1] == 300, ], column = 1, ylab = "read counts = 300")
par(mfrow=c(1,1))
design <- model.matrix(~0+group)
colnames(design) <- levels(group)
design
## B.lactating B.pregnant B.virgin L.lactating L.pregnant L.virgin
## 1 0 0 1 0 0 0
## 2 0 0 1 0 0 0
## 3 0 1 0 0 0 0
## 4 0 1 0 0 0 0
## 5 1 0 0 0 0 0
## 6 1 0 0 0 0 0
## 7 0 0 0 0 0 1
## 8 0 0 0 0 0 1
## 9 0 0 0 0 1 0
## 10 0 0 0 0 1 0
## 11 0 0 0 1 0 0
## 12 0 0 0 1 0 0
## attr(,"assign")
## [1] 1 1 1 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$group
## [1] "contr.treatment"
Usually, the predictors in the linear model in differential expression analysis are categorical variables (e.g., the group
variable). Have you encountered situations in your research that the predictors are continous variables?
See the change in norm.factors
column before and after normalization.
y <- calcNormFactors(y)
y$samples
## group lib.size norm.factors
## MCL1.DG B.virgin 23137472 1.235
## MCL1.DH B.virgin 21687755 1.213
## MCL1.DI B.pregnant 23974787 1.125
## MCL1.DJ B.pregnant 22545375 1.069
## MCL1.DK B.lactating 21420532 1.036
## MCL1.DL B.lactating 19916685 1.087
## MCL1.LA L.virgin 20273585 1.370
## MCL1.LB L.virgin 21568458 1.368
## MCL1.LC L.pregnant 22117517 1.006
## MCL1.LD L.pregnant 21877287 0.924
## MCL1.LE L.lactating 24657903 0.529
## MCL1.LF L.lactating 24600304 0.535
Effective library size = library size X normalizatin factor.
The normalization factors of all the libraries multiply to unity. A normalization factor below one indicates that a small number of high count genes are monopolizing the sequencing, causing the counts for other genes to be lower than would be usual given the library size. As a result, the effective library size will be scaled down for that sample. Here we see that the luminal-lactating samples have low normalization factors. This is a sign that these samples contain a number of very highly upregulated genes.
Dispersion = Coefficient of Variation = Stadard Deviation divided by Mean Biological Cofficient of Variation (BCV) = sqaure root Dispersion = square root Coefficient of Variation
The dispersion trend tends to decrease smoothly with abundance and to asymptotic to a constant value for genes with larger counts
y <- estimateDisp(y, design, robust=TRUE)
par(mfrow=c(1,2))
plotBCV(y, ylim=c(0,2.5))
fit <- glmQLFit(y, design, robust=TRUE)
plotQLDisp(fit, ylim=c(0,2.5))
par(mfrow=c(1,1))
design
## B.lactating B.pregnant B.virgin L.lactating L.pregnant L.virgin
## 1 0 0 1 0 0 0
## 2 0 0 1 0 0 0
## 3 0 1 0 0 0 0
## 4 0 1 0 0 0 0
## 5 1 0 0 0 0 0
## 6 1 0 0 0 0 0
## 7 0 0 0 0 0 1
## 8 0 0 0 0 0 1
## 9 0 0 0 0 1 0
## 10 0 0 0 0 1 0
## 11 0 0 0 1 0 0
## 12 0 0 0 1 0 0
## attr(,"assign")
## [1] 1 1 1 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$group
## [1] "contr.treatment"
B.LvsP <- makeContrasts(B.lactating-B.pregnant, levels=design)
res <- glmQLFTest(fit, contrast=B.LvsP)
topTags(res)
## Coefficient: 1*B.lactating -1*B.pregnant
## Length Symbol logFC logCPM F PValue FDR
## 12992 765 Csn1s2b 6.08 10.19 421 4.77e-11 7.47e-07
## 211577 2006 Mrgprf 5.15 2.75 343 1.32e-10 7.94e-07
## 226101 7094 Myof 2.32 6.45 322 1.96e-10 7.94e-07
## 381290 8292 Atp2b4 2.14 6.15 320 2.03e-10 7.94e-07
## 140474 11281 Muc4 -7.17 6.06 308 2.61e-10 8.16e-07
## 231830 3346 Micall2 -2.25 5.19 282 4.43e-10 1.15e-06
## 24117 2242 Wif1 -1.82 6.77 261 7.10e-10 1.59e-06
## 12740 1812 Cldn4 -5.32 9.87 298 9.07e-10 1.73e-06
## 21953 667 Tnni2 5.75 3.86 313 9.93e-10 1.73e-06
## 231991 2873 Creb5 2.57 4.87 241 1.17e-09 1.83e-06
Determine which genes are DE genes based on the glmQLFTest
result.
is.de <- decideTestsDGE(res)
summary(is.de)
## 1*B.lactating -1*B.pregnant
## -1 2758
## 0 10396
## 1 2487
Visualize log-fold change against average abundance.
plotMD(res, status=is.de, values=c(1,-1), col=c("red","blue"),
legend="topright")
Are the DE genes identified based on P values or the FDR values in the res
? If we use P values to filter DE genes, which DE genes are more likely to be false positives?
par(mfrow=c(1,2))
# based on P value
plotMD(res[res$table$PValue < 0.05, ], values=c(1,-1),
col=c("red","blue"),legend="topright")
# Based on FDR
plotMD(res[is.de != 0, ], values=c(1,-1),
col=c("red","blue"),legend="topright")
par(mfrow=c(1,1))
Are all statistically significant DE genes biologically meaningful to us?
Test whether the log-fold changes are significantly greater than a number.
tr <- glmTreat(fit, contrast=B.LvsP, lfc=log2(1.5))
par(mfrow=c(1,2))
# original
plotMD(res[is.de != 0, ], values=c(1,-1),
col=c("red","blue"),legend="topright")
# log-fold change greater than 1.5
is.de.1point5 = decideTestsDGE(tr)
plotMD(res[is.de.1point5 != 0, ], values=c(1,-1),
col=c("red","blue"),legend="topright")
par(mfrow=c(1,1))
logCPM <- cpm(y, prior.count=2, log=TRUE)
rownames(logCPM) <- y$genes$Symbol
colnames(logCPM) <- paste(y$samples$group, 1:2, sep="-")
o <- order(tr$table$PValue)
logCPM <- logCPM[o[1:30],]
logCPM <- t(scale(t(logCPM)))
col.pan <- colorpanel(100, "blue", "white", "red")
heatmap.2(logCPM, col=col.pan, Rowv=TRUE, scale="none",
trace="none", dendrogram="both", cexRow=1, cexCol=1.4,
density.info="none",
margin=c(7,8), lhei=c(2,6), lwid=c(2,5))
The differential expression analysis comparing two groups can be easily extended to comparisons between three or more groups. This is done by creating a matrix of independent contrasts. In this manner, users can perform a one-way analysis of deviance (ANODEV) for each gene.
The QL F-test identifies genes that are DE between the three groups. This combines the three pairwise comparisons into a single F-statistic and p-value.
con <- makeContrasts(
L.PvsL = L.pregnant - L.lactating,
L.VvsL = L.virgin - L.lactating,
L.VvsP = L.virgin - L.pregnant, levels=design)
res <- glmQLFTest(fit, contrast=con)
topTags(res)
## Coefficient: LR test on 2 degrees of freedom
## Length Symbol logFC.L.PvsL logFC.L.VvsL logFC.L.VvsP logCPM F
## 19242 2021 Ptn -1.54 7.26 8.802 7.97 2389
## 13645 4757 Egf -5.36 -7.22 -1.863 3.67 1123
## 52150 4089 Kcnk6 -2.42 -7.00 -4.578 5.91 1020
## 15439 1345 Hp 1.08 5.42 4.337 4.93 991
## 12992 765 Csn1s2b -8.55 -11.36 -2.810 10.19 1050
## 14183 5346 Fgfr2 -1.15 3.95 5.097 7.38 952
## 20856 1793 Stc2 -1.81 3.20 5.006 6.11 920
## 11941 7050 Atp2b2 -7.37 -10.56 -3.189 6.60 1134
## 13358 1678 Slc25a1 -4.13 -4.91 -0.783 7.50 887
## 17068 691 Ly6d 3.42 9.24 5.820 4.69 887
## PValue FDR
## 19242 3.70e-17 5.79e-13
## 13645 4.49e-15 3.14e-11
## 52150 8.28e-15 3.14e-11
## 15439 9.92e-15 3.14e-11
## 12992 1.00e-14 3.14e-11
## 14183 1.28e-14 3.15e-11
## 20856 1.59e-14 3.15e-11
## 11941 1.78e-14 3.15e-11
## 13358 2.01e-14 3.15e-11
## 17068 2.01e-14 3.15e-11
What does a significant PValue or FDR value in the table above mean? Does it mean all three groups are significantly different from each other?.
See the MD plot below, why are there no “down-regulated” genes?
is.de = decideTestsDGE(res)
plotMD(res, status=is.de, values=c(1,-1), col=c("red","blue"),
legend="topright")
The decideTestsDGE()
function: > If object contains F-tests or LRTs for multiple contrasts, then the genes are simply classified as significant (1) or not significant. In this case, the log2-fold-change theshold lfc has to be achieved by at least one of the contrastsf or a gene to be significant. >
To count the number of DE genes that are annotated with each possible GO term. GO terms that occur frequently in the list of DE genes are said to be over-represented or enriched.
fit <- glmQLFit(y, design, robust=TRUE)
tr <- glmTreat(fit, contrast=B.LvsP, lfc=log2(1.5))
go <- goana(tr, species="Mm")
topGO(go, n=15)
## Term Ont N Up Down P.Up
## GO:0007059 chromosome segregation BP 261 1 56 0.998
## GO:0000280 nuclear division BP 507 10 79 0.792
## GO:1903047 mitotic cell cycle process BP 665 8 92 0.993
## GO:0007067 mitotic nuclear division BP 407 5 68 0.972
## GO:0022402 cell cycle process BP 949 15 111 0.975
## GO:0000278 mitotic cell cycle BP 747 8 96 0.998
## GO:0048285 organelle fission BP 553 12 79 0.697
## GO:0051301 cell division BP 496 3 72 1.000
## GO:0000070 mitotic sister chromatid segregation BP 125 0 35 1.000
## GO:0098813 nuclear chromosome segregation BP 203 1 44 0.993
## GO:0000776 kinetochore CC 123 1 34 0.952
## GO:0007049 cell cycle BP 1317 17 129 0.999
## GO:0000819 sister chromatid segregation BP 148 0 37 1.000
## GO:0000775 chromosome, centromeric region CC 176 1 40 0.987
## GO:0098687 chromosomal region CC 299 5 49 0.856
## P.Down
## GO:0007059 3.22e-23
## GO:0000280 7.62e-23
## GO:1903047 9.70e-23
## GO:0007067 1.83e-21
## GO:0022402 2.30e-21
## GO:0000278 2.31e-21
## GO:0048285 1.93e-20
## GO:0051301 4.41e-19
## GO:0000070 7.25e-19
## GO:0098813 1.20e-18
## GO:0000776 3.60e-18
## GO:0007049 4.23e-18
## GO:0000819 4.66e-18
## GO:0000775 7.83e-18
## GO:0098687 2.16e-15
The P.Up and P.Down columns contain the p-values for over-representation of the GO term in the up- and down-regulated genes, respectively. Note that the p-values are not adjusted for multiple testing—we would usually ignore GO terms with p-values greater than about 10-5 (why such a small p value?).
keg <- kegga(tr, species="Mm")
topKEGG(keg, n=15, truncate=34)
## Pathway N Up Down P.Up
## path:mmu03008 Ribosome biogenesis in eukaryot... 76 0 19 1.00e+00
## path:mmu05150 Staphylococcus aureus infection 30 0 10 1.00e+00
## path:mmu04110 Cell cycle 120 0 20 1.00e+00
## path:mmu00100 Steroid biosynthesis 18 5 0 4.55e-05
## path:mmu00240 Pyrimidine metabolism 93 0 15 1.00e+00
## path:mmu04970 Salivary secretion 63 8 7 1.01e-04
## path:mmu00900 Terpenoid backbone biosynthesis 21 5 0 1.02e-04
## path:mmu04972 Pancreatic secretion 66 8 4 1.42e-04
## path:mmu04610 Complement and coagulation casc... 47 2 9 3.05e-01
## path:mmu00230 Purine metabolism 151 2 17 8.78e-01
## path:mmu04510 Focal adhesion 187 12 8 1.43e-03
## path:mmu04261 Adrenergic signaling in cardiom... 120 9 5 1.98e-03
## path:mmu05144 Malaria 38 2 7 2.26e-01
## path:mmu04640 Hematopoietic cell lineage 60 2 9 4.17e-01
## path:mmu04911 Insulin secretion 60 6 5 2.72e-03
## P.Down
## path:mmu03008 2.96e-09
## path:mmu05150 1.02e-06
## path:mmu04110 1.51e-06
## path:mmu00100 1.00e+00
## path:mmu00240 4.75e-05
## path:mmu04970 3.65e-02
## path:mmu00900 1.00e+00
## path:mmu04972 4.21e-01
## path:mmu04610 4.35e-04
## path:mmu00230 1.30e-03
## path:mmu04510 7.26e-01
## path:mmu04261 7.24e-01
## path:mmu05144 2.39e-03
## path:mmu04640 2.67e-03
## path:mmu04911 1.79e-01
Possible species values include “Hs” (human), “Mm” (mouse), “Rn” (rat), “Dm” (fly) or “Pt” (chimpanzee), but other values are possible if the corresponding organism package is available.
Aims to test for particular gene expression signatures or particular pathways
cyt.go <- c("GO:0032465", "GO:0000281", "GO:0000920")
Rkeys(org.Mm.egGO2ALLEGS) <- cyt.go
cyt.go.genes <- as.list(org.Mm.egGO2ALLEGS)
cyt.go.genes
## $`GO:0032465`
## ISO ISO IMP ISO ISO ISO ISO ISO
## "11848" "12145" "12190" "12211" "12531" "12704" "12795" "13489"
## ISO ISO ISO ISO ISO ISO ISO ISO
## "13490" "13605" "14539" "16001" "16569" "18226" "18754" "20609"
## IBA IBA ISO IBA ISO IMP ISO ISO
## "20871" "20877" "20877" "20878" "23834" "23988" "23988" "26370"
## ISO ISO ISO IMP ISO ISO ISO ISO
## "26554" "26934" "50850" "52679" "54673" "56194" "56208" "57028"
## ISO ISO ISO ISO ISO ISO ISO ISS
## "66371" "66700" "67903" "68112" "71819" "72008" "73139" "73139"
## ISO ISO ISO ISO IDA ISO ISO ISO
## "74393" "75305" "75669" "78610" "83560" "83770" "101565" "102774"
## IMP ISO ISO IBA ISO ISO ISO ISO
## "108961" "109333" "116733" "211660" "211660" "216805" "216963" "225326"
## ISO ISO IMP ISO ISO ISO ISO
## "227937" "239336" "240641" "240641" "242785" "381293" "387349"
##
## $`GO:0000281`
## ISO IMP ISO IGI ISO ISO IMP ISO
## "11735" "11789" "11789" "11840" "11848" "12615" "12631" "12704"
## IMP ISO IGI ISO ISO ISO ISO ISO
## "13163" "16571" "16765" "16765" "18226" "18817" "19348" "20658"
## ISO ISO ISO IEA ISO ISO ISO ISO
## "20742" "26934" "50850" "57784" "66515" "66616" "68743" "69028"
## ISO ISO IBA ISO ISO ISO IMP ISO
## "70527" "71819" "74107" "74107" "75305" "75608" "77579" "77579"
## ISO ISO ISO ISA ISO ISO ISO IMP
## "80986" "84092" "102774" "108907" "170625" "216049" "216846" "235072"
## ISO IMP
## "235406" "240641"
##
## $`GO:0000920`
## ISO ISO ISO ISO ISO ISO ISO ISO
## "18571" "19348" "20479" "66371" "66700" "67064" "68942" "68953"
## ISO IBA ISO ISO ISO ISO ISO ISO
## "69028" "74107" "74107" "75608" "76959" "105513" "116733" "208092"
## ISO
## "234852"
# note that the row names of GenewiseCounts are the Entrez Gene Identifiers
B.VvsL <- makeContrasts(B.virgin-B.lactating, levels=design)
fry(y, index=cyt.go.genes, design=design, contrast=B.VvsL)
## NGenes Direction PValue FDR PValue.Mixed FDR.Mixed
## GO:0032465 49 Up 0.00055 0.00165 3.82e-06 7.03e-06
## GO:0000920 16 Down 0.00186 0.00207 4.69e-06 7.03e-06
## GO:0000281 37 Up 0.00207 0.00207 1.20e-05 1.20e-05
Using the code from the article generates wrong barcode plot. Need to switch the “Up” and “Down” labels.
res <- glmQLFTest(fit, contrast=B.VvsL)
index1 <- rownames(fit) %in% cyt.go.genes[[1]]
index2 = rownames(fit) %in% cyt.go.genes[[2]]
barcodeplot(res$table$logFC, index=index1, index2 = index2, labels=c("Down (B.lactating)", "Up (B.virgin)"),
main=paste0(cyt.go[1], '(Up), ' , cyt.go[2], '(Down)'))
Genes associated with GO term GO:0032465 tend to be up-regulated in the basal cells of virgin mice compared to lactating mice, and genes associated with GO term GO:0000920 tend to be down-regulated confirming the result of the fry test above.
load(url("http://bioinf.wehi.edu.au/software/MSigDB/mouse_c2_v5p1.rdata"))
idx <- ids2indices(Mm.c2,id=rownames(y))
BvsL.v <- makeContrasts(B.virgin - L.virgin, levels=design)
cam <- camera(y, idx, design, contrast=BvsL.v, inter.gene.cor=0.01)
options(digits=2)
head(cam,14)
## NGenes Direction
## LIM_MAMMARY_STEM_CELL_UP 782 Up
## LIM_MAMMARY_LUMINAL_MATURE_DN 169 Up
## SASAI_RESISTANCE_TO_NEOPLASTIC_TRANSFROMATION 80 Up
## LIM_MAMMARY_STEM_CELL_DN 664 Down
## FARMER_BREAST_CANCER_CLUSTER_4 74 Up
## NABA_BASEMENT_MEMBRANES 52 Up
## HAEGERSTRAND_RESPONSE_TO_IMATINIB 35 Up
## ROZANOV_MMP14_TARGETS_SUBSET 83 Up
## REACTOME_COLLAGEN_FORMATION 70 Up
## REACTOME_NCAM1_INTERACTIONS 71 Up
## OXFORD_RALB_TARGETS_UP 27 Up
## ANASTASSIOU_CANCER_MESENCHYMAL_TRANSITION_SIGNATURE 148 Up
## PAPASPYRIDONOS_UNSTABLE_ATEROSCLEROTIC_PLAQUE_DN 68 Up
## LIM_MAMMARY_LUMINAL_PROGENITOR_UP 92 Down
## PValue FDR
## LIM_MAMMARY_STEM_CELL_UP 2.0e-43 9.5e-40
## LIM_MAMMARY_LUMINAL_MATURE_DN 3.8e-25 9.0e-22
## SASAI_RESISTANCE_TO_NEOPLASTIC_TRANSFROMATION 4.1e-20 6.5e-17
## LIM_MAMMARY_STEM_CELL_DN 2.3e-19 2.7e-16
## FARMER_BREAST_CANCER_CLUSTER_4 2.9e-19 2.8e-16
## NABA_BASEMENT_MEMBRANES 1.6e-17 1.2e-14
## HAEGERSTRAND_RESPONSE_TO_IMATINIB 4.0e-17 2.7e-14
## ROZANOV_MMP14_TARGETS_SUBSET 5.2e-16 3.1e-13
## REACTOME_COLLAGEN_FORMATION 1.3e-15 6.8e-13
## REACTOME_NCAM1_INTERACTIONS 2.2e-15 1.0e-12
## OXFORD_RALB_TARGETS_UP 4.6e-15 2.0e-12
## ANASTASSIOU_CANCER_MESENCHYMAL_TRANSITION_SIGNATURE 7.1e-15 2.8e-12
## PAPASPYRIDONOS_UNSTABLE_ATEROSCLEROTIC_PLAQUE_DN 1.3e-14 4.9e-12
## LIM_MAMMARY_LUMINAL_PROGENITOR_UP 2.0e-14 6.9e-12
res <- glmQLFTest(fit, contrast=BvsL.v)
barcodeplot(res$table$logFC,
index=idx[["LIM_MAMMARY_STEM_CELL_UP"]],
index2=idx[["LIM_MAMMARY_STEM_CELL_DN"]],
labels=c("Down (B.lactating)", "Up (B.virgin)"),
main="LIM_MAMMARY_STEM_CELL",alpha=1)
Can you think of other use cases of FRY or Camera analysis?
What other downstream analysis can you do after you get a list of DE genes?
What RNA-Seq analysis pipeline do you use for your RNA-Seq data? Do you prefer running the whole workflow in the R environment? Why or why not?
Other materials that I found useful to understand RNASeq and differential expression analysis: http://www.pathwaycommons.org/guide/primers/functional_analysis/rna_sequencing_analysis/