Here we store material for the Data Analysis part of the CB2030 Systems Biology course.
This project is maintained by statisticalbiotechnology.
A: No. That is the whole point of significance testing: regardless of how you select your threshold, there will be errors. The question is just whether you prefer errors of one kind (FP) over the other (FN).
A Bonferroni correction is a way to control the family-wise error rate (FWER), i.e. the probability that at least one of your significant features was generated under H0. One good reason for using Bonferroni corrections is that they are very simple to calculate.
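As an illustration (a minimal sketch, not code from the course repository), a Bonferroni correction can be computed in a few lines of Python:

```python
import numpy as np

def bonferroni(p_values):
    """Multiply each p value by the number of tests m, capping at 1.
    Rejecting corrected p values below alpha controls the FWER at alpha
    (equivalent to the per-test threshold alpha / m)."""
    p = np.asarray(p_values, dtype=float)
    return np.minimum(p * p.size, 1.0)

# Example: with m = 4 tests, only the first feature survives FWER = 0.05
print(bonferroni([0.001, 0.02, 0.03, 0.6]))  # [0.004 0.08 0.12 1.0]
```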
It is a statement about the set of findings we report from an experiment. If we expect 5% of our reported findings to be incorrect, we report an FDR of 5%.
No, each feature is tested individually. However, we want to make an assessment of the errors among the features we called significant.
Sure. Anytime you are interested in what fraction of the truly null features in your experiment is called significant, an FPR should be used.
There is no rule. However, thresholds of 1% or 5% are frequently used.
Whenever multiple tests are involved, most journals require you to report FDRs/q values. However, whenever you have a measurement on one individual substance that you were interested in prior to the experiment, you can report p values.
Overall there are m features in your assay. Each of these has a prior probability π0 of being null. Hence you can estimate the number of null features as m0 = mπ0.
If you do not take any equation into consideration, you are right. The confusion arises when we estimate the number of errors under a threshold t as F(t) = m0t. This estimate can be made with m0 = mπ0, or we can conservatively approximate F(t) ≈ mt. The π0 = 1 estimate is conservative, as it assumes a larger number of errors than are really present.
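A minimal sketch of this estimate (illustrative code, not taken from the course notebooks; the function name and example p values are made up):

```python
import numpy as np

def fdr_at_threshold(p_values, t, pi0=1.0):
    """Estimate the FDR of the set {p_i <= t}: the expected number of
    false positives F(t) = pi0 * m * t divided by the number of features
    actually called significant."""
    p = np.asarray(p_values, dtype=float)
    n_called = max(int((p <= t).sum()), 1)  # avoid division by zero
    return min(pi0 * p.size * t / n_called, 1.0)

# With the default pi0 = 1 the estimate is conservative: F(t) ~ m * t
print(fdr_at_threshold([0.001, 0.01, 0.02, 0.2, 0.5, 0.9], t=0.05))  # 0.1
```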
Yes, this is an accurate description. Features are null or not; however, whether a null feature counts as a TN or an FP depends on the threshold.
The expectation value of any random variable gives a sense of how the variable will behave "on average".
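As a small illustration (a simulation with made-up numbers, not course material), the sample average of many draws approaches the expectation value:

```python
import numpy as np

rng = np.random.default_rng(0)
draws = rng.exponential(scale=2.0, size=100_000)  # a distribution with E[X] = 2
print(draws.mean())  # the sample average approaches 2.0 as the sample grows
```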
"#{null p_i <= t}" counts the number of p values below t that were generated under the null hypothesis, while "#{p_i <= t}" counts all the p values below t.
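A small simulated example may help (all numbers here are invented; in a real experiment we would not know which features are null):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# 900 null features: their p values are uniform under H0
null_p = rng.uniform(size=900)
# 100 alternative features: one-sided p values for z-scores centred at 3
alt_p = stats.norm.sf(rng.normal(loc=3.0, size=100))

t = 0.05
n_null_below = (null_p <= t).sum()           # #{null p_i <= t}
n_below = n_null_below + (alt_p <= t).sum()  # #{p_i <= t}
print(f"FDR at t={t}: {n_null_below / n_below:.3f}")
```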
It means that there exists a threshold defining a set with FDR = q that includes the current p value. The maximal FDR of any set containing the current p value would always be π0.
Well, it is a problem. However, for large datasets we do not see much variability in this. The case I show in the video is an exaggerated example to demonstrate why the procedure is needed.
We smooth the FDRs into q values by defining a feature's q value as the minimal FDR of any threshold that includes the feature. The procedure is important, as we otherwise would not be able to assign a single FDR to a particular feature (there would be multiple such FDRs). A sketch of this computation is given below.
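A minimal sketch of the smoothing, assuming the simple plug-in FDR estimate π0·m·t/#{p_i <= t} with a fixed π0 (illustrative code, not the course notebook's implementation):

```python
import numpy as np

def q_values(p_values, pi0=1.0):
    """Convert p values to q values: a feature's q value is the minimal
    estimated FDR over all thresholds t >= its own p value."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # FDR estimate when each sorted p value is used as the threshold t
    fdr = pi0 * m * p[order] / np.arange(1, m + 1)
    # running minimum from the largest p value downwards ("smoothing")
    q = np.minimum.accumulate(fdr[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(q, 1.0)
    return out

print(q_values([0.01, 0.002, 0.03, 0.5]))  # [0.02 0.008 0.04 0.5]
```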
The spline procedure is frequently used. However, it should be noted that in my notebook I use a different procedure for π0 estimation than the one described in Storey & Tibshirani. In my eyes, the bootstrap method gives more stable estimates.
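For completeness, here is a hypothetical sketch in the spirit of Storey's (2002) bootstrap procedure; this is an assumption about the general approach, not necessarily the exact method used in the notebook:

```python
import numpy as np

def pi0_bootstrap(p_values, n_boot=100, seed=0):
    """Sketch of a bootstrap pi0 estimate: for each lambda, compute
    pi0(lambda) = #{p_i > lambda} / (m * (1 - lambda)); pick the lambda
    whose bootstrap estimates have the smallest MSE relative to the
    smallest plug-in estimate, and return its pi0."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p_values, dtype=float)
    m = p.size
    lambdas = np.arange(0.05, 0.96, 0.05)
    pi0_hat = np.array([(p > l).sum() / (m * (1 - l)) for l in lambdas])
    target = pi0_hat.min()
    mse = np.zeros(lambdas.size)
    for _ in range(n_boot):
        pb = rng.choice(p, size=m, replace=True)
        pi0_b = np.array([(pb > l).sum() / (m * (1 - l)) for l in lambdas])
        mse += (pi0_b - target) ** 2
    return min(pi0_hat[np.argmin(mse)], 1.0)
```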