You can perform haplotype association tests here.
To run these tests, SNPator needs, besides the usual information stored in a
Study, a list of Haplotypes corresponding to a set of Samples. (Remember that
genotypes are stored in the Genotypes Table without phase information.)
This haplotype list may come from an external source, or it can easily be
computed with the Phase program, which SNPator runs in the background.
SNPator classifies the Haplotypes provided by Phase into cases and controls
according to the information stored in the Samples Table. It then builds a
contingency table for each haplotype against all the others and computes a
set of statistics.
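As an illustration of the "one haplotype against all the others" tables, here is a minimal Python sketch (not SNPator's code). The function name `contingency_table` and the input, a list of `(haplotype, is_case)` pairs with one entry per haplotype copy, are assumptions made for the example.

```python
from collections import Counter

def contingency_table(haplotypes, target):
    """Build the 2x2 table for `target` against all other haplotypes.

    `haplotypes` holds one (haplotype, is_case) pair per haplotype copy,
    so a diploid sample contributes two pairs.
    Returns [[case_target, case_others], [control_target, control_others]].
    """
    counts = Counter((hap == target, is_case) for hap, is_case in haplotypes)
    return [
        [counts[(True, True)], counts[(False, True)]],
        [counts[(True, False)], counts[(False, False)]],
    ]

# Hypothetical data: three haplotype copies from cases, three from controls.
haps = [("CGCGTT", True), ("TCCGTT", True), ("TCCGTT", True),
        ("CGCGTT", False), ("CGCGTT", False), ("TGTGTT", False)]
print(contingency_table(haps, "CGCGTT"))  # [[1, 2], [2, 1]]
```

Pooling every other haplotype into a single "Others" column is what keeps each test a 2x2 table, whatever the number of distinct haplotypes.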
The procedure is as follows:
Samples with values other than the ones declared here will be
discarded and not used in the process.
When more than two possible values exist in a field, you may want to
enter only one of these values in either the "Case" or the "Control"
box, leaving the other box untouched (that is, leaving the "[Select
Item]" option there). By doing so, the samples that match your explicit
selection are defined as, say, Cases (or Controls, if you selected so).
All the remaining samples, whatever their value, are used as the other
group (Controls, in this example).
If a blank (" ") is selected, the samples with no value in that field are
the ones being selected as cases or controls.
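The case/control rules above can be sketched in Python as follows. This is a hypothetical illustration, not SNPator's implementation: the function `classify`, the `samples` dictionary, the field name "status" and its values are invented for the example, a box left at "[Select Item]" is modelled as `None`, and a missing or empty value is assumed to correspond to the blank option " ".

```python
def classify(samples, field, case_value=None, control_value=None):
    """Split sample names into (cases, controls) using one Samples-table field.

    `samples` maps a sample name to a dict of field values. None stands for a
    box left at "[Select Item]"; a missing or empty value is read as the
    blank option " " (an assumption).
    """
    if case_value is None and control_value is None:
        raise ValueError("select a value in at least one of the two boxes")
    cases, controls = [], []
    for name, fields in samples.items():
        value = fields.get(field) or " "
        if value == case_value:
            cases.append(name)
        elif value == control_value:
            controls.append(name)
        elif case_value is None:
            cases.append(name)      # only Control was chosen: the rest are cases
        elif control_value is None:
            controls.append(name)   # only Case was chosen: the rest are controls
        # otherwise the value matches neither box and the sample is discarded
    return cases, controls

samples = {"S1": {"status": "affected"}, "S2": {"status": "healthy"},
           "S3": {"status": "unknown"}, "S4": {}}
print(classify(samples, "status", "affected", "healthy"))  # (['S1'], ['S2'])
print(classify(samples, "status", case_value="affected"))  # (['S1'], ['S2', 'S3', 'S4'])
print(classify(samples, "status", case_value=" "))         # (['S4'], ['S1', 'S2', 'S3'])
```

With both boxes filled, "S3" and "S4" match neither value and are dropped; with only one box filled, every remaining sample falls into the other group.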
- Fisher's exact test.
Performs Fisher's exact test on the contingency table, in addition to the
chi-square test that is always done. A combo box lets the user choose
whether to run it for every haplotype or only for those in which the
chi-square test has Validity=N (see below).
- Odds Ratio - Haldane Correction.
If any cell of the original contingency table is 0, this option tells
SNPator to apply the Haldane correction when computing Odds Ratios and
their Confidence Intervals.
- Batch mode.
This is a fundamental time-saving feature. When you select one of the
fields in the Samples table, SNPator runs this analysis once for each
distinct value in that field, each time using only the samples that have
that value. For instance, if your samples are coded in the "sex" field as
"M" or "W", selecting "sex" as the batch-mode attribute results in two
runs of the analysis, one for men and one for women.
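Batch mode amounts to grouping the samples by the chosen field and running one analysis per group. A minimal Python sketch of that grouping step follows; the name `batch_groups` is hypothetical, and grouping samples with no value under `None` is an assumption, since the behaviour for empty values is not described here.

```python
from collections import defaultdict

def batch_groups(samples, field):
    """Group sample names by their value in `field`: one analysis run per value.

    Samples with no value in the field are grouped under None (an assumption).
    """
    groups = defaultdict(list)
    for name, fields in samples.items():
        groups[fields.get(field)].append(name)
    return dict(groups)

samples = {"S1": {"sex": "M"}, "S2": {"sex": "W"}, "S3": {"sex": "M"}}
for value, names in batch_groups(samples, "sex").items():
    print(value, names)   # each run would use only these samples
# M ['S1', 'S3']
# W ['S2']
```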
The haplotype list can be specified in four different ways:
- Using genotypes already stored in SNPator. This option is only valid
when all genotypes are haploid, since in that case a set of genotypes is
equivalent to a haplotype.
- Selecting a PHASE* file previously produced in SNPator and stored in
the User Results section. You only have to select the desired result
from the list.
- Providing a "report" file (one of the outputs of the Phase program)
stored on your local computer. A "report.bis" file generated by SNPator
can also be used here. (See the Run PHASE help for more information
about "report.bis" files.)
- Providing your own list of Haplotypes in a tabulated format. This list
should have the following format:
Sample1 Hap1 Hap2
Sample2 Hap1 Hap2
... ... ...
NOTE:
* missing alleles inside a haplotype must be entered as '?'
* haploid samples will have no "Hap2"
All haplotypes containing missing alleles are discarded and are not taken
into account when computing the association test.
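The list format and the missing-allele rule can be sketched with a small Python parser. It assumes fields are separated by tabs or spaces and that only the first two haplotypes after the sample name are read; `parse_haplotype_list` and the sample data are hypothetical.

```python
def parse_haplotype_list(text):
    """Parse "Sample Hap1 [Hap2]" lines into (sample, haplotype) pairs.

    Haplotypes containing a missing allele ("?") are dropped, since they are
    not used in the association test.
    """
    pairs = []
    for line in text.splitlines():
        fields = line.split()        # any run of tabs or spaces separates fields
        if not fields:
            continue                 # skip blank lines
        sample, haps = fields[0], fields[1:3]  # haploid samples have only Hap1
        pairs.extend((sample, hap) for hap in haps if "?" not in hap)
    return pairs

text = "S1\tCGCGTT\tTCC?TT\nS2\tTGTGTT\tTCCGTT\nS3\tCGTGCT\n"
print(parse_haplotype_list(text))
# [('S1', 'CGCGTT'), ('S2', 'TGTGTT'), ('S2', 'TCCGTT'), ('S3', 'CGTGCT')]
```

Note that only the haplotype with the missing allele is dropped: sample S1 still contributes its complete haplotype.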
In order to do the association analysis, SNPator has to decide which of
the samples in the Haplotypes list are cases and which are controls. This
information is taken from the Samples Table, so the analysis cannot be
performed, or will be incomplete, if the sample names in the list do not
match those entered in SNPator.
A very common problem arises when an active filter prevents SNPator from
retrieving information for some samples (because the filter excludes them).
Therefore, make sure that all the intended samples are being analyzed.
Job : H_Association
Description : ff
User : advanced
Study : Pruebas_2
Request time : 2005-10-06 17:48:58
Start time : 2005-10-06 17:48:58
End time : 2005-10-06 17:49:03
Ready to use time : 2005-10-06 17:49:06
Filter Information:
Filter: 0
Filter description: central
Filter version: 2
-------------------------------------------------------------------------------
Percentage of samples used: 100.00 %
-------------------------------------------------------------------------------
Haplotype  N   P         V  ODDS Ratio  CI 95%           Fisher
---------  --  --------  -  ----------  ---------------  ------
CCCGTT     1   0.3880    N  0.4286 +    0.02 - 11.63 +   1.0000
CGCGTT     8   0.0404 *  Y  0.1111      0.01 - 1.13      0.0791
CGTCCT     1   0.3880    N  0.4286 +    0.02 - 11.63 +   1.0000
CGTGCT     1   0.3880    N  0.4286 +    0.02 - 11.63 +   1.0000
CGTGTT     1   0.3880    N  0.4286 +    0.02 - 11.63 +   1.0000
TCCCCC     1   0.2268    N  4.5789 +    0.17 - 124.59 +  0.4167
TCCCCT     2   0.0805    N  8.5294 +    0.36 - 199.49 +  0.1630
TCCGCT     2   0.0805    N  8.5294 +    0.36 - 199.49 +  0.1630
TCCGTT     2   0.0805    N  8.5294 +    0.36 - 199.49 +  0.1630
TCTCTT     1   0.2268    N  4.5789 +    0.17 - 124.59 +  0.4167
TGCGTT     1   0.2268    N  4.5789 +    0.17 - 124.59 +  0.4167
TGTCCT     1   0.3880    N  0.4286 +    0.02 - 11.63 +   1.0000
TGTCTT     1   0.3880    N  0.4286 +    0.02 - 11.63 +   1.0000
TGTGTT     1   0.3880    N  0.4286 +    0.02 - 11.63 +   1.0000
           --
           24
+ Haldane correction applied
| CCCGTT | Others | Chi Squared: 0.7453
|--------|--------| pValue: 0.3880
Case | 0 | 10 | Odds Ratio: 0.4286 +
Control | 1 | 13 | CI 95%: 0.02 - 11.63 +
|--------|--------|
| CGCGTT | Others | Chi Squared: 4.2000
|--------|--------| pValue: 0.0404 *
Case | 1 | 9 | Odds Ratio: 0.1111
Control | 7 | 7 | CI 95%: 0.01 - 1.13
|--------|--------|
| CGTCCT | Others | Chi Squared: 0.7453
|--------|--------| pValue: 0.3880
Case | 0 | 10 | Odds Ratio: 0.4286 +
Control | 1 | 13 | CI 95%: 0.02 - 11.63 +
|--------|--------|
| CGTGCT | Others | Chi Squared: 0.7453
|--------|--------| pValue: 0.3880
Case | 0 | 10 | Odds Ratio: 0.4286 +
Control | 1 | 13 | CI 95%: 0.02 - 11.63 +
|--------|--------|
| CGTGTT | Others | Chi Squared: 0.7453
|--------|--------| pValue: 0.3880
Case | 0 | 10 | Odds Ratio: 0.4286 +
Control | 1 | 13 | CI 95%: 0.02 - 11.63 +
|--------|--------|
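The Odds Ratio and CI 95% values in the example output above can be reproduced from its contingency tables with the following Python sketch. It assumes the textbook forms: the Haldane correction adds 0.5 to every cell when any cell is 0, and the interval is Woolf's log-based 95% CI (z = 1.96). These are assumptions rather than documented SNPator internals, although they match the printed values for CCCGTT and CGCGTT; the name `odds_ratio` is hypothetical.

```python
from math import exp, log, sqrt

def odds_ratio(table, z=1.96):
    """Odds ratio and 95% CI for [[case_hap, case_others], [control_hap, control_others]].

    If any cell is 0, 0.5 is added to every cell (Haldane correction) and the
    returned flag is True; the CI uses Woolf's log method.
    """
    (a, b), (c, d) = table
    haldane = 0 in (a, b, c, d)
    if haldane:
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of log(OR)
    low, high = exp(log(ratio) - z * se), exp(log(ratio) + z * se)
    return ratio, low, high, haldane

# Contingency tables copied from the example output above.
for name, table in [("CCCGTT", [[0, 10], [1, 13]]), ("CGCGTT", [[1, 9], [7, 7]])]:
    ratio, low, high, haldane = odds_ratio(table)
    flag = " +" if haldane else ""
    print(f"{name} {ratio:.4f}{flag}  {low:.2f} - {high:.2f}{flag}")
# CCCGTT 0.4286 +  0.02 - 11.63 +
# CGCGTT 0.1111  0.01 - 1.13
```

Without the correction, the CCCGTT table would give an odds ratio of 0 and an undefined log, which is why the "+" rows appear whenever a haplotype is absent from one group.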
At the top, you can find the usual header with the dates and times of the
run, the user, the study, the filters applied and other data.
"Percentage of samples used" is the percentage of samples in the haplotype
list that match samples in the Samples table. It should be 100% if
everything is OK. Otherwise, an incorrect filter may have been active when
the test was performed, or some sample names may be misspelled.
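A minimal Python sketch of the "Percentage of samples used" check, assuming names are compared as exact, case-sensitive strings (an assumption). Samples hidden by an active filter would simply be absent from the Samples-table list. The function `match_samples` and the names are hypothetical.

```python
def match_samples(haplotype_samples, table_samples):
    """Return (percentage of listed samples found in the Samples table, unmatched names)."""
    listed, known = set(haplotype_samples), set(table_samples)
    if not listed:
        return 0.0, []
    matched = listed & known
    return 100.0 * len(matched) / len(listed), sorted(listed - known)

# "S4" is spelled "s4" in the Samples table, so it does not match.
pct, missing = match_samples(["S1", "S2", "S3", "S4"], ["S1", "S2", "S3", "s4"])
print(f"Percentage of samples used: {pct:.2f} %", missing)
# Percentage of samples used: 75.00 % ['S4']
```

Listing the unmatched names, rather than only the percentage, points directly at the spelling or filter problem.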
Below that, the association results are printed in several columns, containing:
- Haplotype
The haplotype used in the 2x2 association test (all other
haplotypes are pooled together).
- N
Occurrences of this haplotype in the haplotype list.
- pValue
P-value from the chi square test applied to the 2x2 contingency
table for each haplotype.
- Significance
* for pValue<0.05, ** for pValue<0.01
- Validity
Not valid (N) when any expected value in the chi-square
contingency table is equal to or below 1; valid (Y) otherwise.
- Odds Ratio
The odds ratio resulting from the association analysis of each
haplotype. If the Haldane correction has been used, this value is
flagged with a "+" sign.
- CI95
The 95% Confidence Interval for the Odds Ratio. Here too, a "+"
sign flags intervals computed with the Haldane correction.
- Fisher_p
P-value from the Fisher exact test applied to the 2x2
contingency table for each haplotype.
- Significance (of Fisher's exact test)
* for P-values<0.05, ** for P-values<0.01
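The P, V (validity), significance and Fisher columns described above can be sketched in Python as follows. This assumes a Pearson chi-square without continuity correction (1 degree of freedom) and the common two-sided Fisher definition that sums all tables no more probable than the observed one. Both choices reproduce the CGCGTT row of the example output (Chi Squared 4.2000, pValue 0.0404, Fisher 0.0791), but they are assumptions, not SNPator's documented code; all function names are hypothetical.

```python
from math import comb, erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table (no continuity correction).

    Returns (chi2, p_value, valid). The p-value uses 1 degree of freedom and
    valid is False when any expected count is <= 1 (Validity = N).
    Assumes no row or column of the table is empty.
    """
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    n = sum(rows)
    expected = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]
    chi2 = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    valid = all(e > 1 for row in expected for e in row)
    return chi2, p_value, valid

def fisher_two_sided(table):
    """Two-sided Fisher exact p-value: sum over all tables with the same margins
    that are no more probable than the observed one."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    def weight(k):  # P(cell a = k) times comb(n, row1), kept as an exact integer
        return comb(col1, k) * comb(n - col1, row1 - k)
    observed = weight(a)
    ks = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    return sum(w for w in map(weight, ks) if w <= observed) / comb(n, row1)

def stars(p):
    return "**" if p < 0.01 else "*" if p < 0.05 else ""

# CGCGTT table from the example output: cases 1 | 9, controls 7 | 7.
chi2, p, valid = chi_square_2x2([[1, 9], [7, 7]])
print(f"{chi2:.4f} {p:.4f} {stars(p)} {'Y' if valid else 'N'}")  # 4.2000 0.0404 * Y
print(f"{fisher_two_sided([[1, 9], [7, 7]]):.4f}")               # 0.0791
```

Keeping the hypergeometric weights as integers avoids floating-point ties when deciding which tables are "no more probable" than the observed one.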
Finally, for each haplotype, the contingency table is printed together
with some of the statistics.
References:
- Stephens, M., and Donnelly, P. (2003). A comparison of Bayesian methods for haplotype
reconstruction from population genotype data. American Journal of Human Genetics,
73:1162-1169.