SNPator uses an up-to-date version of the Phase software* installed on
dedicated servers. The "Run Phase" option creates a Phase input file from
data stored in SNPator, runs Phase with this input file, and returns the
results to the user in the User Results section.
Proceed as follows:
1. Enter a Description to identify the process in the top box.
2. Select which of the SNPs in your Study are going to be used. You can do it
in several ways:
- By position: You can specify a chromosome and the start and end
positions that delimit the region you are interested in.
- By Region: Depending on the content of the field "Region" in your SNPs Table.
- By Gene: Depending on the content of the field "Gene" in your SNPs Table.
- ALL: All SNPs will be selected.
3. Set the Phase running parameters:
- SEED: Randomly generated by SNPator each time this page is
- Number of iterations: SNPator sets it at the Phase default value of
- Thinning interval: SNPator sets it at the Phase default value of 1.
- Burn-in: SNPator sets it at the Phase default value of 100.
4. If the "Case Control" box is ticked, the Phase input file generated by SNPator will be ready to be run with the Case/Control option on. In this case,
you must choose how cases are distinguished from controls:
- First, select the field that holds the case/control information.
- Then, select a value for cases and a value for controls.
Samples with values other than the ones declared here will be
discarded and not used in the process.
When more than two possible values exist in a field, you may want to
enter only one of those values in either the "Case" or the "Control"
box, leaving the other box untouched (that is, leaving the "[Select
Item]" option there). By doing so, the samples that match your
explicit selection are defined as, say, Cases, and all the remaining
samples, whatever their values, are used as Controls (and vice versa if
you filled in the "Control" box instead).
If a blank (" ") is selected, it means that there are Samples with no
value in the field and that those are being selected as a case or control.
5. If the "SimplifiedOutput" box is ticked, PHASE will run using the "-T"
parameter option which gives only the best guess of each haplotype.
6. You can choose Batch Mode:
This is a fundamental time-saving feature. If you select one of the fields in
the Samples or SNP table, SNPator will run this process as many
times as there are different values in that field, each time using only those samples or SNPs that share one of the values. For instance, if you
have defined your samples in the "sex" field as "M" or "W", selecting
"sex" as the attribute of the Sample batch mode will result in
two runs of PHASE, one for men and one for women.
If you select sample and SNP batch fields at the same time, you will
obtain as many runs of the process as there are possible
combinations of the values of samples and SNPs in the fields you selected.
When running PHASE in sample batch mode, you will get, besides
the expected result files for each batch, a common file called
"haplotypes.tab" which contains all the haplotypes of all the batches
put together in TAB format.
This "haplotypes.tab" file can be used as an input file in all SNPator
analyses that need a list of haplotypes in TAB format, such as
Haplotype Association Test or Haplotype Analysis.
If different sets of SNPs have been used to estimate the haplotypes
in different batches, the resulting haplotypes
will be grouped in the "haplotypes.tab" file according to the SNPs involved and additional information will be given in order to identify
7. Click "RUN".
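The sample handling described in steps 4 and 6 can be sketched in a few lines of Python. This is only an illustration of the rules stated above, not SNPator's actual code: the function names `assign_status` and `batch_runs`, the dictionary layout of samples and SNPs, and the symmetric handling when only the "Control" box is filled are all assumptions.

```python
from itertools import product


def assign_status(value, case_value=None, control_value=None):
    """Classify one sample by its value in the chosen field (hypothetical helper).

    Returns "case", "control", or None when the sample is discarded.
    None for case_value/control_value stands for a box left at "[Select Item]".
    """
    if case_value is None and control_value is None:
        raise ValueError("select a value for cases, for controls, or for both")
    if case_value is not None and control_value is not None:
        # Both boxes filled: any other value is discarded.
        if value == case_value:
            return "case"
        if value == control_value:
            return "control"
        return None
    if case_value is not None:
        # Only "Case" filled: every remaining sample becomes a control.
        return "case" if value == case_value else "control"
    # Only "Control" filled: assumed to mirror the rule above.
    return "control" if value == control_value else "case"


def batch_runs(samples, snps, sample_field=None, snp_field=None):
    """Yield one (sample_value, snp_value, samples, snps) tuple per batch run."""
    sample_values = sorted({s[sample_field] for s in samples}) if sample_field else [None]
    snp_values = sorted({s[snp_field] for s in snps}) if snp_field else [None]
    # One run per combination of sample-field value and SNP-field value.
    for sv, nv in product(sample_values, snp_values):
        sub_samples = [s for s in samples if sample_field is None or s[sample_field] == sv]
        sub_snps = [s for s in snps if snp_field is None or s[snp_field] == nv]
        yield sv, nv, sub_samples, sub_snps


if __name__ == "__main__":
    samples = [
        {"id": "s1", "sex": "M"},
        {"id": "s2", "sex": "W"},
        {"id": "s3", "sex": "M"},
    ]
    snps = [{"code": "snp1", "gene": "G1"}, {"code": "snp2", "gene": "G2"}]
    for sv, nv, sub_samples, sub_snps in batch_runs(samples, snps, "sex", "gene"):
        print(sv, nv, [s["id"] for s in sub_samples], [s["code"] for s in sub_snps])
```

With a sample batch field of two values and a SNP batch field of two values, the sketch produces four runs, matching the "all possible combinations" rule of step 6.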
The Phase process will now be launched. Its progress can be followed
through the options available in the Jobs section. Once it is finished, the Phase
results will appear in the User Results section. There will be a *.zip file
- All the output files that Phase has generated during its execution
A text file containing the list of SNPs that have been used to
build the Phase input file. It contains:
- SNP code
- Distance to next SNP
Another SNP info text with a format suitable for some programs.
A log of the messages that Phase shows during execution. It is
what users would see on their screen if they executed Phase on
General information about dates and times of performance,
user, study, filters applied and other data
(See the "Haploid data treatment" section below)
The estimated haplotypes for each sample in a TAB format.
- Haploid data treatment.
* All genotypes are haploid.
Haplotype estimation does not make sense, since the phase of haploid genotypes
is already known. The PHASE program is not run; instead, the genotypes from SNPator are written in haplotype format to a "haplotype.txt" file.
* Some samples have diploid genotypes and some others have haploid genotypes.
This could be the case of Genotypes coming from the X chromosome.
Diploid samples are treated as usual. The genotypes of haploid samples are duplicated in the PHASE input file.
Once PHASE has been run, the "report" file generated is processed to
simplify the haplotypes of the samples which were artificially
duplicated in order to run PHASE.
The new processed file will be called "report.bis".
So, the "BEGIN BESTPAIRS_SUMMARY" section of the report file:
will be transformed in "report.bis" to:
An exception to this transformation comes, in some rare cases, when
an originally duplicated haploid sample with missing genotypes is
estimated by PHASE to have heterozygous haplotypes. In this case,
the "report.bis" file will keep both haplotypes but will mark the sample
with an asterisk
When such an indetermination arises, the haplotype written for this
sample in the "haplotypes.txt" and "haplotypes.tab" files will contain
indeterminations ("?") for the differing nucleotides of the two estimated
* Haploid and diploid genotypes are mixed in the same samples.
An error message appears and no action is taken.
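The post-processing of artificially duplicated haploid samples can be illustrated with a minimal Python sketch. It only mirrors the rule described above (collapse identical pairs, and flag and mask differing positions with "?"). The function name `collapse_haploid_pair` and the representation of a haplotype as a plain string of alleles are assumptions, and the real layout of "report" and "report.bis" is not reproduced here.

```python
def collapse_haploid_pair(hap1, hap2):
    """Collapse the two haplotypes estimated for a duplicated haploid sample.

    Returns (haplotype, flagged). flagged is True when the two estimated
    haplotypes differ, which is the rare case marked with an asterisk.
    Hypothetical helper: haplotypes are assumed to be equal-length strings.
    """
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must have the same number of SNPs")
    if hap1 == hap2:
        # Usual case: the duplicated pair is identical, keep a single copy.
        return hap1, False
    # Rare case: write "?" at every position where the two estimates differ.
    merged = "".join(a if a == b else "?" for a, b in zip(hap1, hap2))
    return merged, True


if __name__ == "__main__":
    print(collapse_haploid_pair("ACGT", "ACGT"))
    print(collapse_haploid_pair("ACGT", "ACTT"))
```

The boolean flag keeps the asterisk marking separate from the haplotype string, so a caller can decide how to render it in each output file.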
- Stephens, M., and Donnelly, P. (2003). A comparison of Bayesian methods for haplotype
reconstruction from population genotype data. American Journal of Human Genetics,