Run Phase

SNPator uses an up-to-date version of the Phase software* installed on dedicated servers. The "Run Phase" option creates a Phase input file from data stored in SNPator, runs Phase with this input file, and returns the results to the user in the User Results section.


What you need to do is as follows:

1. Enter a Description to identify the process in the top box.

2. Select which of the SNPs in your Study are going to be used. You can do it in several ways:

- By position: You can specify a chromosome and the start and end positions that delimit the region you are interested in.

- By Region: Depending on the content of the field "Region" in your SNPs Table.

- By Gene: Depending on the content of the field "Gene" in your SNPs Table.

- ALL: All SNPs will be selected.
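
The four selection modes of step 2 can be pictured with a small sketch. It is only an illustration, not SNPator's own code: the `Snp` record, the `select_snps` function and the inclusive treatment of the start and end positions are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    # Hypothetical record mirroring the fields of the SNPs Table.
    code: str
    chromosome: str
    position: int
    region: str = ""
    gene: str = ""


def select_snps(snps, mode, chromosome=None, start=None, end=None, value=None):
    """Return the SNPs that match one of the four selection modes."""
    if mode == "all":
        return list(snps)
    if mode == "position":
        # Assumption: both ends of the interval are inclusive.
        return [s for s in snps
                if s.chromosome == chromosome and start <= s.position <= end]
    if mode == "region":
        return [s for s in snps if s.region == value]
    if mode == "gene":
        return [s for s in snps if s.gene == value]
    raise ValueError(f"unknown selection mode: {mode}")
```

For instance, `select_snps(snps, "gene", value="G2")` would keep only the SNPs whose "Gene" field equals "G2".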

3. Set the Phase running parameters:
- SEED: Randomly generated by SNPator each time this page is accessed.

- Number of iterations: SNPator sets it at the Phase default value of 100.

- Thinning interval: SNPator sets it at the Phase default value of 1.

- Burn-in: SNPator sets it at the Phase default value of 100.
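
For users who want to reproduce a run on their own workstation, the sketch below shows how the three numeric parameters might be assembled into a command line. It is an assumption-laden illustration: the executable name `PHASE`, the argument order (options, input file, output file, then iterations, thinning interval and burn-in) and the file names are not taken from this page, so check them against the Phase documentation before use.

```python
def build_phase_command(input_file, output_file, iterations=100,
                        thinning=1, burn_in=100, options=()):
    """Assemble an argument list for a Phase run.

    Assumed layout (not confirmed by this page): executable, option
    flags, input file, output file, then the three numeric parameters.
    The defaults are the values SNPator pre-fills in the form.
    """
    for name, number in (("iterations", iterations),
                         ("thinning", thinning),
                         ("burn_in", burn_in)):
        if number < 0:
            raise ValueError(f"{name} must not be negative")
    return ["PHASE", *options, input_file, output_file,
            str(iterations), str(thinning), str(burn_in)]
```

The returned list could be handed to `subprocess.run` if a local Phase executable is available.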

4. If the "Case Control" box is ticked, the Phase input file generated by SNPator will be ready to be run with the Case/Control option on. In this case the field that will be used to distinguish cases from controls must be selected:

- First, the field has to be selected:

- Then, you have to select a value for cases and controls:


Samples with values other than the ones declared here will be discarded and not used in the process.

When a field has more than two possible values, you may want to enter only one of those values in either the "Case" or the "Control" box, leaving the other box untouched (that is, leaving the "[Select Item]" option there). By doing so, the samples that match your explicit selection are defined as, say, Cases (or Controls, if you selected so), and all the remaining samples, whatever their values, are used as Controls.

If a blank (" ") is offered as a value, there are Samples with no value in the field; selecting the blank assigns those Samples to cases or controls.
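
The assignment rules of step 4 can be summarised in a short sketch. It is only an illustration of the behaviour described above: the function name `assign_status`, the use of `None` for a box left at "[Select Item]" and the empty string for a blank value are assumptions, and the symmetric case (only a Control value declared, everything else treated as Cases) is inferred from the text rather than stated in it.

```python
def assign_status(field_values, case_value=None, control_value=None):
    """Map each sample to "case" or "control"; discarded samples are omitted.

    field_values: dict of sample id -> value of the selected field
                  ("" stands for a blank value).
    case_value / control_value: the declared values, or None when the
                  box is left at "[Select Item]".
    """
    if case_value is None and control_value is None:
        raise ValueError("declare a value for cases, controls, or both")
    status = {}
    for sample, value in field_values.items():
        if case_value is not None and value == case_value:
            status[sample] = "case"
        elif control_value is not None and value == control_value:
            status[sample] = "control"
        elif case_value is None:
            # Only controls were declared: every other sample is a case.
            status[sample] = "case"
        elif control_value is None:
            # Only cases were declared: every other sample is a control.
            status[sample] = "control"
        # Both values declared and neither matches: sample is discarded.
    return status
```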

5. If the "SimplifiedOutput" box is ticked, PHASE will run using the "-T" parameter option which gives only the best guess of each haplotype.

6. You can choose Batch Mode:

This is a fundamental time-saving feature. When you select one of the fields in the Samples or SNPs table, SNPator runs the process once for each distinct value found in that field, each time using only the samples or SNPs that have that value. For instance, if you have defined your samples in the "sex" field as "M" or "W", selecting "sex" as the attribute of the Sample batch mode will result in two runs of PHASE, one for men and one for women.

If you select a sample batch field and a SNP batch field at the same time, you will obtain one run of the process for every possible combination of the values found in the two selected fields.

When running PHASE in sample batch mode, you will get, besides the expected result files for each batch, a common file called "" which contains the haplotypes of all the batches put together in TAB format.

This "" file can be used as an input file in any SNPator analysis that needs a list of haplotypes in TAB format, such as Haplotype Association Test or Haplotype Analysis.

If different batches have used different sets of SNPs to estimate the haplotypes, the resulting haplotypes will be grouped in the "" file according to the SNPs involved, and additional information will be given in order to identify them.
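
The way Batch Mode in step 6 multiplies runs can be sketched as follows. This is a hypothetical illustration, not SNPator's implementation: the function `batch_runs` and the representation of samples and SNPs as lists of dictionaries are assumptions made for the example.

```python
from itertools import product


def batch_runs(samples, snps, sample_field=None, snp_field=None):
    """Yield (sample_value, snp_value, sample_subset, snp_subset) per run.

    samples and snps are lists of dicts. A field left as None means
    that no batch mode is applied on that side.
    """
    def group_by(rows, field):
        if field is None:
            return {None: list(rows)}
        groups = {}
        for row in rows:
            groups.setdefault(row[field], []).append(row)
        return groups

    sample_groups = group_by(samples, sample_field)
    snp_groups = group_by(snps, snp_field)
    # One run for every combination of a sample value and a SNP value.
    for (s_value, s_rows), (p_value, p_rows) in product(
            sample_groups.items(), snp_groups.items()):
        yield s_value, p_value, s_rows, p_rows
```

With a "sex" field holding "M" and "W" and a SNP field holding two different genes, this sketch yields four runs, one per combination.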

7. Click "RUN".

The Phase process will now be launched. Its progress can be followed with the options available in the Jobs section. Once it has finished, the Phase results will appear in the User Results section. There will be a *.zip file containing:

- All the output files that Phase has generated during its execution

- SNPs.txt
A text file containing the list of SNPs that have been used to build the phase input file. It contains:

- SNP code
- Position
- Distance to next SNP

- SNPs.snp
Another SNP info text with a format suitable for some programs.

- phaseProcess.log
A log of the messages that Phase shows during execution. It is what users would see on their screen if they ran Phase on their own workstation.

- Information.txt
General information about dates and times of the run, user, study, filters applied and other data.

- report.bis
(See the "Haploid data treatment" section below)

- haplotypes.txt
The estimated haplotypes for each sample in a TAB format.

- Haploid data treatment.

* All genotypes are haploid

Haplotype estimation does not make sense, since in haploid genotypes the phase is already known. The PHASE program is not run; instead, the genotypes from SNPator are written in haplotype format to a "haplotypes.txt" file.

* Some samples have diploid genotypes and some others have haploid genotypes.

This can happen, for example, with genotypes coming from the X chromosome. Diploid samples are treated as usual. The genotypes of haploid samples are duplicated in the PHASE input file.

Once PHASE has been run, the "report" file generated is processed to simplify the haplotypes of the samples which were artificially duplicated in order to run PHASE.

The new processed file will be called "report.bis".

So, the "BEGIN BESTPAIRS_SUMMARY" section of the report file:

NA06985: (1,2)
NA06991: (1,1)
NA06993: (1,3)
NA06994: (2,4)
NA07000: (2,2)
NA07019: (2,2)

will be transformed in "report.bis" to:

NA06985: (1,2)
NA06991: (1)
NA06993: (1,3)
NA06994: (2,4)
NA07000: (2)
NA07019: (2)

An exception to this transformation arises, in some rare cases, when an originally duplicated haploid sample with missing genotypes is estimated by PHASE to have two different haplotypes. In this case, the "report.bis" file will keep both haplotypes but will mark the sample with an asterisk:

NA06991: (1,3)*
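
The transformation from "report" to "report.bis" can be sketched as below. This is an illustration under assumptions, not the code SNPator runs: the function name `simplify_best_pairs`, the line pattern, and the idea of passing in the set of samples that were duplicated are all hypothetical. In the example above, the duplicated haploid samples are taken to be NA06991, NA07000 and NA07019.

```python
import re

# Matches lines such as "NA06985: (1,2)", with optional leading spaces.
PAIR_LINE = re.compile(r"^\s*(\S+):\s*\((\d+),(\d+)\)\s*$")


def simplify_best_pairs(lines, haploid_samples):
    """Rewrite best-pair lines for samples that were duplicated.

    Diploid samples are kept as they are. A duplicated haploid sample
    with two identical haplotypes is reduced to one; if PHASE estimated
    two different haplotypes, both are kept and the line is marked "*".
    """
    result = []
    for line in lines:
        match = PAIR_LINE.match(line)
        if match is None:
            result.append(line.rstrip())  # not a pair line: keep it
            continue
        sample, first, second = match.groups()
        if sample not in haploid_samples:
            result.append(f"{sample}: ({first},{second})")
        elif first == second:
            result.append(f"{sample}: ({first})")
        else:
            result.append(f"{sample}: ({first},{second})*")
    return result
```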

When PHASE estimates two different haplotypes for a duplicated haploid sample in this way, the haplotype written for this sample in the "haplotypes.txt" and "" files will contain an undetermined value ("?") at each position where the two estimated haplotypes differ.
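
A minimal sketch of how the two estimated haplotypes might be merged into a single one with "?" marks is given below. It assumes, for illustration only, that each haplotype is a string with one allele character per SNP; the function name `merge_haplotypes` is hypothetical and the actual layout of the TAB files is not shown here.

```python
def merge_haplotypes(first, second, unknown="?"):
    """Combine two haplotypes, marking the positions where they differ."""
    if len(first) != len(second):
        raise ValueError("haplotypes must have the same number of SNPs")
    return "".join(a if a == b else unknown for a, b in zip(first, second))
```

For example, merging "ACGT" and "ACTT" gives "AC?T", since only the third position differs.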

* Haploid and diploid genotypes are mixed in the same samples.

An error message appears and no action is taken.
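
The three situations described in this section can be summarised as a decision sketch. It is a hypothetical illustration: the function name `plan_phase_run`, the returned labels and the representation of each genotype as a tuple of one allele (haploid) or two alleles (diploid) are assumptions, not SNPator internals.

```python
def plan_phase_run(genotypes):
    """Decide how a set of samples is handled before running PHASE.

    genotypes: dict of sample id -> list of genotype tuples, each with
    one allele (haploid) or two alleles (diploid).
    """
    ploidy_by_sample = {}
    for sample, calls in genotypes.items():
        ploidies = {len(call) for call in calls}
        if len(ploidies) > 1:
            # Haploid and diploid genotypes mixed within one sample.
            raise ValueError(f"sample {sample} mixes haploid and diploid genotypes")
        if ploidies:
            ploidy_by_sample[sample] = ploidies.pop()
    kinds = set(ploidy_by_sample.values())
    if kinds == {1}:
        return "skip_phase"          # phase already known, PHASE not run
    if 1 in kinds:
        return "duplicate_haploid"   # haploid genotypes are duplicated
    return "run_phase"               # all diploid, treated as usual
```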