Data retrieval: Formats - Phase

This section generates a properly formatted, ready-to-use input file for the Phase software*.


Proceed as follows:

1. Enter a Description to identify the process in the top box.

2. Select which of the SNPs in your Study are going to be used. You can do it in several ways:

- By Position: Specify a chromosome plus a start and an end position that delimit the region you are interested in.

- By Region: According to the content of the "Region" field in your SNPs table.

- By Gene: According to the content of the "Gene" field in your SNPs table.
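
The three selection modes above can be pictured as simple filters over the SNPs table. The following is only a minimal sketch, not SNPator's own code: the `Snp` record, the function names, and the choice of inclusive position bounds are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    # Hypothetical record; the field names mirror the SNPs table columns
    # mentioned in the text, but the real table layout may differ.
    code: str
    chromosome: str
    position: int
    region: str = ""
    gene: str = ""


def select_by_position(snps, chromosome, begin, end):
    """Keep SNPs on `chromosome` with begin <= position <= end (assumed inclusive)."""
    return [s for s in snps if s.chromosome == chromosome and begin <= s.position <= end]


def select_by_field(snps, field, value):
    """Keep SNPs whose `field` ("region" or "gene") equals `value`."""
    return [s for s in snps if getattr(s, field) == value]


snps = [
    Snp("rs1", "7", 1000, region="R1", gene="GENE_A"),
    Snp("rs2", "7", 5000, region="R1", gene="GENE_B"),
    Snp("rs3", "X", 2000, region="R2", gene="GENE_A"),
]

by_position = [s.code for s in select_by_position(snps, "7", 500, 2000)]
by_gene = [s.code for s in select_by_field(snps, "gene", "GENE_A")]
```

Here `by_position` keeps only the SNP on chromosome 7 inside the requested window, while `by_gene` keeps every SNP annotated with the chosen gene regardless of chromosome.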

3. If the "Case Control" box is ticked, the Phase input file generated by SNPator will be ready to run with the Case/Control option on. In this case you must select the field that will be used to distinguish cases from controls:


- Then select the value that identifies cases and the value that identifies controls:


When both values are selected, samples with values other than the ones declared here are discarded and not used in the process.

When a field has more than two possible values, you may want to select a value in only one of the "Case" and "Control" boxes, leaving the other box untouched (that is, leaving the "[Select Item]" option there). The samples that match your explicit selection are then defined as Cases (or as Controls, if that is the box you used), and all remaining samples, whatever their value, are assigned to the other group.

If a blank (" ") is selected, there are samples with no value in the field, and those samples are the ones being selected as cases or controls.
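
The case/control rules above can be summarised in a few lines of Python. This is a sketch of the behaviour as described in this section, not SNPator's implementation: the function name, the use of `None` to stand for an untouched "[Select Item]" box, and the treatment of the symmetric situation (only a Control value selected) are assumptions.

```python
UNSET = None  # stands for a box left on the "[Select Item]" option


def assign_status(value, case_value=UNSET, control_value=UNSET):
    """Return "case", "control", or None (sample discarded) for one sample value."""
    if case_value is UNSET and control_value is UNSET:
        raise ValueError("select a value in at least one of the Case/Control boxes")

    if case_value is not UNSET and control_value is not UNSET:
        # Both boxes filled: any other value is discarded.
        if value == case_value:
            return "case"
        if value == control_value:
            return "control"
        return None

    if case_value is not UNSET:
        # Only Case selected: every other sample becomes a control.
        return "case" if value == case_value else "control"

    # Only Control selected: every other sample becomes a case (assumed symmetric).
    return "control" if value == control_value else "case"


statuses = [
    assign_status("affected", "affected", "healthy"),
    assign_status("unknown", "affected", "healthy"),
    assign_status("unknown", case_value="affected"),
    assign_status("", case_value=""),  # blank value explicitly chosen as Case
]
```

Note that an empty string is treated as an ordinary value, which mirrors the blank option described above: samples with no value in the field can themselves be chosen as cases or controls.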

4. You can choose Batch Mode:

This is a major time-saving feature. When you select one of the fields in the Samples or SNPs table, SNPator runs this process once for each distinct value in that field, each time using only the samples or SNPs that have that value. For instance, if your samples are coded as "M" or "W" in the "sex" field, selecting "sex" as the Sample batch field results in two runs of the Phase format retrieval, one for men and one for women.

If you select a sample batch field and an SNP batch field at the same time, the process runs once for every possible combination of the values found in the two fields.
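
To make the number of runs concrete, batch mode can be thought of as a Cartesian product over the distinct values of the chosen fields. The sketch below is illustrative only: the dictionary-based records, the `batch_runs` function, and the "gene" field used for the SNP batch are assumptions, not SNPator internals.

```python
from itertools import product


def batch_runs(samples, snps, sample_field=None, snp_field=None):
    """Return one (sample_value, snp_value, samples_subset, snps_subset) tuple per run."""
    # A field that is not selected contributes a single "no split" run.
    sample_values = sorted({s[sample_field] for s in samples}) if sample_field else [None]
    snp_values = sorted({s[snp_field] for s in snps}) if snp_field else [None]

    runs = []
    for sample_value, snp_value in product(sample_values, snp_values):
        sub_samples = [s for s in samples
                       if sample_field is None or s[sample_field] == sample_value]
        sub_snps = [s for s in snps
                    if snp_field is None or s[snp_field] == snp_value]
        runs.append((sample_value, snp_value, sub_samples, sub_snps))
    return runs


samples = [
    {"id": "S1", "sex": "M"},
    {"id": "S2", "sex": "W"},
    {"id": "S3", "sex": "M"},
]
snps = [
    {"code": "rs1", "gene": "GENE_A"},
    {"code": "rs2", "gene": "GENE_B"},
]

sex_runs = batch_runs(samples, snps, sample_field="sex")
combined_runs = batch_runs(samples, snps, sample_field="sex", snp_field="gene")
```

With two values of "sex" and two values of "gene", `sex_runs` holds two runs and `combined_runs` holds four, one per combination.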

5. Finally, the resulting Phase input file is sent to the User Results section. It is delivered as a *.zip file containing the following files:

- phaseConfigFile.inp
This is the Phase input file that has to be given to Phase in order to perform haplotype estimations.

- SNPs.txt
A text file containing the list of SNPs that have been used to build the phase input file. It provides info on:

- SNP code
- Position
- Distance to next SNP

- SNPs.snp
A second text file with SNP information, in a format suitable for some other programs.

- information.txt
SNPator information to identify the job: date, time, user, study and filter.

- status.txt
Report of possible errors in the process.
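
As an illustration of the "Distance to next SNP" value listed for SNPs.txt, the sketch below derives it from SNP positions. It assumes all SNPs lie on the same chromosome and leaves the last SNP without a distance; the exact layout of SNPs.txt and how SNPator reports the last entry are not specified here, so the function and its output shape are hypothetical.

```python
def distances_to_next(snps):
    """Given (code, position) pairs, return (code, position, distance_to_next) rows.

    Rows are ordered by position; the last SNP has no next neighbour, so its
    distance is None (an assumption, not necessarily what SNPs.txt contains).
    """
    ordered = sorted(snps, key=lambda item: item[1])
    rows = []
    for index, (code, position) in enumerate(ordered):
        if index + 1 < len(ordered):
            distance = ordered[index + 1][1] - position
        else:
            distance = None
        rows.append((code, position, distance))
    return rows


rows = distances_to_next([("rs2", 5000), ("rs1", 1000), ("rs3", 12000)])
```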

- Haploid data treatment.

When haploid data are used in this section, three scenarios are possible:

* If all genotypes are haploid.

Haplotype estimation does not make sense. An error message appears and no action is taken.

* If some samples have diploid genotypes, while other samples have haploid genotypes.

This can happen with genotypes from the X chromosome. Diploid samples are treated as usual. The genotypes of haploid samples are duplicated, so that each of them is written with two copies of its single allele.

* If haploid and diploid genotypes are mixed within the same sample.

An error message appears and no action is taken.
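
The three haploid scenarios can be expressed as a short check followed by the duplication step. This is a sketch under stated assumptions, not SNPator's code: genotypes are represented as tuples of alleles (one allele for haploid, two for diploid), every sample is assumed to have at least one genotype, and the function name and error messages are invented for illustration.

```python
def prepare_genotypes(samples):
    """Validate ploidy and duplicate haploid genotypes.

    `samples` maps a sample id to a list of genotypes, each a tuple of
    one allele (haploid) or two alleles (diploid).
    """
    ploidy_by_sample = {}
    for sample_id, genotypes in samples.items():
        ploidies = {len(genotype) for genotype in genotypes}
        if len(ploidies) > 1:
            # Scenario 3: haploid and diploid genotypes within one sample.
            raise ValueError(f"sample {sample_id} mixes haploid and diploid genotypes")
        ploidy_by_sample[sample_id] = ploidies.pop()

    if all(ploidy == 1 for ploidy in ploidy_by_sample.values()):
        # Scenario 1: nothing to phase.
        raise ValueError("all genotypes are haploid; haplotype estimation does not make sense")

    # Scenario 2: diploid samples are kept as they are, haploid ones are duplicated.
    prepared = {}
    for sample_id, genotypes in samples.items():
        if ploidy_by_sample[sample_id] == 1:
            prepared[sample_id] = [(g[0], g[0]) for g in genotypes]
        else:
            prepared[sample_id] = [tuple(g) for g in genotypes]
    return prepared


x_chromosome = {
    "female_1": [("A", "G"), ("C", "C")],
    "male_1": [("A",), ("T",)],
}
prepared = prepare_genotypes(x_chromosome)
```

In this example the male sample's single alleles are written twice, so it enters the Phase input alongside the diploid female sample.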