Where to add a new format

In the workflows/ directory, ng6 provides a formats.py file where new formats can be added. It already includes "any", "bam", "fasta", "fastq", and "sff" formats.

ng6/
├── bin/
├── docs/
├── src/
├── workflows/
│   ├── components/
│   ├── extparsers/
│   ├── __init__.py
│   ├── formats.py   [ add new ng6 formats here ]
│   └── types.py
├── applications.properties
└── README

How to add a new format

In ng6, a format is represented by a function named after the format. The function takes a single argument, the file path given by the user, and is responsible for opening the file and checking its content. If an error occurs or the content does not meet the expected criteria, the function should raise a jflow.InvalidFormatError with a suitable error message. ng6 uses this message to report the error to the end user.

In the following example, the fasta function checks that the first 10 sequences of the input file can be read as fasta:

def fasta(ifile):
    try:
        reader = seqio.FastaReader(ifile, wholefile=True)
        nb_seq = 0
        for id, desc, seq, qualities in reader:
            nb_seq += 1
            # only check the first 10 sequences
            if nb_seq == 10: break
    except Exception:
        raise jflow.InvalidFormatError("The provided file '" + ifile + "' is not a fasta file!")
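
The same pattern can be applied to a format that ng6 does not ship. The block below is only a minimal sketch, not part of ng6: the "bed" format name, the validation rules, and the local InvalidFormatError class are assumptions made for illustration. The local class stands in for jflow.InvalidFormatError so that the example runs without jflow installed.

```python
class InvalidFormatError(Exception):
    """Local stand-in for jflow.InvalidFormatError (assumption, so the sketch runs alone)."""


def bed(ifile):
    """Hypothetical 'bed' format check: validate the first 10 records of the file."""
    try:
        nb_records = 0
        with open(ifile) as handle:
            for line in handle:
                # skip blank lines and header/comment lines
                if not line.strip() or line.startswith(("#", "track", "browser")):
                    continue
                fields = line.rstrip("\n").split("\t")
                # a record needs at least chrom, start and end, with start <= end
                if len(fields) < 3 or int(fields[1]) > int(fields[2]):
                    raise ValueError("malformed BED line")
                nb_records += 1
                # only check the first 10 records
                if nb_records == 10:
                    break
    except (OSError, ValueError):
        raise InvalidFormatError("The provided file '" + ifile + "' is not a bed file!")
```

In formats.py itself, the function would presumably raise jflow.InvalidFormatError rather than the stand-in class, and the format would then be selected with file_format="bed".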

How to use a new format

The newly created format can then be used in all add_input_* functions of the classes jflow.workflow.Workflow, ng6.ng6workflow.NG6Workflow, ng6.ng6workflow.CASAVANG6Workflow, ng6.analysis.Analysis and jflow.component.Component, as follows:

[...]
def define_parameters(self, function="process"):
    self.add_input_file("reference_genome", "Which genome should the reads be aligned on", file_format="fasta", required=True)
[...]