In the workflows/ directory, ng6 provides a formats.py file where new formats can be added. It already includes the "any", "bam", "fasta", "fastq", and "sff" formats.
ng6/
├── bin/
├── docs/
├── src/
├── workflows/
│ ├── components/
│ ├── extparsers/
│ ├── __init__.py
│ ├── formats.py [ file where to add new ng6 formats ]
│ └── types.py
├── applications.properties
└── README
In ng6, a format is represented by a function named after the desired format. The function takes a single argument, the file path given by the user, and is in charge of opening the file and checking its content. If an error occurs, or if the content does not meet the expected criteria, a jflow.InvalidFormatError should be raised with a suitable error message. ng6 uses this message to inform the end user of the error.
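To make this contract concrete, here is a minimal sketch of a hypothetical "bed" format function. It is not part of ng6: the format name and the checks are illustrative, and InvalidFormatError is a local stand-in for jflow.InvalidFormatError so the snippet runs without jflow installed (in formats.py you would raise jflow.InvalidFormatError instead). Only the first 10 records are read, and any I/O or parsing problem is turned into a format error.

```python
class InvalidFormatError(Exception):
    """Stand-in for jflow.InvalidFormatError (assumption: jflow not imported here)."""


def bed(ifile):
    """Hypothetical 'bed' format: check the first 10 records of ifile."""
    try:
        nb_records = 0
        with open(ifile) as handle:
            for line in handle:
                # skip blank lines, comments and track/browser header lines
                if not line.strip() or line.startswith(("#", "track", "browser")):
                    continue
                fields = line.rstrip("\n").split("\t")
                # a BED record needs at least chrom, start and end;
                # a missing column raises IndexError, a non-integer ValueError
                start, end = int(fields[1]), int(fields[2])
                if start < 0 or end < start:
                    raise ValueError("invalid interval")
                nb_records += 1
                # only check the first 10 records
                if nb_records == 10:
                    break
        if nb_records == 0:
            raise ValueError("no record found")
    except (OSError, ValueError, IndexError):
        # the message is what the end user will see
        raise InvalidFormatError("The provided file '" + ifile + "' is not a bed file!")
```

The function returns nothing when the file looks valid; raising is the only way it reports a problem, which is what lets the caller turn the message into user feedback.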
In the following example, the fasta function checks that the first 10 sequences of the input file are in fasta format:
import jflow
from jflow import seqio

def fasta(ifile):
    try:
        reader = seqio.FastaReader(ifile, wholefile=True)
        nb_seq = 0
        for seq_id, desc, seq, qualities in reader:
            nb_seq += 1
            # only check the first 10 sequences
            if nb_seq == 10:
                break
    except Exception:
        # any parsing or I/O error means the file is not a valid fasta file
        raise jflow.InvalidFormatError("The provided file '" + ifile + "' is not a fasta file!")
When writing such checks, the jflow.seqio and jflow.featureio libraries provide handlers for several file formats.
The newly created format can then be used in all add_input_* functions of the classes jflow.workflow.Workflow, ng6.ng6workflow.NG6Workflow, ng6.ng6workflow.CASAVANG6Workflow, ng6.analysis.Analysis and jflow.component.Component, as follows:
[...]
    def define_parameters(self, function="process"):
        self.add_input_file("reference_genome", "Which genome should the reads be aligned on", file_format="fasta", required=True)
[...]
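The file_format value above is simply the name of a function defined in formats.py. How jflow actually resolves that name is not described here; the sketch below assumes a plain getattr lookup, only to show why the function name must match the format name. The formats module object, the txt format and check_file_format are hypothetical, and InvalidFormatError is a local stand-in for jflow.InvalidFormatError.

```python
import types


class InvalidFormatError(Exception):
    """Stand-in for jflow.InvalidFormatError (assumption: jflow not imported here)."""


def txt(ifile):
    """Hypothetical 'txt' format: the file must be readable and non-empty."""
    with open(ifile) as handle:
        if not handle.read(1):
            raise InvalidFormatError("The provided file '" + ifile + "' is empty!")


# Hypothetical stand-in for workflows/formats.py: each attribute is a format check.
formats = types.ModuleType("formats")
formats.txt = txt


def check_file_format(ifile, file_format, formats_module=formats):
    """Look up the function named file_format and run it on ifile (assumed mechanism)."""
    checker = getattr(formats_module, file_format, None)
    if not callable(checker):
        raise InvalidFormatError("Unknown file format '" + file_format + "'")
    # the check raises InvalidFormatError itself when the content is wrong
    checker(ifile)
```

Under this assumption, adding a format only requires defining a new function in formats.py; no separate registration table has to be kept in sync.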