The NG6 interface is built upon 3 main objects:
An NG6Workflow creates a single Run object from input data files and populates this run with analyses of those data.
It is an extension of the ng6.ng6workflow.NG6Workflow class and can be viewed as a collection of analyses.
It also lists all the inputs and parameters requested from the final user and builds the execution
process by adding analyses and linking them to each other.
There are two classes that can be used to create a workflow in NG6:
- ng6.ng6workflow.NG6Workflow: this class adds the required parameters for the description of a Run and a Sample.
- ng6.ng6workflow.CasavaNG6Workflow: an extension of ng6.ng6workflow.NG6Workflow which adds support for Illumina CASAVA output directories for the description of Samples.
A new workflow must be added as a new Python package in the workflows package. The implementation of
a workflow must be written in the package __init__.py file. The developer can also create:
- a components package, where all the workflow-specific components and analyses can be stored,
- a lib package to import specific libraries within the workflow,
- a bin folder with the binaries used in the workflow.
nG6/
├── bin/
├── docs/
├── src/
├── workflows/
│ ├── myworkflow/ [ the new ng6workflow package ]
│ │ ├── components/ [ specific components and analyses]
│ │ ├── lib/ [ specific libraries ]
│ │ ├── bin/ [ specific binaries ]
│ │ └── __init__.py [ the ng6workflow implementation ]
│ ├── components/
│ ├── extparsers/
│ ├── __init__.py
│ ├── formats.py
│ └── types.py
├── applications.properties
└── README
An NG6Workflow is a class defined in the __init__.py file. In order to add a new one, the developer has to:
- implement a class inheriting from the ng6.ng6workflow.NG6Workflow class,
- overload the get_description() method to provide the final user with a description of the workflow,
- overload the define_parameters() method to add the workflow inputs and parameters,
- overload the process() method by adding analyses or components and setting their arguments.
The class skeleton is given by:
from ng6.ng6workflow import NG6Workflow


class MyWorkflow(NG6Workflow):

    def get_description(self):
        return "a description"

    def define_parameters(self, function="process"):
        # define the parameters
        pass

    def process(self):
        # add and link the components
        pass
By inheriting from the ng6.ng6workflow.NG6Workflow class, default parameters describing the run and
its samples are added automatically and will be requested from the final user.
Flag | Help | Required | type |
---|---|---|---|
--admin-login | Who is the project administrator | true | adminlogin |
--project-name | The project name the run belongs to | true | existingproject |
--name | Give a name to your run | true | string |
--description | Give a description to your run | true | string |
--date | When were the data produced | true | date |
--data-nature | Are Sequences cDNA, genomic, RNA, ... | true | string |
--sequencer | Which sequencer produced the data | true | string |
--species | Which species has been sequenced | true | string |
--type | What type of data is it (1 lane, 1 region) | true | string |
The sample parameter is a multiple parameter list. It must be set as --sample [subparam=value ...]. Its sub-parameters are:
Flag | Help | Required | type |
---|---|---|---|
sample_id | The unique identifier of the sample. | false | string |
sample_name | A descriptive name for the sample. | false | string |
sample_description | A brief description of the sample. | false | string |
type | Read orientation and type, chosen from a predefined list of values. | false | string |
insert_size | Insert size for paired end reads. | false | integer |
metadata | Add metadata to the sample. Sample metadata must be set as --sample metadata=key:value . | false | samplemetadata |
read1 | Read 1 data file path. | true | inputfile list |
read2 | Read 2 data file path. | false | inputfile list |
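To make the --sample syntax concrete, the sketch below turns a list of subparam=value tokens into a dictionary. It is not the NG6 parser: parse_sample_tokens is a hypothetical helper, and the choice to collect read1, read2 and metadata as repeated values is an assumption drawn from the table above.

```python
def parse_sample_tokens(tokens):
    """Turn ["sample_id=S1", "read1=a.fastq", ...] into a dictionary."""
    # Assumption: these sub-parameters may be given several times.
    list_keys = {"read1", "read2", "metadata"}
    sample = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError("expected subparam=value, got %r" % token)
        if key in list_keys:
            sample.setdefault(key, []).append(value)
        else:
            sample[key] = value
    # read1 is the only sub-parameter marked as required.
    if "read1" not in sample:
        raise ValueError("read1 is required for every sample")
    # insert_size is typed as an integer.
    if "insert_size" in sample:
        sample["insert_size"] = int(sample["insert_size"])
    # Each metadata entry is written key:value.
    sample["metadata"] = dict(m.split(":", 1) for m in sample.get("metadata", []))
    return sample


sample = parse_sample_tokens([
    "sample_id=S1",
    "read1=S1_R1.fastq",
    "read2=S1_R2.fastq",
    "insert_size=300",
    "metadata=tissue:leaf",
])
print(sample["read1"], sample["metadata"])
```

The token list corresponds to what would follow a single --sample flag on the command line.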
CasavaNG6Workflow is an extension of NG6Workflow. It has exactly the same requirements for the run description, but
it overloads the sample definition parameter to parse an Illumina CASAVA output directory. With this parameter, the final user does not
have to define the samples directly on the command line.
Flag | Help | Required | type |
---|---|---|---|
--casava-directory | Path to the CASAVA directory to use | true | string |
--casava-lane | The lane number to be retrieved from the casava directory | true | integer |
--mismatch-index | Set this value to true if the index sequence in the sample fastq files allows at least 1 mismatch | false | boolean |
The define_parameters() method is used to add workflow parameters and inputs. To do so, several methods are available.
Once defined, the new parameters are available as object attributes and are thus accessible through self.parameter_name.
Several types of parameters can be added, all described in the following sections. All have two required positional
arguments: name and help. The other arguments are optional and can be given to the method by using their keywords.
The Python operator is cannot be used to evaluate a parameter; the operator == must be preferred.
Thus, "if self.param_name is not None :" becomes "if self.param_name != None :".
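The reason for this restriction is not given here. One plausible explanation, stated purely as an assumption, is that parameter values are wrapper objects rather than plain Python values, so that an unset parameter is never the None singleton even though it compares equal to it. The UnsetParameter class below is hypothetical and only illustrates that behaviour.

```python
class UnsetParameter:
    """Hypothetical wrapper standing in for a parameter with no value."""

    def __eq__(self, other):
        # An unset parameter compares equal to None.
        return other is None

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        return hash(None)


param = UnsetParameter()

# Identity test: the wrapper is a real object, so this is always True.
identity_says_set = param is not None
# Equality test: goes through __ne__, so it reports the parameter as unset.
equality_says_set = param != None  # noqa: E711 (deliberate comparison)

print(identity_says_set, equality_says_set)
```

With such a wrapper, only the equality test gives the answer the workflow author expects.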
Parameters can be added to handle a single element or a list of elements. Thus, the add_parameter()
method can be used to force the final user to provide one and only one value, whereas the add_parameter_list()
method allows the final user to give as many values as desired.
In the following example, a parameter named sequencer
is added to the workflow. It has a list of choices and the default value is "HiSeq2000".
self.add_parameter("sequencer",
                   "The sequencer type.",
                   choices=["HiSeq2000", "ILLUMINA", "SLX", "SOLEXA", "454", "UNKNOWN"],
                   default="HiSeq2000")
There are two positional arguments: name and help. All other options are keyword options.
Name | Type | Required | Default value | Description |
---|---|---|---|---|
name | string | true | None | The name of the parameter. The parameter value is accessible within the workflow object through the attribute named self.parameter_name . NB: "-" in the parameter name is automatically replaced by "_", so --name-param becomes self.name_param in the code. |
help | string | true | None | The parameter help message. |
default | - | false | None | The default parameter value. Its type depends on the parameter type. |
type | string | false | "str" | The parameter type. The value provided by the final user is cast and checked against this type. All built-in Python types are available: "int", "str", "float", "bool", "date", ... To create customized types, refer to the Add a data type documentation. |
choices | list | false | [] | A list of the allowed values. |
required | boolean | false | false | Whether or not the parameter is mandatory. |
flag | string | false | None | The command line flag (if the value is None, the flag will be --name ). |
group | string | false | "default" | The value is used to group a list of parameters in sections. The group is used in both command line and GUI. |
display_name | string | false | None | The parameter name that should be displayed on the final form. |
add_to | string | false | None | If this parameter is part of a multiple parameter, add_to defines the "parent" parameter it should be linked to. |
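The type, choices, default and required options behave much like their counterparts in Python's standard argparse module. The sketch below uses argparse only as a stand-in (whether NG6 relies on it internally is an assumption) to show type casting, choice checking and the replacement of "-" by "_" in attribute names. The --insert-size flag is made up for the illustration.

```python
import argparse

parser = argparse.ArgumentParser()
# Similar in spirit to add_parameter("sequencer", ..., choices=[...], default=...)
parser.add_argument("--sequencer", help="The sequencer type.",
                    choices=["HiSeq2000", "ILLUMINA", "UNKNOWN"],
                    default="HiSeq2000")
# Made-up flag used to show casting and renaming.
parser.add_argument("--insert-size", help="Insert size.", type=int, required=True)

args = parser.parse_args(["--insert-size", "300"])

# The dash in the flag becomes an underscore in the attribute name,
# and the value has been cast to an integer.
print(args.insert_size + 1, args.sequencer)

# A value outside the choices list is rejected at parsing time.
try:
    parser.parse_args(["--insert-size", "300", "--sequencer", "PacBio"])
    rejected = False
except SystemExit:
    rejected = True
```

Checking values at parsing time means a wrong sequencer name is reported before any step of the workflow starts.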
The add_parameter_list() method takes the same arguments as add_parameter(). However, with this method
the final user is allowed to enter multiple values for the parameter, and the object attribute self.parameter_name
is set to a Python list.
Just like parameters, inputs can be added to handle a single file or a list of files. Thus, the add_input_file()
method can be used to force the final user to provide one and only one file, whereas the add_input_file_list()
method allows the final user to give as many files as desired.
In the following example, an input named reads
is added to the workflow. The provided file is required and should be in fastq format. No file size limitation is specified.
self.add_input_file_list("reads",
                         "Which read files should be used",
                         file_format="fastq",
                         required=True)
There are two positional arguments: name and help. All other options are keyword options.
Name | Type | Required | Default value | Description |
---|---|---|---|---|
name | string | true | None | The name of the parameter. The parameter value is accessible within the workflow object through the attribute named self.parameter_name . |
help | string | true | None | The parameter help message. |
default | string | false | None | The default path value. |
file_format | string | false | "any" | The file format is checked before running the workflow. Available formats are "any", "bam", "fasta", "fastq", and "sff". To create customized formats, refer to the Add a file format documentation. |
type | string | false | "inputfile" | The type can be "inputfile", "localfile", "urlfile" or "browsefile". An "inputfile" allows the final user to provide a "localfile", an "urlfile" or a "browsefile". A "localfile" restricts the final user to a path to a file visible by ng6. An "urlfile" only permits the final user to give a URL as input, whereas a "browsefile" forces the final user to upload a file from their own computer. This last option is only available from the GUI and is considered as a "localfile" from the command line. The whole uploading process is handled by ng6. |
required | boolean | false | false | Whether or not the parameter is mandatory. |
flag | string | false | None | The command line flag (if the value is None, the flag will be --name ). |
group | string | false | "default" | The value is used to group a list of parameters in sections. The group is used in both command line and GUI. |
display_name | string | false | None | The parameter name that should be displayed on the final form. |
add_to | string | false | None | If this parameter is part of a multiple parameter, add_to defines the "parent" parameter it should be linked to. |
size_limit | string | false | "0" | The maximum file size allowed. If the value is "0", the file size is unlimited. The given value must also provide the file size unit among "bytes", "Kb", "Mb", "Gb", "Tb", "Pb", "Eb" and "Zb". A value of "10Mb" restricts the user to files of at most 10 megabytes. |
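The size_limit format can be illustrated with a small converter. This is a sketch, not the NG6 implementation: size_limit_to_bytes and is_allowed are hypothetical names, and the factor of 1024 between consecutive units is an assumption.

```python
import re

# Units listed in the table above, from the smallest to the largest.
# Assumption: each unit is 1024 times the previous one.
UNITS = ["bytes", "Kb", "Mb", "Gb", "Tb", "Pb", "Eb", "Zb"]


def size_limit_to_bytes(size_limit):
    """Convert a value such as "10Mb" to bytes; "0" means unlimited (None)."""
    if size_limit == "0":
        return None
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([A-Za-z]+)", size_limit)
    if match is None or match.group(2) not in UNITS:
        raise ValueError("invalid size limit: %r" % size_limit)
    value, unit = float(match.group(1)), match.group(2)
    return int(value * 1024 ** UNITS.index(unit))


def is_allowed(file_size, size_limit):
    """Return True when a file of file_size bytes respects size_limit."""
    limit = size_limit_to_bytes(size_limit)
    return limit is None or file_size <= limit


print(size_limit_to_bytes("10Mb"))
print(is_allowed(20_000_000, "10Mb"))
```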
The add_input_file_list() method takes the same arguments as add_input_file(). However, with this method
the final user is allowed to provide multiple files, and the object attribute self.parameter_name
is set to a Python list.
The add_input_directory() method allows the user to select files from a specific directory. This
kind of input can be useful for tools that output not only files but an organized directory.
The get_files_fn parameter specifies the function that will be used to retrieve the files. This function can
take as many arguments as required, but the first argument has to be a string representing the folder path.
By default, all files are selected. From the workflow process() function, the files can be retrieved
by using the get_files() method.
In the following example, the add_input_directory() method is used to parse a directory and retrieve only the fasta files
inside this directory. get_files() will browse the directory and get all fasta files.
import os

from ng6.ng6workflow import NG6Workflow


def fasta_files(folder):
    # return the paths of the fasta files found in the folder
    res = []
    for file in os.listdir(folder):
        if file.endswith(".fasta"):
            res.append(os.path.join(folder, file))
    return res


class WF(NG6Workflow):

    def define_parameters(self, function="process"):
        self.add_input_directory("fastadir", "Path to folder with fasta files",
                                 get_files_fn=fasta_files)

    def process(self):
        # to retrieve the files
        for fastafile in self.fastadir.get_files():
            # do something
            pass
There are two positional arguments: name and help. All other options are keyword options.
Name | Type | Required | Default value | Description |
---|---|---|---|---|
name | string | true | None | The name of the parameter. The parameter value is accessible within the workflow object through the attribute named self.parameter_name . |
help | string | true | None | The parameter help message. |
default | string | false | None | The default path value. |
get_files_fn | function | false | - | The function called when executing param.get_files() . All arguments given to get_files() are passed on to get_files_fn . |
required | boolean | false | false | Whether or not the parameter is mandatory. |
flag | string | false | None | The command line flag (if the value is None, the flag will be --name ). |
group | string | false | "default" | The value is used to group a list of parameters in sections. The group is used in both command line and GUI. |
display_name | string | false | None | The parameter name that should be displayed on the final form. |
add_to | string | false | None | If this parameter is part of a multiple parameter, add_to defines the "parent" parameter it should be linked to. |
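The way get_files() forwards its arguments to get_files_fn can be sketched with a small stand-in class. InputDirectory and files_with_extension are hypothetical names, not part of NG6; the only behaviour taken from the description above is that the folder path is passed first, followed by the arguments given to get_files().

```python
import os
import tempfile


class InputDirectory:
    """Hypothetical stand-in for the object stored in self.fastadir."""

    def __init__(self, path, get_files_fn=None):
        self.path = path
        # Default behaviour: select everything found in the folder.
        self.get_files_fn = get_files_fn or self._all_files

    @staticmethod
    def _all_files(folder):
        return sorted(os.path.join(folder, name) for name in os.listdir(folder))

    def get_files(self, *args):
        # The folder path always comes first, then the caller's arguments.
        return self.get_files_fn(self.path, *args)


def files_with_extension(folder, extension):
    """Selection function taking one extra argument besides the folder."""
    return sorted(os.path.join(folder, name)
                  for name in os.listdir(folder) if name.endswith(extension))


with tempfile.TemporaryDirectory() as folder:
    for name in ("a.fasta", "b.fasta", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    everything = InputDirectory(folder).get_files()
    fasta_only = InputDirectory(folder, files_with_extension).get_files(".fasta")
    print([os.path.basename(f) for f in fasta_only])
```

Passing the extension through get_files() keeps one selection function reusable for several file types.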
The developer can structure the input data by using the notion of multiple parameters. A multiple
parameter is a collection of parameters linked together. Just like parameters and inputs, it can be added to handle a single collection or a
list of collections. Thus, the add_multiple_parameter() method can be used to force the final user to provide one and only one
collection, whereas the add_multiple_parameter_list() method allows the final user to give as many collections as desired. To add a
parameter to the multiple parameter, simply set the add_to option of any of the methods previously described.
The object attribute self.multi_parameter_name is then a Python dictionary gathering all the values of the different
parameters in the format {"sub_parameter1":value}.
In the following example, a multiple parameter named library is added. It contains two input files, R1 (which is mandatory) and R2,
and a sequencer parameter. The parameter R1 is required only if a library is defined.
self.add_multiple_parameter("library", "Library.", required=False)
self.add_input_file("R1", "Path to R1 file.", required=True, add_to="library")
self.add_input_file("R2", "Path to R2 file.", add_to="library")
self.add_parameter("sequencer", "The sequencer type.",
                   choices=["HiSeq2000", "ILLUMINA", "UNKNOWN"],
                   default="HiSeq2000", add_to="library")
There are two positional arguments: name and help. All other options are keyword options.
Name | Type | Required | Default value | Description |
---|---|---|---|---|
name | string | true | None | The name of the multiple parameter. The parameter value is accessible within the workflow object through the attribute named self.multi_parameter_name , and its sub-parameters through self.multi_parameter_name["sub_parameter_name"] . |
help | string | true | None | The parameter help message. |
required | boolean | false | false | Whether or not the parameter is mandatory. |
flag | string | false | None | The command line flag (if the value is None, the flag will be --name ). The sub-parameters can be set as follows: --name sub1=... sub2=... |
group | string | false | "default" | The value is used to group a list of parameters in sections. The group is used in both command line and GUI. |
display_name | string | false | None | The parameter name that should be displayed on the final form. |
The add_multiple_parameter_list() method takes the same arguments as add_multiple_parameter(). However, with this method
the final user is allowed to provide multiple collections, and the object attribute self.multi_parameter_name
is set to a Python list of Python dictionaries.
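The two shapes can be compared side by side. The values below are made up, read_files is a hypothetical helper, and representing an optional sub-parameter that was not given by None is an assumption.

```python
# Shape of self.library when declared with add_multiple_parameter()
library = {"R1": "lib1_R1.fastq", "R2": "lib1_R2.fastq", "sequencer": "HiSeq2000"}

# Shape of self.library when declared with add_multiple_parameter_list()
libraries = [
    {"R1": "lib1_R1.fastq", "R2": "lib1_R2.fastq", "sequencer": "HiSeq2000"},
    {"R1": "lib2_R1.fastq", "R2": None, "sequencer": "UNKNOWN"},
]


def read_files(collections):
    """List every read file given in a list of library collections."""
    files = []
    for collection in collections:
        files.append(collection["R1"])  # R1 is mandatory in a library
        # Comparison with != rather than "is not", as for any parameter.
        if collection["R2"] != None:  # noqa: E711
            files.append(collection["R2"])
    return files


print(read_files(libraries))
```

Code written for the list form also handles the single form by wrapping the dictionary in a list, as in read_files([library]).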
Some parameters can be made mutually exclusive. To do so, the add_exclusion_rule() method
is available. It only works with simple parameters.
In the following example, the final user will not be allowed to provide both fasta_file
and fastq_file
parameters.
self.add_input_file("fasta_file", "Path to the fasta file.", file_format="fasta")
self.add_input_file("fastq_file", "Path to the fastq file.", file_format="fastq")
self.add_exclusion_rule("fasta_file", "fastq_file")
The method accepts the following options:
Name | Type | Required | Default value | Description |
---|---|---|---|---|
*args2exclude | string | true | None | The name of the parameter to exclude. |
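The check implied by an exclusion rule can be sketched as follows. check_exclusion_rule is a hypothetical function, not the NG6 implementation; it assumes the values given by the user are available as a dictionary holding only the parameters that were actually provided.

```python
def check_exclusion_rule(values, *args2exclude):
    """Raise an error when more than one of the excluded parameters is set.

    values maps the names of the provided parameters to their values.
    """
    provided = [name for name in args2exclude if name in values]
    if len(provided) > 1:
        raise ValueError("parameters %s are mutually exclusive"
                         % " and ".join(provided))
    return provided


# Only one of the two excluded parameters is given: the rule is respected.
print(check_exclusion_rule({"fasta_file": "genome.fasta"},
                           "fasta_file", "fastq_file"))

# Both are given: the rule is violated.
try:
    check_exclusion_rule({"fasta_file": "genome.fasta", "fastq_file": "reads.fastq"},
                         "fasta_file", "fastq_file")
except ValueError as error:
    print(error)
```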
The process() method is in charge of building the workflow by adding analyses and components (using the add_component() method) and
linking their inputs and outputs. An analysis and a component are classes representing a workflow step. See the
analyses documentation for more information.
The add_component() method adds an analysis or a component to the workflow by building, respectively, an ng6.analysis.Analysis
or a jflow.component.Component object and returning it. All attributes defined within this object, such as the outputs,
are then available from the workflow and can be used as inputs of other components.
In the following example, the first component BWAIndex is built and returned in the bwaindex object.
The output bwaindex.databank is accessible as an object attribute and can be used as input of the BWAmem component.
def process(self):
    # index the reference genome
    bwaindex = self.add_component("BWAIndex", [self.reference_genome])
    # align reads against the indexed genome
    bwamem = self.add_component("BWAmem", [bwaindex.databank, self.reads])
There is one positional argument: component_name. All other options are keyword options.
Name | Type | Required | Default value | Description |
---|---|---|---|---|
component_name | string | true | None | The component class name to add to the workflow. |
args | list | false | [] | The component's arguments (see here for more details). |
kwargs | dict | false | {} | The component's keyword arguments (see here for more details). |
component_prefix | string | false | "default" | The prefix is used to name the component at execution. It makes it possible to add multiple components of the same class within the same workflow. |
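The chaining of components and the role of component_prefix can be imitated with toy classes. Everything below is a hypothetical stand-in, not the jflow implementation: the two component classes do no real work, and naming each component ClassName.prefix is an assumption made for the illustration.

```python
class BWAIndex:
    """Toy component: its output attribute is derived from its input."""

    def __init__(self, reference_genome):
        self.databank = reference_genome + ".indexed"


class BWAmem:
    """Toy component consuming the output of BWAIndex."""

    def __init__(self, databank, reads):
        self.databank = databank
        self.sam_files = [read + ".sam" for read in reads]


class ToyWorkflow:
    """Hypothetical stand-in for a workflow object."""

    available = {"BWAIndex": BWAIndex, "BWAmem": BWAmem}

    def __init__(self):
        self.components = {}

    def add_component(self, component_name, args=None, kwargs=None,
                      component_prefix="default"):
        # Build the object from its class name and keep it under a unique key.
        component = self.available[component_name](*(args or []), **(kwargs or {}))
        key = component_name + "." + component_prefix
        if key in self.components:
            raise ValueError("component %s already added" % key)
        self.components[key] = component
        return component


workflow = ToyWorkflow()
bwaindex = workflow.add_component("BWAIndex", ["genome.fasta"])
# Two BWAmem components coexist because their prefixes differ.
mem_a = workflow.add_component("BWAmem", [bwaindex.databank, ["a.fastq"]],
                               component_prefix="a")
mem_b = workflow.add_component("BWAmem", [bwaindex.databank, ["b.fastq"]],
                               component_prefix="b")
print(sorted(workflow.components))
```

In this sketch, adding a second BWAmem without a distinct prefix raises an error, which is the situation the prefix is meant to avoid.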
Given a specific resource, the get_resource() method returns the value defined in the resource
section of the jflow configuration file: application.properties.
There is one required argument: resource.
Name | Type | Required | Default value | Description |
---|---|---|---|---|
resource | string | true | None | The name of the resource whose configured value is requested. |
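A lookup of this kind can be sketched with Python's standard configparser module. The file content, the [resources] section name and the resource names below are made up, and the function is a stand-alone imitation rather than the workflow method itself.

```python
import configparser

# Made-up configuration content; section and key names are assumptions.
EXAMPLE_PROPERTIES = """
[resources]
adapters = /path/to/adapters.fasta
phix_bwa = /path/to/phix
"""


def get_resource(config, resource):
    """Return the value configured for resource, or None if it is missing."""
    return config.get("resources", resource, fallback=None)


config = configparser.ConfigParser()
config.read_string(EXAMPLE_PROPERTIES)
print(get_resource(config, "adapters"))
```

Keeping such paths in the configuration file lets the same workflow code run on installations where the resources are stored in different places.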