Job Description JSON Schema
The Job Description JSON (the input to Tibanna) defines an individual execution. It has two parts, args and config. args contains information about the pipeline, input files, output bucket, input parameters, and so on. config holds AWS-related parameters such as instance type, EBS size, and SSH password.
Example job description for CWL
{
  "args": {
    "cwl_directory_url": "https://raw.githubusercontent.com/4dn-dcic/pipelines-cwl/0.2.0/cwl_awsem/",
    "cwl_main_filename": "pairsam-parse-sort.cwl",
    "cwl_version": "v1",
    "input_files": {
      "bam": {
        "bucket_name": "montys-data-bucket",
        "object_key": "dataset1/sample1.bam"
      },
      "chromsize": {
        "bucket_name": "montys-data-bucket",
        "object_key": "references/hg38.chrom.sizes"
      }
    },
    "input_parameters": {
      "nThreads": 16
    },
    "input_env": {
      "TEST_ENV_VAR": "abcd"
    },
    "output_S3_bucket": "montys-data-bucket",
    "output_target": {
      "out_pairsam": "output/dataset1/sample1.sam.pairs.gz"
    },
    "secondary_output_target": {
      "out_pairsam": "output/dataset1/sample1.sam.pairs.gz.px2"
    }
  },
  "config": {
    "instance_type": "t3.micro",
    "ebs_size": 10,
    "EBS_optimized": true,
    "log_bucket": "montys-log-bucket"
  }
}
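The same two-part structure can be assembled programmatically. The following is a minimal sketch, assuming only the field names shown in the example above; `build_job_description` is a hypothetical helper written for illustration and is not part of Tibanna.

```python
import json


def build_job_description(args, config):
    """Combine the two top-level parts into one job description.

    Hypothetical helper for illustration; not a Tibanna function.
    """
    return {"args": args, "config": config}


# Values copied from the example job description above (abridged).
args = {
    "cwl_directory_url": "https://raw.githubusercontent.com/4dn-dcic/pipelines-cwl/0.2.0/cwl_awsem/",
    "cwl_main_filename": "pairsam-parse-sort.cwl",
    "cwl_version": "v1",
    "input_files": {
        "bam": {
            "bucket_name": "montys-data-bucket",
            "object_key": "dataset1/sample1.bam",
        }
    },
    "output_S3_bucket": "montys-data-bucket",
}
config = {
    "instance_type": "t3.micro",
    "ebs_size": 10,
    "log_bucket": "montys-log-bucket",
}

job = build_job_description(args, config)

# Serialize to JSON text; note that Python's True/False become true/false.
job_json = json.dumps(job, indent=2)
```

Writing `job_json` to a file gives a job description with the same layout as the example.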
args
The args field describes the pipeline, input, and output.
Pipeline specification
CWL-specific
cwl_directory_url:
cwl_directory_local:
cwl_main_filename:
cwl_child_filenames:
cwl_version:
singularity:
WDL-specific
language:
wdl_directory_url:
wdl_directory_local:
wdl_main_filename:
wdl_child_filenames:
Shell command-specific
language:
container_image:
command:
  "command": "echo \"haha\" > outfile"
Snakemake-specific
language:
container_image:
command:
  "command": "snakemake <target> --use-conda"
  "command": "snakemake <target> --config=region=\"22:30000000-40000000\""
snakemake_main_filename:
snakemake_child_filenames:
snakemake_directory_local:
snakemake_directory_url:
Input data specification
input_files:
  {
    "bam": {
      "bucket_name": "montys-data-bucket",
      "object_key": "dataset1/sample1.bam",
      "mount": true
    },
    "chromsize": {
      "bucket_name": "montys-data-bucket",
      "object_key": "references/JKGFALIFVG.chrom.sizes",
      "rename": "some_dir_on_ec2/hg38.chrom.sizes"
    }
  }
secondary_files:
  {
    "bam": {
      "bucket_name": "montys-data-bucket",
      "object_key": "dataset1/sample1.bam.bai"
    }
  }
input_parameters:
  {
    "nThreads": 16
  }
input_env:
  {
    "TEST_ENV_VAR": "abcd"
  }
Output target specification
output_S3_bucket:
output_target:
  {
    "out_pairsam": "output/dataset1/sample1.sam.pairs.gz"
  }

  {
    "file:///data1/out/some_random_output.txt": "output/some_random_output.txt"
  }

  {
    "out_pairsam": {
      "object_key": "output/renamed_pairsam_file"
    }
  }

  {
    "out_pairsam": {
      "object_key": "output/renamed_pairsam_file",
      "bucket_name": "some_different_bucket"
    }
  }

  {
    "some_output_as_dir": {
      "object_prefix": "some_dir_output/",
      "bucket_name": "some_different_bucket"
    }
  }

  {
    "out_zip": {
      "object_prefix": "zip_output/",
      "unzip": true
    }
  }

  {
    "out_zip": {
      "object_key": "result.txt",
      "tag": "Key1=Value1&Key2=Value2"
    }
  }
secondary_output_target:
  {
    "out_pairsam": "output/dataset1/sample1.sam.pairs.gz.px2"
  }
alt_cond_output_argnames:
  "alt_cond_output_argnames": {
    "merged": ["cond_merged.paste.pasted", "cond_merged.cat.concatenated"]
  },
  "output_target": {
    "merged": "somedir_on_s3/somefilename"
  }
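The output_target examples above use several value shapes: a plain string, an object with object_key, and an object with object_prefix, each optionally with bucket_name. The following is a rough sketch of one way to read those shapes. It rests on assumptions inferred from the example names only: that a plain string is a key in output_S3_bucket, and that bucket_name overrides that bucket. `resolve_target` is a hypothetical helper, not Tibanna's own logic, and it ignores the unzip and tag keys.

```python
def resolve_target(value, default_bucket):
    """Return (bucket, location, is_prefix) for one output_target value.

    Hypothetical helper for illustration; assumes a plain string is an
    object key in the default bucket and that "bucket_name" overrides it.
    """
    if isinstance(value, str):
        return default_bucket, value, False
    bucket = value.get("bucket_name", default_bucket)
    if "object_prefix" in value:
        # Directory-style target: the location is a key prefix.
        return bucket, value["object_prefix"], True
    return bucket, value["object_key"], False


default_bucket = "montys-data-bucket"

plain = resolve_target("output/dataset1/sample1.sam.pairs.gz", default_bucket)
renamed = resolve_target(
    {"object_key": "output/renamed_pairsam_file",
     "bucket_name": "some_different_bucket"},
    default_bucket,
)
as_dir = resolve_target(
    {"object_prefix": "some_dir_output/",
     "bucket_name": "some_different_bucket"},
    default_bucket,
)
```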
Dependency specification
dependency:
  {
    "exec_arn": ["arn:aws:states:us-east-1:643366669028:execution:tibanna_unicorn_default_7927:md5_test"]
  }
Custom error handling
custom_errors:
  [
    {
      "error_type": "Unmatching pairs in fastq",
      "pattern": "paired reads have different names: .+",
      "multiline": false
    }
  ]
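To illustrate how a pattern like the one above could pick an error message out of a log, here is a minimal sketch using Python's `re` module. It assumes the pattern is an ordinary regular expression searched against the log text. `find_error` and the sample log are hypothetical, and the sketch does not model the multiline field.

```python
import re

# Rule copied from the custom_errors example (Python uses False, JSON uses false).
custom_error = {
    "error_type": "Unmatching pairs in fastq",
    "pattern": "paired reads have different names: .+",
    "multiline": False,
}


def find_error(log_text, rule):
    """Return (error_type, matched_text) if the rule's pattern occurs, else None.

    Hypothetical helper for illustration; ignores rule["multiline"].
    """
    match = re.search(rule["pattern"], log_text)
    if match is None:
        return None
    return rule["error_type"], match.group(0)


# Made-up log text for demonstration only.
sample_log = "[step 1] paired reads have different names: r1 r2\n[step 1] exiting"

found = find_error(sample_log, custom_error)
missing = find_error("all reads paired correctly", custom_error)
```

Because `.` does not match a newline by default, the match stops at the end of the log line that contains the message.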
config
The config field describes the execution configuration.
log_bucket:
instance_type:
mem:
mem_as_is:
cpu:
ebs_size:
ebs_size_as_is:
EBS_optimized:
root_ebs_size:
shutdown_min:
password:
key_name:
ebs_iops:
ebs_throughput:
ebs_type:
cloudwatch_dashboard:
spot_instance:
spot_duration:
behavior_on_capacity_limit:
availability_zone:
security_group:
subnet: