dsub: simple batch jobs with Docker

Overview

dsub is a command-line tool that makes it easy to submit and run batch scripts in the cloud.

The dsub user experience is modeled after traditional high-performance computing job schedulers like Grid Engine and Slurm. You write a script and then submit it to a job scheduler from a shell prompt on your local machine.

Today dsub supports Google Cloud as the backend batch job runner, along with a local provider for development and testing. With help from the community, we'd like to add other backends, such as Grid Engine, Slurm, Amazon Batch, and Azure Batch.

Getting started

You can install dsub from PyPI, or you can clone and install it from GitHub.

Sunsetting Python 2 support

Python 2 support ended in January 2020. See Python's official Sunsetting Python 2 announcement for details.

Automated dsub tests running on Python 2 have been disabled. Release 0.3.10 is the last version of dsub that supports Python 2.

Use Python 3.6 or greater. For earlier versions of Python 3, use dsub 0.4.1.

Pre-installation steps

This is optional, but whether installing from PyPI or from GitHub, you are encouraged to use a Python virtual environment.

You can do this in a directory of your choosing.

    python3 -m venv dsub_libs
    source dsub_libs/bin/activate

Using a Python virtual environment isolates dsub library dependencies from other Python applications on your system.

Activate this virtual environment in any shell session before running dsub. To deactivate the virtual environment in your shell, run the command:

    deactivate

Alternatively, the bin directory provides a set of convenience scripts that activate the virtualenv before calling dsub, dstat, and ddel. You can use these scripts if you don't want to activate the virtualenv explicitly in your shell.
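
For example (assuming the repository was cloned into the current directory and the virtual environment was created as described above), the wrappers can be invoked directly:

    ./dsub/bin/dsub --help
    ./dsub/bin/dstat --help
    ./dsub/bin/ddel --help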

Install dsub

Choose one of the following:

Install from PyPI

  1. If necessary, install pip.

  2. Install dsub

     pip install dsub
    

Install from GitHub

  1. Be sure you have git installed.

    Instructions for your environment can be found on the git website.

  2. Clone this repository.

    git clone https://github.com/DataBiosphere/dsub
    cd dsub
    
  3. Install dsub (this will also install the dependencies)

    python setup.py install
    
  4. Set up Bash tab completion (optional).

    source bash_tab_complete
    

Post-installation steps

  1. Minimally verify the installation by running:

    dsub --help
    
  2. (Optional) Install Docker.

    This is necessary only if you're going to create your own Docker images or use the local provider.

Makefile

After cloning the dsub repo, you can also use the Makefile by running:

    make

This will create a Python virtual environment and install dsub into a directory named dsub_libs.

Getting started with the local provider

We think you'll find the local provider to be very helpful when building your dsub tasks. Instead of submitting a request to run your command on a cloud VM, the local provider runs your dsub tasks on your local machine.

The local provider is not designed for running at scale. It is designed to emulate running on a cloud VM such that you can rapidly iterate. You'll get quicker turnaround times and won't incur cloud charges using it.

  1. Run a dsub job and wait for completion.

    Here is a very simple "Hello World" test:

    "${OUT}"' \ --wait ">
     dsub \
       --provider local \
       --logging "${TMPDIR:-/tmp}/dsub-test/logging/" \
       --output OUT="${TMPDIR:-/tmp}/dsub-test/output/out.txt" \
       --command 'echo "Hello World" > "${OUT}"' \
       --wait
    

    Note: TMPDIR is commonly set to /tmp by default on most Unix systems, although it is also often left unset. On some versions of macOS, TMPDIR is set to a location under /var/folders.

    Note: The above syntax ${TMPDIR:-/tmp} is known to be supported by Bash, zsh, and ksh. The shell will expand TMPDIR, but if it is unset, /tmp will be used.

  2. View the output file.

     cat "${TMPDIR:-/tmp}/dsub-test/output/out.txt"
    

Getting started on Google Cloud

dsub supports the use of two different APIs from Google Cloud for running tasks. Google Cloud is transitioning from Genomics v2alpha1 to Cloud Life Sciences v2beta.

dsub supports both APIs with the (old) google-v2 and (new) google-cls-v2 providers respectively. google-v2 is the current default provider. dsub will be transitioning to make google-cls-v2 the default in coming releases.

The steps for getting started differ slightly between the two providers, as indicated below:

  1. Sign up for a Google account and create a project.

  2. Enable the APIs:

    • For the v2alpha1 API (provider: google-v2):

    Enable the Genomics, Storage, and Compute APIs.

    • For the v2beta API (provider: google-cls-v2):

    Enable the Cloud Life Sciences, Storage, and Compute APIs.

  3. Install the Google Cloud SDK and run

    gcloud init
    

    This will set up your default project and grant credentials to the Google Cloud SDK. Now provide credentials so dsub can call Google APIs:

    gcloud auth application-default login
    
  4. Create a Google Cloud Storage bucket.

    The dsub logs and output files will be written to a bucket. Create a bucket using the storage browser or run the command-line utility gsutil, included in the Cloud SDK.

    gsutil mb gs://my-bucket
    

    Change my-bucket to a unique name that follows the bucket-naming conventions.

    (By default, the bucket will be in the US, but you can change or refine the location setting with the -l option.)
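
    For example, to create the bucket in a specific location (illustrative bucket name and location value):

     gsutil mb -l us-central1 gs://my-bucket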

  5. Run a very simple "Hello World" dsub job and wait for completion.

    • For the v2alpha1 API (provider: google-v2):

      "${OUT}"' \ --wait ">
        dsub \
          --provider google-v2 \
          --project my-cloud-project \
          --regions us-central1 \
          --logging gs://my-bucket/logging/ \
          --output OUT=gs://my-bucket/output/out.txt \
          --command 'echo "Hello World" > "${OUT}"' \
          --wait
      

    Change my-cloud-project to your Google Cloud project, and my-bucket to the bucket you created above.

    • For the v2beta API (provider: google-cls-v2):

      "${OUT}"' \ --wait ">
        dsub \
          --provider google-cls-v2 \
          --project my-cloud-project \
          --regions us-central1 \
          --logging gs://my-bucket/logging/ \
          --output OUT=gs://my-bucket/output/out.txt \
          --command 'echo "Hello World" > "${OUT}"' \
          --wait
      

    Change my-cloud-project to your Google Cloud project, and my-bucket to the bucket you created above.

    The output of the script command will be written to the OUT file in Cloud Storage that you specify.

  6. View the output file.

     gsutil cat gs://my-bucket/output/out.txt
    

Backend providers

Where possible, dsub tries to let users develop and test locally (for faster iteration) and then progress to running at scale.

To this end, dsub provides multiple "backend providers", each of which implements a consistent runtime environment. The current providers are:

  • local
  • google-v2 (the default)
  • google-cls-v2 (new)

More details on the runtime environment implemented by the backend providers can be found in dsub backend providers.

Differences between google-v2 and google-cls-v2

The google-cls-v2 provider is built on the Cloud Life Sciences v2beta API. This API is very similar to its predecessor, the Genomics v2alpha1 API. Details of the differences can be found in the Migration Guide.

dsub largely hides the differences between the two APIs, but there are a few differences to note:

  • v2beta is a regional service, v2alpha1 is a global service

What this means is that with v2alpha1, the metadata about your tasks (called "operations") is stored in a global database, while with v2beta, the metadata about your tasks is stored in a regional database. If your operation information needs to stay in a particular region, use the v2beta API (the google-cls-v2 provider), and specify the --location where your operation information should be stored.

  • The --regions and --zones flags can be omitted when using google-cls-v2

The --regions and --zones flags for dsub specify where the tasks should run. More specifically, this specifies what Compute Engine Zones to use for the VMs that run your tasks.

With the google-v2 provider, there is no default region or zone, and thus one of the --regions or --zones flags is required.

With google-cls-v2, the --location flag defaults to us-central1, and if the --regions and --zones flags are omitted, the location will be used as the default regions list.
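
For example, a minimal sketch of keeping operation metadata (and running the task) in a specific region with the google-cls-v2 provider, using illustrative project, bucket, and location values:

dsub \
    --provider google-cls-v2 \
    --project my-cloud-project \
    --location europe-west2 \
    --logging gs://my-bucket/logging/ \
    --command 'echo "Hello World"' \
    --wait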

dsub features

The following sections show how to run more complex jobs.

Defining what code to run

You can provide a shell command directly in the dsub command-line, as in the hello example above.

You can also save your script to a file, like hello.sh. Then you can run:

dsub \
    ... \
    --script hello.sh

If your script has dependencies that are not stored in your Docker image, you can transfer them to the local disk. See the instructions below for working with input and output files and folders.

Selecting a Docker image

To get started more easily, dsub uses a stock Ubuntu Docker image. This default image may change at any time in future releases, so for reproducible production workflows, you should always specify the image explicitly.

You can change the image by passing the --image flag.

dsub \
    ... \
    --image ubuntu:16.04 \
    --script hello.sh

Note: your --image must include the Bash shell interpreter.

For more information on using the --image flag, see the image section in Scripts, Commands, and Docker.

Passing parameters to your script

You can pass environment variables to your script using the --env flag.

dsub \
    ... \
    --env MESSAGE=hello \
    --command 'echo ${MESSAGE}'

The environment variable MESSAGE will be assigned the value hello when your Docker container runs.

Your script or command can reference the variable like any other Linux environment variable, as ${MESSAGE}.

Be sure to enclose your command string in single quotes and not double quotes. If you use double quotes, the command will be expanded in your local shell before being passed to dsub. For more information on using the --command flag, see Scripts, Commands, and Docker.
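
For example, a quick illustration of the difference:

# Single quotes: ${MESSAGE} is passed through literally and expanded
# inside the Docker container, where MESSAGE is set by --env.
--command 'echo ${MESSAGE}'

# Double quotes: ${MESSAGE} is expanded by your local shell before
# dsub ever sees the command.
--command "echo ${MESSAGE}"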

To set multiple environment variables, you can repeat the flag:

--env VAR1=value1 \
--env VAR2=value2

You can also set multiple variables, space-delimited, with a single flag:

--env VAR1=value1 VAR2=value2

Working with input and output files and folders

dsub mimics the behavior of a shared file system using cloud storage bucket paths for input and output files and folders. You specify the cloud storage bucket path. Paths can be:

  • file paths like gs://my-bucket/my-file
  • folder paths like gs://my-bucket/my-folder
  • wildcard paths like gs://my-bucket/my-folder/*

See the inputs and outputs documentation for more details.
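
For example, an illustrative task (hypothetical bucket and file names) that uses a wildcard input; the environment variable is set to the corresponding wildcard path on the data disk:

dsub \
    ... \
    --input INPUT_FILES=gs://my-bucket/my-folder/*.bam \
    --command 'ls -l ${INPUT_FILES}'

Note that ${INPUT_FILES} is left unquoted in the command so that the shell inside the container expands the wildcard.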

Transferring input files to a Google Cloud Storage bucket

If your script expects to read local input files that are not already contained within your Docker image, the files must be available in Google Cloud Storage.

If your script has dependent files, you can make them available to your script by:

  • Building a private Docker image with the dependent files and publishing the image to a public site, or privately to Google Container Registry
  • Uploading the files to Google Cloud Storage

To upload the files to Google Cloud Storage, you can use the storage browser or gsutil. You can also run on data that's public or shared with your service account, an email address that you can find in the Google Cloud Console.

Files

To specify input and output files, use the --input and --output flags:

"${OUTPUT_FILE}"' ">
dsub \
    ... \
    --input INPUT_FILE_1=gs://my-bucket/my-input-file-1 \
    --input INPUT_FILE_2=gs://my-bucket/my-input-file-2 \
    --output OUTPUT_FILE=gs://my-bucket/my-output-file \
    --command 'cat "${INPUT_FILE_1}" "${INPUT_FILE_2}" > "${OUTPUT_FILE}"'

In this example:

  • a file will be copied from gs://my-bucket/my-input-file-1 to a path on the data disk
  • the path to the file on the data disk will be set in the environment variable ${INPUT_FILE_1}
  • a file will be copied from gs://my-bucket/my-input-file-2 to a path on the data disk
  • the path to the file on the data disk will be set in the environment variable ${INPUT_FILE_2}

The --command can reference the file paths using the environment variables.

Also in this example:

  • a path on the data disk will be set in the environment variable ${OUTPUT_FILE}
  • the output file will be written to the data disk at the location given by ${OUTPUT_FILE}

After the --command completes, the output file will be copied to the bucket path gs://my-bucket/my-output-file

Multiple --input and --output parameters can be specified, and they can be specified in any order.

Folders

To copy folders rather than files, use the --input-recursive and --output-recursive flags:

dsub \
    ... \
    --input-recursive FOLDER=gs://my-bucket/my-folder \
    --command 'find ${FOLDER} -name "foo*"'

Multiple --input-recursive and --output-recursive parameters can be specified, and they can be specified in any order.
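
For example, a task that writes an entire directory of results might look like this (illustrative names; the environment variable is set to a directory on the data disk whose contents are copied to the bucket after the command completes):

dsub \
    ... \
    --output-recursive OUTPUT_FOLDER=gs://my-bucket/my-output-folder \
    --command 'mkdir -p "${OUTPUT_FOLDER}/results" && date > "${OUTPUT_FOLDER}/results/timestamp.txt"'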

Mounting "resource data"

If you have one of the following:

  1. A large set of resource files, your code only reads a subset of those files, and the decision of which files to read is determined at runtime, or
  2. A large input file over which your code makes a single read pass or only needs to read a small range of bytes,

then you may find it more efficient at runtime to access this resource data by mounting a Google Cloud Storage bucket read-only or by mounting a read-only persistent disk created from a Compute Engine image.

The google-v2 and google-cls-v2 providers support these two methods of providing access to resource data. The local provider supports mounting a local directory in a similar fashion to support your local development.

To have the google-v2 or google-cls-v2 provider mount a Cloud Storage bucket using Cloud Storage FUSE, use the --mount command line flag:

--mount MYBUCKET=gs://mybucket

The bucket will be mounted into the Docker container running your --script or --command and the location made available via the environment variable ${MYBUCKET}. Inside your script, you can reference the mounted path using the environment variable. Please read Key differences from a POSIX file system and Semantics before using Cloud Storage FUSE.
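
For example (illustrative bucket and folder names), your command can then read directly from the mounted path:

dsub \
    ... \
    --mount MYBUCKET=gs://my-reference-bucket \
    --command 'ls -l "${MYBUCKET}/reference"'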

To have the google-v2 or google-cls-v2 provider mount a persistent disk created from an image, use the --mount command line flag with the URL of the source image and the size (in GB) of the disk:

--mount MYDISK="https://www.googleapis.com/compute/v1/projects/your-project/global/images/your-image 50"

The image will be used to create a new persistent disk, which will be attached to a Compute Engine VM. The disk will be mounted into the Docker container running your --script or --command and the location made available via the environment variable ${MYDISK}. Inside your script, you can reference the mounted path using the environment variable.

To create an image, see Creating a custom image.

To have the local provider mount a directory read-only, use the --mount command line flag and a file:// prefix:

--mount LOCAL_MOUNT=file://path/to/my/dir

The local directory will be mounted into the Docker container running your --script or --command and the location made available via the environment variable ${LOCAL_MOUNT}. Inside your script, you can reference the mounted path using the environment variable.

Setting resource requirements

dsub tasks run using the local provider will use the resources available on your local machine.

dsub tasks run using the google-v2 or google-cls-v2 providers can take advantage of a wide range of CPU, RAM, disk, and hardware accelerator (e.g. GPU) options.

See the Compute Resources documentation for details.
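
For example, a task that needs more CPU, memory, and disk than the defaults might be requested like this (illustrative values; see the Compute Resources documentation for the full set of flags and their defaults):

dsub \
    ... \
    --min-cores 4 \
    --min-ram 16 \
    --disk-size 200 \
    --script my-analysis.sh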

Submitting a batch job

Each of the examples above has demonstrated submitting a single task with a single set of variables, inputs, and outputs. If you have a batch of inputs and you want to run the same operation over them, dsub allows you to create a batch job.

Instead of calling dsub repeatedly, you can create a tab-separated values (TSV) file containing the variables, inputs, and outputs for each task, and then call dsub once. The result will be a single job-id with multiple tasks. The tasks will be scheduled and run independently, but can be monitored and deleted as a group.

Tasks file format

The first line of the TSV file specifies the names and types of the parameters. For example:

    --env SAMPLE_ID    --input VCF_FILE    --output OUTPUT_PATH

Each line beyond the header provides the variable, input, and output values for one task.

Multiple --env, --input, and --output parameters can be specified, and they can be specified in any order. For example:

    --env SAMPLE    --input A           --input B           --env REFNAME    --output O
    S1              gs://path/A1.txt    gs://path/B1.txt    R1               gs://path/O1.txt
    S2              gs://path/A2.txt    gs://path/B2.txt    R2               gs://path/O2.txt

Tasks parameter

Pass the TSV file to dsub using the --tasks parameter. This parameter accepts both the file path and optionally a range of tasks to process. The file may be read from the local filesystem (on the machine you're calling dsub from), or from a bucket in Google Cloud Storage (file name starts with "gs://").

For example, suppose my-tasks.tsv contains 101 lines: a one-line header and 100 lines of parameters for tasks to run. Then:

dsub ... --tasks ./my-tasks.tsv

will create a job with 100 tasks, while:

dsub ... --tasks ./my-tasks.tsv 1-10

will create a job with 10 tasks, one for each of lines 2 through 11.

The task range values can take any of the following forms:

  • m indicates to submit task m (line m+1)
  • m- indicates to submit all tasks starting with task m
  • m-n indicates to submit all tasks from m to n (inclusive).
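
For example, continuing with the my-tasks.tsv file above:

dsub ... --tasks ./my-tasks.tsv 3

will run only task 3 (line 4 of the file), while:

dsub ... --tasks ./my-tasks.tsv 50-

will run task 50 through the last task in the file.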

Logging

The --logging flag points to a location for dsub task log files. For details on how to specify your logging path, see Logging.

Job control

It's possible to wait for a job to complete before starting another. For details, see job control with dsub.

Retries

It is possible for dsub to automatically retry failed tasks. For details, see retries with dsub.

Labeling jobs and tasks

You can add custom labels to jobs and tasks, which allows you to monitor and cancel tasks using your own identifiers. In addition, with the Google providers, labeling a task will label associated compute resources such as virtual machines and disks.
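
For example (illustrative label name and value), the same label can be applied at submission time and then used to filter status output with the --label flags described in the guide linked below:

dsub \
    ... \
    --label run-group=my-experiment \
    ...

dstat \
    --provider google-v2 \
    --project my-cloud-project \
    --label run-group=my-experiment \
    --status '*'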

For more details, see Checking Status and Troubleshooting Jobs.

Viewing job status

The dstat command displays the status of jobs:

dstat --provider google-v2 --project my-cloud-project

With no additional arguments, dstat will display a list of running jobs for the current USER.

To display the status of a specific job, use the --jobs flag:

dstat --provider google-v2 --project my-cloud-project --jobs job-id

For a batch job, the output will list all running tasks.
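
To see full details for a job's tasks (events, provider attributes, and so on), you can add the --full flag:

dstat \
    --provider google-v2 \
    --project my-cloud-project \
    --jobs job-id \
    --full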

Each job submitted by dsub is given a set of metadata values that can be used for job identification and job control. The metadata associated with each job includes:

  • job-name: defaults to the name of your script file or the first word of your script command; it can be explicitly set with the --name parameter.
  • user-id: the USER environment variable value.
  • job-id: takes the form job-name--userid--timestamp where the job-name is truncated at 10 characters and the timestamp is of the form YYMMDD-HHMMSS-XX, unique to hundredths of a second.
  • task-id: if the job is submitted with the --tasks parameter, each task gets a sequential value of the form "task-n" where n is 1-based.

Note that the job metadata values will be modified to conform with the "Label Restrictions" listed in the Checking Status and Troubleshooting Jobs guide.

Metadata can be used to cancel a job or individual tasks within a batch job.

For more details, see Checking Status and Troubleshooting Jobs.

Summarizing job status

By default, dstat outputs one line per task. If you're using a batch job with many tasks then you may benefit from --summary.

$ dstat --provider google-v2 --project my-project --status '*' --summary

Job Name        Status         Task Count
-------------   -------------  -------------
my-job-name     RUNNING        2
my-job-name     SUCCESS        1

In this mode, dstat prints one line per (job name, task status) pair. You can see at a glance how many tasks are finished, how many are still running, and how many have failed or been canceled.

Deleting a job

The ddel command will delete running jobs.

By default, only jobs submitted by the current user will be deleted. Use the --users flag to specify other users, or '*' for all users.

To delete a running job:

ddel --provider google-v2 --project my-cloud-project --jobs job-id

If the job is a batch job, all running tasks will be deleted.

To delete specific tasks:

ddel \
    --provider google-v2 \
    --project my-cloud-project \
    --jobs job-id \
    --tasks task-id1 task-id2

To delete all running jobs for the current user:

ddel --provider google-v2 --project my-cloud-project --jobs '*'
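
Similarly, to delete running jobs for all users (subject to your permissions):

ddel --provider google-v2 --project my-cloud-project --users '*' --jobs '*'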

Service Accounts and Scope (Google providers only)

When you run the dsub command with the google-v2 or google-cls-v2 provider, there are two different sets of credentials to consider:

  • Account submitting the pipelines.run() request to run your command/script on a VM
  • Account accessing Cloud resources (such as files in GCS) when executing your command/script

The account used to submit the pipelines.run() request is typically your end user credentials. You would have set this up by running:

gcloud auth application-default login

The account used on the VM is a service account. The image below illustrates this:

Pipelines Runner Architecture

By default, dsub will use the default Compute Engine service account as the authorized service account on the VM instance. You can choose to specify the email address of another service account using --service-account.

By default, dsub will grant the following access scopes to the service account:

In addition, the API will always add this scope:

You can choose to specify scopes using --scopes.

Recommendations for service accounts

While it is straightforward to use the default service account, this account also has broad privileges granted to it by default. Following the Principle of Least Privilege, you may want to create and use a service account that is granted only the privileges needed to run your dsub command/script.

To create a new service account, follow the steps below:

  1. Execute the gcloud iam service-accounts create command. The email address of the service account will be sa-name@project-id.iam.gserviceaccount.com.

     gcloud iam service-accounts create "sa-name"
    
  2. Grant IAM access on buckets, etc. to the service account.

     gsutil iam ch serviceAccount:sa-name@project-id.iam.gserviceaccount.com:roles/storage.objectAdmin gs://bucket-name
    
  3. Update your dsub command to include --service-account

     dsub \
       --service-account sa-name@project-id.iam.gserviceaccount.com \
       ...
    

What next?

Comments
  • NO_JOB even though nothing ran

    NO_JOB even though nothing ran

    I am trying to submit a dsub job and I am not getting the output. I am getting NO_JOB and I am sure the input and output had run before. Can someone help me wi

    #!/usr/bin/python
    
    PROJECT_PATH="xyz"
    
    # There is a manual step: please create a tab-delimited phenotype file at
    # $PROJECT_PATH/pheno.tsv . Output from this project will ultimately go to
    # $PROJECT_PATH/output/* .
    
    # Leave one chromosome out?
    USE_LOCO="TRUE"
    TASK_DEFINITION_FILE="xyz/task1.tsv"
    
    MAX_PREEMPTION=6
    
    HAIL_DOCKER_IMAGE="gcr.io/jhs-project-243319/hail_latest:latest"
    
    # Enable exit on error
    set -o errexit
    
    # Create the mytasks.tsv from our template
    gsutil cat ${TASK_DEFINITION_FILE} | sed -e "s%gs://%${PROJECT_PATH}/output%g" > my.tasks.tsv
    
    # Check for errors when we can
    echo "Checking to make sure that ${PROJECT_PATH}/pheno.tsv exists"
    gsutil ls ${PROJECT_PATH}/jhs.protOI.batch123.ALL.tab
    
    # Launch step 1 and block until completion
    echo "Test"
    dsub \
       --project jhs-project-243319 \
       --provider google-v2 \
       --use-private-address \
       --regions us-central1 us-east1 us-west1 \
       --disk-type local-ssd \
       --disk-size 375 \
       --min-cores 64 \
       --min-ram 64 \
       --image ${HAIL_DOCKER_IMAGE} \
       --retries 1 \
       --skip \
       --wait \
       --logging ${PROJECT_PATH}/dsub-logs \
       --input PHENO_FILE=${PROJECT_PATH}/jhs.protOI.batch123.ALL.tab \
       --input HAIL_PATH=${PROJECT_PATH}/topmed_6a_pass_2k_minDP10_sQC_vQC_AF01_jhsprot.mt \
       --output-recursive OUTPUT_PATH=${PROJECT_PATH}/logs \
       --env LOCO=${USE_LOCO} \
       --timeout '12w' \
       --name test3 \
       --script /home/akhil/anaconda3/lib/python3.7/site-packages/dsub/commands/phewas_jhs_lmm.py \
    opened by apampana 13
  • Silent delocalizing failure

    Silent delocalizing failure

    Hello! I'm trying to use dsub with the --tasks option to run an analysis in 20 chunks. Curiously, the *.logs indicate that the script runs to completion for every task, but only some random subset execute the delocalizing. Furthermore, the tasks that don't delocalize don't throw any kind of error captured in the *.logs. dstat -f, however, identifies the tasks that failed.

    Here's an example of a success:

    - create-time: '2019-07-25 02:16:39.297447'
      dsub-version: v0-3-2
      end-time: '2019-07-25 02:32:30.556849'
      envs:
        CHUNK: '3'
      events:
      - name: start
        start-time: 2019-07-25 06:16:42.171100+00:00
      - name: pulling-image
        start-time: 2019-07-25 06:17:32.995391+00:00
      - name: localizing-files
        start-time: 2019-07-25 06:18:34.308943+00:00
      - name: running-docker
        start-time: 2019-07-25 06:18:36.658863+00:00
      - name: delocalizing-files
        start-time: 2019-07-25 06:32:24.497567+00:00
      - name: ok
        start-time: 2019-07-25 06:32:30.556849+00:00
      input-recursives: {}
      inputs:
        INFILE: gs://haddath/sgosai/hff/data/FADS1_rep8detailed.txt
      internal-id: projects/sabeti-encode/operations/1351805964445161078
      job-id: python--sagergosai--190725-021637-18
      job-name: python
      labels: {}
      last-update: '2019-07-25 02:32:30.556849'
      logging: gs://haddath/sgosai/hff/logs/python--sagergosai--190725-021637-18.4.1.log
      mounts: {}
      output-recursives: {}
      outputs:
        OUTFILE: gs://haddath/sgosai/hff/data/FADS1_rep8__3_20.bed
      provider: google-v2
      provider-attributes:
        accelerators: []
        boot-disk-size: 250
        cpu_platform: ''
        disk-size: 200
        disk-type: pd-standard
        enable-stackdriver-monitoring: false
        instance-name: google-pipelines-worker-fae4230d454b3f6e1038535cbcb0da50
        machine-type: n1-standard-8
        network: ''
        preemptible: true
        regions: []
        service-account: default
        subnetwork: ''
        use_private_address: false
        zone: us-west2-c
        zones:
        - us-central1-a
        - us-central1-b
        - us-central1-c
        - us-central1-f
        - us-east1-b
        - us-east1-c
        - us-east1-d
        - us-east4-a
        - us-east4-b
        - us-east4-c
        - us-west1-a
        - us-west1-b
        - us-west1-c
        - us-west2-a
        - us-west2-b
        - us-west2-c
      script: |-
        #!/usr/bin/env bash
        python /app/hcr-ff/call_peaks.py ${INFILE} ${OUTFILE} -ji ${CHUNK} -jr 20 -ws 100 -ss 100
      script-name: python
      start-time: '2019-07-25 02:16:42.171100'
      status: SUCCESS
      status-detail: Success
      status-message: Success
      task-attempt: 1
      task-id: '4'
      user-id: sagergosai
    

    And a failure:

    - create-time: '2019-07-25 02:16:39.576571'
      dsub-version: v0-3-2
      end-time: '2019-07-25 02:52:45.047989'
      envs:
        CHUNK: '4'
      events:
      - name: start
        start-time: 2019-07-25 06:16:42.182994+00:00
      - name: pulling-image
        start-time: 2019-07-25 06:17:41.422799+00:00
      - name: localizing-files
        start-time: 2019-07-25 06:18:41.913631+00:00
      - name: running-docker
        start-time: 2019-07-25 06:18:44.379215+00:00
      - name: The assigned worker has failed to complete the operation
        start-time: 2019-07-25 06:52:43.907976+00:00
      input-recursives: {}
      inputs:
        INFILE: gs://haddath/sgosai/hff/data/FADS1_rep8detailed.txt
      internal-id: projects/sabeti-encode/operations/8834123416523977731
      job-id: python--sagergosai--190725-021637-18
      job-name: python
      labels: {}
      last-update: '2019-07-25 02:52:45.047989'
      logging: gs://haddath/sgosai/hff/logs/python--sagergosai--190725-021637-18.5.1.log
      mounts: {}
      output-recursives: {}
      outputs:
        OUTFILE: gs://haddath/sgosai/hff/data/FADS1_rep8__4_20.bed
      provider: google-v2
      provider-attributes:
        accelerators: []
        boot-disk-size: 250
        cpu_platform: ''
        disk-size: 200
        disk-type: pd-standard
        enable-stackdriver-monitoring: false
        instance-name: google-pipelines-worker-1d27f8b0a26375721946e521a550105a
        machine-type: n1-standard-8
        network: ''
        preemptible: true
        regions: []
        service-account: default
        subnetwork: ''
        use_private_address: false
        zone: us-east1-b
        zones:
        - us-central1-a
        - us-central1-b
        - us-central1-c
        - us-central1-f
        - us-east1-b
        - us-east1-c
        - us-east1-d
        - us-east4-a
        - us-east4-b
        - us-east4-c
        - us-west1-a
        - us-west1-b
        - us-west1-c
        - us-west2-a
        - us-west2-b
        - us-west2-c
      script: |-
        #!/usr/bin/env bash
        python /app/hcr-ff/call_peaks.py ${INFILE} ${OUTFILE} -ji ${CHUNK} -jr 20 -ws 100 -ss 100
      script-name: python
      start-time: '2019-07-25 02:16:42.182994'
      status: FAILURE
      status-detail: The assigned worker has failed to complete the operation
      status-message: The assigned worker has failed to complete the operation
      task-attempt: 1
      task-id: '5'
      user-id: sagergosai
    

    dsub version: 0.3.2

    opened by sjgosai 13
  • dsub 0.4.3 crashes on ubuntu 20.04

    dsub 0.4.3 crashes on ubuntu 20.04

    When I run the hello world example

    dsub \
       --provider local \
       --logging "${TMPDIR:-/tmp}/dsub-test/logging/" \
       --output OUT="${TMPDIR:-/tmp}/dsub-test/output/out.txt" \
       --command 'echo "Hello World" > "${OUT}"' \
       --wait
    

    it crashes as follows:

    ***WARNING: No Docker image specified. The default, `ubuntu:14.04` will be used.
    ***WARNING: For reproducible pipelines, specify an image with the `--image` flag.
    Job properties:
      job-id: echo--hylke--201220-165144-45
      job-name: echo
      user-id: hylke
    Launched job-id: echo--hylke--201220-165144-45
    To check the status, run:
      dstat --provider local --jobs 'echo--hylke--201220-165144-45' --users 'hylke' --status '*'
    To cancel the job, run:
      ddel --provider local --jobs 'echo--hylke--201220-165144-45' --users 'hylke'
    Waiting for job to complete...
    Waiting for: echo--hylke--201220-165144-45.
    Traceback (most recent call last):
      File "/home/hylke/.local/bin/dsub", line 8, in <module>
        sys.exit(main())
      File "/home/hylke/.local/lib/python3.8/site-packages/dsub/commands/dsub.py", line 1106, in main
        dsub_main(prog, argv)
      File "/home/hylke/.local/lib/python3.8/site-packages/dsub/commands/dsub.py", line 1091, in dsub_main
        launched_job = run_main(args)
      File "/home/hylke/.local/lib/python3.8/site-packages/dsub/commands/dsub.py", line 1168, in run_main
        return run(
      File "/home/hylke/.local/lib/python3.8/site-packages/dsub/commands/dsub.py", line 1322, in run
        error_messages = _wait_after(provider, [job_metadata['job-id']],
      File "/home/hylke/.local/lib/python3.8/site-packages/dsub/commands/dsub.py", line 786, in _wait_after
        jobs_left = _wait_for_any_job(provider, job_ids_to_check, poll_interval,
      File "/home/hylke/.local/lib/python3.8/site-packages/dsub/commands/dsub.py", line 1015, in _wait_for_any_job
        tasks = provider.lookup_job_tasks({'*'}, job_ids=job_ids)
      File "/home/hylke/.local/lib/python3.8/site-packages/dsub/providers/local.py", line 519, in lookup_job_tasks
        task = self._get_task_from_task_dir(j, u, task_id, task_attempt)
      File "/home/hylke/.local/lib/python3.8/site-packages/dsub/providers/local.py", line 669, in _get_task_from_task_dir
        end_time = self._get_end_time_from_task_dir(task_dir)
      File "/home/hylke/.local/lib/python3.8/site-packages/dsub/providers/local.py", line 583, in _get_end_time_from_task_dir
        datetime.datetime.strptime(f.readline().strip(),
      File "/usr/lib/python3.8/_strptime.py", line 568, in _strptime_datetime
        tt, fraction, gmtoff_fraction = _strptime(data_string, format)
      File "/usr/lib/python3.8/_strptime.py", line 349, in _strptime
        raise ValueError("time data %r does not match format %r" %
    ValueError: time data '' does not match format '%Y-%m-%d %H:%M:%S.%f
    

    Any ideas how to get the Hello World example to work?

    Versions: dsub: 0.4.3 Ubuntu: 20.04 Python: 3.8.5 Docker: 19.03.11

    Thanks,

    Hylke

    opened by hylkedonker 12
  • Failure message: wrapping host binaries: pulling image: retry budget exhausted (10 attempts): running [

    Failure message: wrapping host binaries: pulling image: retry budget exhausted (10 attempts): running ["docker" "pull" "bash"]: exit status 1 (standard error: "Error response from daemon: Get https://registry-1.docker.io/v2/

    This is a new error message for me, and I checked the GCP status page in case it was transient but don't see any active issues.

    I am launching a bunch of tasks across a bunch of zones (within a single job) using the google-cls-v2 API with dsub 0.4.3. I am getting the following error, but only on some tasks within this job (while others within the same job are launching and succeeding):

    Failure message: wrapping host binaries: pulling image: retry budget exhausted (10 attempts): running ["docker" "pull" "bash"]: exit status 1 (standard error: "Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\n")
    

    This strikes me as an odd message, because (1) I'm running these machines without a public IP, and so (2) I'm only using gcr.io Docker instances with them. I don't know why they'd be pointing to registry-1.docker.io.

    If this isn't a dsub issue, I can point this to the mailing list. (And if there is a quick way for me to get a flavor for "dsub issue vs not dsub issue", just let me know so I can self-triage.)

    Thanks!

    opened by carbocation 12
  • Add a verbose mode option flag to dsub

    Add a verbose mode option flag to dsub

    I noticed that the providers have a verbose mode object variable, and that object variable is potentially set via the args :

    $ git grep "getattr(args, 'verbose'"
    dsub/providers/provider_base.py:        getattr(args, 'verbose', False),
    dsub/providers/provider_base.py:        getattr(args, 'verbose', False), getattr(args, 'dry_run', False),
    

    but there was no way to toggle that option via the CLI. This change adds a --verbose option to the command line.

    opened by indraniel 12
  • The NVIDIA driver on your system is too old (found version 10020).

    The NVIDIA driver on your system is too old (found version 10020).

    Not actually 100% sure that this is a dsub issue, but I'm trying to run a Docker image which is based on gcr.io/deeplearning-platform-release/pytorch-gpu.1-6:latest. When I execute python, I get the following error in dsub:

    Failure message: Stopped running "user-command": exit status 1: /site-packages/torch/nn/modules/module.py", line 225, in _apply
        module._apply(fn)
      File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 247, in _apply
        param_applied = fn(param)
      File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 463, in convert
        return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
      File "/opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 150, in _lazy_init
        _check_driver()
      File "/opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 63, in _check_driver
        of the CUDA driver.""".format(str(torch._C._cuda_getDriverVersion())))
    AssertionError: 
    The NVIDIA driver on your system is too old (found version 10020).
    Please update your GPU driver by downloading and installing a new
    version from the URL: http://www.nvidia.com/Download/index.aspx
    Alternatively, go to: https://pytorch.org to install
    a PyTorch version that has been compiled with your version
    of the CUDA driver.
    

    I believe this is mapped via dsub and so this isn't something I can fix on my end. Is that accurate?

    opened by carbocation 11
  • Error when trying to use requester pays buckets

    Error when trying to use requester pays buckets

    Testing whether or not I can use dsub on files stored in a requester pays bucket returned the following error:

    [u'Error in job wc--jslagel--171129-194301-06 - code 5: 9: Failed to localize files: failed to copy the following files: "gs://xxx-test-requester-pay/xxx.txt -> /mnt/datadisk/input/gs/xxx-test-requester-pay/xxx.txt (cp failed: gsutil -q -m cp gs://xxx-test-requester-pay/xxx.txt /mnt/datadisk/input/gs/xxx-test-requester-pay/xxx.txt, command failed: BadRequestException: 400 Bucket is requester pays bucket but no user project provided.\nCommandException: 1 file/object could not be transferred.\n)"'] JobExecutionError: One or more jobs finished with status FAILURE or CANCELED during wait.

    On the surface it appears that the gsutil command is missing the new '-u ' argument.

    opened by slagelwa 11
  • Does mounting disk images actually work?

    Does mounting disk images actually work?

    I am trying to mount a disk image (ideally, I'd mount a disk snapshot, but that's a secondary issue). However, I can't seem to get the mount to work. Error message is below. Presumably I'm passing the argument incorrectly or missing something in my incantation, but it's not obvious to me what I've done wrong. It seems that the python code being executed is invalid, but maybe this is an issue with my local python installation. Any pointers?

    $ python --version
    Python 2.7.16 :: Anaconda, Inc.
    
    $ dsub --version
    dsub version: 0.3.5
    
    $ dsub --provider google-v2 --project broad-ml4cvd --image ubuntu:18.04 --command '/bin/ls ${MYDISK}' --mount MYDISK=https://www.googleapis.com/compute/v1/projects/broad-ml4cvd/global/images/dl-image-2019-05-13 1000 --logging gs://ukbb_v2/projects/jamesp/tmp/dsub --regions us-central1 --wait
    Traceback (most recent call last):
      File "/Users/jamesp/anaconda2/bin/dsub", line 11, in <module>
        load_entry_point('dsub==0.3.5', 'console_scripts', 'dsub')()
      File "/Users/jamesp/anaconda2/lib/python2.7/site-packages/dsub-0.3.5-py2.7.egg/dsub/commands/dsub.py", line 998, in main
        dsub_main(prog, argv)
      File "/Users/jamesp/anaconda2/lib/python2.7/site-packages/dsub-0.3.5-py2.7.egg/dsub/commands/dsub.py", line 983, in dsub_main
        launched_job = run_main(args)
      File "/Users/jamesp/anaconda2/lib/python2.7/site-packages/dsub-0.3.5-py2.7.egg/dsub/commands/dsub.py", line 1030, in run_main
        output_file_param_util, mount_param_util)
      File "/Users/jamesp/anaconda2/lib/python2.7/site-packages/dsub-0.3.5-py2.7.egg/dsub/lib/param_util.py", line 681, in args_to_job_params
        mount_data.add(mount_param_util.make_param(name, value, disk_size=None))
      File "/Users/jamesp/anaconda2/lib/python2.7/site-packages/dsub-0.3.5-py2.7.egg/dsub/lib/param_util.py", line 273, in make_param
        if raw_uri.startswith('https://www.googleapis.com/compute'):
    AttributeError: 'NoneType' object has no attribute 'startswith'
    
    opened by carbocation 10
  • dstat returns nothing for jobs submitted from cloud shell

    dstat returns nothing for jobs submitted from cloud shell

    I run jobs with dsub both from the Google Cloud Shell and from the Google Cloud SDK installed on my desktop. When I submit a job from my desktop, dstat works as expected. But when I submit a job from the Cloud Shell, dstat returns nothing (when called from either Cloud Shell or my desktop SDK).

    Below is a screenshot of my cloud console (in which I'd previously installed dsub using sudo pip install dsub), showing job submission and attempt to call dstat. [screenshot: dsub_1]

    And below is my desktop command line (iTerm on OSX, with dsub installed via cloning this repo and running python setup.py --install), showing dstat working effectively on a job I'd previously submitted from desktop, but returning nothing on the job I ran from the cloud console: [screenshot: dsub_2]

    Thanks in advance for your advice.

    opened by bertozzivill 10
  • dsub with a VPC

    dsub with a VPC

    Is it possible to use dsub with a VPC? I can't seem to find a way to specify network/subnet compute resources to pass along to Google pipelines (which in itself is problematic to specify...).

    google-v1-wontfix google-v2 
    opened by slagelwa 9
  • ddel: AttributeError: type object 'HttpError' has no attribute 'resp'

    ddel: AttributeError: type object 'HttpError' has no attribute 'resp'

    I've created too many jobs "by accident" (or rather I hoped it would re-use compute engines). When I tried to delete all of them using: ddel --provider google-v2 --project my-project-name --jobs '*'

    I am getting the following exception at some point:

    Traceback (most recent call last):
      File "/path/to/venv/bin/ddel", line 11, in <module>
        sys.exit(main())
      File "/path/to/venv/local/lib/python2.7/site-packages/dsub/commands/ddel.py", line 137, in main
        create_time_min=create_time)
      File "/path/to/venv/local/lib/python2.7/site-packages/dsub/commands/ddel.py", line 184, in ddel_tasks
        user_ids, job_ids, task_ids, labels, create_time_min, create_time_max)
      File "/path/to/venv/local/lib/python2.7/site-packages/dsub/providers/google_v2.py",line 1069, in delete_jobs
        tasks)
      File "/path/to/venv/local/lib/python2.7/site-packages/dsub/providers/google_base.py", line 445, in cancel
        batch_fn, cancel_fn, ops[first_op:first_op + max_batch])
      File "/path/to/venv/local/lib/python2.7/site-packages/dsub/providers/google_base.py", line 409, in _cancel_batch
        batch.execute()
      File "/path/to/venv/local/lib/python2.7/site-packages/dsub/providers/google_v2.py",line 400, in execute
        self._response_handler(request_id, response, exception)
      File "/path/to/venv/local/lib/python2.7/site-packages/dsub/providers/google_base.py", line 383, in handle_cancel_response
        msg = 'error %s: %s' % (exception.resp.status, exception.resp.reason)
    AttributeError: type object 'HttpError' has no attribute 'resp'
    

    It could be that there are so many tasks to delete. HttpError should have the resp set in the constructor, not sure why it hasn't in that case. Maybe it's a different object (although the full classname was googleapiclient.errors.HttpError).

    I got around it by putting a try/catch google_base, something like (which is obviously a workaround, not a proper solution, but enough to get everything deleted - which took a while):

          try:
            msg = 'error %s: %s' % (exception.resp.status, exception.resp.reason)
            if exception.resp.status == FAILED_PRECONDITION_CODE:
              detail = json.loads(exception.content)
              status = detail.get('error', {}).get('status')
              if status == FAILED_PRECONDITION_STATUS:
                msg = 'Not running'
          except AttributeError:
            msg = 'error %s' % exception
    
    opened by de-code 8
  • Support request for nvidia-a100-80g

    Support request for nvidia-a100-80g

    I believe this is an issue for the life sciences API devs, however not sure where to ask.

    Seeing this error when I request machine a2-ultrapu-4g with accelerator_type: nvidia-a100-80g

    "Error: validating pipeline: unsupported accelerator: "nvidia-a100-80g"". Details: "Error: validating pipeline: unsupported accelerator: "nvidia-a100-80g""
    

    Can you please either request these machines to be made available or please direct me as to best place to ask and I will do so. Thank you.

    opened by rivershah 1
  • Mounting a writable existing persistent disk?

    Mounting a writable existing persistent disk?

    I've had some good successes with mounting existing read-only persistent disks to the VM running a dsub job, and its very cool that one can do this. However I was wondering about attached writable disks. According to the Life Science API documentation:

    If all Mount references to this disk have the readOnly flag set to true, the disk will be attached in read-only mode and can be shared with other instances. Otherwise, the disk will be available for writing but cannot be shared.

    I'm not exactly sure what they mean by Mount references. Do they mean that the disk is attached to zero or more VMs in read only mode? As that would seem to be what is implied by the description. (I'm not sure how outside of the VM that GCP would explicitly know how the disk is actually mounted). I've done some testing with a persistent disk that's unattached to any VMs, and one that was already attached in read only mode to a VM and in either case when I launch a dsub job the persistent disk is always attached in read only mode regardless.

    opened by slagelwa 2
  • ERROR: gcloud crashed (TypeError): a bytes-like object is required, not 'str'

    ERROR: gcloud crashed (TypeError): a bytes-like object is required, not 'str'

    When I run the "Hello World" test for the local provider, it works. When I run it on a custom image, it also works. But when I try to make it run on on any gcr.io/ image, it does not work. Instead, I get the following message in the runner-log.txt file:

    WARNING: `gcloud docker` will not be supported for Docker client versions above 18.03.
    
    As an alternative, use `gcloud auth configure-docker` to configure `docker` to
    use `gcloud` as a credential helper, then use `docker` as you would for non-GCR
    registries, e.g. `docker pull gcr.io/project-id/my-image`. Add
    `--verbosity=error` to silence this warning: `gcloud docker
    --verbosity=error -- pull gcr.io/project-id/my-image`.
    
    See: https://cloud.google.com/container-registry/docs/support/deprecation-notices#gcloud-docker
    
    ERROR: gcloud crashed (TypeError): a bytes-like object is required, not 'str'
    
    If you would like to report this issue, please run the following command:
      gcloud feedback
    
    To check gcloud for common problems, please run the following command:
      gcloud info --run-diagnostics
    

    Given the contents of the error message, and the fact that I have docker version 20 installed, I think this error is not surprising. But is there a way to bypass dsub calling gcloud docker when running gcr.io images with the --local provider so that those of us with docker > 18.03 can use the --local provider?

    opened by carbocation 1
  • Feature request TPU v4 support

    Feature request TPU v4 support

    With tpu v4, google has really cleaned up the user experience around tpu vms. Does the google-cls-v2 provider allow provisioning of tpu v4 machines? If so, is there any example that can please be shown that illustrates provisioning, and loading up any drivers to make the tpu v4 accelerator types visible to jobs submitted via dsub.

    opened by rivershah 3
  • Use gcloud storage instead of gsutil

    Use gcloud storage instead of gsutil

    It seems that gcloud storage will be substantially faster for localization/delocalization vs gsutil. Seems like it would make sense to either apply the shim or to transition to using gcloud storage in place of gsutil in dsub.

    opened by carbocation 8
  • Upgrade dsub dependencies

    Upgrade dsub dependencies

    A project is having trouble resolving dependencies. Could we please consider relaxing dsub dependencies in the next release:

    The conflict is caused by:
        The user requested google-api-python-client==2.52.0
        dsub 0.4.7 depends on google-api-python-client<=2.47.0
    
    opened by rivershah 1
Releases(v0.4.8)
  • v0.4.8(Dec 22, 2022)

    This release includes:

    • dsub
      • (New) (In Development) Implemented google-batch provider.
        • Note that this is not yet feature parity with the other providers.
        • See Get started with Batch for details on Google Cloud Batch.
      • setup.py: Update dsub dependent libraries to pick up newer versions.
      • Update providers to allow a mount other than /mnt/data.
      • Fix documentation about mounting an existing disk
      • Fix unit test socket.timeout alias issue with Python 3.10
  • v0.4.7(May 18, 2022)

    This release includes:

    • dsub
      • Add support for mounting an existing disk read-only to a pipeline VM
      • setup.py: Update dsub dependent libraries to pick up newer versions
    • Documentation
      • Include documenting the Cloud SDK as a requirement for the local provider
  • v0.4.6(Jan 26, 2022)

    This release includes:

    • dsub
      • Add support for Toronto, Delhi, Melbourne, and Warsaw regions
      • setup.py: Update dsub dependent libraries to pick up newer versions.
  • v0.4.5(Aug 26, 2021)

    This release includes small maintenance updates:

    • dsub
      • Quiet a warning about 'oauth2client' (which dsub no longer uses).
      • Fix one other instance of cache_discovery=True raising ImportError.
      • Add flush method to _Printer.
      • Run pytype on dsub, and fix type errors
      • setup.py: Update dsub dependent libraries to pick up newer versions.
      • Update tenacity version
  • v0.4.4(Feb 18, 2021)

    This release includes:

    • dsub
      • Implement --block-external-network to support network-sandboxed user action.
      • setup.py: Update dsub dependent libraries to pick up newer versions.
      • Check for python or python3 executable in local runner script. Also fix error handling for the first "write_event".
      • Add zones for Seoul, Jakarta, Salt Lake City, and Las Vegas
    • Documentation
      • Rename references to the master branch to the main branch.
  • v0.4.3(Nov 24, 2020)

    This release includes:

    • dsub

      • Update parsing of worker assigned event text to account for changes in Pipelines API.
      • Hide the --nvidia-driver-version flag as it is now ignored by the Pipelines API (since Sept 2020).
    • Documentation

      • Fix link to accelerator API doc in README.
      • Fix documentation that assumed TMPDIR is /tmp.
      • Mention SSH from the browser in troubleshooting docs.
    • Tests

      • Add e2e test that confirms GPU is installed when --accelerator-type and --accelerator-count is used.
      • Removed infrequently used environment variables, CHECK_RESULTS_ONLY and ALLOW_DIRTY_TESTS, from dsub tests.
  • v0.4.2(Oct 15, 2020)

    Release 0.4.2 of dsub ends support for Python 3.5, which reached its "end of life" at the end of September, 2020.

    The last version of dsub that supports Python 3.5 is 0.4.1. Please use Python 3.6 or greater.

    This release includes:

    • Python code health
      • Remove uses of future from dsub
      • Remove six and its usages from dsub
      • Explicitly support Python 3.6 and up.
    • Feature updates
      • Improvements to dstat output
      • Use "tenacity" library instead of "retrying" library for API retries.
      • Add a get_credentials function that Python clients of dsub, dstat, ddel can override for non-standard runtime environments.
    • google-cls-v2 provider updates:
      • Use batch endpoint in google-cls-v2 provider for job deletion (ddel).
      • google-cls-v2: Use the batch endpoint only for --location us-central1.
  • v0.4.1(Aug 27, 2020)

  • v0.4.0(Aug 26, 2020)

    Release 0.4.0 completes the sunsetting of Python 2 support for dsub. The last version of dsub that supports Python 2 is 0.3.10.

    This release also adds a WARNING when the --image flag is omitted from a call to dsub. The default image is available as a getting started convenience, but for ongoing reproducible workflows, the image should be specified by the caller. The current default is ubuntu:14.04 which reached End Of Life in April 2019. The default image will change in future releases and it is likely to be changed on a semi-regular basis, as popular base Docker linux images change.


    This release includes:

    • dsub
      • Update setup.py in dsub to be Python 3 only.
      • Lint dsub source files as Python3 only. Fix a few lint warnings.
      • Emit warning if default image is used.
      • Print full path of exceptions that are retried.
      • Print retry errors for socket timeout error.
      • Add socket.timeout exceptions to the retry list.
      • Fix markdown formatting in dsub README
  • v0.3.10(Aug 4, 2020)

    This release includes:

    • dsub
      • Update Makefile to use Python3 venv
      • Add documentation around Compute Engine Quotas.
      • Have dsub output job-name and user-id in addition to job-id prior to launching job.
      • Fix for --users '*'
      • Update httplib2 for dsub to 0.18.1
      • Retry transient http error codes when checking GCS
      • Fix for yaml 5.3 where timestamps are already loaded as timezone aware.
      • Improve performance of GCS output file checks for --skip
  • v0.3.9(Jul 6, 2020)

    This release includes:

    • dsub
      • Update version of cloudsdk docker image and revert workaround for gsutil/gcloud auth token bug on GCE, which should now be fixed in the updated image.
      • Update google-auth to 1.18.0 and pin google-api-core to 1.21.0
      • Remove leading characters that are not a letter, number, or underscore when auto-creating a job-name from a command string.
      • Move google_v2 arguments to google_common in the --help text.
      • Add a section in the documentation for Google provider-specific command-line flags.
      • Remove support/tests for legacy local provider job metadata file
    • Testing updates:
      • Re-enable the test e2e_errors.sh for all providers.
      • Add unit test for retrying BrokenPipeError
      • Fix ResourceWarning when running python unit tests.
  • v0.3.8(May 27, 2020)

    This release includes:

    • dsub
      • Remove the google provider and its documentation.
      • Document the existence of the google-cls-v2 provider
      • Document using venv for installation
      • Add a flag --credentials-file to pass service account credentials to the provider.
    • google-v2 provider updates:
      • Add --ssh to dstat.
    • google-cls-v2 provider updates:
      • Add default locations for google-cls-v2 in dstat and ddel.
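
    A minimal sketch of the new --credentials-file flag; the project, bucket, and key-file path are placeholders:

        dsub \
          --provider google-v2 \
          --project my-cloud-project \
          --zones "us-central1-*" \
          --logging gs://my-bucket/logs/ \
          --credentials-file /path/to/service-account-key.json \
          --command 'echo "Running with explicit credentials"' \
          --wait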
  • v0.3.7(Feb 3, 2020)

    This release includes:

    • dsub
      • (New) (Experimental) Implemented the google-cls-v2 provider, which passes all tests (see the example after this list).
      • Pin all dsub dependencies to a max version
      • Fix broken urls in dsub docs.
      • Add dsub --summary output in wait_and_retry loop.
    • google-v2 provider updates:
      • Enable a shared PID namespace when --ssh is specified.
      • Also retry broken pipe errors
      • Setting --preemptible 0 should not cause an error.
    • Testing
      • Change Travis Python3 version from 3.8 to 3.7.
      • Remove sorting_util_test.
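
    The experimental google-cls-v2 provider is selected with --provider. The sketch below assumes it accepts the same core flags as google-v2, plus the --location flag noted in the later 0.4.x releases; the project, bucket, and location values are placeholders:

        dsub \
          --provider google-cls-v2 \
          --project my-cloud-project \
          --location us-central1 \
          --zones "us-central1-*" \
          --logging gs://my-bucket/logs/ \
          --command 'echo "Hello from google-cls-v2"' \
          --wait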
  • v0.3.6(Nov 22, 2019)

    This release includes:

    • dsub
      • Add periodic status update in output (via --summary flag).
      • Update help text to clarify that timeout has a default of 7 days.
      • Replace apiclient with googleapiclient.
      • Emit a message to make it more clear that the dsub process must continue running for retries.
      • Add missing quotes in documentation in example for --mount.
    • google-v2 provider updates:
      • Update gsutil rsync warning message to include command being run.
      • Add workaround for gsutil/gcloud bug that prevents re-authentication.
  • v0.3.5(Oct 23, 2019)

    This release includes:

    • dsub
      • Fix RFC3339 date parsing errors with specific values under Python3
    • google-v2 provider updates:
      • Filter out warning from google-auth.
      • Retry ResponseNotReady error.
      • Add zones for Zurich and Osaka.
      • Move "sleep before retry" messages from INFO in stdout to WARNING in stderr. This ensures that the retry messages bubble up to the stderr output recorded in the pipeline's operation.
  • v0.3.4(Oct 1, 2019)

    This release includes:

    • dsub
      • Explicitly reject launching jobs if blank lines are found in the tasks file (instead of raising an error)
    • google-v2 provider updates:
      • Expose the Pipelines API operation id in dsub launch stderr output.
      • Replace oauth2lib with google-auth
      • Update the Cloud SDK image to a GCR-hosted one, gcr.io/google.com/cloudsdktool/cloud-sdk:264.0.0-slim. This version was specifically chosen because the next version updates gsutil from 4.42 to 4.43, and 4.43 includes undesired changes to gsutil cp. See gsutil's changelogs and the regression for details.
    • Test updates:
      • Use fake time in retrying_test.
  • v0.3.3(Sep 3, 2019)

    This release includes:

    • dsub
      • Fix cases where stderr was redirected to stdout instead of vice-versa.
    • google-v2 provider updates:
      • Retry support for starting with preemptible VMs and falling back to non-preemptible VMs (see the example after this list)
      • Add a delay between gsutil cp retries
    • Documentation updates:
      • Remove note from README about experimental Python3 support.
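
    A sketch of the preemptible fallback noted above, assuming --preemptible takes an attempt count (as the later 0.3.7 note about "--preemptible 0" suggests) and that remaining --retries attempts fall back to non-preemptible VMs; the project and bucket values are placeholders:

        dsub \
          --provider google-v2 \
          --project my-cloud-project \
          --zones "us-central1-*" \
          --logging gs://my-bucket/logs/ \
          --preemptible 2 \
          --retries 3 \
          --wait \
          --command 'echo "Preemptible first, standard VM on later attempts"'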
  • v0.3.2(Jun 13, 2019)

    This release includes:

    • dsub
      • Py3 compliance updates.
      • Fix UnicodeEncodeError when a task error message contains non-ASCII characters
      • Fix datetime.datetime.max construction used in wait loop to be an offset-aware datetime.
      • Silence retrying messages during the wait loop.
    • google-v2 provider updates:
      • Fix failing final_logging actions.
      • Support stackdriver monitoring
    • Documentation updates:
      • Made it clear that multiple --input, --output, --input-recursive, and --output-recursive parameters may be used.
      • Fix 'lookup' typo.
  • v0.3.1(Apr 16, 2019)

    Python 3 is now part of automated testing. All unit and integration tests are now run with Python 2.7 and Python 3.7.

    The google provider is no longer included in automated tests.

    Specific changes in this release include:

    • dstat
      • Separate recursive input/output from input/output fields.
      • Fix YAMLLoadWarning that was being emitted.
    • dsub
      • Experimental: generate 32-character UUID job-ids with the --unique-job-ids flag (see the example after this list).
    • local provider updates:
      • Ensure runner-log.txt is written as text.
    • google-v2 provider updates:
      • Add timestamps in .log files.
    • Test updates:
      • Remove the google provider from automated tests.
      • Fix the way that the project ID is discovered for Python 3.
      • Increase concurrency and optionally disable Python module tests.
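
    A sketch of the experimental --unique-job-ids flag, shown here with the local provider; the logging path is a placeholder:

        dsub \
          --provider local \
          --logging /tmp/dsub-logs/ \
          --unique-job-ids \
          --command 'echo "This job gets a 32-character UUID job-id"' \
          --wait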
  • v0.3.0(Apr 3, 2019)

  • v0.2.6(Apr 1, 2019)

    Note: This is the last planned release with google as the default provider. The new default provider will be google-v2.

    This release includes:

    • google-v2 provider updates:
      • --disk-type option added.
      • --service-account option added (see the example of both new options after this list).
      • Fix dstat to find jobs submitted with user-ids containing non-alphanumeric non-hyphen characters.
      • Fix dstat to better handle operations with no actions.
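
    A sketch of the two new google-v2 options; the disk type value, service account address, project, and bucket are placeholders:

        dsub \
          --provider google-v2 \
          --project my-cloud-project \
          --zones "us-central1-*" \
          --logging gs://my-bucket/logs/ \
          --disk-type pd-ssd \
          --service-account my-runner@my-cloud-project.iam.gserviceaccount.com \
          --command 'echo "Custom disk type and service account"' \
          --wait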
  • v0.2.5(Feb 7, 2019)

    This release includes:

    • dsub

      • When tasks are retried (based on use of the --retries flag), output messages are now more informative.
      • The bash image can now be used with the local and google-v2 providers (Docker entrypoints updated to use /usr/bin/env bash instead of /bin/bash).
    • google-v2 provider updates:
      • The --nvidia-driver-version parameter can now be used to specify the NVIDIA driver version to use when attaching an NVIDIA GPU accelerator.
      • Log files are copied locally on the VM before upload to GCS. This avoids ResumableUploadAbortException when copying large logs.
    • dstat

      • script and script-name now appear in dstat --full output.
      • pyyaml updated to the latest version, allowing Python 3.7 support.
  • v0.2.4(Dec 11, 2018)

  • v0.2.3(Nov 20, 2018)

    This release includes:

    • google:
      • A large deprecation WARNING will now be emitted when using the google provider.
    • google-v2
      • Exit with error if logging fails.
      • Better messaging on failures.
      • Bug fixes for exception handling in ddel jobs
    • local:
      • Better messaging on logging failures.
  • v0.2.2(Nov 2, 2018)

  • v0.2.1(Oct 4, 2018)

    This release includes:

    • Documentation updates noting the deprecation of the google provider.
    • google-v2 provider updates:
      • --log-interval flag to configure the amount of time to sleep between copying log files from the pipeline to the logging path.
      • Adding google-v2 specific elements to dstat output.
      • --ssh flag to start an SSH container in the background, allowing you to inspect the runtime environment of your job's container in real time.
      • Experimental support for gcsfuse through the --mount flag (see the example after this list).
    • local provider updates:
      • Add support for Docker images with entrypoints
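
    A sketch combining the new --ssh and --mount flags, assuming the MOUNTNAME=gs://bucket form for --mount, where the mount point is exposed to the command via the environment variable named on the left of the '='; the project and bucket values are placeholders:

        dsub \
          --provider google-v2 \
          --project my-cloud-project \
          --zones "us-central1-*" \
          --logging gs://my-bucket/logs/ \
          --mount RESOURCES=gs://my-reference-bucket \
          --ssh \
          --command 'ls "${RESOURCES}"' \
          --wait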
  • v0.2.0(Sep 20, 2018)

    This release includes:

    • Experimental Python 3 support.
    • Improvements to retry logic on transient HTTP errors.
    • google-v2 provider:
      • dstat fixes:
        • Fix for "ok" event in the wrong order.
        • Decreased operations.list page size to work around quota exception.
        • Fix for 'pulling-image' event discovery.
      • dsub fixes:
        • Fix pipeline hang when localization/delocalization fails on multi-core VMs
  • v0.1.10(Aug 27, 2018)

    This release includes:

    • google-v2 provider:

      • Support for --network, --cpu-platform, --timeout parameters
      • Add events to dstat.list for google V2 provider
      • Using --min-cores or --min-ram now directs users to --machine-type
    • Add Finland and Los Angeles regions and new Singapore zone.

    • Various test suite improvements

    Note that this release includes an important change to the way that log file names are formatted for dsub jobs that use --retries. The task-attempt is now automatically included in the log file name such that logs for each attempt do not overwrite the previous attempt.
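
    For example, a job submitted with retries (a sketch; the project and bucket values are placeholders) now produces a separate log file per attempt under the --logging path, rather than each retry overwriting the previous attempt's log:

        dsub \
          --provider google-v2 \
          --project my-cloud-project \
          --zones "us-central1-*" \
          --logging gs://my-bucket/logs/ \
          --retries 2 \
          --wait \
          --command 'echo "Each attempt writes its own log file"'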

  • v0.1.9(Jun 28, 2018)

    All unit and integration tests now pass for the google-v2 provider. Users interested in Google's Pipelines v2 are encouraged to give it a try.

    This release includes:

    • google-v2 provider:

      • dstat fields such as envs, inputs, outputs, and logging are now supported.
      • ddel supported
    • General

      • dsub returns task-id even if failure is detected when using --wait.
      • events included in dstat output (local and google providers)
  • v0.1.8(Jun 5, 2018)

    This release includes:

    • Initial support of dsub automatic retries
    • dstat support for the --summary flag, providing more compact output for --tasks jobs (see the example after this list)
    • Provider improvements
      • Local provider
        • Use parallel copy for localizing and de-localizing files
        • Deterministic dstat response ordering
      • Google-v2 provider (still in progress)
        • File localization/delocalization
        • v2alpha1 timestamp format support
    • Various test suite fixes
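
    A sketch of the new dstat --summary output for a job with many tasks, using the standard dstat selection flags; the project is a placeholder and JOB_ID stands for the job-id printed by dsub at submission time:

        dstat \
          --provider google-v2 \
          --project my-cloud-project \
          --jobs "${JOB_ID}" \
          --status '*' \
          --summary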