Batch processing with AWS Batch and CDK

Welcome

This repository demonstrates how to provision the infrastructure needed to run a job on AWS Batch using the AWS Cloud Development Kit (CDK). The AWS Batch job reads images from an S3 bucket, runs inference with an image-to-vector computer vision model, and stores the results in DynamoDB. The code can be modified to fit other batch transformations you might want to perform.

This code repository accompanies the AWS DevOps Blog post "Deep learning image vector embeddings at scale using AWS Batch and CDK".
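As a rough illustration of the S3-to-DynamoDB flow described in this section, the job's main loop might look like the sketch below. It is not the code in src_batch_job: the helper names (`to_item`, `run_batch`) and the attribute names `ImageKey` and `Vector` are hypothetical, and the image loader, model, and writer are passed in as plain callables so the sketch makes no claim about which model or table schema the repository uses.

```python
from decimal import Decimal
from typing import Callable, Iterable, Sequence


def to_item(image_key: str, vector: Sequence[float]) -> dict:
    """Build one DynamoDB item (attribute names are hypothetical)."""
    # boto3's DynamoDB resource layer rejects Python floats,
    # so each vector component is converted to Decimal first.
    return {
        "ImageKey": image_key,
        "Vector": [Decimal(str(x)) for x in vector],
    }


def run_batch(
    keys: Iterable[str],
    load_image: Callable[[str], bytes],
    embed: Callable[[bytes], Sequence[float]],
    store: Callable[[dict], None],
) -> int:
    """Embed every image and store one item per image; return the count."""
    count = 0
    for key in keys:
        vector = embed(load_image(key))  # image bytes -> embedding vector
        store(to_item(key, vector))      # one item per processed image
        count += 1
    return count
```

In a real job, `load_image` could wrap an S3 `get_object` call and `store` could wrap a DynamoDB table's `put_item`; keeping them as parameters also makes the loop easy to test locally without AWS access.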

Pre-requisites

Create and activate a Python virtualenv (macOS and Linux), then install the Python dependencies:

$ python3 -m venv .env
$ source .env/bin/activate
$ pip install -r requirements.txt

Install the latest version of the AWS CDK CLI:

$ npm i -g aws-cdk

Usage

The code creates the AWS Batch infrastructure, an S3 bucket to read the data from, and a DynamoDB table to write the batch results to. Once the infrastructure is provisioned through AWS CDK, upload the images you want to process to the created S3 bucket. Then go to the created AWS Lambda function and submit a job. This triggers a job execution on AWS Batch, and you should see the results in the created DynamoDB table.
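The upload step mentioned above can be done in the console, but it could also be scripted. The following is a minimal sketch using boto3, which is an assumption here (check that it is available in your environment); the function names, the list of image suffixes, and the default `images` prefix are illustrative, and the bucket name must be the one created by your deployment.

```python
from pathlib import Path
from typing import Iterator, Tuple

# Assumption: adjust the suffixes to match your image data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def iter_uploads(local_dir: str, prefix: str) -> Iterator[Tuple[Path, str]]:
    """Yield (local file, S3 key) pairs for every image under local_dir."""
    root = Path(local_dir)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            relative = path.relative_to(root).as_posix()
            # Strip slashes so "/images" and "images" give the same key.
            yield path, f"{prefix.strip('/')}/{relative}"


def upload_images(bucket: str, local_dir: str, prefix: str = "images") -> int:
    """Upload the images under the given prefix and return how many were sent."""
    import boto3  # imported lazily so iter_uploads works without AWS access

    s3 = boto3.client("s3")
    count = 0
    for path, key in iter_uploads(local_dir, prefix):
        s3.upload_file(str(path), bucket, key)
        count += 1
    return count
```

The prefix passed here is the same value you later submit in the `Paths` list of the Lambda event.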

To deploy and run the batch inference, follow these steps:

Make sure AWS CDK is installed and working, all the dependencies defined in the requirements.txt file are installed, and Docker is installed and configured in your environment;
Set the CDK_DEPLOY_ACCOUNT environment variable to the name of the AWS account you want to use (pre-defined with AWS CLI);
Set the CDK_DEPLOY_REGION environment variable to the name of the region you want to deploy the infrastructure in (e.g. 'us-west-2');
Run cdk deploy in the root of this project and wait for the deployment to finish successfully;
Upload the images you need to process to the newly created S3 bucket under an S3 path (e.g. /images). Use this path in the next step;
Go to the created AWS Lambda function and execute it with the following JSON:

{
  "Paths": [
    "images"
  ]
}

In the AWS console, go to AWS Batch and make sure the jobs are submitted and running successfully;
Open the created DynamoDB table and validate that the results are there;
You can now use a DynamoDB client to read and consume the results.
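The Lambda invocation and the DynamoDB check in the steps above can also be done from a script instead of the console. The sketch below assumes boto3 is installed and that your default AWS credentials and region point at the deployed stack; the function and table names are whatever your deployment created, and the helper names are illustrative.

```python
import json
from typing import Any, List


def build_payload(paths: List[str]) -> bytes:
    """Encode the {"Paths": [...]} event shown in the steps above."""
    return json.dumps({"Paths": paths}).encode("utf-8")


def submit_job(function_name: str, paths: List[str]) -> Any:
    """Invoke the Lambda synchronously and return its decoded response."""
    import boto3  # imported lazily so build_payload works without AWS access

    response = boto3.client("lambda").invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=build_payload(paths),
    )
    body = response["Payload"].read()
    return json.loads(body) if body else None


def read_results(table_name: str, limit: int = 10) -> list:
    """Return up to `limit` items from the results table as a spot check."""
    import boto3

    table = boto3.resource("dynamodb").Table(table_name)
    return table.scan(Limit=limit).get("Items", [])
```

A scan with a small limit is only meant for verifying that results arrived; for consuming the full table, paginate the scan or query by key according to the table's actual schema.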

License

This library is licensed under the MIT-0 License. See the LICENSE file.

aws-samples/aws-cdk-deep-learning-image-vector-embeddings-at-scale-using-aws-batch
