allenai/gpv2-web10k

gpv2-web10k

This repository contains the script for downloading images from the Web-10K dataset. The script takes a list of queries, sends each one to Bing Image Search, and downloads the returned thumbnail images to an Amazon S3 bucket that you specify. To use the script, you need a Bing Image Search API key.

Setup

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Adding the Bing Search API Key and Amazon S3 Bucket name

Add your API key to get_api_key() in tasks.py on Line 45.

Add the S3 bucket name to tasks.py on Line 21. The downloaded images are stored in this bucket.
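If you would rather not hard-code credentials in tasks.py, one option is to read both values from environment variables. The following is only a sketch: the variable names `BING_SEARCH_API_KEY` and `WEB10K_S3_BUCKET` and the helpers `_require_env()` and `get_bucket_name()` are hypothetical and are not part of this repository; only the name `get_api_key()` comes from tasks.py.

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
API_KEY_ENV = "BING_SEARCH_API_KEY"
BUCKET_ENV = "WEB10K_S3_BUCKET"


def _require_env(name: str, description: str) -> str:
    """Return the stripped value of an environment variable or raise."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set {name} to {description}")
    return value


def get_api_key() -> str:
    """Return the Bing Image Search API key from the environment."""
    return _require_env(API_KEY_ENV, "your Bing Image Search API key")


def get_bucket_name() -> str:
    """Return the name of the S3 bucket that receives the images."""
    return _require_env(BUCKET_ENV, "the target Amazon S3 bucket name")
```

With this approach you would export both variables in your shell before running any of the invoke tasks, and the key never needs to be committed to the repository.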

Running the script

invoke query query_sample.json  # to query Bing Image Search with the queries listed in query_sample.json
invoke print-query-results "mt. everest"  # to print the results of a specific query
invoke generate-html  # to generate an HTML containing the returned images
invoke download-images  # to download the images to an Amazon S3 bucket
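
For orientation, the query step can be pictured roughly as follows. This is a minimal sketch and not the code in tasks.py: the helper names `build_request()` and `thumbnail_urls()` are hypothetical, and the endpoint shown is an assumption (one of the documented v7 endpoints), so check which endpoint your subscription uses. The `Ocp-Apim-Subscription-Key` header, the `q` and `count` parameters, and the `value` and `thumbnailUrl` response fields come from the Bing Image Search API v7 documentation linked below.

```python
from typing import Any, Dict, List

# Assumed v7 endpoint; your subscription may use a different host.
BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/images/search"


def build_request(query: str, api_key: str, count: int = 10) -> Dict[str, Any]:
    """Assemble the URL, headers, and query parameters for one image search."""
    return {
        "url": BING_ENDPOINT,
        "headers": {"Ocp-Apim-Subscription-Key": api_key},
        "params": {"q": query, "count": count},
    }


def thumbnail_urls(response_json: Dict[str, Any]) -> List[str]:
    """Collect the thumbnail URL of every image in a search response."""
    return [
        image["thumbnailUrl"]
        for image in response_json.get("value", [])
        if "thumbnailUrl" in image
    ]
```

To send the request you could pass the three parts to an HTTP client, for example `requests.get(req["url"], headers=req["headers"], params=req["params"], timeout=30).json()`, and feed the result to `thumbnail_urls()`. Each thumbnail can then be fetched and written to S3, for instance with boto3's `put_object`.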

Useful links:

Bing Image Search API Pricing (for ~40K queries using an S3-tier instance, we paid about $160)

Bing Image Search API v7 query parameters (to change the returned response content)

Bing Image Search APIs v7 response objects (to understand the returned objects)
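
The pricing note above (about $160 for ~40K queries) can be used to budget a run of a different size. The sketch below only rescales that one data point; the function names are hypothetical, and actual prices depend on the current pricing page and your tier.

```python
def rate_per_1000(total_cost: float, num_queries: int) -> float:
    """Cost per 1,000 queries implied by a past bill."""
    return total_cost * 1000 / num_queries


def estimate_cost(num_queries: int, price_per_1000: float) -> float:
    """Projected cost of num_queries at the given per-1,000 rate."""
    return num_queries * price_per_1000 / 1000


# Rate implied by the figures quoted above: $160 for about 40,000 queries.
implied_rate = rate_per_1000(160, 40_000)

# Projected cost of a smaller run of 10,000 queries at the same rate.
projected = estimate_cost(10_000, implied_rate)
print(f"${implied_rate:.2f} per 1,000 queries, ${projected:.2f} for 10,000 queries")
```

Under these assumptions the implied rate is about $4 per 1,000 queries, so a 10,000-query run would cost about $40.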
