google-resumable-media (🥉28 · ⭐ 27) - Utilities for Google Media Downloads and Resumable Uploads. Apache-2

Overview

google-resumable-media

Utilities for Google Media Downloads and Resumable Uploads

See the docs for examples and usage.
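As a quick orientation, here is a minimal download sketch in the spirit of those docs. The bucket and object names are placeholders; the transport is any authorized requests session.

    import google.auth
    import google.auth.transport.requests as tr_requests
    from google.resumable_media.requests import Download

    # Any authorized requests.Session-compatible object works as the transport.
    ro_scope = "https://www.googleapis.com/auth/devstorage.read_only"
    credentials, _ = google.auth.default(scopes=(ro_scope,))
    transport = tr_requests.AuthorizedSession(credentials)

    # Placeholder bucket/object names -- substitute your own.
    media_url = (
        "https://storage.googleapis.com/download/storage/v1/b/"
        "my-bucket/o/my-object?alt=media"
    )

    with open("my-object", "wb") as stream:
        download = Download(media_url, stream=stream)
        download.consume(transport)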

Experimental asyncio Support

While still in development and subject to change, this library has asyncio support at google._async_resumable_media.
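A hedged sketch of what that might look like, assuming the experimental async surface mirrors the synchronous one; the module path comes from the note above, everything else is an assumption and subject to change:

    import asyncio

    # Experimental, private module: the async surface is assumed to mirror
    # the sync google.resumable_media.requests API.
    from google._async_resumable_media.requests import Download

    async def fetch(media_url, transport):
        # `transport` is assumed to be an async authorized session from the
        # async branch of google-auth referenced in the PR notes below.
        download = Download(media_url)
        return await download.consume(transport)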

Supported Python Versions

Python >= 3.5

Deprecated Python Versions

Python == 2.7. Python 2.7 support will be removed on January 1, 2020.

License

Apache 2.0 - See the LICENSE for more information.

Comments
  • feat: async changes to resumable upload/download

    AsyncIO functionality for resumable media

    Current state: all unit tests for the asynchronous functionality are passing. The system tests are close to complete, though a timeout context-manager error currently prevents running the upload and download tests in the same session; each must run in a separate session to avoid that bug.

    Related Branch for auth: https://github.com/googleapis/google-auth-library-python/tree/async

    cla: no 
    opened by anibadde 53
  • Python SDK unable to download file due to checksum mismatch

    Object download failed, complaining about a checksum mismatch. Downloading the object through gsutil works fine.

    ./gcs-download-object.py
    Traceback (most recent call last):
      File "./gcs-download-object.py", line 29, in <module>
        download_blob('##REDACTED##',
      File "./gcs-download-object.py", line 20, in download_blob
        blob.download_to_filename(destination_file_name)
      File "/usr/local/lib/python3.8/site-packages/google/cloud/storage/blob.py", line 1184, in download_to_filename
        client.download_blob_to_file(
      File "/usr/local/lib/python3.8/site-packages/google/cloud/storage/client.py", line 719, in download_blob_to_file
        blob_or_uri._do_download(
      File "/usr/local/lib/python3.8/site-packages/google/cloud/storage/blob.py", line 956, in _do_download
        response = download.consume(transport, timeout=timeout)
      File "/usr/local/lib/python3.8/site-packages/google/resumable_media/requests/download.py", line 171, in consume
        self._write_to_stream(result)
      File "/usr/local/lib/python3.8/site-packages/google/resumable_media/requests/download.py", line 120, in _write_to_stream
        raise common.DataCorruption(response, msg)
    google.resumable_media.common.DataCorruption: Checksum mismatch while downloading:
      ##REDACTED##
    The X-Goog-Hash header indicated an MD5 checksum of:
      lAhluFgTEwcNJDvTSap2fQ==
    but the actual MD5 checksum of the downloaded contents was:
      61Kz/FQdqRvwqacGuwuFIA==
    

    The code itself is pretty straightforward:

    #!/usr/bin/env python3.8
    from google.cloud import storage
    def download_blob(bucket_name, source_blob_name, destination_file_name):
        """Downloads a blob from the bucket."""
        # bucket_name = "your-bucket-name"
        # source_blob_name = "storage-object-name"
        # destination_file_name = "local/path/to/file"
        storage_client = storage.Client()
        bucket = storage_client.bucket(bucket_name)
        # Construct a client side representation of a blob.
        # Note `Bucket.blob` differs from `Bucket.get_blob` as it doesn't retrieve
        # any content from Google Cloud Storage. As we don't need additional data,
        # using `Bucket.blob` is preferred here.
        blob = bucket.blob(source_blob_name)
        blob.download_to_filename(destination_file_name)
        print(
            "Blob {} downloaded to {}.".format(
                source_blob_name, destination_file_name
            )
        )
    download_blob('##REDACTED##',
                  'remedia/mezzanines/Live/2018-06-24/M31_POL-COL_ESFUHD_06_24.mov', 'M31_POL-COL_ESFUHD_06_24.mov')
    

    The file size is 2.3TB if that matters.

    The installed package versions are as follows:

    pip3.8 list
    Package                  Version
    ------------------------ ---------
    boto3                    1.17.13
    botocore                 1.20.13
    cachetools               4.2.1
    certifi                  2020.12.5
    cffi                     1.14.5
    chardet                  4.0.0
    google-api-core          1.26.0
    google-auth              1.27.0
    google-cloud-core        1.6.0
    google-cloud-storage     1.36.0
    google-crc32c            1.1.2
    google-resumable-media   1.2.0
    googleapis-common-protos 1.52.0
    idna                     2.10
    jmespath                 0.10.0
    packaging                20.9
    pip                      19.2.3
    protobuf                 3.15.1
    pyasn1                   0.4.8
    pyasn1-modules           0.2.8
    pycparser                2.20
    pyparsing                2.4.7
    python-dateutil          2.8.1
    pytz                     2021.1
    requests                 2.25.1
    rsa                      4.7.1
    s3transfer               0.3.4
    setuptools               41.2.0
    six                      1.15.0
    urllib3                  1.26.3
    

    I'm able to reproduce this issue consistently for this file. I have downloaded several hundred objects with the same SDK; not sure why it's failing on this one.
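    For anyone debugging a report like this, a hedged sketch of catching the failure and, temporarily, bypassing client-side validation; the `checksum` parameter exists in the google-cloud-storage versions listed above, and the bucket/object names are placeholders:

    from google.cloud import storage
    from google.resumable_media import common

    client = storage.Client()
    blob = client.bucket("my-bucket").blob("my-object")

    try:
        blob.download_to_filename("local-file")
    except common.DataCorruption:
        # The downloaded bytes did not match the X-Goog-Hash header.
        # For debugging only: checksum=None skips client-side validation,
        # so the file can still be recovered and inspected.
        blob.download_to_filename("local-file", checksum=None)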

    type: bug priority: p1 :rotating_light: api: storage 
    opened by cloudryder 17
  • Synthesis failed for google-resumable-media-python

    Hello! Autosynth couldn't regenerate google-resumable-media-python. :broken_heart:

    Here's the output from running synth.py:

    5de3e1e14a0b07eab8b474e669164dbd31f81fb
    2020-08-29 05:17:08,070 autosynth [DEBUG] > Running: git log -1 --pretty=%at 968465a1cad496e1292ef4584a054a35f756ff94
    2020-08-29 05:17:08,073 autosynth [DEBUG] > Running: git log -1 --pretty=%at a9eea2c50b7dce0dffcc010c1caf712802155403
    2020-08-29 05:17:08,076 autosynth [DEBUG] > Running: git log -1 --pretty=%at 637f8aa1373a0a6be1b626a282f2e13d7d6d7d6c
    2020-08-29 05:17:08,079 autosynth [DEBUG] > Running: git log -1 --pretty=%at d0198121927f606e113275a4b0f3560a7a821470
    2020-08-29 05:17:08,082 autosynth [DEBUG] > Running: git log -1 --pretty=%at 8cf6d2834ad14318e64429c3b94f6443ae83daf9
    2020-08-29 05:17:08,085 autosynth [DEBUG] > Running: git log -1 --pretty=%at 019c7168faa0e56619f792693a8acdb30d6de19b
    2020-08-29 05:17:08,088 autosynth [DEBUG] > Running: git log -1 --pretty=%at 5d916ec54cadd4674e80e6555d0c6a78849ef4a7
    2020-08-29 05:17:08,091 autosynth [DEBUG] > Running: git log -1 --pretty=%at cbcd64279572769b4d350bf8078bcd1f151c9684
    2020-08-29 05:17:08,094 autosynth [DEBUG] > Running: git log -1 --pretty=%at 80f46100c047bc47efe0025ee537dc8ee413ad04
    2020-08-29 05:17:08,097 autosynth [DEBUG] > Running: git checkout 0902df119547d952c70cd740e272d7dc4e272ae3
    Note: checking out '0902df119547d952c70cd740e272d7dc4e272ae3'.
    
    You are in 'detached HEAD' state. You can look around, make experimental
    changes and commit them, and you can discard any commits you make in this
    state without impacting any branches by performing another checkout.
    
    If you want to create a new branch to retain commits you create, you may
    do so (now or later) by using -b with the checkout command again. Example:
    
      git checkout -b <new-branch-name>
    
    HEAD is now at 0902df1 chore: release 1.0.0 (#164)
    2020-08-29 05:17:08,112 autosynth [DEBUG] > Running: git checkout 80f46100c047bc47efe0025ee537dc8ee413ad04
    Note: checking out '80f46100c047bc47efe0025ee537dc8ee413ad04'.
    
    You are in 'detached HEAD' state. You can look around, make experimental
    changes and commit them, and you can discard any commits you make in this
    state without impacting any branches by performing another checkout.
    
    If you want to create a new branch to retain commits you create, you may
    do so (now or later) by using -b with the checkout command again. Example:
    
      git checkout -b <new-branch-name>
    
    HEAD is now at 80f4610 chore: remove monolith nodejs autosynth (#741)
    2020-08-29 05:17:08,120 autosynth [DEBUG] > Running: git branch -f autosynth-30
    2020-08-29 05:17:08,123 autosynth [DEBUG] > Running: git checkout autosynth-30
    Switched to branch 'autosynth-30'
    2020-08-29 05:17:08,127 autosynth [INFO] > Running synthtool
    2020-08-29 05:17:08,127 autosynth [INFO] > ['/tmpfs/src/github/synthtool/env/bin/python3', '-m', 'synthtool', '--metadata', 'synth.metadata', 'synth.py', '--']
    2020-08-29 05:17:08,127 autosynth [DEBUG] > log_file_path: /tmpfs/src/logs/google-resumable-media-python/30/sponge_log.log
    2020-08-29 05:17:08,129 autosynth [DEBUG] > Running: /tmpfs/src/github/synthtool/env/bin/python3 -m synthtool --metadata synth.metadata synth.py --
    2020-08-29 05:17:08,343 synthtool [DEBUG] > Executing /home/kbuilder/.cache/synthtool/google-resumable-media-python/synth.py.
    On branch autosynth-30
    nothing to commit, working tree clean
    2020-08-29 05:17:08,468 synthtool [DEBUG] > Using precloned repo /home/kbuilder/.cache/synthtool/synthtool
    .coveragerc
    .flake8
    .github/CONTRIBUTING.md
    .github/ISSUE_TEMPLATE/bug_report.md
    .github/ISSUE_TEMPLATE/feature_request.md
    .github/ISSUE_TEMPLATE/support_request.md
    .github/PULL_REQUEST_TEMPLATE.md
    .github/release-please.yml
    .gitignore
    .kokoro/build.sh
    .kokoro/continuous/common.cfg
    .kokoro/continuous/continuous.cfg
    .kokoro/docker/docs/Dockerfile
    .kokoro/docker/docs/fetch_gpg_keys.sh
    .kokoro/docs/common.cfg
    .kokoro/docs/docs-presubmit.cfg
    .kokoro/docs/docs.cfg
    .kokoro/presubmit/common.cfg
    .kokoro/presubmit/presubmit.cfg
    .kokoro/publish-docs.sh
    .kokoro/release.sh
    .kokoro/release/common.cfg
    .kokoro/release/release.cfg
    .kokoro/samples/lint/common.cfg
    .kokoro/samples/lint/continuous.cfg
    .kokoro/samples/lint/periodic.cfg
    .kokoro/samples/lint/presubmit.cfg
    .kokoro/samples/python3.6/common.cfg
    .kokoro/samples/python3.6/continuous.cfg
    .kokoro/samples/python3.6/periodic.cfg
    .kokoro/samples/python3.6/presubmit.cfg
    .kokoro/samples/python3.7/common.cfg
    .kokoro/samples/python3.7/continuous.cfg
    .kokoro/samples/python3.7/periodic.cfg
    .kokoro/samples/python3.7/presubmit.cfg
    .kokoro/samples/python3.8/common.cfg
    .kokoro/samples/python3.8/continuous.cfg
    .kokoro/samples/python3.8/periodic.cfg
    .kokoro/samples/python3.8/presubmit.cfg
    .kokoro/test-samples.sh
    .kokoro/trampoline.sh
    .kokoro/trampoline_v2.sh
    .trampolinerc
    CODE_OF_CONDUCT.md
    CONTRIBUTING.rst
    LICENSE
    MANIFEST.in
    docs/_static/custom.css
    docs/_templates/layout.html
    docs/conf.py.j2
    docs/multiprocessing.rst
    noxfile.py.j2
    renovate.json
    Skipping: samples/AUTHORING_GUIDE.md
    Skipping: samples/CONTRIBUTING.md
    scripts/decrypt-secrets.sh
    scripts/readme-gen/readme_gen.py.j2
    scripts/readme-gen/templates/README.tmpl.rst
    scripts/readme-gen/templates/auth.tmpl.rst
    scripts/readme-gen/templates/auth_api_key.tmpl.rst
    scripts/readme-gen/templates/install_deps.tmpl.rst
    scripts/readme-gen/templates/install_portaudio.tmpl.rst
    setup.cfg
    testing/.gitignore
    nox > Running session blacken
    nox > Session blacken skipped: Python interpreter 3.8 not found.
    2020-08-29 05:17:11,754 synthtool [DEBUG] > Wrote metadata to synth.metadata.
    2020-08-29 05:17:11,796 autosynth [INFO] > Changed files:
    2020-08-29 05:17:11,796 autosynth [INFO] > M synth.metadata
    2020-08-29 05:17:11,797 autosynth [DEBUG] > Running: git log 80f46100c047bc47efe0025ee537dc8ee413ad04 -1 --no-decorate --pretty=%s
    2020-08-29 05:17:11,800 autosynth [DEBUG] > Running: git log 80f46100c047bc47efe0025ee537dc8ee413ad04 -1 --no-decorate --pretty=%b%n%nSource-Author: %an <%ae>%nSource-Date: %ad
    2020-08-29 05:17:11,804 autosynth [DEBUG] > Running: git add -A
    2020-08-29 05:17:11,807 autosynth [DEBUG] > Running: git status --porcelain
    2020-08-29 05:17:11,811 autosynth [DEBUG] > Running: git commit -m chore: remove monolith nodejs autosynth
    
    Final step in sharding nodejs autosynth.
    fixes https://github.com/googleapis/synthtool/issues/697
    
    Source-Author: Jeffrey Rennie <[email protected]>
    Source-Date: Fri Aug 28 09:43:32 2020 -0700
    Source-Repo: googleapis/synthtool
    Source-Sha: 80f46100c047bc47efe0025ee537dc8ee413ad04
    Source-Link: https://github.com/googleapis/synthtool/commit/80f46100c047bc47efe0025ee537dc8ee413ad04
    [autosynth-30 3edef6e] chore: remove monolith nodejs autosynth
     1 file changed, 40 insertions(+), 2 deletions(-)
    2020-08-29 05:17:11,817 autosynth [DEBUG] > Running: git reset --hard HEAD
    HEAD is now at 3edef6e chore: remove monolith nodejs autosynth
    2020-08-29 05:17:11,821 autosynth [DEBUG] > Running: git checkout autosynth
    Switched to branch 'autosynth'
    2020-08-29 05:17:11,825 autosynth [DEBUG] > Running: git checkout autosynth
    Already on 'autosynth'
    2020-08-29 05:17:11,829 autosynth [DEBUG] > Running: git checkout autosynth-30
    Switched to branch 'autosynth-30'
    2020-08-29 05:17:11,833 autosynth [DEBUG] > Running: git checkout autosynth
    Switched to branch 'autosynth'
    2020-08-29 05:17:11,837 autosynth [DEBUG] > Running: git merge --squash autosynth-30
    Updating 0902df1..3edef6e
    Fast-forward
    Squash commit -- not updating HEAD
     synth.metadata | 42 ++++++++++++++++++++++++++++++++++++++++--
     1 file changed, 40 insertions(+), 2 deletions(-)
    2020-08-29 05:17:11,841 autosynth [DEBUG] > Running: git commit -m chore: start tracking obsolete files
    [autosynth 2d0ab38] chore: start tracking obsolete files
     1 file changed, 40 insertions(+), 2 deletions(-)
    2020-08-29 05:17:11,847 autosynth [DEBUG] > Running: git push --force origin autosynth
    To https://github.com/googleapis/google-resumable-media-python.git
     + ec2e32e...2d0ab38 autosynth -> autosynth (forced update)
    2020-08-29 05:17:15,099 autosynth [DEBUG] > Running: git log -1 --pretty=%b
    2020-08-29 05:17:15,656 autosynth [ERROR] > Error making request (422): Validation Failed
    2020-08-29 05:17:15,656 autosynth [DEBUG] > {'message': 'Validation Failed', 'errors': [{'resource': 'PullRequest', 'code': 'custom', 'message': 'A pull request already exists for googleapis:autosynth.'}], 'documentation_url': 'https://docs.github.com/rest/reference/pulls#create-a-pull-request'}
    2020-08-29 05:17:15,656 autosynth [DEBUG] > Running: git clean -fdx
    Removing __pycache__/
    Removing google/__pycache__/
    Traceback (most recent call last):
      File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 690, in <module>
        main()
      File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 539, in main
        return _inner_main(temp_dir)
      File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 670, in _inner_main
        commit_count = synthesize_loop(x, multiple_prs, change_pusher, synthesizer)
      File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 388, in synthesize_loop
        pr = change_pusher.push_changes(1, toolbox.branch, pr_title)
      File "/tmpfs/src/github/synthtool/autosynth/change_pusher.py", line 103, in push_changes
        self._repository, branch=branch, title=pr_title, body=new_body,
      File "/tmpfs/src/github/synthtool/autosynth/github.py", line 94, in create_pull_request
        return cast(Dict, _get_json_or_raise_exception(response))
      File "/tmpfs/src/github/synthtool/autosynth/github.py", line 488, in _get_json_or_raise_exception
        response.raise_for_status()
      File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/requests/models.py", line 941, in raise_for_status
        raise HTTPError(http_error_msg, response=self)
    requests.exceptions.HTTPError: 422 Client Error: Unprocessable Entity for url: https://api.github.com/repos/googleapis/google-resumable-media-python/pulls
    
    

    Google internal developers can see the full log here.

    type: bug priority: p2 autosynth failure 
    opened by yoshi-automation 15
  • Uploads Using Python Google Cloud Storage/Bigquery Client Libraries are Very Slow

    This was originally opened as a Stack Overflow question, then reported as a bug through the Issue Tracker site, but neither has gotten any traction. I'm reporting it here since the maintainers might have quicker insight into the cause.

    Issue

    Both loading files with the Python google.cloud.bigquery.client.Client load_table_from_file() and uploading them first to my storage bucket using google.cloud.storage.blob.Blob upload_from_file() (to then load into BQ) result in awfully slow uploads (1-4 MBps), while gsutil/Dropbox achieve 10x those speeds on the same machine/environment.

    I am using python 3.6.9. All metrics are tested using a single file upload. I have tried:

    • Running in a Docker container on a Google Compute Engine Ubuntu VM.
    • Running in a Docker container on my Mac.
    • Running on my Mac using just Python (no Docker).
    • Uploading the whole file from memory and from disk, uncompressed and gzipped. No difference.
    • Using older and the most recent Python client library versions: for the older clients (BigQuery 1.24.0, Storage 1.25.0) I see 1-3 MB per second upload speeds; for the 2.13.1 BigQuery client I see 3-4 MB per second.

    All these tests resulted in identically slow performance.

    I have 900+ Mbps up/down on my Mac. The Dropbox Python client library running on the same setups easily beats these speeds with the exact same files, and gsutil/bq on my Mac also shows 10x+ speeds for the same file.

    Environment details

    • OS type and version:
    • Python version: 3.6
    • pip version: 21.1.3
    • google-cloud-bigquery version: Multiple

    Steps to reproduce

    1. Attempt to upload a large (1-4GB) CSV file to an existing BQ table using the Python BigQuery client's load_table_from_file() method. See the code example below for a sample script as well as a sample bq command-line comparison.

    2. The same limited speed can be observed when using the Python storage client's blob.upload_from_file().

    Code example (Find and Replace UPDATE_THIS)

    import csv
    import logging
    import random
    import string
    
    from google.cloud import bigquery
    from google.oauth2 import service_account
    
    
    SVC_ACCT_JSON_FILENAME = "creds.json"
    GCP_PROJECT_ID = "UPDATE_THIS"
    DATASET_ID = "UPDATE_THIS"
    BQ_TABLE_ID = "UPDATE_THIS"
    
    CSV_FILENAME = "test_table_data.csv"
    CSV_NUM_ROWS = 3000000
    
    """
    -- Create table in BQ before running this script
    CREATE TABLE UPDATE_THIS.UPDATE_THIS
    (
      col1_str STRING,
      col2_str STRING,
      col3_str STRING,
      col4_str STRING,
      col5_str STRING,
      col6_str STRING,
      col7_int INT64,
      col8_int INT64
    )
    
    """
    
    # Command line comparison. This uses full bandwidth ~70 MBps on my mac vs
    # about 4MBps on the same machine using this script/python client
    """
    bq load \
        --source_format=CSV \
        --replace=true \
        --skip_leading_rows=1 \
        UPDATE_THIS:UPDATE_THIS.UPDATE_THIS \
        ./test_table_data.csv \
        col1_str:STRING,col2_str:STRING,col3_str:STRING,col4_str:STRING,col5_str:STRING,col6_str:STRING,col7_int:INTEGER,col8_int:INTEGER
    """
    
    
    def main():
        generate_csv()  # Run first time then reuse
    
        # Create client
        credentials = service_account.Credentials.from_service_account_file(
            SVC_ACCT_JSON_FILENAME,
            scopes=["https://www.googleapis.com/auth/cloud-platform"],
        )
        bq_client = bigquery.Client(
            credentials=credentials,
            project=GCP_PROJECT_ID
        )
    
        dataset_ref = bq_client.dataset(DATASET_ID)
        table_ref = dataset_ref.table(BQ_TABLE_ID)
    
        config = bigquery.LoadJobConfig()
        config.autodetect = False
        config.source_format = "CSV"
        config.skip_leading_rows = 1
        config.write_disposition = "WRITE_TRUNCATE"
    
        logging.info("Beginning load job...")
        with open(CSV_FILENAME, "rb") as source_file:
            job = bq_client.load_table_from_file(
                source_file,
                table_ref,
                job_config=config
            )
        job.result()  # Starts job and waits for table load to complete.
        logging.info("Job ID: %s", job.job_id)
        if job.errors is None and job.error_result is None:
            logging.info("BQ load job complete without error!")
            logging.info(
                "Loaded %d rows",
                job.output_rows
            )
        else:
            msg = ("bderr: BQ load job failed with error_result: "
                   f"{job.error_result} and errors: {job.errors}")
            logging.error(msg)
    
    
    def generate_csv():
        """Generates csv of string/int data types. File size should be around 1GB
        and include the header.
    
        """
        logging.info("Generating CSV...")
        header = [
            "col1_str",
            "col2_str",
            "col3_str",
            "col4_str",
            "col5_str",
            "col6_str",
            "col7_int",
            "col8_int"
        ]
    
        char_bank = string.ascii_letters + string.digits
        with open(CSV_FILENAME, "w") as fout:
            w_csv = csv.writer(fout)
            w_csv.writerow(header)
            for x in range(CSV_NUM_ROWS):
                if x % 100000 == 0:
                    logging.info("Written %d out of %d rows...", x, CSV_NUM_ROWS)
                w_csv.writerow([
                    "".join(random.choices(char_bank, k=48)),
                    "".join(random.choices(char_bank, k=48)),
                    "".join(random.choices(char_bank, k=48)),
                    "".join(random.choices(char_bank, k=48)),
                    "".join(random.choices(char_bank, k=48)),
                    "".join(random.choices(char_bank, k=48)),
                    random.randrange(100000000),
                    random.randrange(100000000),
                ])
    
    
    if __name__ == "__main__":
        fmt = "%(asctime)s %(name)-25s  %(module)-24s %(levelname)9s: %(message)s"
        logging.basicConfig(format=fmt)
        logging.getLogger().setLevel(logging.INFO)
        logging.info("SCRIPT START")
        main()
        logging.info("SCRIPT END")
    

    Thanks!
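    One knob worth ruling out when diagnosing throughput like this (not confirmed as the cause here) is the resumable-upload chunk size, since small chunks mean more round trips. A hedged sketch, with placeholder names:

    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("my-bucket")

    # chunk_size controls how much data each resumable-upload request
    # carries; it must be a multiple of 256 KiB. Larger chunks reduce
    # per-request round trips on high-bandwidth links.
    blob = bucket.blob("test_table_data.csv", chunk_size=64 * 1024 * 1024)

    with open("test_table_data.csv", "rb") as fh:
        blob.upload_from_file(fh)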

    type: bug priority: p2 api: storage 
    opened by KevinTydlacka 14
  • MD5 validation broken?

    I have a file that I uploaded and then downloaded, with google-resumable-media under the hood. With 0.3, the X-Goog-Hash header reports a different MD5 than the one computed locally. The downloaded file, however, is intact.

    The only unusual thing is that my file is gzipped. Does gzip break the MD5 calculation?

    Thanks!


    Also to be closed: https://github.com/GoogleCloudPlatform/google-cloud-python/issues/4227
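    The likely explanation: objects uploaded with Content-Encoding: gzip can be decompressed by GCS in transit, so the stored MD5 (computed over the compressed bytes) will not match a hash of the decompressed payload. A quick check, with placeholder names:

    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket("my-bucket").get_blob("my-object")

    # If content_encoding is "gzip", the payload may be transcoded on
    # download, and blob.md5_hash refers to the compressed bytes.
    print(blob.content_encoding, blob.md5_hash)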

    :rotating_light: triage me 
    opened by danqing 14
  • Add checksum validation for non-chunked non-composite downloads

    This partially addresses https://github.com/GoogleCloudPlatform/google-resumable-media-python/issues/22

    Note current limitations:

    • doesn't handle chunked download case
    • doesn't handle case of downloading composite objects
    • doesn't do anything about upload checksumming

    I ran the unit and system tests only against Python 2.7, because my current OS doesn't support Python 3.6.
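    For illustration only (not the PR's actual code), this is the kind of check being added: pull the base64 MD5 out of the X-Goog-Hash header, as seen in the checksum-mismatch report above, and compare it with a digest of the downloaded bytes.

    import base64
    import hashlib

    def md5_matches(payload, x_goog_hash):
        """Compare downloaded bytes against the md5 entry of X-Goog-Hash.

        The header looks like "crc32c=...,md5=..." with base64 digests.
        """
        expected = None
        for part in x_goog_hash.split(","):
            name, _, value = part.strip().partition("=")
            if name == "md5":
                expected = value
        if expected is None:
            # No MD5 to check against (e.g. composite objects).
            return True
        actual = base64.b64encode(hashlib.md5(payload).digest()).decode("utf-8")
        return actual == expected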

    download requests-transport 
    opened by mfschwartz 11
  • Dependency on urllib3?

    I just upgraded the packages on my system, and this package (which must be a dependency of another package) broke my setup. I'm getting this error:

    ImportError: No module named urllib3.response

    I don't have urllib3 installed, so that's the proximate problem. But why isn't it installed? I only use pip to install things, so the fact that urllib3 is missing suggests it isn't listed as a dependency somewhere.

    It doesn't look like this project has a requirements list, but I don't know. It does seem like the urllib3 requirement might be a new one?
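    On modern Pythons, the dependencies an installed distribution declares can be inspected directly, which would answer this. A small sketch (importlib.metadata requires Python 3.8+):

    from importlib.metadata import requires

    # Prints the install requirements the distribution declares, including
    # extras such as [requests]; None means no requirements metadata.
    for req in requires("google-resumable-media") or []:
        print(req)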

    :rotating_light: triage me 
    opened by mlissner 10
  • feat: add _async_resumable_media experimental support

    This is a merge of multiple previously reviewed PRs #178 #176 #153

    It says 9k lines, but keep in mind that technically this is all previously reviewed code. The areas to focus on are changes to existing public interfaces, since the async code should all be boxed into an internal area; concretely, that means setup.py and noxfile.py.

    cla: yes kokoro:run 
    opened by crwilcox 9
  • feat(resumable-media): add customizable timeouts to upload/download methods

    Fixes #45.

    This PR adds configurable timeouts to the upload and download objects.

    Motivation: We would like to add timeouts to BigQuery methods that depend on the file upload functionality (using resumable media), so that requests do not get stuck indefinitely at the transport layer.

    Currently there is a default timeout used (61, 60), but that might not suit all use cases, hence the need for optional explicit timeout parameters.

    CAUTION: Changing public function signatures might break some libraries that use resumable media (even though the new arguments are optional and positioned last), and this will not be caught by the test suite.

    This happened recently when we added timeouts to google-auth, so it is advisable that code owners of the dependent libraries take a look, too (if feasible).
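    A minimal sketch of the resulting call shape, reusing the names from the download sketch in the overview above; the (connect, read) tuple follows requests' semantics, with (61, 60) as the default mentioned above:

    from google.resumable_media.requests import Download

    download = Download(media_url, stream=stream)

    # Optional explicit timeout; omit it to keep the (61, 60) default.
    download.consume(transport, timeout=(10, 30))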

    cla: yes 
    opened by plamut 9
  • Import error despite having satisfactory version of requests

    Using a virtualenv for installing dependencies through pip. I notice that pip installs the following versions of requests and google-resumable-media:

    requests-2.21.0 google-resumable-media-0.3.2

    But I'm still met with the following import error:

    Traceback (most recent call last):
      File "some/proprietary/file.py", line X, in <module>
      File "/tmp/pip-install-itP2Ai/pyinstaller/PyInstaller/loader/pyimod03_importers.py", line 395, in load_module
      File "another/proprietary/file.py", line Y, in <module>
      File "/tmp/pip-install-itP2Ai/pyinstaller/PyInstaller/loader/pyimod03_importers.py", line 395, in load_module
      File "site-packages/google/cloud/storage/__init__.py", line 39, in <module>
      File "/tmp/pip-install-itP2Ai/pyinstaller/PyInstaller/loader/pyimod03_importers.py", line 395, in load_module
      File "site-packages/google/cloud/storage/blob.py", line 44, in <module>
      File "/tmp/pip-install-itP2Ai/pyinstaller/PyInstaller/loader/pyimod03_importers.py", line 395, in load_module
      File "site-packages/google/resumable_media/requests/__init__.py", line 673, in <module>
      File "site-packages/six.py", line 737, in raise_from
    ImportError: ``requests >= 2.18.0`` is required by the ``google.resumable_media.requests`` subpackage.
    It can be installed via
        pip install google-resumable-media[requests].
    

    Even after adding google-resumable-media[requests] to the set of pip dependencies being installed, the same message appears.

    type: question 
    opened by kingkupps 9
  • BigQuery: 400 PUT: Unknown upload notification type: 5

    This error seems to happen randomly; running again with the same configuration usually succeeds. The CSV has around 10M rows / 1G compressed.

    Environment details

    OS type and version

    Linux 58900130398d 4.9.184-linuxkit #1 SMP Tue Jul 2 22:58:16 UTC 2019 x86_64 GNU/Linux

    Python version and virtual environment information: python --version

    Python 3.6.8

    google-cloud- version: pip show google-<service> or pip freeze

    google-cloud-bigquery==1.23.0

    Steps to reproduce

    Unable to reproduce, seems to happen randomly.

    Code example

            with open(csv_file_path, 'rb') as readable:
                job_config = bigquery.LoadJobConfig()
                job_config.source_format = 'CSV'
                job_config.skip_leading_rows = 1
                job_config.allow_quoted_newlines = True
    
                job = self.bigquery_client.load_table_from_file(
                    readable,
                    table_reference,
                    job_config=job_config
                )
    
                return job
    

    https://github.com/keboola/google-bigquery-writer/blob/ef75b76208510a23dec840446623c75bb2d73286/google_bigquery_writer/writer.py#L127

    Stack trace

    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/google/cloud/bigquery/client.py", line 1494, in load_table_from_file
        file_obj, job_resource, num_retries
      File "/usr/local/lib/python3.6/site-packages/google/cloud/bigquery/client.py", line 1806, in _do_resumable_upload
        response = upload.transmit_next_chunk(transport)
      File "/usr/local/lib/python3.6/site-packages/google/resumable_media/requests/upload.py", line 427, in transmit_next_chunk
        self._process_response(response, len(payload))
      File "/usr/local/lib/python3.6/site-packages/google/resumable_media/_upload.py", line 597, in _process_response
        callback=self._make_invalid,
      File "/usr/local/lib/python3.6/site-packages/google/resumable_media/_helpers.py", line 96, in require_status_code
        *status_codes
    google.resumable_media.common.InvalidResponse: ('Request failed with status code', 400, 'Expected one of', <HTTPStatus.OK: 200>, 308)
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "./main.py", line 11, in main
        application.run()
      File "/home/google_bigquery_writer/app.py", line 127, in run
        self.action_run()
      File "/home/google_bigquery_writer/app.py", line 179, in action_run
        incremental=incremental
      File "/home/google_bigquery_writer/writer.py", line 145, in write_table_sync
        incremental=incremental
      File "/home/google_bigquery_writer/writer.py", line 127, in write_table
        job_config=job_config
      File "/usr/local/lib/python3.6/site-packages/google/cloud/bigquery/client.py", line 1501, in load_table_from_file
        raise exceptions.from_http_response(exc.response)
    google.api_core.exceptions.BadRequest: 400 PUT https://bigquery.googleapis.com/upload/bigquery/v2/projects/***/jobs?uploadType=resumable&upload_id=***: Unknown upload notification type: 5
    
    type: bug priority: p1 needs more info external api: bigquery 
    opened by ondrejhlavacek 8
  • chore(main): release 2.4.1

    :robot: I have created a release beep boop

    2.4.1 (2023-01-06)

    Bug Fixes

    • Avoid validating checksums for partial responses (#361) (354287f)

    This PR was generated with Release Please. See documentation.

    autorelease: pending api: storage size: xs 
    opened by release-please[bot] 0
  • Explicit version range needed for dependency pytest-asyncio?

    The system test failure was due to a pytest-asyncio package issue, in which loop cleanup changes were causing test errors. To unblock PR submissions, a fix was submitted to pin pytest-asyncio to the previous working version. Follow up on the dependency change and on whether or not we should keep it pinned with a version range.

    type: process api: storage 
    opened by cojenco 1
  • Reduce duplication for tests suite files

    As part of #179 it was raised that we may have an opportunity to refactor some of our resources a bit to avoid duplication. For instance, tests_async/system/credentials.json.enc could likely be shared with sync.

    type: process api: storage 
    opened by crwilcox 0
  • Should an explicit dependency on google-auth be added?

    It seems we don't currently have an explicit dependency on google-auth, though testing requires it. I reverted a change I made as part of the async work, but am opening this issue for further discussion. I am guessing we have a use of resumable media that isn't using google-auth?

    https://github.com/googleapis/google-resumable-media-python/pull/179/commits/13ede9152183e732791c9c3030793e5dbf49d648

    type: question api: storage priority: p3 
    opened by crwilcox 1
  • Support cancelling a Resumable Upload

    Support for this API https://cloud.google.com/storage/docs/performing-resumable-uploads#cancel-upload in https://github.com/googleapis/google-resumable-media-python/blob/master/google/resumable_media/requests/upload.py#L152 would be a good addition.
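    A hedged sketch of what cancellation amounts to today, done by hand: per the linked docs, a DELETE to the upload session URI returns 499 on success. A cancel() method is the feature being requested, not an existing API.

    # `upload` is an in-progress ResumableUpload and `transport` an
    # authorized session; resumable_url is the session URI created when
    # the upload was initiated.
    response = transport.request("DELETE", upload.resumable_url)

    # Per the GCS docs, a cancelled session answers with 499.
    assert response.status_code == 499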

    type: feature request api: storage 
    opened by romil93 1
Releases: v2.4.0

Owner: Google APIs (Clients for Google APIs and tools that help produce them)