Pandas Google BigQuery

Overview

pandas-gbq


pandas-gbq is a package providing an interface to the Google BigQuery API from pandas.

Installation

Install latest release version via conda

$ conda install pandas-gbq --channel conda-forge

Install latest release version via pip

$ pip install pandas-gbq

Install latest development version

$ pip install git+https://github.com/pydata/pandas-gbq.git

Usage

See the pandas-gbq documentation for more details.
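For example, a minimal round trip looks roughly like this (the project ID and destination table below are placeholders):

import pandas_gbq

# Run a query and load the result into a DataFrame.
df = pandas_gbq.read_gbq(
    "SELECT name, SUM(number) AS total "
    "FROM `bigquery-public-data.usa_names.usa_1910_2013` GROUP BY name",
    project_id="my-project-id",
)

# Write a DataFrame to a BigQuery table, replacing it if it already exists.
pandas_gbq.to_gbq(df, "my_dataset.my_table", project_id="my-project-id", if_exists="replace")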

Comments
  • ENH: Convert read_gbq() function to use google-cloud-python

    ENH: Convert read_gbq() function to use google-cloud-python

    Description

    I've rewritten the current read_gbq() function using google-cloud-python, which handles the naming of structs and arrays out of the box. For more discussion about this, see: https://github.com/pydata/pandas-gbq/issues/23.

    ~However, because google-cloud-python potentially uses different authentication flows and may break existing behavior, I've left the existing read_gbq() function and named this new function from_gbq(). If in the future we are able to reconcile the authentication flows and/or decide to deprecate flows that are not supported in google-cloud-python, we can rename this to read_gbq().~

    UPDATE: As requested in a comment by @jreback (https://github.com/pydata/pandas-gbq/pull/25/files/a763cf071813c836b7e00ae40ccf14e93e8fd72b#r110518161), I deleted the old read_gbq() and named my new function read_gbq(), removing all legacy functions and code.

    Added a few lines to the requirements file, but I'll leave it to you, @jreback, to deal with the conda dependency issues you mentioned in issue 23.

    Let me know if you have any questions or if any tests need to be written. You can confirm that it works by running the following:

    q = """
    select ROW_NUMBER() over () row_num, struct(a,b) col, c, d, c*d c_times_d, e
    from
    (select * from
        (SELECT 1 a, 2 b, null c, 0 d, 100 e)
        UNION ALL
        (SELECT 5 a, 6 b, 0 c, null d, 200 e)
        UNION ALL
        (SELECT 8 a, 9 b, 10.0 c, 10 d, 300 e)
    )
    """
    df = gbq.read_gbq(q, dialect='standard')
    df
    

    | row_num | col                | c    | d    | c_times_d | e   |
    |---------|--------------------|------|------|-----------|-----|
    | 2       | {u'a': 5, u'b': 6} | 0.0  | NaN  | NaN       | 200 |
    | 1       | {u'a': 1, u'b': 2} | NaN  | 0.0  | NaN       | 100 |
    | 3       | {u'a': 8, u'b': 9} | 10.0 | 10.0 | 100.0     | 300 |

    q = """
    select array_agg(a) mylist
    from
    (select "1" a UNION ALL select "2" a)
    """
    df = gbq.read_gbq(q, dialect='standard')
    df
    

    | mylist |
    |--------|
    | [1, 2] |

    q = """
    select array_agg(struct(a,b)) col, f
    from
    (select * from
        (SELECT 1 a, 2 b, null c, 0 d, 100 e, "hello" f)
        UNION ALL
        (SELECT 5 a, 6 b, 0 c, null d, 200 e, "ok" f)
        UNION ALL
        (SELECT 8 a, 9 b, 10.0 c, 10 d, 300 e, "ok" f)
    )
    group by f
    """
    df = gbq.read_gbq(q, dialect='standard')
    df
    

    | col                                      | f     |
    |------------------------------------------|-------|
    | [{u'a': 5, u'b': 6}, {u'a': 8, u'b': 9}] | ok    |
    | [{u'a': 1, u'b': 2}]                     | hello |

    Confirmed that col_order and index_col still work ~(feel free to pull that out into a separate function, since there's now redundant code with read_gbq())~. I also removed the type-conversion lines, which appear to be unnecessary: google-cloud-python and/or pandas seems to do the necessary type conversion automatically, even when there are nulls (you can confirm by examining the dtypes of the resulting dataframes).

    type: feature request 
    opened by jasonqng 55
  • Performance

    Performance

    We're starting to use BigQuery heavily but are increasingly bottlenecked by the performance of moving moderate amounts of data from BigQuery to Python.

    Here are a few stats:

    • 29.1s: Pulling 500k rows with 3 columns of data (with cached data) using pandas-gbq
    • 36.5s: Pulling the same query with google-cloud-bigquery - i.e. client.query(query).to_dataframe()
    • 2.4s: Pulling very similar data - same types, same size, from our existing MSSQL box hosted in AWS (using pd.read_sql). That's on standard drivers, nothing like turbodbc involved

    ...so using BigQuery with Python is at least an order of magnitude slower than traditional DBs.

    We've tried exporting tables to CSV on GCS and reading those in, which works fairly well for data processes, though not for exploration.

    A few questions - feel free to jump in with partial replies:

    • Are these results expected, or are we doing something very wrong?
    • My prior is that a lot of this slowdown is caused by pulling in paginated HTTP responses, converting them to Python objects, and then writing those into arrays. Is this approach really scalable? Should pandas-gbq invest resources in getting a format that's queryable in exploratory workflows and can handle more reasonably sized datasets (or at least encourage Google to)? One possible mitigation is sketched below.
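    One mitigation in later pandas-gbq releases is fetching results via the BigQuery Storage API, which streams Arrow/Avro blocks instead of paginated JSON over HTTP. A sketch, assuming a pandas-gbq version that still exposes the use_bqstorage_api flag and that google-cloud-bigquery-storage is installed (recent releases use the Storage API automatically and have deprecated the flag); the table name is a placeholder:

    import pandas_gbq

    df = pandas_gbq.read_gbq(
        "SELECT * FROM `my-project.my_dataset.my_table`",  # placeholder table
        project_id="my-project",
        use_bqstorage_api=True,  # requires google-cloud-bigquery-storage
    )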
    opened by max-sixty 45
  • BUG: oauth2client deprecated, use google-auth instead.

    BUG: oauth2client deprecated, use google-auth instead.

    Remove the use of oauth2client and use the google-auth library instead.

    Rather than check for multiple versions of the libraries, use the setup.py to specify compatible versions. I believe this is safe since Pandas checks for the pandas_gbq package.

    Since google-auth does not use the argparse module to override user authentication flow settings, add a parameter to choose between the web and console flow.

    Closes https://github.com/pydata/pandas-gbq/issues/37.
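    For reference, the google-auth path in current pandas-gbq lets you build scoped credentials explicitly and hand them in; a sketch (the key path and project ID are placeholders):

    import pandas_gbq
    from google.oauth2 import service_account

    # Build credentials with the BigQuery scope attached up front.
    credentials = service_account.Credentials.from_service_account_file(
        "/path/to/key.json",  # placeholder path
        scopes=["https://www.googleapis.com/auth/bigquery"],
    )
    df = pandas_gbq.read_gbq("SELECT 1", project_id="my-project", credentials=credentials)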

    type: bug 
    opened by tswast 41
  • to_gbq result in UnicodeEncodeError

    to_gbq result in UnicodeEncodeError

    Hi, I'm using Heroku to run a python based ETL process where I'm pushing the contents of a Pandas dataframe into Google BQ using to_gbq. However, it's generating a UnicodeEncodeError with the following stack trace, due to some non-latin characters.

    Strangely, this works fine on my Mac, but it fails when I run it on Heroku. It seems that http.client is getting an unencoded string rather than bytes, so it tries to encode the body with Latin-1 (the default), which obviously chokes on anything non-Latin, like Chinese characters.
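    The encoding failure itself is easy to reproduce in isolation, since Latin-1 cannot represent these characters at all:

    >>> "信用卡".encode("latin-1")
    Traceback (most recent call last):
      ...
    UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-2: ordinal not in range(256)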

    Load is 100.0% Complete
    Traceback (most recent call last):
      File "AllCostAndRev.py", line 534, in <module>
        main(yaml.dump(data=ads_dict))
      File "AllCostAndRev.py", line 475, in main
        private_key=environ['skynet_bq_pk']
      File "/app/.heroku/python/lib/python3.6/site-packages/pandas_gbq/gbq.py", line 989, in to_gbq
        connector.load_data(dataframe, dataset_id, table_id, chunksize)
      File "/app/.heroku/python/lib/python3.6/site-packages/pandas_gbq/gbq.py", line 590, in load_data
        job_config=job_config).result()
      File "/app/.heroku/python/lib/python3.6/site-packages/google/cloud/bigquery/client.py", line 748, in load_table_from_file
        file_obj, job_resource, num_retries)
      File "/app/.heroku/python/lib/python3.6/site-packages/google/cloud/bigquery/client.py", line 777, in _do_resumable_upload
        response = upload.transmit_next_chunk(transport)
      File "/app/.heroku/python/lib/python3.6/site-packages/google/resumable_media/requests/upload.py", line 395, in transmit_next_chunk
        retry_strategy=self._retry_strategy)
      File "/app/.heroku/python/lib/python3.6/site-packages/google/resumable_media/requests/_helpers.py", line 101, in http_request
        func, RequestsMixin._get_status_code, retry_strategy)
      File "/app/.heroku/python/lib/python3.6/site-packages/google/resumable_media/_helpers.py", line 146, in wait_and_retry
        response = func()
      File "/app/.heroku/python/lib/python3.6/site-packages/google/auth/transport/requests.py", line 186, in request
        method, url, data=data, headers=request_headers, **kwargs)
      File "/app/.heroku/python/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
        resp = self.send(prep, **send_kwargs)
      File "/app/.heroku/python/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
        r = adapter.send(request, **kwargs)
      File "/app/.heroku/python/lib/python3.6/site-packages/requests/adapters.py", line 440, in send
        timeout=timeout
      File "/app/.heroku/python/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
        chunked=chunked)
      File "/app/.heroku/python/lib/python3.6/site-packages/urllib3/connectionpool.py", line 357, in _make_request
        conn.request(method, url, **httplib_request_kw)
      File "/app/.heroku/python/lib/python3.6/http/client.py", line 1239, in request
        self._send_request(method, url, body, headers, encode_chunked)
      File "/app/.heroku/python/lib/python3.6/http/client.py", line 1284, in _send_request
        body = _encode(body, 'body')
      File "/app/.heroku/python/lib/python3.6/http/client.py", line 161, in _encode
        (name.title(), data[err.start:err.end], name)) from None
    UnicodeEncodeError: 'latin-1' codec can't encode characters in position 553626-553628: Body ('信用卡') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
    Process exited with status 1

    opened by 2legit 24
  • Import error with pandas_gbq

    Import error with pandas_gbq

    There's a bug with the most recent google-cloud-bigquery library, which causes this error in pandas-gbq:

    ImportError: pandas-gbq requires google-cloud-bigquery: cannot import name 'collections_abc'

    opened by winsonhys 23
  • When appending to a table, load if the dataframe contains a subset of the existing schema

    When appending to a table, load if the dataframe contains a subset of the existing schema

    Purpose

    The current behavior of to_gbq is to fail if the schema of the new data is not equivalent to the current schema. However, this means the load fails if the new data is missing columns that are present in the current schema. For instance, this may occur when the data source I am using to construct the dataframe only provides non-empty values. Rather than determining the current schema of the GBQ table and adding empty columns to my dataframe, I would like to_gbq to load my data if the columns in the dataframe are a subset of the current schema.

    Primary changes made

    • Factoring a schema function out of verify_schema to support both verify_schema and schema_is_subset
    • schema_is_subset determines whether local_schema is a subset of remote_schema
    • the append flag uses schema_is_subset rather than verify_schema to determine if the data can be loaded
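    Roughly, the new check reduces to a field-by-field containment test; a sketch (field dicts follow the BigQuery schema format with name/type keys, and the actual implementation in the PR may differ):

    def schema_is_subset(schema_remote, schema_local):
        # Every field in the local (DataFrame) schema must also appear,
        # with the same name and type, in the remote (table) schema.
        fields_remote = schema_remote["fields"]
        fields_local = schema_local["fields"]
        return all(field in fields_remote for field in fields_local)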

    Auxiliary changes made

    • PROJECT_ID etc are retrieved from an environment variable to facilitate local testing
    • Running test_gbq through autopep8 added a blank line after two class names
    opened by mr-mcox 19
  • Set project_id (and other settings) once for all subsequent queries so you don't have to pass every time

    Set project_id (and other settings) once for all subsequent queries so you don't have to pass every time

    One frustrating thing is having to pass the project_id (among other parameters) every time you write a query. For example, I personally usually use the same project_id, almost always query with standard SQL, and usually turn off verbose. I have to pass those three with every read_gbq call, and the typing adds up.

    Potential options include setting an environment variable and reading default settings from it, but the settings can differ from run to run, and fiddling with environment variables feels unfriendly. My suggestion would be to add a class that wraps read_gbq() and to_gbq() in a client object. You could set the project_id, dialect, and whatever else on the client object, then reuse the object every time you want to run a query with those settings.

    A very naive implementation here in this branch: https://github.com/pydata/pandas-gbq/compare/master...jasonqng:client-object-class?expand=1
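    The core of it amounts to storing the shared settings once and forwarding them on every call; a minimal, hypothetical sketch (not an actual pandas-gbq API):

    import pandas_gbq

    class Client:
        def __init__(self, project_id=None, dialect='legacy', verbose=True):
            self.project_id = project_id
            self.dialect = dialect
            self.verbose = verbose

        def read(self, query, **kwargs):
            # Per-call keyword arguments override the stored defaults.
            kwargs.setdefault('project_id', self.project_id)
            kwargs.setdefault('dialect', self.dialect)
            kwargs.setdefault('verbose', self.verbose)
            return pandas_gbq.read_gbq(query, **kwargs)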

    Usage would be like:

    >>> import gbq
    >>> client = gbq.Client(project_id='project-name',dialect='standard',verbose=False)
    >>> client.read("select 1")
       f0_
    0    1
    >>> client.read("select 2")
       f0_
    0    2
    >>> client.verbose=True
    >>> client.read("select 3")
    Requesting query... ok.
    Job ID: c7d7e4c0-883a-4e14-b35f-61c9fae0c08b
    Query running...
    Query done.
    Processed: 0.0 B Billed: 0.0 B
    Standard price: $0.00 USD
    
    Retrieving results...
    Got 1 rows.
    
    Total time taken 1.66 s.
    Finished at 2018-01-02 14:06:01.
       f0_
    0    3
    

    Does that seem like a reasonable solution to all this extra typing or is there another preferred way? If so, I can open up a PR with the above branch.

    Thanks, my tired fingers thank you all!

    @tswast @jreback @parthea @maxim-lian

    opened by jasonqng 17
  • Structs lack proper names as dicts and arrays get turned into array of dicts

    Structs lack proper names as dicts and arrays get turned into array of dicts

    Version 0.1.4

    This query returns an improperly named dict:

    q = """
    select struct(a,b) col
    from
    (SELECT 1 a, 2 b)
    """
    df = gbq.read_gbq(q, dialect='standard', verbose=False)
    

    image

    Compare with the result from BigQuery: image

    An array of items also gets turned into an array of dicts sometimes. For example:

    q = """
    select array_agg(a)
    from
    (select "1" a UNION ALL select "2" a)
    """
    gbq.read_gbq(q, dialect='standard', verbose=False, project_id='project')
    

    outputs: image

    Compare to BigQuery: image

    These issues may or may not be related?

    type: bug help wanted 
    opened by jasonqng 17
  • Printing rather than logging?

    Printing rather than logging?

    We're printing in addition to logging when querying from BigQuery. This makes controlling the output much harder, aside from being unidiomatic.

    Printing in white, logging in red:

    https://cloud.githubusercontent.com/assets/5635139/23176541/6028b884-f831-11e6-911a-48aa7741a4da.png
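    Once everything goes through logging alone, callers can control the output with the standard machinery; a sketch, assuming pandas-gbq emits its messages under the pandas_gbq logger namespace (the point of this issue being that print() output cannot be silenced this way):

    import logging

    # Silence pandas-gbq chatter, or route it wherever the application logs go.
    logging.getLogger("pandas_gbq").setLevel(logging.WARNING)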

    type: feature request 
    opened by max-sixty 17
  • BUG: Add bigquery scope for google credentials

    BUG: Add bigquery scope for google credentials

    BigQuery requires scoped credentials when loading application default credentials.

    Quick test code below; it returns an "invalid token" error. When the create_scoped() statement is uncommented, the code runs correctly without any error.

    # use google default application credentials; set in the shell before running Python:
    #   export GOOGLE_APPLICATION_CREDENTIALS=/PATH/TO/GOOGLE_DEFAULT_CREDENTIALS.json
    
    import httplib2
    
    from googleapiclient.discovery import build
    from oauth2client.client import GoogleCredentials
    
    credentials = GoogleCredentials.get_application_default()
    #credentials = credentials.create_scoped('https://www.googleapis.com/auth/bigquery')
    
    http = httplib2.Http()
    http = credentials.authorize(http)
    
    service = build('bigquery', 'v2', http=http)
    
    jobs = service.jobs()
    job_data = {'configuration': {'query': {'query': 'SELECT 1'}}}
    
    jobs.insert(projectId='projectid', body=job_data).execute()
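
    For comparison, with the newer google-auth library the scope can be requested up front when loading the default credentials; a sketch:

    import google.auth

    # google.auth.default() accepts scopes directly, so no separate
    # create_scoped() step is needed.
    credentials, project_id = google.auth.default(
        scopes=["https://www.googleapis.com/auth/bigquery"]
    )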
    
    type: bug 
    opened by xcompass 16
  • read_gbq() unnecessarily waiting on getting default credentials from Google

    read_gbq() unnecessarily waiting on getting default credentials from Google

    When attempting to grant pandas access to my GBQ project, I am running into an issue where read_gbq tries to get default credentials, fails / times out, and then prints out a URL to visit to grant credentials. Since I'm not running this on Google Cloud Platform, I do not expect to be able to get default credentials. In my case, I only want to run the CLI flow (without OAuth calling back to my local server).

    Here's the code

    >>> import pandas_gbq as gbq
    >>> gbq.read_gbq('SELECT 1', project_id=<project_id>, auth_local_webserver=False)
    

    Here's what I see when I trigger a SIGINT once the query is invoked:

      File "/usr/lib/python3.5/site-packages/pandas_gbq/gbq.py", line 214, in get_credentials
        credentials = self.get_application_default_credentials()
      File "/usr/lib/python3.5/site-packages/pandas_gbq/gbq.py", line 243, in get_application_default_credentials
        credentials, _ = google.auth.default(scopes=[self.scope])
      File "/usr/lib/python3.5/site-packages/google/auth/_default.py", line 277, in default
        credentials, project_id = checker()
      File "/usr/lib/python3.5/site-packages/google/auth/_default.py", line 274, in <lambda>
        lambda: _get_gce_credentials(request))
      File "/usr/lib/python3.5/site-packages/google/auth/_default.py", line 176, in _get_gce_credentials
        if _metadata.ping(request=request):
      File "/usr/lib/python3.5/site-packages/google/auth/compute_engine/_metadata.py", line 73, in ping
        timeout=timeout)
      File "/usr/lib/python3.5/site-packages/google/auth/transport/_http_client.py", line 103, in __call__
        method, path, body=body, headers=headers, **kwargs)
      File "/usr/lib/python3.5/http/client.py", line 1106, in request
        self._send_request(method, url, body, headers)
      File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request
        self.endheaders(body)
      File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders
        self._send_output(message_body)
      File "/usr/lib/python3.5/http/client.py", line 934, in _send_output
        self.send(msg)
      File "/usr/lib/python3.5/http/client.py", line 877, in send
        self.connect()
      File "/usr/lib/python3.5/http/client.py", line 849, in connect
        (self.host,self.port), self.timeout, self.source_address)
      File "/usr/lib/python3.5/socket.py", line 702, in create_connection
        sock.connect(sa)
    KeyboardInterrupt
    

    I've also tried setting the env variable GOOGLE_APPLICATIONS_CREDENTIALS to empty. I'm using pandas-gbq version at commit 64a19b.
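    With current pandas-gbq releases, one way to sidestep the metadata-server probe entirely is to obtain user credentials explicitly and pass them in (a credentials parameter that was not available when this issue was filed); a sketch, assuming pydata-google-auth is installed and the project ID is a placeholder:

    import pandas_gbq
    import pydata_google_auth

    # Runs the user OAuth flow directly instead of probing for default credentials.
    credentials = pydata_google_auth.get_user_credentials(
        ["https://www.googleapis.com/auth/bigquery"],
    )
    df = pandas_gbq.read_gbq("SELECT 1", project_id="my-project", credentials=credentials)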

    type: bug type: cleanup 
    opened by dfontenot 15
  • docs: fix reading dtypes

    docs: fix reading dtypes

    Hello. I faced the same confusion as #579, so I have tried to update the docs.

    Not only the BQ DATE type but also the TIME, TIMESTAMP, and FLOAT64 types seem to be wrong.

    It seems to be due to a breaking change in google-cloud-bigquery v3.
    https://cloud.google.com/python/docs/reference/bigquery/latest/upgrading#changes-to-data-types-loading-a-pandas-dataframe

    I have confirmed the correct dtypes as:

    >>> import pandas
    
    >>> sql1 = """
    SELECT
      TRUE AS BOOL,
      123 AS INT64,
      123.456 AS FLOAT64,
    
      TIME '12:30:00.45' AS TIME,
      DATE "2023-01-01" AS DATE,
      DATETIME "2023-01-01 12:30:00.45" AS DATETIME,
      TIMESTAMP "2023-01-01 12:30:00.45" AS TIMESTAMP
    """
    
    >>> pandas.read_gbq(sql1).dtypes
    BOOL                        boolean
    INT64                         Int64
    FLOAT64                     float64
    TIME                         dbtime
    DATE                         dbdate
    DATETIME             datetime64[ns]
    TIMESTAMP       datetime64[ns, UTC]
    dtype: object
    
    >>> sql2 = """
    SELECT
      DATE "2023-01-01" AS DATE,
      DATETIME "2023-01-01 12:30:00.45" AS DATETIME,
      TIMESTAMP "2023-01-01 12:30:00.45" AS TIMESTAMP,
    UNION ALL
    SELECT
      DATE "2263-04-12" AS DATE,
      DATETIME "2263-04-12 12:30:00.45" AS DATETIME,
      TIMESTAMP "2263-04-12 12:30:00.45" AS TIMESTAMP
    """
    
    >>> pandas.read_gbq(sql2).dtypes
    DATE         object
    DATETIME     object
    TIMESTAMP    object
    dtype: object
    

    Fixes #579 🦕

    api: bigquery size: s 
    opened by yokomotod 1
  • feat: adds ability to provide redirect uri

    feat: adds ability to provide redirect uri

    WIP PR for discussion: aiming to provide the ability to include a redirect URI, client ID, and client secrets to facilitate the migration away from "out of band" OAuth authentication.

    @tswast

    See also changes in these repos:

    • https://github.com/googleapis/python-bigquery-pandas/pull/595 #python-bigquery-pandas
    • https://github.com/googleapis/google-auth-library-python-oauthlib/pull/259
    • https://github.com/pydata/pydata-google-auth/pull/58
    api: bigquery size: m 
    opened by chalmerlowe 1
  • Problems installing the package on macOS M1 chip

    Problems installing the package on macOS M1 chip

    Hi,

    I am having problems installing this package on macOS with an M1 chip.

    The error:

    Could not find <Python.h>. This could mean the following:
      * You're on Ubuntu and haven't run apt-get install python3-dev.
      * You're on RHEL/Fedora and haven't run yum install python3-devel or dnf install python3-devel
        (make sure you also have redhat-rpm-config installed)
      * You're on Mac OS X and the usual Python framework was somehow corrupted
        (check your environment variables or try re-installing?)
      * You're on Windows and your Python installation was somehow corrupted
        (check your environment variables or try re-installing?)

      [end of output]
    

    note: This error originates from a subprocess, and is likely not a problem with pip. error: legacy-install-failure

    × Encountered error while trying to install package. ╰─> grpcio

    note: This is an issue with the package mentioned above, not pip. hint: See above for output from the failure.

    Environment details

    • OS type and version: MacOS 12.4 (chip M1)
    • Python version: Python 3.10.0
    • pip version: pip 22.3.1

    Thanks!

    api: bigquery 
    opened by davidgarciatwenix 0
  • Ability to handle a dry_run

    Ability to handle a dry_run

    Hi, after checking out the pandas_gbq.read_gbq call parametrization, I see that I can supply configuration={'dry_run': True} to make the query job a dry run.

    However, it will still attempt to find the query destination and download rows from it, which in this case will be nonexistent. It would be great if pandas_gbq were aware of dry_run and just output the query stats to the debug log or returned some stats data.

    e.g. querying something like this: pandas_gbq.read_gbq("SELECT * FROM 'my_project_id.billing_ds.cloud_pricing_export'", configuration={'dry_run': True})

    still results in the exception

    Traceback (most recent call last):
      File "big_query_utils.py", line 134, in <module>
        print(read_df("SELECT * FROM 'my_project_id.billing_ds.cloud_pricing_export'", configuration={'dry_run': True}))
      File "/Users/.../big_query/big_query_utils.py", line 95, in read_df
        return pandas_gbq.read_gbq(sql_or_table_id, **gbq_kwargs)
      File "/Users/.../lib/python3.9/site-packages/pandas_gbq/gbq.py", line 921, in read_gbq
        final_df = connector.run_query(
      File "/Users/.../lib/python3.9/site-packages/pandas_gbq/gbq.py", line 526, in run_query
        rows_iter = self.client.list_rows(
      File "/Users/.../lib/python3.9/site-packages/google/cloud/bigquery/client.py", line 3790, in list_rows
        table = self.get_table(table.reference, retry=retry, timeout=timeout)
      File "/Users/.../lib/python3.9/site-packages/google/cloud/bigquery/client.py", line 1034, in get_table
        api_response = self._call_api(
      File "/Users/.../lib/python3.9/site-packages/google/cloud/bigquery/client.py", line 782, in _call_api
        return call()
      File "/Users/.../lib/python3.9/site-packages/google/api_core/retry.py", line 283, in retry_wrapped_func
        return retry_target(
      File "/Users/.../lib/python3.9/site-packages/google/api_core/retry.py", line 190, in retry_target
        return target()
      File "/Users/.../lib/python3.9/site-packages/google/cloud/_http/__init__.py", line 494, in api_request
        raise exceptions.from_http_response(response)
    google.api_core.exceptions.NotFound: 404 GET https://bigquery.googleapis.com/bigquery/v2/projects/my_project_id/datasets/_6a20f817b1e72d456384bdef157062be9989000e/tables/anon71d825e7efee2856ce2b5e50a3df3a2579fd5583d14740ca3064bab740c8ffd9?prettyPrint=false: Not found: Table my_project_id:_6a20f817b1e72d456384bdef157062be9989000e.anon71d825e7efee2856ce2b5e50a3df3a2579fd5583d14740ca3064bab740c8ffd9
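    Until pandas_gbq understands dry_run, the stats can be pulled by calling google-cloud-bigquery directly; a sketch:

    from google.cloud import bigquery

    client = bigquery.Client(project="my_project_id")
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(
        "SELECT * FROM `my_project_id.billing_ds.cloud_pricing_export`",
        job_config=job_config,
    )
    # A dry-run job never produces a destination table; it only reports stats.
    print(f"This query would process {job.total_bytes_processed} bytes.")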

    type: feature request api: bigquery 
    opened by ehborisov 0
  • NUMERIC Field failing with conversion from NoneType to Decimal is not supported

    NUMERIC Field failing with conversion from NoneType to Decimal is not supported

    Saving data to a NUMERIC field fails with "conversion from NoneType to Decimal is not supported".

    • python 3.9
    • pandas 1.5.1

    Stack trace

    
    ...........

      df.to_gbq(project_id=self.client.project,
    File "/Users/xxxx/.local/lib/python3.9/site-packages/pandas-1.5.1-py3.9-macosx-10.9-x86_64.egg/pandas/core/frame.py", line 2168, in to_gbq
      gbq.to_gbq(
    File "/Users/xxxx/.local/lib/python3.9/site-packages/pandas-1.5.1-py3.9-macosx-10.9-x86_64.egg/pandas/io/gbq.py", line 218, in to_gbq
      pandas_gbq.to_gbq(
    File "/Users/xxxx/.local/lib/python3.9/site-packages/pandas_gbq-0.17.9-py3.9.egg/pandas_gbq/gbq.py", line 1198, in to_gbq
      connector.load_data(
    File "/Users/xxxx/.local/lib/python3.9/site-packages/pandas_gbq-0.17.9-py3.9.egg/pandas_gbq/gbq.py", line 591, in load_data
      chunks = load.load_chunks(
    File "/Users/xxxx/.local/lib/python3.9/site-packages/pandas_gbq-0.17.9-py3.9.egg/pandas_gbq/load.py", line 240, in load_chunks
      load_parquet(
    File "/Users/xxxx/.local/lib/python3.9/site-packages/pandas_gbq-0.17.9-py3.9.egg/pandas_gbq/load.py", line 128, in load_parquet
      dataframe = cast_dataframe_for_parquet(dataframe, schema)
    File "/Users/xxxx/.local/lib/python3.9/site-packages/pandas_gbq-0.17.9-py3.9.egg/pandas_gbq/load.py", line 103, in cast_dataframe_for_parquet
      cast_column = dataframe[column_name].map(decimal.Decimal)
    File "/Users/xxxx/.local/lib/python3.9/site-packages/pandas-1.5.1-py3.9-macosx-10.9-x86_64.egg/pandas/core/series.py", line 4539, in map
      new_values = self._map_values(arg, na_action=na_action)
    File "/Users/xxxx/.local/lib/python3.9/site-packages/pandas-1.5.1-py3.9-macosx-10.9-x86_64.egg/pandas/core/base.py", line 890, in _map_values
      new_values = map_f(values, mapper)
    File "pandas/_libs/lib.pyx", line 2918, in pandas._libs.lib.map_infer
    TypeError: conversion from NoneType to Decimal is not supported
    
    api: bigquery 
    opened by ismailsimsek 1
  • ImportError: cannot import name 'external_account_authorized_user' from 'google.auth'

    ImportError: cannot import name 'external_account_authorized_user' from 'google.auth'

    Environment details

    • OS type and version: Linux
    • Python version: 3.9
    • pip version: 21.2
    • pandas-gbq version: 0.17.9

    Steps to reproduce

    1. Running a simple query using
    test = pd.read_gbq('select * from `data-production.dwh_core.transaction_code` limit 1', 
                       project_id='data-production', 
                       dialect='standard', 
                       location='asia-southeast2')
    

    Results in:

    ImportError: cannot import name 'external_account_authorized_user' from 'google.auth' (/opt/conda/lib/python3.8/site-packages/google/auth/__init__.py)
    

    Stack trace

    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    <ipython-input-2-99714eec0d64> in <module>
    ----> 1 import pydata_google_auth
    
    /opt/conda/lib/python3.8/site-packages/pydata_google_auth/__init__.py in <module>
    ----> 1 from .auth import default
          2 from .auth import get_user_credentials
          3 from .auth import load_user_credentials
          4 from .auth import save_user_credentials
          5 from .auth import load_service_account_credentials
    
    /opt/conda/lib/python3.8/site-packages/pydata_google_auth/auth.py in <module>
          6 import google.auth.exceptions
          7 import google.oauth2.credentials
    ----> 8 from google_auth_oauthlib import flow
          9 import oauthlib.oauth2.rfc6749.errors
         10 import google.auth.transport.requests
    
    /opt/conda/lib/python3.8/site-packages/google_auth_oauthlib/__init__.py in <module>
         19 """
         20 
    ---> 21 from .interactive import get_user_credentials
         22 
         23 __all__ = ["get_user_credentials"]
    
    /opt/conda/lib/python3.8/site-packages/google_auth_oauthlib/interactive.py in <module>
         25 import socket
         26 
    ---> 27 import google_auth_oauthlib.flow
         28 
         29 
    
    /opt/conda/lib/python3.8/site-packages/google_auth_oauthlib/flow.py in <module>
         67 import google.oauth2.credentials
         68 
    ---> 69 import google_auth_oauthlib.helpers
         70 
         71 
    
    /opt/conda/lib/python3.8/site-packages/google_auth_oauthlib/helpers.py in <module>
         25 import json
         26 
    ---> 27 from google.auth import external_account_authorized_user
         28 import google.oauth2.credentials
         29 import requests_oauthlib
    
    ImportError: cannot import name 'external_account_authorized_user' from 'google.auth' (/opt/conda/lib/python3.8/site-packages/google/auth/__init__.py)
    

    Resolution

    We had to downgrade google-auth-oauthlib to 0.5.3:

     !pip install google-auth-oauthlib==0.5.3
    

    It seems like the most recent change (October 25) broke something: https://pypi.org/project/google-auth-oauthlib/#history

    api: bigquery 
    opened by benjamintanweihao 0
Releases: v0.18.1