Client library to download and publish models and other files on the huggingface.co hub

Last update: Jan 01, 2023

Overview

`huggingface_hub`


Do you have an open source ML library? We're looking to partner with a small number of other cool open source ML libraries to provide model hosting + versioning. https://twitter.com/julien_c/status/1336374565157679104 https://twitter.com/mnlpariente/status/1336277058062852096

Advantages are:

versioning is built-in (as hosting is built around git and git-lfs), no lock-in, you can just git clone away.

anyone can upload a new model for your library, just need to add the corresponding tag for the model to be discoverable – no more need for a hardcoded list in your code

Fast downloads! We use Cloudfront (a CDN) to geo-replicate downloads so they're blazing fast from anywhere on the globe

Usage stats and more features to come

Ping us if interested 😎

♻️ Partial list of implementations in third party libraries:

Download files from the huggingface.co hub

Integration inside a library is super simple. We expose two functions, hf_hub_url() and cached_download().

`hf_hub_url`

hf_hub_url() takes:

a model id (like julien-c/EsperBERTo-small i.e. a user or organization name and a repo name, separated by /),
a filename (like pytorch_model.bin),
and an optional git revision id (can be a branch name, a tag, or a commit hash)

and returns the url we'll use to download the actual files: https://huggingface.co/julien-c/EsperBERTo-small/resolve/main/pytorch_model.bin
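As a rough illustration of the URL shape described above, the sketch below rebuilds the example URL using only the standard library. The helper name `build_resolve_url` and the template constant are hypothetical, inferred from the example URL; the real hf_hub_url() may accept more parameters and handle more cases.

```python
from typing import Optional
from urllib.parse import quote

# Assumed template, inferred from the example URL in the text above.
HUB_URL_TEMPLATE = "https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def build_resolve_url(repo_id: str, filename: str, revision: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the URL shape that hf_hub_url() returns."""
    if revision is None:
        revision = "main"  # default branch, as in the example URL
    return HUB_URL_TEMPLATE.format(
        repo_id=repo_id,
        # Assumption: escape the revision so it stays a single path segment.
        revision=quote(revision, safe=""),
        filename=filename,
    )


url = build_resolve_url("julien-c/EsperBERTo-small", "pytorch_model.bin")
print(url)
```

Passing a branch name, tag or commit hash as `revision` only changes the path segment after `resolve/`.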

If you check out this URL's headers with a HEAD http request (which you can do from the command line with curl -I) for a few different files, you'll see that:

small files are returned directly
large files (i.e. the ones stored through git-lfs) are returned via a redirect to a Cloudfront URL. Cloudfront is a Content Delivery Network, or CDN, that ensures that downloads are as fast as possible from anywhere on the globe.

`cached_download`

cached_download() takes the following parameters, downloads the remote file, stores it to disk (in a versioning-aware way) and returns its local file path.

Parameters:

a remote url
your library's name and version (library_name and library_version), which will be added to the HTTP requests' user-agent so that we can provide some usage stats.
a cache_dir which you can specify if you want to control where on disk the files are cached.

Check out the source code for all possible params (we'll create a real doc page in the future).
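One way to picture "versioning-aware" storage is a cache file name that depends on both the URL and a version marker such as the ETag header returned by the server. The sketch below is an assumption made for illustration: the function name `cache_filename` and the hashing scheme are hypothetical and are not confirmed by the text above.

```python
import hashlib
from typing import Optional


def cache_filename(url: str, etag: Optional[str] = None) -> str:
    """Hypothetical naming scheme: hash of the URL, plus hash of the ETag if any."""
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()
    if etag:
        # A new ETag (i.e. a new version of the remote file) yields a new name,
        # so the old version is kept on disk instead of being overwritten.
        name += "." + hashlib.sha256(etag.encode("utf-8")).hexdigest()
    return name


url = "https://huggingface.co/julien-c/EsperBERTo-small/resolve/main/pytorch_model.bin"
print(cache_filename(url, etag='"abc"'))
print(cache_filename(url, etag='"def"'))
```

With such a scheme, two revisions of the same file can coexist in the cache directory.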

Publish models to the huggingface.co hub

Uploading a model to the hub is super simple too:

create a model repo directly from the website, at huggingface.co/new (models can be public or private, and are namespaced under either a user or an organization)
clone it with git
download and install git lfs if you don't already have it on your machine (you can check by running a simple git lfs)
add, commit and push your files, from git, as you usually do.

We are intentionally not wrapping git too much, so that you can go on with the workflow you’re used to and the tools you already know.

👀 To see an example of how we document the model sharing process in transformers, check out https://huggingface.co/transformers/model_sharing.html

Users add tags into their README.md model cards (e.g. your library_name, a domain tag like audio, etc.) to make sure their models are discoverable.
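To make the tagging step concrete, here is a minimal sketch that generates the metadata header of a README.md model card. It assumes the tags live in a YAML block between `---` markers at the top of the file; the helper name `model_card_header` is hypothetical, and the tag values simply reuse the examples from the text.

```python
from typing import List


def model_card_header(tags: List[str]) -> str:
    """Build a metadata header listing the given tags (assumed YAML layout)."""
    lines = ["---", "tags:"]
    lines += [f"- {tag}" for tag in tags]
    lines.append("---")
    return "\n".join(lines) + "\n"


header = model_card_header(["asteroid", "audio"])
print(header)
```

The rest of the model card (free-form Markdown) would follow this header in README.md.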

Documentation about the model hub itself is at https://huggingface.co/docs

API utilities in `hf_api.py`

You don't need them for the standard publishing workflow. However, if you need a programmatic way of creating a repo, deleting it (⚠️ caution), or listing models from the hub, you'll find helpers in hf_api.py.

We also have an API to query models by specific tags (e.g. if you want to list models compatible with your library).

`huggingface-cli`

Those API utilities are also exposed through a CLI:

huggingface-cli login
huggingface-cli logout
huggingface-cli whoami
huggingface-cli repo create

Need to upload large (>5GB) files?

To upload large files (>5GB 🔥 ), you need to install the custom transfer agent for git-lfs, bundled in this package.

To install, just run:

$ huggingface-cli lfs-enable-largefiles

This should be executed once for each model repo that contains a model file >5GB. If you just try to push a file bigger than 5GB without running that command, you will get an error with a message reminding you to run it.

Finally, there's a huggingface-cli lfs-multipart-upload command but that one is internal (called by lfs directly) and is not meant to be called by the user.

Visual integration into the huggingface.co hub

Finally, we'll implement a few tweaks to improve the UX for your models on the website – let's use Asteroid as an example:

Model authors add an asteroid tag to their model card and they get the advantages of model versioning built-in

We add a custom "Use in Asteroid" button.

When clicked you get a library-specific code sample that you'll be able to specify. 🔥

Feedback (feature requests, bugs, etc.) is super welcome 💙 💚 💛 💜 ♥️ 🧡

Comments

:triangular_flag_on_post: Scan cache tool: ability to free up space
Originally from @stas00 in slack (internal link):

for me the main query / need is usually to free up some disk space and so I'd look at the top entries of these 3 groups:

nuking the largest entries (obvious)

nuking the long not accessed entries (sorted by access time)

may be also nuking the oldest entries (probably not using them anymore) (i.e. sorted by file create time) - but most likely 2. would have already caught this category

In general, this is a feature request already discussed in https://github.com/huggingface/huggingface_hub/pull/990. The approach of "pruning" we are aiming is to provide the user information and a tool to delete a specific revision and then make it as easy as possible for the user to define its own strategy.
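The three orderings mentioned above (largest, least recently accessed, oldest) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the scan-cache tool itself: the names `CacheEntry`, `scan`, `largest`, `least_recently_accessed` and `oldest` are hypothetical, and each top-level item of the directory is treated as one entry.

```python
import os
from typing import List, NamedTuple


class CacheEntry(NamedTuple):
    path: str
    size: int     # total bytes on disk
    atime: float  # last access time
    ctime: float  # creation time on Windows, metadata-change time on Unix


def entry_size(path: str) -> int:
    """Size of a file, or the summed size of all files under a directory."""
    if os.path.islink(path) or not os.path.isdir(path):
        return os.lstat(path).st_size
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total


def scan(cache_dir: str) -> List[CacheEntry]:
    """List the top-level entries of cache_dir with size and timestamps."""
    entries = []
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        st = os.lstat(path)
        entries.append(CacheEntry(path, entry_size(path), st.st_atime, st.st_ctime))
    return entries


def largest(entries: List[CacheEntry], n: int = 5) -> List[CacheEntry]:
    return sorted(entries, key=lambda e: e.size, reverse=True)[:n]


def least_recently_accessed(entries: List[CacheEntry], n: int = 5) -> List[CacheEntry]:
    return sorted(entries, key=lambda e: e.atime)[:n]


def oldest(entries: List[CacheEntry], n: int = 5) -> List[CacheEntry]:
    return sorted(entries, key=lambda e: e.ctime)[:n]
```

Note that access times are not reliable on filesystems mounted with options such as noatime, so the second ordering is only a hint.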
enhancement
opened by Wauplin 32
Developer mode requirement on Windows
The snapshot_download and hf_hub_download methods currently use symlinks for efficient storage management. However, symlinks are not properly supported on Windows, where either administrator privileges or Developer Mode must be enabled to use them.

We chose to take this approach so that it mirrors the linux/osx behavior.
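Whether symlinks are usable on a given machine can be probed at runtime. The sketch below is one possible check, not the library's own logic; the function name `symlinks_supported` is hypothetical.

```python
import os
import tempfile
from typing import Optional


def symlinks_supported(directory: Optional[str] = None) -> bool:
    """Try to create a symlink in a temporary directory and report success."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        src = os.path.join(tmp, "src")
        dst = os.path.join(tmp, "dst")
        open(src, "w").close()  # create an empty file to link to
        try:
            os.symlink(src, dst)
        except (OSError, NotImplementedError):
            # Typically raised on Windows without admin rights or Developer Mode.
            return False
        return os.path.islink(dst)


print(symlinks_supported())
```

A caller could use the result to fall back to copying files when symlinks are unavailable.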

Opening an issue here to track issues encountered by users in the ecosystem:

https://github.com/huggingface/transformers/issues/19048
opened by LysandreJik 27

Add autocompletion to huggingface-cli (fix #1197)

$ huggingface-cli repo create -<TAB>
Name for your repo. Will be namespaced under your username to build the repo id.
option
--help           show this help message and exit
-h               show this help message and exit
--organization   Optional: organization namespace.
--space_sdk      Optional: Hugging Face Spaces SDK type. Required when --type is set to "space".
--type           Optional: repo_type: set to "dataset" or "space" if creating a dataset or space, default is model.
--yes            Optional: answer Yes to the prompt
-y               Optional: answer Yes to the prompt

on-hold

opened by Freed-Wu 25

Git: find a "better" way to handle tokens than git credential store
Mentioned in https://github.com/huggingface/huggingface_hub/issues/1043#issuecomment-1246009544.

Currently we store the user token for git commands in the git-credential-store. This is the default git storage, which stores credentials in plain text in a file. huggingface_hub warns the user to use it by default to avoid problems (by running git config --global credential.helper store). In a perfect world, it would be good to use the user's default credential helper. In particular, macOS users have the osxkeychain helper by default to securely handle credentials.

Another possibility is to not store the credential in git and automatically fill the values (from python) when git requires them (in the Repository module).

Note: I am no expert on that topic so any addition is welcomed here :)

Useful links:

git credentials doc

git-credential-store doc (the one we use, stores the password in plain in a file)

git credential doc (higher-level API to abstract the store helper)

(Edit: also worth mentioning that when a user runs huggingface-cli login or notebook_login(), the token is also stored locally in plain text in the home directory, at ~/.huggingface/token, to be reused in API calls. Changing this is out of scope for this issue.)
opened by Wauplin 22

413 Client Error: Payload Too Large when using upload_folder on a lot of files

Describe the bug

When trying to commit a folder with many CSV files, I got the following error:

HTTPError: 413 Client Error: Payload Too Large for url: https://huggingface.co/api/datasets/nateraw/test-upload-folder-bug/preupload/main

I assume there is a limit to total payload size when uploading a folder that I am going over here. I confirmed it has nothing to do with the number of files, but rather the total size of the files that are being uploaded. It would be great in the short term if we could document what this limit is clearly in the upload_folder fn.

Reproduction

The following fails on the last line. I wrote it so you can run it yourself without updating the repo ID or anything...so if you're logged in, the below should work (assuming you have torchvision installed).

import os

from torchvision.datasets.utils import download_and_extract_archive
from huggingface_hub import upload_folder, whoami, create_repo

user = whoami()['name']
repo_id = f'{user}/test-upload-folder-bug'
create_repo(repo_id, exist_ok=True, repo_type='dataset')

os.mkdir('./data')
download_and_extract_archive(
    url='https://zenodo.org/api/files/f7f7377b-8405-4d4f-b814-f021df5593b1/hyperbard_data.zip',
    download_root='./data',
    remove_finished=True
)
upload_folder(
    folder_path='./data',
    path_in_repo="",
    repo_id=repo_id,
    repo_type='dataset'
)

Logs

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-2-91516b1ea47f> in <module>()
     18     path_in_repo="",
     19     repo_id=repo_id,
---> 20     repo_type='dataset'
     21 )

3 frames
/usr/local/lib/python3.7/dist-packages/huggingface_hub/hf_api.py in upload_folder(self, repo_id, folder_path, path_in_repo, commit_message, commit_description, token, repo_type, revision, create_pr)
   2115             token=token,
   2116             revision=revision,
-> 2117             create_pr=create_pr,
   2118         )
   2119 

/usr/local/lib/python3.7/dist-packages/huggingface_hub/hf_api.py in create_commit(self, repo_id, operations, commit_message, commit_description, token, repo_type, revision, create_pr, num_threads)
   1813             token=token,
   1814             revision=revision,
-> 1815             endpoint=self.endpoint,
   1816         )
   1817         upload_lfs_files(

/usr/local/lib/python3.7/dist-packages/huggingface_hub/_commit_api.py in fetch_upload_modes(additions, repo_type, repo_id, token, revision, endpoint)
    380         headers=headers,
    381     )
--> 382     resp.raise_for_status()
    383 
    384     preupload_info = validate_preupload_info(resp.json())

/usr/local/lib/python3.7/dist-packages/requests/models.py in raise_for_status(self)
    939 
    940         if http_error_msg:
--> 941             raise HTTPError(http_error_msg, response=self)
    942 
    943     def close(self):

HTTPError: 413 Client Error: Payload Too Large for url: https://huggingface.co/api/datasets/nateraw/test-upload-folder-bug/preupload/main



System info

Colab

bug

opened by nateraw 22

Add fastai upstream and downstream capacities for fastai>=2.4 and fastcore>=1.3.27 versions
Sorry, I created a new PR (previous one: PR huggingface/huggingface_hub#416) I had a couple of mistakes with git, and seemed easier to create a new one. Thanks @muellerzr and @osanseviero for your comments!

What

Add fastai upstream and downstream capabilities for versions fastai>=2.4 (link) and fastcore>=1.3.27 (link).

Inform users that lower versions of fastai are not supported yet (TODO -- Examine whether it is worth implementing previous versions).

Why

Supporting version fastai1 might not be worth it. Loading and pushing Learners involves several changes and the fastai1 library has not been updated for over a year.

Supporting versions 2.0.6 <= fastai < 2.4 involves complex changes. Fastai was updated due to what appear to be modifications to pytorch/serialization.py. For the sake of agility in our first release, I suggest releasing this version without supporting these previous versions.

fastai>=2.4 versions work with fastcore>=1.3.27 version which is installed automatically when fastai>=2.4 is installed.

How?

Following huggingface_hub practices, file_download.py checks for fastcore and fastai availability and versions.

The fastai and fastcore versions used to train the Learner are automatically stored in a config.json when pushed to hub.

If the repo has no README.md, one is automatically generated with the fastai tag.
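The version gating described above can be sketched as follows. This is a simplified stand-in that uses only the standard library; the helper names `version_tuple`, `is_at_least` and `check_supported` are hypothetical, and the PR's actual checks in file_download.py may be implemented differently.

```python
import re
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '1.3.27' into (1, 3, 27); stops at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_at_least(installed: str, minimum: str) -> bool:
    """Compare two versions after padding them to the same length with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_supported(name: str, installed: str, minimum: str) -> None:
    """Raise ImportError when the installed version is below the minimum."""
    if not is_at_least(installed, minimum):
        raise ImportError(f"{name}>={minimum} is required, but {installed} is installed.")


check_supported("fastai", "2.5.3", "2.4")       # passes silently
check_supported("fastcore", "1.3.27", "1.3.27")  # passes silently
```

In real code the installed version would come from the package metadata (for example via importlib.metadata), and a dedicated version parser would handle pre-releases more carefully than this sketch.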

Testing?

Tested that all fastai>=2.4 versions work with the present code and with the fastcore==1.3.27 version.

Next?

Examine whether it is worth implementing fastai <2.4 versions. I will ask on the fastai forums to assess how many users would require this support.

Update description on how to load fastai models in Libraries.ts.

enhancement
opened by omarespejel 22
FIX Avoid creating repository when it exists on remote
This PR partially fixes https://github.com/huggingface/huggingface_hub/issues/672.

I observed that the following error pops up when I create a repository using the Hugging Face API create_repo and then clone it locally from the URL using a token (here):

ValueError: No space_sdk provided. create_repo expects space_sdk to be one of ['gradio', 'streamlit', 'static'] when repo_type is 'space'

I noticed that the following block was causing the problem:

if token is not None:
    whoami_info = self.client.whoami(token)
    user = whoami_info["name"]
    valid_organisations = [org["name"] for org in whoami_info["orgs"]]

    if namespace is not None:
        repo_id = f"{namespace}/{repo_id}"
    repo_url += repo_id

    scheme = urlparse(repo_url).scheme
    repo_url = repo_url.replace(f"{scheme}://", f"{scheme}://user:{token}@")

    if namespace == user or namespace in valid_organisations:
        self.client.create_repo(
            repo_id=repo_id,
            token=token,
            repo_type=self.repo_type,
            exist_ok=True,
            private=self.private,
        )
else:
    if namespace is not None:
        repo_url += f"{namespace}/"
    repo_url += repo_id

A valid name alone shouldn't be enough to create a repository: we have to check whether the repository exists on the remote, so I added a small check for that (thanks @muellerzr for pointing out the particular HfApi function that does it).

With this, the space_sdk error should be gone. I'll add a test once @LysandreJik approves my logic here.

Also, this is a separate point, but I'm not sure it's good to have both an attribute clone_from and a clone_from() function. I can also refactor a couple of other things I saw there that don't look right to me.
opened by merveenoyan 21
Add text classification for spaCy

This requires adding a text-classification script here that can be based on the token-classification implementation.

There is useful spaCy documentation in https://spacy.io/api/textcategorizer, but I think this should be straightforward to implement. Here is an example repo to use for testing - https://huggingface.co/edichief/en_textcat_goemotions
good first issue

opened by osanseviero 20

[RFC] Proposal for a way to cache files in downstream libraries

This is a proposal following discussions started with the datasets team (@lhoestq @albertvillanova).

The goal is to have a proper way to cache any kind of files from a downstream library and manage them (e.g.: scan and delete) from huggingface_hub. From hfh's perspective, there is not much work to do. We should have a canonical procedure to generate cache paths for a library. Then within a cache folder, the downstream library handles its files as it wants. Once this helper starts to be used, we can adapt the scan-cache and delete-cache commands.

I tried to document the cached_assets_path() helper to describe the way I see it. Any feedback is welcome; this is really just a proposal. All the examples are very datasets-focused, but I think this could benefit other libraries such as transformers (@sgugger @LysandreJik), diffusers (@apolinario @patrickvonplaten) or skops (@adrinjalali @merveenoyan) to store any kind of intermediate files. IMO the difficulty mainly lies in getting the feature adopted :smile:.

EDIT: see generated documentation here. EDIT 2: assets/ might be a better naming here (common naming in dev)

WDYT ?

(cc @julien-c @osanseviero as well)

Example:

>>> from huggingface_hub import cached_assets_path

>>> cached_assets_path(library_name="datasets", namespace="SQuAD", subfolder="download")
PosixPath('/home/wauplin/.cache/huggingface/extra/datasets/SQuAD/download')

>>> cached_assets_path(library_name="datasets", namespace="SQuAD", subfolder="extracted")
PosixPath('/home/wauplin/.cache/huggingface/extra/datasets/SQuAD/extracted')

>>> cached_assets_path(library_name="datasets", namespace="SQuAD")
PosixPath('/home/wauplin/.cache/huggingface/extra/datasets/SQuAD/default')

>>> cached_assets_path(library_name="datasets", subfolder="modules")
PosixPath('/home/wauplin/.cache/huggingface/extra/datasets/default/modules')

>>> cached_assets_path(library_name="datasets", cache_dir="/tmp/tmp123456")
PosixPath('/tmp/tmp123456/datasets/default/default')

And the generated tree:

    assets/
    ├── datasets/
    │   ├── default/
    │   │   ├── modules/
    │   ├── SQuAD/
    │   │   ├── downloaded/
    │   │   ├── extracted/
    │   │   └── processed/
    │   ├── Helsinki-NLP--tatoeba_mt/
    │       ├── downloaded/
    │       ├── extracted/
    │       └── processed/
    └── transformers/
        ├── default/
        │   ├── something/
        ├── bert-base-cased/
        │   ├── default/
        │   └── training/
    hub/
    └── models--julien-c--EsperBERTo-small/
        ├── blobs/
        │   ├── (...)
        │   ├── (...)
        ├── refs/
        │   └── (...)
        └── snapshots/
            ├── 2439f60ef33a0d46d85da5001d52aeda5b00ce9f/
            │   ├── (...)
            └── bbc77c8132af1cc5cf678da3f1ddf2de43606d48/
                └── (...)
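The path layout shown in the examples and the tree can be approximated with pathlib. This is a sketch of the proposal, not the final helper: the function name `assets_path` and the default root are assumptions, the replacement of "/" with "--" is inferred from the Helsinki-NLP--tatoeba_mt entry in the tree, and unlike a real helper it does not create any directory.

```python
from pathlib import Path
from typing import Optional, Union

# Assumed default root; the proposal above discusses both "extra" and "assets".
DEFAULT_ASSETS_CACHE = Path.home() / ".cache" / "huggingface" / "assets"


def assets_path(
    library_name: str,
    namespace: str = "default",
    subfolder: str = "default",
    cache_dir: Optional[Union[str, Path]] = None,
) -> Path:
    """Return <cache_dir>/<library_name>/<namespace>/<subfolder> (nothing is created)."""
    root = Path(cache_dir) if cache_dir is not None else DEFAULT_ASSETS_CACHE
    # Assumption: "/" is replaced so that each part stays a single folder name.
    parts = [part.replace("/", "--") for part in (library_name, namespace, subfolder)]
    return root.joinpath(*parts)


print(assets_path("datasets", namespace="SQuAD", subfolder="download"))
print(assets_path("datasets", cache_dir="/tmp/tmp123456"))
```

Each downstream library then owns everything below its own `<library_name>` folder, which is what would let scan and delete commands treat the layout uniformly.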

opened by Wauplin 19

API deprecate positional args in file_download and hf_api
Fixes #732

This PR deprecates passing positional args to most functions and methods in file_download.py and hf_api.py.
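A common way to deprecate positional arguments is a decorator that still accepts them but emits a warning and forwards them as keywords. The sketch below illustrates that general pattern; the decorator name `deprecate_positional_args` and the example function `download` are hypothetical and not taken from the PR.

```python
import functools
import inspect
import warnings


def deprecate_positional_args(func):
    """Warn when keyword-only parameters of func are passed positionally."""
    params = list(inspect.signature(func).parameters.values())
    positional = [p.name for p in params if p.kind is p.POSITIONAL_OR_KEYWORD]
    keyword_only = [p.name for p in params if p.kind is p.KEYWORD_ONLY]

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        extra = args[len(positional):]
        if not extra:
            return func(*args, **kwargs)
        if len(extra) > len(keyword_only):
            raise TypeError(f"{func.__name__}() got too many positional arguments")
        names = keyword_only[: len(extra)]
        warnings.warn(
            f"Pass {', '.join(names)} as keyword arguments to {func.__name__}(); "
            "passing them positionally is deprecated.",
            FutureWarning,
            stacklevel=2,
        )
        # Forward the extra positional values under their parameter names.
        kwargs.update(zip(names, extra))
        return func(*args[: len(positional)], **kwargs)

    return wrapper


@deprecate_positional_args
def download(repo_id, *, filename=None, revision=None):
    """Hypothetical function whose later parameters became keyword-only."""
    return repo_id, filename, revision
```

Calling `download("user/repo", "config.json")` still works during the deprecation period but emits a FutureWarning, while `download("user/repo", filename="config.json")` is silent.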

Things to discuss:

whether we want to make all parameters kwarg only or leave some as positional

make more parts of the API kwarg only

cc @julien-c @LysandreJik @osanseviero

Question: do we have a place to put changelog/release logs? How do we handle those now?
opened by adrinjalali 19

Logging with organization token is successful and leads to side effects

It was reported that users get a KeyError when push_to_hub_keras() is used to push to an organization.

Example:

from huggingface_hub import push_to_hub_keras
push_to_hub_keras(model=vqvae_trainer, repo_url='https://huggingface.co/keras-io/vq_vae', organization='keras-io')

When the organization token is passed explicitly, the problem goes away. Example:

push_to_hub_keras(model=forest_model, .....
                  repo_path_or_name='.', repo_url = "https://huggingface.co/keras-io/deep-neural-decision-forests", 
                  use_auth_token=keras_io_hub_token)

Is this intended behavior? @nateraw Also cc: @osanseviero

opened by merveenoyan 19

Repository does not work on HF spaces

Hi,

Recently I have been getting the error below when trying to use my HF space together with the Repository class from the HF hub.

I would assume this is a bug since before it worked nicely. On my local machine the entrypoint.py below also works. I am open for any suggestions 😃

Traceback (most recent call last):
  File "/home/user/.local/lib/python3.8/site-packages/huggingface_hub/repository.py", line 742, in clone_from
    run_subprocess("git lfs install", self.local_dir)
  File "/home/user/.local/lib/python3.8/site-packages/huggingface_hub/utils/_subprocess.py", line 61, in run_subprocess
    return subprocess.run(
  File "/usr/local/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['git', 'lfs', 'install']' returned non-zero exit status 2.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "entrypoint.py", line 6, in <module>
    Repository("repos/hand-ki-model", f"https://oauth2:{os.getenv('HANDKIGIT5')}@git5.cs.fau.de/folle/hand-ki-model.git", use_auth_token=os.getenv(""))
  File "/home/user/.local/lib/python3.8/site-packages/huggingface_hub/utils/_deprecation.py", line 101, in inner_f
    return f(*args, **kwargs)
  File "/home/user/.local/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 124, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/user/.local/lib/python3.8/site-packages/huggingface_hub/repository.py", line 528, in __init__
    self.clone_from(repo_url=clone_from)
  File "/home/user/.local/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 124, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/user/.local/lib/python3.8/site-packages/huggingface_hub/repository.py", line 797, in clone_from
    raise EnvironmentError(exc.stderr)
OSError: Hook already exists: pre-push

	#!/bin/sh
	command -v git-lfs >/dev/null 2>&1 || { echo >&2 "\nThis repository is configured for Git LFS but 'git-lfs' was not found on your path. If you no longer wish to use Git LFS, remove this hook by deleting '.git/hooks/pre-push'.\n"; exit 2; }
	git lfs pre-push "$@"

To resolve this, either:
  1: run `git lfs update --manual` for instructions on how to merge hooks.
  2: run `git lfs update --force` to overwrite your hook.

This is my entrypoint.py:

import os
import sys
import subprocess
from huggingface_hub import Repository

Repository("repos/hand-ki-model", f"https://oauth2:{os.getenv('HANDKIGIT5')}@git5.cs.fau.de/folle/hand-ki-model.git", use_auth_token=os.getenv(""))
subprocess.check_call([sys.executable, "-m", "pip", "install", "repos/hand-ki-model/"])
import app

opened by lukasfolle 0

[Dataset | Model card] When pushing to template repos, work on actual raw contents

Question: do we actually want this feature or not?

Internal Slack convo cc @Wauplin

Details

Generated commit for model cards (would need to do the same for datasets): https://huggingface.co/templates/model-card-example/commit/901deccf5acead553f9b082aca480d966e61f355

opened by julien-c 5

KeyError: 'multilinguality' when calling DatasetSearchArguments()

Describe the bug

KeyError: 'multilinguality' when calling DatasetSearchArguments()

Reproduction

from huggingface_hub import DatasetSearchArguments
dataset_args = DatasetSearchArguments()

Logs

from huggingface_hub import DatasetSearchArguments
dataset_args = DatasetSearchArguments()


Error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/var/folders/1h/lqt86wdn4nq9z9h_c4q7s4gr0000gn/T/ipykernel_43840/980484241.py in <module>
      1 from huggingface_hub import DatasetSearchArguments
----> 2 dataset_args = DatasetSearchArguments()

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/huggingface_hub/hf_api.py in __init__(self, api)
    548     def __init__(self, api: Optional["HfApi"] = None):
    549         self._api = api if api is not None else HfApi()
--> 550         tags = self._api.get_dataset_tags()
    551         super().__init__(tags)
    552         self._process_models()

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/huggingface_hub/hf_api.py in get_dataset_tags(self)
    669         hf_raise_for_status(r)
    670         d = r.json()
--> 671         return DatasetTags(d)
    672 
    673     @_deprecate_list_output(version="0.14")

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/huggingface_hub/utils/endpoint_helpers.py in __init__(self, dataset_tag_dictionary)
    365             "license",
    366         ]
--> 367         super().__init__(dataset_tag_dictionary, keys)

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/huggingface_hub/utils/endpoint_helpers.py in __init__(self, tag_dictionary, keys)
    298             keys = list(self._tag_dictionary.keys())
    299         for key in keys:
--> 300             self._unpack_and_assign_dictionary(key)
    301 
    302     def _unpack_and_assign_dictionary(self, key: str):

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/huggingface_hub/utils/endpoint_helpers.py in _unpack_and_assign_dictionary(self, key)
    303         "Assignes nested attributes to `self.key` containing information as an `AttributeDictionary`"
    304         setattr(self, key, AttributeDictionary())
--> 305         for item in self._tag_dictionary[key]:
    306             ref = getattr(self, key)
    307             item["label"] = (

KeyError: 'multilinguality'

System info

- huggingface_hub version: 0.11.1
- Platform: macOS-10.16-x86_64-i386-64bit
- Python version: 3.9.6
- Running in iPython ?: Yes
- iPython shell: ZMQInteractiveShell
- Running in notebook ?: Yes
- Running in Google Colab ?: No
- Token path ?: /Users/x/.huggingface/token
- Has saved token ?: False
- Configured git credential helpers: osxkeychain
- FastAI: N/A
- Tensorflow: 2.9.1
- Torch: N/A
- Jinja2: 3.0.1
- Graphviz: N/A
- Pydot: N/A

bug

opened by animator 0

Refacto Repository tests
TL;DR:

Sorry for huge diff :cry:

Repository tests work on Windows (related to https://github.com/huggingface/huggingface_hub/pull/1112)

Repository tests run in parallel

Repository tests use /tmp dir

Less redundancy

First of all, sorry for the huge diff in the PR :cry: . I started it as a preliminary work for https://github.com/huggingface/huggingface_hub/pull/1112 (Windows CI):

I started to refactor how paths are generated => no more "directory/whatever_file.txt", as it breaks on Windows

While doing that, I removed the WORKING_DIR_FIXTURE and replaced it with a proper /tmp folder => no need to clean it up after tests (when tests were failing, I often ended up with untracked folders in my huggingface_hub folder that I had to delete)

Then I read through the tests more closely and realized we were doing a lot of redundant work (creating 1 new repo on the Hub for each test even though we do not modify it)

I've split the repository tests between TestRepositoryShared and TestRepositoryUniqueRepos. In the shared tests, a single repo is created on the Hub and cloned multiple times. No files are pushed to this repo during the tests. => Saves quite some setup/teardown time. In the unique tests, a repo is created per test (same as before) so that they are independent from each other.

I also added attributes like self.repo_id, self.repo_path and self.repo_url and a helper self.clone_from(...) to avoid redundancy (e.g. recomputing repo_id=f"{USER}/{self.REPO_NAME}", local_dir="{WORKING_DIR}/{REPO_NAME}" all the time).

And finally I removed some tests that were duplicated (*).

=> Now we have a Repository test suite that should be "iso" compared to before but running in parallel (~1min instead of 8) and on Windows.

(*) For example in dataset tests, we are testing that cloning from a different repo type works. But also testing that we can commit, pull, push,... which is quite redundant because what we really want is to check if a repo can be cloned with a repo_type. All subsequent actions are already tested separately ("test only 1 feature at a time").

(EDIT: "non repository" tests are failing but it's not due to this PR :confused:)
opened by Wauplin 1
hf_hub_download call does not increase the download counter
Describe the bug

I have downloaded this model more than 100 times in the last week, but the counter shows 0 downloads: https://huggingface.co/fcakyon/yolov5s-v7.0

Can you tell me which request triggers the download counter? Does hf_hub_download function trigger this counter?

Reproduction

Using yolov5 pypi package for downloading the model.

Here is the download code: https://github.com/fcakyon/yolov5-pip/blob/main/yolov5/utils/downloads.py#L131-L145

Logs

No response

System info

I have downloaded this model from Windows 10 and Ubuntu 18.04 with huggingface_hub==0.11.1.
bug
opened by fcakyon 7

Releases(v0.11.1)

v0.11.1(Nov 28, 2022)

Hot-fix to fix permission issues when downloading with hf_hub_download or snapshot_download. For more details, see https://github.com/huggingface/huggingface_hub/pull/1220, https://github.com/huggingface/huggingface_hub/issues/1141 and https://github.com/huggingface/huggingface_hub/issues/1215.

Full changelog: https://github.com/huggingface/huggingface_hub/compare/v0.11.0...v0.11.1
v0.11.0(Nov 14, 2022)
New features and improvements for HfApi

HfApi is the central point to interact with the Hub API (manage repos, create commits,...). The goal is to offer more and more git-related features through HTTP endpoints so that users can interact with the Hub without cloning a repo locally.

Create/delete tags and branches

from huggingface_hub import create_branch, create_tag, delete_branch, delete_tag

create_tag(repo_id, tag="v0.11", tag_message="Release v0.11")
delete_tag(repo_id, tag="something")  # If you created a tag by mistake

create_branch(repo_id, branch="experiment-154")
delete_branch(repo_id, branch="experiment-1")  # Clean some old branches

Add a create_tag method to create tags from the HTTP endpoint by @Wauplin in #1089

Add delete_tag method to HfApi by @Wauplin in #1128

Create tag twice doesn't work by @Wauplin in #1149

Add "create_branch" and "delete_branch" endpoints by @Wauplin in #1181

Upload lots of files in a single commit

Making a very large commit was previously tedious. Files are now processed in chunks, which makes it possible to upload 25k files in a single commit (with a 1Gb payload limit when uploading only non-LFS files). This should make it easier to upload large datasets.

Create commit by streaming a ndjson payload (allow lots of file in single commit) by @Wauplin in #1117
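The general idea of processing files in chunks can be illustrated with a small batching helper. This is a generic sketch, not the library's implementation: the name `chunked` and the chunk size used in the example are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


paths = [f"data/file_{i}.csv" for i in range(5)]
for batch in chunked(paths, 2):
    print(batch)  # each batch would be sent in its own request
```

Because the helper is a generator, only one batch is held in memory at a time, regardless of how many files the commit contains.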

Delete an entire folder

from huggingface_hub import CommitOperationDelete, create_commit, delete_folder

# Delete a single folder
delete_folder(repo_id=repo_id, path_in_repo="logs/")

# Alternatively, use the low-level `create_commit`
create_commit(
    repo_id,
    operations=[
        CommitOperationDelete(path_in_repo="old_config.json"),  # Delete a file
        CommitOperationDelete(path_in_repo="logs/"),  # Delete a folder
    ],
    commit_message=...,
)

Delete folder with commit endpoint by @Wauplin in #1163

Support pagination when listing repos

In the future, listing models, datasets and spaces will be paginated on the Hub by default. To avoid breaking changes, huggingface_hub already follows pagination. The output type is currently a list (deprecated) and will become a generator in v0.14.

Add support for pagination in list_models, list_datasets and list_spaces by @Wauplin in #1176

Deprecate output in list_models by @Wauplin in #1143
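The difference between returning a list and returning a generator over paginated results can be sketched as follows. How the Hub actually paginates is not described here, so the cursor-based `fetch_page` callable and the in-memory `fake_fetch_page` stand-in are assumptions made for illustration.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# (items on this page, cursor of the next page or None when done)
Page = Tuple[List[str], Optional[str]]


def iter_items(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[str]:
    """Lazily yield items page by page until no next cursor is returned."""
    cursor = None
    while True:
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            return


# In-memory stand-in for a paginated endpoint (hypothetical data).
_PAGES = {
    None: (["model-a", "model-b"], "page-2"),
    "page-2": (["model-c"], None),
}


def fake_fetch_page(cursor: Optional[str]) -> Page:
    return _PAGES[cursor]


for name in iter_items(fake_fetch_page):
    print(name)
```

A generator only requests the next page when the caller asks for more items, whereas building a full list forces every page to be fetched up front.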

Misc

Allow create PR against non-main branch by @Wauplin in #1168

1162 Reorder operations correctly in commit endpoint by @Wauplin in #1175

Login, tokens and authentication

Authentication has been revisited to make it as easy as possible for the users.

Unified login and logout methods

from huggingface_hub import interpreter_login, login, logout, notebook_login

# `login` detects automatically if you are running in a notebook or a script
# Launch widgets or TUI accordingly
login()

# Now possible to login with a hardcoded token (non-blocking)
login(token="hf_***")

# If you want to bypass the auto-detection of `login`
notebook_login()  # still available
interpreter_login()  # to login from a script

# Logout programmatically
logout()

# Still possible to login from CLI
huggingface-cli login

Unified login/logout methods by @Wauplin in #1111

Set token only for a HfApi session

from huggingface_hub import HfApi

# Token will be sent in every request but not stored on the machine
api = HfApi(token="hf_***")

Add token attribute to HfApi by @Wauplin in #1116

Stop using use_auth_token in favor of token, everywhere

The token parameter can now be passed to every method in huggingface_hub. use_auth_token is still accepted where it previously existed, but the mid-term goal (~6 months) is to deprecate and remove it.

Replace use_auth_token arg by token everywhere by @Wauplin in #1122
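A downstream library that wants to follow the same transition could accept both arguments for a while and prefer the new one. The sketch below shows one possible shim; `resolve_token` is a hypothetical helper, not part of huggingface_hub, and the warning text is an assumption.

```python
import warnings


def resolve_token(token=None, use_auth_token=None):
    """Return the token to use, preferring `token` over the legacy argument."""
    if use_auth_token is not None:
        warnings.warn(
            "`use_auth_token` is deprecated, use `token` instead.",
            FutureWarning,
            stacklevel=2,
        )
        # The legacy value is only used when the new argument is not provided.
        if token is None:
            token = use_auth_token
    return token
```

Callers that already pass `token` see no warning, so the legacy argument can be removed later without touching them.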

Respect git credential helper from the user

Previously, the token was stored in the git credential store. It can now be stored in any helper configured by the user (keychain, cache, ...).

Refactor git credential handling in login workflow by @Wauplin in #1138

Better error handling

Helper to dump machine information

# Dump all relevant information. To be used when reporting an issue.
➜ huggingface-cli env

Copy-and-paste the text below in your GitHub issue.

- huggingface_hub version: 0.11.0.dev0
- Platform: Linux-5.15.0-52-generic-x86_64-with-glibc2.35
- Python version: 3.10.6
...

1173 Add dump env helper by @Wauplin in #1177

Misc

Cache not found is not an error by @singingwolfboy in #1101

Propagate error messages when multiple on BadRequest by @Wauplin in #1115

Add error message from x-error-message header if exists by @Wauplin in #1121

Modelcards

Few improvements/fixes in the modelcard module:

:art: make repocard content a property by @nateraw in #1147

:white_check_mark: fix content string in repocard tests by @nateraw in #1155

Add Hub verification token to evaluation metadata by @lewtun in #1142

Use default model_name in metadata_update by @lvwerra in #1157

Refer to modelcard creator app from doc by @Wauplin in #1184

Parent Model --> Finetuned from model by @meg-huggingface #1191

FIX overwriting metadata when both verified and unverified reported values by @Wauplin in #1186

Cache assets

New feature to provide a path in the cache where any downstream library can store assets (processed data, files from the web, extracted data, rendered images,...)

[RFC] Proposal for a way to cache files in downstream libraries by @Wauplin in #1088

Documentation updates

Fixing a typo in the doc. by @Narsil in #1113

Fix docstring of list_datasets by @albertvillanova in #1125

Add repo_type=dataset possibility to guide by @Wauplin in #1134

Fix PyTorch & Keras mixin doc by @lewtun in #1139

Update how-to-manage.mdx by @severo in #1150

Typo fix by @meg-huggingface in #1166

Adds link to model card metadata spec by @meg-huggingface in #1171

Removing "Related Models" & just asking for "Parent Model" by @meg-huggingface in #1178

Breaking changes

Cannot provide an organization to create_repo

identical_ok removed in upload_file

Breaking changes in arguments for validate_preupload_info, prepare_commit_payload, _upload_lfs_object (internal helpers for the commit API)

huggingface_hub.snapshot_download is not exposed as a public module anymore

Deprecations

Remove deprecated code from v0.9, v0.10 and v0.11 by @Wauplin in #1092

Rename languages to language + remove duplicate code in tests by @Wauplin in #1169

Deprecate output in list_models by @Wauplin in #1143

Set back feature to create a repo when using clone_from by @Wauplin in #1187

Internal

Configure pytest to run on staging by default + flags in config by @Wauplin in #1093

fix search models test by @Wauplin in #1106

Add mypy in the CI (and fix existing type issues) by @Wauplin in #1097

Fix deprecation warnings for assertEquals in tests by @Wauplin in #1135

Skip failing test in ci by @Wauplin in #1148

:green_heart: fix mypy ci by @nateraw in #1167

Update pr docs actions by @mishig25 in #1170

Revert "Update pr docs actions" by @mishig25 #1192

Bugfixes & small improvements

Expose list_spaces by @osanseviero in #1132

respect NO_COLOR env var by @singingwolfboy in #1103

Fix list_models bool parameters by @Wauplin in #1152

FIX url encoding in hf_hub_url by @Wauplin in #1164

Fix cannot create pr on foreign repo by @Wauplin #1183

Fix HfApi.move_repo(...) and complete tests by @Wauplin in #1136

Commit empty files as regular and warn user by @Wauplin in #1180

Parse file size in get_hf_file_metadata by @Wauplin #1179

Fix get file size on lfs by @Wauplin #1188

More robust create relative symlink in cache by @Wauplin in #1109

Test running CI on Python 3.11 #1189

v0.10.1(Oct 11, 2022)

Hot-fix to force utf-8 encoding in modelcards. See https://github.com/huggingface/huggingface_hub/pull/1102 and https://github.com/skops-dev/skops/pull/162#issuecomment-1263516507 for context.

Full Changelog: https://github.com/huggingface/huggingface_hub/compare/v0.10.0...v0.10.1
v0.10.0(Sep 28, 2022)
Modelcards

Contribution from @nateraw to integrate the work done on Modelcards and DatasetCards (from nateraw/modelcards) directly in huggingface_hub.

>>> from huggingface_hub import ModelCard
>>> card = ModelCard.load('nateraw/vit-base-beans')
>>> card.data.to_dict()
{'language': 'en', 'license': 'apache-2.0', 'tags': ['generated_from_trainer', 'image-classification'],...}

Related commits

Add additional repo card utils from modelcards repo by @nateraw in #940

Add regression test for empty modelcard update by @Wauplin in #1060

Add template variables to dataset card template by @nateraw in #1068

Further clarifying Model Card sections by @meg-huggingface in #1052

Create modelcard if doesn't exist on update_metadata by @Wauplin in #1061

Related documentation

Creating and Sharing Model Cards

Repository Cards

Cache management (huggingface-cli scan-cache and huggingface-cli delete-cache)

New commands in huggingface-cli to scan and delete parts of the cache. The goal is to manage the cache-system the same way for any dependent library that uses huggingface_hub. Only the new cache-system format is supported.

➜ huggingface-cli scan-cache
REPO ID                     REPO TYPE SIZE ON DISK NB FILES LAST_ACCESSED LAST_MODIFIED REFS                LOCAL PATH
--------------------------- --------- ------------ -------- ------------- ------------- ------------------- -------------------------------------------------------------------------
glue                        dataset         116.3K       15 4 days ago    4 days ago    2.4.0, main, 1.17.0 /home/wauplin/.cache/huggingface/hub/datasets--glue
google/fleurs               dataset          64.9M        6 1 week ago    1 week ago    refs/pr/1, main     /home/wauplin/.cache/ (...)

Done in 0.0s. Scanned 6 repo(s) for a total of 3.4G.
Got 1 warning(s) while scanning. Use -vvv to print details.

Related commits

Feature: add an utility to scan cache by @Wauplin in #990

Utility to delete revisions by @Wauplin in #1035

1025 add time details to scan cache by @Wauplin in #1045

Fix scan cache failure when cached snapshot is empty by @Wauplin in #1054

1025 huggingface-cli delete-cache command by @Wauplin in #1046

Sort repos/revisions by age in delete-cache by @Wauplin in #1063

Related documentation

Manage huggingface_hub cache-system

Cache-system reference

Better error handling (and http-related stuff)

HTTP calls to the Hub have been harmonized to behave the same across the library.

Major differences are:

Unified way to handle HTTP errors using hf_raise_for_status (more informative error message)

Auth token is always sent by default when a user is logged in (see documentation).

Package versions are sent as a user-agent header for telemetry (python, huggingface_hub, tensorflow, torch, ...). This was already the case for hf_hub_download.

Related commits

Always send the cached token when user is logged in by @Wauplin in #1064

Add user agent to all requests with huggingface_hub version (and other) by @Wauplin in #1075

[Repository] Add better error message by @patrickvonplaten in #993

Clearer HTTP error messages in huggingface_hub by @Wauplin in #1019

Handle backoff on HTTP 503 error when pushing repeatedly by @Wauplin in #1038
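Retrying with backoff on a temporary server error can be sketched as follows. This is a generic illustration rather than the library's code: `ServiceUnavailable` stands in for an HTTP 503 response, and the helper name, wait times and retry count are assumptions.

```python
import time


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for an HTTP 503 response."""


def call_with_backoff(func, max_retries=5, base_wait=1.0, max_wait=8.0, sleep=time.sleep):
    """Call `func`, retrying with exponential backoff on ServiceUnavailable."""
    wait = base_wait
    for attempt in range(max_retries + 1):
        try:
            return func()
        except ServiceUnavailable:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            sleep(wait)
            wait = min(wait * 2, max_wait)  # double the wait, up to a ceiling


# Demo: fail twice, then succeed. `sleep` is replaced to record waits without pausing.
waits = []
calls = {"count": 0}


def flaky_push():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ServiceUnavailable()
    return "pushed"


result = call_with_backoff(flaky_push, sleep=waits.append)
```

Injecting `sleep` keeps the helper testable without slowing anything down; production code would leave the default.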

Breaking changes

For consistency, the return type of create_commit has been modified. This is a breaking change, but we hope the return type of this method was never used (quite recent and niche output type).

Return more information in create_commit output by @Wauplin in #1066

Since repo_id is now validated using @validate_hf_hub_args (see below), this is a breaking change if repo_id was previously misused. A HFValidationError is now raised if repo_id is not valid.

Miscellaneous improvements

Add support for autocomplete

Add autocomplete + tests + type checking by @Wauplin in #1041

http-based push_to_hub_fastai

Add changes for push_to_hub_fastai to use the new http-based approach. by @nandwalritik in #1040

Check if a file is cached

try_to_load_from_cache returns cached non-existence by @sgugger in #1039

Get file metadata (commit hash, etag, location) without downloading

Add get_hf_file_metadata to fetch metadata from the Hub by @Wauplin in #1058

Validate arguments using @validate_hf_hub_args

Add validator for repo id + decorator to validate arguments in huggingface_hub by @Wauplin in #1029

Remove repo_id validation in hf_hub_url and hf_hub_download by @Wauplin in #1031

:warning: This is a breaking change if repo_id was previously misused :warning:

Related documentation:

Utilities#Validators
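As a rough picture of what validating a repo id involves, the sketch below checks the "name" or "namespace/name" shape described at the top of this document. The exact rules enforced by the library may differ: the regular expression, the 96-character limit and the `RepoIdError` class (playing the role of HFValidationError) are assumptions for illustration.

```python
import re

# Simplified pattern: optional "namespace/" prefix, then a repo name.
# Only ASCII letters, digits, "-", "_" and "." are accepted in this sketch.
_REPO_ID_PATTERN = re.compile(r"([\w\-.]+/)?[\w\-.]{1,96}", re.ASCII)


class RepoIdError(ValueError):
    """Hypothetical error type, playing the role of HFValidationError."""


def check_repo_id(repo_id):
    """Return `repo_id` unchanged if it looks valid, raise RepoIdError otherwise."""
    if not isinstance(repo_id, str):
        raise RepoIdError(f"Repo id must be a string, got {type(repo_id)}.")
    if repo_id.count("/") > 1:
        raise RepoIdError(f"Repo id must be 'name' or 'namespace/name': {repo_id!r}.")
    if not _REPO_ID_PATTERN.fullmatch(repo_id):
        raise RepoIdError(f"Repo id contains invalid characters: {repo_id!r}.")
    return repo_id
```

Failing early on the client side gives a clearer message than a generic HTTP error returned after a network round trip.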

Documentation updates

Fix raise syntax: remove markdown bullet point by @mishig25 in #1034

docs render tree correctly by @mishig25 in #1070

Deprecations

ENH Deprecate clone_from behavior by @merveenoyan in #952

🗑 Deprecate token in read-only methods of HfApi in favor of use_auth_token by @SBrandeis in #928

Remove legacy helper 'install_lfs_in_userspace' by @Wauplin in #1059

1055 deprecate private and repo type in repository class by @Wauplin in #1057

Bugfixes & small improvements

Consider empty subfolder as None in hf_hub_url and hf_hub_download by @Wauplin in #1021

enable http request retry under proxy by @MrZhengXin in #1022

Add securityStatus to ModelInfo object with default value None. by @Wauplin in #1026

👽️ Add size parameter for lfsFiles when committing on the hub by @coyotte508 in #1048

Use /models/ path for api call to update settings by @Wauplin in #1049

Globally set git credential.helper to store in google colab by @Wauplin in #1053

FIX notebook login by @Wauplin in #1073

Windows-specific bug fixes

Fix default cache on windows by @thomwolf in #1069

Degraded but fully working cache-system when symlinks are not supported by @Wauplin in #1067

Check symlinks support per directory instead of globally by @Wauplin in #1077

v0.9.1(Aug 25, 2022)

Hot-fix error message on gated repositories (https://github.com/huggingface/huggingface_hub/pull/1015).

Context: https://huggingface.co/CompVis/stable-diffusion-v1-4 has been widely shared in the last few days, but since it is a gated repo, many users are confused by the authentication error they receive. The error message is now more detailed.

Full Changelog: https://github.com/huggingface/huggingface_hub/compare/v0.9.0...v0.9.1
v0.9.0(Aug 23, 2022)
Community API

Huge work to programmatically interact with the community tab, thanks to @SBrandeis ! It is now possible to:

Manage discussions (create_discussion, create_pull_request, merge_pull_request, change_discussion_status, rename_discussion)

Comment on them (comment_discussion, edit_discussion_comment)

List them (get_repo_discussions, get_discussion_details)

See full documentation for more details.

✨ Programmatic API for the community tab by @SBrandeis in #930

HTTP-based push_to_hub mixins

The push_to_hub mixin and push_to_hub_keras have been refactored to leverage the HTTP endpoint. This means pushing to the Hub no longer requires first downloading the repo locally. The previous git-based version is planned to be supported until v0.12.

Push to hub mixins that do not leverage git by @LysandreJik in #847

Miscellaneous API improvements

✨ parent_commit argument for create_commit and related functions by @SBrandeis in #916

Add a helpful error message when commit_message is empty in create_commit by @sgugger in #962

✨ create_commit: more user-friendly errors on HTTP 400 by @SBrandeis in #963

✨ Add files_metadata option to repo_info by @SBrandeis in #951

Add list_spaces to HfApi by @cakiki in #889

Miscellaneous helpers (advanced)

Filter which files to upload in upload_folder

Allowlist and denylist when uploading a folder by @Wauplin in #994
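Filtering with an allowlist and a denylist can be sketched with shell-style patterns from the standard library. This is an illustration of the idea, not the library's code; the function name `filter_paths` and its parameter names are assumptions of this sketch.

```python
from fnmatch import fnmatch


def filter_paths(paths, allow_patterns=None, ignore_patterns=None):
    """Keep paths matching any allow pattern and no ignore pattern."""
    kept = []
    for path in paths:
        # With an allowlist, a path must match at least one of its patterns.
        if allow_patterns is not None and not any(fnmatch(path, p) for p in allow_patterns):
            continue
        # With a denylist, a path matching any of its patterns is dropped.
        if ignore_patterns is not None and any(fnmatch(path, p) for p in ignore_patterns):
            continue
        kept.append(path)
    return kept


files = ["config.json", "model.bin", "logs/run1.txt", "logs/run2.txt", "README.md"]
to_upload = filter_paths(
    files,
    allow_patterns=["*.json", "*.bin", "logs/*"],
    ignore_patterns=["logs/run2*"],
)
```

In this sketch the denylist is applied after the allowlist, so a file matched by both is excluded.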

Non-existence of files in a repo is now cached

Cache non-existence of files or completeness of repo by @sgugger in #986

Progress bars can be globally disabled via the HF_HUB_DISABLE_PROGRESS_BARS env variable or using disable_progress_bars/enable_progress_bars helpers.

Add helpers to disable progress bars globally + tests by @Wauplin in #987

Use try_to_load_from_cache to check if a file is locally cached

Add utility to load files from cache by @sgugger in #980

Documentation updates

[Doc] Update "Download files from the Hub" doc by @julien-c in #948

Docs: Fix some missing images and broken links by @NimaBoscarino in #936

Replace upload_file with upload_folder in upload_folder docstring by @mariosasko in #927

Clarify upload docs by @stevhliu in #944

Bugfixes & small improvements

Handle redirections in hf_hub_download for a renamed repo by @Wauplin in #983

PR Make path_in_repo optional in upload folder by @Wauplin in #988

Use a finer exception when local_files_only=True and a file is missing in cache by @Wauplin in #985

use fixes JSONDecodeError by @Wauplin in #974

🐛 Fix PR creation for a repo the user does not own by @SBrandeis in #922

login: tiny messaging tweak by @julien-c in #964

Display endpoint URL in whoami command by @juliensimon in #895

Small orphaned tweaks from #947 by @julien-c in #958

FIX LFS track fix for Hub Mixin by @merveenoyan in #919

:bug: fix multilinguality test and example by @nateraw in #941

Fix custom handling of refined HTTPError by @osanseviero in #924

Followup to #901: Tweak repocard_types.py by @julien-c in #931

[Keras Mixin] - Flattening out nested configurations for better table parsing. by @ariG23498 in #914

[Keras Mixin] Rendering the Hyperparameter table vertically by @ariG23498 in #917

Internal

Disable codecov + configure pytest FutureWarnings by @Wauplin in #976

Enable coverage in CI by @Wauplin in #992

Enable flake8 on W605 by @Wauplin in #975

Enable flake8-bugbear + adapt existing codebase by @Wauplin in #967

Test that TensorFlow is not imported on startup by @lhoestq in #904

Pin black to 22.3.0 to benefit from a stable --preview flag by @LysandreJik in #934

Update dev version by @gante in #921

v0.8.1(Jun 15, 2022)
Git-aware cache file layout

v0.8.1 introduces a new way of caching files from the Hugging Face Hub, which applies to two methods: snapshot_download and hf_hub_download. The new approach is extensively documented in the Documenting files guide, and we recommend checking it out to get a better understanding of how caching works.

New git-aware cache file layout by @julien-c in #801
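The sketch below shows a layout of this kind: file contents are stored once under `blobs/`, each revision name under `refs/` records a commit hash, and `snapshots/<commit>/<filename>` is a relative symlink to the blob. It is a simplified illustration written for this note, not the library's code; `store_file` and the hash values are hypothetical, and it assumes a filesystem that supports symlinks.

```python
import os
import tempfile
from pathlib import Path


def store_file(cache_dir, repo_id, revision, commit_hash, filename, content_hash, content):
    """Store `content` once as a blob and expose it under a snapshot of `commit_hash`."""
    repo_dir = Path(cache_dir) / ("models--" + repo_id.replace("/", "--"))
    blob_path = repo_dir / "blobs" / content_hash
    pointer_path = repo_dir / "snapshots" / commit_hash / filename
    ref_path = repo_dir / "refs" / revision

    for path in (blob_path, pointer_path, ref_path):
        path.parent.mkdir(parents=True, exist_ok=True)

    if not blob_path.exists():
        blob_path.write_bytes(content)  # identical content is written only once
    ref_path.write_text(commit_hash)  # the revision name now points at this commit
    if not pointer_path.exists():
        relative_target = os.path.relpath(blob_path, start=pointer_path.parent)
        os.symlink(relative_target, pointer_path)
    return pointer_path


cache_dir = tempfile.mkdtemp()
# Hypothetical hashes; real ones would come from the Hub.
first = store_file(cache_dir, "julien-c/EsperBERTo-small", "main", "aaa111", "config.json", "blob1", b"{}")
second = store_file(cache_dir, "julien-c/EsperBERTo-small", "main", "bbb222", "config.json", "blob1", b"{}")
```

Two commits that share an unchanged file end up with two snapshot entries but a single blob on disk, which is the main benefit of separating snapshots from blobs.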

New create_commit API

A new create_commit API allows users to upload and delete several files at once using HTTP-based methods. You can read more about it in this guide. The following convenience methods were also introduced:

upload_folder allows uploading a local directory to a repo.

delete_file allows deleting a single file from a repo.

upload_file now uses create_commit under the hood.

create_commit also allows creating pull requests with a create_pr=True flag.

None of the methods rely on Git locally.

New create_commit API by @SBrandeis in #888

Lazy loading

All modules will now be lazy-loaded. This should drastically reduce the time it takes to import huggingface_hub as it will no longer load all soft dependencies.

ENH lazy load modules in the root init by @adrinjalali in #874
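Lazy loading of this kind can be pictured with a module object that resolves attributes on first access. The sketch below is a generic illustration, not the code from the PR: `LazyModule` and the attribute-to-module mapping are hypothetical, and standard-library modules stand in for heavy dependencies.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Module-like object that imports the real provider of an attribute on first access."""

    def __init__(self, name, attr_to_module):
        super().__init__(name)
        self._attr_to_module = dict(attr_to_module)
        self.loaded = []  # names resolved so far, kept only for demonstration

    def __getattr__(self, attr):
        # Called only when normal attribute lookup fails, i.e. on first access.
        mapping = self.__dict__.get("_attr_to_module", {})
        if attr not in mapping:
            raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(mapping[attr]), attr)
        setattr(self, attr, value)  # cache so later lookups bypass __getattr__
        self.loaded.append(attr)
        return value


# Stand-ins for heavy submodules: standard-library modules are used here.
lazy = LazyModule("lazy_demo", {"sqrt": "math", "dumps": "json"})
```

Creating `lazy` imports nothing; `lazy.sqrt` triggers the import of its provider the first time it is used and is served from the module's own namespace afterwards.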

Improvements and bugfixes

Add request ID to all requests by @LysandreJik in #909

Remove deprecations by @LysandreJik in #910

FIX Avoid creating repository when it exists on remote by @merveenoyan in #900

🏗 Use hub-ci for tests by @SBrandeis in #898

Refine 404 errors by @LysandreJik in #878

Fix typo by @lsb in #902

FIX metadata_update: work on a copy of the upstream file, to not mess up the cache by @julien-c in #891

ENH Removed history writing in Keras model card by @merveenoyan in #876

CI enable codecov by @adrinjalali in #893

MNT deprecate imports from snapshot_download by @adrinjalali in #880

Pushback deprecation for v0.7 release by @LysandreJik in #882

FIX make import machinary private by @adrinjalali in #879

ENH Keras Use table instead of dictionary for hyperparameters in model card by @merveenoyan in #877

Invert deprecation for create_repo in #912

Constant was accidentally removed during deprecation transition in #913


v0.7.0(May 30, 2022)

Repocard metadata

This PR adds a metadata_update function that allows the user to update the metadata in a repository on the hub. The function accepts a dict with metadata (following the same pattern as the YAML in the README) and behaves as follows for all top level fields except model-index.

Examples:

Starting from

from copy import deepcopy

existing_results = [{
    'dataset': {'name': 'IMDb', 'type': 'imdb'},
    'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.995}],
    'task': {'name': 'Text Classification', 'type': 'text-classification'}
}]

1. Overwrite existing metric value in existing result

new_results = deepcopy(existing_results)
new_results[0]["metrics"][0]["value"] = 0.999
_update_metadata_model_index(existing_results, new_results, overwrite=True)

[{'dataset': {'name': 'IMDb', 'type': 'imdb'},
  'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.999}],
  'task': {'name': 'Text Classification', 'type': 'text-classification'}}]

2. Add new metric to existing result

new_results = deepcopy(existing_results)
new_results[0]["metrics"][0]["name"] = "Recall"
new_results[0]["metrics"][0]["type"] = "recall"

[{'dataset': {'name': 'IMDb', 'type': 'imdb'},
  'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.995},
              {'name': 'Recall', 'type': 'recall', 'value': 0.995}],
  'task': {'name': 'Text Classification', 'type': 'text-classification'}}]

3. Add new result

new_results = deepcopy(existing_results)
new_results[0]["dataset"] = {'name': 'IMDb-2', 'type': 'imdb_2'}

[{'dataset': {'name': 'IMDb', 'type': 'imdb'},
  'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.995}],
  'task': {'name': 'Text Classification', 'type': 'text-classification'}},
 {'dataset': {'name': 'IMDb-2', 'type': 'imdb_2'},
  'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.995}],
  'task': {'name': 'Text Classification', 'type': 'text-classification'}}]

ENH Add update metadata to repocard by @lvwerra in #844
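The three cases above can be reproduced with a small merge routine. The sketch below is written for illustration and is not the library's `_update_metadata_model_index`: the names `merge_results` and `_merge_metrics` are hypothetical, and raising a ValueError on a conflicting value when `overwrite` is False is an assumption.

```python
from copy import deepcopy


def _merge_metrics(existing_metrics, new_metrics, overwrite):
    """Merge metrics in place, matching them on name and type."""
    for new_metric in new_metrics:
        for metric in existing_metrics:
            if metric["name"] == new_metric["name"] and metric["type"] == new_metric["type"]:
                if metric["value"] != new_metric["value"]:
                    if not overwrite:
                        raise ValueError(f"Metric {metric['name']!r} already has a different value.")
                    metric["value"] = new_metric["value"]
                break
        else:
            existing_metrics.append(deepcopy(new_metric))  # unknown metric: add it


def merge_results(existing, new, overwrite=False):
    """Return a merged copy; results are matched on dataset and task."""
    merged = deepcopy(existing)
    for new_result in new:
        for result in merged:
            if result["dataset"] == new_result["dataset"] and result["task"] == new_result["task"]:
                _merge_metrics(result["metrics"], new_result["metrics"], overwrite)
                break
        else:
            merged.append(deepcopy(new_result))  # unknown dataset/task pair: new result
    return merged


existing_results = [{
    "dataset": {"name": "IMDb", "type": "imdb"},
    "metrics": [{"name": "Accuracy", "type": "accuracy", "value": 0.995}],
    "task": {"name": "Text Classification", "type": "text-classification"},
}]

# Case 2 from the text: same result, a metric with a new name and type.
new_results = deepcopy(existing_results)
new_results[0]["metrics"][0]["name"] = "Recall"
new_results[0]["metrics"][0]["type"] = "recall"
merged = merge_results(existing_results, new_results)
```

Here `merged` holds one result with both the Accuracy and the Recall metric, while `existing_results` is left untouched because the merge works on a copy.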

Improvements and bug fixes

Keras: Saving history in a JSON file by @merveenoyan in #861
space after uri by @leondz in #866


v0.6.0(May 9, 2022)
Disclaimer: This release was initially announced with support for #844. That feature was not included in this release and will ship in v0.7.

fastai support

v0.6.0 introduces downstream (download) and upstream (upload) support for the fastai libraries. It supports fastai versions above 2.4. The integration is detailed in the following blog.

Add fastai upstream and downstream capacities for fastai>=2.4 and fastcore>=1.3.27 versions by @omarespejel in #678

Automatic binary file tracking in Repository

Binary files are now rejected by default by the Hub. v0.6.0 introduces automatic binary file tracking through the auto_lfs_track argument of the Repository.git_add method. It also introduces the Repository.auto_track_binary_files method which can be used independently of other methods.

ENH Auto track binary files in Repository by @LysandreJik in #828
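Deciding which files need LFS tracking comes down to telling binary files from text files. The sketch below uses a common heuristic (a NUL byte or invalid UTF-8 in the first bytes); it is not the detection logic used by Repository, and the names `looks_binary` and `files_to_track` are hypothetical.

```python
import codecs
import tempfile
from pathlib import Path


def looks_binary(path, sample_size=1024):
    """Heuristic: binary if the first bytes contain a NUL byte or are not valid UTF-8."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    if b"\x00" in sample:
        return True
    try:
        # Incremental decoding tolerates a multi-byte character cut at the sample boundary.
        codecs.getincrementaldecoder("utf-8")().decode(sample, final=False)
    except UnicodeDecodeError:
        return True
    return False


def files_to_track(folder):
    """Return the names of files in `folder` that look binary."""
    return sorted(p.name for p in Path(folder).iterdir() if p.is_file() and looks_binary(p))


folder = Path(tempfile.mkdtemp())
(folder / "README.md").write_text("# My model\n", encoding="utf-8")
(folder / "weights.bin").write_bytes(b"\x00\x01\x02\xff")
tracked = files_to_track(folder)
```

Only `weights.bin` is flagged in this example; a text file such as `README.md` would be committed as a regular file.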

skip_lfs_files is now added to mixins

The parameter skip_lfs_files is now added to the different mixins. This enables pushing files to the hub without first downloading the files above 10MB. This should dramatically reduce the time needed when updating a modelcard, a configuration file, and others.

:sparkles: add skip_lfs_files to mixins' push_to_hub by @nateraw in #858

Keras support improvement

The support for Keras model is greatly improved through several additions:

The save_pretrained_keras method now accepts a list of tags that will automatically be added to the repository.

Download statistics are now available on Keras models

Introducing list of tags to Keras model card by @merveenoyan in #806

Enable keras download stats by @merveenoyan in #860

Bugfixes and improvements

FIX don't raise if name/organization are passed positionally by @adrinjalali in #822

ENH Use provided token from HUGGING_FACE_HUB_TOKEN env variable if available by @FrancescoSaverioZuppichini in #794

tests(hf_api): remove infectionTypes field by @McPatate in #834

Remove docs, tasks and inference API from huggingface_hub by @osanseviero in #833

FEAT Uniformize hf_api a bit and add support for Spaces by @julien-c in #792

Add a bug report template by @osanseviero in #832

clean up formatting by @stevhliu in #839

Release guide by @LysandreJik in #820

Fix keras test by @osanseviero in #855

DOC Add quick start guide by @stevhliu in #850

MNT refactor: subprocess.run -> run_subprocess by @LysandreJik in #352

MNT enable preview on black by @adrinjalali in #849

Update how to guides by @stevhliu in #840

Update contribution guide for merging PRs by @stevhliu in #856

DOC Update landing page by @stevhliu in #854

space after uri by @leondz in #866

v0.5.1(Apr 7, 2022)

This is a patch release fixing a breaking backward compatibility issue.

Linked PR: https://github.com/huggingface/huggingface_hub/pull/822
v0.5.0(Apr 7, 2022)
Documentation

Version v0.5.0 is the first version which features an API reference. It is still a work in progress with features lacking, some images not rendering, and a documentation reorg coming up, but should already provide significantly simpler access to the huggingface_hub API.

The documentation is visible here.

API reference documentation by @LysandreJik in https://github.com/huggingface/huggingface_hub/pull/782

[API Reference docs] Remove git references from GitHub Action templates by @LysandreJik in https://github.com/huggingface/huggingface_hub/pull/813

DOC API docstring improvements by @adrinjalali in https://github.com/huggingface/huggingface_hub/pull/731

Model & datasets list improvements

The list_models and list_datasets methods have been improved in several ways.

List private models

These two methods now accept the token keyword to specify your token. Specifying the token will include your private models and datasets in the returned list.

Support list_models and list_datasets with token arg by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/638

Modelcard metadata

These two methods now accept the cardData boolean argument. If set to True, the modelcard metadata will also be returned when using these two methods.

Include cardData in list_models and list_datasets by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/639

Filtering by carbon emissions

The list_models method now also accepts an emissions_thresholds parameter to filter by carbon emissions.

Enable filtering by carbon emission by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/668
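Filtering by a pair of thresholds can be sketched on plain dictionaries. The sketch is illustrative only: the `co2_eq_emissions` key, the model entries and the helper `filter_by_emissions` are assumptions, not the data model used by list_models.

```python
def filter_by_emissions(models, minimum=None, maximum=None):
    """Return ids of models whose reported emissions lie within the given bounds."""
    kept = []
    for model in models:
        emissions = model.get("co2_eq_emissions")
        if emissions is None:
            continue  # models without a reported value cannot be compared
        if minimum is not None and emissions < minimum:
            continue
        if maximum is not None and emissions > maximum:
            continue
        kept.append(model["id"])
    return kept


# Hypothetical model entries.
models = [
    {"id": "a", "co2_eq_emissions": 10.0},
    {"id": "b", "co2_eq_emissions": 120.5},
    {"id": "c"},
    {"id": "d", "co2_eq_emissions": 55},
]
within_range = filter_by_emissions(models, minimum=50, maximum=100)
```

Leaving one bound as None makes the filter open-ended on that side.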

Keras improvements

The Keras serialization and upload methods have been worked on to provide better support for models:

All parameters are now included in the saved model when using push_to_hub_keras

log_dir parameter for TensorBoard logs, which will automatically spawn a TensorBoard instance on the Hub.

Automatic model card

Introduce include_optimizer parameter to push_to_hub_keras() by @merveenoyan in https://github.com/huggingface/huggingface_hub/pull/616

Add TensorBoard for Keras models by @merveenoyan in https://github.com/huggingface/huggingface_hub/pull/651

Create Automatic Keras model card by @merveenoyan in https://github.com/huggingface/huggingface_hub/pull/679

Allow TensorBoard Override for same Repository by @merveenoyan in https://github.com/huggingface/huggingface_hub/pull/709

Add tempfile for tensorboard logs in tensorboard tests in test_keras_integration.py by @merveenoyan in https://github.com/huggingface/huggingface_hub/pull/761

Contributing guide

A contributing guide is now available for the huggingface_hub repository. For any and all information related to contributing to the repository, please check it out!

Read more about it here: CONTRIBUTING.md.

Pre-commit hooks

The huggingface_hub GitHub repository has several checks to ensure that the code respects code quality standards. Opt-in pre-commit hooks have been added in order to make it simpler for contributors to leverage them.

Read more about it in the aforementioned CONTRIBUTING guide.

MNT Add pre-commit hooks by @adrinjalali in https://github.com/huggingface/huggingface_hub/pull/807

Renaming and transferring repositories

Repositories can now be renamed and transferred programmatically using move_repo.

Allow renaming and transferring repos programmatically by @osanseviero in https://github.com/huggingface/huggingface_hub/pull/704

Breaking changes & deprecation

⛔ The following methods have now been removed following a deprecation cycle

list_repos_objs

The list_repos_objs and the accompanying CLI utility huggingface-cli repo ls-files have been removed. The same can be done using the model_info and dataset_info methods.

Remove deprecated list_repos_objs and huggingface-cli repo ls-files by @julien-c in https://github.com/huggingface/huggingface_hub/pull/702

Python 3.6

Python 3.6 support is now dropped, as it has reached end of life. Installing huggingface_hub with Python 3.6 will result in version v0.4.0 being installed.

CI support python 3.7-3.10 - remove 3.6 support by @adrinjalali in https://github.com/huggingface/huggingface_hub/pull/790

⚠️ Items below are now deprecated and will be removed in a future version

API deprecate positional args in file_download and hf_api by @adrinjalali in https://github.com/huggingface/huggingface_hub/pull/745

MNT deprecate name and organization in favor of repo_id by @adrinjalali in https://github.com/huggingface/huggingface_hub/pull/733

What's Changed

Include "model" in repo_type to keep consistency by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/620

Hotfix for repo_type by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/623

fix: typo in docstring by @ariG23498 in https://github.com/huggingface/huggingface_hub/pull/647

{upload|delete}_file: Remove client-side filename validation by @SBrandeis in https://github.com/huggingface/huggingface_hub/pull/669

Ensure post_method is only executed once by @sgugger in https://github.com/huggingface/huggingface_hub/pull/676

Remove paying subscription mention from docstring by @cakiki in https://github.com/huggingface/huggingface_hub/pull/653

Improve tests and logging by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/682

docs(links): Update settings/token to settings/tokens by @ronvoluted in https://github.com/huggingface/huggingface_hub/pull/699

Add support for private hub by @juliensimon in https://github.com/huggingface/huggingface_hub/pull/703

Add retry_endpoint for test stability by @osanseviero in https://github.com/huggingface/huggingface_hub/pull/719

FIX fix a bug in _filter_emissions to accept numbers w/o decimal and dict emissions by @adrinjalali in https://github.com/huggingface/huggingface_hub/pull/753

Logging fix for hf_api, logging documentation by @LysandreJik in https://github.com/huggingface/huggingface_hub/pull/748

Contributing guide & code of conduct by @LysandreJik in https://github.com/huggingface/huggingface_hub/pull/692

Fix pytorch and tensorflow python matrix by @osanseviero in https://github.com/huggingface/huggingface_hub/pull/760

MNT add links to related projects and the forum on issue template by @adrinjalali in https://github.com/huggingface/huggingface_hub/pull/773

Note on the README by @LysandreJik in https://github.com/huggingface/huggingface_hub/pull/772

Remove autoreviewers by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/793

CI Error on FutureWarning by @adrinjalali in https://github.com/huggingface/huggingface_hub/pull/787

MNT more informative message on error in Hf.Api.delete_repo by @adrinjalali in https://github.com/huggingface/huggingface_hub/pull/783

Add security status by @McPatate in https://github.com/huggingface/huggingface_hub/pull/654

Remove redundant part of security test by @osanseviero in https://github.com/huggingface/huggingface_hub/pull/802

Changed test repository names to fix tests by @merveenoyan in https://github.com/huggingface/huggingface_hub/pull/803

TST calling delete_repo under tempfile for fixing the test by @merveenoyan in https://github.com/huggingface/huggingface_hub/pull/804

Disable logging in with organization token by @merveenoyan in https://github.com/huggingface/huggingface_hub/pull/780

MNT change dev version to 0.5, 0.4 is already released by @adrinjalali in https://github.com/huggingface/huggingface_hub/pull/810

👨‍💻 Configure HF Hub URL with environment variable by @SBrandeis in https://github.com/huggingface/huggingface_hub/pull/815

MNT support older requests versions by @adrinjalali in https://github.com/huggingface/huggingface_hub/pull/817

Rename the env variable HF_ENDPOINT. by @Narsil in https://github.com/huggingface/huggingface_hub/pull/819
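To show how an endpoint override interacts with the download URL format described at the top of this document, here is a minimal sketch. It is not the library's hf_hub_url: `build_resolve_url` is a hypothetical helper, it only covers model repos, and the exact quoting rules are an assumption.

```python
import os
from urllib.parse import quote

DEFAULT_ENDPOINT = "https://huggingface.co"


def build_resolve_url(repo_id, filename, revision="main", environ=os.environ):
    """Build a download URL, honoring HF_ENDPOINT when it is set in `environ`."""
    endpoint = environ.get("HF_ENDPOINT", DEFAULT_ENDPOINT).rstrip("/")
    # The revision is quoted as one path segment; "/" is kept in the filename
    # so that files in subfolders keep their path.
    return f"{endpoint}/{repo_id}/resolve/{quote(revision, safe='')}/{quote(filename)}"


default_url = build_resolve_url("julien-c/EsperBERTo-small", "pytorch_model.bin", environ={})
private_url = build_resolve_url(
    "julien-c/EsperBERTo-small",
    "pytorch_model.bin",
    environ={"HF_ENDPOINT": "https://hub.example.com/"},  # hypothetical private hub
)
```

Passing `environ` explicitly keeps the example deterministic; real code would rely on the process environment.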

New Contributors

@McPatate made their first contribution in https://github.com/huggingface/huggingface_hub/pull/583

@FremyCompany made their first contribution in https://github.com/huggingface/huggingface_hub/pull/606

@simoninithomas made their first contribution in https://github.com/huggingface/huggingface_hub/pull/633

@mlonaws made their first contribution in https://github.com/huggingface/huggingface_hub/pull/630

@ariG23498 made their first contribution in https://github.com/huggingface/huggingface_hub/pull/647

@J-Petiot made their first contribution in https://github.com/huggingface/huggingface_hub/pull/660

@ronvoluted made their first contribution in https://github.com/huggingface/huggingface_hub/pull/699

@juliensimon made their first contribution in https://github.com/huggingface/huggingface_hub/pull/703

@allendorf made their first contribution in https://github.com/huggingface/huggingface_hub/pull/742

@frgfm made their first contribution in https://github.com/huggingface/huggingface_hub/pull/747

@hbredin made their first contribution in https://github.com/huggingface/huggingface_hub/pull/688

Full Changelog: https://github.com/huggingface/huggingface_hub/compare/v0.4.0...v0.5.0
v0.4.0(Jan 26, 2022)
Tag listing

Introduce Tag Listing by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/537

This PR introduces the ability to fetch all available tags for models or datasets and returns them as a nested namespace object, for example:

>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> tags = api.get_model_tags()
>>> print(tags)
Available Attributes:
 * benchmark
 * language_creators
 * languages
 * licenses
 * multilinguality
 * size_categories
 * task_categories
 * task_ids

>>> print(tags.benchmark)
Available Attributes:
 * raft
 * superb
 * test

Namespace objects

Namespace Objects for Search Parameters by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/556

With a goal of adding more tab-completion to the library, this PR introduces two objects:

DatasetSearchArguments

ModelSearchArguments

These two AttributeDictionary objects contain all the valid information we can extract from a model as tab-complete parameters. Through careful string splitting, we also include the author_or_organization and the dataset_name (or model_name).
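The idea can be illustrated with a minimal sketch. The `AttrDict` class and the `split_repo_id` helper below are hypothetical names written for this example; they are not the library's implementation, only one way attribute access, tab completion, and id splitting could fit together.

```python
from typing import Optional, Tuple


class AttrDict(dict):
    """Dictionary whose keys can also be read as attributes (illustrative only)."""

    def __getattr__(self, name: str):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __dir__(self):
        # Listing the keys lets interactive shells suggest them on tab completion.
        return sorted(set(super().__dir__()) | set(self.keys()))


def split_repo_id(repo_id: str) -> Tuple[Optional[str], str]:
    """Split "author/name" into (author, name); author is None without a slash."""
    author, sep, name = repo_id.rpartition("/")
    return (author if sep else None), name


author, name = split_repo_id("microsoft/wavlm-base-sd")
args = AttrDict(
    author=AttrDict({author: author}),
    # Attribute names cannot contain "-", so the key is normalized (an assumption).
    model_name=AttrDict({name.replace("-", "_"): name}),
)
print(args.author.microsoft)
print(args.model_name.wavlm_base_sd)
```

Keys that are not valid Python identifiers need some normalization before they can be used as attributes; replacing `-` with `_` is just one possible choice.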

Model Filter

Implement a Model Filter class by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/553

This PR introduces a new way to search the hub: the ModelFilter class.

To the user it first looks like a simple Enum, letting them specify what they want to search for, such as:

f = ModelFilter(author="microsoft", model_name="wavlm-base-sd", framework="pytorch")

From there, they can pass this filter to the list_models method of HfApi to search the hub:

models = api.list_models(filter=f)

The API may then be used for complex queries:

args = ModelSearchArguments()
f = ModelFilter(
    framework=[args.library.pytorch, args.library.TensorFlow],
    model_name="bert",
    tasks=[args.pipeline_tag.Summarization, args.pipeline_tag.TokenClassification],
)
api.list_models(filter=f)

Ignoring filenames in snapshot_download

This PR introduces a way to limit the files fetched by snapshot_download. This is useful when you want to download and cache an entire repository without using git, while skipping some files based on their filenames.

[Snapshot download] allow some filenames to be ignored by @patrickvonplaten in https://github.com/huggingface/huggingface_hub/pull/566
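As a rough illustration of filename-based skipping, the sketch below filters a list of filenames against glob-style patterns. The `keep_filenames` helper, the `ignore_patterns` name, and the use of glob matching are assumptions made for this example; the actual parameter name and matching rules of snapshot_download may differ.

```python
from fnmatch import fnmatch
from typing import Iterable, List


def keep_filenames(filenames: Iterable[str], ignore_patterns: List[str]) -> List[str]:
    """Return the filenames that match none of the ignore patterns."""
    return [
        name
        for name in filenames
        if not any(fnmatch(name, pattern) for pattern in ignore_patterns)
    ]


# Hypothetical repository listing, used only to exercise the helper.
repo_files = ["config.json", "pytorch_model.bin", "flax_model.msgpack", "README.md"]
to_download = keep_filenames(repo_files, ["*.msgpack", "*.bin"])
print(to_download)
```

Filtering before downloading means the skipped files never reach the cache, which is the point of the feature when a repository holds weights for several frameworks.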

What's Changed

[Hotfix][API] card_data => cardData on /api/datasets by @julien-c in https://github.com/huggingface/huggingface_hub/pull/530

Fix the progress bars when cloning a repository by @LysandreJik in https://github.com/huggingface/huggingface_hub/pull/517

Update Hugging Face Hub documentation README and Endpoints by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/527

Convert string functions to f-string by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/536

Fixing FS for espnet. by @Narsil in https://github.com/huggingface/huggingface_hub/pull/542

[snapshot_download] upgrade to canonical separator by @julien-c in https://github.com/huggingface/huggingface_hub/pull/545

Add test directions by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/547

[HOTFIX] Change test for missing_input to reflect back-end redirect changes by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/552

Bring consistency to download and upload APIs by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/574

Search by authors and string by @FrancescoSaverioZuppichini in https://github.com/huggingface/huggingface_hub/pull/531

Quick typo by @muellerzr in https://github.com/huggingface/huggingface_hub/pull/575

New Contributors

@kahne made their first contribution in https://github.com/huggingface/huggingface_hub/pull/569

@FrancescoSaverioZuppichini made their first contribution in https://github.com/huggingface/huggingface_hub/pull/531

Full Changelog: https://github.com/huggingface/huggingface_hub/compare/v0.2.1...v0.4.0
v0.2.1(Jan 26, 2022)

This is a patch release fixing an issue with the notebook login.

https://github.com/huggingface/huggingface_hub/commit/5e2da9bae95ed4c99683e9572ecc32c9e0da5e15#diff-fb1696cbcf008dd89dde5e8c1da9d4be5a8f7d809bc32f07d4453caba40df15f
v0.2.0(Jan 26, 2022)

Access tokens

Version v0.2.0 introduces compatibility with the hub's access tokens. Access tokens are now the main login method; logging in with a username and password is still possible by pressing [Ctrl/CMD]+C at the login prompt.

The notebook login is adapted to work with the access tokens.

Skipping large files

The Repository class now has an additional parameter, skip_lfs_files, which allows cloning the repository while skipping the large file download.

https://github.com/huggingface/huggingface_hub/pull/472

Local files only for snapshot_download

The snapshot_download method can now take local_files_only as a parameter to enable leveraging previously downloaded files.

https://github.com/huggingface/huggingface_hub/pull/505
v0.1.2(Nov 9, 2021)
What's Changed

clean_ok should be True by default by @LysandreJik in https://github.com/huggingface/huggingface_hub/pull/462

Full Changelog: https://github.com/huggingface/huggingface_hub/compare/v0.1.1...v0.1.2
v0.1.1(Nov 5, 2021)
What's Changed

Fix typing-extensions minimum version by @lhoestq in https://github.com/huggingface/huggingface_hub/pull/453

Fix argument order in create_repo for Repository.clone_from by @sgugger in https://github.com/huggingface/huggingface_hub/pull/459

Full Changelog: https://github.com/huggingface/huggingface_hub/compare/v0.1.0...v0.1.1
v0.1.0(Nov 2, 2021)
What's Changed

Version v0.1.0 is the first minor release of the huggingface_hub package, which promises better stability for the incoming versions. This update comes with big quality of life improvements.

Make token optional in all HfApi methods. by @sgugger in https://github.com/huggingface/huggingface_hub/pull/379

Previously, most methods of the HfApi class required the token to be explicitly passed. This is changed in this version, where it defaults to the token stored in the cache. This results in a re-ordering of arguments, but backward compatibility is preserved in most cases. Where it is not preserved, an explicit error is thrown.

Root methods instead of HfApi by @LysandreJik in https://github.com/huggingface/huggingface_hub/pull/388

The HfApi class now exposes its methods through the hf_api file, reducing the friction to access these helpers. See the example below:

# Previously
from huggingface_hub import HfApi

api = HfApi()
user = api.whoami()

# Now
from huggingface_hub.hf_api import whoami

user = whoami()

The HfApi can still be imported and works as before for backward compatibility.

Add list_repo_files util by @sgugger in https://github.com/huggingface/huggingface_hub/pull/395

Offers a list_repo_files to ... list the repo files! Supports both model repositories and dataset repositories

Add helper to generate an eval result model-index, with proper typing by @julien-c in https://github.com/huggingface/huggingface_hub/pull/382

Offers a metadata_eval_result in order to generate a YAML block to put in model cards according to evaluation results.

Add metrics to API by @mariosasko in https://github.com/huggingface/huggingface_hub/pull/429

Adds a list_metrics method to HfApi!

Git prune by @LysandreJik in https://github.com/huggingface/huggingface_hub/pull/450

Adds a git_prune method to the Repository class. This prunes local files that are no longer needed because they have already been pushed to a remote. It also adds the auto_lfs_prune argument to git_push and to the commit context manager for simpler handling.

Bug fixes

Fix HfApi.create_repo when repo_type is 'space' by @nateraw in https://github.com/huggingface/huggingface_hub/pull/394

Last fixes for datasets' push_to_hub method by @LysandreJik in https://github.com/huggingface/huggingface_hub/pull/415

Full Changelog: https://github.com/huggingface/huggingface_hub/compare/v0.0.19...v0.1.0
v0.0.18(Oct 4, 2021)
v0.0.18: Repo metadata, git tags, Keras mixin

Repository metadata (@julien-c)

The version v0.0.18 of the huggingface_hub includes tools to manage repository metadata. The following example reads metadata from a repository:

from huggingface_hub import Repository

repo = Repository("xxx", clone_from="yyy")
data = repo.repocard_metadata_load()

The following example completes that metadata before writing it to the repository locally.

data["license"] = "apache-2.0"
repo.repocard_metadata_save(data)

Repo metadata load and save #339 (@julien-c)

Git tags (@AngledLuffa)

Tag management is now available! Add, check, delete tags locally or remotely directly from the Repository utility.

Tags #323 (@AngledLuffa)

Revisited Keras support (@nateraw)

The Keras mixin has been revisited:

It now saves models as SavedModel objects rather than .h5 files.

It now offers methods that can be leveraged simply as a functional API, instead of having to use the Mixin as an actual mixin.

Improvements and bug fixes

Better error message for bad token. #362 (@sgugger)

Add utility to get repo name #364 (@sgugger)

Improve save and load repocard metadata #355 (@elishowk)

Update Keras Mixin #284 (@nateraw)

Add timeout to dataset_info #373 (@lhoestq)

v0.0.17(Oct 4, 2021)
v0.0.17: Non-blocking git push, notebook login

Non-blocking git-push

The push methods now accept a boolean blocking parameter; set it to False to make the push happen asynchronously.

In order to see if the push has finished or its status code (to spot a failure), one should use the command_queue property on the Repository object.

For example:

from huggingface_hub import Repository

repo = Repository("<local_folder>", clone_from="<user>/<model_name>")

with repo.commit("Commit message", blocking=False):
    # Save data
    ...

last_command = repo.command_queue[-1]

# Status of the push command
last_command.status
# Will return the status code
#     -> -1 will indicate the push is still ongoing
#     -> 0 will indicate the push has completed successfully
#     -> non-zero code indicates the error code if there was an error

# if there was an error, the stderr may be inspected
last_command.stderr

# Whether the command finished or if it is still ongoing
last_command.is_done

# Whether the command errored-out.
last_command.failed

When using blocking=False, the commands will be tracked and your script will exit only when all pushes are done, even if other errors happen in your script (a failed push counts as done).
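The status convention described above (-1 while running, 0 on success, any other code on failure) can be sketched with a small wrapper around a subprocess. `TrackedCommand` is a hypothetical class written for this illustration; it mirrors the attribute names from the example but is not the library's implementation, and it runs a trivial Python command instead of a git push.

```python
import subprocess
import sys


class TrackedCommand:
    """Hypothetical stand-in for one entry of a command queue."""

    def __init__(self, args):
        # Start the process without waiting for it to finish.
        self._process = subprocess.Popen(
            args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
        )
        self.stdout = ""
        self.stderr = ""

    @property
    def status(self) -> int:
        # -1 while the process is running, otherwise its exit code.
        code = self._process.poll()
        return -1 if code is None else code

    @property
    def is_done(self) -> bool:
        return self._process.poll() is not None

    @property
    def failed(self) -> bool:
        code = self._process.poll()
        return code is not None and code != 0

    def wait(self) -> int:
        # Block until the process exits and collect its output.
        self.stdout, self.stderr = self._process.communicate()
        return self.status


ok = TrackedCommand([sys.executable, "-c", "pass"])
bad = TrackedCommand([sys.executable, "-c", "import sys; sys.exit(3)"])
ok.wait()
bad.wait()
print(ok.status, ok.failed)
print(bad.status, bad.failed)
```

A script using this pattern would typically wait on every tracked command before exiting, which matches the behavior described above where the script exits only once all pushes are done.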

Non blocking git push #315 (@LysandreJik)

Notebook login (@sgugger)

The huggingface_hub library now has a notebook_login method which can be used to log in from notebooks with no access to the shell. In a notebook, log in with the following:

from huggingface_hub import notebook_login

notebook_login()

Add a widget to login in notebook #329 (@sgugger)

Improvements and bugfixes

added option to create private repo #319 (@philschmid)

display git push warnings #326 (@elishowk)

Allow specifying data with the Inference API wrapper #271 (@osanseviero)

Add auth to snapshot download #340 (@lewtun)

v0.0.16(Aug 27, 2021)
v0.0.16: Progress bars, git credentials

The huggingface_hub version v0.0.16 introduces several quality of life improvements.

Progress bars in Repository

Progress bars are now visible with many git operations, such as pulling, cloning and pushing:

>>> from huggingface_hub import Repository
>>> repo = Repository("local_folder", clone_from="huggingface/CodeBERTa-small-v1")

Cloning https://huggingface.co/huggingface/CodeBERTa-small-v1 into local empty directory.
Download file pytorch_model.bin: 45%|████████████████████████████▋ | 144M/321M [00:13<00:12, 14.7MB/s]
Download file flax_model.msgpack: 42%|██████████████████████████▌ | 134M/319M [00:13<00:13, 14.4MB/s]

Branching support

There is now branching support in Repository. The example below clones the xxx repository and checks out the new-branch revision. If it is an existing branch on the remote, it will check out that branch. If it is another revision, such as a commit or a tag, it will also check out that revision.

If the revision does not exist, it will create a branch from the latest commit on the main branch.

>>> from huggingface_hub import Repository
>>> repo = Repository("local", clone_from="xxx", revision="new-branch")

Once the repository is instantiated, it is possible to manually checkout revisions using the git_checkout method. If the revision already exists:

>>> repo.git_checkout("main")

If a branch should be created from the current head in the case that it does not exist:

>>> repo.git_checkout("brand-new-branch", create_branch_ok=True)

Revision `brand-new-branch` does not exist. Created and checked out branch `brand-new-branch`

Finally, the commit context manager has a new branch parameter to specify which branch the utility should push to:

>>> with repo.commit("New commit on branch brand-new-branch", branch="brand-new-branch"):
...     # Save any file or model here, it will be committed to that branch.
...     torch.save(model.state_dict(), "model.pt")

Git credentials

The login system has been redesigned to leverage git-credential instead of a token-based authentication system. It leverages the git-credential store helper. If you're unaware of what this is, you may see the following when logging in with huggingface_hub:

_|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
_|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
_|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
_|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
_|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

Username:
Password:
Login successful
Your token has been saved to /root/.huggingface/token
Authenticated through git-crendential store but this isn't the helper defined on your machine.
You will have to re-authenticate when pushing to the Hugging Face Hub. Run the following command in your terminal to set it as the default

git config --global credential.helper store

Running the command git config --global credential.helper store will set this as the default way to handle credentials for git authentication. All repositories instantiated with the Repository utility will have this helper set by default, so no action is required on your part when leveraging it.

Improved logging

The logging system is now similar to the existing logging system in transformers and datasets, based on a logging module that controls the entire library's logging level:

>>> from huggingface_hub import logging
>>> logging.set_verbosity_error()
>>> logging.set_verbosity_info()

Bug fixes and improvements

Add documentation to GitHub and the Hub docs about the Inference client wrapper #253 (@osanseviero)

Have large files enabled by default when using Repository #219 (@LysandreJik)

Clarify/specify/document model card metadata, model-index, and pipeline/task types #265 (@julien-c)

[model_card][metadata] Actually, lets make dataset.name required #267 (@julien-c)

Progress bars #261 (@LysandreJik)

Add keras mixin #230 (@nateraw)

Open source code related to the repo type (tag icon, display order, snippets) #273 (@osanseviero)

Branch push to hub #276 (@LysandreJik)

Git credentials #277 (@LysandreJik)

Push to hub/commit with branches #282 (@LysandreJik)

Better logging #262 (@LysandreJik)

Remove custom language pack behavior #291 (@LysandreJik)

Update Hub and huggingface_hub docs #293 (@osanseviero)

Adding a handler #292 (@LysandreJik)

v0.0.15(Jul 28, 2021)
v0.0.15: Documentation, bug fixes and misc improvements

Improvements and bugfixes

[Docs] Update link to Gradio documentation #206 (@abidlabs)

Fix title typo (Cliet -> Client) #207 (@cakiki)

add _from_pretrained hook #159 (@nateraw)

Add filename option to lfs_track #212 (@LysandreJik)

Repository fixes #213 (@LysandreJik)

Repository documentation #214 (@LysandreJik)

Add datasets filtering and sorting #194 (@lhoestq)

doc: sync github to spaces #221 (@borisdayma)

added batch transform documentation & model archive documentation #224 (@philschmid)

Sync with hf internal #228 (@mishig25)

Adding batching support for superb #215 (@Narsil)

Adding SD for superb (speech-classification). #225 (@Narsil)

Use Hugging Face fork for s3prl #229 (@lewtun)

Mv interfaces -> widgets/lib/interfaces #227 (@mishig25)

Tweak to prevent accidental sharing of token #226 (@julien-c)

Fix CLI-based repo creation #234 (@osanseviero)

Add proxify util function #235 (@mishig25)

v0.0.14(Jul 18, 2021)
v0.0.14: LFS Auto tracking, dataset_info and list_datasets, documentation

Datasets

Datasets repositories get better support, by first enabling full usage of the Repository class for datasets repositories:

from huggingface_hub import Repository

repo = Repository("local_directory", clone_from="<user>/<model_id>", repo_type="dataset")

Datasets can now be retrieved from the Python runtime using the list_datasets method from the HfApi class:

from huggingface_hub import HfApi

api = HfApi()
datasets = api.list_datasets()
len(datasets)
# 1048 publicly available dataset repositories at the time of writing

Information can be retrieved on specific datasets using the dataset_info method from the HfApi class:

from huggingface_hub import HfApi

api = HfApi()
api.dataset_info("squad")
# DatasetInfo: {
#     id: squad
#     lastModified: 2021-07-07T13:18:53.595Z
#     tags: ['pretty_name:SQuAD', 'annotations_creators:crowdsourced', 'language_creators:crowdsourced', 'language_creators:found',
#     [...]

Add dataset_info and list_datasets #164 (@lhoestq)

Enable dataset repositories #151 (@LysandreJik)

Inference API wrapper client

Version v0.0.14 introduces a wrapper client for the Inference API. No need to use custom-made requests anymore. See below for an example.

from huggingface_hub import InferenceApi

api = InferenceApi("bert-base-uncased")
api(inputs="The [MASK] is great")
# [
#     {'sequence': 'the music is great', 'score': 0.03599703311920166, 'token': 2189, 'token_str': 'music'},
#     {'sequence': 'the price is great', 'score': 0.02146693877875805, 'token': 3976, 'token_str': 'price'},
#     {'sequence': 'the money is great', 'score': 0.01866752654314041, 'token': 2769, 'token_str': 'money'},
#     {'sequence': 'the fun is great', 'score': 0.01654735580086708, 'token': 4569, 'token_str': 'fun'},
#     {'sequence': 'the effect is great', 'score': 0.015102624893188477, 'token': 3466, 'token_str': 'effect'}
# ]

Inference API wrapper client #65 (@osanseviero)

Auto-track with LFS

Version v0.0.14 introduces an auto-tracking mechanism with git-lfs for large files. Files that are larger than 10MB can be automatically tracked by using the auto_track_large_files method:

from huggingface_hub import Repository

repo = Repository("local_directory", clone_from="<user>/<model_id>")

# save large files in `local_directory`
repo.git_add()
repo.auto_track_large_files()
repo.git_commit("Add large files")
repo.git_push()
# No push rejected error anymore!

It is automatically used when leveraging the commit context manager:

from huggingface_hub import Repository

repo = Repository("local_directory", clone_from="<user>/<model_id>")

with repo.commit("Add large files"):
    # add large files
    ...

# No push rejected error anymore!

Auto track with LFS #177 (@LysandreJik)
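To make the size-based rule concrete, here is a minimal sketch of how files above a threshold could be located in a working directory. The `find_large_files` helper is hypothetical and is not how auto_track_large_files is implemented; reading 10MB as 10 * 1024 * 1024 bytes and skipping the `.git` folder are both assumptions of this example.

```python
import os
import tempfile
from typing import List

# Assumed byte value for the 10MB threshold mentioned above.
LARGE_FILE_THRESHOLD = 10 * 1024 * 1024


def find_large_files(folder: str, threshold: int = LARGE_FILE_THRESHOLD) -> List[str]:
    """Return paths (relative to folder) of files strictly larger than threshold."""
    large = []
    for root, dirs, files in os.walk(folder):
        # Skip git internals so repository metadata is never reported.
        dirs[:] = [d for d in dirs if d != ".git"]
        for name in files:
            path = os.path.join(root, name)
            if os.path.getsize(path) > threshold:
                large.append(os.path.relpath(path, folder))
    return sorted(large)


# Exercise the helper on a throwaway directory with a lowered threshold.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "small.txt"), "w") as f:
        f.write("tiny")
    with open(os.path.join(folder, "big.bin"), "wb") as f:
        f.write(b"\0" * 2048)
    found = find_large_files(folder, threshold=1024)

print(found)
```

The paths returned by such a scan are what would then be handed to git-lfs for tracking before the commit is made.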

Documentation

Update docs structure #145 (@Pierrci)

Update links to docs #147 (@LysandreJik)

Add new repo guide #153 (@osanseviero)

Add documentation for endpoints #155 (@osanseviero)

Document hf.co webhook publicly #156 (@julien-c)

docs: ✏️ mention the Training metrics tab #193 (@severo)

doc for Spaces #189 (@julien-c)

Breaking changes

Reminder: the huggingface_hub library follows semantic versioning and is undergoing active development. While the first major version is not out (v1.0.0), you should expect breaking changes and we strongly recommend pinning the library to a specific version.

Two breaking changes are introduced with version v0.0.14.

The whoami return changes from a tuple to a dictionary

Allow obtaining Inference API tokens with whoami #157 (@osanseviero)

The whoami method changes its returned value from a tuple of (<user>, [<organisations>]) to a dictionary containing a lot more information:

In versions v0.0.13 and below, here was the behavior of the whoami method from the HfApi class:

from huggingface_hub import HfFolder, HfApi

api = HfApi()
api.whoami(HfFolder.get_token())
# ('<user>', ['<org_0>', '<org_1>'])

In version v0.0.14, this is updated to the following:

from huggingface_hub import HfFolder, HfApi

api = HfApi()
api.whoami(HfFolder.get_token())
# {
#     'type': str,
#     'name': str,
#     'fullname': str,
#     'email': str,
#     'emailVerified': bool,
#     'apiToken': str,
#     'plan': str,
#     'avatarUrl': str,
#     'orgs': List[str]
# }
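Code that has to run against both return shapes could normalize them first. The sketch below is one possible approach; `normalize_whoami` is a hypothetical helper, and it assumes the dictionary has exactly the 'name' and 'orgs' keys listed above.

```python
from typing import Any, Dict, List, Tuple, Union

WhoamiResult = Union[Tuple[str, List[str]], Dict[str, Any]]


def normalize_whoami(result: WhoamiResult) -> Dict[str, Any]:
    """Reduce either whoami return shape to a dict with 'name' and 'orgs'."""
    if isinstance(result, tuple):
        # Shape used in v0.0.13 and below: (<user>, [<organisations>])
        user, orgs = result
        return {"name": user, "orgs": list(orgs)}
    # Shape used from v0.0.14: a dictionary with more fields.
    return {"name": result["name"], "orgs": list(result["orgs"])}


old_style = ("alice", ["org_0", "org_1"])
new_style = {"type": "user", "name": "alice", "orgs": ["org_0", "org_1"]}
print(normalize_whoami(old_style) == normalize_whoami(new_style))
```

Pinning the library version, as recommended above, remains the simpler option when supporting both shapes is not required.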

The Repository's use_auth_token initialization parameter now defaults to True.

The use_auth_token initialization parameter of the Repository class now defaults to True. The behavior is unchanged if users are not logged in, at which point Repository remains agnostic to the huggingface_hub.

Set use_auth_token to True by default #204 (@LysandreJik)

Improvements and bugfixes

Add sklearn code snippet #133 (@osanseviero)

Allow passing only model ID to clone when authenticated #150 (@LysandreJik)

More robust endpoint with toggled staging endpoint #148 (@LysandreJik)

Add config to list_models #152 (@osanseviero)

Fix audio-to-audio widget and add icon #142 (@osanseviero)

Upgrade spaCy to api 0.0.12 and remove allowlist #161 (@osanseviero)

docs: fix webhook response format #162 (@severo)

Update link in README.md #163 (@nateraw)

Revert "docs: fix webhook response format (#162)" #165 (@severo)

Add Keras docker image #117 (@osanseviero)

Allow multiple models when testing a pipeline #124 (@osanseviero)

scikit rebased #170 (@Narsil)

Upgrading community frameworks to audio-to-audio. #94 (@Narsil)

Add sagemaker docs #173 (@philschmid)

Add Structured Data Classification as task #172 (@osanseviero)

Fixing keras outputs (widgets was ignoring because of type mismatch, now testing for it) #176 (@Narsil)

Updating spacy. #179 (@Narsil)

Create initial superb docker image structure #181 (@osanseviero)

Upgrading asteroid image. #175 (@Narsil)

Removing tests on huggingface_hub for unrelated changes in api-inference-community #180 (@Narsil)

Fixing audio-to-audio validation. #184 (@Narsil)

rmdir api-inference-community/src/sentence-transformers #188 (@Pierrci)

Allow generic inference for ASR for superb #185 (@osanseviero)

Add timestamp to snapshot download tests #201 (@LysandreJik)

No need for token to understand HF urls #203 (@LysandreJik)

Remove --no_renames argument to list deleted files. #205 (@LysandreJik)


v0.0.13(Jun 28, 2021)

v0.0.13: Context Manager

Version 0.0.13 introduces a context manager to save files directly to the Hub. See below for some examples.

Example with a single file

import json

from huggingface_hub import Repository

repo = Repository("text-files", clone_from="<user>/text-files", use_auth_token=True)

with repo.commit("My first file."):
    with open("file.txt", "w+") as f:
        f.write(json.dumps({"key": "value"}))

Example with a `torch.save` statement:

import torch
from huggingface_hub import Repository

model = torch.nn.Transformer()

repo = Repository("torch-files", clone_from="<user>/torch-files", use_auth_token=True)

with repo.commit("Adding my cool model!"):
    torch.save(model.state_dict(), "model.pt")

Example with a Flax/JAX serialization statement

from flax import serialization
from jax import random
from flax import linen as nn
from huggingface_hub import Repository

model = nn.Dense(features=5)

key1, key2 = random.split(random.PRNGKey(0))
x = random.normal(key1, (10,))
params = model.init(key2, x)

bytes_output = serialization.to_bytes(params)

repo = Repository("flax-model", clone_from="<user>/flax-model", use_auth_token=True)

with repo.commit("Adding my cool Flax model!"):
    with open("flax_model.msgpack", "wb") as f:
        f.write(bytes_output)


v0.0.12(Jun 23, 2021)

Patches an issue when cloning a repository twice.
v0.0.11(Jun 23, 2021)
v0.0.11: Improved documentation, hf_hub_download and Repository power-up

Improved documentation

The huggingface_hub documentation is now available on hf.co/docs! Additionally, a new step-by-step guide to adding libraries is available.

New documentation for 🤗 Hub #71 (@osanseviero)

Step by step guide on adding Model Hub support to libraries #86 (@LysandreJik)

New method: hf_hub_download

A new method is introduced: hf_hub_download. It is the equivalent of doing cached_download(hf_hub_url()), in a single method.

HF Hub download #137 (@LysandreJik)
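The URL half of that composition follows the pattern shown in the README (https://huggingface.co/<repo_id>/resolve/<revision>/<filename>). The sketch below rebuilds such a URL without any network access; `build_resolve_url` is a hypothetical helper, and defaulting the revision to "main" and percent-encoding it are assumptions of this example rather than documented behavior.

```python
from urllib.parse import quote


def build_resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a download URL following the resolve pattern shown in the README."""
    # Percent-encode the revision so branch names containing "/" stay one path segment.
    safe_revision = quote(revision, safe="")
    return f"https://huggingface.co/{repo_id}/resolve/{safe_revision}/{filename}"


url = build_resolve_url("julien-c/EsperBERTo-small", "pytorch_model.bin")
print(url)
```

A single-call helper such as hf_hub_download would then pass a URL like this to the caching download step.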

Repository power-up

The Repository class is updated to behave more similarly to git. It is now impossible to clone a repository in a folder that already contains files.

The PyTorch Mixin contributed by @vasudevgupta7 is slightly updated to have the push_to_hub method manage a repository as one would from the command line.

Repository power-up #132 (@LysandreJik)

Improvement & Fixes

Adding audio-to-audio task. #93 (@Narsil)

When pipelines fail to load in framework code, for whatever reason #96 (@Narsil)

Solve rmtree issue on windows #105 (@SBrandeis)

Add identical_ok option to HfApi.upload_file method #102 (@SBrandeis)

Solve compatibility issues when calling subprocess.run #104 (@SBrandeis)

Open source Inference widgets + optimize for community contributions #87 (@julien-c)

model tags can be undefined #107 (@Pierrci)

Doc tweaks #109 (@julien-c)

[huggingface_hub] Support for spaces #108 (@julien-c)

speechbrain library tag + code snippet #73 (@osanseviero)

Allow batching for feature-extraction #106 (@osanseviero)

adding audio-to-audio widget. #95 (@Narsil)

Add image to text (for image captioning) #114 (@osanseviero)

Add formatting and upgrade Sentence Transformers api version for better error messages #119 (@osanseviero)

Change videos in docs so they are played directly in our site #120 (@osanseviero)

Fix inference API GitHub actions #125 (@osanseviero)

Fixing sentence-transformers CACHE value for docker + functools (docker needs Py3.8) #123 (@Narsil)

Load errors with flair should now be generating proper API errors. #121 (@Narsil)

Simplify manage to autodetect task+framework if possible. #122 (@Narsil)

Change sentence transformers source to original repo #128 (@osanseviero)

Allow Python versions with letters in the minor version suffix #82 (@ulf1)

Update upload_file docs #136 (@LysandreJik)

Reformat repo README #130 (@osanseviero)

Add config to model info #135 (@osanseviero)

Add input validation for structured-data-classification #97 (@osanseviero)

v0.0.10(Jun 8, 2021)
v0.0.10: Merging huggingface_hub with api-inference-community and hub interfaces

v0.0.10 marks the merging of three components of the Hugging Face stack: the huggingface_hub repository is now the central platform to contribute new libraries to be supported on the hub.

It brings together three previously separate components:

The huggingface_hub Python library, as the Python library to download, upload, and retrieve information from the hub.

The api-inference-community, as the platform where libraries wishing for hub support may be added.

The interfaces, as the definition for pipeline types as well as default widget inputs and definitions/UI elements for third-party libraries.

Future efforts will focus on making it easier to contribute third-party libraries to the Hugging Face Hub.

Improvement & Fixes

Add typing extensions to conda yaml file #49 (@LysandreJik)

Alignment on modelcard metadata specification #39 (@LysandreJik)

Bring interfaces from widgets-server #50 (@julien-c)

Sentence similarity default widget and pipeline type #52 (@osanseviero)

[interfaces] Expose configuration options for external libraries #51 (@julien-c)

Adding api-inference-community to huggingface_hub. #48 (@Narsil)

Add TensorFlowTTS as library + code snippet #55 (@osanseviero)

Add protobuf as a dependency to handle tokenizers that require it: #58 (@Narsil)

Update validation for NLP tasks #59 (@osanseviero)

spaCy code snippet and language tag #57 (@osanseviero)

SpaCy fixes #60 (@osanseviero)

Allow changing repo visibility programmatically #61 (@osanseviero)

Add Adapter Transformers snippet #62 (@osanseviero)

Change order in spaCy snippet #66 (@osanseviero)

Add validation to check all rows in table question answering have same length #67 (@osanseviero)

added question-answering part for Bengali language #68 (@sagorbrur)

Add spaCy to inference API #63 (@osanseviero)

AllenNLP library tag + code snippet #72 (@osanseviero)

Fix AllenNLP QA example #80 (@epwalsh)

do not crash even if this config isn't set #81 (@julien-c)

Mark model config as optional #83 (@Pierrci)

Add repr() to ModelFile and RepoObj #75 (@lewtun)

Refactor create_repo #84 (@SBrandeis)

v0.0.9(May 20, 2021)
v0.0.9: HTTP file uploads, multiple filter model selection

Support for large file uploads

Implementation of an endpoint to programmatically upload (large) files to any repo on the hub, without the need for git, using HTTP POST requests.

[API] Support for the file upload endpoint #42 (@SBrandeis)

The HfApi.list_models method now allows multiple filters

Models may now be filtered using several filters:

Example usage:

>>> from huggingface_hub import HfApi
>>> api = HfApi()

>>> # List all models
>>> api.list_models()

>>> # List only the text classification models
>>> api.list_models(filter="text-classification")

>>> # List only the Russian models compatible with pytorch
>>> api.list_models(filter=("ru", "pytorch"))

>>> # List only the models trained on the "common_voice" dataset
>>> api.list_models(filter="dataset:common_voice")

>>> # List only the models from the AllenNLP library
>>> api.list_models(filter="allennlp")

Document the filter argument #41 (@LysandreJik)
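The examples above suggest that several filters combine with AND semantics (Russian and PyTorch). The sketch below shows that logic applied client-side to tag lists. `matches_all` and the sample records are hypothetical, and the real filtering happens on the server, so this illustrates the semantics only.

```python
from typing import Iterable, List, Union


def matches_all(tags: Iterable[str], filters: Union[str, Iterable[str]]) -> bool:
    """Return True when every filter appears among the tags."""
    if isinstance(filters, str):
        # A single string is treated as a one-element filter list.
        filters = (filters,)
    tag_set = set(tags)
    return all(f in tag_set for f in filters)


# Hypothetical model records, used only to exercise the helper.
models = [
    {"id": "user/model-a", "tags": ["ru", "pytorch", "text-classification"]},
    {"id": "user/model-b", "tags": ["ru", "tf"]},
    {"id": "user/model-c", "tags": ["en", "pytorch"]},
]
selected: List[str] = [m["id"] for m in models if matches_all(m["tags"], ("ru", "pytorch"))]
print(selected)
```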

ModelInfo now has a readable representation

Improvement of the ModelInfo class so that it displays information about the object.

Include a readable repr for ModelInfo #32 (@muellerzr)

Improvements and bugfixes

Fix conda by specifying python version + add tests to main branch #28 (@LysandreJik)

Improve Mixin #34 (@LysandreJik)

Enable library_name and library_version in snapshot_download #38 (@LysandreJik)

[Windows support] Very long filenames #40 (@LysandreJik)

Make error message more verbose when creating a repo #44 (@osanseviero)

Open-source /docs #46 (@julien-c)


v0.0.8(Apr 7, 2021)

Adds the HfApi.model_info method to retrieve information about a repo at a given revision.
Adds the accompanying snapshot_download utility to download all files stored in that repo at that revision to the cache.

Example usage of HfApi.model_info:

from huggingface_hub import HfApi

hf_api = HfApi()
model_info = hf_api.model_info("lysandre/dummy-hf-hub")

print("Model ID:", model_info.modelId)

for file in model_info.siblings:
    print("file:", file.rfilename)

outputs:

Model ID: lysandre/dummy-hf-hub
file: .gitattributes
file: README.md

Example usage of snapshot_download:

from huggingface_hub import snapshot_download
import os

repo_path = snapshot_download("lysandre/dummy-hf-hub")
print(os.listdir(repo_path))

outputs:

['.gitattributes', 'README.md']


v0.0.7(Mar 18, 2021)

Networking improvements by @Pierrci and @lhoestq (#21 and #22)
Adding a mixin class for easily saving, uploading, and downloading a PyTorch model. See PR #11 by @vasudevgupta7

Example usage:

import torch.nn as nn
from huggingface_hub import ModelHubMixin

class MyModel(nn.Module, ModelHubMixin):
   def __init__(self, **kwargs):
      super().__init__()
      self.config = kwargs.pop("config", None)
      self.layer = ...
   def forward(self, ...):
      return ...

model = MyModel()

# saving model to local directory & pushing to hub
model.save_pretrained("mymodel", push_to_hub=True, config={"act": "gelu"})

# initializing model & loading it from trained weights
model = MyModel.from_pretrained("username/[email protected]")

Thanks a ton for your contributions ♥️


v0.0.6(Mar 2, 2021)

