XManager: A framework for managing machine learning experiments 🧑‍🔬

Overview

XManager is a platform for packaging, running and keeping track of machine learning experiments. It currently enables one to launch experiments locally or on Google Cloud Platform (GCP). Interaction with experiments is done via XManager's APIs through Python launch scripts.

To get started, install the prerequisites, XManager itself and follow the tutorial to create and run a launch script.

See CONTRIBUTING.md for guidance on contributions.

Prerequisites

The codebase assumes Python 3.7+.

Install Docker

If you use xmanager.xm.PythonContainer to run XManager experiments, you need to install Docker.

  1. Follow the steps to install Docker.

  2. If you are a Linux user, also follow the steps to enable sudoless Docker.

Install Bazel

If you use xmanager.xm_local.BazelContainer or xmanager.xm_local.BazelBinary to run XManager experiments, you need to install Bazel.

  1. Follow the steps to install Bazel.

Create a GCP project

If you use xm_local.Caip (Cloud AI Platform) to run XManager experiments, you need a GCP project in order to access CAIP and run jobs.

  1. Create a GCP project.

  2. Install gcloud.

  3. Associate your Google Account (Gmail account) with your GCP project by running:

    export GCP_PROJECT=<GCP PROJECT ID>
    gcloud auth login
    gcloud auth application-default login
    gcloud config set project $GCP_PROJECT
  4. Set up gcloud to work with Docker by running:

    gcloud auth configure-docker
  5. Enable Google Cloud Platform APIs.

  6. Create a staging bucket in us-central1 if you do not already have one. This bucket should be used to save experiment artifacts like TensorFlow log files, which can be read by TensorBoard. This bucket may also be used to stage files to build your Docker image if you build your images remotely.

    export GOOGLE_CLOUD_BUCKET_NAME=<GOOGLE_CLOUD_BUCKET_NAME>
    gsutil mb -l us-central1 gs://$GOOGLE_CLOUD_BUCKET_NAME

    Add GOOGLE_CLOUD_BUCKET_NAME to the environment variables or your .bashrc:

    export GOOGLE_CLOUD_BUCKET_NAME=<GOOGLE_CLOUD_BUCKET_NAME>

Install XManager

pip install git+https://github.com/deepmind/xmanager.git

Alternatively, a PyPI package is also available.

pip install xmanager

Writing XManager launch scripts

A snippet for the impatient 🙂
# Contains core primitives and APIs.
from xmanager import xm
# Implementation of those core concepts for what we call 'the local backend',
# which means all executables are sent for execution from this machine,
# independently of whether they are actually executed on our machine or on GCP.
from xmanager import xm_local
#
# Creates an experiment context and saves its metadata to the database, which we
# can reuse later via `xm_local.list_experiments`, for example. Note that
# `experiment` has tracking properties such as `id`.
with xm_local.create_experiment(experiment_title='cifar10') as experiment:
  # Packaging prepares a given *executable spec* for running with a concrete
  # *executor spec*: depending on the combination, that may involve building
  # steps and / or copying the results somewhere. For example, a
  # `xm.python_container` designed to run on `Kubernetes` will be built via
  # `docker build`, and the new image will be uploaded to the container registry.
  # But for our simple case where we have a prebuilt Linux binary designed to
  # run locally only some validations are performed -- for example, that the
  # file exists.
  #
  # `executable` contains all the necessary information needed to launch the
  # packaged blob via `.add`, see below.
  [executable] = experiment.package([
      xm.binary(
          # What we are going to run.
          path='/home/user/project/a.out',
          # Where we are going to run it.
          executor_spec=xm_local.Local.Spec(),
      )
  ])
  #
  # Let's find out which `batch_size` is best -- presumably our jobs write the
  # results somewhere.
  for batch_size in [64, 1024]:
    # `add` creates a new *experiment unit*, which is usually a collection of
    # semantically united jobs, and sends them for execution. To pass an actual
    # collection one may want to use `JobGroup`s (more about them later in the
    # documentation), but for our purposes we are going to pass just one job.
    experiment.add(xm.Job(
        # The `a.out` we packaged earlier.
        executable=executable,
        # We are using the default settings here, but executors have plenty of
        # arguments available to control execution.
        executor=xm_local.Local(),
        # Time to pass the batch size as a command-line argument!
        args={'batch_size': batch_size},
        # We can also pass environment variables.
        env_vars={'HEAPPROFILE': '/tmp/a_out.hprof'},
    ))
  #
  # The context will wait for locally run things (but not for remote things such
  # as jobs sent to GCP, although they can be explicitly awaited via
  # `wait_for_completion`).
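
For example, a minimal sketch of that explicit wait. Assumption: `wait_for_completion` is called on the experiment object; the comment above only tells us the call exists.

from xmanager import xm
from xmanager import xm_local

with xm_local.create_experiment(experiment_title='cifar10') as experiment:
  ...  # Package executables and add jobs as above.
  # Assumption: awaiting remote jobs explicitly; the exact call site may differ.
  experiment.wait_for_completion()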

The basic structure of an XManager launch script can be summarized by these steps:

  1. Create an experiment and acquire its context.

    from xmanager import xm
    from xmanager import xm_local
    
    with xm_local.create_experiment(experiment_title='cifar10') as experiment:
  2. Define specifications of executables you want to run.

    spec = xm.PythonContainer(
        path='/path/to/python/folder',
        entrypoint=xm.ModuleName('cifar10'),
    )
  3. Package your executables.

    from xmanager import xm_local
    
    [executable] = experiment.package([
      xm.Packageable(
        executable_spec=spec,
        executor_spec=xm_local.Caip.Spec(),
      ),
    ])
  4. Define your hyperparameters.

    import itertools
    
    batch_sizes = [64, 1024]
    learning_rates = [0.1, 0.001]
    trials = [
        {'batch_size': bs, 'learning_rate': lr}
        for bs, lr in itertools.product(batch_sizes, learning_rates)
    ]
  5. Define resource requirements for each job.

    requirements = xm.JobRequirements(T4=1)
  6. For each trial, add a job / job groups to launch them.

    for hyperparameters in trials:
      experiment.add(xm.Job(
          executable=executable,
          executor=xm_local.Caip(requirements=requirements),
          args=hyperparameters,
        ))
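
Putting the six steps together, a complete minimal launch script might look like the sketch below (the container path and module name are the placeholders from the steps above):

import itertools

from xmanager import xm
from xmanager import xm_local

with xm_local.create_experiment(experiment_title='cifar10') as experiment:
  # Step 2: the executable specification.
  spec = xm.PythonContainer(
      path='/path/to/python/folder',
      entrypoint=xm.ModuleName('cifar10'),
  )
  # Step 3: package it for the target platform.
  [executable] = experiment.package([
      xm.Packageable(
          executable_spec=spec,
          executor_spec=xm_local.Caip.Spec(),
      ),
  ])
  # Steps 4 and 5: hyperparameters and resource requirements.
  batch_sizes = [64, 1024]
  learning_rates = [0.1, 0.001]
  requirements = xm.JobRequirements(T4=1)
  # Step 6: one job per trial.
  for bs, lr in itertools.product(batch_sizes, learning_rates):
    experiment.add(xm.Job(
        executable=executable,
        executor=xm_local.Caip(requirements=requirements),
        args={'batch_size': bs, 'learning_rate': lr},
    ))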

Now we should be ready to run the launch script.

To learn more about different executables and executors, see 'Components'.

Run XManager

xmanager launch ./xmanager/examples/cifar10_tensorflow/launcher.py

In order to run multi-job experiments, the --xm_wrap_late_bindings flag might be required:

xmanager launch ./xmanager/examples/cifar10_tensorflow/launcher.py -- --xm_wrap_late_bindings

Components

Executable specifications

XManager executable specifications define what should be packaged in the form of binaries, source files, and other input dependencies required for job execution. Executable specifications are reusable and generally platform-independent.

Container

Container defines a pre-built Docker image located at a URL (or locally).

xm.Container(path='gcr.io/project-name/image-name:latest')

xm.container is a shorthand for packageable construction.

assert xm.container(
    executor_spec=xm_local.Local.Spec(),
    args=args,
    env_vars=env_vars,
    ...
) == xm.Packageable(
    executable_spec=xm.Container(...),
    executor_spec=xm_local.Local.Spec(),
    args=args,
    env_vars=env_vars,
)

BazelBinary

BazelBinary defines a Bazel binary target identified by a label.

xm.BazelBinary(label='//path/to/target:label')

xm.bazel_binary is a shorthand for packageable construction.

assert xm.bazel_binary(
    executor_spec=xm_local.Local.Spec(),
    args=args,
    env_vars=env_vars,
    ...
) == xm.Packageable(
    executable_spec=xm.BazelBinary(...),
    executor_spec=xm_local.Local.Spec(),
    args=args,
    env_vars=env_vars,
)

PythonContainer

PythonContainer defines a Python project that is packaged into a Docker container.

xm.PythonContainer(
    entrypoint: xm.ModuleName('<module name>'),

    # Optionals.
    path: '/path/to/python/project/',  # Defaults to the current directory of the launch script.
    base_image: '[<image>:<tag>]',
    docker_instructions: ['RUN ...', 'COPY ...', ...],
)

A simple form of PythonContainer is to just launch a Python module with default docker_instructions.

xm.PythonContainer(entrypoint=xm.ModuleName('cifar10'))

That specification produces a Docker image that runs the following command:

python3 -m cifar10 fixed_arg1 fixed_arg2

An advanced form of PythonContainer allows you to override the entrypoint command as well as the Docker instructions.

xm.PythonContainer(
    entrypoint=xm.CommandList([
      './pre_process.sh',
      'python3 -m cifar10 $@',
      './post_process.sh',
    ]),
    docker_instructions=[
      'COPY pre_process.sh pre_process.sh',
      'RUN chmod +x ./pre_process.sh',
      'COPY cifar10.py',
      'COPY post_process.sh post_process.sh',
      'RUN chmod +x ./post_process.sh',
    ],
)

That specification produces a Docker image that runs the following commands:

./pre_process.sh
python3 -m cifar10 fixed_arg1 fixed_arg2
./post_process.sh

IMPORTANT: Note the use of $@, which passes the command-line arguments through. Without it, all command-line arguments are ignored by your entrypoint.
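
For instance, arguments attached to the job are what $@ expands to at run time; a small sketch (argument rendering follows the 'Job' section below):

# Sketch: these args reach the `python3 -m cifar10 $@` line of the entrypoint.
experiment.add(xm.Job(
    executable=executable,
    executor=xm_local.Local(),
    args={'batch_size': 64},  # Rendered as --batch_size=64.
))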

xm.python_container is a shorthand for packageable construction.

assert xm.python_container(
    executor_spec=xm_local.Local.Spec(),
    args=args,
    env_vars=env_vars,
    ...
) == xm.Packageable(
    executable_spec=xm.PythonContainer(...),
    executor_spec=xm_local.Local.Spec(),
    args=args,
    env_vars=env_vars,
)

Executors

XManager executors define the platform where a job runs and the job's resource requirements.

Each executor also has a specification which describes how an executable specification should be prepared and packaged.
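
In other words, the Spec is consumed at packaging time, while the executor itself is consumed at launch time. A condensed sketch pairing the two for CAIP (inside an experiment context, as in the earlier snippet):

# Packaging: the executor spec tells XManager how to build and push the image.
[executable] = experiment.package([
    xm.Packageable(
        executable_spec=xm.PythonContainer(entrypoint=xm.ModuleName('cifar10')),
        executor_spec=xm_local.Caip.Spec(),
    ),
])
# Launching: the executor carries the runtime resource requirements.
experiment.add(xm.Job(
    executable=executable,
    executor=xm_local.Caip(xm.JobRequirements(cpu=1, ram=4 * xm.GiB)),
))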

Cloud AI Platform (CAIP)

The Caip executor declares that an executable will be run on the CAIP platform.

The Caip executor takes in a resource requirements object.

xm_local.Caip(
    xm.JobRequirements(
        cpu=1,  # Measured in vCPUs.
        ram=4 * xm.GiB,
        T4=1,  # NVIDIA Tesla T4.
    ),
)
xm_local.Caip(
    xm.JobRequirements(
        cpu=1,  # Measured in vCPUs.
        ram=4 * xm.GiB,
        TPU_V2=8,  # TPU v2.
    ),
)

As of June 2021, the supported accelerator types are:

  • P100
  • V100
  • P4
  • T4
  • A100
  • TPU_V2
  • TPU_V3

IMPORTANT: Note that for TPU_V2 and TPU_V3 the only currently supported count is 8.

Caip Specification

The CAIP executor allows you to specify a remote image repository to push to.

xm_local.Caip.Spec(
    push_image_tag='gcr.io/<project>/<image>:<tag>',
)

Local

The local executor declares that an executable will be run on the same machine from which the launch script is invoked.
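
A minimal sketch of a locally run job, reusing the packaging pattern from the snippet at the top (inside an experiment context):

[executable] = experiment.package([
    xm.binary(
        path='/home/user/project/a.out',
        executor_spec=xm_local.Local.Spec(),
    )
])
experiment.add(xm.Job(executable=executable, executor=xm_local.Local()))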

Kubernetes (experimental)

The Kubernetes executor declares that an executable will be run on a Kubernetes cluster. As of October 2021, Kubernetes is not fully supported.

The Kubernetes executor pulls from your local kubeconfig. The XManager command-line has helpers to set up a Google Kubernetes Engine (GKE) cluster.

pip install caliban==0.4.1
xmanager cluster create

# cleanup
xmanager cluster delete

You can store the GKE credentials in your kubeconfig:

gcloud container clusters get-credentials <cluster-name>
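
With credentials in place, launching follows the same pattern as other executors. A hedged sketch (Kubernetes support is experimental, and default construction of the executor is an assumption):

# Assumption: `xm_local.Kubernetes()` can be constructed with defaults.
experiment.add(xm.Job(
    executable=executable,
    executor=xm_local.Kubernetes(),
))
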
Kubernetes Specification

The Kubernetes executor allows you to specify a remote image repository to push to.

xm_local.Kubernetes.Spec(
    push_image_tag='gcr.io/<project>/<image>:<tag>',
)

Job / JobGroup

A Job represents a single executable on a particular executor, while a JobGroup unites a group of Jobs, providing a gang scheduling concept: Jobs inside it are scheduled and descheduled simultaneously. The same Job and JobGroup instances can be added multiple times.
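
Since every add call creates a new experiment unit, resubmitting the same instance is a way to launch repeats; a minimal sketch (inside an experiment context):

job = xm.Job(executable=executable, executor=xm_local.Local())
# Each `add` creates a new experiment unit, so this launches three repeats.
for _ in range(3):
  experiment.add(job)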

Job

A Job accepts an executable and an executor along with hyperparameters, which can be passed either as command-line arguments or as environment variables.

Command-line arguments can be passed in list form, [arg1, arg2, arg3]:

binary arg1 arg2 arg3

They can also be passed in dictionary form, {key1: value1, key2: value2}:

binary --key1=value1 --key2=value2

Environment variables are always passed in Dict[str, str] form:

export KEY=VALUE

Jobs are defined like this:

[executable] = experiment.package(...)

executor = xm_local.Caip(...)

xm.Job(
    executable=executable,
    executor=executor,
    args={
        'batch_size': 64,
    },
    env_vars={
        'NCCL_DEBUG': 'INFO',
    },
)

JobGroup

A JobGroup accepts jobs in a kwargs form. The keyword can be any valid Python identifier. For example, you can call your jobs 'agent' and 'observer'.

agent_job = xm.Job(...)
observer_job = xm.Job(...)

xm.JobGroup(agent=agent_job, observer=observer_job)
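
Job groups are launched the same way as single jobs; a sketch of gang-scheduling the pair above (inside an experiment context):

# Both jobs are scheduled and descheduled together as one experiment unit.
experiment.add(xm.JobGroup(agent=agent_job, observer=observer_job))
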
Comments
  • `xmanager launch` cannot resolve `'docker'` in subprocess

    Hi All,

    I am trying to run the example xmanager launch ./xmanager/examples/cifar10_tensorflow/launcher.py. However, I get the following error. Do you have any suggestions as to where this error may be coming from and how I could fix it?

    I1020 10:57:30.250377 4561079808 build_image.py:134] Local docker: {'Platform': {'Name': 'Docker Engine - Community'}, 'Components': [{'Name': 'Engine', 'Version': '20.10.8', 'Details': {'ApiVersion': '1.41', 'Arch': 'amd64', 'BuildTime': '2021-07-30T19:52:10.000000000+00:00', 'Experimental': 'false', 'GitCommit': '75249d8', 'GoVersion': 'go1.16.6', 'KernelVersion': '5.10.47-linuxkit', 'MinAPIVersion': '1.12', 'Os': 'linux'}}, {'Name': 'containerd', 'Version': '1.4.9', 'Details': {'GitCommit': 'e25210fe30a0a703442421b0f60afac609f950a3'}}, {'Name': 'runc', 'Version': '1.0.1', 'Details': {'GitCommit': 'v1.0.1-0-g4144b63'}}, {'Name': 'docker-init', 'Version': '0.19.0', 'Details': {'GitCommit': 'de40ad0'}}], 'Version': '20.10.8', 'ApiVersion': '1.41', 'MinAPIVersion': '1.12', 'GitCommit': '75249d8', 'GoVersion': 'go1.16.6', 'Os': 'linux', 'Arch': 'amd64', 'KernelVersion': '5.10.47-linuxkit', 'BuildTime': '2021-07-30T19:52:10.000000000+00:00'}
    I1020 10:57:30.250654 4561079808 docker_lib.py:64] Building Docker image
    Dockerfile:
    
    FROM gcr.io/deeplearning-platform-release/tf2-gpu.2-6
    
    RUN if ! id 1000; then useradd -m -u 1000 clouduser; fi
    
    ENV LANG=C.UTF-8
    RUN apt-get update && apt-get install -y git netcat
    RUN python -m pip install --upgrade pip setuptools
    COPY cifar10_tensorflow/requirements.txt /cifar10_tensorflow/requirements.txt
    RUN python -m pip install -r cifar10_tensorflow/requirements.txt
    COPY cifar10_tensorflow/ /cifar10_tensorflow
    RUN chown -R 1000:root /cifar10_tensorflow && chmod -R 775 /cifar10_tensorflow
    WORKDIR cifar10_tensorflow
    
    COPY entrypoint.sh ./entrypoint.sh
    RUN chown -R 1000:root ./entrypoint.sh && chmod -R 775 ./entrypoint.sh
    
    ENTRYPOINT ["./entrypoint.sh"]
    
    Size of Docker input: 7.0 kB
    Building Docker image, please wait...
    Traceback (most recent call last):
      File "/Users/chuchu/anaconda3/envs/jax/bin/xmanager", line 8, in <module>
        sys.exit(entrypoint())
      File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/cli/cli.py", line 65, in entrypoint
        app.run(main)
      File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/absl/app.py", line 303, in run
        _run_main(main, args)
      File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
        sys.exit(main(argv))
      File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/cli/cli.py", line 41, in main
        app.run(m.main, argv=argv)
      File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/absl/app.py", line 303, in run
        _run_main(main, args)
      File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
        sys.exit(main(argv))
      File "/Users/chuchu/Documents/gt_local/try/xmanager/examples/cifar10_tensorflow/launcher.py", line 48, in main
        args={},
      File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/xm/core.py", line 484, in package
        return cls._async_packager.package(packageables)
      File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/xm/async_packager.py", line 104, in package
        executables = self._package_batch(packageables)
      File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/xm_local/packaging/router.py", line 56, in package
        for packageable in packageables
      File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/xm_local/packaging/router.py", line 56, in <listcomp>
        for packageable in packageables
      File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/xm/pattern_matching.py", line 113, in apply
        return case.handle(*values)
      File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/xm_local/packaging/router.py", line 27, in _visit_caip_spec
        packageable.executable_spec)
      File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/xm_local/packaging/cloud.py", line 153, in package_cloud_executable
        return _CLOUD_PACKAGING_ROUTER(packageable, executable_spec)
      File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/xm/pattern_matching.py", line 113, in apply
        return case.handle(*values)
      File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/xm_local/packaging/cloud.py", line 129, in _package_python_container
        packageable.env_vars, push_image_tag))
      File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/cloud/build_image.py", line 110, in build
        image_name, project, bucket)
      File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/cloud/build_image.py", line 154, in build_by_dockerfile
        show_docker_command_progress=_SHOW_DOCKER_COMMAND_PROGRESS.value)
      File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/cloud/docker_lib.py", line 70, in build_docker_image
        dockerfile, show_docker_command_progress)
      File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/cloud/docker_lib.py", line 113, in _build_image_with_docker_command
        subprocess.run(command, check=True, env={'DOCKER_BUILDKIT': '1'})
      File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/subprocess.py", line 488, in run
        with Popen(*popenargs, **kwargs) as process:
      File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/subprocess.py", line 800, in __init__
        restore_signals, start_new_session)
      File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/subprocess.py", line 1551, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
    FileNotFoundError: [Errno 2] No such file or directory: 'docker': 'docker'
    
    opened by felix0901 23
  • Can't change the number of vCPUs

    I'm trying to launch a job that requires multiple CPU cores to run faster. To do that, I create the executor as follows:

    xm_local.Vertex(
        requirements=xm.JobRequirements(CPU=vcpu_count)
    )
    

    Setting vcpu_count to 1, 8, 32, and 64 doesn't change the actual number of vCPUs allocated for the task. I check the number of CPUs by running

    import multiprocessing
    
    multiprocessing.cpu_count()
    

    and also by running `cat /proc/cpuinfo | grep processor | wc -l` in the debug terminal of the job. In all cases these two commands return 4, regardless of the requirements.

    Background:

    • The job launches and executes to completion, although very slowly.
    • During build (after the image is pushed to the container registry) I get this warning message
    W0510 14:00:15.198342 140373868750400 http.py:139] Encountered 403 Forbidden with reason "PERMISSION_DENIED"
    

    Followed immediately by

    I0510 14:00:15.200866 140373858600512 base.py:80] Creating CustomJob
    
    • The launched jobs don't show up under the Training Pipelines tab but rather the Custom Jobs tab in Vertex AI -> Training
    opened by AbubakrHassan 7
  • Tensorboard instance is not found when running examples/cifar10_tensorflow

    When running examples/cifar10_tensorflow, the job launches fine and trains to completion. However, the TensorBoard link created shows a page that says

    Not found: TensorboardExperiment projects/****/locations/us-central1/tensorboards/2824407877244944384/experiments/7194241469736026112 is not found.
    

    Logs from building the job

    I0331 10:24:09.242159 139812284593984 docker_lib.py:67] Local docker: {'Platform': {'Name': 'Docker Engine - Community'}, 'Components': [{'Name': 'Engine', 'Version': '20.10.2', 'Details': {'ApiVersion': '1.41', 'Arch': 'amd64', 'BuildTime': '2020-12-28T16:15:28.000000000+00:00', 'Experimental': 'false', 'GitCommit': '8891c58', 'GoVersion': 'go1.13.15', 'KernelVersion': '5.15.15-1rodete2-amd64', 'MinAPIVersion': '1.12', 'Os': 'linux'}}, {'Name': 'containerd', 'Version': '1.4.3', 'Details': {'GitCommit': '269548fa27e0089a8b8278fc4fc781d7f65a939b'}}, {'Name': 'runc', 'Version': '1.0.0-rc92', 'Details': {'GitCommit': 'ff819c7e9184c13b7c2607fe6c30ae19403a7aff'}}, {'Name': 'docker-init', 'Version': '0.19.0', 'Details': {'GitCommit': 'de40ad0'}}], 'Version': '20.10.2', 'ApiVersion': '1.41', 'MinAPIVersion': '1.12', 'GitCommit': '8891c58', 'GoVersion': 'go1.13.15', 'Os': 'linux', 'Arch': 'amd64', 'KernelVersion': '5.15.15-1rodete2-amd64', 'BuildTime': '2020-12-28T16:15:28.000000000+00:00'}
    Dockerfile:
    
    FROM gcr.io/deeplearning-platform-release/tf2-gpu.2-6
    
    RUN if ! id 1000; then useradd -m -u 1000 clouduser; fi
    
    ENV LANG=C.UTF-8
    RUN apt-get update && apt-get install -y git netcat
    RUN python -m pip install --upgrade pip
    COPY cifar10_tensorflow/requirements.txt /cifar10_tensorflow/requirements.txt
    RUN python -m pip install -r cifar10_tensorflow/requirements.txt
    COPY cifar10_tensorflow/ /cifar10_tensorflow
    RUN chown -R 1000:root /cifar10_tensorflow && chmod -R 775 /cifar10_tensorflow
    WORKDIR cifar10_tensorflow
    
    COPY entrypoint.sh ./entrypoint.sh
    RUN chown -R 1000:root ./entrypoint.sh && chmod -R 775 ./entrypoint.sh
    
    ENTRYPOINT ["./entrypoint.sh"]
    
    Size of Docker input: 7.1 kB
    Building Docker image, please wait...
    I0331 10:24:10.163763 139812284593984 docker_lib.py:67] Local docker: {'Platform': {'Name': 'Docker Engine - Community'}, 'Components': [{'Name': 'Engine', 'Version': '20.10.2', 'Details': {'ApiVersion': '1.41', 'Arch': 'amd64', 'BuildTime': '2020-12-28T16:15:28.000000000+00:00', 'Experimental': 'false', 'GitCommit': '8891c58', 'GoVersion': 'go1.13.15', 'KernelVersion': '5.15.15-1rodete2-amd64', 'MinAPIVersion': '1.12', 'Os': 'linux'}}, {'Name': 'containerd', 'Version': '1.4.3', 'Details': {'GitCommit': '269548fa27e0089a8b8278fc4fc781d7f65a939b'}}, {'Name': 'runc', 'Version': '1.0.0-rc92', 'Details': {'GitCommit': 'ff819c7e9184c13b7c2607fe6c30ae19403a7aff'}}, {'Name': 'docker-init', 'Version': '0.19.0', 'Details': {'GitCommit': 'de40ad0'}}], 'Version': '20.10.2', 'ApiVersion': '1.41', 'MinAPIVersion': '1.12', 'GitCommit': '8891c58', 'GoVersion': 'go1.13.15', 'Os': 'linux', 'Arch': 'amd64', 'KernelVersion': '5.15.15-1rodete2-amd64', 'BuildTime': '2020-12-28T16:15:28.000000000+00:00'}
    I0331 10:24:10.164260 139812284593984 docker_lib.py:89] Building Docker image
    [+] Building 55.8s (16/16) FINISHED                                                                                                                                          
     => [internal] load build definition from Dockerfile                                                                                                                    0.2s
     => => transferring dockerfile: 694B                                                                                                                                    0.0s
     => [internal] load .dockerignore                                                                                                                                       0.2s
     => => transferring context: 2B                                                                                                                                         0.0s
     => [internal] load metadata for gcr.io/deeplearning-platform-release/tf2-gpu.2-6:latest                                                                                0.7s
     => [ 1/11] FROM gcr.io/deeplearning-platform-release/[email protected]:d9bf7c2069ff4bec9d9fc6d30fb286f1646124d04012d9932ee59d58eaca9ac4                               0.0s
     => [internal] load build context                                                                                                                                       0.1s
     => => transferring context: 8.03kB                                                                                                                                     0.0s
     => CACHED [ 2/11] RUN if ! id 1000; then useradd -m -u 1000 clouduser; fi                                                                                              0.0s
     => [ 3/11] RUN apt-get update && apt-get install -y git netcat                                                                                                        15.9s
     => [ 4/11] RUN python -m pip install --upgrade pip                                                                                                                    16.5s
     => [ 5/11] COPY cifar10_tensorflow/requirements.txt /cifar10_tensorflow/requirements.txt                                                                               0.5s
     => [ 6/11] RUN python -m pip install -r cifar10_tensorflow/requirements.txt                                                                                           17.7s
     => [ 7/11] COPY cifar10_tensorflow/ /cifar10_tensorflow                                                                                                                0.5s
     => [ 8/11] RUN chown -R 1000:root /cifar10_tensorflow && chmod -R 775 /cifar10_tensorflow                                                                              1.0s
     => [ 9/11] WORKDIR cifar10_tensorflow                                                                                                                                  0.3s
     => [10/11] COPY entrypoint.sh ./entrypoint.sh                                                                                                                          0.2s
     => [11/11] RUN chown -R 1000:root ./entrypoint.sh && chmod -R 775 ./entrypoint.sh                                                                                      0.7s
     => exporting to image                                                                                                                                                  1.4s
     => => exporting layers                                                                                                                                                 1.0s
     => => writing image sha256:1fb33a18a65d7efd4fcec00ef688ec2ac5502851be5d36bcc9a7b5cf342da775                                                                            0.0s
     => => naming to gcr.io/***/cifar10_tensorflow:20220331-102410-116512                                                                                  0.0s
     => => naming to gcr.io/***/cifar10_tensorflow:latest                                                                                                  0.0s
    I0331 10:25:06.734303 139812284593984 docker_lib.py:98] Building docker image: Done
    I0331 10:25:06.775659 139812284593984 docker_lib.py:67] Local docker: {'Platform': {'Name': 'Docker Engine - Community'}, 'Components': [{'Name': 'Engine', 'Version': '20.10.2', 'Details': {'ApiVersion': '1.41', 'Arch': 'amd64', 'BuildTime': '2020-12-28T16:15:28.000000000+00:00', 'Experimental': 'false', 'GitCommit': '8891c58', 'GoVersion': 'go1.13.15', 'KernelVersion': '5.15.15-1rodete2-amd64', 'MinAPIVersion': '1.12', 'Os': 'linux'}}, {'Name': 'containerd', 'Version': '1.4.3', 'Details': {'GitCommit': '269548fa27e0089a8b8278fc4fc781d7f65a939b'}}, {'Name': 'runc', 'Version': '1.0.0-rc92', 'Details': {'GitCommit': 'ff819c7e9184c13b7c2607fe6c30ae19403a7aff'}}, {'Name': 'docker-init', 'Version': '0.19.0', 'Details': {'GitCommit': 'de40ad0'}}], 'Version': '20.10.2', 'ApiVersion': '1.41', 'MinAPIVersion': '1.12', 'GitCommit': '8891c58', 'GoVersion': 'go1.13.15', 'Os': 'linux', 'Arch': 'amd64', 'KernelVersion': '5.15.15-1rodete2-amd64', 'BuildTime': '2020-12-28T16:15:28.000000000+00:00'}
    I0331 10:25:20.892401 139812284593984 docker_lib.py:107] {"status":"The push refers to repository [gcr.io/***/cifar10_tensorflow]"}
    {"status":"Preparing","progressDetail":{},"id":"ecbb601dd983"}
    {"status":"Preparing","progressDetail":{},"id":"0490c7aeabf0"}
    {"status":"Preparing","progressDetail":{},"id":"5f70bf18a086"}
    {"status":"Preparing","progressDetail":{},"id":"57c7da5da29e"}
    {"status":"Preparing","progressDetail":{},"id":"ddfab15718d9"}
    {"status":"Preparing","progressDetail":{},"id":"a43c37333595"}
    {"status":"Preparing","progressDetail":{},"id":"479d29ce9800"}
    {"status":"Preparing","progressDetail":{},"id":"f4bfb05d8c99"}
    {"status":"Preparing","progressDetail":{},"id":"f634932f0fdf"}
    {"status":"Preparing","progressDetail":{},"id":"5f70bf18a086"}
    {"status":"Preparing","progressDetail":{},"id":"e5a69fe43a97"}
    {"status":"Preparing","progressDetail":{},"id":"ed55b6190435"}
    {"status":"Preparing","progressDetail":{},"id":"87ec19f85372"}
    {"status":"Preparing","progressDetail":{},"id":"8c3b041fd87c"}
    {"status":"Preparing","progressDetail":{},"id":"0ac428b7127a"}
    {"status":"Preparing","progressDetail":{},"id":"370688903f01"}
    {"status":"Waiting","progressDetail":{},"id":"a43c37333595"}
    {"status":"Preparing","progressDetail":{},"id":"76d62c4c37cc"}
    {"status":"Preparing","progressDetail":{},"id":"b3ab95a574c8"}
    {"status":"Preparing","progressDetail":{},"id":"d1b010151b48"}
    {"status":"Preparing","progressDetail":{},"id":"b80bc089358e"}
    {"status":"Preparing","progressDetail":{},"id":"11bc9b36546a"}
    {"status":"Preparing","progressDetail":{},"id":"fffe44800c74"}
    {"status":"Preparing","progressDetail":{},"id":"1175e7a0a8e0"}
    {"status":"Preparing","progressDetail":{},"id":"992f2c95dad2"}
    {"status":"Waiting","progressDetail":{},"id":"f4bfb05d8c99"}
    {"status":"Preparing","progressDetail":{},"id":"91b2ad1e9845"}
    {"status":"Waiting","progressDetail":{},"id":"f634932f0fdf"}
    {"status":"Waiting","progressDetail":{},"id":"ed55b6190435"}
    {"status":"Waiting","progressDetail":{},"id":"e5a69fe43a97"}
    {"status":"Preparing","progressDetail":{},"id":"178f9673d3c0"}
    {"status":"Preparing","progressDetail":{},"id":"3298591378da"}
    {"status":"Waiting","progressDetail":{},"id":"1175e7a0a8e0"}
    {"status":"Preparing","progressDetail":{},"id":"b79b505a5328"}
    {"status":"Preparing","progressDetail":{},"id":"963f45082214"}
    {"status":"Waiting","progressDetail":{},"id":"8c3b041fd87c"}
    {"status":"Preparing","progressDetail":{},"id":"59edb8a95299"}
    {"status":"Preparing","progressDetail":{},"id":"6083edd74f0c"}
    {"status":"Waiting","progressDetail":{},"id":"370688903f01"}
    {"status":"Preparing","progressDetail":{},"id":"4236d5cafaa0"}
    {"status":"Preparing","progressDetail":{},"id":"924dcf5e7282"}
    {"status":"Waiting","progressDetail":{},"id":"3298591378da"}
    {"status":"Waiting","progressDetail":{},"id":"963f45082214"}
    {"status":"Preparing","progressDetail":{},"id":"da29c29e84ca"}
    {"status":"Preparing","progressDetail":{},"id":"1526a09df7d6"}
    {"status":"Preparing","progressDetail":{},"id":"f35a9ab279de"}
    {"status":"Preparing","progressDetail":{},"id":"6cd83fbc36a4"}
    {"status":"Preparing","progressDetail":{},"id":"a7a59823f7fd"}
    {"status":"Preparing","progressDetail":{},"id":"a86b3e862105"}
    {"status":"Waiting","progressDetail":{},"id":"b80bc089358e"}
    {"status":"Waiting","progressDetail":{},"id":"d1b010151b48"}
    {"status":"Preparing","progressDetail":{},"id":"9ad794ce6bea"}
    {"status":"Preparing","progressDetail":{},"id":"d533033842c0"}
    {"status":"Preparing","progressDetail":{},"id":"9f54eef41275"}
    {"status":"Waiting","progressDetail":{},"id":"6083edd74f0c"}
    {"status":"Waiting","progressDetail":{},"id":"4236d5cafaa0"}
    {"status":"Waiting","progressDetail":{},"id":"d533033842c0"}
    {"status":"Waiting","progressDetail":{},"id":"9f54eef41275"}
    {"status":"Waiting","progressDetail":{},"id":"a86b3e862105"}
    {"status":"Waiting","progressDetail":{},"id":"924dcf5e7282"}
    {"status":"Waiting","progressDetail":{},"id":"da29c29e84ca"}
    {"status":"Waiting","progressDetail":{},"id":"b79b505a5328"}
    {"status":"Waiting","progressDetail":{},"id":"91b2ad1e9845"}
    {"status":"Waiting","progressDetail":{},"id":"1526a09df7d6"}
    {"status":"Waiting","progressDetail":{},"id":"a7a59823f7fd"}
    {"status":"Waiting","progressDetail":{},"id":"178f9673d3c0"}
    {"status":"Waiting","progressDetail":{},"id":"9ad794ce6bea"}
    {"status":"Waiting","progressDetail":{},"id":"0ac428b7127a"}
    {"status":"Waiting","progressDetail":{},"id":"11bc9b36546a"}
    {"status":"Waiting","progressDetail":{},"id":"59edb8a95299"}
    {"status":"Waiting","progressDetail":{},"id":"992f2c95dad2"}
    {"status":"Waiting","progressDetail":{},"id":"f35a9ab279de"}
    {"status":"Waiting","progressDetail":{},"id":"6cd83fbc36a4"}
    {"status":"Waiting","progressDetail":{},"id":"b3ab95a574c8"}
    {"status":"Pushing","progressDetail":{"current":512,"total":528},"progress":"[================================================\u003e  ]     512B/528B","id":"ecbb601dd983"}
    {"status":"Pushing","progressDetail":{"current":512,"total":528},"progress":"[================================================\u003e  ]     512B/528B","id":"0490c7aeabf0"}
    {"status":"Pushing","progressDetail":{"current":512,"total":7094},"progress":"[===\u003e                                               ]     512B/7.094kB","id":"ddfab15718d9"}
    {"status":"Pushing","progressDetail":{"current":512,"total":7094},"progress":"[===\u003e                                               ]     512B/7.094kB","id":"57c7da5da29e"}
    {"status":"Pushing","progressDetail":{"current":11776,"total":7094},"progress":"[==================================================\u003e]  11.78kB","id":"ddfab15718d9"}
    {"status":"Pushing","progressDetail":{"current":3072,"total":528},"progress":"[==================================================\u003e]  3.072kB","id":"0490c7aeabf0"}
    {"status":"Pushing","progressDetail":{"current":11776,"total":7094},"progress":"[==================================================\u003e]  11.78kB","id":"57c7da5da29e"}
    {"status":"Layer already exists","progressDetail":{},"id":"5f70bf18a086"}
    {"status":"Pushing","progressDetail":{"current":31984,"total":2986836},"progress":"[\u003e                                                  ]  31.98kB/2.987MB","id":"a43c37333595"}
    {"status":"Pushing","progressDetail":{"current":3072,"total":528},"progress":"[==================================================\u003e]  3.072kB","id":"ecbb601dd983"}
    {"status":"Pushing","progressDetail":{"current":1428239,"total":2986836},"progress":"[=======================\u003e                           ]  1.428MB/2.987MB","id":"a43c37333595"}
    {"status":"Pushing","progressDetail":{"current":2741798,"total":2986836},"progress":"[=============================================\u003e     ]  2.742MB/2.987MB","id":"a43c37333595"}
    {"status":"Pushing","progressDetail":{"current":3273728,"total":2986836},"progress":"[==================================================\u003e]  3.274MB","id":"a43c37333595"}
    {"status":"Pushed","progressDetail":{},"id":"ddfab15718d9"}
    {"status":"Pushed","progressDetail":{},"id":"0490c7aeabf0"}
    {"status":"Pushed","progressDetail":{},"id":"57c7da5da29e"}
    {"status":"Pushing","progressDetail":{"current":512,"total":39},"progress":"[==================================================\u003e]     512B","id":"479d29ce9800"}
    {"status":"Pushing","progressDetail":{"current":2560,"total":39},"progress":"[==================================================\u003e]   2.56kB","id":"479d29ce9800"}
    {"status":"Pushing","progressDetail":{"current":512,"total":19820},"progress":"[=\u003e                                                 ]     512B/19.82kB","id":"f4bfb05d8c99"}
    {"status":"Pushing","progressDetail":{"current":28160,"total":19820},"progress":"[==================================================\u003e]  28.16kB","id":"f4bfb05d8c99"}
    {"status":"Pushing","progressDetail":{"current":413696,"total":38721310},"progress":"[\u003e                                                  ]  413.7kB/38.72MB","id":"f634932f0fdf"}
    {"status":"Pushed","progressDetail":{},"id":"ecbb601dd983"}
    {"status":"Pushed","progressDetail":{},"id":"a43c37333595"}
    {"status":"Pushing","progressDetail":{"current":2009600,"total":38721310},"progress":"[==\u003e                                                ]   2.01MB/38.72MB","id":"f634932f0fdf"}
    {"status":"Pushing","progressDetail":{"current":3582464,"total":38721310},"progress":"[====\u003e                                              ]  3.582MB/38.72MB","id":"f634932f0fdf"}
    {"status":"Pushing","progressDetail":{"current":5180416,"total":38721310},"progress":"[======\u003e                                            ]   5.18MB/38.72MB","id":"f634932f0fdf"}
    {"status":"Pushing","progressDetail":{"current":6753280,"total":38721310},"progress":"[========\u003e                                          ]  6.753MB/38.72MB","id":"f634932f0fdf"}
    {"status":"Pushing","progressDetail":{"current":8331264,"total":38721310},"progress":"[==========\u003e                                        ]  8.331MB/38.72MB","id":"f634932f0fdf"}
    {"status":"Pushing","progressDetail":{"current":10291712,"total":38721310},"progress":"[=============\u003e                                     ]  10.29MB/38.72MB","id":"f634932f0fdf"}
    {"status":"Pushing","progressDetail":{"current":12274176,"total":38721310},"progress":"[===============\u003e                                   ]  12.27MB/38.72MB","id":"f634932f0fdf"}
    {"status":"Pushing","progressDetail":{"current":14240256,"total":38721310},"progress":"[==================\u003e                                ]  14.24MB/38.72MB","id":"f634932f0fdf"}
    {"status":"Mounted from deeplearning-platform-release/tf2-gpu.2-6","progressDetail":{},"id":"e5a69fe43a97"}
    {"status":"Pushing","progressDetail":{"current":16206336,"total":38721310},"progress":"[====================\u003e                              ]  16.21MB/38.72MB","id":"f634932f0fdf"}
    {"status":"Pushing","progressDetail":{"current":17779200,"total":38721310},"progress":"[======================\u003e                            ]  17.78MB/38.72MB","id":"f634932f0fdf"}
    {"status":"Pushing","progressDetail":{"current":19745280,"total":38721310},"progress":"[=========================\u003e                         ]  19.75MB/38.72MB","id":"f634932f0fdf"}
    {"status":"Mounted from deeplearning-platform-release/tf2-gpu.2-6","progressDetail":{},"id":"ed55b6190435"}
    {"status":"Pushing","progressDetail":{"current":21318144,"total":38721310},"progress":"[===========================\u003e                       ]  21.32MB/38.72MB","id":"f634932f0fdf"}
    {"status":"Pushing","progressDetail":{"current":23284224,"total":38721310},"progress":"[==============================\u003e                    ]  23.28MB/38.72MB","id":"f634932f0fdf"}
    {"status":"Pushing","progressDetail":{"current":25250304,"total":38721310},"progress":"[================================\u003e                  ]  25.25MB/38.72MB","id":"f634932f0fdf"}
    {"status":"Pushing","progressDetail":{"current":27216384,"total":38721310},"progress":"[===================================\u003e               ]  27.22MB/38.72MB","id":"f634932f0fdf"}
    {"status":"Pushing","progressDetail":{"current":29182464,"total":38721310},"progress":"[=====================================\u003e             ]  29.18MB/38.72MB","id":"f634932f0fdf"}
    {"status":"Pushing","progressDetail":{"current":30768640,"total":38721310},"progress":"[=======================================\u003e           ]  30.77MB/38.72MB","id":"f634932f0fdf"}
    {"status":"Pushing","progressDetail":{"current":32359424,"total":38721310},"progress":"[=========================================\u003e         ]  32.36MB/38.72MB","id":"f634932f0fdf"}
    {"status":"Mounted from deeplearning-platform-release/tf2-gpu.2-6","progressDetail":{},"id":"87ec19f85372"}
    {"status":"Pushing","progressDetail":{"current":33952256,"total":38721310},"progress":"[===========================================\u003e       ]  33.95MB/38.72MB","id":"f634932f0fdf"}
    {"status":"Pushing","progressDetail":{"current":35525120,"total":38721310},"progress":"[=============================================\u003e     ]  35.53MB/38.72MB","id":"f634932f0fdf"}
    {"status":"Pushing","progressDetail":{"current":37110272,"total":38721310},"progress":"[===============================================\u003e   ]  37.11MB/38.72MB","id":"f634932f0fdf"}
    {"status":"Pushing","progressDetail":{"current":38809600,"total":38721310},"progress":"[==================================================\u003e]  38.81MB","id":"f634932f0fdf"}
    {"status":"Mounted from deeplearning-platform-release/tf2-gpu.2-6","progressDetail":{},"id":"8c3b041fd87c"}
    {"status":"Pushed","progressDetail":{},"id":"479d29ce9800"}
    {"status":"Pushed","progressDetail":{},"id":"f4bfb05d8c99"}
    {"status":"Mounted from deeplearning-platform-release/tf2-gpu.2-6","progressDetail":{},"id":"0ac428b7127a"}
    {"status":"Layer already exists","progressDetail":{},"id":"b3ab95a574c8"}
    {"status":"Layer already exists","progressDetail":{},"id":"d1b010151b48"}
    {"status":"Mounted from deeplearning-platform-release/tf2-gpu.2-6","progressDetail":{},"id":"370688903f01"}
    {"status":"Layer already exists","progressDetail":{},"id":"b80bc089358e"}
    {"status":"Layer already exists","progressDetail":{},"id":"11bc9b36546a"}
    {"status":"Layer already exists","progressDetail":{},"id":"fffe44800c74"}
    {"status":"Mounted from deeplearning-platform-release/tf2-gpu.2-6","progressDetail":{},"id":"76d62c4c37cc"}
    {"status":"Layer already exists","progressDetail":{},"id":"1175e7a0a8e0"}
    {"status":"Layer already exists","progressDetail":{},"id":"992f2c95dad2"}
    {"status":"Layer already exists","progressDetail":{},"id":"91b2ad1e9845"}
    {"status":"Layer already exists","progressDetail":{},"id":"178f9673d3c0"}
    {"status":"Layer already exists","progressDetail":{},"id":"3298591378da"}
    {"status":"Layer already exists","progressDetail":{},"id":"963f45082214"}
    {"status":"Layer already exists","progressDetail":{},"id":"59edb8a95299"}
    {"status":"Layer already exists","progressDetail":{},"id":"b79b505a5328"}
    {"status":"Layer already exists","progressDetail":{},"id":"6083edd74f0c"}
    {"status":"Layer already exists","progressDetail":{},"id":"4236d5cafaa0"}
    {"status":"Layer already exists","progressDetail":{},"id":"da29c29e84ca"}
    {"status":"Layer already exists","progressDetail":{},"id":"924dcf5e7282"}
    {"status":"Layer already exists","progressDetail":{},"id":"1526a09df7d6"}
    {"status":"Layer already exists","progressDetail":{},"id":"f35a9ab279de"}
    {"status":"Layer already exists","progressDetail":{},"id":"6cd83fbc36a4"}
    {"status":"Layer already exists","progressDetail":{},"id":"a7a59823f7fd"}
    {"status":"Layer already exists","progressDetail":{},"id":"a86b3e862105"}
    {"status":"Layer already exists","progressDetail":{},"id":"9ad794ce6bea"}
    {"status":"Layer already exists","progressDetail":{},"id":"9f54eef41275"}
    {"status":"Layer already exists","progressDetail":{},"id":"d533033842c0"}
    {"status":"Pushed","progressDetail":{},"id":"f634932f0fdf"}
    {"status":"20220331-102410-116512: digest: sha256:c059dbb502a1b915aef8b10e0e1dd4e9a241d23adb19431ffad552a4edfeb3b9 size: 9127"}
    {"progressDetail":{},"aux":{"Tag":"20220331-102410-116512","Digest":"sha256:c059dbb502a1b915aef8b10e0e1dd4e9a241d23adb19431ffad552a4edfeb3b9","Size":9127}}
    
    Your image URI is: gcr.io/***/cifar10_tensorflow:20220331-102410-116512
    E0331 10:25:27.511640750 3964344 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
    E0331 10:25:29.811606183 3964345 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
    E0331 10:25:30.695179886 3964345 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
    E0331 10:25:31.677614770 3964345 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
    E0331 10:25:32.550932107 3964345 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
    E0331 10:25:34.109284446 3964345 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
    E0331 10:25:35.012729765 3964345 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
    W0331 10:25:36.525629 139812189455936 http.py:139] Encountered 403 Forbidden with reason "PERMISSION_DENIED"
    I0331 10:25:36.528207 139812155360832 base.py:80] Creating CustomJob
    I0331 10:25:37.428091 139812155360832 base.py:127] CustomJob created. Resource name: projects/****/locations/us-central1/customJobs/1290022358253305856
    I0331 10:25:37.428335 139812155360832 base.py:128] To use this CustomJob in another session:
    I0331 10:25:37.428413 139812155360832 base.py:129] custom_job = aiplatform.CustomJob.get('projects/***/locations/us-central1/customJobs/1290022358253305856')
    I0331 10:25:37.429027 139812155360832 jobs.py:1412] View Custom Job:
    https://console.cloud.google.com/ai/platform/locations/us-central1/training/1290022358253305856?project=***
    I0331 10:25:37.429559 139812155360832 jobs.py:1415] View Tensorboard:
    https://us-central1.tensorboard.googleusercontent.com/experiment/projects+***+locations+us-central1+tensorboards+2824407877244944384+experiments+1290022358253305856
    Job launched at: https://console.cloud.google.com/ai/platform/locations/us-central1/training/1290022358253305856?project=***
    E0331 10:25:37.646365894 3964345 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
    E0331 10:25:38.566072246 3964345 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
    E0331 10:25:39.474203776 3964345 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
    E0331 10:25:40.424329133 3964345 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
    E0331 10:25:41.956427682 3964345 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
    E0331 10:25:42.839551026 3964345 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies
    I0331 10:25:43.589789 139812155360832 jobs.py:178] CustomJob projects/***/locations/us-central1/customJobs/1290022358253305856 current state:
    JobState.JOB_STATE_PENDING
    W0331 10:25:44.369634 139812189455936 http.py:139] Encountered 403 Forbidden with reason "PERMISSION_DENIED"
    I0331 10:25:44.371166 139812138391104 base.py:80] Creating CustomJob
    I0331 10:25:45.369434 139812138391104 base.py:127] CustomJob created. Resource name: projects/***/locations/us-central1/customJobs/7194241469736026112
    I0331 10:25:45.369657 139812138391104 base.py:128] To use this CustomJob in another session:
    I0331 10:25:45.369731 139812138391104 base.py:129] custom_job = aiplatform.CustomJob.get('projects/***/locations/us-central1/customJobs/7194241469736026112')
    I0331 10:25:45.369875 139812138391104 jobs.py:1412] View Custom Job:
    https://console.cloud.google.com/ai/platform/locations/us-central1/training/7194241469736026112?project=***
    I0331 10:25:45.370020 139812138391104 jobs.py:1415] View Tensorboard:
    https://us-central1.tensorboard.googleusercontent.com/experiment/projects+***+locations+us-central1+tensorboards+2824407877244944384+experiments+7194241469736026112
    Job launched at: https://console.cloud.google.com/ai/platform/locations/us-central1/training/7194241469736026112?project=***
    Waiting for local jobs to complete. Press Ctrl+C to terminate them and exit
    I0331 10:25:51.482136 139812138391104 jobs.py:178] CustomJob projects/***/locations/us-central1/customJobs/7194241469736026112 current state:
    JobState.JOB_STATE_PENDING
    I0331 10:25:54.744579 139812155360832 jobs.py:178] CustomJob projects/***/locations/us-central1/customJobs/1290022358253305856 current state:
    JobState.JOB_STATE_PENDING
    [... repeated JobState.JOB_STATE_PENDING updates for both CustomJobs elided ...]
    I0331 10:30:33.545330 139812138391104 jobs.py:1127] CustomJob projects/***/locations/us-central1/customJobs/7194241469736026112 access the interactive shell terminals for the custom job:
    workerpool0-0:
    cb68e24fff4cd7b5-dot-us-central1.aiplatform-training.googleusercontent.com
    I0331 10:30:42.651297 139812155360832 jobs.py:1127] CustomJob projects/***/locations/us-central1/customJobs/1290022358253305856 access the interactive shell terminals for the custom job:
    workerpool0-0:
    8dc25320200e3bcd-dot-us-central1.aiplatform-training.googleusercontent.com
    I0331 10:31:09.588982 139812155360832 jobs.py:178] CustomJob projects/***/locations/us-central1/customJobs/1290022358253305856 current state:
    JobState.JOB_STATE_RUNNING
    I0331 10:31:12.343992 139812138391104 jobs.py:178] CustomJob projects/***/locations/us-central1/customJobs/7194241469736026112 current state:
    JobState.JOB_STATE_RUNNING
    
    opened by AbubakrHassan 6
  • Examples Timeout

    Hi! I'm trying to use xmanager, and while the setup went well, all of the examples are timing out before even running the network. Any idea what the error could be?

    cifar10 pytorch log
    starting build "518b658c-c8ae-4595-9058-95eea6cdcaa5"
    
    FETCHSOURCE
    Fetching storage object: gs://revirainbow_bucket2/cifar10_torch-latest.tar.gz#1624743987085175
    Copying gs://revirainbow_bucket2/cifar10_torch-latest.tar.gz#1624743987085175...
    / [0 files][    0.0 B/  7.1 KiB]                                                
    / [1 files][  7.1 KiB/  7.1 KiB]                                                
    Operation completed over 1 objects/7.1 KiB.                                      
    tar: Removing leading `/' from member names
    BUILD
    Pulling image: gcr.io/kaniko-project/executor:latest
    latest: Pulling from kaniko-project/executor
    Digest: sha256:6ecc43ae139ad8cfa11604b592aaedddcabff8cef469eda303f1fb5afe5e3034
    Status: Downloaded newer image for gcr.io/kaniko-project/executor:latest
    gcr.io/kaniko-project/executor:latest
    INFO[0000] GET KEYCHAIN                                 
    INFO[0000] Retrieving image manifest gcr.io/deeplearning-platform-release/pytorch-gpu.1-6 
    INFO[0000] Retrieving image gcr.io/deeplearning-platform-release/pytorch-gpu.1-6 from registry gcr.io 
    INFO[0000] GET KEYCHAIN                                 
    INFO[0000] Retrieving image manifest gcr.io/deeplearning-platform-release/pytorch-gpu.1-6 
    INFO[0000] Returning cached image manifest              
    INFO[0000] Built cross stage deps: map[]                
    INFO[0000] Retrieving image manifest gcr.io/deeplearning-platform-release/pytorch-gpu.1-6 
    INFO[0000] Returning cached image manifest              
    INFO[0000] Retrieving image manifest gcr.io/deeplearning-platform-release/pytorch-gpu.1-6 
    INFO[0000] Returning cached image manifest              
    INFO[0000] Executing 0 build triggers                   
    INFO[0000] Checking for cached layer gcr.io/researchprojects-msc/cifar10_torch/cache:f6b49a2c721c492debdfe49e26c8073947a7c2f39c82709a8463ca794242a13b... 
    INFO[0000] GET KEYCHAIN                                 
    INFO[0001] No cached layer found for cmd RUN apt-get update && apt-get install -y git 
    INFO[0001] Unpacking rootfs as cmd RUN apt-get update && apt-get install -y git requires it. 
    INFO[0502] ENV LANG=C.UTF-8                             
    INFO[0502] No files changed in this command, skipping snapshotting. 
    INFO[0502] RUN apt-get update && apt-get install -y git 
    INFO[0502] Taking snapshot of full filesystem...        
    INFO[0811] cmd: /bin/sh                                 
    INFO[0811] args: [-c apt-get update && apt-get install -y git] 
    INFO[0811] Running: [/bin/sh -c apt-get update && apt-get install -y git] 
    Get:1 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
    Get:2 http://packages.cloud.google.com/apt gcsfuse-bionic InRelease [5385 B]
    Get:3 http://packages.cloud.google.com/apt cloud-sdk-bionic InRelease [6780 B]
    Ign:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
    Hit:5 http://archive.ubuntu.com/ubuntu bionic InRelease
    Ign:6 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
    Get:7 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  Release [697 B]
    Get:8 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release [564 B]
    Get:9 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  Release.gpg [836 B]
    Err:2 http://packages.cloud.google.com/apt gcsfuse-bionic InRelease
      The following signatures couldn't be verified because the public key is not available: NO_PUBKEY FEEA9169307EA071 NO_PUBKEY 8B57C5C2836F4BEB
    Get:10 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release.gpg [833 B]
    Get:11 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
    Get:12 http://security.ubuntu.com/ubuntu bionic-security/restricted amd64 Packages [473 kB]
    Get:13 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [2220 kB]
    Err:3 http://packages.cloud.google.com/apt cloud-sdk-bionic InRelease
      The following signatures couldn't be verified because the public key is not available: NO_PUBKEY FEEA9169307EA071 NO_PUBKEY 8B57C5C2836F4BEB
    Get:14 http://security.ubuntu.com/ubuntu bionic-security/multiverse amd64 Packages [24.7 kB]
    Get:15 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages [1418 kB]
    Get:16 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
    Ign:17 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  Packages
    Get:17 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  Packages [599 kB]
    Get:18 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Packages [73.8 kB]
    Get:19 http://archive.ubuntu.com/ubuntu bionic-updates/restricted amd64 Packages [505 kB]
    Get:20 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages [2188 kB]
    Get:21 http://archive.ubuntu.com/ubuntu bionic-updates/multiverse amd64 Packages [33.5 kB]
    Get:22 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages [2653 kB]
    Fetched 10.5 MB in 2s (4910 kB/s)
    Reading package lists...
    W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: http://packages.cloud.google.com/apt gcsfuse-bionic InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY FEEA9169307EA071 NO_PUBKEY 8B57C5C2836F4BEB
    W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: http://packages.cloud.google.com/apt cloud-sdk-bionic InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY FEEA9169307EA071 NO_PUBKEY 8B57C5C2836F4BEB
    W: Failed to fetch http://packages.cloud.google.com/apt/dists/gcsfuse-bionic/InRelease  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY FEEA9169307EA071 NO_PUBKEY 8B57C5C2836F4BEB
    W: Failed to fetch http://packages.cloud.google.com/apt/dists/cloud-sdk-bionic/InRelease  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY FEEA9169307EA071 NO_PUBKEY 8B57C5C2836F4BEB
    W: Some index files failed to download. They have been ignored, or old ones used instead.
    Reading package lists...
    Building dependency tree...
    Reading state information...
    Suggested packages:
      gettext-base git-daemon-run | git-daemon-sysvinit git-doc git-el git-email
      git-gui gitk gitweb git-cvs git-mediawiki git-svn
    The following packages will be upgraded:
      git
    1 upgraded, 0 newly installed, 0 to remove and 104 not upgraded.
    Need to get 3916 kB of archives.
    After this operation, 8192 B of additional disk space will be used.
    Get:1 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 git amd64 1:2.17.1-1ubuntu0.8 [3916 kB]
    debconf: delaying package configuration, since apt-utils is not installed
    Fetched 3916 kB in 1s (4849 kB/s)
    (Reading database ... 91467 files and directories currently installed.)
    Preparing to unpack .../git_1%3a2.17.1-1ubuntu0.8_amd64.deb ...
    Unpacking git (1:2.17.1-1ubuntu0.8) over (1:2.17.1-1ubuntu0.7) ...
    Setting up git (1:2.17.1-1ubuntu0.8) ...
    INFO[0819] Taking snapshot of full filesystem...        
    INFO[1109] RUN python -m pip install --upgrade pip      
    INFO[1109] cmd: /bin/sh                                 
    INFO[1109] args: [-c python -m pip install --upgrade pip] 
    INFO[1109] Running: [/bin/sh -c python -m pip install --upgrade pip] 
    INFO[1109] Pushing layer gcr.io/researchprojects-msc/cifar10_torch/cache:f6b49a2c721c492debdfe49e26c8073947a7c2f39c82709a8463ca794242a13b to cache now 
    INFO[1109] GET KEYCHAIN                                 
    INFO[1110] Pushing image to gcr.io/researchprojects-msc/cifar10_torch/cache:f6b49a2c721c492debdfe49e26c8073947a7c2f39c82709a8463ca794242a13b 
    Collecting pip
      Downloading pip-21.1.3-py3-none-any.whl (1.5 MB)
    Installing collected packages: pip
      Attempting uninstall: pip
        Found existing installation: pip 20.2.4
        Uninstalling pip-20.2.4:
    INFO[1113] Pushed image to 1 destinations               
          Successfully uninstalled pip-20.2.4
    Successfully installed pip-21.1.3
    INFO[1115] Taking snapshot of full filesystem...        
    ERROR
    ERROR: build step 0 "gcr.io/kaniko-project/executor:latest" failed: step exited with non-zero status: 2
    
    Tensorflow take 1
    starting build "d153cbb4-f4f5-48f2-a935-40d4e6f47584"
    
    FETCHSOURCE
    Fetching storage object: gs://revirainbow_bucket2/cifar10_tensorflow-latest.tar.gz#1624821333936417
    Copying gs://revirainbow_bucket2/cifar10_tensorflow-latest.tar.gz#1624821333936417...
    / [0 files][    0.0 B/  5.6 KiB]                                                
    / [1 files][  5.6 KiB/  5.6 KiB]                                                
    Operation completed over 1 objects/5.6 KiB.                                      
    tar: Removing leading `/' from member names
    BUILD
    Pulling image: gcr.io/kaniko-project/executor:latest
    latest: Pulling from kaniko-project/executor
    Digest: sha256:6ecc43ae139ad8cfa11604b592aaedddcabff8cef469eda303f1fb5afe5e3034
    Status: Downloaded newer image for gcr.io/kaniko-project/executor:latest
    gcr.io/kaniko-project/executor:latest
    INFO[0000] GET KEYCHAIN                                 
    INFO[0000] Retrieving image manifest gcr.io/deeplearning-platform-release/tf2-gpu.2-1 
    INFO[0000] Retrieving image gcr.io/deeplearning-platform-release/tf2-gpu.2-1 from registry gcr.io 
    INFO[0000] GET KEYCHAIN                                 
    INFO[0000] Retrieving image manifest gcr.io/deeplearning-platform-release/tf2-gpu.2-1 
    INFO[0000] Returning cached image manifest              
    INFO[0000] Built cross stage deps: map[]                
    INFO[0000] Retrieving image manifest gcr.io/deeplearning-platform-release/tf2-gpu.2-1 
    INFO[0000] Returning cached image manifest              
    INFO[0000] Retrieving image manifest gcr.io/deeplearning-platform-release/tf2-gpu.2-1 
    INFO[0000] Returning cached image manifest              
    INFO[0000] Executing 0 build triggers                   
    INFO[0000] Checking for cached layer gcr.io/researchprojects-msc/cifar10_tensorflow/cache:331fd6441d19d384b5d8f21997642529c44fad394563eff5b2843bd14dae0f7d... 
    INFO[0000] GET KEYCHAIN                                 
    INFO[0000] No cached layer found for cmd RUN apt-get update && apt-get install -y git 
    INFO[0000] Unpacking rootfs as cmd RUN apt-get update && apt-get install -y git requires it. 
    INFO[0403] ENV LANG=C.UTF-8                             
    INFO[0403] No files changed in this command, skipping snapshotting. 
    INFO[0403] RUN apt-get update && apt-get install -y git 
    INFO[0403] Taking snapshot of full filesystem...        
    INFO[0662] cmd: /bin/sh                                 
    INFO[0662] args: [-c apt-get update && apt-get install -y git] 
    INFO[0662] Running: [/bin/sh -c apt-get update && apt-get install -y git] 
    Get:1 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
    Ign:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
    Get:3 http://archive.ubuntu.com/ubuntu bionic InRelease [242 kB]
    Ign:4 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
    Get:5 http://packages.cloud.google.com/apt gcsfuse-bionic InRelease [5385 B]
    Get:6 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  Release [697 B]
    Get:7 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release [564 B]
    Get:8 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  Release.gpg [836 B]
    Get:9 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release.gpg [833 B]
    Get:10 http://security.ubuntu.com/ubuntu bionic-security/multiverse amd64 Packages [24.7 kB]
    Get:11 http://packages.cloud.google.com/apt cloud-sdk-bionic InRelease [6780 B]
    Get:12 http://security.ubuntu.com/ubuntu bionic-security/restricted amd64 Packages [473 kB]
    Get:13 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [2220 kB]
    Get:14 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages [1418 kB]
    Get:15 http://packages.cloud.google.com/apt gcsfuse-bionic/main amd64 Packages [339 B]
    Get:16 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
    Get:17 https://packages.cloud.google.com/apt google-fast-socket InRelease [5405 B]
    Get:18 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
    Ign:19 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  Packages
    Get:19 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  Packages [599 kB]
    Get:20 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Packages [73.8 kB]
    Get:21 http://packages.cloud.google.com/apt cloud-sdk-bionic/main amd64 Packages [191 kB]
    Get:22 http://archive.ubuntu.com/ubuntu bionic/restricted amd64 Packages [13.5 kB]
    Get:23 https://packages.cloud.google.com/apt google-fast-socket/main amd64 Packages [431 B]
    Get:24 http://archive.ubuntu.com/ubuntu bionic/universe amd64 Packages [11.3 MB]
    Get:25 http://archive.ubuntu.com/ubuntu bionic/multiverse amd64 Packages [186 kB]
    Get:26 http://archive.ubuntu.com/ubuntu bionic/main amd64 Packages [1344 kB]
    Get:27 http://archive.ubuntu.com/ubuntu bionic-updates/restricted amd64 Packages [505 kB]
    Get:28 http://archive.ubuntu.com/ubuntu bionic-updates/multiverse amd64 Packages [33.5 kB]
    Get:29 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages [2188 kB]
    Get:30 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages [2653 kB]
    Get:31 http://archive.ubuntu.com/ubuntu bionic-backports/main amd64 Packages [11.3 kB]
    Get:32 http://archive.ubuntu.com/ubuntu bionic-backports/universe amd64 Packages [11.4 kB]
    Fetched 23.8 MB in 3s (6929 kB/s)
    Reading package lists...
    Reading package lists...
    Building dependency tree...
    Reading state information...
    git is already the newest version (1:2.17.1-1ubuntu0.8).
    0 upgraded, 0 newly installed, 0 to remove and 17 not upgraded.
    INFO[0669] Taking snapshot of full filesystem...        
    INFO[0907] Pushing layer gcr.io/researchprojects-msc/cifar10_tensorflow/cache:331fd6441d19d384b5d8f21997642529c44fad394563eff5b2843bd14dae0f7d to cache now 
    INFO[0907] GET KEYCHAIN                                 
    INFO[0907] RUN python -m pip install --upgrade pip      
    INFO[0907] cmd: /bin/sh                                 
    INFO[0907] args: [-c python -m pip install --upgrade pip] 
    INFO[0907] Running: [/bin/sh -c python -m pip install --upgrade pip] 
    INFO[0907] Pushing image to gcr.io/researchprojects-msc/cifar10_tensorflow/cache:331fd6441d19d384b5d8f21997642529c44fad394563eff5b2843bd14dae0f7d 
    Requirement already satisfied: pip in /opt/conda/lib/python3.7/site-packages (21.1.2)
    Collecting pip
      Downloading pip-21.1.3-py3-none-any.whl (1.5 MB)
    INFO[0910] Pushed image to 1 destinations               
    Installing collected packages: pip
      Attempting uninstall: pip
        Found existing installation: pip 21.1.2
        Uninstalling pip-21.1.2:
          Successfully uninstalled pip-21.1.2
    Successfully installed pip-21.1.3
    WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv
    INFO[0912] Taking snapshot of full filesystem...        
    INFO[1148] Pushing layer gcr.io/researchprojects-msc/cifar10_tensorflow/cache:0466755a7b51465b8a5ebf0a031a24056603ae4c48f3aa8abca6faf15373ca69 to cache now 
    INFO[1148] COPY cifar10_tensorflow/requirements.txt cifar10_tensorflow/requirements.txt 
    INFO[1148] Taking snapshot of files...                  
    INFO[1148] GET KEYCHAIN                                 
    INFO[1148] RUN python -m pip install -r cifar10_tensorflow/requirements.txt 
    INFO[1148] cmd: /bin/sh                                 
    INFO[1148] args: [-c python -m pip install -r cifar10_tensorflow/requirements.txt] 
    INFO[1148] Running: [/bin/sh -c python -m pip install -r cifar10_tensorflow/requirements.txt] 
    INFO[1148] Pushing image to gcr.io/researchprojects-msc/cifar10_tensorflow/cache:0466755a7b51465b8a5ebf0a031a24056603ae4c48f3aa8abca6faf15373ca69 
    Requirement already satisfied: absl-py in /opt/conda/lib/python3.7/site-packages (from -r cifar10_tensorflow/requirements.txt (line 1)) (0.8.1)
    Requirement already satisfied: tensorflow in /opt/conda/lib/python3.7/site-packages (from -r cifar10_tensorflow/requirements.txt (line 2)) (2.1.4)
    Requirement already satisfied: tensorflow-datasets in /opt/conda/lib/python3.7/site-packages (from -r cifar10_tensorflow/requirements.txt (line 3)) (2.0.0)
    Requirement already satisfied: six in /opt/conda/lib/python3.7/site-packages (from absl-py->-r cifar10_tensorflow/requirements.txt (line 1)) (1.16.0)
    Requirement already satisfied: wheel>=0.26 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (0.36.2)
    Requirement already satisfied: h5py<=2.10.0 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (2.10.0)
    Requirement already satisfied: tensorflow-estimator<2.2.0,>=2.1.0rc0 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (2.1.0)
    Requirement already satisfied: grpcio>=1.8.6 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (1.38.0)
    Requirement already satisfied: google-pasta>=0.1.6 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (0.2.0)
    Requirement already satisfied: astor>=0.6.0 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (0.8.1)
    Requirement already satisfied: keras-applications>=1.0.8 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (1.0.8)
    Requirement already satisfied: gast==0.2.2 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (0.2.2)
    Requirement already satisfied: opt-einsum>=2.3.2 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (3.3.0)
    Requirement already satisfied: termcolor>=1.1.0 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (1.1.0)
    Requirement already satisfied: wrapt>=1.11.1 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (1.12.1)
    Requirement already satisfied: protobuf>=3.8.0 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (3.16.0)
    Collecting numpy<1.19.0,>=1.16.0
      Downloading numpy-1.18.5-cp37-cp37m-manylinux1_x86_64.whl (20.1 MB)
    INFO[1150] Pushed image to 1 destinations               
    Requirement already satisfied: keras-preprocessing==1.1.0 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (1.1.0)
    Collecting tensorboard<2.2.0,>=2.1.0
      Downloading tensorboard-2.1.1-py3-none-any.whl (3.8 MB)
    Requirement already satisfied: werkzeug>=0.11.15 in /opt/conda/lib/python3.7/site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (2.0.1)
    Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /opt/conda/lib/python3.7/site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (0.4.4)
    Requirement already satisfied: markdown>=2.6.8 in /opt/conda/lib/python3.7/site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (3.3.4)
    Requirement already satisfied: requests<3,>=2.21.0 in /opt/conda/lib/python3.7/site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (2.25.1)
    Requirement already satisfied: google-auth<2,>=1.6.3 in /opt/conda/lib/python3.7/site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (1.30.2)
    Requirement already satisfied: setuptools>=41.0.0 in /opt/conda/lib/python3.7/site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (49.6.0.post20210108)
    Requirement already satisfied: pyasn1-modules>=0.2.1 in /opt/conda/lib/python3.7/site-packages (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (0.2.7)
    Requirement already satisfied: rsa<5,>=3.1.4 in /opt/conda/lib/python3.7/site-packages (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (4.7.2)
    Requirement already satisfied: cachetools<5.0,>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (4.2.2)
    Requirement already satisfied: requests-oauthlib>=0.7.0 in /opt/conda/lib/python3.7/site-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (1.3.0)
    Requirement already satisfied: importlib-metadata in /opt/conda/lib/python3.7/site-packages (from markdown>=2.6.8->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (4.5.0)
    Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /opt/conda/lib/python3.7/site-packages (from pyasn1-modules>=0.2.1->google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (0.4.8)
    Requirement already satisfied: idna<3,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (2.10)
    Requirement already satisfied: chardet<5,>=3.0.2 in /opt/conda/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (4.0.0)
    Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (1.26.5)
    Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (2021.5.30)
    Requirement already satisfied: oauthlib>=3.0.0 in /opt/conda/lib/python3.7/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (3.1.1)
    Requirement already satisfied: tqdm in /opt/conda/lib/python3.7/site-packages (from tensorflow-datasets->-r cifar10_tensorflow/requirements.txt (line 3)) (4.61.1)
    Requirement already satisfied: promise in /opt/conda/lib/python3.7/site-packages (from tensorflow-datasets->-r cifar10_tensorflow/requirements.txt (line 3)) (2.3)
    Requirement already satisfied: dill in /opt/conda/lib/python3.7/site-packages (from tensorflow-datasets->-r cifar10_tensorflow/requirements.txt (line 3)) (0.3.0)
    Requirement already satisfied: attrs>=18.1.0 in /opt/conda/lib/python3.7/site-packages (from tensorflow-datasets->-r cifar10_tensorflow/requirements.txt (line 3)) (21.2.0)
    Requirement already satisfied: tensorflow-metadata in /opt/conda/lib/python3.7/site-packages (from tensorflow-datasets->-r cifar10_tensorflow/requirements.txt (line 3)) (0.21.2)
    Requirement already satisfied: future in /opt/conda/lib/python3.7/site-packages (from tensorflow-datasets->-r cifar10_tensorflow/requirements.txt (line 3)) (0.18.2)
    Requirement already satisfied: typing-extensions>=3.6.4 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata->markdown>=2.6.8->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (3.10.0.0)
    Requirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata->markdown>=2.6.8->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (3.4.1)
    Requirement already satisfied: googleapis-common-protos in /opt/conda/lib/python3.7/site-packages (from tensorflow-metadata->tensorflow-datasets->-r cifar10_tensorflow/requirements.txt (line 3)) (1.53.0)
    Installing collected packages: numpy, tensorboard
      Attempting uninstall: numpy
        Found existing installation: numpy 1.19.5
        Uninstalling numpy-1.19.5:
          Successfully uninstalled numpy-1.19.5
      Attempting uninstall: tensorboard
        Found existing installation: tensorboard 2.5.0
        Uninstalling tensorboard-2.5.0:
          Successfully uninstalled tensorboard-2.5.0
    Successfully installed numpy-1.18.5 tensorboard-2.1.1
    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    tfx-bsl 0.21.4 requires google-api-python-client<2,>=1.7.11, but you have google-api-python-client 2.9.0 which is incompatible.
    tfx-bsl 0.21.4 requires pyarrow<0.16.0,>=0.15.0, but you have pyarrow 4.0.1 which is incompatible.
    tensorflow-model-analysis 0.21.6 requires pyarrow<1,>=0.15, but you have pyarrow 4.0.1 which is incompatible.
    tensorflow-model-analysis 0.21.6 requires scipy==1.4.1; python_version >= "3", but you have scipy 1.6.3 which is incompatible.
    tensorflow-io 0.11.0 requires tensorflow==2.1.0, but you have tensorflow 2.1.4 which is incompatible.
    tensorflow-data-validation 0.21.5 requires joblib<0.15,>=0.12, but you have joblib 1.0.1 which is incompatible.
    tensorflow-data-validation 0.21.5 requires pandas<1,>=0.24, but you have pandas 1.2.4 which is incompatible.
    tensorflow-data-validation 0.21.5 requires scikit-learn<0.22,>=0.18, but you have scikit-learn 0.24.2 which is incompatible.
    tensorflow-cloud 0.1.13 requires tensorboard>=2.3.0, but you have tensorboard 2.1.1 which is incompatible.
    apache-beam 2.17.0 requires httplib2<=0.12.0,>=0.8, but you have httplib2 0.19.1 which is incompatible.
    apache-beam 2.17.0 requires pyarrow<0.16.0,>=0.15.1; python_version >= "3.0" or platform_system != "Windows", but you have pyarrow 4.0.1 which is incompatible.
    WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
    INFO[1155] Taking snapshot of full filesystem...        
    ERROR
    ERROR: build step 0 "gcr.io/kaniko-project/executor:latest" failed: step exited with non-zero status: 2
    
    Tensorflow take 2
    starting build "29331116-f1e7-4f16-b5e4-6d40b248d099"
    
    FETCHSOURCE
    Fetching storage object: gs://revirainbow_bucket2/cifar10_tensorflow-latest.tar.gz#1624822680512194
    Copying gs://revirainbow_bucket2/cifar10_tensorflow-latest.tar.gz#1624822680512194...
    / [0 files][    0.0 B/  5.6 KiB]                                                
    / [1 files][  5.6 KiB/  5.6 KiB]                                                
    Operation completed over 1 objects/5.6 KiB.                                      
    tar: Removing leading `/' from member names
    BUILD
    Pulling image: gcr.io/kaniko-project/executor:latest
    latest: Pulling from kaniko-project/executor
    Digest: sha256:6ecc43ae139ad8cfa11604b592aaedddcabff8cef469eda303f1fb5afe5e3034
    Status: Downloaded newer image for gcr.io/kaniko-project/executor:latest
    gcr.io/kaniko-project/executor:latest
    INFO[0000] GET KEYCHAIN                                 
    INFO[0000] Retrieving image manifest gcr.io/deeplearning-platform-release/tf2-gpu.2-1 
    INFO[0000] Retrieving image gcr.io/deeplearning-platform-release/tf2-gpu.2-1 from registry gcr.io 
    INFO[0000] GET KEYCHAIN                                 
    INFO[0000] Retrieving image manifest gcr.io/deeplearning-platform-release/tf2-gpu.2-1 
    INFO[0000] Returning cached image manifest              
    INFO[0000] Built cross stage deps: map[]                
    INFO[0000] Retrieving image manifest gcr.io/deeplearning-platform-release/tf2-gpu.2-1 
    INFO[0000] Returning cached image manifest              
    INFO[0000] Retrieving image manifest gcr.io/deeplearning-platform-release/tf2-gpu.2-1 
    INFO[0000] Returning cached image manifest              
    INFO[0000] Executing 0 build triggers                   
    INFO[0000] Checking for cached layer gcr.io/researchprojects-msc/cifar10_tensorflow/cache:331fd6441d19d384b5d8f21997642529c44fad394563eff5b2843bd14dae0f7d... 
    INFO[0000] GET KEYCHAIN                                 
    INFO[0000] Using caching version of cmd: RUN apt-get update && apt-get install -y git 
    INFO[0000] Checking for cached layer gcr.io/researchprojects-msc/cifar10_tensorflow/cache:0466755a7b51465b8a5ebf0a031a24056603ae4c48f3aa8abca6faf15373ca69... 
    INFO[0000] GET KEYCHAIN                                 
    INFO[0001] Using caching version of cmd: RUN python -m pip install --upgrade pip 
    INFO[0001] Checking for cached layer gcr.io/researchprojects-msc/cifar10_tensorflow/cache:dde4d6174b68d54544a7b5309497e841f0a94c6bcf0d33ea4958d7c528a0c80f... 
    INFO[0001] GET KEYCHAIN                                 
    INFO[0001] No cached layer found for cmd RUN python -m pip install -r cifar10_tensorflow/requirements.txt 
    INFO[0001] Unpacking rootfs as cmd COPY cifar10_tensorflow/requirements.txt cifar10_tensorflow/requirements.txt requires it. 
    INFO[0433] ENV LANG=C.UTF-8                             
    INFO[0433] No files changed in this command, skipping snapshotting. 
    INFO[0433] RUN apt-get update && apt-get install -y git 
    INFO[0433] Found cached layer, extracting to filesystem 
    INFO[0435] RUN python -m pip install --upgrade pip      
    INFO[0435] Found cached layer, extracting to filesystem 
    INFO[0435] COPY cifar10_tensorflow/requirements.txt cifar10_tensorflow/requirements.txt 
    INFO[0435] Taking snapshot of files...                  
    INFO[0435] RUN python -m pip install -r cifar10_tensorflow/requirements.txt 
    INFO[0435] Taking snapshot of full filesystem...        
    INFO[0698] cmd: /bin/sh                                 
    INFO[0698] args: [-c python -m pip install -r cifar10_tensorflow/requirements.txt] 
    INFO[0698] Running: [/bin/sh -c python -m pip install -r cifar10_tensorflow/requirements.txt] 
    WARNING: Ignoring invalid distribution -wh-pip (/opt/conda/lib/python3.7/site-packages)
    WARNING: Ignoring invalid distribution -wh-pip (/opt/conda/lib/python3.7/site-packages)
    Requirement already satisfied: absl-py in /opt/conda/lib/python3.7/site-packages (from -r cifar10_tensorflow/requirements.txt (line 1)) (0.8.1)
    Requirement already satisfied: tensorflow in /opt/conda/lib/python3.7/site-packages (from -r cifar10_tensorflow/requirements.txt (line 2)) (2.1.4)
    Requirement already satisfied: tensorflow-datasets in /opt/conda/lib/python3.7/site-packages (from -r cifar10_tensorflow/requirements.txt (line 3)) (2.0.0)
    Requirement already satisfied: six in /opt/conda/lib/python3.7/site-packages (from absl-py->-r cifar10_tensorflow/requirements.txt (line 1)) (1.16.0)
    Collecting tensorboard<2.2.0,>=2.1.0
      Downloading tensorboard-2.1.1-py3-none-any.whl (3.8 MB)
    Requirement already satisfied: h5py<=2.10.0 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (2.10.0)
    Requirement already satisfied: keras-applications>=1.0.8 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (1.0.8)
    Requirement already satisfied: gast==0.2.2 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (0.2.2)
    Requirement already satisfied: opt-einsum>=2.3.2 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (3.3.0)
    Requirement already satisfied: tensorflow-estimator<2.2.0,>=2.1.0rc0 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (2.1.0)
    Requirement already satisfied: google-pasta>=0.1.6 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (0.2.0)
    Requirement already satisfied: keras-preprocessing==1.1.0 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (1.1.0)
    Requirement already satisfied: wrapt>=1.11.1 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (1.12.1)
    Collecting numpy<1.19.0,>=1.16.0
      Downloading numpy-1.18.5-cp37-cp37m-manylinux1_x86_64.whl (20.1 MB)
    Requirement already satisfied: grpcio>=1.8.6 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (1.38.0)
    Requirement already satisfied: astor>=0.6.0 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (0.8.1)
    Requirement already satisfied: protobuf>=3.8.0 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (3.16.0)
    Requirement already satisfied: termcolor>=1.1.0 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (1.1.0)
    Requirement already satisfied: wheel>=0.26 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (0.36.2)
    Requirement already satisfied: werkzeug>=0.11.15 in /opt/conda/lib/python3.7/site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (2.0.1)
    Requirement already satisfied: markdown>=2.6.8 in /opt/conda/lib/python3.7/site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (3.3.4)
    Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /opt/conda/lib/python3.7/site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (0.4.4)
    Requirement already satisfied: requests<3,>=2.21.0 in /opt/conda/lib/python3.7/site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (2.25.1)
    Requirement already satisfied: google-auth<2,>=1.6.3 in /opt/conda/lib/python3.7/site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (1.30.2)
    Requirement already satisfied: setuptools>=41.0.0 in /opt/conda/lib/python3.7/site-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (49.6.0.post20210108)
    Requirement already satisfied: cachetools<5.0,>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (4.2.2)
    Requirement already satisfied: rsa<5,>=3.1.4 in /opt/conda/lib/python3.7/site-packages (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (4.7.2)
    Requirement already satisfied: pyasn1-modules>=0.2.1 in /opt/conda/lib/python3.7/site-packages (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (0.2.7)
    Requirement already satisfied: requests-oauthlib>=0.7.0 in /opt/conda/lib/python3.7/site-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (1.3.0)
    Requirement already satisfied: importlib-metadata in /opt/conda/lib/python3.7/site-packages (from markdown>=2.6.8->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (4.5.0)
    Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /opt/conda/lib/python3.7/site-packages (from pyasn1-modules>=0.2.1->google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (0.4.8)
    Requirement already satisfied: chardet<5,>=3.0.2 in /opt/conda/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (4.0.0)
    Requirement already satisfied: idna<3,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (2.10)
    Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (1.26.5)
    Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (2021.5.30)
    Requirement already satisfied: oauthlib>=3.0.0 in /opt/conda/lib/python3.7/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (3.1.1)
    Requirement already satisfied: tensorflow-metadata in /opt/conda/lib/python3.7/site-packages (from tensorflow-datasets->-r cifar10_tensorflow/requirements.txt (line 3)) (0.21.2)
    Requirement already satisfied: tqdm in /opt/conda/lib/python3.7/site-packages (from tensorflow-datasets->-r cifar10_tensorflow/requirements.txt (line 3)) (4.61.1)
    Requirement already satisfied: dill in /opt/conda/lib/python3.7/site-packages (from tensorflow-datasets->-r cifar10_tensorflow/requirements.txt (line 3)) (0.3.0)
    Requirement already satisfied: attrs>=18.1.0 in /opt/conda/lib/python3.7/site-packages (from tensorflow-datasets->-r cifar10_tensorflow/requirements.txt (line 3)) (21.2.0)
    Requirement already satisfied: promise in /opt/conda/lib/python3.7/site-packages (from tensorflow-datasets->-r cifar10_tensorflow/requirements.txt (line 3)) (2.3)
    Requirement already satisfied: future in /opt/conda/lib/python3.7/site-packages (from tensorflow-datasets->-r cifar10_tensorflow/requirements.txt (line 3)) (0.18.2)
    Requirement already satisfied: typing-extensions>=3.6.4 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata->markdown>=2.6.8->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (3.10.0.0)
    Requirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata->markdown>=2.6.8->tensorboard<2.2.0,>=2.1.0->tensorflow->-r cifar10_tensorflow/requirements.txt (line 2)) (3.4.1)
    Requirement already satisfied: googleapis-common-protos in /opt/conda/lib/python3.7/site-packages (from tensorflow-metadata->tensorflow-datasets->-r cifar10_tensorflow/requirements.txt (line 3)) (1.53.0)
    WARNING: Ignoring invalid distribution -wh-pip (/opt/conda/lib/python3.7/site-packages)
    Installing collected packages: numpy, tensorboard
      Attempting uninstall: numpy
        WARNING: Ignoring invalid distribution -wh-pip (/opt/conda/lib/python3.7/site-packages)
        Found existing installation: numpy 1.19.5
        Uninstalling numpy-1.19.5:
          Successfully uninstalled numpy-1.19.5
      Attempting uninstall: tensorboard
        WARNING: Ignoring invalid distribution -wh-pip (/opt/conda/lib/python3.7/site-packages)
        Found existing installation: tensorboard 2.5.0
        Uninstalling tensorboard-2.5.0:
          Successfully uninstalled tensorboard-2.5.0
    WARNING: Ignoring invalid distribution -wh-pip (/opt/conda/lib/python3.7/site-packages)
    WARNING: Ignoring invalid distribution -wh-pip (/opt/conda/lib/python3.7/site-packages)
    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    tfx-bsl 0.21.4 requires google-api-python-client<2,>=1.7.11, but you have google-api-python-client 2.9.0 which is incompatible.
    tfx-bsl 0.21.4 requires pyarrow<0.16.0,>=0.15.0, but you have pyarrow 4.0.1 which is incompatible.
    tensorflow-model-analysis 0.21.6 requires pyarrow<1,>=0.15, but you have pyarrow 4.0.1 which is incompatible.
    tensorflow-model-analysis 0.21.6 requires scipy==1.4.1; python_version >= "3", but you have scipy 1.6.3 which is incompatible.
    tensorflow-io 0.11.0 requires tensorflow==2.1.0, but you have tensorflow 2.1.4 which is incompatible.
    tensorflow-data-validation 0.21.5 requires joblib<0.15,>=0.12, but you have joblib 1.0.1 which is incompatible.
    tensorflow-data-validation 0.21.5 requires pandas<1,>=0.24, but you have pandas 1.2.4 which is incompatible.
    tensorflow-data-validation 0.21.5 requires scikit-learn<0.22,>=0.18, but you have scikit-learn 0.24.2 which is incompatible.
    tensorflow-cloud 0.1.13 requires tensorboard>=2.3.0, but you have tensorboard 2.1.1 which is incompatible.
    apache-beam 2.17.0 requires httplib2<=0.12.0,>=0.8, but you have httplib2 0.19.1 which is incompatible.
    apache-beam 2.17.0 requires pyarrow<0.16.0,>=0.15.1; python_version >= "3.0" or platform_system != "Windows", but you have pyarrow 4.0.1 which is incompatible.
    Successfully installed numpy-1.18.5 tensorboard-2.1.1
    WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
    WARNING: Ignoring invalid distribution -wh-pip (/opt/conda/lib/python3.7/site-packages)
    WARNING: Ignoring invalid distribution -wh-pip (/opt/conda/lib/python3.7/site-packages)
    INFO[0708] Taking snapshot of full filesystem...        
    INFO[0950] COPY cifar10_tensorflow/ cifar10_tensorflow  
    INFO[0950] Taking snapshot of files...                  
    INFO[0950] WORKDIR cifar10_tensorflow                   
    INFO[0950] cmd: workdir                                 
    INFO[0950] Changed working directory to /cifar10_tensorflow 
    INFO[0950] No files changed in this command, skipping snapshotting. 
    INFO[0950] COPY entrypoint.sh ./entrypoint.sh           
    INFO[0950] Taking snapshot of files...                  
    INFO[0950] RUN chmod +x ./entrypoint.sh                 
    INFO[0950] cmd: /bin/sh                                 
    INFO[0950] args: [-c chmod +x ./entrypoint.sh]          
    INFO[0950] Running: [/bin/sh -c chmod +x ./entrypoint.sh] 
    INFO[0950] Taking snapshot of full filesystem...        
    INFO[0950] Pushing layer gcr.io/researchprojects-msc/cifar10_tensorflow/cache:dde4d6174b68d54544a7b5309497e841f0a94c6bcf0d33ea4958d7c528a0c80f to cache now 
    INFO[0950] GET KEYCHAIN                                 
    INFO[0950] Pushing image to gcr.io/researchprojects-msc/cifar10_tensorflow/cache:dde4d6174b68d54544a7b5309497e841f0a94c6bcf0d33ea4958d7c528a0c80f 
    INFO[0952] Pushed image to 1 destinations               
    INFO[1180] Pushing layer gcr.io/researchprojects-msc/cifar10_tensorflow/cache:87d5ac879ebb7364fda6eb69e993ae5cba71fe1ec9bdce47bea59cdc6e3e9021 to cache now 
    INFO[1180] RUN chmod +x ./wrapped_entrypoint.sh         
    INFO[1180] cmd: /bin/sh                                 
    INFO[1180] args: [-c chmod +x ./wrapped_entrypoint.sh]  
    INFO[1180] Running: [/bin/sh -c chmod +x ./wrapped_entrypoint.sh] 
    INFO[1180] GET KEYCHAIN                                 
    INFO[1180] Taking snapshot of full filesystem...        
    INFO[1180] Pushing image to gcr.io/researchprojects-msc/cifar10_tensorflow/cache:87d5ac879ebb7364fda6eb69e993ae5cba71fe1ec9bdce47bea59cdc6e3e9021 
    INFO[1182] Pushed image to 1 destinations               
    ERROR
    ERROR: build step 0 "gcr.io/kaniko-project/executor:latest" failed: step exited with non-zero status: 2
    
    Torch XLA
    starting build "46618c1c-1f1d-409e-8b31-74240bda94e2"
    
    FETCHSOURCE
    Fetching storage object: gs://revirainbow_bucket2/cifar10_torch_xla-latest.tar.gz#1624824967473593
    Copying gs://revirainbow_bucket2/cifar10_torch_xla-latest.tar.gz#1624824967473593...
    / [0 files][    0.0 B/  6.7 KiB]                                                
    / [1 files][  6.7 KiB/  6.7 KiB]                                                
    Operation completed over 1 objects/6.7 KiB.                                      
    tar: Removing leading `/' from member names
    BUILD
    Pulling image: gcr.io/kaniko-project/executor:latest
    latest: Pulling from kaniko-project/executor
    Digest: sha256:6ecc43ae139ad8cfa11604b592aaedddcabff8cef469eda303f1fb5afe5e3034
    Status: Downloaded newer image for gcr.io/kaniko-project/executor:latest
    gcr.io/kaniko-project/executor:latest
    INFO[0000] GET KEYCHAIN                                 
    INFO[0000] Retrieving image manifest gcr.io/deeplearning-platform-release/pytorch-xla.1-8 
    INFO[0000] Retrieving image gcr.io/deeplearning-platform-release/pytorch-xla.1-8 from registry gcr.io 
    INFO[0000] GET KEYCHAIN                                 
    INFO[0000] Retrieving image manifest gcr.io/deeplearning-platform-release/pytorch-xla.1-8 
    INFO[0000] Returning cached image manifest              
    INFO[0000] Built cross stage deps: map[]                
    INFO[0000] Retrieving image manifest gcr.io/deeplearning-platform-release/pytorch-xla.1-8 
    INFO[0000] Returning cached image manifest              
    INFO[0000] Retrieving image manifest gcr.io/deeplearning-platform-release/pytorch-xla.1-8 
    INFO[0000] Returning cached image manifest              
    INFO[0000] Executing 0 build triggers                   
    INFO[0000] Checking for cached layer gcr.io/researchprojects-msc/cifar10_torch_xla/cache:d44b9071ffdd974e978b3e6db70a4690d0eb85a012e775e287ca60878f2f9f14... 
    INFO[0000] GET KEYCHAIN                                 
    INFO[0000] No cached layer found for cmd RUN apt-get update && apt-get install -y git 
    INFO[0000] Unpacking rootfs as cmd RUN apt-get update && apt-get install -y git requires it. 
    INFO[0333] ENV LANG=C.UTF-8                             
    INFO[0333] No files changed in this command, skipping snapshotting. 
    INFO[0333] RUN apt-get update && apt-get install -y git 
    INFO[0333] Taking snapshot of full filesystem...        
    INFO[0634] cmd: /bin/sh                                 
    INFO[0634] args: [-c apt-get update && apt-get install -y git] 
    INFO[0634] Running: [/bin/sh -c apt-get update && apt-get install -y git] 
    Get:1 http://packages.cloud.google.com/apt gcsfuse-bionic InRelease [5385 B]
    Get:2 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
    Get:3 http://packages.cloud.google.com/apt cloud-sdk-bionic InRelease [6780 B]
    Hit:4 http://archive.ubuntu.com/ubuntu bionic InRelease
    Get:5 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
    Get:6 http://packages.cloud.google.com/apt cloud-sdk-bionic/main amd64 Packages [191 kB]
    Get:7 http://security.ubuntu.com/ubuntu bionic-security/restricted amd64 Packages [473 kB]
    Get:8 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [2220 kB]
    Get:9 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
    Get:10 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages [1418 kB]
    Get:11 http://archive.ubuntu.com/ubuntu bionic-updates/restricted amd64 Packages [505 kB]
    Get:12 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages [2188 kB]
    Get:13 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages [2653 kB]
    Fetched 9911 kB in 2s (5160 kB/s)
    Reading package lists...
    Reading package lists...
    Building dependency tree...
    Reading state information...
    git is already the newest version (1:2.17.1-1ubuntu0.8).
    0 upgraded, 0 newly installed, 0 to remove and 6 not upgraded.
    INFO[0639] Taking snapshot of full filesystem...        
    INFO[0901] RUN python -m pip install --upgrade pip      
    INFO[0901] cmd: /bin/sh                                 
    INFO[0901] args: [-c python -m pip install --upgrade pip] 
    INFO[0901] Running: [/bin/sh -c python -m pip install --upgrade pip] 
    INFO[0901] Pushing layer gcr.io/researchprojects-msc/cifar10_torch_xla/cache:d44b9071ffdd974e978b3e6db70a4690d0eb85a012e775e287ca60878f2f9f14 to cache now 
    INFO[0901] GET KEYCHAIN                                 
    INFO[0901] Pushing image to gcr.io/researchprojects-msc/cifar10_torch_xla/cache:d44b9071ffdd974e978b3e6db70a4690d0eb85a012e775e287ca60878f2f9f14 
    Requirement already satisfied: pip in /opt/conda/lib/python3.7/site-packages (21.1.2)
    Collecting pip
      Downloading pip-21.1.3-py3-none-any.whl (1.5 MB)
    INFO[0903] Pushed image to 1 destinations               
    Installing collected packages: pip
      Attempting uninstall: pip
        Found existing installation: pip 21.1.2
        Uninstalling pip-21.1.2:
          Successfully uninstalled pip-21.1.2
    WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv
    Successfully installed pip-21.1.3
    INFO[0906] Taking snapshot of full filesystem...        
    INFO[1170] Pushing layer gcr.io/researchprojects-msc/cifar10_torch_xla/cache:39c4fcff89964385cbe78b4c1701f08eff30706e32ce3559eedebce39738669a to cache now 
    INFO[1170] GET KEYCHAIN                                 
    INFO[1170] COPY cifar10_torch_xla/requirements.txt cifar10_torch_xla/requirements.txt 
    INFO[1170] Taking snapshot of files...                  
    INFO[1170] RUN python -m pip install -r cifar10_torch_xla/requirements.txt 
    INFO[1170] cmd: /bin/sh                                 
    INFO[1170] args: [-c python -m pip install -r cifar10_torch_xla/requirements.txt] 
    INFO[1170] Running: [/bin/sh -c python -m pip install -r cifar10_torch_xla/requirements.txt] 
    INFO[1170] Pushing image to gcr.io/researchprojects-msc/cifar10_torch_xla/cache:39c4fcff89964385cbe78b4c1701f08eff30706e32ce3559eedebce39738669a 
    Collecting absl-py
      Downloading absl_py-0.13.0-py3-none-any.whl (132 kB)
    Requirement already satisfied: numpy in /opt/conda/lib/python3.7/site-packages (from -r cifar10_torch_xla/requirements.txt (line 16)) (1.19.5)
    Collecting tensorflow
      Downloading tensorflow-2.5.0-cp37-cp37m-manylinux2010_x86_64.whl (454.3 MB)
    INFO[1172] Pushed image to 1 destinations               
    Requirement already satisfied: torch in /opt/conda/lib/python3.7/site-packages (from -r cifar10_torch_xla/requirements.txt (line 18)) (1.8.0)
    Requirement already satisfied: torchvision in /opt/conda/lib/python3.7/site-packages (from -r cifar10_torch_xla/requirements.txt (line 19)) (0.9.0+cu111)
    Requirement already satisfied: six in /opt/conda/lib/python3.7/site-packages (from absl-py->-r cifar10_torch_xla/requirements.txt (line 15)) (1.16.0)
    Collecting h5py~=3.1.0
      Downloading h5py-3.1.0-cp37-cp37m-manylinux1_x86_64.whl (4.0 MB)
    Collecting six
      Downloading six-1.15.0-py2.py3-none-any.whl (10 kB)
    Requirement already satisfied: wheel~=0.35 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_torch_xla/requirements.txt (line 17)) (0.36.2)
    Collecting flatbuffers~=1.12.0
      Downloading flatbuffers-1.12-py2.py3-none-any.whl (15 kB)
    Collecting opt-einsum~=3.3.0
      Downloading opt_einsum-3.3.0-py3-none-any.whl (65 kB)
    Collecting gast==0.4.0
      Downloading gast-0.4.0-py3-none-any.whl (9.8 kB)
    Collecting typing-extensions~=3.7.4
      Downloading typing_extensions-3.7.4.3-py3-none-any.whl (22 kB)
    Collecting tensorboard~=2.5
      Downloading tensorboard-2.5.0-py3-none-any.whl (6.0 MB)
    Collecting termcolor~=1.1.0
      Downloading termcolor-1.1.0.tar.gz (3.9 kB)
    Collecting tensorflow-estimator<2.6.0,>=2.5.0rc0
      Downloading tensorflow_estimator-2.5.0-py2.py3-none-any.whl (462 kB)
    Collecting astunparse~=1.6.3
      Downloading astunparse-1.6.3-py2.py3-none-any.whl (12 kB)
    Collecting keras-preprocessing~=1.1.2
      Downloading Keras_Preprocessing-1.1.2-py2.py3-none-any.whl (42 kB)
    Requirement already satisfied: protobuf>=3.9.2 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_torch_xla/requirements.txt (line 17)) (3.16.0)
    Requirement already satisfied: wrapt~=1.12.1 in /opt/conda/lib/python3.7/site-packages (from tensorflow->-r cifar10_torch_xla/requirements.txt (line 17)) (1.12.1)
    Collecting keras-nightly~=2.5.0.dev
      Downloading keras_nightly-2.5.0.dev2021032900-py2.py3-none-any.whl (1.2 MB)
    Collecting google-pasta~=0.2
      Downloading google_pasta-0.2.0-py3-none-any.whl (57 kB)
    Collecting grpcio~=1.34.0
      Downloading grpcio-1.34.1-cp37-cp37m-manylinux2014_x86_64.whl (4.0 MB)
    Collecting cached-property
      Downloading cached_property-1.5.2-py2.py3-none-any.whl (7.6 kB)
    Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /opt/conda/lib/python3.7/site-packages (from tensorboard~=2.5->tensorflow->-r cifar10_torch_xla/requirements.txt (line 17)) (0.4.4)
    Collecting tensorboard-plugin-wit>=1.6.0
      Downloading tensorboard_plugin_wit-1.8.0-py3-none-any.whl (781 kB)
    Requirement already satisfied: markdown>=2.6.8 in /opt/conda/lib/python3.7/site-packages (from tensorboard~=2.5->tensorflow->-r cifar10_torch_xla/requirements.txt (line 17)) (3.3.4)
    Collecting werkzeug>=0.11.15
      Downloading Werkzeug-2.0.1-py3-none-any.whl (288 kB)
    Requirement already satisfied: setuptools>=41.0.0 in /opt/conda/lib/python3.7/site-packages (from tensorboard~=2.5->tensorflow->-r cifar10_torch_xla/requirements.txt (line 17)) (49.6.0.post20210108)
    Requirement already satisfied: requests<3,>=2.21.0 in /opt/conda/lib/python3.7/site-packages (from tensorboard~=2.5->tensorflow->-r cifar10_torch_xla/requirements.txt (line 17)) (2.25.1)
    Requirement already satisfied: google-auth<2,>=1.6.3 in /opt/conda/lib/python3.7/site-packages (from tensorboard~=2.5->tensorflow->-r cifar10_torch_xla/requirements.txt (line 17)) (1.30.2)
    Collecting tensorboard-data-server<0.7.0,>=0.6.0
      Downloading tensorboard_data_server-0.6.1-py3-none-manylinux2010_x86_64.whl (4.9 MB)
    Requirement already satisfied: cachetools<5.0,>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from google-auth<2,>=1.6.3->tensorboard~=2.5->tensorflow->-r cifar10_torch_xla/requirements.txt (line 17)) (4.2.2)
    Requirement already satisfied: pyasn1-modules>=0.2.1 in /opt/conda/lib/python3.7/site-packages (from google-auth<2,>=1.6.3->tensorboard~=2.5->tensorflow->-r cifar10_torch_xla/requirements.txt (line 17)) (0.2.7)
    Requirement already satisfied: rsa<5,>=3.1.4 in /opt/conda/lib/python3.7/site-packages (from google-auth<2,>=1.6.3->tensorboard~=2.5->tensorflow->-r cifar10_torch_xla/requirements.txt (line 17)) (4.7.2)
    Requirement already satisfied: requests-oauthlib>=0.7.0 in /opt/conda/lib/python3.7/site-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard~=2.5->tensorflow->-r cifar10_torch_xla/requirements.txt (line 17)) (1.3.0)
    Requirement already satisfied: importlib-metadata in /opt/conda/lib/python3.7/site-packages (from markdown>=2.6.8->tensorboard~=2.5->tensorflow->-r cifar10_torch_xla/requirements.txt (line 17)) (4.5.0)
    Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /opt/conda/lib/python3.7/site-packages (from pyasn1-modules>=0.2.1->google-auth<2,>=1.6.3->tensorboard~=2.5->tensorflow->-r cifar10_torch_xla/requirements.txt (line 17)) (0.4.8)
    Requirement already satisfied: chardet<5,>=3.0.2 in /opt/conda/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard~=2.5->tensorflow->-r cifar10_torch_xla/requirements.txt (line 17)) (4.0.0)
    Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard~=2.5->tensorflow->-r cifar10_torch_xla/requirements.txt (line 17)) (1.26.5)
    Requirement already satisfied: idna<3,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard~=2.5->tensorflow->-r cifar10_torch_xla/requirements.txt (line 17)) (2.10)
    Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard~=2.5->tensorflow->-r cifar10_torch_xla/requirements.txt (line 17)) (2021.5.30)
    Requirement already satisfied: oauthlib>=3.0.0 in /opt/conda/lib/python3.7/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard~=2.5->tensorflow->-r cifar10_torch_xla/requirements.txt (line 17)) (3.1.1)
    Requirement already satisfied: pillow>=4.1.1 in /opt/conda/lib/python3.7/site-packages (from torchvision->-r cifar10_torch_xla/requirements.txt (line 19)) (8.2.0)
    Requirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata->markdown>=2.6.8->tensorboard~=2.5->tensorflow->-r cifar10_torch_xla/requirements.txt (line 17)) (3.4.1)
    Building wheels for collected packages: termcolor
      Building wheel for termcolor (setup.py): started
      Building wheel for termcolor (setup.py): finished with status 'done'
      Created wheel for termcolor: filename=termcolor-1.1.0-py3-none-any.whl size=4829 sha256=21af378bdd76722c309847123ef4aa570b137ae5527136336d134b545f312b1c
      Stored in directory: /root/.cache/pip/wheels/3f/e3/ec/8a8336ff196023622fbcb36de0c5a5c218cbb24111d1d4c7f2
    Successfully built termcolor
    Installing collected packages: typing-extensions, six, werkzeug, tensorboard-plugin-wit, tensorboard-data-server, grpcio, cached-property, absl-py, termcolor, tensorflow-estimator, tensorboard, opt-einsum, keras-preprocessing, keras-nightly, h5py, google-pasta, gast, flatbuffers, astunparse, tensorflow
      Attempting uninstall: typing-extensions
        Found existing installation: typing-extensions 3.10.0.0
        Uninstalling typing-extensions-3.10.0.0:
          Successfully uninstalled typing-extensions-3.10.0.0
      Attempting uninstall: six
        Found existing installation: six 1.16.0
        Uninstalling six-1.16.0:
          Successfully uninstalled six-1.16.0
      Attempting uninstall: grpcio
        Found existing installation: grpcio 1.38.0
        Uninstalling grpcio-1.38.0:
          Successfully uninstalled grpcio-1.38.0
    TIMEOUT
    ERROR: context deadline exceeded
    

    Thank you in advance for your help

    opened by joaogui1 5
  • error: googleapiclient.errors.UnknownApiNameOrVersion: name: us-central1-aiplatform version: v1

    When I run `xmanager launch ./xmanager/examples/cifar10_torch/launcher.py`, I get the following error: googleapiclient.errors.UnknownApiNameOrVersion: name: us-central1-aiplatform version: v1

    Screenshot from 2021-05-31 16-01-32

    Could you please guide me?

    opened by JohanSamir 5
  • Use Existing Service Account

    I am trying to use XManager with Vertex AI but do not have permissions to create a new service account. I noticed that the service account name is hard-coded to "xmanager" here:

    https://github.com/deepmind/xmanager/blob/6b748270894ee2ca3499710620574eca2aa62ef0/xmanager/cloud/auth.py#L70-L92

    Is it possible to add an option or parameter so that we can specify an existing service account name for XManager to use? Thanks.
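    For illustration, a hypothetical sketch (not the current API) of what such an option could look like: resolving the name from an environment variable before falling back to the hard-coded default. The variable name below is invented for the example.

    import os

    # `XM_GCP_SERVICE_ACCOUNT_NAME` is an illustrative setting, not an
    # existing XManager option; the fallback mirrors the hard-coded name.
    _DEFAULT_SERVICE_ACCOUNT_NAME = 'xmanager'

    def service_account_name() -> str:
      return os.environ.get(
          'XM_GCP_SERVICE_ACCOUNT_NAME', _DEFAULT_SERVICE_ACCOUNT_NAME)
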

    opened by sanath-2024 2
  • Codelab instructions say "install XManager" but the command clones "Raksha"

    The codelab.ipynb currently has the following instructions:

    Download and install XManager

    !git clone https://github.com/google-research/raksha.git ~/xmanager
    !pip install ~/xmanager
    

    It's unclear how https://github.com/google-research/raksha.git is related to XManager; is that line supposed to be cloning https://github.com/deepmind/xmanager.git instead?

    Also, why not use one of the following commands as directed by the README.md instructions?

    pip install git+https://github.com/deepmind/xmanager.git
    

    or

    pip install xmanager
    

    Happy to make a PR (or CL) to update this, but I wanted to get clarity first on whether this is intentional and, if so, what the rationale is.

    Thanks!

    opened by mbrukman 2
  • `pip install xmanager==0.2.0` yields import error (previous version works OK)

    To reproduce:

    pip install xmanager==0.2.0
    

    Then:

    from xmanager import xm
    

    Yields error:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    /tmp/ipykernel_30726/3915442554.py in <module>
    ----> 1 from xmanager import xm
    
    ~/xmanager/xmanager/xm/__init__.py in <module>
         19 from xmanager.xm import job_operators
         20 from xmanager.xm.compute_units import *
    ---> 21 from xmanager.xm.core import *
         22 from xmanager.xm.executables import *
         23 from xmanager.xm.job_blocks import *
    
    ~/xmanager/xmanager/xm/core.py in <module>
        529 
        530 
    --> 531 class Experiment(abc.ABC):
        532   """Experiment contains a family of jobs run on the same snapshot of code.
        533 
    
    ~/xmanager/xmanager/xm/core.py in Experiment()
        659       *,  # parameters after “*” are keyword-only parameters
        660       identity: str = ''
    --> 661   ) -> asyncio.Future[ExperimentUnit]:
        662     ...
        663 
    
    TypeError: 'type' object is not subscriptable
    

    However, installing the previous version 0.1.5 works OK.

    This failed both on http://colab.research.google.com and on a GCP Vertex AI managed Jupyter notebook.
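    For context, a minimal reproduction sketch, assuming the usual cause on Python 3.7/3.8: `asyncio.Future` only became subscriptable in Python 3.9, and return annotations are evaluated eagerly at definition time unless evaluation is deferred.

    import asyncio

    try:
      def stub() -> asyncio.Future[int]:  # annotation evaluated right here
        ...
    except TypeError as err:
      print(err)  # "'type' object is not subscriptable" on 3.7/3.8

    # Putting `from __future__ import annotations` at the top of the module
    # defers annotation evaluation (PEP 563) and avoids the error.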

    opened by letsbuild 2
  • enable_web_access for Vertex AI jobs

    Hi, is there a way to launch Vertex AI jobs with access to an interactive shell enabled (https://cloud.google.com/vertex-ai/docs/training/monitor-debug-interactive-shell#vertexai_enable_web_access-python)?
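    For reference, the underlying Vertex SDK does expose this flag; a minimal sketch using google-cloud-aiplatform directly rather than XManager, with placeholder project, bucket, and image names:

    from google.cloud import aiplatform

    aiplatform.init(project='my-project', location='us-central1',
                    staging_bucket='gs://my-bucket')

    job = aiplatform.CustomJob(
        display_name='interactive-debug',
        worker_pool_specs=[{
            'machine_spec': {'machine_type': 'n1-standard-4'},
            'replica_count': 1,
            'container_spec': {'image_uri': 'gcr.io/my-project/my-image'},
        }],
    )
    # `enable_web_access=True` asks Vertex AI to expose the interactive
    # shell described in the linked documentation.
    job.run(enable_web_access=True)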

    opened by DzvinkaYarish 2
  • ValueError when target name contains a `.`

    The BazelContainer documentation uses image.tar as an example target name, which actually raises a ValueError.

    Traceback
    Traceback (most recent call last):
      File "/Users/ryo.takahashi/ghq/github.com/deepmind/xmanager/.venv/bin/xmanager", line 33, in <module>
        sys.exit(load_entry_point('xmanager', 'console_scripts', 'xmanager')())
      File "/Users/ryo.takahashi/ghq/github.com/deepmind/xmanager/xmanager/cli/cli.py", line 65, in entrypoint
        app.run(main)
      File "/Users/ryo.takahashi/ghq/github.com/deepmind/xmanager/.venv/lib/python3.9/site-packages/absl/app.py", line 312, in run
        _run_main(main, args)
      File "/Users/ryo.takahashi/ghq/github.com/deepmind/xmanager/.venv/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
        sys.exit(main(argv))
      File "/Users/ryo.takahashi/ghq/github.com/deepmind/xmanager/xmanager/cli/cli.py", line 41, in main
        app.run(m.main, argv=argv)
      File "/Users/ryo.takahashi/ghq/github.com/deepmind/xmanager/.venv/lib/python3.9/site-packages/absl/app.py", line 312, in run
        _run_main(main, args)
      File "/Users/ryo.takahashi/ghq/github.com/deepmind/xmanager/.venv/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
        sys.exit(main(argv))
      File "/Users/ryo.takahashi/ghq/github.com/deepmind/xmanager/launcher.py", line 7, in main
        [executable] = experiment.package(
      File "/Users/ryo.takahashi/ghq/github.com/deepmind/xmanager/xmanager/xm/core.py", line 636, in package
        return cls._async_packager.package(packageables)
      File "/Users/ryo.takahashi/ghq/github.com/deepmind/xmanager/xmanager/xm/async_packager.py", line 114, in package
        executables = self._package_batch(packageables)
      File "/Users/ryo.takahashi/ghq/github.com/deepmind/xmanager/xmanager/xm_local/packaging/router.py", line 112, in package
        bazel_kinds = bazel_service.fetch_kinds(bazel_labels)
      File "/Users/ryo.takahashi/ghq/github.com/deepmind/xmanager/xmanager/xm_local/packaging/bazel_tools.py", line 186, in fetch_kinds
        labels = [_assemble_label(_lex_label(label)) for label in labels]
      File "/Users/ryo.takahashi/ghq/github.com/deepmind/xmanager/xmanager/xm_local/packaging/bazel_tools.py", line 186, in <listcomp>
        labels = [_assemble_label(_lex_label(label)) for label in labels]
      File "/Users/ryo.takahashi/ghq/github.com/deepmind/xmanager/xmanager/xm_local/packaging/bazel_tools.py", line 160, in _lex_label
        raise ValueError(f'{label} is not an absolute Bazel label')
    ValueError: //path/to/target:label.tar is not an absolute Bazel label
    

    This error is caused by _LABEL_LEXER. The regex does not allow a single `.` that is not part of an expansion, so the target name in the example does not match. https://github.com/deepmind/xmanager/blob/18652570332e284a6b2c184e6ab943ca56f6a11a/xmanager/xm_local/packaging/bazel_tools.py#L149-L152

    The immediate solution that comes to mind is to allow `.` in the regex and then reject labels containing consecutive `.`s in post-processing, like the following:

    diff --git a/xmanager/xm_local/packaging/bazel_tools.py b/xmanager/xm_local/packaging/bazel_tools.py
    index 694f001..4dc52b0 100644
    --- a/xmanager/xm_local/packaging/bazel_tools.py
    +++ b/xmanager/xm_local/packaging/bazel_tools.py
    @@ -147,7 +147,7 @@ def _build_multiple_targets(
     
     
     # Expansions (`...`, `*`) are not allowed.
    -_NAME_RE = '[^:/.*]+'
    +_NAME_RE = '[^:/*]+'
     _LABEL_LEXER = re.compile(
         f'^//(?P<packages>{_NAME_RE}(/{_NAME_RE})*)?(?P<target>:{_NAME_RE})?$')
     _LexedLabel = Tuple[List[str], str]
    @@ -156,8 +156,12 @@ _LexedLabel = Tuple[List[str], str]
     def _lex_label(label: str) -> _LexedLabel:
       """Splits the label into packages and target."""
       match = _LABEL_LEXER.match(label)
       if match is None:
         raise ValueError(f'{label} is not an absolute Bazel label')
    +  # Reject `..`, which the relaxed regex would otherwise admit.
    +  for group in match.groups():
    +    if group and '..' in group:
    +      raise ValueError(f'{label} is not an absolute Bazel label')
       groups = match.groupdict()
       packages: Optional[str] = groups['packages']
       target: Optional[str] = groups['target']
    
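    To illustrate why the post-processing step is still needed, a standalone check of the relaxed pattern (mirroring the constants in the patch above):

    import re

    _NAME_RE = '[^:/*]+'
    _LABEL_LEXER = re.compile(
        f'^//(?P<packages>{_NAME_RE}(/{_NAME_RE})*)?(?P<target>:{_NAME_RE})?$')

    # A single `.` in the target name now lexes:
    assert _LABEL_LEXER.match('//path/to/target:label.tar') is not None
    # ...but `...` expansions would also slip through the regex alone,
    # which is why labels containing `..` are rejected afterwards:
    assert _LABEL_LEXER.match('//path/...:all') is not None
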
    opened by reiyw 1
  • Better support for Kubernetes

    This is to gauge whether better Kubernetes support would be useful. Please comment if it would be useful to you and, ideally, explain your use case a bit.

    enhancement 
    opened by dfurrer 1
  • pinned sqlalchemy and alembic dependencies are more than two years old

    sqlalchemy is pinned to 1.2.19, which was released in April of 2019.

    alembic is pinned to 1.4.3, which was released in September of 2020.

    This was already brought up in https://github.com/deepmind/xmanager/issues/28.

    Old dependencies like this make it difficult for xmanager to coexist with other packages that keep their dependencies up to date - for instance, the hyperparameter optimization package Optuna.

    opened by kalaracey 0
  • FileExistsError on launch

    I am trying to run XManager with the following script.

    
    from __future__ import annotations

    import os

    from absl import app  # needed for app.run(main) below
    from xmanager import xm
    from xmanager import xm_local


    def main(_):
      with xm_local.create_experiment(experiment_title='cifar102') as experiment:
        # path = os.path.join(os.path.dirname(__file__), "learned_optimization")
        path = os.path.join(os.path.dirname(__file__), "./")

        spec = xm.PythonContainer(
            path=path,
            entrypoint=xm.CommandList(['main.py']),
        )

        [executable] = experiment.package([
            xm.Packageable(
                # What we are going to run.
                executable_spec=spec,
                # Where we are going to run it.
                executor_spec=xm_local.Vertex.Spec(),
            )
        ])

        experiment.add(xm.Job(
            executable=executable,
            executor=xm_local.Vertex(xm.JobRequirements(a100=1)),
            # args={'batch_size': 16},
            args={'--cfg': "configs/run_cub.yaml"},
            # We can also pass environment variables.
            env_vars={'HEAPPROFILE': '/tmp/a_out.hprof'},
        ))


    if __name__ == '__main__':
      app.run(main)
    

    I get the following FileExistsError, which did not occur the first time I ran the code. I am not sure how to solve this issue.

    Error:

    File "/home/gulzain/gen_39/lib/python3.9/site-packages/xmanager/xm_local/packaging/cloud.py", line 133, in _package_python_container
      image = build_image.build(
    File "/home/gulzain/gen_39/lib/python3.9/site-packages/xmanager/cloud/build_image.py", line 119, in build
      docker_lib.prepare_directory(staging, python_path, dirname, entrypoint,
    File "/home/gulzain/gen_39/lib/python3.9/site-packages/xmanager/cloud/docker_lib.py", line 56, in prepare_directory
      shutil.copytree(source_directory,
    File "/usr/lib/python3.9/shutil.py", line 568, in copytree
      return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
    File "/usr/lib/python3.9/shutil.py", line 467, in _copytree
      os.makedirs(dst, exist_ok=dirs_exist_ok)
    File "/usr/lib/python3.9/os.py", line 225, in makedirs
      mkdir(name, mode)
    FileExistsError: [Errno 17] File exists: '/tmp/tmpm5m_0mb_/'

    I have Python 3.9 and the latest pip version of XManager.
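    The traceback boils down to shutil.copytree refusing a destination directory that already exists; a minimal sketch of the failing pattern, independent of XManager (dirs_exist_ok requires Python 3.8+):

    import os
    import shutil
    import tempfile

    src = tempfile.mkdtemp()
    open(os.path.join(src, 'main.py'), 'w').close()

    dst = tempfile.mkdtemp()  # destination already exists
    try:
      shutil.copytree(src, dst)  # raises, as in the report above
    except FileExistsError as err:
      print(err)  # [Errno 17] File exists: ...

    # Copy into the existing directory instead:
    shutil.copytree(src, dst, dirs_exist_ok=True)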

    opened by gulzainali98 0
  • ResourceExhausted: 429 The following quota metrics exceed quota limits

    Hi! Thanks for building this amazing project. Recently I've been running a script with XManager + Vertex AI on TPU v2 and v3, but I keep getting this error:

    google.api_core.exceptions.ResourceExhausted: 429 The following quota metrics exceed quota limits: aiplatform.googleapis.com/custom_model_training_tpu_v2
    

    The error is thrown at this line - https://github.com/deepmind/xmanager/blob/v0.2.0/xmanager/cloud/vertex.py#L181.

    Below are the sanity checks that I've done:

    • I found that the service account here loads fine, though it is soon assigned to `None` here, as I'm requesting TPU v2 or v3.
    • tensorboard is set to an empty string.
    • self.location, self.project, pools and auth.get_bucket() all look good, where the location is us-central1 and pools shows --
    [machine_spec {
      machine_type: "cloud-tpu"
      accelerator_type: TPU_V2
      accelerator_count: 8
    }
    ]
    

    I've enabled the three APIs mentioned in the readme (IAM, Cloud AI Platform, Container Registry); additionally, the Vertex API and Cloud Resource Manager API were enabled. I also checked the Quota page on the console, which looks fine as well. It doesn't look like I'm overusing the resources described in the error message as exceeding quota limits.

    It's been bugging me for quite a few days, and I would really appreciate it if anyone could suggest what's possibly going on. Thanks in advance!

    opened by crystina-z 1
  • Error running cifar10_tensorflow_tpu example

    I'm trying to run the cifar10_tensorflow_tpu example on GCP and got this error:

      File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 445, in result
        return self.__get_result()
      File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 390, in __get_result
        raise self._exception
      File "/home/koles/.local/lib/python3.9/site-packages/xmanager/xm/core.py", line 824, in launch
        await experiment_unit.add(job, args, identity=identity)
      File "/home/koles/.local/lib/python3.9/site-packages/xmanager/xm_local/experiment.py", line 211, in _launch_job_group
        launch_result = await self._submit_jobs_for_execution(job_group)
      File "/home/koles/.local/lib/python3.9/site-packages/xmanager/xm_local/experiment.py", line 83, in _submit_jobs_for_execution
        vertex_handles = vertex.launch(self._experiment_title,
      File "/home/koles/.local/lib/python3.9/site-packages/xmanager/cloud/vertex.py", line 335, in launch
        job_name = get_default_client().launch(
      File "/home/koles/.local/lib/python3.9/site-packages/xmanager/cloud/vertex.py", line 181, in launch
        custom_job.wait_for_resource_creation()
      File "/home/koles/.local/lib/python3.9/site-packages/google/cloud/aiplatform/jobs.py", line 1026, in wait_for_resource_creation
        self._wait_for_resource_creation()
      File "/home/koles/.local/lib/python3.9/site-packages/google/cloud/aiplatform/base.py", line 1246, in _wait_for_resource_creation
        self._raise_future_exception()
      File "/home/koles/.local/lib/python3.9/site-packages/google/cloud/aiplatform/base.py", line 214, in _raise_future_exception
        raise self._exception
      File "/home/koles/.local/lib/python3.9/site-packages/google/cloud/aiplatform/base.py", line 226, in _complete_future
        future.result()  # raises
      File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 438, in result
        return self.__get_result()
      File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 390, in __get_result
        raise self._exception
      File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 52, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/home/koles/.local/lib/python3.9/site-packages/google/cloud/aiplatform/base.py", line 316, in wait_for_dependencies_and_invoke
        result = method(*args, **kwargs)
      File "/home/koles/.local/lib/python3.9/site-packages/google/cloud/aiplatform/jobs.py", line 1496, in run
        self._gca_resource = self.api_client.create_custom_job(
      File "/home/koles/.local/lib/python3.9/site-packages/google/cloud/aiplatform_v1/services/job_service/client.py", line 794, in create_custom_job
        response = rpc(
      File "/home/koles/.local/lib/python3.9/site-packages/google/api_core/gapic_v1/method.py", line 154, in __call__
        return wrapped_func(*args, **kwargs)
      File "/home/koles/.local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 52, in error_remapped_callable
        raise exceptions.from_grpc_error(exc) from exc
    google.api_core.exceptions.NotFound: 404 custom_job.job_spec.service_account must be specified when uploading to TensorBoard.
    
    

    I followed the XManager setup instructions and then ran the example from a clean GCP VM:

    xmanager launch examples/cifar10_tensorflow_tpu/launcher.py
    
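    The message points at the TensorBoard integration: Vertex requires a service account on the job spec whenever a TensorBoard instance is attached. A hedged sketch of the underlying SDK call, outside XManager and with placeholder resource names:

    from google.cloud import aiplatform

    aiplatform.init(project='my-project', location='us-central1',
                    staging_bucket='gs://my-bucket')

    job = aiplatform.CustomJob(
        display_name='cifar10-tpu',
        worker_pool_specs=[{
            'machine_spec': {'machine_type': 'cloud-tpu',
                             'accelerator_type': 'TPU_V2',
                             'accelerator_count': 8},
            'replica_count': 1,
            'container_spec': {'image_uri': 'gcr.io/my-project/my-image'},
        }],
    )
    job.run(
        # Required whenever `tensorboard` is set:
        service_account='xmanager@my-project.iam.gserviceaccount.com',
        tensorboard='projects/my-project/locations/us-central1/tensorboards/123',
    )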

    Thank you for the help.

    opened by akolesnikov 0
  • installing xmanager from pip fails in colab environment

    python -m pip install xmanager fails with the following error in a colab (18.06) environment:

    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    tensorflow 2.8.0 requires tf-estimator-nightly==2.8.0.dev2021122109, which is not installed.
    pandas-gbq 0.13.3 requires google-cloud-bigquery[bqstorage,pandas]<2.0.0dev,>=1.11.1, but you have google-cloud-bigquery 2.34.3 which is incompatible.
    google-cloud-translate 1.5.0 requires google-cloud-core<2.0dev,>=1.0.0, but you have google-cloud-core 2.3.0 which is incompatible.
    google-cloud-firestore 1.7.0 requires google-cloud-core<2.0dev,>=1.0.3, but you have google-cloud-core 2.3.0 which is incompatible.
    google-cloud-datastore 1.8.0 requires google-cloud-core<2.0dev,>=1.0.0, but you have google-cloud-core 2.3.0 which is incompatible.
    Successfully installed async-generator-1.10 docker-5.0.3 google-cloud-aiplatform-1.12.1 google-cloud-bigquery-2.34.3 google-cloud-core-2.3.0 google-cloud-resource-manager-1.4.1 google-cloud-storage-2.3.0 google-crc32c-1.3.0 google-resumable-media-2.3.2 grpc-google-iam-v1-0.12.4 immutabledict-2.2.1 kubernetes-23.3.0 proto-plus-1.20.3 protobuf-3.20.1 pyyaml-6.0 sqlalchemy-1.2.19 websocket-client-1.3.2 xmanager-0.1.5
    
    opened by proppy 2
  • JOB_STATE_FAILED for cifar10_tensorflow

    I am unable to launch an example script. The following is the command and console output/error. I am running the command from the PyCharm terminal. The job is launched but fails immediately with a "JOB_STATE_FAILED" error.

    % sudo xmanager launch ./examples/cifar10_tensorflow/launcher.py

    Console output + Error (a part of it):

    [+] Building 0.5s (16/16) FINISHED
     => [internal] load build definition from Dockerfile  0.0s
     => => transferring dockerfile: 694B  0.0s
     => [internal] load .dockerignore  0.0s
     => => transferring context: 2B  0.0s
     => [internal] load metadata for gcr.io/deeplearning-platform-release/tf2-gpu.2-6:latest  0.4s
     => [ 1/11] FROM gcr.io/deeplearning-platform-release/tf2-gpu.2-6@sha256:<"a bunch of HEX digits">  0.0s
     => [internal] load build context  0.0s
     => => transferring context: 8.07kB  0.0s
     => CACHED [ 2/11] RUN if ! id 1000; then useradd -m -u 1000 clouduser; fi  0.0s
     => CACHED [ 3/11] RUN apt-get update && apt-get install -y git netcat  0.0s
     => CACHED [ 4/11] RUN python -m pip install --upgrade pip  0.0s
     => CACHED [ 5/11] COPY cifar10_tensorflow/requirements.txt /cifar10_tensorflow/requirements.txt  0.0s
     => CACHED [ 6/11] RUN python -m pip install -r cifar10_tensorflow/requirements.txt  0.0s
     => CACHED [ 7/11] COPY cifar10_tensorflow/ /cifar10_tensorflow  0.0s
     => CACHED [ 8/11] RUN chown -R 1000:root /cifar10_tensorflow && chmod -R 775 /cifar10_tensorflow  0.0s
     => CACHED [ 9/11] WORKDIR cifar10_tensorflow  0.0s
     => CACHED [10/11] COPY entrypoint.sh ./entrypoint.sh  0.0s
     => CACHED [11/11] RUN chown -R 1000:root ./entrypoint.sh && chmod -R 775 ./entrypoint.sh  0.0s
     => exporting to image  0.0s
     => => exporting layers
    ...
    {"status":"Waiting","progressDetail":{},"id": ...
    {"status":"Layer already exists","progressDetail":{},"id": ...
    Your image URI is:
    Job launched at: https://console.cloud.google.com/ai/platform/locations//training/
    current state: JobState.JOB_STATE_QUEUED
    current state: JobState.JOB_STATE_PENDING
    current state: JobState.JOB_STATE_FAILED

    opened by nayakanuj 0
Releases: v0.3.0
Owner: DeepMind