Model serving at scale

Last update: Jan 06, 2023

Overview

Run inference at scale

Cortex is an open source platform for large-scale machine learning inference workloads.

Workloads

Realtime APIs - respond to prediction requests in real-time

Deploy TensorFlow, PyTorch, and other models.
Scale to handle production workloads with server-side batching and request-based autoscaling.
Configure rolling updates and live model reloading to update APIs without downtime.
Serve many models efficiently with multi-model caching.
Perform A/B tests with configurable traffic splitting.
Stream performance metrics and structured logs to any monitoring tool.
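
As a rough illustration of what weighted traffic splitting for an A/B test means, the sketch below routes each request to one of two APIs in proportion to configured weights. The API names, the weights, and the `choose_api` helper are hypothetical; this is not how Cortex implements traffic splitting internally.

```python
import random


def choose_api(weights, rng=random):
    """Pick an API name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=1)[0]


# Hypothetical split: 90% of traffic to the current model, 10% to a candidate.
split = {"text-generator-a": 90, "text-generator-b": 10}
rng = random.Random(0)  # seeded so repeated runs are reproducible
counts = {name: 0 for name in split}
for _ in range(10_000):
    counts[choose_api(split, rng)] += 1
print(counts)
```

Over many requests the observed share of each API approaches its configured weight, which is what makes the comparison between the two variants meaningful.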

Batch APIs - run distributed inference on large datasets

Deploy TensorFlow, PyTorch, and other models.
Configure the number of workers and the compute resources for each worker.
Recover from failures with automatic retries and dead letter queues.
Stream performance metrics and structured logs to any monitoring tool.
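
The retry and dead letter queue behavior can be pictured with a generic sketch: each item is retried a fixed number of times, and items that keep failing are set aside instead of aborting the whole job. The `run_batch` function and its parameters are hypothetical and only illustrate the pattern, not Cortex's Batch API implementation.

```python
def run_batch(items, handler, max_retries=2):
    """Process each item, retrying failures; items that keep failing go to a dead letter list."""
    results, dead_letters = [], []
    for item in items:
        for attempt in range(max_retries + 1):
            try:
                results.append(handler(item))
                break  # success, move on to the next item
            except Exception as exc:
                if attempt == max_retries:
                    # Retries exhausted: record the item and the error for later inspection.
                    dead_letters.append((item, repr(exc)))
    return results, dead_letters
```

Keeping failed items alongside their error makes it possible to inspect and resubmit them after the job finishes.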

How it works

Implement a Predictor

# predictor.py

from transformers import pipeline

class PythonPredictor:
    def __init__(self, config):
        self.model = pipeline(task="text-generation")

    def predict(self, payload):
        return self.model(payload["text"])[0]

Configure a realtime API

# text_generator.yaml

- name: text-generator
  kind: RealtimeAPI
  predictor:
    type: python
    path: predictor.py
  compute:
    gpu: 1
    mem: 8Gi
  autoscaling:
    min_replicas: 1
    max_replicas: 10

Deploy

$ cortex deploy text_generator.yaml

# creating http://example.com/text-generator

Serve prediction requests

$ curl http://example.com/text-generator -X POST -H "Content-Type: application/json" -d '{"text": "hello world"}'
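
The same request can be made from Python. The sketch below uses only the standard library and builds the request without sending it; the `build_prediction_request` helper is a hypothetical name, and the URL is the placeholder from the curl command above.

```python
import json
import urllib.request


def build_prediction_request(url, text):
    """Build the same POST request as the curl command, without sending it."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_prediction_request("http://example.com/text-generator", "hello world")
# To send it against a deployed API: urllib.request.urlopen(request).read()
```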

Get started

Comments

Display realtime output
I have a text-generator language model that is compressed and in .bin format, and it can be accessed from a command-line terminal. It generates about one word per second and prints out every word in real time when accessed through the command line. I would like to deploy my model using Cortex, but I'm struggling to get the output in real time, word by word. Right now my code prints out only one line at a time.

import subprocess

def run_command(text):
    command = ['./mycommand', text]
    process = subprocess.Popen(command, stdout=subprocess.PIPE, universal_newlines=True, bufsize=-1)
    while True:
        output = process.stdout.readline()
        if output == '' and process.poll() is not None:
            break
        if output:
            print(output.strip())

run_command('TEXT')

One line may include about 20 words, so that means 1 line will be displayed every 20 seconds (since my model outputs roughly 1 word/ second). I would really like the output to be more dynamic and output one word at a time (as it does through command line) instead of just one line. Is there a way this can be achieved?
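
One possible direction, sketched under assumptions: read the child's stdout one character at a time instead of with `readline()`, and emit a word whenever whitespace is reached. The `stream_words` helper is hypothetical, the demo command stands in for `./mycommand`, and the approach only helps if the child process flushes its own output after each word. Forwarding the words to an HTTP client as they arrive is a separate problem that this sketch does not cover.

```python
import subprocess
import sys


def stream_words(command):
    """Yield whitespace-separated words from a command's stdout as each one completes."""
    process = subprocess.Popen(command, stdout=subprocess.PIPE, universal_newlines=True)
    word = []
    while True:
        char = process.stdout.read(1)  # returns "" only at end of output
        if char == "":
            break
        if char.isspace():
            if word:
                yield "".join(word)
                word = []
        else:
            word.append(char)
    if word:  # last word when the output has no trailing whitespace
        yield "".join(word)
    process.stdout.close()
    process.wait()


if __name__ == "__main__":
    # Stand-in for ['./mycommand', 'TEXT'].
    demo = [sys.executable, "-c", "print('hello world')"]
    for word in stream_words(demo):
        print(word, flush=True)
```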
question
opened by AbbeKamalov 61
Persistent private instances

I would like to use Cortex to create an application where each user can request and communicate with an AWS instance for a period of time. In this scenario, each user's data will be processed and stored on one whole AWS instance. From the documentation, I understand that each API call will use an instance that is not busy at the moment. It wouldn't be ideal if, by making an API call, a user received sensitive data stored by another user on the same instance. Would it be possible to somehow mark the instance to which an API call is made? That way, an individual user's data wouldn't be accessible to everyone, only to the users who request/use that instance.
question

opened by da-source 41

Resource exhausted error

I'm trying to send audio files, which are fairly large, to the server and am getting a resource exhausted error. Is there any way to configure the server in order to increase the maximum allowed message size?

Here's the stack trace:

2020-12-24 23:30:14.941839:cortex:pid-2247:INFO:500 Internal Server Error POST /
2020-12-24 23:30:14.942071:cortex:pid-2247:ERROR:Exception in ASGI application
Traceback (most recent call last):
  File "/opt/conda/envs/env/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py", line 390, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/fastapi/applications.py", line 181, in __call__
    await super().__call__(scope, receive, send)  # pragma: no cover
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/applications.py", line 111, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc from None
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 25, in __call__
    response = await self.dispatch_func(request, self.call_next)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/serve/serve.py", line 187, in parse_payload
    return await call_next(request)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 45, in call_next
    task.result()
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 38, in coro
    await self.app(scope, receive, send)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 25, in __call__
    response = await self.dispatch_func(request, self.call_next)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/serve/serve.py", line 134, in register_request
    response = await call_next(request)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 45, in call_next
    task.result()
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 38, in coro
    await self.app(scope, receive, send)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc from None
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/routing.py", line 566, in __call__
    await route.handle(scope, receive, send)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/routing.py", line 227, in handle
    await self.app(scope, receive, send)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/routing.py", line 41, in app
    response = await func(request)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/fastapi/routing.py", line 183, in app
    dependant=dependant, values=values, is_coroutine=is_coroutine
  File "/opt/conda/envs/env/lib/python3.6/site-packages/fastapi/routing.py", line 135, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/concurrency.py", line 34, in run_in_threadpool
    return await loop.run_in_executor(None, func, *args)
  File "/opt/conda/envs/env/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/serve/serve.py", line 200, in predict
    prediction = predictor_impl.predict(**kwargs)
  File "/mnt/project/serving/cortex_server.py", line 10, in predict
    return self.client.predict({"waveform": np.array(payload["audio"]).astype("float32")})
  File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/lib/client/tensorflow.py", line 114, in predict
    return self._run_inference(model_input, consts.SINGLE_MODEL_NAME, model_version)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/lib/client/tensorflow.py", line 164, in _run_inference
    return self._client.predict(model_input, model_name, model_version)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/lib/model/tfs.py", line 376, in predict
    response_proto = self._pred.Predict(prediction_request, timeout=timeout)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/grpc/_channel.py", line 826, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/opt/conda/envs/env/lib/python3.6/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.RESOURCE_EXHAUSTED
        details = "Received message larger than max (102484524 vs. 4194304)"
        debug_error_string = "{"created":"@1608852614.937822193","description":"Received message larger than max (102484524 vs. 4194304)","file":"src/core/ext/filters/message_size/message_size_filter.cc","file_line":203,"grpc_status":8}"
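
For context on the numbers in the trace: 4194304 bytes is 4 MiB, which is gRPC's default maximum receive size, while the rejected message was 102484524 bytes (roughly 98 MiB). In gRPC's Python API such limits are raised through channel options like `grpc.max_receive_message_length`. The sketch below only builds an options list of that form; the helper name is hypothetical, and whether or where Cortex exposes this setting is not established by the report above.

```python
DEFAULT_LIMIT = 4 * 1024 * 1024  # 4194304 bytes, the limit reported in the trace
MESSAGE_SIZE = 102484524         # size of the rejected message, from the trace
MIB = 1024 * 1024


def grpc_size_options(max_bytes):
    """Return gRPC channel options that raise the send and receive size limits."""
    return [
        ("grpc.max_send_message_length", max_bytes),
        ("grpc.max_receive_message_length", max_bytes),
    ]


# Round the required size up to a whole number of MiB.
needed = -(-MESSAGE_SIZE // MIB) * MIB
options = grpc_size_options(needed)
# With grpcio installed, the list is passed when creating a channel:
#   grpc.insecure_channel("host:port", options=options)
```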

bug

opened by lminer 24

Per Process GPU Ram

As mentioned in the docs (gpus.md) regarding limiting GPU RAM, I know exactly which code snippet I have to use, but I don't know where in the Cortex source code I have to put it.

mem_limit_mb = 1024
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.set_logical_device_configuration(
        gpu, [tf.config.LogicalDeviceConfiguration(memory_limit=mem_limit_mb)])
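
TensorFlow requires logical device configuration to be set before the GPUs are initialized, so one plausible place for the snippet is the start of the predictor's `__init__`, before any model is loaded, rather than anywhere in the Cortex source code. The sketch below assumes that placement; the `limit_gpu_memory` helper and the `mem_limit_mb` config key are hypothetical. The helper takes the `tf` module as a parameter so that it can be exercised without a GPU.

```python
def limit_gpu_memory(tf, mem_limit_mb=1024):
    """Cap the memory TensorFlow may allocate on each visible GPU; returns the GPU count."""
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        tf.config.set_logical_device_configuration(
            gpu, [tf.config.LogicalDeviceConfiguration(memory_limit=mem_limit_mb)]
        )
    return len(gpus)


class PythonPredictor:
    def __init__(self, config):
        import tensorflow as tf  # imported here so the limit is applied before any model loads

        limit_gpu_memory(tf, config.get("mem_limit_mb", 1024))
        self.model = None  # placeholder: load the real model only after the limit is set

    def predict(self, payload):
        raise NotImplementedError("model inference goes here")
```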
question

opened by akash-harijan 23
Support aws_session_token for CLI auth

Description

In order to authenticate with the Cortex operator, the Cortex CLI should be able to use aws_session_token (currently only static credentials are supported).

Also, consider enabling auth via IAM role (e.g. inherited from Lambda, EC2)
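
To make the difference concrete: temporary AWS credentials carry a third value, the session token, next to the access key pair. The sketch below collects all three from the standard AWS environment variables. The `load_aws_credentials` helper is hypothetical and is not the Cortex CLI's code; the returned keys happen to match the keyword arguments that `boto3.Session` accepts.

```python
import os


def load_aws_credentials(env=None):
    """Collect AWS credentials from environment variables, including a session token if set."""
    env = os.environ if env is None else env
    creds = {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }
    token = env.get("AWS_SESSION_TOKEN")
    if token:
        # Present only for temporary credentials (for example, from an assumed role).
        creds["aws_session_token"] = token
    return creds
```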
enhancement research

opened by deliahu 22
Package Cortex library into .ZIP

I'm trying to create a microservice to manage my cluster via Cortex and Lambda. AWS Lambda requires Python dependencies to be packaged and uploaded as a .zip file. How can I package the Cortex library into a .zip?
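
A common general approach for Lambda is to install the dependencies into a directory with `pip install --target <dir>` and then zip that directory's contents. The sketch below covers only the zipping step with the standard library; `zip_directory` and the directory names are hypothetical, and it is an assumption that the library and its dependencies fit within Lambda's package size limits.

```python
import os
import zipfile


def zip_directory(source_dir, zip_path):
    """Write every file under source_dir into zip_path, using paths relative to source_dir."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(root, name)
                archive.write(full_path, os.path.relpath(full_path, source_dir))
    return zip_path
```

For example, after installing into a directory named `package`, calling `zip_directory("package", "function.zip")` would produce the archive to upload. The archive should be written outside the directory being zipped.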
question

opened by imagine3D-ai 18

How to make Cortex XmlHttpRequest on HTTPS page?

I have a website which runs on https:// and I can't make Cortex API XmlHttpRequest requests from it.

When running on localhost using http://, everything works fine:

async function postData(url = '', data = {}) {
  // Default options are marked with *
  const response = await fetch(url, {
    method: 'POST', // *GET, POST, PUT, DELETE, etc.
    mode: 'cors', // no-cors, *cors, same-origin
    cache: 'no-cache', // *default, no-cache, reload, force-cache, only-if-cached
    credentials: 'same-origin', // include, *same-origin, omit
    headers: {
      'Content-Type': 'application/json'
      // 'Content-Type': 'application/x-www-form-urlencoded',
    },
    redirect: 'follow', // manual, *follow, error
    referrerPolicy: 'no-referrer', // no-referrer, *no-referrer-when-downgrade, origin, origin-when-cross-origin, same-origin, strict-origin, strict-origin-when-cross-origin, unsafe-url
    body: JSON.stringify(data) // body data type must match "Content-Type" header
  });
  return response.json(); // parses JSON response into native JavaScript objects
}

But making the same request on an https:// page gives the following:

Mixed Content: The page at 'https://www.@' was loaded over HTTPS, but requested an insecure XMLHttpRequest endpoint 'http://a6cc8d4dee22a448e81bb29862332bf0-93580d7c9d7d2256.elb.us-east-2.amazonaws.com/newtest-user'. This request has been blocked; the content must be served over HTTPS.

How can I access Cortex API over HTTPS?

question

opened by imagine3D-ai 17

upstream connect error or disconnect/reset before headers. reset reason: connection failure

Version

cli version: 0.18.1

Description

Intermittent 503 errors on AWS cluster.

Configuration

cortex.yaml

# cortex.yaml

- name: offer-features
  predictor:
    type: python
    path: predictor.py
    config:
      bucket: XXXXXXXXXXXXXXXXXXXX
  compute:
    cpu: 1  # CPU request per replica, e.g. 200m or 1 (200m is equivalent to 0.2) (default: 200m)
    gpu: 0  # GPU request per replica (default: 0)
    inf: 0 # Inferentia ASIC request per replica (default: 0)
    mem: 1Gi
  autoscaling:
    min_replicas: 2
    max_replicas: 3
    init_replicas: 2
    max_replica_concurrency: 13
    target_replica_concurrency: 5
    window: 1m0s
    downscale_stabilization_period: 5m0s
    upscale_stabilization_period: 1m0s
    max_downscale_factor: 0.75
    max_upscale_factor: 1.5
    downscale_tolerance: 0.05
    upscale_tolerance: 0.05
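
To make the autoscaling fields above easier to read, here is a simplified model of how such settings could interact, using the values from this configuration as defaults. It is an interpretation of the field names only, not Cortex's actual autoscaling algorithm, and the `recommend_replicas` function is hypothetical.

```python
import math


def recommend_replicas(current, in_flight, target_concurrency=5,
                       min_replicas=2, max_replicas=3,
                       max_upscale_factor=1.5, max_downscale_factor=0.75,
                       upscale_tolerance=0.05, downscale_tolerance=0.05):
    """Simplified replica recommendation from the number of in-flight requests."""
    raw = in_flight / target_concurrency
    # Ignore recommendations that fall within the tolerance band around the current count.
    if current * (1 - downscale_tolerance) <= raw <= current * (1 + upscale_tolerance):
        desired = current
    else:
        desired = math.ceil(raw)
    # Limit how far a single step may move away from the current count.
    desired = min(desired, math.ceil(current * max_upscale_factor))
    desired = max(desired, math.floor(current * max_downscale_factor))
    # Finally, stay within the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))
```

Under this model, with 2 replicas and 20 in-flight requests the raw recommendation of 4 is capped at 3 by both the upscale factor and max_replicas.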

# cluster.yaml

# AWS credentials (if not specified, ~/.aws/credentials will be checked) (can be overridden by $AWS_ACCESS_KEY_ID and $AWS_SECRET_ACCESS_KEY)
aws_access_key_id: XXXXXXXXXXXXXX
aws_secret_access_key: XXXXXXXXXXXXXXXXX

# optional AWS credentials for the operator which may be used to restrict its AWS access (defaults to the AWS credentials set above)
cortex_aws_access_key_id: XXXXXXXXXXXXXXXX
cortex_aws_secret_access_key: XXXXXXXXXXXXXXXXXXXXX

# EKS cluster name for cortex (default: cortex)
cluster_name: cortex

# AWS region
region: us-east-1

# S3 bucket (default: <cluster_name>-<RANDOM_ID>)
# note: your cortex cluster uses this bucket for metadata storage, and it should not be accessed directly (a separate bucket should be used for your models)
bucket: # cortex-<RANDOM_ID>

# list of availability zones for your region (default: 3 random availability zones from the specified region)
availability_zones: # e.g. [us-east-1a, us-east-1b, us-east-1c]

# instance type
instance_type: t3.medium

# minimum number of instances (must be >= 0)
min_instances: 1

# maximum number of instances (must be >= 1)
max_instances: 5

# disk storage size per instance (GB) (default: 50)
instance_volume_size: 50

# instance volume type [gp2, io1, st1, sc1] (default: gp2)
instance_volume_type: gp2

# instance volume iops (only applicable to io1 storage type) (default: 3000)
# instance_volume_iops: 3000

# whether the subnets used for EC2 instances should be public or private (default: "public")
# if "public", instances will be assigned public IP addresses; if "private", instances won't have public IPs and a NAT gateway will be created to allow outgoing network requests
# see https://docs.cortex.dev/v/0.18/miscellaneous/security#private-cluster for more information
subnet_visibility: public  # must be "public" or "private"

# whether to include a NAT gateway with the cluster (a NAT gateway is necessary when using private subnets)
# default value is "none" if subnet_visibility is set to "public"; "single" if subnet_visibility is "private"
nat_gateway: none  # must be "none", "single", or "highly_available" (highly_available means one NAT gateway per availability zone)

# whether the API load balancer should be internet-facing or internal (default: "internet-facing")
# note: if using "internal", APIs will still be accessible via the public API Gateway endpoint unless you also disable API Gateway in your API's configuration (if you do that, you must configure VPC Peering to connect to your APIs)
# see https://docs.cortex.dev/v/0.18/miscellaneous/security#private-cluster for more information
api_load_balancer_scheme: internet-facing  # must be "internet-facing" or "internal"

# whether the operator load balancer should be internet-facing or internal (default: "internet-facing")
# note: if using "internal", you must configure VPC Peering to connect your CLI to your cluster operator (https://docs.cortex.dev/v/0.18/guides/vpc-peering)
# see https://docs.cortex.dev/v/0.18/miscellaneous/security#private-cluster for more information
operator_load_balancer_scheme: internet-facing  # must be "internet-facing" or "internal"

# CloudWatch log group for cortex (default: <cluster_name>)
log_group: cortex

# additional tags to assign to aws resources for labelling and cost allocation (by default, all resources will be tagged with cortex.dev/cluster-name=<cluster_name>)
tags:  # <string>: <string> map of key/value pairs

# whether to use spot instances in the cluster (default: false)
# see https://docs.cortex.dev/v/0.18/cluster-management/spot-instances for additional details on spot configuration
spot: false

# see https://docs.cortex.dev/v/0.18/guides/custom-domain for instructions on how to set up a custom domain
ssl_certificate_arn: XXXXXXXXXXXXXXXXXXXXXXXXXXXX

Steps to reproduce

Spin up instances on AWS.
Wait a couple of days / hours (varies).
Notice sudden 503 errors

Expected behavior

It should work

Actual behavior

503 errors with the message

upstream connect error or disconnect/reset before headers. reset reason: connection failure

Screenshots

NOTE: The endpoint stopped responding around 15:30 in the graphs below.

[Graph: number of bytes in]

[Graph: number of requests]

Stack traces

Nothing useful, just:

2020-08-16 05:38:34.697979:cortex:pid-448:INFO:200 OK POST /predict
2020-08-16 05:38:37.643022:cortex:pid-448:INFO:200 OK POST /predict
2020-08-16 05:38:40.577522:cortex:pid-448:INFO:200 OK POST /predict
2020-08-16 05:38:42.008412:cortex:pid-448:INFO:200 OK POST /predict
2020-08-16 05:38:43.513294:cortex:pid-448:INFO:200 OK POST /predict
2020-08-16 05:38:45.425255:cortex:pid-448:INFO:200 OK POST /predict
2020-08-16 05:38:48.327276:cortex:pid-448:INFO:200 OK POST /predict
2020-08-16 05:38:51.316962:cortex:pid-447:INFO:200 OK POST /predict
2020-08-16 05:38:54.009212:cortex:pid-447:INFO:200 OK POST /predict
2020-08-16 05:38:55.852878:cortex:pid-447:INFO:200 OK POST /predict
2020-08-16 05:38:57.525264:cortex:pid-447:INFO:200 OK POST /predict
2020-08-16 05:39:00.795236:cortex:pid-447:INFO:200 OK POST /predict
2020-08-16 05:39:04.437013:cortex:pid-448:INFO:200 OK POST /predict
2020-08-16 05:39:05.981920:cortex:pid-448:INFO:200 OK POST /predict
2020-08-16 05:39:09.314293:cortex:pid-448:INFO:200 OK POST /predict
2020-08-16 05:39:12.343143:cortex:pid-448:INFO:200 OK POST /predict
2020-08-16 05:39:15.821708:cortex:pid-448:INFO:200 OK POST /predict
2020-08-16 05:39:19.083554:cortex:pid-448:INFO:200 OK POST /predict
2020-08-16 05:39:22.048843:cortex:pid-448:INFO:200 OK POST /predict
2020-08-16 05:39:24.943968:cortex:pid-448:INFO:200 OK POST /predict
2020-08-16 05:39:26.613330:cortex:pid-448:INFO:200 OK POST /predict
2020-08-16 05:39:29.702703:cortex:pid-448:INFO:200 OK POST /predict

Additional context

All the load balancer targets are marked as "unhealthy", even when they work (i.e. I can send requests and receive 2XX responses).
The load balancer healthcheck endpoint returns the following

/healthz
{
        "service": {
                "namespace": "istio-system",
                "name": "ingressgateway-operator"
        },
        "localEndpoints": 0
}

Suggested solution

(optional)

bug

opened by cristianmtr 14

Add possibility to export environment variables with .env file

Description

Add support for exporting environment variables from an .env file placed in the root directory of a Cortex project.

Motivation

Some users may not want to set environment variables using the predictor:env field in cortex.yaml, for example to keep the cortex.yaml deployment configuration clean.
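
As a sketch of what reading such a file could involve, the function below parses simple KEY=VALUE lines. It is a hypothetical illustration, not the proposed Cortex implementation, and it deliberately ignores features such as `export` prefixes, multi-line values, and variable expansion.

```python
def parse_env_file(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments; surrounding quotes are stripped."""
    variables = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split on the first "=" only
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        variables[key.strip()] = value
    return variables
```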
enhancement

opened by RobertLucian 14
Is there a way to speed-up API deployment
When deploying an API and observing logs, it seems that the most time-consuming part of deployment is:

2021-01-25 18:37:27.401057:cortex:pid-1:INFO:downloading the project code
2021-01-25 18:37:27.483562:cortex:pid-1:INFO:downloading the python serving image

Is there a way to somehow make deploying an API quicker?
question
opened by imagine3D-ai 13
Why is min_replicas 0 not possible?

We are trying to deploy a text generation API on AWS. We do not expect the API to receive much traffic initially, so we would like to save on costs. My idea was to set min_replicas to 0 so that no instance sits idle while the API receives no traffic. As soon as a new request came in, Cortex would spawn a new instance, then shut it down once traffic dropped back to 0.

However, I noticed that setting min_replicas to 0 is invalid. Isn't the above a valid use case? Also, is this a recent change? I (very) vaguely remember that this was possible in version 0.20 (please correct me if I'm wrong), but it does not seem to be possible in 0.26.

cc @deliahu I opened a new thread here because 1) it's a different issue from the other thread, and 2) other users might benefit from the conversation here.
question

opened by dakshvar22 13
Fix Grafana dashboard for AsyncAPIs
Changes

Fix typo: async_queue_length -> async_queued so that the list of api_names is populated (currently empty)

Use =~ with api_name where missing to enable displaying multiple AsyncAPIs on a panel

For the "In-Flight Requests" panel include the api_name in the legend

Testing

I have made the corresponding updates manually through the Grafana UI for our deployed Cortex cluster. AsyncAPIs now list in the "Cortex / AsyncAPI" dashboard and the dashboard works when multiple AsyncAPIs are selected.

checklist:

[ ] run make test and make lint

[ ] test manually (i.e. build/push all images, restart operator, and re-deploy APIs)

[ ] update examples

[ ] update docs and add any new files to summary.md (view in gitbook after merging)

[ ] cherry-pick into release branches if applicable

[ ] alert the dev team if the dev environment changed
opened by jackmpcollins 0
Use of root url

I don't really know how to word it correctly. Long story short, I need to use "http://$URL/" instead of "http://$URL/$API_NAME" for one of the multiple APIs inside the cluster. I haven't found any way to do it in the documentation, but surely it is implemented.
question

opened by Lunatik00 0
Bump sigs.k8s.io/aws-iam-authenticator from 0.5.3 to 0.5.9
Bumps sigs.k8s.io/aws-iam-authenticator from 0.5.3 to 0.5.9.

Release notes

Sourced from sigs.k8s.io/aws-iam-authenticator's releases.

v0.5.9

Changelog

1209cfe2 Bump version in Makefile

029d1dcf Add query parameter validation for multiple parameters

v0.5.7

What's Changed

Remove duplicate InitMetrics by @jngo2 in kubernetes-sigs/aws-iam-authenticator#448

fixes a crash when executing authenticator in server mode

New Contributors

@jngo2 made their first contribution in kubernetes-sigs/aws-iam-authenticator#448

Full Changelog: https://github.com/kubernetes-sigs/aws-iam-authenticator/compare/v0.5.6...v0.5.7

v0.5.6

Changelog

Bump AWS SDK to v1.43.28 (#445, @nckturner)

Use the apiversion from KUBERNETES_EXEC_INFO (#439, @jyotimahapatra)

Bump promptui module to v0.9.0 (#437, @abhay-krishna)

Docker Images

Note: You must log in with the registry ID and your role must have the necessary ECR privileges:

$(aws ecr get-login --no-include-email --region us-west-2 --registry-ids 602401143452)

docker pull 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-iam-authenticator:v0.5.6

docker pull 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-iam-authenticator:v0.5.6-arm64

docker pull 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-iam-authenticator:v0.5.6-amd64

v0.5.5

Changelog

Use full package name for goreleaser version (#433, @nckturner)

add sts error metric (#430, @jyotimahapatra)

emit metric for EC2 describeInstance calls (#428, @jyotimahapatra)

Rename configmap_watch_failures to configmap_watch_failures_total (#432, @nckturner)

Simplify goreleaser Dockerfiles (#431, @jyotimahapatra)

Don't pass metrics around (#423, @nckturner)

Docker Images

Note: You must log in with the registry ID and your role must have the necessary ECR privileges:

$(aws ecr get-login --no-include-email --region us-west-2 --registry-ids 602401143452)

... (truncated)

Commits

1209cfe Bump version in Makefile

029d1dc Add query parameter validation for multiple parameters

0a72c12 Merge pull request #455 from jyotimahapatra/rev2

596a043 revert use of upstream yaml parsing

2a9ee95 Merge pull request #448 from jngo2/master

fc4e6cb Remove unused imports

f0fe605 Remove duplicate InitMetrics

99f04d6 Merge pull request #447 from nckturner/release-0.5.6

9dcb6d1 Faster multiarch docker builds

a9cc81b Bump timeout for image build job

Additional commits viewable in compare view


dependencies go
opened by dependabot[bot] 1
Consider using the CDK SDK for `cortex cluster up / down` commands

Description

Replace cloud provider specific code in cortex cluster commands by using the CDK API.

Motivation

Make cluster management commands less dependent on each cloud provider, and make it easier to define the infrastructure (in this case, Cortex) in code.
enhancement research

opened by RobertLucian 0
Restrict minimum EC2/EKS IAM policies by resource

Description

As described in https://docs.cortex.dev/clusters/management/auth#minimum-iam-policy, the current minimum IAM policy grants the cortex CLI (and, by extension, eksctl) full control over the EC2/EKS services.

Motivation

These should be restricted to a resource-based policy that would limit what an IAM role/user can do. This is especially helpful in bigger corporations where there are more than a handful of developers and the company's policy on what access its devs have is more stringent.

Additional context

This seems to be blocked on what eksctl requires: https://eksctl.io/usage/minimum-iam-policies/. Talk to the eksctl team to see if there's a way to further reduce the IAM policy requirements.
enhancement provisioning

opened by RobertLucian 0

Releases(v0.42.1)

v0.42.1(Sep 23, 2022)
v0.42.1

New features

Add support for a new set of EC2 instance types, including the c6 and g5 families https://github.com/cortexlabs/cortex/issues/2414 (RobertLucian)

Bug fixes

Cosmetic fix: the VPC CNI logging functionality no longer triggers warning logs when running the cortex CLI https://github.com/cortexlabs/cortex/pull/2443 (RobertLucian)

Misc

Update Cortex dependency versions; eksctl, EKS to 1.22, AWS IAM, Python, etc https://github.com/cortexlabs/cortex/issues/2414 (RobertLucian, deliahu)

v0.42.0(Jan 10, 2022)
v0.42.0

New features

Add support for the Classic Load Balancer for APIs; the Network Load Balancer remains the default (docs) https://github.com/cortexlabs/cortex/pull/2413 https://github.com/cortexlabs/cortex/issues/2414 (RobertLucian)

Bug fixes

Fix Async API http/tcp probes when probing the empty root path (/) https://github.com/cortexlabs/cortex/pull/2407 (RobertLucian)

Fix nil pointer exception in the cortex cluster export command https://github.com/cortexlabs/cortex/pull/2415 https://github.com/cortexlabs/cortex/issues/2414 (RobertLucian)

Ensure that user-specified environment variables are ordered deterministically in the Kubernetes deployment spec https://github.com/cortexlabs/cortex/pull/2411 (deliahu)

Misc

Ensure that the batch on-job-complete request contains a valid JSON body https://github.com/cortexlabs/cortex/pull/2409 (RobertLucian)

v0.41.0(Dec 8, 2021)
v0.41.0

New features

Support configurable pre_stop command for containers https://github.com/cortexlabs/cortex/pull/2403 (docs) (deliahu)

Misc

Support m6i instance types https://github.com/cortexlabs/cortex/pull/2398 (deliahu)

Update to Kubernetes v1.21 https://github.com/cortexlabs/cortex/pull/2398 (deliahu)

Bug fixes

Wait for in-flight requests to reach zero before terminating the proxy container https://github.com/cortexlabs/cortex/pull/2402 (deliahu)

Fix cortex get --env command https://github.com/cortexlabs/cortex/pull/2404 (deliahu)

Fix cluster price estimate during cortex cluster up for spot node groups with on-demand base capacity https://github.com/cortexlabs/cortex/pull/2406 (RobertLucian)

Nucleus Model Server

We have released v0.1.0 of the Nucleus model server!

Nucleus is a model server for TensorFlow and generic Python models. It is compatible with Cortex clusters, Kubernetes clusters, and any other container-based deployment platforms. Nucleus can also be run locally via Docker compose.

Some of Nucleus's features include:

Generic Python models (PyTorch, ONNX, Sklearn, MLFlow, Numpy, Pandas, etc)

TensorFlow models

CPU and GPU support

Serve models directly from S3 paths

Configurable multiprocessing and multithreading

Multi-model endpoints

Dynamic server-side request batching

Automatic model reloading when new model versions are uploaded to S3

Model caching based on LRU policy (on disk and memory)

HTTP and gRPC support
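
The LRU model caching mentioned in the list above can be pictured with a small in-memory sketch: when the cache is full, the model that was used least recently is evicted to make room. The `LRUModelCache` class and its `loader` callback are hypothetical and are not Nucleus's implementation, which also caches on disk.

```python
from collections import OrderedDict


class LRUModelCache:
    """Keep at most `capacity` loaded models, evicting the least recently used one."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader  # function that loads a model given its name
        self._models = OrderedDict()

    def get(self, name):
        if name in self._models:
            self._models.move_to_end(name)  # mark as most recently used
            return self._models[name]
        model = self.loader(name)
        self._models[name] = model
        if len(self._models) > self.capacity:
            self._models.popitem(last=False)  # drop the least recently used model
        return model

    def cached_names(self):
        """Return cached model names, least recently used first."""
        return list(self._models)
```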

v0.40.0(Aug 5, 2021)
v0.40.0

New features

Support concurrency for Async APIs (via the max_concurrency field) https://github.com/cortexlabs/cortex/pull/2376 https://github.com/cortexlabs/cortex/issues/2200 (miguelvr)

Add graphs for cluster-wide and per-API cost breakdowns to the cluster metrics dashboard https://github.com/cortexlabs/cortex/pull/2382 https://github.com/cortexlabs/cortex/issues/1962 (RobertLucian)

Allow worker nodes containing Async APIs to scale to zero (now a shared async gateway is used, which runs on the operator node group) https://github.com/cortexlabs/cortex/pull/2380 https://github.com/cortexlabs/cortex/issues/2279 (vishalbollu)

Add cortex describe API_NAME command for Realtime and Async APIs https://github.com/cortexlabs/cortex/pull/2368 https://github.com/cortexlabs/cortex/issues/2320 https://github.com/cortexlabs/cortex/issues/2359 (RobertLucian)

Support updating the priority of an existing node group https://github.com/cortexlabs/cortex/pull/2369 https://github.com/cortexlabs/cortex/issues/2254 (vishalbollu)

Misc

Improve the reporting of API statuses https://github.com/cortexlabs/cortex/pull/2368 https://github.com/cortexlabs/cortex/issues/2320 https://github.com/cortexlabs/cortex/issues/2359 (RobertLucian)

Remove the default readiness probe on the target port if a custom readiness probe is specified in the API spec https://github.com/cortexlabs/cortex/pull/2379 (RobertLucian)

v0.39.1(Jul 21, 2021)

Bug fixes

Remove an unnecessary cluster validation which limited the IP ranges that could be used in api_load_balancer_cidr_white_list and operator_load_balancer_cidr_white_list https://github.com/cortexlabs/cortex/pull/2363 (RobertLucian)

v0.39.0(Jul 20, 2021)

New features

Add cortex cluster health command to show the health of the cluster's components https://github.com/cortexlabs/cortex/pull/2313 https://github.com/cortexlabs/cortex/issues/2029 (miguelvr)

Forward request headers to AsyncAPIs https://github.com/cortexlabs/cortex/pull/2329 https://github.com/cortexlabs/cortex/issues/2296 (miguelvr)

Add metrics dashboard for Task APIs https://github.com/cortexlabs/cortex/pull/2311 https://github.com/cortexlabs/cortex/pull/2322 (RobertLucian)

Reliability

Enable larger cluster sizes (up to 1000 nodes with 10000 pods) by enabling IPVS https://github.com/cortexlabs/cortex/pull/2357 https://github.com/cortexlabs/cortex/issues/1834 (RobertLucian)

Automatically limit the rate at which nodes are added to avoid overloading the Kubernetes API server https://github.com/cortexlabs/cortex/pull/2331 https://github.com/cortexlabs/cortex/pull/2338 https://github.com/cortexlabs/cortex/issues/2314 (RobertLucian)

Ensure cluster autoscaler availability https://github.com/cortexlabs/cortex/pull/2347 https://github.com/cortexlabs/cortex/issues/2346 (RobertLucian)

Improve istiod availability at large scale https://github.com/cortexlabs/cortex/pull/2342 https://github.com/cortexlabs/cortex/issues/2332 (RobertLucian)

Reduce metrics shown in cortex get to improve scalability and reliability of the command https://github.com/cortexlabs/cortex/pull/2333 https://github.com/cortexlabs/cortex/issues/2319 (vishalbollu)

Show aggregated node statistics in the cluster dashboard https://github.com/cortexlabs/cortex/pull/2336 https://github.com/cortexlabs/cortex/issues/2318 (RobertLucian)

Bug fixes

Ensure that the Content-Type header is properly set to application/json for responses to Async API submissions https://github.com/cortexlabs/cortex/pull/2323 (vishalbollu)

Fix pod autoscaler scale-to-zero edge cases https://github.com/cortexlabs/cortex/pull/2350 (miguelvr)

Allow autoscaling configuration to be updated on a running API https://github.com/cortexlabs/cortex/pull/2355 (RobertLucian)

Fix node group priority calculation for the cluster autoscaler https://github.com/cortexlabs/cortex/pull/2358 https://github.com/cortexlabs/cortex/pull/2343 (RobertLucian, deliahu)

Allow the node_groups selector to be updated in a running API https://github.com/cortexlabs/cortex/pull/2354 (RobertLucian)

Fix the active replicas graph on the Async API dashboard https://github.com/cortexlabs/cortex/pull/2328 (RobertLucian)

Docs

Add a guide for running in production https://github.com/cortexlabs/cortex/pull/2334 https://github.com/cortexlabs/cortex/issues/2317 (vishalbollu)

Add a guide for configuring an HTTP API Gateway https://github.com/cortexlabs/cortex/pull/2341 (deliahu)

Misc

Add a graph of the number of active and queued requests to the Async API dashboard https://github.com/cortexlabs/cortex/pull/2326 https://github.com/cortexlabs/cortex/issues/1960 (deliahu)

Add a graph of the number of instances to the cluster dashboard https://github.com/cortexlabs/cortex/pull/2336 https://github.com/cortexlabs/cortex/issues/2318 (RobertLucian)

Ensure that cortex cluster info --print-config displays YAML that is consumable by cortex cluster configure https://github.com/cortexlabs/cortex/pull/2324 (vishalbollu)

v0.38.0(Jul 6, 2021)

New features

Support autoscaling down to zero replicas for Realtime APIs https://github.com/cortexlabs/cortex/pull/2298 https://github.com/cortexlabs/cortex/issues/445 (miguelvr)

Allow ssl_certificate_arn, api_load_balancer_cidr_white_list, and operator_load_balancer_cidr_white_list to be updated on an existing cluster (via the cortex cluster configure command) https://github.com/cortexlabs/cortex/pull/2305 https://github.com/cortexlabs/cortex/issues/2107 (vishalbollu)

Allow Prometheus's instance type to be configured (docs) https://github.com/cortexlabs/cortex/pull/2307 https://github.com/cortexlabs/cortex/issues/2285 (RobertLucian)

Allow multiple Inferentia chips to be assigned to a single container https://github.com/cortexlabs/cortex/pull/2304 https://github.com/cortexlabs/cortex/issues/1123 (deliahu)

Bug fixes

Fix cluster autoscaler's nodegroup priority calculation https://github.com/cortexlabs/cortex/pull/2309 (RobertLucian)

Misc

Various scalability improvements https://github.com/cortexlabs/cortex/pull/2307 https://github.com/cortexlabs/cortex/pull/2304 https://github.com/cortexlabs/cortex/issues/2297 https://github.com/cortexlabs/cortex/issues/2278 https://github.com/cortexlabs/cortex/issues/2285

Allow setting a nodegroup's max_instances to 0 https://github.com/cortexlabs/cortex/pull/2310 (RobertLucian)

v0.37.0(Jun 24, 2021)

New features

Support ARM instance types https://github.com/cortexlabs/cortex/pull/2268 https://github.com/cortexlabs/cortex/issues/1528 (RobertLucian)

Add cortex cluster configure command to add, remove, or scale nodegroups on a running cluster https://github.com/cortexlabs/cortex/pull/2246 https://github.com/cortexlabs/cortex/issues/2096 (RobertLucian)

Add cortex cluster info --print-config command to print the current configuration of a running cluster https://github.com/cortexlabs/cortex/pull/2246 (RobertLucian)

Add metrics dashboard for Async APIs https://github.com/cortexlabs/cortex/pull/2242 https://github.com/cortexlabs/cortex/issues/1958 (miguelvr)

Support cortex refresh command for Async APIs https://github.com/cortexlabs/cortex/pull/2265 https://github.com/cortexlabs/cortex/issues/2237 (deliahu)

Breaking changes

The cortex cluster scale command has been replaced by the cortex cluster configure command.

Bug fixes

Fix Async API metrics reporting for non-200 response status codes https://github.com/cortexlabs/cortex/pull/2266 (miguelvr)

Make batch job metrics persistence resilient to instance termination https://github.com/cortexlabs/cortex/pull/2247 https://github.com/cortexlabs/cortex/issues/2041 (vishalbollu)

Make network validations during cortex cluster up more permissive (to avoid unnecessarily failing checks on GovCloud) https://github.com/cortexlabs/cortex/pull/2248 (vishalbollu)

Fix Inferentia resource requests https://github.com/cortexlabs/cortex/pull/2250 (RobertLucian)

Docs

Add instructions for exporting logs and metrics to external tools (vishalbollu)

Misc

Improve output of cortex cluster info for running batch jobs https://github.com/cortexlabs/cortex/pull/2270 (deliahu)

Persist Batch job metrics regardless of job status https://github.com/cortexlabs/cortex/pull/2244 (miguelvr)

Support creating clusters with no node groups https://github.com/cortexlabs/cortex/pull/2269 (deliahu)

Improve handling of container startup errors in batch jobs with multiple containers https://github.com/cortexlabs/cortex/pull/2260 https://github.com/cortexlabs/cortex/issues/2217 (vishalbollu)

Add CPU and memory resource requests to the proxy and dequeuer containers https://github.com/cortexlabs/cortex/pull/2252 (deliahu)

v0.36.0(Jun 8, 2021)

New features

Support running arbitrary Docker containers in all workload types (Realtime, Async, Batch, Task) https://github.com/cortexlabs/cortex/pull/2173 (RobertLucian, miguelvr, vishalbollu, deliahu, ospillinger)

Support autoscaling Async APIs to zero replicas https://github.com/cortexlabs/cortex/pull/2224 https://github.com/cortexlabs/cortex/issues/2199 (RobertLucian)

Breaking changes

With this release, we have generalized Cortex to exclusively support running arbitrary Docker containers for all workload types (Realtime, Async, Batch, and Task). This enables the use of any model server, programming language, etc. As a result, the API configuration has been updated: the predictor section has been removed, the pod section has been added, and the autoscaling parameters have been modified slightly (depending on the workload type). See updated docs for Realtime, Async, Batch, and Task. If you'd like to see examples of Dockerizing Python applications, see our test/apis folder.

The cortex prepare-debug command has been removed; Cortex now exclusively runs Docker containers, which can be run locally via docker run.

The cortex patch command has been removed; its behavior is now identical to cortex deploy.

The cortex logs command now prints a CloudWatch Insights URL with a pre-populated query which can be executed to show logs from your workloads, since this is the recommended approach in production. If you wish to stream logs from a pod at random, you can use cortex logs --random-pod (keep in mind that these logs will not include some system logs related to your workload).

gRPC support has been temporarily removed; we are working on adding it back in v0.37.
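
Since workloads are now plain containers, the application inside the container is just a web server. The sketch below is a minimal, dependency-free example of such an application, written as a WSGI app. It is an illustration only and is not taken from the test/apis folder; the names `app` and `call_app`, the echo behavior, and the port in the usage note are assumptions.

```python
import io
import json


def app(environ, start_response):
    """WSGI app: echo the JSON body of a POST request back to the caller."""
    if environ["REQUEST_METHOD"] != "POST":
        start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
        return [b"only POST is supported"]
    length = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
    body = json.dumps({"received": payload}).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
    )
    return [body]


def call_app(payload):
    """Exercise the app in-process (no server needed); returns (status, parsed body)."""
    raw = json.dumps(payload).encode("utf-8")
    environ = {
        "REQUEST_METHOD": "POST",
        "CONTENT_LENGTH": str(len(raw)),
        "wsgi.input": io.BytesIO(raw),
    }
    statuses = []
    chunks = app(environ, lambda status, headers: statuses.append(status))
    return statuses[0], json.loads(b"".join(chunks))
```

Inside a container, the app could be served with the standard library, for example `wsgiref.simple_server.make_server("0.0.0.0", 8080, app).serve_forever()`, where the port is whichever one the API configuration expects (8080 here is only a placeholder). The same process can be run locally via docker run for debugging.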

Bug fixes

Handle exception when initializing the Python client when the default environment is not set https://github.com/cortexlabs/cortex/pull/2225 https://github.com/cortexlabs/cortex/issues/2223 (deliahu)

Docs

Document how to configure SMTP in Grafana (e.g. to enable email alerts) https://github.com/cortexlabs/cortex/pull/2219 (RobertLucian)

Misc

Show CloudWatch Insights URL with a pre-populated query in the output of cortex logs https://github.com/cortexlabs/cortex/issues/2085 (vishalbollu)

Improve efficiency of batch job submission validations https://github.com/cortexlabs/cortex/pull/2179 https://github.com/cortexlabs/cortex/issues/2178 (deliahu)

v0.35.0(May 11, 2021)

New features

Avoid processing HTTP requests that have been cancelled by the client https://github.com/cortexlabs/cortex/pull/2135 https://github.com/cortexlabs/cortex/issues/1453 (vishalbollu)

Support GP3 volumes (and make GP3 the default volume type) https://github.com/cortexlabs/cortex/pull/2130 https://github.com/cortexlabs/cortex/issues/1843 (RobertLucian)

Allow setting the shared memory (shm) size for Task APIs https://github.com/cortexlabs/cortex/pull/2132 https://github.com/cortexlabs/cortex/issues/2115 (RobertLucian)

Implement automatic 7-day expiration for Async API responses https://github.com/cortexlabs/cortex/pull/2151 (RobertLucian)

Add cortex env rename command https://github.com/cortexlabs/cortex/pull/2165 https://github.com/cortexlabs/cortex/issues/1773 (deliahu)

Breaking changes

The Python client methods which deploy Python classes have been separated from the deploy() method. Now, deploy() is used only to deploy project folders, and deploy_realtime_api(), deploy_async_api(), deploy_batch_api(), and deploy_task_api() are for deploying Python classes. (docs)

The name of the bucket that Cortex uses for internal purposes is no longer configurable. During cluster creation, Cortex will auto-generate the bucket name (and create the bucket if it doesn't exist). During cluster deletion, the bucket will be emptied (unless the --keep-aws-resources flag is provided to cortex cluster down). Users' files should not be stored in the Cortex internal bucket.

Bug fixes

Fix the number of Async API replicas shown in cortex cluster info https://github.com/cortexlabs/cortex/pull/2140 https://github.com/cortexlabs/cortex/issues/2129 (RobertLucian)

Misc

Delete all Cortex-created AWS resources when deleting a cluster, and support the --keep-aws-resources flag with cortex cluster down to preserve AWS resources https://github.com/cortexlabs/cortex/pull/2161 https://github.com/cortexlabs/cortex/issues/1612 (RobertLucian)

Validate the user's AWS service quota for number of security groups and in/out rules during cluster creation https://github.com/cortexlabs/cortex/pull/2127 https://github.com/cortexlabs/cortex/issues/2087 (RobertLucian)

Allow specifying only one of --min-instances or --max-instances with cortex cluster scale https://github.com/cortexlabs/cortex/pull/2149 (RobertLucian)

Use 405 status code for un-implemented Realtime API methods https://github.com/cortexlabs/cortex/pull/2158 (RobertLucian)

Decrease file size and project size limits https://github.com/cortexlabs/cortex/pull/2152 (deliahu)

Set the default environment name to the cluster name when creating a cluster https://github.com/cortexlabs/cortex/pull/2164 https://github.com/cortexlabs/cortex/issues/1546 (deliahu)

v0.34.0(Apr 27, 2021)

New features

Support handling GET, PUT, PATCH, and DELETE HTTP requests in Realtime APIs (docs) https://github.com/cortexlabs/cortex/pull/2111 https://github.com/cortexlabs/cortex/issues/2063 (RobertLucian)

Support running realtime API containers locally for debugging / development purposes (docs) https://github.com/cortexlabs/cortex/pull/2112 https://github.com/cortexlabs/cortex/issues/2077 (vishalbollu)

Support multiple gRPC services / methods (which can be named arbitrarily) in a single Realtime API (docs) https://github.com/cortexlabs/cortex/pull/2111 https://github.com/cortexlabs/cortex/issues/2063 (RobertLucian)

Support specifying a list of node groups on which a workload is allowed to run (see configuration docs for Realtime, Async, Batch, or Task APIs) https://github.com/cortexlabs/cortex/pull/2098 https://github.com/cortexlabs/cortex/issues/2034 (RobertLucian)

Support AWS GovCloud regions https://github.com/cortexlabs/cortex/pull/2118 https://github.com/cortexlabs/cortex/issues/2103 (vishalbollu)

Breaking changes

"predictor" has been renamed to "handler" throughout the product (API configuration and Python APIs). In addition, as a result of supporting additional HTTP method verbs, predict() has been renamed to handle_post() in Realtime APIs (handle_get(), handle_put(), handle_patch(), and handle_delete() are now also supported). For consistency, predict() has been renamed to handle_async() for Async APIs, and handle_batch() for Batch APIs. See the examples for Realtime, Async, and Batch APIs. Task APIs have not been changed.

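As a rough illustration of the v0.34.0 rename, a Realtime API class that previously implemented predict() would now look something like the sketch below. The class name `Handler`, the method arguments, and the toy logic are assumptions made for this example; the linked examples remain the authoritative reference for the exact interface.

```python
class Handler:
    """Sketch of a Realtime API handler after the predictor -> handler rename."""

    def __init__(self, config):
        # `config` stands for the dictionary supplied through the API configuration.
        self.greeting = config.get("greeting", "hello")

    def handle_post(self, payload):
        # Formerly predict(): called for POST requests with the parsed request body.
        return {"text": f"{self.greeting} {payload['text']}"}

    def handle_get(self, query_params):
        # New in this release: other HTTP verbs get their own handle_<verb>() method.
        return {"greeting": self.greeting, "query": dict(query_params)}
```

The same class can be instantiated and called directly in a Python shell, which is a convenient way to check the renamed methods before deploying.
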
Bug fixes

Fix invalid Async workload status during processing https://github.com/cortexlabs/cortex/pull/2106 https://github.com/cortexlabs/cortex/issues/2104 (RobertLucian)

Docs

Add docs for configuring Grafana alerts (RobertLucian)

Document how to create a Cortex cluster without administrator IAM access (vishalbollu)

Add docs for mirroring Cortex's docker images to a private repo (vishalbollu)

Misc

Support json output for the cortex cluster info command https://github.com/cortexlabs/cortex/pull/2089 https://github.com/cortexlabs/cortex/issues/2062 (RobertLucian)

Allow nodegroups to be scaled down to max_instances == 0 https://github.com/cortexlabs/cortex/pull/2095 (deliahu)

v0.33.0(Apr 13, 2021)

New features

Allow specifying a CIDR range whitelist for APIs and the operator (docs) https://github.com/cortexlabs/cortex/pull/2071 https://github.com/cortexlabs/cortex/issues/2003 (vishalbollu)

Enable CORS for async, batch, and task APIs https://github.com/cortexlabs/cortex/pull/2082 https://github.com/cortexlabs/cortex/issues/2073 (deliahu)
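
To show what a CIDR range whitelist means in practice, here is a small sketch using Python's standard ipaddress module. It only illustrates the matching rule (a client is allowed if its IP falls inside any whitelisted range); the function name `is_allowed` and the example ranges are hypothetical, and the actual enforcement in a cluster happens at the load balancer rather than in Python.

```python
import ipaddress


def is_allowed(client_ip, cidr_whitelist):
    """Return True if client_ip falls inside at least one whitelisted CIDR range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidr_whitelist)


# Example whitelist: one private range and one documentation range.
WHITELIST = ["10.0.0.0/8", "203.0.113.0/24"]
```

For example, `is_allowed("10.1.2.3", WHITELIST)` is true, while an address outside both ranges is rejected. A whitelist of `["0.0.0.0/0"]` admits every IPv4 address.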

Breaking changes

The onnx predictor type has been replaced by the python predictor type; please use the python predictor type instead (all onnx models are fully supported by the python predictor type)

Bug fixes

Fix bug affecting Async API consistency during heavy traffic https://github.com/cortexlabs/cortex/pull/2072 (RobertLucian)

Fix bug affecting Async API updates https://github.com/cortexlabs/cortex/pull/2067 (vishalbollu)

Misc

Rename cortex cluster configure command to cortex cluster scale https://github.com/cortexlabs/cortex/pull/2040 https://github.com/cortexlabs/cortex/issues/1972 (RobertLucian)

Disable AZRebalance autoscaling group process https://github.com/cortexlabs/cortex/pull/2042 https://github.com/cortexlabs/cortex/issues/1349 (RobertLucian)

Add horizontal pod autoscaler to async API gateway https://github.com/cortexlabs/cortex/pull/2079 https://github.com/cortexlabs/cortex/issues/2078 (RobertLucian)

Rename async modules to async_api to avoid name collision with the reserved keyword in Python 3.7+ https://github.com/cortexlabs/cortex/pull/2066 https://github.com/cortexlabs/cortex/issues/2052 (vishalbollu)

Backup images to dockerhub https://github.com/cortexlabs/cortex/pull/2081 (vishalbollu)

Add additional debugging info for cluster up failures https://github.com/cortexlabs/cortex/pull/2080 https://github.com/cortexlabs/cortex/issues/2027 (vishalbollu)
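
The reason for the async to async_api module rename can be checked with the standard library: since Python 3.7, async is a reserved keyword, so it can no longer be used as a module name in an import statement. The helper names below are hypothetical and exist only for this demonstration.

```python
import keyword


def is_valid_module_name(name):
    """A module name must be an identifier that is not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


def compiles(source):
    """Return True if the source parses; nothing is imported or executed."""
    try:
        compile(source, "<demo>", "exec")
        return True
    except SyntaxError:
        return False
```

On Python 3.7 or newer, `is_valid_module_name("async")` is false and `compiles("import async")` fails with a syntax error, while the renamed `async_api` passes both checks.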

v0.32.0(Mar 30, 2021)

New features

Add gRPC support to realtime APIs (docs) https://github.com/cortexlabs/cortex/pull/1997 https://github.com/cortexlabs/cortex/issues/1056 (RobertLucian)

Add support for ONNX and TensorFlow predictor types in async APIs (docs) https://github.com/cortexlabs/cortex/pull/1996 https://github.com/cortexlabs/cortex/issues/1980 (miguelvr)

Support using ECR images from other AWS accounts and regions https://github.com/cortexlabs/cortex/pull/2011 https://github.com/cortexlabs/cortex/issues/1988 (vishalbollu)

Breaking changes

GCP support has been removed so that we can focus our efforts on improving the scalability, reliability, and security of Cortex on AWS. Cortex on GCP will still be available in v0.31. If you are currently using Cortex on GCP, our team will be happy to help you migrate to AWS or work with you to find alternative solutions. Please feel free to reach out to us on Slack or by email if you're interested.

Bug fixes

Fix memory plots on Grafana dashboards for realtime and batch APIs https://github.com/cortexlabs/cortex/pull/2024 https://github.com/cortexlabs/cortex/pull/2014 https://github.com/cortexlabs/cortex/issues/1970 (RobertLucian)

Docs

Misc docs improvements https://github.com/cortexlabs/cortex/pull/1994 (ospillinger)

Misc

Increase kubelet's registryPullQPS limit from 5 to 10 https://github.com/cortexlabs/cortex/pull/2023 https://github.com/cortexlabs/cortex/issues/1989 (miguelvr)

Pin the AMI version https://github.com/cortexlabs/cortex/pull/2010 https://github.com/cortexlabs/cortex/issues/1975 https://github.com/cortexlabs/cortex/issues/1615 (vishalbollu)

v0.31.1(Mar 23, 2021)

Bug fixes

Fix preemptible node pools on GCP not autoscaling https://github.com/cortexlabs/cortex/pull/1981 (vishalbollu)

Fix the replica autoscaler targeting incorrect deployments on operator restart https://github.com/cortexlabs/cortex/pull/1982 (miguelvr)

Fix the replica autoscaler not being reinitialized for running APIs on operator restart on GCP https://github.com/cortexlabs/cortex/pull/1984 (vishalbollu)

v0.31.0(Mar 17, 2021)

New features

Add support for AsyncAPI (experimental) (docs) https://github.com/cortexlabs/cortex/pull/1935 https://github.com/cortexlabs/cortex/issues/1610 (miguelvr)

Add support for multi-instance-type clusters to AWS/GCP providers (experimental) (aws/gcp docs) https://github.com/cortexlabs/cortex/pull/1951 (RobertLucian)

Allow users to duplicate/mirror traffic using shadow pipelines https://github.com/cortexlabs/cortex/pull/1948 https://github.com/cortexlabs/cortex/issues/1889 (docs) (vishalbollu)

Breaking changes

on_demand_backup in cluster configuration has been removed in favour of using a cluster with a mixture of spot and on-demand nodegroups. See multi-instance documentation for AWS and GCP for more details.

Bug fixes

Fix Python client not respecting CORTEX_CLI_CONFIG_DIR environment variable for client-id.txt https://github.com/cortexlabs/cortex/pull/1953 (jackmpcollins)

Prevent threads from being stuck in DynamicBatcher https://github.com/cortexlabs/cortex/pull/1915 (cbensimon)

Fix unexpected cortex logs termination by increasing buffer size https://github.com/cortexlabs/cortex/pull/1939 (vishalbollu)

Decouple cluster deletion from EBS volume deletion for cortex cluster down https://github.com/cortexlabs/cortex/pull/1954 (deliahu)

Fix spot/on-demand GPU instances not joining the cluster by upgrading to eksctl 0.40.0 https://github.com/cortexlabs/cortex/pull/1955 (vishalbollu)

Prevent premature queue not found errors by preserving the SQS queue for some minutes after the job has completed https://github.com/cortexlabs/cortex/pull/1952 (vishalbollu)

Docs

Update docs https://github.com/cortexlabs/cortex/pull/1949 (ospillinger)

Misc

Configure a default cortex client to manage APIs from within Cortex workloads https://github.com/cortexlabs/cortex/pull/1942 https://github.com/cortexlabs/cortex/issues/1644 (RobertLucian)

Save batch metrics to cloud to preserve job metrics history https://github.com/cortexlabs/cortex/pull/1940 (vishalbollu)

v0.30.0(Mar 3, 2021)

New features

Record custom metrics from predictors and view them in Grafana (docs) https://github.com/cortexlabs/cortex/pull/1910 https://github.com/cortexlabs/cortex/issues/1897 (miguelvr)

Add granular pod metrics to the Grafana dashboards https://github.com/cortexlabs/cortex/pull/1905 (RobertLucian)

Add node metrics to Grafana dashboards https://github.com/cortexlabs/cortex/pull/1900 (miguelvr)

Breaking changes

Remove support for installing Cortex on your own Kubernetes Cluster https://github.com/cortexlabs/cortex/pull/1921 (RobertLucian)

Bug fixes

Fix bug where successfully completed jobs were marked as completed with errors https://github.com/cortexlabs/cortex/pull/1913 (vishalbollu)

Fix bug where batch jobs were being terminated unnecessarily https://github.com/cortexlabs/cortex/pull/1917 (vishalbollu)

Prevent cluster autoscaler from reallocating job pods https://github.com/cortexlabs/cortex/pull/1919 (vishalbollu)

Address AWS cluster up quota issues such as not enough NAT Gateways or EIPs https://github.com/cortexlabs/cortex/pull/1912 (RobertLucian)

Delete unused Prometheus volume on cluster down https://github.com/cortexlabs/cortex/pull/1863 (miguelvr)

Create .cortex dir if not present https://github.com/cortexlabs/cortex/pull/1909 (RobertLucian)

Docs

Add docs for accessing dashboard through private load balancer (docs) https://github.com/cortexlabs/cortex/pull/1907 (deliahu)

Misc

Allow specifying paths for requirements.txt, conda-packages.txt & dependencies.sh (docs) https://github.com/cortexlabs/cortex/pull/1896 https://github.com/cortexlabs/cortex/pull/1927 https://github.com/cortexlabs/cortex/issues/1777 (miguelvr)

Log relevant kubernetes events to API specific log streams https://github.com/cortexlabs/cortex/pull/1906 https://github.com/cortexlabs/cortex/issues/833 (miguelvr)

Support credentials using AWS_SESSION_TOKEN with the CLI/Client (docs) https://github.com/cortexlabs/cortex/pull/1908 https://github.com/cortexlabs/cortex/pull/1920 https://github.com/cortexlabs/cortex/issues/1134 https://github.com/cortexlabs/cortex/issues/1865 (vishalbollu)

Provide auth to Operator and APIs by attaching IAM policies to the cluster (docs) https://github.com/cortexlabs/cortex/pull/1908 https://github.com/cortexlabs/cortex/issues/1858 (vishalbollu)

v0.29.0(Feb 17, 2021)

New features

Add Grafana dashboard for APIs (docs) https://github.com/cortexlabs/cortex/pull/1867 https://github.com/cortexlabs/cortex/pull/1885 https://github.com/cortexlabs/cortex/pull/1890 https://github.com/cortexlabs/cortex/pull/1887 (miguelvr)

Support API autoscaling in GCP clusters (docs) https://github.com/cortexlabs/cortex/pull/1814 https://github.com/cortexlabs/cortex/pull/1879 https://github.com/cortexlabs/cortex/issues/1601 (miguelvr)

Support traffic splitting in GCP clusters (docs) https://github.com/cortexlabs/cortex/pull/1892 https://github.com/cortexlabs/cortex/issues/1660 (miguelvr)

Breaking changes

The default Docker images for APIs have been slimmed down to not include packages other than what Cortex requires to function. Therefore, when deploying APIs, it is now necessary to include the dependencies that your predictor needs in requirements.txt (docs) and/or dependencies.sh (docs).

Bug fixes

Disable dynamic batcher for TensorFlow predictor type https://github.com/cortexlabs/cortex/pull/1888 (miguelvr)

Support empty directory objects for models saved in S3/GCS https://github.com/cortexlabs/cortex/pull/1830 https://github.com/cortexlabs/cortex/issues/1829 (RobertLucian)

Fix bug which prevented Task APIs on GCP from being cleaned up after completion https://github.com/cortexlabs/cortex/pull/1871 (RobertLucian)

Docs

Add documentation for using a version of Python other than the default via dependencies.sh (docs) or custom images (docs) https://github.com/cortexlabs/cortex/pull/1862 https://github.com/cortexlabs/cortex/issues/1779 (RobertLucian)

Misc

Support deploying predictor Python classes from more environments (e.g. from separate Python files, AWS Lambda) https://github.com/cortexlabs/cortex/pull/1883 https://github.com/cortexlabs/cortex/commit/3a1b777d06e660a49b6223badda4c5e8b1fe4ec1 https://github.com/cortexlabs/cortex/issues/1824 https://github.com/cortexlabs/cortex/issues/1826 (vishalbollu)

Improve error logging for Batch and Task APIs https://github.com/cortexlabs/cortex/pull/1866 https://github.com/cortexlabs/cortex/issues/1833 (RobertLucian)

v0.28.0(Feb 3, 2021)

New features

Support installing Cortex on an existing Kubernetes cluster (on AWS or GCP) (docs) https://github.com/cortexlabs/cortex/pull/1837 https://github.com/cortexlabs/cortex/issues/1808 (vishalbollu)

Breaking changes

The cloudwatch dashboard has been removed as a result of our switch to Prometheus for metrics aggregation. The dashboard will be replaced with an alternative in an upcoming release.

Bug fixes

Fix bug which can cause requests to APIs from a Python client to timeout during cluster autoscaling https://github.com/cortexlabs/cortex/pull/1841 https://github.com/cortexlabs/cortex/issues/1840 (RobertLucian)

Fix bug which can cause downscale_stabilization_period to be disregarded during downscaling https://github.com/cortexlabs/cortex/pull/1847 https://github.com/cortexlabs/cortex/issues/1846 (RobertLucian)

Misc

AWS credentials are no longer required to connect the CLI to the cluster operator. If you need to restrict access to your cluster operator, configure the operator's load balancer to be private by setting operator_load_balancer_scheme: internal in your cluster configuration file, and set up VPC Peering. We plan to support a new auth strategy in an upcoming release.

Improve S6 error code/signal handling https://github.com/cortexlabs/cortex/pull/1825 https://github.com/cortexlabs/cortex/issues/1703 (RobertLucian)

v0.27.0(Jan 21, 2021)

New features

Add new API type TaskAPI for running arbitrary Python jobs (docs) https://github.com/cortexlabs/cortex/pull/1717 https://github.com/cortexlabs/cortex/issues/253 (miguelvr, RobertLucian)

Write Cortex's logs as structured logs, and allow use of Cortex's structured logger in predictors (supports adding extra fields) (aws docs, gcp docs) https://github.com/cortexlabs/cortex/pull/1778 https://github.com/cortexlabs/cortex/pull/1803 https://github.com/cortexlabs/cortex/pull/1804 https://github.com/cortexlabs/cortex/issues/1732 https://github.com/cortexlabs/cortex/issues/1563 (vishalbollu)

Support preemptible instances on GCP (docs) https://github.com/cortexlabs/cortex/pull/1791 https://github.com/cortexlabs/cortex/issues/1631 (RobertLucian)

Support private load balancers on GCP (docs) https://github.com/cortexlabs/cortex/pull/1786 https://github.com/cortexlabs/cortex/issues/1621 (deliahu)

Support GCP instances with multiple GPUs (docs) https://github.com/cortexlabs/cortex/pull/1789 https://github.com/cortexlabs/cortex/issues/1784 (deliahu)

Breaking changes

cortex logs now streams logs from a single replica at random when there are multiple replicas for an API. The recommended way to analyze production logs is via a dedicated logging tool (by default, logs are sent to CloudWatch on AWS and StackDriver on GCP)

Bug fixes

Misc Python client fixes https://github.com/cortexlabs/cortex/pull/1798 https://github.com/cortexlabs/cortex/pull/1782 https://github.com/cortexlabs/cortex/pull/1772 (vishalbollu, RobertLucian)

Docs

Document the shared /mnt directory for TensorFlow predictors https://github.com/cortexlabs/cortex/pull/1802 https://github.com/cortexlabs/cortex/issues/1792 (deliahu)

Misc GCP docs improvements https://github.com/cortexlabs/cortex/pull/1799 (deliahu)

Misc

Improve out-of-memory status reporting (RobertLucian)

Improve batch job cleanup process https://github.com/cortexlabs/cortex/pull/1797 https://github.com/cortexlabs/cortex/pull/1796 (vishalbollu)

Remove grpc msg send/receive limit https://github.com/cortexlabs/cortex/pull/1769 https://github.com/cortexlabs/cortex/issues/1740 (RobertLucian)

v0.26.0(Jan 6, 2021)

New features

Support configuring the log level for APIs (docs) https://github.com/cortexlabs/cortex/pull/1741 https://github.com/cortexlabs/cortex/issues/1484 (RobertLucian)

Support creating a cluster in an existing AWS VPC (docs) https://github.com/cortexlabs/cortex/pull/1759 https://github.com/cortexlabs/cortex/issues/1142 (deliahu)

Support specifying the GCP network and subnet for the Cortex cluster (docs) https://github.com/cortexlabs/cortex/pull/1752 https://github.com/cortexlabs/cortex/issues/1738 (deliahu)

Support configuring shared memory size (shm) for inter-process communication (docs) https://github.com/cortexlabs/cortex/pull/1756 https://github.com/cortexlabs/cortex/issues/1638 (vishalbollu)

Breaking changes

The local provider has been removed. The best way to test your predictor implementation locally is to import it in a separate Python file and call your __init__() and predict() functions directly. The best way to test your API is to deploy it to a dev/test cluster.

Built-in support for API Gateway has been removed. If you need to create an https endpoint with valid certs, some options are to set up a custom domain or to manually create an API Gateway.

Prediction monitoring has been removed. We are exploring how to build a more powerful and customizable solution for this.

The predict CLI command has been deleted. curl, requests, etc. are the best tools for testing APIs.
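
The local testing approach recommended in place of the local provider can be as simple as the sketch below. The predictor here is a stand-in defined inline so the example is self-contained; in a real project it would be imported from predictor.py. The helper name `check_predictor`, the config keys, and the toy prediction logic are assumptions.

```python
class PythonPredictor:
    """Stand-in predictor; in practice this would be imported from predictor.py."""

    def __init__(self, config):
        self.prefix = config.get("prefix", "")

    def predict(self, payload):
        # Toy "model": upper-case the input text.
        return self.prefix + payload["text"].upper()


def check_predictor(predictor_cls, config, payload):
    """Call __init__() and predict() directly, without deploying anything."""
    predictor = predictor_cls(config)
    return predictor.predict(payload)
```

Running `check_predictor(PythonPredictor, {"prefix": ">> "}, {"text": "hello world"})` exercises both methods in a single call. Once the predictor behaves as expected locally, the deployed API can be tested with curl or requests against a dev/test cluster.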

Bug fixes

For multi-model APIs, allow model names to share a prefix https://github.com/cortexlabs/cortex/pull/1745 https://github.com/cortexlabs/cortex/issues/1699 (RobertLucian)

Docs

Misc docs improvements (ospillinger)

v0.25.0(Dec 23, 2020)

New features

Support server-side micro batching for the Python predictor (docs) https://github.com/cortexlabs/cortex/pull/1653 https://github.com/cortexlabs/cortex/issues/1382 (miguelvr)

Add timeout configuration for batch jobs (docs) https://github.com/cortexlabs/cortex/pull/1712 https://github.com/cortexlabs/cortex/issues/1324 (vishalbollu)

Support batch retries (docs) https://github.com/cortexlabs/cortex/pull/1713 https://github.com/cortexlabs/cortex/issues/1540 (lapaniku, vishalbollu)

Support sending failed batches to a dead-letter queue (docs) https://github.com/cortexlabs/cortex/pull/1713 https://github.com/cortexlabs/cortex/issues/1541 (lapaniku, vishalbollu)

Support installing the cortex Python client in predictors https://github.com/cortexlabs/cortex/pull/1709 https://github.com/cortexlabs/cortex/issues/1670 https://github.com/cortexlabs/cortex/issues/1206 (RobertLucian)
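
The batch retry and dead-letter queue features listed above follow a common pattern, sketched here in plain Python. This is not Cortex's implementation (which operates on queued messages rather than in-memory lists); the function name `run_with_retries` and its parameters are hypothetical. A batch is attempted up to `max_retries + 1` times, and a batch that never succeeds is set aside instead of being lost.

```python
def run_with_retries(batches, handle_batch, max_retries):
    """Process each batch, retrying failures; return (succeeded, dead_letter)."""
    succeeded, dead_letter = [], []
    for batch in batches:
        for _attempt in range(max_retries + 1):  # first try plus retries
            try:
                handle_batch(batch)
            except Exception:
                continue  # failed attempt: try again if attempts remain
            succeeded.append(batch)
            break
        else:
            # Every attempt failed: park the batch for later inspection.
            dead_letter.append(batch)
    return succeeded, dead_letter
```

Keeping failed batches in a separate queue means that one malformed batch does not block the rest of the job, and the failures can be inspected or resubmitted afterwards.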

Breaking changes

The predictor.model_path field of the Realtime API configuration has been moved to predictor.models.path. In addition, for the Python predictor type, predictor.models has been renamed to predictor.multi_model_reloading. Here is the entire API configuration schema.

Bug fixes

Misc batch reliability improvements https://github.com/cortexlabs/cortex/pull/1705 https://github.com/cortexlabs/cortex/pull/1718 https://github.com/cortexlabs/cortex/pull/1729 (vishalbollu)

Docs

Reorganize the docs structure https://github.com/cortexlabs/cortex/pull/1696 https://github.com/cortexlabs/cortex/pull/1701 https://github.com/cortexlabs/cortex/pull/1704 https://github.com/cortexlabs/cortex/pull/1719 https://github.com/cortexlabs/cortex/issues/1675 (ospillinger)

Add GCP to the contributing guide https://github.com/cortexlabs/cortex/pull/1720 https://github.com/cortexlabs/cortex/issues/1654 (deliahu)

Add docs for setting up kubectl on GCP https://github.com/cortexlabs/cortex/commit/759b4b144c25cc623e1b385b036f83825d122db7 (deliahu)

Misc

Parse the request body as a string when content type text/plain is specified https://github.com/cortexlabs/cortex/pull/1714 (deliahu)

Support paths to single ONNX files in API configuration https://github.com/cortexlabs/cortex/pull/1711 https://github.com/cortexlabs/cortex/issues/1686 (RobertLucian)

Support deploying public S3 models on GCP, and public GCS models on AWS https://github.com/cortexlabs/cortex/pull/1694 https://github.com/cortexlabs/cortex/issues/1684 (RobertLucian)

Pre-download docker images when creating GCP clusters https://github.com/cortexlabs/cortex/pull/1721 https://github.com/cortexlabs/cortex/issues/1658 (deliahu)

Speed up the validation processes for multi-model APIs https://github.com/cortexlabs/cortex/pull/1690 https://github.com/cortexlabs/cortex/issues/1663 (RobertLucian)

v0.24.1(Dec 13, 2020)

Bug fixes

Propagate the exit code from the predictor's initialization so that the API status is set to "error" when initialization fails https://github.com/cortexlabs/cortex/issues/1680 https://github.com/cortexlabs/cortex/pull/1691 (RobertLucian)

v0.24.0(Dec 9, 2020)

New features

Add GCP support: our initial release supports all three predictor types (Python, TensorFlow, ONNX), on CPU or GPU, with live reloading, multi-model caching, and cluster autoscaling https://github.com/cortexlabs/cortex/pull/1655 https://github.com/cortexlabs/cortex/pull/1672 https://github.com/cortexlabs/cortex/pull/1667 https://github.com/cortexlabs/cortex/issues/1661 https://github.com/cortexlabs/cortex/issues/114 https://github.com/cortexlabs/cortex/issues/1600 https://github.com/cortexlabs/cortex/issues/1602 https://github.com/cortexlabs/cortex/issues/1616 https://github.com/cortexlabs/cortex/issues/1624 (RobertLucian, deliahu, vishalbollu)

Add the patch command to the CLI and Python client, which can be used to update an API using only the API configuration (without needing to provide the predictor's Python implementation) https://github.com/cortexlabs/cortex/pull/1651 https://github.com/cortexlabs/cortex/pull/1666 https://github.com/cortexlabs/cortex/issues/1329 (vishalbollu)

Support deploying predictor Python classes from the Python client https://github.com/cortexlabs/cortex/pull/1587 https://github.com/cortexlabs/cortex/issues/1617 (see the tutorial for an example) (vishalbollu)

Breaking changes

The Python client's deploy() function has been renamed to create_api(), and some of the argument names have changed (docs)

Bug fixes

Enable CORS for APIs accessed via API Gateway or load balancer https://github.com/cortexlabs/cortex/pull/1649 https://github.com/cortexlabs/cortex/issues/1234 (RobertLucian, deliahu)

Fix local TensorFlow models when live reloading is enabled https://github.com/cortexlabs/cortex/pull/1668 https://github.com/cortexlabs/cortex/issues/1554 (RobertLucian)

Prevent TensorFlow multi-model caching from attempting to download local models from S3 https://github.com/cortexlabs/cortex/pull/1669 https://github.com/cortexlabs/cortex/issues/1598 (RobertLucian)

Docs

Miscellaneous docs improvements (vishalbollu, ospillinger)

Misc

Improve Python client cross Python version compatibility https://github.com/cortexlabs/cortex/pull/1640 (vishalbollu)

Reinstall TensorFlow and ONNX dependencies when the Python version is overridden https://github.com/cortexlabs/cortex/pull/1652 (vishalbollu)

Terminate container when bootloader script fails https://github.com/cortexlabs/cortex/pull/1639 (vishalbollu)

v0.23.0(Nov 25, 2020)

New features

Update Python client deploy() to accept a Python dictionary for API configuration (previously, only a file path was supported) (docs) https://github.com/cortexlabs/cortex/pull/1587 (vishalbollu)

Show API deployment history in cortex get API_NAME command https://github.com/cortexlabs/cortex/pull/1544 https://github.com/cortexlabs/cortex/issues/1496 (deliahu)

Add cortex export API_NAME and cortex export API_NAME API_ID commands to export specific and historical API deployments https://github.com/cortexlabs/cortex/pull/1544 https://github.com/cortexlabs/cortex/issues/1497 (deliahu)

Build and push python-predictor-gpu-slim image with different combinations of cuda and cudnn (cuda10.0-cudnn7, cuda10.1-cudnn7, cuda10.1-cudnn8, cuda10.2-cudnn7, cuda10.2-cudnn8, cuda11.0-cudnn8, cuda11.1-cudnn8) (docs) https://github.com/cortexlabs/cortex/pull/1575 https://github.com/cortexlabs/cortex/issues/1574 (deliahu)

Bug fixes

Allow local deployments of public S3 models without requiring AWS credentials https://github.com/cortexlabs/cortex/pull/1589 https://github.com/cortexlabs/cortex/issues/1588 (RobertLucian)

Docs

Add guide for avoiding Docker Hub rate limits https://github.com/cortexlabs/cortex/pull/1576 (RobertLucian, deliahu)

Add guide for self-hosting Cortex's Docker images https://github.com/cortexlabs/cortex/pull/1579 (RobertLucian, deliahu)

Misc

Remove API request maximum payload size limit https://github.com/cortexlabs/cortex/pull/1583 (deliahu)

Switch to Quay docker container registry https://github.com/cortexlabs/cortex/pull/1578 (deliahu, RobertLucian)

v0.22.1(Nov 19, 2020)

Bug fixes

Set the predictor's working directory to the root Cortex project directory https://github.com/cortexlabs/cortex/pull/1573 https://github.com/cortexlabs/cortex/issues/1572 (deliahu)

Allow max_instances to be updated via cortex cluster configure https://github.com/cortexlabs/cortex/pull/1568 https://github.com/cortexlabs/cortex/issues/1567 (deliahu)

Gracefully stop the serving container when a multi-processed cron throws an exception https://github.com/cortexlabs/cortex/pull/1560 https://github.com/cortexlabs/cortex/issues/1552 (RobertLucian)

Docs

Demonstrate how to make API requests with various payload types (binary, form fields, etc), and show how to access them in predict() https://github.com/cortexlabs/cortex/pull/1566 (docs)

Misc docs improvements https://github.com/cortexlabs/cortex/pull/1551 https://github.com/cortexlabs/cortex/pull/1556 c3dab4045a61703cb1db1d5f95776614252f96c0 https://github.com/cortexlabs/cortex/pull/1557 (deliahu, RobertLucian)

Misc

Build and upload the Python package/CLI to a public S3 bucket https://github.com/cortexlabs/cortex/pull/1562 (vishalbollu)

v0.22.0(Nov 11, 2020)

New features

Multi-model caching: serve a collection of models that is collectively bigger than what will fit in memory (via LRU cache eviction) (docs) https://github.com/cortexlabs/cortex/pull/1428 https://github.com/cortexlabs/cortex/issues/619 (RobertLucian)

Live reloading: support updating models in running APIs by adding new versions to the model's S3 directory (docs) https://github.com/cortexlabs/cortex/pull/1428 https://github.com/cortexlabs/cortex/issues/1252 (RobertLucian)

Inter-process fairness: distribute requests within an API replica evenly across all processes https://github.com/cortexlabs/cortex/pull/1526 https://github.com/cortexlabs/cortex/issues/839 https://github.com/cortexlabs/cortex/issues/1298 (RobertLucian)

Support requests between APIs within the same cluster (docs) https://github.com/cortexlabs/cortex/pull/1503 https://github.com/cortexlabs/cortex/issues/1241 (deliahu)

Allow overriding of CLI install path and config directory (via $CORTEX_INSTALL_PATH and $CORTEX_CLI_CONFIG_DIR) (docs) https://github.com/cortexlabs/cortex/pull/1521 https://github.com/cortexlabs/cortex/issues/1222 (deliahu)
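The LRU eviction mentioned under multi-model caching above can be sketched in a few lines. This is a generic illustration of the technique, not Cortex's code: the ModelCache class, its capacity counted in number of models, and the fake loader are all assumptions made for the example.

```python
from collections import OrderedDict


class ModelCache:
    def __init__(self, capacity, loader):
        self.capacity = capacity     # maximum number of models kept in memory
        self.loader = loader         # callable that loads a model by name
        self.models = OrderedDict()  # ordered from least to most recently used

    def get(self, name):
        if name in self.models:
            self.models.move_to_end(name)  # mark as most recently used
            return self.models[name]
        model = self.loader(name)
        self.models[name] = model
        if len(self.models) > self.capacity:
            self.models.popitem(last=False)  # evict the least recently used model
        return model


loaded = []


def fake_loader(name):
    # Hypothetical loader that records each load instead of reading a real model.
    loaded.append(name)
    return f"model:{name}"


cache = ModelCache(capacity=2, loader=fake_loader)
cache.get("a")
cache.get("b")
cache.get("a")  # "a" becomes the most recently used
cache.get("c")  # cache is full, so "b" is evicted
print(list(cache.models))
```

A request for an evicted model simply triggers another load, which is the trade-off that lets a replica serve more models than fit in memory at once.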

Breaking changes

ONNX model paths in API configuration files must now point to a directory containing a single ONNX file, rather than to the ONNX file itself. For example, model_path: s3://cortex-examples/onnx/yolov5-youtube/yolov5s.onnx becomes model_path: s3://cortex-examples/onnx/yolov5-youtube.

The --env/-e flag in all cortex cluster commands has been renamed to --configure-env/-e. If the flag is not provided, the cortex cluster info command no longer configures the environment named aws.
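For the ONNX path change listed above, a small helper can convert old-style values when updating many configuration files. The function name is hypothetical; it only rewrites values that end in .onnx and leaves directory paths alone.

```python
import posixpath


def to_onnx_model_dir(model_path):
    """Return the directory containing the ONNX file, or the path unchanged."""
    if model_path.lower().endswith(".onnx"):
        # S3 keys use forward slashes, so posixpath is safe on any OS.
        return posixpath.dirname(model_path)
    return model_path


old_path = "s3://cortex-examples/onnx/yolov5-youtube/yolov5s.onnx"
new_path = to_onnx_model_dir(old_path)
print(new_path)
```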

Bug fixes

Fix intermittent failed requests during rolling updates https://github.com/cortexlabs/cortex/pull/1526 https://github.com/cortexlabs/cortex/issues/814 (RobertLucian)

Prevent CLI environments from getting overwritten when multiple cortex cluster commands are run concurrently https://github.com/cortexlabs/cortex/pull/1520 https://github.com/cortexlabs/cortex/issues/1410 (deliahu)

Docs

Add Python client docs https://github.com/cortexlabs/cortex/pull/1519 https://github.com/cortexlabs/cortex/issues/1502 (deliahu)

Add guide for running in production https://github.com/cortexlabs/cortex/pull/1513 https://github.com/cortexlabs/cortex/issues/1464 https://github.com/cortexlabs/cortex/issues/1257 (deliahu)

Add guide for low-cost clusters https://github.com/cortexlabs/cortex/pull/1514 https://github.com/cortexlabs/cortex/issues/1425 (deliahu)

Add guide for using a REST API Gateway https://github.com/cortexlabs/cortex/pull/1505 https://github.com/cortexlabs/cortex/issues/1228 (deliahu)

Add guide for troubleshooting cortex cluster down failures https://github.com/cortexlabs/cortex/pull/1515 https://github.com/cortexlabs/cortex/issues/1319 (deliahu)

Misc

Stagger Predictor __init__() calls to reduce peak memory consumption https://github.com/cortexlabs/cortex/pull/1543 https://github.com/cortexlabs/cortex/issues/1450 (RobertLucian)

Add --name/-n and --region/-r flags to cortex cluster info, cortex cluster export, and cortex cluster down commands https://github.com/cortexlabs/cortex/pull/1492 https://github.com/cortexlabs/cortex/issues/1363 (RobertLucian)

Rename --env/-e flag to --configure-env/-e in cortex cluster commands and update its behavior https://github.com/cortexlabs/cortex/pull/1533 https://github.com/cortexlabs/cortex/issues/1412 (deliahu)

Disallow ARM-based instances, which are not currently supported https://github.com/cortexlabs/cortex/pull/1536 (deliahu)

Validate AWS vCPU quota is sufficient for up to max_instances instances when running cortex cluster up and cortex cluster configure https://github.com/cortexlabs/cortex/pull/1537 https://github.com/cortexlabs/cortex/issues/1461 (deliahu)

v0.21.0(Oct 27, 2020)
New features

Add Python client: pypi.org/project/cortex https://github.com/cortexlabs/cortex/pull/1449 https://github.com/cortexlabs/cortex/issues/684 (vishalbollu)

Add support for private docker image registries (docs) https://github.com/cortexlabs/cortex/pull/1460 https://github.com/cortexlabs/cortex/issues/1113 (deliahu)

Bug fixes

Fix minor BatchAPI bugs https://github.com/cortexlabs/cortex/pull/1471 https://github.com/cortexlabs/cortex/pull/1468 https://github.com/cortexlabs/cortex/pull/1480 https://github.com/cortexlabs/cortex/issues/1473 (vishalbollu, RobertLucian)

Bypass instance limit check if AWS's API doesn't provide quota information (this was blocking cluster creation in eu-north-1) https://github.com/cortexlabs/cortex/pull/1439 https://github.com/cortexlabs/cortex/issues/1438 (deliahu)

Docs

Add a guide for how to install the CLI on Windows https://github.com/cortexlabs/cortex/pull/1476 https://github.com/cortexlabs/cortex/issues/715 (RobertLucian)

Misc

Change default local port from 8888 to 8890 to avoid port conflicts with Jupyter https://github.com/cortexlabs/cortex/pull/1456 (vishalbollu)

Disallow instance types that aren't supported by NLB https://github.com/cortexlabs/cortex/pull/1436 https://github.com/cortexlabs/cortex/issues/1433 (deliahu)

Add --cluster-aws-key and --cluster-aws-secret flags to cortex cluster configure command https://github.com/cortexlabs/cortex/pull/1404 (deliahu)

Add --output flag to cortex env list command https://github.com/cortexlabs/cortex/pull/1444 (vishalbollu)

v0.20.0(Sep 29, 2020)

New features

Add cortex cluster export command to export all APIs running in a cluster (docs) https://github.com/cortexlabs/cortex/pull/1368 https://github.com/cortexlabs/cortex/issues/1255 (vishalbollu)

Enable users to specify CIDR ranges for the cluster's VPC (docs) https://github.com/cortexlabs/cortex/pull/1388 (vishalbollu)

Support json output for CLI commands (via -o/--output json) https://github.com/cortexlabs/cortex/pull/1365 https://github.com/cortexlabs/cortex/issues/1161 (vishalbollu)

Support the nvidia device driver (nvidia-container-toolkit) when running locally https://github.com/cortexlabs/cortex/pull/1366 https://github.com/cortexlabs/cortex/issues/1223 (vishalbollu)

Breaking changes

The valid values for api_gateway in the cluster configuration file have been changed from enabled/disabled to public/none (to match the values for networking.api_gateway in the API configuration file).

Bug fixes

Support AWS tags with spaces and valid special characters https://github.com/cortexlabs/cortex/pull/1374 https://github.com/cortexlabs/cortex/pull/1355 https://github.com/cortexlabs/cortex/pull/1380 https://github.com/cortexlabs/cortex/pull/1385 https://github.com/cortexlabs/cortex/issues/1373 (deliahu)

Fix tensor shape validation for the TensorFlow predictor https://github.com/cortexlabs/cortex/pull/1311 https://github.com/cortexlabs/cortex/issues/1310 (RobertLucian)

Allow cortex cluster * commands to be run from within a docker container https://github.com/cortexlabs/cortex/pull/1370 https://github.com/cortexlabs/cortex/issues/1361 https://github.com/cortexlabs/cortex/issues/1325 (deliahu)

New examples

pytorch/question-generator to generate questions given text and the correct answer (uses transformers and spacy) https://github.com/cortexlabs/cortex/pull/1308 (ismaelc)

Docs

Add documentation for how to install a specific version of the CLI https://github.com/cortexlabs/cortex/pull/1386 https://github.com/cortexlabs/cortex/issues/1244 (vishalbollu)

Add sections for overprovisioning and responsiveness to autoscaling docs https://github.com/cortexlabs/cortex/pull/1397 (deliahu)

Add documentation for how to allow IAM users who did not create the cortex cluster to run cortex cluster * commands https://github.com/cortexlabs/cortex/pull/1392 https://github.com/cortexlabs/cortex/issues/1391 (deliahu)

Add guide for setting up kubectl to access the cluster https://github.com/cortexlabs/cortex/pull/1344 https://github.com/cortexlabs/cortex/issues/1343 (RobertLucian)

Misc

Update sources of AWS credentials for cortex cluster * commands, and improve transparency (docs) https://github.com/cortexlabs/cortex/pull/1378 https://github.com/cortexlabs/cortex/issues/1229 (vishalbollu)

Rename cluster api_gateway config values to match API config https://github.com/cortexlabs/cortex/pull/1335 https://github.com/cortexlabs/cortex/issues/1334 (deliahu)

Set the default value for networking.api_gateway in the API configuration to none if API Gateway is disabled cluster-wide https://github.com/cortexlabs/cortex/pull/1337 https://github.com/cortexlabs/cortex/issues/1336 (deliahu)

Support c6g and r6g instances https://github.com/cortexlabs/cortex/pull/1332 https://github.com/cortexlabs/cortex/issues/809 (deliahu)

Display autoscaling group activity history when cortex cluster up fails https://github.com/cortexlabs/cortex/pull/1342 https://github.com/cortexlabs/cortex/issues/1340 (deliahu)

Print debug info if cortex cluster up times out https://github.com/cortexlabs/cortex/pull/1396 (deliahu)

Add Inferentia compute statistics to cortex cluster info command https://github.com/cortexlabs/cortex/pull/1354 https://github.com/cortexlabs/cortex/issues/1304 (RobertLucian)

Disable prompts in get-cli.sh if not running interactively https://github.com/cortexlabs/cortex/pull/1372 https://github.com/cortexlabs/cortex/issues/1371 (deliahu)

Update cortex help output https://github.com/cortexlabs/cortex/pull/1398 (deliahu)

v0.19.0(Aug 25, 2020)
New features

Support batch APIs (docs) https://github.com/cortexlabs/cortex/pull/1203 https://github.com/cortexlabs/cortex/issues/523 (vishalbollu)

Support traffic splitting (enables A/B testing, multi-armed bandit, etc.) (docs) https://github.com/cortexlabs/cortex/pull/1213 https://github.com/cortexlabs/cortex/pull/1270 https://github.com/cortexlabs/cortex/issues/1132 https://github.com/cortexlabs/cortex/issues/275 https://github.com/cortexlabs/cortex/issues/1089 (tthebst)

Support server-side request batching for the TensorFlow Predictor (docs) https://github.com/cortexlabs/cortex/pull/1193 https://github.com/cortexlabs/cortex/issues/1060 (RobertLucian)

Add post_predict() method to Predictor interface (runs after the response has been sent) (docs) https://github.com/cortexlabs/cortex/pull/1237 https://github.com/cortexlabs/cortex/issues/954 (RobertLucian)

Support disabling API Gateway cluster-wide (docs) https://github.com/cortexlabs/cortex/pull/1259 https://github.com/cortexlabs/cortex/issues/1198 (deliahu)

Support different CUDA versions for the slim Python Predictor image (docs) https://github.com/cortexlabs/cortex/pull/1263 https://github.com/cortexlabs/cortex/issues/923 https://github.com/cortexlabs/cortex/issues/1254 (RobertLucian)

Add additional widgets to the CloudWatch Dashboard (avg in-flight requests per replica, active replicas) (docs) https://github.com/cortexlabs/cortex/pull/1181 (RobertLucian)
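The idea behind the traffic splitting feature listed above is weighted routing: each incoming request is sent to one of several APIs with a probability proportional to its weight. The sketch below simulates that idea only; the API names, the weights, and the pick_api helper are hypothetical, and Cortex's own routing may be implemented differently.

```python
import random


def pick_api(weights, rng=random):
    """Choose an API name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical A/B split: 90% of requests to the current model, 10% to a candidate.
split = {"text-generator-a": 90, "text-generator-b": 10}

rng = random.Random(0)  # seeded so the simulation is repeatable
counts = {name: 0 for name in split}
for _ in range(10_000):
    counts[pick_api(split, rng)] += 1

print(counts)
```

Over many requests the observed share of each API approaches its configured weight, which is what makes the split usable for A/B tests.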

Breaking changes

kind is now a required top-level field for all API configurations. Existing APIs should add kind: RealtimeAPI. This release adds support for kind: BatchAPI and kind: TrafficSplitter.
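For configurations that are loaded into Python before deployment, the new required field above can be filled in with a short helper. This is a sketch under the assumption that an API configuration file parses to a list of dictionaries; the function name is hypothetical.

```python
import copy


def add_default_kind(apis, default_kind="RealtimeAPI"):
    """Return a copy of the API configs with kind set where it is missing."""
    updated = copy.deepcopy(apis)  # do not modify the caller's list
    for api in updated:
        api.setdefault("kind", default_kind)  # keep any kind already present
    return updated


apis = [
    {"name": "text-generator", "predictor": {"type": "python", "path": "predictor.py"}},
    {"name": "image-classifier", "kind": "BatchAPI"},
]
migrated = add_default_kind(apis)
print([api["kind"] for api in migrated])
```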

Bug fixes

Fix python_path config field https://github.com/cortexlabs/cortex/pull/1202 (deliahu)

Fix local TensorFlow deploy from parent directory https://github.com/cortexlabs/cortex/pull/1274 (deliahu)

Improve error response for invalid payloads https://github.com/cortexlabs/cortex/pull/1212 https://github.com/cortexlabs/cortex/issues/1208 (RobertLucian)

New examples

onnx/yolov5-youtube https://github.com/cortexlabs/cortex/pull/1201 (dsuess)

Update PyTorch text generator example to use Hugging Face transformers GPT-2 model https://github.com/cortexlabs/cortex/pull/1177 (ospillinger)

Docs

Update tutorial to use the pytorch text-generator example https://github.com/cortexlabs/cortex/pull/1278 https://github.com/cortexlabs/cortex/issues/1256 (deliahu)

Improve instructions for updating cluster without downtime https://github.com/cortexlabs/cortex/pull/1261 (deliahu)

Mention API Gateway timeout in 404/503 API responses guide https://github.com/cortexlabs/cortex/pull/1264 https://github.com/cortexlabs/cortex/issues/1225 (deliahu)

Misc

Set tags on log groups https://github.com/cortexlabs/cortex/pull/1164 https://github.com/cortexlabs/cortex/issues/1078 (tthebst)

Display API metrics in the CLI by API ID (rather than by API name) https://github.com/cortexlabs/cortex/pull/1216 (vishalbollu)

Fix recursive error message for deploy/delete CLI commands https://github.com/cortexlabs/cortex/pull/1247 https://github.com/cortexlabs/cortex/issues/1218 (RobertLucian)

Add shell completion to .zshrc file during CLI installation https://github.com/cortexlabs/cortex/pull/1265 https://github.com/cortexlabs/cortex/issues/1221 (deliahu)

Handle OOM error when project files are too large https://github.com/cortexlabs/cortex/pull/1217 (RobertLucian)

Display image pull errors https://github.com/cortexlabs/cortex/pull/1167 https://github.com/cortexlabs/cortex/issues/955 (deliahu)

Display local Docker image pull error when out of space https://github.com/cortexlabs/cortex/pull/1238 https://github.com/cortexlabs/cortex/issues/1236 (zouyee)

v0.18.1(Jun 30, 2020)
Bug fixes

Fix dynamic axes for ONNX models https://github.com/cortexlabs/cortex/pull/1187 https://github.com/cortexlabs/cortex/issues/1186 (RobertLucian)

Fix memory node capacity calculation for multi-api configuration files https://github.com/cortexlabs/cortex/pull/1185 (deliahu)

Check cluster-name tag when choosing load balancer for VPC Link integration https://github.com/cortexlabs/cortex/pull/1173 (deliahu)

New guides

Troubleshooting: API request errors (deliahu)

Troubleshooting: TensorFlow session in predict() (RobertLucian)

Misc

Delete API Gateway if cluster up fails https://github.com/cortexlabs/cortex/pull/1172 (deliahu)

Move image version verification from serve.py to run.sh https://github.com/cortexlabs/cortex/pull/1180 https://github.com/cortexlabs/cortex/pull/1183 (vishalbollu)

Add retries for resource tagging during cluster up https://github.com/cortexlabs/cortex/pull/1188 (deliahu)

Use info log level when TensorFlow model is being loaded https://github.com/cortexlabs/cortex/pull/1171 (RobertLucian)

Increase max number of processes per API replica to 100 https://github.com/cortexlabs/cortex/pull/1166 (RobertLucian)

Allow empty cluster config https://github.com/cortexlabs/cortex/pull/1179 (deliahu)


