Python client for Apache Kafka

Last update: Jan 08, 2023

Related tags

Overview

Kafka Python client

https://coveralls.io/repos/dpkp/kafka-python/badge.svg?branch=master&service=github

https://travis-ci.org/dpkp/kafka-python.svg?branch=master

Python client for the Apache Kafka distributed stream processing system. kafka-python is designed to function much like the official java client, with a sprinkling of pythonic interfaces (e.g., consumer iterators).

kafka-python is best used with newer brokers (0.9+), but is backwards-compatible with older versions (to 0.8.0). Some features will only be enabled on newer brokers. For example, fully coordinated consumer groups -- i.e., dynamic partition assignment to multiple consumers in the same group -- requires use of 0.9+ kafka brokers. Supporting this feature for earlier broker releases would require writing and maintaining custom leadership election and membership / health check code (perhaps using zookeeper or consul). For older brokers, you can achieve something similar by manually assigning different partitions to each consumer instance with config management tools like chef, ansible, etc. This approach will work fine, though it does not support rebalancing on failures. See <https://kafka-python.readthedocs.io/en/master/compatibility.html> for more details.

Please note that the master branch may contain unreleased features. For release documentation, please see readthedocs and/or python's inline help.

>>> pip install kafka-python

KafkaConsumer

KafkaConsumer is a high-level message consumer, intended to operate as similarly as possible to the official java client. Full support for coordinated consumer groups requires use of kafka brokers that support the Group APIs: kafka v0.9+.

See <https://kafka-python.readthedocs.io/en/master/apidoc/KafkaConsumer.html> for API and configuration details.

The consumer iterator returns ConsumerRecords, which are simple namedtuples that expose basic message attributes: topic, partition, offset, key, and value:

>>> from kafka import KafkaConsumer
>>> consumer = KafkaConsumer('my_favorite_topic')
>>> for msg in consumer:
...     print (msg)

>>> # join a consumer group for dynamic partition assignment and offset commits
>>> from kafka import KafkaConsumer
>>> consumer = KafkaConsumer('my_favorite_topic', group_id='my_favorite_group')
>>> for msg in consumer:
...     print (msg)

>>> # manually assign the partition list for the consumer
>>> from kafka import TopicPartition
>>> consumer = KafkaConsumer(bootstrap_servers='localhost:1234')
>>> consumer.assign([TopicPartition('foobar', 2)])
>>> msg = next(consumer)

>>> # Deserialize msgpack-encoded values
>>> consumer = KafkaConsumer(value_deserializer=msgpack.loads)
>>> consumer.subscribe(['msgpackfoo'])
>>> for msg in consumer:
...     assert isinstance(msg.value, dict)

>>> # Access record headers. The returned value is a list of tuples
>>> # with str, bytes for key and value
>>> for msg in consumer:
...     print (msg.headers)

>>> # Get consumer metrics
>>> metrics = consumer.metrics()

KafkaProducer

KafkaProducer is a high-level, asynchronous message producer. The class is intended to operate as similarly as possible to the official java client. See <https://kafka-python.readthedocs.io/en/master/apidoc/KafkaProducer.html> for more details.

>>> from kafka import KafkaProducer
>>> producer = KafkaProducer(bootstrap_servers='localhost:1234')
>>> for _ in range(100):
...     producer.send('foobar', b'some_message_bytes')

>>> # Block until a single message is sent (or timeout)
>>> future = producer.send('foobar', b'another_message')
>>> result = future.get(timeout=60)

>>> # Block until all pending messages are at least put on the network
>>> # NOTE: This does not guarantee delivery or success! It is really
>>> # only useful if you configure internal batching using linger_ms
>>> producer.flush()

>>> # Use a key for hashed-partitioning
>>> producer.send('foobar', key=b'foo', value=b'bar')

>>> # Serialize json messages
>>> import json
>>> producer = KafkaProducer(value_serializer=lambda v: json.dumps(v).encode('utf-8'))
>>> producer.send('fizzbuzz', {'foo': 'bar'})

>>> # Serialize string keys
>>> producer = KafkaProducer(key_serializer=str.encode)
>>> producer.send('flipflap', key='ping', value=b'1234')

>>> # Compress messages
>>> producer = KafkaProducer(compression_type='gzip')
>>> for i in range(1000):
...     producer.send('foobar', b'msg %d' % i)

>>> # Include record headers. The format is list of tuples with string key
>>> # and bytes value.
>>> producer.send('foobar', value=b'c29tZSB2YWx1ZQ==', headers=[('content-encoding', b'base64')])

>>> # Get producer performance metrics
>>> metrics = producer.metrics()

Thread safety

The KafkaProducer can be used across threads without issue, unlike the KafkaConsumer which cannot.

While it is possible to use the KafkaConsumer in a thread-local manner, multiprocessing is recommended.

Compression

kafka-python supports the following compression formats:

gzip
LZ4
Snappy
Zstandard (zstd)

gzip is supported natively, the others require installing additional libraries. See <https://kafka-python.readthedocs.io/en/master/install.html> for more information.

Optimized CRC32 Validation

Kafka uses CRC32 checksums to validate messages. kafka-python includes a pure python implementation for compatibility. To improve performance for high-throughput applications, kafka-python will use crc32c for optimized native code if installed. See <https://kafka-python.readthedocs.io/en/master/install.html> for installation instructions. See https://pypi.org/project/crc32c/ for details on the underlying crc32c lib.

Protocol

A secondary goal of kafka-python is to provide an easy-to-use protocol layer for interacting with kafka brokers via the python repl. This is useful for testing, probing, and general experimentation. The protocol support is leveraged to enable a KafkaClient.check_version() method that probes a kafka broker and attempts to identify which version it is running (0.8.0 to 2.6+).

Comments

Heartbeat failed for group xxxWorker because it is rebalancing
Hi all I'm facing this problem that is driving to me crazy with 1.4.1 version of Kafka python the instruction that i perform are:

create a consumer in this way KafkaConsumer(bootstrap_servers=kafka_multi_hosts, auto_offset_reset=earliest, enable_auto_commit=False, group_id=group_name, reconnect_backoff_ms=1, consumer_timeout_ms=5000)

subscribe on a topics

finally the consumer.poll(500)

no problem till now but then in the log i see:

[INFO] 03/08/2018 02:52:53 PM Subscribe executed. [INFO] 03/08/2018 02:52:53 PM Initialization pool executed. [INFO] 03/08/2018 02:52:53 PM Subscribed to topic: event [INFO] 03/08/2018 02:52:53 PM eventHandle connected [WARNING] 03/08/2018 02:53:23 PM Heartbeat failed for group emsWorker because it is rebalancing [WARNING] 03/08/2018 02:53:26 PM Heartbeat failed for group emsWorker because it is rebalancing [WARNING] 03/08/2018 02:53:29 PM Heartbeat failed for group emsWorker because it is rebalancing [WARNING] 03/08/2018 02:53:32 PM Heartbeat failed for group emsWorker because it is rebalancing ...... ...... ...... [WARNING] 03/08/2018 02:57:48 PM Heartbeat failed for group emsWorker because it is rebalancing [WARNING] 03/08/2018 02:57:51 PM Heartbeat failed for group emsWorker because it is rebalancing [INFO] 03/08/2018 02:57:53 PM Leaving consumer group (group_name).

why?

I've also added the option max_poll_records=50 in the kafkaConsumer definition but nothing is changed

@dpkp can you help me? do you know if the 1.4.1 version presents some problem about that? Because in the previous version I can not see this problem.

thanks in advance
consumer
opened by ealfatt 54

kafka.common.KafkaTimeoutError: ('Failed to update metadata after %s secs.', 60.0)

kafka version: 0.8.2.0-1.kafka1.3.2.p0.15 (cloudera released)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/archimonde/lib/python2.6/site-packages/kafka/producer/kafka.py", line 357, in send
    self._wait_on_metadata(topic, self.config['max_block_ms'] / 1000.0)
  File "/opt/archimonde/lib/python2.6/site-packages/kafka/producer/kafka.py", line 465, in _wait_on_metadata
    "Failed to update metadata after %s secs.", max_wait)
kafka.common.KafkaTimeoutError: ('Failed to update metadata after %s secs.', 60.0)

But it's ok on 2.0.0-1.kafka2.0.0.p0.12.

bug critical/stability

opened by archiechen 50

KafkaProducer produces corrupt "double-compressed" messages on retry when compression is enabled. KafkaConsumer gets "stuck" consuming them
This is an interesting one.

In all of our topics every day a handful of partitions get "stuck". Basically the reading of the partition stops at a given message and kafka-python reports that there are no more messages in the given partition (just like it would have consumed all messages), while there are unconsumed messages. The only way to get the consumers moving again is to manually seek the offset forward by stepping over the "stuck" messages and then works again for a few million records and then get stuck again at some later offset.

I have multiple consumers consuming from the same topic and they all get stuck at the same messages of the same topics. Random number of partitions are affected day-to-day.

We are using Kafka broker version 0.9.0.1, kafka-python 1.2.1 (had the same issue with 1.1.1).

The consumer code is very simple (the below code is trying to read only partition #1, which is currently "stuck"):

kafka_consumer = KafkaConsumer( group_id=kafka_group_id, bootstrap_servers=kafka_servers, enable_auto_commit=False, consumer_timeout_ms=10000, fetch_max_wait_ms=10*1000, request_timeout_ms=10*1000 ) topics = [TopicPartition(topic, 1)] kafka_consumer.assign(topics) for message in kafka_consumer: print(message) print("Completed")

The above code prints "Completed", but not the messages, while there is a 5M offset lag in partition 1, so there would be plenty of messages to read. After seeking the consumer offset forward the code works again until it doesn't get "stuck" again.
bug consumer producer
opened by zoltan-fedor 41
Integration tests migration to pytest framework

In preparation for the changes I'll need to make to the integration tests to cover the GSSAPI authentication changes in PR #1283, I had to make a few changes to the existing test cases, which were written using unittest.TestCase.

After some discussions with @dpkp and @jeffwidman I decided to first move those test cases to pytest, which is the preferred test framework for the project.

opened by asdaraujo 33

kafka.errors.NoBrokersAvailable exception when running producer example on Mac

Running single-node Kafka cluster on localhost on Mac (OS X 10.11.6) Getting error on attempt to instantiate producer

>>> from kafka import KafkaProducer
>>> producer = KafkaProducer(bootstrap_servers=['localhost:9092'])

getting error

  File "<stdin>", line 1, in <module>
  File "/Users/user1/anaconda/envs/myenv/lib/python2.7/site-packages/kafka/producer/kafka.py", line 347, in __init__
    **self.config)
  File "/Users/user1/anaconda/envs/myenv/lib/python2.7/site-packages/kafka/client_async.py", line 221, in __init__
    self.config['api_version'] = self.check_version(timeout=check_timeout)
  File "/Users/user1/anaconda/envs/myenv/lib/python2.7/site-packages/kafka/client_async.py", line 826, in check_version
    raise Errors.NoBrokersAvailable()
kafka.errors.NoBrokersAvailable: NoBrokersAvailable

Kafka is up, and running locally and producer from confluent-kafka-python works without issues. Any suggestions what to look for?

server.properties:
. . . 
listeners=PLAINTEXT://localhost:9092
. . .

opened by vkroz 31

OffsetCommit failed for group XXX due to group error

while running producer send test with key, the consumer output some error message:

[2016-02-29 17:20:56,243] [ERROR] [Thread-2] OffsetCommit failed for group XXX due to group error (IllegalGenerationError - 22 - Returned from group membership requests (such as heartbeats) when the generation id provided in the request is not the current generation.), will rejoin
[2016-02-29 17:20:56,787] [ERROR] [Thread-2] OffsetCommit failed for group XXX due to group error (IllegalGenerationError - 22 - Returned from group membership requests (such as heartbeats) when the generation id provided in the request is not the current generation.), will rejoin

the consumer can still work

I mentioned there is a similar issue #498, and it's fixed. after upgraded from 1.0 to 1.01 and also to 1.02-dev, my test script still output the error message.

so, what's this message mean? does it make some wrong?

opened by abc100m 31

KIP-62 / KAFKA-3888: Allow consumer to send heartbeats from a background thread

Allows consumers to take their time processing messages without being timed out from their consumer group.

max_poll_records is a decent workaround for most of the pain here, but it'd still be nice to add this for consumers that have inconsistent message processing times.

Related issues: #872, #544

opened by jeffwidman 27
Fixing copy() methods to work with gevent library

This fixes an exception where calling copy() on a KafkaClient object after monkey patching with gevent. So code like this:

import gevent.monkey; gevent.monkey.patch_all() import kafka

c=kafka.KafkaClient(['localhost:9092']) c.copy()

opened by hmahmood 27
Help: NoBrokersAvailable

hi in kafka_2.11-0.10.1.0 version use kafka-python to connect error

use bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning is correct my python code: from kafka import KafkaConsumer consumer = KafkaConsumer('test',bootstrap_servers=['localhost:9092']) for message in consumer: print ("%s:%d:%d: key=%s value=%s" % (message.topic, message.partition, message.offset, message.key, message.value)) error message: Traceback (most recent call last): File "pkafka.py", line 5, in bootstrap_servers=['localhost:9092']) File "/usr/lib/python2.7/site-packages/kafka_python-1.3.2.dev0-py2.7.egg/kafka/consumer/group.py", line 287, in init self._client = KafkaClient(metrics=self._metrics, **self.config) File "/usr/lib/python2.7/site-packages/kafka_python-1.3.2.dev0-py2.7.egg/kafka/client_async.py", line 204, in init self.config['api_version'] = self.check_version(timeout=check_timeout) File "/usr/lib/python2.7/site-packages/kafka_python-1.3.2.dev0-py2.7.egg/kafka/client_async.py", line 813, in check_version raise Errors.NoBrokersAvailable() kafka.errors.NoBrokersAvailable: NoBrokersAvailable

opened by wawava 26
High CPU usage in KafkaConsumer.poll() when subscribed to many topics with no new messages (possibly SSL related)
Experiencing high CPU usage when sitting idle in poll() (i.e., waiting for a timeout when there are no new messages on the broker). Gets worse the more topics I am subscribed to (I have cpu pegged at 100 with 40 topics). Note that I am using 1.3.4 with mostly default configs, and repro'd also in the curret master.

Seems to be a couple things at play here. One is that poll() will do fetch requests in a tight loop. The other, the one that really seems to be killing cpu, is that when a fetch response is received, the low level poll() will get in a relatively tight loop as the payload buffer fills, adding a relatively small number of bytes at a time. This explains the effect of adding more topics: the fetch responses are bigger so it more time in this tight loop. Here's some debug output based on a couple probes I put in the code:

In conn.py: _recv()

if staged_bytes != self._next_payload_bytes: print("staged: {} payload: {}".format(staged_bytes, self._next_payload_bytes)) return None

In consumer/group.py: _poll_once()

print("fetch!") # Send any new fetches (won't resend pending fetches) self._fetcher.send_fetches()

So, for one topic I get output like this while blocked in poll():

fetch! staged: 4 payload: 104 fetch! staged: 12 payload: 104 fetch! staged: 50 payload: 104 fetch! staged: 68 payload: 104 fetch! staged: 86 payload: 104 fetch! fetch! staged: 4 payload: 104

For 2 topics:

fetch! staged: 4 payload: 179 fetch! staged: 12 payload: 179 fetch! staged: 51 payload: 179 fetch! staged: 69 payload: 179 fetch! staged: 87 payload: 179 fetch! staged: 105 payload: 179 fetch! staged: 143 payload: 179 fetch! staged: 161 payload: 179 fetch! fetch! staged: 4 payload: 197 fetch!

For 40 topics:

fetch! staged: 2867 payload: 3835 fetch! staged: 2885 payload: 3835 fetch! staged: 2939 payload: 3835 fetch! staged: 2957 payload: 3835 fetch! staged: 2975 payload: 3835 staged: 4 payload: 3799 fetch! staged: 12 payload: 3799 fetch! staged: 58 payload: 3799 fetch! staged: 76 payload: 3799 fetch! staged: 94 payload: 3799 fetch! staged: 112 payload: 3799 fetch! staged: 154 payload: 3799 fetch! ... and many mnay more

so it gets stuck spinning in this, and cpu goes to 100.

I tried mitigating this using consumer fetch config:

fetch_min_bytes=1000000, fetch_max_wait_ms=2000,

but that did nothing.

The only thing that gets the cpu down is to to a non-blocking poll() instead of using a timeout, and then doing a short sleep when there are no result records (my application can tolerate that latency). It looks like poll used to support something like this, i.e., there was a sleep parameter that caused a sleep for the remainder of the timeout period if there were no records on first fetch. Looks like that was removed in 237bd73, not sure why.

So... like I said I can workaround the continuous fetching with my own sleep. Would be good to understand the real problem which is the tight _recv() loop, and whether anything can be done about it.
opened by rmechler 25
Add v2 record batch format for FetchResponse V4 support.
Changed layout of record parsing to allow V2 integration with batch concept. For now I only moved the abstract implementation and wrapped LegacyBatch using existing MessageSet and Message constructs. What I want to do next is:

[x] Add implementation for v2 records. A bit dirty implementation is already done here

[x] Add builder class implementation and fix Producer to use that.

[x] Refactor LegacyRecordBatch to not use Message and MessageSet constructs (not needed overhead)

@dpkp What do you think, it's quite a hard change, but I did not find a good way to abstract Batches using existing constructs. Moreover this is based on current Java client version, thou with not as many classes.
opened by tvoinarovskyi 25
I use this library to connect to Kafka in a scenario with a large amount of data. How can I know whether the current performance is sufficient?

sorry my English is not good

I use this library. I want to count the number of topics in an hour. But I don't know whether I will block when running the program because of the large amount of data, so I can't count all the times of topic when I stop the task. Is there any way to know whether the current performance is sufficient?

Do I need to start multiple KafkaConsumers with multiple processes？

opened by kslz 0
Support for MSK & IAM integration for producer and consumer.

Hello,

I am facing situation where I have to listen to MSK using python with IAM integration - as mentioned in the issue (https://github.com/dpkp/kafka-python/issues/2232). However, I didn't find any solution and it looks like kafka-python has not yet integrated this scenario.

Please suggest if there is any way I can achieve it with Python -> MSK -> IAM role

Thanks, AK

opened by ash10don 0

topic_errors elements are not subscriptable by field name in CreateTopicResponse

There is a schema of a response in a CreateTopicsResponse

SCHEMA = Schema(
        ('throttle_time_ms', Int32),
        ('topic_errors', Array(
            ('topic', String('utf-8')),
            ('error_code', Int16),
            ('error_message', String('utf-8'))))
    )

however, attempt to get value of a array element and get its' value by field name fails. Example:

topics = [NewTopic("test_topic", 1, 1)]
response = client.create_topics(new_topics=topics, validate_only=False)
print(response.topic_errors[0].error_code)
# fails with AttributeError: 'tuple' object has no attribute 'error_code'

Is it expected behaviour (schemas intended only for pretty-printing response, field names are shown when printing the whole response object) or bug?

opened by MikhailGolubtsov94 0

`bootstrap_connected` is returning false even though connection is still there

I was trying to write a module to reconnect to Kafka after connection failure and i was using this kafka_producer.bootstrap_connected() to check if the connection is still there or not. It is returning faulty value.

Libraries used Python: 3.9.15 kafka-python: 2.0.2 Kafka: confluentinc/cp-kafka:7.0.1 Zookeeper: confluentinc/cp-zookeeper:7.0.1

Sample code

from kafka import KafkaProducer
import json
producer =  KafkaProducer(bootstrap_servers=["localhost:9092"], value_serializer=lambda payload: payload.encode("utf-8"), key_serializer=lambda key: key.encode("utf-8"))
producer.bootstrap_connected() # returns True
producer.send(topic="test-topic", value='{"x": 1, "y": 2}',key="") # working
producer.bootstrap_connected() # returns False??
producer.flush()
producer.bootstrap_connected() # returns False
producer.send(topic="test-topic", value='{"key":"value"}',key="") # this is working although it just said bootstrap is not connected
producer.bootstrap_connected()
producer.flush()

Sample Screenshot

IPython

Broker

opened by prdpx7 0

multiple consumer in one process will cause poll() block?

when multiple consumer in one process, each consumer invoke poll() will block server minutes.

consumer1 = KafkaConsumer(topic1, client_id=topic1, bootstrap_servers=KAFKA_SERVER, consumer_timeout_ms=1000, group_id="group_test")
consumer2 = KafkaConsumer(topic2, client_id=topic2, bootstrap_servers=KAFKA_SERVER, consumer_timeout_ms=1000, group_id="group_test")
consumer1.poll() # this is ok
consumer2.poll() # this will block server minutes

the environment and config is one zk, one broker, one partition per topic

can any one help? Thanks

opened by jixushui 0

Releases(2.0.2)

2.0.2(Sep 30, 2020)
2.0.2 (Sep 29, 2020)

Consumer

KIP-54: Implement sticky partition assignment strategy (aynroot / PR #2057)

Fix consumer deadlock when heartbeat thread request timeout (huangcuiyang / PR #2064)

Compatibility

Python 3.8 support (Photonios / PR #2088)

Cleanups

Bump dev requirements (jeffwidman / PR #2129)

Fix crc32c deprecation warning (crc32c==2.1) (jeffwidman / PR #2128)

Lint cleanup (jeffwidman / PR #2126)

Fix initialization order in KafkaClient (pecalleja / PR #2119)

Allow installing crc32c via extras (mishas / PR #2069)

Remove unused imports (jameslamb / PR #2046)

Admin Client

Merge _find_coordinator_id methods (jeffwidman / PR #2127)

Feature: delete consumergroups (swenzel / PR #2040)

Allow configurable timeouts in admin client check version (sunnyakaxd / PR #2107)

Enhancement for Kafka Admin Client's "Describe Consumer Group" (Apurva007 / PR #2035)

Protocol

Add support for zstd compression (gabriel-tincu / PR #2021)

Add protocol support for brokers 1.1.0 - 2.5.0 (gabriel-tincu / PR #2038)

Add ProduceRequest/ProduceResponse v6/v7/v8 (gabriel-tincu / PR #2020)

Fix parsing NULL header values (kvfi / PR #2024)

Tests

Add 2.5.0 to automated CI tests (gabriel-tincu / PR #2038)

Add 2.1.1 to build_integration (gabriel-tincu / PR #2019)

Documentation / Logging / Errors

Disable logging during producer object gc (gioele / PR #2043)

Update example.py; use threading instead of multiprocessing (Mostafa-Elmenbawy / PR #2081)

Fix typo in exception message (haracejacob / PR #2096)

Add kafka.structs docstrings (Mostafa-Elmenbawy / PR #2080)

Fix broken compatibility page link (anuragrana / PR #2045)

Rename README to README.md (qhzxc0015 / PR #2055)

Fix docs by adding SASL mention (jeffwidman / #1990)

Source code(tar.gz)
Source code(zip)
2.0.1(Sep 30, 2020)
2.0.1 (Feb 19, 2020)

Admin Client

KAFKA-8962: Use least_loaded_node() for AdminClient.describe_topics() (jeffwidman / PR #2000)

Fix AdminClient topic error parsing in MetadataResponse (jtribble / PR #1997)

Source code(tar.gz)
Source code(zip)
2.0.0(Feb 11, 2020)
2.0.0 (Feb 10, 2020)

This release includes breaking changes for any application code that has not migrated from older Simple-style classes to newer Kafka-style classes.

Deprecation

Remove deprecated SimpleClient, Producer, Consumer, Unittest (jeffwidman / PR #1196)

Admin Client

Use the controller for topic metadata requests (TylerLubeck / PR #1995)

Implement list_topics, describe_topics, and describe_cluster (TylerLubeck / PR #1993)

Implement eq and hash for ACL objects (TylerLubeck / PR #1955)

Fixes KafkaAdminClient returning IncompatibleBrokerVersion when passing an api_version (ian28223 / PR #1953)

Admin protocol updates (TylerLubeck / PR #1948)

Fix describe config for multi-broker clusters (jlandersen / PR #1869)

Miscellaneous Bugfixes / Improvements

Enable SCRAM-SHA-256 and SCRAM-SHA-512 for sasl (swenzel / PR #1918)

Fix slots usage and use more slots (carsonip / PR #1987)

Optionally return OffsetAndMetadata from consumer.committed(tp) (dpkp / PR #1979)

Reset conn configs on exception in conn.check_version() (dpkp / PR #1977)

Do not block on sender thread join after timeout in producer.close() (dpkp / PR #1974)

Implement methods to convert a Struct object to a pythonic object (TylerLubeck / PR #1951)

Test Infrastructure / Documentation / Maintenance

Update 2.4.0 resource files for sasl integration (dpkp)

Add kafka 2.4.0 to CI testing (vvuibert / PR #1972)

convert test_admin_integration to pytest (ulrikjohansson / PR #1923)

xfail test_describe_configs_topic_resource_returns_configs (dpkp / Issue #1929)

Add crc32c to README and docs (dpkp)

Improve docs for reconnect_backoff_max_ms (dpkp / PR #1976)

Fix simple typo: managementment -> management (timgates42 / PR #1966)

Fix typos (carsonip / PR #1938)

Fix doc import paths (jeffwidman / PR #1933)

Update docstring to match conn.py's (dabcoder / PR #1921)

Do not log topic-specific errors in full metadata fetch (dpkp / PR #1980)

Raise AssertionError if consumer closed in poll() (dpkp / PR #1978)

Log retriable coordinator NodeNotReady, TooManyInFlightRequests as debug not error (dpkp / PR #1975)

Remove unused import (jeffwidman)

Remove some dead code (jeffwidman)

Fix a benchmark to Use print() function in both Python 2 and Python 3 (cclauss / PR #1983)

Fix a test to use ==/!= to compare str, bytes, and int literals (cclauss / PR #1984)

Fix benchmarks to use pyperf (carsonip / PR #1986)

Remove unused/empty .gitsubmodules file (jeffwidman / PR #1928)

Remove deprecated ConnectionError (jeffwidman / PR #1816)

Source code(tar.gz)
Source code(zip)
1.4.7(Sep 30, 2019)
1.4.7 (Sep 30, 2019)

This release is focused on KafkaConsumer performance, Admin Client improvements, and Client concurrency. The KafkaConsumer iterator implementation has been greatly simplified so that it just wraps consumer.poll(). The prior implementation will remain available for a few more releases using the optional KafkaConsumer config: legacy_iterator=True . This is expected to improve consumer throughput substantially and help reduce heartbeat failures / group rebalancing.

Major thanks to @carsonip @Baisang @iv-m @davidheitman @cardy31 @ulrikjohansson @iAnomaly @Wayde2014 @ossdev07 @commanderdishwasher @justecorruptio @melor @rustyrothwurt @sachiin @jacky15 and @rikonen for submitting PRs; thanks as well to everyone that submitted bug reports and issues, and to @jeffwidman and @tvoinarovskyi for code reviews, comments, testing, debugging, and helping to maintain kafka-python!

Client

Send socket data via non-blocking IO with send buffer (@dpkp / PR #1912)

Rely on socket selector to detect completed connection attempts (@dpkp / PR #1909)

Improve connection lock handling; always use context manager (@melor @dpkp / PR #1895)

Reduce client poll timeout when there are no in-flight requests (@dpkp / PR #1823)

KafkaConsumer

Do not use wakeup when sending fetch requests from consumer (@dpkp / PR #1911)

Wrap consumer.poll() for KafkaConsumer iteration (@dpkp / PR #1902)

Allow the coordinator to auto-commit on old brokers (@justecorruptio / PR #1832)

Reduce internal client poll timeout for (legacy) consumer iterator interface (@dpkp / PR #1824)

Use dedicated connection for group coordinator (@dpkp / PR #1822)

Change coordinator lock acquisition order (@dpkp / PR #1821)

Make partitions_for_topic a read-through cache (@Baisang / PR #1781,#1809)

Fix consumer hanging indefinitely on topic deletion while rebalancing (@commanderdishwasher / PR #1782)

Miscellaneous Bugfixes / Improvements

Fix crc32c avilability on non-intel architectures (@ossdev07 / PR #1904)

Load system default SSL CAs if ssl_cafile is not provided (@iAnomaly / PR #1883)

Catch py3 TimeoutError in BrokerConnection send/recv (@dpkp / PR #1820)

Added a function to determine if bootstrap is successfully connected (@Wayde2014 / PR #1876)

Admin Client

Add ACL api support to KafkaAdminClient (@ulrikjohansson / PR #1833)

Add sasl_kerberos_domain_name config to KafkaAdminClient (@jeffwidman / PR #1852)

Update security_protocol config documentation for KafkaAdminClient (@cardy31 / PR #1849)

Break FindCoordinator into request/response methods in KafkaAdminClient (@jeffwidman / PR #1871)

Break consumer operations into request / response methods in KafkaAdminClient (@jeffwidman / PR #1845)

Parallelize calls to _send_request_to_node() in KafkaAdminClient (@davidheitman / PR #1807)

Test Infrastructure / Documentation / Maintenance

Add Kafka 2.3.0 to test matrix and compatibility docs (dpkp / PR #1915)

Convert remaining KafkaConsumer tests to pytest (@jeffwidman / PR #1886)

Bump integration tests to 0.10.2.2 and 0.11.0.3 (@jeffwidman / #1890)

Cleanup handling of KAFKA_VERSION env var in tests (@jeffwidman / PR #1887)

Minor test cleanup (@jeffwidman / PR #1885)

Use socket.SOCK_STREAM in test assertions (@iv-m / PR #1879)

Sanity test for consumer.topics() and consumer.partitions_for_topic() (@Baisang / PR #1829)

Cleanup seconds conversion in client poll timeout calculation (@jeffwidman / PR #1825)

Remove unused imports (@jeffwidman / PR #1808)

Cleanup python nits in RangePartitionAssignor (@jeffwidman / PR #1805)

Update links to kafka consumer config docs (@jeffwidman)

Fix minor documentation typos (@carsonip / PR #1865)

Remove unused/weird comment line (@jeffwidman / PR #1813)

Update docs for api_version_auto_timeout_ms (@jeffwidman / PR #1812)

Source code(tar.gz)
Source code(zip)
1.4.6(Apr 3, 2019)
1.4.6 (Apr 2, 2019)

This is a patch release primarily focused on bugs related to concurrency, SSL connections and testing, and SASL authentication. Major thanks to @pt2pham , @isamaru , @braedon , @gingercookiemage , for submitting PRs to help fix many of these issues. And major thanks to everyone that submitted bug reports and issues. And thanks always to @jeffwidman and @tvoinarovskyi for code reviews, comments, testing, debugging, and helping to maintain this project!

Client Concurrency Issues (Race Conditions / Deadlocks)

Fix race condition in protocol.send_bytes (isamaru / PR #1752)

Do not call state_change_callback with lock (dpkp / PR #1775)

Additional BrokerConnection locks to synchronize protocol/IFR state (dpkp / PR #1768)

Send pending requests before waiting for responses (dpkp / PR #1762)

Avoid race condition on client._conns in send() (dpkp / PR #1772)

Hold lock during client.check_version (dpkp / PR #1771)

Producer Wakeup / TimeoutError

Dont wakeup during maybe_refresh_metadata -- it is only called by poll() (dpkp / PR #1769)

Dont do client wakeup when sending from sender thread (dpkp / PR #1761)

SSL - Python3.7 Support / Bootstrap Hostname Verification / Testing

Wrap SSL sockets after connecting for python3.7 compatibility (dpkp / PR #1754)

Allow configuration of SSL Ciphers (dpkp / PR #1755)

Maintain shadow cluster metadata for bootstrapping (dpkp / PR #1753)

Generate SSL certificates for local testing (dpkp / PR #1756)

Rename ssl.keystore.location and ssl.truststore.location config files (dpkp)

Reset reconnect backoff on SSL connection (dpkp / PR #1777)

SASL - OAuthBearer support / api version bugfix

Fix 0.8.2 protocol quick detection / fix SASL version check (dpkp / PR #1763)

Update sasl configuration docstrings to include supported mechanisms (dpkp)

Support SASL OAuthBearer Authentication (pt2pham / PR #1750)

Miscellaneous Bugfixes

Dont force metadata refresh when closing unneeded bootstrap connections (dpkp / PR #1773)

Fix possible AttributeError during conn._close_socket (dpkp / PR #1776)

Return connection state explicitly after close in connect() (dpkp / PR #1778)

Fix flaky conn tests that use time.time (dpkp / PR #1758)

Add py to requirements-dev (dpkp)

Fixups to benchmark scripts for py3 / new KafkaFixture interface (dpkp)

Source code(tar.gz)
Source code(zip)
1.4.5(Mar 15, 2019)
1.4.5 (Mar 14, 2019)

This release is primarily focused on addressing lock contention and other coordination issues between the KafkaConsumer and the background heartbeat thread that was introduced in the 1.4 release.

Consumer

connections_max_idle_ms must be larger than request_timeout_ms (jeffwidman / PR #1688)

Avoid race condition during close() / join heartbeat thread (dpkp / PR #1735)

Use last offset from fetch v4 if available to avoid getting stuck in compacted topic (keithks / PR #1724)

Synchronize puts to KafkaConsumer protocol buffer during async sends (dpkp / PR #1733)

Improve KafkaConsumer join group / only enable Heartbeat Thread during stable group (dpkp / PR #1695)

Remove unused skip_double_compressed_messages (jeffwidman / PR #1677)

Fix commit_offsets_async() callback (Faqa / PR #1712)

Client

Retry bootstrapping after backoff when necessary (dpkp / PR #1736)

Recheck connecting nodes sooner when refreshing metadata (dpkp / PR #1737)

Avoid probing broker versions twice on newer brokers (dpkp / PR #1738)

Move all network connections and writes to KafkaClient.poll() (dpkp / PR #1729)

Do not require client lock for read-only operations (dpkp / PR #1730)

Timeout all unconnected conns (incl SSL) after request_timeout_ms (dpkp / PR #1696)

Admin Client

Fix AttributeError in response topic error codes checking (jeffwidman)

Fix response error checking in KafkaAdminClient send_to_controller (jeffwidman)

Fix NotControllerError check (jeffwidman)

Core/Protocol

Fix default protocol parser version / 0.8.2 version probe (dpkp / PR #1740)

Make NotEnoughReplicasError/NotEnoughReplicasAfterAppendError retriable (le-linh / PR #1722)

Bugfixes

Use copy() in metrics() to avoid thread safety issues (emeric254 / PR #1682)

Test Infrastructure

Mock dns lookups in test_conn (dpkp / PR #1739)

Use test.fixtures.version not test.conftest.version to avoid warnings (dpkp / PR #1731)

Fix test_legacy_correct_metadata_response on x86 arch (stanislavlevin / PR #1718)

Travis CI: 'sudo' tag is now deprecated in Travis (cclauss / PR #1698)

Use Popen.communicate() instead of Popen.wait() (Baisang / PR #1689)

Compatibility

Catch thrown OSError by python 3.7 when creating a connection (danjo133 / PR #1694)

Update travis test coverage: 2.7, 3.4, 3.7, pypy2.7 (jeffwidman, dpkp / PR #1614)

Drop dependency on sphinxcontrib-napoleon (stanislavlevin / PR #1715)

Remove unused import from kafka/producer/record_accumulator.py (jeffwidman / PR #1705)

Fix SSL connection testing in Python 3.7 (seanthegeek, silentben / PR #1669)

Source code(tar.gz)
Source code(zip)
1.4.4(Nov 23, 2018)
Bugfixes

(Attempt to) Fix deadlock between consumer and heartbeat (zhgjun / dpkp #1628)

Fix Metrics dict memory leak (kishorenc #1569)

Client

Support Kafka record headers (hnousiainen #1574)

Set socket timeout for the write-side of wake socketpair (Fleurer #1577)

Add kerberos domain name config for gssapi sasl mechanism handshake (the-sea #1542)

Support smaller topic metadata fetch during bootstrap (andyxning #1541)

Use TypeError for invalid timeout type (jeffwidman #1636)

Break poll if closed (dpkp)

Admin Client

Add KafkaAdminClient class (llamahunter #1540)

Fix list_consumer_groups() to query all brokers (jeffwidman #1635)

Stop using broker-errors for client-side problems (jeffwidman #1639)

Fix send to controller (jeffwidman #1640)

Add group coordinator lookup (jeffwidman #1641)

Fix describe_groups (jeffwidman #1642)

Add list_consumer_group_offsets() (jeffwidman #1643)

Remove support for api versions as strings from KafkaAdminClient (jeffwidman #1644)

Set a clear default value for validate_only/include_synonyms (jeffwidman #1645)

Bugfix: Always set this_groups_coordinator_id (jeffwidman #1650)

Consumer

Fix linter warning on import of ConsumerRebalanceListener (ben-harack #1591)

Remove ConsumerTimeout (emord #1587)

Return future from commit_offsets_async() (ekimekim #1560)

Core / Protocol

Add protocol structs for {Describe,Create,Delete} Acls (ulrikjohansson #1646/partial)

Pre-compile pack/unpack function calls (billyevans / jeffwidman #1619)

Don't use kafka.common internally (jeffwidman #1509)

Be explicit with tuples for %s formatting (jeffwidman #1634)

Documentation

Document connections_max_idle_ms (jeffwidman #1531)

Fix sphinx url (jeffwidman #1610)

Update remote urls: snappy, https, etc (jeffwidman #1603)

Minor cleanup of testing doc (jeffwidman #1613)

Various docstring / pep8 / code hygiene cleanups (jeffwidman #1647)

Test Infrastructure

Stop pinning pylint (jeffwidman #1611)

(partial) Migrate from Unittest to pytest (jeffwidman #1620)

Minor aesthetic cleanup of partitioner tests (jeffwidman #1618)

Cleanup fixture imports (jeffwidman #1616)

Fix typo in test file name (jeffwidman)

Remove unused ivy_root variable (jeffwidman)

Add test fixtures for kafka versions 1.0.2 -> 2.0.1 (dpkp)

Bump travis test for 1.x brokers to 1.1.1 (dpkp)

Logging / Error Messages

raising logging level on messages signalling data loss (sibiryakov #1553)

Stop using deprecated log.warn() (jeffwidman #1615)

Fix typo in logging message (jeffwidman)

Compatibility

Vendor enum34 (jeffwidman #1604)

Bump vendored six to 1.11.0 (jeffwidman #1602)

Vendor six consistently (jeffwidman #1605)

Prevent pylint import errors on six.moves (jeffwidman #1609)

Source code(tar.gz)
Source code(zip)
1.4.3(May 26, 2018)
Compatibility

Fix for python 3.7 support: remove 'async' keyword from SimpleProducer (dpkp #1454)

Client

Improve BrokerConnection initialization time (romulorosa #1475)

Ignore MetadataResponses with empty broker list (dpkp #1506)

Improve connection handling when bootstrap list is invalid (dpkp #1507)

Consumer

Check for immediate failure when looking up coordinator in heartbeat thread (dpkp #1457)

Core / Protocol

Always acquire client lock before coordinator lock to avoid deadlocks (dpkp #1464)

Added AlterConfigs and DescribeConfigs apis (StephenSorriaux #1472)

Fix CreatePartitionsRequest_v0 (StephenSorriaux #1469)

Add codec validators to record parser and builder for all formats (tvoinarovskyi #1447)

Fix MemoryRecord bugs re error handling and add test coverage (tvoinarovskyi #1448)

Force lz4 to disable Kafka-unsupported block linking when encoding (mnito #1476)

Stop shadowing ConnectionError (jeffwidman #1492)

Documentation

Document methods that return None (jeffwidman #1504)

Minor doc capitalization cleanup (jeffwidman)

Adds add_callback/add_errback example to docs (Berkodev #1441)

Fix KafkaConsumer docstring for request_timeout_ms default (dpkp #1459)

Test Infrastructure

Skip flakey SimpleProducer test (dpkp)

Fix skipped integration tests if KAFKA_VERSION unset (dpkp #1453)

Logging / Error Messages

Stop using deprecated log.warn() (jeffwidman)

Change levels for some heartbeat thread logging (dpkp #1456)

Log Heartbeat thread start / close for debugging (dpkp)

Source code(tar.gz)
Source code(zip)
1.4.2(Mar 11, 2018)
Bugfixes

Close leaked selector in version check (dpkp #1425)

Fix BrokerConnection.connection_delay() to return milliseconds (dpkp #1414)

Use local copies in Fetcher._fetchable_partitions to avoid mutation errors (dpkp #1400)

Fix error var name in _unpack (j2gg0s #1403)

Fix KafkaConsumer compacted offset handling (dpkp #1397)

Fix byte size estimation with kafka producer (blakeembrey #1393)

Fix coordinator timeout in consumer poll interface (braedon #1384)

Client

Add BrokerConnection.connect_blocking() to improve bootstrap to multi-address hostnames (dpkp #1411)

Short-circuit BrokerConnection.close() if already disconnected (dpkp #1424)

Only increase reconnect backoff if all addrinfos have been tried (dpkp #1423)

Make BrokerConnection .host / .port / .afi immutable to avoid incorrect 'metadata changed' checks (dpkp #1422)

Connect with sockaddrs to support non-zero ipv6 scope ids (dpkp #1433)

Check timeout type in KafkaClient constructor (asdaraujo #1293)

Update string representation of SimpleClient (asdaraujo #1293)

Do not validate api_version against known versions (dpkp #1434)

Consumer

Avoid tight poll loop in consumer when brokers are down (dpkp #1415)

Validate max_records in KafkaConsumer.poll (dpkp #1398)

KAFKA-5512: Awake heartbeat thread when it is time to poll (dpkp #1439)

Producer

Validate that serializers generate bytes-like (or None) data (dpkp #1420)

Core / Protocol

Support alternative lz4 package: lz4framed (everpcpc #1395)

Use hardware accelerated CRC32C function if available (tvoinarovskyi #1389)

Add Admin CreatePartitions API call (alexef #1386)

Test Infrastructure

Close KafkaConsumer instances during tests (dpkp #1410)

Introduce new fixtures to prepare for migration to pytest (asdaraujo #1293)

Removed pytest-catchlog dependency (asdaraujo #1380)

Fixes racing condition when message is sent to broker before topic logs are created (asdaraujo #1293)

Add kafka 1.0.1 release to test fixtures (dpkp #1437)

Logging / Error Messages

Re-enable logging during broker version check (dpkp #1430)

Connection logging cleanups (dpkp #1432)

Remove old CommitFailed error message from coordinator (dpkp #1436)

Source code(tar.gz)
Source code(zip)
1.4.1(Feb 9, 2018)
Bugfixes

Fix consumer poll stuck error when no available partition (ckyoog #1375)

Increase some integration test timeouts (dpkp #1374)

Use raw in case string overriden (jeffwidman #1373)

Fix pending completion IndexError bug caused by multiple threads (dpkp #1372)

Source code(tar.gz)
Source code(zip)
1.4.0(Feb 7, 2018)
This is a substantial release. Although there are no known 'showstopper' bugs as of release, we do recommend you test any planned upgrade to your application prior to running in production.

Some of the major changes include:

We have officially dropped python 2.6 support

The KafkaConsumer now includes a background thread to handle coordinator heartbeats

API protocol handling has been separated from networking code into a new class, KafkaProtocol

Added support for kafka message format v2

Refactored DNS lookups during kafka broker connections

SASL authentication is working (we think)

Removed several circular references to improve gc on close()

Thanks to all contributors -- the state of the kafka-python community is strong!

Detailed changelog are listed below:

Client

Fixes for SASL support

Refactor SASL/gssapi support (dpkp #1248 #1249 #1257 #1262 #1280)

Add security layer negotiation to the GSSAPI authentication (asdaraujo #1283)

Fix overriding sasl_kerberos_service_name in KafkaConsumer / KafkaProducer (natedogs911 #1264)

Fix typo in _try_authenticate_plain (everpcpc #1333)

Fix for Python 3 byte string handling in SASL auth (christophelec #1353)

Move callback processing from BrokerConnection to KafkaClient (dpkp #1258)

Use socket timeout of request_timeout_ms to prevent blocking forever on send (dpkp #1281)

Refactor dns lookup in BrokerConnection (dpkp #1312)

Read all available socket bytes (dpkp #1332)

Honor reconnect_backoff in conn.connect() (dpkp #1342)

Consumer

KAFKA-3977: Defer fetch parsing for space efficiency, and to raise exceptions to user (dpkp #1245)

KAFKA-4034: Avoid unnecessary consumer coordinator lookup (dpkp #1254)

Handle lookup_coordinator send failures (dpkp #1279)

KAFKA-3888 Use background thread to process consumer heartbeats (dpkp #1266)

Improve KafkaConsumer cleanup (dpkp #1339)

Fix coordinator join_future race condition (dpkp #1338)

Avoid KeyError when filtering fetchable partitions (dpkp #1344)

Name heartbeat thread with group_id; use backoff when polling (dpkp #1345)

KAFKA-3949: Avoid race condition when subscription changes during rebalance (dpkp #1364)

Fix #1239 regression to avoid consuming duplicate compressed messages from mid-batch (dpkp #1367)

Producer

Fix timestamp not passed to RecordMetadata (tvoinarovskyi #1273)

Raise non-API exceptions (jeffwidman #1316)

Fix reconnect_backoff_max_ms default config bug in KafkaProducer (YaoC #1352)

Core / Protocol

Add kafka.protocol.parser.KafkaProtocol w/ receive and send (dpkp #1230)

Refactor MessageSet and Message into LegacyRecordBatch to later support v2 message format (tvoinarovskyi #1252)

Add DefaultRecordBatch implementation aka V2 message format parser/builder. (tvoinarovskyi #1185)

optimize util.crc32 (ofek #1304)

Raise better struct pack/unpack errors (jeffwidman #1320)

Add Request/Response structs for kafka broker 1.0.0 (dpkp #1368)

Bugfixes

use python standard max value (lukekingbru #1303)

changed for to use enumerate() (TheAtomicOption #1301)

Explicitly check for None rather than falsey (jeffwidman #1269)

Minor Exception cleanup (jeffwidman #1317)

Use non-deprecated exception handling (jeffwidman a699f6a)

Remove assertion with side effect in client.wakeup() (bgedik #1348)

use absolute imports everywhere (kevinkjt2000 #1362)

Test Infrastructure

Use 0.11.0.2 kafka broker for integration testing (dpkp #1357 #1244)

Add a Makefile to help build the project, generate docs, and run tests (tvoinarovskyi #1247)

Add fixture support for 1.0.0 broker (dpkp #1275)

Add kafka 1.0.0 to travis integration tests (dpkp #1365)

Change fixture default host to localhost (asdaraujo #1305)

Minor test cleanups (dpkp #1343)

Use latest pytest 3.4.0, but drop pytest-sugar due to incompatibility (dpkp #1361)

Documentation

Expand metrics docs (jeffwidman #1243)

Fix docstring (jeffwidman #1261)

Added controlled thread shutdown to example.py (TheAtomicOption #1268)

Add license to wheel (jeffwidman #1286)

Use correct casing for MB (jeffwidman #1298)

Logging / Error Messages

Fix two bugs in printing bytes instance (jeffwidman #1296)

Source code(tar.gz)
Source code(zip)
1.3.5(Oct 8, 2017)
Bugfixes

Fix partition assignment race condition (jeffwidman #1240)

Fix consumer bug when seeking / resetting to the middle of a compressed messageset (dpkp #1239)

Fix traceback sent to stderr not logging (dbgasaway #1221)

Stop using mutable types for default arg values (jeffwidman #1213)

Remove a few unused imports (jameslamb #1188)

Client

Refactor BrokerConnection to use asynchronous receive_bytes pipe (dpkp #1032)

Consumer

Drop unused sleep kwarg to poll (dpkp #1177)

Enable KafkaConsumer beginning_offsets() and end_offsets() with older broker versions (buptljy #1200)

Validate consumer subscription topic strings (nikeee #1238)

Documentation

Small fixes to SASL documentation and logging; validate security_protocol (dpkp #1231)

Various typo and grammar fixes (jeffwidman)

Source code(tar.gz)
Source code(zip)
1.3.4(Aug 13, 2017)
Bugfixes

Avoid multiple connection attempts when refreshing metadata (dpkp #1067)

Catch socket.errors when sending / recving bytes on wake socketpair (dpkp #1069)

Deal with brokers that reappear with different IP address (originsmike #1085)

Fix join-time-max and sync-time-max metrics to use Max() measure function (billyevans #1146)

Raise AssertionError when decompression unsupported (bts-webber #1159)

Catch ssl.EOFErrors on Python3.3 so we close the failing conn (Ormod #1162)

Select on sockets to avoid busy polling during bootstrap (dpkp #1175)

Initialize metadata_snapshot in group coordinator to avoid unnecessary rebalance (dpkp #1174)

Client

Timeout idle connections via connections_max_idle_ms (dpkp #1068)

Warn, dont raise, on DNS lookup failures (dpkp #1091)

Support exponential backoff for broker reconnections -- KIP-144 (dpkp #1124)

Add gssapi support (Kerberos) for SASL (Harald-Berghoff #1152)

Add private map of api key -> min/max versions to BrokerConnection (dpkp #1169)

Consumer

Backoff on unavailable group coordinator retry (dpkp #1125)

Only change_subscription on pattern subscription when topics change (Artimi #1132)

Add offsets_for_times, beginning_offsets and end_offsets APIs (tvoinarovskyi #1161)

Producer

Raise KafkaTimeoutError when flush times out (infecto)

Set producer atexit timeout to 0 to match del (Ormod #1126)

Core / Protocol

0.11.0.0 protocol updates (only - no client support yet) (dpkp #1127)

Make UnknownTopicOrPartitionError retriable error (tvoinarovskyi)

Test Infrastructure

pylint 1.7.0+ supports python 3.6 and merge py36 into common testenv (jianbin-wei #1095)

Add kafka 0.10.2.1 into integration testing version (jianbin-wei #1096)

Disable automated tests for python 2.6 and kafka 0.8.0 and 0.8.1.1 (jianbin-wei #1096)

Support manual py26 testing; dont advertise 3.3 support (dpkp)

Add 0.11.0.0 server resources, fix tests for 0.11 brokers (dpkp)

Use fixture hostname, dont assume localhost (dpkp)

Add 0.11.0.0 to travis test matrix, remove 0.10.1.1; use scala 2.11 artifacts (dpkp #1176)

Logging / Error Messages

Improve error message when expiring batches in KafkaProducer (dpkp #1077)

Update producer.send docstring -- raises KafkaTimeoutError (infecto)

Use logging's built-in string interpolation (jeffwidman)

Fix produce timeout message (melor #1151)

Fix producer batch expiry messages to use seconds (dnwe)

Documentation

Fix typo in KafkaClient docstring (jeffwidman #1054)

Update README: Prefer python-lz4 over lz4tools (kiri11 #1057)

Fix poll() hyperlink in KafkaClient (jeffwidman)

Update RTD links with https / .io (jeffwidman #1074)

Describe consumer thread-safety (ecksun)

Fix typo in consumer integration test (jeffwidman)

Note max_in_flight_requests_per_connection > 1 may change order of messages (tvoinarovskyi #1149)

Source code(tar.gz)
Source code(zip)
1.3.3(Apr 25, 2017)

Source code(tar.gz)
Source code(zip)
1.3.2(Apr 25, 2017)

Source code(tar.gz)
Source code(zip)
1.3.1(Aug 19, 2016)
Bugfixes

Fix AttributeError in BrokerConnectionMetrics after reconnecting

Source code(tar.gz)
Source code(zip)
1.3.0(Aug 5, 2016)
Incompatible Changes

Delete KafkaConnection class (dpkp 769)

Rename partition_assignment -> assignment in MemberMetadata for consistency

Move selectors34 and socketpair to kafka.vendor (dpkp 785)

Change api_version config to tuple; deprecate str with warning (dpkp 761)

Rename _DEFAULT_CONFIG -> DEFAULT_CONFIG in KafkaProducer (dpkp 788)

Improvements

Vendor six 1.10.0 to eliminate runtime dependency (dpkp 785)

Add KafkaProducer and KafkaConsumer.metrics() with instrumentation similar to java client (dpkp 754 / 772 / 794)

Support Sasl PLAIN authentication (larsjsol PR 779)

Add checksum and size to RecordMetadata and ConsumerRecord (KAFKA-3196 / 770 / 594)

Use MetadataRequest v1 for 0.10+ api_version (dpkp 762)

Fix KafkaConsumer autocommit for 0.8 brokers (dpkp 756 / 706)

Improve error logging (dpkp 760 / 759)

Adapt benchmark scripts from https://github.com/mrafayaleem/kafka-jython (dpkp 754)

Add api_version config to KafkaClient (dpkp 761)

New Metadata method with_partitions() (dpkp 787)

Use socket_options configuration to setsockopts(). Default TCP_NODELAY (dpkp 783)

Expose selector type as config option (dpkp 764)

Drain pending requests to the coordinator before initiating group rejoin (dpkp 798)

Send combined size and payload bytes to socket to avoid potentially split packets with TCP_NODELAY (dpkp 797)

Bugfixes

Ignore socket.error when checking for protocol out of sync prior to socket close (dpkp 792)

Fix offset fetch when partitions are manually assigned (KAFKA-3960 / 786)

Change pickle_method to use python3 special attributes (jpaulodit 777)

Fix ProduceResponse v2 throttle_time_ms

Always encode size with MessageSet (#771)

Avoid buffer overread when compressing messageset in KafkaProducer

Explicit format string argument indices for python 2.6 compatibility

Simplify RecordMetadata; short circuit callbacks (#768)

Fix autocommit when partitions assigned manually (KAFKA-3486 / #767 / #626)

Handle metadata updates during consumer rebalance (KAFKA-3117 / #766 / #701)

Add a consumer config option to exclude internal topics (KAFKA-2832 / #765)

Protect writes to wakeup socket with threading lock (#763 / #709)

Fetcher spending unnecessary time during metrics recording (KAFKA-3785)

Always use absolute_import (dpkp)

Test / Fixtures

Catch select errors while capturing test fixture logs

Fix consumer group test race condition (dpkp 795)

Retry fixture failures on a different port (dpkp 796)

Dump fixture logs on failure

Documentation

Fix misspelling of password (ssaamm 793)

Document the ssl_password config option (ssaamm 780)

Fix typo in KafkaConsumer documentation (ssaamm 775)

Expand consumer.fetcher inline comments

Update kafka configuration links -> 0.10.0.0 docs

Fixup metrics_sample_window_ms docstring in consumer

Source code(tar.gz)
Source code(zip)
1.2.5(Aug 5, 2016)
Bugfixes

Fix bug causing KafkaProducer to double-compress message batches on retry

Check for double-compressed messages in KafkaConsumer, log warning and optionally skip

Drop recursion in _unpack_message_set; only decompress once

Source code(tar.gz)
Source code(zip)
1.2.4(Aug 5, 2016)
Bugfixes

Update consumer_timeout_ms docstring - KafkaConsumer raises StopIteration, no longer ConsumerTimeout

Use explicit subscription state flag to handle seek() during message iteration

Fix consumer iteration on compacted topics (dpkp PR 752)

Support ssl_password config when loading cert chains (amckemie PR 750)

Source code(tar.gz)
Source code(zip)
1.2.3(Aug 5, 2016)
Patch Improvements

Fix gc error log: avoid AttributeError in _unregister_cleanup (dpkp PR 747)

Wakeup socket optimizations (dpkp PR 740)

Assert will be disabled by "python -O" (tyronecai PR 736)

Randomize order of topics/partitions processed by fetcher to improve balance (dpkp PR 732)

Allow client.check_version timeout to be set in Producer and Consumer constructors (eastlondoner PR 647)

Source code(tar.gz)
Source code(zip)
1.2.2(Aug 5, 2016)
Bugfixes

Clarify timeout unit in KafkaProducer close and flush (ms7s PR 734)

Avoid busy poll during metadata refresh failure with retry_backoff_ms (dpkp PR 733)

Check_version should scan nodes until version found or timeout (dpkp PR 731)

Fix bug which could cause least_loaded_node to always return the same unavailable node (dpkp PR 730)

Fix producer garbage collection with weakref in atexit handler (dpkp PR 728)

Close client selector to fix fd leak (msmith PR 729)

Tweak spelling mistake in error const (steve8918 PR 719)

Rearrange connection tests to separate legacy KafkaConnection

Source code(tar.gz)
Source code(zip)
1.2.1(Aug 5, 2016)
Bugfixes

Fix regression in MessageSet decoding wrt PartialMessages (#716)

Catch response decode errors and log details (#715)

Fix Legacy support url (#712 - JonasGroeger)

Update sphinx docs re 0.10 broker support

Source code(tar.gz)
Source code(zip)
1.2.0(Aug 5, 2016)
Support for Kafka 0.10

Add protocol support for ApiVersionRequest (dpkp PR 678)

KAFKA-3025: Message v1 -- add timetamp and relative offsets (dpkp PR 693)

Use Fetch/Produce API v2 for brokers >= 0.10 (uses message format v1) (dpkp PR 694)

Use standard LZ4 framing for v1 messages / kafka 0.10 (dpkp PR 695)

Consumers

Update SimpleConsumer / legacy protocol to handle compressed messages (paulcavallaro PR 684)

Producers

KAFKA-3388: Fix expiration of batches sitting in the accumulator (dpkp PR 699)

KAFKA-3197: when max.in.flight.request.per.connection = 1, attempt to guarantee ordering (dpkp PR 698)

Dont use soon-to-be-reserved keyword await as function name (FutureProduceResult) (dpkp PR 697)

Clients

Fix socket leaks in KafkaClient (dpkp PR 696)

Documentation

Internals

Support SSL CRL requires python 2.7.9+ / 3.4+

Use original hostname for SSL checks (vincentbernat PR 682)

Always pass encoded message bytes to MessageSet.encode()

Raise ValueError on protocol encode/decode errors

Supplement socket.gaierror exception in BrokerConnection.connect() (erikbeebe PR 687)

BrokerConnection check_version: expect 0.9 to fail with CorrelationIdError

Fix small bug in Sensor (zackdever PR 679)

Source code(tar.gz)
Source code(zip)
1.1.1(Aug 5, 2016)
quick bugfixes

fix throttle_time_ms sensor handling (zackdever pr 667)

improve handling of disconnected sockets (easypost pr 666 / dpkp)

disable standard metadata refresh triggers during bootstrap (dpkp)

more predictable future callback/errback exceptions (zackdever pr 670)

avoid some exceptions in coordinator.del (dpkp pr 668)

Source code(tar.gz)
Source code(zip)
1.1.0(Aug 5, 2016)
Consumers

Avoid resending FetchRequests that are pending on internal queue

Log debug messages when skipping fetched messages due to offset checks

KAFKA-3013: Include topic-partition in exception for expired batches

KAFKA-3318: clean up consumer logging and error messages

Improve unknown coordinator error handling

Improve auto-commit error handling when group_id is None

Add paused() API (zackdever PR 602)

Add default_offset_commit_callback to KafkaConsumer DEFAULT_CONFIGS

Producers

Clients

Support SSL connections

Use selectors module for non-blocking IO

Refactor KafkaClient connection management

Fix AttributeError in del

SimpleClient: catch errors thrown by _get_leader_for_partition (zackdever PR 606)

Documentation

Fix serializer/deserializer examples in README

Update max.block.ms docstring

Remove errant next(consumer) from consumer documentation

Add producer.flush() to usage docs

Internals

Add initial metrics implementation (zackdever PR 637)

KAFKA-2136: support Fetch and Produce v1 (throttle_time_ms)

Use version-indexed lists for request/response protocol structs (dpkp PR 630)

Split kafka.common into kafka.structs and kafka.errors

Handle partial socket send() (dpkp PR 611)

Fix windows support (dpkp PR 603)

IPv6 support (TimEvens PR 615; Roguelazer PR 642)

Source code(tar.gz)
Source code(zip)
1.0.2(Apr 9, 2016)
This release includes critical bugfixes -- upgrade strongly recommended

Consumers

Improve KafkaConsumer Heartbeat handling (dpkp PR 583)

Fix KafkaConsumer.position bug (stefanth PR 578)

Raise TypeError when partition is not a TopicPartition (dpkp PR 587)

KafkaConsumer.poll should sleep to prevent tight-loops (dpkp PR 597)

Producers

Fix producer threading bug that can crash sender (dpkp PR 590)

Fix bug in producer buffer pool reallocation (dpkp PR 585)

Remove spurious warnings when closing sync SimpleProducer (twm PR 567)

Fix FutureProduceResult.await() on python2.6 (dpkp)

Add optional timeout parameter to KafkaProducer.flush() (dpkp)

KafkaProducer Optimizations (zackdever PR 598)

Clients

Improve error handling in SimpleClient.load_metadata_for_topics (dpkp)

Improve handling of KafkaClient.least_loaded_node failure (dpkp PR 588)

Documentation

Fix KafkaError import error in docs (shichao-an PR 564)

Fix serializer / deserializer examples (scribu PR 573)

Internals

Update to Kafka 0.9.0.1 for integration testing

Fix ifr.future.failure in conn.py (mortenlj PR 566)

Improve Zookeeper / Kafka Fixture management (dpkp)

Source code(tar.gz)
Source code(zip)
1.0.1(Apr 9, 2016)
Consumers

Add RangePartitionAssignor (and use as default); add assignor tests (dpkp PR 550)

Make sure all consumers are in same generation before stopping group test

Verify node ready before sending offset fetch request from coordinator

Improve warning when offset fetch request returns unknown topic / partition

Producers

Warn if pending batches failed during flush

Fix concurrency bug in RecordAccumulator.ready()

Fix bug in SimpleBufferPool memory condition waiting / timeout

Support batch_size = 0 in producer buffers (dpkp PR 558)

Catch duplicate batch.done() calls [e.g., maybe_expire then a response errback]

Clients

Documentation

Improve kafka.cluster docstrings

Migrate load_example.py to KafkaProducer / KafkaConsumer

Internals

Dont override system rcvbuf or sndbuf unless configured explicitly (dpkp PR 557)

Some attributes may not exist in __del__ if we failed assertions

Break up some circular references and close client wake pipes on __del__ (aisch PR 554)

Source code(tar.gz)
Source code(zip)
1.0.0(Feb 16, 2016)
This release includes significant code changes. Users of older kafka-python versions are encouraged to test upgrades before deploying to production as some interfaces and configuration options have changed.

Users of SimpleConsumer / SimpleProducer / SimpleClient (formerly KafkaClient) from prior releases should migrate to KafkaConsumer / KafkaProducer. Low-level APIs (Simple*) are no longer being actively maintained and will be removed in a future release.

For comprehensive API documentation, please see python help() / docstrings, kafka-python.readthedocs.org, or run tox -e docs from source to build documentation locally.

Consumers

KafkaConsumer re-written to emulate the new 0.9 kafka consumer (java client) and support coordinated consumer groups (feature requires >= 0.9.0.0 brokers)

Methods no longer available:

configure [initialize a new consumer instead]

set_topic_partitions [use subscribe() or assign()]

fetch_messages [use poll() or iterator interface]

get_partition_offsets

offsets [use committed(partition)]

task_done [handled internally by auto-commit; or commit offsets manually]

Configuration changes (consistent with updated java client):

lots of new configuration parameters -- see docs for details

auto_offset_reset: previously values were 'smallest' or 'largest', now values are 'earliest' or 'latest'

fetch_wait_max_ms is now fetch_max_wait_ms

max_partition_fetch_bytes is now max_partition_fetch_bytes

deserializer_class is now value_deserializer and key_deserializer

auto_commit_enable is now enable_auto_commit

auto_commit_interval_messages was removed

socket_timeout_ms was removed

refresh_leader_backoff_ms was removed

SimpleConsumer and MultiProcessConsumer are now deprecated and will be removed in a future release. Users are encouraged to migrate to KafkaConsumer.

Producers

new producer class: KafkaProducer. Exposes the same interface as official java client. Async by default; returned future.get() can be called for synchronous blocking

SimpleProducer is now deprecated and will be removed in a future release. Users are encouraged to migrate to KafkaProducer.

Clients

synchronous KafkaClient renamed to SimpleClient. For backwards compatibility, you will get a SimpleClient via from kafka import KafkaClient. This will change in a future release.

All client calls use non-blocking IO under the hood.

Add probe method check_version() to infer broker versions.

Documentation

Updated README and sphinx documentation to address new classes.

Docstring improvements to make python help() easier to use.

Internals

Old protocol stack is deprecated. It has been moved to kafka.protocol.legacy and may be removed in a future release.

Protocol layer re-written using Type classes, Schemas and Structs (modeled on the java client).

Add support for LZ4 compression (including broken framing header checksum).

Source code(tar.gz)
Source code(zip)
v0.9.5(Feb 16, 2016)
Consumers

Initial support for consumer coordinator: offsets only (toddpalino PR 420)

Allow blocking until some messages are received in SimpleConsumer (saaros PR 457)

Support subclass config changes in KafkaConsumer (zackdever PR 446)

Support retry semantics in MultiProcessConsumer (barricadeio PR 456)

Support partition_info in MultiProcessConsumer (scrapinghub PR 418)

Enable seek() to an absolute offset in SimpleConsumer (haosdent PR 412)

Add KafkaConsumer.close() (ucarion PR 426)

Producers

Catch client.reinit() exceptions in async producer (dpkp)

Producer.stop() now blocks until async thread completes (dpkp PR 485)

Catch errors during load_metadata_for_topics in async producer (bschopman PR 467)

Add compression-level support for codecs that support it (trbs PR 454)

Fix translation of Java murmur2 code, fix byte encoding for Python 3 (chrischamberlin PR 439)

Only call stop() on not-stopped producer objects (docker-hub PR 435)

Allow null payload for deletion feature (scrapinghub PR 409)

Clients

Use non-blocking io for broker aware requests (ecanzonieri PR 473)

Use debug logging level for metadata request (ecanzonieri PR 415)

Catch KafkaUnavailableError in _send_broker_aware_request (mutability PR 436)

Lower logging level on replica not available and commit (ecanzonieri PR 415)

Documentation

Update docs and links wrt maintainer change (mumrah -> dpkp)

Internals

Add py35 to tox testing

Update travis config to use container infrastructure

Add 0.8.2.2 and 0.9.0.0 resources for integration tests; update default official releases

new pylint disables for pylint 1.5.1 (zackdever PR 481)

Fix python3 / python2 comments re queue/Queue (dpkp)

Add Murmur2Partitioner to kafka all imports (dpkp Issue 471)

Include LICENSE in PyPI sdist (koobs PR 441)

Source code(tar.gz)
Source code(zip)
v0.9.4(Jun 12, 2015)
Consumers

Refactor SimpleConsumer internal fetch handling (dpkp #399)

Handle exceptions in SimpleConsumer commit() and reset_partition_offset() (dpkp #404)

Improve FailedPayloadsError handling in KafkaConsumer (dpkp #398)

KafkaConsumer: avoid raising KeyError in task_done (dpkp #389)

MultiProcessConsumer -- support configured partitions list (dpkp #380)

Fix SimpleConsumer leadership change handling (dpkp #393)

Fix SimpleConsumer connection error handling (reAsOn2010 #392)

Improve Consumer handling of 'falsy' partition values (wting #342)

Fix _offsets call error in KafkaConsumer (hellais #376)

Fix str/bytes bug in KafkaConsumer (dpkp #365)

Register atexit handlers for consumer and producer thread/multiprocess cleanup (dpkp #360)

Always fetch commit offsets in base consumer unless group is None (dpkp #356)

Stop consumer threads on delete (dpkp #357)

Deprecate metadata_broker_list in favor of bootstrap_servers in KafkaConsumer (dpkp #340)

Support pass-through parameters in multiprocess consumer (scrapinghub #336)

Enable offset commit on SimpleConsumer.seek (ecanzonieri #350)

Improve multiprocess consumer partition distribution (scrapinghub #335)

Ignore messages with offset less than requested (wkiser #328)

Handle OffsetOutOfRange in SimpleConsumer (ecanzonieri #296)

Producers

Add Murmur2Partitioner (dpkp #378)

Log error types in SimpleProducer and SimpleConsumer (dpkp #405)

SimpleProducer support configuration of fail_on_error (dpkp #396)

Deprecate KeyedProducer.send() (dpkp #379)

Further improvements to async producer code (dpkp #388)

Add more configuration parameters for async producer (dpkp)

Deprecate SimpleProducer batch_send=True in favor of async (dpkp)

Improve async producer error handling and retry logic (vshlapakov #331)

Support message keys in async producer (vshlapakov #329)

Use threading instead of multiprocessing for Async Producer (vshlapakov #330)

Stop threads on __del__ (chmduquesne #324)

Fix leadership failover handling in KeyedProducer (dpkp #314)

KafkaClient

Add .topics property for list of known topics (dpkp)

Fix request / response order guarantee bug in KafkaClient (dpkp #403)

Improve KafkaClient handling of connection failures in _get_conn (dpkp)

Client clears local metadata cache before updating from server (dpkp #367)

KafkaClient should return a response or error for each request - enable better retry handling (dpkp #366)

Improve str/bytes conversion in KafkaClient and KafkaConsumer (dpkp #332)

Always return sorted partition ids in client.get_partition_ids_for_topic() (dpkp #315)

Documentation

Cleanup Usage Documentation

Improve KafkaConsumer documentation (dpkp #341)

Update consumer documentation (sontek #317)

Add doc configuration for tox (sontek #316)

Switch to .rst doc format (sontek #321)

Fixup google groups link in README (sontek #320)

Automate documentation at kafka-python.readthedocs.org

Internals

Switch integration testing from 0.8.2.0 to 0.8.2.1 (dpkp #402)

Fix most flaky tests, improve debug logging, improve fixture handling (dpkp)

General style cleanups (dpkp #394)

Raise error on duplicate topic-partition payloads in protocol grouping (dpkp)

Use module-level loggers instead of simply 'kafka' (dpkp)

Remove pkg_resources check for __version__ at runtime (dpkp #387)

Make external API consistently support python3 strings for topic (kecaps #361)

Fix correlation id overflow (dpkp #355)

Cleanup kafka/common structs (dpkp #338)

Use context managers in gzip_encode / gzip_decode (dpkp #337)

Save failed request as FailedPayloadsError attribute (jobevers #302)

Remove unused kafka.queue (mumrah)

Source code(tar.gz)
Source code(zip)

Python client for Apache Kafka

Related tags

Overview

Kafka Python client

KafkaConsumer

KafkaProducer

Thread safety

Compression

Optimized CRC32 Validation

Protocol

Comments

Releases(2.0.2)

2.0.2(Sep 30, 2020)

2.0.2 (Sep 29, 2020)

Consumer

Compatibility

Cleanups

Admin Client

Protocol

Tests

Documentation / Logging / Errors

2.0.1(Sep 30, 2020)

2.0.1 (Feb 19, 2020)

Admin Client

2.0.0(Feb 11, 2020)

2.0.0 (Feb 10, 2020)

Deprecation

Admin Client

Miscellaneous Bugfixes / Improvements

Test Infrastructure / Documentation / Maintenance

1.4.7(Sep 30, 2019)

1.4.7 (Sep 30, 2019)

Client

KafkaConsumer

Miscellaneous Bugfixes / Improvements

Admin Client

Test Infrastructure / Documentation / Maintenance

1.4.6(Apr 3, 2019)

1.4.6 (Apr 2, 2019)

Client Concurrency Issues (Race Conditions / Deadlocks)

Producer Wakeup / TimeoutError

SSL - Python3.7 Support / Bootstrap Hostname Verification / Testing

SASL - OAuthBearer support / api version bugfix

Miscellaneous Bugfixes

1.4.5(Mar 15, 2019)

1.4.5 (Mar 14, 2019)

Consumer

Client

Admin Client

Core/Protocol

Bugfixes

Test Infrastructure

Compatibility

1.4.4(Nov 23, 2018)

Bugfixes

Client

Admin Client

Consumer

Core / Protocol

Documentation

Test Infrastructure

Logging / Error Messages

Compatibility

1.4.3(May 26, 2018)

Compatibility

Client

Consumer

Core / Protocol

Documentation

Test Infrastructure

Logging / Error Messages

1.4.2(Mar 11, 2018)

Bugfixes

Client

Consumer

Producer

Core / Protocol

Test Infrastructure

Logging / Error Messages

1.4.1(Feb 9, 2018)