Please describe how you installed Flake8
$ sudo apt install flake8  # Ubuntu 20.04.3. I might pip-install if this feature request gets released.
Please provide the exact, unmodified output of
Please describe the problem or feature
Hi all! Thanks so much for maintaining flake8.
Edit: ignore the sorting/optimizing-chunks part below. It seems to slightly hurt performance on large codebases; it may still help a little on smaller ones, but it probably isn't worth the complexity.
I think flake8 should increase batches per worker and/or optimize chunk ordering, possibly exposed as settings.
I have 496 files in my project. Many of them are small django migrations and `__init__.py` files, but there are a few really big ones (15482 tokens, ~1000 physical lines), which end up making one worker (out of 4) finish much later than the others, so the large chunks (2 per worker) aren't balanced.
In my case a normal flake8 run on my project takes ~3.8 seconds (measured over 10 runs). If I edit the code to force the pool chunksize to 1 or 2, I get ~3.3 seconds, a 13% improvement in my case.
Another thing that could help would be to stat all the files and sort them by file size, biggest first, then deal them out to each worker, to avoid one worker getting too much work at the end. For me, when the files are cached in memory, it takes 2-4ms to stat and sort the files. That makes a larger chunksize more worthwhile:
```python
# in run_parallel(self), assuming `os` is imported at module level:
chunksize = calculate_pool_chunksize(len(self.checkers), self.jobs)

# sort files by file size, largest first
self.checkers.sort(key=lambda checker: os.stat(checker.filename).st_size, reverse=True)

# deal the files out round-robin into per-worker chunks
new_sorted_checkers = []
chunks = [[] for _ in range(self.jobs)]
for n, checker in enumerate(self.checkers):
    worker = n % self.jobs
    if len(chunks[worker]) >= chunksize:
        # this worker's chunk is full; flush it and start a new one
        new_sorted_checkers += chunks[worker]
        chunks[worker] = []
    chunks[worker].append(checker)
for chunk in chunks:
    new_sorted_checkers += chunk
self.checkers = new_sorted_checkers
```
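As a self-contained illustration of the deal-out behaviour above (using a size-lookup function instead of `os.stat` so it runs without real files; `sort_and_deal` and its parameters are hypothetical names of mine, not flake8 API):

```python
def sort_and_deal(items, size_of, jobs, chunksize):
    """Sort items largest-first, deal them round-robin into per-worker
    chunks of at most `chunksize` items, then flatten back into one
    ordering (mirrors the snippet in this issue)."""
    items = sorted(items, key=size_of, reverse=True)
    ordered = []
    chunks = [[] for _ in range(jobs)]
    for n, item in enumerate(items):
        worker = n % jobs
        if len(chunks[worker]) >= chunksize:
            # this worker's chunk is full; flush it and start a new one
            ordered += chunks[worker]
            chunks[worker] = []
        chunks[worker].append(item)
    for chunk in chunks:  # flush whatever remains
        ordered += chunk
    return ordered

sizes = {"a": 100, "b": 90, "c": 80, "d": 70, "e": 60, "f": 50, "g": 40, "h": 30}
order = sort_and_deal(sizes, sizes.get, jobs=2, chunksize=2)
# order == ['a', 'c', 'b', 'd', 'e', 'g', 'f', 'h']
```

Note how consecutive chunks alternate between workers and the biggest files land in the earliest chunks, so no single worker is left holding all the large files at the end.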
With the sorting and dealing alone, using flake8's default of 2 chunks per worker (chunksize 62 in my case), I get ~3.1 seconds (~18% faster than the 3.8-second baseline), which is a pretty good speedup. If I also force a chunksize of 5, I get down to about ~2.9 seconds (~23% faster), so increasing batches-per-worker or having a chunksize option would still be nice.
Here are my results for the django 3.2 codebase (2696 checkers/files, 4 workers/jobs, default chunksize=337):
37 seconds - chunksize=337 (~2 batches per worker)
Increasing batches/chunks per worker/job, without stating/sorting/dealing:
32 seconds - chunksize=168 (~4 batches per worker)
30.5 seconds - chunksize=84 (~8 batches per worker)
30 seconds - chunksize=42 (~16 batches per worker)
30.6 seconds - chunksize=21 (~32 batches per worker)
With stating/sorting/dealing added:
30.9 seconds - chunksize=337 (~2 batches per worker, the default)
30.1 seconds - chunksize=168 (~4 batches per worker)
29.9 seconds - chunksize=84 (~8 batches per worker)
29.74 seconds - chunksize=42 (~16 batches per worker)
29.9 seconds - chunksize=21 (~32 batches per worker)
(stating, sorting, and dealing alone takes ~14-19ms)
Update: results for cpython 3.7 (no sorting) (1856 checkers/files, 4 workers/jobs):
1m14.535s - 2 batches per worker (default)
1m5.341s - 4 batches per worker (12% faster)
1m2.735s - 8 batches per worker (~15.8% faster)
1m2.576s - 16 batches per worker (~16% faster)
1m2.541s - 32 batches per worker (~16% faster)
1m2.139s - 64 batches per worker (~17% faster)
At some point the benefits fall within the margin of variance, so they get hard to measure, but somewhere around 16 batches per worker seems to be optimal in my case, and that alone, even without sorting by size, seems to really help.
Anyway I thought I'd post my findings in case anyone wants to experiment with these things themselves and possibly improve flake8.
If nothing else, changing the default "batches per worker" from 2 to 8 or 16 (in calculate_pool_chunksize) would be pretty simple, and I think it would really help keep the work balanced (~15% faster).
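As a sketch of what that change might look like, assuming the current helper divides the checker count by `jobs * 2` (the exact signature here is my assumption, and the `batches_per_worker` parameter is my addition, not existing flake8 API):

```python
def calculate_pool_chunksize(num_checkers: int, num_jobs: int,
                             batches_per_worker: int = 8) -> int:
    """Split the checkers so each worker receives ~batches_per_worker
    chunks. Smaller chunks balance better when file sizes vary widely,
    at the cost of slightly more inter-process overhead.
    (Sketch only -- flake8's real helper hard-codes 2 batches per worker.)"""
    return max(num_checkers // (num_jobs * batches_per_worker), 1)
```

With the django numbers from above, `calculate_pool_chunksize(2696, 4, 2)` gives the current default of 337, while `batches_per_worker=8` gives 84.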
P.S. I'm realizing that sorting by most-recently-modified first could also be helpful, so errors on recently edited files show up sooner.