Downloader Middleware to support Playwright in Scrapy & Gerapy

Overview

Gerapy Playwright

This is a package for supporting Playwright in Scrapy, also this package is a module in Gerapy.

Installation

pip3 install gerapy-playwright

Usage

You can use PlaywrightRequest to specify a request which uses playwright to render.

For example:

yield PlaywrightRequest(detail_url, callback=self.parse_detail)

And you also need to enable PlaywrightMiddleware in DOWNLOADER_MIDDLEWARES:

DOWNLOADER_MIDDLEWARES = {
    'gerapy_playwright.downloadermiddlewares.PlaywrightMiddleware': 543,
}

Congratulate, you've finished the all of the required configuration.

If you run the Spider again, Playwright will be started to render every web page which you configured the request as PlaywrightRequest.

Settings

GerapyPlaywright provides some optional settings.

Concurrency

You can directly use Scrapy's setting to set Concurrency of Playwright, for example:

CONCURRENT_REQUESTS = 3

Pretend as Real Browser

Some website will detect WebDriver or Headless, GerapyPlaywright can pretend Chromium by inject scripts. This is enabled by default.

You can close it if website does not detect WebDriver to speed up:

GERAPY_PLAYWRIGHT_PRETEND = False

Also you can use pretend attribute in PlaywrightRequest to overwrite this configuration.

Logging Level

By default, Playwright will log all the debug messages, so GerapyPlaywright configured the logging level of Playwright to WARNING.

If you want to see more logs from Playwright, you can change the this setting:

import logging
GERAPY_PLAYWRIGHT_LOGGING_LEVEL = logging.DEBUG

Download Timeout

Playwright may take some time to render the required web page, you can also change this setting, default is 30s:

# playwright timeout
GERAPY_PLAYWRIGHT_DOWNLOAD_TIMEOUT = 30

Headless

By default, Playwright is running in Headless mode, you can also change it to False as you need, default is True:

GERAPY_PLAYWRIGHT_HEADLESS = False

Window Size

You can also set the width and height of Playwright window:

GERAPY_PLAYWRIGHT_WINDOW_WIDTH = 1400
GERAPY_PLAYWRIGHT_WINDOW_HEIGHT = 700

Default is 1400, 700.

Proxy

You can set a proxy channel via below this config:

GERAPY_PLAYWRIGHT_PROXY = 'http://tps254.kdlapi.com:15818'
GERAPY_PLAYWRIGHT_PROXY_CREDENTIAL = {
  'username': 'xxx',
  'password': 'xxxx'
}

Screenshot

You can get screenshot of loaded page, you can pass screenshot args to PlaywrightRequest as dict:

Below are the supported args:

  • type (str): Specify screenshot type, can be either jpeg or png. Defaults to png.
  • quality (int): The quality of the image, between 0-100. Not applicable to png image.
  • full_page (bool): When true, take a screenshot of the full scrollable page. Defaults to False.
  • clip (dict): An object which specifies clipping region of the page. This option should have the following fields:
    • x (int): x-coordinate of top-left corner of clip area.
    • y (int): y-coordinate of top-left corner of clip area.
    • width (int): width of clipping area.
    • height (int): height of clipping area.
  • omit_background (bool): Hide default white background and allow capturing screenshot with transparency.
  • timeout (str): Maximum time in milliseconds, defaults to 30 seconds, pass 0 to disable timeout.

Check more from https://playwright.dev/python/docs/api/class-page#page-screenshot

For example:

yield PlaywrightRequest(start_url, callback=self.parse_index, wait_for='.item .name', screenshot={
            'type': 'png',
            'full_page': True
        })

then you can get screenshot result in response.meta['screenshot']:

Simplest save it to file:

def parse_index(self, response):
    with open('screenshot.png', 'wb') as f:
        f.write(response.meta['screenshot'].getbuffer())

If you want to enable screenshot for all requests, you can configure it by GERAPY_PLAYWRIGHT_SCREENSHOT.

For example:

GERAPY_PLAYWRIGHT_SCREENSHOT = {
    'type': 'png',
    'full_page': True
}

PlaywrightRequest

PlaywrightRequest provide args which can override global settings above.

  • url: request url
  • callback: callback
  • wait_until: one of "load", "domcontentloaded", "networkidle" see https://playwright.dev/python/docs/api/class-page#page-wait-for-load-state, default is domcontentloaded
  • wait_for: wait for some element to load, also supports dict
  • script: script to execute
  • actions: actions defined for execution of Page object
  • proxy: use proxy for this time, like http://x.x.x.x:x
  • proxy_credential: the proxy credential, like {'username': 'xxxx', 'password': 'xxxx'}
  • sleep: time to sleep after loaded, override GERAPY_PLAYWRIGHT_SLEEP
  • timeout: load timeout, override GERAPY_PLAYWRIGHT_DOWNLOAD_TIMEOUT
  • ignore_resource_types: ignored resource types, override GERAPY_PLAYWRIGHT_IGNORE_RESOURCE_TYPES
  • pretend: pretend as normal browser, override GERAPY_PLAYWRIGHT_PRETEND
  • screenshot: ignored resource types, see https://playwright.dev/python/docs/api/class-page#page-screenshot, override GERAPY_PLAYWRIGHT_SCREENSHOT

For example, you can configure PlaywrightRequest as:

from gerapy_playwright import PlaywrightRequest

def parse(self, response):
    yield PlaywrightRequest(url,
        callback=self.parse_detail,
        wait_until='domcontentloaded',
        wait_for='title',
        script='() => { return {name: "Germey"} }',
        sleep=2)

Then Playwright will:

  • wait for document to load
  • wait for title to load
  • execute console.log(document) script
  • sleep for 2s
  • return the rendered web page content, get from response.meta['screenshot']
  • return the script executed result, get from response.meta['script_result']

For waiting mechanism controlled by JavaScript, you can use await in script, for example:

js = '''async () => {
    await new Promise(resolve => setTimeout(resolve, 10000));
    return {
        'name': 'Germey'
    }
}
'''
yield PlaywrightRequest(url, callback=self.parse, script=js)

Then you can get the script result from response.meta['script_result'], result is {'name': 'Germey'}.

If you think the JavaScript is wired to write, you can use actions argument to define a function to execute Python based functions, for example:

async def execute_actions(page):
    await page.evaluate('() => { document.title = "Hello World"; }')
    return 1
yield PlaywrightRequest(url, callback=self.parse, actions=execute_actions)

Then you can get the actions result from response.meta['actions_result'], result is 1.

Also you can define proxy and proxy_credential for each Reqest, for example:

yield PlaywrightRequest(
  self.base_url,
  callback=self.parse_index,
  priority=10,
  proxy='http://tps254.kdlapi.com:15818',
  proxy_credential={
      'username': 'xxxx',
      'password': 'xxxx'
})

proxy and proxy_credential will override the settings GERAPY_PLAYWRIGHT_PROXY and GERAPY_PLAYWRIGHT_PROXY_CREDENTIAL.

Example

For more detail, please see example.

Also you can directly run with Docker:

docker run germey/gerapy-playwright-example

Outputs:

2021-12-27 16:54:14 [scrapy.utils.log] INFO: Scrapy 2.2.0 started (bot: example)
2021-12-27 16:54:14 [scrapy.utils.log] INFO: Versions: lxml 4.7.1.0, libxml2 2.9.12, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.7.0, Python 3.7.9 (default, Aug 31 2020, 07:22:35) - [Clang 10.0.0 ], pyOpenSSL 21.0.0 (OpenSSL 1.1.1l  24 Aug 2021), cryptography 35.0.0, Platform Darwin-21.1.0-x86_64-i386-64bit
2021-12-27 16:54:14 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor
2021-12-27 16:54:14 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'example',
 'CONCURRENT_REQUESTS': 1,
 'NEWSPIDER_MODULE': 'example.spiders',
 'RETRY_HTTP_CODES': [403, 500, 502, 503, 504],
 'SPIDER_MODULES': ['example.spiders']}
2021-12-27 16:54:14 [scrapy.extensions.telnet] INFO: Telnet Password: e931b241390ad06a
2021-12-27 16:54:14 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
2021-12-27 16:54:14 [gerapy.playwright] INFO: playwright libraries already installed
2021-12-27 16:54:14 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'gerapy_playwright.downloadermiddlewares.PlaywrightMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2021-12-27 16:54:14 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2021-12-27 16:54:14 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2021-12-27 16:54:14 [scrapy.core.engine] INFO: Spider opened
2021-12-27 16:54:14 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-12-27 16:54:14 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2021-12-27 16:54:14 [example.spiders.movie] DEBUG: start url https://antispider1.scrape.center/page/1
2021-12-27 16:54:14 [gerapy.playwright] DEBUG: processing request <GET https://antispider1.scrape.center/page/1>
2021-12-27 16:54:14 [gerapy.playwright] DEBUG: playwright_meta {'wait_until': 'domcontentloaded', 'wait_for': '.item', 'script': None, 'actions': None, 'sleep': None, 'proxy': None, 'proxy_credential': None, 'pretend': None, 'timeout': None, 'screenshot': None}
2021-12-27 16:54:14 [gerapy.playwright] DEBUG: set options {'headless': False}
cookies []
2021-12-27 16:54:16 [gerapy.playwright] DEBUG: PRETEND_SCRIPTS is run
2021-12-27 16:54:16 [gerapy.playwright] DEBUG: timeout 10
2021-12-27 16:54:16 [gerapy.playwright] DEBUG: crawling https://antispider1.scrape.center/page/1
2021-12-27 16:54:16 [gerapy.playwright] DEBUG: request https://antispider1.scrape.center/page/1 with options {'url': 'https://antispider1.scrape.center/page/1', 'wait_until': 'domcontentloaded'}
2021-12-27 16:54:18 [gerapy.playwright] DEBUG: waiting for .item
2021-12-27 16:54:18 [gerapy.playwright] DEBUG: sleep for 1s
2021-12-27 16:54:19 [gerapy.playwright] DEBUG: taking screenshot using args {'type': 'png', 'full_page': True}
2021-12-27 16:54:19 [gerapy.playwright] DEBUG: close playwright
2021-12-27 16:54:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://antispider1.scrape.center/page/1> (referer: None)
2021-12-27 16:54:20 [example.spiders.movie] DEBUG: start url https://antispider1.scrape.center/page/2
2021-12-27 16:54:20 [gerapy.playwright] DEBUG: processing request <GET https://antispider1.scrape.center/page/2>
2021-12-27 16:54:20 [gerapy.playwright] DEBUG: playwright_meta {'wait_until': 'domcontentloaded', 'wait_for': '.item', 'script': None, 'actions': None, 'sleep': None, 'proxy': None, 'proxy_credential': None, 'pretend': None, 'timeout': None, 'screenshot': None}
2021-12-27 16:54:20 [gerapy.playwright] DEBUG: set options {'headless': False}
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/1
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/2
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/3
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/4
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/5
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/6
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/7
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/8
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/9
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/10
cookies []
2021-12-27 16:54:21 [gerapy.playwright] DEBUG: PRETEND_SCRIPTS is run
2021-12-27 16:54:21 [gerapy.playwright] DEBUG: timeout 10
2021-12-27 16:54:21 [gerapy.playwright] DEBUG: crawling https://antispider1.scrape.center/page/2
2021-12-27 16:54:21 [gerapy.playwright] DEBUG: request https://antispider1.scrape.center/page/2 with options {'url': 'https://antispider1.scrape.center/page/2', 'wait_until': 'domcontentloaded'}
2021-12-27 16:54:23 [gerapy.playwright] DEBUG: waiting for .item
2021-12-27 16:54:24 [gerapy.playwright] DEBUG: sleep for 1s
2021-12-27 16:54:25 [gerapy.playwright] DEBUG: taking screenshot using args {'type': 'png', 'full_page': True}
2021-12-27 16:54:25 [gerapy.playwright] DEBUG: close playwright
2021-12-27 16:54:25 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://antispider1.scrape.center/page/2> (referer: None)
2021-12-27 16:54:25 [gerapy.playwright] DEBUG: processing request <GET https://antispider1.scrape.center/detail/10>
2021-12-27 16:54:25 [gerapy.playwright] DEBUG: playwright_meta {'wait_until': 'domcontentloaded', 'wait_for': '.item', 'script': None, 'actions': None, 'sleep': None, 'proxy': None, 'proxy_credential': None, 'pretend': None, 'timeout': None, 'screenshot': None}
2021-12-27 16:54:25 [gerapy.playwright] DEBUG: set options {'headless': False}
...
Comments
  • twisted.internet.error.ReactorAlreadyInstalledError: reactor already installed

    twisted.internet.error.ReactorAlreadyInstalledError: reactor already installed

    python: 3.9 GerapyPlaywright: 0.2.4 os: mac 11.6

    运行scrapy crawl spider的时候直接报错:

    Traceback (most recent call last):
      File "/Users/zz/.virtualenvs/crawler-apk/bin/scrapy", line 8, in <module>
        sys.exit(execute())
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/scrapy/cmdline.py", line 145, in execute
        _run_print_help(parser, _run_command, cmd, args, opts)
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/scrapy/cmdline.py", line 100, in _run_print_help
        func(*a, **kw)
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/scrapy/cmdline.py", line 153, in _run_command
        cmd.run(args, opts)
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/scrapy/commands/crawl.py", line 22, in run
        crawl_defer = self.crawler_process.crawl(spname, **opts.spargs)
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/scrapy/crawler.py", line 205, in crawl
        crawler = self.create_crawler(crawler_or_spidercls)
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/scrapy/crawler.py", line 238, in create_crawler
        return self._create_crawler(crawler_or_spidercls)
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/scrapy/crawler.py", line 313, in _create_crawler
        return Crawler(spidercls, self.settings, init_reactor=True)
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/scrapy/crawler.py", line 82, in __init__
        default.install()
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/twisted/internet/selectreactor.py", line 194, in install
        installReactor(reactor)
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/twisted/internet/main.py", line 32, in installReactor
        raise error.ReactorAlreadyInstalledError("reactor already installed"
    twisted.internet.error.ReactorAlreadyInstalledError: reactor already installed
    

    这个应该改哪里呢

    opened by yyyy777 3
  • Fix leaking file descriptors by using the context manager

    Fix leaking file descriptors by using the context manager

    I was running into OSError: [Errno 24] Too many open files while using this with scrapy for scraping a domain.

    By using async_playwright() as a context manager, we ensure it's closed once finished. This fixes the issue.

    opened by xolan 2
  •     raise BadGzipFile('Not a gzipped file (%r)' % magic) gzip.BadGzipFile: Not a gzipped file (b'<!')

    raise BadGzipFile('Not a gzipped file (%r)' % magic) gzip.BadGzipFile: Not a gzipped file (b'

    崔佬,我这边也不能用。一启动scrapy,就会报这个。 配置: python:3.9.4 macOs: 11.5.2

    Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/twisted/internet/defer.py", line 1445, in _inlineCallbacks result = current_context.run(g.send, result) File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/scrapy/core/downloader/middleware.py", line 54, in process_response response = yield deferred_from_coro(method(request=request, response=response, spider=spider)) File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/scrapy/downloadermiddlewares/httpcompression.py", line 62, in process_response decoded_body = self._decode(response.body, encoding.lower()) File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/scrapy/downloadermiddlewares/httpcompression.py", line 82, in _decode body = gunzip(body) File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/scrapy/utils/gz.py", line 27, in gunzip chunk = f.read1(8196) File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/gzip.py", line 313, in read1 return self._buffer.read1(size) File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/_compression.py", line 68, in readinto data = self.read(len(byte_view)) File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/gzip.py", line 487, in read if not self._read_gzip_header(): File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/gzip.py", line 435, in _read_gzip_header raise BadGzipFile('Not a gzipped file (%r)' % magic) gzip.BadGzipFile: Not a gzipped file (b'<!')

    opened by wf4867612 2
  • Method is not Json serializable for actions

    Method is not Json serializable for actions

    Hello,

    I am running into an issue with 'yield' for gerapy-playwright when I need to access a login page. I try to run: yield PlaywrightRequest(login_page_url, self.parse_login, actions = self.login_action) in order to first use playwright to login and then access data that can only be accessed when logging in with self.parse_login. I am getting a: builtins.TypeError: is not JSON serializable.

    I am using scrapy cluster along with gerapy-playwright in order to run a scheduler for all the spiders that I have: https://github.com/istresearch/scrapy-cluster

    It seems that the action is saved in the meta data as a method and cannot be passed to the scheduler. Is it possible to type cast the action as a string and then when the action is called later, to do a method call on the string? If I understand correctly, the action is produced on line 339 of the downloadermiddlewares.py inside of gerepy-playwright. Would it be possible to evaluate the string as a method so that the scrapy-cluster scheduler can pass the string but gerapy-playwright still calls the self.login_action method prior to the self.parse_login?

    opened by BenzTivianne 0
  • playwright._impl._api_types.Error: Browser closed.

    playwright._impl._api_types.Error: Browser closed.

    这种报错会是什么原因呢...

    2022-03-23 06:21:06 [scrapy.core.scraper] ERROR: Error downloading <GET https://apkpure.com/bikers-men-women-bike-photo-editor-future-trends/com.dsrtech.bikers> Traceback (most recent call last): File "/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks result = current_context.run( File "/usr/local/lib/python3.8/dist-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator return g.throw(self.type, self.value, self.tb) File "/usr/local/lib/python3.8/dist-packages/scrapy/core/downloader/middleware.py", line 41, in process_request response = yield deferred_from_coro(method(request=request, spider=spider)) File "/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py", line 1030, in adapt extracted = result.result() File "/usr/local/lib/python3.8/dist-packages/gerapy_playwright/downloadermiddlewares.py", line 243, in _process_request context = await browser.new_context( File "/usr/local/lib/python3.8/dist-packages/playwright/async_api/_generated.py", line 11254, in new_context await self._async( File "/usr/local/lib/python3.8/dist-packages/playwright/_impl/_browser.py", line 117, in new_context channel = await self._channel.send("newContext", params) File "/usr/local/lib/python3.8/dist-packages/playwright/_impl/_connection.py", line 39, in send return await self.inner_send(method, params, False) File "/usr/local/lib/python3.8/dist-packages/playwright/_impl/_connection.py", line 63, in inner_send result = next(iter(done)).result() playwright._impl._api_types.Error: Browser closed. ==================== Browser output: ==================== /ms-playwright/chromium-978106/chrome-linux/chrome --disable-background-networking --enable-features=NetworkService,NetworkServiceInProcess --disable-background-timer-throttling --disable-backgrounding-occluded-windows --disable-breakpad --disable-client-side-phishing-detection --disable-component-extensions-with-background-pages --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-features=ImprovedCookieControls,LazyFrameLoading,GlobalMediaControls,DestroyProfileOnBrowserClose,MediaRouter,AcceptCHFrame,AutoExpandDetailsElement --allow-pre-commit-input --disable-hang-monitor --disable-ipc-flooding-protection --disable-popup-blocking --disable-prompt-on-repost --disable-renderer-backgrounding --disable-sync --force-color-profile=srgb --metrics-recording-only --no-first-run --enable-automation --password-store=basic --use-mock-keychain --no-service-autorun --export-tagged-pdf --headless --hide-scrollbars --mute-audio --blink-settings=primaryHoverType=2,availableHoverTypes=2,primaryPointerType=4,availablePointerTypes=4 --no-sandbox --disable-extensions --hide-scrollbars --mute-audio --no-sandbox --disable-setuid-sandbox --disable-gpu --user-data-dir=/tmp/playwright_chromiumdev_profile-LGppgb --remote-debugging-pipe --no-startup-window pid=1185 [pid=1185][err] [0323/062041.773971:ERROR:platform_thread_posix.cc(151)] pthread_create: Resource temporarily unavailable (11) [pid=1185][err] [0323/062041.774268:ERROR:platform_thread_posix.cc(151)] pthread_create: Resource temporarily unavailable (11) [pid=1185][err] [0323/062041.778128:ERROR:zygote_communication_linux.cc(142)] Did not receive ping from zygote child [pid=1185][err] [0323/062041.778090:ERROR:zygote_linux.cc(607)] Zygote could not fork: process_type gpu-process numfds 3 child_pid -1 [pid=1185][err] [0323/062041.778548:ERROR:gpu_process_host.cc(968)] GPU process launch failed: error_code=1002 [pid=1185][err] [0323/062041.778568:WARNING:gpu_process_host.cc(1279)] The GPU process has crashed 1 time(s) [pid=1185][err] [0323/062041.778540:ERROR:zygote_linux.cc(271)] Unexpected real PID message from browser [pid=1185][err] [0323/062041.780496:ERROR:zygote_communication_linux.cc(142)] Did not receive ping from zygote child [pid=1185][err] [0323/062041.785120:ERROR:zygote_linux.cc(607)] Zygote could not fork: process_type gpu-process numfds 3 child_pid -1 [pid=1185][err] [0323/062041.785947:ERROR:gpu_process_host.cc(968)] GPU process launch failed: error_code=1002 [pid=1185][err] [0323/062041.785963:WARNING:gpu_process_host.cc(1279)] The GPU process has crashed 2 time(s) [pid=1185][err] [0323/062041.786835:ERROR:bus.cc(397)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory [pid=1185][err] [0323/062041.786892:ERROR:bus.cc(397)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory [pid=1185][err] [0323/062041.787157:WARNING:bluez_dbus_manager.cc(248)] Floss manager not present, cannot set Floss enable/disable. [pid=1185][err] [0323/062041.787335:ERROR:zygote_linux.cc(271)] Unexpected real PID message from browser [pid=1185][err] [0323/062041.816287:ERROR:zygote_communication_linux.cc(142)] Did not receive ping from zygote child [pid=1185][err] [0323/062041.815965:ERROR:zygote_linux.cc(607)] Zygote could not fork: process_type gpu-process numfds 3 child_pid -1 [pid=1185][err] [0323/062041.816903:ERROR:gpu_process_host.cc(968)] GPU process launch failed: error_code=1002 [pid=1185][err] [0323/062041.816914:WARNING:gpu_process_host.cc(1279)] The GPU process has crashed 3 time(s) [pid=1185][err] [0323/062041.816721:ERROR:zygote_linux.cc(271)] Unexpected real PID message from browser [pid=1185][err] [0323/062041.821091:ERROR:zygote_communication_linux.cc(142)] Did not receive ping from zygote child [pid=1185][err] [0323/062041.821123:ERROR:zygote_linux.cc(607)] Zygote could not fork: process_type gpu-process numfds 3 child_pid -1 [pid=1185][err] [0323/062041.821310:ERROR:gpu_process_host.cc(968)] GPU process launch failed: error_code=1002 [pid=1185][err] [0323/062041.821321:WARNING:gpu_process_host.cc(1279)] The GPU process has crashed 4 time(s) [pid=1185][err] [0323/062041.822089:ERROR:zygote_linux.cc(271)] Unexpected real PID message from browser [pid=1185][err] [0323/062041.823058:ERROR:zygote_communication_linux.cc(142)] Did not receive ping from zygote child [pid=1185][err] [0323/062041.823172:ERROR:zygote_linux.cc(607)] Zygote could not fork: process_type gpu-process numfds 3 child_pid -1 [pid=1185][err] [0323/062041.823358:ERROR:gpu_process_host.cc(968)] GPU process launch failed: error_code=1002 [pid=1185][err] [0323/062041.823369:WARNING:gpu_process_host.cc(1279)] The GPU process has crashed 5 time(s) [pid=1185][err] [0323/062041.824213:ERROR:zygote_linux.cc(271)] Unexpected real PID message from browser [pid=1185][err] [0323/062041.825010:ERROR:zygote_communication_linux.cc(142)] Did not receive ping from zygote child [pid=1185][err] [0323/062041.825129:ERROR:zygote_linux.cc(607)] Zygote could not fork: process_type gpu-process numfds 3 child_pid -1 [pid=1185][err] [0323/062041.825312:ERROR:gpu_process_host.cc(968)] GPU process launch failed: error_code=1002 [pid=1185][err] [0323/062041.825323:WARNING:gpu_process_host.cc(1279)] The GPU process has crashed 6 time(s) [pid=1185][err] [0323/062041.825608:ERROR:zygote_linux.cc(271)] Unexpected real PID message from browser [pid=1185][err] [0323/062041.825332:FATAL:gpu_data_manager_impl_private.cc(447)] GPU process isn't usable. Goodbye. [pid=1185][err] #0 0x55f9da32a369 base::debug::CollectStackTrace() [pid=1185][err] #1 0x55f9da2908c3 base::debug::StackTrace::StackTrace() [pid=1185][err] #2 0x55f9da2a3650 logging::LogMessage::~LogMessage() [pid=1185][err] #3 0x55f9d7e92bf7 content::(anonymous namespace)::IntentionallyCrashBrowserForUnusableGpuProcess() [pid=1185][err] #4 0x55f9d7e903fe content::GpuDataManagerImplPrivate::FallBackToNextGpuMode() [pid=1185][err] #5 0x55f9d7e8f303 content::GpuDataManagerImpl::FallBackToNextGpuMode() [pid=1185][err] #6 0x55f9d7e99d13 content::GpuProcessHost::RecordProcessCrash() [pid=1185][err] #7 0x55f9d7e9af44 content::GpuProcessHost::OnProcessLaunchFailed() [pid=1185][err] #8 0x55f9d7d15421 content::BrowserChildProcessHostImpl::OnProcessLaunchFailed() [pid=1185][err] #9 0x55f9d7d6faf5 content::internal::ChildProcessLauncherHelper::PostLaunchOnClientThread() [pid=1185][err] #10 0x55f9d7d6fd15 base::internal::Invoker<>::RunOnce() [pid=1185][err] #11 0x55f9da2e8bb0 base::TaskAnnotator::RunTaskImpl() [pid=1185][err] #12 0x55f9da2fca99 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWorkImpl() [pid=1185][err] #13 0x55f9da2fc7bc base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork() [pid=1185][err] #14 0x55f9da2fcf92 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork() [pid=1185][err] #15 0x55f9da2ac06b base::(anonymous namespace)::WorkSourceDispatch() [pid=1185][err] #16 0x7fe589c2b17d g_main_context_dispatch [pid=1185][err] #17 0x7fe589c2b400 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.6400.6+0x523ff) [pid=1185][err] #18 0x7fe589c2b4a3 g_main_context_iteration [pid=1185][err] #19 0x55f9da2abeb3 base::MessagePumpGlib::Run() [pid=1185][err] #20 0x55f9da2fd1fe base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::Run() [pid=1185][err] #21 0x55f9da2ca3ed base::RunLoop::Run() [pid=1185][err] #22 0x55f9d7d2d2ad content::BrowserMainLoop::RunMainMessageLoop() [pid=1185][err] #23 0x55f9d7d2eb62 content::BrowserMainRunnerImpl::Run() [pid=1185][err] #24 0x55f9df75683e headless::HeadlessContentMainDelegate::RunProcess() [pid=1185][err] #25 0x55f9d9e42862 content::RunBrowserProcessMain() [pid=1185][err] #26 0x55f9d9e43d0f content::ContentMainRunnerImpl::RunBrowser() [pid=1185][err] #27 0x55f9d9e4389f content::ContentMainRunnerImpl::Run() [pid=1185][err] #28 0x55f9d9e40cb4 content::RunContentProcess() [pid=1185][err] #29 0x55f9d9e415ce content::ContentMain() [pid=1185][err] #30 0x55f9d9e9cc5a headless::(anonymous namespace)::RunContentMain() [pid=1185][err] #31 0x55f9d9e9c965 headless::HeadlessShellMain() [pid=1185][err] #32 0x55f9d6961fa8 ChromeMain [pid=1185][err] #33 0x7fe588ea60b3 __libc_start_main [pid=1185][err] #34 0x55f9d6961dea _start [pid=1185][err] Task trace: [pid=1185][err] #0 0x55f9d7d6f9ac content::internal::ChildProcessLauncherHelper::PostLaunchOnLauncherThread() [pid=1185][err] #1 0x55f9d7d6f3aa content::internal::ChildProcessLauncherHelper::StartLaunchOnClientThread() [pid=1185][err] #2 0x55f9da682456 mojo::SimpleWatcher::Context::Notify() [pid=1185][err] #3 0x55f9d7d6f3aa content::internal::ChildProcessLauncherHelper::StartLaunchOnClientThread() [pid=1185][err] #4 0x55f9da682456 mojo::SimpleWatcher::Context::Notify() [pid=1185][err] Task trace buffer limit hit, update PendingTask::kTaskBacktraceLength to increase. [pid=1185][err]

    opened by yyyy777 0
Releases(v0.2.3)
  • v0.2.3(Jan 11, 2022)

  • v0.2.0(Dec 28, 2021)

    • New Feature: Add support for:
      • Specifying channel for launching
      • Specifying executablePath for launching
      • Specifying slowMo for launching
      • Specifying devtools for launching
      • Specifying --disable-extensions in args for launching
      • Specifying --hide-scrollbars in args for launching
      • Specifying --no-sandbox in args for launching
      • Specifying --disable-setuid-sandbox in args for launching
      • Specifying --disable-gpu in args for launching
    • Update: change GERAPY_PLAYWRIGHT_SLEEP default to 0
    Source code(tar.gz)
    Source code(zip)
  • v0.1.2(Dec 28, 2021)

  • v0.1.1(Dec 27, 2021)

    First version of Playwright, add basic support for:

    • Auto Installation
    • Render with Playwright
    • Setting Concurrency
    • Setting Proxy
    • Setting Cookies
    • Screenshot
    • Evaluating Script
    • Wait for Elements
    • Wait loading control
    • Setting Timeout
    • Pretending Webdriver
    Source code(tar.gz)
    Source code(zip)
Owner
Gerapy
Distributed Crawler Management Framework Based on Scrapy, Scrapyd.
Gerapy
Downloads state flags from wikipedia for states/regions from all countries

world-state-flags Downloads state flags from wikipedia for states/regions from all countries This data is NOT curated Uses https://github.com/dr5hn/co

João Ribeiro Bezerra 2 Dec 15, 2022
Download minecraft head or skin, allows TLauncher accounts

Minecraft-skin-downloader Download minecraft head or skin, allows TLauncher accounts by BoBkiNN_ Contact: https://vk.com/bobkinnvk Requirements: Modul

3 Apr 03, 2022
Downloads files and folders

PyDownloader Downloads files and folders at high speed (based on your interent speed). This is very useful to transfer big files from one computer to

ArmenG 4 Feb 24, 2022
A downloader for the ISIS service of TU Berlin

isis_dl A downloading utility for the ISIS tool of TU-Berlin. Version 0.4 Features Downloads all Material from all courses of your ISIS page. Efficien

1 Nov 06, 2021
Download from HBO-MAX-BLIM-TV-Paramount

#HBO MAX- BlimTV -Paramount plus 4K Downloader Tool To download 4K HDR DV SDR from HBO MAX- BlimTV -Paramount plus Hello Fellow Developers/ ! Hi! M

4 Dec 25, 2021
A cross platform front-end GUI of the popular youtube-dl written in wxPython.

youtube-dlG A cross platform front-end GUI of the popular youtube-dl media downloader written in wxPython. Supported sites Screenshots Requirements Py

8.7k Dec 31, 2022
Parallels Desktop dmg downloader

parallelsdesktop-dl Parallels Desktop dmg file downloader Usage usage: pd-dl [-h] [--dlv [DLV]] [-v] Parallels Desktop downloader optional arguments

2 Sep 13, 2022
Programmers-quest - Programmer's Quest! An open source MMO built on top of the Panda3D game engine and Astron server

Programmer's Quest! Programmer's Quest! The open source Python 3 2D MMORPG showc

Jordan Maxwell 5 Oct 07, 2022
🔥 A Bot To Telegram For Download High Qulity Videos & Songs From Youtube

🔥 A Bot To Telegram For Download High Qulity Videos & Songs From Youtube 🎗 Fast And Free Bot No Need To Pay ✅ By SL-Alpha-X-Team ⚡

Official Alpha-X-Team Account 7 Aug 31, 2022
Download YouTube videos that are available in the given playlist

Youtube-Playlist-Downloader Download YouTube videos that are in a playlist Project assets: music downloaded music folder. (will be generated) music.db

Sultan Aljaberi 1 Dec 22, 2021
Twitter Media Downloader (Telegram Bot)

Twitter Media Downloader (Telegram Bot)

Matin Baloochestani 8 Oct 27, 2022
Scripts to download files and folders programmatically from Google Drive

Google Drive Downloader Scripts Every time I need to download a lot of files from Google Drive (e.g. a dataset), it's always incredibly frustrating an

Ivan Evtimov 6 Jul 22, 2021
A growing collection of search plugins for the qBittorrent, an awesome and opensource torrent client

qBittorrent Search Plugins This is a still growing collection of search plugins for qBittorent, an amazing and open source torrent client, maintained

Alessio Tudisco 59 Dec 26, 2022
Itchio Downloader Tool with python

Itchio Downloader Tool Install pip install git+https://github.com/emersont1/itchio Download All Games in library from account python -m itchio.downloa

Peter Taylor 69 Dec 05, 2022
ComicDownloader - Downloads Comics from readcomiconline.li

ComicDownloader Downloads Comics from readcomiconline.li To use this script from

2 Nov 08, 2022
Python module to download all media from a GoFile gallery.

GoFile Downloader Setup First of all, clone this repository : ~$ git clone https://github.com/quatrecentquatre-404/gofile-downloader Second, oh wait..

Quatrecentquatre 61 Jan 01, 2023
Youtube-music - Youtube music with python

youtube-music fzf on https://github.com/junegunn/fzf python3 ytb.py [no/yes] yes

direskyfer 0 Feb 03, 2022
Ripurei is a free-to-use osu! replay downloader, that can be configured to download from any osu! server.

Ripurei Ripurei is a fully functional osu! replay downloader, fully capable of downloading from almost any osu! server. Functionality Timeline ✔️ Able

Thomas 0 Feb 11, 2022
Youtube playlist downloader with full metadata support

ytrake GUI tool to embed metadata for albums on Youtube with youtube-dl. Requires youtube-dl v2021.06.06. Post-processing Album metadata: Usage ytrake

28 Jul 12, 2022
Youtube videos and channels scraper python wrapper!

YouTubeCrawle Wrapper for python Why This wrapper? This is wrapper is not limited to videos only it can scrape both channel and videos seperately ;D

Kei 16 Aug 08, 2022