Introduction

Go Crawler Documentation

About Go Crawler

Go Crawler is a web crawling framework implemented in Go. It is simple to write spiders with, yet delivers strong performance. It comes with a wide range of practical middleware, supports multiple parsing and storage methods, and supports distributed deployment.

Run

git clone git@github.com:lizongying/go-crawler-example.git my-crawler
cd my-crawler
go run cmd/multi_spider/*.go -c example.yml -n test1 -m once
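
Here -c points at the configuration file and -n at the spider to run; -m sets the run mode, where once appears to correspond to the one-time tasks listed under Support Summary below.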

Features

  • Simple to write, yet delivers strong performance.
  • Comes with a variety of practical built-in middleware for easier development.
  • Supports multiple parsing methods for simpler page parsing.
  • Supports multiple storage methods for more flexible data storage.
  • Provides numerous configuration options for richer customization.
  • Allows custom components, giving more freedom for feature extensions.
  • Includes a built-in mock server for convenient debugging and development.
  • Supports distributed deployment.

Support Summary

  • Parsing supports CSS, XPath, Regex, and JSON (see the parsing sketch after this list).
  • Output supports JSON, CSV, MongoDB, MySQL, SQLite, and Kafka.
  • Supports decoding of the GB2312, GB18030, GBK, and Big5 Chinese character encodings (see the decoding sketch after this list).
  • Supports gzip, deflate, and brotli decompression (see the decompression sketch after this list).
  • Supports distributed processing.
  • Supports Redis and Kafka as message queues.
  • Supports automatic handling of cookies and redirects (see the HTTP client sketch after this list).
  • Supports BaseAuth authentication.
  • Supports request retries.
  • Supports request filtering.
  • Supports image file downloads.
  • Supports image processing.
  • Supports object storage.
  • Supports SSL fingerprint modification.
  • Supports HTTP/2.
  • Supports random request headers.
  • Supports browser simulation.
  • Supports browser AJAX requests.
  • Supports a mock server.
  • Supports priority queues.
  • Supports scheduled tasks, recurring tasks, and one-time tasks.
  • Supports parsing based on field labels.
  • Supports DNS caching.
  • Supports MITM.
  • Supports error logging.
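
As a standalone illustration of the parsing methods above (go-crawler ships its own response helpers, so the packages here are stand-ins, not framework APIs), the following Go sketch extracts data with a CSS selector via the goquery package, a regular expression via the standard regexp package, and JSON via encoding/json:

package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"

	"github.com/PuerkitoBio/goquery"
)

func main() {
	html := `<html><body><h1 class="title">Hello</h1><a href="/next">next</a></body></html>`

	// CSS: select the heading text (stands in for the framework's CSS parser).
	doc, err := goquery.NewDocumentFromReader(strings.NewReader(html))
	if err != nil {
		panic(err)
	}
	fmt.Println(doc.Find("h1.title").Text()) // Hello

	// Regex: pull the link target out of the raw body.
	re := regexp.MustCompile(`href="([^"]+)"`)
	fmt.Println(re.FindStringSubmatch(html)[1]) // /next

	// JSON: decode an API-style response into a struct.
	var v struct {
		Name string `json:"name"`
	}
	if err := json.Unmarshal([]byte(`{"name":"go-crawler"}`), &v); err != nil {
		panic(err)
	}
	fmt.Println(v.Name) // go-crawler
}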
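
The character-set decoding listed above can be reproduced outside the framework with the golang.org/x/text packages. A minimal sketch, assuming a GBK-encoded body; the GB2312, GB18030, and Big5 decoders from the same packages work identically:

package main

import (
	"bytes"
	"fmt"
	"io"

	"golang.org/x/text/encoding/simplifiedchinese"
	"golang.org/x/text/transform"
)

func main() {
	// "你好" encoded as GBK bytes, standing in for a raw response body.
	gbk := []byte{0xc4, 0xe3, 0xba, 0xc3}

	// Wrap the raw body in a decoding reader; simplifiedchinese.GB18030,
	// simplifiedchinese.HZGB2312, and traditionalchinese.Big5 work the same way.
	r := transform.NewReader(bytes.NewReader(gbk), simplifiedchinese.GBK.NewDecoder())
	utf8Bytes, err := io.ReadAll(r)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(utf8Bytes)) // 你好
}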
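
For the decompression support, here is a standalone sketch of the technique (not go-crawler's internal code): pick a reader based on the Content-Encoding header, using the standard library for gzip and deflate, and the github.com/andybalholm/brotli package, assumed here, for brotli:

package main

import (
	"compress/flate"
	"compress/gzip"
	"fmt"
	"io"
	"net/http"

	"github.com/andybalholm/brotli"
)

// decodeBody chooses a reader based on the Content-Encoding header.
// Note that Go's default transport already un-gzips transparently when it
// requested the compression itself; this handles the explicit cases.
func decodeBody(resp *http.Response) (io.Reader, error) {
	switch resp.Header.Get("Content-Encoding") {
	case "gzip":
		return gzip.NewReader(resp.Body)
	case "deflate":
		return flate.NewReader(resp.Body), nil
	case "br":
		return brotli.NewReader(resp.Body), nil
	default:
		return resp.Body, nil
	}
}

func main() {
	resp, err := http.Get("https://example.com")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	r, err := decodeBody(resp)
	if err != nil {
		panic(err)
	}
	body, err := io.ReadAll(r)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(body))
}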
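
Cookie handling, redirects, BaseAuth, and retries can all be sketched with the standard net/http client. The retry count and backoff policy below are illustrative assumptions, not go-crawler's defaults:

package main

import (
	"fmt"
	"net/http"
	"net/http/cookiejar"
	"time"
)

func main() {
	// A cookie jar makes the client carry cookies across requests and
	// redirects; net/http follows redirects automatically by default.
	jar, err := cookiejar.New(nil)
	if err != nil {
		panic(err)
	}
	client := &http.Client{Jar: jar, Timeout: 10 * time.Second}

	// Simple retry loop with linear backoff (illustrative policy).
	var resp *http.Response
	for attempt := 1; attempt <= 3; attempt++ {
		req, err := http.NewRequest(http.MethodGet, "https://example.com", nil)
		if err != nil {
			panic(err)
		}
		req.SetBasicAuth("user", "pass") // HTTP basic (BaseAuth) credentials

		resp, err = client.Do(req)
		if err == nil && resp.StatusCode < 500 {
			break
		}
		if resp != nil {
			resp.Body.Close()
			resp = nil
		}
		time.Sleep(time.Duration(attempt) * time.Second)
	}
	if resp == nil {
		panic("request failed after retries")
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}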