• nodejs爬虫,服务器经常返回ECONNRESET或者socket hang up错误的解决方法


    最近在做一个超大爬虫,总的http请求数量接近一万次,而且都是对同一个url发出的请求,只是请求的body有所不同。

    测试发现,经常会出现request error: read ECONNRESET、request error: socket hang up 这样的错误,查看日志发现每次出错的请求的body并不一样,并且通过正常方式(手机APP)访问这个资源时不会出错。 虽然已经用了for ... of + await 的语法代替forEach,限制了同一时间的异步请求数量,但每次这1万个请求中总会出现4 5 百个这样的错误。猜测是服务器有某种保护机制,拒绝了同一IP的过多请求,搜索一番后在以下链接中找到解决方法:

    https://www.gregjs.com/javascript/2015/how-to-scrape-the-web-gently-with-node-js/

    Limiting maximum concurrent sockets in Node

    即限制并发套接字的最大数量

    1 var http = require('http');
    2 var https = require('https');
    3 http.globalAgent.maxSockets = 1;
    4 https.globalAgent.maxSockets = 1;

    而我用的是request包,参考npm文档(https://www.npmjs.com/package/request):

    pool - an object describing which agents to use for the request. If this option is omitted the request will use the global agent (as long as your options allow for it). Otherwise, request will search the pool for your custom agent. If no custom agent is found, a new agent will be created and added to the pool. Note: pool is used only when the agent option is not specified.

    • maxSockets property can also be provided on the pool object to set the max number of sockets for all agents created (ex: pool: {maxSockets: Infinity}).
    • Note that if you are sending multiple requests in a loop and creating multiple new pool objects, maxSockets will not work as intended. To work around this, either use request.defaults with your pool options or create the pool object with the maxSockets property outside of the loop.

    于是,在option中加入,即可解决。

    当然具体最大值设为多少合适需要多次测试,值太低则爬虫耗时过长,太高又可能出现之前的错误。

  • 相关阅读:
    分布式事务之最终一致性BASE理论
    CAP理论
    Comparator中返回0导致数据丢失的大坑
    电脑主板分类
    SimpleDateFormat线程不安全
    Redis面试题
    JS闭包
    ES6将两个数组合并成一个对象数组
    视频色彩空间RGB、YUV、YCbCr
    c#接口作用的深入理解
  • 原文地址:https://www.cnblogs.com/mrlonely2018/p/12930174.html
Copyright © 2020-2023  润新知