• [Storm] java.io.FileNotFoundException: File '../stormconf.ser' does not exist


    This bug will kill supervisors

    Affects Version/s: 0.9.2-incubating, 0.9.3, 0.9.4 

    Fix Version/s: 0.10.0, 0.9.5

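    Background

    I recently noticed that, not long after a freshly built Storm cluster came up, more than half of the Supervisors had quietly died. The logs of the dead Supervisors showed the exception java.io.FileNotFoundException: File '../stormconf.ser' does not exist. Most answers online say:

        Clear the files under the { storm.local.dir } directory and restart.

    But this treats the symptom rather than the cause: a restart gets things running again, yet it does not explain why the problem occurs.

    I then found that STORM-130 addresses this problem. The steps to reproduce it: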

    1) Run a Storm cluster with at least 2 supervisors with 4 slots each
    2) Deploy a topology that uses 4 workers; the topology will be distributed so that each supervisor has two workers
    3) Kill one of the supervisors, say supervisor1
    4) Wait until the topology re-balances to occupy 4 workers on supervisor2
    5) Now bring up supervisor1; it goes through the cycle of cleaning up old topology code
    6) Nimbus re-balances the topology, which triggers the supervisor's sync-processes function
    7) sync-processes tries to launch a worker for the topology whose code was deleted when the supervisor started, causing it to throw the FileNotFoundException shown above

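    Cause

    The sync-processes function mentioned in the scenario above is one of the functions the supervisor runs. The Supervisor runs these two functions in the background: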

    • synchronize-supervisor: This is called whenever assignments in Zookeeper change and also every 10 seconds. 
      • Downloads code from Nimbus for topologies assigned to this machine for which it doesn't have the code yet. 
      • Writes into local filesystem what this node is supposed to be running. It writes a map from port -> LocalAssignment. LocalAssignment contains a topology id as well as the list of task ids for that worker. 
    • sync-processes: Reads from the LFS what synchronize-supervisor wrote and compares that to what's actually running on the machine. It then starts/stops worker processes as necessary to synchronize. 
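    The comparison that sync-processes performs can be sketched roughly as follows. This is a minimal Python illustration, not Storm's actual implementation (the supervisor is written in Clojure); the names LocalAssignment fields, plan_sync, and the port and topology values are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LocalAssignment:
    """Hypothetical stand-in for the LocalAssignment record described above."""
    topology_id: str
    task_ids: Tuple[int, ...]


def plan_sync(assigned: Dict[int, LocalAssignment],
              running: Dict[int, LocalAssignment]) -> Tuple[List[int], List[int]]:
    """Return (ports_to_start, ports_to_stop) so that `running` matches `assigned`."""
    # Stop a worker if its port is no longer assigned, or is assigned differently.
    to_stop = sorted(p for p, a in running.items() if assigned.get(p) != a)
    # Start a worker on every assigned port that has no matching running worker.
    to_start = sorted(p for p, a in assigned.items() if running.get(p) != a)
    return to_start, to_stop


# What synchronize-supervisor wrote to the local filesystem (port -> LocalAssignment).
assigned = {
    6700: LocalAssignment("topo-1", (1, 2)),
    6701: LocalAssignment("topo-1", (3, 4)),
}
# What is actually running on the machine.
running = {
    6701: LocalAssignment("topo-1", (3, 4)),  # already matches the assignment
    6702: LocalAssignment("topo-0", (9,)),    # stale worker, no longer assigned
}

to_start, to_stop = plan_sync(assigned, running)
print("start:", to_start, "stop:", to_stop)
```

    A port whose assignment changed appears in both lists, so the old worker is stopped and a new one started.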

    As the description shows, synchronize-supervisor and sync-processes coordinate only through the LFS (local filesystem). The key reason for the bug is that the thread running synchronize-supervisor, which is responsible for downloading and removing files, and the thread running sync-processes, which is responsible for starting worker processes, run asynchronously.

    synchronize-supervisor reads the assignment information from ZooKeeper, downloads the necessary files from Nimbus, and writes the local state. In another thread, sync-processes reads that local state in order to launch worker processes. If synchronize-supervisor is called again before a worker process has started, and the topology's assignment has changed in the meantime (because of a rebalance, a worker timeout, etc.) so that the worker assigned to this supervisor has moved to another supervisor, synchronize-supervisor removes the files that are no longer needed (the jar file, the ser file, etc.). When the worker launched by sync-processes then starts, the ser file no longer exists, and this issue occurs.
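    The interleaving described above can be replayed deterministically in a few lines. The sketch below is a simplified Python model, not Storm code: synchronize_supervisor and launch_worker are hypothetical stand-ins, the "download" is just a file write into a temporary directory, and the two threads are replaced by explicit steps in the problematic order.

```python
import os
import shutil
import tempfile


def synchronize_supervisor(local_dir, local_state, zk_assignment):
    """Fetch code for assigned topologies, delete code for unassigned ones."""
    for topo in zk_assignment.values():
        topo_dir = os.path.join(local_dir, topo)
        if not os.path.isdir(topo_dir):
            os.makedirs(topo_dir)
            # Stands in for downloading the jar and conf from Nimbus.
            with open(os.path.join(topo_dir, "stormconf.ser"), "w") as f:
                f.write("conf")
    for name in os.listdir(local_dir):
        if name not in zk_assignment.values():
            # Code of topologies no longer assigned here is removed.
            shutil.rmtree(os.path.join(local_dir, name))
    # Write the local state (port -> topology id) that sync-processes reads.
    local_state.clear()
    local_state.update(zk_assignment)


def launch_worker(local_dir, topo):
    """Stands in for worker start-up, which reads the topology conf."""
    with open(os.path.join(local_dir, topo, "stormconf.ser")) as f:
        return f.read()


local_dir = tempfile.mkdtemp()
local_state = {}
try:
    # 1. An assignment arrives: code is downloaded, local state is written.
    synchronize_supervisor(local_dir, local_state, {6700: "topo-1"})
    # 2. sync-processes reads the local state and decides to launch a worker.
    pending = dict(local_state)
    # 3. Before the launch happens, the assignment moves to another
    #    supervisor and synchronize-supervisor removes the code.
    synchronize_supervisor(local_dir, local_state, {})
    # 4. The launch planned in step 2 now finds no stormconf.ser.
    try:
        for port, topo in pending.items():
            launch_worker(local_dir, topo)
        outcome = "launched"
    except FileNotFoundError:
        outcome = "FileNotFoundError"
finally:
    shutil.rmtree(local_dir)

print(outcome)
```

    The failure depends only on step 3 falling between steps 2 and 4, which is why it shows up intermittently on a real cluster.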

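    Possible solutions

    • Switch to a different Storm version (the fix versions listed above are 0.9.5 and 0.10.0)
    • Tune parameters
      • Change the "synchronize-supervisor" thread loop interval to something longer than the default 10 seconds, such as 30 seconds.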
      • supervisor.worker.timeout.secs: 30 -> 5

    References:

    • https://issues.apache.org/jira/browse/STORM-130
    • http://storm.apache.org/documentation/Lifecycle-of-a-topology.html

     

  • Original article: https://www.cnblogs.com/qingwen/p/4997302.html