• A rather unsophisticated analysis of Linux hangs


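    While debugging a feature I ran into two kinds of hang. I used to assume that a hang was always a soft lockup caused by a lock. I later found that no actual lock structure is required: any code that behaves like a lock can trigger a hang as well. I am writing this down so that others can avoid the same detour.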

    unsigned find_get_pages(struct address_space *mapping, pgoff_t start,
    			    unsigned int nr_pages, struct page **pages)
    {
    	struct radix_tree_iter iter;
    	void **slot;
    	unsigned ret = 0;
    
    	if (unlikely(!nr_pages))
    		return 0;
    
    	rcu_read_lock();
    restart:
    	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
    		struct page *page;
    repeat:
    		page = radix_tree_deref_slot(slot);
    		if (unlikely(!page))
    			continue;
    
    		if (radix_tree_exception(page)) {
    			if (radix_tree_deref_retry(page)) {
    				/*
    				 * Transient condition which can only trigger
    				 * when entry at index 0 moves out of or back
    				 * to root: none yet gotten, safe to restart.
    				 */
    				WARN_ON(iter.index);
    				goto restart;
    			}
    			/*
    			 * A shadow entry of a recently evicted page,
    			 * or a swap entry from shmem/tmpfs.  Skip
    			 * over it.
    			 */
    			continue;
    		}
    
    		if (!page_cache_get_speculative(page))	/* the hang was observed here */
    			goto repeat;
    
    		/* Has the page moved? */
    		if (unlikely(page != *slot)) {
    			page_cache_release(page);
    			goto repeat;
    		}
    
    		pages[ret] = page;
    		if (++ret == nr_pages)
    			break;
    	}
    
    	rcu_read_unlock();
    	return ret;
    }
    

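    Why does it hang here? After `goto repeat`, execution reaches page_cache_get_speculative again. That function checks whether the page's reference count is 0: if it is not 0, the count is atomically incremented by 1; otherwise the caller jumps back to repeat.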
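The "increment unless zero" behavior described above can be modeled in userspace with C11 atomics. This is only a sketch for illustration, not the kernel implementation: `struct fake_page` and `fake_get_unless_zero` are hypothetical names, and the real `struct page` has many more fields.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page: only the refcount. */
struct fake_page {
	atomic_int count;
};

/*
 * Take a reference only if the count is currently nonzero.
 * Returns true if the count was incremented, false if it was 0.
 */
static bool fake_get_unless_zero(struct fake_page *page)
{
	int old = atomic_load(&page->count);

	assert(old >= 0);
	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&page->count, &old, old + 1))
			return true;
	}
	return false;
}
```

A caller that reacts to `false` by retrying unconditionally will spin for as long as the count stays at 0.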

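    So how can a page keep a count of 0 and still be in the radix tree? That is what the next check is for: if the page has been removed, the slot in the tree now holds a new page, so the code fetches the page again and takes the reference on that one. In my case, however, the page's count was 0 and the page had not been removed from the radix tree, so the loop kept repeating at page_cache_get_speculative. The root cause was, of course, incorrect management of the page's reference count, but what the kernel showed was a hang.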
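The spin described above can be reproduced in a single-threaded userspace model. The sketch below is an assumption-laden illustration: `fake_page`, `fake_lookup_slot` and the `max_retries` budget are all hypothetical and do not exist in the kernel, where the loop has no budget and simply never exits. The budget is there only so that the demonstration terminates and reports the stuck state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_page {
	atomic_int count;	/* models the page reference count */
};

enum lookup_result { LOOKUP_OK, LOOKUP_EMPTY, LOOKUP_STUCK };

/* Increment the count unless it is 0; returns true on success. */
static bool fake_get_unless_zero(struct fake_page *page)
{
	int old = atomic_load(&page->count);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&page->count, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Model of the "repeat" loop for one slot, with a retry budget.
 * LOOKUP_STUCK means the page stayed in the slot with count 0 for the
 * whole budget, which is the situation that hangs in the kernel.
 */
static enum lookup_result fake_lookup_slot(struct fake_page **slot,
					   int max_retries,
					   struct fake_page **out)
{
	int retries;

	assert(max_retries > 0);
	for (retries = 0; retries < max_retries; retries++) {
		struct fake_page *page = *slot;

		if (!page)
			return LOOKUP_EMPTY;
		if (!fake_get_unless_zero(page))
			continue;	/* the kernel code does "goto repeat" */
		if (page != *slot) {
			/* Page moved: drop the reference and retry. */
			atomic_fetch_sub(&page->count, 1);
			continue;
		}
		*out = page;
		return LOOKUP_OK;
	}
	return LOOKUP_STUCK;
}
```

With a page whose count is 0 left in the slot, every iteration takes the first `continue`, so only the budget ends the loop.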

    I also ran into a hang on SUSE once, in this same function. The code I had written at the time was:

    unsigned caq_find_get_pages(struct address_space *mapping, pgoff_t start,pgoff_t end,
    			    unsigned int nr_pages, struct page **pages,pgoff_t *indices)
    {
    	unsigned int i;
    	unsigned int ret;
    	unsigned int nr_found, nr_skip;
    	unsigned int overrange=0;
    
    	if (unlikely(!nr_pages)||(!pages)||(!indices))
    		return 0;
    
    	rcu_read_lock();
    restart:
    	nr_found = radix_tree_gang_lookup_slot(&mapping->page_tree,
    				(void ***)pages, indices, start, nr_pages);
    	ret = 0;
    	nr_skip = 0;
    	for (i = 0; i < nr_found; i++) {
    		struct page *page;
    repeat:
    		page = radix_tree_deref_slot((void **)pages[i]);
    		if (unlikely(!page))
    			continue;
    
    		if (radix_tree_exception(page)) {
    			if (radix_tree_exceptional_entry(page)) {
    				nr_skip++;
    				continue;
    			}
    			/*
    			 * radix_tree_deref_retry(page):
    			 * can only trigger when entry at index 0 moves out of
    			 * or back to root: none yet gotten, safe to restart.
    			 */
    			WARN_ON(start | i);
    			goto restart;
    		}
    
    		/* added by caq: skip any page whose count is not 1 */
    		if (atomic_read(&page->_count) != 1)	/* the check I added */
    			continue;
    
    		if (!page_cache_get_speculative(page))
    			goto repeat;
    
    		/* Has the page moved? */
    		if (unlikely(page != *((void **)pages[i]))) {
    			page_cache_release(page);
    			goto repeat;
    		}
    		pages[ret] = page;	/* the reference count has already been incremented here */
    		ret++;
    		if(page->index>=end)
    		{
    			overrange=1;
    			break;
    		}
    	}
    
    	/*
    	 * If all entries were removed before we could secure them,
    	 * try again, because callers stop trying once 0 is returned.
    	 */
    	if (unlikely(!ret && nr_found > nr_skip && !overrange))
    		goto restart;
    	rcu_read_unlock();
    	return ret;
    }
    

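    My intent was to avoid picking up pages whose count was greater than 1, because I needed to do some special processing on the pages I collected. But because multiple users were reading the same page, its count stayed above 1 and the page was skipped on every pass. The skipped pages are not added to nr_skip, so ret stays 0 while nr_found > nr_skip still holds, and the final check jumps back to restart each time. The loop could not exit, and the result was a hang.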
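The restart condition can be examined in a small userspace model. This is a sketch under stated assumptions, not the fix that was actually applied: `fake_scan`, `count_skips` and `max_restarts` are hypothetical names, the counts are static rather than changing concurrently, and counting filtered pages in nr_skip is just one possible way to let the loop end.

```c
#include <assert.h>
#include <stddef.h>

enum scan_result { SCAN_DONE, SCAN_STUCK };

/*
 * counts[i] models page->_count of the i-th page found by the lookup.
 * count_skips selects whether pages rejected by the "count != 1" filter
 * are added to nr_skip. max_restarts bounds the "goto restart" loop so
 * that the model terminates.
 */
static enum scan_result fake_scan(const int *counts, size_t nr_found,
				  int count_skips, int max_restarts,
				  size_t *nr_taken)
{
	int restarts;

	assert(max_restarts > 0);
	for (restarts = 0; restarts < max_restarts; restarts++) {
		size_t ret = 0, nr_skip = 0, i;

		for (i = 0; i < nr_found; i++) {
			if (counts[i] != 1) {
				if (count_skips)
					nr_skip++;
				continue;
			}
			ret++;	/* page accepted */
		}
		*nr_taken = ret;
		/* Same shape as the "goto restart" condition in the post. */
		if (!(ret == 0 && nr_found > nr_skip))
			return SCAN_DONE;
	}
	return SCAN_STUCK;
}
```

When every found page has a count other than 1 and the filtered pages are not counted as skipped, the model exhausts its restart budget, which corresponds to the endless loop seen in the kernel.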

    My knowledge is limited, so if you find any mistakes, please let me know. All rights reserved; if you repost this article, please include a link to the original.
  • Original post: https://www.cnblogs.com/10087622blog/p/9394162.html