• Understanding gfp_xxx


    The gfp_mask parameter can take many values. The purpose of each is listed below (quoted directly from LKD):

    #define GFP_ATOMIC    (__GFP_HIGH)
    GFP_ATOMIC  The allocation is high priority and must not sleep. This is the flag to use in interrupt handlers, in bottom halves, while holding a spinlock, and in other situations where you cannot sleep.

    #define GFP_NOWAIT    (GFP_ATOMIC & ~__GFP_HIGH)    Roughly: the allocation must be atomic (it cannot sleep), but it does not get GFP_ATOMIC's high priority.
    GFP_NOWAIT  Like GFP_ATOMIC, except that the call will not fall back on emergency memory pools. This increases the likelihood of the memory allocation failing.
     
    #define GFP_NOIO    (__GFP_WAIT)
    GFP_NOIO  This allocation can block, but must not initiate disk I/O. This is the flag to use in block I/O code when you cannot cause more disk I/O, which might lead to some unpleasant recursion.
     
    #define GFP_NOFS    (__GFP_WAIT | __GFP_IO)
    GFP_NOFS  This allocation can block and can initiate disk I/O, if it must, but it will not initiate a filesystem operation. This is the flag to use in filesystem code when you cannot start another filesystem operation.

    #define GFP_KERNEL    (__GFP_WAIT | __GFP_IO | __GFP_FS)
    GFP_KERNEL  This is a normal allocation and might block. This is the flag to use in process context code when it is safe to sleep. The kernel will do whatever it has to do to obtain the memory requested by the caller. This flag should be your default choice.
     
    #define GFP_TEMPORARY    (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_RECLAIMABLE)
    #define GFP_USER    (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
    GFP_USER  This is a normal allocation and might block. This flag is used to allocate memory for user-space processes.
     
    #define GFP_HIGHUSER    (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL | __GFP_HIGHMEM)
    GFP_HIGHUSER  This is an allocation from ZONE_HIGHMEM and might block. This flag is used to allocate memory for user-space processes.
     
    #define GFP_HIGHUSER_MOVABLE    (__GFP_WAIT | __GFP_IO | __GFP_FS | \
                     __GFP_HARDWALL | __GFP_HIGHMEM | \
                     __GFP_MOVABLE)
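    #define GFP_IOFS    (__GFP_IO | __GFP_FS)    Following the pattern, this seems to say the allocation may start I/O and filesystem operations but may not sleep (are there I/O or filesystem operations that are guaranteed not to sleep?)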
     
    #define GFP_DMA        __GFP_DMA
    GFP_DMA  This is an allocation from ZONE_DMA. Device drivers that need DMA-able memory use this flag, usually in combination with one of the preceding flags.


    The double-underscore __GFP_XXX flags are essentially the counterparts of the triple-underscore ___GFP_XXX definitions, for example:
    #define ___GFP_WAIT        0x10u
    #define __GFP_WAIT    ((__force gfp_t)___GFP_WAIT)

    A few points here are worth getting clear:
    - The triple-underscore ___GFP_XXX values are the lowest-level definitions; each flag is a single bit, 2^n.
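    - The first four ___GFP_XXX definitions, 2^0 through 2^3, form the lowest 4 bits of gfp_t and select the page zone. The remaining definitions occupy the next __GFP_BITS_SHIFT - 4 bits of gfp_t and carry behavior flags such as __GFP_IO and __GFP_FS, some of which are described above. __GFP_BITS_SHIFT is currently defined as 25, and __GFP_BITS_MASK is the corresponding mask: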
    #define __GFP_BITS_SHIFT 25    /* Room for N __GFP_FOO bits */
    #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))

    The more interesting part:

    #if MAX_NR_ZONES < 2
    #define ZONES_SHIFT 0
    #elif MAX_NR_ZONES <= 2
    #define ZONES_SHIFT 1
    #elif MAX_NR_ZONES <= 4
    #define ZONES_SHIFT 2
    #else
    #error ZONES_SHIFT -- too many zones configured adjust calculation
    #endif
    /*
     * GFP_ZONE_TABLE is a word size bitstring that is used for looking up the
     * zone to use given the lowest 4 bits of gfp_t. Entries are ZONE_SHIFT long
     * and there are 16 of them to cover all possible combinations of
     * __GFP_DMA, __GFP_DMA32, __GFP_MOVABLE and __GFP_HIGHMEM.
     *
     * The zone fallback order is MOVABLE=>HIGHMEM=>NORMAL=>DMA32=>DMA.
     * But GFP_MOVABLE is not only a zone specifier but also an allocation
     * policy. Therefore __GFP_MOVABLE plus another zone selector is valid.
     * Only 1 bit of the lowest 3 bits (DMA,DMA32,HIGHMEM) can be set to "1".
     *
     *       bit       result
     *       =================
     *       0x0    => NORMAL
     *       0x1    => DMA or NORMAL
     *       0x2    => HIGHMEM or NORMAL
     *       0x3    => BAD (DMA+HIGHMEM)
     *       0x4    => DMA32 or DMA or NORMAL
     *       0x5    => BAD (DMA+DMA32)
     *       0x6    => BAD (HIGHMEM+DMA32)
     *       0x7    => BAD (HIGHMEM+DMA32+DMA)
     *       0x8    => NORMAL (MOVABLE+0)
     *       0x9    => DMA or NORMAL (MOVABLE+DMA)
     *       0xa    => MOVABLE (Movable is valid only if HIGHMEM is set too)
     *       0xb    => BAD (MOVABLE+HIGHMEM+DMA)
     *       0xc    => DMA32 (MOVABLE+HIGHMEM+DMA32)
     *       0xd    => BAD (MOVABLE+DMA32+DMA)
     *       0xe    => BAD (MOVABLE+DMA32+HIGHMEM)
     *       0xf    => BAD (MOVABLE+DMA32+HIGHMEM+DMA)
     *
     * ZONES_SHIFT must be <= 2 on 32 bit platforms.
     */
    
    #if 16 * ZONES_SHIFT > BITS_PER_LONG
    #error ZONES_SHIFT too large to create GFP_ZONE_TABLE integer
    #endif
    
    #define GFP_ZONE_TABLE ( \
        (ZONE_NORMAL << 0 * ZONES_SHIFT)                      \
        | (OPT_ZONE_DMA << ___GFP_DMA * ZONES_SHIFT)                  \
        | (OPT_ZONE_HIGHMEM << ___GFP_HIGHMEM * ZONES_SHIFT)              \
        | (OPT_ZONE_DMA32 << ___GFP_DMA32 * ZONES_SHIFT)              \
        | (ZONE_NORMAL << ___GFP_MOVABLE * ZONES_SHIFT)                  \
        | (OPT_ZONE_DMA << (___GFP_MOVABLE | ___GFP_DMA) * ZONES_SHIFT)          \
        | (ZONE_MOVABLE << (___GFP_MOVABLE | ___GFP_HIGHMEM) * ZONES_SHIFT)   \
        | (OPT_ZONE_DMA32 << (___GFP_MOVABLE | ___GFP_DMA32) * ZONES_SHIFT)   \
    )
    
    /*
     * GFP_ZONE_BAD is a bitmap for all combinations of __GFP_DMA, __GFP_DMA32
     * __GFP_HIGHMEM and __GFP_MOVABLE that are not permitted. One flag per
     * entry starting with bit 0. Bit is set if the combination is not
     * allowed.
     */
    #define GFP_ZONE_BAD ( \
        1 << (___GFP_DMA | ___GFP_HIGHMEM)                      \
        | 1 << (___GFP_DMA | ___GFP_DMA32)                      \
        | 1 << (___GFP_DMA32 | ___GFP_HIGHMEM)                      \
        | 1 << (___GFP_DMA | ___GFP_DMA32 | ___GFP_HIGHMEM)              \
        | 1 << (___GFP_MOVABLE | ___GFP_HIGHMEM | ___GFP_DMA)              \
        | 1 << (___GFP_MOVABLE | ___GFP_DMA32 | ___GFP_DMA)              \
        | 1 << (___GFP_MOVABLE | ___GFP_DMA32 | ___GFP_HIGHMEM)              \
        | 1 << (___GFP_MOVABLE | ___GFP_DMA32 | ___GFP_DMA | ___GFP_HIGHMEM)  \
    )
    
    static inline enum zone_type gfp_zone(gfp_t flags)
    {
        enum zone_type z;
        int bit = (__force int) (flags & GFP_ZONEMASK);
    
        z = (GFP_ZONE_TABLE >> (bit * ZONES_SHIFT)) &
                         ((1 << ZONES_SHIFT) - 1);
        VM_BUG_ON((GFP_ZONE_BAD >> bit) & 1);
        return z;
    }
    
    /*
     * There is only one page-allocator function, and two main namespaces to
     * it. The alloc_page*() variants return 'struct page *' and as such
     * can allocate highmem pages, the *get*page*() variants return
     * virtual kernel addresses to the allocated page(s).
     */
    
    static inline int gfp_zonelist(gfp_t flags)
    {
        if (NUMA_BUILD && unlikely(flags & __GFP_THISNODE))
            return 1;
    
        return 0;
    }

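    GFP_ZONE_TABLE and GFP_ZONE_BAD are key-value tables. Their keys are the combinations of the lowest 4 bits of the __GFP_XXX flags, which gives 2^4 = 16 possible keys. GFP_ZONE_TABLE covers the valid combinations and GFP_ZONE_BAD marks the invalid ones. A key is simply the index of its value inside the bitstring. In GFP_ZONE_TABLE each entry is ZONES_SHIFT bits wide. On a 32-bit CPU the table is a 32-bit bit array, so the 16 keys share those 32 bits equally and each value can be at most 2 bits wide. GFP_ZONE_BAD needs only one bit per key. The width limit for GFP_ZONE_TABLE is exactly what the following check enforces: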

    #if 16 * ZONES_SHIFT > BITS_PER_LONG
    #error ZONES_SHIFT too large to create GFP_ZONE_TABLE integer
    #endif
    

    Each GFP_ZONE_TABLE value is a zone type. Conveniently, the ZONES_SHIFT check above allows at most four zone types (from DMA/DMA32, NORMAL, HIGHMEM and MOVABLE), and four values fit exactly in 2 bits.

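    gfp_zone is the helper that returns the GFP_ZONE_TABLE value for a given key. The zone it returns plays a key role in alloc_pages. Briefly, 3.x kernels simplified the alloc_pages allocation policy. Previously each pg_data_t held several allocation policies. A policy might say "try certain zones on the local node first, and if that fails, try certain zones on another node". Each such preference order was stored in a zonelist, the zonelists were kept in the node_zonelists array of pg_data_t, and each array entry acted as one policy for the caller to choose from.

    In 3.x code there are at most two policies. node_zonelists[0] holds a list (really an array of struct zoneref) covering every zone of every pg_data_t. node_zonelists[1] is the one gfp_zonelist() above selects when __GFP_THISNODE is set on a NUMA build. Each zoneref has a zone_idx member whose value is MOVABLE, HIGHMEM, NORMAL or DMA/DMA32. The zonerefs are sorted MOVABLE => HIGHMEM => NORMAL => DMA/DMA32, so zones of the same type from different pg_data_t sit next to each other. This is the zone-ordered layout; the kernel can also order the list by node. Within each group of same-type zones the local node presumably comes first, giving: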

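    local MOVABLE => remote MOVABLE => local HIGHMEM => remote HIGHMEM => ... With this structure the allocation policy is easy to determine. First, gfp_zone derives the highest zone type the flags allow. The allocator then walks the list to the first zone whose type is at or below that type and tries to allocate there. If that fails, it tries the following zones.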

    The code is as follows:

    /* Returns the next zone at or below highest_zoneidx in a zonelist */
    struct zoneref *next_zones_zonelist(struct zoneref *z,
                        enum zone_type highest_zoneidx,
                        nodemask_t *nodes,
                        struct zone **zone)
    {
        /*
         * Find the next suitable zone to use for the allocation.
         * Only filter based on nodemask if it's set
         */
        if (likely(nodes == NULL))
            while (zonelist_zone_idx(z) > highest_zoneidx)
                z++;
        else
            while (zonelist_zone_idx(z) > highest_zoneidx ||
                    (z->zone && !zref_in_nodemask(z, nodes)))
                z++;
    
        *zone = zonelist_zone(z);
        return z;
    }
    
    
    /**
     * first_zones_zonelist - Returns the first zone at or below highest_zoneidx within the allowed nodemask in a zonelist
     * @zonelist - The zonelist to search for a suitable zone
     * @highest_zoneidx - The zone index of the highest zone to return
     * @nodes - An optional nodemask to filter the zonelist with
     * @zone - The first suitable zone found is returned via this parameter
     *
     * This function returns the first zone at or below a given zone index that is
     * within the allowed nodemask. The zoneref returned is a cursor that can be
     * used to iterate the zonelist with next_zones_zonelist by advancing it by
     * one before calling.
     */
    static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist,
                        enum zone_type highest_zoneidx,
                        nodemask_t *nodes,
                        struct zone **zone)
    {
        return next_zones_zonelist(zonelist->_zonerefs, highest_zoneidx, nodes,
                                    zone);
    }
    
    /**
     * for_each_zone_zonelist_nodemask - helper macro to iterate over valid zones in a zonelist at or below a given zone index and within a nodemask
     * @zone - The current zone in the iterator
     * @z - The current pointer within zonelist->zones being iterated
     * @zlist - The zonelist being iterated
     * @highidx - The zone index of the highest zone to return
     * @nodemask - Nodemask allowed by the allocator
     *
     * This iterator iterates though all zones at or below a given zone index and
     * within a given nodemask
     */
    #define for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, nodemask) \
        for (z = first_zones_zonelist(zlist, highidx, nodemask, &zone);    \
            zone;                            \
            z = next_zones_zonelist(++z, highidx, nodemask, &zone))    \

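    GFP_ZONE_BAD is even simpler. The keys are the same as before, but each value is a single bit, and a 1 means the combination is invalid. Given a key, you find its position in the bit array and check whether the bit there is 1.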

     



  • Original article: https://www.cnblogs.com/zylthinking/p/2626015.html
Copyright © 2020-2023  润新知