Reposted from: https://blog.csdn.net/tgxallen/article/details/78086360
Reading the source is the most direct and effective way to understand a technology. I once built a server program on Linux epoll, but only at the level of knowing how to use it, with little grasp of the underlying principles and implementation details. Recently I have been reading the Linux epoll source, so this post walks through the epoll implementation in detail. If anything is inaccurate or missing, corrections are welcome.
This post covers:
- Key data structures in the epoll implementation
- Source analysis of the key epoll functions
- How ET and LT modes work in epoll
- The epoll thundering herd and the accept thundering herd
- epoll vs. IOCP
1. Key data structures in the epoll implementation
Three data structures are enough to explain roughly how epoll works: a red-black tree, a linked list, and a queue, corresponding to the eventpoll, epitem, and list_head types in the implementation. eventpoll is the structure behind the epfd returned by epoll_create, and it runs through the entire epoll workflow. An epitem represents each event we are interested in; eventpoll manages its epitems in a red-black tree, and when the event an epitem watches fires, the kernel moves that epitem onto the eventpoll ready queue. eventpoll and epitem are analyzed in detail below.
- eventpoll
In the Linux kernel everything is a file, and epoll is no exception: internally it first creates a file on an anonymous inode file system and binds the epoll fd to that file node, which is used only by epoll. This will show up in the source analysis later; it matters here because the eventpoll structure is stored in that file's private_data field, and later code locates the eventpoll through private_data. eventpoll is the central structure of epoll.
```c
struct eventpoll {
	/* Protect the access to this structure */
	spinlock_t lock;

	/*
	 * This mutex is used to ensure that files are not removed
	 * while epoll is using them. This is held during the event
	 * collection loop, the file cleanup path, the epoll file exit
	 * code and the ctl operations.
	 */
	struct mutex mtx;

	/*
	 * Wait queue used by sys_epoll_wait(). When we call
	 * epoll_wait() the task traps into the kernel and sleeps
	 * here; when a monitored event fires, the kernel puts the
	 * corresponding item onto rdllist and wakes this queue.
	 */
	wait_queue_head_t wq;

	/* Wait queue used by file->poll() */
	wait_queue_head_t poll_wait;

	/*
	 * List of ready file descriptors. When we call epoll_wait(),
	 * the kernel checks this list for events that are already
	 * ready.
	 */
	struct list_head rdllist;

	/* RB tree root used to store monitored fd structs */
	struct rb_root_cached rbr;

	/*
	 * Single linked list that collects events that become ready
	 * while the kernel is transferring the already-ready events
	 * to user space.
	 */
	struct epitem *ovflist;

	/* wakeup_source used when ep_scan_ready_list is running */
	struct wakeup_source *ws;

	/* The user that created the eventpoll descriptor */
	struct user_struct *user;

	struct file *file;

	/* used to optimize loop detection check */
	int visited;
	struct list_head visited_list_link;

	/* used to track busy poll napi_id */
	unsigned int napi_id;
};
```
- epitem
As mentioned above, an epitem corresponds to each event registered by the user. When the user calls epoll_ctl(epfd, EPOLL_CTL_ADD, ...) with an epoll_event built in user space, the kernel calls ep_insert, which creates an epitem structure to hold the event.
```c
struct epitem {
	union {
		/*
		 * RB tree node that links this epitem into the rbr
		 * red-black tree of the eventpoll it is attached to.
		 */
		struct rb_node rbn;
		/* Used to free the struct epitem */
		struct rcu_head rcu;
	};

	/* Like rbn, but links this item into the eventpoll ready list (rdllist) */
	struct list_head rdllink;

	/*
	 * Works together "struct eventpoll"->ovflist in keeping the
	 * single linked chain of items.
	 */
	struct epitem *next;

	/* The target file descriptor / file pair this item refers to */
	struct epoll_filefd ffd;

	/* Number of active wait queue attached to poll operations */
	int nwait;

	/* List containing poll wait queues */
	struct list_head pwqlist;

	/* The eventpoll this item is linked to */
	struct eventpoll *ep;

	/* List header used to link this item to the "struct file" items list */
	struct list_head fllink;

	/* wakeup_source used when EPOLLWAKEUP is set */
	struct wakeup_source __rcu *ws;

	/*
	 * Holds the epoll_event built by the user: when the user
	 * registers an event via epoll_ctl(), the kernel creates an
	 * epitem and stores the user's epoll_event here.
	 */
	struct epoll_event event;
};
```
One more point worth mentioning: the kernel creates dedicated slab caches for epitem and eppoll_entry objects, which is probably one of the reasons epoll performs so well.
2. Source analysis of the key epoll functions
The user-space API of epoll is simple: epoll_create, epoll_ctl, and epoll_wait cover essentially everything epoll programming needs. A single call is all it takes to create an epoll descriptor in user space, but the kernel has quite a bit of work to do underneath. This section walks through how these three functions, plus a few related helpers, are implemented in the kernel.
- epoll_create
int epoll_create(int size); In recent Linux versions the size argument no longer matters beyond having to be greater than 0. It originally specified how many events could at most be registered on the descriptor returned by epoll_create; since Linux 2.6.8 it has been ignored. epoll_create itself does nothing else in the kernel and simply forwards to another system call, sys_epoll_create1.
```c
SYSCALL_DEFINE1(epoll_create, int, size)
{
	if (size <= 0)
		return -EINVAL;

	return sys_epoll_create1(0);
}
```
sys_epoll_create1 does all the kernel-side work of epoll_create. As described in the previous section, eventpoll is the central data structure throughout epoll, so unsurprisingly one task of sys_epoll_create1 is to create an eventpoll structure for all subsequent epoll operations to use.
```c
SYSCALL_DEFINE1(epoll_create1, int, flags)
{
	int error, fd;
	struct eventpoll *ep = NULL;
	struct file *file;

	/* Check the EPOLL_* constant for consistency. */
	BUILD_BUG_ON(EPOLL_CLOEXEC != O_CLOEXEC);

	if (flags & ~EPOLL_CLOEXEC)
		return -EINVAL;
	/*
	 * Create the internal data structure ("struct eventpoll").
	 */
	error = ep_alloc(&ep);
	if (error < 0)
		return error;
	/*
	 * Grab an unused file descriptor from the system.
	 */
	fd = get_unused_fd_flags(O_RDWR | (flags & O_CLOEXEC));
	if (fd < 0) {
		error = fd;
		goto out_free_ep;
	}
	/*
	 * Get a file instance named "[eventpoll]" from the anonymous
	 * inode file system.
	 */
	file = anon_inode_getfile("[eventpoll]", &eventpoll_fops, ep,
				 O_RDWR | (flags & O_CLOEXEC));
	if (IS_ERR(file)) {
		error = PTR_ERR(file);
		goto out_free_fd;
	}
	ep->file = file;
	fd_install(fd, file);
	return fd;

out_free_fd:
	put_unused_fd(fd);
out_free_ep:
	ep_free(ep);
	return error;
}
```
What sys_epoll_create1 does is clear: it first allocates an eventpoll structure, then grabs an unused descriptor from the system (this is the epfd that epoll_create returns), then obtains a file instance named "[eventpoll]" from the anonymous inode file system. It links that file instance to the eventpoll structure, and finally installs the descriptor into the process's file descriptor table.
- epoll_ctl
Once the epfd exists, we can call epoll_ctl to register the descriptors we are interested in with it.
```c
#define EPOLLPRI	0x00000002
#define EPOLLOUT	0x00000004	/* writable */
#define EPOLLERR	0x00000008
#define EPOLLHUP	0x00000010
#define EPOLLRDNORM	0x00000040
#define EPOLLRDBAND	0x00000080
#define EPOLLWRNORM	0x00000100
#define EPOLLWRBAND	0x00000200
#define EPOLLMSG	0x00000400
#define EPOLLRDHUP	0x00002000
/* Exclusive wakeup mode for the attached fd; added in Linux 4.5,
 * related to the thundering-herd problem discussed later */
#define EPOLLEXCLUSIVE	(1U << 28)
#define EPOLLWAKEUP	(1U << 29)
#define EPOLLONESHOT	(1U << 30)
/* Edge-triggered mode, as opposed to level-triggered (LT); discussed later */
#define EPOLLET		(1U << 31)
```
Depending on the op argument, epoll_ctl dispatches to the corresponding handler:
EPOLL_CTL_ADD --> ep_insert: creates an epitem bound to the event, adds it to the eventpoll red-black tree, and registers a callback for it; all readiness notifications inside epoll are delivered through this callback.
EPOLL_CTL_DEL --> ep_remove: removes the epitem for this event from the eventpoll RB tree and releases the associated resources.
EPOLL_CTL_MOD --> ep_modify: modifies the event mask stored in the epitem.
```c
SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
		struct epoll_event __user *, event)
{
	int error;
	int full_check = 0;
	struct fd f, tf;
	struct eventpoll *ep;
	struct epitem *epi;
	struct epoll_event epds;
	struct eventpoll *tep = NULL;

	error = -EFAULT;
	if (ep_op_has_event(op) &&
	    copy_from_user(&epds, event, sizeof(struct epoll_event)))
		goto error_return;

	error = -EBADF;
	f = fdget(epfd);
	if (!f.file)
		goto error_return;

	/* Get the "struct file *" for the target file */
	tf = fdget(fd);
	if (!tf.file)
		goto error_fput;

	/* The target file descriptor must support poll */
	error = -EPERM;
	if (!tf.file->f_op->poll)
		goto error_tgt_fput;

	/* Check if EPOLLWAKEUP is allowed */
	if (ep_op_has_event(op))
		ep_take_care_of_epollwakeup(&epds);

	/*
	 * We have to check that the file structure underneath the file descriptor
	 * the user passed to us _is_ an eventpoll file. And also we do not permit
	 * adding an epoll file descriptor inside itself.
	 */
	error = -EINVAL;
	if (f.file == tf.file || !is_file_epoll(f.file))
		goto error_tgt_fput;

	/*
	 * epoll adds to the wakeup queue at EPOLL_CTL_ADD time only,
	 * so EPOLLEXCLUSIVE is not allowed for a EPOLL_CTL_MOD operation.
	 * Also, we do not currently support nested exclusive wakeups.
	 */
	if (ep_op_has_event(op) && (epds.events & EPOLLEXCLUSIVE)) {
		if (op == EPOLL_CTL_MOD)
			goto error_tgt_fput;
		if (op == EPOLL_CTL_ADD && (is_file_epoll(tf.file) ||
				(epds.events & ~EPOLLEXCLUSIVE_OK_BITS)))
			goto error_tgt_fput;
	}

	/*
	 * At this point it is safe to assume that the "private_data" contains
	 * our own data structure.
	 */
	ep = f.file->private_data;

	/*
	 * When we insert an epoll file descriptor inside another epoll file
	 * descriptor, there is the chance of creating closed loops, which are
	 * better handled here than in more critical paths. While we are
	 * checking for loops we also determine the list of files reachable
	 * and hang them on the tfile_check_list, so we can check that we
	 * haven't created too many possible wakeup paths.
	 *
	 * We do not need to take the global 'epmutex' on EPOLL_CTL_ADD when
	 * the epoll file descriptor is attaching directly to a wakeup source,
	 * unless the epoll file descriptor is nested. The purpose of taking the
	 * 'epmutex' on add is to prevent complex topologies such as loops and
	 * deep wakeup paths from forming in parallel through multiple
	 * EPOLL_CTL_ADD operations.
	 */
	mutex_lock_nested(&ep->mtx, 0);
	if (op == EPOLL_CTL_ADD) {
		if (!list_empty(&f.file->f_ep_links) ||
						is_file_epoll(tf.file)) {
			full_check = 1;
			mutex_unlock(&ep->mtx);
			mutex_lock(&epmutex);
			if (is_file_epoll(tf.file)) {
				error = -ELOOP;
				if (ep_loop_check(ep, tf.file) != 0) {
					clear_tfile_check_list();
					goto error_tgt_fput;
				}
			} else
				list_add(&tf.file->f_tfile_llink,
							&tfile_check_list);
			mutex_lock_nested(&ep->mtx, 0);
			if (is_file_epoll(tf.file)) {
				tep = tf.file->private_data;
				mutex_lock_nested(&tep->mtx, 1);
			}
		}
	}

	/*
	 * Try to lookup the file inside our RB tree. Since we grabbed "mtx"
	 * above, we can be sure to be able to use the item looked up by
	 * ep_find() till we release the mutex.
	 */
	epi = ep_find(ep, tf.file, fd);

	error = -EINVAL;
	switch (op) {
	case EPOLL_CTL_ADD:
		if (!epi) {
			epds.events |= POLLERR | POLLHUP;
			error = ep_insert(ep, &epds, tf.file, fd, full_check);
		} else
			error = -EEXIST;
		if (full_check)
			clear_tfile_check_list();
		break;
	case EPOLL_CTL_DEL:
		if (epi)
			error = ep_remove(ep, epi);
		else
			error = -ENOENT;
		break;
	case EPOLL_CTL_MOD:
		if (epi) {
			if (!(epi->event.events & EPOLLEXCLUSIVE)) {
				epds.events |= POLLERR | POLLHUP;
				error = ep_modify(ep, epi, &epds);
			}
		} else
			error = -ENOENT;
		break;
	}
	if (tep != NULL)
		mutex_unlock(&tep->mtx);
	mutex_unlock(&ep->mtx);

error_tgt_fput:
	if (full_check)
		mutex_unlock(&epmutex);

	fdput(tf);
error_fput:
	fdput(f);
error_return:

	return error;
}
```
- epoll_wait
After registering the interesting file descriptors with the epfd, we can call epoll_wait and let the kernel call us back when events occur. timeout < 0 blocks indefinitely; timeout == 0 checks for ready events and returns immediately; timeout > 0 waits up to that long and returns even if no event becomes ready.
As described in the first section on epoll's key structures, epoll_wait's main job is to check whether the eventpoll ready queue rdllist holds any events, and that check is implemented by the ep_poll function. Items land on rdllist through ep_poll_callback: when a monitored event fires, the callback moves the corresponding epitem onto rdllist and wakes up the task sleeping in epoll_wait, which then copies the ready events back to user space. One detail in ep_poll is worth noting: when no event is ready, the kernel calls __add_wait_queue_exclusive(&ep->wq, &wait) to put the current task on the wait queue. This uses the standard Linux wait_queue; the exclusive variants append the waiter to the tail of the queue, and a wakeup stops after waking the first waiter that has WQ_FLAG_EXCLUSIVE set.
```c
SYSCALL_DEFINE4(epoll_wait, int, epfd, struct epoll_event __user *, events,
		int, maxevents, int, timeout)
{
	int error;
	struct fd f;
	struct eventpoll *ep;

	/* The maximum number of event must be greater than zero */
	if (maxevents <= 0 || maxevents > EP_MAX_EVENTS)
		return -EINVAL;

	/* Verify that the area passed by the user is writeable */
	if (!access_ok(VERIFY_WRITE, events, maxevents * sizeof(struct epoll_event)))
		return -EFAULT;

	/* Get the "struct file *" for the eventpoll file */
	f = fdget(epfd);
	if (!f.file)
		return -EBADF;

	/*
	 * We have to check that the file structure underneath the fd
	 * the user passed to us _is_ an eventpoll file.
	 */
	error = -EINVAL;
	if (!is_file_epoll(f.file))
		goto error_fput;

	/*
	 * At this point it is safe to assume that the "private_data" contains
	 * our own data structure.
	 */
	ep = f.file->private_data;

	/* Time to fish for events ... */
	error = ep_poll(ep, events, maxevents, timeout);

error_fput:
	fdput(f);
	return error;
}
```
```c
static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
		   int maxevents, long timeout)
{
	int res = 0, eavail, timed_out = 0;
	unsigned long flags;
	u64 slack = 0;
	wait_queue_entry_t wait;
	ktime_t expires, *to = NULL;

	if (timeout > 0) {
		struct timespec64 end_time = ep_set_mstimeout(timeout);

		slack = select_estimate_accuracy(&end_time);
		to = &expires;
		*to = timespec64_to_ktime(end_time);
	} else if (timeout == 0) {
		/*
		 * Avoid the unnecessary trip to the wait queue loop, if the
		 * caller specified a non blocking operation.
		 */
		timed_out = 1;
		spin_lock_irqsave(&ep->lock, flags);
		goto check_events;
	}

fetch_events:

	if (!ep_events_available(ep))
		ep_busy_loop(ep, timed_out);

	spin_lock_irqsave(&ep->lock, flags);

	if (!ep_events_available(ep)) {
		/*
		 * Busy poll timed out. Drop NAPI ID for now, we can add
		 * it back in when we have moved a socket with a valid NAPI
		 * ID onto the ready list.
		 */
		ep_reset_busy_poll_napi_id(ep);

		/*
		 * We don't have any available event to return to the caller.
		 * We need to sleep here, and we will be woken up by
		 * ep_poll_callback() when events become available.
		 */
		init_waitqueue_entry(&wait, current);
		__add_wait_queue_exclusive(&ep->wq, &wait);

		for (;;) {
			/*
			 * We don't want to sleep if the ep_poll_callback() sends us
			 * a wakeup in between. That's why we set the task state
			 * to TASK_INTERRUPTIBLE before doing the checks.
			 */
			set_current_state(TASK_INTERRUPTIBLE);
			/*
			 * Always short-circuit for fatal signals to allow
			 * threads to make a timely exit without the chance of
			 * finding more events available and fetching
			 * repeatedly.
			 */
			if (fatal_signal_pending(current)) {
				res = -EINTR;
				break;
			}
			if (ep_events_available(ep) || timed_out)
				break;
			if (signal_pending(current)) {
				res = -EINTR;
				break;
			}

			spin_unlock_irqrestore(&ep->lock, flags);
			if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS))
				timed_out = 1;

			spin_lock_irqsave(&ep->lock, flags);
		}

		__remove_wait_queue(&ep->wq, &wait);
		__set_current_state(TASK_RUNNING);
	}

check_events:
	/* Is it worth to try to dig for events ? */
	eavail = ep_events_available(ep);

	spin_unlock_irqrestore(&ep->lock, flags);

	/*
	 * Try to transfer events to user space. In case we get 0 events and
	 * there's still timeout left over, we go trying again in search of
	 * more luck.
	 */
	if (!res && eavail &&
	    !(res = ep_send_events(ep, events, maxevents)) && !timed_out)
		/* ep_send_events() walks the ready queue via ep_scan_ready_list() */
		goto fetch_events;

	return res;
}
```
3. Level-triggered (LT) vs. edge-triggered (ET) mode in epoll
Level triggering means: when an event becomes ready at some moment, the kernel notifies us; the demultiplexer reports that the event is currently ready and can be handled, but we may choose to handle it now or not. In LT mode, if we skip the event this time, the next return from the wait function (select, epoll_wait, ...) will still report it. That is exactly where ET differs: in ET mode, once the event has been reported and the user does not handle it, subsequent epoll_wait calls will not report it again. So in ET mode the user must fully handle each event when it is delivered; miss it and the chance is gone. This is also an important reason why ET mode is more efficient than LT mode; it just requires extra care when handling events.
That covers the principle; how do the two modes differ in the actual source?
The answer is in ep_send_events_proc: after an event has been copied to user space, a level-triggered epitem is simply put back on the ready list.

```c
if (epi->event.events & EPOLLONESHOT)
	epi->event.events &= EP_PRIVATE_BITS;
else if (!(epi->event.events & EPOLLET)) {
	/*
	 * If this file has been added with Level
	 * Trigger mode, we need to insert back inside
	 * the ready list, so that the next call to
	 * epoll_wait() will check again the events
	 * availability. At this point, no one can insert
	 * into ep->rdllist besides us. The epoll_ctl()
	 * callers are locked out by
	 * ep_scan_ready_list() holding "mtx" and the
	 * poll callback will queue them in ep->ovflist.
	 */
	list_add_tail(&epi->rdllink, &ep->rdllist);
	ep_pm_stay_awake(epi);
}
```