1. Overview of the Overall Flow
After downloading the source from GitHub, you will find the broker code in src/, which also uses some functions from the lib/ library. A rough understanding of the project's workflow is the foundation for analyzing mosquitto's access control. Many Chinese blogs already cover it (e.g. the Xiaoyaozi series); although dated, the meanings of the main structs have not changed. Understanding the key structs first is very helpful before digging further into the source: struct mosquitto represents a client, and struct mosquitto_db is the broker's in-memory store that holds everything else.
Since the project is written in C, look for main first: the broker starts from main in /src/mosquitto.c. Note that the code is littered with conditional-compilation macros (such as WIN32); pick the platform you know best and fold the rest away to avoid confusion. After main initializes the subscription tree and loads the security configuration, it enters the main loop, mosquitto_main_loop. That function first sets up epoll to watch the sockets for reads, then enters the real core loop, while(run){}, which is where the broker's actual logic begins.
The flow, from top to bottom:
- Check for and free disused client context structures (context);
- Iterate over the clients (i.e. contexts) via a hash table, send the packets queued in each context, and disconnect clients that have timed out.
- Handle socket events in a loop via epoll_wait: net__socket_accept accepts a client connection and creates that client's context; loop_handle_reads_writes sends or receives packets according to the read/write events.
- packet__read reads the full packet, then handle__packet dispatches on the protocol type of the frame. Note in particular that the broker uses the functions in src/read_handle.c, as opposed to the client versions in lib/; IDE auto-jump can trip you up here. From the switch statement in handle__packet you can conveniently follow each case in detail.
```c
while(run){ /* enter the main loop */
	context__free_disused(db);
#ifdef WITH_SYS_TREE
	if(db->config->sys_interval > 0){
		sys_tree__update(db, db->config->sys_interval, start_time);
	}
#endif

#ifndef WITH_EPOLL
	memset(pollfds, -1, sizeof(struct pollfd)*pollfd_max);
	pollfd_index = 0;
	for(i=0; i<listensock_count; i++){
		pollfds[pollfd_index].fd = listensock[i];
		pollfds[pollfd_index].events = POLLIN;
		pollfds[pollfd_index].revents = 0;
		pollfd_index++;
	}
#endif

	now_time = time(NULL);
	time_count = 0;
	HASH_ITER(hh_sock, db->contexts_by_sock, context, ctxt_tmp){ /* iterate over the hash table of contexts */
		if(time_count > 0){
			time_count--;
		}else{
			time_count = 1000;
			now = mosquitto_time();
		}
		context->pollfd_index = -1;
		if(context->sock != INVALID_SOCKET){
#ifdef WITH_BRIDGE
			if(context->bridge){
				mosquitto__check_keepalive(db, context);
				if(context->bridge->round_robin == false
						&& context->bridge->cur_address != 0
						&& context->bridge->primary_retry
						&& now > context->bridge->primary_retry){

					if(context->bridge->primary_retry_sock == INVALID_SOCKET){
						rc = net__try_connect(context,
								context->bridge->addresses[0].address,
								context->bridge->addresses[0].port,
								&context->bridge->primary_retry_sock, NULL, false);
						if(rc == 0){
							COMPAT_CLOSE(context->bridge->primary_retry_sock);
							context->bridge->primary_retry_sock = INVALID_SOCKET;
							context->bridge->primary_retry = 0;
							net__socket_close(db, context);
							context->bridge->cur_address = 0;
						}
					}else{
						len = sizeof(int);
						if(!getsockopt(context->bridge->primary_retry_sock,
								SOL_SOCKET, SO_ERROR, (char *)&err, &len)){
							if(err == 0){
								COMPAT_CLOSE(context->bridge->primary_retry_sock);
								context->bridge->primary_retry_sock = INVALID_SOCKET;
								context->bridge->primary_retry = 0;
								net__socket_close(db, context);
								context->bridge->cur_address = context->bridge->address_count-1;
							}else{
								COMPAT_CLOSE(context->bridge->primary_retry_sock);
								context->bridge->primary_retry_sock = INVALID_SOCKET;
								context->bridge->primary_retry = now+5;
							}
						}else{
							COMPAT_CLOSE(context->bridge->primary_retry_sock);
							context->bridge->primary_retry_sock = INVALID_SOCKET;
							context->bridge->primary_retry = now+5;
						}
					}
				}
			}
#endif

			/* Local bridges never time out in this fashion. */
			if(!(context->keepalive)
					|| context->bridge
					|| now - context->last_msg_in <= (time_t)(context->keepalive)*3/2){
				/* Client is within keepalive: send it its queued inflight packets */
				if(db__message_write(db, context) == MOSQ_ERR_SUCCESS){
#ifdef WITH_EPOLL
					if(context->current_out_packet || context->state == mosq_cs_connect_pending || context->ws_want_write){
						if(!(context->events & EPOLLOUT)) {
							ev.data.fd = context->sock;
							ev.events = EPOLLIN | EPOLLOUT;
							if(epoll_ctl(db->epollfd, EPOLL_CTL_ADD, context->sock, &ev) == -1) {
								if((errno != EEXIST)||(epoll_ctl(db->epollfd, EPOLL_CTL_MOD, context->sock, &ev) == -1)) {
									log__printf(NULL, MOSQ_LOG_DEBUG, "Error in epoll re-registering to EPOLLOUT: %s", strerror(errno));
								}
							}
							context->events = EPOLLIN | EPOLLOUT;
						}
						context->ws_want_write = false;
					}else{
						if(context->events & EPOLLOUT) {
							ev.data.fd = context->sock;
							ev.events = EPOLLIN;
							if(epoll_ctl(db->epollfd, EPOLL_CTL_ADD, context->sock, &ev) == -1) {
								if((errno != EEXIST)||(epoll_ctl(db->epollfd, EPOLL_CTL_MOD, context->sock, &ev) == -1)) {
									log__printf(NULL, MOSQ_LOG_DEBUG, "Error in epoll re-registering to EPOLLIN: %s", strerror(errno));
								}
							}
							context->events = EPOLLIN;
						}
					}
#else
					pollfds[pollfd_index].fd = context->sock;
					pollfds[pollfd_index].events = POLLIN;
					pollfds[pollfd_index].revents = 0;
					if(context->current_out_packet || context->state == mosq_cs_connect_pending || context->ws_want_write){
						pollfds[pollfd_index].events |= POLLOUT;
						context->ws_want_write = false;
					}
					context->pollfd_index = pollfd_index;
					pollfd_index++;
#endif
				}else{
					do_disconnect(db, context);
				}
			}else{ /* client has timed out */
				if(db->config->connection_messages == true){
					if(context->id){
						id = context->id;
					}else{
						id = "<unknown>";
					}
					log__printf(NULL, MOSQ_LOG_NOTICE, "Client %s has exceeded timeout, disconnecting.", id);
				}
				/* Client has exceeded keepalive*1.5 */
				do_disconnect(db, context);
			}
		}
	}

#ifdef WITH_BRIDGE
	time_count = 0;
	for(i=0; i<db->bridge_count; i++){
		if(!db->bridges[i]) continue;

		context = db->bridges[i];

		if(context->sock == INVALID_SOCKET){
			if(time_count > 0){
				time_count--;
			}else{
				time_count = 1000;
				now = mosquitto_time();
			}
			/* Want to try to restart the bridge connection */
			if(!context->bridge->restart_t){
				context->bridge->restart_t = now+context->bridge->restart_timeout;
				context->bridge->cur_address++;
				if(context->bridge->cur_address == context->bridge->address_count){
					context->bridge->cur_address = 0;
				}
			}else{
				if((context->bridge->start_type == bst_lazy && context->bridge->lazy_reconnect)
						|| (context->bridge->start_type == bst_automatic && now > context->bridge->restart_t)){

#if defined(__GLIBC__) && defined(WITH_ADNS)
					if(context->adns){
						/* Connection attempted, waiting on DNS lookup */
						rc = gai_error(context->adns);
						if(rc == EAI_INPROGRESS){
							/* Just keep on waiting */
						}else if(rc == 0){
							rc = bridge__connect_step2(db, context);
							if(rc == MOSQ_ERR_SUCCESS){
#ifdef WITH_EPOLL
								ev.data.fd = context->sock;
								ev.events = EPOLLIN;
								if(context->current_out_packet){
									ev.events |= EPOLLOUT;
								}
								if(epoll_ctl(db->epollfd, EPOLL_CTL_ADD, context->sock, &ev) == -1) {
									if((errno != EEXIST)||(epoll_ctl(db->epollfd, EPOLL_CTL_MOD, context->sock, &ev) == -1)) {
										log__printf(NULL, MOSQ_LOG_DEBUG, "Error in epoll re-registering bridge: %s", strerror(errno));
									}
								}else{
									context->events = ev.events;
								}
#else
								pollfds[pollfd_index].fd = context->sock;
								pollfds[pollfd_index].events = POLLIN;
								pollfds[pollfd_index].revents = 0;
								if(context->current_out_packet){
									pollfds[pollfd_index].events |= POLLOUT;
								}
								context->pollfd_index = pollfd_index;
								pollfd_index++;
#endif
							}else if(rc == MOSQ_ERR_CONN_PENDING){
								context->bridge->restart_t = 0;
							}else{
								context->bridge->cur_address++;
								if(context->bridge->cur_address == context->bridge->address_count){
									context->bridge->cur_address = 0;
								}
								context->bridge->restart_t = 0;
							}
						}else{
							/* Need to retry */
							if(context->adns->ar_result){
								freeaddrinfo(context->adns->ar_result);
							}
							mosquitto__free(context->adns);
							context->adns = NULL;
							context->bridge->restart_t = 0;
						}
					}else{
						rc = bridge__connect_step1(db, context);
						if(rc){
							context->bridge->cur_address++;
							if(context->bridge->cur_address == context->bridge->address_count){
								context->bridge->cur_address = 0;
							}
						}else{
							/* Short wait for ADNS lookup */
							context->bridge->restart_t = 1;
						}
					}
#else
					{
						rc = bridge__connect(db, context);
						context->bridge->restart_t = 0;
						if(rc == MOSQ_ERR_SUCCESS){
							if(context->bridge->round_robin == false && context->bridge->cur_address != 0){
								context->bridge->primary_retry = now + 5;
							}
#ifdef WITH_EPOLL
							ev.data.fd = context->sock;
							ev.events = EPOLLIN;
							if(context->current_out_packet){
								ev.events |= EPOLLOUT;
							}
							if(epoll_ctl(db->epollfd, EPOLL_CTL_ADD, context->sock, &ev) == -1) {
								if((errno != EEXIST)||(epoll_ctl(db->epollfd, EPOLL_CTL_MOD, context->sock, &ev) == -1)) {
									log__printf(NULL, MOSQ_LOG_DEBUG, "Error in epoll re-registering bridge: %s", strerror(errno));
								}
							}else{
								context->events = ev.events;
							}
#else
							pollfds[pollfd_index].fd = context->sock;
							pollfds[pollfd_index].events = POLLIN;
							pollfds[pollfd_index].revents = 0;
							if(context->current_out_packet){
								pollfds[pollfd_index].events |= POLLOUT;
							}
							context->pollfd_index = pollfd_index;
							pollfd_index++;
#endif
						}else{
							context->bridge->cur_address++;
							if(context->bridge->cur_address == context->bridge->address_count){
								context->bridge->cur_address = 0;
							}
						}
					}
#endif
				}
			}
		}
	}
#endif

	now_time = time(NULL);
	if(db->config->persistent_client_expiration > 0 && now_time > expiration_check_time){
		HASH_ITER(hh_id, db->contexts_by_id, context, ctxt_tmp){
			if(context->sock == INVALID_SOCKET && context->clean_session == 0){
				/* This is a persistent client, check to see if the
				 * last time it connected was longer than
				 * persistent_client_expiration seconds ago. If so,
				 * expire it and clean up.
				 */
				if(now_time > context->disconnect_t+db->config->persistent_client_expiration){
					if(context->id){
						id = context->id;
					}else{
						id = "<unknown>";
					}
					log__printf(NULL, MOSQ_LOG_NOTICE, "Expiring persistent client %s due to timeout.", id);
					G_CLIENTS_EXPIRED_INC();
					context->clean_session = true;
					context->state = mosq_cs_expiring;
					do_disconnect(db, context);
				}
			}
		}
		expiration_check_time = time(NULL) + 3600;
	}

#ifndef WIN32
	sigprocmask(SIG_SETMASK, &sigblock, &origsig);
#ifdef WITH_EPOLL
	/* wait for socket events */
	fdcount = epoll_wait(db->epollfd, events, MAX_EVENTS, 100);
#else
	fdcount = poll(pollfds, pollfd_index, 100);
#endif
	sigprocmask(SIG_SETMASK, &origsig, NULL);
#else
	fdcount = WSAPoll(pollfds, pollfd_index, 100);
#endif

#ifdef WITH_EPOLL
	switch(fdcount){
	case -1:
		if(errno != EINTR){
			log__printf(NULL, MOSQ_LOG_ERR, "Error in epoll waiting: %s.", strerror(errno));
		}
		break;
	case 0:
		break;
	default:
		/* loop over the ready socket events */
		for(i=0; i<fdcount; i++){
			for(j=0; j<listensock_count; j++){
				if(events[i].data.fd == listensock[j]){
					if(events[i].events & (EPOLLIN | EPOLLPRI)){
						/* accept the client connection; net__socket_accept also creates the client's context */
						while((ev.data.fd = net__socket_accept(db, listensock[j])) != -1){
							ev.events = EPOLLIN;
							if(epoll_ctl(db->epollfd, EPOLL_CTL_ADD, ev.data.fd, &ev) == -1){
								log__printf(NULL, MOSQ_LOG_ERR, "Error in epoll accepting: %s", strerror(errno));
							}
							context = NULL;
							HASH_FIND(hh_sock, db->contexts_by_sock, &(ev.data.fd), sizeof(mosq_sock_t), context);
							if(!context){
								log__printf(NULL, MOSQ_LOG_ERR, "Error in epoll accepting: no context");
							}
							context->events = EPOLLIN;
						}
					}
					break;
				}
			}
			if(j == listensock_count){
				loop_handle_reads_writes(db, events[i].data.fd, events[i].events);
			}
		}
	}
#else
	if(fdcount == -1){
		log__printf(NULL, MOSQ_LOG_ERR, "Error in poll: %s.", strerror(errno));
	}else{
		loop_handle_reads_writes(db, pollfds);

		for(i=0; i<listensock_count; i++){
			if(pollfds[i].revents & (POLLIN | POLLPRI)){
				while(net__socket_accept(db, listensock[i]) != -1){
				}
			}
		}
	}
#endif

#ifdef WITH_PERSISTENCE
	if(db->config->persistence && db->config->autosave_interval){
		if(db->config->autosave_on_changes){
			if(db->persistence_changes >= db->config->autosave_interval){
				persist__backup(db, false);
				db->persistence_changes = 0;
			}
		}else{
			if(last_backup + db->config->autosave_interval < mosquitto_time()){
				persist__backup(db, false);
				last_backup = mosquitto_time();
			}
		}
	}
#endif

#ifdef WITH_PERSISTENCE
	if(flag_db_backup){
		persist__backup(db, false);
		flag_db_backup = false;
	}
#endif
	if(flag_reload){
		log__printf(NULL, MOSQ_LOG_INFO, "Reloading config.");
		config__read(db, db->config, true);
		mosquitto_security_cleanup(db, true);
		mosquitto_security_init(db, true);
		mosquitto_security_apply(db);
		log__close(db->config);
		log__init(db->config);
		flag_reload = false;
	}
	if(flag_tree_print){
		sub__tree_print(db->subs, 0);
		flag_tree_print = false;
	}
#ifdef WITH_WEBSOCKETS
	for(i=0; i<db->config->listener_count; i++){
		/* Extremely hacky, should be using the lws provided external poll
		 * interface, but their interface has changed recently and ours
		 * will soon, so for now websockets clients are second class
		 * citizens. */
		if(db->config->listeners[i].ws_context){
			libwebsocket_service(db->config->listeners[i].ws_context, 0);
		}
	}
	if(db->config->have_websockets_listener){
		temp__expire_websockets_clients(db);
	}
#endif
} /* end while(run) */
```
2. Mosquitto's Native Access Control
The only explanation of these permission macros appears in mosquitto_plugin.h:
```c
/*
 * Function: mosquitto_auth_acl_check
 *
 * Called by the broker when topic access must be checked. access will be one
 * of:
 *  MOSQ_ACL_SUBSCRIBE when a client is asking to subscribe to a topic string.
 *                     This differs from MOSQ_ACL_READ in that it allows you to
 *                     deny access to topic strings rather than by pattern. For
 *                     example, you may use MOSQ_ACL_SUBSCRIBE to deny
 *                     subscriptions to '#', but allow all topics in
 *                     MOSQ_ACL_READ. This allows clients to subscribe to any
 *                     topic they want, but not discover what topics are in use
 *                     on the server.
 *  MOSQ_ACL_READ      when a message is about to be sent to a client (i.e.
 *                     whether it can read that topic or not).
 *  MOSQ_ACL_WRITE     when a message has been received from a client (i.e.
 *                     whether it can write to that topic or not).
 */
```
The comments that follow explain where in the implementation each check has to be performed. The function that actually performs the check is:
```c
int mosquitto_acl_check(struct mosquitto_db *db, struct mosquitto *context,
        const char *topic, long payloadlen, void *payload,
        int qos, bool retain, int access);
```
Here context is the client being checked; topic, payload, retain and so on are properties of the current message; and access is the specific permission to check. From this interface you can infer that checks are driven by a client's context, i.e. by client events (otherwise, how would the caller know which context to pass? Normally you pass the context that performed the action). But can every message be traced back to a client context? Read on.
- The WRITE permission is checked when the broker receives a message from a client. Note that the last-will message is stored in the client's context, so the broker only sends the will, based on that context, when do_disconnect is called. Under this definition, however, retained messages clearly fall outside this permission's reach: the broker may have stored the message long ago, and the publishing client's context may have been cleaned up since. The mosquitto project has since added the ability to restrict retained messages at publish time; the discussion can be found there. Many mailing-list threads also cover the access-control design (on the proposal of the SUBSCRIBE permission, 1, 2: evidently the author chose to check at message delivery time because it avoids wildcard matching and is simple to implement, without considering revocation; the SUBSCRIBE check was added later to block wildcard subscriptions and to improve efficiency). Watching these first-rate developers discuss is instructive: you can see directly how the project's access control was built up step by step, and why. You can even find paper authors who built on mosquitto discussing their schemes with the maintainer.
- The SUBSCRIBE permission is checked when a client subscribes; the difference is that it can reject a subscription to '#'. Evidently the author did not consider that relying on this check alone makes dynamic revocation problematic.
- The READ permission is checked whenever a message is about to be placed into a client context's send queue, including retained messages delivered at subscribe time, i.e. every time a message is about to go out. This design lets an administrator update policy dynamically and revoke a client's permission to receive messages on a given topic.
To see exactly where which permission is checked, search the whole project for call sites of this function.
3. Improving Mosquitto's Native Access Control
As noted in the previous section, because the permission-check function needs a context, and retained messages live on leaf nodes of the subscription tree, the WRITE check for retained messages is lost. This section discusses how to add a permission check for retained messages. First, look at how the broker handles them.
- Receiving and storing a retained message: retained messages arrive via PUBLISH (a last will can also set the retain flag; it is passed in through a function). Following the PUBLISH handler, db__messages_easy_queue calls db__message_store, which saves the message and its properties into a stored entry, then calls sub__messages_queue to attach the message to the matching node of the subscription tree. Finally, subs__process places the retained message into the retained field of the node, struct mosquitto__subhier *hier.
- Delivering a retained message: in handle__subscribe, after the permission check and the insertion into the subscription tree (sub__add), the broker checks whether the topic has a retained message to deliver, going through sub__retain_queue and retain__search, and finally sending the message with retain__process.
So the plan: when the message is stored, i.e. in db__message_store, save the context of the retained message's publisher (so that mosquitto_acl_check can be reused); when the message is about to be sent to a subscriber, i.e. in retain__process, check the publisher's permission. This looks simple, but many other factors come into play; in C you manage memory initialization and release yourself, and one slip means a segfault. The details:
1. In mosquitto_broker_internal.h, extend the mosquitto struct with a count of how many retained messages this client has registered, to help manage the lifetime of the client's context. Increment it when a new retained message enters the broker; decrement it when that retained message is replaced. Remember to initialize the field! The context is first created in context__init in context.c.
2. handle_publish.c shows that the broker stores the message via db__message_store in database.c. Modify that function so the publisher's context is saved into the mosquitto_msg_store entry.
3. mosquitto_msg_store therefore also needs a struct mosquitto pointer to hold the context. In subs__process in subs.c you can see that, for a retained message, this stored entry is attached to the current topic node.
4. In retain__process in subs.c, check the publisher's permission before handing the retained message to a client.
5. In do_disconnect in loop.c, before calling context__add_to_disused, check whether the context still has registered retained messages, i.e. check the counter. A context can only be destroyed through a do_disconnect call.
6. Because the context might otherwise never be released via do_disconnect, you must also check, whenever a stored message is deleted, whether an offline client still has retained messages: decrement the counter, and if it reaches zero, the session does not need to be restored (context->clean_session == true, so clients with persistent sessions and no retained messages are unaffected), and the client is offline (context->state == mosq_cs_disconnected), call the release function context__add_to_disused.
7. Does this affect session resumption?
8. Remember that msg_store has bookkeeping of its own: for example, in database.c, when db__msg_store_deref ends up freeing the message via db__msg_store_remove, decrement the source context's count (the stored message is being discarded at that point).
9. db__message_store is called from many places across the project; examine carefully exactly when the context should be stored!
10. Remember to initialize the stored message as well! Everything you add must be initialized and released.
11. What is the impact of keeping a context, and even its id, alive indefinitely because it owns retained messages? What happens when a new client wants to use the same id? Do we need to distinguish the online holder of the id from the offline one?
12. Beware of macro guards silently keeping your new code out of the build.