PvPGN Abstract socket event notification API ie fdwatch
v.01 (C) 2003 Dizzy <dizzy@roedu.net>

1. The problem:

I wanted to design and implement a general API to have PvPGN inspect
socket status (read/write availability) in the best possible way that the
host OS may offer (select()/poll()/kqueue()/epoll()/etc.).

Also, the old PvPGN code was doing such things in a very slow way,
especially if the system was using poll() (which was the default on most
Unices). When the server started to have lots of active connections, the
CPU time PvPGN spent inspecting and handling them increased sharply (the
code executed on each main loop run had O(n^2) complexity, where n is the
number of connections to the server, and the main loop cycles at least
1000/BNETD_POLL_INTERVAL times per second, ie at least 50 times per second).

2. The fdwatch API:

I started by reading the fdwatch code from the thttpd project; I used the
ideas found in that code as a starting point, but I have gone far beyond
them :).
The fdwatch API is described in fdwatch.h as follows:

extern int fdwatch_init(void);
extern int fdwatch_close(void);
extern int fdwatch_add_fd(int fd, t_fdwatch_type rw, fdwatch_handler h, void *data);
extern int fdwatch_update_fd(int fd, t_fdwatch_type rw);
extern int fdwatch_del_fd(int fd);
extern int fdwatch(long timeout_msecs);
extern int fdwatch_check_fd(int fd, t_fdwatch_type rw);
extern void fdwatch_handle(void);

The names of the functions should make it self-explanatory what they do.

3. The changed code flow:

A. the code flow before fdwatch
- main() calls server_process()
- server_process(), after doing some one-time initializations, entered
  the main loop
- in the main loop, after handling the events, it starts to prepare the
  sockets for select/poll
- it starts a loop cycling through each address configured in bnetd.conf
  to listen on them and adds their sockets to the socket inspection list
- after this, it does an O(n) cycle where it populates the socket
  inspection list with the sockets of every t_connection the server has
  (read availability)
- if any of these t_connections have packets in the outqueue (they need
  to send data) then their sockets are also added for write availability
- then pvpgn calls select()/poll() on the socket inspection list
- after the syscall returns, pvpgn cycles through each address configured
  in bnetd.conf and checks if it is read available (if a new connection
  is to be made)
- pvpgn doesn't want to accept new connections when in the shutdown phase
  but it did this the wrong way: it completely ignored the listening
  sockets while in the shutdown phase; this meant that once a connection
  was pending during the shutdown phase, select/poll returned immediately
  because the socket was in the read availability list, and thus pvpgn
  was using 99% CPU while in the shutdown phase
- anyway, after this, pvpgn does an O(n) cycle through each t_connection
  to check if its socket is read/write available
- the problem is that when it was using poll() (the common case on
  Unices), checking whether a socket was returned as read/write available
  by poll() used another O(n) function, thus making the total cycle O(n^2)
- while cycling through each connection to inspect whether its socket was
  returned available by select/poll, pvpgn also checks if the connection
  is in the destroy state (conn_state_destroy) and if so destroys it

B. the code flow after fdwatch
- I have tried to squeeze every bit of speed I could from the design, so
  some things, while they may look complex, are that way for speed
- just like in the old code flow, main() calls server_process()
- here pvpgn does some one-time initializations
- different from before, in the one-time initialization code I also add
  the listening sockets to the fdwatch socket inspection list (the code
  will also need to update this list when receiving SIGHUP)
- then pvpgn enters the main server loop
- the code first treats any received events (just like the old code)
- then it calls fdwatch() to inspect the sockets' state
- then it calls conn_reap() to destroy the conn_state_destroy connections
- then it calls fdwatch_handle() to cycle through each ready socket and
  handle its changed status

This is it! :)
No cycles, no O(n^2), not even an O(n) there (well, in fact there is
something similar to an O(n) inside fdwatch(), but about that read below).

FAQ:

1. Q: but where do the new connections get into the fdwatch inspection
      list?
   A: they get in there when they are created, that is, in the function
      sd_accept() from server.c
      the reason is: why add the connection sockets each time before
      poll(), when the event of having a new connection (so a new socket
      to inspect) is very, very rare compared to the number of times we
      call select/poll?

2. Q: where are the connections removed from the fdwatch inspection list?
   A: where they should be, in conn_destroy(), just before calling
      close() on the socket

3. Q: where do we manifest our interest in write availability of a socket
      if we have data to send to it?
   A: in conn_push_outqueue. the idea is: if we got data to send, then we
      update the fdwatch socket inspection list to look for write
      availability of the socket we need to send data on

4. Q: what does fdwatch() do?
   A: depending on the chosen backend it calls select or poll, or kqueue,
      etc. For some backends it has to do some work before calling the
      syscall. E.g. for select() and poll() it needs to copy from a
      template list of sockets to inspect to the actual working set. The
      reason why depends on the backend, but it really is a limitation of
      how the syscall works, and there is nothing pvpgn can do to avoid
      it. For example, in the poll backend one might argue that instead
      of updating a template and copying it to a working array before
      each poll(), we should update the working set directly. But that
      also means that before calling poll() we must set the "revents"
      field of each pollfd struct to 0, and my tests show that a cycle
      through 1000 poll fd structs setting revents to 0 is 5 times slower
      than using a memcpy() to copy the whole array from a source.

5. Q: what does conn_reap() do?
   A: to get the maximum from each possible backend (kqueue was the main
      reason here) I moved the cycle through each ready socket, calling
      the handling function for it, out of server.c and into the fdwatch
      backends. Because the old code used that cycle in server.c to also
      check whether connections are dead and need to be destroyed, I had
      to find another way to do it. The best way I found was to keep in
      connection.c, besides the connlist, another list, conn_dead, which
      contains the connections whose state has been set to
      conn_state_destroy. Then conn_reap() just cycles through each
      element of conn_dead and destroys it. This was the fastest solution
      I could find.

6. Q: what does fdwatch_handle() do?
   A: it calls the backend's handle function. To get the max from each
      backend I had to move the handling cycle into a backend-specific
      function. In general these functions cycle through each socket
      which was returned ready by the last fdwatch() call and call the
      handler function (which was set when the socket was added to the
      socket inspection list), giving it as arguments a void * parameter
      (also set when the socket was added to the inspection list) and the
      type of readiness (read/write). Currently, pvpgn has 3 possible
      handlers: handle_tcp, handle_udp and handle_accept. Each of these
      calls accordingly sd_accept, sd_tcpinput, sd_tcpoutput or
      sd_udpinput (UDP sends are done directly, without queueing them and
      checking for socket readiness to write; maybe this is another bug?)