PvPGN Abstract socket event notification API, i.e. fdwatch
v.01 (C) 2003 Dizzy <dizzy@roedu.net>

1. The problem:

I wanted to design and implement a general API that lets PvPGN inspect
socket status (read/write availability) in the best way the host OS can
offer (select()/poll()/kqueue()/epoll()/etc.).
The old PvPGN code did this very slowly, especially when the system
used poll() (the default on most Unices). As the server accumulated
active connections, the CPU time PvPGN spent inspecting and handling them
grew sharply: the code executed on each main loop run had O(n^2)
complexity, where n is the number of connections to the server, and the
main loop cycles at least 1000/BNETD_POLL_INTERVAL times per second, i.e.
at least 50 times per second.

2. The fdwatch API:

I started by reading the fdwatch code from the thttpd project. I used
the ideas found in that code as a starting point, but ended up going well
beyond them :).
The fdwatch API is declared in fdwatch.h as follows:

extern int fdwatch_init(void);
extern int fdwatch_close(void);
extern int fdwatch_add_fd(int fd, t_fdwatch_type rw, fdwatch_handler h, void *data);
extern int fdwatch_update_fd(int fd, t_fdwatch_type rw);
extern int fdwatch_del_fd(int fd);
extern int fdwatch(long timeout_msecs);
extern int fdwatch_check_fd(int fd, t_fdwatch_type rw);
extern void fdwatch_handle(void);

The names of the functions should make it self-explanatory what they do.

3. The changed code flow:

A. the code flow before fdwatch
- main() calls server_process()
- server_process(), after doing some one-time initializations, entered
  the main loop
- in the main loop, after handling the events, it started to prepare the
  sockets for select/poll
- it looped through each address configured in bnetd.conf to listen on,
  adding their sockets to the socket inspection list
- after this, it did an O(n) cycle populating the socket inspection list
  with the socket of every t_connection the server had (read
  availability)
- if any of these t_connections had packets in the outqueue (i.e. data
  to send), their sockets were also added for write availability
- then pvpgn called select()/poll() on the socket inspection list
- after the syscall returned, pvpgn cycled through each address
  configured in bnetd.conf and checked whether it was read available
  (i.e. a new connection was pending)
- pvpgn does not want to accept new connections during the shutdown
  phase, but it did this the wrong way: it completely ignored the
  listening sockets while shutting down. As a result, once a connection
  was pending during shutdown, select/poll returned immediately (the
  socket stayed in the read availability list), and pvpgn used 99% CPU
  for the whole shutdown phase
- after this, pvpgn did an O(n) cycle through each t_connection to check
  whether its socket was read/write available
- the problem is that with poll() (the common case on Unices), checking
  whether a socket was returned as read/write available by poll() used
  another O(n) function, making the total cycle O(n^2)
- while cycling through each connection to see whether its socket was
  returned available by select/poll, pvpgn also checked whether the
  connection was in the destroy state (conn_state_destroy) and, if so,
  destroyed it
B. the code flow after fdwatch
- I have tried to squeeze every bit of speed I could from the design, so
  while some things may look complex, the reason behind them is speed
- just like in the old code flow, main() calls server_process()
- here pvpgn does some one-time initializations
- unlike before, the one-time initialization code now also adds the
  listening sockets to the fdwatch socket inspection list (the code will
  also need to update this list when receiving SIGHUP)
- then pvpgn enters the main server loop
- the code first handles any received events (just like the old code)
- then it calls fdwatch() to inspect the sockets' state
- then it calls conn_reap() to destroy the conn_state_destroy
  connections
- then it calls fdwatch_handle() to cycle through each ready socket and
  handle its changed status

This is it! :)
No cycles, no O(n^2), not even an O(n) in there (well, in fact there is
something similar to an O(n) inside fdwatch(), but read about that
below).

FAQ:
1. Q: but where do the new connections get into the fdwatch inspection
list?
   A: they get in there when they are created, i.e. in the function
sd_accept() from server.c.
The reason: why re-add the connection sockets before every poll(), when
the event of getting a new connection (and thus a new socket to inspect)
is very rare compared to the number of times we call select/poll?

2. Q: where are the connections removed from the fdwatch inspection
list?
   A: where they should be: in conn_destroy(), just before calling
close() on the socket.

3. Q: where do we express our interest in write availability of a socket
when we have data to send to it?
   A: in conn_push_outqueue(). The idea is: if we have data to send,
update the fdwatch socket inspection list to also watch for write
availability on the socket we need to send data to.

4. Q: what does fdwatch() do?
   A: depending on the chosen backend, it calls select(), poll(),
kqueue(), etc. For some backends it has to do some work before the
syscall; for example, for select() and poll() it needs to copy a
template list of sockets to inspect into the actual working set. The
reason depends on the backend, but it really is a limitation of how the
syscall works, and there is nothing pvpgn can do to avoid it. For
example, in the poll() backend one might argue that instead of updating
a template and copying it to a working array before each poll(), we
should update the working set directly. But that would mean setting the
"revents" field of every pollfd struct to 0 before each poll() call, and
my tests show that cycling through 1000 poll fd structs setting revents
to 0 is 5 times slower than memcpy()ing the whole array from a template.

5. Q: what does conn_reap() do?
   A: to get the maximum out of each possible backend (kqueue was the
main reason here), I moved the cycle through each ready socket, and the
call of its handling function, out of server.c and into the fdwatch
backends. Because the old code used that cycle in server.c to also check
whether connections were dead and needed to be destroyed, I had to find
another way to do it. The best way I found was to keep in connection.c,
besides the connlist, another list, conn_dead, containing the
connections whose state is currently set to conn_state_destroy. Then
conn_reap() just cycles through each element of conn_dead and destroys
it. This was the fastest solution I could find.

6. Q: what does fdwatch_handle() do?
   A: it calls the backend's handle function. To get the maximum out of
each backend, I had to make the handling cycle a backend-specific
function. In general, these functions cycle through each socket returned
ready by the last fdwatch() call and call its handler function (set when
the socket was added to the socket inspection list), passing it a void *
parameter (also set when the socket was added to the inspection list)
and the type of readiness (read/write). Currently, pvpgn has 3 possible
handlers: handle_tcp, handle_udp and handle_accept. These call,
accordingly, sd_accept, sd_tcpinput, sd_tcpoutput and sd_udpinput (UDP
sends are done directly, without queueing them and checking for socket
write readiness; maybe this is another bug?)