PvPGN Abstract socket event notification API ie fdwatch
v.01 (C) 2003 Dizzy <dizzy@roedu.net>

1. The problem:

I wanted to design and implement a general API to have PvPGN inspect
socket status (read/write availability) in the best possible way that the
host OS may offer (select()/poll()/kqueue()/epoll()/etc.).

Also, the old PvPGN code was doing such things in a very slow way,
especially if the system was using poll() (which was the default on most
Unices). When the server started to have lots of active connections, the
CPU time PvPGN spent inspecting and handling them increased sharply (the
code executed on each main loop run had O(n^2) complexity, where n is the
number of connections to the server, and the main loop cycles at least
1000/BNETD_POLL_INTERVAL times per second, ie at least 50 times per second).

2. The fdwatch API:

I started by reading the fdwatch code from the thttpd project; I used the
ideas found in that code as a starting point, but I have gone far beyond
them :).
The fdwatch API is described in fdwatch.h as follows:

extern int fdwatch_init(void);
extern int fdwatch_close(void);
extern int fdwatch_add_fd(int fd, t_fdwatch_type rw, fdwatch_handler h, void *data);
extern int fdwatch_update_fd(int fd, t_fdwatch_type rw);
extern int fdwatch_del_fd(int fd);
extern int fdwatch(long timeout_msecs);
extern int fdwatch_check_fd(int fd, t_fdwatch_type rw);
extern void fdwatch_handle(void);

The names of the functions should make it self-explanatory what they do.

3. The changed code flow:

A. the code flow before fdwatch
- main() calls server_process()
- server_process(), after doing some one-time initializations, entered
  the main loop
- in the main loop, after handling the events, it starts to prepare the
  sockets for select/poll
- it starts a loop cycling through each address configured in bnetd.conf
  to listen on them and adds their sockets to the socket inspection list
- after this, it does an O(n) cycle where it populates the socket
  inspection list with the sockets of every t_connection the server has
  (read availability)
- if any of these t_connections have packets in the outqueue (they need
  to send data) then their sockets are also added for write availability
- then pvpgn calls select()/poll() on the socket inspection list
- after the syscall returns, pvpgn cycles through each address configured
  in bnetd.conf and checks if it is read available (if a new connection
  is to be made)
- pvpgn doesn't want to accept new connections when in the shutdown phase
  but it did this the wrong way: it completely ignored the listening
  sockets while in the shutdown phase; this meant that once a connection
  was pending during the shutdown phase, select/poll returned immediately
  because the socket was in the read availability list, and thus pvpgn
  was using 99% CPU while in the shutdown phase
- anyway, after this, pvpgn does an O(n) cycle through each t_connection
  to check if its socket is read/write available
- the problem is that when it was using poll() (the common case on
  Unices), checking whether a socket was returned as read/write available
  by poll() used another O(n) function, thus making the total cycle O(n^2)
- while cycling through each connection to inspect whether its socket was
  returned available by select/poll, pvpgn also checks if the connection
  is in the destroy state (conn_state_destroy) and if so destroys it

B. the code flow after fdwatch
- I have tried to squeeze every bit of speed I could from the design, so
  some things, while they may look complex, are that way for speed
- just like in the old code flow, main() calls server_process()
- here pvpgn does some one-time initializations
- different from before, in the one-time initialization code I also add
  the listening sockets to the fdwatch socket inspection list (the code
  will also need to update this list when receiving SIGHUP)
- then pvpgn enters the main server loop
- the code first treats any received events (just like the old code)
- then it calls fdwatch() to inspect the sockets' state
- then it calls conn_reap() to destroy the conn_state_destroy connections
- then it calls fdwatch_handle() to cycle through each ready socket and
  handle its changed status

This is it! :)
No cycles, no O(n^2), not even an O(n) there (well, in fact there is
something similar to an O(n) inside fdwatch(), but about that read below).

FAQ:

1. Q: but where do the new connections get into the fdwatch inspection
      list?
   A: they get in there when they are created, that is, in the function
      sd_accept() from server.c
      the reason is: why add the connection sockets each time before
      poll(), when the event of having a new connection (so a new socket
      to inspect) is very, very rare compared to the number of times we
      call select/poll?

2. Q: where are the connections removed from the fdwatch inspection list?
   A: where they should be, in conn_destroy(), just before calling
      close() on the socket

3. Q: where do we manifest our interest in write availability of a socket
      if we have data to send to it?
   A: in conn_push_outqueue. the idea is: if we got data to send, then we
      update the fdwatch socket inspection list to look for write
      availability of the socket we need to send data on

4. Q: what does fdwatch() do?
   A: depending on the chosen backend it calls select or poll, or kqueue,
      etc. For some backends it has to do some work before calling the
      syscall. E.g. for select() and poll() it needs to copy from a
      template list of sockets to inspect to the actual working set. The
      reason why depends on the backend, but it really is a limitation of
      how the syscall works, and there is nothing pvpgn can do to avoid
      it. For example, in the poll backend one might argue that instead
      of updating a template and copying it to a working array before
      each poll(), we should update the working set directly. But that
      also means that before calling poll() we must set the "revents"
      field of each pollfd struct to 0, and my tests show that a cycle
      through 1000 poll fd structs setting revents to 0 is 5 times slower
      than using a memcpy() to copy the whole array from a source.

5. Q: what does conn_reap() do?
   A: to get the maximum from each possible backend (kqueue was the main
      reason here) I moved the cycle through each ready socket, calling
      the handling function for it, out of server.c and into the fdwatch
      backends. Because the old code used that cycle in server.c to also
      check whether connections are dead and need to be destroyed, I had
      to find another way to do it. The best way I found was to keep in
      connection.c, besides the connlist, another list, conn_dead, which
      contains the connections whose state has been set to
      conn_state_destroy. Then conn_reap() just cycles through each
      element of conn_dead and destroys it. This was the fastest solution
      I could find.

6. Q: what does fdwatch_handle() do?
   A: it calls the backend's handle function. To get the max from each
      backend I had to move the handling cycle into a backend-specific
      function. In general these functions cycle through each socket
      which was returned ready by the last fdwatch() call and call the
      handler function (which was set when the socket was added to the
      socket inspection list), giving it as arguments a void * parameter
      (also set when the socket was added to the inspection list) and the
      type of readiness (read/write). Currently, pvpgn has 3 possible
      handlers: handle_tcp, handle_udp and handle_accept. Each of these
      calls accordingly sd_accept, sd_tcpinput, sd_tcpoutput or
      sd_udpinput (UDP sends are done directly, without queueing them and
      checking for socket readiness to write; maybe this is another bug?)