Multiplexing filehandles with select() in Perl
The problem
I/O requests such as read() and write() are blocking requests by default. Suppose a program contains a line that
reads from STDIN on a terminal, like the following:
$input = <STDIN>;
The program's execution will block until a line of input is available, i.e. until the user types something
followed by a newline. In many cases this is the desired behavior. But suppose you have a program that accepts
requests through a socket, does some processing for each request, and then moves on to the next request:
01 use IO::Socket::INET;
02 my $s = IO::Socket::INET->new(      # create the receiving socket
03     LocalHost => 'thekla',         # host name to listen on
04     LocalPort => 7070,
05     Proto     => 'tcp',
06     Listen    => 16,
07     Reuse     => 1,
08 );
09 die "Could not create socket: $!\n" unless $s;
10
11 my ($ns, $buf);
12 while ( $ns = $s->accept() ) {              # wait for and accept a connection
13     while ( defined( $buf = <$ns> ) ) {     # read from the socket
14         # do some processing
15     }
16 }
17 close($s);
Although this is a perfectly valid way of handling incoming requests, it suffers from some serious problems,
especially if requests arrive frequently and each one needs a lot of processing.
Clearly, the problem is that once a request has been accepted, other requests are kept waiting in the queue
while we read the request message and process it. Reading from a socket is a blocking call, so if the client
takes too long to transmit its request, we just sit there waiting when we could be doing useful processing of
other requests. Not only is this unacceptable in itself, but when the demand for request processing is high, the
program may not be able to meet its operating requirements. Also consider that a single client failing at a
critical point (in the middle of an ongoing transmission) risks making the server block indefinitely.
What can we do about it?
What we need in situations like the above is a way to handle I/O independently, with some sort of apparent
parallelism. (We use sockets in this example, but the same rules apply to any kind of filehandle.) There are
two very common approaches.
One approach is to spawn a separate thread of control for each request. This can be done at process level,
using fork() to create a new process for each request, or at thread level, using Perl's threading support to
create multiple threads within the same process. (Perl's first, experimental thread support was introduced in
version 5.005.)
The other approach, which is the one we will discuss here, is to use select() to multiplex between several
filehandles within a single thread of control, creating the effect of parallelism in the handling of I/O.
What does select() do?
The idea behind select() is to avoid blocking by making sure that a call will not block before attempting it.
How do we do that? Suppose we have two filehandles, A and B, and we want to read data from them as it comes
in. Now assume that A has no input pending yet, but B is ready to respond to a read() call. If we know this, we
can read from B first instead of A, knowing that the call will not block. select() gives us exactly this
information. All we need to do is define sets of filehandles (one for reading, one for writing and one for
exceptional conditions) and call select() on them. As soon as at least one filehandle is ready for the operation
its set stands for, select() returns the filehandles that are ready.
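Perl's built-in four-argument select() works on bit vectors indexed by file descriptor number, which are built with vec() and fileno(). As a minimal sketch of the A/B scenario above (not code from the original article), the hypothetical helper readable_handles below uses two pipes as assumed stand-ins for A and B; only B has input pending, so only B should be reported as ready.

```perl
use strict;
use warnings;
use IO::Handle;    # for autoflush() on lexical handles

# Hypothetical helper: return those @handles that select() reports as
# readable within $timeout seconds (undef = wait forever, 0 = just poll).
sub readable_handles {
    my ($timeout, @handles) = @_;
    my $rin = '';
    vec($rin, fileno($_), 1) = 1 for @handles;    # one bit per descriptor
    my $rout = $rin;                              # select() overwrites its mask argument
    my $nfound = select($rout, undef, undef, $timeout);
    return () if $nfound <= 0;                    # 0 = timed out, -1 = error
    return grep { vec($rout, fileno($_), 1) } @handles;
}

# Two pipes stand in for the filehandles A and B of the text.
pipe(my $handle_a, my $feed_a) or die "pipe: $!";
pipe(my $handle_b, my $feed_b) or die "pipe: $!";
$feed_b->autoflush(1);
print {$feed_b} "data for B\n";    # only B has input pending

my @ready = readable_handles(0, $handle_a, $handle_b);
print scalar(@ready), " handle(s) ready\n";
```

Passing undef as the timeout makes select() sleep until some descriptor is ready; passing 0 makes it return immediately, which inside a loop turns into busy polling.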
Obviously this gives us the advantage of always picking a filehandle that will not block, so the entire program
is never delayed by one lazy filehandle just because it happened to be the first we picked at random. Still, it
does not guarantee that the selected filehandle is the best choice, because we still don't know how much data
can be read, or how quickly it can take in data that we write to it. But it is definitely a big step forward from
our initial program.
Using select()
We will now rewrite the example program from the beginning of this article using select(). Instead of calling
Perl's built-in select() directly, we will use a wrapper module, IO::Select, that makes life easier for us.
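IO::Select wraps those bit vectors in set objects. Its class method IO::Select->select takes up to three sets (read, write, exception) and a timeout, and returns three array references holding the ready handles, or an empty list if nothing became ready in time. The sketch below illustrates that return shape; the pipe is an assumed stand-in for a socket and the variable names are illustrative only.

```perl
use strict;
use warnings;
use IO::Select;

# A pipe stands in for a socket: $reader is one end, $writer the other.
pipe(my $reader, my $writer) or die "pipe: $!";

my $read_set  = IO::Select->new($reader);   # handles to test for readability
my $write_set = IO::Select->new($writer);   # handles to test for writability

# Returns three array refs (readable, writable, exception), or an empty
# list if nothing was ready before the timeout (0 here: just poll).
my ($can_read, $can_write) = IO::Select->select($read_set, $write_set, undef, 0);
my @readable = $can_read  ? @$can_read  : ();
my @writable = $can_write ? @$can_write : ();
printf "readable: %d, writable: %d\n", scalar @readable, scalar @writable;

# Once data is waiting in the pipe, the read end becomes readable too.
syswrite($writer, "x") or die "syswrite: $!";
my ($rh_set) = IO::Select->select($read_set, undef, undef, 0);
my @now_readable = $rh_set ? @$rh_set : ();
```

Taking only the first element of the returned list, as in my ($rh_set) = IO::Select->select(...) in the loop that follows, keeps just the readable handles.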
... create socket as before ...
11 use IO::Select;
12 my $read_set = IO::Select->new();   # create handle set for reading
13 $read_set->add($s);                 # add the main socket to the set
14
15 while (1) {                         # forever
16     # get a set of readable handles (blocks until at least one handle is ready)
17     my ($rh_set) = IO::Select->select($read_set, undef, undef, undef);
18     # take all readable handles in turn
19     foreach my $rh (@$rh_set) {
20         # if it is the main socket then we have an incoming connection and
21         # we should accept() it and then add the new socket to the $read_set
22         if ($rh == $s) {
23             my $ns = $rh->accept();
24             $read_set->add($ns);
25         }
26         # otherwise it is an ordinary socket and we should read and process the request
27         else {
28             my $n = sysread($rh, my $buf, 4096);   # unbuffered: reads only what is available
29             if ($n) {               # we got some input
30                 # ... process $buf ...
31             }
32             else {                  # 0: the client has closed the socket (undef: read error)
33                 # remove the socket from the $read_set and close it
34                 $read_set->remove($rh);
35                 close($rh);
36             }
37         }
38     }
39 }
We create an IO::Select object, $read_set, which is our set of handles to test for readability, and add every
open handle to it: first the main socket, and then each new socket returned by accept() for an incoming
connection. Then we enter a loop in which we ask select() for the list of readable handles and examine each one
in turn. If it is the main socket, we call accept() to receive the incoming connection and add the new socket to
the read set. Otherwise it must be an ordinary socket, so we read from it and process its input. If the read
returns no data, the client has closed its end of the connection (or an error occurred), so we close the socket
too and remove it from the read set. In this way we work continuously through the incoming requests, touching
a filehandle only when select() has told us that an I/O call on it can make progress.
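To watch the loop's three cases (new connection, incoming data, client hang-up) happen in order, here is a self-contained sketch that runs both ends in one process. It is an illustration under assumptions not in the original: the listener binds to 127.0.0.1 on a port chosen by the kernel (LocalPort => 0), a client in the same process plays the remote side, can_read() replaces IO::Select->select because only a read set is needed, and the loop stops after serving one connection so that it terminates. It reads with sysread(), since data already sitting in Perl's readline buffer is invisible to select().

```perl
use strict;
use warnings;
use IO::Select;
use IO::Socket::INET;

my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,          # let the kernel pick a free port
    Proto     => 'tcp',
    Listen    => 16,
    ReuseAddr => 1,
) or die "Could not create socket: $@\n";

# A client in the same process: connect, send one request, hang up.
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $listener->sockport,
    Proto    => 'tcp',
) or die "Could not connect: $@\n";
syswrite($client, "hello server\n");
close($client);

my $read_set = IO::Select->new($listener);
my @events;
my $received = '';
my $served   = 0;

while (!$served) {
    # Wait (here at most 5 s) until at least one handle is readable.
    my @ready = $read_set->can_read(5) or die "timed out\n";
    for my $rh (@ready) {
        if ($rh == $listener) {                  # a pending connection
            my $ns = $listener->accept or next;
            $read_set->add($ns);
            push @events, 'accept';
        }
        else {
            my $n = sysread($rh, my $buf, 4096);
            if ($n) {                            # got some bytes
                $received .= $buf;
                push @events, 'data';
            }
            else {                               # 0 = EOF, undef = error
                $read_set->remove($rh);
                close($rh);
                push @events, 'close';
                $served = 1;
            }
        }
    }
}
print "events: @events; received: $received";
```

In a real server the loop would run forever and can_read() would be called without a timeout, so that it sleeps until some handle is ready.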
As mentioned earlier, this method does not guarantee progress, because select() only tests whether a handle is
ready to respond to I/O. It does not tell us whether the handle we pick is the one that will respond fastest, how
much data is available for reading, or how much data the handle is ready to receive. So it is still possible to
block for a while after we have picked a handle. We also ignored the impact that the actual processing of
requests has on performance. We might just be printing incoming data to a file, but each request might equally
need heavy processing that slows down the entire handle-processing loop. These issues must be considered in
the context of the individual application.
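One way to avoid waiting for a complete request, in line with the sysread() suggestions in the comments below, is to read only the bytes already available and reassemble lines yourself. Because sysread() does not wait for a newline, one request line may arrive split across several reads, and one read may contain several lines. The sketch below keeps a per-handle buffer and hands on only complete lines; the helper read_lines, the 4096-byte read size and the UNIX-domain socketpair() standing in for a client connection are illustrative assumptions, not part of the original article.

```perl
use strict;
use warnings;
use IO::Select;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

my %pending;    # handle => bytes received but not yet ending in "\n"

# Hypothetical helper: call only for a handle select() reported readable.
# Returns an array ref of complete lines (possibly empty), or undef on EOF/error.
sub read_lines {
    my ($fh) = @_;
    my $n = sysread($fh, my $chunk, 4096);   # one read: takes whatever is there
    if (!$n) {                               # 0 = EOF, undef = error
        delete $pending{$fh};                # a trailing partial line is dropped here
        return undef;
    }
    $pending{$fh} .= $chunk;
    my @lines;
    push @lines, $1 while $pending{$fh} =~ s/^([^\n]*\n)//;
    return \@lines;
}

socketpair(my $peer, my $conn, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
my $read_set = IO::Select->new($conn);
my @got;

syswrite($peer, "first\nsecond\nthi");       # two whole lines and a fragment
for my $fh ($read_set->can_read(5)) {
    my $lines = read_lines($fh) or next;
    push @got, @$lines;                      # "thi" stays in %pending
}

syswrite($peer, "rd\n");                     # the fragment is completed later
for my $fh ($read_set->can_read(5)) {
    my $lines = read_lines($fh) or next;
    push @got, @$lines;
}

close($peer);                                # the peer hangs up
for my $fh ($read_set->can_read(5)) {
    next if defined read_lines($fh);
    $read_set->remove($fh);
    close($fh);
}
print "got ", scalar(@got), " lines\n";
```

Since nothing is read through Perl's buffered layer, no input can hide where select() cannot see it. The sketch silently drops a fragment that never receives its newline before EOF; a real server might process or log it instead.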
Comments
David | Posted at 11:33pm on Wednesday, June 27th, 2007 | Excellent tutorial, thank you! |
max | Posted at 6:45pm on Friday, July 13th, 2007 | great read |
Vijay | Posted at 8:20am on Wednesday, July 18th, 2007 | A very good tuturial, very clearly explained. Thank you. |
Anis | Posted at 4:10am on Friday, July 20th, 2007 | Excellent thanks.. |
Shahid Khan | Posted at 1:06am on Thursday, October 18th, 2007 | SO useful information |
Jonathan Perkin | Posted at 6:09am on Tuesday, January 29th, 2008 | The last argument to select() should really be undef, so that it blocks until ready. A timeout of 0 means continuously check, so it chews up 100% CPU. |
Wilko | Posted at 1:24pm on Monday, February 4th, 2008 | Very good article thank you. Thanks to Jonathan P aswell! I was maxxing out the CPU whilst the server was waiting for incoming connections. Changing the 0 to undef worked a treat. Thanks again |
DimeCadmium | Posted at 1:02pm on Tuesday, March 4th, 2008 | A better way (IMO) to get the error message (more details): $@
Also, I use:
new IO::Socket::INET(...) or die "No socket: $!/$@\n"; |
alpha | Posted at 10:56pm on Friday, March 14th, 2008 | You should not mix buffered input, i.e. <$fh>, with select. Use {select/sysread/syswrite} or {print/<$fh>/read/write} |
He Man | Posted at 4:42am on Wednesday, March 19th, 2008 | NICE.... |
Rick | Posted at 4:49pm on Friday, April 25th, 2008 | I believe line 28 is a blocking IO statement. If the other end of the connection went away during an IO, this entire app will wait on that line. I tested this via Telnet as the other end - when I type my first character, the IO::Select detects it and then it blocks at line 28 until I hit carriage return. Does anyone have a solution to this issue? |
MattCarter | Posted at 11:39am on Wednesday, July 30th, 2008 | As alpha pointed out, the IO::Select example above has a serious flaw: The diamond operator (<>) (the shortcut for readline()) does buffered I/O. The buffer that perl uses for the diamond operator is NOT visible to IO::Select. So, the above code will hang in the subsequent select call if multiple lines arrive simultaneously. To avoid this problem, the perl program must use unbuffered IO calls like sysread(...) . |
Ankit Kapoor | Posted at 12:06pm on Sunday, November 9th, 2008 | Xcellent Tutorial! |
Frank | Posted at 8:27am on Tuesday, March 24th, 2009 | Is it the same also for UDP socket? my udp socket don't accept the connect(), i deleted the line with the connect command, and i modified $read_set->add($ns); to $read_set->add(my main socket);
but it'doesn't work.
HELP
tnx for your tutorial! |
Daniel | Posted at 10:46am on Saturday, June 6th, 2009 | Very nice tutorial. it inspired me very much |
Vetrivel | Posted at 8:13pm on Thursday, September 10th, 2009 | nice to have tutorial |
Travis | Posted at 5:28am on Tuesday, October 20th, 2009 | Excellent tutorial, this was EXACTLY what I was looking for. |
Colin | Posted at 10:39am on Sunday, November 22nd, 2009 | It was mentioned a couple time that mixing unbuffered reading/writing with buffered reading/writing is a bad idea, this may be a ridiculous question, but how would that look if this program was re-written using entirely un-buffered IO calls ( sysread, syswrite etc.. ) |
sri | Posted at 6:23pm on Thursday, February 4th, 2010 | Good tut.
So is there any solution to the concerns of the last para.
Thanks |
Jagadeesan | Posted at 10:59pm on Thursday, March 25th, 2010 | Execellent !!!! Thank you very much |
saurabh verma | Posted at 3:03am on Tuesday, April 20th, 2010 | Well , I've been looking out for a similar kind of article , and this one really helped me understand multiplexing filehandles |
stefan | Posted at 10:45am on Saturday, May 8th, 2010 | thanks, great totorial. thats what I have been looking for |
Brett G | Posted at 2:12pm on Tuesday, August 31st, 2010 | I'm no expert, but I think the following code properly converts the buffered call (to the readline diamond op <>) to buffered sysread command (reference 'alpha' and 'MattCarter" posts)
$status = sysread($rh,$buf,512);
if ($status>0) { # we get normal input
# ... process $buf ... |
Brett G | Posted at 2:14pm on Tuesday, August 31st, 2010 | OOPS...make that "UN-buffered sysread call" in my last post. |
Manoj Hirwani | Posted at 12:50am on Saturday, May 14th, 2011 | Its really nice tutorial, Thanks alot!! |
Anonymous | Posted at 7:06am on Thursday, March 15th, 2012 | This was very useful |
denv | Posted at 4:16pm on Wednesday, June 27th, 2012 | Thx to u for notes!
Specialy thx to Jonathan Perkin with undef! |
enzo | Posted at 1:14am on Tuesday, November 13th, 2012 | you can all so use epoll or anyevent have a look at cpan
https://metacpan.org/search?q=anyevent
https://metacpan.org/search?q=io::epoll |
z-man | Posted at 4:07pm on Tuesday, November 27th, 2012 | Thank you for explaining it so clearly - what a difference a few well-placed comments make!
Also thanks to others for the sysread() notes. |
Suggested Reading
Advanced Perl Programming, among various other very interesting subjects, dedicates a chapter to socket
programming, providing a very clear and to-the-point treatment of the topic. It includes a short discussion of
select() and its use in handling sockets. It is also a good book to have in general if you're seriously interested
in Perl programming.