It's way harder than it should be to have a CGI script do something
asynchronously in Apache. The root of the problem is that it's not
enough to fork a child; you also have to close stdin, stdout, and stderr.
Only you can't really close them; you have to reassign them:
import sys, os, time

print "Content-Type: text/plain\n\n",
print "Script started"

if os.fork() == 0:
    # Reassign stdin, stdout, and stderr for the child
    # so Apache will ignore it
    si = file('/dev/null', 'r')
    so = file('/dev/null', 'a+')
    se = file('/dev/null', 'a+', 0)
    os.dup2(si.fileno(), sys.stdin.fileno())
    os.dup2(so.fileno(), sys.stdout.fileno())
    os.dup2(se.fileno(), sys.stderr.fileno())
    # Do whatever you want asynchronously
    time.sleep(2)
    os.execv('/bin/sleep', ['sleep', '5'])

print "Process was forked"

This is explained pretty well in Perl and in Python. It's a shame that sys.stdin.close() doesn't work. I still haven't seen a good explanation for why Apache doesn't send partial output from a CGI: Apache says it doesn't buffer and neither does python -u. Grr. Ah ha, mod_gzip does buffer, unsurprisingly. Thanks to Marc for research help.
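For reference, here's the kind of explicit flushing that ought to produce partial output. It's just an illustrative loop, and it still doesn't help when something downstream like mod_gzip holds the whole response.

#!/usr/bin/python -u
# Illustrative only: flush each chunk of CGI output explicitly.
# Even unbuffered, mod_gzip downstream can still buffer the whole response.
import sys, time

print "Content-Type: text/plain\n"
sys.stdout.flush()
for i in range(5):
    print "chunk %d" % i
    sys.stdout.flush()    # push this chunk out to Apache right away
    time.sleep(1)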
I've been using mod_gzip on
my weblog server to try to save bandwidth. Today I crunched some
numbers and learned that gzip encoding only works for about one third of the web
requests for HTML that I get. When it does work, it compresses to
about 30% of the original size.
Turns out that while most user browsers support gzip encoding, most spiders don't. GoogleGuy says this may be because servers don't reliably serve gzip. I could believe that given the contortions I had to go through. RSS aggregators are mostly good about supporting gzip. They are good about handling 304 Not Modified, too. Good thing; RSS polling is such a huge source of traffic.
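To make the 304 point concrete, here's a rough sketch of the conditional GET a well-behaved aggregator does; the feed URL and saved date are placeholders.

# Rough sketch of a conditional GET, the way polite aggregators poll.
# The Last-Modified value would normally be saved from the previous fetch.
import urllib2

url = 'http://example.com/index.rss'              # placeholder feed URL
last_modified = 'Sat, 01 Mar 2003 00:00:00 GMT'   # from the prior response

request = urllib2.Request(url)
request.add_header('If-Modified-Since', last_modified)
try:
    data = urllib2.urlopen(request).read()        # feed changed, reparse it
except urllib2.HTTPError, e:
    if e.code == 304:
        data = None                               # unchanged, nothing to do
    else:
        raise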
I serve my blog via my dinky 128kbps upstream DSL link, so bandwidth
is precious. Fixing
the fiasco of
mod_gzip triggering an MSIE bug helps a lot. Now I'm
supporting If-Modified-Since and ETag headers on my blog content,
too. The magic is Bob Schumaker's
lastmodified
plugin, which pretty much Just Works. Thanks, Bob!
Please tell me if you see any caching weirdness.
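For the curious, the idea behind a lastmodified-style plugin is simple enough to sketch as a bare CGI. The file name and ETag scheme below are made up for illustration; this is not Bob's actual code.

# Toy sketch of conditional-GET support in a CGI.
# Illustrative only: a real plugin tracks real entry dates or hashes content.
import os, time

path = 'index.html'                               # whatever backs the page
mtime = os.path.getmtime(path)
last_modified = time.strftime('%a, %d %b %Y %H:%M:%S GMT', time.gmtime(mtime))
etag = '"%x"' % int(mtime)

if (os.environ.get('HTTP_IF_MODIFIED_SINCE') == last_modified or
        os.environ.get('HTTP_IF_NONE_MATCH') == etag):
    print "Status: 304 Not Modified"
    print
else:
    print "Content-Type: text/html"
    print "Last-Modified: %s" % last_modified
    print "ETag: %s" % etag
    print
    print open(path).read()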
Today I learned that Internet Explorer isn't caching any images from my blog
at all. Why? A
nasty
bug in MSIE that
mod_gzip
triggers. Gory details and a partial fix below.
The issue is that mod_gzip includes the following header in all responses:
Vary: Accept-Encoding
This helps prevent caches from serving gzip data to browsers that
can't support it.
Unfortunately it also triggers a bug in MSIE - the browser won't cache any document with that header! So with mod_gzip 95% of the world's browsers won't cache any pages from the server. Some bandwidth savings. It'd be nice if mod_gzip was smart enough not to add the Vary: header if it didn't compress the file, but it's not. A partial workaround is to turn mod_gzip off for files it won't be compressing anyway, like images.
<FilesMatch "\.(gif|jpe?g|png)$">
This fix is only partial; other files (say, HTML) still won't be
cached. Three choices - stop using gzip, lose caching in IE,
or drop the Vary: header and break caches.
mod_gzip_on No </FilesMatch>
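If you want to check whether the workaround took, a quick look at the response headers on an image tells you; the host and path here are placeholders.

# Quick check: with the workaround, images should come back with no
# Content-Encoding and no Vary: Accept-Encoding header.
import httplib

conn = httplib.HTTPConnection('example.com')      # placeholder host
conn.request('GET', '/images/photo.gif', headers={'Accept-Encoding': 'gzip'})
response = conn.getresponse()
print response.status
print 'Vary:', response.getheader('Vary')
print 'Content-Encoding:', response.getheader('Content-Encoding')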
Michael was kind enough to write me and comment on my mod_gzip notes.
He suggests not specifying
mod_gzip_item_exclude reqheader "User-agent: Mozilla/4.0[678]"
because it results in a
Vary: User-Agent
header which makes life hard on proxy servers and only protects the
minuscule few people who run old Netscape 4.0 versions. Isn't
technology fun? He also says that Apache 2.0's
mod_deflate
does indeed make HTTP compression easier; Apache 2 was designed for
plugins to filter traffic as it is served.
In the spirit of saving bits I set up mod_gzip on my
Apache 1.3 server. Now HTTP stuff is compressed in transit.
Fetching my weblog went from 20384 bytes to 9717
bytes; even better, it went from 38 packets to 21 packets. This may
seem silly but on an ADSL line upstream bandwidth is hugely limited;
anything I can do to save bandwidth is welcome.
Usability of mod_gzip is fairly low: the original site is offline and the docs are awful. Fortunately someone has taken on the task of making a decent support site. Even then the details of how it works are magic and opaque; honestly, this kind of server configuration should be much easier, or automatic. Maybe it is in Apache 2.0.
Here's the magic I'm using:
LoadModule gzip_module /usr/lib/apache/1.3/mod_gzip.so
<IfModule mod_gzip.c>
mod_gzip_on yes
mod_gzip_dechunk yes
mod_gzip_item_exclude reqheader "User-agent: Mozilla/4.0[678]"
mod_gzip_item_include handler ^cgi-script$
mod_gzip_item_include mime text/
</IfModule>
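To sanity-check the whole thing, fetch a page while claiming gzip support and make sure the body really is compressed and decompresses cleanly. Another rough sketch, again with a placeholder host:

# Rough sketch: confirm the server really serves gzip to clients that
# ask for it, and that the payload decompresses.
import gzip, httplib, StringIO

conn = httplib.HTTPConnection('example.com')      # placeholder host
conn.request('GET', '/', headers={'Accept-Encoding': 'gzip'})
response = conn.getresponse()
body = response.read()

print 'Content-Encoding:', response.getheader('Content-Encoding')
print 'compressed bytes:', len(body)
if response.getheader('Content-Encoding') == 'gzip':
    html = gzip.GzipFile(fileobj=StringIO.StringIO(body)).read()
    print 'uncompressed bytes:', len(html)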