Compressed NNTP feeds

I have modified NNTPRelay and Diablo to support on-the-fly compression for NNTP feeds over TCP. This page describes how the protocol negotiation works and gives some details of the implementation.

In a normal NNTP conversation, a typical command exchange might look like this:

(s> for the feeding machine, r> for the recipient)

s> (connects)
r> 200 Welcome
s> mode stream
r> 203 StreamOK
s> check <anId@ttt.ddd.yyy>
r> 238 <anId@ttt.ddd.yyy>
s> takethis <anId@ttt.ddd.yyy>
(...article...CRLF.CRLF)
r> 239 OK

In my implementation, the stream looks like this:

s> (connects)
r> 200 Welcome
s> mode stream
r> 203 StreamOK
s> mode compress
r> 207 compression enabled
s> check <anId@ttt.ddd.yyy> (compressed)
r> 238 <anId@ttt.ddd.yyy>
s> takethis <anId@ttt.ddd.yyy> (compressed)
(...article...CRLF.CRLF) (compressed)
r> 239 OK

"(compressed)" indicates that the data is compressed. All data sent from the feeding machine to the recipient is compressed once the 207 response has been returned; the recipient's responses stay uncompressed, as in the exchange above.

Details of the compression.

The compression is accomplished with zlib.

When "mode compress" is received on the downstream site, I initialize a z_stream structure with inflateInit() and return a 207 response to the upstream feeder. The 207 response causes the feeding server to initialize a z_stream structure with deflateInit().
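The handshake described above might be sketched roughly as follows. This is an illustration using Python's zlib module, not the NNTPRelay or Diablo code; the Receiver and Feeder classes and their method names are hypothetical, and the sketch assumes the feeder sends nothing further until it has seen the 207 response.

```python
import zlib


class Receiver:
    """Hypothetical downstream peer: plain text until 'mode compress', inflated after."""

    def __init__(self):
        self.inflater = None  # stands in for the z_stream set up by inflateInit()
        self.lines = []       # command lines decoded so far
        self._pending = b""   # bytes of a line that is not complete yet

    def feed(self, data):
        """Consume bytes from the feeder; return the list of replies to send back."""
        if self.inflater is not None:
            data = self.inflater.decompress(data)
        self._pending += data
        replies = []
        while b"\r\n" in self._pending:
            line, self._pending = self._pending.split(b"\r\n", 1)
            self.lines.append(line)
            if line.lower() == b"mode compress" and self.inflater is None:
                # Assumption: the feeder waits for the 207 before sending more.
                self.inflater = zlib.decompressobj()
                replies.append(b"207 compression enabled\r\n")
        return replies


class Feeder:
    """Hypothetical upstream peer: starts deflating once it has seen a 207."""

    def __init__(self):
        self.deflater = None  # stands in for the z_stream set up by deflateInit()

    def on_reply(self, reply):
        if reply.startswith(b"207"):
            self.deflater = zlib.compressobj()

    def encode(self, data):
        """Return the bytes to put on the wire for one write."""
        if self.deflater is None:
            return data
        return self.deflater.compress(data) + self.deflater.flush(zlib.Z_PARTIAL_FLUSH)


rx, tx = Receiver(), Feeder()
for reply in rx.feed(tx.encode(b"mode compress\r\n")):
    tx.on_reply(reply)
wire = tx.encode(b"check <anId@ttt.ddd.yyy>\r\n")  # compressed on the wire
rx.feed(wire)
print(rx.lines[-1])
```

Only the feeder-to-recipient direction is compressed here, matching the exchange shown earlier; replies travel as plain text.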

For each write, the feeding server notices that it is in compressed mode and compresses the data with a call to deflate(). To constrain buffering requirements and keep the stream flowing reasonably, the compressor must be flushed. I flush the compressor in two scenarios.

First, I never send more than 2048 bytes of uncompressed command (CHECK or IHAVE) data without flushing the compressor by passing the Z_PARTIAL_FLUSH flag to deflate(). This reduces buffering requirements on the remote host, and is small enough to keep streams flowing without waiting too long for the decompressor to output a full transaction. I did not experiment very much with the 2k size, so it would be worthwhile to try larger sizes to see whether they improve compression rates.
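A minimal sketch of the 2048-byte rule, again in Python's zlib module and not taken from either implementation. The function name deflate_with_flushes and the constant FLUSH_EVERY are hypothetical, and the sketch only looks at a single buffer, whereas a real feeder would presumably keep the count running across writes.

```python
import zlib

FLUSH_EVERY = 2048  # uncompressed bytes allowed between flushes (the 2k figure above)


def deflate_with_flushes(deflater, data, limit=FLUSH_EVERY):
    """Compress data, flushing after at most `limit` uncompressed bytes.

    Returns a list of wire chunks, one per flush.
    """
    chunks = []
    for start in range(0, len(data), limit):
        piece = data[start:start + limit]
        chunks.append(deflater.compress(piece)
                      + deflater.flush(zlib.Z_PARTIAL_FLUSH))
    return chunks


# Example: about 5 KB of CHECK commands ends up as three flushed chunks.
data = b"".join(b"check <id%d@ttt.ddd.yyy>\r\n" % i for i in range(200))
chunks = deflate_with_flushes(zlib.compressobj(), data)

inflater = zlib.decompressobj()
received = b""
for chunk in chunks:
    # Each chunk can be inflated as soon as it arrives.
    received += inflater.decompress(chunk)
print(len(chunks), received == data)
```

Raising FLUSH_EVERY in a sketch like this is one way to try the larger sizes mentioned above and compare the total compressed length.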

Second, I flush the compressor (again with Z_PARTIAL_FLUSH) after each block write made by the software. In NNTPRelay, a block write might include 1 IHAVE command or up to 30 CHECK commands. In Diablo, each IHAVE or CHECK is flushed individually because of the way the implementation above the I/O routine works. Both Diablo and NNTPRelay flush the compressor after the CRLF.CRLF that terminates an article transmission.
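The per-write flush could look something like the sketch below, which batches CHECK commands the way NNTPRelay is described as doing and flushes once per batch. The helper names block_writes and send_blocks and the constant MAX_CHECKS_PER_WRITE are hypothetical; setting the batch size to 1 would approximate the Diablo behaviour described above.

```python
import zlib

MAX_CHECKS_PER_WRITE = 30  # batch size quoted above for NNTPRelay


def block_writes(message_ids, batch=MAX_CHECKS_PER_WRITE):
    """Yield one uncompressed block per write: up to `batch` CHECK commands."""
    for start in range(0, len(message_ids), batch):
        group = message_ids[start:start + batch]
        yield b"".join(b"check " + mid + b"\r\n" for mid in group)


def send_blocks(deflater, blocks):
    """Compress each block and flush, so every write is decodable on arrival."""
    for block in blocks:
        yield deflater.compress(block) + deflater.flush(zlib.Z_PARTIAL_FLUSH)


# Example: 70 message IDs are sent as three writes.
ids = [b"<id%d@ttt.ddd.yyy>" % i for i in range(70)]
inflater = zlib.decompressobj()
for wire in send_blocks(zlib.compressobj(), block_writes(ids)):
    commands = inflater.decompress(wire)
    # The recipient sees only whole commands after each write.
    print(commands.count(b"\r\n"), commands.endswith(b"\r\n"))
```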

I currently assume that the compressor doesn't produce data blocks that will stall the remote end decompressor on article transmission.  It could happen if a compressed block had forward references for data further than 2048 - (its size) bytes forward in the stream.  This situation is recognized and the connection is closed if it happens, but this is sub-optimal. :)
