spugnews is a script intended to extract large binaries from multiple
usenet postings. It knows how to reassemble groups of files, and can do
rudimentary analysis of articles based on their subject lines to allow you to
easily see what's available.
First, you'll need my Spug Libraries. I also recommend the yenc module, which
decodes yenc-encoded posts 4 times as fast as the built-in pure-Python
decoder.
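For the curious, here is a rough idea of what a pure-Python yenc decoder has to do for each line of payload. This is only a sketch of the byte mapping (every byte is shifted by 42, and "=" escapes a byte that was shifted by a further 64). It is not the decoder shipped with spugnews, nor the yenc module's API, and the function names are made up for illustration. It ignores the =ybegin/=ypart/=yend control lines and the CRC check.

```python
def yenc_decode_line(data: bytes) -> bytes:
    """Decode one line of yEnc payload (control lines are not handled)."""
    out = bytearray()
    escaped = False
    for byte in data:
        if escaped:
            # An escaped byte was shifted by an extra 64 on top of the usual 42.
            out.append((byte - 64 - 42) % 256)
            escaped = False
        elif byte == 0x3D:  # '=' marks the next byte as escaped
            escaped = True
        else:
            out.append((byte - 42) % 256)
    return bytes(out)


def yenc_encode_line(data: bytes) -> bytes:
    """Inverse mapping, included only so the sketch can be round-trip checked."""
    out = bytearray()
    for byte in data:
        enc = (byte + 42) % 256
        if enc in (0x00, 0x0A, 0x0D, 0x3D):  # NUL, LF, CR and '=' must be escaped
            out.append(0x3D)
            enc = (enc + 64) % 256
        out.append(enc)
    return bytes(out)


sample = bytes(range(256))
round_trip = yenc_decode_line(yenc_encode_line(sample))
print(round_trip == sample)
```

A byte-at-a-time loop like this is exactly the kind of thing that runs much faster in a C extension, which is why the yenc module is worth installing.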
GNUish folks can just do the usual "configure; make; make install"
dance. Others should just do whatever is necessary to the spugnews
file to make it executable on their systems.
The first time you run spugnews, it will ask you for the name of your
usenet server and the directory in which to store files. This information will
be stored in a ".spugnewsrc" file in your home directory. This is just
a plain python file with some variable definitions in it, so you can directly
edit it if you want to change these things.
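To illustrate what "a plain python file with some variable definitions" means in practice, here is a minimal sketch of how such a file can be read. The variable names "server" and "storage_dir" and the load_rc() helper are hypothetical; look at your own ".spugnewsrc" for the names spugnews really uses.

```python
import os
import tempfile


def load_rc(path):
    """Execute a Python file of variable definitions and return them as a dict."""
    settings = {}
    with open(path) as src:
        exec(compile(src.read(), path, "exec"), settings)
    # exec() adds __builtins__ to the namespace; drop it and any other dunder names.
    return {k: v for k, v in settings.items() if not k.startswith("__")}


# Hypothetical contents: the real variable names in .spugnewsrc may differ.
example = 'server = "news.example.com"\nstorage_dir = "~/news"\n'

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
    tmp.write(example)

config = load_rc(tmp.name)
os.unlink(tmp.name)
print(config["server"], os.path.expanduser(config["storage_dir"]))
```

Because the file is executed as Python, anything you put in it must be valid Python syntax (quoted strings, for instance).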
spugnews evaluates each of its command line options in sequence. You
can specify as many options as you want - some perform actions, some change the
program's internal state to affect actions later on.
The "-g" option must be specified prior to any others. This option
identifies the usenet group.
"-r" refreshes the article header list for the current usenet group -
it retrieves all new headers and deletes information on articles whose
article number is lower than the lowest article number still held by the server.
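The refresh logic described above can be sketched as follows. This is an illustration only: the refresh_headers() function and its arguments are hypothetical, the header table is modelled as a simple dict from article number to subject, and no server is contacted.

```python
def refresh_headers(known, server_low, fetched):
    """Return a new header table after a refresh.

    known      -- dict mapping article number to subject (what is already stored)
    server_low -- lowest article number the server still holds
    fetched    -- dict of newly retrieved headers
    """
    # Drop everything that has expired from the server.
    kept = {num: subj for num, subj in known.items() if num >= server_low}
    # Add the headers that arrived since the last refresh.
    kept.update(fetched)
    return kept


known = {100: "old post (1/2)", 205: "Happy Gilmore (1/2)"}
fetched = {310: "Happy Gilmore (2/2)"}
updated = refresh_headers(known, server_low=200, fetched=fetched)
print(sorted(updated))
```

Article 100 is dropped because it is below the server's lowest number; 205 is kept and 310 is added.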
The header list is stored in the file "headers.dat" in a
subdirectory of the storage directory named after the usenet group. For
example, if you chose the default location for files during installation, the
headers.dat file for "alt.binaries.movies" would be
"~/news/alt.binaries.movies/headers.dat".
The program maintains an internal "current header set", which is a subset of
the header list. Filter operations (such as "-p") modify this subset,
allowing you to perform operations on a subset of the available headers.
The "-l" option lists the article number followed by the subject
for each article in the current header set. For example, to list all headers in
the group "alt.binaries.movies", you could use:

    spugnews -g alt.binaries.movies -l
The "-p" option allows you to filter the current header set based on a
regular expression, so to see all articles with "Happy Gilmore" in the subject
line:

    spugnews -g alt.binaries.movies -p 'Happy Gilmore' -l

Note that the -p must come before the -l: this is because
command line options are processed in the order that they are specified, and we
want to filter before listing.
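Here is a small sketch of why the order matters. The "current header set" is modelled as a dict, and apply_pattern() and list_headers() are hypothetical stand-ins for what "-p" and "-l" do. Whether spugnews matches the pattern anywhere in the subject (as re.search does here) or only at the start is an assumption on my part.

```python
import re


def apply_pattern(current, pattern):
    """Keep only the headers whose subject matches the regular expression."""
    regex = re.compile(pattern)
    return {num: subj for num, subj in current.items() if regex.search(subj)}


def list_headers(current):
    """Format each header as 'article-number subject', in article order."""
    return [f"{num} {subj}" for num, subj in sorted(current.items())]


headers = {
    1933765: "Happy Gilmore (1/2)",
    1933766: "Happy Gilmore (2/2)",
    1933801: "Some other film (1/1)",
}

# Listing first shows everything; the filter would only affect later options.
listed_then_filtered = list_headers(headers)

# Filtering first narrows the set that the listing then works on.
filtered_then_listed = list_headers(apply_pattern(headers, "Happy Gilmore"))

print(len(listed_then_filtered), len(filtered_then_listed))
```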
Trying to determine the status of files encoded in hundreds of articles can
be very time-consuming, so you will generally want to use the -a
(analyze) option instead.
The "-a" option performs a detailed analysis of the current header set
and attempts to group together articles that are part of the same file set based
on the contents of the subject line. Unfortunately, there is no universal
format used for identifying these kinds of file sets, but most people seem to
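use one of the following formats:

    prefix file_number "/" num_files bridge part_number "/" num_parts suffix
    prefix file_number " of " num_files bridge part_number "/" num_parts suffix
    prefix part_number "/" num_parts suffix

As such, these are the formats that are recognized by spugnews.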
"-a" first lists "rogue" articles: these are articles that it was
unable to group with any others. After this, the "article groups" (sets of
articles grouped together based on the information in their subject lines) are
listed in order of the article number of their first articles. This ordering
scheme causes the more recent groups to appear last.
Following each article group description is a list of any issues that were
encountered with the set (if any). Typical problems include missing files and
missing parts.
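The kind of analysis described above can be sketched roughly like this. The regular expressions are my own reading of the three subject formats (prefix, file_number "/" or " of " num_files, bridge, part_number "/" num_parts, suffix) and are not taken from the spugnews source. As a simplification, any subject that matches none of the formats is treated as a rogue, and groups are keyed on the prefix and file count only.

```python
import re

MULTI_FILE = re.compile(
    r"^(?P<prefix>.*?)(?P<file>\d+)(?:/| of )(?P<files>\d+)"
    r"(?P<bridge>.*?)(?P<part>\d+)/(?P<parts>\d+)(?P<suffix>.*)$"
)
SINGLE_FILE = re.compile(
    r"^(?P<prefix>.*?)(?P<part>\d+)/(?P<parts>\d+)(?P<suffix>.*)$"
)


def parse_subject(subject):
    """Return (group_key, file_no, part_no, num_parts), or None if unrecognized."""
    m = MULTI_FILE.match(subject)
    if m:
        key = (m["prefix"], int(m["files"]))
        return key, int(m["file"]), int(m["part"]), int(m["parts"])
    m = SINGLE_FILE.match(subject)
    if m:
        # A single-file posting is treated as a set containing one file.
        return (m["prefix"], 1), 1, int(m["part"]), int(m["parts"])
    return None


def analyze(subjects):
    """Group subjects into file sets; report rogues and per-group issues."""
    rogues, groups = [], {}
    for subject in subjects:
        parsed = parse_subject(subject)
        if parsed is None:
            rogues.append(subject)
            continue
        key, file_no, part_no, num_parts = parsed
        files = groups.setdefault(key, {})
        info = files.setdefault(file_no, {"num_parts": num_parts, "seen": set()})
        info["seen"].add(part_no)

    issues = {}
    for (prefix, num_files), files in groups.items():
        problems = []
        for file_no in range(1, num_files + 1):
            if file_no not in files:
                problems.append(f"missing file {file_no}")
                continue
            info = files[file_no]
            missing = sorted(set(range(1, info["num_parts"] + 1)) - info["seen"])
            if missing:
                problems.append(f"file {file_no} missing parts {missing}")
        issues[(prefix, num_files)] = problems
    return rogues, issues


subjects = [
    "Happy Gilmore [1/2] hg.r00 (1/2)",
    "Happy Gilmore [1/2] hg.r00 (2/2)",
    "Happy Gilmore [2/2] hg.r01 (1/3)",
    "Happy Gilmore [2/2] hg.r01 (3/3)",
    "Trailer (1/2)",
    "hello world",
]
rogues, issues = analyze(subjects)
print(rogues)
print(issues)
```

In this made-up data, the second file of the "Happy Gilmore" set is missing part 2 of 3, the trailer is missing its second part, and "hello world" ends up as a rogue.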
The "-d" option is used to download articles. It downloads all
articles in the current header set that have not already been downloaded, so
you generally want to use this with a filter. Articles are downloaded into
files of the form "news.article-number" in the group directory.
"-d" is often accompanied by "-x" (extract files) to extract
and reassemble multi-part files. For example, to download and extract all parts
of "Happy Gilmore":

    spugnews -g alt.binaries.movies -p "Happy Gilmore" -dx

Note that options without arguments can be grouped together (that is, we used
"-dx" instead of "-d -x").
spugnews doesn't make any decisions about when to delete the articles
that it has downloaded - you'll want to periodically go into the group directory
and delete *.news.
Sometimes you just want to view a single article, headers and all. The
"-v" option allows you to do this. Specify "-v" with the article
number to download, and the entire article will be written to standard output:

    spugnews -g alt.binaries.movies -v 1933765
The complete set of command line options can be viewed with "spugnews
-h".
I wrote spugnews because I didn't like any of the other free
newsgrabbers that I looked at and because it bothered me that there didn't seem
to be one written in Python. It isn't the best grabber in the world: there are
subject formats that it doesn't recognize which it arguably should, it doesn't
put together rar files, and it would be nice if it had a better user interface
and the ability to work more automatically in the background.
However, in spite of these limitations, I find spugnews to be pretty
useful. There's a good chance that I'll add features to it as I need them, and
there are certainly "big picture" items I'd like to see added (integration into
spugmail, perhaps?). However, at this point it is what it is. Hope you like
it, and feel free to send me comments and patches.
1.1 - 2005-11-22
- internal yenc decoder (not as fast as the yenc module, but good for the
  impatient)
- workaround for a problem with hex encoding in newer (2.3+) versions of Python
  that affects the yenc CRC check
- status indicator for header refresh
- incremental header writing (so if there's a problem in the middle of
  downloading thousands of headers, you don't lose what you've got so far)

1.0
- Initial release.
Michael A. Muller mmuller@enduden.com
Installation
Usage
Listing Headers
spugnews -g alt.binaries.movies -l
spugnews -g alt.binaries.movies -p 'Happy Gilmore' -l
Analyzing Article Groups
prefix file_number "/" num_files bridge part_number "/" num_parts suffix
prefix file_number " of " num_files bridge part_number "/" num_parts suffix
prefix part_number "/" num_parts suffix
Downloading
spugnews -g alt.binaries.movies -p "Happy Gilmore" -dx
Viewing Articles
spugnews -g alt.binaries.movies -v 1933765
Other Info
Release History
1.1 - 2005-11-22
1.0
Contact Info