sebsauvage.net - Quick & Dirty Python Programs

Quick & Dirty Python Programs

This page contains (hopefully useful) quick & dirty Python programs. I wrote them because I needed them, and I have put them in the public domain. Feel free to reuse and tweak them.

stripscripts.py - stripscripts 1.0p - Script stripper

This script will scan a directory (and its subdirectories) and disable all scripts (JavaScript, VBScript...) in .html and .htm files. The scripts are not deleted, only deactivated, so you can review them later if you like.

Can be useful for sites you have downloaded with HTTrack or similar tools. No more nosy or buggy scripts in your local HTML files.

Syntax : python stripscripts.py <directory>

Example : python stripscripts.py d:\myfiles
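
A minimal sketch of the idea, not the actual stripscripts.py code. It assumes one particular way of deactivating a script: putting type="text/plain" first in each script tag, so that browsers treat the content as inert data while the original code stays in the file for review. The function names are hypothetical, and inline event handlers (onclick and friends) are not touched.

```python
import re
from pathlib import Path

# Matches an opening <script tag that has not been disabled yet.
# Working on bytes keeps every other byte of the file untouched,
# whatever its character encoding (as long as it is ASCII-compatible).
SCRIPT_OPEN = re.compile(rb'<script\b(?! type="text/plain")', re.IGNORECASE)


def disable_scripts(html):
    """Return html (bytes) with every <script> tag marked as plain text."""
    return SCRIPT_OPEN.sub(b'<script type="text/plain"', html)


def strip_directory(root):
    """Disable scripts in all .html/.htm files under root.

    Returns the number of files that were modified.
    """
    changed = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in (".html", ".htm"):
            original = path.read_bytes()
            updated = disable_scripts(original)
            if updated != original:
                path.write_bytes(updated)
                changed += 1
    return changed
```

Calling strip_directory(r"d:\myfiles") would process the same tree as the example above. Running it twice is harmless, because tags that are already disabled are skipped.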

snapper.py - Snapper 1.2p - System snapshot

This script will scan a directory (and its subdirectories), compute the SHA-1 (Secure Hash Algorithm) hash of specific files (according to their extensions) and output a CSV file (suited for loading into a spreadsheet editor or a database, or simply for comparing with diff or ExamDiff). Useful for system tampering detection (a kind of poor man's TripWire).

You can redirect the output of this script to a file.
eg. python snapper.py > todayCheck.csv

Directory to scan and extensions are hardcoded in the script. Feel free to tweak them to suit your needs.
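
A rough sketch of the approach described above, not the real snapper.py. The column layout (path, size, sha1) and the function names are assumptions made for illustration.

```python
import csv
import hashlib
import os


def sha1_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large files do not fill the memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root, extensions, out):
    """Write one CSV row per matching file under root to the stream out.

    extensions is a set of lowercase suffixes such as {".exe", ".dll"}.
    """
    writer = csv.writer(out)
    writer.writerow(["path", "size", "sha1"])
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # stable order makes two snapshots easy to diff
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() not in extensions:
                continue
            full = os.path.join(dirpath, name)
            try:
                writer.writerow([full, os.path.getsize(full), sha1_of_file(full)])
            except OSError:
                continue  # unreadable file: skip it
```

With snapshot(r"c:\windows", {".exe", ".dll"}, sys.stdout) (after import sys), the output can be redirected to a file exactly as shown above, and two snapshots taken on different days can be compared with any diff tool.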

newsarchiver.py - newsArchiver 1.1p - Usenet archiver

This script will download all available messages from the desired Usenet group with NNTP and save them as plain text files. Useful for bulk group archiving, even if messages later fall beyond the server's delete threshold.

Each message will be saved in a separate file (filenames are chosen according to the group and message ID, eg. comp_lang_python_108476).

You can interrupt the script anytime (by pressing CTRL+C) and re-run it later. The script will resume and download only the missing messages.

Read script comments for configuration information.

email_extractor.py - Email Extractor 1.0p - Email addresses extractor

This script takes whatever you throw at stdin (text file, html, EXE...) and extracts email addresses.

eg. python email_extractor.py < PythonFAQ.html

This script can be used for whatever you want, except spamming !
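
A minimal sketch of this kind of extraction, not the actual email_extractor.py. The regular expression is a deliberately simple assumption: it catches common addresses but does not implement the full address grammar.

```python
import re

# Simplified pattern: local part, "@", then at least two dot-separated labels.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(text):
    """Return the unique addresses found in text, lowercased, in order of appearance."""
    found = (match.group(0).lower() for match in EMAIL_RE.finditer(text))
    return list(dict.fromkeys(found))
```

To handle arbitrary input (including binary files such as an EXE), one option is to read sys.stdin.buffer and decode it as latin-1, which never fails, before calling extract_emails().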

autozip.py - Auto-Zip 1.0p - Auto-zipper

This script will scan a directory (and its subdirectories) and will automatically zip found files (according to their extensions). I use this script to automatically ZIP the backups of my SQL server.

This script does not use Python's internal ZIP routines. This script requires InfoZip's ZIP.EXE to be present in the path (see script comments for URL).

Extensions to zip are hardcoded in the script.
Directory to scan is hardcoded at the end of the script.
Feel free to tweak them to suit your needs.

ultima_s.py - Ult*ma Strasbourg list-dumper 1.0

This script will fetch the entire catalog from Ult*ma Strasbourg (French CD and DVD bargains).
The script will request all pages with HTTP, parse HTML code and output a CSV file (suited to loading into Excel or any other spreadsheet).

eg. python ultima_s.py > today.csv

Comments in this script are in French (sorry !).

doublesdetector.py - Doubles detector 1.0p - Duplicate file finder

This script will find files that have the same content (in several directories or drives), whatever their name, date, time or attributes.

eg. python doublesdetector.py c:\;d:\;e:\ > doubles.txt
will find identical files on C:, D: and E: drives.
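
One common way to find identical files is sketched below. It is an assumption about the technique, not the code of doublesdetector.py: files are first grouped by size (cheap), and only files sharing a size are hashed. The function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """SHA-1 of a file's content, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(roots):
    """Return a list of groups (sorted lists of paths) with identical content."""
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    by_size[os.path.getsize(path)].append(path)
                except OSError:
                    continue  # vanished or unreadable file

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            try:
                by_hash[file_digest(path)].append(path)
            except OSError:
                continue
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

For the example above, the command-line argument could be split on ";" and passed in as find_duplicates(["c:\\", "d:\\", "e:\\"]).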

kupdate101.py - KAV/AVP update - Kaspersky antivirus signatures updater

This script is a handy replacement for the KAV/AVP antivirus auto-update feature. It will download only the necessary files and unzip them in the right directory. You can schedule this script to run every day to stay up-to-date.

pypack002.py - pyPack 0.0.2 - Python script packer

Packs individual Python scripts into pure text. Benefits:

smaller scripts (size gain as soon as the script is above 800 bytes)
text-only output : the scripts can be directly executed on any platform.
no indentation in packed scripts : you can easily distribute scripts without fear of breaking them through lost indentation or unsupported character sets.
original source code can be retrieved by replacing exec with print in the packed code.

Of course, pyPack is itself packed with pyPack ! ;-)
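
To give an idea of how such a packer can work, here is a small sketch. The page does not say which compression or encoding pyPack actually uses, so zlib plus base64 is purely an assumption, and pack() is a hypothetical name.

```python
import base64
import zlib


def pack(source):
    """Return a text-only script without indentation that runs source.

    The source is compressed with zlib, then encoded in base64 so the
    result contains only plain ASCII characters.
    """
    payload = base64.b64encode(zlib.compress(source.encode("utf-8"), 9))
    return (
        "import zlib,base64\n"
        "exec(zlib.decompress(base64.b64decode('%s')).decode('utf-8'))\n"
        % payload.decode("ascii")
    )
```

In the packed output, swapping exec for print shows the original source instead of running it. The fixed wrapper around the payload also means that very small scripts come out larger than they went in.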

fum01dev3.py - fastUnixMailbox 0.1 DEV 3 - Fast Unix mailbox reader

This module is a replacement for Python's UnixMailbox (from the standard mailbox module). fastUnixMailbox is much faster:

the mailbox is automatically indexed upon opening. Further message access is almost instant.
messages can be accessed in any order (random access).
This module does not use line-read (readline or xreadlines), only binary file access. It does not use regular expressions either.

You'll be able to access the content of mailbox files very quickly. It's especially useful on large mailboxes.
(This version is a development version, but performs very well.)
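
The indexing idea can be sketched as follows. This is not the fastUnixMailbox code: it uses mmap to scan the raw bytes, and it assumes that every message starts with a "From " line and that body lines starting with "From " have been escaped (as ">From ") by whatever wrote the mailbox. The function names are hypothetical.

```python
import mmap
import os


def index_mbox(path):
    """Return a list of (start, end) byte offsets, one pair per message."""
    if os.path.getsize(path) == 0:
        return []  # mmap cannot map an empty file
    starts = []
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        if mm[:5] == b"From ":
            starts.append(0)
        pos = mm.find(b"\nFrom ")
        while pos != -1:
            starts.append(pos + 1)  # the message starts after the newline
            pos = mm.find(b"\nFrom ", pos + 1)
        size = len(mm)
    return list(zip(starts, starts[1:] + [size]))


def read_message(path, index, n):
    """Return the raw bytes of message number n using a prebuilt index."""
    start, end = index[n]
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)
```

Once the index is built, fetching any message costs one seek and one read, whatever its position in the file.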

scsv.py - Simple CSV file reader 0.0.2 - Fast and precise CSV/TSV file reader

This parser can read:

CSV (Comma-separated value) files (,)
SCSV (Semi-colon-separated value) files (;)
TSV (Tab-separated value) files (tabulation)

This parser:

supports lines with mixed quoted and unquoted strings.
supports cells which contain newlines (quoted strings spanning several lines).
can optionally ignore comments.
can optionally ignore empty lines.
can optionally strip left and right spaces in cells.
can efficiently iterate over very large CSV files without eating all the memory.
can handle very badly-formatted CSV files.
does not use regular expressions (which makes it faster and "Maximum recursion limit exceeded"-error-proof.)

This class is very easy to use : open the file, and use nextrow() to get each CSV row. Example included in source code.

Note on 2003-05-06: This module is deprecated and will no longer be updated. Python has had CSV support built in since version 2.3. You should probably switch to Python 2.3's csv module and drop SCSV.
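
For reference, here is a short sketch of how some of the SCSV options listed above could be reproduced on top of the standard csv module. The read_rows() helper and its parameter names are hypothetical, and the code targets a current Python 3 rather than 2.3.

```python
import csv


def read_rows(path, delimiter=",", strip=True, skip_empty=True, comment_prefix=None):
    """Yield rows one at a time, so very large files never sit fully in memory."""
    # newline="" lets the csv module handle newlines inside quoted cells.
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter=delimiter):
            if strip:
                row = [cell.strip() for cell in row]
            if skip_empty and not any(row):
                continue
            if comment_prefix and row and row[0].startswith(comment_prefix):
                continue
            yield row
```

For a semicolon-separated file with comments, this would be used as: for row in read_rows("data.csv", delimiter=";", comment_prefix="#"). Use delimiter="\t" for TSV files.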

html2csv.py - HTML tables to CSV converter

A coarse "HTML tables to CSV" (Comma-Separated Values) converter. All tables from the HTML file will be converted (as they occur) into a single CSV file, suited for loading into a spreadsheet editor.

Can convert arbitrary size HTML files.
Supports badly-formatted HTML (missing tag, etc.).

To convert a bunch of HTML files, just type: python html2csv.py *.html

This is also an example of the use of the HTMLParser module.
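
A stripped-down sketch of the technique, not the html2csv.py source. It uses html.parser.HTMLParser from Python 3; the class and function names are hypothetical. Missing closing tags are tolerated because a new cell or row implicitly closes the previous one.

```python
import csv
import io
from html.parser import HTMLParser


class TableToRows(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._flush_row()
            self._row = []
        elif tag in ("td", "th"):
            self._flush_cell()
            if self._row is None:
                self._row = []  # cell without an enclosing <tr>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag in ("tr", "table"):
            self._flush_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def _flush_cell(self):
        if self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _flush_row(self):
        self._flush_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None


def html_to_csv(html_text):
    """Convert all tables found in html_text into one CSV string."""
    parser = TableToRows()
    parser.feed(html_text)
    parser.close()
    parser._flush_row()  # in case the document ends mid-table
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()
```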

gossyp - gossips from the internet

See http://sebsauvage.net/python/gossyp/ for more details.

webGobbler - mix random images from the internet

See http://sebsauvage.net/python/webgobbler/ for more details.

delxml2html.py - del.icio.us XML export to HTML converter

HTML export of del.icio.us sucks. It's not in chronological order.
This program takes the XML export of del.icio.us and converts it to an HTML page. It's a nice way to have all your del.icio.us bookmarks in a single HTML page (like mine).

Instructions:

Go to http://del.icio.us/api/posts/all
Enter your del.icio.us login and password
Save the page as all.xml
Run this program
You have your bookmarks in favs.html

Feel free to tweak this program to suit your needs.
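
The conversion itself can be sketched in a few lines. This is not the delxml2html.py source: it assumes the export is a posts element containing post elements with href, description and time attributes, and that time is an ISO-style timestamp that sorts correctly as plain text. posts_to_html() is a hypothetical name.

```python
import html
import xml.etree.ElementTree as ET


def posts_to_html(xml_data):
    """Return an HTML page listing all bookmarks in chronological order."""
    root = ET.fromstring(xml_data)
    posts = sorted(root.iter("post"), key=lambda p: p.get("time", ""))
    lines = ["<html><body><ul>"]
    for post in posts:
        href = post.get("href", "")
        title = post.get("description") or href
        lines.append('<li>%s <a href="%s">%s</a></li>' % (
            html.escape(post.get("time", "")),
            html.escape(href, quote=True),
            html.escape(title),
        ))
    lines.append("</ul></body></html>")
    return "\n".join(lines)
```

Reading all.xml in binary mode, passing its content to posts_to_html() and writing the result to favs.html would reproduce the workflow described in the instructions above.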

PS: Yes, I know XSLT exists. But XSLT sucks.

myradioplayer.py - No-brainer "Play the music I want"

This is a no-brainer program "Play the music I want". It's dead simple:

Enter an artist name or song title.
Click "Play!"

That's all !
Quality is low (MP3 at 64 kbit/s most of the time), but you'll get almost any song you want !
Give it a try with "u2", "manson" or "wild horses".

How can this be ?
Well, this program simply uses RadioBlogClub.com to find the music you want.
THIS PROGRAM DOES NOT DOWNLOAD A SINGLE BYTE OF MUSIC FROM THE INTERNET. So don't bother suing me, you'd be wasting your time.

Requirements: An MP3 player which supports the M3U playlist format and HTTP streaming.

About proxy support: If you use a proxy, you can set the environment variable HTTP_PROXY. Example: SET HTTP_PROXY=http://proxy.myisp.com:3128

cbscraper.py - ASPN Python cookbook scraper

ASPN Python cookbook (http://aspn.activestate.com/ASPN/Cookbook/Python) is a great site with tons of recipes for Python.
The shame is that you cannot browse these snippets offline.

This program downloads all the recipes and packs them into a single HTML file. This is nice for offline browsing while travelling, or for carrying on a USB key.
This is also handy for quick full-text search.

For those interested, I keep a copy of the result of this script here. It's updated once in a while.

bashfr_download.py - bashfr.org downloader

This program downloads the whole bashfr.org archive and writes it in a single HTML file. Nice for offline reading.

Keep in mind these scripts are quick & dirty ! (some parameters are hardcoded, some scripts do not have error checking at all, etc.)

On the other hand, they are fully functional, properly written and quite readable. Feel free to tweak the source code. There is always room for code beautifying and optimization. I would appreciate being told if you make use of these scripts, but this is not compulsory.

You will find downloads, documentation, howtos, tutorials, books, code snippets, loads of sources and links at http://www.python.org and http://www.python-eggs.org/links.html.
