Python for dummies, Part II
In my previous Python entry I posted code to collect string/word information from a document/string, plus a simple MSN-specific parser that allows me to navigate a DOM document and extract its nodes and content.
Now, I need functionality to search for the files to parse on a disk and I need a GUI.
Search first.
I spent an awful lot of time playing around with OS-specific code to traverse and navigate directory structures only to find out that Python has most of this functionality built-in, faster, and cross-platform.
I didn't want my disk search to take ages. Hence, I decided to split it in three phases:
1 - Search for log files on the current folder, in case the user drops the executable there. Assuming the parser from Part I is named 'my_parser':
import os, fnmatch
files = [filename for filename in os.listdir('.')]
for f in files:
# only list XML files
if fnmatch.fnmatch( f, '*.xml' ):
my_parser.parse(f)
2 - Search for files, starting from the user's profile folder. This prunes the search space by avoiding OS and user specific folders. I separated searching the right folders from listing the log files;
import os, fnmatch, re
folders = []
files = []
r = re.compile(r'msn_username') # msn_username is passed by the user
def browse((r, folders), dirpath, namelist):
for name in namelist:
# list folders' names starting with msn_username given
if r.search(name):
folders.append(os.path.join(dirpath, name))
def listfiles((wildcard, files), dirpath, namelist):
for name in namelist:
# only append 'wildcard'-specified filenames
if fnmatch.fnmatch( name, wildcard ):
files.append(os.path.join(dirpath, name))
userbase = os.environ['USERPROFILE']
# change directory to user profile folder
os.chdir(userbase)
os.path.walk(os.getcwd(), browse, (r, folders))
if folders:
# populate the list of XML files from the folders
wcard = '*.xml'
for fld in folders:
os.path.walk(fld, listfiles, (wcard, files))
3 - Search for files on the whole disk, in case none was found with the two previous methods;
# replace the userbase value with root directory
userbase = '\\'
os.chdir(userbase)
os.path.walk(os.getcwd(), browse, (r, folders))
...
And that's it. Amazing how simple it looks now. Part III, the GUI, will follow soon.
UPDATE: This module does most of the above in a nicer, more elegant way. No need to reinvent the wheel.
No comments:
Post a Comment