If you work with strings in your Python scripts and you're writing obscure logic to process them, then you need to look into regex in Python. It lets you describe patterns instead of writing ...
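As a rough illustration of describing a pattern rather than hand-writing parsing logic, here is a minimal sketch using Python's standard `re` module. The sample text, the `DATE_PATTERN` name, and the `find_dates` helper are assumptions made for this example, not something taken from the original article.

```python
import re

# Hypothetical sample text, used for illustration only.
text = "Order A-1042 shipped on 2023-05-17; order B-7 shipped on 2023-06-02."

# Pattern: four digits, a dash, two digits, a dash, two digits (ISO-style date).
# The \b anchors keep the match from starting or ending inside a longer run of digits.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def find_dates(s):
    """Return every ISO-style date found in s as a (year, month, day) tuple of ints."""
    return [tuple(int(part) for part in m.groups()) for m in DATE_PATTERN.finditer(s)]


dates = find_dates(text)
print(dates)
```

Doing the same thing with manual splitting and index arithmetic would likely take several loops and special cases, whereas here the pattern itself states what a date looks like. Note that this pattern only checks the shape of the string, so a value such as `2023-99-99` would still match.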
Smarter document extraction starts here.
This library has been tested on a limited set of documents. It is highly likely that documents exist from which the library, in its current state, cannot extract text.