You should be using the tokenizer to actually parse the sentences. If you tokenize on spaces, you can put spaces back in only where they are needed (between words) and not have to worry about 'removing' them.
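
As a rough sketch of that idea (the question's language and tokenizer aren't shown here, so this uses Python, and the `normalize_spaces` name is just a hypothetical helper for illustration): split the sentence into tokens on whitespace, then join the tokens with a single space.

```python
def normalize_spaces(sentence: str) -> str:
    """Rebuild a sentence with exactly one space between words.

    Hypothetical helper for illustration; not from the original question.
    """
    # str.split() with no arguments splits on runs of whitespace and
    # drops empty tokens, so leading, trailing, and repeated spaces vanish.
    tokens = sentence.split()
    # Put a single space back in only where it is needed: between words.
    return " ".join(tokens)


if __name__ == "__main__":
    print(normalize_spaces("  the   quick brown  fox "))
```

Because the spaces are only ever added between tokens, there is never a stray space to strip out afterwards.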