I’m writing an application in Python that is going to have a lot of different functions, so logically I thought it would be best to split up my script into different modules. Currently my script reads in a text file that contains code which has been converted into tokens and spellings. The script then reconstructs the code into a string, with blank lines where comments would have been in the original code.

I’m having a problem making the script object-oriented though. Whatever I try I can’t seem to get the program running the same way it would as if it was just a single script file. Ideally I’d like to have two script files, one that contains a class and function that cleans and reconstructs the file. The second script would simply call the function from the class in the other file on a file given as an argument from the command line. This is my current script:

import sys

tokenList = open(sys.argv[1], 'r')
cleanedInput=""
prevLine = 0

for line in tokenList:

    if line.startswith('LINE:'):
        lineNo = int(line.split(':', 1)[1].strip())
        diff = lineNo - prevLine - 1

        if diff == 0:
            cleanedInput += '\n'
        if diff == 1:
            cleanedInput += '\n\n'
        else:
            cleanedInput += '\n' * diff

        prevLine = lineNo
        continue

    cleanedLine = line.split(':', 1)[1].strip()
    cleanedInput += cleanedLine + ' '

print cleanedInput

After following Alex Martelli advice below, I now have the following code which gives me the same output as my original code.

def main():
    tokenList = open(sys.argv[1], 'r')
    cleanedInput = []
    prevLine = 0

    for line in tokenList:

        if line.startswith('LINE:'):
            lineNo = int(line.split(':', 1)[1].strip())
            diff = lineNo - prevLine - 1

            if diff == 0:
                cleanedInput.append('\n')
            if diff == 1:
                cleanedInput.append('\n\n')
            else:
                cleanedInput.append('\n' * diff)

            prevLine = lineNo
            continue

        cleanedLine = line.split(':', 1)[1].strip()
        cleanedInput.append(cleanedLine + ' ')

    print cleanedInput

if __name__ == '__main__':
    main()

I would still like to split my code into multiple modules though. A ‘cleaned file’ in my program will have other functions performed on it so naturally a cleaned file should be a class in its own right?

To speed up your existing code measurably, add def main(): before the assignment to tokenList, indent everything after that 4 spaces, and at the end put the usual idiom

if __name__ == '__main__':
  main()

(The guard is not actually necessary, but it’s a good habit to have nevertheless since, for scripts with reusable functions, it makes them importable from other modules).

This has little to do with “object oriented” anything: it’s simply faster, in Python, to keep all your substantial code in functions, not as top-level module code.

Second speedup, change cleanedInput into a list, i.e., its first assignment should be = [], and wherever you now have +=, use .append instead. At the end, ''.join(cleanedInput) to get the final resulting string. This makes your code take linear time as a function of input size (O(N) is the normal way of expressing this) while it currently takes quadratic time (O(N squared)).

Then, correctness: the two statements right after continue never execute. Do you need them or not? Remove them (and the continue) if not needed, remove the continue if those two statements are actually needed. And the tests starting with if diff will fail dramatically unless the previous if was executed, because diff would be undefined then. Does your code as posted perhaps have indentation errors, i.e., is the indentation of what you posted different from that of your actual code?

Considering these important needed enhancements, and the fact that it’s hard to see what advantage you are pursuing in making this tiny code OO (and/or modular), I suggest clarifying the indenting / correctness situation, applying the enhancements I’ve proposed, and leaving it at that;-).

Edit: as the OP has now applied most of my suggestions, let me follow up with one reasonable way to hive off most functionality to a class in a separate module. In a new file, for example foobar.py, in the same directory as the original script (or in site-packages, or elsewhere on sys.path), place this code:

def token_of(line):
  return line.partition(':')[-1].strip()

class FileParser(object):
  def __init__(self, filename):
    self.tokenList = open(filename, 'r')

  def cleaned_input(self):
    cleanedInput = []
    prevLine = 0

    for line in self.tokenList:
        if line.startswith('LINE:'):
            lineNo = int(token_of(line))
            diff = lineNo - prevLine - 1
            cleanedInput.append('\n' * (diff if diff>1 else diff+1))
            prevLine = lineNo
        else:
            cleanedLine = token_of(line)
            cleanedInput.append(cleanedLine + ' ')

    return cleanedInput

Your main script then becomes just:

import sys
import foobar

def main():
    thefile = foobar.FileParser(sys.argv[1])
    print thefile.cleaned_input()

if __name__ == '__main__':
  main()

When I do this particular refactoring, I usually start with an initial transformation within the first file. Step 1: move the functionality into a method in a new class. Step 2: add
the magic invocation below to get the file to run like a script again:

class LineCleaner:

    def cleanFile(filename):
        cleanInput = ""
        prevLine = 0
        for line in open(filename,'r'):         
           <... as in original script ..>

if __name__ == '__main__':
     cleaner = LineCleaner()
     cleaner.cleanFile(sys.argv[1]) 

You can get away with creating a function and putting all your logic in it. For full “object orientedness” though, you can do something like this:

ps – your posted code has a bug on the continue line – it is always executed and the last 2 lines will never execute.

class Cleaner:
  def __init__(...):
    ...init logic...
  def Clean(self):
    for line in open(self.tokenList):
      ...cleaning logic...
    return cleanedInput

def main(argv):
  cleaner = Cleaner(argv[1])
  print cleaner.Clean()
  return 0

if '__main__' == __name__:
  sys.exit(main(sys.argv))

If presented code is all code Just don’t add any class !!

Your code is too simply for that !!
OOP approach would add unnecessary complexity.

But if still wont.
Put all code into function eg.

def parse_tokenized_input(file):
    tokenList = open(file, 'r')
    cleanedInput=""
    prevLine = 0
    #rest of code

at end add:

if __name__ == '__main__':
    parse_tokenized_input(sys.argv[1])

If code works correct put def of function to new file (and all needed imports!)
eg. mymodyle.py

your script now will be:

from mymodule.py import parse_tokenized_input

if __name__ == '__main__':
        parse_tokenized_input(sys.argv[1])

Oh and think out better name for your function and module (module should have general name).