I have some legacy code with a legacy function that takes a filename as an argument and processes the file contents. A working facsimile of the code is below.
What I want to do is avoid writing content that I generate to disk just to use this legacy function, so I thought I could use StringIO to create an object in place of the physical filename. However, this does not work, as you can see below.
I thought StringIO was the way to go with this. Can anyone tell me if there is a way to use this legacy function and pass it something in the argument that isn’t a file on disk but can be treated as such by the legacy function? The legacy function does have the with context manager doing work on the filename parameter value.
The one thing I came across on Google was http://bugs.python.org/issue1286, but that didn’t help me…
from pprint import pprint
import StringIO

# Legacy Function
def processFile(filename):
    with open(filename, 'r') as fh:
        return fh.readlines()

# This works
print 'This is the output of FileOnDisk.txt'
pprint(processFile('c:/temp/FileOnDisk.txt'))
print

# This fails
plink_data = StringIO.StringIO('StringIO data.')
print 'This is the error.'
pprint(processFile(plink_data))
This is the output in FileOnDisk.txt:
['This file is on disk.\n']
This is the error:
Traceback (most recent call last):
  File "C:\temp\test.py", line 20, in <module>
    pprint(processFile(plink_data))
  File "C:\temp\test.py", line 6, in processFile
    with open(filename, 'r') as fh:
TypeError: coercing to Unicode: need string or buffer, instance found
A StringIO instance is an open file already. The open command, on the other hand, only takes filenames and returns an open file. A StringIO instance is not suitable as a filename.
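To make the distinction concrete, here is a small sketch. It assumes Python 3 and the io module rather than the Python 2 StringIO module used in the question, and the sample text is made up. The buffer is read directly because it already behaves like the object open() would return, while handing it to open() is rejected.

```python
import io

# Already an open, file-like object: no open() call is needed to read it.
buf = io.StringIO("StringIO data.\n")
lines = buf.readlines()

# open() expects a path (or a file descriptor), so this raises TypeError.
try:
    open(buf)
    open_error = None
except TypeError as exc:
    open_error = exc
```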
Also, you don’t need to close a StringIO instance, so there is no need to use it as a context manager either. While closing an instance frees the memory allocated, so does simply letting the garbage collector reap the object. At any rate, the contextlib.closing() context manager could take care of closing the object if you want to ensure freeing the memory while still holding a reference to the object.
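As a rough illustration of the contextlib.closing() point, the sketch below assumes Python 3 and io.StringIO, with made-up sample data. The name buf is still bound after the block, but the buffer itself has been closed.

```python
import io
from contextlib import closing

# closing() calls buf.close() when the block exits, even on an exception.
with closing(io.StringIO("line one\nline two\n")) as buf:
    lines = buf.readlines()

# The reference survives the block; the underlying buffer does not.
is_closed = buf.closed
```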
If all your legacy code can take is a filename, then a StringIO instance is not the way to go. Use the tempfile module to generate a temporary filename instead.
Here is an example using a contextmanager to ensure the temp file is cleaned up afterwards:
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def tempinput(data):
    # delete=False so the file survives close() and can be reopened by name
    temp = tempfile.NamedTemporaryFile(delete=False)
    temp.write(data)
    temp.close()
    try:
        yield temp.name
    finally:
        os.unlink(temp.name)

with tempinput('Some data.\nSome more data.') as tempfilename:
    processFile(tempfilename)
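On Python 3, NamedTemporaryFile opens in binary mode by default, so writing a str to it raises a TypeError. A text-mode variant might look like the sketch below. The names tempinput_text and process_file and the encoding parameter are hypothetical, and process_file is only a stand-in for the legacy function.

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def tempinput_text(data, encoding="utf-8"):
    # mode="w" makes the temp file accept str; delete=False lets the
    # legacy code reopen it by name after we close our own handle.
    temp = tempfile.NamedTemporaryFile(mode="w", encoding=encoding, delete=False)
    try:
        temp.write(data)
        temp.close()
        yield temp.name
    finally:
        temp.close()          # harmless if the file is already closed
        os.unlink(temp.name)  # remove the file once the block is done

def process_file(filename):
    # Stand-in for the legacy function: it only accepts a filename.
    with open(filename, "r") as fh:
        return fh.readlines()

with tempinput_text("Some data.\nSome more data.") as name:
    lines = process_file(name)
    existed = os.path.exists(name)

gone = not os.path.exists(name)
```

The write is inside the try block so that the file is still removed if writing fails.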
You can also switch to the newer Python 3 infrastructure offered by the io module (available in Python 2 and 3), where io.BytesIO is the more robust replacement for cStringIO.StringIO. This object does support being used as a context manager (but still can’t be passed to open()).
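For reference, using an io object as a context manager could look like this sketch, which assumes Python 3 and made-up sample bytes. Leaving the block closes the buffer.

```python
import io

# io.BytesIO implements __enter__/__exit__, so it works in a with statement.
with io.BytesIO(b"first\nsecond\n") as buf:
    lines = buf.readlines()

closed_after = buf.closed
```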
You could define your own open function:

fopen = open

def open(fname, mode):
    # Pass file-like objects straight through; open real filenames as usual
    if hasattr(fname, "readlines"):
        return fname
    else:
        return fopen(fname, mode)
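However, with calls __enter__ on entry and __exit__ when it is done, and StringIO has neither method…
You could define a custom class to use with this open:

class MyStringIO:
    def __init__(self, txt):
        self.text = txt

    def readlines(self):
        # keepends=True keeps the trailing newlines, as file.readlines() does
        return self.text.splitlines(True)

    def __enter__(self):
        # The with statement binds this return value to the "as" name
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Nothing to clean up; returning None lets exceptions propagate
        pass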
This one is based on the Python documentation for contextmanager.
It just wraps StringIO in a simple context manager: when the with block exits, execution returns to the yield point and the StringIO is properly closed. This avoids the need for a temp file, but with a large string it will still eat up memory, since StringIO buffers the whole string.
It works well in most cases where you know the string data is not going to be long.
from contextlib import contextmanager

@contextmanager
def buildStringIO(strData):
    from cStringIO import StringIO
    # Create the buffer before the try so finally never sees an unbound name
    fi = StringIO(strData)
    try:
        yield fi
    finally:
        fi.close()
Then you can do:
with buildStringIO('foobar') as f:
    print(f.read())  # will print 'foobar'