Question

[Solved] How to solve the memory error in Python

I am dealing with several large txt file, each of them has about 8000000 lines. A short example of the lines are:

usedfor zipper fasten_coat
usedfor zipper fasten_jacket
usedfor zipper fasten_pant
usedfor your_foot walk
atlocation camera cupboard
atlocation camera drawer
atlocation camera house
relatedto more plenty

The code to store them in a dictionary is:

dicCSK = collections.defaultdict(list)
for line in finCSK:
    line=line.strip('n')
    try:
        r, c1, c2 = line.split(" ")
    except ValueError:
        print line
    dicCSK[c1].append(r+" "+c2)

It runs good in the first txt file, but when it runs to the second txt file, I got an error MemoryError.

I am using window 7 64bit with python 2.7 32bit, intel i5 cpu, with 8Gb memory. How can I solve the problem?

Further explaining:
I have four large files, each file contains different information for many entities. For example, I want to find all information for cat, its father node animal and its child node persian cat and so on. So my program first read all txt files in the dictionary, then I scan all dictionaries to find information for cat and its father and its children.

Solution #1:

Simplest solution: You’re probably running out of virtual address space (any other form of error usually means running really slowly for a long time before you finally get a MemoryError). This is because a 32 bit application on Windows (and most OSes) is limited to 2 GB of user mode address space (Windows can be tweaked to make it 3 GB, but that’s still a low cap). You’ve got 8 GB of RAM, but your program can’t use (at least) 3/4 of it. Python has a fair amount of per-object overhead (object header, allocation alignment, etc.), odds are the strings alone are using close to a GB of RAM, and that’s before you deal with the overhead of the dictionary, the rest of your program, the rest of Python, etc. If memory space fragments enough, and the dictionary needs to grow, it may not have enough contiguous space to reallocate, and you’ll get a MemoryError.

Install a 64 bit version of Python (if you can, I’d recommend upgrading to Python 3 for other reasons); it will use more memory, but then, it will have access to a lot more memory space (and more physical RAM as well).

If that’s not enough, consider converting to a sqlite3 database (or some other DB), so it naturally spills to disk when the data gets too large for main memory, while still having fairly efficient lookup.

Respondent: flyingmouse

Solution #2:

Assuming your example text is representative of all the text, one line would consume about 75 bytes on my machine:

In [3]: sys.getsizeof('usedfor zipper fasten_coat')
Out[3]: 75

Doing some rough math:

75 bytes * 8,000,000 lines / 1024 / 1024 = ~572 MB

So roughly 572 meg to store the strings alone for one of these files. Once you start adding in additional, similarly structured and sized files, you’ll quickly approach your virtual address space limits, as mentioned in @ShadowRanger’s answer.

If upgrading your python isn’t feasible for you, or if it only kicks the can down the road (you have finite physical memory after all), you really have two options: write your results to temporary files in-between loading in and reading the input files, or write your results to a database. Since you need to further post-process the strings after aggregating them, writing to a database would be the superior approach.

Respondent: ShadowRanger

The answers/resolutions are collected from stackoverflow, are licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0 .

Most Popular

To Top
India and Pakistan’s steroid-soaked rhetoric over Kashmir will come back to haunt them both clenbuterol australia bossier man pleads guilty for leadership role in anabolic steriod distribution conspiracy