open() and codecs.open() in Python 2.7 behave strangely differently
I have a text file whose first line contains Unicode characters; all other lines are ASCII.
I am trying to read the first line into one variable and all the other lines into another. However, when I use the following code:
    # -*- coding: utf-8 -*-
    import codecs
    import os

    filename = '1.txt'

    f = codecs.open(filename, 'r', encoding='utf-8')
    print f
    names_f = f.readline().split(' ')
    data_f = f.readlines()
    print len(names_f)
    print len(data_f)
    f.close()

    print 'And now for something completely differerent:'

    g = open(filename, 'r')
    names_g = g.readline().split(' ')
    print g
    data_g = g.readlines()
    print len(names_g)
    print len(data_g)
    g.close()
I get the following output:
    <open file '1.txt', mode 'rb' at 0x01235230>
    28
    7
    And now for something completely differerent:
    <open file '1.txt', mode 'r' at 0x017875A0>
    28
    77
If I don't use readlines(), the whole file is read, not just the first 7 lines, with both codecs.open() and open().
Why does this happen?
And why does codecs.open() open the file in binary mode even though the 'r' mode is passed?
Update: this is the original file: http://www1.datafilehost.com/d/0792d687
Because you used .readline() first, the codecs.open() file has filled a line buffer; the subsequent call to .readlines() returns only the buffered lines.
If you call .readlines() again, the rest of the lines are returned:
    >>> f = codecs.open(filename, 'r', encoding='utf-8')
    >>> line = f.readline()
    >>> len(f.readlines())
    7
    >>> len(f.readlines())
    71
The work-around is to not mix .readline() with .readlines(); read all the lines at once and split off the first one afterwards:
    f = codecs.open(filename, 'r', encoding='utf-8')
    data_f = f.readlines()
    names_f = data_f.pop(0).split(' ')  # take the first line
This behaviour is really a bug; the Python devs are aware of it, see issue 8260.
The other option is to use io.open() instead of codecs.open(); the io library is what Python 3 uses to implement the built-in open() function and is a lot more robust and versatile than the codecs module.
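As a rough illustration of the io.open() route, here is a minimal sketch that reads the first line separately and then the remaining lines from the same file object. The helper name read_header_and_rows and the sample file contents are made up for this example (they are not from the question's 1.txt); the code only uses io.open(), .readline() and .readlines(), and is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def read_header_and_rows(path, encoding='utf-8'):
    """Return (fields of the first line, list of the remaining lines).

    Hypothetical helper for illustration; not part of any library.
    """
    # io.open() decodes the file to unicode text, so readline() and
    # readlines() can be mixed on the same file object.
    with io.open(path, 'r', encoding=encoding) as f:
        names = f.readline().split(' ')
        data = f.readlines()
    return names, data


# Assumed sample data: one non-ASCII header line, then 77 ASCII lines.
sample = u'\u0438\u043c\u044f \u0432\u043e\u0437\u0440\u0430\u0441\u0442\n'
sample += u''.join(u'row %d\n' % i for i in range(77))

fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    with io.open(path, 'w', encoding='utf-8') as out:
        out.write(sample)
    names, data = read_header_and_rows(path)
finally:
    os.remove(path)

print(len(names))
print(len(data))
```

With this sample the header splits into two fields and all 77 remaining lines come back from the single .readlines() call, so no second call is needed.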