The things I’ve googled haven’t worked, so I’m turning to experts!
I have some text in a tab-delimited text file that has some sort of carriage return in it (when I open it in Notepad++ and use “show all characters”, I see [CR][LF] at the end of the line). I need to remove this carriage return (or whatever it is), but I can’t seem to figure it out. Here’s a snippet of the text file showing a line with the carriage return:
firstcolumn secondcolumn third fourth fifth sixth seventh moreoftheseventh 8th 9th 10th 11th 12th 13th
Here’s the code I’m trying to use to replace it, but it’s not finding the return:
with open(infile, "r") as f:
    for line in f:
        if "\n" in line:
            line = line.replace("\n", " ")
My script just doesn’t find the carriage return. Am I doing something wrong or making an incorrect assumption about this carriage return? I could just remove it manually in a text editor, but there are about 5000 records in the text file that may also contain this issue.
The goal here is to select two columns from the text file, so I split on \t characters and refer to the values as elements of a list. It works on any line without the returns, but fails on the lines with the returns because, for example, there is no element 9 in those lines.
vals = line.split("\t")
print(vals[0] + " " + vals[9])
So, for the line of text above, this code fails because there is no index 9 in that particular array. For lines of text that don’t have the [CR][LF], it works as expected.
Technically, there is an answer!
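with open(filetoread, "rb") as inf:
    with open(filetowrite, "wb") as fixed:
        for line in inf:
            # In binary mode each line still ends with b"\r\n",
            # so it can be stripped or replaced here before writing
            fixed.write(line)
The b in open(filetoread, "rb") opens the file in binary mode, so no newline translation is applied and I can access those line breaks and remove them. On Python 3 the output file has to be opened in binary mode as well ("wb"), because the lines are bytes. This answer actually came from Stack Overflow user Kenneth Reitz off the site.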
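As a rough illustration of what binary mode exposes (not part of the original answer), the sketch below writes a made-up one-line sample to a temporary file and reads it back with "rb". The sample contents, the temporary path, and the variable names are all assumptions for the demo.

```python
import os
import tempfile

# Hypothetical sample: one tab-delimited record ending in CR LF
sample = b"first\tsecond\tthird\r\n"

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write(sample)

# Binary mode returns the bytes untouched, so the CR LF is still there
with open(path, "rb") as f:
    raw_line = f.readline()

# The search pattern must be bytes as well, because raw_line is bytes
cleaned = raw_line.replace(b"\r\n", b"")
fields = cleaned.decode("ascii").split("\t")

print(repr(raw_line))
print(fields)
```

Decoding with "ascii" is only safe for this sample; a real file would need its actual encoding.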
Here’s how to remove carriage returns without using a temporary file:
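with open(file_name, 'r') as file:
    # Text mode translates \r\n to \n while reading
    content = file.read()
with open(file_name, 'w', newline='\n') as file:
    # newline='\n' writes \n as is instead of the platform default
    file.write(content)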
Depending on the type of file (and the OS it comes from, etc.), your line ending might be '\r', '\n', or '\r\n'. The best way to get rid of them regardless of which one they are is to use rstrip():
with open(infile, "r") as f:
    for line in f:
        line = line.rstrip()  # strip out all trailing whitespace
If you want to get rid of ONLY the carriage returns and newlines, and not any other whitespace that might be at the end, you can supply the optional argument to rstrip():
with open(infile, "r") as f:
    for line in f:
        line = line.rstrip('\r\n')  # strip only trailing \r and \n characters
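The difference between the two calls matters for tab-delimited data, because a bare rstrip() also eats trailing tabs and with them any empty columns at the end of a record. A minimal sketch, using a made-up line whose last two columns are empty:

```python
# Hypothetical record: four columns, the last two empty, ending in CR LF
line = "a\tb\t\t\r\n"

# No argument: removes every trailing whitespace character, tabs included
stripped_all = line.rstrip()

# With an argument: removes only trailing "\r" and "\n" characters
stripped_eol = line.rstrip("\r\n")

print(stripped_all.split("\t"))
print(stripped_eol.split("\t"))
```

With the bare call the record comes back with two fields instead of four, so an index such as vals[3] would fail even though the line was complete.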
Hope this helps
Python opens files in so-called universal newline mode, so newlines are always seen as \n. From the documentation:
Python is usually built with universal newlines support; supplying 'U'
opens the file as a text file, but lines may be terminated by any of
the following: the Unix end-of-line convention '\n', the Macintosh
convention '\r', or the Windows convention '\r\n'. All of these
external representations are seen as '\n' by the Python program.
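The quoted passage describes the old 'U' flag; in Python 3, text mode applies the same translation by default, and passing newline='' to open() turns it off. The sketch below is an illustration under that assumption, with a made-up file that mixes all three conventions.

```python
import os
import tempfile

# Hypothetical file mixing Unix, old Macintosh, and Windows line endings
raw = b"unix\nold-mac\rwindows\r\n"

path = os.path.join(tempfile.mkdtemp(), "mixed.txt")
with open(path, "wb") as f:
    f.write(raw)

# Default text mode: every line ending is translated to "\n"
with open(path, "r") as f:
    translated = f.read()

# newline="" keeps the original line endings in the returned text
with open(path, "r", newline="") as f:
    untranslated = f.read()

print(repr(translated))
print(repr(untranslated))
```

This is why searching for "\r" in a file opened with plain "r" finds nothing: the carriage return has already been folded into "\n" by the time the program sees the text.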
You iterate through the file line by line and replace \n in each line. But the iterator has already split the file on \n, so each line contains no \n other than the one at its very end, and a record broken by a stray line break has already arrived as two separate lines.
You can just read the whole file with f.read() and then replace \n in it.
with open(infile, "r") as f:
    content = f.read()
content = content.replace('\n', ' ')
# do something with content
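Replacing every \n joins the whole file into one long line, which may be more than is wanted. If the aim is only to undo breaks that fall inside a record, one possible approach (not from this thread) is to keep appending physical lines until a record has the expected number of columns. The function name repair_records, its parameters, and the sample data are hypothetical, and the sketch cannot detect a break inside the last column, since the record already looks complete at that point.

```python
def repair_records(lines, expected_fields, joiner=" "):
    """Merge physical lines until each record has at least expected_fields columns."""
    records = []
    pending = None  # partial record collected so far, or None if there is none
    for line in lines:
        line = line.rstrip("\r\n")  # drop only the line terminator
        pending = line if pending is None else pending + joiner + line
        # Count columns by counting tab separators
        if pending.count("\t") + 1 >= expected_fields:
            records.append(pending)
            pending = None
    if pending is not None:
        records.append(pending)  # keep an incomplete final record as is
    return records


# Hypothetical input: the second record is split across two lines
sample_lines = [
    "a\tb\tc\n",
    "d\te part one\r\n",
    "part two\tf\n",
    "g\th\ti\n",
]
repaired = repair_records(sample_lines, expected_fields=3)
for record in repaired:
    print(record.split("\t"))
```

The joiner defaults to a space to match the replacement used in the question; pass joiner="" to glue the fragments together directly.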
I wrote some code to do it, and it works:
end1 = 'C:...file1.txt'
end2 = 'C:...file2.txt'
with open(end1, "rb") as inf:
    with open(end2, "wb") as fixed:
        for line in inf:
            # The file is read as bytes, so the patterns must be bytes too
            line = line.replace(b"\n", b"")
            line = line.replace(b"\r", b"")
            fixed.write(line)