So i tried reading all the csv files from a folder and then concatenate them to create a big csv(structure of all the files was same), save it and read it again. All this was done using Pandas. The Error occurs while reading. I am Attaching the code and the Error below.

import pandas as pd
import numpy as np
import glob

path =r'somePath' # use your path
allFiles = glob.glob(path + "/*.csv")
frame = pd.DataFrame()
list_ = []
for file_ in allFiles:
    df = pd.read_csv(file_,index_col=None, header=0)
    list_.append(df)
store = pd.concat(list_)
store.to_csv("C:\work\DATA\Raw_data\\store.csv", sep=',', index= False)
store1 = pd.read_csv("C:\work\DATA\Raw_data\\store.csv", sep=',')

Error:-

CParserError                              Traceback (most recent call last)
<ipython-input-48-2983d97ccca6> in <module>()
----> 1 store1 = pd.read_csv("C:\work\DATA\Raw_data\\store.csv", sep=',')

C:\Users\armsharm\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\io\parsers.pyc in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, na_fvalues, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, float_precision, nrows, iterator, chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols, infer_datetime_format, skip_blank_lines)
    472                     skip_blank_lines=skip_blank_lines)
    473 
--> 474         return _read(filepath_or_buffer, kwds)
    475 
    476     parser_f.__name__ = name

C:\Users\armsharm\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\io\parsers.pyc in _read(filepath_or_buffer, kwds)
    258         return parser
    259 
--> 260     return parser.read()
    261 
    262 _parser_defaults = {

C:\Users\armsharm\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\io\parsers.pyc in read(self, nrows)
    719                 raise ValueError('skip_footer not supported for iteration')
    720 
--> 721         ret = self._engine.read(nrows)
    722 
    723         if self.options.get('as_recarray'):

C:\Users\armsharm\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\io\parsers.pyc in read(self, nrows)
   1168 
   1169         try:
-> 1170             data = self._reader.read(nrows)
   1171         except StopIteration:
   1172             if nrows is None:

pandas\parser.pyx in pandas.parser.TextReader.read (pandas\parser.c:7544)()

pandas\parser.pyx in pandas.parser.TextReader._read_low_memory (pandas\parser.c:7784)()

pandas\parser.pyx in pandas.parser.TextReader._read_rows (pandas\parser.c:8401)()

pandas\parser.pyx in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:8275)()

pandas\parser.pyx in pandas.parser.raise_parser_error (pandas\parser.c:20691)()

CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.

I tried using csv reader as well:-

import csv
with open("C:\work\DATA\Raw_data\\store.csv", 'rb') as f:
    reader = csv.reader(f)
    l = list(reader)

Error:-

Error                                     Traceback (most recent call last)
<ipython-input-36-9249469f31a6> in <module>()
      1 with open('C:\work\DATA\Raw_data\\store.csv', 'rb') as f:
      2     reader = csv.reader(f)
----> 3     l = list(reader)

Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?

I found this error, the cause was that there were some carriage returns “\r” in the data that pandas was using as a line terminator as if it was “\n”. I thought I’d post here as that might be a common reason this error might come up.

The solution I found was to add lineterminator=”\n” into the read_csv function like this:

df_clean = pd.read_csv('test_error.csv',
                 lineterminator="\n")

If you are using python and its a big file you may use
engine="python" as below and should work.

df = pd.read_csv( file_, index_col=None, header=0, engine="python" )

Not an answer, but too long for a comment (not speaking of code formatting)

As it breaks when you read it in csv module, you can at least locate the line where the error occurs:

import csv
with open(r"C:\work\DATA\Raw_data\store.csv", 'rb') as f:
    reader = csv.reader(f)
    linenumber = 1
    try:
        for row in reader:
            linenumber += 1
    except Exception as e:
        print (("Error line %d: %s %s" % (linenumber, str(type(e)), e.message)))

Then look in store.csv what happens at that line.

Change the directory to the CSV

Corpus = pd.read_csv(r"C:\Users\Dell\Desktop\Dataset.csv",encoding='latin-1')

the problem is from the format of the excel file.
We select Save as Options from menu and change the format from xls to csv, then
it will surely work.

New contributor

trylearning2022 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.