[Solved] Condition to parse the CSV file only when the number of columns exceeds 1

I am trying to read data from a CSV file in an S3 bucket using pandas. However, sometimes a CSV file has just 1 column, which I don't want to parse. I only want to parse CSV files that have more than 1 column. Can anyone tell me the condition to check so that the file is parsed only if the number of columns > 1, and the loop is exited otherwise?
Below is the code I am trying.

body = csv_obj['Body']
csv_string = body.read().decode('utf-8-sig')
df = pd.read_csv(StringIO(csv_string),usecols = [3,4,6])

Below is the second version I tried, but it raised an error:

File "pandas\_libs\parsers.pyx", line 545, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file

code:

import boto3
import pandas as pd
import sys
from io import StringIO # Python 3.x

# Assumption: the session comes from boto3 with default credentials
session = boto3.Session()
s3_client = session.client("s3")
s3_resource = session.resource("s3")
csv_obj = s3_client.get_object(Bucket="XXXX", Key="XXXXXXYYY.csv")
body = csv_obj['Body']
csv_string = body.read().decode('utf-8-sig')
# Read a few rows first to check the number of columns
df = pd.read_csv(StringIO(csv_string), nrows = 10)
if len(df.columns) > 1:
    df = pd.read_csv(StringIO(csv_string),usecols = [3,4,6])
    Products_list = df.values.tolist()
    Products_list = str(Products_list)
    print(Products_list)
Enquirer: Ryan

Solution #1:

This might help:

import pandas as pd

# use your own CSV file in place of the following URL
url="https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
c = pd.read_csv(url)

if len(c.count()) > 1:
  # Your code
  print("Enjoy dude :)")
Respondent: Ryan
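
The check in Solution #1 works because count() returns one entry per column, so its length equals the number of columns. A minimal sketch of that equivalence follows, using small in-memory CSV strings as hypothetical stand-ins for the real files; the helper name column_count is made up for illustration.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV texts standing in for the real files
multi_column_csv = "name,region,population\nA,X,1\nB,Y,2\n"
single_column_csv = "name\nA\nB\n"


def column_count(csv_text):
    """Return the number of columns pandas finds in the CSV text."""
    df = pd.read_csv(StringIO(csv_text))
    # These three expressions all give the number of columns
    assert len(df.columns) == df.shape[1] == len(df.count())
    return len(df.columns)


multi_count = column_count(multi_column_csv)
single_count = column_count(single_column_csv)
print(multi_count, single_count)
```

len(df.columns) or df.shape[1] states the intent more directly than len(df.count()), since count() also computes the non-null count of every column.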

Solution #2:

Parse all files, and then check whether len(df.columns) > 1.

If the file is large, you can test the number of columns by reading a limited number of rows with the read_csv(nrows=N) parameter and, if the test succeeds, read the whole file again.

I am not sure what your loop code looks like, so here is a simple example that returns nothing if the number of columns is equal to 1:

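# Wrapped in a function so that `return` is valid; assumes pd and StringIO are imported
def parse_csv_obj(csv_obj):
    body = csv_obj['Body']
    csv_string = body.read().decode('utf-8-sig')
    # Read only a few rows to check the number of columns
    df = pd.read_csv(StringIO(csv_string), nrows=10)
    if len(df.columns) > 1:
        return pd.read_csv(StringIO(csv_string), usecols=[3, 4, 6])
    else:
        return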
Respondent: Scott
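
Neither answer covers the EmptyDataError from the question, which pandas raises when the text has no columns to parse at all (for example, an empty file). The following is a rough sketch of one way to combine the column check with that case, not a definitive implementation. The function name read_if_multicolumn, the file keys, and the CSV contents are all hypothetical; with S3, csv_string would come from body.read().decode('utf-8-sig') as in the question.

```python
from io import StringIO

import pandas as pd


def read_if_multicolumn(csv_string, usecols=None):
    """Return a DataFrame if the CSV has more than one column, else None."""
    try:
        # nrows=0 parses only the header row, which is enough to count columns
        header = pd.read_csv(StringIO(csv_string), nrows=0)
    except pd.errors.EmptyDataError:
        # Raised when there are no columns to parse, e.g. an empty file
        return None
    if len(header.columns) <= 1:
        return None
    return pd.read_csv(StringIO(csv_string), usecols=usecols)


# Hypothetical file contents keyed by made-up object keys
files = {
    "empty.csv": "",
    "one_column.csv": "id\n1\n2\n",
    "wide.csv": "a,b,c\n1,2,3\n4,5,6\n",
}

parsed = {}
for key, text in files.items():
    df = read_if_multicolumn(text, usecols=[0, 2])
    if df is None:
        continue  # skip this file and move on to the next one
    parsed[key] = df.values.tolist()

print(parsed)
```

Using continue skips only the unwanted file; break would leave the loop entirely, if that is what "exit the loop" is meant to do. Note also that usecols=[3,4,6] from the question needs at least seven columns, so a file with two to six columns would pass the > 1 check and would likely still fail on the second read.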

The answers/resolutions are collected from Stack Overflow and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
