[Solved] Pyspark: Need to show a count of null/empty values per each column in a dataframe

I have a spark dataframe and need to do a count of null/empty values for each column. I need to show ALL columns in the output.

I have looked online and found a few similar questions, but the solutions went over my head, which is why I am posting here for help.

Here is the code I have so far; I know it's only part of the puzzle.

from pyspark.sql import *

sf.isnull()

After running it, this is the error I receive: AttributeError: 'DataFrame' object has no attribute 'isnull'

What's interesting is that I did the same exercise with pandas using df.isna().sum(), which worked great. What am I missing in PySpark?

Enquirer: wally

Solution #1:

You can do the following; just make sure your df is a Spark DataFrame.

from pyspark.sql.functions import col, count, when

df.select(*(count(when(col(c).isNull(), c)).alias(c) for c in df.columns)).show()
Respondent: jayrythium
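Note that the line above counts only NULLs, while the question also asks about empty values. Since count(when(...)) counts rows where the condition is true, the condition can be extended to cover empty strings as well. A minimal sketch, assuming string-typed columns and a local Spark session (the sample data here is made up for illustration):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, when

spark = SparkSession.builder.master("local[1]").appName("null_counts").getOrCreate()

# Hypothetical sample data: both columns contain a NULL and an empty string.
df = spark.createDataFrame(
    [("alice", "nyc"), (None, ""), ("", None)],
    ["name", "city"],
)

# Per column: count rows where the value is NULL *or* an empty string.
# when(...) yields a non-null value only when the condition holds,
# and count() counts non-null values, so this tallies matching rows.
null_or_empty = df.select(
    *(
        count(when(col(c).isNull() | (col(c) == ""), c)).alias(c)
        for c in df.columns
    )
)
null_or_empty.show()

This keeps every column in the output, as asked, because the generator expression builds one aggregate per column of the original DataFrame.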
The answers/resolutions are collected from Stack Overflow and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
