[Solved] Pyspark: Need to show a count of null/empty values per each column in a dataframe
I have a spark dataframe and need to do a count of null/empty values for each column. I need to show ALL columns in the output.
I have looked online and found a few "similar questions", but the solutions went over my head, which is why I am posting here for help.
Here is what I have for code so far; I know it is only part of the puzzle.
from pyspark.sql import *
sf.isnull()
After running it, this is the error I receive: AttributeError: 'DataFrame' object has no attribute 'isnull'
What's interesting is that I did the same exercise with pandas using df.isna().sum(), which worked great. What am I missing for PySpark?
Solution #1:
You can do the following; just make sure your df is a Spark DataFrame.
from pyspark.sql.functions import col, count, when
df.select(*(count(when(col(c).isNull(), c)).alias(c) for c in df.columns)).show()
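Since the question asks about null or empty values, here is a minimal sketch that also treats empty strings as missing. The sample DataFrame and column names are made up for illustration; adjust the condition if "empty" means something else in your data.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, when

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data, only for demonstration
df = spark.createDataFrame(
    [("a", None), (None, "x"), ("", "y")],
    ["col1", "col2"],
)

# count() ignores NULLs, so when() yields a value only for rows
# that are NULL or an empty string, giving a per-column count.
df.select(
    *(count(when(col(c).isNull() | (col(c) == ""), c)).alias(c) for c in df.columns)
).show()
This keeps everything in a single select, so Spark computes all the per-column counts in one pass over the data.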
The answers/resolutions are collected from Stack Overflow and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.