That is the difference between `groupby("x").count`

and `groupby("x").size`

in pandas ?

Does size just exclude nil ?

Skip to content
# What is the difference between size and count in pandas?

## Behaviour with

## Behavior with

Each Answer to this Q is separated by one/two green lines.

That is the difference between `groupby("x").count`

and `groupby("x").size`

in pandas ?

Does size just exclude nil ?

`size`

includes `NaN`

values, `count`

does not:

```
In [46]:
df = pd.DataFrame({'a':[0,0,1,2,2,2], 'b':[1,2,3,4,np.NaN,4], 'c':np.random.randn(6)})
df
Out[46]:
a b c
0 0 1 1.067627
1 0 2 0.554691
2 1 3 0.458084
3 2 4 0.426635
4 2 NaN -2.238091
5 2 4 1.256943
In [48]:
print(df.groupby(['a'])['b'].count())
print(df.groupby(['a'])['b'].size())
a
0 2
1 1
2 2
Name: b, dtype: int64
a
0 2
1 1
2 3
dtype: int64
```

## What is the difference between size and count in pandas?

The other answers have pointed out the difference, however, it is **not completely accurate** to say “`size`

counts NaNs while `count`

does not”. While `size`

does indeed count NaNs, this is actually a consequence of the fact that ** size returns the size (or the length) of the object** it is called on. Naturally, this also includes rows/values which are NaN.

So, to summarize, `size`

returns the size of the Series/DataFrame^{1},

```
df = pd.DataFrame({'A': ['x', 'y', np.nan, 'z']})
df
A
0 x
1 y
2 NaN
3 z
```

<!- _>

```
df.A.size
# 4
```

…while `count`

counts the non-NaN values:

```
df.A.count()
# 3
```

Notice that `size`

is an attribute (gives the same result as `len(df)`

or `len(df.A)`

). `count`

is a function.

_{1. DataFrame.size is also an attribute and returns the number of elements in the DataFrame (rows x columns).}

`GroupBy`

– Output StructureBesides the basic difference, there’s also the difference in the structure of the generated output when calling `GroupBy.size()`

vs `GroupBy.count()`

.

```
df = pd.DataFrame({
'A': list('aaabbccc'),
'B': ['x', 'x', np.nan, np.nan,
np.nan, np.nan, 'x', 'x']
})
df
A B
0 a x
1 a x
2 a NaN
3 b NaN
4 b NaN
5 c NaN
6 c x
7 c x
```

Consider,

```
df.groupby('A').size()
A
a 3
b 2
c 3
dtype: int64
```

Versus,

```
df.groupby('A').count()
B
A
a 2
b 0
c 2
```

`GroupBy.count`

returns a DataFrame when you call `count`

on all column, while `GroupBy.size`

returns a Series.

The reason being that `size`

is the same for all columns, so only a single result is returned. Meanwhile, the `count`

is called for each column, as the results would depend on on how many NaNs each column has.

`pivot_table`

Another example is how `pivot_table`

treats this data. Suppose we would like to compute the cross tabulation of

```
df
A B
0 0 1
1 0 1
2 1 2
3 0 2
4 0 0
pd.crosstab(df.A, df.B) # Result we expect, but with `pivot_table`.
B 0 1 2
A
0 1 2 1
1 0 0 1
```

With `pivot_table`

, you can issue `size`

:

```
df.pivot_table(index='A', columns="B", aggfunc="size", fill_value=0)
B 0 1 2
A
0 1 2 1
1 0 0 1
```

But `count`

does not work; an empty DataFrame is returned:

```
df.pivot_table(index='A', columns="B", aggfunc="count")
Empty DataFrame
Columns: []
Index: [0, 1]
```

I believe the reason for this is that `'count'`

must be done on the series that is passed to the `values`

argument, and when nothing is passed, pandas decides to make no assumptions.

Just to add a little bit to @Edchum’s answer, even if the data has no NA values, the result of count() is more verbose, using the example before:

```
grouped = df.groupby('a')
grouped.count()
Out[197]:
b c
a
0 2 2
1 1 1
2 2 3
grouped.size()
Out[198]:
a
0 2
1 1
2 3
dtype: int64
```

When we are dealing with normal dataframes then only difference will be an inclusion of NAN values, means count does not include NAN values while counting rows.

But if we are using these functions with the `groupby`

then, to get the correct results by `count()`

we have to associate any numeric field with the `groupby`

to get the exact number of groups where for `size()`

there is no need for this type of association.

In addition to all above answers, I would like to point out one more diffrence which I seem significant.

You can correlate Panda’s `Datarame`

size and count with Java’s `Vectors`

size and length. When we create vector some predefined memory is allocated to it. when we reach closer to number of elements it can occupy while adding elements, more memory is allocated to it. Similarly, in `DataFrame`

as we add elements, memory allocated to it increases.

Size attribute gives number of memory cell allocated to `DataFrame`

whereas count gives number of elements that are actually present in `DataFrame`

. For example,

You can see though there are 3 rows in `DataFrame`

, its size is 6.

This answer covers size and count difference with respect to `DataFrame`

and not `Pandas Series`

. I have not checked what happens with `Series`

The answers/resolutions are collected from stackoverflow, are licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0 .