How to determine whether a column/variable is numeric or not in Pandas/NumPy?

Each Answer to this Q is separated by one/two green lines.

Is there a better way to determine whether a variable in Pandas and/or NumPy is numeric or not ?

I have a self defined dictionary with dtypes as keys and numeric / not as values.

In pandas 0.20.2 you can do:

import pandas as pd
from pandas.api.types import is_string_dtype
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': [1.0, 2.0, 3.0]})

>>>> True

>>>> True

You can use np.issubdtype to check if the dtype is a sub dtype of np.number. Examples:

np.issubdtype(arr.dtype, np.number)  # where arr is a numpy array
np.issubdtype(df['X'].dtype, np.number)  # where df['X'] is a pandas Series

This works for numpy’s dtypes but fails for pandas specific types like pd.Categorical as Thomas noted. If you are using categoricals is_numeric_dtype function from pandas is a better alternative than np.issubdtype.

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1.0, 2.0, 3.0], 
                   'C': [1j, 2j, 3j], 'D': ['a', 'b', 'c']})
   A    B   C  D
0  1  1.0  1j  a
1  2  2.0  2j  b
2  3  3.0  3j  c

A         int64
B       float64
C    complex128
D        object
dtype: object

np.issubdtype(df['A'].dtype, np.number)
Out: True

np.issubdtype(df['B'].dtype, np.number)
Out: True

np.issubdtype(df['C'].dtype, np.number)
Out: True

np.issubdtype(df['D'].dtype, np.number)
Out: False

For multiple columns you can use np.vectorize:

is_number = np.vectorize(lambda x: np.issubdtype(x, np.number))
Out: array([ True,  True,  True, False], dtype=bool)

And for selection, pandas now has select_dtypes:

   A    B   C
0  1  1.0  1j
1  2  2.0  2j
2  3  3.0  3j

Based on @jaime’s answer in the comments, you need to check .dtype.kind for the column of interest. For example;

>>> import pandas as pd
>>> df = pd.DataFrame({'numeric': [1, 2, 3], 'not_numeric': ['A', 'B', 'C']})
>>> df['numeric'].dtype.kind in 'biufc'
>>> True
>>> df['not_numeric'].dtype.kind in 'biufc'
>>> False

NB The meaning of biufc: b bool, i int (signed), u unsigned int, f float, c complex. See

Pandas has select_dtype function. You can easily filter your columns on int64, and float64 like this:


This is a pseudo-internal method to return only the numeric type data

In [27]: df = DataFrame(dict(A = np.arange(3), 
                             B = np.random.randn(3), 
                             C = ['foo','bar','bah'], 
                             D = Timestamp('20130101')))

In [28]: df
   A         B    C                   D
0  0 -0.667672  foo 2013-01-01 00:00:00
1  1  0.811300  bar 2013-01-01 00:00:00
2  2  2.020402  bah 2013-01-01 00:00:00

In [29]: df.dtypes
A             int64
B           float64
C            object
D    datetime64[ns]
dtype: object

In [30]: df._get_numeric_data()
   A         B
0  0 -0.667672
1  1  0.811300
2  2  2.020402

How about just checking type for one of the values in the column? We’ve always had something like this:

isinstance(x, (int, long, float, complex))

When I try to check the datatypes for the columns in below dataframe, I get them as ‘object’ and not a numerical type I’m expecting:

df = pd.DataFrame(columns=('time', 'test1', 'test2'))
for i in range(20):
    df.loc[i] = [ - timedelta(hours=i*1000),i*10,i*100]

time     datetime64[ns]
test1            object
test2            object
dtype: object

When I do the following, it seems to give me accurate result:

isinstance(df['test1'][len(df['test1'])-1], (int, long, float, complex))



You can also try:

df_dtypes = np.array(df.dtypes)
df_numericDtypes= [x.kind in 'bifc' for x in df_dtypes]

It returns a list of booleans: True if numeric, False if not.

Just to add to all other answers, one can also use to get whats the data type of each column.

You can check whether a given column contains numeric values or not using dtypes

numerical_features = [feature for feature in train_df.columns if train_df[feature].dtypes != 'O']

Note: “O” should be capital

The answers/resolutions are collected from stackoverflow, are licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0 .

Leave a Reply

Your email address will not be published.