I tried to use NumPy’s nanmax function to get the maximum of all non-NaN values in each column of a matrix. For some columns it works; for others it returns nan as the maximum. However, there are non-NaN values in every column, and just to be sure I tried the same thing in R with max(x, na.rm = T), where everything is fine.
Does anyone have any idea why this occurs? The only thing I can think of is that I converted the numpy matrix from a pandas frame, but I really have no clue…
np.nanmax(datamatrix, axis=0)
matrix([[1, 101, 193, 1, 163.0, 10.6, nan, 4.7, 142.0, 0.47, 595.0, 170.0, 5.73, 24.0, 27.0, 23.0, 361.0, 33.0, 94.0, 9.2, 16.8, nan, nan, 91.0, nan, nan, nan, nan, 0.0, 105.0, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]], dtype=object)
Your array is an object array, meaning the elements in the array are arbitrary Python objects. Pandas uses object arrays, so it is likely that when you converted your Pandas DataFrame to a numpy array, the result was an object array.
nanmax() doesn’t handle object arrays correctly.
Here are a couple of examples, one using a numpy.matrix and one a numpy.ndarray. With a matrix, you get no warning at all that something went wrong:
In : import numpy as np
In : m = np.matrix([[2.0, np.nan, np.nan]], dtype=object)
In : np.nanmax(m)
Out: nan
With an array, you get a cryptic warning, but nan is still returned:
In : a = np.array([[2.0, np.nan, np.nan]], dtype=object)
In : np.nanmax(a)
/Users/warren/miniconda3scipy/lib/python3.5/site-packages/numpy/lib/nanfunctions.py:326: RuntimeWarning: All-NaN slice encountered
  warnings.warn("All-NaN slice encountered", RuntimeWarning)
Out: nan
You can determine if your array is an object array in a few ways. When you display the array in an interactive python or ipython shell, you’ll see dtype=object. Or you can check a.dtype; if a is an object array, you’ll see either dtype('O') or object (depending on whether you end up seeing the repr() or the str() of the dtype).
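As a rough sketch of that check, reusing the small example array a from above (the name is_object is only for illustration):

```python
import numpy as np

# Same small object array as in the example above.
a = np.array([[2.0, np.nan, np.nan]], dtype=object)

print(a.dtype)        # str() form of the dtype: object
print(repr(a.dtype))  # repr() form of the dtype: dtype('O')

# A programmatic check, useful before calling nanmax().
is_object = a.dtype == object
print(is_object)      # True
```

Comparing a.dtype against object this way is handy in a script, where you never see the array displayed.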
Assuming all the values in the array are, in fact, floating point values, a way to work around this is to first convert from the object array to an array of floating point values:
In : b = a.astype(np.float64)
In : b
Out: array([[ 2., nan, nan]])
In : np.nanmax(b)
Out: 2.0
In : n = m.astype(np.float64)
In : np.nanmax(n)
Out: 2.0
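The same workaround should carry over to the column-wise call from the question. Here is a minimal sketch, assuming a small made-up object array in place of the real datamatrix (the values and the names as_float and col_max are hypothetical):

```python
import numpy as np

# Made-up stand-in for the question's datamatrix: an object array
# mixing ints, floats and NaN, with at least one number per column.
datamatrix = np.array([[1,      10.6,   np.nan],
                       [np.nan, 4.7,    0.0],
                       [193,    np.nan, np.nan]], dtype=object)

# Convert to a real floating point array first, then reduce per column.
as_float = datamatrix.astype(np.float64)
col_max = np.nanmax(as_float, axis=0)

print(as_float.dtype)  # float64
print(col_max)         # [193.   10.6   0. ]
```

If some element is not numeric at all (a string such as 'abc', for example), astype(np.float64) raises a ValueError rather than silently producing nan, so a failure at that step is a hint that the data needs cleaning first.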