Subtract a year from a datetime column in pandas


I have a datetime column, shown below:

>>> df['ACC_DATE'].head(2)
538   2006-04-07
550   2006-04-12
Name: ACC_DATE, dtype: datetime64[ns]

Now I want to subtract one year from every value in this column. How can I do that, and which library should I use?

The expected output:

        ACC_DATE    NEW_DATE
538   2006-04-07  2005-04-07
550   2006-04-12  2005-04-12

You can use DateOffset to achieve this:

In[88]:
df['NEW_DATE'] = df['ACC_DATE'] - pd.DateOffset(years=1)
df

Out[88]: 
        ACC_DATE   NEW_DATE
index                      
538   2006-04-07 2005-04-07
550   2006-04-12 2005-04-12
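
As a quick check of how DateOffset behaves when the target date does not exist, here is a minimal sketch on a small made-up series (the dates are illustrative and not taken from the question). It suggests that a February 29 is clipped to February 28 of the previous year instead of raising an error.

```python
import pandas as pd

# Illustrative dates (not from the question), including a leap day.
dates = pd.Series(pd.to_datetime(["2006-04-07", "2008-02-29"]))

# DateOffset moves each value back one calendar year; when the resulting
# day does not exist, it is clipped to the last day of that month.
shifted = dates - pd.DateOffset(years=1)
print(shifted.tolist())
```

The subtraction is the same expression as in the answer above; only the input data differs.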

Use DateOffset:

df["NEW_DATE"] = df["ACC_DATE"] - pd.offsets.DateOffset(years=1)
print(df)
        ACC_DATE   NEW_DATE
index                      
538   2006-04-07 2005-04-07
550   2006-04-12 2005-04-12

You could use pd.Timedelta:

df["NEW_DATE"] = df["ACC_DATE"] - pd.Timedelta(days=365) 

Or replace:

df["NEW_DATE"] = df["ACC_DATE"].apply(lambda x: x.replace(year=x.year - 1))

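Neither of these handles leap years correctly, though: subtracting 365 days is off by one day whenever the interval contains a February 29, and replace raises a ValueError for a February 29 date when the target year is not a leap year. dateutil.relativedelta avoids both problems:

from dateutil.relativedelta import relativedelta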

df["NEW_DATE"] = df["ACC_DATE"].apply(lambda x: x - relativedelta(years=1))
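
To make the leap-year difference between these three approaches concrete, here is a small sketch on two made-up timestamps (the dates and variable names are illustrative assumptions, not part of the original answer).

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

# Illustrative timestamps (not from the question).
after_leap_day = pd.Timestamp("2008-03-01")
leap_day = pd.Timestamp("2008-02-29")

# Fixed 365-day step: the interval contains 2008-02-29, so the result
# lands one day later than "same date, previous year".
by_timedelta = after_leap_day - pd.Timedelta(days=365)

# replace() tries to build 2007-02-29, which does not exist.
try:
    leap_day.replace(year=leap_day.year - 1)
    replace_failed = False
except ValueError:
    replace_failed = True

# relativedelta clips to the last valid day of the target month.
by_relativedelta = leap_day - relativedelta(years=1)

print(by_timedelta.date(), replace_failed, by_relativedelta.date())
```

Note that the apply-based versions run a Python function per row, so on large columns they will likely be slower than a vectorized subtraction.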

If you have a single pd.Timestamp object rather than a column:

  1. Using pd.DateOffset(years=n) is not ideal, as it produces:

UserWarning: Discarding nonzero nanoseconds in conversion

  2. pd.Timedelta() doesn’t accept years.

The only approach that worked for me in this case is pd.Timestamp.replace:

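n = 1  # number of years to subtract
t = pd.Timestamp.now()
# Raises ValueError if t falls on Feb 29 and the target year is not a leap year.
t = t.replace(year=t.year - n)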

This was hinted at in the answer by Padriac but it needed further clarity.
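
One way to make the replace approach safe for February 29 is to wrap it in a small helper. The sketch below is only an illustration: subtract_years is a hypothetical name, and falling back to February 28 is an assumed convention that may not suit every use case.

```python
import pandas as pd


def subtract_years(ts: pd.Timestamp, n: int) -> pd.Timestamp:
    """Hypothetical helper: move ts back n calendar years via replace()."""
    try:
        return ts.replace(year=ts.year - n)
    except ValueError:
        # Typically Feb 29 with a non-leap target year;
        # fall back to Feb 28 (an assumed convention).
        return ts.replace(year=ts.year - n, day=28)


# replace() keeps the time of day, including nanoseconds.
print(subtract_years(pd.Timestamp("2006-04-07 12:30:00.000000123"), 1))
print(subtract_years(pd.Timestamp("2008-02-29"), 1))
```

Because replace only swaps the year field, the nanosecond component is carried over unchanged.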


The answers/resolutions are collected from stackoverflow, are licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0 .