[Solved] pandas: how to run a pivot with a multi-index?

I would like to run a pivot on a pandas DataFrame, with the index being two columns, not one. For example, one field for the year, one for the month, an ‘item’ field which shows ‘item 1’ and ‘item 2’ and a ‘value’ field with numerical values. I want the index to be year + month.

The only way I managed to get this to work was to combine the two fields into one, then separate them again. is there a better way?

Minimal code copied below. Thanks a lot!

PS Yes, I am aware there are other questions with the keywords ‘pivot’ and ‘multi-index’, but I did not understand if/how they can help me with this question.

import pandas as pd
import numpy as np

df= pd.DataFrame()
month = np.arange(1, 13)
values1 = np.random.randint(0, 100, 12)
values2 = np.random.randint(200, 300, 12)


df['month'] = np.hstack((month, month))
df['year'] = 2004
df['value'] = np.hstack((values1, values2))
df['item'] = np.hstack((np.repeat('item 1', 12), np.repeat('item 2', 12)))

# This doesn't work: 
# ValueError: Wrong number of items passed 24, placement implies 2
# mypiv = df.pivot(['year', 'month'], 'item', 'value')

# This doesn't work, either:
# df.set_index(['year', 'month'], inplace=True)
# ValueError: cannot label index with a null key
# mypiv = df.pivot(columns="item", values="value")

# This below works but is not ideal: 
# I have to first concatenate then separate the fields I need
df['new field'] = df['year'] * 100 + df['month']

mypiv = df.pivot('new field', 'item', 'value').reset_index()
mypiv['year'] = mypiv['new field'].apply( lambda x: int(x) / 100)  
mypiv['month'] = mypiv['new field'] % 100

Solution #1:

You can group and then unstack.

>>> df.groupby(['year', 'month', 'item'])['value'].sum().unstack('item')
item        item 1  item 2
year month                
2004 1          33     250
     2          44     224
     3          41     268
     4          29     232
     5          57     252
     6          61     255
     7          28     254
     8          15     229
     9          29     258
     10         49     207
     11         36     254
     12         23     209

Or use pivot_table:

>>> df.pivot_table(
        values='value', 
        index=['year', 'month'], 
        columns='item', 
        aggfunc=np.sum)
item        item 1  item 2
year month                
2004 1          33     250
     2          44     224
     3          41     268
     4          29     232
     5          57     252
     6          61     255
     7          28     254
     8          15     229
     9          29     258
     10         49     207
     11         36     254
     12         23     209
Respondent: Alexander

Solution #2:

I believe if you include item in your MultiIndex, then you can just unstack:

df.set_index(['year', 'month', 'item']).unstack(level=-1)

This yields:

                value      
item       item 1 item 2
year month              
2004 1         21    277
     2         43    244
     3         12    262
     4         80    201
     5         22    287
     6         52    284
     7         90    249
     8         14    229
     9         52    205
     10        76    207
     11        88    259
     12        90    200

It’s a bit faster than using pivot_table, and about the same speed or slightly slower than using groupby.

Respondent: Ajean

Solution #3:

The following worked for me:

mypiv = df.pivot(index=['year','month'],columns='item')[['values1','values2']]
Respondent: Sammeer S Raawat

Solution #4:

thanks to gmoutso comment you can use this:

def multiindex_pivot(df, index=None, columns=None, values=None):
    if index is None:
        names = list(df.index.names)
        df = df.reset_index()
    else:
        names = index
    list_index = df[names].values
    tuples_index = [tuple(i) for i in list_index] # hashable
    df = df.assign(tuples_index=tuples_index)
    df = df.pivot(index="tuples_index", columns=columns, values=values)
    tuples_index = df.index  # reduced
    index = pd.MultiIndex.from_tuples(tuples_index, names=names)
    df.index = index
    return df

usage:

df.pipe(multiindex_pivot, index=['idx_column1', 'idx_column2'], columns='foo', values='bar')

You might want to have a simple flat column structure and have columns to be of their intended type, simply add this:

(df
   .infer_objects()  # coerce to the intended column type
   .rename_axis(None, axis=1))  # flatten column headers
Respondent: moshevi

The answers/resolutions are collected from stackoverflow, are licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0 .

Leave a Reply

Your email address will not be published.