[Solved] Seaborn: Setting a distplot bin range?
So I have this data set showing the GDP of countries in billions (so 1 trillion gdp = 1000).
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv('2014_World_GDP')
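df.sort_values(by='GDP (BILLIONS)', ascending=False, inplace=True)  # DataFrame.sort was removed in later pandas versions
gdp = df['GDP (BILLIONS)']  # renamed from "sorted" so the built-in sorted() is not shadowed
fig, ax = plt.subplots(figsize=(12, 8))
sns.distplot(gdp,bins=8,kde=False,ax=ax)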
The above code gives me the following figure:
What I want to do, however, is set the bin edges so they look more like [250,500,750,1000,2000,5000,10000,20000].
Is there a way to do that in seaborn?
Solution #1:
You could use logarithmic bins, which would work well with data that is distributed as yours is. Here is an example:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame()
df['GDP (BILLIONS)'] = 2000*1./(np.random.random(250))
df.sort_values(by='GDP (BILLIONS)',ascending=False, inplace=True)
fig, ax = plt.subplots(1,2,figsize=(8, 3))
sns.distplot(df['GDP (BILLIONS)'].values,bins=8,kde=False,ax=ax[0])
ax[0].set_title('Linear Bins')
LogMin, LogMax = np.log10(df['GDP (BILLIONS)'].min()),np.log10(df['GDP (BILLIONS)'].max())
newBins = np.logspace(LogMin, LogMax,9)  # 9 edges give 8 bins, matching the linear plot
sns.distplot(df['GDP (BILLIONS)'].values,bins=newBins,kde=False,ax=ax[1])
ax[1].set_xscale('log')
ax[1].set_title('Log Bins')
fig.show()
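A note on the bin count: np.logspace(a, b, n) returns n edge values, and n edges delimit n - 1 bins. Below is a minimal sketch of a helper that takes the number of bins instead. The name log_bin_edges is made up for this illustration (it is not part of NumPy or seaborn), and the data is synthetic. The outer edges are pinned to the exact data range on the assumption that you want every point counted, since 10**log10(x) can differ from x by a rounding error.

```python
import numpy as np


def log_bin_edges(values, n_bins):
    """Return n_bins + 1 logarithmically spaced edges spanning the data.

    Hypothetical helper for illustration only.
    """
    values = np.asarray(values, dtype=float)
    if np.any(values <= 0):
        raise ValueError("logarithmic bins need strictly positive data")
    lo, hi = values.min(), values.max()
    # n_bins bins are delimited by n_bins + 1 edges
    edges = np.logspace(np.log10(lo), np.log10(hi), n_bins + 1)
    # Pin the outer edges so rounding cannot push the min or max outside
    edges[0], edges[-1] = lo, hi
    return edges


# Synthetic heavy-tailed data; 1 - random() lies in (0, 1], so no division by zero
rng = np.random.default_rng(0)
gdp = 2000.0 / (1.0 - rng.random(250))

edges = log_bin_edges(gdp, 8)
counts, _ = np.histogram(gdp, bins=edges)
print(len(edges), "edges,", len(counts), "bins,", counts.sum(), "points counted")
```

The resulting edges array can be passed as the bins= argument in the same way as newBins.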
Solution #2:
You could also pass the bin edges as a sequence; in your case that would be:
sns.distplot(df['GDP (BILLIONS)'].values,
bins=[250,500,750,1000,2000,5000,10000,20000],
kde=False,ax=ax[0])
However, doing this alone won’t change the x-axis scale; you would also need the ax.set_xscale('log') line from Robbie’s answer to do that.
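To put the two pieces together, here is a rough sketch that uses the explicit edges and a log-scaled x-axis in one plot. It calls Matplotlib's ax.hist directly so that it does not depend on a particular seaborn version; passing the same list as bins= to sns.distplot should bin the data the same way. The GDP values are made up for illustration. One thing to watch for: values outside the first and last edge are silently left out of the histogram, so the sketch counts them explicitly.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs anywhere; drop this line in a notebook
import matplotlib.pyplot as plt

# 8 edges define 7 bins
edges = [250, 500, 750, 1000, 2000, 5000, 10000, 20000]

# Hypothetical GDP values in billions, invented for this example
gdp = np.array([100, 300, 600, 800, 1500, 3000, 4000, 17000, 25000], dtype=float)

# Count per bin; each bin is half-open [left, right) except the last, which is closed
counts, _ = np.histogram(gdp, bins=edges)

# Values below the first edge or above the last edge fall into no bin
n_dropped = int(np.sum((gdp < edges[0]) | (gdp > edges[-1])))

fig, ax = plt.subplots(figsize=(8, 3))
ax.hist(gdp, bins=edges)
ax.set_xscale('log')  # without this the wide bins on the right dominate the axis
ax.set_xlabel('GDP (BILLIONS)')

print("counts per bin:", counts.tolist())
print("values outside the bin range:", n_dropped)
# Call plt.show() or fig.savefig(...) to view the figure
```

If the dropped values matter, one option is to extend the first and last edges to the minimum and maximum of the data.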
The answers/resolutions are collected from Stack Overflow and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.