I am trying to compare two simple, summarized pandas DataFrames using line plots from the Seaborn library, but one of the lines is shifted by one unit along the x-axis. What's wrong with it?
The dataframes are:
Here is my code:
df = pd.read_csv('/home/gazelle/Documents/m3inference/m3_result.csv',index_col='id')
df = df.drop("Unnamed: 0",axis=1)
for i, v in df.iterrows():
    if str(i) not in result:
        df.drop(i, inplace=True)
    else:
        df.loc[i, 'estimated'] = result[str(i)]
m3 = pd.read_csv('plot_result.csv').set_index('id')
ids = list(m3.index.values)
m3 = m3['age'].value_counts().to_frame().reset_index().sort_values('index')
m3 = m3.rename(columns={m3.columns[0]:'bucket', m3.columns[1]:'age'})
df_estimated = df[df.index.isin(ids)]['estimated'].value_counts().to_frame().reset_index().sort_values('index')
df_estimated = df_estimated.rename(columns={df_estimated.columns[0]:'bucket', df_estimated.columns[1]:'age'})
sns.lineplot(x='bucket', y='age', data=m3)
sns.lineplot(x='bucket', y='age', data=df_estimated)
And the result is:
As has been pointed out in the comments, the data and code you provide appear to produce the correct result:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
sns.set()
m3 = pd.DataFrame({"index": [2, 3, 4, 1], "age": [123, 116, 66, 33]})
df_estimated = pd.DataFrame({"index": [3, 2, 4, 1], "estimated": [200, 100, 37, 1]})
sns.lineplot(x="index", y="age", data=m3)
sns.lineplot(x="index", y="estimated", data=df_estimated)
plt.show()
This gives a plot which is different from the one you posted above:
From your screenshots it looks like you are working in a Jupyter notebook. You are probably suffering from the issue that at the time you plot, the dataframe m3 no longer has the values you printed above, but has been modified.
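Independently of notebook state, it may help to build the count tables in a way that does not depend on how `reset_index()` names its columns. The question's `sort_values('index')` call relies on a column called `index`, which `value_counts().reset_index()` no longer produces in pandas 2.0 and later. The following is only a sketch: the helper name `bucket_counts` and the sample data are made up for illustration.

```python
import pandas as pd

def bucket_counts(series, value_name="age"):
    """Count occurrences per bucket and return a frame sorted by bucket."""
    return (
        series.value_counts()
        .sort_index()                  # sort by bucket value, not by count
        .rename_axis("bucket")         # name the index explicitly
        .reset_index(name=value_name)  # name the count column explicitly
    )

# Made-up data standing in for m3['age']
ages = pd.Series([2, 3, 3, 1, 2, 3], name="age")
counts = bucket_counts(ages)
print(counts)
```

Rebuilding both frames this way in the same cell that does the plotting also guards against plotting a stale, modified `m3`.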
I have a df which represents three states (S1, S2, S3) at 3 timepoints (1hr, 2hr and 3hr). I would like to show a stacked bar plot of the states, but the stacks are discontinuous, or at least not cumulative. How can I fix this in Seaborn? It is important that time is on the y-axis and the state counts are on the x-axis.
Below is some code.
data = [[3, 2, 18],[4, 13, 6], [1, 2, 20]]
df = pd.DataFrame(data, columns = ['S1', 'S2', 'S3'])
df = df.reset_index().rename(columns = {'index':'Time'})
melt = pd.melt(df, id_vars = 'Time')
plt.figure()
sns.histplot(data = melt,x = 'value', y = 'Time', bins = 3, hue = 'variable', multiple="stack")
EDIT:
This is somewhat what I am looking for, I hope this gives you an idea. Please ignore the difference in the scales between boxes...
If I understand correctly, I think you want to use value as a weight:
sns.histplot(
    data=melt, y='Time', hue='variable', weights='value',
    multiple='stack', shrink=0.8, discrete=True,
)
This is pretty tough in seaborn as it doesn't natively support stacked bars. You can use either the builtin plot from pandas, or try plotly express.
data = [[3, 2, 18],[4, 13, 6], [1, 2, 20]]
df = pd.DataFrame(data, columns = ['S1', 'S2', 'S3'])
df = df.reset_index().rename(columns = {'index':'Time'})
# so your y starts at 1
df.Time+=1
melt = pd.melt(df, id_vars = 'Time')
# so y isn't treated as continuous
melt.Time = melt.Time.astype('str')
Pandas can do it, but getting the labels in there is a bit of a pain. Check around to figure out how to do it.
df.set_index('Time').plot(kind='barh', stacked=True)
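One possible way to get the labels onto the pandas plot is matplotlib's `Axes.bar_label`, which is available from matplotlib 3.4 onwards. This is a sketch under that version assumption, not the only way to do it; the axis label text is my own choice.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = [[3, 2, 18], [4, 13, 6], [1, 2, 20]]
df = pd.DataFrame(data, columns=['S1', 'S2', 'S3'])
# Use a 1-based 'Time' index so the y-axis starts at 1
df.index = pd.Index(range(1, len(df) + 1), name='Time')

ax = df.plot(kind='barh', stacked=True)

# Each column becomes one bar container; label every segment at its centre
for container in ax.containers:
    ax.bar_label(container, label_type='center')

ax.set_xlabel('count')
```

In a script, call `plt.show()` afterwards to display the figure.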
Plotly makes it easier:
import plotly.express as px
px.bar(melt, x='value', y='Time', color='variable', orientation='h', text='value')
I have a DataFrame where the index is NOT time. I need to re-scale all of the values from an old index which is not equi-spaced, to a new index which has different limits and is equi-spaced.
The first and last values in the columns should stay as they are (although they will have the new, stretched index values assigned to them).
Example code is:
import numpy as np
import pandas as pd
%matplotlib inline
index = np.asarray((2, 2.5, 3, 6, 7, 12, 15, 18, 20, 27))
x = np.sin(index / 10)
df = pd.DataFrame(x, index=index)
df.plot();
newindex = np.linspace(0, 29, 100)
How do I create a DataFrame where the index is newindex and the new x values are interpolated from the old x values?
The first new x value should be the same as the first old x value. Ditto for the last x value. That is, there should not be NaNs at the beginning and copies of the last old x repeated at the end.
The others should be interpolated to fit the new equi-spaced index.
I tried df.interpolate() but couldn't work out how to interpolate against the newindex.
Thanks in advance for any help.
This works well:
import numpy as np
import pandas as pd
def interp(df, new_index):
    """Return a new DataFrame with all columns values interpolated
    to the new_index values."""
    df_out = pd.DataFrame(index=new_index)
    df_out.index.name = df.index.name
    # DataFrame.items() replaces iteritems(), which was removed in pandas 2.0
    for colname, col in df.items():
        df_out[colname] = np.interp(new_index, df.index, col)
    return df_out
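Note that `np.interp` holds the end values constant for new index points outside the old range. If the old index should instead be stretched so that its first and last points land exactly on the limits of the new index, as the question describes, one option is to rescale the old index first. The sketch below assumes the old index is sorted in increasing order; the function name `rescale_and_interp` is made up for illustration.

```python
import numpy as np
import pandas as pd

def rescale_and_interp(df, new_index):
    """Stretch df's index onto the limits of new_index, then interpolate linearly."""
    old = df.index.to_numpy(dtype=float)
    new_index = np.asarray(new_index, dtype=float)
    # Linearly map [old[0], old[-1]] onto [new_index[0], new_index[-1]]
    stretched = np.interp(old, [old[0], old[-1]], [new_index[0], new_index[-1]])
    out = pd.DataFrame(index=pd.Index(new_index, name=df.index.name))
    for name, col in df.items():
        out[name] = np.interp(new_index, stretched, col.to_numpy())
    return out

index = np.asarray((2, 2.5, 3, 6, 7, 12, 15, 18, 20, 27))
df = pd.DataFrame(np.sin(index / 10), index=index)
newindex = np.linspace(0, 29, 100)
result = rescale_and_interp(df, newindex)
```

With this approach the first and last rows of `result` carry the first and last original values, and there are no NaNs or repeated end values.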
I have adopted the following solution:
import numpy as np
import pandas as pd
import matplotlib.pylab as plt
def reindex_and_interpolate(df, new_index):
    # Index.union replaces the deprecated `|` operator for index set union
    return (df.reindex(df.index.union(new_index))
              .interpolate(method='index', limit_direction='both')
              .loc[new_index])
index = np.asarray((2, 2.5, 3, 6, 7, 12, 15, 18, 20, 27))
x = np.sin(index / 10)
df = pd.DataFrame(x, index=index)
newindex = pd.Index(np.linspace(min(index)-5, max(index)+5, 50), dtype=float)  # Float64Index was removed in pandas 2.0
df_reindexed = reindex_and_interpolate(df, newindex)
plt.figure()
plt.scatter(df.index, df.values, color='red', alpha=0.5)
plt.scatter(df_reindexed.index, df_reindexed.values, color='green', alpha=0.5)
plt.show()
I wonder if you're up against one of pandas' limitations; it seems like you have limited choices for aligning your df to an arbitrary set of numbers (your newindex).
For example, your stated newindex only overlaps with the first and last numbers in index, so linear interpolation (rightly) interpolates a straight line between the start (2) and end (27) of your index.
import numpy as np
import pandas as pd
%matplotlib inline
index = np.asarray((2, 2.5, 3, 6, 7, 12, 15, 18, 20, 27))
x = np.sin(index / 10)
df = pd.DataFrame(x, index=index)
newindex = np.linspace(min(index), max(index), 100)
df_reindexed = df.reindex(index = newindex)
df_reindexed.interpolate(method = 'linear', inplace = True)
df.plot()
df_reindexed.plot()
If you change newindex to provide more overlapping points with your original data set, interpolation works in a more expected manner:
newindex = np.linspace(min(index), max(index), 26)
df_reindexed = df.reindex(index = newindex)
df_reindexed.interpolate(method = 'linear', inplace = True)
df.plot()
df_reindexed.plot()
There are other methods that do not require one to manually align the indices, but the resulting curve (while technically correct) is probably not what one wants:
newindex = np.linspace(min(index), max(index), 1000)
df_reindexed = df.reindex(index = newindex, method = 'ffill')
df.plot()
df_reindexed.plot()
I looked at the pandas docs but I couldn't identify an easy solution.
https://pandas.pydata.org/pandas-docs/stable/basics.html#basics-reindexing
I have the following problem: I want to combine graphs from various years in one plot. To explain it better, I made the simplified example below.
# packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Here I make a simplified dataframe which I want to plot
data = {'Dates': ['02-04-2014', '18-08-2014', '05-03-2014', '06-06-2014', '05-08-2013', '06-11-2013', '03-01-2013', '12-02-2013'],
        'Values': [7, 8, 11, 3, 6, 1, 8, 13]}
df = pd.DataFrame.from_dict(data)
This is important for me because it is the format I work with in my problem
df['Dates'] = pd.to_datetime(df['Dates'])
Here I do the plotting
years = sorted([i for i in df['Dates'].apply(lambda x: x.year).unique()])
for i in years:
    df1 = df[(df['Dates'].apply(lambda x: x.year) == i)]
    df1 = df1.sort_values(by = ['Dates'])
plt.show()
In this case, this returns two separate line plots, one for 2013 and one for 2014. I want these combined in one graph, so that I get a single graph with a legend for the years.
Hope you can help!
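One possible approach, sketched here under some assumptions, is to create a single Axes and draw one line per year on it. The sketch assumes the dates are day-month-year strings and that day of year is an acceptable shared x-axis; both are my assumptions, not something stated in the question.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = {'Dates': ['02-04-2014', '18-08-2014', '05-03-2014', '06-06-2014',
                  '05-08-2013', '06-11-2013', '03-01-2013', '12-02-2013'],
        'Values': [7, 8, 11, 3, 6, 1, 8, 13]}
df = pd.DataFrame(data)
# Assumption: the strings are day-month-year
df['Dates'] = pd.to_datetime(df['Dates'], format='%d-%m-%Y')
df['Year'] = df['Dates'].dt.year

fig, ax = plt.subplots()
# One line per year, all drawn on the same Axes
for year, group in df.sort_values('Dates').groupby('Year'):
    ax.plot(group['Dates'].dt.dayofyear, group['Values'],
            marker='o', label=str(year))

ax.set_xlabel('Day of year')
ax.set_ylabel('Values')
ax.legend(title='Year')
```

Passing the same `ax` to every plotting call is what keeps the lines in one figure; call `plt.show()` once at the end when running as a script.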
I'm trying to print actual values in pie charts instead of percentages. For a one-dimensional series, this helps:
Matplotlib pie-chart: How to replace auto-labelled relative values by absolute values
But when I try to create multiple pies it won't work.
d = {'Yes':pd.Series([825, 56], index=["Total", "Last 2 Month"]), 'No':pd.Series([725, 73], index=["Total", "Last 2 Month"])}
df = pd.DataFrame(d)
df = df.T
def absolute_value(val):
    a = np.round(val/100.*df.values, 0)
    return a
df.plot.pie(subplots=True, figsize=(12, 6),autopct=absolute_value)
plt.show()
How can I make this right?
Thanks.
A hacky solution would be to index the dataframe within the absolute_value function, considering that this function is called exactly once per value in that dataframe.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
d = {'Yes':pd.Series([825, 56], index=["Total", "Last 2 Month"]),
     'No':pd.Series([725, 73], index=["Total", "Last 2 Month"])}
df = pd.DataFrame(d)
df = df.T
i = [0]
def absolute_value(val):
    # autopct is called once per wedge, column by column
    a = df.iloc[i[0]%len(df),i[0]//len(df)]
    i[0] += 1
    return a
df.plot.pie(subplots=True, figsize=(12, 6),autopct=absolute_value)
plt.show()
The other option is to plot the pie charts individually by looping over the columns.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
d = {'Yes':pd.Series([825, 56], index=["Total", "Last 2 Month"]),
     'No':pd.Series([725, 73], index=["Total", "Last 2 Month"])}
df = pd.DataFrame(d)
df = df.T
print(df.iloc[:,0].sum())
def absolute_value(val, summ):
    # convert the percentage back to an absolute value
    a = np.round(val/100.*summ,0)
    return a
fig, axes = plt.subplots(ncols=len(df.columns))
for i,ax in enumerate(axes):
    df.iloc[:,i].plot.pie(ax=ax,autopct=lambda x: absolute_value(x,df.iloc[:,i].sum()))
plt.show()
In both cases the output would look similar to this
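A variation on the second option avoids both the global counter and the lambda by building one formatter per column with a closure. This is only a sketch; the helper name `make_autopct` is made up, and it calls matplotlib's `Axes.pie` directly rather than the pandas wrapper.

```python
import matplotlib.pyplot as plt
import pandas as pd

def make_autopct(values):
    """Return an autopct callback that converts percentages back to absolute values."""
    total = float(sum(values))
    def autopct(pct):
        return str(int(round(pct * total / 100.0)))
    return autopct

df = pd.DataFrame({"Total": [825, 725], "Last 2 Month": [56, 73]},
                  index=["Yes", "No"])

fig, axes = plt.subplots(ncols=len(df.columns), figsize=(12, 6))
for ax, column in zip(axes, df.columns):
    # Each pie gets its own formatter bound to that column's total
    ax.pie(df[column], labels=list(df.index), autopct=make_autopct(df[column]))
    ax.set_title(column)
```

Because each formatter captures its own total, the labels stay correct regardless of the order in which the wedges are drawn.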
I am trying to plot a multi-indexed DataFrame using matplotlib. However, I could not find code that does exactly this among previously answered questions. Can anyone help me produce a similar graph?
import pandas as pd
import matplotlib.pyplot as plt
import pylab as pl
import numpy as np
import pandas
xls_filename = "abc.xlsx"
f = pandas.ExcelFile(xls_filename)
df = f.parse("Sheet1", index_col='Year' and 'Month')
f.close()
plt.rcParams.update({'font.size': 18}) # Font size of x and y-axis
df.plot(kind= 'bar', alpha=0.70)
It is not indexing the way I wanted, and it does not produce the expected graph either. Any help is appreciated.
I created a DataFrame from some of the values I see on your attached plot and plotted it.
index = pd.MultiIndex.from_tuples(tuples=[(2011, ), (2012, ), (2016, 'M'), (2016, 'J')], names=['year', 'month'])
df = pd.DataFrame(index=index, data={'1': [10, 140, 6, 9], '2': [23, 31, 4, 5], '3': [33, 23, 1, 1]})
df.plot(kind='bar')
This is the outcome
where the DataFrame is this
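As a side note on the question's code: in Python the expression `'Year' and 'Month'` evaluates to `'Month'`, so only one column name is passed as `index_col`. A way to get a two-level index that does not depend on the reader's options is to load the sheet without an index and call `set_index` with a list. The sketch below uses a made-up in-memory frame in place of the Excel sheet; the column names `A` and `B` and all values are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values standing in for the contents of the Excel sheet
raw = pd.DataFrame({
    "Year": [2015, 2015, 2016, 2016],
    "Month": ["Jan", "Feb", "Jan", "Feb"],
    "A": [10, 14, 6, 9],
    "B": [23, 31, 4, 5],
})

# A list of column names gives a two-level (Year, Month) MultiIndex
df = raw.set_index(["Year", "Month"])

ax = df.plot(kind="bar", alpha=0.70)
ax.set_xlabel("Year, Month")
```

Each bar group on the x-axis is then labelled with its `(Year, Month)` pair.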