I have two DataFrames in Python: one holds the actual revenue values and the other holds the predicted values, accumulated per day (the index of the rows). Both DataFrames have the same length.
I want to compare them on the same plot, row by row. If I want to plot only one row from each DataFrame, I use this code:
df_actual.loc[71].T.plot(figsize=(14,10), kind='line')
df_preds.loc[71].T.plot(figsize=(14,10), kind='line')
The output is this:
However, the ideal output is to have all the rows for each DataFrame in a grid so I can compare all the results:
I have tried to write a for loop to iterate over each row, but it is not working:
for i in range(20):
    df_actual.loc[i].T.plot(figsize=(14,10), kind='line')
    df_preds.loc[i].T.plot(figsize=(14,10), kind='line')
Is there any way to do this that is not manual? Thanks!
it would be helpful if you provided a sample of your dfs.
assuming both dfs have the same length & assuming you want 2 columns, try this:
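fig, ax = plt.subplots((len(df_actual) + 1) // 2, 2)  # round up so an odd row count still fits
ax = ax.ravel()  # ravel() returns a new flat array, so assign it back
for i in range(len(df_actual)):
    sns.lineplot(data=df_actual.loc[i].T, ax=ax[i], color="navy")
    sns.lineplot(data=df_preds.loc[i].T, ax=ax[i], color="orange")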
edit:
this works for me (you just have to add your .T):
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df_actual = pd.DataFrame(data=[[1,2,3,4,5], [6,7,8,9,10]], columns = ["col1","col2", "col3", "col4", "col5"])
df_pred = pd.DataFrame(data=[[3,4,5,6,7], [8,9,10,11,12]], columns = ["col1", "col2", "col3", "col4", "col5"])
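fig, ax = plt.subplots((len(df_actual) + 1) // 2, 2)  # round up so an odd row count still fits
ax = ax.ravel()  # ravel() returns a new flat array, so assign it back
for i in range(len(df_actual)):
    ax[i].plot(df_actual.loc[i], color="navy")
    ax[i].plot(df_pred.loc[i], color="orange")
plt.show()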
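For an arbitrary number of rows, the same idea can be wrapped in a small helper. This is only a sketch: the function name plot_rows_grid, the ncols parameter and the sample data are made up for illustration, and it assumes both DataFrames share the same index and columns.

```python
import math

import matplotlib.pyplot as plt
import pandas as pd


def plot_rows_grid(df_actual, df_preds, ncols=2):
    """Draw one subplot per row, overlaying the actual and predicted values."""
    n_rows = len(df_actual)
    nrows = math.ceil(n_rows / ncols)  # round up so an odd row count still fits
    fig, axes = plt.subplots(nrows, ncols, figsize=(14, 3 * nrows), squeeze=False)
    axes = axes.ravel()  # flat array of axes, one per grid cell

    for ax, idx in zip(axes, df_actual.index):
        ax.plot(df_actual.columns, df_actual.loc[idx], color="navy", label="actual")
        ax.plot(df_preds.columns, df_preds.loc[idx], color="orange", label="predicted")
        ax.set_title(f"row {idx}")
        ax.legend()

    # hide any leftover axes in the last grid row
    for ax in axes[n_rows:]:
        ax.set_visible(False)

    fig.tight_layout()
    return fig, axes


# made-up sample data: three rows, five "day" columns
days = ["d1", "d2", "d3", "d4", "d5"]
df_actual = pd.DataFrame([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [2, 4, 6, 8, 10]], columns=days)
df_preds = pd.DataFrame([[3, 4, 5, 6, 7], [8, 9, 10, 11, 12], [1, 3, 5, 7, 9]], columns=days)

fig, axes = plot_rows_grid(df_actual, df_preds)
```

Call plt.show() afterwards to display the figure. Passing squeeze=False keeps the axes array two-dimensional even for a single grid row, so ravel() behaves the same for any row count.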
Related
I have multiple data frames, each consisting of three main columns: one with the categories (c1, c2, c3), one with the data values, and one with the time periods (AA, BB, CC, DD).
What I am trying to do is generate boxplots of the data for all the data frames at once, in a single figure.
I tried different enumerate options and the "ax" argument, but it still generates the boxplots separately, and I couldn't figure out why.
allCN=[df1, df2, df3]
fig, axs = plt.subplots(nrows = 3, ncols=4, figsize = (30,54))
for i, x in enumerate(allCN):
    sns.set(style="ticks", palette='Set2')
    sns.set_context("paper", font_scale=1.1, rc={"lines.linewidth": 1.1})
    g=sns.catplot(x="Cat", y="Data", ax=axs[i,0],
                  col="Period", data=x, kind="box", height=4, aspect=10/18,
                  width=0.6,fliersize=2.5,showfliers=False, linewidth=1.1,
                  notch=False,orient="v")
    g.set_ylabels("test", size=12)
    g.set_xlabels("")
One way is to stack your data frames and use the row= argument inside catplot. First to create something like your data:
import pandas as pd
import numpy as np
import seaborn as sns
df1 = pd.DataFrame({'Cat':np.random.choice(['C1','C2','C3'],50),
'Data':np.random.uniform(0,1,50),"Period":np.random.choice(['AA','CC','DD'],50)})
df2 = pd.DataFrame({'Cat':np.random.choice(['C1','C2','C3'],50),
'Data':np.random.uniform(0,1,50),"Period":np.random.choice(['AA','CC','DD'],50)})
df3 = pd.DataFrame({'Cat':np.random.choice(['C1','C2','C3'],50),
'Data':np.random.uniform(0,1,50),"Period":np.random.choice(['AA','CC','DD'],50)})
Then concatenate the dataframes and add another column (I used source below) to record which dataframe each row came from:
allCN=pd.concat([df1,df2,df3])
allCN['source'] = np.repeat(['df1','df2','df3'],[len(df1),len(df2),len(df3)])
sns.catplot(x="Cat", y="Data",
col="Period", row = "source",
data=allCN, kind="box", height=2,aspect=1.6)
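The labelling step can also be done per frame before concatenating, which avoids having to keep the repeat counts in sync by hand. A minimal sketch, assuming the frames are held in a dict (the dict name frames and the helper make_frame are made up here):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)


def make_frame(n=50):
    # random stand-in data with the same three columns as in the question
    return pd.DataFrame({
        "Cat": rng.choice(["C1", "C2", "C3"], n),
        "Data": rng.uniform(0, 1, n),
        "Period": rng.choice(["AA", "CC", "DD"], n),
    })


frames = {"df1": make_frame(), "df2": make_frame(), "df3": make_frame()}

# tag every frame with its own label, then stack them into one long frame
allCN = pd.concat(
    [df.assign(source=name) for name, df in frames.items()],
    ignore_index=True,
)
```

The resulting allCN has the same source column as above and can be passed to sns.catplot with row="source" in the same way.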
What about the hue parameter in sns.boxplot? Would that give you the result you want?
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
box_plot = sns.boxplot(x="day", y="total_bill", data=tips, hue="smoker")
plt.show()
I have a pandas DataFrame which has 200 columns and each column is a list of 200 values.
I want to plot those values in series in such a way that
First column (200 values) lie between 0 to 1 in x-axis
Second column (200 values) lie between 1 to 2 in x-axis
Third column (200 values) lie between 2 to 3 in x-axis
...
Is there any way to do this in Python?
Thanks in advance
So, I gather that by "between 0 and 1", you actually want the points of Column 1 situated at x=0.5. To have all values of Column 1 at the same x-coordinate, just pass that fixed x-coordinate to the call to scatter. I show here the example for 20 columns with 20 values per column:
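import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame()
for i in range(20):
    df[f'Col {i}'] = np.random.randn(20)
fig, axes = plt.subplots()
for i in range(20):
    # every value of column i sits at the same x-coordinate, i + 0.5
    axes.scatter([i+0.5]*len(df), df[f'Col {i}'])
axes.set_xticks(range(20))
plt.show()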
Personally, I prefer to iterate over the columns (or column keys) because one is flexible with the column names. This code snippet is a quick example:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# create random data with non-serial column names
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=['Col 1','Col 2','Col 4','Col 5'])
fig, ax= plt.subplots()
# create a list of the DataFrame's column names
columns = list(df)
# iterate over the columns of the DataFrame
for i,col in enumerate(columns):
    y = df[col]
    x = [i+0.5] * len(y)
    ax.scatter(x,y)
plt.show()
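Both answers read "between 0 and 1" as a single x position per column. If the intent was instead to spread each column's values across its own unit interval, so that the columns form one continuous series, a sketch along these lines may be closer. This is an alternative reading of the question, not something the asker confirmed, and the data is random.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((20, 5)), columns=[f"Col {i}" for i in range(5)])

fig, ax = plt.subplots()
n = len(df)
for i, col in enumerate(df.columns):
    # x runs from i (inclusive) to i + 1 (exclusive), one position per value
    x = i + np.linspace(0, 1, n, endpoint=False)
    ax.plot(x, df[col])

# one tick at every interval boundary
ax.set_xticks(range(len(df.columns) + 1))
```

Call plt.show() to display the result. Using endpoint=False keeps the last value of one column from sharing an x position with the first value of the next.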
I have a pandas dataframe:
import pandas as pd
data1 = {'Date':['03-19-2019'],
'Total':[35],
'Solved':[19],
'Arrived':[23],
}
df1 = pd.DataFrame(data1)
and I want to plot a bar plot like this:
with
df1.plot(kind='barh',x='Date',y='Total', ax=ax0, color='#C0C0C0',
width=0.5)
df1.plot(kind='barh',x='Date',y='Arrived', ax=ax0, color='#C0FFFF',
width=0.5)
df1.plot(kind='barh',x='Date',y='Solved', ax=ax0, color='#C0C0FF',
width=0.5)
However, to keep a larger bar from covering a smaller one, I have to draw the columns in order of their values (Total is greater than Arrived, which is greater than Solved).
How can I avoid doing this by hand and automate the process easily?
There must be a more straightforward approach in pandas, but I came up with this quick workaround. The idea is the following:
Leave out the first column Date and sort the remaining columns by value.
Use the sorted indices to plot the columns in descending order, so the largest bar is drawn first and the smaller ones end up on top.
To keep the colors consistent, use a dictionary so that the plotting order doesn't affect which color each column gets.
import matplotlib.pyplot as plt
import numpy as np
fig, ax0 = plt.subplots()
# column positions sorted by value, largest first
ids = np.argsort(df1.values[0][1:])[::-1]
colors = {'Total': '#C0C0C0', 'Arrived': '#C0FFFF', 'Solved':'#C0C0FF'}
for col in np.array(df1.columns[1:].tolist())[ids]:
    df1.plot(kind='barh',x='Date',y=col, ax=ax0, color=colors[col], width=0.1)
plt.show()
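The same ordering can be obtained with pandas alone by sorting the single row with sort_values. A minimal sketch, assuming the one-row df1 from the question; it draws the bars with matplotlib's barh directly rather than df1.plot:

```python
import matplotlib.pyplot as plt
import pandas as pd

df1 = pd.DataFrame({"Date": ["03-19-2019"], "Total": [35], "Solved": [19], "Arrived": [23]})
colors = {"Total": "#C0C0C0", "Arrived": "#C0FFFF", "Solved": "#C0C0FF"}

# sort the value columns of the single row from largest to smallest
order = df1.drop(columns="Date").iloc[0].sort_values(ascending=False).index.tolist()

fig, ax0 = plt.subplots()
for col in order:
    # larger bars are drawn first, so smaller ones end up on top
    ax0.barh(df1["Date"], df1[col], height=0.5, color=colors[col], label=col)
ax0.legend()
```

Call plt.show() to display the chart. Because the colors are looked up by column name, the plotting order can change without changing which color belongs to which column.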
A stacked bar graph can be produced in pandas via the stacked=True option. To use this you need to make the "Date" the index first.
import matplotlib.pyplot as plt
import pandas as pd
data1 = {'Date':['03-19-2019'],
'Total':[35],
'Solved':[19],
'Arrived':[23],
}
df = pd.DataFrame(data1)
df.set_index("Date").plot(kind="barh", stacked=True)
plt.show()
I am trying to get an output from a dataframe that shows a stacked horizontal bar chart with a table to the left of it. The relevant data is as follows:
import pandas as pd
import matplotlib.pyplot as plt
cols = ['metric','target','daily_avg','days_green','days_yellow','days_red']
vals = ['Volume',338.65,106.81,63,2,1]
OutDict = dict(zip(cols,vals))
# DataFrame.append was removed in pandas 2.0; build the frame from a list of dicts instead
df = pd.DataFrame([OutDict], columns = cols)
I'd like to get something similar to what's in the following: Python Matplotlib how to get table only. I can get the stacked bar chart:
df[['days_green','days_yellow','days_red']].plot.barh(stacked=True)
Adding the keyword argument table=True puts a table below the chart. How do I get the axis to either display the df as a table or add one next to the chart? Also, the DataFrame will eventually have more than one row, but if I can get it to work for one then I should be able to get it to work for n rows.
Thanks in advance.
Unfortunately using the pandas.plot method you won't be able to do this. The docs for the table parameter state:
If True, draw a table using the data in the DataFrame and the data will be transposed to meet matplotlib’s default layout. If a Series or DataFrame is passed, use passed data to draw a table.
So you will have to use matplotlib directly to get this done. One option is to create 2 subplots; one for your table and one for your chart. Then you can add the table and modify it as you see fit.
import matplotlib.pyplot as plt
import pandas as pd
cols = ['metric','target','daily_avg','days_green','days_yellow','days_red']
vals = ['Volume',338.65,106.81,63,2,1]
OutDict = dict(zip(cols,vals))
# DataFrame.append was removed in pandas 2.0; build the frame from a list of dicts instead
df = pd.DataFrame([OutDict], columns = cols)
fig, (ax1, ax2) = plt.subplots(1, 2)
df[['days_green','days_yellow','days_red']].plot.barh(stacked=True, ax=ax2)
ax1.table(cellText=df[['days_green','days_yellow','days_red']].values, colLabels=['days_green', 'days_yellow', 'days_red'], loc='center')
ax1.axis('off')
plt.show()
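For more than one row, the same two-axes layout still applies. The sketch below is one possible arrangement, not the only one: the second data row is made up for illustration, and the choice to show metric, target and daily_avg in the table and to give the chart twice the table's width is an assumption.

```python
import matplotlib.pyplot as plt
import pandas as pd

cols = ["metric", "target", "daily_avg", "days_green", "days_yellow", "days_red"]
rows = [
    ["Volume", 338.65, 106.81, 63, 2, 1],
    ["Weight", 120.00, 98.40, 50, 10, 6],  # made-up second row for illustration
]
df = pd.DataFrame(rows, columns=cols)
day_cols = ["days_green", "days_yellow", "days_red"]
table_cols = ["metric", "target", "daily_avg"]

# table on the left, chart on the right with twice the width
fig, (ax_table, ax_chart) = plt.subplots(
    1, 2, figsize=(12, 3), gridspec_kw={"width_ratios": [1, 2]}
)

# one stacked horizontal bar per metric
df.set_index("metric")[day_cols].plot.barh(stacked=True, ax=ax_chart)

# draw the table from string-converted cell values and hide the empty axes frame
table = ax_table.table(
    cellText=df[table_cols].astype(str).values.tolist(),
    colLabels=table_cols,
    loc="center",
)
ax_table.axis("off")
```

Call plt.show() to display the figure. Note that barh draws the first row at the bottom while the table lists it at the top, so the row order may need reversing on one side if the two should line up.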
I have a dataframe that has an index (words) and a single column (counts) for some lyrics. I am trying to create a heatmap based on the word counts.
Cuenta
Que 179
La 145
Y 142
Me 113
No 108
I am trying to produce the heatmap like this:
df1 = pd.DataFrame.from_dict([top50]).T
df1.columns = ['Cuenta']
df1.sort_values(['Cuenta'], ascending = False, inplace=True)
result = df1.pivot(index=df1.index, columns='Cuenta', values=df1.Cuenta.count)
sns.heatmap(result, annot=True, fmt="g", cmap='viridis')
plt.show()
But, it keeps throwing 'Index' object has no attribute 'levels'
Any ideas why this isn't working? I tried using the index or words as a separate column and still doesn't work.
The data is one-dimensional. The counts are already present in the one (and only) column of the dataframe. There is no meaningful way to pivot this data.
You would hence directly plot the dataframe as a heatmap.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({"Cuenta": [179,145,142,113,108]},
index=["Que", "La", "Y", "Me", "No"])
sns.heatmap(df, annot=True, fmt="g", cmap='viridis')
plt.show()
If the data for the y-axis is in a column rather than in the index of the dataframe, use .set_index to move it into the index.
df = pd.DataFrame({"Cuenta": [179,145,142,113,108],
"words": ["Que", "La", "Y", "Me", "No"]})
# given a dataframe of two columns, set the column as the index
df.set_index("words", inplace=True)
ax = sns.heatmap(df, annot=True, fmt="g", cmap='viridis')
sns.heatmap will result in an IndexError if passed a pandas.Series.
.value_counts creates a Series.
df['column'] and df.column create a Series. Use df[['column']] instead.
import seaborn as sns
# sample data
tips = sns.load_dataset('tips')
# value_counts creates a Series
vc = tips.time.value_counts()
# convert to a DataFrame
vc = vc.to_frame()
# plot
ax = sns.heatmap(data=vc)
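The difference between the two selection styles can be checked without plotting anything. A small sketch using the word counts from the question:

```python
import pandas as pd

df = pd.DataFrame({"words": ["Que", "La", "Y", "Me", "No"],
                   "Cuenta": [179, 145, 142, 113, 108]}).set_index("words")

as_series = df["Cuenta"]            # single brackets: 1-D Series
as_frame = df[["Cuenta"]]           # double brackets: 2-D DataFrame with one column
from_series = as_series.to_frame()  # converts the Series back to a DataFrame

print(as_series.ndim, as_frame.shape, from_series.shape)
# prints: 1 (5, 1) (5, 1)
```

Only the two-dimensional objects, as_frame and from_series, have the shape that sns.heatmap expects.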