I have data like this:
[ ('2018-04-09', '10:18:11',['s1',10],['s2',15],['s3',5]),
('2018-04-09', '10:20:11',['s4',8],['s2',20],['s1',10]),
('2018-04-10', '10:30:11',['s4',10],['s5',6],['s6',3]) ]
I want to plot this data, preferably as a stacked bar graph.
The x-axis will be time,
and it should look like this:
(I created this image in Paint just to illustrate.)
The x-axis should show time the way a normal graph does (e.g. 10:00, April 3, 2018).
I am stuck because the string values (like 's1' or 's2') change from one bar to the next.
Just to hard-code and verify, I tried this:
import plotly
import plotly.graph_objs as go
import matplotlib.pyplot as plt
import matplotlib
plotly.offline.init_notebook_mode()
def createPage():
    graph_data = []
    l1=[('com.p1',1),('com.p2',2)('com.p3',3)]
    l2=[('com.p1',1),('com.p4',2)('com.p5',3)]
    l3=[('com.p2',8),('com.p3',2)('com.p6',30)]
    trace_temp = go.Bar(
        x='2018-04-09 10:18:11',
        y=l1[0],
        name = 'top',
    )
    graph_data.append(trace_temp)
    plotly.offline.plot(graph_data, filename='basic-scatter3.html')
createPage()
The error I get is: TypeError: 'tuple' object is not callable.
Can someone please suggest some code for plotting data like this?
If needed, I can store the data in some other form that makes plotting easier.
Edit:
I used the approach suggested in the accepted answer and succeeded in plotting with plotly like this:
fig = df.iplot(kind='bar', barmode='stack', asFigure=True)
plotly.offline.plot(fig, filename="stack1.html")
However, I ran into one problem:
When time intervals are very close, the data overlaps on the graph.
Is there a way to overcome this?
You could use a pandas stacked bar plot. The advantage is that pandas makes it easy to build the table of category/value pairs that you have to generate anyway.
from matplotlib import pyplot as plt
import pandas as pd
all_data = [('2018-04-09', '10:18:11', ['s1',10],['s2',15],['s3',5]),
('2018-04-09', '10:20:11', ['s4',8], ['s2',20],['s1',10]),
('2018-04-10', '10:30:11', ['s4',10],['s5',6], ['s6',3]) ]
#load data into dataframe
df = pd.DataFrame(all_data, columns = list("ABCDE"))
#combine the two descriptors
df["day/time"] = df["A"] + "\n" + df["B"]
#assign each list to a new row with the appropriate day/time label
df = df.melt(id_vars = ["day/time"], value_vars = ["C", "D", "E"])
#split each list into category and value
df[["category", "val"]] = pd.DataFrame(df.value.values.tolist(), index = df.index)
#create a table with category-value pairs from all lists, missing values are set to NaN
df = df.pivot(index = "day/time", columns = "category", values = "val")
#plot a stacked bar chart
df.plot(kind = "bar", stacked = True)
#give tick labels the right orientation
plt.xticks(rotation = 0)
plt.show()
Output:
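As for the original error: in the hard-coded lists of the question a comma is missing between the second and third tuple, so Python reads ('com.p2',2)('com.p3',3) as an attempt to call a tuple. If you would rather not use pandas, the same regrouping can be sketched in plain Python. The helper name series_by_category is made up for this example; it returns one list of values per category, with 0 where a category does not occur at a given timestamp, which is the shape a stacked bar chart needs.

```python
all_data = [
    ('2018-04-09', '10:18:11', ['s1', 10], ['s2', 15], ['s3', 5]),
    ('2018-04-09', '10:20:11', ['s4', 8], ['s2', 20], ['s1', 10]),
    ('2018-04-10', '10:30:11', ['s4', 10], ['s5', 6], ['s6', 3]),
]

def series_by_category(records):
    """Return (labels, {category: [value per record]}) with 0 for missing categories."""
    labels = [f"{day} {time}" for day, time, *_ in records]
    series = {}
    for i, (_day, _time, *pairs) in enumerate(records):
        for name, value in pairs:
            # create a zero-filled list the first time a category is seen
            series.setdefault(name, [0] * len(records))[i] += value
    return labels, series

labels, series = series_by_category(all_data)
```

Each entry of series can then be passed as the y values of one bar trace, with labels as the shared x values.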
I'm trying to set different colors for some bars in a plotly express bar graph:
import plotly.express as px
import pandas as pd
data = {'Name':['2020/01', '2020/02', '2020/03', '2020/04',
'2020/05', '2020/07', '2020/08'],
'Value':[34,56,66,78,99,55,22]}
df = pd.DataFrame(data)
color_discrete_sequence = ['#ec7c34']*len(df)
color_discrete_sequence[5] = '#609cd4'
fig=px.bar(df,x='Name',y='Value',color_discrete_sequence=color_discrete_sequence)
fig.show()
I expected one bar (the sixth) to have a different color; however, I got this result:
What am I doing wrong?
This happens because color in px.bar is used to name a category to illustrate traits or dimensions of a dataset using a colorscale. Or, in your case, rather a color cycle, since you're dealing with a categorical / discrete case. color_discrete_sequence is then used to specify which color sequence to follow. One way to achieve your goal with your setup is to define a string variable with unique values, for example df['category'] = [str(i) for i in df.index], and then use:
fig=px.bar(df,x='Name',y='Value',
color = 'category',
color_discrete_sequence=color_discrete_sequence,
)
Plot:
If df['category'] is a numerical value, color_discrete_sequence will be ignored, and a default continuous sequence will be applied:
If anything else is unclear, don't hesitate to let me know.
Complete code:
import plotly.express as px
import pandas as pd
data = {'Name':['2020/01', '2020/02', '2020/03', '2020/04',
'2020/05', '2020/07', '2020/08'],
'Value':[34,56,66,78,99,55,22]}
df = pd.DataFrame(data)
df['category'] = [str(i) for i in df.index]
# df['category'] = df.index
color_discrete_sequence = ['#ec7c34']*len(df)
color_discrete_sequence[5] = '#609cd4'
fig=px.bar(df,x='Name',y='Value',
color = 'category',
color_discrete_sequence=color_discrete_sequence,
)
fig.show()
I have a dataframe, with 4 distinct values in a column, for each value I need to set the custom colors.
Below is the sample data
val,cluster
118910.000000,3
71209.000000,2
25674.666667,0
109267.666667,3
8.000000,1
Below is the code.
fig = px.histogram(types, x="val",color='cluster')
fig.show()
types is the name of the dataframe built from the data above.
When I plot the figure I get the default colors. Instead I need:
0:red
2:blue
1:purple
3:green
How can I set custom colors for a histogram in Python plotly?
Can anyone help with this, please?
You can use color_discrete_map.
import random
import pandas as pd
import plotly.express as px
# build 100 random (val, cluster) rows for demonstration
df = pd.DataFrame([[random.random()*100,random.randint(0,3)]for i in range (100)],columns = ['val','cluster'])
fig = px.histogram(df, x="cluster",y='val',
color = 'cluster',
color_discrete_map = {0:'red',1:'blue',2:'purple',3:'green'}
)
fig.show()
Is there any pandas way to "link" a dataframe column name with a nice description for that name?
See the following snippet, where I have a dataframe with two columns: the weight in kg and the height in meters of ten people.
When I create the dataframe I use this syntax
df = pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
but when creating the dataframe I would like to "attach" a nice description to column name a, and some LaTeX (e.g. $b_0$) to column name b, so that every graph element that automatically uses those names (legend, tick labels, axis labels and so on) looks good to the user.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
sz = 10
bmi = np.random.normal(25,0.1,sz)
h = np.random.normal(70*2.54/100,4*2.54/100,sz)
w = bmi*h**2
df = pd.DataFrame({'height_m':h,'weight_kg':w})
ax1 = df.plot.scatter(x='height_m',y='weight_kg')
plt.savefig('raw.png')
ax2 = df.plot.scatter(x='height_m',y='weight_kg')
ax2.set_xlabel('$h_0$, Altezza/m')
ax2.set_ylabel('$p_0$, Peso/kg')
plt.savefig('publishable.png')
plt.show()
This is the raw picture straight from pandas:
This is the picture I would like to get... but without modifying by myself the plot adding set_xlabel and set_ylabel and so on...
You can give the DataFrame columns their final labels from the beginning and plot by referring to df.columns:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
sz = 10
bmi = np.random.normal(25,0.1,sz)
h = np.random.normal(70*2.54/100,4*2.54/100,sz)
w = bmi*h**2
df = pd.DataFrame({'$h_0$, Altezza/m':h,'$p_0$, Peso/kg':w})
df.plot.scatter(x=df.columns[0], y=df.columns[1])
plt.savefig('publishable.png')
plt.show()
Plus, if you are using Jupyter Notebook / Jupyter Lab, it will convert the LaTeX correctly:
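If you would rather keep short column names for the computations, a variation is to store the display labels in a separate dict and apply them with rename only when plotting. This is a sketch of that idea, not a built-in pandas feature; the labels dict is just a convention invented for this example.

```python
import numpy as np
import pandas as pd

# hypothetical mapping from short column names to display labels
labels = {"height_m": "$h_0$, Altezza/m", "weight_kg": "$p_0$, Peso/kg"}

rng = np.random.default_rng(0)
sz = 10
bmi = rng.normal(25, 0.1, sz)
h = rng.normal(70 * 2.54 / 100, 4 * 2.54 / 100, sz)
w = bmi * h**2
df = pd.DataFrame({"height_m": h, "weight_kg": w})

# df keeps the short names; the renamed copy is used for plotting only
pretty = df.rename(columns=labels)
ax = pretty.plot.scatter(x=labels["height_m"], y=labels["weight_kg"])
```

Call plt.savefig or plt.show afterwards as in the code above; the axis labels are taken from the renamed columns.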
I am trying to get an output from a dataframe that shows a stacked horizontal bar chart with a table to the left of it. The relevant data is as follows:
import pandas as pd
import matplotlib.pyplot as plt
cols = ['metric','target','daily_avg','days_green','days_yellow','days_red']
vals = ['Volume',338.65,106.81,63,2,1]
OutDict = dict(zip(cols,vals))
# DataFrame.append was removed in pandas 2.0; build the one-row frame directly
df = pd.DataFrame([OutDict], columns = cols)
I'd like to get something similar to what's in the following: Python Matplotlib how to get table only. I can get the stacked bar chart:
df[['days_green','days_yellow','days_red']].plot.barh(stacked=True)
Adding the keyword argument table=True puts a table below the chart. How do I get the axis to either display the df as a table or add one next to the chart? Also, the DataFrame will eventually have more than one row, but if I can get it to work for one, I should be able to get it to work for n rows.
Thanks in advance.
Unfortunately using the pandas.plot method you won't be able to do this. The docs for the table parameter state:
If True, draw a table using the data in the DataFrame and the data will be transposed to meet matplotlib’s default layout. If a Series or DataFrame is passed, use passed data to draw a table.
So you will have to use matplotlib directly to get this done. One option is to create 2 subplots; one for your table and one for your chart. Then you can add the table and modify it as you see fit.
import matplotlib.pyplot as plt
import pandas as pd
cols = ['metric','target','daily_avg','days_green','days_yellow','days_red']
vals = ['Volume',338.65,106.81,63,2,1]
OutDict = dict(zip(cols,vals))
# DataFrame.append was removed in pandas 2.0; build the one-row frame directly
df = pd.DataFrame([OutDict], columns = cols)
fig, (ax1, ax2) = plt.subplots(1, 2)
df[['days_green','days_yellow','days_red']].plot.barh(stacked=True, ax=ax2)
ax1.table(cellText=df[['days_green','days_yellow','days_red']].values, colLabels=['days_green', 'days_yellow', 'days_red'], loc='center')
ax1.axis('off')
plt.show()
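For more than one row, the same idea might be extended as in the sketch below. The second data row is made up for illustration, and the width ratio and figure size are arbitrary; the table rows are not guaranteed to line up exactly with the bars, so some manual tuning will probably be needed.

```python
import matplotlib.pyplot as plt
import pandas as pd

cols = ['metric', 'target', 'daily_avg', 'days_green', 'days_yellow', 'days_red']
rows = [
    ['Volume', 338.65, 106.81, 63, 2, 1],
    ['Latency', 120.00, 95.40, 50, 10, 6],  # hypothetical second row
]
df = pd.DataFrame(rows, columns=cols)
day_cols = ['days_green', 'days_yellow', 'days_red']

# narrow axis for the table on the left, wider axis for the chart on the right
fig, (ax_table, ax_bars) = plt.subplots(
    1, 2, figsize=(10, 3), gridspec_kw={'width_ratios': [1, 2]})

# one stacked horizontal bar per metric
df.set_index('metric')[day_cols].plot.barh(stacked=True, ax=ax_bars)
ax_bars.invert_yaxis()  # first row on top, same order as the table

ax_table.table(cellText=df[day_cols].values,
               rowLabels=df['metric'].tolist(),
               colLabels=day_cols,
               loc='center')
ax_table.axis('off')
```

Finish with plt.show() or fig.savefig(...) as needed.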
tmpdf.boxplot(['original','new'], by = 'by column', ax = ax, sym = '')
gets me a plot like this
I want to compare "original" with "new", how can I arrange to put the two "0" boxes in one panel and the two "1" boxes in another panel? And of course swap the labelling with that.
Thanks
Here is a sample dataset to demonstrate.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# simulate some artificial data
# ==========================================
np.random.seed(0)
df = pd.DataFrame(np.random.rand(10,2), columns=['original', 'new'] )
df['by column'] = pd.Series([0,0,0,0,1,1,1,1,1,1])
# your original plot
ax = df.boxplot(['original', 'new'], by='by column', figsize=(12,6))
To get the desired output, call groupby explicitly instead of passing by to boxplot, so that each subgroup gets its own panel with the "original" and "new" boxes side by side.
ax = df[['original', 'new']].groupby(df['by column']).boxplot(figsize=(12,6))
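If you want more control over the individual panels (titles, shared y-axis and so on), the same grouping can also be written as an explicit loop over matplotlib subplots. This is a sketch of an equivalent approach, not what groupby(...).boxplot does internally; the title format is just an example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.rand(10, 2), columns=['original', 'new'])
df['by column'] = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

groups = list(df.groupby('by column'))
# one panel per group, sharing the y-axis so the boxes are comparable
fig, axes = plt.subplots(1, len(groups), figsize=(12, 6), sharey=True)
for ax, (key, group) in zip(axes, groups):
    # "original" and "new" side by side within this group's panel
    group[['original', 'new']].boxplot(ax=ax)
    ax.set_title(f'by column = {key}')
```

This assumes at least two groups; with a single group plt.subplots returns one Axes object rather than an array.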