I have the following pandas DataFrame:
import pandas as pd
df_dict = {'index': [0, 1, 2, 3, 4, 5, 6, 7, 8], 'columns': ['from', 'moving', 'N', 'total', 'perc', 'helper', 'label'], 'data': [['0', 'no', 29, 39, 74.35897435897436, 'all', '74.4 %'], ['0', 'yes', 10, 39, 25.641025641025642, 'all', '25.6 %'], ['1', 'no', 77, 84, 91.66666666666667, 'all', '91.7 %'], ['1', 'yes', 7, 84, 8.333333333333334, 'all', '8.3 %'], ['2', 'no', 6, 6, 100.0, 'all', '100.0 %'], ['3', 'no', 19, 25, 76.0, 'all', '76.0 %'], ['3', 'yes', 6, 25, 24.0, 'all', '24.0 %'], ['4', 'no', 30, 45, 66.66666666666667, 'all', '66.7 %'], ['4', 'yes', 15, 45, 33.333333333333336, 'all', '33.3 %']]}
df = pd.DataFrame(index=df_dict['index'], columns=df_dict['columns'], data=df_dict['data'])
I am using the following code:
import plotly.express as px

def pl(dt, color_col, title, facet_col=None,
       color_discrete_map=dict(zip(['4', '0', '2', '3', '1'], ['#003898', '#164461', '#61B3C1', '#8ED3F6', '#8DD1C8']))):
    px.bar(dt, x='helper', y='perc', color=color_col, facet_col=facet_col,
           category_orders={color_col: sorted(dt[color_col].unique())},
           color_discrete_map=color_discrete_map, title=title, text='label').show()

pl(dt=df, facet_col='from',
   color_col='from', title='title')
This produces a bar chart with one facet per value of from.
I would like to shade each color in color_discrete_map according to the moving column of df, so that the 'no' bars are slightly faded.
How can I do that with Plotly Express?
I don't believe you can set the text font through any of the px.bar parameters. However, you can save the figure returned by px.bar in a variable called fig and then modify each trace directly through fig.data[0], fig.data[1], ..., fig.data[n-1]. Note that Plotly Express creates one trace per value of the color column, so here each trace holds the bars for one value of from.
The text color of each trace can be changed by assigning a dictionary to its textfont attribute. For example, to make the text of the first trace red: fig.data[0].textfont = {'color': 'red'}. This lends itself to looping over the traces in fig.data and setting textfont to the desired color.
The last part is to make the text color a shade of the bar color. I am not that familiar with hex color codes, so it makes the most sense to convert each hex color to an RGB tuple of three values and find the intermediate color between it and white, rgb(255, 255, 255). plotly.colors conveniently provides hex_to_rgb and find_intermediate_color, so we can use them to convert each hex color to RGB and then find the color between it and white.
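To make the "color between the bar color and white" step concrete, here is a plain-Python sketch of the arithmetic. As far as I know, find_intermediate_color does the same per-channel linear interpolation for tuple inputs (low + fraction * (high - low)); the helper names below are hypothetical and only illustrate the calculation.

```python
def hex_to_rgb_tuple(hex_color):
    """Hypothetical helper: '#164461' -> (22, 68, 97)."""
    hex_color = hex_color.lstrip('#')
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))


def blend_toward_white(rgb, fraction):
    """Linear interpolation per channel: fraction=0 keeps rgb, fraction=1 gives white."""
    return tuple(c + fraction * (255 - c) for c in rgb)


bar = hex_to_rgb_tuple('#164461')
shaded = blend_toward_white(bar, 0.5)              # floats, halfway to white
shaded_str = 'rgb' + str(tuple(int(c) for c in shaded))
print(bar, shaded, shaded_str)
```

A fraction closer to 1 gives a paler shade; int() truncates the fractional channel values.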
To be consistent with the way you've structured your program, I put the code setting the textfont attributes in your pl function.
import pandas as pd
import plotly.express as px
from plotly.colors import hex_to_rgb, find_intermediate_color

df_dict = {'index': [0, 1, 2, 3, 4, 5, 6, 7, 8], 'columns': ['from', 'moving', 'N', 'total', 'perc', 'helper', 'label'], 'data': [['0', 'no', 29, 39, 74.35897435897436, 'all', '74.4 %'], ['0', 'yes', 10, 39, 25.641025641025642, 'all', '25.6 %'], ['1', 'no', 77, 84, 91.66666666666667, 'all', '91.7 %'], ['1', 'yes', 7, 84, 8.333333333333334, 'all', '8.3 %'], ['2', 'no', 6, 6, 100.0, 'all', '100.0 %'], ['3', 'no', 19, 25, 76.0, 'all', '76.0 %'], ['3', 'yes', 6, 25, 24.0, 'all', '24.0 %'], ['4', 'no', 30, 45, 66.66666666666667, 'all', '66.7 %'], ['4', 'yes', 15, 45, 33.333333333333336, 'all', '33.3 %']]}
df = pd.DataFrame(index=df_dict['index'], columns=df_dict['columns'], data=df_dict['data'])

bar_color_map = dict(zip(['4', '0', '2', '3', '1'], ['#003898', '#164461', '#61B3C1', '#8ED3F6', '#8DD1C8']))

def pl(dt, color_col, title, facet_col=None, color_discrete_map=bar_color_map):
    fig = px.bar(dt, x='helper', y='perc', color=color_col, facet_col=facet_col,
                 category_orders={color_col: sorted(dt[color_col].unique())},
                 color_discrete_map=color_discrete_map, title=title, text='label')

    # Set the textfont of each trace. Plotly Express creates one trace per
    # value of color_col, and trace.name holds that value (e.g. '0').
    for trace in fig.data:
        bar_color = hex_to_rgb(color_discrete_map[trace.name])
        # Move the bar color halfway toward white
        shaded_text_color = find_intermediate_color(bar_color, (255, 255, 255), 0.5)
        # find_intermediate_color returns floats; convert to ints for the rgb() string
        shaded_int_rgb_color = tuple(int(c) for c in shaded_text_color)
        trace.textfont = {'color': 'rgb' + str(shaded_int_rgb_color)}

    fig.show()

pl(dt=df, facet_col='from', color_col='from', title='title')
I'm trying to add an HLine (the mean of the values) to a grouped bar plot I have created, but I keep getting a ValueError: "all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)". Does anyone know where I'm going wrong? Here is a data example that produces the error:
Data_test = [['2010', 5, 'Yes'], ['2010', 7, 'No'],
['2011', 3, 'Yes'], ['2011', 5, 'No'],
['2012', 7, 'Yes'], ['2012', 3, 'No'],
['2013', 8, 'Yes'], ['2013', 7, 'No'],
['2014', 2, 'Yes'], ['2014', 3, 'No'],
['2015', 6, 'Yes'], ['2015', 7, 'No'],
['2016', 1, 'Yes'], ['2016', 7, 'No'],
['2017', 9, 'Yes'], ['2017', 3, 'No'],
['2018', 7, 'Yes'], ['2018', 5, 'No'],
['2019', 3, 'Yes'], ['2019', 9, 'No']]
test_df = pd.DataFrame(Data_test, columns = ['Year', 'Value', 'Category'])
test_plot = test_df.groupby(['Year', 'Category'])['Value'].mean().plot(kind = 'bar').opts(multi_level = False) * hv.HLine(test_df['Value'].mean())
test_plot
Thanks for any suggestions!
It is not clear what is causing the error you are seeing. The code you posted works for me if the default pandas plot method of the Series is replaced with hvplot. The plotting line should be:
test_plot = test_df.groupby(['Year', 'Category'])['Value'].mean().hvplot.bar().options(multi_level = False) * hv.HLine(test_df['Value'].mean())
This produces the plot. The rest of the code is the same as yours:
import pandas as pd
import holoviews as hv
import hvplot.pandas
Data_test = [['2010', 5, 'Yes'], ['2010', 7, 'No'],
['2011', 3, 'Yes'], ['2011', 5, 'No'],
['2012', 7, 'Yes'], ['2012', 3, 'No'],
['2013', 8, 'Yes'], ['2013', 7, 'No'],
['2014', 2, 'Yes'], ['2014', 3, 'No'],
['2015', 6, 'Yes'], ['2015', 7, 'No'],
['2016', 1, 'Yes'], ['2016', 7, 'No'],
['2017', 9, 'Yes'], ['2017', 3, 'No'],
['2018', 7, 'Yes'], ['2018', 5, 'No'],
['2019', 3, 'Yes'], ['2019', 9, 'No']]
test_df = pd.DataFrame(Data_test, columns = ['Year', 'Value', 'Category'])
test_plot = test_df.groupby(['Year', 'Category'])['Value'].mean().hvplot.bar().options(multi_level = False) * hv.HLine(test_df['Value'].mean())
test_plot
I'm looking for a solution that would help me do the following.
Suppose I have to generate HTML code from a hand-drawn design. I have the element type, x and y coordinates, width, and height as a NumPy array. I need to sort these rows by the y coordinate and, if two elements have the same y value, by the x value. Then I need to group the elements of the same type.
I created the array like this:
import numpy
s = numpy.array([
#element type, x,y,width,height
["hyperlink", 5, 150, 25, 10],
["paragraph", 20, 60, 10, 10],
["image", 85, 150, 25, 10],
["radio", 20, 60, 10, 10],
["radio", 85, 150, 25, 10],
["button", 20, 60, 10, 10],
["text_field", 20, 40, 25, 10],
["label", 10, 10, 20, 10]])
print(s)
lists = s.tolist()
print([{"element_type ": x[0] , "x": x[1], "y": x[2], "width":x[3], "height" :x[4]} for i, x in enumerate(lists)])
I'd like to sort it so that the elements are ordered by y-coordinate, and then by x-coordinate where the y-coordinates are equal.
Then I need to group the elements, such as the two radio buttons in the example above. I expect output like the following:
[{
'element_type ': 'hyperlink',
'group_id' :"1",
'x': '5',
'y': '150',
'width': '25',
'height': '10'
},
{
'element_type ': 'image',
'group_id' :"3",
'x': '85',
'y': '150',
'width': '25',
'height': '10'
},
{'element_type ': 'radio',
'group_id' :"4",
'x': '20',
'y': '60',
'width': '10',
'height': '10'
},
{
'element_type ': 'radio',
'group_id' :"4",
'x': '85',
'y': '150',
'width': '25',
'height': '10'
},
{
'element_type ': 'text_field',
'group_id' :"5",
'x': '20',
'y': '40',
'width': '25',
'height': '10'
},
{'element_type ': 'label',
'group_id' :"5",
'x': '10',
'y': '10',
'width': '20',
'height': '10'
}]
Can I get some ideas on how to do this? I am using Python.
NumPy is not designed for group-by operations unless you force it; pandas is a better option here.
import pandas as pd
df = pd.DataFrame(s, columns = ['element_type', 'x', 'y', 'width', 'height'])
df['group_id'] = df.groupby(['x', 'y', 'width', 'height']).ngroup()
>>> df.to_dict('records')
[{'element_type': 'hyperlink',
'x': '5',
'y': '150',
'width': '25',
'height': '10',
'group_id': 3},
...,
{'element_type': 'label',
'x': '10',
'y': '10',
'width': '20',
'height': '10',
'group_id': 0}]
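The answer above groups rows by position and size and does not sort them. Below is a minimal sketch of the two steps the question describes, assuming "group the elements of the same type" means one group_id per element_type value, numbered in order of first appearance after sorting (both the interpretation and the numbering are assumptions, not the asker's stated spec).

```python
import numpy as np
import pandas as pd

s = np.array([
    # element type, x, y, width, height
    ["hyperlink", 5, 150, 25, 10],
    ["paragraph", 20, 60, 10, 10],
    ["image", 85, 150, 25, 10],
    ["radio", 20, 60, 10, 10],
    ["radio", 85, 150, 25, 10],
    ["button", 20, 60, 10, 10],
    ["text_field", 20, 40, 25, 10],
    ["label", 10, 10, 20, 10]])

df = pd.DataFrame(s, columns=['element_type', 'x', 'y', 'width', 'height'])

# np.array stores every value as a string here, so '150' would sort before '40';
# convert the geometry columns back to integers first
num_cols = ['x', 'y', 'width', 'height']
df[num_cols] = df[num_cols].astype(int)

# Sort by y, then by x where y is equal
df = df.sort_values(['y', 'x']).reset_index(drop=True)

# Assumption: one group per element_type, numbered in order of first appearance
df['group_id'] = df.groupby('element_type', sort=False).ngroup()

records = df.to_dict('records')
for record in records:
    print(record)
```

Both radio rows receive the same group_id. Rows that tie on both y and x (paragraph, radio, and button at (20, 60)) have no further ordering rule in the question, so their relative order should not be relied on.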
Say I have a pandas dataframe like this:
import numpy as np
import pandas as pd

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data, index=labels)
I want to write a function that groups by one column, gets the average for each category, and then returns the highest average.
Currently I do it this way:
avg = df.groupby('attempts')['score'].mean()
print(avg.max())
I want to write something like this:
def return_max_average(df, category_column, numerical_column):
    avg = df.groupby('category_column')['numerical_column'].mean()
    return avg.max()
What would be the best way to write this function?
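The attempted function quotes 'category_column' and 'numerical_column', so pandas looks for columns literally named category_column and numerical_column. A minimal sketch that passes the variables instead:

```python
import numpy as np
import pandas as pd


def return_max_average(df, category_column, numerical_column):
    """Average numerical_column within each group of category_column; return the largest average."""
    # Use the variables, not the string literals 'category_column' / 'numerical_column'
    avg = df.groupby(category_column)[numerical_column].mean()  # NaN scores are skipped
    return avg.max()


exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data, index=labels)

best_by_attempts = return_max_average(df, 'attempts', 'score')
best_by_qualify = return_max_average(df, 'qualify', 'score')
print(best_by_attempts, best_by_qualify)
```

For this data, the highest average by attempts comes from the attempts == 1 group ((12.5 + 14.5 + 19) / 3, about 15.33), because the NaN score in that group is skipped by mean().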
Here is a working example:
import numpy as np
import pandas as pd

data = {'name': ['Joe', 'Mike', 'Jack', 'Hack', 'David', 'Marry', 'Wansi', 'Sidy', 'Jason', 'Even'],
'age': [25, 32, 18, np.nan, 15, 20, 41, np.nan, 37, 32],
'gender': [1, 0, 1, 1, 0, 1, 0, 0, 1, 0],
'isMarried': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)
print(df)
print("---------------------------")
obj = df[df["age"]>40].index.format()
print("obj is",type(obj))
I expect obj to be a string (str), but the result is a list.
What should I do to fix this?
Index.format() returns a list of strings, one per index label, so take the first element with obj = obj[0] and it becomes a string:
import numpy as np
import pandas as pd

data = {'name': ['Joe', 'Mike', 'Jack', 'Hack', 'David', 'Marry', 'Wansi', 'Sidy', 'Jason', 'Even'],
'age': [25, 32, 18, np.nan, 15, 20, 41, np.nan, 37, 32],
'gender': [1, 0, 1, 1, 0, 1, 0, 0, 1, 0],
'isMarried': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)
print(df)
print("---------------------------")
obj = df[df["age"]>40].index.format()
obj = obj[0]
print("obj is",type(obj))
Or, in a single line:
obj = df[df["age"]>40].index.format()[0]
print("obj is",obj,type(obj))
which prints:
obj is g <class 'str'>
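As an alternative that avoids Index.format() (which, as far as I know, recent pandas releases deprecate), you can index the filtered Index directly. A sketch using the same data:

```python
import numpy as np
import pandas as pd

data = {'name': ['Joe', 'Mike', 'Jack', 'Hack', 'David', 'Marry', 'Wansi', 'Sidy', 'Jason', 'Even'],
        'age': [25, 32, 18, np.nan, 15, 20, 41, np.nan, 37, 32],
        'gender': [1, 0, 1, 1, 0, 1, 0, 0, 1, 0],
        'isMarried': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# The boolean mask selects matching labels; indexing the Index returns the label itself
matches = df.index[df['age'] > 40]
obj = str(matches[0])  # str() also covers non-string index labels
print("obj is", obj, type(obj))
```

If more than one row can match, matches holds all of their labels, and [0] takes only the first.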
I have created the following graph. I would like to annotate the first and the last value of each line in the graph, placing each value just before the start of the line and at its end.
How can I do this?
import pandas as pd
import seaborn as sns

data = {'Time':['1', '2', '1', '2', '1', '2', '1', '2', '1', '2', '1', '2', '1', '2' , '1', '2'], 'Country':['Italy', 'Italy', 'Italy', 'Italy', 'Italy', 'Italy', 'Italy', 'Italy', 'Russia', 'Russia', 'Russia', 'Russia' , 'Russia', 'Russia', 'Russia', 'Russia'], 'Score':[20, 21, 14, 15, 19, 18, 5, 9, 5, 8, 3, 3, 5, 1, 3, 8]}
df = pd.DataFrame(data)
sns.lineplot(x="Time", y="Score", hue="Country", data=df)
Without more details on exactly what you want: sns.lineplot returns a matplotlib Axes object, and you can use that object's text method to add annotations.
In the following, I calculate the mean 'Score' for each country at each 'Time' ('1' and '2') and add it as text to the graph. You can customise this as needed:
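ax = sns.lineplot(x="Time", y="Score", hue="Country", data=df)
# Mean Score per (Country, Time) pair, i.e. the points each line is drawn through
means = df.groupby(['Country', 'Time'])['Score'].mean()
times = ['1', '2']
# Text x positions in data coordinates, near the first and last Time categories
x_positions = [0.02, 0.9]
for country in df['Country'].unique():
    for time, xpos in zip(times, x_positions):
        mean = means.loc[(country, time)]
        ax.text(xpos, mean - 1.1, f'{mean:.2f}')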
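To annotate the first and last value of each line without hard-coding the times, one option is to compute those endpoint values per hue level first. A pandas-only sketch; line_endpoints is a hypothetical helper, and it assumes the x categories are ordered as they first appear in the data.

```python
import pandas as pd


def line_endpoints(df, x, y, hue):
    """Hypothetical helper: for each hue level, return (mean y at first x, mean y at last x)."""
    x_order = list(dict.fromkeys(df[x]))      # x values in order of first appearance
    means = df.groupby([hue, x])[y].mean()    # one mean per (hue, x) pair
    endpoints = {}
    for level in df[hue].unique():
        per_x = means.loc[level]              # Series indexed by the x values
        present = [v for v in x_order if v in per_x.index]
        endpoints[level] = (per_x[present[0]], per_x[present[-1]])
    return endpoints


data = {'Time': ['1', '2', '1', '2', '1', '2', '1', '2', '1', '2', '1', '2', '1', '2', '1', '2'],
        'Country': ['Italy'] * 8 + ['Russia'] * 8,
        'Score': [20, 21, 14, 15, 19, 18, 5, 9, 5, 8, 3, 3, 5, 1, 3, 8]}
df = pd.DataFrame(data)

ends = line_endpoints(df, x='Time', y='Score', hue='Country')
print(ends)
```

Each pair can then be drawn with ax.text on the Axes returned by sns.lineplot, slightly left of the first category and slightly right of the last one, assuming the string Time values are placed at x positions 0 and 1 as the answer above relies on.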