openpyxl and 2D area chart - python

I need to create a 2D area chart from data in an Excel file that I generate in Python.
I have followed the tutorial https://openpyxl.readthedocs.io/en/stable/charts/area.html#d-area-charts.
My problem is quite simple: the data are not laid out the same way as in the tutorial's example. They are transposed.
With the data laid out that way, I have not yet found the correct modification to the Reference calls:
cats = Reference(ws, min_col=1, min_row=1, max_row=7)
data = Reference(ws, min_col=2, min_row=1, max_col=3, max_row=7)
chart.add_data(data, titles_from_data=True)
chart.set_categories(cats)
All my attempts produced an incorrect area chart.
Any suggestion?
Thanks.
Sebastien

You don't say what a correct chart looks like, but the default chart would probably be something like the following. It builds the chart by adding each series separately from the data in range A1:G3.
import openpyxl
from openpyxl.chart import (
    AreaChart,
    Reference,
    Series,
)
wb = openpyxl.load_workbook('foo.xlsx')
ws = wb['Sheet1']
chart = AreaChart()
chart.title = "Area Chart"
chart.style = 13
chart.x_axis.title = 'Number'
chart.y_axis.title = 'Batch'
# 'Number' row (row 1) added as its own series, OR use it as the category axis, see below
ndata = Reference(ws, min_col=1, min_row=1, max_col=7, max_row=1)
nseries = Series(ndata, title_from_data=True)
chart.append(nseries)
# first series (row 2); the first cell of the row is used as the series title
s1data = Reference(ws, min_col=1, min_row=2, max_col=7, max_row=2)
series1 = Series(s1data, title_from_data=True)
series1.graphicalProperties.line.solidFill = "000000"
series1.graphicalProperties.solidFill = "ff9900"
chart.append(series1)
# second series (row 3)
s2data = Reference(ws, min_col=1, min_row=3, max_col=7, max_row=3)
series2 = Series(s2data, title_from_data=True)
series2.graphicalProperties.line.solidFill = "000000"
series2.graphicalProperties.solidFill = "ffff00"
chart.append(series2)
### To use 'Number' as the category axis instead, enable these two lines and disable the nseries block above
### (the categories start at column 2 because column 1 holds the row titles)
# cats = Reference(ws, min_col=2, max_col=7, min_row=1, max_row=1)
# chart.set_categories(cats)
ws.add_chart(chart, "A5")
wb.save("area2D.xlsx")
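A shorter variant of the same idea, as a minimal sketch: openpyxl's add_data() accepts from_rows=True, which reads one series per row, so the transposed layout can be passed in a single call. The workbook contents and the output file name below are made up for illustration; only the A1:G3 layout is taken from the answer above.

```python
from openpyxl import Workbook
from openpyxl.chart import AreaChart, Reference

wb = Workbook()
ws = wb.active

# Illustrative transposed data: one series per row, titles in column A.
rows = [
    ["Number", 2, 3, 4, 5, 6, 7],
    ["Batch 1", 40, 40, 50, 30, 25, 50],
    ["Batch 2", 30, 25, 30, 10, 5, 10],
]
for row in rows:
    ws.append(row)

chart = AreaChart()
chart.title = "Area Chart"
chart.x_axis.title = "Number"
chart.y_axis.title = "Batch"

# Rows 2-3 hold the series; from_rows=True makes each row one series and
# titles_from_data=True takes the first cell of each row as its title.
data = Reference(ws, min_col=1, min_row=2, max_col=7, max_row=3)
chart.add_data(data, from_rows=True, titles_from_data=True)

# Row 1 (without its title cell) supplies the categories.
cats = Reference(ws, min_col=2, min_row=1, max_col=7, max_row=1)
chart.set_categories(cats)

ws.add_chart(chart, "A5")
wb.save("area2D_rows.xlsx")  # hypothetical output file name
```

Whether this matches the chart the question expects is not known; it only shows the row-oriented counterpart of the tutorial code.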

Related

How to set the 'Angle of first slice' on a doughnut chart in Python openpyxl

How to set the 'Angle of first slice' on a doughnut chart in Python openpyxl?
For a simple doughnut chart like:
chart = DoughnutChart()
labels = Reference(WorkSheet, min_col=1, min_row=2, max_row=5)
data = Reference(WorkSheet, min_col=2, min_row=1, max_row=5)
chart.add_data(data, titles_from_data=True)
chart.set_categories(labels)
I have a few slices like:
slices = [DataPoint(idx=i) for i in range(4)]
plain, jam, lime, chocolate = slices
chart.series[0].data_points = slices
plain.graphicalProperties.solidFill = "FAE1D0"
jam.graphicalProperties.solidFill = "BB2244"
lime.graphicalProperties.solidFill = "22DD22"
chocolate.graphicalProperties.noFill = True
How can I rotate the first slice, or set the Excel chart setting "Angle of first slice"? I don't see anything like this in the openpyxl documentation on Read the Docs.
Maybe something like:
chocolate.rot = "-270"
Can this be done with another library? (xlsxwriter)
Thanks!!
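One possible approach, offered as a sketch rather than a confirmed answer: in openpyxl the angle is a property of the chart, not of an individual data point. DoughnutChart (like PieChart) has a firstSliceAng attribute that takes a value from 0 to 360 degrees, so a rotation of -270 would be written as 90. The worksheet contents and file name below are invented for illustration.

```python
from openpyxl import Workbook
from openpyxl.chart import DoughnutChart, Reference

wb = Workbook()
ws = wb.active

# Illustrative data: labels in column A, values in column B.
for row in [["Pie", "Sold"], ["Plain", 40], ["Jam", 2], ["Lime", 20], ["Chocolate", 30]]:
    ws.append(row)

chart = DoughnutChart()
labels = Reference(ws, min_col=1, min_row=2, max_row=5)
data = Reference(ws, min_col=2, min_row=1, max_row=5)
chart.add_data(data, titles_from_data=True)
chart.set_categories(labels)

# "Angle of first slice": a chart-level setting, 0-360 degrees.
chart.firstSliceAng = 90

ws.add_chart(chart, "E1")
wb.save("doughnut_rotated.xlsx")  # hypothetical output file name
```

As for other libraries, XlsxWriter exposes a comparable setting through set_rotation() on pie and doughnut charts.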

AttributeError: module 'matplotlib.cm' has no attribute 'RdylBu'

This code tries to draw a parallel coordinates plot. Since the outcome is a continuous variable, each row of data needs to be shaded according to its outcome so that the outcome pattern is visible. The book uses plot.cm.RdYlBu for a similar case, but when I tried to use it I got an attribute error.
Here is the similar code from the book:
import pandas as pd
import matplotlib.pyplot as plot
from math import exp
target_url = ("http://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data")
#read abalone data (no header row; the prefix argument is dropped because the columns are renamed on the next line)
abalone = pd.read_csv(target_url, header=None)
abalone.columns = ['Sex', 'Length', 'Diameter', 'Height', 'Whole Wt', 'Shucked Wt', 'Viscera Wt', 'Shell Wt', 'Rings']
#get summary to use for scaling (row 3 of describe() is min, row 7 is max; column 7 is Rings)
summary = abalone.describe()
minRings = summary.iloc[3,7]
maxRings = summary.iloc[7,7]
nrows = len(abalone.index)
for i in range(nrows):
    #plot rows of data as if they were series data
    dataRow = abalone.iloc[i,1:8]
    #scale the outcome to the range 0..1 so it can index the colormap
    labelColor = (abalone.iloc[i,8] - minRings) / (maxRings - minRings)
    dataRow.plot(color=plot.cm.RdYlBu(labelColor), alpha=0.5)
plot.xlabel("Attribute Index")
plot.ylabel("Attribute Values")
plot.show()
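A small check that may help narrow this down. The error message in the title spells the colormap RdylBu with a lowercase y, while the book's code uses RdYlBu. Colormap names are case-sensitive, so if the failing code used the lowercase spelling, that alone would explain the AttributeError. The sketch below looks the colormap up by name and applies the same min-max scaling as the loop above; the ring counts are illustrative values, not taken from the data set.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plot

# Look the colormap up by its exact, case-sensitive name.
cmap = plot.get_cmap("RdYlBu")

# Illustrative bounds and value (assumptions, not read from abalone.data).
min_rings, max_rings = 1, 29
rings = 15

# Same scaling as labelColor in the question: maps the outcome to 0..1.
label_color = (rings - min_rings) / (max_rings - min_rings)
rgba = cmap(label_color)  # (red, green, blue, alpha) tuple

# The misspelled attribute does not exist on the cm module.
has_misspelled = hasattr(plot.cm, "RdylBu")
```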

Folium Heatmap With Time for COVID 19

I am trying to create a Heatmap movie for the confirmed cases of Covid 19.
My dataset is a pandas DataFrame with columns Date, Latitude, Longitude, Confirmed.
My issue is that I do not know how to pass the Confirmed value as an input to folium.plugins.HeatMapWithTime.
I tried using:
new_map = folium.Map(location=[0, 0], tiles= "cartodbpositron",min_zoom=2, zoom_start=2, max_zoom=3)
df['Lat'] = df['Lat'].astype(float)
df['Long'] = df['Long'].astype(float)
Confirmed_df = df[['Lat', 'Long','Confirmed']]
hm = plugins.HeatMapWithTime(Confirmed_df,auto_play=True,max_opacity=0.8)
hm.add_to(new_map)
new_map
df looks like:
Date        LAT        LONG      Confirmed
2020/04/26  48.847306  2.433284  6500
2020/04/26  48.861935  2.441292  4800
2020/04/26  48.839644  2.655109  9000
2020/04/25  48.924351  2.386369  12000
2020/04/25  48.829872  2.376677  0
You should pre-process the data before passing it to HeatMapWithTime(). The Folium documentation and its examples are helpful here.
In your case, the input should be a list with one entry per time step, where each entry is a list of [lat, lng, weight] points, and the Confirmed column serves as the weight. First, you need to normalize the 'Confirmed' values to (0, 1]:
df['Confirmed'] = df['Confirmed'] / df['Confirmed'].sum()
Then, you can preprocess like this:
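# sort the whole frame by date (assigning a sorted column back would be re-aligned on the index and change nothing)
df = df.sort_values('Date', ascending=True)
data = []
for _, d in df.groupby('Date'):
    # column names follow the question's code ('Lat', 'Long')
    data.append([[row['Lat'], row['Long'], row['Confirmed']] for _, row in d.iterrows()])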
Finally, pass data to HeatMapWithTime() as you did before:
hm = plugins.HeatMapWithTime(data, auto_play=True,max_opacity=0.8)
hm.add_to(new_map)
new_map
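To make the expected shape of data concrete, here is a dependency-free sketch that builds the same nested list from the sample rows in the question. It assumes the same normalization as above (dividing by the total); the variable names are illustrative only.

```python
from collections import defaultdict

# Sample rows copied from the question: (date, lat, lng, confirmed)
rows = [
    ("2020/04/26", 48.847306, 2.433284, 6500),
    ("2020/04/26", 48.861935, 2.441292, 4800),
    ("2020/04/26", 48.839644, 2.655109, 9000),
    ("2020/04/25", 48.924351, 2.386369, 12000),
    ("2020/04/25", 48.829872, 2.376677, 0),
]

# Normalize the weights by the total, as in the answer above.
total = sum(confirmed for _, _, _, confirmed in rows)

# Group the points by date: one list of [lat, lng, weight] per time step.
frames = defaultdict(list)
for date, lat, lng, confirmed in rows:
    frames[date].append([lat, lng, confirmed / total])

# Time steps in chronological order; these strings sort correctly as text.
time_index = sorted(frames)
data = [frames[date] for date in time_index]
```

Here data has two entries, one per date, and time_index holds the matching date labels, which could be passed as the index argument of HeatMapWithTime() to label each step.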

Placing Labels in nested categorical stacked bar with Bokeh and Pandas

I am trying to replicate a chart like the following using a pandas dataframe and Bokeh's vbar:
Objective
So far, I've managed to place the labels at their corresponding heights, but now I can't find a way to access the numeric value where each category (2016, 2017, 2018) is located on the x axis. This is my result:
My nested categorical stacked bars chart
This is my code. It's messy, but it's what I've managed so far. So is there a way to access the numeric x-axis value of the bars?
import math
import numpy as np
import pandas as pd
from bokeh.models import FactorRange, HoverTool, Label
from bokeh.palettes import brewer
from bokeh.plotting import figure
# get_item_value() is a helper from the asker's own code base; it is not shown in the question
def make_nested_stacked_bars(source, measurement, dimension_attr):
    #dimension_attr is a list that contains the names of columns in source that will be used as categories
    #measurement contains the name of the column with numeric data.
    data = source.copy()
    #Creates list of values of highest index
    list_attr = source[dimension_attr[0]].unique()
    list_stackers = list(source[dimension_attr[-1]].unique())
    list_stackers.sort()
    #trims labels that are too wide to fit in graph
    for column in data.columns:
        if data[column].dtype.name == 'object':
            data[column] = np.where(data[column].apply(len) > 30, data[column].str[:30]+'...', data[column])
    #Creates a list of dataframes, each grouping a specific value
    list_groups = []
    for item in list_attr:
        list_groups.append(data[data[dimension_attr[0]] == item])
    #Groups data by dimension attrs, aggregates measurement to count
    #Drops highest index from dimension attr
    dropped_attr = dimension_attr[0]
    dimension_attr.remove(dropped_attr)
    #Creates groupby by the last 2 parameters, and aggregates to count
    #Calculates percentage
    for index, value in enumerate(list_groups):
        list_groups[index] = list_groups[index].groupby(by=dimension_attr).agg({measurement: ['count']})
        list_groups[index] = list_groups[index].groupby(level=0).apply(lambda x: round(100 * x / float(x.sum()),1))
        # Resets indexes
        list_groups[index] = list_groups[index].reset_index()
        list_groups[index] = list_groups[index].pivot(index=dimension_attr[0], columns=dimension_attr[1])
        list_groups[index].index = [(x,list_attr[index]) for x in list_groups[index].index]
        # Drops the two top column levels (measurement name and 'count'), keeping the stacker names
        list_groups[index].columns = list_groups[index].columns.droplevel(0)
        list_groups[index].columns = list_groups[index].columns.droplevel(0)
    df = pd.concat(list_groups)
    # Get the number of colors needed for the plot.
    colors = brewer["Spectral"][len(list_stackers)]
    colors.reverse()
    p = figure(plot_width=800, plot_height=500, x_range=FactorRange(*df.index))
    renderers = p.vbar_stack(list_stackers, x='index', width=0.3, fill_color=colors, legend=[get_item_value(x) for x in list_stackers], line_color=None, source=df, name=list_stackers,)
    # Adds a different hovertool to each stacked bar
    #empty dictionary with initial values set to zero
    list_previous_y = {}
    for item in df.index:
        list_previous_y[item] = 0
    #loops through bar graphs
    for r in renderers:
        stack = r.name
        hover = HoverTool(tooltips=[
            ("%s" % stack, "@%s" % stack),
        ], renderers=[r])
        #Initial value for placing label in x_axis
        previous_x = 0.5
        #Loops through dataset rows
        for index, row in df.iterrows():
            #adds value of df column to the running total for this bar
            list_previous_y[index] = list_previous_y[index] + df[stack][index]
            ## adds label if value is not nan and at least 10
            if not math.isnan(df[stack][index]) and df[stack][index]>=10:
                p.add_layout(Label(x=previous_x, y=list_previous_y[index] -df[stack][index]/2,
                                   text='% '+str(df[stack][index]), render_mode='css',
                                   border_line_color='black', border_line_alpha=1.0,
                                   background_fill_color='white', background_fill_alpha=1.0))
            # increases position in x_axis
            #this should be done by adding the value of next bar in x_axis
            previous_x = previous_x + 0.8
        p.add_tools(hover)
    p.legend.location = "top_left"
    p.x_range.range_padding = 0.2
    p.xgrid.grid_line_color = None
    return p
Or is there an easier way to get all this done?
Thank you for your time!
UPDATE:
Added an additional image of a three level nested chart where the label placement in x_axis should be accomplished too
Three level nested chart
I can't find a way to access the numeric value where the category (2016,2017,2018) is located in the x axis.
There is not any way to access this information on the Python side in standalone Bokeh output. The coordinates are only computed inside the browser on the JavaScript side. i.e. only after your Python code has finished running and is out of the picture entirely. Even in a Bokeh server app context there is not any direct way, as there are not any synchronized properties that record the values.
As of Bokeh 1.3.4, support for placing labels with categorical coordinates is a known open issue.
In the meantime, the only workarounds I can suggest are:
Use the text glyph method with coordinates in a ColumnDataSource, instead of Label. That should work to position with actual categorical coordinates. (LabelSet might also work, though I have not tried it.) You can see an example of text with categorical coordinates here:
https://github.com/bokeh/bokeh/blob/master/examples/plotting/file/periodic.py
Use numerical coordinates to position the Label. But you will have to experiment or make a best guess to find numerical coordinates that work for you. A rule of thumb is that categories have a width of 1.0 in synthetic (numeric) coordinate space.
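As a starting point for the second workaround, here is a rough sketch of that rule of thumb for the simplest case. It assumes a flat (non-nested) FactorRange whose first category is centred at 0.5, with each further category one unit plus the factor padding to the right. The helper name and the padding parameter are illustrative, and nested ranges add extra padding between groups, so for those the values still have to be found by experiment.

```python
def flat_factor_centers(factors, factor_padding=0.0):
    """Estimate the synthetic x coordinate of each category's centre.

    Assumption: flat FactorRange, first factor centred at 0.5, every
    category 1.0 wide, plus an optional padding between factors.
    """
    return {
        factor: 0.5 + i * (1.0 + factor_padding)
        for i, factor in enumerate(factors)
    }

# Example with the three categories mentioned in the question.
centers = flat_factor_centers(["2016", "2017", "2018"])
```

The resulting numbers can then be used as the x argument of Label, adjusted by hand where the plot uses padding or nesting.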
My solution was as follows.
I created a copy of the dataframe used for making the chart. This dataframe (labeling_data) contains the y-axis coordinates, calculated so that each label is positioned at the middle of the corresponding stacked bar.
Then I added additional columns to be used as the actual labels, in which the values to be displayed are concatenated with the percentage symbol.
labeling_data = df.copy()
#Cumulative sum of columns
labeling_data = labeling_data.cumsum(axis=1)
#New names for columns
y_position = []
for item in labeling_data.columns:
    y_position.append(item+'_offset')
labeling_data.columns = y_position
#Copies original columns
for item in df:
    #Adding original columns
    labeling_data[item] = df[item]
    #Modifying offset columns to place label in the middle of the bar
    labeling_data[item+'_offset'] = labeling_data[item+'_offset']-labeling_data[item]/2
    #Concatenating values with percentage symbol if at least 10
    labeling_data[item+'_label'] = np.where(df[item] >=10 , '% '+df[item].astype(str), "")
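The offset arithmetic above can be checked on a single bar with plain Python. This is only an illustration of the formula (cumulative top of each segment minus half of its height); the segment values and the function name are made up.

```python
from itertools import accumulate

def label_midpoints(values):
    """Return the y coordinate of the middle of each stacked segment."""
    tops = list(accumulate(values))  # cumulative top edge of each segment
    return [top - value / 2 for top, value in zip(tops, values)]

# One bar made of three stacked percentages (illustrative values).
segments = [20.0, 30.0, 50.0]
midpoints = label_midpoints(segments)

# Same labelling rule as above: show the value only if it is at least 10.
labels = ['% ' + str(v) if v >= 10 else '' for v in segments]
```

For this bar the segment tops are 20, 50 and 100, so the labels sit at 10, 35 and 75.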
Finally, by looping through the renderers of the plot, a LabelSet was added to each stack group using labeling_data as the data source. By doing this, the index of the dataframe can be used to set the x coordinate of the label, and the corresponding columns supply the y coordinate and text parameters.
info = ColumnDataSource(labeling_data)
#loops through bar graphs
for r in renderers:
    stack = r.name
    #Creates one LabelSet per stack and uses the index, y_offset and label columns
    #as x, y and text parameters (a LabelSet already covers every row of the source,
    #so no loop over the dataframe rows is needed)
    labels = LabelSet(x='index', y=stack+'_offset', text=stack+'_label', level='overlay',
                      x_offset=-25, y_offset=-5, source=info)
    p.add_layout(labels)
Final result:
Nested categorical stacked bar chart with labels

Axis text orientation on openpyxl chart

I'm generating a ScatterChart with openpyxl from a pandas dataframe.
I am trying to change the rotation of the text on the X axis to 270°, but I cannot find documentation on how to do it.
This is the code to generate the chart.
import numpy as np
from openpyxl.chart import ScatterChart, Reference, Series
from openpyxl.chart.axis import DateAxis
import pandas as pd
def generate_chart_proyeccion(writer_sheet, col_to_graph, start_row, end_row, title):
    """
    Construct a new chart object
    :param writer_sheet: Worksheet where the data is located
    :param col_to_graph: Column of data to be plotted
    :param start_row: Row where data starts
    :param end_row: Row where data ends
    :param title: Chart title
    :return: returns a chart object
    """
    chart = ScatterChart()
    chart.title = title
    chart.x_axis.number_format = 'd-mmm HH:MM'
    chart.x_axis.majorTimeUnit = "days"
    chart.x_axis.title = "Date"
    chart.y_axis.title = "Value"
    chart.legend.position = "b"
    data = Reference(writer_sheet, min_col=col_to_graph, max_col=col_to_graph, min_row=start_row, max_row=end_row)
    data_dates = Reference(writer_sheet, min_col=1, max_col=1, min_row=start_row, max_row=end_row) # Column containing the dates
    serie = Series(data, data_dates, title_from_data=True)
    chart.series.append(serie)
    return chart
# Write data to excel
writer = pd.ExcelWriter("file.xlsx", engine='openpyxl')
df = pd.DataFrame(np.random.randn(10,1), columns=['Value'], index=pd.date_range('20130101',periods=10,freq='min'))
start_row = 1 # Row to start writing df in excel
df.to_excel(writer, sheet_name="Sheet1", startrow=start_row)
end_row = start_row + len(df) # End of the data
chart = generate_chart_proyeccion(writer.sheets["Sheet1"], 2, start_row, end_row, "A title")
# Add the chart to the Excel sheet
writer.sheets["Sheet1"].add_chart(chart, "C2")
# close() writes the file; ExcelWriter.save() is no longer available in recent pandas versions
writer.close()
This is the output chart that I got.
This is the output chart that I want.
Thanks!
Unfortunately, this is nowhere near as simple as it should be, because this is one of the areas of the specification where the schema changes from SpreadsheetDrawingML to DrawingML, which is far more abstract. The best thing to do is to create two sample files and compare them. In this case the difference is in the rot (rotation) attribute within the txPr (textProperties) of the axis. This is covered in § 21.1.2.1.1 of the OOXML specification.
The following code should work, but might require you to create a TextProperties object first:
chart.x_axis.textProperties.bodyProperties.rot = -5400000
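The value -5400000 looks arbitrary but follows from the unit: DrawingML angles are expressed in 60,000ths of a degree, so -5400000 is -90 degrees, which corresponds to text rotated by 270°. A small helper makes the conversion explicit; the function and constant names are illustrative and not part of openpyxl.

```python
ANGLE_UNITS_PER_DEGREE = 60000  # DrawingML angles are in 60,000ths of a degree

def degrees_to_drawingml(degrees):
    """Convert an angle in degrees to the integer unit used by the rot attribute."""
    return int(round(degrees * ANGLE_UNITS_PER_DEGREE))

rot_minus_90 = degrees_to_drawingml(-90)  # -5400000, the value used above
rot_45 = degrees_to_drawingml(45)         # 2700000
```

The result can be assigned to rot in place of the hard-coded number when a different angle is needed.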
I had the same question. This SO post by @oldhumble solved it for me; please see "Rotate the axis of an excel chart using openpyxl".
