Openpyxl charts - data series from random unconnected cells possible? - python

I'm currently experimenting with the python module openpyxl, trying to automate some tasks at work and generate spreadsheets automatically. For one of the required sheets I need to generate a scatter chart from tabulated data. However, the scatter chart should consist from multiple lines connecting two points each, so each of the individual x/y series in the scatter chart should connect two points only.
Generally I found from the openpyxl documentation that scatter charts are generated like in this small example:
from openpyxl import Workbook
from openpyxl.chart import (
ScatterChart,
Reference,
Series,
)
wb = Workbook()
ws = wb.active
rows = [
['Size', 'Batch 1', 'Batch 2'],
[2, 40, 30],
[3, 40, 25],
[4, 50, 30],
[5, 30, 25],
[6, 25, 35],
[7, 20, 40],
]
for row in rows:
ws.append(row)
chart = ScatterChart()
chart.title = "Scatter Chart"
chart.style = 13
chart.x_axis.title = 'Size'
chart.y_axis.title = 'Percentage'
xvalues = Reference(ws, min_col=1, min_row=2, max_row=7)
for i in range(2, 4):
values = Reference(ws, min_col=i, min_row=1, max_row=7)
series = Series(values, xvalues, title_from_data=True)
chart.series.append(series)
ws.add_chart(chart, "A10")
wb.save("scatter.xlsx")
However, the x (and y) coordinates of the two points I would like to connect in the scatter points are not located in adjacent cells.
So when I import the data series manually in excel by holding 'ctrl' and select two cells I get something like this:
'Sheet!$A$4;Sheet!$A$6'
instead of
'Sheet!$A$4:$A$6'
when dragging the cursor to select a range of cells.
For only two individual not-adjacent cells this means that I do not have a clear min_row/min_col/max_row etc.. but only a list of cell pairs (for both x and y). Is there a way create a data series in openpyxl as a list of cells instead of a connected/adjacent range?
Help would be much appreciated! :)

There are currently no plans to support non-contiguous cell ranges in chart series. I would suggest you try and arrange your data or create references to it that will allow you to work with contiguous ranges.

Are you sure this doesn't work?
I've modified the example for a situation like yours and it seems to work for me:
from openpyxl import Workbook
from openpyxl.chart import (
ScatterChart,
Reference,
Series,
)
wb = Workbook()
ws = wb.active
rows = [
['Size'],
[2, 'Batch 1', 'Batch 2'],
[3, 40, 30],
[4, 40, 25],
[5, 50, 30],
[6, 30, 25],
[7, 25, 35],
[None, 20, 40],
]
for row in rows:
ws.append(row)
chart = ScatterChart()
chart.title = "Scatter Chart"
chart.style = 13
chart.x_axis.title = 'Size'
chart.y_axis.title = 'Percentage'
xvalues = Reference(ws, min_col=1, min_row=2, max_row=7)
for i in range(2, 4):
values = Reference(ws, min_col=i, min_row=2, max_row=8)
series = Series(values, xvalues, title_from_data=True)
chart.series.append(series)
ws.add_chart(chart, "A10")
wb.save("scatter.xlsx")
Result:

Related

How to display the y_axis name of a xlsxwriter column chart horizontally, at the top of the axis?

I am using XlsxWriter to do create column charts with the y axis name showing horizontally at the top of the axis. The default orientation for the y axis name seems to be vertical, reading from bottom to top.
So I'm using the name font rotation to try to position it horizontally, reading from left to right. It all works fine for any rotation angle from -90 to 90, except for 0 (zero), which should be the horizontal orientation. It seems that when rotation is set to 0 (zero) it is simply disregarded and the name is shown vertically.
The workaround that achieves a visuaaly acceptable result is setting rotation to 1, or -1. This works for short names, but for longer names it starts to be noticeable that the text is not at 0 (zero) degrees - horizontal.
I am using Python version Python 3.8.3, xlsxwriter version 1.2.9, and Excel 365.
Here is some code that demonstrates the problem:
workbook = xlsxwriter.Workbook('test.xlsx')
worksheet = workbook.add_worksheet()
chart = workbook.add_chart({'type': 'column'})
data = [
[1, 2, 3, 4, 5],
[2, 4, 6, 8, 10],
[3, 6, 9, 12, 15],
]
worksheet.write_column('A1', data[0])
worksheet.write_column('B1', data[1])
worksheet.write_column('C1', data[2])
chart.add_series({'values': '=Sheet1!$A$1:$A$5'})
chart.set_title({'name': 'Some Title',
'name_font': {'rotation': 0}})
chart.set_y_axis({'visible': False,
'name': 'Y Axis',
'name_font': {'size': 14, 'bold': False, 'italic': True, 'rotation': 0},
'name_layout': {'x': 0.025, 'y': 0.075}})
worksheet.insert_chart('E1', chart)
workbook.close()
Any ideas on how to show the y axis name horizontally?
That unfortunately is due to a small bug. I've fixed it in XlsxWriter release 1.4.0.
With the fix in place the output from your (nicely detailed) program is as expected:

Why my Seaborn line plot x-axis shifts one unit?

I am trying to compare two simple and summarized pandas dataframe with line plot from Seaborn library but one of the lines shifts one unit in X axis. What's wrong with it?
The dataframes are:
Here is my code:
df = pd.read_csv('/home/gazelle/Documents/m3inference/m3_result.csv',index_col='id')
df = df.drop("Unnamed: 0",axis=1)
for i, v in df.iterrows():
if str(i) not in result:
df.drop(i, inplace=True)
else:
df.loc[i, 'estimated'] = result[str(i)]
m3 = pd.read_csv('plot_result.csv').set_index('id')
ids = list(m3.index.values)
m3 = m3['age'].value_counts().to_frame().reset_index().sort_values('index')
m3 = m3.rename(columns={m3.columns[0]:'bucket', m3.columns[1]:'age'})
df_estimated = df[df.index.isin(ids)]['estimated'].value_counts().to_frame().reset_index().sort_values('index')
df_estimated = df_estimated.rename(columns={df_estimated.columns[0]:'bucket', df_estimated.columns[1]:'age'})
sns.lineplot(x='bucket', y='age', data=m3)
sns.lineplot(x='bucket', y='age', data=df_estimated)
And the result is:
As has been pointed out in the comments, the data and code you provide appear to produce the correct result:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
sns.set()
m3 = pd.DataFrame({"index": [2, 3, 4, 1], "age": [123, 116, 66, 33]})
df_estimated = pd.DataFrame({"index": [3, 2, 4, 1], "estimated": [200, 100, 37, 1]})
sns.lineplot(x="index", y="age", data=m3)
sns.lineplot(x="index", y="estimated", data=df_estimated)
plt.show()
This gives a plot which is different from the one you posted above:
From your screenshots it looks like you are working in a Jupyter notebook. You are probably suffering from the issue that at the time you plot, the dataframe m3 no longer has the values you printed above, but has been modified.

Chart with secondary y-axis and x-axis as dates

I'm trying to create a chart in openpyxl with a secondary y-axis and an DateAxis for the x-values.
For this MWE, I've adapted the secondary axis example with the DateAxis example.
from datetime import datetime
from openpyxl import Workbook, chart
# set to True to fail/create an invalid document
# set to False to generate a valid, but ugly/useless chart
DATES_ON_2ND = True
wb = Workbook()
ws = wb.active
xvals = ['date', *[datetime(2018, 11, d, d+12) for d in range(1, 7)]]
avals = ['aliens', 6, 3, 4, 3, 6, 7]
hvals = ['humans', 10, 40, 50, 20, 10, 50]
for row in zip(xvals, avals, hvals):
ws.append(row)
dates = chart.Reference(ws, min_row=2, max_row=7, min_col=1, max_col=1)
aliens = chart.Reference(ws, min_row=1, max_row=7, min_col=2, max_col=2)
humans = chart.Reference(ws, min_row=1, max_row=7, min_col=3, max_col=3)
c1 = chart.LineChart()
c1.x_axis = chart.axis.DateAxis(crossAx=100)
c1.x_axis.title = "Date"
c1.x_axis.crosses = "min"
c1.x_axis.majorTickMark = "out"
c1.x_axis.number_format = "yyyy-mmm-dd"
c1.add_data(aliens, titles_from_data=True)
c1.set_categories(dates)
c1.y_axis.title = 'Aliens'
# Create a second chart
c2 = chart.LineChart()
if DATES_ON_2ND:
c2.x_axis = chart.axis.DateAxis(crossAx=100)
c2.x_axis.number_format = "yyyy-mmm-dd"
c2.x_axis.crosses = "min"
c2.add_data(humans, titles_from_data=True)
c2.set_categories(dates)
# c2.y_axis.axId = 200
c2.y_axis.title = "Humans"
# Display y-axis of the second chart on the right
# by setting it to cross the x-axis at its maximum
c1.y_axis.crosses = "max"
c1 += c2
ws.add_chart(c1, "E4")
wb.save("secondary.xlsx")
When I leave the secondary x axis as a categorical axis, a valid Excel document is created, even if the chart isn't what I want. But setting the secondary axis as a DateAxis the same way as the primary axis generates an invalid corrupted file that fails to show any chart.
Is there a trick to this that I'm missing?
So, as noted in my comments, there isn't really much benefit in DateAxes but if you use them then they have a default id of 500. This is important because it is the value that the y-axes need to cross. CrossAx for the category/date axis doesn't seen to matter. The following works for me:
from datetime import datetime
from openpyxl import Workbook, chart
wb = Workbook()
ws = wb.active
xvals = ['date', *[datetime(2018, 11, d, d+12) for d in range(1, 7)]]
avals = ['aliens', 6, 3, 4, 3, 6, 7]
hvals = ['humans', 10, 40, 50, 20, 10, 50]
for row in zip(xvals, avals, hvals):
ws.append(row)
dates = chart.Reference(ws, min_row=2, max_row=7, min_col=1, max_col=1)
aliens = chart.Reference(ws, min_row=1, max_row=7, min_col=2, max_col=2)
humans = chart.Reference(ws, min_row=1, max_row=7, min_col=3, max_col=3)
c1 = chart.LineChart()
c1.x_axis = chart.axis.DateAxis() # axId defaults to 500
c1.x_axis.title = "Date"
c1.x_axis.crosses = "min"
c1.x_axis.majorTickMark = "out"
c1.x_axis.number_format = "yyyy-mmm-dd"
c1.add_data(aliens, titles_from_data=True)
c1.set_categories(dates)
c1.y_axis.title = 'Aliens'
c1.y_axis.crossAx = 500
c1.y_axis.majorGridlines = None
# Create a second chart
c2 = chart.LineChart()
c2.x_axis.axId = 500 # same as c1
c2.x_axis.crosses = "min"
c2.add_data(humans, titles_from_data=True)
c2.set_categories(dates)
c2.y_axis.axId = 20
c2.y_axis.title = "Humans"
c2.y_axis.crossAx = 500
# Display y-axis of the second chart on the right
# by setting it to cross the x-axis at its maximum
c1.y_axis.crosses = "max"
c1 += c2
ws.add_chart(c1, "E4")
wb.save("secondary.xlsx")

Concatenate Bokeh Stacked Bar Plots to visualise changes

I have two dataframes
df1 = pd.DataFrame([['1','1','1','2','2','2','3','3','3'],['1.2','3.5','44','77','3.4','24','11','12','13'], ['30312', '20021', '23423', '23424', '45646', '34535', '35345', '34535', '76786']]).T
df.columns = [['QID','score', 'DocID']]
df2 = pd.DataFrame([['1','1','1','2','2','2','3','3','3'],['21.2','13.5','12','77.6','3.9','29','17','41','32'], ['30312', '20021', '23423', '23424', '45646', '34535', '35345', '34535', '76786']]).T
df.columns = [['QID','score', 'DocID']]
Currently, I'm plotting scores using bokeh in df1 and df2 in two different graphs as
df1_BarDocID = Bar(df1, 'QID', values='score', stack = 'DocID', title="D1: QID Stacked by DocID on Score")
D2_BarDocID = Bar(df2, 'QID', values='score', stack = 'DocID', title="D1: QID Stacked by DocID on Score")
grid = gridplot([[D1_BarDocID, D2_BarDocID]])
show(grid)
But, I want to plot two Dataframes in a single figure in a way that the outputs of Df1 and Df2 are plotted side by side for a single QID. So I can visualise the difference in score between two DataFrames, using bokeh.
df1 & df2 plots, using bokeh
Here is a complete example using the newer vbar_stack and stable bokeh.plotting API. It could probably be made simpler but my Pandas knowledge is limited:
import pandas as pd
from bokeh.core.properties import value
from bokeh.io import output_file
from bokeh.models import FactorRange
from bokeh.palettes import Spectral8
from bokeh.plotting import figure, show
df1 = pd.DataFrame([['1','1','1','2','2','2','3','3','3'],[1.2, 3.5, 44, 77, 3.4, 24, 11, 12, 13], ['30312', '20021', '23423', '23424', '45646', '34535', '35345', '34535', '76786']]).T
df1.columns = ['QID','score', 'DocID']
df1 = df1.pivot(index='QID', columns='DocID', values='score').fillna(0)
df1.index = [(x, 'df1') for x in df1.index]
df2 = pd.DataFrame([['1','1','1','2','2','2','3','3','3'],[21.2, 13.5, 12, 77.6, 3.9, 29, 17, 41, 32], ['30312', '20021', '23423', '23424', '45646', '34535', '35345', '34535', '76786']]).T
df2.columns = ['QID','score', 'DocID']
df2 = df2.pivot(index='QID', columns='DocID', values='score').fillna(0)
df2.index = [(x,'df2') for x in df2.index]
df = pd.concat([df1, df2])
p = figure(plot_width=800, x_range=FactorRange(*df.index))
p.vbar_stack(df.columns, x='index', width=0.8, fill_color=Spectral8,
line_color=None, source=df, legend=[value(x) for x in df.columns])
p.legend.location = "top_left"
output_file('foo.html')
show(p)
Produces:

openpyxl trendline and R-squared value

I'm trying to add "linear" trend-line to my excel chart and display R-squared value using openpyxl, but i cannot find any example.
Below is python code that generates chart shown on image without trend-line and R-squared formula chart image.
Thanks!
from openpyxl import Workbook, load_workbook
from openpyxl.chart import (
ScatterChart,
Reference,
Series,
)
from openpyxl.chart.trendline import Trendline
wb = load_workbook(r"path to load blank workbook\data.xlsx")
ws = wb.active
rows = [
['Size', 'Batch 1'],
[3, 40],
[4, 50],
[2, 40],
[5, 30],
[6, 25],
[7, 20],
]
for row in rows:
ws.append(row)
chart = ScatterChart()
chart.title = "Scatter Chart"
#chart.style = 13
chart.x_axis.title = 'Size'
chart.y_axis.title = 'Percentage'
xvalues = Reference(ws, min_col=1, min_row=2, max_row=8)
for i in range(2, 4):
values = Reference(ws, min_col=i, min_row=2, max_row=8)
series = Series(values, xvalues, title_from_data=True)
chart.series.append(series)
line = chart.series[0]
line.graphicalProperties.line.noFill = True
line.marker.symbol = "circle"
ws.add_chart(chart, "A10")
wb.save("path to save workbook\scatter.xlsx")
It's basically impossible to document all the possibilities for charts so you will occasionally have to dive into the XML of a relevant chart to find out how it's done. That said, trendlines are pretty easy to do.
from openpyxl.chart.trendline import Trendline
line.trendline = Trendline()

Categories

Resources