openpyxl trendline and R-squared value


I'm trying to add a linear trendline to my Excel chart and display its R-squared value using openpyxl, but I cannot find any example.
Below is the Python code that generates the chart shown in the image, without the trendline and R-squared value.
Thanks!
from openpyxl import Workbook, load_workbook
from openpyxl.chart import (
    ScatterChart,
    Reference,
    Series,
)
from openpyxl.chart.trendline import Trendline
wb = load_workbook(r"path to load blank workbook\data.xlsx")
ws = wb.active
rows = [
    ['Size', 'Batch 1'],
    [3, 40],
    [4, 50],
    [2, 40],
    [5, 30],
    [6, 25],
    [7, 20],
]
for row in rows:
    ws.append(row)
chart = ScatterChart()
chart.title = "Scatter Chart"
#chart.style = 13
chart.x_axis.title = 'Size'
chart.y_axis.title = 'Percentage'
xvalues = Reference(ws, min_col=1, min_row=2, max_row=8)
for i in range(2, 4):
    values = Reference(ws, min_col=i, min_row=2, max_row=8)
    series = Series(values, xvalues, title_from_data=True)
    chart.series.append(series)
line = chart.series[0]
line.graphicalProperties.line.noFill = True
line.marker.symbol = "circle"
ws.add_chart(chart, "A10")
# Raw string so the backslash in the Windows path is not read as an escape.
wb.save(r"path to save workbook\scatter.xlsx")

It's basically impossible to document all the possibilities for charts so you will occasionally have to dive into the XML of a relevant chart to find out how it's done. That said, trendlines are pretty easy to do.
from openpyxl.chart.trendline import Trendline
line.trendline = Trendline()
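To cover the R-squared part of the question as well, here is a minimal sketch. It assumes the Trendline keyword arguments trendlineType, dispRSqr and dispEq, which correspond to the trendline type and the "Display R-squared value" / "Display Equation" options in Excel. openpyxl only writes these flags; Excel calculates and draws the value when the file is opened. The sample data and the output filename are placeholders.

```python
from openpyxl import Workbook
from openpyxl.chart import Reference, ScatterChart, Series
from openpyxl.chart.trendline import Trendline

wb = Workbook()
ws = wb.active
for row in [["Size", "Batch 1"], [3, 40], [4, 50], [2, 40], [5, 30], [6, 25], [7, 20]]:
    ws.append(row)

chart = ScatterChart()
chart.title = "Scatter Chart"
xvalues = Reference(ws, min_col=1, min_row=2, max_row=7)
# Start at row 1 so title_from_data can take the header cell as the series name.
values = Reference(ws, min_col=2, min_row=1, max_row=7)
series = Series(values, xvalues, title_from_data=True)
chart.series.append(series)

# Markers only, as in the question, so the trendline is the only line drawn.
series.marker.symbol = "circle"
series.graphicalProperties.line.noFill = True

# Linear trendline with the R-squared label switched on and the equation off.
series.trendline = Trendline(trendlineType="linear", dispRSqr=True, dispEq=False)

ws.add_chart(chart, "A10")
wb.save("scatter_trendline.xlsx")  # placeholder output name
```

Setting dispEq=True instead should additionally show the fitted equation next to the R-squared value.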

Related

How to normalize coloring of data with seaborn in pandas?

I get the coloring shown in picture 1, because one value is 0 and the rest are much bigger (all values are between 0 and 100). I would like to get the coloring shown in picture 2. How can I solve this?
This is a minimal reproducible example.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import colors
index = pd.MultiIndex.from_product([[2019, 2020], [1, 2]],
                                   names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Group1', 'Group2', 'Group3'], ['value1', 'value2']],
                                     names=['subject', 'type'])
data = np.round(np.random.randn(4, 6), 1)
data[:, ::2] *= 20
data += 50
rdata = pd.DataFrame(data, index=index, columns=columns)
cc = sns.light_palette("red", as_cmap=True)
cc.set_bad('white')
def my_gradient(s, cmap):
    return [f'background-color: {colors.rgb2hex(x)}'
            for x in cmap(s.replace(np.inf, np.nan))]
styler = rdata.style
red = styler.apply(
    my_gradient,
    cmap=cc,
    subset=rdata.columns.get_loc_level('value1', level=1)[0],
    axis=0)
styler
Picture 1
Picture 2

You need to normalize. Usually, in matplotlib, a norm is used, of which plt.Normalize() is the most standard one.
The updated code could look like:
my_norm = plt.Normalize(0, 100)
def my_gradient(s, cmap):
    return [f'background-color: {colors.rgb2hex(x)}'
            for x in cmap(my_norm(s.replace(np.inf, np.nan)))]
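To see what the norm does on its own, here is a small sketch with made-up sample values. A matplotlib colormap called with floats expects them in the range 0 to 1, so raw values between 0 and 100 nearly all land on the darkest color. The norm rescales them first. matplotlib.colors.Normalize is the same class that plt.Normalize refers to.

```python
import numpy as np
from matplotlib.colors import Normalize  # same class as plt.Normalize

# Fixed range: 0 maps to 0.0 and 100 maps to 1.0, whatever the data contains.
norm = Normalize(vmin=0, vmax=100)

sample = np.array([0.0, 25.0, 50.0, 100.0])  # made-up values for illustration
scaled = norm(sample)
print(scaled)
```

Because vmin and vmax are fixed, the same value always gets the same color in every column, which is what separates this from scaling each column by its own minimum and maximum.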

You can normalize your data with the equation (x - min) / (max - min), applied per column. To apply this to your DataFrame you could use something like the following:
# Column-wise min-max scaling; each column ends up in the range [0, 1].
result = (df - df.min()) / (df.max() - df.min())

Reverse order bar chart without reversing axis values

I am trying to create a bar chart in Excel with the categories in reverse order.
The following does the trick:
chart.set_y_axis({'reverse': True})
But then the x-axis values are at the top of the chart, while I want to keep them at the bottom.
I tried reversing the order and then moving the x-axis labels via:
chart.set_x_axis({'label_position': 'low'})
This does not seem to do anything. That is strange, because if I don't reverse the y-axis and just set the x-axis label position to 'high', it does move the x-axis values to the top. So I don't know why it won't also move them to the bottom.
How can I reverse the order of categories in an Excel bar chart while keeping the x-axis values at the bottom?
Example code:
import pandas as pd
# Some sample data to plot.
list_data = [10, 20, 30, 20, 15, 30, 45]
# Create a Pandas dataframe from the data.
df = pd.DataFrame(list_data)
# Create a Pandas Excel writer using XlsxWriter as the engine.
excel_file = 'column.xlsx'
sheet_name = 'Sheet1'
writer = pd.ExcelWriter(excel_file, engine='xlsxwriter')
df.to_excel(writer, sheet_name=sheet_name)
workbook = writer.book
worksheet = writer.sheets[sheet_name]
# Create a chart object.
chart = workbook.add_chart({'type': 'bar'})
# Reverse Order
chart.set_y_axis({'reverse': True,
                  'label_position': 'low'})
# Try and fail to get x-axis values on bottom instead of top
chart.set_x_axis({'label_position': 'low'})
# Configure the series of the chart from the dataframe data.
chart.add_series({
    'values': '=Sheet1!$B$2:$B$8',
})
# Insert the chart into the worksheet.
worksheet.insert_chart('D2', chart)
# Close the Pandas Excel writer and output the Excel file.
# (writer.save() was removed in pandas 2.0; close() also writes the file.)
writer.close()
P.s. Not sure if such comments are allowed here, but xlsxwriter is the best thing since sliced bread!

To get the type of output that you want in Excel (and XlsxWriter) you need to set the crossing point for the axis using the "Axis crosses at maximum category" option.
With XlsxWriter you can do it using the axis crossing parameter. Like this:
import pandas as pd
# Some sample data to plot.
list_data = [10, 20, 30, 20, 15, 30, 45]
# Create a Pandas dataframe from the data.
df = pd.DataFrame(list_data)
# Create a Pandas Excel writer using XlsxWriter as the engine.
excel_file = 'column.xlsx'
sheet_name = 'Sheet1'
writer = pd.ExcelWriter(excel_file, engine='xlsxwriter')
df.to_excel(writer, sheet_name=sheet_name)
workbook = writer.book
worksheet = writer.sheets[sheet_name]
# Create a chart object.
chart = workbook.add_chart({'type': 'bar'})
# Reverse Order
chart.set_y_axis({'reverse': True,
                  'crossing': 'max'})
# Configure the series of the chart from the dataframe data.
chart.add_series({'values': '=Sheet1!$B$2:$B$8'})
# Insert the chart into the worksheet.
worksheet.insert_chart('D2', chart)
# Close the Pandas Excel writer and output the Excel file.
# (writer.save() was removed in pandas 2.0; close() also writes the file.)
writer.close()

Chart with secondary y-axis and x-axis as dates

I'm trying to create a chart in openpyxl with a secondary y-axis and a DateAxis for the x-values.
For this MWE, I've combined the secondary axis example with the DateAxis example.
from datetime import datetime
from openpyxl import Workbook, chart
# set to True to fail/create an invalid document
# set to False to generate a valid, but ugly/useless chart
DATES_ON_2ND = True
wb = Workbook()
ws = wb.active
xvals = ['date', *[datetime(2018, 11, d, d+12) for d in range(1, 7)]]
avals = ['aliens', 6, 3, 4, 3, 6, 7]
hvals = ['humans', 10, 40, 50, 20, 10, 50]
for row in zip(xvals, avals, hvals):
    ws.append(row)
dates = chart.Reference(ws, min_row=2, max_row=7, min_col=1, max_col=1)
aliens = chart.Reference(ws, min_row=1, max_row=7, min_col=2, max_col=2)
humans = chart.Reference(ws, min_row=1, max_row=7, min_col=3, max_col=3)
c1 = chart.LineChart()
c1.x_axis = chart.axis.DateAxis(crossAx=100)
c1.x_axis.title = "Date"
c1.x_axis.crosses = "min"
c1.x_axis.majorTickMark = "out"
c1.x_axis.number_format = "yyyy-mmm-dd"
c1.add_data(aliens, titles_from_data=True)
c1.set_categories(dates)
c1.y_axis.title = 'Aliens'
# Create a second chart
c2 = chart.LineChart()
if DATES_ON_2ND:
    c2.x_axis = chart.axis.DateAxis(crossAx=100)
    c2.x_axis.number_format = "yyyy-mmm-dd"
    c2.x_axis.crosses = "min"
c2.add_data(humans, titles_from_data=True)
c2.set_categories(dates)
# c2.y_axis.axId = 200
c2.y_axis.title = "Humans"
# Display y-axis of the second chart on the right
# by setting it to cross the x-axis at its maximum
c1.y_axis.crosses = "max"
c1 += c2
ws.add_chart(c1, "E4")
wb.save("secondary.xlsx")
When I leave the secondary x axis as a categorical axis, a valid Excel document is created, even if the chart isn't what I want. But setting the secondary axis as a DateAxis the same way as the primary axis generates an invalid corrupted file that fails to show any chart.
Is there a trick to this that I'm missing?

So, as noted in my comments, there isn't really much benefit in DateAxes, but if you use them, they have a default id of 500. This is important because it is the value that the y-axes need to cross. crossAx for the category/date axis doesn't seem to matter. The following works for me:
from datetime import datetime
from openpyxl import Workbook, chart
wb = Workbook()
ws = wb.active
xvals = ['date', *[datetime(2018, 11, d, d+12) for d in range(1, 7)]]
avals = ['aliens', 6, 3, 4, 3, 6, 7]
hvals = ['humans', 10, 40, 50, 20, 10, 50]
for row in zip(xvals, avals, hvals):
    ws.append(row)
dates = chart.Reference(ws, min_row=2, max_row=7, min_col=1, max_col=1)
aliens = chart.Reference(ws, min_row=1, max_row=7, min_col=2, max_col=2)
humans = chart.Reference(ws, min_row=1, max_row=7, min_col=3, max_col=3)
c1 = chart.LineChart()
c1.x_axis = chart.axis.DateAxis() # axId defaults to 500
c1.x_axis.title = "Date"
c1.x_axis.crosses = "min"
c1.x_axis.majorTickMark = "out"
c1.x_axis.number_format = "yyyy-mmm-dd"
c1.add_data(aliens, titles_from_data=True)
c1.set_categories(dates)
c1.y_axis.title = 'Aliens'
c1.y_axis.crossAx = 500
c1.y_axis.majorGridlines = None
# Create a second chart
c2 = chart.LineChart()
c2.x_axis.axId = 500 # same as c1
c2.x_axis.crosses = "min"
c2.add_data(humans, titles_from_data=True)
c2.set_categories(dates)
c2.y_axis.axId = 20
c2.y_axis.title = "Humans"
c2.y_axis.crossAx = 500
# Display y-axis of the second chart on the right
# by setting it to cross the x-axis at its maximum
c1.y_axis.crosses = "max"
c1 += c2
ws.add_chart(c1, "E4")
wb.save("secondary.xlsx")

Concatenate Bokeh Stacked Bar Plots to visualise changes

I have two DataFrames:
df1 = pd.DataFrame([['1','1','1','2','2','2','3','3','3'],['1.2','3.5','44','77','3.4','24','11','12','13'], ['30312', '20021', '23423', '23424', '45646', '34535', '35345', '34535', '76786']]).T
df1.columns = ['QID','score', 'DocID']
df2 = pd.DataFrame([['1','1','1','2','2','2','3','3','3'],['21.2','13.5','12','77.6','3.9','29','17','41','32'], ['30312', '20021', '23423', '23424', '45646', '34535', '35345', '34535', '76786']]).T
df2.columns = ['QID','score', 'DocID']
Currently, I'm plotting the scores in df1 and df2 with bokeh in two separate graphs:
D1_BarDocID = Bar(df1, 'QID', values='score', stack = 'DocID', title="D1: QID Stacked by DocID on Score")
D2_BarDocID = Bar(df2, 'QID', values='score', stack = 'DocID', title="D2: QID Stacked by DocID on Score")
grid = gridplot([[D1_BarDocID, D2_BarDocID]])
show(grid)
But I want to plot both DataFrames in a single figure, so that the bars for df1 and df2 appear side by side for each QID. That way I can visualise the difference in score between the two DataFrames using bokeh.
df1 & df2 plots, using bokeh

Here is a complete example using the newer vbar_stack and stable bokeh.plotting API. It could probably be made simpler but my Pandas knowledge is limited:
import pandas as pd
from bokeh.core.properties import value
from bokeh.io import output_file
from bokeh.models import FactorRange
from bokeh.palettes import Spectral8
from bokeh.plotting import figure, show
df1 = pd.DataFrame([['1','1','1','2','2','2','3','3','3'],[1.2, 3.5, 44, 77, 3.4, 24, 11, 12, 13], ['30312', '20021', '23423', '23424', '45646', '34535', '35345', '34535', '76786']]).T
df1.columns = ['QID','score', 'DocID']
df1 = df1.pivot(index='QID', columns='DocID', values='score').fillna(0)
df1.index = [(x, 'df1') for x in df1.index]
df2 = pd.DataFrame([['1','1','1','2','2','2','3','3','3'],[21.2, 13.5, 12, 77.6, 3.9, 29, 17, 41, 32], ['30312', '20021', '23423', '23424', '45646', '34535', '35345', '34535', '76786']]).T
df2.columns = ['QID','score', 'DocID']
df2 = df2.pivot(index='QID', columns='DocID', values='score').fillna(0)
df2.index = [(x,'df2') for x in df2.index]
df = pd.concat([df1, df2])
p = figure(plot_width=800, x_range=FactorRange(*df.index))
p.vbar_stack(df.columns, x='index', width=0.8, fill_color=Spectral8,
             line_color=None, source=df, legend=[value(x) for x in df.columns])
p.legend.location = "top_left"
output_file('foo.html')
show(p)

Openpyxl charts - data series from random unconnected cells possible?

I'm currently experimenting with the Python module openpyxl, trying to automate some tasks at work and generate spreadsheets automatically. For one of the required sheets I need to generate a scatter chart from tabulated data. However, the scatter chart should consist of multiple lines, each connecting two points, so each individual x/y series in the scatter chart should connect two points only.
Generally I found from the openpyxl documentation that scatter charts are generated like in this small example:
from openpyxl import Workbook
from openpyxl.chart import (
    ScatterChart,
    Reference,
    Series,
)
wb = Workbook()
ws = wb.active
rows = [
    ['Size', 'Batch 1', 'Batch 2'],
    [2, 40, 30],
    [3, 40, 25],
    [4, 50, 30],
    [5, 30, 25],
    [6, 25, 35],
    [7, 20, 40],
]
for row in rows:
    ws.append(row)
chart = ScatterChart()
chart.title = "Scatter Chart"
chart.style = 13
chart.x_axis.title = 'Size'
chart.y_axis.title = 'Percentage'
xvalues = Reference(ws, min_col=1, min_row=2, max_row=7)
for i in range(2, 4):
    values = Reference(ws, min_col=i, min_row=1, max_row=7)
    series = Series(values, xvalues, title_from_data=True)
    chart.series.append(series)
ws.add_chart(chart, "A10")
wb.save("scatter.xlsx")
However, the x (and y) coordinates of the two points I would like to connect in the scatter chart are not located in adjacent cells.
So when I add the data series manually in Excel by holding Ctrl and selecting two cells, I get something like this:
'Sheet!$A$4;Sheet!$A$6'
instead of
'Sheet!$A$4:$A$6'
which is what I get when dragging the cursor to select a range of cells.
For two individual non-adjacent cells, this means that I do not have a clear min_row/min_col/max_row etc., but only a list of cell pairs (for both x and y). Is there a way to create a data series in openpyxl from a list of cells instead of a connected/adjacent range?
Help would be much appreciated! :)

There are currently no plans to support non-contiguous cell ranges in chart series. I would suggest you try and arrange your data or create references to it that will allow you to work with contiguous ranges.
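One way to read the "create references to it" suggestion is sketched below. It is only an illustration under stated assumptions: the pairs list, the helper columns E and F, and the output filename are all hypothetical. Each pair of non-adjacent points is mirrored into two adjacent helper rows with simple cell formulas, and every series then points at a contiguous helper range.

```python
from openpyxl import Workbook
from openpyxl.chart import Reference, ScatterChart, Series

wb = Workbook()
ws = wb.active
rows = [["Size", "Batch 1"], [2, 40], [3, 40], [4, 50], [5, 30], [6, 25], [7, 20]]
for row in rows:
    ws.append(row)

# Hypothetical pairs of worksheet rows whose points should be joined by a line.
pairs = [(3, 5), (2, 7)]

chart = ScatterChart()
helper_col = 5  # column E for x, column F for y (an arbitrary free area)
for n, (r1, r2) in enumerate(pairs):
    top = 1 + 2 * n  # each pair occupies two consecutive helper rows
    for offset, r in enumerate((r1, r2)):
        # Formulas keep the helper cells linked to the original data cells.
        ws.cell(row=top + offset, column=helper_col, value=f"=A{r}")
        ws.cell(row=top + offset, column=helper_col + 1, value=f"=B{r}")
    x = Reference(ws, min_col=helper_col, min_row=top, max_row=top + 1)
    y = Reference(ws, min_col=helper_col + 1, min_row=top, max_row=top + 1)
    chart.series.append(Series(y, x, title=f"Pair {n + 1}"))

ws.add_chart(chart, "H2")
wb.save("scatter_pairs.xlsx")  # placeholder output name
```

openpyxl does not evaluate formulas, so the helper cells only get their values once the file is opened in Excel. Writing the source cell values into the helper cells instead of formulas avoids that dependency, at the cost of losing the live link to the original data.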

Are you sure this doesn't work?
I've modified the example for a situation like yours and it seems to work for me:
from openpyxl import Workbook
from openpyxl.chart import (
    ScatterChart,
    Reference,
    Series,
)
wb = Workbook()
ws = wb.active
rows = [
    ['Size'],
    [2, 'Batch 1', 'Batch 2'],
    [3, 40, 30],
    [4, 40, 25],
    [5, 50, 30],
    [6, 30, 25],
    [7, 25, 35],
    [None, 20, 40],
]
for row in rows:
    ws.append(row)
chart = ScatterChart()
chart.title = "Scatter Chart"
chart.style = 13
chart.x_axis.title = 'Size'
chart.y_axis.title = 'Percentage'
xvalues = Reference(ws, min_col=1, min_row=2, max_row=7)
for i in range(2, 4):
    values = Reference(ws, min_col=i, min_row=2, max_row=8)
    series = Series(values, xvalues, title_from_data=True)
    chart.series.append(series)
ws.add_chart(chart, "A10")
wb.save("scatter.xlsx")
