I'm trying to create a chart in openpyxl with a secondary y-axis and a DateAxis for the x-values.
For this MWE, I've combined the secondary axis example with the DateAxis example.
from datetime import datetime
from openpyxl import Workbook, chart
# set to True to fail/create an invalid document
# set to False to generate a valid, but ugly/useless chart
DATES_ON_2ND = True
wb = Workbook()
ws = wb.active
xvals = ['date', *[datetime(2018, 11, d, d+12) for d in range(1, 7)]]
avals = ['aliens', 6, 3, 4, 3, 6, 7]
hvals = ['humans', 10, 40, 50, 20, 10, 50]
for row in zip(xvals, avals, hvals):
    ws.append(row)
dates = chart.Reference(ws, min_row=2, max_row=7, min_col=1, max_col=1)
aliens = chart.Reference(ws, min_row=1, max_row=7, min_col=2, max_col=2)
humans = chart.Reference(ws, min_row=1, max_row=7, min_col=3, max_col=3)
c1 = chart.LineChart()
c1.x_axis = chart.axis.DateAxis(crossAx=100)
c1.x_axis.title = "Date"
c1.x_axis.crosses = "min"
c1.x_axis.majorTickMark = "out"
c1.x_axis.number_format = "yyyy-mmm-dd"
c1.add_data(aliens, titles_from_data=True)
c1.set_categories(dates)
c1.y_axis.title = 'Aliens'
# Create a second chart
c2 = chart.LineChart()
if DATES_ON_2ND:
    c2.x_axis = chart.axis.DateAxis(crossAx=100)
    c2.x_axis.number_format = "yyyy-mmm-dd"
    c2.x_axis.crosses = "min"
c2.add_data(humans, titles_from_data=True)
c2.set_categories(dates)
# c2.y_axis.axId = 200
c2.y_axis.title = "Humans"
# Display y-axis of the second chart on the right
# by setting it to cross the x-axis at its maximum
c1.y_axis.crosses = "max"
c1 += c2
ws.add_chart(c1, "E4")
wb.save("secondary.xlsx")
When I leave the secondary x axis as a categorical axis, a valid Excel document is created, even if the chart isn't what I want. But setting the secondary axis as a DateAxis the same way as the primary axis generates an invalid corrupted file that fails to show any chart.
Is there a trick to this that I'm missing?
So, as noted in my comments, there isn't really much benefit in using a DateAxis, but if you do use one, it has a default axId of 500. This is important because it is the value that the y-axes need to cross, so each y-axis needs crossAx = 500. The crossAx of the category/date axis doesn't seem to matter. The following works for me:
from datetime import datetime
from openpyxl import Workbook, chart
wb = Workbook()
ws = wb.active
xvals = ['date', *[datetime(2018, 11, d, d+12) for d in range(1, 7)]]
avals = ['aliens', 6, 3, 4, 3, 6, 7]
hvals = ['humans', 10, 40, 50, 20, 10, 50]
for row in zip(xvals, avals, hvals):
    ws.append(row)
dates = chart.Reference(ws, min_row=2, max_row=7, min_col=1, max_col=1)
aliens = chart.Reference(ws, min_row=1, max_row=7, min_col=2, max_col=2)
humans = chart.Reference(ws, min_row=1, max_row=7, min_col=3, max_col=3)
c1 = chart.LineChart()
c1.x_axis = chart.axis.DateAxis() # axId defaults to 500
c1.x_axis.title = "Date"
c1.x_axis.crosses = "min"
c1.x_axis.majorTickMark = "out"
c1.x_axis.number_format = "yyyy-mmm-dd"
c1.add_data(aliens, titles_from_data=True)
c1.set_categories(dates)
c1.y_axis.title = 'Aliens'
c1.y_axis.crossAx = 500
c1.y_axis.majorGridlines = None
# Create a second chart
c2 = chart.LineChart()
c2.x_axis.axId = 500 # same as c1
c2.x_axis.crosses = "min"
c2.add_data(humans, titles_from_data=True)
c2.set_categories(dates)
c2.y_axis.axId = 20
c2.y_axis.title = "Humans"
c2.y_axis.crossAx = 500
# Display y-axis of the second chart on the right
# by setting it to cross the x-axis at its maximum
c1.y_axis.crosses = "max"
c1 += c2
ws.add_chart(c1, "E4")
wb.save("secondary.xlsx")
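As a quick check of the axis ids used above, here is a minimal sketch that reads the id from the DateAxis instead of hard-coding 500. It only builds the chart objects in memory, and it assumes the defaults in your openpyxl version match the ones described in this answer; the variable names are just for illustration.

```python
from openpyxl.chart import LineChart
from openpyxl.chart.axis import DateAxis

# Primary chart with a date axis; per the answer above its axId defaults to 500
date_axis = DateAxis()
c1 = LineChart()
c1.x_axis = date_axis
c1.y_axis.crossAx = date_axis.axId  # the y-axis has to cross the date axis

# Secondary chart: same x-axis id, but its own y-axis id
c2 = LineChart()
c2.x_axis.axId = date_axis.axId
c2.y_axis.axId = 20
c2.y_axis.crossAx = date_axis.axId

print(date_axis.axId, c1.y_axis.axId, c2.y_axis.axId)
```

Taking the id from the axis object keeps the two crossAx values in step if the default ever differs from 500.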
Related
I would like to add a column in my df to show the pixel coordinates of points plotted as a line chart. My goal is to obtain the y-coordinates of two line charts that have been overlaid on each other on different y-axes and find the difference between each pair of y-coordinates.
I took inspiration from this other StackOverflow answer similar to it and tried to apply it. The code failed at this point:
fig_main.transData.transform(np.vstack([x,y]).T)
Error was:
Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'
I think it's pretty obvious I don't actually understand what the heck I'm doing here.
My code is as such:
import investpy
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
#raw data
spot_df = investpy.get_currency_cross_historical_data(currency_cross = "AUD/SGD", from_date = "01/01/2018", to_date = "05/02/2020")
base_yield_df = investpy.bonds.get_bond_historical_data(bond = "Australia 2Y", country = "Australia", from_date = "01/01/2018", to_date = "05/02/2020")
quote_yield_df = investpy.bonds.get_bond_historical_data(bond = "Singapore 2Y", country = "Singapore", from_date = "01/01/2018", to_date = "05/02/2020")
#data munging
df = pd.merge(spot_df["Close"],base_yield_df["Close"], how='inner', left_index=True, right_index=True)
df = pd.merge(df, quote_yield_df["Close"], how='inner', left_index=True, right_index=True)
df.columns = ["AUD/SGD", "base", "quote"]
df["2Y Spread"] = df["base"]/df["quote"]
df
#creating subplot
fig = plt.figure(figsize = [15, 9] )
gs = fig.add_gridspec(3, 2)
fig_main = fig.add_subplot(gs[0:2,:])
#plotting data points
points, = fig_main.plot(df["2Y Spread"], label = "2Y Spread", color = "red")
fig_main = fig_main.twinx()
fig_main.plot(df["AUD/SGD"], label = "AUD/SGD", color = "green")
#attempt to obtain coordinates of data points
x, y = points.get_data()
xy_pixels = fig_main.transData.transform(np.vstack([x,y]).T)
xpix, ypix = xy_pixels.T
xpix, ypix
#another subplot to display the difference in y-coordinate of the two line graphs
fig_main = fig.add_subplot(gs[2,:])
You can uncomment the final line to see the dataframe, otherwise you should see the chart.
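One possible cause, offered as a guess rather than a confirmed diagnosis: the x values returned by get_data() are dates, so stacking them with the float y values gives an object array that transData.transform cannot cast to float64. A minimal sketch of a workaround is to convert the dates with matplotlib.dates.date2num first. The data below is made up, and the Agg backend is only selected so the sketch runs without a display.

```python
from datetime import datetime, timedelta

import matplotlib
matplotlib.use("Agg")  # headless backend, only so this runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

# Made-up daily series standing in for the spread data
dates = [datetime(2020, 1, 1) + timedelta(days=i) for i in range(5)]
values = [1.0, 1.2, 0.9, 1.1, 1.3]

fig, ax = plt.subplots()
line, = ax.plot(dates, values)
fig.canvas.draw()  # make sure the axis limits are final before transforming

x, y = line.get_data()
x_num = mdates.date2num(x)  # dates -> floats in matplotlib's date units
xy_data = np.column_stack([x_num, np.asarray(y, dtype=float)])
xy_pixels = ax.transData.transform(xy_data)
xpix, ypix = xy_pixels.T
print(xy_pixels.shape)
```

Note that the pixel values depend on the figure size and the axis limits, so they should be computed after the plot is in its final state and with the transData of the axes that the line was drawn on.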
I am trying to make a table with just two columns from a set of data. The data is below:
data = [[ 66386, 174296, 75131, 577908, 32015],
[ 58230, 381139, 78045, 99308, 160454],
[ 89135, 80552, 152558, 497981, 603535],
[ 78415, 81858, 150656, 193263, 69638],
[139361, 331509, 343164, 781380, 52269]]
I just want to display the first column of the data, so that I get a table that looks like this:
Below is a snippet of code that I am trying to use:
columns = ('Freeze', 'Wind', 'Flood', 'Quake', 'Hail')
rows = ['%d year' % x for x in (100, 50, 20, 10, 5)]
# Get some pastel shades for the colors
colors = plt.cm.BuPu(np.linspace(0, 0.5, len(rows)))
n_rows = len(data)
# Initialize the vertical-offset for the stacked bar chart.
y_offset = np.zeros(len(columns))
# Plot bars and create text labels for the table
cell_text = []
for row in range(n_rows):
    y_offset = data[row]
    cell_text.append(['%1.1f' % (x / 1000.0) for x in y_offset])
# Reverse colors and text labels to display the last value at the top.
colors = colors[::-1]
the_table = plt.table(cellText=cell_text,
rowLabels=rows,
rowColours=colors,
colLabels=columns,
loc='center')
How do I tweak this code to get the desired result?
Add an entry, for example 'year', to columns:
columns = ('year','Freeze', 'Wind', 'Flood', 'Quake', 'Hail')
rows = ['%d year' % x for x in (100, 50, 20, 10, 5)]
# Get some pastel shades for the colors
colors = plt.cm.BuPu(np.linspace(0, 0.5, len(rows)))
n_rows = len(data)
# Initialize the vertical-offset for the stacked bar chart.
y_offset = np.zeros(len(columns))
# Plot bars and create text labels for the table
cell_text = []
for row in range(n_rows):
    y_offset = data[row]
    cell_text.append(['%1.1f' % (x / 1000.0) for x in y_offset])
# Reverse colors and text labels to display the last value at the top.
colors = colors[::-1]
the_table = plt.table(cellText=cell_text,
rowLabels=rows,
rowColours=colors,
colLabels=columns,
loc='center')
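If the goal is a table with only the row labels and the first data column, another option is to slice the data before building cell_text. This is only a sketch under that assumption; it reuses the data and labels from the question and formats the values the same way.

```python
data = [[66386, 174296, 75131, 577908, 32015],
        [58230, 381139, 78045, 99308, 160454],
        [89135, 80552, 152558, 497981, 603535],
        [78415, 81858, 150656, 193263, 69638],
        [139361, 331509, 343164, 781380, 52269]]

rows = ['%d year' % x for x in (100, 50, 20, 10, 5)]
columns = ('Freeze',)  # only the first column is kept

# One cell per row: the first value of each data row, in thousands
cell_text = [['%1.1f' % (row[0] / 1000.0)] for row in data]

for label, cells in zip(rows, cell_text):
    print(label, cells[0])
```

These lists can then be passed to plt.table(cellText=cell_text, rowLabels=rows, colLabels=columns, loc='center'), which should give one label column and one value column.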
I'm trying to adapt the brewer example (http://docs.bokeh.org/en/latest/docs/gallery/stacked_area.html) to my needs. One of the things I'd like is to have dates at the x-axis. I did the following:
timesteps = [str(x.date()) for x in pd.date_range('1950-01-01', '1951-07-01', freq='MS')]
p = figure(x_range=FactorRange(factors=timesteps), y_range=(0, 800))
p.xaxis.major_label_orientation = np.pi/4
as an adaptation of the previous line
p = figure(x_range=(0, 19), y_range=(0, 800))
The dates are displayed, but the first date 1950-01-01 sits at x=1. How can I shift it to x=0? The first real data points I have are for that date and therefore should be displayed together with that date and not one month later.
Well, if you have a list of strings as your x axis, then apparently the count starts at 1, so you have to modify your x data for the plot to start at 1 as well. Also, the brewer example (http://docs.bokeh.org/en/latest/docs/gallery/stacked_area.html) has a range from 0 to 19, so it has 20 data points, not 19 like your timesteps list. I changed the x input for the plot to data['x'] = np.arange(1,N+1) so that it runs from 1 to N, and I added one more month to your list: timesteps = [str(x.date()) for x in pd.date_range('1950-01-01', '1951-08-01', freq='MS')]
Here is the complete code:
import numpy as np
import pandas as pd
import bokeh.io
import bokeh.models
from bokeh.plotting import figure, show, output_file
from bokeh.palettes import brewer
N = 20
categories = ['y' + str(x) for x in range(10)]
data = {}
data['x'] = np.arange(1,N+1)
for cat in categories:
    data[cat] = np.random.randint(10, 100, size=N)
df = pd.DataFrame(data)
df = df.set_index(['x'])
def stacked(df, categories):
    areas = dict()
    last = np.zeros(len(df[categories[0]]))
    for cat in categories:
        next = last + df[cat]
        areas[cat] = np.hstack((last[::-1], next))
        last = next
    return areas
areas = stacked(df, categories)
colors = brewer["Spectral"][len(areas)]
x2 = np.hstack((data['x'][::-1], data['x']))
timesteps = [str(x.date()) for x in pd.date_range('1950-01-01', '1951-08-01', freq='MS')]
p = figure(x_range=bokeh.models.FactorRange(factors=timesteps), y_range=(0, 800))
p.grid.minor_grid_line_color = '#eeeeee'
p.patches([x2] * len(areas), [areas[cat] for cat in categories],
color=colors, alpha=0.8, line_color=None)
p.xaxis.major_label_orientation = np.pi/4
bokeh.io.show(p)
And here is the output:
UPDATE
You can leave data['x'] = np.arange(0,N) from 0 to 19, and then use offset=-1 inside FactorRange, i.e. figure(x_range=bokeh.models.FactorRange(factors=timesteps,offset=-1),...
Update version bokeh 0.12.16
In this version I am using datetime for x axis which has the advantage of nicer formatting when zooming in.
import numpy as np
import pandas as pd
import bokeh.models
from bokeh.plotting import figure, show, output_file
from bokeh.palettes import brewer
timesteps = [x for x in pd.date_range('1950-01-01', '1951-07-01', freq='MS')]
N = len(timesteps)
cats = 10
df = pd.DataFrame(np.random.randint(10, 100, size=(N, cats))).add_prefix('y')
def stacked(df):
    df_top = df.cumsum(axis=1)
    df_bottom = df_top.shift(axis=1).fillna({'y0': 0})[::-1]
    df_stack = pd.concat([df_bottom, df_top], ignore_index=True)
    return df_stack
areas = stacked(df)
colors = brewer['Spectral'][areas.shape[1]]
x2 = np.hstack((timesteps[::-1], timesteps))
p = figure( x_axis_type='datetime', y_range=(0, 800))
p.grid.minor_grid_line_color = '#eeeeee'
p.patches([x2] * areas.shape[1], [areas[c].values for c in areas],
color=colors, alpha=0.8, line_color=None)
p.xaxis.formatter = bokeh.models.formatters.DatetimeTickFormatter(
months=["%Y-%m-%d"])
p.xaxis.major_label_orientation = np.pi/4
output_file('brewer.html', title='brewer.py example')
show(p)
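To see what the stacked helpers above hand to p.patches, here is a small numpy-only sketch of the same idea with two made-up categories and two x positions. The function name stacked_outline and the sample values are hypothetical; the logic mirrors the first version of stacked (previous top reversed, followed by the new top).

```python
import numpy as np

def stacked_outline(columns):
    """Return, per category, the closed outline: reversed bottom edge, then top edge."""
    n = len(next(iter(columns.values())))
    last = np.zeros(n)
    areas = {}
    for name, values in columns.items():
        top = last + np.asarray(values, dtype=float)
        areas[name] = np.hstack((last[::-1], top))
        last = top
    return areas

columns = {'y0': [1, 2], 'y1': [3, 4]}
areas = stacked_outline(columns)

# The x coordinates follow the same pattern: reversed, then forward
x = np.array([0, 1])
x2 = np.hstack((x[::-1], x))

for name, outline in areas.items():
    print(name, list(zip(x2.tolist(), outline.tolist())))
```

Each outline walks the lower edge from right to left and then the upper edge from left to right, which is why x2 is built from the reversed x values followed by the forward ones.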
I'm currently experimenting with the Python module openpyxl, trying to automate some tasks at work and generate spreadsheets automatically. For one of the required sheets I need to generate a scatter chart from tabulated data. However, the scatter chart should consist of multiple lines connecting two points each, so each of the individual x/y series in the scatter chart should connect two points only.
Generally I found from the openpyxl documentation that scatter charts are generated like in this small example:
from openpyxl import Workbook
from openpyxl.chart import (
ScatterChart,
Reference,
Series,
)
wb = Workbook()
ws = wb.active
rows = [
['Size', 'Batch 1', 'Batch 2'],
[2, 40, 30],
[3, 40, 25],
[4, 50, 30],
[5, 30, 25],
[6, 25, 35],
[7, 20, 40],
]
for row in rows:
    ws.append(row)
chart = ScatterChart()
chart.title = "Scatter Chart"
chart.style = 13
chart.x_axis.title = 'Size'
chart.y_axis.title = 'Percentage'
xvalues = Reference(ws, min_col=1, min_row=2, max_row=7)
for i in range(2, 4):
    values = Reference(ws, min_col=i, min_row=1, max_row=7)
    series = Series(values, xvalues, title_from_data=True)
    chart.series.append(series)
ws.add_chart(chart, "A10")
wb.save("scatter.xlsx")
However, the x (and y) coordinates of the two points I would like to connect in the scatter points are not located in adjacent cells.
So when I import the data series manually in excel by holding 'ctrl' and select two cells I get something like this:
'Sheet!$A$4;Sheet!$A$6'
instead of
'Sheet!$A$4:$A$6'
when dragging the cursor to select a range of cells.
For only two individual non-adjacent cells this means that I do not have a clear min_row/min_col/max_row etc., but only a list of cell pairs (for both x and y). Is there a way to create a data series in openpyxl from a list of cells instead of a connected/adjacent range?
Help would be much appreciated! :)
There are currently no plans to support non-contiguous cell ranges in chart series. I would suggest you try and arrange your data or create references to it that will allow you to work with contiguous ranges.
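One way to arrange the data, sketched here with made-up segments, is to copy each pair of points into two adjacent rows so that every series can use an ordinary contiguous Reference. The segments list and the series titles are hypothetical, and the workbook is only built in memory.

```python
from openpyxl import Workbook
from openpyxl.chart import Reference, ScatterChart, Series

wb = Workbook()
ws = wb.active

# Hypothetical line segments, each given as ((x1, y1), (x2, y2))
segments = [((1, 2), (4, 6)), ((2, 5), (3, 1)), ((0, 0), (5, 5))]

# Write every segment as two adjacent rows: x in column A, y in column B
for (x1, y1), (x2, y2) in segments:
    ws.append([x1, y1])
    ws.append([x2, y2])

chart = ScatterChart()
for i in range(len(segments)):
    first_row = 2 * i + 1
    xvalues = Reference(ws, min_col=1, min_row=first_row, max_row=first_row + 1)
    yvalues = Reference(ws, min_col=2, min_row=first_row, max_row=first_row + 1)
    chart.series.append(Series(yvalues, xvalues, title="Segment %d" % (i + 1)))

ws.add_chart(chart, "D2")
```

The helper rows could also live on a separate sheet if the original layout has to stay untouched.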
Are you sure this doesn't work?
I've modified the example for a situation like yours and it seems to work for me:
from openpyxl import Workbook
from openpyxl.chart import (
ScatterChart,
Reference,
Series,
)
wb = Workbook()
ws = wb.active
rows = [
['Size'],
[2, 'Batch 1', 'Batch 2'],
[3, 40, 30],
[4, 40, 25],
[5, 50, 30],
[6, 30, 25],
[7, 25, 35],
[None, 20, 40],
]
for row in rows:
    ws.append(row)
chart = ScatterChart()
chart.title = "Scatter Chart"
chart.style = 13
chart.x_axis.title = 'Size'
chart.y_axis.title = 'Percentage'
xvalues = Reference(ws, min_col=1, min_row=2, max_row=7)
for i in range(2, 4):
    values = Reference(ws, min_col=i, min_row=2, max_row=8)
    series = Series(values, xvalues, title_from_data=True)
    chart.series.append(series)
ws.add_chart(chart, "A10")
wb.save("scatter.xlsx")
Result:
I'm trying to add a "linear" trend line to my Excel chart and display the R-squared value using openpyxl, but I cannot find any example.
Below is the Python code that generates the chart shown in the image, without the trend line and R-squared formula.
Thanks!
from openpyxl import Workbook, load_workbook
from openpyxl.chart import (
ScatterChart,
Reference,
Series,
)
from openpyxl.chart.trendline import Trendline
wb = load_workbook(r"path to load blank workbook\data.xlsx")
ws = wb.active
rows = [
['Size', 'Batch 1'],
[3, 40],
[4, 50],
[2, 40],
[5, 30],
[6, 25],
[7, 20],
]
for row in rows:
    ws.append(row)
chart = ScatterChart()
chart.title = "Scatter Chart"
#chart.style = 13
chart.x_axis.title = 'Size'
chart.y_axis.title = 'Percentage'
xvalues = Reference(ws, min_col=1, min_row=2, max_row=8)
for i in range(2, 4):
    values = Reference(ws, min_col=i, min_row=2, max_row=8)
    series = Series(values, xvalues, title_from_data=True)
    chart.series.append(series)
line = chart.series[0]
line.graphicalProperties.line.noFill = True
line.marker.symbol = "circle"
ws.add_chart(chart, "A10")
wb.save(r"path to save workbook\scatter.xlsx")
It's basically impossible to document all the possibilities for charts so you will occasionally have to dive into the XML of a relevant chart to find out how it's done. That said, trendlines are pretty easy to do.
from openpyxl.chart.trendline import Trendline
line.trendline = Trendline()
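For the R-squared part of the question, a minimal sketch is to pass dispRSqr=True when creating the Trendline. The example below uses a fresh in-memory workbook and the data from the question instead of the loaded file, so the row numbers are an assumption that differs from the original code.

```python
from openpyxl import Workbook
from openpyxl.chart import Reference, ScatterChart, Series
from openpyxl.chart.trendline import Trendline

wb = Workbook()
ws = wb.active
rows = [
    ['Size', 'Batch 1'],
    [3, 40],
    [4, 50],
    [2, 40],
    [5, 30],
    [6, 25],
    [7, 20],
]
for row in rows:
    ws.append(row)

chart = ScatterChart()
chart.title = "Scatter Chart"
xvalues = Reference(ws, min_col=1, min_row=2, max_row=7)
values = Reference(ws, min_col=2, min_row=1, max_row=7)  # row 1 holds the title
series = Series(values, xvalues, title_from_data=True)

# Linear trend line that also shows its R-squared value on the chart
series.trendline = Trendline(trendlineType='linear', dispRSqr=True)

chart.series.append(series)
ws.add_chart(chart, "D2")
```

Trendline also has a dispEq option, which should show the fitted equation next to the R-squared value.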