Python ppt find and replace within a chart

I have already gone through several related posts, so please don't mark this as a duplicate.
I have a chart embedded inside the ppt (screenshot omitted).
I wish to replace the axis header FY2021 HC with FY1918 HC. Similarly, FY2122 HC should be replaced with FY1718 HC.
How can I do this using python-pptx? The chart comes from an embedded Excel workbook, though. Is there any way to change it in the ppt?
When I tried the code below, it did not pick up the axis headers:
text_runs = []
for shape in slide.shapes:
    if not shape.has_text_frame:
        continue
    for paragraph in shape.text_frame.paragraphs:
        for run in paragraph.runs:
            text_runs.append(run.text)
When I ran the code below, I got the list of shape types on the specific slide. I only want to change the chart headers. As the screenshot shows, there are only two charts on my slide.
for slide in ip_ppt.slides:
    for shape in slide.shapes:
        print("id: %s, type: %s" % (shape.shape_id, shape.shape_type))
id: 24, type: TEXT_BOX (17)
id: 10242, type: TEXT_BOX (17)
id: 11306, type: TEXT_BOX (17)
id: 11, type: AUTO_SHAPE (1)
id: 5, type: TABLE (19)
id: 7, type: TABLE (19)
id: 19, type: AUTO_SHAPE (1)
id: 13, type: CHART (3)
id: 14, type: CHART (3)
When I try to access the shape using its id, that does not work either:
ip_ppt.slides[5].shapes[13].Chart
I also tried the code below
from pptx import chart
from pptx.chart.data import CategoryChartData
chart_data = CategoryChartData()
chart.datalabel = ['FY1918 HC', 'FY1718 HC']
I am new to Python and python-pptx. Any solution for editing the headers of embedded charts would be really useful. Help please.

You can get to the category labels the following way:
from pptx import Presentation
from pptx.shapes.graphfrm import GraphicFrame
prs = Presentation('chart-01.pptx')
for slide in prs.slides:
    for shape in slide.shapes:
        print("slide: %s, id: %s, index: %s, type: %s" % (slide.slide_id, shape.shape_id, slide.shapes.index(shape), shape.shape_type))
        if isinstance(shape, GraphicFrame) and shape.has_chart:
            plotIndex = 0
            for plot in shape.chart.plots:
                catIndex = 0
                for cat in plot.categories:
                    print(" plot %s, category %s, category label: %s" % (plotIndex, catIndex, cat.label))
                    catIndex += 1
                plotIndex += 1
which prints something like this:
slide: 256, id: 2, index: 0, type: PLACEHOLDER (14)
slide: 256, id: 3, index: 1, type: CHART (3)
plot 0, category 0, category label: East
plot 0, category 1, category label: West
plot 0, category 2, category label: Midwest
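Note that in the output above the chart has id 3 but index 1: a shape's id is not its position in slide.shapes, which is why indexing with an id (as in shapes[13] in the question) does not work. A minimal sketch of looking a shape up by id follows. The helper name find_shape_by_id is made up for illustration, and the stand-in objects only mimic the shape_id attribute; with python-pptx you would pass slide.shapes instead.

```python
from types import SimpleNamespace


def find_shape_by_id(shapes, shape_id):
    """Return the first shape whose shape_id matches, or None if there is none."""
    for shape in shapes:
        if shape.shape_id == shape_id:
            return shape
    return None


# Stand-in objects mimicking the two shapes from the output above
# (hypothetical; a real run would use slide.shapes from python-pptx).
shapes = [
    SimpleNamespace(shape_id=2, has_chart=False),
    SimpleNamespace(shape_id=3, has_chart=True),
]

chart_shape = find_shape_by_id(shapes, 3)
print(shapes.index(chart_shape))  # position in the collection, not the id
```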
Unfortunately you cannot change a category label directly, because it is stored in the embedded Excel workbook. The only way to change the labels is to replace the chart data using the chart.replace_data() method.
Recreating the ChartData object you need for the call to replace_data based on the existing chart is a bit more involved, but here is my go at it based on a chart that I created with the following code:
from pptx import Presentation
from pptx.chart.data import CategoryChartData
from pptx.enum.chart import XL_CHART_TYPE, XL_LABEL_POSITION
from pptx.util import Inches, Pt
from pptx.dml.color import RGBColor
# create presentation with 1 slide ------
prs = Presentation()
slide = prs.slides.add_slide(prs.slide_layouts[5])
# define chart data ---------------------
chart_data = CategoryChartData()
chart_data.categories = ['FY2021 HC', 'FY2122 HC']
chart_data.add_series('blue', (34.5, 31.5))
chart_data.add_series('orange', (74.1, 77.8))
chart_data.add_series('grey', (56.3, 57.3))
# add chart to slide --------------------
x, y, cx, cy = Inches(2), Inches(2), Inches(6), Inches(4.5)
gframe = slide.shapes.add_chart(
    XL_CHART_TYPE.COLUMN_STACKED, x, y, cx, cy, chart_data
)
chart = gframe.chart
plot = chart.plots[0]
plot.has_data_labels = True
data_labels = plot.data_labels
data_labels.font.size = Pt(13)
data_labels.font.color.rgb = RGBColor(0x0A, 0x42, 0x80)
data_labels.position = XL_LABEL_POSITION.INSIDE_END
prs.save('chart-01.pptx')
and that looks almost identical to your picture in the question:
The following code will change the category labels in that chart:
from pptx import Presentation
from pptx.chart.data import CategoryChartData
from pptx.shapes.graphfrm import GraphicFrame
from pptx.enum.chart import XL_CHART_TYPE
from pptx.util import Inches
# read presentation from file
prs = Presentation('chart-01.pptx')
# find the first chart object in the presentation
chart = None
for slideIdx, slide in enumerate(prs.slides):
    for shape in slide.shapes:
        if shape.has_chart:
            chart = shape.chart
            print("Chart of type %s found in slide[%s, id=%s] shape[%s, id=%s, type=%s]"
                  % (chart.chart_type, slideIdx, slide.slide_id,
                     slide.shapes.index(shape), shape.shape_id, shape.shape_type))
            break
    if chart is not None:
        break  # stop at the first chart; the inner break alone would keep scanning later slides
# create list with changed category names
category_map = { 'FY2021 HC': 'FY1918 HC', 'FY2122 HC': 'FY1718 HC' }
new_categories = [category_map[c] for c in chart.plots[0].categories]
# build new chart data with new category names and old data values
new_chart_data = CategoryChartData()
new_chart_data.categories = new_categories
for series in chart.series:
    new_chart_data.add_series(series.name, series.values)
# write the new chart data to the chart
chart.replace_data(new_chart_data)
# save everything in a new file
prs.save('chart-02.pptx')
The comments should explain what is going on and if you open chart-02.pptx with PowerPoint, this is what you will see:
Hope that solves your problem!
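One caveat with the dictionary lookup used above: it raises a KeyError for any category that has no entry in the map. If only some labels should be renamed, a fallback to the original label may be more convenient. The sketch below is plain Python; the helper name remap_labels and the third label are made up for illustration.

```python
def remap_labels(labels, mapping):
    """Return new labels, keeping any label that has no entry in mapping."""
    return [mapping.get(str(label), str(label)) for label in labels]


category_map = {'FY2021 HC': 'FY1918 HC', 'FY2122 HC': 'FY1718 HC'}

# The third label is invented here to show the fallback behaviour.
old_labels = ['FY2021 HC', 'FY2122 HC', 'FY2223 HC']
new_labels = remap_labels(old_labels, category_map)
print(new_labels)  # ['FY1918 HC', 'FY1718 HC', 'FY2223 HC']
```

The resulting list could then be assigned to the categories of the new CategoryChartData object in place of the list built from the plain dictionary lookup.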

Related

Updating plot using Select Widget issue

I am currently trying to write a program that will switch between two sets of data when different options are chosen from the select widget. I am trying to make this program as autonomous as possible so in the future when people update the data they don't have to modify the code at all and the updates will happen automatically.
Currently, my issue is that when I select 'White' I want the plot to update but nothing is happening.
The two data sets are currently dicts of lists, one named 'white_dict' and the other 'black_dict', solely to represent the color of the material for the data (I know it's kind of ironic).
from bokeh.plotting import figure, curdoc
from bokeh.models import ColumnDataSource, Legend
from bokeh.models import Select
from bokeh.layouts import column
import pandas as pd
from plot_tools import add_hover
import itertools
from collections import defaultdict
bokeh_doc = curdoc()
material_types = pd.read_csv('data/material-information.csv')
df = pd.read_csv('data/Black_Materials_total_reflecatance.csv')
black_df = pd.read_csv('data/Black_Materials_total_reflecatance.csv')
white_df = pd.read_csv('data/SPIE18_white_all.csv')
names = []
w_names = []
black_dict = defaultdict(list)
white_dict = defaultdict(list)
for name, w_name in zip(df, white_df):
    names.append(name)
    w_names.append(w_name)
data = pd.read_csv('data/Black_Materials_total_reflecatance.csv', usecols = names)
w_data = pd.read_csv('data/SPIE18_white_all.csv', usecols = w_names)
for name, w_name in zip(names, w_names):
    for i in range(0, 2250):
        black_dict[name].append(data[name][i])
        white_dict[w_name].append(w_data[w_name][i])
mySource = ColumnDataSource(data = black_dict)
#create total reflectance figure
total_fig = figure(plot_width = 650, plot_height = 350,
                   title = 'Total Reflectance',
                   x_axis_label = 'Wavelength(nm)', y_axis_label = 'Total Reflectance',
                   x_range = (250, 2500), y_range = (0,10),
                   title_location = 'above', sizing_mode = "scale_both",
                   toolbar_location = "below",
                   tools = "box_zoom, pan, wheel_zoom, save")
select = Select(title="Material Type", options=['Black', 'White'])
def update_plot(attr, old, new):
    if new == 'White':
        mySource.data = white_dict
    else:
        mySource.data = black_dict
for name, color in zip(mySource.data, Turbo256):
    if name != 'nm':
        total_fig.line('nm', name, line_width = .7, source = mySource, color = color)
select.on_change('value', update_plot)
bokeh_doc.add_root(total_fig)
bokeh_doc.add_root(select)
I'm currently using bokeh serve bokehWork.py to launch the server. If anyone has any idea on what I should fix it would be much appreciated! Thanks!
EDIT:
Adding data for Black_materials_total_reflectance.csv
Black Reflectance Data sample
Adding data for White_all.csv
White Reflectance Data sample
There are two main issues with your code:
1. You read the same files multiple times, and you do a lot of work that Pandas and Bokeh can already do for you.
2. (the main one) You do not take into account the fact that different CSV files have different column names.
Here's a fixed version. Notice also the usage of the palette. With just Turbo256 you were getting almost the same color for all lines.
import pandas as pd
from bokeh.models import ColumnDataSource, Select
from bokeh.palettes import turbo
from bokeh.plotting import figure, curdoc
black_ds = ColumnDataSource(pd.read_csv('/home/p-himik/Downloads/Black_material_data - Sheet1.csv').set_index('nm'))
white_ds = ColumnDataSource(pd.read_csv('/home/p-himik/Downloads/White Materials Sample - Sheet1.csv').set_index('nm'))
total_fig = figure(plot_width=650, plot_height=350,
                   title='Total Reflectance',
                   x_axis_label='Wavelength(nm)', y_axis_label='Total Reflectance',
                   title_location='above', sizing_mode="scale_both",
                   toolbar_location="below",
                   tools="box_zoom, pan, wheel_zoom, save")
total_fig.x_range.range_padding = 0
total_fig.x_range.only_visible = True
total_fig.y_range.only_visible = True
palette = turbo(len(black_ds.data) + len(white_ds.data))
def plot_lines(ds, color_offset, visible):
    renderers = []
    for name, color in zip(ds.data, palette[color_offset:]):
        if name != 'nm':
            r = total_fig.line('nm', name, line_width=.7, color=color,
                               source=ds, visible=visible)
            renderers.append(r)
    return renderers
black_renderers = plot_lines(black_ds, 0, True)
white_renderers = plot_lines(white_ds, len(black_ds.data), False)
select = Select(title="Material Type", options=['Black', 'White'], value='Black')
def update_plot(attr, old, new):
    wv = new == 'White'
    for r in white_renderers:
        r.visible = wv
    for r in black_renderers:
        r.visible = not wv
select.on_change('value', update_plot)
bokeh_doc = curdoc()
bokeh_doc.add_root(total_fig)
bokeh_doc.add_root(select)
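To illustrate the palette remark: zipping the column names with Turbo256 pairs each column with the next entry from the start of a 256-color palette, so a handful of lines all receive neighbouring shades. Sampling the palette at evenly spaced positions spreads the colors out, which is, as far as I understand it, roughly what turbo(n) does. The sketch below is plain Python with a list of integers standing in for the real palette; the helper names first_n and evenly_spaced are made up for illustration and are not part of Bokeh.

```python
def first_n(palette, n):
    """What zip(names, palette) effectively uses: the first n entries."""
    return list(palette[:n])


def evenly_spaced(palette, n):
    """Pick n entries spread across the whole palette (integer arithmetic)."""
    if n == 1:
        return [palette[0]]
    last = len(palette) - 1
    return [palette[i * last // (n - 1)] for i in range(n)]


# Stand-in for a 256-entry palette: the position doubles as the "color".
palette = list(range(256))

print(first_n(palette, 5))        # [0, 1, 2, 3, 4]
print(evenly_spaced(palette, 5))  # [0, 63, 127, 191, 255]
```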

pptx Chart class arguments

I'm trying to call the Chart class in the pptx module. Chart has two arguments: chartSpace and chart_part. The problem is that I have no idea what those two arguments are. There is probably a simple answer to this, but I've tried looking over all the documentation and can't find anything about these arguments. Can someone explain what these arguments are looking for?
from pptx import Presentation
from pptx.enum.chart import XL_CHART_TYPE, XL_TICK_LABEL_POSITION
from pptx.chart.data import CategoryChartData
from pptx.chart.data import ChartData
from pptx.enum.shapes import PP_PLACEHOLDER
from pandas import DataFrame as DF
from pptx.chart.chart import Chart
prs_dir = 'Directory'
layout = prs.slide_layouts[6]
slide = prs.slides.add_slide( layout )
chart_data = ChartData()
chart_data.categories = ['Budget','Actuals']
chart_data.add_series('Budget', (1,21,23,4,5,6,7,35))
chart_data.add_series('Actuals', (1,21,23,4,5,6,7,35))
chart = Chart()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-207-e560d7744421> in <module>
----> 1 chart = Chart()
TypeError: __init__() missing 2 required positional arguments: 'chartSpace' and 'chart_part'
The Chart class is not intended to be instantiated directly. Use the .add_chart() method on the slide shapes collection to add a chart.
from pptx import Presentation
from pptx.chart.data import CategoryChartData
from pptx.enum.chart import XL_CHART_TYPE
from pptx.util import Inches
# create presentation with 1 slide ------
prs = Presentation()
slide = prs.slides.add_slide(prs.slide_layouts[5])
# define chart data ---------------------
chart_data = CategoryChartData()
chart_data.categories = ['East', 'West', 'Midwest']
chart_data.add_series('Series 1', (19.2, 21.4, 16.7))
# add chart to slide --------------------
x, y, cx, cy = Inches(2), Inches(2), Inches(6), Inches(4.5)
slide.shapes.add_chart(
    XL_CHART_TYPE.COLUMN_CLUSTERED, x, y, cx, cy, chart_data
)
prs.save('chart-01.pptx')
More details are available in the documentation here: https://python-pptx.readthedocs.io/en/latest/user/charts.html

Set data labels text frame wrap to false – python-pptx

I am asking a duplicate of this question, except that the answer submitted there does not work for me. I would like to toggle the data labels' "Wrap text in shape" option from the PowerPoint UI via python-pptx. The linked answer ends up removing the data labels altogether instead. I am using the latest python-pptx version (0.6.18).
Here is a simple example to replicate:
from pptx import Presentation
from pptx.chart.data import ChartData
from pptx.enum.chart import XL_CHART_TYPE
from pptx.util import Cm
from pptx.text.text import TextFrame
# create presentation with 1 slide ------
prs = Presentation()
slide = prs.slides.add_slide(prs.slide_layouts[5])
x = ['one','two','three', 'four']
y = [['diff', [1, 2, 3, 4]]]
specs = {
    'height': Cm(7.82),
    'width': Cm(14.8),
    'left': Cm(2.53),
    'top': Cm(5.72)}
data = ChartData()
data.categories = x
data.add_series('diff', [j for j in y[0][1]])
frame = slide.shapes.add_chart(
    XL_CHART_TYPE.BAR_CLUSTERED, specs['left'], specs['top'],
    specs['width'], specs['height'], data
)
plot = frame.chart.plots[0]
plot.has_data_labels = True
data_labels = plot.series[0].data_labels
dLbls = data_labels._element
# ---use its <c:txPr> child to create TextFrame object---
text_frame = TextFrame(dLbls.get_or_add_txPr(), None)
# ---turn off word-wrap in the usual way---
text_frame.wrap = False
prs.save('chart-01.pptx')
I believe the second to last line should be text_frame.word_wrap = False, not .wrap; that's my mistake on the earlier answer (now fixed).
Also change this line:
data_labels = plot.series[0].data_labels
to:
data_labels = plot.data_labels
And I think you'll get what you're looking for.

Interactive Plotly Int Slider

Hi, I'm fairly new to Python, Plotly and Jupyter Notebook. I would like to use a slider to select the number of days used as the range in a query, from which a graph is then created. My only issue is that I want the graph to update automatically on interaction with the slider, without having to re-run the query and graph creation. My code is below:
slider = widgets.IntSlider()
display(slider)
sliderVal = slider.value
df = pd.read_sql(f"""
    SELECT CASE WHEN SiteID LIKE 3 THEN 'BLAH'
                WHEN SiteID LIKE 4 THEN 'BLAHBLAH'
           END AS Website,
           COUNT(1) AS Count
    FROM viewName
    WHERE (TimeStamp > DATEADD(DAY, -{sliderVal}, GETDATE()))
    GROUP BY SiteId
    ORDER BY Count DESC
    """, conn)
data = [go.Bar(x=df.Website, y=df.Count)]
layout = go.Layout(
    xaxis=dict(
        title='Website'),
    yaxis=dict(
        title='Exception count'),
    title=f'Number of exceptions per user in the last {sliderVal} days')
chart = go.Figure(data=data, layout=layout, )
py.iplot(chart, filename='WebExceptions')
Thanks in advance!
If you do not want to rerun the query, then your data frame df must contain the results for all the values that you want the IntSlider widget to take. The function linked to the widget then simply filters the data and redraws the graph with the filtered data.
Here's an example with some dummy data:
import ipywidgets as widgets
import plotly.offline as py
import plotly.graph_objs as go
import pandas as pd
py.init_notebook_mode(connected = True)
# Dummy data, to be replaced with your query result for the range of sliderVal
df = pd.DataFrame({'Days': [1] * 3 + [2] * 4 + [3] * 5,
                   'Website': [1,2,3, 4,5,6,7, 8,9,10,11,12],
                   'Count': [10,5,30, 15,20,25,12, 18,17,30,23,27]})
def update_plot(sliderVal):
    filtered_df = df.query('Days== ' + str(sliderVal))
    data = [go.Bar(x = filtered_df.Website,
                   y = filtered_df.Count)]
    layout = go.Layout(
        xaxis = dict(title = 'Website'),
        yaxis = dict(title = 'Exception count'),
        title = f'Number of exceptions per user in the last {sliderVal} days')
    chart = go.Figure(data = data, layout = layout, )
    py.iplot(chart, filename = 'WebExceptions')
# links an IntSlider taking values between 1 and 3 to the update_plot function
widgets.interact(update_plot, sliderVal = (1, 3))
The result with sliderVal = 2 was shown in a screenshot (omitted here).
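The dummy data filters on equality with Days, whereas the query in the question counts everything in the last N days. If the raw rows for the largest range are fetched once, the cumulative count can be recomputed in memory each time the slider moves. The sketch below uses only the standard library; the function name counts_for_last_days, the sample rows and the fixed "now" are assumptions made for illustration, not part of the original code.

```python
from collections import Counter
from datetime import datetime, timedelta


def counts_for_last_days(rows, days, now):
    """Count rows per site whose timestamp falls within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return Counter(site for site, timestamp in rows if timestamp > cutoff)


# Hypothetical rows, as if fetched once for the widest slider range.
now = datetime(2024, 1, 10)
rows = [
    ('BLAH', datetime(2024, 1, 9, 12)),
    ('BLAH', datetime(2024, 1, 7)),
    ('BLAHBLAH', datetime(2024, 1, 9, 18)),
    ('BLAHBLAH', datetime(2024, 1, 2)),
]

for days in (1, 5, 10):
    print(days, dict(counts_for_last_days(rows, days, now)))
```

Inside update_plot, the keys and values of the returned counter could serve as the x and y values of the bar chart.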

How does one define, extract and replace data from a Chart in an existing Powerpoint using Python

Currently I am using the following code to define and replace placeholders (text data) in existing PowerPoint presentations.
import os
from pptx import Presentation
current_dir = os.path.dirname(os.path.realpath(__file__))
prs = Presentation(current_dir + '/test2.pptx')
slides = prs.slides
title_slide_layout = prs.slide_layouts[0]
slide = slides[0]
for shape in slide.placeholders:
    print('%d %s' % (shape.placeholder_format.idx, shape.name))
title = slide.shapes.title
subtitle1 = slide.shapes.placeholders[0]
subtitle2 = slide.shapes.placeholders[10]
subtitle10 = slide.shapes.placeholders[11]
subtitle11 = slide.shapes.placeholders[12]
subtitle1.text = "1"
subtitle2.text = "2"
subtitle10.text = "3"
subtitle11.text = "4"
slide2 = slides[1]
for shape in slide2.placeholders:
    print('%d %s' % (shape.placeholder_format.idx, shape.name))
subtitle3 = slide2.shapes.placeholders[10]
subtitle4 = slide2.shapes.placeholders[11]
subtitle5 = slide2.shapes.placeholders[12]
subtitle6 = slide2.shapes.placeholders[13]
subtitle12 = slide2.shapes.placeholders[16]
companydate = slide2.shapes.placeholders[14]
subtitle3.text = "1"
subtitle4.text = "2"
subtitle5.text = "3"
subtitle6.text = "4"
subtitle12.text = "40%"
companydate.text = "Insert company"
slide3 = slides[2]
for shape in slide3.placeholders:
    print('%d %s' % (shape.placeholder_format.idx, shape.name))
subtitle7 = slide3.shapes.placeholders[10]
subtitle8 = slide3.shapes.placeholders[11]
subtitle9 = slide3.shapes.placeholders[12]
subtitle13 = slide3.shapes.placeholders[16]
companydate2 = slide3.shapes.placeholders[14]
subtitle7.text = "1"
subtitle8.text = "2"
subtitle9.text = "3"
subtitle13.text = "5x"
companydate2.text = "Insert Company"
slide4 = slides[3]
# for shape in slide4.placeholders:
#     print('%d %s' % (shape.placeholder_format.idx, shape.name))
companydate3 = slide4.shapes.placeholders[14]
companydate3.text = "Insert Company"
# Adapting Charts
from pptx.chart.data import ChartData
from pptx.enum.chart import XL_CHART_TYPE
from pptx.util import Pt
# Adapting Chart 1
prs1 = Presentation(current_dir + '/output4.pptx')
slides1 = prs1.slides
chart1 = prs1.slides[0].chart
However, I am also running analytics in the background, and I was wondering whether it is possible to recognize (define) charts in the same presentation and to extract and replace the data in those charts. These charts are not embedded in the template.
Plotting the charts with plotly or matplotlib does not render a compliant image, so I cannot use those unless the output is fully modified to match the following format (image: "Graph budget Click Correl").
If yes, would it be possible to give concrete coding examples?
Thanks in advance!
Yes, it's possible to do that. The documentation will be your best source.
This will find the chart shapes:
for shape in slide.shapes:
    if shape.has_chart:
        chart = shape.chart
        print('found a chart')
Data is extracted from the chart series:
for series in chart.series:
    for value in series.values:
        print(value)
Data is replaced by creating a new ChartData object and calling .replace_data() on the chart using that chart data object:
chart_data = ChartData(...)
... # add categories, series with values, etc.
chart.replace_data(chart_data)
http://python-pptx.readthedocs.io/en/latest/api/chart.html#pptx.chart.chart.Chart.replace_data
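The extraction step can be wrapped in a small helper that collects the categories and series values into plain Python containers before a new chart data object is built. This is only a sketch: the helper name chart_to_dict is made up, and the stand-in object below merely mimics the plots, categories, series, name and values attributes used in this thread, so that the snippet runs without a presentation file.

```python
from types import SimpleNamespace


def chart_to_dict(chart):
    """Collect category labels and series values from a chart-like object."""
    categories = [str(c) for c in chart.plots[0].categories]
    series = {s.name: list(s.values) for s in chart.series}
    return {'categories': categories, 'series': series}


# Stand-in for a python-pptx chart (hypothetical data); with a real
# presentation you would pass shape.chart instead.
fake_chart = SimpleNamespace(
    plots=[SimpleNamespace(categories=['East', 'West', 'Midwest'])],
    series=[SimpleNamespace(name='Series 1', values=(19.2, 21.4, 16.7))],
)

snapshot = chart_to_dict(fake_chart)
print(snapshot)
```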
Adding to the answer by @scanny above, this worked for me:
for shape in slide.shapes:
    if shape.name == 'Chart1':
        chart = shape.chart
        print(shape.name)
        # read the existing category labels
        for plot in chart.plots:
            print(list(plot.categories))
            cat = list(plot.categories)
        # read the existing series values
        for series in chart.series:
            ser = series.values
            print(series.values)
        try:
            # ---define new chart data---
            chart_data = CategoryChartData()
            chart_data.categories = cat
            chart_data.add_series('category', df['column'])
            # ---replace chart data---
            chart.replace_data(chart_data)
        except KeyError:
            continue
Using the code above, you can print the categories and the series values, then replace them with your new values (while keeping category the same).
I added the KeyError exception because without it, you get a "rId3" error. From the forums it seems like there is some XML writing issue in writing to PPTX.
