Read chart data from existing chart using python-pptx - python

I'd like to use the python-pptx library to read data from charts inside a presentation. I've seen the documentation to replace chart data, but I can't figure out how to read data.

Chart data is:
the chart's chart-type
its category names (and possibly hierarchical organization)
its series names
and its series values
These are available at the plot level, for example:
>>> chart.chart_type
DOUGHNUT
>>> plot = chart.plots[0]
>>> [c.label for c in plot.categories]
['West', 'North', 'South']
>>> series = plot.series[0]
>>> series.name
'Sales'
>>> series.values
(17.8, 24.5, 88.0)
There are subtleties depending on the chart/plot type; consult the API documentation to learn more about them:
https://python-pptx.readthedocs.io/en/latest/api/chart.html

Related

Display additional values in holoviews sankey labels or hover information

I would like to find a way to modify the labels on HoloViews Sankey diagrams so that they show the percentage values in addition to the numerical values.
For example:
import holoviews as hv
import pandas as pd
hv.extension('bokeh')
data = {'A':['XX','XY','YY','XY','XX','XX'],
'B':['RR','KK','KK','RR','RK','KK'],
'values':[10,5,8,15,19,1]}
df = pd.DataFrame(data, columns=['A','B','values'])
sankey = hv.Sankey(df)
For example, the 'From' label for 'YY', currently 'YY - 8', should become 'YY - 8 (13.7%)', i.e. with the percentage added.
I have found ways to change from the absolute value to percentage by using something along the lines of:
value_dim = hv.Dimension('Percentage', unit='%')
But can't find a way to have both values in the label.
Additionally, I tried to modify the hover tooltip. While searching, I found ways to reference and display various attributes in the hover information (through Bokeh tooltips), but it does not seem possible to manipulate this information.
This answer explains two possible ways to achieve the desired result. Let's start with the example DataFrame and the necessary imports.
import holoviews as hv
from holoviews import opts, dim  # only needed for option 2
import pandas as pd
data = {'A':['XX','XY','YY','XY','XX','XX'],
'B':['RR','KK','KK','RR','RK','KK'],
'values':[10,5,8,15,19,1],
}
df = pd.DataFrame(data)
Option 1
Use hv.Dimension(spec, **params), which lets you apply a formatter to a column through the value_format keyword. Here the formatter simply combines the value with its share of the grand total in percent.
total = df['values'].sum()
def fmt(x):
    # Append the value's share of the grand total, as a percentage with one decimal
    return f'{x} ({x / total * 100:.1f}%)'
hv.Sankey(df, vdims=hv.Dimension('values', value_format=fmt))
Option 2
Extend the DataFrame df by one column which stores the labels you want to use. This column can later be reused inside the Sankey with opts(labels=dim('labels')). To check that the calculations are correct, you can turn show_values on, but this duplicates the values inside the labels; therefore, in the final solution show_values is set to False. Finding the correct order of the labels can sometimes be tricky.
labels = []
for item in ['A', 'B']:
    grouper = df.groupby(item, sort=False)['values']
    total_sum = grouper.sum().sum()
    for name, group in grouper:
        _sum = group.sum()
        # Share of the grand total, in percent with one decimal
        _percent = round(_sum / total_sum * 100, 1)
        labels.append(f'{name} - {_sum} ({_percent}%)')
df['labels'] = labels
hv.Sankey(df).opts(show_values=False, labels=dim('labels'))
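The downside of this solution is that we apply a groupby to both columns 'A' and 'B', which HoloViews does as well, so it is not very efficient. It also only works here because the number of nodes (three in 'A' plus three in 'B') happens to equal the number of rows in df; otherwise pandas raises a ValueError when assigning df['labels'].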
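If the order-dependent list in option 2 is a concern, the per-node label strings can instead be computed as a mapping keyed by node name. This is a minimal pandas-only sketch: node_labels is a hypothetical helper, percentages are taken relative to the grand total of the values column (an assumption about what the question means by percentage), and a node that appeared in both columns would keep the label computed from 'B'. How the mapping is then handed to HoloViews is not shown here.

```python
import pandas as pd


def node_labels(df, columns=("A", "B"), value="values"):
    """Map each node name to 'name - total (pct%)'.

    Hypothetical helper: pct is the node's total as a share of df[value].sum().
    """
    grand_total = df[value].sum()
    labels = {}
    for col in columns:
        totals = df.groupby(col, sort=False)[value].sum()
        for name, node_sum in totals.items():
            labels[name] = f"{name} - {node_sum} ({node_sum / grand_total * 100:.1f}%)"
    return labels


df = pd.DataFrame({
    "A": ["XX", "XY", "YY", "XY", "XX", "XX"],
    "B": ["RR", "KK", "KK", "RR", "RK", "KK"],
    "values": [10, 5, 8, 15, 19, 1],
})
print(node_labels(df)["YY"])  # YY - 8 (13.8%)
```

Since 8 / 58 is about 13.79%, rounding to one decimal gives 13.8% rather than the 13.7% quoted in the question.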
Comment
Both solutions create nearly the same figure, except that the hover tooltips differ.

Pandas data frame gets sorted differently when filtering on different columns

I am using Plotly Dash to visualize data analysis that I have performed on an IPL database. I have a bunch of CSV files that I exported from SQL views.
I read these CSV files with pandas and pass the data selected by my filters to a Plotly graph.
The problem is that the data comes back sorted by a different column depending on which column I filter on: when I filter by Season_Id the data is sorted by runs, and when I filter by Team_Bowling it is sorted by match_id.
I don't understand this behavior of filtering or of the pandas DataFrame.
Here is my code and the output.
import pandas as pd
stats = pd.read_csv('data_files/All_Season_Batsman_Runs.csv')
# Filter 1: by player and season (both conditions combined in one mask)
kohli = stats[(stats.Player_Name == 'V Kohli') & (stats.Season_Id == 1)]
print(kohli)
# Filter 2: by player and bowling team
kohli = stats[(stats.Player_Name == 'V Kohli') & (stats.Team_Bowling == 1)]
print(kohli)
I am using
Pandas => 0.23.4
Python => 3.7
Judging by the index numbers, the original file is already sorted, possibly by season and runs. Boolean filtering keeps the rows in their original file order, so nothing unexpected is happening as far as I can tell.
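To make the result independent of the file's order, combine the conditions into one boolean mask and sort explicitly. A minimal sketch: filter_stats is a hypothetical helper, and the sample rows (including the Match_Id and Runs column names) are made-up stand-ins, since the CSV contents are not shown in the question.

```python
import pandas as pd

# Hypothetical stand-in for All_Season_Batsman_Runs.csv
stats = pd.DataFrame({
    "Player_Name": ["V Kohli", "V Kohli", "MS Dhoni", "V Kohli", "V Kohli"],
    "Season_Id": [1, 1, 1, 2, 2],
    "Team_Bowling": [3, 1, 1, 1, 4],
    "Match_Id": [12, 5, 7, 40, 33],
    "Runs": [23, 45, 10, 8, 61],
})


def filter_stats(df, sort_by=None, **conditions):
    """Keep rows where every given column equals its value; optionally sort.

    Boolean indexing preserves the original row order, so without
    sort_by the result comes back in whatever order the file had.
    """
    mask = pd.Series(True, index=df.index)
    for column, value in conditions.items():
        mask &= df[column] == value
    result = df[mask]
    return result.sort_values(sort_by) if sort_by else result


print(filter_stats(stats, Player_Name="V Kohli", Season_Id=1).index.tolist())  # [0, 1]
print(filter_stats(stats, sort_by="Runs", Player_Name="V Kohli", Team_Bowling=1)["Runs"].tolist())  # [8, 45]
```

Without sort_by, the second call would return runs in file order, [45, 8].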

Generating a Plotly Heat Map from a pandas pivot table

I've been searching this topic for a couple hours now, and still can't get my code to work. I am trying to generate a heat map from a pivot table I've created using pandas. I'm very new to coding and am probably not using the right terminology, but I'll try my best.
My table looks like this:
It has many more rows as well. I am trying to generate a plotly heat map with the countries on the y axis, the 4 ownership types on the x, and the numeric values being used as the z values. I've been getting a lot of errors, but I think I'm getting close because it gets to my last line and says "TypeError: Object of type 'DataFrame' is not JSON serializable." I've searched this error but can't find anything that I can understand. I set up the table like so, and am having trouble with the z, x, and y inputs:
data = [go.Heatmap(z=[Country_Ownership_df[['Company Owned', 'Franchise', 'Joint Venture', 'Licensed']]],
y=[Country_Ownership_df['Country']],
x=['Company Owned', 'Franchise', 'Joint Venture', 'Licensed'],
colorscale=[[0.0, 'white'], [0.000001, 'rgb(191, 0, 0)'], [.001, 'rgb(209, 95, 2)'], [.005, 'rgb(244, 131, 67)'], [.015, 'rgb(253,174,97)'], [.03, 'rgb(249, 214, 137)'], [.05, 'rgb(224,243,248)'], [0.1, 'rgb(116,173,209)'], [0.3, 'rgb(69,117,180)'], [1, 'rgb(49,54,149)']])]
layout = go.Layout(
margin = dict(t=30,r=260,b=30,l=260),
title='Ownership',
xaxis = dict(ticks=''),
yaxis = dict(ticks='', nticks=0 )
)
fig = go.Figure(data=data, layout=layout)
#iplot(fig)
plotly.offline.plot(fig, filename= 'tempfig3.html')
It's probably a fairly simple task, I just haven't learned much with coding and appreciate any help you could offer.
Plotly takes the data arguments as lists and doesn't support pandas DataFrames directly. If your DataFrame is already in the right format, with
data as the values ('z' in Plotly notation),
x-values as columns,
y-values as index,
then the following function works:
def df_to_plotly(df):
    return {'z': df.values.tolist(),
            'x': df.columns.tolist(),
            'y': df.index.tolist()}
Since it returns a dict, you can pass it directly as an argument to go.Heatmap:
import plotly.graph_objects as go
fig = go.Figure(data=go.Heatmap(df_to_plotly(df)))
fig.show()
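In the question, Country is a regular column rather than the index, so it has to become the index before calling df_to_plotly. A minimal sketch with made-up numbers (the real table is not shown); it stays pandas-only, and the resulting dict can then be passed to go.Heatmap as in the answer.

```python
import pandas as pd


def df_to_plotly(df):
    return {'z': df.values.tolist(),
            'x': df.columns.tolist(),
            'y': df.index.tolist()}


ownership_cols = ['Company Owned', 'Franchise', 'Joint Venture', 'Licensed']

# Made-up rows standing in for Country_Ownership_df from the question
Country_Ownership_df = pd.DataFrame({
    'Country': ['US', 'CA', 'GB'],
    'Company Owned': [10, 3, 0],
    'Franchise': [2, 0, 1],
    'Joint Venture': [0, 1, 0],
    'Licensed': [5, 4, 2],
})

# Countries on the y axis, ownership types on the x axis, counts as z
heat = df_to_plotly(Country_Ownership_df.set_index('Country')[ownership_cols])
print(heat['y'])  # ['US', 'CA', 'GB']
```

The question's error most likely came from wrapping a DataFrame in a list (z=[...]), which Plotly cannot serialize; plain nested lists like heat['z'] avoid that.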
Apparently Plotly doesn't directly support DataFrames, but you can turn your DataFrame into a dictionary of lists like this:
Country_Ownership_df[['foo', 'bar']].to_dict('list')
Then non-Pandas tools like Plotly should work, because dicts and lists are JSON serializable by default.

Plotting dictionary values into multi_line / timeseries Bokeh chart

Note from maintainers: This question is about the obsolete bokeh.charts API, removed years ago. For information on plotting with modern Bokeh, including timeseries, see:
https://docs.bokeh.org/en/latest/docs/user_guide/plotting.html
I have a dictionary whose keys are dates (as strings) and whose values are lists of floats.
Dictionary looks like this:
dict = {'2017-03-23': [1.07874, 1.07930, 1.07917, 1.07864],
        '2017-03-27': [1.08382, 1.08392, 1.08410, 1.08454],
        '2017-03-24': [1.07772, 1.07721, 1.07722, 1.07668]}
I want to display each date as a separate line on a Bokeh line chart. Since the date range will change over time, I do not want to hard-code p1.line, p2.line, p3.line (a static set) for each date, because the number of plotted dates will vary.
I have tried to follow tutorials here: http://docs.bokeh.org/en/0.9.3/docs/user_guide/charts.html but I keep struggling and getting errors.
Here is my code:
#input dates at this occasion
dates = ['2017-03-27','2017-03-24', '2017-03-23']
#dataframe is taken from input and contains columns date,time,close and other columns that I am not using
df
#I create a dictionary of dataframe in the structure described above
dict = {k: list(v) for k, v in df.groupby("date")["close"]}
#i want to plot chart
output_file("chart2.html")
p = figure(title="Dates line charts", x_axis_label='Index', y_axis_label='Price')
p = TimeSeries(dict, index='Index', legend=True, title="FX", ylabel='Price Prices')
show(p)
I am getting this error:
AttributeError: unexpected attribute 'index' to Chart, possible attributes are above, background_fill_alpha, background_fill_color, below, border_fill_alpha, border_fill_color, css_classes, disabled, extra_x_ranges, extra_y_ranges, h_symmetry, height, hidpi, inner_height, inner_width, js_callbacks, left, lod_factor, lod_interval, lod_threshold, lod_timeout, min_border, min_border_bottom, min_border_left, min_border_right, min_border_top, name, outline_line_alpha, outline_line_cap, outline_line_color, outline_line_dash, outline_line_dash_offset, outline_line_join, outline_line_width, plot_height, plot_width, renderers, right, sizing_mode, tags, title, title_location, tool_events, toolbar, toolbar_location, toolbar_sticky, v_symmetry, webgl, width, x_mapper_type, x_range, xlabel, xscale, y_mapper_type, y_range, ylabel or yscale
Thank you for the help.
You are looking at very old documentation (0.9.3). The latest documentation (0.12.4) for bokeh Timeseries can be found here.
As you can see, TimeSeries no longer accepts an index parameter. The available parameters are:
data (list(list), numpy.ndarray, pandas.DataFrame, list(pd.Series)) – a 2d data source with columns of data for each stepped line.
x (str or list(str), optional) – specifies variable(s) to use for the x axis.
y (str or list(str), optional) – specifies variable(s) to use for the y axis.
builder_type (str or Builder, optional) – the type of builder to use to produce the renderers. Supported options are 'line', 'step', or 'point'.
Just follow the example given in the most recent documentation and you should not run into the same problem.
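For readers on current Bokeh (see the maintainers' note above), bokeh.charts is gone, but the original goal, one line per date without hard-coding how many dates there are, only needs a loop over the dictionary with the plain bokeh.plotting API. A minimal sketch, assuming Bokeh 1.4 or later (for legend_label); plot_dates is a hypothetical helper name, and the x axis is simply the position within each day's list.

```python
from bokeh.palettes import Category10
from bokeh.plotting import figure


def plot_dates(prices_by_date):
    """Return a figure with one line per date key, however many there are."""
    p = figure(title="Dates line charts", x_axis_label="Index", y_axis_label="Price")
    colors = Category10[10]
    for i, (date, prices) in enumerate(sorted(prices_by_date.items())):
        p.line(
            x=list(range(len(prices))),  # position within that day's list
            y=prices,
            legend_label=date,
            line_color=colors[i % len(colors)],  # colors repeat after 10 dates
        )
    return p
```

To render it, call output_file("chart2.html") and then show(plot_dates(dict)), both imported from bokeh.plotting.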

Formatting in HoverTool

I love how easy it is to set up basic hover feedback with HoverTool, but I'm wrestling with a couple aspects of the display. I have time-series data, with measurements that represent amounts in US$. This data starts out life as a pandas.Series. Legible plotting is easy (following assumes jupyter notebook):
p = figure(title='Example currency', x_axis_type='datetime',
plot_height=200, plot_width=600, tools='')
p.line(my_data.index, my_data)
p.yaxis[0].formatter = NumeralTickFormatter(format='$0,0')
show(p)
This shows me the time-series, with date-formatting on the x-axis and y-axis values that look like "$150,000", "$200,000", "$250,000", etc. I have two questions about HoverTool behavior:
Controlling formatting for $x and $y.
Accessing the name of the dataset under the cursor.
Simply adding a HoverTool allows me to see values, but in unhelpful units:
p.add_tools(HoverTool())
The corresponding tooltip values with these defaults show "1.468e+5" rather than "$146,800" (or even "146800", the underlying Series value); similarly, the date value appears as "1459728000000" rather than (say) "2016-04-04". I can manually work around this display issue by making my pandas.Series into a ColumnDataSource and adding string columns with the desired formatting:
# Make sure Series and its index have `name`, before converting to DataFrame
my_data.name = 'revenue'
my_data.index.name = 'day'
df = my_data.reset_index()
# Add str columns for tooltip display
df['daystr'] = df['day'].dt.strftime('%m %b %Y')
df['revstr'] = df['revenue'].apply(lambda x: '${:,d}'.format(int(x)))
cds = ColumnDataSource(df)
p = figure(title='Example currency', x_axis_type='datetime',
plot_height=200, plot_width=600, tools='')
p.line('day', 'revenue', source=cds)
p.yaxis[0].formatter = NumeralTickFormatter(format='$0,0')
p.add_tools(HoverTool(tooltips=[('Amount', '@revstr'), ('Day', '@daystr')]))
show(p)
but is there a way to handle the formatting in the HoverTool configuration instead? That seems much more desirable than all the data-set transformation that's required above. I looked through the documentation and (quickly) scanned through the source, and didn't see anything obvious that would save me from building the "output" columns as above.
Related to that, when I have several lines in a single plot, is there any way for me to access the name (or perhaps legend value) of each line within HoverTool.tooltips? It would be extremely helpful to include something in the tooltip to differentiate which dataset values are coming from, rather than needing to rely on (say) line-color in conjunction with the tool-tip display. For now, I've added an additional column to the ColumnDataSource that's just the string value I want to show; that obviously only works for datasets that include a single measurement column. When multiple lines are sharing an underlying ColumnDataSource, it would be sufficient to access the column-name that's provided to y.
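I know this is two years late, but for other people who come across this: HoverTool can format fields itself, using a {format} spec in each tooltip plus a formatters mapping (this assumes columns named Date and Value in the data source):
p.add_tools(HoverTool(
    tooltips=[
        ('Date', '@Date{%F}'),
        ('Value', '@Value{$0,0}')],
    formatters={
        '@Date': 'datetime',  # Bokeh >= 2.0 expects the '@' in these keys
        '@Value': 'numeral'},
    mode='vline'
))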
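For the second part of the question (telling several lines apart), one option is to give each renderer a name, show it with the $name tooltip field, and attach one HoverTool per renderer so each tooltip can point at its own column. A minimal sketch, assuming Bokeh 2.x or later (height/width arguments and '@'-prefixed formatter keys); the store_a/store_b columns and their numbers are made up.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool, NumeralTickFormatter
from bokeh.plotting import figure

# Made-up data: two revenue series sharing one ColumnDataSource
df = pd.DataFrame({
    "day": pd.date_range("2016-04-04", periods=3, freq="D"),
    "store_a": [146800, 150000, 152500],
    "store_b": [98000, 101250, 99500],
})
source = ColumnDataSource(df)

p = figure(title="Example currency", x_axis_type="datetime",
           height=200, width=600, tools="")
p.yaxis[0].formatter = NumeralTickFormatter(format="$0,0")

for column, color in [("store_a", "navy"), ("store_b", "firebrick")]:
    # `name` is what the $name field shows in the tooltip
    renderer = p.line("day", column, source=source, name=column, line_color=color)
    p.add_tools(HoverTool(
        renderers=[renderer],  # this tool only reacts to this line
        tooltips=[
            ("Series", "$name"),
            ("Day", "@day{%F}"),
            ("Amount", f"@{column}{{$0,0}}"),  # e.g. '@store_a{$0,0}'
        ],
        formatters={"@day": "datetime"},
        mode="vline",
    ))
```

Calling show(p) in a notebook then displays one tooltip per line, labelled with that line's column name.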
