Create secondary axis for dataframe using pandas, python

Here is an example of how a dataframe is converted into a stacked column chart. What I want now is to add a line to this chart, and for that line I want an additional secondary y-axis. I don't want to use matplotlib because, as far as I can tell, that library only displays the data, and I want to save the chart in an Excel file. How can I add a secondary y-axis using pandas?
import pandas as pd
from vincent.colors import brews

if __name__ == '__main__':
    # Some sample data to plot.
    farm_1 = {'apples': 10, 'berries': 32, 'squash': 21, 'melons': 13, 'corn': 18}
    farm_2 = {'apples': 15, 'berries': 43, 'squash': 17, 'melons': 10, 'corn': 22}
    farm_3 = {'apples': 6, 'berries': 24, 'squash': 22, 'melons': 16, 'corn': 30}
    farm_4 = {'apples': 12, 'berries': 30, 'squash': 15, 'melons': 9, 'corn': 15}
    data = [farm_1, farm_2, farm_3, farm_4]
    index = ['Farm 1', 'Farm 2', 'Farm 3', 'Farm 4']
    # Create a Pandas dataframe from the data.
    df = pd.DataFrame(data, index=index)
    # Create a Pandas Excel writer using XlsxWriter as the engine.
    excel_file = 'C:/Users/gegij/Desktop/stacked_column_farms.xlsx'
    sheet_name = 'Sheet1'
    writer = pd.ExcelWriter(excel_file, engine='xlsxwriter')
    df.to_excel(writer, sheet_name=sheet_name)
    # Access the XlsxWriter workbook and worksheet objects from the dataframe.
    workbook = writer.book
    worksheet = writer.sheets[sheet_name]
    # Create a chart object.
    chart = workbook.add_chart({'type': 'column', 'subtype': 'stacked'})
    # Configure the series of the chart from the dataframe data.
    for col_num in range(1, len(farm_1) + 1):
        chart.add_series({
            'name': ['Sheet1', 0, col_num],
            'categories': ['Sheet1', 1, 0, 4, 0],
            'values': ['Sheet1', 1, col_num, 4, col_num],
            'fill': {'color': brews['Pastel1'][col_num - 1]},
            'gap': 20,
        })
    # Configure the chart axes.
    chart.set_x_axis({'name': 'Total Produce'})
    chart.set_y_axis({'name': 'Farms', 'major_gridlines': {'visible': False}})
    # Insert the chart into the worksheet.
    worksheet.insert_chart('H2', chart)
    # Close the Pandas Excel writer and output the Excel file
    # (writer.save() was removed in pandas 2.0; close() also saves).
    writer.close()
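One possible way to get the line on a secondary y-axis without matplotlib is XlsxWriter's combined charts: build a separate line chart, mark its series with 'y2_axis': True, and merge it into the column chart with combine(). The sketch below is only an illustration under assumptions: the 'total' column, the reduced sample data, the axis names and the temporary output path are all made up for the example and are not part of the question.

```python
import os
import tempfile

import pandas as pd

# Reduced sample data; 'total' is a hypothetical extra column added only
# so that there is something to plot on the secondary axis.
index = ['Farm 1', 'Farm 2', 'Farm 3', 'Farm 4']
df = pd.DataFrame(
    {'apples': [10, 15, 6, 12], 'berries': [32, 43, 24, 30]},
    index=index,
)
df['total'] = df.sum(axis=1)

# Assumed output location (a temporary directory) for this sketch.
excel_file = os.path.join(tempfile.gettempdir(), 'stacked_with_line.xlsx')
sheet_name = 'Sheet1'
last_row = len(df)  # data occupies rows 1..last_row (row 0 is the header)

with pd.ExcelWriter(excel_file, engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name=sheet_name)
    workbook = writer.book
    worksheet = writer.sheets[sheet_name]

    # Primary chart: stacked columns for the produce columns (1 and 2).
    column_chart = workbook.add_chart({'type': 'column', 'subtype': 'stacked'})
    for col_num in (1, 2):
        column_chart.add_series({
            'name': [sheet_name, 0, col_num],
            'categories': [sheet_name, 1, 0, last_row, 0],
            'values': [sheet_name, 1, col_num, last_row, col_num],
        })

    # Secondary chart: a line for the 'total' column (3) on the y2 axis.
    line_chart = workbook.add_chart({'type': 'line'})
    line_chart.add_series({
        'name': [sheet_name, 0, 3],
        'categories': [sheet_name, 1, 0, last_row, 0],
        'values': [sheet_name, 1, 3, last_row, 3],
        'y2_axis': True,
    })

    # Merge the line chart into the column chart and label the axes.
    # The secondary axis is configured on the line chart.
    column_chart.combine(line_chart)
    column_chart.set_x_axis({'name': 'Farms'})
    column_chart.set_y_axis({'name': 'Produce'})
    line_chart.set_y2_axis({'name': 'Total'})

    worksheet.insert_chart('H2', column_chart)
```

Only the combined (primary) chart is inserted into the worksheet; the line chart travels along with it.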

Related

Python: Iterating though dataframe columns as values in a function that prints charts

I'm trying to iterate through numeric fields in a data frame and create two separate bar charts one for Test1 and another for Test2 scores grouped by Name. I have a for loop that I get a type error on. I have a small sample of the data below but this for loop would run for data frame larger than 25 fields. Below is my code and error:
import pandas as pd
import matplotlib.pyplot as plt
data = {'Name': ['Tom', 'Joseph', 'Krish', 'John', 'Tom', 'Joseph', 'Krish', 'John'],
        'Test1': [20, 21, 19, 18, 30, 33, 12, 10],
        'Test2': [78, 89, 77, 91, 95, 90, 87, 70]}
df = pd.DataFrame(data)
for columns in df.columns[1:]:
    data = df[(df.columns > 80 )].groupby(
        df.Name, as_index = True).agg(
        {columns: "sum"})
    fig, (ax) = plt.subplots( figsize = (24,7))
    data.plot(kind = 'bar', stacked = False,
              ax = ax)
TypeError: '>' not supported between instances of 'str' and 'int'
The error comes from the expression df.columns > 80 in the following line. df.columns is the index of column labels ('Name', 'Test1', 'Test2'), not the data, so the program tries to compare strings with the integer 80.
data = df[(df.columns > 80 )].groupby(df.Name, as_index = True).agg({columns: "sum"})
Because the labels are strings, the comparison fails before any data is examined. Through some trial and error, I revised your program to compare the values in the "Test1" and "Test2" columns instead. The revised code follows.
import pandas as pd
import matplotlib.pyplot as plt
data = {'Name': ['Tom', 'Joseph', 'Krish', 'John', 'Tom', 'Joseph', 'Krish', 'John'],
        'Test1': [20, 21, 19, 18, 30, 33, 12, 10],
        'Test2': [78, 89, 77, 91, 95, 90, 87, 70]}
df = pd.DataFrame(data)
for columns in df.columns[1:]:
    data = df[(df['Test1'] > 20) | (df['Test2'] > 80)].groupby(df.Name, as_index = True).agg({columns: "sum"})
    fig, (ax) = plt.subplots( figsize = (24,7))
    data.plot(kind = 'bar', stacked = False, ax = ax)
plt.show()
Running that program produced the two bar charts.
You might want to experiment with the comparison values, but I think this should provide you with the information to move forward on your program.
Hope that helped.
Regards.
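As a possible variation on the revised code, each numeric column could be filtered on its own values rather than on a shared condition. This is a minimal sketch, assuming per-column thresholds; the thresholds dictionary and the totals name are hypothetical and chosen only for illustration. Plotting is left out so the aggregation itself is easy to inspect.

```python
import pandas as pd

data = {
    'Name': ['Tom', 'Joseph', 'Krish', 'John', 'Tom', 'Joseph', 'Krish', 'John'],
    'Test1': [20, 21, 19, 18, 30, 33, 12, 10],
    'Test2': [78, 89, 77, 91, 95, 90, 87, 70],
}
df = pd.DataFrame(data)

# Hypothetical per-column cut-offs; adjust to whatever the analysis needs.
thresholds = {'Test1': 20, 'Test2': 80}

# df.columns holds labels (strings), so select numeric columns by dtype
# instead of comparing the labels with a number.
numeric_columns = df.select_dtypes(include='number').columns

totals = {}
for column in numeric_columns:
    # Filter on the column's values, then sum the surviving rows per name.
    mask = df[column] > thresholds[column]
    totals[column] = df.loc[mask].groupby('Name')[column].sum()

print(totals['Test1'])
print(totals['Test2'])
```

Each entry in totals is a Series indexed by name, so it can be passed to .plot(kind='bar') inside the loop in the same way as in the code above.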

Insert description under graph in pandas

I used this example to get started with pandas:
http://pandas-xlsxwriter-charts.readthedocs.io/chart_grouped_column.html#chart-grouped-column
I also want to save the chart in Excel, just like in the example.
I would like to know how, in the example above, I can add a description or a table under the chart.
The only related thing I found was this:
Add graph description under graph in pylab
But that is done with pylab. Is the same possible with pandas and an Excel chart?
In Excel you could add a text box and insert some text. That wasn't possible with XlsxWriter when this answer was written, although later releases provide worksheet.insert_textbox().
You could use the chart title property but in Excel the title is generally at the top and not the bottom.
You can reposition it, manually, in Excel. This is also possible with XlsxWriter using the layout options of the different chart objects.
Here is an example:
import xlsxwriter
workbook = xlsxwriter.Workbook('chart.xlsx')
worksheet = workbook.add_worksheet()
# Create a new Chart object.
chart = workbook.add_chart({'type': 'column'})
# Write some data to add to plot on the chart.
data = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [3, 6, 9, 12, 15],
]
worksheet.write_column('A1', data[0])
worksheet.write_column('B1', data[1])
worksheet.write_column('C1', data[2])
# Configure the charts. In simplest case we just add some data series.
chart.add_series({'values': '=Sheet1!$A$1:$A$5'})
chart.add_series({'values': '=Sheet1!$B$1:$B$5'})
chart.add_series({'values': '=Sheet1!$C$1:$C$5'})
chart.set_x_axis({'name': 'X axis title'})
chart.set_y_axis({'name': 'Y axis title'})
chart.set_title({
    'name': 'Here is some text to describe the chart',
    'name_font': {'bold': False, 'size': 10},
    'layout': {
        'x': 0.25,
        'y': 0.90,
    }
})
chart.set_plotarea({
    'layout': {
        'x': 0.11,
        'y': 0.10,
        'width': 0.75,
        'height': 0.60,
    }
})
#Insert the chart into the worksheet.
worksheet.insert_chart('A7', chart)
workbook.close()
Note, you will need to do some trial and error with the layout property to get the layout that you want.
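An alternative to repositioning the title, in XlsxWriter releases that include worksheet.insert_textbox(), is to place a text box in the cells just below the chart. The following is only a sketch under assumptions: the cell 'C17', the text box size and the temporary output path are guesses made for the example and will probably need adjusting for a real layout.

```python
import os
import tempfile

import xlsxwriter

# Assumed output location (a temporary directory) for this sketch.
path = os.path.join(tempfile.gettempdir(), 'chart_with_caption.xlsx')
workbook = xlsxwriter.Workbook(path)
worksheet = workbook.add_worksheet()

# Some data to plot.
worksheet.write_column('A1', [1, 2, 3, 4, 5])

chart = workbook.add_chart({'type': 'column'})
chart.add_series({'values': '=Sheet1!$A$1:$A$5'})
worksheet.insert_chart('C1', chart)

# A default-sized chart covers roughly 15 rows, so C17 is an assumed
# position just below it; adjust for other chart sizes.
caption = 'Here is some text to describe the chart'
worksheet.insert_textbox('C17', caption, {'width': 480, 'height': 40})

workbook.close()
```

The text box is a separate worksheet object, so it does not move with the chart if the chart is later dragged in Excel.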

Bokeh not displaying plot for pandas

I can't get Bokeh to display my plot. This is my Python code.
import pandas as pd
from bokeh.plotting import figure, ColumnDataSource
from bokeh.io import output_file, show
if __name__ == '__main__':
    file = 'Overview Data.csv'
    overview_df = pd.read_csv(file)
    overview_ds = ColumnDataSource(overview_df)
    output_file('Wins across Seasons.html')
    print(overview_ds.data)
    p = figure(plot_width=400, plot_height=400)
    # add a circle renderer with a size, color, and alpha
    p.circle('Season', 'Wins', source = overview_ds, size=20, color="navy", alpha=0.5)
    # show the results
    show(p)
I checked my Chrome browser Inspect Element and the console shows the following.
Wins across Seasons.html:17 [bokeh] could not set initial ranges
e.set_initial_range # Wins across Seasons.html:17
This only seems to happen when I am reading from a file. Hard-coding the x and y coordinates works.
I have checked other posts, but none of the fixes worked. All my packages are up to date.
This is the file I am reading:
Season,Matches Played,Wins,Losses,Goals,Goals Conceded,Clean Sheets
2011-12,38,28,5,89,33,20
2010-11,38,23,4,78,37,15
2009-10,38,27,7,86,28,19
2008-09,38,28,4,68,24,24
2007-08,38,27,5,80,22,21
2006-07,38,28,5,83,27,16
This is the output of the print statement.
{'Season': array(['2011-12', '2010-11', '2009-10', '2008-09', '2007-08', '2006-07'],
dtype=object), 'Matches Played': array([38, 38, 38, 38, 38, 38], dtype=int64), 'Wins': array([28, 23, 27, 28, 27, 28], dtype=int64), 'Losses': array([5, 4, 7, 4, 5, 5], dtype=int64), 'Goals': array([89, 78, 86, 68, 80, 83], dtype=int64), 'Goals Conceded': array([33, 37, 28, 24, 22, 27], dtype=int64), 'Clean Sheets': array([20, 15, 19, 24, 21, 16], dtype=int64), 'index': array([0, 1, 2, 3, 4, 5], dtype=int64)}
Bokeh does not know what to do with those string dates unless you tell it. There are two basic possibilities:
1. Keep them as strings, and treat them as categorical factors. You can do that by telling Bokeh what the factors are when you create the plot:
p = figure(plot_width=400, plot_height=400,
           x_range=list(overview_df.Season.unique()))
If you want a different order of categories you can re-order x_range however you like.
2. Convert them to real datetime values and use a datetime axis. You can do this by telling Pandas to parse column 0 as a date field:
overview_df = pd.read_csv(file, parse_dates=[0])
and telling Bokeh to use a datetime axis:
p = figure(plot_width=400, plot_height=400, x_axis_type="datetime")
You can also convert the 'Season' column to datetime after reading the file to get an output:
overview_df = pd.read_csv(file)
overview_df.Season = pd.to_datetime(overview_df.Season)
overview_ds = ColumnDataSource(overview_df)
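One thing to watch with the datetime route: a value such as '2011-12' has the shape of a year-month string, so date parsing may read it as December 2011 rather than as the 2011-12 season. A possible workaround, sketched here with pandas only, is to parse just the starting year with an explicit format. The SeasonStart column name is hypothetical, and the CSV is a shortened inline copy of the question's data.

```python
import io

import pandas as pd

# A few rows from the question's CSV, inlined so the sketch is self-contained.
csv_text = """Season,Wins
2011-12,28
2010-11,23
2009-10,27
"""
overview_df = pd.read_csv(io.StringIO(csv_text))

# Take the first four characters as the season's starting year and parse
# them with an explicit format, so '2011-12' becomes 2011-01-01.
start_year = overview_df['Season'].str[:4]
overview_df['SeasonStart'] = pd.to_datetime(start_year, format='%Y')

print(overview_df[['Season', 'SeasonStart', 'Wins']])
```

The new column could then be used as the x value on a datetime axis, while the original Season strings stay available for labels.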

XlsxWriter not writing to multiple columns simultaneously

I'm trying to generate a simple pie chart for better visualization of some data, but XlsxWriter won't write data to two columns simultaneously, whereas another example is working fine.
I can't see where I might be going wrong.
Following is the data:
{'core2': [10.3], 'core3': [4.17], 'core0': [58.68], 'core1': [24.42], 'core6': [0.02], 'core7': [0.0], 'core4': [2.31], 'core5': [0.12]}
The actual data is passed as a list, [10.3, 4.17, 58.68, 24.42, 0.02, 0.0, 2.31, 0.12], to the function below.
Here is the code:
def draw_simultaneously_busy_cores(type_of_chart,data,workbook):
    print(data)
    worksheet = workbook.add_worksheet()#'simultaneously_busy_cores')
    bold = workbook.add_format({'bold': 1})
    headings = [0, 1, 2, 3, 4, 5, 6, 7]
    worksheet.write_column('$A$1', headings, bold)
    worksheet.write_column('$B$1',headings)
    chart1 = workbook.add_chart({'type': type_of_chart})
    chart1.add_series({
        'name': 'Simultaneous Busy Cores',
        'categories': ['simultaneously_busy_cores', '=simultaneously_busy_cores!$A$1:$A$8'],
        'values': ['simultaneously_busy_cores', '=simultaneously_busy_cores!$B$1:$B$8'],
        #'data_labels': {'percentage': True, }
    })
    #Add a title.
    chart1.set_title({'name': 'Simultaneous Busy Cores'})
    #Set an Excel chart style. Colors with white outline and shadow.
    chart1.set_style(10)
    #Insert the chart into the worksheet (with an offset).
    worksheet.insert_chart('C2', chart1, {'x_offset': 25, 'y_offset': 10})
Thanks in advance.
It should work. Here is an example with sample data:
import xlsxwriter
def draw_simultaneously_busy_cores(type_of_chart, data, workbook):
    worksheet = workbook.add_worksheet('simultaneously_busy_cores')
    bold = workbook.add_format({'bold': 1})
    worksheet.write_column('A1', data, bold)
    worksheet.write_column('B1', data)
    chart1 = workbook.add_chart({'type': type_of_chart})
    chart1.add_series({
        'name': 'Simultaneous Busy Cores',
        'categories': '=simultaneously_busy_cores!$A$1:$A$8',
        'values': '=simultaneously_busy_cores!$B$1:$B$8',
    })
    #Add a title.
    chart1.set_title({'name': 'Simultaneous Busy Cores'})
    #Set an Excel chart style. Colors with white outline and shadow.
    chart1.set_style(10)
    #Insert the chart into the worksheet (with an offset).
    worksheet.insert_chart('C2', chart1, {'x_offset': 25, 'y_offset': 10})
workbook = xlsxwriter.Workbook('test.xlsx')
data = [0, 1, 2, 3, 4, 5, 6, 7]
draw_simultaneously_busy_cores('line', data, workbook)
workbook.close()
The chart categories and values syntax in your example is incorrect. You are mixing the list and string syntaxes. Read through the documentation and examples again.
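To illustrate the two syntaxes mentioned above, here is a small sketch showing each form on its own: a range is given either as a single formula string or as a list of [sheet name, first_row, first_col, last_row, last_col] with zero-indexed numbers, never as a list that contains a formula string. The temporary output path and the variable names categories_str and values_list are assumptions made for the example.

```python
import os
import tempfile

import xlsxwriter
from xlsxwriter.utility import xl_range_abs

sheet = 'simultaneously_busy_cores'
# Assumed output location (a temporary directory) for this sketch.
path = os.path.join(tempfile.gettempdir(), 'range_syntax.xlsx')
workbook = xlsxwriter.Workbook(path)
worksheet = workbook.add_worksheet(sheet)

# Core numbers in column A, the question's values in column B.
worksheet.write_column('A1', list(range(8)))
worksheet.write_column('B1', [10.3, 4.17, 58.68, 24.42, 0.02, 0.0, 2.31, 0.12])

# String syntax: an Excel-style formula naming the sheet and absolute range.
categories_str = '=' + sheet + '!' + xl_range_abs(0, 0, 7, 0)

# List syntax: [sheet name, first_row, first_col, last_row, last_col],
# all zero-indexed.
values_list = [sheet, 0, 1, 7, 1]

chart = workbook.add_chart({'type': 'pie'})
chart.add_series({
    'name': 'Simultaneous Busy Cores',
    'categories': categories_str,  # string form
    'values': values_list,         # list form
})
worksheet.insert_chart('D2', chart)
workbook.close()
```

Either form can be used for either property; the list form tends to be more convenient when the row count is computed at run time.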

Openpyxl charts - data series from random unconnected cells possible?

I'm currently experimenting with the Python module openpyxl, trying to automate some tasks at work and generate spreadsheets automatically. For one of the required sheets I need to generate a scatter chart from tabulated data. However, the scatter chart should consist of multiple lines connecting two points each, so each of the individual x/y series in the scatter chart should connect only two points.
Generally I found from the openpyxl documentation that scatter charts are generated like in this small example:
from openpyxl import Workbook
from openpyxl.chart import (
    ScatterChart,
    Reference,
    Series,
)
wb = Workbook()
ws = wb.active
rows = [
    ['Size', 'Batch 1', 'Batch 2'],
    [2, 40, 30],
    [3, 40, 25],
    [4, 50, 30],
    [5, 30, 25],
    [6, 25, 35],
    [7, 20, 40],
]
for row in rows:
    ws.append(row)
chart = ScatterChart()
chart.title = "Scatter Chart"
chart.style = 13
chart.x_axis.title = 'Size'
chart.y_axis.title = 'Percentage'
xvalues = Reference(ws, min_col=1, min_row=2, max_row=7)
for i in range(2, 4):
    values = Reference(ws, min_col=i, min_row=1, max_row=7)
    series = Series(values, xvalues, title_from_data=True)
    chart.series.append(series)
ws.add_chart(chart, "A10")
wb.save("scatter.xlsx")
However, the x (and y) coordinates of the two points I would like to connect in the scatter chart are not located in adjacent cells.
When I add the data series manually in Excel by holding Ctrl and selecting two cells, I get something like this:
'Sheet!$A$4;Sheet!$A$6'
instead of
'Sheet!$A$4:$A$6'
which is what I get when dragging the cursor to select a range of cells.
For two individual non-adjacent cells this means that I do not have a clear min_row/min_col/max_row etc., only a list of cell pairs (for both x and y). Is there a way to create a data series in openpyxl from a list of cells instead of a connected/adjacent range?
Help would be much appreciated! :)
There are currently no plans to support non-contiguous cell ranges in chart series. I would suggest you try and arrange your data or create references to it that will allow you to work with contiguous ranges.
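The suggestion to create references that form contiguous ranges could be sketched as follows: each pair of non-adjacent points is mirrored into adjacent helper cells with formulas, and the chart series is built from those helper cells. This is only an illustration under assumptions: the helper columns D and E, the segments list, the sample points and the temporary output path are all hypothetical.

```python
import os
import tempfile

from openpyxl import Workbook
from openpyxl.chart import Reference, ScatterChart, Series

wb = Workbook()
ws = wb.active

# Original data: x in column A, y in column B (rows 1-6).
points = [(2, 40), (3, 40), (4, 50), (5, 30), (6, 25), (7, 20)]
for point in points:
    ws.append(point)

# Hypothetical pairs of non-adjacent source rows to join with a line.
segments = [(1, 4), (2, 6)]

chart = ScatterChart()
chart.title = 'Segments from non-adjacent cells'

helper_row = 1
for first, second in segments:
    # Mirror the two points into adjacent helper cells (columns D and E)
    # using formulas, so the chart series can use a contiguous range.
    for offset, source_row in enumerate((first, second)):
        ws.cell(row=helper_row + offset, column=4, value=f'=A{source_row}')
        ws.cell(row=helper_row + offset, column=5, value=f'=B{source_row}')
    xvalues = Reference(ws, min_col=4, min_row=helper_row, max_row=helper_row + 1)
    yvalues = Reference(ws, min_col=5, min_row=helper_row, max_row=helper_row + 1)
    chart.series.append(Series(yvalues, xvalues, title=f'Rows {first} and {second}'))
    helper_row += 2

ws.add_chart(chart, 'G2')
# Assumed output location (a temporary directory) for this sketch.
path = os.path.join(tempfile.gettempdir(), 'scatter_segments.xlsx')
wb.save(path)
```

openpyxl writes the formulas without evaluating them, so the helper cells get their values when the file is opened in Excel. Writing the numbers directly instead of formulas would also work, at the cost of losing the link to the original cells.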
Are you sure this doesn't work?
I've modified the example for a situation like yours and it seems to work for me:
from openpyxl import Workbook
from openpyxl.chart import (
    ScatterChart,
    Reference,
    Series,
)
wb = Workbook()
ws = wb.active
rows = [
    ['Size'],
    [2, 'Batch 1', 'Batch 2'],
    [3, 40, 30],
    [4, 40, 25],
    [5, 50, 30],
    [6, 30, 25],
    [7, 25, 35],
    [None, 20, 40],
]
for row in rows:
    ws.append(row)
chart = ScatterChart()
chart.title = "Scatter Chart"
chart.style = 13
chart.x_axis.title = 'Size'
chart.y_axis.title = 'Percentage'
xvalues = Reference(ws, min_col=1, min_row=2, max_row=7)
for i in range(2, 4):
    values = Reference(ws, min_col=i, min_row=2, max_row=8)
    series = Series(values, xvalues, title_from_data=True)
    chart.series.append(series)
ws.add_chart(chart, "A10")
wb.save("scatter.xlsx")
