Sorting columns with pandas / xlsxwriter after adding formulas - python

I have a dataframe with a few columns (the exact contents are irrelevant to this problem), and I want to sort the columns in a specific order.
The issue is that I have a bunch of formulas that refer to Excel tables (which I'm creating with xlsxwriter's worksheet.add_table), for example:
planned_units = '=Table1[@[Spend]]/Table1[@[CP]]'
So if I add those formulas by simply adding a column in pandas:
df['newformula'] = planned_units
it won't work, I think because I added a formula that references a table before actually adding the table. So sorting the columns before adding the formulas won't work, because:
I'm adding the formulas later (after creating the table), but I also want to sort the columns that I just added;
if I add formulas referencing an Excel table before add_table, then those formulas won't work in Excel.
It seems that xlsxwriter doesn't allow me to sort columns in any way (maybe I'm wrong?), so I don't see any way of sorting the columns once I have my final 'product' (after adding all the columns with formulas).
Working formulas still matter more than sorted columns, but I'd happily welcome any ideas on how to sort them at this point.
thanks!
PS Code example:
import pandas as pd
import xlsxwriter
# simple dataframe with 3 columns
input_df = pd.DataFrame({'column_a': ['x', 'y', 'z'],
'column_b': ['red', 'white', 'blue'],
'column_c': ['a', 'e', 'i'],
})
output_file = 'output.xlsx'
# formula I want to add
column_concatenation = '=CONCATENATE(Table1[@[column_a]], " ", Table1[@[column_b]])'
# if adding formulas with pandas were possible, I would do it like this:
# input_df['concatenation'] = column_concatenation
# but it's not possible, since Excel reports errors when opening the file!
# adding excel table with xlsxwriter:
workbook = xlsxwriter.Workbook(output_file)
worksheet = workbook.add_worksheet("Sheet with formula")
# here I would change column order only IF formulas added with pandas would work! so no-no
'''
desired_column_order = ['column_b', 'concatenation', 'column_c', 'column_a']
input_df = input_df[desired_column_order]
'''
data = input_df
worksheet.add_table('A1:D4', {'data': data.values.tolist(),
'columns': [{'header': c} for c in data.columns.tolist()] +
[{'header': 'concatenation',
'formula': column_concatenation}
],
'style': 'Table Style Medium 9'})
workbook.close()
Now, before workbook.close(), I'd love to use the 'desired_column_order' list to re-order my columns after I've added the formulas.
thanks:)

It looks like there are two issues here: sorting and the table formula.
Sorting is something that Excel does at runtime, in the Excel application and it isn't a property of, or something that can be triggered in, the file format. Since XlsxWriter only deals with the file format it cannot do any sorting. However, the data can be sorted in Python/Pandas prior to writing it with XlsxWriter.
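As a minimal sketch of sorting before writing, assuming the question's three-column DataFrame (the sort key and the target column order here are chosen only for illustration), both the rows and the columns can be arranged in pandas first:

```python
import pandas as pd

df = pd.DataFrame({'column_a': ['x', 'y', 'z'],
                   'column_b': ['red', 'white', 'blue'],
                   'column_c': ['a', 'e', 'i']})

# Sort the rows by one column (illustrative choice of key).
sorted_df = df.sort_values('column_b').reset_index(drop=True)

# Re-order the columns by indexing with a list of column names.
ordered_df = sorted_df[['column_b', 'column_c', 'column_a']]
print(ordered_df)
```

The resulting DataFrame can then be handed to XlsxWriter in its final order.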
The formula issue arises because Excel had an original [#This Row] syntax (Excel 2007) and a later @ syntax (Excel 2010+). See the XlsxWriter docs on Working with Worksheet Tables - Columns:
The Excel 2007 style [#This Row] and Excel 2010 style @ structural references are supported within the formula. However, other Excel 2010 additions to structural references aren't supported and formulas should conform to Excel 2007 style formulas.
So basically you need to use the Excel 2007 syntax, since that is what is stored in the file format, even if Excel displays the Excel 2010+ syntax externally.
When you add formulas via the add_table() method, XlsxWriter does the conversion for you, but if you add the formulas in another way, such as via Pandas, you need to use the Excel 2007 syntax. So instead of a formula like this:
=CONCATENATE(Table1[@[column_a]], " ", Table1[@[column_b]])
You need to add this:
=CONCATENATE(Table1[[#This Row],[column_a]], " ", Table1[[#This Row],[column_b]])
(You can see why they moved to the shorter syntax in later Excel versions.)
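Since the Excel 2007 references are long and easy to mistype, they could be generated with a small helper. The sketch below is only an illustration; this_row_ref is a hypothetical name and is not part of XlsxWriter or pandas:

```python
def this_row_ref(table, column):
    """Build an Excel 2007 style reference to `column` in the current table row."""
    return f"{table}[[#This Row],[{column}]]"


# Assemble the same CONCATENATE formula as above from the helper.
column_concatenation = (
    f'=CONCATENATE({this_row_ref("Table1", "column_a")}, " ", '
    f'{this_row_ref("Table1", "column_b")})'
)
print(column_concatenation)
```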
Then your program will work as expected:
import pandas as pd
import xlsxwriter
input_df = pd.DataFrame({'column_a': ['x', 'y', 'z'],
'column_b': ['red', 'white', 'blue'],
'column_c': ['a', 'e', 'i'],
})
output_file = 'output.xlsx'
column_concatenation = '=CONCATENATE(Table1[[#This Row],[column_a]], " ", Table1[[#This Row],[column_b]])'
input_df['concatenation'] = column_concatenation
workbook = xlsxwriter.Workbook(output_file)
worksheet = workbook.add_worksheet("Sheet with formula")
desired_column_order = ['column_b', 'concatenation', 'column_c', 'column_a']
input_df = input_df[desired_column_order]
data = input_df
# Make the columns wider for clarity.
worksheet.set_column(0, 3, 16)
worksheet.add_table('A1:D4', {'data': data.values.tolist(),
# 'concatenation' is already a DataFrame column, so no extra header is needed.
'columns': [{'header': c} for c in data.columns.tolist()],
'style': 'Table Style Medium 9'})
workbook.close()

Related

Python number format changing after styling

I have a dataframe that resembles the following:
Name    Amount
A       3,580,093,709.00
B       5,656,745,317.00
I then apply some styling to it using CSS, but when I do, the Amount values are displayed in scientific notation (for example 3.58009e+09):
Name    Amount
A       3.58009e+09
B       5.65674e+07
How can I stop this from happening?
import pandas as pd
d = {'Name': ['A', 'B'], 'Amount': [3580093709.00, 5656745317.00]}
df = pd.DataFrame(data=d)
df = df.style
df
You don't show how you are styling the columns, but to display the values as floats with two decimals, add the following to your Styler, as described at the start of the pandas styling documentation:
df = df.style.format(formatter={'Amount': "{:.2f}"})
Here is the link for more information:
https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html
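If the thousands separators from the original table should be kept as well, the format specification can include a comma. The following is a plain-Python sketch of what such a format string produces; passing the same string to the Styler in place of "{:.2f}" is an assumption about what the asker wants, not something stated in the answer:

```python
amounts = [3580093709.00, 5656745317.00]

# ',' adds thousands separators; '.2f' keeps two decimals and avoids scientific notation.
fmt = "{:,.2f}"
formatted = [fmt.format(a) for a in amounts]
print(formatted)
```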

ipysheet.sheet converting to DataFrame with saving manual changes done

The aim is to create an interactive dataframe where I can edit the cell values without coding.
It seems to me this should work as follows:
create an ipysheet.sheet
edit the cells manually
convert it to a pandas dataframe
The problem is:
after creating an ipysheet.sheet, I manually changed the values of some cells and then converted it to a pandas dataframe, but the changes are not reflected in this dataframe; if you just display the sheet without converting it, you can see the changes.
import ipysheet
import pandas as pd
d = {'col1': [2,8], 'col2': [3,6]}
df = pd.DataFrame(data=d)
sheet1 = ipysheet.sheet(rows=len(df.columns) +1 , columns=3)
first_col = df.columns.to_list()
first_col.insert(0, 'Attribute')
column = ipysheet.column(0, value = first_col, row_start = 0)
cell_value1 = ipysheet.cell(0, 1, 'Format')
cell_value2 = ipysheet.cell(0, 2, 'Description')
sheet1
creating a sheet1
ipysheet.to_dataframe(sheet1)
converting to pd.DataFrame
Solved by predefining all the empty cells as np.nan. You can then edit them manually, and the changes carry over to the DataFrame when converting.
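One way to read this answer, sketched under the assumption that the sheet is built from a DataFrame whose blank cells are np.nan (the grid name and the use of ipysheet.from_dataframe are assumptions, not taken from the answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [2, 8], 'col2': [3, 6]})

# One row per column of df; the cells to be filled in by hand start out as NaN
# so that every cell of the sheet is defined up front.
grid = pd.DataFrame({
    'Attribute': list(df.columns),
    'Format': np.nan,
    'Description': np.nan,
})

# In a notebook with ipysheet installed (not run here):
#   sheet1 = ipysheet.from_dataframe(grid)   # edit the cells in the widget
#   edited = ipysheet.to_dataframe(sheet1)   # read the edits back
print(grid)
```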

How do I create multiple dataframes from a loop based on two existing dataframes and matching criteria?

I think the below code might make my question easier to understand. But I'll try explain what I want to do anyway.
I have two dataframes with one column in common. I want the rows from df2 that match the values in df1's col 1 to be put in a separate dataframe, looping through df2 until I have a new dataframe for each of the criteria in df1's col 1.
Dataframes
df1 = pd.DataFrame([['a', '1'], ['p', '3']],
columns=['col 1', 'col 2'])
df2 = pd.DataFrame([['t','a', '1'], ['q','a', '2'], ['x','p', '3']],
columns=['col 1', 'col 2', 'col 3'])
for strategy in df2:
    if df2[df2['col 2']] == df1[df1['col 1']]:
        df = df2[df2['col 2']] == df1[df1['strategy']]
        df.to_excel("output.xlsx", sheet_name = 'Sheet_name_1')
After that, I want to use each new dataframe in the loop, perform a function on it, and then export that new dataframe to Excel. But for the time being let's focus on the first problem.
Your code sample is unclear about what each sheet in the output Excel file should contain.
So I took another approach to generate a DataFrame for each sheet:
For each value in the col 2 column of df2 (I named it vv):
generate a (temporary) DataFrame containing the rows from df1 with col 1 equal to vv,
save it as a sheet with the name taken from vv.
To write an Excel file with multiple sheets, you have to:
define an ExcelWriter object connected to the output file (before the loop),
call to_excel, passing the above object and the sheet name (in the loop),
close the Excel writer (after the loop).
The code to do it is:
exw = pd.ExcelWriter('Output.xlsx')
for vv in df2['col 2'].unique():
    df1.query('`col 1` == @vv').to_excel(exw, sheet_name=vv)
# ExcelWriter.save() was removed in pandas 2.0; close() writes the file.
exw.close()
Note how concise this code is:
df1.query('`col 1` == @vv') - generates the output DataFrame (from df1),
to_excel(exw, sheet_name=vv) - writes it to the output file (as a separate sheet).
Now you have to:
define what should be the content of each sheet,
adjust my code accordingly.
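If the goal is instead what the question describes, namely one DataFrame of df2 rows per matching value of df1's col 1, a possible sketch is shown below. It assumes that reading of the question; the names matched and frames are illustrative only:

```python
import pandas as pd

df1 = pd.DataFrame([['a', '1'], ['p', '3']],
                   columns=['col 1', 'col 2'])
df2 = pd.DataFrame([['t', 'a', '1'], ['q', 'a', '2'], ['x', 'p', '3']],
                   columns=['col 1', 'col 2', 'col 3'])

# Keep only the df2 rows whose 'col 2' value appears in df1's 'col 1'.
matched = df2[df2['col 2'].isin(df1['col 1'])]

# Split the matching rows into one DataFrame per criterion value.
frames = {key: group.reset_index(drop=True)
          for key, group in matched.groupby('col 2')}

for key, frame in frames.items():
    print(key)
    print(frame)
```

Each entry of frames could then be written with to_excel in the loop described above, using the dictionary key as the sheet name.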
Below is a solution provided to me by a colleague in work. A big thank you to him.
So, we have two dataframes, indices and funds, that share a common column: it is called Strategy in indices and Style in funds.
We set the index to the Strategy column values in the indices dataframe.
We then create a dictionary of dataframes from the funds dataframe. Each of these is based on the Style value.
Now we create an example function that we can run.
And finally we create a loop that selects each of the new dataframes, looks up the index ticker assigned to its strategy, and passes both to our new function, which gives us our final set of dataframes. At this point we would export each one to Excel, one sheet per dataframe (I'm still working on this part).
import pandas as pd
indices = pd.DataFrame([['Volatility', 'SPX Index'], ['Multistrategy', 'SXXP Index'], ['Real Estate', 'IBEX Index']],
columns=['Strategy', 'Ticker'])
indices = indices.set_index('Strategy') # i would have the strategy as index to help with mapping later
funds = pd.DataFrame([['MABAX US Equity','Real Estate', 'name1'], ['SPY US Equity','Real Estate', 'name2'],
['AAPL US Equity','Volatility', 'name3']],
columns=['Ticker', 'Style', 'Name'])
strategies = indices.index.values
data_frames = {strategy:funds[funds['Style']==strategy] for strategy in strategies}
#this gives you a dictionary of dataframes so you can get funds of volatility with
# something like: data_frames['Volatility']
def my_function(fund_df, parameter, index_ticker):
    # this is your function and it takes the fund df of each
    # strategy and also the index ticker of each strategy
    print(fund_df)
    print(index_ticker)
    return
# you could run the function for each dataframe and each index by using something like:
for strategy in strategies:
    index_ticker = indices.loc[strategy, 'Ticker']
    fund_df = data_frames[strategy]
    my_function(fund_df, 'M', index_ticker)

Python function inputting variables into strings

I have outputted a pandas df into an excel file using xlsxwriter. I'm trying to create a totals row at the top. To do so, I'm trying to create a function that dynamically populates the totals based off the column I choose.
Here is an example of what I'm intending to do:
worksheet.write_formula('G4', '=SUM(G4:G21)')
#G4 = Where total should be placed
I need this to be a function because the row counts can change (summation range should be dynamic), and I want there to be an easy way to apply this formula to various columns.
Therefore I've come up with the following:
def get_totals(column):
    start_row = '4' #row which the totals will be on
    row_count = str(tl_report.shape[0]) #number of rows in the table so I can sum up every row.
    return (worksheet.write_formula(str(column+start_row),"'=SUM("+str(column+start_row)+":"+str(column+row_count)+")'") )
When running get_totals("G") it just results in 0. I suspect it has to do with the str operator I had to apply, because it's adding single quotes to the formula and therefore rendering it incorrectly.
However, I cannot take the str operator out because, apparently, I cannot concatenate ints?
Maybe I'm coding this all wrong; I'm new to Python, so any help is appreciated.
Thank you!
You could also do something like this:
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1,2,3,4], 'B': [5,6,7,8],
'C': [np.nan, np.nan, np.nan, np.nan]})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False, startrow = 2)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
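def get_totals(start_row, sum_column, column1='A', column2='B'):
    # Write one SUM formula per data row into the sum column.
    for row in range(start_row, df.shape[0] + start_row):
        worksheet.write_formula(f'{sum_column}{row}', f'=SUM({column1}{row},{column2}{row})')
get_totals(4, 'C')
# ExcelWriter.save() was removed in pandas 2.0; close() writes the file.
writer.close()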
In almost all cases XlsxWriter methods support two forms of notation to designate the position of cells: Row-column notation and A1 notation.
Row-column notation uses a zero based index for both row and column while A1 notation uses the standard Excel alphanumeric sequence of column letter and 1-based row. For example:
(6, 2) # Row-column notation.
('C7') # The same cell in A1 notation.
So for your case you could do the following and set the row-column values programmatically (you may have to adjust by -1 to get zero indexing):
worksheet.write_formula(start_row, start_column, '=SUM(G4:G21)')
For the formula you could use one of XlsxWriter's utility functions:
from xlsxwriter.utility import xl_range
my_range = xl_range(3, 6, 20, 6) # G4:G21
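The zero result in the question most likely comes from the extra single quotes placed inside the formula string, which make Excel treat it as text. A minimal sketch of building the string without them is shown below; sum_formula and its parameters are hypothetical names, and the row numbers are only an example:

```python
def sum_formula(column, first_row, last_row):
    """Return a SUM formula over one column, e.g. '=SUM(G5:G21)'."""
    # An f-string converts the integers for us, so no str() calls or
    # extra quote characters are needed.
    return f"=SUM({column}{first_row}:{column}{last_row})"


formula = sum_formula("G", 5, 21)
print(formula)
```

The returned string could then be passed to worksheet.write_formula('G4', formula), keeping the totals cell itself outside the summed range.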

How to change the font-size of text in dataframe using pandas

I have studied the pandas styling documentation but could not find a precise answer to my question. I am reading an Excel file into a dataframe and processing that dataframe in my program. Finally, I write the processed dataframe to another existing Excel file using the xlwings library.
I am using-
import pandas as pd
import xlwings as xw
df = pd.read_excel("file.xlsx")
.
. #Code for processing dataframe and writing dataframe in another excel file
.
Before writing this dataframe to the other existing Excel file, I want to change the font size of all the text in my final dataframe, but I can't find a way to do it.
I found the following code in the pandas styling documentation:
def magnify():
    return [dict(selector="th",
                 props=[("font-size", "4pt")]),
            dict(selector="td",
                 props=[('padding', "0em 0em")]),
            dict(selector="th:hover",
                 props=[("font-size", "12pt")]),
            dict(selector="tr:hover td:hover",
                 props=[('max-width', '200px'),
                        ('font-size', '12pt')])
            ]
I used the above code in my program, but the font size of my dataframe stays the same; it has no effect. I also tried some other styling methods, but the font size still doesn't change.
Can anyone please tell me, in a simple manner, how to change only the font size of my final dataframe using pandas or any other library? I have tried many approaches, but none of them works for me. I only want to change the font size, not apply any other styling to my font.
You can set one or more properties for each cell using set_properties().
import pandas as pd
df = pd.DataFrame({
    'date': ('2019-11-29', '2016-11-28'),
    'price': (0, 1),
})
# set_properties() applies the same CSS properties to every cell
df = df.style.set_properties(**{
    'background-color': 'grey',
    'font-size': '20pt',
})
df.to_excel('test.xlsx', engine='openpyxl')
Also you can use apply() method to customize specific cells:
def custom_styles(val):
    # price column styles
    if val.name == 'price':
        styles = []
        # red prices with 0
        for i in val:
            styles.append('color: %s' % ('red' if i == 0 else 'black'))
        return styles
    # other columns will be yellow
    return ['background-color: yellow'] * len(val)
df = pd.DataFrame(...)
df = df.style.apply(custom_styles)
df.to_excel('test.xlsx', engine='openpyxl')
You can also use the applymap method (renamed to map in pandas 2.1), which works elementwise. You can find more examples in the docs.
Hope this helps.
