I have a DataFrame that is built from a list and another DataFrame that I read from an Excel file.
What I want to do is apply a background color to the first row of the DataFrame before exporting it to Excel.
The code below does the job correctly as expected. (There is an issue with the data.)
The issue is that the style I applied to the DataFrame is not reflected in the Excel sheet. I am using Jupyter Notebook.
Please suggest a way to get the styles into Excel.
import pandas as pd
sheet1 = r'D:\dinesh\input.xlsx'
sheet2 = r"D:\dinesh\lookup.xlsx"
sheet3 = r"D:\dinesh\Output.xlsx"
sheetname = 'Dashboard Índice (INPUT)'
print('Started extracting the Crime Type!')
df1 = pd.read_excel(sheet1,sheet_name = 'Dashboard Índice (INPUT)',skiprows=10, usecols = 'B,C,D,F,H,J,L,N,P', encoding = 'unicode_escape')
crime_type = list(df1.iloc[:0])[3:]
print(f'crime_types : {crime_type}')
df1 = (df1.drop(columns=crime_type,axis=1))
cols = list(df1.iloc[0])
print(f'Columns : {cols}')
df1.columns = cols
df1 = (df1[1:]).dropna()
final_data = []
for index, row in df1.iterrows():
    sheetname = (f'{row[cols[1]]:0>2d}. {row[cols[0]]}')
    cnty_cd = [row[cols[0]], row[cols[1]], row[cols[2]]]
    wb = pd.ExcelFile(sheet2)
    workbook = ''.join([workbook for workbook in wb.sheet_names if workbook.upper() == sheetname])
    if workbook:
        df2 = pd.read_excel(sheet2, sheet_name = workbook, skiprows=7, usecols ='C,D,H:T', encoding = 'unicode_escape')
        df2_cols = list(df2.columns)
        final_cols = cols + df2_cols
        df2 = df2.iloc[2:]
        df2 = df2.dropna(subset=[df2_cols[1]])
        for index2, row2 in df2.iterrows():
            if row2[df2_cols[1]].upper() in crime_type:
                s1 = pd.Series(cnty_cd)
                df_rows = (pd.concat([s1, row2], axis=0)).to_frame().transpose()
                final_data.append(df_rows)
                break
    else:
        print(f'{sheetname} does not exists!')
df3 = pd.concat(final_data)
df3.columns = final_cols
df_cols = (pd.Series(final_cols, index=final_cols)).to_frame().transpose()
df_final = (pd.concat([df_cols,df3], axis=0, ignore_index=True, sort=False))
df_final.style.apply(lambda x: ['background: blue' if x.name==0 else '' for i in x], axis=1)
df_final.to_excel(sheet3, sheet_name='Crime Details',index=False,header = None)
print(f'Sucessfully created the Output file to {sheet3}!')
You need to export the styled DataFrame, not the unstyled one. Either chain the styling and the export to Excel together, as shown in the documentation, or assign the styled DataFrame to a variable and export that.
The latter could look like this, based on your code:
# background-color is among the CSS properties pandas documents as supported for Excel export
df_styled = df_final.style.apply(lambda x: ['background-color: blue' if x.name==0 else '' for i in x], axis=1)
df_styled.to_excel(sheet3, sheet_name='Crime Details', index=False, header=None, engine='openpyxl')
As described in the documentation, exporting styles requires either the OpenPyXL or the XlsxWriter engine.
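The point about exporting the Styler rather than the DataFrame can be sketched as below. This is a minimal sketch, not the asker's exact pipeline: the helper names `highlight_first_row` and `export_styled` are made up for illustration, and it assumes that `jinja2` and `openpyxl` are installed and that the first row carries the index label 0, as in the question.

```python
import pandas as pd


def highlight_first_row(row: pd.Series, css: str = "background-color: blue") -> list:
    """Return one CSS string per cell: `css` for the row labelled 0, '' otherwise."""
    return [css if row.name == 0 else "" for _ in row]


def export_styled(df: pd.DataFrame, path: str, sheet_name: str = "Crime Details") -> None:
    """Style the first row and write the Styler (not the DataFrame) to Excel."""
    styled = df.style.apply(highlight_first_row, axis=1)
    # Calling to_excel on the Styler carries the formatting over;
    # calling it on the plain DataFrame would write the values only.
    styled.to_excel(path, sheet_name=sheet_name, index=False, header=False,
                    engine="openpyxl")
```

With the names from the question, the call would be `export_styled(df_final, sheet3)`. Keeping the row rule in a named function instead of a lambda makes it easy to check on a single row before touching any file.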
Related
I am working on combining two Excel files that have the same columns but different values. I would like to convert all numbers into currency form ($ and commas). I've been able to do this but would like to find a simpler way to write the code.
Also, I need help with the output file. I cannot open it unless I close Python. It says "Cannot access this file" and is always syncing. Does anyone know a solution?
Here is my code:
import pandas as pd
import openpyxl
import xlsxwriter
outputfile = "Outputfile.xlsx"
excel_files = ["File1.xlsx",
               "File2.xlsx"]
def combine_excel(excel_files, sheet_name):
    sheet_frames = [pd.read_excel(x, sheet_name=sheet_name) for x in excel_files]
    combined_df = pd.concat(sheet_frames).reset_index(drop=True)
    return combined_df
df1 = combine_excel(excel_files, 0)
df2 = combine_excel(excel_files, 1)
df3 = combine_excel(excel_files, 2)
df4 = combine_excel(excel_files, 3)
df5 = combine_excel(excel_files, 4)
df6 = combine_excel(excel_files, 5)
df7 = combine_excel(excel_files, 6)
for x in df1.iloc[:,[10,11,12,13,14,15,16,17,18,19,20,26,27,28,29,30]]:
    df1[x] = df1[x].apply(lambda x: f"${x:,.0f}")
for x in df2.iloc[:,[10,11,12,13,14,15,16,17,18,19,20,26,27,28,29,30]]:
    df2[x] = df2[x].apply(lambda x: f"${x:,.0f}")
.
.
.
.
.
.
writer = pd.ExcelWriter(outputfile, engine='xlsxwriter')
df1.to_excel(writer, sheet_name ='Column1', index = False)
df2.to_excel(writer, sheet_name='Column2', index = False)
df3.to_excel(writer, sheet_name='Column3', index = False)
df4.to_excel(writer, sheet_name='Column4', index = False)
df5.to_excel(writer, sheet_name='Column5', index = False)
df6.to_excel(writer, sheet_name='Column6', index = False)
df7.to_excel(writer, sheet_name ='Column7', index = False)
writer.save()
As you can see, I would like to make this part simpler to read and write:
for x in df1.iloc[:,[10,11,12,13,14,15,16,17,18,19,20,26,27,28,29,30]]:
    df1[x] = df1[x].apply(lambda x: f"${x:,.0f}")
for x in df2.iloc[:,[10,11,12,13,14,15,16,17,18,19,20,26,27,28,29,30]]:
    df2[x] = df2[x].apply(lambda x: f"${x:,.0f}")
.
.
.
.
.
.
There are a total of 12 lines of code just to convert a number of columns into currency form. Is there a way to do this with 2 lines of code? Also, the reason there are multiple df(s) is that I am combining 6 sheets within each Excel file.
I can't test this, but this simplification by refactoring should work:
# instead of df1 = ..., df2 = ..., etc., store them in a list
combined_frames = [combine_excel(excel_files, i) for i in range(7)]
# instead of explicitly enumerating all column indices, build them from ranges
# (positions 10-20 and 26-30, the same columns as in the question)
currency_cols = list(range(10, 21)) + list(range(26, 31))
# instead of applying to each column individually, use applymap to apply to
# all cells of the selected columns
for df in combined_frames:
    names = df.columns[currency_cols]
    df[names] = df[names].applymap(lambda x: f"${x:,.0f}")
# instead of exporting each individual df, export them in a loop,
# dynamically setting the sheet_name; the with block saves and closes the file
with pd.ExcelWriter(outputfile, engine='xlsxwriter') as writer:
    for i, df in enumerate(combined_frames, start=1):
        df.to_excel(writer, sheet_name=f'Column{i}', index=False)
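Since the answer above was posted untested, here is a small self-contained sketch of the formatting step on made-up data. The helper names `to_currency` and `format_currency_columns` and the sample columns are assumptions for illustration only. It also assumes that column names are unique.

```python
import pandas as pd


def to_currency(value) -> str:
    """Format a number as whole dollars with thousands separators."""
    return f"${value:,.0f}"


def format_currency_columns(df: pd.DataFrame, positions) -> pd.DataFrame:
    """Return a copy of df with the columns at the given positions as currency text."""
    out = df.copy()
    # Translate positions to column names, then replace each column in turn.
    for name in out.columns[list(positions)]:
        out[name] = out[name].map(to_currency)
    return out


# Hypothetical data: only the columns at positions 1 and 2 hold amounts.
sample = pd.DataFrame(
    {"item": ["a", "b"], "cost": [1234.56, 1000000], "tax": [0.4, 99.6]}
)
formatted = format_currency_columns(sample, [1, 2])
print(formatted)
```

Note that the formatted columns hold text, so Excel will no longer treat those cells as numbers. If the values should stay numeric, applying a number format on the Excel side would be the alternative.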
Hello guys, I created the following code to get some information out of multiple Excel files. I have one file for each month of each year (2015 to date), and it seems that somewhere along the way the sheet names changed between upper and lower case. Is there a way to match the sheet names with an upper or lower function to normalize them? Below is what I have created so far:
## 2020
import glob as glob
import pandas as pd
import datetime
pd.set_option('display.max_rows', None)
data2020 = glob.glob('*20*')
cols = ["VendorName", "VendorNo", "InvoiceNumber", "IvaPagadoOriginal","PeriodoEdo"]
li3 = []
for filename in data2020:
    df = pd.read_excel(filename, sheet_name = 'virtual', index_col=None, usecols= cols)
    li3.append(df)
frame3 = pd.concat(li3, axis=0, ignore_index=True)
frame3 = frame3.iloc[:-1 , :]
frame3 = frame3[(frame3.IvaPagadoOriginal != 0)]
frame3 = frame3.dropna(subset=['InvoiceNumber'])
frame3['Vat Returns Adjustment Category'] = 'Total foreign suppliers'
frame3['Date'] = pd.to_datetime(frame3['PeriodoEdo'])
frame3['Period Name'] = frame3['Date'].dt.strftime('%b-%y')
As you can see, the sheet is called "virtual", but it can be "Virtual" or in some cases "virtuaL".
Any assistance will be much appreciated.
Create a pandas.ExcelFile object which contains a list of the sheet names as an attribute. Then iterate over all the sheet names of each file and select only the one named 'virtual' or 'Virtual' (i.e. if sheet_name.lower() == 'virtual'). Finally, pass only that one to pandas.read_excel.
Replace the for loop with:
for filename in data2020:
    excel_file = pd.ExcelFile(filename)
    sheet_name = next(sheet for sheet in excel_file.sheet_names
                      if sheet.lower() == 'virtual')
    df = pd.read_excel(filename, sheet_name=sheet_name, index_col=None, usecols= cols)
    li3.append(df)
Or use the pandas.ExcelFile.parse method instead of pandas.read_excel:
for filename in data2020:
    excel_file = pd.ExcelFile(filename)
    sheet_name = next(sheet for sheet in excel_file.sheet_names
                      if sheet.lower() == 'virtual')
    df = excel_file.parse(sheet_name=sheet_name, index_col=None, usecols= cols)
    li3.append(df)
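One detail worth guarding against: `next(...)` without a default raises `StopIteration` when a workbook has no matching sheet. A minimal sketch of a lookup helper that returns a default instead is shown below. The name `find_sheet` is hypothetical, and it works on any list of names, so it can be tried without opening a file.

```python
def find_sheet(sheet_names, target, default=None):
    """Return the first sheet name equal to `target` ignoring case, else `default`."""
    wanted = target.casefold()
    # The second argument to next() is returned when the generator is exhausted.
    return next(
        (name for name in sheet_names if name.casefold() == wanted),
        default,
    )


print(find_sheet(["Summary", "virtuaL"], "virtual"))
print(find_sheet(["Summary"], "virtual"))
```

Inside the loop this could be used as `sheet_name = find_sheet(excel_file.sheet_names, 'virtual')`, skipping the file (for example with `continue`) when the result is `None`.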
I have 5 sheets and created a script that does a number of formatting steps. I tested it on each sheet individually, and it works perfectly.
import numpy as np
import pandas as pd
FileLoc = r'C:\T.xlsx'
Sheets = ['Alex','Elvin','Gerwin','Jeff','Joshua',]
df = pd.read_excel(FileLoc, sheet_name= 'Alex', skiprows=6)
df = df[df['ENDING'] != 0]
df = df.head(30).T
df = df[~df.index.isin(['Unnamed: 2','Unnamed: 3','Unnamed: 4','ENDING' ,3])]
df.index.rename('STORE', inplace=True)
df['index'] = df.index
df2 = df.melt(id_vars=['index', 2 ,0, 1] ,value_name='SKU' )
df2 = df2[df2['variable']!= 3]
df2['SKU2'] = np.where(df2['SKU'].astype(str).fillna('0').str.contains('ALF|NOB|MET'),df2.SKU, None)
df2['SKU2'] = df2['SKU2'].ffill()
df2 = df2[~df2[0].isnull()]
df2 = df2[df2['SKU'] != 0]
df2[1] = pd.to_datetime(df2[1]).dt.date
df2.to_excel(r'C:\test.xlsx', index=False)
But when I pass the list with sheet_name=Sheets, it always produces the error KeyError: 'ENDING'. This is the part of the code:
Sheets = ['Alex','Elvin','Gerwin','Jeff','Joshua',]
df = pd.read_excel(FileLoc,sheet_name=Sheets,skiprows=6)
Is there a proper way to do this, like looping?
My expected result is to execute the formatting that I have created and consolidate it into one excel file.
NOTE: All sheets have the same format.
If you pass sheet_name=None to read_excel, it returns a dict (an OrderedDict in older pandas versions) with the sheet names as keys and the corresponding DataFrames as values. Passing a list of sheet names returns the same kind of dict, which is why df['ENDING'] raises a KeyError. You can loop through the dictionary using .items().
The code would look something like this:
dfs = pd.read_excel('your-excel.xlsx', sheet_name=None)
for key, value in dfs.items():
    # apply logic to value, which is a DataFrame
    ...
If you wish to combine the data in the sheets, you could collect the processed DataFrames in a list and join them with pd.concat() (DataFrame.append() was removed in pandas 2.0). That would look something like this:
processed = []
dfs = pd.read_excel('your-excel.xlsx', sheet_name=None)
for key, value in dfs.items():
    # apply logic to value, which is a DataFrame
    processed.append(value)
combined_df = pd.concat(processed, ignore_index=True)
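The loop-and-combine idea can be tried without an Excel file by standing in a plain dict for the result of `read_excel`. The following is only a sketch: `consolidate`, the extra `sheet` column, and the toy filter are made up for illustration, and the real formatting steps from the question would go into the `transform` argument.

```python
import pandas as pd


def consolidate(sheets: dict, transform) -> pd.DataFrame:
    """Apply `transform` to every sheet and stack the results into one DataFrame."""
    processed = []
    for name, frame in sheets.items():
        result = transform(frame)
        # Record which sheet each row came from (hypothetical 'sheet' column).
        processed.append(result.assign(sheet=name))
    return pd.concat(processed, ignore_index=True)


# Stand-in for pd.read_excel(FileLoc, sheet_name=Sheets, skiprows=6),
# which returns a dict of DataFrames keyed by sheet name.
sheets = {
    "Alex": pd.DataFrame({"ENDING": [0, 5], "SKU": ["x", "y"]}),
    "Elvin": pd.DataFrame({"ENDING": [3, 0], "SKU": ["z", "w"]}),
}
combined = consolidate(sheets, lambda df: df[df["ENDING"] != 0])
print(combined)
```

Building a list and calling `pd.concat` once avoids growing a DataFrame inside the loop, which copies the data on every iteration.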
I have multiple dataframes that look like this; the data is irrelevant.
I want it to look like this, with a title inserted above the column headers.
I want to combine them into multiple tabs in an Excel file.
Is it possible to add another row above the column headers and insert a title into the first cell before saving the file to Excel?
I am currently doing it like this.
with pd.ExcelWriter('merged_file.xlsx',engine='xlsxwriter') as writer:
    for filename in os.listdir(directory):
        if filename.endswith('xlsx'):
            print(filename)
            if 'brands' in filename:
                some function
            elif 'share' in filename:
                somefunction
            else:
                some function
            df.to_excel(writer,sheet_name=f'{filename[:-5]}',index=True,index_label=True)
writer.close()
But the sheet_name is too long, that's why I want to add the title above the column headers.
I tried this code,
columns = df.columns
columns = list(zip([f'{filename[:-5]}'] * len(df.columns), columns))
columns = pd.MultiIndex.from_tuples(columns)
df2 = pd.DataFrame(df,index=df.index,columns=columns)
df2.to_excel(writer,sheet_name=f'{filename[0:3]}',index=True,index_label=True)
But it ends up looking like this with all the data gone,
It should look like this
You can write the data starting from the second row and then write your text to the first cell:
df = pd.DataFrame({'col': list('abc'), 'col1': list('def')})
print (df)
  col col1
0   a    d
1   b    e
2   c    f
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', startrow = 1, index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
text = 'sometitle'
worksheet.write(0, 0, text)
writer.close()
Then, to read the file back, you need:
title = pd.read_excel('test.xlsx', nrows=0).columns[0]
print (title)
sometitle
df = pd.read_excel('test.xlsx', skiprows=1)
print (df)
  col col1
0   a    d
1   b    e
2   c    f
You can use a MultiIndex. Here is an example:
import pandas as pd
df = pd.read_excel('data.xls')
header = pd.MultiIndex.from_product([['Title'], list(df.columns)])
pd.DataFrame(df.to_numpy(), None, columns=header)
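The attempt in the question probably lost its data because passing an existing DataFrame together with `columns=` to the `DataFrame` constructor selects columns by label instead of renaming them, and none of the new two-level labels exist in the original. A minimal sketch that relabels the columns instead is shown here. The helper name `with_title` is an assumption for illustration.

```python
import pandas as pd


def with_title(df: pd.DataFrame, title: str) -> pd.DataFrame:
    """Return a copy of df whose columns gain `title` as a top header level."""
    out = df.copy()
    # Relabel the existing columns; the cell values are left untouched.
    out.columns = pd.MultiIndex.from_product([[title], df.columns])
    return out


df = pd.DataFrame({"col": list("abc"), "col1": list("def")})
titled = with_title(df, "sometitle")
print(titled)
```

When writing such a frame with `to_excel`, keeping `index=True` (as in the question) is the safer choice, since some pandas versions refuse to write MultiIndex columns with `index=False`.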
Also, I can share with you my solution with real data in Deepnote (my favorite tool). Feel free to duplicate and play with your own .xls:
https://deepnote.com/publish/3cfd4171-58e8-48fd-af21-930347e8e713
Good morning.
I have a question regarding Python. I have an if/else inside a loop; the else branch handles more than one file, and I need to save everything it reads into a single DataFrame. Is there a way to do this?
The code I am using:
for idx, folder in enumerate(fileLista):
    if folder == 'filename_for_treatment':
        df1 = pd.read_excel(folder, sheet_name = sheetName[idx], skiprows=1)
        df1.columns = df1.columns.str.strip()
        tratativaUm = df1[[column information to be used]]
    else:
        df2 = pd.read_excel(folder, sheet_name = sheetName[idx], skiprows=1)
        df2.columns = df2.columns.str.strip()
        tratativaDois = df2[[column information to be use]]
        ####assign result of each file received in the else
frames = [tratativaUm, tratativaDois]
titEmpresa = pd.concat(frames)
Can someone help me? Is it possible to do this? Thanks.
You can do it by appending your dataframes to a list, for example:
list_df_tratativaDois = []
for idx, folder in enumerate(fileLista):
    df = pd.read_excel(folder, sheet_name = sheetName[idx], skiprows=1)
    df.columns = df.columns.str.strip()
    if folder == 'filename_for_treatment':
        tratativaUm = df[[column information to be used]]
    else:
        list_df_tratativaDois.append(df[[column information to be use]])
titEmpresa = pd.concat([tratativaUm]+list_df_tratativaDois)
Note that instead of df1 and df2 you can use a single df, since the read_excel call is the same in both branches; then, depending on whether folder is the one that needs special treatment, you do a different action on df.
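The same pattern can be sketched with in-memory frames in place of the `read_excel` calls. Everything named here (`combine`, the sample file names, the `id` and `value` columns) is hypothetical; the column placeholders in the answer are left for the asker to fill in.

```python
import pandas as pd


def combine(frames_by_file: dict, special: str, columns: list) -> pd.DataFrame:
    """Concatenate the selected columns of every frame, special file first."""
    first = None
    others = []
    for name, frame in frames_by_file.items():
        # Same clean-up as in the answer: strip whitespace from column names.
        frame = frame.rename(columns=str.strip)
        selected = frame[columns]
        if name == special:
            first = selected
        else:
            others.append(selected)
    # Guard against the special file being absent from the input.
    parts = ([first] if first is not None else []) + others
    return pd.concat(parts, ignore_index=True)


# Stand-ins for the frames that read_excel would return for each file.
frames = {
    "b.xlsx": pd.DataFrame({" id ": [2], "value": [20]}),
    "filename_for_treatment": pd.DataFrame({"id": [1], "value ": [10]}),
    "c.xlsx": pd.DataFrame({"id": [3], "value": [30]}),
}
result = combine(frames, "filename_for_treatment", ["id", "value"])
print(result)
```

Collecting the else-branch frames in a list means that nothing is overwritten from one iteration to the next, which was the problem with assigning to a single variable inside the loop.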