I am working on combining two Excel files that have the same columns but different values. I would like to convert all numbers into currency format ($ and commas). I've been able to do this, but I would like to find a simpler way to write the code.
Also, I need help with the output file: I cannot open it unless I close Python. It says "Cannot access this file" and is always syncing. Does anyone know a solution?
Here is my code:
import pandas as pd
import openpyxl
import xlsxwriter
outputfile = "Outputfile.xlsx"
excel_files = ["File1.xlsx",
"File2.xlsx"]
def combine_excel(excel_files, sheet_name):
    sheet_frames = [pd.read_excel(x, sheet_name=sheet_name) for x in excel_files]
    combined_df = pd.concat(sheet_frames).reset_index(drop=True)
    return combined_df
df1 = combine_excel(excel_files, 0)
df2 = combine_excel(excel_files, 1)
df3 = combine_excel(excel_files, 2)
df4 = combine_excel(excel_files, 3)
df5 = combine_excel(excel_files, 4)
df6 = combine_excel(excel_files, 5)
df7 = combine_excel(excel_files, 6)
for x in df1.iloc[:,[10,11,12,13,14,15,16,17,18,19,20,26,27,28,29,30]]:
    df1[x] = df1[x].apply(lambda x: f"${x:,.0f}")
for x in df2.iloc[:,[10,11,12,13,14,15,16,17,18,19,20,26,27,28,29,30]]:
    df2[x] = df2[x].apply(lambda x: f"${x:,.0f}")
# ... (the same two lines repeated for the remaining DataFrames)
writer = pd.ExcelWriter(outputfile, engine='xlsxwriter')
df1.to_excel(writer, sheet_name ='Column1', index = False)
df2.to_excel(writer, sheet_name='Column2', index = False)
df3.to_excel(writer, sheet_name='Column3', index = False)
df4.to_excel(writer, sheet_name='Column4', index = False)
df5.to_excel(writer, sheet_name='Column5', index = False)
df6.to_excel(writer, sheet_name='Column6', index = False)
df7.to_excel(writer, sheet_name ='Column7', index = False)
writer.save()
As you can see, I would like to make this part simpler to read and write:
for x in df1.iloc[:,[10,11,12,13,14,15,16,17,18,19,20,26,27,28,29,30]]:
    df1[x] = df1[x].apply(lambda x: f"${x:,.0f}")
for x in df2.iloc[:,[10,11,12,13,14,15,16,17,18,19,20,26,27,28,29,30]]:
    df2[x] = df2[x].apply(lambda x: f"${x:,.0f}")
# ... (the same two lines repeated for the remaining DataFrames)
There are a total of 12 lines of code just to convert a number of columns into currency format. Is there a way to do this with 2 lines of code? Also, the reason there are multiple DataFrames is that I am combining 6 sheets within each Excel file.
I can't test this, but this simplification by refactoring should work:
import numpy as np
# instead of df1 = ..., df2 = ..., etc., store them in a list
combined_frames = [combine_excel(excel_files, i) for i in range(7)]
# instead of explicitly enumerating all column indices, build them from ranges
# (positions 10-20 and 26-30, as in the question; np.r_ concatenates the slices);
# instead of applying to each column individually, use applymap to apply to
# all cells of those columns at once (pandas >= 2.1 calls this DataFrame.map)
currency_cols = np.r_[10:21, 26:31]
for df in combined_frames:
    cols = df.columns[currency_cols]
    df[cols] = df[cols].applymap(lambda x: f"${x:,.0f}")
# instead of exporting each individual df, export them in a loop,
# dynamically setting the sheet_name; the with block saves and closes the file
with pd.ExcelWriter(outputfile, engine='xlsxwriter') as writer:
    for i, df in enumerate(combined_frames, start=1):
        df.to_excel(writer, sheet_name=f'Column{i}', index=False)
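To check the column selection without opening any workbook, here is a small self-contained sketch on made-up data (the 32-column DataFrame and the `format_currency` helper are hypothetical). It formats only positions 10 to 20 and 26 to 30, the same positions the question lists, and leaves 21 to 25 as numbers. It uses `Series.map` per column, which behaves the same on older and newer pandas versions.

```python
import numpy as np
import pandas as pd

# Column positions from the question: 10-20 and 26-30 (21-25 are skipped)
CURRENCY_POSITIONS = np.r_[10:21, 26:31]


def format_currency(df: pd.DataFrame, positions=CURRENCY_POSITIONS) -> pd.DataFrame:
    """Return a copy of df with the given column positions rendered as '$1,234' strings."""
    out = df.copy()
    cols = out.columns[positions]
    # Series.map applies the format string to every cell of each selected column
    out[cols] = out[cols].apply(lambda s: s.map("${:,.0f}".format))
    return out


# Hypothetical data: 2 rows x 32 columns of plain numbers
demo = pd.DataFrame(
    [[1234.4] * 32, [1000000] * 32],
    columns=[f"c{i}" for i in range(32)],
)
formatted = format_currency(demo)
print(formatted.iloc[:, [9, 10, 21, 26]])
```

Keep in mind that these cells become text, so Excel will no longer treat them as numbers (no sums or sorting by value).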
Hi, I have created a DataFrame with pandas from a CSV file in this way:
import numpy as np
import pandas as pd
elementi = pd.read_csv('elementi.csv')
df = pd.DataFrame(elementi)
lst= []
lst2=[]
for x in df['elementi']:
    a = x.split(";")
    lst.append(a[0])
    lst2.append(a[1])
ipo_oso = np.random.randint(0,3,76)
oso = np.random.randint(3,5,76)
ico = np.random.randint(5,6,76)
per_ico = np.random.randint(6,7,76)
df = pd.DataFrame(lst,index=lst2,columns=['elementi'])
# drop the element i don't use in the periodic table
df = df.drop(df[103:117].index)
df = df.drop(df[90:104].index)
df = df.drop(df[58:72].index)
df.head()
df['ipo_oso'] = ipo_oso
df['oso'] = oso
df['ico'] = ico
df['per_ico'] = per_ico
df.to_csv('period_table')
df.head()
and it looks like this:
When I save this table with to_csv() and import it in another project with read_csv(), the index of the table is read as a regular column, even though it is the index:
e= pd.read_csv('period_table')
e.head()
or
e= pd.read_csv('period_table')
df =pd.DataFrame(e)
df.head()
How can I fix that? :)
Just pass index_col=0 as a parameter to read_csv:
df = pd.read_csv('period_table', index_col=0)
df.head()
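A minimal round trip shows what happens, using an in-memory buffer instead of the 'period_table' file (the two-row table is made up). to_csv writes the index as an unnamed first column, so a plain read_csv turns it into a column called 'Unnamed: 0', while index_col=0 makes it the index again.

```python
import io

import pandas as pd

# Hypothetical table with string labels as the index (like the element symbols in the question)
df = pd.DataFrame({"elementi": ["Hydrogen", "Helium"], "oso": [3, 4]}, index=["H", "He"])

buffer = io.StringIO()
df.to_csv(buffer)  # the index is written as the first, unnamed column
buffer.seek(0)
without_index_col = pd.read_csv(buffer)

buffer.seek(0)
with_index_col = pd.read_csv(buffer, index_col=0)  # first column becomes the index again

print(without_index_col.columns.tolist())
print(with_index_col)
```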
Hello, I created the following code to get some information out of multiple Excel files. I have one file for each month of each year (2015 to date), and it seems that at some point they changed the sheet names from upper to lower case. Is there a way to match the sheet names with an upper() or lower() function to normalize them? Below is what I have so far:
## 2020
import glob as glob
import pandas as pd
import datetime
pd.set_option('display.max_rows', None)
data2020 = glob.glob('*20*')
cols = ["VendorName", "VendorNo", "InvoiceNumber", "IvaPagadoOriginal","PeriodoEdo"]
li3 = []
for filename in data2020:
    df = pd.read_excel(filename, sheet_name = 'virtual', index_col=None, usecols= cols)
    li3.append(df)
frame3 = pd.concat(li3, axis=0, ignore_index=True)
frame3 = frame3.iloc[:-1 , :]
frame3 = frame3[(frame3.IvaPagadoOriginal != 0)]
frame3 = frame3.dropna(subset=['InvoiceNumber'])
frame3['Vat Returns Adjustment Category'] = 'Total foreign suppliers'
frame3['Date'] = pd.to_datetime(frame3['PeriodoEdo'])
frame3['Period Name'] = frame3['Date'].dt.strftime('%b-%y')
As you can see, the sheet is called "virtual", but it can be "Virtual" or in some cases "virtuaL".
Any assistance will be much appreciated.
Create a pandas.ExcelFile object, which exposes the sheet names of the file as its sheet_names attribute. Then iterate over the sheet names of each file and select the one named 'virtual' in any capitalization (i.e. where sheet_name.lower() == 'virtual'). Finally, pass only that name to pandas.read_excel.
Replace the for loop with
for filename in data2020:
    excel_file = pd.ExcelFile(filename)
    sheet_name = next(sheet for sheet in excel_file.sheet_names
                      if sheet.lower() == 'virtual')
    df = pd.read_excel(filename, sheet_name=sheet_name, index_col=None, usecols= cols)
    li3.append(df)
Or use the pandas.ExcelFile.parse method instead of pandas.read_excel
for filename in data2020:
    excel_file = pd.ExcelFile(filename)
    sheet_name = next(sheet for sheet in excel_file.sheet_names
                      if sheet.lower() == 'virtual')
    df = excel_file.parse(sheet_name=sheet_name, index_col=None, usecols= cols)
    li3.append(df)
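One caveat with both loops: next() raises StopIteration if a file has no matching sheet at all. A small sketch of a helper that returns None instead, so such files can be skipped (the `find_sheet` name and the example sheet lists are hypothetical; str.casefold() is the standard-library variant of lower() meant for caseless comparisons):

```python
from typing import Iterable, Optional


def find_sheet(sheet_names: Iterable[str], target: str) -> Optional[str]:
    """Return the first sheet name equal to `target` ignoring case, or None if absent."""
    wanted = target.casefold()
    # next() with a default avoids StopIteration when nothing matches
    return next((name for name in sheet_names if name.casefold() == wanted), None)


# Hypothetical sheet lists, like the ones pd.ExcelFile(filename).sheet_names returns
print(find_sheet(["Resumen", "virtuaL"], "virtual"))
print(find_sheet(["Resumen"], "virtual"))
```

Inside the loop, this would be `sheet_name = find_sheet(excel_file.sheet_names, 'virtual')` followed by `if sheet_name is None: continue`.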
I have 5 sheets and created a script that does a lot of formatting. I tested it on each sheet individually, and it works perfectly.
import numpy as np
import pandas as pd
FileLoc = r'C:\T.xlsx'
Sheets = ['Alex','Elvin','Gerwin','Jeff','Joshua',]
df = pd.read_excel(FileLoc, sheet_name= 'Alex', skiprows=6)
df = df[df['ENDING'] != 0]
df = df.head(30).T
df = df[~df.index.isin(['Unnamed: 2','Unnamed: 3','Unnamed: 4','ENDING' ,3])]
df.index.rename('STORE', inplace=True)
df['index'] = df.index
df2 = df.melt(id_vars=['index', 2 ,0, 1] ,value_name='SKU' )
df2 = df2[df2['variable']!= 3]
df2['SKU2'] = np.where(df2['SKU'].astype(str).fillna('0').str.contains('ALF|NOB|MET'),df2.SKU, None)
df2['SKU2'] = df2['SKU2'].ffill()
df2 = df2[~df2[0].isnull()]
df2 = df2[df2['SKU'] != 0]
df2[1] = pd.to_datetime(df2[1]).dt.date
df2.to_excel(r'C:\test.xlsx', index=False)
But when I pass the list as sheet_name=Sheets, it always produces the error KeyError: 'ENDING'. This is the part of the code:
Sheets = ['Alex','Elvin','Gerwin','Jeff','Joshua',]
df = pd.read_excel(FileLoc,sheet_name=Sheets,skiprows=6)
Is there a proper way to do this, like looping?
My expected result is to apply the formatting I have created to every sheet and consolidate the results into one Excel file.
NOTE: All sheets have the same format.
If you pass sheet_name=None to read_excel, it returns a dictionary with the sheet names as keys and the corresponding DataFrames as values. (Passing a list such as Sheets also returns a dictionary, which is why df['ENDING'] raises a KeyError.) You can then loop through the dictionary using .items().
The code would look something like this,
import pandas as pd
dfs = pd.read_excel('your-excel.xlsx', sheet_name=None)
for key, value in dfs.items():
    # apply logic to value
    pass
If you wish to combine the data from the sheets, collect each DataFrame in a list after the logic has been applied to it, and combine them with pd.concat() at the end (DataFrame.append() was deprecated and has been removed in pandas 2.0). That would look something like this:
import pandas as pd
dfs = pd.read_excel('your-excel.xlsx', sheet_name=None)
processed = []
for key, value in dfs.items():
    # apply logic to value, which is a DataFrame
    processed.append(value)
combined_df = pd.concat(processed, ignore_index=True)
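A slightly fuller sketch of the same idea, assuming the question's formatting steps are wrapped in one function. Here `format_sheet` is a hypothetical stand-in that only keeps rows where ENDING is not 0, and the in-memory dict imitates what read_excel(..., sheet_name=None) returns. A SHEET column records where each row came from, which is handy once everything is in one table.

```python
import pandas as pd


def format_sheet(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical stand-in for the question's formatting steps
    return df[df["ENDING"] != 0]


def combine_sheets(sheets: dict) -> pd.DataFrame:
    frames = []
    for name, df in sheets.items():
        out = format_sheet(df).copy()
        out["SHEET"] = name  # remember which sheet each row came from
        frames.append(out)
    return pd.concat(frames, ignore_index=True)


# Made-up data shaped like the dict returned by read_excel(..., sheet_name=None)
sheets = {
    "Alex": pd.DataFrame({"ENDING": [0, 5], "SKU": ["A", "B"]}),
    "Elvin": pd.DataFrame({"ENDING": [3, 0], "SKU": ["C", "D"]}),
}
result = combine_sheets(sheets)
print(result)
```

With real files, `combine_sheets(pd.read_excel(FileLoc, sheet_name=Sheets, skiprows=6))` followed by `.to_excel(...)` would produce the single consolidated file.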
I have a DataFrame constructed from a list and another DataFrame that I read from an Excel file.
What I want to do is apply a background color to the first row of the DataFrame, which I then export to Excel.
The code below does the job correctly, as expected (there is no issue with the data).
The issue is that the style I applied to the DataFrame is not reflected in the Excel sheet. I am using Jupyter Notebook.
Please suggest a way to get the styles into Excel.
import pandas as pd
sheet1 = r'D:\dinesh\input.xlsx'
sheet2 = r"D:\dinesh\lookup.xlsx"
sheet3 = r"D:\dinesh\Output.xlsx"
sheetname = 'Dashboard Índice (INPUT)'
print('Started extracting the Crime Type!')
df1 = pd.read_excel(sheet1,sheet_name = 'Dashboard Índice (INPUT)',skiprows=10, usecols = 'B,C,D,F,H,J,L,N,P', encoding = 'unicode_escape')
crime_type = list(df1.iloc[:0])[3:]
print(f'crime_types : {crime_type}')
df1 = (df1.drop(columns=crime_type,axis=1))
cols = list(df1.iloc[0])
print(f'Columns : {cols}')
df1.columns = cols
df1 = (df1[1:]).dropna()
final_data = []
for index, row in df1.iterrows():
    sheetname = (f'{row[cols[1]]:0>2d}. {row[cols[0]]}')
    cnty_cd = [row[cols[0]], row[cols[1]], row[cols[2]]]
    wb = pd.ExcelFile(sheet2)
    workbook = ''.join([workbook for workbook in wb.sheet_names if workbook.upper() == sheetname])
    if workbook:
        df2 = pd.read_excel(sheet2, sheet_name = workbook, skiprows=7, usecols ='C,D,H:T', encoding = 'unicode_escape')
        df2_cols = list(df2.columns)
        final_cols = cols + df2_cols
        df2 = df2.iloc[2:]
        df2 = df2.dropna(subset=[df2_cols[1]])
        for index2, row2 in df2.iterrows():
            if row2[df2_cols[1]].upper() in crime_type:
                s1 = pd.Series(cnty_cd)
                df_rows = (pd.concat([s1, row2], axis=0)).to_frame().transpose()
                final_data.append(df_rows)
                break
    else:
        print(f'{sheetname} does not exist!')
df3 = pd.concat(final_data)
df3.columns = final_cols
df_cols = (pd.Series(final_cols, index=final_cols)).to_frame().transpose()
df_final = (pd.concat([df_cols,df3], axis=0, ignore_index=True, sort=False))
df_final.style.apply(lambda x: ['background: blue' if x.name==0 else '' for i in x], axis=1)
df_final.to_excel(sheet3, sheet_name='Crime Details',index=False,header = None)
print(f'Successfully created the Output file at {sheet3}!')
You need to export the styled DataFrame to Excel, not the unstyled one. df_final.style.apply(...) returns a new Styler object and leaves df_final unchanged, so you either need to chain the styling and the export to Excel together, similar to what is shown in the documentation, or assign the styled DataFrame to a variable and use that to export to Excel.
The latter could look like this based on your code:
# 'background-color' is the CSS property pandas converts into an Excel cell fill
df_styled = df_final.style.apply(lambda x: ['background-color: blue' if x.name==0 else '' for i in x], axis=1)
df_styled.to_excel(sheet3, sheet_name='Crime Details', index=False, header=False, engine='openpyxl')
As described in the documentation, you need either the openpyxl or the XlsxWriter engine for the export.
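The chained variant can be sketched as follows. The function names are hypothetical, and it assumes the row to highlight is the first row of df_final; reset_index(drop=True) makes sure that row carries the label 0 the style function checks. Styler needs jinja2 and the export uses the openpyxl engine, so both packages must be installed before calling the export function.

```python
import pandas as pd


def highlight_first_row(row: pd.Series) -> list:
    """CSS for one row: a blue fill on the row whose index label is 0, nothing elsewhere."""
    css = "background-color: blue" if row.name == 0 else ""
    return [css] * len(row)


def export_with_first_row_highlighted(df: pd.DataFrame, path: str) -> None:
    """Style and export in one chained expression (requires jinja2 and openpyxl)."""
    (
        df.reset_index(drop=True)  # make sure the first row has the label 0
        .style.apply(highlight_first_row, axis=1)
        .to_excel(path, sheet_name="Crime Details", index=False, header=False, engine="openpyxl")
    )
```

For example, `export_with_first_row_highlighted(df_final, sheet3)` would take the place of the separate style and to_excel lines in the question.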
Good morning.
I have a question about Python. I have an if/else inside a loop, and the else branch runs for more than one file. I need to keep all the information it reads in a single DataFrame. Is there a way to do this?
The code I am using:
for idx, folder in enumerate(fileLista):
    if folder == 'filename_for_treatment':
        df1 = pd.read_excel(folder, sheet_name = sheetName[idx], skiprows=1)
        df1.columns = df1.columns.str.strip()
        tratativaUm = df1[[column information to be used]]
    else:
        df2 = pd.read_excel(folder, sheet_name = sheetName[idx], skiprows=1)
        df2.columns = df2.columns.str.strip()
        tratativaDois = df2[[column information to be used]]
        ####assign result of each file received in the else
frames = [tratativaUm, tratativaDois]
titEmpresa = pd.concat(frames)
Can someone help me? Is it possible to do this? Thanks.
You can do it by appending your DataFrames to a list, for example:
list_df_tratativaDois = []
for idx, folder in enumerate(fileLista):
    df = pd.read_excel(folder, sheet_name = sheetName[idx], skiprows=1)
    df.columns = df.columns.str.strip()
    if folder == 'filename_for_treatment':
        tratativaUm = df[[column information to be used]]
    else:
        list_df_tratativaDois.append(df[[column information to be used]])
titEmpresa = pd.concat([tratativaUm]+list_df_tratativaDois)
Note that instead of df1 and df2 you can create a single df, since both branches use the same read_excel call, and then, depending on whether folder is the right one, do a different action on df.
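As a self-contained sketch of the same pattern, the function below works on an in-memory dict that stands in for the read_excel calls (the file names, the columns A and B, and the `combine_frames` helper are hypothetical). It assumes both branches keep the same columns, and it calls pd.concat once at the end rather than inside the loop.

```python
import pandas as pd


def combine_frames(frames_by_file: dict, special_file: str, columns: list) -> pd.DataFrame:
    """Keep `columns` from every frame, with the special file first, then concat once."""
    special = None
    others = []
    for name, df in frames_by_file.items():
        df = df.rename(columns=str.strip)  # same effect as df.columns.str.strip()
        if name == special_file:
            special = df[columns]
        else:
            others.append(df[columns])  # every else-branch result is kept
    parts = ([special] if special is not None else []) + others
    return pd.concat(parts, ignore_index=True)


# Made-up frames standing in for pd.read_excel(folder, ...) results
frames = {
    "filename_for_treatment": pd.DataFrame({" A ": [1], "B": [10]}),
    "file2.xlsx": pd.DataFrame({"A": [2], "B ": [20]}),
    "file3.xlsx": pd.DataFrame({"A": [3], "B": [30]}),
}
titEmpresa = combine_frames(frames, "filename_for_treatment", ["A", "B"])
print(titEmpresa)
```

Collecting into a list and concatenating once avoids overwriting the previous file's result on each pass, which is what happens when tratativaDois is reassigned inside the loop.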