I have several Excel files in the same folder, and each of them contains the same sheets. I need to select the last sheet of each file and join them all by their columns (that is, form a single table). The columns are named the same in all files. I think I need to load the dataframe from each file and then paste them together, but I do not know how.
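Just do what Recessive said: use a for loop to read the Excel files one by one, then concatenate the results (this assumes the workbooks are .xlsx files):
import os
import pandas as pd
frames = []
for file in os.listdir(filepath):
    if file.endswith('.xlsx'):
        path = os.path.join(filepath, file)
        # the last entry of sheet_names is the last sheet in the workbook
        last_sheet = pd.ExcelFile(path).sheet_names[-1]
        frames.append(pd.read_excel(path, sheet_name=last_sheet))
# stack the tables from all files into a single dataframe
combined = pd.concat(frames, ignore_index=True)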
I have a few .xlsm files in a directory.
I want to write Python code that creates new .xlsx files in that directory with the same file names as the xlsm files (just xlsx instead of xlsm).
The xlsm files have a simple formula in cell Q1. It just sums a few cells. (like Q1 is =A1+B1).
I want to copy the result of the sum in Q1 of the xlsm to the corresponding xlsx file, and it should be a constant number in the xlsx file, not a formula (since the rest of the xlsx sheet is empty).
Whatever I try, it doesn't copy the result as a constant number. It copies it as a formula, and then the value in the xlsx file is wrong (since it doesn't have the rest of the data to make the calculation).
Here is my python code, what should I change so it will copy the result just as a constant number?
(edit: probably in the future I'll make the formula in Q1 more complicated, and maybe copy more cells with different formulas.. so I'm looking for a solution that will copy the data itself, not a solution that makes calculations in xlsx file by using the values in xlsm files.
Edit2: Only these specific cells are copied. It does not copy the whole xlsm files to the xlsx files)
Thanks
import os
import openpyxl
for filename in os.listdir():
    if filename.endswith('.xlsm'):
        xlsm_wb = openpyxl.load_workbook(filename)
        xlsm_sheet = xlsm_wb.active
        xlsx_wb = openpyxl.Workbook()
        xlsx_sheet = xlsx_wb.active
        xlsx_sheet['Q1'] = xlsm_sheet['Q1'].value
        xlsx_wb.save(filename[:-1] + 'x')
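You could just replicate the formula in Python, reading the inputs from the xlsm sheet. For example, replace the Q1 assignment with (this assumes A1 and B1 hold plain numbers, not formulas):
xlsx_sheet['Q1'] = xlsm_sheet['A1'].value + xlsm_sheet['B1'].value
This writes the computed number into the xlsx file instead of the formula text.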
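If the goal is to copy the stored result rather than recompute it, openpyxl can load a workbook with data_only=True, which returns the value Excel cached for each formula cell the last time the file was saved. A minimal sketch, assuming the .xlsm files were last saved by Excel (otherwise the cached value is None); the function name copy_cell_value and its parameters are made up for illustration:

```python
import openpyxl


def copy_cell_value(src_path, dst_path, cell="Q1"):
    """Copy the cached value of `cell` from src_path into a new workbook."""
    # data_only=True yields the last value Excel stored for a formula cell,
    # not the formula text. It is None if Excel never calculated the file.
    src_sheet = openpyxl.load_workbook(src_path, data_only=True).active
    value = src_sheet[cell].value
    dst_wb = openpyxl.Workbook()
    dst_wb.active[cell] = value
    dst_wb.save(dst_path)
    return value
```

Inside the loop from the question this could be called as copy_cell_value(filename, filename[:-1] + 'x'). Because the value is read as data, it should keep working if the formula in Q1 becomes more complicated later.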
Sorry if this question has been asked before, I just couldn't find a simple example.
I have 2 large CSV files that I would like to split based on the unique values in the Location & LocationType Column. I would like to store the split csv files into sub-directories for each value in a folder named item/{item_name} where item_name is the unique value in Location & Location_type.
Location.csv
Location-type.csv
Each split csv file should have the same header line as the parent file
If the sub-directory already exists, delete those files before writing the new files.
The end result would be a directory called item with two sub-directories, fm5 and fm15, each holding the split CSV files location.csv and location_type.csv.
Thank you in advance
would like to know the workflow for this type of project
open the file
sort the contents on the desired column
group by the desired column
write each group to a new file
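The four steps above can be sketched with the standard csv module. This is only an illustration: the function name, the column argument, and the item output root are assumptions based on the question, and grouping is done with a dictionary, so the sort step is not needed.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_csv_by_column(src, column, out_root="item"):
    """Write one CSV per unique value of `column` to out_root/<value>/<src name>."""
    src = Path(src)
    groups = defaultdict(list)
    with src.open(newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames
        for row in reader:
            groups[row[column]].append(row)

    written = []
    for value, rows in groups.items():
        target_dir = Path(out_root) / value
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / src.name
        # Mode "w" replaces any file of the same name left by an earlier run.
        with target.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=header)
            writer.writeheader()  # every split file gets the parent's header
            writer.writerows(rows)
        written.append(target)
    return written
```

Calling it once per parent file fills item/fm5 and item/fm15 with one split file each. If stale files from an older run must go as well, shutil.rmtree on the output root before the first call would clear them. For files too large to hold in memory, the rows would have to be streamed to the open writers instead of collected first.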
I have one Excel file with multiple sheets (let's call this the 'Master' file). Each sheet has a list in columns A and B. I also have a folder with multiple files whose names are similar to those of the sheets in the 'Master' file. The names are similar, containing specific text, but not exactly the same.
I would like to be able to export each sheet from the Master file to the files in the file path with the corresponding text.
How can I achieve this using python? I have not tried any code yet because I can't find anything that seems to be exactly what I need.
You can use pandas to read the Excel file, iterate over the sheets, and save each one to a different path.
This code sample is based on asongtoruin's answer in 'Python Loop through Excel sheets, place into one df':
import pandas as pd
# sheet_name=None returns a dict mapping each sheet name to its dataframe
sheets_dict = pd.read_excel('master_file.xlsx', sheet_name=None)
for name, sheet in sheets_dict.items():
    # to_excel needs a file extension to pick a writer
    sheet.to_excel("directory/" + name + ".xlsx")
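Since the target file names only contain the sheet name rather than match it exactly, each sheet still has to be paired with its file. A rough sketch, assuming "corresponding" means the sheet name appears somewhere in the file name, ignoring case (this matching rule and the function name are assumptions, not something stated in the question):

```python
import os


def match_sheets_to_files(sheet_names, file_names):
    """Map each sheet name to the first file whose name contains it."""
    matches = {}
    for sheet in sheet_names:
        for file_name in file_names:
            stem = os.path.splitext(file_name)[0]
            # case-insensitive substring test; sheets with no match are skipped
            if sheet.lower() in stem.lower():
                matches[sheet] = file_name
                break
    return matches
```

The resulting mapping can then pick the output path for each sheet in the loop. Note that to_excel replaces the whole target workbook, so anything already in that file would be lost.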
I have a script that parses Excel files all together from one directory. It joins all of the files together and concatenates them into one.
Right now I write the CSV file from a dataframe: I start with an empty list, then for each file I call the function cutpaste, which parses the data I want into a new dataframe, and append the result to the list. The list is then concatenated and written out as one final CSV file.
files is the variable that calls all the Excel files from a given directory.
# Create new CSV file
df_list = []
for file in files:
    df = pd.read_excel(io=file, sheet_name=sheet)
    new_file = cutpaste(df)
    df_list.append(new_file)
df_final = pd.concat(df_list)
df_final.to_csv('Energy.csv', header=True, index=False)
What I need now is a way of changing my code so that I can write any new Excel files that don't already exist in Energy.csv to Energy.csv.
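One possible approach is to remember which files have already been written and only parse the rest. A minimal sketch, assuming a plain text log of processed file names kept next to Energy.csv; the log file name processed_files.txt and both helper functions are hypothetical:

```python
from pathlib import Path


def unprocessed(files, log_path="processed_files.txt"):
    """Return the files whose names are not yet listed in the log."""
    log = Path(log_path)
    seen = set(log.read_text().splitlines()) if log.exists() else set()
    return [f for f in files if Path(f).name not in seen]


def mark_processed(files, log_path="processed_files.txt"):
    """Append the given file names to the log, one per line."""
    with open(log_path, "a") as handle:
        for f in files:
            handle.write(Path(f).name + "\n")
```

The existing loop would then iterate over unprocessed(files) instead of files, append the new rows with df_final.to_csv('Energy.csv', mode='a', header=False, index=False) once Energy.csv already exists, and finish by calling mark_processed on the files it handled.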
I am creating a new dataframe in pandas as below:
df = pd.read_excel(zipfile.open('zipfilename 2017-06-28.xlsx'), header=1, names=cols)
The single .xlsx within the .zip is dynamically named (so changes based on the date).
This means I need to change the name of the .xlsx in my code each time I open the .zip to account for the dynamically named .xlsx.
Is there a way to make pandas read the file within the .zip, regardless of the name of the file? Or to return the name of the .xlsx within the line of code somehow?
Thanks
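read_excel has no compression argument, so ask the archive for the name of its member instead. With zipfile being the open ZipFile object from your code:
name = zipfile.namelist()[0]
df = pd.read_excel(zipfile.open(name), header=1, names=cols)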
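The member name can also be looked up from the archive itself with the standard zipfile module, so the date in the file name never has to be typed. A small sketch, assuming the archive holds exactly one .xlsx file; the helper name read_single_xlsx is made up for illustration:

```python
import io
import zipfile


def read_single_xlsx(zip_path):
    """Return (member_name, BytesIO) for the only .xlsx file in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        names = [n for n in archive.namelist() if n.lower().endswith(".xlsx")]
        if len(names) != 1:
            raise ValueError(f"expected one .xlsx in {zip_path}, found {len(names)}")
        # Read the member fully into memory so the buffer is seekable.
        return names[0], io.BytesIO(archive.read(names[0]))
```

The returned buffer can be passed straight to pandas, for example name, buffer = read_single_xlsx(path_to_zip) followed by df = pd.read_excel(buffer, header=1, names=cols), and name holds the dated file name if it is needed afterwards.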