I am creating a new dataframe in pandas as below:
df = pd.read_excel(zipfile.open('zipfilename 2017-06-28.xlsx'), header=1, names=cols)
The single .xlsx within the .zip is dynamically named (so changes based on the date).
This means I need to change the name of the .xlsx in my code each time I open the .zip to account for the dynamically named .xlsx.
Is there a way to make pandas read the file within the .zip, regardless of the name of the file? Or to return the name of the .xlsx within the line of code somehow?
Thanks
Note that pd.read_excel does not accept a compression argument (that option belongs to pd.read_csv), so you can't unzip this way. Instead, ask the archive for its member names with namelist() and pass whichever entry is the .xlsx, so the date in the name no longer matters:
with zipfile.ZipFile('zipfilename.zip') as zf:
    xlsx_name = next(n for n in zf.namelist() if n.endswith('.xlsx'))
    df = pd.read_excel(zf.open(xlsx_name), header=1, names=cols)
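A reusable sketch of the same idea, with a guard for archives that contain no .xlsx (the function names and the archive path are made up for illustration):

```python
import zipfile
import pandas as pd

def first_xlsx_name(names):
    """Pick the first .xlsx entry out of a list of archive member names."""
    matches = [n for n in names if n.lower().endswith('.xlsx')]
    if not matches:
        raise FileNotFoundError('no .xlsx file found in archive')
    return matches[0]

def read_xlsx_from_zip(zip_path, **read_excel_kwargs):
    """Read whichever .xlsx the zip contains, regardless of its (dated) name."""
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(first_xlsx_name(zf.namelist())) as fh:
            return pd.read_excel(fh, **read_excel_kwargs)

# usage (path and cols are placeholders from the question):
# df = read_xlsx_from_zip('archive.zip', header=1, names=cols)
```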
I have derived some new data from existing data, and I want to save it in a directory separate from the one the original data came from. I believe the data path is correct, but I don't think I'm calling the right method to both create the CSV and put it in the new directory. Here is the code I was given:
#create the appropriate data path
datapath = '../data'
#save the dataframe as a csv file in a new directory
save_file(ski_data, 'ski_data_cleaned.csv', datapath)
I get an error:
NameError: name 'save_file' is not defined
I understood 'save_file' to be a method, and I'm not sure how to include 'datapath' in other methods.
Try the following: call the to_csv method on your DataFrame, passing the CSV file path as the argument.
ski_data.to_csv("../data/ski_data_cleaned.csv")
If you need to save without headers then use the following one.
ski_data.to_csv("../data/ski_data_cleaned.csv", header=False, index=False)
To save to a specific location:
#For windows
ski_data.to_csv(r"C:\Users\Admin\Desktop\data\ski_data_cleaned.csv")
Check the official pandas to_csv documentation for more details.
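As for the original NameError: save_file was presumably a helper defined elsewhere in the course material. A minimal stand-in that combines to_csv with directory creation could look like this (the helper name and datapath come from the question; the throwaway directory in the demo stands in for '../data'):

```python
import os
import pandas as pd

def save_file(data, name, datapath):
    """Write a DataFrame to <datapath>/<name>, creating the directory if needed."""
    os.makedirs(datapath, exist_ok=True)
    path = os.path.join(datapath, name)
    data.to_csv(path, index=False)
    return path

# demo with a throwaway directory (stand-in for '../data')
import tempfile
demo_dir = tempfile.mkdtemp()
saved_path = save_file(pd.DataFrame({'resort': ['A'], 'price': [99]}),
                       'ski_data_cleaned.csv', demo_dir)
exists = os.path.exists(saved_path)
```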
I'm trying to output an .xlsx file with pd.ExcelWriter as shown below:
writer_t = pd.ExcelWriter('C:/Users/bbecker021/AppData/Local/Programs/Python/Python38/Project_Catalog_Template_IDs.xlsx', engine='xlsxwriter')
I then append to the file with:
data_requirements_DF.to_excel(writer_t, sheet_name=repo_list[element])
And print out the dataframe being added:
print("---", repo_list[element], "data_requirements---\n", data_requirements_DF)
But when I check the folder path specified above, there is no file. Any help checking what might be happening?
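One likely cause (assuming the rest of the script runs without error) is that the writer is never saved: pd.ExcelWriter buffers the workbook in memory and only writes the .xlsx when the writer is closed. Using the writer as a context manager closes it automatically. A sketch, with a temporary path standing in for the real one and the default engine instead of xlsxwriter:

```python
import os
import tempfile
import pandas as pd

out_path = os.path.join(tempfile.mkdtemp(), 'Project_Catalog_Template_IDs.xlsx')

# The workbook is only flushed to disk when the writer is closed,
# which the `with` block does automatically on exit.
with pd.ExcelWriter(out_path) as writer_t:
    pd.DataFrame({'id': [1, 2]}).to_excel(writer_t, sheet_name='repo_a')

file_written = os.path.exists(out_path)
```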
I have not seen what I am about to ask anywhere so far.
I have 2 excel files in a folder named say RedRose on say C drive.
The files start with say date 09-30-2019_rest_of_name1, ...name2.
The _rest_of_name1, ...name2 are static, only dates are updated daily as new files are added into the RedRose folder daily.
Using Python, I want to automatically look in that folder, find each file by name, and import each file into its own pandas dataframe.
Thoughts, can this be done with Python?
Not sure where to start
You can get a list of files in the current directory with the glob module.
import glob
files = glob.glob(r'C:\RedRose\*.xls*')
It returns a list of paths whose extension starts with .xls, so both .xls and .xlsx match. Note that glob uses shell-style wildcards (fnmatch), not regular expressions; on Windows, use a raw string so the backslashes in the path are taken literally.
Use the read_excel function in the Pandas library to read the excel files into DataFrames. You can loop through all the file names in files and store each DataFrame as an element of a list or dictionary.
import pandas as pd
dataframes = []
for filename in files:
    dataframes.append(pd.read_excel(filename))
For reading into a dictionary, you need to specify a key for each DataFrame. I would suggest using the filename as the key because it is unique.
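A sketch of the dictionary variant; the reader argument exists only so the demo below can run on throwaway CSV files, and for the .xls files in the question you would keep the pd.read_excel default:

```python
import glob
import os
import pandas as pd

def load_frames(pattern, reader=pd.read_excel):
    """Read every file matching `pattern` into a dict keyed by file name."""
    return {os.path.basename(p): reader(p) for p in glob.glob(pattern)}

# demo on throwaway CSV files so it runs anywhere; with the real data
# you would call load_frames(r'C:\RedRose\*.xls*') instead
import tempfile
tmp = tempfile.mkdtemp()
for name in ('09-30-2019_rest_of_name1.csv', '09-30-2019_rest_of_name2.csv'):
    pd.DataFrame({'x': [1]}).to_csv(os.path.join(tmp, name), index=False)

frames = load_frames(os.path.join(tmp, '*.csv'), reader=pd.read_csv)
```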
I have one Excel file with multiple sheets (call this the 'Master' file). Each sheet has a list in columns A and B. I have a folder containing multiple files with names similar to the sheet names in the 'Master' file. The names are similar, containing specific text, but not exactly the same.
I would like to be able to export each sheet from the Master file to the files in the file path with the corresponding text.
How can I achieve this using python? I have not tried any code yet because I can't find anything that seems to be exactly what I need.
You can use pandas to read the Excel file, iterate over the sheets, and save each one to a different path. This sample is based on asongtoruin's answer to 'Python Loop through Excel sheets, place into one df':
import pandas as pd
sheets_dict = pd.read_excel('master_file.xlsx', sheet_name=None)
for name, sheet in sheets_dict.items():
    sheet.to_excel("directory/" + name + ".xlsx")
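The question also asks to route each sheet to the existing file whose name contains the sheet's text. Since the exact naming rule isn't given, a case-insensitive substring match is one assumption you could start from (the paths below are hypothetical):

```python
import os

def files_matching_sheet(sheet_name, candidate_paths):
    """Return paths whose file name contains the sheet name (case-insensitive)."""
    key = sheet_name.lower()
    return [p for p in candidate_paths if key in os.path.basename(p).lower()]

# hypothetical file names in the target folder
paths = ['out/2020 Sales Report.xlsx', 'out/Costs Summary.xlsx']
matched = files_matching_sheet('sales', paths)
```

Each sheet from sheets_dict could then be written to its matched path instead of a fixed directory.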
I have a script that parses Excel files all together from one directory. It joins all of the files together and concatenates them into one.
Right now I build the CSV by starting with an empty list, appending the dataframe returned by the function cutpaste (which parses the data I want from each file), and concatenating everything into one final dataframe that is written out as a CSV.
files is the variable holding all the Excel file paths from a given directory.
# Create new CSV file
df_list = []
for file in files:
    df = pd.read_excel(io=file, sheet_name=sheet)
    new_file = cutpaste(df)
    df_list.append(new_file)
df_final = pd.concat(df_list)
df_final.to_csv('Energy.csv', header=True, index=False)
What I need now is a way of changing my code so that I can write any new Excel files that don't already exist in Energy.csv to Energy.csv.
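Energy.csv itself doesn't record which source files produced its rows, so one approach is to keep a small sidecar log of processed file names, skip anything already logged, and append only the new rows. A sketch (the log file name is made up):

```python
import os

def new_files(files, log_path):
    """Return only the files whose names are not yet in the sidecar log."""
    done = set()
    if os.path.exists(log_path):
        with open(log_path) as fh:
            done = {line.strip() for line in fh}
    return [f for f in files if os.path.basename(f) not in done]

def record_files(files, log_path):
    """Append the given file names to the sidecar log."""
    with open(log_path, 'a') as fh:
        for f in files:
            fh.write(os.path.basename(f) + '\n')

# demo with a throwaway log; in the script it would sit next to Energy.csv
import tempfile
log = os.path.join(tempfile.mkdtemp(), 'energy_processed.txt')
record_files(['a.xlsx'], log)
todo = new_files(['a.xlsx', 'b.xlsx'], log)
```

The per-file loop then runs only over todo, the result is appended with df_final.to_csv('Energy.csv', mode='a', header=not os.path.exists('Energy.csv'), index=False), and record_files(todo, log) marks those files as done.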