How to iterate through multiple Excel files using Python

I am trying to develop a python script that will iterate through several Excel .xlsx files, search each file for a set of values and save them to a new .xlsx template.
The problem is getting a proper list of the files in the folder I'm looking at. I save the file names in a list variable, 'fileList', to manage the iteration.
When I run os.chdir(sourcepath), I consistently get:
FileNotFoundError: [WinError 2] The system cannot find the file specified: C:\\Users\\username\\PycharmProjects\\projectName\\venv\\Site List\\siteListfolder
I think this has to do with the '\\' shown in the error, but when I run print(sourcepath) in this code, the path is displayed properly, with just one '\' between each subdirectory instead of two.
I need to get the list of files in siteListfolder and iterate through them using this kind of logic:
priCLLI = sys.argv[1]
secCLLI = sys.argv[2]
sourcepath = os.path.join(homepath, 'Site List', f'{priCLLI}_{secCLLI}')
siteListfolder = os.listdir(sourcepath)
for file in siteListfolder:
    for row in file:
        <script does its work>
The line 'siteListfolder = os.listdir(sourcepath)' is what generates the error.
Thanks to all in advance for supporting this kind of forum.

import os
directory = 'your/path/directory'
Source_Workbook = []
for filename in os.listdir(directory):
    if filename.endswith(".xlsx"):
        Source_Workbook.append(filename)
print(Source_Workbook)
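The doubled backslashes in the error message are only how Python displays the repr of a string; the path itself contains single backslashes. The likelier cause is that some part of the path does not exist. A minimal sketch for locating that part, assuming a hypothetical helper named first_missing_part (it is not part of the original script or of any library):

```python
from pathlib import Path


def first_missing_part(path):
    """Return the shortest prefix of `path` that does not exist, or None.

    Hypothetical debugging helper, not part of any library.
    """
    path = Path(path)
    # Path.parents runs from the immediate parent up to the root, so reverse
    # it to check the outermost folders first, then the full path itself.
    for candidate in [*reversed(path.parents), path]:
        if not candidate.exists():
            return candidate
    return None


# repr() doubles each backslash; print() of the string shows the real characters.
sample = "Site List\\siteListfolder"
print(repr(sample))
print(sample)
```

Calling first_missing_part(sourcepath) just before os.listdir(sourcepath) would show which folder in the chain is actually missing.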

Related

Modify all files in specified directory (including subfolders) and saving them in new directory while preserving folder structure (Python)

(I'm new to Python, so please excuse the probably trivial question. I tried my best looking for similar issues but surprisingly couldn't find anyone with the same question.)
I'm trying to build a simple static site generator in Python. The script should take all .txt files in a specific directory (including subfolders), paste the content of each into a template .html file and then save all the newly generated .html files into a new directory while recreating the folder structure of the original directory.
So far I have the code that does the conversion for a single file, but I'm unsure how to do it for multiple files in a directory.
with open('template/page.html', 'r') as template:
    templatedata = template.read()
with open('content/content.txt', 'r') as content:
    contentdata = content.read()
pagedata = templatedata.replace('!PlaceholderContent!', contentdata)
with open('www/content.html', 'w') as output:
    output.write(pagedata)
To manipulate files and directories, import the built-in os module:
import os
The functionality in the os module includes:
Listing the contents of a directory:
path_to_template_dir = 'template/'
template_files = os.listdir(path_to_template_dir)
print(template_files)
# Outputs : ['page.html']
Creating a directory (if it does not already exist):
path_to_output_dir = 'www/'
try:
    os.mkdir(path_to_output_dir)
except FileExistsError:
    print('Directory exists:', path_to_output_dir)
Since you know the names of the directories you want to use, these two functions give you the names of the files to read and to generate. Join each file name to its directory name to build the final path string, which you can then open() for reading and/or writing.
It's hard to give a perfect code example for your question, since the logic for how you want to combine each template and content file is missing, but here is an example that writes a file inside the newly created directory:
path_to_output_file = path_to_output_dir + 'content.html'
with open(path_to_output_file, 'w') as output:
    output.write('Content')
And an example that reads all the template files inside the template/ directory and prints them to the screen:
for template_file in template_files:
    path_to_template_file = path_to_template_dir + template_file
    with open(path_to_template_file, 'r') as template:
        print(template.read())
In the end, manipulating files is all about building the path string you want to read from or write to, and then opening it.
Any other functionality you might need (for example, checking whether a path is a file with os.path.isfile() or a directory with os.path.isdir()) can be found in the os and os.path modules.
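For the subfolder part of the question, os.walk can visit every folder under the content directory. The following is a minimal sketch, not a definitive implementation: the function name build_site and its parameters are assumptions for illustration, while the placeholder text and the template/, content/ and www/ names come from the question.

```python
import os


def build_site(content_dir, template_path, output_dir,
               placeholder='!PlaceholderContent!'):
    """Render every .txt file under content_dir into output_dir as .html,
    mirroring the sub-folder layout. Returns the list of files written."""
    with open(template_path, 'r') as template:
        templatedata = template.read()

    written = []
    # os.walk visits content_dir and every sub-folder below it.
    for dirpath, _dirnames, filenames in os.walk(content_dir):
        # Path of the current folder relative to content_dir ('.' for the top).
        relative_dir = os.path.relpath(dirpath, content_dir)
        target_dir = os.path.normpath(os.path.join(output_dir, relative_dir))
        for filename in filenames:
            stem, extension = os.path.splitext(filename)
            if extension != '.txt':
                continue
            # Create the mirrored folder only when there is a file to put in it.
            os.makedirs(target_dir, exist_ok=True)
            with open(os.path.join(dirpath, filename), 'r') as content:
                contentdata = content.read()
            target_path = os.path.join(target_dir, stem + '.html')
            with open(target_path, 'w') as output:
                output.write(templatedata.replace(placeholder, contentdata))
            written.append(target_path)
    return written
```

With the layout from the question, this would be called as build_site('content', 'template/page.html', 'www').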

Duplicate in list created from filenames (python)

I'm trying to create a list of excel files that are saved to a specific directory, but I'm having an issue where when the list is generated it creates a duplicate entry for one of the file names (I am absolutely certain there is not actually a duplicate of the file).
import glob
# get data file names
path = r'D:\larvalSchooling\data'
filenames = glob.glob(path + "/*.xlsx")
output:
>>> filenames
['D:\\larvalSchooling\\data\\copy.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Sat_70dpf_GroupA_n5_20200808_1015-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Sat_84dpf_GroupABCD_n5_20200822_1440-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx']
You'll note 'D:\larvalSchooling\data\Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx' is listed twice.
Rather than going through after the fact and removing duplicates, I was hoping to figure out why it's happening to begin with.
I'm using Python 3.7 on Windows 10 Pro.
If you wrote the code to remove duplicates (which can be as simple as filenames = set(filenames)) you'd see that you still have two filenames. Print them out one on top of the other to make a visual comparison easier:
'D:\\larvalSchooling\\data\\Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx',
'D:\\larvalSchooling\\data\\~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx'
The second one has a leading ~$, so it is a different file (probably a temporary file created by Excel).
Whenever you open an Excel file, Excel creates a hidden temporary file next to it whose name starts with ~$. It marks the workbook as open (an owner/lock file rather than a full backup copy). In this case:
Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx
~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx
This means the workbook is currently open in some program, and glob is picking up that temporary file (it is usually hidden in Explorer as well).
Find the program that has the workbook open and close it. If you want to guard against this in general, also add a check so that files matching "~$*.xlsx" are ignored.
You can use os.path.splitext to get the file extension and loop through the directory using os.listdir. The temporary files that Excel creates for open workbooks can be skipped using the following code:
import os
filenames = []
for file in os.listdir(r'D:\larvalSchooling\data'):
    filename, file_extension = os.path.splitext(file)
    if file_extension == '.xlsx':
        if not file.startswith('~$'):
            filenames.append(file)
Note: this might not be the best solution, but it'll get the job done :)
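The same filter can be kept with glob, as in the question. This is a small sketch under the assumption that only names starting with ~$ need to be excluded; the function name list_workbooks is hypothetical.

```python
import glob
import os


def list_workbooks(folder):
    """Return sorted full paths of the .xlsx files in folder, skipping ~$ files."""
    # glob.escape protects folder names that contain characters such as [ or ].
    pattern = os.path.join(glob.escape(folder), '*.xlsx')
    return sorted(
        path for path in glob.glob(pattern)
        if not os.path.basename(path).startswith('~$')
    )
```

Unlike the os.listdir version, this returns full paths, so the results can be opened without changing the working directory.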

Python: os.chdir takes an open Excel file instance into account during execution

I have some excel files in a folder. I use the below code to read those excel files and get them into a list so that I can pass that list into a loop to get particular data from all those files.
My problem is this: if I open an Excel file from that folder and then run the script, a temporary instance of the opened file is created in the folder. The script treats that temporary instance as an .xlsx file, returns it in the list, and passes it to the loop, where it eventually fails with "No such directory". I found a way to avoid the failure by subtracting 1 from the length of the list in the loop, but this isn't effective.
Please suggest any alternatives to os.chdir.
import pandas as pd
import glob
import os
os.chdir(r'\\servername/Files_to_Read')
files = [i for i in glob.glob('*.{}'.format('xlsx'))]
print(files)
s = 0
while(len(files) > s):
    print(files[s])
    df_getvalues = pd.read_excel(files[s], sheet_name="LISTS", header=None)
    dfindx = (df_getvalues.index)
    print("This is the index of the file - " +str(dfindx))
    print(df_getvalues.iloc[dfindx,0])
    s = s + 1
Error:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\$name_of_file.xlsx'
The error states that it is searching for the file on the C drive, but the actual folder of Excel files is on the H drive.
Note: I'm using Windows 10, Excel 2016, Python 3.7.
This issue seems to be very intermittent; it never occurred again after I rebooted. I also had some issues with my VM migration, and my profile was configured manually, so I don't know which of these changes resolved it.
Also, I am no longer using os.chdir. I now build full paths from a directory listing, since I read somewhere that chdir is not recommended.
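final_driving_list = []
list_dirctry_content = os.listdir(src)  # src is the folder that holds the Excel files
for xlfile in list_dirctry_content:
    str_name_file = os.path.join(src, xlfile)
    if str_name_file.endswith('.xlsx'):
        # xlfile is already the bare file name; str.strip(src) would remove
        # matching characters from both ends, not the folder prefix.
        final_driving_list.append(xlfile)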
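One caution about using str.strip to cut a folder off a path: strip treats its argument as a set of characters to remove from both ends, not as a prefix. A short illustration, using a made-up folder and file name:

```python
from pathlib import PureWindowsPath

# Hypothetical folder and file, chosen only to show the effect.
src = r"C:\data"
full_path = src + "\\" + "data.xlsx"

# strip() removes any of the characters C : \ d a t from both ends,
# so it eats the start of the file name along with the folder.
stripped = full_path.strip(src)

# PureWindowsPath parses Windows-style paths on any operating system.
name = PureWindowsPath(full_path).name

print(stripped)  # .xlsx
print(name)      # data.xlsx
```

On Windows, os.path.basename(full_path) gives the same result as the PureWindowsPath call.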

How to fix "No such file or directory" error with csv creation in Python

I'm trying to make a new .csv file, but I'm getting a "No such file or directory" in the with open(...) portion of the code.
I modified the with open(...) portion of the code to leave out the directory, substituting a plain file name, and it worked just fine. The file was created alongside my PyCharm scratch files on the C: drive.
I believe it's worth noting that I'm running python on my C: Drive while the directory giving me issues exists on the D: Drive. Not sure if that actually makes a difference, but i
import csv
import pathlib
path = r"D:\Folder_Location\\"
plpath = pathlib.PurePath(path)
files = []
csv_filename = r"D:\Folder_Location\\"+str(plpath.name)+".csv"
# Create the new CSV
with open(csv_filename, mode='w',newline='') as c:
    writer = csv.writer(c)
    writer.writerow(['Date','Name'])
I expected the code to create a new .csv file that would then be used by the rest of the script in the specific folder location, but instead I got the following error:
  File "C:/Users/USER/.PyCharm2018.2/config/scratches/file.py", line 14, in <module>
    with open(csv_filename, mode='w',newline='') as c:
FileNotFoundError: [Errno 2] No such file or directory: '[INTENDED FILE NAME]'
Process finished with exit code 1
The error message shows the correctly built file name, but then says that it can't find the location, leading me to believe, again, that it's not the code itself but an issue with the separate drives (speculating). Also, line 14 is where the with open(...) starts.
EDIT: I tested a theory, and moved the folder to the C: drive, updated the path with just a copy and paste from the new location (still using the \ at the end of the file path in Python), and it worked. The new .csv file is now there. So why would the Drive make a difference? Permission issue for Python?
A raw string cannot end with a single backslash '\', so what you use in your code, path = r"D:\Folder_Location\\", is valid. However, you don't need any backslashes at the end of your path.
I ran some tests similar to yours and everything worked; I only got the same error when I used a non-existent directory.
This is what I got:
FileNotFoundError: [Errno 2] No such file or directory: 'E:\\python\\myProgects\\abc\\\\sample3.txt'
So my bet is that path = r"D:\Folder_Location\\" points to a directory that does not exist, or that your path refers to a file rather than a folder.
To make sure, run this:
import os
path = r"D:\Folder_Location\\"
print(os.path.isdir(path)) # prints True if the folder exists
A better approach:
file_name = str(plpath.name)+".csv"
path = r"D:\Folder_Location"
csv_filename = os.path.join(path, file_name)
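Putting the directory check and os.path.join together, a minimal sketch might look like the following. The function name create_csv_named_after_folder is an assumption for illustration; the header row and the "folder name plus .csv" naming come from the question.

```python
import csv
import os


def create_csv_named_after_folder(folder, header):
    """Create <folder>/<folder name>.csv containing only the header row."""
    # normpath drops trailing separators so basename returns the folder name.
    folder = os.path.normpath(folder)
    if not os.path.isdir(folder):
        # Fail early with a message that names the folder, not the CSV file.
        raise FileNotFoundError(f"Folder does not exist: {folder}")
    csv_filename = os.path.join(folder, os.path.basename(folder) + ".csv")
    with open(csv_filename, mode="w", newline="") as c:
        csv.writer(c).writerow(header)
    return csv_filename
```

For the question's case this would be called as create_csv_named_after_folder(r"D:\Folder_Location", ['Date', 'Name']).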

How to save multiple files from URLs in one zip using Python?

I am scraping files from URLs using Beautiful Soup, and then want to store those files in a single zip using Python. Below is my code snippet for one URL.
fz = zipfile.ZipFile('C:\\Users\\ADMIN\\data\\data.zip', 'w')
response = urllib2.urlopen(url/file_name.txt)
file = open('C:\\Users\\ADMIN\\data\\filename.txt','w')
file.write(response.read())
file.close()
fz.write('C:\\Users\\ADMIN\\data\\filename.txt',compress_type=zipfile.ZIP_DEFLATED)
fz.close()
This snippet is not working for me. Can anyone please help? I'm getting the error below:
WindowsError: [Error 2] The system cannot find the file specified:
'C:\Users\ADMIN\data\filename.txt'
but the file is present in this location.
Use:
fz.writestr("file_name", response.read())
Call it as many times as you need, i.e. one writestr() per file. Select the ZIP's compression (deflated) when opening the new ZIP.
So you do not need to save each file to disk and then pack it. Just take the file's name and content and feed them to writestr(). ZIP uses '/' as the path separator, so use something like "some_dir/some_subdir/index.html" for subdirectories or "index.html" to put a file in the root.
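A minimal sketch of that approach follows, with the download step left out. The function name zip_from_memory and the shape of its files argument are assumptions for illustration; in practice each value would be the bytes returned by response.read() (in Python 3, urlopen lives in urllib.request rather than urllib2).

```python
import zipfile


def zip_from_memory(zip_path, files):
    """Write each (archive name -> bytes or str) pair in files into a new ZIP."""
    # The compression method is chosen once, when the archive is opened.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as fz:
        for archive_name, data in files.items():
            # writestr() stores the data directly; no temporary file is needed.
            fz.writestr(archive_name, data)
    return zip_path
```

The with block closes the archive even if one of the writes fails, which replaces the explicit fz.close() call.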
