For some research I am analysing an area for precipitation and runoff. Everything works fine when I analyse one area (reading a single file with pd.read_csv), but I would like to analyse multiple (>10) areas at once without manually changing the file name every time.
My goals in steps:
Operations on all txt files in folder A.
Operations on all txt files in folder B.
Operations on combinations of matching files from folders A and B. For example, put the precipitation of area 5 and the runoff of area 5 together in one graph.
What I tried
I figured this could be done by using for files in os.scandir(foldername) and then running the operations on all files in that folder, but my method is definitely wrong.
The first thing I'd like to do is add an index to all of my files. (I changed my path to "folderA" for this question.) I tried it this way:
for files in os.scandir(r"C:folderA"):
    with open (file) in files:
        file.loc[:,'dt'] = file.to_datetime(file[['year', 'month', 'day']])
        file.index = file['dt']
        file.columns = ['year', 'month', 'day','Rain','dt']
        file.head()
This results in a FileNotFoundError:

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-44-1641f38697a9> in <module>
      1 for files in os.scandir(r"C:\FolderA"):
----> 2     with open (file) in files:
      3         file.loc[:,'dt'] = file.to_datetime(file[['year', 'month', 'day']])
      4         file.index = file['dt']
      5         file.columns = ['year', 'month', 'day','Rain','dt']

FileNotFoundError: [Errno 2] No such file or directory: 'C'
As you can see, I could really use some help getting started. I think that once I understand the process better, I will be able to do the operations myself. In conclusion, my question is: how can I automatically run operations on the data of all .txt files in a folder?
To loop over all files in a directory, do this (and note two problems in your snippet: the loop variable is files but you open file, and a with statement binds the opened file with as, not in):
for root, dirs, files in os.walk(path):
    for filename in files:
        file_path = os.path.join(root, filename)
        with open(file_path) as file:
            # logic
Make sure the path you give is correct.
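To connect this back to your goals, here is a minimal sketch. It assumes the .txt files are comma-separated with the year, month, day and Rain columns from your snippet, and that matching files in folder A and folder B share the same file name; the folder paths and the runoff file layout are assumptions, so adjust them to your data:

import os
import pandas as pd

def load_folder(path):
    """Read every .txt file in `path` into a DataFrame, keyed by file name."""
    frames = {}
    for entry in os.scandir(path):
        if entry.is_file() and entry.name.endswith('.txt'):
            df = pd.read_csv(entry.path)  # adjust sep= if the files are not comma-separated
            df.index = pd.to_datetime(df[['year', 'month', 'day']])  # datetime index, as in the question
            frames[entry.name] = df
    return frames

rain = load_folder(r"C:\FolderA")    # precipitation files
runoff = load_folder(r"C:\FolderB")  # runoff files, assumed to mirror folder A's file names

# goal 3: pair files with the same name in both folders, e.g. area 5 rain + area 5 runoff
for name in rain.keys() & runoff.keys():
    combined = rain[name].join(runoff[name], rsuffix='_runoff')
    print(name)
    print(combined.head())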
Related
I have 3000 csv files for machine learning and I need to treat each of these files separately, but the code I will apply to each is the same. The files vary between 16 KB and 25 MB in size and between 60 and 330 thousand lines, and each csv file has 77 columns. With the help of the previous post I wrote the code inside the loop, but after applying it I cannot write the results back into the same files. I just applied the code from the previous post
and get the error "No such file or directory: '101510EF'" (101510EF is the first csv file in my folder).
Looking forward to your help. Thank you!
You don't need the line:
file_name=os.path.splitext(...)
Just this:
path = "absolute/path/to/your/folder"
os.chdir(path)
all_files = glob.glob('*.csv')
for file in all_files:
df = pd.read_csv(file)
df["new_column"] = df["seq"] + df["log_id"]
df.to_csv(file)
You need to pass pd.read_csv and df.to_csv the file name WITH its extension, e.g. c:/Users/kaanarik/Desktop/tez_deneme/ornel/101510EF.csv. Your error ("No such file or directory: '101510EF'") shows the .csv extension had been stripped off.
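If you prefer not to change the working directory with os.chdir, a variant of the same loop (a sketch) can glob absolute paths directly:

import glob
import os
import pandas as pd

path = "absolute/path/to/your/folder"

# glob with a joined pattern returns full paths, so no os.chdir is needed
for file in glob.glob(os.path.join(path, '*.csv')):
    df = pd.read_csv(file)
    df["new_column"] = df["seq"] + df["log_id"]
    df.to_csv(file, index=False)  # overwrite the file in place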
I am new to Python. I tried to search for answers but I cannot find an exact match to my question. I am trying to move all non-Excel files to another folder. However, there is an error when trying to move a .pbix file. I wonder if only a limited number of file types are supported by shutil.move() and os.rename() for moving files. And are there any workarounds? Thank you.
UPDATE: The error is a PermissionError. Actually, when I checked the target folder just now, the file was transferred, but the original file was retained as well.
Here is my sample code:
import os
import shutil
import pandas as pd

files = os.listdir(os.getcwd())
for f in files:
    try:
        data = pd.read_excel(f)  # importing the file
    except:
        shutil.move("{}".format(f), r".\\Non_Excel_Files\{}".format(f))
It is now working, thanks to the suggestion from S3DEV:
files = os.listdir(os.getcwd())
for f in files:
    if os.path.splitext(f)[1] != ".xlsx":
        shutil.move("{}".format(f), r".\\Non_Excel_Files\{}".format(f))
So far I've managed to compile all of the files from a series of folders using the following:
path = r'C:\Users\keefr\Documents\Data\Pulse Characterisation\sample 7'
subfolders = [f.path for f in os.scandir(path) if f.is_dir()]
for sub in subfolders:
    for f in os.listdir(sub):
        print(f)
        files = [i for i in f if os.path.isfile(os.path.join(f,'*.txt')) and 'data' in f]
Where f prints out the names of all of the files. What I want to do is take only certain files from this (names that start with 'data' and are .txt files) and put them in a list called files. The last line in the above code is where I tried to do this, but whenever I print files it is still empty. Any ideas where I'm going wrong and how to fix it?
Update
I've made some progress: I changed the last line to:
if os.path.isfile(os.path.join(sub,f)) and 'data' in f:
    files.append(f)
So I now have an array with the correct file names. The problem now is that there's a mix of .meta, .index and .txt files and I only want the .txt files. What's the best way to filter out the other types of files?
I would probably do it like this. Considering f is the filename and is a string, Python has the string methods startswith() and endswith(), which match your criteria of starting with data and ending with .txt exactly. If we find such a file, we append it to file_list. If you want the full path in file_list, I trust you are able to make that modification.
import os

path = r'C:\Users\keefr\Documents\Data\Pulse Characterisation\sample 7'
subfolders = [f.path for f in os.scandir(path) if f.is_dir()]
file_list = []

for sub in subfolders:
    for f in os.listdir(sub):
        if f.startswith("data") and f.endswith(".txt"):
            file_list.append(f)

print(file_list)
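An equivalent shortcut (a sketch, not part of the answer above) is glob, whose pattern can express the one subfolder level, the data prefix and the .txt suffix in one go, and which returns full paths directly:

import glob
import os

path = r'C:\Users\keefr\Documents\Data\Pulse Characterisation\sample 7'

# '*' matches each subfolder, 'data*.txt' matches the wanted files inside it
file_list = glob.glob(os.path.join(path, '*', 'data*.txt'))
print(file_list)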
Problem: I have around 100 folders in 1 main folder, each containing a csv file with the exact same name and structure, for example input.csv. I have written a Python script for one of the files which takes the csv file as input and produces two images as output in the same folder. I want to produce these images for every folder, using each folder's own input file.
Is there a fast way to do this? Until now I have copied my script into each folder and executed it there. For 5 folders that was alright, but for 100 it gets tedious and takes a lot of time.
Can someone please help me out? I'm very new to coding with respect to directories, paths, files etc. I have already tried to look for a solution, but no success so far.
You could try something like this:
import os
import pandas as pd

path = r'path\to\folders'
filename = 'input'

# get all directories within the top level directory
dirs = [os.path.join(path, val) for val in os.listdir(path) if os.path.isdir(os.path.join(path, val))]

# loop through each directory
for dd in dirs:
    file_ = [f for f in os.listdir(dd) if f.endswith('.csv') and filename in f]
    if file_:
        # this assumes only a single file of 'filename' and extension .csv exists in each directory
        file_ = file_[0]
    else:
        continue
    data = pd.read_csv(os.path.join(dd, file_))  # join with dd: os.listdir returns bare names
    # apply your script here and save images back to folder dd
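To make that final comment concrete, here is a hedged sketch of the "save images back to folder dd" step using matplotlib; the column names 'x' and 'y' are hypothetical placeholders for whatever your script actually plots. Call it as save_plots(data, dd) where the comment sits in the loop above:

import os
import matplotlib.pyplot as plt

def save_plots(data, dd):
    """Save an example image next to the input.csv it was computed from."""
    fig, ax = plt.subplots()
    ax.plot(data['x'], data['y'])  # 'x' and 'y' are hypothetical column names
    fig.savefig(os.path.join(dd, 'plot1.png'))  # the image lands in folder dd
    plt.close(fig)  # free the figure before the next folder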
I am currently learning how to work with Python and for the moment I am very fond of working with CSV files. I managed to learn a few things and now I want to apply what I learned to multiple files at once. But something got me confused. I have this code:
for root, dirs, files in os.walk(path):
    for file in files:
        if file.endswith(".csv"):
            paths = os.path.join(root, file)
            tables = pd.read_csv(paths, header='infer', sep=',')
            print(paths)
            print(tables)
It prints all the CSV files found in that folder in a certain format (a kind of table with the first row as a header and the rest following under it).
The trick is that I want to be able to access these at any time (print and edit), but what I wrote only prints them ONCE inside the loop. If I write print(paths) or print(tables) anywhere after that, it only prints the LAST CSV file and its data, even though I believed it should do the same thing.
I also tried making similar separate loops for each print (tables and paths), but it only works for the first os.walk() - I just don't get why it only works once.
Thank you!
You will want to store the DataFrames as you load them. Right now you are just loading and discarding: each pass through the loop rebinds paths and tables, so after the loop they only refer to the last file.
dfs = []
for root, dirs, files in os.walk(path):
    for file in files:
        if file.endswith(".csv"):
            paths = os.path.join(root, file)
            tables = pd.read_csv(paths, header='infer', sep=',')
            dfs.append(tables)
            print(paths)
            print(tables)
The above will give you a list of DataFrames, dfs, that you can then access and utilize. Like so:

print(dfs[0])  # prints the first DataFrame you read in

for df in dfs:
    print(df)  # prints each DataFrame in sequence
Once you have the data stored you can do pretty much anything.
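If the goal is to find a specific table again later for printing and editing, a dict keyed by file path (a sketch along the same lines) can be handier than a plain list:

import os
import pandas as pd

path = r'path\to\folders'  # same top-level folder as before

dfs = {}
for root, dirs, files in os.walk(path):
    for file in files:
        if file.endswith(".csv"):
            full_path = os.path.join(root, file)
            dfs[full_path] = pd.read_csv(full_path)

# look up, edit, or re-save any table by its path later on
for full_path, df in dfs.items():
    print(full_path)
    print(df.head())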