Windows OS - I've got several hundred subdirectories, and each subdirectory contains one or more .csv files. All the files are identical in structure. I'm trying to loop through each subdirectory and concatenate all the .csv files it contains into a new combined file in that same subdirectory.
example:
folder1 -> file1.csv, file2.csv, file3.csv -->> file1.csv, file2.csv, file3.csv, combined.csv
folder2 -> file1.csv, file2.csv -->> file1.csv, file2.csv, combined.csv
Very new to coding and getting lost in this. Tried using os.walk but completely failed.
The generator produced by os.walk yields a three-item tuple on each iteration: the path of the current directory in the walk, a list of the names of the subdirectories that will be traversed next, and a list of the filenames contained in the current directory.
If for whatever reason you don't want to walk certain paths, you should remove entries from what I called sub below (the list of subdirectories contained in root). This will prevent os.walk from traversing any paths you removed.
My code does not prune the walk. Be sure to add this if you don't want to traverse an entire file subtree.
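As a sketch of that pruning (the dot-directory filter and the function name here are just illustrative):

```python
import os

def walk_visible(top):
    # Collect (root, files) pairs while skipping dot-directories.
    seen = []
    for root, sub, files in os.walk(top):
        # Prune in place: os.walk will not descend into removed entries.
        sub[:] = [d for d in sub if not d.startswith('.')]
        seen.append((root, sorted(files)))
    return seen
```

Note the slice assignment sub[:] = ...; rebinding the name (sub = ...) would not affect the list os.walk actually uses.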
The following outline should work for this, although I haven't been able to test it on Windows. I have no reason to think it'll behave differently there.
import os
import sys

def write_files(sources, combined):
    # Keep the header from the first file only
    with open(sources[0], 'r') as first:
        combined.write(first.read())
    for source in sources[1:]:
        with open(source, 'r') as s:
            # Skip the header line of every subsequent file
            next(s, None)
            for line in s:
                combined.write(line)

def concatenate_csvs(root_path):
    for root, sub, files in os.walk(root_path):
        # Exclude combined.csv itself so rerunning the script is safe
        filenames = [os.path.join(root, filename) for filename in files
                     if filename.endswith('.csv') and filename != 'combined.csv']
        if not filenames:
            # Nothing to combine in this directory
            continue
        combined_path = os.path.join(root, 'combined.csv')
        with open(combined_path, 'w') as combined:
            write_files(filenames, combined)

if __name__ == '__main__':
    path = sys.argv[1]
    concatenate_csvs(path)
I have a folder, which I want to select manually, containing some number of .txt files. I want to make a program that lets me run it, select my folder of files, and cycle through all the files in the folder, taking a value from a set place in each.
I have already made a piece of code that allows me to take the value from the .txt file:
mylines = []
with open('test1.txt', 'rt') as myfile:
    for myline in myfile:
        mylines.append(myline)
subline = mylines[58]
sub = subline.split(' ')
print(sub[5])
EDIT: I also have a piece of code that makes a list of all the files I want to use this on:
import glob

path = r'C:/Users/Etienne/.spyder-py3/test/*.UIT'
files = glob.glob(path)
print(files)
How can I use the first piece of code on every file in the list from the second piece of code, so I end up with a list of values?
I have never worked with code before, but this would make my work a lot faster, so I want to pick up Python.
If I understood the problem correctly, the os module might be helpful for you.
The os.listdir() method in Python is used to get the list of all files and directories in the specified directory. For example:
import os
# Get the list of all files and directories
# in the root directory, you can change your directory
path = "/"
dir_list = os.listdir(path)
print("Files and directories in '", path, "' :")
# print the list
print(dir_list)
With this list you can iterate over your .txt files.
For additional information, see: How can I iterate over files in a given directory?
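Putting the two pieces together, one option is to wrap the extraction in a function and apply it to each path in the glob list. This is a sketch: the function names are mine, and the line/field indices copy the question's code and assume every file has at least that many lines.

```python
import glob

def extract_value(path, line_index=58, field_index=5):
    # Read the file, pick one line, split it on spaces, and return one field
    with open(path, 'rt') as myfile:
        mylines = myfile.readlines()
    return mylines[line_index].split(' ')[field_index]

def extract_all(pattern):
    # Apply the extraction to every file matched by the glob pattern
    return [extract_value(name) for name in glob.glob(pattern)]
```

For the question's files that would be extract_all(r'C:/Users/Etienne/.spyder-py3/test/*.UIT').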
I have multiple folders that contain about 5-10 files each. What I am trying to do is move on to the next folder when I finish processing the files from the previous folder, and start working on the new files. I have this code:
for root, dirs, files in os.walk("Training Sets"): #Path that contains folders
    for i in dirs: #if I don't have this, an error is shown in line 4 that path needs to be str and not list
        for file in i: #indexing files inside the folders
            path = os.path.join(i, files) #join path of the files
            dataset = pd.read_csv(path, sep='\t', header = None) #reading the files
            trainSet = dataset.values.tolist() #some more code
            editedSet = dataset.values.tolist() #some more code
            #rest of the code...
The problem is that it doesn't do anything. Not even printing if I add prints for debugging.
First off, be sure that you are in the correct top-level directory (i.e. the one containing "Training Sets"). You can check this with os.path.abspath(os.curdir). Otherwise, the code does nothing, since it does not find the directory to walk.
os.walk does the directory walking for you. The key is understanding root (the path to the current directory), dirs (a list of subdirectories) and files (a list of files in the current directory). You don't actually need dirs.
So your code is two loops:
>>> for root, dirs, files in os.walk("New Folder1"): #Path that contains folders
...     for file in files: #indexing files inside the folders
...         path = os.path.join(root, file) #join path of the files
...         print(path) # Your code here
...
New Folder1\New folder1a\New Text Document.txt
New Folder1\New folder1b\New Text Document2.txt
New to Python and would appreciate a little help.
I would like to go through 10 directories and copy the newest file from each directory back into a single folder. There may be multiple files in each directory.
I can pull a complete listing from each directory, but I'm not sure how to narrow this down. Any direction would be appreciated.
Inside the states directory will be directories for each state (i.e. CA, NY, FL, MI, GA).
Edit: if it is helpful, the directory structure looks like this:
'/dat/users/states/CA/'
'/dat/users/states/NY/'
'/dat/users/states/MI/'
import glob
import os
data_dir = '/dat/users/states/*/'
file_dir_extension = os.path.join(data_dir, '*.csv')
for file_name in glob.glob(file_dir_extension):
    if file_name.endswith('.csv'):
        print(file_name)
You can use os.walk() instead of glob.glob() to traverse all of your folders. For each folder you get a list of the filenames in it, which can be sorted by modification date using os.path.getmtime() so that the newest file ends up at the start of the list.
Pop the first element off the list and copy this to your target folder. The remaining elements in the list could then be deleted using os.remove() as follows:
import os
import shutil

root = r'/src/folder/'
copy_to = r'/copy to/folder'

for dirpath, dirnames, filenames in os.walk(root):
    # Filter only csv files
    files = [file for file in filenames if os.path.splitext(file)[1].lower() == '.csv']
    # Sort list by file date, newest first
    files = sorted(files, key=lambda x: os.path.getmtime(os.path.join(dirpath, x)), reverse=True)
    if files:
        # Copy the newest file
        copy_me = files.pop(0)
        print("Copying '{}'".format(copy_me))
        shutil.copyfile(os.path.join(dirpath, copy_me), os.path.join(copy_to, copy_me))
        # Remove the remaining files
        for file in files:
            src = os.path.join(dirpath, file)
            print("Removing '{}'".format(src))
            #os.remove(src)
os.path.join() is used to safely join a path and filename together.
Note: If it is supported on your system, you might need to use something like:
os.stat(os.path.join(dirpath, x)).st_birthtime
to sort based on the creation date/time.
I need some help from Python programmers to solve an issue I'm facing in processing data.
I have .csv files placed in a directory structure like this:
MainDirectory
    Sub directory 1
        sub directory 1A
            file.csv
    Sub directory 2
        sub directory 2A
            file.csv
    sub directory 3
        sub directory 3A
            file.csv
Instead of going into each directory to access the .csv files, I want to run a script that can combine the data from all the subdirectories.
Each file has the same header. I need to produce one big .csv file with a single header, with the data from all the .csv files appended one after the other.
I have the python script that can combine all the files in a single file but only when those files are placed in one folder.
Can you help to provide a script that can handle the above directory structure?
Try this code; I tested it on my laptop and it works well.
import os

def mergeCSV(srcDir, destCSV):
    with open(destCSV, 'w') as destFile:
        header = ''
        for root, dirs, files in os.walk(srcDir):
            for f in files:
                if not f.endswith(".csv"):
                    continue
                srcPath = os.path.join(root, f)
                # Skip the output file itself if it lives under srcDir
                if os.path.abspath(srcPath) == os.path.abspath(destCSV):
                    continue
                with open(srcPath, 'r') as csvfile:
                    if header == '':
                        # Keep the header from the first file only
                        header = csvfile.readline()
                        destFile.write(header)
                    else:
                        csvfile.readline()
                    for line in csvfile:
                        destFile.write(line)

if __name__ == '__main__':
    mergeCSV('D:/csv', 'D:/csv/merged.csv')
You don't have to put all the files in one folder. When you do something with a file, all you need is its path, so you can gather all the .csv file paths first and then perform the combination.
import os

csvfiles = []

def Test1(rootDir):
    list_dirs = os.walk(rootDir)
    for root, dirs, files in list_dirs:
        for f in files:
            if f.endswith('.csv'):
                csvfiles.append(os.path.join(root, f))
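Once the paths are collected, the combination step could look like this sketch (the function name and the keep-first-header behaviour are assumptions based on the question):

```python
import os

def combine_csvs(paths, dest):
    # Write the header from the first file, then only data rows from every file
    with open(dest, 'w') as out:
        for i, path in enumerate(paths):
            with open(path, 'r') as src:
                header = src.readline()
                if i == 0:
                    out.write(header)
                out.writelines(src)
```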
You can also use os.listdir() to get the list of files in a directory.
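Since os.listdir() does not recurse, for the nested structure in the question it would need to be applied recursively; a minimal sketch (the function name is mine):

```python
import os

def list_csvs(path):
    # Recurse into subdirectories, collecting .csv paths
    result = []
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.isdir(full):
            result.extend(list_csvs(full))
        elif name.endswith('.csv'):
            result.append(full)
    return result
```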
This is my first time hacking together bits and pieces of code to form a utility that I need (I'm a designer by trade) and, though I feel I'm close, I'm having trouble getting the following to work.
I routinely need to zip up files with a .COD extension that are inside of a directory structure I've created. As an example, the structure may look like this:
(single root folder) -> (multiple folders) -> (two folders) -> (one folder) -> COD files
I need to ZIP up all the COD files into COD.zip and place that zip file one directory above where the files currently are. Folder structure would look like this when done for example:
EXPORT folder -> 9800 folder -> 6 folder -> OTA folder (+ new COD.zip) -> COD files
My issues -
First, the COD.zip that it creates seems to be the right size for the COD files within it, but when I unzip it there is only one .cod inside, even though the ZIP's file size is that of all the CODs zipped together.
Second, I need the COD files to be zipped without any folder structure - just directly within COD.zip. Currently, my script recreates the entire directory structure (starting with "users/myusername/etc etc") inside the zip.
Any help would be greatly appreciated - and explanations even better as I'm trying to learn :)
Thanks.
import os, glob, fnmatch, zipfile

def scandirs(path):
    for currentFile in glob.glob(os.path.join(path, '*')):
        if os.path.isdir(currentFile):
            scandirs(currentFile)
        if fnmatch.fnmatch(currentFile, '*.cod'):
            cod = zipfile.ZipFile("COD.zip", "a")
            cod.write(currentFile)

scandirs(os.getcwd())
For problem #1, I think your problem is probably this section:
cod = zipfile.ZipFile("COD.zip","a")
cod.write(currentFile)
You're creating a new zip (and possibly overwriting the existing one) every time you go to write a new file. Instead you want to create the zip once per directory and then repeatedly append to it (see example below).
For problem #2, your issue is that you probably need to flatten the filename when you write it to the archive. One approach would be to use os.chdir to CD into each directory in scandirs as you look at it. An easier approach is to use the os.path module to split up the file path and grab the basename (the filename without the path) and then you can use the 2nd parameter to cod.write to change the filename that gets put into the actual zip (see example below).
import os, os.path, glob, fnmatch, zipfile

def scandirs(path):
    # zip file goes at current path, then up one dir, then COD.zip
    zip_file_path = os.path.join(path, os.path.pardir, "COD.zip")
    cod = zipfile.ZipFile(zip_file_path, "a")  # NOTE: will result in some empty zips at the moment for dirs that contain no .cod files
    for currentFile in glob.glob(os.path.join(path, '*')):
        if os.path.isdir(currentFile):
            scandirs(currentFile)
        if fnmatch.fnmatch(currentFile, '*.cod'):
            cod.write(currentFile, os.path.basename(currentFile))
    cod.close()
    if not cod.namelist():  # zip is empty
        os.remove(zip_file_path)

scandirs(os.getcwd())
So create the zip file once, repeatedly append to it while flattening the filenames, then close it. You also need to make sure you call close or you may not get all your files written.
I don't have a good way to test this locally at the moment, so feel free to try it and report back. I'm sure I probably broke something. ;-)
The following code has the same effect but is more reusable and does not create multiple zip files.
import os, glob, fnmatch, zipfile

def scandirs(path, pattern):
    # Glob everything so directories are found, then filter files by pattern
    result = []
    for file in glob.glob(os.path.join(path, '*')):
        if os.path.isdir(file):
            result.extend(scandirs(file, pattern))
        elif fnmatch.fnmatch(file, pattern):
            result.append(file)
    return result

zfile = zipfile.ZipFile('yourfile.zip', 'w')
for file in scandirs(yourbasepath, '*.COD'):
    print('Processing file: ' + file)
    # zfile.write(file)                        # keeps folder structure
    zfile.write(file, os.path.split(file)[1])  # no folder structure
zfile.close()