Python - Combining data from different .csv files into one

I need some help from Python programmers to solve an issue I'm facing in processing data.
I have .csv files placed in a directory structure like this:
MainDirectory
    Sub directory 1
        sub directory 1A
            file.csv
    Sub directory 2
        sub directory 2A
            file.csv
    Sub directory 3
        sub directory 3A
            file.csv
Instead of going into each directory and accessing the .csv files manually, I want to run a script that combines the data from all the subdirectories.
Each file has the same header. I need to produce one big .csv file with a single header, with the data from all the .csv files appended one after the other.
I have a Python script that can combine all the files into a single file, but only when those files are placed in one folder.
Can you help with a script that can handle the above directory structure?

Try this code. I tested it on my laptop and it works well!
import os

def mergeCSV(srcDir, destCSV):
    with open(destCSV, 'w') as destFile:
        header = ''
        # walk the whole tree under srcDir, visiting every subdirectory
        for root, dirs, files in os.walk(srcDir):
            for f in files:
                fpath = os.path.join(root, f)
                # skip non-csv files, and skip the output file itself
                # in case destCSV lives inside srcDir
                if f.endswith('.csv') and os.path.abspath(fpath) != os.path.abspath(destCSV):
                    with open(fpath, 'r') as csvfile:
                        if header == '':
                            # keep the header from the first file only
                            header = csvfile.readline()
                            destFile.write(header)
                        else:
                            # throw away this file's header line
                            csvfile.readline()
                        for line in csvfile:
                            destFile.write(line)

if __name__ == '__main__':
    mergeCSV('D:/csv', 'D:/csv/merged.csv')

You don't have to put all the files in one folder. When you do something with a file, all you need is its path. So gather all the csv files' paths and then perform the combination.
import os

csvfiles = []

def Test1(rootDir):
    # record the full path of every .csv file found under rootDir
    list_dirs = os.walk(rootDir)
    for root, dirs, files in list_dirs:
        for f in files:
            if f.endswith('.csv'):
                csvfiles.append(os.path.join(root, f))
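With the paths collected, one way to perform the combination is with pandas (a minimal sketch, assuming pandas is installed, every file shares the same header, and the path passed to Test1 is a placeholder):
import pandas as pd

Test1('path/to/MainDirectory')   # fills the csvfiles list
frames = [pd.read_csv(p) for p in csvfiles]
combined = pd.concat(frames, ignore_index=True)
combined.to_csv('merged.csv', index=False)   # one header, all rows appended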

You can use os.listdir() to get the list of files in a directory.
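A minimal sketch (the folder path is a placeholder); note that os.listdir() only sees a single directory, so for nested subdirectories you still need os.walk() as shown above:
import os

folder = '/path/to/folder'   # placeholder
csv_paths = [os.path.join(folder, name)
             for name in os.listdir(folder)
             if name.endswith('.csv')]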

Related

Read all files in sub folders with pandas

My notebook is in the home folder, where I also have another folder "test". In the test folder, I have 5 sub folders, and each of them contains a .shp file. I want to iterate over all sub folders within test and open all .shp files. It doesn't matter if they get overwritten.
data = gpd.read_file("./test/folder1/file1.shp")
data.head()
How can I do so? I tried this:
path = os.getcwd()
files = glob.glob(os.path.join(path + "/test/", "*.shp"))
print(files)
but this only goes one level deep.
You can use the os.walk method from the os library.
import os
import geopandas as gpd

for root, dirs, files in os.walk("./test"):
    for name in files:
        if name.endswith(".shp"):   # only open the shapefiles
            fpath = os.path.join(root, name)
            data = gpd.read_file(fpath)
Just do os.chdir(path), and then use glob.glob('*.shp'). It should work; once you have changed into the directory, you no longer need to build the path with os.path.join.
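Note that a plain '*.shp' pattern still matches only one level. On Python 3.5+, glob can recurse with ** (a sketch, assuming geopandas is imported as gpd as in the question):
import glob
import geopandas as gpd

# '**' with recursive=True matches .shp files at any depth under test/
for fpath in glob.glob('./test/**/*.shp', recursive=True):
    data = gpd.read_file(fpath)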

Python loop through directories

I am trying to use the Python os library to loop through all the subdirectories in my root directory, target files with a specific name, and rename them.
My Python file is located at the root level of the directory tree.
What I am trying to do is target the 942ba directory, loop through all its subdirectories, locate the file 000000, and rename it to 000000.csv.
The current code I have is as follows:
import os

root = '<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/'
for dirs, subdirs, files in os.walk(root):
    for f in files:
        print(dirs)
        if f == '000000':
            dirs = dirs.strip(root)
            f_new = f + '.csv'
            os.rename(os.path.join(r'{}'.format(dirs), f), os.path.join(r'{}'.format(dirs), f_new))
But this is not working: when I run my code, for some reason it strips characters from the subdirectory paths.
Can anyone help me understand how to solve this issue?
A more efficient way to iterate through the folders and select only the files you are looking for is below. (Your original code fails because str.strip(root) removes any leading and trailing characters that occur anywhere in root; it does not strip a path prefix.)
import os

source_folder = '<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/'
files = [os.path.normpath(os.path.join(root, f))
         for root, dirs, files in os.walk(source_folder)
         for f in files
         if '000000' in f and not f.endswith('.gz')]
for f in files:
    os.rename(f, f"{f}.csv")
The list comprehension stores the full paths of the files you are looking for. You can change the condition inside the comprehension to anything you need; I use this snippet a lot to find images of a certain type, or to drop unwanted files from the selection.
In the for loop, the files are renamed by adding the .csv extension.
I would use glob to find the files.
import os, glob

zdir = '942ba956-8967-4bec-9540-fbd97441d17f'
files = glob.glob('*{}/000000'.format(zdir))
for fly in files:
    os.rename(fly, '{}.csv'.format(fly))
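If the 000000 files sit more than one level below the UUID directory, a pathlib sketch with rglob handles any depth (assuming Python 3.4+):
from pathlib import Path

root = Path('942ba956-8967-4bec-9540-fbd97441d17f')
# rglob matches files named 000000 at any depth under root
for p in root.rglob('000000'):
    p.rename(p.with_suffix('.csv'))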

Looping through multiple CSV files in different folders, producing multiple outputs, and putting them in the same folder as the input

Problem: I have around 100 folders inside one main folder, each with a csv file of the exact same name and structure, for example input.csv. I have written a Python script for one of the files which takes the csv file as input and produces two images as output in the same folder. I want to produce these images in every folder, given the input in that folder.
Is there a fast way to do this? Until now I have copied my script into each folder and executed it there. For 5 folders that was alright, but for 100 it gets tedious and takes a lot of time.
Can someone please help me out? I'm very new to coding with respect to directories, paths, files, etc. I have already tried to look for a solution, but no success so far.
You could try something like this:
import os
import pandas as pd

path = r'path\to\folders'
filename = 'input'
# get all directories within the top-level directory
dirs = [os.path.join(path, val) for val in os.listdir(path)
        if os.path.isdir(os.path.join(path, val))]
# loop through each directory
for dd in dirs:
    file_ = [f for f in os.listdir(dd) if f.endswith('.csv') and filename in f]
    if file_:
        # this assumes only a single matching .csv exists in each directory
        file_ = file_[0]
    else:
        continue
    data = pd.read_csv(os.path.join(dd, file_))
    # apply your script here and save the images back to folder dd
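To reuse your existing script across all folders, one option is to wrap its body in a function and call it once per directory. A minimal sketch; make_images is a hypothetical stand-in for your plotting code, and the paths are placeholders:
import glob
import os
import pandas as pd

def make_images(df, out_dir):
    # hypothetical placeholder: build your two figures from df
    # and save them into out_dir here
    pass

for csv_path in glob.glob(os.path.join(r'path\to\folders', '*', 'input.csv')):
    df = pd.read_csv(csv_path)
    make_images(df, os.path.dirname(csv_path))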

Python Iterate over Folders and combine csv files inside

Windows OS - I've got several hundred subdirectories, and each subdirectory contains 1 or more .csv files. All the files are identical in structure. I'm trying to loop through each folder and concatenate the files in each subdirectory into a new file, combining all the .csv files in that subdirectory.
example:
folder1 -> file1.csv, file2.csv, file3.csv -->> file1.csv, file2.csv, file3.csv, combined.csv
folder2 -> file1.csv, file2.csv -->> file1.csv, file2.csv, combined.csv
I'm very new to coding and getting lost in this. I tried using os.walk but completely failed.
The generator produced by os.walk yields three items per iteration: the path of the current directory in the walk, a list of the subdirectories that will be traversed next, and a list of the filenames contained in the current directory.
If for whatever reason you don't want to walk certain paths, you should remove entries from what I called sub below (the list of subdirectories contained in root). This prevents os.walk from traversing any paths you removed.
My code does not prune the walk, so be sure to add this if you don't want to traverse an entire file subtree.
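A minimal pruning sketch (the directory name skip_me is just a placeholder):
import os

for root, sub, files in os.walk('path/to/top'):
    # assign to the slice so os.walk's own list is modified in place;
    # pruned directories are never visited
    sub[:] = [d for d in sub if d != 'skip_me']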
The following outline should work, although I haven't been able to test it on Windows; I have no reason to think it will behave differently there.
import os
import sys

def write_files(sources, combined):
    # keep the header from the first file only
    with open(sources[0], 'r') as first:
        combined.write(first.read())
    for i in range(1, len(sources)):
        with open(sources[i], 'r') as s:
            # ignore the rest of the headers
            next(s, None)
            for line in s:
                combined.write(line)

def concatenate_csvs(root_path):
    for root, sub, files in os.walk(root_path):
        filenames = [os.path.join(root, filename) for filename in files
                     if filename.endswith('.csv') and filename != 'combined.csv']
        if not filenames:
            continue  # nothing to combine in this directory
        combined_path = os.path.join(root, 'combined.csv')
        with open(combined_path, 'w+') as combined:
            write_files(filenames, combined)

if __name__ == '__main__':
    path = sys.argv[1]
    concatenate_csvs(path)
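Run it from the command line with the top-level directory as the only argument, for example (assuming the script was saved as concat_csvs.py; the path is a placeholder):
python concat_csvs.py C:\data\csv_tree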

Read csv files from multiple folders using a for loop

My code reads from a csv file, performs multiple operations/calculations, and then creates another csv file. I have 8 folders to read/write from, and I want my code to iterate through them one by one.
Let's say I have folders named Folder1 to Folder8. First of all, how do I make my code read from a different directory instead of the default one where the Python script lives?
This is part of my code:
# read the columns from CSV
MAXCOLS = Number_Of_Buses + 1
Bus_Vol = [[] for _ in range(MAXCOLS)]
with open('p_voltage_table_output.csv', 'rb') as input:
    for row in csv.reader(input, delimiter=','):
        for i in range(MAXCOLS):
            Bus_Vol[i].append(row[i] if i < len(row) else '')
for i in xrange(1, MAXCOLS):
    dummy = 0
    #print('Bus_Vol[{}]: {}'.format(i, Bus_Vol[i]))
I want to be able to point the script at Folder1 and also iterate through Folder1 to Folder8, all of which contain the same csv file with the same name.
To read from a directory other than the one where your script is located, you need to give Python the absolute path to that directory.
Windows style: c:\path\to\directory
*nix style: /path/to/directory
In either case it will be a string.
You didn't specify whether your target folders are in the same parent directory or not. If they are, it's a bit easier:
import os

path_to_parent = "/path/to/parent"
for folder in os.listdir(path_to_parent):
    for csv_file in os.listdir(os.path.join(path_to_parent, folder)):
        # Do whatever to your csv file here; note that csv_file is a
        # bare filename, so join it with the folder path before opening
        pass
If your folders are spread out on your system, then you have to provide an absolute path to each one:
import os

paths_to_folders = ['/path/to/folder/one', '/path/to/folder/two']
for folder in paths_to_folders:
    for csv_file in os.listdir(folder):
        # Do whatever to your csv file; again, join csv_file with folder first
        pass
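Putting it together for the case in the question: a minimal sketch, assuming Folder1 through Folder8 sit side by side under one parent directory (the parent path is a placeholder, and the filename is taken from your snippet):
import csv
import os

base = '/path/to/parent'   # placeholder: the parent holding Folder1..Folder8
for i in range(1, 9):
    csv_path = os.path.join(base, 'Folder{}'.format(i), 'p_voltage_table_output.csv')
    # 'rb' matches the Python 2 style above; on Python 3 use open(csv_path, newline='')
    with open(csv_path, 'rb') as infile:
        for row in csv.reader(infile, delimiter=','):
            pass  # per-row processing from your script goes here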
