Read csv files from multiple folders using a for loop - python

My code reads from a CSV file, performs multiple operations/calculations, and then creates another CSV file. I have 8 folders to read from and write to, and I want my code to iterate through them one by one.
Let's say I have folders named Folder1 to Folder8. First of all, how do I tell my code to read from a different directory instead of the default one where the Python script lives?
This is part of my code:
#read the columns from CSV
MAXCOLS = Number_Of_Buses + 1
Bus_Vol = [[] for _ in range(MAXCOLS)]
with open('p_voltage_table_output.csv', 'rb') as input:
    for row in csv.reader(input, delimiter=','):
        for i in range(MAXCOLS):
            Bus_Vol[i].append(row[i] if i < len(row) else '')
for i in xrange(1, MAXCOLS):
    dummy = 0
    #print('Bus_Vol[{}]: {}'.format(i, Bus_Vol[i]))
I want to be able to point the script at Folder1 and then iterate through Folder1 to Folder8, which all contain a CSV file with the same name.

To read from a directory other than the one where your script is located, you need to give Python the absolute path to that directory.
Windows style: c:\path\to\directory
*nix style: /path/to/directory
In either case it'll be a string.
You didn't specify whether your target folders are in the same parent directory or not. If they are, it's a bit easier.
import os

path_to_parent = "/path/to/parent"
for folder in os.listdir(path_to_parent):
    for csv_file in os.listdir(os.path.join(path_to_parent, folder)):
        # Do whatever to your csv file here
        pass
If your folders are spread out on your system, then you have to provide an absolute path to each one:
import os

paths_to_folders = ['/path/to/folder/one', '/path/to/folder/two']
for folder in paths_to_folders:
    for csv_file in os.listdir(folder):
        # Do whatever to your csv file
        pass
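Applied to your case, where Folder1 to Folder8 sit under one parent directory and each contains p_voltage_table_output.csv, a minimal sketch could look like this (the parent path and the output file name are assumptions; the open modes follow the Python 2 style of your snippet):
import os
import csv

path_to_parent = "/path/to/parent"  # assumed parent directory of Folder1 ... Folder8

for n in range(1, 9):
    folder = os.path.join(path_to_parent, "Folder{}".format(n))
    in_path = os.path.join(folder, "p_voltage_table_output.csv")
    out_path = os.path.join(folder, "results.csv")  # hypothetical name for the output file
    with open(in_path, 'rb') as infile, open(out_path, 'wb') as outfile:
        reader = csv.reader(infile, delimiter=',')
        writer = csv.writer(outfile)
        for row in reader:
            # perform your calculations on `row` here, then write the result
            writer.writerow(row)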

Related

Move files one by one to newly created directories for each file with Python 3

What I have is an initial directory with a file inside D:\BBS\file.x and multiple .txt files in the work directory D:\
What I am trying to do is copy the folder BBS with its content, incrementing its name by a number, then copy/move each existing .txt file into its own newly created directory, giving \BBS1, \BBS2, ..., \BBSn (depending on the number of txt files).
Visual example of the Before and After:
Initial view of the \WorkFolder
Desired view of the \WorkFolder
Right now I have only managed to create a new directory and move the txt files into it all at once, not one by one as I would like. Here's my code:
from pathlib import Path
from shutil import copy
import shutil
import os

wkDir = Path.cwd()
src = wkDir.joinpath('BBS')
count = 0

for content in src.iterdir():
    addname = src.name.split('_')[0]
    out_folder = wkDir.joinpath(f'!{addname}')
    out_folder.mkdir(exist_ok=True)
    out_path = out_folder.joinpath(content.name)
    copy(content, out_path)

files = os.listdir(wkDir)
for f in files:
    if f.endswith(".txt"):
        shutil.move(f, out_folder)
I kindly request assistance with incrementing the folder name and copying the files one by one into a newly created directory for each, as described above.
I don't have much skill with Python in general. Python 3, OS: Windows.
Thanks in advance
Now, I understand what you want to accomplish. I think you can do it quite easily by iterating only over the text files and, for each one, copying the BBS folder and then moving the file you are currently at. In order to get folder_num, you may be able to just take the file name's characters at particular indexes (e.g. f[4:6]) if the name always follows the pattern TextXX.txt. If the prefix "Text" may vary, it is more robust to use regular expressions as in the following sample.
Also, the function shutil.copytree copies a directory with its children.
import os
import re
import shutil
from pathlib import Path

wkDir = Path.cwd()
src = wkDir.joinpath('BBS')

for f in os.listdir(wkDir):
    if f.endswith(".txt"):
        folder_num = re.findall(r"\d+", f)[0]
        target = wkDir.joinpath(f"{src.name}{folder_num}")
        # copy BBS
        shutil.copytree(src, target)
        # move .txt file
        shutil.move(f, target)

Merge CSV files in ADLS2 that are prepared through DataBricks

While running Databricks code that prepares CSV files and loads them into ADLS2, the output is split into many CSV files as it is loaded into ADLS2.
Is there a way to merge these CSV files in ADLS2 through PySpark?
Thanks
Is there a way to merge these CSV files in ADLS2 through PySpark?
As far as I know, a Spark dataframe does write its output as separate part files. Theoretically, you could use the spark.read.csv method, which accepts a list of path strings as a parameter:
>>> df = spark.read.csv('path')
Then use the df.toPandas().to_csv() method to write the result out as a single file via a pandas dataframe. You could refer to some clues from this case: Azure Data-bricks : How to read part files and save it as one file to blob?.
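For example, a sketch of that approach might look like this (the mount point /mnt/test and the output folder name are assumptions; it only works if the collected data fits in the driver's memory):
# read all part files of the split output back into one dataframe
df = spark.read.csv('/mnt/test/output/', header=True)
# collect to the driver and write a single CSV via pandas
df.toPandas().to_csv('/dbfs/mnt/test/final.csv', index=False)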
However, I'm afraid that this process cannot handle such high memory consumption. So I'd suggest you just use the os package to do the merge job directly. I tested the two snippets of code below for your reference.
1st:
import os

path = '/dbfs/mnt/test/'
file_suffix = '.csv'
files = os.listdir(path)
filtered_files = [file for file in files if file.endswith(file_suffix)]
print(filtered_files)

with open(path + 'final.csv', 'w') as final_file:
    for file in filtered_files:
        with open(path + file) as f:
            lines = f.readlines()
            final_file.writelines(lines[1:])
2nd:
import os

path = '/dbfs/mnt/test/'
file_suffix = '.csv'
filtered_files = [os.path.join(root, name) for root, dirs, files in os.walk(top=path, topdown=False) for name in files if name.endswith(file_suffix)]
print(filtered_files)

with open(path + 'final2.csv', 'w') as final_file:
    for file in filtered_files:
        with open(file) as f:
            lines = f.readlines()
            final_file.writelines(lines[1:])
The second one also works with a nested folder hierarchy.
In addition, here is a way to use the ADF copy activity to transfer multiple CSV files into one file in ADLS Gen2.
Please refer to this doc and configure the folder path in the ADLS Gen2 source dataset, then set MergeFiles as the copyBehavior property. (Besides, you could use a wildcard file name like *.csv to exclude files which you don't want to touch in the specific folder.)
Merges all files from the source folder to one file. If the file name
is specified, the merged file name is the specified name. Otherwise,
it's an autogenerated file name.

Is there a way to change your cwd in Python using a file as an input?

I have a Python program that calculates the number of files within different directories, but I wanted to know whether it is possible to use a text file containing a list of different directory locations to change the cwd within my program.
Input: a text file that lists folder locations, each containing various files.
My program is set up to return the total number of files in a given folder location and write that count to a count text file located in each folder the program is called on.
You can use the os module in Python.
import os

# dirs will store the list of directories, can be populated from your text file
dirs = []
text_file = open(your_text_file, "r")
for dir in text_file.readlines():
    dirs.append(dir.strip())  # strip the trailing newline

# Now simply loop over the dirs list
for directory in dirs:
    # Change directory
    os.chdir(directory)
    # Print cwd
    print(os.getcwd())
    # Print number of files in cwd
    print(len([name for name in os.listdir(directory)
               if os.path.isfile(os.path.join(directory, name))]))
Yes.
import os

start_dir = os.getcwd()
indexfile = open(dir_index_file, "r")
for targetdir in indexfile.readlines():
    os.chdir(targetdir.strip())  # strip the newline that readlines() leaves in
    # Do your stuff here
    os.chdir(start_dir)
Do bear in mind that if your program dies halfway through, it'll leave you in a different working directory from the one you started in, which is confusing for users and can occasionally be dangerous (especially if they don't notice it has happened and start trying to delete files that they expect to be there - they might get the wrong file). You might want to consider whether there's a way to achieve what you want without changing the working directory.
EDIT:
To do the latter, rather than changing directory, use os.listdir() to get the files in the directory of interest:
import os

start_dir = os.getcwd()
indexfile = open(dir_index_file, "r")
for targetdir in indexfile.readlines():
    targetdir = targetdir.strip()
    contents = os.listdir(targetdir)
    numfiles = len(contents)
    countfile = open(os.path.join(targetdir, "count.txt"), "w")
    countfile.write(str(numfiles))
    countfile.close()
Note that this will count files and directories, not just files. If you only want files, then you'll have to go through the list returned by os.listdir, checking whether each item is a file using os.path.isfile().
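A small variation of that loop which counts only files could look like this (a sketch, reusing the dir_index_file placeholder from above):
import os

indexfile = open(dir_index_file, "r")
for targetdir in indexfile.readlines():
    targetdir = targetdir.strip()
    # count only regular files, ignoring subdirectories
    numfiles = len([name for name in os.listdir(targetdir)
                    if os.path.isfile(os.path.join(targetdir, name))])
    with open(os.path.join(targetdir, "count.txt"), "w") as countfile:
        countfile.write(str(numfiles))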

Move certain files from specific subdirectories into a new directory

How do I move only certain files (not all files), from specific subdirectories (not all subdirectories), into a new directory?
The files that need to be moved are listed in a CSV file, with their absolute paths, and number about 85,000. All the files have the same extension, .java. The number of specific subdirectories is about 13,000.
Is there a Python script (preferred) or a Shell script to do this?
N.B: The forums that I searched on returned solutions on how to move all files from within a single subdirectory into a new directory. They are mentioned below:
https://www.daniweb.com/programming/software-development/threads/473187/copymove-all-sub-directories-from-a-folder-to-another-folder-using-python
http://pythoncentral.io/how-to-copy-a-file-in-python-with-shutil/
Filter directory when using shutil.copytree?
https://unix.stackexchange.com/questions/207375/copy-certain-files-from-specified-subdirectories-into-a-separate-subdirectory
Assuming that your CSV file looks like this:
George,/home/geo/mtvernon.java,Betsy
Abe,/home/honest/gettys.java,Mary
You could move the files using this shell command:
$ cut -d, -f2 < x.csv | xargs -I '{}' mv '{}' /home/me/new-directory
Something like this might work for you:
import os
import csv

def move_file(old_file_path, new_directory):
    if not os.path.isdir(new_directory):
        os.mkdir(new_directory)
    base_name = os.path.basename(old_file_path)
    new_file_path = os.path.join(new_directory, base_name)
    os.rename(old_file_path, new_file_path)

def parse_csv_file(csv_path):
    csv_file = open(csv_path)
    csv_reader = csv.reader(csv_file, delimiter=',', quotechar='"')
    rows = list(csv_reader)
    csv_file.close()
    return rows

if __name__ == '__main__':
    rows = parse_csv_file('your_csv_path')
    for row in rows:
        # each row is a list of fields; treat every field as a file path
        for old_file_path in row:
            move_file(old_file_path, 'your_new_directory')
This assumes that your CSV file only contains paths, delimited by commas, and all of those files exist.
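If, as in the sample CSV at the top of this answer, each row actually holds the path in its second field rather than containing only paths, a small variation would do (a sketch; the column index is an assumption based on that sample):
if __name__ == '__main__':
    rows = parse_csv_file('your_csv_path')
    for row in rows:
        # the .java path is the second field of each row in the sample CSV
        move_file(row[1], 'your_new_directory')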

Python - Combining data from different .csv files into one

I need some help from Python programmers to solve an issue I'm facing in processing data.
I have .csv files placed in a directory structure like this:
-MainDirectory
    Sub directory 1
        sub directory 1A
            file.csv
    Sub directory 2
        sub directory 2A
            file.csv
    sub directory 3
        sub directory 3A
            file.csv
Instead of going into each directory and accessing the .csv files, I want to run a script that can combine the data from all the sub directories.
Each file has the same type of header, and I need to maintain one big .csv file with a single header, with the data from all the .csv files appended one after the other.
I have a Python script that can combine all the files into a single file, but only when those files are placed in one folder.
Can you help provide a script that can handle the above directory structure?
Try this code. I tested it on my laptop and it works well!
import sys
import os

def mergeCSV(srcDir, destCSV):
    with open(destCSV, 'w') as destFile:
        header = ''
        for root, dirs, files in os.walk(srcDir):
            for f in files:
                if f.endswith(".csv"):
                    with open(os.path.join(root, f), 'r') as csvfile:
                        if header == '':
                            header = csvfile.readline()
                            destFile.write(header)
                        else:
                            csvfile.readline()
                        for line in csvfile:
                            destFile.write(line)

if __name__ == '__main__':
    mergeCSV('D:/csv', 'D:/csv/merged.csv')
You don't have to put all the files in one folder. When you do something with a file, all you need is its path. So gather all the CSV files' paths and then perform the combination.
import os

csvfiles = []

def Test1(rootDir):
    list_dirs = os.walk(rootDir)
    for root, dirs, files in list_dirs:
        for f in files:
            if f.endswith('.csv'):
                csvfiles.append(os.path.join(root, f))
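Once csvfiles holds all the paths, the combination step could look like this (a sketch that continues the snippet above, keeps only the first file's header, and assumes 'MainDirectory' and 'combined.csv' as the root folder and output name):
def combineCSV(paths, destCSV):
    with open(destCSV, 'w') as destFile:
        for i, path in enumerate(paths):
            with open(path, 'r') as csvfile:
                header = csvfile.readline()
                if i == 0:
                    # keep the header from the first file only
                    destFile.write(header)
                for line in csvfile:
                    destFile.write(line)

Test1('MainDirectory')
combineCSV(csvfiles, 'combined.csv')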
You can use os.listdir() to get the list of files in a directory.
