Move certain files from specific subdirectories into a new directory - python

How do I move only certain files (not all files), from specific subdirectories (not all subdirectories), into a new directory?
The files that need to be moved are listed in a CSV file, with their absolute paths, and are about 85,000 in number. All the files have the same extension, i.e., .java. The number of specific subdirectories is about 13,000.
Is there a Python script (preferred) or a Shell script to do this?
N.B.: The forums that I searched returned solutions for moving all files from within a single subdirectory into a new directory. They are listed below:
https://www.daniweb.com/programming/software-development/threads/473187/copymove-all-sub-directories-from-a-folder-to-another-folder-using-python
http://pythoncentral.io/how-to-copy-a-file-in-python-with-shutil/
Filter directory when using shutil.copytree?
https://unix.stackexchange.com/questions/207375/copy-certain-files-from-specified-subdirectories-into-a-separate-subdirectory

Assuming that your CSV file looks like this:
George,/home/geo/mtvernon.java,Betsy
Abe,/home/honest/gettys.java,Mary
You could move the files using this shell command (cut extracts the second comma-separated field, i.e. the path, and xargs feeds each one to mv):
$ cut -d, -f2 < x.csv | xargs -I '{}' mv '{}' /home/me/new-directory

Something like this might work for you:
import csv
import os

def move_file(old_file_path, new_directory):
    if not os.path.isdir(new_directory):
        os.mkdir(new_directory)
    base_name = os.path.basename(old_file_path)
    new_file_path = os.path.join(new_directory, base_name)
    os.rename(old_file_path, new_file_path)

def parse_csv_file(csv_path):
    # Each row holds a single field: the absolute path of a file to move
    with open(csv_path) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',', quotechar='"')
        return [row[0] for row in csv_reader]

if __name__ == '__main__':
    old_file_paths = parse_csv_file('your_csv_path')
    for old_file_path in old_file_paths:
        move_file(old_file_path, 'your_new_directory')
This assumes that each row of your CSV file contains only a path, and that all of those files exist.
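One caveat: os.rename cannot move a file across filesystems on most platforms. If the new directory might live on a different filesystem, a minimal variant of move_file (same hypothetical names as above) using shutil.move, which falls back to copy-and-delete, would be:

import os
import shutil

def move_file(old_file_path, new_directory):
    os.makedirs(new_directory, exist_ok=True)
    # shutil.move renames when it can, and copies then deletes otherwise
    shutil.move(old_file_path, new_directory)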

Related

Is there a way to change your cwd in Python using a file as an input?

I have a Python program that calculates the number of files within different directories, but I wanted to know if it is possible to use a text file containing a list of different directory locations to change the cwd within my program?
Input: a text file that lists different folder locations, each containing various files.
I have my program set up to return the total number of files in a given folder location and write that count to a count text file that will be located in each folder the program is called on.
You can use the os module in Python.
import os

# dirs will store the list of directories, populated from your text file
dirs = []
with open(your_text_file, "r") as text_file:
    for line in text_file:
        dirs.append(line.strip())  # strip the trailing newline

# Now simply loop over the dirs list
for directory in dirs:
    # Change directory
    os.chdir(directory)
    # Print cwd
    print(os.getcwd())
    # Print number of files in cwd
    print(len([name for name in os.listdir(directory)
               if os.path.isfile(os.path.join(directory, name))]))
Yes.
import os

start_dir = os.getcwd()
indexfile = open(dir_index_file, "r")
for targetdir in indexfile.readlines():
    os.chdir(targetdir.strip())  # strip the newline that readlines() keeps
    # Do your stuff here
    os.chdir(start_dir)
indexfile.close()
Do bear in mind that if your program dies halfway through, it will leave you in a different working directory from the one you started in. That is confusing for users and can occasionally be dangerous (especially if they don't notice it has happened and start trying to delete files that they expect to be there - they might get the wrong file). You might want to consider whether there's a way to achieve what you want without changing the working directory.
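If you do need to change directory, a minimal sketch of a safer pattern (assuming dir_index_file as above) wraps the loop in try/finally, so the original directory is restored even if something raises partway through:

import os

dir_index_file = "dirs.txt"  # assumption: the text file listing directories, as above
start_dir = os.getcwd()
try:
    with open(dir_index_file, "r") as indexfile:
        for targetdir in indexfile:
            # join keeps relative entries relative to the starting directory
            os.chdir(os.path.join(start_dir, targetdir.strip()))
            # Do your stuff here
finally:
    os.chdir(start_dir)  # restored even if something raised partway through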
EDIT:
And to suggest the latter, rather than changing directory use os.listdir() to get the files in the directory of interest:
import os

with open(dir_index_file, "r") as indexfile:
    for targetdir in indexfile:
        targetdir = targetdir.strip()  # strip the trailing newline
        contents = os.listdir(targetdir)
        numfiles = len(contents)
        with open(os.path.join(targetdir, "count.txt"), "w") as countfile:
            countfile.write(str(numfiles))
Note that this will count files and directories, not just files. If you only want files, then you'll have to go through the list returned by os.listdir(), checking whether each item is a file using os.path.isfile().
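A minimal sketch of that filter (the targetdir path is a placeholder):

import os

targetdir = "/path/to/dir"  # placeholder: the directory being counted
numfiles = sum(1 for name in os.listdir(targetdir)
               if os.path.isfile(os.path.join(targetdir, name)))
print(numfiles)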

Read csv files from multiple folders using a for loop

My code reads from a CSV file, performs multiple operations/calculations, and then creates another CSV file. I have 8 folders to read/write from and I want my code to iterate through them one by one.
Let's say I have folders named Folder1 to Folder8. First of all, how do I point my code at a different directory instead of the default one where the Python script lives?
This is part of my code:
# read the columns from CSV
MAXCOLS = Number_Of_Buses + 1
Bus_Vol = [[] for _ in range(MAXCOLS)]
with open('p_voltage_table_output.csv', 'rb') as input:
    for row in csv.reader(input, delimiter=','):
        for i in range(MAXCOLS):
            Bus_Vol[i].append(row[i] if i < len(row) else '')
for i in xrange(1, MAXCOLS):
    dummy = 0
    #print('Bus_Vol[{}]: {}'.format(i, Bus_Vol[i]))
I want to be able to set the directory to Folder1, and also iterate through Folder1 to Folder8, which all contain a CSV file with the same name.
To read from a directory other than the one where your script is located, you need to give Python the absolute path to the directory.
Windows style: c:\path\to\directory
*nix style: /path/to/directory
In either case it'll be a string.
You didn't specify if your target folders were in the same directory or not. If they are, it's a bit easier.
import os

path_to_parent = "/path/to/parent"
for folder in os.listdir(path_to_parent):
    folder_path = os.path.join(path_to_parent, folder)
    for csv_file in os.listdir(folder_path):
        # Do whatever to your csv file here
        pass
If your folders are spread out on your system, then you have to provide an absolute path to each one:
import os

paths_to_folders = ['/path/to/folder/one', '/path/to/folder/two']
for folder in paths_to_folders:
    for csv_file in os.listdir(folder):
        # Do whatever to your csv file
        pass
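Since the folders are named Folder1 to Folder8 and each holds a CSV file with the same name, a minimal sketch tying this together (the parent path is a placeholder; the filename p_voltage_table_output.csv is taken from the question):

import csv
import os

path_to_parent = "/path/to/parent"  # assumption: all eight folders live here
for i in range(1, 9):
    csv_path = os.path.join(path_to_parent, "Folder{}".format(i),
                            "p_voltage_table_output.csv")
    with open(csv_path) as f:
        for row in csv.reader(f):
            pass  # perform the operations/calculations here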

How can I run a python script on many files to get many output files?

I am new at programming and I have written a script to extract text from a vcf file. I am using a Linux virtual machine and running Ubuntu. I have run this script through the command line by changing my directory to the folder containing the vcf file and then entering python script.py.
My script knows which file to process because the beginning of my script is:
my_file = open("inputfile1.vcf", "r+")
outputfile = open("outputfile.txt", "w")
The script puts the information I need into a list, which I then write to outputfile. However, I have many input files (all .vcf) and want to write each one to a different output file with a similar name to its input (such as input_processed.txt).
Do I need to run a shell script to iterate over the files in the folder? If so, how would I change the Python script to accommodate this, i.e., to write each list to its own output file?
I would integrate it within the Python script, which will allow you to easily run it on other platforms too and doesn't add much code anyway.
import glob
import os

# Find all files ending in '.vcf'
for vcf_filename in glob.glob('*.vcf'):
    vcf_file = open(vcf_filename, 'r+')
    # Similar name with a different extension
    output_filename = os.path.splitext(vcf_filename)[0] + '.txt'
    outputfile = open(output_filename, 'w')
    # Process the data
    ...
To output the resulting files in a separate directory I would:
import glob
import os

output_dir = 'processed'
os.makedirs(output_dir, exist_ok=True)
# Find all files ending in '.vcf'
for vcf_filename in glob.glob('*.vcf'):
    vcf_file = open(vcf_filename, 'r+')
    # Similar name with a different extension
    output_filename = os.path.splitext(vcf_filename)[0] + '.txt'
    outputfile = open(os.path.join(output_dir, output_filename), 'w')
    # Process the data
    ...
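As a concrete placeholder for the elided processing step, a hedged sketch that just copies each line through (swap the inner loop's body for the real extraction logic), using with so the files are closed, and the _processed suffix the question mentions:

import glob
import os

for vcf_filename in glob.glob('*.vcf'):
    output_filename = os.path.splitext(vcf_filename)[0] + '_processed.txt'
    with open(vcf_filename) as vcf_file, open(output_filename, 'w') as outputfile:
        for line in vcf_file:
            outputfile.write(line)  # placeholder: replace with real extraction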
You don't need to write a shell script; maybe this question will help you:
How to list all files of a directory?
It depends on how you implement the iteration logic.
If you want to implement it in Python, just do it.
If you want to implement it in a shell script, change your Python script to accept parameters, and then use the shell script to call the Python script with suitable parameters, as sketched below.
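A minimal sketch of the parameter approach (process_one.py is a hypothetical name; the copy-through body is a placeholder):

# process_one.py
import sys

# Expect: python process_one.py <input.vcf> <output.txt>
in_name, out_name = sys.argv[1], sys.argv[2]
with open(in_name) as fin, open(out_name, 'w') as fout:
    fout.write(fin.read())  # placeholder: replace with real processing

A shell loop such as for f in *.vcf; do python process_one.py "$f" "${f%.vcf}_processed.txt"; done would then drive it over every .vcf file in the folder.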
I have a script I frequently use which includes using PyQt5 to pop up a window that prompts the user to select a file... then it walks the directory to find all of the files in the directory:
pathname = first_fname[:(first_fname.rfind('/') + 1)]  # path up to the last '/'
new_pathname = pathname + 'for release/'  # new directory for the renamed output files
file_list = [f for f in os.listdir(pathname) if f.lower().endswith('.xls')
             and 'map' not in f.lower() and 'check' not in f.lower()]  # .xls files, skipping keywords that mark unwanted files
You need to import os to use the os.listdir command.
You can use listdir (you would need to write a condition to filter for the particular extension) or glob. I generally prefer glob. For example:
import glob
import os

for filename in glob.glob('*.py'):
    output_name = os.path.splitext(filename)[0]
    with open(filename, 'r') as data, open(output_name + '.txt', 'w') as output:
        output.write(data.read())
This code reads the content of each input file and stores it in the corresponding output file.

How to merge files within a sub-directory and perform this function on mutliple sub-directories

I have a directory with ~2000 sub-directories and within each sub-directory there are between 2-10 txt files. I would like to open each sub-directory and merge or concatenate the contents into a single file, thus I would have 2000 directories, each with 1 txt file.
I have tried to do this using unix commands, but I can't seem to get the command to execute in a specific sub-directory and then change directories and perform the function again.
find . -maxdepth 1 -name "*.faa" -exec cat {}
Is there a way to turn this into a bash script and have it run over the entire directory, or should I look to something like Python to accomplish this task?
Thank you and I apologize if this has been asked.
This should give you what you want, and can be customized to your needs:
import os

OLD_BASE = '/tmp/so/merge/old'
NEW_BASE = '/tmp/so/merge/new'
NEW_NAME = 'merged.txt'

def merge_files(infiles, outfile):
    with open(outfile, 'wb') as fo:
        for infile in infiles:
            with open(infile, 'rb') as fi:
                fo.write(fi.read())

for (dirpath, dirnames, filenames) in os.walk(OLD_BASE):
    base, tail = os.path.split(dirpath)
    if base != OLD_BASE:
        continue  # Don't operate on OLD_BASE, only children directories
    # Build infiles list
    infiles = sorted([os.path.join(dirpath, filename) for filename in filenames])
    # Create output directory
    new_dir = os.path.join(NEW_BASE, tail)
    os.mkdir(new_dir)  # This will raise an OSError if the directory already exists
    # Build outfile name
    outfile = os.path.join(new_dir, NEW_NAME)
    # Merge
    merge_files(infiles, outfile)
The end result is, for each directory in OLD_BASE, a directory of the same name is created in NEW_BASE. Inside each NEW_BASE subdirectory, a file called merged.txt is created with the concatenated contents of the files inside the corresponding OLD_BASE subdirectory.
So

<OLD_BASE>
    DIR_1
        FILE_1
        FILE_2
    DIR_2
        FILE_3
        FILE_4
        FILE_5
    DIR_3
        FILE_6

Becomes

<NEW_BASE>
    DIR_1
        <NEW_NAME> (= FILE_1 + FILE_2)
    DIR_2
        <NEW_NAME> (= FILE_3 + FILE_4 + FILE_5)
    DIR_3
        <NEW_NAME> (= FILE_6)
I know you said it doesn't matter what order the files are merged in, but this merges them alphabetically by filename (case-sensitive), in case future viewers are interested. If you're really not, you can remove the sorted() call.
If I understand correctly, this will do:
find -mindepth 1 -maxdepth 1 -type d -exec sh -c 'cd "$0" && cat *.faa > bigfile' {} \;
It finds all the subdirectories (non-recursively) in the current directory, cds into each, and concatenates all the *.faa files into a file called bigfile (inside the subdirectory). The -mindepth 1 keeps find from matching the current directory itself.

Python - Combing data from different .csv files. into one

I need some help from Python programmers to solve an issue I'm facing in processing data.
I have .csv files placed in a directory structure like this:-
MainDirectory
    Sub directory 1
        sub directory 1A
            file.csv
    Sub directory 2
        sub directory 2A
            file.csv
    sub directory 3
        sub directory 3A
            file.csv
Instead of going into each directory and accessing the .csv files individually, I want to run a script that combines the data from all the subdirectories.
Each file has the same type of header, and I need to build one big .csv file with a single header, with all the .csv file data appended one after the other.
I have a Python script that can combine all the files into a single file, but only when those files are placed in one folder.
Can you help to provide a script that can handle the above directory structure?
Try this code; I tested it on my laptop and it works well!
import os

def mergeCSV(srcDir, destCSV):
    with open(destCSV, 'w') as destFile:
        header = ''
        for root, dirs, files in os.walk(srcDir):
            for f in files:
                if f.endswith(".csv"):
                    with open(os.path.join(root, f), 'r') as csvfile:
                        if header == '':
                            # Keep the header from the first file only
                            header = csvfile.readline()
                            destFile.write(header)
                        else:
                            csvfile.readline()  # skip this file's header
                        for line in csvfile:
                            destFile.write(line)

if __name__ == '__main__':
    # Write the merged file outside srcDir, or os.walk may pick it up too
    mergeCSV('D:/csv', 'D:/merged.csv')
You don't have to put all the files in one folder. When you do something with the files, all you need is the path to the file. So gather all the CSV files' paths and then perform the combination.
import os

csvfiles = []

def Test1(rootDir):
    list_dirs = os.walk(rootDir)
    for root, dirs, files in list_dirs:
        for f in files:
            if f.endswith('.csv'):
                csvfiles.append(os.path.join(root, f))
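A minimal sketch of the combination step, reusing the csvfiles list gathered above and keeping the header from the first file only (the root path and merged.csv are placeholder names):

Test1('/path/to/MainDirectory')  # placeholder root directory
with open('merged.csv', 'w') as dest:
    for i, path in enumerate(csvfiles):
        with open(path) as src:
            header = src.readline()
            if i == 0:
                dest.write(header)  # header from the first file only
            dest.writelines(src)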
You can use os.listdir() to get the list of files in a directory.
