Concatenating multiple files through multiple folders

I'm trying to create a single file out of multiple text files I have across multiple folders. This is my code for concatenating. It works only if the program file is placed in each folder:
import os
file_list = [each for each in cur_folder if each.endswith(".txt")]
print file_list
align_file = open("all_the_files.txt","w")
seq_list = []
for each_file in file_list:
    f_o = open(file_path,"r")
    seq = (f_o.read().replace("\n",""))
    lnth = len(seq)
    wholeseq = ">"+each_file+" | "+str(lnth)+" nt\n"+seq+"\n"
    align_file.write(wholeseq)
print "done"
Next I tried to edit it so that it automatically runs through the entire Data folder, enters each subdirectory, and concatenates all the files without me having to paste the program file into each folder. This is the edit:
import os
dir_folder = os.listdir("C:\Users\GAMER\Desktop\Data")
for each in dir_folder:
    cur_folder = os.listdir("C:\\Users\\GAMER\\Desktop\\Data\\"+each)
    file_list = []
    file_list = [each for each in cur_folder if each.endswith(".txt")]
    print file_list
    align_file = open("all_the_files.txt","w")
    seq_list = []
    for each_file in file_list:
        f_o = open(file_path,"r")
        seq = (f_o.read().replace("\n",""))
        lnth = len(seq)
        wholeseq = ">"+each_file+" | "+str(lnth)+" nt\n"+seq+"\n"
        align_file.write(wholeseq)
    print "done" , cur_folder
However, when I run this, I get an error on the first file of the folder saying no such file exists. I can't seem to understand why, especially since it names a file which is not "hardcoded". Any help will be appreciated.
If the code looks ugly to you, feel free to suggest better ways to do it.

Jamie is correct - os.walk is most likely the function you need.
An example based on your use case:
import os
for root, dirs, files in os.walk(r"C:\Users\GAMER\Desktop\Data"):
    for f in files:
        if f.endswith('.txt'):
            print(f)
This will print the name of every single file within every folder within the root directory passed in os.walk, as long as the filename ends in .txt.
Python's documentation is here: https://docs.python.org/2/library/os.html#os.walk
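The loop above prints bare filenames only. To open a file you still have to join its name onto the directory that os.walk reports, and a missing directory is a likely cause of the "no such file" error in the question. Here is a minimal sketch of the whole concatenation, assuming Python 3. The function name `concatenate_txt` is hypothetical, and the header format is copied from the question.

```python
import os

def concatenate_txt(data_dir, out_path):
    """Write every .txt file under data_dir into out_path, one header line per file."""
    out_abs = os.path.abspath(out_path)
    count = 0
    with open(out_path, "w") as out:
        for root, dirs, files in os.walk(data_dir):
            dirs.sort()  # visit subfolders in a predictable order
            for name in sorted(files):
                # os.walk yields bare names, so build the full path before opening
                full_path = os.path.join(root, name)
                # Skip non-text files and the output file itself
                if not name.endswith(".txt") or os.path.abspath(full_path) == out_abs:
                    continue
                with open(full_path, "r") as f:
                    seq = f.read().replace("\n", "")
                out.write(">" + name + " | " + str(len(seq)) + " nt\n" + seq + "\n")
                count += 1
    return count
```

Because the output file is opened once, before the walk starts, it is not overwritten for each subfolder.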

Related

Python code to merge multiple .wav files from multiple folders gets hung up

I have a bunch of wave files from an outdoor bird recorder that are broken up into 1-hour segments. Each day's worth of audio is in a single folder, and I have 30 days' worth of folders. I am trying to iterate through the folders, merge each day's audio into one file, and export it with the folder name. But each time I try to run it, either the print statements indicate that each for loop runs to completion before the merge function can be called, or it runs properly and the merge function throws a write error.
import wave
import os
#creates an empty object for the first folder name
rootfiles= ""
#sets the path for the starting location
path = "I:\SwiftOne_000"
#lists all folders in the directory "path"
dir_list = os.listdir(path)
print("Files and directories in '", path, "' :")
#iterates through folders in path
for i in dir_list:
    #adds file name to original path
    rootfiles = ( path + "\\" + i)
    prefix = i
    # define outfiles for waves
    out_name = prefix
print("first loop completed")
for x in rootfiles:
    myfiles= []
    paths = rootfiles
    ext = (".wav")
    #print(paths)
    dir_lists = os.listdir(paths)
    #print(dir_lists)
    #print("Files and directories in '", paths, "' :")
print("second loop completed")
for x in dir_lists:
    myfiles.append( paths + "\\" + x)
    #print (myfiles)
    outfile= "D:\SwiftD\prefix" + prefix + ".wav"
    wav_files = myfiles
print("third loop completed")
from contextlib import closing
with closing(wave.open(outfile, 'wb')) as output:
    # find sample rate from first file
    with closing(wave.open(wav_files[0])) as w:
        output.setparams(w.getparams())
    # write each file to output
    for infile in wav_files:
        with closing(wave.open(infile)) as w:
            output.writeframes(w.readframes(w.getnframes()))

I think you want something like this, assuming your folder structure is:
- Swift (directory)
    - Day1 (directory)
        - File1
        - File2
        - File3
import os, wave
src = r'I:\SwiftOne_000'
output_folder = r'I:\OutputFolder'
input_data = {}
for d_name, d_path in [(d, path) for d in os.listdir(src) if os.path.isdir(path := os.path.join(src, d))]:
    input_data[d_name] = [path for f in os.listdir(d_path) if f.lower().endswith('.wav') and os.path.isfile(path := os.path.join(d_path, f))]
print(input_data)
for d_name, paths in input_data.items():
    with wave.open(os.path.join(output_folder, f'{d_name}.wav'), 'wb') as output:
        params_written = False
        for path in paths:
            with wave.open(path, 'rb') as data:
                if not params_written:
                    output.setparams(data.getparams())
                    params_written = True
                output.writeframes(data.readframes(data.getnframes()))
There are a few issues with your code. It is better to use os.path.join to concatenate paths than to build the string yourself, because that keeps the code platform independent (although you probably don't care). os.listdir returns both files and folders, so you should check each entry with os.path.isfile or os.path.isdir to be sure which it is. File extensions aren't always lowercase, so your extension check might miss some files; calling .lower() first means you only ever need to check for .wav.
I'm pretty sure you don't need contextlib.closing, as the with block will already take care of this for you.
You are using the outfile variable to write to the file; however, you overwrite it on every pass through the third loop, so you will only ever get one file, corresponding to the last directory.
Without seeing the stack trace, I'm not sure what the write error is likely to be.
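One more point: os.listdir does not guarantee any particular order, so the hourly segments may not be merged chronologically. Here is a minimal sketch of a merge helper that sorts its inputs and checks that every segment has the same audio format. The function name `merge_wavs` is hypothetical, and the sketch assumes the filenames sort in recording order, which the question does not state.

```python
import wave

def merge_wavs(paths, out_path):
    """Append the frames of each WAV file in paths, in sorted order, into out_path."""
    paths = sorted(paths)  # assumption: names sort chronologically
    if not paths:
        raise ValueError("no input files to merge")
    with wave.open(out_path, "wb") as output:
        expected = None
        for path in paths:
            with wave.open(path, "rb") as data:
                fmt = (data.getnchannels(), data.getsampwidth(), data.getframerate())
                if expected is None:
                    # The first file decides the output format
                    expected = fmt
                    output.setparams(data.getparams())
                elif fmt != expected:
                    raise ValueError("audio format differs in " + path)
                output.writeframes(data.readframes(data.getnframes()))
```

The empty-list check matters because closing a wave writer that never received its parameters raises an error, which is one way a "write error" can show up for an empty day folder.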

Add only files with extension to list

I'm trying to write a slideshow program, and part of that is adding image files. To do this, I want to be able to search a selected folder for all files that have certain extensions. I have looked, and I believe what I have below should work to pull in certain files. However, in testing I find that it is pulling in every file, and not just ones with the proper extension. Any help in figuring this out would be amazing, thank you!
def image_load_test(image_loop_path):
    image_file_table_length = 0
    path = image_loop_path
    image_file_table = []
    valid_images = [".gif",".png",".tga"]
    for f in os.listdir(path):
        ext = os.path.splitext(f)[1]
        if ext.lower() not in valid_images:
            continue
        image_file_table.append(os.path.join(path,f))
        image_file_table_length = image_file_table_length + 1
    return image_file_table
I can tell this isn't working because I have a folder with many .png and .jpg images, and even though I am not adding .jpg to the valid_images table, those files still end up listed.

@Nathan, the following code can help you pick out files by type:
import glob
import os
path_file = r'C:/Users/...' # folder where the image files are kept
types = ('*.jpg', '*.png') # add the required file patterns here
all_files = []
for pattern in types:
    # os.path.join supplies the separator that plain string concatenation would miss
    all_files.extend(glob.glob(os.path.join(path_file, pattern)))

Take the 'not' out of the if statement, remove the 'continue' (you don't need it), and put the next two lines in its place, indented under the if.

This ended up working for me, thank you everyone!
def image_load_test(image_loop_path):
    image_file_table_length = 0
    path = image_loop_path
    image_file_table = []
    valid_images = [".jpg",".png",".tga",".jpeg"]
    for f in os.listdir(path):
        ext = os.path.splitext(f)[1]
        if ext.lower() in valid_images:
            image_file_table.append(os.path.join(path,f))
            image_file_table_length = image_file_table_length + 1
            continue
        else:
            continue
    return image_file_table
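For comparison, the same filter can be written more compactly, because str.endswith accepts a tuple of suffixes. This is a minimal sketch, not the asker's code. The function name `list_images` and its default extension tuple are assumptions for illustration.

```python
import os

def list_images(folder, extensions=(".gif", ".png", ".tga")):
    """Return full paths of the files in folder whose extension is in extensions."""
    wanted = tuple(ext.lower() for ext in extensions)
    matches = []
    for name in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, name)
        # str.endswith accepts a tuple, so one call covers every extension;
        # os.path.isfile skips subfolders that happen to match
        if os.path.isfile(full_path) and name.lower().endswith(wanted):
            matches.append(full_path)
    return matches
```

The length counter from the original is not needed, since len() of the returned list gives the same number.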

How to find and change all json.gz files by python

I have a folder with many subfolders, and each subfolder contains a json.gz file with the same name. All of these JSON files have the same structure.
For example:
...\a\0\b.json.gz
...\a\1\b.json.gz
...\a\2\b.json.gz
What I want to do is pass in the path to .../a and then alter a certain value in each JSON file.
My code is like this:
def getAllSourceFile(folder):
    arrSource = []
    for root,dirs,files in os.walk(folder):
        for file in files:
            if file.__str__()=='b.json.gz':
                allSourceFile = os.path.join(root,file)
                arrSource.append(allSourceFile)
    return arrSource
def ModRootFile(rootM_file,Scale):
    f=gzip.open(rootM_file,'r')
    file_content=f.read()
    f.close()
    d=file_content.decode('utf8').split(',')
    for i in d:
        doc=d.get(i)
        jsondoc=json.dumps(doc)
        jsondict=json.loads(jsondoc)
        print(i)
        for k in jsondict["2"]:
            k["atlas"]="false"
        d.save(jsondict)
I'm trying to change "atlas" to "false". It finished without an error, or at least without any error message, but nothing changed.
Could someone please tell me what is wrong? Thank you.
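One likely reason nothing changes is that the code shown never writes the modified data back to the file. Here is a minimal sketch of a read, modify, write cycle, assuming Python 3. It also assumes that each b.json.gz holds a single JSON object whose key "2" maps to a list of objects; that layout is only inferred from the loop in the question. The function name `set_atlas_false` is hypothetical.

```python
import gzip
import json

def set_atlas_false(gz_path):
    """Set "atlas" to "false" on every entry under key "2", then save in place."""
    # Text mode ("rt") lets json.load decode the gzip stream directly
    with gzip.open(gz_path, "rt", encoding="utf-8") as f:
        data = json.load(f)
    for entry in data["2"]:
        entry["atlas"] = "false"
    # Writing the result back is the step the code in the question lacks
    with gzip.open(gz_path, "wt", encoding="utf-8") as f:
        json.dump(data, f)
    return data
```

It could then be called once per path returned by getAllSourceFile.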

limit no of filepaths using os.walk where huge files in a directory

I have an application with a method that takes a directory path and returns the list of file paths under that directory, using os.walk.
I would like to read only a certain number of files at a time (some threshold, say 20 file paths) from a directory that contains a huge number of files, and store them in a queue. At that point I can check each file path against its status in a database.
The next time I call the same method with the same directory, it should return the next set of file paths, excluding the ones already returned.
Scenario:
Let's assume D:/Sample_folder contains 1000 files.
my_dir = "D:/Sample_folder"
def read_files(directory):
    file_paths = []
    for root, directories, files in os.walk(directory):
        for filename in files:
            file_path = os.path.join(root, filename)
            file_paths.append(file_path)
    return file_paths
read_files(my_dir) should return the first 100 files on the first call.
On the next call, it should return the next 100 files, and so on.
Any ideas or sample scripts for this?

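Assuming you already have files populated, this should do it.
import queue  # this module is named Queue in Python 2
paths = queue.Queue()
current_list = []
for i, path in enumerate(files):
    # Start a new batch every 100 paths; skip i == 0 so we don't add a blank list
    if i % 100 == 0 and i != 0:
        paths.put(current_list)
        current_list = []
    current_list.append(path)
# Add the final, possibly shorter, batch
if current_list:
    paths.put(current_list)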

EDIT:
Here is a possible solution using a class, but it doesn't add much code. The main idea is to pop off an element each time it gets accessed. So the workflow is to make a FileListIter object, then call .next() on it to return a list of the next 100 files to do something with and then the object forgets them. You can call .has_next() to check if you're out of files. If you pass an argument to next like .next(2), then it will instead give back the first 2 files in a list.
CODE:
import os
class FileListIter(object):
    #Initialize the files
    def __init__(self,directory):
        file_paths = []
        for root, directories, files in os.walk(directory):
            for filename in files:
                file_path = os.path.join(root, filename)
                file_paths.append(file_path)
        self.files = file_paths
    #When called w/out args give back the first 100 files, otherwise the first n
    def next(self,n=100):
        ret,self.files = self.files[:n],self.files[n:]
        return ret
    #Check if there are any files left
    def has_next(self):
        return len(self.files) > 0
d = '/home/rob/stack_overflow'
files_gen = FileListIter(d) #<-- this makes an object
while files_gen.has_next():
    file_subset = files_gen.next(2)
    print(file_subset)
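The class above still collects every path up front. For a directory with a huge number of files, a lazy alternative is to wrap os.walk in a generator and slice batches off it with itertools.islice. This is a minimal sketch of that different technique, not the answerer's approach. The names `iter_file_paths` and `batches` are hypothetical.

```python
import os
from itertools import islice

def iter_file_paths(directory):
    """Lazily yield every file path under directory."""
    for root, _dirs, files in os.walk(directory):
        for filename in files:
            yield os.path.join(root, filename)

def batches(paths, size=100):
    """Yield lists of up to size items taken from any iterable of paths."""
    iterator = iter(paths)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return  # the source is exhausted
        yield batch
```

Usage would look like `for batch in batches(iter_file_paths(my_dir), 100): ...`, with each batch put on the queue as it arrives. The generator keeps its position between batches only within one process; resuming after a restart would still need the database check described in the question.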

Renaming multiple files in a directory using Python

I'm trying to rename multiple files in a directory using this Python script:
import os
path = '/Users/myName/Desktop/directory'
files = os.listdir(path)
i = 1
for file in files:
    os.rename(file, str(i)+'.jpg')
    i = i+1
When I run this script, I get the following error:
Traceback (most recent call last):
  File "rename.py", line 7, in <module>
    os.rename(file, str(i)+'.jpg')
OSError: [Errno 2] No such file or directory
Why is that? How can I solve this issue?
Thanks.

You are not giving the whole path when renaming. Do it like this:
import os
path = '/Users/myName/Desktop/directory'
files = os.listdir(path)
for index, file in enumerate(files):
    os.rename(os.path.join(path, file), os.path.join(path, ''.join([str(index), '.jpg'])))
Edit: Thanks to tavo: the first solution would have moved the files to the current directory; that is fixed now.

You have to make this path the current working directory first. It's simple enough; the rest of the code has no errors.
To make it the current working directory:
os.chdir(path)

import os
# Raw strings keep the backslashes in Windows paths from being read as escapes
Source_Path = r'E:\Binayak\deep_learning\Datasets\Class_2'
Destination = r'E:\Binayak\deep_learning\Datasets\Class_2_Dest'
# The destination folder must already exist; uncomment the next line to create it
#dst_folder = os.mkdir(Destination)
def main():
    for count, filename in enumerate(os.listdir(Source_Path)):
        dst = "Class_2_" + str(count) + ".jpg"
        # rename each file and move it into the destination folder
        os.rename(os.path.join(Source_Path, filename), os.path.join(Destination, dst))
# Driver Code
if __name__ == '__main__':
    main()

As per @daniel's comment, os.listdir() returns just the filenames and not the full path of each file. Use os.path.join(path, file) to get the full path and rename that.
import os
path = 'C:\\Users\\Admin\\Desktop\\Jayesh'
files = os.listdir(path)
for file in files:
    os.rename(os.path.join(path, file), os.path.join(path, 'xyz_' + file + '.csv'))

Just playing with the accepted answer: first define the path variable and the list of files:
import os
path = "/Your/path/to/folder/"
files = os.listdir(path)
and then loop over that list:
for index, file in enumerate(files):
    #print (file)
    os.rename(path+file, path +'file_' + str(index)+ '.jpg')
or do the same loop in one line as a Python list comprehension:
[os.rename(path+file, path +'jog_' + str(index)+ '.jpg') for index, file in enumerate(files)]
I think the first is more readable. In the second, the for clause of the loop simply becomes the trailing part of the list comprehension.

If your files are being renamed in an unexpected order, sort the files in the directory first. The code below sorts the filenames in natural order and then renames them.
import os
import re
path = 'target_folder_directory'
files = os.listdir(path)
# Natural sort: compare digit runs as integers so that "10" sorts after "9"
files.sort(key=lambda var:[int(x) if x.isdigit() else x for x in re.findall(r'[^0-9]|[0-9]+', var)])
for i, file in enumerate(files):
    # os.path.join adds the separator that path + file would miss
    os.rename(os.path.join(path, file), os.path.join(path, "{}".format(i)+".jpg"))

I wrote a quick and flexible script for renaming files, in case you want a working solution without reinventing the wheel.
It renames files in the current directory according to replacement functions that you pass in.
Each function specifies a change you want applied to all the matching file names. The script works out the changes that would be made, displays the resulting differences in color, and asks for confirmation before performing them.
You can find the source code here; place it in the folder whose files you want to rename: https://gist.github.com/aljgom/81e8e4ca9584b481523271b8725448b8
It works in PyCharm; I haven't tested it in other consoles.
After you define a few replacement functions, the script runs the first one, shows all the differences for the matching files in the directory, and lets you confirm or decline the replacements.

This works for me, and starting the index at 1 lets us number the dataset from 1.
import os
path = '/Users/myName/Desktop/directory'
files = os.listdir(path)
# enumerate(..., start=1) numbers the files from 1 instead of 0
for index, file in enumerate(files, start=1):
    os.rename(os.path.join(path, file),os.path.join(path,''.join([str(index),'.jpg'])))
But if your current image names already start with a number, this will not work.
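The problem with numeric names is a collision. If 1.jpg already exists, renaming another file to 1.jpg silently replaces it on POSIX systems and raises an error on Windows. Here is a minimal sketch of one way around this, renaming in two passes through temporary names. The function name `renumber` and the `.tmp_rename_` prefix are assumptions for illustration, and the sketch assumes no existing file already uses that prefix.

```python
import os

def renumber(folder, ext=".jpg", start=1):
    """Rename every file in folder to start, start+1, ... without collisions."""
    names = sorted(n for n in os.listdir(folder)
                   if os.path.isfile(os.path.join(folder, n)))
    # Pass 1: move everything to temporary names that cannot clash with targets
    temp_names = []
    for i, name in enumerate(names):
        temp = ".tmp_rename_{}".format(i)
        os.rename(os.path.join(folder, name), os.path.join(folder, temp))
        temp_names.append(temp)
    # Pass 2: move the temporary names to their final numbered names
    final_names = []
    for i, temp in enumerate(temp_names, start=start):
        final = "{}{}".format(i, ext)
        os.rename(os.path.join(folder, temp), os.path.join(folder, final))
        final_names.append(final)
    return final_names
```

Sorting is plain string order here, so files already named 10.jpg and 9.jpg would be numbered in that order; a natural-sort key could be passed to sorted instead.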
