Recursion between two lists of text files - python

I have the following question.
Let's say I have a list of text files in a folder:
D:/Users/Roger/A
And another list of text files in another folder:
D:/Users/Roger/Reports
(The lists contain the complete paths to the files.) They are ordered alphabetically so they match 1:1, for example:
FOLDER_A = ["D:/Users/Roger/A/a.txt", "D:/Users/Roger/A/b.txt"]
FOLDER_B = ["D:/Users/Roger/B/a.txt", "D:/Users/Roger/B/b.txt"]
I made a dictionary using both lists
Dict = {}
for i in range(len(FOLDER_A)):
    Dict[FOLDER_A[i]] = FOLDER_B[i]
sorted(Dict.items())
Later on, I copied the contents of a.txt in folder A to the a.txt file in folder B with a for loop that iterated over the keys and values of the dictionary.
My question: is there a way to do this using some kind of recursion, instead of iterating through k, v in a dictionary, with Python 2.7?
Thank you very much!

Yes, a recursive form exists: any iterative algorithm can be rewritten recursively. However, the recursive version is rarely used in Python, because every list element adds a stack frame and a list longer than the interpreter's recursion limit (1000 frames by default) fails with a "maximum recursion depth exceeded" error.
Recursive algorithms can be very expressive, but to me, the organisation of this data is asking to be iterated over.
By the way, your dict can be created with a dictionary comprehension:
Dict = { FOLDER_A[i]:FOLDER_B[i] for i in range(len(FOLDER_A)) }
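To make the comparison concrete, here is a minimal sketch of both forms. The function names pair_recursive and pair_iterative are made up for this illustration, and the folder B paths are assumed to mirror folder A. The recursive version uses one stack frame per list element, which is why it runs into the recursion limit on long lists.

```python
def pair_recursive(sources, targets):
    """Pair two lists element by element using recursion."""
    if not sources or not targets:
        return []  # base case: one of the lists is exhausted
    head = (sources[0], targets[0])
    return [head] + pair_recursive(sources[1:], targets[1:])


def pair_iterative(sources, targets):
    """The same pairing with the built-in zip."""
    return list(zip(sources, targets))


FOLDER_A = ["D:/Users/Roger/A/a.txt", "D:/Users/Roger/A/b.txt"]
FOLDER_B = ["D:/Users/Roger/B/a.txt", "D:/Users/Roger/B/b.txt"]

same = pair_recursive(FOLDER_A, FOLDER_B) == pair_iterative(FOLDER_A, FOLDER_B)
print(same)  # True
```

Both versions run on Python 2.7 and Python 3. For a plain 1:1 walk over two lists, zip is usually the simpler choice.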

You don't need any recursion here. You can iterate through the files in the first folder, check whether a file with the same name exists in the second folder, and then copy the contents from the file in the first folder to the file in the second folder, something like this:
import os
folder1 = "folder1"
folder2 = "folder2"
for root, dirs, files in os.walk(folder1):
    for file in files:
        if file in os.listdir(folder2):
            # os.walk also descends into subfolders, so join with root
            file_path = os.path.join(root, file)
            file1_path = os.path.join(folder2, file)
            if os.path.isfile(file1_path):
                with open(file_path) as f:
                    lines = f.readlines()
                # 'a' appends to the target; use 'w' to overwrite instead
                with open(file1_path, 'a') as f1:
                    f1.writelines(lines)
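If the two lists of full paths already match 1:1, as in the question, a shorter route is to walk them in parallel and let shutil.copyfile overwrite each target. This is only a sketch: copy_pairs is a hypothetical helper name, and the demo runs on a throwaway temporary directory rather than the real D:/ folders.

```python
import os
import shutil
import tempfile


def copy_pairs(sources, targets):
    """Overwrite each target file with the contents of its paired source."""
    for src, dst in zip(sources, targets):
        shutil.copyfile(src, dst)


# Demo: build folders A and B inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    folder_a = os.path.join(tmp, "A")
    folder_b = os.path.join(tmp, "B")
    os.mkdir(folder_a)
    os.mkdir(folder_b)
    names = ["a.txt", "b.txt"]
    sources = [os.path.join(folder_a, n) for n in names]
    targets = [os.path.join(folder_b, n) for n in names]
    for path in sources:
        with open(path, "w") as f:
            f.write("from A: " + os.path.basename(path))
    for path in targets:
        with open(path, "w") as f:
            f.write("old")

    copy_pairs(sources, targets)
    with open(targets[0]) as f:
        copied = f.read()

print(copied)  # from A: a.txt
```

Unlike the append mode used above, copyfile replaces whatever the target contained.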

Related

How to get ALL subdirectories, all levels deep except files in AWS S3 with python boto3

There are a lot of similar questions, but I can't find an answer to exactly this one: how do I get ALL sub-directories starting from an initial one? The depth of the sub-directories is unknown.
Let's say I have:
data/subdir1/subdir2/file.csv
data/subdir1/subdir3/subdir4/subdir5/file2.csv
data/subdir6/subdir7/subdir8/file3.csv
So I would like to get either a list of all sub-directories at every depth or, even better, all the paths one level above the files. In my example I would ideally get:
data/subdir1/subdir2/
data/subdir1/subdir3/subdir4/subdir5/
data/subdir6/subdir7/subdir8/
but I could work with this as well:
data/subdir1/
data/subdir1/subdir2/
data/subdir1/subdir3/
data/subdir1/subdir3/subdir4/
etc...
data/subdir6/subdir7/subdir8/
My code so far only gets me one level deep of directories:
result = await self.s3_client.list_objects(
    Bucket=bucket, Prefix=prefix, Delimiter="/"
)
subfolders = set()
for content in result.get("CommonPrefixes"):
    print(f"sub folder : {content.get('Prefix')}")
    subfolders.add(content.get("Prefix"))
return subfolders
import os
# list_objects returns a dictionary. The 'Contents' key contains a
# list of full paths including the file name stored in the bucket
# for example: data/subdir1/subdir3/subdir4/subdir5/file2.csv
objects = s3_client.list_objects(Bucket='bucket_name')['Contents']
# here we iterate over the fullpaths and using
# os.path.dirname we get the fullpath excluding the filename
for obj in objects:
    print(os.path.dirname(obj['Key']))
To make this a unique, sorted list of directory "paths", we can sort a set comprehension inline. Sets hold only unique elements, and sorted returns a list.
See https://docs.python.org/3/tutorial/datastructures.html#sets
import os
paths = sorted({os.path.dirname(obj['Key']) for obj in objects})
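Both outputs the question asks for can be derived from the object keys alone. The sketch below works on a plain list of keys so that it runs without AWS access; leaf_dirs and all_dirs are hypothetical helper names. It uses posixpath because S3 keys always use "/" as the separator, whatever the local operating system. In a real bucket the keys would come from the 'Contents' entries, and since a single list_objects call returns at most 1000 objects, a larger bucket would need a paginator.

```python
import posixpath


def leaf_dirs(keys):
    """Return the unique directory of each key, with a trailing slash."""
    return sorted({posixpath.dirname(k) + "/" for k in keys if "/" in k})


def all_dirs(keys):
    """Return every ancestor directory of every key, at all depths."""
    found = set()
    for key in keys:
        parent = posixpath.dirname(key)
        # Walk upwards until nothing but slashes (or nothing) is left.
        while parent.strip("/"):
            found.add(parent + "/")
            parent = posixpath.dirname(parent)
    return sorted(found)


keys = [
    "data/subdir1/subdir2/file.csv",
    "data/subdir1/subdir3/subdir4/subdir5/file2.csv",
    "data/subdir6/subdir7/subdir8/file3.csv",
]

for path in leaf_dirs(keys):
    print(path)
```

Here leaf_dirs gives the three paths one level above the files, and all_dirs gives the longer list with every intermediate level.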

Reading multiple files and putting them into a list of lists

I have several files in a directory, and each file contains lines of data where each line is to be a separate element in a list. With multiple files, however, I would like each file to be a list of its own within an overall list of lists.
So far, I have been able to read in all files in "my path", and separate each line into a separate element. But I have only managed to collapse them into one flat list. But my task is to output a list of lists based on separate files in the directory.
What I've tried:
import os
lst = []
for name in os.listdir("my path"):
    with open(os.path.join("my path", name)) as f:
        lst.append([line.strip() for line in f.read().split('\n')])
Edit: Looks like the code was OK, but the directory that I specified in "my path" was referencing a specific file and not the directory itself.
from pathlib import Path
def data(folder_path):
    # yield one list of stripped lines per regular file in the folder
    for p in filter(Path.is_file, Path(folder_path).iterdir()):
        with open(p) as f:
            yield [l.strip() for l in f]
print(list(data('my path')))
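One detail worth knowing is that iterdir yields entries in arbitrary order, so the order of the inner lists can differ between runs or machines. A variant that sorts by file name first might look like the sketch below; read_files_as_lists is a made-up name, and the demo uses a temporary directory instead of "my path".

```python
import tempfile
from pathlib import Path


def read_files_as_lists(folder_path):
    """Return one list of stripped lines per regular file, ordered by name."""
    files = sorted(p for p in Path(folder_path).iterdir() if p.is_file())
    return [
        [line.strip() for line in p.read_text().splitlines()]
        for p in files
    ]


# Demo on a throwaway directory with two small files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("1\n2\n")
    Path(tmp, "b.txt").write_text("x\ny\n")
    result = read_files_as_lists(tmp)

print(result)  # [['1', '2'], ['x', 'y']]
```

Using splitlines also avoids the empty trailing element that split('\n') produces when a file ends with a newline.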

Comparing two differently formatted lists in Python?

I need to compare two lists of records. One list has records that are stored in a network drive:
C:\root\to\file.pdf
O:\another\root\to\record.pdf
...
The other list has records stored in ProjectWise, collaboration software. It contains only filenames:
drawing.pdf
file.pdf
...
I want to create a list of the network drive file paths that do not have a filename that is in the ProjectWise list. It must include the paths. Currently, I am searching a list of each line in the drive list with a regular expression consisting of a line ending with any of the names in the ProjectWise list. The script is taking an unbearably long time and I feel I am overcomplicating the process.
I have thought about using sets to compare the lists (set(list1)-set(list2)) but this would only work with and return filenames on their own without the paths.
If you use os.path.basename on the list that contains full paths to the file you can get the filename and can then compare that to the other list.
import os
orig_list = [os.path.basename(path) for path in file_path_list]
missing_filepaths = set(orig_list) - set(file_name_list)
That will get you a set of the filenames on the drive that do not appear in the ProjectWise list, and you should be able to go from there.
Edit:
So, you want a list of paths that don't have an associated filename, correct? Then it's pretty simple. Extending the code above, you can do this:
paths_without_filenames = [path for path in file_path_list if os.path.split(path)[1] in missing_filepaths]
This generates the list of file paths from your drive list whose filename does not appear in the list of filenames.
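The two steps can also be folded into a single pass with a set lookup, which avoids the slow regular-expression search from the question. The sketch below is one way to do it under two assumptions: paths_missing_from is a hypothetical helper name, and the names are compared with exact case. It uses ntpath.basename because the drive paths use backslashes, and ntpath splits on them on any operating system, whereas os.path.basename only does so on Windows.

```python
import ntpath


def paths_missing_from(drive_paths, known_names):
    """Return the drive paths whose file name is not in known_names."""
    known = set(known_names)  # set membership tests are fast
    return [p for p in drive_paths if ntpath.basename(p) not in known]


drive_paths = [
    r"C:\root\to\file.pdf",
    r"O:\another\root\to\record.pdf",
]
projectwise_names = ["drawing.pdf", "file.pdf"]

missing = paths_missing_from(drive_paths, projectwise_names)
print(missing)  # ['O:\\another\\root\\to\\record.pdf']
```

The result keeps the full paths and the original order of the drive list.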

How to sort file names in a particular order using python

Is there a simple way to sort files in a directory in python? The files I have in mind come in an ordering as
file_01_001
file_01_005
...
file_02_002
file_02_006
...
file_03_003
file_03_007
...
file_04_004
file_04_008
What I want is something like
file_01_001
file_02_002
file_03_003
file_04_004
file_01_005
file_02_006
...
I am currently opening them using glob for the directory as follows:
for filename in glob(path):
    with open(filename,'rb') as thefile:
        pass  # Do stuff to each file
So, while the program performs the desired tasks, it's giving incorrect data if I do more than one file at a time, due to the ordering of the files. Any ideas?
As mentioned, files in a directory are not inherently sorted in a particular way. Thus, we usually 1) grab the file names 2) sort the file names by desired property 3) process the files in the sorted order.
You can get the file names in the directory as follows. Suppose the directory is "~/home" then
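import os
# os.listdir does not expand "~", so expand it explicitly
directory = os.path.expanduser("~/home")
file_list = os.listdir(directory)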
To sort file names:
#grab last 4 characters of the file name:
def last_4chars(x):
    return(x[-4:])
sorted(file_list, key = last_4chars)
So it looks as follows:
In [4]: sorted(file_list, key = last_4chars)
Out[4]:
['file_01_001',
'file_02_002',
'file_03_003',
'file_04_004',
'file_01_005',
'file_02_006',
'file_03_007',
'file_04_008']
To read in and process them in sorted order, do:
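directory = os.path.expanduser("~/home")
file_list = os.listdir(directory)
for filename in sorted(file_list, key = last_4chars):
    # listdir returns bare names, so join them with the directory
    with open(os.path.join(directory, filename),'rb') as thefile:
        pass  # Do stuff to each file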
An alternative is to use Tcl's "lsort -dictionary":
from tkinter import Tcl
Tcl().call('lsort', '-dict', file_list)
Tcl dictionary sorting compares embedded numbers by value, and you will get results similar to the ones a file manager uses for sorting files. Note that it compares names from left to right, so it orders by the first number before the second.
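Both orderings can also be written as plain Python sort keys, without slicing a fixed number of characters and without Tcl. This is a sketch under the assumption that the names look like the ones in the question; last_number and natural_key are hypothetical helper names.

```python
import re


def last_number(name):
    """Sort key: the final run of digits in the name, compared as an integer."""
    digits = re.findall(r"\d+", name)
    return int(digits[-1]) if digits else -1


def natural_key(name):
    """Sort key: split into text and number parts, numbers compared by value."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


names = [
    "file_01_001", "file_01_005",
    "file_02_002", "file_02_006",
    "file_03_003", "file_03_007",
    "file_04_004", "file_04_008",
]

by_last = sorted(names, key=last_number)
print(by_last[:5])
# ['file_01_001', 'file_02_002', 'file_03_003', 'file_04_004', 'file_01_005']
```

The last_number key gives the order asked for in the question and keeps working if the trailing number grows beyond three digits. The natural_key ordering is the file-manager style, which sorts by the first number before the second.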

How can I get the file names and contents of a whole directory?

I have a directory named main that contains two files: a text file named alex.txt that only has 100 as its contents, and another file named mark.txt that has 400.
I want to create a function that will go into the directory, and take every file name and that file's contents and store them (into a dict?). So the end result would look something like this:
({'alex.txt', '100'}, {'mark.txt', '400'})
What would be the best way of doing this for large amounts of files?
This looks like a good job for os.walk
import os
top = 'main'  # directory to scan
d = {}
for path,dirs,fnames in os.walk(top):
    for fname in fnames:
        visit = os.path.join(path,fname)
        with open(visit) as f:
            d[visit] = f.read()
This solution will also recurse into subdirectories if they are present.
Using a dictionary looks like the way to go.
You can use os.listdir to get a list of the files in your directory. Then, iterate on the files, opening each of them, reading its input and storing them in your dictionary.
If your main directory has some subdirectories, you may want to use the os.walk function to process them recursively. Stick to os.listdir otherwise.
Note that an item of os.listdir is relative to main. You may want to add the path to main before opening the file. In that case, use os.path.join(path_to_main, f) where f is an item of os.listdir.
import os
bar = {}
[bar.update({i: open(i, 'r').read()}) for i in os.listdir('.')]
or (via mgilson)
bar = dict( (i,open(i).read()) for i in os.listdir('.') )
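The one-liners above leave the files open until the interpreter cleans them up, and they fail if the directory contains a subdirectory. A slightly longer sketch that closes each file and skips anything that is not a regular file could look like this; read_directory is a made-up name, and the demo recreates the question's two files in a temporary directory instead of using a real main folder.

```python
import os
import tempfile


def read_directory(directory):
    """Map each file name in `directory` to that file's contents."""
    contents = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):  # skip subdirectories
            with open(path) as f:
                contents[name] = f.read()
    return contents


# Demo on a throwaway directory that mirrors the question's example.
with tempfile.TemporaryDirectory() as main:
    for name, text in [("alex.txt", "100"), ("mark.txt", "400")]:
        with open(os.path.join(main, name), "w") as f:
            f.write(text)
    result = read_directory(main)

print(result)  # {'alex.txt': '100', 'mark.txt': '400'}
```

For a very large number of files, keep in mind that this holds every file's contents in memory at once.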
