How to select file randomly from multiple sub-folders - python

I've multiple sub-folders and each sub-folder has multiple files. I need to select the sub-folder randomly and then need to select a random file in that sub-folder. Let's say I've five folders A, B, C, D, E, and each folder contains another folder named data and this data folder contains multiple files. I need to pick the folder randomly from the five folders and then open the data folder and finally randomly select a file.

Keep the folder names in a list.
import random
import os
folders = ["A", "B", "C", "D", "E"]
selected_folder = random.choice(folders)
path = os.path.join(selected_folder, "data")
Now, to pick a random file from that path, call random.choice() on the list of files in it.
Use os.listdir(path) to get that list.

import os
import random
path = os.getcwd()
def getRandomFile(path):
    # Pick a random sub-directory of path, then a random file in its data folder.
    randomDir = random.choice([x for x in os.scandir(path) if x.is_dir()]).path
    randomFile = random.choice([f for f in os.scandir(os.path.join(randomDir, "data")) if f.is_file()]).name
    return randomFile
print(getRandomFile(path))

Try this (the Python file must be in the same main folder as those 5 folders, and be run from there):
import os, random
lst = list(filter(os.path.isdir, os.listdir('.')))  # get folder list
folder = random.choice(lst)  # select random folder
os.chdir(os.path.join(os.path.dirname(os.path.abspath(__file__)), folder, 'data'))  # go to random folder/data
lst = list(filter(os.path.isfile, os.listdir('.')))  # get file list
file = random.choice(lst)  # get random file
print(file)

As I understand it, you need four functions to build this code:
os.listdir(path), which lists all files and directories at a location
os.path.isdir(path), to check whether an entry at a location is a directory
os.path.isfile(path), the same for files
random.randrange(X), to pick a random integer in the range [0, X)
You can easily find the documentation for these functions, as they are all in Python's standard library. Anyway, here is the code:
import os
import random
path = "/home/johndoe/"
dirs = list(filter(lambda dir: os.path.isdir(os.path.join(path, dir)), os.listdir(path)))
dir_chosen = dirs[random.randrange(len(dirs))]
files_path = os.path.join(path, dir_chosen, "data")
files = list(filter(lambda file: os.path.isfile(os.path.join(files_path, file)), os.listdir(files_path)))
file_chosen = files[random.randrange(len(files))]
print("the file randomly chosen is: {}".format(os.path.join(files_path, file_chosen)))
You can also look up os.path.join(a, b) if you don't know it; it is basically equivalent to a + '/' + b on UNIX and a + '\\' + b on Windows.
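For what it's worth, the same two-step pick (random folder, then random file in its data folder) can also be sketched with pathlib. This is only a sketch: the function name pick_random_file and the inner parameter are made up for illustration, and it assumes each candidate folder holds its files directly inside data.

```python
import random
from pathlib import Path


def pick_random_file(root, inner="data"):
    """Pick a random sub-folder of `root`, then a random file in its `inner` folder."""
    root = Path(root)
    # Only sub-folders that actually contain an `inner` directory are candidates.
    candidates = [d / inner for d in root.iterdir() if (d / inner).is_dir()]
    if not candidates:
        raise FileNotFoundError(f"no sub-folder of {root} contains '{inner}'")
    data_dir = random.choice(candidates)
    # Ignore anything in the data folder that is not a regular file.
    files = [f for f in data_dir.iterdir() if f.is_file()]
    if not files:
        raise FileNotFoundError(f"{data_dir} contains no files")
    return random.choice(files)
```

Calling pick_random_file(".") from the main folder returns a Path such as C/data/some_file. Note that the folder is chosen first, so files in sparsely filled folders are more likely to be picked than files in crowded ones, which matches what the question asks for.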

Related

Python: Prepend foldername to filename

So I've been working on this for quite a while now..
The incoming mail (paper) is scanned using a Xerox WorkCentre.
On the screen we select the matching scan folder for any customer/vendor (4 digit number).
So any invoice by vendor x is stored in a specific folder.
Now we'd like to rename the PDF file by prepending the matching customer ID (4 digits), which happens to be the name of the parent folder where the PDF is stored.
On our server we have a folder structure where all the scans are stored like this:
S:/Scan/[4 digit number]/filename.pdf
e.g. S:/Scan/9020/
where the contents look like
doc_12345.pdf
doc_12346.pdf
[...]
Now I'd like to prepend the parent folder name to any file like this:
S:/Scan/9020/doc_12345.pdf becomes S:/Scan/9020/9020_doc_12345.pdf
S:/Scan/9021/doc_12346.pdf becomes S:/Scan/9021/9021_doc_12346.pdf
After the file has been renamed, it should be moved to a common folder like:
S:/Scan/Renamed/
I would appreciate any ideas :)
Try this:
import os
import glob
import pathlib
inp_dir = 'S:/Scan/'
out_dir = 'S:/Scan/Renamed/'
# Skip the output folder itself so files already moved are not renamed again.
folder_list = [i for i in pathlib.Path(inp_dir).iterdir() if i.is_dir() and i.name != 'Renamed']
for cust in folder_list:
    flist = glob.glob(str(cust) + '/*.pdf')
    flist = [pathlib.Path(i) for i in flist]
    for file in flist:
        new_name = f'{cust.name}_{file.name}'
        os.rename(str(file), f'{out_dir}{new_name}')
import os
import shutil
source = 'S:/Scan/Source/'
target = 'S:/Scan/Renamed/'
for dpath, dnames, fnames in os.walk(source):
    for f in fnames:
        n = dpath.rsplit('/', 2)[-1]  # name of the parent folder
        os.chdir(dpath)
        if not f.startswith(n):
            os.rename(f, n + '_' + f)
            nf = dpath + "/" + n + '_' + f
            shutil.move(nf, target)
That's what I've got so far.
Seems to work.
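If it helps, here is a minimal sketch that renames and moves in one step with pathlib and shutil.move. The function name prepend_and_move is made up, and it assumes the customer folders are exactly the sub-folders whose names consist only of digits (so a folder like Renamed is skipped).

```python
import shutil
from pathlib import Path


def prepend_and_move(scan_root, target):
    """Move every PDF in the numbered sub-folders of `scan_root` to `target`,
    prefixing each file name with the name of its parent folder."""
    scan_root, target = Path(scan_root), Path(target)
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    for folder in scan_root.iterdir():
        # Assumption: customer folders are the all-digit directory names.
        if not (folder.is_dir() and folder.name.isdigit()):
            continue
        # list() takes a snapshot so moving files does not disturb the iteration.
        for pdf in list(folder.glob("*.pdf")):
            new_path = target / f"{folder.name}_{pdf.name}"
            shutil.move(str(pdf), str(new_path))
            moved.append(new_path)
    return moved
```

With the layout from the question this would be called as prepend_and_move('S:/Scan', 'S:/Scan/Renamed'). It does not guard against two files ending up with the same new name, so check for that first if it can happen.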

Finding the newest files in multiple folders and send them to other folder

I have 3 folders (A, B and C) in Source. Each folder contains at least 1 file. I want to find the newest files in each folder and send that to the Destination which also contains folders A, B and C. The NOT-newest files will be moved to Archive, which also contains folders A, B and C. I used the code below, but I get the following error: NotADirectoryError: [WinError 267] The directory name is invalid: 'c:\\data\\AS\\Desktop\\Source\\A\\12.txt'
This is my code:
from datetime import datetime,timedelta
import shutil, os, os.path
import time
#Make Source, Destination and Archive paths.
source = r'c:\data\AS\Desktop\Source'
destination = r'c:\data\AS\Desktop\Destination'
archive = r'c:\data\AS\Desktop\Archive'
#First os.walk for the source folder itself
for root, dirs, files in os.walk(source):
    for folder in dirs:
        subdir=root+'\\'+folder
        #second os.walk for each folder in the source folder (A, B, and C)
        for subroot, subdirs, subfiles in os.walk(subdir):
            for file in subfiles:
                filePath=subroot+'\\'+file
                maxi = max(os.listdir(filePath), key=os.path.getctime)
                print(maxi)
I also would like to know what key stands for in key=os.path.getctime. Thank you all in advance
If your goal is only to move files in sub-directories one level below source, then you do not want to use os.walk(). That is a recursive walk and will enumerate all directories/files under the root directory. Instead, use os.listdir(), which will only list immediate sub-directories and files. Note also that os.path.getctime() requires a complete path and will not work given only the file name returned by os.listdir().
import os
import os.path
src = 'src'
dst = 'dst'
arc = 'arc'
for subdir in os.listdir(src):
    subdir_path = os.path.join(src, subdir)
    if not os.path.isdir(subdir_path):
        # Only consider directories (a, b, c, ...) under src; skip files.
        continue
    # Get a list of full paths of _files_ in the sub-directory.
    subfile_paths = [os.path.join(subdir_path, f) for f in os.listdir(subdir_path)]
    subfile_paths = [p for p in subfile_paths if os.path.isfile(p)]
    if len(subfile_paths) == 0:
        # Skip empty sub-directories.
        continue
    newest_path = max(subfile_paths, key=os.path.getctime)
    for subfile_path in subfile_paths:
        if subfile_path == newest_path:
            dst_root = dst
        else:
            dst_root = arc
        dst_path = os.path.join(dst_root, subdir, os.path.basename(subfile_path))
        os.rename(subfile_path, dst_path)
The error you are getting is a result of this line:
maxi = max(os.listdir(filePath), key=os.path.getctime)
You do not need the second os.walk for folders A, B, and C. With it, you assign the full path of an individual file to the variable filePath. When you then pass filePath to os.listdir() inside max(), it raises the error you are seeing, because os.listdir() expects a folder. You should be listing the A, B, and C folders, not individual files, so you can get rid of the second os.walk structure. Something like this:
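for root, dirs, files in os.walk(source):
    for folder in dirs:
        subdir = os.path.join(root, folder)
        # os.listdir() returns bare names, so build full paths before calling getctime
        maxi = max((os.path.join(subdir, f) for f in os.listdir(subdir)), key=os.path.getctime)
        print(maxi)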
Also, key=os.path.getctime tells the max() function to compare the files by their creation timestamp rather than by name. So you are saying: show me the maximum file, where maximum is defined as the most recent creation time. (On Windows getctime is the creation time; on Unix it is the time of the last metadata change.)
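To see what key does in isolation, here is a tiny sketch that needs no files at all. The names and timestamps in the dictionary are made up; files.get plays the role that os.path.getctime plays in the real code.

```python
# name -> made-up timestamp, standing in for what os.path.getctime would return
files = {"old.txt": 100.0, "new.txt": 300.0, "mid.txt": 200.0}

# Without key, max() compares the names themselves (alphabetical order).
by_name = max(files)
# With key, max() compares key(item) for each item and returns the original item.
by_time = max(files, key=files.get)

print(by_name)  # old.txt
print(by_time)  # new.txt
```

So key is just a function that max() calls on every element to decide what to compare; the element itself is still what gets returned.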

How to randomly select a file in python

I am intermediate when it comes to python but when it comes to modules I struggle. I'm working on a project and I'm trying to assign a variable to a random directory or file within the current directory (any random thing within the directory). I would like it to just choose any random thing in that directory and then assign it to a variable.
The product should end up assigning a variable to a random object within the working directory. Thank you.
file = (any random file in the directory)
Edit: This works too
_files = os.listdir('.')
number = random.randint(0, len(_files) - 1)
file_ = _files[number]
Thank you everyone that helped :)
Another option is to use globbing, especially if you want to choose from some files, not all files:
import random, glob
pattern = "*" # (or "*.*")
filename = random.choice(glob.glob(pattern))
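You can use
import random
import os
directory_path = "."  # directory to choose from
# os.listdir lists the entries in directory_path
# random.choice selects a random element
filename = ""
# keep drawing until the entry is a file, which filters out directories
# (this loops forever if the directory contains no files)
while not os.path.isfile(filename):
    filename = os.path.join(directory_path, random.choice(os.listdir(directory_path)))
with open(filename, 'r') as file_obj:
    pass  # do stuff with file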
import os
import random
_files = os.listdir('.')
number = random.randint(0, len(_files) - 1)
file_ = _files[number]
Line by line:
It puts the names of all entries in the directory into a list
It chooses a random number between 0 and the length of that list - 1
It assigns the entry at that index to file_
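The index arithmetic above can also be left to random.choice, which picks one element of a list directly. A minimal sketch, with a made-up function name random_entry and an explicit check for an empty directory:

```python
import os
import random


def random_entry(directory="."):
    """Return the name of a random file or folder in `directory`."""
    entries = os.listdir(directory)
    if not entries:
        raise ValueError(f"{directory} is empty")
    # random.choice picks the index for you; same result as
    # entries[random.randint(0, len(entries) - 1)]
    return random.choice(entries)
```

Usage would be file_ = random_entry('.'). Like the snippet above, this can return a sub-folder as well as a file.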
Here is an option to print and open a single random file from a directory with multiple sub-directories.
import numpy as np
import os
file_list = []
for root, dirs, files in os.walk(r"E:\Directory_with_sub_directories", topdown=False):
    for name in files:
        file_list.append(os.path.join(root, name))
the_random_file = file_list[np.random.choice(len(file_list))]
print(the_random_file)
os.startfile(the_random_file)  # Windows only

Extract files from a zip folder with sub folders into a single folder

I have a zip folder with several subfolders.
testfolder.zip contains files in the given format
testfolder/files 1,2,3
testfolder/Test1folder/files 5,6
testfolder/Test2folder/files 7,8
I need the output as
testfolder/files 1,2,3,4,5,6,7,8
I am able to unzip the folder with its subfolders but not in the desired way.
This is my attempt so far
import glob
import os
import zipfile
folder = 'E:/Test'
extension = '.zip'
zip_files = glob.glob(folder + extension)
for zip_filename in zip_files:
    dir_name = os.path.splitext(zip_filename)[0]
    os.mkdir(dir_name)
    zip_handler = zipfile.ZipFile(zip_filename, "r")
    zip_handler.extractall(dir_name)
Any help will be much appreciated. Thanks in advance.
Replace:
zip_handler.extractall(dir_name)
with something like this:
for z in zip_handler.infolist():
    zip_handler.extract(z, dir_name)
This should work by taking each file in the archive one at a time and extracting it to the same point in the directory.
UPDATE:
Apparently it still extracts them relatively. Solved it by adding a few lines of code to your original snippet:
for p, d, f in os.walk(folder, topdown=False):
    for n in f:
        os.rename(os.path.join(p, n), os.path.join(dir_name, n))
    for n in d:
        os.rmdir(os.path.join(p, n))
This will move the files into the base folder and delete all empty folders that remain. This one I have tried and tested.
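An alternative that avoids the move-and-clean-up pass is to write each archive member straight to the target folder under its base name. This is only a sketch: extract_flat is a made-up name, and it assumes no two files in the archive share the same name, since a later one would overwrite an earlier one.

```python
import os
import shutil
import zipfile


def extract_flat(zip_path, dest_dir):
    """Extract every file in `zip_path` directly into `dest_dir`, dropping sub-folders."""
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            if member.is_dir():
                continue  # directory entries carry no data
            name = os.path.basename(member.filename)
            if not name:
                continue
            # Copy the member's bytes to dest_dir/name, ignoring its stored folder.
            with archive.open(member) as src, open(os.path.join(dest_dir, name), "wb") as dst:
                shutil.copyfileobj(src, dst)
```

For the question's layout this would be called as extract_flat('E:/Test.zip', 'E:/Test').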

Creating folders by checking numbers at beginning of a string with python

I have a list of strings which contains files like below
filenames = ['000101 FL - Project Title Page.DOC',
             '014200 FL - References.DOC',
             '095446 FL - Fabric-Wrapped Ceiling Panels.DOC',
             '142113 FL - ELECTRIC TRACTION FREIGHT ELEVATORS.DOC']
I want to check whether a folder named Div + the first two digits of each string exists, such as Div00, Div01, Div09, Div14 in this case. If not, I would like to create this folder, then store the file in it.
In pseudocode I believe it would be similar to
for file in filenames
    if 'Div' + file[0:2] not a folder
        make folder 'Div' + file[0:2]
        add file to folder
    else
        add file to folder 'Div' + file[0:2]
There will be multiple files starting with the same two digits, which is why I want to sort them into folders.
Let me know if you need any clarification.
Use os.makedirs to create a directory and shutil.copy2 to copy a file:
import os
import shutil
filenames = ['000101 FL - Project Title Page.DOC']
for filename in filenames:
    folder = 'Div' + filename[:2]  # 'Div00'
    # Create the folder if it doesn't exist
    if not os.path.exists(folder):
        os.makedirs(folder)
    # Copy the file to `folder`
    if os.path.isfile(filename):
        shutil.copy2(filename, folder)  # metadata is copied as well
You can just check whether the folder exists and create it if it does not:
if not os.path.exists(dirName):
    os.makedirs(dirName)
Try something like this:
import os
import shutil
for file in filenames:
    dir_name = "Div%s" % file[0:2]
    if not os.path.isdir(dir_name):
        os.makedirs(dir_name)
    shutil.copy(file, dir_name)
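On Python 3 the existence check can be folded into os.makedirs with exist_ok=True. A minimal sketch along those lines; the function name sort_into_div_folders and the base_dir parameter are made up, and names in the list that do not exist on disk are skipped rather than treated as errors.

```python
import os
import shutil


def sort_into_div_folders(filenames, base_dir="."):
    """Copy each existing file into base_dir/Div<first two characters of its name>."""
    copied = []
    for path in filenames:
        name = os.path.basename(path)
        folder = os.path.join(base_dir, "Div" + name[:2])
        # exist_ok=True makes a separate "does it exist?" check unnecessary.
        os.makedirs(folder, exist_ok=True)
        if os.path.isfile(path):
            # copy2 returns the path of the new copy and preserves metadata.
            copied.append(shutil.copy2(path, folder))
    return copied
```

Files that share the same two leading digits simply land in the same folder, because the second os.makedirs call for that folder is a no-op.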
