So I've been working on this for quite a while now.
The incoming mail (paper) is scanned using a Xerox WorkCentre.
On the screen we select the matching scan folder for any customer/vendor (4 digit number).
So any invoice by vendor x is stored in a specific folder.
Now we'd like to rename each PDF file by prepending the matching customer ID (4 digits), which happens to be the name of the parent folder the PDF is stored in.
On our server we have a folder structure where all the scans are stored like this:
S:/Scan/[4 digit number]/filename.pdf
e.g. S:/Scan/9020/
where the contents look like
doc_12345.pdf
doc_12346.pdf
[...]
Now I'd like to prepend the parent folder name to any file like this:
S:/Scan/9020/doc_12345.pdf becomes S:/Scan/9020/9020_doc_12345.pdf
S:/Scan/9021/doc_12346.pdf becomes S:/Scan/9021/9021_doc_12346.pdf
After the file has been renamed, it should be moved to a common folder like:
S:/Scan/Renamed/
I would appreciate any ideas :)
Try this:
import os
import glob
import pathlib

inp_dir = 'S:/Scan/'
out_dir = 'S:/Scan/Renamed/'

# every customer folder, skipping the output folder, which lives inside inp_dir
folder_list = [i for i in pathlib.Path(inp_dir).iterdir()
               if i.is_dir() and i != pathlib.Path(out_dir)]
for cust in folder_list:
    flist = glob.glob(str(cust) + '/*.pdf')
    flist = [pathlib.Path(i) for i in flist]
    for file in flist:
        # prepend the folder name (the customer ID) and move in one step
        new_name = f'{cust.name}_{file.name}'
        os.rename(str(file), f'{out_dir}{new_name}')
import os
import shutil

source = 'S:/Scan/Source/'
target = 'S:/Scan/Renamed/'

for dpath, dnames, fnames in os.walk(source):
    # the name of the folder being walked is the customer ID
    n = os.path.basename(os.path.normpath(dpath))
    for f in fnames:
        # prepend the ID unless the file name already starts with it
        new_name = f if f.startswith(n) else n + '_' + f
        shutil.move(os.path.join(dpath, f), os.path.join(target, new_name))
That's what I've got so far.
Seems to work.
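For comparison, here is a minimal pathlib-only sketch of the same idea, assuming every customer folder sits directly below the scan root. The function name `collect_scans` and its parameters are made up for illustration, and skipping a target name that already exists (instead of overwriting it) is my own assumption, not something the question asks for.

```python
import shutil
from pathlib import Path


def collect_scans(scan_root, target_dir):
    """Prefix each PDF with its parent folder name and move it to target_dir.

    Returns the list of new paths. Hypothetical helper, not part of any library.
    """
    scan_root = Path(scan_root)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for folder in sorted(scan_root.iterdir()):
        # only customer folders; never descend into the target folder itself
        if not folder.is_dir() or folder.resolve() == target_dir.resolve():
            continue
        for pdf in sorted(folder.glob('*.pdf')):
            new_path = target_dir / f'{folder.name}_{pdf.name}'
            if new_path.exists():
                # assumption: leave the scan in place instead of overwriting
                continue
            shutil.move(str(pdf), str(new_path))
            moved.append(new_path)
    return moved
```

It could be called as `collect_scans('S:/Scan', 'S:/Scan/Renamed')`. Running it twice is harmless, because files that were already moved are no longer in the customer folders.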
Related
I am having trouble changing file names manually.
I have a folder with lots of files with names like
202012_34324_3643.txt
202012_89543_0292.txt
202012_01920_1922.txt
202012_23442_0928.txt
202012_21346_0202.txt
I want them renamed as below, removing the numbers before the first underscore and after the second one, and keeping only the number in between:
34324.txt
89543.txt
01920.txt
23442.txt
21346.txt
I want a script that reads all files in the folder and renames them as shown above.
Thanks
You could try using the os library in python.
import os

# retrieve current files in the directory
fnames = os.listdir()
# split each name by '_' and take the middle part
new_names = [fname.split('_')[1] + '.txt' for fname in fnames]
for oldname, newname in zip(fnames, new_names):
    os.rename(oldname, newname)
This will do the work for the current directory.
import os

fnames = os.listdir()
for oldName in fnames:
    # only touch .txt files whose name contains exactly two underscores
    if oldName.endswith('.txt') and oldName.count('_') == 2:
        s = oldName.split('_')
        # keep only the middle part
        os.rename(oldName, s[1] + '.txt')
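A regular expression makes the "number between the underscores" rule explicit. The sketch below assumes the names always look like digits, underscore, digits, underscore, digits, `.txt`; the helper name `rename_to_middle` is made up, and skipping a file whose new name is already taken is my own precaution.

```python
import os
import re

# assumption: names look like <digits>_<digits>_<digits>.txt
PATTERN = re.compile(r'^\d+_(\d+)_\d+\.txt$')


def rename_to_middle(folder='.'):
    """Rename matching files in folder to '<middle part>.txt'.

    Returns a dict of old name -> new name. Hypothetical helper for illustration.
    """
    renamed = {}
    for old_name in sorted(os.listdir(folder)):
        match = PATTERN.match(old_name)
        if match is None:
            continue  # leave every other file untouched
        new_name = match.group(1) + '.txt'
        new_path = os.path.join(folder, new_name)
        if os.path.exists(new_path):
            continue  # assumption: skip instead of clobbering an existing file
        os.rename(os.path.join(folder, old_name), new_path)
        renamed[old_name] = new_name
    return renamed
```

Files that do not match the pattern, such as the script itself if it lives in the same folder, are left alone.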
I have the following directory structure with the following files:
Folder_One
├─file1.txt
├─file1.doc
└─file2.txt
Folder_Two
├─file2.txt
├─file2.doc
└─file3.txt
I would like to get only the .txt files from each folder listed. Example:
Folder_One-> file1.txt and file2.txt
Folder_Two-> file2.txt and file3.txt
Note: This entire directory is inside a folder called dataset. My code looks like this, but I believe something is missing. Can someone help me?
path_dataset = "./dataset/"
filedataset = os.listdir(path_dataset)

for i in filedataset:
    pasta = ''
    pasta = pasta.join(i)
    for file in glob.glob(path_dataset+"*.txt"):
        print(file)
from pathlib import Path

for path in Path('dataset').rglob('*.txt'):
    print(path.name)
Using glob:
import glob

for x in glob.glob('dataset/**/*.txt', recursive=True):
    print(x)
You can use the re module to check that a filename ends with .txt.
import re
import os

path_dataset = "./dataset/"
l = os.listdir(path_dataset)

for e in l:
    # only look inside sub-folders of the dataset directory
    if os.path.isdir("./dataset/" + e):
        ll = os.listdir(path_dataset + e)
        for file in ll:
            if re.match(r".*\.txt$", file):
                print(e + '->' + file)
Another option is os.walk from the os module, which is an advantage if you already use this module:
import os

# get the current directory; you may also provide an absolute path
path = os.getcwd()
# walk recursively through all folders and gather information
for root, dirs, files in os.walk(path):
    # keep only the files with the correct extension
    check = [f for f in files if f.endswith(".txt")]
    if check:
        print(root, check)
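If the goal is the "folder -> its .txt files" grouping from the question, a dict keyed by folder name is one way to hold it. This is a minimal sketch that only looks one level deep; `txt_files_by_folder` is a name invented here.

```python
from pathlib import Path


def txt_files_by_folder(dataset_dir):
    """Map each immediate subfolder name to a sorted list of its .txt file names."""
    result = {}
    for folder in sorted(Path(dataset_dir).iterdir()):
        if folder.is_dir():
            result[folder.name] = sorted(
                p.name for p in folder.glob('*.txt') if p.is_file()
            )
    return result
```

For the layout in the question, `txt_files_by_folder('dataset')` would map `Folder_One` to `['file1.txt', 'file2.txt']` and `Folder_Two` to `['file2.txt', 'file3.txt']`.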
Here is the code I am working on:
from qgis.core import*
import glob, os, shutil, time, qgis

path = r"C:\Temp\testinput"
dest = r"C:\Temp\testoutput"
fname = []

for root,d_names,f_names in os.walk(path):
    for f in f_names:
        if f.endswith('.kml'):
            src = os.path.join(root,f)
            print(time.strftime('%m/%d/%Y', time.gmtime(os.path.getmtime(src))))
            print(os.path.realpath(src))
            shutil.copy2(src, dest)
This code traverses the directory and copies the files, but it overwrites files with the same name. How do I prevent the overwriting? I would like a file to be renamed to "filename-copy" if a file with the same name already exists in the new folder.
Very quick answer;
If the extension is known, it is not that hard. You can check up front whether a file with that name already exists in the destination folder, by adding something like this:
dest_file = os.path.join(dest, f)
exists = os.path.isfile(dest_file)
if exists:
    os.rename(dest_file, dest_file.replace('.kml', '-copy.kml'))
So the entire thing looks like this:
from qgis.core import*
import glob, os, shutil, time, qgis

path = r"C:\Temp\testinput"
dest = r"C:\Temp\testoutput"
fname = []

for root,d_names,f_names in os.walk(path):
    for f in f_names:
        if f.endswith('.kml'):
            src = os.path.join(root,f)
            print(time.strftime('%m/%d/%Y', time.gmtime(os.path.getmtime(src))))
            print(os.path.realpath(src))
            # full path the copy would get inside the destination folder
            dest_file = os.path.join(dest, f)
            exists = os.path.isfile(dest_file)
            if exists:
                # move the existing file out of the way before copying
                os.rename(dest_file, dest_file.replace('.kml', '-copy.kml'))
            shutil.copy2(src, dest)
Not so very quick answer;
However, this assumes that "file-copy.kml" does not already exist. If you want to keep several copies, rename them differently.
In that case, I would advise something like this:
dest_file = os.path.join(dest, f)
exists = os.path.isfile(dest_file)
if exists:
    os.rename(dest_file, dest_file.replace('.kml', '-copy_'+id_generator()+'.kml'))
Place the following somewhere at the top of your file:
import string
import random

def id_generator(size=6, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.choice(chars) for _ in range(size))
I borrowed the function that generates the random characters from this question:
Random string generation with upper case letters and digits in Python
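A deterministic alternative to a random suffix is to count upwards until a free name is found. Note that this sketch swaps the approach: it renames the incoming copy (as the question asks) rather than the file already in the destination. The helper name `copy_without_overwrite` and the `-copy`, `-copy2`, ... naming scheme are assumptions made for illustration.

```python
import os
import shutil


def copy_without_overwrite(src, dest_dir):
    """Copy src into dest_dir, appending '-copy', '-copy2', ... if the name is taken.

    Returns the path actually written. Hypothetical helper for illustration.
    """
    stem, ext = os.path.splitext(os.path.basename(src))
    candidate = os.path.join(dest_dir, stem + ext)
    counter = 1
    while os.path.exists(candidate):
        # first clash gets '-copy', later ones '-copy2', '-copy3', ...
        suffix = '-copy' if counter == 1 else f'-copy{counter}'
        candidate = os.path.join(dest_dir, stem + suffix + ext)
        counter += 1
    shutil.copy2(src, candidate)
    return candidate
```

Inside the loop from the question, `shutil.copy2(src, dest)` could then be replaced by `copy_without_overwrite(src, dest)`. The existence check and the copy are two separate steps, so this is only safe when nothing else writes to the destination folder at the same time.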
I have a directory
* workingdir
  * raw_data
    * 2001
      - a.dat
      - b.dat
      - c.dat
    * 2002
      - d.dat
      - e.dat
      - f.data
    * 2003 etc.
How can I read these dat files into separate variables?
So far:
import os # Operating system interface
import glob # For Unix style pathnames
import numpy as np

workingdir = '/home/x/workingdir/'

#Directory for all raw data files
rawdatadir = os.path.abspath(os.path.join(os.getcwd(), os.path.pardir, "raw_data"))

for root, dirs, files in os.walk(rawdatadir):
    for files in [f for f in files if f.endswith(".dat")]:
        print(os.path.join(rawdatadir, files))
But this is giving me
/home/x/workingdir/raw_data/a.dat
/home/x/workingdir/raw_data/b.dat
So,
How can I get the full path of all the files?
How can I import them (np.fromfile?)
Is there any "smarter" way to do this?
I come from an R/dataframe background and would prefer to mimic something near that.
You can get the full path by replacing os.path.join(rawdatadir, files) with os.path.join(root, files).
The root variable contains the path of the directory in which the files listed in files are located.
A correct loop, which also stores the results in a dict so you can access them by file name, would be:
results = {}
for root, dirs, files in os.walk(rawdatadir):
    for file in filter(lambda f: f.endswith('.dat'), files):
        results[file] = np.fromfile(os.path.join(root, file))
Use glob to find all files in subdirectories, then walk over the list, storing names and contents. Its recursive option allows the ** token to match any number of nested subdirectories.
from glob import iglob
import os.path
import numpy as np

workingdir = '/home/x/workingdir/'
result = {}
for f in iglob(os.path.join(workingdir, './**/*.dat'), recursive=True):
    result[f] = np.fromfile(os.path.abspath(f))
This single generator also allows us to express this in a nice Pythonic form:
files = iglob(os.path.join(workingdir, './**/*.dat'), recursive=True)
result = {f: np.fromfile(os.path.abspath(f)) for f in files}
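Since the same file name could appear under several year folders, it may help to key the results by (year folder, file stem) instead of by name alone. The sketch below is one way to do that with pathlib; `load_dat_files` and its `loader` parameter are placeholders invented here. The default loader just reads raw bytes so the sketch runs without NumPy; passing `numpy.fromfile` instead would give arrays.

```python
from pathlib import Path


def load_dat_files(raw_data_dir, loader=Path.read_bytes):
    """Return {(year_folder, file_stem): loader(path)} for every .dat file.

    `loader` is a placeholder: pass e.g. numpy.fromfile to get arrays.
    """
    data = {}
    for path in sorted(Path(raw_data_dir).rglob('*.dat')):
        # path.parent.name is the year folder, path.stem the name without '.dat'
        data[(path.parent.name, path.stem)] = loader(path)
    return data
```

With the layout from the question, the contents of `2001/a.dat` would be available as `data[('2001', 'a')]`. Keep in mind that `np.fromfile` reads raw binary and defaults to `dtype=float`, so the dtype has to match how the files were written.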
I am intermediate when it comes to Python, but when it comes to modules I struggle. I'm working on a project and I'm trying to assign a random directory or file within the current directory (any random thing within the directory) to a variable. I would like it to just choose any random thing in that directory and then assign it to a variable.
The result should be a variable that holds a random object from the working directory. Thank you.
file = (any random file in the directory)
Edit: This works too
_files = os.listdir('.')
number = random.randint(0, len(_files) - 1)
file_ = _files[number]
Thank you everyone that helped :)
Another option is to use globbing, especially if you want to choose from some files, not all files:
import random, glob
pattern = "*" # (or "*.*")
filename = random.choice(glob.glob(pattern))
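You can use:
import random
import os

directory_path = '.'  # placeholder: the directory to pick from
filename = ""
# random.choice selects a random element of the os.listdir listing;
# keep drawing until the pick is a regular file, not a directory
# (this loops forever if the directory contains no files)
while not os.path.isfile(filename):
    filename = os.path.join(directory_path, random.choice(os.listdir(directory_path)))
with open(filename, 'r') as file_obj:
    pass  # do stuff with file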
import os
import random

_files = os.listdir('.')
number = random.randint(0, len(_files) - 1)
file_ = _files[number]
Line by line:
It puts all the entries of the directory into a list
Chooses a random index between 0 and the number of entries minus 1
Assigns the entry at that index to file_
Here is an option to print and open a single random file from a directory with multiple sub-directories.
import numpy as np
import os

file_list = []
for root, dirs, files in os.walk(r"E:\Directory_with_sub_directories", topdown=False):
    for name in files:
        file_list.append(os.path.join(root, name))

# pick a random index into the list of collected paths
the_random_file = file_list[np.random.choice(len(file_list))]
print(the_random_file)
os.startfile(the_random_file)  # Windows only
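The flat and the recursive variants above can be folded into one small helper using only the standard library. This is a sketch under the assumption that only regular files should be picked; `random_file` and its parameters are names made up here, and returning `None` for a directory without files is my own choice.

```python
import random
from pathlib import Path


def random_file(directory='.', recursive=False, rng=random):
    """Return a random regular file below directory, or None if there is none."""
    base = Path(directory)
    # rglob('*') descends into sub-directories, iterdir() stays at the top level
    entries = base.rglob('*') if recursive else base.iterdir()
    files = [p for p in entries if p.is_file()]
    return rng.choice(files) if files else None
```

Passing `rng=random.Random(42)` (or any other seeded instance) makes the pick reproducible, which is handy for testing.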