Extracting specific files within directory - Windows - python

I am running a loop that needs to access about 200 files in a directory.
In the folder, the file names follow these formats:
Excel_YYYYMMDD.txt
Excel_YYYYMMDD_V2.txt
Excel_YYYYMMDD_orig.txt
I only need the first kind, Excel_YYYYMMDD.txt, and nothing else.
I am using glob.glob to access the directory, with the path specified as follows:
path = r"Z:\T\Al8787\Box\EAST\OT\ABB files\2019\*[0-9].txt"
However, the code also picks up the Excel_YYYYMMDD_orig.txt file.
I would appreciate help on how to modify the code so it extracts only the desired files.

Here is a cheap way to do it (and by cheap I mean probably not the best/cleanest method):
import glob
l = glob.glob("Excel_[0-9]*.txt")
This will get you:
>>> print(l)
['Excel_19900717_orig.txt', 'Excel_19900717_V2.txt', 'Excel_19900717.txt']
Now filter it yourself:
nl = [x for x in l if "_orig" not in x and "_V2" not in x]
This will give you:
>>> print(nl)
['Excel_19900717.txt']
We filter the glob result manually because the glob module does not support regular expressions; it only understands shell-style wildcards such as *, ? and [...].
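For this particular naming scheme the extra filtering step may not be needed: glob's [0-9] character class matches exactly one digit, so repeating it eight times pins the date part to exactly eight digits and excludes the _V2 and _orig variants. A minimal sketch, assuming the files sit directly inside the folder you pass in (excel_date_files and folder are hypothetical names):

```python
import glob
import os

def excel_date_files(folder):
    """Return Excel_YYYYMMDD.txt paths in folder, skipping _V2/_orig variants."""
    # "[0-9]" matches exactly one digit, so eight of them match YYYYMMDD only.
    pattern = "Excel_" + "[0-9]" * 8 + ".txt"
    # glob.escape protects folder names that contain *, ? or [ characters.
    return sorted(glob.glob(os.path.join(glob.escape(folder), pattern)))
```

On Windows you would pass the folder as a raw string, e.g. excel_date_files(r"Z:\T\Al8787\Box\EAST\OT\ABB files\2019").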

Use ^Excel_[0-9]{8}\.txt$ as the file-matching regex. The trailing $ anchor also keeps names such as Excel_20190717.txt.bak from matching.
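Since glob itself cannot take that regex, one way to apply it is to list the folder and keep only the names the pattern matches in full. A minimal sketch; re.fullmatch is used so the ^ and $ anchors are implied, and matching_files and folder are hypothetical names:

```python
import os
import re

# Exactly "Excel_", eight digits and ".txt"; fullmatch requires the whole name to match.
EXCEL_NAME = re.compile(r"Excel_[0-9]{8}\.txt")

def matching_files(folder):
    """Return full paths of regular files in folder whose names match EXCEL_NAME."""
    paths = []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if EXCEL_NAME.fullmatch(name) and os.path.isfile(path):
            paths.append(path)
    return sorted(paths)
```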

Related

Deleting the useless output files using Python

After I run a Python script from a particular directory, I get many output files, but apart from 5-6 of them I want to delete the rest. So far I have put those 5-6 useful files in a list and deleted every other file in the directory that is not in that list. Below is my code:
list1 = ['prog_1.py', 'prog_2.py', 'prog_3.py']  # Extend
import os
dir = '/home/dev/codes'  # Change accordingly
for f in os.listdir(dir):
    if f not in list1:
        os.remove(os.path.join(dir, f))
Now I want to add one more condition: if an output file's name starts with output_of_final, I don't want it to be deleted either. How can I do that? Should I use a regex?
You could use a regex, but that's overkill here. Just use the str.startswith method.
Also, it's bad practice to shadow built-in functions and types with variable names; dir is a built-in function, so I have renamed the variable to directory. (https://docs.python.org/3/library/functions.html#dir)
import os
list1 = ['prog_1.py', 'prog_2.py', 'prog_3.py']  # Extend
directory = '/home/dev/codes'  # Change accordingly
for f in os.listdir(directory):
    if f not in list1 and not f.startswith('output_of_final'):
        os.remove(os.path.join(directory, f))
Yes, a regex would work here, but there are simpler options, such as the str.startswith method:
list1 = ['prog_1.py', 'prog_2.py', 'prog_3.py']  # Extend
import os
dir = '/home/dev/codes'  # Change accordingly
for f in os.listdir(dir):
    if (f not in list1) and (not f.startswith('output_of_final')):
        os.remove(os.path.join(dir, f))
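Two refinements may be worth considering, sketched here under our own assumptions: str.startswith accepts a tuple, so several protected prefixes can be checked in one call, and os.listdir also returns subdirectory names, which os.remove cannot delete. The function name clean_directory and its dry_run flag are hypothetical:

```python
import os

def clean_directory(directory, keep, keep_prefixes=("output_of_final",), dry_run=True):
    """Delete regular files in directory unless they are listed in keep or
    start with one of keep_prefixes. With dry_run=True nothing is deleted;
    the names that would be removed are only returned."""
    removed = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories; os.remove would fail on them
        if name in keep or name.startswith(tuple(keep_prefixes)):
            continue  # protected file
        if not dry_run:
            os.remove(path)
        removed.append(name)
    return sorted(removed)
```

Defaulting to a dry run lets you check the list first, e.g. print(clean_directory('/home/dev/codes', {'prog_1.py', 'prog_2.py', 'prog_3.py'})), before calling it again with dry_run=False.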

Find file in a directory with python and if multiple files show up matching decide which to open

Basically what the title says: what is the best approach to do this?
I was looking at a few tools like os.walk and scandir, but I am not sure how I would store the results and decide which file to open if there are multiple matches. I was thinking I would need to store them in a dictionary and then pick the numbered item I want.
You can use
list_of_files = os.listdir(some_directory)
which returns a list of the names of the entries (files and subdirectories) in that directory. You can then easily add some of these names to a dictionary keyed by their index in this list.
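Building that index-keyed dictionary can be a one-liner with enumerate. A small sketch (indexed_listing is a hypothetical name); the sorted call is our addition, because os.listdir returns names in arbitrary order and the numbers would otherwise not be predictable:

```python
import os

def indexed_listing(some_directory):
    """Map 0, 1, 2, ... to the entry names in some_directory, sorted by name."""
    return dict(enumerate(sorted(os.listdir(some_directory))))
```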
Here is a function that implements the specifications you have outlined. It may require some tinkering as your specs evolve, but it's an OK start. See the docs for the standard-library os module for more info :)
import os
def my_files_dict(directory, filename):
    matches = []
    with os.scandir(directory) as myfiles:
        for f in myfiles:
            # is_file is a method; without the parentheses the test is always true
            if f.name == filename and f.is_file():
                matches.append(f.path)  # full path, so the chosen file can be opened directly
    return dict(enumerate(matches))
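Because a single directory cannot hold two entries with the same name, multiple matches only show up when subdirectories are searched too. A sketch of that idea with os.walk, assuming an exact file-name match is wanted; find_all and choose are hypothetical names, and choose simply asks at the console when there is more than one match:

```python
import os

def find_all(root, filename):
    """Return {index: path} for every file named filename under root."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            matches.append(os.path.join(dirpath, filename))
    return dict(enumerate(sorted(matches)))

def choose(matches):
    """Return the only match, or ask the user which numbered match to open."""
    if not matches:
        raise FileNotFoundError("no matching file")
    if len(matches) == 1:
        return matches[0]
    for index, path in matches.items():
        print(index, path)
    return matches[int(input("Which file? "))]
```

Typical use would be path = choose(find_all(".", "data.txt")) followed by open(path).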

How can I read files with similar names on python, rename them and then work with them?

I've already posted here with the same question, but sadly I couldn't come up with a solution (some of you gave me awesome answers, but most of them weren't what I was looking for), so I'll try again, this time giving more information about what I'm trying to do.
So, I'm using a program called GMAT to get some outputs (.txt files with numerical values). These outputs have different names, but because I'm using them for more than one thing, I'm getting something like this:
GMATd_1.txt
GMATd_2.txt
GMATf_1.txt
GMATf_2.txt
Now I need to use these outputs as inputs in my code. I need to work with them in other functions of my script, and since I will have a lot of these .txt files, I want to rename them because I don't want to refer to them as './path/etc'.
So what I wanted was a loop that gets these files and renames them inside the script, so that I can use the new names in other functions (outside the loop).
So instead of having to do this individually:
GMATds1= './path/GMATd_1.txt'
GMATds2= './path/GMATd_2.txt'
I wanted to write a loop that would do that for me.
I've already tried using a dictionary:
import os
import fnmatch
examples = {}
for filename in os.listdir('.'):
    if fnmatch.fnmatch(filename, 'thing*.txt'):
        examples[filename[:6]] = filename
This does work but I can't use the dictionary key outside the loop.
If I understand correctly, you try to fetch files with similar names (at least a re-occurring pattern) and rename them. This can be accomplished with the following code:
import glob
import os
all_files = glob.glob('path/to/directory/with/files/GMAT*.txt')
for file in all_files:
    new_path = create_new_path(file)  # possibly split the file name, change directory and/or filename
    os.rename(file, new_path)
The glob library lets you search for files with * wildcards, which makes it possible to find files that follow a specific pattern. It lists all matching files in a certain directory (or in multiple directories if you include a * wildcard as a directory). When you iterate over the files, you could either work with the contents of the files directly (as you apparently intend to do) or rename them as shown in this snippet. To rename them, you need to generate a new path, so you would have to write the create_new_path function that takes the old path and creates a new one.
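What create_new_path should do depends on the naming you want. As a purely hypothetical example, here is a version that turns GMATd_1.txt into GMATds1.txt (the style of the GMATds1 variable in the question) in the same folder and leaves other names unchanged:

```python
import os
import re

# Hypothetical rule: "GMAT<letter>_<number>.txt" -> "GMAT<letter>s<number>.txt"
_GMAT_NAME = re.compile(r"(GMAT[a-z])_(\d+)\.txt")

def create_new_path(old_path):
    """Return the path old_path would get under the renaming rule above."""
    folder, name = os.path.split(old_path)
    match = _GMAT_NAME.fullmatch(name)
    if match is None:
        return old_path  # not a GMAT output file; keep the name
    new_name = f"{match.group(1)}s{match.group(2)}.txt"
    return os.path.join(folder, new_name)
```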
Since Python 3.4, the standard library's pathlib module is usually a better choice than os or glob for this.
from pathlib import Path
import shutil
for file_src in Path("path/to/files").glob("GMAT*.txt"):
    # change only the file name (e.g. GMATd_1.txt -> GMATds1.txt), not the folders above it
    file_dest = file_src.with_name(file_src.name.replace("d_", "ds"))
    shutil.move(file_src, file_dest)
You can use:
import os
path = '.....'  # path where these files are located
path1 = '.....'  # path where you want to store these files
i = 1
for file in os.listdir(path):
    if file.endswith('.txt'):
        os.rename(os.path.join(path, file), os.path.join(path1, str(i) + ".txt"))
        i += 1
This renames every .txt file in the source folder to 1.txt, 2.txt, ..., n.txt in the destination folder. Note that os.listdir returns names in arbitrary order, so which file gets which number is not guaranteed.
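If the real goal is only to refer to the files by short names elsewhere in the script, renaming on disk may not be needed: a dictionary built once can map each file's stem to its path and be passed to other functions. A sketch with pathlib, assuming the stem (the name without .txt) is a good enough key; gmat_outputs and first_line are hypothetical names:

```python
from pathlib import Path

def gmat_outputs(folder):
    """Map stems like 'GMATd_1' to Path objects for every GMAT*.txt in folder."""
    return {path.stem: path for path in sorted(Path(folder).glob("GMAT*.txt"))}

def first_line(path):
    """Example consumer: read the first line of one output file."""
    with open(path) as handle:
        return handle.readline().strip()
```

For example, outputs = gmat_outputs("./path") once at the top of the script, then first_line(outputs["GMATd_1"]) wherever that file is needed.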

Use part of non static file name to create new directory

I'm new to python and trying to figure some stuff out.
I'm already learning to use the shutil.copy and shutil.move functions, as well as scanning for files with glob. However, I have a few questions about a scenario I'm facing:
Find a file that gets deposited into the same directory every day, but where half of the file name changes every day, and use it to make a destination folder or zip it up with zipfile.
Example:
File X110616.Filename_110416.txt comes in today.
Tomorrow it will be X110616.Filename_110423.txt.
Since half or part of the name changes everyday, how do I cut/save a specific part of the string for a function or module to create a destination folder, or a zip file?
I can use the glob module to scan for a file with wildcards, and I've tried using rstrip(), but that only seems to remove characters from the end of the string, not from the beginning or the middle.
I'm also not sure how to save the values it finds and use them elsewhere to create directories or zip files. Bottom line: I know how to tell the script to look for non-static characters in a string, but I'm not sure which direction to take when using or saving those characters for other things:
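import glob
import os
# Python does not expand %username% by itself; os.path.expandvars does (on Windows)
pattern = os.path.expandvars(r"C:\Users\%USERNAME%\Documents\Test_Files\X??????.Filename_??????.txt")
for f in glob.glob(pattern):
    Newdir = f
    print(Newdir)
    # or use it to make a directory, or a zip file...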
This will find me the file with any ending, however I can't seem to understand how to save the file's name or path (whatever it may be).
To get a substring in Python, you use slicing:
>>> a = "Hello World"
>>> a[0:5]
'Hello'
str.split is also very powerful.
>>> a.split(" ")
['Hello', 'World']
I will often solve problems like you describe with a combination of the two. But for really tricky parsing problems there are regular expressions.
>>> b = "whatsit__blah.foo"
>>> import re
>>> result = re.search(r"(?P<first>[a-z]+)__(?P<second>[a-z]+)\.(?P<ext>[a-z]+)", b)
>>> result.groups()
('whatsit', 'blah', 'foo')
>>> result.groupdict()
{'first': 'whatsit', 'second': 'blah', 'ext': 'foo'}
>>> result.group("first")
'whatsit'
>>> result.group("second")
'blah'
>>> result.group("ext")
'foo'
As you can see there is a lot to regular expressions. Because of the added complexity I avoid them unless I have a very complex problem.
Two more functions you may find useful: os.path.split(), which splits a path into the directory part and the file name, and os.path.splitext(), which splits a path at the last "." into the remainder of the path and the extension.
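Applied to the file name in the question, the regex and the split-based approaches might look like this. It is a sketch that assumes names always look like X110616.Filename_110416.txt (X plus six digits, a dot, a middle part without underscores, an underscore, six digits, .txt); parts_by_regex and parts_by_splitting are hypothetical names:

```python
import os
import re

# Assumed layout: "X" + 6 digits, ".", middle part, "_", 6 digits, ".txt"
NAME_RE = re.compile(r"(?P<prefix>X\d{6})\.(?P<middle>[^_]+)_(?P<suffix>\d{6})\.txt")

def parts_by_regex(path):
    """Return (prefix, middle, suffix), or None if the name does not match."""
    match = NAME_RE.fullmatch(os.path.basename(path))
    if match is None:
        return None
    return match.group("prefix"), match.group("middle"), match.group("suffix")

def parts_by_splitting(path):
    """Same result using os.path.splitext and str.split/rsplit only."""
    stem, _ext = os.path.splitext(os.path.basename(path))  # drop ".txt"
    prefix, rest = stem.split(".", 1)     # "X110616", "Filename_110416"
    middle, suffix = rest.rsplit("_", 1)  # "Filename", "110416"
    return prefix, middle, suffix
```

The changing part can then name a folder, e.g. os.makedirs(os.path.join(dest_root, suffix), exist_ok=True), where dest_root is whatever destination you choose.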
So here's what I ended up doing, and it seems to have worked. It made a folder for each matching file found in a specific directory, using only a specific part of the file name to name the folder.
import os
destdir = os.path.expandvars(r"C:\Users\%USERNAME%\Documents\Test_Files\test")
srcpath = r"C:\download"
for z in os.listdir(srcpath):
    if z.endswith("FILE.FILENAME.ZIP"):
        newdir = os.path.join(destdir, z[0:7])  # folder named after the first 7 characters
        os.makedirs(newdir, exist_ok=True)  # no error if the folder already exists
        print(newdir)
I added print at the end to show what it created.

Glob search files in date order?

I have this line of code in my Python script. It searches a particular directory for files matching *cycle*.log.
for searchedfile in glob.glob("*cycle*.log"):
This works perfectly; however, when I run my script on a network location, it does not go through the files in order but in what looks like random order.
Is there a way to force the code to process them in date order?
This question has been asked for PHP, but I am not sure of the differences.
Thanks
To sort files by date:
import glob
import os
files = glob.glob("*cycle*.log")
files.sort(key=os.path.getmtime)
print("\n".join(files))
See also Sorting HOW TO.
Essentially the same as @jfs's answer, but in one line using sorted:
import glob
import os
searchedfiles = sorted(glob.glob("*cycle*.log"), key=os.path.getmtime)
Well, the answer is no. glob relies on os.listdir (os.scandir in Python 3.6 and later), which is documented as follows:
"Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order. It does not include the special entries '.' and '..' even if they are present in the directory."
So you were actually lucky that you got them sorted. You need to sort them yourself.
This works for me:
import glob
import os
import time
searchedfile = glob.glob("*.cpp")
files = sorted(searchedfile, key=os.path.getctime)
for file in files:
    print("{} - {}".format(file, time.ctime(os.path.getctime(file))))
Also note that this uses os.path.getctime, which is the creation time on Windows but the time of the last metadata change on Unix. To sort by modification time, use getmtime instead.
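The same sort can be written with pathlib. In this sketch st_mtime is the modification time, and adding the file name to the sort key gives files with identical timestamps a predictable order; the function name logs_by_mtime and its newest_first flag are our own:

```python
from pathlib import Path

def logs_by_mtime(folder, pattern="*cycle*.log", newest_first=False):
    """Return files in folder matching pattern, ordered by modification time.
    Files with equal timestamps are ordered by name."""
    paths = Path(folder).glob(pattern)
    return sorted(paths, key=lambda p: (p.stat().st_mtime, p.name), reverse=newest_first)
```

Looping with for searchedfile in logs_by_mtime("."): then processes the oldest log first.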
If your paths are in sortable order then you can always sort them as strings (as others have already mentioned in their answers).
However, if your paths use a datetime format like %d.%m.%Y, it becomes a bit more involved. Since strptime does not support wildcards, we developed a module, datetime-glob, to parse the date/times from paths that include wildcards.
Using datetime-glob, you could walk through the tree, list a directory, parse the date/times and sort them as tuples (date/time, path).
From the module's test cases:
import pathlib
import tempfile
import datetime_glob
def test_sort_listdir(self):
    with tempfile.TemporaryDirectory() as tempdir:
        pth = pathlib.Path(tempdir)
        (pth / 'some-description-20.3.2016.txt').write_text('tested')
        (pth / 'other-description-7.4.2016.txt').write_text('tested')
        (pth / 'yet-another-description-1.1.2016.txt').write_text('tested')
        matcher = datetime_glob.Matcher(pattern='*%-d.%-m.%Y.txt')
        subpths_matches = [(subpth, matcher.match(subpth.name)) for subpth in pth.iterdir()]
        dtimes_subpths = [(mtch.as_datetime(), subpth) for subpth, mtch in subpths_matches]
        subpths = [subpth for _, subpth in sorted(dtimes_subpths)]
        # yapf: disable
        expected = [
            pth / 'yet-another-description-1.1.2016.txt',
            pth / 'some-description-20.3.2016.txt',
            pth / 'other-description-7.4.2016.txt'
        ]
        # yapf: enable
        self.assertListEqual(subpths, expected)
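If you would rather not add a dependency, a standard-library version of the same sort is possible when the date format is fixed and known in advance. This sketch assumes the date is the day.month.year group right before .txt (as in the test above) and parses it with datetime.strptime; it is not how datetime-glob works internally, and date_from_name and sort_by_name_date are hypothetical names:

```python
import re
from datetime import datetime

# Assumption: names end in "<day>.<month>.<year>.txt", e.g. "...-20.3.2016.txt"
DATE_RE = re.compile(r"(\d{1,2}\.\d{1,2}\.\d{4})\.txt$")

def date_from_name(name):
    """Parse the trailing day.month.year date out of a file name."""
    match = DATE_RE.search(name)
    if match is None:
        raise ValueError(f"no date in {name!r}")
    # strptime accepts day and month numbers without a leading zero
    return datetime.strptime(match.group(1), "%d.%m.%Y")

def sort_by_name_date(names):
    """Sort file names by the date embedded in them, oldest first."""
    return sorted(names, key=date_from_name)
```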
Not with glob alone. As you're using it, glob returns all the matching files at once and has no option for ordering them. If only the final result matters, you could add a second pass that checks each file's date and re-sorts the list based on that. If the processing order matters, glob on its own is probably not the best way to do this.
You can sort the list of files that come back using os.path.getmtime or os.path.getctime. See this other SO answer and note the comments as well.
