Use part of non static file name to create new directory - python

I'm new to python and trying to figure some stuff out.
I'm already learning to use the shutil.copy, .move functions as well as scanning files with glob. However I have a few questions on a scenario I'm facing:
Find a file that gets deposited to the same directory everyday, but in which half the file name changes everyday,and use it to make a destination folder, or zip it up with zipfile.
Example:
File X110616.Filename_110416.txt comes in today.
Tomorrow it will be X110616.Filename_110423.txt.
Since half or part of the name changes everyday, how do I cut/save a specific part of the string for a function or module to create a destination folder, or a zip file?
I can use the glob module to scan for a file with wild card variables, and I've tried using the rstrip(), but that only seems to remove the last half and not the beginning or center of the string.
Also not sure how to save the variables it finds and use it else where to create directories or zip files. Bottom line is I know how to tell the script to look for non-static characters in string but not sure what direction to take it in when using/saving those characters for other things:
import glob
for f in glob.glob("C:\\users\%username%\\Documents\\Test_Files\\X??????.Filename_??????.txt"):
Newdir = f
print(Newdir)
#or use to make a directory, or zip file...
This will find me the file with any ending, however I can't seem to understand how to save the file's name or path (whatever it may be).

To get a substring in python you use the slice operator.
>>> a = "Hello World"
>>> a[0:5]
'Hello'
str.split is also very powerful.
>>> a.split(" ")
['Hello', 'World']
I will often solve problems like you describe with a combination of the two. But for really tricky parsing problems there are regular expressions.
>>> b = "whatsit__blah.foo"
>>> import re
>>> result = re.search("(?P<first>[a-z]+)__(?P<second>[a-z]+).(?P<ext>[a-z]+)", b)
>>> result.groups()
('whatsit', 'blah', 'foo')
>>> result.groups("first")
('whatsit', 'blah', 'foo')
>>> result.group("first")
'whatsit'
>>> result.group("second")
'blah'
>>> result.group("ext")
'foo'
As you can see there is a lot to regular expressions. Because of the added complexity I avoid them unless I have a very complex problem.
Two more functions you may find useful. The os.path module has .split(), which will split a path into the base directory and the filename and .splitext(), which will split a path on the last ".", return the extension and the remainder of the path.

So here's what I ended up doing, and it seemed to have worked. It made a folder for each file that was found in a specific directory, while only using a specific part of the file name to make a folder reflecting the filename.
destdir = "C:\\Users\\%USERNAME%\\Documents\\Test_Files\\test\\"
srcpath = "C:\\download\\"
for z in os.listdir("C:\\download"):
if z.endswith("FILE.FILENAME.ZIP"):
os.mkdir(destdir + z[0:7])
newdir = destdir + z[0:7]
print(newdir)
I added print at the end to show what it created.

Related

Deleting files based on day within filename

I have a directory with files like: data_Mon_15-8-22.csv, data_Tue_16-8-22.csv, data_Mon_22-8-22.csv etc and I am trying to delete all but the Monday files. However, my script doesn't seem to differentiate between the filenames and just deletes everything despite me stating it. Where did I go wrong? Any help would be much appreciated!
My Code:
def file_delete():
directory = pathlib.Path('/Path/To/Data')
for file in directory.glob('data_*.csv'):
if file != 'data_Mon_*.csv':
os.remove(file)]
if all Monday files start with "data_Mon_" then you might use str.startswith:
def file_delete():
directory = pathlib.Path('/Path/To/Data')
for file in directory.glob('data_*.csv'):
if not file.name.startswith('data_Mon_'):
os.remove(file)
if file != 'data_Mon_*.csv'
There's two problems here:
file is compared against the string 'data_Mon_*.csv'. Since file isn't a string, these two objects will never be equal. So the if condition will always be true. To fix this, you need to get the file's name, rather than using the file object directly.
Even if you fix this, the string 'data_Mon_*.csv' is literal. In other words, the * is a *. Unlike directory.glob('data_*.csv'), this will only match a * rather than match "anything" as in a glob expression. In order to fix this, you need to use a regular expression to match against your file name.

Extracting specific files within directory - Windows

I am running a loop which needs to access circa 200 files in the directory.
In the folder - the format of the files range as follows:
Excel_YYYYMMDD.txt
Excel_YYYYMMDD_V2.txt
Excel_YYYYMMDD_orig.txt
I only need to extract the first one - that is YYYYMMDD.txt, and nothing else
I am using glob.glob to access the directory where i specified my path name as follows:
path = "Z:\T\Al8787\Box\EAST\OT\\ABB files/2019/*[0-9].txt"
However the code also extracts the .Excel_YYYYMMDD_orig.txt file too
Appreciate assistance on how to modify code to only extract desired files.
Here is a cheap way to do it (and by cheap I mean probably not the best/cleanest method):
import glob
l = glob.glob("Excel_[0-9]*.txt")
This will get you:
>>> print(l)
['Excel_19900717_orig.txt', 'Excel_19900717_V2.txt', 'Excel_19900717.txt']
Now filter it yourself:
nl = [x for x in l if "_orig" not in x and "_V2" not in x]
This will give you:
>>> print(nl)
['Excel_19900717.txt']
The reason for manually filtering through our glob is because the glob library does not support regex.
Use ^Excel_[0-9]{8}\.txt as the file matching regex.

os.path.basename() is inconsistent and I'm not sure why

While creating a program that backs up my files, I found that os.path.basename() was not working consistently. For example:
import os
folder = '\\\\server\\studies\\backup\\backup_files'
os.path.basename(folder)
returns 'backup_files'
folder = '\\\\server\\studies'
os.path.basename(folder)
returns ''
I want that second basename function to return 'studies' but it returns an empty string. I ran os.path.split(folder) to see how it's splitting the string and it turns out it's considering the entire path to be the directory, i.e. ('\\\\server\\studies', ' ').
I can't figure out how to get around it.. The weirdest thing is I ran the same line earlier and it worked, but it won't anymore! Does it have something to do with the very first part being a shared folder on the network drive?
that looks like a Windows UNC specificity
UNC paths can be seen as equivalent of unix path, only with double backslashes at the start.
A workaround would be to use classical rsplit:
>>> r"\\server\studies".rsplit(os.sep,1)[-1]
'studies'
Fun fact: with 3 paths it works properly:
>>> os.path.basename(r"\\a\b\c")
'c'
Now why this? let's check the source code of ntpath on windows:
def basename(p):
"""Returns the final component of a pathname"""
return split(p)[1]
okay now split
def split(p):
seps = _get_bothseps(p)
d, p = splitdrive(p)
now splitdrive
def splitdrive(p):
"""Split a pathname into drive/UNC sharepoint and relative path specifiers.
Returns a 2-tuple (drive_or_unc, path); either part may be empty.
Just reading the documentation makes us understand what's going on.
A Windows sharepoint has to contain 2 path parts:
\\server\shareroot
So \\server\studies is seen as the drive, and the path is empty. Doesn't happen when there are 3 parts in the path.
Note that it's not a bug, since it's not possible to use \\server like a normal directory, create dirs below, etc...
Note that the official documentation for os.path.basename doesn't mention that (because os.path calls ntpath behind the scenes) but it states:
Return the base name of pathname path. This is the second element of the pair returned by passing path to the function split(). Note that the result of this function is different from the Unix basename program
That last emphasised part at least is true! (and the documentation for os.path.split() doesn't mention that issue or even talks about windows)

Using input to change directory path

I'm kinda new to python and I feel like the answer to this is so simple but I have no idea what the answer is. I'm trying to move files from one place to another but I don't want to have to change my code every time I wanna move that file so I just want to get user input from the terminal.
import shutil
loop = True
while loop:
a = input()
shutil.move("/home/Path/a", "/home/Path/Pictures")
What do I have to put around the a so that it doesn't read it as part of the string?
This should do what you want. the os.path.join() will combine the string value in a, that you get from input with the first part of the path you have provided. You should use os.path.join() as this will form paths in a way that is system independent.
import shutil
import os
loop = True
while loop:
a = input()
shutil.move(os.path.join("/home/Path/", a), "/home/Path/Pictures")
Output:
>>> a = input()
test.txt
>>> path = os.path.join("/home/Path/", a)
>>> path
'/home/Path/test.txt'
You can also use "/home/Path/{0}".format(a) which will swap the value of a with {0}, or you can do do "/home/Path/{0}" + str(a) which will also do what you want.
Edited to account for Question in comment:
This will work if your directory doesn't have any sub-directories. it may still work if there are directories and files in there but I didn't test that.
import shutil
import os
files = os.listdir("/home/Path/")
for file in files:
shutil.move(os.path.join("/home/Path/", file), "/home/Path/Pictures")
one solution
a = 'test.csv'
path = '/home/Path/{}'.format(a)
>>> path
/home/Path/test.csv

Finding File Path in Python

I'd like to find the full path of any given file, but when I tried to use
os.path.abspath("file")
it would only give me the file location as being in the directory where the program is running. Does anyone know why this is or how I can get the true path of the file?
What you are looking to accomplish here is ultimately a search on your filesystem. This does not work out too well, because it is extremely likely you might have multiple files of the same name, so you aren't going to know with certainty whether the first match you get is in fact the file that you want.
I will give you an example of how you can start yourself off with something simple that will allow you traverse through directories to be able to search.
You will have to give some kind of base path to be able to initiate the search that has to be made for the path where this file resides. Keep in mind that the more broad you are, the more expensive your searching is going to be.
You can do this with the os.walk method.
Here is a simple example of using os.walk. What this does is collect all your file paths with matching filenames
Using os.walk
from os import walk
from os.path import join
d = 'some_file.txt'
paths = []
for i in walk('/some/base_path'):
if d in i[2]:
paths.append(join(i[0], d))
So, for each iteration over os.walk you are going to get a tuple that holds:
(path, directories, files)
So that is why I am checking against location i[2] to look at files. Then I join with i[0], which is the path, to put together the full filepath name.
Finally, you can actually put the above code all in to one line and do:
paths = [join(i[0], d) for i in walk('/some/base_path') if d in i[2]]

Categories

Resources