I have a huge database of files whose names are like:
XYZ-ABC-K09235D1-20151220-5H1E2H4A.txt
XYZ-ABC-W8D2S5G5-20151225-HG2EK4GE.txt
XYZ-ABC-ME2C5K32-20160206-DD8BA4R6.txt
etc...
All names have the same structure:
'XYZ-ABC-' + 8 random chars + '-' + '%Y%m%d' + '-' + 8 random chars + '.txt'
Now I need to open a file given its date. The problem is that I don't know the exact name of the file, because it contains random characters. For instance, for the date 12/05/2014 (12 May 2014) I know the filename will be something like
XYZ-ABC-????????-20140512-????????.txt
but I don't know the exact name to pass to open(). What would be the best way to do this? (I thought about first creating a list of all filenames, but I don't know whether that is a good technique or whether it would be better to use something like glob.) Thank you in advance.
You can use the following code, replacing "prefix" and 'otherstring' with the known parts of the filename:
import os
fileName = [filename for filename in os.listdir('.') if filename.startswith("prefix") and 'otherstring' in filename]
Hope this helps !
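Since the date is the only part that is known, a glob pattern with one ? per random character is another way to do the lookup. This is a minimal sketch, assuming all files sit in a single directory; the function name find_by_date and its directory parameter are made up for illustration.

```python
import datetime as dt
import glob
import os


def find_by_date(directory, date):
    """Return all files in `directory` whose name embeds `date` as YYYYMMDD."""
    # Eight '?' wildcards stand in for each block of random characters.
    pattern = "XYZ-ABC-{}-{}-{}.txt".format("?" * 8, date.strftime("%Y%m%d"), "?" * 8)
    # glob.escape() protects any wildcard characters in the directory name itself.
    return sorted(glob.glob(os.path.join(glob.escape(directory), pattern)))
```

A call such as find_by_date('.', dt.date(2014, 5, 12)) returns a list of paths that can be passed to open(). The list is empty when no file exists for that day and may hold more than one entry when several do.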
Related
I have lot of xml files which are named like:
First_ExampleXML_Only_This_Should_Be_Name_20211234567+1234565.xml
Second_ExampleXML_OnlyThisShouldBeName_202156789+55684894.xml
Third_ExampleXML_Only_This_Should_Be_Name1_2021445678+6963696.xml
Fourth_ExampleXML_Only_This_Should_Be_Name2_20214567+696656.xml
I have to make a script that will go through all of the files and rename them, so only this is left from the example:
Only_This_Should_Be_Name.xml
OnlyThisShouldBeName.xml
Only_This_Should_Be_Name1.xml
Only_This_Should_Be_Name2.xml
At the moment I have something like this, but I'm really struggling to get exactly what I need. I guess I have to take everything between the second _ and the _202 part.
from os import listdir
fnames = listdir('.')
for fname in fnames:
    # replace .xml with type of file you want this to have impact on
    if fname.endswith('.xml'):
        pass  # the renaming logic is what I'm missing
Does anyone have an idea what the best approach would be?
You can split each .xml filename on underscores, drop the first two pieces and the last one, and rename the file to the remaining pieces joined back together, as below.
import os
fnames = os.listdir('.')
for fname in fnames:
    # replace .xml with type of file you want this to have impact on
    if fname.endswith('.xml'):
        newName = '_'.join(fname.split("_")[2:-1])
        os.rename(fname, newName+".xml")
    else:
        continue
Here you are dropping the two fields before the wanted name and the trailing number field after the last "_".
There are two problems here:
Finding files of one kind in the directory
Whilst listdir will work, you might as well glob them:
from pathlib import Path
for fn in Path("/path").glob("*.xml"):
    ...
Renaming files
In this case your files are named like "file_name_NUMBERS.xml" and we want to strip the numbers out, so we'll use a regex. (Edit: this is not the best way in this case. Just split and join as in the other answer.)
import re
from pathlib import Path
for fn in Path("dir").glob("*.xml"):
    new_name = re.search(r"(.*?)_[0-9]+", fn.stem).group(1)
    fn.rename(fn.with_name(new_name + ".xml"))
Edit: I don't know why I overcomplicated things. I'll leave the re solution there for more difficult cases, but in this case you can just do:
new_name = "_".join(fn.stem.split("_")[:-1])
which is far better, as it doesn't depend on the precise naming of the files.
Note that you can do all this without pathlib, but you asked for the best way ;)
Lastly, to answer an implicit question, nothing stops you wrapping all this in a function and passing an argument to glob for different types of files.
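Wrapping the split-and-join version in a function could look like the sketch below. The name strip_last_field, the default pattern, and the two safety checks (skipping stems without an underscore and skipping targets that already exist) are assumptions added for illustration, not part of the answer above.

```python
from pathlib import Path


def strip_last_field(directory, pattern="*.xml"):
    """Rename files matching `pattern`, dropping the last '_' field of the stem."""
    renamed = []
    # sorted() materialises the listing first, so renames do not disturb iteration.
    for fn in sorted(Path(directory).glob(pattern)):
        parts = fn.stem.split("_")
        if len(parts) < 2:
            continue  # no trailing field to strip
        target = fn.with_name("_".join(parts[:-1]) + fn.suffix)
        if target.exists():
            continue  # never overwrite an existing file
        fn.rename(target)
        renamed.append(target.name)
    return renamed
```

Calling strip_last_field("dir", "*.txt") would then apply the same renaming to text files instead.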
I think regex will be the simplest approach here, which in python can be accomplished with the re module.
import os
import re
fnames = os.listdir('.')
for fname in fnames:
    result = re.sub(r"^.*?_ExampleXML_(.*?)_[\d+]+\.xml$", r"\1.xml", fname)
    if result != fname:
        os.rename(fname, result)
There are several pattern matching strategies you could employ, depending on your use case.
For instance you could try variants like the following, depending on how specific/general you need to be:
^.*?_ExampleXML_(.*?)_\d+\.xml$ (https://regex101.com/r/hYOLMF/1)
^.*?_ExampleXML_(.*?)_2021\d+\.xml$ (https://regex101.com/r/UzEsbO/1)
^.*?_ExampleXML_(.*?)_[^_]+\.xml$ (https://regex101.com/r/lKzYhq/1)
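Before renaming anything, it may help to check a pattern against a list of names without touching the disk. The sketch below uses the pattern from the code above; the helper name preview is hypothetical.

```python
import re

# Same pattern as in the renaming loop above.
PATTERN = re.compile(r"^.*?_ExampleXML_(.*?)_[\d+]+\.xml$")


def preview(names):
    """Map each matching name to its new name without renaming anything."""
    return {name: PATTERN.sub(r"\1.xml", name) for name in names if PATTERN.match(name)}
```

Passing os.listdir('.') to preview shows which files would be renamed and to what; names that do not match are simply left out of the result.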
I'm trying to figure out the best way to store a dynamic or changing value in a variable and use the variable as part of my pattern search in fnmatch. Quite possibly fnmatch is not the right way to go?
Trying to keep this as simple as possible. I'm reading in a list of files from a directory which will have a date string that changes from day to day. I want to verify the file I am looking for exists, and for now just print the filename.
This works ...
#!/bin/python
import os
import datetime as dt
import fnmatch
working_dir = '/my/working/dir/'
now = dt.date.today()
f_date = now.strftime('%Y%m%d')
print(f_date)
for root, dirs, files in os.walk(working_dir):
    for fname in files:
        if fnmatch.fnmatch(fname, '*data*20190923*'):
            print(fname)
exit(0)
What I see is the file I would like to evaluate further on:
20190923
file-data-random_junk.20190923.txt
However, what I'd like to do in the pattern line is use f_date, which holds the string 20190923, instead of typing in the date string. Is it possible to match on a combination of text and variable in the pattern string, so that I could do something like: if fnmatch.fnmatch(fname, '*data*[my variable]*'): ?
OK, I think I may have answered my own question. Leaving this for posterity and in case it helps anyone. All I did was alter the fnmatch line to: if fnmatch.fnmatch(fname, '*data*' + f_date + '*'): which gets me the result I wanted.
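The same idea can be written as a small helper that builds the pattern from a date and filters a list of names in one call. This is only a sketch; the function name matches_for_date is made up, while fnmatch.filter is part of the standard library.

```python
import datetime as dt
import fnmatch


def matches_for_date(filenames, date):
    """Return the names containing 'data' followed by `date` as YYYYMMDD."""
    # The pattern is an ordinary string, so it can be built from any variable.
    pattern = "*data*{}*".format(date.strftime("%Y%m%d"))
    return fnmatch.filter(filenames, pattern)
```

Inside the os.walk loop, matches_for_date(files, dt.date.today()) would replace the inner for/if pair.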
I am trying to write a Python script in Spyder to deal with several files at the same time.
Actual path is something like:
/TestCondition/TestDate-A1.txt
I want to control the TestCondition and TestDate only at the beginning.
FoldPath = TestCondition
FileName = TestDate
I want to do something like:
dfA1 = pd.read_csv(FoldPath&'/'Filename&'A1.txt'
dfA2 = pd.read_csv(FoldPath&'/'Filename&'A2.txt'
....
dfA12 = pd.read_csv(FoldPath&'/'Filename&'A12.txt'
#Code with Pandas and Numpy...
How do I concatenate the variables FoldPath and FileName with the strings "A1" to "A12" to build the path of each csv file? I can't find the correct syntax.
Thanks,
J-F
Edit: Question solved.
With the knowledge of "import os" and also "os.path.join", I can now find a bunch of examples to do what I intended to do. I know that this question has been asked several times, but with my limited knowledge of Python, and programming in general, I could not find the correct key words. Anyway, thanks again for your quick answers.
You can use os.path.join together with + to concatenate:
import os
import pandas as pd
filepath = os.path.join(FoldPath, FileName + '-A1.txt')
dfA1 = pd.read_csv(filepath)  # add any further read_csv arguments you need
First, with str.format:
dfA1 = pd.read_csv("/{}/{}-A1.txt".format(FoldPath, FileName))
but this is not recommended, because the path separator is hard-coded.
Second, use os.path.join:
dfA1 = pd.read_csv(os.path.join(FoldPath, "{}-A1.txt".format(FileName)))
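To avoid writing twelve nearly identical lines, the paths for A1 to A12 can be generated in a loop. The sketch below only builds the paths; the helper name build_paths and its count parameter are assumptions for illustration.

```python
import os


def build_paths(fold_path, file_name, count=12):
    """Map 'A1'..'A<count>' to '<fold_path>/<file_name>-A<n>.txt'."""
    return {
        "A{}".format(n): os.path.join(fold_path, "{}-A{}.txt".format(file_name, n))
        for n in range(1, count + 1)
    }
```

With pandas imported as pd, frames = {key: pd.read_csv(path) for key, path in build_paths(FoldPath, FileName).items()} would then give frames["A1"] in place of dfA1, and so on up to frames["A12"].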
Here is the code we have developed for a single directory of files:
from os import listdir
with open("/user/results.txt", "w") as f:
    for filename in listdir("/user/stream"):
        with open('/user/stream/' + filename) as currentFile:
            text = currentFile.read()
            if 'checksum' in text:
                f.write('current word in ' + filename[:-4] + '\n')
            else:
                f.write('NOT ' + filename[:-4] + '\n')
I want to loop over all directories.
Thanks in advance
If you're using UNIX you can use grep:
grep "checksum" -R /user/stream
The -R flag allows for a recursive search inside the directory, following the symbolic links if there are any.
My suggestion is to use glob.
The glob module finds paths that match a shell-style pattern. On Unix a directory is itself a kind of file, so glob returns directories as well as regular files, which is what this task needs.
Moreover, you don't have to install anything; glob comes with Python.
Note: For the following code, you will need python3.5 or greater
This should help you out.
import os
import glob
for path in glob.glob('/ai2/data/prod/admin/inf/**', recursive=True):
    # At some point, `path` will be `/ai2/data/prod/admin/inf/inf_<$APP>_pvt/error`
    if not os.path.isdir(path):
        # Check the `id` of the file
        # Do things with the file
        # If there are files inside `/ai2/data/prod/admin/inf/inf_<$APP>_pvt/error` you will be able to access them here
        pass  # placeholder so the block is valid Python
What glob.glob does is return a possibly empty list of path names that match the pattern. In this case, it will match every file (including directories) in /user/stream/. If these files are not directories, you can do whatever you want with them.
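Combined with the 'checksum' test from the question, the recursive glob could be used roughly as follows. This is a sketch under assumptions: the function name files_containing is made up, the files are assumed to be readable as text, and names starting with a dot are not matched by ** by default.

```python
import glob
import os


def files_containing(root, word="checksum"):
    """Split every regular file below `root` into (with_word, without_word)."""
    found, missing = [], []
    pattern = os.path.join(glob.escape(root), "**")
    for path in sorted(glob.glob(pattern, recursive=True)):
        if os.path.isdir(path):
            continue  # only regular files are read
        # errors="replace" keeps the scan going on files that are not valid text.
        with open(path, errors="replace") as current_file:
            (found if word in current_file.read() else missing).append(path)
    return found, missing
```

The two returned lists can then be written to the results file in the same way as in the single-directory version.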
I hope this will help you!
Clarification
Regarding your three-point comment attempting to clarify the question, especially this part: "we need to put appi dynamically in that path then we need to read all files inside that directory"
No, you do not need to do this. Please read my answer carefully and please read glob documentation.
In this case, it will match every file (including directories) in /user/stream/
If you replace /user/stream/ with /ai2/data/prod/admin/inf/, you will have access to every file in /ai2/data/prod/admin/inf/. Assuming your app ids are 1, 2, 3, this means, you will have access to the following files.
/ai2/data/prod/admin/inf/inf_1_pvt/error
/ai2/data/prod/admin/inf/inf_2_pvt/error
/ai2/data/prod/admin/inf/inf_3_pvt/error
You do not have to specify the id, because you will be iterating over all files. If you do need the id, you can just extract it from the path.
If everything looks like /ai2/data/prod/admin/inf/inf_<$APP>_pvt/error, you can get the id by removing the leading /ai2/data/prod/admin/inf/inf_ and taking everything up to the next _.
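The extraction can be sketched with plain string operations. This assumes every path of interest starts with /ai2/data/prod/admin/inf/inf_ and that the id itself contains no underscore; the helper name app_id is hypothetical.

```python
# Everything before the id, including the 'inf_' that starts the directory name.
PREFIX = "/ai2/data/prod/admin/inf/inf_"


def app_id(path):
    """Return the id from '<PREFIX><id>_pvt/...', or None for other paths."""
    if not path.startswith(PREFIX):
        return None
    # Keep only the text up to the first underscore after the prefix.
    return path[len(PREFIX):].split("_", 1)[0]
```

Inside the glob loop, app_id(path) can be called on each path, skipping those for which it returns None.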
I'm currently trying to write a python script to rename a bunch of files. The file is named like this: [Name][Number]-[Number]. To give a specific example: milk-00-00. The next file is milk-00-01, then 02, 03 until X. After that milk-01-00 starts with the same pattern.
What I need to do is replace 'milk' with a number and replace the '-XX-XX' part with '-01', '-02', ...
I hope you guys get the idea. The current state of my code is pretty poor, it was hard enough to get it this far though. It looks like this and with this I'm at least able to replace something. I'll also manage to get rid of the 'milk' with the help of google. However, if there is an easier way, I'd really appreciate a push in the right direction!
import os
import sys
path = 'C:/Users/milk/Desktop/asd'
i=00
for filename in os.listdir(path):
    if filename.endswith('.tiff'):
        newname = filename.replace('00', 'i')
        os.rename(filename,newname)
        i=i+1
You can use the format function:
temp = '.'.join(filename.split('.')[:-1])
os.rename(filename, '10{}-{}.tiff'.format(temp.split('-')[-2],temp.split('-')[-1]))
Since filename has the .tiff extension, this first creates a version of filename without the extension (temp) and then builds the new name from its last two '-'-separated parts.
os.rename(filename, '1000-%02d.tiff' % i)
i += 1
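Put together as a complete loop, the sequential renaming could look like the sketch below. It is written under assumptions: the function name renumber is made up, the prefix '1000' is taken from the snippet above, numbering starts at 01 as in the question, and files are processed in sorted name order.

```python
import os


def renumber(path, prefix="1000", ext=".tiff"):
    """Rename files in `path` ending in `ext` to '<prefix>-01<ext>', '<prefix>-02<ext>', ..."""
    # sorted() keeps milk-00-00, milk-00-01, ..., milk-01-00 in order;
    # os.listdir() alone returns names in arbitrary order.
    names = sorted(f for f in os.listdir(path) if f.endswith(ext))
    new_names = []
    for i, filename in enumerate(names, start=1):
        newname = "{}-{:02d}{}".format(prefix, i, ext)
        # os.rename() needs full paths unless `path` is the current directory.
        os.rename(os.path.join(path, filename), os.path.join(path, newname))
        new_names.append(newname)
    return new_names
```

Two things differ from the code in the question: the counter is formatted into the name instead of the literal string 'i', and both arguments to os.rename include the directory.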