Pulling file names from a list of full paths? - python

I am trying to pull out file names from a specifically formatted document, and put them into a list. The document contains a large amount of information, but the lines I am concerned about look like the following with "File Name: " always at the start of the line:
File Name: C:\windows\system32\cmd.exe
I tried the following:
xmlfile = open('my_file.xml', 'r')
filetext = xmlfile.read()
file_list = []
file_list.append(re.findall(r'\bFile Name:\s+.*\\.*(?=\n)', filetext))
This makes file_list look like:
[['File Name: c:\\windows\\system32\\file1.exe',
'File Name: c:\\windows\\system32\\file2.exe',
'File Name: c:\\windows\\system32\\file3.exe']]
I'm looking for my output to simply be:
(file1.exe, file2.exe, file3.exe)
I also tried using ntpath.basename on my above output, but it looks like it wants a string as input and not a list.
I'm very new to Python and scripting in general, so any suggestions would be appreciated.

You can get the expected output with the following regular expression:
file_list = re.findall(r'\bFile Name:\s+.*\\([^\\\n]*)(?=\n)', filetext)
([^\\\n]*) captures everything after the final path separator up to, but not including, the end of the line. The newline is excluded from the character class because [^\\]* on its own also matches line breaks and could run on into following lines that contain no backslash. Since findall already returns a list, there's no need to append the return value to an existing list.
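A minimal check of this idea, using a variant of the pattern whose capture group excludes both backslashes and newlines. The sample text is made up and stands in for the contents of my_file.xml:

```python
import re

# Hypothetical sample standing in for the contents of my_file.xml.
filetext = (
    "Some header\n"
    "File Name: c:\\windows\\system32\\file1.exe\n"
    "Other data\n"
    "File Name: c:\\windows\\system32\\file2.exe\n"
)

# With exactly one capture group, findall returns only the captured text.
file_list = re.findall(r'\bFile Name:\s+.*\\([^\\\n]*)(?=\n)', filetext)
print(file_list)
```

Because the pattern has a single capture group, each list element is just the file name rather than the whole matched line.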

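You can do it in a more declarative style with generators. The file is processed lazily, line by line, so memory use stays low even for large files.
import os.path
import re
pat = re.compile(r'\bFile Name:\s+.*\\.*(?=\n)')
with open('my_file.xml') as f:
    # lazily match each line; non-matching lines yield None
    ms = (pat.match(line) for line in f)
    # basename needs the matched string, not the match object
    # (os.path splits on backslashes only on Windows; use ntpath elsewhere)
    ns = (os.path.basename(m.group(0)) for m in ms if m)
    # the iterator ns emits names such as 'foo.txt'
    for n in ns:
        print(n)  # do something
If you change the regex slightly, i.e. the grouping, you don't even need os.path.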
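A rough sketch of what changing the grouping could look like. The names PAT and iter_names are made up for illustration, and the sample lines stand in for an open file:

```python
import re
from typing import Iterable, Iterator

# Capture only the text after the last backslash on the line.
PAT = re.compile(r'File Name:\s+.*\\([^\\\n]*)')

def iter_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield the base file name for every matching line, lazily."""
    for line in lines:
        m = PAT.match(line)
        if m:  # skip lines that do not match
            yield m.group(1)

# Hypothetical sample lines; an open file object can be passed the same way.
sample = [
    "File Name: C:\\windows\\system32\\cmd.exe\n",
    "unrelated line\n",
    "File Name: C:\\temp\\notes.txt\n",
]
names = list(iter_names(sample))
print(names)
```

Since iterating over a file object yields its lines, iter_names(open_file) would work without reading the whole file into memory.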

I would change this up a bit to make it clearer to read and to separate the steps. It can clearly be done in one step, but I think your code is going to be tough to manage later.
import re
with open('my_file.xml', 'r') as xmlfile:
    filetext = xmlfile.read()  # the with block closes the file handle; your version left the file open
file_list = []
my_pattern = re.compile(r'\bFile Name:\s+.*\\.*(?=\n)')
for filename in my_pattern.findall(filetext):
    # split on the backslash explicitly; os.sep is '/' outside Windows
    cleaned_name = filename.split('\\')[-1]
    file_list.append(cleaned_name)

You're on the right track. The reason basename wasn't working was because re.findall() returns a list which was being put into yet another list. Here's a fix for that which iterates through that list returned and creates another with just the base file names in:
import re
import os
with open('my_file.xml', 'r') as xmlfile:  # the old 'rU' mode was removed in Python 3.11
    file_text = xmlfile.read()
file_list = [os.path.basename(fn)
             for fn in re.findall(r'\bFile Name:\s+.*\\.*(?=\n)', file_text)]

Related

concatenate file contents into a list with python

I wrote a small script in python to concatenate some lines from different files into one file. But somehow it doesn't print anything I like it to print by the function I wrote. I tried to spot the problems, but after one evening and one morning, I still can't find the problem. Could somebody help me please? Thanks a lot!
So I have a folder containing thousands of .fa files. In each .fa file, I would like to extract the lines starting with ">" and modify them to keep only the information I need. In the end, I would like to combine all the information extracted from one file into one line of a new file, and then concatenate the information from all the .fa files into one .txt file.
So the folder:
% ls
EstimatedSpeciesTree.nwk HOG9998.fa concatenate_gene_list_HOG.py
HOG9997.fa HOG9999.fa output
One .fa file for example
>BnaCnng71140D [BRANA]
MTSSFKLSDLEEVTTNAEKIQNDLLKEILTLNAKTEYLRQFLHGSSDKTFFKKHVPVVSYEDMKPYIERVADGEPSEIIS
GGPITKFLRRYSF
>Cadbaweo98702.t [THATH]
MTSSFKLSDLEEVTTNAEKIQNDLLKEILTLNAKTEYLRQFLHGSSDKTFFKKHVPVVSYEDMKPYIERVADGEPSEIIS
GGPITKFLRRYSF
What I would like to have is one file like this
HOG9997.fa BnaCnng71140D:BRANA Cadbaweo98702.t:THATH
HOG9998.fa Bkjfnglks098:BSFFE dsgdrgg09769.t
HOG9999.fa Dsdfdfgs1937:XDGBG Cadbaweo23425.t:THATH Dkkjewe098.t:THUGB
# NOTE: the number of lines in each .fa file varies. Also, some lines have [ ], but some do not.
So my code is
#!/usr/bin/env python3
import os
import re
import csv
def concat_hogs(a_file):
    output = []
    for row in a_file: # select the gene names in each HOG fasta file
        if row.startswith(">"):
            trans = row.partition(" ")[0].partition(">")[2]
            if row.partition(" ")[2].startswith("["):
                species = re.search(r"\[(.*?)\]", row).group(1)
            else:
                species = ""
            output.append(trans + ":" + species)
    return '\t'.join(output)
folder = "/tmp/Fasta/"
print("Concatenate names in " + folder)
for file in os.listdir(folder):
    if file.endswith('.fa'):
        reader = csv.reader(file, delimiter="\t")
        print(file + concat_hogs(reader))
But the output only prints the file name, without the part that should be generated by the function concat_hogs(file). I don't understand why.
The error comes from you passing the name of the file to your concat_hogs function instead of an iterable file handle. You are missing the actual opening of the file for reading purposes.
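A minimal sketch of that fix, keeping the asker's parsing logic but opening each file before handing it to concat_hogs. The temporary directory and the sample file are made up so the snippet runs anywhere; with real data, folder would point at /tmp/Fasta/:

```python
import os
import re
import tempfile

def concat_hogs(lines):
    """Join 'gene:species' for every header line (lines starting with '>')."""
    output = []
    for row in lines:
        if row.startswith(">"):
            # strip() removes the newline when the header has no species part
            trans = row.partition(" ")[0].partition(">")[2].strip()
            found = re.search(r"\[(.*?)\]", row)
            species = found.group(1) if found else ""
            output.append(trans + ":" + species)
    return "\t".join(output)

# Hypothetical sample data standing in for the real folder of .fa files.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "HOG9997.fa"), "w") as fh:
    fh.write(">BnaCnng71140D [BRANA]\nMTSSFKLSDLEE\n>Cadbaweo98702.t [THATH]\nGGPITKF\n")

results = {}
for name in sorted(os.listdir(folder)):
    if name.endswith(".fa"):
        # Open the file and pass the handle, not the file name.
        with open(os.path.join(folder, name)) as handle:
            results[name] = concat_hogs(handle)
        print(name + "\t" + results[name])
```

The csv module is not needed here, since each header line is parsed as plain text.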
I agree with Jay M that your code can be simplified drastically, not least by using regular expressions more efficiently. Also pathlib is awesome.
But I think it can be even more concise and expressive. Here is my suggestion:
#!/usr/bin/env python3
import re
from pathlib import Path
GENE_PATTERN = re.compile(
    r"^>(?P<trans>[\w.]+)\s+(?:\[(?P<species>\w+)])?"
)
def extract_gene(string: str) -> str:
    match = re.search(GENE_PATTERN, string)
    return ":".join(match.groups(default=""))
def concat_hogs(file_path: Path) -> str:
    with file_path.open("r") as file:
        return '\t'.join(
            extract_gene(row)
            for row in file
            if row.startswith(">")
        )
def main() -> None:
    folder = Path("/tmp/Fasta/")
    print("Concatenate names in", folder)
    for element in folder.iterdir():
        if element.is_file() and element.suffix == ".fa":
            print(element.name, concat_hogs(element))
if __name__ == '__main__':
    main()
I am using named capturing groups for the regular expression because I prefer it for readability and usability later on.
Also I assume that the first group can only contain letters, digits and dots. Adjust the pattern, if there are more options.
PS
Just to add a few additional explanations:
The pathlib module is a great tool for any basic filesystem-related task. Among a few other useful methods you can look up there, I use the Path.iterdir method, which just iterates over elements in that directory instead of creating an entire list of them in memory first the way os.listdir does.
The RegEx Match.groups method returns a tuple of the matched groups, the default parameter allows setting the value when a group was not matched. I put an empty string there, so that I can always simply str.join the groups, even if the species-group was not found. Note that this .groups call will result in an AttributeError, if no match was found because then the match variable will be set to None. It may or may not be useful for you to catch this error.
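To illustrate the behaviour of Match.groups(default="") and the None case described above, here is a small sketch. Returning an empty string for non-matching lines is an assumption made for this example, not something the answer prescribes:

```python
import re

GENE_PATTERN = re.compile(r"^>(?P<trans>[\w.]+)\s+(?:\[(?P<species>\w+)])?")

def extract_gene(line: str) -> str:
    """Return 'trans:species', or an empty string if the line does not match."""
    match = GENE_PATTERN.search(line)
    if match is None:  # avoids the AttributeError on non-matching lines
        return ""
    # Unmatched optional groups are replaced by the default value.
    return ":".join(match.groups(default=""))

with_species = extract_gene(">BnaCnng71140D [BRANA]\n")
without_species = extract_gene(">dsgdrgg09769.t\n")
no_match = extract_gene("MTSSFKLSDLEE\n")
print(repr(with_species), repr(without_species), repr(no_match))
```

Whether to skip, log, or raise on a non-matching line depends on how strictly the input format should be enforced.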
For a few additional pointers about using regular expressions in Python, there is a great How-To-Guide in the docs. In addition I can only agree with Jay M about how useful regex101.com is, regardless of language specifics. Also, I think I would recommend using his approach of reading the entire file into memory as a single string first and then using re.findall on it to grab all matches at once. That is probably much more efficient than going line-by-line, unless you are dealing with gigantic files.
In concat_hogs I pass a generator expression to str.join instead of first building a list. This is possible because str.join accepts any iterable of strings, and a generator expression (... for ... in ...) returns a Generator, which inherits from Iterator and thus from Iterable. Note that CPython's str.join still collects the items internally before joining, so the benefit is mostly conciseness rather than a real memory saving. For more insight about the container inheritance structures I always refer to the collections.abc docs.
Use standard Python libraries
In this case:
regex (use a site such as regex101 to test your regex)
pathlib to encapsulate paths in a platform-independent way
collections.namedtuple to make data more structured
A breakdown of the regex used here:
>([a-z0-9A-Z\.]+?)\s*(\n|\[([A-Z]+)\]?\n)
> = The start-of-block character
(regex1) = First matching block
\s* = Any amount of whitespace (i.e. zero whitespace is ok)
(regex2|regex3) = A choice of two possible regexes
regex1: [a-z0-9A-Z\.]+? = One or more characters from the class, where the class is any letter a to z or A to Z, any digit 0 to 9, or a dot
regex2: \n = A newline that immediately follows the whitespace
regex3: \[([A-Z]+)\]?\n = One or more upper-case letters inside square brackets, followed by a newline
Note1: The parentheses create capture groups, which we later use to split out the fields.
Note2: The regex allows zero or more whitespace characters between the first and second part of the text line, which makes it more resilient.
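As a quick check of the breakdown above, this sketch runs the regex over a made-up two-record sample and keeps the first and last capture groups of each match:

```python
import re

regex = re.compile(r">([a-z0-9A-Z\.]+?)\s*(\n|\[([A-Z]+)\]?\n)")

# Made-up sample: one header with a species tag, one without.
text = ">BnaCnng71140D [BRANA]\nMTSSFKLSDLEE\n>dsgdrgg09769.t\nGGPITKF\n"

# Each match is a tuple of the three capture groups; the first and last
# entries are the gene name and the species (empty when there is no tag).
pairs = [(m[0], m[-1]) for m in regex.findall(text)]
print(pairs)
```

The middle group only exists to express the choice between the two endings, which is why it is ignored here.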
import re
from collections import namedtuple
from pathlib import Path
import os
class HOG(namedtuple('HOG', ['filepath', 'filename', 'key', 'text'], defaults=[None])):
    __slots__ = ()
    def __str__(self):
        return f"{self.key}:{self.text}"
regex = re.compile(r">([a-z0-9A-Z\.]+?)\s*(\n|\[([A-Z]+)\]?\n)")
path = Path(os.path.abspath("."))
wildcard = "*.fa"
files = list(path.glob("*.fa"))
print(f"Searching {path}/{wildcard} => found {len(files)} files")
data = {}
for file in files:
    print(f"Processing {file}")
    with file.open() as hf:
        text = hf.read(-1)
    matches = regex.findall(text)
    for match in matches:
        key = match[0].strip()
        text = match[-1].strip()
        if file.name not in data:
            data[file.name] = []
        data[file.name].append(HOG(path, file.name, key, text))
print("Now you have the data you can process it as you like")
for file, entries in data.items():
    line = "\t".join(list(str(e) for e in entries))
    print(file, line)
# e.g. Write the output as desired
with Path("output").open("w") as fh:
    for file, entries in data.items():
        line = "\t".join(list(str(e) for e in entries))
        fh.write(f"{file}\t{line}\n")

finding matches between a) all the files in a directory and b) a txt list of files not working with fnmatch - python

So I've got the code below and when I run tests to spit out all the files in A1_dir and A2_list, all of the files are showing up, but when I try to get the fnmatch to work, I get no results.
For background in case it's helpful: I am trying to comb through a directory of files and take an action (duplicate the file) only IF it matches a file name on the newoutput.txt list. I'm sure there's a better way to do all of this lol, so if you have one I'd love to hear it too!
import fnmatch
import os
A1_dir = ('C:/users/alexd/kobe')
A2_list = open('C:/users/alexd/kobe/newoutput.txt')
Lines = A2_list.readlines()
A2_list.close()
for file in (os.listdir(A1_dir)):
    for line in Lines:
        if fnmatch.fnmatch(file, line):
            print(f"got one:{file}")
readline returns a single line and readlines returns all the lines as a list. In both cases, however, each line keeps its trailing \n, i.e. the newline character.
A simple fix here would be to change
Lines = A2_list.readlines()
to
Lines = [i.strip() for i in A2_list.readlines()]
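A small sketch of why the trailing newline matters. The file names are made up, and the list stands in for what readlines() would return:

```python
import fnmatch

# Hypothetical lines as readlines() would return them, newline included.
raw_lines = ["report.txt\n", "notes.txt\n"]
file_name = "report.txt"

# The trailing newline is treated as part of the pattern, so nothing matches.
matches_raw = [line for line in raw_lines if fnmatch.fnmatch(file_name, line)]

# After stripping, the comparison behaves as expected.
stripped = [line.strip() for line in raw_lines]
matches_stripped = [line for line in stripped if fnmatch.fnmatch(file_name, line)]

print(matches_raw, matches_stripped)
```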
Since you asked for a better way, you could take a look at set operations.
Since the lines are exactly what you want the file names to be (and not patterns), save A2_list as a set instead of a list.
Next, save all the files from os.listdir also as a set.
Finally, perform a set intersection
import os
with open('C:/users/alexd/kobe/newoutput.txt') as fp:
    myfiles = set(i.strip() for i in fp.readlines())
all_files = set(os.listdir('C:/users/alexd/kobe'))
for f in all_files.intersection(myfiles):
    print(f"got one:{f}")
fnmatch.fnmatch is not meant for comparing two plain filenames. It takes exactly two parameters, a filename and a shell-style pattern, in that order, and tests whether the filename matches the pattern.
Possible Solution:
I don't think you need any function to compare two strings. Both os.listdir() and .readlines() return lists of strings, so once the trailing newline is stripped from each line you can compare them directly with == or in.

Code for copying specific lines from multiple files to a single file (and removing part of the copied lines)

First of all, I am really new to this. I've been reading up on some tutorials over the past days, but now I've hit a wall with what I want to achieve.
To give you the long version: I have multiple files in a directory, all of which contain information in certain lines (23-26). Now, the code would have to find and open all files (naming pattern: *.tag) and then copy lines 23-26 to a new single file. (And add a new line after each new entry...). Optionally it would also remove a specific part from each line that I do not need:
C12b2
-> everything before C12b2 (or similar) would need to be removed.
Thus far I have managed to copy those lines from a single file to a new file, but the rest still eludes me: (no idea how formatting works here)
f = open('2.tag')
n = open('output.txt', 'w')
for i, text in enumerate(f):
    if i >= 23 and i < 27:
        n.write(text)
    else:
        pass
Could anyone give me some advice ? I do not need a complete code as an answer, however, good tutorials that don't skip explanations seem to be hard to come by.
You can look at the glob module. It gives a list of filenames that match the pattern you provide. Please note that this pattern is not a regex; it is a shell-style pattern (using shell-style wildcards).
Example of glob -
>>> import glob
>>> glob.glob('*.py')
['a.py', 'b.py', 'getpip.py']
You can then iterate over each of the files returned by the glob.glob() function.
For each file you can do the same thing you are doing right now.
Then, when writing the lines, you can use str.find() to find the first instance of the string C12b2 and then use slicing to remove the part you do not want.
As an example -
>>> s = "asdbcdasdC12b2jhfasdas"
>>> s[s.find("C12b2"):]
'C12b2jhfasdas'
You can do something similar for each of your lines. Please note that if only some lines contain C12b2, you need to first check whether that string is present in the line before doing the above slicing (str.find() returns -1 when the string is missing, which would slice off everything but the last character). Example -
if 'C12b2' in text:
    text = text[text.find("C12b2"):]
You can do the above before writing the line into the output file.
Also, it would be good to look into the with statement. You can use it for opening files, so that closing the file is handled automatically when you are done with the processing.
Without importing anything but os:
#!/usr/bin/env python3
import os
# set the directory, the outfile and the tag below
dr = "/path/to/directory"; out = "/path/to/newfile"; tag = ".tag"
for f in [f for f in os.listdir(dr) if f.endswith(tag)]:
    # lines 23-26 are indices 22-25, so the (exclusive) slice end is 26
    open(out, "+a").write(("").join([l for l in open(dr+"/"+f).readlines()[22:26]])+"\n")
What it does
It does exactly as you describe, it:
collects a defined region of lines from all files (that is: of a defined extension) in a directory
pastes the sections into a new file, separated by a new line
Explanation
[f for f in os.listdir(dr) if f.endswith(".tag")]
lists all files of the specific extension in your directory,
[l for l in open(dr+"/"+f).readlines()[22:26]]
reads the selected lines of the file
open(out, "+a").write()
writes to the output file, creates it if it does not exist.
How to use
Copy the script into an empty file, save it as collect_lines.py
set in the head section the directory with your files, the path to the new file and the extension
run it with the command:
python3 /path/to/collect_lines.py
The verbose version, with explanation
If we "decompress" the code above, this is what happens:
#!/usr/bin/env python3
import os
#--- set the path to the directory, the new file and the tag below
dr = "/path/to/directory"; out = "/path/to/newfile"; tag = ".tag"
#---
files = os.listdir(dr)
for f in files:
    if f.endswith(tag):
        # read the file as a list of lines
        content = open(dr+"/"+f).readlines()
        # the first item in a list = index 0, so line 23 is index 22;
        # the slice end is exclusive, so 26 includes line 26
        needed_lines = content[22:26]
        # convert list to string, add a new line
        string_topaste = ("").join(needed_lines)+"\n"
        # add the lines to the new file, create the file if necessary
        open(out, "+a").write(string_topaste)
Using the glob package you can get a list of all *.tag files:
import glob
# ['1.tag', '2.tag', 'foo.tag', 'bar.tag']
tag_files = glob.glob('*.tag')
If you open your file using the with statement, it is closed automatically afterwards:
with open('file.tag') as in_file:
    pass  # do something
Use readlines() to read your entire file into a list of lines, which can then be sliced:
lines = in_file.readlines()[22:26]
If you need to skip everything before a specific pattern, use str.split() to separate the string at the pattern and take the last part:
pattern = 'C12b2'
clean_lines = [line.split(pattern, 1)[-1] for line in lines]
Take a look at this example:
>>> lines = ['line 22', 'line 23', 'Foobar: C12b2 line 24']
>>> pattern = 'C12b2'
>>> [line.split(pattern, 1)[-1] for line in lines]
['line 22', 'line 23', ' line 24']
You can use readlines and writelines, with a and b as the line bounds for the slice of lines to write:
with open('oldfile.txt', 'r') as old:
    lines = old.readlines()[a:b]
with open('newfile.txt', 'w') as new:
    new.writelines(lines)
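Putting the pieces from these answers together, here is one possible sketch. The function name collect_lines, its parameters, and the demo data are made up for illustration. Unlike the str.split() variant, this version keeps the marker itself and only drops what precedes it:

```python
import glob
import os
import tempfile

def collect_lines(directory, out_path, first=23, last=26, marker=None):
    """Copy lines first..last (1-based, inclusive) of every *.tag file to out_path."""
    with open(out_path, "w") as out:
        for path in sorted(glob.glob(os.path.join(directory, "*.tag"))):
            with open(path) as in_file:
                lines = in_file.readlines()[first - 1:last]
            for line in lines:
                if marker and marker in line:
                    # Drop everything before the marker, keep the marker itself.
                    line = line[line.find(marker):]
                out.write(line)
            out.write("\n")  # blank line after each file's entry

# Made-up demo data: one 30-line file in a temporary directory.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "2.tag"), "w") as fh:
    fh.writelines(f"prefix C12b2 line {n}\n" for n in range(1, 31))

out_file = os.path.join(demo_dir, "output.txt")
collect_lines(demo_dir, out_file, marker="C12b2")
with open(out_file) as fh:
    result = fh.read()
print(result)
```

Opening the output file once, outside the loop, avoids reopening it for every input file.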

python clear content writing on same file

I am a newbie to Python. I have code that must write its output back to the same file, but when I do that, it clears my content. Please help me fix it.
How should I modify my code such that the contents will be written back on the same file?
My code:
import re
numbers = {}
with open('1.txt') as f,open('11.txt', 'w') as f1:
    for line in f:
        row = re.split(r'(\d+)', line.strip())
        words = tuple(row[::2])
        if words not in numbers:
            numbers[words] = [int(n) for n in row[1::2]]
        numbers[words] = [n+1 for n in numbers[words]]
        row[1::2] = map(str, numbers[words])
        indentation = (re.match(r"\s*", line).group())
        print (indentation + ''.join(row))
        f1.write(indentation + ''.join(row) + '\n')
In general, it's a bad idea to write over a file you're still processing (or change a data structure over which you are iterating). It can be done...but it requires much care, and there is little safety or restart-ability should something go wrong in the middle (an error, a power failure, etc.)
A better approach is to write a clean new file, then rename it to the old name. For example:
import re
import os
filename = '1.txt'
tempname = "temp{0}_{1}".format(os.getpid(), filename)
numbers = {}
with open(filename) as f, open(tempname, 'w') as f1:
    ...  # file processing as before
# note: on Windows, os.rename fails if filename still exists; os.replace overwrites it
os.rename(tempname, filename)
Here I've dropped filenames (both original and temporary) into variables, so they can be easily referred to multiple times or changed. This also prepares for the moment when you hoist this code into a function (as part of a larger program), as opposed to making it the main line of your program.
You don't strictly need the temporary name to embed the process id, but it's a standard way of making sure the temp file is uniquely named (temp32939_1.txt vs temp_1.txt or tempfile.txt, say).
It may also be helpful to create backups of the files as they were before processing. In which case, before the os.rename(tempname, filename) you can drop in code to move the original data to a safer location or a backup name. E.g.:
backupname = filename + ".bak"
os.rename(filename, backupname)
os.rename(tempname, filename)
While beyond the scope of this question, if you used a read-process-overwrite strategy frequently, it would be possible to create a separate module that abstracted these file-handling details away from your processing code.
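One way such a helper might look, as a sketch only. It swaps in tempfile.NamedTemporaryFile and os.replace in place of the process-id naming and os.rename used above; the function name rewrite_file and the demo file are made up:

```python
import os
import tempfile

def rewrite_file(path, transform):
    """Write transform(line) for each line to a temp file, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so the final rename
    # stays on the same filesystem.
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        with open(path) as src:
            for line in src:
                tmp.write(transform(line))
        tmp_name = tmp.name
    os.replace(tmp_name, path)  # overwrites the destination if it exists

# Made-up demo file.
demo_path = os.path.join(tempfile.mkdtemp(), "1.txt")
with open(demo_path, "w") as fh:
    fh.write("a1\nb2\n")

rewrite_file(demo_path, str.upper)
with open(demo_path) as fh:
    new_text = fh.read()
print(new_text)
```

The original file is only replaced after all lines have been written, so an error in the middle of processing leaves it untouched (a stray temp file may remain in that case).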
Use
open('11.txt', 'a')
to append to the file, instead of 'w', which creates a new file or overwrites an existing one.
If you want to read and modify a file in one go, use 'r+' mode.
f = open('/path/to/file.txt', 'r+')  # the file() builtin only exists in Python 2
content = f.read()
content = content.replace('oldstring', 'newstring')  # for example, change some substring in the whole file
f.seek(0)  # move to beginning of file
f.write(content)
f.truncate()  # drop the leftover "tail" on disk if the new content is shorter than the old
f.close()

Rename Files Based on File Content

Using Python, I'm trying to rename a series of .txt files in a directory according to a specific phrase in each given text file. Put differently and more specifically, I have a few hundred text files with arbitrary names but within each file is a unique phrase (something like No. 85-2156). I would like to replace the arbitrary file name with that given phrase for every text file. The phrase is not always on the same line (though it doesn't deviate that much) but it always is in the same format and with the No. prefix.
I've looked at the os module and I understand how
os.listdir
os.path.join
os.rename
could be useful but I don't understand how to combine those functions with intratext manipulation functions like linecache or general line reading functions.
I've thought through many ways of accomplishing this task but it seems like easiest and most efficient way would be to create a loop that finds the unique phrase in a file, assigns it to a variable and use that variable to rename the file before moving to the next file.
This seems like it should be easy, so much so that I feel silly writing this question. I've spent the last few hours looking reading documentation and parsing through StackOverflow but it doesn't seem like anyone has quite had this issue before -- or at least they haven't asked about their problem.
Can anyone point me in the right direction?
EDIT 1: When I create the regex pattern using this website, it creates bulky but seemingly workable code:
import re
txt='No. 09-1159'
re1='(No)' # Word 1
re2='(\\.)' # Any Single Character 1
re3='( )' # White Space 1
re4='(\\d)' # Any Single Digit 1
re5='(\\d)' # Any Single Digit 2
re6='(-)' # Any Single Character 2
re7='(\\d)' # Any Single Digit 3
re8='(\\d)' # Any Single Digit 4
re9='(\\d)' # Any Single Digit 5
re10='(\\d)' # Any Single Digit 6
rg = re.compile(re1+re2+re3+re4+re5+re6+re7+re8+re9+re10,re.IGNORECASE|re.DOTALL)
m = rg.search(txt)
name = m.group(0)
print(name)
When I manipulate that to fit the glob.glob structure, and make it like this:
import glob
import os
import re
re1='(No)' # Word 1
re2='(\\.)' # Any Single Character 1
re3='( )' # White Space 1
re4='(\\d)' # Any Single Digit 1
re5='(\\d)' # Any Single Digit 2
re6='(-)' # Any Single Character 2
re7='(\\d)' # Any Single Digit 3
re8='(\\d)' # Any Single Digit 4
re9='(\\d)' # Any Single Digit 5
re10='(\\d)' # Any Single Digit 6
rg = re.compile(re1+re2+re3+re4+re5+re6+re7+re8+re9+re10,re.IGNORECASE|re.DOTALL)
for fname in glob.glob(r"\file\structure\here\*.txt"):  # raw string, so \f is not read as a form feed
    with open(fname) as f:
        contents = f.read()
    tname = rg.search(contents)
    print(tname)
Then this prints out the byte location of the pattern, signifying that the regex pattern is correct. However, when I add in the nname = tname.group(0) line after the original tname = rg.search(contents) and change around the print function to reflect the change, it gives me the following error: AttributeError: 'NoneType' object has no attribute 'group'. When I tried copying and pasting #joaquin's code line for line, it came up with the same error. I was going to post this as a comment to the #spatz answer but I wanted to include so much code that this seemed to be a better way to express the 'new' problem. Thank you all for the help so far.
Edit 2: This is for the #joaquin answer below:
import glob
import os
import re
for fname in glob.glob("/directory/structure/here/*.txt"):
    with open(fname) as f:
        contents = f.read()
    tname = re.search(r'No\. (\d\d\-\d\d\d\d)', contents)
    nname = tname.group(1)
    print(nname)
Last Edit: I got it to work using mostly the code as written. What was happening is that there were some files that didn't have that regex expression so I assumed Python would skip them. Silly me. So I spent three days learning to write two lines of code (I know the lesson is more than that). I also used the error catching method recommended here. I wish I could check all of you as the answer, but I bothered #Joaquin the most so I gave it to him. This was a great learning experience. Thank you all for being so generous with your time. The final code is below.
import os
import re
pat3 = r"No\. (\d\d-\d\d)"
ext = '.txt'
mydir = '/directory/files/here'
for arch in os.listdir(mydir):
    archpath = os.path.join(mydir, arch)
    with open(archpath) as f:
        txt = f.read()
    s = re.search(pat3, txt)
    if s is None:
        continue
    name = s.group(1)
    # include the extension so the existence check tests the real target path
    newpath = os.path.join(mydir, name + ext)
    if not os.path.exists(newpath):
        os.rename(archpath, newpath)
    else:
        print('{} already exists, passing'.format(newpath))
Instead of providing you with some code which you will simply copy-paste without understanding, I'd like to walk you through the solution so that you will be able to write it yourself, and more importantly gain enough knowledge to be able to do it alone next time.
The code which does what you need is made up of three main parts:
Getting a list of all filenames you need to iterate
For each file, extract the information you need to generate a new name for the file
Rename the file from its old name to the new one you just generated
Getting a list of filenames
This is best achieved with the glob module. This module allows you to specify shell-like wildcards and it will expand them. This means that in order to get a list of .txt file in a given directory, you will need to call the function glob.iglob("/path/to/directory/*.txt") and iterate over its result (for filename in ...:).
Generate new name
Once we have our filename, we need to open() it, read its contents using read() and store it in a variable where we can search for what we need. That would look something like this:
with open(filename) as f:
    contents = f.read()
Now that we have the contents, we need to look for the unique phrase. This can be done using regular expressions. Store the new filename you want in a variable, say newfilename.
Rename
Now that we have both the old and the new filenames, we need to simply rename the file, and that is done using os.rename(filename, newfilename).
If you want to move the files to a different directory, use os.rename(filename, os.path.join("/path/to/new/dir", newfilename)). Note that we need os.path.join here to construct the new path for the file using a directory path and newfilename.
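The three steps above can be sketched as follows. The function name rename_by_phrase, the exact pattern, and the demo files are assumptions made for illustration; the directory listing is taken as a snapshot with glob.glob so that renaming does not interfere with the iteration:

```python
import glob
import os
import re
import tempfile

PHRASE = re.compile(r"No\. (\d\d-\d\d\d\d)")

def rename_by_phrase(directory):
    """Rename each .txt file in directory after the 'No. NN-NNNN' phrase inside it."""
    renamed = []
    # Step 1: snapshot the list of files before renaming anything.
    for filename in sorted(glob.glob(os.path.join(directory, "*.txt"))):
        # Step 2: read the contents and look for the phrase.
        with open(filename) as f:
            contents = f.read()
        found = PHRASE.search(contents)
        if found is None:
            continue  # no phrase in this file, leave it alone
        newfilename = os.path.join(directory, found.group(1) + ".txt")
        if os.path.exists(newfilename):
            continue  # never overwrite an existing file
        # Step 3: rename.
        os.rename(filename, newfilename)
        renamed.append(os.path.basename(newfilename))
    return renamed

# Made-up demo files in a temporary directory.
demo = tempfile.mkdtemp()
with open(os.path.join(demo, "one.txt"), "w") as fh:
    fh.write("whatever No. 09-1159 you want\n")
with open(os.path.join(demo, "two.txt"), "w") as fh:
    fh.write("no phrase in here\n")

renamed = rename_by_phrase(demo)
print(renamed, sorted(os.listdir(demo)))
```

Working on a backup copy of the directory first is a sensible precaution, since renames are not easily undone.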
There is no checking or protection for failures (whether archpath is a file, whether newpath already exists, whether the search is successful, etc.), but this should work:
import os
import re
pat = r"No\. (\d\d\-\d\d\d\d)"
mydir = 'mydir'
for arch in os.listdir(mydir):
    archpath = os.path.join(mydir, arch)
    with open(archpath) as f:
        txt = f.read()
    s = re.search(pat, txt)
    name = s.group(1)
    newpath = os.path.join(mydir, name)
    os.rename(archpath, newpath)
Edit: I tested the regex to show how it works:
>>> import re
>>> pat = "No\. (\d\d\-\d\d\d\d)"
>>> txt='nothing here or whatever No. 09-1159 you want, does not matter'
>>> s = re.search(pat, txt)
>>> s.group(1)
'09-1159'
>>>
The regex is very simple:
\. -> a dot
\d -> a decimal digit
\- -> a dash
So, it says: search for the string "No. " followed by 2+4 decimal digits separated by a dash.
The parentheses are to create a group that I can recover with s.group(1) and that contains the code number.
And that is what you get, before and after:
Text of files one.txt, two.txt and three.txt is always the same, only the number changes:
this is the first
file with a number
nothing here or whatever No. 09-1159 you want, does not matter
the number is
Create a backup of your files, then try something like this:
import glob
import os
def your_function_to_dig_out_filename(lines):
    import re
    # i'll let you attempt this yourself
for fn in glob.glob('/path/to/your/dir/*.txt'):
    with open(fn) as f:
        spam = f.readlines()
    new_fn = your_function_to_dig_out_filename(spam)
    if not os.path.exists(new_fn):
        os.rename(fn, new_fn)
    else:
        print('{} already exists, passing'.format(new_fn))
