I have a directory containing a number of files with this format:
one or two digits, then _S followed by one or two digits, then _L001_, then R1 or R2, then _001.fastq
Examples: 1_S1_L001_R1_001.fastq or 14_S14_L001_R2_001.fastq
I want the file names to be like this: 1_R1.fastq 14_R2.fastq
I have figured out the regexp that reflects the file names and can successfully do the search and replace within TextWrangler. Below is the regexp that I came up with:
Search: (\d+)\wS\d+\wL001\w(R\d)\w001(\.fastq)
Replace: \1_\2\3 (or $1_$2$3 depending on the program)
However, I would like to know how to batch rename the files using a simple Python script. I would appreciate any advice.
Thank you!
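You can do something like this:
import glob, re, os

# Pattern from the question; \w matches the underscores between the fields
pattern = r'(\d+)\wS\d+\wL001\w(R\d)\w001(\.fastq)'
for filename in glob.glob('/some/dir/*.fastq'):
    new_name = re.sub(pattern, r'\1_\2\3', filename)
    os.rename(filename, new_name)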
Consider using the os module, which provides os.rename(src, dst). See the os module documentation for details.
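A slightly more defensive variant, as a sketch only: it matches the pattern against the file's base name (so digits in the directory path cannot interfere), skips files that do not match, and supports a dry run. The name `rename_fastq` and its `dry_run` parameter are made up for illustration; the pattern is the one from the question with the underscores written literally.

```python
import re
from pathlib import Path

# Pattern from the question, with literal underscores instead of \w.
FASTQ_NAME = re.compile(r'(\d+)_S\d+_L001_(R\d)_001(\.fastq)')

def rename_fastq(directory, dry_run=True):
    """Rename e.g. 1_S1_L001_R1_001.fastq to 1_R1.fastq.

    Returns a list of (old_name, new_name) pairs. With dry_run=True
    nothing on disk is changed. (Hypothetical helper, not from the thread.)
    """
    renamed = []
    # sorted() materializes the listing before any file is renamed
    for path in sorted(Path(directory).glob('*.fastq')):
        match = FASTQ_NAME.fullmatch(path.name)
        if match is None:
            continue  # leave files with other naming schemes alone
        new_name = match.expand(r'\1_\2\3')
        if not dry_run:
            path.rename(path.with_name(new_name))
        renamed.append((path.name, new_name))
    return renamed
```

Calling `rename_fastq('/some/dir')` first and inspecting the returned pairs, then calling it again with `dry_run=False`, is one way to avoid surprises.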
I have a script I run daily to compile a bunch of spreadsheets into one. After a year of running, one of the filenames changed because the file was produced 14 seconds later. I read the filename in like this
uproduction = Path(r"\\server\folder\P"+year+month+day+r"235900.xls")
and then df = pd.read_excel(uproduction)
This was working fine until the file name changed to P20210225235914.xls. When I am using a raw string like that, is there a way I can make it pick any file that matches P20210225*.xls? I can't seem to find exactly what I'm looking for in the docs.
You can use glob:
from glob import glob
glob(r"\\server\folder\P"+year+month+day+"*.xls")
You can use the glob method on the Path:
# A raw string cannot end with a backslash, so the trailing one is dropped
for file in Path(r'\\server\folder').glob(r'P20210225*.xls'):
    print(file.name)
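Because the timestamp part now varies, more than one file could match a given day. A minimal sketch that picks one of them, assuming names shaped like P20210225235914.xls so that the lexically last name is also the latest; `find_daily_file` is a hypothetical helper name:

```python
from pathlib import Path

def find_daily_file(folder, year, month, day):
    """Return the path of the P<date><time>.xls file for one day.

    Hypothetical helper. If several files match, the lexically last
    name is returned, which is the latest time for names shaped like
    P20210225235914.xls. Raises FileNotFoundError if nothing matches.
    """
    pattern = f"P{year}{month}{day}*.xls"
    matches = sorted(Path(folder).glob(pattern))
    if not matches:
        raise FileNotFoundError(f"no file matching {pattern} in {folder}")
    return matches[-1]
```

The returned path can be handed to pd.read_excel as in the question, for example `pd.read_excel(find_daily_file(r"\\server\folder", year, month, day))`.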
I've already posted here with the same question, but sadly I couldn't come up with a solution (some of you gave me awesome answers, but most of them weren't what I was looking for), so I'll try again, this time giving more information about what I'm trying to do.
So, I'm using a program called GMAT to get some outputs (.txt files with numerical values). These outputs have different names, but because I'm using them for more than one thing I'm getting something like this:
GMATd_1.txt
GMATd_2.txt
GMATf_1.txt
GMATf_2.txt
Now, what I need to do is to use these outputs as inputs in my code. I need to work with them in other functions of my script, and since I will have a lot of these .txt files I want to rename them as I don't want to use them like './path/etc'.
So what I wanted was to write a loop that could get these files and rename them inside the script so I can use these files with the new name in other functions (outside the loop).
So instead of having to do this individually:
GMATds1= './path/GMATd_1.txt'
GMATds2= './path/GMATd_2.txt'
I wanted to write a loop that would do that for me.
I've already tried using a dictionary:
import os
import fnmatch

examples = {}
for filename in os.listdir('.'):
    if fnmatch.fnmatch(filename, 'thing*.txt'):
        examples[filename[:6]] = filename
This does work but I can't use the dictionary key outside the loop.
If I understand correctly, you try to fetch files with similar names (at least a re-occurring pattern) and rename them. This can be accomplished with the following code:
import glob
import os

all_files = glob.glob('path/to/directory/with/files/GMAT*.txt')
for file in all_files:
    new_path = create_new_path(file)  # possibly split the file name, change directory and/or filename
    os.rename(file, new_path)
The glob library supports * wildcards, which makes it possible to search for files whose names follow a specific pattern. It lists all matching files in a directory (or in multiple directories if you include a * wildcard in the directory part). When you iterate over the files, you can either work directly with their contents (as you apparently intend to do) or rename them as shown in this snippet. To rename them, you need to generate a new path, so you would have to write the create_new_path function that takes the old path and returns the new one.
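One possible shape for `create_new_path`, purely as an illustration: it turns `GMATd_1.txt` into `GMATds1.txt`, the naming the question uses for its variables. The mapping rule is an assumption; the answer leaves it to the reader.

```python
import os

def create_new_path(old_path):
    """Map .../GMATd_1.txt to .../GMATds1.txt (illustrative rule only)."""
    directory, filename = os.path.split(old_path)
    stem, ext = os.path.splitext(filename)
    # Split at the last underscore: 'GMATd_1' -> ('GMATd', '_', '1')
    prefix, sep, number = stem.rpartition('_')
    if not sep:
        return old_path  # no underscore: leave the name unchanged
    return os.path.join(directory, f"{prefix}s{number}{ext}")
```

Any other rule works the same way, as long as the function returns the full destination path.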
Since Python 3.4 you can use the built-in pathlib module instead of os or glob.
from pathlib import Path
import shutil

for file_src in Path("path/to/files").glob("GMAT*.txt"):
    # Note: replace() is applied to the whole resolved path, not just the file name
    file_dest = str(file_src.resolve()).replace("ds", "d_")
    shutil.move(file_src, file_dest)
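If the goal is only to refer to the files by short names later in the script, renaming on disk may not be necessary. A sketch, assuming a dictionary keyed by a shortened file name is acceptable; `collect_outputs` and its key rule are hypothetical:

```python
from pathlib import Path

def collect_outputs(directory, pattern="GMAT*.txt"):
    """Return {short_name: Path} for every file matching pattern.

    Hypothetical helper: the key is the file name without its extension
    and without underscores, e.g. GMATd_1.txt -> 'GMATd1'.
    """
    outputs = {}
    for path in sorted(Path(directory).glob(pattern)):
        outputs[path.stem.replace("_", "")] = path
    return outputs
```

After `outputs = collect_outputs('./path')`, an entry such as `outputs['GMATd1']` stays available to any function that receives the dictionary, well after the loop has finished.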
You can use:
import os

path = '.....'   # path where these files are located
path1 = '.....'  # path where you want to store the renamed files
i = 1
for file in os.listdir(path):
    if file.endswith('.txt'):
        os.rename(os.path.join(path, file), os.path.join(path1, str(i) + '.txt'))
        i += 1
This moves every .txt file from the source folder into the destination folder, renamed to 1.txt, 2.txt, ..., n.txt.
Below is the code we have developed for a single directory of files:
from os import listdir

with open("/user/results.txt", "w") as f:
    for filename in listdir("/user/stream"):
        with open('/user/stream/' + filename) as currentFile:
            text = currentFile.read()
            if 'checksum' in text:
                f.write('current word in ' + filename[:-4] + '\n')
            else:
                f.write('NOT ' + filename[:-4] + '\n')
I want to loop over all directories.
Thanks in advance
If you're using UNIX you can use grep:
grep "checksum" -R /user/stream
The -R flag allows for a recursive search inside the directory, following the symbolic links if there are any.
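A rough Python counterpart to the recursive grep, as a sketch: `files_containing` is a hypothetical helper built on os.walk. Unlike grep -R, os.walk does not follow symbolic links to directories unless followlinks=True is passed.

```python
import os

def files_containing(root, word="checksum"):
    """Yield paths of files under root whose text contains word.

    Hypothetical helper. Files are read as text with undecodable bytes
    replaced, so binary files do not raise an error.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as handle:
                if word in handle.read():
                    yield path
```

For example, `list(files_containing('/user/stream'))` would collect the matching paths from every subdirectory.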
My suggestion is to use glob.
The glob module finds path names that match a shell-style pattern. Directories are matched like any other path, so it can help you with your task.
Moreover, you don't have to install anything; glob comes with Python.
Note: For the following code, you will need python3.5 or greater
This should help you out.
import os
import glob

for path in glob.glob('/ai2/data/prod/admin/inf/**', recursive=True):
    # At some point, `path` will be `/ai2/data/prod/admin/inf/inf_<$APP>_pvt/error`
    if not os.path.isdir(path):
        # Check the `id` of the file
        # Do things with the file
        # If there are files inside `/ai2/data/prod/admin/inf/inf_<$APP>_pvt/error` you will be able to access them here
        pass  # placeholder so the block is valid Python
glob.glob returns a possibly empty list of path names that match pathname. In this case, it will match every file (including directories) in /user/stream/. If these files are not directories, you can do whatever you want with them.
I hope this will help you!
Clarification
Regarding your 3-point comment attempting to clarify the question, especially this part: "we need to put appi dynamically in that path then we need to read all files inside that directory"
No, you do not need to do this. Please read my answer carefully and please read glob documentation.
"In this case, it will match every file (including directories) in /user/stream/"
If you replace /user/stream/ with /ai2/data/prod/admin/inf/, you will have access to every file in /ai2/data/prod/admin/inf/. Assuming your app ids are 1, 2, 3, this means, you will have access to the following files.
/ai2/data/prod/admin/inf/inf_1_pvt/error
/ai2/data/prod/admin/inf/inf_2_pvt/error
/ai2/data/prod/admin/inf/inf_3_pvt/error
You do not have to specify the id, because you will be iterating over all files. If you do need the id, you can just extract it from the path.
If every path looks like /ai2/data/prod/admin/inf/inf_<$APP>_pvt/error, you can get the id by removing the prefix /ai2/data/prod/admin/inf/inf_ and taking everything up to the next _.
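Extracting the id from such a path could look like the sketch below. It assumes the directory layout from the example paths and that an id contains neither an underscore nor a slash; `app_id_from_path` is a hypothetical helper.

```python
import re

# Assumed layout, taken from the example paths: .../inf/inf_<id>_pvt/...
APP_PATH = re.compile(r"/inf/inf_([^_/]+)_pvt(?:/|$)")

def app_id_from_path(path):
    """Return the <id> part of .../inf/inf_<id>_pvt/..., or None."""
    match = APP_PATH.search(path)
    return match.group(1) if match else None

print(app_id_from_path("/ai2/data/prod/admin/inf/inf_2_pvt/error"))  # 2
```

Inside the glob loop, the same call can be applied to each `path` to find out which app a file belongs to.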
Hello, I'm new to Python and I'd like to know how to process a .txt file line by line to copy files specified with wildcards.
Basically, the .txt file looks like this:
bin/
bin/*.txt
bin/*.exe
obj/*.obj
document
binaries
With that information, I'd like to read my .txt file, match each directory, and copy all the files matching the wildcard for that directory. I'd also like to copy the folders listed in the .txt file. What's the most practical way of doing this? Your help is appreciated, thanks.
Here's something to start with...
import glob    # For specifying pathnames with wildcards
import shutil  # For doing common "shell-like" operations.
import os      # For dealing with pathnames

# Grab all the pathnames of all the files matching those specified in `text_file.txt`
matching_pathnames = []
with open('text_file.txt', 'r') as text_file:
    for line in text_file:
        # Strip the trailing newline, otherwise the pattern matches nothing
        matching_pathnames += glob.glob(line.strip())

# Copy all the matched files to the same filename + '.new' at the end
for pathname in matching_pathnames:
    if os.path.isfile(pathname):  # copyfile cannot copy directories
        shutil.copyfile(pathname, '%s.new' % (pathname,))
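To also copy the listed folders, and to copy into a separate destination rather than next to the originals, something along these lines might work. It is a sketch under assumptions: `copy_listed`, `source_root` and `dest_root` are hypothetical names, and each line of the list file is treated as a path or pattern relative to `source_root`.

```python
import glob
import os
import shutil

def copy_listed(list_file, source_root, dest_root):
    """Copy every file or folder named in list_file from source_root to dest_root.

    Hypothetical helper. Each non-empty line is a path or wildcard pattern
    relative to source_root; matched folders are copied with their contents.
    Returns the sorted relative paths that were copied.
    """
    copied = []
    with open(list_file) as handle:
        patterns = [line.strip() for line in handle if line.strip()]
    for pattern in patterns:
        for source in glob.glob(os.path.join(source_root, pattern)):
            relative = os.path.relpath(source, source_root)
            target = os.path.join(dest_root, relative)
            if os.path.isdir(source):
                # dirs_exist_ok requires Python 3.8 or newer
                shutil.copytree(source, target, dirs_exist_ok=True)
            else:
                os.makedirs(os.path.dirname(target), exist_ok=True)
                shutil.copy2(source, target)
            copied.append(relative)
    return sorted(set(copied))
```

The relative layout is preserved, so `bin/*.txt` ends up under `bin/` in the destination.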
You might want to look at the glob and re modules
http://docs.python.org/library/glob.html
I have written a piece of code that is supposed to read the text inside several files located in a directory. These are basically text files, but they do not have any extension. My code is not able to read them:
import glob
import os

corpus_path = 'Reviews/'
for infile in glob.glob(os.path.join(corpus_path, '*.*')):
    review_file = open(infile, 'r').read()
    print(review_file)
To test whether this code works, I added a dummy text file, dummy.txt, which was read because it has an extension. But I don't know what should be done so that files without extensions can be read.
Can someone help me? Thanks
Glob patterns don't work the same way as wildcards on the Windows platform. Just use * instead of *.*, i.e. os.path.join(corpus_path, '*'). Note that * will match every file in the directory (except names that start with a dot); if that's not what you want, you can revise the pattern accordingly.
See the glob module documentation for more details.
Just use * instead of *.*.
The latter requires an extension to be present (more precisely, there needs to be a dot in the filename), the former doesn't.
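The difference can be checked with fnmatch, which implements the pattern rules glob applies to file names. The file names below are made up for the demonstration.

```python
from fnmatch import fnmatchcase

names = ["dummy.txt", "review1", "notes.v2"]

# '*.*' needs a literal dot somewhere in the name; '*' does not.
with_dot = [n for n in names if fnmatchcase(n, "*.*")]
any_name = [n for n in names if fnmatchcase(n, "*")]

print(with_dot)  # ['dummy.txt', 'notes.v2']
print(any_name)  # ['dummy.txt', 'review1', 'notes.v2']
```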
You could search for * instead of *.*, but this will match every file in your directory.
Fundamentally, this means that you will have to handle cases where the file you are opening is not a text file.
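One way to handle that case, as a sketch: skip anything that is not a regular file or does not decode as text. The UTF-8 encoding and the helper name `read_text_files` are assumptions, not something stated in the question.

```python
import glob
import os

def read_text_files(directory):
    """Return {path: text} for files in directory that decode as UTF-8.

    Hypothetical helper: subdirectories and files that are not valid
    UTF-8 text are skipped instead of raising an error.
    """
    texts = {}
    for path in glob.glob(os.path.join(directory, "*")):
        if not os.path.isfile(path):
            continue  # '*' also matches subdirectories
        try:
            with open(path, encoding="utf-8") as handle:
                texts[path] = handle.read()
        except UnicodeDecodeError:
            continue  # most likely a binary file
    return texts
```

With the question's layout this would be called as `read_text_files('Reviews/')`.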
It seems that you need
from os import listdir

for filename in (fn for fn in listdir(corpus_path) if '.' not in fn):
    pass  # do something
You could write
from os import listdir

for fn in listdir(corpus_path):
    if '.' not in fn:
        pass  # do something
but the former, with a generator, spares one indentation level.