Replacing part of a file name in a directory using python - python

I am trying to rename a set of files in a directory using Python. The files are currently labelled with a Pool number, AR number and S number (e.g. Pool1_AR001_S13__fw_paired.fastq.gz). Each file refers to a specific plant sequence name. I would like to rename these files by removing the 'Pool_AR_S' part and replacing it with the sequence name, e.g. 'Lbienne_dor5_GS1', while leaving the suffix (e.g. fw_paired.fastq.gz, rv_unpaired.fastq.gz) unchanged. I am trying to read the files into a dictionary, but I am stuck as to what to do next. I have a .txt file containing the necessary information in the following format:
Pool1_AR010_S17 - Lbienne_lla10_GS2
Pool1_AR011_S18 - Lbienne_lla10_GS3
Pool1_AR020_S19 - Lcampanulatum_borau4_T_GS1
The code I have so far is:
from optparse import OptionParser
import csv
import os
parser = OptionParser()
parser.add_option("-w", "--wanted", dest="w")
parser.add_option("-t","--trimmed", dest="t")
parser.add_option("-d", "--directory", dest="working_dir", default="./")
(options, args) = parser.parse_args()
wanted_file = options.w
trimmomatic_output = options.t
#Read the wanted file and create a dictionary of index vs species identity
with open(wanted_file, 'rb') as species_sequence:
    species_list = list(csv.DictReader(species_sequence, delimiter='-'))
    print species_list
#Rename the Trimmomatic Output files according to the dictionary
for trimmed_sequence in os.listdir(trimmomatic_output):
    os.rename(os.path.join(trimmomatic_output, trimmed_sequence),
              os.path.join(trimmomatic_output, trimmed_sequence.replace(species_list[0], species_list[1])))
Please can you help me to replace part of the file name? I'm very new to Python and to Stack Overflow, so I am sorry if this question has been asked before or if I have asked this in the wrong place.

First job is to get rid of all those modules. They may be nice, but for a job like yours they are very unlikely to make things easier.
Create a .py file in the directory where those .gz files reside.
import os
files = os.listdir()  # files is a list of the names in the current directory
# 'txt_file' is the path of your .txt file containing those conversions
dic = parse_txt(txt_file)  # body of parse_txt() omitted; it should return a dictionary built from that .txt file
for f in files:
    if '__' not in f:
        continue  # skip files (such as this script) that do not follow the naming pattern
    # e.g. "Pool1_AR001_S13__fw_paired.fastq.gz"
    # assuming prefix and suffix are divided by a double underscore
    pre, suf = f.split('__', 1)
    if pre in dic:
        os.rename(f, dic[pre] + '__' + suf)
If you need help with parse_txt() function, let me know.
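The answer leaves parse_txt() undefined. A minimal sketch of what it might look like follows, assuming each line of the .txt file has the form shown in the question ("old prefix - new name", separated by a hyphen with a space on either side). The parsing rule is an assumption based on the three sample lines, not something the answer's author specified.

```python
def parse_txt(txt_file):
    """Return a dict mapping old prefixes to new sequence names.

    Assumes lines such as 'Pool1_AR010_S17 - Lbienne_lla10_GS2'.
    """
    mapping = {}
    with open(txt_file) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            # split on the first ' - ' only, so later hyphens stay in the name
            old, separator, new = line.partition(' - ')
            if separator:
                mapping[old.strip()] = new.strip()
    return mapping
```

Lines that do not contain the separator are skipped silently; raising an error instead may be preferable if the file is expected to be clean.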

Here is a solution that I tested with Python 2. It's fine if you use your own logic instead of the get_mappings function. Refer to the comments in the code for explanation.
import os
def get_mappings():
    mappings_dict = {}
    with open('wanted_file.txt', 'r') as f:
        for line in f:
            # if you have Pool1_AR010_S17 - Lbienne_lla10_GS2
            # it becomes a list i.e ['Pool1_AR010_S17 ', ' Lbienne_lla10_GS2']
            # note that there may be spaces before/after the names as shown above
            text = line.split('-')
            # strip is used to remove spaces in the names
            mappings_dict[text[0].strip()] = text[1].strip()
    return mappings_dict
#PROGRAM EXECUTION STARTS FROM HERE
#assuming all files are in the current directory
# if not replace the dot(.) with the path of the directory where you have the files
files = os.listdir('.')
wanted_names_dict = get_mappings()
for filename in files:
    try:
        # prefix='Pool1_AR010_S17', suffix='fw_paired.fastq.gz'
        prefix, suffix = filename.split('__')
        new_filename = wanted_names_dict[prefix] + '__' + suffix
        os.rename(filename, new_filename)
        print 'renamed', filename, 'to', new_filename
    except (ValueError, KeyError):
        # ValueError: no '__' in the name; KeyError: prefix not in the mapping
        print 'No new name defined for file:' + filename

Related

List of windows path in configparse file in python

I am writing a script to back up a list of paths, inspired by https://github.com/Johanndutoit/Zip-to-FTP-Backup/blob/master/backup_to_ftp.py
So I have an ini file
[folders]
/home/david/docs
/home/david/images
/home/david/videos
[ftp]
username=etc
password=pwd
The code to read it is:
config = configparser.ConfigParser(allow_no_value=True)
config.optionxform = lambda option: option # preserve case for letters
config.read('backupcfg.ini')
filelistings = [] # All Files that will be added to the Archive
# Add Files From Locations
for folder in config.options("folders"):
    filelistings.append(str(folder.strip("'")))
The problem is I can't find a way to read it as raw when I'm running it on windows, with folders like
[folders]
Z:\Desktop\winpython
I can't escape the backslash. I've ended up including it as:
filelistings = [r'Z:\docs\winpython', r'Z:\images\family']
Is there any way to write the paths in the ini? Can't find a way to read config.options("folders") to return raw.
Thank you for your help!
I have tried to add the path straight to the tar:
config = configparser.ConfigParser(allow_no_value=True)
config.optionxform = lambda option: option # preserve case for letters
config.read('backupcfg.ini')
now = datetime.datetime.now()
zipname = 'backup.' + now.strftime("%Y.%m.%d") + '.tgz'
with tarfile.open(zipname, "w:gz") as tar:
    for name in config.options("folders"):
        print ("Adding "+name)
        tar.add(str(name))
Which reports can't find filename: 'Z'
Is there any way to access that information as it is?
I got the idea for this answer from another answer. The problem isn't with the slashes. It's with the colon (:). By default, configparser uses = and : as delimiters, so Z:\Desktop\winpython is split into the key Z and a value. But you can specify other delimiters to use instead.
backupcfg.ini
[folders]
/home/david/docs
Z:\Desktop\winpython
python code
import configparser
config = configparser.ConfigParser(allow_no_value=True, delimiters=(',',))  # one-element tuple: only ',' is a delimiter
config.optionxform = lambda option: option # preserve case for letters
config.read('backupcfg.ini')
for folder in config.options("folders"):
    print(folder)
output:
/home/david/docs
Z:\Desktop\winpython
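To make the effect of the delimiter visible, here is a small self-contained comparison. It parses the same ini text twice, once with the default delimiters and once with only a comma. The helper name read_folders and the use of read_string (instead of reading backupcfg.ini from disk) are choices made for this illustration.

```python
import configparser

# Same content as backupcfg.ini, kept in a string so the example is self-contained.
INI_TEXT = "[folders]\n/home/david/docs\nZ:\\Desktop\\winpython\n"

def read_folders(ini_text, delimiters):
    """Return the option names of the [folders] section for the given delimiters."""
    config = configparser.ConfigParser(allow_no_value=True, delimiters=delimiters)
    config.optionxform = str  # preserve case for letters
    config.read_string(ini_text)
    return config.options("folders")

default_result = read_folders(INI_TEXT, ('=', ':'))  # configparser's defaults
comma_result = read_folders(INI_TEXT, (',',))        # colon is no longer special

print(default_result)  # the Windows path is cut at the colon, leaving 'Z'
print(comma_result)    # the full path survives
```

With the defaults, the second entry comes back as just Z, which matches the "can't find filename: 'Z'" error in the question.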

Applying a UDF into a for loop - Python

Example of PDF: "Smith#00$Consolidated_Performance.pdf"
The goal is to add a bookmark to page 1 of each PDF based on the filename.
(Bookmark name in example would be "Consolidated Performance")
import os
from openpyxl import load_workbook
from PyPDF2 import PdfFileMerger
cdir = "Directory of PDF" # Current directory
pdfcdir = [filename for filename in os.listdir(cdir) if filename.endswith(".pdf")]
def addbookmark(f):
    output = PdfFileMerger()
    name = os.path.splitext(os.path.basename(f))[0] # Split filename from .pdf extension
    dp = name.index("$") + 1 # Find position of $ sign
    bookmarkname = name[dp:].replace("_", " ") # replace underscores with spaces
    output.addBookmark(bookmarkname, 0, parent=None) # Add bookmark
    output.append(open(f, 'rb'))
    output.write(open(f, 'wb'))
for f in pdfcdir:
    addbookmark(f)
The UDF works fine when applied to individual PDFs, but it won't add the bookmarks when put into the loop at the bottom of the code. Any ideas on how to make the UDF loop through all PDFs within pdfcdir?
I'm pretty sure that the issue you're having has nothing to do with the loop. Rather, you're passing just the filenames without the directory path, so the function tries to open these files in the script's current working directory (by default, the directory the script was launched from) rather than in the directory you read the filenames from.
So, join the directory name with each file name when calling your function.
for f in pdfcdir:
    addbookmark(os.path.join(cdir, f))
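The underlying point is that os.listdir returns bare names, not paths. A small sketch that builds the full paths up front may make this easier to see; the helper name list_with_directory is made up for this example.

```python
import os

def list_with_directory(directory, extension):
    """Return full paths of the entries in `directory` that end with `extension`.

    os.listdir() yields bare names such as 'a.pdf', so each one is joined
    with the directory before it is handed to code that opens the file.
    """
    return [
        os.path.join(directory, name)
        for name in sorted(os.listdir(directory))
        if name.endswith(extension)
    ]
```

Passing the returned paths to addbookmark would then work regardless of the current working directory.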

Python3 Rename files in a directory importing the new names from a txt file

I have a directory containing multiple files.
The name of the files follows this pattern 4digits.1.4digits.[barcode]
The barcode identifies each file and is composed of 7 letters.
I have a txt file where one column holds the barcode and the other column holds the real name of the file.
What I would like to do is write a Python script that automatically renames each file, according to its barcode, to the new name written in the txt file.
Is there anybody that could help me?
Thanks a lot!
I will give you the logic:
1. Read the text file that contains the barcode and name: http://www.pythonforbeginners.com/files/reading-and-writing-files-in-python
For each line in the txt file, do as follows:
2. Assign the values in the first (barcode) and second (name) columns to two separate variables, say 'B' and 'N'.
3. Find the file whose name contains the barcode 'B'. The question "Find a file in python" will help you do that (first answer, third example; in your case the pattern to search for will be like '*B').
4. The previous step gives you the filename that has B as a part. Now use the rename() function to rename the file to 'N': http://www.tutorialspoint.com/python/os_rename.htm
Suggestion: instead of a txt file with two columns, you could use a csv file, which would be easier to handle.
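The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: the txt file has two whitespace-separated columns (barcode first, new name second), the barcode is the last dot-separated part of each file name, and the function name rename_by_barcode is invented here.

```python
import glob
import os

def rename_by_barcode(directory, mapping_file):
    """Rename files ending in '.<barcode>' to the name listed for that barcode."""
    renamed = []
    with open(mapping_file) as handle:
        for line in handle:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            barcode, new_name = parts  # steps 1 and 2: 'B' and 'N'
            # step 3: find the file whose name ends with the barcode
            pattern = os.path.join(glob.escape(directory), '*.' + glob.escape(barcode))
            matches = glob.glob(pattern)
            if len(matches) != 1:
                continue  # nothing found, or ambiguous: leave files untouched
            # step 4: rename it
            os.rename(matches[0], os.path.join(directory, new_name))
            renamed.append((os.path.basename(matches[0]), new_name))
    return renamed
```

Skipping ambiguous matches is a cautious choice made for this sketch; a real script might prefer to report them.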
The following code will do the job for your specific use case, though it could be turned into a more general-purpose renamer.
import os # os is a library that gives us the ability to make OS changes
def file_renamer(list_of_files, new_file_name_list):
    for file_name in list_of_files:
        for (new_filename, barcode_infile) in new_file_name_list:
            # as per the mentioned filename pattern -> xxxx.1.xxxx.[barcode]
            barcode_current = file_name[12:19] # extracting the barcode from current filename
            if barcode_current == barcode_infile:
                os.rename(file_name, new_filename) # renaming step
                print('Successfully renamed %s to %s ' % (file_name, new_filename))
if __name__ == "__main__":
    path = os.getcwd() # assuming that you'll be executing the script while in the files directory
    file_dir = os.path.abspath(path)
    newname_file = input('enter file with new names - or the complete path: ')
    path_newname_file = os.path.join(file_dir, newname_file)
    new_file_name_list = []
    with open(path_newname_file) as file:
        for line in file:
            x = line.strip().split(',')
            new_file_name_list.append(x)
    list_of_files = os.listdir(file_dir)
    file_renamer(list_of_files, new_file_name_list)
Assumptions:
newnames.txt - comma-separated (new name, barcode)
0000.1.0000.1234567,1234567
0000.1.0000.1234568,1234568
0000.1.0000.1234569,1234569
0000.1.0000.1234570,1234570
0000.1.0000.1234571,1234571
Files
1111.1.0000.1234567
1111.1.0000.1234568
1111.1.0000.1234569
were renamed to
0000.1.0000.1234567
0000.1.0000.1234568
0000.1.0000.1234569
The terminal output:
>python file_renamer.py
enter file with new names: newnames.txt
The list of files - ['.git', '.idea', '1111.1.0000.1234567', '1111.1.0000.1234568', '1111.1.0000.1234569', 'file_renamer.py', 'newnames.txt.txt']
Successfully renamed 1111.1.0000.1234567 to 0000.1.0000.1234567
Successfully renamed 1111.1.0000.1234568 to 0000.1.0000.1234568
Successfully renamed 1111.1.0000.1234569 to 0000.1.0000.1234569

Python 3x - Remove numbers from file names

import os
def rename_files():
    # (1) get file names from a folder
    file_list = os.listdir(r"C:\Users\USEER\Desktop\Udacity\Udacity - Programming Foundation with Python\Project\prank\prank")
    # print(file_list)
    saved_path = os.getcwd()
    print("Current Working Directory is " + saved_path)
    os.chdir(r"C:\Users\USEER\Desktop\Udacity\Udacity - Programming Foundation with Python\Project\prank\prank")
    # (2) for each file, rename file name
    for file_name in file_list:
        print("Old Name - " + file_name)
        print("New Name - " + file_name.translate("0123456789"))
        os.rename(file_name, file_name.translate("0123456789"))
    os.chdir(saved_path)
rename_files()
The code above doesn't rename the file by removing the integers. Can anyone help? (Python 3x)
import re
new_name = re.sub('[0-9]', '', file_name)
In Python 3 the string.translate function is gone, so you need to use the str.translate method. It needs a table built by str.maketrans, whose first two arguments define character replacements (not needed in this example) and whose third argument lists the characters to be deleted.
This line should have the desired effect:
os.rename(file_name, file_name.translate(str.maketrans('','','0123456789')))
Previous suggestions used .strip(); however, since the numbers are mixed in with the filenames (not only before or after), I believe it would not work. Another used regular expressions, which is perfectly valid; however, within the context of this particular Udacity course, translate was the suggested solution.
Here are the docs for maketrans:
https://docs.python.org/3/library/stdtypes.html#str.maketrans
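To compare the three approaches mentioned in this thread side by side, here is a short sketch that applies each to the same name. The sample file name and the helper function names are made up for illustration.

```python
import re

DIGITS = '0123456789'

def remove_digits_translate(name):
    """Delete every digit using str.translate and a deletion table."""
    return name.translate(str.maketrans('', '', DIGITS))

def remove_digits_regex(name):
    """Delete every digit using a regular expression."""
    return re.sub(r'[0-9]', '', name)

def remove_digits_strip(name):
    """strip() only trims digits at the two ends of the string."""
    return name.strip(DIGITS)

sample = '16los angeles9.jpg'  # hypothetical file name
print(remove_digits_translate(sample))  # digits removed everywhere
print(remove_digits_regex(sample))      # same result as translate
print(remove_digits_strip(sample))      # the 9 before '.jpg' is left in place
```

The strip variant leaves the inner digit because the name ends in "g", so only the leading digits are trimmed.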
The problem is in your translate call, which doesn't do anything. There are better options available, but if you want to use translate then the proper syntax (in Python 2) is:
#!/usr/bin/env python2
import string
new_name = string.translate(file_name, None, "0123456789")
Here is one way of renaming files.
Use os.renames to rename the files.
Use file_name.strip("0123456789") to remove numbers. Note that strip only removes digits at the start and end of the name, not digits in the middle.
Code is given below:
import os
def file_rename():
    name_list=os.listdir(r"C:\python\prank")
    print(name_list)
    saved_path=os.getcwd()
    print("Current working directory is"+saved_path)
    os.chdir(r"C:\python\prank")
    for file_name in name_list:
        print("old name"+file_name)
        print("new name"+file_name.strip("0123456789"))
        os.renames(file_name,file_name.strip("0123456789"))
    os.chdir(saved_path)
file_rename()
To read more about os.renames, see the documentation of the os module.
To read more about the strip function, see the documentation of str.strip.
Another way to do this without importing re: instead of using .translate, use .strip (which only removes leading and trailing digits).
os.rename(file_name, file_name.strip('0123456789'))
Another observation is that your code won't read the new file name after changing it. At the top of your code you are reading file names and saving those names in file_list
# (1) get file names from a folder
file_list = os.listdir(r"C:\Users\USEER\Desktop\Udacity\Udacity - Programming Foundation with Python\Project\prank\prank")
In the for loop, where you are changing the name of each file, you are not reading the new file's name. You need to do something like this.
# (2) for each file, rename file name
for file_name in file_list:
    print("Old Name - " + file_name)
    os.rename(file_name, file_name.strip("0123456789"))
# (3) read file's name again... 'file_list' has old names
new_file_list = os.listdir(r"C:\Users\USEER\Desktop\Udacity\Udacity - Programming Foundation with Python\Project\prank\prank")
for file_name in new_file_list:
    print("New file's name: " + file_name)
os.chdir(saved_path)
import os
def rename_files():
    #1 Get file names from the folder
    file = os.listdir(r"C:\Web\Python\prank")
    print(file)
    saved_path = os.getcwd()
    print("Current Working Directory is"+saved_path)
    os.chdir(r"C:\Web\Python\prank")
    #2 For each file name rename file names
    for file_name in file:
        print("Old Name - " + file_name)
        new_name = file_name.strip("0123456789")
        os.rename(file_name, new_name)
        print("New Name - " + new_name)
    os.chdir(saved_path)
rename_files()
import os
dir="/home/lucidvis/myPythonHome/prank/"
def rename_files():
    # get file names from a folder
    filenames = os.listdir(dir)
    # print(filenames)
    for file in filenames:
        #print(file)
        try:
            #with os.open(filename for filename in filenames,"r+"):
            #read_file = filename.read()
            # new_name = file.translate(str.maketrans('','', "0123456789"))
            new_file_name = (file.translate(str.maketrans('','', "0123456789"))).replace(" ","")
            #print(dir+new_file_name)
            os.rename(dir+file,dir+new_file_name)
        except OSError as e:
            print(e)
            continue
# for each file, rename filename
rename_files()
import os
dir="/home/lucidvis/myPythonHome/prank/"
def rename_files():
    # get file names from a folder
    filenames = os.listdir(dir)
    # iterate through the list of filenames
    for file in filenames:
        #print(file)
        try:
            #assign a variable to new names for easy manipulation
            new_file_name = (file.translate(str.maketrans('','', "0123456789"))).replace(" ","")
            #concatenating the directory name to old and new file names
            os.rename(dir+file,dir+new_file_name)
        #just to manage errors raised by os.rename
        except OSError as e:
            print(e)
            continue
#file renaming function call
rename_files()
Well guys, I was trying to solve this problem because I am following the Udacity online course, which requires renaming files without numbers. Thanks to Simon for his reply, I have figured it out.
This is my code to rename files without numbers; I hope it helps anyone who is stuck.
import os
import re
def rename():
    #get the list of the photo name
    plist = os.listdir(r"D:\pay\prank")
    print(plist)
    #removing the numbers from the photo names
    os.chdir(r"D:\pay\prank")
    for pname in plist :
        os.rename(pname, re.sub('[0-9]', '' , pname))
        print(pname)
rename()
import os
import re
from string import digits
#Get file names
file_list = os.listdir(r"C:\Users\703305981\Downloads\prank\prank")
print(file_list)
#Change directory.
os.chdir(r"C:\Users\703305981\Downloads\prank\prank")
print (os.getcwd())
#Change the Name.
for file_name in file_list:
    os.rename(file_name, re.sub(r'[0-9]+', '', file_name))

How read contents of txt files in different directories and rename other files according to

I just started with Python 3 and ran into the following problem:
I downloaded a good deal of PDFs from different journals for my thesis, but they are all named after their DOI and not in the format “Author (Year) - Title”.
The documents are saved in different directories, according to the journal's name and volume, e.g.:
/Journal 1/
    /Vol. 1/
        file1.pdf
        file1.txt
        file2.pdf
        file2.txt
        filen.pdf
        filen.txt
    /Vol. 2/
        file1.pdf
        file1.txt
/Journal 2/
    ...
Because I have no idea how to read the contents of a PDF with Python, I wrote a very short bash script, that converted the PDFs to simple TXT files. The pdf and txt files have the same name with a different file extension.
I would like to rename all of the PDF files, luckily there is a string in each of the file's continuous text, that I could use. This variable string lies between two static strings:
"Cite this article as: " AUTHOR/YEAR/TITLE ", Journal name".
How do I make Python go into each directory, read the contents of the TXT/PDF, extract the variable string between the two fixed strings and then rename the appropriate PDF file?
If anyone knows how to do this with Python 3, I would be very thankful.
Finally got it to work:
#__author__ = 'Telefonmann'
# -*- coding: utf-8 -*-
import os, re, ntpath, shutil
for root, dirs, files in os.walk(os.getcwd()):
    for file in files: # loops through directories and files
        if file.endswith('.txt'): # only processes txt files
            full_path = ntpath.splitdrive(ntpath.join(root, file))[1]
            # builds correct path under Win 7 (and probably other NT systems)
            with open(full_path, 'r', encoding='utf-8') as f:
                content = f.read().replace('\n', '') # remove newline
            # raw string so that \s reaches the regex engine unchanged
            r = re.compile(r'To\s*cite\s*this\s*article:\s*(.*?),\s*Journal\s*of\s*Quantitative\s*Linguistics\s*,')
            m = r.search(content)
            # finds substring in between "To cite this article: " and "Journal of Quantitative Linguistics,"
            # also finds typos like "Journal ofQuantitative ..."
            if m:
                full_title = m.group(1)
                print("full_title: {0}".format(full_title))
                full_title = (full_title.replace('<','') # removes/replaces forbidden characters in Windows file names
                              .replace('>','')
                              .replace(':',' -')
                              .replace('"','')
                              .replace('/','')
                              .replace('\\','')
                              .replace('|','')
                              .replace('?','')
                              .replace('*',''))
                pdf_name = ntpath.splitext(full_path)[0] + '.pdf'
                # since txt and pdf files only differ in their extension, swap .txt for .pdf
                # to get the right name (only the extension, not other 'txt' in the path)
                print('File: '+ file)
                print('Full Path: ' + full_path)
                print('Full Title: ' + full_title)
                print('PDF Name: ' + pdf_name)
                print('....................................')
                # for troubleshooting
                dirname = ntpath.dirname(pdf_name)
                new_path = ntpath.join(dirname, "{0}.pdf".format(full_title))
                if ntpath.exists(full_path):
                    print("all paths found")
                    shutil.copy(pdf_name, new_path)
                    # makes a copy of the pdf file with the new name in the respective directory
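The core of the script above, pulling out the text between two fixed markers and making it safe as a file name, can be isolated into two small functions. This is a sketch for illustration only: the function names are invented, the markers are passed in as plain strings rather than the whitespace-tolerant regex used above, and forbidden characters are simply deleted (the script above replaces ':' with ' -' instead).

```python
import re

# Characters that Windows does not allow in file names.
FORBIDDEN = '<>:"/\\|?*'

def extract_between(text, start_marker, end_marker):
    """Return the text between the two markers, or None if they are not found."""
    pattern = re.escape(start_marker) + r'\s*(.*?)\s*' + re.escape(end_marker)
    match = re.search(pattern, text, flags=re.DOTALL)
    if match is None:
        return None
    # collapse line breaks and repeated spaces into single spaces
    return ' '.join(match.group(1).split())

def safe_filename(title):
    """Delete characters that are not allowed in Windows file names."""
    return title.translate(str.maketrans('', '', FORBIDDEN)).strip()
```

For example, extract_between(content, 'Cite this article as:', ', Journal name') would return the AUTHOR/YEAR/TITLE part described in the question, which can then be passed through safe_filename before renaming.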
