Replacing text in set of files, and writing to new files - python

I have a set of .SQL scripts in a folder (maybe 20 or so files). I want to search every file in that folder, replace 'ABC' with 'ABCD' and 'XYZ' with 'WXYZ', and then save each processed file in a different folder (path_2 in the example below) under the same file name as the original.
I know this doesn't work; what tweaks are needed?
import sys
def main():
path = "C:/path/to/input/folder"
path_2 = "C:/path/to/input/folder"
def replace_text(replacements):
replacements = {'ABC': 'ABCD', 'XYZ':'WXYZ'}
path_2.write()
for filename in path:
if filename.endswith('.sql')
open(filename,'r')
replace_text()
if __name__ == "__main__":
main()

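You never actually get a filename: for filename in path iterates over the characters of the path string. You could use os.listdir(path), or glob.glob(os.path.join(path, "*.sql")) might be easier (a bare glob.glob("*.sql") searches the current working directory).
if filename.endswith('.sql') needs to end with a colon.
Below that, open and replace_text need to be indented.
The contents of main need to be indented.
def replace_text should not be inside main.
You open the file; you should then .read() the contents and pass them to replace_text.
replace_text doesn't do anything; for each key, value pair in replacements.items() you should do text = text.replace(key, value) (strings are immutable, so replace returns a new string).
replace_text should then return the updated text to main.
main should then save the updated text to a file in path_2 (path_2 is just a string, so path_2.write() won't work; open a file in that folder and write to it instead).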
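Putting those steps together, a minimal sketch might look like the following. It keeps replace_text as a separate function that takes the text and the replacements dictionary and returns the result. The helper name process_folder is hypothetical, and matching only lower-case .sql extensions via glob is an assumption: on case-sensitive file systems such as Linux, files ending in .SQL would be skipped.

```python
import glob
import os


def replace_text(text, replacements):
    """Apply every old -> new pair in `replacements` to `text` and return the result."""
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text


def process_folder(path, path_2, replacements):
    """Read each .sql file in `path`, apply the replacements, save it under the same name in `path_2`."""
    os.makedirs(path_2, exist_ok=True)  # create the output folder if it is missing
    for full_path in glob.glob(os.path.join(path, "*.sql")):
        with open(full_path, "r") as f:
            text = f.read()
        new_text = replace_text(text, replacements)
        # keep the original file name, but put it in the output folder
        new_path = os.path.join(path_2, os.path.basename(full_path))
        with open(new_path, "w") as f_new:
            f_new.write(new_text)
```

For the question's folders you would call process_folder("C:/path/to/input/folder", "C:/path/to/output/folder", {'ABC': 'ABCD', 'XYZ': 'WXYZ'}). The pairs are applied in dictionary order, so if one replacement's output contains another key, that text is replaced again.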

There are many syntax errors in your code; @Hugh-Bothwell pointed them out above. Also, both paths are the same.
If you want to do just these two replacements, I think there is no need to create a separate function or a dictionary.
The following code should work:
import os

def main():
    path = "C:/path/to/input/folder"
    path_2 = "C:/path/to/output/folder"
    for filename in os.listdir(path):
        if filename.endswith('.sql'):
            # Getting full file paths to read/write
            full_file_path = os.path.join(path, filename)
            new_file_path = os.path.join(path_2, filename)
            with open(full_file_path, "r") as f:
                content = f.read()
            content = content.replace("ABC", "ABCD").replace("XYZ", "WXYZ")
            # This will save all the files to the new location, even if there is no change.
            # To save only changed files, copy the content to a temp variable and check
            # if there is any change before saving.
            with open(new_file_path, "w") as f_new:
                f_new.write(content)

if __name__ == "__main__":
    main()
Note:
The path to the new location must already exist, or you can create it using os.makedirs(path_2).
The replacements are case-sensitive.
It will also replace occurrences inside other words, e.g. MNABCS becomes MNABCDS (and an existing ABCD becomes ABCDD).
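If replacing inside other words is a problem, one option (a swapped-in technique, not part of the answer above) is re.sub with \b word boundaries. The sketch below also applies the answer's suggestion of saving only files that actually changed. The names PATTERNS, replace_whole_words and convert_file are hypothetical. Note that \b treats letters, digits and underscores as word characters, so ABC_1 would not be matched either.

```python
import re

# \b matches only at a word boundary, so "MNABCS" (and an existing "ABCD") is left
# untouched, while "ABC" standing on its own (e.g. "FROM ABC;") is replaced.
PATTERNS = [
    (re.compile(r"\bABC\b"), "ABCD"),
    (re.compile(r"\bXYZ\b"), "WXYZ"),
]


def replace_whole_words(content):
    """Apply each pattern in turn and return the updated text."""
    for pattern, new in PATTERNS:
        content = pattern.sub(new, content)
    return content


def convert_file(src_path, dst_path):
    """Write the converted file only if something changed; return True if it was written."""
    with open(src_path, "r") as f:
        original = f.read()
    updated = replace_whole_words(original)
    if updated == original:
        return False  # nothing to replace, so skip writing
    with open(dst_path, "w") as f_new:
        f_new.write(updated)
    return True
```

Passing re.IGNORECASE to re.compile would also make the matching case-insensitive, if that is what you need.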

Related

read all text files in directory and subdirectories and extract all email addresses from the textfiles

I need to extract all email addresses from all text files within a directory that has a lot of subdirectories. It is too much work to do this manually, so I wrote the Python script below to automate the task. However, when I execute the script, I end up with an empty array printed, and no errors are shown. Can someone please point out what I'm doing wrong?
# Import Module
import os
import re
# Folder Path
path = "path to the root directory"
# Change the directory
os.chdir(path)
# create list and index to add the emails
new_list = []
idx = 0
# I create a method to add all email addresses from within the subdirectories to an array
def read_text_file(file_path):
    with open(file_path, 'r') as f:
        emails = re.findall(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+", str(f))
        new_list.insert(idx, emails)
        idx + 1
# iterate through all files and call the method from above
for file in os.listdir():
    # Check whether file is in text format or not
    if file.endswith(".txt"):
        p = f"{path}\{file}"
        # call read text file function
        read_text_file(p)
# print the array
print (new_list)

To check subdirectories as you said you want, you need to check whether the current item in the os.listdir() list is a folder and, if so, check all the files in that folder (and any folders inside it, recursively), or whether it is a file that ends with .txt.
You also need to read() the file (f.read()) and only then pass the text to re.findall(); str(f) gives you a description of the file object, not its contents.
# Import Module
import os
import re
# Folder Path
PATH = r"path to folder"  # constants are UPPER CASED LETTERS
# create list to add the emails
new_list = []

def read_text_file(file_path):
    global new_list  # needed because += rebinds the name
    with open(file_path, 'r') as f:
        emails = re.findall(r"[a-z0-9.\-+_]+@[a-z0-9.\-+_]+\.[a-z]+", f.read())
        new_list += emails

def find_all_text_files(path):
    # iterate through all files and call the method from above
    # "\\" is the Windows path separator; os.path.join would be portable
    path = path if path.endswith("\\") else path + "\\"
    for file_or_dir in os.listdir(path):
        # Check whether file is in text format or not
        if os.path.isfile(path + file_or_dir) and file_or_dir.endswith(".txt"):
            file_path = path + file_or_dir
            # call read text file function
            read_text_file(file_path)
        # if the current item is a dir, search it too
        elif os.path.isdir(path + file_or_dir):
            new_path = path + file_or_dir
            find_all_text_files(new_path)

def main():
    find_all_text_files(PATH)
    # print the array
    print(new_list)

if __name__ == '__main__':
    main()
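A shorter alternative, sketched here rather than taken from the answer, is to let os.walk() handle the recursion instead of writing it by hand. The function name find_emails is hypothetical, re.IGNORECASE is added so upper-case addresses also match, and errors="ignore" (an assumption about your files) skips bytes that can't be decoded instead of raising an error.

```python
import os
import re

# Same pattern as above; re.IGNORECASE also catches upper-case addresses.
EMAIL_RE = re.compile(r"[a-z0-9.\-+_]+@[a-z0-9.\-+_]+\.[a-z]+", re.IGNORECASE)


def find_emails(root):
    """Walk `root` and every subdirectory, returning all email addresses found in .txt files."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".txt"):
                # dirpath already includes root, so joining it with the name gives a usable path
                with open(os.path.join(dirpath, name), "r", errors="ignore") as f:
                    found.extend(EMAIL_RE.findall(f.read()))
    return found
```

Calling print(find_emails(r"path to folder")) would replace the whole find_all_text_files()/main() pair; no os.chdir() or global list is needed.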

When your code doesn't produce the expected results, you need to debug it to find the cause. Is it your regex? Is it your list.insert()? Is it converting the file handle f directly to a string? Something else entirely? Let's find out by modifying your read_text_file() function to print its status:
def read_text_file(file_path):
    print(f"attempting to parse {file_path}")
    with open(file_path, 'r') as f:
        huge_line = f.read()
    emails = re.findall(r"[a-z0-9.\-+_]+@[a-z0-9.\-+_]+\.[a-z]+", huge_line)
    print(f"found {len(emails)} emails in file {file_path}")
    new_list.extend(emails)
I dropped the idx variable and just used list.extend(). Try that out, see where it fails, then add more print statements as needed to narrow it down.
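To confirm the root cause both answers point to, a quick check like the following compares what re.findall() finds when given str(f) versus f.read(). It is a sketch; the helper name compare_inputs is hypothetical.

```python
import re

EMAIL_RE = re.compile(r"[a-z0-9.\-+_]+@[a-z0-9.\-+_]+\.[a-z]+")


def compare_inputs(file_path):
    """Return (matches from str(f), matches from f.read()) for the same open file."""
    with open(file_path, "r") as f:
        # str(f) is only the file object's repr, e.g. "<_io.TextIOWrapper name=... mode='r' ...>"
        from_repr = EMAIL_RE.findall(str(f))
        # f.read() returns the actual contents of the file
        from_text = EMAIL_RE.findall(f.read())
    return from_repr, from_text
```

For a file containing an address, the first list comes back empty and the second contains the address, which matches the empty array in the question.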

Reading file in the same folder - Improvement?

I am writing a Python script. I was having problems opening a file: the error was always that the system could not find the file.
Because of that, I tried getting the script's path, replacing backslashes, and so on.
Are there any improvements for working with a file in the same folder?
The Code
import os
# The name of the txt file that is in the same folder.
myFile = 'noticia.txt'
# Getting the active script
diretorio = os.path.dirname(os.path.abspath(__file__))
# Replace BackSlash and concatenate myFile
correctPath = diretorio.replace("\\", "/") + "/" + myFile
# Open file
fileToRead = open(correctPath, "r")
# Store text in a variable
myText = fileToRead.read()
# Print
print(myText)
Note:
The script is in the same folder as the txt file.

Are there any improvements for working with a file in the same folder?
First off, please see PEP 8 for standard conventions on variable names.
correctPath = diretorio.replace("\\", "/") + "/" + myFile
While forward slashes are preferred when you specify a new path in your code, there is no need to replace the backslashes in a path that Windows gives you. Python and/or Windows will translate behind the scenes as necessary.
However, it would be better to use os.path.join to combine the path components (something like correct_path = os.path.join(diretorio, my_file)).
fileToRead = open(correctPath, "r")
# Store text in a variable
myText = fileToRead.read()
It is better to use a with block to manage the file, which ensures that it is closed properly, like so:
with open(correct_path, 'r') as f:
    my_text = f.read()
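On Python 3.5+, pathlib offers a compact alternative to os.path for the same task. This is a sketch; the function name read_next_to and the utf-8 default encoding are assumptions (pass the encoding your file actually uses).

```python
from pathlib import Path


def read_next_to(script_path, filename, encoding="utf-8"):
    """Read `filename` from the folder that contains `script_path` (normally pass __file__)."""
    # resolve() makes the path absolute, so this works no matter what the current directory is
    folder = Path(script_path).resolve().parent
    # the / operator joins path components with the right separator for the OS
    return (folder / filename).read_text(encoding=encoding)
```

In the question's script this would be used as my_text = read_next_to(__file__, "noticia.txt"), with no backslash replacement needed.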

Replacing part of a file name in a directory using python

I am trying to rename a set of files in a directory using Python. The files are currently labelled with a Pool number, AR number and S number (e.g. Pool1_AR001_S13__fw_paired.fastq.gz). Each file refers to a specific plant sequence name. I would like to rename these files by removing the 'Pool_AR_S' part and replacing it with the sequence name (e.g. 'Lbienne_dor5_GS1') while leaving the suffix (e.g. fw_paired.fastq.gz, rv_unpaired.fastq.gz) intact. I am trying to read the file into a dictionary, but I am stuck on what to do next. I have a .txt file containing the necessary information in the following format:
Pool1_AR010_S17 - Lbienne_lla10_GS2
Pool1_AR011_S18 - Lbienne_lla10_GS3
Pool1_AR020_S19 - Lcampanulatum_borau4_T_GS1
The code I have so far is:
from optparse import OptionParser
import csv
import os
parser = OptionParser()
parser.add_option("-w", "--wanted", dest="w")
parser.add_option("-t","--trimmed", dest="t")
parser.add_option("-d", "--directory", dest="working_dir", default="./")
(options, args) = parser.parse_args()
wanted_file = options.w
trimmomatic_output = options.t
#Read the wanted file and create a dictionary of index vs species identity
with open(wanted_file, 'rb') as species_sequence:
    species_list = list(csv.DictReader(species_sequence, delimiter='-'))
    print species_list
#Rename the Trimmomatic Output files according to the dictionary
for trimmed_sequence in os.listdir(trimmomatic_output):
    os.rename(os.path.join(trimmomatic_output, trimmed_sequence),
              os.path.join(trimmomatic_output, trimmed_sequence.replace(species_list[0], species_list[1]))
Please can you help me to replace half of the . I'm very new to python and to stack overflow, so I am sorry if this question has been asked before or if I have asked this in the wrong place.

The first job is to get rid of all those modules. They may be nice, but for a job like yours they are very unlikely to make things easier.
Create a .py file in the directory where those .gz files reside.
import os

files = os.listdir()  # files is a list of the names in the current directory
# 'txt_file' is the path of your .txt file containing those conversions
dic = parse_txt(txt_file)  # body of parse_txt() omitted; it should return a dictionary built from that .txt file
for f in files:
    if '__' not in f:
        continue  # skip this script, the .txt file and anything else without a double underscore
    # "Pool1_AR001_S13__fw_paired.fastq.gz" -> "Pool1_AR001_S13", "fw_paired.fastq.gz"
    # (assuming prefix and suffix are separated by a double underscore)
    pre, suf = f.split('__', 1)
    pre = dic[pre]
    os.rename(f, pre + '__' + suf)
If you need help with parse_txt() function, let me know.
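Since parse_txt() was left out, here is one possible sketch. It assumes each non-blank line has the form old_prefix - new_name with a space on each side of the hyphen, as in the sample in the question. Splitting on " - " with maxsplit=1 (rather than on "-" alone) keeps it working if a name ever contains a hyphen without surrounding spaces.

```python
def parse_txt(txt_file):
    """Build {old_prefix: new_name} from lines like 'Pool1_AR010_S17 - Lbienne_lla10_GS2'."""
    mapping = {}
    with open(txt_file, "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            # split only on the first " - " separator
            old, new = line.split(" - ", 1)
            mapping[old.strip()] = new.strip()
    return mapping
```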

Here is a solution that I tested with Python 2. It's fine if you use your own logic instead of the get_mappings function. Refer to the comments in the code for an explanation.
import os

def get_mappings():
    mappings_dict = {}
    with open('wanted_file.txt', 'r') as f:
        for line in f:
            # if you have Pool1_AR010_S17 - Lbienne_lla10_GS2
            # it becomes a list, i.e. ['Pool1_AR010_S17 ', ' Lbienne_lla10_GS2']
            # note that there may be spaces before/after the names as shown above
            text = line.split('-')
            # strip() is used to remove the spaces around the names
            mappings_dict[text[0].strip()] = text[1].strip()
    return mappings_dict

# PROGRAM EXECUTION STARTS FROM HERE
# assuming all files are in the current directory;
# if not, os.chdir() into that directory first, or join the directory path
# to each filename with os.path.join before renaming
files = os.listdir('.')
wanted_names_dict = get_mappings()
for filename in files:
    try:
        # prefix='Pool1_AR010_S17', suffix='fw_paired.fastq.gz'
        prefix, suffix = filename.split('__')
        new_filename = wanted_names_dict[prefix] + '__' + suffix
        os.rename(filename, new_filename)
        print 'renamed', filename, 'to', new_filename
    except (ValueError, KeyError):
        # ValueError: the name has no single '__'; KeyError: prefix not in the mapping file
        print 'No new name defined for file:' + filename

Find fileS and then find a string in those files

I have written a function that finds all of the version.php files in a path. I am trying to take the output of that function and find a line from that file. The function that finds the files is:
def find_file():
    for root, folders, files in os.walk(acctPath):
        for file in files:
            if file == 'version.php':
                print os.path.join(root,file)
find_file()
There are several version.php files in the path and I would like to return a string from each of those files.
Edit:
Thank you for the suggestions; my implementation of the code didn't fit my needs. I was able to figure it out by building a list and passing each item to the second part. This may not be the best way to do it; I've only been doing Python for a few days.
import os
import re

def cmsoutput():
    fileList = []
    for root, folders, files in os.walk(acctPath):
        for file in files:
            if file == 'version.php':
                fileList.append(os.path.join(root,file))
    for path in fileList:
        with open(path) as f:
            for line in f:
                if line.startswith("$wp_version ="):
                    version_number = line[15:20]
                    inst_path = re.sub('wp-includes/version.php', '', path)
                    version_number = re.sub('\';', '', version_number)
                    print inst_path + " = " + version_number
cmsoutput()
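The fixed slice line[15:20] only works while the version string is at most five characters long; a version such as 4.9.10 would be cut to 4.9.1. A regex that captures whatever sits between the quotes avoids that. This sketch assumes the line looks like $wp_version = '4.9.10'; with single quotes, which is what the re.sub('\';', ...) call above implies; parse_wp_version is a hypothetical name.

```python
import re

# Matches lines such as:  $wp_version = '4.9.10';
VERSION_RE = re.compile(r"^\$wp_version\s*=\s*'([^']+)'")


def parse_wp_version(line):
    """Return the version string from a '$wp_version = ...' line, or None if it doesn't match."""
    match = VERSION_RE.match(line)
    return match.group(1) if match else None
```

Inside the loop, version_number = parse_wp_version(line) would then replace both the slice and the re.sub cleanup.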

Since you want to use the output of your function, you have to return something; printing it does not cut it. Assuming everything else works, it only needs a slight modification (note that this returns only the first version.php it finds, and None if there is none):
import os

def find_file():
    for root, folders, files in os.walk(acctPath):
        for file in files:
            if file == 'version.php':
                return os.path.join(root, file)

foundfile = find_file()
Now variable foundfile contains the path of the file we want to look at. Looking for a string in the file can then be done like so:
with open(foundfile, 'r') as f:
    content = f.readlines()
for lines in content:
    if '$wp_version =' in lines:
        print(lines)
Or, as a function:
def find_in_file(string_to_find, file_to_search):
    with open(file_to_search, 'r') as f:
        content = f.readlines()
    for lines in content:
        if string_to_find in lines:
            return lines

# which you can call like this:
find_in_file("$wp_version =", find_file())
Note that the function version of the code above will stop as soon as it finds one instance of the string you are looking for. If you want to get them all, it has to be modified.
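To collect every version.php file and every matching line, the functions can yield or collect results instead of returning on the first hit. This is a minimal sketch: find_files, find_all_in_file and wp_versions are hypothetical names, and the root directory is passed in as a parameter rather than read from the global acctPath.

```python
import os


def find_files(root, target="version.php"):
    """Yield the full path of every file named `target` under `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if target in filenames:
            yield os.path.join(dirpath, target)


def find_all_in_file(string_to_find, file_to_search):
    """Return every line of the file that contains `string_to_find` (newline stripped)."""
    with open(file_to_search, "r") as f:
        return [line.rstrip("\n") for line in f if string_to_find in line]


def wp_versions(root):
    """Map each version.php path under `root` to its '$wp_version =' lines."""
    return {path: find_all_in_file("$wp_version =", path) for path in find_files(root)}
```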

Iterating files with os.walk, but cannot open and print text files

Currently I am trying to write a function that will walk through the requested directory and print all the text of all the files.
Right now, the function does display the file_names as a list, so the files surely exist (and there is text in the files).
def PopularWordWalk (starting_dir, word_dict):
    print ("In", os.path.abspath(starting_dir))
    os.chdir(os.path.abspath(starting_dir))
    for (this_dir,dir_names,file_names) in os.walk(starting_dir):
        for file_name in file_names:
            fpath = os.path.join(os.path.abspath(starting_dir), file_name)
            fileobj = open(fpath, 'r')
            text = fileobj.read()
            print(text)
Here is my output with some checking of the directory contents:
>>> PopularWordWalk ('text_dir', word_dict)
In /Users/normanwei/Documents/Python for Programmers/Homework 4/text_dir
>>> os.listdir()
['.DS_Store', 'cats.txt', 'zen_story.txt']
The problem is that whenever I try to print the text, I get nothing. Eventually I want to push the text through some other functions, but for now that seems moot without any text. Can anyone explain why no text is appearing? (Opening, reading, storing and printing the text manually in IDLE works, i.e. if I just manually input 'cats.txt' instead of file_name.) I am currently running Python 3.
EDIT - The question has been answered - just have to remove the os.chdir line - see jojo's answer for explanation.

This line won't work
file = open(file_name, 'r')
Because it would require that these files exist in the same folder you are running the script from. You would have to provide the path to those files as well as the file names:
with open(os.path.join(starting_dir, file_name), 'r') as file:
    text = file.read()  # do stuff with text
This way it will build the full path from the directory and the file name.

If you do os.chdir(os.path.abspath(starting_dir)), you go into starting_dir. Then for (this_dir,dir_names,file_names) in os.walk(starting_dir): will loop over nothing, since starting_dir is a relative path and there is no starting_dir inside starting_dir.
Long story short, comment out the line os.chdir(os.path.abspath(starting_dir)) and you should be good.
Alternatively if you want to stick to the os.chdir, this should do the job:
def PopularWordWalk (starting_dir, word_dict):
    print ("In", os.path.abspath(starting_dir))
    os.chdir(os.path.abspath(starting_dir))
    for (this_dir, dir_names, file_names) in os.walk('.'):
        for file_name in file_names:
            # this_dir is relative to the new working directory, so join it with the file name
            fpath = os.path.join(this_dir, file_name)
            with open(fpath, 'r') as fileobj:
                text = fileobj.read()
                print(text)
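A variant that avoids os.chdir entirely, and also handles files in subdirectories, joins each file name with this_dir, which os.walk already reports relative to starting_dir. This is a sketch: read_all_texts is a hypothetical name, returning a dictionary instead of printing is a choice made so the text can be passed to other functions later, and the .txt filter (which skips files like .DS_Store) is an assumption.

```python
import os


def read_all_texts(starting_dir, extension=".txt"):
    """Walk `starting_dir` (no os.chdir needed) and return {path: text} for matching files."""
    texts = {}
    for this_dir, _dir_names, file_names in os.walk(starting_dir):
        for file_name in file_names:
            if not file_name.endswith(extension):
                continue  # skip files such as .DS_Store
            fpath = os.path.join(this_dir, file_name)  # this_dir already includes starting_dir
            with open(fpath, "r") as fileobj:
                texts[fpath] = fileobj.read()
    return texts
```

For example, for text in read_all_texts('text_dir').values(): print(text) prints every file, including any in nested folders.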

You'll want to join the root path with the file path. I'd change:
file = open(file_name, 'r')
to
fpath = os.path.join(this_dir, file_name)
file = open(fpath, 'r')
You may also want to use a name other than file, since file was a built-in in Python 2 (it is no longer one in Python 3). I'd recommend fileobj.

Just to add on to the previous answer, you will have to join the absolute path and the relative path of the walk.
Try this:
fpath = os.path.abspath(os.path.join(this_dir, file_name))
f = open(fpath, 'r')
