Find and replace in multiple files and add incrementing number

I'm trying to replace a regex pattern in many .cpp files on my computer. I need to add an incrementing number at the end of each substitution, so I chose python to do this.
This is what I got already, but it doesn't work yet:
import os
import re
i = 0
for file in os.listdir("C:\SeparatorTest"):
    if file.endswith(".cpp"):
        for line in open(file):
            line = re.sub(r'([^\s]*)\.Separator\((.*)\)\;', r'Separator \1\_' + i += 1 + '(\1,\2);')
Have I missed something?

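I haven't tested this because I'm not sure exactly what you are trying to replace, but you can't increment in the middle of your re.sub call. Replace your last line of code with the two lines below. Note that i has to be converted with str(), that the last part of the replacement needs to be a raw string too, that re.sub needs the line as its third argument, and that the backslash before the underscore is dropped so it does not end up in the output:
line = re.sub(r'([^\s]*)\.Separator\((.*)\)\;', r'Separator \1_' + str(i) + r'(\1,\2);', line)
i += 1
In C++ you'd just write i++ or ++i and the expression would evaluate to i before or after the increment. Python has no ++ operator, and i += 1 is a statement, not an expression, so it cannot appear inside another expression. Even if it could, I wouldn't try anything fancy here: Python needs to be readable, and the next programmer reading your code might not guess what you did.
Edit: You are also only reading your file. open(file) defaults to mode "r" (read), so nothing is ever written. You need open(file, "w") for that, and you need to write the re.sub() return value to the file rather than just storing it in a variable. Be aware that "w" truncates the file as soon as it is opened, so read the contents first.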
Edit2 : Here is what I'm working on, it's not done yet, I'll edit as I find how to get it to work :
import os, re
i = 0
def replacement(i):
    i += 1
    return r'Separator \1\_' + str(i) + '(\1,\2);'
for file in os.listdir("."):
    if file.endswith(".cpp"):
        for line in open(file):
            re.sub(r'([^\s]*)\.Separator\((.*)\)\;', replacement(i), line)
The idea is that the replacement can be a function, which re.sub calls once for each non-overlapping match, according to the Python documentation.
Edit3: I think I'll stop there unless I get a response from you, because there are some regex and other problems I don't have time to address. I'm also unsure about best practice for text replacement; you should look into that, as there should be help available. Glue the whole thing together (incrementing, correcting your re.sub() call, opening in write mode, replacing the text that matches) and you should achieve what you were trying to do.
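For reference, here is one way the pieces could be glued together. It is a minimal sketch, not a tested drop-in: the helper names number_separators and rewrite_cpp_files are made up for illustration, the numbering is assumed to start at 1 and to keep rising across files, and each file is assumed to be small enough to read in one go. It passes a function to re.sub, so the counter advances once per match rather than once per line.

```python
import itertools
import os
import re

# Same pattern as in the question: captures the object name and the argument list.
PATTERN = re.compile(r'([^\s]*)\.Separator\((.*)\);')


def number_separators(text, counter):
    """Rewrite every match in text, drawing a fresh number from counter per match."""
    def repl(match):
        # re.sub calls this once for each non-overlapping match.
        return 'Separator {0}_{1}({0},{2});'.format(
            match.group(1), next(counter), match.group(2))
    return PATTERN.sub(repl, text)


def rewrite_cpp_files(directory):
    """Apply number_separators to every .cpp file in directory (hypothetical helper)."""
    counter = itertools.count(1)  # shared, so numbers keep rising across files
    for name in sorted(os.listdir(directory)):
        if not name.endswith('.cpp'):
            continue
        path = os.path.join(directory, name)  # listdir returns bare file names
        with open(path) as src:
            text = src.read()
        with open(path, 'w') as dst:  # "w" truncates, so the text was read first
            dst.write(number_separators(text, counter))
```

Calling rewrite_cpp_files on a directory overwrites the files in place, so it is worth trying it on a copy first.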

Related

What is the best way to replace a function call in a script?

Consider this line of Python:
new_string = change_string(old_string)
If you want to replace or remove the function call and just have new_string = old_string, a simple text replacement will not suffice. (Replacing "change_string(" with the empty string will leave the closing parenthesis.) And what if you wanted to replace the function call 100 or more times? That's a lot of wasted time.
As an alternative, I'm using a regex to replace the function call.
Here is a short Python script that removes calls to a given function; the function name is written into the regex.
import os
import re
# Define variables
current_directory = os.getcwd()
file_to_regex_replace = "path/to/script/script.py"
output_filepath = current_directory + "/regex_repace_output.py"
regex_to_replace = re.compile(r"function_changes_string\((.+?)\)")
fixed_data_array = []
# read file line by line to array
f = open(file_to_regex_replace, "r")
data = f.readlines()
f.close()
line_count = 0
found_count = 0
not_found_count = 0
for line in data:
    line_count += 1
    # replace the regex in each line
    try:
        found = re.search(regex_to_replace, line).group(1)
        found_count += 1
        print str(line_count) + " " + re.sub(regex_to_replace, found, line).replace("\n", "")
        fixed_data_array.append(re.sub(regex_to_replace, found, line))
    except AttributeError:
        fixed_data_array.append(line)
        not_found_count += 1
print "Found : " + str(found_count)
print "Total : " + str(not_found_count + found_count)
# Open file to write to
f = open(output_filepath, "w")
# loop through and write each line to file
for item in fixed_data_array:
    f.write(item)
f.close()
This worked fine and did what I expected. However, is there another, more accepted way to do this?

Using a regex is probably the simplest way to handle your use case. But use the regex match and replace functionality likely built into your IDE instead of reinventing the wheel by writing your own script.
Note that many IDEs have powerful automated refactoring capabilities built into the application. For example, PyCharm understands the notion of extracting method calls as well as renaming variables/methods, changing method signatures, and several others. However, PyCharm currently does not have a built-in refactoring operation for your use case, so regex is a good alternative.
Here's an example regex that works in Atom:
Find: change_string\((.+)\)
Replace: $1
Given the line new_string = change_string(old_string), the resulting line after replacement will be new_string = old_string.
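The same find-and-replace can be expressed with Python's re.sub, where \1 plays the role of $1. This is only a sketch: strip_call is a made-up helper name, and the pattern uses the non-greedy (.+?) from the question's script rather than the greedy (.+) shown for Atom.

```python
import re

# Non-greedy variant of the pattern above; it stops at the first ")".
CALL = re.compile(r'change_string\((.+?)\)')


def strip_call(line):
    """Replace change_string(<arg>) with <arg>, keeping the rest of the line."""
    return CALL.sub(r'\1', line)


print(strip_call('new_string = change_string(old_string)'))
```

Neither variant balances parentheses, so arguments that contain their own parentheses can be rewritten incorrectly; for those cases a manual check of each replacement is safer.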
If you are writing software for a company that has a relatively large codebase, then large-scale refactoring operations might happen frequently enough that the company has developed their own solution to your use case. If this might be the case, consider asking your colleagues about it.

Write a single poly-linear string to multiple lines in .txt

I have encountered a strange problem which I am struggling to resolve. When I run re.findall() over a .txt file and then try to both print the results and write them to a file, all of the results I expect appear, but in different formats.
The code (modified from a similar thread I found earlier):
import re
with open ('test.txt') as text:
    text = text.read()
match = re.findall(r'[\w\.-]+#[\w\.-]+', text)
for i in match:
    with open ('list.txt', 'a') as dest:
        i = str(i)
        print(i)
        dest.write(i)
The interpreter then produces the result:
a#a
b#b
c#c
which is exactly what I would expect it to do, given the contents of test.txt.
However, list.txt reads:
(generic existing text goes here)
a#ab#bc#c
while I want it to (and believe it should) read
(generic existing text goes here)
a#a
b#b
c#c
I've tried using writelines() in place of write(), but this was not helpful. What differences between print() and write() are causing this discrepancy, and how would one go about avoiding it?
N.B. I am 99% sure that line 8 i = str(i) serves no purpose, but I've left it in because it's what I've been doing. Not really sure why...

I'll start with your last comment. What str(i) does is it converts i to its string representation (which is defined in i's class's __str__ method). If you call str(4) you get '4', for example. This is unnecessary in this case because re.findall returns a list of strings as per the documentation.
As for your actual issue: you're missing the newlines. I would also prefer to open the file fewer times than you are.
Perhaps try:
import re
with open ('test.txt') as text:
    text = text.read()
match = re.findall(r'[\w\.-]+#[\w\.-]+', text)
with open('list.txt', 'a') as dest:
    for i in match:
        print(i)
        dest.write(i + '\n')
(You can also remove the print(i) line if you don't want to see the output in the console every time a write is done.)
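To see the difference between the two calls in isolation, here is a small sketch that writes into in-memory buffers instead of list.txt. The matches list is a stand-in for the re.findall result, and io.StringIO is used only so the example does not touch any files.

```python
import io

matches = ['a#a', 'b#b', 'c#c']  # stand-in for the re.findall result

no_newlines = io.StringIO()
for m in matches:
    no_newlines.write(m)          # write() adds nothing after the string

with_newlines = io.StringIO()
for m in matches:
    print(m, file=with_newlines)  # print() appends "\n" by default (end="\n")

print(repr(no_newlines.getvalue()))
print(repr(with_newlines.getvalue()))
```

So either add the newline yourself, as in the answer above, or use print with its file argument.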

Searching multiple files to define variable

With Python, I need to search a file for a string and use it to define a variable. If there are no matches in that file, it searches another file. I only have 2 files for now, but handling more is a plus. Here is what I currently have:
regex = re.compile(r'\b[01] [01] '+dest+r'\b')
dalt=None
with open(os.path.join('path','to','file','file.dat'), 'r') as datfile:
    for line in datfile:
        if regex.search(line):
            params=line.split()
            dalt=int(params[1])
            break
if dalt is None:
    with open(os.path.join('different','file','path','file.dat'), 'r') as fdatfile:
        for line in fdatfile:
            if regex.search(line):
                params=line.split()
                dalt=int(params[1])
                break
if dalt is None:
    print "Not found, giving up"
    dalt=0
Is there a better way to do this? I feel like a loop would work, but I'm not sure how exactly. I'm sure there are also ways to make the code more "safe"; suggestions in addition to answers are appreciated.
I'm coding for Python 2.7.3.
As requested, here is an example of what I am searching for:
The string I will have to search with is "KBFI" (dest), and I want to find this line:
1 21 1 0 KBFI Boeing Field King Co Intl
Previously I had if dest in line, but in some cases dest can appear in other lines. So I switched to a regex that also matches the two digits before dest, which can be 0 or 1. This seems to be working fine at least most of the time (haven't identified any bad cases yet). Although based on the spec, supposedly the right line will start with a 1, so maybe the right search is:
r'^1\s.*'+dest
But I haven't tested that. I suppose a fairly exact search would be:
r'^1\s+\d{,5}\s+[01]\s+[01]\s+'+dest+r'\b'
Since the fields are 1, up to five digit number (this is what I need to return), 0 or 1, 0 or 1, then the string I'm searching for. (I haven't done much regex so I'm learning)

fileinput can take a list of files:
import fileinput
import re
regex = re.compile(regexstring)
dir1 = "path_to_dir/file.dat"
dir2 = "path_to_dir2/file.dat"
for line in fileinput.input([dir1, dir2]):  # pass all files to check
    if regex.search(line):
        params = line.split()
        dalt = int(params[1])
        print(dalt)
        break  # found it so leave the loop
else:  # if we get here no file had what we want
    print "Not found, giving up"
    dalt = 0
If you want all the files from certain directories with similar names use glob and whatever pattern you want to match:
import glob
dir1 = "path_to_dir/"
dir2 = "path_to_dir2/"
path1_files = glob.glob(dir1+"file*.dat")
path2_files = glob.glob(dir2+"file*.dat")
You might not actually need a regex either; a simple dest in line check may be enough.
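If you would rather keep explicit control over each file, the duplicated blocks in the question can also be folded into a plain loop over a list of paths. The sketch below is written for Python 3 and makes a few assumptions: find_dalt is a made-up name, missing or unreadable files are simply skipped, and dest is passed through re.escape in case it ever contains regex metacharacters.

```python
import re


def find_dalt(paths, dest, default=0):
    """Return the second field of the first matching line, searching paths in order."""
    regex = re.compile(r'\b[01] [01] ' + re.escape(dest) + r'\b')
    for path in paths:
        try:
            with open(path) as datfile:
                for line in datfile:
                    if regex.search(line):
                        return int(line.split()[1])
        except IOError:
            continue  # missing or unreadable file: try the next one
    return default  # no file contained a matching line
```

Adding a third or fourth file is then just a matter of extending the list passed as paths.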

Write strings to another file

The Problem - Update:
I could get the script to print its results but had a hard time figuring out how to send that output to a file instead of the screen. The script below prints the results to the screen. I posted the solution right after this code; scroll to the [ solution ] at the bottom.
First post:
I'm using Python 2.7.3. I am trying to extract the last word after the colon (:) on each line of a text file and write those words into another txt file. So far I am able to print the results on the screen and it works perfectly, but when I try to write the results to a new file I get an error saying that str has no attribute write/writeline. Here is the code snippet:
# the txt file I'm trying to extract last words from and write strings into a file
#Hello:there:buddy
#How:areyou:doing
#I:amFine:thanks
#thats:good:I:guess
x = raw_input("Enter the full path + file name + file extension you wish to use: ")
def ripple(x):
    with open(x) as file:
        for line in file:
            for word in line.split():
                if ':' in word:
                    try:
                        print word.split(':')[-1]
                    except (IndexError):
                        pass
ripple(x)
The code above works perfectly when printing to the screen. However I have spent hours reading Python's documentation and can't seem to find a way to have the results written to a file. I know how to open a file and write to it with writeline, readline, etc, but it doesn't seem to work with strings.
Any suggestions on how to achieve this?
PS: I didn't add the code that caused the write error, because I figured this would be easier to look at.
End of First Post
The Solution - Update:
Managed to get python to extract and save it into another file with the code below.
The Code:
inputFile = open ('c:/folder/Thefile.txt', 'r')
outputFile = open ('c:/folder/ExtractedFile.txt', 'w')
tempStore = outputFile
for line in inputFile:
    for word in line.split():
        if ':' in word:
            splitting = word.split(':')[-1]
            tempStore.writelines(splitting +'\n')
            print splitting
inputFile.close()
outputFile.close()
Update:
Check out droogans' code instead of mine; it is more efficient.

Try this:
with open('workfile', 'w') as f:
    f.write(word.split(':')[-1] + '\n')
If you really want to use print, you can:
from __future__ import print_function
print("hi there", file=f)
This follows "Correct way to write line to file in Python". You need the __future__ import on Python 2; on Python 3 the print function is already built in.

I think your question is good, and when you're done, you should head over to code review and get your code looked at for other things I've noticed:
# the txt file I'm trying to extract last words from and write strings into a file
#Hello:there:buddy
#How:areyou:doing
#I:amFine:thanks
#thats:good:I:guess
First off, thanks for putting example file contents at the top of your question.
x = raw_input("Enter the full path + file name + file extension you wish to use: ")
I don't think this part is necessary. You can just give ripple a better parameter name than x; I think file_loc is a pretty standard one.
def ripple(x):
    with open(x) as file:
With open, you can state which operation is happening to the file by passing a mode. I also like to name my file object according to its job. In other words, with open(file_loc, 'r') as r: reminds me that r.foo is going to be the file that is being read from.
        for line in file:
            for word in line.split():
                if ':' in word:
First off, when a line contains no whitespace, your for word in line.split() statement does nothing but put the "Hello:there:buddy" string into a list: ["Hello:there:buddy"]. A better idea would be to pass split an argument, which does more or less what you're trying to do here. For example, "Hello:there:buddy".split(":") would output ['Hello', 'there', 'buddy'], which makes your search for colons unnecessary.
                    try:
                        print word.split(':')[-1]
                    except (IndexError):
                        pass
Another advantage is that you won't need to check for an IndexError: even an empty string, when split on ":", comes back as a list holding one empty string, so [-1] always exists. In other words, it'll write nothing for that line.
ripple(x)
For ripple(x), you would instead call ripple('/home/user/sometext.txt').
So, try looking over this, and explore code review. There's a guy named Winston who does really awesome work with Python and self-described newbies. I always pick up new tricks from that guy.
Here is my take on it, re-written out:
import os  # for deriving the output file name
def ripple(file_loc='/typical/location/while/developing.txt'):
    # e.g. "developing.txt" becomes "developingoutput.txt" in the current directory
    outfile = "output.".join(os.path.basename(file_loc).split('.'))
    with open(file_loc, 'r') as r:
        lines = r.read().splitlines()  # everything is one giant list, newlines stripped
    with open(outfile, 'w') as w:
        w.write('\n'.join([line.split(':')[-1] for line in lines]))
ripple()
Try breaking this down, line by line, and changing things around. It's pretty condensed, but once you pick up comprehensions and using lists, it'll be more natural to read code this way.

You are trying to call .write() on a string object.
You either got your arguments mixed up (you'll need to call fileobject.write(yourdata), not yourdata.write(fileobject)) or you accidentally re-used the same variable for both your open destination file object and storing a string.
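To make the direction of the call concrete, here is a minimal Python 3 sketch. The function names are invented for illustration, and it assumes one colon-separated record per line, as in the sample data from the question.

```python
def last_fields(lines):
    """Yield the text after the final colon for each non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield line.split(':')[-1]


def write_last_fields(src_path, dst_path):
    """Copy the last field of every line in src_path to dst_path, one per line."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for field in last_fields(src):
            # The file object owns write(); the string is the argument.
            dst.write(field + '\n')
```

Keeping the extraction in its own generator means it can be checked on a plain list of strings before any file is involved.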

Rename Files Based on File Content

Using Python, I'm trying to rename a series of .txt files in a directory according to a specific phrase in each given text file. Put differently and more specifically, I have a few hundred text files with arbitrary names but within each file is a unique phrase (something like No. 85-2156). I would like to replace the arbitrary file name with that given phrase for every text file. The phrase is not always on the same line (though it doesn't deviate that much) but it always is in the same format and with the No. prefix.
I've looked at the os module and I understand how
os.listdir
os.path.join
os.rename
could be useful but I don't understand how to combine those functions with intratext manipulation functions like linecache or general line reading functions.
I've thought through many ways of accomplishing this task but it seems like easiest and most efficient way would be to create a loop that finds the unique phrase in a file, assigns it to a variable and use that variable to rename the file before moving to the next file.
This seems like it should be easy, so much so that I feel silly writing this question. I've spent the last few hours looking reading documentation and parsing through StackOverflow but it doesn't seem like anyone has quite had this issue before -- or at least they haven't asked about their problem.
Can anyone point me in the right direction?
EDIT 1: When I create the regex pattern using this website, it creates bulky but seemingly workable code:
import re
txt='No. 09-1159'
re1='(No)' # Word 1
re2='(\\.)' # Any Single Character 1
re3='( )' # White Space 1
re4='(\\d)' # Any Single Digit 1
re5='(\\d)' # Any Single Digit 2
re6='(-)' # Any Single Character 2
re7='(\\d)' # Any Single Digit 3
re8='(\\d)' # Any Single Digit 4
re9='(\\d)' # Any Single Digit 5
re10='(\\d)' # Any Single Digit 6
rg = re.compile(re1+re2+re3+re4+re5+re6+re7+re8+re9+re10,re.IGNORECASE|re.DOTALL)
m = rg.search(txt)
name = m.group(0)
print name
When I manipulate that to fit the glob.glob structure, and make it like this:
import glob
import os
import re
re1='(No)' # Word 1
re2='(\\.)' # Any Single Character 1
re3='( )' # White Space 1
re4='(\\d)' # Any Single Digit 1
re5='(\\d)' # Any Single Digit 2
re6='(-)' # Any Single Character 2
re7='(\\d)' # Any Single Digit 3
re8='(\\d)' # Any Single Digit 4
re9='(\\d)' # Any Single Digit 5
re10='(\\d)' # Any Single Digit 6
rg = re.compile(re1+re2+re3+re4+re5+re6+re7+re8+re9+re10,re.IGNORECASE|re.DOTALL)
for fname in glob.glob("\file\structure\here\*.txt"):
    with open(fname) as f:
        contents = f.read()
    tname = rg.search(contents)
    print tname
Then this prints out the byte location of the pattern, signifying that the regex pattern is correct. However, when I add the nname = tname.group(0) line after the original tname = rg.search(contents) and change the print statement to reflect the change, it gives me the following error: AttributeError: 'NoneType' object has no attribute 'group'. When I tried copying and pasting @joaquin's code line for line, it came up with the same error. I was going to post this as a comment on @spatz's answer, but I wanted to include so much code that this seemed to be a better way to express the 'new' problem. Thank you all for the help so far.
Edit 2: This is for @joaquin's answer below:
import glob
import os
import re
for fname in glob.glob("/directory/structure/here/*.txt"):
    with open(fname) as f:
        contents = f.read()
    tname = re.search('No\. (\d\d\-\d\d\d\d)', contents)
    nname = tname.group(1)
    print nname
Last Edit: I got it to work using mostly the code as written. What was happening is that there were some files that didn't have that regex expression so I assumed Python would skip them. Silly me. So I spent three days learning to write two lines of code (I know the lesson is more than that). I also used the error catching method recommended here. I wish I could check all of you as the answer, but I bothered #Joaquin the most so I gave it to him. This was a great learning experience. Thank you all for being so generous with your time. The final code is below.
import os
import re
pat3 = "No\. (\d\d-\d\d)"
ext = '.txt'
mydir = '/directory/files/here'
for arch in os.listdir(mydir):
    archpath = os.path.join(mydir, arch)
    with open(archpath) as f:
        txt = f.read()
    s = re.search(pat3, txt)
    if s is None:
        continue
    name = s.group(1)
    newpath = os.path.join(mydir, name)
    if not os.path.exists(newpath):
        os.rename(archpath, newpath + ext)
    else:
        print '{} already exists, passing'.format(newpath)

Instead of providing you with some code which you will simply copy-paste without understanding, I'd like to walk you through the solution so that you will be able to write it yourself, and more importantly gain enough knowledge to be able to do it alone next time.
The code which does what you need is made up of three main parts:
1. Get a list of all the filenames you need to iterate over.
2. For each file, extract the information you need to generate a new name for the file.
3. Rename the file from its old name to the new one you just generated.
Getting a list of filenames
This is best achieved with the glob module. This module allows you to specify shell-like wildcards and it will expand them. This means that in order to get a list of .txt files in a given directory, you will need to call the function glob.iglob("/path/to/directory/*.txt") and iterate over its result (for filename in ...:).
Generate new name
Once we have our filename, we need to open() it, read its contents using read() and store them in a variable where we can search for what we need. That would look something like this:
with open(filename) as f:
    contents = f.read()
Now that we have the contents, we need to look for the unique phrase. This can be done using regular expressions. Store the new filename you want in a variable, say newfilename.
Rename
Now that we have both the old and the new filenames, we simply need to rename the file, and that is done using os.rename(filename, newfilename).
If you want to move the files to a different directory, use os.rename(filename, os.path.join("/path/to/new/dir", newfilename)). Note that we need os.path.join here to construct the new path for the file using a directory path and newfilename.
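Put together, the three steps above might look roughly like the sketch below. It is an illustration under assumptions, not the definitive solution: rename_by_phrase is a made-up name, the pattern is taken from the example phrase in the question (two digits, a dash, four digits), files without the phrase are skipped, and an existing target file is never overwritten.

```python
import glob
import os
import re

# Pattern based on the question's example phrase, e.g. "No. 85-2156".
PHRASE = re.compile(r'No\. (\d\d-\d\d\d\d)')


def rename_by_phrase(directory):
    """Rename each .txt file in directory after the phrase found inside it."""
    renamed = []
    # sorted() collects the whole list first, so renames do not disturb the iteration.
    for filename in sorted(glob.iglob(os.path.join(directory, '*.txt'))):
        with open(filename) as f:
            contents = f.read()
        match = PHRASE.search(contents)
        if match is None:
            continue  # no phrase in this file: leave it alone
        newfilename = os.path.join(directory, match.group(1) + '.txt')
        if os.path.exists(newfilename):
            continue  # do not overwrite an existing file
        os.rename(filename, newfilename)
        renamed.append(newfilename)
    return renamed
```

The returned list makes it easy to print or log what was changed; running it on a backup copy of the directory first is a sensible precaution.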

There is no checking or protection against failures (checking whether archpath is a file, whether newpath already exists, whether the search is successful, etc.), but this should work:
import os
import re
pat = "No\. (\d\d\-\d\d\d\d)"
mydir = 'mydir'
for arch in os.listdir(mydir):
    archpath = os.path.join(mydir, arch)
    with open(archpath) as f:
        txt = f.read()
    s = re.search(pat, txt)
    name = s.group(1)
    newpath = os.path.join(mydir, name)
    os.rename(archpath, newpath)
Edit: I tested the regex to show how it works:
>>> import re
>>> pat = "No\. (\d\d\-\d\d\d\d)"
>>> txt='nothing here or whatever No. 09-1159 you want, does not matter'
>>> s = re.search(pat, txt)
>>> s.group(1)
'09-1159'
>>>
The regex is very simple:
\. -> a dot
\d -> a decimal digit
\- -> a dash
So, it says: search for the string "No. " followed by 2+4 decimal digits separated by a dash.
The parentheses are to create a group that I can recover with s.group(1) and that contains the code number.
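As a side note, the repeated \d can be written with counted repetition, and a raw string avoids any ambiguity about the backslashes. The sketch below is only an equivalent spelling of the same pattern; it also shows guarding against a failed search, since re.search returns None when the phrase is absent.

```python
import re

# Same meaning as "No\. (\d\d\-\d\d\d\d)", written with counted repetition.
pat = re.compile(r'No\. (\d{2}-\d{4})')

txt = 'nothing here or whatever No. 09-1159 you want, does not matter'
match = pat.search(txt)
code = match.group(1) if match else None  # None when the phrase is absent
print(code)
```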
And that is what you get, before and after:
Text of files one.txt, two.txt and three.txt is always the same, only the number changes:
this is the first
file with a number
nothing here or whatever No. 09-1159 you want, does not matter
the number is

Create a backup of your files, then try something like this:
import glob
import os
def your_function_to_dig_out_filename(lines):
    import re
    # i'll let you attempt this yourself
for fn in glob.glob('/path/to/your/dir/*.txt'):
    with open(fn) as f:
        spam = f.readlines()
    new_fn = your_function_to_dig_out_filename(spam)
    if not os.path.exists(new_fn):
        os.rename(fn, new_fn)
    else:
        print '{} already exists, passing'.format(new_fn)
