Python String Replace Error - python

I have a python script that keeps returning the following error:
TypeError: replace() takes at least 2 arguments (1 given)
I cannot for the life of me figure out what is causing this.
Here is part of my code:
inHandler = open(inFile2, 'r')
outHandler = open(outFile2, 'w')
for line in inHandler:
    str = str.replace("set([u'", "")
    str = str.replace("'", "")
    str = str.replace("u'", "")
    str = str.replace("'])", "")
    outHandler.write(str)
inHandler.close()
outHandler.close()
Everything shown inside the double quotation marks needs to be replaced with nothing.
So set([u' should be replaced with an empty string.

This is what you want to do:
for line in inHandler:
    line = line.replace("set([u'", "")
    # Remove the longer patterns before the bare quote; otherwise "u'" and "'])" can never match.
    line = line.replace("u'", "")
    line = line.replace("'])", "")
    line = line.replace("'", "")
    outHandler.write(line)
In the documentation, wherever it says something like str.replace(old, new[, count]), the str stands for whatever string you are calling the method on. In fact, str is the name of the built-in string type, so you never want to change what it means by assigning something to it.
line = line.replace("set([u'", "")
^ The name on the left of the = is set to the new, improved string.
line = line.replace("set([u'", "")
       ^ The name before .replace is the string you want to change.

There are two ways to call replace.
Let us start by defining a string:
In [19]: s = "set([u'"
We can call the replace method of string s:
In [20]: s.replace("u'", "")
Out[20]: 'set(['
Or, we can call the replace of the class str:
In [21]: str.replace(s, "u'", "")
Out[21]: 'set(['
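The latter way requires three arguments because str is the class, not a particular string, so the string to operate on must be passed explicitly as the first argument. That is why you received the error about missing arguments.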
What went wrong
Consider the code:
for line in inHandler:
    str = str.replace("set([u'", "")
    str = str.replace("'", "")
    str = str.replace("u'", "")
    str = str.replace("'])", "")
First, note the goal is to replace text in line but nowhere in the calls to replace is the variable line used for anything.
The first call to replace generates the error:
>>> str.replace("set([u'", "")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: replace() takes at least 2 arguments (1 given)
Used in the above form, str.replace interprets its first argument as the string to operate on. It is as if you wrote:
"set([u'".replace("")
In other words, it thinks that set([u' is the string to operate on and that the replace function was given just one argument: the empty string. That is why the message is replace() takes at least 2 arguments (1 given).
What you need is to operate on the variable line:
line = line.replace("set([u'", "")
And so on for the remaining lines in the loop.
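Putting this together, here is a minimal sketch of the whole cleanup as one function. The name clean_line and the sample input are made up for illustration. The bare quote is removed last, on the assumption that the longer patterns should get a chance to match first.

```python
def clean_line(line):
    """Remove the set([u'...']) wrapper text from one line."""
    # Longer patterns first; the bare quote goes last.
    for old in ("set([u'", "u'", "'])", "'"):
        line = line.replace(old, "")
    return line

sample = "set([u'adg',u'dafasdf'])\n"
cleaned = clean_line(sample)
print(cleaned)
```

Inside the original loop this would be used as outHandler.write(clean_line(line)).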

Conclusion: you have two issues:
error: wrong syntax for str.replace
abuse: using the built-in name str as a variable
error: wrong syntax for str.replace
For the error TypeError: replace() takes at least 2 arguments (1 given), the root cause is this line of your code:
str = str.replace("set([u'", "")
The intention is to use str.replace to do the replacement.
The correct (first) syntax is:
newStr = oldStr.replace(fromStr, toStr[, replaceCount])
Applied to your code, that could be:
replacedLine = line.replace("set([u'", "")
Note: another syntax is:
newStr = str.replace(oldStr, fromStr, toStr[, replaceCount])
For full details, please refer to another post's answer.
abuse: using the built-in name str as a variable
Background knowledge:
str is not a reserved keyword, but it is the name of the built-in string class in Python 3.
str has many built-in methods, such as str.replace() and str.lower().
This also means you should NOT use str as a normal variable or function name.
So your code:
for line in inHandler:
    str = str.replace("set([u'", "")
should change to (something like this):
for line in inHandler:
    newLine = line.replace("set([u'", "")
or
for line in inHandler:
    newLine = str.replace(line, "set([u'", "")
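To see why rebinding str is risky, here is a small sketch (the function name shadow_demo is made up for illustration). Inside the function, str is rebound to an ordinary string, so calling it as the type no longer works.

```python
def shadow_demo():
    str = "some text"  # rebinding: inside this function, str is now just a string
    try:
        str(42)  # with the built-in, this would build the string "42"
    except TypeError as exc:
        return "TypeError: {}".format(exc)
    return "no error"

result = shadow_demo()
print(result)
```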

I think this should work.
for s in inHandler:
    s = s.replace("set([u'", " ") ## notice the space between the quotes
    # The bare quote is replaced last; otherwise "u'" and "'])" can never match.
    s = s.replace("u'", " ")
    s = s.replace("'])", " ")
    s = s.replace("'", " ")
Please refrain from using the names of built-in data types as variable names (like you have used str).

You could compress the four lines of replace code
str = str.replace("set([u'", "")
str = str.replace("'", "")
str = str.replace("u'", "")
str = str.replace("'])", "")
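into a single re.sub call (using line rather than str as the variable name, so that the built-in str is not shadowed):
line = re.sub(r"set\(\[u'|u'|'\]\)|'", "", line)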
Example:
>>> import re
>>> string = "foo set([u'bar'foobaru''])"
>>> string = re.sub(r"set\(\[u'|u'|'\]\)|'", r"", string)
>>> string
'foo barfoobar'
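If the list of fragments may change, the alternation can be built from the fragments themselves instead of being escaped by hand. This is only a sketch: the names FRAGMENTS, PATTERN and strip_fragments are made up, and sorting longest-first is a choice made here so that a short fragment such as ' does not pre-empt a longer one.

```python
import re

FRAGMENTS = ["set([u'", "u'", "'])", "'"]

# re.escape makes characters like ( and [ literal; longest fragments are tried first.
PATTERN = re.compile(
    "|".join(re.escape(f) for f in sorted(FRAGMENTS, key=len, reverse=True))
)

def strip_fragments(text):
    """Delete every occurrence of the fragments from text."""
    return PATTERN.sub("", text)

result = strip_fragments("foo set([u'bar'foobaru''])")
print(result)
```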

I modified your code as below:
inHandler = open('test.txt', 'r')
outHandler = open('test1.txt', 'w')
data = ''
for line in inHandler.readlines():
    print('src:' + line)
    line = line.replace("set([u'", "")
    line = line.replace("u'", "")
    line = line.replace("'])", "")
    line = line.replace("'", "")
    data += line
    print('replace:' + line)
outHandler.write(data)
inHandler.close()
outHandler.close()
And I tested it. Result:
src:set([u'adg',u'dafasdf'])
replace:adg,dafasdf
src:set([u'adg',u'dafasdf'])
replace:adg,dafasdf
src:set([u'adg',u'dafasdf'])
replace:adg,dafasdf

If you are referring to Python's built-in str.replace method, the documentation says:
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old replaced by new.
If the optional argument count is given, only the first count occurrences are replaced.
This means you need to pass two values, old and new.
Since your loop variable is line, you can try:
local_string = ''
for line in inHandler:
    local_string = line
    local_string = local_string.replace("set([u'", "")
    # bare quote last, so that "u'" and "'])" can still match
    local_string = local_string.replace("u'", "")
    local_string = local_string.replace("'])", "")
    local_string = local_string.replace("'", "")

Related

String for text issue in python

I ran into some problems trying to work with a string read from a text file.
I was given a file, sqroot2_10kdigits.txt.
The contents of sqroot2_10kdigits.txt are shown below:
1.4142135623 7309504880 1688724209 6980785696 7187537694 8073176679 7379907324 7846210703 8850387534 3276415727 3501384623 0912297024
9248360558 5073721264 4121497099 9358314132 2266592750 5592755799
9505011527 8206057147 0109559971 6059702745 3459686201 4728517418
6408891986 0955232923 0484308714 3214508397 6260362799 5251407989
6872533965 4633180882 9640620615 2583523950 5474575028 7759961729
8355752203 3753185701 1354374603 4084988471
My code is below:
myfile = open("sqroot2_10kdigits.txt")
txt = myfile.read()
print(txt)
myfile.close()
Q2: Make a new empty string called sqroot_2_string. Note that there's a space after every 10 digits. Instead of using the .rstrip() method, try using .replace(" ", "") to remove all the spaces in the file and save the result in the empty string I just made. Check the length of the string as well; it should be 10002. Then print the first 10 digits followed by "...". Here's an example:
The first 10 digit of square root of 2 is 1.4142135623...
My code is below:
def sqroot_2_string(string):
    count = 0
    list = []
    for i in xrange(len(string)):
        if string[i] != ' ':
            list.append(string[i])
    return toString(list)
# Utility Function
def toString(List):
    return ''.join(List)
# Driver program
string = myfile
print sqroot_2_string(string)
Can anyone check my code for Q2? I don't know how to use .replace(" ", "") to remove all the spaces in the file and save the result in the empty string.
You can just do
def sqroot_2_string(string):
    return string.replace(" ", "")
Also note that you should do
print(sqroot_2_string(txt))
so you are using the text from the file instead of the file handle
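A sketch of the whole Q2 step, using a short made-up sample in place of the real file contents (the variable names follow the question, the sample text is mine). One assumption worth flagging: the file also contains line breaks, which .replace(" ", "") alone leaves in place, so this sketch removes those too.

```python
# Short stand-in for the text read from sqroot2_10kdigits.txt
txt = "1.4142135623 7309504880 1688724209\n9248360558 5073721264\n"

# Remove spaces, then line breaks, and keep the result in one string
sqroot_2_string = txt.replace(" ", "").replace("\n", "")

print(len(sqroot_2_string))
first_ten = sqroot_2_string[:12]  # "1." plus the first 10 digits after the point
print("The first 10 digits of the square root of 2 are {}...".format(first_ten))
```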

How to search string in a line and extract data between two characters in python?

file contents:
module traffic(
green_main, yellow_main, red_main, green_first, yellow_first,
red_first, clk, rst, waiting_main, waiting_first
);
I need to search the string 'module' and I need to extract the contents between (.......); the brackets.
Here is the code I tried out, I am not able to get the result
fp = open(file_name)
contents = fp.read()
unique_word_a = '('
unique_word_b = ');'
s = contents
for line in contents:
    if 'module' in line:
        your_string=s[s.find(unique_word_a)+len(unique_word_a):s.find(unique_word_b)].strip()
        print(your_string)
The problem with your code is here:
for line in contents:
    if 'module' in line:
Here, contents is a single string holding the entire content of the file, not a list of strings (lines) or a file handle that can be looped line-by-line. Thus, your line is in fact not a line, but a single character in that string, which obviously can never contain the substring "module".
Since you never actually use the line within the loop, you could just remove both the loop and the condition and your code will work just fine. (And if you changed your code to actually loop lines, and find within those lines, it would not work since the ( and ) are not on the same line.)
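With the loop and the condition removed, the code could look like the sketch below. The file contents are inlined as a string here so the example runs on its own; in the real script, contents would still come from fp.read().

```python
contents = """module traffic(
green_main, yellow_main, red_main, green_first, yellow_first,
red_first, clk, rst, waiting_main, waiting_first
);"""

unique_word_a = '('
unique_word_b = ');'

# Search the whole string at once: the brackets are on different lines.
start = contents.find(unique_word_a) + len(unique_word_a)
end = contents.find(unique_word_b)
your_string = contents[start:end].strip()
print(your_string)
```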
Alternatively, you can use a regular expression:
>>> content = """module traffic(green_main, yellow_main, red_main, green_first, yellow_first,
... red_first, clk, rst, waiting_main, waiting_first);"""
...
>>> import re
>>> re.search(r"module \w+\((.*?)\);", content, re.DOTALL).group(1)
'green_main, yellow_main, red_main, green_first, yellow_first, \n red_first, clk, rst, waiting_main, waiting_first'
Here, module \w+\((.*?)\); means
the word module followed by a space and some word-type \w characters
a literal opening (
a capturing group (...) with anything ., including linebreaks (re.DOTALL), non-greedy *?
a literal closing ) and ;
and group(1) gets you what's found in between the (non-escaped) pair of (...)
And if you want those as a list:
>>> list(map(str.strip, _.split(",")))
['green_main', 'yellow_main', 'red_main', 'green_first', 'yellow_first', 'red_first', 'clk', 'rst', 'waiting_main', 'waiting_first']
If you want to extract the content between "(" and ")", you can do the following (but first take care how you read the content):
for line in content.split('\n'):
    if 'module' in line:
        line_content = line[line.find('(') + 1: line.find(')')]
If your content spans more than one line:
import math
def find_all(your_string, search_string, max_index=math.inf, offset=0):
    index = your_string.find(search_string, offset)
    while index != -1 and index < max_index:
        yield index
        index = your_string.find(search_string, index + 1)
s = content.replace('\n', '')
for offset in find_all(s, 'module'):
    # str.find takes its start position positionally, not as a keyword argument
    max_index = s.find('module', offset + len('module'))
    if max_index == -1:
        max_index = math.inf
    print([s[start + 1: stop] for start, stop in zip(find_all(s, '(', max_index, offset), find_all(s, ')', max_index, offset))])

To add a new line before a set of characters in a line using python

I have a very long line of characters in which a set of characters keeps repeating. The line is: qwethisistheimportantpartqwethisisthesecondimportantpart
There are no spaces in the string. I want to add a new line before the string 'qwe' so that I can distinguish every important part from the other.
Output :
qwethisistheimportantpart
qwethisisthesecondimportantpart
I tried using
for line in infile:
    if line.startswith("qwe"):
        line="\n" + line
and it doesn't seem to work
str.replace() can do what you want:
line = 'qwethisistheimportantpartqwethisisthesecondimportantpart'
line = line.replace('qwe', '\nqwe')
print(line)
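One detail to watch: because the line itself starts with qwe, the replace above also puts a newline at the very beginning. A small sketch that strips it (the helper name split_before is made up for this example):

```python
def split_before(text, marker):
    """Start a new line before every occurrence of marker, without a leading blank line."""
    return text.replace(marker, "\n" + marker).lstrip("\n")

line = 'qwethisistheimportantpartqwethisisthesecondimportantpart'
result = split_before(line, 'qwe')
print(result)
```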
You can use re.split() and then join with \nqwe:
import re
s = "qwethisistheimportantpartqwethisisthesecondimportantpart"
print '\nqwe'.join(re.split('qwe', s))
Output:
qwethisistheimportantpart
qwethisisthesecondimportantpart
I hope this will help you
string = 'qwethisistheimportantpartqwethisisthesecondimportantpart'
split_factor = 'qwe'
a , b , c = map(str,string.split(split_factor))
print split_factor + b
print split_factor + c
Implemented in Python 2.7
This yields same output as you have mentioned buddy.
output:
qwethisistheimportantpart
qwethisisthesecondimportantpart

How to print this statement to a txt file

I am trying to write a txt file but I am getting a TypeError.
How do I go about this? Here is my code below:
yesterdays_added = f1_set - f2_set
yesterdays_removed = f2_set -f1_set
with open('me{}{}{}.txt'.format(dt.year, '%02d' % dt.month, '%02d' % dt.day), 'w') as out:
    for line in todays:
        if line in yesterdays_added:
            out.write( 'Removed', line.strip())
        elif line in yesterdays_removed:
            out.write ('Added', line.strip())
    for line in yesterdays:
        if line in yesterdays_added:
            out.write ('Removed', line.strip())
        elif line in yesterdays_removed:
            out.write ('Added', line.strip())
This is the error I am getting:
out.write ('Added', line.strip())
TypeError: function takes exactly 1 argument (2 given)
You need to concatenate those together.
out.write("Added "+line.strip()) # Or out.write("Added {}".format(line.strip()))
For example,
>>> "Added "+"abc\n".strip()
'Added abc'
From the Python docs:
f.write(string) writes the contents of string to the file, returning None.
Whenever in doubt, use help().
write(...)
write(str) -> None. Write string str to file.
This says that write() only takes in one argument. (Whereas you provide 2, hence the error)
As the error message suggests, write takes only one argument. If you want to write two things, make two calls to write, or concatenate them with +.
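Both options can be tried without touching a real file. The sketch below uses io.StringIO as a stand-in for the open file out, and a made-up sample line:

```python
import io

out = io.StringIO()  # in-memory stand-in for the file opened as `out`
line = "abc\n"

# Option 1: build one string, then make a single write call
out.write("Added {}\n".format(line.strip()))

# Option 2: two separate write calls, one argument each
out.write("Removed ")
out.write(line.strip() + "\n")

written = out.getvalue()
print(written)
```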
You received this error because the write method takes exactly one argument, a string.
The writelines method, however, accepts one argument that is an iterable of strings.
The writelines method is the preferred method in this case, unless you would like to format the output string. Note that writelines adds no separators or newlines between the items.
writelines example:
lines_to_write = ('Removed', line.strip(),)
out.writelines(lines_to_write)
write example:
line_to_write = '{0}...fancy formatting...{1}'.format('Removed', line.strip(),)
out.write(line_to_write)
Documentation:
http://docs.python.org/2/library/stdtypes.html#file.write
http://docs.python.org/2/library/stdtypes.html#file.writelines

Python split string on quotes

I'm a Python learner. If I have lines of text in a file that look like this
"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"
Can I split the lines around the inverted commas? The only constant would be their position in the file relative to the data lines themselves. The data lines could range from 10 to 100+ characters (they'll be nested network folders). I cannot see any other markers I could split on, but my lack of Python knowledge is making this difficult.
I've tried
optfile=line.split("")
and other variations, but I keep getting ValueError: empty separator. I can see why it's saying that; I just don't know how to change it. Any help is, as always, very appreciated.
Many thanks
You must escape the ":
input.split("\"")
results in
['\n',
'Y:\\DATA\x0001\\SERVER\\DATA.TXT',
' ',
'V:\\DATA2\x0002\\SERVER2\\DATA2.TXT',
'\n']
To drop the resulting empty lines:
[line for line in [line.strip() for line in input.split("\"")] if line]
results in
['Y:\\DATA\x0001\\SERVER\\DATA.TXT', 'V:\\DATA2\x0002\\SERVER2\\DATA2.TXT']
I'll just add that if you were dealing with lines that look like they could be command line parameters, then you could possibly take advantage of the shlex module:
import shlex
with open('somefile') as fin:
    for line in fin:
        print shlex.split(line)
Would give:
['Y:\\DATA\\00001\\SERVER\\DATA.TXT', 'V:\\DATA2\\00002\\SERVER2\\DATA2.TXT']
No regex, no split, just use csv.reader
import csv
sample_line = '10.0.0.1 foo "24/Sep/2015:01:08:16 +0800" www.google.com "GET /" -'
def main():
    for l in csv.reader([sample_line], delimiter=' ', quotechar='"'):
        print l
The output is
['10.0.0.1', 'foo', '24/Sep/2015:01:08:16 +0800', 'www.google.com', 'GET /', '-']
shlex module can help you.
import shlex
my_string = '"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
shlex.split(my_string)
This will spit
['Y:\\DATA\x0001\\SERVER\\DATA.TXT', 'V:\\DATA2\x0002\\SERVER2\\DATA2.TXT']
Reference: https://docs.python.org/2/library/shlex.html
Finding all regular expression matches will do it:
import re
input=r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
re.findall('".+?"', input)  # or use the pattern '"[^"]+"'
This will return the list of file names, still wrapped in their quotes:
['"Y:\\DATA\\00001\\SERVER\\DATA.TXT"', '"V:\\DATA2\\00002\\SERVER2\\DATA2.TXT"']
To get the file name without quotes use:
[f[1:-1] for f in re.findall('".+?"', input)]
or use re.finditer:
[f.group(1) for f in re.finditer('"(.+?)"', input)]
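For a self-contained version of the findall approach, here is a small sketch (the names line and paths are mine). It uses a raw string for the sample so that the backslashes are kept literally, and a capturing group so that the quotes are dropped from the results.

```python
import re

line = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'

# The group ([^"]+) captures everything between one pair of double quotes.
paths = re.findall(r'"([^"]+)"', line)
for path in paths:
    print(path)
```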
The following code splits the line at each occurrence of the inverted comma character (") and removes empty strings and those consisting only of whitespace.
[s for s in line.split('"') if s.strip() != '']
There is no need to use regular expressions, an escape character, some module or assume a certain number of whitespace characters between the paths.
Test:
line = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
output = [s for s in line.split('"') if s.strip() != '']
print(output)
>>> ['Y:\\DATA\\00001\\SERVER\\DATA.TXT', 'V:\\DATA2\\00002\\SERVER2\\DATA2.TXT']
I think what you want is to extract the filepaths, which are separated by spaces. That is you want to split the line about items contained within quotations. I.e with a line
"FILE PATH" "FILE PATH 2"
You want
["FILE PATH","FILE PATH 2"]
In which case:
import re
with open('file.txt') as f:
    for line in f:
        print(re.split(r'(?<=")\s(?=")',line))
With file.txt:
"Y:\DATA\00001\SERVER\DATA MINER.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"
Outputs:
>>>
['"Y:\\DATA\\00001\\SERVER\\DATA MINER.TXT"', '"V:\\DATA2\\00002\\SERVER2\\DATA2.TXT"']
This was my solution. It parses most sane input exactly the same as if it was passed into the command line directly.
import re
def simpleParse(input_):
    def reduce_(quotes):
        return '' if quotes.group(0) == '"' else '"'
    rex = r'("[^"]*"(?:\s|$)|[^\s]+)'
    return [re.sub(r'"{1,2}',reduce_,z.strip()) for z in re.findall(rex,input_)]
Use case: Collecting a bunch of single shot scripts into a utility launcher without having to redo command input much.
Edit:
Got OCD about the stupid way that the command line handles crappy quoting and wrote the below:
import re
# trial is the input string to tokenize
tokens = list()
reading = False
qc = 0
lq = 0
begin = 0
for z in range(len(trial)):
    char = trial[z]
    if re.match(r'[^\s]', char):
        if not reading:
            reading = True
            begin = z
            if re.match(r'"', char):
                begin = z
                qc = 1
            else:
                begin = z - 1
                qc = 0
            lc = begin
        else:
            if re.match(r'"', char):
                qc = qc + 1
                lq = z
    elif reading and qc % 2 == 0:
        reading = False
        if lq == z - 1:
            tokens.append(trial[begin + 1: z - 1])
        else:
            tokens.append(trial[begin + 1: z])
if reading:
    tokens.append(trial[begin + 1: len(trial) ])
tokens = [re.sub(r'"{1,2}',lambda y:'' if y.group(0) == '"' else '"', z) for z in tokens]
I know this got answered a million year ago, but this works too:
input = '"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
input = input.replace('" "','"').split('"')[1:-1]
Should output it as a list containing:
['Y:\\DATA\x0001\\SERVER\\DATA.TXT', 'V:\\DATA2\x0002\\SERVER2\\DATA2.TXT']
My question "Python - Error Caused by Space in argv Arument" was marked as a duplicate of this one. We have a number of Python books going back to Python 2.3. The oldest referred to using a list for argv, but with no example, so I changed things to:-
repoCmd = ['Purchaser.py', 'task', repoTask, LastDataPath]
SWCore.main(repoCmd)
and in SWCore to:-
sys.argv = args
The shlex module worked but I prefer this.
