I am working with some strings and removing certain characters from them using replace(), for example:
a = 'monsterr'
new_a = a.replace("rr", "r")
new_a
However, let's say that now I receive the following string:
In:
a = 'difference'
new_a = a.replace("rr", "r")
new_a
Out:
'difference'
How can I return nothing if my string doesn't contain rr? Is there any way to just pass or return nothing? I tried:
def check(a_str):
    if 'rr' in a_str:
        a_str = a_str.replace("rr", "r")
        return a_str
    else:
        pass
However, it doesn't work. The expected output for 'monster' would be nothing.
Use return:
def check(a_str):
    if 'rr' in a_str:
        a_str = a_str.replace("rr", "r")
        return a_str
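Because the function simply falls off the end of the if when there is no 'rr', it returns None implicitly. A quick check, assuming the definition above:

print(check("monsterr"))    # prints: monster
print(check("difference"))  # prints: None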
With a list comprehension:
a = ["difference", "hinderr"]
x = [i.replace("rr", "r") for i in a]
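With the sample list above, x becomes ['difference', 'hinder']; strings without 'rr' simply pass through unchanged. If you only want the strings that actually contained 'rr' (a small variation, not in the original answer), you could filter inside the comprehension:

x = [i.replace("rr", "r") for i in a if "rr" in i]   # ['hinder']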
Just as a little easter egg, I figured I'd include this little gem as an option as well, if only because of your question:
How can I return nothing if my string doesn't contain rr? Is there any way to just pass or return nothing?
Using boolean operators, you could take the if line completely out of check().
def check(text, dont_want='rr', want='r'):
    replacement = text.replace(dont_want, want)
    # checks if there was a change after replacing:
    # if True: returns replacement
    # if False: returns None
    return replacement != text and replacement or None
test = "differrence"
check(test)
#difference
test = "difference"
check(test)
#None
Whether you consider this un-Pythonic or not, it's another option, and it's along the lines of the question:
"return none if string doesn't contain rr"
For those who don't know how or why this works (or who enjoy learning neat Python tricks), here's the docs page explaining boolean operations.
P.S.
Technically speaking, it is un-Pythonic because it uses and/or to emulate a ternary (conditional) operation, which goes against the "Zen of Python" (import this), but coming from C-style languages I enjoy them.
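For comparison, here is the same logic written with Python's conditional expression instead of the and/or trick (a minimal alternative sketch, not part of the original answer); it also avoids the pitfall where a falsy replacement such as '' would be turned into None:

def check(text, dont_want='rr', want='r'):
    replacement = text.replace(dont_want, want)
    return replacement if replacement != text else None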
Related
I have a seemingly simple problem. I have a dataset: archive.ics.uci.edu/ml/machine-learning-databases/acute/diagnosis.data
and I want to replace the "no"s with "0"s and the "yes"es with "1"s.
I have tried this code:
fString = open("diagnosis.data","r")
fBool = open("diagnosis1.txt","w")
for line in fString:
line.replace("no","0")
line.replace("yes","1")
fBool.write(line)
fString.close()
fBool.close()
The only thing that happens is that the last yes/no gets a stray character appended. I don't know why it's not working.
Since replace() returns the modified string, you need to assign the result; the original is left untouched. I guess you need:
with open("diagnosis.data", "r") as fString, open("diagnosis1.txt", "w") as fBool:
for line in fString:
nline = line.replace("no", "0")
nline = nline.replace("yes", "1")
fBool.write(nline)
Note that iterating over the file object with for line in fString: does give you strings, so that part is fine; the real problem is that .replace() returns a new string instead of modifying the line in place. If you prefer, you can also read the whole file at once and split it into lines yourself, something like:
fString = open("diagnosis.data","r")
lines = fString.read().split('\n')
fBool = open("diagnosis1.txt","w")
for line in lines:
newLine = line.replace("no","0")
newLine = newLine.replace("yes","1")
fBool.write(newLine)
fString.close()
fBool.close()
This approach gets a list of strings, each of which is a line of the file, and then iterates through that. You also need to make sure you use the .replace() method correctly, because it returns the new string but doesn't modify the original string.
The .replace() string method returns the string with the replacements applied, but it doesn't change the original object, so:
>>> k="lolo"
>>> k.replace('l','k')
'koko'
>>> k
'lolo'
>>> k=k.replace('l','k')
>>> k
'koko'
What you want is:
line=line.replace("no","0")
or, with an auxiliary variable:
aux=line.replace("no","0").replace("yes","1")
I have a running python script that reads in a file of phone numbers. Some of these phone numbers are invalid.
import re

def IsValidNumber(number, pattern):
    isMatch = re.search(pattern, number)
    if isMatch is not None:
        return number

phoneNumbers = [line.strip() for line in open('..\\phoneNumbers.txt', 'r')]
Then I use another list comprehension to filter out the bad numbers:
phonePattern = '^\d{10}$'
validPhoneNumbers = [IsValidNumber(x, phonePattern) for x in phoneNumbers
                     if IsValidNumber(x, phonePattern) is not None]

for x in validPhoneNumbers:
    print x
Due to formatting, the second list comprehension spans two lines.
The problem is that although IsValidNumber should only return the number if the match is valid, it also implicitly returns None on invalid matches. So I had to modify the second list comprehension to include:
if IsValidNumber(x, phonePattern) is not None
While this works, the problem is that for each iteration in the list, the function is executed twice. Is there a cleaner approach to doing this?
Your validity-check function should return True/False (as its name suggests). That way your list comprehension becomes:
valid = [num for num in phoneNumbers if isValidNumber(num, pattern)]
While you're at it, make phoneNumbers a generator expression instead of a list comprehension (since you're interested in efficiency):
phoneNumbers = (line.strip() for line in open("..\\phoneNumbers.txt"))
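Putting those two suggestions together, a minimal sketch (assuming the same file and pattern as in the question) could look like this:

import re

def isValidNumber(number, pattern=r'^\d{10}$'):
    # return True/False instead of the number or None
    return re.search(pattern, number) is not None

phoneNumbers = (line.strip() for line in open('..\\phoneNumbers.txt'))
validPhoneNumbers = [num for num in phoneNumbers if isValidNumber(num)]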
Try this:
validPhoneNumbers = [x for x in phoneNumbers if IsValidNumber(x, phonePattern)]
Since IsValidNumber returns the same number that's passed in, without modification, you don't actually need that number. You just need to know that a number is returned at all (meaning the number is valid).
You may be able to combine the whole thing as well, with:
validPhoneNumbers = [x.strip() for x in open('..\\phoneNumbers.txt', 'r') if IsValidNumber(x.strip(), phonePattern)]
I would change your validity check method to simply return whether the number matches or not, but not return the number itself.
def is_valid_number(number):
    return re.search(r'^\d{10}$', number)
Then you can filter out the invalid numbers in the first list comprehension:
numbers = [line.strip() for line in open('..\\phoneNumbers.txt', 'r')
           if is_valid_number(line.strip())]
There are many options to work with here, including filter(None, map(isValidNumber, lines)). Most efficient is probably to let the regular expression do all the work:
import re
numpat = re.compile(r'^\s*(\d{10})\s*$', re.MULTILINE)
filecontents = open('phonenumbers.txt', 'r').read()
validPhoneNumbers = numpat.findall(filecontents)
This way there is no need for a Python loop, and you get precisely the validated numbers.
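For example, running the same pattern against a small in-memory sample instead of the file (just an illustration, not part of the original answer):

import re

numpat = re.compile(r'^\s*(\d{10})\s*$', re.MULTILINE)
sample = "5551234567\nnot a number\n  5559876543  \n123\n"
print numpat.findall(sample)   # ['5551234567', '5559876543']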
I have this code, which is supposed to open a specified file and count the while loops in it, finally outputting the total number of while loops in the file. I decided to convert the input file into a dictionary of word counts, and then write a for loop that adds 1 to WHILE_ every time the word while followed by a space is seen, before finally printing WHILE_ at the end.
However this did not seem to work, and I am at a loss as to why. Any help fixing this would be much appreciated.
This is the code I have at the moment:
WHILE_ = 0
INPUT_ = input("Enter file or directory: ")
OPEN_ = open(INPUT_)
READLINES_ = OPEN_.readlines()
STRING_ = (str(READLINES_))
STRIP_ = STRING_.strip()
input_str1 = STRIP_.lower()
dic = dict()

for w in input_str1.split():
    if w in dic.keys():
        dic[w] = dic[w]+1
    else:
        dic[w] = 1

DICT_ = (dic)

for LINE_ in DICT_:
    if ("while\\n',") in LINE_:
        WHILE_ += 1
    elif ('while\\n",') in LINE_:
        WHILE_ += 1
    elif ('while ') in LINE_:
        WHILE_ += 1

print ("while_loops {0:>12}".format((WHILE_)))
This is the input file I was working from:
'''A trivial test of metrics
Author: Angus McGurkinshaw
Date: May 7 2013
'''

def silly_function(blah):
    '''A silly docstring for a silly function'''
    def nested():
        pass
    print('Hello world', blah + 36 * 14)
    tot = 0  # This isn't a for statement
    for i in range(10):
        tot = tot + i
    if_im_done = false  # Nor is this an if
    print(tot)

blah = 3
while blah > 0:
    silly_function(blah)
    blah -= 1

while True:
    if blah < 1000:
        break
The output should be 2, but my code at the moment prints 0
This is an incredibly bizarre design. You're calling readlines to get a list of strings, then calling str on that list, which will join the whole thing up into one big string with the quoted repr of each line joined by commas and surrounded by square brackets, then splitting the result on spaces. I have no idea why you'd ever do such a thing.
Your bizarre variable names, extra useless lines of code like DICT_ = (dic), etc. only serve to obfuscate things further.
But I can explain why it doesn't work. Try printing out DICT_ after you do all that silliness, and you'll see that the only keys that include while are while and 'while. Since neither of these match any of the patterns you're looking for, your count ends up as 0.
It's also worth noting that you only add 1 to WHILE_ even if there are multiple instances of the pattern, so your whole dict of counts is useless.
This will be a lot easier if you don't obfuscate your strings, try to recover them, and then try to match the incorrectly-recovered versions. Just do it directly.
While I'm at it, I'm also going to fix some other problems so that your code is readable, and simpler, and doesn't leak files, and so on. Here's a complete implementation of the logic you were trying to hack up by hand:
import collections

filename = input("Enter file: ")
counts = collections.Counter()
with open(filename) as f:
    for line in f:
        counts.update(line.strip().lower().split())

print('while_loops {0:>12}'.format(counts['while']))
When you run this on your sample input, you correctly get 2. And extending it to handle if and for is trivial and obvious.
However, note that there's a serious problem in your logic: Anything that looks like a keyword but is in the middle of a comment or string will still get picked up. Without writing some kind of code to strip out comments and strings, there's no way around that. Which means you're going to overcount if and for by 1. The obvious way of stripping—line.partition('#')[0] and similarly for quotes—won't work. First, it's perfectly valid to have a string before an if keyword, as in "foo" if x else "bar". Second, you can't handle multiline strings this way.
These problems, and others like them, are why you almost certainly want a real parser. If you're just trying to parse Python code, the ast module in the standard library is the obvious way to do this. If you want to be write quick&dirty parsers for a variety of different languages, try pyparsing, which is very nice, and comes with some great examples.
Here's a simple example:
import ast

filename = input("Enter file: ")
with open(filename) as f:
    tree = ast.parse(f.read())

while_loops = sum(1 for node in ast.walk(tree) if isinstance(node, ast.While))
print('while_loops {0:>12}'.format(while_loops))
Or, more flexibly:
import ast
import collections

filename = input("Enter file: ")
with open(filename) as f:
    tree = ast.parse(f.read())

counts = collections.Counter(type(node).__name__ for node in ast.walk(tree))
print('while_loops {0:>12}'.format(counts['While']))
print('for_loops {0:>14}'.format(counts['For']))
print('if_statements {0:>10}'.format(counts['If']))
I am trying to write Python code to match entries from two lists.
One tab-delimited file looks like this:
COPB2
KLMND7
BLCA8
while the other file2 has a long list of similar looking "names", if you will. There should be some identical matches in the file, which I have succeeded in identifying and writing out to a new file. The problem is when there are additional characters at the end of one of the "names". For example, COPB2 from above should match COPB2A in file2, but it does not. Similarly KLMND7 should match KLMND79. Should I use regular expressions? Make them into strings? Any ideas are helpful, thank you!
What I have worked on so far, after the first response seen below:
with open(in_file1, "r") as names:
for line in names:
file1_list = [i.strip() for i in line.split()]
file1_str = str(file1_list)
with open(in_file2, "r") as symbols:
for line in symbols:
items = line.split("\t")
items = str(items)
matches = items.startswith(file1_str)
print matches
This code returns False when I know there should be some matches.
Use str.startswith(). No need for regex if it's only trailing characters:
>>> g = "COPB2A"
>>> f = "COPB2"
>>> g.startswith(f)
True
Here is a working piece of code:
file1_list = []
with open(in_file1, "r") as names:
    for line in names:
        line_items = line.split()
        for item in line_items:
            file1_list.append(item)

matches = []
with open(in_file2, "r") as symbols:
    for line in symbols:
        file2_items = line.split()
        for file2_item in file2_items:
            for file1_item in file1_list:
                if file2_item.startswith(file1_item):
                    matches.append(file2_item)
                    print file2_item

print matches
It may be quite slow for large files. If it's unacceptable, I could try to think about how to optimize it.
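One possible optimization, if it does turn out to be too slow, is to let startswith test all prefixes at once by passing it a tuple (a sketch based on the code above; str.startswith accepts a tuple of prefixes):

file1_prefixes = tuple(file1_list)

matches = []
with open(in_file2, "r") as symbols:
    for line in symbols:
        for file2_item in line.split():
            if file2_item.startswith(file1_prefixes):
                matches.append(file2_item)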
You might take a look at difflib if you need a more generic solution. Keep in mind it is a big import with lots of overhead so only use it if you really need to. Here is another question that is somewhat similar.
https://stackoverflow.com/questions/1209800/difference-between-two-strings-in-python-php
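For instance, difflib.get_close_matches can suggest near matches; a rough sketch, assuming both lists are already loaded (the cutoff value is something you would have to tune):

import difflib

file1_list = ['COPB2', 'KLMND7', 'BLCA8']
file2_list = ['COPB2A', 'KLMND79', 'XYZ1']

for name in file1_list:
    # closest candidate from file2, or [] if nothing is similar enough
    print difflib.get_close_matches(name, file2_list, n=1, cutoff=0.8)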
Assuming you have loaded the files into lists X and Y.
## match if a or b is equal to, or a prefix of, the other (case-sensitive)
def Match(a, b):
    n = min(len(a), len(b))
    return a[:n] == b[:n]

common_words = {}
for a in X:
    common_words[a] = []
    for b in Y:
        if Match(a, b):
            common_words[a].append(b)
If you want to use regular expressions to do the matching, anchor the pattern to the beginning of the string with "^".
import re

def MatchRe(a, b):
    # make sure the longer string is in 'a'
    if len(a) < len(b):
        a, b = b, a
    exp = "^" + b
    q = re.match(exp, a)
    if not q:
        return False  # no match
    return True  # access q.group(0) for the matched text
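A quick check using the names from the question:

print MatchRe("COPB2A", "COPB2")      # True
print MatchRe("KLMND79", "KLMND7")    # True
print MatchRe("BLCA8", "COPB2")       # False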
I have a list with a large collection of substrings, and I have one input string. If any element of the collection is found in the input string, it should be replaced by the given replacement.
I tried the following, but it returns the wrong result:
#!/bin/python
arr = ['www.', 'http://', '.com', 'many many many....']

def str_replace(arr, replaceby, original):
    temp = ''
    for n, i in enumerate(arr):
        temp = original.replace(i, replaceby)
    return temp

main = 'www.google.com'
main1 = 'www.a.b.c.company.google.co.uk.com'

print str_replace(arr, '', main)
Output:
www.google
Expected:
google
You are deriving temp from the original every time, so only the last element of arr will be replaced in the temp that is returned. Try this instead:
def str_replace(arr, replaceby, original):
    temp = original
    for n, i in enumerate(arr):
        temp = temp.replace(i, replaceby)
    return temp
You don't even need temp (assuming the above code is the whole function):
def str_replace(search, replace, subject):
    for s in search:
        subject = subject.replace(s, replace)
    return subject
Another (probably more efficient) option is to use regular expressions:
import re

def str_replace(search, replace, subject):
    search = '|'.join(map(re.escape, search))
    return re.sub(search, replace, subject)
Do note that these functions may produce different results if replace contains substrings from search.
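For example (a small illustration, not from the original answers), if the replacement text itself contains a later search term, the loop-based version keeps replacing inside its own output, while the regex version makes a single pass over the original string:

search, replace, subject = ['a', 'b'], 'ab', 'a'
# loop-based str_replace: 'a' -> 'ab', then the newly inserted 'b' -> 'ab', giving 'aab'
# regex-based str_replace: one pass over the original 'a', giving 'ab'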
temp = original.replace(i, replaceby)
It should be
temp = temp.replace(i, replaceby)
You're throwing away the previous substitutions.
Simple way :)
arr = ['www.', 'http://', '.com', 'many many many....']
main = 'http://www.google.com'

for item in arr:
    main = main.replace(item, '')

print main
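Output:
google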