Is there a way to simplify this pile of if-statements? This parsing function sure works (with the right dictionaries), but it has to test 6 if-statements for each word in the input. For a 5-word sentence that would be 30 if-statements. It is also kind of hard to read.
def parse(text):
    predicate=False
    directObjectAdjective=False
    directObject=False
    preposition=False
    indirectObjectAdjective=False
    indirectObject=False
    text=text.casefold()
    text=text.split()
    for word in text:
        if not predicate:
            if word in predicateDict:
                predicate=predicateDict[word]
                continue
        if not directObjectAdjective:
            if word in adjectiveDict:
                directObjectAdjective=adjectiveDict[word]
                continue
        if not directObject:
            if word in objectDict:
                directObject=objectDict[word]
                continue
        if not preposition:
            if word in prepositionDict:
                preposition=prepositionDict[word]
                continue
        if not indirectObjectAdjective:
            if word in adjectiveDict:
                indirectObjectAdjective=adjectiveDict[word]
                continue
        if not indirectObject:
            if word in objectDict:
                indirectObject=objectDict[word]
                continue
    if not directObject and directObjectAdjective:
        directObject=directObjectAdjective
        directObjectAdjective=False
    if not indirectObject and indirectObjectAdjective:
        indirectObject=indirectObjectAdjective
        indirectObjectAdjective=False
    return [predicate,directObjectAdjective,directObject,preposition,indirectObjectAdjective,indirectObject]
Here's also a sample of a dictionary, if that's needed.
predicateDict={
"grab":"take",
"pick":"take",
"collect":"take",
"acquire":"take",
"snag":"take",
"gather":"take",
"attain":"take",
"capture":"take",
"take":"take"}
This is more of a Code Review question than a Stack Overflow one. A major issue is that you have similar data that you're keeping in separate variables. If you combine your variables, then you can iterate over them.
missing_parts_of_speech = ["predicate", [...]]
dict_look_up = {"predicate":predicateDict,
                [...]
               }
found_parts_of_speech = {}
for word in text:
    for part in missing_parts_of_speech:
        if word in dict_look_up[part]:
            found_parts_of_speech[part] = dict_look_up[part][word]
            missing_parts_of_speech.remove(part)
            break  # a word fills only one slot; also stops iterating the list we just modified
I would suggest to simply use the method dict.get. This method has the optional argument default. By passing this argument you can avoid a KeyError. If the key is not present in a dictionary, the default value will be returned.
If you use the previously assigned variable as default, it will not be replaced by an arbitrary value, but the correct value. E.g., if the current word is a "predicate" the "direct object" will be replaced by the value that was already stored in the variable.
CODE
def parse(text):
    predicate = False
    directObjectAdjective = False
    directObject = False
    preposition = False
    indirectObjectAdjective = False
    indirectObject = False
    text=text.casefold()
    text=text.split()
    for word in text:
        predicate = predicateDict.get(word, predicate)
        directObjectAdjective = adjectiveDict.get(word, directObjectAdjective)
        directObject = objectDict.get(word, directObject)
        preposition = prepositionDict.get(word, preposition)
        indirectObjectAdjective = adjectiveDict.get(word, indirectObjectAdjective)
        indirectObject = objectDict.get(word, indirectObject)
    if not directObject and directObjectAdjective:
        directObject = directObjectAdjective
        directObjectAdjective = False
    if not indirectObject and indirectObjectAdjective:
        indirectObject = indirectObjectAdjective
        indirectObjectAdjective = False
    return [predicate, directObjectAdjective, directObject, preposition, indirectObjectAdjective, indirectObject]
PS: Use a few more spaces. Readers will thank you...
PPS: I have not tested this, for I do not have such dictionaries at hand.
PPPS: This will always return the last occurrences of the types within the text, while your implementation will always return the first occurrences.
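To make the first-versus-last difference concrete, here is a small sketch. The two-entry predicateDict is invented for illustration and is not the asker's real dictionary.

```python
# Hypothetical miniature dictionary, only for illustration.
predicateDict = {"grab": "take", "drop": "release"}
words = "grab then drop".split()

# dict.get with the old value as default: a later match overwrites an earlier one.
last = False
for word in words:
    last = predicateDict.get(word, last)

# Guarding with "if not first" keeps the earliest match instead.
first = False
for word in words:
    if not first:
        first = predicateDict.get(word, False)

print(first, last)  # take release
```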
You could map the different kinds of words (as strings) to dictionaries where to find those words, and then just check which of those have not been found yet and look them up if they are in those dicts.
needed = {"predicate": predicateDict,
          "directObjectAdjective": adjectiveDict,
          "directObject": objectDict,
          "preposition": prepositionDict,
          "indirectObjectAdjective": adjectiveDict,
          "indirectObject": objectDict}
for word in text:
    for kind in needed:
        if isinstance(needed[kind], dict) and word in needed[kind]:
            needed[kind] = needed[kind][word]
            break  # stop at the first open slot, so one word cannot fill two
In the end (and in each step on the way) all the items in needed that do not have a dict as a value have been found and replaced by the value from their respective dict.
(In retrospect, it might make more sense to use two dictionaries, or one dict and a set: one for the final value for each kind of word, and one for whether it has already been found. That would probably be a bit easier to grasp.)
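A rough sketch of that two-structure idea follows. The miniature dictionaries and the names SLOTS and parse_slots are made up for illustration; only the slot order is taken from the question.

```python
# Hypothetical miniature dictionaries, only for illustration.
predicateDict = {"grab": "take", "take": "take"}
adjectiveDict = {"red": "red", "big": "big"}
objectDict = {"ball": "ball", "box": "box"}
prepositionDict = {"in": "in", "into": "in"}

# Slot name -> dictionary to search, in the order the slots may be filled.
SLOTS = [
    ("predicate", predicateDict),
    ("directObjectAdjective", adjectiveDict),
    ("directObject", objectDict),
    ("preposition", prepositionDict),
    ("indirectObjectAdjective", adjectiveDict),
    ("indirectObject", objectDict),
]

def parse_slots(text):
    found = {}  # slot name -> value, only for slots that have been filled
    for word in text.casefold().split():
        for name, lookup in SLOTS:
            if name not in found and word in lookup:
                found[name] = lookup[word]
                break  # a word fills at most one slot
    return found

result = parse_slots("Grab the red ball into the big box")
print(result)
```

Slots that were never filled are simply absent from the result, so result.get("preposition", False) reproduces the False defaults of the original function.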
I suggest that you use a new pattern for this code instead of the old one. The new pattern has 9 lines and stays at 9 lines; to test more dictionaries, just add them to D. The old pattern already has 11 lines and grows by 4 lines with every additional dictionary to test.
aDict = { "a1" : "aa1", "a2" : "aa1" }
bDict = { "b1" : "bb1", "b2" : "bb2" }
text = ["a1", "b2", "a2", "b1"]
# old pattern
a = False
b = False
for word in text:
    if not a:
        if word in aDict:
            a = aDict[word]
            continue
    if not b:
        if word in bDict:
            b = bDict[word]
            continue
print(a, b)
# new pattern
D = [ aDict, bDict]
A = [ False for _ in D]
for word in text:
    for i, a in enumerate(A):
        if not a:
            if word in D[i]:
                A[i] = D[i][word]
                break  # go to the next word, like "continue" in the old pattern
print(A)
I'm a beginner and I have a question. Is there any possibility to compare characters inside strings?
I made a function:
def animal_crackers(text):
    text1 = text.split()
    a = ''
    count = 0
    for a in text1:
        for char in enumerate(a):
            if char[0] == char[1]:
                return True
            else:
                return False
Result:
>>> animal_crackers('Spam Spam')
>>> False
The logic is that I split a string consisting of two words. Then I go through those words with the first "for" loop, and with the second loop and "char in enumerate(a)" I try to get inside each word.
It should return True if both words start with the same letter.
This is basically not working, so I'm wondering: can you give me some advice rather than ready-made code? Or maybe tell me where the mistake is.
You can also have a look at the Levenshtein distance for strings. It is really basic, but both a good lesson for starters and a reasonable method of comparing spellings.
While strings are not the same as lists, their elements can be accessed like lists.
salami = 'Salami'
spam = 'Spam'
cheese = 'Cheese'
salami[0] == spam[0] # True
salami[0] == cheese[0] # False
This is probably what you need:
def animal_crackers(text):
    text1 = text.split()
    for i in range(len(text1)-1):
        if text1[i][0] == text1[i+1][0]:
            print(True)
        else:
            print(False)
    return
I can see where the mistake is: it is the "enumerate(a)". enumerate yields (index, character) pairs, so in the first iteration it gives (0, 'S'), i.e. char[0] = 0 and char[1] = 'S'. char[0] == char[1] is therefore False, since the two values are not even the same data type. Instead, try indexing like a list, since text.split() returns a list. I hope it helps.
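A tiny illustration of the point about enumerate, with the sample string from the question. It only shows what the pairs look like and how indexing reaches a first letter; it is not meant as a finished animal_crackers.

```python
word = "Spam"
# enumerate yields (index, character) pairs, so comparing the two
# members of one pair compares a number with a letter.
pairs = list(enumerate(word))
print(pairs)  # [(0, 'S'), (1, 'p'), (2, 'a'), (3, 'm')]

# Indexing reaches the first character of each word directly.
first, second = "Spam Spam".split()
print(first[0] == second[0])  # True
```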
I am starting to learn Python and looked at following website: https://www.w3resource.com/python-exercises/string/
I work on #4 which is "Write a Python program to get a string from a given string where all occurrences of its first char have been changed to '$', except the first char itself."
str="restart"
char=str[0]
print(char)
strcpy=str
i=1
for i in range(len(strcpy)):
    print(strcpy[i], "\n")
    if strcpy[i] is char:
        strcpy=strcpy.replace(strcpy[i], '$')
print(strcpy)
I would expect "resta$t" but the actual result is: $esta$t
Thank you for your help!
There are two issues. First, you are not starting the iteration where you think you are:
i = 1 # great, i is 1
for i in range(5):
    print(i)
0
1
2
3
4
i has been overwritten by the value tracking the loop.
Second, is does not test value equality; that is what the == operator is for. is checks whether two names refer to the same object. Small ints and short strings can make it seem as if is compares values, because CPython often reuses those objects, but other types do not behave this way:
a, b = 5, 5
a is b
True
a, b = "5", "5"
a is b
True
a==b
True
### This doesn't work
a, b = [], []
a is b
False
a == b
True
As @Kevin pointed out in the comments, 99% of the time, is is not the operator you want.
As far as your code goes, str.replace will replace all instances of the argument supplied with the second arg, unless you give it an optional number of instances to replace. To avoid replacing the first character, grab the first char separately, like val = somestring[0], then replace the rest using a slice, no need for iteration:
somestr = 'restart' # don't use str as a variable name
val = somestr[0] # val is 'r'
# somestr[1:] gives 'estart'
x = somestr[1:].replace(val, '$')
print(val+x)
# resta$t
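As a side note on the optional count argument of str.replace mentioned above: it limits replacements counting from the left, so on its own it cannot skip the first character. A short sketch (the variable names are only illustrative):

```python
s = 'restart'
all_replaced = s.replace('r', '$')         # every 'r' is replaced
first_only = s.replace('r', '$', 1)        # only the leftmost 'r' is replaced
keep_first = s[0] + s[1:].replace(s[0], '$')  # slice off the first char, then replace

print(all_replaced)  # $esta$t
print(first_only)    # $estart
print(keep_first)    # resta$t
```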
If you still want to iterate, you can do that over the slice as well:
# collect your letters into a list
letters = []
char = somestr[0]
for letter in somestr[1:]: # No need to track an index here
    if letter == char: # don't use is, use == for value comparison
        letter = '$' # change letter to a different value if it is equal to char
    letters.append(letter)
# Then use join to concatenate back to a string
print(char + ''.join(letters))
# resta$t
Your code needs a small modification: keep the first character and replace only in the rest of the string.
strcpy = "restart"
strcpy = strcpy[0] + strcpy[1:].replace(strcpy[0], '$')
print(strcpy)
# resta$t
It is also good practice in Python to put such code in a function. You can use this one:
def charreplace(s):
    return s[0] + s[1:].replace(s[0], '$')
charreplace("restart")
# 'resta$t'
Hope this is helpful.
I have code that works but I'm wondering if there is a more pythonic way to do this. I have a dictionary and I want to see if:
a key exists
that value isn't None (NULL from SQL in this case)
that value isn't simply quote quote (blank?)
that value doesn't solely consist of spaces
So in my code the keys of "a", "b", and "c" would succeed, which is correct.
import re
mydict = {
    "a":"alpha",
    "b":0,
    "c":False,
    "d":None,
    "e":"",
    "g":" ",
}
#a,b,c should succeed
for k in mydict.keys():
    if k in mydict and mydict[k] is not None and not re.search(r"^\s*$", str(mydict[k])):
        print(k)
    else:
        print("I am incomplete and sad")
What I have above works, but that seems like an awfully long set of conditions. Maybe this simply is the right solution but I'm wondering if there is a more pythonic "exists and has stuff" or better way to do this?
UPDATE
Thank you all for the wonderful answers and thoughtful comments. Following some of the points and tips, I've updated the question a little, as there were some cases I had left out which should also succeed. I have also changed the example to a loop (just easier to test, right?).
Fetch the value once, store it in a variable, then rely on the object's truthiness:
v = mydict.get("a")
if v and v.strip():
If "a" is not in the dict, get returns None and the first condition fails.
If "a" is in the dict but maps to None or an empty string, the test fails as well. If "a" maps to a blank string, strip() returns an empty (falsy) string, so that fails too.
Let's test this:
for k in "abcde":
    v = mydict.get(k)
    if v and v.strip():
        print(k,"I am here and have stuff")
    else:
        print(k,"I am incomplete and sad")
results:
a I am here and have stuff
b I am incomplete and sad # key isn't in dict
c I am incomplete and sad # c is None
d I am incomplete and sad # d is empty string
e I am incomplete and sad # e is only blanks
if your values can contain False, 0 or other "falsy" non-strings, you'll have to test for string, in that case replace:
if v and v.strip():
by
if v is not None and (not isinstance(v,str) or v.strip()):
so condition matches if not None and either not a string (everything matches) or if a string, the string isn't blank.
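Wrapped in a helper, that final condition might look like the sketch below. The name has_content is made up here; the dictionary is the one from the updated question.

```python
def has_content(d, key):
    """Return True if key is present and its value is neither None nor a blank string."""
    v = d.get(key)
    return v is not None and (not isinstance(v, str) or bool(v.strip()))

mydict = {"a": "alpha", "b": 0, "c": False, "d": None, "e": "", "g": " "}
good_keys = [k for k in mydict if has_content(mydict, k)]
print(good_keys)  # ['a', 'b', 'c']
print(has_content(mydict, "missing"))  # False
```

A missing key and a stored None are treated the same way, which matches the asker's requirements.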
The get method is a more efficient way to check whether a key exists than iterating through the keys: it is a hash lookup, O(1) on average, as opposed to an O(n) scan. My preferred method would look something like this:
if mydict.get("a") is not None and str(mydict.get("a")).replace(" ", "") != '':
    pass # Do some work
You can use a list comprehension with str.strip to account for whitespace in strings.
Using if v is natural in Python to cover False-like objects, e.g. None, False, 0, etc. So note this only works if 0 is not an acceptable value.
res = [k for k, v in mydict.items() if (v.strip() if isinstance(v, str) else v)]
['a']
Here's a simple one-liner to check:
The key exists
The value is not None
The value is not ""
bool(myDict.get("some_key"))
As for checking if the value contains only spaces, you would need to be more careful as None doesn't have a strip() method.
Something like this as an example:
try:
    exists = bool(myDict.get('some_key').strip())
except AttributeError:
    exists = False
Well I have 2 suggestions to offer you, especially if your main issue is the length of the conditions.
The first one is for the check if the key is in the dict. You don't need to use "a" in mydict.keys() you can just use "a" in mydict.
The second suggestion to make the condition smaller is to break down into smaller conditions stored as booleans, and check these in your final condition:
import re
mydict = {
    "a":"alpha",
    "c":None,
    "d":"",
    "e":" ",
}
inKeys = "a" in mydict
# each check builds on the previous one, so a missing key or a None value is never looked up or searched
isNotNone = inKeys and mydict["a"] is not None
isValidKey = isNotNone and not re.search(r"^\s*$", mydict["a"])
if inKeys and isNotNone and isValidKey:
    print("I am here and have stuff")
else:
    print("I am incomplete and sad")
This checks the value's type against NoneType rather than comparing the value with None:
NoneType = type(None) # types.NoneType can only be imported on Python 3.10+
mydict = {
    "a":"alpha",
    "b":0,
    "c":False,
    "d":None,
    "e":"",
    "g":" ",
}
#a,b,c should succeed
for k in mydict:
    if type(mydict[k]) != NoneType:
        if type(mydict[k]) != str or type(mydict[k]) == str and mydict[k].strip():
            print(k)
        else:
            print("I am incomplete and sad")
    else:
        print("I am incomplete and sad")
cond is a generator function responsible for generating conditions to apply in a short-circuiting manner using the all function. Given d = cond(), next(d) will check if a exists in the dict, and so on until there is no condition to apply, in that case all(d) will evaluate to True.
mydict = {
    "a":"alpha",
    "c":None,
    "d":"",
    "e":" ",
}
def cond():
    yield 'a' in mydict
    yield mydict['a']
    yield mydict['a'].strip()
if all(cond()):
    print("I am here and have stuff")
else:
    print("I am incomplete and sad")
I'm working in Python, using any() like so to look for a match between a String[] array and a comment pulled from Reddit's API.
Currently, I'm doing it like this:
isMatch = any(string in comment.body for string in myStringArray)
But it would also be useful to not just know if isMatch is true, but which element of myStringArray it was that had a match. Is there a way to do this with my current approach, or do I have to find a different way to search for a match?
You could use next with a default of False on a conditional generator expression (the default has to be passed positionally, because next does not accept keyword arguments):
next((string for string in myStringArray if string in comment.body), False)
The default is returned when there is no item that matched (so it's like any returning False), otherwise the first matching item is returned.
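A self-contained sketch of this idea, using a plain string as a stand-in for comment.body (the sample strings are invented):

```python
my_strings = ["spam", "eggs", "ham"]
body = "I would like eggs and ham, please."  # stand-in for comment.body

# The default is the second positional argument of next().
match = next((s for s in my_strings if s in body), False)
print(match)  # eggs

no_match = next((s for s in my_strings if s in "nothing here"), False)
print(no_match)  # False
```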
This is roughly equivalent to:
isMatch = False # variable to store the result
for string in myStringArray:
    if string in comment.body:
        isMatch = string
        break # after the first occurrence stop the for-loop.
or if you want to have isMatch and whatMatched in different variables:
isMatch = False # variable to store the any result
whatMatched = '' # variable to store the first match
for string in myStringArray:
    if string in comment.body:
        isMatch = True
        whatMatched = string
        break # after the first occurrence stop the for-loop.
For python 3.8 or newer use Assignment Expressions
if any((match := string) in comment.body for string in myStringArray):
    print(match)
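The same thing as a runnable sketch, again with a plain string standing in for comment.body. It needs Python 3.8 or newer.

```python
my_strings = ["spam", "eggs", "ham"]
body = "I would like eggs and ham, please."  # stand-in for comment.body

# any() stops at the first truthy test, so match keeps the string that hit.
if any((match := s) in body for s in my_strings):
    print(match)  # eggs
```

Note that when nothing matches, match is left holding the last string that was tested, so it should only be read inside the if branch.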
I agree with the comment that an explicit loop would be clearest. You could fudge your original like so:
isMatch = any(string in comment.body and remember(string) for string in myStringArray)
                                    ^^^^^^^^^^^^^^^^^^^^^
where:
def remember(x):
    global memory
    memory = x
    return True
Then the global memory will contain the matched string if isMatch is True, or retain whatever value (if any) it originally had if isMatch is False.
It's not a good idea to use one variable to store two different kinds of information: whether a string matches (a bool) and what that string is (a string).
You really only need the second piece of information: while there are creative ways to do this in one statement, as in the above answer, it really makes sense to use a for loop:
match = ''
for string in myStringArray:
    if string in comment.body:
        match = string
        break
if match:
    pass # do stuff
Say you have a = ['a','b','c','d'] and b = ['x','y','d','z'],
so that by doing any(i in b for i in a) you get True.
You can get:
The array of matches : matches = list( (i in b for i in a) )
Where in a it first matches : posInA = matches.index(True)
The value : value = a[posInA]
Where in b it first matches : posInB = b.index(value)
To get all the values and their indexes, the problem is that matches == [False, False, True, True] whether the multiple values are in a or b, so you need to use enumerate in loops (or in a list comprehension).
for m,i in enumerate(a):
    print('considering '+i+' at pos '+str(m)+' in a')
    for n,j in enumerate(b):
        print('against '+j+' at pos '+str(n)+' in b')
        if i == j:
            print('in a: '+i+' at pos '+str(m)+', in b: '+j+' at pos '+str(n))
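The list-comprehension variant mentioned above could be sketched like this, collecting every match together with both positions (the name matches_with_pos is only illustrative):

```python
a = ['a', 'b', 'c', 'd']
b = ['x', 'y', 'd', 'z']

# (position in a, position in b, value) for every pair of equal elements
matches_with_pos = [(m, n, i)
                    for m, i in enumerate(a)
                    for n, j in enumerate(b)
                    if i == j]
print(matches_with_pos)  # [(3, 2, 'd')]
```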
I've been doing some more CodeEval challenges and came across one on the hard tab.
You are given two strings. Determine if the second string is a substring of the first (Do NOT use any substr type library function). The second string may contain an asterisk (*) which should be treated as a regular expression i.e. matches zero or more characters. The asterisk can be escaped by a \ char in which case it should be interpreted as a regular '*' character. To summarize: the strings can contain alphabets, numbers, * and \ characters.
So you are given two strings in a file that look something like this: Hello,ell. Your job is to figure out if ell is in Hello. What I do:
I haven't quite gotten it perfect, but I did get it to the point where it passes with a score of 65%. It runs through the string and the key and checks if the characters match. If the characters match, it appends the character to a list. After this it divides the length of the string by 2 and checks if the length of the list is greater than or equal to half of the string. I figured half of the string length would be enough to verify whether it indeed matches. Example of how it works:
h == e -> no
e == e -> yes -> list
l == e -> no
l == e -> no
...
My question is what can I do better to the point where I can verify the wildcards that are said above?
import sys
def search_string(string, key):
    """ Search a string for a specified key.
    If the key exists out put "true" if it doesn't output "false"
    >>> search_string("test", "est")
    true
    >>> search_string("testing", "rawr")
    false"""
    results = []
    for c in string:
        for ch in key:
            if c == ch:
                results.append(c)
    if len(string) / 2 < len(results) or len(string) / 2 == len(results):
        return "true"
    else:
        return "false"
if __name__ == '__main__':
    with open(sys.argv[1]) as data:
        for line in data.readlines():
            data_list = line.rstrip().split(",")
            search_key = data_list[1]
            word = data_list[0]
            print(search_string(word, search_key))
I've come up with a solution to this problem. You've said "Do NOT use any substr type library function", I'm not sure If some of the functions I used are allowed or not, so tell me if I've broken any rules :D
Hope this helps you :)
def search_string(string, key):
    key = key.replace("\\*", "<NormalStar>") # every \* becomes <NormalStar>
    key = key.split("*") # splitting up the key makes it easier to work with
    #print(key)
    point = 0 # for checking order, e.g. test = t*est, test != est*t
    found = "true" # default
    for k in key:
        k = k.replace("<NormalStar>", "*") # every <NormalStar> becomes *
        if k in string[point:]: # the next part of the key is after the part before
            point = string.index(k, point) + len(k) # search from point onward, then move point after this part
        else: # k not found, return false
            found = "false"
            break
    return found
print(search_string("test", "est")) # true
print(search_string("t....est", "t*est")) # true
print(search_string("n....est", "t*est")) # false
print(search_string("est....t", "t*est")) # false
print(search_string("anything", "*")) # true
print(search_string("test", r"t\*est")) # false
print(search_string("t*est", r"t\*est")) # true
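If the in operator and str.index count as substring library functions for the challenge, the same idea can be written with character-by-character comparison. The following is only a sketch under that assumption; the names split_key, find_from and search_string_manual are invented here.

```python
def split_key(key):
    # Split key on unescaped '*'; a backslash followed by '*' becomes a literal '*'.
    parts, current, i = [], [], 0
    while i < len(key):
        if key[i] == "\\" and i + 1 < len(key) and key[i + 1] == "*":
            current.append("*")
            i += 2
        elif key[i] == "*":
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(key[i])
            i += 1
    parts.append("".join(current))
    return parts

def find_from(string, part, start):
    # Return the first index >= start where part occurs in string, or -1.
    for pos in range(start, len(string) - len(part) + 1):
        if all(string[pos + k] == part[k] for k in range(len(part))):
            return pos
    return -1

def search_string_manual(string, key):
    point = 0  # everything before point is already used by earlier parts
    for part in split_key(key):
        pos = find_from(string, part, point)
        if pos == -1:
            return "false"
        point = pos + len(part)
    return "true"

print(search_string_manual("t....est", "t*est"))   # true
print(search_string_manual("est....t", "t*est"))   # false
print(search_string_manual("t*est", r"t\*est"))    # true
```

Taking the leftmost occurrence of each part is enough here because the key only has to match somewhere inside the string; each part just needs to appear after the previous one.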