Python: how to ignore 'substring not found' error

Let's say that you have a string array 'x', containing very long strings, and you want to search for the following substring: "string.str", within each string in array x.
In the vast majority of the elements of x, the substring in question will be in the array element. However, maybe once or twice, it won't be. If it's not, then...
1) is there a way to just ignore the case and then move onto the next element of x, by using an if statement?
2) is there a way to do it without an if statement, in the case where you have many different substrings that you're looking for in any particular element of x, where you might potentially end up writing tons of if statements?

You want the try and except block. Here is a simplified example:
a = 'hello'
try:
    print a[6:]
except:
    pass
Expanded example:
a = ['hello', 'hi', 'hey', 'nice']
for i in a:
    try:
        print i[3:]
    except:
        pass
lo
e
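Note that slicing, as used in the examples above, does not raise for out-of-range indices, so the except branch is never taken there. The 'substring not found' message comes from str.index(), which raises ValueError on a miss. A minimal sketch of skipping such elements, written for Python 3; the list contents and the names needle and positions are made up for illustration:

```python
# Hypothetical data: most elements contain the substring, one does not.
x = ["aaa string.str bbb", "no match here", "string.str at start"]
needle = "string.str"

positions = []
for element in x:
    try:
        # str.index() raises ValueError ("substring not found") on a miss.
        positions.append(element.index(needle))
    except ValueError:
        # Ignore this element and move on to the next one.
        continue

print(positions)
```

If you would rather avoid exceptions, str.find() returns -1 on a miss instead of raising.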

You can use list comprehension to filter the list concisely:
Filter by length:
a_list = ["1234", "12345", "123456", "123"]
print [elem[3:] for elem in a_list if len(elem) > 3]
>>> ['4', '45', '456']
Filter by substring:
a_list = ["1234", "12345", "123456", "123"]
a_substring = "456"
print [elem for elem in a_list if a_substring in elem]
>>> ['123456']
Filter by multiple substrings (Checks if all the substrings are in the element by comparing the filtered array size and the number of substrings):
a_list = ["1234", "12345", "123456", "123", "56", "23"]
substrings = ["56","23"]
print [elem for elem in a_list if
       len(filter(lambda x: x in elem, substrings)) == len(substrings)]
>>> ['123456']
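The len(filter(...)) comparison above relies on Python 2, where filter() returns a list; in Python 3 it returns an iterator and len() fails. A sketch of the same check using all(), which should behave the same on both versions (the name matches is only for illustration):

```python
a_list = ["1234", "12345", "123456", "123", "56", "23"]
substrings = ["56", "23"]

# Keep an element only if every substring occurs somewhere in it.
matches = [elem for elem in a_list if all(sub in elem for sub in substrings)]
print(matches)
```

all() stops at the first missing substring, so it does not need to build an intermediate list.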

Well, if I understand what you wrote, you can use the continue keyword to jump to the next element in the array.
elements = ["Victor", "Victor123", "Abcdefgh", "123456", "1234"]
astring = "Victor"
for element in elements:
    if astring in element:
        pass  # do stuff
    else:
        continue  # this is redundant; the code works fine without it, but do what you want
Sorry for my English.

Use any() to see if any of the substrings are in an item of x. any() will consume a generator expression and it exhibits short-circuit behavior: it returns True at the first expression that evaluates to True and stops consuming the generator.
>>> substrings = ['list', 'of', 'sub', 'strings']
>>> x = ['list one', 'twofer', 'foo sub', 'two dollar pints', 'yard of hoppy poppy']
>>> for item in x:
        if any(sub in item.split() for sub in substrings):
            print item
list one
foo sub
yard of hoppy poppy
>>>
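One detail worth noting in the session above: item.split() makes this a whole-word test, not a substring test. A short Python 3 sketch of the difference, reusing the same data (the names word_hits and substring_hits are made up here):

```python
substrings = ['list', 'of', 'sub', 'strings']
x = ['list one', 'twofer', 'foo sub', 'two dollar pints', 'yard of hoppy poppy']

# Whole-word matching, as in the session above: compare against split words.
word_hits = [item for item in x if any(sub in item.split() for sub in substrings)]

# Plain substring matching: compare against the item itself.
substring_hits = [item for item in x if any(sub in item for sub in substrings)]

print(word_hits)
print(substring_hits)
```

With substring matching, 'twofer' is also reported, because it contains 'of'.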

Related

How to search a specific text into Two dimensional array in Python

I am new to Python. I have a two-dimensional list in Django. Now I want to check whether a given text is in the list, but it's not working. Here is my code:
newmessage = 'Bye'
stockWords = [
['hello', 'hi', 'hey', 'greetings'],
['Bye', 'Goodbye']
]
for i in range(0, len(stockWords)):
    if newmessage.lower() in stockWords[i]:
        return HttpResponse('Found')
    else:
        return HttpResponse('Not Found')
The problem is it works only for first element of list, the second one is not working.
What am I doing wrong? Any suggestion?
Update: your code checks for .lower() string, which isn't in any of the lists. ('bye' and 'Bye' are two different objects) I've tested my code without it, and it works:
>>> for i in stockWords:
        if newmessage in i:
            print 'found'
found
In order for this to work you need to lowercase all your strings in the list.
stocksLower = [stock.lower() for in_list in stockWords for stock in in_list]
Note it will create a single list, not list of lists.
You don't need to iterate over range when you have a sequence (a list of lists in your case). You can iterate straight over it:
for i in stockWords:
    if newmessage.lower() in i:
        return HttpResponse('Found')
return HttpResponse('Not Found')
On each iteration, i holds one inner list from stockWords, which is tested for containment.
Your original code iterated over range(len(stockWords)), which has 2 elements, and returned 'Not Found' as soon as the first inner list failed the test, so the second inner list was never checked.
You can also try flattening the nested list by one level and searching the flattened list:
import itertools
stockWords = [
['hello', 'hi', 'hey', 'greetings'],
['Bye', 'Goodbye']
]
stock = itertools.chain.from_iterable(stockWords)
stock_lower = [x.lower() for x in stock]
newmessage = 'Bye'
if newmessage.lower() in stock_lower:
    return HttpResponse('Found')
else:
    return HttpResponse('Not Found')
newmessage = 'Bye'
stockWords = [
['hello', 'hi', 'hey', 'greetings'],
['Bye', 'Goodbye']
]
for i in range(0, len(stockWords)):
    if newmessage.lower() in [str(x).lower() for x in stockWords[i]]:
        print('Found')
    else:
        print('Not Found')
Your stockWords is a nested list, so a single for loop only visits its 2 elements (the 2 inner lists). You can either flatten the nested list or change the looping logic so that it also traverses the inner lists with a nested loop.
Try this (without flattening):
newmessage = 'Bye'
stockWords = [['hello', 'hi', 'hey', 'greetings'],['Bye', 'Goodbye']]
for r in stockWords:
    for c in r:
        if newmessage.lower() == c.lower():
            print("Found")
        else:
            print('Not found')
If I'm not mistaken, @tisuchi is trying to write a function (the code returns something) but has not defined one.
You can also use a flag with nested loops. I have modified @WBM's answer:
found = False
for i in stockWords:
    for j in i:
        # compare whole words case-insensitively
        if newmessage.lower() == j.lower():
            found = True
            break
    if found:
        break  # also leave the outer loop once a match is found
response = HttpResponse('Found') if found else HttpResponse('Not Found')
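The nested search can also be sketched as a small helper built on any(), which stops at the first match. This is a Django-free illustration written for Python 3; the function name contains_word is hypothetical, and it assumes a case-insensitive whole-word comparison is what is wanted:

```python
def contains_word(word, groups):
    """Return True if word equals any string in any inner list, ignoring case."""
    target = word.lower()
    return any(target == candidate.lower()
               for group in groups
               for candidate in group)


stockWords = [
    ['hello', 'hi', 'hey', 'greetings'],
    ['Bye', 'Goodbye'],
]

print(contains_word('Bye', stockWords))
print(contains_word('later', stockWords))
```

In a view, the boolean result could then be used to choose which response to return.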

Generating a list of substrings not in a list of strings

I have a list of strings and a list of substrings. I want to generate a list of substrings not in the list of strings.
substring_list=["100", "101", "102", "104", "105"]
string_list=["101 foo", "102 bar", "103 baz", "104 lorem"]
I tried to do new_list = [s for s in substring_list if s not in [i for i in string_list]], but this doesn't work. I've also tried various uses of any() but have had no luck.
I'd like to return new_list=["100", "105"].
You can try this:
[sub for sub in substring_list if all(sub not in s for s in string_list)]
# ['100', '105']
Or alternatively:
[sub for sub in substring_list if not any(sub in s for s in string_list)]
# ['100', '105']
Coming from a Ruby background and since Python has any and all, I looked for a none function or method but was surprised to see it doesn't exist.
If you often use not any or all not, it could be interesting to define none() :
def none(iterable):
    for element in iterable:
        if element:
            return False
    return True
substring_list = ["100", "101", "102", "104", "105"]
string_list = ["101 foo", "102 bar", "103 baz", "104 lorem"]
print([sub for sub in substring_list if none(sub in s for s in string_list)])
# ['100', '105']
It might lead to confusion with None though. That's probably the reason why it doesn't exist.
Your code returns every substring because [i for i in string_list] just creates a copy of string_list, so each substring is compared against whole strings such as "101 foo" and never matches. To fix it, split each string and take the first piece with i.split()[0], which gives a list that contains only the numbers, without the words:
new_list = [s for s in substring_list if s not in [i.split()[0] for i in string_list]]
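A variation on the split approach, sketched for Python 3: collect the leading tokens in a set once, so that each membership test does not rebuild the list. It assumes the number is always the first whitespace-separated token of each string; the name leading_tokens is made up for this example:

```python
substring_list = ["100", "101", "102", "104", "105"]
string_list = ["101 foo", "102 bar", "103 baz", "104 lorem"]

# Assumption: the number is always the first whitespace-separated token.
leading_tokens = {s.split()[0] for s in string_list}

new_list = [sub for sub in substring_list if sub not in leading_tokens]
print(new_list)
```

Unlike the any()/all() versions, this compares whole tokens, so "10" would not count as present just because "101 foo" contains it.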

Given a list of string, determine if one string is a prefix of another string

I want to write a Python function that checks whether one string in a list is a prefix of another string in the list. It must be a prefix, not an arbitrary substring. If it is, return True. For instance,
list = ['abc', 'abcd', 'xyx', 'mno']
Return True because 'abc' is a prefix of 'abcd'.
list = ['abc', 'xyzabc', 'mno']
Return False
I tried startswith() and a list comprehension, but it didn't quite work.
I'd appreciate any help or pointers.
First sort the given list by string length, since a prefix is never longer than the string it is a prefix of. After sorting, the shorter strings come first, so each element only needs to be compared with the elements after it. This small optimization roughly halves the number of comparisons, because we no longer have to compare every element with every other element in both directions.
lst1 = ['abc', 'abcd', 'xyx', 'mno']
lst2 = ['abc', 'xyzabc', 'mno']
lst3 = ["abc", "abc"]
def check_list(lst):
    lst = list(set(lst))  # if you want to avoid redundant strings.
    lst.sort(key = lambda x:len(x))
    n = len(lst)
    for i in xrange(n):
        for j in xrange(i+1, n):
            if lst[j].startswith(lst[i]):
                return True
    return False
print check_list(lst1)
print check_list(lst2)
print check_list(lst3)
>>> True
>>> False
>>> False #incase you use lst = list(set(lst))
Using itertools
import itertools
list1 = ["abc", "xyz", "abc123"]
products = itertools.product(list1, list1)
is_substringy = any(x.startswith(y) for x, y in products if x != y)
This isn't very optimised, but depending on the amount of data you've got to deal with, the code is fairly elegant (and short); that might trump speed in your use case.
This assumes that you don't have pure repeats in the list however (but you don't have that in your example).
import itertools
mlist = ['abc', 'abcd', 'xyx', 'mno']
#combination of list elements, 2-by-2. without repetition
In [638]: for i,j in itertools.combinations(mlist,2):
   .....:     print (i,j)
   .....:
('abc', 'abcd')
('abc', 'xyx')
('abc', 'mno')
('abcd', 'xyx')
('abcd', 'mno')
('xyx', 'mno')
#r holds the final result: True if there is any pair where one string is a prefix of the other
r=False
In [639]: for i,j in itertools.combinations(mlist,2):
   .....:     r = r or i.startswith(j) # True if j is a prefix of i
   .....:     r = r or j.startswith(i) # True if i is a prefix of j
   .....:
In [640]: r
Out[640]: True
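For larger lists, the pairwise comparisons above can be avoided. A sketch of an alternative, written for Python 3 and not taken from any of the answers: after sorting lexicographically, a string that is a prefix of some other string is always a prefix of its immediate successor, so only adjacent pairs need checking. The function name has_prefix_pair is hypothetical, and exact duplicates are dropped first, matching the set() choice made earlier:

```python
def has_prefix_pair(strings):
    """Return True if some string is a prefix of a different string in the list."""
    # Drop exact duplicates, then sort lexicographically. If a string is a
    # prefix of any other string, its immediate successor starts with it.
    ordered = sorted(set(strings))
    return any(b.startswith(a) for a, b in zip(ordered, ordered[1:]))


print(has_prefix_pair(['abc', 'abcd', 'xyx', 'mno']))
print(has_prefix_pair(['abc', 'xyzabc', 'mno']))
print(has_prefix_pair(['abc', 'abc']))
```

The cost is dominated by the sort, so this needs far fewer startswith() calls than checking every pair.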

How to check float string?

I have a list which consists of irregular words and float numbers. I'd like to delete all the float numbers from the list, but first I need a way to detect them. I know str.isdigit() can identify numbers, but it doesn't work for float numbers. How can I do it?
My code is like this:
my_list = ['fun','3.25','4.222','cool','82.356','go','foo','255.224']
for i in my_list:
    if i.isdigit() == True:
        my_list.pop(i)
# Can't work, i.isdigit returns False
Use exception handling and a list comprehension. Don't modify the list while iterating over it.
>>> def is_float(x):
...     try:
...         float(x)
...         return True
...     except ValueError:
...         return False
>>> lis = ['fun','3.25','4.222','cool','82.356','go','foo','255.224']
>>> [x for x in lis if not is_float(x)]
['fun', 'cool', 'go', 'foo']
To modify the same list object use slice assignment:
>>> lis[:] = [x for x in lis if not is_float(x)]
>>> lis
['fun', 'cool', 'go', 'foo']
Easy way:
new_list = []
for item in my_list:
    try:
        float(item)
    except ValueError:
        new_list.append(item)
Using regular expressions:
import re
expr = re.compile(r'\d+(?:\.\d*)')
new_list = [item for item in my_list if not expr.match(item)]
A point about using list.pop():
When you use list.pop() to alter an existing list, you are shortening the length of the list, which means altering the indices of the list. This will lead to unexpected results if you are simultaneously iterating over the list. Also, pop() takes the index as an argument, not the element. You are iterating over the element in my_list. It is better to create a new list as I have done above.
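The index-shifting problem described above can be seen in a short Python 3 sketch. The data and the names mutated and filtered are made up, and list.remove() stands in for pop() since the loop variable is an element, not an index:

```python
items = ['1.5', '2.5', 'fun', '3.5']

# Risky pattern: removing from the list that is being iterated over.
mutated = list(items)
for s in mutated:
    if '.' in s:
        mutated.remove(s)  # shifts later elements left, so the next one is skipped

# Safer pattern: build a new list and leave the original untouched.
filtered = [s for s in items if '.' not in s]

print(mutated)
print(filtered)
```

In the first loop '2.5' survives because it slides into the position the iterator has already passed.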
A dead simple list comprehension, adding only slightly to isdigit:
my_list = [s for s in my_list if not all(c.isdigit() or c == "." for c in s)]
This will remove string representations of both int and float values (i.e. any string s where all characters c are numbers or a full stop).
As I understand OP the function should only remove floats. If integers should stay - consider this solution:
def is_float(x):
    try:
        # != rather than < so that negative values such as '-3.25' are detected too
        return int(float(x)) != float(x)
    except ValueError:
        return False
my_list = ['fun', '3.25', 'cool', '82.356', 'go', 'foo', '255.224']
list_int = ['fun', '3.25', 'cool', '82.356', 'go', 'foo', '255.224', '42']
print [item for item in my_list if not is_float(item)]
print [item for item in list_int if not is_float(item)]
Output
['fun', 'cool', 'go', 'foo']
['fun', 'cool', 'go', 'foo', '42']
Regular expressions would do the trick - this code searches each string for the format of a float (including floats starting with or ending with a decimal point), and if the string is not a float, adds it to the new list.
import re
my_list = ['fun','3.25','4.222','cool','82.356','go','foo','255.224']
new_list = []
for pos, st in enumerate(my_list):
    if not re.search('[0-9]*?[.][0-9]*', st):
        new_list.append(st)
print new_list
Creating a new list avoids working on the same list you are iterating on.
Ewan's answer is cleaner and quicker, I think.
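The regular expressions shown in the answers above are not anchored at both ends, so they can also match strings that merely contain a dot or begin with a number. A hedged Python 3 sketch using re.fullmatch(), under the assumption that a "float string" means an optional sign, digits, and exactly one decimal point with a digit on at least one side; the names FLOAT_RE and looks_like_float are hypothetical:

```python
import re

# Assumed definition: optional sign, then digits with one decimal point
# ("3.25", "3.", ".5"). Plain integers such as "42" are deliberately kept.
FLOAT_RE = re.compile(r'[+-]?(?:\d+\.\d*|\.\d+)')


def looks_like_float(s):
    # fullmatch() requires the whole string to match, not just a prefix.
    return FLOAT_RE.fullmatch(s) is not None


my_list = ['fun', '3.25', '4.222', 'cool', '82.356', 'go', 'foo', '255.224', '42', 'v1.2']
new_list = [s for s in my_list if not looks_like_float(s)]
print(new_list)
```

This pattern does not cover exponent notation such as '1e5'; the float()-based check handles those cases.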

Finding a substring within a list in Python [duplicate]

Background:
Example list: mylist = ['abc123', 'def456', 'ghi789']
I want to retrieve an element if there's a match for a substring, like abc
Code:
sub = 'abc'
print any(sub in mystring for mystring in mylist)
The above prints True if any of the elements in the list contain the pattern.
I would like to print the element which matches the substring. So if I'm checking 'abc' I only want to print 'abc123' from the list.
print [s for s in mylist if sub in s]
If you want them separated by newlines:
print "\n".join(s for s in mylist if sub in s)
Full example, with case insensitivity:
mylist = ['abc123', 'def456', 'ghi789', 'ABC987', 'aBc654']
sub = 'abc'
print "\n".join(s for s in mylist if sub.lower() in s.lower())
All the answers work but they always traverse the whole list. If I understand your question, you only need the first match. So you don't have to consider the rest of the list if you found your first match:
mylist = ['abc123', 'def456', 'ghi789']
sub = 'abc'
next((s for s in mylist if sub in s), None) # returns 'abc123'
If the match is at the end of the list or for very small lists, it doesn't make a difference, but consider this example:
import timeit
mylist = ['abc123'] + ['xyz123']*1000
sub = 'abc'
timeit.timeit('[s for s in mylist if sub in s]', setup='from __main__ import mylist, sub', number=100000)
# for me 7.949463844299316 with Python 2.7, 8.568840944994008 with Python 3.4
timeit.timeit('next((s for s in mylist if sub in s), None)', setup='from __main__ import mylist, sub', number=100000)
# for me 0.12696599960327148 with Python 2.7, 0.09955992100003641 with Python 3.4
Use a simple for loop:
seq = ['abc123', 'def456', 'ghi789']
sub = 'abc'
for text in seq:
    if sub in text:
        print(text)
yields
abc123
This prints all elements that contain sub:
for s in filter(lambda x: sub in x, mylist): print (s)
I'd just use a simple regex, you can do something like this
import re
old_list = ['abc123', 'def456', 'ghi789']
new_list = [x for x in old_list if re.search('abc', x)]
for item in new_list:
    print item
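One caveat with the regex approach: if the substring comes from user input or contains characters such as '.', it is interpreted as a pattern rather than literal text. A small Python 3 sketch with made-up data showing the effect of re.escape() (the names unescaped and escaped are only for illustration):

```python
import re

old_list = ['a.c123', 'abc456', 'ghi789']
sub = 'a.c'  # '.' is a regex wildcard that matches any character

# Without escaping, 'a.c' also matches 'abc'.
unescaped = [x for x in old_list if re.search(sub, x)]

# re.escape() makes the pattern match the literal text only.
escaped = [x for x in old_list if re.search(re.escape(sub), x)]

print(unescaped)
print(escaped)
```

For purely literal substrings, the plain `sub in x` test shown in the earlier answers avoids the issue altogether.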
