How to find all elements from a str - Python

I want to find all the dict elements that occur in a string.
I tried to write some code, but it doesn't work well.
I am considering using a recursive function.
str = "xx111xxx200x222x"
nums = {"one hundreds": ["100","111"], "two hundreds": ["200", "222"]}
result = []
def allfind(data):
    for key in nums.keys():
        for num in nums[key]:
            index = data.find(num)
            if index > -1:
                result.append(key)
                return allfind(data[index+len(num):])
allfind("xx111xxx200x222x")
print result # prints ["two hundreds", "two hundreds"]
# I want to get ["one hundreds", "two hundreds", "two hundreds"]

I would invert the dictionary so that every value becomes a key mapped back to its original key. Then, using the regex suggested by Grijesh Chauhan, getting the values is easy:
import re
nums = {num: key for key in nums for num in nums[key]}
my_str = "xx111xxx200x222x"
print nums
# {'200': 'two hundreds', '100': 'one hundreds', '111': 'one hundreds', '222': 'two hundreds'}
print [nums[item] for item in re.split('x+', my_str) if item in nums]
# ['one hundreds', 'two hundreds', 'two hundreds']
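Splitting on 'x+' relies on the numbers always being separated by runs of x. A more general sketch (Python 3; building one regex alternation from the inverted dictionary is my own variation, not part of the answer above) lets re.findall scan the string left to right, so the results come out in string order:

```python
import re

nums = {"one hundreds": ["100", "111"], "two hundreds": ["200", "222"]}

# Invert the mapping: each number string points back to its group name.
lookup = {num: key for key, values in nums.items() for num in values}

# One alternation of all numbers, longest first, so that a longer number is
# tried before a shorter one that is its prefix (e.g. "1000" before "100").
pattern = re.compile(
    "|".join(re.escape(num) for num in sorted(lookup, key=len, reverse=True))
)

def allfind(data):
    # findall scans left to right and returns the matched strings in order.
    return [lookup[match] for match in pattern.findall(data)]

print(allfind("xx111xxx200x222x"))
# ['one hundreds', 'two hundreds', 'two hundreds']
```

Because the order comes from the string itself, the result does not depend on the order of the dictionary keys.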

You can do something like this (read the comments):
>>> import re
>>> r = [] # return list
>>> for i in re.split('x+', "xx111xxx200x222x"): # outer loop
...     for k in nums: # iterate for each key
...         if i in nums[k]: # check if i in list at key
...             r.append(k) # if true add in return list
...
>>> r
['one hundreds', 'two hundreds', 'two hundreds']
Note that the outer loop iterates over the following:
>>> re.split('x+', "xx111xxx200x222x")
['', '111', '200', '222', '']
# the two '' entries (first and last) don't exist in the dict values.

You are getting the wrong answer because nums is a dictionary, and dictionaries are unordered (in Python 2; insertion order is only guaranteed from Python 3.7), so
nums.keys() can come out as ['two hundreds', 'one hundreds'].
Hence "two hundreds" (200) is your first result, and then when you do
return allfind(data[index+len(num):])
you recurse on the string "x222x": everything before the match, including 111, has been sliced away. That string of course contains only "two hundreds" (222), so the final result becomes
['two hundreds', 'two hundreds']
Now that you know the cause, the solution should come from iterating over the nums keys in the correct order (think list).
Also, try putting simple print statements for easy debugging, whenever possible.
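Fixing the key order helps with this particular string, but the recursion still slices away everything before the first match it finds, so a number that appears earlier in the string but belongs to a later key is lost: with the keys in the order ['one hundreds', 'two hundreds'], "x200x111x" gives only ['one hundreds']. A sketch of a non-recursive alternative (Python 3; find_groups_in_order is a hypothetical name, and this is a different technique from the one suggested above) records the position of every match and sorts by position at the end:

```python
def find_groups_in_order(data, nums):
    """Return the group names in the order their numbers appear in data.

    Assumes every number string in nums is non-empty (an empty string
    would make the while loop below run forever).
    """
    hits = []  # (position, group name) pairs
    for key, values in nums.items():
        for num in values:
            start = data.find(num)
            while start != -1:
                hits.append((start, key))
                # Resume after this match; occurrences of the same number do not overlap.
                start = data.find(num, start + len(num))
    # Sorting by position makes the result independent of dict key order.
    return [key for _, key in sorted(hits)]


nums = {"one hundreds": ["100", "111"], "two hundreds": ["200", "222"]}
print(find_groups_in_order("xx111xxx200x222x", nums))
# ['one hundreds', 'two hundreds', 'two hundreds']
```

Numbers from different groups that overlap in the string are each reported; if that matters, a single left-to-right scan would be needed instead.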

Related

Check if a string in a list is a subset of another string in the same list

I am working on a Python script with a list of strings. I want to create a method that takes in a list of strings and returns only the supersets, removing the subsets. Consider the following case:
A = ['this is a sentence', 'who is alice', 'sentence', 'hi i am carrot', 'i am carrot']
The list A contains two superset/subset pairs: 'this is a sentence' is a superset of 'sentence', and similarly 'hi i am carrot' is a superset of 'i am carrot'. I want to write a function that removes the subsets from the list and returns the updated list.
In the above example the output would look like:
ResultA = ['this is a sentence', 'who is alice', 'hi i am carrot']
I've written a quick code sample that shows what I am looking for, but I am not sure if using two for loops is the right way:
elements_to_keep = []
for i in phase_two_match:
    for j in phase_two_match:
        if i == j:
            continue
        else:
            if j not in i:
                elements_to_keep.append(j)
This returns the items of A that are not contained in another item of A, skipping the comparisons where the two items are exact matches (i.e. an item compared with itself):
[x for x in A if not any(x in y and x!=y for y in A)]
# returns:
['this is a sentence', 'who is alice', 'hi i am carrot']
[x for i, x in enumerate(A) if all(i==idx or x not in elem for idx, elem in enumerate(A))]
This builds a list of those items in A that for each item in A either share an index with that item (i.e. are the same item) or are not a substring of that item.
A simple (although not the most efficient) way to do that is the following:
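def myFunc(A):
    duplicate_index = []
    for i, a in enumerate(A):
        # count how many strings in A contain a (a always contains itself)
        score = [1 if a in b else 0 for b in A]
        if sum(score) > 1:
            # a is also contained in some other string, so drop it
            duplicate_index.append(i)
    return [c for i, c in enumerate(A) if i not in duplicate_index]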
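A sketch of a variant (Python 3; remove_substrings is a hypothetical name): process the strings longest first, so each string only has to be checked against the strings already kept, then restore the original order. It still does O(n²) comparisons in the worst case. Like the answers above it uses plain substring containment, and unlike the first one-liner it keeps exactly one copy of exact duplicates:

```python
def remove_substrings(strings):
    # Indices ordered by string length, longest first (sorted() is stable,
    # so equal-length strings keep their original relative order).
    order = sorted(range(len(strings)), key=lambda i: len(strings[i]), reverse=True)
    kept = []
    for i in order:
        # A string survives only if no longer (or earlier equal) kept string contains it.
        if not any(strings[i] in strings[j] for j in kept):
            kept.append(i)
    # Return the survivors in their original order.
    return [strings[i] for i in sorted(kept)]


A = ['this is a sentence', 'who is alice', 'sentence', 'hi i am carrot', 'i am carrot']
print(remove_substrings(A))
# ['this is a sentence', 'who is alice', 'hi i am carrot']
```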

Adding more than one value to dictionary when looping through string

Still super new to Python 3 and have encountered a problem... I am trying to create a function which returns a dictionary with the keys being the length of each word and the values being the words in the string.
For example, if my string is "The dogs run quickly forward to the park", my function should return
{2: ['to'], 3: ['The', 'run', 'the'], 4: ['dogs', 'park'], 7: ['quickly', 'forward']}
The problem is that when I loop through the items, it only appends one of the words of each length.
def word_len_dict(my_string):
    dictionary = {}
    input_list = my_string.split(" ")
    unique_list = []
    for item in input_list:
        if item.lower() not in unique_list:
            unique_list.append(item.lower())
    for word in unique_list:
        dictionary[len(word)] = []
        dictionary[len(word)].append(word)
    return (dictionary)
print (word_len_dict("The dogs run quickly forward to the park"))
The code returns
{2: ['to'], 3: ['run'], 4: ['park'], 7: ['forward']}
Can someone point me in the right direction? Perhaps without giving me the answer outright: what do I need to look at next to get the missing words added to the list? I thought that appending them to the list would do it, but it doesn't.
Thank you!
This will solve all your problems:
def word_len_dict(my_string):
    input_list = my_string.split(" ")
    unique_set = set()
    dictionary = {}
    for item in input_list:
        word = item.lower()
        if word not in unique_set:
            unique_set.add(word)
            key = len(word)
            if key not in dictionary:
                dictionary[key] = []
            dictionary[key].append(word)
    return dictionary
You were wiping dict entries each time you encountered a new word. There were also some efficiency problems (searching a list for membership while growing it resulted in an O(n**2) algorithm for an O(n) task). Replacing the list membership test with a set membership test corrected the efficiency problem.
It gives the correct output for your sample sentence:
>>> print(word_len_dict("The dogs run quickly forward to the park"))
{2: ['to'], 3: ['the', 'run'], 4: ['dogs', 'park'], 7: ['quickly', 'forward']}
I noticed some of the other posted solutions are failing to map words to lowercase and/or failing to remove duplicates, which you clearly wanted.
You can first create the set of unique words like this, avoiding the first loop, and then populate the dictionary in a second step.
unique_words = set("The dogs run quickly forward to the park".lower().split(" "))
dictionary = {}  # avoid naming it dict, which shadows the built-in type
for word in unique_words:
    key, value = len(word), word
    if key not in dictionary:  # `key not in dictionary.keys()` gives the same result; plain `in` is idiomatic
        dictionary[key] = [value]
    else:
        dictionary[key].append(value)
print(dictionary)
You are assigning an empty list to the dictionary item before you append the latest word, which erases all previous words.
for word in unique_list:
    dictionary[len(word)] = [x for x in input_list if len(x) == len(word)]
Your code is simply resetting the key to an empty list each time, which is why you only get one value (the last value) in the list for each key.
To make sure there are no duplicates, you can set the default value of a key to a set which is a collection that enforces uniqueness (in other words, there can be no duplicates in a set).
def word_len_dict(my_string):
    dictionary = {}
    input_list = my_string.split(" ")
    for word in input_list:
        if len(word) not in dictionary:
            dictionary[len(word)] = set()
        dictionary[len(word)].add(word.lower())
    return dictionary
Once you add that check, you can get rid of the first loop as well. Now it will work as expected.
You can also shorten the code further by using the setdefault method of dictionaries:
for word in input_list:
    dictionary.setdefault(len(word), set()).add(word.lower())
A Pythonic way, using itertools.groupby:
>>> import itertools
>>> my_str = "The dogs run quickly forward to the park"
>>> {x: list(y) for x, y in itertools.groupby(sorted(my_str.split(), key=len), key=len)}
{2: ['to'], 3: ['The', 'run', 'the'], 4: ['dogs', 'park'], 7: ['quickly', 'forward']}
This option starts by creating a unique set of lowercase words and then takes advantage of dict's setdefault to avoid searching the dictionary keys multiple times.
>>> a = "The dogs run quickly forward to the park"
>>> b = set(word.lower() for word in a.split())
>>> result = {}
>>> for word in b:
...     result.setdefault(len(word), []).append(word)
...
>>> result
{2: ['to'], 3: ['the', 'run'], 4: ['park', 'dogs'], 7: ['quickly', 'forward']}
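For comparison, a Python 3 sketch using collections.defaultdict (a swap for the explicit `if key not in dictionary` check used in the answers above) that lowercases, removes duplicates and keeps the words in the order they first appear:

```python
from collections import defaultdict

def word_len_dict(my_string):
    seen = set()                  # words already added, for O(1) duplicate checks
    by_length = defaultdict(list) # missing keys start as an empty list automatically
    for word in my_string.lower().split():
        if word not in seen:
            seen.add(word)
            by_length[len(word)].append(word)
    return dict(by_length)        # plain dict, so later lookups don't create keys

print(word_len_dict("The dogs run quickly forward to the park"))
```

Because defaultdict only creates the empty list the first time a length is seen, nothing is ever reset, which was the bug in the original code.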

Python: how to ignore 'substring not found' error

Let's say that you have a string array 'x', containing very long strings, and you want to search for the following substring: "string.str", within each string in array x.
In the vast majority of the elements of x, the substring in question will be in the array element. However, maybe once or twice, it won't be. If it's not, then...
1) is there a way to just ignore the case and then move onto the next element of x, by using an if statement?
2) is there a way to do it without an if statement, in the case where you have many different substrings that you're looking for in any particular element of x, where you might potentially end up writing tons of if statements?
You want the try and except block. Here is a simplified example:
a = 'hello'
try:
    print a[6:]
except:
    pass
Expanded example:
a = ['hello', 'hi', 'hey', 'nice']
for i in a:
    try:
        print i[3:]
    except:
        pass
lo
e
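Note that slicing past the end of a string, as in a[6:] above, returns an empty string rather than raising, so the except branches in that example never actually run. The "substring not found" message from the question title is the ValueError raised by str.index; str.find reports a missing substring by returning -1 instead. A short Python 3 sketch of both options, with made-up sample strings:

```python
x = ["prefix string.str suffix", "no match here", "another string.str"]
target = "string.str"

# Option 1: str.index raises ValueError ("substring not found") when target is missing.
found = []
for s in x:
    try:
        found.append(s.index(target))
    except ValueError:
        continue  # skip this element and move on to the next one

# Option 2: str.find returns -1 instead of raising, so no try/except is needed.
found_with_find = [pos for pos in (s.find(target) for s in x) if pos != -1]

print(found)            # [7, 8]
print(found_with_find)  # [7, 8]
```

Catching ValueError specifically (rather than a bare except) avoids silently hiding unrelated errors.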
You can use a list comprehension to filter the list concisely:
Filter by length:
a_list = ["1234", "12345", "123456", "123"]
print [elem[3:] for elem in a_list if len(elem) > 3]
# ['4', '45', '456']
Filter by substring:
a_list = ["1234", "12345", "123456", "123"]
a_substring = "456"
print [elem for elem in a_list if a_substring in elem]
# ['123456']
Filter by multiple substrings (checks that all of the substrings are in the element):
a_list = ["1234", "12345", "123456", "123", "56", "23"]
substrings = ["56","23"]
print [elem for elem in a_list if all(s in elem for s in substrings)]
# ['123456']
Well, if I understand what you wrote, you can use the continue keyword to jump to the next element in the array.
elements = ["Victor", "Victor123", "Abcdefgh", "123456", "1234"]
astring = "Victor"
for element in elements:
    if astring in element:
        pass  # do stuff
    else:
        continue  # not strictly needed: without it the loop moves on to the next element anyway
Sorry for my English.
Use any() to see if any of the substrings are in an item of x. any() consumes a generator expression and short-circuits: it returns True as soon as one of the expressions evaluates to True and stops consuming the generator.
>>> substrings = ['list', 'of', 'sub', 'strings']
>>> x = ['list one', 'twofer', 'foo sub', 'two dollar pints', 'yard of hoppy poppy']
>>> for item in x:
...     if any(sub in item.split() for sub in substrings):
...         print item
...
list one
foo sub
yard of hoppy poppy
>>>
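The answer above tests membership in item.split(), which only matches whole words. Since the question is about substrings, the difference may matter: with a plain `in` test on the string itself, 'twofer' also matches because it contains 'of'. A Python 3 comparison using the same sample data:

```python
substrings = ['list', 'of', 'sub', 'strings']
x = ['list one', 'twofer', 'foo sub', 'two dollar pints', 'yard of hoppy poppy']

# Word match: a substring must equal a whole whitespace-separated word.
word_hits = [item for item in x if any(sub in item.split() for sub in substrings)]

# Substring match: a substring may appear anywhere inside the item.
substring_hits = [item for item in x if any(sub in item for sub in substrings)]

print(word_hits)       # ['list one', 'foo sub', 'yard of hoppy poppy']
print(substring_hits)  # ['list one', 'twofer', 'foo sub', 'yard of hoppy poppy']
```

Replacing any() with all() gives the "every substring must be present" variant.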

How to check float string?

I have a list which consists of irregular words and float numbers. I'd like to delete all these float numbers from the list, but first I need to find a way to detect them. I know str.isdigit() can detect numbers, but it doesn't work for float numbers. How can I do it?
My code is like this:
my_list = ['fun','3.25','4.222','cool','82.356','go','foo','255.224']
for i in my_list:
    if i.isdigit() == True:
        my_list.pop(i)
# Can't work, i.isdigit returns False
Use exception handling and a list comprehension. Don't modify the list while iterating over it.
>>> def is_float(x):
...     try:
...         float(x)
...         return True
...     except ValueError:
...         return False
...
>>> lis = ['fun','3.25','4.222','cool','82.356','go','foo','255.224']
>>> [x for x in lis if not is_float(x)]
['fun', 'cool', 'go', 'foo']
To modify the same list object use slice assignment:
>>> lis[:] = [x for x in lis if not is_float(x)]
>>> lis
['fun', 'cool', 'go', 'foo']
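One caveat with the float() approach: float() also accepts the strings 'nan', 'inf' and 'infinity' (in any letter case), so a list of words containing one of them would lose it. A Python 3 sketch (is_finite_float is a hypothetical name) that additionally rejects non-finite values with math.isfinite; note that integer strings such as '42' still count as numbers here:

```python
import math

def is_finite_float(s):
    try:
        value = float(s)
    except ValueError:
        return False  # not parseable as a number at all
    # float() parses 'nan', 'inf' and 'Infinity'; treat those as words instead.
    return math.isfinite(value)

words = ['fun', '3.25', 'nan', 'cool', '82.356', 'inf', 'Infinity', 'go', '255.224']
print([w for w in words if not is_finite_float(w)])
# ['fun', 'nan', 'cool', 'inf', 'Infinity', 'go']
```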
Easy way:
new_list = []
for item in my_list:
    try:
        float(item)
    except ValueError:
        new_list.append(item)
Using regular expressions:
import re
expr = re.compile(r'\d+\.\d*$')  # $ anchors at the end, so strings like '1.2.3' or '3.5abc' don't match
new_list = [item for item in my_list if not expr.match(item)]
A point about using list.pop():
When you use list.pop() to alter an existing list, you are shortening the length of the list, which means altering the indices of the list. This will lead to unexpected results if you are simultaneously iterating over the list. Also, pop() takes the index as an argument, not the element, and you are iterating over the elements of my_list, not their indices. It is better to create a new list as I have done above.
A dead simple list comprehension, adding only slightly to isdigit:
my_list = [s for s in my_list if not all(c.isdigit() or c == "." for c in s)]
This will remove string representations of both int and float values (i.e. any string s where all characters c are numbers or a full stop).
As I understand the OP, the function should only remove floats. If integers should stay, consider this solution:
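def is_float(x):
    try:
        # != (rather than <) also catches negative floats such as '-3.5'
        return int(float(x)) != float(x)
    except (ValueError, OverflowError):  # OverflowError comes from int(float('inf'))
        return False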
my_list = ['fun', '3.25', 'cool', '82.356', 'go', 'foo', '255.224']
list_int = ['fun', '3.25', 'cool', '82.356', 'go', 'foo', '255.224', '42']
print [item for item in my_list if not is_float(item)]
print [item for item in list_int if not is_float(item)]
Output
['fun', 'cool', 'go', 'foo']
['fun', 'cool', 'go', 'foo', '42']
Regular expressions would do the trick - this code searches each string for the format of a float (including floats starting with or ending with a decimal point), and if the string is not a float, adds it to the new list.
import re
my_list = ['fun','3.25','4.222','cool','82.356','go','foo','255.224']
new_list = []
for st in my_list:
    # a float: digits, a decimal point, digits; one side of the point may be empty
    if not re.match(r'^(?:\d+\.\d*|\.\d+)$', st):
        new_list.append(st)
print new_list
Creating a new list avoids working on the same list you are iterating on.
Ewan's answer is cleaner and quicker, I think.

How to get the values after a split in Python?

['column1:abc,def', 'column2:hij,klm', 'column3:xyz,pqr']
I want to get the values after the ':'. Currently, if I split, the result includes column1, column2 and column3 as well, which I don't want. I want only the values.
This is similar to key-value pairs in a dictionary. The only difference is that it is a list of strings.
How do I split it?
EDITED
user_widgets = Widgets.objects.filter(user_id = user_id)
if user_widgets:
    for widgets in user_widgets:
        widgets_list = widgets.gadgets_list  # [u'column1:', u'column2:', u'column3:widget_basicLine']
        print [item.split(":")[1].split(',') for item in widgets_list]  # raises IndexError: list index out of range
But when I copy the widgets_list value from the terminal and pass it in directly, it runs correctly.
user_widgets = Widgets.objects.filter(user_id = user_id)
if user_widgets:
    for widgets in user_widgets:
        widgets_list = [u'column1:', u'column2:', u'column3:widget_basicLine']
        print [item.split(":")[1].split(',') for item in widgets_list]  # prints correctly
Where am I going wrong?
You can split items by ":", then split the item with index 1 by ",":
>>> l = ['column1:abc,def', 'column2:hij,klm', 'column3:xyz,pqr']
>>> [item.split(":")[1].split(',') for item in l]
[['abc', 'def'], ['hij', 'klm'], ['xyz', 'pqr']]
There is nothing wrong with a 'for' loop that tests whether the right-hand side (the part after the ':') actually has data:
li=[u'column1:', u'column2:', u'column3:widget_basicLine', u'column4']
out=[]
for us in li:
    us1,sep,rest=us.partition(':')
    if rest.strip():
        out.append(rest)
print out # [u'widget_basicLine']
Which can be reduced to a list comprehension if you wish:
>>> li=[u'column1:', u'column2:', u'column3:widget_basicLine', u'column4']
>>> [e.partition(':')[2] for e in li if e.partition(':')[2].strip()]
[u'widget_basicLine']
And you can further split by the comma if you have data:
>>> li=[u'column1:', u'column2:a,b', u'column3:c,d', u'column4']
>>> [e.partition(':')[2].split(',') for e in li if e.partition(':')[2].strip()]
[[u'a', u'b'], [u'c', u'd']]
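Since the question notes that the data is similar to key-value pairs, another option is to build a dictionary directly. A Python 3 sketch (the function name parse_columns and the sample rows 'column4:' and 'column5' are made up for illustration) that also tolerates items with no ':' at all, which is one way item.split(":")[1] fails with "list index out of range":

```python
def parse_columns(rows):
    parsed = {}
    for row in rows:
        # partition always returns three parts, even when ':' is missing.
        name, _, rest = row.partition(':')
        # Nothing after the ':' (or no ':' at all): record an empty list
        # instead of indexing into a list that is too short.
        parsed[name] = rest.split(',') if rest.strip() else []
    return parsed

rows = ['column1:abc,def', 'column2:hij,klm', 'column3:xyz,pqr', 'column4:', 'column5']
print(parse_columns(rows))
```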
