How to find repeating strings one after the other?


I recently started studying Python on my own, and I have not been able to answer this question:
"Write a function that receives a list of names (as strings). The function returns a list of all the names that appear twice in a row in the list, without duplicates. For example, for the list:
["avi", "avi", "beni", "shlomo", "shlomo", "David", "haim", "moshe", "shlomo", "shlomo"] The function will return a list of Avi and Shlomo. The function must work in O (n)."
This is what I have written so far, but I have not succeeded:
def double_names(lst):
    new_lst = []
    for i in lst:
        if lst[i] == lst[i+1]:
            new_lst.append(i)
    return new_lst
print(double_names(["avi", "avi", "beni", "shlomo", "shlomo", "David", "haim", "moshe", "shlomo", "shlomo"]))

Your attempt fails because:
There's not always a list element i+1 (you could use zip(lst, lst[1:]) instead).
You are not accounting for the fact that in your result no names should appear twice. You could use a set for this.
You're iterating over a list while expecting to get indices (you'll actually get list items). (Thanks, Paul)
Something like this should work:
def double_names(lst):
    new_set = set()
    for first, second in zip(lst, lst[1:]):
        if first == second:
            new_set.add(first)
    return list(new_set)

Keep a variable previous_name that stores the name from the previous iteration.
def double_names(names_list):
    double_names = []
    previous_name = None
    for n in names_list:
        if previous_name is not None:
            if n == previous_name:
                if n not in double_names:
                    double_names.append(n)
        previous_name = n
    return double_names
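The `n not in double_names` test scans the result list, so with many distinct repeated names the loop is no longer strictly O(n). A minimal sketch of one way around that, assuming the names are hashable: keep a helper set next to the ordered result list. The names `double_names_ordered` and `seen` are introduced only for this example.

```python
def double_names_ordered(names_list):
    """Return names that appear twice in a row, in order of first repeat."""
    result = []                 # keeps the order in which repeats were found
    seen = set()                # same names as result, for O(1) membership tests
    previous_name = object()    # sentinel that compares unequal to every name
    for name in names_list:
        if name == previous_name and name not in seen:
            seen.add(name)
            result.append(name)
        previous_name = name
    return result


names = ["avi", "avi", "beni", "shlomo", "shlomo", "David",
         "haim", "moshe", "shlomo", "shlomo"]
print(double_names_ordered(names))  # ['avi', 'shlomo']
```

The sentinel object also means a list that legitimately contains None is handled like any other value.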

I think the easiest way is by saving the first element of the list. Then you start iterating over the list from first position on and check if the current element is the same as the last element you checked. If it is, then you have a match, if not then you just update the last element and keep going.
def doublenames(l):
    if not l:  # guard: l[0] would raise IndexError on an empty list
        return []
    last = l[0]
    new_list = []
    for element in l[1:]:
        if element == last:
            if element not in new_list:
                new_list.append(element)
        else:
            last = element
    return new_list
Hope it helped :)
(You can also do it in one line:
[x[0] for x in set(zip(l, l[1:])) if (x[0] == x[1])]
)

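Another solution. Note that it reports every name that occurs more than once anywhere in the list, not only names repeated back to back, so it matches the expected result for this sample but would also return "a" for ["a", "b", "a"]:
lst = ["avi", "avi", "beni", "shlomo", "shlomo", "David", "haim", "moshe", "shlomo", "shlomo"]
def doublenames(L):
    out = {}
    for v in L:
        out.setdefault(v, []).append(v)
    return [v[0] for v in out.values() if len(v) > 1]
print(doublenames(lst))
Prints:
['avi', 'shlomo']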

If the order of the names doesn't matter, you can use a set to remove duplicates:
from itertools import groupby
def double_names(names):
    return list({key for key, group in groupby(names) if len(tuple(group)) >= 2})
print(double_names(["avi", "avi", "beni", "shlomo", "shlomo", "David", "haim", "moshe", "shlomo", "shlomo"]))
Output:
['shlomo', 'avi']
Just tried this with Python 3.8.2, and it seems the order is not preserved.
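If the order does matter, one option is to swap the set for `dict.fromkeys`, which also removes duplicates but keeps insertion order on Python 3.7+. This is only a sketch under that version assumption, and `double_names_in_order` is a name made up for the example.

```python
from itertools import groupby


def double_names_in_order(names):
    """Names that occur at least twice in a row, ordered by first such run."""
    repeated = (key for key, group in groupby(names)
                if sum(1 for _ in group) >= 2)
    # dict.fromkeys drops duplicates and keeps insertion order (Python 3.7+)
    return list(dict.fromkeys(repeated))


print(double_names_in_order(["avi", "avi", "beni", "shlomo", "shlomo",
                             "David", "haim", "moshe", "shlomo", "shlomo"]))
# ['avi', 'shlomo']
```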

Related

Group elements of a list based on repetition of values

I am really new to Python and I am having an issue figuring out the problem below.
I have a list like:
my_list = ['testOne:100', 'testTwo:88', 'testThree:76', 'testOne:78', 'testTwo:88', 'testOne:73', 'testTwo:66', 'testThree:90']
And I want to group the elements based on the occurrence of elements that start with 'testOne'.
Expected Result:
new_list=[['testOne:100', 'testTwo:88', 'testThree:76'], ['testOne:78', 'testTwo:88'], ['testOne:73', 'testTwo:66', 'testThree:90']]

Just start a new list at every testOne.
>>> new_list = []
>>> for item in my_list:
...     if item.startswith('testOne:'):
...         new_list.append([])
...     new_list[-1].append(item)
...
>>> new_list
[['testOne:100', 'testTwo:88', 'testThree:76'], ['testOne:78', 'testTwo:88'], ['testOne:73', 'testTwo:66', 'testThree:90']]
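The loop above assumes the first item starts with 'testOne:'; if it does not, `new_list[-1]` raises IndexError because no group exists yet. A sketch of the same idea as a function that tolerates leading items (`split_at_marker` and its `marker` parameter are hypothetical names for this example):

```python
def split_at_marker(items, marker="testOne:"):
    """Start a new group at every item that begins with `marker`."""
    groups = []
    for item in items:
        # Open a group at each marker, and also for items before the first marker.
        if item.startswith(marker) or not groups:
            groups.append([])
        groups[-1].append(item)
    return groups


my_list = ['testOne:100', 'testTwo:88', 'testThree:76', 'testOne:78',
           'testTwo:88', 'testOne:73', 'testTwo:66', 'testThree:90']
print(split_at_marker(my_list))
```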

Not a cool one-liner, but this works also with more general labels:
result = [[]]
seen = set()
for entry in my_list:
    test, val = entry.split(":")
    if test in seen:
        result.append([entry])
        seen = {test}
    else:
        result[-1].append(entry)
        seen.add(test)
Here, we are keeping track of the test labels we've already seen in a set and starting a new list whenever we encounter a label we've already seen in the same list.
Alternatively, assuming the lists always start with testOne, you could just start a new list whenever the label is testOne:
result = []
for entry in my_list:
    test, val = entry.split(":")
    if test == "testOne":
        result.append([entry])
    else:
        result[-1].append(entry)

It'd be nice to have an easy one liner, but I think it'd end up looking a bit too complicated if I tried that. Here's what I came up with:
# Create a list of the starting indices:
ind = [i for i, e in enumerate(my_list) if e.split(':')[0] == 'testOne']
# Create a list of slices using pairs of indices:
new_list = [my_list[i:j] for (i, j) in zip(ind, ind[1:] + [None])]

Not very sophisticated but it works:
my_list = ['testOne:100', 'testTwo:88', 'testThree:76', 'testOne:78', 'testTwo:88', 'testOne:73', 'testTwo:66', 'testThree:90']
splitting_word = 'testOne'
new_list = list()
partial_list = list()
for item in my_list:
    if item.startswith(splitting_word) and partial_list:
        new_list.append(partial_list)
        partial_list = list()
    partial_list.append(item)
new_list.append(partial_list)

Join the list into a string with the delimiter |:
step1="|".join(my_list)
Split the string on 'testOne':
step2=step1.split("testOne")
Prepend "testOne" to each piece and split it on | again to get the result:
new_list=[[i for i in str('testOne'+i).split("|") if len(i)>0] for i in step2[1:]]

cycle "for" in Python

I have to create three new lists of items using two different lists.
list_one = ['one', 'two','three', 'four','five']
list_two = ['blue', 'green', 'white']
So, len(list_one) != len(list_two)
Now I should create an algorithm (a loop) which can do this:
[oneblue, twoblue, threeblue, fourblue, fiveblue]. Same for 'green' and 'white'.
I understand that I should create three loops but I don't know how.
I've tried to make a function like this but it doesn't work.
def mix():
    i = 0
    for i in range(len(list_one)):
        new_list = list_one[i]+list_two[0]
        i = i+1
        return new_list
What am I doing wrong?

I think you might be looking for itertools.product:
>>> import itertools
>>> [b + a for a,b in itertools.product(list_two, list_one)]
['oneblue',
'twoblue',
'threeblue',
'fourblue',
'fiveblue',
'onegreen',
'twogreen',
'threegreen',
'fourgreen',
'fivegreen',
'onewhite',
'twowhite',
'threewhite',
'fourwhite',
'fivewhite']

You should do this
def cycle(list_one,list_two):
    newList = []
    for el1 in list_two:
        for el2 in list_one:
            newList.append(el2+el1)
    return newList

There are a few problems with your code:
When you do a for loop for i in ...:, you do not need to initialize i (i = 0) and you should not increment it (i = i + 1) since Python knows that i will take all values specified in the for loop definition.
If your code indentation (indentation is very important in Python) is truly the one written above, your return statement is inside the for loop. As soon as your function encounters your return statement, your function will exit and return what you specified: in this case, a string.
new_list is not a list but a string.
In Python, you can loop directly over the list items as opposed to their index (for item in list_one: as opposed to for i in range(len(list_one)):).
Here is your code cleaned up:
def mix():
    new_list = []
    for item in list_one:
        new_list.append(item + list_two[0])
    return new_list
This can be rewritten using a list comprehension:
def mix(list_one, list_two):
    return [item+list_two[0] for item in list_one]
And because list_two has more than one item, you would need to iterate over list_two as well:
def mix(list_one, list_two):
    return [item+item2 for item in list_one for item2 in list_two]
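The question asks for three separate lists, one per colour, whereas the last comprehension returns one flat list. A small sketch of a nested comprehension that keeps them apart (the name `mix_by_colour` is made up for this example):

```python
def mix_by_colour(list_one, list_two):
    """Return one list per entry of list_two, each suffixing every item of list_one."""
    return [[item + colour for item in list_one] for colour in list_two]


list_one = ['one', 'two', 'three', 'four', 'five']
list_two = ['blue', 'green', 'white']
blue, green, white = mix_by_colour(list_one, list_two)
print(blue)  # ['oneblue', 'twoblue', 'threeblue', 'fourblue', 'fiveblue']
```

The outer loop runs over the colours, so the result has as many inner lists as list_two has items, whatever the two lengths are.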

return should be out of for loop.
No need to initialize i and increment it, since you are using range.
Also, since both list can be of variable length, don't use range. Iterate over the list elements directly.
def mix(): should be like def mix(l_one,l_two):
All above in below code:
def mix(l_one,l_two):
    new_list = []
    for x in l_one:
        for y in l_two:
            new_list.append(x+y)
    return new_list
list_one = ['one', 'two','three', 'four','five']
list_two = ['blue', 'green', 'white']
n_list = mix(list_one,list_two)
print(n_list)
Output:
C:\Users\dinesh_pundkar\Desktop>python c.py
['oneblue', 'onegreen', 'onewhite', 'twoblue', 'twogreen', 'twowhite', 'threeblue', 'threegreen', 'threewhite', 'fourblue', 'fourgreen', 'fourwhite', 'fiveblue', 'fivegreen', 'fivewhite']
Using List Comprehension, mix() function will look like below:
def mix(l_one,l_two):
    new_list =[x+y for x in l_one for y in l_two]
    return new_list

Python unflatten list based on second list

I have 2 Python lists:
list_a = [[['Ab'], ['Qr', 'Zr']], [['Gt', 'Mh', 'Nt'], ['Dv', 'Cb']]]
list_b = [['Ab', 'QrB', 'Zr'], ['GtB', 'MhB', 'Nt6B', 'DvB', 'Cb6B5']]
I need to un-flatten list_b based on list_a. I need:
list_c = [[['Ab'], ['QrB', 'Zr']], [['GtB', 'MhB', 'Nt6B'], ['DvB', 'Cb6B5']]]
Is there a way to get this list_c?
Additional Information:
The lists will always be defined such that:
A partial string from list_a will always be found in list_b. eg. for Gt in one list, there will be either Gt or GtB in the 2nd list.
Entries in each list cannot be in a different order - i.e. if Qr comes before Zr in one list then it (Qr or QrB) must come before Zr in the 2nd list.
Each list can have a maximum of 20 strings in it.
Each list has only unique strings.. eg. Gt cannot occur 2 or more times in any list.
Attempt:
Here is what I have tried:
list_c = [[],[]]
for ty,iten in enumerate(list_b):
    for q in iten:
        for l_e in list_a:
            for items in l_e:
                for t,qr in enumerate(items):
                    if qr in q:
                        list_c[ty].append([q])
The output of this is:
[[['Ab'], ['QrB'], ['Zr']], [['GtB'], ['MhB'], ['Nt6B'], ['DvB'], ['Cb6B5']]]
The problem is that ['QrB'], ['Zr'] should be combined ['QrB','Zr'] just like they are combined in list_a.
Attempt 2:
for ty,iten in enumerate(list_b):
    for q in iten:
        for l_e,m in enumerate(list_a):
            for ss,items in enumerate(m):
                for t,qr in enumerate(items):
                    if qr in q:
                        list_a[l_e][ss][t] = q
This works and produces the required output:
[[['Ab'], ['QrB', 'Zr']], [['GtB', 'MhB', 'Nt6B'], ['DvB', 'Cb6B5']]]
However, attempt 2 is long, and it does not seem to be the proper way to do this in Python. Is there a more Pythonic way?

If all you care about is the length of the sublists in list_a, then you can transform list_a into its sublist lengths and use those to slice the sublists of list_b:
# Transform list_a into the lengths of its sublists (a generator of generators :)
index_a = ((len(l2) for l2 in l1) for l1 in list_a)
list_c = []
for flatb, index in zip(list_b, index_a):
    splitb = []
    s = 0
    for i in index:
        splitb.append(flatb[s:s+i])
        s += i
    list_c.append(splitb)
Value of list_c:
[[['Ab'], ['QrB', 'Zr']], [['GtB', 'MhB', 'Nt6B'], ['DvB', 'Cb6B5']]]
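The same length-based slicing can also be written with itertools.islice, which takes the next n items from an iterator, so there is no running offset to maintain. A sketch, assuming only the shape of list_a matters (`unflatten_like` is a hypothetical helper name):

```python
from itertools import islice


def unflatten_like(shape, flat_rows):
    """Regroup each flat row to match the sub-list lengths of `shape`."""
    result = []
    for shape_row, flat_row in zip(shape, flat_rows):
        items = iter(flat_row)
        # islice consumes len(group) items from the shared iterator each time
        result.append([list(islice(items, len(group))) for group in shape_row])
    return result


list_a = [[['Ab'], ['Qr', 'Zr']], [['Gt', 'Mh', 'Nt'], ['Dv', 'Cb']]]
list_b = [['Ab', 'QrB', 'Zr'], ['GtB', 'MhB', 'Nt6B', 'DvB', 'Cb6B5']]
print(unflatten_like(list_a, list_b))
# [[['Ab'], ['QrB', 'Zr']], [['GtB', 'MhB', 'Nt6B'], ['DvB', 'Cb6B5']]]
```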

This is a recursive variant for arbitrary depth of nesting. Not too pretty, but should work.
list_a = [[['Ab'], ['Qr', 'Zr']], [['Gt', 'Mh', 'Nt'], ['Dv', 'Cb']]]
list_b = [['Ab', 'QrB', 'Zr'], ['GtB', 'MhB', 'Nt6B', 'DvB', 'Cb6B5']]
def flatten(l):
    for el in l:
        if isinstance(el, list):
            for sub in flatten(el):
                yield sub
        else:
            yield el
def flitten(l1, l2, i):
    result = []
    for j in l1:
        if isinstance(j, list):
            i, res = flitten(j, l2, i)
            result.append(res)
        else:
            result.append(l2[i])
            i += 1
    return i, result
def flutten(l1, l2):
    i, result = flitten(l1, list(flatten(l2)), 0)
    return result
print(flutten(list_a, list_b))
# prints [[['Ab'], ['QrB', 'Zr']], [['GtB', 'MhB', 'Nt6B'], ['DvB', 'Cb6B5']]]

Your code does not look too long given the fairly complicated nature of the task (find a list within a list within a list and match it to a list within another list according to the first two letters, and replace the original value with the matched value retaining the nested structure of the original list...)
You could at least eliminate one of the loops like this:
for sub_a, sub_b in zip(list_a, list_b):
    for inner_a in sub_a:
        for i, a in enumerate(inner_a):
            for b in sub_b:
                if b.startswith(a):
                    inner_a[i] = b
If you want a more general solution it will probably involve recursion as in @Tibor's answer.
EDIT: Given the extra information you have supplied, you could recursively work through list_a, replacing all the short strings with their full versions from an iterator based on the flattened version of list_b. This uses the fact that the strings appear in the same order in both lists with no duplicates.
def replace_abbreviations(L, full_names):
    for i, item in enumerate(L):
        if isinstance(item, str):  # basestring in Python 2
            L[i] = next(full_names)  # full_names.next() in Python 2
        elif isinstance(item, list):
            replace_abbreviations(item, full_names)
replace_abbreviations(list_a, (item for L in list_b for item in L))
Alternatively you can get a flattened list of the indices of each string in both lists and then loop through those:
def flat_indices(L):
    for i, item in enumerate(L):
        if isinstance(item, list):
            for j, inner_list in flat_indices(item):
                yield (j, inner_list)
        else:
            yield (i, L)
# each yielded pair is (index, containing list), so unpack in that order
for (i, a), (j, b) in zip(flat_indices(list_a), flat_indices(list_b)):
    a[i] = b[j]

Nested lists python

Can anyone tell me how can I call for indexes in a nested list?
Generally I just write:
for i in range (list)
but what if I have a list with nested lists as below:
Nlist = [[2,2,2],[3,3,3],[4,4,4]...]
and I want to go through the indexes of each one separately?

If you really need the indices you can just do what you said again for the inner list:
l = [[2,2,2],[3,3,3],[4,4,4]]
for index1 in range(len(l)):
    for index2 in range(len(l[index1])):
        print(index1, index2, l[index1][index2])
But it is more pythonic to iterate through the list itself:
for inner_l in l:
    for item in inner_l:
        print(item)
If you really need the indices you can also use enumerate:
for index1, inner_l in enumerate(l):
    for index2, item in enumerate(inner_l):
        print(index1, index2, item, l[index1][index2])

Try this setup:
a = [["a","b","c",],["d","e"],["f","g","h"]]
To print the 2nd element in the 1st list ("b"), use print(a[0][1]). For the 2nd element in the 3rd list ("g"), use print(a[2][1]).
The first pair of brackets selects which nested list you're accessing; the second pair selects the item in that list.

You can do this. Adapt it to your situation:
for l in Nlist:
    for item in l:
        print(item)

The question title is too wide and the author's need is more specific. In my case, I needed to extract all elements from nested list like in the example below:
Example:
input -> [1,2,[3,4]]
output -> [1,2,3,4]
The code below gives me the result, but I would like to know if anyone can create a simpler answer:
def get_elements_from_nested_list(l, new_l):
    if l is not None:
        e = l[0]
        if isinstance(e, list):
            get_elements_from_nested_list(e, new_l)
        else:
            new_l.append(e)
        if len(l) > 1:
            return get_elements_from_nested_list(l[1:], new_l)
        else:
            return new_l
Call of the method
l = [1,2,[3,4]]
new_l = []
get_elements_from_nested_list(l, new_l)
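One possibly simpler shape for this is a recursive generator, sketched below for Python 3.3+ (it uses yield from). It assumes that only list instances count as nesting, and the name `iter_flat` is made up for the example. Unlike the function above, it also copes with empty lists, since it never indexes l[0].

```python
def iter_flat(nested):
    """Yield the non-list elements of `nested`, depth first."""
    for element in nested:
        if isinstance(element, list):
            yield from iter_flat(element)   # descend into the sub-list
        else:
            yield element


print(list(iter_flat([1, 2, [3, 4]])))  # [1, 2, 3, 4]
```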

n = [[1, 2, 3], [4, 5, 6, 7, 8, 9]]
def flatten(lists):
    results = []
    for numbers in lists:
        for numbers2 in numbers:
            results.append(numbers2)
    return results
print(flatten(n))
Output: [1, 2, 3, 4, 5, 6, 7, 8, 9]

I think you want to access list values and their indices simultaneously and separately:
l = [[2,2,2],[3,3,3],[4,4,4],[5,5,5]]
l_len = len(l)
l_item_len = len(l[0])
for i in range(l_len):
    for j in range(l_item_len):
        print(f'List[{i}][{j}] : {l[i][j]}' )

Remove duplicates in a list while keeping its order (Python)

This is actually an extension of this question: "How to remove these duplicates in a list (python)". The answers of that question did not keep the "order" of the list after removing duplicates.
biglist = [
    {'title':'U2 Band','link':'u2.com'},
    {'title':'Live Concert by U2','link':'u2.com'},
    {'title':'ABC Station','link':'abc.com'}
]
In this case, the 2nd element should be removed because a previous "u2.com" element already exists. However, the order should be kept.

Use set(), then re-sort using each item's index in the original list.
>>> mylist = ['c','a','a','b','a','b','c']
>>> sorted(set(mylist), key=lambda x: mylist.index(x))
['c', 'a', 'b']
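Note that mylist.index(x) rescans the list for every distinct value, so the sort is quadratic in the worst case. For hashable items, dict.fromkeys is a single-pass alternative on Python 3.7+, where dicts keep insertion order. A minimal sketch (the variable name `deduped` is chosen just for this example):

```python
mylist = ['c', 'a', 'a', 'b', 'a', 'b', 'c']

# Keys of a dict are unique and keep first-insertion order (Python 3.7+).
deduped = list(dict.fromkeys(mylist))
print(deduped)  # ['c', 'a', 'b']
```

Like set(), this only works for hashable items, so it does not apply directly to a list of dicts such as biglist.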

My answer to your other question, which you completely ignored!, shows you're wrong in claiming that
The answers of that question did not keep the "order"
my answer did keep order, and it clearly said it did. Here it is again, with added emphasis to see if you can just keep ignoring it...:
Probably the fastest approach, for a really big list, if you want to preserve the exact order of the items that remain, is the following...:
biglist = [
    {'title':'U2 Band','link':'u2.com'},
    {'title':'ABC Station','link':'abc.com'},
    {'title':'Live Concert by U2','link':'u2.com'}
]
known_links = set()
newlist = []
for d in biglist:
    link = d['link']
    if link in known_links: continue
    newlist.append(d)
    known_links.add(link)
biglist[:] = newlist

Generators are great.
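def unique( seq, key=lambda item: item ):
    seen = set()
    for item in seq:
        marker = key( item )
        if marker not in seen:
            seen.add( marker )
            yield item
# dicts are unhashable, so deduplicate biglist on the 'link' value
biglist[:] = unique( biglist, key=lambda d: d['link'] )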

This page discusses different methods and their speeds:
http://www.peterbe.com/plog/uniqifiers-benchmark
The recommended* method:
def f5(seq, idfun=None):
    # order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        # in old Python versions:
        # if seen.has_key(marker)
        # but in new ones:
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result
f5(biglist,lambda x: x['link'])
*by that page

This is an elegant and compact way, with list comprehension (but not as efficient as with dictionary):
mylist = ['aaa','aba','aaa','aea','baa','aaa','aac','aaa',]
[ v for (i,v) in enumerate(mylist) if v not in mylist[0:i] ]
And in the context of the answer:
[ v for (i,v) in enumerate(biglist) if v['link'] not in map(lambda d: d['link'], biglist[0:i]) ]

dups = {}
newlist = []
for x in biglist:
    if x['link'] not in dups:
        newlist.append(x)
        dups[x['link']] = None
print(newlist)
produces (on Python 3.7+, where dicts print in insertion order)
[{'title': 'U2 Band', 'link': 'u2.com'}, {'title': 'ABC Station', 'link': 'abc.com'}]
Note that here I used a dictionary. This makes the test not in dups much more efficient than using a list.
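On Python 3.7+, where dicts preserve insertion order, the dictionary can also hold the kept items themselves. A sketch of that idea (`dedupe_by` and its `key` parameter are names introduced for this example, not part of the answer above):

```python
def dedupe_by(items, key):
    """Keep the first item for each key(item), preserving original order."""
    first_seen = {}
    for item in items:
        # setdefault stores the item only if this key has not been seen yet
        first_seen.setdefault(key(item), item)
    return list(first_seen.values())


biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
]
print(dedupe_by(biglist, key=lambda d: d['link']))
# [{'title': 'U2 Band', 'link': 'u2.com'}, {'title': 'ABC Station', 'link': 'abc.com'}]
```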

Try this:
mylist = ['aaa','aba','aaa','aea','baa','aaa','aac','aaa',]
uniq = []
for i in mylist:
    if i not in uniq:
        uniq.append(i)
print(mylist)
print(uniq)
The output will be:
['aaa', 'aba', 'aaa', 'aea', 'baa', 'aaa', 'aac', 'aaa']
['aaa', 'aba', 'aea', 'baa', 'aac']

A super easy way to do this is:
def uniq(a):
    if len(a) == 0:
        return []
    else:
        return [a[0]] + uniq([x for x in a if x != a[0]])
This is not the most efficient way, because:
it searches through the whole list for every element in the list, so it's O(n^2)
it's recursive so uses a stack depth equal to the length of the list
However, for simple uses (no more than a few hundred items, not performance critical) it is sufficient.

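I think using a set should be pretty efficient.
seen_links = set()
index = 0
while index < len(biglist):
    link = biglist[index]['link']
    if link in seen_links:
        del biglist[index]  # the next item slides into this index
    else:
        seen_links.add(link)
        index += 1
The set lookups are O(1) on average, but each del on a list is O(n), so the worst case is O(n^2) rather than O(n log n).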
