Merge adjacent numbers in a list in Python

I have a list that contains a random number of ints.
I would like to iterate over this list, and if a number and the successive number are within one numeric step of one another, I would like to concatenate them into a sublist.
For example:
input = [1,2,4,6,7,8,10,11]
output = [[1,2],[4],[6,7,8],[10,11]]
The input list will always contain positive ints sorted in increasing order.
I tried some of the code from here.
initerator = iter(inputList)
outputList = [c + next(initerator, "") for c in initerator]
Although I can concat every two entries in the list, I cannot seem to add a suitable if in the list comprehension.
Python version = 3.4

Unless you have to have a one-liner, you could use a simple generator function, combining elements until you hit a non consecutive element:
def consec(lst):
    it = iter(lst)
    prev = next(it)
    tmp = [prev]
    for ele in it:
        if prev + 1 != ele:
            yield tmp
            tmp = [ele]
        else:
            tmp.append(ele)
        prev = ele
    yield tmp
Output:
In [2]: lst = [1, 2, 4, 6, 7, 8, 10, 11]
In [3]: list(consec(lst))
Out[3]: [[1, 2], [4], [6, 7, 8], [10, 11]]

A nice way is to find the "splitting" indices and then slice:
input = [1,2,4,6,7,8,10,11]
idx = [0] + [i+1 for i,(x,y) in enumerate(zip(input,input[1:])) if x+1!=y] + [len(input)]
[ input[u:v] for u,v in zip(idx, idx[1:]) ]
#output:
[[1, 2], [4], [6, 7, 8], [10, 11]]
This uses enumerate() and zip().
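To make the index trick easier to follow, here is a small sketch that wraps the same idea in a function and also returns the intermediate index list. The name split_runs is an assumption for illustration and does not appear in the original answer. Note that an empty input yields [[]] with this approach.

```python
def split_runs(values):
    """Split a sorted list of ints into runs of consecutive numbers."""
    # A new run starts at position i + 1 whenever values[i] + 1 != values[i + 1].
    breaks = [i + 1 for i, (x, y) in enumerate(zip(values, values[1:])) if x + 1 != y]
    idx = [0] + breaks + [len(values)]
    # Each pair of neighbouring indices (u, v) delimits one run.
    runs = [values[u:v] for u, v in zip(idx, idx[1:])]
    return idx, runs


idx, runs = split_runs([1, 2, 4, 6, 7, 8, 10, 11])
print(idx)   # [0, 2, 3, 6, 8]
print(runs)  # [[1, 2], [4], [6, 7, 8], [10, 11]]
```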

Simplest version I have without any imports:
def mergeAdjNum(l):
    r = [[l[0]]]
    for e in l[1:]:
        if r[-1][-1] == e - 1:
            r[-1].append(e)
        else:
            r.append([e])
    return r
About 33% faster than one liners.
This one handles the character prefix grouping mentioned in a comment:
import re

def groupPrefStr(l):
    pattern = re.compile(r'([a-z]+)([0-9]+)')
    r = [[l[0]]]
    pp, vv = re.match(pattern, l[0]).groups()
    vv = int(vv)
    for e in l[1:]:
        p,v = re.match(pattern, e).groups()
        v = int(v)
        if p == pp and v == vv + 1:
            r[-1].append(e)
        else:
            r.append([e])
        # always remember the last prefix and number seen, so that runs
        # longer than two elements keep being extended
        pp, vv = p, v
    return r
This is much slower than the number-only version. Knowing the exact format of the prefix (only one character?) could help avoid the re module and speed things up.
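As a rough sketch of that last idea, the version below avoids the re module under the assumption that every item is exactly one prefix character followed by digits (for example 'a12'). Both that format and the function name group_single_char_prefix are assumptions for illustration, not something the original question confirms.

```python
def group_single_char_prefix(items):
    """Group strings such as 'a1', 'a2', 'b7' into runs that share a prefix
    and have consecutive numbers. Assumes a single-character prefix."""
    if not items:
        return []
    groups = [[items[0]]]
    prev_prefix, prev_num = items[0][0], int(items[0][1:])
    for item in items[1:]:
        prefix, num = item[0], int(item[1:])
        if prefix == prev_prefix and num == prev_num + 1:
            groups[-1].append(item)   # extends the current run
        else:
            groups.append([item])     # starts a new run
        prev_prefix, prev_num = prefix, num
    return groups


print(group_single_char_prefix(['a1', 'a2', 'a3', 'b4', 'b6']))
# [['a1', 'a2', 'a3'], ['b4'], ['b6']]
```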

Related

Sort tree which divides list into k parts python

I understand how to sort a list using a binary tree, e.g. sort [1,3,5,6,7,3,4,2] from smallest to largest. I recursively split the data into 2 parts each time until it becomes n lists. I then compare 2 lists at a time and append the smaller value to a new list. I do not understand how to do this when I am required to split the list into k parts each time, e.g. k=3: [1,3,5] [6,7,3] [4,2]. I could only find a solution in Java, so could someone explain this to me using Python?
You have k sublists. At every iteration, find the sublist whose first element is the smallest; append that element to the result list; advance one in that sublist and don't advance in the other sublists.
This is easier if you have a function arg_min or min_with_index that gives you the smallest element as well as its index (so you know which sublist it comes from).
Here are two equivalent ways of writing function min_with_index using python's builtin min to get the min, and enumerate to get the index:
def min_with_index(it):
    return min(enumerate(it), key=lambda p:p[1])

import operator
def min_with_index(it):
    return min(enumerate(it), key=operator.itemgetter(1))
# >>> min_with_index([14,16,13,15])
# (2, 13)
This was for merging. Here are three different ways of splitting, using list slices:
def split_kway_1(l, k):
    return [l[i::k] for i in range(k)]

def split_kway_2(l, k):
    j = (len(l)-1) // k + 1
    return [l[i:i+j] for i in range(0,len(l),j)]

def split_kway_3(l, k):
    j = len(l) // k
    result = [l[i:i+j] for i in range(0, j*(k-1), j)]
    result.append(l[j*(k-1):])
    return result
# >>> split_kway_1(list(range(10)), 3)
# [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
# >>> split_kway_2(list(range(10)), 3)
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
# >>> split_kway_3(list(range(10)), 3)
# [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
# versions 2 and 3 differ only when the length of the list is not a multiple of k
And now we can combine splitting and merging to write merge sort:
import operator

def split_kway(l, k):
    return [l[i::k] for i in range(k)]

def min_with_index(it):
    return min(enumerate(it), key=operator.itemgetter(1))

def merge_kway(list_of_sublists):
    result = []
    list_of_sublists = [l for l in list_of_sublists if len(l) > 0]
    while list_of_sublists:
        i,v = min_with_index(l[0] for l in list_of_sublists)
        result.append(v)
        if len(list_of_sublists[i]) > 1:
            list_of_sublists[i].pop(0) # advance in sublist i
        else:
            list_of_sublists.pop(i) # remove sublist i which is now empty
    return result

def merge_sort_kway(l, k):
    if len(l) > 1:
        list_of_sublists = split_kway(l, k)
        list_of_sublists = [merge_sort_kway(l, k) for l in list_of_sublists]
        return merge_kway(list_of_sublists)
    else:
        return list(l)
See also: Wikipedia on k-way merge
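As an alternative to a hand-written merge step, the standard library's heapq.merge merges any number of already-sorted inputs. The sketch below combines it with the strided split shown earlier; the function name merge_sort_kway_heap is an assumption for illustration, and k is assumed to be at least 2 so that every split makes the sublists strictly shorter.

```python
import heapq


def merge_sort_kway_heap(values, k):
    """Sort values by splitting into k parts and merging them with heapq.merge."""
    if k < 2:
        raise ValueError("k must be at least 2, otherwise the recursion never shrinks")
    if len(values) <= 1:
        return list(values)
    # Strided split: part i takes elements i, i + k, i + 2k, ...
    parts = [values[i::k] for i in range(k)]
    sorted_parts = [merge_sort_kway_heap(part, k) for part in parts]
    # heapq.merge lazily yields the smallest remaining head among all parts.
    return list(heapq.merge(*sorted_parts))


print(merge_sort_kway_heap([1, 3, 5, 6, 7, 3, 4, 2], 3))
# [1, 2, 3, 3, 4, 5, 6, 7]
```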

Unable to reset counters in for loop

I am trying to amend a list of integers so that every pair of duplicate integers is replaced by a single integer of twice the value. Here is an example:
a = [1, 1, 2, 3] -> [2, 2, 3] -> [4, 3]
also: b = [2, 3, 3, 6, 9] -> [2, 6, 6, 9] -> [2, 12, 9]
I am using the code below to achieve this. Unfortunately, every time I find a match my index would skip the next match.
user_input = [int(a) for a in input().split()]
for index, item in enumerate(user_input):
    while len(user_input)-2 >= index:
        if item == user_input[index + 1]:
            del user_input[index]
            del user_input[index]
            item += item
            user_input.insert(index,item)
        break
print(*user_input)
In Python, you should never modify a container object while you are iterating over it. There are some exceptions if you know what you are doing, but you certainly should not change the size of the container object. That is what you are trying to do and that is why it fails.
Instead, use a different approach. Iterate over the list but construct a new list. Modify that new list as needed. Here is code that does what you want. This builds a new list named new_list and either changes the last item(s) in that list or appends a new item. The original list is never changed.
user_input = [int(a) for a in input().split()]
new_list = []
for item in user_input:
    while new_list and (item == new_list[-1]):
        new_list.pop()
        item *= 2
    new_list.append(item)
print(*new_list)
This code passes the two examples you gave. It also passes the example [8, 4, 2, 1, 1, 7] which should result in [16, 7]. My previous version did not pass that last test but this new version does.
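For easier testing, the same stack-like approach can be wrapped in a function instead of reading from input(). This is only a sketch; the name collapse_pairs is an assumption for illustration.

```python
def collapse_pairs(values):
    """Repeatedly merge equal neighbours into one item of twice the value."""
    new_list = []
    for item in values:
        # While the incoming item equals the last kept item, merge them;
        # the doubled item may in turn match the new last item.
        while new_list and item == new_list[-1]:
            new_list.pop()
            item *= 2
        new_list.append(item)
    return new_list


print(collapse_pairs([1, 1, 2, 3]))        # [4, 3]
print(collapse_pairs([2, 3, 3, 6, 9]))     # [2, 12, 9]
print(collapse_pairs([8, 4, 2, 1, 1, 7]))  # [16, 7]
```

The input list is left untouched, which is the point of the answer above: the result is built in a separate list rather than by resizing the list being iterated.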
Check if this works Rory!
import copy
user_input = [1,1,2,3]
res = []
while res!=user_input:
    a = user_input.pop(0)
    if len(user_input)!=0:
        b = user_input.pop(0)
        if a==b:
            user_input.insert(0,a+b)
        else:
            res.append(a)
            user_input.insert(0,b)
    else:
        res.append(a)
        user_input = copy.deepcopy(res)
You can use itertools.groupby and recursion.
Check for same consecutive elements:
from itertools import groupby

def same_consec(lst):
    return any(len(list(g)) > 1 for _, g in groupby(lst))
Replace consecutive same elements:
def replace_consec(lst):
    if same_consec(lst):
        lst = [k * 2 if len(list(g)) > 1 else k for k, g in groupby(lst)]
        return replace_consec(lst)
    else:
        return lst
Usage:
>>> a = [8, 4, 2, 1, 1, 7]
>>> replace_consec(a)
[16, 7]

Find unique pairs in list of pairs

I have a (large) list of lists of integers, e.g.,
a = [
[1, 2],
[3, 6],
[2, 1],
[3, 5],
[3, 6]
]
Most of the pairs will appear twice, where the order of the integers doesn't matter (i.e., [1, 2] is equivalent to [2, 1]). I'd now like to find the pairs that appear only once, and get a Boolean list indicating that. For the above example,
b = [False, False, False, True, False]
Since a is typically large, I'd like to avoid explicit loops. Mapping to frozensets may be advised, but I'm not sure if that's overkill.
from collections import Counter

ctr = Counter(frozenset(x) for x in a)
b = [ctr[frozenset(x)] == 1 for x in a]
We can use Counter to get counts of each list (turn list to frozenset to ignore order) and then for each list check if it only appears once.
Here's a solution with NumPy that is 10 times faster than the suggested frozenset solution:
import numpy

a = numpy.array(a)
a.sort(axis=1)
b = numpy.ascontiguousarray(a).view(
    numpy.dtype((numpy.void, a.dtype.itemsize * a.shape[1]))
)
_, inv, ct = numpy.unique(b, return_inverse=True, return_counts=True)
print(ct[inv] == 1)
Sorting is fast and makes sure that the edges [i, j], [j, i] in the original array identify with each other. Much faster than frozensets or tuples.
Row uniquification inspired by https://stackoverflow.com/a/16973510/353337.
Speed comparison for different array sizes:
The plot was created with
from collections import Counter
import numpy
import perfplot
def fs(a):
    ctr = Counter(frozenset(x) for x in a)
    b = [ctr[frozenset(x)] == 1 for x in a]
    return b

def with_numpy(a):
    a = numpy.array(a)
    a.sort(axis=1)
    b = numpy.ascontiguousarray(a).view(
        numpy.dtype((numpy.void, a.dtype.itemsize * a.shape[1]))
    )
    _, inv, ct = numpy.unique(b, return_inverse=True, return_counts=True)
    res = ct[inv] == 1
    return res

perfplot.save(
    "out.png",
    setup=lambda n: numpy.random.randint(0, 10, size=(n, 2)),
    kernels=[fs, with_numpy],
    labels=["frozenset", "numpy"],
    n_range=[2 ** k for k in range(15)],
    xlabel="len(a)",
)
You could scan the list from start to end, while maintaining a map of encountered pairs to their first position. Whenever you process a pair, you check to see if you've encountered it before. If that's the case, both the first encounter's index in b and the current encounter's index must be set to False. Otherwise, we just add the current index to the map of encountered pairs and change nothing about b. b will start initially all True. To keep things equivalent wrt [1,2] and [2,1], I'd first simply sort the pair, to obtain a stable representation. The code would look something like this:
def proc(a):
    b = [True] * len(a) # Better way to allocate this
    filter = {}
    idx = 0
    for p in a:
        m = min(p)
        M = max(p)
        pp = (m, M)
        if pp in filter:
            # We've found the element once previously
            # Need to mark both it and the current value as "False"
            # If we encounter pp multiple times, we'll set the initial
            # value to False multiple times, but that's not an issue
            b[filter[pp]] = False
            b[idx] = False
        else:
            # This is the first time we encounter pp, so we just add it
            # to the filter for possible later encounters, but don't affect
            # b at all.
            filter[pp] = idx
        idx += 1
    return b
The time complexity is O(len(a)) which is good, but the space complexity is also O(len(a)) (for filter), so this might not be so great. Depending on how flexible you are, you can use an approximate filter such as a Bloom filter.
#-*- coding : utf-8 -*-
a = [[1, 2], [3, 6], [2, 1], [3, 5], [3, 6]]
result = filter(lambda el:(a.count([el[0],el[1]]) + a.count([el[1],el[0]]) == 1),a)
bool_res = [ (a.count([el[0],el[1]]) + a.count([el[1],el[0]]) == 1) for el in a]
print result
print bool_res
which gives:
[[3, 5]]
[False, False, False, True, False]
Use a dictionary for an O(n) solution.
a = [ [1, 2], [3, 6], [2, 1], [3, 5], [3, 6] ]
dict = {}
boolList = []
# Iterate through a
for i in range (len(a)):
    # Assume that this element is not a duplicate
    # This 'True' is added to the corresponding index i of boolList
    boolList += [True]
    # Set elem to the current pair in the list
    elem = a[i]
    # If elem is in ascending order, it will be entered into the map as is
    if elem[0] <= elem[1]:
        key = repr(elem)
    # If not, change it into ascending order so keys can easily be compared
    else:
        key = repr( [ elem[1] ] + [ elem[0] ])
    # If this pair has not yet been seen, add it as a key to the dictionary
    # with the value a list containing its index in a.
    if key not in dict:
        dict[key] = [i]
    # If this pair is a duplicate, add the new index to the dict. The value
    # of the key will contain a list containing the indices of that pair in a.
    else:
        # Change the value to contain the new index
        dict[key] += [i]
        # Change boolList to False for this index
        boolList[i] = False
        # If this is the first duplicate for the pair, make the first
        # occurrence of the pair into a duplicate as well.
        if len(dict[key]) <= 2:
            boolList[ dict[key][0] ] = False
print a
print boolList

Nested lists , check if one list has common elements with other and if so join [duplicate]

This question already has answers here:
Union find implementation using Python
(4 answers)
Closed 7 years ago.
I have a list of lists, say [[1, 3, 5], [2, 4], [1,7,9]].
My requirement is to iterate through the list and reduce it to
[[1,3,5,7,9], [2,4]].
How would I do it?
Algo:
1. Get the base element from the new method.
2. Remove the first item from the input list and create a new variable for it.
3. Iterate over every item in the new list.
4. Check whether any element of the item is present in the base set, using the set intersection method.
5. If present, then do 6, 7, 8, 9.
6. Update base with the current item using the set update method.
7. Remove the current item from the list.
8. Set flag to True.
9. Break the for loop, because we need to check again from the first item.
10. Create the final result list by adding base and the remaining list.
[Edit:]
Issue:
The previous code took the first item of the given list as the base item. When that item has nothing in common with the other items, but other items do overlap with each other, the code does not work.
Updated:
Take as base the first item in the given list that has a match with any other item in the list.
[Edit2]:
The merged item is now inserted at its respective position.
Demo:
import copy
a = [[13, 15, 17], [66,77], [1, 2, 4], [1,7,9]]
#- Get base
base = None
length_a = len(a)
base_flag = True
i = -1
while base_flag and i!=length_a-1:
    i += 1
    item = a[i]
    for j in xrange(i+1, length_a):
        tmp = set(item).intersection(set(a[j]))
        if tmp:
            base = set(item)
            base_flag = False
            break
print "Selected base:", base
if base:
    tmp_list = copy.copy(a)
    target_index = i
    tmp_list.pop(target_index)
    flag = True
    while flag:
        flag = False
        for i, item in enumerate(tmp_list):
            tmp = base.intersection(set(item))
            if tmp:
                base.update(set(item))
                tmp_list.pop(i)
                flag = True
                break
    print "base:", base
    print "tmp_list:", tmp_list
    result = tmp_list
    result.insert(target_index, list(base))
else:
    result = a
print "\nFinal result:", result
Output:
$ python task4.py
Selected base: set([1, 2, 4])
base: set([1, 2, 4, 7, 9])
tmp_list: [[13, 15, 17], [66, 77]]
Final result: [[13, 15, 17], [66, 77], [1, 2, 4, 7, 9]]
It's quite inefficient, but this does the trick:
def combine_lists(lists):
    # Keep a set of processed items
    skip = set()
    for i, a in enumerate(lists):
        # If we already used this list, skip it
        if i in skip:
            continue
        for j, b in enumerate(lists[i + 1:], i + 1):
            # Use a set to check if there are common numbers
            if set(a) & set(b):
                skip.add(j)
                for x in b:
                    if x not in a:
                        a.append(x)
    # yield all lists that were not added to different lists
    for i, a in enumerate(lists):
        if i not in skip:
            yield a
[edit] Just noticed that the order doesn't matter anymore (your output suggests that it does), that makes things easier :)
This version should be fairly optimal:
def combine_lists(lists):
    # Keep a set of processed items
    skip = set()
    # Build a real list of sets so it can be sliced (map() is lazy in Python 3)
    sets = [set(l) for l in lists]
    for i, a in enumerate(sets):
        # If we already returned this set, skip it
        if i in skip:
            continue
        for j, b in enumerate(sets[i + 1:], i + 1):
            # Use a set to check if there are common numbers
            if a & b:
                skip.add(j)
                a |= b
    # yield all sets that were not added to different sets
    for i, a in enumerate(sets):
        if i not in skip:
            yield a
a = [[1,2], [3,4 ], [1,5,3], [5]] # output: [set([1, 2, 3, 4, 5])]
# a = [[1, 3, 5], [2, 4], [1,7,9]] # output: [set([1, 3, 5, 7, 9]), set([2, 4])]
# convert them to sets
a = [set(x) for x in a]
go = True
while go:
    merged = False
    head = a[0]
    for idx, item in enumerate(a[1:]):
        if head.intersection(item):
            a[0] = head.union(item)
            a.pop(idx + 1)
            merged = True
            break
    if not merged:
        go = False
print a
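Since the question was closed as a duplicate of a union-find question, here is a minimal union-find (disjoint-set) sketch for the same task. The function name merge_overlapping is an assumption for illustration. Unlike the answers above, it sorts the elements of each merged group rather than preserving their original order, and it assumes the elements are hashable and comparable.

```python
def merge_overlapping(lists):
    """Merge lists that share at least one element, directly or transitively."""
    parent = {}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_y] = root_x

    for lst in lists:
        for x in lst:
            parent.setdefault(x, x)   # register unseen elements as their own root
        for x in lst[1:]:
            union(lst[0], x)          # everything in one list belongs together

    groups = {}
    for x in parent:
        groups.setdefault(find(x), []).append(x)
    return [sorted(group) for group in groups.values()]


print(sorted(merge_overlapping([[1, 3, 5], [2, 4], [1, 7, 9]])))
# [[1, 3, 5, 7, 9], [2, 4]]
```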

How to remove an entry in a list based on a value in the sublist

I have a list as below:
list = [ [1,2,3,4,5],
[1,2,3,3,5],
[1,2,3,2,5],
[1,2,3,4,6] ]
I would like to go through this list and remove an entry if it satisfies both conditions below:
list[i][0] is the same as list[i+1][0], AND
list[i][4] is the same as list[i+1][4]
which will result in the list below:
list = [ [1,2,3,4,5],
[1,2,3,4,6]]
Any help is much appreciated. Thanks.
Edit: Using Python 2.5.4
Use a list comprehension to keep everything not matching the condition:
[sublist for i, sublist in enumerate(yourlist)
if i + 1 == len(yourlist) or (sublist[0], sublist[4]) != (yourlist[i+1][0], yourlist[i + 1][4])]
So, any row that is either the last one, or one where the first and last element do not match the same columns in the next row is allowed.
Result:
>>> [sublist for i, sublist in enumerate(yourlist)
... if i + 1 == len(yourlist) or (sublist[0], sublist[4]) != (yourlist[i+1][0], yourlist[i + 1][4])]
[[1, 2, 3, 2, 5], [1, 2, 3, 4, 6]]
Less concise non-list comprehension version.
list = [ [1,2,3,4,5],
         [1,2,3,3,5],
         [1,2,3,2,5],
         [1,2,3,4,6] ]
output = []
for i, v in enumerate(list):
    if i +1 < len(list):
        if not (list[i][0] == list[i+1][0] and list[i][4] == list[i+1][4]):
            output.append(v)
    else:
        output.append(v)
print output
Just to put some itertools on the table :-)
from itertools import izip_longest
l = [ [1,2,3,4,5],
      [1,2,3,3,5],
      [1,2,3,2,5],
      [1,2,3,4,6] ]
def foo(items):
    for c, n in izip_longest(items, items[1:]):
        if not n or c[0] != n[0] or c[4] != n[4]:
            yield c
print list(foo(l))
Output:
[[1, 2, 3, 2, 5], [1, 2, 3, 4, 6]]
This works if you don't mind that it does not operate in place but rather creates a new list.
Edit:
Since you told us you are using 2.5.4, you can use a method like the following instead of izip_longest:
# warning! items must not be empty :-)
def zip_longest(items):
    g = iter(items)
    next(g)
    for item in items:
        yield item, next(g, None)
I think list comprehension is best for this code.
res = [value for index, value in enumerate(a) if not (index < len(a)-1 and value[0]==a[index+1][0] and value[4]==a[index+1][4])]
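The answers above keep the last row of each run of matching rows ([1, 2, 3, 2, 5] in the example), whereas the expected output in the question keeps the first one ([1, 2, 3, 4, 5]). If keeping the first row is what is wanted, one possible sketch uses itertools.groupby on the (first, fifth) column pair. The function name keep_first_of_each_run is an assumption for illustration, and the sketch is written for Python 3 rather than the 2.5.4 mentioned in the question.

```python
from itertools import groupby


def keep_first_of_each_run(rows):
    """Keep only the first row of each run of adjacent rows whose
    columns 0 and 4 are equal."""
    def key(row):
        return (row[0], row[4])

    # groupby yields one group per run of adjacent rows with the same key;
    # next(group) takes the first row of that run.
    return [next(group) for _, group in groupby(rows, key=key)]


rows = [[1, 2, 3, 4, 5],
        [1, 2, 3, 3, 5],
        [1, 2, 3, 2, 5],
        [1, 2, 3, 4, 6]]
print(keep_first_of_each_run(rows))
# [[1, 2, 3, 4, 5], [1, 2, 3, 4, 6]]
```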
