I've organized my data into 3 lists. The first one simply contains floating-point numbers, some of which are duplicates. The second and third lists contain 1D arrays of variable length.
The first list is sorted and all lists contain the same number of elements.
The overall format is this:
a = [1.0, 1.5, 1.5, 2, 2]
b = [arr([1 2 3 4 10]), arr([4 8 10 11 5 6 12]), arr([1 5 7]), arr([70 1 2]), arr([1])]
c = [arr([3 4 8]), arr([5 6 12]), arr([6 7 10 123 14]), arr([70 1 2]), arr([1 5 10 4])]
I'm trying to find a way to merge the arrays in lists b and c if their corresponding float number is the same in the list a. For the example above, the desired result would be:
a = [1.0, 1.5, 2]
b = [arr([1 2 3 4 10]), arr([4 8 10 11 5 6 12 1 5 7]), arr([70 1 2 1])]
c = [arr([3 4 8]), arr([5 6 12 6 7 10 123 14]), arr([70 1 2 1 5 10 4])]
How would I go about doing this? Does it have something to do with zip?
Since a is sorted, I would use itertools.groupby. Similar to @MadPhysicist's answer, but iterating over the zip of the lists:
import numpy as np
from itertools import groupby
arr = np.array
a = [1.0, 1.5, 1.5, 2, 2]
b = [arr([1, 2, 3, 4, 10]), arr([4, 8, 10, 11, 5, 6, 12]), arr([1, 5, 7]), arr([70, 1, 2]), arr([1])]
c = [arr([3, 4, 8]), arr([5, 6, 12]), arr([6, 7, 10, 123, 14]), arr([70, 1, 2]), arr([1, 5, 10, 4])]
res_a, res_b, res_c = [], [], []
for k, g in groupby(zip(a, b, c), key=lambda x: x[0]):
    g = list(g)
    res_a.append(k)
    res_b.append(np.concatenate([x[1] for x in g]))
    res_c.append(np.concatenate([x[2] for x in g]))
This outputs res_a, res_b and res_c as:
[1.0, 1.5, 2]
[array([ 1, 2, 3, 4, 10]), array([ 4, 8, 10, 11, 5, 6, 12, 1, 5, 7]), array([70, 1, 2, 1])]
[array([3, 4, 8]), array([ 5, 6, 12, 6, 7, 10, 123, 14]), array([70, 1, 2, 1, 5, 10, 4])]
Alternatively, in case a is not sorted, you can use a defaultdict:
import numpy as np
from collections import defaultdict
arr = np.array
a = [1.0, 1.5, 1.5, 2, 2]
b = [arr([1, 2, 3, 4, 10]), arr([4, 8, 10, 11, 5, 6, 12]), arr([1, 5, 7]), arr([70, 1, 2]), arr([1])]
c = [arr([3, 4, 8]), arr([5, 6, 12]), arr([6, 7, 10, 123, 14]), arr([70, 1, 2]), arr([1, 5, 10, 4])]
res_a, res_b, res_c = [], [], []
d = defaultdict(list)
for x, y, z in zip(a, b, c):
    d[x].append([y, z])
for k, v in d.items():
    res_a.append(k)
    res_b.append(np.concatenate([x[0] for x in v]))
    res_c.append(np.concatenate([x[1] for x in v]))
Since a is sorted, you could use itertools.groupby on the range of indices in your list, keyed by a:
from itertools import groupby
result_a = []
result_b = []
result_c = []
for key, group in groupby(range(len(a)), key=a.__getitem__):
    group = list(group)
    index = slice(group[0], group[-1] + 1)
    result_a.append(key)
    result_b.append(np.concatenate(b[index]))
    result_c.append(np.concatenate(c[index]))
group is an iterator, so you need to consume it to get the actual indices it represents. Each group contains all the indices that correspond to the same value in a.
slice(...) is what gets passed to list.__getitem__ any time there is a : in the indexing expression. index is equivalent to group[0]:group[-1] + 1. This slices out the portion of the list that corresponds to each key in a.
Finally, np.concatenate just merges your arrays together in batches.
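As a quick illustration of the slice equivalence on a toy list:

```python
# slice(start, stop) is the object form of the start:stop syntax
lst = ['a', 'b', 'c', 'd', 'e']
index = slice(1, 4)
print(lst[index])  # ['b', 'c', 'd']
print(lst[1:4])    # ['b', 'c', 'd'] -- identical
```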
If you wanted to do this without doing list(group), you could consume the iterator in other ways, without keeping the values around. For example, you could get groupby to do it for you:
from itertools import groupby
result_a = []
result_b = []
result_c = []
prev = None
for key, group in groupby(range(len(a)), key=a.__getitem__):
    index = next(group)
    result_a.append(key)
    if prev is not None:
        result_b.append(np.concatenate(b[prev:index]))
        result_c.append(np.concatenate(c[prev:index]))
    prev = index
if prev is not None:
    result_b.append(np.concatenate(b[prev:]))
    result_c.append(np.concatenate(c[prev:]))
At that point, you wouldn't even really need to use groupby since it wouldn't be much more work to keep track of everything yourself:
result_a = []
result_b = []
result_c = []
k = None
for i, n in enumerate(a):
    if n == k:
        continue
    result_a.append(n)
    if k is not None:
        result_b.append(np.concatenate(b[prev:i]))
        result_c.append(np.concatenate(c[prev:i]))
    k = n
    prev = i
if k is not None:
    result_b.append(np.concatenate(b[prev:]))
    result_c.append(np.concatenate(c[prev:]))
EDIT: the solutions above from @Austin and @MadPhysicist are better, so you should use them; mine reinvents the wheel, which is not the Pythonic way.
I think modifying the original arrays is dangerous, so even though this approach uses twice as much memory, it is safer to iterate and build new lists this way.
What's happening:
- iterate over a and search for occurrences of the same value in the rest of a (we exclude the current index with remove(i))
- if there are no duplicates, just copy b and c over as usual
- if there are, merge into temp lists, then append those to a1, b1 and c1; the value is then blocked so a later duplicate won't trigger another merge (the check at the top of the loop skips blocked values)
- return the new lists
I didn't bother with NumPy arrays, though I used np.where since it is a bit faster than a list comprehension. Feel free to edit the data formats etc.; mine are kept simple for demonstration purposes.
import numpy as np
a = [1.0, 1.5, 1.5, 2, 2]
b = [[1, 2, 3, 4, 10], [4, 8, 10, 11, 5, 6, 12], [1, 5, 7], [70, 1, 2], [1]]
c = [[3, 4, 8], [5, 6, 12], [6, 7, 10, 123, 14], [70, 1, 2], [1, 5, 10, 4]]
def function(list1, list2, list3):
    a1 = []
    b1 = []
    c1 = []
    merged_list = []
    # to preserve the original index we use enumerate
    for i, item in enumerate(list1):
        # to avoid merging twice, skip values from a that we already checked
        if item not in merged_list:
            list_without_elem = np.array(list1)
            ixs = np.where(list_without_elem == item)[0].tolist()
            ixs.remove(i)  # remove our original index
            # if empty, append to the new lists as usual since no merge is needed
            if not ixs:
                a1.append(item)
                b1.append(list2[i])
                c1.append(list3[i])
                merged_list.append(item)
            else:
                temp1 = [*list2[i]]  # temp b and c prefilled with the first b and c
                temp2 = [*list3[i]]
                for ix in ixs:
                    temp1.extend(list2[ix])
                    temp2.extend(list3[ix])
                a1.append(item)
                b1.append(temp1)
                c1.append(temp2)
                merged_list.append(item)
    print(a1)
    print(b1)
    print(c1)
# example output
# [1.0, 1.5, 2]
# [[1, 2, 3, 4, 10], [4, 8, 10, 11, 5, 6, 12, 1, 5, 7], [70, 1, 2, 1]]
# [[3, 4, 8], [5, 6, 12, 6, 7, 10, 123, 14], [70, 1, 2, 1, 5, 10, 4]]
I have an array a = [[1,2,3,4,5,6,7,8,9,10],[4,1,6,2,3,5,8,9,7,10]], where, let's say, a1 = [1,2,3,4,5,6,7,8,9,10] and a2 = [4,1,6,2,3,5,8,9,7,10], from which I have constructed a cyclic permutation. Note that a1 is a sorted array. For example, in my case, the cycles are:
c = [[4, 2, 1], [6, 5, 3], [8, 9, 7], [10]]
Let's say c1 = [4, 2, 1], c2 = [6, 5, 3], c3 = [8, 9, 7] and c4 = [10].
Now I want to form new arrays a11 and a22 as follows:
I have a method that gives all the cycles in a given permutation, but constructing new arrays from it seems complicated. Any ideas on how to implement this in Python 3 would be much appreciated.
-------------------
To obtain the cycles:
def cx(individual):
    # map 1-based position -> value
    c = {i + 1: individual[i] for i in range(len(individual))}
    cycles = []
    while c:
        elem0 = next(iter(c))  # arbitrary starting element
        this_elem = c[elem0]
        next_item = c[this_elem]
        cycle = []
        while True:
            cycle.append(this_elem)
            del c[this_elem]
            this_elem = next_item
            if next_item in c:
                next_item = c[next_item]
            else:
                break
        cycles.append(cycle)
    return cycles
aa = cx([4, 1, 6, 2, 3, 5, 8, 9, 7, 10])
print("array: ", aa)
You can use itertools.permutations to get the different orderings of the sublists of a, then use itertools.cycle to cycle through dicts that map the items of each sublist to their indices, and zip the sublists of c with those mappings to produce sequences that follow the indices specified by the cycling dicts:
a = [[1,2,3,4,5,6,7,8,9,10],[4,1,6,2,3,5,8,9,7,10]]
c = [[4, 2, 1], [6, 5, 3], [8, 9, 7], [10]]
from itertools import cycle, permutations
print([[d[i] for i in range(len(d))] for l in permutations(a) for d in ({p[n]: n for s, p in zip(c, cycle({n: i for i, n in enumerate(s)} for s in l)) for n in s},)])
This outputs:
[[1, 2, 6, 4, 3, 5, 7, 8, 9, 10], [4, 1, 3, 2, 5, 6, 8, 9, 7, 10]]
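For readability, the dense comprehension above can be unpacked into equivalent explicit loops (same logic, just spelled out):

```python
from itertools import cycle, permutations

a = [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [4, 1, 6, 2, 3, 5, 8, 9, 7, 10]]
c = [[4, 2, 1], [6, 5, 3], [8, 9, 7], [10]]

results = []
for l in permutations(a):
    # one element->index map per sublist of this ordering, repeated cyclically
    maps = cycle({n: i for i, n in enumerate(s)} for s in l)
    d = {}
    for s, p in zip(c, maps):
        for n in s:
            d[p[n]] = n  # place each cycle element at its index under map p
    results.append([d[i] for i in range(len(d))])
print(results)
# [[1, 2, 6, 4, 3, 5, 7, 8, 9, 10], [4, 1, 3, 2, 5, 6, 8, 9, 7, 10]]
```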
Say I have a matrix A = [a_1, a_2, ..., a_n]. Each column a_i belongs to a class; all classes are from 1 to K, and all n columns' labels are stored in one n-dim vector b.
Now for each class i, I need to sum all the vectors in class i together and put the resulting vector as the i-th column of a new matrix. So the new matrix has K columns and the same number of rows as A.
I know nonzero() can help me get the indices corresponding to one label, but I don't know how to write everything without a loop. I'm actually working on a large matrix, so using any for loop will definitely ruin the efficiency.
Example as following:
A = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
label = [1,2,3,1,2,3]
New matrix = [[1+4,2+5,3+6],
[7+10,8+11,9+12]]
K=3, n=6
One way to avoid explicit loops is to use map.
First create a function to reduce a list (note that this hardcodes the split at index 3 for this particular example):
def red(l):
    return list(map(lambda x: x[0] + x[1], zip(l[:3], l[3:])))
A = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
Then apply map to the list of lists.
>>> list(map(red, A))
[[5, 7, 9], [17, 19, 21]]
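Since red hardcodes the split at index 3, it only works when the labels are exactly [1, 2, 3, 1, 2, 3]. A label-aware sketch (the helper name red_by_label is made up here) could look like:

```python
def red_by_label(row, labels):
    # sum the entries of one row that share the same label, one output per class
    classes = sorted(set(labels))
    return [sum(v for v, lab in zip(row, labels) if lab == k) for k in classes]

A = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
label = [1, 2, 3, 1, 2, 3]
print([red_by_label(row, label) for row in A])  # [[5, 7, 9], [17, 19, 21]]
```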

I do not know why you cannot use loops, but here is a solution:
import numpy as np
def masksum(v, vmask, elemn):
    vmask = list(map(lambda x: 1 if x == elemn else 0, vmask))
    a = np.array(v)
    return np.sum(a * vmask)
def mysum(v1, vmask):
    norepeat = list(set(vmask))
    return list(map(lambda elemn: masksum(v1, vmask, elemn), norepeat))
A = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
label = [1,2,3,1,2,3]
result = list(map(lambda vectori: mysum(vectori,label), A))
print(result)
label works like a mask; you can try it with [1,3,3,1,3,3] or [4,4,4,5,5,5] etc. and the code will still work.
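Since the question stresses avoiding Python-level loops on a large matrix, here is a fully vectorized NumPy sketch (assuming the labels are integers 1..K); np.add.at performs an unbuffered scatter-add, so repeated target columns accumulate correctly:

```python
import numpy as np

A = np.array([[1, 2, 3, 4, 5, 6],
              [7, 8, 9, 10, 11, 12]])
label = np.array([1, 2, 3, 1, 2, 3])

K = label.max()
out = np.zeros((A.shape[0], K), dtype=A.dtype)
# scatter-add each column of A into the output column given by its (zero-based) label
np.add.at(out, (slice(None), label - 1), A)
print(out)
# [[ 5  7  9]
#  [17 19 21]]
```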
I want to permute the elements of a list according to their index modulo 3, so for example the list:
[0,1,2,3,4,5,6,7,8]
should be reordered into:
[0,3,6,1,4,7,2,5,8]
and in general:
[A0, A1, A2, A3, A4, A5, A6, A7, A8]
should become:
[A0, A3, A6, A1, A4, A7, A2, A5, A8]
I have tried to use the following code:
def arr_sort(arr, algo):
    arrtmp = arr
    arrlen = len(arr)
    if algo == 1:
        return arr
    if algo == 2:
        count = 0
        while count < (arrlen - 1):
            for index, val in enumerate(arr):
                if index % 3 == 0:
                    arrtmp[count] = val
                    count += 1
            for index, val in enumerate(arr):
                if index % 3 == 1:
                    arrtmp[count] = val
                    count += 1
            for index, val in enumerate(arr):
                if index % 3 == 2:
                    arrtmp[count] = val
                    count += 1
        return arrtmp
It's not working properly, as arr gets changed throughout the while loop, and I can't really see why.
(Also, I know that I could do the "if index % ..." bits in a for loop as well, but it should work anyway, right?)
Is there a pre-existing function that could do that instead?
You are reordering on the modulo of 3:
l = [0,1,2,3,4,5,6,7,8]
l_sorted = sorted(l, key=lambda x: x%3)
print(l_sorted)
# [0, 3, 6, 1, 4, 7, 2, 5, 8]
If you're reordering on the index and not on the values, then something more detailed requiring enumerate would do that (sorted is stable, so equal keys keep their original relative order):
l_sorted = [x[1] for x in sorted(enumerate(l), key=lambda x: x[0]%3)]
print(l_sorted)
# [0, 3, 6, 1, 4, 7, 2, 5, 8]
Another one based on index is
l_sorted = sum([l[x::3] for x in range(3)],[])
Let a = [0,1,2,3,4,5,6,7,8].
a[::3] will give you every third element: [0, 3, 6]
a[n::3] will give you every third element starting at n, so a[1::3] becomes [1, 4, 7] and a[2::3] becomes [2, 5, 8].
Concatenate these lists with +: a[0::3] + a[1::3] + a[2::3] evaluates to [0, 3, 6, 1, 4, 7, 2, 5, 8], as specified.
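When the length is an exact multiple of 3, the same reordering can also be sketched with NumPy (an alternative to the pure-slicing approach, assuming NumPy is acceptable here):

```python
import numpy as np

a = [0, 1, 2, 3, 4, 5, 6, 7, 8]
# rows of 3, transpose so columns become rows, then flatten row by row
reordered = np.array(a).reshape(-1, 3).T.ravel().tolist()
print(reordered)  # [0, 3, 6, 1, 4, 7, 2, 5, 8]
```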
Here's a generalized solution to chain a list together by grouping on some step. itertools.chain.from_iterable is used to chain the resulting groups from each step together.
from itertools import chain
def step_chain(l, step_len):
return chain.from_iterable(l[s::step_len] for s in range(step_len))
Demo:
>>> l = [0, 1, 2, 3, 4, 5, 6, 7, 8]
>>> list(step_chain(l, 2))
[0, 2, 4, 6, 8, 1, 3, 5, 7]
>>> list(step_chain(l, 3))
[0, 3, 6, 1, 4, 7, 2, 5, 8]
If the step length does not fit nicely into the list, the additional elements are placed at the end of the new list, and if the step length is larger than the list then evidently the original list is returned.
>>> list(step_chain(l, 7))
[0, 7, 1, 8, 2, 3, 4, 5, 6]
>>> list(step_chain(l, 100))
[0, 1, 2, 3, 4, 5, 6, 7, 8]
Why your attempt doesn't work
When you set arrtmp = arr, arrtmp is simply a reference to the same underlying (mutable) list as arr, so you are modifying that one list. You can think of arrtmp and arr as two "sticky notes" attached to the same underlying list object, which will change accordingly.
If you copy your original list arr with the list() built-in
arrtmp = list(arr)
your approach works for this particular case.
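The aliasing is easy to see with a toy list:

```python
arr = [1, 2, 3]
arrtmp = arr         # both names refer to the same list object
arrtmp[0] = 99
print(arr)           # [99, 2, 3] -- arr changed too

arrcopy = list(arr)  # a shallow copy is a new, independent list
arrcopy[0] = 0
print(arr)           # [99, 2, 3] -- unchanged this time
print(arrcopy)       # [0, 2, 3]
```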
The title is definitely confusing, so here's an example: say I have a list of values [1,2,3,2,1,4,5,6,7,8]. I want to remove everything between the two 1s in the list, including the first 1, which gives [1,4,5,6,7,8]. Unfortunately, due to my lack of Pythonic ability, I have only been able to produce something that removes the first set:
a = [1,2,3,2,1,4,5,6,7]
uniques = []
junks = []
for value in a:
    junks.append(value)
    if value not in uniques:
        uniques.append(value)
for value in uniques:
    junks.remove(value)
for value in junks:
    a.remove(value)
    a.remove(value)
a[0] = 1
print(a)
[1,4,5,6,7]
This works for the first double occurrence but will not work with the next occurrence in a larger list. My idea is to remove between the index of the first occurrence and the index of the second, which would preserve the second occurrence and spare me something dumb like a[0] = 1, but I'm really not sure how to implement it.
Would this do what you asked:
a = [1, 2, 3, 2, 1, 4, 5, 6, 7, 8]
def f(l):
    x = l.copy()
    for i in l:
        if x.count(i) > 1:
            first_index = x.index(i)
            second_index = x.index(i, first_index + 1)
            x = x[:first_index] + x[second_index:]
    return x
So the output of f(a) would be [1, 4, 5, 6, 7, 8] and the output of f([1, 2, 3, 2, 1, 4, 5, 6, 7, 8, 7, 6, 5, 15, 16]) would be [1, 4, 5, 15, 16].
If you just want the unique elements, you can combine set and list (note that this does not preserve order):
mylist = list(set(mylist))
a = [1, 2, 3, 2, 1, 4, 5, 6, 7, 8, 7, 6, 5, 15, 16]
dup = [x for x in a if a.count(x) > 1] # list of duplicates
while dup:
    pos1 = a.index(dup[0])
    pos2 = a.index(dup[0], pos1 + 1)
    a = a[:pos1] + a[pos2:]
    dup = [x for x in a if a.count(x) > 1]
print(a)  # [1, 4, 5, 15, 16]
A more efficient solution would be
a = [1, 2, 3, 2, 1, 4, 5, 6, 7, 8, 7, 6, 5, 15, 16]
pos1 = 0
while pos1 < len(a):
    if a[pos1] in a[pos1+1:]:
        pos2 = a.index(a[pos1], pos1 + 1)
        a = a[:pos1] + a[pos2:]
    pos1 += 1
print(a)  # [1, 4, 5, 15, 16]
(This probably isn't the most efficient way, but hopefully it helps)
Couldn't you just check if something appears twice? If it does, you have firstIndex and secondIndex, then:
a = [1, 2, 3, 4, 5, 1, 7, 8, 9]
b = []
# use some method to get the first and second index of the repeated number, e.g.:
firstIndex = a.index(1)
secondIndex = a.index(1, firstIndex + 1)
for index in range(len(a)):
    print(index)
    if firstIndex < index < secondIndex:
        print("We removed: " + str(a[index]))
    else:
        b.append(a[index])
print(b)
The output is [1, 1, 7, 8, 9], which seems to be what you want.
To do the job you need:
- the first and the last position of the duplicated values
- all indexes in between, to remove them
Funny thing is, you can simply tell Python to do this:
# we can use a 'smart' dictionary that can construct a default value:
from collections import defaultdict
# and 'chain' to flatten lists (ranges)
from itertools import chain

a = [1, 2, 3, 2, 1, 4, 5, 6, 7]

# build a dictionary where each number is a key, and the value is a list of its positions:
index = defaultdict(list)
for i, item in enumerate(a):
    index[item].append(i)

# take only the first and the last index for values that occur more than once
edges = ((pos[0], pos[-1]) for pos in index.values() if len(pos) > 1)

# use range() to get all index positions in between,
# chain.from_iterable to flatten the ranges,
# and make a set of it for faster lookup:
to_remove = set(chain.from_iterable(range(start, end)
                                    for start, end in edges))

result = [item for i, item in enumerate(a) if i not in to_remove]
print(result)  # expected: [1, 4, 5, 6, 7]
Of course you can make it shorter:
index = defaultdict(list)
for i, item in enumerate(a):
    index[item].append(i)
to_remove = set(chain.from_iterable(range(pos[0], pos[-1])
                                    for pos in index.values() if len(pos) > 1))
print([item for i, item in enumerate(a) if i not in to_remove])
This solution has linear complexity and should be pretty fast. The cost is additional memory for the dictionary and the set, so be careful with huge data sets. But if you have a lot of data, other solutions that use lst.index will choke anyway, because they are O(n^2) with a lot of dereferencing and function calls.