I would like to map a list into numbers according to the values.
For example:
['aa', 'b', 'b', 'c', 'aa', 'b', 'a'] -> [0, 1, 1, 2, 0, 1, 3]
I'm trying to achieve this by using numpy and a mapping dict.
import numpy as np

def number(lst):
    x = np.array(lst)
    unique_names = list(np.unique(x))
    mapping = dict(zip(unique_names, range(len(unique_names))))  # Translating dict
    map_func = np.vectorize(lambda name: mapping[name])
    return map_func(x)
Is there a more elegant / faster way to do this?
Update: Bonus question -- do it with the order maintained.
You can use the return_inverse keyword:
x = np.array(['aa', 'b', 'b', 'c', 'aa', 'b', 'a'])
uniq, map_ = np.unique(x, return_inverse=True)
map_
# array([1, 2, 2, 3, 1, 2, 0])
Edit: Order preserving version:
x = np.array(['aa', 'b', 'b', 'c', 'aa', 'b', 'a'])
uniq, idx, map_ = np.unique(x, return_index=True, return_inverse=True)
mxi = idx.max()+1
mask = np.zeros((mxi,), bool)
mask[idx] = True
oidx = np.where(mask)[0]
iidx = np.empty_like(oidx)
iidx[map_[oidx]] = np.arange(oidx.size)
iidx[map_]
# array([0, 1, 1, 2, 0, 1, 3])
Here's a vectorized NumPy based solution -
def argsort_unique(idx):
    # Original idea : http://stackoverflow.com/a/41242285/3293881 by #Andras
    n = idx.size
    sidx = np.empty(n, dtype=int)
    sidx[idx] = np.arange(n)
    return sidx

def map_uniquetags_keep_order(a):
    arr = np.asarray(a)
    sidx = np.argsort(arr)
    s_arr = arr[sidx]
    m = np.concatenate(([True], s_arr[1:] != s_arr[:-1]))
    unq = s_arr[m]
    tags = np.searchsorted(unq, arr)
    rev_idx = argsort_unique(sidx[np.searchsorted(s_arr, unq)].argsort())
    return rev_idx[tags]
Sample run -
In [169]: a = ['aa', 'b', 'b', 'c', 'aa', 'b', 'a'] # String input
In [170]: map_uniquetags_keep_order(a)
Out[170]: array([0, 1, 1, 2, 0, 1, 3])
In [175]: a = [4, 7, 7, 5, 4, 7, 2] # Numeric input
In [176]: map_uniquetags_keep_order(a)
Out[176]: array([0, 1, 1, 2, 0, 1, 3])
Use sets to remove duplicates:
myList = ['a', 'b', 'b', 'c', 'a', 'b']
mySet = set(myList)
Then build your dictionary using comprehension:
mappingDict = {letter:number for number,letter in enumerate(mySet)}
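Note that a set has no defined order, so the numbers assigned this way are arbitrary. If the numbers should follow first appearance, one possible sketch is to build the dict while walking the list (the name number_in_order is made up for illustration and the items are assumed to be hashable):

```python
def number_in_order(items):
    """Number each distinct item by the order in which it first appears."""
    mapping = {}
    # setdefault stores len(mapping) only when the item is new,
    # so each new item gets the next free number.
    return [mapping.setdefault(item, len(mapping)) for item in items]


print(number_in_order(['aa', 'b', 'b', 'c', 'aa', 'b', 'a']))
# [0, 1, 1, 2, 0, 1, 3]
```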
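I did it using the ASCII values because it is easy and short. Note that this only works when every item is a single lowercase letter.
def number(lst):
    # ord('a') == 97, so 'a' -> 0, 'b' -> 1, ...
    return [ord(x) - 97 for x in lst]

l = ['a', 'b', 'b', 'c', 'a', 'b']
print(number(l))
Output:
[0, 1, 1, 2, 0, 1]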
If the order is not a concern:
[sorted(set(x)).index(item) for item in x]
# returns:
[1, 2, 2, 3, 1, 2, 0]
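The index call above rescans the sorted list for every item. A minimal sketch of the same idea with a lookup dict, assuming the items are hashable and mutually comparable (number_sorted is a hypothetical name):

```python
def number_sorted(items):
    """Map each item to its rank among the sorted distinct values."""
    rank = {value: i for i, value in enumerate(sorted(set(items)))}
    return [rank[item] for item in items]


x = ['aa', 'b', 'b', 'c', 'aa', 'b', 'a']
print(number_sorted(x))
# [1, 2, 2, 3, 1, 2, 0]
```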
I have 3 lists:
a_exist = []
b_exist = []
c_exist = []
I am looping through a main list of strings:
l = ['a', 'b']
for item in l:
    if 'a' in item:
        a_exist.append(1)
        b_exist.append(0)
        c_exist.append(0)
    else:
        a_exist.append(0)
        b_exist.append(0)
        c_exist.append(0)
    if 'b' in item:
        b_exist.append(1)
        a_exist.append(0)
        c_exist.append(0)
    else:
        b_exist.append(0)
        a_exist.append(0)
        c_exist.append(0)
What I am trying to get:
a_exist = [1,0]
b_exist = [0,1]
c_exist = [0,0]
Is there a better way of doing this?
As an alternate take on this, your problem sounds like it would likely be better suited for a dictionary of lists. This makes it easy to extend (e.g., if you want to detect other characters, you just add them to the initial list checks below) without having to add a new _exist list each time.
In [7]: checks = ['a', 'b', 'c', 'd', 'e']
In [8]: l = ['a', 'b', 'ae', 'bcd']
In [9]: ret = {k: [int(k in v) for v in l] for k in checks}
In [10]: ret
Out[10]:
{'a': [1, 0, 1, 0],
'b': [0, 1, 0, 1],
'c': [0, 0, 0, 1],
'd': [0, 0, 0, 1],
'e': [0, 0, 1, 0]}
l = ['a', 'b']
a_exist = [1 if 'a' in i else 0 for i in l]
b_exist = [1 if 'b' in i else 0 for i in l]
c_exist = [1 if 'c' in i else 0 for i in l]
print(a_exist, b_exist, c_exist, sep='\n')
out:
[1, 0]
[0, 1]
[0, 0]
Just combine a list comprehension with a conditional expression.
Loop through l to get each value, then check whether the value matches the condition: if it does, return 1, otherwise return 0.
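As a sketch of the same idea without repeating the comprehension for each letter, a small helper could be used (the name flags_for is hypothetical, not part of the original code):

```python
def flags_for(char, strings):
    """Return 1 for each string that contains char, else 0."""
    # `char in s` is a bool; int() turns True/False into 1/0.
    return [int(char in s) for s in strings]


l = ['a', 'b']
a_exist = flags_for('a', l)
b_exist = flags_for('b', l)
c_exist = flags_for('c', l)
print(a_exist, b_exist, c_exist, sep='\n')
```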
Initially I have two lists, plus a third list that says in what order I should merge them.
For example, the first list is [a, b, c], the second list is [d, e], and the 'merging' list is [0, 1, 0, 0, 1].
That means: to build the merged list, I take an element from the first list, then the second, then the first, then the first, then the second, and I end up with [a, d, b, c, e].
To solve this I used a for loop and two "pointers", but I was wondering whether there is a more Pythonic way. I looked for functions that could help, but found nothing useful.
You could create iterators from those lists, loop through the ordering list, and call next on one of the iterators:
i1 = iter(['a', 'b', 'c'])
i2 = iter(['d', 'e'])
# Select the iterator to advance: `i2` if `x` == 1, `i1` otherwise
print([next(i2 if x else i1) for x in [0, 1, 0, 0, 1]]) # ['a', 'd', 'b', 'c', 'e']
It's possible to generalize this solution to any number of lists, as shown below:
def ordered_merge(lists, selector):
    its = [iter(l) for l in lists]
    for i in selector:
        yield next(its[i])
In [4]: list(ordered_merge([[3, 4], [1, 5], [2, 6]], [1, 2, 0, 0, 1, 2]))
Out[4]: [1, 2, 3, 4, 5, 6]
If the ordering list contains strings, floats, or any other objects that can't be used as list indexes, use a dictionary:
def ordered_merge(mapping, selector):
    its = {k: iter(v) for k, v in mapping.items()}
    for i in selector:
        yield next(its[i])
In [6]: mapping = {'A': [3, 4], 'B': [1, 5], 'C': [2, 6]}
In [7]: list(ordered_merge(mapping, ['B', 'C', 'A', 'A', 'B', 'C']))
Out[7]: [1, 2, 3, 4, 5, 6]
Of course, you can use integers as dictionary keys as well.
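One thing to keep in mind: if the selector asks for more items than a list holds, next raises StopIteration, which inside a generator surfaces as a RuntimeError on Python 3.7 and later. A possible sketch that reports this more clearly, assuming a ValueError is the behaviour you want (ordered_merge_checked is a hypothetical name):

```python
def ordered_merge_checked(lists, selector):
    """Like ordered_merge, but fail with a clear error on exhaustion."""
    its = [iter(l) for l in lists]
    missing = object()  # sentinel that cannot collide with real items
    for i in selector:
        item = next(its[i], missing)
        if item is missing:
            raise ValueError("list %d has no items left" % i)
        yield item


print(list(ordered_merge_checked([[3, 4], [1, 5], [2, 6]], [1, 2, 0, 0, 1, 2])))
# [1, 2, 3, 4, 5, 6]
```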
Alternatively, you could remove elements from the left side of each of the original lists one by one and add them to the resulting list. Quick example:
In [8]: A = ['a', 'b', 'c']
...: B = ['d', 'e']
...: selector = [0, 1, 0, 0, 1]
...:
In [9]: [B.pop(0) if x else A.pop(0) for x in selector]
Out[9]: ['a', 'd', 'b', 'c', 'e']
I would expect the first approach to be more efficient (list.pop(0) is slow).
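If you prefer the "pop from the left" style, a sketch using collections.deque avoids the cost of list.pop(0), since popleft does not shift the remaining items. The function name merge_with_deques is made up for this example, and the selector is assumed to contain only 0 and 1:

```python
from collections import deque


def merge_with_deques(first, second, selector):
    """Merge two lists, taking from `first` on 0 and `second` on 1."""
    # Copy into deques so the caller's lists are left untouched.
    queues = (deque(first), deque(second))
    return [queues[i].popleft() for i in selector]


print(merge_with_deques(['a', 'b', 'c'], ['d', 'e'], [0, 1, 0, 0, 1]))
# ['a', 'd', 'b', 'c', 'e']
```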
How about this,
list1 = ['a', 'b', 'c']
list2 = ['d', 'e']
options = [0,1,0,0,1]
list1_iterator = iter(list1)
list2_iterator = iter(list2)
new_list = [next(list2_iterator) if option else next(list1_iterator) for option in options]
print(new_list)
# Output
['a', 'd', 'b', 'c', 'e']
I'd like to make additions/replacements to a digram list that looks similar to this:
[[a,b],[b,c],[c,d],[d,c],[c,b],[b,a]]
If the list were flattened, the outcome would be [a,b,c,d,c,b,a], but this only describes the structure; it is not the issue.
Note that each digram has exactly two items, and that the second item of each digram is also the first item of the next digram. The exceptions are the first and the last digram, whose terminating item occurs only once; see item a.
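To make the structure concrete, here is a small sketch that converts between the flat sequence and the digram list. The helper names to_digrams and to_chain are hypothetical, and the digrams are assumed to overlap as described:

```python
def to_digrams(seq):
    """Pair every item with its successor: [a, b, c] -> [[a, b], [b, c]]."""
    return [[x, y] for x, y in zip(seq, seq[1:])]


def to_chain(digrams):
    """Inverse of to_digrams: first item of each digram plus the last item."""
    if not digrams:
        return []
    return [d[0] for d in digrams] + [digrams[-1][1]]


chain = ['a', 'b', 'c', 'd', 'c', 'b', 'a']
digrams = to_digrams(chain)
print(digrams)
print(to_chain(digrams) == chain)  # True
```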
My question is: how can you replace items in the digram list so that the following examples produce the results shown in the comments?
replace([['d','d']], 1, ['a', 0]) # should return: [['d', 'd']]
replace([['d',1]], 1, ['a', 0]) # should return: [['d', 'a'], ['a', 0]]
replace([[1,'d']], 1, ['a', 0]) # should return: [['a', 0], [0, 'd']]
replace([[1,'d'],['d', 1]], 1, ['a', 0]) # should return: [['a', 0], [0, 'd'], ['d', 'a'], ['a', 0]]
replace([['d',1],[1,'d']], 1, ['a', 0]) # should return: [['d','a'], ['a', 0], [0, 'd']]
replace([[1,1]], 1, ['a', 0]) # should return: [['a', 0], [0, 'a'], ['a', 0]]
replace([[1,1],[1,1]], 1, ['a', 0]) # should return: [['a', 0], [0, 'a'], ['a', 0], [0, 'a'], ['a', 0]]
I have tried the following approach, but it has some issues. In particular, the part under j == 1 has special cases that don't work.
def replace(t, a, b):
    """
    1. argument is the target list
    2. argument is the index value to be used on replacement
    3. argument is the digram to be inserted
    """
    # copy possibly not needed, im not sure
    t1 = t[:]
    for i, x in enumerate(t1):
        for j, y in enumerate(x):
            # if there is a digram match, lets make replacement / addition
            if y == a:
                if j == 0:
                    c = t1[i:]
                    del t1[i:]
                    t1 += [b] + c
                    c[0][0] = b[1]
                if j == 1:
                    c = t1[i:]
                    del t1[i:]
                    t1 += c + [b]
                    c[len(c)-1][1] = b[0]
                    #c[0][1] = b[0]
                    #t1 += c
    print (t, t1)
Can you suggest some tips to improve the function, or alternative ways to do the task?
Addition
This is my enhanced version of the function, which gives the right answers, but the "annoying" part of it, or the whole approach, could still be optimized. This question could be moved more toward the code optimization area:
def replace(t, a, b):
    """
    1. argument is the target list
    2. argument is the index value to be used on replacement
    3. argument is the digram to be inserted
    """
    l = len(t)
    i = 0
    while i < l:
        for j, y in enumerate(t[i]):
            # if there is a digram match, lets make replacement / addition
            if y == a:
                if j == 0:
                    c = t[i:]
                    del t[i:]
                    t += [b] + c
                    c[0][0] = b[1]
                    # increase both index and length
                    # this practically jumps over the inserted digram to the next one
                    i += 1
                    l += 1
                elif j == 1:
                    c = t[i:]
                    del t[i:]
                    # this is the annoying part of the algorithm...
                    if len(c) > 1 and c[1][0] == a:
                        t += c
                    else:
                        t += c + [b]
                        c[-1][1] = b[0]
                    t[i][1] = b[0]
        i += 1
    return t
I also provide a test function to check inputs and outputs:
def test(ins, outs):
    try:
        assert ins == outs
        return (True, 'was', outs)
    except:
        return (False, 'was', ins, 'should be', outs)

for i, result in enumerate(
    [result for result in [
        [replace([['d','d']], 1, ['a', 0]), [['d', 'd']]],
        [replace([['d',1]], 1, ['a', 0]), [['d', 'a'], ['a', 0]]],
        [replace([[1,'d']], 1, ['a', 0]), [['a', 0], [0, 'd']]],
        [replace([[1,'d'],['d', 1]], 1, ['a', 0]), [['a', 0], [0, 'd'], ['d', 'a'], ['a', 0]]],
        [replace([['d',1],[1,'d']], 1, ['a', 0]), [['d','a'], ['a', 0], [0, 'd']]],
        [replace([[1,1]], 1, ['a', 0]), [['a', 0], [0, 'a'], ['a', 0]]],
        [replace([[1,1],[1,1]], 1, ['a', 0]), [['a', 0], [0, 'a'], ['a', 0], [0, 'a'], ['a', 0]]],
        [replace([['d',1],[1,1]], 1, ['a', 0]), [['d', 'a'], ['a', 0], [0, 'a'], ['a', 0]]],
        [replace([[1,1],[1,'d']], 1, ['a', 0]), [['a', 0], [0, 'a'], ['a', 0], [0, 'd']]]
    ]]):
    print (i+1, test(*result))
This is my approach. Explanation below.
def replace(t, a, b):
    # Flatten the list
    t = [elem for sub in t for elem in sub]
    replaced = []
    # Iterate over the elements of the flattened list.
    # Keep the elements that do not match a, and replace the ones that
    # do match with the elements of b
    for elem in t:
        if elem == a:  # this element matches, replace with b
            replaced.extend(b)
        else:  # this element does not, add it
            replaced.append(elem)
    # break up the replaced, flattened list into groups of 2 elements
    return [replaced[x:x+2] for x in range(len(replaced)-1)]
You start with some list of lists. So first, we can flatten that.
[[1,'d'],['d', 1]] becomes [1,'d','d', 1]
Now we can loop through the flattened list and anywhere we find a match on a we can extend our replaced list with the contents of b. If the element does not match a we simply append it to replaced. We end up with:
['a', 0, 'd', 'd', 'a', 0]
Now we want to take all of these in groups of 2, moving our index 1 at a time.
[['a', 0] ...]
[['a', 0], [0, 'd'], ...]
[['a', 0], [0, 'd'], ['d', 'd'], ...]
If your data were substantially longer than your examples and performance mattered, you could drop the separate flattening step and iterate over t with a nested loop instead, making a single pass through t.
EDIT
def replace(t, a, b):
    t = [elem for sub in t for elem in sub]
    inner_a_matches_removed = []
    for i, elem in enumerate(t):
        if not i % 2 or elem != a:
            inner_a_matches_removed.append(elem)
            continue
        if i < len(t) - 1 and t[i+1] == a:
            continue
        inner_a_matches_removed.append(elem)
    replaced = []
    for elem in inner_a_matches_removed:
        if elem == a:
            replaced.extend(b)
        else:
            replaced.append(elem)
    return [replaced[x:x+2] for x in range(len(replaced)-1)]
And here is an addition for testing:
args_groups = [
([['d','d']], 1, ['a', 0]),
([['d',1]], 1, ['a', 0]),
([[1,'d']], 1, ['a', 0]),
([[1,'d'],['d', 1]], 1, ['a', 0]),
([['d',1],[1,'d']], 1, ['a', 0]),
([[1,1]], 1, ['a', 0]),
([[1,1],[1,1]], 1, ['a', 0]),
]
for args in args_groups:
    print("replace({}) => {}".format(", ".join(map(str, args)), replace(*args)))
Which outputs:
replace([['d', 'd']], 1, ['a', 0]) => [['d', 'd']]
replace([['d', 1]], 1, ['a', 0]) => [['d', 'a'], ['a', 0]]
replace([[1, 'd']], 1, ['a', 0]) => [['a', 0], [0, 'd']]
replace([[1, 'd'], ['d', 1]], 1, ['a', 0]) => [['a', 0], [0, 'd'], ['d', 'd'], ['d', 'a'], ['a', 0]]
replace([['d', 1], [1, 'd']], 1, ['a', 0]) => [['d', 'a'], ['a', 0], [0, 'd']]
replace([[1, 1]], 1, ['a', 0]) => [['a', 0], [0, 'a'], ['a', 0]]
replace([[1, 1], [1, 1]], 1, ['a', 0]) => [['a', 0], [0, 'a'], ['a', 0], [0, 'a'], ['a', 0]]
I guess I still don't understand case #4, but you seem to have solved it yourself, which is great!
Here is your modified code:
def replace(t, a, b):
    # Flatten the list: take only the first item of every digram,
    # except the last digram, which contributes both of its items
    t1 = []
    l = len(t)-1
    for items in [t[i][0:(1 if i>-1 and i<l else 2)] for i in range(0,l+1)]:
        t1.extend(items)
    replaced = []
    # Iterate over the elements of the flattened list.
    # Keep the elements that do not match a, and replace the ones that
    # do match with the elements of b
    for elem in t1:
        if elem == a:  # this element matches, replace with b
            replaced.extend(b)
        else:  # this element does not, add it
            replaced.append(elem)
    # break up the replaced, flattened list into groups of 2 elements
    return [replaced[x:x+2] for x in range(len(replaced)-1)]
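The same chain-based idea can be sketched a little more compactly with zip. This is only an illustration under the assumption that consecutive digrams overlap (the second item of one digram equals the first item of the next); replace_chain is a hypothetical name:

```python
def replace_chain(digrams, a, b):
    """Replace every item equal to a with the two items of b."""
    if not digrams:
        return []
    # Rebuild the chain: first item of each digram plus the final item.
    chain = [d[0] for d in digrams] + [digrams[-1][1]]
    replaced = []
    for elem in chain:
        if elem == a:
            replaced.extend(b)
        else:
            replaced.append(elem)
    # Pair each item with its successor to get the digrams back.
    return [[x, y] for x, y in zip(replaced, replaced[1:])]


print(replace_chain([[1, 'd'], ['d', 1]], 1, ['a', 0]))
# [['a', 0], [0, 'd'], ['d', 'a'], ['a', 0]]
```

Unlike the in-place versions above, this sketch builds new lists and leaves its arguments unchanged.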
Quick Summary:
need_to_reorder = [['a', 'b', 'c', 'd'], [1, 2, 3, 4]]
I want to reorder the x positions in need_to_reorder[0][x] (and likewise in the other inner list) using my sorting array
sorting_array = [1, 3, 0, 2]
Required result: need_to_reorder will equal
[['b', 'd', 'a', 'c'], [2, 4, 1, 3]]
Searching for the answer, I tried using NumPy:
import numpy as np
sorting_array = [1, 3, 0, 2]
i = np.array(sorting_array)
print(i) ## Results: [1 3 0 2] <-- No Commas?
need_to_reorder[:,i]
RESULTS:
TypeError: list indices must be integers, not tuple
I'm looking for a correction to the code above or an entirely different approach.
You can try a simple nested comprehension
>>> l = [['a', 'b', 'c', 'd'], [1, 2, 3, 4]]
>>> s = [1, 3, 0, 2]
>>> [[j[i] for i in s] for j in l]
[['b', 'd', 'a', 'c'], [2, 4, 1, 3]]
If you need this as a function, it can be as simple as:
def reorder(need_to_reorder, sorting_array):
    return [[j[i] for i in sorting_array] for j in need_to_reorder]
Note that this can also be solved using the map function. However, in this case a list comprehension is preferred, as the map variant would require a lambda function. The difference between map and a list comprehension is discussed at length in this answer
def order_with_sort_array(arr, sort_arr):
    assert len(arr) == len(sort_arr)
    return [arr[i] for i in sort_arr]

sorting_array = [1, 3, 0, 2]
need_to_reorder = [['a', 'b', 'c', 'd'], [1, 2, 3, 4]]
# list() is needed on Python 3, where map returns a lazy iterator
after_reordered = list(map(lambda arr: order_with_sort_array(arr, sorting_array),
                           need_to_reorder))
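Another possible variant, shown here only as a sketch, uses operator.itemgetter from the standard library instead of a lambda. It assumes the sorting array has at least two entries, because itemgetter called with a single index returns a bare item rather than a tuple:

```python
from operator import itemgetter

need_to_reorder = [['a', 'b', 'c', 'd'], [1, 2, 3, 4]]
sorting_array = [1, 3, 0, 2]

# itemgetter(1, 3, 0, 2)(row) returns (row[1], row[3], row[0], row[2])
pick = itemgetter(*sorting_array)
reordered = [list(pick(row)) for row in need_to_reorder]
print(reordered)
# [['b', 'd', 'a', 'c'], [2, 4, 1, 3]]
```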
This should work. Note that NumPy converts the mixed array to strings, so the numbers come back as '2', '4', and so on.
import numpy as np
ntr = np.array([['a', 'b', 'c', 'd'], [1, 2, 3, 4]])
sa = np.array([1, 3, 0, 2])
# Index every row (:) with the sorting array to reorder the columns
print(ntr[:, sa])
>> [['b' 'd' 'a' 'c']
    ['2' '4' '1' '3']]