Python find index of all array elements in another array - python

I am trying to do the following:
import numpy as np
A = np.array([1,5,2,7,1])
B = np.sort(A)
print B
>>> [1,1,2,5,7]
I want to find the position in the original array A of each element of B, i.e. I want to create an array C such that
print C
>>[0,4,2,1,3]
which means that the 1s in B were at positions 0 and 4 in A, the 2 was at position 2, the 5 was at position 1, etc.
I tried using np.where(B == A) but it produces gibberish.

import numpy as np
A = np.array([1,5,2,7,1])
print np.argsort(A) #prints [0 4 2 1 3]
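One caveat worth spelling out: A contains the value 1 twice, and the expected result [0, 4, 2, 1, 3] depends on equal elements keeping their original order. The default sort kind of np.argsort is quicksort, which is not guaranteed to be stable, so a minimal sketch (written for Python 3) that makes the tie order explicit could look like this:

```python
import numpy as np

A = np.array([1, 5, 2, 7, 1])

# kind="stable" keeps equal elements in their original relative order,
# so the two 1s are reported as index 0 first, then index 4.
C = np.argsort(A, kind="stable")
B = A[C]  # indexing A with C reproduces the sorted array

print(C.tolist())  # [0, 4, 2, 1, 3]
print(B.tolist())  # [1, 1, 2, 5, 7]
```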

If you don't want to import numpy for any reason, you can also use this code:
a = [1,5,2,7,1]
b = zip(a, range(len(a)))
tmp = sorted(b, key=lambda x: x[0])
c = map( lambda x: x[1], tmp)
print c
[0, 4, 2, 1, 3]
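The snippet above is written for Python 2 (print statement, and map returning a list). A rough Python 3 equivalent might look like the sketch below; the helper name argsort is made up for this illustration and is not a standard library function.

```python
def argsort(values):
    """Return the indices that would sort `values` (hypothetical helper)."""
    # sorted() is stable, so equal values keep their original order.
    return sorted(range(len(values)), key=values.__getitem__)

a = [1, 5, 2, 7, 1]
c = argsort(a)
print(c)  # [0, 4, 2, 1, 3]
```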

https://repl.it/CVbI
A = [1,5,2,7,1]
for i,e in sorted(enumerate(A), key=lambda x: x[1]):
    print(i, e)
B = [x for x,_ in sorted(enumerate(A), key=lambda x: x[1])]
A = sorted(A)
print(A)
print(B)

Related

Pandas Multi-index set value based on three different condition

The objective is to create a new multiindex column based on 3 conditions of the column (B)
Condition for B
if B < -1:
    CONDITION_B = 'L'
elif B < 0:
    CONDITION_B = 'l'
else:
    CONDITION_B = 'g'
Naively, I thought I could simply create two different masks and replace the values, as suggested:
# Handle CONDITION_B='l' and CONDITION_B='g'
mask_2 = df.loc[:,idx[:,'B']]<0
appenddf_2=mask_2.replace({True:'g',False:'l'}).rename(columns={'A':'iv'},level=1)
and then
# CONDITION_B='L'
mask_33 = df.loc[:,idx[:,'B']]<-0.1
appenddf_2=mask_33.replace({True:'G'}).rename(columns={'A':'iv'},level=1)
As expected, this will throw an error
TypeError: sequence item 1: expected str instance, bool found
How can I handle the three different conditions?
Expected output
ONE TWO
  B   B
  g   L
  l   l
  l   g
  g   l
  L   L
The code to produce the error is
import pandas as pd
import numpy as np
np.random.seed(3)
arrays = [np.hstack([['One']*2, ['Two']*2]) , ['A', 'B', 'A', 'B']]
columns = pd.MultiIndex.from_arrays(arrays)
df= pd.DataFrame(np.random.randn(5, 4), columns=list('ABAB'))
df.columns = columns
idx = pd.IndexSlice
mask_2 = df.loc[:,idx[:,'B']]<0
appenddf_2=mask_2.replace({True:'g',False:'l'}).rename(columns={'A':'iv'},level=1)
mask_33 = df.loc[:,idx[:,'B']]<-0.1
appenddf_2=mask_33.replace({True:'G'}).rename(columns={'A':'iv'},level=1)
IIUC:
np.select() is ideal in this case:
conditions=[
    df.loc[:,idx[:,'B']].lt(0) & df.loc[:,idx[:,'B']].gt(-1),
    df.loc[:,idx[:,'B']].lt(-1),
    df.loc[:,idx[:,'B']].ge(0)
]
labels=['l','L','g']
out=pd.DataFrame(np.select(conditions,labels),columns=df.loc[:,idx[:,'B']].columns)
OR
via np.where():
s=np.where(df.loc[:,idx[:,'B']].lt(0) & df.loc[:,idx[:,'B']].gt(-1),'l',np.where(df.loc[:,idx[:,'B']].lt(-1),'L','g'))
out=pd.DataFrame(s,columns=df.loc[:,idx[:,'B']].columns)
output of out:
  One Two
    B   B
0   g   L
1   l   l
2   l   g
3   g   l
4   L   L
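One detail of the conditions above: a value of exactly -1 satisfies neither gt(-1) nor lt(-1), so np.select falls back to its default for it, and that default is 0 unless set, which does not fit well with string labels. The sketch below works on a plain array with made-up values; treating -1 as 'L' is an assumption about the intent, not something the question states.

```python
import numpy as np

# Made-up sample values; -1.0 sits exactly on the boundary between 'l' and 'L'.
b = np.array([0.5, -0.3, -1.0, -2.4, 0.0])

conditions = [
    (b < 0) & (b > -1),  # between -1 and 0 -> 'l'
    b <= -1,             # -1 or below      -> 'L' (assumed boundary choice)
    b >= 0,              # zero or above    -> 'g'
]
labels = ["l", "L", "g"]

# An explicit string default keeps the result a string array even if
# some value (NaN, for instance) matches none of the conditions.
out = np.select(conditions, labels, default="")
print(out.tolist())  # ['g', 'l', 'L', 'L', 'g']
```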
I don't fully understand what you want to do but try something like this:
df = pd.DataFrame({'B': [ 0, -1, -2, -2, -1, 0, 0, -1, -1, -2]})
df['ONE'] = np.where(df['B'] < 0, 'l', 'g')
df['TWO'] = np.where(df['B'] < -1, 'L', df['ONE'])
df = df.set_index(['ONE', 'TWO'])
Output result:
>>> df
          B
ONE TWO
g   g     0
l   l    -1
    L    -2
    L    -2
    l    -1
g   g     0
    g     0
l   l    -1
    l    -1
    L    -2

Construct an assignment matrix - Python

I have two lists of elements
a = [1,2,3,2,3,1,1,1,1,1]
b = [3,1,2,1,2,3,3,3,3,3]
and I am trying to uniquely match the element from a to b, my expected result is like this:
1: 3
2: 1
3: 2
So I tried to construct an assignment matrix and then use scipy.optimize.linear_sum_assignment
import numpy as np

a = [1,2,3,2,3,1,1,1,1,1]
b = [3,1,2,1,2,3,3,3,3,3]
total_true = np.unique(a)
total_pred = np.unique(b)
matrix = np.zeros(shape=(len(total_pred), len(total_true)))
for n, i in enumerate(total_true):
    for m, j in enumerate(total_pred):
        matrix[n, m] = sum(1 for item in b if item==(i))
I expected the matrix to be:
   1  2  3
1  0  2  0
2  0  0  2
3  6  0  0
But the output is:
[[2. 2. 2.]
 [2. 2. 2.]
 [6. 6. 6.]]
What mistake did I make here? Thank you very much.
You don't even need Pandas for this. Try using zip and dict:
In [42]: a = [1,2,3,2,3,1,1,1,1,1]
...: b = [3,1,2,1,2,3,3,3,3,3]
...:
In [43]: c =zip(a,b)
In [44]: dict(c)
Out[44]: {1: 3, 2: 1, 3: 2}
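UPDATE: as the OP said, if we need to store all the values with the same key, we can use defaultdict. (In Python 3, c is an iterator that dict(c) has already exhausted, so the pairs are zipped again here.)
In [58]: from collections import defaultdict
In [59]: d = defaultdict(list)
In [60]: for k,v in zip(a, b):
    ...:     d[k].append(v)
    ...: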
In [61]: d
Out[61]: defaultdict(list, {1: [3, 3, 3, 3, 3, 3], 2: [1, 1], 3: [2, 2]})
This line:
matrix[n, m] = sum(1 for item in b if item==(i))
counts the occurrences of i in b and saves the result to matrix[n, m]. Each cell of the matrix will therefore contain either the number of 1's in b (i.e. 2), the number of 2's in b (i.e. 2), or the number of 3's in b (i.e. 6). Notice that this value is completely independent of j, which means that all the values in one row will always be the same.
To take j into consideration, try replacing the line with:
matrix[n, m] = sum(1 for x, y in zip(a, b) if (x, y) == (j, i))
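Put together, the corrected counting could be sketched as follows. This is only an illustration: it iterates the rows over the labels of b and the columns over the labels of a (so the loops match the shape passed to np.zeros), and it uses an integer dtype, neither of which is in the original code.

```python
import numpy as np

a = [1, 2, 3, 2, 3, 1, 1, 1, 1, 1]
b = [3, 1, 2, 1, 2, 3, 3, 3, 3, 3]

total_true = np.unique(a)  # labels found in a -> columns
total_pred = np.unique(b)  # labels found in b -> rows

matrix = np.zeros((len(total_pred), len(total_true)), dtype=int)
for n, i in enumerate(total_pred):
    for m, j in enumerate(total_true):
        # Count the positions where a holds j and b holds i at the same time.
        matrix[n, m] = sum(1 for x, y in zip(a, b) if (x, y) == (j, i))

print(matrix)
```

With the sample lists this reproduces the expected matrix from the question: row 3, column 1 holds 6, because b is 3 and a is 1 at six positions.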
As for your expected output: the matrix is indexed as a(i, j), where i is the row index and j is the column index. Looking at a(3, 1) in your matrix, the value is 6, which means the combination (3, 1) occurs 6 times, with 3 coming from b and 1 coming from a. We can collect all the pairs from the two lists:
matches = [tuple([x, y]) for x,y in zip(b, a)]
Then we can find how many matches there are of a specific combination, for example a(3, 1).
result = matches.count((3,1))

reduce lists given single value of 2d lists

I have 2 lists:
edges = [[0,1],[0,2],[0,3],[1,2],[1,3]]
weight = [10,8,7,3,7]
edges is the list of edges, each connecting two nodes, and weight holds the corresponding weights.
For each starting node, edges[i][0], I want to choose the connected node with the smallest weight, so in this case the result would look like:
connect = [[0,3],[1,2]]
weight = [7,3]
Because out of all the nodes connected to 0, 3 is the closest one, and for 1, 2 is the closest one.
I am not able to formulate the problem, any help is appreciated!
edges = [[0,1],[0,2],[0,3],[1,2],[1,3]]
weight = [10,8,7,3,7]
connect = []
wght = []
In [8]: for i in set(e[0] for e in edges):
   ...:     temp = [(a, b) for a, b in zip(edges, weight) if a[0] == i]
   ...:     temp = min(temp, key=lambda x: x[1])
   ...:     connect += [temp[0]]
   ...:     wght += [temp[1]]
In [9]: connect
Out[9]: [[0, 3], [1, 2]]
In [10]: wght
Out[10]: [7, 3]
In case you are into one-liners:
In [20]: [min([(a, b) for a, b in zip(edges, weight) if a[0] == i], key=lambda x: x[1])
    ...:  for i in set([e[0] for e in edges])]
Out[20]: [([0, 3], 7), ([1, 2], 3)]
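The loop and the one-liner above both rescan the whole edge list once per starting node. As a possible alternative, here is a single-pass sketch using a dictionary; the names best and min_weight are made up for this illustration, and on ties it keeps the first edge seen.

```python
edges = [[0, 1], [0, 2], [0, 3], [1, 2], [1, 3]]
weight = [10, 8, 7, 3, 7]

# best maps each start node to its lightest (edge, weight) pair seen so far.
best = {}
for edge, w in zip(edges, weight):
    start = edge[0]
    if start not in best or w < best[start][1]:
        best[start] = (edge, w)

connect = [edge for edge, _ in best.values()]
min_weight = [w for _, w in best.values()]

print(connect)     # [[0, 3], [1, 2]]
print(min_weight)  # [7, 3]
```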
Another solution using Pandas:
import pandas as pd
df = pd.DataFrame(edges, columns=['start','end'])
df['weight'] = weight
df.loc[df.groupby('start')['weight'].idxmin()]
With the results being:
start  end  weight
    0    3       7
    1    2       3

Compare 4 numbers to find if 3 are the same

I have the following Python list:
mylist = [a, b, c, d]
where a,b,c,d are integers.
I want to compare the 4 numbers and see if 3 of them are the same.
I have tried converting the list to a set, but it didn't help me.
Try collections.Counter.
import collections
x = [1, 2, 1, 1]
counter = collections.Counter(x)
if 3 in counter.values():
    print('3 are the same')
Output:
3 are the same
UPDATE
If you are interested in checking for 3 or more occurrences, you can check the maximum value in the Counter like this:
if max(counter.values()) >= 3:
    print('3 or more are the same')
This method has the added advantage that it works for larger lists as well without modification.
if mylist.count(mylist[0])>=3 or mylist.count(mylist[1])>=3:
    print('3 are the same')
I would suggest using collections.Counter.
Convert the list to a counter. The counter should have two keys, and one of its values should be 3:
In [1]: from collections import Counter
In [2]: c = Counter([0, 1, 1, 1])
In [3]: len(c) == 2
Out[3]: True
In [4]: 3 in c.values()
Out[4]: True
In short:
In [5]: len(c) == 2 and 3 in c.values()
Out[5]: True
Let's try an example that doesn't meet the criteria:
In [8]: d = Counter([0, 0, 1, 1])
In [9]: len(d) == 2 and 3 in d.values()
Out[9]: False
Check the highest count?
max(map(mylist.count, mylist)) >= 3
This solution uses a collections.Counter
from collections import Counter
mylist1 = [1, 2, 4, 4]
mylist2 = [1, 3, 3, 3]
c1 = Counter(mylist1)
c2 = Counter(mylist2)
c1.most_common(1)
>>> [(4, 2)]
c1.most_common(1)[0][1] == 3
>>> False
c2.most_common(1)[0][1] == 3
>>> True
Here's one way:
mylist = [a, b, c, d]
d = {}
for i in mylist:
    d[i] = d.get(i, 0) + 1
if 3 in d.values():
    print("three are the same")
You can try this:
mylist = [a, b, c, d]
counter = {a:mylist.count(a) for a in mylist}
if 1 in counter.values() and len(counter) == 2:
    print("three are the same")
You can use a collections.Counter:
from collections import Counter
same3 = Counter(mylist).most_common(1)[0][1] >= 3
This will be true if at least 3 elements are the same.
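The answers above differ on one point: whether four equal numbers should count as "three are the same". The sketch below makes that choice explicit; the function name three_equal and its exactly flag are made up for this illustration.

```python
from collections import Counter

def three_equal(values, exactly=False):
    """Hypothetical helper: does some value occur 3 times in `values`?"""
    if not values:
        return False
    # Count of the most frequent value in the list.
    top = Counter(values).most_common(1)[0][1]
    return top == 3 if exactly else top >= 3

print(three_equal([1, 2, 1, 1]))                # True
print(three_equal([0, 0, 1, 1]))                # False
print(three_equal([5, 5, 5, 5]))                # True
print(three_equal([5, 5, 5, 5], exactly=True))  # False
```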

Why is the index only changing when I use different values?

I just started programming in Python, and I can't figure out how to make the index change if I want the values in the list to be the same. What I want is for the index to change, so it will print 0, 1, 2, but all I get is 0, 0, 0. I tried to change the values of the list so that they were different, and then I got the output I wanted. But I don't understand why it matters what kind of values I use, why would the index care about what is in the list?
a = 0
b = 0
c = 0
d = 0
e = 0
f = 0
justTesting = [[a, b], [c, d], [e, f]]
for item in justTesting:
    something = justTesting.index(item)
    print (something)
I'm using Python 3.6.1, if that matters.
Because each list (designated 'item' in your loop) is [0, 0] this means the line:
something = justTesting.index(item)
will look for the first instance of the list [0, 0] in the list for each 'item' during the iteration. As every item in the list is [0, 0] the first instance is at position 0.
I have prepared an alternative example to illustrate the point
a = 1
b = 2
c = 3
d = 4
e = 5
f = 6
justTesting = [[a, b], [c, d], [e, f]]
for item in justTesting:
    print(item)
    something = justTesting.index(item)
    print(something)
This results in the following:
[1, 2]
0
[3, 4]
1
[5, 6]
2
It's because your list only contains [0, 0]!
So basically, if we replace all the variables with their values, we get:
justTesting = [[0, 0], [0, 0], [0, 0]]
And using .index(item) will return the first occurrence of item if any. Since item is always [0, 0] and it first appears at justTesting[0], you will always get 0! Try changing up the values in each list and try again. For example, this works:
b = [1, 2, 3, 4, 5, 6, 7, 8, 9]
for item in b:
    print(b.index(item))
Which returns:
0, 1, 2, 3, 4, 5, 6, 7, 8
if the results were on a single line.
Read the documentation: by default, index identifies the first occurrence. You need to use the start parameter as well, updating it as you go, so that each search covers only the part of the list after the most recent find.
something = justTesting.index(item, something+1)
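The line above needs something to hold a value before the first iteration. A small sketch, assuming a starting value of -1 so that the first search begins at position 0, is shown below. It also shows enumerate(), which is a different technique from the one in this answer and yields the position directly without searching.

```python
justTesting = [[0, 0], [0, 0], [0, 0]]

# Option 1: keep using index(), but start each search just past the last hit.
found = []
something = -1  # assumed starting value so the first search begins at 0
for item in justTesting:
    something = justTesting.index(item, something + 1)
    found.append(something)
print(found)  # [0, 1, 2]

# Option 2: enumerate() yields the position directly, with no searching.
positions = [i for i, item in enumerate(justTesting)]
print(positions)  # [0, 1, 2]
```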
That's because you are iterating over a list of lists.
Every item is actually a list, and you are executing list.index() method which returns the index of the element in the list.
This is a little tricky. Since you actually have three separate lists, each equal to [0, 0], they all compare equal:
>>> a = 0
>>> b = 0
>>> c = 0
>>> d = 0
>>> ab = [a, b]
>>> cd = [c, d]
>>>
>>> ab is cd
False
>>> ab == cd
True
>>>
Now when you run list.index(obj) you are looking for the 1st index that matches the object. Your code actually runs list.index([0, 0]) 3 times and returns the first match, which is at index 0.
Put different values inside the three inner lists and it will work as you expect.
Your code:
a = 0
b = 0
c = 0
d = 0
e = 0
f = 0
justTesting = [[a, b], [c, d], [e, f]]
for item in justTesting:
    something = justTesting.index(item)
    print (something)
is equivalent to:
a = 0
b = 0
c = 0
d = 0
e = 0
f = 0
ab = [a, b]
cd = [c, d]
ef = [e, f]
justTesting = [ab, cd, ef]
# Note that ab == cd is True and cd == ef is True,
# so all elements of justTesting are equal.
#
# for item in justTesting:
#     something = justTesting.index(item)
#     print (something)
#
# is essentially equivalent to:
item = justTesting[0] # = ab = [0, 0]
something = justTesting.index(item) # = 0 First occurrence of [0, 0] in justTesting
# is **always** at index 0
item = justTesting[1] # = cd = [0, 0]
something = justTesting.index(item) # = 0
item = justTesting[2] # = ef = [0, 0]
something = justTesting.index(item) # = 0
justTesting does not change as you iterate and the first position in justTesting at which [0,0] is found is always 0.
But I don't understand why it matters what kind of values I use, why would the index care about what is in the list?
Possibly what is confusing you is the fact that index() does not search for occurrences of the item "in abstract" but it looks at the values of items in a list and compares those values with a given value of item. That is,
[ab, cd, ef].index(cd)
is equivalent to
[[0,0],[0,0],[0,0]].index([0,0])
and the first occurrence of [0,0] value (!!!) is at 0 index of the list for your specific values for a, b, c, d, e, and f.
