I have this code:
gs = open("graph.txt", "r")
gp = gs.readline()
gp_splitIndex = gp.find(" ")
gp_nodeCount = int(gp[0:gp_splitIndex])
gp_edgeCount = int(gp[gp_splitIndex+1:-1])
matrix = []  # predeclare the array
for i in range(0, gp_nodeCount):
    matrix.append([])
    for y in range(0, gp_nodeCount):
        matrix[i].append(0)
for i in range(0, gp_edgeCount-1):
    gp = gs.readline()
    gp_splitIndex = gp.find(" ")  # get the index of space, dividing the 2 numbers on a row
    gp_from = int(gp[0:gp_splitIndex])
    gp_to = int(gp[gp_splitIndex+1:-1])
    matrix[gp_from][gp_to] = 1
print matrix
The file graph.txt contains this:
5 10
0 1
1 2
2 3
3 4
4 0
0 3
3 1
1 4
4 2
2 0
The first two numbers tell me that the graph has 5 nodes and 10 edges. The following number pairs are the edges between nodes. For example, "1 4" means an edge between node 1 and node 4.
Problem is, the output should be this:
[[0, 1, 0, 1, 0], [0, 0, 1, 0, 1], [1, 0, 0, 1, 0], [0, 1, 0, 0, 1], [1, 0, 1, 0, 0]]
But instead of that, I get this:
[[0, 1, 0, 1, 0], [0, 0, 1, 0, 1], [0, 0, 0, 1, 0], [0, 1, 0, 0, 1], [1, 0, 1, 0, 0]]
Only one number is different, and I can't understand why this is happening. The edge "3 1" is not present. Can someone explain where the problem is?
Change for i in range(0, gp_edgeCount-1): to
for i in range(0, gp_edgeCount):
range() already excludes its end value, so the extra "-1" drops one iteration: range(0, 3) yields 0, 1, 2.
Also, it is not the "3 1" edge that is missing; it is the "2 0" edge, which is the last edge in the file. Rows and columns are counted from 0, so the entry that differs is matrix[2][0].
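To see the off-by-one concretely, here is a minimal Python 3 sketch. It keeps the edge list from the question in memory instead of reading graph.txt; the names edges, visited_buggy, visited_fixed and skipped are introduced only for this illustration.

```python
# Edge list copied from graph.txt in the question; `edges` is a name
# introduced only for this illustration.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0),
         (0, 3), (3, 1), (1, 4), (4, 2), (2, 0)]
edge_count = len(edges)  # 10, as in the file header

# range() already stops one short of its end value.
visited_buggy = [edges[i] for i in range(0, edge_count - 1)]
visited_fixed = [edges[i] for i in range(0, edge_count)]

# The edge that the buggy loop never reaches is the last one in the file.
skipped = [e for e in visited_fixed if e not in visited_buggy]
print(len(visited_buggy), len(visited_fixed), skipped)
```

The buggy bound visits nine edges and the fixed bound visits all ten, so the only edge left out is the final "2 0" line.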
Matthias has it: you don't need edgeCount - 1, since range doesn't include the end value in the iteration.
There are several other things you can do to clean up your code:
The with statement is preferred for opening files, since it closes them automatically for you.
You don't need to call find and slice manually; split already does what you want.
You can convert and assign directly to a pair of numbers using a generator expression and iterable unpacking.
You can call range with just an end value; the 0 start is implicit.
The multiplication operator is handy for initializing lists.
With all of those changes:
with open('graph.txt', 'r') as graph:
    node_count, edge_count = (int(n) for n in graph.readline().split())
    matrix = [[0]*node_count for _ in range(node_count)]
    for i in range(edge_count):
        src, dst = (int(n) for n in graph.readline().split())
        matrix[src][dst] = 1
print matrix
# [[0, 1, 0, 1, 0], [0, 0, 1, 0, 1], [1, 0, 0, 1, 0], [0, 1, 0, 0, 1], [1, 0, 1, 0, 0]]
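One caveat about the list-multiplication tip, shown as a small Python 3 sketch: multiplying the inner list of zeros is fine, but multiplying the outer list would repeat the same row object, which is presumably why the matrix above is built with a comprehension over the rows. The variable names here are only illustrative.

```python
node_count = 3

# Each row is a separate list, so rows can be filled in independently.
independent = [[0] * node_count for _ in range(node_count)]
independent[0][1] = 1

# Outer multiplication copies the reference, so all rows are one list.
aliased = [[0] * node_count] * node_count
aliased[0][1] = 1

print(independent)  # only row 0 changes
print(aliased)      # every row shows the change
```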
Keeping your code and style (of course it could be much more readable):
gs = open("graph.txt", "r")
gp = gs.readline()
gp_splitIndex = gp.split(" ")
gp_nodeCount = int(gp_splitIndex[0])
gp_edgeCount = int(gp_splitIndex[1])
matrix = []  # predeclare the array
for i in range(0, gp_nodeCount):
    matrix.append([])
    for y in range(0, gp_nodeCount):
        matrix[i].append(0)
for i in range(0, gp_edgeCount):
    gp = gs.readline()
    gp_Index = gp.split(" ")  # split the row into its two numbers
    gp_from = int(gp_Index[0])
    gp_to = int(gp_Index[1])
    matrix[gp_from][gp_to] = 1
print matrix
Exactly: the last line of your file, the 2 0, was never read. Hence the missing 1. Have a nice day!
The other answers are correct. Here is another version, similar to tzaman's:
with open('graph.txt', mode='r') as txt_file:
    lines = [l.strip() for l in txt_file.readlines()]
number_pairs = [[int(n) for n in line.split(' ')] for line in lines]
header = number_pairs[0]
edge_pairs = number_pairs[1:]
num_nodes, num_edges = header
edges = [[0] * num_nodes for _ in xrange(num_nodes)]
for edge_start, edge_end in edge_pairs:
    edges[edge_start][edge_end] = 1
print edges
I'm trying to figure out a way to count the number of times that a subset appears in a list of lists. For example, if I have the following list:
dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0]]
The pattern [0,0,1,0,1,0] appears in three of the four items of the list (i.e. in three of the lists, the elements at index 2 and index 4 are set to 1, just like in the pattern). How can I count the number of times that the pattern appears?
So far I've tried this, but it does not work:
subsets_count = []
for i in range(len(dataset)):
    current_subset_count = 0
    for j in range(len(dataset)):
        if dataset[i] in dataset[j]:
            subset_count += 1
    subsets_count.append(current_subset_count)
Using one of my favorite itertools, compress:
from itertools import compress
[sum(all(compress(e, d)) for e in dataset)
 for d in dataset]
Results in:
[3, 1, 1, 1]
For each sublist, generate a set of indices where the ones exist. Do the same for the pattern. Then, for each set of indices, find whether the pattern indices are a subset of that set. If so, the pattern is in the sublist.
one_indices_of_subsets = [{i for i, v in enumerate(sublist) if v} for sublist in dataset]
pattern_indices = {i for i, v in enumerate(pattern) if v}
result = sum(1 for s in one_indices_of_subsets if pattern_indices <= s)
print(result)
This outputs:
3
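The snippet above leaves dataset and pattern undefined, so here is a self-contained Python 3 sketch of the same index-set idea. It goes one step further and uses every row in turn as the pattern; the helper name one_indices is introduced only for this illustration.

```python
dataset = [[0, 0, 1, 0, 1, 0],
           [0, 0, 1, 0, 1, 1],
           [1, 0, 1, 0, 1, 0],
           [0, 1, 1, 0, 0, 0]]

def one_indices(row):
    """Return the set of positions that hold a 1 (any truthy value)."""
    return {i for i, v in enumerate(row) if v}

index_sets = [one_indices(row) for row in dataset]

# For each row used as the pattern, count the rows whose 1-positions
# include all of the pattern's 1-positions (set <= is the subset test).
counts = [sum(1 for s in index_sets if p <= s) for p in index_sets]
print(counts)
```

Building the index sets once and reusing them keeps the comparison to a single subset test per pair of rows.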
A straightforward pattern matcher that allows one digit to differ from the pattern:
dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0]]
pattern = [0,0,1,0,1,0]
m = len(pattern)
subsets_count = 0
for i in range(len(dataset)):
    count = 0
    for j in range(m):
        if dataset[i][j] == pattern[j]:
            count +=1
    if count >= m-1:
        subsets_count +=1
print(subsets_count)
Output:
3
If you want to count exact occurrences of the pattern (element order matters), you can simply use the list's .count() method:
dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0],[0,0,1,0,1,0]]
num_count = dataset.count([0,0,1,0,1,0])
print(num_count)
output:
2
And if you don't care about the positions of the 0s and 1s, only about how many 1s each row has, you can use:
dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0],[0,0,1,0,1,0]]
num_count = [sum(el) for el in dataset].count(sum([0,0,1,0,1,0]))
print(num_count)
output:
3
Try:
dataset = [
[0, 0, 1, 0, 1, 0],
[0, 0, 1, 0, 1, 1],
[1, 0, 1, 0, 1, 0],
[0, 1, 1, 0, 0, 0],
]
pat = [0, 0, 1, 0, 1, 0]
cnt = sum(all(a == b for a, b in zip(pat, d) if a == 1) for d in dataset)
print(cnt)
Prints:
3
I would like to loop over the following check_matrix so that the code recognizes whether the first and second elements of each row are 1 and 1, 1 and 2, and so on. Then, for each class of pair (1,1; 1,2; 2,1; 2,2), the code should store in a new matrix the sum of the last element (index 8) times exp(-i*q·(check_matrix[k][2:5] - check_matrix[k][5:8])), where i is the imaginary unit, k is the running index over the rows of check_matrix, and q is a vector defined below. There are 20 q vectors.
import numpy as np
q= []
for i in np.linspace(0, 10, 20):
q.append(np.array((0, 0, i)))
q = np.array(q)
check_matrix = np.array([[1, 1, 0, 0, 0, 0, 0, -0.7977, -0.243293],
[1, 1, 0, 0, 0, 0, 0, 1.5954, 0.004567],
[1, 2, 0, 0, 0, -1, 0, 0, 1.126557],
[2, 1, 0, 0, 0, 0.5, 0.86603, 1.5954, 0.038934],
[2, 1, 0, 0, 0, 2, 0, -0.7977, -0.015192],
[2, 2, 0, 0, 0, -0.5, 0.86603, 1.5954, 0.21394]])
This means that, in principle, I should end up with 20 matrices of shape 2x2, one for each q vector.
At the moment my code gives only one matrix, which appears to be the last one, even though I am appending to Matrices. My code is below:
for i in range(2):
    i = i+1
    for j in range(2):
        j= j +1
        j_list = []
        Matrices = []
        for k in range(len(check_matrix)):
            if check_matrix[k][0] == i and check_matrix[k][1] == j:
                j_list.append(check_matrix[k][8]*np.exp(-1J*np.dot(q,(np.subtract(check_matrix[k][2:5],check_matrix[k][5:8])))))
        j_11 = np.sum(j_list)
        I_matrix[i-1][j-1] = j_11
        Matrices.append(I_matrix)
I_matrix is defined as below:
I_matrix= np.zeros((2,2),dtype=np.complex_)
At the moment I get following output.
Matrices = [array([[-0.66071446-0.77603624j, -0.29038112+2.34855023j], [-0.31387562-0.08116629j, 4.2788 +0.j ]])]
But I want a matrix for each q value, meaning 20 matrices in total in this case, where each 2x2 matrix holds the sums for the 1,1, 1,2, 2,1 and 2,2 pairs, arranged as follows:
array([[11., 12.],
[21., 22.]])
I would highly appreciate your suggestions for correcting it. Thanks in advance!
I am pretty sure you can solve this problem in an easier way and I am not 100% sure that I understood you correctly, but here is some code that does what I think you want. If you have a possibility to check if the results are valid, I would suggest you do so.
import numpy as np
n = 20
q = np.zeros((20, 3))
q[:, -1] = np.linspace(0, 10, n)
check_matrix = np.array([[1, 1, 0, 0, 0, 0, 0, -0.7977, -0.243293],
[1, 1, 0, 0, 0, 0, 0, 1.5954, 0.004567],
[1, 2, 0, 0, 0, -1, 0, 0, 1.126557],
[2, 1, 0, 0, 0, 0.5, 0.86603, 1.5954, 0.038934],
[2, 1, 0, 0, 0, 2, 0, -0.7977, -0.015192],
[2, 2, 0, 0, 0, -0.5, 0.86603, 1.5954, 0.21394]])
check_matrix[:, :2] -= 1  # python indexing is zero based
matrices = np.zeros((n, 2, 2), dtype=np.complex128)  # np.complex_ was removed in NumPy 2.0
for i in range(2):
    for j in range(2):
        k_list = []
        for k in range(len(check_matrix)):
            if check_matrix[k][0] == i and check_matrix[k][1] == j:
                k_list.append(check_matrix[k][8] *
                              np.exp(-1J * np.dot(q, check_matrix[k][2:5]
                                                  - check_matrix[k][5:8])))
        matrices[:, i, j] = np.sum(k_list, axis=0)
NOTE: I changed your indices to use consistent zero-based indexing.
Here is another approach where I replaced the k-loop with a vectorized version:
for i in range(2):
    for j in range(2):
        k = np.logical_and(check_matrix[:, 0] == i, check_matrix[:, 1] == j)
        temp = np.dot(check_matrix[k, 2:5] - check_matrix[k, 5:8], q[:, :, np.newaxis])[..., 0]
        temp = check_matrix[k, 8:] * np.exp(-1J * temp)
        matrices[:, i, j] = np.sum(temp, axis=0)
3 line solution
You asked for an efficient solution in your original title, so how about this one, which avoids nested loops and if statements in three lines and is thus hopefully faster?
fac=2*(check_matrix[:,0]-1)+(check_matrix[:,1]-1)
grp=np.split(check_matrix[:,8], np.cumsum(np.unique(fac,return_counts=True)[1])[:-1])
[np.sum(x) for x in grp]
output:
[-0.23872600000000002, 1.126557, 0.023742000000000003, 0.21394]
How does it work?
I combine the first two columns into a single index, treating each as a "bit" (i.e. a base-2 digit):
fac=2*(check_matrix[:,0]-1)+(check_matrix[:,1]-1)
(If you have indices that exceed 2, you can still use this technique, but you will need a different base to combine the columns: if your indices go from 1 to 18, you would need to multiply column 0 by a number equal to or larger than 18 instead of 2.)
So the result of the first line is
array([0., 0., 1., 2., 2., 3.])
Note as well that it assumes the data is ordered by this index, with one column changing fastest, so that rows with the same index are contiguous. If this is not the case, you will need an extra step to sort the index and the original check matrix. In your example the data is ordered.
The next step groups the data according to the index, and uses the solution posted here.
np.split(check_matrix[:,8], np.cumsum(np.unique(fac,return_counts=True)[1])[:-1])
[array([-0.243293, 0.004567]), array([1.126557]), array([ 0.038934, -0.015192]), array([0.21394])]
i.e. it outputs column 8 (the last column) of check_matrix, split according to the grouping of fac.
Then the last line simply sums those. Knowing how the first two columns were combined to give the single index allows you to map the result back. Or you could simply add the index to check_matrix as an extra column if you wanted.
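As a cross-check on the combined-index idea, here is a pure-Python sketch (not the NumPy approach of this answer) that forms the same key, accumulates the sums in a dictionary, and maps each key back to its pair with divmod. Because it accumulates by key, it does not rely on the rows being sorted. The names rows, base and sums are hypothetical and exist only in this illustration.

```python
from collections import defaultdict

# First two columns and last column of check_matrix from the question.
rows = [(1, 1, -0.243293), (1, 1, 0.004567), (1, 2, 1.126557),
        (2, 1, 0.038934), (2, 1, -0.015192), (2, 2, 0.21394)]

base = 2  # must be at least the number of distinct values per column
sums = defaultdict(float)
for first, second, value in rows:
    key = base * (first - 1) + (second - 1)  # same formula as `fac`
    sums[key] += value

# Map each combined key back to its (first, second) pair.
for key in sorted(sums):
    first, second = divmod(key, base)
    print((first + 1, second + 1), sums[key])
```

The four sums agree with the output listed above, up to floating-point rounding.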
I have two numpy arrays of equal size. They contain the values 1, 0, and -1. I can count the number of matching ones and negative ones, but I'm not sure how to count the matching elements that have the same index and value of zero.
I'm a little confused on how to proceed here.
Here is some code:
print(actual_direction.shape)
print(predicted_direction.shape)
act = actual_direction
pre = predicted_direction
part1 = act[pre == 1]
part2 = part1[part1 == 1]
result1 = part2.sum()
part3 = act[pre == -1]
part4 = part3[part3 == -1]
result2 = part4.sum() * -1
non_zeros = result1 + result2
zeros = len(act) - non_zeros
print(f'zeros : {zeros}\n')
print(f'non_zeros : {non_zeros}\n')
final_result = non_zeros + zeros
print(f'result1 : {result1}\n')
print(f'result2 : {result2}\n')
print(f'final_result : {final_result}\n')
Here is the printout:
(11279,)
(11279,)
zeros : 5745.0
non_zeros : 5534.0
result1 : 2217.0
result2 : 3317.0
final_result : 11279.0
So what I've done here is simply subtract the count of matching ones and negative ones from the total length of the array. I can't assume that the difference (zeros: 5745) consists only of the matching zero elements, can I?
You could try this:
import numpy as np
a=np.array([1,0,0,1,-1,-1,0,0])
b=np.array([1,0,0,1,-1,-1,0,1])
summ = np.sum((a==0) & (b==0))
print(summ)
Output:
3
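To connect this with the subtraction approach in the question, here is a small Python 3 sketch on plain lists, using the same values as a and b above. It suggests that the remainder left after subtracting the matching non-zeros also contains every position where the two arrays disagree, so it is not a count of matching zeros. All variable names are illustrative.

```python
a = [1, 0, 0, 1, -1, -1, 0, 0]
b = [1, 0, 0, 1, -1, -1, 0, 1]

matching_ones = sum(1 for x, y in zip(a, b) if x == y == 1)
matching_neg_ones = sum(1 for x, y in zip(a, b) if x == y == -1)
matching_zeros = sum(1 for x, y in zip(a, b) if x == y == 0)

# Subtracting the matching non-zeros from the length leaves the matching
# zeros plus every position where the two arrays disagree.
remainder = len(a) - (matching_ones + matching_neg_ones)
mismatches = sum(1 for x, y in zip(a, b) if x != y)

print(matching_zeros, remainder, mismatches)
```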
You can use numpy.ravel() to flatten out the array, then use zip() to compare each element side by side:
import numpy as np
ar1 = np.array([[1, 0, 0],
[0, 1, 1],
[0, 1, 0]])
ar2 = np.array([[0, 0, 0],
[1, 0, 1],
[0, 1, 0]])
count = 0
for e1, e2 in zip(ar1.ravel(), ar2.ravel()):
    if e1 == e2:
        count += 1
print(count)
Output:
6
You can also do this to list all the matches found, as well as print out the amount:
dup = [e1 for e1, e2 in zip(ar1.ravel(), ar2.ravel()) if e1 == e2]
print(dup)
print(len(dup))
Output:
[0, 0, 1, 0, 1, 0]
6
You have two arrays and want to count the positions where both of these are 0, right?
You can check where the array meets your required condition (a == 0), and then use the 'and' operator & to check where both arrays meet your requirement:
import numpy as np
a = np.array([1, 0, -1, 0, -1, 1, 1, 1, 1])
b = np.array([1, 0, -1, 1, 0, -1, 1, 0, 1])
both_zero = (a == 0) & (b == 0) # [False, True, False, False, False, False, False, False, False]
both_zero.sum() # 1
In your updated question you appear to be interested in the similarities and differences between actual values and predictions. For this, a confusion matrix is ideally suited.
from sklearn.metrics import confusion_matrix
confusion_matrix(a, b, labels=[-1, 0, 1])
will give you a confusion matrix as output telling you how many -1s were predicted as -1, 0 and 1, and the same for 0 and +1:
[[1 1 0] # -1s predicted as -1, 0 and 1
[0 1 1] # 0s predicted as -1, 0 and 1
[1 1 3]] # 1s predicted as -1, 0 and 1
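If scikit-learn is not available, the same table can be tallied with the standard library. This is only a sketch of the counting idea (it is not how scikit-learn implements confusion_matrix), using the a and b values from above as plain lists; pair_counts and table are names made up for the example.

```python
from collections import Counter

a = [1, 0, -1, 0, -1, 1, 1, 1, 1]   # actual values from the example above
b = [1, 0, -1, 1, 0, -1, 1, 0, 1]   # predicted values from the example above
labels = [-1, 0, 1]

pair_counts = Counter(zip(a, b))  # (actual, predicted) -> occurrences

# Rows are actual labels, columns are predicted labels.
table = [[pair_counts[(actual, predicted)] for predicted in labels]
         for actual in labels]
for row in table:
    print(row)

both_zero = pair_counts[(0, 0)]
```

The middle entry of the table is the number of positions where both arrays are 0, which is the quantity the question originally asked about.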
I have a list of 1 and 0 --> output = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0]
I would like to convert that list of ones and zeros to a string, where each group of 8 bits in little-endian order represents one character in latin1.
So far I have the code below, which works fine, but I think it's quite slow and it seems to slow down my script...
out_str = ""
for i in range(0,len(output),8):
    x=output[i:i+8]
    l="".join([str(j) for j in x[::-1]])
    out_str += chr(int(("0b"+l),base=2))
Do you have any faster ideas?
Here's a faster solution using a dictionary of tuples for the 256 possible characters:
bits = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0]
chars = { tuple(map(int,f"{n:08b}"[::-1])):chr(n) for n in range(0,256) }
def toChars(bits):
    return "".join(chars[tuple(bits[i:i+8])] for i in range(0,len(bits),8) )
Roughly 3x faster than the original solution.
[EDIT] and an even faster one using bytes and zip:
chars = { tuple(map(int,f"{n:08b}")):n for n in range(256) }
def toChars(bits):
    return bytes(chars[b] for b in zip(*(bits[7-i::8] for i in range(8)))).decode("latin-1")
About 2x faster than the previous one (on long lists).
[EDIT2] A bit of explanation for this last one:
b in the generator expression is a tuple of 8 bits, most significant bit first
chars[b] returns the integer corresponding to those 8 bits
bytes(...).decode("latin-1") converts the sequence of integers to a string, one character per byte value, which matches chr(n) for latin1 (the default UTF-8 decoding would not handle values above 127 this way)
zip(*(... 8 bit iterators...)) unpacks the 8 striding ranges of bits running in parallel, each from a different starting point
The strategy with the unpacked zip is to go through the bits in steps of 8. For example, if we were going through 8 parallel ranges, we would get this:
bits[7::8] -> [ 0, 0, ... ]    zip returns: (0,1,0,0,0,0,1,1)
bits[6::8] -> [ 1, 1, ... ]                 (0,1,1,0,1,1,1,1)
bits[5::8] -> [ 0, 1, ... ]                 ...
bits[4::8] -> [ 0, 0, ... ]
bits[3::8] -> [ 0, 1, ... ]
bits[2::8] -> [ 0, 1, ... ]
bits[1::8] -> [ 1, 1, ... ]
bits[0::8] -> [ 1, 1, ... ]
The zip function will take one column of this per iteration and return it as a tuple of bits.
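The striding can be checked step by step with a short Python 3 sketch. It rebuilds each byte value with int(..., 2) rather than the lookup table, purely to keep the example self-contained; strides, columns and values are illustrative names.

```python
bits = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0]

# Eight strided slices: the first holds bit 7 of every byte, the last bit 0.
strides = [bits[7 - i::8] for i in range(8)]

# zip(*strides) takes one element from each slice per step, which yields
# one tuple per byte, most significant bit first.
columns = list(zip(*strides))
print(columns)

# Rebuild each byte value from its bit tuple without the lookup table.
values = [int("".join(map(str, column)), 2) for column in columns]
text = bytes(values).decode("latin-1")
print(values, text)
```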
#!/usr/bin/python
bits = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0]
result = []
c = 0
for i,v in enumerate(bits):
    i = i % 8
    c = c | v << i
    if i == 7:
        result.append(chr(c))
        c = 0
print(''.join(result))
Testing:
$ python ./test.py
Co
Using sum and enumerate should be faster, as they are built-ins. Let's time yours and mine on the same machine.
Run 100,000 times in a loop and timed with time python3 tmp.py. (These are the user values. For both, the sys time hovered around 0m0.012s, so it had only a marginal influence on the results.)
Yours: 0m1.624s
Mine is 50% faster: 0m1.063s, with this
output = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0]
out_str = ""
for item in [output[i:i + 8] for i in range(0, len(output), 8)]:
    out_str += chr(sum(x<<i for i,x in enumerate(item)))
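For anyone who wants to repeat the comparison without the shell's time command, here is a sketch using the standard timeit module. The function names join_and_parse and shift_and_sum are made up for this example, and the absolute numbers will differ from machine to machine.

```python
import timeit

output = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0]

def join_and_parse(bits):
    """Approach from the question: reverse, join to a string, parse base 2."""
    out_str = ""
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        out_str += chr(int("".join(str(j) for j in chunk[::-1]), 2))
    return out_str

def shift_and_sum(bits):
    """Approach from this answer: weight each bit by its position."""
    out_str = ""
    for i in range(0, len(bits), 8):
        out_str += chr(sum(x << k for k, x in enumerate(bits[i:i + 8])))
    return out_str

for func in (join_and_parse, shift_and_sum):
    seconds = timeit.timeit(lambda: func(output), number=10_000)
    print(func.__name__, func(output), round(seconds, 4))
```

Checking that both functions return the same string before comparing timings guards against benchmarking a variant that is fast only because it is wrong.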
I did some measurements of the execution time for all valid solutions; see the results in the comments of the code below. The snippets are sorted from slowest to fastest, the fastest being the one from Alain T. I tested them on a fairly large list that produces a string of 200000 characters.
Even for such a large list, the execution time is still pretty fast, even for my original solution. There has to be an issue somewhere else in my program... :-)
Thank you all for your codes!
import time
start_time = time.time()
bits = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0] * 100000
### tested code ###
print("Execution time: ", time.time() - start_time, "seconds")
### former solution --> 0.59 seconds
out_str = ""
for i in range(0,len(bits),8):
    x=bits[i:i+8]
    l="".join([str(j) for j in x[::-1]])
    out_str += chr(int(("0b"+l),base=2))
### enumerate and result.append --> 0.48 seconds
result = []
c = 0
for i,v in enumerate(bits):
    i = i % 8
    c = c | v << i
    if i == 7:
        result.append(chr(c))
        c = 0
out_str = ''.join(result)
### sum and enumerate --> 0.45 seconds
out_str = ""
for item in [bits[i:i + 8] for i in range(0, len(bits), 8)]:
    out_str += chr(sum(x<<i for i,x in enumerate(item)))
### map and chars dictionary --> 0.10 seconds
chars = { tuple(map(int,f"{n:08b}"[::-1])):chr(n) for n in range(0,256) }
def toChars(bits):
    return "".join(chars[tuple(bits[i:i+8])] for i in range(0,len(bits),8) )
### bytes and zip --> 0.06 seconds
chars = { tuple(map(int,f"{n:08b}")):n for n in range(256) }
def toChars(bits):
    return bytes(chars[b] for b in zip(*(bits[7-i::8] for i in range(8)))).decode()
EDIT:
I rewrote the best (fastest) solution, the one by Alain T., in a more understandable form without comprehensions so that I could step through the code, because it took me a while to understand how it works:
bits = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0] * 10
chars = {tuple(map(int,f"{n:08b}")):n for n in range(256)}
temp = []
out = []
for i in range(8):
    temp.append(bits[7-i::8])
unzipped = zip(*temp)
for b in unzipped:
    out.append(bytes([chars[b]]).decode())
print("".join(out))
Check whether this is faster:
tmp_list = []
for i in range(0,len(output),8):
    byte_value = 0
    # the list is little-endian, so feed the bits most significant first
    for digit in reversed(output[i:i+8]):
        byte_value = (byte_value<<1) + digit
    tmp_list.append(chr(byte_value))
out_str = ''.join(tmp_list)
I have a feature matrix and the corresponding targets, which are ones or zeros:
# raw observations
features = np.array([[1, 1, 0],
[1, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 0, 1]])
targets = np.array([1, 0, 1, 1, 0, 0])
As you can see, each feature row may correspond to both ones and zeros. I need to convert my raw observation matrix to a probability matrix, where each unique feature row corresponds to the probability of seeing a one as the target:
[1 1 0] -> 0.5
[0 1 0] -> 0.67
[0 0 1] -> 0
I have constructed a quite straight-forward solution:
import numpy as np
# raw observations
features = np.array([[1, 1, 0],
[1, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 0, 1]])
targets = np.array([1, 0, 1, 1, 0, 0])
from collections import Counter
def convert_obs_to_proba(features, targets):
    features_ = []
    targets_ = []
    # compute unique rows (idx will point to some representative)
    b = np.ascontiguousarray(features).view(np.dtype((np.void, features.dtype.itemsize * features.shape[1])))
    _, idx = np.unique(b, return_index=True)
    idx = idx[::-1]
    zeros = Counter()
    ones = Counter()
    # collect row-wise number of one and zero targets
    for i, row in enumerate(features[:]):
        if targets[i] == 0:
            zeros[tuple(row)] += 1
        else:
            ones[tuple(row)] += 1
    # iterate over unique features and compute probabilities
    for k in idx:
        unique_row = features[k]
        zero_count = zeros[tuple(unique_row)]
        one_count = ones[tuple(unique_row)]
        proba = float(one_count) / float(zero_count + one_count)
        features_.append(unique_row)
        targets_.append(proba)
    return np.array(features_), np.array(targets_)
features_, targets_ = convert_obs_to_proba(features, targets)
print(features_)
print(targets_)
which:
extracts the unique feature rows;
counts the zero and one targets for each unique feature row;
computes the probability and constructs the result.
Could it be solved in a prettier way using some advanced numpy magic?
Update. The previous code was pretty inefficient, O(n^2), so I converted it to the more performance-friendly version above. Old code:
import numpy as np
# raw observations
features = np.array([[1, 1, 0],
[1, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 0, 1]])
targets = np.array([1, 0, 1, 1, 0, 0])
def convert_obs_to_proba(features, targets):
    features_ = []
    targets_ = []
    # compute unique rows (idx will point to some representative)
    b = np.ascontiguousarray(features).view(np.dtype((np.void, features.dtype.itemsize * features.shape[1])))
    _, idx = np.unique(b, return_index=True)
    idx = idx[::-1]
    # calculate ZERO class occurrences and ONE class occurrences
    for k in idx:
        unique_row = features[k]
        zeros = 0
        ones = 0
        for i, row in enumerate(features[:]):
            if np.array_equal(row, unique_row):
                if targets[i] == 0:
                    zeros += 1
                else:
                    ones += 1
        proba = float(ones) / float(zeros + ones)
        features_.append(unique_row)
        targets_.append(proba)
    return np.array(features_), np.array(targets_)
features_, targets_ = convert_obs_to_proba(features, targets)
print(features_)
print(targets_)
It's easy using Pandas:
import pandas as pd
df = pd.DataFrame(features)
df['targets'] = targets
Now you have:
   0  1  2  targets
0  1  1  0        1
1  1  1  0        0
2  0  1  0        1
3  0  1  0        1
4  0  1  0        0
5  0  0  1        0
Now, the fancy part:
df.groupby([0,1,2]).targets.mean()
Gives you:
0  1  2
0  0  1    0.000000
   1  0    0.666667
1  1  0    0.500000
Name: targets, dtype: float64
Pandas doesn't print the 0 at the leftmost part of the 0.666 row, but if you inspect the value there, it is indeed 0.
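For comparison, the same group-then-average idea can be sketched with only the standard library. This is not what pandas does internally, just an illustration of the computation on plain lists; totals and probabilities are names chosen for the example.

```python
from collections import defaultdict

features = [[1, 1, 0], [1, 1, 0], [0, 1, 0],
            [0, 1, 0], [0, 1, 0], [0, 0, 1]]
targets = [1, 0, 1, 1, 0, 0]

# key: feature row as a tuple -> [sum of targets, number of observations]
totals = defaultdict(lambda: [0, 0])
for row, target in zip(features, targets):
    entry = totals[tuple(row)]
    entry[0] += target
    entry[1] += 1

probabilities = {row: ones / count for row, (ones, count) in totals.items()}
for row, proba in probabilities.items():
    print(row, round(proba, 2))
```

Since the targets are 0 or 1, the mean of the targets within a group is exactly the fraction of ones, which is the probability the question asks for.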
np.sum(np.reshape([targets[f] if tuple(features[f])==tuple(i) else 0 for i in np.vstack(set(map(tuple,features))) for f in range(features.shape[0])],features.shape[::-1]),axis=1)/np.sum(np.reshape([1 if tuple(features[f])==tuple(i) else 0 for i in np.vstack(set(map(tuple,features))) for f in range(features.shape[0])],features.shape[::-1]),axis=1)
Here you go, numpy magic! It is unnecessarily dense, though, and could probably be cleaned up using some boring variables ;)
(And it is probably far from optimal.)