I'm trying to build a color histogram where each pixel also has another (float) property, alpha, stored in an array of the same size.
Eventually I want a dictionary of (color) -> (count, sum), where count is the histogram count for that color and sum is the sum of the alpha values that correspond to that color.
Here's some simple Python code that does what I want (c and alpha are the same length, and are very long):
hist = {}
for i in range(len(c)):
    key = str(c[i])
    if key in hist:
        hist[key][0] += 1
        hist[key][1] += alpha[i]
    else:
        hist[key] = [1, alpha[i]]  # first occurrence counts as 1, not 0
But naturally that takes a lot of time. Any ideas for a numpy equivalent?
Thanks
Okay, so I eventually found a very nice numpy-only solution based on this answer:
https://stackoverflow.com/a/8732260/1752591
It is a function that sums the values of one vector, grouped by another vector of indices.
So all I had to do was give each color an id and build the dictionary:
d = alpha.reshape((-1))
id = color_code_image(colormap)
v, g = sum_by_group(d, id)
count, g = sum_by_group(np.ones(len(d)), id)
avg = v/count
return dict(np.array([g, avg]).T)
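For readers without the linked answer at hand, here is a minimal sketch of a group-wise sum built on np.unique and np.bincount. It is not necessarily how the linked sum_by_group is implemented; the function body, the integer color ids and the sample data below are assumptions for illustration only.

```python
import numpy as np

def sum_by_group(values, groups):
    """Sum `values` per distinct entry of `groups`; returns (sums, unique_groups)."""
    unique_groups, inverse = np.unique(groups, return_inverse=True)
    sums = np.bincount(inverse.ravel(), weights=values,
                       minlength=len(unique_groups))
    return sums, unique_groups

# Hypothetical data: one integer color id and one alpha value per pixel.
color_ids = np.array([7, 3, 7, 7, 3, 9])
alpha = np.array([0.5, 1.0, 0.25, 0.25, 0.5, 2.0])

sums, ids = sum_by_group(alpha, color_ids)
counts, _ = sum_by_group(np.ones(len(alpha)), color_ids)

# color id -> (count, sum of alpha)
hist = {int(i): (int(n), float(s)) for i, n, s in zip(ids, counts, sums)}
```

Summing an array of ones with the same group ids gives the histogram count, which is the same trick the snippet above uses for count.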
Related
I have two arrays, centroids and nodes.
I need to find the shortest distance from each point in centroids to any point in nodes.
The output for centroids is as follows:
array([[12.52512263, 55.78940022],
[12.52027731, 55.7893347 ],
[12.51987146, 55.78855611]])
The output for nodes is as follows:
array([[12.5217378, 55.7799275],
[12.5122589, 55.7811443],
[12.5241664, 55.7843297],
[12.5189395, 55.7802709]])
I use the following code to get the shortest distance:
shortdist_from_centroid_to_node = np.min(cdist(centroids,nodes))
However, this is the output I get (I should get 3 lines of output):
Out[131]: 3.0575613850140956e-05
Can anyone specify what the problem is here? Thanks.
When you call np.min without an axis, it returns the minimum value of the whole 2-D array.
You want the minimum value for each centroid:
shortest_distance_to_centroid = [min(x) for x in cdist(centroids,nodes)]
To also get the associated index, one way is to look up the index of the minimum value. Another is to write a custom min() function that also returns the index (so you traverse each row only once):
[(list(x).index(min(x)), min(x)) for x in cdist(centroids,nodes)]  # list() is needed because numpy arrays have no index() method
Solution with a custom function:
def my_min(x):
    current_min = x[0]
    current_index = 0  # index of x[0]
    for i, v in enumerate(x[1:]):
        if v < current_min:
            current_min = v
            current_index = i + 1  # offset by 1 because the loop starts at x[1]
    return (current_index, current_min)
[my_min(x) for x in cdist(centroids,nodes)]
I guess what you need is just to add the axis argument, like this:
shortdist_from_centroid_to_node = np.min(cdist(centroids,nodes), axis=1)
As for the meaning of the axis argument, you can refer to the numpy.min documentation. In short, you need the minimum of each row rather than of the whole matrix.
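A small sketch of the row-wise minimum on the arrays from the question. To keep it numpy-only, the pairwise Euclidean distances are computed with broadcasting here, which is an assumption of this sketch; cdist(centroids, nodes) with its default metric yields the same matrix. np.argmin with the same axis also gives the index of the nearest node.

```python
import numpy as np

centroids = np.array([[12.52512263, 55.78940022],
                      [12.52027731, 55.7893347],
                      [12.51987146, 55.78855611]])
nodes = np.array([[12.5217378, 55.7799275],
                  [12.5122589, 55.7811443],
                  [12.5241664, 55.7843297],
                  [12.5189395, 55.7802709]])

# Pairwise Euclidean distances, shape (3, 4): one row per centroid.
diff = centroids[:, None, :] - nodes[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))

shortest = dist.min(axis=1)          # one minimum per centroid (per row)
nearest_node = dist.argmin(axis=1)   # index of the closest node per centroid
```

With axis=1 the reduction runs along each row, so the result has one entry per centroid instead of a single scalar.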
If I am not mistaken, your code asks for the minimum value, which is why you are getting a single value. Remove np.min() and try:
shortdist_from_centroid_to_node = cdist(centroids,nodes)
I'm trying to make an RGB color picture editor using just numpy.
I've tried using a nested for loop, but it's really slow (over a minute).
I want to control the first, second, and third elements (r, g, b) of the third dimension of the nested array. Thanks
This is to just look at the numbers:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt  # needed for plt.imread

img = plt.imread('galaxy.jpg')
img = np.array(img)
for i in range(len(img)):
    for j in range(len(img[i])):
        for k in img[i][j]:
            print(k)
Perhaps this might help you. np.ndenumerate() lets you iterate through an array without nested for loops. I did a quick test and my second for loop (in the example below) is slightly faster than your triple nested for loop, as far as printing is concerned. Printing is very slow, so taking out the print statements might help with speed. As for modifying these values, I added r, g, b, a variables that can be changed to scale the various pixel values. Just a thought, but perhaps it might give you more ideas to expand on. Also, I didn't check which index values correspond to r, g, b, or a.
r = 1.0
g = 1.0
b = 1.0
a = 1.0
scale = (r, g, b, a)

for index, pixel in np.ndenumerate(img):  # <--- Achieves the same as your original code
    print(pixel)

# np.ndenumerate visits every channel value, so index is (row, column, channel)
for (i, j, k), value in np.ndenumerate(img):
    if k == 0:  # print each pixel only once
        print(*img[i, j])

for (i, j, k), value in np.ndenumerate(img):
    img[i, j, k] = value * scale[k]  # scale each channel exactly once
Hope this helps
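If the goal is to adjust the channels without any Python-level loop, slicing and broadcasting are one option. The sketch below uses a small random array as a hypothetical stand-in for the loaded image, and the scale factors are arbitrary example values.

```python
import numpy as np

# Hypothetical stand-in for an image from plt.imread: height x width x 3, uint8.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)

# One factor per channel (r, g, b); broadcasting applies it to every pixel at once.
factors = np.array([1.2, 1.0, 0.5])
scaled = np.clip(img.astype(np.float64) * factors, 0, 255).astype(np.uint8)

# Individual channels can also be addressed with slicing, without any loop.
red = img[:, :, 0]       # view of the red channel
no_blue = img.copy()
no_blue[:, :, 2] = 0     # zero the blue channel in the copy
```

Converting to float before multiplying and clipping to 0..255 afterwards avoids uint8 wrap-around when a factor is larger than 1.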
I have an error function, and a function that sums the errors over all of self.array:
import math

# 'array' looks something like this [[x1,y1],[x2,y2],[x3,y3],...,[xn,yn]]
# 'distances' is an array with the same length as array, with different int values in it
def calcError(self, n, X, Y):  # weighted squared error of the nth member of array for a given point
    X, Y = float(X), float(Y)
    arrX = float(self.array[n][0])
    arrY = float(self.array[n][1])
    eToThePower = math.exp(-self.distances[n])  # e**(-distance), using the exact constant
    distanceFromPoint = math.sqrt((arrX - X)**2 + (arrY - Y)**2)
    return float(eToThePower * (distanceFromPoint - self.distances[n])**2)

def sumFunction(self, X, Y):
    res = 0.0
    for i in range(len(self.array)):
        res += self.calcError(i, X, Y)
    return res
I have been looking for a way to find the coordinates for which sumFunction's return value is minimal. I have heard about scipy, yet I am looking for a way to build this manually. Gradient descent doesn't seem to work either, since it is very hard to differentiate this sum function.
Thank you!
Have you tried storing every evaluation in a dictionary, with the error as the key and the coordinates as the value, e.g. {self.sumFunction(X,Y): (X,Y)}? The minimum of the dictionary's keys then gives you the corresponding coordinates as its value.
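In the spirit of evaluating many candidates and keeping the best one, a brute-force grid search can be sketched with numpy. Everything here is an assumption for illustration: the standalone sum_function (a vectorized rewrite of sumFunction), the sample points and distances, and the grid bounds and resolution.

```python
import numpy as np

def sum_function(points, distances, X, Y):
    """Vectorized form of sumFunction: X and Y may be arrays of candidates."""
    X = np.asarray(X, dtype=float)[..., None]
    Y = np.asarray(Y, dtype=float)[..., None]
    dist_from_point = np.sqrt((points[:, 0] - X) ** 2 + (points[:, 1] - Y) ** 2)
    return (np.exp(-distances) * (dist_from_point - distances) ** 2).sum(axis=-1)

# Hypothetical data: three anchor points and the measured distance to each.
points = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
distances = np.array([5.0, 3.0, 4.0])   # all consistent with the point (4, 3)

# Evaluate the error on a coarse grid and keep the smallest value.
xs = np.linspace(-10.0, 10.0, 201)
ys = np.linspace(-10.0, 10.0, 201)
grid_x, grid_y = np.meshgrid(xs, ys)
errors = sum_function(points, distances, grid_x, grid_y)
row, col = np.unravel_index(np.argmin(errors), errors.shape)
best_x, best_y = grid_x[row, col], grid_y[row, col]
```

The accuracy is limited by the grid spacing; a common refinement is to repeat the search on a finer grid around the best cell.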
I have a matrix x, and a matrix p of the same structure and size.
One row represents the coordinates of an n-dimensional point.
I have a function f which takes a point (a row so to say) and computes a score for it.
Given x and p, I'd like to replace row i in p with row i in x if row i in x is smaller than row i in p according to my function f, formally:
for all row indices i do:
    p[i] = (x[i] if f(x[i]) < f(p[i]) else p[i])
Python's list comprehension is way too slow, so I need to do it in numpy, but I'm new to numpy and have tried and failed hard while trying to figure it out.
From other computations I already have vectors for x and p (I've called them benchmarks for some reason), where the value at index i is the score of row i.
Here's the relevant code:
benchmark_x = FUNCTION(x)
benchmark_p = FUNCTION(p)
# TODO Too slow, ask smart guys from StackOverflow
p = np.array([x[i] if benchmark_x[i] < benchmark_p[i] else p[i] for i in range(p.shape[0])])
How about this?
pos = benchmark_x < benchmark_p
p[pos] = x[pos]
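A small worked sketch of the boolean-mask assignment. The score function f and the sample matrices are hypothetical; only the last two statements before the comment correspond to the answer above.

```python
import numpy as np

def f(points):
    """Hypothetical score: squared distance of each row from the origin."""
    return (points ** 2).sum(axis=1)

x = np.array([[1.0, 1.0], [5.0, 5.0], [0.0, 2.0]])
p = np.array([[2.0, 2.0], [1.0, 0.0], [3.0, 3.0]])

benchmark_x = f(x)   # scores 2, 50, 4
benchmark_p = f(p)   # scores 8, 1, 18

pos = benchmark_x < benchmark_p   # boolean mask, one entry per row
p[pos] = x[pos]                   # overwrite only the rows where x scores lower

# Equivalent without modifying p in place:
# p = np.where(pos[:, None], x, p)
```

If the scores are reused later, benchmark_p[pos] = benchmark_x[pos] keeps them in sync with the updated rows.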
Let us say I have a tuple of strings as follows:
tos = ('12|edr4r\tedward\t21\n',
'1|edr4r\tedward\t21\n',
'3|edr4r\tedward\t21\n',
'8|edr4r\tedward\t21\n',
'10|edr4r\tedward\t21\n',
'2|edr4r\tedward\t21\n')
Where the format for each element in the tuple is:
'integer_number|id\tname\tage\n'
and each element in the tuple contains the same information, in this case,
'edr4r\tedward\t21\n'
I also have a map list that tells over which elements of tos the integer_numbers should be averaged:
map_lst = [0,0,1,2,1,0]
i.e., one average will be over tos[0], tos[1] and tos[5] (since 0 appears in positions 0, 1 and 5 of map_lst), the other average will be over tos[2] and tos[4], and finally one over tos[3].
I'd like to compute the averages of the numbers before '|' and collect them in an avgs_list that contains the averages and (only) some of the information from each element of tos:
avgs_list = ['edr4r\tedward\t(12+1+2)/3\n',
'edr4r\tedward\t(3+10)/2\n',
'edr4r\tedward\t8\n']
Is there a Pythonic way to do this? I am looking for a solution that is as generic as possible, without hardcoding the number of indexes, etc.
I could do some for looping over the list, store and then compute averages but I thought there may be a more pythonic way to do it, using the map function or something else...
How is this?
def average(tos, map_lst):
    """
    given
    tos: a sequence of N|user\tname\tAGE\n
    map_lst: a list with positions corresponding to those in tos, and values
    indicating which group each tos element will be averaged with.
    return the groups of averages as a list of user\tname\tAVG\n
    """
    # get the leading nums
    nums = [s.partition('|')[0] for s in tos]
    # group them into lists that will be averaged together (based on the map)
    avg_groups = [[] for i in set(map_lst)]
    for i, n in zip(map_lst, nums):
        avg_groups[i].append(n)
    # generate the averages
    def fmt(tup):
        mid = tos[0].partition('|')[2].rpartition('\t')[0]  # user\tname
        if len(tup) > 1:
            avg = '({0})/{1}'.format('+'.join(tup), len(tup))
        else:
            avg = str(tup[0])
        return "{0}\t{1}\n".format(mid, avg)
    return [fmt(l) for l in avg_groups]
Test:
tos = ('12|edr4r\tedward\t21\n','1|edr4r\tedward\t21\n','3|edr4r\tedward\t21\n','8|edr4r\tedward\t21\n','10|edr4r\tedward\t21\n','2|edr4r\tedward\t21\n')
map_lst = [0,0,1,2,1,0]
print(average(tos,map_lst))
>> ['edr4r\tedward\t(12+1+2)/3\n', 'edr4r\tedward\t(3+10)/2\n', 'edr4r\tedward\t8\n']
To actually calculate the averages of the leading integer, you could use something like:
averages = []
for n in range(max(map_lst) + 1):  # however many averages needed
    averages.append(sum(int(v.split("|")[0])        # get int from v
                        for i, v in enumerate(tos)  # index and value
                        if map_lst[i] == n)         # whether to use this v
                    / float(map_lst.count(n)))      # divide by number of ints in group n
For your data, this gives
averages == [5.0, 6.5, 8.0]
I am a little confused by your output format, which seems to include the calculation to carry out but not the answer. I think you should focus less on using strings in your code; parse them at the start, create them at the end, but use other data structures in-between.
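The "parse at the start, format at the end" advice could look roughly like the sketch below. The grouping with a defaultdict and the variable names are one possible arrangement, not the only one, and the output format (numeric average in the last field) is an assumption.

```python
from collections import defaultdict

tos = ('12|edr4r\tedward\t21\n', '1|edr4r\tedward\t21\n',
       '3|edr4r\tedward\t21\n', '8|edr4r\tedward\t21\n',
       '10|edr4r\tedward\t21\n', '2|edr4r\tedward\t21\n')
map_lst = [0, 0, 1, 2, 1, 0]

# Parse once: split each string into its leading integer and the remaining text.
parsed = [s.partition('|') for s in tos]
numbers = [int(head) for head, _, _ in parsed]
prefix = parsed[0][2].rpartition('\t')[0]   # 'edr4r\tedward', shared by all elements

# Group the integers by their entry in map_lst.
groups = defaultdict(list)
for group, number in zip(map_lst, numbers):
    groups[group].append(number)

averages = [sum(groups[g]) / len(groups[g]) for g in sorted(groups)]

# Build the output strings only at the very end.
avgs_list = ['{0}\t{1}\n'.format(prefix, avg) for avg in averages]
```

Working on integers and lists in the middle keeps the string handling confined to the first and last steps.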
You could use pandas:
from pandas import DataFrame
import re
data = [re.split(r'\t|\|', x) for x in tos]
data = DataFrame(data)
data[3] = data[3].str.rstrip('\n')
data[0] = data[0].astype(int)
data[4] = map_lst
data.groupby([1,2,3,4])[0].mean()
Out[1]:
1      2       3   4
edr4r  edward  21  0    5.0
                   1    6.5
                   2    8.0
Name: 0, dtype: float64