go over and search in numpy array faster


My array looks like the one in the picture:
I would like to know how to return an array (newarr) with the following logic (pseudocode below). I need the implementation to be as fast as possible!
Please help me write it correctly and efficiently in Python (maybe using NumPy; I understand that NumPy is fast).
newarr = []
for i, x in enumerate(arraynumpy):
    for y in arraynumpy[:i]:  # only the rows that come before x
        if x[0] == y[1] and x[1] == y[0] and x[2] == y[2]:
            newarr.append(x)
            break  # stop scanning y once a match is found
For the input in the picture, the returned array will be:
[[20,10,'1'],[30,10,'1']]
Thank you!

Your pseudocode runs in quadratic time. Here is a solution that runs in linear time (much better if you have big inputs), because a membership test on a set takes constant time on average. It does not use NumPy arrays, as I don't think they would help performance here.
def find_stuff(l):
    result = []
    index = set()  # rows seen so far
    for x1, x2, x3 in l:
        # match if the row with its first two values swapped was seen earlier
        if (x2, x1, x3) in index:
            result.append((x1, x2, x3))
        index.add((x1, x2, x3))
    return result
And, on your example:
>>> a = [(10, 20, 1), (10, 30, 1), (10, 50, 1), (10, 108, 1), (10, 200010, 1), (20, 10, 1), (20, 108, 1), (20, 710, 1), (20, 710, 1), (20, 200020, 1), (30, 10, 1)]
>>> find_stuff(a)
[(20, 10, 1), (30, 10, 1)]
>>>
Note that if you really need better performance, you may want to consider using another language, as Python is quite slow.
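If the data has to stay in a NumPy array, the same check can be sketched with NumPy only. This assumes a non-empty 2-D integer array with exactly three columns (the '1' strings from the picture would need converting to numbers first); the function name find_swapped_matches is hypothetical. It relies on np.unique, which sorts, so it is O(n log n) rather than linear and is not guaranteed to beat the set-based version above.

```python
import numpy as np


def find_swapped_matches(arr):
    """Return rows (a, b, c) for which (b, a, c) occurs at an EARLIER row.

    Assumes a 2-D integer array with three columns.
    """
    arr = np.asarray(arr)
    n = len(arr)
    if n == 0:
        return arr
    swapped = arr[:, [1, 0, 2]]  # (a, b, c) -> (b, a, c)
    # Give every distinct row (original or swapped) a single integer id.
    _, ids = np.unique(np.concatenate([arr, swapped]), axis=0, return_inverse=True)
    ids = ids.ravel()  # the inverse's shape differs between NumPy versions
    orig_ids, swap_ids = ids[:n], ids[n:]
    # first_seen[k] = smallest original row index with id k (n if never seen).
    first_seen = np.full(ids.max() + 1, n, dtype=np.intp)
    np.minimum.at(first_seen, orig_ids, np.arange(n))
    # Keep row i only if its swapped version appeared strictly before row i.
    return arr[first_seen[swap_ids] < np.arange(n)]
```

On the example list from this answer, converted with np.array(a), it should return the rows [20, 10, 1] and [30, 10, 1], the same rows as find_stuff.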

Related

Using Python Pandas, why is my vectorized gradient descent so much slower than my gradient descent using loops?

Here is my vectorized version:
import pandas as pd
import numpy as np
alpha=.001
data = [ (2, 3), (4, 7), (6, 11), (8, 17), (10, 23), (12, 31), (14, 39), (16, 49), (18, 59), (20, 71), (22, 83), (24, 97), (26, 113), (28, 131), (30, 149), (32, 169), (34, 191), (36, 214), (38, 239), (40, 266), (42, 295), (44, 326), (46, 359), (48, 394), (50, 431)]
##CREATE EXAMPLES MATRIX
x_coordinates = [x[0] for x in data]
x_coords = [[1, x] for x in x_coordinates]
#Creates a list of all x-coordinates with a 1 column
examples=pd.DataFrame(x_coords).transpose()
#uses that list to create a dataframe. Must transpose so it is dimension 2,25. rows are features, columns are specific examples.
##CREATE THETA MATRIX/VECTOR
theta_list = [1, 2]
theta = pd.DataFrame(theta_list).transpose()
#creates a df of dimension 1,2.
##CREATE Y VECTOR/MATRIX
y_coordinates = [x[1] for x in data]
y=pd.DataFrame(y_coordinates).transpose()
deriv=pd.DataFrame([])
count=0
while (deriv != 0).all().all() and count <= 500000:
    length=len(data)
    #theta*X
    thetaX=theta.dot(examples)
    error=thetaX-y
    error_pt2=error.dot(examples.T)
    deriv=alpha*(1/length)*error_pt2
    theta=theta-deriv
    print(theta)
    count+=1
print(count)
Here is my version made with loops:
total=0
th0=0
th1=0
alpha=0.001
deriv0=1
deriv1=1
count=0
while deriv0 != 0 and deriv1 != 0 and count<=1000000:
    total0=0
    total1=0
    #th0
    for i in data:
        hyp=th0+(th1*i[0])
        #print("Hyp is {}".format(hyp))
        total0+=(hyp-i[1])
    deriv0=(1/25)*total0
    th0temp=th0-(alpha*(deriv0))
    #th1
    for i in data:
        hyp=th0+(th1*i[0])
        total1+=(hyp-i[1])*i[0]
    deriv1=(1/25)*total1
    th1temp=th1-(alpha*(deriv1))
    th0=th0temp
    th1=th1temp
    th0temp=0
    th1temp=0
    count+=1
print("Theta 0: {} \n Theta 1: {} \n\n".format(th0,th1))
print(count)
When I run the vectorized version, it takes almost 10 times as long. I expected vectorized code to be much more efficient than running multiple loops. What gives? Is it just the computational overhead of pandas that makes this run slower? Maybe pandas isn't suited to this sort of algorithm.
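One way to test the overhead hypothesis is to run the same update with plain NumPy arrays, which skip the per-call work pandas does (building new DataFrame objects and aligning indexes on every dot and subtraction). The pandas loop also prints theta on every pass, so dropping that print is worth trying before comparing timings. A minimal sketch, assuming the question's data, starting theta of [1, 2] and learning rate; the function name and the max_iter parameter are hypothetical, and like both versions in the question it stops as soon as any derivative component is exactly zero. Whether it beats the loop version on 25 points is something to measure rather than assume.

```python
import numpy as np


def gradient_descent_numpy(data, alpha=0.001, max_iter=500_000):
    """Batch gradient descent for y ~ theta0 + theta1 * x using NumPy arrays."""
    data = np.asarray(data, dtype=float)
    m = len(data)
    # Design matrix of shape (m, 2): a column of ones, then the x values.
    X = np.column_stack([np.ones(m), data[:, 0]])
    y = data[:, 1]
    theta = np.array([1.0, 2.0])  # same starting point as the pandas version
    iterations = 0
    while iterations < max_iter:
        error = X @ theta - y              # residuals, shape (m,)
        deriv = alpha / m * (X.T @ error)  # scaled gradient, shape (2,)
        theta = theta - deriv
        iterations += 1
        if np.any(deriv == 0):             # same stopping rule as the question
            break
    return theta, iterations
```

With a step this small the iterates should settle on the ordinary least-squares line, so np.linalg.lstsq is a convenient cross-check of the result.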

Best suitable approach to find nearest neighbour to (x, y, z) from list of triplets

I am trying to obtain the triplet from a list of triplets that is closest to my required triplet, in case the exact one is not found.
For example:
# V_s,V_g,V_r
triplets = [(500, 12, 5),
(400, 15, 2.5),
(400, 15, 3),
(450, 12, 3),
... ,
(350, 14, 3)]
The triplet I am looking for is
req_triplet = (450, 15, 2) #(Vreq_s, Vreq_g, Vreq_r)
How can I achieve this in Python? A best-suited strategy is what I need.
For now, I am thinking of filtering the list by the nearest V_s, then filtering the result by the nearest V_g, and finally by the nearest V_r.
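The filtering idea above (nearest V_s first, then nearest V_g among those, then nearest V_r) amounts to comparing the absolute differences lexicographically, so it can be written as a single min() with a tuple key. A sketch, assuming plain numeric triplets and using only the five triplets listed (the elided entries are left out); the function name is made up for illustration. With this rule V_g and V_r only break ties in V_s.

```python
def closest_by_priority(triplets, req):
    """Pick the triplet nearest to req, comparing V_s first, then V_g, then V_r.

    Tuples compare element by element, so a smaller V_s difference always wins
    and the later components only matter when the earlier ones tie.
    """
    return min(
        triplets,
        key=lambda t: (abs(t[0] - req[0]), abs(t[1] - req[1]), abs(t[2] - req[2])),
    )


triplets = [(500, 12, 5), (400, 15, 2.5), (400, 15, 3), (450, 12, 3), (350, 14, 3)]
print(closest_by_priority(triplets, (450, 15, 2)))  # (450, 12, 3)
```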

You can compute the Euclidean distance yourself with NumPy, or you can use numpy.linalg.norm.
Try this:
>>> import numpy as np
>>> def dist(x,y):
...     return np.sqrt(np.sum((x-y)**2))
...
>>> triplets = [(500, 12, 5), (400, 15, 2.5), (400, 15, 3), (450, 12, 3), (350, 14, 3)]
>>> req_triplet = (450, 15, 2)
>>> # either use numpy.linalg.norm ...
>>> arr_dst = [np.linalg.norm(np.array(tr) - np.array(req_triplet)) for tr in triplets]
>>> # ... or the dist() helper defined above; both give the same distances
>>> arr_dst = [dist(np.array(tr), np.array(req_triplet)) for tr in triplets]
>>> arr_dst
[50.17967716117751, 50.002499937503124, 50.00999900019995, 3.1622776601683795, 100.00999950005]
>>> idx = np.argmin(arr_dst)
>>> idx
3
>>> triplets[idx]
(450, 12, 3)
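The list comprehension above still loops in Python. With all triplets in one 2-D array, np.linalg.norm can compute every distance in a single call through its axis argument. A sketch using only the five triplets listed above:

```python
import numpy as np

triplets = [(500, 12, 5), (400, 15, 2.5), (400, 15, 3), (450, 12, 3), (350, 14, 3)]
req_triplet = (450, 15, 2)

arr = np.asarray(triplets, dtype=float)  # shape (n, 3)
# Broadcasting subtracts req_triplet from every row; axis=1 gives one norm per row.
dists = np.linalg.norm(arr - np.asarray(req_triplet), axis=1)
idx = int(np.argmin(dists))
print(idx, triplets[idx])  # 3 (450, 12, 3)
```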

You have to define a metric ||.||; then the triplet T closest to a fixed one F is the one that minimizes ||T - F||. You can use the classic Euclidean distance:
import numpy as np
def dist(u, v):
    return np.sqrt(np.sum((np.array(u)-np.array(v))**2))
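One caveat with a plain Euclidean distance on this data: V_s is in the hundreds while V_r is a few units, so differences in V_s dominate. Scaling each component before taking the distance is a common remedy. A sketch built on the same dist idea; the weights argument and the values passed to it are purely hypothetical and would need to reflect how much each parameter actually matters.

```python
import numpy as np


def dist(u, v, weights=(1.0, 1.0, 1.0)):
    """Weighted Euclidean distance; each weight rescales one component."""
    w = np.asarray(weights, dtype=float)
    diff = (np.asarray(u, dtype=float) - np.asarray(v, dtype=float)) * w
    return np.sqrt(np.sum(diff ** 2))


triplets = [(500, 12, 5), (400, 15, 2.5), (400, 15, 3), (450, 12, 3), (350, 14, 3)]
req_triplet = (450, 15, 2)

# Unweighted: the V_s difference dominates.
print(min(triplets, key=lambda t: dist(t, req_triplet)))  # (450, 12, 3)
# Hypothetical weights that shrink V_s differences by a factor of 100.
print(min(triplets, key=lambda t: dist(t, req_triplet, (0.01, 1, 1))))  # (400, 15, 2.5)
```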

The general strategy would be to loop through the list and, for each element, calculate the distance and check whether it is the minimum so far; otherwise keep going.
In Python this would look something like this:
triplets = [(500, 12, 5),
            (400, 15, 2.5),
            (400, 15, 3),
            (450, 12, 3),
            ... ,
            (350, 14, 3)]
req_triplet = (450, 15, 2)
def calc_dist(a, b):
    # Manhattan (L1) distance; abs() is a built-in, so nothing needs importing
    return sum(abs(a[i] - b[i]) for i in range(3))
def find_closest_triple(req_triplet, triplets):
    min_ind = None
    min_dist = float("inf")  # any real distance is smaller than this
    for i, triplet in enumerate(triplets):
        if triplet == req_triplet:
            return i  # exact match found
        dist = calc_dist(req_triplet, triplet)
        if dist < min_dist:
            min_ind = i
            min_dist = dist
    return min_ind

Python Fit Polynomial to 3d Data

I have a set of data points (x, y, z) and am trying to fit a generic quadratic to them using scipy.optimize.curve_fit.
I have tried a couple of different methods but can't seem to make it work. Any guidance on why I am getting this error, or a recommendation for a different method?
Error is "ValueError: operands could not be broadcast together with shapes (2,) (12,)"
import scipy.optimize as optimize
XY = [(11, 70), (11, 75), (11, 80), (11, 85), (12, 70), (12, 75), (12, 80), (12, 85), (13, 70), (13, 75), (13, 80), (13, 85)]
Z = [203.84, 208, 218.4, 235.872, 228.30080000000004, 232.96000000000004, 244.60800000000006, 264.1766400000001, 254.8, 260, 273, 294.84000000000003]
guess = (1,1,1,1,1,1)
def fit(X, a, b, c, d, f, g):
    return a + (b*X[0])+(c*X[1])+(d*X[0]**2)+(f*X[1]**2)+(g*X[0]*X[1])
params, cov = optimize.curve_fit(fit, XY, Z, guess)

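According to the docs, xdata needs to have shape (k, M), where k is the number of predictors and M is the number of samples. In your case you've defined XY with shape (M, k), so X[0] inside fit() is the first sample (11, 70) rather than all the x values, and the model returns 2 values where curve_fit expects 12.
Try the following: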
import numpy as np
...
params, cov = optimize.curve_fit(fit, np.transpose(XY), Z, guess)
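Because the model is linear in its six coefficients (a to g), the same least-squares problem can also be solved directly with np.linalg.lstsq, with no initial guess. This swaps in a different solver than curve_fit; it is a sketch that reuses the question's XY and Z, and the variable names are illustrative.

```python
import numpy as np

XY = [(11, 70), (11, 75), (11, 80), (11, 85), (12, 70), (12, 75),
      (12, 80), (12, 85), (13, 70), (13, 75), (13, 80), (13, 85)]
Z = [203.84, 208, 218.4, 235.872, 228.30080000000004, 232.96000000000004,
     244.60800000000006, 264.1766400000001, 254.8, 260, 273, 294.84000000000003]

x, y = np.transpose(XY).astype(float)  # (2, M) array unpacked into two length-M arrays
z = np.asarray(Z, dtype=float)

# One column per term, in the same order as a, b, c, d, f, g in fit().
A = np.column_stack([np.ones_like(x), x, y, x**2, y**2, x * y])
coeffs, residuals, rank, _ = np.linalg.lstsq(A, z, rcond=None)
a, b, c, d, f, g = coeffs
print(coeffs)
```

If both approaches converge, curve_fit's params and these coeffs should describe essentially the same surface; lstsq just avoids the iterative search.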

Generating random vertices that don't repeat

I have the following code, which generates a random number of tuples in order to create a connected, undirected, weighted graph.
import random
for i in xrange(0,10):
    for j in xrange(0, (int)(10*random.random())):
        b = (int)(10*random.random())
        j = [(i,b)]
        print(j)
When I run this code I can generate random vertex pairs (x, y); however, my b variable can repeat. For example, I may get (6,3) followed by (6,3), which would ruin the graphs I'm trying to create once I add weights. I also sometimes get (2,4) and later (4,2), which again ruins the graphs I'm trying to create.
Does anyone know how I can keep vertices from repeating?

Use random.sample:
>>> lst1 = random.sample(range(20), 10)
>>> lst2 = random.sample(range(20), 10)
>>> list(zip(lst1, lst2))
[(19, 5), (5, 11), (9, 19), (0, 9), (4, 6), (12, 0), (7, 12), (16, 1), (10, 7), (15, 16)]
You can change the list generated by range(20) to suit your set of vertices.
Don't generate a new vertex inside the for loop, since it may generate the same one again (and don't pick randomly from the random.sample lists either). Just generate them once and zip them together.
Since you also want to remove duplicates of the form (x, y) and (y, x), you can do something like the following (or a simple membership check with in works too):
>>> r = [(19, 5), (5, 11), (5, 19)]
>>> from itertools import groupby
>>> # frozensets make (x, y) and (y, x) compare equal; sorting by the sorted
>>> # members puts equal sets next to each other so groupby can merge them
>>> m = sorted(map(frozenset, r), key=sorted)
>>> [tuple(sorted(k)) for k, _ in groupby(m)]
[(5, 11), (5, 19)]
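If the goal is a set of edges with no repeated pair, no (x, y)/(y, x) duplicates and no self-loops, another option is to sample directly from all unordered pairs. A Python 3 sketch; random_edges, n_vertices and n_edges are illustrative names. It builds the full pair list, so memory grows with the square of the vertex count, and on its own it does not guarantee the resulting graph is connected.

```python
import random
from itertools import combinations


def random_edges(n_vertices, n_edges, seed=None):
    """Return n_edges distinct undirected edges (a, b) with a < b.

    combinations() yields every unordered pair exactly once and never (v, v),
    so sampling without replacement cannot repeat an edge in either direction.
    """
    rng = random.Random(seed)
    all_pairs = list(combinations(range(n_vertices), 2))
    return rng.sample(all_pairs, n_edges)  # ValueError if n_edges > len(all_pairs)


print(random_edges(10, 5, seed=1))
```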

Visually Representing X and Y Values

I have a list of (x, y) values in the form [(x,y),(x,y),(x,y)....]. I feel like there is a solution in matplotlib, but I couldn't quite get there because of how my data is formatted. I would like to plot it as a histogram or a line plot. Any help is appreciated.

You can quite easily convert a list of (x, y) tuples into two tuples, one of x-coordinates and one of y-coordinates, using the * ('splat') operator (see also this SO question):
>>> list(zip(*[(0, 0), (1, 1), (2, 4), (3, 9)]))
[(0, 1, 2, 3), (0, 1, 4, 9)]
And then you can use the * operator again to unpack those arguments into plt.plot:
>>> import matplotlib.pyplot as plt
>>> plt.plot(*zip(*[(0, 0), (1, 1), (2, 4), (3, 9)]))
or even plt.bar:
>>> plt.bar(*zip(*[(0, 0), (1, 1), (2, 4), (3, 9)]))

Perhaps you could try something like this (also see):
import numpy as np
import matplotlib.pyplot as plt
xs=[]; ys=[]
for x,y in xy_list:
    xs.append(x)
    ys.append(y)
xs=np.asarray(xs)
ys=np.asarray(ys)
plt.plot(xs,ys,'ro')
plt.show()
Maybe not the most elegant solution, but it should work. Cheers, Trond
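The append loop can also be replaced by converting the list to a 2-D array once and transposing it. A sketch, assuming xy_list holds numeric (x, y) pairs; the sample values are the ones from the other answer.

```python
import numpy as np

xy_list = [(0, 0), (1, 1), (2, 4), (3, 9)]

# np.asarray gives shape (n, 2); .T turns it into (2, n), which unpacks
# into one array of x values and one array of y values.
xs, ys = np.asarray(xy_list).T
print(xs, ys)  # [0 1 2 3] [0 1 4 9]
# xs and ys can then go to plt.plot(xs, ys, 'ro') or plt.bar(xs, ys).
```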

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.
