Here is my vectorized version:
import pandas as pd
import numpy as np
alpha=.001
data = [ (2, 3), (4, 7), (6, 11), (8, 17), (10, 23), (12, 31), (14, 39), (16, 49), (18, 59), (20, 71), (22, 83), (24, 97), (26, 113), (28, 131), (30, 149), (32, 169), (34, 191), (36, 214), (38, 239), (40, 266), (42, 295), (44, 326), (46, 359), (48, 394), (50, 431)]
##CREATE EXAMPLES MATRIX
x_coordinates = [x[0] for x in data]
x_coords = [[1, x] for x in x_coordinates]
#Creates a list of all x-coordinates with a 1 column
examples=pd.DataFrame(x_coords).transpose()
#uses that list to create a dataframe. Must transpose so it is dimension 2,25. rows are features, columns are specific examples.
##CREATE THETA MATRIX/VECTOR
theta_list = [1, 2]
theta = pd.DataFrame(theta_list).transpose()
#creates a df of dimension 1,2.
##CREATE Y VECTOR/MATRIX
y_coordinates = [x[1] for x in data]
y=pd.DataFrame(y_coordinates).transpose()
deriv=pd.DataFrame([])
count=0
while (deriv != 0).all().all() and count <= 500000:
    length=len(data)
    #theta*X
    thetaX=theta.dot(examples)
    error=thetaX-y
    error_pt2=error.dot(examples.T)
    deriv=alpha*(1/length)*error_pt2
    theta=theta-deriv
    print(theta)
    count+=1
print(count)
Here is my version made with loops:
total=0
th0=0
th1=0
alpha=0.001
deriv0=1
deriv1=1
count=0
while deriv0 != 0 and deriv1 != 0 and count<=1000000:
    total0=0
    total1=0
    #th0
    for i in data:
        hyp=th0+(th1*i[0])
        #print("Hyp is {}".format(hyp))
        total0+=(hyp-i[1])
    deriv0=(1/25)*total0
    th0temp=th0-(alpha*(deriv0))
    #th1
    for i in data:
        hyp=th0+(th1*i[0])
        total1+=(hyp-i[1])*i[0]
    deriv1=(1/25)*total1
    th1temp=th1-(alpha*(deriv1))
    th0=th0temp
    th1=th1temp
    th0temp=0
    th1temp=0
    count+=1
    print("Theta 0: {} \n Theta 1: {} \n\n".format(th0,th1))
print(count)
When I run the vectorized version, it takes almost 10 times as long to run. I would have expected vectorizing to make this code much more efficient than running multiple loops. What gives? Is it just the computational overhead of Pandas that is making this run slower? Maybe Pandas isn't suited to this sort of algorithm.
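For comparison, here is a minimal sketch of the same batch update using plain NumPy arrays instead of DataFrames. It assumes that much of the slowdown comes from per-operation pandas overhead (index alignment and new-object creation on every `dot` and subtraction, repeated hundreds of thousands of times), which bare arrays avoid. Note also that the two versions above start from different thetas ([1, 2] versus [0, 0]) and have different iteration caps, so their timings are not directly comparable. The function name `gradient_descent` and the tolerance `tol` are my own additions: comparing floats to exactly 0 may never trigger, in which case the loop just runs to the cap.

```python
import numpy as np

def gradient_descent(data, alpha=0.001, max_iter=500_000, tol=1e-9):
    """Batch gradient descent for y ~ theta0 + theta1 * x with plain NumPy arrays."""
    pts = np.asarray(data, dtype=float)
    X = np.column_stack([np.ones(len(pts)), pts[:, 0]])  # shape (m, 2): bias column + x
    y = pts[:, 1]                                         # shape (m,)
    m = len(y)
    theta = np.zeros(2)
    for count in range(1, max_iter + 1):
        grad = X.T @ (X @ theta - y) / m   # same gradient as the loop version
        step = alpha * grad
        theta -= step
        if np.all(np.abs(step) < tol):     # stop once the update is negligible
            break
    return theta, count
```

Call it as `theta, count = gradient_descent(data)` with the `data` list from the question; `theta[0]` is the intercept and `theta[1]` the slope.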
My array looks like the one in the picture:
I would like to know how to return an array (newarr) with the following logic (written here as plain Python). I need it to be implemented as fast as possible!
Please help me write it correctly and efficiently in Python (maybe using numpy; I understand that numpy is fast).
newarr = []
for i, x in enumerate(arraynumpy):
    # compare x only with the rows that come before it
    for y in arraynumpy[:i]:
        if x[0] == y[1] and x[1] == y[0] and x[2] == y[2]:
            newarr.append(x)
            break  # stop scanning y once a match is found
For the input in the picture, the returned array would be:
[[20,10,'1'],[30,10,'1']]
thank you
Your pseudo-code runs in quadratic time. Here is a solution that runs in linear time (much better for big inputs), since each set lookup takes constant time on average. It does not use numpy arrays, as I don't think they would improve performance here.
def find_stuff(l):
    result = []
    index = set()
    for x1, x2, x3 in l:
        if (x2, x1, x3) in index:
            result.append((x1, x2, x3))
        index.add((x1, x2, x3))
    return result
And, on your example:
>>> a = [(10, 20, 1), (10, 30, 1), (10, 50, 1), (10, 108, 1), (10, 200010, 1), (20, 10, 1), (20, 108, 1), (20, 710, 1), (20, 710, 1), (20, 200020, 1), (30, 10, 1)]
>>> find_stuff(a)
[(20, 10, 1), (30, 10, 1)]
>>>
Note that if you really need better performance, you might consider using another language, as Python is comparatively slow.
I'm generating a list of (x, y) coordinates by detecting a ball's flight in a video. The problem is that for a few frames in the middle of the video the ball can't be detected; for these frames the list gets (-1, -1) appended.
Is there a way to estimate the true (x,y) coordinates of the ball for these points?
E.g., with the tracked points list:
pointList = [(60, 40), (55, 42), (53, 43), (-1, -1), (-1, -1), (-1, -1), (35, 55), (30, 60)]
I would then like to return an estimate of what the three missing (-1, -1) coordinates would be, given the surrounding points (preserving the curve).
If it's a ball, then theoretically it should follow a parabolic path, so you could fit a curve while ignoring the (-1, -1) points and then replace the missing values.
Something like...
import numpy as np
pointList = [(60, 40), (55, 42), (53, 43), (-1, -1), (-1, -1), (-1, -1), (35, 55), (30, 60)]
x, y = list(zip(*[(x, y) for (x, y) in pointList if x>0]))
fit = np.polyfit(x, y, 2)
polynome = np.poly1d(fit)
# call your polynome for missing data, e.g.
missing = (55 - i*(55-35)/4 for i in range(3))
print([(m, polynome(m)) for m in missing])
giving ...
[(55.0, 41.971982486554325), (50.0, 44.426515896714186), (45.0, 47.44514924300471)]
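Because the x positions of the missing frames are not actually known, another option is to fit x and y separately as functions of the frame index and evaluate both at the missing frames. This is a sketch under two assumptions that the question does not state: frames are evenly spaced in time, and missing detections are marked exactly as (-1, -1). With uniform gravity and no drag, x is linear and y is quadratic in time, so a degree-2 fit in the frame index covers both coordinates. `fill_missing` is a hypothetical helper name.

```python
import numpy as np

def fill_missing(points, missing=(-1, -1), deg=2):
    """Replace `missing` rows by evaluating per-coordinate polynomial fits in the frame index."""
    pts = np.asarray(points, dtype=float)
    t = np.arange(len(pts))                    # frame index, assumed evenly spaced in time
    known = ~np.all(pts == missing, axis=1)    # True for frames where the ball was detected
    filled = pts.copy()
    for col in range(pts.shape[1]):            # fit x(t) and y(t) independently
        coeffs = np.polyfit(t[known], pts[known, col], deg)
        filled[~known, col] = np.polyval(coeffs, t[~known])
    return filled

pointList = [(60, 40), (55, 42), (53, 43), (-1, -1), (-1, -1), (-1, -1), (35, 55), (30, 60)]
print(fill_missing(pointList))
```

Known rows are returned unchanged; only the (-1, -1) rows are replaced by the fitted estimates.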
You could use SciPy's splines to interpolate the missing values:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import splprep, splev
pointList = [(60, 40), (55, 42), (53, 43),
(-1, -1), (-1, -1), (-1, -1),
(35, 55), (30, 60)]
# Remove the missing values
pointList = np.array(pointList)
pointList = pointList[pointList[:, 0] != -1, :]
def spline(x, n, k=2):
    tck = splprep(x.T, s=0, k=k)[0]
    u = np.linspace(0.0, 1.0, n)
    return np.column_stack(splev(x=u, tck=tck))
# Interpolate the points with a quadratic spline at 100 points
pointList_interpolated = spline(pointList, n=100, k=2)
plt.plot(*pointList.T, c='r', ls='', marker='o', zorder=10)  # known points
plt.plot(*pointList_interpolated.T, c='b')  # interpolated spline
plt.show()
If the camera is not moving (only the ball is) and you ignore wind and drag, the trajectory is parabolic. See: https://en.wikipedia.org/wiki/Trajectory#Uniform_gravity,_neither_drag_nor_wind
In that case, fit a quadratic function to the points you know and evaluate it to get the missing ones. When fitting, also set the allowed error of the boundary points next to the unknown region ((53, 43) and (35, 55)) to 0 or close to 0 (i.e. give them a big weight), so that the fitted curve goes through these points.
There are several libraries for polynomial fitting, e.g. numpy.polynomial.polynomial.polyfit:
https://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.polynomial.polynomial.polyfit.html
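A sketch of the weighting idea with `numpy.polyfit`, whose `w` argument multiplies each (unsquared) residual, so a large weight forces a near-zero error at that point. The weight value 1000 is an arbitrary choice; larger values pull the curve even closer to (53, 43) and (35, 55). Note that `numpy.polyfit` returns the highest-degree coefficient first, whereas `numpy.polynomial.polynomial.polyfit` returns the lowest first.

```python
import numpy as np

# Known detections from the question; the gap lies between (53, 43) and (35, 55).
known = np.array([(60, 40), (55, 42), (53, 43), (35, 55), (30, 60)], dtype=float)
x, y = known[:, 0], known[:, 1]

weights = np.ones_like(x)
weights[[2, 3]] = 1000.0   # large weight -> near-zero error at the points bordering the gap

poly = np.poly1d(np.polyfit(x, y, 2, w=weights))  # quadratic y(x)
print(poly(53), poly(35))  # both very close to 43 and 55
```

The missing y values are then `poly(m)` for whatever x positions `m` you assign to the missing frames.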
Given a 2D data array data and two coordinate arrays x, y, I can plot a contour at any given level with matplotlib:
import numpy as np
import matplotlib.pyplot as plt
x, y = np.linspace(0, 2*np.pi), np.linspace(0, 2*np.pi)
xx, yy = np.meshgrid(x, y)
data = np.sin(xx) * np.sin(yy)
level = 0.5
contour_ = plt.contour(xx, yy, data, levels=[level])
plt.show()
Now, I am not really interested in plotting, but rather in the position of the contour. For example, I want to see whether the contour lies within the x, y domain or 'leaks' outside.
I can get path objects containing the (x, y) points of the contour by calling
contour_path = contour_.collections[0].get_paths()
My question is whether there are standard tools to get the same (or analogous) information using only numpy, without the matplotlib module. Since no plotting is involved, that would seem reasonable.
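If you read the source code of contour, you can find Cntr (note that matplotlib._cntr is a private module that only exists in older matplotlib versions):
import numpy as np
from matplotlib._cntr import Cntr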
x, y = np.linspace(0, 2*np.pi), np.linspace(0, 2*np.pi)
xx, yy = np.meshgrid(x, y)
data = np.sin(xx) * np.sin(yy)
level = 0.5
cntr = Cntr(xx, yy, data)
res = cntr.trace(level)
res is a list that contains the paths and their codes.
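If a rough, unordered set of contour points is enough, the crossings can also be found with NumPy alone: wherever `data - level` changes sign between two neighbouring grid samples, linearly interpolate the crossing position. This sketches only the edge-finding step of marching squares; it does not join the points into paths, and samples exactly equal to `level` are skipped. `level_crossings` is a hypothetical helper name.

```python
import numpy as np

def level_crossings(xx, yy, data, level):
    """Approximate (x, y) points where `data` crosses `level`, found on grid edges."""
    f = np.asarray(data, dtype=float) - level
    all_, head, tail = slice(None), slice(None, -1), slice(1, None)
    pieces = []
    # Compare each sample with its neighbour along columns, then along rows.
    for a, b in (((all_, head), (all_, tail)), ((head, all_), (tail, all_))):
        f0, f1 = f[a], f[b]
        mask = f0 * f1 < 0                        # strict sign change between neighbours
        t = f0[mask] / (f0[mask] - f1[mask])      # fraction of the way from sample a to b
        px = xx[a][mask] + t * (xx[b][mask] - xx[a][mask])
        py = yy[a][mask] + t * (yy[b][mask] - yy[a][mask])
        pieces.append(np.column_stack([px, py]))
    return np.vstack(pieces)

x, y = np.linspace(0, 2*np.pi), np.linspace(0, 2*np.pi)
xx, yy = np.meshgrid(x, y)
data = np.sin(xx) * np.sin(yy)
pts = level_crossings(xx, yy, data, 0.5)

# Rough leak check: a contour that leaves the domain must cross its boundary,
# so look for crossings on the outermost grid lines.
on_edge = (np.isclose(pts[:, 0], x[0]) | np.isclose(pts[:, 0], x[-1]) |
           np.isclose(pts[:, 1], y[0]) | np.isclose(pts[:, 1], y[-1]))
print(len(pts), on_edge.any())
```

The accuracy is limited by the grid spacing, since the crossing is found by linear interpolation between neighbouring samples.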
If you only have the data field, you can find approximately where the boundary lies:
In [1]: import numpy as np
In [2]: x, y = np.linspace(0, 2*np.pi), np.linspace(0, 2*np.pi)
In [3]: xx, yy = np.meshgrid(x, y)
In [4]: data = np.sin(xx) * np.sin(yy)
In [5]: scan = np.logical_and(data>0.45, data<0.55)
In [6]: a, b = scan.shape
In [7]: for x in range(a):
   ...:     for y in range(b):
   ...:         if scan[x,y]:
   ...:             print('({}, {}),'.format(x,y), end='')
   ...:
(4, 10),(4, 11),(4, 12),(4, 13),(4, 14),(4, 15),(5, 7),(5, 8),(5, 9),
(5, 16),(5, 17),(6, 6),(6, 7),(6, 18),(6, 19),(7, 5),(7, 6),(7, 19),
(8, 5),(8, 20),(9, 5),(9, 20),(10, 4),(10, 20),(11, 4),(11, 20),
(12, 4),(12, 20),(13, 4),(13, 20),(14, 4),(14, 20),(15, 4),(15, 20),
(16, 5),(16, 20),(17, 5),(17, 19),(18, 6),(18, 18),(18, 19),(19, 6),
(19, 7),(19, 17),(19, 18),(20, 8),(20, 9),(20, 10),(20, 11),(20, 12),
(20, 13),(20, 14),(20, 15),(20, 16),(29, 33),(29, 34),(29, 35),
(29, 36),(29, 37),(29, 38),(29, 39),(29, 40),(29, 41),(30, 31),
(30, 32),(30, 42),(30, 43),(31, 30),(31, 31),(31, 43),(32, 30),
(32, 44),(33, 29),(33, 44),(34, 29),(34, 45),(35, 29),(35, 45),
(36, 29),(36, 45),(37, 29),(37, 45),(38, 29),(38, 45),(39, 29),
(39, 45),(40, 29),(40, 44),(41, 29),(41, 44),(42, 30),(42, 43),
(42, 44),(43, 30),(43, 31),(43, 42),(43, 43),(44, 32),(44, 33),
(44, 40),(44, 41),(44, 42),(45, 34),(45, 35),(45, 36),(45, 37),
(45, 38),(45, 39),
Of course, if you make the scan range too small, you won't find many points.
In [9]: scan2 = np.logical_and(data>0.49, data<0.51)
In [10]: for x in range(a):
    ....:     for y in range(b):
    ....:         if scan2[x,y]:
    ....:             print('({}, {}),'.format(x,y), end='')
    ....:
(4, 12),(5, 17),(7, 19),(9, 20),(12, 4),(17, 5),(19, 7),(20, 9),
(29, 40),(30, 42),(32, 44),(37, 45),(40, 29),(42, 30),(44, 32),
(45, 37),
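The double loop can be replaced by `np.argwhere`, which returns the (row, column) index of every True entry in row-major order, i.e. the same order in which the loops print them. Boolean indexing of `xx` and `yy` with the same mask gives the matching coordinates; note that the first index is the row, which corresponds to y with `meshgrid`'s default layout.

```python
import numpy as np

x, y = np.linspace(0, 2*np.pi), np.linspace(0, 2*np.pi)
xx, yy = np.meshgrid(x, y)
data = np.sin(xx) * np.sin(yy)

scan = np.logical_and(data > 0.45, data < 0.55)
idx = np.argwhere(scan)                          # shape (n, 2): (row, col) pairs, row-major
coords = np.column_stack([xx[scan], yy[scan]])   # matching (x, y) values, same order
print(idx[:3])
```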
I am rather new to Python and NetworkX. I need to create a list similar to Edgelist=[(0,1),(0,3),(1,0),(1,2),(1,4),(2,1),(2,5)], whose elements represent the start and end nodes of an edge (link) that is in turn part of a network.
Rather than setting them manually, I want Python to create the pairs you see in the list by randomly selecting the integer values of (start, end) from an assigned range of values (namely 0 to 999), which represent the node IDs. Then, I want to make sure that every node ID appears at least once in the series of (start, end) values (this means that every node will be connected to at least one other node).
I know I could use random.randint(0, 999), but I don't know how to "nest" it into the creation of a list (perhaps with a for loop?). I wish I had some code to show you, but this is my first attempt at working with NetworkX!
EDIT
To give you a visual idea of what I mean, here are two images. The first is a regular network (aka lattice), and the second is a random one. The edge list of the first was created manually to reproduce a chessboard, while the second shows an edge list that is a (manually) shuffled counterpart of the first one. As you can see, the nodes are kept in exactly the same locations. Hope this helps a bit more. Thanks!
There is a similar answer, but for a complete graph, at: How to generate a fully connected subgraph from node list using python's networkx module
In your case, using zubinmehta's answer:
import networkx
import itertools
def complete_graph_from_list(L, create_using=None):
    L = list(L)  # accept any iterable (e.g. a map object in Python 3)
    # create the graph with exactly the nodes in L, not 0..len(L)-1
    G = networkx.empty_graph(L, create_using)
    if len(L)>1:
        if G.is_directed():
            edges = itertools.permutations(L,2)
        else:
            edges = itertools.combinations(L,2)
        G.add_edges_from(edges)
    return G
You could build the graph as:
S = complete_graph_from_list(str(x) for x in range(0,1000))
print(S.edges())
Here is a networkx command that will create a graph such that each node has exactly one edge:
import networkx as nx
G = nx.configuration_model([1]*1000)
If you look into its internals, it does roughly the following, which answers your question: each node appears in exactly one edge.
import random
start, end = 0, 1000  # node IDs 0..999; the count must be even
mylist = list(range(start, end))
random.shuffle(mylist)  # shuffles in place and returns None
edgelist = []
while mylist:
    edgelist.append((mylist.pop(), mylist.pop()))
You should make sure that mylist has even length before popping.
Python has a built-in library called itertools.
Here is a sample of how you can achieve what you mentioned:
import itertools
values = [3, 4, 6, 7]  # avoid shadowing the built-in name `list`
sublist_length = 2
comb = itertools.combinations(values, sublist_length)
This returns comb as an iterator.
You can call next(comb) to get the next element of the iterator, or iterate over it with a for loop to get all the results, as below.
for item in comb:
    print(item)
which should output:
(3, 4)
(3, 6)
(3, 7)
(4, 6)
(4, 7)
(6, 7)
I hope this will solve your problem.
For the list creation you can do something like:
import random
max_id = 999
min_id = 0
original_values = range(min_id, max_id + 1)  # could be an arbitrary list
n_edges = 5000  # some number; 5000 is only an example value
my_edge_list = [(random.choice(original_values), random.choice(original_values))
                for _ in range(n_edges)]
To assert that all values are in there, you can do the following:
vals = set([v for tup in my_edge_list for v in tup])
assert all([v in vals for v in original_values])
The assert makes sure every value is represented in your edges. To do your best to avoid hitting that assert, you can do a couple of things:
Sample without replacement from your list of integers until they are all gone, to create a "base network", and then randomly add more edges to your heart's desire.
Make n_edges high enough that your condition is very likely to be met. If it isn't, try again.
It really depends on what you're going to use the network for and what kind of structure you want it to have.
EDIT: I have updated my response to be more robust to an arbitrary list of values rather than requiring a sequential list
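A sketch of the first strategy (sample without replacement to build a base network, then add random edges), using only the standard library. `random_edges_covering`, `n_extra` and `seed` are hypothetical names; attaching the leftover node to a random other node when the node count is odd is my own choice. Extra edges may duplicate existing ones; collect them in a set if that matters.

```python
import random

def random_edges_covering(nodes, n_extra, seed=None):
    """Random (start, end) pairs in which every node appears at least once."""
    rng = random.Random(seed)
    order = list(nodes)
    if len(order) < 2:
        raise ValueError("need at least two nodes")
    rng.shuffle(order)
    # Base network: pair consecutive nodes of the shuffled list, so each node is used once.
    edges = [(order[i], order[i + 1]) for i in range(0, len(order) - 1, 2)]
    if len(order) % 2:                       # odd count: one node is still unpaired
        edges.append((order[-1], rng.choice(order[:-1])))
    # Then add as many extra random edges (two distinct nodes each) as you like.
    for _ in range(n_extra):
        edges.append(tuple(rng.sample(order, 2)))
    return edges

edges = random_edges_covering(range(1000), n_extra=500, seed=1)
print(len(edges), edges[:5])
```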
import random
random.seed()  # seeds from OS randomness (or the current time if unavailable)
from random import randint
# to generate 100 tuples with randints in range 0-99
li = [(randint(0,99),randint(0,99)) for i in range(100)]
print(li)
[(80, 55), (3, 10), (66, 65), (26, 23), (8, 72), (83, 25), (24, 99), (72, 9), (52, 76), (72, 68), (67, 25), (72, 18), (94, 62), (7, 62), (49, 94), (29, 89), (11, 38), (52, 51), (19, 32), (20, 85), (56, 61), (4, 40), (97, 58), (82, 2), (50, 82), (77, 5), (2, 9), (2, 46), (39, 4), (74, 40), (69, 15), (1, 77), (45, 58), (80, 59), (85, 80), (27, 80), (81, 4), (22, 33), (77, 60), (75, 87), (43, 36), (60, 34), (90, 54), (75, 3), (89, 84), (51, 93), (62, 64), (81, 50), (15, 60), (33, 97), (42, 62), (83, 26), (13, 33), (41, 87), (29, 63), (4, 32), (6, 14), (79, 73), (95, 4), (41, 16), (96, 64), (15, 28), (35, 13), (35, 82), (77, 16), (63, 27), (75, 37), (11, 52), (21, 35), (37, 96), (9, 86), (83, 11), (5, 42), (34, 32), (17, 8), (65, 55), (58, 19), (90, 40), (18, 75), (29, 14), (0, 11), (25, 68), (34, 52), (22, 8), (12, 53), (16, 49), (73, 54), (78, 80), (74, 60), (40, 68), (69, 20), (37, 38), (74, 60), (53, 90), (25, 48), (44, 52), (49, 27), (28, 35), (29, 94), (35, 60)]
Here is a solution that first generates a random population of nodes (pop1), then shuffles a copy of it (pop2) and combines them into a list of pairs.
Note: this method only yields edges in which each node appears exactly once as a start and exactly once as an end (and a node can end up paired with itself), so it may not be what you're after. See below for another method.
import random
random.seed()  # seeds from OS randomness (or the current time if unavailable)
# extract a number of samples - the number of nodes you want
pop1 = random.sample(range(1000), 10)
pop2 = list(pop1)  # independent copy of the sampled nodes
random.shuffle( pop2 )
# generate pairs from the same population - this guarantees your constraint
pairs = list(zip( pop1, pop2 ))
print(pairs)
Output:
[(17, 347), (812, 688), (347, 266), (731, 342), (342, 49), (904, 17), (49, 731), (50, 904), (688, 50), (266, 812)]
Here is another method.
This allows duplicate occurrences of nodes.
The idea is to draw start and end nodes from the same population:
import random
random.seed()
population = range(10) # any population would do
# choose randomly from the population for both ends
# so you can have duplicates
pairs = [(random.choice(population), random.choice(population)) for _ in range(100)]
print(pairs[:10])
Output:
[(1, 9), (7, 1), (8, 6), (4, 7), (6, 2), (7, 3), (0, 2), (1, 0), (8, 3), (8, 3)]