Combining Arrays within an Array - python
I am currently trying to generate a random dataset based on a choice of k (the number of clusters) and xlim and ylim as the input boundaries. I want my output to look as follows:
[array([[11.7282981 , 6.89656728],
        [ 9.88391172, 5.83611126],
        [ 7.45631652, 7.88674093],
        [ 8.38232831, 7.82884638]])]
This code is for a k-means project. Here is my attempt. First I create a cluster center, randomly generated in the range between 0 and the xlim and ylim inputs. Then I create 2 random points (2 in this case, but I will be doing 100) around the cluster center, with noise:
```python
import random
import numpy as np

k = 2
xlim = 12
ylim = 12
f = []
for x in range(k):
    clusterCenter = [random.randint(0, xlim), random.randint(0, ylim)]
    cluster = np.random.randn(2, 2) + clusterCenter
    f.append(cluster)
f
```
Unfortunately, the output comes out to be:
[array([[11.7282981 , 6.89656728],
[ 9.88391172, 5.83611126]]),
array([[7.45631652, 7.88674093],
[8.38232831, 7.82884638]])]
which is not what I want, as I would like to put this into a pandas DataFrame. Can anyone help?
The numbers will be a lot greater. I have made it such that each cluster generated is a set of 2 x and y co-ordinates, but would ideally want:

```python
cluster = np.random.randn(100, 2) + clusterCenter
```

So keep that in consideration! Any help would be greatly appreciated!
Replace f.append(cluster) with:

```python
f = None  # instead of []
...
if f is None:
    f = cluster
else:
    f = np.concatenate((f, cluster))
```
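An alternative sketch (not from the answer above): keep the original f.append, and stack the list once after the loop before handing the result to pandas:

```python
import random
import numpy as np
import pandas as pd

k = 2
xlim, ylim = 12, 12

f = []
for _ in range(k):
    clusterCenter = [random.randint(0, xlim), random.randint(0, ylim)]
    f.append(np.random.randn(2, 2) + clusterCenter)

points = np.vstack(f)  # stack the list of (2, 2) arrays into one (k*2, 2) array
df = pd.DataFrame(points, columns=['x', 'y'])
print(df.shape)  # (4, 2)
```

np.vstack (or np.concatenate with the default axis=0) then only runs once, which is cheaper than concatenating inside the loop.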
Related
How to split data into two graphs with matplotlib
I would be so thankful if someone would be able to help me with this. I am creating a graph in matplotlib; however, I would love to split up the 14 lines created by the while loop into the x and y values of P, so that instead of plt.plot(t, P) it would be plt.plot(t, (P[1])[0]) and plt.plot(t, (P[1])[1]). I would love if someone could help me very quickly; it should be easy, but I am just getting errors with the arrays.

```python
# Altering Alpha in Tumor Cells vs PACCs
# What is alpha? α = rate of conversion of cancer cells to PACCs
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt

value = -6
while value <= 0:
    def modelP(x, t):
        P, C = x
        λc = 0.0601
        K = 2000
        α = 1 * (10**value)
        ν = 1 * (10**-6)
        λp = 0.1
        γ = 2
        # returning ODEs
        dPdt = λp*P*(1 - (C + γ*P)/K) + α*C
        dCdt = λc*C*(1 - (C + γ*P)/K) - α*C + ν*P
        return dPdt, dCdt

    # initial conditions
    C0 = 256
    P0 = 0
    Pinit = [P0, C0]

    # time points
    t = np.linspace(0, 730)

    # solve the ODEs
    P = odeint(modelP, Pinit, t)
    plt.plot(t, P)
    value += 1

# plot results
plt.xlabel('Time [days]')
plt.ylabel('Number of PACCs')
plt.show()
```
You can use subplots() to create two subplots and then plot each individual line into the plot you need. To do this, first add the subplots at the start (before the while loop) by adding this line:

```python
fig, ax = plt.subplots(2, 1)  # plot with 2 rows, 1 column... change if required
```

Then, within the while loop, replace the plotting line plt.plot(t,P) with (do take care of the indentation so that the lines stay within the while loop):

```python
if value < -3:  # using value = -3 as the point of split, change as needed
    ax[0].plot(t, P)  # add to the first plot
else:
    ax[1].plot(t, P)  # add to the second plot
```

This will give a plot like this.
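Putting those pieces together, a minimal self-contained sketch; the straight dummy lines here are stand-ins for the odeint solutions, and the split at value = -3 follows the answer above:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

t = np.linspace(0, 730, 50)
fig, ax = plt.subplots(2, 1)  # 2 rows, 1 column

for value in range(-6, 1):  # same values the question's while loop steps through
    y = (value + 7) * t  # dummy stand-in for the odeint solution
    if value < -3:  # split point, change as needed
        ax[0].plot(t, y)  # lines for small alpha go to the top plot
    else:
        ax[1].plot(t, y)  # the rest go to the bottom plot

print(len(ax[0].lines), len(ax[1].lines))  # 3 4
```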
Is it possible / good practice to plot a pandas column containing tuples representing coordinates (x, y)?
I am building a simple simulation of a surf ride where, at any given second, the surfer has coordinates (x, y) on a 2-dimensional plane. I defined a function generating a list of tuple coordinates for a 100-second ride:

```python
def takeof(vague, surfer):
    positions = [(0, 0)]
    positionV.y = 0
    positionS.y = 0
    positionS.x = 0
    for i in range(1, 100):
        positionV.y = positionV.y + vague.vitesse + random.choice([0.5, 0, -0.5])
        positionS.y = positionV.y
        positionS.x = positionS.x + surfer.vitesse + random.choice([0.25, 0, -0.25])
        moveto = (positionS.x, positionS.y)
        positions.append(moveto)
    return positions
```

Then, in order to plot 10 simulated waves on the same graph, I defined another function, assuming my data would be easier to work with inside a pandas DataFrame:

```python
def sim(vague, surfer):
    dflist = pd.DataFrame()
    convertisseur = {1:'1', 2:'2', 3:'3', 4:'4', 5:'5', 6:'6', 7:'7', 8:'8', 9:'9', 10:'10'}
    for n in range(1, 10):
        coordonnees = takeof(normale, pablo)
        dflist[convertisseur[n]] = coordonnees
        print(coordonnees)
        print(dflist[convertisseur[n]])
    print(dflist)
```

The DataFrame gets created fine, with each column being a series of tuple coordinates; however, I struggle in my attempts to plot all 10 waves in a single graph. Is there an obvious way I can't see? Is it generally a bad idea to proceed this way when simulating coordinates?
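One way to plot such tuple columns (a sketch; the tiny DataFrame here is a hypothetical stand-in for the one built by sim()) is to unpack each column back into x and y sequences with zip:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# hypothetical stand-in for the DataFrame built by sim(): two short "waves" of (x, y) tuples
dflist = pd.DataFrame({
    '1': [(0, 0), (1, 2), (2, 3)],
    '2': [(0, 0), (1, 1), (3, 2)],
})

for col in dflist.columns:
    xs, ys = zip(*dflist[col])  # split the tuples into an x sequence and a y sequence
    plt.plot(xs, ys, label=col)
plt.legend()
print(len(plt.gca().lines))  # 2
```

Whether tuple columns are good practice is debatable; two flat columns per wave (or a long-format frame) are generally easier to plot and aggregate.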
Drawing a 1d lattice graph in python networkx
I am trying to plot a 1d lattice graph, but I get the following error:

NetworkXPointlessConcept: the null graph has no paths, thus there is no average shortest path length

What is the problem with this code? Thanks.

```python
N = 1000
x = 0
for n in range(1, N, 10):
    lattice_1d_distance = list()
    d = 0
    lattice_1d = nx.grid_graph(range(1, n))
    d = nx.average_shortest_path_length(lattice_1d)
    lattice_1d_distance.append(d)
    x.append(n)

plt.plot(x, lattice_1d_distance)
plt.show()
```
According to the networkx documentation, the input to nx.grid_graph is a list of dimensions. Example:

```python
print(list(range(1, 4)))  # [1, 2, 3]
nx.draw(nx.grid_graph(list(range(1, 4))))  # a two-dimensional graph, as there are only 3 entries and one entry is 1

print(list(range(1, 5)))  # [1, 2, 3, 4]
nx.draw(nx.grid_graph([1, 2, 3, 4]))  # a three-dimensional graph, as there are only 4 entries and one entry is 1
```

Therefore, say you want to either 1. plot the distance against an increasing number of dimensions with a constant size per dimension, or 2. plot the distance against an increasing size per dimension with a constant number of dimensions:

```python
import networkx as nx
import matplotlib.pyplot as plt

x = []
lattice_1d_distance = []
for n in range(1, 10):
    # incrementing the number of dimensions, each dimension with the same length
    lattice_1d = nx.grid_graph([2] * n)
    d = nx.average_shortest_path_length(lattice_1d)
    lattice_1d_distance.append(d)
    x.append(n)
plt.plot(x, lattice_1d_distance)
plt.show()

x = []
lattice_1d_distance = []
for n in range(1, 10):
    # 2-dimensional graphs, with incrementing length for each dimension
    lattice_1d = nx.grid_graph([n, n])
    d = nx.average_shortest_path_length(lattice_1d)
    lattice_1d_distance.append(d)
    x.append(n)
plt.plot(x, lattice_1d_distance)
plt.show()
```

Also, you need to pay attention to the declaration of your list variables: x must be initialised as a list (x = []), not as 0, before you can call x.append, and lattice_1d_distance should be initialised outside the loop so it is not reset on every iteration.
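Applied to the original 1-d question, a corrected loop might look like this (a sketch; starting at n = 2 is my choice, so the graph is never empty):

```python
import networkx as nx
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

x = []
lattice_1d_distance = []
for n in range(2, 20):  # start at 2 so the lattice always has at least one edge
    lattice_1d = nx.grid_graph([n])  # a 1-d lattice: a path of n nodes
    lattice_1d_distance.append(nx.average_shortest_path_length(lattice_1d))
    x.append(n)

plt.plot(x, lattice_1d_distance)
print(lattice_1d_distance[3])  # path of 5 nodes -> average distance (5 + 1) / 3 = 2.0
```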
Having some trouble understanding x_bins in regplot of Seaborn
I used seaborn.regplot to plot data, but I do not quite understand how the error bar in regplot is calculated. I have compared the results with the mean and standard deviation derived from manual calculation. Here is my testing script:

```python
import numpy as np
import pandas as pd
import seaborn as sns

def get_data_XYE(p):
    x_list = []
    lower_list = []
    upper_list = []
    for line in p.lines:
        x_list.append(line.get_xdata()[0])
        lower_list.append(line.get_ydata()[0])
        upper_list.append(line.get_ydata()[1])
    y = 0.5 * (np.asarray(lower_list) + np.asarray(upper_list))
    y_error = np.asarray(upper_list) - y
    x = np.asarray(x_list)
    return x, y, y_error

x = [37.3448,36.6026,42.7795,34.7072,75.4027,226.2615,192.7984,140.8045,242.9952,458.451,640.6542,726.1024,231.7347,107.5605,200.2254,190.0006,314.1349,146.8131,152.4497,175.9096,284.9926,116.9681,118.2953,312.3787,815.8389,458.0146,409.5797,595.5373,188.9955,15.7716,36.1839,244.8689,57.4579,94.8717,112.2237,87.0687,72.79,22.3457,24.1728,29.505,80.8765,252.7454,280.6002,252.9573,348.246,112.705,98.7545,317.0541,300.9573,402.8411,406.6884,56.1286,30.1385,32.9909,497.556,19.3606,20.8409,95.2324,108.6074,15.7753,54.5511,45.5623,64.564,101.1934,81.8459,88.286,58.2642,56.1225,51.2943,38.0649,63.5882,63.6847,120.495,102.4097,49.3255,111.3309,171.6028,58.9526,28.7698,144.6884,180.0661,116.6028,146.2594,199.8702,128.9378,423.2363,119.8537,124.6508,518.8625,306.3023,79.5213,121.0309,116.9346,170.8863,930.361,48.9983,55.039,47.1092,72.0548,75.4045,103.521,83.4134,142.3253,146.6215,121.4467,101.4252,68.4812,291.4275,143.9475,142.647,78.9826,47.094,204.2196,89.0208,82.792,27.1346,142.4764,83.7874,67.3216,112.9531,138.2549,133.3446,86.2659,45.3464,56.1604,43.5882,54.3623,86.296,115.7272,96.5498,111.8081,36.1756,40.2947,34.2532,89.1452,53.9062,36.458,113.9297,176.9962,77.3125,77.8891,64.807,64.1515,127.7242,119.6876,976.2324,322.8454,434.2883,168.6923,250.0284,234.7329,131.0793,152.335,118.8838,243.1772,24.1776,168.6327,170.7541,167.8444,75.9315,110.1045,
113.4417,60.5464,66.8956,79.7606,71.6659,72.5251,77.513,207.8019,21.8592,35.2787,169.7698,146.5012,412.9934,248.0708,318.5489,104.1278,184.7592,108.0581,175.2646,169.7698,340.3732,570.3396,23.9853,69.0405,66.7391,67.9435,294.6085,68.0537,77.6344,433.2713,104.3178,229.4615,187.8587,78.1399,121.4737,122.5451,384.5935,38.5232,117.6835,50.3308,318.2513,103.6695,20.7181,321.9601,510.3248,13.4754,16.1188,44.8082,37.7291,733.4587,446.6241,21.1822,287.9603,327.2367,274.1109,195.4713,158.2114,64.4537,26.9857,172.8503]
y = [37,40,30,29,24,23,27,12,21,20,29,28,27,32,23,29,28,22,28,23,24,29,32,18,22,12,12,14,29,31,34,31,22,40,25,36,27,27,29,35,33,25,25,27,27,19,35,26,18,24,25,37,52,47,34,39,40,48,41,44,35,36,53,46,38,44,23,26,26,28,27,21,25,21,20,27,35,24,46,34,22,30,30,30,31,26,25,28,21,31,24,27,33,21,31,33,29,33,32,21,25,22,39,31,34,26,23,18,20,18,34,25,20,12,23,25,21,21,25,31,17,27,28,29,25,24,25,21,24,27,23,22,23,22,22,26,22,19,26,35,33,35,29,26,26,30,22,32,33,33,28,32,26,29,36,37,37,28,24,30,25,20,29,24,33,35,30,32,31,33,40,35,37,24,34,29,27,24,36,26,26,26,27,27,20,17,28,34,18,20,20,18,19,23,20,22,25,32,44,41,39,41,40,44,36,42,31,32,26,29,23,29,29,28,31,22,29,24,28,28,25]
xbreaks = [13.4754, 27.1346, 43.5882, 58.9526, 72.79, 89.1452, 110.1045, 131.0793, 158.2114, 180.0661, 207.8019, 234.7329, 252.9573, 300.9573, 327.2367, 348.246, 412.9934, 434.2883, 458.451, 518.8625, 595.5373, 640.6542, 733.4587, 815.8389, 930.361, 976.2324]

df = pd.DataFrame([x, y]).T
df.columns = ['x', 'y']

# Check the bin average and std using agg
bins = pd.cut(df.x, xbreaks, right=False)
t = df[['x', 'y']].groupby(bins).agg({"x": "mean", "y": ["mean", "std"]})
t.reset_index(inplace=True)
t.columns = ['range_cut', 'x_avg_cut', 'y_avg_cut', 'y_std_cut']
t.index.name = 'id'

# Get the bin average from regplot
g = sns.regplot(x='x', y='y', data=df, fit_reg=False, x_bins=xbreaks, seed=seed)
xye = pd.DataFrame(get_data_XYE(g)).T
xye.columns = ['x_regplot', 'y_regplot', 'e_regplot']
xye.index.name = 'id'
t2 = xye.merge(t, on='id', how='left')
t2
```

You can see that the y and e from the two ways are different. I understand that the default x_ci or x_estimator may affect the result of regplot, but I still cannot reproduce these values in Excel by removing some of the lowest and/or highest values in each bin.
In seaborn.regplot, the x_bins are the center of each bin, and the original x values are assigned to the nearest bin value. Whereas in pandas.cut, the breaks define the bin edges.
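A small sketch of that difference (the numbers here are made up for illustration): with centers, each x is assigned to the nearest center; with edges, pd.cut/np.digitize assigns by the interval the value falls in:

```python
import numpy as np

x = np.array([1.0, 3.2, 3.6, 7.2, 9.9])

# x_bins-style: bin centers, each point assigned to the nearest center
centers = np.array([2.0, 5.0, 9.0])
nearest = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
print(nearest)  # [0 0 1 2 2]

# pd.cut-style: bin edges, each point assigned by the interval it falls in
edges = np.array([0.0, 3.0, 6.0, 10.0])
by_edges = np.digitize(x, edges) - 1
print(by_edges)  # [0 1 1 2 2]
```

Note that 3.2 lands in different bins under the two schemes, which is one source of the discrepancy seen in the question.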
Bin average as a function of position
I want to efficiently calculate the average of a variable (say, temperature) over multiple areas of the plane. Essentially, I want to do the following:

```python
import numpy as np

num = 10000
XYT = np.random.uniform(0, 1, (num, 3))
X = np.transpose(XYT)[0]
Y = np.transpose(XYT)[1]
T = np.transpose(XYT)[2]

size = 10
bins = np.empty((size, size))
for i in range(size):
    for j in range(size):
        # pseudocode: average T over the points whose rescaled (X, Y) fall in cell (i, j)
        if rescaled X, Y in bin[i][j]:
            bins[i][j] = mean T
```
I would use pandas (although I'm sure you can achieve basically the same with vanilla numpy):

```python
df = pandas.DataFrame({'x': npX, 'y': npY, 'temp': npZ})

# solve quadrants
df['quadrant'] = (df['x'] >= 0) * 2 + (df['y'] >= 0) * 1

# group by and aggregate
mean_per_quadrant = df.groupby(['quadrant'])['temp'].aggregate(['mean'])
```

You may need to create multiple cutoffs to get unique groupings. For example,

```python
df['quadrant'] = (df['x'] >= 50) * 4 + (df['x'] >= 0) * 2 + (df['y'] >= 0) * 1
```

would double the number of groups by also splitting on x >= 50 (just make sure each new cutoff uses the next power of 2, so every combination gets a unique label).
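For the original size-by-size grid of bins (rather than quadrants), here is a minimal sketch along the same pandas lines; the uniform data and size = 10 come from the question, while assigning cells by scaling and truncating the coordinates is my own assumption:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
num = 10000
X, Y, T = rng.uniform(0, 1, (3, num))

size = 10
df = pd.DataFrame({'x': X, 'y': Y, 'temp': T})
# map each point to its grid cell: coordinates in [0, 1] scale to cell indices 0..size-1
df['i'] = np.minimum((df['x'] * size).astype(int), size - 1)
df['j'] = np.minimum((df['y'] * size).astype(int), size - 1)

# mean temperature per cell, reshaped into a size x size table
mean_temp = df.groupby(['i', 'j'])['temp'].mean().unstack()
print(mean_temp.shape)  # (10, 10)
```

If SciPy is available, scipy.stats.binned_statistic_2d computes the same thing in a single call.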