Combining Arrays within an Array - python

I am currently trying to generate a random dataset based on a choice of k, the number of clusters, and xlim and ylim as the boundaries to be inputted. I want my output to be as follows:
array([[11.7282981 ,  6.89656728],
       [ 9.88391172,  5.83611126],
       [ 7.45631652,  7.88674093],
       [ 8.38232831,  7.82884638]])
This code is for a k-means project.
Here is my attempt. First I create a cluster center, which is randomly generated within a range between 0 and xlimit and ylimit inputted. Then I create 2 (in this case 2 but I will be doing 100) random points around the cluster center with noise:
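import random
import numpy as np

k = 2
xlim = 12
ylim = 12
f = []
for x in range(0, k):
    # random integer centre inside [0, xlim] x [0, ylim]
    clusterCenter = [random.randint(0, xlim), random.randint(0, ylim)]
    # 2 noisy points around the centre (one row per point)
    cluster = np.random.randn(2, 2) + clusterCenter
    f.append(cluster)
f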
Unfortunately, the output comes out as a list of separate arrays, one per cluster:
[array([[11.7282981 ,  6.89656728],
        [ 9.88391172,  5.83611126]]),
 array([[7.45631652, 7.88674093],
        [8.38232831, 7.82884638]])]
This is not what I want, because I would like to put the points into a pandas DataFrame. Can anyone help?
The real dataset will be much larger. Here each cluster is only 2 points with x and y coordinates, but ideally I want:
cluster = np.random.randn(100, 2) + clusterCenter
So please keep that in mind. Any help would be greatly appreciated!

Replace f.append(cluster) with the following, initialising f to None instead of []:
f = None # instead of []
...
    if f is None:
        f = cluster
    else:
        f = np.concatenate( (f, cluster) )
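As a rough sketch of the same idea end to end, the clusters can also be collected in a list and stacked once at the end, which avoids re-copying the array on every pass. The names points_per_cluster and rng, the fixed seed, and the extra "cluster" label column are assumptions added for illustration; they are not part of the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # assumed seed, only for reproducibility
k = 2                            # number of clusters
points_per_cluster = 100         # hypothetical name for the 100 in the question
xlim, ylim = 12, 12

clusters = []
for _ in range(k):
    # integer centre inside [0, xlim] x [0, ylim], like random.randint in the question
    center = np.array([rng.integers(0, xlim + 1), rng.integers(0, ylim + 1)])
    # Gaussian noise around the centre, one row per point
    clusters.append(rng.standard_normal((points_per_cluster, 2)) + center)

# stack the per-cluster arrays into one (k * points_per_cluster, 2) array
points = np.vstack(clusters)

df = pd.DataFrame(points, columns=["x", "y"])
# optional: remember which cluster each row came from
df["cluster"] = np.repeat(np.arange(k), points_per_cluster)
```

Here np.vstack(clusters) gives the same result as calling np.concatenate(clusters) on the finished list.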

Related

How to split data into two graphs with matplotlib

I would be very thankful if someone could help me with this. I am creating a graph in matplotlib, but I would like to split the 14 lines created by the while loop into the x and y values of P, so that instead of plt.plot(t,P) it would be plt.plot(t,(P[1])[0]) and plt.plot(t,(P[1])[1]). It should be easy, but I keep getting errors with the arrays.
#Altering Alpha in Tumor Cells vs PACCs
#What is alpha? α = Rate of conversion of cancer cells to PACCs
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
from google.colab import files  # only available in Google Colab; unused below
value = -6
counter = -1
array = []
pac = []
while value <= 0:
    def modelP(x,t):
        P, C = x
        λc = 0.0601
        K = 2000
        α = 1 * (10**value)
        ν = 1 * (10**-6)
        λp = 0.1
        γ = 2
        #returning odes
        dPdt = ((λp))*P*(1-(C+(γ*P))/K)+ (α*C)
        dCdt = ((λc)*C)*(1-(C+(γ*P))/K)-(α*C) + (ν*P)
        return dPdt, dCdt
    #initial
    C0 = 256
    P0 = 0
    Pinit = [P0,C0]
    #time points
    t = np.linspace(0,730)
    #solve odes
    P = odeint(modelP,Pinit,t)
    plt.plot(t,P)
    value += 1
#plot results
plt.xlabel('Time [days]')
plt.ylabel('Number of PACCs')
plt.show()
You can use subplots() to create two subplots and then plot each line into the subplot you need. First, add the subplots before the while loop with this line:
fig, ax = plt.subplots(2,1) ## Plot will have 2 rows, 1 column... change if required
Then, within the while loop, replace the plotting line
plt.plot(t,P)
with the following (take care with the indentation so that the lines stay inside the while loop):
    if value < -3: ## I am using value = -3 as the point of split, change as needed
        ax[0].plot(t,P) ## Add to first plot
    else:
        ax[1].plot(t,P) ## Add to second plot
This puts the curves for value < -3 in the first subplot and the remaining curves in the second.
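If the goal is instead to separate the two components of the solution, one possible sketch is below. odeint returns an array with one column per state variable, so with the initial state [P0, C0] the first column is P and the second is C. Passing alpha through args, the names ax_paccs and ax_cells, the "cancer cells" axis label, and the Agg backend (chosen so the sketch runs without a display) are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumed headless backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from scipy.integrate import odeint

def model(x, t, alpha):
    """Right-hand side of the two ODEs from the question, state x = [P, C]."""
    P, C = x
    lam_c, lam_p = 0.0601, 0.1
    K, nu, gamma = 2000, 1e-6, 2
    crowding = 1 - (C + gamma * P) / K
    dPdt = lam_p * P * crowding + alpha * C
    dCdt = lam_c * C * crowding - alpha * C + nu * P
    return [dPdt, dCdt]

t = np.linspace(0, 730)
fig, (ax_paccs, ax_cells) = plt.subplots(2, 1, sharex=True)

for value in range(-6, 1):            # alpha = 1e-6 ... 1e0, 7 runs
    sol = odeint(model, [0, 256], t, args=(10.0 ** value,))
    label = f"alpha = 1e{value}"
    ax_paccs.plot(t, sol[:, 0], label=label)   # column 0: P
    ax_cells.plot(t, sol[:, 1], label=label)   # column 1: C

ax_paccs.set_ylabel("Number of PACCs")
ax_cells.set_ylabel("Number of cancer cells")
ax_cells.set_xlabel("Time [days]")
ax_paccs.legend(fontsize="small")
# call fig.savefig(...) or plt.show() here to view the figure
```

Each subplot then holds 7 curves, one per value of alpha, instead of all 14 lines sharing a single axis.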

Is it possible/ good practice to plot a pandas column containing tuples representing coordinates (x, y)?

I am building a simple simulation of a surf ride where, at any given second, the surfer has coordinates (x, y) on a 2-dimensional plane.
I defined a function that generates a list of coordinate tuples for a 100-second ride:
def takeof(vague, surfer):
    # positionV and positionS are objects with x/y attributes defined elsewhere
    positions = [(0,0)]
    positionV.y = 0
    positionS.y = 0
    positionS.x = 0
    for i in range (1, 100):
        positionV.y = positionV.y + vague.vitesse + random.choice([0.5, 0, -0.5])
        positionS.y = positionV.y
        positionS.x = positionS.x + surfer.vitesse + random.choice([0.25, 0, -0.25])
        moveto = (positionS.x, positionS.y)
        positions.append(moveto)
    return positions
Then, in order to plot 10 simulated waves on the same graph, I defined another function, assuming my data would be easier to work with inside a pandas DataFrame:
def sim(vague, surfer):
    dflist = pd.DataFrame()
    convertisseur = {1:'1', 2:'2', 3:'3', 4:'4', 5:'5', 6:'6', 7:'7', 8:'8', 9:'9', 10:'10'}
    for n in range (1, 11):  # 1..10 inclusive, so that 10 waves are simulated
        coordonnees = takeof(vague, surfer)
        dflist[convertisseur[n]] = coordonnees
        print(coordonnees)
        print(dflist[convertisseur[n]])
    print(dflist)
The DataFrame gets created fine, with each column being a series of coordinate tuples, but I struggle to plot all 10 waves in a single graph. Is there an obvious way I can't see? Is it generally a bad idea to proceed this way when simulating coordinates?
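One way to approach the plotting part, as a minimal sketch: keep each ride as a list of (x, y) tuples and unpack it with zip(*positions) just before plotting. Tuples inside DataFrame cells tend to be awkward to work with, so separate x and y sequences are usually simpler. The function simulate_ride, its speed values, and the fixed seed are hypothetical stand-ins for the question's takeof, vague.vitesse and surfer.vitesse; the Agg backend is assumed only so the sketch runs without a display.

```python
import random
import matplotlib
matplotlib.use("Agg")  # assumed headless backend; drop this line to use plt.show()
import matplotlib.pyplot as plt

def simulate_ride(wave_speed, surfer_speed, steps=100):
    """Return a list of (x, y) positions, one per second, starting at (0, 0)."""
    x, y = 0.0, 0.0
    positions = [(x, y)]
    for _ in range(1, steps):
        y += wave_speed + random.choice([0.5, 0, -0.5])       # wave pushes along y
        x += surfer_speed + random.choice([0.25, 0, -0.25])   # surfer moves along x
        positions.append((x, y))
    return positions

random.seed(0)  # assumed seed, only for reproducibility
rides = [simulate_ride(wave_speed=2.0, surfer_speed=1.0) for _ in range(10)]

fig, ax = plt.subplots()
for n, positions in enumerate(rides, start=1):
    xs, ys = zip(*positions)          # split the tuples into x and y sequences
    ax.plot(xs, ys, label=f"ride {n}")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend(fontsize="small")
```

If a DataFrame is still wanted, a long layout with one row per (ride, second) and separate x and y columns is one option that plots and aggregates easily.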

Drawing a 1d lattice graph in python networkx

I am trying to plot a 1d lattice graph, but I get the error below:
NetworkXPointlessConcept: the null graph has no paths, thus there is no average shortest path length
What is the problem with this code?
Thanks.
N = 1000
x = 0
for n in range(1, N, 10):
    lattice_1d_distance = list()
    d = 0
    lattice_1d = nx.grid_graph(range(1,n))
    d = nx.average_shortest_path_length(lattice_1d)
    lattice_1d_distance.append(d)
    x.append(n)
plt.plot(x, lattice_1d_distance)
plt.show()
According to the networkx documentation for nx.grid_graph, the input is a list of dimension sizes, one entry per dimension. In the question, range(1,n) is empty when n = 1, so the first graph is the null graph, which is what raises the error.
Example
print(list(range(1,4)))
nx.draw(nx.grid_graph(list(range(1,4)))) # effectively a two dimensional graph: there are 3 entries, and one entry = 1
[1, 2, 3]
print(list(range(1,5)))
nx.draw(nx.grid_graph([1,2,3,4])) # effectively a 3 dimensional graph: there are 4 entries, and one entry = 1
[1, 2, 3, 4]
Therefore, there are two things you might want to plot: 1. the distance against an increasing number of dimensions, with a constant size for each dimension, or 2. the distance against an increasing size of each dimension, with a constant number of dimensions. The two loops below cover these cases in that order:
import networkx as nx
import matplotlib.pyplot as plt
N = 10
x = []
lattice_1d_distance = []
for n in range(1, 10):
    d = 0
    lattice_1d = nx.grid_graph([2]*n) # increasing number of dimensions, each dimension has the same length
    d = nx.average_shortest_path_length(lattice_1d)
    lattice_1d_distance.append(d)
    x.append(n)
plt.plot(x, lattice_1d_distance)
plt.show()
N = 10
x = []
lattice_1d_distance = []
for n in range(1, 10):
    d = 0
    lattice_1d = nx.grid_graph([n,n]) # 2 dimensional graphs, each dimension has increasing length
    d = nx.average_shortest_path_length(lattice_1d)
    lattice_1d_distance.append(d)
    x.append(n)
plt.plot(x, lattice_1d_distance)
plt.show()
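Also, pay attention to how the list variables are declared. In the question, x = 0 is an integer, so x.append(n) fails, and lattice_1d_distance is re-created on every pass through the loop, so it never holds more than one value. Both lists should be created once, before the loop.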
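For the 1-dimensional case the question actually asks about, a minimal sketch is to pass a single-entry list, so that the graph has one dimension with n nodes. Starting the sizes at 2 and the variable names sizes and distances are assumptions chosen for this example.

```python
import networkx as nx

sizes = list(range(2, 100, 10))   # number of nodes in each 1-D lattice
distances = []                    # created once, before the loop

for n in sizes:
    lattice_1d = nx.grid_graph([n])   # one entry = one dimension with n nodes
    distances.append(nx.average_shortest_path_length(lattice_1d))
```

Calling plt.plot(sizes, distances) afterwards should show a straight line, since a path of n nodes has an average shortest path length of (n + 1) / 3.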

Trouble understanding x_bins in Seaborn's regplot

I used seaborn.regplot to plot data, but I do not quite understand how the error bars in regplot are calculated. I have compared the results with the mean and standard deviation derived from a manual calculation. Here is my testing script.
import numpy as np
import pandas as pd
import seaborn as sns
def get_data_XYE(p):
    # read the error-bar lines back from the axes returned by regplot
    x_list = []
    lower_list = []
    upper_list = []
    for line in p.lines:
        x_list.append(line.get_xdata()[0])
        lower_list.append(line.get_ydata()[0])
        upper_list.append(line.get_ydata()[1])
    y = 0.5 * (np.asarray(lower_list) + np.asarray(upper_list))
    y_error = np.asarray(upper_list) - y
    x = np.asarray(x_list)
    return x, y, y_error
x = [37.3448,36.6026,42.7795,34.7072,75.4027,226.2615,192.7984,140.8045,242.9952,458.451,640.6542,726.1024,231.7347,107.5605,200.2254,190.0006,314.1349,146.8131,152.4497,175.9096,284.9926,116.9681,118.2953,312.3787,815.8389,458.0146,409.5797,595.5373,188.9955,15.7716,36.1839,244.8689,57.4579,94.8717,112.2237,87.0687,72.79,22.3457,24.1728,29.505,80.8765,252.7454,280.6002,252.9573,348.246,112.705,98.7545,317.0541,300.9573,402.8411,406.6884,56.1286,30.1385,32.9909,497.556,19.3606,20.8409,95.2324,108.6074,15.7753,54.5511,45.5623,64.564,101.1934,81.8459,88.286,58.2642,56.1225,51.2943,38.0649,63.5882,63.6847,120.495,102.4097,49.3255,111.3309,171.6028,58.9526,28.7698,144.6884,180.0661,116.6028,146.2594,199.8702,128.9378,423.2363,119.8537,124.6508,518.8625,306.3023,79.5213,121.0309,116.9346,170.8863,930.361,48.9983,55.039,47.1092,72.0548,75.4045,103.521,83.4134,142.3253,146.6215,121.4467,101.4252,68.4812,291.4275,143.9475,142.647,78.9826,47.094,204.2196,89.0208,82.792,27.1346,142.4764,83.7874,67.3216,112.9531,138.2549,133.3446,86.2659,45.3464,56.1604,43.5882,54.3623,86.296,115.7272,96.5498,111.8081,36.1756,40.2947,34.2532,89.1452,53.9062,36.458,113.9297,176.9962,77.3125,77.8891,64.807,64.1515,127.7242,119.6876,976.2324,322.8454,434.2883,168.6923,250.0284,234.7329,131.0793,152.335,118.8838,243.1772,24.1776,168.6327,170.7541,167.8444,75.9315,110.1045,113.4417,60.5464,66.8956,79.7606,71.6659,72.5251,77.513,207.8019,21.8592,35.2787,169.7698,146.5012,412.9934,248.0708,318.5489,104.1278,184.7592,108.0581,175.2646,169.7698,340.3732,570.3396,23.9853,69.0405,66.7391,67.9435,294.6085,68.0537,77.6344,433.2713,104.3178,229.4615,187.8587,78.1399,121.4737,122.5451,384.5935,38.5232,117.6835,50.3308,318.2513,103.6695,20.7181,321.9601,510.3248,13.4754,16.1188,44.8082,37.7291,733.4587,446.6241,21.1822,287.9603,327.2367,274.1109,195.4713,158.2114,64.4537,26.9857,172.8503]
y = [37,40,30,29,24,23,27,12,21,20,29,28,27,32,23,29,28,22,28,23,24,29,32,18,22,12,12,14,29,31,34,31,22,40,25,36,27,27,29,35,33,25,25,27,27,19,35,26,18,24,25,37,52,47,34,39,40,48,41,44,35,36,53,46,38,44,23,26,26,28,27,21,25,21,20,27,35,24,46,34,22,30,30,30,31,26,25,28,21,31,24,27,33,21,31,33,29,33,32,21,25,22,39,31,34,26,23,18,20,18,34,25,20,12,23,25,21,21,25,31,17,27,28,29,25,24,25,21,24,27,23,22,23,22,22,26,22,19,26,35,33,35,29,26,26,30,22,32,33,33,28,32,26,29,36,37,37,28,24,30,25,20,29,24,33,35,30,32,31,33,40,35,37,24,34,29,27,24,36,26,26,26,27,27,20,17,28,34,18,20,20,18,19,23,20,22,25,32,44,41,39,41,40,44,36,42,31,32,26,29,23,29,29,28,31,22,29,24,28,28,25]
xbreaks = [13.4754, 27.1346, 43.5882, 58.9526, 72.79, 89.1452, 110.1045, 131.0793, 158.2114, 180.0661, 207.8019, 234.7329, 252.9573, 300.9573, 327.2367, 348.246, 412.9934, 434.2883, 458.451, 518.8625, 595.5373, 640.6542, 733.4587, 815.8389, 930.361, 976.2324]
df = pd.DataFrame([x,y]).T
df.columns = ['x','y']
# Check the bin average and std using agg
bins = pd.cut(df.x,xbreaks,right=False)
t = df[['x','y']].groupby(bins).agg({"x": "mean", "y": ["mean","std"]})
t.reset_index(inplace=True)
t.columns = ['range_cut','x_avg_cut','y_avg_cut','y_std_cut']
t.index.name ='id'
# Get the bin average from regplot
seed = 0  # assumed value; seed was not defined in the original script
g = sns.regplot(x='x',y='y',data=df,fit_reg=False,x_bins=xbreaks,seed=seed)
xye = pd.DataFrame(get_data_XYE(g)).T
xye.columns = ['x_regplot','y_regplot','e_regplot']
xye.index.name = 'id'
t2 = xye.merge(t,on='id',how='left')
t2
You can see that y and e differ between the two approaches. I understand that the default x_ci or x_estimator may affect the result of regplot, but I still cannot reproduce these values in Excel, even after removing some of the lowest and/or highest values in each bin.
In seaborn.regplot, the x_bins values are the centers of the bins, and each original x value is assigned to the nearest center. In pandas.cut, by contrast, the breaks define the bin edges.
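To compare like with like, the manual calculation can group by nearest center rather than by edges. The sketch below is an assumption about how to mimic that assignment with numpy and pandas; it is not seaborn's own code, and the helper name bin_to_nearest_center and the toy data are made up for illustration. Note also that regplot's default error bars are a bootstrapped confidence interval for the bin mean, so they are not expected to equal the standard deviation even when the bins match.

```python
import numpy as np
import pandas as pd

def bin_to_nearest_center(x, centers):
    """Map every x value to the closest entry of centers (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # distance from every x to every center, shape (len(x), len(centers))
    dist = np.abs(np.subtract.outer(x, centers))
    return centers[np.argmin(dist, axis=1)]

# toy data, not the data from the question
df = pd.DataFrame({"x": [1.0, 2.0, 4.0, 9.0, 10.0],
                   "y": [10.0, 20.0, 30.0, 40.0, 50.0]})
centers = [1.5, 9.5]

df["x_center"] = bin_to_nearest_center(df["x"], centers)
per_center = df.groupby("x_center")["y"].agg(["mean", "std", "count"])
```

With pd.cut and the same two numbers used as edges, only the values between 1.5 and 9.5 would be binned, which is why the two tables in the question disagree.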

Bin average as a function of position

I want to efficiently calculate the average of a variable (say, temperature) over multiple areas of the plane.
I essentially want to do the following.
import numpy as np
num = 10000
XYT = np.random.uniform(0, 1, (num, 3))
X = np.transpose(XYT)[0]
Y = np.transpose(XYT)[1]
T = np.transpose(XYT)[2]
size = 10
bins = np.empty((size, size))
for i in range(size):
    for j in range(size):
        # pseudocode: average T over the points whose (X, Y) fall in cell (i, j)
        if rescaled X,Y in bin[i][j]:
            bins[i][j] = mean T
I would use pandas (although I'm sure you can achieve basically the same with plain numpy):
import pandas
df = pandas.DataFrame({'x': X, 'y': Y, 'temp': T})
# solve quadrants
df['quadrant'] = (df['x']>=0)*2 + (df['y']>=0)*1
# group by and aggregate
mean_per_quadrant = df.groupby(['quadrant'])['temp'].aggregate(['mean'])
You may need several cutoffs to get unique groupings. For example, (df['x']>=50)*4 + (df['x']>=0)*2 + (df['y']>=0)*1 would add an extra 2 groups (one with y>=0 and one with y<0); just make sure each cutoff is multiplied by a distinct power of 2.
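For a regular size x size grid like the one in the question, a numpy-only sketch is to let np.histogram2d compute the per-cell sum of T (through its weights argument) and the per-cell count, then divide. The fixed seed, the name extent, and the choice to leave empty cells as NaN are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)            # assumed seed, only for reproducibility
num, size = 10000, 10
X, Y, T = rng.uniform(0, 1, (3, num))     # positions and temperature in [0, 1)

extent = [[0, 1], [0, 1]]                 # grid covers the unit square
# per-cell sum of T, then per-cell number of points
sums, xedges, yedges = np.histogram2d(X, Y, bins=size, range=extent, weights=T)
counts, _, _ = np.histogram2d(X, Y, bins=size, range=extent)

# mean T per cell; cells without any point stay NaN
means = np.divide(sums, counts, out=np.full_like(sums, np.nan), where=counts > 0)
```

Here means[i, j] is the average of T over the points with xedges[i] <= X < xedges[i+1] and yedges[j] <= Y < yedges[j+1], with no Python-level loop over the cells.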
