Projecting the velocity on the line of sight - EDITED - python

I have a huge text file which contains the positions (x,y,z) and velocity components (vx,vy,vz) of a million stars. After doing some rotations and projections, I obtain new positions and velocity components (x',y',z',vx',vy',vz') of the stars.
My final step is to compute the velocity along the line of sight; essentially I have to "average" the vz component, and to do this I try to create a FITS file in which every pixel contains the mean value of the vz component.
Here is part of my code:
import numpy as np
from astropy.io import fits

mod = np.genfromtxt('data_bar_region.txt')
x = list(mod[:, 0])
y = list(mod[:, 1])
vz = mod[:, 5]
x_rang_1 = np.arange(-40, 41, 1)
y_rang_1 = np.arange(-40, 41, 1)
fake_data_1 = np.zeros((len(x_rang_1), len(y_rang_1)))  # one pixel per (x, y) cell
for i in range(len(x_rang_1)-1):
    for j in range(len(y_rang_1)-1):
        vel_tmp = []
        for index in range(len(x)):
            if x_rang_1[i] <= x[index] <= x_rang_1[i+1]:
                if y_rang_1[j] <= y[index] <= y_rang_1[j+1]:
                    vel_tmp.append(vz[index])
        fake_data_1[j, i] = np.mean(vel_tmp)
hdu1 = fits.PrimaryHDU(fake_data_1)
hdu1.writeto('TEST.fits')
This code is much too slow (it took about 8 hours on my laptop) and I don't know how to speed it up.
Do you have any suggestions or other ways to compute the v_LOS in a better and faster way?
EDIT: Before performing the "averaging", I have to divide the image into portions of various shapes and sizes (such portions are called "bins").
Here is an image of the bins (the right panel shows the same image of the bins, zoomed in to better show what the bins are).
So, I have another FITS file (called bins.fits) with the same dimensions as fake_data_1, and I just want to find the correspondence between these 2 files, because I want to calculate the mean and the std of the distribution of stars in the several bins.
Alternatively, I have a text file which contains the info on which pixel belongs to a specific bin, for example:
x   y   bin
1   1   34
1   2   34
1   3   34
...
34  56  37
34  57  37
34  58  37
and so on. The bins.fits file has size (564, 585), and so does fake_data_1, after changing the start and stop of the x and y ranges accordingly. I attached the whole script:
import numpy as np
import math
from astropy.io import fits

mod = np.genfromtxt('data_new_bar_scaled.txt')
# to match the correct position and size of the observation,
# I have to multiply by a factor equal to the semi-size
x = mod[:, 0]*(585-1)/200
y = mod[:, 1]*(564-1)/200
vz = mod[:, 5]
A = fits.open('NGC4277_TESIkinematic.fits')
bins = A[7].data.T
start_x = -(585-1)/2
stop_x = (585-1)/2
step_x = step  # step in x_rang_1
x_rang = np.arange(start_x, stop_x + step_x, step_x)
start_y = -(564-1)/2
stop_y = (564-1)/2
step_y = step  # step in y_rang_1
y_rang = np.arange(start_y, stop_y + step_y, step_y)
fake_data_1 = np.empty((len(x_rang), len(y_rang)))
fake_data_1[:] = np.NaN  # initialize with NaN
print(fake_data_1.shape)
print(bins.shape)
d = {}
for i in range(len(x)):
    index_for_x = math.floor((x[i] - start_x) / step_x)
    index_for_y = math.floor((y[i] - start_y) / step_y)
    if 0 <= index_for_x < len(x_rang) and 0 <= index_for_y < len(y_rang):
        key = (x_rang[index_for_x], y_rang[index_for_y])
        if key in d:
            d[key].append(vz[i])
        else:
            d[key] = [vz[i]]
bb = np.unique(bins)
print(len(bb))
for i, x in enumerate(x_rang):
    for j, y in enumerate(y_rang):
        key = (x, y)
        for z in range(len(bb)):
            j, k = np.where(bb[z] == bins)
            print('index :', z)
            if key in d:
                fake_data_1[j, k] = np.mean(d[key])

Your code is so slow because the nested loops iterate over the million stars 6400 (80*80) times. You can improve the performance by using a dictionary and iterating over the million stars just once.
You can try the following code, which is roughly 6400 times faster:
import numpy as np
import math
from astropy.io import fits

mod = np.genfromtxt('data_bar_region.txt')
x = list(mod[:, 0])
y = list(mod[:, 1])
vz = mod[:, 5]
x_rang_1 = np.arange(-40, 41, 1)
y_rang_1 = np.arange(-40, 41, 1)
fake_data_1 = np.empty((len(x_rang_1), len(y_rang_1)))
fake_data_1[:] = np.NaN  # initialize with NaN
d = {}
for i in range(len(x)):
    key = (math.floor(x[i]), math.floor(y[i]))
    if key in d:
        d[key].append(vz[i])
    else:
        d[key] = [vz[i]]
for i, x in enumerate(x_rang_1):
    for j, y in enumerate(y_rang_1):
        key = (x, y)
        if key in d:
            fake_data_1[i, j] = np.mean(d[key])
hdu1 = fits.PrimaryHDU(fake_data_1)
hdu1.writeto('TEST.fits')
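As a side note, the if key in d branching can be avoided with collections.defaultdict, which creates the empty list on first access. A small sketch of the same accumulation loop:
from collections import defaultdict

d = defaultdict(list)
for xi, yi, vzi in zip(x, y, vz):
    d[(math.floor(xi), math.floor(yi))].append(vzi)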
UPDATE
For a generalized version with arbitrary steps in x_rang_1 (or y_rang_1), you can try the following code:
import numpy as np
import math
from astropy.io import fits

mod = np.genfromtxt('data_bar_region.txt')
x = list(mod[:, 0])
y = list(mod[:, 1])
vz = mod[:, 5]
start_x_rang_1 = -40
stop_x_rang_1 = 40
step_x_rang_1 = 0.5  # step in x_rang_1
x_rang_1 = np.arange(start_x_rang_1, stop_x_rang_1 + step_x_rang_1, step_x_rang_1)
start_y_rang_1 = -40
stop_y_rang_1 = 40
step_y_rang_1 = 1  # step in y_rang_1
y_rang_1 = np.arange(start_y_rang_1, stop_y_rang_1 + step_y_rang_1, step_y_rang_1)
fake_data_1 = np.empty((len(x_rang_1), len(y_rang_1)))
fake_data_1[:] = np.NaN  # initialize with NaN
d = {}
for i in range(len(x)):
    index_for_x_rang_1 = math.floor((x[i] - start_x_rang_1) / step_x_rang_1)
    index_for_y_rang_1 = math.floor((y[i] - start_y_rang_1) / step_y_rang_1)
    if 0 <= index_for_x_rang_1 < len(x_rang_1) and 0 <= index_for_y_rang_1 < len(y_rang_1):
        key = (x_rang_1[index_for_x_rang_1], y_rang_1[index_for_y_rang_1])
        if key in d:
            d[key].append(vz[i])
        else:
            d[key] = [vz[i]]
for i, x in enumerate(x_rang_1):
    for j, y in enumerate(y_rang_1):
        key = (x, y)
        if key in d:
            fake_data_1[i, j] = np.mean(d[key])
hdu1 = fits.PrimaryHDU(fake_data_1)
hdu1.writeto('TEST.fits')
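If SciPy is an option, the whole binning-and-averaging step can also be done in a single vectorized call with scipy.stats.binned_statistic_2d; empty bins come out as NaN, matching the NaN initialization above. A minimal sketch, with synthetic data standing in for the real columns of data_bar_region.txt:
import numpy as np
from scipy import stats

# synthetic stand-ins for the x, y, vz columns read from the text file
rng = np.random.default_rng(0)
x = rng.uniform(-40, 40, 10000)
y = rng.uniform(-40, 40, 10000)
vz = rng.normal(0, 100, 10000)

x_edges = np.arange(-40, 40.5, 0.5)  # bin edges, step 0.5
y_edges = np.arange(-40, 41, 1)      # bin edges, step 1
mean_vz, _, _, _ = stats.binned_statistic_2d(
    x, y, vz, statistic='mean', bins=[x_edges, y_edges])
# mean_vz[i, j] is the mean vz of the stars falling in the (i, j) bin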
UPDATE 2
Maybe like the following?
Suppose the inputs are
x    y    vz
0    0.1  10
1.8  0    4
1.2  1.9  5.2
bins = np.array(
    [[34, 35, 34, 34, 36],
     [37, 36, 34, 35, 36],
     [34, 35, 37, 36, 34]])  # shape: (3, 5)
Is the following code what you want?
import numpy as np
import math

x = np.array([0, 1.8, 1.2])
y = np.array([0.1, 0, 1.9])
vz = np.array([10, 4, 5.2])
start_x_rang_1 = 0
stop_x_rang_1 = 2
step_x_rang_1 = 1  # step in x_rang_1
x_rang_1 = np.arange(start_x_rang_1, stop_x_rang_1 + step_x_rang_1, step_x_rang_1)
start_y_rang_1 = 0
stop_y_rang_1 = 0.5
step_y_rang_1 = 2  # step in y_rang_1
y_rang_1 = np.arange(start_y_rang_1, stop_y_rang_1 + step_y_rang_1, step_y_rang_1)
fake_data_1 = np.empty((len(x_rang_1), len(y_rang_1)))  # shape: (3, 2)
fake_data_1[:] = np.NaN  # initialize with NaN
bins = np.array(
    [[34, 35, 34, 34, 36],
     [37, 36, 34, 35, 36],
     [34, 35, 37, 36, 34]])  # shape: (3, 5)
d_bins = {}
for i in range(len(x)):
    index_for_x_rang_1 = math.floor((x[i] - start_x_rang_1) / step_x_rang_1)
    index_for_y_rang_1 = math.floor((y[i] - start_y_rang_1) / step_y_rang_1)
    if 0 <= index_for_x_rang_1 < len(x_rang_1) and 0 <= index_for_y_rang_1 < len(y_rang_1):
        key = bins[index_for_x_rang_1, index_for_y_rang_1]
        if key in d_bins:
            d_bins[key].append(vz[i])
        else:
            d_bins[key] = [vz[i]]
d_bins_mean = {}
for b in d_bins:
    d_bins_mean[b] = np.mean(d_bins[b])
get_corresponding_mean = np.vectorize(lambda x: d_bins_mean.get(x, np.NaN))
result = get_corresponding_mean(bins)
print(result)
which prints
[[10. nan 10. 10. nan]
[ 4.6 nan 10. nan nan]
[10. nan 4.6 nan 10. ]]
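Regarding the mean and the std per bin: if pandas is available, both statistics can be computed from the per-star bin labels in a couple of lines. A sketch using the same toy inputs as above:
import numpy as np
import pandas as pd

x = np.array([0, 1.8, 1.2])
y = np.array([0.1, 0, 1.9])
vz = np.array([10, 4, 5.2])
bins = np.array([[34, 35, 34, 34, 36],
                 [37, 36, 34, 35, 36],
                 [34, 35, 37, 36, 34]])

# per-star pixel indices: unit step in x, step 2 in y, as in the example above
ix = np.floor((x - 0) / 1).astype(int)
iy = np.floor((y - 0) / 2).astype(int)

df = pd.DataFrame({'bin': bins[ix, iy], 'vz': vz})
print(df.groupby('bin')['vz'].agg(['mean', 'std']))  # mean and std per bin label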


find max value in islands defined by other vector

I have a vector of values vals, a same-dimension vector of frequencies freqs, and a set of frequency values pins.
I need to find the max values of vals within the corresponding interval around each pin (from pin-1 to pin+1). However, the intervals merge if they overlap (e.g., [1,2] and [0.5,1.5] become [0.5,2]).
I have code that (I think) works, but I feel it is not optimal at all:
import numpy as np

np.random.seed(666)
freqs = np.linspace(0, 20, 50)
vals = np.random.randint(100, size=(len(freqs), 1)).flatten()
print(freqs)
print(vals)
pins = [2, 6, 10, 11, 15, 15.2]

# find one interval for every pin and then sum to find final ones
islands = np.zeros((len(freqs), 1)).flatten()
for pin in pins:
    island = np.zeros((len(freqs), 1)).flatten()
    island[(freqs >= pin-1) * (freqs <= pin+1)] = 1
    islands += island
islands = np.array([1 if x > 0 else 0 for x in islands])
print(islands)

maxs = []
k = 0
idxs = []
for i, x in enumerate(islands):
    if (x > 0) and (k == 0):    # island begins
        k += 1
        idxs.append(i)
    elif (x > 0) and (k > 0):   # island continues
        pass
    elif (x == 0) and (k > 0):  # island finishes
        idxs.append(i)
        maxs.append(np.max(vals[idxs[0]:idxs[1]]))
        k = 0
        idxs = []
        continue
print(maxs)
This gives maxs = [73, 97, 79, 77].
Here are some optimizations for your code. There are many NumPy functions that make your life easier; get to know them and use them ;). I tried commenting my code to make it as understandable as possible, but let me know if anything is unclear!
import numpy as np

np.random.seed(666)
freqs = np.linspace(0, 20, 50)
vals = np.random.randint(100, size=(len(freqs), 1)).flatten()
print(freqs)
print(vals)
pins = [2, 6, 10, 11, 15, 15.2]

# find one interval for every pin and then sum to find final ones
islands = np.zeros_like(freqs)  # instead of: np.zeros((len(freqs), 1)).flatten()
for pin in pins:
    island = np.zeros_like(freqs)  # see above comment
    island[(freqs >= pin-1) & (freqs <= pin+1)] = 1  # "&" makes it more readable
    islands += island
# instead of np.array([1 if x > 0 else 0 for x in islands])
islands = np.where(islands > 0, 1, 0)  # read as: where "islands > 0" put a '1', else put a '0'
# compare each value with the next to get island/sea transitions (islands are 1's, seas are 0's)
island_edges = islands[:-1] != islands[1:]
# split at the edges (+1 to account for the comparisons starting at index 1);
# islands_and_seas is a list of 'seas' and 'islands'
islands_and_seas = np.split(islands, np.where(island_edges)[0]+1)
# do the same as above but on the 'vals' array
islands_and_seas_vals = np.split(vals, np.where(island_edges)[0]+1)
# get the max values for the seas and islands
max_vals = np.array([np.max(arr) for arr in islands_and_seas_vals])
# create an array where the islands -> True, and seas -> False
islands_and_seas_bool = [np.all(arr) for arr in islands_and_seas]
# select only the max values of the islands with the boolean mask
maxs = max_vals[islands_and_seas_bool]
print(maxs)
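For what it's worth, the remaining Python-level list comprehensions can be dropped too. A sketch of a fully vectorized variant of the same segment logic with np.maximum.reduceat, assuming islands and vals as defined above:
import numpy as np

# indices where a new segment (island or sea) starts
starts = np.r_[0, np.flatnonzero(islands[:-1] != islands[1:]) + 1]
# per-segment maxima in a single vectorized call
seg_max = np.maximum.reduceat(vals, starts)
# keep only the segments that are islands (their first element is 1)
maxs = seg_max[islands[starts] == 1]
print(maxs)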

Reorder Sankey diagram vertically based on label value

I'm trying to plot patient flows between 3 clusters in a Sankey diagram. I have a pd.DataFrame counts with from-to values, see below. To reproduce this DF, here is the counts dict that should be loaded into a pd.DataFrame (which is the input for the visualize_cluster_flow_counts function).
from to value
0 C1_1 C1_2 867
1 C1_1 C2_2 405
2 C1_1 C0_2 2
3 C2_1 C1_2 46
4 C2_1 C2_2 458
... ... ... ...
175 C0_20 C0_21 130
176 C0_20 C2_21 1
177 C2_20 C1_21 12
178 C2_20 C0_21 0
179 C2_20 C2_21 96
The from and to values in the DataFrame represent the cluster number (either 0, 1, or 2) and the amount of days for the x-axis (between 1 and 21). If I plot the Sankey diagram with these values, this is the result:
Code:
import plotly.graph_objects as go

def visualize_cluster_flow_counts(counts):
    all_sources = list(set(counts['from'].values.tolist() + counts['to'].values.tolist()))
    froms, tos, vals, labs = [], [], [], []
    for index, row in counts.iterrows():
        froms.append(all_sources.index(row.values[0]))
        tos.append(all_sources.index(row.values[1]))
        vals.append(row[2])
        labs.append(row[3])
    fig = go.Figure(data=[go.Sankey(
        arrangement='snap',
        node=dict(
            pad=15,
            thickness=5,
            line=dict(color="black", width=0.1),
            label=all_sources,
            color="blue"
        ),
        link=dict(
            source=froms,
            target=tos,
            value=vals,
            label=labs
        ))])
    fig.update_layout(title_text="Patient flow between clusters over time: 48h (2 days) - 504h (21 days)", font_size=10)
    fig.show()

visualize_cluster_flow_counts(counts)
However, I would like to vertically order the bars so that the C0's are always on top, the C1's are always in the middle, and the C2's are always at the bottom (or the other way around, it doesn't matter). I know that we can set node.x and node.y to manually assign the coordinates. So, I set the x-values to the number of days * (1/range of days), which is an increment of about 0.045. And I set the y-values based on the cluster value: either 0, 0.5 or 1. I then obtain the image below. The vertical order is good, but the vertical margins between the bars are obviously way off; they should be similar to the first result.
The code to produce this is:
import plotly.graph_objects as go

def find_node_coordinates(sources):
    x_nodes, y_nodes = [], []
    for s in sources:
        # Shift each x with +- 0.045
        x = float(s.split("_")[-1]) * (1/21)
        x_nodes.append(x)
        # Choose either 0, 0.5 or 1 for the y-value
        cluster_number = s[1]
        if cluster_number == "0": y = 1
        elif cluster_number == "1": y = 0.5
        else: y = 1e-09
        y_nodes.append(y)
    return x_nodes, y_nodes

def visualize_cluster_flow_counts(counts):
    all_sources = list(set(counts['from'].values.tolist() + counts['to'].values.tolist()))
    node_x, node_y = find_node_coordinates(all_sources)
    froms, tos, vals, labs = [], [], [], []
    for index, row in counts.iterrows():
        froms.append(all_sources.index(row.values[0]))
        tos.append(all_sources.index(row.values[1]))
        vals.append(row[2])
        labs.append(row[3])
    fig = go.Figure(data=[go.Sankey(
        arrangement='snap',
        node=dict(
            pad=15,
            thickness=5,
            line=dict(color="black", width=0.1),
            label=all_sources,
            color="blue",
            x=node_x,
            y=node_y,
        ),
        link=dict(
            source=froms,
            target=tos,
            value=vals,
            label=labs
        ))])
    fig.update_layout(title_text="Patient flow between clusters over time: 48h (2 days) - 504h (21 days)", font_size=10)
    fig.show()

visualize_cluster_flow_counts(counts)
Question: how do I fix the margins of the bars, so that the result looks like the first result? So, for clarity: the bars should be pushed to the bottom. Or is there another way that the Sankey diagram can vertically re-order the bars automatically based on the label value?
Firstly, I don't think there is a way with the currently exposed API to achieve your goal smoothly; you can check the source code here.
Try to change your find_node_coordinates function as follows (note that you now also pass the counts DataFrame to it):
import pandas as pd

counts = pd.DataFrame(counts_dict)

def find_node_coordinates(sources, counts):
    x_nodes, y_nodes = [], []
    flat_on_top = False
    range = 1  # The y range
    total_margin_width = 0.15
    y_range = 1 - total_margin_width
    margin = total_margin_width / 2  # From number of Cs
    srcs = counts['from'].values.tolist()
    dsts = counts['to'].values.tolist()
    values = counts['value'].values.tolist()
    max_acc = 0

    def _calc_day_flux(d=1):
        _max_acc = 0
        for i in [0, 1, 2]:
            # The first ones
            from_source = 'C{}_{}'.format(i, d)
            indices = [i for i, val in enumerate(srcs) if val == from_source]
            for j in indices:
                _max_acc += values[j]
        return _max_acc

    def _calc_node_io_flux(node_str):
        c, d = int(node_str.split('_')[0][-1]), int(node_str.split('_')[1])
        _flux_src = 0
        _flux_dst = 0
        indices_src = [i for i, val in enumerate(srcs) if val == node_str]
        indices_dst = [j for j, val in enumerate(dsts) if val == node_str]
        for j in indices_src:
            _flux_src += values[j]
        for j in indices_dst:
            _flux_dst += values[j]
        return max(_flux_dst, _flux_src)

    max_acc = _calc_day_flux()
    graph_unit_per_val = y_range / max_acc
    print("Graph Unit per Acc Val", graph_unit_per_val)

    for s in sources:
        # Shift each x with +- 0.045
        d = int(s.split("_")[-1])
        x = float(d) * (1/21)
        x_nodes.append(x)
        print(s, _calc_node_io_flux(s))
        # Choose either 0, 0.5 or 1 for the y-value
        cluster_number = s[1]
        # Flat on top
        if flat_on_top:
            if cluster_number == "0":
                y = _calc_node_io_flux('C{}_{}'.format(2, d))*graph_unit_per_val + margin + _calc_node_io_flux('C{}_{}'.format(1, d))*graph_unit_per_val + margin + _calc_node_io_flux('C{}_{}'.format(0, d))*graph_unit_per_val/2
            elif cluster_number == "1":
                y = _calc_node_io_flux('C{}_{}'.format(2, d))*graph_unit_per_val + margin + _calc_node_io_flux('C{}_{}'.format(1, d))*graph_unit_per_val/2
            else:
                y = 1e-09
        # Flat on bottom
        else:
            if cluster_number == "0":
                y = 1 - (_calc_node_io_flux('C{}_{}'.format(0, d))*graph_unit_per_val / 2)
            elif cluster_number == "1":
                y = 1 - (_calc_node_io_flux('C{}_{}'.format(0, d))*graph_unit_per_val + margin + _calc_node_io_flux('C{}_{}'.format(1, d))*graph_unit_per_val / 2)
            elif cluster_number == "2":
                y = 1 - (_calc_node_io_flux('C{}_{}'.format(0, d))*graph_unit_per_val + margin + _calc_node_io_flux('C{}_{}'.format(1, d))*graph_unit_per_val + margin + _calc_node_io_flux('C{}_{}'.format(2, d))*graph_unit_per_val / 2)
        y_nodes.append(y)
    return x_nodes, y_nodes
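Since the signature changed, the call inside visualize_cluster_flow_counts has to pass the DataFrame along as well, i.e.:
node_x, node_y = find_node_coordinates(all_sources, counts)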
Sankey graphs are supposed to weight their connection widths by the corresponding normalized values, right? Here I do the same: first each node's flux is calculated, then the center of each node is computed in normalized coordinates according to that flux.
Here is the sample output of your code with the modified function. Note that I tried to adhere to your code as much as possible, so it's a bit unoptimized (for example, one could store the values of the nodes above each specified source node to avoid recalculating its flux).
With flag flat_on_top = True
With flag flat_on_top = False
There is a bit of inconsistency in the flat-on-bottom version (flat_on_top = False), which I think is caused by the padding or other internals of the Plotly API.

How to declare a variable in python without assigning a value?

I'm trying to create graphs of the Mandelbrot set. I have managed to do this by iterating over a lot of points but this takes a lot of processing power, so I'm now trying to generate a polynomial by iterating f(z) = z**2 + c many times and then finding the roots for z = c, in order to generate a boundary of the set.
However, I can't seem to get Python to generate the polynomial; any help would be much appreciated.
Edit: Implemented azro's fix, but now I get the error TypeError: unsupported operand type(s) for ** or pow(): 'NoneType' and 'int'.
Code so far:
import numpy as np

c = None

def f(z):
    return z**2 + c

eqn = c
for i in range(100):
    eqn = f(eqn)
np.roots(eqn)
This is a very hard problem. Searching through the literature, I only found this (which doesn't seem very reputable). However, it does seem to begin to create what you want. This is only up to 8 iterations, so the polynomial gets very complicated very fast. See the following code:
import numpy as np
import matplotlib.pyplot as plt

# coefficients after 5 iterations (degree 16):
coeff = [0, 1, 1, 2, 5, 14, 26, 44, 69, 94, 114, 116, 94, 60, 28, 8, 1]
# coefficients after 8 iterations (degree 128), overriding the list above:
coeff = [0, 1, 1, 2, 5, 14, 42, 132, 429, 1302, 3774, 10652, 29538, 80812, 218324, 582408, 1534301, 3993030, 10269590, 26108844, 65626918, 163107044, 400844588, 974083128, 2340595778, 5560968284, 13062923500, 30336029592, 69640352964, 158015533208, 354347339496, 785248461712, 1719477330477, 3720187393990, 7952125694214, 16792863663700, 35031835376454, 72188854953372, 146932182777116, 295372837865192, 586400982013486, 1149605839249820, 2225301467579844, 4252710138415640, 8022825031835276, 14938862548001560, 27452211062573400, 49778848242964944, 89054473147697354, 157160523515654628, 273551721580800380, 469540646039042536, 794643418760272876, 1325752376790240280, 2180053774442766712, 3532711259225506384, 5640327912922026260, 8870996681171366696, 13741246529612440920, 20959276151880728336, 31472438318100876584, 46514944583399578896, 67649247253332557392, 96791719611591962592, 136210493669590627493, 188481251186354006062, 256386228250001079082, 342743629811082484420, 450159936955994386738, 580706779030058464252, 735537050036491961156, 914470757914434625800, 1115597581733327913554, 1334957092752100409132, 1566365198635995978988, 1801452751402955781592, 2029966595320794439668, 2240353897304462193848, 2420609646335251593480, 2559320275988283588176, 2646791812246207696810, 2676118542978972739644, 2644036970936308845148, 2551425591643957182856, 2403354418943890067404, 2208653487832260558008, 1979045408073272278264, 1727958521630464742736, 1469189341596552030212, 1215604411161527170376, 978057923319151340728, 764655844340519788496, 580430565842543266504, 427417353874088245520, 305060580205223726864, 210835921361505594848, 140960183546144741182, 91071943593142473900, 56796799826096529620, 34150590308701283528, 19772322481956974532, 11008161481780603512, 5884917700519129288, 3016191418506637264, 1479594496462756340, 693434955498545848, 309881648709683160, 131760770157606224, 53181959591958024, 20324543852025936, 7333879739219600, 2490875091238112, 793548088258508, 236221241425176, 65418624260840, 16771945556496, 3958458557608, 854515874096, 167453394320, 29524775520, 4634116312, 639097008, 76185104, 7685024, 637360, 41696, 2016, 64, 1]
r = np.roots(coeff)
plt.plot(r.real, r.imag, '.')
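For the generation step itself, the coefficient lists above don't have to be typed in by hand: numpy.polynomial can compose f(z) = z**2 + c symbolically in c. A sketch, assuming numpy.polynomial's lowest-order-first convention; starting from p(c) = c, four passes reproduce the degree-16 list above:
import numpy as np
from numpy.polynomial import polynomial as P

p = np.array([0.0, 1.0])   # p(c) = c, coefficients lowest order first
for _ in range(4):         # degree doubles each pass: 2, 4, 8, 16
    p = P.polyadd(P.polymul(p, p), [0.0, 1.0])  # p <- p**2 + c

print(p.astype(int))       # [0 1 1 2 5 14 26 44 69 94 114 116 94 60 28 8 1]
roots = np.roots(p[::-1])  # np.roots expects the highest-order coefficient first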
I would suggest something more like the following (stolen and modified from here). This sounds like something you've already tried. But try changing the max iterations to get something that can run relatively fast (30 was fast and had relatively high resolution for me).
import numpy as np
import matplotlib.pyplot as plt

MAX_ITER = 30

def mandelbrot(c):
    z = 0
    n = 0
    while abs(z) <= 2 and n < MAX_ITER:
        z = z*z + c
        n += 1
    return n

# Image size (pixels)
WIDTH = 600
HEIGHT = 400

# Plot window
RE_START = -2
RE_END = 1
IM_START = -1
IM_END = 1

img = np.zeros((WIDTH, HEIGHT))
for x in range(0, WIDTH):
    for y in range(0, HEIGHT):
        # Convert pixel coordinate to complex number
        c = complex(RE_START + (x / WIDTH) * (RE_END - RE_START),
                    IM_START + (y / HEIGHT) * (IM_END - IM_START))
        # Compute the number of iterations
        m = mandelbrot(c)
        if m > MAX_ITER - 1:
            img[x, y] = 1
plt.imshow(img.T, cmap='bone')
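Since the original motivation was processing power, here is a sketch of the same escape-time computation vectorized with NumPy, which replaces the per-pixel Python loop with whole-array operations (same window and iteration count as above):
import numpy as np
import matplotlib.pyplot as plt

MAX_ITER = 30
WIDTH, HEIGHT = 600, 400

re = np.linspace(-2, 1, WIDTH)
im = np.linspace(-1, 1, HEIGHT)
c = re[:, None] + 1j * im[None, :]   # (WIDTH, HEIGHT) grid of complex points

z = np.zeros_like(c)
mask = np.ones(c.shape, dtype=bool)  # points that have not escaped yet
for _ in range(MAX_ITER):
    z[mask] = z[mask] ** 2 + c[mask] # iterate only the surviving points
    mask &= np.abs(z) <= 2
plt.imshow(mask.T, cmap='bone')      # surviving points approximate the set
plt.show()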

Python Numpy Array indexing

I am having a small difficulty with NumPy indexing. The script gives only the index of the last array three times, when it is supposed to give the indices of three different arrays (F_fit in the script). I am sure it is a simple thing, but I haven't figured it out yet. The 3_phases.txt file contains these 3 lines:
1 -1 -1 -1 1 1
1 1 1 -1 1 1
1 1 -1 -1 -1 1
Here is the code:
import numpy as np
import matplotlib.pyplot as plt

D = 12.96
n = np.arange(1, 7)
F0 = 1.0
x = np.linspace(0.001, 4, 2000)
Q = 2*np.pi*np.array([1/D, 2/D, 3/D, 4/D, 5/D, 6/D])
I = (11.159, 43.857, 26.302, 2.047, 0.513, 0.998)
phase = np.genfromtxt('3_phases.txt')

for row in phase:
    F = (np.sqrt(np.square(n)*I/sum(I)))*row
    d = sum(i*(np.sin(x*D/2+np.pi*j)/(x*D/2+np.pi*j)) for i, j in zip(F, n))
    e = sum(i*(np.sin(x*D/2-np.pi*j)/(x*D/2-np.pi*j)) for i, j in zip(F, n))
    f_0 = F0*(np.sin(x*D/2)/(x*D/2))
    F_cont = np.array(d) + np.array(e) + np.array(f_0)
    plt.plot(x, F_cont, 'r')
    #plt.show()
    plt.clf()

D2 = 12.3
I2 = (9.4, 38.6, 8.4, 3.25, 0, 0.37)
Q2 = 2*np.pi*np.array([1/D2, 2/D2, 3/D2, 4/D2, 5/D2, 6/D2])
n2 = np.arange(1, 7)

for row in phase:
    F2 = (np.sqrt(np.square(n2)*I2/sum(I2)))*row
    plt.plot(Q2, F2, 'o')
    #plt.show()
    F_data = F2
    Q_data = Q2
    I_data = np.around(2000*Q2/(4-0.001))
    I_data = np.array(map(int, I_data))
    F_fit = F_cont[I_data]
    print F_fit
    R2 = (1-(sum(np.square(F_data-F_fit))/sum(np.square(F_data-np.mean(F_data)))))
Any help would be appreciated.
You are redefining F_cont each time you go through your first loop. By the time you get to your second loop (with all the _2 values) you only have access to the F_cont from the last row.
To fix this, move your _2 definitions above your first loop and only do the loop once; then you'll have access to each F_cont and your printouts will be different.
The following code is identical to yours except for the rearrangement described above, as well as the fact that I implemented my comment from above (using n/D in your Q's).
import numpy as np
import matplotlib.pyplot as plt

D = 12.96
n = np.arange(1, 7)
F0 = 1.0
x = np.linspace(0.001, 4, 2000)
Q = 2*np.pi*n/D
I = (11.159, 43.857, 26.302, 2.047, 0.513, 0.998)
phase = np.genfromtxt('3_phases.txt')

D2 = 12.3
I2 = (9.4, 38.6, 8.4, 3.25, 0, 0.37)
Q2 = 2*np.pi*n/D2
n2 = np.arange(1, 7)

for row in phase:
    F = (np.sqrt(np.square(n)*I/sum(I)))*row
    d = sum(i*(np.sin(x*D/2+np.pi*j)/(x*D/2+np.pi*j)) for i, j in zip(F, n))
    e = sum(i*(np.sin(x*D/2-np.pi*j)/(x*D/2-np.pi*j)) for i, j in zip(F, n))
    f_0 = F0*(np.sin(x*D/2)/(x*D/2))
    F_cont = np.array(d) + np.array(e) + np.array(f_0)
    plt.plot(x, F_cont, 'r')
    plt.clf()
    F2 = (np.sqrt(np.square(n2)*I2/sum(I2)))*row
    plt.plot(Q2, F2, 'o')
    F_data = F2
    Q_data = Q2
    I_data = np.around(2000*Q2/(4-0.001))
    I_data = np.array(map(int, I_data))
    F_fit = F_cont[I_data]
    print F_fit
    R2 = (1-(sum(np.square(F_data-F_fit))/sum(np.square(F_data-np.mean(F_data)))))
F_fit is being calculated from I_data, which is in turn calculated from Q2. Q2 is set outside the loop and doesn't depend on row; perhaps you meant I_data to be a function of F2 instead?
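As an aside, F_cont[I_data] is NumPy fancy indexing: passing an array of integers picks out one element per index, which is why a wrong I_data silently selects the wrong samples. A minimal illustration:
import numpy as np

a = np.linspace(0.0, 1.0, 5)   # [0.  , 0.25, 0.5 , 0.75, 1.  ]
idx = np.array([0, 2, 4])
print(a[idx])                  # [0.  0.5 1. ]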

For loop, repetitive calculation in Python

from random import uniform

prob = [0.25, 0.30, 0.45]

def onetrial(prob):
    u = uniform(0, 1)
    if 0 < u <= prob[0]:
        return 11
    if prob[0] < u <= prob[0]+prob[1]:
        return 23
    if prob[0]+prob[1] < u <= prob[0]+prob[1]+prob[2]:
        return 39

print onetrial(prob)
I wonder how to reduce the repetitive part in the function using some for-loop technique. Thanks.
The following is equivalent to your current code and it uses a for loop:
from random import uniform

prob = [0.25, 0.30, 0.45]

def onetrial(prob):
    u = uniform(0, 1)
    return_values = [11, 23, 39]
    total_prob = 0
    for i in range(3):
        total_prob += prob[i]
        if u <= total_prob:
            return return_values[i]
I am a little unclear on the relationship between the values you return and the probabilities; it seems like for your code prob will always have exactly 3 elements, so I made that assumption as well.
I like F.J's answer, but I would use a list of tuples, assuming you can easily do so:
from random import uniform

prob = [(0.25, 11), (0.30, 23), (0.45, 39)]

def onetrial(prob):
    u = uniform(0, 1)
    total_prob = 0
    for i in range(3):
        total_prob += prob[i][0]
        if u <= total_prob:
            return prob[i][1]
Assuming you call onetrial frequently, calculate the CDF first to make it a bit faster:
from random import uniform

vals = [11, 23, 39]
prob = [0.25, 0.30, 0.45]
cdf = [sum(prob[0:i+1]) for i in xrange(3)]

def onetrial(vals, cdf):
    u = uniform(0, 1)
    for i in range(3):
        if u <= cdf[i]:
            return vals[i]
You could use bisect to make it even faster.
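For instance, a sketch of the bisect variant: a binary search on the precomputed CDF makes each trial O(log n) instead of O(n), and bisect_left preserves the boundary behaviour of the original <= comparisons:
from bisect import bisect_left
from random import uniform

vals = [11, 23, 39]
cdf = [0.25, 0.55, 1.0]  # running sums of prob

def onetrial(vals, cdf):
    u = uniform(0, 1)
    return vals[bisect_left(cdf, u)]  # index of the first cdf entry >= u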
