I am used to working with Python and am just getting used to MATLAB. I am trying to write a for loop in MATLAB similar to this:
x_temp = x[0]
for i in range(0, 400):
    if x[i] >= x_temp:
        x_temp = x[i]
print(x_temp)
I tried
N = 401;
x = linspace(-20,20,N);
dt = 0.0002;
t = 0:dt:2;
x_temp = x(0);
for j = 2:length(t)
    if x(j) >= x_temp
        x_temp = x(j);
    end
end
disp(x_temp);
but I get an error saying 'Array indices must be positive integers or logical values.' Could anyone please explain how I should index vectors properly in MATLAB?
Arrays start at 1 in MATLAB, not 0 like in Python. It's kind of a pain, but you'll get used to it.
It's not really clear what you are trying to compute here, since it's just a fragment, but it looks like you want the largest element in the array. There's no need for a for loop; just use the max function on the subset of the array you want to test:
[value, index] = max(x(2:length(t)));
In general, what makes MATLAB attractive for math/science work is its powerful built-in functions. Never write a for loop before you check whether there's a simple function or one-line vector operation that gives the same result (and in most cases, much more quickly).
I have built a Python program for processing the probability of various datasets. Entering various mean values and standard deviations manually works, but I need to automate it so that I can upload all my data through a text or CSV file. I've got some way, but now I have a nested for loop problem, I think with the indices. Some background follows.
My code works for a small dataset where I can manually key in 6-8 parameters, but now I need to automate it and upload inputs of unknown sizes via CSV/text file. I am copying my existing code and amending it where appropriate, but I have run into a problem.
I have a 2-D numpy array in which the probabilities in each row have been reverse sorted. I have a second array which gives the 68.3% threshold value for each row, and I want to trim off the low-value 31.7% of the data.
I need a solution which can handle an unspecified number of rows.
My pre-existing code, which worked for a single one-dimensional array, was:
prob_combine_sum = np.sum(prob_combine)

# Reverse sort the probabilities
prob_combine_sorted = sorted(prob_combine, reverse=True)

# Calculate 1 SD from peak prob by multiplying the total prob by 68.3%
sixty_eight_percent = prob_combine_sum * 0.68269

# Loop over the sorted list and append the 1 SD data into a list,
# onesd_prob_combine
onesd_prob_combine = []
for i in prob_combine_sorted:
    onesd_prob_combine.append(i)
    if sum(onesd_prob_combine) > sixty_eight_percent:
        break
That worked. However, now I have a multi-dimensional array, and I want to take the 1 standard deviation data from it and stick it in another array.
There's probably more than one way of doing this, but I thought I would stick with the for loop, which is now complicated by the indices. I need to preserve the data structure, and I need to be able to handle an unlimited number of rows in the future.
I simulated some data, and if I can get this to work with it, I should be able to put it in my program.
sorted_probabilities = np.asarray([[9, 8, 7, 6, 5, 4, 3, 2, 1],
                                   [87, 67, 54, 43, 32, 22, 16, 14, 2],
                                   [100, 99, 78, 65, 45, 43, 39, 22, 3],
                                   [67, 64, 49, 45, 42, 40, 28, 23, 17]])
sd_test = np.asarray([30.7215, 230.0699, 306.5323, 256.0125])
target_array = np.zeros(4).reshape(4, 1)
# Task: transfer data from sorted_probabilities to target_array on the
# condition that the value in each target row is less than the value in
# the sd_test array.
# Ignore the problem that the data transferred won't add up to 68.3%.
# My real data sample is very big; I just need a way of trimming
# and transferring.
for row in sorted_probabilities:
    for element in row:
        target_array[row].append[i]
        if sum(target[row]) > sd_test[row]:
            break
Error: IndexError: index 9 is out of bounds for axis 0 with size 4
I know it's not a very good attempt. My problem is that I need a solution which will work for any 2D array, not just one with 4 rows.
I'd be really grateful for any help.
Thank you
Edit:
Can someone help me out with this? I am struggling.
I think the reason my loop will not work is that the 'index' I am using is not a number but, in this case, a row. I will have a think about this; in the meantime, does anyone have a solution?
Thanks
I tried the following code after reading the comments:
for counter, value in enumerate(sorted_probabilities):
    for i, element in enumerate(value):
        target_array[counter] = sorted_probabilities[counter][element]
        if target_array[counter] > sd_test[counter]:
            break
I get an error: IndexError: index 9 is out of bounds for axis 0 with size 9
I think it's because I am trying to add to a numpy array of pre-determined dimensions, but I am not sure. I am going to try another tack now, as I cannot do this with this approach. It's having to maintain the rows in the target array that makes it difficult. Each row relates to an object, and if I lose the structure it will be pointless.
I recommend you use pandas. You can read the CSV directly into a dataframe and do multiple operations on columns and such, clean and neat.
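For example, a minimal sketch (the file name and column layout are made up; it assumes one row of probabilities per object):

import pandas as pd

# "probabilities.csv" is a hypothetical file, one row per object
df = pd.read_csv("probabilities.csv", header=None)
row_sums = df.sum(axis=1)           # total probability per object
thresholds = row_sums * 0.68269     # the 68.3% cut-off for each row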
You are mixing numpy arrays with Python lists. Better to use only one of these (numpy is preferred). Also try to debug your code, because it has both syntax and logic errors: you don't have a variable i, though you're using it as an index, and you're using row as an index even though it is a numpy array, not an integer.
I strongly recommend that you:
0) debug your code (at least with prints);
1) use enumerate to create both of your for loops;
2) replace append with plain assignment, because you've already created an empty vector (target_array), or initialize target_array as an empty list and append into it;
3) if you want to use your solution for any 2D array, wrap your code in a function.
Try this:
sorted_probabilities = np.asarray([[9, 8, 7, 6, 5, 4, 3, 2, 1],
                                   [87, 67, 54, 43, 32, 22, 16, 14, 2],
                                   [100, 99, 78, 65, 45, 43, 39, 22, 3],
                                   [67, 64, 49, 45, 42, 40, 28, 23, 17]])
sd_test = np.asarray([30.7215, 230.0699, 306.5323, 256.0125])
target_array = np.zeros(4).reshape(4, 1)

for counter, value in enumerate(sorted_probabilities):
    for i, element in enumerate(value):
        target_array[counter] = element  # here the code that produced the error is removed
        if target_array[counter] > sd_test[counter]:
            break
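For what it's worth, here is a sketch of the cumulative trim itself, done per row. This is an assumption about the intended logic, mirroring the 1-D loop above: each row keeps everything up to and including the first element that pushes the running sum past that row's threshold. trim_rows is a made-up name, and it returns a list of lists because rows may keep different numbers of elements:

import numpy as np

def trim_rows(sorted_probabilities, thresholds):
    trimmed = []
    for row, limit in zip(sorted_probabilities, thresholds):
        running = np.cumsum(row)
        # first index where the running sum exceeds the limit
        cut = np.searchsorted(running, limit, side='right')
        trimmed.append(row[:cut + 1].tolist())
    return trimmed

trim_rows(sorted_probabilities, sd_test)
# first row: 9+8+7+6+5 = 35 > 30.7215, so [9, 8, 7, 6, 5] is kept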
I am fairly new to using TensorFlow, so it is possible there is a very obvious solution to my problem that I am missing. I currently have a 3-dimensional array filled with integer values. The specific values are not important, so I have put in a smaller array with filler values for the sake of this question:
Array = tf.constant([[[0, 0, 1000, 0], [3000, 3000, 3000, 3000], [0, 2500, 0, 0]],
                     [[100, 200, 300, 400], [0, 0, 0, 100], [300, 300, 400, 300]]]).eval()
So the array looks like this when printed, I believe:
[[[0, 0, 1000, 0],
  [3000, 3000, 3000, 3000],
  [0, 2500, 0, 0]],
 [[100, 200, 300, 400],
  [0, 0, 0, 100],
  [300, 300, 400, 300]]]
In reality this array has 23 2-D arrays stacked on top of each other. What I want to do is create an array, or 3 separate arrays, containing the range of values in each row of the different levels of the 3-D array.
Something like
Xrange = tf.constant([Array[0,0,:].range(), Array[1,0,:].range(), Array[2,0,:].range(), ..., Array[22,0,:].range()])
Firstly, I am having trouble finding a working combination of TensorFlow commands that lets me find the range of a row. I know how to do this easily in numpy but have yet to find a way here. Secondly, assuming there is a way to do the above, is there a way to consolidate the code without having to write it out 23 times in one line, once for each unique row? I know that could be done with a for loop, but I would like to avoid a solution that requires a loop. Is there a good way to do this, or is more information needed? Also, please let me know if I'm messing up my syntax, since I'm still fairly new to both Python and TensorFlow.
So, as I expected, my question has a reasonably simple answer. All that was necessary was to use the tf.reduce_max and tf.reduce_min commands.
The code I finally ended up with looks like:
Range = tf.subtract(tf.reduce_max(tf.constant(Array), axis=2, keep_dims=True),
                    tf.reduce_min(tf.constant(Array), axis=2, keep_dims=True))
This produced:
[[[1000]
  [   0]
  [2500]]

 [[ 300]
  [ 100]
  [ 100]]]
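For reference, the same per-row range can be computed in plain numpy on the example data (just a comparison sketch, not part of the TensorFlow graph):

import numpy as np

arr = np.array([[[0, 0, 1000, 0], [3000, 3000, 3000, 3000], [0, 2500, 0, 0]],
                [[100, 200, 300, 400], [0, 0, 0, 100], [300, 300, 400, 300]]])
rng = arr.max(axis=2, keepdims=True) - arr.min(axis=2, keepdims=True)
# np.ptp(arr, axis=2, keepdims=True) computes the same "peak to peak" range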
An example of what I want to do: instead of doing what is shown below,
Z_old = [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
for each_axes in range(len(Z_old)):
    for each_point in range(len(Z_old[each_axes])):
        Z_old[len(Z_old) - 1 - each_axes][each_point] = arbitrary_function(each_point, each_axes)
I now want to fill the Z_old array with values while iterating through it, rather than initializing it with zeroes. That would be something like what is written below; its syntax is horribly wrong, but it's what I want to reach in the end:
Z = np.zeros((len(x_list), len(y_list))) for Z[len(x_list) -1 - counter_1][counter_2] is equal to power_at_each_point(counter_1, counter_2] for counter_1 in range(len(x_list)) and counter_2 in range(len(y_list))]
As I explained in my answer to your previous question, you really need to vectorize arbitrary_function.
You can do this by just calling np.vectorize on the function, something like this:
Z = np.vectorize(arbitrary_function)(np.arange(3), np.arange(5).reshape(5, 1))
But that will only give you a small speedup. In your case, since arbitrary_function is doing a huge amount of work (including opening and parsing an Excel spreadsheet), it's unlikely to make enough difference to even notice, much less to solve your performance problem.
The whole point of using NumPy for speedups is to find the slow part of the code that operates on one value at a time and replace it with something that operates on the whole array (or at least a whole row or column) at once. You can't do that by looking at the outermost loop; you need to look at the innermost loop. In other words, at arbitrary_function.
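As a toy illustration of that principle (this is not your arbitrary_function, just the same arithmetic done one value at a time versus on the whole array):

import numpy as np

x = np.arange(1_000_000, dtype=float)

y_slow = np.array([v * 2.0 + 1.0 for v in x])  # Python-level loop, one value at a time
y_fast = x * 2.0 + 1.0                         # one array-wide operation, done in C

assert np.array_equal(y_slow, y_fast)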
In your case, what you probably want to do is read the Excel spreadsheet into a global array, structured in such a way that each step in your process can be written as an array-wide operation on that array. Whether that means multiplying by a slice of the array, indexing the array using your input values as indices, or something completely different, it has to be something NumPy can do for you in C, or NumPy isn't going to help you.
If you can't figure out how to do that, you may want to consider not using NumPy, and instead compiling your inner loop with Cython, or running your code under PyPy. You'll still almost certainly need to move the "open and parse a whole Excel spreadsheet" outside of the inner loop, but at least you won't have to figure out how to rethink your problem in terms of vectorized operations, so it may be easier for you.
rows = 10
cols = 10
Z = numpy.array([arbitrary_function(each_point, each_axes)
                 for each_axes in range(rows)
                 for each_point in range(cols)]).reshape((rows, cols))
maybe?
This is my first post here, so I'm sorry if I didn't follow the rules.
I recently learned Python. I know the basics, and I like writing famous sets and plotting them; I've written code for the Hofstadter sequence and a logistic sequence and succeeded in both.
Now I've tried writing the Mandelbrot sequence without any complex parameters, actually doing it "by hand".
For example, if Z(n) is my complex variable (x+iy) and C my complex number (c+ik),
I write the sequence as {x(n) = x(n-1)^2 - y(n-1)^2 + c ; y(n) = 2·x(n-1)·y(n-1) + k}
from math import *
import matplotlib.pyplot as plt

def mandel(p, u):
    X = []
    Y = []
    k = 5
    for i in range(p):
        c = 5
        k = k - 10/p
        for n in range(p):
            c = c - 10/p
            x = 0
            y = 0
            for m in range(u):
                # update x and y together so y uses x(n-1), as in the formula
                x, y = x*x - y*y + c, 2*x*y + k
                if sqrt(x*x + y*y) > 2:
                    break
            if sqrt(x*x + y*y) < 2:
                X = X + [c]
                Y = Y + [k]
        print(round((i/p)*100), "%")
    plt.plot(X, Y, '.')
    plt.show()
p is the width and the number of complex parameters I want; u is the number of iterations.
This is what I get as a result:
I think it's just a bit close to what I want.
Now for my questions: how can I make the function faster, and how can I make it better?
Thanks a lot!
A good place to start would be to profile your code.
https://docs.python.org/2/library/profile.html
Using the cProfile module or the command-line profiler, you can find the inefficient parts of your code and try to optimize them. If I had to guess without profiling it myself, your array appending is probably inefficient.
You can either use a numpy array that is premade at an appropriate size, or in pure Python you can make an array of a given size (like 50) and work through that entire array; when it fills up, append that array to your main array. This reduces the number of times the array has to be rebuilt. The same could be done with a numpy array.
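A sketch of the preallocation idea (p is illustrative and stands in for the grid size from the question):

import numpy as np

p = 100
pts = np.empty((p * p, 2))   # room for at most one point per (c, k) pair
count = 0

# inside the loops, instead of X = X + [c]; Y = Y + [k], you would do:
#     pts[count] = (c, k)
#     count += 1

X, Y = pts[:count, 0], pts[:count, 1]   # keep only the filled part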
Some quick things you could do, though:
if sqrt(x*x+y*y)>2:
should become this
if x*x+y*y>4:
Remove calls to sqrt if you can; it's faster to just square the other side of the comparison. Multiplication is cheaper than finding roots.
Another thing you could do is this.
print (round((i/p)*100),"%")
should become this
# print (round((i/p)*100),"%")
You want faster code? Remove things not related to actually plotting it.
Also, you break out of a for loop after a comparison and then make the same comparison again. Do what you want to do after the comparison and then break; there's no need to compute that twice.
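Putting those suggestions together, the escape test might be factored out like this (a sketch; survives is a made-up helper name, and the loop body is taken from the question):

def survives(c, k, u):
    """Return True if the point (c, k) never escapes within u iterations."""
    x = y = 0.0
    for m in range(u):
        x, y = x*x - y*y + c, 2*x*y + k
        if x*x + y*y > 4:   # same test as sqrt(x*x + y*y) > 2, without the sqrt
            return False    # escaped; no second comparison needed
    return True

# inside mandel's loops:
#     if survives(c, k, u):
#         X.append(c)
#         Y.append(k)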
I have some MATLAB code that I'm trying to translate to Python.
I'm new to Python, but I have been able to answer a lot of my questions by googling a little bit.
Now, though, I'm trying to figure out the following:
I have a for loop in which I apply different things to each column, but I don't know the number of columns in advance. For example:
In MATLAB, nothing is easier than this:
for n = 1:size(x,2); y(n) = mean(x(:,n)); end
But I have no idea how to do it in Python when, for example, the number of columns is 1, because I can't do x[:,1] in Python.
Any idea?
Thanks
Yes, if you use numpy you can use x[:,1], and you also get other data structures (vectors instead of lists). The main difference between MATLAB and numpy is that MATLAB uses matrices for calculations while numpy uses vectors, but you get used to it. I think this guide will help you out.
Try numpy. It is a Python binding for a high-performance math library written in C. I believe it has the same concept of matrix slice operations, and it is significantly faster than the same code written in pure Python (in most cases).
Regarding your example, I think the closest would be something using numpy.mean.
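For instance, a minimal sketch (the data is made up; axis=0 averages down each column, whatever their number):

import numpy as np

x = np.array([[1, 2, 3],
              [2, 3, 4]])
y = np.mean(x, axis=0)   # array([1.5, 2.5, 3.5]), one mean per column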
In pure Python it is hard to calculate the mean of a column, but if you are able to transpose the matrix, you could do it using something like this:
# there is no built-in avg function
def avg(lst):
    return sum(lst) / len(lst)

rows = [avg(row) for row in a]
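If the matrix is a plain list of lists, one way to do that transpose is zip(*a), so the column means become row means (a small sketch using the avg defined above):

a = [[1, 2, 3],
     [2, 3, 4]]
cols = [avg(col) for col in zip(*a)]   # [1.5, 2.5, 3.5], one mean per column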
This is one way to do it:
from numpy import *

x = matrix([[1, 2, 3], [2, 3, 4]])
[mean(x[:, n]) for n in range(shape(x)[1])]
# [1.5, 2.5, 3.5]