I am trying to find the global minimum of the Sum of Squared Differences, which contains five different parameters x[0], x[1], x[2], x[3], x[4] coming from an affine transformation of data. Since I have a lot of data to compare, the differential_evolution approach was sped up with numba, as you can see in the code below. (More details on the code can be found here: How to speed up differential_evolution to find the minimum of Sum of Squared Errors with five variables)
My problem now is that I get different values for my parameters when running the same code several times. This problem is also described in Differential evolution algorithm different results for different runs, but I use the scipy.optimize package to import differential_evolution instead of mystic. I tried to use mystic instead, but then I get the following error message:
TypeError: CPUDispatcher(<function tempsum3 at 0x000002AA043999D0>) is not a Python function
My question now is whether there is any approach that really gives me back the global minimum of my function. An approach using the Powell method with the minimize command gives back worse values for my parameters than differential_evolution (it gets stuck in local minima even faster). And if there is no other way than differential_evolution, how can I be sure that I really have the global minimum at some point? And in case the mystic approach is expedient, how do I get it going?
I would be thankful for any kind of advice.
Here's my function to minimize, using numba to speed it up, together with the mystic package, which gives me back the error message:
from math import floor
import matplotlib.pyplot as plt
import numpy as np
import mystic as my
from mystic.solvers import diffev
import numba as nb
w1=5 #A1
h1=3
w2=8#A2
h2=5
hw1=w1*h1
hw2=w2*h2
A1=np.ones((h1,w1))
A2=np.ones((h2,w2))
for n1 in np.arange(2,4):
    A1[1][n1]=1000
for n2 in np.arange(5,7):
    A2[3][n2]=1000
    A2[4][n2]=1000
#Norm the raw data
maxA1=np.max(A1)
A1=1/maxA1*A1
#Norm the raw data
maxA2=np.max(A2)
A2=1/maxA2*A2
fig, axes = plt.subplots(1, 2)
axes[0].imshow(A2)
axes[0].set_title('original-A2')
axes[1].imshow(A1)
axes[1].set_title('shifted-A1')
plt.show()
@nb.njit
def xy_to_pixelvalue(x,y,dataArray): #getting the values to the normed pixels
    height = dataArray.shape[0]
    width = dataArray.shape[1]
    xcoord = floor((x+1)/2*(width-1))
    ycoord = floor((y+1)/2*(height-1))
    if -1 <= x <= 1 and -1 <= y <= 1:
        return dataArray[ycoord][xcoord]
    else:
        return 0
#norming pixel coordinates
A1x=np.linspace(-1,1,w1)
A1y=np.linspace(-1,1,h1)
A2x=np.linspace(-1,1,w2)
A2y=np.linspace(-1,1,h2)
#normed coordinates of A2 in a matrix
Ap2=np.zeros((h2,w2,2))
for i in np.arange(0,h2):
    for j in np.arange(0,w2):
        Ap2[i][j]=(A2x[j],A2y[i])
#defining a vector with the coordinates of A2
d=[]
cdata=Ap2[:,:,0:2]
for i in np.arange(0,h2):
    for j in np.arange(0,w2):
        d.append((cdata[i][j][0],cdata[i][j][1]))
d=np.asarray(d)
coora2=d.transpose()
coora2=np.array(coora2, dtype=np.float64)
@nb.njit('(float64[::1],)')
def tempsum3(x):
    tempsum = 0
    for l in np.arange(0,hw2):
        tempsum += (xy_to_pixelvalue(np.cos(np.radians(x[2]))*x[0]*coora2[0][l]-np.sin(np.radians(x[2]))*x[1]*coora2[1][l]+x[3],np.sin(np.radians(x[2]))*x[0]*coora2[0][l]+np.cos(np.radians(x[2]))*x[1]*coora2[1][l]+x[4],A1)-xy_to_pixelvalue(coora2[0][l],coora2[1][l],A2))**2
    return tempsum
x0 = np.array([1,1,-0.5,0,0])
bounds = [(0.1,5),(0.1,5),(-1,2),(-4,4),(-4,4)]
result = diffev(tempsum3, x0, npop = 5*15, bounds = bounds, ftol = 1e-11, gtol = 3500, maxiter = 1024**3, maxfun = 1024**3, full_output=True, scale = 0.8)
print(result)
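For reference: with SciPy's own differential_evolution, run-to-run differences can be removed by fixing the seed, and confidence that a minimum is global is usually built by restarting from several different seeds and checking that the best results agree. A minimal sketch along those lines (it reuses tempsum3 and bounds from above and is an alternative to the mystic call, not part of the original code):
from scipy.optimize import differential_evolution

best = None
for seed in range(5):                          # a handful of independent, reproducible runs
    res = differential_evolution(tempsum3, bounds, seed=seed, tol=1e-11,
                                 maxiter=5000, polish=True)
    if best is None or res.fun < best.fun:     # keep the best of the restarts
        best = res
print(best.x, best.fun)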
In the following code I want to optimize a wind farm using a penalty function.
Using the first function (newSite), I have defined the wind turbine numbers and layout. Then in the next function, after importing x0 (c = x0 = initial guess), for each range of 10 wind directions (wd) I take the c values of the mean wd of that range. For instance, for wd in [0, 10] the mean value is 5, so I take the c values at wd = 5 and use them for all wd in the range [0, 10] and for each wind speed (ws). I should mention that c is the value that indicates whether a wind turbine is off or on (c = 0 means the wt is off). Then I define operating according to c, which means that if operating is 0, c = 0 and that wt is off.
Then I define the penalty function to optimize the power output: wherever TI_eff > 0.14, a penalty term must be subtracted from the original power output. For instance, if sim_res.TI_eff[1][2][3] > 0.14, then curr_func[1][2][3] = sim_res.Power[1][2][3] - 10000*(sim_res.TI_eff[1][2][3] - 0.14)**2.
The problem is that when I run this code it does not give me any results; I waited for long hours and I think it got stuck in a loop that could not converge. So I want to know what the problem is.
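For reference, the penalty step described above can also be written without the triple loop over turbines, wind directions and wind speeds used inside wt_simulation below. A minimal sketch, assuming sim_res.Power and sim_res.TI_eff can be converted to plain (4, 360, 23) arrays, which is how the code below indexes them:
import numpy as np

power = np.asarray(sim_res.Power)
ti_eff = np.asarray(sim_res.TI_eff)
penalty = np.where(ti_eff > 0.14, 10000.0 * (ti_eff - 0.14)**2, 0.0)  # penalty only where TI_eff exceeds 0.14
curr_func = power - penalty
objective = -float(curr_func.sum())   # negative because scipy minimizes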
import time
from py_wake.examples.data.hornsrev1 import V80
from py_wake.examples.data.hornsrev1 import Hornsrev1Site # We work with the Horns Rev 1 site, which comes already set up with PyWake.
from py_wake import BastankhahGaussian
from py_wake.turbulence_models import GCLTurbulence
from py_wake.deflection_models.jimenez import JimenezWakeDeflection
from scipy.optimize import minimize
from py_wake.wind_turbines.power_ct_functions import PowerCtFunctionList, PowerCtTabular
import numpy as np
def newSite(x,y):
    xNew = np.array([x[0]+560*i for i in range(4)])
    yNew = np.array([y[0]+560*i for i in range(4)])
    x_newsite = np.array([xNew[0],xNew[0],xNew[0],xNew[1]])
    y_newsite = np.array([yNew[0],yNew[1],yNew[2],yNew[0]])
    return (x_newsite,y_newsite)
def wt_simulation(c):
    c = c.reshape(4,360,23)
    site = Hornsrev1Site()
    x, y = site.initial_position.T
    x_newsite, y_newsite = newSite(x,y)
    windTurbines = V80()
    for item in range(4):
        for j in range(10,370,10):
            for i in range(j-10,j):
                c[item][i] = c[item][j-5]
    windTurbines.powerCtFunction = PowerCtFunctionList(
        key='operating',
        powerCtFunction_lst=[PowerCtTabular(ws=[0, 100], power=[0, 0], power_unit='w', ct=[0, 0]),  # 0=No power and ct
                             windTurbines.powerCtFunction],  # 1=Normal operation
        default_value=1)
    operating = np.ones((4,360,23))  # shape=(#wt,wd,ws)
    operating[c <= 0.5] = 0
    wf_model = BastankhahGaussian(site, windTurbines, deflectionModel=JimenezWakeDeflection(), turbulenceModel=GCLTurbulence())
    # run wind farm simulation
    sim_res = wf_model(
        x_newsite, y_newsite,  # wind turbine positions
        h=None,   # wind turbine heights (defaults to the heights defined in windTurbines)
        wd=None,  # Wind direction (defaults to site.default_wd (0,1,...,360 if not overridden))
        ws=None,  # Wind speed (defaults to site.default_ws (3,4,...,25m/s if not overridden))
        operating=operating
    )
    curr_func = np.ones((4,360,23))
    for i in range(4):
        for l in range(360):
            for k in range(23):
                if sim_res.TI_eff[i][l][k] - 0.14 > 0:
                    curr_func[i][l][k] = sim_res.Power[i][l][k] - 10000*(sim_res.TI_eff[i][l][k]-0.14)**2
                else:
                    curr_func[i][l][k] = sim_res.Power[i][l][k]
    return -float(np.sum(curr_func))  # negative because of scipy minimize
t0 = time.perf_counter()
def solve():
    wt = 4  # for V80
    wd = 360
    ws = 23
    x0 = np.ones((wt,wd,ws)).reshape(-1)  # initial value for c
    b = (0,1)
    bounds = np.full((wt,wd,ws,2),b).reshape(-1, 2)
    res = minimize(wt_simulation, x0=x0, bounds=bounds)
    return res
res=solve()
print(f'success status: {res.success}')
print(f'aep: {-res.fun}') # negative to get the true maximum aep
print(f'c values: {res.x}\n')
print(f'elapse: {round(time.perf_counter() - t0)}s')
sim_res=wt_simulation(res.x)
There are a number of things in your approach that are either wrong or incomprehensible to me. Just for fun I have tried your code. A few observations:
Your set of parameters (optimization variables) has a shape of (4, 360, 23), i.e. you are looking at 33,120 parameters. There is no nonlinear optimization algorithm that is going to give you any meaningful answer to a problem that big. Ever. But then again, you shouldn't be looking at SciPy optimize if your optimization variables should only assume 0/1 values.
Calling SciPy minimize like this:
res = minimize(wt_simulation, x0=x0, bounds=bounds)
is going to select a nonlinear optimizer among BFGS, L-BFGS-B and SLSQP (according to the documentation at https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html).
Those algorithms are gradient-based, and since you're not providing a gradient of your objective function, SciPy is going to estimate it numerically. Good luck with that when you have 33,000+ parameters: a one-sided finite-difference gradient needs one extra objective evaluation per parameter, i.e. roughly 33,000 evaluations at 20-25 seconds each, which is on the order of a week of compute for a single gradient. It is never going to finish.
At the beginning of your objective function you are doing this:
for item in range(4):
    for j in range(10,370,10):
        for i in range(j-10,j):
            c[item][i] = c[item][j-5]
I don't understand why you're doing it but you are overriding the input values of c coming from the optimizer.
Your objective function takes 20-25 seconds to evaluate on my powerful workstation. Even if you had only 10-15 optimization parameters, it would take you several days to get any answer out of an optimizer. You have 33,000+ variables. No way.
I don't know why you are doing this and why you're doing it the way you're doing it. You should rethink your approach.
I am trying to fit some experimental data to a nonlinear function with one parameter that includes an arccosine, whose domain is therefore limited to [-1, 1]. I use scipy's curve_fit to find the parameter of the function, but it returns the following error:
RuntimeError: Optimal parameters not found: Number of calls to function has reached maxfev = 400.
The function I want to fit is this one:
def fitfunc(x, a):
    y = np.rad2deg(np.arccos(x*np.cos(np.deg2rad(a))))
    return y
For the fitting, I provide numpy arrays for x and y, respectively, which contain values in degrees (which is why the function contains conversions to and from radians).
param, param_cov = curve_fit(fitfunc, xs, ys)
When I use other fit functions, for example a polynomial, curve_fit returns some values; the error mentioned above only occurs with this function that includes an arccosine.
I suspect that it cannot fit the data points because, depending on the parameter of the arccosine, some data points do not lie inside its domain. I have tried raising the number of iterations (maxfev), but without success.
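For reference, one quick way to see which points fall outside the arccos domain for a candidate parameter value is a simple mask; a small sketch using the xs array given below, where a_trial is a hypothetical guess rather than a value from the original post:
import numpy as np

a_trial = 80.0                             # hypothetical guess for a, in degrees
arg = xs * np.cos(np.deg2rad(a_trial))
outside = np.abs(arg) > 1                  # True where arccos would be undefined
print(xs[outside])                         # the points that break the fit for this a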
Sample data:
ys = np.array([113.46125, 129.4225, 140.88125, 145.80375, 145.4425,
146.97125, 97.8025, 112.91125, 114.4325, 119.16125,
130.13875, 134.63125, 129.4375, 141.99, 139.86,
138.77875, 137.91875, 140.71375])
xs = np.array([2.786427013, 3.325624466, 3.473013087, 3.598247534, 4.304280248,
4.958273121, 2.679526725, 2.409388637, 2.606306639, 3.661558062,
4.569923009, 4.836843789, 3.377013596, 3.664550526, 4.335401233,
3.064199519, 3.97155254, 4.100567011])
As HS-nebula mentioned in his comments, you need to define an initial value a0 of a as a starting guess for the curve fit. Moreover, you need to be careful when choosing a0, as your np.arccos() is only defined on [-1, 1] and choosing the wrong a0 results in an error.
import numpy as np
from scipy.optimize import curve_fit
ys = np.array([113.46125, 129.4225, 140.88125, 145.80375, 145.4425, 146.97125,
97.8025, 112.91125, 114.4325, 119.16125, 130.13875, 134.63125,
129.4375, 141.99, 139.86, 138.77875, 137.91875, 140.71375])
xs = np.array([2.786427013, 3.325624466, 3.473013087, 3.598247534, 4.304280248, 4.958273121,
2.679526725, 2.409388637, 2.606306639, 3.661558062, 4.569923009, 4.836843789,
3.377013596, 3.664550526, 4.335401233, 3.064199519, 3.97155254, 4.100567011])
def fit_func(x, a):
    a_in_rad = np.deg2rad(a)
    cos_a_in_rad = np.cos(a_in_rad)
    arcos_xa_product = np.arccos(x * cos_a_in_rad)
    return np.rad2deg(arcos_xa_product)
a0 = 80
param, param_cov = curve_fit(fit_func, xs, ys, a0, bounds = (0, 360))
print('Using curve we retrieve a value of a = ', param[0])
Output:
Using curve we retrieve a value of a = 100.05275506147824
However if you choose a0=60, you get the following error:
ValueError: Residuals are not finite in the initial point.
To be able to use the data with all possible values of a, a normalization as HS-nebula suggested is a good idea.
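The exact normalization is not spelled out above; one possible reading (an assumption on my part, not necessarily what HS-nebula meant) is to rescale x by its maximum so that x*cos(a) stays inside [-1, 1] for every a:
xs_norm = xs / xs.max()                    # now 0 < xs_norm <= 1, so arccos is always defined
param, param_cov = curve_fit(fit_func, xs_norm, ys, 80, bounds=(0, 360))
# the fitted a now refers to the rescaled x; to interpret it in terms of the original
# model, use cos(a_original) = cos(a_fitted) / xs.max()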
I am aware that there are a few questions about a similar subject, although I couldn't find a proper answer.
I would like to fit some data with a function (called Bastenaire) and get the parameter values. Here is the code:
import numpy as np
from matplotlib import pyplot as plt
from scipy import optimize
def bastenaire(s, A, B, C, sd):
    logNB = np.log(A)-C*(s-sd)-np.log(s-sd)
    return np.exp(logNB)-B
S=np.array([659,646,634,623,613,595,580,565,551,535,515,493,473,452,432,413,394,374,355,345])
N=np.array([46963,52934,59975,65522,74241,87237,101977,116751,133665,157067,189426,233260,281321,355558,428815,522582,630257,768067,902506,1017280])
fitmb,fitmob=optimize.curve_fit(bastenaire,S,N,p0=(30000,2000000000,0.2,250))
plt.scatter(N,S)
plt.plot(bastenaire(S,*fitmb),S,label='bastenaire')
plt.legend()
plt.show()
However, the curve fit cannot identify the correct parameters and I get: OptimizeWarning: Covariance of the parameters could not be estimated.
Same result when I give no input parameter values.
Is there any way to tweak something and get results? Should my dataset cover a wider range and values?
Thank you!
Broc
Fitting is tough; you need to restrain the parameter space using bounds and (often) check your initial values a bit.
To make it work, I searched for an initial value where the function had the correct look, then estimated some constraints:
bounds = np.array([(1e4, 1e12), (-np.inf, np.inf), (1e-20, 1e-2), (-2000., 20000)]).T
fitmb, fitmob = optimize.curve_fit(bastenaire,S, N,p0=(1e7,-100.,1e-5,250.), bounds=bounds)
returns
(array([ 1.00000000e+10,  1.03174824e+04,  7.53169772e-03, -7.32901325e+01]),
 array([[ 2.24128391e-06,  6.17858390e+00, -1.44693602e-07, -5.72040842e-03],
        [ 6.17858390e+00,  1.70326029e+07, -3.98881486e-01, -1.57696515e+04],
        [-1.44693602e-07, -3.98881486e-01,  1.14650323e-08,  4.68707940e-04],
        [-5.72040842e-03, -1.57696515e+04,  4.68707940e-04,  1.93358414e+01]]))
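To check the result visually, the plotting snippet from the question can be reused with the fitted parameters (a sketch):
plt.scatter(N, S, label='data')
plt.plot(bastenaire(S, *fitmb), S, label='bastenaire (fitted)')
plt.legend()
plt.show()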
This is my first time posting a question and I'm going to try to make it as clear as I can but feel free to ask questions.
I'm trying to fit a model to a curve using the scipy.curve_fit method as below:
import numpy as np
import matplotlib.pyplot as pyplot
import scipy
from scipy.optimize import curve_fit
def func2(x,EM):
    return (((4.0*EM*(np.sqrt(8*10**-9)))/(3.0*(1.0-(0.5**2))*8*10**-9))*(((((x))*1*10**-9)**((3.0/2.0)))))
ydata=[-0.003428768, -0.009050058, -0.0037997673999999996, -0.0003833233, -0.007557649, -0.0034860994, -0.0009856887, -0.0017508664, -0.00036931394999999996,
-0.0040713947, -0.005737315000000001, 0.0005120568, -0.007336486, -0.00719302, -0.0039941817, -0.0029785274, -0.0013044578, -0.008190335, -0.00833507,
-0.0074282060000000006, -0.009629990000000001, -0.009425125, -0.008662485999999999, -0.0019445216, -0.008331748, -0.009513038, -0.0047609017, -0.004364422,
-0.010325097, -0.0036570733, -0.0060091914, -0.005655772, -0.0045517069999999995, -0.00066998035, 0.006374902, 0.006445733, 0.0019101816,
0.010262737999999999, 0.011139007, 0.018161469, 0.016963122, 0.022915895, 0.027177791, 0.028707139, 0.040105638, 0.044088004, 0.041657403,
0.052325636999999994, 0.062399405, 0.07020844, 0.076979915, 0.08888523, 0.099634745, 0.10961602, 0.12188646, 0.13677225, 0.15639512, 0.16833586,
0.18849944000000002, 0.21515548, 0.23989769000000002, 0.26319308, 0.29388397, 0.321042, 0.35637776, 0.38564656999999997, 0.4185209, 0.44986692,
0.48931552999999994, 0.52583893, 0.5626885, 0.6051665, 0.6461075, 0.69644346, 0.7447817, 0.7931281, 0.8381386000000001, 0.8883482, 0.9395609999999999,
0.9853629, 1.0377034, 1.0889026, 1.1334094]
xdata=[34.51388, 33.963736999999995,
33.510695, 33.04127, 32.477253, 32.013624, 31.536019999999997, 31.02925, 30.541649999999997,
30.008646, 29.493828, 29.049707, 28.479668, 27.980956, 27.509590000000003, 27.018721, 26.533737, 25.972296,
25.471065, 24.979228000000003, 24.459624, 23.961517, 23.46839, 23.028454, 22.471411, 21.960924, 21.503428000000003,
21.007033, 20.453855, 20.013475, 19.492528, 18.995746999999998, 18.505670000000002, 18.040403, 17.603387, 17.104082,
16.563634, 16.138298000000002, 15.646187, 15.20897, 14.69833, 14.25156, 13.789688, 13.303409, 12.905278, 12.440909, 11.919262,
11.514609, 11.104646, 10.674512, 10.235055, 9.84145, 9.437523, 9.026733, 8.63639, 8.2694065, 7.944733, 7.551445, 7.231599999999999,
6.9697434, 6.690793299999999, 6.3989780000000005, 6.173159, 5.9157856, 5.731453, 5.4929328, 5.2866156, 5.066648000000001, 4.9190496,
4.745381399999999, 4.574569599999999, 4.4540283, 4.3197597000000005, 4.2694026, 4.2012034, 4.133134, 4.035212, 3.9837262, 3.9412007, 3.8503475999999996,
3.8178950000000005, 3.7753053999999997, 3.6728842]
dstart=20.0
xdata=np.array(xdata[::-1])
xdata=xdata-dstart
xdata=list(xdata)
xdata1=[]
ydata1=[]
for i in range(len(xdata)):
    if xdata[i]>0:
        xdata1.append(xdata[i])
        ydata1.append(ydata[i])
xdata=np.array(xdata1)
ydata=np.array(ydata1)
popt, pcov = curve_fit(func2, xdata, ydata)
a=popt[0]
print("E=", popt[0]/10**6)
t=func2(xdata,a)
ax=pyplot.figure().add_subplot(1,1,1)
ax.plot(xdata,t, color="blue",mew=2.0,label="Hertz Fit")
ax.plot(xdata,ydata,ls="",marker="x",color="red",mew=2.0,label="Data")
ax.legend(loc=2)
pyplot.show()
The "dstart" value basically cuts off the lower portion of the data I don't want to fit, because it is negative and the model doesn't like negative numbers. Currently I have to manually set "dstart" before running the code, and only then do I see the final result.
I started by doing this fitting in Excel with Solver, varying both the "EM" variable and the "dstart" variable simultaneously, by nesting the code which adjusts the xdata by "dstart" and cuts off the negative values inside the function being fit.
Essentially what I want is:
import numpy as np
import matplotlib.pyplot as pyplot
import scipy
from scipy.optimize import curve_fit
def func2(x,EM,dstart):
    xdata=np.array(x[::-1])
    xdata=dstart-xdata
    xdata=list(xdata)
    xdata1=[]
    for i in range(len(xdata)):
        if xdata[i]>0:
            xdata1.append(xdata[i])
    global xdata2
    xdata2=np.array(xdata1)
    return (((4.0*EM*(np.sqrt(8*10**-9)))/(3.0*(1.0-(0.5**2))*8*10**-9))*(((((xdata2))*1*10**-9)**((3.0/2.0)))))
ydata=[-0.003428768, -0.009050058, -0.0037997673999999996, -0.0003833233, -0.007557649, -0.0034860994, -0.0009856887, -0.0017508664, -0.00036931394999999996,
-0.0040713947, -0.005737315000000001, 0.0005120568, -0.007336486, -0.00719302, -0.0039941817, -0.0029785274, -0.0013044578, -0.008190335, -0.00833507,
-0.0074282060000000006, -0.009629990000000001, -0.009425125, -0.008662485999999999, -0.0019445216, -0.008331748, -0.009513038, -0.0047609017, -0.004364422,
-0.010325097, -0.0036570733, -0.0060091914, -0.005655772, -0.0045517069999999995, -0.00066998035, 0.006374902, 0.006445733, 0.0019101816,
0.010262737999999999, 0.011139007, 0.018161469, 0.016963122, 0.022915895, 0.027177791, 0.028707139, 0.040105638, 0.044088004, 0.041657403,
0.052325636999999994, 0.062399405, 0.07020844, 0.076979915, 0.08888523, 0.099634745, 0.10961602, 0.12188646, 0.13677225, 0.15639512, 0.16833586,
0.18849944000000002, 0.21515548, 0.23989769000000002, 0.26319308, 0.29388397, 0.321042, 0.35637776, 0.38564656999999997, 0.4185209, 0.44986692,
0.48931552999999994, 0.52583893, 0.5626885, 0.6051665, 0.6461075, 0.69644346, 0.7447817, 0.7931281, 0.8381386000000001, 0.8883482, 0.9395609999999999,
0.9853629, 1.0377034, 1.0889026, 1.1334094]
xdata=[34.51388, 33.963736999999995,
33.510695, 33.04127, 32.477253, 32.013624, 31.536019999999997, 31.02925, 30.541649999999997,
30.008646, 29.493828, 29.049707, 28.479668, 27.980956, 27.509590000000003, 27.018721, 26.533737, 25.972296,
25.471065, 24.979228000000003, 24.459624, 23.961517, 23.46839, 23.028454, 22.471411, 21.960924, 21.503428000000003,
21.007033, 20.453855, 20.013475, 19.492528, 18.995746999999998, 18.505670000000002, 18.040403, 17.603387, 17.104082,
16.563634, 16.138298000000002, 15.646187, 15.20897, 14.69833, 14.25156, 13.789688, 13.303409, 12.905278, 12.440909, 11.919262,
11.514609, 11.104646, 10.674512, 10.235055, 9.84145, 9.437523, 9.026733, 8.63639, 8.2694065, 7.944733, 7.551445, 7.231599999999999,
6.9697434, 6.690793299999999, 6.3989780000000005, 6.173159, 5.9157856, 5.731453, 5.4929328, 5.2866156, 5.066648000000001, 4.9190496,
4.745381399999999, 4.574569599999999, 4.4540283, 4.3197597000000005, 4.2694026, 4.2012034, 4.133134, 4.035212, 3.9837262, 3.9412007, 3.8503475999999996,
3.8178950000000005, 3.7753053999999997, 3.6728842]
xdata2=list(xdata2)
ydata1=[]
for i in range(len(xdata2)):
    if xdata2[i]>0:
        ydata1.append(ydata[i])
popt, pcov = curve_fit(func2, xdata, ydata)
But this doesn't work, as I get a value error: "ValueError: operands could not be broadcast together with shapes (28,) (30,)". I think what I need is for curve_fit to bring in the xdata, adjust it by the first guessed "dstart", guess EM and check the fit and the error, try a new "dstart" to adjust xdata, guess EM and check the fit and the error, and so on. As I'm still fairly new to Python I'm definitely out of my element with the curve fit, and I would just use Excel if I didn't have potentially thousands of curves to run.
Any help would be appreciated!
I'll split this in two: conceptual and coding related
Conceptual:
Let's start by rephrasing your question. As it stands, the answer is: yes, obviously. Simply absorb the parameter-dependent change of x into the target function. But that won't solve your problem. What you really seem to be interested in is what to do with parameters for which some of the x cannot be processed by your function. There is no one-size-fits-all answer for that.
You could choose to deem such parameters as unacceptable in which case you'd have to resort to constrained optimisation. There are a few solvers in scipy that can do that.
You could choose to remove the difficult points from the data set before fitting.
You could introduce soft constraints and penalise bad values instead of ruling them out completely (a sketch of this idea follows below).
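As an illustration of the last option, combined with the "absorb" idea above, here is a minimal sketch: it keeps the array shapes fixed by clamping negative contact depths to zero inside the model, so curve_fit can vary EM and dstart together. The Hertz expression is taken from the question; the clamping and the starting guesses are my assumptions, not something from the original post:
import numpy as np
from scipy.optimize import curve_fit

def func2_soft(x, EM, dstart):
    depth = x - dstart                          # shift by the fitted contact point
    depth = np.where(depth > 0, depth, 0.0)     # clamp non-physical depths instead of dropping them
    return (4.0*EM*np.sqrt(8e-9))/(3.0*(1.0-0.5**2)*8e-9) * (depth*1e-9)**1.5

# popt, pcov = curve_fit(func2_soft, xdata, ydata, p0=(1e6, 20.0))
# pair xdata and ydata exactly as intended in the original code (reversal etc.)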
Programming style:
Avoid for loops in numerical programs. There are gazillions of posts on that on this site, so I'll only give one example:
xdata2=list(xdata2)
ydata1=[]
for i in range(len(xdata2)):
    if xdata2[i]>0:
        ydata1.append(ydata[i])
can be written in one line that will execute much faster and return an array instead of a list:
ydata1 = ydata[xdata2 > 0]
look at the numpy tutorial/docs or search this site for "vectorization" if you want to learn this technique.
Apart from that, no complaints.
Why your second program doesn't work.
You are sieving both your x and your y, so they should have the same shape. But then you go on and use an old copy of y instead of the new one, whereas you do use the new x. That's why the shapes don't match.
Btw, the way you've set it up (modifying x within func2) more or less implements the absorb strategy I mentioned earlier. Only, since you have no access to y inside func2, you cannot change the shape of x without the two arrays going out of sync.