I've solved a problem on SPOJ, but my solution is still too slow to be accepted.
I've also tried to make it use multiprocessing, but that attempt failed too: it ended up even slower.
The basic implementation, even with PyPy, gets "time limit exceeded" on SPOJ.
So, how can I improve it?
And what is wrong with the multiprocessing implementation?
# -- shipyard
from collections import defaultdict

#W = 100 total weight
#N = 2   number of types
#value | weight
#1       1
#30      50
# result -> 60 = minimum total value
#c = [1, 30]
#w = [1, 50]
def knap(W, N, c, w):
    f = defaultdict(int)
    g = defaultdict(bool)
    g[0] = True
    for i in xrange(N):
        for j in xrange(W):
            if g[j]:
                g[j+w[i]] = True
                #print "g("+str(j+w[i])+") = true"
                if f[j+w[i]] == 0 or f[j+w[i]] > f[j]+c[i]:
                    f[j+w[i]] = f[j]+c[i]
                    #print " f("+str(j+w[i])+") = ", f[j+w[i]]
    if g[W]:
        print f[W]
    else:
        print -1

def start():
    while True:
        num_test = int(raw_input())
        for i in range(num_test):
            totWeight = int(raw_input())
            types = int(raw_input())
            costs = defaultdict(int)
            weights = defaultdict(int)
            for t in range(int(types)):
                costs[t], weights[t] = [int(i) for i in raw_input().split()]
            knap(totWeight, types, costs, weights)
        return

if __name__ == '__main__':
    start()
And here is the multiprocessing version:
# -- shipyard
from multiprocessing import Process, Queue
from collections import defaultdict
from itertools import chain

W = 0
c = {} #[]
w = {} #[]

def knap(i, g, f, W, w, c, qG, qF):
    for j in xrange(W):
        if g[j]:
            g[j+w[i]] = True
            #print "g("+str(j+w[i])+") = true"
            if f[j+w[i]] == 0 or f[j+w[i]] > f[j]+c[i]:
                f[j+w[i]] = f[j]+c[i]
                #print " f("+str(j+w[i])+") = ", f[j+w[i]]
    qG.put(g)
    qF.put(f)

def start():
    global f, g, c, w, W
    while True:
        num_test = int(raw_input())
        for _ in range(num_test):
            qG = Queue()
            qF = Queue()
            W = int(raw_input())
            N = int(raw_input())  # types
            c = {} #[0 for i in range(N)]
            w = {} #[0 for i in range(N)]
            f = defaultdict(int)
            g = defaultdict(bool)
            g[0] = True
            for t in range(N):
                c[t], w[t] = [int(i) for i in raw_input().split()]
            # let's go parallel
            for i in xrange(0, N, 2):
                k1 = Process(target=knap, args=(i,   g, f, W, w, c, qG, qF))
                k2 = Process(target=knap, args=(i+1, g, f, W, w, c, qG, qF))
                k1.start()
                k2.start()
                k1.join()
                k2.join()
                #while k1.is_alive(): # or k2.is_alive():
                #    None
                #g2 = defaultdict(bool, chain( g.iteritems(), qG.get().iteritems(), qG.get().iteritems()))
                #f2 = defaultdict(int,  chain( f.iteritems(), qF.get().iteritems(), qF.get().iteritems()))
                g2 = defaultdict(bool, g.items() + qG.get().items() + qG.get().items())
                f2 = defaultdict(int,  f.items() + qF.get().items() + qF.get().items())
                g = g2
                f = f2
            print "\n g: ", len(g), "\n f: ", len(f), "\n"
            if g[W]:
                print f[W]
            else:
                print -1
        return

if __name__ == '__main__':
    start()
I probably haven't understood how to make two processes work efficiently on the same dictionary.
Some programming contests explicitly ban or block multithreading, so look elsewhere. My approach in those cases is to use a profiling tool to see where your code is struggling. You could try the built-in cProfile (python -m cProfile -o <outputfilename> <script-name> <options>) and then this wonderful visualization tool: http://www.vrplumber.com/programming/runsnakerun/
Once you have your visualization, look around and dig into the boxes. Sometimes there are things that are not directly evident but make sense once you inspect the running times. For example, a common problem (not sure if it's your case) is checking for list membership. It's much faster to use a set in that case, and it often pays to keep a separate list and set if you need the order later. There are many more tips regarding importing variables into the local scope, etc. You can check a list here: https://wiki.python.org/moin/PythonSpeed/PerformanceTips
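To make the membership tip concrete, here is a small illustrative snippet (my own example, not from the original answer); checking x in some_list scans the whole list, while x in some_set is a hash lookup:

import timeit

setup = "data_list = list(range(10000)); data_set = set(data_list)"
# membership in a list is a linear scan, membership in a set is a hash lookup
print(timeit.timeit("9999 in data_list", setup=setup, number=1000))
print(timeit.timeit("9999 in data_set", setup=setup, number=1000))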
Many people who use Python face the same problem on programming contest sites. I've found that it's best to simply ditch Python altogether for problems that accept large inputs and require constructing and iterating over a big data structure. Simply re-implement the same solution in C or C++. Python is known to be 10 to 100 times slower than well optimised C/C++ code.
Your code looks good, and there's really little you can do to gain more speed (apart from big-O improvements or jumping through hoops such as multiprocessing). If you have to use Python, try to avoid creating unnecessarily large lists and use the most efficient algorithm you can think of. You can also try to generate and run large test cases before submitting your solution.
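A lot of the constant-factor cost here likely comes from the defaultdicts and repeated hashing. A minimal sketch of the same minimum-value DP using plain lists (my reformulation of the question's algorithm, so treat it as an untested starting point rather than a drop-in replacement):

def knap_list(W, N, c, w):
    INF = float('inf')
    f = [INF] * (W + 1)   # f[j] = minimum value needed to reach weight j exactly
    f[0] = 0
    for i in range(N):
        wi, ci = w[i], c[i]
        for j in range(wi, W + 1):   # forward pass allows reusing item i many times
            if f[j - wi] + ci < f[j]:
                f[j] = f[j - wi] + ci
    return f[W] if f[W] != INF else -1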
Related
I am entirely new to parallel computing, in fact, to numerical methods. I am trying to solve a differential equation using python solve_ivp of the following form:
y''(x) + (a^2 + x^2)y(x) = 0
y(0)=1
y'(0)=0
x=(0,100)
I want to solve for a range of a and write a file as a[i] y[i](80).
The original equation is quite complicated, but essentially the structure is the same as defined above. I have used a for loop, and it is taking a lot of time to compute. Searching online I came across this beautiful website and found this question and a related answer that may precisely solve the problem I am facing.
I tried the original code provided in the solution; however, the output produced is not properly sorted. I mean, the second column is not in proper order.
q1 a1 Y1
q1 a2 Y3
q1 a4 Y4
q1 a3 Y3
q1 a5 Y5
...
I have even tried with a single loop over one parameter, but the same issue remains. Below is my code, with the same multiprocessing method but using solve_ivp:
import numpy as np
import scipy.integrate
import multiprocessing as mp
from scipy.integrate import solve_ivp

def fun(t, y):
    # replace this function with whatever function you want to work with
    # (this one is the example function from the scipy docs for odeint)
    theta, omega = y
    dydt = [omega, -a*omega - q*np.sin(theta)]
    return dydt

# definitions of work thread and write thread functions
tspan = np.linspace(0, 10, 201)

def run_thread(input_queue, output_queue):
    # run threads will pull tasks from the input_queue, push results into output_queue
    while True:
        try:
            queueitem = input_queue.get(block=False)
            if len(queueitem) == 3:
                a, q, t = queueitem
                sol = solve_ivp(fun, [tspan[0], tspan[-1]], [1, 0], method='RK45', t_eval=tspan)
                F = 1 + sol.y[0].T[157]
                output_queue.put((q, a, F))
        except Exception as e:
            print(str(e))
            print("Queue exhausted, terminating")
            break

def write_thread(queue):
    # write thread will pull results from output_queue, write them to outputfile.txt
    f1 = open("outputfile.txt", "w")
    while True:
        try:
            queueitem = queue.get(block=False)
            if queueitem[0] == "TERMINATE":
                f1.close()
                break
            else:
                q, a, F = queueitem
                print("{} {} {} \n".format(q, a, F))
                f1.write("{} {} {} \n".format(q, a, F))
        except:
            # necessary since it will throw an error whenever output_queue is empty
            pass

# define time point sequence
t = np.linspace(0, 10, 201)

# prepare input and output Queues
mpM = mp.Manager()
input_queue = mpM.Queue()
output_queue = mpM.Queue()

# prepare tasks, collect them in input_queue
for q in np.linspace(0.0, 4.0, 100):
    for a in np.linspace(-2.0, 7.0, 100):
        # Your computations as commented here will now happen in run_threads as defined above and created below
        # print('Solving for q = {}, a = {}'.format(q, a))
        # sol1 = scipy.integrate.odeint(fun, [1, 0], t, args=(a, q))[..., 0]
        # print(t[157])
        # F = 1 + sol1[157]
        input_tupel = (a, q, t)
        input_queue.put(input_tupel)

# create threads
thread_number = mp.cpu_count()
procs_list = [mp.Process(target=run_thread, args=(input_queue, output_queue)) for i in range(thread_number)]
write_proc = mp.Process(target=write_thread, args=(output_queue,))

# start threads
for proc in procs_list:
    proc.start()
write_proc.start()

# wait for run_threads to finish
for proc in procs_list:
    proc.join()

# terminate write_thread
output_queue.put(("TERMINATE",))
write_proc.join()
Kindly let me know what is wrong with the multiprocessing part, so that I can learn a bit about multiprocessing in Python in the process. Also, I would much appreciate it if anyone could point out the most elegant/efficient way(s) of handling such a computation in Python. Thanks
What you want is something of an online sort. In this case, you know the order in which you want to present results (i.e., the input order), so just accumulate the outputs in a priority queue and pop elements from it when they match the next key you’re expecting.
A trivial example without showing the parallelism:
import heapq

def sort_follow(pairs, ref):
    """
    Sort (a prefix of) pairs so as to have its first components
    be the elements of (sorted) ref in order.
    Uses memory proportional to the disorder in pairs.
    """
    heap = []
    pairs = iter(pairs)
    for r in ref:
        while not heap or heap[0][0] != r:
            heapq.heappush(heap, next(pairs))
        yield heapq.heappop(heap)
The virtue of this approach—the reduced memory footprint—is probably irrelevant for small results of just a few floats, but it’s easy enough to apply.
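A hypothetical usage example (the data and names here are mine, only to show the calling convention): results arrive keyed but out of order, and we want them back in the original submission order.

out_of_order = [(2, 'b'), (1, 'a'), (4, 'd'), (3, 'c')]
submission_order = [1, 2, 3, 4]
print(list(sort_follow(out_of_order, submission_order)))
# -> [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]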
I wrote some code that works great for solving equations numerically, but there is one specific equation that, when I put it in and run the code, runs forever and never produces any output!
Equation I got an output for: x^3−3*x+2−a*(np.sin(x))
Equation I didn't get an output for: (x-1)(x-2)(x-3)-a*(np.cos(x))
I also tried writing the second equation without brackets, like this: x^3-6*x^2+11*x-6-a*np.cos(x)
and it didn't help. Where is the problem?
this is my code:
import math
import numpy as np

h = 1e-5
eps = 1e-8

#function of the equation
def nf(x, a, c):
    c = math.cos(x)
    solu = (x-1)*(x-2)*(x-3) - a*c
    return solu

#numerical method
def sl(a, x):
    c = math.cos(x)
    f = nf(x, a, c)
    while abs(f) > eps:
        x = x - h*f/(nf(x+h, a, c) - f)
        f = nf(x, a, c)
    return x

N = 101
mya = np.linspace(0.0, 1.0, N)
myb = np.zeros(mya.shape)
myc = np.zeros(mya.shape)
myd = np.zeros(mya.shape)
for i in range(0, N):
    myb[i] = sl(mya[i], 1.0)
    myc[i] = sl(mya[i], 2.0)
    myd[i] = sl(mya[i], 3.0)
print(myb[i])
print(myc[i])
print(myd[i])
The problem is that for some inputs to sl, abs(f) > eps might never become False, creating an infinite loop. I haven't investigated your mathematical problem, so I can't solve it "for real". What I can provide is automatic detection of when this happens, so that the code returns without a result rather than looping forever.
def sl(a, x):
    c = math.cos(x)
    f = nf(x, a, c)
    count, maxcount = 0, 1000
    while abs(f) > eps:
        x = x - h*f/(nf(x+h, a, c) - f)
        f = nf(x, a, c)
        count += 1
        if count > maxcount:
            return
    return x
Here, a maximum of 1000 iterations is allowed before a solution is deemed unreachable. In such a case, sl returns None, which when inserted into your NumPy float arrays becomes np.nan.
Upon investigating the output, only myc[60] fails in this way.
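If you want to locate those failing entries programmatically rather than by eye, something like this works (my addition; it simply scans the result arrays for NaN):

import numpy as np

# indices where sl gave up and returned None (stored as nan in the float array)
print(np.flatnonzero(np.isnan(myc)))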
Your nf function was a little odd. You were passing c = math.cos(x) into nf(), but within nf() you assigned c = math.cos(x) again. Just use the value of c that you passed in; commenting that line out fixes your code. As for the mathematical correctness, I cannot judge it unless you provide a better explanation of what you're trying to do.
import math
import numpy as np

h = 1e-5
eps = 1e-8

#function of the equation
def nf(x, a, c):
    # this line is not needed. Commenting it out allows your code to run
    # c = math.cos(x)
    solu = (x-1)*(x-2)*(x-3) - a*c
    return solu

#numerical method
def sl(a, x):
    c = math.cos(x)
    f = nf(x, a, c)
    while abs(f) > eps:
        x = x - h*f/(nf(x+h, a, c) - f)
        f = nf(x, a, c)
    return x

N = 101
mya = np.linspace(0.0, 1.0, N)
myb = np.zeros(mya.shape)
myc = np.zeros(mya.shape)
myd = np.zeros(mya.shape)
for i in range(0, N):
    myb[i] = sl(mya[i], 1.0)
    myc[i] = sl(mya[i], 2.0)
    myd[i] = sl(mya[i], 3.0)
print(myb[i])
print(myc[i])
print(myd[i])
Output:
3.2036907284
0.835006605064
0.677633820877
So I am trying to implement a version of Hartree-Fock theory for a band system. Basically, it's a matrix convergence problem. I have a matrix H0, from whose eigenvalues I can construct another matrix F. The procedure is then to define H1 = H0 + F and check whether the eigenvalues of H1 are close to those of H0. If not, I construct a new F from the eigenvalues of H1 and define H2 = H0 + F. Then I check again and iterate.
The problem is somewhat generic, and my exact code doesn't seem very relevant, so I am showing only this:
# define the matrix F
def generate_fock(H):
    def fock(k):  # k is a 2D array
        matt = some prefactor*outer(eigenvectors of H(k) with itself)  # error1
        return matt
    return fock

k0 = linspace(0, 5*kt/2, 51)
# H0 is considered defined
H = lambda k: H0(k)
evalold = array([sort(linalg.eigvalsh(H(array([0, 2*k - kt/2])))) for k in k0[:]])[the ones I care]
while True:
    fe = generate_fock(H)
    H = lambda k: H0(k) + fe(k)  # error2
    evalnew = array([sort(linalg.eigvalsh(H(array([0, 2*k - kt/2])))) for k in k0[:]])[the ones I care]
    if allclose(evalnew, evalold): break
    else: evalold = evalnew
I am using inner functions, hoping that Python would not treat my definitions as recursive (I am not sure I am using the word correctly). But Python knows :( Any suggestions?
Edit1:
The error message is highlighting the lines I labeled error1 and error2 and showing the following:
RecursionError: maximum recursion depth exceeded while calling a Python object
I think this comes from my way of defining the functions: In loop n, F(k) depends on H(k) of the previous loop and H(k) in the next step depends on F(k) again. My question is how do I get around this?
Edit2&3:
Let me add more details to the code as suggested. This is the shortest thing I can come up with that exactly reproduces my problem.
from numpy import *
from scipy import linalg

# Let's say H0 is any 2m by 2m Hermitian matrix. m = 4 in this case.
# Here are some simplified parameters
def h(i, k):
    return -6*linalg.norm(k)*array([[0, exp(1j*(angle(k@array([1,1j])) + (-1)**i*0.1/2))],
                                    [exp(-1j*(angle(k@array([1,1j])) + (-1)**i*0.1/2)), 0]])

T = array([ones([2,2]),
           [[exp(-1j*2*pi/3), 1], [exp(1j*2*pi/3), exp(-1j*2*pi/3)]],
           [[exp(1j*2*pi/3), 1], [exp(-1j*2*pi/3), exp(1j*2*pi/3)]]])
g = array([[ 0.27023695, 0.46806412], [-0.27023695, 0.46806412]])
kt = linalg.norm(g[0])

def H0(k):
    "one example"
    matt = linalg.block_diag(h(1,k), h(2,k+g[0]), h(2,k+g[1]), h(2,k+g[0]+g[1]))/2
    for j in range(3): matt[0:2, 2*j+2:2*j+4] = T[j]
    return array(matrix(matt).getH()) + matt

dim = 4

def bz(x):
    "BZ centered at 0 with (2x)^2 points in it"
    tempList = []
    for i in range(-x, x):
        for j in range(-x, x):
            tempList.append(i*g[0]/2/x + j*g[1]/2/x)
    return tempList

def V(i, G):
    "2D Coulomb interaction"
    if linalg.norm(G) == 0: return 0
    if i >= dim: t = 1
    else: t = 0
    return 2*pi/linalg.norm(G)*exp(0.3*linalg.norm(G)*(-1 + (-1)**t)/2)

# define the matrix F for some H
def generate_fock(H):
    def fock(k):  # k is a 2D array
        matf = zeros([2*dim, 2*dim], dtype=complex128)
        for pt in bz(1):  # bz is a list of 2D arrays
            matt = zeros([2*dim, 2*dim], dtype=complex128)
            eig_vals1, eig_vecs1 = linalg.eigh(H(pt))  # error1
            idx = eig_vals1.argsort()[::]
            vecs1 = eig_vecs1[:, idx][:dim]
            for vec in vecs1:
                matt = matt + outer(conjugate(vec), vec)
            matt = matt.transpose()/len(bz(1))
            for i in range(2*dim):
                for j in range(2*dim):
                    matf[i, j] = V(j-i, pt-k)*matt[i, j]  # V is some prefactor
        return matf
    return fock

k0 = linspace(0, 5*kt/2, 51)
H = lambda k: H0(k)
evalold = array([sort(linalg.eigvalsh(H(array([0, 2*k - kt/2])))) for k in k0[:]])[dim-1:dim+1]
while True:
    fe = generate_fock(H)
    H = lambda k: H0(k) + fe(k)  # error2
    evalnew = array([sort(linalg.eigvalsh(H(array([0, 2*k - kt/2])))) for k in k0[:]])[dim-1:dim+1]
    if allclose(evalnew, evalold): break
    else: evalold = evalnew
The problem is these lines:
while True:
    fe = generate_fock(H)
    H = lambda k: H0(k) + fe(k)  # error2
In each iteration, you are generating a new function referencing the older one rather than the final output of that older function, so it has to keep them all on the stack. This will also be very slow, since you have to back multiply all your matrices every iteration.
What you want to do is keep the output of the old values, probably by making a list from the results of the prior iteration and then applying functions from that list.
Potentially you could even do this with a cache, though it might get huge. Keep a dictionary of inputs to the function and use that. Something like this:
# define the matrix F
def generate_fock(H):
    d = {}
    def fock(k):  # k is a 2D array
        if k in d:
            return d[k]
        matt = some prefactor*outer(eigenvectors of H(k) with itself)  # error1
        d[k] = matt
        return matt
    return fock
Then it should hopefully only have to reference the last version of the function.
EDIT: Give this a try. As well as caching the result, keep an index into an array of functions instead of a reference. This should prevent a recursion depth overflow.
hList = []

# define the matrix F
def generate_fock(n):
    d = {}
    def fock(k):  # k is a 2D array
        if k in d:
            return d[k]
        matt = some prefactor*outer(eigenvectors of hList[n](k) with itself)  # error1
        d[k] = matt
        return matt
    return fock

k0 = linspace(0, 5*kt/2, 51)
# H0 is considered defined
hList.append(lambda k: H0(k))
H = hList[0]
evalold = array([sort(linalg.eigvalsh(H(array([0, 2*k - kt/2])))) for k in k0[:]])[the ones I care]
n = 0
while True:
    fe = generate_fock(n)
    n += 1
    hList.append(lambda k: H0(k) + fe(k))  # error2
    H = hList[-1]
    evalnew = array([sort(linalg.eigvalsh(H(array([0, 2*k - kt/2])))) for k in k0[:]])[the ones I care]
    if allclose(evalnew, evalold): break
    else: evalold = evalnew
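One practical caveat (my note, not part of the original answer): in the question k is a NumPy array, and arrays are not hashable, so "if k in d" will raise a TypeError. A small sketch of the caching idea with a hashable key, assuming k stays a modest-sized array:

import numpy as np

def make_cached(compute):
    """Wrap compute(k) with a cache keyed on a hashable copy of k."""
    d = {}
    def cached(k):
        key = tuple(np.asarray(k).ravel().tolist())  # ndarrays are unhashable, tuples are
        if key not in d:
            d[key] = compute(k)
        return d[key]
    return cached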
I created the following code:
M = 20000
sample_all = np.load('sample.npy')
sd = np.zeros(M)
chi_arr = np.zeros((M,4))
sigma_e = np.zeros((M,41632))
mean_sigma = np.zeros(M)
max_sigma = np.zeros(M)
min_sigma = np.zeros(M)
z = np.load('z_array.npy')
prof = np.load('profile_at_sources.npy')
L = np.load('luminosities.npy')
for k in range(M):
    sd[k] = np.array(sp.std(sample_all[k,:]))
    arr = np.genfromtxt('samples_fin1.txt').T[2:6]
    arr_T = arr.T
    chi_arr[k,:] = arr_T[k,:]
    sigma_e[k,:] = np.sqrt(calc(z, prof, chi_arr[k,:], L))
    mean_sigma[k] = np.array(sp.mean(sigma_e[k,:]))
    max_sigma[k] = np.array(sigma_e[k,:].max())
    min_sigma[k] = np.array(sigma_e[k,:].min())
where calc(...) is a function that calculates some stuff (it is not important for my question).
This loop takes, for M=20000, about 27 hours on my machine. That's far too long... Is there a way to optimize it, maybe with vectorization instead of the for loop?
For me it's easy to write loops; my head thinks in loops for this kind of code... It's my limitation... Could you help me? Thanks
It seems to me that each of the k-th rows created in your various arrays is independent of the other iterations of your for loop and depends only on the k-th row of sigma_e... so you could parallelize it over many workers. Not sure if the code is 100% kosher, since you didn't provide a working example.
Note this only works if each k-th iteration is COMPLETELY independent of the (k-1)-th iteration.
import threading

M = 20000
sample_all = np.load('sample.npy')
sd = np.zeros(M)
chi_arr = np.zeros((M,4))
sigma_e = np.zeros((M,41632))
mean_sigma = np.zeros(M)
max_sigma = np.zeros(M)
min_sigma = np.zeros(M)
z = np.load('z_array.npy')
prof = np.load('profile_at_sources.npy')
L = np.load('luminosities.npy')

workers = 100
arr = np.genfromtxt('samples_fin1.txt').T[2:6]  # only works if this is really what you're doing to set arr.

def worker(k_start, k_end):
    for k in range(k_start, k_end + 1):
        sd[k] = np.array(sp.std(sample_all[k,:]))
        arr_T = arr.T
        chi_arr[k,:] = arr_T[k,:]
        sigma_e[k,:] = np.sqrt(calc(z, prof, chi_arr[k,:], L))
        mean_sigma[k] = np.array(sp.mean(sigma_e[k,:]))
        max_sigma[k] = np.array(sigma_e[k,:].max())
        min_sigma[k] = np.array(sigma_e[k,:].min())

threads = []
kstart = 0
for k in range(0, workers):
    T = threading.Thread(target=worker, args=[k * M // workers, (k + 1) * M // workers - 1])
    threads.append(T)
    T.start()

for t in threads:
    t.join()
Edited following comments:
It seems CPython's global interpreter lock prevents the threads from actually running this Python code in parallel. Use IronPython or Jython to step around this. Also, you can move the file read outside the loop if you're really just deserializing the same array from samples_fin1.txt every iteration.
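As an alternative sketch (my addition, not from the original answer): since CPython threads won't run this Python-level loop in parallel, worker processes side-step the GIL. The sketch below assumes calc and the loaded data from the question (sample_all, z, prof, L, arr, plus the preallocated result arrays and M) are defined at module level before the pool is created, so forked workers inherit them; each worker returns its row's results and the parent fills the arrays.

from multiprocessing import Pool
import numpy as np

def one_row(k):
    chi = arr.T[k, :]
    sig = np.sqrt(calc(z, prof, chi, L))
    return k, sample_all[k, :].std(), chi, sig, sig.mean(), sig.max(), sig.min()

if __name__ == '__main__':
    pool = Pool()                      # one worker per CPU core by default
    for k, s, chi, sig, m, mx, mn in pool.imap_unordered(one_row, range(M)):
        sd[k] = s
        chi_arr[k, :] = chi
        sigma_e[k, :] = sig
        mean_sigma[k], max_sigma[k], min_sigma[k] = m, mx, mn
    pool.close()
    pool.join()

If sigma_e itself isn't needed afterwards, returning only the three summary numbers per row keeps the inter-process traffic small.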
I would like to query the value of an exponentially weighted moving average at particular points. An inefficient way to do this is as follows. l is the list of times of events and queries has the times at which I want the value of this average.
a = 0.01
l = [3,7,10,20,200]
y = [0]*1000
for item in l:
    y[int(item)] = 1
s = [0]*1000
for i in xrange(1,1000):
    s[i] = a*y[i-1] + (1-a)*s[i-1]
queries = [23,68,103]
for q in queries:
    print s[q]
Outputs:
0.0355271185019
0.0226018371526
0.0158992102478
In practice l will be very large, and the range of values in l will also be huge. How can you find the values at the times in queries more efficiently, and in particular without computing the potentially huge lists y and s explicitly? I need it to be in pure Python so I can use PyPy.
Is it possible to solve the problem in time proportional to len(l)
and not max(l) (assuming len(queries) < len(l))?
Here is my code for doing this:
def ewma(l, queries, a=0.01):
    def decay(t0, x, t1, a):
        from math import pow
        return pow((1-a), (t1-t0))*x

    assert l == sorted(l)
    assert queries == sorted(queries)

    samples = []
    try:
        t0, x0 = (0.0, 0.0)
        it = iter(queries)
        q = it.next() - 1.0
        for t1 in l:
            # new value is decayed previous value, plus a
            x1 = decay(t0, x0, t1, a) + a
            # take care of all queries between t0 and t1
            while q < t1:
                samples.append(decay(t0, x0, q, a))
                q = it.next() - 1.0
            # take care of all queries equal to t1
            while q == t1:
                samples.append(x1)
                q = it.next() - 1.0
            # update t0, x0
            t0, x0 = t1, x1
        # take care of any remaining queries
        while True:
            samples.append(decay(t0, x0, q, a))
            q = it.next() - 1.0
    except StopIteration:
        return samples
I've also uploaded a fuller version of this code with unit tests and some comments to pastebin: http://pastebin.com/shhaz710
EDIT: Note that this does the same thing as what Chris Pak suggests in his answer, which he must have posted as I was typing this. I haven't gone through the details of his code, but I think mine is a bit more general. This code supports non-integer values in l and queries. It also works for any kind of iterables, not just lists since I don't do any indexing.
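As a quick sanity check (my addition), feeding the question's data into this function should reproduce the three numbers printed by the loop version:

l = [3, 7, 10, 20, 200]
queries = [23, 68, 103]
print(ewma(l, queries, a=0.01))
# expected, matching the question's output: ~0.0355, ~0.0226, ~0.0159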
I think you could do it in O(lg len(l)) time per query, if l is sorted. The basic idea is the non-recursive form of the EMA: s_i = a*y_(i-1) + a*(1-a)*y_(i-2) + a*(1-a)^2*y_(i-3) + ...
This means that for query time k you find the greatest event time in l that is less than k, say l[v] where v is its index, and then, up to an estimation limit, sum terms of the form
(1-a)^(k - l[v]) * (value at time l[v]) + ...
Then, you spend lg(len(l)) time in search + a constant multiple for the depth of your estimation. I'll provide a code sample in a little bit (after work) if you want it, just wanted to get my idea out there while I was thinking about it
Here's the code:
v is the dictionary of values at a given time; replace with 1 if it's just a 1 every time...
import math
from bisect import bisect_right

a = .01
limit = 1000
l = [1,5,14,29...]
# v is the dictionary of per-event values described above

def find_nearest_lt(l, time):
    i = bisect_right(l, time)
    if i:
        return i-1
    raise ValueError

def find_ema(l, time):
    i = find_nearest_lt(l, time)
    if l[i] == time:
        result = a * v[l[i]]
        i -= 1
    else:
        result = 0
    while (time - l[i]) < limit:
        result += math.pow(1-a, time-l[i]) * v[l[i]]
        i -= 1
    return result
If I'm thinking correctly, the find-nearest step is O(lg n), and the while loop runs at most 1000 iterations, guaranteed, so it's technically a constant (though a rather large one). find_nearest was stolen from the bisect docs page - http://docs.python.org/2/library/bisect.html
It appears that y is a binary value -- either 0 or 1 -- depending on the values of l. Why not use y = set(int(item) for item in l)? That's the most efficient way to store and look up a list of numbers.
Your code will cause an error the first time through this loop:
s = [0]*1000
for i in xrange(1000):
    s[i] = a*y[i-1]+(1-a)*s[i-1]
because i-1 is -1 when i=0 (first pass of loop) and both y[-1] and s[-1] are the last element of the list, not the previous. Maybe you want xrange(1,1000)?
How about this code:
a=0.01
l = [3.0,7.0,10.0,20.0,200.0]
y = set(int(item) for item in l)
queries = [23,68,103]
ewma = []
x = 1 if (0 in y) else 0
for i in xrange(1, queries[-1]):
x = (1-a)*x
if i in y:
x += a
if i == queries[0]:
ewma.append(x)
queries.pop(0)
When it's done, ewma should have the moving averages for each query point.
Edited to include SchighSchagh's improvements.