Efficient n-body simulation with dask - python

An N-body simulation is used to simulate the dynamics of a physical system of interacting particles, or of a problem reduced to some kind of particle with physical meaning. A particle could be a gas molecule or a star in a galaxy. Dask.bag provides a simple way to distribute the particles across a cluster, for example by giving dask.bag.from_sequence() a custom iterator that returns a particle object:
import random
import time

import numpy as np
import dask.bag as db

class ParticleGenerator():
    def __init__(self, num_of_particles, max_position, seed=time.time()):
        random.seed(seed)
        self.index = -1
        self.limit = num_of_particles
        self.max_position = max_position

    def __iter__(self):
        return self

    def __next__(self):
        self.index += 1
        if self.index < self.limit:
            return np.array([self.max_position*random.random(),
                             self.max_position*random.random(),
                             self.max_position*random.random()])
        else:
            raise StopIteration

b = db.from_sequence(ParticleGenerator(1000, 1, seed=123456789))
Here, the particle object is simply a NumPy array, but it could be anything. Now, to compute the interactions between all particles, information about position, speed and similar quantities must be shared. dask.bag.map() maps a function across all elements in the collection; inside this function, the interaction between the element and all other particles is calculated to obtain the new particle state.
b = b.map(update_position, others=list(b))
b.compute()
For completeness, this is the update_position function:
def update_position(e, others=None, mass=1, dt=1e-4):
    f = np.zeros(3)
    for o in others:
        r = e - o
        r_mag = np.sqrt(r.dot(r))
        if r_mag == 0:
            continue
        f += (A/(r_mag**7) + B/(r_mag**13)) * r
    return e + f * (dt**2 / mass)
A and B are arbitrary constants. dask.bag.map() could be called multiple times inside a loop to execute the simulation, as in the sketch below.
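A minimal sketch of such a loop, assuming the state is gathered with compute() on every step and the bag is rebuilt from it (num_steps is a made-up parameter, not part of the question):

num_steps = 100  # hypothetical number of time steps

for step in range(num_steps):
    current = b.compute()                            # gather the current particle states
    b = db.from_sequence(current).map(update_position, others=current)
final_positions = b.compute()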
Is Dask.bag a good collection (abstraction) for dealing with this kind of problem? Maybe Dask.distributed is a better idea?
Programming the simulation this way, is the scheduler handling all communication, or is information about position, speed, etc. shared via inter-worker communication?
Any comments on optimizing the code? Especially about the overhead of transforming the collection into a list while calling dask.bag.map().

Generally speaking, N-body simulations require sophisticated algorithms and data structures to run efficiently. Many common solutions use complex tree data structures. You might want to search for terms like KD-tree or Barnes-Hut.
Dask.bag on the other hand is one of the simplest/dumbest parallel programming abstractions you can imagine, similar to other bulk data processing systems like MapReduce and Spark. These systems are not flexible enough to give good performance on complex problems like N-Body simulations.
Something like dask.array or dask.delayed will offer more flexibility, but even these won't be the same as a finely tuned KD-Tree.
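For illustration only, a single-machine sketch of the neighbour-limited idea alluded to above, using scipy.spatial.cKDTree; the cutoff radius and the force constants A and B here are made-up values, not part of the question:

import numpy as np
from scipy.spatial import cKDTree

def step_kdtree(positions, cutoff=0.1, A=1e-6, B=1e-12, mass=1.0, dt=1e-4):
    # Advance all particles once, only summing forces from neighbours within `cutoff`.
    tree = cKDTree(positions)
    new_positions = positions.copy()
    for i, neighbours in enumerate(tree.query_ball_point(positions, r=cutoff)):
        f = np.zeros(3)
        for j in neighbours:
            if j == i:
                continue
            r = positions[i] - positions[j]
            r_mag = np.sqrt(r.dot(r))
            f += (A/r_mag**7 + B/r_mag**13) * r
        new_positions[i] = positions[i] + f * (dt**2 / mass)
    return new_positions

positions = np.random.random((1000, 3))
positions = step_kdtree(positions)

The point of the tree is that each particle only interacts with the handful of particles inside the cutoff sphere, instead of all N of them.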

Related

Improve performance of combinations

Hey guys, I have a script that compares each possible pair of users and checks how similar their text is:
dictionary = {
    t.id: (
        t.text,
        t.set,
        t.compare_string
    )
    for t in dataframe.itertuples()
}

highly_similar = []

for a, b in itertools.combinations(dictionary.items(), 2):
    if a[1][2] == b[1][2] and not a[1][1].isdisjoint(b[1][1]):
        similarity_score = fuzz.ratio(a[1][0], b[1][0])
        if (similarity_score >= 95 and len(a[1][0]) >= 10) or similarity_score == 100:
            highly_similar.append([a[0], b[0], a[1][0], b[1][0], similarity_score])
This script takes around 15 minutes to run. The dataframe contains 120k users, so comparing each possible combination takes quite a bit of time; if I just write pass in the for loop, it takes 2 minutes to loop through all values.
I tried using filter() and map() for the if statements and the fuzzy score, but the performance was worse. I tried improving the script as much as I could, but I don't know how I can improve it further.
Would really appreciate some help!
It is slightly complicated to reason about the data since you have not attached it, but we can see multiple places that might provide an improvement:
First, let's rewrite the code in a way which is easier to reason about than using the indices:
dictionary = {
    t.id: (
        t.text,
        t.set,
        t.compare_string
    )
    for t in dataframe.itertuples()
}

highly_similar = []

for a, b in itertools.combinations(dictionary.items(), 2):
    a_id, (a_text, a_set, a_compare_string) = a
    b_id, (b_text, b_set, b_compare_string) = b
    if (a_compare_string == b_compare_string
            and not a_set.isdisjoint(b_set)):
        similarity_score = fuzz.ratio(a_text, b_text)
        if ((similarity_score >= 95 and len(a_text) >= 10)
                or similarity_score == 100):
            highly_similar.append(
                [a_id, b_id, a_text, b_text, similarity_score])
You seem to only care about pairs having the same compare_string value. Therefore, and assuming this is not something that all pairs share, we can key by that value so that we cover far fewer pairs.
To put some numbers on it, let's say you have 120K inputs and 1K items per distinct compare_string value - then instead of covering 120K * 120K ≈ 14 * 10^9 combinations, you would have 120 bins of size 1K (where in each bin we'd need to check all pairs) = 120 * 1K * 1K = 120 * 10^6, which is about 1000 times faster. And it would be even faster if each bin has fewer than 1K elements.
import collections

# Create a dictionary from compare_string to all items
# with the same compare_string
items_by_compare_string = collections.defaultdict(list)
for item in dictionary.items():
    compare_string = item[1][2]
    items_by_compare_string[compare_string].append(item)

# Iterate over each group of items that have the same
# compare string
for item_group in items_by_compare_string.values():
    # Check pairs only within that group
    for a, b in itertools.combinations(item_group, 2):
        a_id, (a_text, a_set, _) = a
        b_id, (b_text, b_set, _) = b
        # No need to compare the compare_strings!
        if not a_set.isdisjoint(b_set):
            similarity_score = fuzz.ratio(a_text, b_text)
            if ((similarity_score >= 95 and len(a_text) >= 10)
                    or similarity_score == 100):
                highly_similar.append(
                    [a_id, b_id, a_text, b_text, similarity_score])
But, what if we want more speed? Let's look at the remaining operations:
We have a check to find if two sets share at least one item
This seems like an obvious candidate for optimization if we have any knowledge about these sets (to allow us to determine which pairs are even relevant to compare)
Without additional knowledge, and just looking at every two pairs and trying to speed this up, I doubt we can do much - this is probably highly optimized using internal details of Python sets, and I don't think it's likely we can optimize it further.
We have a fuzz.ratio computation, which is some external function that I'm going to assume is heavy.
If you are using this from the FuzzyWuzzy package, make sure to install python-Levenshtein to get the speedups detailed here
We have some comparisons which we are unlikely to be able to speed up
We might be able to cache the length of a_text by nesting the two loops, but that's negligible
We have appends to a list, which runs on average ("amortized") constant time per operation, so we can't really speed that up
Therefore, I don't think we can reasonably suggest any more speedups without additional knowledge. If we know something about the sets that can help optimize which pairs are relevant we might be able to speed things up further, but I think this is about it.
EDIT: As pointed out in other answers, you can obviously run the code in multi-threading. I assumed you were looking for an algorithmic change that would possibly reduce the number of operations significantly, instead of just splitting these over more CPUs.
Essentially, from the Python programming side, I see two things that can improve your processing time:
multithreading and vectorized operations.
From the fuzzy score side, here is a list of tips you can use to improve your processing time (open in a new anonymous tab to avoid the paywall):
https://towardsdatascience.com/fuzzy-matching-at-scale-84f2bfd0c536
Using multithreading you can speed your operation up by up to N times, N being the number of threads in your CPU. You can check it with:
import multiprocessing
multiprocessing.cpu_count()
Using vectorized operations you can process your data in parallel at a low level with SIMD (single instruction / multiple data) operations, or with GPU tensor operations (like those in TensorFlow/PyTorch).
Here is a small comparison of results for each case:
import numpy as np
import time

A = [np.random.rand(512) for i in range(2000)]
B = [np.random.rand(512) for i in range(2000)]
high_similarity = []

def measure(i, j, a, b, high_similarity):
    d = ((a-b)**2).sum()
    if d > 12:
        high_similarity.append((i, j, d))

start_single_thread = time.time()
for i in range(len(A)):
    for j in range(len(B)):
        if i < j:
            measure(i, j, A[i], B[j], high_similarity)
finish_single_thread = time.time()
print("single thread time:", finish_single_thread - start_single_thread)

out[0] single thread time: 147.64517450332642
running on multi thread:
from threading import Thread

high_similarity = []

def measure(a=None, b=None, high_similarity=None):
    d = ((a-b)**2).sum()
    if d > 12:
        high_similarity.append(d)

start_multi_thread = time.time()
for i in range(len(A)):
    for j in range(len(B)):
        if i < j:
            thread = Thread(target=measure, kwargs={'a': A[i], 'b': B[j], 'high_similarity': high_similarity})
            thread.start()
            thread.join()
finish_multi_thread = time.time()
print("time to run on multi threads:", finish_multi_thread - start_multi_thread)

out[1] time to run on multi-threads: 11.946279764175415
A_array = np.array(A)
B_array = np.array(B)

start_vectorized = time.time()
for i in range(len(A_array)):
    # vectorized distance operation
    dists = (A_array - B_array)**2
    high_similarity += dists[dists > 12].tolist()
    # rotate B_array by one row so that different pairs are compared each iteration
    # (np.delete/np.insert return new arrays, so the results must be reassigned)
    aux = B_array[-1]
    B_array = np.delete(B_array, -1, axis=0)
    B_array = np.insert(B_array, 0, aux, axis=0)
finish_vectorized = time.time()
print("time to run vectorized operations:", finish_vectorized - start_vectorized)

out[2] time to run vectorized operations: 2.302949905395508
Note that you can't guarantee any order of execution, so you will also need to store the index of the results. The snippet of code is just to illustrate that you can use parallel processing, but I highly recommend using a pool of workers and dividing your dataset into N subsets, one for each worker, and joining the final result (instead of creating a thread for each function call like I did). A sketch of that idea follows.
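A rough sketch of that pool-of-workers idea, continuing the NumPy example above (the chunking scheme and the names score_chunk/n_workers are illustrative, not part of the original code):

import itertools
import multiprocessing

import numpy as np

def score_chunk(pairs):
    # Score one chunk of ((i, a), (j, b)) pairs and return the close ones with their indices.
    hits = []
    for (i, a), (j, b) in pairs:
        d = ((a - b)**2).sum()
        if d > 12:
            hits.append((i, j, d))
    return hits

if __name__ == "__main__":
    A = [np.random.rand(512) for _ in range(2000)]
    indexed = list(enumerate(A))
    all_pairs = list(itertools.combinations(indexed, 2))

    n_workers = multiprocessing.cpu_count()
    chunk_size = (len(all_pairs) + n_workers - 1) // n_workers
    chunks = [all_pairs[k:k + chunk_size] for k in range(0, len(all_pairs), chunk_size)]

    with multiprocessing.Pool(n_workers) as pool:
        results = pool.map(score_chunk, chunks)
    high_similarity = [hit for chunk_hits in results for hit in chunk_hits]

In practice you would want to pass indices rather than the vectors themselves to each worker, to keep the pickling overhead down.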

How to consume a python generator in parallel using multiprocessing?

How can I improve the performance of the networkx function local_bridges (https://networkx.org/documentation/stable//reference/algorithms/generated/networkx.algorithms.bridges.local_bridges.html#networkx.algorithms.bridges.local_bridges)?
I have experimented with pypy, but so far I am still stuck on consuming the generator on a single core. My graph has 300k edges. An example:
# construct the nx Graph:
import networkx as nx
# construct an undirected graph here - this is just a dummy graph
G = nx.cycle_graph(300000)
# fast - as it only returns a generator/iterator
lb = nx.local_bridges(G)
# individual item is also fast
%%time
next(lb)
CPU times: user 1.01 s, sys: 11 ms, total: 1.02 s
Wall time: 1.02 s
# computing all the values is very slow.
lb_list = list(lb)
How can I consume this iterator in parallel to utilize all processor cores? The current naive implementation is only using a single core!
My naive first try using multiprocessing is:
import multiprocessing as mp
lb = nx.local_bridges(G)
pool = mp.Pool()
lb_list = list(pool.map((), lb))
However, I do not want to apply a specific function - hence the empty () - but rather only get the next element from the iterator in parallel.
Related:
python or dask parallel generator?
edit
I guess it boils down to how to parallelize:
lb_res = []
lb = nx.local_bridges(G)
for node in range(1, len(G) + 1):
    lb_res.append(next(lb))
lb_res
Naively using multiprocessing obviously fails:
# from multiprocessing import Pool
# https://stackoverflow.com/questions/41385708/multiprocessing-example-giving-attributeerror
from multiprocess import Pool
lb_res = []
lb = nx.local_bridges(G)
def my_function(thing):
    return next(thing)

with Pool(5) as p:
    parallel_result = p.map(my_function, range(1, len(G) + 1))
parallel_result
But it is unclear to me how I can pass my generator as the argument to the map function - and fully consume the generator.
edit 2
For this particular problem, it turns out that the bottleneck is the shortest path computation for the with_span=True parameter. When disabled, it is decently fast.
When calculating the span is desired I would suggest cugraph with a fast implementation of SSSP on the GPU. Still, the iteration over the set of edges does not happen in parallel and should be improved further.
However, to learn more, I would be interested in understanding how to parallelize the consumption from a generator in python.
You can't consume a generator in parallel; every non-trivial generator's next state is determined by its current state, so you have to call next() sequentially.
From https://github.com/networkx/networkx/blob/master/networkx/algorithms/bridges.py#L162 this is how the function is implemented
for u, v in G.edges:
    if not (set(G[u]) & set(G[v])):
        yield u, v
So you could parallelize it using something like this, but then you would have to incur the penalty of merging those individual lists using something like multiprocessing.Manager. I think it would just make the whole thing much slower, but you can time it yourself.
def process_edge(e):
    u, v = e
    lb_list = []
    if not (set(G[u]) & set(G[v])):
        lb_list.append((u, v))

with Pool(os.cpu_count()) as pool:
    pool.map(process_edge, G.edges)
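A variation on the same sketch that avoids a shared Manager: let each worker return its result and merge the returned values in the parent process (hypothetical code, assuming the standard multiprocessing.Pool and that G is visible in the workers, e.g. via fork):

import os
from multiprocessing import Pool

def process_edge(e):
    # Return the edge if its endpoints share no neighbour (i.e. it is a local bridge), else None.
    u, v = e
    if not (set(G[u]) & set(G[v])):
        return (u, v)
    return None

if __name__ == "__main__":
    with Pool(os.cpu_count()) as pool:
        lb_list = [e for e in pool.map(process_edge, G.edges) if e is not None]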
Another way is to split the graph into ranges of vertices and process them concurrently.
def process_nodes(nodes):
    lb_list = []
    for u in nodes:
        for v in G[u]:
            if not (set(G[u]) & set(G[v])):
                lb_list.append((u, v))

with Pool(os.cpu_count()) as pool:
    pool.map(process_nodes, np.array_split(list(range(G.number_of_nodes())),
                                           os.cpu_count()))
Maybe you could also check if any better algorithms exist for this problem. Or find a faster library that's implemented in C.

recursion to iteration in python

We are trying to do a cluster analysis on a large amount of data. We are kind of new to Python and found out that an iterative function is way more efficient than a recursive one. Now we are trying to change that, but it is way harder than we thought.
This code underneath is the heart of our clustering function. It takes over 90 percent of the time. Can you help us change it into an iterative one?
Some extra information: the taunach function gets the neighbours of a point, which will later form the clusters. The problem is that we have many, many points.
def taunach(tau, delta, i, s, nach, anz):
    dis = tabelle[s].dist
    #delta=tau
    x = data[i]
    y = Skalarprodukt(data[tabelle[s].index] - x)
    a = tau - abs(dis)
    #LA.norm(data[tabelle[s].index]-x)
    if y < a*abs(a):
        nach.update({item.index for item in tabelle[tabelle[s].inner:tabelle[s].outer-1]})
        anz = anzahl(delta, i, tabelle[s].inner, anz)
        if dis > -1:
            b = dis - tau
            if y >= b*abs(b):  #*(1-0.001):
                nach, anz = taunach(tau, delta, i, tabelle[s].outer, nach, anz)
    else:
        if y < tau**2:
            nach.add(tabelle[s].index)
            if y < delta:
                anz += 1
        if tabelle[s].dist > -4:
            b = dis - tau
            if y >= b*abs(b):  #*(1-0.001)):
                nach, anz = taunach(tau, delta, i, tabelle[s].outer, nach, anz)
        if tabelle[s].dist > -1:
            if y <= (dis+tau)**2:
                nach, anz = taunach(tau, delta, i, tabelle[s].inner, nach, anz)
    return nach, anz
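For what it's worth, one standard way to remove this kind of recursion is to keep an explicit stack of the node indices that the recursive calls would visit; since each call only updates nach and anz, the visiting order does not matter. A sketch of that idea (taunach_iterativ is an illustrative name), assuming the same globals tabelle, data, Skalarprodukt and anzahl, and the control flow as reconstructed above:

def taunach_iterativ(tau, delta, i, s, nach, anz):
    # Explicit stack replaces the recursive calls on tabelle[s].outer / tabelle[s].inner.
    stack = [s]
    while stack:
        s = stack.pop()
        dis = tabelle[s].dist
        x = data[i]
        y = Skalarprodukt(data[tabelle[s].index] - x)
        a = tau - abs(dis)
        if y < a*abs(a):
            nach.update({item.index for item in tabelle[tabelle[s].inner:tabelle[s].outer-1]})
            anz = anzahl(delta, i, tabelle[s].inner, anz)
            if dis > -1:
                b = dis - tau
                if y >= b*abs(b):
                    stack.append(tabelle[s].outer)
        else:
            if y < tau**2:
                nach.add(tabelle[s].index)
                if y < delta:
                    anz += 1
            if tabelle[s].dist > -4:
                b = dis - tau
                if y >= b*abs(b):
                    stack.append(tabelle[s].outer)
            if tabelle[s].dist > -1:
                if y <= (dis+tau)**2:
                    stack.append(tabelle[s].inner)
    return nach, anz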

Batch-constraining objects (feathers to a wing)

Really not long ago I had my first dumb question answered here, so... there I am again, with a hopefully less dumb and more interesting headscratcher. Keep in mind I am still making my baby steps in scripting!
There it is: I need to rig a feathered wing, and I already have all the feathers in place. I thought of mimicking another rig I animated recently that had the feathers point-constrained to the arm and forearm, and orient-constrained to three other controllers on the arm: each and every feather was constrained to two of those controllers at a time, and the constraint's weights would shift as you went down the forearm towards the wrist, so that one feather perfectly at mid-distance between the elbow and the wrist would be equally constrained by both controllers... you get the picture.
My reasoning was as follows: let's make a loop that iterates over every feather, gets its world position, finds the distance from that feather to each of the orient controllers (through Pythagoras), normalizes that and feeds the values into the weight attribute of an orient constraint. I could even go the extra mile and pass the normalized distance through a sine function to get a nice easing into the feathers' silhouette.
My pseudo-code is ugly and broken, but it's a try. My issues are inlined.
Second try!
It works now, but only on the active object instead of the whole selection. What could be happening?
import maya.cmds as cmds

# find world space position of targets
base_pos = cmds.xform('base', q=1, ws=1, rp=1)
tip_pos = cmds.xform('tip', q=1, ws=1, rp=1)

def relative_dist_from_pos(pos, ref):
    # vector subtract to get relative pos
    pos_from_ref = [m - n for m, n in zip(pos, ref)]
    # pythagoras to get distance from vector
    dist_from_ref = (pos_from_ref[0]**2 + pos_from_ref[1]**2 + pos_from_ref[2]**2)**.5
    return dist_from_ref

def weight_from_dist(dist_from_base, dist_to_tip):
    normalize_fac = (1/(dist_from_base + dist_to_tip))
    dist_from_base *= normalize_fac
    dist_to_tip *= normalize_fac
    return dist_from_base, dist_to_tip

sel = cmds.ls(selection=True)
for obj in sel:
    # find world space pos of feather
    feather_pos = cmds.xform(obj, q=1, ws=1, rp=1)
    # call relative_dist_from_pos
    dist_from_base = relative_dist_from_pos(feather_pos, base_pos)
    dist_to_tip = relative_dist_from_pos(feather_pos, tip_pos)
    # normalize distances (capture the returned values)
    dist_from_base, dist_to_tip = weight_from_dist(dist_from_base, dist_to_tip)
    # constrain the feather - weights are inverted
    # because the smaller the distance, the stronger the constraint
    cmds.orientConstraint('base', obj, w=dist_to_tip)
    cmds.orientConstraint('tip', obj, w=dist_from_base)
There you are. Any pointers are appreciated.
Have a good night,
Hadriscus

Writing a faster Python physics simulator

I have been playing around with writing my own physics engine in Python as an exercise in physics and programming. I started out by following the tutorial located here. That went well, but then I found the article "Advanced Character Physics" by Thomas Jakobsen, which covers using Verlet integration for simulations, which I found fascinating.
I have been attempting to write my own basic physics simulator using Verlet integration, but it turns out to be slightly more difficult than I first expected. I was out browsing for example programs to read, and stumbled across this one written in Python, and I also found this tutorial which uses Processing.
What impresses me about the Processing version is how fast it runs. The cloth alone has 2400 different points being simulated, and that's not including the bodies.
The Python example only uses 256 particles for the cloth, and it runs at about 30 frames per second. I tried increasing the number of particles to 2401 (it has to be square for that program to work), and it ran at about 3 fps.
Both of these work by storing instances of a particle object in a list, and then iterating through the list, calling each particle's "update position" method. As an example, this is the part of the code from the Processing sketch that calculates each particle's new position:
for (int i = 0; i < pointmasses.size(); i++) {
    PointMass pointmass = (PointMass) pointmasses.get(i);
    pointmass.updateInteractions();
    pointmass.updatePhysics(fixedDeltaTimeSeconds);
}
EDIT: Here is the code from the python version I linked earlier:
"""
verletCloth01.py
Eric Pavey - 2010-07-03 - www.akeric.com
Riding on the shoulders of giants.
I wanted to learn now to do 'verlet cloth' in Python\Pygame. I first ran across
this post \ source:
http://forums.overclockers.com.au/showthread.php?t=870396
http://dl.dropbox.com/u/3240460/cloth5.py
Which pointed to some good reference, that was a dead link. After some searching,
I found it here:
http://www.gpgstudy.com/gpgiki/GDC%202001%3A%20Advanced%20Character%20Physics
Which is a 2001 SIGGRAPH paper by Thomas Jakobsen called:
"GDC 2001: Advanced Characer Physics".
This code is a Python\Pygame interpretation of that 2001 Siggraph paper. I did
borrow some code from 'domlebo's source code, it was a great starting point. But
I'd like to think I put my own flavor on it.
"""
#--------------
# Imports & Initis
import sys
from math import sqrt
# Vec2D comes from here: http://pygame.org/wiki/2DVectorClass
from vec2d import Vec2d
import pygame
from pygame.locals import *
pygame.init()
#--------------
# Constants
TITLE = "verletCloth01"
WIDTH = 600
HEIGHT = 600
FRAMERATE = 60
# How many iterations to run on our constraints per frame?
# This will 'tighten' the cloth, but slow the sim.
ITERATE = 2
GRAVITY = Vec2d(0.0,0.05)
TSTEP = 2.8
# How many pixels to position between each particle?
PSTEP = int(WIDTH*.03)
# Offset in pixels from the top left of screen to position grid:
OFFSET = int(.25*WIDTH)
#-------------
# Define helper functions, classes
class Particle(object):
"""
Stores position, previous position, and where it is in the grid.
"""
def __init__(self, screen, currentPos, gridIndex):
# Current Position : m_x
self.currentPos = Vec2d(currentPos)
# Index [x][y] of Where it lives in the grid
self.gridIndex = gridIndex
# Previous Position : m_oldx
self.oldPos = Vec2d(currentPos)
# Force accumulators : m_a
self.forces = GRAVITY
# Should the particle be locked at its current position?
self.locked = False
self.followMouse = False
self.colorUnlocked = Color('white')
self.colorLocked = Color('green')
self.screen = screen
def __str__(self):
return "Particle <%s, %s>"%(self.gridIndex[0], self.gridIndex[1])
def draw(self):
# Draw a circle at the given Particle.
screenPos = (self.currentPos[0], self.currentPos[1])
if self.locked:
pygame.draw.circle(self.screen, self.colorLocked, (int(screenPos[0]),
int(screenPos[1])), 4, 0)
else:
pygame.draw.circle(self.screen, self.colorUnlocked, (int(screenPos[0]),
int(screenPos[1])), 1, 0)
class Constraint(object):
"""
Stores 'constraint' data between two Particle objects. Stores this data
before the sim runs, to speed sim and draw operations.
"""
def __init__(self, screen, particles):
self.particles = sorted(particles)
# Calculate restlength as the initial distance between the two particles:
self.restLength = sqrt(abs(pow(self.particles[1].currentPos.x -
self.particles[0].currentPos.x, 2) +
pow(self.particles[1].currentPos.y -
self.particles[0].currentPos.y, 2)))
self.screen = screen
self.color = Color('red')
def __str__(self):
return "Constraint <%s, %s>"%(self.particles[0], self.particles[1])
def draw(self):
# Draw line between the two particles.
p1 = self.particles[0]
p2 = self.particles[1]
p1pos = (p1.currentPos[0],
p1.currentPos[1])
p2pos = (p2.currentPos[0],
p2.currentPos[1])
pygame.draw.aaline(self.screen, self.color,
(p1pos[0], p1pos[1]), (p2pos[0], p2pos[1]), 1)
class Grid(object):
"""
Stores a grid of Particle objects. Emulates a 2d container object. Particle
objects can be indexed by position:
grid = Grid()
particle = g[2][4]
"""
def __init__(self, screen, rows, columns, step, offset):
self.screen = screen
self.rows = rows
self.columns = columns
self.step = step
self.offset = offset
# Make our internal grid:
# _grid is a list of sublists.
# Each sublist is a 'column'.
# Each column holds a particle object per row:
# _grid =
# [[p00, [p10, [etc,
# p01, p11,
# etc], etc], ]]
self._grid = []
for x in range(columns):
self._grid.append([])
for y in range(rows):
currentPos = (x*self.step+self.offset, y*self.step+self.offset)
self._grid[x].append(Particle(self.screen, currentPos, (x,y)))
def getNeighbors(self, gridIndex):
"""
return a list of all neighbor particles to the particle at the given gridIndex:
gridIndex = [x,x] : The particle index we're polling
"""
possNeighbors = []
possNeighbors.append([gridIndex[0]-1, gridIndex[1]])
possNeighbors.append([gridIndex[0], gridIndex[1]-1])
possNeighbors.append([gridIndex[0]+1, gridIndex[1]])
possNeighbors.append([gridIndex[0], gridIndex[1]+1])
neigh = []
for coord in possNeighbors:
if (coord[0] < 0) | (coord[0] > self.rows-1):
pass
elif (coord[1] < 0) | (coord[1] > self.columns-1):
pass
else:
neigh.append(coord)
finalNeighbors = []
for point in neigh:
finalNeighbors.append((point[0], point[1]))
return finalNeighbors
#--------------------------
# Implement Container Type:
def __len__(self):
return len(self.rows * self.columns)
def __getitem__(self, key):
return self._grid[key]
def __setitem__(self, key, value):
self._grid[key] = value
#def __delitem__(self, key):
#del(self._grid[key])
def __iter__(self):
for x in self._grid:
for y in x:
yield y
def __contains__(self, item):
for x in self._grid:
for y in x:
if y is item:
return True
return False
class ParticleSystem(Grid):
"""
Implements the verlet particles physics on the encapsulated Grid object.
"""
def __init__(self, screen, rows=49, columns=49, step=PSTEP, offset=OFFSET):
super(ParticleSystem, self).__init__(screen, rows, columns, step, offset)
# Generate our list of Constraint objects. One is generated between
# every particle connection.
self.constraints = []
for p in self:
neighborIndices = self.getNeighbors(p.gridIndex)
for ni in neighborIndices:
# Get the neighbor Particle from the index:
n = self[ni[0]][ni[1]]
# Let's not add duplicate Constraints, which would be easy to do!
new = True
for con in self.constraints:
if n in con.particles and p in con.particles:
new = False
if new:
self.constraints.append( Constraint(self.screen, (p,n)) )
# Lock our top left and right particles by default:
self[0][0].locked = True
self[1][0].locked = True
self[-2][0].locked = True
self[-1][0].locked = True
def verlet(self):
# Verlet integration step:
for p in self:
if not p.locked:
# make a copy of our current position
temp = Vec2d(p.currentPos)
p.currentPos += p.currentPos - p.oldPos + p.forces * TSTEP**2
p.oldPos = temp
elif p.followMouse:
temp = Vec2d(p.currentPos)
p.currentPos = Vec2d(pygame.mouse.get_pos())
p.oldPos = temp
def satisfyConstraints(self):
# Keep particles together:
for c in self.constraints:
delta = c.particles[0].currentPos - c.particles[1].currentPos
deltaLength = sqrt(delta.dot(delta))
try:
# You can get a ZeroDivisionError here once, so let's catch it.
# I think it's when particles sit on top of one another due to
# being locked.
diff = (deltaLength-c.restLength)/deltaLength
if not c.particles[0].locked:
c.particles[0].currentPos -= delta*0.5*diff
if not c.particles[1].locked:
c.particles[1].currentPos += delta*0.5*diff
except ZeroDivisionError:
pass
def accumulateForces(self):
# This doesn't do much right now, other than constantly reset the
# particles 'forces' to be 'gravity'. But this is where you'd implement
# other things, like drag, wind, etc.
for p in self:
p.forces = GRAVITY
def timeStep(self):
# This executes the whole shebang:
self.accumulateForces()
self.verlet()
for i in range(ITERATE):
self.satisfyConstraints()
def draw(self):
"""
Draw constraint connections, and particle positions:
"""
for c in self.constraints:
c.draw()
#for p in self:
# p.draw()
def lockParticle(self):
"""
If the mouse LMB is pressed for the first time on a particle, the particle
will assume the mouse motion. When it is pressed again, it will lock
the particle in space.
"""
mousePos = Vec2d(pygame.mouse.get_pos())
for p in self:
dist2mouse = sqrt(abs(pow(p.currentPos.x -
mousePos.x, 2) +
pow(p.currentPos.y -
mousePos.y, 2)))
if dist2mouse < 10:
if not p.followMouse:
p.locked = True
p.followMouse = True
p.oldPos = Vec2d(p.currentPos)
else:
p.followMouse = False
def unlockParticle(self):
"""
If the RMB is pressed on a particle, if the particle is currently
locked or being moved by the mouse, it will be 'unlocked'/stop following
the mouse.
"""
mousePos = Vec2d(pygame.mouse.get_pos())
for p in self:
dist2mouse = sqrt(abs(pow(p.currentPos.x -
mousePos.x, 2) +
pow(p.currentPos.y -
mousePos.y, 2)))
if dist2mouse < 5:
p.locked = False
#------------
# Main Program
def main():
# Screen Setup
screen = pygame.display.set_mode((WIDTH, HEIGHT))
clock = pygame.time.Clock()
# Create our grid of particles:
particleSystem = ParticleSystem(screen)
backgroundCol = Color('black')
# main loop
looping = True
while looping:
clock.tick(FRAMERATE)
pygame.display.set_caption("%s -- www.AKEric.com -- LMB: move\lock - RMB: unlock - fps: %.2f"%(TITLE, clock.get_fps()) )
screen.fill(backgroundCol)
# Detect for events
for event in pygame.event.get():
if event.type == pygame.QUIT:
looping = False
elif event.type == MOUSEBUTTONDOWN:
if event.button == 1:
# See if we can make a particle follow the mouse and lock
# its position when done.
particleSystem.lockParticle()
if event.button == 3:
# Try to unlock the current particles position:
particleSystem.unlockParticle()
# Do stuff!
particleSystem.timeStep()
particleSystem.draw()
# update our display:
pygame.display.update()
#------------
# Execution from shell\icon:
if __name__ == "__main__":
print "Running Python version:", sys.version
print "Running PyGame version:", pygame.ver
print "Running %s.py"%TITLE
sys.exit(main())
Because both programs work roughly the same way, but the Python version is SO much slower, it makes me wonder:
Is this performance difference part of the nature of Python?
What should I do differently from the above if I want to get better performance from my own Python programs? E.g store the properties of all particles inside an array instead of using individual objects, etc.
EDIT: Answered!!
@Mr E's linked PyCon talk in the comments and @A. Rosa's answer with the linked resources all helped ENORMOUSLY in better understanding how to write good, fast Python code. I am now bookmarking this page for future reference :D
There is an article by Guido van Rossum linked in the Performance Tips section of the Python Wiki. In its conclusion, you can read the following sentence:
If you feel the need for speed, go for built-in functions - you can't beat a loop written in C.
The essay continues with a list of guidelines for loop optimization. I recommend both resources, since they give concrete and practical advice about optimizing Python code.
There is also a well-known group of benchmarks at benchmarksgame.alioth.debian.org, where you can find comparisons among different programs and languages on distinct machines. As can be seen, there are lots of variables in play, which makes it impossible to state something as broad as "Java is faster than Python". This is commonly summed up in the sentence "Languages don't have speeds; implementations do".
More Pythonic and faster alternatives using built-in functions can be applied to your code. For example, there are several nested loops (some of which don't require processing the whole list) that can be rewritten using imap or list comprehensions. PyPy is another interesting option for improving performance. I'm not an expert on Python optimization, but there are lots of tips which are extremely useful (notice that "don't write Java in Python" is one of them!).
Resources and another related questions on SO:
Performance differences between Python and C
Is it reasonable to integrate python with c for performance?
http://www.ibm.com/developerworks/opensource/library/os-pypy-intro/index.html?ca=drs-
http://pyevolve.sourceforge.net/wordpress/?p=1189
If you write Python like you write Java, of course it's going to be slower; idiomatic Java does not translate well to idiomatic Python.
Is this performance difference part of the nature of Python?
What should I do differently from the above if I want to get better performance from my own Python programs? E.g store the properties of all particles inside an array instead of using individual objects, etc.
Hard to say without seeing your code.
Here is an incomplete list of differences between Python and Java that may sometimes affect performance:
Processing uses an immediate mode canvas; if you want comparable performance in Python, you also need to use an immediate mode canvas. Canvases in most GUI frameworks (including the Tkinter canvas) are retained mode, which is easier to use but inherently slower than immediate mode. You'll need to use an immediate mode canvas like those provided by pygame, SDL, or Pyglet.
Python is a dynamic language, which means instance member access, module member access, and global variable access are resolved at run time. Instance member access, module member access, and global variable access in Python are really dictionary accesses. In Java, they are resolved at compile time and are by nature much faster. Cache frequently accessed globals, module variables, and attributes in a local variable (see the sketch after this list).
In Python 2.x, range() produces a concrete list. In Python, iteration done using an iterator, for item in list, is usually faster than iteration done using an index variable, for n in range(len(list)). You should almost always iterate directly using an iterator instead of iterating with range(len(...)).
Python's numbers are immutable, which means any arithmetic calculation allocates a new object. This is one reason why plain Python is not very suitable for low-level calculations; most people who want to write low-level calculations without resorting to a C extension typically use cython, psyco, or numpy. This usually only becomes a problem when you have millions of calculations, though.
This is just a partial, very incomplete list; there are many other reasons why translating Java to Python would produce suboptimal code. Without seeing your code it's impossible to tell what you need to do differently. Optimized Python code generally looks very different from optimized Java code.
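As a small illustration of the caching advice in the list above (all names here are made up for the example):

from math import sqrt

def total_distance(points, origin=(0.0, 0.0)):
    # Hoist global/attribute lookups into locals before the hot loop.
    ox, oy = origin
    _sqrt = sqrt
    total = 0.0
    for x, y in points:
        total += _sqrt((x - ox)**2 + (y - oy)**2)
    return total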
I would also suggest reading about other physics engines. There are a few open source engines which use a variety of methods for calculating the "physics".
Newton Game Dynamics
Chipmunk
Bullet
Box2D
ODE (Open Dynamics Engine)
There are also ports of most of the engines:
Pymunk
PyBullet
PyBox2D
PyODE
If you read through the documentation of those engines you will often find statements saying that they are optimized for speed (30fps - 60fps). But if you think they can do this while calculating "real" physics, you are wrong. Most engines calculate physics only to a point where a normal user cannot visually distinguish between "real" physical behavior and "simulated" physical behavior. The error is negligible if you want to write games, but if you want to do physics, all of those engines are of no use to you.
That's why I would say that if you are doing a real physical simulation, you are slower than those engines by design, and you will never outrun another physics engine.
Particle-based physics simulation translates easily into linear algebra operations, i.e. matrix operations. NumPy offers such operations, which are implemented in Fortran/C/C++ under the hood. Well-written Python/NumPy code (taking full advantage of the language and library) allows you to write decently fast code, as the sketch below illustrates.
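For instance, a minimal sketch of a vectorized Verlet integration step, where all particle positions live in NumPy arrays instead of per-particle objects (the particle count, gravity value, and time step are illustrative):

import numpy as np

# State for N particles: positions now and at the previous step, plus accumulated forces.
N = 2400
pos = np.random.rand(N, 2)
old_pos = pos.copy()
forces = np.tile(np.array([0.0, 0.05]), (N, 1))   # e.g. gravity on every particle
dt = 1.0

def verlet_step(pos, old_pos, forces, dt):
    # Advance every particle one position-Verlet step at once.
    new_pos = 2.0*pos - old_pos + forces*dt**2
    return new_pos, pos   # new positions; current positions become the old ones

pos, old_pos = verlet_step(pos, old_pos, forces, dt)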
