So I am implementing a pathfinding algorithm in python and I want to look how each iteration looks like. So I made it a generator that will yield every intermediate result up until the final result which will end with a return statement.
I made a quick pygame (because I am not aware of python libraries yet, so it was the simplest for me to make a grid and color the cells) program to visualize everything. Every frame makes an iteration on the algorithm and updates a variable result = next(alg). The problem is when the algorithm ends, It still tries to go next. I was wondering if there is a way around this other than catching the stopiteration error. The best for me would be something like if not alg.over() : result = next(alg), but I found nothing on the internet. Is there something like that that I can use? Thank you !
Related
I have graphed many things before, but any particular reason this basic two blocks won't plot my function? I can change the function to say, "f = t**2" and it will plot this, but for my specific function that I need, it won't work..
I don't know why it is giving me an error.
Jupyter screenshot:-
I believe it is because the function is expecting a single value, and you passed an array. Maybe try a for loop where each iteration you perform the calculation on the next value in t_span and save the results to an array to be returned.
when posting code, just copy and paste the code in your question and format it, that way it's easier for people to see it and help you.
I have a structure, looking a lot like a graph but I can 'sort' it. Therefore I can have two graphs, that are equivalent, but one is sorted and not the other. My goal is to compute a minimal dominant set (with a custom algorithm that fits my specific problem, so please do not link to other 'efficient' algorithms).
The thing is, I search for dominant sets of size one, then two, etc until I find one. If there isn't a dominant set of size i, using the sorted graph is a lot more efficient. If there is one, using the unsorted graph is much better.
I thought about using threads/multiprocessing, so that both graphs are explored at the same time and once one finds an answer (no solution or a specific solution), the other one stops and we go to the next step or end the algorithm. This didn't work, it just makes the process much slower (even though I would expect it to just double the time required for each step, compared to using the optimal graph without threads/multiprocessing).
I don't know why this didn't work and wonder if there is a better way, that maybe doesn't even required the use of threads/multiprocessing, any clue?
If you don't want an algorithm suggestion, then lazy evaluation seems like the way to go.
Setup the two in a data structure such that with a class_instance.next_step(work_to_do_this_step) where a class instance is a solver for one graph type. You'll need two of them. You can have each graph move one "step" (whatever you define a step to be) forward. By careful selection (possibly dynamically based on how things are going) of what a step is, you can efficiently alternate between how much work/time is being spent on the sorted vs unsorted graph approaches. Of course this is only useful if there is at least a chance that either algorithm may finish before the other.
In theory if you can independently define what those steps are, then you could split up the work to run them in parallel, but it's important that each process/thread is doing roughly the same amount of "work" so they all finish about the same time. Though writing parallel algorithms for these kinds of things can be a bit tricky.
Sounds like you're not doing what you describe. Possibly you're waiting for BOTH to finish somehow? Try doing that, and seeing if the time changes.
I would like to use the following code to find a specific number of permutations in a list.
def permutations(ls, prefix, result):
if ls:
for i in range(len(ls)):
permutations([*ls[0:i], *ls[i + 1:]], [*prefix, ls[i]], result)
else:
result.append(prefix)
return result
My issue is that I cannot simply include another parameter to count the number of permutations found. I can't do that because each recursive call to permutations() "splits" into a new "version" (it's like a fork in the road with the old counter and each branch counts its own number of permutations found, not communicating with the others). In other words, this won't work:
def permutations(ls, prefix, result, count, limit):
if count > limit:
return
if ls:
for i in range(len(ls)):
permutations([*ls[0:i], *ls[i + 1:]], [*prefix, ls[i]], result)
else:
count += 1
result.append(prefix)
return result
So what I would like to do instead of including a count parameter in the function signature, is to notify some other part of my program every time a new permutation is found, and keep track of the count that way. This may be a good use of threading, but I would like to do it without parallelization if possible (or at least the simplest parallelized solution possible).
I realize that I would have to spawn a new thread at each call to permutations([*ls[0:i], *ls[i + 1:]], [*prefix, ls[i]], result) in the for loop.
I'm hoping that someone would be so kind as to point me in the right direction or let me know if there is a better way to do this in Python.
If you are not using threading, then I recommend not using threading and also not thinking in terms of using threading.
The reason is that the more simply and directly you are able to tackle a problem, the easier it is to think about.
As a second tip, any time you find yourself iterating through permutations, you probably should find a better approach. The reason is that the number of permutations of length n grows as n!, and depending on what you are doing/your patience, computers top out somewhere between n=10 and n=15. So finding ways to count without actually iterating becomes essential. How do do that, of course, depends on your problem.
But back to the problem as stated. I would personally solve this type of problem in Python using generators. That is, you have code that can produce the next element of the list in a generator, and then elsewhere you can have code that processes it. This allows you to start processing your list right away, and not keep it all in memory.
In a language without generators, I would tackle this with closures. That is you pass in a function (or object) that you call for each value, which does whatever it wants to do. That again allows you to separate the iteration logic from the logic of what you want to do with each iteration.
If you're working with some other form of cooperative multi-tasking, use that instead. So, for example, in JavaScript you would have to figure out how to coordinate using Promises. (Luckily the async/await syntax lets you do that and make it look almost like a generator approach. Note that you may wind up with large parts of the data set in memory at once. How to avoid that is a topic in and of itself.) For another example, in Go you should use channels and goroutines.
I would only go to global variables as a last resort. And if you do, remember that you need enough memory to keep the entire data set that you iterated over in memory at once. This may be a lot of memory!
I prefer all of these over the usual multi-threading approach.
recently I've been working on a project using the raspberry pi camera OpenCV and Python to count people passing by a specific area, live, since for my usage will be easier than processing a recorded video.
Overall the code works and all, but I've been experiencing a problem with the counting part of it, that:
1 - If an object stays in the reference line, it keeps adding to the counts;
2 - Sometimes depending on the speed of the object, it is counted multiple times;
I am not an expert on python, and may be lacking the words in english to look for the proper solution, so I thought maybe someone could tell me what would be better here to solve this problem. To illustrate, here it is a gif sample:
Even tough it looks like there are more than one reference box crossing the line, it happens when only one box crosses it, as well as when the object stays on the line.
This is the code that checks if the object is crossing the line:
if (TestaInterseccaoEntrada(CoordenadaYCentroContorno,CoordenadaYLinhaEntrada,CoordenadaYLinhaSaida)):
ContadorEntradas += 1
if (TestaInterseccaoSaida(CoordenadaYCentroContorno,CoordenadaYLinhaEntrada,CoordenadaYLinhaSaida)):
ContadorSaidas += 1
I thought of using some kind of delay with time.sleep(x) on the loop, but that does not solve it obviously, and also looks bad =D.
If needed, I may post the rest of the code here, but it's here, to keep things here tidy: Code Paste
Don't mind any bad syntax or errors, part of it is not mine and the part that is, looks terrible! XD
Thanks in advance.
Cool project! It's quite a challenge to count the amount of bounding boxes that pass each line if you don't track them. It's even worse if you want to count them going both ways.
Because of this difficulty usually people prefer to track the object and then look at the trajectory to determine if the object passed the line or not.
This link can help you understand the difference. It also provides code to do the detection (but you got that part working already) and tracking (which you will need)
https://www.pyimagesearch.com/2018/08/13/opencv-people-counter/
Next to that the easiest way to track is by linking the boxes with the highest iou. A good and easy implementation can be found here:
https://github.com/bochinski/iou-tracker
Good luck!
I'm trying to think about how a Python API might look for large datastores like Cassandra. R, Matlab, and NumPy tend to use the "everything is a matrix" formulation and execute each operation separately. This model has proven itself effective for data that can fit in memory. However, one of the benefits of SAS for big data is that it executes row by row, doing all the row calculations before moving to the next. For a datastore like Cassandra, this model seems like a huge win -- we only loop through data once.
In Python, SAS's approach might look something like:
with load('datastore') as data:
for row in rows(data):
row.logincome = row.log(income)
row.rich = "Rich" if row.income > 100000 else "Poor"
This is (too?) explicit but has the advantage of only looping once. For smaller datasets, performance will be very poor compared to NumPy because the functions aren't vectorized using compiled code. In R/Numpy we would have the much more concise and compiled:
data.logincome = log(data.income)
data.rich = ifelse(data.income > 100000, "Rich", Poor")
This will execute extremely quickly because log and ifelse are both compiled functions that operator on vectors. A downside, however, is that we will loop twice. For small datasets this doesn't matter, but for a Cassandra backed datastore, I don't see how this approach works.
Question: Is there a way to keep the second API (like R/Numpy/Matlab) but delay computation. Perhaps by calling a sync(data) function at the end?
Alternative ideas? It would be nice to maintain the NumPy type syntax since users will be using NumPy for smaller operations and will have an intuitive grasp of how that works.
I don't know anything about Cassandra/NumPy, but if you adapt your second approach (using NumPy) to process data in chunks of a reasonable size, you might benefit from the CPU and/or filesystem cache and therefore prevent any slowdown caused by looping over the data twice, without giving up the benefit of using optimized processing functions.
I don't have a perfect answer, just a rough idea, but maybe it is worthwhile. It centers around Python generators, in sort of a producer-consumer style combination.
For one, as you don't want to loop twice, I think there is no way around an explicit loop for the rows, like this:
for row in rows(data):
# do stuff with row
Now, feed the row to (an arbitrary number of) consumers that are - don't choke - generators again. But you would be using the send method of the generator. As an example for such a consumer, here is a sketch of riches:
def riches():
rich_data = []
while True:
row = (yield)
if row == None: break
rich_data.append("Rich" if row.income > 100000 else "Poor")
yield rich_data
The first yield (expression) is just to fuel the individual rows into riches. It does its thing, here building up a result array. After the while loop, the second yield (statement) is used to actually provide the result data to the caller.
Going back to the caller loop, it could look someting like this:
richConsumer = riches()
richConsumer.next() # advance to first yield
for row in rows(data):
richConsumer.send(row)
# other consumers.send(row) here
richConsumer.send(None) # make consumer exit its inner loop
data.rich = richConsumer.next() # collect result data
I haven't tested that code, but that's how I think about it. It doesn't have the nice compact syntax of the vector-based functions. But it makes the main loop very simple and encapsulates all processing in separate consumers. Additional consumers can be nicely stacked after each other. And the API could be further polished by pushing generator managing code behind e.g. object boundaries. HTH