Update: For anyone wondering what I went with in the end -
I divided the result set into 4 and ran 4 instances of the same program, each with one argument indicating which set to process. It did the trick for me. I also considered the PP module; though it worked, I prefer the single-program approach. Please pitch in if this is a horrible implementation! Thanks..
The following is what my program does. Nothing memory intensive. It is serial processing and boring. Could you help me convert this into a more efficient and exciting process? Say I process 1000 records this way; with 4 threads, I could get it to run in 25% of the time!
I read articles on how Python threading can be inefficient if done wrong. Even Python's creator says the same. So I am scared, and while I read more about it, I want to see if the bright folks on here can steer me in the right direction. Many thanks!
def startProcessing(sernum, name):
    '''
    Bunch of statements depending on result,
    will write to database (one update statement)
    Try Catch blocks which upon failing,
    will call this function until the request succeeds.
    '''

for record in result:
    startProc = startProcessing(str(record[0]), str(record[1]))
Python threads can't run at the same time due to the Global Interpreter Lock; you want new processes instead. Look at the multiprocessing module.
(I was instructed to post this as an answer =p.)
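A minimal sketch of what that could look like, assuming result is a list of (sernum, name) rows and startProcessing does its own database update per record (the sample result literal is just a stand-in for however you build it today):

from multiprocessing import Pool

def startProcessing(sernum, name):
    # per-record work and the single database update, as in the original
    ...

def process_record(record):
    return startProcessing(str(record[0]), str(record[1]))

if __name__ == '__main__':
    result = [(1, 'alice'), (2, 'bob')]   # stand-in for however the result set is built today
    with Pool(processes=4) as pool:       # 4 worker processes, so the GIL is not an issue
        pool.map(process_record, result)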
I am currently working on a project where I need to synchronise data between Python and C#. I need to label data from C# using a Python machine learning program. To label the data, I am using timestamps from both applications and, based on the common timestamp, I am labelling the data.
The Python program runs every 0.5 to 1.5 sec and the C# program runs 10 times every 1 sec. Since the two processes run at different rates, I know there is some time lag, so labelling the data using the timestamp is not very accurate. I want to analyse the time lag properly. For this I am looking into options for real-time synchronization between the two programs. I have looked into sockets, but I think there is a better way using IPC. I do not know much about this.
I am thinking of creating a shared variable between Python and C#. Since Python is slower, I will update that variable from Python and read it from the C# program, so the same variable instance in both programs would tell us that they are synchronized. Then I could use the value of this variable instead of the timestamp for labelling the data. I think this might solve the issue. Please let me know what the optimal solution would be to minimize the time lag between the two programs.
Since these are complex projects, I cannot implement them in a single program. I need to find a way to synchronize the two programs.
Any suggestions would be appreciated. Thank you.
I tried working with socket programming, but it was not that good and a bit complex. So I am now thinking about IPC, but I am still not sure which is the best way.
First of all, I implemented a socket in the C# program so that I can get data from it. Then I implemented multiprocessing in Python: one process requests data from the socket and another process runs the ML model. I was able to achieve the synchronization using the multiprocessing module. I used multiprocessing.Event() to wait for the event from the other process. You can also look into shared state in Python using multiprocessing.Value, multiprocessing.Array, and multiprocessing.Event.
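Roughly, the pattern looks like this (a simplified sketch with the real socket and ML details left out as placeholders):

from multiprocessing import Process, Event, Value
import time

def socket_reader(tick, ready):
    while True:
        # ... receive a sample from the C# socket here (omitted) ...
        with tick.get_lock():
            tick.value += 1          # shared "which sample are we on" counter
        ready.set()                  # signal the ML process that a new sample exists
        time.sleep(0.1)              # stand-in for the ~10 Hz C# rate

def ml_worker(tick, ready):
    while True:
        ready.wait()                 # block until the reader signals
        ready.clear()
        print("labelling sample", tick.value)

if __name__ == '__main__':
    tick = Value('i', 0)
    ready = Event()
    Process(target=socket_reader, args=(tick, ready), daemon=True).start()
    ml_worker(tick, ready)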
begin TLDR;
I want to write a python3 script to scan through the memory of a running windows process and find strings.
end TLDR;
This is for a CTF binary. It's a typical Windows x86 PE file. The goal is simply to get a flag from the process's memory as it runs. This is easy with ProcessHacker: you can search through the strings in the memory of the running application and find the flag with a regex. Now, because I'm a masochistic geek, I strive to script out solutions for CTFs (for everything, really). Specifically I want to use Python 3; C# is also an option, but I would really like to keep all of the solution scripts in Python.
I thought this would be a very simple task. You know... pip install some library written by someone who's already solved the problem and use it. I couldn't find anything that would let me do what I need for this task. Here are the libraries I've tried already.
ctypes - This was the first one I used, specifically ReadProcessMemory. I kept getting 299 errors (ERROR_PARTIAL_COPY) because the buffer I was passing in was larger than that section of memory, so I made a recursive function that would catch that exception and halve the buffer length until it got something, THEN read one byte at a time until it hit a 299 error. I may have been on the right track there (see the sketch after this list), but I wasn't able to get the flag. I WAS able to find the flag only if I already knew its exact address (which I'd get from ProcessHacker). I may make a separate question on SO to address that; this one is really just me asking the community if something already exists before diving into this.
pymem - A nice wrapper for ctypes but had the same issues as above.
winappdbg - python2.x only. I don't want to use python 2.x.
haystack - Looks like this depends on winappdbg which depends on python 2.x.
angr - This is a possibility; I've only scratched the surface with it so far. It looks complicated and it's on the to-learn list, but I don't want to dive into something right now that's not going to solve the issue.
volatility - Looks like this is meant for working with full RAM dumps, not for hooking into currently running processes and reading their memory.
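For reference, here is a rough sketch (untested; the PID and the "CTF{" prefix are just placeholders) of the direction I was heading with ctypes: walk the target's memory map with VirtualQueryEx and read each committed, readable region whole with ReadProcessMemory, which avoids the 299 (ERROR_PARTIAL_COPY) errors from guessing buffer sizes.

import ctypes
import ctypes.wintypes as wt

PROCESS_VM_READ = 0x0010
PROCESS_QUERY_INFORMATION = 0x0400
MEM_COMMIT = 0x1000
PAGE_NOACCESS = 0x01
PAGE_GUARD = 0x100

class MEMORY_BASIC_INFORMATION(ctypes.Structure):
    _fields_ = [("BaseAddress", ctypes.c_void_p),
                ("AllocationBase", ctypes.c_void_p),
                ("AllocationProtect", wt.DWORD),
                ("RegionSize", ctypes.c_size_t),
                ("State", wt.DWORD),
                ("Protect", wt.DWORD),
                ("Type", wt.DWORD)]

kernel32 = ctypes.windll.kernel32
kernel32.OpenProcess.restype = wt.HANDLE

pid = 1234                                       # placeholder: take the real PID from ProcessHacker
handle = kernel32.OpenProcess(PROCESS_VM_READ | PROCESS_QUERY_INFORMATION, False, pid)
if not handle:
    raise OSError("OpenProcess failed - check the PID and privileges")

address = 0
while True:
    mbi = MEMORY_BASIC_INFORMATION()
    if not kernel32.VirtualQueryEx(handle, ctypes.c_void_p(address),
                                   ctypes.byref(mbi), ctypes.sizeof(mbi)):
        break                                    # no more regions to query
    readable = (mbi.State == MEM_COMMIT
                and mbi.Protect != PAGE_NOACCESS
                and not (mbi.Protect & PAGE_GUARD))
    if readable:
        buf = ctypes.create_string_buffer(mbi.RegionSize)
        got = ctypes.c_size_t(0)
        if kernel32.ReadProcessMemory(handle, ctypes.c_void_p(address), buf,
                                      ctypes.c_size_t(mbi.RegionSize), ctypes.byref(got)):
            if b"CTF{" in buf.raw[:got.value]:   # assumed flag prefix
                print("hit near", hex(address))
    address += mbi.RegionSize                    # jump to the next region

kernel32.CloseHandle(handle)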
My plan at the moment is to dive a bit more into angr to see if that will work, then go back to pymem/ctypes and try more things. If all else fails, ProcessHacker IS open source; I'm not fluent in C, so it'll take time to figure out how they're doing it. I'm really hoping there's some Python 3 library I'm missing, or maybe I'm going about this the wrong way.
I ended up writing the script using the frida library. Also have to give a shout-out to rootbsd, because his or her code in the fridump3 project helped greatly.
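Roughly, the idea looks like this (a simplified sketch, not the exact fridump3 code; the process name "target.exe" and the "CTF{" prefix are placeholders): inject a small script into the running process, scan every readable range for a byte pattern, and send matches back to Python.

import time
import frida

JS = """
var pattern = '43 54 46 7b';                         // hex for "CTF{", the assumed flag prefix
Process.enumerateRangesSync('r--').forEach(function (range) {
    var hits = [];
    try {
        hits = Memory.scanSync(range.base, range.size, pattern);
    } catch (e) {
        return;                                      // range vanished or is unreadable; skip it
    }
    hits.forEach(function (hit) {
        try {
            send(Memory.readUtf8String(hit.address, 64));   // grab the text around the hit
        } catch (e) {}
    });
});
"""

def on_message(message, data):
    if message.get('type') == 'send':
        print(message['payload'])

session = frida.attach('target.exe')                 # attach by process name or PID
script = session.create_script(JS)
script.on('message', on_message)
script.load()                                        # runs the scan inside the target
time.sleep(2)                                        # crude: let the async messages drain
session.detach()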
Say, I have two files: demo.py
# demo.py
from pathlib import Path

for i in range(5):
    exec(Path('another_file.txt').read_text())
and another_file.txt (note the indent)
    print(i)
Is it possible to make python demo.py run?
N.B. This is useful when using Page (or wxFormBuilder or PyQt's Designer) to generate a GUI layout where skeletons of callback functions are automatically generated. The skeletons have to be modified, but at the same time each regeneration overwrites them -- the code snippets have to be copied back. Anyway, you know what I am talking about if you have used any of Page, wxFormBuilder, or PyQt's Designer.
You can solve the basic problem by removing the indent:
from pathlib import Path
import textwrap

for i in range(5):
    exec(textwrap.dedent(Path('another_file.txt').read_text()))
There are still two rather major problems with this:
There are serious security implications here. You're running code without including it in your project. The idea that you can "worry about security and other issues later" will cause you pain at a later point. You'll see similar advice on this site with avoiding SQL injection. That later date may never come, and even if it does, there's a very real chance you won't remember or correctly identify all of the problems. It's absolutely better to avoid the problems in the first place.
Also, with dynamic code like this, you run the very real risk of running into a syntax error with a call stack that shows you nothing about where the code is from. It's not bad for simple cases like this, but as you add more and more complexity to a project like this, you'll likely find you're adding more support to help you debug issues you run into, rather than spending your time adding features.
And, combining these two problems can be fun. It's contrived, but if you changed the for loop to a while loop like this:
i = 0
while i < 5:
    exec(textwrap.dedent(Path('another_file.txt').read_text()))
    i += 1
And then modified the text file to be this:
print(i)
i += 1
It's trivial to understand why it's no longer operating the 5 times you expect, but as both "sides" of this project get more complex, figuring out the complex interplay between the elements will get much harder.
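If you do end up keeping this pattern anyway, one small mitigation (a sketch, not an endorsement): compile the file's text with its real filename so syntax errors and tracebacks at least point at another_file.txt instead of an anonymous string.

from pathlib import Path
import textwrap

for i in range(5):
    src = textwrap.dedent(Path('another_file.txt').read_text())
    # the second argument becomes the filename shown in tracebacks
    exec(compile(src, 'another_file.txt', 'exec'))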
In short, don't use exec. Future you will thank past you for making your life easier.
Python newbie here. Sorry if I am a bit unclear.
I have a Python script that reads temperature, and I have another script that grabs the temperature value from the first one. If the first script terminates, the second will terminate with it.
How can I keep the second script from terminating?
Thank you ;)
This isn't a direct answer to your question, but it's more of a directional consideration that you need to think about.
Since you're talking about temperature monitoring, it's a classic example of a publish/subscribe model. Recently, protocols like MQTT have gained in popularity, and MQTT is really helpful for the scenario you're in.
You should really design your application with scalability in mind and explore your options before just using threaded Python code. There's a cool Python example here for MQTT.
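A minimal sketch of the idea, assuming the paho-mqtt 1.x client API and a broker such as mosquitto running on localhost (the topic name is arbitrary): the sensor script publishes readings, and the second script subscribes and keeps running on its own, regardless of whether the publisher is alive.

import paho.mqtt.client as mqtt   # assumes paho-mqtt 1.x style Client()

# --- publisher side (the temperature-reading script) ---
def publish_temperature(value):
    client = mqtt.Client()
    client.connect("localhost", 1883)
    client.publish("sensors/temperature", str(value))
    client.disconnect()

# --- subscriber side (the second script) ---
def on_message(client, userdata, msg):
    print("temperature:", msg.payload.decode())

def run_subscriber():
    client = mqtt.Client()
    client.on_message = on_message
    client.connect("localhost", 1883)
    client.subscribe("sensors/temperature")
    client.loop_forever()         # keeps running even if the publisher exits

if __name__ == "__main__":
    run_subscriber()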
I have a program that is performing waaaay under par, and I would like to profile it. However, it is multithreaded, so I can't seem to find a good way to profile this thing. Any advice?
I've tried yappi, but it segfaults on OS X :(
EDIT: This is in python, sorry for putting it under profiling...
Are you multithreading or multiprocessing? If you are just multithreading, then that is the problem. Python currently has problems with multithreading on a multiprocessor system because of the Global Interpreter Lock (GIL). They are working on improving it for Python 3.2 - at least so that your program won't run slower on multiple cores than on a single core.
If you aren't convinced, take a look at the shootout results for the thread-ring program. Running on a single core is faster than running on quad cores.
Now, if you use multiprocessing instead, profiling can be difficult as well, because then you have to run cProfile in each separate process. There are some existing questions that point you in the right direction, though.
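A minimal sketch of the per-process approach: each worker wraps its own work in cProfile and dumps stats to a file named after its PID, which you can inspect later with pstats (do_work here is just a stand-in for your real workload).

import cProfile
import os
from multiprocessing import Pool

def do_work(n):
    # placeholder workload
    return sum(i * i for i in range(n))

def profiled_worker(n):
    profiler = cProfile.Profile()
    profiler.enable()
    result = do_work(n)
    profiler.disable()
    profiler.dump_stats(f"worker_{os.getpid()}.prof")   # one stats file per worker process
    return result

if __name__ == "__main__":
    with Pool(4) as pool:
        print(pool.map(profiled_worker, [10**6] * 8))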
Depending on how far you've come in your troubleshooting, there are some tools that might point you in the right direction.
"top" is a helpful start to show you if your problem is burning CPU time or simply waiting for stuff.
"dtruss -c" can show you where you spend time and what system calls takes most of your time.
Both these can give you a hint without knowing anything about python.
If you just want to use yappi, it isn't too much work to set up VirtualBox and install some sort of Linux on your machine. I find myself doing that from time to time when I want to try something.
There might of course be things I don't know about that make it impossible or not worth the effort. Also, profiling on another OS running virtualized might not give exactly the same results, but it might still be helpful.