Calling python multiprocess on Windows - python

I'm developing a personal app that involves web scraping, and some of the files in its folders need multiprocessing to run efficiently, like any other web scraping software. I know that Windows lacks some features available on Linux, such as fork, so when we want to instantiate a process pool in our program we need to make sure that the code creating it is enclosed by if __name__ == "__main__":.
But something is bothering me. Here's the scenario (there are several of them in my program at the moment). In my main file I'm calling import web_scraping_x, which contains a multiprocessing pool. The question is: how do I protect my web_scraping_x code without messing up everything inside it?
Here's some pseudo-code (for the sake of clarity):
    # "__main__"
    from web_scraping_x import scraping_manager

    instructions = ["some list or dict"]
    data = scraping_manager(instructions)
and
    # "web_scraping_x"
    def scraping_manager(instructions):
        if instructions:
            return scraping_a(instructions)

    def scraping_a(instructions):
        "do a lot of work ur proletariat"
I'm trying not to come up with a solution that is as bad as the problem itself, such as importing my module with a changed __name__, even though that may not be avoidable on Windows. I'd really appreciate any tips. If needed, I could show my code, but I think it would just make things more difficult to follow.

Related

How to import script that requires __name__ == "__main__"

I'm pretty new to Python, and this question probably shows that. I'm working on the multiprocessing part of my script and couldn't find a definitive answer to my problem.
I'm struggling with one thing. When using multiprocessing, part of the code has to be guarded with if __name__ == "__main__". I get that, and my pool is working great. But I would love to import that whole script (making it one big function that returns a value would be best). And here is the problem. First, how can I import something if part of it will only run when launched from the main/source file because of that guard? Second, if I manage to work it out and the whole script ends up in one big function, pickle can't handle that; would using "multiprocessing on dill" or "pathos" fix it?
Thanks!
You are probably confused about the concept. The if __name__ == "__main__" guard in Python exists precisely so that all Python files can be importable.
Without the guard, a file, once imported, would behave as if it were the "root" program, and it would take a lot of boilerplate and inter-process communication (like writing a "PID" file at a fixed filesystem location) to coordinate imports of the same code, including for multiprocessing.
Just leave under the guard whatever code needs to run for the root process. Everything else you move into functions that you can call from the importing code.
If you ran "all" of the script on import, even the part that sets up the multiprocessing workers would run, and any simple job would create more workers exponentially until all machine resources were taken (i.e., it would crash hard and fast, potentially leaving the machine unresponsive).
So, this is a good pattern: the "dothejob" function can call all the other functions you need, so you just need to import and call it, either from a master process or from any other project importing your file as a Python module.
    import multiprocessing
    ...

    def dothejob():
        ...

    def start():
        # code to set up and start multiprocessing workers,
        # like:
        worker1 = multiprocessing.Process(target=dothejob)
        ...
        worker1.start()
        ...
        worker1.join()

    if __name__ == "__main__":
        start()
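The same pattern carries over to a pool. Here is a minimal sketch, assuming a hypothetical worker `scrape_one` and a hypothetical function `scraping_manager` that owns the pool (both names and the placeholder work are illustrative, not taken from the question). The pool is created only when the function is called, so importing the module starts nothing, and the only guarded code is the call in the entry-point script:

```python
import multiprocessing


def scrape_one(instruction):
    # Hypothetical worker standing in for the real per-item work.
    # It is defined at module level so child processes can import it.
    return instruction.upper()


def scraping_manager(instructions, processes=2):
    # The pool is created inside a function, so nothing is spawned at import time.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(scrape_one, instructions)


if __name__ == "__main__":
    # Only the entry-point script needs the guard. When a child process
    # re-imports this file on Windows, this block is skipped.
    print(scraping_manager(["page a", "page b"]))
```

Another file could import this module and call `scraping_manager(...)` from under its own `if __name__ == "__main__":` guard in the same way.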

How do I run Python scripts automatically, while my Flask website is running on a VPS?

Okay, so basically I am creating a website. The data I need to display on this website is delivered twice daily, where I need to read the delivered data from a file and store this new data in the database (instead of the old data).
I have created the Python functions to do this. However, I would like to know the best way to run this script while my Flask application is running. This may have a very simple answer, but I have seen some answers saying to incorporate the script into the website design (without explaining how), and others saying to run it separately. The script needs to run automatically throughout the day with no monitoring or input from me.
TIA
Generally it's a really bad idea to make the web server, which is the Flask application in your case, handle such tasks. There are many reasons for this; to name a few:
Python's Achilles' heel: the GIL (global interpreter lock).
System resources of the application end up shared between serving users and other operations.
Crashes: they may be unlikely, but they do happen. And if you are not careful, the web application goes down along with the task.
So with that in mind I'd advise you to ditch this idea and use cron. Basically, write a script that does whatever transformations or operations it needs to do, and create a cron job that runs it at the desired times.
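A minimal sketch of such a standalone script, assuming the delivered file is a CSV of `name,value` rows and the database is SQLite. The file layout, the table name `readings`, and the script name are all hypothetical. The old rows are deleted and the new ones inserted in a single transaction, so a reader never sees a half-updated table:

```python
import csv
import sqlite3
import sys


def refresh_data(csv_path, db_path):
    """Replace the contents of the `readings` table with the rows in csv_path."""
    with open(csv_path, newline="") as handle:
        rows = [(name, float(value)) for name, value in csv.reader(handle)]

    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS readings (name TEXT, value REAL)")
        # `with conn` commits the DELETE and INSERTs together,
        # or rolls them back if anything raises.
        with conn:
            conn.execute("DELETE FROM readings")
            conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    finally:
        conn.close()
    return len(rows)


if __name__ == "__main__":
    # A crontab entry such as (paths are placeholders)
    #   0 6,18 * * * /usr/bin/python3 /path/to/refresh_data.py /path/to/delivery.csv /path/to/site.db
    # would run this at 06:00 and 18:00 every day.
    if len(sys.argv) == 3:
        print(refresh_data(sys.argv[1], sys.argv[2]), "rows loaded")
    else:
        print("usage: refresh_data.py DELIVERY.csv SITE.db")
```

Because the script runs in its own process, a crash in it does not take the Flask application down.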

Utility to manage multiple python scripts

I saw this post on Medium, and wondered how one might go about managing multiple python scripts.
How I Hacked Amazon's Wifi Button
This describes a system where you need to run one or more scripts continuously to catch and react to events in your network.
My question: Let's say I had multiple Python scripts that I wanted to run while I work on other things. What approaches are available to manage these scripts? I have to imagine there is a better way than having a large number of terminal windows running each script individually.
I am coming back to python, and have no formal training in computer programming, so any guidance you can provide will be greatly appreciated.
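If you have several .py files in a directory that you want to run, in no particular order, you can do the following. Note that this runs them one after another inside the current process, and that execfile exists only in Python 2, so runpy.run_path is used here as a Python 3 replacement:
    import glob
    import runpy

    py_files = glob.glob('path/*.py')
    for py_file in py_files:
        # run each file as if it had been started as a script
        runpy.run_path(py_file, run_name='__main__')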
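The loop above runs the files one at a time. If the scripts should instead run side by side, a small launcher can start each one as its own child process. This is a minimal sketch using the standard subprocess module; `launch_all` and `wait_all` are hypothetical names, and the directory is whatever the caller passes in:

```python
import glob
import os
import subprocess
import sys


def launch_all(directory):
    """Start every .py file in `directory` as a separate Python process."""
    processes = {}
    for path in sorted(glob.glob(os.path.join(directory, "*.py"))):
        # sys.executable reuses the interpreter that runs the launcher.
        processes[path] = subprocess.Popen([sys.executable, path])
    return processes


def wait_all(processes):
    """Block until every child exits; return {script path: exit code}."""
    return {path: proc.wait() for path, proc in processes.items()}


if __name__ == "__main__":
    if len(sys.argv) == 2:
        print(wait_all(launch_all(sys.argv[1])))
    else:
        print("usage: launcher.py SCRIPT_DIRECTORY")
```

The exit codes returned by `wait_all` make it possible to notice when one of the scripts has stopped with an error.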
Your system already runs a large number of background processes, with output to the system log or occasionally to a service-specific log file.
A common arrangement for quick and dirty deployments -- where you don't necessarily want to invest in making the scripts robust and well-behaved enough to run as proper services -- is to start the script inside screen or tmux. You can detach when you don't need to be looking at it, and can reattach at any time -- even from a remote login -- to view the output, or to troubleshoot.
Take a look at luigi (I've not used it).
https://github.com/spotify/luigi
These days (five years after the question was asked) a lot of people use Docker Compose. But that's a little heavyweight, depending on what you want to do.
Just today I came across the script server by bugy. It might be a solution for you or somebody else.
(I am just trying to find a Tampermonkey-like script structure for Python.)

Strategy to Implement Linked Processes in Python

Hey everyone, I have a broad question that I'd really appreciate feedback/insight on.
I'm a newish programmer and am developing an 'Engine Development Environment' for fun at work. I have a program that makes a lot of run files, and I have another program that manages batched/local multiprocessed simulations.
I want them to work seamlessly together, but I only want one instance of the 'BatchMaster' to run locally (from the taskbar). I also do a lot of scripting for exploratory data analysis and would love the ability to launch simulations in a single line of code like the following.
    import enginemodels
    aetd = enginemodels.aetd(bleed='ON')
    results = aetd.run(altitude=80000, mach=5, tempsls=60)
I would like my engine model's run method to send input to the 'BatchMaster' process.
My question has two parts:
1) How do you find a multiprocessing instance in Windows and send it information?
2) If there isn't an instance of that program, how do you launch it?
Thanks for any feedback or insight you can provide!
This will really help a lot of people at my workplace who aren't good at programming and do most of their work making files via copy-paste.
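One possible approach to both parts, sketched under assumptions: the 'BatchMaster' listens on a local socket through `multiprocessing.connection.Listener`, and the client tries to connect, starting the program only if nothing answers. The address, the `AUTHKEY` value, the launch command, and the function names are all hypothetical; none of them come from the programs described above.

```python
import subprocess
import time
from multiprocessing.connection import Client, Listener

AUTHKEY = b"batchmaster-demo"  # hypothetical shared secret


def serve_once(listener):
    """BatchMaster side: accept one connection and return the job it carries."""
    with listener.accept() as conn:
        return conn.recv()


def send_job(job, address, launch_cmd=None, retries=20, delay=0.25):
    """Client side: send `job` to a running instance, launching one if needed."""
    try:
        conn = Client(address, authkey=AUTHKEY)
    except ConnectionRefusedError:
        if launch_cmd is None:
            return False
        subprocess.Popen(launch_cmd)  # start the single local instance
        for _ in range(retries):      # then wait for it to begin listening
            time.sleep(delay)
            try:
                conn = Client(address, authkey=AUTHKEY)
                break
            except ConnectionRefusedError:
                continue
        else:
            return False
    with conn:
        conn.send(job)
    return True
```

Under this design the engine model's run method would call something like `send_job(...)` with the simulation inputs, and the 'BatchMaster' would loop over `serve_once` on a listener it creates at startup with a fixed address, for example `Listener(("localhost", 6000), authkey=AUTHKEY)`, where the port is again an arbitrary choice.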

Run the main python program from any sub-program file

This is more of a convenience than a real problem, but the project I'm working on has a lot of separate files, and I want to basically be able to run any of those files (that all basically only contain classes) to run the main file.
Now, in the middle of writing the first sentence of this question, I tried just importing main.py into each file, and that seemed to work fine and dandy, but I can't help feeling that:
it might cause problems, and
I had problems with circular imports before, so I am somewhat surprised that nothing came up.
First let me say: this is most likely a bad idea, and it's definitely not at all standard. It will likely lead to confusion and frustration down the road.
However, if you really want to do it, you can put this in each file:
    if __name__ == "__main__":
        from mypackage import main
        main.run()
Assuming mypackage.main.run() is your main entry point, this will let you run any file you want as if it were the main file.
You may still hit issues with circular imports, and those will be completely unavoidable unless mypackage.main doesn't import anything… which would make it fairly useless :)
As an alternative, you may wish to use a testing framework like doctest or unittest, then configure your IDE to run the unit tests from a hotkey. This way you're automatically building the repeatable tests as you develop your code.
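A minimal sketch of the doctest variant, assuming a hypothetical `Stack` class that stands in for one of the project's class-only files. Running the file directly executes the examples in its docstrings, while importing it does nothing extra:

```python
import doctest


class Stack:
    """A tiny last-in, first-out container.

    >>> s = Stack()
    >>> s.push(1)
    >>> s.push(2)
    >>> s.pop()
    2
    >>> len(s)
    1
    """

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def __len__(self):
        return len(self._items)


if __name__ == "__main__":
    # Runs every docstring example in this module; prints only on failure.
    doctest.testmod()
```

With this layout each file checks itself when run, and no file needs to import the main module.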
