The mongo shell includes a useful print command.
When executing map_reduce from pymongo, how can one print / log info from within a javascript block?
Update: OK, I have an answer. The process running mongo will output whatever is printed with print(). A second option is to configure mongo logging (and probably tail the log files). Virtual bonus points for answering "Can I still get these values directly in Python via pymongo?"
You can still use print() inside the JS block, and the output will get written to /var/log/mongodb/mongodb.log (more info here). Because map/reduce happens server-side, it won't emit anything in your Python application.
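If you do end up tailing the log, one low-tech way to get values back into Python is to prefix your JS print() output with a distinctive marker and filter for it. The marker and helper below are illustrative, not part of pymongo:

```python
def extract_js_prints(log_lines, marker="MAPREDUCE:"):
    """Pull out values emitted by JS print() calls, assuming the
    map/reduce code prefixed them, e.g. print("MAPREDUCE: " + key)."""
    return [line.split(marker, 1)[1].strip()
            for line in log_lines if marker in line]
```

You would read the lines from /var/log/mongodb/mongodb.log (or wherever your mongod logs) and pass them in.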
Original problem
I am creating an API using express that queries a sqlite DB and outputs the result as a PDF using html-pdf module.
The problem is that certain queries might take a long time to process, so I would like to decouple the actual query call from the node server where express is running; otherwise the API might slow down if several clients are running heavy queries.
My idea to solve this was to move the execution of the sqlite query into a separate Python script. This script can then be called from the API, avoiding the use of node to query the DB.
Current problem
After quickly creating a python script that runs a sqlite query, and calling that from my API using child_process.spawn(), I found out that express seems to get an exit code signal as soon as the python script starts to execute the query.
To confirm this, I created a simple python script that just sleeps in between printing two messages and the problem was isolated.
To reproduce this behavior you can create a python script like this:
print("test 1")
sleep(1)
print("test 2")
Then call it from express like this:
router.get('/async', function(req, res, next) {
var python = child_process.spawn('python3', ['script.py']); // script.py = the test script above
var output = "";
python.stdout.on('data', function(data){
output += data
console.log(output)
});
python.on('close', function(code){
if (code !== 0) {
return res.status(200).send(code)
}
return res.status(200).send(output)
});
});
If you then run the express server and do a GET /async you will get a "1" as the exit code.
However, if you comment out the sleep(1) line, the server successfully returns
test 1
test 2
as the response.
You can even trigger this using sleep(0).
I have tried flushing stdout before the sleep, piping the result instead of using .on('close'), and calling python with the -u option (for unbuffered streams).
None of this has worked, so I'm guessing there's some mechanism baked into express that closes the request as soon as the spawned process sleeps or finishes (instead of only when it finishes).
I also found this answer related to using child_process.fork(), but I'm not sure whether it would behave differently; this other question is very similar to my issue but has no answer.
Main question
So my question is, why does the python script send an exit signal when doing a sleep() (or in the case of my query script when running cursor.execute(query))?
If my supposition is correct that express closes the request when a spawned process sleeps, is this avoidable?
One potential solution I found suggested the use of ZeroRPC, but I don't see how that would make express keep the connection open.
The only other option I can think of is using something like Kue so that my express API will only need to respond with some sort of job ID, and then Kue will actually spawn the python script and wait for its response, so that I can query the result via some other API endpoint.
Is there something I'm missing?
Edit:
AllTheTime's comment is correct regarding the sleep issue: after I added from time import sleep, it worked. However, my sqlite script is still not working.
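The sleep failure is easy to reproduce in isolation: without the import, Python raises a NameError at the sleep() call and exits with code 1, which is exactly the exit code express reports (stdlib-only sketch):

```python
import subprocess
import sys

# Run the question's script in a child interpreter, without the import.
script = 'print("test 1")\nsleep(1)\nprint("test 2")\n'
proc = subprocess.run(
    [sys.executable, "-c", script],
    capture_output=True, text=True,
)
# "test 1" is printed, then a NameError kills the script before
# "test 2", and the interpreter exits with code 1.
```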
As it turns out AllTheTime was indeed correct.
The problem was that in my python script I was loading a config.json file, which was loaded correctly when called from the console because the path was relative to the script.
However when calling it from node, the relative path was no longer correct.
After fixing the path it worked exactly as expected.
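For reference, one way to make such a config load independent of the caller's working directory is to build the path from `__file__`. The `config.json` name is from the question; the helper itself is just a sketch:

```python
import json
import os

def config_path(filename="config.json", base=None):
    # Default to the directory containing this script, so the path
    # resolves the same way whether we are run from a console or
    # spawned by node with a different working directory.
    if base is None:
        base = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(base, filename)

def load_config(filename="config.json", base=None):
    with open(config_path(filename, base)) as f:
        return json.load(f)
```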
I have a Python program that opens a socket that communicates with a program on
a remote computer.
I want to check if the program on the remote computer was opened with admin permissions.
I have tried looking online, with no success.
This depends heavily on what the remote program is, and what other access you have to the server. The most reliable way would be to query the program so that it tells you its permissions - how to do this, and even whether it is actually possible, depends on what queries it supports and what responses it gives.
If that isn't possible, you may be able to ask other processes on the server about it - for example, if you have shell access, you could try parsing the output of ps aux or locating information under /proc. But that approach is quite brittle, and also comes with a raft of security issues - you would be giving shell, and possibly admin, access to every person who runs your script.
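For the shell-access route, here is a minimal sketch of asking a POSIX `ps` for the user a process runs as; treat it as illustrative rather than a robust implementation:

```python
import subprocess

def process_user(pid):
    # `ps -o user= -p PID` prints just the effective user of the
    # process, or nothing if the pid does not exist.
    result = subprocess.run(
        ["ps", "-o", "user=", "-p", str(pid)],
        capture_output=True, text=True,
    )
    user = result.stdout.strip()
    return user or None

def is_running_as_root(pid):
    return process_user(pid) == "root"
```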
You should probably reconsider what you are trying to do with this information - there is probably a better way to solve the problem than inspecting the privileges of remote services. You've said that the process is an rpyc instance that gives you access to some Python module. Presumably, some of its functions are unavailable if it isn't running with enough permissions - in this case, the most Pythonic solution is to do exactly what you would do were it a local script: try to do whatever it is you need to do, and be prepared to handle a failure. This would usually involve a try/except block like this:
try:
privileged_operation()
except PermissionError:
# Handle the problem
You would need to consult the module's documentation, or play around with it a bit, to find out the exact error that it gives you here.
I have a script that continually runs and accepts data (For those that are familiar and if it helps, it is connected to EMDR - https://eve-market-data-relay.readthedocs.org).
Inside the script I have debugging built in so that I can see how much data is currently in the queue for the threads to process; however, this is built to just print to the console. What I would like is to run the same script with an additional option, or a totally different script, that returns the current queue count without having to enable debug.
Is there a way to do this? Could someone please point me toward the documentation/libraries I need to research?
There are many ways to solve this; two that come to mind:
You can write the queue count to a k/v store (like memcache or redis) and then have another script read it and take whatever other actions are required.
You can create a specific logger for your informational output (like the queue length) and set it to log somewhere other than the console. For example, you could use it to send you an email or log to an external service, etc. See the logging cookbook for examples.
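A minimal sketch of the second option, with a dedicated logger whose handler you can later swap for SMTPHandler, HTTPHandler, or anything else (the logger name is illustrative):

```python
import logging

def make_queue_logger(stream):
    # A logger just for queue metrics, isolated from the debug/console
    # output; attach whatever handler you need (file, email, HTTP, ...).
    logger = logging.getLogger("emdr.queue")
    logger.setLevel(logging.INFO)
    logger.propagate = False   # don't leak into the root/console logger
    logger.handlers.clear()    # make re-configuration idempotent
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Your main loop would then call something like queue_log.info("queue length: %d", qsize) instead of printing.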
I need you guys :D
I have a web page where I check some items and pass their values as variables to a Python script.
problem is:
I need to write a Python script that puts these variables into my predefined shell commands and runs them.
One is a gnuplot command and the other is a different shell command.
I have never done anything in Python; can you give me some advice?
Thanks
I can't fully address your question without knowing which web framework you are using, but here is some advice and guidance you should find useful. I had a similar problem that required me to run a shell program with arguments derived from user requests (I was using the Django framework in Python).
Now there are several factors that you have to consider
How long each job will take
What load you are expecting (are there going to be lots of jobs?)
Whether there will be any side effects from your shell command
Here is why each of these is important.
How long each job will take.
Depending on your framework and browser, there is a limit on how long a connection to the server is kept alive. In other words, you have to make sure that the time the server takes to respond to a user request does not exceed the connection timeout set by the server or the browser. If it takes too long, you will get a server connection timeout, i.e. an error response because nothing came back from the server side.
What load you are expecting.
You have probably figured out that a huge job will take more resources than you would like. Also, multiple requests at the same time will take a huge toll on your server. For instance, if you do proceed with using subprocess for your jobs, it is important to note whether each job is blocking or non-blocking.
Side effects.
It is important to understand the side effects of your shell process. For instance, if it writes and generates lots of temp files, you will have to consider the permissions your script runs with. It is a complex task.
So how can this be resolved?
subprocess, which ships with base Python, will allow you to run shell commands using Python. If you want more sophisticated tools, check out the fabric library. For passing arguments, check out optparse (or the newer argparse) and sys.argv.
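A minimal sketch of the subprocess + sys.argv part; the command names are placeholders, and the key point is passing arguments as a list so the shell never interprets user input:

```python
import subprocess
import sys

def run_command(args):
    # args is a list like ["gnuplot", "plot.gp"]; passing a list (and
    # not using shell=True) keeps user-supplied values out of shell
    # parsing, avoiding injection.
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, result.stdout

if __name__ == "__main__":
    # Variables from the web page arrive as command-line arguments.
    code, out = run_command(["echo"] + sys.argv[1:])
    print(out, end="")
```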
If you expect a huge workload or a long processing time, do consider setting up a queue system for your jobs. A popular framework like celery is a good example. You may look at gevent and asyncio (Python 3) as well. Generally, instead of returning a response on the fly, you can return a job id or a URL at which the user can come back later on and have a look.
Point to note!
Permissions and security are vital! The last thing you want is for people to execute shell commands that are detrimental to your system.
You can also increase the connection timeout, depending on the framework you are using.
I hope you will find this useful
Cheers,
Biobirdman
I'm trying to run a Disco job using map and reduce functions that are deserialized after being passed over a TCP socket using the marshal library. Specifically, I'm unpacking them with
code = marshal.loads(data_from_tcp)
func = types.FunctionType(code, globals(), "func")
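For context, the full round trip looks roughly like this. Note that marshal output is CPython-version-specific, and only the code object travels - globals, closures and defaults do not:

```python
import marshal
import types

def serialize_function(func):
    # Only the code object is marshalled; the function must not rely
    # on closures or non-builtin globals on the receiving side.
    return marshal.dumps(func.__code__)

def deserialize_function(data, name="func"):
    code = marshal.loads(data)
    return types.FunctionType(code, globals(), name)
```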
I've already tested plain Disco jobs (with locally defined functions) on the same system, and they work fine. However, when I run a Disco job with the new functions, the jobs keep failing and I keep getting the error message: localhost WARNING: [map:0] Could not parse worker event: invalid_length
I've searched the documentation, and there is no mention that I could find of a "worker event", or of an invalid_length. After doing a grep on the source code, I find a single instance of the phrase "Could not parse worker event:", specifically in the file master/src/disco_worker.erl. I'm not familiar with Erlang, and have no idea how this works.
What is causing this problem? Should I do something else to circumvent it?
EDIT: After more debugging, I've realized that this error is tied to my use of the string.split() method inside my test-case function. Whenever it is used (even on strings that are not part of the input), this error is raised. I've verified that the method does exist on the object, but calling it seems to cause problems. Any thoughts?
EDIT 2: In addition, any use of the re.split function achieves the same effect.
EDIT 3: It appears that calling any string function on the input string in the map function creates this same error.
In my case this warning always occurred when I printed something to sys.stderr in the map function (and the job failed in the end).
The worker protocol documentation says: Workers should not write anything to stderr, except messages formatted as described below. stdout is also initially redirected to stderr.
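If you still need debugging output from inside a map function, one stderr-safe approach is to route it to a file via logging; the path and logger name below are illustrative:

```python
import logging

def get_worker_debug_logger(path):
    # Write debug messages to a file so nothing ever reaches stderr,
    # which Disco reserves for its worker protocol messages.
    logger = logging.getLogger("disco.debug")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False   # keep messages off any stderr handlers
    if not logger.handlers:
        logger.addHandler(logging.FileHandler(path))
    return logger
```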