How to multiprocess pandas dataframe using map? - python

So I am able to multiprocess with the map function, but when I add another variable it does not work.
       name                                   url
0   camera1     http://x.x.x.x:83/mjpg/video.mjpg
1   camera2     http://x.x.x.x:82/mjpg/video.mjpg
2   camera3     http://x.x.x.x:80/mjpg/video.mjpg
3   camera4   http://x.x.x.x:8001/mjpg/video.mjpg
4   camera5   http://x.x.x.x:8001/mjpg/video.mjpg
5   camera6     http://x.x.x.x:81/mjpg/video.mjpg
6   camera7     http://x.x.x.x:80/mjpg/video.mjpg
7   camera8     http://x.x.x.x:88/mjpg/video.mjpg
8   camera9     http://x.x.x.x:84/mjpg/video.mjpg
9  camera10     http://x.x.x.x:80/mjpg/video.mjpg
Here is my pandas dataframe (I have actual IPs, btw).
The code below works. I have only one variable in the subprocess call; what the code is doing is recording the HTTP URLs all at once.
import multiprocessing as mp
import subprocess
import pandas as pd

camera_df = pd.read_csv('/home/test/streams.csv', low_memory=False)

def ffmpeg_function(*arg):
    subprocess.run(["/usr/bin/ffmpeg", "-y", "-t", "10", "-i", *arg, "-f", "null", "/dev/null"],
                   capture_output=True)

p = mp.Pool(mp.cpu_count())
camera_df['url'] = p.map(ffmpeg_function, camera_df['url'])
But when I try to add another variable to name the mp4 file that I am recording, it does not work. What I am trying to do is record the HTTP URL and name the mp4 file after the name in the column next to it:
import multiprocessing as mp
import subprocess
import pandas as pd

camera_df = pd.read_csv('/home/test/streams.csv', low_memory=False)

def ffmpeg_function(*arg):
    subprocess.run(["/usr/bin/ffmpeg", "-y", "-t", "10", "-i", *arg, *arg], capture_output=True)

p = mp.Pool(mp.cpu_count())
video_file = '/home/test/test.mp4'
camera_df['url'] = p.map(ffmpeg_function, [camera_df['url'], camera_df['url']])
I get the following error:
TypeError: expected str, bytes or os.PathLike object, not Series

There is absolutely no good reason to involve pandas in any of this. Just use:
import csv
import multiprocessing as mp
import subprocess

def ffmpeg_function(row):
    # each csv row is (name, url): record the url into <name>.mp4
    name, url = row
    result = subprocess.run(
        ["/usr/bin/ffmpeg", "-y", "-t", "10", "-i", url, "{}.mp4".format(name)],
        capture_output=True,
    )
    return result.stdout  # not sure what you actually need...

with open('/home/test/streams.csv') as f, mp.Pool(mp.cpu_count()) as pool:
    reader = csv.reader(f)
    next(reader)  # skip header in csv
    result = pool.map(ffmpeg_function, reader)
If you insist on using pandas to do this, then just use itertuples:
with mp.Pool(mp.cpu_count()) as pool:
    df = pd.read_csv('/home/test/streams.csv')
    df['whatever'] = pool.map(
        ffmpeg_function,
        df.itertuples(index=False, name=None)
    )
There are a lot of different ways you could have done this.
Note: in ffmpeg_function you have to actually return something. Not exactly sure what you want. You may want to use return result.stdout.decode() if you want a string instead of a bytes object.
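If you'd rather keep the name and the URL as two explicit parameters instead of one row tuple, a starmap variant also works. This is only a minimal sketch: the output path /home/test/<name>.mp4 and the result column returncode are assumptions, not from the original code.

import multiprocessing as mp
import subprocess
import pandas as pd

def ffmpeg_function(name, url):
    # record ten seconds of the stream into <name>.mp4 (output path is an assumption)
    result = subprocess.run(
        ["/usr/bin/ffmpeg", "-y", "-t", "10", "-i", url, "/home/test/{}.mp4".format(name)],
        capture_output=True,
    )
    return result.returncode

if __name__ == "__main__":
    camera_df = pd.read_csv('/home/test/streams.csv')
    with mp.Pool(mp.cpu_count()) as pool:
        # starmap unpacks each (name, url) pair into the two parameters
        camera_df['returncode'] = pool.starmap(
            ffmpeg_function,
            zip(camera_df['name'], camera_df['url']),
        )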

Related

Trying to ping several PCs with Python

I'm trying to read a file which contains the IPs of 300 computers, and then write to a new file whether or not each one exists.
import os
import datetime
import platform
import subprocess

date = datetime.datetime.now()
day = date.day
hour = date.hour
os.chdir('Path')
openips = open("ips.txt", "r")
ipfile = openips.readlines()
print(ipfile)
for ips in ipfile():
    ips = ips.strip()
    print(ips)
    args = ["ping", "-n", "4", "-l", "1", "-w", "1000", ips]
    pping = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    for line in pping.stdout:
        print(line)
    os.chdir('Path')
    presult = open("pingresults_{}_{}.txt".format(day, hour), 'a')
    presult.write('{}_{}\n'.format(ips, line))
    presult.close()
I don't know why, but every time I test my code the result is:
TypeError: 'list' object is not callable
I've tried everything; even when I change the variable to a string it says the same, just with 'str' in place of 'list'.
Change:
for ips in ipfile:
    ...
No parentheses: ipfile is a list, and calling it with ipfile() is what raises the TypeError.
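For reference, a minimal corrected sketch of that loop, using subprocess.run so the ping output is captured in one call (the -n/-l/-w flags are the Windows ping options from the question; the output filename is simplified here):

import subprocess

with open("ips.txt") as f:
    for ips in f:  # iterate the file directly, no () call
        ips = ips.strip()
        args = ["ping", "-n", "4", "-l", "1", "-w", "1000", ips]
        # run() blocks until ping finishes and collects its output
        completed = subprocess.run(args, capture_output=True, text=True)
        with open("pingresults.txt", "a") as presult:
            presult.write("{}_{}\n".format(ips, completed.stdout))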

Knowing if my server is up or down using ping

Thanks to all for your time.
I'm trying to find out whether several servers are up or down using ping, and it works... but when I try to convert the result into "up" or "down", something goes wrong and the result is always "down".
I don't know what else I should try. I don't need anything else, just up or down and the IP.
import os
import datetime
import platform
import subprocess
import string

date = datetime.datetime.now()
day = date.day
hour = date.hour

def writedoc():
    os.chdir('Path')
    wresult = open("pingresults_{}_{}.txt".format(day, hour), 'a')
    wresult.write('{}-{}\n'.format(ips, rping))
    wresult.close()

os.chdir('Path')
openips = open("ips.txt", "r")
ipfile = openips.readlines()
for ips in ipfile:
    ips = ips.strip()
    print(ips)
    args = ["ping", "-n", "4", "-l", "1", "-w", "1000", ips]
    pping = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    rping = pping.stdout
    for line in rping:
        print(line)
    if (rping.find("(100% perdidos)" != -1)):
        result = "down"
        print(result)
    else:
        result = "up"
        print(result)
    writedoc()
if (rping.find("(100% perdidos)" != -1))
Should this instead be
if (rping.find("(100% perdidos)") != -1)
so that it checks that rping.find("(100% perdidos)") does not return -1?
With your example you are effectively calling rping.find(True), because the expression "(100% perdidos)" != -1 evaluates to True.

python subprocess call to compare two CSV files using grep or awk or sed

I have two CSV files in the /tmp/ directory.
One CSV file holds results from a Python script, and the second CSV file is the master file to match against.
>>> import json
>>> resp = { "status":"success", "msg":"", "data":[ { "website":"https://www.blahblah.com", "severity":"low", "location":"unknown", "asn_number":"AS4134 Chinanet", "longitude":121.3997000000, "epoch_timestamp":1530868957, "id":"c1e15eccdd1f31395506fb85" }, { "website":"https://www.jhonedoe.co.uk/sample.pdf", "severity":"low", "location":"unknown", "asn_number":"AS4134 Chinanet", "longitude":120.1613998413, "epoch_timestamp":1530868957, "id":"933bf229e3e95a78d38223b2" } ] }
>>> response = json.loads(json.dumps(resp))
>>> KEYS = 'website', 'asn_number' , 'severity'
>>> x = []
>>> for attribute in response['data']:
... csv_response = ','.join(attribute[key] for key in KEYS)
... with open('/tmp/processed_results.csv', 'a') as score:
... score.write(csv_response + '\n')
$cat processed_results.csv
https://www.blahblah.com,AS4134 Chinanet,low
https://www.jhonedoe.co.uk/sample.pdf,AS4134 Chinanet,low
Meta file to match.
$cat master_meta.csv
http://download2.freefiles-10.de,AS24940 Hetzner Online GmbH,high
https://www.jhonedoe.co.uk/sample.pdf,AS4134 Chinanet,low
http://download2.freefiles-11.de,AS24940 Hetzner Online GmbH,high
www.solener.com,AS20718 ARSYS INTERNET S.L.,low
https://www.blahblah.com,AS4134 Chinanet,low
www.telewizjairadio.pl,AS29522 Krakowskie e-Centrum Informatyczne JUMP Dziedzic,high
I know how to use grep to compare the two files and get the matching lines.
$grep -Ff processed_results.csv master_meta.csv
https://www.jhonedoe.co.uk/sample.pdf,AS4134 Chinanet,low
https://www.blahblah.com,AS4134 Chinanet,low
Any suggestions on how to use a Python subprocess call to pass grep/sed/awk commands to compare the two files and get the matching lines into a variable?
Most people won't call this "good", but if you just use it for yourself you shouldn't care too much.
from subprocess import PIPE, Popen

def sh(cmd, verbose=True):
    """Runs the shell command, blocking until it finishes,
    and returns its stripped stdout as a single string.
    """
    if verbose:
        print("[INFO]: executing: " + cmd)
    out, err = Popen(cmd, stdout=PIPE, stderr=PIPE, shell=True).communicate()
    if err:
        print("[ERROR]: while executing: " + cmd)
        print(err.decode('ascii').strip())
    return out.decode('ascii').strip()
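With that helper, the grep comparison from the question puts the matching lines straight into a variable; splitlines() is only needed if you want a list rather than one string:

matches = sh("grep -Ff /tmp/processed_results.csv /tmp/master_meta.csv")
matching_lines = matches.splitlines()  # one entry per matching csv line
print(matching_lines)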

Pythonic way of passing values between processes

I need a simple way to pass the stdout of a subprocess as a list to another function using multiprocessing:
The first function that invokes subprocess:
from subprocess import Popen, PIPE

def beginRecvTest():
    command = ["receivetest", "-f=/dev/pcan33"]
    incoming = Popen(command, stdout=PIPE)
    processing = iter(incoming.stdout.readline, "")
    lines = list(processing)
    return lines
The function that should receive lines:
def readByLine(lines):
    i = 0
    while (i < len(lines)):
        system("clear")
        if(lines[i][0].isdigit()):
            line = lines[i].split()
            dictAdd(line)
        else:
            next
        print ; print "-" * 80
        for _i in mydict.keys():
            printMsg(mydict, _i)
        print "Keys: ", ; print mydict.keys()
        print ; print "-" * 80
        sleep(0.3)
        i += 1
and the main from my program:
if __name__ == "__main__":
    dataStream = beginRecvTest()
    p = Process(target=dataStream)
    reader = Process(target=readByLine, args=(dataStream,))
    p.start()
    reader.start()
I've read up on using queues, but I don't think that's exactly what I need.
The subprocess I call produces an endless stream of data, so some people have suggested using tempfile, but I am totally confused about how to do this.
At the moment the script only returns the first line read, and all my attempts at looping the beginRecvTest() function have ended in errors.
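One way to avoid collecting an endless stream into a list first is to push the lines through a multiprocessing.Queue. This is only a sketch (Python 3, not from the original thread), with receivetest and /dev/pcan33 taken from the question:

from multiprocessing import Process, Queue
from subprocess import Popen, PIPE

def producer(q):
    # stream stdout line by line; the subprocess never terminates on its own
    incoming = Popen(["receivetest", "-f=/dev/pcan33"], stdout=PIPE)
    for raw in iter(incoming.stdout.readline, b""):
        q.put(raw.decode().strip())

def consumer(q):
    while True:
        line = q.get()  # blocks until the producer sends a line
        print(line)     # stand-in for the dictAdd()/printMsg() processing

if __name__ == "__main__":
    q = Queue()
    Process(target=producer, args=(q,)).start()
    Process(target=consumer, args=(q,)).start()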

Read a file and do something, multithreading

This source is just an example:
import urllib2

inputf = open('input', 'r')
outputf = open('output', 'a')
for x in inputf:
    x = x.strip('\n')
    result = urllib2.urlopen('http://test.com/' + x).getcode()
    outputf.write(x + ' - ' + str(result) + '\n')  # getcode() returns an int
I want to add threading to this so that several URLs are checked at the same time.
The user should be able to decide each time how many threads to use.
The order of the output is not important.
What is the best and cleanest way to do that?
I like multiprocessing.pool.ThreadPool (or multiprocessing.pool.Pool), like this:
from multiprocessing.pool import ThreadPool
n_threads = 5
pool = ThreadPool(processes=n_threads)
threads = [pool.apply_async(some_function, args=(arg1,)) for arg1 in args]
pool.close()
pool.join()
results = [result.get() for result in threads]
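Applied to the example from the question, a minimal sketch might look like this (Python 2, since the question uses urllib2; check_url is a hypothetical helper name):

from multiprocessing.pool import ThreadPool
import urllib2

def check_url(x):
    # return the line to write to the output file
    code = urllib2.urlopen('http://test.com/' + x).getcode()
    return x + ' - ' + str(code)

with open('input') as inputf:
    urls = [line.strip('\n') for line in inputf]

n_threads = 5  # let the user choose this
pool = ThreadPool(processes=n_threads)
threads = [pool.apply_async(check_url, args=(u,)) for u in urls]
pool.close()
pool.join()

with open('output', 'a') as outputf:
    for t in threads:
        outputf.write(t.get() + '\n')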
