Multiprocessing With r2pipe - python

I'm having issues with using r2pipe, Radare2's API, with the multiprocessing Pool.map function in python. The problem I am facing is the application hangs on pool.join().
My hope was to use multithreading via the multiprocessing.dummy class in order to evaluate functions quickly through r2pipe. I have tried passing my r2pipe object as a namespace using the Manager class. I have attempted using events as well, but none of these seem to work.
class Test:
def __init__(self, filename=None):
if filename:
self.r2 = r2pipe.open(filename)
else:
self.r2 = r2pipe.open()
self.r2.cmd('aaa')
def t_func(self, args):
f = args[0]
r2_ns = args[1]
print('afbj # {}'.format(f['name']))
try:
bb = r2_ns.cmdj('afbj # {}'.format(f['name']))
if bb:
return bb[0]['addr']
else:
return None
except Exception as e:
print(e)
return None
def thread(self):
funcs = self.r2.cmdj('aflj')
mgr = ThreadMgr()
ns = mgr.Namespace()
ns.r2 = self.r2
pool = ThreadPool(2)
results = pool.map(self.t_func, product(funcs, [ns.r2]))
pool.close()
pool.join()
print(list(results))
This is the class I am using. I make a call to the Test.thread function in my main function.
I expect the application to print out the command it is about to run in r2pipe afbj # entry0, etc. Then to print out the list of results containing the first basic block address [40000, 50000, ...].
The application does print out the command about to run, but then hangs before printing out the results.

ENVIRONMENT
radare2: radare2 4.2.0-git 23712 # linux-x86-64 git.4.1.1-97-g5a48a4017
commit: 5a48a401787c0eab31ecfb48bebf7cdfccb66e9b build: 2020-01-09__21:44:51
r2pipe: 1.4.2
python: Python 3.6.9 (default, Nov 7 2019, 10:44:02)
system: Ubuntu 18.04.3 LTS
SOLUTION
This may be due to passing the same instance of r2pipe.open() to every call of t_func in the pool. One solution is to move the following lines of code into t_func:
r2 = r2pipe.open('filename')
r2.cmd('aaa')
This works, however its terribly slow to reanalyze for each thread/process.
Also, it is often faster to allow radare2 to do as much of the work as possible and limit the number of commands we need to send using r2pipe.
This problem is solved by using the command: afbj ##f
afbj # List basic blocks of given function and show results in json
##f # Execute the command for each function
EXAMPLE
Longer Example
import r2pipe
R2: r2pipe.open_sync = r2pipe.open('/bin/ls')
R2.cmd("aaaa")
FUNCS: list = R2.cmd('afbj ##f').split("\n")[:-1]
RESULTS: list = []
for func in FUNCS:
basic_block_info: list = eval(func)
first_block: dict = basic_block_info[0]
address_first_block: int = first_block['addr']
RESULTS.append(hex(address_first_block))
print(RESULTS)
'''
['0x4a56', '0x1636c', '0x3758', '0x15690', '0x15420', '0x154f0', '0x15420',
'0x154f0', '0x3780', '0x3790', '0x37a0', '0x37b0', '0x37c0', '0x37d0', '0x0',
...,
'0x3e90', '0x6210', '0x62f0', '0x8f60', '0x99e0', '0xa860', '0xc640', '0x3e70',
'0xd200', '0xd220', '0x133a0', '0x14480', '0x144e0', '0x145e0', '0x14840', '0x15cf0']
'''
Shorter Example
import r2pipe
R2 = r2pipe.open('/bin/ls')
R2.cmd("aaaa")
print([hex(eval(func)[0]['addr']) for func in R2.cmd('afbj ##f').split("\n")[:-1]])

Related

How to get every second's GPU usage in Python

I have a model which runs by tensorflow-gpu and my device is nvidia. And I want to list every second's GPU usage so that I can measure average/max GPU usage. I can do this mannually by open two terminals, one is to run model and another is to measure by nvidia-smi -l 1. Of course, this is not a good way. I also tried to use a Thread to do that, here it is.
import subprocess as sp
import os
from threading import Thread
class MyThread(Thread):
def __init__(self, func, args):
super(MyThread, self).__init__()
self.func = func
self.args = args
def run(self):
self.result = self.func(*self.args)
def get_result(self):
return self.result
def get_gpu_memory():
output_to_list = lambda x: x.decode('ascii').split('\n')[:-1]
ACCEPTABLE_AVAILABLE_MEMORY = 1024
COMMAND = "nvidia-smi -l 1 --query-gpu=memory.used --format=csv"
memory_use_info = output_to_list(sp.check_output(COMMAND.split()))[1:]
memory_use_values = [int(x.split()[0]) for i, x in enumerate(memory_use_info)]
return memory_use_values
def run():
pass
t1 = MyThread(run, args=())
t2 = MyThread(get_gpu_memory, args=())
t1.start()
t2.start()
t1.join()
t2.join()
res1 = t2.get_result()
However, this does not return every second's usage as well. Is there a good solution?
In the command nvidia-smi -l 1 --query-gpu=memory.used --format=csv
the -l stands for:
-l, --loop= Probe until Ctrl+C at specified second interval.
So the command:
COMMAND = 'nvidia-smi -l 1 --query-gpu=memory.used --format=csv'
sp.check_output(COMMAND.split())
will never terminate and return.
It works if you remove the event loop from the command(nvidia-smi) to python.
Here is the code:
import subprocess as sp
import os
from threading import Thread , Timer
import sched, time
def get_gpu_memory():
output_to_list = lambda x: x.decode('ascii').split('\n')[:-1]
ACCEPTABLE_AVAILABLE_MEMORY = 1024
COMMAND = "nvidia-smi --query-gpu=memory.used --format=csv"
try:
memory_use_info = output_to_list(sp.check_output(COMMAND.split(),stderr=sp.STDOUT))[1:]
except sp.CalledProcessError as e:
raise RuntimeError("command '{}' return with error (code {}): {}".format(e.cmd, e.returncode, e.output))
memory_use_values = [int(x.split()[0]) for i, x in enumerate(memory_use_info)]
# print(memory_use_values)
return memory_use_values
def print_gpu_memory_every_5secs():
"""
This function calls itself every 5 secs and print the gpu_memory.
"""
Timer(5.0, print_gpu_memory_every_5secs).start()
print(get_gpu_memory())
print_gpu_memory_every_5secs()
"""
Do stuff.
"""
Here is a more rudimentary way of getting this output, however just as effective - and I think easier to understand. I added a small 10-value cache to get a good recent average and upped the check time to every second. It outputs average of the last 10 seconds and the current each second, so operations that cause usage can be identified (what I think the original question was).
import subprocess as sp
import time
memory_total=8192 #found with this command: nvidia-smi --query-gpu=memory.total --format=csv
memory_used_command = "nvidia-smi --query-gpu=memory.used --format=csv"
isolate_memory_value = lambda x: "".join(y for y in x.decode('ascii') if y in "0123456789")
def main():
percentage_cache = []
while True:
memory_used = isolate_memory_value(sp.check_output(memory_used_command.split(), stderr=sp.STDOUT))
percentage = float(memory_used)/float(memory_total)*100
percentage_cache.append(percentage)
percentage_cache = percentage_cache[max(0, len(percentage_cache) - 10):]
print("curr: " + str(percentage) + " %", "\navg: " + str(sum(percentage_cache)/len(percentage_cache))[:4] + " %\n")
time.sleep(1)
main()

How do I reuse the code from this function cleanly?

The following function is used within a module to query network devices and is called from multiple scripts I use. The arguments it takes are a nested dictionary (the device IP and creds etc) and a string (the command to run on the device):
def query_devices(hosts, cmd):
results = {
'success' : [],
'failed' : [],
}
for host in hosts:
device = hosts[host]
try:
swp = ConnectHandler(device_type=device['dev_type'],
ip=device['ip'],
username=device['user'],
password=device['pwd'],
secret=device['secret'])
swp.enable()
results[host] = swp.send_command(cmd)
results['success'].append(host)
swp.disconnect()
except (NetMikoTimeoutException, NetMikoAuthenticationException) as e:
results['failed'].append(host)
results[host] = e
return results
I want to reuse all of the code to update a device and the only changes would be:
The function would take the same dictionary but the cmd argument would now be a list of commands.
The following line:
results[host] = swp.send_command(cmd)
would be replaced by:
results[host] = swp.send_config_set(cmd)
I could obviously just replicate the function making those two changes and as it is in a module I reuse, I am only having to do it once but I am still basically repeating a lot of the same code.
Is there a better way to do this as I seem to come across the same issue quite often in my code.
You could just add a check on the changed line:
...
if isinstance(cmd, str):
results[host] = swp.send_command(cmd)
else:
results[host] = swp.send_config_set(cmd)
...
The rest of the function can stay the same and now you can simply call it with either a string or a list of strings...
You could use the unpacking operator to always make cmds an iterable (a tuple, actually) even if it is a single value. That way you could always call send_config_set. Here is a super simplified example to illustrate the concept.
def query_devices(hosts, *cmds):
for one_cmd in cmds:
print(one_cmd)
print('query_devices("hosts", "cmd_1")')
query_devices("hosts", "cmd_1")
print('\nquery_devices("hosts", "cmd_1", "cmd_2", "cmd_3")')
query_devices("hosts", "cmd_1", "cmd_2", "cmd_3")
print('\nquery_devices("hosts", *["cmd_1", "cmd_2", "cmd_3"])')
query_devices("hosts", *["cmd_1", "cmd_2", "cmd_3"])
Output:
query_devices("hosts", "cmd_1")
cmd_1
query_devices("hosts", "cmd_1", "cmd_2", "cmd_3")
cmd_1
cmd_2
cmd_3
query_devices("hosts", *["cmd_1", "cmd_2", "cmd_3"])
cmd_1
cmd_2
cmd_3

Add metadata to TestCase in Python's unittest

I'd like to add metadata to individual tests in a TestCase that I've written to use Python's unittest framework. The metadata (a string, really) needs to be carried through the testing process and output to an XML file.
Other than remaining with the test the data isn't going to be used by unittest, nor my test code. (I've got a program that will run afterwards, open the XML file, and go looking for the metadata/string).
I've previously used NUnit which allows one to use C# attribute to do this. Specifically, you can put this above a class:
[Property("SmartArrayAOD", -3)]
and then later find that in the XML output.
Is it possible to attach metadata to a test in Python's unittest?
Simple way for just dumping XML
If all you want to do is write stuff to an XML file after every unit test, just add a tearDown method to your test class (e.g. if you have , give it a).
class MyTest(unittest.TestCase):
def tearDown(self):
dump_xml_however_you_do()
def test_whatever(self):
pass
General method
If you want a general way to collect and track metadata from all your tests and return it at the end, try creating an astropy table in your test class's __init__() and adding rows to it during tearDown(), then extracting a reference to your initialized instances of your test class from unittest, like this:
Step 1: set up a re-usable subclass of unittest.TestCase so we don't have to duplicate the table handling
(put all the example code in the same file or copy the imports)
"""
Demonstration of adding and retrieving meta data from python unittest tests
"""
import sys
import warnings
import unittest
import copy
import time
import astropy
import astropy.table
if sys.version_info < (3, 0):
from StringIO import StringIO
else:
from io import StringIO
class DemoTest(unittest.TestCase):
"""
Demonstrates setup of an astropy table in __init__, adding data to the table in tearDown
"""
def __init__(self, *args, **kwargs):
super(DemoTest, self).__init__(*args, **kwargs)
# Storing results in a list made it convenient to aggregate them later
self.results_tables = [astropy.table.Table(
names=('Name', 'Result', 'Time', 'Notes'),
dtype=('S50', 'S30', 'f8', 'S50'),
)]
self.results_tables[0]['Time'].unit = 'ms'
self.results_tables[0]['Time'].format = '0.3e'
self.test_timing_t0 = 0
self.test_timing_t1 = 0
def setUp(self):
self.test_timing_t0 = time.time()
def tearDown(self):
test_name = '.'.join(self.id().split('.')[-2:])
self.test_timing_t1 = time.time()
dt = self.test_timing_t1 - self.test_timing_t0
# Check for errors/failures in order to get state & description. https://stackoverflow.com/a/39606065/6605826
if hasattr(self, '_outcome'): # Python 3.4+
result = self.defaultTestResult() # these 2 methods have no side effects
self._feedErrorsToResult(result, self._outcome.errors)
problem = result.errors or result.failures
state = not problem
if result.errors:
exc_note = result.errors[0][1].split('\n')[-2]
elif result.failures:
exc_note = result.failures[0][1].split('\n')[-2]
else:
exc_note = ''
else: # Python 3.2 - 3.3 or 3.0 - 3.1 and 2.7
# result = getattr(self, '_outcomeForDoCleanups', self._resultForDoCleanups) # DOESN'T WORK RELIABLY
# This is probably only good for python 2.x, meaning python 3.0, 3.1, 3.2, 3.3 are not supported.
exc_type, exc_value, exc_traceback = sys.exc_info()
state = exc_type is None
exc_note = '' if exc_value is None else '{}: {}'.format(exc_type.__name__, exc_value)
# Add a row to the results table
self.results_tables[0].add_row()
self.results_tables[0][-1]['Time'] = dt*1000 # Convert to ms
self.results_tables[0][-1]['Result'] = 'pass' if state else 'FAIL'
with warnings.catch_warnings():
warnings.filterwarnings('ignore', category=astropy.table.StringTruncateWarning)
self.results_tables[0][-1]['Name'] = test_name
self.results_tables[0][-1]['Notes'] = exc_note
Step 2: set up a test manager that extracts metadata
def manage_tests(tests):
"""
Function for running tests and extracting meta data
:param tests: list of classes sub-classed from DemoTest
:return: (TextTestResult, Table, string)
result returned by unittest
astropy table
string: formatted version of the table
"""
table_sorting_columns = ['Result', 'Time']
# Build test suite
suite_list = []
for test in tests:
suite_list.append(unittest.TestLoader().loadTestsFromTestCase(test))
combo_suite = unittest.TestSuite(suite_list)
# Run tests
results = [unittest.TextTestRunner(verbosity=1, stream=StringIO(), failfast=False).run(combo_suite)]
# Catch test classes
suite_tests = []
for suite in suite_list:
suite_tests += suite._tests
# Collect results tables
results_tables = []
for suite_test in suite_tests:
if getattr(suite_test, 'results_tables', [None])[0] is not None:
results_tables += copy.copy(suite_test.results_tables)
# Process tables, if any
if len(results_tables):
a = []
while (len(a) == 0) and len(results_tables):
a = results_tables.pop(0) # Skip empty tables, if any
results_table = a
for rt in results_tables:
if len(rt):
with warnings.catch_warnings():
warnings.filterwarnings('ignore', category=DeprecationWarning)
results_table = astropy.table.join(results_table, rt, join_type='outer')
try:
results_table = results_table.group_by(table_sorting_columns)
except Exception:
print('Error sorting test results table. Columns may not be in the preferred order.')
column_names = list(results_table.columns.keys())
alignments = ['<' if cn == 'Notes' else '>' for cn in column_names]
if len(results_table):
rtf = '\n'.join(results_table.pformat(align=alignments, max_width=-1))
exp_res = sum([result.testsRun - len(result.skipped) for result in results])
if len(results_table) != exp_res:
print('ERROR forming results table. Expected {} results, but table length is {}.'.format(
exp_res, len(results_table),
))
else:
rtf = None
else:
results_table = rtf = None
return results, results_table, rtf
Step 3: Example usage
class FunTest1(DemoTest):
#staticmethod
def test_pass_1():
pass
#staticmethod
def test_fail_1():
assert False, 'Meant to fail for demo 1'
class FunTest2(DemoTest):
#staticmethod
def test_pass_2():
pass
#staticmethod
def test_fail_2():
assert False, 'Meant to fail for demo 2'
res, tab, form = manage_tests([FunTest1, FunTest2])
print(form)
print('')
for r in res:
print(r)
for error in r.errors:
print(error[0])
print(error[1])
Sample results:
$ python unittest_metadata.py
Name Result Time Notes
ms
-------------------- ------ --------- ----------------------------------------
FunTest2.test_fail_2 FAIL 5.412e-02 AssertionError: Meant to fail for demo 2
FunTest1.test_fail_1 FAIL 1.118e-01 AssertionError: Meant to fail for demo 1
FunTest2.test_pass_2 pass 6.199e-03
FunTest1.test_pass_1 pass 6.914e-03
<unittest.runner.TextTestResult run=4 errors=0 failures=2>
Should work with python 2.7 or 3.7. You can add whatever columns you want to the table. You can add parameters and stuff to the table in setUp, tearDown, or even during the tests.
Warnings:
This solution accesses a protected attribute _tests of unittest.suite.TestSuite, which can have unexpected results. This specific implementation works as expected for me in python2.7 and python3.7, but slight variations on how the suite is built and interrogated can easily lead to strange things happening. I couldn't figure out a different way to extract references to the instances of my classes that unittest uses, though.

ThreadPool from python's multiprocessing hangs out

I have a phantom problem with one of my unit tests.
I use a ThreadPool from multiprocessing package for wrapping stdout and stderr funtions from my class utilizing paramiko. During creation I made some real life tests using code below and it is working nicely. But during writing unit test for that code I managed to get into problem, that this usage of ThreadPool hangs out.
This part hangs out for like 95 percent of time and somehow sometimes executes properly.
while not (self.__stdout_async_r.ready() and self.__stderr_async_r.ready()):
time.sleep(WAIT_FOR_DATA)
I've checked the values during debugging and I've found out that sometimes there is one or other condition set to finished but the other is not. But both functions are already finished so the results is just asking for the state that is never changed in the future.
The code for reproduce (with functionality necessary for this issue):
import time
from multiprocessing.pool import ThreadPool
class ExecResult(object):
def __init__(self, command=None, exit_status_func=None,
receive_stdout_func=None, receive_stderr_func=None,
connection=None):
self.connection = connection
self.stdout = None
self.stderr = None
self.ecode = None
self.ts_stop = None
self._exit_status_f = exit_status_func
self.result_available = False
self.__fetch_streams(receive_stdout_func, receive_stderr_func)
def wait_for_data(self):
WAIT_FOR_DATA = 0.1
if not self.result_available:
# Here it hangs out for 95 percent
while not (self.__stdout_async_r.ready() and self.__stderr_async_r.ready()):
time.sleep(WAIT_FOR_DATA)
self.result_available = True
self.ts_stop = time.time()
self.stdout = self.__stdout_async_r.get(timeout=2)
self.stderr = self.__stderr_async_r.get(timeout=2)
self.ecode = self._exit_status_f()
def __fetch_streams(self, stdout_func, stderr_func):
stdout_t = ThreadPool(processes=1)
stderr_t = ThreadPool(processes=1)
self.__stdout_async_r = stdout_t.apply_async(func=stdout_func)
self.__stderr_async_r = stderr_t.apply_async(func=stderr_func)
stdout_t.close()
stderr_t.close()
def stderr():
return "stderr"
def stdout():
return "stdout"
def exit():
return "0"
# actual reproduction
res = ExecResult(None, exit, stdout, stderr, None)
res.wait_for_data() #if are data available get them or wait
print res.stdout
print res.stderr
print res.ecode
As it usually is, I found out an answer for this after some time spent cursing and doing some tea.
Solution is to add this after close methods:
stdout_t.join()
stderr_t.join()
So this is the repaired part as whole:
def __fetch_streams(self, stdout_func, stderr_func):
stdout_t = ThreadPool(processes=1)
stderr_t = ThreadPool(processes=1)
self.__stdout_async_r = stdout_t.apply_async(func=stdout_func)
self.__stderr_async_r = stderr_t.apply_async(func=stderr_func)
stdout_t.close()
stderr_t.close()
stdout_t.join()
stderr_t.join()

Python multi-threading missing jobs

If I run the script step by step works perfectly, but when I'm using threading misses 50-60%. I'm using Python + mechanize module
#setting up the browser
mySite = 'http://example.com/managament.php?'
postData = {'UserID' : '', 'Action':'Delete'}
job_tab1_user1 = [1,2,3]
job_tab2_user1 = [4,5,6]
job_tab1_user2 = [7,8,9]
job_tab2_user2 = [10,12,13]
.... till user1000
#i want to point out that the lists are 100% different
def user1_jobs:
for i in job_tab1_user1:
browser.open("http://example.com/jobs.php?actions="+i)
browser.open(mySite, Post_data)
for i in job_tab2_user1:
browser.open("http://example.com/jobs.php?actions="+i)
browser.open(mySite, Post_data)
def user2_jobs:
for i in job_tab1_user2:
browser.open("http://example.com/jobs.php?actions="+i)
browser.open(mySite, Post_data)
for i in job_tab2_user2:
browser.open("http://example.com/jobs.php?actions="+i)
browser.open(mySite, Post_data)
... and so on till user 1000
And I call them in the end like this:
t_user1 = threading.Thread(target=user1_jobs, args=[])
t_user1.start()
t_user2 = threading.Thread(target=user2_jobs, args=[])
t_user2.start()
I have a similar script that sends like 200 request per second and all of them are processed. I also tried using time.sleep(2), but again is missing a lot.
Another question besides what is wrong with my script is if its way to compact this code, because I'm using 1000 users and the script reaches thousands of lines. Thank you in advance.
from threading import *
submits = [[1,2,3], [3,4,5], [6,7,8]]
class worker(Thread):
def __init__(self, site, postdata, data):
Thread.__init__(self)
self.data = data
self.site = site
self.postdata = postdata
self.start()
def run(self):
for i in self.data:
browser.open("http://example.com/jobs.php?actions="+str(i))
browser.open(self.site, self.postdata)
for obj in submits:
worker('http://example.com/managament.php?', {'UserID' : '', 'Action':'Delete'}, submits)
Since the OP asked for it, here's a condensed/compressed version of the code.
or:
for index in range(0,1000):
worker('http://example.com/managament.php?', {'UserID' : '', 'Action':'Delete'}, [i for i in range(1,4)])
If the data you want to send actually is a sequence of 3 integers (1,2,3) that inclines in a perfect order.
Here is a full script that you can easily modify by changing the initial variables.
It creates a list dynamically and uses a generator to create the functions for each thread.
Currently it creates 1000 users, each with 2 tabs and 3 jobs.
# define your variables here
NUM_USERS = 1000
NUM_JOBS_PER_USER = 3
NUM_TABS_PER_USER = 2
URL_PART = "http://example.com/jobs.php?actions="
# populate our list of jobs
# the structure is like this: jobs[user][tab][job]
jobs = [[[0 for y in range(NUM_JOBS_PER_USER)] \
for x in range(NUM_TABS_PER_USER)] \
for x in range(NUM_USERS)]
p = 1
for i in range(NUM_USERS):
for j in range(NUM_TABS_PER_USER):
for k in range(NUM_JOBS_PER_USER):
jobs[i][j][k] = p
p += 1
# create a generator that builds our thread functions
def generateFunctions(jobs):
for user in jobs:
for tab in user:
for job in tab:
def f():
browser.open(URL_PART + str(job))
browser.open(mySite, Post_data)
yield f
# create and start threads, add them to a list
# if we need to preserve handlers for later use
threads = []
for f in generateFunctions(jobs):
thr = threading.Thread(target = f, args=[])
thr.start()
threads.append(thr)

Categories

Resources