I am writing code to parallelize a task using the Python 3.x multiprocessing library.
I have two classes (whose contents are too large to paste) and two functions, like so:
from functools import partial
from multiprocessing import Pool

def abc_1(objekt, raw):
    return raw + "changed"

def file_task(file_name):
    con_fig = Some_Config()
    obj_ect = Some_other_class(con_fig)
    pool = Pool(4)
    with open(file_name, 'r', encoding='utf-8') as fh:
        res = pool.map(partial(abc_1, con_fig), fh, 4)
    return res
When I run the above code, I get this error:
cls = <class 'multiprocessing.reduction.ForkingPickler'>
obj = (0, 0, <function mapstar at 0x10abec9d8>, ((functools.partial(<function abc_1 at 0x10ad98620>, <project.some_file.... 'Indeed, Republican lawyers identified only 300 cases of electoral fraud in the United States in a decade.\n')),), {})
protocol = None

    @classmethod
    def dumps(cls, obj, protocol=None):
        buf = io.BytesIO()
        cls(buf, protocol).dump(obj)

TypeError: 'bool' object is not callable
However, if I change my two functions as shown below, so that I don't need to send a class instance to the function abc_1 that I want parallelized, it works fine:
def abc_1(raw):
    return raw + "changed"

def file_task(file_name):
    con_fig = Some_Config()
    obj_ect = Some_other_class(con_fig)
    pool = Pool(4)
    with open(file_name, 'r', encoding='utf-8') as fh:
        res = pool.map(abc_1, fh, 4)
    return res
I would think "partial" is the culprit, but even this works:
def file_task(file_name):
    con_fig = Some_Config()
    obj_ect = Some_other_class(con_fig)
    pool = Pool(4)
    with open(file_name, 'r', encoding='utf-8') as fh:
        res = pool.map(partial(abc_1), fh, 4)
    return res
Since I haven't provided the contents of my classes, I understand this makes my question hard to answer.
I have gone through several similar posts, such as: TypeError: 'bool' object is not callable while creating custom thread pool
Nothing of that sort is happening in my code, not even in my classes.
Could the initializations in my classes be leading to the error I am getting? What else could be causing this? What are some things to keep in mind when passing a class instance to a function during multiprocessing?
It would really help if someone could point me in the right direction.
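From what I've read, everything passed to pool.map, including the arguments bound with partial, must be picklable, since multiprocessing serializes each task before sending it to a worker. As a sanity check (a minimal sketch, not my real classes), I understand you can try pickling the instance directly to see whether it fails the same way:

import pickle

con_fig = Some_Config()
try:
    # roughly the same serialization step multiprocessing performs internally
    pickle.dumps(con_fig)
except Exception as e:
    print("Not picklable:", e)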
I have a piece of code that fetches data from the Ticketmaster API using a function I've named get_event_info. The first revision of the code worked as desired. Subsequently, I modified the original function to use header-based authentication instead of URL-based authentication, and I also added a few lines intended to validate the response status code. After making these changes, the code began producing the following TypeError:
Traceback (most recent call last):
  File "ticketmaster_only_w_headers.py", line 146, in <module>
    for event in ticket_search["_embedded"]["events"].items():
TypeError: 'NoneType' object is not subscriptable
I've read quite a bit about this type of error, but I'm still unable to determine why my code is producing it in this instance. I would really appreciate an explanation of why my code produces this error and what troubleshooting methods I should have used to uncover its source. I'm fairly comfortable with programming but certainly no expert, so the simpler the language the better.
(Function Definition)
def get_event_info(search):
    if search in CACHE_DICTION:
        d = CACHE_DICTION[search]
    else:
        api_url = '{0}events/'.format(api_url_base)
        payload = {"keyword": search, "apikey": api_token,
                   "format": "json", "dmaId": "366", "size": 200, "radius": "2"}
        response = requests.get(api_url, headers=headers, params=payload)
        if response.status_code == 200:
            d = json.loads(response.text)
            CACHE_DICTION[search] = d
            f = open(CACHE_FNAME, 'w')
            f.write(json.dumps(CACHE_DICTION))
            f.close()
        else:
            d = None
    return d
(Code snippet that triggers the error)
ticket_search = get_event_info("")
for event in ticket_search["_embedded"]["events"]:
    a = event["id"]
    b = event["name"]
    if "dateTime" in event["dates"]["start"]:
        c = event["dates"]["start"]["dateTime"].replace(
            "T", " ").replace("Z", "")
    else:
        c = "NONE"
    if "end" in event["dates"] and "dateTime" in event["dates"]["end"]:
        j = event["dates"]["end"]["dateTime"].replace(
            "T", " ").replace("Z", "")
    else:
        j = "NONE"
(Code that creates, opens, and writes to the cache used in the above code)
CACHE_FNAME = "ticketmaster_cache.json"
try:
    cache_file = open(CACHE_FNAME, "r")
    cache_contents = cache_file.read()
    CACHE_DICTION = json.loads(cache_contents)
    cache_file.close()
except:
    CACHE_DICTION = {}
The previous revision of the get_event_info function, shown below, does not produce any TypeError:
def get_event_info(search, ticketmaster_key=ticketmaster_key):
    if search in CACHE_DICTION:
        d = CACHE_DICTION[search]
    else:
        data = requests.get("https://app.ticketmaster.com/discovery/v2/events",
                            params={"keyword": search, "apikey": ticketmaster_key,
                                    "format": "json", "dmaId": "366", "size": 200, "radius": "2"})
        print(data.url)
        d = json.loads(data.text)
        CACHE_DICTION[search] = d
        f = open(CACHE_FNAME, 'w')
        f.write(json.dumps(CACHE_DICTION))
        f.close()
    return d
Traceback & Error message I see when I run the latest revision of the code:
Traceback (most recent call last):
  File "ticketmaster_only_w_headers.py", line 146, in <module>
    for event in ticket_search["_embedded"]["events"]:
TypeError: 'NoneType' object is not subscriptable
Whenever you have a function that can explicitly return None, you should always check the return value first:
def func(a):
    if a == 1:
        return list(range(10))  # could return a list
    else:
        return None             # or it could return None

a = 10
f = func(a)
f[1]
# raises TypeError: 'NoneType' object is not subscriptable

# check for None first
if f is not None:
    print(f[1])
# otherwise, kick out a different result
else:
    print('Got "None" for f!')
# Got "None" for f!
Your ticket_search is returned as None, and your for loop then tries to do a key lookup on it, which fails because None doesn't support that operation. Following from the above, your logic should look like:
if ticket_search is not None:
    for event in ticket_search["_embedded"]["events"]:
        a = event["id"]
else:
    raise TypeError
    # or do something else
Well, the interpreter is explicitly telling you that you are trying to evaluate something like a[i], where a is None (instead of the intended type, like a list or a dict). In your case, it is either ticket_search itself or ticket_search["_embedded"].
In any case, if you can rerun your code at all, putting print(ticket_search) right after ticket_search = get_event_info("") should make everything clear.
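Since the new revision of get_event_info returns None exactly when response.status_code is not 200, logging the status and body in that branch should show why the request is failing (a minimal sketch using the variable names already in your function):

response = requests.get(api_url, headers=headers, params=payload)
if response.status_code == 200:
    d = json.loads(response.text)
else:
    # e.g. a 401 here would point at the new header-based authentication
    print("Request failed:", response.status_code, response.text)
    d = None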
I want to refer to an element (mem[0]) of a list (mem) with a different name (fetch):
mem = [0]
f = open("File.lx", "rb").read()
for b in f:
    mem += [b]
size = len(mem)

while mem[0] < size:        # using mem[0]
    char = (mem[0] * 2) + 1
    source = mem[char]
    target = mem[char + 1]
    mem[0] += 1
    mem[target] = mem[source]
And I tried that with the with statement:
mem = [0]
f = open("File.lx", "rb").read()
for b in f:
    mem += [b]
size = len(mem)

with mem[0] as fetch:           # with statement
    while fetch < size:         # using mem[0] as fetch
        char = (fetch * 2) + 1
        source = mem[char]
        target = mem[char + 1]
        fetch += 1
        mem[target] = mem[source]
But I got an error:
Traceback (most recent call last):
  File "C:\documents\test.py", line 6, in <module>
    with mem[0] as fetch:
AttributeError: __enter__
I thought this would be the way because that's how it's done with file objects:
with open("File.lx", "rb") as file:
    fileBytes = file.read()
I read the docs for the with statement, and they say that the __enter__() and __exit__() methods are loaded. From what I understood there, and from the AttributeError, my guess is that integers like mem[0] do not have an __enter__() method.
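A quick check (just a sketch I used to confirm that guess) seems to bear this out:

print(hasattr(0, '__enter__'))    # False: ints define no __enter__
f = open("File.lx", "rb")
print(hasattr(f, '__enter__'))    # True: file objects do
f.close()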
As the comments already mentioned, mem[0] is a plain integer, which doesn't have the __enter__ and __exit__ methods required for the as keyword to work, and it would indeed be simpler if you just used mem[0].
But that would be too easy. What you CAN do (as an exercise; don't actually do this)
is extend the int class and add __enter__ and __exit__, like so:
class FancyInt(int):
    def __enter__(self):
        return self
    def __exit__(self, *args):
        pass

mem = [FancyInt(0)]
with mem[0] as fetch:
    print(fetch)
This is neat, but fetch is an alias to an immutable integer! If you change fetch, mem[0] will not change.
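To see that caveat concretely (a short sketch; in-place addition on an int subclass still returns a plain int, so fetch is simply rebound):

mem = [FancyInt(0)]
with mem[0] as fetch:
    fetch += 1        # rebinds fetch to a new int object
    print(fetch)      # 1
    print(mem[0])     # still 0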
You seem to want a mutable object which functions as an alias for a specific location in a list. I could see some utility in that (since explicit indices are somewhat ugly in Python). You could create such a class. Here is a proof of concept, implementing the three things that you tried to do with fetch in your code:
class Fetcher:
    def __init__(self, target_list, index):
        self._list = target_list
        self._i = index
    def __iadd__(self, v):
        self._list[self._i] += v
        return self
    def __mul__(self, v):
        return self._list[self._i] * v
    def __lt__(self, v):
        return self._list[self._i] < v
For example,
mem = [0, 1, 2]
fetch = Fetcher(mem, 0)
print(fetch < 2)   # True
mem[0] = 1
print(fetch < 2)   # still True
fetch += 1
print(fetch < 2)   # False!
print(mem[0])      # 2, showing that mem[0] was changed
print(fetch * 2)   # 4 -- but 2*fetch won't work!
The last line shows that there is a limit to what you can achieve here. To make this really useful, you would want to implement many more magic methods (beyond __iadd__ and friends). Whether all this is worthwhile just to avoid [0], you be the judge.
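For instance (a small sketch extending the proof of concept above), adding the reflected multiplication method is what it takes to make 2*fetch work as well:

class Fetcher2(Fetcher):
    def __rmul__(self, v):
        # called when the left operand (e.g. the int 2) doesn't know
        # how to multiply by a Fetcher
        return self._list[self._i] * v

fetch = Fetcher2(mem, 0)
print(2 * fetch)   # now works, via __rmul__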
I'm developing a program with IBM Watson Speech to Text and currently using Python 2.7. Here's a stub of some code for development:
class MyRecognizeCallback(RecognizeCallback):
    def __init__(self):
        RecognizeCallback.__init__(self)

    def on_data(self, data):
        pass

    def on_error(self, error):
        pass

    def on_inactivity_timeout(self, error):
        pass

speech_to_text = SpeechToTextV1(username='*goes here*', password='*goes here*')
speech_to_text.set_detailed_response(True)

f = '/home/user/file.wav'
rate, data = wavfile.read(f)
work = data.tolist()

with open(f, 'rb') as audio_file:
    # Get IBM Watson analytics
    currentModel = "en-US_NarrowbandModel" if rate <= 8000 else "en-US_BroadbandModel"
    x = ""
    print(" - " + f)
    try:
        # Callback info
        myRecognizeCallback = MyRecognizeCallback()
        # X represents the response from Watson
        audio_source = AudioSource(audio_file)
        my_result = speech_to_text.recognize_using_websocket(
            audio_source,
            content_type='audio/wav',
            timestamps=True,
            recognize_callback=myRecognizeCallback,
            model=currentModel,
            inactivity_timeout=-1,
            max_alternatives=0)
        x = json.loads(json.dumps(my_result, indent=2), object_hook=lambda d: n
            namedtuple('X', d.keys())(*d.values()))
What I'm expecting to be returned is a JSON object with the results for the file, given the above parameters. What I'm instead receiving is an error that looks like this:
Error received: 'NoneType' object has no attribute 'connected'
That's the entire traceback; there are no other errors. However, when I try to access the JSON object in further code, I get this error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/watson_developer_cloud/websocket/recognize_listener.py", line 96, in run
    chunk = self.audio_source.input.read(ONE_KB)
ValueError: I/O operation on closed file
Did I forget something or put something in the wrong place?
Edit:
My original code had an error in it that I fixed myself. Regardless, I'm still getting the original error. Here's the updated call:
my_result = speech_to_text.recognize_using_websocket(
    audio_source,
    content_type='audio/wav',
    timestamps=True,
    recognize_callback=myRecognizeCallback,
    model=currentModel,
    inactivity_timeout=None,
    max_alternatives=None).get_result()
x = json.loads(json.dumps(my_result, indent=2),
               object_hook=lambda d: namedtuple('X', d.keys())(*d.values()))
Take a look at object_hook=lambda d: n. In Python, lambda d: n means "a function that takes d, ignores d, and returns n".
I'm guessing n is set to None somewhere else.
If that doesn't work, it may be easier to debug if you break your lambda out into a separate function, perhaps def to_named_tuple(d):.
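Something along these lines (a sketch of that suggestion, reusing the namedtuple call and my_result from the code above):

from collections import namedtuple
import json

def to_named_tuple(d):
    # same conversion the lambda performed, but now it can be printed,
    # stepped through in a debugger, and tested on its own
    return namedtuple('X', d.keys())(*d.values())

x = json.loads(json.dumps(my_result, indent=2), object_hook=to_named_tuple)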
I'm trying to multiprocess a function that performs multiple actions on a large file, but I'm getting the well-known pickling error even though I'm using partial.
The function looks something like this:
def process(r, intermediate_file, record_dict, record_id):
    res = 0
    record_str = str(record_dict[record_id]).upper()
    start = record_str[0:100]
    end = record_str[len(record_seq)-100:len(record_seq)]
    print sample, record_id
    if r == "1":
        if something:
            res = something...
            intermediate_file.write("...")
        if something:
            res = something
            intermediate_file.write("...")
    if r == "2":
        if something:
            res = something...
            intermediate_file.write("...")
        if something:
            res = something
            intermediate_file.write("...")
    return res
The way I'm calling it, in another function, is the following:
def call_func():
    intermediate_file = open("inter.txt", "w")
    record_dict = get_record_dict()  ### get info about each record as a dict keyed by record_id
    results_dict = {}
    pool = Pool(10)
    for a in ["a", "b", "c", ...]:
        if not results_dict.has_key(a):
            results_dict[a] = {}
        for b in ["1", "2", "3", ...]:
            if not results_dict[a].has_key(b):
                results_dict[a][b] = {}
                results_dict[a][b]['res'] = []
            infile = open(a+b+".txt", "r")
            ...parse the file and return values in a list called "record_ids"...
            ### now call the function for each record_id in record_ids
            if b == "1":
                func = partial(process, "1", intermediate_file, record_dict)
                res = pool.map(func, record_ids)
                ## append the results for each pair (a,b) for EACH RECORD in the results_dict
                results_dict[a][b]['res'].append(res)
            if b == "2":
                func = partial(process, "2", intermediate_file, record_dict)
                res = pool.map(func, record_ids)
                ## append the results for each pair (a,b) for EACH RECORD in the results_dict
                results_dict[a][b]['res'].append(res)
    ... do something with results_dict ...
The idea is that for each record in record_ids, I want to save the results for each pair (a, b).
I'm not sure what is giving me this error:
File "/code/Python/Python-2.7.9/Lib/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
File "/code/Python/Python-2.7.9/Lib/multiprocessing/pool.py", line 558, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
func is not defined at the top level of the code, so it can't be pickled.
You can use pathos.multiprocessing, which is not a standard-library module, but it will work.
Or, use something different from Pool.map, maybe a queue of workers?
https://docs.python.org/2/library/queue.html
At the end of that page there is an example you can use; it's for threading, but it is very similar to the multiprocessing case, where there are also queues:
https://docs.python.org/2/library/multiprocessing.html#pipes-and-queues
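For what it's worth, here is a minimal sketch of that worker-queue pattern with multiprocessing (the worker body is a placeholder, not your process function):

from multiprocessing import Process, Queue

def worker(tasks, results):
    # pull record ids until the sentinel None arrives
    for record_id in iter(tasks.get, None):
        results.put((record_id, len(str(record_id))))  # placeholder for real work

if __name__ == '__main__':
    tasks, results = Queue(), Queue()
    procs = [Process(target=worker, args=(tasks, results)) for _ in range(4)]
    for p in procs:
        p.start()
    for record_id in ["r1", "r2", "r3"]:
        tasks.put(record_id)
    for _ in procs:
        tasks.put(None)  # one sentinel per worker
    for p in procs:
        p.join()
    while not results.empty():
        print results.get()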
I'm using Nose and Fudge for unit testing. Consider the following class:
class Foo():
    def __init__(self, commandline):
        self._commandline = commandline

    def process(self):
        stdout, stderr = self._commandline()
        ...
And a test:
def test_process_commandline(self):
    import StringIO

    # Setup
    fake_stdout = StringIO.StringIO()
    fake_stderr = StringIO.StringIO()
    fake_stdio = fake_stdout, fake_stderr
    fake_cline = (fudge
                  .Fake('SomeCommandline')
                  .is_a_stub()
                  .provides('__call__')
                  .returns(fake_stdio))

    sut = Foo(fake_cline)

    # Exercise
    sut.process()

    # Verify
    ...
The error I get is:
...
    stdout, stderr = self._commandline()
TypeError: 'Fake' object is not iterable
The code I'm stubbing has a return line that looks like this (in the real version of SomeCommandline):
return stdout_str, stderr_str
Why am I getting the TypeError saying Fake is not iterable, and how do I stub this method with fudge?
The stub should be set up using .is_callable() instead of .provides('__call__'):
fake_cline = (fudge
              .Fake('SomeCommandline')
              .is_callable()
              .returns(fake_stdio))
Also, .is_a_stub() is not needed here, because we are stubbing out the method, __call__, directly, which is accessed via the class name SomeCommandline.
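With that change, calling the fake returns the (fake_stdout, fake_stderr) tuple, so the unpacking in process works. A quick check (a sketch, assuming the same test setup as above):

stdout, stderr = fake_cline()   # the fake is callable now and returns fake_stdio
assert stdout is fake_stdout
assert stderr is fake_stderr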