This error has been arising for quite some time but doesn't appear very frequently; it's time I squashed it.
I see that it appears whenever I have more than a single thread. The application is quite extensive, so I'll post the relevant code snippet below.
I'm using datetime.datetime.strptime to parse my message's timestamp into a datetime object. When I use it within a multithreaded function, the error arises on one thread while the other works perfectly fine.
The error
Exception in thread Thread-7:
Traceback (most recent call last):
File "C:\Users\kdarling\AppData\Local\Continuum\anaconda3\envs\lobsandbox\lib\threading.py", line 801, in __bootstrap_inner
self.run()
File "C:\Users\kdarling\AppData\Local\Continuum\anaconda3\envs\lobsandbox\lib\threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "d:\kdarling_lob\gryphon_eagle\proj_0012_s2_lobcorr\main\lobx\src\correlator\handlers\inputhandler.py", line 175, in _read_socket
parsed_submit_time = datetime.datetime.strptime(message['msg']['submit_time'], '%Y-%m-%dT%H:%M:%S.%fZ')
AttributeError: 'module' object has no attribute '_strptime'
Alright, so this is a bit strange: the call failed on the first thread, but the second thread works completely fine using datetime.datetime.
Thoughts
Am I overwriting anything? It is Python, after all. Nope, I don't rebind datetime anywhere.
I am using inheritance through ABC; how about the parent? Nope, this bug was happening long before, and I don't override anything in the parent.
My next thought was "Is this some Python black magic in the datetime module?", so I decided to try time.strptime instead.
The error
Exception in thread Thread-6:
Traceback (most recent call last):
File "C:\Users\kdarling\AppData\Local\Continuum\anaconda3\envs\lobsandbox\lib\threading.py", line 801, in __bootstrap_inner
self.run()
File "C:\Users\kdarling\AppData\Local\Continuum\anaconda3\envs\lobsandbox\lib\threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "d:\kdarling_lob\gryphon_eagle\proj_0012_s2_lobcorr\main\lobx\src\correlator\handlers\inputhandler.py", line 176, in _read_socket
parsed_submit_time = time.strptime(message['msg']['submit_time'], '%Y-%m-%dT%H:%M:%S.%fZ')
AttributeError: 'module' object has no attribute '_strptime_time'
The code
import datetime
import heapq
import json
import os
import socket
import sys
import time
from io import BytesIO
from threading import Thread

from handler import Handler

class InputStreamHandler(Handler):
    def __init__(self, configuration, input_message_heap):
        """
        Initialization function for InputStreamHandler.
        :param configuration: Configuration object that stores specific information.
        :type configuration: Configuration
        :param input_message_heap: Message heap that consumer threads will populate.
        :type input_message_heap: Heap
        """
        super(InputStreamHandler, self).__init__()
        self.release_size = configuration.get_release_size()
        self.input_src = configuration.get_input_source()
        self.input_message_heap = input_message_heap
        self.root_path = os.path.join(configuration.get_root_log_directory(), 'input', 'sensor_data')
        self.logging = configuration.get_logger()
        self.Status = configuration.Status
        self.get_input_status_fn = configuration.get_input_functioning_status
        self.update_input_status = configuration.set_input_functioning_status
        if configuration.get_input_state() == self.Status.ONLINE:
            self._input_stream = Thread(target=self._spinup_sockets)
        elif configuration.get_input_state() == self.Status.OFFLINE:
            self._input_stream = Thread(target=self._read_files)

    def start(self):
        """
        Starts the input stream thread to begin consuming data from the sensors connected.
        :return: True if thread hasn't been started, else False on multiple start fail.
        """
        try:
            self.update_input_status(self.Status.ONLINE)
            self._input_stream.start()
            self.logging.info('Successfully started Input Handler.')
        except RuntimeError:
            return False
        return True

    def status(self):
        """
        Displays the status of the thread, useful for offline reporting.
        """
        return self.get_input_status_fn()

    def stop(self):
        """
        Stops the input stream thread by ending the looping process.
        """
        if self.get_input_status_fn() == self.Status.ONLINE:
            self.logging.info('Closing Input Handler execution thread.')
            self.update_input_status(self.Status.OFFLINE)
            self._input_stream.join()

    def _read_files(self):
        pass

    def _spinup_sockets(self):
        """
        Enacts sockets onto their own thread to collect messages.
        Ensures that blocking doesn't occur on the main thread.
        """
        active_threads = {}
        while self.get_input_status_fn() == self.Status.ONLINE:
            # Check if any are online
            if all([value['state'] == self.Status.OFFLINE for value in self.input_src.values()]):
                self.update_input_status(self.Status.OFFLINE)
                for active_thread in active_threads.values():
                    active_thread.join()
                break
            for key in self.input_src.keys():
                # Check if key exists, if not, spin up call
                if (key not in active_threads or not active_threads[key].isAlive()) and self.input_src[key]['state'] == self.Status.ONLINE:
                    active_threads[key] = Thread(target=self._read_socket, args=(key, active_threads,))
                    active_threads[key].start()
            print(self.input_src)

    def _read_socket(self, key, cache):
        """
        Reads data from a socket, places message into the queue, and pops the key.
        :param key: Key corresponding to socket.
        :type key: UUID String
        :param cache: Key cache that corresponds the key and various others.
        :type cache: Dictionary
        """
        message = None
        try:
            sensor_socket = self.input_src[key]['sensor']
            ...
            message = json.loads(stream.getvalue().decode('utf-8'))
            if 'submit_time' in message['msg'].keys():
                # Inherited function
                self.write_to_log_file(self.root_path + key, message, self.release_size)
                message['key'] = key
                parsed_submit_time = time.strptime(message['msg']['submit_time'], '%Y-%m-%dT%H:%M:%S.%fZ')
                heapq.heappush(self.input_message_heap, (parsed_submit_time, message))
                cache.pop(key)
        except:
            pass
Additional thought
When this error wasn't being thrown, the two threads shared a common function, write_to_log_file. Sometimes an error would occur where write_to_log_file checked whether the OS had a specific directory and got False even though the directory was there. Could this be something with Python and two threads entering the same function at the same time, even in outside modules? That error was never consistent either.
Overall, this error won't arise when only running a single thread/connection.
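For context (this note is not from the original post): this particular AttributeError is a well-known symptom of strptime's lazy import. The first call to datetime.datetime.strptime does an import _strptime under the hood, and when several threads race through that first import on Python 2, one of them can observe the half-initialized module. A minimal sketch of the commonly reported workaround is to force that import once, on the main thread, before any worker threads start:

import _strptime  # force the lazy import while still single-threaded
import datetime

# Hypothetical warm-up call; after this, strptime calls from threads are safe.
datetime.datetime.strptime('2019-01-01T00:00:00.000000Z', '%Y-%m-%dT%H:%M:%S.%fZ')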
Related
I'm writing a Telegram bot and I need the bot to be available to users even when it is processing some previous request. My bot downloads videos and compresses them if they exceed the size limit, so it takes some time to process each request. I want to turn my sync functions into async ones and handle them in another process to make this happen.
I found a way to do this using this article, but it doesn't work for me. Here is my code to test the solution:
import asyncio
from concurrent.futures import ProcessPoolExecutor
from functools import wraps, partial

executor = ProcessPoolExecutor()

def async_wrap(func):
    @wraps(func)
    async def run(*args, **kwargs):
        loop = asyncio.get_running_loop()
        pfunc = partial(func, *args, **kwargs)
        return await loop.run_in_executor(executor, pfunc)
    return run

@async_wrap
def sync_func(a):
    import time
    time.sleep(10)

if __name__ == "__main__":
    asyncio.run(sync_func(4))
As a result, I've got the following error message:
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/mikhail/.pyenv/versions/3.10.4/lib/python3.10/multiprocessing/queues.py", line 245, in _feed
obj = _ForkingPickler.dumps(obj)
File "/home/mikhail/.pyenv/versions/3.10.4/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function sync_func at 0x7f2e333625f0>: it's not the same object as __main__.sync_func
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/mikhail/Projects/social_network_parsing_bot/processes.py", line 34, in <module>
asyncio.run(sync_func(4))
File "/home/mikhail/.pyenv/versions/3.10.4/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/home/mikhail/.pyenv/versions/3.10.4/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
return future.result()
File "/home/mikhail/Projects/social_network_parsing_bot/processes.py", line 18, in run
return await loop.run_in_executor(executor, pfunc)
File "/home/mikhail/.pyenv/versions/3.10.4/lib/python3.10/multiprocessing/queues.py", line 245, in _feed
obj = _ForkingPickler.dumps(obj)
File "/home/mikhail/.pyenv/versions/3.10.4/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function sync_func at 0x7f2e333625f0>: it's not the same object as __main__.sync_func
As I understand it, the error arises because the decorator changes the function and, as a result, returns a new object. What do I need to change in my code to make it work? Maybe I'm missing some crucial concept and there is a simple way to achieve this. Thanks for the help.
The article runs a nice experiment, but it is really only meant to work with a thread-pool executor, not a multiprocessing one.
If you look at its code, at some point it passes executor=None to the .run_in_executor call, and asyncio creates a default executor, which is a ThreadPoolExecutor.
The main difference with a ProcessPoolExecutor is that all data moved cross-process (and therefore all data sent to the workers, including the target functions) has to be serialized, which is done via Python's pickle.
Now, pickle serialization of a function does not really send the function object, along with its bytecode, down the wire: rather, it just sends the function's qualified name, and it is expected that the function with the same qualname on the other end is the same as the original function.
In the case of your code, the func which is the target for the executor pool is the function as declared, prior to it being wrapped by the decorator (__main__.sync_func). But what exists under that name in the target process is the post-decoration function. So, if Python did not block it because the functions are not the same, you'd get an infinite loop creating hundreds of nested subprocesses and never actually calling your function, as the entry point in the target would be the wrapped function. That is simply an error in the article you read.
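A minimal, self-contained illustration of this pickle-by-name check (this snippet is mine, not from the article or the question):

import pickle

def demo():
    pass

saved = demo          # keep a reference to the original function object

def demo():           # rebind the module-level name, as a decorator would
    pass

try:
    pickle.dumps(saved)
except pickle.PicklingError as exc:
    # Can't pickle <function demo ...>: it's not the same object as __main__.demo
    print(exc)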
All this said, the simpler way to make this work is, instead of applying the decorator in the usual fashion, to keep the original, undecorated function in the module namespace and create a new name for the wrapped function; that way, the "raw" code can be the target for the executor:
(...)

def sync_func(a):
    import time
    time.sleep(2)
    print(f"finished {a}")

# this creates the decorated function with a new name,
# instead of replacing the original:
wrapped_sync = async_wrap(sync_func)

if __name__ == "__main__":
    asyncio.run(wrapped_sync("go go go"))
I'm really new to Luigi and I would like to set it up to execute my API calls.
I'm working with MockFiles, since the JSON objects I retrieve through the API are light and I want to avoid using an external database.
This is my code:
import luigi
from luigi import Task, run as runLuigi, mock as LuigiMock
import yaml

class getAllCountries(Task):
    task_complete = False

    def requires(self):
        return LuigiMock.MockFile("allCountries")

    def run(self):
        sync = Sync()
        # Get list of all countries
        countries = sync.getAllCountries()
        if countries is None or len(countries) == 0:
            Logger.error("Sync terminated. The country array is null")
        object_to_send = yaml.dump(countries)
        _out = self.output().open('w')
        _out.write(object_to_send)
        _out.close()
        task_complete = True

    def complete(self):
        return self.task_complete

class getActiveCountries(Task):
    task_complete = False

    def requires(self):
        return getAllCountries()

    def run(self):
        _in = self.input().open('r')
        serialised = _in.read()
        countries = yaml.load(serialised)
        doSync = DoSync()
        activeCountries = doSync.getActiveCountries(countries)
        if activeCountries is None or len(activeCountries) == 0:
            Logger.error("Sync terminated. The active country account array is null")
        task_complete = True

    def complete(self):
        return self.task_complete

if __name__ == "__main__":
    runLuigi()
I'm running the project with the following command:
PYTHONPATH='.' luigi --module app getActiveCountries --workers 2 --local-scheduler
It fails, and this is the stack trace that I got:
DEBUG: Checking if getActiveCountries() is complete
DEBUG: Checking if getAllCountries() is complete
INFO: Informed scheduler that task getActiveCountries__99914b932b has status PENDING
ERROR: Luigi unexpected framework error while scheduling getActiveCountries()
Traceback (most recent call last):
File "/Users/thibaultlr/anaconda3/envs/testThib/lib/python3.6/site-packages/luigi/worker.py", line 763, in add
for next in self._add(item, is_complete):
File "/Users/thibaultlr/anaconda3/envs/testThib/lib/python3.6/site-packages/luigi/worker.py", line 861, in _add
self._validate_dependency(d)
File "/Users/thibaultlr/anaconda3/envs/testThib/lib/python3.6/site-packages/luigi/worker.py", line 886, in _validate_dependency
raise Exception('requires() can not return Target objects. Wrap it in an ExternalTask class')
Exception: requires() can not return Target objects. Wrap it in an ExternalTask class
INFO: Worker Worker(salt=797067816, workers=2, host=xxx, pid=85795) was stopped. Shutting down Keep-Alive thread
ERROR: Uncaught exception in luigi
Traceback (most recent call last):
File "/Users/thibaultlr/anaconda3/envs/testThib/lib/python3.6/site-packages/luigi/retcodes.py", line 75, in run_with_retcodes
worker = luigi.interface._run(argv).worker
File "/Users/thibaultlr/anaconda3/envs/testThib/lib/python3.6/site-packages/luigi/interface.py", line 211, in _run
return _schedule_and_run([cp.get_task_obj()], worker_scheduler_factory)
File "/Users/thibaultlr/anaconda3/envs/testThib/lib/python3.6/site-packages/luigi/interface.py", line 171, in _schedule_and_run
success &= worker.add(t, env_params.parallel_scheduling, env_params.parallel_scheduling_processes)
File "/Users/thibaultlr/anaconda3/envs/testThib/lib/python3.6/site-packages/luigi/worker.py", line 763, in add
for next in self._add(item, is_complete):
File "/Users/thibaultlr/anaconda3/envs/testThib/lib/python3.6/site-packages/luigi/worker.py", line 861, in _add
self._validate_dependency(d)
File "/Users/thibaultlr/anaconda3/envs/testThib/lib/python3.6/site-packages/luigi/worker.py", line 886, in _validate_dependency
raise Exception('requires() can not return Target objects. Wrap it in an ExternalTask class')
Exception: requires() can not return Target objects. Wrap it in an ExternalTask class
Also, I'm running luigid in the background and I do not see any tasks run on it, nor whether they failed or not.
Any ideas?
Firstly, you are not seeing anything happen within the luigi daemon because in your command you specify --local-scheduler. This disregards the daemon entirely and just runs the scheduler in the local process.
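Assuming the command from the question, dropping that flag should send the run through the daemon instead (a guess, since I can't see your setup):

PYTHONPATH='.' luigi --module app getActiveCountries --workers 2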
Second, in the getAllCountries task you are specifying a Target as a requirement when it should be in your output function. Once you've changed it from:
def requires(self):
    return LuigiMock.MockFile("allCountries")

to:

def output(self):
    return LuigiMock.MockFile("allCountries")
you won't need to redefine the complete function or set task_complete to True, because luigi will determine the task is complete by the presence of the output file. To find out more about targets, take a look at: https://luigi.readthedocs.io/en/stable/workflows.html#target
Side note: You can make this section:
_out = self.output().open('w')
_out.write(object_to_send)
_out.close()
a lot easier and less prone to bugs by just using Python's with functionality.
with self.output().open('w') as _out:
    _out.write(object_to_send)
Python will automatically close the file when exiting the with scope and on error.
Second side note: Don't use luigi's run. It is deprecated. Use luigi.build instead.
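For example, a minimal sketch of the entry point using luigi.build, reusing the task name from the question:

if __name__ == "__main__":
    luigi.build([getActiveCountries()], workers=2, local_scheduler=True)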
Environment
Windows 10 + Python 3.6.3 64-bit (also tried 32-bit). I am a Python developer trying to use COM for (nearly) the first time, and I have hit this huge blocker.
Problem
I have had various errors when trying to use an IRTDServer implemented in a DLL (not written by me), via either win32com or comtypes. Using win32com turned out to be more difficult. I have included an example unittest for both libraries below.
Accessing the server from Excel 2016 works as expected; this returns the expected value:
=RTD("foo.bar", , "STAT1", "METRIC1")
Code using win32com library
Here is a simple test case which should connect to the server but doesn't. (This is just one version; I have changed it many times while trying to debug the problem.)
from unittest import TestCase

class COMtest(TestCase):
    def test_win32com(self):
        import win32com.client
        from win32com.server.util import wrap

        class RTDclient:
            # are these only required when implementing the server?
            _com_interfaces_ = ["IRTDUpdateEvent"]
            _public_methods_ = ["Disconnect", "UpdateNotify"]
            _public_attrs_ = ["HeartbeatInterval"]

            def __init__(self, *args, **kwargs):
                self._comObj = win32com.client.Dispatch(*args, **kwargs)

            def connect(self):
                self._rtd = win32com.client.CastTo(self._comObj, 'IRtdServer')
                result = self._rtd.ServerStart(wrap(self))
                assert result > 0

            def UpdateNotify(self):
                print("UpdateNotify() callback")

            def Disconnect(self):
                print("Disconnect() called")

            HeartbeatInterval = -1

        _rtd = RTDclient("foo.bar")
        _rtd.connect()
Result:
Traceback (most recent call last):
File "env\lib\site-packages\win32com\client\gencache.py", line 532, in EnsureDispatch
ti = disp._oleobj_.GetTypeInfo()
pywintypes.com_error: (-2147467263, 'Not implemented', None, None)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test\test.py", line 23, in test_win32com
_rtd.connect()
File "test\test.py", line 16, in connect
self._rtd = win32com.client.CastTo(dispatch, 'IRtdServer')
File "env\lib\site-packages\win32com\client\__init__.py", line 134, in CastTo
ob = gencache.EnsureDispatch(ob)
File "env\lib\site-packages\win32com\client\gencache.py", line 543, in EnsureDispatch
raise TypeError("This COM object can not automate the makepy process - please run makepy manually for this object")
TypeError: This COM object can not automate the makepy process - please run makepy manually for this object
Following those directions, I ran the makepy script successfully:
> env\Scripts\python.exe env\lib\site-packages\win32com\client\makepy.py "foo.bar"
Generating to C:\Users\user1\AppData\Local\Temp\gen_py\3.5\longuuid1x0x1x0.py
Building definitions from type library...
Generating...
Importing module
(I replaced the UUID on stackoverflow for privacy. This UUID is the same as the typelib UUID for "foo.bar".)
The generated file contains the various function and type definitions of both IRtdServer and IRTDUpdateEvent. But in this file, both interfaces are subclasses of win32com.client.DispatchBaseClass, while according to OleViewDotNet they should be subclasses of IUnknown?
However, when I attempted to run the unittest again, I received the exact same error as before. It is as if the lookup mechanism is not finding the generated module?
Also, GetTypeInfo returning Not implemented is alarming me. From my understanding, win32com uses that method (part of the IDispatch COM interface) to determine the argument and return types of all the functions in other interfaces, including IRtdServer. If it's not implemented, win32com would be unable to determine the types correctly. Yet the generated file seems to include this information, which is also perplexing.
Code using comtypes library
from unittest import TestCase

class COMtest(TestCase):
    def test_comtypes(self):
        import comtypes.client

        class RTDclient:
            # are these for win32com only?
            _com_interfaces_ = ["IRTDUpdateEvent"]
            _public_methods_ = ["Disconnect", "UpdateNotify"]
            _public_attrs_ = ["HeartbeatInterval"]

            def __init__(self, clsid):
                self._comObj = comtypes.client.CreateObject(clsid)

            def connect(self):
                self._rtd = self._comObj.IRtdServer()
                result = self._rtd.ServerStart(self)
                assert result > 0

            def UpdateNotify(self):
                print("UpdateNotify() callback")

            def Disconnect(self):
                print("Disconnect() called")

            HeartbeatInterval = -1

        _rtd = RTDclient("foo.bar")
        _rtd.connect()
Result:
File "test\test.py", line 27, in test_comtypes
_rtd.connect()
File "test\test.py", line 16, in connect
self._rtd = self._comObj.IRTDServer()
File "env\lib\site-packages\comtypes\client\dynamic.py", line 110, in __getattr__
dispid = self._comobj.GetIDsOfNames(name)[0]
File "env\lib\site-packages\comtypes\automation.py", line 708, in GetIDsOfNames
self.__com_GetIDsOfNames(riid_null, arr, len(names), lcid, ids)
_ctypes.COMError: (-2147352570, 'Unknown name.', (None, None, None, 0, None))
Some other solutions I've tried
(Based on googling and answers in the comments below)
(Re-)Registered the DLL
Registered the 32 bit version of the DLL and tried python 32 bit
Set compatibility mode of python.exe to Windows XP SP3
Tried not instantiating IRtdServer, that is, replacing these two lines:
self._rtd = self._comObj.IRtdServer()
result = self._rtd.ServerStart(self)
with:
result = self._comObj.ServerStart(self)
The error this time is:
TypeError: 'NoneType' object is not callable
That would seem to indicate that the ServerStart function exists, but is undefined? (Seems really weird. There must be more to this mystery.)
Tried passing interface="IRtdServer" parameter to CreateObject:
def __init__(self, clsid):
    self._comObj = comtypes.client.CreateObject(clsid, interface="IRtdServer")

def connect(self):
    result = self._comObj.ServerStart(self)
    ...
The error received is:
File "test\test.py", line 13, in __init__
self._comObj = comtypes.client.CreateObject(clsid, interface="IRtdServer")
File "env\lib\site-packages\comtypes\client\__init__.py", line 238, in CreateObject
obj = comtypes.CoCreateInstance(clsid, clsctx=clsctx, interface=interface)
File "env\lib\site-packages\comtypes\__init__.py", line 1223, in CoCreateInstance
p = POINTER(interface)()
TypeError: Cannot create instance: has no _type_
Tracing code in the comtypes library, that would seem to indicate that the interface parameter wants an interface class, not a string. I found various interfaces defined in the comtypes library: IDispatch, IPersist, IServiceProvider. All are subclasses of IUnknown. According to OleViewDotNet, IRtdServer is also a subclass of IUnknown. This leads me to believe that I need to write a similar IRtdServer class in Python in order to use the interface, but I don't know how to do that.
I noticed the dynamic parameter of CreateObject. The code indicates this is mutually exclusive with the interface parameter, so I tried that:
def __init__(self, clsid):
    self._comObj = comtypes.client.CreateObject(clsid, dynamic=True)

def connect(self):
    self._rtd = self._comObj.IRtdServer()
    result = self._rtd.ServerStart(self)
But the error is the same as my original one: accessing IRtdServer raises _ctypes.COMError: (-2147352570, 'Unknown name.', (None, None, None, 0, None))
Any help or clues would be greatly be appreciated. Thank you in advance.
(Not really knowing what I'm doing,) I also tried to use OleViewDotNet to look at the DLL.
I ran into the same problem.
I also tried using win32com to get Excel to run it for me; that was a bit unstable, to be honest... I couldn't even touch my Excel.
So I spent some time looking into this. The problem lies with CastTo: the COM object you (and I) loaded just does not contain enough information to be cast (some methods like GetTypeInfo are not implemented, etc.).
I therefore created a wrapper that makes the methods of such COM objects callable... not obvious, but it seems to be working for me.
The client code is modified from a project called pyrtd, which didn't work for various reasons (presumably due to changes in the RTD model... the return value of RefreshData is just completely different now).
import functools

import pythoncom
import win32com.client
from win32com import universal
from win32com.client import gencache
from win32com.server.util import wrap

EXCEL_TLB_GUID = '{00020813-0000-0000-C000-000000000046}'
EXCEL_TLB_LCID = 0
EXCEL_TLB_MAJOR = 1
EXCEL_TLB_MINOR = 4

gencache.EnsureModule(EXCEL_TLB_GUID, EXCEL_TLB_LCID, EXCEL_TLB_MAJOR, EXCEL_TLB_MINOR)
universal.RegisterInterfaces(EXCEL_TLB_GUID,
                             EXCEL_TLB_LCID, EXCEL_TLB_MAJOR, EXCEL_TLB_MINOR,
                             ['IRtdServer', 'IRTDUpdateEvent'])


# noinspection PyProtectedMember
class ObjectWrapperCOM:
    """
    This object can act as a wrapper for an object dispatched using win32com.client.Dispatch
    Sometimes an object written by a 3rd party is not well constructed, and win32com will not be able to obtain
    type information etc. in order to cast the object to a certain interface. win32com.client.CastTo will fail.
    This wrapper class will enable the object to call its methods in this case, even if we do not know what exactly
    the wrapped object is.
    """
    LCID = 0x0

    def __init__(self, obj):
        self._impl = obj  # type: win32com.client.CDispatch

    def __getattr__(self, item):
        flags, dispid = self._impl._find_dispatch_type_(item)
        if dispid is None:
            raise AttributeError("{} is not a valid property or method for this object.".format(item))
        return functools.partial(self._impl._oleobj_.Invoke, dispid, self.LCID, flags, True)


# noinspection PyPep8Naming
class RTDUpdateEvent:
    """
    Implements interface IRTDUpdateEvent from COM imports
    """
    _com_interfaces_ = ['IRTDUpdateEvent']
    _public_methods_ = ['Disconnect', 'UpdateNotify']
    _public_attrs_ = ['HeartbeatInterval']

    # Implementation of IRTDUpdateEvent.
    HeartbeatInterval = -1

    def __init__(self, event_driven=True):
        self.ready = False
        self._event_driven = event_driven

    def UpdateNotify(self):
        if self._event_driven:
            self.ready = True

    def Disconnect(self):
        pass


class RTDClient:
    """
    Implements a Real-Time-Data (RTD) client for accessing COM data sources that provide an IRtdServer interface.
    """
    MAX_REGISTERED_TOPICS = 1024

    def __init__(self, class_id):
        """
        :param class_id: can either be a class ID or a program ID
        """
        self._class_id = class_id
        self._rtd = None
        self._update_event = None
        self._topic_to_id = {}
        self._id_to_topic = {}
        self._topic_values = {}
        self._last_topic_id = 0

    def connect(self, event_driven=True):
        """
        Connects to the RTD server.
        Set event_driven to False if you want to disable update notifications.
        In this case you'll need to call refresh_data manually.
        """
        dispatch = win32com.client.Dispatch(self._class_id)
        self._update_event = RTDUpdateEvent(event_driven)
        try:
            self._rtd = win32com.client.CastTo(dispatch, 'IRtdServer')
        except TypeError:
            # Automated makepy failed... no detailed construction available for the class
            self._rtd = ObjectWrapperCOM(dispatch)
        self._rtd.ServerStart(wrap(self._update_event))

    def update(self):
        """
        Check if there is data waiting and call RefreshData if necessary. Returns True if new data has been received.
        Note that you should call this following a call to pythoncom.PumpWaitingMessages(). If you neglect to
        pump the message loop you'll never receive UpdateNotify callbacks.
        """
        # noinspection PyUnresolvedReferences
        pythoncom.PumpWaitingMessages()
        if self._update_event.ready:
            self._update_event.ready = False
            self.refresh_data()
            return True
        else:
            return False

    def refresh_data(self):
        """
        Grabs new data from the RTD server.
        """
        (ids, values) = self._rtd.RefreshData(self.MAX_REGISTERED_TOPICS)
        for id_, value in zip(ids, values):
            if id_ is None and value is None:
                # This is probably the end of message
                continue
            assert id_ in self._id_to_topic, "Topic ID {} is not registered.".format(id_)
            topic = self._id_to_topic[id_]
            self._topic_values[topic] = value

    def get(self, topic: tuple):
        """
        Gets the value of a registered topic. Returns None if no value is available. Throws an exception if
        the topic isn't registered.
        """
        assert topic in self._topic_to_id, 'Topic %s not registered.' % (topic,)
        return self._topic_values.get(topic)

    def register_topic(self, topic: tuple):
        """
        Registers a topic with the RTD server. The topic's value will be updated in subsequent data refreshes.
        """
        if topic not in self._topic_to_id:
            id_ = self._last_topic_id
            self._last_topic_id += 1
            self._topic_to_id[topic] = id_
            self._id_to_topic[id_] = topic
            self._rtd.ConnectData(id_, topic, True)

    def unregister_topic(self, topic: tuple):
        """
        Un-registers a topic so that it will not get updated.
        :param topic:
        :return:
        """
        assert topic in self._topic_to_id, 'Topic %s not registered.' % (topic,)
        self._rtd.DisconnectData(self._topic_to_id[topic])

    def disconnect(self):
        """
        Closes the RTD server connection.
        :return:
        """
        self._rtd.ServerTerminate()
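For illustration, here is a minimal usage sketch of the RTDClient above; the foo.bar ProgID and topic strings are placeholders taken from the question, and the polling loop is my own assumption about how a caller would drive it:

import time

client = RTDClient('foo.bar')
client.connect()
client.register_topic(('STAT1', 'METRIC1'))

# Pump the COM message loop until a value shows up (or we give up).
for _ in range(100):
    if client.update():
        print(client.get(('STAT1', 'METRIC1')))
        break
    time.sleep(0.1)

client.disconnect()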
There already seems to be both a server and a client for Excel 2002:
pyrtd
Looking at that source, once you create a dispatch object, it is then cast to IRtdServer.
Extracting the related parts gives the following.
from win32com import client, universal
from win32com.server.util import wrap

def __init__(self, classid):
    self._classid = classid
    self._rtd = None

def connect(self, event_driven=True):
    dispatch = client.Dispatch(self._classid)
    self._rtd = client.CastTo(dispatch, 'IRtdServer')
    if event_driven:
        self._rtd.ServerStart(wrap(self))
    else:
        self._rtd.ServerStart(None)
Please refer to client.py and examples/rtdtime.py in the following sources.
pyrtd - default
pyrtd/rtd/client.py
pyrtd/examples/rtdtime.py
UPDATE: As noted by Mr. Fooz, the functional version of the wrapper has a bug, so I reverted to the original class implementation. I've put the code up on GitHub:
https://github.com/nofatclips/timeout/commits/master
There are two commits, one working (using the "import" workaround) the second one broken.
The source of the problem seems to be the pickle.dumps function, which just spits out an identifier when called on a function. By the time I call Process, that identifier points to the decorated version of the function, rather than the original one.
ORIGINAL MESSAGE:
I was trying to write a function decorator to wrap a long task in a Process that would be killed if a timeout expires. I came up with this (working but not elegant) version:
from multiprocessing import Process
from threading import Timer
from functools import partial
from sys import stdout

def safeExecution(function, timeout):
    thread = None

    def _break():
        #stdout.flush()
        #print (thread)
        thread.terminate()

    def start(*kw):
        timer = Timer(timeout, _break)
        timer.start()
        thread = Process(target=function, args=kw)
        ret = thread.start()  # TODO: capture return value
        thread.join()
        timer.cancel()
        return ret

    return start

def settimeout(timeout):
    return partial(safeExecution, timeout=timeout)

#@settimeout(1)
def calculatePrimes(maxPrimes):
    primes = []
    for i in range(2, maxPrimes):
        prime = True
        for prime in primes:
            if (i % prime == 0):
                prime = False
                break
        if (prime):
            primes.append(i)
            print ("Found prime: %s" % i)

if __name__ == '__main__':
    print (calculatePrimes)
    a = settimeout(1)
    calculatePrime = a(calculatePrimes)
    calculatePrime(24000)
As you can see, I commented out the decorator and assigned the modified version of calculatePrimes to calculatePrime. If I try to reassign it to the same name, I get a "Can't pickle <class 'function'>: attribute lookup builtins.function failed" error when trying to call the decorated version.
Does anybody have an idea of what is happening under the hood? Is the original function being turned into something different when I assign the decorated version to the identifier referencing it?
UPDATE: To reproduce the error, I just change the main part to
if __name__ == '__main__':
    print (calculatePrimes)
    a = settimeout(1)
    calculatePrimes = a(calculatePrimes)
    calculatePrimes(24000)
    #sleep(2)
which yields:
Traceback (most recent call last):
File "c:\Users\mm\Desktop\ING.SW\python\thread2.py", line 49, in <module>
calculatePrimes(24000)
File "c:\Users\mm\Desktop\ING.SW\python\thread2.py", line 19, in start
ret = thread.start()
File "C:\Python33\lib\multiprocessing\process.py", line 111, in start
self._popen = Popen(self)
File "C:\Python33\lib\multiprocessing\forking.py", line 241, in __init__
dump(process_obj, to_child, HIGHEST_PROTOCOL)
File "C:\Python33\lib\multiprocessing\forking.py", line 160, in dump
ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'function'>: attribute lookup builtin
s.function failed
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Python33\lib\multiprocessing\forking.py", line 344, in main
self = load(from_parent)
EOFError
P.S. I also wrote a class version of safeExecution, which has exactly the same behaviour.
Move the function to a module that's imported by your script.
Functions are only picklable in Python if they're defined at the top level of a module. Ones defined in scripts are not picklable by default. Module-based functions are pickled as two strings: the name of the module and the name of the function. They're unpickled by dynamically importing the module and then looking up the function object by name (hence the restriction to top-level functions).
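For example, a minimal sketch of that fix (primes.py is a hypothetical module name):

# primes.py -- the undecorated worker lives at module top level
def calculatePrimes(maxPrimes):
    print("finding primes below", maxPrimes)

# script.py
from multiprocessing import Process
from primes import calculatePrimes

if __name__ == '__main__':
    p = Process(target=calculatePrimes, args=(24000,))
    p.start()
    p.join()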
It's possible to extend the pickle handlers to support semi-generic function and lambda pickling, but doing so can be tricky. In particular, it can be difficult to reconstruct the full namespace tree if you want to properly handle things like decorators and nested functions. If you want to do this, it's best to use Python 2.7 or later or Python 3.3 or later (earlier versions have a bug in the dispatcher of cPickle and pickle that's unpleasant to work around).
Is there an easy way to pickle a python function (or otherwise serialize its code)?
Python: pickling nested functions
http://bugs.python.org/issue7689
EDIT:
At least in Python 2.6, the pickling works fine for me if the script contains only the if __name__ block, imports calculatePrimes and settimeout from a module, and the inner start function's name is monkey-patched:
def safeExecution(function, timeout):
    ...
    def start(*kw):
        ...
    start.__name__ = function.__name__  # ADD THIS LINE
    return start
There's a second problem, related to Python's variable scoping rules. The assignment to the thread variable inside start creates a shadow variable whose scope is limited to one evaluation of the start function. It does not assign to the thread variable found in the enclosing scope. You can't use the global keyword to override the scope because you want an intermediate scope, and Python 2 only has full support for manipulating the local-most and global-most scopes, not any intermediate ones (Python 3 adds the nonlocal keyword for exactly this). You can overcome the problem by placing the thread object in a container that's housed in the intermediate scope. Here's how:
def safeExecution(function, timeout):
    thread_holder = []  # MAKE IT A CONTAINER

    def _break():
        #stdout.flush()
        #print (thread)
        thread_holder[0].terminate()  # REACH INTO THE CONTAINER

    def start(*kw):
        ...
        thread = Process(target=function, args=kw)
        thread_holder.append(thread)  # MUTATE THE CONTAINER
        ...

    start.__name__ = function.__name__  # MAKES THE PICKLING WORK
    return start
I'm not really sure why you get that problem, but to answer your title question: why does the decorator not work?
When you pass arguments to a decorator, you need to structure the code slightly differently. Essentially you have to implement the decorator as a class with an __init__ and a __call__.
In __init__ you collect the arguments sent to the decorator, and in __call__ you receive the function you decorate:
class settimeout(object):
    def __init__(self, timeout):
        self.timeout = timeout

    def __call__(self, func):
        def wrapped_func(n):
            func(n, self.timeout)
        return wrapped_func

@settimeout(1)
def func(n, timeout):
    print("Func is called with", n, "and", timeout)

func(24000)
This should get you going on the decorator front at least.
I'm trying to make a file-like object which is meant to be assigned to sys.stdout/sys.stderr during testing to provide deterministic output. It's not meant to be fast, just reliable. What I have so far almost works, but I need some help getting rid of the last few edge-case errors.
Here is my current implementation.
try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO
from os import getpid

class MultiProcessFile(object):
    """
    helper for testing multiprocessing

    multiprocessing poses a problem for doctests, since the strategy
    of replacing sys.stdout/stderr with file-like objects then
    inspecting the results won't work: the child processes will
    write to the objects, but the data will not be reflected
    in the parent doctest-ing process.

    The solution is to create file-like objects which will interact with
    multiprocessing in a more desirable way.

    All processes can write to this object, but only the creator can read.
    This allows the testing system to see a unified picture of I/O.
    """

    def __init__(self):
        # per advice at:
        # http://docs.python.org/library/multiprocessing.html#all-platforms
        from multiprocessing import Queue
        self.__master = getpid()
        self.__queue = Queue()
        self.__buffer = StringIO()
        self.softspace = 0

    def buffer(self):
        if getpid() != self.__master:
            return
        from Queue import Empty
        from collections import defaultdict
        cache = defaultdict(str)
        while True:
            try:
                pid, data = self.__queue.get_nowait()
            except Empty:
                break
            cache[pid] += data
        for pid in sorted(cache):
            self.__buffer.write('%s wrote: %r\n' % (pid, cache[pid]))

    def write(self, data):
        self.__queue.put((getpid(), data))

    def __iter__(self):
        "getattr doesn't work for iter()"
        self.buffer()
        return self.__buffer

    def getvalue(self):
        self.buffer()
        return self.__buffer.getvalue()

    def flush(self):
        "meaningless"
        pass
... and a quick test script:
#!/usr/bin/python2.6
from multiprocessing import Process
from mpfile import MultiProcessFile

def printer(msg):
    print msg

processes = []
for i in range(20):
    processes.append(Process(target=printer, args=(i,), name='printer'))

print 'START'
import sys
buffer = MultiProcessFile()
sys.stdout = buffer

for p in processes:
    p.start()
for p in processes:
    p.join()

for i in range(20):
    print i,
print

sys.stdout = sys.__stdout__
sys.stderr = sys.__stderr__
print
print 'DONE'
print
buffer.buffer()
print buffer.getvalue()
This works perfectly 95% of the time, but it has three edge-case problems. I have to run the test script in a fast while loop to reproduce them.
1. 3% of the time, the parent process's output isn't completely reflected. I assume this is because the data is being consumed before the Queue-flushing thread can catch up. I haven't thought of a way to wait for the thread without deadlocking.
2. 0.5% of the time, there's a traceback from the multiprocessing.Queue implementation.
3. 0.01% of the time, the PIDs wrap around, and so sorting by PID gives the wrong ordering.
In the very worst case (odds: one in 70 million), the output would look like this:
START
DONE
302 wrote: '19\n'
32731 wrote: '0 1 2 3 4 5 6 7 8 '
32732 wrote: '0\n'
32734 wrote: '1\n'
32735 wrote: '2\n'
32736 wrote: '3\n'
32737 wrote: '4\n'
32738 wrote: '5\n'
32743 wrote: '6\n'
32744 wrote: '7\n'
32745 wrote: '8\n'
32749 wrote: '9\n'
32751 wrote: '10\n'
32752 wrote: '11\n'
32753 wrote: '12\n'
32754 wrote: '13\n'
32756 wrote: '14\n'
32757 wrote: '15\n'
32759 wrote: '16\n'
32760 wrote: '17\n'
32761 wrote: '18\n'
Exception in thread QueueFeederThread (most likely raised during interpreter shutdown):
Traceback (most recent call last):
File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
File "/usr/lib/python2.6/threading.py", line 484, in run
File "/usr/lib/python2.6/multiprocessing/queues.py", line 233, in _feed
<type 'exceptions.TypeError'>: 'NoneType' object is not callable
In python2.7 the exception is slightly different:
Exception in thread QueueFeederThread (most likely raised during interpreter shutdown):
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
File "/usr/lib/python2.7/threading.py", line 505, in run
File "/usr/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
<type 'exceptions.IOError'>: [Errno 32] Broken pipe
How do I get rid of these edge cases?
The solution came in two parts. I've successfully run the test program 200 thousand times without any change in the output.
The easy part was to use multiprocessing.current_process()._identity to sort the messages. This is not part of the published API, but it is a unique, deterministic identifier for each process. This fixed problem #3, the PIDs wrapping around and giving a bad ordering of output.
The other part of the solution was to use multiprocessing.Manager().Queue() rather than multiprocessing.Queue. This fixes problem #2 above because the manager lives in a separate process, and so avoids some of the bad special cases of using a Queue from the owning process. Problem #1 is fixed because the Queue is fully exhausted and the feeder thread dies naturally before Python starts shutting down and closes stdin.
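A sketch of the class above with both fixes applied (my reconstruction; the rest of the test harness is unchanged):

from collections import defaultdict
from multiprocessing import Manager, current_process
try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO

class MultiProcessFile(object):
    "variant of the class above with both fixes applied"

    def __init__(self):
        self.__master = current_process()._identity
        self.__queue = Manager().Queue()  # the manager lives in its own process
        self.__buffer = StringIO()
        self.softspace = 0

    def buffer(self):
        if current_process()._identity != self.__master:
            return
        from Queue import Empty
        cache = defaultdict(str)
        while True:
            try:
                ident, data = self.__queue.get_nowait()
            except Empty:
                break
            cache[ident] += data
        # _identity is a unique, deterministic tuple per process,
        # so sorting by it cannot be fooled by PID wraparound
        for ident in sorted(cache):
            self.__buffer.write('%s wrote: %r\n' % (ident, cache[ident]))

    def write(self, data):
        self.__queue.put((current_process()._identity, data))

    def getvalue(self):
        self.buffer()
        return self.__buffer.getvalue()

    def flush(self):
        pass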
I have encountered far fewer multiprocessing bugs with Python 2.7 than with Python 2.6. Having said this, the solution I used to avoid the "Exception in thread QueueFeederThread" problem was to sleep momentarily, possibly for 0.01 s, in each process in which the Queue is used. It is true that using sleep is not desirable or even reliable, but the specified duration was observed to work sufficiently well in practice for me. You can also try 0.1 s.