I am using a socket connection to download data through a third party API. It works fine for a while but every now and then my script will crash giving the following error: BrokenPipeError: [Errno 32] Broken pipe
After some research it seems the suggestion (link here) is to do the following:
from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE,SIG_DFL)
However, firstly I'm not sure what this actually does (I'm still confused after reading the Python manual on signal), and I also don't know where to put the code.
If anyone is familiar with this error, could you please advise whether this is in fact the correct solution and where the signal(SIGPIPE, SIG_DFL) call should be placed? Should it go inside a try/except block, or is it simply placed at the start of the program? I'm confused.
Here's some of the relevant code. I basically have a dataframe consisting of several thousand items. I loop through each item passing it to the download method. The download method downloads the data via the api and then writes it to a database. I then move to the next item to download.
def recv_data(sock, recv_buffer=4096, delim='\n'):
    buffer = ''
    data = True
    while data:
        data = sock.recv(recv_buffer)
        buffer += str(data.decode('latin-1'))
        while buffer.find(delim) != -1:
            line, buffer = buffer.split('\n', 1)
            yield line
def update_existing_symbol_data(engine, sock, exchange, exchange_id, symbol, symbol_id, start_date):
    data = ''
    message = #request data message
    sock.sendall(message.encode())
    for line in recv_data(sock):
        if "!ENDMSG!" in line:
            break
        data += line[:-2] + '\n'
    df = pd.read_csv(io.StringIO(data))
    df.set_index('date', inplace=True)
    df.to_sql('daily', engine, if_exists='append')
def main():
    df = #dataframe all symbols that need to be downloaded
    for index, row in df.iterrows():
        update_existing_symbol_data(args)
SIGPIPE is a POSIX signal that is sent to your process when a write to a socket or pipe fails because the other end has closed the connection. The default behavior (this is an OS/socket thing, not a Python thing) is for the signal to just kill your process. Python instead turns it into an exception so that it's possible to write more robust programs. But if you don't need to handle that event, which it sounds like you don't considering your use case, you can safely ignore it. There's no logic you need to run when you receive the signal, so the solution from that blog post should be fine. No try/except needed.
If your use case changes at a later date and you do need to handle the SIGPIPE, then wrapping that in a try/except and handling it there would be the way to go.
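For reference, a minimal sketch of where those two lines could go, reusing the main() structure from the question (update_existing_symbol_data and args are the question's own placeholders):

from signal import signal, SIGPIPE, SIG_DFL

# install once at module level / program start-up, before any socket work happens
signal(SIGPIPE, SIG_DFL)

def main():
    df = ...  # dataframe of all symbols that need to be downloaded
    for index, row in df.iterrows():
        update_existing_symbol_data(args)

if __name__ == '__main__':
    main()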
Hello :) I'm a complete beginner when it comes to COM objects; any help is appreciated!
I'm working on a Python program that is supposed to read incoming MS-Word documents in a client/server fashion, i.e. the client sends a request (one or multiple MS-Word documents) and the server reads specific content from those requests using pythoncom and win32com.
Because I want to minimize waiting time for the client (the client needs a status message from the server), I do not want to open an MS-Word instance for every request. Hence, I intend to have a pool of running MS-Word instances from which the server can pick and choose. This, in turn, means I have to reuse those instances from the pool in different threads, and this is what causes trouble right now. After I read Using win32com with multithreading, my dummy code for the server looks like this:
import pythoncom, win32com.client, threading, psutil, os, queue, time, datetime

appPool = {'WINWORD.EXE': queue.Queue()}

def initAppPool():
    global appPool
    wordApp = win32com.client.DispatchEx('Word.Application')
    appPool["WINWORD.EXE"].put(wordApp)  # For testing purpose I only use one MS-Word instance currently

def run_in_thread(appid, path):
    # open doc, read do some stuff, close it and reattach MS-Word instance to pool
    pythoncom.CoInitialize()
    wordApp = win32com.client.Dispatch(pythoncom.CoGetInterfaceAndReleaseStream(appid, pythoncom.IID_IDispatch))
    doc = wordApp.Documents.Open(path)
    time.sleep(3)  # read out some content ...
    doc.Close()
    appPool["WINWORD.EXE"].put(wordApp)
if __name__ == '__main__':
    initAppPool()
    pathOfFile2BeRead1 = r'C:\Temp\file4.docx'
    pathOfFile2BeRead2 = r'C:\Temp\file5.doc'

    # treat first request
    wordApp = appPool["WINWORD.EXE"].get(True, 10)
    pythoncom.CoInitialize()
    wordApp_id = pythoncom.CoMarshalInterThreadInterfaceInStream(pythoncom.IID_IDispatch, wordApp)
    readDocjob1 = threading.Thread(target=run_in_thread, args=(wordApp_id, pathOfFile2BeRead1), daemon=True)
    readDocjob1.start()

    # wait here until readDocjob1 is done
    wait = True
    while wait:
        try:
            wordApp = appPool["WINWORD.EXE"].get(True, 1)
            wait = False
        except queue.Empty:
            print(f"[{datetime.datetime.now()}] error: appPool empty")
        except BaseException as err:
            print(f"[{datetime.datetime.now()}] error: {err}")
So far everything works as expected, but when I start a second request similar to the first one:
(x) wordApp_id = pythoncom.CoMarshalInterThreadInterfaceInStream(pythoncom.IID_IDispatch, wordApp)
readDocjob2 = threading.Thread(target=run_in_thread,args=(wordApp_id,pathOfFile2BeRead2), daemon=True)
readDocjob2.start()
I receive the following error message for the line marked (x): "The application called an interface that was marshaled for a different thread".
I thought using pythoncom.CoGetInterfaceAndReleaseStream was exactly what lets me jump between threads with the same COM object? And besides that, why does it work the first time but not the second time?
I searched for different solutions on StackOverflow which use CoMarshalInterface instead of CoMarshalInterThreadInterfaceInStream, but they all gave me the same error. I'm really confused right now.
EDIT:
After fixing the error as mentioned in the comments, I ran into a mysterious behavior.
When the second job is executed:
wordApp_id = pythoncom.CoMarshalInterThreadInterfaceInStream(pythoncom.IID_IDispatch, wordApp)
readDocjob2 = threading.Thread(target=run_in_thread,args=(wordApp_id,pathOfFile2BeRead2), daemon=True)
readDocjob2.start()
The function run_in_thread terminates immediately without executing a single line; it seems as if pythoncom.CoInitialize() is not working properly.
The script finishes without any error messages, though.
def run_in_thread(instance, appid, path):
    # open doc, read do some stuff, close it and reattach MS-Word instance to pool
    pythoncom.CoInitialize()
    wordApp = win32com.client.Dispatch(pythoncom.CoGetInterfaceAndReleaseStream(appid, pythoncom.IID_IDispatch))
    doc = wordApp.Documents.Open(path)
    time.sleep(3)  # read out some content ...
    doc.Close()
    instance.flag = True
What happens is that you put back into the pool a COM reference that you got from CoGetInterfaceAndReleaseStream. But that reference was created specifically for the new thread, and you then call CoMarshalInterThreadInterfaceInStream on this new reference.
That's what is wrong.
You must always marshal the original COM reference, from the thread that created it, to be able to call CoMarshalInterThreadInterfaceInStream repeatedly.
So, to solve the problem, you must change how your app pool works: use some kind of "in use" flag, but don't touch the original COM reference.
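To illustrate, here is a rough sketch of that idea built on the question's setup. The WordInstance class and the acquire/release/submit helpers are made-up names for this sketch, not part of pywin32; the point is that the main thread keeps the original COM reference, only flags it as in use, and marshals it afresh for every worker thread, while the worker never puts its unmarshalled reference back.

import threading, time
import pythoncom, win32com.client

class WordInstance:
    def __init__(self):
        # created in the main thread; this original reference never leaves it
        self.app = win32com.client.DispatchEx('Word.Application')
        self.in_use = False

pool = [WordInstance()]        # one instance for testing, as in the question
pool_lock = threading.Lock()

def acquire():
    with pool_lock:
        for inst in pool:
            if not inst.in_use:
                inst.in_use = True
                return inst
    return None

def release(inst):
    with pool_lock:
        inst.in_use = False

def run_in_thread(appid, path, inst):
    pythoncom.CoInitialize()
    try:
        wordApp = win32com.client.Dispatch(
            pythoncom.CoGetInterfaceAndReleaseStream(appid, pythoncom.IID_IDispatch))
        doc = wordApp.Documents.Open(path)
        time.sleep(3)          # read out some content ...
        doc.Close()
    finally:
        release(inst)          # clear the flag only; the original reference is untouched

def submit(path):
    # must be called from the thread that created the Word instance (here: the main thread)
    inst = acquire()
    if inst is None:
        raise RuntimeError("no free Word instance in the pool")
    # marshal the ORIGINAL reference again for every new thread
    appid = pythoncom.CoMarshalInterThreadInterfaceInStream(pythoncom.IID_IDispatch, inst.app)
    t = threading.Thread(target=run_in_thread, args=(appid, path, inst), daemon=True)
    t.start()
    return t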
I am relatively new to Python, and programming as a whole. I am progressively getting the hang of it, however I have been stumped as of late in regards to one of my latest projects. I have a set of Atlas Scientific EZO circuits w/ their corresponding sensors hooked up to my Raspberry Pi 3. I can run the i2c script fine, and the majority of the code makes sense to me. However, I would like to pull data from the sensors and log it with a time stamp in a CSV file, taking data points in timed intervals. I am not quite sure how to pull the data from the sensor, and put it into a CSV. Making CSVs in Python is fairly simple, as is filling them with data, but I cannot seem to understand how I would make the data that goes into the CSV the same as what is displayed in the terminal when one runs the Poll function. Attached is the i2c sample code from Atlas' website. I have annotated it a bit more so as to help me understand it better.
I have already attempted to make sense of the poll function, but am confused in regards to the self.file_write and self.file_read methods used throughout the code. I do believe they would be of use in this instance but I am generally stumped in terms of implementation. Below you will find a link to the Python script (i2c.py) written by Atlas Scientific
https://github.com/AtlasScientific/Raspberry-Pi-sample-code/blob/master/i2c.py
I'm guessing by "the polling function" you are referring to this section of the code:
# continuous polling command automatically polls the board
elif user_cmd.upper().startswith("POLL"):
    delaytime = float(string.split(user_cmd, ',')[1])

    # check for polling time being too short, change it to the minimum timeout if too short
    if delaytime < AtlasI2C.long_timeout:
        print("Polling time is shorter than timeout, setting polling time to %0.2f" % AtlasI2C.long_timeout)
        delaytime = AtlasI2C.long_timeout

    # get the information of the board you're polling
    info = string.split(device.query("I"), ",")[1]
    print("Polling %s sensor every %0.2f seconds, press ctrl-c to stop polling" % (info, delaytime))

    try:
        while True:
            print(device.query("R"))
            time.sleep(delaytime - AtlasI2C.long_timeout)
    except KeyboardInterrupt:  # catches the ctrl-c command, which breaks the loop above
        print("Continuous polling stopped")
If this is the case then it looks like you can recycle most of this code for your use. Instead of printing the string you are seeing in your console, grab the return value of device.query("R") and write it to your CSV.
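For example, here is a minimal sketch of that change, assuming device, delaytime and AtlasI2C.long_timeout are the same objects used in the POLL branch above (the sensor_log.csv file name is arbitrary):

import csv
import datetime
import time

with open('sensor_log.csv', 'a', newline='') as f:
    writer = csv.writer(f)
    try:
        while True:
            reading = device.query("R")   # the same call the sample script prints
            writer.writerow([datetime.datetime.now().isoformat(), reading])
            f.flush()                     # so an interrupted run doesn't lose buffered rows
            time.sleep(delaytime - AtlasI2C.long_timeout)
    except KeyboardInterrupt:
        print("Continuous polling stopped")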
I think you should add a method to the AtlasI2C class which writes the data to a file.
Just add this method below AtlasI2C's __init__():
def update_file(self, new_data):
    with open(self.csv_file, 'a') as data_file:
        try:
            data = "{}\n".format(str(new_data))
            data_file.write(data)
        except Exception as e:
            print(e)
Then add an attribute for the CSV file name in AtlasI2C's __init__():

self.csv_file = "my_filename.csv"  # replace my_filename with your file name
and then under line 51 (char_list = list(map(lambda x: chr(ord(x) & ~0x80), list(response[1:])))) add this line:
self.update_file(''.join(char_list))
Hope it's gonna help you.
Cheers,
Fenrir
I am working on a large-scale embedded system built using Python, and we are using ZeroMQ to make everything modular. I have sensor data being sent across a ZeroMQ serial port in the form of a Python dictionary, as shown here:
accel_com.publish_message({"ACL_X": ACL_1_X_val})
Where accel_com is a Communicator class we built that wraps the ZeroMQ logic that publishes messages across a port. Here you can see we are sending Dictionaries across.
However, on the other side of the communication port, I have another module that grabs this data using this code:
accel_msg = san.get_last_message("sensor/accelerometer")
accel.ax = accel_msg.get('ACL_X')
accel.ay = accel_msg.get('ACL_Y')
accel.az = accel_msg.get('ACL_Z')
The problem is that when I try to treat accel_msg as a Python dictionary, I get an error:
'NoneType' object does not have a method 'get()'.
So my guess is the dictionary is not going across the wire correctly. I am not very familiar with Python so I am not sure how to solve this problem.
Expanding on @JoranBeasley's comment:
accel_msg is sometimes None, such as while it's still waiting for a message. The solution is to skip over None messages:
while True:  # waiting indefinitely for messages
    accel_msg = san.get_last_message("sensor/accelerometer")
    if accel_msg:  # or more explicitly, if accel_msg is not None:
        accel.ax = accel_msg.get('ACL_X')
        accel.ay = accel_msg.get('ACL_Y')
        accel.az = accel_msg.get('ACL_Z')
        break  # if you only want one message, otherwise remove this
    else:
        print(accel_msg)  # which is almost certainly None
I use a script with an infinite loop to upload sensor data to parse.com. I use
batcher.batch_save(myDataPoints)
to upload the data.
The problem is that if the computer where the script runs (a Raspberry Pi) loses the internet connection, the script will exit on an error because the batcher can't reach the Parse API.
How can I avoid this? If there is no internet connection, I would like the program to execute some code and keep looping, but NOT to exit on an error.
Thanks.
If you just want to keep going after you get an error you can just use try...except, e.g.
while True:
    try:
        upload_function()
    except Exception:
        pass
This should catch your exception and, instead of exiting your while loop, just continue on. You can put whatever you want to do in the except block; you don't have to just leave it as pass.
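For instance, a sketch that logs the failure and backs off before retrying, rather than silently swallowing every error (the 30-second delay and the upload_function name are placeholders for your batcher.batch_save call):

import time

while True:
    try:
        upload_function()                       # e.g. batcher.batch_save(myDataPoints)
    except Exception as e:
        print("Upload failed, will retry:", e)  # "whatever you want to do" goes here
        time.sleep(30)                          # wait for the connection to come back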
def spinner(newList):
    count = 0
    while connection_is_bad:
        count += 1
    x = uploader(newList)
If your data is a list, I think it gets easier:
def uploader(someList):
    last_value = ''
    while connection_is_good:
        for item in someList:
            # do some uploading
            last_value = item
        return something
    newList = someList[someList.index(last_value) + 1:]
    x = spinner(newList)
Here is my thought as some pseudo-code. I can imagine a number of ways to keep track of the data being transferred, but I can't be sure which is best until I see how the data is collected.
I am looking for a Python snippet to read an internet radio stream (.asx, .pls, etc.) and save it to a file.
The final project is a cron'ed script that will record an hour or two of internet radio and then transfer it to my phone for playback during my commute. (3G is kind of spotty along my commute.)
Any snippets or pointers are welcome.
The following has worked for me, using the requests library to handle the HTTP request.
import requests

stream_url = 'http://your-stream-source.com/stream'

r = requests.get(stream_url, stream=True)

with open('stream.mp3', 'wb') as f:
    try:
        for block in r.iter_content(1024):
            f.write(block)
    except KeyboardInterrupt:
        pass
That will save a stream to the stream.mp3 file until you interrupt it with Ctrl+C.
So after tinkering and playing with it, I've found Streamripper to work best. This is the command I use:
streamripper http://yp.shoutcast.com/sbin/tunein-station.pls?id=1377200 -d ./streams -l 10800 -a tb$FNAME
If you find that your requests or urllib.request call in Python 3 fails to save a stream because you receive "ICY 200 OK" in return instead of an "HTTP/1.0 200 OK" header, you need to tell the underlying functions ICY 200 OK is OK!
What you can effectively do is intercept the routine that handles reading the status after opening the stream, just before processing the headers.
Simply put a routine like this above your stream opening code.
def NiceToICY(self):
    class InterceptedHTTPResponse():
        pass
    import io
    line = self.fp.readline().replace(b"ICY 200 OK\r\n", b"HTTP/1.0 200 OK\r\n")
    InterceptedSelf = InterceptedHTTPResponse()
    InterceptedSelf.fp = io.BufferedReader(io.BytesIO(line))
    InterceptedSelf.debuglevel = self.debuglevel
    InterceptedSelf._close_conn = self._close_conn
    return ORIGINAL_HTTP_CLIENT_READ_STATUS(InterceptedSelf)
Then put these lines at the start of your main routine, before you open the URL.
ORIGINAL_HTTP_CLIENT_READ_STATUS = urllib.request.http.client.HTTPResponse._read_status
urllib.request.http.client.HTTPResponse._read_status = NiceToICY
They will override the standard routine (this one time only) and run the NiceToICY function in place of the normal status check when it has opened the stream. NiceToICY replaces the unrecognised status response, then copies across the relevant bits of the original response which are needed by the 'real' _read_status function. Finally the original is called and the values from that are passed back to the caller and everything else continues as normal.
I have found this to be the simplest way to get round the problem of the status message causing an error. Hope it's useful for you, too.
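To put it together, here is a quick usage sketch once the override above is installed (the URL and block size are placeholders):

import urllib.request

with urllib.request.urlopen('http://your-stream-source.com/stream') as r, \
        open('stream.mp3', 'wb') as f:
    while True:
        block = r.read(1024)
        if not block:
            break
        f.write(block)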
I am aware this is a year old, but this is still a viable question, which I have recently been fiddling with.
Most internet radio stations will give you a choice of download type; I choose the MP3 version, then read the info from a raw socket and write it to a file. The trick is figuring out how fast your download is compared to playing the song, so you can strike a balance in the read/write size. This would be in your buffer def.
Now that you have the file, it is fine to simply leave it on your drive (record), but most players will delete the already-played chunk from the file and clear the file off the drive and RAM when streaming is stopped.
I have used some code snippets from a file-archive-without-compression app to handle a lot of the file handling, playing, and buffering magic. It's very similar in how the process flows. If you write up some pseudo-code (which I highly recommend) you can see the similarities.
I'm only familiar with how shoutcast streaming works (which would be the .pls file you mention):
You download the .pls file, which is just a playlist. Its format is fairly simple: it's just a text file that points to where the real stream is.
You can connect to that stream, as it's just HTTP, and it streams either MP3 or AAC. For your use, just save every byte you get to a file and you'll get an MP3 or AAC file you can transfer to your mp3 player.
Shoutcast has one optional addition: metadata. You can find how that works here, but it is not really needed.
If you want a sample application that does this, let me know and I'll make up something later.
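A rough sketch of that flow with the standard library (the playlist URL is a placeholder, and this assumes the server answers with a normal HTTP status line; if it replies with "ICY 200 OK" instead, see the workaround above):

import urllib.request

pls_url = 'http://example.com/station.pls'

# 1. download the playlist, which is just a small text file
pls_text = urllib.request.urlopen(pls_url).read().decode('utf-8', errors='replace')

# 2. pull the first stream URL out of it (entries look like "File1=http://...")
stream_url = next(line.split('=', 1)[1].strip()
                  for line in pls_text.splitlines()
                  if line.lower().startswith('file'))

# 3. connect to the stream (plain HTTP) and save every byte to a file
with urllib.request.urlopen(stream_url) as stream, open('capture.mp3', 'wb') as out:
    while True:
        chunk = stream.read(4096)
        if not chunk:
            break
        out.write(chunk)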
In line with the answer from https://stackoverflow.com/users/1543257/dingles (https://stackoverflow.com/a/41338150), here's how you can achieve the same result with the asynchronous HTTP client library - aiohttp:
import functools

import aiohttp
from aiohttp.client_proto import ResponseHandler
from aiohttp.http_parser import HttpResponseParserPy

class ICYHttpResponseParser(HttpResponseParserPy):
    def parse_message(self, lines):
        if lines[0].startswith(b"ICY "):
            lines[0] = b"HTTP/1.0 " + lines[0][4:]
        return super().parse_message(lines)

class ICYResponseHandler(ResponseHandler):
    def set_response_params(
        self,
        *,
        timer = None,
        skip_payload = False,
        read_until_eof = False,
        auto_decompress = True,
        read_timeout = None,
        read_bufsize = 2 ** 16,
        timeout_ceil_threshold = 5,
    ) -> None:
        # this is a copy of the implementation from here:
        # https://github.com/aio-libs/aiohttp/blob/v3.8.1/aiohttp/client_proto.py#L137-L165
        self._skip_payload = skip_payload
        self._read_timeout = read_timeout
        self._reschedule_timeout()
        self._timeout_ceil_threshold = timeout_ceil_threshold
        self._parser = ICYHttpResponseParser(
            self,
            self._loop,
            read_bufsize,
            timer=timer,
            payload_exception=aiohttp.ClientPayloadError,
            response_with_body=not skip_payload,
            read_until_eof=read_until_eof,
            auto_decompress=auto_decompress,
        )
        if self._tail:
            data, self._tail = self._tail, b""
            self.data_received(data)

class ICYConnector(aiohttp.TCPConnector):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._factory = functools.partial(ICYResponseHandler, loop=self._loop)
This can then be used as follows:
session = aiohttp.ClientSession(connector=ICYConnector())

async with session.get("url") as resp:
    print(resp.status)
Yes, it's using a few private classes and attributes but this is the only solution to change the handling of something that's part of HTTP spec and (theoretically) should not ever need to be changed by the library's user...
All things considered, I would say this is still rather clean in comparison to monkey patching which would cause the behavior to be changed for all requests (especially true for asyncio where setting before and resetting after a request does not guarantee that something else won't make a request while request to ICY is being made). This way, you can dedicate a ClientSession object specifically for requests to servers that respond with the ICY status line.
Note that this comes with a performance penalty for requests made with ICYConnector - in order for this to work, I am using the pure Python implementation of HttpResponseParser which is going to be slower than the one that aiohttp uses by default and is written in C. This cannot really be done differently without vendoring the whole library as the behavior for parsing status line is deeply hidden in the C code.