Truncated file header while using multiprocessing

Truncated file header while using multiprocessing - python

When I run the line:
def book_processing(pair, pool_length):
p = Pool(len(pool_length)*3)
temp_parameters = partial(book_call_mprocess, pair)
p.map_async(temp_parameters, pool_length).get(999999)
p.close()
p.join()
return exchange_books
I get the following error:
Traceback (most recent call last):
File "test_code.py", line 214, in <module>
current_books = book_call.book_processing(cp, book_list)
File "/home/user/Desktop/book_call.py", line 155, in book_processing
p.map_async(temp_parameters, pool_length).get(999999)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
raise self._value
zipfile.BadZipfile: Truncated file header
I feel as though there is some resource that is being used that didn't close during the last loop, but I am not sure how to close it (still learning about multiprocessing library). This error only occurs when my code repeats this section relatively quickly (within the same minute). This does not happen often, but is clear when it does.
Edit (adding the book_call code):
def book_call_mprocess(currency_pair, ex_list):
polo_error = 0
live_error = 0
kraken_error = 0
gdax_error = 0
ex_list = set([ex_list])
ex_Polo = 'Polo'
ex_Live = 'Live'
ex_GDAX = 'GDAX'
ex_Kraken = 'Kraken'
cp_polo = 'BTC_ETH'
cp_kraken = 'XETHXXBT'
cp_live = 'ETH/BTC'
cp_GDAX = 'ETH-BTC'
# Instances
polo_instance = poloapi.poloniex(polo_key, polo_secret)
fookraken = krakenapi.API(kraken_key, kraken_secret)
publicClient = GDAX.PublicClient()
flag = False
while not flag:
flag = False
err = False
# Polo Book
try:
if ex_Polo in ex_list:
polo_books = polo_instance.returnOrderBook(cp_polo)
exchange_books['Polo'] = polo_books
except:
err = True
polo_error = 1
# Livecoin
try:
if ex_Live in ex_list:
method = "/exchange/order_book"
live_books = OrderedDict([('currencyPair', cp_live)])
encoded_data = urllib.urlencode(live_books)
sign = hmac.new(live_secret, msg=encoded_data, digestmod=hashlib.sha256).hexdigest().upper()
headers = {"Api-key": live_key, "Sign": sign}
conn = httplib.HTTPSConnection(server)
conn.request("GET", method + '?' + encoded_data, '', headers)
response = conn.getresponse()
live_books = json.load(response)
conn.close()
exchange_books['Live'] = live_books
except:
err = True
live_error = 1
# Kraken
try:
if ex_Kraken in ex_list:
kraken_books = fookraken.query_public('Depth', {'pair': cp_kraken})
exchange_books['Kraken'] = kraken_books
except:
err = True
kraken_error = 1
# GDAX books
try:
if ex_GDAX in ex_list:
gdax_books = publicClient.getProductOrderBook(level=2, product=cp_GDAX)
exchange_books['GDAX'] = gdax_books
except:
err = True
gdax_error = 1
flag = True
if err:
flag = False
err = False
error_list = ['Polo', polo_error, 'Live', live_error, 'Kraken', kraken_error, 'GDAX', gdax_error]
print_to_excel('excel/error_handler.xlsx', 'Book Call Errors', error_list)
print "Holding..."
time.sleep(30)
return exchange_books
def print_to_excel(workbook, worksheet, data_list):
ts = str(datetime.datetime.now()).split('.')[0]
data_list = [ts] + data_list
wb = load_workbook(workbook)
if worksheet == 'active':
ws = wb.active
else:
ws = wb[worksheet]
ws.append(data_list)
wb.save(workbook)

The problem lies in the function print_to_excel
And more specifically in here:
wb = load_workbook(workbook)
If two processes are running this function at the same time, you'll run into the following race condition:
Process 1 wants to open error_handler.xlsx, since it doesn't exist it creates an empty file
Process 2 wants to open error_handler.xlsx, it does exist, so it tries to read it, but it is still empty. Since the xlsx format is just a zip file consisting of a bunch of XML files, the process expects a valid ZIP header which it doesn't find and it omits zipfile.BadZipfile: Truncated file header
What looks strange though is your error message as in the call stack I would have expected to see print_to_excel and load_workbook.
Anyway, Since you confirmed that the problem really is in the XLSX handling you can either
generate a new filename via tempfile for every process
use locking to ensure that only one process runs print_to_excel at a time

Related

Exception: Failed to process waveform

Error:
Traceback (most recent call last):
File "c:\Programming\New_assistant\speech_to_text.py", line 18, in <module>
if rec.AcceptWaveform(data):
File "C:\Users\david\AppData\Local\Programs\Python\Python310\lib\site-packages\vosk\__init__.py", line 84, in AcceptWaveform
raise Exception("Failed to process waveform")
Exception: Failed to process waveform
PS C:\Programming\New_assistant>
I get this error when I try to use AcceptWaveform regardless of the file (wav) or the rest of the code, but the error is removed only when using vosk-model-small-ru-0.22, and does not give errors on vosk-model-ru-0.22, but the processing time is too long.
Code:
from vosk import Model, KaldiRecognizer
import json
import wave
model = Model(r"File\vosk-model-small-ru-0.22")
wf = wave.open(r"File\record1.wav", "rb")
rec = KaldiRecognizer(model, 8000)
result = ''
last_n = False
while True:
data = wf.readframes(8000)
if len(data) == 0:
break
if rec.AcceptWaveform(data):
res = json.loads(rec.Result())
if res['text'] != '':
result += f" {res['text']}"
last_n = False
elif not last_n:
result += '\n'
last_n = True
res = json.loads(rec.FinalResult())
result += f" {res['text']}"
print(result)

Using the poke method, I found a solution and got an assumption about the error occurring, so if I'm wrong or you have a more complete solution, add it and I'll mark it.
If you copied the sample code using vosk, then most likely this is the version with the module vosk-model-ru-0.22 which works with the sampling rate 8000,but vosk-model-small-ru-0.22 works with 44100, so just change 8000 to 44100 (depending on the entry)

Multiprocessing with text scraping

I want to scrape <p> from pages and since there will be a couple thousands of them I want to use multiprocessing. However, it doesn't work when I try to append the result to some variable
I want to append the result of scraping to the data = []
I made a url_common for a base website since some pages don't start with HTTP etc.
from tqdm import tqdm
import faster_than_requests as requests #20% faster on average in my case than urllib.request
import bs4 as bs
def scrape(link, data):
for i in tqdm(link):
if i[:3] !='htt':
url_common = 'https://www.common_url.com/'
else:
url_common = ''
try:
ht = requests.get2str(url_common + str(i))
except:
pass
parsed = bs.BeautifulSoup(ht,'lxml')
paragraphs = parsed.find_all('p')
for p in paragraphs:
data.append(p.text)
Above doesn't work, since map() doesn't accept function like above
I tried to use it another way:
def scrape(link):
for i in tqdm(link):
if i[:3] !='htt':
url_common = 'https://www.common_url.com/'
else:
url_common = ''
try:
ht = requests.get2str(url_common + str(i))
except:
pass
parsed = bs.BeautifulSoup(ht,'lxml')
paragraphs = parsed.find_all('p')
for p in paragraphs:
print(p.text)
from multiprocessing import Pool
p = Pool(10)
links = ['link', 'other_link', 'another_link']
data = p.map(scrape, links)
I get this error while using above function:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 297, in _bootstrap
self.run()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 110, in worker
task = get()
File "C:\ProgramData\Anaconda3\lib\multiprocessing\queues.py", line 354, in get
return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'scrape' on <module '__main__' (built-in)>
I have not figured a way to do it so that it uses Pool and at the same time appending the result of scraping to the given variable
EDIT
I change a little bit to see where it stops:
def scrape(link):
for i in tqdm(link):
if i[:3] !='htt':
url_common = 'https://www.investing.com/'
else:
url_common = ''
try: #tries are always halpful with url as you never know
ht = requests.get2str(url_common + str(i))
except:
pass
print('works1')
parsed = bs.BeautifulSoup(ht,'lxml')
paragraphs = parsed.find_all('p')
print('works2')
for p in paragraphs:
print(p.text)
links = ['link', 'other_link', 'another_link']
scrape(links)
#WORKS PROPERLY AND PRINTS EVERYTHING
if __name__ == '__main__':
p = Pool(5)
print(p.map(scrape, links))
#DOESN'T WORK, NOTHING PRINTS. Error like above

You are using the map function incorrectly.
It iterates over each element of the iterable and calls the function on each element.
You can see the map function as doing something like the following:
to_be_mapped = [1, 2, 3]
mapped = []
def mapping(x): # <-- note that the mapping accepts a single value
return x**2
for item in to_be_mapped:
res = mapping(item)
mapped.append(res)
So to solve your problem remove the outermost for-loop as iterating is handled by the map function
def scrape(link):
if link[:3] !='htt':
url_common = 'https://www.common_url.com/'
else:
url_common = ''
try:
ht = requests.get2str(url_common + str(link))
except:
pass
parsed = bs.BeautifulSoup(ht,'lxml')
paragraphs = parsed.find_all('p')
for p in paragraphs:
print(p.text)

KeyError in Python Code even though Key is present

I am trying to create a Channel Separator code to separate the transcribe that is printed in a JSON file.
I have the following code:
import json
import boto3
def lambda_handler(event, context):
if event:
s3 = boto3.client("s3")
s3_object = event["Records"][0]["s3"]
bucket_name = s3_object["bucket"]["name"]
file_name = s3_object["object"]["key"]
file_obj = s3.get_object(Bucket=bucket_name, Key=file_name)
transcript_result = json.loads(file_obj["Body"].read())
segmented_transcript = transcript_result["results"]["channel_labels"]
items = transcript_result["results"]["items"]
channel_text = []
flag = False
channel_json = {}
for no_of_channel in range (segmented_transcript["number_of_channels"]):
for word in items:
for cha in segmented_transcript["channels"]:
if cha["channel_label"] == "ch_"+str(no_of_channel):
end_time = cha["end_time"]
if "start_time" in word:
if cha["items"]:
for cha_item in cha["items"]:
if word["end_time"] == cha_item["end_time"] and word["start_time"] == cha_item["start_time"]:
channel_text.append(word["alternatives"][0]["content"])
flag = True
elif word["type"] == "punctuation":
if flag and channel_text:
temp = channel_text[-1]
temp += word["alternatives"][0]["content"]
channel_text[-1] = temp
flag = False
break
channel_json["ch_"+str(no_of_channel)] = ' '.join(channel_text)
channel_text = []
print(channel_json)
s3.put_object(Bucket="aws-speaker-separation", Key=file_name, Body=json.dumps(channel_json))
return{
'statusCode': 200,
'body': json.dumps('Channel transcript separated successfully!')
}
However, when I run it, I get an error on line 23 saying:
[ERROR] KeyError: 'end_time'
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 23, in lambda_handler
end_time = cha["end_time"]
I am confused as to why this error is happening as in my JSON code, the things to read are as follows:
JSON Code Parameters
Any ideas why this error is appearing?

cha is a channel, the end_time is a layer deeper in the items of your channel. To access the items of your channel do:
for item in cha["items"]:
print(item["end_time"])

Python zlib error while bruteforcing protected zip file

Basically, i'm trying to write a script to bruteforce a protected zip file in python that tries every character combination (i.e. aa,ba,ca,da etc). But after a few tries it retrieves a strange error, and i'm not being able to find nowhere a solution for it.
Program:
import zipfile as z
class zipVictim:
def __init__(self,file):
self.found = False
self.password = ''
self.file = file
self.extracted_file_list = []
def bruteforceAttack(self,start_char: int,end_char: int,length: int,deep_loop=False,print_mode=False):
"""
Doc
"""
def _loop_chain(self,file,start_char,end_char,length):
lList = []
sPass = ''
iAttempt = 1
for iInc in range(length):
lList.append(start_char)
while lList[len(lList)-1] < end_char:
for iInc2 in range(start_char,end_char):
for iInc3 in range(len(lList)):
sPass = sPass + chr(lList[iInc3])
if iInc3 == 0:
lList[iInc3] = lList[iInc3] + 1
elif lList[iInc3 -1] > end_char:
lList[iInc3] = lList[iInc3] + 1
lList[iInc3 -1] = start_char
try:
if print_mode:
print("Attempt %s, password: %s" % (iAttempt,sPass),end='\r')
iAttempt = iAttempt + 1
oFile.extractall(pwd=sPass.encode())
self.extracted_file_list = oFile.namelist()
self.password = sPass
self.found = True
return self.found
except RuntimeError:
pass
sPass = ''
return self.found
oFile = self.file
if not deep_loop:
_loop_chain(self,oFile,start_char,end_char,length)
else:
for iInc in range(length):
_loop_chain(self,oFile,start_char,end_char,iInc+1)
if __name__ == '__main__':
file = z.ZipFile('data.zip')
s = zipVictim(file)
s.bruteforceAttack(64,125,2,print_mode=True)
Error Retrieved:
Traceback (most recent call last):
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\zipfile.py", line 925, in _read1
data = self._decompressor.decompress(data, n)
zlib.error: Error -3 while decompressing data: invalid block type
Does anybody know what trigger this error and how to solve it?

Permissions, Python Script

I am learning to write Python Scripts for work and I have run into some problems. The script is supposed to read a file, and print the permissions to an email. My problem is I am getting an error when it tries to call the permission() method, and I don't know how to fix it.
Python Code
import smtplib
import os
import stat
result = ""
def permission(file):
s = os.stat(file)
mode = s.st_mode
if(stat.S_IRUSR & mode):
ownerRead = 1
result += ownerRead
else:
ownerRead = 0
result += ownerRead
if(stat.S_IWUSR & mode):
ownerWrite = 1
result += ownerWrite
else:
ownerWrite = 0
result += ownerWrite
if(stat.S_IXUSR & mode):
ownerExecute = 1
result += ownerExecute
else:
ownerExecute = 0
result += ownerExecute
if(stat.S_IRGRP & mode):
groupRead = 1
result += groupRead
else:
groupRead = 0
result += groupRead
if(stat.S_IWGRP & mode):
groupWrite = 1
result += groupWrite
else:
groupWrite = 0
result += groupWrite
if(stat.S_IXGRP & mode):
groupExecute = 1
result += groupExecute
else:
groupExecute = 0
result += groupExecute
if(stat.S_IROTH & mode):
otherRead = 1
result += otherRead
else:
otherRead = 0
result += otherRead
if(stat.S_IWOTH & mode):
otherWrite = 1
result += otherWrite
else:
otherWrite = 0
result += otherWrite
if(stat.S_IXOTH & mode):
otherExecute = 1
result += otherExecute
else:
otherExecute = 0
result += otherExecute
return result
to = 'email#yahoo.com'
gmail_user = 'email#gmail.com'
gmail_pwd = 'pwd'
smtpserver = smtplib.SMTP("smtp.gmail.com",587)
smtpserver.ehlo()
smtpserver.starttls()
smtpserver.ehlo
smtpserver.login(gmail_user, gmail_pwd)
header = 'To:' + to + '\n' + 'From: ' + gmail_user + '\n' + 'Subject:permissions \n'
print header
values = permission(file)
print values
msg = header + values
smtpserver.sendmail(gmail_user, to, msg)
print 'done!'
smtpserver.close()
Error Output
Traceback (most recent call last):
File "lastpart.py", line 83, in <module>
values = permission(file)
File "lastpart.py", line 15, in permission
s = os.stat(file)
TypeError: coercing to Unicode: need string or buffer, type found

You fix it by passing the actual filename to the function, not the file built-in type.

>>> file
<type 'file'>
Name your variable something other than file, and actually pass a filename to it. You never actually define the thing you're passing to your call of the function, and thus the only reason it's not crashing with an undefined variable error is because Python happens to already define something with that name, built-in.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Truncated file header while using multiprocessing - python

Related

Exception: Failed to process waveform

Multiprocessing with text scraping

KeyError in Python Code even though Key is present

Python zlib error while bruteforcing protected zip file

Permissions, Python Script

Categories

Resources