Python function not writing or overwriting when threading

I have a script that takes a group of images on the local machine and sends each one to RemoveBG using threading. If the call succeeds, it writes the resulting file, uploads it to S3, and grabs the S3 URL. That URL is then passed to BannerBear to generate a composite, which returns another URL that we print to the screen and fetch so the final image can be written locally.
All the text from each step prints to the screen correctly, but somewhere along the way the local write of the final image gets skipped or overwritten by another file being processed. If I process one image at a time it works.
Code:
# Check the result of the RemoveBG API request; if OK, write the file locally and upload to S3
if response.status_code == requests.codes.ok:
    with open(
        os.path.join(OUTPUT_DIR, os.path.splitext(file)[0] + ".png"), "wb"
    ) as out:
        out.write(response.content)
    # Open a file-like object using io.BytesIO
    image_data = io.BytesIO(response.content)
    # Upload the image data to S3
    s3.upload_fileobj(image_data, bucket_name, object_key)
    # Get the URL for the uploaded image
    image_url = f"https://{bucket_name}.s3.amazonaws.com/{object_key}"
    print(f"Image uploaded to S3: {image_url}")
    # Pass the S3 URL to BannerBear
    bannerbear(image_url)

# BannerBear function
def bannerbear(photo_url):
    # Send a POST request to the endpoint URL
    responseBB = requests.post(endpoint_url, headers=headers, json=data)
    if responseBB.status_code == 200:
        # Print the URL of the generated image
        print(responseBB.json()["image_url_jpg"])
        # Write the image data to a file
        filedata = requests.get(responseBB.json()["image_url_jpg"])
        with open(
            os.path.join(OUTPUT_DIR, "bb", os.path.splitext(file)[0] + ".jpg"), "wb"
        ) as f:
            f.write(filedata.content)
        print("BannerBear File Written")
    else:
        # Something went wrong
        print(f"An error occurred: {responseBB.status_code}")
        print(responseBB.json()["message"])

# Create a thread pool with a maximum of 8 threads
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
    # Iterate through all the files in the image directory
    for file in os.listdir(IMAGE_DIR):
        # Check if the file is an image
        if file.endswith(".jpg") or file.endswith(".png"):
            # Submit the task to the thread pool
            executor.submit(process_image, file)
The console output shows every step printing in order, so it's going through all the steps.
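One thing worth noting about the snippet: inside bannerbear() the output path is built from file, which is not a parameter of the function but the loop variable at module scope. While one worker thread is still inside bannerbear(), the loop (or another worker) may have rebound file to a different image, so two threads can write to the same path or to the wrong one, which matches the skipped/overwritten files described above. A minimal sketch of a fix, passing the filename through explicitly (the source_name parameter is illustrative; endpoint_url, headers, and data come from the existing setup):

def bannerbear(photo_url, source_name):
    # endpoint_url, headers, and data as in the existing setup
    responseBB = requests.post(endpoint_url, headers=headers, json=data)
    if responseBB.status_code == 200:
        filedata = requests.get(responseBB.json()["image_url_jpg"])
        # Build the output path from the name passed in,
        # not from the shared loop variable.
        out_path = os.path.join(OUTPUT_DIR, "bb", os.path.splitext(source_name)[0] + ".jpg")
        with open(out_path, "wb") as f:
            f.write(filedata.content)

# and inside process_image(file), call it as:
#     bannerbear(image_url, file)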

Related

Unable to upload file to AWS S3 using python boto3 and upload_fileobj

I am trying to fetch a WebP image, convert it to JPG, and upload it to AWS S3 without saving the file to disk (using io.BytesIO and boto3's upload_fileobj), but with no success. The funny thing is that it works fine if I save the file to local disk first and then use boto3's upload_file method.
This works:
r = requests.get(url)
if r.status_code == 200:
    file_name = "name.jpeg"
    s3 = boto3.client("s3")
    webp_file = io.BytesIO(r.content)
    im = Image.open(webp_file).convert("RGB")
    im.save(
        f"{config.app_settings.image_tmp_dir}/{file_name}", "JPEG"
    )
    s3.upload_file(
        f"{config.app_settings.image_tmp_dir}/{file_name}",
        config.app_settings.image_S3_bucket,
        file_name,
        ExtraArgs={"ContentType": "image/jpeg"},
    )
This does not work:
r = requests.get(url)
if r.status_code == 200:
    file_name = "name.jpeg"
    s3 = boto3.client("s3")
    webp_file = io.BytesIO(r.content)
    im = Image.open(webp_file).convert("RGB")
    jpg_file = io.BytesIO()
    im.save(
        jpg_file, "JPEG"
    )
    s3.upload_fileobj(
        jpg_file,
        config.app_settings.image_S3_bucket,
        file_name,
        ExtraArgs={"ContentType": "image/jpeg"},
    )
I can see that jpg_file has the correct size after im.save, but the file that ends up in AWS S3 is empty.
After calling im.save(jpg_file, "JPEG"), the stream's position is still pointing just past the newly written image data. Anything that tries to read from jpg_file will start reading from that position and will not see the image data.
You can use the stream's seek() method to move the position back to the start of the stream before s3.upload_fileobj() tries to read it:
im.save(
    jpg_file, "JPEG"
)
# Reset the stream position back to the start of the stream.
jpg_file.seek(0)
s3.upload_fileobj(
    jpg_file,
    config.app_settings.image_S3_bucket,
    file_name,
    ExtraArgs={"ContentType": "image/jpeg"},
)
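To see why the seek matters, a quick standalone sketch with io.BytesIO alone (no S3 involved):

import io

buf = io.BytesIO()
buf.write(b"image bytes")
print(buf.tell())  # 11 -- the position sits at the end of the written data
print(buf.read())  # b'' -- reading from here finds nothing
buf.seek(0)        # rewind to the start
print(buf.read())  # b'image bytes' -- now the data is visible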

Exploring an image (.jpg) file containing 2 pages of a textbook and extracting the text to save in a .txt file (both files in a local folder)

I am using MS Azure credentials for Computer Vision to access an image file, extract the text from it, and finally save it in a .txt file. The code works fine with a URL that has a .jpg extension. It gives errors with:
image files with a .jpg extension saved in a local folder;
image files from the web whose URL does not end in .jpg.
My code is below:
'''url of the remote (web) Image File'''
#remote_image_url = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-sample-data-files/master/ComputerVision/Images/landmark.jpg"
#imagefile = "<filepath>//IMAGE1.JPG"
remote_image_url = "http://site.meishij.net/r/58/25/3568808/a3568808_142682562777944.jpg"

## Saving a url image to local folder as jpg
import requests
pic_url = "http://site.meishij.net/r/58/25/3568808/a3568808_142682562777944.jpg"
#pic_url = "https://wallup.net/new-york-city-manhattan-nyc-usa-new-york-manhattan-usa-city-type-height-panorama-night-pink-sunset-blue-sky-clouds-lights-light-house-building-skyscraper-skyscrapers-5/"
with open('C://Users//ubana//OneDrive//ANIL JOSHI//PROJECTS//CONVERSION TO TXT FILE//IMAGES//pic1.jpg', 'wb') as handle:
    response = requests.get(pic_url, stream=True)
    if not response.ok:
        print(response)
    for block in response.iter_content(1024):
        if not block:
            break
        handle.write(block)

'''
Describe an Image - remote
This example describes the contents of an image with the confidence score.
'''
print("===== Describe an image - remote =====")
# Call API
description_results = computervision_client.describe_image(remote_image_url)

# Get the captions (descriptions) from the response, with confidence level
print("Description of remote image: ")
if (len(description_results.captions) == 0):
    print("No description detected.")
else:
    for caption in description_results.captions:
        print("'{}' with confidence {:.2f}%".format(caption.text, caption.confidence * 100))
The error is the following:
ComputerVisionErrorException              Traceback (most recent call last)
<ipython-input> in <module>
      5 print("===== Describe an image - remote =====")
      6 # Call API
----> 7 description_results = computervision_client.describe_image(imgfile)
      8
      9 # Get the captions (descriptions) from the response, with confidence level

~\Anaconda3\lib\site-packages\azure\cognitiveservices\vision\computervision\operations\_computer_vision_client_operations.py in describe_image(self, url, max_candidates, language, description_exclude, custom_headers, raw, operation_config)
    201
    202         if response.status_code not in [200]:
--> 203             raise models.ComputerVisionErrorException(self._deserialize, response)
    204
    205         deserialized = None

ComputerVisionErrorException: Image URL is badly formatted.
I would appreciate it if anyone could help me with this issue.
Salil Ray
It seems that the file path in
with open('C://Users//ubana//OneDrive//ANIL JOSHI//PROJECTS//CONVERSION TO TXT FILE//IMAGES//pic1.jpg', 'wb') as handle:
is incorrect. Maybe try using double backslashes rather than doubled forward slashes in the file path, like this:
with open('C:\\Users\\ubana\\OneDrive\\ANIL JOSHI\\PROJECTS\\CONVERSION TO TXT FILE\\IMAGES\\pic1.jpg', 'wb') as handle:
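Note that fixing the path only changes where the downloaded picture is saved. The "Image URL is badly formatted" error comes from passing a local file path to describe_image(), which expects a publicly reachable URL. For local files the SDK provides describe_image_in_stream(), which takes an open file object instead. A minimal sketch, assuming computervision_client is already authenticated and imagefile holds the local path (as in the commented-out line above):

# Sketch: describe a local image by streaming its bytes instead of passing a URL
with open(imagefile, "rb") as image_stream:
    description_results = computervision_client.describe_image_in_stream(image_stream)

if len(description_results.captions) == 0:
    print("No description detected.")
else:
    for caption in description_results.captions:
        print("'{}' with confidence {:.2f}%".format(caption.text, caption.confidence * 100))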

App Engine - download files from Cloud Storage

I am using Python 2.7 and Reportlab to create .pdf files for display/print in my app engine system. I am using ndb.Model to store the data if that matters.
I am able to produce the equivalent of a bank statement for a single client online. That is, the user clicks the on-screen 'pdf' button and the .pdf statement appears on screen in a new tab, exactly as it should.
I am using the following code to save .pdf files to Google Cloud Storage successfully:
buffer = StringIO.StringIO()
self.p = canvas.Canvas(buffer, pagesize=portrait(A4))
self.p.setLineWidth(0.5)
try:
    # create .pdf of .csv data here
    pass
finally:
    self.p.save()
pdfout = buffer.getvalue()
buffer.close()

filename = getgcsbucket() + '/InvestorStatement.pdf'
write_retry_params = gcs.RetryParams(backoff_factor=1.1)
try:
    gcs_file = gcs.open(filename,
                        'w',
                        content_type='application/pdf',
                        retry_params=write_retry_params)
    gcs_file.write(pdfout)
except:
    logging.error(traceback.format_exc())
finally:
    gcs_file.close()
I am using the following code to create a list of all the files for on-screen display; it shows all the files stored above.
allfiles = []
bucket_name = getgcsbucket()
rfiles = gcs.listbucket(bucket_name)
for rfile in rfiles:
    allfiles.append(rfile.filename)
return allfiles
My screen (HTML) shows rows of ([Delete] and Filename). When the user clicks the [Delete] button, the following delete snippet works (filename is the complete /bucket/filename):
filename = self.request.get('filename')
try:
    gcs.delete(filename)
except gcs.NotFoundError:
    pass
My question: given I have a list of files on-screen, I want the user to click on a filename and have that file downloaded to their computer. In Google's Chrome browser, this would result in the file being downloaded, with its name displayed at the bottom left of the screen.
One other point: the above example is for .pdf files. I will also have to show .csv files in the list and would like them to be downloaded as well. I only want the files to be downloaded; no display is required.
So, I would like a snippet like ...
filename = self.request.get('filename')
try:
    gcs.downloadtousercomputer(filename)  # ???
except gcs.NotFoundError:
    pass
I think I have tried everything I can find both here and elsewhere. Sorry I have been so long-winded. Any hints for me?
To download a file instead of showing it in the browser, you need to add a header to your response:
self.response.headers["Content-Disposition"] = 'attachment; filename="%s"' % filename
You can specify the filename as shown above and it works for any file type.
One solution you can try is to read the file from the bucket and write its contents as the response, with the correct header:
import cloudstorage
...
def read_file(self, filename):
    bucket_name = "/your_bucket_name"
    file = bucket_name + '/' + filename
    with cloudstorage.open(file) as cloudstorage_file:
        self.response.headers["Content-Disposition"] = str('attachment; filename=' + filename)
        contents = cloudstorage_file.read()
    self.response.write(contents)
Here filename could be something you send as a GET parameter; it needs to be a file that exists in your bucket, or an exception will be raised.
You will find a sample here: https://cloud.google.com/appengine/docs/standard/python/googlecloudstorageclient/read-write-to-cloud-storage
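Since the on-screen list mixes .pdf and .csv files, one variant worth sketching uses Python's standard mimetypes module to guess the Content-Type from the filename (the handler shape follows the sample above; download_file is an illustrative name):

import mimetypes
import cloudstorage

def download_file(self, filename):
    bucket_name = "/your_bucket_name"
    filepath = bucket_name + '/' + filename
    # Guess a sensible Content-Type from the extension (.pdf, .csv, ...)
    content_type = mimetypes.guess_type(filename)[0] or 'application/octet-stream'
    with cloudstorage.open(filepath) as gcs_file:
        self.response.headers["Content-Type"] = str(content_type)
        self.response.headers["Content-Disposition"] = str('attachment; filename="%s"' % filename)
        self.response.write(gcs_file.read())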

Python flask ajax save decoded base64 image to server temporarily

I am resizing images client-side before sending them to my Flask app.
The resized image, which is drawn into a canvas to be resized, is sent via a POST request.
In my app the image is decoded via base64:
def resize_image(item):
    content = item.split(';')[1]
    image_encoded = content.split(',')[1]
    body = base64.decodestring(image_encoded.encode('utf-8'))
    return body
The image data is stored as a string in the body variable. I can save the data to my local machine and it works:
filename = 'some_image.jpg'
with open(filename, 'wb') as f:
    print "written"
    f.write(body)
What I need is to upload the resized image to AWS S3. At one point I need to read() the image contents, but since the image is never saved as a file it is still a string, so this fails:
file_data = request.values['a']
imagedata = resize_image(file_data)
s3 = boto.connect_s3(app.config['MY_AWS_ID'], app.config['MY_AWS_SECRET'], host='s3.eu-central-1.amazonaws.com')
bucket_name = 'my_bucket'
bucket = s3.get_bucket(bucket_name)
k = Key(bucket)
# fails here
file_contents = imagedata.read()
k.key = "my_images/" + "test.png"
k.set_contents_from_string(file_contents)
Unless there is another solution, I thought I could save the image temporarily to my server (Heroku), upload it, and then delete it. How would this work? Deleting afterwards is important here!
set_contents_from_string takes a string as a parameter, so you could probably just pass your image string data directly to it for upload to S3.
Solution:
Delete this part:
file_contents = imagedata.read()
Use imagedata directly here:
k.set_contents_from_string(imagedata)
If you need to call .read() on your data but don't need to save a file to disk, use StringIO:
import StringIO
output = StringIO.StringIO()
output.write('decoded image')
output.seek(0)
output.read()
Out[1]: 'decoded image'
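For completeness, boto's set_contents_from_string also accepts a headers dict, so the upload can carry a content type; a sketch continuing the question's code (the Content-Type value is an assumption for a PNG):

# Pass the decoded string straight through; no temporary file or .read() needed
k.key = "my_images/" + "test.png"
k.set_contents_from_string(imagedata, headers={'Content-Type': 'image/png'})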

updating file with Google Drive API with mimeType html causes margin increase

I am trying to download a file using the Google Drive API (in Python), edit some values, save the file locally, and then upload that local file as an update. I am getting the HTML version of the file and uploading it with mimeType = "text/html". The edits and uploads work, except that the margins or line spacing on headings (h2 & h3) increase slightly each time the script is run.
I have tried putting the content directly into the local file after the download, without editing it, and the same thing happens (see code below). Has anyone any ideas as to what might be causing this?
KB
# get the google drive api object
service = get_service()

# search for file
params = {}
params["q"] = "title = 'Test File'"
values = service.files().list(**params).execute()

# get file
found_file = values['items'][0]

# download file content
content = download_file(service, found_file, "text/html")
f = open('temp.html', 'wb')
f.write(content)
f.close()

file = service.files().get(fileId=found_file['id']).execute()

# set the new values
media_body = MediaFileUpload("temp.html", mimetype="text/html",
                             resumable=True)
try:
    # update the file
    updated_file = service.files().update(fileId=found_file['id'], body=file,
                                          media_body=media_body).execute()
except:
    print("error uploading file")
