Corrupted SQLite Database jpegs? - python

I am trying to take BLOB data from rows in a SQLite database and save each one as a picture file; every BLOB holds a JPEG. When I loop through and create the JPEG files from the database BLOBs, I get strange, corrupted-looking files: they show some of the original image, but with a lot of distortion too. I think it has something to do with how the data is being handled from the DB. Column six contains the BLOBs; here is my code...
import sqlite3

cnt = 0
con = sqlite3.connect("C:\\my.db")
cur = con.cursor()
data = cur.execute("SELECT * FROM friends")
for item in data:
    cnt = cnt + 1
    a = open("C:\\images\\" + str(cnt) + ".jpg", 'w')
    a.write(item[6])
    a.close()
It creates an image for each row's BLOB, 902 to be exact, and the images are actually viewable, even of the correct picture, but heavy distortion/corruption has been added that really mucks up the pictures. How can I create JPEG files from each BLOB in my database using Python without the corruption?
Thanks!

Write the files in binary mode: open("C:\\images\\"+str(cnt)+".jpg", 'wb'). In text mode ('w') on Windows, Python translates every 0x0A byte into 0x0D 0x0A, which silently corrupts binary data like a JPEG.
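A corrected version of the loop, sketched as a reusable function (table name and BLOB column index as in the question):

```python
import os
import sqlite3

def dump_blobs(db_path, out_dir, table="friends", blob_index=6):
    """Write the BLOB column of every row to <out_dir>/<n>.jpg.

    Opening the output file in 'wb' is the fix: in text mode on Windows,
    every 0x0A byte in the JPEG stream is rewritten as 0x0D 0x0A, which
    produces exactly the partially viewable, distorted images described.
    """
    con = sqlite3.connect(db_path)
    cur = con.cursor()
    cnt = 0
    for row in cur.execute("SELECT * FROM {}".format(table)):
        cnt += 1
        with open(os.path.join(out_dir, str(cnt) + ".jpg"), 'wb') as out:
            out.write(row[blob_index])
    con.close()
    return cnt
```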

Related

Scrape images from excel file and store into dictionary - Python

I am trying to scrape images from an Excel file and store them in a dictionary. However, I am running into some issues with the package I am using. Below is my code.
# loading the Excel file and the sheet
import openpyxl
from openpyxl_image_loader import SheetImageLoader

pxl_doc = openpyxl.load_workbook(excel_file_path)
sheet = pxl_doc["Mud Motor Report "]
# calling the image loader
image_loader = SheetImageLoader(sheet)
# get each image by its cell
dict_new = {}
image_number = 0
for cell in image_loader._images:
    image = image_loader.get(cell)
    dict_new["image_{}".format(image_number)] = image
    # showing the image
    # image.show()
    image_number += 1
The code above works fine; however, once I get to the 7th file in my loop I hit an unusual error:
ValueError: I/O operation on closed file.
I found this issue on another thread with no solution. The error is misleading, since the file is open and can be read: Reading image from xlsx python won't work - I/O operation on closed file
I was wondering if there is another way to scrape images from an excel file and put them into a dictionary. I would later insert this dictionary into a sql database.
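One alternative that sidesteps SheetImageLoader entirely is to copy each image's raw bytes out immediately while the workbook is open, so nothing later touches a closed archive. This is a sketch relying on openpyxl's private ws._images list (an assumption, since that attribute is not public API) plus Pillow:

```python
import io

import openpyxl
from PIL import Image

def extract_images(excel_file_path, sheet_name):
    """Return {'image_0': PIL.Image, ...} for every picture on the sheet.

    Reads the raw bytes via each image's _data() method right away, so the
    resulting PIL images no longer reference the workbook's archive.
    """
    wb = openpyxl.load_workbook(excel_file_path)
    ws = wb[sheet_name]
    images = {}
    for i, img in enumerate(ws._images):  # private attribute: may change
        raw = img._data()                 # raw bytes of the embedded image
        images["image_{}".format(i)] = Image.open(io.BytesIO(raw))
    return images
```

The returned dictionary can then be inserted into a SQL database as BLOBs.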

Process millions of image files and insert each file to mongodb

I have a large collection (~800,000 images, 1TB+ total size) of image files on S3 that I use some Python code to process into a dictionary for insertion to MongoDB. The dictionary contains a buffer that is used by a command like np.frombuffer to reconstruct the image.
I need to process each file and insert it into MongoDB. So far I've tried multiprocessing the code; while this is effective, it gets slower and slower with each insert: it takes 20 minutes for 50,000 files but 5 hours for 250,000 files.
I have 2 things I'm unsure about:
Why does inserting get so much slower as the number of documents in the database increases, and how can I address that? I'm guessing that the more records you have, the more work Mongo has to do to check whether the record it's trying to insert already exists, but I'm not sure how to mitigate this.
What is the best approach to this type of problem? Another idea I had was bulk inserts after writing the processed image files locally.
Code sample below:
import io

import boto3
import cv2
import numpy as np
from PIL import Image
from pymongo import MongoClient

s3 = boto3.resource('s3')

def process_image(img_file):
    # define MongoClient and collection
    client = MongoClient(...)
    collection = client['db_name']['collection_name']
    # read image file from s3 (Image.open needs a file-like object)
    obj = s3.Object(bucket_name='test_bucket', key=img_file)
    im = Image.open(io.BytesIO(obj.get()['Body'].read()))
    # create image buffer; imencode returns a (success, buffer) pair
    # and expects a numpy array rather than a PIL image
    ok, buffer = cv2.imencode(".jpg", np.asarray(im))
    buffer = buffer.flatten().tobytes()  # usually around 100,000 bytes
    # dict to be written to mongo
    d = {}
    d['filename'] = img_file
    d['buffer'] = buffer
    # insert to mongo
    collection.insert_one(d)
### multiprocessing code
from multiprocessing import Pool

pool = Pool(processes=16)
results = pool.map(process_image, ls_filenames, chunksize=500)
pool.close()
pool.join()
ls_filenames has around 800k image paths in it.
There's unnecessary overhead creating the MongoClient each time. Create it once and reuse the connection.
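A sketch combining that fix with the bulk-insert idea from the question (assuming pymongo; the URI, database/collection names, and the build_doc helper are placeholders): give each worker its own client via a Pool initializer, and batch documents so each worker calls insert_many once per batch instead of insert_one per file.

```python
from multiprocessing import Pool

worker_collection = None  # set once per worker process by init_worker

def init_worker():
    # Runs once when each worker starts; reusing this client avoids the
    # per-call overhead of constructing a MongoClient inside process_image.
    global worker_collection
    from pymongo import MongoClient  # assumed installed
    client = MongoClient("mongodb://localhost:27017")  # placeholder URI
    worker_collection = client["db_name"]["collection_name"]

def make_batches(items, batch_size):
    # one insert_many round trip per batch instead of one insert_one per doc
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def process_batch(filenames):
    # build_doc (hypothetical) would read the file from S3 and encode the
    # buffer exactly as process_image does above
    docs = [build_doc(f) for f in filenames]
    worker_collection.insert_many(docs, ordered=False)

# usage sketch:
# pool = Pool(processes=16, initializer=init_worker)
# pool.map(process_batch, make_batches(ls_filenames, 500))
```

ordered=False lets Mongo continue past individual failures and often speeds up bulk writes.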

Problem while inserting a text file/an image into SQlite using flask:python

I'm trying to insert a text file and an image into an SQLite database, but the code I'm using doesn't work: even though no errors are displayed, nothing is stored in the BLOB image column. I have been searching the internet but couldn't find what I'm looking for.
The python code I'm using is:
data = sqlite3.Binary(f.read())
new_file = Samples(Samplename=filename,imagefile=data)
db.session.add(new_file)
db.session.commit()
Also, I don't want to save the file path; I want to save the file itself, even if that is not good practice. Kindly help. TIA

Saving Images into SQLite database

I have a program that collects some data from a website. Text data is appended to an "info" DataFrame and photo URLs are appended to a "photos" DataFrame.
I have already inserted the "info" table into my SQL database and it works really fine!
data.to_sql('Flat', con=conn, if_exists='replace', index=False)
Now I need to understand how to convert the image links to BLOB data and insert them into the database.
BLOBs are Binary Large OBjects. First you need to convert the image to a binary object.
def convertToBinaryData(imageLocation):
    # Convert digital data to binary format
    with open(imageLocation, 'rb') as file:
        blobData = file.read()
    return blobData
The rest is a simple insert; make sure you are connected. Create an insert statement and inject your binaries into it.
insert = """ INSERT INTO 'images' ('id', 'image') VALUES (?, ?) """
id = 1
image = convertToBinaryData(imageLocation)
cursor.execute(insert, (id, image))
connection.commit()
These snippets omit how to create a connection and get a cursor; a full example can be found at: https://pynative.com/python-sqlite-blob-insert-and-retrieve-digital-data/
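Since the question starts from image links rather than local files, here is a hedged sketch of the URL variant (table and column names are illustrative): download the bytes, then bind them as a BLOB parameter the same way.

```python
import sqlite3
from urllib.request import urlopen

def url_to_blob(url):
    # download the image and return its raw bytes (assumes the URL is reachable)
    with urlopen(url) as resp:
        return resp.read()

def insert_image(connection, image_id, blob):
    # bind the bytes as a parameter; sqlite3 stores bytes values as BLOBs
    connection.execute(
        "CREATE TABLE IF NOT EXISTS images (id INTEGER PRIMARY KEY, image BLOB)")
    connection.execute(
        "INSERT INTO images (id, image) VALUES (?, ?)", (image_id, blob))
    connection.commit()
```

For the "photos" DataFrame, this would be called once per URL, e.g. insert_image(conn, i, url_to_blob(url)) inside a loop over the rows.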

Python - Saving and Recovering Image/Pictures/JPEG from PostgreSQL

So, I'm trying to save an image into my PostgreSQL table in Python using psycopg2.
INSERT Query (Insert.py)
#Folder/Picture is my path/ID is generated/IDnum is increment
picopen = open("Folder/Picture."+str(ID)+"."+str(IDnum)+".jpg", 'rb').read()
filename = ("Picture."+str(ID)+"."+str(IDnum)+".jpg")
#Sample file name is this Picture.1116578795.7.jpg
#upload_name is where I want the file name, upload_content is where I store the image
SQL = "INSERT INTO tbl_upload (upload_name, upload_content) VALUES (%s, %s)"
data = (filename, psycopg2.Binary(picopen))
cur.execute(SQL, data)
conn.commit()
Now, to recover the saved image, I perform this query (recovery.py):
cur.execute("SELECT upload_content, upload_name from tbl_upload")
for row in cur:
    mypic = cur.fetchone()
    open('Folder/'+row[1], 'wb').write(str(mypic[0]))
Now what happens is that when I execute recovery.py it does generate a ".jpg" file, but I can't view or open it.
If it helps, I'm doing this with Python 2.7 on CentOS 7.
For the sake of additional information, I get this from the image viewer when I open the generated file:
Error interpreting JPEG image file (Not a JPEG file: starts with 0x5c 0x78)
I also tried using other formats as well (.png, .bmp)
I double-checked my database and apparently the upload_content datatype is text; it is supposed to be bytea. I thought I had already set it to bytea when I created my DB. Problem solved.
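With the column fixed to bytea, the recovery loop still has two bugs worth noting: mixing `for row in cur` with `fetchone()` skips every other row, and `str(...)` writes the literal `\x...` escape text the viewer complained about instead of raw bytes. A sketch of a corrected writer (assuming an open psycopg2 cursor):

```python
import os

def save_blob(folder, name, content):
    # psycopg2 returns bytea values as a buffer/memoryview; convert to
    # bytes and write in binary mode -- never str(), which produces the
    # backslash-x escape text seen in the JPEG viewer's error message
    path = os.path.join(folder, name)
    with open(path, 'wb') as out:
        out.write(bytes(content))

# recovery loop sketch:
# cur.execute("SELECT upload_content, upload_name FROM tbl_upload")
# for content, name in cur.fetchall():
#     save_blob('Folder', name, content)
```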
