Exporting image generated by a camera in Python

I am navigating the SDK provided by a microscope camera supplier and am having trouble understanding how the image is processed. For those interested, the camera is this one: http://www.touptek.com/product/showproduct.php?id=285&lang=en.
In the only Python example provided by the supplier, here is how the image is generated and displayed in a Qt GUI:
First, the script finds the camera using the uvcham.py file provided in the SDK.
a = uvcham.Uvcham.enum()
self.hcam = uvcham.Uvcham.open(a[0].id)
Then, the script extracts the resolution, width, height, and bufsize (I don't know what this is):
self.hcam.put(uvcham.UVCHAM_FORMAT, 2) # format: RGB888
res = self.hcam.get(uvcham.UVCHAM_RES)
self.w = self.hcam.get(uvcham.UVCHAM_WIDTH | res)
self.h = self.hcam.get(uvcham.UVCHAM_HEIGHT | res)
bufsize = ((self.w * 24 + 31) // 32 * 4) * self.h
self.buf = bytes(bufsize)
self.hcam.start(self.buf, self.cameraCallback, self)
(see the following chunk of code for self.cameraCallback)
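As an aside, bufsize seems to be the size in bytes of a single RGB888 frame: each row of w pixels at 24 bits per pixel is rounded up to a 4-byte boundary (the row stride), and that stride is multiplied by the height. A quick sanity check of the arithmetic with a made-up resolution:
w, h = 1920, 1080                 # example resolution, purely for illustration
stride = (w * 24 + 31) // 32 * 4  # bytes per row, rounded up to a multiple of 4
bufsize = stride * h              # bytes needed for one full frame
print(stride, bufsize)            # 5760 6220800, i.e. 3 bytes per pixel and no padding here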
It then emits the image (eventImage is a pyqtSignal())
@staticmethod
def cameraCallback(nEvent, ctx):
    if nEvent == uvcham.UVCHAM_EVENT_IMAGE:
        ctx.eventImage.emit()
and lastly it displays the image in the Qt GUI using the following code:
@pyqtSlot()
def eventImageSignal(self):
    if self.hcam is not None:
        self.total += 1
        self.setWindowTitle('{}: {}'.format(self.camname, self.total))
        img = QImage(self.buf, self.w, self.h, (self.w * 24 + 31) // 32 * 4, QImage.Format_RGB888)
        self.label.setPixmap(QPixmap.fromImage(img))
So, now my question is: what if I want to save a video? How can I adapt the different parts of this script to properly store the multiple frames generated by the camera, which are displayed here in a Qt GUI, and later turn those frames into a video?
I know how to do such operations with OpenCV, but here it's different, I think: the image is generated from its buf, width, height, and some calculations that I don't understand.
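One possible direction (untested, and not from the supplier's documentation): since buf appears to be raw RGB888 rows padded to a 4-byte boundary, it could be reinterpreted as a numpy array inside the Qt slot and handed to cv2.VideoWriter. A minimal sketch of that idea, with illustrative names:
import numpy as np
import cv2

def buffer_to_bgr(buf, w, h):
    """Reinterpret the raw RGB888 buffer filled by the SDK as an OpenCV BGR frame.

    Assumes the same row padding as the bufsize calculation above."""
    stride = (w * 24 + 31) // 32 * 4                  # padded bytes per row
    rows = np.frombuffer(buf, dtype=np.uint8).reshape(h, stride)
    rgb = rows[:, :w * 3].reshape(h, w, 3)            # drop the padding, split rows into pixels
    return cv2.cvtColor(rgb, cv2.COLOR_RGB2BGR)       # OpenCV writers expect BGR order

# Hypothetical usage inside eventImageSignal (the writer would be created once,
# e.g. in __init__, with the camera's real frame rate, and released when recording stops):
# writer = cv2.VideoWriter("out.avi", cv2.VideoWriter_fourcc(*"XVID"), 30.0, (self.w, self.h))
# writer.write(buffer_to_bgr(self.buf, self.w, self.h))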
I tried using OpenCV directly to handle this camera using the following classical function:
video = cv2.VideoCapture(1+cv2.CAP_ANY)
and the problem is that the camera is not stable when handled that way through OpenCV and DirectShow (a few frames sometimes display an X/Y offset and/or color issues). In contrast, the camera, and the images it produces, are very stable when using the method described in the first part of my post (with the Qt GUI).
Has anyone here ever worked with this way of generating images from a camera using resolution, width, height, and buf, and could help me navigate it? My final objective is to record videos with this camera through an automated method (meaning through lines of code, hence my need to understand these lines rather than using the manual software provided by the supplier).
Thank you in advance for your help

Related

Obtaining the image iterations before final image has been generated StableDiffusionPipeline.pretrained

I am currently using the diffusers StableDiffusionPipeline (from hugging face) to generate AI images with a discord bot which I use with my friends. I was wondering if it was possible to get a preview of the image being generated before it is finished?
For example, if an image takes 20 seconds to generate, since it is using diffusion it starts off blurry and gradually gets better. What I want is to save the image on each iteration (or every few seconds) and see how it progresses. How would I be able to do this?
import os
import time

import torch
from torch import autocast
from diffusers import StableDiffusionPipeline

class ImageGenerator:
    def __init__(self, socket_listener, pretty_logger, prisma):
        self.model = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16, use_auth_token=os.environ.get("HF_AUTH_TOKEN"))
        self.model = self.model.to("cuda")

    async def generate_image(self, data):
        start_time = time.time()
        with autocast("cuda"):
            image = self.model(data.description, height=self.default_height, width=self.default_width,
                               num_inference_steps=self.default_inference_steps, guidance_scale=self.default_guidance_scale)
        image.save(...)
The code I have currently is this, however it only returns the image when it is completely done. I have tried to look into how the image is generated inside of the StableDiffusionPipeline but I cannot find anywhere where the image is generated. If anybody could provide any pointers/tips on where I can begin that would be very helpful.
You can use the callback argument of the stable diffusion pipeline to get the latent space representation of the image: link to documentation
The implementation shows how the latents are converted back to an image. We just have to copy that code and decode the latents.
Here is a small example that saves the generated image every 5 steps:
from diffusers import StableDiffusionPipeline
import torch

# load model
model = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16, use_auth_token="YOUR TOKEN HERE")
model = model.to("cuda")

def callback(iter, t, latents):
    # convert latents to image
    with torch.no_grad():
        latents = 1 / 0.18215 * latents
        image = model.vae.decode(latents).sample
        image = (image / 2 + 0.5).clamp(0, 1)

        # we always cast to float32 as this does not cause significant overhead and is compatible with bfloat16
        image = image.cpu().permute(0, 2, 3, 1).float().numpy()

        # convert to PIL Images
        image = model.numpy_to_pil(image)

        # do something with the Images
        for i, img in enumerate(image):
            img.save(f"iter_{iter}_img{i}.png")

# generate image (note the `callback` and `callback_steps` arguments)
image = model("tree", callback=callback, callback_steps=5)
To understand the stable diffusion model I highly recommend this blog post.

Fast screenshot of a small part of the screen in Python

I am currently working on a project where I need to take a 30x40 pixels screenshot from a specific area of my screen. This is not very hard to do as there are plenty of methods that do that.
The issue I have is that I need to take about 10 to 15 screenshots/second of the size I mentioned. When I looked at some of these methods that capture the screen, I have seen that when you give them parameters for a smaller selection, there's cropping involved. So a full screenshot is being taken, then the method crops it to the given size. That seems like a waste of resources if I'm only going to use 30x40 image, especially considering I will take thousands of screenshots.
So my question is: is there a method that ONLY captures a part of the screen, without capturing the whole screen and then cutting the desired section out of the big screenshot? I'm currently using this command:
im = pyautogui.screenshot(region=(0, 0, 30, 40))
The Python mss module (https://github.com/BoboTiG/python-mss, https://python-mss.readthedocs.io/examples.html), an ultra-fast, cross-platform, pure-Python screenshot module built on ctypes (MSS stands for Multiple Screen Shots), is what you are looking for. The screenshots are fast enough to capture frames from a video, and the smaller the part of the screen to grab, the faster the capture (so there is apparently no cropping involved). mss.mss().grab() outperforms PIL.ImageGrab.grab() by far. Check it out. Below is a code example showing how to get the data of the screenshot pixels (which allows you to detect changes):
import mss
from time import perf_counter as T

left = 0
right = 2
top = 0
btm = 2

with mss.mss() as sct:
    # parameter for sct.grab() can be:
    monitor = sct.monitors[1]        # entire screen
    bbox = (left, top, right, btm)   # screen part to capture

    sT = T()
    sct_im = sct.grab(bbox)          # type: <class 'mss.screenshot.ScreenShot'>
    eT = T(); print(" >", eT - sT)   # > 0.0003100260073551908

    print(len(sct_im.raw), sct_im.raw)
    # 16 bytearray(b'-12\xff\x02DU\xff-12\xff"S_\xff')
    print(len(sct_im.rgb), sct_im.rgb)
    # 12 b'21-UD\x0221-_S"'
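For the specific case in the question (a 30x40 region, 10 to 15 grabs per second), a minimal loop might look like the sketch below; the region coordinates are placeholders:
import time
import mss

with mss.mss() as sct:
    region = {"left": 0, "top": 0, "width": 30, "height": 40}  # placeholder coordinates
    while True:
        shot = sct.grab(region)   # only this region is captured
        # shot.rgb holds the raw RGB bytes; save or compare them here
        time.sleep(1 / 15)        # roughly 15 grabs per second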

PIL Gif generation not working as expected

What I'm trying to do: Since I'm still quite new to image generation using the PIL library, I decided to experiment with putting images on top of gifs. There were not a lot of proper tutorials or references I could use.
What's going wrong: More often than not, the gif would not be generated. This would give the error IndexError: bytearray index out of range which I'm not sure how to fix. However, sometimes the gif would be generated, but there would be some errors. I have included some of these gifs below.
The code:
@client.command()
async def salt(ctx, user: discord.Member = None):
    if user is None:
        user = ctx.author
    animated_gif = Image.open("salty.gif")
    response = requests.get(user.avatar_url)
    background = Image.open(BytesIO(response.content))

    all_frames = []
    # background = background.resize((500, 500))
    for gif_frame in ImageSequence.Iterator(animated_gif):
        # duplicate background image
        new_frame = background.copy()
        # need to convert from `P` to `RGBA` to use it in `paste()` as mask for transparency
        gif_frame = gif_frame.convert('RGBA')
        # paste on background using mask to get transparency
        new_frame.paste(gif_frame, mask=gif_frame)
        all_frames.append(new_frame)

    # save all frames as animated gif
    all_frames[0].save("image.gif", save_all=True, append_images=all_frames[1:], duration=50, loop=0)
This is the gif I am using:
Unfortunately, animated GIF support in PIL is faulty and hard to work with. The images you are showing suffer from the frames sharing palette information with the background layer, so some of their colors are distorted.
I don't know if there is a way to control the palette information for each frame using PIL.
If you want to generate GIFs programmatically using Python, I'd, for now, recommend that you use the GIMP image editor: there you can build your image either interactively in the program or programmatically from the Python console, and just call the "save as gif" function (pdb.file_gif_save2).
(I will take a look at PIL's exact capabilities and check whether I can extend this answer with proper handling of transparency; otherwise, GIMP is the way to go.)

Real time OCR in python

The problem
I'm trying to capture my desktop with OpenCV and have Tesseract OCR find text and set it as a variable. For example, if I were playing a game and had the capture frame over a resource amount, I'd want it to print that value and use it. A perfect example of this is a video by Michael Reeves, where whenever he loses health in a game it detects the change and sends it to his Bluetooth-enabled airsoft gun to shoot him. So far I have this:
# imports
from PIL import ImageGrab
from PIL import Image
import numpy as np
import pytesseract
import argparse
import cv2
import os

fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter("output.avi", fourcc, 5.0, (1366, 768))

while(True):
    x = 760
    y = 968
    ox = 50
    oy = 22

    # screen capture
    img = ImageGrab.grab(bbox=(x, y, x + ox, y + oy))
    img_np = np.array(img)
    frame = cv2.cvtColor(img_np, cv2.COLOR_BGR2RGB)

    cv2.imshow("Screen", frame)
    out.write(frame)

    if cv2.waitKey(1) == 0:
        break

out.release()
cv2.destroyAllWindows()
It captures in real time and displays it in a window, but I have no clue how to make it recognise the text every frame and output it.
Any help?
It's fairly simple to grab the screen and pass it to tesseract for OCRing.
The PIL (pillow) library can grab the frames easily on MacOS and Windows. However, this feature has only recently been added for Linux, so the code below works around it not existing. (I'm on Ubuntu 19.10 and my Pillow does not support it).
Essentially the user starts the program with screen-region rectangle co-ordinates. The main loop continually grabs this area of the screen, feeding it to Tesseract. If Tesseract finds any non-whitespace text in that image, it is written to stdout.
Note that this is not a proper real-time system. There is no guarantee of timeliness; each frame takes as long as it takes. Your machine might get 60 FPS or it might get 6. This will also be greatly influenced by the size of the rectangle you ask it to monitor.
#! /usr/bin/env python3

import sys
import pytesseract
from PIL import Image

# Import ImageGrab if possible, might fail on Linux
try:
    from PIL import ImageGrab
    use_grab = True
except Exception as ex:
    # Some older versions of pillow don't support ImageGrab on Linux
    # In which case we will use XLib
    if ( sys.platform == 'linux' ):
        from Xlib import display, X
        use_grab = False
    else:
        raise ex

def screenGrab( rect ):
    """ Given a rectangle, return a PIL Image of that part of the screen.
        Handles a Linux installation with an older Pillow by falling back
        to using XLib """
    global use_grab
    x, y, width, height = rect

    if ( use_grab ):
        image = ImageGrab.grab( bbox=[ x, y, x+width, y+height ] )
    else:
        # ImageGrab can be missing under Linux
        dsp = display.Display()
        root = dsp.screen().root
        raw_image = root.get_image( x, y, width, height, X.ZPixmap, 0xffffffff )
        image = Image.frombuffer( "RGB", ( width, height ), raw_image.data, "raw", "BGRX", 0, 1 )
        # DEBUG image.save( '/tmp/screen_grab.png', 'PNG' )
    return image

### Do some rudimentary command line argument handling
### So the user can specify the area of the screen to watch
if ( __name__ == "__main__" ):
    EXE = sys.argv[0]
    del( sys.argv[0] )

    # EDIT: catch zero-args
    if ( len( sys.argv ) != 4 or sys.argv[0] in ( '--help', '-h', '-?', '/?' ) ):  # some minor help
        sys.stderr.write( EXE + ": monitors section of screen for text\n" )
        sys.stderr.write( EXE + ": Give x, y, width, height as arguments\n" )
        sys.exit( 1 )

    # TODO - add error checking
    x = int( sys.argv[0] )
    y = int( sys.argv[1] )
    width = int( sys.argv[2] )
    height = int( sys.argv[3] )

    # Area of screen to monitor
    screen_rect = [ x, y, width, height ]
    print( EXE + ": watching " + str( screen_rect ) )

    ### Loop forever, monitoring the user-specified rectangle of the screen
    while ( True ):
        image = screenGrab( screen_rect )            # Grab the area of the screen
        text = pytesseract.image_to_string( image )  # OCR the image

        # IF the OCR found anything, write it to stdout.
        text = text.strip()
        if ( len( text ) > 0 ):
            print( text )
This answer was cobbled together from various other answers on SO.
If you use this answer for anything regularly, it would be worth adding a rate-limiter to save some CPU. It could probably sleep for half a second every loop.
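For example, a stripped-down, rate-limited version of that loop (using PIL's ImageGrab directly for brevity; the region coordinates are placeholders) could look like:
import time
import pytesseract
from PIL import ImageGrab

region = (0, 0, 300, 100)  # placeholder: left, top, right, bottom
while True:
    image = ImageGrab.grab(bbox=region)                 # grab the watched rectangle
    text = pytesseract.image_to_string(image).strip()   # OCR it
    if text:
        print(text)
    time.sleep(0.5)   # rate-limit to roughly two OCR passes per second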
Tesseract is a single-use command-line application that uses files for input and output, meaning every OCR call creates a new process and initializes a new Tesseract engine, which includes reading multi-megabyte data files from disk. Its suitability as a real-time OCR engine will depend on the exact use case (more pixels require more time) and on which parameters are provided to tune the OCR engine. Some experimentation may ultimately be required to tune the engine to the exact scenario, but also expect the time required to OCR a frame to exceed the frame time, so a reduction in the frequency of OCR execution may be required, i.e. performing OCR at 10-20 FPS rather than the 60+ FPS the game may be running at.
In my experience, a reasonably complex document in a 2200x1700px image can take anywhere from 0.5 s to 2 s using the English fast model with 4 cores (the default) on an aging CPU; however, this "complex document" represents the worst-case scenario and makes no assumptions about the structure of the text being recognized. For many scenarios, such as extracting data from a game screen, assumptions can be made to implement a few optimizations and speed up OCR:
Reduce the size of the input image. When extracting specific information from the screen, crop the grabbed screen image as much as possible to only that information. If you're trying to extract a value like health, crop the image around just the health value.
Use the "fast" trained models to improve speed at the cost of accuracy. You can use the -l option to specify different models and the --testdata-dir option to specify the directory containing your model files. You can download multiple models and rename the files to "eng_fast.traineddata", "eng_best.traineddata", etc.
Use the --psm parameter to prevent page segmentation not required for your scenario. --psm 7 may be the best option for singular pieces of information, but play around with different values and find which works best.
Restrict the allowed character set if you know which characters will be used, such as if you're only looking for numerics, by changing the whitelist configuration value: -c tessedit_char_whitelist='1234567890'.
pytesseract is the best way to get started with implementing Tesseract, and the library can handle image input directly (although it saves the image to a file before passing to Tesseract) and pass the resulting text back using image_to_string(...).
import pytesseract

# Capture frame...

# If the frame requires cropping:
frame = frame[y:y + h, x:x + w]

# Perform OCR
text = pytesseract.image_to_string(frame, lang="eng_fast", config="--psm 7")

# Process the result
health = int(text)
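Combining those tweaks, a hypothetical digits-only call (assuming eng_fast.traineddata is installed in your tessdata directory, and using a placeholder image file) might look like:
import pytesseract
from PIL import Image

frame = Image.open("health_crop.png")  # placeholder: a pre-cropped screenshot of the value
text = pytesseract.image_to_string(
    frame,
    lang="eng_fast",                                         # the renamed fast English model
    config="--psm 7 -c tessedit_char_whitelist=0123456789",  # single line, digits only
)
health = int(text.strip() or 0)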
Alright, I was having the same issue as you so I did some research into it and I'm sure that I found the solution! First, you will need these libraries:
cv2
pytesseract
Pillow(PIL)
numpy
Installation:
To install cv2, simply use this in a command line/command prompt: pip install opencv-python
Installing pytesseract is a little bit harder, as you also need to pre-install Tesseract, which is the program that actually does the OCR reading. First, follow this tutorial on how to install Tesseract. After that, in a command line/command prompt just use the command: pip install pytesseract
If you don't install this correctly you will get an error when using the OCR.
To install Pillow use the following command in a command-line/command prompt: python -m pip install --upgrade Pillow or python3 -m pip install --upgrade Pillow. The one that uses python works for me.
To install NumPy, use the following command in a command-line/command prompt: pip install numpy. Though it's usually already installed in most Python environments.
Code:
This code was made by me, and as of right now it works how I want it to, with an effect similar to what Michael had. It will take the top-left area of your screen, take a recorded image of it, and show a window displaying the image it's currently using OCR to read. Then, in the console, it prints out the text that it read on the screen.
# OCR Screen Scanner
# By Dornu Inene
# Libraries that you should have installed
import cv2
import numpy as np
import pytesseract

# We only need the ImageGrab class from PIL
from PIL import ImageGrab

# Run forever unless you press Esc
while True:
    # This instance will generate an image from
    # the point of (115, 143) and (569, 283) in format of (x, y)
    cap = ImageGrab.grab(bbox=(115, 143, 569, 283))

    # For us to use cv2.imshow we need to convert the image into a numpy array
    cap_arr = np.array(cap)

    # This isn't really needed for getting the text from a window but
    # it will show the image that it is reading it from
    # cv2.imshow() shows a window display and it is using the image that we got
    # use array as input to image
    cv2.imshow("", cap_arr)

    # Read the image that was grabbed from ImageGrab.grab using pytesseract.image_to_string
    # This is the main thing that will collect the text information from that specific area of the window
    text = pytesseract.image_to_string(cap)

    # This just removes spaces from the beginning and ends of text
    # and makes the text it reads cleaner
    text = text.strip()

    # If any text was translated from the image, print it
    if len(text) > 0:
        print(text)

    # This line will break the while loop when you press Esc
    if cv2.waitKey(1) == 27:
        break

# This will make sure all windows created from cv2 are destroyed
cv2.destroyAllWindows()
I hope this helped you with what you were looking for, it sure did help me!

Python creating a function that pulls a picture from a url then resizing to thumbnail

So I have been having problems trying to write a function that changes the size of an image if it is too big and saves it as a thumbnail. I have worked out how to retrieve the image, but I'm lost after that. I know about Pillow but can't use it for this class; any help would be appreciated.
Update: So far I have gotten the code to resize the image and make it a thumbnail. The next part I am on is having it save as thumbnail2 if it was resized, but as thumbnail1 if it stays the same. Here is my code so far, without that next step.
import urllib

url = "https://upload.wikimedia.org/wikipedia/commons/4/47/PNG_transparency_demonstration_1.png"
src = "C:\Users\laramie\Pictures\PNG_transparency_demonstration_1.png"
connect = urllib.urlretrieve(url, src)

def scalePicture(src):
    newWidth = getWidth(src) / 2
    newHeight = getHeight(src) / 2
    canvas = makeEmptyPicture(newWidth, newHeight)
    for x in range(newWidth):
        for y in range(newHeight):
            setColor(getPixel(canvas, x, y), getColor(getPixel(src, x * 2, y * 2)))
    return canvas

def thumbNail():
    srcPic = makePicture(src)
    destWidth = getWidth(srcPic) / 2
    destHeight = getHeight(srcPic) / 2
    destPic = makeEmptyPicture(destWidth, destHeight)
    destPic = scalePicture(srcPic)
    show(srcPic)
    show(destPic)
thumbNail()
There are a bunch of strange things going on in your code:
destPic = makeEmptyPicture(destWidth, destHeight)
destPic = scalePicture(srcPic)
the first line here is not required, because the destPic is overwritten immediately.
for x in range(newWidth):
    for y in range(newHeight):
        setColor(getPixel(canvas, x, y), getColor(getPixel(src, x * 2, y * 2)))
This is a very inefficient way to scale an image, and it gives inferior results unless the scale factor is an integer; even then, there are faster and better approaches.
I would recommend that you import PIL (the Python Imaging Library, distributed nowadays as Pillow) and use it to work with images. Things like loading, saving, scaling, or flipping images are easily done. However, you may need to install this library if it did not come with your Python installation.
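For reference, a minimal sketch of the thumbnail part with Pillow (assuming Python 3 and pip install Pillow; the URL is the one from the question, the output file name is just an example):
from io import BytesIO
import urllib.request

from PIL import Image

url = "https://upload.wikimedia.org/wikipedia/commons/4/47/PNG_transparency_demonstration_1.png"
with urllib.request.urlopen(url) as resp:
    img = Image.open(BytesIO(resp.read()))

img.thumbnail((128, 128))   # shrinks in place, preserving the aspect ratio
img.save("thumbnail1.png")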
