How do I use OpenPose data to segment a long clip?

I love the OpenPose library -- I've been playing with the demo for a while, and I like the option of having it spit out JSON data of the poses.
I wanted to ask: are there any examples I've missed, or solutions where someone takes that pose keypoint data and uses it to segment a long clip?
For example: if I wanted to cut a clip of one person punching another, could I use that to train a network to segment a different, longer clip and trim out only the punch (if any) from the other clip?
Any help would be appreciated. Using Python/TensorFlow.

OpenPose analyzes each frame of the video, so you just need to step in at the frame level, run your analysis, and decide whether to save that part or not.
You can open the video as a cv2.VideoCapture, extract each frame as an OpenCV Mat, convert it to OpenPose's format (the CV2OPMAT conversion), extract the keypoints, and run your "punch detection" on the frame. You can reference the OpenPose examples for image analysis. If a frame qualifies, write the frame from before the conversion (the OpenCV Mat) back to video using cv2.VideoWriter, like in this example: https://www.life2coding.com/convert-image-frames-video-file-using-opencv-python/
One extra consideration: you may need to convert the pixels to BGR format using cv2.cvtColor.
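Here's a rough, untested sketch of that loop; the pyopenpose calls follow the OpenPose Python examples, and looks_like_punch is a placeholder for your own detector:

import cv2
import pyopenpose as op  # OpenPose Python bindings (built with BUILD_PYTHON=ON)

params = {"model_folder": "models/"}
opWrapper = op.WrapperPython()
opWrapper.configure(params)
opWrapper.start()

cap = cv2.VideoCapture("long_clip.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
out = cv2.VideoWriter("punches.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, size)

def looks_like_punch(keypoints):
    # Placeholder: run your trained network over the pose keypoints here
    return False

while True:
    ok, frame = cap.read()
    if not ok:
        break
    datum = op.Datum()
    datum.cvInputData = frame  # the OpenCV BGR Mat goes straight in
    opWrapper.emplaceAndPop(op.VectorDatum([datum]))  # older versions take [datum]
    if looks_like_punch(datum.poseKeypoints):
        out.write(frame)  # save the original frame, not the rendered one

cap.release()
out.release()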
Let me know if it works :)

Related

Workflow to calculate depth of an object

I wanted to check whether my understanding is correct. I want to calculate the depth of a crack in an object.
I have a video of the object with a crack on it, captured from a smartphone. The video is captured from different angles, covering almost the whole object. Using segmentation models, I have changed the background to black, keeping only the object in the video. My plan is:
1. Generate a depth map for each frame of the video using a PyTorch depth-estimation deep learning model.
2. Generate a .ply file and point cloud using the depth maps and frames.
3. Calculate the depth using the .ply file -- but how?
Does this workflow make sense, and is it feasible? Correct me if I am wrong, and kindly guide me in the right direction.
I removed the background of the object in the video, so there should not be noise in the final output.
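A minimal sketch of the first two steps, assuming MiDaS as the PyTorch depth model and Open3D for the point cloud/.ply part (both library choices and the intrinsics are assumptions, not something the post specifies):

import cv2
import numpy as np
import torch
import open3d as o3d

# Assumption: MiDaS as the depth-estimation model, loaded via torch.hub
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = transforms.small_transform

frame = cv2.imread("frame_0001.png")  # one extracted video frame
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

with torch.no_grad():
    pred = midas(transform(rgb))
    depth = torch.nn.functional.interpolate(
        pred.unsqueeze(1), size=rgb.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze().numpy()

# MiDaS predicts relative inverse depth: invert it, and note the result is
# still unscaled -- absolute measurements need a known reference or calibration
depth = 1.0 / np.maximum(depth, 1e-6)

# Assumption: rough pinhole intrinsics; real measurements need camera calibration
h, w = depth.shape
intrinsic = o3d.camera.PinholeCameraIntrinsic(w, h, fx=w, fy=w, cx=w / 2, cy=h / 2)
depth_img = o3d.geometry.Image(depth.astype(np.float32))
pcd = o3d.geometry.PointCloud.create_from_depth_image(depth_img, intrinsic, depth_scale=1.0)
o3d.io.write_point_cloud("frame_0001.ply", pcd)

For the last step, one common approach is to fit a plane to the intact surface around the crack and take the distance of the crack points from that plane as its depth.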

Godot: how to get a screenshot of the camera viewport as a numpy array

I have set up a basic scene in Godot as below:
[screenshot: Godot scene]
with a floor, some obstacles (boxes) and a camera.
(I am using python as a scripting language in Godot)
I found a way to get a screenshot of the camera viewport like so:
(in Python _process function...)
old_clear_mode = self.viewport.get_clear_mode()
self.viewport.set_clear_mode(Viewport.CLEAR_MODE_ONLY_NEXT_FRAME)
img = self.viewport.get_texture().get_data()
self.viewport.set_clear_mode(old_clear_mode)
img.flip_y()
img.save_png("temp.png")
img = Image.open("temp.png").convert("RGB")
sharpened_img = self.sharpen_edges(img)
plt.imsave('temp2.png', sharpened_img)
(...)
I save the image as a PNG, and immediately afterwards I load that very same PNG to apply a transformation for edge detection, then save it again.
This method takes close to 0.5 s, so it is rather slow.
The question is:
Is there a faster way to convert the "screenshot" to a numpy array, in order to apply my transformation?
NOTE:
I am trying to make this run at close to (if not at) real-time speeds, so speed optimizations are of the essence.
Thanks!
I think the following links will help you:
Link1
Link2
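One way to avoid the PNG round-trip entirely is to read the raw pixel buffer straight into numpy. A sketch, assuming the godot-python bindings and that bytes() accepts the PoolByteArray returned by Image.get_data():

import numpy as np

# Inside _process, replacing the save/load round-trip:
img = self.viewport.get_texture().get_data()
img.flip_y()
img.convert(Image.FORMAT_RGB8)  # ensure a tightly packed 3-bytes-per-pixel buffer
raw = bytes(img.get_data())     # PoolByteArray -> bytes (assumed to be supported)
arr = np.frombuffer(raw, dtype=np.uint8).reshape(
    img.get_height(), img.get_width(), 3)
sharpened_img = self.sharpen_edges(arr)  # run the edge transform on the array directly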

extract text and labels from PDF document

I am trying to detect and extract the "labels" and "dimensions" of a 2D technical drawing saved as a PDF, using Python. I came across a Python library called "pytesseract", which has optical character recognition capability. I tried the demo on my image, but it fails to detect most of the labels/dimensions. Please suggest if there is another way to do it. Thank you.**
** Attached is a sample of the 2D technical drawing I am trying to detect.
** What I am trying to achieve is to obtain the coordinates of every dimension (the 160, 120, 10, 4x45, etc.) on the image, and extract them as well.
About 16 months ago we asked ourselves the same question.
If you want to implement it yourself, I'd suggest the following process (a rough OCR sketch follows the list):
1. Extract the Canvas from the sheet.
2. Separate the Cuts.
3. Detect the Measure Regions on each Cut.
4. Detect the individual attributes of the Measure Regions to understand where the Measures start and end. In your particular example that's relatively easy.
5. Run the detected Measure Labels through OCR.
6. Associate the Labels to the Measures.
7. Verify your results.
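For steps 5-6, here is a rough sketch using pytesseract's image_to_data, which returns each recognized word with its bounding box; the --psm 11 (sparse text) mode and the Otsu thresholding are assumptions that tend to suit technical drawings:

import cv2
import pytesseract
from pytesseract import Output

img = cv2.imread("drawing.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Otsu threshold: drawings are mostly thin black strokes on white
_, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# --psm 11 treats the page as sparse text, which suits scattered dimension labels
data = pytesseract.image_to_data(thresh, config="--psm 11", output_type=Output.DICT)
for i, text in enumerate(data["text"]):
    if text.strip():
        x, y, w, h = (data["left"][i], data["top"][i],
                      data["width"][i], data["height"][i])
        print(text, (x, y, w, h))  # label plus its pixel coordinates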
Alternatively you can also run it through our API and get the results as JSON.
Here's a quick visualization of the result:
[image: Drawing Read (GT stands for General Tolerances)]

How to convert images taken with a fisheye camera into plane (rectangular) images using OpenCV?

Is there any predefined code for this, or do I have to write my own?
Also, I do not have the camera properties for this; I only have the image taken with the fisheye lens, and now I have to flatten the images.
OpenCV provides a module for working with fisheye images: https://docs.opencv.org/3.4/db/d58/group__calib3d__fisheye.html
This is a tutorial with an example application.
Keep in mind that your task might be a bit hard to achieve since the problem is under-determined. If you have some cues in the image (such as straight lines), that might help. Otherwise, you should seek a way of getting more information about the lens. If it's a known lens type, you might find calibration info online. Also, some images might have the lens used to capture them in the EXIF data.
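If you do manage to get (or guess) the intrinsics, the flattening itself is one call. A sketch, where the camera matrix K and the fisheye distortion coefficients D are placeholder guesses, not real calibration data:

import cv2
import numpy as np

img = cv2.imread("fisheye.jpg")
h, w = img.shape[:2]

# Placeholder intrinsics -- without calibration these are guesses to tune by eye
K = np.array([[w / 3.0, 0.0, w / 2.0],
              [0.0, w / 3.0, h / 2.0],
              [0.0, 0.0, 1.0]])
D = np.array([-0.05, 0.0, 0.0, 0.0])  # fisheye model coefficients k1..k4 (guessed)

undistorted = cv2.fisheye.undistortImage(img, K, D, Knew=K)
cv2.imwrite("flat.jpg", undistorted)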

Recognize images in Python

I'm kinda new to both OCR and Python.
What I'm trying to achieve is to run Tesseract from a Python script to 'recognize' some particular figures in a .tif.
I thought I could do some training for Tesseract, but I didn't find any similar topic on Google or here at SO.
Basically I have some .tif files that contain several images (like an 'arrow', a 'flower', and other icons), and I want the script to print the name of each icon it finds as output. If it finds an arrow, it prints 'arrow'.
Is it feasible?
This is by no means a complete answer, but if there are multiple images in the tif, and if you know their size in advance, you can standardize the image samples prior to classifying them. You would cut the image up into all the possible rectangles in the tif.
So when you create a classifier (I don't cover the methods here), the end result would be a synthesis of the classifications of all the smaller rectangles.
So if, given a tif, the 'arrow' or 'flower' images are 16px by 16px, say, you can use Python PIL to create the samples.
from PIL import Image

# Cut the tif up into every candidate 16x16 rectangle
image_samples = []
im = Image.open("input.tif")
sample_dimensions = (16, 16)
for box in get_all_corner_combinations(im, sample_dimensions):
    image_samples.append(im.crop(box))

# Classify each sample, then fuse the per-sample results into one label
classifier = YourClassifier()
classifications = []
for sample in image_samples:
    classifications.append(classifier(sample))
label = fuse_classifications(classifications)
Again, I didn't talk about the learning step of actually writing YourClassifier. But hopefully this helps with laying out part of the problem.
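For reference, one possible implementation of the get_all_corner_combinations placeholder above -- a plain sliding window, where the step size is an assumption:

def get_all_corner_combinations(im, sample_dimensions, step=16):
    # Yield (left, top, right, bottom) crop boxes covering the whole image
    width, height = im.size
    sw, sh = sample_dimensions
    for top in range(0, height - sh + 1, step):
        for left in range(0, width - sw + 1, step):
            yield (left, top, left + sw, top + sh)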
There is a lot of research on the subject of learning to classify images as well as work in cleaning up noise in images before classifying them.
Consider browsing through this nice collection of existing Python machine learning libraries.
http://scipy-lectures.github.com/advanced/scikit-learn/index.html
There are many techniques that relate to images as well.
