I want to use OpenCV Python to do SIFT feature detection on remote sensing images. These images are high resolution and can be thousands of pixels wide (7000 x 6000 or bigger). I am having trouble with insufficient memory, however. As a reference point, I ran the same 7000 x 6000 image in Matlab (using VLFEAT) without memory error, although larger images could be problematic. Does anyone have suggestions for processing this kind of data set using OpenCV SIFT?
OpenCV Error: Insufficient memory (Failed to allocate 672000000 bytes) in cv::OutOfMemoryError, file C:\projects\opencv-python\opencv\modules\core\src\alloc.cpp, line 55
OpenCV Error: Assertion failed (u != 0) in cv::Mat::create, file
(I'm using Python 2.7 and OpenCV 3.4 in the Spyder IDE on a Windows 64-bit with 32 GB of RAM.)
I would split the image into smaller windows. As long as the windows overlap (I assume you have an idea of the lateral shift), a match found in any window will be valid.
You can even use this as a check: for the transform to be valid, the translation between feature points must be the same in every part of the image.
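A minimal sketch of that idea, assuming OpenCV 3.x where SIFT lives in the xfeatures2d contrib module (in 4.4+ it is cv2.SIFT_create()); the file name, tile size, and overlap are placeholders you would tune:

import cv2

sift = cv2.xfeatures2d.SIFT_create()  # cv2.SIFT_create() in OpenCV 4.4+

img = cv2.imread('large_image.tif', cv2.IMREAD_GRAYSCALE)  # placeholder file name
h, w = img.shape
tile = 2000      # window size in pixels, tune to your memory budget
overlap = 200    # overlap so features near window borders are not lost

all_keypoints = []
for y in range(0, h, tile - overlap):
    for x in range(0, w, tile - overlap):
        window = img[y:y + tile, x:x + tile]
        keypoints, descriptors = sift.detectAndCompute(window, None)
        for kp in keypoints:
            kp.pt = (kp.pt[0] + x, kp.pt[1] + y)  # map back to full-image coordinates
        all_keypoints.extend(keypoints)

The overlap ensures that a feature cut off at one window border lies fully inside a neighbouring window.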
There are a few ways to run SIFT feature detection in this case:
process a single image at a time on one core;
multiprocess two or more images at a time on a single core;
multiprocess two or more images at a time on multiple cores.
Read "cores" as either CPU or GPU cores. Note that threading results in serial, not parallel, processing; use multiprocessing for true parallelism.
As stated, Rebecca has at least 32 GB of RAM at her disposal, which is more than sufficient for option 1: processing one image at a time.
So in that light, splitting a single image as suggested by Martin should, in my opinion, be a last resort.
Why should you avoid splitting a single image into multiple windows during feature detection (when you are not running out of memory)?
Answer:
If a corner lies on the split edge of a window, it gets cut into two more-or-less straight, polygonal line-like shapes, and you won't find the corner you're looking for unless you have a specialized algorithm to search for those anomalies.
In this case:
In Rebecca's case it is crucial to know how she processed the image(s): was it one, two, or many more images loaded into memory simultaneously?
If hundreds or thousands of images are loaded into memory at once, you are essentially choking the system by taking away its breathing space (free memory). On top of that, other programs loaded into memory also reserve or consume memory for their background tasks, which adds to the issue at hand.
Overthinking:
If, as Martin suggests, there is an issue with how the OpenCV library handles this number of images, do some debugging, report your findings to OpenCV, and post a question here on SO as she did. But also post the code that shows how you handle the image processing from the start; as explained above, that is important. And yes, as Martin stated, don't post wrappers; that is pointless. A link to them (with a version number if possible) is more than enough, or a tag ;-)
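For completeness, a rough sketch of options 2 and 3 (one image per worker process) using only the standard library; the pool size and file names are placeholders, and only descriptors are returned because cv2.KeyPoint objects may not pickle across processes:

import cv2
from multiprocessing import Pool

def detect(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.xfeatures2d.SIFT_create()  # cv2.SIFT_create() in OpenCV 4.4+
    keypoints, descriptors = sift.detectAndCompute(img, None)
    return path, descriptors  # keep KeyPoint objects out of the return value

if __name__ == '__main__':
    paths = ['scene1.tif', 'scene2.tif', 'scene3.tif']  # placeholder file names
    pool = Pool(processes=2)  # one image per worker at a time
    results = pool.map(detect, paths)
    pool.close()
    pool.join()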
Related
I am using a video with around 30,000 frames and trying to use the FER code below for emotion recognition.
The entire process takes anywhere between 10 and 15 hours just to analyze the video.
Is there a way to speed up the processing, or another algorithm for detecting facial emotion?
Here is the code:
from fer import Video
from fer import FER
import os
import sys
import pandas as pd

location_videofile = "/Users/Akash/Desktop/videoplayback.mp4"
input_video = Video(location_videofile)
face_detector = FER(mtcnn=True)  # detector instance; this line was missing from the snippet
processing_data = input_video.analyze(face_detector, display=False, frequency=5)
I tried adding the frequency parameter to the analyze function as well, but it was of no use: the processing time stayed pretty much the same. I assume it affects the output rather than the analysis itself.
In the following answer I will give you several solutions that may or may not work with your particular video.
The FER code relies on tensorflow and opencv for processing the data.
Assuming a default installation of these packages through pip, tensorflow is already running on gpu (you may want to double check that), while opencv is not.
Some of the functionalities of opencv can run on gpu and they may be the ones that FER is using: in this case, you may want to build the opencv package with GPU support (you can take a look here).
Another solution is to downsample the frames of your video yourself before supplying them to FER (a sketch follows after this list).
Downsample each frame of the video to reduce the number of pixels per frame. This may give a huge speed-up, if you can afford it (i.e. faces occupy much of the screen and the frame resolution is relatively high).
Multiprocessing. You could split the video into several mini-videos and analyse them with multiple Python processes. In my opinion, this is the cheapest and most reliable way to deal with the speed issue without loss in accuracy.
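A rough sketch of the downsampling idea, bypassing FER's Video wrapper and feeding resized frames straight to the detector via its per-image detect_emotions() API; the frame-skip and scale factors are placeholders to tune:

import cv2
from fer import FER

detector = FER(mtcnn=True)
cap = cv2.VideoCapture("/Users/Akash/Desktop/videoplayback.mp4")

results = []
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % 5 == 0:  # analyze only every 5th frame
        small = cv2.resize(frame, None, fx=0.5, fy=0.5)  # halve each dimension
        results.append(detector.detect_emotions(small))
    frame_idx += 1
cap.release()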
Seeking to random points in a video file with OpenCV seems to be much slower than in media players like Windows Media Player or VLC. I am trying to seek to different positions on a video file encoded in H264 (or MPEG-4 AVC (part10)) using VideoCapture and the time taken to seek to the position seems to be proportional to the frame number queried. Here's a small code example of what I'm trying to do:
import cv2

cap = cv2.VideoCapture('example_file')
frame_positions = [200, 400, 8000, 200000]
for frame_position in frame_positions:
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_position)  # cv2.cv.CV_CAP_PROP_POS_FRAMES in 2.4.x
    ret, img = cap.read()  # read() returns a (success, frame) tuple
    cv2.imshow('window', img)
    cv2.waitKey(0)
The perceived delays before the images are displayed are proportional to the frame number. That is, frames 200 and 400 show barely any delay, frame 8000 has some noticeable lag, but frame 200000 takes almost half a minute.
Why isn't OpenCV able to seek as "quickly" as say Windows Media Player? Could it be that OpenCV is not using the FFMPEG codecs correctly while seeking? Would building OpenCV from sources with some alternate configuration for codecs help? If so, could someone tell me what the configuration could be?
I have only tested this on Windows 7 and 10 PCs, with OpenCV binaries as is, with relevant FFMPEG DLLs in system path.
Another observation: With OpenCV (binaries) versions greater than 2.4.9 (Example 2.4.11, 3.3.0), the first seek works, but not the subsequent ones. That is, it can seek to frame 200 from above example, but not to 400 and the rest; the video just jumps back to frame 0. But since it works for me with 2.4.9, I'm happy for now.
GPU acceleration should not matter for seeking, because you are not decoding frames. In addition, even if you were decoding frames, doing so on the GPU would be slower than on the CPU, because your CPU nowadays has video codecs "soldered" into the chip, which makes video decoding very fast, and there would have to be some book-keeping to shovel data from main memory into the GPU.
It sounds like OpenCV implements a "safe" way of seeking: Video files can contain stream offsets. For example, your audio stream may be set off against your video stream. As another example, you might have cut away the beginning of a video and saved the result. If your cut did not happen precisely at a key frame, video editing software like ffmpeg will include a small number of frames before your cut in the output file, in order to allow the frame at which your cut happened to be decoded properly (for which the previous frames might be necessary). In this case, too, there will be a stream offset.
In order to make sure that such offsets are interpreted the right way, that is, to really hit exactly the desired frame relative to "time 0", the only "easy" but expensive way is to actually eat and decode all the video frames. And that's apparently what OpenCV is doing here. Your video players do not bother with this, because everyday users don't notice and the GUI controls are much too imprecise anyway.
I might be wrong about this. But answers to other questions and some experiments I conducted to evaluate them showed that only the "slow" way of counting the frames in a video gave accurate results.
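If you want that slow-but-exact behaviour explicitly, here is a sketch using grab(), which decodes frames while skipping the retrieval overhead of read(); frame indices are 0-based:

import cv2

def read_frame_exact(path, target):
    # decode every frame up to the target: slow but frame-accurate
    cap = cv2.VideoCapture(path)
    for _ in range(target):
        if not cap.grab():  # grab() decodes a frame but skips conversion/copy
            cap.release()
            return None
    ok, img = cap.read()  # this is frame number `target`
    cap.release()
    return img if ok else None

img = read_frame_exact('example_file', 8000)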
It's likely because that is a very basic code example and the mentioned applications are doing something more clever.
A few points:
Windows Media Player has hardware acceleration
Windows Media Player almost definitely uses your GPU; you could try disabling this to see what difference it makes
VLC is an open source project, so you could check out its code to see how it does video seeking
VLC probably also uses your GPU
OpenCV provides GPU functions that will most likely make your code much quicker
If seeking speed is important, you almost definitely want to work with the GPU when doing video operations:
https://github.com/opencv/opencv/blob/master/samples/gpu/video_reader.cpp
Here are some related github issues:
https://github.com/opencv/opencv/issues/4890
https://github.com/opencv/opencv/issues/9053
Re-encode your video with ffmpeg. It works for me.
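If you try that route, what usually helps seeking is re-encoding with more frequent keyframes. A hedged example driven from Python; the GOP size of 30 is an assumption to tune, and the file names are placeholders:

import subprocess

# force a keyframe every 30 frames so seeks land near one;
# the audio stream is copied unchanged
subprocess.check_call([
    'ffmpeg', '-i', 'example_file.mp4',
    '-c:v', 'libx264', '-g', '30',
    '-c:a', 'copy',
    'example_file_seekable.mp4',
])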
I am working with some EDF (European Data Format) images, and I have the following problem: if I load the files into a npy array and compare a given array element with the corresponding raw file, the files look the same, but the difference is not 0.
Plotting Image_from_stack - Ram_image, I get a striped value distribution (see image). Does anyone have a suggestion on what could be the cause, and how to fix it?
To make things more interesting, the difference changes from image to image, but it always shows a striped pattern.
I am working in Python.
A note for future readers: the problem explained above was related to a scientific programming script running on a high performance computing machine. The script was using a substantial amount of memory (up to 100 GB).
My guess is that the striped pattern effect presented above is related to such anomalous memory requirements. After rebooting the machine I couldn't replicate the problem.
So in case you see something similar, check the memory usage. If it's very high, give a reboot a chance!
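A quick way to log that from inside the script, assuming the third-party psutil package is installed:

import psutil

# snapshot of system memory; log this before and after the heavy steps
mem = psutil.virtual_memory()
print("used: %.1f GB, available: %.1f GB (%.0f%% used)"
      % (mem.used / 1e9, mem.available / 1e9, mem.percent))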
I am dealing with several large txt files, each of which has about 8,000,000 lines. A short example of the lines:
usedfor zipper fasten_coat
usedfor zipper fasten_jacket
usedfor zipper fasten_pant
usedfor your_foot walk
atlocation camera cupboard
atlocation camera drawer
atlocation camera house
relatedto more plenty
The code to store them in a dictionary is:
import collections

dicCSK = collections.defaultdict(list)
for line in finCSK:  # finCSK is the already-opened file handle
    line = line.strip('\n')
    try:
        r, c1, c2 = line.split(" ")
    except ValueError:
        print line  # Python 2 print statement
        continue    # skip malformed lines instead of reusing stale values
    dicCSK[c1].append(r + " " + c2)
It runs fine on the first txt file, but when it gets to the second txt file, I get a MemoryError.
I am using Windows 7 64-bit with 32-bit Python 2.7 and an Intel i5 CPU with 8 GB of memory. How can I solve the problem?
Further explanation:
I have four large files, each containing different information about many entities. For example, I want to find all the information about cat, its parent node animal, its child node persian cat, and so on. So my program first reads all the txt files into dictionaries, then scans all the dictionaries to find the information for cat, its parent, and its children.
Simplest solution: You're probably running out of virtual address space (any other form of error usually means running really slowly for a long time before you finally get a MemoryError). This is because a 32 bit application on Windows (and most OSes) is limited to 2 GB of user mode address space (Windows can be tweaked to make it 3 GB, but that's still a low cap). You've got 8 GB of RAM, but your program can't use (at least) 3/4 of it. Python has a fair amount of per-object overhead (object header, allocation alignment, etc.), odds are the strings alone are using close to a GB of RAM, and that's before you deal with the overhead of the dictionary, the rest of your program, the rest of Python, etc. If memory space fragments enough, and the dictionary needs to grow, it may not have enough contiguous space to reallocate, and you'll get a MemoryError.
Install a 64 bit version of Python (if you can, I'd recommend upgrading to Python 3 for other reasons); it will use more memory, but then, it will have access to a lot more memory space (and more physical RAM as well).
If that's not enough, consider converting to a sqlite3 database (or some other DB), so it naturally spills to disk when the data gets too large for main memory, while still having fairly efficient lookup.
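A minimal sketch of the sqlite3 route, keeping the same (relation, concept1, concept2) triple layout as in the question; the database and input file names are placeholders:

import sqlite3

conn = sqlite3.connect('csk.db')  # stored on disk, so it spills out of RAM naturally
conn.execute('CREATE TABLE IF NOT EXISTS triples (r TEXT, c1 TEXT, c2 TEXT)')

with open('file1.txt') as finCSK:
    for line in finCSK:
        parts = line.strip('\n').split(' ')
        if len(parts) == 3:
            conn.execute('INSERT INTO triples VALUES (?, ?, ?)', parts)
conn.execute('CREATE INDEX IF NOT EXISTS idx_c1 ON triples (c1)')
conn.commit()

# this lookup replaces dicCSK['cat']
for r, c2 in conn.execute('SELECT r, c2 FROM triples WHERE c1 = ?', ('cat',)):
    print(r + ' ' + c2)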
Assuming your example text is representative of all the text, one line would consume about 75 bytes on my machine:
In [3]: sys.getsizeof('usedfor zipper fasten_coat')
Out[3]: 75
Doing some rough math:
75 bytes * 8,000,000 lines / 1024 / 1024 = ~572 MB
So roughly 572 MB just to store the strings for one of these files. Once you start adding additional, similarly structured and sized files, you'll quickly approach the virtual address space limits mentioned in ShadowRanger's answer.
If upgrading your Python isn't feasible, or if it would only kick the can down the road (you have finite physical memory, after all), you really have two options: write your results to temporary files between loading and reading the input files, or write your results to a database. Since you need to post-process the strings after aggregating them, writing to a database is the superior approach.
Say I have some huge amount of data stored in an HDF5 data file (size: 20k x 20k, if not more) and I want to create an image from all of this data using Python. Obviously, this much data cannot be opened and stored in the memory without an error. Therefore, is there some other library or method that would not require all of the data to be dumped into the memory and then processed into an image (like how the libraries: Image, matplotlib, numpy, etc. handle it)?
Thanks.
This question comes from a similar question I asked: "Generating pcolormesh images from very large data sets saved in H5 files with Python". But I think the question I pose here covers a broader range of applications.
EDIT (7.6.2013)
Allow me to clarify my question further: In the first question (the link), I was using the easiest method I could think of to generate an image from a large collection of data stored in multiple files. This method was to import the data, generate a pcolormesh plot using matplotlib, and then save a high resolution image from this plot. But there are obvious memory limitations to this approach. I can only import about 10 data sets from the files before I reach a memory error.
In that question, I was asking if there is a better method to patch together the data sets (that are saved in HDF5 files) into a single image without importing all of the data into the memory of the computer. (I will likely require 100s of these data sets to be patched together into a single image.) Also, I need to do everything in Python to make it automated (as this script will need to be run very often for different data sets).
The real question I discovered while trying to get this to work using various libraries is: How can I work with high resolution images in Python? For example, if I have a very high resolution PNG image, how can I manipulate it with Python (crop, split, run through an fft, etc.)? In my experience, I have always run into memory issues when trying to import high resolution images (think ridiculously high resolution pictures from a microscope or telescope (my application is a microscope)). Are there any libraries designed to handle such images?
Or, conversely, how can I generate a high resolution image from a massive amount of data saved in a file with Python? Again the data file could be arbitrarily large (5-6 Gigabytes if not larger).
But in my actual application, my question is: Is there a library or some kind of technique that would allow me to take all of the data sets that I receive from my device (which are saved in HDF5) and patch them together to generate an image from all of them? Or I could save all of the data sets in a single (very large) HDF5 file. Then how could I import this one file and then create an image from its data?
I do not care about displaying the data in some interactive plot. The resolution of the plot is not important. I can easily use a lower resolution for it, but I must be able to generate and save a high resolution image from the data.
Hope this clarifies my question. Feel free to ask any other questions about my question.
You say it "obviously can't be stored in memory", but the following calculations say otherwise.
20,000 * 20,000 pixels * 4 channels * 1 byte = 1.6 GB
Most reasonably modern computers have 8GB to 16GB of memory so handling 1.6GB shouldn't be a problem.
However, in order to handle the patchworking you need to do, you could stream each pixel from one file into the other. This assumes the format is a lossless bitmap using a linear encoding format like BMP or TIFF. Simply read each file and append to your result file.
You may need to get a bit clever if the files are different sizes or patched together in some type of grid. In that case, you'd need to calculate the total dimensions of the resulting image and offset the file writing pointer.
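A sketch of that offset bookkeeping using h5py and a disk-backed numpy memmap, so the full mosaic never has to sit in RAM at once; the 2x2 grid layout, the dataset name 'data', the tile size, and the dtype are all assumptions about your files:

import h5py
import numpy as np

files = [['a.h5', 'b.h5'],
         ['c.h5', 'd.h5']]       # placeholder 2x2 grid of data sets
tile_h, tile_w = 20000, 20000    # assumed size of each stored data set

# the result lives on disk; only one tile at a time is held in memory
out = np.memmap('mosaic.dat', dtype=np.float32, mode='w+',
                shape=(2 * tile_h, 2 * tile_w))

for i, row in enumerate(files):
    for j, name in enumerate(row):
        with h5py.File(name, 'r') as f:
            out[i * tile_h:(i + 1) * tile_h,
                j * tile_w:(j + 1) * tile_w] = f['data'][:]
out.flush()

From there you can render a downsampled preview by slicing, e.g. out[::20, ::20], without ever loading the whole mosaic.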