I am automating a computer game using Sikuli as a hobby project, and I hope to get good enough to write scripts that help me at my job. In a small region (20x20 pixels), one of 15 characters will appear. Right now I have these 15 images defined as variables, and I check each one with Region.exists() in an if/elif chain. If one of my images is present in the region, I assign a variable the appropriate value.
I am doing this for two areas on the screen and then based on the combination of characters the script clicks appropriately.
The problem right now is that running the 15 if statements takes approximately 10 seconds. I was hoping to do this recognition in closer to 1 second.
These are just text characters but the OCR feature was not reading them reliably and I wanted close to 100% accuracy.
Is this an appropriate way to do OCR? Is there a better way you guys can recommend? I haven't done much coding in the last 3 years so I am wondering if OCR has improved and if Sikuli is still even a relevant program. Seeing as this is just a hobby project I am hoping to stick to free solutions.
Sikuli operates by scanning a Screen or a part of a screen and attempting to match a set pattern. Naturally, the smaller the pattern is, the more time it will consume to match it. There are a few ways to improve the detection time:
Region and Pattern manipulation (bound region size)
Functions settings (reduce minimum wait time)
Configuration (amend scan rate)
I have described the issue in some more detail here.
OCR is still quite unreliable. There are ways to improve that but if you only have a limited set of characters, I reckon you will be better off using them as patterns. It will be quicker and more reliable.
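As a rough illustration of the points above, here is a minimal sketch of checking a small region against a set of patterns. The helper name `identify_character` and the label-to-image mapping are hypothetical. The only Sikuli call assumed is `exists(pattern, seconds)`, where a timeout of 0 requests a single search attempt instead of waiting for the default timeout on every miss.

```python
def identify_character(region, patterns):
    """Return the label of the first pattern found in region, or None.

    region   -- any object with a Sikuli-style exists(pattern, seconds) method
    patterns -- dict mapping a label to an image file name or Pattern
                (hypothetical names, supplied by the caller)
    """
    for label, pattern in patterns.items():
        # A timeout of 0 asks for one search attempt, so a missing
        # pattern does not cost the full auto-wait time.
        if region.exists(pattern, 0):
            return label
    return None
```

Inside a Sikuli script this could be called with something like `identify_character(Region(x, y, 20, 20), {"A": "a.png", "B": "b.png"})`, where the coordinates and file names are placeholders. Keeping the region as small as the character itself limits the area that has to be scanned.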
As for Sikuli itself, the tool is under active development and is still relevant if it helps you to solve your problem.
This is more of a 'what is this called' kind of question than a technical one. I have recently started playing with PyAutoGUI and I am using it to do some automation. In order to improve the speed of the overall function, I am trying to narrow down the 'region' in which it's looking. How would I identify a region by looking for a specific "border" while ignoring the internal contents? I don't really need any code, unless you're just that bored. I'm just trying to learn what techniques are available to accomplish this task, or maybe some helpful keywords that I can use in my search. I am having a very difficult time finding any resources that relate to my objective.
For example, how would I match the entire dimensions of the following picture regardless of what is inside the frame.
I currently have a photodiode connected to my PC and capture its signal with Audacity.
I want to improve this by using an old RPi 1 as a dedicated test station. As a result, the shutter speed should appear on the console. I would prefer a Python solution for acquiring the signal and analysing it.
Can anyone give me some suggestions? I played around with oct2py, but I don't really understand how to calculate the time between the two peaks of the signal.
I have no expertise in sound analysis with Python; this is what I found through some internet research, as I am interested in this topic.
pyAudioAnalysis, for the purpose its name suggests
You can use pyAudioAnalysis, developed by Theodoros Giannakopoulos.
To that end, the function mtFileClassification() from audioSegmentation.py can be a good start. This function:
splits an audio signal into successive mid-term segments and extracts mid-term feature statistics from each of these segments, using mtFeatureExtraction() from audioFeatureExtraction.py
classifies each segment using a pre-trained supervised model
merges successive fixed-size segments that share the same class label into larger segments
visualizes statistics regarding the results of the segmentation-classification process.
For instance
from pyAudioAnalysis import audioSegmentation as aS
[flagsInd, classesAll, acc, CM] = aS.mtFileClassification("data/scottish.wav","data/svmSM", "svm", True, 'data/scottish.segments')
Note that the last argument of this function is a .segments file. This is used as ground truth (if available) in order to estimate the overall performance of the classification-segmentation method. If this file does not exist, the performance measure is not calculated. These files are simple comma-separated files of the format: <segment start (seconds)>,<segment end (seconds)>,<segment label>. For example:
0.01,9.90,speech
9.90,10.70,silence
10.70,23.50,speech
23.50,184.30,music
184.30,185.10,silence
185.10,200.75,speech
...
If I have understood your question correctly, this is at least what you want to generate, isn't it? I rather think you have to provide it there.
Most of this information is quoted directly from his wiki, which I suggest you read. Don't hesitate to reach out, as I am really interested in this topic.
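Each row in the example above holds a start time, an end time, and a label, so such a file is easy to read back with the standard library. The following is only a sketch. The function names `read_segments` and `total_duration_by_label` are made up for illustration and are not part of pyAudioAnalysis.

```python
import csv
import io


def read_segments(lines):
    """Parse 'start,end,label' rows into (start, end, label) tuples."""
    segments = []
    for row in csv.reader(lines):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        start, end, label = row
        segments.append((float(start), float(end), label.strip()))
    return segments


def total_duration_by_label(segments):
    """Sum the length in seconds of all segments sharing a label."""
    totals = {}
    for start, end, label in segments:
        totals[label] = totals.get(label, 0.0) + (end - start)
    return totals


# The first three rows of the example, read from memory instead of a file.
example = io.StringIO(
    "0.01,9.90,speech\n"
    "9.90,10.70,silence\n"
    "10.70,23.50,speech\n"
)
segments = read_segments(example)
totals = total_duration_by_label(segments)
```

For a real file, passing an open file object to `read_segments` should work the same way, since `csv.reader` accepts any iterable of lines.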
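For the specific measurement in the question, the time between the two peaks, a classifier may not be needed at all. Below is a minimal sketch under stated assumptions: the recording is already loaded as a list of sample values, the two strongest excursions correspond to the shutter opening and closing, and `min_gap` (a made-up parameter) is large enough to skip the samples belonging to the first peak. The function name is hypothetical.

```python
def peak_interval_seconds(samples, sample_rate, min_gap=10):
    """Return the time in seconds between the two strongest peaks.

    samples     -- sequence of numeric sample values
    sample_rate -- samples per second of the recording
    min_gap     -- minimum distance in samples between the two peaks,
                   so neighbours of the first peak are not picked again
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    # Use magnitudes so that peaks of opposite sign are treated alike.
    magnitudes = [abs(s) for s in samples]
    indices = range(len(magnitudes))
    first = max(indices, key=magnitudes.__getitem__)
    others = [i for i in indices if abs(i - first) >= min_gap]
    if not others:
        raise ValueError("no second peak outside min_gap")
    second = max(others, key=magnitudes.__getitem__)
    return abs(second - first) / float(sample_rate)
```

With a 44100 Hz recording, two peaks that are 441 samples apart would give 0.01 s, that is a shutter time of about 1/100 s. Real signals are noisy, so some smoothing or a threshold would probably be needed before this becomes reliable.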
Other available libraries for audio analysis :
I am trying to solve what I have realized is quite a hard problem to address due to my lack of expertise in the subject. Suppose I have an image of a table with 3 rows and 5 columns. Each row contains text (let's assume only english for now) or numbers (normal Indo-Arabic numerals). There is nothing but whitespace between the columns and between each row. Now assuming all rows and all columns are aligned, my task would be to get an algorithm to recognize and extract each row out from the document (don't know if I'm articulating this well enough).
Could someone suggest a good starting point (a library, a similar example, a textbook chapter that deals with something like this, etc.) for me to get started?
My background is data science but I have just never been exposed to computer vision.
Any help would be appreciated.
You should start off with OpenCV, like Racialz suggested. It provides a Hough line transform, which should be the primary and easiest way for you to find and crop text from table sections. People use this algorithm for many different line-finding tasks (like THIS or THIS), but your task should be much easier, because your lines should be much clearer and simpler than in those examples. After the extraction you will need to read the text; for this I would suggest the Tesseract OCR engine. It is free, really easy to use, gives pretty decent results, and can be trained to recognize specific types of letters.
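Since the table in the question is separated only by whitespace, with no ruled lines, a simpler alternative to the Hough transform may be worth trying first: a projection profile, which marks every pixel row that contains ink and groups consecutive inked rows into bands. This is a different technique from the one suggested above, sketched here in plain Python. It assumes the image has already been binarized into rows of 0 (background) and 1 (ink), and the function names are made up for illustration.

```python
def find_row_bands(image):
    """Return (top, bottom) pairs of consecutive pixel rows containing ink.

    image  -- list of pixel rows, each a list of 0 (background) or 1 (ink)
    bottom -- exclusive, so image[top:bottom] is the band
    """
    bands = []
    start = None
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y  # a new band begins
        elif not has_ink and start is not None:
            bands.append((start, y))  # whitespace ends the band
            start = None
    if start is not None:
        bands.append((start, len(image)))  # band touches the bottom edge
    return bands


def crop_rows(image):
    """Cut the image into one sub-image per detected text row."""
    return [image[top:bottom] for top, bottom in find_row_bands(image)]
```

Columns could be found the same way by transposing the image first, for example with `list(zip(*image))`. Each cropped cell could then be passed to an OCR engine.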
I am working on an image processing and computer vision project. The project is to count the number of people entering a conference. This needs to be done in OpenCV or Python.
I have already tried the Haar Cascade that is available in OpenCV for Upper body: Detect upper body portion using OpenCV
However, it does not address the requirement. The link of the videos is as follows:
https://drive.google.com/open?id=0B3LatSCwKo2benZyVXhKLXV6R0U
If you view the sample1 file, at 0:16 a person enters the room; people will always enter this way. The camera is on top of the door.
Identifying People from this Aerial Video Stream
I think there is a simple way of approaching this problem. Background subtraction methods for detecting moving objects are just what you need because the video you provided seems to only have one moving object at any point: the person walking through the door. Thus, if you follow this tutorial in Python, you should be able to implement a satisfying solution for your problem.
Counting People Entering / Exiting
Now, the first question that pops to my mind is what might I do to count if multiple people are walking through the door at separate time intervals (one person walks in 10 seconds into the video and a second person walks in 20 seconds into the video)? Here's the simplest solution to this consideration that I can think of. Once you've detected the blob(s) via background subtraction, you only have to track the blob until it goes off the frame. Once it leaves the frame, the next blob you detect must be a new person entering the room and thus you can continue counting. If you aren't familiar with how to track objects once they have been detected, give this tutorial a read. In this manner, you'd avoid counting the same blob (i.e., the same person) entering too many times.
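The counting idea above can be sketched without any library, assuming grayscale frames stored as lists of pixel rows and a fixed, empty background frame. This is plain frame differencing, the simplest form of background subtraction, and it only handles one person in view at a time. The function names and the `threshold` and `min_pixels` values are assumptions chosen for illustration.

```python
def foreground_pixels(frame, background, threshold=30):
    """Count pixels that differ from the background by more than threshold."""
    count = 0
    for frame_row, background_row in zip(frame, background):
        for pixel, reference in zip(frame_row, background_row):
            if abs(pixel - reference) > threshold:
                count += 1
    return count


def count_entries(frames, background, min_pixels=4):
    """Count how many times a blob appears after the view was empty."""
    entries = 0
    person_in_view = False
    for frame in frames:
        present = foreground_pixels(frame, background) >= min_pixels
        # Count only the transition from "empty" to "occupied", so the
        # same blob is not counted again while it stays in the frame.
        if present and not person_in_view:
            entries += 1
        person_in_view = present
    return entries
```

In a real implementation, an OpenCV background subtractor and a proper tracker would replace the hand-written differencing, but the counting rule stays the same: one count per blob, from the frame where it appears until it leaves.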
The Difficulties in Processing Complex Dynamic Environments
If you think that there is a high level of traffic through that doorway, then the problem becomes much more difficult. This is because in that case there may not be much stationary background to subtract at any given moment, and further there may be a lot of overlap between detected blobs. There is a lot of active research in the area of autonomous pedestrian tracking and identification - so, in short, it's a difficult question that doesn't have a straightforward easy-to-implement solution. However, if you're interested in reading about some of the potential approaches you could take to solving these more challenging problems in pedestrian detection from an aerial view, I'd recommend reading the answers to this question.
I hope this helps, good luck coding!
I have a project proposal for music lovers who have no knowledge of audio processing. I think the project is interesting, but I don't have a clear picture of how to implement it.
The project proposal: Some people like singing but cannot find an appropriate musical accompaniment (background music). People who can play guitar can accompany themselves while singing (the rhythm provided by the guitar is the background music). The project aims to give singers a similar result to playing guitar while singing.
I think to implement this project, the following components are required:
Musical knowledge (how a guitar plays as background music; maybe a simple pattern will work)
signal/audio processing
Key detection
Beat detection
Chord matching
Are there any other components I have missed? Are there any libraries that can help me? The project is supposed to be completed in 1.5 months. Is that possible? (I just expect it to work like a beginner guitarist playing background music.) As for development languages, I will not use C/C++. Currently my favorite is Python, but I may use another programming language as long as it helps simplify the implementation.
I have no musical background and have only studied very basic audio processing. Any suggestions or comments are appreciated.
Edited Information:
I tried searching for auto accompaniment, and there is some software for it. I didn't find any open source project, though, and I want to know the details of how it processes audio information. If you know of any open source project for this, please share it. Thank you.
You might start by considering what a guitarist would have to do to successfully accompany a singer in a situation where they have no prior knowledge of the key, chord progression, or rhythm of the song (not to mention its structure, style, etc.)
Doing this in real-time in a situation where the accompanist (human or computer) has not heard the song before will be difficult, as it will take some time to analyse what's being sung in order to make appropriate musical choices about the accompaniment. A guitarist or other musician having this ability in the real world would be considered highly skilled.
It sounds like a very challenging project for 1.5 months if you have no musical background. 'maybe simple pattern will work' - maybe, but there are a huge number of simple patterns possible!
Less ambitious projects might be:
record a whole song and analyse it, then render a backing (still a lot of work!)
to create a single harmony line or part, in the same way that vocal harmoniser effects do
generating a backing based on a chord progression input by the user
Edit in reply to your first comment:
If you wanted to generate a full accompaniment, you will need to (as you say) deal with both the key and chord progression, and the timing (including time signature and detecting which beat of the bar is 'beat 1')
Getting this level of timing information may be difficult, as beat detection from voice alone is not going to be possible using the standard techniques used to get the beat from a song (looking for amplitude peaks in certain frequency ranges).
You might still get good results by not calculating timing at all, and simply playing your chords in time with the start of the sung notes (or a subset of them).
All you would then need to do is
detect the notes. This post is about detecting pitch in python: Python frequency detection. Amplitude detection is more straightforward.
come up with an algorithm for working out the root note of the piece (and - more ambitiously - places where it changes). In some cases it may be hard to discern from the melody alone. You could start by assuming that the first note or most common note is the root.
come up with an algorithm for generating a chord progression (do a web search for 'harmonising a melody'). Obviously there is no objectively right or wrong way to do this and you will likely only be able to do this convincingly for a limited range of styles. You might want to start by assuming a limited subset of chords, e.g. I, IV, V. These should work on most simple 'nursery rhyme' style tunes.
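The last two steps can be sketched as follows, assuming the detected notes are already available as MIDI note numbers and the tune is in a major key. The root guess uses the "most common note" heuristic mentioned above, and each note is matched to the first of I, IV, V that contains it. All names here are hypothetical, and this is one naive way to harmonise, not an established method.

```python
from collections import Counter

# Semitone offsets from the root for the major I, IV and V triads.
CHORDS = {
    "I": {0, 4, 7},
    "IV": {5, 9, 0},
    "V": {7, 11, 2},
}


def guess_root(midi_notes):
    """Guess the root as the most common pitch class (0-11) in the melody."""
    pitch_classes = [note % 12 for note in midi_notes]
    return Counter(pitch_classes).most_common(1)[0][0]


def harmonise(midi_notes, root=None):
    """Pick I, IV or V for each note; reuse the previous chord otherwise."""
    if root is None:
        root = guess_root(midi_notes)
    progression = []
    for note in midi_notes:
        degree = (note - root) % 12
        chord = next(
            (name for name in ("I", "IV", "V") if degree in CHORDS[name]),
            None,
        )
        if chord is None:
            # Note outside all three triads: keep the current chord.
            chord = progression[-1] if progression else "I"
        progression.append(chord)
    return progression
```

For the opening of a simple tune in C, such as the notes C C G G A A G with the root given as C, this picks I for the C and G notes and IV for the A notes. The most-common-note heuristic can easily pick the wrong root on short phrases, so passing the root explicitly is safer when it is known.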
Of course if you limit yourself to simple tunes that start on beat one, you might have an easier time working out the time signature. In general I think your route to success will be to try to deal with the easy cases first and then build on that.