I was wondering if anyone knows of a Python tool that finds the phonemes in a text, as well as their durations.
In short, I want a forced alignment tool like aeneas, but I want the phonemes and their duration.
Thank you!
You didn't specify what kind of data you have, but I assume it's audio files with their corresponding orthographic transcriptions.
In that case, the Montreal Forced Aligner might be suitable (there is a link to the executable on that page).
It is based on Kaldi; for a more robust and comprehensive solution, the kaldi-dnn-ali-gop repo provides more powerful options.
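If you go the MFA route, it writes Praat TextGrid files containing a phone tier, and the phoneme durations can be read out of those in a few lines. A minimal sketch, assuming the textgrid package from PyPI and MFA's default tier name "phones" (the file path is just a placeholder):

# pip install textgrid
import textgrid

# TextGrid produced by the Montreal Forced Aligner for one utterance (placeholder path)
tg = textgrid.TextGrid.fromFile("aligned/utterance1.TextGrid")

# MFA writes a word tier and a phone tier; the phone tier is usually named "phones"
phone_tier = tg.getFirst("phones")

for interval in phone_tier:
    if interval.mark:  # skip empty/silence intervals
        duration = interval.maxTime - interval.minTime
        print(f"{interval.mark}\t{interval.minTime:.3f}-{interval.maxTime:.3f}\t{duration:.3f}s")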
I have a requirement to check voice quality and rate it out of 5, where 5 indicates excellent and 1 is bad.
My research suggests POLQA can do this, but I cannot find any reference for Android integration.
I found the ViSQOL library in Python, but I need it end-to-end on Android.
Links I have found so far:
POLQA trademarks
POLQA
VisQOL in Python
Please help.
As far as I know, you need a license to use POLQA. You could use PESQ instead, which is its predecessor. PESQ is not as good as other state-of-the-art metrics, but the source code is on the ITU-T website and there is also a Python implementation at https://github.com/ludlows/python-pesq
The use of PESQ is not recommended since it is outdated, but it can be useful if you have no other choice.
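For what it's worth, a minimal sketch using that python-pesq package, assuming two 16 kHz mono WAV files (the file names are placeholders):

# pip install pesq scipy
from scipy.io import wavfile
from pesq import pesq

# Reference and degraded signals must share the same sample rate (8000 or 16000 Hz)
rate, ref = wavfile.read("reference.wav")
_, deg = wavfile.read("degraded.wav")

# 'wb' = wideband mode for 16 kHz audio; use 'nb' for narrowband 8 kHz audio
score = pesq(rate, ref, deg, 'wb')
print("PESQ MOS-LQO:", score)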
I also found this library, which is supposed to calculate POLQA, but it is in Python and I haven't tried it: https://pypi.org/project/AlgorithmLib/
Just to state it very clearly and to avoid misunderstandings: the fact that source code for e.g. PESQ is available on the ITU website does not mean that you can use it freely. Any use of PESQ or POLQA is subject to a valid license agreement with OPTICOM.
The PyPI "AlgorithmLib" package is also not a legal option, and it definitely does not implement POLQA despite stating so; it would never pass a conformance test.
We also compared POLQA, ViSQOL and AQUA scores against many thousands of subjective scores, and the difference in accuracy between the three is HUGE. POLQA is standardized, independently validated and well documented by the ITU, and it has excellent performance. ViSQOL is already far worse, even worse than PESQ, and AQUA is worse still.
Don't let anybody fool you by showing the performance of a voice quality measurement algorithm on a small number of subjective databases; POLQA was validated on 64 and trained on more than 100.
You can also use the Sevana AQUA library. It's a different algorithm, but the MOS scores match in most cases.
@Nadim Ansari, I have a similar requirement where I am recording audio/speech and need to verify that it is the same as the reference (original) audio. I am in the process of implementing ViSQOL, which works well for my purposes. This is how I started implementing it on Mac:
Download ViSQOL and all its dependencies and compile it according to the GitHub instructions
Prepare 2 files, reference and degraded, ideally 5-10 sec each, with at most 0.5 sec of silence on each side (with ffmpeg)
Convert the files to .wav (with ffmpeg)
Change the sample rate (with ffmpeg) to 16000 Hz, as required by ViSQOL for speech testing; general audio mode requires a different sample rate
Place prepared files into the project
Compare the original and degraded file, pipe the results into a file
Read the results from the file
Decide on the score, e.g. when it's < 4, fail the test scenario
Command used for comparing 2 speech files:
./bazel-bin/visqol --reference_file out_trimmed.wav --degraded_file different_file_sample_rate_changed.wav --use_speech_mode --verbose (use speech mode for comparing speech)
With the same samples it was 4.7
With a slightly different sample it was 1, so for me it looks good so far.
Currently I'm at the point where I am comparing files using the Terminal and it works very well so I am going to implement it further, probably also changing the approach so that there are fewer steps.
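In case it saves anyone some typing, here is a rough, untested Python sketch of that pipeline. It assumes ffmpeg is on the PATH, the visqol binary sits at the path shown, and that the score can be grepped from a line of the output containing "MOS-LQO"; adjust all of that to your setup.

import re
import subprocess

VISQOL = "./bazel-bin/visqol"  # path to the compiled ViSQOL binary (adjust as needed)

def to_speech_wav(src, dst):
    # Convert any input to 16 kHz mono WAV, as required by ViSQOL's speech mode
    subprocess.run(["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", "16000", dst], check=True)

def visqol_score(reference, degraded):
    # Run ViSQOL in speech mode and pull the MOS-LQO value out of its output
    result = subprocess.run(
        [VISQOL, "--reference_file", reference, "--degraded_file", degraded,
         "--use_speech_mode", "--verbose"],
        capture_output=True, text=True, check=True)
    match = re.search(r"MOS-LQO[^\d]*([\d.]+)", result.stdout)
    return float(match.group(1)) if match else None

to_speech_wav("reference_raw.m4a", "reference.wav")
to_speech_wav("degraded_raw.m4a", "degraded.wav")

score = visqol_score("reference.wav", "degraded.wav")
print("MOS-LQO:", score)
if score is None or score < 4:
    raise SystemExit("Voice quality check failed")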
I'm working on a project to break down 3D models, but I'm quite lost. I hope you can help me.
I'm getting a 3D model from Autodesk BIM and the format could be native or generic CAD formats (.stp, .igs, .x_t, .stl). Then, I need to "measure" somehow the maximum dimensions to model a raw material body, it will always have the shape of a huge panel. Once I get both bodies, I will get the difference to extract the solids I need to analyze; and, on each of these bodies, I need to extract the faces, and then the lines or curves of each face.
This sounds like something really easy to do in CAD software, but the idea is to automate this process. I was looking into OpenSCAD, but it seems it only works for modeling geometry and doesn't handle imported solids well. I'm leaving a picture with the idea of what I need to do in the link below.
So, any idea how I can do this? Which language and library could help with this project?
I can see this automation being possible with a few in-between steps:
1. OpenSCAD can handle differences well, so your "Extract Bodies" step seems plausible.
1.5 Before going further, you'll have to explain how you "filtered out" the cylinder. Will you do this manually? If you don't, it will be considered for analysis and you will have a lot of faces as a result.
2. I don't think OpenSCAD provides you a vertex array. However, it can save to .stl, which is fairly easy to parse with the programming language of your choice; you'll have to study the .stl file structure a bit (this sounds much more frightening than it is - if you open an .stl file in a text editor you will probably immediately realize what's happening). There is a rough parsing sketch after this list.
3. Once you've parsed the file, you can calculate the lines with high-school math.
This is not an easy, GUI-based way to do what you ask, but if you have a few programming skills you'll have your automation, and depending on the number of projects it may be worth it.
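Here is the rough parsing sketch mentioned in step 2. It only handles the ASCII STL variant (binary STL needs a different reader), and the file name is just a placeholder:

def read_ascii_stl(path):
    # Parse an ASCII STL file into a list of triangles, each a list of three (x, y, z) tuples
    triangles, current = [], []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if parts and parts[0] == "vertex":
                current.append(tuple(float(v) for v in parts[1:4]))
                if len(current) == 3:
                    triangles.append(current)
                    current = []
    return triangles

def triangle_edges(triangles):
    # Yield the three edges of every triangle as ((x1, y1, z1), (x2, y2, z2)) pairs
    for a, b, c in triangles:
        yield (a, b)
        yield (b, c)
        yield (c, a)

tris = read_ascii_stl("difference_result.stl")  # e.g. the .stl exported from OpenSCAD
print(len(tris), "triangles")
for edge in triangle_edges(tris[:2]):
    print(edge)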
I have been working on this project and found that the "trimesh" library is a better fit for this problem. Give it a shot and save some time.
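For example, a minimal sketch of the measuring part with trimesh, assuming an .stl export of the part (the boolean difference needs one of trimesh's optional backends installed, so treat that line as optional):

# pip install trimesh numpy
import trimesh

# Load the CAD export (trimesh reads STL directly; STEP/IGES would need conversion first)
part = trimesh.load("panel_part.stl")  # placeholder file name

# Maximum dimensions of the part -> size of the raw material body
bounds = part.bounds                   # [[min_x, min_y, min_z], [max_x, max_y, max_z]]
extents = bounds[1] - bounds[0]
print("raw panel extents (x, y, z):", extents)

# Model the raw panel as a box of those extents, centred on the part
center = bounds.mean(axis=0)
stock = trimesh.creation.box(extents=extents,
                             transform=trimesh.transformations.translation_matrix(center))

# Boolean difference to get the material that has to be removed (needs a boolean backend)
removed = stock.difference(part)

# Triangles and vertices of the resulting solid; facets group coplanar triangles into faces
print(removed.vertices.shape, removed.faces.shape)
print("number of planar faces:", len(removed.facets))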
I am trying to solve what I have realized is quite a hard problem to address due to my lack of expertise in the subject. Suppose I have an image of a table with 3 rows and 5 columns. Each row contains text (let's assume only english for now) or numbers (normal Indo-Arabic numerals). There is nothing but whitespace between the columns and between each row. Now assuming all rows and all columns are aligned, my task would be to get an algorithm to recognize and extract each row out from the document (don't know if I'm articulating this well enough).
Could someone suggest a good starting point (library, similar example, textbook chapter that deals with something like this, etc.) for me to get started?
My background is data science but I have just never been exposed to computer vision.
Any help would be appreciated.
You should start off with OpenCV, like Racialz suggested. This tool contains a Hough lines/Hough transform method, which should be the primary and easiest way for you to find and crop text from table sections. There are many different line-finding tasks for which people use this algorithm (like THIS or THIS), but your task should be much easier, because the lines should be much clearer and simpler than in those examples. After you do your extraction, you will then need to OCR the text; for this I would suggest the Tesseract OCR engine. This engine is free, really easy to use, provides pretty decent results, and allows you to train it to recognize specific types of letters.
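A rough sketch of that pipeline, assuming the table actually has drawn ruling lines between the rows (if only whitespace separates them, summing ink per pixel row and cutting at the blank bands works instead); OpenCV, numpy and pytesseract are required, plus a local Tesseract install, and the image name is a placeholder:

# pip install opencv-python numpy pytesseract  (Tesseract itself is a separate install)
import cv2
import numpy as np
import pytesseract

img = cv2.imread("table.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

# Probabilistic Hough transform; keep only the (roughly) horizontal lines of the table grid
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=100,
                        minLineLength=img.shape[1] // 2, maxLineGap=10)
row_ys = sorted({y1 for x1, y1, x2, y2 in lines[:, 0] if abs(y2 - y1) < 5})

# OCR the horizontal strip between each pair of consecutive lines
for top, bottom in zip(row_ys, row_ys[1:]):
    if bottom - top < 10:   # skip near-duplicate detections of the same line
        continue
    row = gray[top:bottom, :]
    print(repr(pytesseract.image_to_string(row).strip()))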
Is there a good way to identify (or at least approximate) the graphics program used to obtain a particular image? For instance, I want to know if there is a certain signature that these programs embed into an image. Any suggestions?
If not, is there a reference where I can find what all meta-information can be extracted out of an image?
Certain image file formats do have meta-data. It is format dependent. Digital cameras usually write some of their information into the meta-data. EXIF is what comes to mind. Images not acquired through a digital camera may or may not have relevant meta-data, so you can't consider meta-data of any sort to be a guaranteed reliable identifier. That's about as much as I can give as an answer, alas. I'm sure someone else may have more details.
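If you want to poke at whatever metadata a file does carry, here is a small sketch with Pillow; the "Software" EXIF tag, when present, is the field most likely to name the program that produced or last edited the image (the file name is a placeholder):

# pip install Pillow
from PIL import Image
from PIL.ExifTags import TAGS

img = Image.open("photo.jpg")
exif = img.getexif()

for tag_id, value in exif.items():
    name = TAGS.get(tag_id, tag_id)  # map numeric EXIF tag IDs to readable names
    print(f"{name}: {value}")

# The 'Software' tag (0x0131) often records the camera firmware or editing program
print("Software:", exif.get(0x0131))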
I have got some samples showing how to open a presentation and access the slides and shapes, but I want to do some other operations (e.g. generate a thumbnail from a specified slide). What methods can I use? Is there any document illustrating all the functionality?
Not to discourage you, but my experience using COM from Python is that you won't find many examples.
I would be shocked (but happy to see) if anybody posted a big tutorial or reference using PowerPoint in Python. Probably the best you'll find, which you've probably already found, is this article
However, if you follow along through that article and some of the other Python+COM code around, you start to see the patterns of how VB and C# code converts to Python code using the same interfaces.
Once you understand that, your best source of information is probably the PowerPoint API reference on MSDN.
From looking at the samples Jeremiah pointed to, it looks like you'd start there then do something like this, assuming you wanted to export slide #42:
Slide = Presentation.Slides(42)
Slide.Export(FileName, "PNG", 1024, 768)
Substitute the full path\filename.ext of the file you want to export for FileName; it is a string.
Use PNG, JPG, GIF, WMF, EMF, TIF (not always a good idea from PowerPoint), etc. for the format; also a string.
The next two numbers are the width and height (in pixels) at which to export the image; they are VB Longs (signed 32-bit (4-byte) numbers ranging in value from -2,147,483,648 to 2,147,483,647).
I've petted pythons but never coded in them; this is my best guess as to syntax. Shouldn't be too much of a stretch to fix any errors.
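For anyone who wants an actual-Python starting point, an untested sketch using pywin32 (the paths and slide number are placeholders):

# pip install pywin32
import win32com.client

# Start (or attach to) PowerPoint via COM
app = win32com.client.Dispatch("PowerPoint.Application")
app.Visible = True  # PowerPoint automation is often unhappy when run fully hidden

pres = app.Presentations.Open(r"C:\decks\example.pptx")

# Export slide #42 as a 1024x768 PNG - the same call as above, just in Python syntax
slide = pres.Slides(42)
slide.Export(r"C:\decks\slide42.png", "PNG", 1024, 768)

pres.Close()
app.Quit()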