Incorporating Lmtool into PocketSphinx? - python

I am trying to create a simple way to add new keywords to PocketSphinx. The idea is to have a temporary text file that a script can use to add a word (or phrase) automatically to corpus.txt, dictionary.dic and language_model.lm.
Currently the best way to do this seems to be to use lmtool and then replace the aforementioned files with the updated versions. However, this presents three problems:
Lmtool is slow for large vocabularies, so the process gets progressively slower as more words are added.
Lmtool requires a semi-reliable internet connection, and I'd like to be able to add commands while offline.
It is not an efficient way to add commands, and it won't work with the setup I'm putting together.
What I'd like to do (if possible) is use or create an offline version of lmtool that takes input from a temporary text file (input.txt), processes it, and prints the results into three temporary text files (dic.txt, lm.txt, corp.txt).
The last step would be to run a script (a rough sketch follows the list) that will:
Take the output in corp.txt and add it to the end of corpus.txt.
Look through dictionary.dic and add any new words in dic.txt.
Somehow modify language_model.lm to include the new terms in lm.txt.
Erase the contents of the three output files.
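A rough sketch of such a merge script, using the file names from above (note that an ARPA language model stores n-gram probabilities, so lm.txt cannot simply be appended; step 3 has to be a regeneration step):

import shutil

# 1. Append the new sentences to the end of the corpus.
with open('corp.txt') as src, open('corpus.txt', 'a') as dst:
    shutil.copyfileobj(src, dst)

# 2. Add only genuinely new words to the dictionary.
with open('dictionary.dic') as f:
    known = {line.split()[0] for line in f if line.strip()}
with open('dic.txt') as src, open('dictionary.dic', 'a') as dst:
    for line in src:
        if line.split() and line.split()[0] not in known:
            dst.write(line)

# 3. Appending lm.txt to language_model.lm won't produce a valid ARPA
#    model; regenerate the .lm from the updated corpus instead
#    (see the answer below).

# 4. Erase the contents of the three temporary files.
for name in ('dic.txt', 'lm.txt', 'corp.txt'):
    open(name, 'w').close()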
My question is whether it is possible to get an offline version of lmtool that is capable of outputting results into specific text files. I know it is possible to automate lmtool (according to their site), but I would like to be able to run the process offline if possible.
Also, has anyone attempted something like this before that I can use as a guide?
I am running PocketSphinx on a Raspberry Pi, and I am aware that it will likely not be able to run lmtool on its own. My plan is to have lmtool run on a local server and sync files with the Pi via Wi-Fi/Ethernet.
Any help would be appreciated.

You have a few choices if you want to generate the dictionary and language model locally on a Raspberry Pi (Model 2B at least).
For Language Model generation, you can use either
CMUCLMTK (a sketch follows this list), or
SRILM (SRI Language Modeling Toolkit)
To compile SRILM on Raspbian you need to tweak some files.
Take a look here https://github.com/G10DRAS/SRILM-on-RaspberryPi
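As a sketch of the CMUCLMTK route: the standard pipeline (text2wfreq, wfreq2vocab, text2idngram, idngram2lm) can be driven from Python, assuming the CMUCLMTK binaries are on your PATH:

import subprocess

def run(cmd, stdin=None, stdout=None):
    subprocess.run(cmd, stdin=stdin, stdout=stdout, check=True)

# corpus.txt -> word frequencies -> vocabulary
with open('corpus.txt') as c, open('corpus.wfreq', 'w') as w:
    run(['text2wfreq'], stdin=c, stdout=w)
with open('corpus.wfreq') as w, open('corpus.vocab', 'w') as v:
    run(['wfreq2vocab'], stdin=w, stdout=v)

# corpus + vocabulary -> id n-grams -> ARPA language model
with open('corpus.txt') as c:
    run(['text2idngram', '-vocab', 'corpus.vocab',
         '-idngram', 'corpus.idngram'], stdin=c)
run(['idngram2lm', '-vocab_type', '0', '-idngram', 'corpus.idngram',
     '-vocab', 'corpus.vocab', '-arpa', 'language_model.lm'])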
For dictionary generation, you can use either
Phonetisaurus with a G2P model available here (or you can generate the FST yourself using phonetisaurus-cmudict-split; a sketch follows this list), or
g2p-seq2seq (Sequence-to-Sequence G2P toolkit)
g2p-seq2seq is based on TensorFlow, which is not officially supported on the Raspberry Pi. For more details see Installing TensorFlow on Raspberry Pi 3.
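A sketch of driving Phonetisaurus from Python, assuming a trained or downloaded model.fst and the phonetisaurus-apply helper that ships with recent Phonetisaurus builds (it prints word/pronunciation pairs to stdout):

import subprocess

# new_words.txt: one word per line; output lines are "word<TAB>phones"
with open('dic.txt', 'w') as out:
    subprocess.run(['phonetisaurus-apply', '--model', 'model.fst',
                    '--word_list', 'new_words.txt'],
                   stdout=out, check=True)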
For more details (usage, how to compile, etc.) please go through the documentation of the respective toolkits.

Related

I want to use PyWebIO to create a web app that cleans Excel files and outputs the cleaned versions. Is this possible, and if so, how?

I started a new job in which they have zero optimization. Essentially, I want to use some code I have pretty much finished to clean Excel files and output only the values that meet certain parameters. My boss said that I'd only be allowed to do this if it were a web application, so I'm wondering whether that's possible and, if so, how I would go about it. If it can't be done using Python, what would be another easy option to learn in which the code would be similar to what I wrote for the parameters in Python?
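For reference, a minimal sketch of such an app in PyWebIO with pandas (the column name and filter condition are hypothetical placeholders; reading and writing .xlsx also needs openpyxl installed):

from io import BytesIO
import pandas as pd
from pywebio import start_server
from pywebio.input import file_upload
from pywebio.output import put_file

def clean_app():
    # upload the raw workbook
    f = file_upload('Upload an Excel file', accept='.xlsx')
    df = pd.read_excel(BytesIO(f['content']))
    # keep only rows that meet the parameters (placeholder condition)
    cleaned = df[df['value'] > 0]
    # offer the cleaned workbook back as a download
    buf = BytesIO()
    cleaned.to_excel(buf, index=False)
    put_file('cleaned.xlsx', buf.getvalue(), 'Download cleaned file')

if __name__ == '__main__':
    start_server(clean_app, port=8080)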

How to protect your code during cloud computing?

Before I buy my first setup, I'll launch my deep-learning pipeline on something like vast.ai.
I have never done this before, so how can I protect my script from being "stolen"?
This will be a serious launch and take around 7 days to finish training.
Google Colab doesn't allow enough memory and RAM for what I need (I need around 64 GB of RAM).
Is there a way to run a Python script encrypted? (Note: it makes use of libraries.)
It is hard to run Python code encrypted. However, you could try storing the code on encrypted disk space.
There are some ways, from fully obfuscating your script, to converting it into equivalent Cython, to creating an executable out of it using the likes of Nuitka.
You may also implement some of the important logic in C/C++ (as extensions) and then call it from your script.
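As a sketch of the Cython route, assuming the sensitive logic lives in a hypothetical secret_logic.py and Cython is installed:

# setup.py
from setuptools import setup
from Cython.Build import cythonize

# compiles secret_logic.py into a C extension module
setup(ext_modules=cythonize('secret_logic.py'))

Running python setup.py build_ext --inplace then produces a binary .so/.pyd that is far harder to read than the source, though determined reverse engineering is still possible.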
You may also set up a server somewhere you trust and send it the bits that need to be executed, essentially creating a distributed system.
As you can see there are many ways, and the deeper you go the more complex it gets.
You might also want to have a look here.

How to verify if two systems are in sync

I have a requirement to test two applications (via automation using Python).
The requirement is, for example: we have a system called "www.abc.com" where we develop and merge code every 2 weeks, and another system called "www.xyz.com" (basically a backup of the first system). Every time we do a release and add/edit something in the main system, we update our backup system.
Now the question: I need to test both systems after every release (every 2 weeks) to see if they are in sync (identical).
How do I write a Python automation test script (multiple tests) to check whether, for example, the databases, servers, UI, front end, and code base are the same in both systems? If that is possible, please suggest how, so that I can implement a solution; any help and advice would be appreciated.
There are several ways you could approach this:
Assuming you are using some sort of source control, you could write a script to make sure that the repo is up to date and then report back the results. See here and here. This probably won't cover the data in your databases, but there are numerous ways to compare database backups, and it will depend on what programs you are using.
Another, or additional, check is to write a script that gathers a list of hashes or checksums of all the files you care about on both systems and then compares the lists for differences (see the sketch below).
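A sketch of that checksum idea (the root paths are hypothetical; in practice you would run this on each system and compare the resulting manifests):

import hashlib
from pathlib import Path

def manifest(root):
    # map each file's relative path to its SHA-256 digest
    result = {}
    for path in Path(root).rglob('*'):
        if path.is_file():
            result[str(path.relative_to(root))] = hashlib.sha256(
                path.read_bytes()).hexdigest()
    return result

abc = manifest('/deploys/abc')
xyz = manifest('/deploys/xyz')
diff = {k for k in abc.keys() | xyz.keys() if abc.get(k) != xyz.get(k)}
print('in sync' if not diff else 'out of sync: %s' % sorted(diff))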

How best to enable non-programmers to run a Python program

I have written a Python script which models an academic problem that I wish to publish. I will put the source on GitHub, and some academics who happen to know Python may get the source and play with it themselves. However, there are probably more academics who may be interested in the model but are not Python programmers, and I would like them to be able to run my model too. Even though they are not programmers, they could at least try editing the values of some parameters to see how that affects the results. So my question is how I could arrange for a non-Python-programmer to run a Python program as easily (for them) as possible. I would guess that my options may be...
google colab
an online python compiler like this one
compiling the program into an exe (and letting the user set parameters via a config file)
something else?
So now, a couple of complications that make my problem trickier.
The output of the program is graphical and uses matplotlib. As I understand it, the utilities that turn Python scripts into exe files struggle, or fail altogether, when it comes to matplotlib.
The source is split into two separate files: one small, neat file which contains the model, which the user might like to have a good look at to get the gist of it even if they're not really a Python programmer, and a separate large, ugly file which just handles the graphics. An academic would have no interest in the latter, and I'd like to spare them the gory details.
EDIT: I did ask a related question here, but that was all about programmers who won't mind doing things like installing Python and using pip... this question concerns non-programmers who would not be comfortable doing things like that.
Colab can handle both problems, but you may need to adapt some code.
Matplotlib interface: Colab can display plots just fine, but you may want users to interact with sliders, checkboxes, or dropdown menus. Then you need to use Colab's own form UI or ipywidgets (a minimal sketch follows). See an example here.
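A minimal sketch of an ipywidgets slider in a notebook cell (run_model here is a hypothetical stand-in for the academic model):

import matplotlib.pyplot as plt
from ipywidgets import interact

def run_model(rate):
    # placeholder for the real model
    return [rate * x for x in range(10)]

# re-runs and re-plots whenever the slider moves
@interact(rate=(0.0, 1.0, 0.05))
def show(rate):
    plt.plot(run_model(rate))
    plt.show()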
Two separate Python files: you can convert one of them to a notebook and then import the other, or you can create a new notebook that imports both files. Here's an example.

Utility to manage multiple python scripts

I saw this post on Medium and wondered how one might go about managing multiple Python scripts.
How I Hacked Amazon's Wifi Button
This describes a system where you need to run one or more scripts continuously to catch and react to events in your network.
My question: let's say I had multiple Python scripts that I wanted to run while I work on other things. What approaches are available to manage these scripts? I have to imagine there is a better way than having a large number of terminal windows, each running a script individually.
I am coming back to Python and have no formal training in computer programming, so any guidance you can provide will be greatly appreciated.
Let's say I had multiple python scripts that I wanted to run. What approaches are available to manage these scripts? I have to imagine there is a better way than having a large number of terminal windows running each script individually.
If you have several .py files in a directory that you want to run, without needing a specific order, you can do:
import glob
import runpy

# run every .py file found in the directory, one after the other
for py_file in glob.glob('path/*.py'):
    runpy.run_path(py_file)  # Python 3 replacement for the old execfile()
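If the scripts are long-running and should run concurrently rather than one after another, a sketch of a subprocess-based variation, with each script's output going to its own log file:

import glob
import subprocess

procs = []
for py_file in glob.glob('path/*.py'):
    log = open(py_file + '.log', 'w')
    procs.append((subprocess.Popen(['python3', py_file],
                                   stdout=log, stderr=subprocess.STDOUT), log))

# do other work here, then reap the processes when done
for proc, log in procs:
    proc.wait()
    log.close()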
Your system already runs a large number of background processes, with output to the system log or occasionally to a service-specific log file.
A common arrangement for quick and dirty deployments -- where you don't necessarily want to invest in making the scripts robust and well-behaved enough to run as proper services -- is to start the script inside screen or tmux. You can detach when you don't need to be looking at it, and can reattach at any time -- even from a remote login -- to view the output, or to troubleshoot.
Take a look at luigi (I've not used it).
https://github.com/spotify/luigi
These days (five years after the question was asked) a lot of people use Docker Compose, but that's a little heavyweight depending on what you want to do.
I just saw the script-server project by bugy today. Maybe it is a solution for you or somebody else.
(I am just trying to find a Tampermonkey-like script structure for Python...)
