I am working on a project based on natural language understanding.
So, what I am currently doing is trying to resolve pronouns to their respective antecedents, for which I am building a model. I have worked out the basic part, but to complete the task I need to understand the narrative of the sentence. What I want is to check, using a Python API, whether a subject and an object can be associated with each other through a given verb.
Example:
method(laptop, have, operating-system) = yes
method(program, have, operating-system) = no
method("he"/"proper_noun", play, football) = yes
method("he"/"proper_noun", play, college) = no
I've heard about NLTK's WordNet API, but I am not sure whether I can use it for this. Can it be used?
Also, I am somewhat short on time.
Any suggestions are welcome and appreciated.
Note: I am using Parsey McParseface to parse the sentence. I could do the same with NLTK, but Parsey McParseface is more accurate.
Why isn't there an NLU tag available?
Edit 1:
Thanks to alexis: the thing I am trying to do is called "anaphora resolution".
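To make the method(...) interface above concrete, here is a minimal sketch of a selectional-preference lookup. The CATEGORIES and ALLOWED tables are hypothetical and hand-written just for the four examples; in practice such preferences would have to be learned from a corpus, or the categories could be taken from a resource such as WordNet's hypernym hierarchy.

```python
# Hypothetical categories for each word (hand-written for the examples above).
CATEGORIES = {
    "laptop": {"computer"},
    "program": {"software"},
    "operating-system": {"software"},
    "football": {"sport"},
    "college": {"institution"},
    "he": {"person"},
    "proper_noun": {"person"},
}

# Hypothetical selectional preferences: (subject category, verb, object category).
ALLOWED = {
    ("computer", "have", "software"),
    ("person", "play", "sport"),
}


def method(subject, verb, obj):
    """Return True if some category of subject and some category of obj are allowed with verb."""
    subject_cats = CATEGORIES.get(subject, {subject})
    object_cats = CATEGORIES.get(obj, {obj})
    return any((s, verb, o) in ALLOWED for s in subject_cats for o in object_cats)


print(method("laptop", "have", "operating-system"))  # True
print(method("he", "play", "college"))  # False
```

The lookup is only as good as its tables, which is the hard part of the real problem.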
The name for what you want is "anaphora resolution", or "coreference resolution". It's a hard problem (probably harder than you realize; NLP tasks are like that), so unless your purpose is just to learn, I recommend trying some existing solutions. I don't know of an anaphora resolution module in NLTK itself, but you can find one as part of the Stanford CoreNLP suite.
See this question about how to interface to it from NLTK. (I haven't tried it myself.)
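For contrast with a full system like CoreNLP, a very naive baseline is to link each pronoun to the most recent preceding noun that agrees in gender and number. The sketch below assumes the sentence is already tokenized and uses a tiny hypothetical feature lexicon; it ignores syntax entirely, which is why real coreference resolvers are far more involved.

```python
# Hypothetical (gender, number) features for candidate antecedents.
NOUN_FEATURES = {
    "john": ("male", "sg"),
    "mary": ("female", "sg"),
    "laptop": ("neuter", "sg"),
    "books": ("neuter", "pl"),
}

# Pronoun features; None means the pronoun does not constrain gender.
PRONOUN_FEATURES = {
    "he": ("male", "sg"), "him": ("male", "sg"),
    "she": ("female", "sg"), "her": ("female", "sg"),
    "it": ("neuter", "sg"),
    "they": (None, "pl"), "them": (None, "pl"),
}


def resolve_pronouns(tokens):
    """Map each pronoun index to the index of the most recent compatible noun."""
    links = {}
    for i, token in enumerate(tokens):
        word = token.lower()
        if word not in PRONOUN_FEATURES:
            continue
        gender, number = PRONOUN_FEATURES[word]
        # Walk backwards from the pronoun and stop at the first agreeing noun.
        for j in range(i - 1, -1, -1):
            features = NOUN_FEATURES.get(tokens[j].lower())
            if features is None:
                continue
            if features[1] == number and (gender is None or features[0] == gender):
                links[i] = j
                break
    return links


print(resolve_pronouns("John bought a laptop because he needed it".split()))  # {5: 0, 7: 3}
```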
I am working on a machine learning chatbot project that uses Google's speech recognition API.
Now my problem is that when I say two or more sentences in one command, the speech recognition API returns all of them in one string, without any full stops or commas. As a result, it has become harder to separate the sentences. For example, if I say,
Take a photo. Tell me about today's weather. Open Google Chrome.
the speech recognition API returns:
take a photo tell me about todays weather open Google Chrome
So my chatbot treats this whole string as one sentence.
Is there any way to extract sentences from a string like the one above?
(BTW, I am using Python)
If you are going to say multiple commands, say a word like "and" between them and split the command on that word. Then loop through the resulting list and pass each item to your execute function.
If the variable command holds the transcript, split it with command.split(" and ").
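A minimal sketch of this approach, assuming a hypothetical execute() function that stands in for your chatbot's own command handler:

```python
def execute(command):
    """Hypothetical command handler; replace with your chatbot's real dispatcher."""
    return f"executing: {command}"


def run_commands(transcript, separator=" and "):
    """Split the transcript on the separator word and execute each part in order."""
    parts = [part.strip() for part in transcript.split(separator)]
    return [execute(part) for part in parts if part]


print(run_commands("take a photo and open Google Chrome"))
```

Note that the split is case-sensitive and will also cut a single command that happens to contain the word "and".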
I previously answered a similar question; take a look at it:
https://stackoverflow.com/a/65872940/12279129
I think you could try different approaches to solve the problem:
A Naive solution
I don't know how your system currently works, but if you are only looking for a fixed set of sub-sentences (commands), you could search the full transcript for each of them.
For example:
input_str = "Take a photo turn on fan".lower()
if "take a photo" in input_str:
    print("Just took a photo!")
if "turn on fan" in input_str:
    print("Just turned the fan on!")
Of course, you could also choose a separator word (like "and", "furthermore", ...) and split on it.
A more advanced solution
You could use an NLP library (e.g. spaCy) and perform part-of-speech tagging so that you can separate verbs from nouns, and so on.
After that, you could make use of stemming and lemmatization to further generalize the recognition.
You could also perform many intermediate steps with different NLP techniques, like stopword removal.
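A part-of-speech tagger would label words like "take", "tell" and "open" as verbs, and each command in the example transcript starts with one. The sketch below stands in for the tagger with a hand-written COMMAND_VERBS set (a hypothetical assumption, not a spaCy API), so it runs without any model downloads:

```python
# Hypothetical set of verbs that start a command in this chatbot.
COMMAND_VERBS = {"take", "tell", "open", "turn", "play", "set"}


def split_on_command_verbs(transcript):
    """Start a new command whenever a token is one of COMMAND_VERBS."""
    commands = []
    current = []
    for token in transcript.split():
        if token.lower() in COMMAND_VERBS and current:
            commands.append(" ".join(current))
            current = []
        current.append(token)
    if current:
        commands.append(" ".join(current))
    return commands


print(split_on_command_verbs("take a photo tell me about todays weather open Google Chrome"))
```

Replacing the set lookup with a real tagger's verb label (and lemmatizing each token first) would make the split more general.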
Try auto punctuation from API
Maybe you can try enabling automatic punctuation in the speech-to-text API and see whether it works well enough for you. (In Google Cloud Speech-to-Text this is the enable_automatic_punctuation field of the recognition config.)
That's because Google Cloud Speech doesn't provide natural language understanding, so you are stuck parsing text transcripts.
You can of course create the natural language understanding component yourself, either by using simple regular expressions or using something like Rasa, but there's a smarter way, too.
Speechly provides you with everything you need to create voice user interfaces on Android, iOS or the web. It returns not only the transcript but also actionable intents and entities, which make it a lot easier to build something a bit more complex. The best part is that it's free for up to 20 hours a month.
You can see a very simple example of how it works, for instance for creating search experiences, here. The basic idea is always the same, though: create a model and test that it returns the correct intents for your speech input. When you are done, you integrate it into your app by looping through the returned results and, whenever you get the right intent, reacting in your application as needed. It's actually very simple.
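The "simple regular expressions" route mentioned in this answer can be sketched as a list of intent patterns whose named groups act as entities. The intent names and patterns are hypothetical examples built around the commands from the question:

```python
import re

# Hypothetical intent patterns; named groups become entities (slots).
INTENT_PATTERNS = [
    ("take_photo", re.compile(r"\btake a (?:photo|picture)\b")),
    ("weather", re.compile(r"\btell me about (?P<day>today'?s|tomorrow'?s) weather\b")),
    ("open_app", re.compile(r"\bopen (?P<app>google chrome|firefox)\b")),
]


def find_intents(transcript):
    """Return (intent, slots) pairs in the order they occur in the transcript."""
    text = transcript.lower()
    found = []
    for intent, pattern in INTENT_PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), intent, match.groupdict()))
    found.sort(key=lambda item: item[0])  # restore spoken order
    return [(intent, slots) for _, intent, slots in found]


for intent, slots in find_intents("take a photo tell me about todays weather open Google Chrome"):
    print(intent, slots)
```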
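You can use the split method. If your string is A:
X = [s.strip() for s in A.split('.') if s.strip()]
This makes X a list whose items are the sentences. Note that this only works if the transcript contains full stops, e.g. with automatic punctuation enabled.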
I am new to programming, and I am trying to understand transliteration - like the Google Input Tools that will allow the user to type from one language to another language.
How does transliteration work? Specifically, if I am translating from English to Hindi or from English to Russian, do I need to incorporate a dictionary of words for the English, Hindi and Russian languages?
Does anyone know of any tutorials showing how to write transliteration code? I have tried searching, but with no luck.
Also, does the code have to be in JavaScript/jQuery (client-side code)? My project uses Python/Django. Can I write the transliteration code in Python/Django?
Thanks.
Direct dictionary-to-dictionary automatic translation produces poor results because of differences in grammar and the presence of idiomatic expressions. In my experience, the starting point in Python should be the NLTK (Natural Language Toolkit) libraries and tutorials.
For working examples, you could start from here:
Machine Translation using babelize_shell() in NLTK
Translating human languages in Python
Google is your friend
Bing is your friend
Whether you use JavaScript/jQuery depends on the UI you are planning: maybe you want to trigger an automatic translation after a few keypresses, or on the onblur or onchange event of an input tag. It is not relevant to the translation itself.
The translation process is also very resource-intensive, so I would discourage doing it inside a Django view. My suggestion is not to reinvent the wheel and to use an existing API, such as Google's or Bing's.
I found that the better search term is "input method editor" (IME), not "transliteration".
There is a project on GitHub, https://github.com/wikimedia/jquery.ime, that deals with IMEs and transliteration.
I hope this helps someone.
The typical way of implementing transliteration is to use a mapping dictionary. An example of this can be seen in the mapping.py file for the CyrTranslit Python package.
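The mapping-dictionary idea can be sketched as greedy longest-match replacement, so that multi-letter sequences like "sh" win over single letters. The table below is a tiny hypothetical subset written for illustration, not CyrTranslit's actual mapping.py, and it lowercases its input, so case is lost.

```python
# Hypothetical, deliberately tiny Latin-to-Cyrillic table.
LATIN_TO_CYRILLIC = {
    "shch": "щ", "zh": "ж", "kh": "х", "ts": "ц", "ch": "ч", "sh": "ш",
    "ya": "я", "yu": "ю",
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е", "z": "з",
    "i": "и", "y": "й", "k": "к", "l": "л", "m": "м", "n": "н", "o": "о",
    "p": "п", "r": "р", "s": "с", "t": "т", "u": "у", "f": "ф",
}
MAX_KEY = max(len(key) for key in LATIN_TO_CYRILLIC)


def transliterate(text):
    """Greedy longest-match transliteration; unmapped characters pass through."""
    lower = text.lower()
    result = []
    i = 0
    while i < len(lower):
        # Try the longest possible key first, then shorter ones.
        for size in range(MAX_KEY, 0, -1):
            chunk = lower[i:i + size]
            if chunk in LATIN_TO_CYRILLIC:
                result.append(LATIN_TO_CYRILLIC[chunk])
                i += len(chunk)
                break
        else:
            result.append(lower[i])
            i += 1
    return "".join(result)


print(transliterate("privet"))  # привет
```

For Hindi, a flat character table is not enough on its own, because Devanagari vowels that follow a consonant are written as dependent vowel signs rather than standalone letters.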
Word translation uses a database to convert English words into Hindi words.
Some apps are based on this concept, for example:
English to Hindi Dictionary
I have been wanting to create an application using Microsoft Speech Recognition.
My application's users are expected to often say abbreviations, such as 'LHC' for 'Large Hadron Collider', or 'CERN'. If a user says those two terms in that order, my application returns:
You said: At age C.
You said: Cern
While it did work for 'CERN', it failed very badly for 'LHC'.
However, if I could make my own custom training files, I could easily place the term 'LHC' somewhere in there. Then, I could make the user access the Speech Control Panel and run my training file.
All the links I have found for this have been frustratingly useless, as they just say things like 'This is ----, you should try going to the ---- forum instead'.
In case it helps, here is a list of the links:
http://compgroups.net/comp.speech.users/add-my-own-training/153194
https://groups.google.com/forum/#!topic/microsoft.public.speech.server/v58SH1ov22s
http://social.msdn.microsoft.com/Forums/en/servercorefordevelopers/thread/f7a35f3f-b352-464a-b264-e16eb4afd049
Is my problem even possible? Or are the training files themselves in a special format? If so, can that format be reproduced?
A solution that can also work on Windows XP would be ideal.
Thanks in advance!
P.S. If there are any libraries or modules out there for this already, could anyone point me to them? A Python or C/C++ solution would be splendid. Also, since I'd rather not post another question about this: is it possible to use the training utilities from the command prompt (or without the GUI visible, while still having full control over them)?
Okay, pulling this from a thing I wrote three or four years ago now, but I believe you want to do something like this.
The grammar library is a trained system which can recognize words. You can create your own grammar library cued to specific words.
C#, sorry
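using System;
using System.Speech.Recognition;
SpeechRecognitionEngine sre = new SpeechRecognitionEngine();
string[] words = { "L H C", "CERN" };
Choices choices = new Choices(words);
GrammarBuilder gb = new GrammarBuilder(choices);
Grammar grammar = new Grammar(gb);
sre.LoadGrammar(grammar);
// Print whatever the engine recognizes from the microphone.
sre.SpeechRecognized += (sender, e) => Console.WriteLine("You said: " + e.Result.Text);
sre.SetInputToDefaultAudioDevice();
sre.RecognizeAsync(RecognizeMode.Multiple);
Console.ReadLine(); // keep the process alive while listening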
That is as far as I can get you. From the docs, it looks like you can define pronunciations somehow, so perhaps that way you could map LHC directly to a single word. Here are the docs on the Grammar class: http://msdn.microsoft.com/en-us/library/system.speech.recognition.grammar.aspx
Small update: see the example in their docs here: http://msdn.microsoft.com/en-us/library/ms554228.aspx
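On the application side, whatever phrase the grammar matches still has to be turned back into the term you want. In Python that can be a small normalization table; CANONICAL_TERMS is hypothetical and would hold the spelled-out forms you put in your grammar:

```python
# Hypothetical table: normalized recognizer output -> canonical term.
CANONICAL_TERMS = {
    "l h c": "LHC",
    "cern": "CERN",
}


def normalize(recognized_text):
    """Lowercase, drop full stops and collapse spaces, then look the phrase up."""
    key = " ".join(recognized_text.lower().replace(".", " ").split())
    return CANONICAL_TERMS.get(key, recognized_text)


print(normalize("L H C"))  # LHC
```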
I want code to tag idioms in a given sentence or text using NLTK and Python.
Depends what you mean by an "idiom". Joe's suggestion of POS tagging is probably a good start - and might be what you are really after. If so, go read "Natural Language Processing with Python" by Bird et al. It is published by O'Reilly but is also available online under a Creative Commons license. This will get you started with POS tagging. It also has a good review of NLTK's abilities. For example, can some "Named Entity Recognition" techniques be adapted to do what you want? Or perhaps what you want is simply too difficult. I suspect the latter is the case (as implied by Rafi) but you will find that out in your journey. Perhaps you'll develop something new during your journey, in which case I hope you give back to the NLTK community.
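If "idiom" means a fixed multi-word expression, a minimal lexicon-based sketch is to tag any span that matches a known idiom, using greedy longest match and B/I/O-style tags. The IDIOMS list is hypothetical; NLTK's MWETokenizer (in nltk.tokenize) does a similar merge of known multi-word expressions if you would rather stay inside NLTK.

```python
# Hypothetical idiom lexicon; each idiom is a tuple of lowercase tokens.
IDIOMS = [
    ("kick", "the", "bucket"),
    ("spill", "the", "beans"),
    ("break", "the", "ice"),
    ("under", "the", "weather"),
]
IDIOM_SET = set(IDIOMS)
MAX_LEN = max(len(idiom) for idiom in IDIOMS)


def tag_idioms(tokens):
    """Return (token, tag) pairs with B-IDIOM/I-IDIOM/O tags via greedy longest match."""
    lowered = [token.lower() for token in tokens]
    tagged = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first (idioms here are at least 2 tokens).
        for size in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            if tuple(lowered[i:i + size]) in IDIOM_SET:
                tagged.append((tokens[i], "B-IDIOM"))
                tagged.extend((tok, "I-IDIOM") for tok in tokens[i + 1:i + size])
                i += size
                break
        else:
            tagged.append((tokens[i], "O"))
            i += 1
    return tagged


print(tag_idioms("I am feeling under the weather today".split()))
```

Inflected forms such as "kicked the bucket" will not match unless you lemmatize the tokens first.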
For my GAE app I need to do some natural language processing to extract the subject and object from an input sentence.
Apparently NLTK can't be installed (easily) on GAE so I am looking for another solution.
I noticed that GAE comes with ANTLR 3, but from browsing its documentation, it solves a different kind of grammar problem.
Any ideas?
You can easily build an NLTK RPC server on some machine and access it from your app.
Another option is to find a web-based service that already does this (such as OpenCalais).
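A minimal sketch of such an RPC server using only Python's standard library (xmlrpc.server on the NLP machine, xmlrpc.client on the caller). The extract_subject_object function is a hypothetical placeholder that just takes the first and last words; on a real server, that is where the NLTK parsing would go.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def extract_subject_object(sentence):
    """Hypothetical placeholder: first word as subject, last word as object.

    On a real server, replace this with an NLTK-based parse.
    """
    words = sentence.rstrip(".!?").split()
    if len(words) < 2:
        return ["", ""]
    return [words[0], words[-1]]


def start_server(host="localhost", port=0):
    """Serve extract_subject_object over XML-RPC in a background thread (port 0 = any free port)."""
    server = SimpleXMLRPCServer((host, port), logRequests=False, allow_none=True)
    server.register_function(extract_subject_object)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    # Your app would act as this client, calling the remote NLP machine.
    client = ServerProxy(f"http://{host}:{port}/")
    print(client.extract_subject_object("John eats apples."))
    server.shutdown()
    server.server_close()
```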
With regards to the NLTK problem specifically, my solution would probably be to fix the weird imports that NLTK is doing, and use that as originally planned. When you're done, submit a patch of course.
That said, if this ultimately involves touching the data store, the answer is that it probably can't be done in a performant way, unless your data set is small or for some reason your NLP stuff doesn't need to hit some kind of full-text index. The GAE guys are working on it, but they have indicated that no one should be expecting a quick resolution to this particular issue.