So I am making my own MadLibs game and have come across a problem. When a user is asked to enter a noun, for example, but instead enters a verb or an adverb, I want my program to pick up on this and ask them to enter a different word, since the word does not match the criteria. How do I do this? This is what I have so far:
while True:
    name1 = input("A male name (protagonist): ")
    if name1.endswith(('ly', 's')):
        print("Sorry mate, this doesn't seem to be a proper noun. Try it again.")
        continue
    break
But I would like it to come out along the lines of this:
A male name (protagonist): sandwich
Sorry mate, this doesn't seem to be a proper noun. Try it again.
A male name (protagonist): Bob
How do I make it recognise nouns, adverbs etc. without me manually typing it in?
What you are looking for is Natural Language Processing (NLP), mate. You have to identify which part of speech each word is, and then you can tag it. NLP is a vast and complex field, so try researching it on your own and you might come up with a solution. I don't think there is a direct way to do this in plain Python, since Python is just a programming language. However, you can use tools that tag parts of speech, such as TreeTagger, and try integrating them into your application.
I am working on a machine learning chatbot project which uses Google's speech recognition API.
Now my problem is: when I say two or more sentences in one command, the speech recognition API returns all the sentences in one string, without any full stops or commas. As a result, it has become harder to separate the sentences. For example, if I say,
Take a photo. Tell me about today's weather. Open Google Chrome.
the speech recognition API returns:
take a photo tell me about todays weather open Google Chrome
so my chatbot takes this full string as one sentence.
Is there any way to extract sentences from a string like the one above?
(BTW, I am using Python)
If you intend to say multiple commands, say a connector word like "and" between them and split the command on that word. Then loop through the resulting list and pass each value to your execute function.
If the variable command stores your value, split it with command.split(" and ")
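Putting that together, a minimal sketch (the execute function here is a hypothetical stand-in for your own command handler):

```python
def execute(sentence):
    # Hypothetical stand-in for your bot's command handler.
    print("Executing:", sentence)

# String as returned by the speech recognition API.
command = "take a photo and tell me about todays weather and open Google Chrome"

# Split on the connector word and run each part separately.
parts = [part.strip() for part in command.split(" and ")]
for part in parts:
    execute(part)
```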
I had previously answered a similar question take a look at it:
https://stackoverflow.com/a/65872940/12279129
I think you could try different approaches to solve the problem:
A Naive solution
I don't know how your system works at the moment, but if you are just looking for certain sub-sentences, you could search the full input string for the commands you care about, i.e.
input_str = "Take a photo turn on fan".lower()
if "take a photo" in input_str:
    print("Just took a photo!")
if "turn on fan" in input_str:
    print("Just turned the fan on!")
Of course you could also select a separator word (like "and", "furthermore", ...) and split on it.
A more advanced solution
You could use an NLP library (e.g. spaCy) and perform entity recognition so that you can isolate verbs from nouns and so on.
After that you could eventually make use of stemming and lemmatization to further generalize the recognition.
You could also perform many intermediate steps with different NLP techniques, such as stopword removal.
Try auto punctuation from the API
Maybe you can try enabling automatic punctuation in the speech-to-text API and see if that works well enough for you.
That's because Google Cloud Speech doesn't provide natural language understanding, so you are stuck parsing text transcripts.
You can of course create the natural language understanding component yourself, either by using simple regular expressions or using something like Rasa, but there's a smarter way, too.
Speechly provides you with everything you need to create voice user interfaces on Android, iOS or the web. It returns not only the transcript, but also actionable intents and entities that make it a lot easier to create something a bit more complex. The best part is that it's free for up to 20 hours a month.
You can see a very simple example of how it works, for instance for creating search experiences, here. The basic idea is always the same: create a model and test that it returns the correct intents for your speech input. After you are done, integrate it into your app by looping through the returned results, and whenever you get the correct intent, react in your application as needed. It's actually very simple.
You can use the split method.
If your string is A:
X = A.split('.')
This will make X a list whose items are the sentences.
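For example, assuming the string already contains full stops:

```python
A = "Take a photo. Tell me about today's weather. Open Google Chrome."

# Split on full stops, strip whitespace, and drop empty pieces.
X = [s.strip() for s in A.split('.') if s.strip()]
print(X)
```

Note that this only works if the transcript actually contains punctuation.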
I have one little problem; let me try to explain it.
I use the transliterate library in my Django project. A user can write English (Latin) or Russian (Cyrillic) letters in a field. If the user writes Russian words it converts them to Latin letters, but if the user writes English words I see the following error:
LanguageDetectionError: Can't detect language for the text "document" given.
I use this code:
transliterate.translit(field_value, reversed=True)
Also, I noticed that in this project it seems impossible to detect the English language:
transliterate.detect_language(field_value) returns None when the user enters an English word.
My aim is to transliterate only if the user wrote a Russian word, but leave it untouched if they wrote an English word. What can you advise?
For now I have found a library which can help me detect the language: https://pypi.python.org/pypi/langdetect
Has anyone worked with this library?
Could you try detecting English and, failing that, assuming Russian? I put some Russian news articles into the Python code linked below, and it clearly detected that the text is not English.
It's pretty simple code that can easily be applied.
isEnglish github
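If all you need is to tell Cyrillic input apart from Latin input (rather than full language detection), a simple character-range check with the standard library is often enough. The helper name is_cyrillic here is made up for illustration:

```python
def is_cyrillic(text):
    """Return True if the text contains any Cyrillic letters (U+0400..U+04FF)."""
    return any('\u0400' <= ch <= '\u04FF' for ch in text)

print(is_cyrillic("документ"))  # True
print(is_cyrillic("document"))  # False
```

You could then call transliterate.translit(field_value, reversed=True) only when is_cyrillic(field_value) is True, and leave English input untouched.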
So I've been working on a Reddit bot and have run into one issue. Let's say, for example, I am trying to find all comments with the word "man" in them; the bot will find comments with this word, but it will also find comments where "man" appears inside another word, for example "woman". I only want it to find the exact word "man".
I am using the package "praw". I know this is going to be a really easy fix, but for some reason I can't figure it out. This is the code that finds the word in a comment:
if "man" in comment.body:
If you need to see more of the code, just let me know. Any help would be great. Also, I am using Python to make this bot.
The word "man" appears exactly in the word "woman". You need to be more specific in your search, and that depends on the content of the comments. Perhaps you can search for " man" (but that would still count cases like " manage"), so perhaps you can do a regex search for whitespace followed by the string "man" followed by either whitespace or punctuation?
I am working on a project based something on natural language understanding.
So what I am currently doing is trying to resolve pronouns to their respective antecedents, for which I am building a model. I have worked out the basic part of it, but to complete the task I need to understand the narrative of the sentence. What I want is to check, using a Python API, whether the noun and the object are associated with each other by the verb.
Example:
method(laptop, have, operating-system) = yes
method(program, have, operating-system) = No
method("he"/"proper_noun", play, football) = yes
method("he"/"proper_noun", play, college) = No
I've heard about nltk's WordNet API, but I am not sure whether I can use it for this. Can it be used?
Also, I am kind of on a clock.
Any suggestions are welcome and appreciated.
Note: I am using Parsey McParseface to parse the sentence. I could do the same with nltk, but P-MPF is more accurate.
**Why isn't there an NLU tag available?**
Edit 1:
Thanks to alexis, I now know that the thing I am trying to do is called "anaphora resolution".
The name for what you want is "anaphora resolution", or "coreference resolution". It's a hard problem (probably harder than you realize; NLP tasks are like that), so unless your purpose is just to learn, I recommend you try some existing solutions. I don't know of an anaphora resolution module in nltk itself, but you can find one as part of the Stanford CoreNLP suite.
See this question about how to interface to it from the nltk. (I haven't tried it myself).
How can I pick tags from an article or a user's post using Python?
Is the following method ok?
Build a list of word frequencies from the text and sort it.
Remove some common words and pick the top 10 remaining words as the tags.
If the above method is OK, what library can detect which words are common (like "the", "if", "you", etc.) and which are descriptive?
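A minimal sketch of that frequency method with only the standard library (the stop-word set here is a tiny illustrative sample; a real list, e.g. NLTK's, contains hundreds of entries):

```python
import re
from collections import Counter

# Tiny illustrative stop-word set; real lists are much longer.
STOP_WORDS = {"the", "if", "you", "a", "an", "and", "of", "to", "in", "is", "it"}

def suggest_tags(text, n=10):
    """Return the n most frequent non-stop-words as candidate tags."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]

text = "Python makes tag extraction easy. Python counts the words and picks the top words."
print(suggest_tags(text, 3))
```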
Here's an article on removing stop words. The link to the stop word list in the article is broken but here's another one.
The Natural Language Toolkit offers a broad variety of methods for this kind of task. I can't give you hands-on advice as I'm not familiar with this subject, but I think it's worth the effort to read a few articles about the topic before you start: just picking words directly from the text won't get you very far, I think; you should probably try to find words similar to those for which tags already exist. And of course you need to filter out common words of the language like "the" and so on. Again, this Python library can help you with that, at least for a few common languages.
I'd suggest you download the Stack Overflow data dump. There you get a lot of real world posts, with appropriate tags, to test different algorithms of tag selection.
But generally I doubt it will work too well. For your own question, "words" is the clear winner in word count, followed by a list of words with two occurrences each, like "common", "list", "method", "pick" and "tags". Which of those would you automatically choose as tags? Also, the tags you chose manually contain "python" and "context", neither of which shows up with high word frequency.
Train a Bayes or Fisher filter with already-tagged data (e.g. the Stack Overflow data dump suggested by sth) and use it to classify new posts. For more information and Python examples on this topic, I'd recommend reading the excellent book Programming Collective Intelligence by Toby Segaran.
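A toy sketch of the Bayes idea in pure Python (a real project would use a library and far more training data; the training sentences and class name here are made up):

```python
import math
from collections import Counter, defaultdict

class NaiveBayesTagger:
    """Tiny word-count naive Bayes classifier for post tags."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # tag -> word frequencies
        self.tag_counts = Counter()              # tag -> number of posts

    def train(self, text, tag):
        self.tag_counts[tag] += 1
        self.word_counts[tag].update(text.lower().split())

    def classify(self, text):
        words = text.lower().split()
        best_tag, best_score = None, float("-inf")
        total_posts = sum(self.tag_counts.values())
        for tag, count in self.tag_counts.items():
            # log P(tag) + sum of log P(word | tag), with add-one smoothing
            score = math.log(count / total_posts)
            total = sum(self.word_counts[tag].values())
            vocab = len(self.word_counts[tag])
            for w in words:
                score += math.log((self.word_counts[tag][w] + 1) / (total + vocab + 1))
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag

tagger = NaiveBayesTagger()
tagger.train("list comprehension generator decorator", "python")
tagger.train("pass interception touchdown quarterback", "football")
print(tagger.classify("decorator for a generator"))
```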
Instead of blacklisting words that shouldn't be tags, why don't you instead build a whitelist of words that would make for good tags?
Start with a handful of tags that you would like to have, like python, off-topic, football, rickroll or whatnot (depending on the kind of site you are building!), have the system only suggest among those, then let users handpick appropriate tags and also let them type in their own tags.
When enough users suggest a tag, it gets into the pool of "known good" tags for auto-suggestion, maybe after some sort of moderation, so that you can still blacklist silly tags like "the" or "lolol", or typoed tags like "objectoriented" when you already have "object-oriented".
Only show a few suggestions. Offer autocompletion. Limit the number of tags per item. If this will be about coding, maybe some sort of language detection system (the Linux file command is not too shabby at this) will help your suggestion system.
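A whitelist with prefix autocompletion can be sketched in a few lines (the tag names are the examples from the answer above; the function name is made up):

```python
# Pool of "known good" tags collected from user suggestions.
KNOWN_GOOD_TAGS = ["python", "off-topic", "football", "rickroll", "object-oriented"]

def autocomplete(prefix, tags=KNOWN_GOOD_TAGS, limit=3):
    """Suggest at most `limit` known-good tags starting with the typed prefix."""
    prefix = prefix.lower()
    return [t for t in tags if t.startswith(prefix)][:limit]

print(autocomplete("o"))   # ['off-topic', 'object-oriented']
print(autocomplete("py"))  # ['python']
```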