Python library to build learning sentences - python

I need to remember some declensions in another language. I do this by building a sentence in which the first letters of the words represent the phrase I have to remember. I then use an online dictionary to find words starting with those letters and manually combine them into a sentence that is worth remembering.
I have distilled a Python dict of word beginnings. One column represents one sentence to build. These are the steps I need tooling or sources for:
I need a dictionary of English words and tooling to look up all words that match the beginnings in the dict.
From this distilled list I need to build a memorable sentence, combining nouns and verbs with the right beginnings.
Bonus: the tool generates easy-to-remember phrases and not just something random.
Is there any way to automate this in Python?
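The lookup step can be sketched in a few lines of plain Python. This is only a minimal sketch under assumptions: WORDS is a tiny made-up word list standing in for a real English word list (for example a text file with one word per line), and index_by_prefix and candidate_phrase are hypothetical helper names, not part of any library. It does not try to pick nouns and verbs or to judge how memorable the result is.

```python
import random

# Hypothetical stand-in for a real English word list.
WORDS = ["apple", "amber", "bake", "banana", "cat", "carry", "dog", "drum"]


def index_by_prefix(words, prefixes):
    """Map each prefix to the sorted list of words that start with it."""
    index = {}
    for prefix in prefixes:
        if prefix not in index:
            index[prefix] = sorted(
                w for w in words if w.lower().startswith(prefix.lower())
            )
    return index


def candidate_phrase(words, prefixes, seed=None):
    """Pick one random matching word per prefix (None if nothing matches)."""
    rng = random.Random(seed)
    index = index_by_prefix(words, prefixes)
    return [rng.choice(index[p]) if index[p] else None for p in prefixes]


if __name__ == "__main__":
    print(index_by_prefix(WORDS, ["ba", "c"]))
    print(candidate_phrase(WORDS, ["d", "a", "x"], seed=1))
```

Generating several candidates and choosing by hand is probably still needed for the "memorable" part.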

Related

Find most SIMILAR sentence/string to a reference one in text corpus in python

My goal is very simple: I have a set of strings or a sentence and I want to find the most similar one within a text corpus.
For example I have the following text corpus: "The front of the library is adorned with the Word of Life mural designed by artist Millard Sheets."
And I'd like to find the substring of the original corpus which is most similar to: "the library facade is painted"
So what I should get as output is: "the front of the library is adorned"
The only thing I came up with is to split the original sentence into substrings of variable lengths (e.g. substrings of 3, 4, or 5 words), then use something like string.similarity(substring) from the spaCy Python module to score each substring against my target text, and keep the one with the highest value.
It seems a pretty inefficient method. Is there anything better I can do?
It probably works to some degree, but I wouldn't expect the spacy similarity method (averaging word vectors) to work particularly well.
The task you're working on is related to paraphrase detection/identification and semantic textual similarity and there is a lot of existing work. It is frequently used for things like plagiarism detection and the evaluation of machine translation systems, so you might find more approaches by looking in those areas, too.
If you want something that works fairly quickly out of the box for English, one suggestion is terp, which was developed for MT evaluation but shown to work well for paraphrase detection:
https://github.com/snover/terp
Most methods are set up to compare two sentences, so this doesn't address your potential partial sentence matches. Maybe it would make sense to find the most similar sentence and then look for substrings within that sentence that match better than the sentence as a whole?
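The window search described in the question can be sketched with the scoring function left pluggable, so a sentence-level similarity measure can be swapped in later. This is a minimal sketch under assumptions: word_windows, jaccard and best_window are hypothetical names, and the default scorer is plain word-overlap (Jaccard) similarity, used only as a stand-in. It is purely lexical, so it will not know that "facade" and "front" are related; a semantic scorer would have to be passed in as score.

```python
def word_windows(text, sizes):
    """Yield every contiguous run of words whose length is in sizes."""
    words = text.split()
    for size in sizes:
        for start in range(len(words) - size + 1):
            yield " ".join(words[start:start + size])


def jaccard(a, b):
    """Word-overlap similarity between two strings (0.0 to 1.0)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def best_window(text, query, sizes=(3, 4, 5, 6, 7), score=jaccard):
    """Return the window with the highest score, or None if there is none."""
    return max(word_windows(text, sizes),
               key=lambda window: score(window, query),
               default=None)


if __name__ == "__main__":
    corpus = ("The front of the library is adorned with the Word of Life "
              "mural designed by artist Millard Sheets.")
    print(best_window(corpus, "the library facade is painted"))
```

The number of windows grows with the text length times the number of sizes, so narrowing down to the most similar sentence first, as suggested above, keeps the search small.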

Find similar/synonyms/context words Python

Hello, I'm looking for a solution to my issue:
I want to find a list of similar words in French and English.
For example:
name could be: first name, last name, nom, prénom, username....
Postal address could be: city, country, street, ville, pays, code postale ....
The other answer, and comments, describe how to get synonyms, but I think you want more than that?
I can suggest two broad approaches: WordNet and word embeddings.
Using nltk and wordnet, you want to explore the adjacent graph nodes. See http://www.nltk.org/howto/wordnet.html for an overview of the functions available. I'd suggest that once you've found your start word in Wordnet, follow all its relations, but also go up to the hypernym, and do the same there.
Finding the start word is not always easy:
http://wordnetweb.princeton.edu/perl/webwn?s=Postal+address&sub=Search+WordNet&o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&h=
Instead it seems I have to use "address": http://wordnetweb.princeton.edu/perl/webwn?s=address&sub=Search+WordNet&o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&h=
and then decide which of those is the correct sense here. Then try clicking the hypernym, hyponym, sister term, etc.
To be honest, none of those feels quite right.
Open Multilingual WordNet tries to link different languages. http://compling.hss.ntu.edu.sg/omw/ So you could take your English WordNet code, and move to the French WordNet with it, or vice versa.
The other approach is to use word embeddings. You find the (say, 300-dimensional) vector of your source word, and then hunt for the nearest words in that vector space. This returns words that are used in similar contexts, so they could be similar in meaning or similar syntactically.
Spacy has a good implementation, see https://spacy.io/usage/spacy-101#vectors-similarity and https://spacy.io/usage/vectors-similarity
Regarding English and French, normally you would work in the two languages independently. But if you search for "multilingual word embeddings" you will find some papers and projects where the vector stays the same for the same concept in different languages.
Note: the API is geared towards telling you how two words are similar, not finding similar words. To find similar words you need to take your vector and compare with every other word vector, which is O(N) in the size of the vocabulary. So you might want to do this offline, and build your own "synonyms-and-similar" dictionary for each word of interest.
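The offline "compare with every other vector" step can be sketched without any particular embedding library. This is a minimal sketch under assumptions: the 3-dimensional vectors in VECTORS are made up for illustration (real embeddings would come from a trained model), and cosine, nearest and build_similar_table are hypothetical helper names.

```python
import math

# Made-up toy vectors, for illustration only.
VECTORS = {
    "city": [0.9, 0.1, 0.0],
    "town": [0.8, 0.2, 0.1],
    "street": [0.7, 0.3, 0.2],
    "name": [0.0, 0.9, 0.3],
    "username": [0.1, 0.8, 0.4],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(word, vectors, k=2):
    """Return the k words whose vectors are closest to the vector of word."""
    target = vectors[word]
    scored = [(cosine(target, vec), other)
              for other, vec in vectors.items() if other != word]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


def build_similar_table(vectors, k=2):
    """Precompute the nearest neighbours of every word (one O(N) scan each)."""
    return {word: nearest(word, vectors, k) for word in vectors}


if __name__ == "__main__":
    print(build_similar_table(VECTORS, k=1))
```

Building the whole table is quadratic in the vocabulary size, which is why doing it once offline and only for the words of interest is the cheaper option.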
from PyDictionary import PyDictionary

dictionary = PyDictionary()
word = "name"  # example input; replace with the word you want synonyms for
answer = dictionary.synonym(word)
Here word is the word for which you are finding the synonyms.

python text processing: identify nouns from individual words

I have a list of words and would like to keep only nouns.
This is not a duplicate of Extracting all Nouns from a text file using nltk
In the linked question a piece of text is processed. The accepted answer proposes a tagger. I'm aware of the different options for tagging text (nltk, textblob, spacy), but I can't use them, since my data doesn't consist of sentences. I only have a list of individual words:
would
research
part
technologies
size
articles
analyzes
line
nltk has a wide selection of corpora. I found verbnet with a comprehensive list of verbs. But so far I didn't see anything similar for nouns. Is there something like a dictionary, where I can look up if a word is a noun, verb, adjective, etc ?
This could probably be done by some online service. Microsoft Translator, for example, returns a lot of information in its responses: https://learn.microsoft.com/en-us/azure/cognitive-services/translator/reference/v3-0-dictionary-lookup?tabs=curl
But this is a paid service. I would prefer a python package.
Regarding the ambiguity of words: Ideally I would like a dictionary that can tell me all the functions a word can have. "fish" for example is both noun and verb. "eat" is only verb, "dog" is only noun. I'm aware that this is not an exact science. A working solution would simply remove all words that can't be nouns.
Tried using wordnet?
from nltk.corpus import wordnet  # needs the WordNet corpus: nltk.download('wordnet')

words = ["would", "research", "part", "technologies", "size", "articles", "analyzes", "line"]
for w in words:
    syns = wordnet.synsets(w)
    # Part of speech of the first synset, or None if WordNet has no entry for the word
    print((w, syns[0].lexname().split('.')[0]) if syns else (w, None))
You should see (on Python 2 the strings are shown with a u prefix):
('would', None)
('research', 'noun')
('part', 'noun')
('technologies', 'noun')
('size', 'noun')
('articles', 'noun')
('analyzes', 'verb')
('line', 'noun')
You can run a POS tagger on individual fragments. It will have lower accuracy, but I suppose that's already a given.
Ideally, find a POS tagger which reveals every possible reading for possible syntactic disambiguation later on in the processing pipeline. This will basically just pick out all the possible readings from the lexicon (perhaps with a probability) and let you take it from there.
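The "keep every possible reading and filter later" idea can be sketched as a lookup that returns a set of tags per word. This is a minimal sketch under assumptions: TOY_LEXICON is a made-up lexicon that simply mirrors the examples from the question, and possible_pos and keep_possible_nouns are hypothetical names. With NLTK and its WordNet corpus installed, a lookup such as `lambda w: {s.pos() for s in wordnet.synsets(w)}` could presumably be passed in instead.

```python
# Made-up toy lexicon: word -> set of possible part-of-speech tags.
TOY_LEXICON = {
    "fish": {"n", "v"},
    "eat": {"v"},
    "dog": {"n"},
    "research": {"n", "v"},
}


def possible_pos(word, lexicon=TOY_LEXICON):
    """Return every part of speech the lexicon lists for word (may be empty)."""
    return lexicon.get(word.lower(), set())


def keep_possible_nouns(words, lookup=possible_pos):
    """Drop only the words that cannot be nouns according to the lookup."""
    return [w for w in words if "n" in lookup(w)]


if __name__ == "__main__":
    print(keep_possible_nouns(["fish", "eat", "dog", "would", "research"]))
```

Words missing from the lexicon are dropped here; whether unknown words should be kept instead is a design choice that depends on the data.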
Even if you use a dictionary, you will always have to deal with ambiguity. For example, the same word can be a noun or a verb depending on the context. Take the word research:
The government will invest in research.
The goal is to research new techniques of POS-tagging.
Most dictionaries will have more than one definition of research, example:
research as a noun
research as a verb
Where do these words come from? Can you maybe POS-tag them within the context where they occur?
As @Triplee and @DavidBatista pointed out, it is really complicated to tell whether a word is a noun or a verb from the word alone, because in most languages the syntactic role of a word depends on context.
Words are just representations of meanings. Because of that, I'd like to add another proposition that might fit what you need: instead of trying to find out if a word is a noun or a verb, try to find out if a Concept is an Object or an Action. This still has the problem of ambiguity, because a concept can carry both the Action and the Object form.
However, you can stick to Concepts that only have object properties (such as TypeOf, HasAsPart, IsPartOf, etc.) or Concepts that have both object and action properties (action properties are such as Subevents, Effects, Requires).
A good tool for concept searching is ConceptNet. It provides a web API to search for concepts in its network by keyword (it is based on Wikipedia and many other sites and is very complete for the English language), is open, and also points to synonyms in other languages (which are tagged with their common POS; you could average the POS of the synonyms to try to find out if the word is an object [noun-like] or an action [verb-like]).

How to Grab meaning of sentence using NLP?

I am new to NLP. My requirement is to parse meaning from sentences.
Example:
"Perpetually Drifting is haunting in all the best ways."
"When The Fog Rolls In is a fantastic song"
From the above sentences, I need to extract the following phrases:
"haunting in all the best ways."
"fantastic song"
Is it possible to achieve this in spacy?
It is not possible to extract the summarized sentences using spaCy. I hope the following methods might work for you:
The simplest one is to extract the noun phrases or verb phrases (phrase structure grammar). Most of the time that should give you the text you want.
You can use dependency parsing and extract the dependencies of the center word (dependency grammar).
You can train a sequence model where the input is the full sentence and the output is your summarized sentence (sequence models for text summarization).
Extracting the meaning of a sentence is quite an arbitrary task. What do you mean by the meaning? Using spaCy you can extract the dependencies between the words (which specify the meaning of the sentence), find the POS tags to check how words are used in the sentence, and also find places, organizations, and people using the NER tagger. However, the meaning of a sentence is too general a notion even for humans.
Maybe you are searching for a specific meaning? If that's the case, you have to train your own classifier. This will get you started.
If your task is summarization of a couple of sentences, consider also using gensim. You can have a look here.
Hope it helps :)

Are there any Python NLP tools to figure out how many ways a sentence can be parsed?

I want to be able to measure the ambiguity of a sentence, and my current idea for doing so is to measure how many ways the sentence can be parsed. For example, the sentence "Fruit flies like a banana" can have two interpretations.
So far I have tried using the Stanford Parser, but it only interpreted each sentence in one way. My other idea was to measure how many different parts of speech each word in a sentence could have, but each POS tagger I found only marked each word with one tag, even when several were possible.
Are there any tools to do either?
From the Stanford Parser FAQ page, hope it helps:
Can I obtain multiple parse trees for a single input sentence?
Yes, for the PCFG parser (only). With a PCFG parser, you can give the option -printPCFGkBest n and it will print the n highest-scoring parses for a sentence. They can be printed either as phrase structure trees or as typed dependencies in the usual way via the -outputFormat option, and each receives a score (log probability). The k best parses are extracted efficiently using the algorithm of Huang and Chiang (2005).
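To illustrate what "number of parses" means independently of the Stanford Parser, here is a minimal sketch of a CYK-style chart that counts the parse trees a toy grammar assigns to a sentence. Everything in it is an assumption made for illustration: the grammar in BINARY_RULES and LEXICON is hand-written to cover only the example sentence, and count_parses is a hypothetical name. A real measurement would need a broad-coverage grammar, such as the one behind the k-best option quoted above.

```python
from collections import defaultdict

# Hand-written toy grammar in Chomsky normal form: (parent, left, right).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "N", "N"),     # compound noun: "fruit flies"
    ("NP", "Det", "N"),   # "a banana"
    ("VP", "V", "NP"),    # "like a banana"
    ("VP", "V", "PP"),    # "flies like a banana"
    ("PP", "P", "NP"),
]

# Toy lexicon: each word lists every category it may take.
LEXICON = {
    "fruit": ["N", "NP"],
    "flies": ["N", "V"],
    "like": ["V", "P"],
    "a": ["Det"],
    "banana": ["N"],
}


def count_parses(words, start="S"):
    """Count the distinct parse trees rooted in start that cover all words."""
    n = len(words)
    # chart[(i, j)][symbol] = number of ways symbol can span words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for symbol in LEXICON.get(word, []):
            chart[(i, i + 1)][symbol] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY_RULES:
                    ways = chart[(i, k)].get(left, 0) * chart[(k, j)].get(right, 0)
                    if ways:
                        chart[(i, j)][parent] += ways
    return chart[(0, n)].get(start, 0)


if __name__ == "__main__":
    print(count_parses("fruit flies like a banana".split()))  # prints 2
```

With this toy grammar the two trees correspond to "fruit flies" as the subject of "like" and "fruit" as the subject of "flies".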
