Does NLTK have a tool for dependency parsing? - python

I'm building an NLP application and have been using the Stanford Parser for most of my parsing work, but I would like to start using Python.
So far, NLTK seems like the best bet, but I cannot figure out how to parse grammatical dependencies. For example, this is the output the Stanford Parser gives for the sentence "I am switching to Python.", and I want to be able to produce the same thing in NLTK:
nsubj(switching-3, I-1)
aux(switching-3, am-2)
prep_to(switching-3, Python-5)
Can anyone give me a shove in the right direction to parse grammatical dependencies?

NLTK includes support for using MaltParser; see nltk.parse.malt.MaltParser.
The pretrained English model for the MaltParser that's available here parses to the Stanford basic dependency representation. However, you would still need to call Stanford's JavaNLP code to convert the basic dependencies to the CCprocessed representation given above in your example parse.
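For instance, a minimal sketch (the paths are placeholders, and the MaltParser constructor signature has changed across NLTK versions, so check your version's docs):

from nltk.parse.malt import MaltParser

# Placeholder paths: the MaltParser distribution directory and the
# pretrained English model (e.g. engmalt.linear-1.7.mco).
mp = MaltParser('/path/to/maltparser-1.9.2', '/path/to/engmalt.linear-1.7.mco')

graph = mp.parse_one('I am switching to Python .'.split())
print(graph.tree())        # tree view of the dependency structure
print(graph.to_conll(4))   # one token per line: word, POS, head index, relation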

Related

Python NLP: identifying the tense of a sentence using TextBlob, StanfordNLP or Google Cloud

(Note: I am aware that there have been previous posts on this question (e.g. here or here), but they are rather old, and I think there has been quite some progress in NLP in the past few years.)
I am trying to determine the tense of a sentence, using natural language processing in Python.
Is there an easy-to-use package for this? If not, how would I need to implement solutions in TextBlob, StanfordNLP or Google Cloud Natural Language API?
TextBlob seems easiest to use, and I manage to get the POS tags listed, but I am not sure how to turn the output into a 'tense prediction value' or simply a best guess on the tense. Moreover, my text is in Spanish, so I would prefer to use Google Cloud or StanfordNLP (or any other easy-to-use solution) that supports Spanish.
I have not managed to work with the Python interface for StanfordNLP.
Google Cloud Natural Language API seems to offer exactly what I need (see here), but I have not managed to find out how I would get to this output. I have used Google Cloud NLP for other analyses (e.g. entity sentiment analysis) and it has worked, so I am confident I could set it up if I find the right example of use.
Example of textblob:
from textblob import TextBlob
from textblob.taggers import NLTKTagger
nltk_tagger = NLTKTagger()
blob = TextBlob("I am curious to see whether NLP is able to predict the tense of this sentence.", pos_tagger=nltk_tagger)
print(blob.pos_tags)
-> This prints the POS tags; how would I convert them into a prediction of the tense of this sentence?
Example with Google Cloud NLP (after setting up credentials):
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types
text = "I am curious to see how this works"
client = language.LanguageServiceClient()
document = types.Document(
    content=text,
    type=enums.Document.Type.PLAIN_TEXT)
tense = (WHAT NEEDS TO COME HERE?)
print(tense)
-> I am not sure what code needs to go there to predict the tense (see the placeholder in the code above).
I am quite a newbie to Python so any help on this topic would be highly appreciated! Thanks!
I don't think any NLP toolkit has an out-of-the-box function to detect the past tense. But you can easily derive it from dependency parsing and POS tagging.
Do a dependency parse of the sentence and look at the root, which is the main predicate of the sentence, and its POS tag. If it is VBD (a verb in the past simple form), the sentence is surely in the past tense. If it is VB (base form) or VBG (a gerund), you need to check its dependency children for an auxiliary verb (dependency relation aux) with the VBD tag.
If you also need to cover the present/past perfect or past modal expressions ("I must have had..."), you can just extend the conditions.
In spaCy (my favorite NLP toolkit for Python), you can write it like this (assuming your input is a single sentence):
import spacy
nlp = spacy.load('en_core_web_sm')
def detect_past_sentence(sentence):
    sent = list(nlp(sentence).sents)[0]
    return (
        sent.root.tag_ == "VBD" or
        any(w.dep_ == "aux" and w.tag_ == "VBD" for w in sent.root.children))
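For a quick sanity check (assuming the en_core_web_sm model is installed):

print(detect_past_sentence("She went home."))      # True
print(detect_past_sentence("She is going home."))  # False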
With the Google Cloud API or StanfordNLP, it would be basically the same; I am just not as familiar with those APIs.
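For what it's worth, here is a hedged sketch using the same (older) google-cloud-language client as in the question: analyze_syntax returns per-token morphology, and each token's part_of_speech includes a tense field (this also works for Spanish text). Treat the exact field names as version-dependent:

from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types

client = language.LanguageServiceClient()
document = types.Document(
    content="I was curious to see how this works",
    type=enums.Document.Type.PLAIN_TEXT)

# analyze_syntax tags every token; verbs carry a morphological tense value.
response = client.analyze_syntax(document=document)
for token in response.tokens:
    pos = token.part_of_speech
    if pos.tag == enums.PartOfSpeech.Tag.VERB:
        print(token.text.content, enums.PartOfSpeech.Tense(pos.tense).name)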

Obtain relationship between words of a sentence

I am working on a project based on natural language understanding.
What I am currently doing is trying to link pronouns to their respective antecedents, for which I am trying to build a model. I have worked out the basic part of it, but to complete the task, I need to understand the narrative of the sentence. So what I want is to check, using a Python API, whether a noun and an object are associated with each other by the verb.
Example:
method(laptop, have, operating-system) = yes
method(program, have, operating-system) = No
method("he"/"proper_noun", play, football) = yes
method("he"/"proper_noun", play, college) = No
I've heard about NLTK's WordNet API, but I am not sure whether it can be used to do this. Can it?
Also, I am kind of on a deadline.
Any suggestions are welcome and appreciated.
Note: I am using Parsey McParseface to parse the sentence. I could do the same with NLTK, but P-MPF is more accurate.
**Why isn't there an NLU tag available?**
Edit 1:
Thanks to alexis, the thing I am trying to do is called "anaphora resolution".
The name for what you want is "anaphora resolution", or "coreference resolution". It's a hard problem (probably harder than you realize; NLP tasks are like that), so unless your purpose is just to learn, I recommend you try some existing solutions. I don't know of an anaphora resolution module in NLTK itself, but you can find it as part of the Stanford CoreNLP suite.
See this question about how to interface with it from NLTK. (I haven't tried it myself.)
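As an illustration (not something I have tested), one way to reach CoreNLP's coreference annotator from Python is the pycorenlp wrapper, assuming a CoreNLP server is already running locally:

from pycorenlp import StanfordCoreNLP

# Assumes a CoreNLP server was started separately, e.g.:
#   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
nlp = StanfordCoreNLP('http://localhost:9000')

text = "John bought a laptop. He likes its operating system."
output = nlp.annotate(text, properties={
    'annotators': 'tokenize,ssplit,pos,lemma,ner,parse,coref',
    'outputFormat': 'json'
})

# Each coreference chain lists the mentions that refer to the same entity.
for chain in output['corefs'].values():
    print([mention['text'] for mention in chain])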

Natural Language dictionaries in Python

Does anyone know of a Python natural language processing library or module that I could use to find synonyms (or antonyms, etc.) of English words?
NLTK is a very popular Python natural language toolkit.
http://nltk.org/
These links cover using NLTK to find synonyms...
http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html
http://www.randomhacks.net/articles/2009/12/28/experimenting-with-nltk
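For instance, a minimal sketch of looking up synonyms and antonyms through NLTK's WordNet interface (the corpus may need a one-time download):

import nltk
from nltk.corpus import wordnet as wn
# nltk.download('wordnet')   # one-time corpus download, if not already present

synonyms, antonyms = set(), set()
for synset in wn.synsets('good'):
    for lemma in synset.lemmas():
        synonyms.add(lemma.name())
        for antonym in lemma.antonyms():
            antonyms.add(antonym.name())

print(sorted(synonyms))
print(sorted(antonyms))   # e.g. ['bad', 'badness', 'evil', ...]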
Pattern is also pretty powerful, and it has several features like pluralization + singularization, conjugation, parsers, wordnet access (from which you can get synonyms and antonyms), etc.
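A small sketch of Pattern's English module (assuming pattern is installed; on Python 3 you may need a maintained fork):

from pattern.en import pluralize, singularize, conjugate, PAST, SG

print(pluralize('leaf'))         # 'leaves'
print(singularize('analyses'))   # 'analysis'
print(conjugate('be', tense=PAST, person=1, number=SG))  # 'was'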
Have a look at WordNet, a lexical database made by Princeton University. It's intuitively organized into synsets, which might serve your purpose (if you're still interested :).
You can download a local copy of WordNet and use it from your Python code to perform NLP tasks.
Link: https://wordnet.princeton.edu/

code using nltk and python

I want code for tagging idioms in a given sentence or text using NLTK and Python.
It depends on what you mean by an "idiom". Joe's suggestion of POS tagging is probably a good start, and might be what you are really after. If so, go read "Natural Language Processing with Python" by Bird et al. It is published by O'Reilly but is also available online under a Creative Commons license. This will get you started with POS tagging (see the sketch below). It also has a good review of NLTK's abilities. For example, can some "Named Entity Recognition" techniques be adapted to do what you want? Or perhaps what you want is simply too difficult. I suspect the latter is the case (as implied by Rafi), but you will find that out in your journey. Perhaps you'll develop something new along the way, in which case I hope you give back to the NLTK community.
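As a starting point, a minimal POS-tagging sketch with NLTK (the tokenizer and tagger models may need a one-time download):

import nltk
# One-time downloads, if not already present:
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

tokens = nltk.word_tokenize("He kicked the bucket last year.")
print(nltk.pos_tag(tokens))
# e.g. [('He', 'PRP'), ('kicked', 'VBD'), ('the', 'DT'), ('bucket', 'NN'), ...]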

Lexical Analysis of Python Programming Language

Does anyone know where a FLEX or LEX specification file for Python exists? For example, this is a lex specification for the ANSI C programming language: http://www.quut.com/c/ANSI-C-grammar-l-1998.html
FYI, I am trying to write code highlighting into a Cocoa application. Regex won't do it because I also want grammar parsing to fold code and recognize blocks.
Lex is typically just used for tokenizing, not full parsing. Projects that use flex/lex for tokenizing typically use yacc/bison for the actual parsing.
You may want to take a look at ANTLR, a more "modern" alternative to lex & yacc.
The ANTLR Project has a Github repo containing many ANTLR 4 grammars including at least one for Python 3.
grammar.txt is the official, complete Python grammar; it is not directly lex-compatible, but you should be able to massage it into a suitable form.
Have you considered using one of the existing code highlighters, like Pygments?
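For reference, a minimal Pygments sketch; note that a lexer alone tokenizes for highlighting but won't give you the block structure needed for code folding:

from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import HtmlFormatter

source = "def f(x):\n    return x + 1\n"
# Produces HTML with <span> tags for each token class.
print(highlight(source, PythonLexer(), HtmlFormatter()))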
