1) I tried the code from the official NLTK book, 'Natural Language Processing with Python', but it gives an error:
dt = nltk.DiscourseTester(['A student dances', 'Every student is a person'])
print(dt.readings())
I get the error
NLTK was unable to find the mace4 file!
Use software specific configuration parameters or set the PROVER9 environment variable.
2)I tried to use another code from the book:
from nltk import load_parser
parser = load_parser('drt.fcfg', logic_parser=nltk.DrtParser())
trees = parser.parse('Angus owns a dog'.split())
print(trees[0].node['sem'].simplify())
I got the error
AttributeError: module 'nltk' has no attribute 'DrtParser'
3)I tried the below code:
from nltk.sem import cooper_storage as cs
sentence = 'every girl chases a dog'
trees = cs.parse_with_bindops(sentence, grammar='storage.fcfg')
semrep = trees[0].label()
cs_semrep = cs.CooperStore(semrep)
print(cs_semrep.core)
for bo in cs_semrep.store:
    print(bo)
cs_semrep.s_retrieve(trace=True)
for reading in cs_semrep.readings:
    print(reading)
It partially worked, but it still gave the error below:
AttributeError: 'CooperStore' object has no attribute 'core'
4) I tried another code from book:
from nltk import load_parser
parser = load_parser('simple-sem.fcfg', trace=0)
sentence = 'Angus gives a bone to every dog'
tokens = sentence.split()
trees = parser.parse(tokens)
for tree in trees:
    print(tree.node['SEM'])
I got the below error:
NotImplementedError: Use label() to access a node label.
Please let me know what to do. Are these features deprecated? I heard that many NLTK features are. Please suggest a way out for all the features mentioned above.
I found the answer: I was actually following the code from the printed book instead of NLTK's online version of the book, which is kept up to date. Following the updated version solved the problems.
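For anyone hitting the same errors, the relevant NLTK 3 changes are small: Tree.node was replaced by Tree.label(), parsers now return iterators rather than lists, and DrtParser lives at nltk.sem.drt.DrtParser. As a hedged sketch (assuming the book grammars are available under nltk_data), example 4 becomes:
from nltk import load_parser
# NLTK 3 style: parse() returns an iterator and node labels are read with label()
parser = load_parser('grammars/book_grammars/simple-sem.fcfg', trace=0)
tokens = 'Angus gives a bone to every dog'.split()
for tree in parser.parse(tokens):
    print(tree.label()['SEM'])
Example 2 likewise passes logic_parser=nltk.sem.drt.DrtParser() instead of nltk.DrtParser(), and the error in example 1 concerns the external Prover9/Mace4 binaries, which have to be installed separately and pointed to via the PROVER9 environment variable.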
Hi guys, I need help with something. I'm currently working on a project where I have to find the semantic meaning of a word or phrase. For example,
'hi', 'hello' and 'good morning' should return 'regards', etc.
Any suggestions?
Thanks in advance.
Your question is a bit vague, but here are two ideas that might help:
1. WordNet
WordNet is a lexical database that provides synonyms, categorisations and to some extent the 'semantic meaning' of English words. Here is the web interface to explore the database. Here is how to use it via NLTK.
Example:
from nltk.corpus import wordnet as wn
# get all possible meanings of a word. e.g. "welcome" has two possible meanings as a noun, three meanings as a verb and one meaning as an adjective
wn.synsets('welcome')
# output: [Synset('welcome.n.01'), Synset('welcome.n.02'), Synset('welcome.v.01'), Synset('welcome.v.02'), Synset('welcome.v.03'), Synset('welcome.a.01')]
# get the definition of one of these meanings:
wn.synset('welcome.n.02').definition()
# output: 'a greeting or reception'
# get the hypernym of the specific meaning, i.e. the more abstract category it belongs to
wn.synset('welcome.n.02').hypernyms()
# output: [Synset('greeting.n.01')]
2. Zero-shot-classification
HuggingFace Transformers and zero-shot classification: You can also use a pre-trained deep learning model to classify your text. In this case, you need to manually create labels for all possible different meanings you are looking for in your texts. e.g.: ["greeting", "insult", "congratulation"].
Then you can use the deep learning model to predict which label (broadly speaking 'semantic meaning') is the most adequate for your text.
Example:
# pip install transformers==3.1.0 # pip install in terminal
from transformers import pipeline
classifier = pipeline("zero-shot-classification")
sequence = "Hi, I welcome you to this event"
candidate_labels = ["greeting", "insult", "congratulation"]
classifier(sequence, candidate_labels)
# output: {'sequence': 'Hi, I welcome you to this event',
# 'labels': ['greeting', 'congratulation', 'insult'],
# 'scores': [0.9001138210296631, 0.09858417510986328, 0.001302019809372723]}
=> Each of your labels received a score and the label with the highest score would be the "semantic meaning" of your text.
Here is an interactive web application to see what the library does without coding. Here is a Jupyter notebook which demonstrates how to use it in Python. You can just copy-paste code from the notebook.
You have not shown any effort to write your own code, but here is a small example.
words = ['hello','hi','good morning']
x = input('Word here: ')
if x.lower() in words:
    print('Regards')
I want to get the pronunciation of short messages using Python. For example, the message 'text' should be transformed to 'tekst' and the message 'привет' (Russian) should be transformed to 'privet'.
I have tried to use googletrans for this, but in fact there is no pronunciation (pronunciation is None; see my issue).
Does anybody know of a package for this task? I have googled for it but found no results. I've found over five packages for text-to-speech or translate-then-speak conversion, but I don't need an audio file, I need only the text of the pronunciation. The phonemizer package is a very good solution, but I cannot run its backends on Windows.
Maybe somebody knows how to use some 'API' from this, this, this or this?
You can use selenium to get the texts from macmillandictionary.com.
With selenium you can navigate the page, click, enter text, and so on. So your job will be to type the word into the search bar and read off the result with selenium.
You may use oxfordlearnersdictionaries.com too.
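A minimal selenium sketch of that idea; the URL pattern and the 'span.PRON' selector below are guesses about the dictionary page's markup, so inspect the page and adjust them:
from selenium import webdriver
from selenium.webdriver.common.by import By

word = 'text'
driver = webdriver.Chrome()  # needs a matching chromedriver on your PATH
# Guessed URL pattern for a dictionary entry page
driver.get('https://www.macmillandictionary.com/dictionary/british/' + word)
# 'span.PRON' is a hypothetical selector for the pronunciation element
pron = driver.find_element(By.CSS_SELECTOR, 'span.PRON').text
print(pron)
driver.quit()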
Well, there is a Python module named pronouncing which includes functions to get pronunciations of words, for example:
>>> import pronouncing
>>> pronouncing.phones_for_word("permit")
[u'P ER0 M IH1 T', u'P ER1 M IH2 T']
The pronouncing.phones_for_word() function returns a list of all pronunciations for the given word found in the CMU pronouncing dictionary.
Pronunciations are given using a special phonetic alphabet known as ARPAbet.
Here’s a list of ARPAbet symbols and what English sounds they stand for. Each token in a pronunciation string is called a “phone.”
The numbers after the vowels indicate the vowel’s stress. The number 1 indicates primary stress; 2 indicates secondary stress; and 0 indicates unstressed.
I got this from the tutorial and cookbook page of pronouncing.
There is another module named pysle which can also help you.
You might want to check out the epitran Python library.
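For example, a rough epitran sketch; the 'rus-Cyrl' language code comes from epitran's documentation, and the output is IPA rather than a plain Latin respelling:
import epitran

# 'rus-Cyrl' selects the Russian (Cyrillic script) mapping
epi = epitran.Epitran('rus-Cyrl')
print(epi.transliterate('привет'))  # IPA output, roughly 'prʲivʲet'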
from googletrans import Translator
translator = Translator()
k = translator.translate("who are you", dest='hindi')
print(k)
print(k.text)
p = translator.translate(k.text, dest='hindi')  # translate from the same language to the same language to get the pronunciation
print(p)
print(p.pronunciation)
Please try this for pronunciation.
Output:
Translated(src=en, dest=hi, text=तुम कौन हो, pronunciation=None, extra_data="{'translat...")
तुम कौन हो
Translated(src=hi, dest=hi, text=तुम कौन हो, pronunciation=tum kaun ho, extra_data="{'translat...")
tum kaun ho
Requirement: install googletrans.
For a project, I would like to be able to get the noun form of an adjective or adverb if there is one using NLP.
For example, "deathly" would return "death" and "dead" would return "death".
"lively" would return "life".
I've tried using the spaCy lemmatizer, but it does not manage to get the base root form.
For example, if I'd do:
import spacy
nlp = spacy.load('en_core_web_sm')
z = nlp("deathly lively")
for token in z:
    print(token.lemma_)
It would return:
>>> deathly lively
instead of:
>>> death life
Does anyone have any ideas?
Any answer is appreciated.
From what I've seen so far, SpaCy is not super-great at doing what you want it to do. Instead, I am using a 3rd party library called pyinflect, which is intended to be used as an extension to SpaCy.
While it isn't perfect, I think it will work better than your current approach.
I'm also considering another 3rd-party library called inflect, which might be worth checking out, as well.
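For what it's worth, here is a minimal sketch of the pyinflect route (not a full solution): it asks pyinflect for the singular-noun (NN) form of each token and may return None for words it has no mapping for.
import spacy
import pyinflect  # importing pyinflect registers the ._.inflect extension on spaCy tokens

nlp = spacy.load('en_core_web_sm')
doc = nlp('deathly lively')
for token in doc:
    # Request the singular-noun form; the result may be None if pyinflect has no mapping
    print(token.text, '->', token._.inflect('NN'))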
Going through the NLTK book, it's not clear how to generate a dependency tree from a given sentence.
The relevant section of the book, the sub-chapter on dependency grammar, gives an example figure, but it doesn't show how to parse a sentence to come up with those relationships. Or maybe I'm missing something fundamental in NLP?
EDIT:
I want something similar to what the Stanford parser does:
Given a sentence "I shot an elephant in my sleep", it should return something like:
nsubj(shot-2, I-1)
det(elephant-4, an-3)
dobj(shot-2, elephant-4)
prep(shot-2, in-5)
poss(sleep-7, my-6)
pobj(in-5, sleep-7)
We can use Stanford Parser from NLTK.
Requirements
You need to download two things from their website:
The Stanford CoreNLP parser.
Language model for your desired language (e.g. english language model)
Warning!
Make sure that your language model version matches your Stanford CoreNLP parser version!
The current CoreNLP version as of May 22, 2018 is 3.9.1.
After downloading the two files, extract the zip file anywhere you like.
Python Code
Next, load the model and use it through NLTK
from nltk.parse.stanford import StanfordDependencyParser
path_to_jar = 'path_to/stanford-parser-full-2014-08-27/stanford-parser.jar'
path_to_models_jar = 'path_to/stanford-parser-full-2014-08-27/stanford-parser-3.4.1-models.jar'
dependency_parser = StanfordDependencyParser(path_to_jar=path_to_jar, path_to_models_jar=path_to_models_jar)
result = dependency_parser.raw_parse('I shot an elephant in my sleep')
dep = next(result)
list(dep.triples())
Output
The output of the last line is:
[((u'shot', u'VBD'), u'nsubj', (u'I', u'PRP')),
((u'shot', u'VBD'), u'dobj', (u'elephant', u'NN')),
((u'elephant', u'NN'), u'det', (u'an', u'DT')),
((u'shot', u'VBD'), u'prep', (u'in', u'IN')),
((u'in', u'IN'), u'pobj', (u'sleep', u'NN')),
((u'sleep', u'NN'), u'poss', (u'my', u'PRP$'))]
I think this is what you want.
I think you could use a corpus-based dependency parser instead of the grammar-based one NLTK provides.
Doing corpus-based dependency parsing on even a small amount of text in Python is not ideal performance-wise. So NLTK does provide a wrapper for MaltParser, a corpus-based dependency parser.
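A rough sketch of NLTK's MaltParser wrapper, assuming you have downloaded the MaltParser distribution and a pretrained model yourself (the paths below are placeholders; engmalt is one of the published English models):
from nltk.parse.malt import MaltParser

# Placeholder paths: point these at your MaltParser installation and pretrained model
mp = MaltParser('path_to/maltparser-1.9.2', 'path_to/engmalt.linear-1.7.mco')
graph = mp.parse_one('I shot an elephant in my sleep .'.split())
print(graph.tree())
print(list(graph.triples()))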
You might find this other question about RDF representation of sentences relevant.
If you need better performance, then spacy (https://spacy.io/) is the best choice. Usage is very simple:
import spacy
nlp = spacy.load('en')
sents = nlp(u'A woman is walking through the door.')
You'll get a dependency tree as output, and you can very easily dig out every piece of information you need. You can also define your own custom pipelines. See more on their website.
https://spacy.io/docs/usage/
If you want to be serious about dependency parsing, don't use NLTK: all its algorithms are dated and slow. Try something like this: https://spacy.io/
To use Stanford Parser from NLTK
1) Run CoreNLP Server at localhost
Download Stanford CoreNLP here (and also model file for your language).
The server can be started by running the following command (more details here)
# Run the server using all jars in the current directory (e.g., the CoreNLP home directory)
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
or via the CoreNLP Python client (you need to configure the CORENLP_HOME environment variable first)
import os
import corenlp  # the Stanford CoreNLP Python client package

os.environ["CORENLP_HOME"] = "dir"  # path to the extracted CoreNLP directory
client = corenlp.CoreNLPClient()
# do something
client.stop()
2) Call the dependency parser from NLTK
>>> from nltk.parse.corenlp import CoreNLPDependencyParser
>>> dep_parser = CoreNLPDependencyParser(url='http://localhost:9000')
>>> parse, = dep_parser.raw_parse(
... 'The quick brown fox jumps over the lazy dog.'
... )
>>> print(parse.to_conll(4))
The DT 4 det
quick JJ 4 amod
brown JJ 4 amod
fox NN 5 nsubj
jumps VBZ 0 ROOT
over IN 9 case
the DT 9 det
lazy JJ 9 amod
dog NN 5 nmod
. . 5 punct
See the detailed documentation here, and also this question: NLTK CoreNLPDependencyParser: Failed to establish connection.
From the Stanford Parser documentation: "the dependencies can be obtained using our software [...] on phrase-structure trees using the EnglishGrammaticalStructure class available in the parser package." http://nlp.stanford.edu/software/stanford-dependencies.shtml
The dependencies manual also mentions: "Or our conversion tool can convert the output of other constituency parsers to the Stanford Dependencies representation." http://nlp.stanford.edu/software/dependencies_manual.pdf
Neither functionality seems to be implemented in NLTK currently.
A little late to the party, but I wanted to add some example code with SpaCy that gets you your desired output:
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("I shot an elephant in my sleep")
for token in doc:
    print("{2}({3}-{6}, {0}-{5})".format(token.text, token.tag_, token.dep_, token.head.text, token.head.tag_, token.i+1, token.head.i+1))
And here's the output, very similar to your desired output:
nsubj(shot-2, I-1)
ROOT(shot-2, shot-2)
det(elephant-4, an-3)
dobj(shot-2, elephant-4)
prep(shot-2, in-5)
poss(sleep-7, my-6)
pobj(in-5, sleep-7)
Hope that helps!
I am using NLTK to extract nouns from a text-string starting with the following command:
tagged_text = nltk.pos_tag(nltk.Text(nltk.word_tokenize(some_string)))
It works fine in English. Is there an easy way to make it work for German as well?
(I have no experience with natural language processing, but I managed to use the Python NLTK library, which is great so far.)
Natural language software does its magic by leveraging corpora and the statistics they provide. You'll need to tell nltk about some German corpus to help it tokenize German correctly. I believe the EUROPARL corpus might help get you going.
See nltk.corpus.europarl_raw and this answer for example configuration.
Also, consider tagging this question with "nlp".
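As a small illustration of German-aware tokenization with NLTK (this assumes the punkt and europarl_raw data packages have been downloaded via nltk.download; it only tokenizes, it does not tag):
import nltk
from nltk.corpus import europarl_raw

text = 'Die Katze liegt auf der Matte. Sie schläft.'
# Use NLTK's German Punkt model for sentence splitting and word tokenization
sentences = nltk.sent_tokenize(text, language='german')
tokens = [nltk.word_tokenize(s, language='german') for s in sentences]
print(tokens)
# The raw (untagged) German EUROPARL text is exposed through a corpus reader
print(europarl_raw.german.words()[:10])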
The Pattern library includes a function for parsing German sentences and the result includes the part-of-speech tags. The following is copied from their documentation:
from pattern.de import parse, split
s = parse('Die Katze liegt auf der Matte.')
s = split(s)
print(s.sentences[0])
>>> Sentence('Die/DT/B-NP/O Katze/NN/I-NP/O liegt/VB/B-VP/O'
'auf/IN/B-PP/B-PNP der/DT/B-NP/I-PNP Matte/NN/I-NP/I-PNP ././O/O')
Update: another option is spacy; there is a quick example in this blog article:
import spacy
nlp = spacy.load('de')
doc = nlp(u'Ich bin ein Berliner.')
# show universal pos tags
print(' '.join('{word}/{tag}'.format(word=t.orth_, tag=t.pos_) for t in doc))
# output: Ich/PRON bin/AUX ein/DET Berliner/NOUN ./PUNCT
Part-of-speech (POS) tagging is very specific to a particular [natural] language. NLTK includes many different taggers, which use distinct techniques to infer the tag of a given token in a given sentence. Most (but not all) of these taggers use a statistical model of sorts as the main or sole device to "do the trick". Such taggers require some "training data" upon which to build this statistical representation of the language, and the training data comes in the form of corpora.
The NLTK "distribution" itself includes many of these corpora, as well as a set of "corpus readers" which provide an API to read different types of corpora. I don't know the current state of affairs in NLTK proper, or whether it includes any German corpus. You can, however, locate some free corpora which you'll then need to convert to a format that satisfies the proper NLTK corpus reader, and which you can then use to train a POS tagger for the German language; see the sketch below.
You can even create your own corpus, but that is a hell of a painstaking job; if you work in a university, you have to find ways of bribing and otherwise coercing students to do that for you ;-)
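Here is a rough sketch of that last step, assuming you have converted a tagged German corpus to CoNLL-style columns; the file name and the column layout are placeholders for whatever corpus you obtain:
import nltk
from nltk.corpus.reader import ConllCorpusReader

# Placeholder corpus file; columntypes must match the columns in your converted file
corpus = ConllCorpusReader('.', ['german-tagged.conll'], columntypes=['words', 'pos'])
train_sents = corpus.tagged_sents()
# Unigram tagger with a default-tag fallback for unseen tokens
default = nltk.DefaultTagger('NN')
tagger = nltk.UnigramTagger(train_sents, backoff=default)
print(tagger.tag('Die Katze liegt auf der Matte .'.split()))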
Possibly you can use the Stanford POS tagger. Below is a recipe I wrote. There are Python recipes for German NLP that I've compiled; you can access them at http://htmlpreview.github.io/?https://github.com/alvations/DLTK/blob/master/docs/index.html
#-*- coding: utf8 -*-
import os, glob, codecs

def installStanfordTag():
    # Download and unpack the Stanford POS tagger if it is not already present
    if not os.path.exists('stanford-postagger-full-2013-06-20'):
        os.system('wget http://nlp.stanford.edu/software/stanford-postagger-full-2013-06-20.zip')
        os.system('unzip stanford-postagger-full-2013-06-20.zip')
    return

def tag(infile):
    cmd = "./stanford-postagger.sh " + models[m] + " " + infile
    tagout = os.popen(cmd).readlines()
    return [i.strip() for i in tagout]

def taglinebyline(sents):
    tagged = []
    for ss in sents:
        os.popen("echo '''" + ss + "''' > stanfordtemp.txt")
        tagged.append(tag('stanfordtemp.txt')[0])
    return tagged

installStanfordTag()
stagdir = './stanford-postagger-full-2013-06-20/'
models = {'fast': 'models/german-fast.tagger',
          'dewac': 'models/german-dewac.tagger',
          'hgc': 'models/german-hgc.tagger'}
os.chdir(stagdir)
print(os.getcwd())

m = 'fast'  # It's best to use the fast German tagger if your data is small.
sentences = ['Ich bin schwanger .', 'Ich bin wieder schwanger .', 'Ich verstehe nur Bahnhof .']
tagged_sents = taglinebyline(sentences)  # Call the Stanford tagger
for sent in tagged_sents:
    print(sent)
I have written a blog-post about how to convert the German annotated TIGER Corpus in order to use it with the NLTK. Have a look at it here.
It seems a little late to answer the question, but it might be helpful for anyone who finds this question by googling, like I did. So I'd like to share the things I found out.
The HanoverTagger might be a useful tool for this task.
You can find tutorials here and here, but the second one is in German.
The tagger seems to use the STTS tagset, in case you need a complete list of all tags.
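A minimal sketch of how that tagger is typically used, assuming the HanTa package and its bundled German model (check the linked tutorials for the authoritative usage):
import nltk
from HanTa import HanoverTagger

tagger = HanoverTagger.HanoverTagger('morphmodel_ger.pgz')  # bundled German model
tokens = nltk.word_tokenize('Die Katze liegt auf der Matte.', language='german')
# tag_sent returns (word, lemma, STTS tag) triples
print(tagger.tag_sent(tokens))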