Can I find the subject from a spaCy dependency tree using NLTK in Python?

I want to find the subject of a sentence using spaCy. The code below works fine and prints a dependency tree.
import spacy
from nltk import Tree
en_nlp = spacy.load('en')
doc = en_nlp("The quick brown fox jumps over the lazy dog.")
def to_nltk_tree(node):
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_

[to_nltk_tree(sent.root).pretty_print() for sent in doc.sents]
From this dependency tree code, can I find the subject of this sentence?

I'm not sure whether you want to write code against the NLTK parse tree (see How to identify the subject of a sentence?). But spaCy also gives you this directly: the subject is the token whose word.dep_ property is 'nsubj'.
import spacy
from nltk import Tree
en_nlp = spacy.load('en')
doc = en_nlp("The quick brown fox jumps over the lazy dog.")
sentence = next(doc.sents)
for word in sentence:
    print("%s:%s" % (word, word.dep_))
The:det
quick:amod
brown:amod
fox:nsubj
jumps:ROOT
over:prep
the:det
lazy:amod
dog:pobj
Keep in mind that there can be more complicated sentences where there is more than one subject.
doc2 = en_nlp(u'When we study hard, we usually do well.')
sentence2 = next(doc2.sents)
for word in sentence2:
    print("%s:%s" % (word, word.dep_))
When:advmod
we:nsubj
study:advcl
hard:advmod
,:punct
we:nsubj
usually:advmod
do:ROOT
well:advmod
.:punct

Like leavesof3, I prefer to use spaCy for this kind of task. It has better visualization, i.e.
the subject will be the word or phrase (if you use noun chunking) with the dependency label "nsubj" (nominal subject).
You can access the displaCy (spaCy visualization) demo here.
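If you want the noun-chunk version of the subject mentioned above, spaCy exposes the chunks directly, and displaCy can also be rendered locally. A minimal sketch (the en_core_web_sm model name and the example sentence are my own assumptions, not part of the answer):
import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp("The quick brown fox jumps over the lazy dog.")

# a noun chunk groups the noun with its modifiers, so the chunk whose root
# carries the 'nsubj' dependency is the full subject phrase
subjects = [chunk.text for chunk in doc.noun_chunks if chunk.root.dep_ == 'nsubj']
print(subjects)  # ['The quick brown fox']

# to see the same parse rendered locally instead of the online demo:
# spacy.displacy.serve(doc, style='dep')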

Try this:
import spacy

nlp = spacy.load('en_core_web_sm')
sent = "I need to be able to log into the Equitable siteI tried my username and password from the AXA Equitable site which worked fine yesterday but it won't allow me to log in and when I try to change my password it says my answer is incorrect for the secret question I just need to be able to log into the Equitable site"
nlp_doc = nlp(sent)
subject = [tok for tok in nlp_doc if tok.dep_ == "nsubj"]
print(subject)
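If you also need the modifiers of the subject rather than just its head token, each nsubj token can be expanded to its syntactic subtree. A sketch under the same en_core_web_sm assumption as above:
import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp("The quick brown fox jumps over the lazy dog.")

# tok.subtree yields the token together with everything that depends on it,
# so joining it reconstructs the complete subject phrase
subject_phrases = [" ".join(t.text for t in tok.subtree)
                   for tok in doc if tok.dep_ == "nsubj"]
print(subject_phrases)  # ['The quick brown fox']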

Related

How to parse answers with deepl api?

I want to use the DeepL translate API for my university project, but I can't parse the response. I want to use it with PHP or with Python; since I'll pass the argument on to a Python script anyway, it doesn't matter to me which side does the translation. I tried it in PHP like this:
$original = $_GET['searchterm'];
$deeplTranslateURL = 'https://api-free.deepl.com/v2/translate?auth_key=MYKEY&text=' . urlencode($original) . '&target_lang=EN';
if (get_headers($deeplTranslateURL)[0] == 'HTTP/1.1 200 OK') {
    $translated = str_replace(' ', '', json_decode(file_get_contents($deeplTranslateURL))["translations"][0]["text"]);
} else {
    echo("translate error");
}
$output = passthru("python search.py $original $translated");
and I also tried this in search.py, based on this answer:
#!/usr/bin/env python
import sys
import requests
r = requests.post(
    url='https://api.deepl.com/v2/translate',
    data={
        'target_lang': 'EN',
        'auth_key': 'MYKEY',
        'text': str(sys.argv)[1]
    })
print('Argument:', sys.argv[1])
print('Argument List:', str(sys.argv))
print('translated to: ', str(r.json()["translations"][0]["text"]))
But neither got me any answer. How can I do this correctly? I also know I could do it somehow with cURL, but I have never used that library.
DeepL now has a Python library that makes translation with Python much easier and eliminates the need to use requests and parse the response.
Get started as such:
import deepl
translator = deepl.Translator(auth_key)
result = translator.translate_text(text_you_want_to_translate, target_lang="EN-US")
print(result)
Looking at your question, it looks like search.py might have a couple of problems, namely that sys.argv splits every individual word into a separate item in a list, so you're only passing a single word to DeepL. This is a problem because DeepL is a contextual translator: it builds a translation based on the words in a sentence, it doesn't simply act as a dictionary for individual words. If you want to translate single words, the DeepL API probably isn't what you want to go with.
However, if you are actually trying to pass a sentence to DeepL, I have built out this new search.py that should work for you:
import sys
import deepl
auth_key="your_auth_key"
translator = deepl.Translator(auth_key)
"""
" ".join(sys.argv[1:]) converts all list items after item [0]
into a string separated by spaces
"""
result = translator.translate_text(" ".join(sys.argv[1:]), target_lang = "EN-US")
print('Argument:', sys.argv[1])
print('Argument List:', str(sys.argv))
print("String to translate: ", " ".join(sys.argv[1:]))
print("Translated String:", result)
I ran the program by entering this:
search.py Der Künstler wurde mit einem Preis ausgezeichnet.
and received this output:
Argument: Der
Argument List: ['search.py', 'Der', 'Künstler', 'wurde', 'mit', 'einem',
'Preis', 'ausgezeichnet.']
String to translate: Der Künstler wurde mit einem Preis ausgezeichnet.
Translated String: The artist was awarded a prize.
I hope this helps, and that it's not too far past the end of your University Project!
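As a small variation on the script above (my own sketch, not part of the answer): the auth key can be read from an environment variable instead of being hard-coded; the variable name DEEPL_AUTH_KEY is hypothetical.
import os
import sys

import deepl

# hypothetical variable name; export it in the shell before running the script
auth_key = os.environ["DEEPL_AUTH_KEY"]
translator = deepl.Translator(auth_key)

# join all CLI arguments into one string so DeepL sees the full sentence
text = " ".join(sys.argv[1:])
result = translator.translate_text(text, target_lang="EN-US")
print(result.text)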

python wikipedia package changing input

I'm running a script to get pages related to a word using Python (pip3 install wikipedia). I enter a word to search, let's say the word is "cat". I send that to the code below, but the wikipedia code changes it to "hat" and returns pages related to "hat". It does this with any word I search for (e.g. "bear" becomes "beard", "dog" becomes "do", etc.).
wikipedia_page_name = "cat"
print("Original: ", wikipedia_page_name)
myString = wikipedia.page(wikipedia_page_name)
print("Returned: ", myString)
Here is what I get back:
Original: cat
Returned: <WikipediaPage 'Hat'>
My steps were to install wikipedia (pip3 install wikipedia) and then import it (import wikipedia). That's it! I've tried uninstalling and reinstalling, but I get the same results.
Any help is appreciated!
If you want to work with the page <WikipediaPage 'Cat'>, try setting auto_suggest to False, since the suggestion feature can be pretty bad at finding the right page:
import wikipedia
wikipedia_page_name = "cat"
print("Original: ", wikipedia_page_name)
myString = wikipedia.page(wikipedia_page_name, pageid=None, auto_suggest=False)
print("Returned: ", myString)
Output:
Original: cat
Returned: <WikipediaPage 'Cat'>
If you want to find titles, use search instead:
import wikipedia
wikipedia_page_name = "cat"
searches = wikipedia.search(wikipedia_page_name)
print(searches)
Output:
['Cat', 'Cat (disambiguation)', 'Keyboard Cat', 'Calico cat', 'Pussy Cat Pussy Cat', 'Felidae', "Schrödinger's cat", 'Tabby cat', 'Bengal cat', 'Sphynx cat']
You can use both together to make sure you get the right page from a String, as such:
import wikipedia
wikipedia_page_name = "cat"
searches = wikipedia.search(wikipedia_page_name)
if searches:
    my_page = wikipedia.page(searches[0], pageid=None, auto_suggest=False)
    print(my_page)
else:
    print("No page found for the String", wikipedia_page_name)
Output:
<WikipediaPage 'Cat'>
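One caveat (my own addition, not part of the answer): with auto_suggest=False an exact-title lookup can also fail outright, so it may be worth catching the package's exceptions:
import wikipedia

wikipedia_page_name = "cat"
try:
    my_page = wikipedia.page(wikipedia_page_name, auto_suggest=False)
    print(my_page)
except wikipedia.exceptions.DisambiguationError as err:
    # the title points at a disambiguation page; err.options lists the candidates
    print("Ambiguous title, candidates:", err.options[:5])
except wikipedia.exceptions.PageError:
    print("No page found for the String", wikipedia_page_name)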

Fancy string substitution to leave comments on requirements with the same padding in Python

I'm writing a handy tool for myself to leave comments in the pyproject.toml file next to every requirement.
So it should look something like this.
[tool.poetry.dependencies]
django = "3.0.5"
djangorestframework = "3.11.0"              # Rest api [ https://www.django-rest-framework.org ]
psycopg2-binary = "2.8.4"                   # PostgreSQL driver
redis = "3.4.1"                             # The Python interface to the Redis key-value store
I know how to do it in a simple, ugly way, but maybe you can come up with something sexy and clever?
I thought about:
new_text = re.sub('^(django[\s|=].*").*', r"\1 # COMMENTS HERE", text, flags=re.MULTILINE)
Where django would be replaced with the package name, of course... But it seems there is no way to keep the same alignment, because I can't get the length of the matched string there. And I also can't just use a TOML parser and rewrite the file from scratch, because I want to keep existing comments which are not related to requirements.
I tried another approach. But I don't like it either.
poetry = parsed_toml['tool']['poetry']
for dep_type in ('dependencies', 'dev-dependencies'):
    for dependency, version in poetry[dep_type].items():
        char_length = len(dependency + version)
        text = re.sub(f'^{dependency}.*{version}.*', f'{dependency} = "{version}" {" " * (45 - char_length)} # Comment', text, flags=re.MULTILINE)
I also want to make this substitution with as few iterations as possible. So if you have any bright ideas, please share :)
You can get the length of the match by using a replacement function:
import re
text = '''\
[tool.poetry.dependencies]
django = "3.0.5"
djangorestframework = "3.11.0" # Rest api [ https://www.django-rest-framework.org ]
psycopg2-binary = "2.8.4" # PostgreSQL driver
redis = "3.4.1" # The Python interface to the Redis key-value store
'''
def repl(m):
    return f'{m.group(1):44s}# COMMENTS HERE'  # pad the match out to 44 characters before the comment

text = re.sub(r'^(django[\s=].*").*$', repl, text, flags=re.MULTILINE)
print(text)
Output:
[tool.poetry.dependencies]
django = "3.0.5" # COMMENTS HERE
djangorestframework = "3.11.0" # Rest api [ https://www.django-rest-framework.org ]
psycopg2-binary = "2.8.4" # PostgreSQL driver
redis = "3.4.1" # The Python interface to the Redis key-value store

How to extract tag attributes using Spacy

I tried to get the morphological attributes of the verb using spaCy like this:
import spacy
from spacy.lang.it.examples import sentences
nlp = spacy.load('it_core_news_sm')
doc = nlp('Ti è piaciuto il film?')
token = doc[2]
nlp.vocab.morphology.tag_map[token.tag_]
output was:
{'pos': 'VERB'}
But I want to extract
V__Mood=Cnd|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin": {POS: VERB}
Is it possible to extract the mood, tense, number and person information as specified in the tag map https://github.com/explosion/spacy/blob/master/spacy/lang/it/tag_map.py like above using spaCy?
The nlp.vocab.morphology.tag_map maps from the detailed tag to a dict with the simpler tag, so you just need to skip that step and inspect the tag directly:
import spacy
nlp = spacy.load('it')
doc = nlp('Ti è piaciuto il film?')
print(doc[2].tag_)
should return
VA__Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin
(with spacy 2.0.11, it_core_news_sm-2.0.0)
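For completeness (this goes beyond the answer, which targets spaCy 2.0): in spaCy v3 the tag map is gone and the morphological features are exposed directly on Token.morph. A sketch, assuming a v3 it_core_news_sm model is installed:
import spacy

nlp = spacy.load('it_core_news_sm')
doc = nlp('Ti è piaciuto il film?')
token = doc[2]

print(token.tag_)                # fine-grained tag
print(token.morph)               # e.g. Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin
print(token.morph.get("Tense"))  # e.g. ['Pres']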

Coreference resolution in python nltk using Stanford coreNLP

Stanford CoreNLP provides coreference resolution as mentioned here; this thread and this also provide some insights about its implementation in Java.
However, I am using Python and NLTK, and I am not sure how I can use the coreference resolution functionality of CoreNLP in my Python code. I have been able to set up StanfordParser in NLTK; this is my code so far.
from nltk.parse.stanford import StanfordDependencyParser
stanford_parser_dir = 'stanford-parser/'
eng_model_path = stanford_parser_dir + "stanford-parser-models/edu/stanford/nlp/models/lexparser/englishRNN.ser.gz"
my_path_to_models_jar = stanford_parser_dir + "stanford-parser-3.5.2-models.jar"
my_path_to_jar = stanford_parser_dir + "stanford-parser.jar"
How can I use the coreference resolution of CoreNLP in Python?
As mentioned by @Igor, you can try the Python wrapper implemented in this GitHub repo: https://github.com/dasmith/stanford-corenlp-python
This repo contains two main files:
corenlp.py
client.py
Make the following changes to get CoreNLP working:
In corenlp.py, change the path of the CoreNLP folder. Set it to the path where your local machine contains the CoreNLP folder, and add the path at line 144 of corenlp.py:
if not corenlp_path:
    corenlp_path = <path to the corenlp file>
The jar file version number in corenlp.py may be different. Set it according to the CoreNLP version that you have; change it at line 135 of corenlp.py:
jars = ["stanford-corenlp-3.4.1.jar",
        "stanford-corenlp-3.4.1-models.jar",
        "joda-time.jar",
        "xom.jar",
        "jollyday.jar"]
In this list, replace 3.4.1 with the jar version that you have downloaded.
Run the command:
python corenlp.py
This will start a server.
Now run the main client program:
python client.py
This returns a dictionary, and you can access the coreference output using 'coref' as the key:
For example: John is a Computer Scientist. He likes coding.
{
    "coref": [[[["a Computer Scientist", 0, 4, 2, 5], ["John", 0, 0, 0, 1]],
               [["He", 1, 0, 0, 1], ["John", 0, 0, 0, 1]]]]
}
I have tried this on Ubuntu 16.04. Use java version 7 or 8.
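Reading the structure above (this is my interpretation of the example output: each chain is a list of [mention, antecedent] pairs whose first element is the mention text), the links can be printed like this:
# a sketch built from the example output shown above
result = {
    "coref": [[[["a Computer Scientist", 0, 4, 2, 5], ["John", 0, 0, 0, 1]],
               [["He", 1, 0, 0, 1], ["John", 0, 0, 0, 1]]]]
}

for chain in result["coref"]:
    for mention, antecedent in chain:
        # the remaining numbers are sentence/word offsets; only the text is used here
        print(f"{mention[0]!r} -> {antecedent[0]!r}")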
Stanford CoreNLP now has an official Python binding called StanfordNLP, as you can read on the StanfordNLP website.
The native API doesn't seem to support the coref processor yet, but you can use the CoreNLPClient interface to call the "standard" CoreNLP (the original Java software) from Python.
So, after following the instructions to set up the Python wrapper here, you can get the coreference chains like this:
from stanfordnlp.server import CoreNLPClient

text = 'Barack was born in Hawaii. His wife Michelle was born in Milan. He says that she is very smart.'
print(f"Input text: {text}")

# set up the client
client = CoreNLPClient(properties={'annotators': 'coref', 'coref.algorithm': 'statistical'}, timeout=60000, memory='16G')

# submit the request to the server
ann = client.annotate(text)

mychains = list()
chains = ann.corefChain
for chain in chains:
    mychain = list()
    # Loop through every mention of this chain
    for mention in chain.mention:
        # Get the sentence in which this mention is located, and get the words which are part of this mention
        # (we can have more than one word, for example, a mention can be a pronoun like "he", but also a compound noun like "His wife Michelle")
        words_list = ann.sentence[mention.sentenceIndex].token[mention.beginIndex:mention.endIndex]
        # build a string out of the words of this mention
        ment_word = ' '.join([x.word for x in words_list])
        mychain.append(ment_word)
    mychains.append(mychain)

for chain in mychains:
    print(' <-> '.join(chain))
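A side note (an assumption about the current packaging, not part of the original answer): stanfordnlp has since been renamed stanza, and the client is usually opened as a context manager so the background CoreNLP server is shut down cleanly. The loop over ann.corefChain stays the same; a sketch:
from stanza.server import CoreNLPClient

text = 'Barack was born in Hawaii. His wife Michelle was born in Milan. He says that she is very smart.'

with CoreNLPClient(annotators=['coref'], timeout=60000, memory='16G') as client:
    ann = client.annotate(text)
    for chain in ann.corefChain:
        # join the tokens of each mention back into a phrase, as in the loop above
        mentions = [' '.join(t.word for t in
                             ann.sentence[m.sentenceIndex].token[m.beginIndex:m.endIndex])
                    for m in chain.mention]
        print(' <-> '.join(mentions))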
stanfordcorenlp, the relatively new wrapper, may work for you.
Suppose the text is "Barack Obama was born in Hawaii. He is the president. Obama was elected in 2008."
The code:
# coding=utf-8
import json
from stanfordcorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP(r'G:\JavaLibraries\stanford-corenlp-full-2017-06-09', quiet=False)
props = {'annotators': 'coref', 'pipelineLanguage': 'en'}
text = 'Barack Obama was born in Hawaii. He is the president. Obama was elected in 2008.'
result = json.loads(nlp.annotate(text, properties=props))
num, mentions = list(result['corefs'].items())[0]
for mention in mentions:
    print(mention)
Every "mention" above is a Python dict like this:
{
    "id": 0,
    "text": "Barack Obama",
    "type": "PROPER",
    "number": "SINGULAR",
    "gender": "MALE",
    "animacy": "ANIMATE",
    "startIndex": 1,
    "endIndex": 3,
    "headIndex": 2,
    "sentNum": 1,
    "position": [1, 1],
    "isRepresentativeMention": true
}
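As a follow-up sketch (my own addition, continuing from the result dict produced by the snippet above), each chain can be collapsed onto its representative mention:
# continuing from `result` as built above
def resolve(corefs):
    for chain_id, mentions in corefs.items():
        representative = next(m for m in mentions if m['isRepresentativeMention'])
        for mention in mentions:
            print(f"{mention['text']!r} refers to {representative['text']!r}")

resolve(result['corefs'])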
Maybe this works for you? https://github.com/dasmith/stanford-corenlp-python
If not, you can try to combine the two yourself using http://www.jython.org/
