Google Cloud NL entity recognizer grouping words together - python

When attempting to find the entities in a long input of text, Google Cloud's Natural Language API is grouping words together and then assigning the group an incorrect entity type. Here is my program:
import os
import sys
import six
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types

def entity_recognizer(nouns):
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/Users/superaitor/Downloads/link"
    text = ""
    for word in nouns:
        text += word + " "

    client = language.LanguageServiceClient()
    if isinstance(text, six.binary_type):
        text = text.decode('utf-8')

    document = types.Document(
        content=text.encode('utf-8'),
        type=enums.Document.Type.PLAIN_TEXT)

    encoding = enums.EncodingType.UTF32
    if sys.maxunicode == 65535:
        encoding = enums.EncodingType.UTF16

    entities = client.analyze_entities(document, encoding).entities
    entity_type = ('UNKNOWN', 'PERSON', 'LOCATION', 'ORGANIZATION',
                   'EVENT', 'WORK_OF_ART', 'CONSUMER_GOOD', 'OTHER')

    for entity in entities:
        #if entity_type[entity.type] is "PERSON":
        print(entity_type[entity.type])
        print(entity.name)
Here nouns is a list of words, which I then join into a string (I've tried multiple ways of doing so; all give the same result). Yet the program spits out output like:
PERSON
liberty secularism etching domain professor lecturer tutor royalty
government adviser commissioner
OTHER
business view society economy
OTHER
business
OTHER
verge industrialization market system custom shift rationality
OTHER
family kingdom life drunkenness college student appearance income family
brink poverty life writer variety attitude capitalism age process
production factory system
Any input on how to fix this?

To analyze entities in a text you can use a sample from the documentation which looks something like this:
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types
import six

def entities_text(text):
    """Detects entities in the text."""
    client = language.LanguageServiceClient()

    if isinstance(text, six.binary_type):
        text = text.decode('utf-8')

    # Instantiates a plain text document.
    document = types.Document(
        content=text,
        type=enums.Document.Type.PLAIN_TEXT)

    # Detects entities in the document. You can also analyze HTML with:
    #   document.type == enums.Document.Type.HTML
    entities = client.analyze_entities(document).entities

    # entity types from enums.Entity.Type
    entity_type = ('UNKNOWN', 'PERSON', 'LOCATION', 'ORGANIZATION',
                   'EVENT', 'WORK_OF_ART', 'CONSUMER_GOOD', 'OTHER')

    for entity in entities:
        print('=' * 20)
        print(u'{:<16}: {}'.format('name', entity.name))
        print(u'{:<16}: {}'.format('type', entity_type[entity.type]))
        print(u'{:<16}: {}'.format('metadata', entity.metadata))
        print(u'{:<16}: {}'.format('salience', entity.salience))
        print(u'{:<16}: {}'.format('wikipedia_url',
                                   entity.metadata.get('wikipedia_url', '-')))

entities_text("Donald Trump is president of United States of America")
The output of this sample is:
====================
name : Donald Trump
type : PERSON
metadata : <google.protobuf.pyext._message.ScalarMapContainer object at 0x7fd9d0125170>
salience : 0.9564903974533081
wikipedia_url : https://en.wikipedia.org/wiki/Donald_Trump
====================
name : United States of America
type : LOCATION
metadata : <google.protobuf.pyext._message.ScalarMapContainer object at 0x7fd9d01252b0>
salience : 0.04350961744785309
wikipedia_url : https://en.wikipedia.org/wiki/United_States
As you can see in this example, Entity Analysis inspects the given text for known entities (proper nouns such as public figures, landmarks, and so on). It is not going to return an entity for every word in the text.
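If what you want is only the proper-noun entities, a minimal sketch (assuming the same enums-based client as in the sample above; EntityMention.Type distinguishes proper-noun mentions from common-noun ones) would be to filter on each entity's mentions:

for entity in entities:
    # Keep only entities that have at least one proper-noun mention.
    if any(mention.type == enums.EntityMention.Type.PROPER
           for mention in entity.mentions):
        print(entity.name)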

Instead of classifying according to entities, I would use Google's default categories directly, changing
entities = client.analyze_entities(document, encoding).entities
to
categories = client.classify_text(document).categories
and updating the code accordingly. I wrote the following sample code based on this tutorial, further developed on GitHub.
def run_quickstart():
    # [START language_quickstart]
    # Imports the Google Cloud client library
    # [START migration_import]
    from google.cloud import language
    from google.cloud.language import enums
    from google.cloud.language import types
    # [END migration_import]

    # Instantiates a client
    # [START migration_client]
    client = language.LanguageServiceClient()
    # [END migration_client]

    # The text to analyze
    text = u'For its part, India has said it will raise taxes on 29 products imported from the US - including some agricultural goods, steel and iron products - in retaliation for the wide-ranging US tariffs.'
    document = types.Document(
        content=text,
        type=enums.Document.Type.PLAIN_TEXT)

    # Detects the sentiment of the text
    sentiment = client.analyze_sentiment(document=document).document_sentiment

    # Classify content categories
    categories = client.classify_text(document).categories

    # User category feedback
    for category in categories:
        print(u'=' * 20)
        print(u'{:<16}: {}'.format('name', category.name))
        print(u'{:<16}: {}'.format('confidence', category.confidence))

    # User sentiment feedback
    print('Text: {}'.format(text))
    print('Sentiment: {}, {}'.format(sentiment.score, sentiment.magnitude))
    # [END language_quickstart]

if __name__ == '__main__':
    run_quickstart()
Does this solution work for you? If not, why?

Related

How to search a specific country's tweets with Tweepy client.search_recent_tweets()

Hi y'all. I'm trying to figure out how to filter for a specific country's tweets using search_recent_tweets. I take a country name as input, use pycountry to get the 2-character country code, and then I can either put some sort of location filter in my query or in the search_recent_tweets params. Nothing I have tried so far in either has worked.
import json
import tweepy
import pycountry as pyc

# upload token
BEARER_TOKEN = 'XXXXXXXXX'

# get tweets
client = tweepy.Client(bearer_token=BEARER_TOKEN)

# TAKE USER INPUT
countryQuery = input("Find recent tweets about travel in a certain country (input country name): ")
keyword = 'women safe'  # gets tweets containing women and safe for that country (safe will catch safety)

# get country code to plug in as param in search_recent_tweets
country_code = str(pyc.countries.search_fuzzy(countryQuery)[0].alpha_2)

# get 100 recent tweets containing keywords and from location = countryQuery
query = str(keyword + ' place_country=' + str(countryQuery) + ' -is:retweet')  # search for keyword and no retweets
posts = client.search_recent_tweets(query=query, max_results=100,
                                    tweet_fields=['id', 'text', 'entities', 'author_id'])
# expansions=geo.place_id, place.fields=[country_code],

# filter posts to remove retweets
# export tweets to json
with open('twitter.json', 'w') as fp:
    for tweet in posts.data:
        json.dump(tweet.data, fp)
        fp.write('\n')
        print("* " + str(tweet.text))
I have tried variations of:
query = str(keyword + ' -is:retweet')  # search for keyword and no retweets
posts = client.search_recent_tweets(query=query, place_fields=[str(countryQuery), country_code],
                                    max_results=100, tweet_fields=['id', 'text', 'entities', 'author_id'])
and:
query = str(keyword + ' place.fields=' + str(countryQuery) + ',' + country_code + ' -is:retweet')  # search for keyword and no retweets
posts = client.search_recent_tweets(query=query, max_results=100,
                                    tweet_fields=['id', 'text', 'entities', 'author_id'])
These either ended up pulling NoneType tweets (i.e. nothing) or caused an error like:
"The place.fields query parameter value [Germany] is not one of [contained_within,country,country_code,full_name,geo,id,name,place_type]"
The documentation for search_recent_tweets makes it look like place.fields / place_fields / place_country should be supported.
Any advice would help!!!
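For what it's worth, place_country appears to be a query operator rather than a request parameter, so it belongs inside the query string itself, written with a colon and the 2-letter code. A minimal sketch, assuming your API access tier allows geo operators (they are not available on every tier):

import tweepy

client = tweepy.Client(bearer_token=BEARER_TOKEN)

# place_country is part of the query itself, written with a colon and an
# ISO 3166-1 alpha-2 code; place_fields only controls which fields are
# returned for matched places, not which tweets match.
query = 'women safe place_country:DE -is:retweet'
posts = client.search_recent_tweets(
    query=query,
    max_results=100,
    tweet_fields=['id', 'text', 'entities', 'author_id'],
)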

OSM Overpass missing data in query result

I'm gathering all cities, towns and villages of some countries from OSM using an Overpass query in a Python program.
Everything seems to be correct, but I found a town in Luxembourg that is missing from my result set: the town of Kiischpelt.
import requests
import json

Country = 'LU'

overpass_url = "http://overpass-api.de/api/interpreter"
overpass_query = """
[out:json];
area["ISO3166-1"="{0}"][admin_level=2]->.search;
(node["place"="city"](area.search);
 node["place"="town"](area.search);
 node["place"="village"](area.search);
 way["place"="city"](area.search);
 way["place"="town"](area.search);
 way["place"="village"](area.search);
 rel["place"="city"](area.search);
 rel["place"="town"](area.search);
 rel["place"="village"](area.search);
);
out center;
""".format(Country)

response = requests.get(overpass_url, params={'data': overpass_query})
data = response.json()

filename = 'C:/Data/GetGeoData/data/' + Country + 'cities' + '.json'
with open(filename, 'w', encoding="utf-8") as f:
    json.dump(data, f)
When searching on the OSM site for Kiischpelt, I get a result of type relation, but it doesn't appear in my result set.
It also doesn't appear when I relax the query to rel["place"];, which should return places of all kinds (city, town, village, isolated dwelling, ...).
Any idea what I'm doing wrong?
Many thanks!
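(Aside: one way to narrow this down is to fetch the Kiischpelt relation directly and print its tags. If the relation carries, say, only boundary/administrative tags rather than a place tag matching the filters above, the query will never return it. A small debugging sketch:)

import requests

# Fetch any relation named Kiischpelt and dump its tags to see how it is tagged.
debug_query = """
[out:json];
rel["name"="Kiischpelt"];
out tags;
"""
response = requests.get("http://overpass-api.de/api/interpreter",
                        params={'data': debug_query})
for element in response.json().get('elements', []):
    print(element.get('id'), element.get('tags'))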

LUIS Python SDK utterance addition

We are trying to create a chatbot using the LUIS framework and the Python SDK, with the Azure documentation as a reference. We have been able to add an intent, an entity and pre-built entities this way; these changes show up on the portal, verifying the addition.
But the utterances added by the code below neither show up on the portal nor get listed in the terminal.
def create_utterance(intent, utterance, *labels):
    """
    Add an example LUIS utterance from utterance text and a list of
    labels. Each label is a 2-tuple containing a label name and the
    text within the utterance that represents that label.

    Utterances apply to a specific intent, which must be specified.
    """
    text = utterance.lower()

    def label(name, value):
        value = value.lower()
        start = text.index(value)
        return dict(entity_name=name, start_char_index=start,
                    end_char_index=start + len(value), role=None)

    return dict(text=text, intent_name=intent,
                entity_labels=[label(n, v) for (n, v) in labels])

utterances = [create_utterance("FindFlights", "find flights in economy to Madrid",
                               ("Flight", "economy to Madrid"),
                               ("Location", "Madrid"),
                               ("Class", "economy")),
              create_utterance("FindFlights", "find flights to London in first class",
                               ("Flight", "London in first class"),
                               ("Location", "London"),
                               ("Class", "first")),
              create_utterance("FindFlights", "find flights from seattle to London in first class",
                               ("Flight", "flights from seattle to London in first class"),
                               ("Location", "London"),
                               ("Location", "Seattle"),
                               ("Class", "first"))]

client.examples.batch(appId, appVersion, utterances, raw=True)
client.examples.list(appId, appVersion)
This code does not return any error, but it does not list the utterances either.
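One thing worth checking (a hedged aside, since the last two calls above discard their return values): examples.batch returns a per-utterance result object and examples.list returns the stored utterances, so printing them should reveal whether individual items failed validation:

# Sketch: inspect the return values instead of discarding them.
results = client.examples.batch(appId, appVersion, utterances)
for r in results:
    # has_error / error describe per-utterance failures in the batch call
    print(getattr(r, 'has_error', None), getattr(r, 'error', None))

for example in client.examples.list(appId, appVersion):
    print(example.text)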

Facebook Business SDK: cannot create an Ad

I cannot create a simple Ad with an external link to a mobile app. I have set up access properly and can create a Campaign and an AdSet and upload an image, but during the Ad creation I get this error:
Ads and ad creatives must be associated with a Facebook Page. Try connecting your ad or ad creative to a Page and resubmit your ad.
But I have associated a page! Here is my code:
# No problem with these ones
adset = ...
image_hash = '...'
url = 'https://itunes.apple.com/app/96...'
page_id = '25036...'

# Create an Ad Creative
creative = AdCreative()
creative['_parent_id'] = my_ads_acc_id
creative[AdCreative.Field.title] = 'Aivan Test Creative'
creative[AdCreative.Field.body] = 'Aivan Test Ad Creative Body'
creative[AdCreative.Field.actor_id] = page_id
creative[AdCreative.Field.link_url] = url
creative[AdCreative.Field.object_url] = url
creative[AdCreative.Field.object_type] = AdCreative.ObjectType.domain
creative[AdCreative.Field.call_to_action_type] = AdCreative.CallToActionType.use_mobile_app
creative[AdCreative.Field.image_hash] = image_hash

# Create an Ad
ad = Ad()
ad['_parent_id'] = my_ads_acc_id
ad[Ad.Field.name] = 'Aivan Ad'
ad[Ad.Field.adset_id] = adset[AdSet.Field.id]
ad[Ad.Field.creative] = creative

# This line raises the exception:
ad.remote_create(params={
    'status': Ad.Status.paused,
})
I have specified the actor_id field and have tried various other code samples, but nothing works. How can I connect a page?
Additional info:
My app is in development mode. I cannot turn on production mode because it requires a review, which in turn requires a completed app.
I have tried to use object_story_spec with link_data in it, but that raises a different error because it doesn't work in development mode.
The app and the page are linked via Facebook Business Manager.
The result is the same whether I init the API with the app token or the system user token: FacebookAdsApi.init(app_id, app_secret, app_access_token / system_user_token). The system user has access to both the Ads Account and the Page.
I solved this problem a long time ago, and since then my server app has successfully created lots of Facebook ads of both types, for websites and for mobile apps. The first step in solving the problem was to understand that these ad types are completely different on Facebook; they need different settings for the Campaign, AdSet and Ad. Here is my code for mobile ads creation.
1) Create Campaign object. account_id must be the ID of your Ad Account.
campaign = Campaign()
campaign['_parent_id'] = account_id
campaign[Campaign.Field.name] = 'Some Campaign Name'
campaign[Campaign.Field.objective] = 'APP_INSTALLS'
campaign.remote_create()
campaign_id = str(campaign[Campaign.Field.id])
2) Create AdSet object.
adset = AdSet()
adset['_parent_id'] = account_id
adset.update({
    AdSet.Field.name: 'Some AdSet Name',
    AdSet.Field.campaign_id: campaign_id,
    AdSet.Field.lifetime_budget: budget * 100,
    AdSet.Field.bid_strategy: 'LOWEST_COST_WITHOUT_CAP',
    AdSet.Field.billing_event: AdSet.BillingEvent.link_clicks,
    AdSet.Field.optimization_goal: AdSet.OptimizationGoal.link_clicks,
    AdSet.Field.promoted_object: {
        'object_store_url': app_store_url,
        'application_id': ad_app_id,
    },
    AdSet.Field.targeting: targeting_object,
    AdSet.Field.start_time: '2018-12-01 00:00:00',
    AdSet.Field.end_time: '2018-12-30 23:59:00',
})
adset.remote_create()
adset_id = str(adset[AdSet.Field.id])
Note that to create a mobile ad, you first need to register your mobile app as a Facebook app (this is where you get ad_app_id) and specify links to the Apple App Store and Google Play. The value of app_store_url must be equal to one of those links in your Facebook app settings. Unfortunately, the app can only be registered manually (if you know how to do it programmatically, please write a comment).
Also note that billing_event and optimization_goal are tied to the ad type (mobile/web) and to each other; you cannot just pick an arbitrary combination. (But if you know that this is possible, or there are docs on this topic, let me know.)
budget is a money amount in the currency of your Ad Account. You can specify either lifetime_budget or daily_budget; read the docs about it.
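The targeting_object referenced above is a plain targeting-spec dict. A minimal sketch (the exact fields here are illustrative assumptions; check the targeting docs for your case):

# Minimal illustrative targeting spec; adjust countries/platforms as needed.
targeting_object = {
    'geo_locations': {'countries': ['US']},
    'publisher_platforms': ['facebook', 'instagram'],
}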
3) Then, you have to create the AdCreative object with some other sub-objects. Note that some of these lines of code are necessary for FB ads only, others for IG, others for both of them, but together they work well for everything. You can find a description of all the formats here.
link_data = AdCreativeLinkData()
link_data[AdCreativeLinkData.Field.name] = main_text
link_data[AdCreativeLinkData.Field.message] = title
link_data[AdCreativeLinkData.Field.link] = app_store_url
link_data[AdCreativeLinkData.Field.image_hash] = image_hash
link_data[AdCreativeLinkData.Field.call_to_action] = {
    'type': 'INSTALL_MOBILE_APP',
    'value': {
        'application': ad_app_id,
        'link': app_store_url,
    },
}

object_story_spec = AdCreativeObjectStorySpec()
object_story_spec[AdCreativeObjectStorySpec.Field.page_id] = page_id
object_story_spec[AdCreativeObjectStorySpec.Field.link_data] = link_data

creative = AdCreative()
creative['_parent_id'] = account_id
creative[AdCreative.Field.object_story_spec] = object_story_spec
creative[AdCreative.Field.title] = main_text
creative[AdCreative.Field.body] = title
creative[AdCreative.Field.actor_id] = page_id
creative[AdCreative.Field.link_url] = app_store_url
creative[AdCreative.Field.image_hash] = image_hash
To upload an image and get image_hash, check out this doc. The page_id must be the ID of the page whose name and logo will be shown as the author of the ad.
Note that the user who creates the ad must have access to this page, to the mobile app registered on FB (ad_app_id), and to the Ad Account (account_id). In my server application I use Facebook system users for all the work with the API.
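For completeness, a short sketch of the image-upload step (assuming the facebook_business SDK's AdImage class; the local file name is a placeholder):

from facebook_business.adobjects.adimage import AdImage

# Upload a local image to the Ad Account and read back its hash.
image = AdImage(parent_id=account_id)
image[AdImage.Field.filename] = 'creative.png'  # placeholder path to your image
image.remote_create()
image_hash = image[AdImage.Field.hash]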
4) And finally, create the Ad object itself:
ad = Ad()
ad['_parent_id'] = account_id
ad[Ad.Field.name] = 'Some Ad Name'
ad[Ad.Field.adset_id] = adset_id
ad[Ad.Field.creative] = creative
ad.remote_create(params={
    'status': Ad.Status.active,
})
ad_id = str(ad[Ad.Field.id])
That's all!
Maybe someone will need it, or will just want to see the difference, when creating FB/IG ads for websites; it is a little bit simpler. So, here is my code for website ads creation.
1) Create the Campaign object. Notice that website ads have a different objective. account_id must be the ID of your Ad Account.
campaign = Campaign()
campaign['_parent_id'] = account_id
campaign[Campaign.Field.name] = 'Some Campaign Name'
campaign[Campaign.Field.objective] = 'LINK_CLICKS'
campaign.remote_create()
campaign_id = str(campaign[Campaign.Field.id])
2) Create the AdSet object. Note that billing_event and optimization_goal are tied to the ad type (mobile/web) and to each other. Also, here you don't need to specify promoted_object in the AdSet.
adset = AdSet()
adset['_parent_id'] = account_id
adset.update({
    AdSet.Field.name: 'Some AdSet Name',
    AdSet.Field.campaign_id: campaign_id,
    AdSet.Field.lifetime_budget: budget * 100,
    AdSet.Field.bid_strategy: 'LOWEST_COST_WITHOUT_CAP',
    AdSet.Field.billing_event: AdSet.BillingEvent.impressions,
    AdSet.Field.optimization_goal: AdSet.OptimizationGoal.reach,
    AdSet.Field.targeting: targeting_object,
    AdSet.Field.start_time: '2018-12-01 00:00:00',
    AdSet.Field.end_time: '2018-12-30 23:59:00',
})
adset.remote_create()
adset_id = str(adset[AdSet.Field.id])
Rules for budget are the same: budget is a money amount in the currency of your Ad Account. You can specify either lifetime_budget or daily_budget; read the docs about it.
3) Then, you have to create the AdCreative object with some other sub-objects. You can find a description of all the formats here.
link_data = AdCreativeLinkData()
link_data[AdCreativeLinkData.Field.name] = main_text
link_data[AdCreativeLinkData.Field.message] = title
link_data[AdCreativeLinkData.Field.link] = website_url
link_data[AdCreativeLinkData.Field.image_hash] = image_hash

object_story_spec = AdCreativeObjectStorySpec()
object_story_spec[AdCreativeObjectStorySpec.Field.page_id] = page_id
object_story_spec[AdCreativeObjectStorySpec.Field.link_data] = link_data

creative = AdCreative()
creative['_parent_id'] = account_id
creative[AdCreative.Field.object_story_spec] = object_story_spec
creative[AdCreative.Field.title] = main_text
creative[AdCreative.Field.body] = title
creative[AdCreative.Field.actor_id] = page_id
creative[AdCreative.Field.link_url] = website_url
creative[AdCreative.Field.object_type] = AdCreative.ObjectType.domain
creative[AdCreative.Field.image_hash] = image_hash
To upload an image and get image_hash, check out this doc. The page_id must be the ID of the page whose name and logo will be shown as the author of the ad. Note that the user who creates the ad must have access to this page and to the Ad Account (account_id).
4) And finally, create the Ad object itself:
ad = Ad()
ad['_parent_id'] = account_id
ad[Ad.Field.name] = 'Some Ad Name'
ad[Ad.Field.adset_id] = adset_id
ad[Ad.Field.creative] = creative
ad.remote_create(params={
    'status': Ad.Status.active,
})
ad_id = str(ad[Ad.Field.id])
As you can see, to promote websites you don't need to register them on Facebook (in contrast to mobile ads).

Entity Recognition in Stanford NLP using Python

I am using Stanford CoreNLP from Python. I have taken the code from here.
Following is the code :
from stanfordcorenlp import StanfordCoreNLP
from collections import defaultdict
import logging
import json

class StanfordNLP:
    def __init__(self, host='http://localhost', port=9000):
        self.nlp = StanfordCoreNLP(host, port=port,
                                   timeout=30000, quiet=True, logging_level=logging.DEBUG)
        self.props = {
            'annotators': 'tokenize,ssplit,pos,lemma,ner,parse,depparse,dcoref,relation,sentiment',
            'pipelineLanguage': 'en',
            'outputFormat': 'json'
        }

    def word_tokenize(self, sentence):
        return self.nlp.word_tokenize(sentence)

    def pos(self, sentence):
        return self.nlp.pos_tag(sentence)

    def ner(self, sentence):
        return self.nlp.ner(sentence)

    def parse(self, sentence):
        return self.nlp.parse(sentence)

    def dependency_parse(self, sentence):
        return self.nlp.dependency_parse(sentence)

    def annotate(self, sentence):
        return json.loads(self.nlp.annotate(sentence, properties=self.props))

    @staticmethod
    def tokens_to_dict(_tokens):
        tokens = defaultdict(dict)
        for token in _tokens:
            tokens[int(token['index'])] = {
                'word': token['word'],
                'lemma': token['lemma'],
                'pos': token['pos'],
                'ner': token['ner']
            }
        return tokens

if __name__ == '__main__':
    sNLP = StanfordNLP()
    text = r'China on Wednesday issued a $50-billion list of U.S. goods including soybeans and small aircraft for possible tariff hikes in an escalating technology dispute with Washington that companies worry could set back the global economic recovery.The country\'s tax agency gave no date for the 25 percent increase...'
    ANNOTATE = sNLP.annotate(text)
    POS = sNLP.pos(text)
    TOKENS = sNLP.word_tokenize(text)
    NER = sNLP.ner(text)
    PARSE = sNLP.parse(text)
    DEP_PARSE = sNLP.dependency_parse(text)
I am only interested in the entity recognition, which is saved in the variable NER. Printing NER gives one result, while running the same text on the Stanford website gives a different one (both shown as screenshots in the original post).
There are two problems with my Python code:
1. '$' and '50-billion' should be combined and tagged as a single entity. Similarly, I want '25' and 'percent' as a single entity, as in the online Stanford output.
2. In my output, 'Washington' is shown as a state and 'China' is shown as a country. I want both of them to be shown as 'LOCATION', as in the Stanford website output. The possible solution to this problem lies in the documentation.
But I don't know which model I am using or how to change it.
Here is a way you can solve this.
Make sure to download Stanford CoreNLP 3.9.1 and the necessary model jars.
Set up the server properties in a file "ner-server.properties":
annotators = tokenize,ssplit,pos,lemma,ner
ner.applyFineGrained = false
Start the server with this command:
java -Xmx12g edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000 -serverProperties ner-server.properties
Make sure you've installed this Python package:
https://github.com/stanfordnlp/python-stanford-corenlp
Run this Python code:
import corenlp

client = corenlp.CoreNLPClient(start_server=False,
                               annotators=["tokenize", "ssplit", "pos", "lemma", "ner"])

sample_text = "Joe Smith was born in Hawaii."
ann = client.annotate(sample_text)

for mention in ann.sentence[0].mentions:
    print([x.word for x in ann.sentence[0].token[
        mention.tokenStartInSentenceInclusive:mention.tokenEndInSentenceExclusive]])
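If you also want the (now coarse-grained) label printed next to each mention, a small extension of the loop above, using the entityType field listed below:

for mention in ann.sentence[0].mentions:
    tokens = ann.sentence[0].token[
        mention.tokenStartInSentenceInclusive:mention.tokenEndInSentenceExclusive]
    print(mention.entityType, [t.word for t in tokens])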
Here are all the fields available in the EntityMention for each entity:
sentenceIndex: 0
tokenStartInSentenceInclusive: 5
tokenEndInSentenceExclusive: 7
ner: "MONEY"
normalizedNER: "$5.0E10"
entityType: "MONEY"
