NER for German natural objects - Python

I have some familiarity with R, and I am just starting with Python to get into NLP, with a specific interest in semantic analysis and Named Entity Recognition (I am currently learning spaCy).
I have a background in Humanities and very little computational knowledge.
With this in mind, I am interested in exploring sentiments in German literature of a specific period, in relation to the use of and references to geographical places and natural elements of the specific area and time in which this literature was produced.
I thought I could use dictionaries with tagged places/natural elements in combination with dictionaries for sentiments, and proceed in R with the text mining of my corpus, by analysing how emotions are expressed in proximity to (or in relation to) the entities I am interested in.
Thus, two questions: do such NER dictionaries exist for geographical/natural elements, and do they exist for German? Where could I find them?
I would be very happy to read any sort of suggestion. Thanks.

Stanford CoreNLP provides a good NER tagger, and you can find NER models for German as well. See their website: https://nlp.stanford.edu/software/CRF-NER.html. Check how the predictions come out on your corpus.
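If you prefer to stay in spaCy, a minimal sketch with its pretrained German pipeline looks like this (the de_core_news_sm model and the example sentence are just illustrations; the German models tag places with the LOC label):

# Install the model first with: python -m spacy download de_core_news_sm
import spacy

nlp = spacy.load("de_core_news_sm")
doc = nlp("Der Rhein fließt durch Köln und mündet in die Nordsee.")

# The German pipelines use the labels PER, ORG, LOC and MISC;
# geographical places come out as LOC.
for ent in doc.ents:
    if ent.label_ == "LOC":
        print(ent.text)

This should print the place names (e.g. Rhein, Köln, Nordsee), which you could then match against your own dictionaries of natural elements and combine with a sentiment lexicon.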

Related

Techniques for NER

I might sound really nooby for asking this, but I'm writing a report about Named Entity Recognition for university, and our lecturer wants us to provide the techniques and tools required for NER. I think I've got my tools sorted with spaCy, NLTK and Stanford NLP, but I'm not quite sure what he means by techniques. Would techniques mean tokenization? Tagging? Or aren't those proper techniques?
Cheers
First: Named Entity Recognition is a task of Natural Language Processing.
Techniques are ways to achieve something. For example, there are several ways to do Part-of-Speech tagging, which is another NLP task: POS tagging can be rule-based, based on machine learning, or based on dependency parsing.
The same goes for Named Entity Recognition: there are several techniques, and even several different algorithms, that can achieve it. It's your job to find out which.
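To make "techniques" concrete, here is a small sketch contrasting two techniques for one NLP task (POS tagging) within a single tool (NLTK); the regex patterns are toy examples:

import nltk
from nltk.tag import RegexpTagger

tokens = "The cat sat on the mat".split()

# Technique 1: rule-based tagging with handwritten regex patterns.
rule_tagger = RegexpTagger([
    (r"^(the|a|an)$", "DT"),  # determiners
    (r".*ing$", "VBG"),       # gerunds
    (r".*ed$", "VBD"),        # simple past
    (r".*", "NN"),            # default: noun
])
print(rule_tagger.tag(tokens))

# Technique 2: statistical tagging with NLTK's pretrained perceptron tagger
# (requires: nltk.download('averaged_perceptron_tagger')).
print(nltk.pos_tag(tokens))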

Can I use Natural Language Processing to identify given words in a paragraph, or do I need to use machine learning algorithms?

I need to identify some given words using NLP.
As an example:
Mary lives in France
Suppose the given words are Australia, Germany, and France. This sentence includes only France.
So, among the above 3 given words, I need to identify that the sentence includes only France.
I would comment but I don't have enough reputation. It's a bit unclear exactly what you are trying to achieve here and how representative your example is - please edit your question to make it clearer.
Anyhow, like Guy Coder says, if you know exactly the words you are looking for, you don't really need machine learning or NLP libraries at all. However, if this is not the case, and you don't have every example of what you are looking for, the below might help:
It seems like what you are trying to do is perform Named Entity Recognition (NER), i.e. identify the named entities (e.g. countries) in your sentences. If so, the short answer is: you don't need to implement any machine learning algorithms yourself. You can just use a Python library such as spaCy, which comes out of the box with a pretrained language model that can already perform a bunch of tasks, for instance NER, to a high degree of performance. The following snippet should get you started:
import spacy

# Load a pretrained English pipeline; with spaCy v3+ install it first via
#   python -m spacy download en_core_web_sm
# (older spaCy v2 allowed the shortcut spacy.load('en'))
nlp = spacy.load("en_core_web_sm")

doc = nlp("Mary lives in France")
for entity in doc.ents:
    if entity.label_ == "GPE":  # GPE = geopolitical entity (countries, cities, states)
        print(entity.text)
The output of the above snippet is "France". Named entities cover a wide range of possible things. In the snippet above I have filtered for Geopolitical entities (GPE).
Learn more about spaCy here: https://spacy.io/usage/spacy-101

Extracting and ranking keywords from short text

I am working on a project to extract keywords from short texts (3-4 sentences). Using the spaCy library, I extract noun phrases and named entities and use them as keywords. However, I would like to sort them by their importance with respect to the original text.
I tried standard information retrieval approaches, like TF-IDF, and even a couple of graph-based algorithms, but with such short texts the results weren't great.
I was thinking that maybe a neural network with an attention mechanism could help me rank those keywords. Is there any way to use the pre-trained models that come with spaCy to do some kind of ranking?
How about something like maximal marginal relevance (MMR)? http://www.cs.cmu.edu/~jgc/publication/The_Use_MMR_Diversity_Based_LTMIR_1998.pdf
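In case it helps, here is a minimal sketch of MMR for this use case, assuming spaCy word vectors are available (en_core_web_md is an assumption; any model with vectors works). Candidates are scored greedily for relevance to the text minus redundancy with already-selected keywords:

import numpy as np
import spacy

nlp = spacy.load("en_core_web_md")  # the md/lg models ship with word vectors

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def mmr_rank(text, candidates, top_n=5, lam=0.7):
    """Greedy MMR: balance relevance to the text against diversity."""
    doc_vec = nlp(text).vector
    vecs = [nlp(c).vector for c in candidates]
    relevance = [cosine(v, doc_vec) for v in vecs]
    selected, remaining = [], list(range(len(candidates)))
    while remaining and len(selected) < top_n:
        def mmr(i):
            # Penalise similarity to keywords we have already picked.
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return [candidates[i] for i in selected]

You would pass in the noun phrases and entities you already extract with spaCy as the candidates.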

Named Entity Resolution Algorithm

I was trying to build an entity resolution system, where my entities are:
(i) General named entities, that is, organization, person, location, date, time, money, and percent.
(ii) Some other entities like product and titles of people, such as president, CEO, etc.
(iii) Coreferred entities like pronoun, determiner phrase, synonym, string match, demonstrative noun phrase, alias, apposition.
From various literature and other references, I have defined the scope so that I do not consider the ambiguity of each entity beyond its entity category. That is, I take the Oxford of Oxford University as different from Oxford the place, since the former is the first word of an organization entity and the latter is a location entity.
My task is to construct a resolution algorithm that extracts and resolves the entities.
So, in the first place I am working on an entity extractor.
In the second place, when I try to relate the coreferences, following various works like this seminal one, I find decision-tree-based algorithms with features such as distance, i-pronoun, j-pronoun, string match, definite noun phrase, demonstrative noun phrase, number agreement, semantic class agreement, gender agreement, both proper names, alias, apposition, etc.
The algorithm seems a nice one, where entities are extracted with a Hidden Markov Model (HMM).
I could work out an entity recognition system with an HMM.
Now I am trying to work out a coreference as well as an entity resolution system. I was wondering whether, instead of using so many features, I could take an annotated corpus and train an HMM-based tagger on it directly, with a view to solving relationship extraction, e.g.:
"Obama/PERS is/NA delivering/NA a/NA lecture/NA in/NA Washington/LOC, he/PPERS knew/NA it/NA was/NA going/NA to/NA be/NA small/NA as/NA it/NA may/NA not/NA be/NA his/PoPERS speech/NA as/NA Mr. President/APPERS"
where:
PERS -> PERSON
PPERS -> PERSONAL PRONOUN TO PERSON
PoPERS -> POSSESSIVE PRONOUN TO PERSON
APPERS -> APPOSITIVE TO PERSON
LOC -> LOCATION
NA -> NOT AVAILABLE
Would I be wrong? I made an experiment with around 10,000 words, and early results seem encouraging. With support from one of my colleagues, I am trying to insert some semantic information like PERSUSPOL, LOCCITUS, PoPERSM, etc. (for PERSON OF US IN POLITICS, LOCATION CITY US, POSSESSIVE PERSON MALE) into the tagset, to incorporate entity disambiguation in one go. My feeling is that relationship extraction would be much better then.
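A minimal sketch of what I mean, using NLTK's HMM trainer (the two training sentences are toy placeholders for my ~10,000-word annotated corpus):

from nltk.tag.hmm import HiddenMarkovModelTrainer

# Each training item is a list of (word, tag) pairs in the custom tagset.
train = [
    [("Obama", "PERS"), ("is", "NA"), ("delivering", "NA"), ("a", "NA"),
     ("lecture", "NA"), ("in", "NA"), ("Washington", "LOC")],
    [("he", "PPERS"), ("knew", "NA"), ("his", "PoPERS"), ("speech", "NA")],
]

tagger = HiddenMarkovModelTrainer().train_supervised(train)
print(tagger.tag("Obama knew his speech".split()))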
Please see this new thought too.
I also got some good results with a Naive Bayes classifier, where sentences having predominantly one set of keywords are marked as one class.
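For example, a minimal sketch of that classifier with scikit-learn (the sentences and labels are toy placeholders):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Sentences dominated by one set of keywords get one class label.
sentences = [
    "Obama delivered a speech in Washington",
    "The president addressed the press in the capital",
    "The Thames flows through London",
    "The Alps stretch across several countries",
]
labels = ["politics", "politics", "geography", "geography"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(sentences, labels)
print(clf.predict(["The river runs past the city"]))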
If anyone can suggest a different approach, please feel free to do so.
I use Python 2.x on MS Windows and try to use libraries like NLTK, scikit-learn, Gensim, pandas, NumPy, SciPy, etc.
Thanks in advance.
It seems that you are going down three different paths that are largely independent; each could be a standalone PhD, and there is a lot of literature on each. My first advice: focus on the main task and outsource the rest. Also, if you are developing this for a less-resourced language, you can build on others' work.
Named Entity Recognition
Stanford NLP has gone really far in this area, especially for English. They resolve named entities really well, are widely used, and have a nice community.
Other solutions may exist, e.g. in OpenNLP for Python.
Some have tried to extend it to unusual fine-grained types, but you need much bigger training data to cover the cases, and the decisions become much harder.
Edit: Stanford NER is available through NLTK in Python.
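A minimal sketch of calling it from NLTK (the model and jar paths are placeholders for wherever you unpacked the Stanford NER download; Java must be installed):

from nltk.tag import StanfordNERTagger

# Placeholder paths: point these at your Stanford NER download.
st = StanfordNERTagger(
    "stanford-ner/classifiers/english.all.3class.distsim.crf.ser.gz",
    "stanford-ner/stanford-ner.jar",
    encoding="utf-8",
)

tokens = "Obama is delivering a lecture in Washington".split()
print(st.tag(tokens))
# e.g. [('Obama', 'PERSON'), ..., ('Washington', 'LOCATION')]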
Named Entity Resolution/Linking/Disambiguation
This is concerned with linking a name to some knowledge base, and solves the problem of whether a mention of Oxford refers to Oxford University or to the city of Oxford.
AIDA: one of the state of the art in this area. It uses different kinds of context information as well as coherence information, it has been extended to several languages, and it comes with a good benchmark.
Babelfy: offers an interesting API that does NER and NED for entities and concepts. It also supports many languages, but it has never worked very well for me.
Others include TagMe and Wikify, etc.
Coreference Resolution
Stanford CoreNLP also has some good work in that direction. I can also recommend this work, where coreference resolution is combined with NED.

How to determine the "sentiment" between two named entities with Python/NLTK?

I'm using NLTK to extract named entities, and I'm wondering how it would be possible to determine the sentiment between entities in the same sentence. For example, for "Jon loves Paris." I would get two entities, Jon and Paris. How would I be able to determine the sentiment between these two entities? In this case it should be something like Jon -> Paris = positive.
In short "you cannot". This task is far beyond simple text processing which is provided with NLTK. Such objects relations sentiment analysis could be the topic of the research paper, not something solvable with a simple approach. One possible method would be to perform a grammar analysis, extraction of the conceptual relation between objects and then independent sentiment analysis of words included, but as I said before - it is rather a reasearch topic.
