Classifying people's future hopes in a test by using Python? - python

I have a csv file which includes people's hopes about an educational program. So I want to classify people and match them to see whose are having some similar dreams. I am using python TFID for this. What kind of technologies or repositories do you suggest me?
pieceof a part of data

Related

TexSoup for Bib-Files

this is my first question so I will try to do everything as proper as possible.
I am currently using LaTeX to write my documents at my University because I want to use the powerful citing capabilities provided by BibTeX. For ease of use, I am writing on scripts that will implement my .bib-files into my .tex files easier and allow easier management of my .bib-files. As I am using Arch Linux, I did this in bash, but it is a little clunky. Therefore I wanted to switch to python, as I came across the TexSoup-library for Python.
My issue is now, that I cannot find resources regarding the use of TexSoup for .bib files, I can only find resources on .tex-files. Does anybody know, if and if yes how I can use TeXSoup to find books / articles or other entries in my bib-files with python (or the TexSoup-library)?
with open("bib_complete.bib") as f:
soup = TexSoup(f)
print(soup)
This is a code sample I am trying to use, but I don't know how to look for entry names or entry-types with the package. I would really appreciate if someone could guide me to good resources if they exist.
I hope my writing was comprehensive enough and not too long.
Thanks everybody!

How to transform unstructured text to rdf turtle in practice?

i am currently working on a study project, where i have to transform the vehicle complaints descriptions from the NHTSA Database (https://catalog.data.gov/dataset/nhtsas-office-of-defects-investigation-odi-complaints) into rdf-turtle and later into a Knowledge Graph representation maybe using GraphDB, etc. A set of descriptions can be found in the attachment.
I have research topics like NER, Relation Extration, OpenIE, FRED, Knowledge Graph Construction, RDFS, OWL and Ontologies in theory.
Now, i come to a point where i just don't know how to practically do it.
Can someone help me with it and guide me a little bit through it?
Where should i start and with what?
Thanks you very much,
Dennis
Examples customer complaints
Stanford CORE NLP has Open IE that can extract triples from the text.
https://nlp.stanford.edu/software/openie.html
If you want to do it with python, look into Stanza
https://github.com/stanfordnlp/stanza.
It has an official python wrapper that fires up a JAVA-based CORE NLP server at the backend.

Running a Python Script on a Website (in the background)

Firstly, apologies for the very basic question. I have looked into other answers but they haven't quite answered what I'm after. I'm confident designing a site in HTML/CSS and have very very basic knowledge of Python.
I want to run a very basic Python script on my website. It analyses tweets about a specific topic, and then posts a sentiment analysis score. I want it to run this sentiment analysis every hour and cache the score.
I have a working Python script which does this in Jupyter Notebook. Could you give me an overview of how I would make this script function online and cache the results? I've read into using Python web frameworks, but from my limited understanding, they seem like overkill?
Thank you for your help!
Could you give me an overview of how I would make this script function online
The key thing would be to uncouple the two parts of your system:
Producing the data
Showing it in a website.
So the first thing to do is have your sentiment-analysis script push its value to a database. The database could be something as simple as a csv file, or it could be a key/value store, or something like MySQL or CouchDB (or hundreds of other choices).
Over on the website you have to make a decision between:
Server-side
Client-side
If the former, you could program in Python if that is what you are most familiar with. Whatever language/framework combination you go for, there will an example tutorial of how to read a value from a database and display it: it is just about the most fundamental thing.
If client-side you will usually be programming in JavaScript. Again you need to choose a framework, but again you should easily be able to find a tutorial to follow.
(Unless you have a good reason to prefer server-side, such as familiarity with an existing framework, or security issues with accessing your database, I'd go with a client-side approach.)
I've read into using Python web frameworks... overkill?
Yes and no. You are going to need some kind of database, and some kind of framework. It would be good to understand the basics of web security, too. If the sentiment analysis is your major goal, all that is going to be a distraction, and it might be better to find a friend who already knows web programming to work with. Or just find a tutorial that is very close to what you want to do, and adapt that.
(P.S. I was going to flag your question as "too broad", but you did ask for an overview, so I hope this helps.)

What are good online resources to learn using Python to get and post data in a webpage? More generally for webscraping?

I am not a software engineer and my knowledge of Python is focused on using it fro data wrangling and machine learning modeling. However, I need to learn how to get and post data to webpages and do webscraping as well. What are good on line tutorials or courses that would teach me the necessary skills? I find difficult learning from reading the documentation.
The following series of courses, is a very good start for learning python
https://www.class-central.com/mooc/4319/coursera-programming-for-everybody-getting-started-with-python
You might want to focus on the following course in the series:
https://www.class-central.com/mooc/4343/coursera-using-python-to-access-web-data
About the Course
This course will show how one can treat the Internet as a source of data. We will scrape, parse, and read web data as well as access data using web APIs. We will work with HTML, XML, and JSON data formats in Python. This course will cover Chapters 11-13 of the textbook “Python for Informatics”. To succeed in this course, you should be familiar with the material covered in Chapters 1-10 of the textbook and the first two courses in this specialization. These topics include variables and expressions, conditional execution (loops, branching, and try/except), functions, Python data structures (strings, lists, dictionaries, and tuples), and manipulating files. This course covers Python 2.

AIML for Intelligent Answering Engine

I have heard about a programming language called AIML which can be used for programming Intelligent Robots.
I am a web developer and have a web crawler build using Python 2.7 and have indexed Wikipedia ...
So I wanted to build a answering engine using python which would use a string variable
(It is a HUGE variable containing the whole of Wikipedia) as a source of information and use AI to answer...
Finally, I wanted to put this up on my school website...
So can I do that in AIML?
Later on I also want to modify it so as to give my live scores answers to questions like:
"What is the age of ~someperson~?" etc.
For that I'll send my web crawler to index some score pages etc..
Can I program this sort of answering agent in AIML?
If yes please provide links to tutorials which tell me how to do that? (using string variables as a source of information to parse queries and answer like a human)
moreover, AIML uses syntax like:
<category>
<pattern>WHAT ARE YOU</pattern>
<template>
<think><set name="topic">Me</set></think>
I am the latest result in artificial intelligence,
which can reproduce the capabilities of the human brain
with greater speed and accuracy.
</template>
</category>
Where pattern is the query and template is answer, so does that mean I have to sit and write these tags for all possible queries?
Or can I make it use its brains to figure out what the person wants and give them answers
using the string variable as its source of information.
Thank you.
AIML
It looks like AIML is a form of pattern matching. Moreover, it looks like this is mainly meant for dialog based agents. Therefore, to use AIML, you would likely need to manually generate every question and the correct response (answer).
Question answering
What it seems like you are really after is what we call a question answering system. Very briefly, a QA system generally has these components:
Question analysis.
Extract keywords.
(Sometimes) determine expected answer type (location, person, color, number, etc.).
Candidate document selection---doing a search on your knowledge base using an information retrieval system.
Candidate document analysis.
Answer extraction---select some part of the document (sentence(s), paragraph(s)).
Response generation.
Scores and ranks each answer.
Displays the most confident answer(s).
Research
If you're really want to dig deeply into this area, I'd suggest using Google Scholar and search for some of the terms I've mentioned, which will give you some research papers that go into detail about many of these topics. Some papers to get you started:
Natural language question answering: The view from here
Answering complex, list and context questions with LCC's Question-Answering Server
The structure and performance of an open-domain question answering system
Learning surface text patterns for a question answering system
Learning question classifiers
What is not in the Bag of Words for Why-QA?
Shameless plug
I've recently taken a course on natural language processing, and developed a rudimentary QA system that uses Wikipedia as a knowledge base. (Actually, I used the Simple English Wikipedia because it was much easier to work with; though the system does work with the full version just much more slowly.)
If you are interested in looking at some Python code as a reference, you may do so on the project's GitHub page: bwbaugh/causeofwhy. In addition, there is some more detailed documentation on what goes on in each step of the system components.
There is also a very basic working demo of the QA system in action that is (currently) available, however bear in mind the system is a proof-of-concept and can take upwards of 30 seconds to respond to a question (depending on the question).

Categories

Resources