AIML for Intelligent Answering Engine - python

I have heard about a programming language called AIML which can be used for programming Intelligent Robots.
I am a web developer and have a web crawler built using Python 2.7, and I have indexed Wikipedia ...
So I wanted to build an answering engine in Python which would use a string variable
(it is a HUGE variable containing the whole of Wikipedia) as a source of information and use AI to answer...
Finally, I wanted to put this up on my school website...
So can I do that in AIML?
Later on I also want to modify it to give live scores and answer questions like:
"What is the age of ~someperson~?" etc.
For that I'll send my web crawler to index some score pages etc.
Can I program this sort of answering agent in AIML?
If yes, please provide links to tutorials that explain how to do that (using string variables as a source of information to parse queries and answer like a human).
Moreover, AIML uses syntax like:
<category>
<pattern>WHAT ARE YOU</pattern>
<template>
<think><set name="topic">Me</set></think>
I am the latest result in artificial intelligence,
which can reproduce the capabilities of the human brain
with greater speed and accuracy.
</template>
</category>
where the pattern is the query and the template is the answer. So does that mean I have to sit and write these tags for all possible queries? Or can I make it use its brains to figure out what the person wants and give an answer, using the string variable as its source of information?
Thank you.

AIML
It looks like AIML is a form of pattern matching, and it is mainly meant for dialog-based agents. To use AIML, you would therefore likely need to manually write a category for every question and its correct response. Wildcard patterns (sketched below) cut down the typing, but they do not add any real understanding.
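For illustration, here is a standard AIML category using the * wildcard and the <star/> element. The pattern matches any question of that shape, but the template can only echo the matched text back; AIML has no mechanism for looking the answer up in your Wikipedia string:

<category>
<pattern>WHAT IS THE AGE OF *</pattern>
<template>I do not know how old <star/> is.</template>
</category>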
Question answering
What it seems like you are really after is what we call a question answering (QA) system. Very briefly, a QA system generally has these components (a minimal sketch in Python follows the list):
- Question analysis.
  - Extract keywords.
  - (Sometimes) determine the expected answer type (location, person, color, number, etc.).
- Candidate document selection: a search over your knowledge base using an information retrieval system.
- Candidate document analysis.
- Answer extraction: select some part of the document (sentence(s), paragraph(s)).
- Response generation.
  - Score and rank each answer.
  - Display the most confident answer(s).
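To make those stages concrete, here is a minimal, hypothetical sketch (the stopword list, the sentence-overlap scoring, and the toy corpus are my own assumptions; a real system would use a proper information retrieval index):

import re

# Toy stopword list; real systems use much larger lists.
STOPWORDS = {"what", "is", "the", "of", "a", "an", "who", "when", "was"}

def keywords(question):
    # Question analysis: keep the content-bearing words.
    return [w for w in re.findall(r"\w+", question.lower()) if w not in STOPWORDS]

def candidates(docs, terms):
    # Candidate selection and analysis: score each sentence by term overlap.
    for doc in docs:
        for sent in re.split(r"(?<=[.!?])\s+", doc):
            hits = sum(1 for t in terms if t in sent.lower())
            if hits:
                yield hits, sent

def answer(question, docs):
    # Response generation: return the highest-scoring sentence.
    ranked = sorted(candidates(docs, keywords(question)), reverse=True)
    return ranked[0][1] if ranked else "No answer found."

docs = ["Bill Gates was born on October 28, 1955. He co-founded Microsoft."]
print(answer("When was Bill Gates born?", docs))  # the first sentence wins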
Research
If you really want to dig deeply into this area, I'd suggest using Google Scholar to search for some of the terms I've mentioned, which will turn up research papers that go into detail on many of these topics. Some papers to get you started:
Natural language question answering: The view from here
Answering complex, list and context questions with LCC's Question-Answering Server
The structure and performance of an open-domain question answering system
Learning surface text patterns for a question answering system
Learning question classifiers
What is not in the Bag of Words for Why-QA?
Shameless plug
I've recently taken a course on natural language processing, and developed a rudimentary QA system that uses Wikipedia as a knowledge base. (Actually, I used the Simple English Wikipedia because it was much easier to work with; the system does work with the full version, just much more slowly.)
If you are interested in looking at some Python code as a reference, you may do so on the project's GitHub page: bwbaugh/causeofwhy. In addition, there is some more detailed documentation on what goes on in each step of the system components.
There is also a very basic working demo of the QA system that is (currently) available; however, bear in mind that the system is a proof of concept and can take upwards of 30 seconds to respond to a question (depending on the question).

Related

Running a Python Script on a Website (in the background)

Firstly, apologies for the very basic question. I have looked at other answers, but they haven't quite covered what I'm after. I'm confident designing a site in HTML/CSS and have very, very basic knowledge of Python.
I want to run a very basic Python script on my website. It analyses tweets about a specific topic, and then posts a sentiment analysis score. I want it to run this sentiment analysis every hour and cache the score.
I have a working Python script which does this in Jupyter Notebook. Could you give me an overview of how I would make this script function online and cache the results? I've read into using Python web frameworks, but from my limited understanding, they seem like overkill?
Thank you for your help!
Could you give me an overview of how I would make this script function online
The key thing is to decouple the two parts of your system:
Producing the data.
Showing it on a website.
So the first thing to do is have your sentiment-analysis script push its value to a database. The database could be something as simple as a CSV file, or it could be a key/value store, or something like MySQL or CouchDB (or one of hundreds of other choices).
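As a minimal sketch of the producing side, assuming SQLite (any of the stores above would do; the save_score function and scores.db file name are placeholders of my own):

import sqlite3
import time

def save_score(score, db_path="scores.db"):
    # Append the latest sentiment score with a timestamp.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS scores (ts INTEGER, score REAL)")
    conn.execute("INSERT INTO scores VALUES (?, ?)", (int(time.time()), score))
    conn.commit()
    conn.close()

Running it every hour is then a scheduling problem, e.g. a cron entry such as 0 * * * * python /path/to/your_script.py (the path is yours to fill in).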
Over on the website you have to make a decision between:
Server-side
Client-side
If the former, you could program in Python if that is what you are most familiar with. Whatever language/framework combination you go for, there will be a tutorial example of how to read a value from a database and display it: it is just about the most fundamental thing.
If client-side you will usually be programming in JavaScript. Again you need to choose a framework, but again you should easily be able to find a tutorial to follow.
(Unless you have a good reason to prefer server-side, such as familiarity with an existing framework, or security issues with accessing your database, I'd go with a client-side approach.)
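To make the server-side option concrete, here is a minimal sketch using Flask (one framework choice among many; it assumes the scores.db layout from the earlier sketch):

import sqlite3
from flask import Flask

app = Flask(__name__)

@app.route("/")
def show_score():
    # Read the most recent cached score; no analysis happens per request.
    conn = sqlite3.connect("scores.db")
    row = conn.execute(
        "SELECT score FROM scores ORDER BY ts DESC LIMIT 1").fetchone()
    conn.close()
    return "Latest sentiment score: %s" % (row[0] if row else "no data yet")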
I've read into using Python web frameworks... overkill?
Yes and no. You are going to need some kind of database, and some kind of framework. It would be good to understand the basics of web security, too. If the sentiment analysis is your major goal, all that is going to be a distraction, and it might be better to find a friend who already knows web programming to work with. Or just find a tutorial that is very close to what you want to do, and adapt that.
(P.S. I was going to flag your question as "too broad", but you did ask for an overview, so I hope this helps.)

Microsoft Speech Recognition Custom Training

I have been wanting to create an application using the Microsoft Speech Recognition.
My application's users are expected to often say abbreviated things, such as 'LHC' for 'Large Hadron Collider', or 'CERN'. Given exactly those utterances, my application returns
You said: At age C.
You said: Cern
While it worked for 'CERN', it failed very badly for 'LHC'.
However, if I could make my own custom training files, I could easily place the term 'LHC' somewhere in there. Then, I could make the user access the Speech Control Panel and run my training file.
All the links I have found for this have been frustratingly useless, as they just say things like 'This is ----, you should try going to the ---- forum instead'.
If it does help, here is a list of the links:
http://compgroups.net/comp.speech.users/add-my-own-training/153194
https://groups.google.com/forum/#!topic/microsoft.public.speech.server/v58SH1ov22s
http://social.msdn.microsoft.com/Forums/en/servercorefordevelopers/thread/f7a35f3f-b352-464a-b264-e16eb4afd049
Is my problem even possible? Or are the training files themselves in a special format? If so, can that format be reproduced?
A solution that can also work on Windows XP would be ideal.
Thanks in advance!
P.S. If there are any libraries or modules out there already for this, could anyone point me to some? A Python or C/C++ solution would be splendid. Also, since I'd rather not post another question regarding this, is it possible to utilize the train utilities from command prompt (or without the GUI visible, but still having total command of all controls)?
Okay, I'm pulling this from something I wrote three or four years ago now, but I believe you want to do something like this.
The grammar library is a trained system that can recognize words. You can create your own grammar library cued to specific words.
C#, sorry
using System.Speech;
using System.Speech.Recognition;
using System.Speech.AudioFormat;

// Create the recognizer and load a grammar built from the exact phrases we expect.
SpeechRecognitionEngine sre = new SpeechRecognitionEngine();

// Spelling the letters out ("L H C") gives the recognizer something it can match.
string[] words = { "L H C", "CERN" };
Choices choices = new Choices(words);
GrammarBuilder gb = new GrammarBuilder(choices);
Grammar grammar = new Grammar(gb);
sre.LoadGrammar(grammar);
That is as far as I can get you. From the docs it looks like you can define pronunciations somehow, so perhaps that way you could map 'LHC' directly to a single word. Here are the docs on the Grammar class: http://msdn.microsoft.com/en-us/library/system.speech.recognition.grammar.aspx
Small update: see the example in their docs here: http://msdn.microsoft.com/en-us/library/ms554228.aspx

Mining Wikipedia for mapping relations for text mining

I am planning to develop a web-based application which could crawl Wikipedia to find relations and store them in a database. By relations, I mean searching for a name, say 'Bill Gates', finding his page, downloading it, and pulling the various pieces of information from the page to store in a database. The information may include his date of birth, his company, and a few other things. But I need to know whether there is any way to find these unique pieces of data on the page so that I can store them in a database. Any specific books or algorithms would be greatly appreciated. Mentions of good open-source libraries would also be helpful.
Thank You
If you haven't already, you should have a look at DBpedia. Many categories of wiki articles have "Infoboxes" for the kinds of information you describe, and they've made a database out of it:
http://en.wikipedia.org/wiki/DBpedia
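As a hedged sketch of what querying DBpedia from Python can look like (using the third-party requests library; the exact property, dbo:birthDate here, depends on what you want to extract):

import requests

query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?birthDate WHERE { dbr:Bill_Gates dbo:birthDate ?birthDate . }
"""
resp = requests.get(
    "https://dbpedia.org/sparql",
    params={"query": query, "format": "application/sparql-results+json"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["birthDate"]["value"])  # e.g. 1955-10-28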
You might also leverage some of the information in Metaweb's Freebase (which overlaps with, and I believe may even integrate, the info from DBpedia). They have an API for querying their graph database, and there's a Python wrapper for it called freebase-python.
UPDATE: Freebase is no more; it was acquired by Google and eventually folded into the Google Knowledge Graph. There is an API, but I don't think they have anything like the formal syncing Freebase had with public sources like Wikipedia. I'm personally disappointed in how this turned out. :-/
As for the natural language processing bit, if you do make headway on that problem you might consider these databases as repositories for any information you do mine.
You mention Python and open source, so I would investigate NLTK (the Natural Language Toolkit). Text mining and natural language processing are areas where you can do a lot with a dumb algorithm (e.g. pattern matching), but if you want to go a step further and do something more sophisticated, i.e. try to extract information that is stored in a flexible manner, or try to find information that might be interesting but is not known a priori, then natural language processing should be investigated.
NLTK is intended for teaching, so it is a toolkit. This approach suits Python very well. There are a couple of books for it as well; the O'Reilly book is also published online under an open license. See NLTK.org.
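As a tiny taste of what NLTK gives you, here is a sketch that tokenizes a sentence and tags parts of speech, a typical first step toward extracting facts (the download calls are one-time setup):

import nltk
# One-time setup:
# nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

sentence = "Bill Gates was born on October 28, 1955."
tokens = nltk.word_tokenize(sentence)  # split into words and punctuation
tagged = nltk.pos_tag(tokens)          # attach part-of-speech tags
print(tagged)  # [('Bill', 'NNP'), ('Gates', 'NNP'), ('was', 'VBD'), ...]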
Jvc, there are existing Python modules that can do everything you mentioned above.
For pulling information from web pages, I like to use Selenium, http://seleniumhq.org/projects/ide/. Basically, you can locate and retrieve information on any web page using a number of identifiers (id, XPath, etc.).
However, as winwaed said, it can be inflexible if you are simply "pattern matching", especially since some websites use dynamic code, meaning the identifiers can change with each subsequent reload of the page. But this problem can be solved by adding regular expressions, i.e. (.*), to your code. Check out this YouTube video, http://www.youtube.com/watch?v=Ap_DlSrT-iE. Even though he is using BeautifulSoup to scrape the website, you can see how he uses regular expressions to pull the information from the page.
Also, I'm not sure what type of database you are working with, but pyodbc, http://code.google.com/p/pyodbc/, can work with SQL databases and also mainstream databases like Microsoft Access.
So my advice is to look into Selenium for finding the info on the web page, pyodbc to store and retrieve it, and regular expressions for when the identifiers are dynamic.
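To illustrate the regex part, here is a minimal sketch that pulls a date of birth out of raw page text once your scraper has fetched it (the sample text and pattern are my own; real pages will need hardier patterns):

import re

# Raw text as Selenium (or any scraper) might return it.
page_text = "Bill Gates (born October 28, 1955) is an American business magnate."
match = re.search(r"born\s+([A-Z][a-z]+ \d{1,2}, \d{4})", page_text)
if match:
    print("Date of birth:", match.group(1))  # October 28, 1955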

Where is Python used? I read about it a lot on Reddit

I have downloaded PyScripter and am learning Python. But I have no idea if it has any job value, especially in India. I am learning Python as a hobby, but it would be comforting to know if Python programmers are in demand in India.
Everywhere. It's used extensively by Google, for one.
See list of python software for more info, and also who uses python on the web?
In many large companies it is a primary scripting language.
Google is using it along with Java and C++ and almost nothing else.
Also, many web pages are built on top of Python and Django.
Another place is game development: many games have their engines written in C++ but all the game logic in Python.
In other words, it is one of the most valuable tools.
This might be of interest for you as well:
Is Python good for big software projects (not web based)?
Are there any good reasons why I should not use Python?
What did you use to teach yourself python?
It definitely has job value. For instance, Google requires it. Have a look at Google openings in India:
    "Excellent programming skills in at least one of the following languages: C, C++, Java or Python (C++/Python preferred)."
Not sure about India, but you can get a decent overview of available Python jobs on the python.org jobs page here.
Try looking at Mark Pilgrim's excellent book "Dive Into Python" which is available for download under GNU Free Documentation License.
HTH
cheers,
Rob
In 10 years of web development, I've had one client have me write an email-parsing app with it. Not that it doesn't get used, but I've seen Ruby/PHP/.NET far more often in the wild.
Edit:
From the other posts, if you plan on working at Google, it sounds like the language to learn - LOL!
It's just one example, but I know it is widely used in large scientific institutions with high-tech machinery, where non-programmers (typically physicists) need quick prototypes or tools to cover their data collection/processing needs. The easy-to-access scripting-language aspect clearly plays its role here. So I don't know about building a career out of that alone, but I'd definitely say that knowing Python is a very valuable asset on your resume; it'll strengthen your "smell of usefulness".
Google App Engine lets you use Python (or Java). I HIGHLY recommend that you check it out. If you want a FREE website with a database (actually a datastore, but it works much like a database) using Python, THIS IS IT. It scales up, too; if you start to get enough traffic, you would have to start paying for the usage it requires.
http://code.google.com/appengine/docs/python/overview.html
You could make your own Python-based site and run some ads. Voilà, make some money. Also, I'm sure Google would be impressed by some good Python, because I hear they use it for much of their own sites.

Building a full text search engine: where to start [closed]

I want to write a web application using Google App Engine (so the reference language would be Python). My application needs a simple search engine, so the users would be able to find data specifying keywords.
For example, if I have one table with these rows:
1  Office Space
2  2001: A Space Odyssey
3  Brazil
and the user queries for "space", rows 1 and 2 would be returned. If the user queries for "office space", the result should be rows 1 and 2 too (row 1 first).
What are the technical guidelines/algorithms to do this in a simple way?
Can you give me good pointers to the theory behind this?
Thanks.
Edit: I'm not looking for anything complex here (say, indexing tons of data).
Read Tim Bray's series of posts on the subject.
Background
Usage of search engines
Basics
Precision and recall
Search engine intelligence
Tricky search terms
Stopwords
Metadata
Internationalization
Ranking results
XML
Robots
Requirements list
I found these two books very useful when I used to build full-text search engines.
Information Retrieval
Managing Gigabytes
I would avoid building it yourself, if possible.
App Engine includes the basics of a Full Text searching engine, and there is a great blog post here that describes how to use it.
There is also a feature request in the bug tracker that seems to be getting some attention lately, so you may want to hold out, if you can, until that is implemented.
As always, start with Wikipedia. The first step is usually building an inverted index.
Here's an original idea:
Don't build an index. Seriously.
I was faced with a similar problem some time ago. I needed a fast method to search megs and megs of text that came from documentation. I needed to match not just words, but word proximity in large documents (is this word near that word?). I just ended up writing it in C, and the speed of it surprised me. It was fast enough that it didn't need any optimizing or indexing.
With the speed of today's computers, if you write code that runs straight on the metal (compiled code), you often don't need an O(log n)-type algorithm to get the performance you need.
Lucene or Autonomy! These are not out-of-the-box solutions for you; you will have to write wrappers on top of their interfaces. They certainly do take care of stemming, grammar, relational operators, etc.
First build your index:
- Go through the input and split it into words.
- For each word, check whether it is already in the index; if it is, add the current record number to that word's list; if not, add the word and the record number.
To look up a word, go to the (possibly sorted) index and return all the record numbers for that word.
It's very easy to do this for a reasonable-sized list using Python's built-in storage types (see the sketch below).
As an extra refinement, you only want to store the base part of a word, e.g. 'find' for 'finding' - look up stemming algorithms.
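Here is a minimal sketch of those steps using a plain dict of sets (the record numbering and sample rows follow the example in the question):

import re
from collections import defaultdict

def build_index(records):
    # Map each word to the set of record numbers that contain it.
    index = defaultdict(set)
    for rec_num, text in enumerate(records, start=1):
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(rec_num)
    return index

def lookup(index, word):
    return sorted(index.get(word.lower(), set()))

records = ["Office Space", "2001: A Space Odyssey", "Brazil"]
index = build_index(records)
print(lookup(index, "space"))  # [1, 2]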
The book Introduction to Information Retrieval provides a good introduction to the field.
A dead-tree version is published by Cambridge University Press, but you can also find a free online edition (in HTML and PDF) by following the link above.
See also a question I asked: How-to: Ranking Search Results.
Surely there are more approaches, but this is the one I'm using for now.
Honestly, smarter people than I have figured this stuff out. I'd load up the Solr app, make JSON calls to it from my App Engine app, and let Solr take care of the indexing.
I just found this article this weekend: http://www.perl.com/pub/a/2003/02/19/engine.html
It doesn't look too complicated to build a simple one (though it would need heavy optimizing to be an enterprise-grade solution, for sure). I plan on trying a proof of concept with some data from Project Gutenberg.
If you're just looking for something you can explore and learn from, I think this is a good start.
Look into the book "Managing Gigabytes"; it covers the storage and retrieval of huge amounts of plain-text data, e.g. both compression and actual searching, and a variety of the algorithms that can be used for each.
Also, for plain-text retrieval you're best off using a vector-based search system rather than a keyword-to-document indexing system, as vector-based systems can be much faster and, more importantly, can provide relevancy ranking relatively trivially (a rough sketch follows).
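As a rough sketch of the vector-space idea, here is a toy TF-IDF scorer with cosine similarity (the tokenizer and weighting are deliberately simplistic; real systems add stemming, stopwords, and smarter weighting):

import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def tf_idf(docs):
    # Document frequency per term, then a TF-IDF weight vector per document.
    toks = [tokenize(d) for d in docs]
    df = Counter(w for t in toks for w in set(t))
    n = len(docs)
    vecs = [{w: tf * math.log(n / df[w]) for w, tf in Counter(t).items()}
            for t in toks]
    return vecs, df, n

def cosine(a, b):
    dot = sum(wt * b.get(w, 0.0) for w, wt in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = ["Office Space", "2001: A Space Odyssey", "Brazil"]
vecs, df, n = tf_idf(docs)
query = {w: math.log(n / df[w]) for w in tokenize("office space") if w in df}
ranking = sorted(range(len(docs)), key=lambda i: cosine(query, vecs[i]), reverse=True)
print([i + 1 for i in ranking])  # [1, 2, 3]: row 1 ranks first, as in the question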
Try this:
Say the variable table is your list of search entries.
query = input("Query: ").strip().lower()  # or raw_input, for Python 2
results = []
for item in table:
    if query in item.strip().lower():
        results.append(item)
print(results)  # narrowed results
It just iterates through all of the items to see if the query is in any one of them. It works for a simple in-app search function. Maybe not for the whole internet, though.
