How to search paragraph(s) for objects - python

I'm looking for a way to parse objects out of naturally written text. I have a database of thousands of locations (City, State). As my users write posts, I would like to intelligently find and enrich locations being written about. For example, given the post:
I had a really nice trip to Portland this weekend. It was beautiful and the climbing gyms are second to none.
I'd like to suggest Portland, OR and Portland, ME and ask the user to choose one.
Is there a name for this kind of search? I'm not even sure where to start.
EDIT: I'm currently using Python/Django and MySQL, but suggestions on any technology/platform would be useful.

You will need to use NLP, specifically named entity recognition (NER), to extract the city (location) from your sentence.
See:
http://www.nltk.org/howto/relextract.html
Then run a query against your database, for example:
select city, state from locations_table where city="Portland"
which will give you city, state pairs.
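As a rough sketch of the lookup step, assuming a much simpler candidate finder than real NER: the code below treats every capitalized word in the post as a possible city and checks it against the table. It uses an in-memory SQLite database as a stand-in for MySQL. The helper names build_demo_db and suggest_locations and the sample rows are made up for illustration, and multi-word city names are not handled.

```python
import re
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for the locations table (sample rows only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE locations_table (city TEXT, state TEXT)")
    conn.executemany(
        "INSERT INTO locations_table VALUES (?, ?)",
        [("Portland", "OR"), ("Portland", "ME"), ("Salem", "OR")],
    )
    return conn


def suggest_locations(conn, post):
    """Map each capitalized word that matches a city to its (city, state) rows."""
    # Assumption: a capitalized word of two or more letters may be a city name.
    candidates = set(re.findall(r"\b[A-Z][a-z]+\b", post))
    suggestions = {}
    for word in sorted(candidates):
        rows = conn.execute(
            "SELECT city, state FROM locations_table WHERE city = ? ORDER BY state",
            (word,),
        ).fetchall()
        if rows:  # keep only words that really are known cities
            suggestions[word] = rows
    return suggestions


if __name__ == "__main__":
    conn = build_demo_db()
    post = "I had a really nice trip to Portland this weekend."
    print(suggest_locations(conn, post))
```

Any word that maps to more than one row (here both Portlands) is a case where you would ask the user to choose.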

Related

"Get" document from cosmosdb by id (not knowing the _rid)

MS Support recently told me that a "GET" is much more efficient in RU usage than a SQL query. I'm wondering whether I can (with the azure.cosmos Python package or a custom HTTP request to the REST API) get a document by its unique 'id' field (for which I generated GUIDs) without a SQL query.
Every example I have seen uses the link/path of the doc, which is built from the document's '_rid' metadata and not from the 'id' field set when creating the doc.
I create new documents with a bulk upsert stored procedure I wrote and never retrieve the metadata for each of them (I have ~100 million docs), so retrieving the _rid would be equivalent to retrieving the doc itself.
The ReadDocument method is so much more efficient than a SQL query because it uses the _rid instead of a user-generated field, even the required id field. The _rid isn't just a unique value; it also encodes information about where that document is physically stored.
To give an example of how this works, let's say you are explaining to someone where a party is this weekend. You could use the name that you use for the house "my friend Ryan's house" or you could use the address "123 ThatOne Street Somewhere, WA 11111". They both are unique identifiers, but for someone trying to get there one is way more efficient than the other.
Telling someone to go to your friend's house is like using your own id. It does map to a specific house, but the person will still need to find out where that physically is to get there. Using the address is like working with the _rid field. Based on that information alone they can get to the party location. Of course, in the real world the person would probably need directions, but the data storage in a database is a lot more organized than most city streets so an address is sufficient to go retrieve the document.
If you want to take advantage of this method you will need to find a way to work with the _rid field.

Medicine Database research materials?

This request is more about some research materials...
I want to build an application to control medicine dosage. To do this, I am thinking about making a sample database of about 1000 medicines with information on, most importantly, dosage (4 times a day, every 4 hours...) and composition.
Does anyone know of a webpage that could easily provide this kind of information? I am thinking about scraping it with Python. Or maybe a sample database like this already exists somewhere.
I am also thinking about a feature that flags which elements of medicine compositions can be dangerous together, for example that you can't take a certain 3 medicines together. Could someone recommend a webpage, article, or algorithm that would give me more knowledge about this subject?

Searching for abbreviated words in MySQL

I have a MySQL database, working with Python, with some entries for products such as Samsung Television, Samsung Galaxy Phone etc. Now, if a user searches for Samsung T.V or just T.V, is there any way to return the entry for Samsung Television?
Do full text search libraries like Solr or Haystack support such features? If not, then how do I actually proceed with this?
Thanks.
Yes, Solr will surely allow you to do this and much more. You can start here,
and SolrCloud is a really good way to provide high availability to end users.
You should have a look at the SynonymFilterFactory for your analyzer. When reading the documentation you will find this section, which sounds a lot like the scenario you describe.
Even when you aren't worried about multi-word synonyms, idf differences still make index time synonyms a good idea. Consider the following scenario:
An index with a "text" field, which at query time uses the SynonymFilter with the synonym TV, Television and expand="true"
Many thousands of documents containing the term "text:TV"
A few hundred documents containing the term "text:Television"
Keep in mind that you should have separate analyzers for index and query time, as described in this SO question: How to make solr synonyms work.
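To make the synonym idea concrete outside of Solr, here is a minimal plain-Python sketch. It is not Solr's API or configuration. It assumes a tiny hand-written synonym table and reduces each token to one canonical term, applying the same normalization to both the indexed product names and the query (a simplification compared to separate Solr analyzers). The names SYNONYMS, analyze, build_index and search are hypothetical.

```python
import re

# Assumption: a hand-maintained table mapping abbreviations to a canonical term.
SYNONYMS = {"tv": "television"}

# A token is a run of letters/digits, optionally joined by inner dots ("t.v").
TOKEN = re.compile(r"[a-z0-9]+(?:\.[a-z0-9]+)*")


def analyze(text):
    """Lowercase, tokenize, drop dots, and replace synonyms with the canonical term."""
    tokens = []
    for raw in TOKEN.findall(text.lower()):
        token = raw.replace(".", "")  # "t.v" -> "tv"
        tokens.append(SYNONYMS.get(token, token))
    return tokens


def build_index(products):
    """Build an inverted index: canonical token -> set of product names."""
    index = {}
    for product in products:
        for token in analyze(product):
            index.setdefault(token, set()).add(product)
    return index


def search(index, query):
    """Return products that contain every (normalized) query token."""
    tokens = analyze(query)
    if not tokens:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(matches)


if __name__ == "__main__":
    index = build_index(["Samsung Television", "Samsung Galaxy Phone"])
    print(search(index, "Samsung T.V"))
    print(search(index, "T.V"))
```

In Solr itself the same mapping would live in the synonyms file used by the SynonymFilterFactory rather than in application code.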

How should I structure a GAE datastore to be able to grab professions related to a keyword?

If someone searches for "teeth doctor", I would like to return entries from a Google App Engine datastore for dentists. Similarly, "foot doctor" would return podiatrists, "children's doctor" pediatricians, etc.
How should I find related keywords, and should I store them with the doctor entries, in a separate table, or grab them on request?
I'm thinking of having one entity for the professionals - it would include their name, location, contact info, etc, but most importantly, the formal name for their profession. And, another table for a relation of words to professions. For example, "teeth" would map to dentist, but also orthodontist. Would this be the best way to go about it?
Also, is there a way to have Google sort the results by multiple criteria? I would like to list the most relevant results first, but also give priority to slightly less related but closer doctors. For example, if a user searches for "teeth", I would want the results in this order: 1. a dentist 0.5 miles away, 2. an orthodontist 0.2 miles away, and 3. a dentist 5 miles away. My current idea is to keep track of the estimated likelihood (as a percentage) that a searched keyword is meant to return a certain profession, and then factor that into the distance calculation I would be using for sorting.
I would probably go with having a profession kind and a professional kind. Professional entities then reference the applicable profession, and profession entities contain your keywords. You could then use the new App Engine search feature to index and search professions (Search Overview (Python)) and use the results to look up professionals. Indexing your professionals this way as well would give you some or all of the location-based searching you want to implement.
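The two-kind layout and the combined ranking can be sketched in plain Python, without the App Engine datastore or Search API. Everything here is an assumption for illustration: the Profession and Professional classes, the likelihood values, and especially the scoring formula (likelihood divided by 1 plus the distance), which is just one simple way to trade relevance against distance.

```python
from dataclasses import dataclass


@dataclass
class Profession:
    name: str
    keywords: dict  # keyword -> estimated likelihood between 0 and 1 (made-up values)


@dataclass
class Professional:
    name: str
    profession: str  # references Profession.name
    distance_miles: float  # distance from the searching user


def rank(keyword, professions, professionals):
    """Order professionals by keyword likelihood, discounted by distance."""
    likelihood = {
        p.name: p.keywords[keyword] for p in professions if keyword in p.keywords
    }
    scored = []
    for person in professionals:
        if person.profession in likelihood:
            # Assumed scoring: closer and more relevant means a higher score.
            score = likelihood[person.profession] / (1.0 + person.distance_miles)
            scored.append((score, person))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [person for _, person in scored]


if __name__ == "__main__":
    professions = [
        Profession("dentist", {"teeth": 0.9}),
        Profession("orthodontist", {"teeth": 0.6}),
        Profession("podiatrist", {"foot": 0.9}),
    ]
    professionals = [
        Professional("Dr. C", "dentist", 5.0),
        Professional("Dr. B", "orthodontist", 0.2),
        Professional("Dr. A", "dentist", 0.5),
        Professional("Dr. D", "podiatrist", 0.1),
    ]
    for person in rank("teeth", professions, professionals):
        print(person.name, person.profession, person.distance_miles)
```

With these made-up numbers the order matches the one asked for in the question: the dentist 0.5 miles away, then the orthodontist 0.2 miles away, then the dentist 5 miles away.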

geo name database (city, points of interest)

I am building a travel website with Django. When a user types in a destination city name (or a point of interest, like Yellowstone), I want to offer AJAX auto-suggestions. How can I get a database to drive the suggestions? Is there a web service for this? Ideally it would also support foreign cities. Thanks a lot.
What you want is called a gazetteer database.
The official USGS gazetteer for the USA is available for download.
Two global geocoded databases include:
Geonames has a free list of cities and POI. It includes the USGS gazetteer and lots of other info. You might have to subset their database however, as it might return too many results for you.
MaxMind also has a free database of cities.
Take a look at OpenStreetMap; it has a lot of cities and POIs, in both Chinese and English.
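Once you have a list of names from one of these sources, the suggestion lookup itself can be quite small. Here is a minimal sketch, assuming the names fit in memory and that simple case-insensitive prefix matching is enough. The PlaceSuggester class and the sample names are made up for illustration; a Django view would call suggest() and return the result as JSON.

```python
import bisect


class PlaceSuggester:
    """Case-insensitive prefix lookup over a sorted list of place names."""

    def __init__(self, names):
        # Sort by lowercased name so that all matches for a prefix are adjacent.
        self._entries = sorted((name.lower(), name) for name in names)
        self._keys = [key for key, _ in self._entries]

    def suggest(self, prefix, limit=5):
        prefix = prefix.lower()
        # Binary search for the first key that is >= the prefix.
        start = bisect.bisect_left(self._keys, prefix)
        results = []
        for key, name in self._entries[start:]:
            if not key.startswith(prefix) or len(results) >= limit:
                break
            results.append(name)
        return results


if __name__ == "__main__":
    suggester = PlaceSuggester(
        ["Portland", "Port Angeles", "Yellowstone National Park", "Porto", "Paris"]
    )
    print(suggester.suggest("port"))
    print(suggester.suggest("yel"))
```

For a full gazetteer you would more likely push the prefix query down to the database or a search index, but the idea is the same.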
