Adding haystack's full text search to edx-platform - python

I'm trying to extend edx-platform by adding full-text search, but I'm having trouble understanding how to retrieve data from MongoDB. Does anyone have any experience with edX? How can one access data in a Courses.objects.all() manner?
Thanks!
A.

There is a sub-project created by edX called edX search, which implements search in a manner closer to what you want. It uses ElasticSearch directly, without the Haystack library. Have a look and let me know if you find it useful: https://github.com/edx/edx-search
Good luck!

Have a look at http://docs.mongodb.org/manual/data-modeling/
as well as https://github.com/selvinsource/mongodb-datamining-shell
Hope it helps.

Related

Facebook/Instagram Comments Scraping

I need your help finding a tutorial or any other information on scraping with Python. It has to be legal: this is part of the data collection for my thesis, so I need legal ways to scrape the data.
Would you please help me find useful sources for this?
https://medium.com/analytics-vidhya/web-scraping-instagram-with-selenium-python-b8e77af32ad4
Medium usually provides fairly good instructions.

Using WikipediaAPI

I'm making a chatbot using AIML.
Is there any tag in AIML that can search with Wikipedia?
I'd like to use Python to make it possible to search Wikipedia when certain questions are typed in.
Can anyone help me? T_T~~
I personally tried this for searching Wikipedia from Python. I managed to run several common use cases, such as searching for a page.
Hope this helps!
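For reference, here is a minimal sketch of searching Wikipedia from Python using only the standard library. It assumes the public MediaWiki `opensearch` endpoint on English Wikipedia; the function names and the User-Agent string are illustrative and do not come from the answer above. A chatbot could call `search_wikipedia` from whatever hook handles the matched question.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_search_url(query, limit=5):
    """Build an opensearch request URL for the given query."""
    params = {
        "action": "opensearch",
        "search": query,
        "limit": limit,
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def search_wikipedia(query, limit=5):
    """Return a list of (title, url) pairs for pages matching the query."""
    request = urllib.request.Request(
        build_search_url(query, limit),
        # Hypothetical User-Agent; replace it with one that identifies your bot.
        headers={"User-Agent": "aiml-chatbot-example/0.1"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        data = json.load(response)
    # The opensearch reply has the shape [query, titles, descriptions, urls].
    titles, urls = data[1], data[3]
    return list(zip(titles, urls))
```

Calling `search_wikipedia("Alan Turing")` performs a network request and returns the matching page titles with their URLs.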

GAE or Maps API3 Store Locator with Python? (Easy Version)

I'm tasked with creating our Google Maps website store locator and so far all I've been able to find is old php tutorials and some new appEngine apps.
The apps look great. They seem to function as designed and it looks like this is the way I need to proceed. I even found demos here and here and both are perfect.
Problem is, I'm not at the level yet to understand them in order to learn from them and start implementing my own app for our stores. I do plan on using them to learn, but at the moment I'm not at that level yet so I'm not even really learning anything by examining the code.
Is there anything I can use at the moment that is a plugin option while I learn this? Perhaps any python tutorials out there hiding somewhere? I can learn these demos but I really need something for the time being while I'm figuring it all out.
This demo from 2008 might be a bit old but will put you on the right track.
There is also a locator in geodatastore. Demo
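While working through the demos, it may help to see the core of a store locator in isolation: rank the stores by great-circle distance from the visitor. The sketch below is not taken from either demo; the store records, field names, and function names are hypothetical, and it uses the haversine formula with a mean Earth radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two lat/lng points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_stores(stores, lat, lng, limit=3):
    """Return up to `limit` (distance_km, store) pairs, closest first."""
    ranked = sorted(
        ((haversine_km(lat, lng, s["lat"], s["lng"]), s) for s in stores),
        key=lambda pair: pair[0],
    )
    return ranked[:limit]
```

The resulting list can be rendered as markers on the map or as a plain list of addresses.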

Design help for static content with fixed keywords search framework

I am trying to work out a solution for detecting traceability between source code and documentation. The most important use case is that the user needs to see a collection of source code tokens (sorted by relevance to the documentation) that can be traced back to the documentation. She won't be bothered about the code format, but somehow needs to see an "identifier-documentation" mapping to get the idea of traceability.
I take the tokens from source code files and somehow split the concatenated identifiers (SimpleMAXAnalyzer becomes "simple max analyzer"), which then act as search terms on the documentation. Search frameworks are best for this specific task: drilling down into documents to locate things using powerful information retrieval algorithms. Whoosh looked like a really great Python search... with a number of analyzers and filters.
Though the problem is similar to search, it differs in that the user is not actually performing any search. So am I solving the problem the right way? Given that everything is static and needs to be computed only once, am I using the wrong tool (a search framework) for the job?
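The identifier-splitting step described above can be sketched with a single regular expression. This is only one possible approach, not necessarily how the asker implemented it; the function name is hypothetical.

```python
import re

# Alternatives, tried in order: an acronym that is followed by a capitalized
# word, a capitalized or lowercase word, a trailing acronym, or a digit run.
_TOKEN_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_identifier(identifier):
    """Split a camelCase or snake_case identifier into lowercase words."""
    return [token.lower() for token in _TOKEN_RE.findall(identifier)]
```

For example, `split_identifier("SimpleMAXAnalyzer")` yields the three words from the question, which can then be joined with spaces to form a search query.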
I'm not sure I understand your use case. The user sees the source code and has some way of jumping from a token to the appropriate part of the documentation, or to a listing of the possible parts, right?
Then a search tool seems to be the right tool for the job, although you could precompile every possible search (there is only a limited number of identifiers in the source, so you can calculate all possible references to the docs in advance).
Or are there any "canonical" parts of the documentation for every identifier? Then maybe some kind of index would be a better choice.
Maybe you could clarify your use case a bit further.
Edit: Maybe an alphabetical index of the documentation could be a step to the solution. Then you can look up the pages/chapters/sections for every token of the source, where all or most of its components are mentioned.
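As a rough illustration of the precomputation idea, the sketch below builds a word index over the documentation once and then ranks, for each identifier, the sections that mention the most of its component words. It uses plain dictionaries rather than a search framework; the function names and the shape of the inputs (a dict of section texts, and a dict mapping each identifier to its already split words) are assumptions made for the example.

```python
import re
from collections import defaultdict


def build_index(sections):
    """Map each lowercase word to the set of section names that mention it."""
    index = defaultdict(set)
    for name, text in sections.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def rank_sections(words, index):
    """Rank sections by how many of the identifier's words they mention."""
    hits = defaultdict(int)
    for word in set(words):
        for name in index.get(word, ()):
            hits[name] += 1
    # Most matching words first; ties are broken alphabetically.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


def precompute_mapping(identifiers, sections):
    """Compute the identifier -> ranked sections mapping once, up front."""
    index = build_index(sections)
    return {
        ident: rank_sections(words, index)
        for ident, words in identifiers.items()
    }
```

This matches exact words only. Stemming and relevance scoring are the kind of thing a framework such as Whoosh would add on top.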

extract grammar features from sentence on Google App Engine

For my GAE app I need to do some natural language processing to extract the subject and object from an input sentence.
Apparently NLTK can't be installed (easily) on GAE so I am looking for another solution.
I noticed GAE comes with Antlr3 but from browsing their documentation it solves a different kind of grammar problem.
Any ideas?
You can easily build an NLTK RPC server on some machine and access it.
Another option is to find another web based service that already does that (such as opencalais).
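A minimal sketch of the first option, assuming the standard library's XML-RPC server as the transport (the answer does not name one). The extraction function here is a placeholder that does not use NLTK at all; on the real server it would be replaced with NLTK tagging or parsing. All names and the port are hypothetical.

```python
from xmlrpc.server import SimpleXMLRPCServer


def extract_subject_object(sentence):
    """Placeholder extractor returning 'subject' and 'object' keys.

    Assumption: a real server would call NLTK here. This stand-in simply
    treats the first and last words of the sentence as subject and object.
    """
    words = sentence.strip(" .!?").split()
    if len(words) < 3:
        return {"subject": "", "object": ""}
    return {"subject": words[0], "object": words[-1]}


def make_server(host="127.0.0.1", port=8000):
    """Create an XML-RPC server exposing extract_subject_object."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(extract_subject_object)
    return server
```

On the machine where NLTK is installed, `make_server().serve_forever()` starts the service. The App Engine app would then call `extract_subject_object` over HTTP, for example through `xmlrpc.client.ServerProxy` where that is available.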
With regards to the NLTK problem specifically, my solution would probably be to fix the weird imports that NLTK is doing, and use that as originally planned. When you're done, submit a patch of course.
That said, if this ultimately involves touching the data store, the answer is that it probably can't be done in a performant way, unless your data set is small or for some reason your NLP stuff doesn't need to hit some kind of full-text index. The GAE guys are working on it, but they have indicated that no one should be expecting a quick resolution to this particular issue.
