Is there a way of getting all the unique values of a keyword index (i.e. Subject) in Plone by querying the catalog?
I have been using this as a guide but have not yet been successful.
This is what I have so far
def search_content_by_keywords(self):
    """
    Attempting to search the catalog
    """
    catalog = self.context.portal_catalog
    query = {}
    query['Subject'] = 'Someval'
    results = catalog.searchResults(query)
    return results
Instead of passing the keyword, I want to fetch all the keywords
catalog = self.context.portal_catalog
my_keys = catalog.uniqueValuesFor('Subject')
reference: http://docs.plone.org/develop/plone/searching_and_indexing/query.html#unique-values
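Putting that together, a minimal sketch of a browser-view method that returns every keyword (the method name is just illustrative):

def get_all_keywords(self):
    """
    Return all unique values stored in the 'Subject' keyword index.
    """
    catalog = self.context.portal_catalog
    # uniqueValuesFor returns a tuple of every value indexed under 'Subject'
    return catalog.uniqueValuesFor('Subject')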
I'm new to APIs and web development, so I'm sorry if my question is very basic.
I want to create a web app for browsing food recipes based on the ingredients they contain. I'm using two query URLs to obtain the information because I need to access two JSON responses: the first one to obtain the id of each recipe based on the ingredient searched by the user, and the second one to obtain the information for each recipe based on the ids returned by the first URL.
The code I have is this one:
import os
import urllib.parse
import requests

# Function that returns the ids of recipes containing the word queried by the user.
def ids(query):
    try:
        api_key = os.environ.get("API_KEY")
        response = requests.get(f"https://api.spoonacular.com/recipes/autocomplete?apiKey={api_key}&query={urllib.parse.quote_plus(query)}")
        response.raise_for_status()
    except requests.RequestException:
        return None
    try:
        ids = []
        quotes = response.json()
        for quote in quotes:
            ids.append(quote['id'])
        return ids
    except (KeyError, TypeError, ValueError):
        return None
# Save in a list named "ids" the ids of recipes that contain the ingredient chicken
ids = ids("chicken")
# Function that returns the different recipe options based on the ids.
def lookup(ids):
    for ID in ids:
        try:
            api_key = os.environ.get("API_KEY")
            response = requests.get(f"https://api.spoonacular.com/recipes/{ID}/information?apiKey={api_key}&includeNutrition=false")
            response.raise_for_status()
        except requests.RequestException:
            return None
The main issue is that I don't know how to store the information returned in each response. As you may notice, in the lookup function I use a loop to get the responses for all the IDs contained in the list ids, so I'll obtain one response for each ID (for instance, if I have 6 ids, I'll obtain 6 different responses, each with different information in its JSON).
Finally, the info I want to store is this:
quote = response.json()
results = {'id':quote["id"],'title':quote["title"],'url':quote["sourceUrl"]}
This is the link to a sample of the data and the URL used to obtain the JSON:
https://spoonacular.com/food-api/docs#Get-Recipe-Information
I'm stuck trying to store this information, located inside the different JSON responses, in a dictionary using Python.
Any kind of help will be amazing!!
You would be best off using a dict with a structure matching the recipes you get back.
Assuming the API returns name, duration, and difficulty, that these are fields you will use later, and that your program also stores other data besides recipes, a dict works well. If that is not the case, simply use a list of dicts, each representing a single recipe:
# Just a dummy setup to simulate getting different recipes back from the API
one_response = {"name": "Chicken and Egg", "duration": 14, "difficulty": "easy"}
another_response = {"name": "Chicken square", "duration": 100, "difficulty": "hard"}

def get_recipe(id):
    if id == 1:
        return one_response
    else:
        return another_response

ids = [1, 2]

# Other information could live here as well, captured somewhere else.
# If you don't have any, simply use a list with the recipe dicts inside.
queried_recipes = {"recipes": []}

for i in ids:
    # Append each recipe to the recipes dict
    queried_recipes["recipes"].append(get_recipe(i))

print(queried_recipes)
OUT: {'recipes': [{'name': 'Chicken and Egg', 'duration': 14, 'difficulty': 'easy'}, {'name': 'Chicken square', 'duration': 100, 'difficulty': 'hard'}]}
print(queried_recipes["recipes"][0]["duration"])
OUT: 14
You may want to use https://spoonacular.com/food-api/docs#Get-Recipe-Information-Bulk instead. That will get you all the information you want in one JSON document without having to loop through repeated calls to https://api.spoonacular.com/recipes/{ID}/information.
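A rough sketch of what a bulk lookup could look like; the function name is made up, and the ids query parameter and response shape are taken from the linked docs page, so double-check them against the current API:

def lookup_bulk(ids):
    api_key = os.environ.get("API_KEY")
    # informationBulk takes a comma-separated list of recipe ids
    id_list = ",".join(str(i) for i in ids)
    response = requests.get(
        f"https://api.spoonacular.com/recipes/informationBulk?apiKey={api_key}&ids={id_list}"
    )
    response.raise_for_status()
    # the endpoint returns a JSON array with one object per recipe
    return [
        {'id': quote["id"], 'title': quote["title"], 'url': quote["sourceUrl"]}
        for quote in response.json()
    ]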
However, to answer the original question:
def lookup(ids):
    api_key = os.environ.get("API_KEY")
    results = []
    for ID in ids:
        response = requests.get(f"https://api.spoonacular.com/recipes/{ID}/information?apiKey={api_key}&includeNutrition=false")
        response.raise_for_status()
        quote = response.json()
        result = {'id': quote["id"], 'title': quote["title"], 'url': quote["sourceUrl"]}
        results.append(result)
    return results
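Used together with the ids() function from the question, the call chain would look roughly like this (assuming ids() returned a list rather than None):

recipe_ids = ids("chicken")
if recipe_ids:
    recipes = lookup(recipe_ids)
    # recipes is a list of dicts, one per recipe id
    for recipe in recipes:
        print(recipe["title"], recipe["url"])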
I am using Django 1.8 and am currently working on a blog application. When I search for tweets (just a name instead of posts), I want to save the search results obtained after querying the database as text instead of a list. My view function is below:
def search(request):
    query = request.GET.get('q', '')
    if query:
        qset = (
            Q(text__icontains=query)
            # Q(hashes__icontains=query)
            # Q(artist__icontains=query)
        )
        results = Tweet.objects.filter(qset).distinct()
    else:
        results = []
    number_of_results = len(results)
    search_item = query
    returned_items = []
    for res in results:
        text = res.text
        returned_items.append(text)
    returns = returned_items[:]
    search = Search(search_item=search_item, returns=returns)
    search.save()
    context = {'query': query, 'results': results, 'number_of_results': number_of_results, 'title': 'Search results for ' + request.GET.get('q', '')}
    return render_to_response("tweets/search.html", context, context_instance=RequestContext(request))
Also, a snapshot of my search table in the database is shown below.
Please help me out, friends.
You should join the returned list into a comma-separated string. This will return a string:
returns = ', '.join(returned_items)
This piece of code is setting returns to a list:
returns = returned_items[:]
If you want to access the first string, set it to returned_items[0]. If you want to join all strings in the list, use join()
returns = "".join(returned_items)
I'm trying to query an NDB model using a list of provided key id strings. The model has string ids that are assigned at creation - for example:
objectKey = MyModel(
    id="123456ABC",
    name="An Object"
).put()
Now I can't figure out how to query the NDB key ids with a list filter. Normally you can use MyModel.property.IN() to query properties:
names = ['An Object', 'Something else', 'etc']
# This query works
query = MyModel.query(MyModel.name.IN(names))
When I try to filter by a list of keys, I can't get it to work:
# This simple get works
object = MyModel.get_by_id("123456ABC")
ids = ["123456ABC", "CBA654321", "etc"]
# These queries DON'T work
query = MyModel.query(MyModel.id.IN(ids))
query = MyModel.query(MyModel.key.id.IN(ids))
query = MyModel.query(MyModel.key.id().IN(ids))
query = MyModel.query(MyModel._properties['id'].IN(ids))
query = MyModel.query(getattr(MyModel, 'id').IN(ids))
...
I always get AttributeError: type object 'MyModel' has no attribute 'id' errors.
I need to be able to filter by a list of IDs, rather than iterate through each ID in the list (which is sometimes long). How do I do it?
The following should work:
keys = [ndb.Key(MyModel, anid) for anid in ids]
objs = ndb.get_multi(keys)
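A small usage sketch building on that: get_multi returns entities in the same order as the keys, with None for any id that has no entity, so you may want to filter those out (the ids list is the one from the question):

ids = ["123456ABC", "CBA654321", "etc"]
keys = [ndb.Key(MyModel, anid) for anid in ids]

# get_multi returns None in place of keys with no matching entity
objs = [obj for obj in ndb.get_multi(keys) if obj is not None]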
You can also use urlsafe keys if you have problems using the ids:
objs = ndb.get_multi([ndb.Key(urlsafe=k) for k in ids])
Is there any way to fetch the entire dataset in an App Engine search index? The search below takes an integer limit through QueryOptions, and the limit always needs to be present.
I'm unable to determine whether there is some special flag that can bypass this limit and return the entire result set. If the query is made without QueryOptions, the result set is somehow limited to 20.
_INDEX = search.Index(name=constants.SEARCH_INDEX)
_INDEX.search(query=search.Query(
query,
options=search.QueryOptions(
limit=limit,
sort_options=search.SortOptions(...))))
Any ideas?
You could customise the delete-all example, if indeed you want every document in the index rather than every result in a query: https://cloud.google.com/appengine/docs/python/search/#Python_Deleting_documents_from_an_index
from google.appengine.api import search

def delete_all_in_index(index_name):
    """Delete all the docs in the given index."""
    doc_index = search.Index(name=index_name)

    # looping because get_range by default returns up to 100 documents at a time
    while True:
        # Get a list of documents populating only the doc_id field and extract the ids.
        document_ids = [document.doc_id
                        for document in doc_index.get_range(ids_only=True)]
        if not document_ids:
            break
        # Delete the documents for the given ids from the Index.
        doc_index.delete(document_ids)
So you might end up with something like:
start_id = None
while True:
    # fetch the next batch of ids, continuing after the last id seen so the loop terminates
    document_ids = [document.doc_id
                    for document in doc_index.get_range(start_id=start_id,
                                                        include_start_object=False,
                                                        ids_only=True)]
    if not document_ids:
        break
    # then do something with each document
    for doc_id in document_ids:
        document = doc_index.get(doc_id)
    start_id = document_ids[-1]
You'd probably want to fetch the document itself in the list comprehension rather than getting the ID and then getting the document from that ID, but you get the idea.
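For completeness, a rough sketch of that variant, fetching the full documents in each batch rather than ids (same hypothetical looping scheme as above):

start_id = None
all_documents = []
while True:
    # without ids_only=True, get_range returns full Document objects
    batch = list(doc_index.get_range(start_id=start_id, include_start_object=False))
    if not batch:
        break
    all_documents.extend(batch)
    start_id = batch[-1].doc_id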
Firstly, if you peek into the constructor of QueryOptions, that answers your question why it returns 20 results:
def __init__(self, limit=20, number_found_accuracy=None, cursor=None,
offset=None, sort_options=None, returned_fields=None,
ids_only=False, snippeted_fields=None,
returned_expressions=None):
The reason I think the API does this is to avoid unnecessary fetching of results. You should use an offset if you need to fetch more results upon user action, instead of always fetching all results. See this.
from google.appengine.api import search
...
# get the first set of results
page_size = 10
results = index.search(search.Query(query_string='some stuff',
                                    options=search.QueryOptions(limit=page_size)))

# calculate pages
pages = results.number_found / page_size

# the user chooses page ith (0-based), and hence an offset into the results
next_page = ith * page_size

# get the search results for that page
results = index.search(search.Query(query_string='some stuff',
                                    options=search.QueryOptions(limit=page_size, offset=next_page)))
I'm using PyES to use ElasticSearch in Python.
Typically, I build my queries in the following format:
# Create connection to server.
conn = ES('127.0.0.1:9200')
# Create a filter to select documents with 'stuff' in the title.
myFilter = TermFilter("title", "stuff")
# Create query.
q = FilteredQuery(MatchAllQuery(), myFilter).search()
# Execute the query.
results = conn.search(query=q, indices=['my-index'])
print type(results)
# > <class 'pyes.es.ResultSet'>
And this works perfectly. My problem begins when the query returns a large list of documents.
Converting the results to a list of dictionaries is computationally demanding, so I'm trying to return the query results already in a dictionary. I came across this documentation:
http://pyes.readthedocs.org/en/latest/faq.html#id3
http://pyes.readthedocs.org/en/latest/references/pyes.es.html#pyes.es.ResultSet
https://github.com/aparo/pyes/blob/master/pyes/es.py (line 1304)
But I can't figure out what exactly I'm supposed to do.
Based on the previous links, I've tried this:
from pyes import *
from pyes.query import *
from pyes.es import ResultSet
from pyes.connection import connect
# Create connection to server.
c = connect(servers=['127.0.0.1:9200'])
# Create a filter to select documents with 'stuff' in the title.
myFilter = TermFilter("title", "stuff")
# Create query / Search object.
q = FilteredQuery(MatchAllQuery(), myFilter).search()
# (How to) create the model ?
mymodel = lambda x, y: y
# Execute the query.
# class pyes.es.ResultSet(connection, search, indices=None, doc_types=None,
# query_params=None, auto_fix_keys=False, auto_clean_highlight=False, model=None)
resSet = ResultSet(connection=c, search=q, indices=['my-index'], model=mymodel)
# > resSet = ResultSet(connection=c, search=q, indices=['my-index'], model=mymodel)
# > TypeError: __init__() got an unexpected keyword argument 'search'
Anyone was able to get a dict from the ResultSet?
Any good suggestion for efficiently converting the ResultSet to a (list of) dictionaries will be appreciated too.
I tried many ways to cast a ResultSet directly into a dict but got nothing. The best way I've found recently is to append the ResultSet items into another list or dict; the ResultSet yields every single item as a dict.
Here is how I use:
import json

# create a response dictionary
response = {"status_code": 200, "message": "Successful", "content": []}

# set the result set as the content of the response
response["content"] = [result for result in resultset]

# return a JSON object
return json.dumps(response)
It's not that complicated: just iterate over the result set, for example with a for loop:
for item in results:
print item
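Building on both answers, a minimal sketch for getting a plain list of dicts out of the query, assuming (as the snippets above do) that each item yielded by the ResultSet behaves like a dict:

# iterate once over the result set and keep plain dicts
docs = [dict(item) for item in results]

# docs can now be serialised, cached, or post-processed without touching pyes again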