Python Bottle template issue: AttributeError("'dict' object has no attribute 'city'",)

As a learning project, I'm using MongoDB with Bottle for a web service. What I want to do is fetch results from MongoDB and display them in a template. Here's the output I want from my template:
output.tpl
<html><body>
%for record in records:
  <li>{{record.city}} {{record.date}}</li>
%end
</body></html>
I can pull the data out no problem:
result = db.records.find(query).limit(3)
return template('records_template', records=result)
But this resulted in no output at all - some debugging shows me that result is some sort of cursor:
<pymongo.cursor.Cursor object at 0x1560dd0>
So I attempted to convert this into something that the template would like:
result = db.records.find(query).limit(3)
viewmodel = []
for row in result:
    l = dict()
    for column in row:
        l[str(column)] = row[column]
    viewmodel.append(l)
return template('records_template', records=viewmodel)
Debugging shows me that my view data looks OK:
[{'_id': ObjectId('4fe3dfbc62933a0338000001'),
  'city': u'CityName',
  'date': u'Thursday June 21, 2012'},
 {'_id': ObjectId('4fe3dfbd62933a0338000088'),
  'city': u'CityName',
  'date': u'Thursday June 21, 2012'},
 {'_id': ObjectId('4fe3dfbd62933a0338000089'),
  'city': u'CityName',
  'date': u'Thursday June 21, 2012'}]
But this is the response I'm getting. Any ideas why?
AttributeError("'dict' object has no attribute 'city'",)
Edit: I added that bit about l[str(column)]=row[column] to convert the dictionary keys to non-unicode strings in case that was the problem, but it doesn't seem to matter either way.

You need to use dictionary syntax to look up the values:
{{record['city']}} {{record['date']}}
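With that change, the template from the question would look like this (a minimal sketch of the corrected output.tpl):
<html><body>
%for record in records:
  <li>{{record['city']}} {{record['date']}}</li>
%end
</body></html>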

result = db.records.find(query).limit(3)
viewmodel = []
for row in result:
    l = dict()
    for column in row:
        l[str(column)] = row[column]
    viewmodel.append(l)
return template('records_template', records=viewmodel)
can be simplified to:
result = db.records.find(query).limit(3)
return template('records_template', records=list(result))
Python's beauty...

Related

List comprehension to iterate through dataframe

I have written code to encode one row of a dataframe to json, as follows:
def encode_df_metadata_row(df):
    return {'name': df['Title'].values[0],
            'code': df['Code'].values[0],
            'frequency': df['Frequency'].values[0],
            'description': df['Subtitle'].values[0],
            'source': df['Source'].values[0]}
Now I would like to encode an entire dataframe to json with some transformation, so I wrote this function:
def encode_metadata_list(df_metadata):
    return [encode_df_metadata_row(df_row) for index, df_row in df_metadata.iterrows()]
I then try to call the function using this code:
df_oodler_metadata = pd.read_csv('DATA\oodler-datasets-metadata.csv')
response = encode_metadata_list(df_oodler_metadata)
print(response)
When I run this code, I get the following error:
AttributeError: 'str' object has no attribute 'values'
I've tried a bunch of variations but I keep getting similar errors. Does someone know the right way to do this?
DataFrame.iterrows yields (index, row) pairs, where each row is a Series holding a single scalar per column. That makes the .values[0] part of your encode_df_metadata_row function unnecessary - row['Title'] is already a plain string, which is exactly why .values raises the AttributeError. The correct form of the function is:
def encode_df_metadata_row(row):
    return {'name': row['Title'],
            'code': row['Code'],
            'frequency': row['Frequency'],
            'description': row['Subtitle'],
            'source': row['Source']}
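As an aside, since the transformation here is just renaming keys, the loop can be replaced entirely; a minimal sketch, assuming the same column names as in the question:
def encode_metadata_list(df_metadata):
    # Rename the CSV columns to the desired JSON keys, then let pandas
    # emit one dict per row - no iterrows() needed.
    renamed = df_metadata.rename(columns={'Title': 'name',
                                          'Code': 'code',
                                          'Frequency': 'frequency',
                                          'Subtitle': 'description',
                                          'Source': 'source'})
    keys = ['name', 'code', 'frequency', 'description', 'source']
    return renamed[keys].to_dict(orient='records')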

How to use metadata for document retrieval using Sentence Transformers?

I'm trying to use Sentence Transformers and Haystack for document retrieval, focusing on searching documents by metadata other than the document text.
I'm using a dataset of academic publication titles, and I've appended a fake publication year (which I want to use as a search term). From reading around I've combined the columns and just added a separator between the title and publication year, and included the column titles since I thought maybe this could add context. An example input looks like:
title Sparsity-certifying Graph Decompositions [SEP] published year 1980
I have a document store and method of retrieval here, based on this:
document_store_faiss = FAISSDocumentStore(faiss_index_factory_str="Flat",
                                          return_embedding=True,
                                          similarity='cosine')
retriever_faiss = EmbeddingRetriever(document_store_faiss,
                                     embedding_model='all-mpnet-base-v2',
                                     model_format='sentence_transformers')
document_store_faiss.write_documents(df.rename(columns={'combined': 'content'}).to_dict(orient='records'))
document_store_faiss.update_embeddings(retriever=retriever_faiss)

def get_results(query, retriever, n_docs=25):
    return [item.content for item in retriever.retrieve(query, top_k=n_docs)]

q = 'published year 1999'
print('Results: ')
res = get_results(q, retriever_faiss)
for r in res:
    print(r)
I do a check to see whether any inputs actually have a publication year matching the search term, but when I look at my search results I'm getting entries with seemingly random published years. I was hoping the results would at least all have the same published year, since I want to move on to more complicated queries like "published year before 1980".
If anyone could either tell me what I'm doing wrong, or whether I have misunderstood this process / expected results it would be much appreciated.
It sounds like you need metadata filtering rather than placing the year within the query itself. The FAISSDocumentStore doesn't support filtering; I'd recommend switching to the PineconeDocumentStore, which Haystack introduced in the v1.3 release a few days ago. It supports the strongest filter functionality in the current set of document stores.
You will need to make sure you have the latest version of Haystack installed, and it needs an additional pinecone-client library too:
pip install -U farm-haystack pinecone-client
There's a guide here that may help; it will go something like:
document_store = PineconeDocumentStore(
    api_key="<API_KEY>",  # from https://app.pinecone.io
    environment="us-west1-gcp"
)
retriever = EmbeddingRetriever(
    document_store,
    embedding_model='all-mpnet-base-v2',
    model_format='sentence_transformers'
)
Before you write the documents you need to convert the data so that your text is in content (as you have done above, but there's no need to append the year), and then include the year as a field in a meta dictionary. So you would create a list of dictionaries that looks like:
dicts = [
    {'content': 'your text here', 'meta': {'year': 1999}},
    {'content': 'another record text', 'meta': {'year': 1971}},
    ...
]
I don't know the exact format of your df but assuming it is something like:
text                    year
"your text here"        1999
"another record here"   1971
We could write the following to reformat it:
df = df.rename(columns={'text': 'content'}) # you did this already
# create a new 'meta' column that contains {'year': <year>} data
df['meta'] = df['year'].apply(lambda x: {'year': x})
# we don't need the year column anymore so we drop it
df = df.drop(['year'], axis=1)
# now convert into the list of dictionaries format as you did before
dicts = df.to_dict(orient='records')
This data replaces the df dictionaries you wrote before, so we would continue like so:
document_store.write_documents(dicts)
document_store.update_embeddings(retriever=retriever)
Now you can query with filters. For example, to search for docs with a publish year of 1999 we use the condition "$eq" (equals):
docs = retriever.retrieve(
    "some query here",
    top_k=25,
    filters={"year": {"$eq": 1999}}
)
For published before 1980 we can use "$lt" (less than):
docs = retriever.retrieve(
    "some query here",
    top_k=25,
    filters={"year": {"$lt": 1980}}
)
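Comparison operators can also be combined on the same field; a minimal sketch (assuming the same retriever as above) for "published in the 1970s":
docs = retriever.retrieve(
    "some query here",
    top_k=25,
    # both conditions apply to 'year', i.e. 1970 <= year < 1980
    filters={"year": {"$gte": 1970, "$lt": 1980}}
)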

Saving Json/Dict responses to dataframe - they get converted to string on export. How to avoid this?

I am running an API and saving the responses as a dictionary with response.to_dict() to a new column for referencing later.
Sample dataframe:
dict1 = {'thing': 200,
         'other thing': 18,
         'available_data': {'premium': {'emails': 1}},
         'query': {'names': [{'first': 'John', 'last': 'Smith'}]}}
dict2 = {'thing': 123,
         'other thing': 13,
         'available_data': {'premium': {'emails': 1}},
         'query': {'names': [{'first': 'Foo', 'last': 'Bar'}]}}
dict_frame = pd.DataFrame({'customers': ['John', 'Foo'],
                           'api_response': [dict1, dict2]})
print(dict_frame)
customers api_response
0 John {'thing': 200, 'other thing': 18, 'available_d...
1 Foo {'thing': 123, 'other thing': 13, 'available_d...
We can see that the data is still a dict type:
type(dict_frame.loc[1,'api_response'])
dict
However, if I save it to a file and re-load it, the data comes back as a string.
# save to file
dict_frame.to_csv('mydicts.csv')
# reload dataframe
dict_frame = pd.read_csv('mydicts.csv')
# check type
type(dict_frame.loc[1,'api_response'])
#it's a string
str
With some googling, I see there is a standard-library function to convert it back to a dict:
from ast import literal_eval
python_dict = literal_eval(first_dict)
It works, but I have a feeling there's a way to avoid this in the first place. Any advice?
I tried dtype={'api_response': dict} while reading in the CSV, but got TypeError: dtype '<class 'dict'>' not understood.
That is a limitation of the CSV format: everything is converted to text, and pandas must guess the data type when it reads the text back in. You can specify a converter:
from ast import literal_eval
dict_frame_csv = pd.read_csv('mydicts.csv', converters={'api_response': literal_eval})
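If CSV isn't a hard requirement, a format that preserves Python objects avoids the round trip through text entirely; a minimal sketch using pandas' pickle support:
# pickle stores the actual Python objects, so the dicts survive the round trip
dict_frame.to_pickle('mydicts.pkl')
dict_frame = pd.read_pickle('mydicts.pkl')
type(dict_frame.loc[1, 'api_response'])  # dict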

Django Copy multiple items from DB Query to dict

I have a problem with Django and dicts. I want to get only the items that match a string, as below, but I can't get it to work. Thanks for your help.
django_db_query = [{'time': '13:00 Uhr', 'titel': 'test1'},
                   {'time': '14:00 blah', 'titel': 'test2'},
                   {'time': '13:00 Uhr', 'titel': 'test3'}]
all_db_items = Django_db.objects.all()
only_13 = dict()
for item in all_db_items:
    if item.time is "13":
        only_13 += item
Wanted: the same data structure with the matching values from my DB in my dict, but only those with time 13:00 Uhr:
for item in only_13:
    print item.titel
console
test1
test3
Assuming your DjangoDbModel looks something like
class DjangoDbModel(models.Model):
    time = models.DateTimeField()
    title = models.CharField(max_length=256)
all you need to do in that case is
DjangoDbModel.objects.filter(time__hour=13) if you want to have only the items from hour 13. You can apply similar filters to the day, year and month, for example.
for item in all_db_items:
    if item['time'] == "13:00 Uhr":
        only_13.update(item)
is tests identity, not equality - use == to compare values
+ is not implemented for dicts; use update instead
more resources:
https://www.python-course.eu/dictionaries.php
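Note also that a dict can hold each key only once, so to keep several matching rows a list is the natural container; a minimal sketch, assuming the DjangoDbModel from the first answer:
# filter in the database, then keep the matching rows in a list
only_13 = list(DjangoDbModel.objects.filter(time__hour=13))
for item in only_13:
    print(item.title)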

PyMongo: group with 2d geospatial index in conditions returns an error

The error returned is:
exception: manual matcher config not allowed
Here's my code:
cond = {'id': id,
        'date': {'$gte': start_date, '$lte': end_date},
        'location': {'$within': {'$box': box}}}
reduce = 'function(obj, prev) { prev.count++; }'
rows = collection.group({'location': True}, cond, {'count': 0}, reduce)
When I remove location from the condition it works fine. If I change the query to a find it works fine too, so it's a problem with group.
What am I doing wrong?
MongoDB currently (version 1.6.2) doesn't support geo queries for mapreduce and group functions. See http://jira.mongodb.org/browse/SERVER-1742 for the issue ticket (and consider voting it up).
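Until that's supported, one workaround is to run the geo query through find (which does support $within) and do the grouping client-side; a minimal sketch, assuming location is stored as an [x, y] pair:
from collections import defaultdict

counts = defaultdict(int)
for doc in collection.find(cond):
    key = tuple(doc['location'])  # tuples are hashable, lists are not
    counts[key] += 1
rows = [{'location': list(loc), 'count': n} for loc, n in counts.items()]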
