I am participating in the Yelp Dataset Challenge and I'm using RethinkDB to store the JSON documents for each of the different datasets.
I have the following script:
import rethinkdb as r
import json, os
RDB_HOST = os.environ.get('RDB_HOST') or 'localhost'
RDB_PORT = os.environ.get('RDB_PORT') or 28015
DB = 'test'
connection = r.connect(host=RDB_HOST, port=RDB_PORT, db=DB)
query = r.table('yelp_user').filter({"name":"Arthur"}).run(connection)
print(query)
But when I run it at the terminal in a virtualenv I get this as an example response:
<rethinkdb.net.DefaultCursor object at 0x102c22250> (streaming):
[{'yelping_since': '2014-03', 'votes': {'cool': 1, 'useful': 2, 'funny': 1}, 'review_count': 5, 'id': '08eb0b0d-2633-4ec4-93fe-817a496d4b52', 'user_id': 'ZuDUSyT4bE6sx-1MzYd2Kg', 'compliments': {}, 'friends': [], 'average_stars': 5, 'type': 'user', 'elite': [], 'name': 'Arthur', 'fans': 0}, ...]
I know I can use pprint to pretty print outputs but a bigger issue that I don't understand how to resolve is just printing them in an intelligent manner, like not just showing "..." as the end of the output.
Any suggestions?
run returns an iterable cursor. Iterate over it to get all the rows:
query = r.table('yelp_user').filter({"name":"Arthur"})
for row in query.run(connection):
print(row)
Related
I would like to create a query to find the number of trees whose species name ends by 'um'
by arrondissement.
My code is here:
from pymongo import MongoClient
from utils import get_my_password, get_my_username
from pprint import pprint
client = MongoClient(
host='127.0.0.1',
port=27017,
username=get_my_username(),
password=get_my_password(),
authSource='admin'
)
db = client['paris']
col = db['trees']
pprint(col.find_one())
{'_id': ObjectId('5f3276d8c22f704983b3f681'),
'adresse': 'JARDIN DU CHAMP DE MARS / C04',
'arrondissement': 'PARIS 7E ARRDT',
'circonferenceencm': 115.0,
'domanialite': 'Jardin',
'espece': 'hippocastanum',
'genre': 'Aesculus',
'geo_point_2d': [48.8561906007, 2.29586827747],
'hauteurenm': 11.0,
'idbase': 107224.0,
'idemplacement': 'P0040937',
'libellefrancais': 'Marronnier',
'remarquable': '0',
'stadedeveloppement': 'A',
'typeemplacement': 'Arbre'}
I tryed to do it with next lines:
import re
regex = re.compile('um')
pipeline = [
{'$group': {'_id': '$arrondissement',
'CountNumberTrees': {'$count': '${'espece': regex}'}
}
}
]
results = col.aggregate(pipeline)
pprint(list(results))
But it returns:
File "<ipython-input-114-fba3a8bf5bfd>", line 8
'CountNumberTrees': {'$count': '${'espece': regex}'}
^
SyntaxError: invalid syntax
When I check like this, it shows results: '25245'
results = col.count_documents(filter={'espece': regex})
print(results)
Could you help me please to understand what should I put in pipeline?
Best regards
Try this syntax for your aggregate query:
The $match stage filters on espace ending in um.
The $group stage counts each returned record grouped by arrondissement
The $project stage is optional but it provides a tidier list of fields.
cursor = col.aggregate([
{'$match': {'espece': {'$regex': 'um$'}}},
{'$group': {'_id': '$arrondissement', 'CountNumberTrees': {'$sum': 1}}},
{'$project': {'_id': 0, 'arrondissement': '$_id', 'CountNumberTrees': '$CountNumberTrees'}}
])
print(list(cursor))
I am trying to return just two fields from my MongoDB database and this query correct sends back my username and posts from the client:
db.users.find({'email_address': /^johnny/}, {'username': 1, 'user_posts': 1, '_id': 0});
I'm trying to switch this to PyMongo so I can query in Python and my query looks like this:
regx = Regex('^{0}'.format('johnny'))
query = {'postcode': regx}, {'username': 1, 'user_posts': 1, '_id': 0}
user_list = mycol.find(query).limit(5)
The bit where it's failing is here:
{'username': 1, 'user_posts': 1, '_id': 0}
As without this filter the documents are sent in full fine. With that I get this error message from PyMongo:
TypeError: filter must be an instance of dict, bson.son.SON, or any other type that inherits from collections.Mapping
I assume that my filter is malformed and it's no longer a dictionary so have tried various alternatives of wrapping in quotes and then .format() etc but cannot seem to hit the magical combination.
The problem is that the following is not a dictionary but a tuple of two dictionaries:
query = {'postcode': regx}, {'username': 1, 'user_posts': 1, '_id': 0}
if you wanted to use it directly you could do it with
user_list = mycol.find(query[0], query[1]).limit(5)
but probably best to actually assign it to two separate variables like so
query, filter = {'postcode': regx}, {'username': 1, 'user_posts': 1, '_id': 0}
and then
user_list = mycol.find(query, filter).limit(5)
Turned out it was quite easy.. I just needed to have two dictionaries. 1 for the query and the other for the filter.
regx = Regex('^{0}'.format('johnny'))
filter_stuff = {'username': 1, 'user_posts': 1, '_id': 0}
Then just build the full query as follows:
user_list = mycol.find(query, filter_stuff).limit(5)
I'm working on a python client program to Cloundant.
I'd like to retrieve a doc, not based on "_id",but on my own field.
Still, it does not work causing Key Error. Any help to solve this error is highly appreciated!
Here is my code:
from cloudant.client import Cloudant
from cloudant.error import CloudantException
from cloudant.result import Result,ResultByKey
...
client.connect()
databaseName = "mydata1"
myDatabase = client[databaseName]
# As direct access like 'doc = myDatabase[<_id>]' cannot work for my key,
# let's check on by one ...
for document in myDatabase:
# if document['_id']== "20170928chibikasmall": <= if I use _id it's ok
if document['gokigenField']== 111:
This cause
KeyError :'gokigenField'
In advance, I've created gokigenField index using dashboard, then confirm the result via my postman with REST API
GET https://....bluemix.cloudant.com/mydata1/_index
the result is as follows:
{"total_rows":2,"indexes":[{"ddoc":null,"name":"_all_docs","type":"special","def":{"fields":[{"_id":"asc"}]}},{"ddoc":"_design/f7fb53912eb005771b736422f41c24cd26c7f06a","name":"gokigen-index","type":"text","def":{"default_analyzer":"keyword","default_field":{},"selector":{},"fields":[{"gokigenField":"number"}],"index_array_lengths":true}}]}
Also, I've confirmed I can use this gokigenField as query index nicely on cloudant dashboard as well as POST query .
My newly created "gokigenField" is not included in all the document in DB, as there are automatically created doc ("_design/xxx) without that field.
I guess this might cause KeyError, when I call this from my Python client.
I cannot find Cloudant API for checking 'if a specific key exists or not in a document', in the reference.. So, cannot have any idea how to by-pass such docs...
This is how to index an query data from the Python client. Let's assume we already have the library imported and have a database client in myDatabase.
First of all I created some data:
#create some data
data = { 'name': 'Julia', 'age': 30, 'pets': ['cat', 'dog', 'frog'], 'gokigenField': 'a' }
myDatabase.create_document(data)
data = { 'name': 'Fred', 'age': 30, 'pets': ['dog'], 'gokigenField': 'b' }
myDatabase.create_document(data)
data = { 'name': 'Laura', 'age': 31, 'pets': ['cat'], 'gokigenField': 'c' }
myDatabase.create_document(data)
data = { 'name': 'Emma', 'age': 32, 'pets': ['cat', 'parrot', 'hamster'], 'gokigenField': 'c' }
myDatabase.create_document(data)
We can check the data is there in the Cloudant dashboard or by doing:
# check the data is there
for document in myDatabase:
print(document)
Next we can opt to index the field gokigenField like so:
# create an index on the field 'gokigenField'
mydb.create_query_index(fields=['gokigenField'])
Then we can query the database:
# do a query
selector = {'gokigenField': {'$eq': 'c'}}
docs = mydb.get_query_result(selector)
for doc in docs:
print (doc)
which outputs the two matching documents.
The python-cloudant documentation is here.
I have a set of data that a user needs to query using their own query string. The current solution creates a temporary in-memory sqlite database that the query is run against.
The dataset is a list of "flat" dictionaries, i.e. there is no nested data. The query string does not need to be SQL, but it should be simple to define using an existing query framework.
It needs to support ordering (ascending, descending, custom) and filtering.
The purpose of this question is to get a range of different solutions that might work for this use case.
import sqlite3
items = [
{'id': 1},
{'id': 2, 'description': 'This is a description'},
{'id': 3, 'comment': 'This is a comment'},
{'id': 4, 'height': 1.78}
]
# Assemble temporary sqlite database
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
knownTypes = { "id": "real", "height": "real", "comment": "text" }
allKeys = list(set().union(*(d.keys() for d in items)))
allTypes = list(knownTypes.get(k, "text") for k in allKeys)
createTable_query = "CREATE TABLE data ({});".format(", ".join(["{} {}".format(x[0], x[1]) for x in zip(allKeys, allTypes)]))
cur.execute(createTable_query)
conn.commit()
qs = ["?" for i in range(len(allKeys))]
insertRow_query = "INSERT INTO data VALUES ({});".format(", ".join(qs))
for p in items:
vals = list([p.get(k, None) for k in allKeys])
cur.execute(insertRow_query, vals)
conn.commit()
# modify user query here
theUserQuery = "SELECT * FROM data"
# Get data from query
data = [row for row in cur.execute(theUserQuery)]
YAQL is what I'm looking for.
It doesn't do SQL, but it does execute a query string - which is a simple way to do complex user-defined sorting and filtering.
There's a library called litebox that does what you want. It is backed by SQLite.
from litebox import LiteBox
items = [
{'id': 1},
{'id': 2, 'description': 'This is a description'},
{'id': 3, 'comment': 'This is a comment'},
{'id': 4, 'height': 1.78}
]
types = {"id": int, "height": float, "comment": str}
lb = LiteBox(items, types)
lb.find("height > 1.5")
Result: [{'id': 4, 'height': 1.78}]
I have a pretty big dictionary which looks like this:
{
'startIndex': 1,
'username': 'myemail#gmail.com',
'items': [{
'id': '67022006',
'name': 'Adopt-a-Hydrant',
'kind': 'analytics#accountSummary',
'webProperties': [{
'id': 'UA-67522226-1',
'name': 'Adopt-a-Hydrant',
'websiteUrl': 'https://www.udemy.com/,
'internalWebPropertyId': '104343473',
'profiles': [{
'id': '108333146',
'name': 'Adopt a Hydrant (Udemy)',
'type': 'WEB',
'kind': 'analytics#profileSummary'
}, {
'id': '132099908',
'name': 'Unfiltered view',
'type': 'WEB',
'kind': 'analytics#profileSummary'
}],
'level': 'STANDARD',
'kind': 'analytics#webPropertySummary'
}]
}, {
'id': '44222959',
'name': 'A223n',
'kind': 'analytics#accountSummary',
And so on....
When I copy this dictionary on my Jupyter notebook and I run the exact same function I run on my django code it runs as expected, everything is literarily the same, in my django code I'm even printing the dictionary out then I copy it to the notebook and run it and I get what I'm expecting.
Just for more info this is the function:
google_profile = gp.google_profile # Get google_profile from DB
print(google_profile)
all_properties = []
for properties in google_profile['items']:
all_properties.append(properties)
site_selection=[]
for single_property in all_properties:
single_propery_name=single_property['name']
for single_view in single_property['webProperties'][0]['profiles']:
single_view_id = single_view['id']
single_view_name = (single_view['name'])
selections = single_propery_name + ' (View: '+single_view_name+' ID: '+single_view_id+')'
site_selection.append(selections)
print (site_selection)
So my guess is that my notebook has some sort of json parser installed or something like that? Is that possible? Why in django I can't access dictionaries the same way I can on my ipython notebooks?
EDITS
More info:
The error is at the line: for properties in google_profile['items']:
Django debug is: TypeError at /gconnect/ string indices must be integers
Local Vars are:
all_properties =[]
current_user = 'myemail#gmail.com'
google_profile = `the above dictionary`
So just to make it clear for who finds this question:
If you save a dictionary in a database django will save it as a string, so you won't be able to access it after.
To solve this you can re-convert it to a dictionary:
The answer from this post worked perfectly for me, in other words:
import json
s = "{'muffin' : 'lolz', 'foo' : 'kitty'}"
json_acceptable_string = s.replace("'", "\"")
d = json.loads(json_acceptable_string)
# d = {u'muffin': u'lolz', u'foo': u'kitty'}
There are many ways to convert a string to a dictionary, this is only one. If you stumbled in this problem you can quickly check if it's a string instead of a dictionary with:
print(type(var))
In my case I had:
<class 'str'>
before converting it with the above method and then I got
<class 'dict'>
and everything worked as supposed to