MongoDB query on multiple conditions - python

I am trying to do a simple query based on two conditions in MongoDB using pymongo.
I am using the sample restaurants data set from the tutorial documentation. I have:
from pymongo import MongoClient
import pymongo
import pandas as pd
client = MongoClient()
db = client.test
cursor = db.restaurants.find({"$and":[{'borough':"Manhattan"},{"grades":{'grade':"A"}}]}
for record in cursor:
print record
I am just trying to print all the restaurants in Manhattan with a grade of 'B.' But this pulls back no results. I have also tried
cursor = db.restaurants.find({"borough":"Manhattan", "grades.grade":"B"})
but this will only filter by the first condition and won't filter by the "grade." It's exactly how it is laid out in the documentation but I can't get it to work.

The problem is in the second condition: grades is an array of embedded documents, so match against it with $elemMatch:
db.restaurants.find({"$and": [{"borough": "Manhattan"}, {"grades": {"$elemMatch": {"grade": "A"}}}]})
Works for me.
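For reference, the same query runs from pymongo like this (a minimal sketch, assuming the sample restaurants data set is loaded into the test database as in the question):

from pymongo import MongoClient

client = MongoClient()
db = client.test

# $elemMatch matches documents whose grades array contains at least one
# embedded document with grade == "A"
cursor = db.restaurants.find(
    {"$and": [{"borough": "Manhattan"},
              {"grades": {"$elemMatch": {"grade": "A"}}}]}
)
for record in cursor:
    print(record)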

I had a similar issue and it worked for me with the following syntax:
result = db.mycollection.find({"$and": [{"key1": value1}, {"key2": value2}]})
I have multiple records with the same value under key1, but I only want the ones with a specific value under key2. It seems to work for me.

Related

Retrieve data from python script and save to sqlite database

I have a python query which retrieves data through an API. The data returned is a dictionary. I want to save the data in a sqlite3 database. There are two main columns ('scan', 'tests'). I'm only interested in the data inside these two columns, e.g. 'grade': 'D+', 'likelihood_indicator': 'MEDIUM'.
Any help is appreciated.
import pandas as pd
from httpobs.scanner.local import scan
import sqlite3

website_to_scan = 'digitalnz.org'
scan_site = scan(website_to_scan)
df = pd.DataFrame(scan_site)
print(scan_site)
print(df)
Results of print(scan_site) and print(df) were attached as screenshots (not reproduced here).
This depends on how you have set up your table in SQLite, but essentially you would write an INSERT INTO SQL clause and use the connection.execute() function in Python, passing your SQL string as an argument.
It's difficult to give a more precise answer (i.e. code) because you haven't declared the connection variable. Let's imagine you already have your SQLite DB set up with the connection:
connection_variable.execute("""INSERT INTO table_name
(column_name1, column_name2) VALUES (value1, value2);""")
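For a more concrete sketch using the sqlite3 module directly, reusing website_to_scan and scan_site from the question (the table name, column names, and the exact shape of the scan result dictionary are assumptions here):

import sqlite3

connection = sqlite3.connect('scans.db')  # hypothetical database file
connection.execute("""CREATE TABLE IF NOT EXISTS scan_results
                      (site TEXT, grade TEXT, likelihood_indicator TEXT)""")

# Assumed: the dictionary returned by scan() holds 'grade' and
# 'likelihood_indicator' under its 'scan' entry
scan_result = scan_site['scan']
connection.execute(
    "INSERT INTO scan_results (site, grade, likelihood_indicator) VALUES (?, ?, ?)",
    (website_to_scan, scan_result['grade'], scan_result['likelihood_indicator']),
)
connection.commit()
connection.close()

The ? placeholders let sqlite3 handle quoting and escaping, which is safer than interpolating the values into the SQL string yourself.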

Problem with Pymongo: want to add new fields to an existing db with different values, but all entries turn out to be the same

I have a database of reviews and want to create a new field in my database that indicates whether a review contains words relating to "pool".
import re
import pandas as pd
from pymongo import MongoClient

client = MongoClient()
db = client.Hotels_Copenhagen
collection = db.get_collection("hotel_review_table")
data = pd.DataFrame(list(collection.find()))

def common_member(a, b):
    a_set = set(a)
    b_set = set(b)
    if a_set & b_set:
        return True
    else:
        return False

pool_set = {"pool", "swim", "swimming"}
for single_review in data.review_text:
    make_it_lowercase = str(single_review).lower()
    tokenize_it = re.split(r"\s|\.|,", make_it_lowercase)
    pool_mentioned = common_member(tokenize_it, pool_set)
    db.hotel_review_table.update_one({}, {"$set": {"pool_mentioned": pool_mentioned}})
In Python I already counted the number of reviews containing words related to "pool"; it turns out that about 1k of my 50k reviews mention pools.
I solved my previously posted problem of getting the same entry everywhere by moving the db.hotel_review_table.update_one line into the loop.
Thus the main problem is solved. However, it takes quite some time to update the database like this. Is there any other way to make it faster?
You've gone to a lot of trouble to implement a feature that is available straight out of the box in MongoDB. You need to use text indexes.
Create a text index on the review text field (in the MongoDB shell):
db.hotel_review_table.createIndex( { "review_text": "text" } )
Then your code distils down to:
from pymongo import MongoClient

db = MongoClient()['Hotels_Copenhagen']
for keyword in ['pool', 'swim', 'swimming']:
    db.hotel_review_table.update_many({'$text': {'$search': keyword}},
                                      {'$set': {'pool_mentioned': True}})
Note this doesn't set the value to false in the case that it isn't mentioned; if this is really needed, you can write another update to set any values that aren't true to false.
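For example, the second pass could look like this (a minimal sketch):

# Backfill: mark every document the first pass didn't set to True
db.hotel_review_table.update_many(
    {'pool_mentioned': {'$ne': True}},
    {'$set': {'pool_mentioned': False}},
)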

Dask DataFrame.map_partitions() to write to db table

I have a dask dataframe that contains some data after some transformations. I want to write that data back to a MySQL table. I have implemented a function that takes a dataframe and a db url and writes the dataframe back to the database. Because I need to make some final edits on the data of the dataframe, I use pandas df.to_dict('records') to handle the write.
The function looks like this:
def store_partition_to_db(df, db_url):
    from sqlalchemy import create_engine
    from mymodels import DBTableBaseModel

    records_dict = df.to_dict('records')
    records_to_db = []
    for record in records_dict:
        transformed_record = transform_record_some_how(record)  # returns a dictionary
        records_to_db.append(transformed_record)

    engine = create_engine(db_url)
    engine.execute(DBTableBaseModel.__table__.insert(), records_to_db)
    return records_to_db
In my dask code:
from functools import partial

partial_store_partition_to_db = partial(store_partition_to_db, db_url=url)
dask_dataframe = dask_dataframe_data.map_partitions(partial_store_partition_to_db)
all_records = dask_dataframe.compute()
print len([record_dict for record_list in all_records for record_dict in record_list])  # Gives me 7700
But when I go to the respective table in MySQL, I get 7702 rows, and the extra rows have the same value, 1, in every column. When I try to filter all_records for that value, no dictionary is returned. Has anyone met this situation before? How do you handle db writes from partitions with dask?
PS: I use LocalCluster and dask.distributed
The problem was that I didn't provide meta information to the map_partitions method, so dask created a dataframe filled with foo placeholder values, which in turn were written to the db.
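A minimal sketch of the fix (the exact meta value is an assumption; it tells dask up front what the function returns, so dask no longer infers it by calling the function on a dummy partition filled with placeholder values such as 1 and 'foo'):

# Declare the output shape so dask skips the dummy-partition inference
# call that was writing placeholder rows to the database.
dask_dataframe = dask_dataframe_data.map_partitions(
    partial_store_partition_to_db,
    meta=('records', object),  # assumed: output treated as an object Series
)
all_records = dask_dataframe.compute()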

How to select all data in PyMongo?

I want to select all data, or select with a condition, from the collection random, but I can't find any guide for MongoDB in Python to do this.
Also, I can't print all the data that was selected.
Here is my code:
def mongoSelectStatement(result_queue):
    client = MongoClient('mongodb://localhost:27017')
    db = client.random
    cursor = db.random.find({"gia_tri": "0.5748676522161966"})
    # cursor = db.random.find()
    inserted_documents_count = cursor.count()
    for document in cursor:
        result_queue.put(document)
There is quite comprehensive documentation for MongoDB. For Python (PyMongo), here is the URL: https://api.mongodb.org/python/current/
Note: consider the version you are running, since the latest version has new features and functions.
To verify pymongo version you are using execute the following:
import pymongo
pymongo.version
Now, regarding the select query you asked about: as far as I can tell, the code you presented is fine. Here is the select structure in MongoDB.
First off, it is called find().
In PyMongo, if you want to select specific rows (not really rows; in MongoDB they are called documents, but I am saying rows to make it easy to understand, assuming you are comparing MongoDB to SQL), that is, specific documents from the table (called a collection in MongoDB), use the following structure. I will use random as the collection name, and assume the documents have the attributes age:10, type:ninja, class:black, level:1903:
db.random.find({ "age":"10" }) This will return all documents that have age 10 in them.
you could add more conditions simply by separating with commas
db.random.find({ "age":"10", "type":"ninja" }) This will select all data with age 10 and type ninja.
If you want to get all the data, just leave the filter empty:
db.random.find({})
Now the previous examples display everything (age, type, class, level and _id). If you want to display specific attributes, say only the age, you have to pass a second argument to find() called a projection (1 means show, 0 means do not show):
{'age':1}
Note that this returns age as well as _id; _id is always returned by default. You have to explicitly tell it not to return it, as in:
db.random.find({ "age":"10", "type":"ninja" }, {"age":1, "_id":0} )
I hope this gets you started.
Take a look at the documentation; it is very thorough.

Filtering with joined tables

I'm trying to get some query performance improved, but the generated query does not look the way I expect it to.
The results are retrieved using:
query = (session.query(SomeModel)
         .options(joinedload_all('foo.bar'))
         .options(joinedload_all('foo.baz'))
         .options(joinedload('quux.other')))
What I want to do is filter on the table joined via 'foo', but this way doesn't work:
query = query.filter(FooModel.address == '1.2.3.4')
It results in a clause like this attached to the query:
WHERE foos.address = '1.2.3.4'
Which doesn't do the filtering in a proper way, since the generated joins attach tables foos_1 and foos_2. If I try that query manually but change the filtering clause to:
WHERE foos_1.address = '1.2.3.4' AND foos_2.address = '1.2.3.4'
It works fine. The question is of course - how can I achieve this with sqlalchemy itself?
If you want to filter on joins, you use join():
session.query(SomeModel).join(SomeModel.foos).filter(Foo.something=='bar')
joinedload() and joinedload_all() are used only as a means to load related collections in one pass; they are not used for filtering/ordering! Please read:
http://docs.sqlalchemy.org/en/latest/orm/tutorial.html#joined-load - the note on "joinedload() is not a replacement for join()" - as well as:
http://docs.sqlalchemy.org/en/latest/orm/loading.html#the-zen-of-eager-loading
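If you also need the filtered Foo rows loaded onto SomeModel.foos in the same query, the usual pattern is join() combined with contains_eager() rather than joinedload() (a minimal sketch; SomeModel, Foo and the foos relationship are stand-ins matching the answer above):

from sqlalchemy.orm import contains_eager

# Reuse the filtering join to populate SomeModel.foos, instead of letting
# joinedload() emit separately aliased joins (foos_1, foos_2) that the
# filter cannot reach.
query = (session.query(SomeModel)
         .join(SomeModel.foos)
         .filter(Foo.address == '1.2.3.4')
         .options(contains_eager(SomeModel.foos)))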
