Find documents with MongoDB via Cosmos DB - Python

I am attempting to retrieve all documents from a specified collection in MongoDB via Cosmos DB, but I get back an empty list instead of the documents I've requested.
def retrieve_transactions(collection):
    client = MongoClient(environ.get('DB_URI'))  # MongoClient is imported from pymongo
    db = client[str(environ.get('DB'))]
    transaction_collection = db[collection].transactions
    transaction_list = list(transaction_collection.find({}))
    client.close()
    return transaction_list
The primary URI is retrieved from the App Service application settings. When run locally from my IDE, the function retrieves test data as expected, which leads me to believe the issue involves Cosmos DB itself. I'm also successfully inserting documents into this database from a separate App Service instance. The database's Insights tab shows the find requests arriving, with zero failed requests.
I'm stumped. Any thoughts?

I solved this by removing the dots (".") from my collection's name.
example.com.transactions -> examplecom
Cosmos DB's MongoDB API apparently does not support dots in collection names.
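For anyone hitting the same thing, a small sketch of what was going on (the collection name here is assumed for illustration):

# db["example.com"].transactions addresses the namespace "example.com.transactions";
# the extra dots in the collection name appear to be what the Cosmos DB
# MongoDB API chokes on. Stripping them avoids the problem:
collection_name = "example.com".replace(".", "")   # -> "examplecom"
transaction_collection = db[collection_name].transactions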

Related

How to insert bulk data into Cosmos DB in Python?

I'm developing an application in Python which uses Azure Cosmos DB as the main database. At some point in the app, I need to insert bulk data (a batch of items) into Cosmos DB. So far, I've been using the Azure Cosmos DB Python SDK for SQL API to communicate with Cosmos DB; however, it doesn't provide a method for bulk data insertion.
As I understand it, these are the insertion methods provided in this SDK, both of which insert only a single item and so can be very slow when called in a for loop:
.upsert_item()
.create_item()
Is there another way to use this SDK to insert bulk data instead of using the methods above in a for loop? If not, is there an Azure REST API that can handle bulk data insertion?
The Cosmos DB service does not expose bulk insertion via its REST API. Bulk mode is implemented at the SDK layer, and unfortunately the Python SDK does not yet support it. It does, however, support asynchronous IO. Here's an example that may help you.
from azure.cosmos.aio import CosmosClient
import os

URL = os.environ['ACCOUNT_URI']
KEY = os.environ['ACCOUNT_KEY']
DATABASE_NAME = 'myDatabase'
CONTAINER_NAME = 'myContainer'

async def create_products():
    async with CosmosClient(URL, credential=KEY) as client:
        database = client.get_database_client(DATABASE_NAME)
        container = database.get_container_client(CONTAINER_NAME)
        for i in range(10):
            await container.upsert_item({
                'id': 'item{0}'.format(i),
                'productName': 'Widget',
                'productModel': 'Model {0}'.format(i)
            })
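To actually run the coroutine from a script, something along these lines works (a minimal sketch; the event-loop handling is mine, not part of the original sample):

import asyncio

asyncio.run(create_products())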
Update: I remembered another way you can do bulk inserts with the Python SDK: stored procedures. There are examples of how to write these, including samples that demonstrate passing an array, which is what you want to do. I would also take a look at bounded execution, as you will want to implement that as well. You can learn how to write them here: How to write stored procedures. Then how to register and call them here: How to use Stored Procedures. Note: stored procedures can only be used when passing a partition key value, so you can only do batches within a logical partition.
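For illustration, here is a hedged sketch of registering and executing such a stored procedure with the azure-cosmos SDK. The sproc id bulkImport, the bulkImport.js file, and the pk field are assumptions for the example, not something prescribed by the linked docs:

from azure.cosmos import CosmosClient
import os

client = CosmosClient(os.environ['ACCOUNT_URI'], credential=os.environ['ACCOUNT_KEY'])
container = client.get_database_client('myDatabase').get_container_client('myContainer')

# Register the JavaScript stored procedure (written per the linked docs).
container.scripts.create_stored_procedure(body={
    'id': 'bulkImport',
    'body': open('bulkImport.js').read(),
})

# Stored procedures execute inside a single logical partition, so a partition
# key value is required; each call can only insert items sharing that key.
items = [{'id': 'item{0}'.format(i), 'pk': 'Widget'} for i in range(100)]
container.scripts.execute_stored_procedure(
    sproc='bulkImport',
    partition_key='Widget',
    params=[items],
)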

Accessing an Azure Database for MySQL Single Server from outside Azure

Moving this question from DevOps Stack Exchange where it got only 5 views in 2 days:
I would like to query an Azure Database for MySQL Single Server.
I normally interact with this database using a universal database tool (dBeaver) installed onto an Azure VM. Now I would like to interact with this database using Python from outside Azure. Ultimately I would like to write an API (FastAPI) allowing multiple users to connect to the database.
I ran a simple test from a Jupyter notebook, using SQLAlchemy as my ORM and specifying the pem certificate as a connection argument:
import pandas as pd
from sqlalchemy import create_engine
cnx = create_engine('mysql://XXX', connect_args={"ssl": {"ssl_ca": "mycertificate.pem"}})
I then tried reading data from a specific table (e.g. mytable):
df = pd.read_sql('SELECT * FROM mytable', cnx)
Alas I ran into the following error:
'Client with IP address 'XX.XX.XXX.XXX' is not allowed to connect to this MySQL server'.
According to my colleagues, a way to fix this issue would be to whitelist my IP address.
While this may be an option for a couple of users with static IP addresses, I am not sure whether it is a valid solution in the long run.
Is there a better way to access an Azure Database for MySQL Single Server from outside Azure?
As mentioned in the comments, you need to whitelist the IP address range(s) in the Azure portal for your MySQL database resource. This is a well-accepted and secure approach.
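If you'd rather script the whitelisting than click through the portal, the Azure CLI can manage the firewall rules; the resource group, server name, and IP below are placeholders:

az mysql server firewall-rule create \
    --resource-group myResourceGroup \
    --server-name mydemoserver \
    --name AllowMyIP \
    --start-ip-address 203.0.113.5 \
    --end-ip-address 203.0.113.5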

Trying to connect to Azure Cosmos client using Python gives 104 connection aborted error

Okay, so I have an Azure Cosmos subscription in which I have created a MongoDB resource. When I use the Python SDK to connect to it, I get error 104, connection reset by peer.
I am not sure what the issue is.
I am using the endpoint with ssl=true and the primary key.
code
endpoint = "http://XXX.mongo.cosmos.azure.com:10255/?ssl=true"
key = 'xxxxxxxxxxxxxxxx'
# <create_cosmos_client>
client = CosmosClient(endpoint, key)
When choosing the MongoDB API, you must use a native MongoDB SDK (in your case, pymongo); the wire protocol is MongoDB, and operations are performed via the same protocol as native MongoDB.
Your code is attempting to use the Cosmos DB SDK, which is specific to, and will only work with, the Core (SQL) API.
If you look in the portal blade for your MongoDB-API instance, you'll see examples under the Quick Start tab, each of which uses a MongoDB SDK (or the mongo shell). Same with the Connection Strings tab, which shows native MongoDB connection strings (as well as the separate parts of the connection string).
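As a minimal sketch of the pymongo route (the account name, key, and database name below are placeholders; the connection string shape is what the portal's Connection String blade shows for the MongoDB API):

from pymongo import MongoClient

connection_string = (
    "mongodb://<account-name>:<primary-key>@<account-name>.mongo.cosmos.azure.com:10255/"
    "?ssl=true&replicaSet=globaldb&retrywrites=false&maxIdleTimeMS=120000"
)
client = MongoClient(connection_string)
db = client["myDatabase"]
print(db.list_collection_names())  # simple smoke test of the connection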

How to delete pymongo.Database.Database object

I am using pymongo to connect to MongoDB in my code. I am writing a Google Analytics kind of application. My DB structure is such that I create a new DB for each new website: when someone registers a website, I create a DB with that name, and when they unregister the website I want that database deleted. I removed all the collections, but the database still could not be removed.
As a result, the list of databases is growing very large. When I do
client = MongoClient(host=MONGO_HOST,port=27017,max_pool_size=200)
client.database_names()
I see a list of more than 1000 databases, many of them just empty. Is there a way to remove these Mongo databases?
Use the drop_database method:
client = MongoClient(host=MONGO_HOST,port=27017,max_pool_size=200)
client.drop_database("database_name")
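If you want to clear out the accumulated empty databases in one pass, a sketch along these lines should work (it assumes a recent pymongo, where database_names() and collection_names() have become list_database_names() and list_collection_names()):

from pymongo import MongoClient

client = MongoClient(host=MONGO_HOST, port=27017)  # MONGO_HOST as in the question
for name in client.list_database_names():
    if name in ('admin', 'config', 'local'):      # skip MongoDB's internal databases
        continue
    if not client[name].list_collection_names():  # nothing left inside -> drop it
        client.drop_database(name)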

Python database WITHOUT using Django (for Heroku)

To my surprise, I haven't found this question asked elsewhere. Short version: I'm writing an app that I plan to deploy to the cloud (probably using Heroku) to do various web scraping and data collection. It'll live in the cloud so that it can run on its own every day and pull data into its database without my computer being on, and so the rest of the team can access the data.
I used to use AWS's SimpleDB and DynamoDB, but I found SDB's storage limits too small and DDB's poor querying ability a problem, so I'm looking for a database system (SQL or NoSQL) that can store arbitrary-length values (and ideally arbitrary data structures) and that can be queried on any field.
I've found many database solutions for Heroku, such as ClearDB, but all of the information I've seen has shown how to set up Django to access the database. Since this is intended to be script and not a site, I'd really prefer not to dive into Django if I don't have to.
Is there any kind of database that I can hook up to in Heroku with Python without using Django?
You can get a database from Heroku without requiring your app to use Django. To do so:
heroku addons:add heroku-postgresql:dev
If you need a larger, more dedicated database, you can examine the plans at Heroku Postgres.
Within your requirements.txt you'll want to add:
psycopg2
Then you can connect/interact with it similar to the following:
import os
import psycopg2
from urllib import parse as urlparse  # Python 3; the original answer used the Python 2 urlparse module

urlparse.uses_netloc.append('postgres')
url = urlparse.urlparse(os.environ['DATABASE_URL'])
conn = psycopg2.connect(
    dbname=url.path[1:],   # strip the leading "/" from the URL path
    user=url.username,
    password=url.password,
    host=url.hostname,
    port=url.port,
)
cur = conn.cursor()
query = "SELECT ...."
cur.execute(query)
I'd use MongoDB. Heroku has support for it, so I think it will be really easy to start and scale out: https://addons.heroku.com/mongohq
About Python: MongoDB is a really easy database to work with. Its schema is flexible and fits really well with Python dictionaries, which is a big plus.
You can use PyMongo:
import datetime
from pymongo import MongoClient  # Connection was removed from pymongo; MongoClient replaces it

connection = MongoClient()
# Get your DB
db = connection.my_database
# Get your collection
cars = db.cars
# Create a document
car = {"brand": "Ford",
       "model": "Mustang",
       "date": datetime.datetime.utcnow()}
# Insert it
cars.insert_one(car)  # insert() was deprecated in favour of insert_one()
Pretty simple, huh?
Hope it helps.
EDIT:
As Endophage mentioned, another good option for interfacing with Mongo is mongoengine. If you have lots of data to store, you should take a look at that.
I did this recently with Flask. (https://github.com/HexIce/flask-heroku-sqlalchemy).
There are a couple of gotchas:
1. If you don't use Django, you may have to set up your database yourself by doing:
heroku addons:add shared-database
(Or whichever database you want to use, the others cost money.)
2. The database URL is stored in Heroku in the "DATABASE_URL" environment variable.
In Python you can get it by doing:
dburl = os.environ['DATABASE_URL']
What you do to connect to the database from there is up to you, one option is SQLAlchemy.
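For example, a minimal SQLAlchemy sketch (note that Heroku's DATABASE_URL may begin with postgres://, a scheme alias SQLAlchemy 1.4+ no longer accepts, hence the replace):

import os
from sqlalchemy import create_engine, text

db_url = os.environ['DATABASE_URL'].replace('postgres://', 'postgresql://', 1)
engine = create_engine(db_url)
with engine.connect() as conn:
    for row in conn.execute(text('SELECT 1')):
        print(row)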
Create a standalone Heroku Postgres database. http://postgres.heroku.com
