I'm trying to enable data compression in MongoDB 3.0 using the wiredTiger engine. Compression works fine at the server level, where I can set a global compression algorithm for all the collections in the mongo server config file like this:
storage:
  engine: wiredTiger
  wiredTiger:
    collectionConfig:
      blockCompressor: zlib
I want to enable this compression at the collection level, which can be done with the code below in the mongodb shell:
db.createCollection( "test", {storageEngine:{wiredTiger:{configString:'block_compressor=zlib'}}} );
How can I do this using the pymongo driver?
from pymongo import MongoClient
client = MongoClient("localhost:27017")
db = client.mydb
Given it works via the Mongo shell, pass the same parameters via pymongo:
db.create_collection('test',
    storageEngine={'wiredTiger': {'configString': 'block_compressor=zlib'}})
From the official docs we see that:
create_collection(name, codec_options=None, read_preference=None,
                  write_concern=None, read_concern=None, **kwargs)
...
**kwargs (optional): additional keyword arguments will be passed as options for the create collection command
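To verify the setting took effect, here is a minimal sketch (reusing the connection and collection name from above; the collstats command reports the collection's WiredTiger creation string):

from pymongo import MongoClient

client = MongoClient("localhost:27017")
db = client.mydb

# Create the collection with zlib block compression.
db.create_collection(
    'test',
    storageEngine={'wiredTiger': {'configString': 'block_compressor=zlib'}})

# The creation string should contain 'block_compressor=zlib'.
stats = db.command('collstats', 'test')
print(stats['wiredTiger']['creationString'])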
I want to execute raw mongo commands like db.getUsers() through the pymongo SDK. For example, I have a js file which contains only db.getUsers() in it. My Python program needs to establish a connection and then execute whatever mongo command is in that js file. I tried db.command and db.runCommand, but I'm not able to achieve that. Please assist.
db.getUsers() is a shell helper that wraps the usersInfo: 1 command (see https://docs.mongodb.com/manual/reference/method/db.getUsers/).
You can run the usersInfo command using db.command() in pymongo with something like:
from pymongo import MongoClient

# Users live in the admin database.
db = MongoClient()['admin']
command_result = db.command({'usersInfo': 1})
print(command_result)
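The usersInfo command also accepts a document naming a single user, so you can fetch details for one account. A minimal sketch ('app_user' is a hypothetical username for illustration):

from pymongo import MongoClient

db = MongoClient()['admin']
# Look up a single user instead of listing all of them.
result = db.command({'usersInfo': {'user': 'app_user', 'db': 'admin'}})
print(result['users'])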
I'm using Peewee in one of my projects. Specifically, I'm using SqliteQueueDatabase and I need to create a backup (i.e. another *.db file) without stopping my application. I saw that there are two methods that could work for me (backup and backup_to_file), but they're methods of CSqliteExtDatabase, and SqliteQueueDatabase is a subclass of SqliteExtDatabase. I've found solutions for manually creating a dump of the file, but I need a *.db file (not a *.csv file, for example). I couldn't find any similar question or relevant answer.
Thanks!
You can just import the backup_to_file() helper from playhouse._sqlite_ext and pass it your connection and a filename:
from playhouse._sqlite_ext import backup_to_file

db = SqliteQueueDatabase('...')
conn = db.connection()  # get the underlying pysqlite connection
backup_to_file(conn, 'dest.db')
Also, if you're using pysqlite3, then there are also backup methods available on the connection itself.
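For example, here is a minimal sketch using the backup method that the standard library sqlite3 module also exposes on Python 3.7+ ('app.db' and 'dest.db' are illustrative filenames):

import sqlite3

src = sqlite3.connect('app.db')
dst = sqlite3.connect('dest.db')

# Copies the whole database page by page; the source can stay in use.
with dst:
    src.backup(dst)

src.close()
dst.close()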
I have a Python application which should run remotely via an AWS pipeline and use secrets to get parameters such as database credentials. When running the application locally, the parameters are loaded from a parameters.json file. My problem is how to test whether I'm running remotely (i.e. what to replace IN_CLOUD_TEST with):
import boto3
from json import load

if [IN_CLOUD_TEST]:
    params_raw = boto3.client('ssm').get_parameters_by_path(Path='/', Recursive=True)['Parameters']
    params = format_params(params_raw)
else:
    with open('parameters.json') as json_file:
        params = load(json_file)
I could of course use a try/except, but there must be something nicer.
You could check using AWS APIs, but a simpler alternative (and one that doesn't require making HTTP calls, helping you shave off some latency) is to set an environment variable on your remote server that marks it as the production server, and read it from the code.
import boto3
from json import load
from os import getenv

# IS_REMOTE is set only in the cloud environment; env vars are strings,
# so any non-empty value counts as true.
if getenv('IS_REMOTE', False):
    params_raw = boto3.client('ssm').get_parameters_by_path(Path='/', Recursive=True)['Parameters']
    params = format_params(params_raw)
else:
    with open('parameters.json') as json_file:
        params = load(json_file)
You could also invert the logic: define a variable that is true when your server is supposed to be the testing one, and set it on your local testing machine, as in the sketch below.
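A minimal sketch of that inverted approach (IS_LOCAL_TEST is a hypothetical variable you would set only on your development machine; format_params is the helper from the question):

import boto3
from json import load
from os import getenv

if getenv('IS_LOCAL_TEST'):
    # Local machine: read parameters from the JSON file.
    with open('parameters.json') as json_file:
        params = load(json_file)
else:
    # Remote/production: read parameters from SSM.
    params_raw = boto3.client('ssm').get_parameters_by_path(Path='/', Recursive=True)['Parameters']
    params = format_params(params_raw)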
I am facing a problem to connect to an Azure MS SQL Server 2014 database in Apache Airflow 1.10.1 using pymssql.
I want to use the MsSqlHook class provided by Airflow, for the convenience of creating my connection in the Airflow UI, and then create a context manager for my connection using SQLAlchemy:
from contextlib import contextmanager

from airflow.hooks.mssql_hook import MsSqlHook
from sqlalchemy.orm import sessionmaker

@contextmanager
def mssql_session(dt_conn_id):
    sqla_engine = MsSqlHook(mssql_conn_id=dt_conn_id).get_sqlalchemy_engine()
    session = sessionmaker(bind=sqla_engine)()
    try:
        yield session
    except:
        session.rollback()
        raise
    else:
        session.commit()
    finally:
        session.close()
But when I do that, I get this error when I run a query:
sqlalchemy.exc.InterfaceError: (pyodbc.InterfaceError) ('IM002',
'[IM002] [unixODBC][Driver Manager]Data source name not found, and no
default driver specified (0) (SQLDriverConnect)') (Background on this
error at: http://sqlalche.me/e/rvf5)
It seems to come from pyodbc, whereas I want to use pymssql (and in MsSqlHook, the method get_conn uses pymssql!).
I searched the Airflow source code for the cause.
I noticed that the method get_uri from the class DbApiHook (from which MsSqlHook inherits) builds the connection string passed to SQLAlchemy like this:
'{conn.conn_type}://{login}{host}/{conn.schema}'
But conn.conn_type is simply equal to 'mssql' whereas we need to specify the DBAPI as described here:
https://docs.sqlalchemy.org/en/latest/core/engines.html#microsoft-sql-server
(for example: 'mssql+pymssql://scott:tiger@hostname:port/dbname')
So, by default, I think it uses pyodbc.
But how can I set properly the conn_type of the connection to 'mssql+pymssql' instead of 'mssql' ?
In the Airflow UI, you can simply select SQL Server in a dropdown list, but you can't set the conn_type the way you want.
To work around the issue, I overload the get_uri method from DbApiHook in a new class I created inherited from MsSqlHook, and in which I build my own connection string, but it's not clean at all...
Thanks for any help
You're right. There's no easy, straightforward way to get Airflow to do what you want. Personally I would build the sqlalchemy engine inside of your context manager, something like create_engine(hook.get_uri().replace("://", "+pymssql://")) -- then I would toss the code somewhere reusable.
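A minimal sketch of that workaround (mssql_session and dt_conn_id follow the question's naming; the replace() call rewrites the URI from 'mssql://...' to 'mssql+pymssql://...'):

from contextlib import contextmanager

from airflow.hooks.mssql_hook import MsSqlHook
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

@contextmanager
def mssql_session(dt_conn_id):
    hook = MsSqlHook(mssql_conn_id=dt_conn_id)
    # Force the pymssql DBAPI into the SQLAlchemy URL.
    engine = create_engine(hook.get_uri().replace('://', '+pymssql://'))
    session = sessionmaker(bind=engine)()
    try:
        yield session
    except Exception:
        session.rollback()
        raise
    else:
        session.commit()
    finally:
        session.close()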
You can create a connection by passing it as an environment variable to Airflow. See the docs. The value of the variable is the database URL in the format SqlAlchemy accepts.
The name of the env var follows the pattern AIRFLOW_CONN_ plus the connection ID. For example, with AIRFLOW_CONN_MY_MSSQL the conn_id would be 'my_mssql'.
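For illustration (hostname, port, credentials, and database name are placeholders; in practice you would set this in the environment that launches the Airflow processes, not inside a task):

import os

# The suffix MY_MSSQL maps to conn_id 'my_mssql'.
os.environ['AIRFLOW_CONN_MY_MSSQL'] = (
    'mssql+pymssql://username:password@hostname:1433/dbname'
)

# Airflow can then resolve MsSqlHook(mssql_conn_id='my_mssql') from this URL.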
I've used boto to interact with S3 with no problems, but now I'm attempting to connect to the AWS Support API to pull back info on open tickets, trusted advisor results, etc. It seems that the boto library has different connect methods for each AWS service? For example, with S3 it is:
conn = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
According to the boto docs, the following should work to connect to AWS Support API:
>>> from boto.support.connection import SupportConnection
>>> conn = SupportConnection('<aws access key>', '<aws secret key>')
However, there are a few problems I see after digging through the source code. First, boto.support.connection doesn't actually exist. boto.connection does, but it doesn't contain a class SupportConnection. boto.support.layer1 exists, and DOES have the class SupportConnection, but it doesn't accept key arguments as the docs suggest. Instead it takes one argument, an AWSQueryConnection object. That class is defined in boto.connection. AWSQueryConnection takes one argument, an AWSAuthConnection object, a class also defined in boto.connection. Lastly, AWSAuthConnection takes a generic object, with requirements defined in __init__ as:
class AWSAuthConnection(object):
    def __init__(self, host, aws_access_key_id=None,
                 aws_secret_access_key=None,
                 is_secure=True, port=None, proxy=None, proxy_port=None,
                 proxy_user=None, proxy_pass=None, debug=0,
                 https_connection_factory=None, path='/',
                 provider='aws', security_token=None,
                 suppress_consec_slashes=True,
                 validate_certs=True, profile_name=None):
So, for kicks, I tried creating an AWSAuthConnection by passing keys, followed by AWSQueryConnection(awsauth), followed by SupportConnection(awsquery), with no luck. This was inside a script.
Last item of interest: with my keys defined in a .boto file in my home directory, and running the Python interpreter from the command line, I can make a direct import, call SupportConnection() with no arguments, and it works. It is clearly picking up my keys from the .boto file and consuming them, but I haven't analyzed every line of source code to understand how, and frankly, I'm hoping to avoid doing that.
Long story short, I'm hoping someone has some familiarity with boto and connecting to AWS API's other than S3 (the bulk of material that exists via google) to help me troubleshoot further.
This should work:
import boto.support
conn = boto.support.connect_to_region('us-east-1')
This assumes you have credentials in your boto config file or in an IAM Role. If you want to pass explicit credentials, do this:
import boto.support
conn = boto.support.connect_to_region(
    'us-east-1',
    aws_access_key_id="<access key>",
    aws_secret_access_key="<secret key>")
This basic incantation should work for all services in all regions. Just import the correct module (e.g. boto.support or boto.ec2 or boto.s3 or whatever) and then call its connect_to_region method, supplying the name of the region you want as a parameter.
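For instance, the same pattern with two other services (the region names here are arbitrary):

import boto.ec2
import boto.s3

# Same incantation, different service modules.
ec2_conn = boto.ec2.connect_to_region('us-west-2')
s3_conn = boto.s3.connect_to_region('us-east-1')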