Python loop inserting last row only in cassandra

I wrote a small demo loop to insert random values into Cassandra, but only the last record is persisted in the database. I am using cassandra-driver from DataStax and its object modeling lib. Cassandra version is 3.7 and Python 3.4. Any idea what I am doing wrong?
#!/usr/bin/env python
import datetime
import uuid
from random import randint, uniform
from cassandra.cluster import Cluster
from cassandra.cqlengine import connection, columns
from cassandra.cqlengine.management import sync_table
from cassandra.cqlengine.models import Model
from cassandra.cqlengine.query import BatchQuery
class TestTable(Model):
    _table_name = 'test_table'
    key = columns.UUID(primary_key=True, default=uuid.uuid4())
    type = columns.Integer(index=True)
    value = columns.Float(required=False)
    created_time = columns.DateTime(default=datetime.datetime.now())
def main():
    connection.setup(['127.0.0.1'], 'test', protocol_version=3)
    sync_table(TestTable)
    for _ in range(10):
        type = randint(1, 3)
        value = uniform(-10, 10)
        row = TestTable.create(type=type, value=value)
        print("Inserted row: ", row.type, row.value)
    print("Done inserting")
    q = TestTable.objects.count()
    print("We have inserted " + str(q) + " rows.")
if __name__ == "__main__":
    main()
Many thanks!

The problem is in the definition of the key column:
key = columns.UUID(primary_key=True, default=uuid.uuid4())
For the default value it's going to call the uuid.uuid4 function once and use that result as the default for all future inserts. Because that's your primary key, all 10 writes will happen to the same primary key.
Instead, drop the parentheses so you are just passing a reference to uuid.uuid4 rather than calling it:
key = columns.UUID(primary_key=True, default=uuid.uuid4)
Now each time you create a row you'll get a new unique UUID value, and therefore a new row in Cassandra.
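A stdlib-only sketch of the difference, with no Cassandra involved. `build_rows` is a hypothetical stand-in that applies a column default the way the answer describes (call it if it is callable, otherwise reuse the value); it is not cqlengine's actual code.

```python
import uuid


def build_rows(default, n):
    # Hypothetical stand-in for how a column default is applied:
    # callables are invoked once per row, plain values are reused as-is.
    return [default() if callable(default) else default for _ in range(n)]


evaluated_once = build_rows(uuid.uuid4(), 10)  # uuid4() already ran once
called_per_row = build_rows(uuid.uuid4, 10)    # function reference, called per row

print(len(set(evaluated_once)))  # 1: every insert targets the same primary key
print(len(set(called_per_row)))  # 10: ten distinct primary keys
```

The same applies to `created_time` in the question: `default=datetime.datetime.now()` stamps every row with the time the class was defined, so `default=datetime.datetime.now` is the analogous fix.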

You need to use the save method.
...
row = TestTable(type=type, value=value)
row.save()
...
http://cqlengine.readthedocs.io/en/latest/topics/models.html#cqlengine.models.Model.save

Related

Reset index name in elasticsearch dsl

I'm trying to create an ETL that extracts from mongo, processes the data and loads it into elastic. I will do a daily load, so I thought of naming my index with the current date. This will help me with some later processing I need to do on this first index.
I used elasticsearch dsl guide: https://elasticsearch-dsl.readthedocs.io/en/latest/persistence.html
My problem comes from my limited experience working with classes: I don't know how to reset the Index name from the class.
Here is my code for the class (custom_indices.py):
from elasticsearch_dsl import Document, Date, Integer, Keyword, Text
from elasticsearch_dsl.connections import connections
from elasticsearch_dsl import Search
import datetime
class News(Document):
    title = Text(analyzer='standard', fields={'raw': Keyword()})
    manual_tagging = Keyword()
    class Index:
        name = 'processed_news_' + datetime.datetime.now().strftime("%Y%m%d")
    def save(self, **kwargs):
        return super(News, self).save(**kwargs)
    def is_published(self):
        return datetime.datetime.now() >= self.processed
And this is the part of the code where I create instances of that class:
from custom_indices import News
import elasticsearch
import elasticsearch_dsl
from elasticsearch_dsl.connections import connections
import pandas as pd
import datetime
connections.create_connection(hosts=['localhost'])
News.init()
for index, doc in df.iterrows():
    new_insert = News(meta={'id': doc.url_hashed},
                      title=doc.title,
                      manual_tagging=doc.customTags,
                      )
    new_insert.save()
Every time I call the "News" class I would expect to have a new name. However, the name doesn't change even if I load the class again (from custom_indices import News). I know this is only a problem I have when testing but I'd like to know how to force that "reset". Actually, I originally wanted to change the name outside the class with this line right before the loop:
News.Index.name = "NEW_NAME"
However, that didn't work. I was still seeing the name defined on the class.
Could anyone please assist?
Many thanks!
PS: this must be just an object oriented programming issue. Apologies for my ignorance on the subject.

Maybe you could take advantage of the fact that Document.init() accepts an index keyword argument. If you want the index name to get set automatically, you could implement init() in the News class and call super().init(...) in your implementation.
A simplified example (python 3.x):
from elasticsearch_dsl import Document
from elasticsearch_dsl.connections import connections
import datetime
class News(Document):
    @classmethod
    def init(cls, index=None, using=None):
        index_name = index or 'processed_news_' + datetime.datetime.now().strftime("%Y%m%d")
        return super().init(index=index_name, using=using)

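You can override the index when you call save(). Pass it as the index keyword argument, because the first positional parameter of save() is using (the connection alias):
new_insert.save(index='processed_news_' + datetime.datetime.now().strftime("%Y%m%d"))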
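The name never changes in the question because the body of `class News` (including the inner `class Index`) runs only once, when `custom_indices` is first imported; a later `from custom_indices import News` reuses the cached module. A minimal sketch of computing the name at call time instead; `daily_index_name` is a hypothetical helper, not part of elasticsearch_dsl.

```python
import datetime


def daily_index_name(day=None, prefix="processed_news_"):
    """Build the index name when called, not when the class is defined."""
    day = day or datetime.date.today()
    return prefix + day.strftime("%Y%m%d")


print(daily_index_name(datetime.date(2019, 5, 1)))  # processed_news_20190501
```

With it, `News.init(index=daily_index_name())` creates the dated index with the News mappings, and `new_insert.save(index=daily_index_name())` writes to it; both `init` and `save` accept an `index` keyword argument.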

Here is an example:
# coding: utf-8
import datetime
from elasticsearch_dsl import Keyword, Text, \
    Index, Document, Date
from elasticsearch_dsl.connections import connections
HOST = "localhost:9200"
index_names = [
    "foo-log-",
    "bar-log-",
]
default_settings = {"number_of_shards": 4, "number_of_replicas": 1}
index_settings = {
    "foo-log-": {
        "number_of_shards": 40,
        "number_of_replicas": 1
    }
}
class LogDoc(Document):
    level = Keyword(ignore_above=256)
    date = Date(format="yyyy-MM-dd'T'HH:mm:ss.SSS")
    hostname = Text(fields={'fields': Keyword(ignore_above=256)})
    message = Text()
    createTime = Date(format="yyyy-MM-dd'T'HH:mm:ss.SSS")
def auto_create_index():
    '''Automatically create the ES indices'''
    connections.create_connection(hosts=[HOST])
    for day in range(3):
        dt = datetime.datetime.now() + datetime.timedelta(days=day)
        for index in index_names:
            name = index + dt.strftime("%Y-%m-%d")
            settings = index_settings.get(index, default_settings)
            idx = Index(name=name)
            idx.document(LogDoc)
            idx.settings(**settings)
            try:
                idx.create()
            except Exception as e:
                print(e)
                continue
            print("create index %s" % name)
if __name__ == '__main__':
    auto_create_index()

SQLAlchemy 'on_conflict_do_update' does not update

I have the following code, with which I would like to do an upsert:
def add_electricity_reading(
    *, period_usage, period_started_at, is_estimated, customer_pk
):
    from sqlalchemy.dialects.postgresql import insert
    values = dict(
        customer_pk=customer_pk,
        period_usage=period_usage,
        period_started_at=period_started_at,
        is_estimated=is_estimated,
    )
    insert_stmt = insert(ElectricityMeterReading).values(**values)
    do_update_stmt = insert_stmt.on_conflict_do_update(
        constraint=ElectricityMeterReading.__table_args__[0].name,
        set_=dict(
            period_usage=period_usage,
            period_started_at=period_started_at,
            is_estimated=is_estimated,
        )
    )
    conn = DBSession.connection()
    conn.execute(do_update_stmt)
    return DBSession.query(ElectricityMeterReading).filter_by(**dict(
        period_usage=period_usage,
        period_started_at=period_started_at,
        customer_pk=customer_pk,
        is_estimated=is_estimated,
    )).one()
def test_updates_existing_record_for_started_at_if_already_exists():
    started_at = datetime.now(timezone.utc)
    existing = add_electricity_reading(
        period_usage=0.102,
        customer_pk=customer.pk,
        period_started_at=started_at,
        is_estimated=True,
    )
    started_at = existing.period_started_at
    reading = add_electricity_reading(
        period_usage=0.200,
        customer_pk=customer.pk,
        period_started_at=started_at,
        is_estimated=True,
    )
    # existing record was updated
    assert reading.period_usage == 0.200
    assert reading.id == existing.id
In my test, I add a record with period_usage=0.102 and then execute the query again with period_usage=0.2. When the final query at the bottom returns the record, period_usage is still 0.102.
Any idea why this could be happening?

This behaviour is explained in "Session Basics" under "What does the Session do?" The session holds references to objects it has loaded in a structure called the identity map, and so ensures that only one unique object per primary key value exists at a time during a session's lifetime. You can verify this with the following assertion in your own code:
assert existing is reading
The Core insert (or update) statements you are executing do not keep the session in sync with the changes taking place in the database, the way, for example, Query.update() does. To fetch the new values, you can expire the ORM-loaded state of the unique object:
DBSession.expire(existing) # or reading, does not matter
# existing record was updated
assert reading.period_usage == 0.200
assert reading.id == existing.id
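A self-contained reproduction of the effect, sketched under a few assumptions: SQLite stands in for PostgreSQL (SQLAlchemy 1.4+ also offers `on_conflict_do_update` for the sqlite dialect, which needs SQLite 3.24+), and the `Reading` model and `upsert` helper are simplified, hypothetical versions of the question's code. As in the question, the upsert runs through `session.connection()`, so the identity map is not told about the change.

```python
from sqlalchemy import Column, Float, Integer, create_engine
from sqlalchemy.dialects.sqlite import insert
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Reading(Base):  # hypothetical, simplified model
    __tablename__ = "reading"
    id = Column(Integer, primary_key=True)
    customer_pk = Column(Integer, nullable=False, unique=True)  # conflict target
    period_usage = Column(Float)


def upsert(session, customer_pk, usage):
    stmt = insert(Reading.__table__).values(customer_pk=customer_pk, period_usage=usage)
    stmt = stmt.on_conflict_do_update(
        index_elements=["customer_pk"], set_={"period_usage": usage}
    )
    session.connection().execute(stmt)  # Core execution, bypasses the ORM
    return session.query(Reading).filter_by(customer_pk=customer_pk).one()


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    first = upsert(session, 1, 0.102)
    second = upsert(session, 1, 0.200)  # the database row now holds 0.2
    stale = second.period_usage         # same identity-mapped object, old value
    session.expire(second)              # discard the loaded attribute state
    fresh = second.period_usage         # reloaded from the database
    print(first is second, stale, fresh)  # True 0.102 0.2
```

Expiring is the fix suggested above; `Query.populate_existing()` is another way to make a query overwrite already-loaded attributes.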

python cassandra: get a big SELECT * result as a generator (without storing the result in RAM)

I want to get all the data in the Cassandra table "user".
I have 840000 users and I don't want to get all of them in a Python list.
I want to get the users in packs of 100.
In the Cassandra docs (https://datastax.github.io/python-driver/query_paging.html)
I see I can use fetch_size, but in my Python code I have a database object that contains all the CQL instructions:
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement
class Database:
    def __init__(self, name, salary):
        self.cluster = Cluster(['192.168.1.1', '192.168.1.2'])
        self.session = self.cluster.connect()
    def get_users(self):
        users_list = []
        query = "SELECT * FROM users"
        statement = SimpleStatement(query, fetch_size=10)
        for user_row in self.session.execute(statement):
            users_list.append(user_row.name)
        return users_list
Currently get_users returns a very big list of user names,
but I want to turn get_users into a "generator".
I don't want to get all user names in one list from a single call to get_users; instead, I want to call get_users many times, with each call returning a list of at most 100 users.
For example:
list1 = database.get_users()
list2 = database.get_users()
...
listn = database.get_users()
list1 contains the first 100 users in the query,
list2 contains the "second" 100 users in the query,
listn contains the last elements in the query (<= 100).
Is this possible?
Thanks in advance for your answer.

According to Paging Large Queries:
Whenever there are no more rows in the current page, the next page
will be fetched transparently.
So if you execute your code like this, you will still get the whole result set; it is just paged transparently.
To achieve what you need, use callbacks. You can also find a code sample at the link above.
I added the full code below for reference.
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement
from threading import Event
class PagedResultHandler(object):
    def __init__(self, future):
        self.error = None
        self.finished_event = Event()
        self.future = future
        self.future.add_callbacks(
            callback=self.handle_page,
            errback=self.handle_error)
    def handle_page(self, rows):
        for row in rows:
            process_row(row)
        if self.future.has_more_pages:
            self.future.start_fetching_next_page()
        else:
            self.finished_event.set()
    def handle_error(self, exc):
        self.error = exc
        self.finished_event.set()
def process_row(user_row):
    print(user_row.name, user_row.age, user_row.email)
cluster = Cluster()
session = cluster.connect()
query = "SELECT * FROM myschema.users"
statement = SimpleStatement(query, fetch_size=5)
future = session.execute_async(statement)
handler = PagedResultHandler(future)
handler.finished_event.wait()
if handler.error:
    raise handler.error
cluster.shutdown()
Moving to the next page is done in handle_page when start_fetching_next_page is called.
If you replace the if statement with self.finished_event.set(), you will see that the iteration stops after the first 5 rows, as defined by fetch_size.
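If the goal is just to hand out lists of at most 100 names without building the full list, a synchronous alternative (a different technique from the callback approach above) is to wrap the transparently paged iterator in a generator that slices it into batches. The sketch below is stdlib-only and `batches` is a hypothetical helper name. With the driver, `rows` would be something like `session.execute(SimpleStatement("SELECT name FROM users", fetch_size=100))`; per the quoted documentation, the next page is only fetched when the current one runs out, so roughly one page plus one batch is in memory at a time.

```python
from itertools import islice


def batches(rows, size=100):
    """Yield lists of at most `size` items from any iterable, lazily."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))  # pull at most `size` items
        if not batch:
            return  # source exhausted
        yield batch


gen = batches(range(250), 100)
print(len(next(gen)), len(next(gen)), len(next(gen)))  # 100 100 50
```

Calling `next(gen)` repeatedly matches the `list1`, `list2`, ..., `listn` pattern from the question; `next` raises `StopIteration` once the rows are exhausted.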

How to JSON dump to a rotating file object

I'm writing a program which periodically dumps old data from a RethinkDB database into a file and removes it from the database. Currently, the data is dumped into a single file which grows without limit. I'd like to change this so that the maximum file size is, say, 250 MB, and the program starts writing to a new output file just before this size is exceeded.
It seems like Python's RotatingFileHandler class for loggers does approximately what I want; however, I'm not sure whether logging can be applied to any JSON-dumpable object or just to strings.
Another possible approach would be to use (a variant of) Mike Pennington's RotatingFile class (see python: outfile to another text file if exceed certain file size).
Which of these approaches is likely to be the most fruitful?
For reference, my current program is as follows:
import os
import sys
import json
import rethinkdb as r
import pytz
from datetime import datetime, timedelta
import schedule
import time
import functools
from iclib import RethinkDB
import msgpack
''' The purpose of the Controller is to periodically archive data from the "sensor_data" table so that it does not grow without limit.'''
class Controller(RethinkDB):
    def __init__(self, db_address=(os.environ['DB_ADDR'], int(os.environ['DB_PORT'])), db_name=os.environ['DB_NAME']):
        super(Controller, self).__init__(db_address=db_address, db_name=db_name)  # Initialize the IperCronComponent with the default logger name (in this case, "Controller")
        self.db_table = RethinkDB.SENSOR_DATA_TABLE  # The table name is "sensor_data" and is stored as a class variable in RethinkDBMixIn
    def generate_archiving_query(self, retention_period=timedelta(days=3)):
        expiry_time = r.now() - retention_period.total_seconds()  # Timestamp before which data is to be archived
        if "timestamp" in r.table(self.db_table).index_list().run(self.db):  # If "timestamp" is a secondary index
            beginning_of_time = r.time(1400, 1, 1, 'Z')  # The minimum time of a ReQL time object (i.e., the year 1400 in the UTC timezone)
            data_to_archive = r.table(self.db_table).between(beginning_of_time, expiry_time, index="timestamp")  # Generate query using "between" (faster)
        else:
            data_to_archive = r.table(self.db_table).filter(r.row['timestamp'] < expiry_time)  # Generate the same query using "filter" (slower, but does not require "timestamp" to be a secondary index)
        return data_to_archive
    def archiving_job(self, data_to_archive=None, output_file="archived_sensor_data.json"):
        if data_to_archive is None:
            data_to_archive = self.generate_archiving_query()  # By default, call the "generate_archiving_query" function to generate the query
        old_data = data_to_archive.run(self.db, time_format="raw")  # Without time_format="raw" the output does not dump to JSON
        with open(output_file, 'a') as f:
            ids_to_delete = []
            for item in old_data:
                print(item)
                # msgpack.dump(item, f)
                json.dump(item, f)
                f.write('\n')  # Separate each document by a new line
                ids_to_delete.append(item['id'])
        r.table(self.db_table).get_all(r.args(ids_to_delete)).delete().run(self.db)  # Delete based on ID. It is preferred to delete the entire batch in a single operation rather than to delete them one by one in the for loop.
def test_job_1():
    db_name = "ipercron"
    table_name = "sensor_data"
    port_offset = 1  # To avoid interference of this testing program with the main program, all ports are initialized at an offset of 1 from the default ports using "rethinkdb --port_offset 1" at the command line.
    conn = r.connect("localhost", 28015 + port_offset)
    r.db(db_name).table(table_name).delete().run(conn)
    import rethinkdb_add_data
    controller = Controller(db_address=("localhost", 28015 + port_offset))
    archiving_job = functools.partial(controller.archiving_job, data_to_archive=controller.generate_archiving_query())
    return archiving_job
if __name__ == "__main__":
    archiving_job = test_job_1()
    schedule.every(0.1).minutes.do(archiving_job)
    while True:
        schedule.run_pending()
        time.sleep(1)  # avoid a busy loop between scheduled runs
It is not completely 'runnable' from the part shown, but the key point is that I would like to replace the line
json.dump(item, f)
with a similar line in which f is a rotating, and not fixed, file object.

Following Stanislav Ivanov, I used json.dumps to convert each RethinkDB document to a string and wrote this to a RotatingFileHandler:
import os
import sys
import json
import rethinkdb as r
import pytz
from datetime import datetime, timedelta
import schedule
import time
import functools
from iclib import RethinkDB
import msgpack
import logging
from logging.handlers import RotatingFileHandler
from random_data_generator import RandomDataGenerator
''' The purpose of the Controller is to periodically archive data from the "sensor_data" table so that it does not grow without limit.'''
os.environ['DB_ADDR'] = 'localhost'
os.environ['DB_PORT'] = '28015'
os.environ['DB_NAME'] = 'ipercron'
class Controller(RethinkDB):
    def __init__(self, db_address=None, db_name=None):
        if db_address is None:
            db_address = (os.environ['DB_ADDR'], int(os.environ['DB_PORT']))  # The default host ("rethinkdb") and port (28015) are stored as environment variables
        if db_name is None:
            db_name = os.environ['DB_NAME']  # The default database is "ipercron" and is stored as an environment variable
        super(Controller, self).__init__(db_address=db_address, db_name=db_name)  # Initialize the instance of the RethinkDB class. IperCronComponent will be initialized with its default logger name (in this case, "Controller")
        self.db_name = db_name
        self.db_table = RethinkDB.SENSOR_DATA_TABLE  # The table name is "sensor_data" and is stored as a class variable of RethinkDBMixIn
        self.table = r.db(self.db_name).table(self.db_table)
        self.archiving_logger = logging.getLogger("archiving_logger")
        self.archiving_logger.setLevel(logging.DEBUG)
        self.archiving_handler = RotatingFileHandler("archived_sensor_data.log", maxBytes=2000, backupCount=10)
        self.archiving_logger.addHandler(self.archiving_handler)
    def generate_archiving_query(self, retention_period=timedelta(days=3)):
        expiry_time = r.now() - retention_period.total_seconds()  # Timestamp before which data is to be archived
        if "timestamp" in self.table.index_list().run(self.db):
            beginning_of_time = r.time(1400, 1, 1, 'Z')  # The minimum time of a ReQL time object (namely, the year 1400 in UTC)
            data_to_archive = self.table.between(beginning_of_time, expiry_time, index="timestamp")  # Generate query using "between" (faster, requires "timestamp" to be a secondary index)
        else:
            data_to_archive = self.table.filter(r.row['timestamp'] < expiry_time)  # Generate query using "filter" (slower, but does not require "timestamp" to be a secondary index)
        return data_to_archive
    def archiving_job(self, data_to_archive=None):
        if data_to_archive is None:
            data_to_archive = self.generate_archiving_query()  # By default, call the "generate_archiving_query" function to generate the query
        old_data = data_to_archive.run(self.db, time_format="raw")  # Without time_format="raw" the output does not dump to JSON or msgpack
        ids_to_delete = []
        for item in old_data:
            print(item)
            self.dump(item)
            ids_to_delete.append(item['id'])
        self.table.get_all(r.args(ids_to_delete)).delete().run(self.db)  # Delete based on ID. It is preferred to delete the entire batch in a single operation rather than to delete them one by one in the for-loop.
    def dump(self, item, mode='json'):
        if mode == 'json':
            dump_string = json.dumps(item)
        elif mode == 'msgpack':
            dump_string = msgpack.packb(item)
        else:
            raise ValueError("Unsupported dump mode: %r" % mode)
        self.archiving_logger.debug(dump_string)
def populate_database(db_name, table_name, conn):
    if db_name not in r.db_list().run(conn):
        r.db_create(db_name).run(conn)  # Create the database if it does not yet exist
    if table_name not in r.db(db_name).table_list().run(conn):
        r.db(db_name).table_create(table_name).run(conn)  # Create the table if it does not yet exist
    r.db(db_name).table(table_name).delete().run(conn)  # Empty the table to start with a clean slate
    # Generate random data with timestamps uniformly distributed over the past 6 days
    random_data_time_interval = timedelta(days=6)
    start_random_data = datetime.utcnow().replace(tzinfo=pytz.utc) - random_data_time_interval
    random_generator = RandomDataGenerator(seed=0)
    packets = random_generator.packets(N=100, start=start_random_data)
    # print(packets)
    print("Adding data to the database...")
    r.db(db_name).table(table_name).insert(packets).run(conn)
if __name__ == "__main__":
    db_name = "ipercron"
    table_name = "sensor_data"
    port_offset = 1  # To avoid interference of this testing program with the main program, all ports are initialized at an offset of 1 from the default ports using "rethinkdb --port_offset 1" at the command line.
    host = "localhost"
    port = 28015 + port_offset
    conn = r.connect(host, port)  # RethinkDB connection object
    populate_database(db_name, table_name, conn)
    # import rethinkdb_add_data
    controller = Controller(db_address=(host, port))
    archiving_job = functools.partial(controller.archiving_job, data_to_archive=controller.generate_archiving_query())  # This ensures that the query is only generated once. (This is sufficient since r.now() is re-evaluated every time a connection is made).
    schedule.every(0.1).minutes.do(archiving_job)
    while True:
        schedule.run_pending()
        time.sleep(1)  # avoid a busy loop between scheduled runs
In this context the RethinkDB class does little other than define the class variable SENSOR_DATA_TABLE and the RethinkDB connection, self.db = r.connect(self.address[0], self.address[1]). This is run together with a module for generating fake data, random_data_generator.py:
import random
import faker
from datetime import datetime, timedelta
import pytz
import rethinkdb as r
class RandomDataGenerator(object):
    def __init__(self, seed=None):
        self._seed = seed
        self._random = random.Random()
        self._random.seed(seed)
        self.fake = faker.Faker()
        self.fake.random.seed(seed)
    def __getattr__(self, x):
        return getattr(self._random, x)
    def name(self):
        return self.fake.name()
    def datetime(self, start=None, end=None):
        if start is None:
            start = datetime(2000, 1, 1, tzinfo=pytz.utc)  # Jan 1st 2000
        if end is None:
            end = datetime.utcnow().replace(tzinfo=pytz.utc)
        if isinstance(end, datetime):
            dt = end - start
        elif isinstance(end, timedelta):
            dt = end
        assert isinstance(dt, timedelta)
        random_dt = timedelta(microseconds=self._random.randrange(int(dt.total_seconds() * (10 ** 6))))
        return start + random_dt
    def packets(self, N=1, start=None, end=None):
        return [{'name': self.name(), 'timestamp': self.datetime(start=start, end=end)} for _ in range(N)]
When I run the controller, it produces several rolled-over output logs, each at most 2 kB in size, as expected.
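A minimal stdlib-only sketch of the same idea, one JSON document per line written through a `RotatingFileHandler`. The helper names, the temporary directory and the 200-byte limit are illustrative assumptions. The formatter is set to `%(message)s` so each line holds only the JSON text, and `propagate` is turned off so archived records do not also reach the root logger's handlers.

```python
import json
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def make_archiver(path, max_bytes, backup_count):
    """Return a logger that appends one JSON document per line to a rotating file."""
    logger = logging.getLogger("json_archiver." + path)  # one logger per file
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backup_count)
    handler.setFormatter(logging.Formatter("%(message)s"))  # raw JSON, no timestamp
    logger.addHandler(handler)
    return logger, handler


def archive(logger, documents):
    for doc in documents:
        # json.dumps turns each dict into a single-line string the handler can write
        logger.info(json.dumps(doc, sort_keys=True))


tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "archived_sensor_data.jsonl")
logger, handler = make_archiver(path, max_bytes=200, backup_count=3)
archive(logger, ({"id": i, "name": "sensor-%d" % i} for i in range(50)))
handler.close()
print(sorted(os.listdir(tmpdir)))
```

The handler rolls over before writing a record that would push the file to `maxBytes`, and it deletes files beyond `backupCount`; for an archive, choose `backupCount` large enough that nothing is discarded.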

cx_Oracle: How can I receive each row as a dictionary?

By default, cx_Oracle returns each row as a tuple.
>>> import cx_Oracle
>>> conn=cx_Oracle.connect('scott/tiger')
>>> curs=conn.cursor()
>>> curs.execute("select * from foo");
>>> curs.fetchone()
(33, 'blue')
How can I return each row as a dictionary?

You can override the cursor's rowfactory method. You will need to do this each time you perform the query.
Here are the results of the standard query, a tuple:
curs.execute('select * from foo')
curs.fetchone()
(33, 'blue')
Returning a named tuple:
def makeNamedTupleFactory(cursor):
    columnNames = [d[0].lower() for d in cursor.description]
    import collections
    Row = collections.namedtuple('Row', columnNames)
    return Row
curs.rowfactory = makeNamedTupleFactory(curs)
curs.fetchone()
Row(x=33, y='blue')
Returning a dictionary:
def makeDictFactory(cursor):
    columnNames = [d[0] for d in cursor.description]
    def createRow(*args):
        return dict(zip(columnNames, args))
    return createRow
curs.rowfactory = makeDictFactory(curs)
curs.fetchone()
{'Y': 'brown', 'X': 1}
Credit to Amaury Forgeot d'Arc:
http://sourceforge.net/p/cx-oracle/mailman/message/27145597

A very short version:
curs.rowfactory = lambda *args: dict(zip([d[0] for d in curs.description], args))
Tested on Python 3.7.0 & cx_Oracle 7.1.2
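For comparison, the same idea can be tried without an Oracle server using the standard library's sqlite3 module (a different driver, swapped in only for illustration). There the hook is called `row_factory`, it is set on the connection rather than the cursor, and it receives `(cursor, row)` instead of the row values as separate arguments. The table mirrors the `foo` example from the question.

```python
import sqlite3


def dict_factory(cursor, row):
    # cursor.description has one tuple per column; item 0 is the column name
    return {column[0]: value for column, value in zip(cursor.description, row)}


conn = sqlite3.connect(":memory:")
conn.row_factory = dict_factory  # used by cursors created from this connection
conn.execute("CREATE TABLE foo (x INTEGER, y TEXT)")
conn.execute("INSERT INTO foo VALUES (33, 'blue')")
row = conn.execute("SELECT * FROM foo").fetchone()
print(row)  # {'x': 33, 'y': 'blue'}
```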

Old question, but here are some helpful links along with a Python recipe.
According to cx_Oracle documentation:
Cursor.rowfactory
This read-write attribute specifies a method to call for each row that
is retrieved from the database. Ordinarily a tuple is returned for
each row but if this attribute is set, the method is called with the
tuple that would normally be returned, and the result of the method is
returned instead.
The cx_Oracle - Python Interface for Oracle Database documentation also points to a GitHub repository with many helpful samples; in particular, see GenericRowFactory.py.
This presentation may also be helpful: CON6543 Python and Oracle Database (PDF, RainFocus).
Recipe
Django's Oracle database backend uses cx_Oracle under the hood. Earlier versions (Django 1.11 and below) define a _rowfactory(cursor, row) function that also casts cx_Oracle's numeric data types into the corresponding Python types and strings into unicode.
If you have Django installed, check base.py as follows:
$ DJANGO_DIR="$(python -c 'import django, os; print(os.path.dirname(django.__file__))')"
$ vim $DJANGO_DIR/db/backends/oracle/base.py
You can borrow _rowfactory() from $DJANGO_DIR/db/backends/oracle/base.py and apply the naming decorator below to make it return a namedtuple instead of a plain tuple.
mybase.py
import functools
from operator import itemgetter
from collections import namedtuple
import cx_Oracle as Database
import decimal
def naming(rename=False, case=None):
    def decorator(rowfactory):
        @functools.wraps(rowfactory)
        def decorated_rowfactory(cursor, row, typename="GenericRow"):
            field_names = map(itemgetter(0), cursor.description)
            if case is not None:
                field_names = map(case, field_names)  # e.g. str.lower
            return namedtuple(typename, field_names, rename=rename)._make(rowfactory(cursor, row))
        return decorated_rowfactory
    return decorator
Use it as:
@naming(rename=False, case=str.lower)
def rowfactory(cursor, row):
    casted = []
    ....
    ....
    return tuple(casted)
oracle.py
import cx_Oracle as Database
from cx_Oracle import *
import mybase
class Cursor(Database.Cursor):
    def execute(self, statement, args=None):
        prepareNested = (statement is not None and self.statement != statement)
        result = super().execute(statement, args or [])
        if prepareNested:
            if self.description:
                self.rowfactory = lambda *row: mybase.rowfactory(self, row)
        return result
    def close(self):
        try:
            super().close()
        except Database.InterfaceError:
            pass  # already closed
class Connection(Database.Connection):
    def cursor(self):
        return Cursor(self)
connect = Connection
Now, instead of importing cx_Oracle, import oracle in the user script:
user.py
import oracle
dsn = oracle.makedsn('HOSTNAME', 1521, service_name='dev_server')
db = oracle.connect('username', 'password', dsn)
cursor = db.cursor()
cursor.execute("""
    SELECT 'Grijesh' as FirstName,
           'Chauhan' as LastName,
           CAST('10560.254' AS NUMBER(10, 2)) as Salary
    FROM DUAL
""")
row = cursor.fetchone()
print("First Name is %s" % row.firstname)  # => Grijesh
print("Last Name is %s" % row.lastname)  # => Chauhan
print("Salary is %r" % row.salary)  # => Decimal('10560.25')
Give it a try!

Building on the answer by @maelcum73:
curs.rowfactory = lambda *args: dict(zip([d[0] for d in curs.description], args))
The issue with this solution is that you need to re-set this after every execution.
Going one step further, you can create a wrapper around the cursor object like so:
class dictcur(object):
    # need to monkeypatch the built-in execute function to always return a dict
    def __init__(self, cursor):
        self._original_cursor = cursor
    def execute(self, *args, **kwargs):
        # rowfactory needs to be set AFTER EACH execution!
        self._original_cursor.execute(*args, **kwargs)
        self._original_cursor.rowfactory = lambda *a: dict(
            zip([d[0] for d in self._original_cursor.description], a)
        )
        # cx_Oracle's cursor's execute method returns a cursor object
        # -> return the correct cursor in the monkeypatched version as well!
        return self._original_cursor
    def __getattr__(self, attr):
        # anything other than the execute method: just go straight to the cursor
        return getattr(self._original_cursor, attr)
dict_cursor = dictcur(cursor=conn.cursor())
Using this dict_cursor, every subsequent dict_cursor.execute() call will return a dictionary. Note: I tried monkeypatching the execute method directly, however that was not possible because it is a built-in method.
