I have a SimpleDB instance that I update and read using boto for Python:
import boto

sdb = boto.connect_sdb(access_key, secret_key)
domain = sdb.get_domain('DomainName')
itemName = 'UserID'
itemAttr = {'key1': 'val1', 'key2': 'val2'}
domain.put_attributes(itemName, itemAttr)
That works as expected: a new item named 'UserID' with values 'val1' and 'val2' is inserted into the domain.
Now, the problem that I am facing is that if I query that domain right after updating its attributes,
query = "select * from `DomainName` where key1 = 'val1'"
check = domain.select(query)
itemName = check.next()['key2']
I get an error because the values in the row cannot be found. However, if I add a time.sleep(1) between the write and the read, everything works.
I suspect this is because put_attributes signals the database to write but does not wait until the change has been made persistent. I have also tried writing by creating an item and then saving it (item.save()), without much success. Does anyone know how I can make sure the values have been written to the SimpleDB instance before proceeding with the next operations?
Thanks.
The issue here is that SimpleDB is, by default, eventually consistent. So, when you write data and then immediately try to read it, you are not guaranteed to get the newest data although you are guaranteed that eventually the data will be consistent. With SimpleDB, eventually usually means less than a second but there are no guarantees on how long that could take.
There is, however, a way to tell SimpleDB that you want a consistent view of the data and are willing to wait for it, if necessary. You could do this by changing your query code slightly:
query = "select * from `DomainName` where key1 = 'val1'"
check = domain.select(query, consistent_read=True)
itemName = check.next()['key2']
This should always return the latest values.
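Consistent reads are not limited to select, either. If you fetch the item directly by name, boto's get_attributes takes the same flag; a minimal sketch, reusing the domain from the question:

# Strongly consistent read of the item that was just written
attrs = domain.get_attributes('UserID', consistent_read=True)
print(attrs.get('key2'))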
I'm trying to iterate in Python over all elements of a large MongoDB database.
Usually, I do:
from pymongo import MongoClient

mgclient = MongoClient('mongodb://user:pwd@0.0.0.0:27017')
mgdb = mgclient['mongo']
mgcol = mgdb['name']
# note: the second sort() call replaces the first, so this
# effectively sorts on 'time' only
for mg_ob in mgcol.find().sort('Date').sort('time'):
    pass  # DO THINGS
But it says "Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit".
So I created an index named 'SortedTime', but I don't understand how I can use it now.
Basically, I'm trying to have something like:
mgclient = MongoClient('mongodb://user:pwd@0.0.0.0:27017')
mgdb = mgclient['mongo']
mgcol = mgdb['name']
for mg_ob in mgcol.find()['SortedTime']:
    pass  # DO THINGS
Any ideas? A helping hand would be much appreciated.
I hope this post will help others. Thank you very much
Update:
I managed to make it work, thanks to Joe. After I created the index:
resp = mgcol.create_index(
    [
        ("date", 1),
        ("time", 1)
    ]
)
print("index response:", resp)
What I did was just:
mgclient = MongoClient('mongodb://user:pwd@0.0.0.0:27017')
mgdb = mgclient['mongo']
mgcol = mgdb['name']
for mg_ob in mgcol.find():
    pass  # DO THINGS
No need to use the index name.
Your query is sorting on 2 fields, Date and time, so you will need an index that includes these fields first in the key specification.
Working from the mongo shell, you might use the createIndex shell helper:
db.getSiblingDB("mongo").getCollection("name").createIndex({Date:1, time:1})
Working from the client side, you might use the createIndexes database command.
Once the index has been created, query just like you did before and the mongod's query executor should use the index.
You can use explain() to get detailed query execution stages to see which indexes were considered and the comparative performance of each.
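From Python, the same thing with PyMongo might look like the sketch below, reusing the connection details from the question (user, pwd, and the 'mongo'/'name' names are the question's placeholders):

from pymongo import MongoClient, ASCENDING

mgclient = MongoClient('mongodb://user:pwd@0.0.0.0:27017')
mgcol = mgclient['mongo']['name']

# Compound index whose key order matches the sort specification
mgcol.create_index([('Date', ASCENDING), ('time', ASCENDING)])

# Sort on both fields in a single call so the planner can walk the index
# instead of sorting in memory
for mg_ob in mgcol.find().sort([('Date', ASCENDING), ('time', ASCENDING)]):
    pass  # DO THINGS

# explain() reports the winning plan, including any index used
print(mgcol.find().sort([('Date', ASCENDING), ('time', ASCENDING)]).explain())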
I have learned that in a SQLAlchemy session, update() does not take effect, and may not even communicate with the database, until you call session.commit().
Here is my code; I think the technique I am using is called optimistic locking:
from sqlalchemy import update

with Session() as session:
    # get one record
    obj = session.query(Table).filter_by(key=param).first()
    content = "here is the calculated value of my business"
    # update the record by the id of the record fetched above;
    # if two threads fetch the same record and both reach this UPDATE,
    # I think only one can succeed, which MySQL's MVCC ensures
    session.execute(
        update(Table).values(content=content).where(Table.id == obj.id)
    )
    session.commit()
But I have a question here: what if I want to do something, after session.commit(), only in the thread that got the lock?
For example, the code I expect is like this:
affected_rows = session.commit()
if affected_rows:
    do_something_after_get_lock_success()
According to the docs, ResultProxy has a rowcount property; you can get the affected rows from that. But note that:
This attribute returns the number of rows matched, which is not necessarily the same as the number of rows that were actually modified - an UPDATE statement, for example, may have no net change on a given row if the SET values given are the same as those present in the row already. Such a row would be matched but not modified. On backends that feature both styles, such as MySQL, rowcount is configured by default to return the match count in all cases.
See here also for Mike Bayer's excellent explanation of this.
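Putting that together, one way to branch on whether the UPDATE won the race is a sketch like this (assuming the Table, Session, and do_something_after_get_lock_success names from the question):

from sqlalchemy import update

with Session() as session:
    obj = session.query(Table).filter_by(key=param).first()
    result = session.execute(
        update(Table).values(content=content).where(Table.id == obj.id)
    )
    session.commit()
    # rowcount is the number of rows matched by the WHERE clause
    # (MySQL's default behavior), not necessarily the number modified
    if result.rowcount:
        do_something_after_get_lock_success()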
I am having trouble with the parameter of an SNMP query in a Python script. An SNMP query takes an OID as a parameter. The OID I use here is written in the code below; used alone in a query, it returns a list of states for the interfaces of the IP addresses I am querying.
What I want is to use that OID with a variable appended to it in order to get one specific piece of information (if I use the OID alone, I only get a list of things that would complicate my problem).
The query goes like this:
oid = "1.3.6.1.4.1.2011.5.25.119.1.1.3.1.2."
variable = "84.79.84.79"
query = session.get(oid + variable)
Here, this query returns a corrupted SNMPObject, because during the configuration of the device I am querying, another number is inserted (for reasons that do not matter here) between these two elements of the parameter.
Below is a screenshot showing some examples of an SNMP request that takes only the OID above as a parameter, without the variable appended; you can see that my variable varies, and so does the highlighted additional number.
Basically, what I am looking for here is the response, but unfortunately I cannot predict, for each IP address I query, what that "random" number will be.
I could use a loop that tries 20 or 50 queries and saves the response of the one that worked, but that is ugly. What would be better is some built-in function or library that would just tell the query:
"SNMP query on that OID, with any integer appended to it, and with my variable appended to that".
I definitely don't want to generate a random int; it is already generated in the configuration of the device I am querying. I just want to avoid looping to get a proper response to a precise query.
I hope that was clear enough.
Something like this should work:
from random import randint
variable = "84.79.84.79"
numbers = "1.3.6.1.4.1.2011.5.25.119.1.1.3.1.2"
query = session.get('.'.join([numbers, str(randint(1, 100)), variable]))
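If guessing the inserted number is not acceptable, another option is to walk the subtree under the base OID and match on the known suffix. A sketch, assuming an easysnmp-style session object that exposes walk() (attribute names are library-dependent):

base_oid = "1.3.6.1.4.1.2011.5.25.119.1.1.3.1.2"
variable = "84.79.84.79"

# Walk everything under the base OID and keep the entry whose full OID
# ends with the known variable, whatever number the device inserted
for item in session.walk(base_oid):
    full_oid = '%s.%s' % (item.oid, item.oid_index)  # library-dependent
    if full_oid.endswith(variable):
        print(full_oid, item.value)
        break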
I'm new to both MongoDB and PyMongo, and am having some performance issues regarding cursors.
TL;DR: Any operation I try to perform using a cursor takes about a second.
Long version
I have a small database, which I bulkloaded. Each entry has 3 fields:
dom: domain name (unique)
date: date, YYYYMMDD
flag: string
I've loaded about 1.9 million entries, without incident, and quite quickly.
I created a hash index on the dom field.
Now, I want to grab certain records by the domain field, and update them, using a Python program.
That's where the problem lies.
I'm using the latest MongoDB, and the latest pyMongo.
Stripped-down program:
import pymongo
from pymongo import MongoClient

client = MongoClient()  # connection details omitted in the original
db = client.myindexname
posts = db.posts
print(list(db.profiles.index_information()))  # shows hash index is present

for k in newdomainlist.keys():  # iterate list of domains to check
    ret = posts.find({"dom": k})  # this runs fine, and quickly
    # 'ret' is a cursor
    print(ret)  # this runs quickly
    # Here's the problem
    print(ret.count())  # this takes about a second. why?
If I just print ret, the speed is fine. However, if I try to reference anything in the cursor, the speed drops to the floor: I can do about one operation per second. In this case, I'm just trying to see if ret.count() returns 0 (we don't have this domain) or 1 (we have it already).
I've tried adding a batch_size(10000) to the find, without it helping.
I DO have the Python C extensions loaded.
What the heck am I doing wrong?
Thanks.
It turned out that I'd created my hashed index on the wrong collection, 'profiles', rather than 'posts'. Chalk it up to MongoDB inexperience. We can close this one now, or delete it entirely.
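For anyone landing here later, a minimal sketch of the fix: create the hashed index on the collection that is actually queried, and prefer count_documents() over the cursor's count(), which is deprecated in newer PyMongo versions ('example.com' below is a placeholder):

from pymongo import MongoClient, HASHED

client = MongoClient()
posts = client.myindexname.posts

# Hashed index on the collection we actually query
posts.create_index([('dom', HASHED)])

# A 0-or-1 membership check that can use the index
if posts.count_documents({'dom': 'example.com'}) == 0:
    print('domain not present yet')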
Suppose that I have a table like:
from elixir import Entity, Field, String, OneToMany

class Ticker(Entity):
    ticker = Field(String(7))
    tsdata = OneToMany('TimeSeriesData')
    staticdata = OneToMany('StaticData')
How would I query it so that it returns a set of Ticker.ticker?
I dug into the docs and it seems like select() is the way to go. However, I am not too familiar with the SQLAlchemy syntax. Any help is appreciated.
ADDED: My ultimate goal is to have a set of current tickers such that, when a new ticker is not in the set, it will be inserted into the database. I am just learning how to create a database and SQL in general. Any thoughts are appreciated.
Thanks. :)
Not sure what you're after exactly, but to get a list with all Ticker.ticker values you would do this:
[instance.ticker for instance in Ticker.query.all()]
What you really want is probably the Elixir getting started tutorial - it's good so take a look!
UPDATE 1: Since you have a database, the best way to find out if a new potential ticker needs to be inserted or not is to query the database. This will be much faster than reading all tickers into memory and checking. To see if a value is there or not, try this:
Ticker.query.filter_by(ticker=new_ticker_value).first()
If the result is None, you don't have it yet. So, all together:
if Ticker.query.filter_by(ticker=new_ticker_value).first() is None:
    Ticker(ticker=new_ticker_value)
    session.commit()
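Wrapped up as a helper, assuming the Ticker entity above and Elixir's module-level session (metadata and engine already configured):

from elixir import session

def ensure_ticker(new_ticker_value):
    # Check the database rather than keeping all tickers in memory
    if Ticker.query.filter_by(ticker=new_ticker_value).first() is None:
        Ticker(ticker=new_ticker_value)  # instantiating registers the row with the session
        session.commit()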