Matlab checking for a value in a database - python

I have the following Boolean statement in Python:
db_connection.query(
'select storage_time from traces where id=' + trace_id
).dictresult()[0]['storage_time'] == None
It basically checks if there is a value in storage_time and I would like to do the same thing in Matlab, but I can't find anything equivalent to None.
Could you please help me out?
Thanks

The database counterpart of Python's None is NULL. Since you connect to your database
via the Matlab Database Toolbox, you need to specify how NULL values retrieved from the database
are represented in Matlab. This is done by setting the 'NullNumberRead' preference
with the setdbprefs function from the Matlab Database Toolbox. For instance, you can write
setdbprefs('NullNumberRead','NaN')
or
setdbprefs('NullNumberRead','0')
Unfortunately, there is no guarantee that the NULL representation chosen this way won't be confused
with real non-NULL values returned by your query (it is your own responsibility to
guarantee that the query never returns NaNs or zeros, respectively, among its non-NULL values).
But if you connect to PostgreSQL, then as far as I know there is at least one Matlab-to-PostgreSQL connector that
handles NULLs in a fully consistent manner: PgMex, a high-performance PostgreSQL client library.
In PostgreSQL, both a value as a whole and (for array types) its individual elements can be NULL. This makes representing NULLs in Matlab less trivial than one might expect.
To illustrate how PgMex represents NULLs in Matlab, consider the following example. Suppose you need to retrieve the results of a query returning one field, myfield, of type float8[], with two tuples. Suppose the value of myfield for the first tuple is NULL as a whole, while for the second tuple it equals {0,NULL,NaN}. The results are obtained as follows
(we assume that the argument of the first connect command below has been filled in properly and that the table mytable, with
myfield of type float8[] among its fields, already exists in the database):
% Create the database connection
dbConn=com.allied.pgmex.pgmexec('connect',[...
'host=<yourhost> dbname=<yourdb> port=<yourport> '...
'user=<your_postgres_username> password=<your_postgres_password>']);
pgResult=com.allied.pgmex.pgmexec('exec',dbConn,...
'select myfield from mytable'); % perform the query
SMyField=com.allied.pgmex.pgmexec('getf',pgResult,...
'%float8[]',0); % retrieve the results
Here SMyField is a structure with three fields: valueVec, isNullVec and isValueNullVec.
isValueNullVec is the column logical array [true;false]: the entire value for the first tuple is NULL,
while the value for the second tuple is not NULL as a whole. isNullVec is the column cell array
{[];[false,true,false]}, which indicates that only the second element of the array stored in
myfield for the second tuple is NULL. Finally, valueVec is the column cell array {[];[0 0 NaN]}. Only
the first and third elements of the second cell are meaningful; the second element is not,
because isNullVec marks it as NULL, so its zero value is just a placeholder
(a default value is chosen for each data type).
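To make the three fields concrete, here is a minimal Python sketch that mirrors the example above with plain lists. It does not call PgMex; the dictionary s_my_field and the helper element_or_none are hypothetical names used only for illustration.

```python
# Plain-Python stand-in for the structure described above (not the PgMex API).
# Each list holds one entry per tuple returned by the query.
s_my_field = {
    "isValueNullVec": [True, False],             # is the whole value NULL?
    "isNullVec": [[], [False, True, False]],     # per-element NULL flags
    "valueVec": [[], [0.0, 0.0, float("nan")]],  # data; NULL slots hold a default
}


def element_or_none(result, row, idx):
    """Return one array element, or None where the database held NULL."""
    if result["isValueNullVec"][row]:
        return None  # the entire array value is NULL
    if result["isNullVec"][row][idx]:
        return None  # only this element is NULL
    return result["valueVec"][row][idx]
```

Because the NULL flags are kept apart from the data, a genuine NaN in the third element stays distinguishable from the NULL in the second.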
As for your example, the corresponding Matlab code may look as follows (we assume that dbConn has already been obtained as above,
that the query below is valid with storage_time being of timestamp type, and that the variable trace_id is already defined):
pgResult=com.allied.pgmex.pgmexec('exec',dbConn,...
['select storage_time from traces where id=',num2str(trace_id)]); % perform the query
SStorageTime=com.allied.pgmex.pgmexec('getf',pgResult,...
'%timestamp',0); % retrieve the results
% check that the value for the first tuple is not NULL
isStorageTime=~SStorageTime.isValueNullVec(1);
Hence it is sufficient to check only isValueNullVec.
EDIT: There are free academic licenses for PgMex.

MATLAB's Database Toolbox has preferences for how NULL values are handled, and depending on those settings you can get different values. See SETDBPREFS for details. You can also change the preferences in the GUI.
By default you will get NaN if you read the data as numeric, and 'NULL' strings if you read it as strings. In the first case, check for NaN with the ISNAN function:
null_idx = isnan(fetcheddata);
For strings use STRCMP:
null_idx= strcmp(upper(fetcheddata), 'NULL');
In addition, if you fetch the data as cell array, you may need to deal with them with CELLFUN or convert to matrix with CELL2MAT.

The usual Matlab idiom is to use the isempty() function.
isempty(somefunction(someargument))
returns true if somefunction(someargument) returns an empty result, and false otherwise.
I have not worked with the Matlab DB toolbox much, so I'm not sure what the full translation of your Python statement is.

If you use this query, you can check for True or False instead of None:
trace_id_exists = db_connection.query("""\
select exists (
select 1
from traces
where id = %s
) "exists"
""" % trace_id
).dictresult()[0]['exists']
if trace_id_exists:
    ...
You could also return something else like 1 or 0.
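For comparison, both checks can be sketched with a parameterized query rather than string building. The sketch below uses Python's built-in sqlite3 module and an in-memory traces table as stand-ins (assumptions, since the original code talks to a different database); the helper names are hypothetical.

```python
import sqlite3

# In-memory stand-in for the real database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traces (id INTEGER PRIMARY KEY, storage_time TEXT)")
conn.executemany(
    "INSERT INTO traces (id, storage_time) VALUES (?, ?)",
    [(1, "2013-01-01 00:00:00"), (2, None)],
)


def trace_exists(connection, trace_id):
    """True if a row with this id exists, regardless of its storage_time."""
    row = connection.execute(
        "SELECT EXISTS (SELECT 1 FROM traces WHERE id = ?)", (trace_id,)
    ).fetchone()
    return bool(row[0])


def storage_time_is_null(connection, trace_id):
    """True if the row exists and its storage_time is NULL (None in Python)."""
    row = connection.execute(
        "SELECT storage_time FROM traces WHERE id = ?", (trace_id,)
    ).fetchone()
    return row is not None and row[0] is None
```

Note that the two helpers answer different questions: a row can exist while its storage_time is still NULL, which is the case the original statement tests for.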


Aerospike where query index python

We are currently testing "aerospike".
But there are certain points in the documentation that we do not understand with reference to the keys.
key = ('trivium', 'profile', 'data')
# Write a record
client.put(key, {
'name': 'John Doe',
'bin_data': 'KIJSA9878MGU87',
'public_profile': True
})
We read about namespaces, but we ran into a problem when we tried to query following the general documentation:
client = aerospike.client(config).connect()
query = client.query('trivium', 'profile')
query.select('name', 'bin_data')
query.where(p.equals('public_profile', True))
print(query.results())
The result is empty, but when we erase the "where" statement the query returns all the records. The documentation says that queries work with a secondary index, but how does that work?
Regards.
You can use only one filter in a query. That filter, in your case the equality filter, is on the public_profile bin. To use the filter, you must build a secondary index (SI) on the public_profile bin; however, SIs can only be built on bins containing numeric or string data. So to do what you are trying to do, change public_profile to a numeric value, say 0 or 1, then add a secondary index on that bin and use the equality filter on the value 0 or 1. While you can build multiple SIs, you can only invoke one filter in any given query; you cannot chain multiple filters with an "AND". If you need multiple filters, you will have to write Stream UDFs (User Defined Functions). You can use AQL to define SIs, and you only have to do it once.
$aql
aql>help --- see the command to add secondary index.
aql>exit
SIs reside in process RAM. Once an index is defined, any data that is added or modified is automatically indexed by Aerospike as applicable. If you define the index on public_profile as NUMERIC but insert string data in that bin for some records, those records will not be indexed and won't participate in the query filter.
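A minimal sketch of that change on the Python side, assuming the aerospike package is installed, that client is an already connected client, and that a NUMERIC secondary index on public_profile has been created beforehand. The helper names to_indexable and query_public_profiles are hypothetical.

```python
def to_indexable(bins):
    """Copy a record's bins, replacing booleans with 1/0 so they can be indexed."""
    return {
        name: int(value) if isinstance(value, bool) else value
        for name, value in bins.items()
    }


def query_public_profiles(client, namespace="trivium", set_name="profile"):
    """Return records whose public_profile bin equals 1.

    Assumes `client` is a connected aerospike client and that a NUMERIC
    secondary index already exists on the public_profile bin.
    """
    from aerospike import predicates as p  # requires the aerospike package

    query = client.query(namespace, set_name)
    query.select("name", "bin_data")
    query.where(p.equals("public_profile", 1))  # only one filter per query
    return query.results()
```

Records would then be written with something like client.put(key, to_indexable({...})), so that public_profile is stored as an integer the index can pick up.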

Python MySQLdb test for select count(*) = zero

I use SELECT COUNT(*) FROM db WHERE <expression> to see if a set of records is empty. So:
>>> cnt = c.fetchone()
>>> print cnt
(0L,)
My question is: how do you test for this condition?
I have a number of other ways to accomplish this. Is something like the following possible?
if cnt==(0L,):
    # do something
fetchone returns a row, which is a sequence of columns.
If you want to get the first value in a sequence, you use [0].
You could instead compare the row to (0,), as you're suggesting. But as far as I know, neither the general DB-API nor the specific MySQLdb library guarantees what kind of sequence a row is; it could be a list, or a custom sequence class. So, relying on the fact that it's a tuple is probably not a good idea. And, since it's just as easy to not do so, why not be safe and portable?
So:
count_row = c.fetchone()
count = count_row[0]
if count == 0:
    do_something()
Or, putting it together in one line:
if c.fetchone()[0] == 0:
    do_something()
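As a self-contained illustration of the same idea, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in DB-API driver (an assumption; the question uses MySQLdb). The table contents and the helper count_matching are made up for the example.

```python
import sqlite3

# In-memory stand-in database with a single record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE db (id INTEGER, status TEXT)")
conn.execute("INSERT INTO db VALUES (1, 'done')")


def count_matching(connection, status):
    """Return COUNT(*) as a plain integer by unpacking the single row."""
    cursor = connection.execute(
        "SELECT COUNT(*) FROM db WHERE status = ?", (status,)
    )
    (count,) = cursor.fetchone()  # works for any one-element sequence
    return count


if count_matching(conn, "pending") == 0:
    print("no pending records")
```

Unpacking with (count,) = ... does not depend on the row being a tuple specifically, only on it being a sequence with exactly one item.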
Thank you. Your first sequence works; I don't know how I did not try that one, but I did not. The second construction gets an error: ...object has no attribute '__getitem__'. I would guess my version of MySQLdb (1.2.3_4, Python 2.7) does not support it.
What I did in the interim was to construct the zero tuple by executing a count(*) constructed to return zero records. This seems to work fine.
It's often easier to use the .rowcount attribute of the cursor object to check whether there are any rows in your result set. This attribute is specified in the Python Database API:
This read-only attribute specifies the number of rows that the last
.execute*() produced (for DQL statements like SELECT) or
affected (for DML statements like UPDATE or INSERT). [9]
The attribute is -1 in case no .execute*() has been performed on
the cursor or the rowcount of the last operation is cannot be
determined by the interface. [7]
When .rowcount cannot be used
Note that per the above specs, Cursor.rowcount should be set to -1 when the number of rows produced or affected by the last statement "cannot be determined by the interface." This happens when using the SSCursor and SSDictCursor cursor classes.
The reason is that the MySQL C API has two different functions for retrieving result sets: mysql_store_result() and mysql_use_result(). The difference is that mysql_use_result() reads rows from the result set as you ask for them, rather than storing the entire result set as soon as the query is executed. For very large result sets, this "unbuffered" approach can be faster and uses much less memory on the client machine; however, it makes it impossible for the interface to determine how many rows the result set contains at the time the query is executed.
Both SSCursor and SSDictCursor call mysql_use_result(), so their .rowcount attribute should hold the value -1 regardless of the size of the result set. In contrast, DictCursor and the default Cursor class call mysql_store_result(), which reads and counts the entire result set immediately after executing the query.
To make matters worse, the .rowcount attribute only ever holds the value -1 when the cursor is first opened; once you execute a query, it receives the return value of mysql_affected_rows(). The problem is that mysql_affected_rows() returns an unsigned long long integer, which represents the value -1 in a way that can be very counterintuitive and wouldn't be caught by a condition like cursor.rowcount == -1.
Counting for counting's sake
If the only thing you're doing is counting records, then .rowcount isn't that useful because your COUNT(*) query is going to return a row whether the records exist or not. In that case, test for the zero value in the same way that you would test for any value when fetching results from a query. Whether you can do c.fetchone()[0] == 0 depends on the cursor class you're using; it would work for a Cursor or SSCursor but fail for a DictCursor or SSDictCursor, which fetch dictionaries instead of tuples.
The important thing is just to be clear in your code about what's happening, which is why I would recommend against using c.fetchone() == (0,). That tests an entire row when all you need to do is test a single value; get the value out of the row before you test it, and your code will be more clear. Personally, I find c.fetchone()[0] to be needlessly opaque; I would prefer:
row = cursor.fetchone()
if row[0] == 0:
    do_something()
This makes it abundantly clear, without being too verbose, that you're testing the first item of the row. When I'm doing anything more complicated than a simple COUNT() or EXISTS(), I prefer to use DictCursor so that my code relies on (at most) explicit aliases and never on implicit column ordering.
Testing for an empty result set
On the other hand, if you actually need to fetch a result set and the counting is purely incidental, as long as you're not using one of the unbuffered cursor classes you can just execute the important query and not worry about the COUNT():
cursor.execute(r"SELECT id, name, email FROM user WHERE date_verified IS NULL;")
if cursor.rowcount == 0:
    print 'No results'
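When .rowcount is not usable, testing the fetched rows themselves is a driver-independent fallback. The sketch below uses Python's built-in sqlite3 module (an assumption, chosen because it reports -1 for SELECT statements) and a made-up users table.

```python
import sqlite3

# In-memory stand-in database; the only user is already verified.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, date_verified TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ann', '2013-05-01')")

cursor = conn.execute("SELECT id, name FROM users WHERE date_verified IS NULL")
select_rowcount = cursor.rowcount  # sqlite3 reports -1 for SELECT statements
rows = cursor.fetchall()

if not rows:  # an empty list means the query matched nothing
    print("No results")
```

The trade-off is that fetchall() loads the whole result set into memory, so for very large results a single fetchone() followed by a check for None may be preferable.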

Python result from mysql db does not match actual value: comparison of values

I have a program in which I compare hash values generated by my code with the ones in my MySQL database.
So, after generating the check values I have:
hash to be compared: 78ff0103440dcea01f36438a71bdf28f
hash value from db: (('78ff0103440dcea01f36438a71bdf28f',),)
The hash value from the DB was output using something like:
db_hash.fetchone()
That's why it includes the (('',),) symbols.
But I've tried appending the same symbols to the hash to be compared and it still won't compare as equal.
I'm baffled because it's only supposed to be a simple comparison through a
if hash == result:
    do some code
else:
    do some code
If you have an idea on what this is please answer :)
Python's MySQL adapter returns rows as tuples of values - and in this case, you're receiving a full result set of one row, with one column. To get the value of that, just do:
dbResult # this is (('78ff0103440dcea01f36438a71bdf28f',),)
dbResult[0][0] # this is '78ff0103440dcea01f36438a71bdf28f'
Of course, if your query was different (or returned no rows) this would throw an error. You should ideally be checking the number of rows returned (len(dbResult)) first. The number of columns in each row will be consistent.
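A minimal sketch of that unwrapping with the row-count check included. The helper first_value is a hypothetical name, and db_result simply reuses the literal shown in the question instead of a live cursor.

```python
def first_value(rows):
    """Return the first column of the first row, or None if there are no rows."""
    if not rows:
        return None
    return rows[0][0]


# Shape shown in the question: one row containing one column.
db_result = (("78ff0103440dcea01f36438a71bdf28f",),)
computed = "78ff0103440dcea01f36438a71bdf28f"

# Compare string with string, not string with nested tuple.
hashes_match = first_value(db_result) == computed
```

Comparing computed against db_result directly is always False, because a string never equals a tuple, no matter what the tuple contains.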

Query for bit based values

I have a problem getting values out of PyTables. The values are bit-based but stored as integers.
One column in my table is an Int32Column() named 'Value'. In this column I store integer values in which every bit has a different meaning. So, if I want the information for some bit, I take the value from the table and do some bit manipulation. I don't know how to write a query that gets the specified values from the table.
For example, I want all values in the Value column where the first bit == 1 and the third bit == 1.
How do I write that query?
I'm trying with mask:
[ x['Value'] for x in table.where('((Value & mask) == mask)')]
but, I'm getting exception:
NotImplementedError: unsupported operand types for *and*: int, int
Query processing must be very fast because there will be a large number of rows in the future. One restriction is that the values must be stored as ints in the table, because I receive them from the server in int format. I hope that someone has a better solution.
For future reference.
I had a similar problem and solved it in the following way. As the usual bitwise shift operators (<<, >>) are not available and the &, | operators are logical rather than bitwise, one has to improvise.
To check whether a value VAL has the n-th bit set, we can shift the bit of interest to the 0th position, which determines the parity of the number (2**0). The parity can be checked using the modulus operator.
So, to check whether, for example, bit 25 is set and bit 16 is unset, one can do something like:
table.where("((VAL/(2**25))%2==1) & ((VAL/(2**16))%2==0)")
Not elegant, but working for now.
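The arithmetic behind this trick can be checked in plain Python, outside PyTables. In the sketch below, Python's // stands in for the integer division the query relies on, and both function names are hypothetical.

```python
def bit_is_set_arith(value, n):
    """Test bit n using only division and modulus, as in the query above."""
    return (value // 2**n) % 2 == 1


def bit_is_set_bitwise(value, n):
    """Reference implementation using the usual bitwise operators."""
    return ((value >> n) & 1) == 1


# Bit 25 set and bit 16 unset, mirroring the condition in the query above.
sample = 2**25 + 3
condition_holds = bit_is_set_arith(sample, 25) and not bit_is_set_arith(sample, 16)
```

Dividing by 2**n drops the n lowest bits, so the remainder modulo 2 is exactly bit n; the two functions agree for every non-negative integer.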

How to tell MySQL to trigger a Hash before Update/Insert?

I am struggling to create a trigger in MySQL so that every time I insert a value into a column named title, a hash is created and stored in the column title_hash. Since I don't know how this works, I found this code while googling:
CREATE TRIGGER insertModelHash
BEFORE
INSERT
ON
products
FOR EACH ROW SET
NEW.model_hash = CONV(RIGHT(MD5(NEW.products_model), 16), 16, 10)
The MySQL-reference tells me, that this means:
Create a trigger called insertModelHash...
... before inserting a row into table products...
use the functions MD5, RIGHT, CONV on the column products_model in every new row I intend to insert.
The third point needs more explanation:
I guess, that NEW is some sort of identifier of new rows. So NEW.products_model points to the column products_model in the current (new) row.
Then MD5 is issued. Since I want to use SHA-2 it is obvious for me to change MD5(NEW.products_model), 16) ===> SHA2(NEW.products_model), 224).
And now I struggle: Why does this guy use CONV(RIGHT(...)...)? Is this really necessary?
Additional information: Right now, I am doing
hashlib.sha224(title).hexdigest()
in Python and store this value.
I appreciate any suggestions/explanations!
To answer your three questions:
The NEW keyword references the 'pseudo-table' for the record that would be inserted. In an update trigger, you can access both 'NEW' and 'OLD', and in a delete trigger, just 'OLD'.
Yes, MD5 is used to create the hash. However, in your question, you have part of the parameter to the 'RIGHT' function included. It's only MD5(NEW.products_model) (there's no , 16). And yes, you can substitute SHA2 for MD5, if it's available (it's only available if MySQL is configured for SSL support).
The RIGHT(string, number) simply takes the right 'number' characters from 'string'.
The CONV() function is used to convert a number between bases. The combination of these last two functions takes the right 16 characters of the hash, and converts them from base 16 (hex) to base 10 (decimal).
And the answer is no, you don't need them if all you want to do is store the hash itself.
NEW.model_hash = SHA2(NEW.products_model, 224)
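To see what the two variants compute, here is a minimal Python sketch with hashlib. It assumes the text is hashed as UTF-8 bytes (MySQL hashes the bytes of the column's character set, which may differ), and both function names are hypothetical.

```python
import hashlib


def sha224_hex(title):
    """Hex digest as produced by hashlib.sha224(...).hexdigest() in the question."""
    return hashlib.sha224(title.encode("utf-8")).hexdigest()


def md5_right16_as_decimal(model):
    """Mimic CONV(RIGHT(MD5(model), 16), 16, 10) and return a Python int."""
    hex_digest = hashlib.md5(model.encode("utf-8")).hexdigest()
    return int(hex_digest[-16:], 16)  # last 16 hex digits read as base 16
```

The first function returns a 56-character hex string, the form MySQL's SHA2() returns for a hash length of 224. The second yields an unsigned 64-bit number, which is presumably why the original trigger used CONV and RIGHT: it fits an integer column.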
