Aerospike where query index python

We are currently testing Aerospike, but there are certain points in the documentation that we do not understand with regard to keys.
key = ('trivium', 'profile', 'data')
# Write a record
client.put(key, {
    'name': 'John Doe',
    'bin_data': 'KIJSA9878MGU87',
    'public_profile': True
})
We read about the namespace, but then we tried to query following the general documentation:
import aerospike
from aerospike import predicates as p

client = aerospike.client(config).connect()
query = client.query('trivium', 'profile')
query.select('name', 'bin_data')
query.where(p.equals('public_profile', True))
print(query.results())
The result is null, but when we erase the "where" statement the query returns all the records. The documentation says that the query works with a secondary index, but how does that work?
Regards.

You can use one filter in a query. That filter, in your case the equality filter, is on the public_profile bin. To use the filter, you must build a secondary index (SI) on the public_profile bin; however, SIs can only be built on bins containing numeric or string data. So to do what you are trying to do, change public_profile to a numeric value, say 0 or 1, then add a secondary index on that bin and use the equality filter on the value 0 or 1. While you can build multiple SIs, you can only invoke one filter in any given query. You cannot chain multiple filters with an "AND". If you have to use multiple filters, you will have to write Stream UDFs (User Defined Functions). You can use AQL to define SIs; you only have to do this once.
$ aql
aql> help    # see the command to add a secondary index
aql> exit
SIs reside in process RAM. Once defined, any new data added or modified is automatically indexed by Aerospike as applicable. If you define the index on public_profile as NUMERIC but insert string data in that bin for some records, those records will not be indexed and won't participate in the query filter.
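Putting the answer together, a minimal sketch in Python might look like the following. The host address and the index name are assumptions for illustration; the namespace, set, and bin names follow the question.
import aerospike
from aerospike import predicates as p

config = {'hosts': [('127.0.0.1', 3000)]}  # assumed local server
client = aerospike.client(config).connect()

# One-time setup: build a numeric secondary index on the public_profile bin.
# 'profile_public_profile_idx' is a made-up index name.
client.index_integer_create('trivium', 'profile', 'public_profile',
                            'profile_public_profile_idx')

# Store the flag as 0/1 so the numeric index applies
key = ('trivium', 'profile', 'data')
client.put(key, {'name': 'John Doe',
                 'bin_data': 'KIJSA9878MGU87',
                 'public_profile': 1})

# Query using the equality filter on the indexed bin
query = client.query('trivium', 'profile')
query.select('name', 'bin_data')
query.where(p.equals('public_profile', 1))
print(query.results())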

Related

How to implement this in python : Check if a vertex exist, if not, create a new vertex

I want to create a Neptune DB and dump data to it. I downloaded historical data from DynamoDB to S3; these files are in CSV format. The headers in these CSVs look like:
~id, someproperties:String, ~label
Then, I need to implement real-time streaming to this Neptune DB through Lambda. In the Lambda function, I will check whether a vertex (or edge) exists or not; if it exists, I will update the vertex (or edge), otherwise I create a new one.
In Python, my implementation looks like this:
(g.V().hasLabel('Event').has(T.id, event['Id'])
    .fold()
    .coalesce(unfold(), addV('Event').property(T.id, event['Id']))
    .property(Cardinality.single, 'State', event['State'])
    .property('sourceData', event['sourceData'])
    .next())
Here I have some questions:
In real-time streaming, I need to check whether a vertex with an id is already there, so I need to query the nodes of the historical data. Can has(T.id, event['Id']) do this? Or should I just use has(id, event['Id']) or has("id", event['Id'])?
I was using g.V().has('Event', T.id, event['Id']) instead of g.V().hasLabel('Event').has(T.id, event['Id']), but got an error like "cannot locate NeptuneGraphTraversal.has()". Are these two queries the same thing?
Here are the three bits of Gremlin you had a question about:
g.V().has(T.id, "some-id")
g.V().has(id, "some-id")
g.V().has("id", "some-id")
The first two will return the same result, as id is a member of T (as a point of style, Gremlin users typically statically import id so that it can be referenced that way for brevity). The last traversal is different from the first two because, as a String value, it refers to a standard property key named "id". Generally speaking, TinkerPop would recommend that you not use a property key name like "id" or "label", as it can lead to mistakes and confusion with the values of T.
As for the second part of your question revolving around:
g.V().has('Event', T.id, event['Id'])
g.V().hasLabel('Event').has(T.id, event['Id'])
You can't pass T.id to the 3-ary form of has(), as Kelvin points out, because that step signature only allows a String in the second position. It also wouldn't make sense to allow T there: T.label is already accounted for by the first argument, and T.id refers to the actual graph element identifier. If you know that value, you wouldn't bother specifying the T.label in the first place, since the T.id already uniquely identifies the element; you would just do g.V(event['Id']).
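Following that advice, a hedged gremlinpython sketch of the upsert keyed on the element id alone might look like this; it assumes an already-connected traversal source g, an event dict as in the question, and that Neptune accepts the supplied string ids:
from gremlin_python.process.traversal import T, Cardinality
from gremlin_python.process.graph_traversal import __

(g.V(event['Id'])
    .fold()
    .coalesce(__.unfold(),
              __.addV('Event').property(T.id, event['Id']))
    .property(Cardinality.single, 'State', event['State'])
    .property('sourceData', event['sourceData'])
    .next())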

Peewee incrementing an integer field without the use of primary key during migration

I have a table I need to add columns to; one of them is a column that dictates business logic. So think of it as a "priority" column that has to be unique and an integer field. It cannot be the primary key, but it is unique for business-logic purposes.
I've searched the docs but I can't find a way to add the column, give it default values (say starting from 1), and auto-increment them without setting it as a primary key.
Thus, creating the field like
example_column = IntegerField(null=False, db_column='PriorityQueue', default=1)
will fail because of the unique constraint. I should also mention this is happening while I'm migrating the table (existing data would all receive a value of '1').
So, is it possible to do the above somehow and get the column to auto increment?
It should definitely be possible, especially outside of peewee. You can make a counter that starts at 1 and counts up by whatever step you choose with range(), then write each incremented value to the desired field in each row as you iterate through.
It depends on your database, but Postgres uses sequences to handle this kind of thing. Peewee fields accept a sequence name as an initialization parameter, so you could pass it in that manner.
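A rough sketch of both suggestions; the model name, sequence name, and database are made up for illustration:
from peewee import Model, IntegerField, PostgresqlDatabase

db = PostgresqlDatabase('mydb')

class Task(Model):
    # unique, non-primary-key integer whose default comes from a DB sequence
    # (Postgres only; 'task_priority_seq' is a hypothetical sequence name)
    priority = IntegerField(null=False, unique=True,
                            sequence='task_priority_seq')

    class Meta:
        database = db

# The range()-style idea: backfill unique values onto existing rows, one per row.
with db.atomic():
    for n, task in enumerate(Task.select().order_by(Task.id), start=1):
        task.priority = n
        task.save()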

How to define list of enumerated values in Web2Py

I am developing a website with the Web2Py framework.
It provides a way to define enumerated values as given below.
I need to define a table as given below.
Field('state','string', length=10, requires=IS_IN_SET(('open','closed','not_open')))
Also, I can define a field which can list values as given below.
Field('emails','list:string')
But what is the syntax to combine these?
I need to define the weekend days for an organization, and there should be more than one.
I tried the following.
db.define_table('organization',
    Field('name', 'string', requires=IS_NOT_EMPTY()),
    Field('description', 'text'),
    Field('weekends', 'list:string', length=10,
          requires=IS_IN_SET(('sunday', 'monday', 'tuesday', 'wednesday',
                              'thursday', 'friday', 'saturday'))),
    redefine=migrate_flag
)
But it only allows a single value from the enumeration. I verified this by creating a new record in the Web2Py appadmin interface; I can enter only one value for the weekends field.
Can this be done in the 'web2py' way? Or will I have to resort to creating a new weekend table in the database and make a foreign key to the organization?
Use the "multiple" argument to allow/require multiple selections:
IS_IN_SET(('sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday'),
          multiple=True)
Or if you want to require exactly two choices:
IS_IN_SET(('sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday'),
          multiple=(2, 2))
If multiple is True, it will allow zero or more choices. multiple can also be a tuple specifying the minimum and maximum number of choices allowed.
The IS_IN_DB validator also takes the multiple argument.
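For completeness, a sketch of the table from the question with the validator made multiple (field names follow the question; the migrate/redefine arguments are omitted):
db.define_table('organization',
    Field('name', 'string', requires=IS_NOT_EMPTY()),
    Field('description', 'text'),
    Field('weekends', 'list:string',
          requires=IS_IN_SET(('sunday', 'monday', 'tuesday', 'wednesday',
                              'thursday', 'friday', 'saturday'),
                             multiple=True)),
)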

Annotate with a filtered related object set

So I have a SensorType model, which has a collection of SensorReading objects as part of a sensorreading_set (i.e., the sensor type has many sensor readings). I want to annotate the sensor types to give me the sensor reading with the max id. To wit:
sensor_types = SensorType.objects.annotate(
    newest_reading_id=Max('sensorreading__id'))
This works fantastically, but there's a catch. Sensor Readings have another foreign key, Device. What I really want is the highest sensor reading id for a given sensor type for a given device. Is it possible to have the annotation refer to a subset of sensor readings that basically amounts to SensorReading.objects.filter(device=device)?
Filtering works perfectly fine with related objects, and annotations work perfectly fine with those filters. What you need to do is:
from django.db.models import Max
SensorType.objects.filter(sensorreading__device=device) \
    .annotate(newest_reading_id=Max('sensorreading__id'))
Note that the order of the calls matters. Using filter before annotate will annotate only the filtered set; using annotate before filter will annotate the complete set and then filter. Also, when filtering across a multi-valued relation, keep in mind that filter(sensorreading__x=x, sensorreading__y=y) matches objects that have a single related sensorreading satisfying both conditions, while .filter(sensorreading__x=x).filter(sensorreading__y=y) allows each condition to be satisfied by a different related sensorreading.
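To make the ordering point concrete, a small sketch (model and variable names follow the question):
from django.db.models import Max

# annotate over the filtered set: Max() only sees readings for this device
per_device = (SensorType.objects
              .filter(sensorreading__device=device)
              .annotate(newest_reading_id=Max('sensorreading__id')))

# annotate first: Max() is computed over all readings, then rows are filtered
overall = (SensorType.objects
           .annotate(newest_reading_id=Max('sensorreading__id'))
           .filter(sensorreading__device=device))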
You can use .extra for this type of query in Django, like this:
SensorType.objects.extra(
    select={
        "newest_reading_id": "SELECT MAX(id) FROM sensorreading WHERE sensorreading.sensortype_id = sensortype.id AND sensorreading.device_id=%s",
    },
    select_params=[device.id],
)
You can read more about .extra here : https://docs.djangoproject.com/en/1.6/ref/models/querysets/#django.db.models.query.QuerySet.extra
As I understand it, you want to GROUP BY two fields, device_id and sensortype_id. This can be done using:
SensorReading.objects.all().values('device_id', 'sensortype_id').annotate(max=Max('id'))
I didn't try it; it was taken from two different answers on SO, this one and this one.

Reorder of SQLAlchemy Query results based on external ranking

The results of an ORM query (e.g., MyObject.query()) need to be ordered according to a ranking algorithm that is based on values not within the database (i.e. from a separate search engine). This means 'order_by' will not work, since it only operates on fields within the database.
But I don't want to convert the query results to a list and then reorder, because I want to keep the ability to add further constraints to the query. E.g.:
results = MyObject.query()
results = my_reorder(results)
results = results.filter(some_constraint)
Is this possible to accomplish via SQLAlchemy?
I am afraid you will not be able to do it unless the ordering can be derived from the fields of the object's table(s) and/or related objects' tables that are in the database.
But you could return the tuple (query, order_func) from your code. In that case the query can still be extended until it is executed, and then re-sorted. Or you could create a small proxy-like class that holds this tuple, delegates the query-extension methods to the query, and for the query-execution methods (all(), __iter__, ...) executes the query and applies the ordering to the result.
Also, if you could calculate the value for each MyObject instance beforehand, you could add a literal column with those values to the query and then use order_by to order by it. Alternatively, add a temporary table, fill it with rows containing the computed ordering values, join on it in the query, and add the ordering. But I guess these add more complexity than the benefit they bring.
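A minimal sketch of the proxy idea, assuming MyObject is a mapped class, session is a SQLAlchemy session, and the ranking callable is backed by the external search engine; all of these names are placeholders:
class RankedQuery:
    """Wraps a Query so it stays extensible; applies external ordering on execution."""

    def __init__(self, query, rank_key):
        self._query = query
        self._rank_key = rank_key          # e.g. lambda obj: scores[obj.id]

    def filter(self, *criteria):
        # delegate query-extension methods, returning a new wrapper
        return RankedQuery(self._query.filter(*criteria), self._rank_key)

    def all(self):
        # execute the underlying query, then apply the external ordering
        return sorted(self._query.all(), key=self._rank_key)

    def __iter__(self):
        return iter(self.all())

# usage:
# results = RankedQuery(session.query(MyObject), lambda o: scores[o.id])
# results = results.filter(MyObject.some_column == some_value)
# ordered = results.all()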
