Use a custom comparator for a specific set - Python

I am storing a number of objects in a set. Is there a way to override the comparator function used just for that set? I know I can override __eq__ and friends but I don't want to do so as I am also storing those objects in other sets.
Illustration:
# suppose Person class has name and address fields
p1 = Person("Alice", "addr1")
p2 = Person("Alice", "addr2")
s1 = set([p1, p2])  # with equality based on name only, this should contain just one of p1/p2
s2 = set([p1, p2])  # this should contain both p1 and p2

set doesn't provide a way to determine whether two objects are equivalent; it leaves that up to the class.
However, you can group the objects by an arbitrary predicate, then construct an appropriate sequence to pass to set.
Here's a solution using itertools.groupby:
from itertools import groupby

def get_name(p):
    return p.name  # or however you get the name of a Person instance

s1 = set(next(v) for _, v in groupby(sorted([p1, p2], key=get_name), get_name))
After sorting by name, groupby puts all Persons with the same name into a single group. Iterating over the result yields tuples like ("Alice", <iterator of Person...>). You can ignore the key and call next on the iterator to take one object from each group.
Note that if the input is not sorted by the grouping key, "equal" elements can still end up in different groups; set will then discard any remaining duplicates as usual.
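The groupby approach can be run end-to-end with a minimal, hypothetical Person class standing in for the one in the question:

```python
# Runnable sketch of the groupby de-duplication; the Person class here is
# a minimal stand-in for the question's class.
from itertools import groupby

class Person:
    def __init__(self, name, address):
        self.name = name
        self.address = address

def get_name(p):
    return p.name

p1, p2 = Person("Alice", "addr1"), Person("Alice", "addr2")

# sort so groupby sees equal names adjacently, then take one Person per group
s1 = set(next(v) for _, v in groupby(sorted([p1, p2], key=get_name), get_name))

# default equality is by identity, so the plain set keeps both objects
s2 = set([p1, p2])
```

Since the two objects compare unequal by default, `s2` keeps both of them, while `s1` keeps only one Person per name.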

You can do this using a dictionary (which uses a hash table under the hood, much like a set):
# Just to simulate a class, not really necessary
import collections
Person = collections.namedtuple('Person', ('name', 'address'))
people = [Person("Alice", "addr1"), Person("Alice", "addr2"), Person("Bob", "addr1")]
s = set({person.name: person for person in people}.values())
print(s)
# Output: {Person(name='Bob', address='addr1'), Person(name='Alice', address='addr2')}
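The dict trick generalizes into a small reusable helper (`unique_by` is my own, hypothetical name):

```python
# Hypothetical helper generalizing the dict-based de-duplication above.
import collections

Person = collections.namedtuple('Person', ('name', 'address'))

def unique_by(items, key):
    """Keep one item per key value; later items win, per dict insertion order."""
    return list({key(x): x for x in items}.values())

people = [Person("Alice", "addr1"), Person("Alice", "addr2"), Person("Bob", "addr1")]
deduped = unique_by(people, key=lambda p: p.name)
# deduped == [Person('Alice', 'addr2'), Person('Bob', 'addr1')]
```

Because later items overwrite earlier ones in the comprehension, the second "Alice" survives; wrap the keys in any callable you like to change the equivalence.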

Related

How can I generate a unique key from 2 strings?

Let's say I have "username1" and "username2"
How can I combine these two usernames together to generate a unique value?
The value should be the same regardless of order: whether username2 is entered first or vice versa, the two names should always combine into the same unique value. The string length will not work, as other usernames can have the same length.
Is there a simple way to do this or a technique for this?
Sets are unordered, and Python has a hashable immutable set type, frozenset. Assuming your requirement for a unique key is that it can be used as a dict key:
def key(a, b):
    return frozenset([a, b])

d = {}
d[key("foo", "bar")] = "baz"
print(d[key("bar", "foo")])  # baz
You can also create a sorted tuple:
def key(a, b):
    return tuple(sorted([a, b]))

d = {}
d[key("foo", "bar")] = "baz"
print(d[key("bar", "foo")])  # baz
There are many ways to generate a unique value. One common method is to derive a UUID (Universally Unique IDentifier) with the routines supplied by most major languages. In case of parsing problems, use a separator that does not appear in your input strings.
If you replace my constant strings "username1" and "username2" with your variables, I believe that your problem is solved.
import uuid

unique_sep = '|'  # I posit vertical bar as not appearing in any user name
unique_ID = uuid.uuid5(uuid.NAMESPACE_X500, "username1" + unique_sep + "username2")
print(unique_ID)
Output:
b2106742-94f5-596d-a461-c977b5982d85
To make the result independent of entry order, simply sort the names in any convenient fashion, such as:
names_in = [username1, username2]
unique_ID = uuid.uuid5(uuid.NAMESPACE_X500, min(names_in) + unique_sep + max(names_in))
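Putting both ideas together, a small helper (`pair_uuid` is my own name) that sorts before hashing gives an order-independent key:

```python
# Order-independent UUID for a pair of usernames; sorting before joining
# guarantees key(a, b) == key(b, a).
import uuid

unique_sep = '|'  # assumed not to appear in any username

def pair_uuid(a, b):
    lo, hi = sorted([a, b])
    return uuid.uuid5(uuid.NAMESPACE_X500, lo + unique_sep + hi)

# same value regardless of argument order
print(pair_uuid("username1", "username2") == pair_uuid("username2", "username1"))
```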

Querying objects using attribute of member of many-to-many

I have the following models:
class Member(models.Model):
    ref = models.CharField(max_length=200)
    # some other stuff

    def __str__(self):
        return self.ref

class Feature(models.Model):
    feature_id = models.BigIntegerField(default=0)
    members = models.ManyToManyField(Member)
    # some other stuff
A Member is basically just a pointer to a Feature. So let's say I have Features:
feature_id = 2, members = 1, 2
feature_id = 4
feature_id = 3
Then the members would be:
id = 1, ref = 4
id = 2, ref = 3
I want to find all of the Features which contain one or more Members from a list of "ok members." Currently my query looks like this:
# ndtmp is a query set of member-less Features which Members can point to
sids = [str(i) for i in list(ndtmp.values('feature_id'))]
# now make a query set that contains all rels and ways with at least one member with an id in sids
okmems = Member.objects.filter(ref__in=sids)
relsways = Feature.geoobjects.filter(members__in=okmems)
# now combine with nodes
op = relsways | ndtmp
This is enormously slow, and I'm not even sure if it's working. I've tried using print statements to debug, just to make sure anything is actually being parsed, and I get the following:
print(ndtmp.count())
>>> 12747
print(len(sids))
>>> 12747
print(okmems.count())
... and then the code just hangs for minutes, and eventually I quit it. I think that I just overcomplicated the query, but I'm not sure how best to simplify it. Should I:
Migrate Feature to use a CharField instead of a BigIntegerField? There is no real reason for me to use a BigIntegerField, I just did so because I was following a tutorial when I began this project. I tried a simple migration by just changing it in models.py and I got a "numeric" value in the column in PostgreSQL with format 'Decimal:( the id )', but there's probably some way around that that would force it to just shove the id into a string.
Use some feature of many-to-many fields which I don't know about to more efficiently check for matches
Calculate the bounding box of each Feature and store it in another column so that I don't have to do this calculation every time I query the database (so just the single fixed cost of calculation upon Migration + the cost of calculating whenever I add a new Feature or modify an existing one)?
Or something else? In case it helps, this is for a server-side script for an ongoing OpenStreetMap related project of mine, and you can see the work in progress here.
EDIT - I think a much faster way to get ndids is like this:
ndids = ndtmp.values_list('feature_id', flat=True)
This works, producing a non-empty set of ids.
Unfortunately, I am still at a loss as to how to get okmems. I tried:
okmems = Member.objects.filter(ref__in=str(ndids))
But it returns an empty query set. And I can confirm that the ref points are correct, via the following test:
Member.objects.values('ref')[:1]
>>> [{'ref': '2286047272'}]
Feature.objects.filter(feature_id='2286047272').values('feature_id')[:1]
>>> [{'feature_id': '2286047272'}]
You should take a look at annotate:
okmems = Member.objects.annotate(
    feat_count=models.Count('feature')).filter(feat_count__gte=1)
relsways = Feature.geoobjects.filter(members__in=okmems)
Ultimately, I was wrong to set up the database using a numeric id in one table and a text-type id in the other. I am not very familiar with migrations yet, but at some point I'll have to take a deep dive into that world and figure out how to migrate my database to use numerics in both. For now, this works:
# ndtmp is a query set of member-less Features which Members can point to
# get the unique ids from ndtmp as strings
strids = ndtmp.extra(
    {'feature_id_str': "CAST(feature_id AS VARCHAR)"}
).order_by('-feature_id_str').values_list('feature_id_str', flat=True).distinct()
# find all members whose ref values can be found in strids
okmems = Member.objects.filter(ref__in=strids)
# find all features containing one or more members in the accepted members list
relsways = Feature.geoobjects.filter(members__in=okmems)
# combine that with my existing list of allowed member-less features
op = relsways | ndtmp
# prove that this set is not empty
op.count()
# takes about 10 seconds
>>> 8997148 # looks like it worked!
Basically, I am making a query set of feature_ids (numerics) and casting it to be a query set of text-type (varchar) field values. I am then using values_list to make it only contain these string id values, and then I am finding all of the members whose ref ids are in that list of allowed Features. Now I know which members are allowed, so I can filter out all the Features which contain one or more members in that allowed list. Finally, I combine this query set of allowed Features which contain members with ndtmp, my original query set of allowed Features which do not contain members.
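As an aside, the earlier failing attempt `ref__in=str(ndids)` can be explained in plain Python: `str()` of a list is one single string, not a list of strings, so the `IN` clause never matches any ref. A minimal illustration (the id values are made up):

```python
# Why ref__in=str(ndids) matched nothing: str() of a list is a single string.
ndids = [2286047272, 2286047273]

bad = str(ndids)                 # '[2286047272, 2286047273]' -- one string
good = [str(i) for i in ndids]   # ['2286047272', '2286047273'] -- list of strings

print('2286047272' in good)   # True: whole-value match, like SQL IN
print('2286047272' in [bad])  # False: the single string never equals one ref
```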

SQLAlchemy: how to select if in one list or another list?

I have a class which looks something like this:
class A(Base):
    __tablename__ = "a"
    id = Column(Integer, primary_key=True)
    stuff = relationship('Stuff', secondary=stuff_a)
    more_stuff = relationship('Stuff', secondary=more_stuff_a)
Basically two lists, stuff and more_stuff containing lists of Stuff.
I want to do a query which selects all A which have Stuff with id=X in either stuff list or in more_stuff list.
This is how I would do it for one list:
session.query(A).join(Stuff).filter(Stuff.id==X)
But that won't pick up Stuff from more_stuff.
I think that if you have two relationships from A to Stuff, even when you join for one, you need to explicitly specify which one, or sqlalchemy will rightfully complain. You can do this as follows:
q = (
    session
    .query(A)
    .join(Stuff, A.stuff)  # note: here you specify the relationship
    .filter(Stuff.id == X)
)
To filter on both lists, you need to use an or_ operator in the filter. In order to reference both relationships, the easiest way is to create aliases (give different names) to each of them. The code then looks like this:
S1 = aliased(Stuff)
S2 = aliased(Stuff)
q = (
    session
    .query(A)
    .join(S1, A.stuff)       # S1 will refer to `A.stuff`
    .join(S2, A.more_stuff)  # S2 will refer to `A.more_stuff`
    .filter(or_(S1.id == X, S2.id == X))
)
Alternatively, cleaner code can be achieved with relationship.any():
q = (
    session
    .query(A)
    .filter(or_(
        A.stuff.any(Stuff.id == X),       # here Stuff will refer to `A.stuff`
        A.more_stuff.any(Stuff.id == X),  # here Stuff will refer to `A.more_stuff`
    ))
)
but you will need to compare the performance of the two versions, as the latter is implemented using EXISTS with sub-selects.
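For completeness, here is a self-contained sketch against in-memory SQLite (the association tables and sample data are my own inventions, shaped to match the question's `secondary=` names) showing that the `.any()` form matches an A through either list:

```python
# Minimal reproduction: A rows can match via `stuff` or via `more_stuff`.
from sqlalchemy import Column, ForeignKey, Integer, Table, create_engine, or_
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

stuff_a = Table('stuff_a', Base.metadata,
                Column('a_id', ForeignKey('a.id')),
                Column('stuff_id', ForeignKey('stuff.id')))
more_stuff_a = Table('more_stuff_a', Base.metadata,
                     Column('a_id', ForeignKey('a.id')),
                     Column('stuff_id', ForeignKey('stuff.id')))

class Stuff(Base):
    __tablename__ = 'stuff'
    id = Column(Integer, primary_key=True)

class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    stuff = relationship('Stuff', secondary=stuff_a)
    more_stuff = relationship('Stuff', secondary=more_stuff_a)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

s1, s2 = Stuff(id=1), Stuff(id=2)
session.add_all([A(id=1, stuff=[s1]),       # matches via `stuff`
                 A(id=2, more_stuff=[s1]),  # matches via `more_stuff`
                 A(id=3, stuff=[s2])])      # no Stuff with id=1 anywhere
session.commit()

X = 1
q = session.query(A).filter(or_(
    A.stuff.any(Stuff.id == X),
    A.more_stuff.any(Stuff.id == X),
))
ids = sorted(a.id for a in q)
# A(1) and A(2) are returned; A(3) is not
```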

python: sort list with multiple values

I have a list of CrewRecords objects.
crew_record = [<CrewRecords instance at 0x617bb48>,
               <CrewRecords instance at 0x617b9e0>,
               <CrewRecords instance at 0x5755680>]
where:
class CrewRecords():
    def __init__(self):
        self.crew_id = None
        self.crew_date_of_hire = None
        self.crew_points = None

    def crew_attributes(self, crew_bag):
        '''populate the values of crew with some values'''
        self.crew_id = crew_bag.crew.id()
        self.crew_date_of_hire = crew_bag.crew.date_of_hire()
        self.crew_points = crew_bag.crew_points()
Now, I want to write a function in Python which takes 3 arguments and sorts the list by the preferences provided, i.e.
if the user inputs the values to sort by:
options:
points, date_of_hire, id
points, id, date_of_hire
date_of_hire, points, id
etc.. sort based on user input.
then the function should sort accordingly, i.e.
if the 1st option is chosen, sort all crew by points; if 2 crews have the same points, sort by date_of_hire; if date_of_hire is also the same, sort by id.
Also, later if the sort options increases like if the user wants to sort by some extra option (by name for example) we should also be able to easily extend the sort criteria.
Use the key keyword argument to sorted with operator.attrgetter, i.e.
from operator import attrgetter

return sorted(crew_record, key=attrgetter('crew_points', 'crew_date_of_hire', 'crew_id'))
would solve your first point.
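For the user-selected ordering and later extensibility, you can map option names to attribute names and build the attrgetter dynamically. In this sketch, the class shape and field names mirror the question, while FIELD_MAP and sort_crew are my own, hypothetical names:

```python
# Extensible multi-key sort driven by user preference order.
from operator import attrgetter

class CrewRecords:
    def __init__(self, crew_id, date_of_hire, points):
        self.crew_id = crew_id
        self.crew_date_of_hire = date_of_hire
        self.crew_points = points

# map user-facing option names to attribute names; add entries here to
# extend the criteria later (e.g. 'name': 'crew_name')
FIELD_MAP = {
    'id': 'crew_id',
    'date_of_hire': 'crew_date_of_hire',
    'points': 'crew_points',
}

def sort_crew(records, *preferences):
    """Sort by the given preference names, in order of priority."""
    return sorted(records, key=attrgetter(*(FIELD_MAP[p] for p in preferences)))

crew = [
    CrewRecords(2, '2010-01-01', 5),
    CrewRecords(1, '2009-06-15', 5),
    CrewRecords(3, '2012-03-20', 7),
]
result = sort_crew(crew, 'points', 'date_of_hire', 'id')
# the points tie (5, 5) is broken by date_of_hire, so ids come out [1, 2, 3]
```

Because attrgetter accepts any number of attribute names, adding a new sort option is just one more FIELD_MAP entry.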

py2neo - How can I use merge_one function along with multiple attributes for my node?

I have overcome the problem of duplicate nodes being created in my DB by using the merge_one function, which works like this:
t=graph.merge_one("User","ID","someID")
which creates the node with unique ID. My problem is that I can't find a way to add multiple attributes/properties to my node along with the ID which is added automatically (date for example).
I have managed to achieve this the old "duplicate" way but it doesn't work now since merge_one can't accept more arguments! Any ideas???
Graph.merge_one only allows you to specify one key-value pair because it's meant to be used with a uniqueness constraint on a node label and property. Is there anything wrong with finding the node by its unique id with merge_one and then setting the properties?
t = graph.merge_one("User", "ID", "someID")
t['name'] = 'Nicole'
t['age'] = 23
t.push()
I know I am a bit late... but still useful I think
Using py2neo==2.0.7 and the docs (about Node.properties):
... and the latter is an instance of PropertySet which extends dict.
So the following worked for me:
m = graph.merge_one("Model", "mid", MID_SR)
m.properties.update({
    'vendor': "XX",
    'model': "XYZ",
    'software': "OS",
    'modelVersion': "",
    'hardware': "",
    'softwareVersion': "12.06"
})
graph.push(m)
graph.push(m)
This hacky function iterates through the labels and the property key/value pairs, gradually eliminating all nodes that don't match each criterion. The final result is a list of all (if any) nodes that match all the supplied properties and labels.
def find_multiProp(graph, *labels, **properties):
    genNodes = lambda l, k, v: graph.find(l, property_key=k, property_value=v)
    results = None
    for l in labels:
        for k, v in properties.items():
            if results is None:
                results = [r for r in genNodes(l, k, v)]
                continue
            prevResults = results
            results = [n for n in genNodes(l, k, v) if n in prevResults]
    return results
The final result can be used to assess uniqueness and (if empty) create a new node, by combining the two functions together...
def merge_one_multiProp(graph, *labels, **properties):
    r = find_multiProp(graph, *labels, **properties)
    if not r:
        # remove tuple association
        node, = graph.create(Node(*labels, **properties))
    else:
        node = r[0]
    return node
example...
from py2neo import Node, Graph

graph = Graph()
properties = {'p1': 'v1', 'p2': 'v2'}
labels = ('label1', 'label2')
graph.create(Node(*labels, **properties))
for l in labels:
    graph.create(Node(l, **properties))
graph.create(Node(*labels, p1='v1'))
node = merge_one_multiProp(graph, *labels, **properties)
