python: sort list with multiple values

I have a list of CrewRecords objects:
crew_record = [<CrewRecords instance at 0x617bb48>,
               <CrewRecords instance at 0x617b9e0>,
               <CrewRecords instance at 0x5755680>]
where:
class CrewRecords():
    def __init__(self):
        self.crew_id = None
        self.crew_date_of_hire = None
        self.crew_points = None

    def crew_attributes(self, crew_bag):
        '''populate the values of crew with some values'''
        self.crew_id = crew_bag.crew.id()
        self.crew_date_of_hire = crew_bag.crew.date_of_hire()
        self.crew_points = crew_bag.crew_points()
Now I want to write a function in Python that takes three arguments and sorts the list by the preference order provided, i.e. the user chooses which fields to sort by and in what order:
points, date_of_hire, id
points, id, date_of_hire
date_of_hire, points, id
etc., and the list is sorted based on that input.
The function should then sort accordingly, i.e. if the first option is chosen, sort all crew by points; if two crew members have the same points, sort them by date_of_hire; if date_of_hire is also the same, sort by id.
Also, if the sort options grow later (for example, the user wants to sort by name as well), we should be able to extend the sort criteria easily.

Use the key keyword argument to sorted() together with operator.attrgetter, i.e.
from operator import attrgetter
return sorted(crew_record, key=attrgetter('crew_points', 'crew_date_of_hire', 'crew_id'))
would solve your first point.
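Since attrgetter accepts any number of attribute names, the same idea extends to a user-chosen order. A minimal sketch (the option-to-attribute mapping below is an assumption about how the user input is spelled):

from operator import attrgetter

# hypothetical mapping from user-facing option names to attribute names
OPTION_TO_ATTR = {
    'points': 'crew_points',
    'date_of_hire': 'crew_date_of_hire',
    'id': 'crew_id',
}

def sort_crew(crew_records, preferences):
    """Sort crew_records by the fields named in preferences, in that order.

    preferences is e.g. ['points', 'date_of_hire', 'id']; adding a new
    criterion later only requires a new entry in OPTION_TO_ATTR.
    """
    attrs = [OPTION_TO_ATTR[p] for p in preferences]
    return sorted(crew_records, key=attrgetter(*attrs))

# usage: sorted_crew = sort_crew(crew_record, ['points', 'date_of_hire', 'id'])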


Python, django filter by kwargs or list, inclusive output

I want to get the account Ids that are associated with a determined list of ids. Currently I filter by exactly one id, and I would like to input various Ids so I can get a wider result.
My code:
from typing import List
from project import models

def get_followers_ids(system_id) -> List[int]:
    return list(models.Mapper.objects.filter(system_id__id=system_id
                ).values_list('account__id', flat=True))
If I run the code, I get the Ids associated with the main ID; the output will be a list of ids related to the main one (let's say, "with connection to"):
Example use:
system_id = 12350
utility_ids = get_followers_ids(system_id)
print(utility_ids)
output:
>>> [14338, 14339, 14341, 14343, 14344, 14346, 14347, 14348, 14349, 14350, 14351]
But I would like to input more values without falling back to a for loop, which would be slow because it would make many requests to the server.
The input I would like to use is a list (or similar), so that several arguments can be passed at a time.
And the output should be a combined list of relations (with the fewest possible requests to the DB). For example:
if id=1 is related to [3,4,5,6]
and if id=2 is related to [5,6,7,8]
The output should be [3,4,5,6,7,8]
You can use Django's field lookups; in your case, use the "in" lookup:
# system_ids is now a list
def get_followers_ids(system_ids) -> List[int]:
    # add "__in" to the end of the filter field
    return list(models.Mapper.objects.filter(system_id__id__in=system_ids
                ).values_list('account__id', flat=True))

system_ids = [12350, 666]
utility_ids = get_followers_ids(system_ids)
print(utility_ids)
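Note that if several system ids point to some of the same accounts, the query above will return duplicate ids, while the expected output in the question is de-duplicated. One possible refinement, assuming duplicates should be dropped, is to add distinct():

def get_followers_ids(system_ids) -> List[int]:
    return list(models.Mapper.objects.filter(system_id__id__in=system_ids
                ).values_list('account__id', flat=True).distinct())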

use custom comparator for a specific set

I am storing a number of objects in a set. Is there a way to override the comparator function used just for that set? I know I can override __eq__ and friends but I don't want to do so as I am also storing those objects in other sets.
Illustration:
# suppose Person class has name and address fields
p1 = Person("Alice", "addr1")
p2 = Person("Alice", "addr2")
s1 = set(p1, p2, [equality based on name only]) # this should contain only one of p1 or p2
s2 = set(p1, p2) # this should contain p1 and p2
set doesn't provide a way to determine whether two objects are equivalent; it leaves that up to the class.
However, you can group the objects by an arbitrary predicate, then construct an appropriate sequence to pass to set.
Here's a solution using itertools.groupby:
from itertools import groupby

def get_name(p):
    return p.name  # or however you get the name of a Person instance

s1 = set(next(v) for _, v in groupby(sorted([p1, p2], key=get_name), get_name))
After sorting by name, groupby will put all Persons with the same name in a single group. Iterating over the resulting sequence yields tuples like ("Alice", <sequence of Person...>). You can ignore the key, and just call next on the sequence to get an object with the name Alice.
Note that if the input is not sorted by the grouping key, "equal" elements can still end up in different groups; sorting first (as above) avoids that, and set will discard any remaining duplicates as usual.
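For reference, a self-contained sketch of the groupby approach; the Person class here is illustrative (the question does not show its definition), assuming only name and address attributes:

from itertools import groupby

class Person:
    def __init__(self, name, address):
        self.name = name
        self.address = address
    def __repr__(self):
        return f"Person({self.name!r}, {self.address!r})"

def get_name(p):
    return p.name

p1 = Person("Alice", "addr1")
p2 = Person("Alice", "addr2")

# one representative per name; the class's own equality is left untouched
s1 = set(next(group) for _, group in groupby(sorted([p1, p2], key=get_name), get_name))
s2 = {p1, p2}

print(s1)  # only one of p1 / p2
print(s2)  # both p1 and p2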
You can do this using a dictionary (which uses the same hashing machinery as a set under the hood):
# Just to simulate a class, not really necessary
import collections
Person = collections.namedtuple('Person', ('name', 'address'))
people = [Person("Alice", "addr1"), Person("Alice", "addr2"), Person("Bob", "addr1")]
s = set({person.name: person for person in people}.values())
print(s)
# Output: {Person(name='Bob', address='addr1'), Person(name='Alice', address='addr2')}

Python: Sorting ip ranges which are dictionary keys

I have a dictionary which has IP address ranges as Keys (used to de-duplicate in a previous step) and certain objects as values. Here's an example
Part of the dictionary sresult:
10.102.152.64-10.102.152.95 object1:object3
10.102.158.0-10.102.158.255 object2:object5:object4
10.102.158.0-10.102.158.31 object3:object4
10.102.159.0-10.102.255.255 object6
There are tens of thousands of lines, and I want to sort them (correctly) by the IP address in the keys.
I tried splitting the key based on the range separator - to get a single IP address that can be sorted as follows:
ips = {}
for key in sresult:
    if '-' in key:
        l = key.split('-')[0]
        ips[l] = key
    else:
        ips[key] = key
And then using code found on another post, sorting by IP address and then looking up the values in the original dictionary:
import ipaddress

sips = sorted(ipaddress.ip_address(line.strip()) for line in ips)
for x in sips:
    print("SRC: " + ips[str(x)], "OBJECT: " + " :".join(list(set(sresult[ips[str(x)]]))), sep=",")
The problem I have encountered is that when I split the original range and add the sorted first IPs as new keys in another dictionary, I de-duplicate again, losing lines of data (lines 2 & 3 in the example):
line 1 10.102.152.64-10.102.152.95
line 2 10.102.158.0-10.102.158.255
line 3 10.102.158.0-10.102.158.31
line 4 10.102.159.0-10.102.255.255
becomes
line 1 10.102.152.64-10.102.152.95
line 3 10.102.158.0-10.102.158.31
line 4 10.102.159.0-10.102.255.255
So upon rebuilding the original dictionary using the IP address sorted keys, I have lost data
Can anyone help please?
EDIT This post now consists of three parts:
1) A bit of information about dictionaries that you will need in order to understand the rest.
2) An analysis of your code, and how you could fix it without using any other Python features.
3) What I would consider the best solution to the problem, in detail.
1) Dictionaries
Python dictionaries are not ordered. If I have a dictionary like this:
dictionary = {"one": 1, "two": 2}
If I loop through dictionary.items(), I could get "one": 1 first, or I could get "two": 2 first. I don't know.
Every Python dictionary implicitly has two lists associated with it: a list of its keys and a list of its values. You can get them like this:
print(list(dictionary.keys()))
print(list(dictionary.values()))
These lists do have an ordering, so they can be sorted. Doing so won't change the original dictionary, of course.
2) Your Code
What you realised is that in your case you only want to sort according to the first IP address in your dictionary's keys. Therefore, the strategy you adopted is roughly as follows:
1) Build a new dictionary, where the keys are only this first part.
2) Get that list of keys from the dictionary.
3) Sort that list of keys.
4) Query the original dictionary for the values.
This approach will, as you noticed, fail at step 1: as soon as you make the new dictionary with truncated keys, you lose the ability to differentiate between keys that only differed at the end, because every dictionary key must be unique.
A better strategy would be:
1) Build a function which can represent your "full" IP addresses as an ip_address object.
2) Sort the list of dictionary keys (original dictionary, don't make a new one).
3) Query the dictionary in order.
Let's look at how we could change your code to implement step 1.
def represent(full_ip):
    # Stylistic note: never use o or l as variable names.
    # They look just like 0 and 1.
    if '-' in full_ip:
        first_part = full_ip.split('-')[0]
        return ipaddress.ip_address(first_part.strip())
    # keys without a range separator are already single addresses
    return ipaddress.ip_address(full_ip.strip())
Now that we have a way to represent the full IP addresses, we can sort them according to this shortened version, without having to actually change the keys at all. All we have to do is tell Python's sorted function how we want each key to be represented, using the key parameter (NB, this key parameter has nothing to do with a dictionary key; they just both happen to be called key):
# Another stylistic note: always use .keys() when looping over dictionary keys. Explicit is better than implicit.
sips = sorted(sresult.keys(), key=represent)
And if the ipaddress library works as expected, there should be no problems up to here. The remainder of your code you can use as is.
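Putting part 2 together, a minimal end-to-end sketch (assuming sresult is the dictionary from the question, mapping range strings to objects):

import ipaddress

def represent(full_ip):
    # sort key: the first address of a range, or the address itself
    first_part = full_ip.split('-')[0] if '-' in full_ip else full_ip
    return ipaddress.ip_address(first_part.strip())

for key in sorted(sresult.keys(), key=represent):
    print(key, sresult[key])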
3) The best solution
Whenever you are dealing with sorting something, it's always easiest to think about a much simpler problem: given two items, how would I compare them? Python gives us a way to do this. What we have to do is implement two data model methods, __lt__ and __eq__ (sort and sorted only need the "less than" comparison). Let's try doing that:
class IPAddress:
    def __init__(self, ip_address):
        self.ip_address = ip_address  # This will be the full IP address range string

    def __lt__(self, other):
        """Is this object less than the other one?"""
        # First, let's find the first parts of the ip addresses
        this_first_ip = self.ip_address.split("-")[0]
        other_first_ip = other.ip_address.split("-")[0]
        # Now let's hand them to the external library
        this_object = ipaddress.ip_address(this_first_ip)
        other_object = ipaddress.ip_address(other_first_ip)
        return this_object < other_object

    def __eq__(self, other):
        """Are the two objects equal?"""
        return self.ip_address == other.ip_address
Cool, we have a class. Now, the data model methods will automatically be invoked any time I use "<" or "==". Let's check that it is working:
test_ip_1 = IPAddress("10.102.152.64-10.102.152.95")
test_ip_2 = IPAddress("10.102.158.0-10.102.158.255")
print(test_ip_1 < test_ip_2)
Now, the beauty of these data model methods is that Python's "sort" and "sorted" will use them as well:
dictionary_keys = sresult.keys()
dictionary_key_objects = [IPAddress(key) for key in dictionary_keys]
sorted_dictionary_key_objects = sorted(dictionary_key_objects)
# According to your latest comment, the line below is what you are missing
sorted_dictionary_keys = [obj.ip_address for obj in sorted_dictionary_key_objects]
And now you can do:
for key in sorted_dictionary_keys:
    print(key)
    print(sresult[key])
The Python data model is almost the defining feature of Python. I'd recommend reading about it.
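As a side note, if you also want the other comparisons (<=, >, >=) beyond what sorting needs, functools.total_ordering can derive them from __eq__ and __lt__. A small sketch along the lines of the class above:

import functools
import ipaddress

@functools.total_ordering
class IPAddress:
    def __init__(self, ip_address):
        self.ip_address = ip_address  # the full "start-end" range string

    def __eq__(self, other):
        return self.ip_address == other.ip_address

    def __lt__(self, other):
        this_first = ipaddress.ip_address(self.ip_address.split("-")[0])
        other_first = ipaddress.ip_address(other.ip_address.split("-")[0])
        return this_first < other_first

# total_ordering fills in <=, >, and >= from __eq__ and __lt__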

Querying objects using attribute of member of many-to-many

I have the following models:
class Member(models.Model):
    ref = models.CharField(max_length=200)
    # some other stuff

    def __str__(self):
        return self.ref

class Feature(models.Model):
    feature_id = models.BigIntegerField(default=0)
    members = models.ManyToManyField(Member)
    # some other stuff
A Member is basically just a pointer to a Feature. So let's say I have Features:
feature_id = 2, members = 1, 2
feature_id = 4
feature_id = 3
Then the members would be:
id = 1, ref = 4
id = 2, ref = 3
I want to find all of the Features which contain one or more Members from a list of "ok members." Currently my query looks like this:
# ndtmp is a query set of member-less Features which Members can point to
sids = [str(i) for i in list(ndtmp.values('feature_id'))]
# now make a query set that contains all rels and ways with at least one member with an id in sids
okmems = Member.objects.filter(ref__in=sids)
relsways = Feature.geoobjects.filter(members__in=okmems)
# now combine with nodes
op = relsways | ndtmp
This is enormously slow, and I'm not even sure if it's working. I've tried using print statements to debug, just to make sure anything is actually being parsed, and I get the following:
print(ndtmp.count())
>>> 12747
print(len(sids))
>>> 12747
print(okmems.count())
... and then the code just hangs for minutes, and eventually I quit it. I think that I just overcomplicated the query, but I'm not sure how best to simplify it. Should I:
Migrate Feature to use a CharField instead of a BigIntegerField? There is no real reason for me to use a BigIntegerField, I just did so because I was following a tutorial when I began this project. I tried a simple migration by just changing it in models.py and I got a "numeric" value in the column in PostgreSQL with format 'Decimal:( the id )', but there's probably some way around that that would force it to just shove the id into a string.
Use some feature of Many-To-Many Fields which I don't know about to more efficiently check for matches
Calculate the bounding box of each Feature and store it in another column so that I don't have to do this calculation every time I query the database (so just the single fixed cost of calculation upon Migration + the cost of calculating whenever I add a new Feature or modify an existing one)?
Or something else? In case it helps, this is for a server-side script for an ongoing OpenStreetMap related project of mine, and you can see the work in progress here.
EDIT - I think a much faster way to get ndids is like this:
ndids = ndtmp.values_list('feature_id', flat=True)
This works, producing a non-empty set of ids.
Unfortunately, I am still at a loss as to how to get okmems. I tried:
okmems = Member.objects.filter(ref__in=str(ndids))
But it returns an empty query set. And I can confirm that the ref points are correct, via the following test:
Member.objects.values('ref')[:1]
>>> [{'ref': '2286047272'}]
Feature.objects.filter(feature_id='2286047272').values('feature_id')[:1]
>>> [{'feature_id': '2286047272'}]
You should take a look at annotate:
okmems = Member.objects.annotate(
    feat_count=models.Count('feature')).filter(feat_count__gte=1)
relsways = Feature.geoobjects.filter(members__in=okmems)
Ultimately, I was wrong to set up the database using a numeric id in one table and a text-type id in the other. I am not very familiar with migrations yet, but at some point I'll have to take a deep dive into that world and figure out how to migrate my database to use numerics on both. For now, this works:
# ndtmp is a query set of member-less Features which Members can point to
# get the unique ids from ndtmp as strings
strids = ndtmp.extra({'feature_id_str': "CAST( \
    feature_id AS VARCHAR)"}).order_by( \
    '-feature_id_str').values_list('feature_id_str', flat=True).distinct()
# find all members whose ref values can be found in strids
okmems = Member.objects.filter(ref__in=strids)
# find all features containing one or more members in the accepted members list
relsways = Feature.geoobjects.filter(members__in=okmems)
# combine that with my existing list of allowed member-less features
op = relsways | ndtmp
# prove that this set is not empty
op.count()
# takes about 10 seconds
>>> 8997148  # looks like it worked!
Basically, I am making a query set of feature_ids (numerics) and casting it to be a query set of text-type (varchar) field values. I am then using values_list to make it only contain these string id values, and then I am finding all of the members whose ref ids are in that list of allowed Features. Now I know which members are allowed, so I can select all the Features which contain one or more members in that allowed list. Finally, I combine this query set of allowed Features which contain members with ndtmp, my original query set of allowed Features which do not contain members.
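As an aside, on Django 1.10+ the string cast can also be done without .extra(), using the Cast database function. A rough sketch of the same step, untested against this particular schema:

from django.db.models import CharField
from django.db.models.functions import Cast

strids = ndtmp.annotate(
    feature_id_str=Cast('feature_id', output_field=CharField())
).values_list('feature_id_str', flat=True).distinct()
okmems = Member.objects.filter(ref__in=strids)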

Python data structure recommendation?

I currently have a structure that is a dict: each value is a list that contains numeric values, and each key is what (to borrow a SQL idiom) you could call a primary key made up of three values: a year, a player identifier, and a team identifier. This tuple is the key for the dict.
So you can get a unique row by passing in a value for the year, player ID, and team ID, like so:
statline = stats[(2001, 'SEA', 'suzukic01')]
Which yields something like
[305, 20, 444, 330, 45]
I'd like to alter this data structure so it can be quickly summed by any of these three keys: so you could easily slice the totals for a given index in the numeric lists by passing in ONE of year, player ID, or team ID, and then the index. I want to be able to do something like
hr_total = stats[year=2001, idx=3]
Where that idx of 3 corresponds to the third column in the numeric list(s) that would be retrieved.
Any ideas?
Read up on Data Warehousing. Any book.
Read up on Star Schema Design. Any book. Seriously.
You have several dimensions: Year, Player, Team.
You have one fact: score
You want a structure like this: a set of dimension indexes plus a fact table. The dimension indexes can be created like this:
import collections

years = collections.defaultdict(list)
players = collections.defaultdict(list)
teams = collections.defaultdict(list)
Your fact table can be a simple class (or a collections.namedtuple). You can use something like this:
class ScoreFact(object):
    def __init__(self, year, player, team, score):
        self.year = year
        self.player = player
        self.team = team
        self.score = score
        years[self.year].append(self)
        players[self.player].append(self)
        teams[self.team].append(self)
Now you can find all items in a given dimension value. It's a simple list attached to a dimension value.
years['2001'] are all scores for the given year.
players['suzukic01'] are all scores for the given player, teams['SEA'] for the given team,
etc. You can simply use sum() to add them up. A multi-dimensional query is something like this:
[x for x in teams['SEA'] if x.year == '2001']
Put your data into SQLite, and use its relational engine to do the work. You can create an in-memory database and not even have to touch the disk.
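A minimal sketch of that idea with the standard library's sqlite3 module; the table layout, column names, and sample values here are made up for illustration:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (year INTEGER, team TEXT, player TEXT, hr INTEGER)")
conn.executemany("INSERT INTO stats VALUES (?, ?, ?, ?)",
                 [(2001, 'SEA', 'suzukic01', 8), (2001, 'SEA', 'olerujo01', 21)])

# total of one column for a given year, regardless of team or player
total, = conn.execute("SELECT SUM(hr) FROM stats WHERE year = ?", (2001,)).fetchone()
print(total)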
The syntax stats[year=2001, idx=3] is invalid Python and there is no way you can make it work with those square brackets and "keyword arguments"; you'll need to have a function or method call in order to accept keyword arguments.
So, say we make it a function, to be called like wells(stats, year=2001, idx=3). I imagine the idx argument is mandatory (which is peculiar given the call, but you give no indication of what it could possibly mean to omit idx) and that exactly one of year, playerid, and teamid must be given.
With your current data structure, wells can already be implemented:
def wells(stats, year=None, playerid=None, teamid=None, idx=None):
    if idx is None:
        raise ValueError('idx must be specified')
    specifiers = [(i, x) for i, x in enumerate((year, playerid, teamid)) if x is not None]
    if len(specifiers) != 1:
        raise ValueError('Exactly one of year, playerid, teamid must be given')
    ikey, keyv = specifiers[0]
    return sum(v[idx] for k, v in stats.items() if k[ikey] == keyv)
Of course, this is O(N) in the size of stats -- it must examine every entry in it. Please measure correctness and performance with this simple implementation as a baseline. An alternative solution (much speedier in use, but requiring more preparation time) is to keep three dicts of lists (one each for year, playerid, and teamid) alongside stats, each entry indicating (or copying, but I think indicating by full key may suffice) all entries of stats that match that ikey / keyv pair. But it's not clear at this time whether that optimization may not be premature, so please try the simple-minded idea first!-)
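A rough sketch of that precomputed-index alternative, assuming the same (year, playerid, teamid) key order as wells above:

from collections import defaultdict

def build_indexes(stats):
    # one index per dimension, mapping a value to the full keys that contain it
    by_year, by_player, by_team = defaultdict(list), defaultdict(list), defaultdict(list)
    for (year, playerid, teamid) in stats:
        by_year[year].append((year, playerid, teamid))
        by_player[playerid].append((year, playerid, teamid))
        by_team[teamid].append((year, playerid, teamid))
    return by_year, by_player, by_team

def fast_sum(stats, index, value, idx):
    # index is one of the three dicts returned by build_indexes
    return sum(stats[key][idx] for key in index[value])

# usage:
# by_year, by_player, by_team = build_indexes(stats)
# hr_total = fast_sum(stats, by_year, 2001, 3)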
def getSum(d, year, idx):
    total = 0
    for key in d.keys():
        if key[0] == year:
            total += d[key][idx]
    return total
This should get you started. I have made the assumption in this code that ONLY year will be asked for, but it should be easy enough for you to adapt it to check for the other parameters as well.
Cheers
