Append 2 querysets in Python without losing their order - python

I have two separate querysets, A and B, and I want to append B to A without imposing any ordering constraint.
Let's assume two querysets A and B:
A = (1, 5, 7, 15, 20)
B = (4, 6, 10, 14, 19, 21)
What I really want is:
Final_queryset = (1, 5, 7, 15, 20, 4, 6, 10, 14, 19, 21)
I just want to append B to A without specifying any order_by and without disturbing the existing order.
I cannot add an order_by constraint because it would disturb the order. I do not want to use a list because it loads all the objects into memory, and I have 50,000 to 60,000 objects.
Any idea how I can achieve this using only querysets in Python?

You want itertools.chain(A, B).
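A minimal sketch of what itertools.chain does, using plain generators as stand-ins for the two querysets (the function names and values below are illustrative, not part of the question's code). chain yields every item of the first iterable and then every item of the second, without building an intermediate list. The result is an iterator rather than a queryset, so queryset methods such as filter() are not available on it.

```python
from itertools import chain

def queryset_a():
    # Hypothetical stand-in for queryset A; items are produced lazily.
    yield from (1, 5, 7, 15, 20)

def queryset_b():
    # Hypothetical stand-in for queryset B.
    yield from (4, 6, 10, 14, 19, 21)

# chain consumes A to exhaustion, then B, preserving each one's own order.
combined = chain(queryset_a(), queryset_b())

# Materialized here only so the result can be displayed.
result = list(combined)
print(result)
```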


Generic function for consecutive element pairing by n given to a function with zip

I have created a generic function that processes consecutive groupings of length n from a given list of integers and passes them to a function. It works, but I very much dislike the eval in the function and don't know how to remove it while still using zip.
def consecutive_element_pairing(data: list[int], consecutive_element=3, map_to_func=sum) -> list[int]:
    """
    Return a list with consecutively paired items given to a function that can handle an iterable
    :param data: the list of integers to process
    :param consecutive_element: how many to group consecutively
    :param map_to_func: the function to give the groups to
    :return: the new list of consecutive grouped functioned items
    """
    if len(data) < consecutive_element:
        return []
    return list(map(map_to_func, eval("zip(%s)" % "".join((map(lambda x: "data[%d:], " % x, range(consecutive_element)))))))
Given a list such as:
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
and I call it like this:
print("result:", consecutive_element_pairing(values))
[6, 9, 12, 15, 18, 21, 24]
This is correct: it groups the values into consecutive groups of 3, ((1,2,3),(2,3,4),(3,4,5)...), and then sums each group.
The trouble I have with my code is the eval statement on the generated string of zip(data[0:], data[1:], data[2:], ).
I have no idea how to do this a different way as zip with a list inside does something completely different.
Can this be done differently while still using zip?
Any help is appreciated.
I know how to do this in many different ways but the challenge for myself was the usage of zip here :-) and making it a "generic" function.
You can simply use zip(*(values[i:] for i in range(N))):
Example
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
N = 3
list(zip(*(values[i:] for i in range(N))))
# [(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7), (6, 7, 8), (7, 8, 9)]
A slightly improved variant for long lists and large N might be:
zip(*(values[i:len(values)-(N-i)+1] for i in range(N)))
As a function:
def consecutive_element_pairing(data: list[int], consecutive_element=3, map_to_func=sum) -> list[int]:
    N = consecutive_element
    return list(map(map_to_func, zip(*(data[i:len(data)-(N-i)+1] for i in range(N)))))

consecutive_element_pairing(values)
# [6, 9, 12, 15, 18, 21, 24]
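If copying N slices of a long list is a concern, one possible variant keeps zip but feeds it itertools.islice iterators instead of slices. This is only a sketch: the name consecutive_groups and its parameters are made up for illustration, and it assumes data is a re-iterable sequence such as a list (a one-shot iterator would be consumed by the first islice).

```python
from itertools import islice

def consecutive_groups(data, n=3, func=sum):
    """Apply func to every run of n consecutive items in data."""
    # islice(data, i, None) skips the first i items lazily, so no
    # sliced copies of the list are created before zipping.
    shifted = (islice(data, i, None) for i in range(n))
    # zip stops at the shortest input, which drops incomplete groups.
    return [func(group) for group in zip(*shifted)]

values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(consecutive_groups(values))
```

Because zip stops at the shortest input, a list shorter than n simply produces an empty result, so no explicit length check is needed.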

Is there a way to find the nᵗʰ entry in itertools.combinations() without converting the entire thing to a list?

I am using the itertools library module in python.
I am interested in the different ways to choose 15 of the first 26000 positive integers. The function itertools.combinations(range(1,26000), 15) enumerates all of these possible subsets in lexicographical order.
The binomial coefficient 26000 choose 15 is a very large number, on the order of 10^54. However, python has no problem running the code y = itertools.combinations(range(1,26000), 15) as shown in the sixth line below.
If I try to do y[3] to find just the 3rd entry, I get a TypeError. This means I need to convert it into a list first. The problem is that trying to convert it into a list gives a MemoryError. All of this is shown in the screenshot above.
Converting it into a list does work for smaller combinations, like 6 choose 3, shown below.
My question is:
Is there a way to access specific elements in itertools.combinations() without converting it into a list?
I want to be able to access, say, the first 10000 of these ~10^54 enumerated 15-element subsets.
Any help is appreciated. Thank you!
You can use a generator expression:
comb = itertools.combinations(range(1,26000), 15)
comb1000 = (next(comb) for i in range(1000))
To jump directly to the nth combination, here is an itertools recipe:
def nth_combination(iterable, r, index):
    """Equivalent to list(combinations(iterable, r))[index]"""
    pool = tuple(iterable)
    n = len(pool)
    if r < 0 or r > n:
        raise ValueError
    c = 1
    k = min(r, n-r)
    for i in range(1, k+1):
        c = c * (n - k + i) // i
    if index < 0:
        index += c
    if index < 0 or index >= c:
        raise IndexError
    result = []
    while r:
        c, n, r = c*r//n, n-1, r-1
        while index >= c:
            index -= c
            c, n = c*(n-r)//n, n-1
        result.append(pool[-1-n])
    return tuple(result)
It's also available in more_itertools.nth_combination
>>> import more_itertools # pip install more-itertools
>>> more_itertools.nth_combination(range(1,26000), 15, 123456)
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 18, 19541)
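To instantly "fast-forward" a combinations instance to this position and continue iterating, you can set the state to the previously yielded state (note: 0-based state vector) and continue from there. This relies on the undocumented copy/pickle support of itertools objects, which has been deprecated since Python 3.12, so it may not be available in newer versions: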
>>> comb = itertools.combinations(range(1,26000), 15)
>>> comb.__setstate__((0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17, 19540))
>>> next(comb)
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 18, 19542)
If you want to access the first few elements, it's pretty straightforward with islice:
import itertools
print(list(itertools.islice(itertools.combinations(range(1,26000), 15), 1000)))
Note that islice internally iterates the combinations up to the specified point, so it can't magically give you the middle elements without iterating all the way there. You'd have to go down the route of computing the elements you want combinatorially in that case.
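For small indexes, the islice approach can also fetch a single element. The sketch below is illustrative only (nth_by_iteration is a made-up helper name): it still steps through every earlier combination, so it is practical only when the index is small, unlike the combinatorial recipe.

```python
from itertools import combinations, islice

def nth_by_iteration(iterable, r, index):
    """Return the combination at 0-based position index by stepping through the iterator."""
    # islice discards the first `index` combinations one at a time,
    # so only the current combination is held in memory.
    remaining = islice(combinations(iterable, r), index, None)
    try:
        return next(remaining)
    except StopIteration:
        # Fewer than index + 1 combinations exist.
        raise IndexError(index) from None

print(nth_by_iteration(range(1, 26000), 15, 3))
# (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 18)
```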

List intersection in Django ORM how to?

Let's say, I have two tables: all_my_friends_ids and my_facebook_friends_ids which represent two lists of my friends in database:
all_my_friends_ids = self.user.follows.values_list('pk', flat=True)
(e.g. all_my_friends_ids = [1, 4, 9, 16, 18, 20, 24, 70])
my_facebook_friends_ids = User.objects.filter(facebook_uid__in=my_facebook_friends_uids)
(e.g. my_facebook_friends_ids = [4, 16, 28, 44, 39])
I want to check whether every element of my_facebook_friends_ids has an entry in all_my_friends_ids and, if not, return the IDs that are missing from all_my_friends_ids (so I can add them to it later).
How can I solve this in the Django ORM with a QuerySet? I tried extracting the IDs and applying this function to them:
def sublistExists(list1, list2):
    return ''.join(map(str, list2)) in ''.join(map(str, list1))
but it doesn't seem like the right way, especially for my case.
Filter by the Facebook UIDs, then exclude the users you already follow (matching on pk, since all_my_friends_ids holds primary keys):
facebook_exclusives = (User.objects
    .filter(facebook_uid__in=my_facebook_friends_uids)
    .exclude(pk__in=all_my_friends_ids))
If you want, you can offload it to your database completely, without creating a (potentially huge) intermediate list in Python:
facebook_exclusives = (User.objects
    .filter(facebook_uid__in=my_facebook_friends_uids)
    .exclude(pk__in=self.user.follows.all()))
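For comparison, here is the same "in one list but not in the other" logic in plain Python, using the example IDs from the question. This is only a sketch for small in-memory lists (the variable names known, missing and all_present are illustrative); for large tables the ORM query keeps the work in the database.

```python
all_my_friends_ids = [1, 4, 9, 16, 18, 20, 24, 70]
my_facebook_friends_ids = [4, 16, 28, 44, 39]

# A set gives constant-time membership checks.
known = set(all_my_friends_ids)

# IDs of Facebook friends that are not yet in the full friends list,
# kept in their original order.
missing = [i for i in my_facebook_friends_ids if i not in known]

# True only when every Facebook friend is already in the full list.
all_present = not missing

print(all_present, missing)
```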

How to define column headers when reading a csv file in Python

I have a comma separated value table that I want to read in Python. What I need to do is first tell Python not to skip the first row because that contains the headers. Then I need to tell it to read in the data as a list and not a string because I need to build an array out of the data and the first column is non-integer (row headers).
There are a total of 11 columns and 5 rows.
Here is the format of the table (except there are no row spaces):
col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11
w0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
w1 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
w2 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
w3 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Is there a way to do this? Any help is greatly appreciated!
You can use the csv module for this sort of thing. It will read in each row as a list of strings representing the different fields.
How exactly you'd want to use it depends on how you're going to process the data afterwards, but you might consider making a Reader object (from the csv.reader() function), calling next() on it once to get the first row, i.e. the headers, and then iterating over the remaining lines in a for loop.
import csv

r = csv.reader(...)
headers = next(r)  # the first row holds the column headers
for fields in r:
    pass  # do stuff with each remaining row
If you're going to wind up putting the fields into a dict, you'd use DictReader instead (and that class will automatically take the field names from the first row, so you can just construct it and use it in a loop).
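A small sketch of both approaches, reading from an in-memory string instead of a file. The SAMPLE data is a shortened, made-up version of the table in the question, and skipinitialspace=True is used on the assumption that the real file also has a space after each comma.

```python
import csv
import io

# Hypothetical sample standing in for the CSV file on disk.
SAMPLE = "col1,col2,col3\nw0, 1, 2\nw1, 3, 4\n"

# Plain reader: pull the header row off first, then iterate the rest.
reader = csv.reader(io.StringIO(SAMPLE), skipinitialspace=True)
headers = next(reader)
# Keep the row label as a string and convert the remaining fields to int.
rows = [[fields[0]] + [int(v) for v in fields[1:]] for fields in reader]

# DictReader: field names are taken from the first row automatically.
dict_rows = list(csv.DictReader(io.StringIO(SAMPLE), skipinitialspace=True))

print(headers)
print(rows)
print(dict_rows[0]["col2"])
```

Note that the csv module always yields strings, so numeric columns have to be converted explicitly, as done for rows above.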

Making PostgreSQL respect the order of the inputted parameters?

This question has a little history — Is there a way to make a query respect the order of the inputted parameters?
I'm new to building "specialized" queries, so I assumed that if I supply an IN clause as part of a SELECT query, it'll return results in the same order. Unfortunately that's not the case.
SELECT * FROM artists WHERE id IN (8, 1, 2, 15, 14, 3, 13, 31, 16, 5, 4, 7, 32, 9, 37)
>>> [7, 32, 3, 8, 4, 2, 31, 9, 37, 13, 16, 1, 5, 15, 14]
(Didn't include the step where I used Python to loop through the result and append the IDs to a list.)
So the question is: is there a way to make Postgres respect the ordering of the parameters given in an IN clause by returning results in the same order?
Query results will be returned in non-deterministic order unless you specify an ORDER BY clause.
If you really want to do the query in the manner you are requesting, then you could construct such a clause. Here's an example using part of your data.
create table artists (
    id integer not null primary key,
    name char(1) not null);

insert into artists
values
    (8, 'a'),
    (1, 'b'),
    (2, 'c'),
    (15, 'd'),
    (14, 'e'),
    (3, 'f'),
    (13, 'g');

select *
from artists
where id in (8, 1, 2, 15, 14, 3, 13)
order by
    id = 8 desc,
    id = 1 desc,
    id = 2 desc,
    id = 15 desc,
    id = 14 desc,
    id = 3 desc,
    id = 13 desc;
Based on this and on your other question, I think there is something wrong with your model or the way you are trying to do this. Perhaps you should post a more generic question about how to do what you are trying to do.
If you do have artists and ranking tables, you should be able to do something like this (or the equivalent through your ORM).
select
    a.*
from
    artists a,
    rankings r
where
    a.id = r.artist_id
order by
    r.score desc;
I suggest you let PostgreSQL return the set in any arbitrary order (especially since it's difficult to do fine-grained SQL-level control from a Django interface), then sort it the way you wish in Python: theresultset.sort(key=yourlistofids.index) should do fine (where theresultset is the arbitrary-order list resulting from the database and yourlistofids is the list whose order you want to preserve).
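A minimal sketch of that Python-side sort, using the ID lists from the question. Instead of list.index, which scans the list for every row, it builds a position lookup once; the names wanted_ids, returned_ids and position are illustrative.

```python
# Order requested in the IN clause.
wanted_ids = [8, 1, 2, 15, 14, 3, 13, 31, 16, 5, 4, 7, 32, 9, 37]
# Order in which the database happened to return the rows (from the question).
returned_ids = [7, 32, 3, 8, 4, 2, 31, 9, 37, 13, 16, 1, 5, 15, 14]

# Map each id to its position once, so the sort key is a dict lookup
# rather than a list.index() scan per row.
position = {artist_id: i for i, artist_id in enumerate(wanted_ids)}
ordered = sorted(returned_ids, key=position.__getitem__)

print(ordered)
```

If the result holds row objects rather than bare IDs, the key would need to look up each row's id attribute instead, for example key=lambda row: position[row.id].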
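Another way, matching each ID together with its surrounding delimiters so that, for example, 5 is not found inside 15:
SELECT *
FROM artists
WHERE id IN (8, 1, 2, 15, 14, 3, 13, 31, 16, 5, 4, 7, 32, 9, 37)
ORDER BY POSITION((',' || id::text || ',') IN ',8,1,2,15,14,3,13,31,16,5,4,7,32,9,37,');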
