Convert DataFrame into multi-dimensional array with the column names of DataFrame - python

Below is the DataFrame I want to work with:
import pandas as pd
df = pd.DataFrame({'A': [1, 1, 1],
                   'B': [2, 2, 3],
                   'C': [4, 5, 4]})
Each row of df forms a unique key. The objective is to create the following list of multi-dimensional arrays, with one list of [column, value] pairs per row:
parameter = [[['A', 1],['B', 2], ['C', 4]],
[['A', 1],['B', 2], ['C', 5]],
[['A', 1],['B', 3], ['C', 4]]]
The problem is related to this question, where I have to iterate over the parameters. Instead of providing them to my function manually, I want to collect all parameter sets from the rows of df in a list.

You could use the following list comprehension, which zips the column names with the values of each row, so that every pair comes out as [column, value]:
[list(map(list, zip(df.columns, row))) for row in df.values.tolist()]
[[['A', 1], ['B', 2], ['C', 4]],
[['A', 1], ['B', 2], ['C', 5]],
[['A', 1], ['B', 3], ['C', 4]]]
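An alternative sketch, assuming a pandas version where DataFrame.to_dict('records') returns one {column: value} dict per row: each record is turned into a list of [column, value] pairs, and the list is then iterated to call a function once per row. The function my_function is hypothetical, a stand-in for whatever function consumes one parameter set.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1],
                   'B': [2, 2, 3],
                   'C': [4, 5, 4]})

# to_dict('records') gives one {column: value} dict per row, in column order;
# each dict is then turned into a list of [column, value] pairs
parameter = [[[col, val] for col, val in record.items()]
             for record in df.to_dict('records')]


def my_function(params):
    """Hypothetical stand-in for the function that consumes one parameter set."""
    return {name: value for name, value in params}


# Call the function once per row instead of passing the parameters by hand
results = [my_function(p) for p in parameter]
print(parameter)
```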

Related

How to take elements from two lists and combine them in a third one (python)?

Let's say I have:
list_a = [1, 2, 3, 4, 5]
list_b = ['a', 'b', 'c']
And I expect the outcome to be something like this, so I can easily access it later:
list_c = [['a', 1], ['a', 2], ['a', 3], ...]
What's the easiest way to do that?
The two lists have different lengths.
I need every letter in list_b to be paired with each of the five numbers, i.e. all possible combinations, because I need to easily access, for example, ['c', 4] later on.
I tried simply appending list_a and list_b to list_c, but it obviously didn't work as planned.
I can't use built-in helpers such as zip, itertools, etc.
Use a list comprehension with 2 for statements:
list_c = [[b, a] for b in list_b for a in list_a]
Output: [['a', 1], ['a', 2], ['a', 3], ['a', 4], ['a', 5], ['b', 1], ['b', 2], ['b', 3], ['b', 4], ['b', 5], ['c', 1], ['c', 2], ['c', 3], ['c', 4], ['c', 5]]
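For reference, the comprehension above is equivalent to the following explicit nested loops, which use neither zip nor itertools: the first for clause becomes the outer loop and the second the inner loop. Looking up a particular combination such as ['c', 4] is then a plain membership test (has_c4 is just an illustrative name).

```python
list_a = [1, 2, 3, 4, 5]
list_b = ['a', 'b', 'c']

# Outer loop over letters, inner loop over numbers: every letter gets all five numbers
list_c = []
for b in list_b:
    for a in list_a:
        list_c.append([b, a])

# Checking for a specific combination later on
has_c4 = ['c', 4] in list_c
print(has_c4)
```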

Shuffle groups of sublists in Python

I want to shuffle this list:
[[1, 'A'], [2, 'A'], [6, 'B'], [3, 'B'], [4, 'C'], [5, 'C'], [7, 'F']]
But I need groups identified by sublists second elements to stay together, so that the shuffled list could look like this:
[[6, 'B'], [3, 'B'], [7, 'F'], [1, 'A'], [2, 'A'], [4, 'C'], [5, 'C']]
Where all 'B', 'F', 'A', and 'C' sublists stay together.
I'm guessing using a combination of shuffle and groupby would do the trick, but I don't know where to start with this. Any idea would be appreciated!
items = [[1, 'A'], [2, 'A'], [6, 'B'], [3, 'B'], [4, 'C'], [5, 'C'], [7, 'F']]
import itertools, operator, random
groups = [list(g) for _, g in itertools.groupby(items, operator.itemgetter(1))]
random.shuffle(groups)
shuffled = [item for group in groups for item in group]
print(shuffled)
Prints for example:
[[4, 'C'], [5, 'C'], [1, 'A'], [2, 'A'], [7, 'F'], [6, 'B'], [3, 'B']]
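The groupby answer relies on each group already being contiguous in items, because itertools.groupby only merges consecutive elements with the same key. A minimal sketch for input where the groups are scattered, assuming it is acceptable to sort by the group key first (the function name shuffle_groups_sorted is hypothetical):

```python
import itertools
import operator
import random


def shuffle_groups_sorted(items, rng=random):
    """Shuffle whole groups of [value, key] sublists, keeping each group together."""
    key = operator.itemgetter(1)
    # groupby only merges *consecutive* items with equal keys, so sort by the key first;
    # sorted() is stable, so items keep their original order within each group
    groups = [list(g) for _, g in itertools.groupby(sorted(items, key=key), key)]
    rng.shuffle(groups)
    return [item for group in groups for item in group]


# The groups are deliberately not contiguous in this input
items = [[1, 'A'], [6, 'B'], [2, 'A'], [3, 'B'], [7, 'F']]
shuffled = shuffle_groups_sorted(items, random.Random(0))
print(shuffled)
```

Passing a random.Random instance is optional; it only makes the result reproducible.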
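Give each group a random number and sort by that. Sublists of the same group share a sort key, so they end up next to each other, and because Python's sort is stable they also keep their original order within the group.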
Update years later: Using a defaultdict looks nicer and only generates one random number for each group, not one for every element:
from random import random
from collections import defaultdict
r = defaultdict(random)
items.sort(key=lambda item: r[item[1]])
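As a condensed one-liner (the default argument r is created once, when the lambda is defined, so all items share the same dict):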
items.sort(key=lambda i, r=defaultdict(random): r[i[1]])
Back to original answer:
items = [[1, 'A'], [2, 'A'], [6, 'B'], [3, 'B'], [4, 'C'], [5, 'C'], [7, 'F']]
import random
r = {b: random.random() for a, b in items}
items.sort(key=lambda item: r[item[1]])
print(items)
Prints for example:
[[6, 'B'], [3, 'B'], [4, 'C'], [5, 'C'], [7, 'F'], [1, 'A'], [2, 'A']]
The two lines can be combined, so that you don't have the extra variable lying around afterwards:
items.sort(key=lambda item, r={b: random.random() for a, b in items}: r[item[1]])
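A variant of the defaultdict idea, sketched under the assumption that a reproducible shuffle is useful (e.g. for tests): it draws the per-group keys from a seeded random.Random instance and returns a new list instead of sorting in place. The function name and the seed parameter are illustrative, not part of the answers above.

```python
import random
from collections import defaultdict


def shuffle_groups_seeded(items, seed=None):
    """Return a new list with whole groups in random order; same seed, same order."""
    rng = random.Random(seed)
    # One random sort key per group, created the first time that group's key is looked up
    group_key = defaultdict(rng.random)
    # sorted() is stable, so sublists of the same group keep their relative order
    return sorted(items, key=lambda item: group_key[item[1]])


items = [[1, 'A'], [2, 'A'], [6, 'B'], [3, 'B'], [4, 'C'], [5, 'C'], [7, 'F']]
print(shuffle_groups_seeded(items, seed=42))
```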
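You can use a dict to group the items without needing to sort, then shuffle the groups and flatten them into a flat list:
from collections import defaultdict
from random import shuffle
from itertools import chain
def shuffle_groups(l):
    # map each key (the letter) to the list of its items
    d = defaultdict(list)
    for v, k in l:
        # stored as [key, value], the reverse of the input; use [v, k] to keep [value, key]
        d[k].append([k, v])
    vals = list(d.values())
    shuffle(vals)
    # chain lazily concatenates the shuffled groups into one flat iterable
    return chain(*vals)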
Output:
In [9]: list(shuffle_groups(l))
Out[9]: [['A', 1], ['A', 2], ['F', 7], ['B', 6], ['B', 3], ['C', 4], ['C', 5]]
In [10]: list(shuffle_groups(l))
Out[10]: [['C', 4], ['C', 5], ['B', 6], ['B', 3], ['A', 1], ['A', 2], ['F', 7]]
In [11]: list(shuffle_groups(l))
Out[11]: [['F', 7], ['B', 6], ['B', 3], ['A', 1], ['A', 2], ['C', 4], ['C', 5]]
Some timings:
In [5]: l =[choice(l) for _ in range(100000)]
In [6]: timeit _groupy(l)
10 loops, best of 3: 139 ms per loop
In [7]: timeit shuffle_groups(l)
10 loops, best of 3: 27.1 ms per loop

Python Sort Lambda

Sort Key Lambda Parameters
I do not understand how the lambda parameters work; the [-e[0], e[1]] portion is especially confusing. I have removed all the excessive printing code and all unnecessary code from my question. What does -e[0] achieve, and what does e[1] achieve?
data.sort(key = lambda e: [-e[0],e[1]]) # --> anonymous function
print("This is the data sort after the lambda filter but NOT -e %s" % data)
[in] 'aeeccccbbbbwwzzzwww'
[out] This is the data before the sort [[2, 'e'], [4, 'c'], [1, 'a'], [4, 'b'], [5, 'w'], [3, 'z']]
[out] This is the data sort before the lambda filter [[1, 'a'], [2, 'e'], [3, 'z'], [4, 'b'], [4, 'c'], [5, 'w']]
[out] This is the data sort after the lambda filter but NOT -e [[1, 'a'], [2, 'e'], [3, 'z'], [4, 'b'], [4, 'c'], [5, 'w']]
[out] This is the data sort after the lambda filter [[5, 'w'], [4, 'b'], [4, 'c'], [3, 'z'], [2, 'e'], [1, 'a']]
[out] w 5
[out] b 4
[out] c 4
l = [[2, 'e'], [4, 'c'], [1, 'a'], [4, 'b'], [5, 'w'], [3, 'z']]
>>> l.sort()
Normal sort: the first element of each nested list is compared first, and the second element only breaks ties.
>>> l.sort(key=lambda e: [e[0], e[1]])
Same as l.sort().
>>> l.sort(key=lambda e: [-e[0], e[1]])
This sorts the list in descending order of the first element and, among ties, in ascending order of the second element. Negating the first element turns 1, 2, 3, 4, 5 into -1, -2, -3, -4, -5, so the largest numbers come first when the keys are sorted in ascending order. When two keys share the same -e[0] (for example [4, 'b'] and [4, 'c'], both -4), the second element e[1] decides, so 'b' comes before 'c'.
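A small sketch contrasting the [-e[0], e[1]] key with reverse=True, using the data from the question: reverse=True flips both criteria, so ties on the number come out in reverse alphabetical order, while negating only e[0] keeps the letters ascending.

```python
data = [[2, 'e'], [4, 'c'], [1, 'a'], [4, 'b'], [5, 'w'], [3, 'z']]

# Keys are lists, which compare element by element: -e[0] decides first,
# e[1] only breaks ties
by_key = sorted(data, key=lambda e: [-e[0], e[1]])
print(by_key)      # [[5, 'w'], [4, 'b'], [4, 'c'], [3, 'z'], [2, 'e'], [1, 'a']]

# reverse=True reverses *both* criteria, so the tie is now 'c' before 'b'
by_reverse = sorted(data, reverse=True)
print(by_reverse)  # [[5, 'w'], [4, 'c'], [4, 'b'], [3, 'z'], [2, 'e'], [1, 'a']]
```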

How to combine two lists of lists into a new list [duplicate]

This question already has answers here:
How to get the cartesian product of multiple lists
(17 answers)
Closed 7 years ago.
I have a problem like this. I have two lists, A and B, where A=[[1,2],[3,4],[5,6]] and B=[["a","b"],["c","d"]], and I would like to get a new list from these two like this:
C = [
[[1,2],["a","b"]],
[[3,4],["a","b"]],
[[1,2],["c","d"]],
[[3,4],["c","d"]]
]
I tried the following code:
A = [[1,2],[3,4]]
B=[["a","b"],["c","d"]]
for each in A:
    for evey in B:
        print(each.append(evey))
However, the output is None.
Any help is appreciated. Thank you.
By the way, I also tried replacing append with a simple +, but then the output is a flat list whose elements are not lists.
This was answered here: Get the cartesian product of a series of lists?
Try this:
import itertools
A = [[1,2],[3,4]]
B = [["a","b"],["c","d"]]
C = []
for element in itertools.product(A,B):
    C.append(list(element))
print(C)
This is one way to do it:
A = [[1,2],[3,4]]
B=[["a","b"],["c","d"]]
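C = list(zip(A, B))
The output here is a list of tuples:
[([1, 2], ['a', 'b']), ([3, 4], ['c', 'd'])]
If you want a list of lists, you can do this:
D = [list(i) for i in zip(A, B)]
The output:
[[[1, 2], ['a', 'b']], [[3, 4], ['c', 'd']]]
Note that zip pairs the elements by position, so this gives only two pairs, not all four combinations the question asks for.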
Try this. You have to append each pair of elements in each iteration.
result = []
for each in A:
    for evey in B:
        result.append([each, evey])
>>> result
[[[1, 2], ['a', 'b']],
[[1, 2], ['c', 'd']],
[[3, 4], ['a', 'b']],
[[3, 4], ['c', 'd']]]
Or simply use itertools.product:
>>> from itertools import product
>>> list(product(A,B))
[([1, 2], ['a', 'b']),
([1, 2], ['c', 'd']),
([3, 4], ['a', 'b']),
([3, 4], ['c', 'd'])]
You can use itertools.product to achieve this.
import itertools
list(itertools.product(A,B))  # every combination, returned as tuples
[([1, 2], ['a', 'b']),
([1, 2], ['c', 'd']),
([3, 4], ['a', 'b']),
([3, 4], ['c', 'd']),
([5, 6], ['a', 'b']),
([5, 6], ['c', 'd'])]
itertools.product(*iterables, repeat=1)
It returns the Cartesian product of the input iterables.
For example:
product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy
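A short sketch of how product's ordering can reproduce the exact order of C from the question: the rightmost iterable advances fastest, so putting B first and A second varies A on every step; each (b, a) tuple is then rebuilt as an [a, b] list. It uses the two-element A from the question's code, and the repeat example is only an illustration of that parameter.

```python
from itertools import product

A = [[1, 2], [3, 4]]
B = [["a", "b"], ["c", "d"]]

# B in the outer (left) position, A in the inner (right) position,
# then swap each tuple into an [a, b] list
C = [[a, b] for b, a in product(B, A)]
print(C)

# repeat=n is shorthand for passing the same iterable n times
pairs_of_bits = list(product([0, 1], repeat=2))
print(pairs_of_bits)
```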
Don't print the return value of append(): it modifies the list in place and always returns None. Try this instead:
A = [[1,2],[3,4]]
B=[["a","b"],["c","d"]]
C = []
for each in B:
    for evey in A:
        C.append([evey, each])
print(C)

Get DataFrame selection's row positions

Instead of the index labels, I'd like to obtain the row positions, so I can use the result later with df.iloc[row_positions].
This is the example:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']}, index=[10, 2, 7])
print(df[df['a']>=2].index)
# Int64Index([2, 7], dtype='int64')
# How do I convert the index list [2, 7] to [1, 2] (the row position)
# I managed to do this for 1 index element, but how can I do this for the entire selection/index list?
df.index.get_loc(2)
Update:
I could use a list comprehension to apply get_loc to each selected label, but perhaps there's a built-in pandas function for this.
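The list comprehension mentioned in the update might look like the following sketch (variable names are illustrative). It assumes the index labels are unique, in which case Index.get_loc returns a single integer position for each label.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']}, index=[10, 2, 7])

# Labels of the selected rows: 2 and 7
labels = df[df['a'] >= 2].index

# Map each label to its integer position in the full frame
row_positions = [df.index.get_loc(label) for label in labels]
print(row_positions)

# Positions are passed to .iloc with square brackets
print(df.iloc[row_positions])
```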
You can use numpy.where:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']}, index=[10, 2, 7])
np.where(df.a >= 2)
It returns the row positions:
(array([1, 2], dtype=int64),)
@ssm's answer is what I would normally use. However, to answer your specific query of how to select multiple rows, try this:
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']}, index=[10, 2, 7])
indices = df[df['a']>=2].index
print(df.loc[indices])
.ix has since been deprecated and was removed in pandas 1.0; .loc is the label-based indexer to use instead.
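[EDIT to answer the specific query]
How do I convert the index list [2, 7] to [1, 2] (the row positions)?
Reset the index before filtering, so that the new index counts positions in the original frame:
reset = df.reset_index()
reset[reset['a'] >= 2].index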
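Two vectorized alternatives, sketched here as options rather than what the answers above use: Index.get_indexer maps many labels to positions in one call (it requires the index labels to be unique), and numpy.flatnonzero returns the positions of the True values in the boolean mask directly, without going through labels at all. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']}, index=[10, 2, 7])
mask = df['a'] >= 2

# Map the selected labels (2, 7) to their positions in one call
positions_from_labels = df.index.get_indexer(df[mask].index)

# Or take the positions of the True entries in the boolean mask
positions_from_mask = np.flatnonzero(mask.to_numpy())

print(df.iloc[positions_from_mask])
```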
