Pandas DF to a nested list - python

I have a dataframe (df) and I want to transform it to a nested list.
df=pd.DataFrame({'Number':[1,2,3,4,5, 6],
'Name':['A', 'B', 'C', 'D', 'E', 'F'],
'Value': [223, 124, 434, 514, 821, 110]})
My expected outcome is a nested list. The first list inside the nested takes values from the first 3 rows of df from the first column. The second then the first 3 rows of the second column and the third the 3 first rows of the third column. After that I want to add lists of the remaning 3 rows.
[[1, 2, 3],
['A', 'B', 'C'],
[223, 124, 434]
[4, 5, 6],
['D', 'E', 'F'],
[514, 821, 110]]
I did a for loop and called tolist() on each series. Then I get all the values from one column in a list. How do I go from the outcome below to the expected outcome above?
col=df.columns
lst=[]
for i in col:
temp = df[i].tolist()
temp
lst.append(temp)
Outcome (lst):
[[1, 2, 3, 4, 5, 6],
['A', 'B', 'C', 'D', 'E', 'F'],
[223, 124, 434, 514, 821, 110]]

Use .values and some numpy slicing
v = df.values.T
v[:,:3].tolist() + v[:,3:].tolist()
output
[[1, 2, 3],
['A', 'B', 'C'],
[223, 124, 434],
[4, 5, 6],
['D', 'E', 'F'],
[514, 821, 110]]

Try:
lst = df.set_index(df.index // 3).groupby(level=0).agg(list) \
.to_numpy().ravel().tolist()
print(lst)
# Output
[[1, 2, 3],
['A', 'B', 'C'],
[223, 124, 434],
[4, 5, 6],
['D', 'E', 'F'],
[514, 821, 110]]

This is an example starting from 3 lists, the ones you got doing .tolist()
a = [1, 2, 3, 4, 5, 6, 4]
b = ['A', 'B', 'C', 'D', 'E', 'F']
c = [223, 124, 434, 514, 821, 110]
res = []
for i in range(len(a) // 3):
res.append(a[i * 3:(i * 3) + 3])
res.append(b[i * 3:(i * 3) + 3])
res.append(c[i * 3:(i * 3) + 3])
result is
[[1, 2, 3], ['A', 'B', 'C'], [223, 124, 434], [4, 5, 6], ['D', 'E', 'F'], [514, 821, 110]]

import pandas as pd
df=pd.DataFrame({
'Number':[1,2,3,4,5, 6],
'Name':['A', 'B', 'C', 'D', 'E', 'F'],
'Value': [223, 124, 434, 514, 821, 110]
})
# convert df into slices of 3
final_list = []
for i in range(0, len(df), 3):
final_list.append(
df.iloc[i:i+3]['Number'].to_list())
final_list.append(
df.iloc[i:i+3]['Name'].to_list())
final_list.append(
df.iloc[i:i+3]['Value'].to_list())
print(final_list)
output
[[1, 2, 3], ['A', 'B', 'C'], [223, 124, 434], [4, 5, 6], ['D', 'E', 'F'], [514, 821, 110]]

I think you just want to divide the list (column) into list of size n.
You can change the value of n, to change the sublist size.
lst=[]
n=3
for i in col:
temp = df[i].tolist()
for i in range(0,len(temp),n):
lst.append(temp[i:i+n])

Related

Possible sets of different sub-list items with one element of each sub-list

I am looking for a way to obtain combinations of single elements of all sub-lists contained in a list without knowing in advance the length of the list and the sub-lists. Let me illustrate what I mean via two examples below. I have two lists (myList1 and myList2) and would like to obtain the two combination sets (setsCombo1 and setsCombo1):
myList1 = [['a'], [1, 2, 3], ['X', 'Y']]
setsCombo1 = [['a', 1, 'X'],
['a', 1, 'Y'],
['a', 2, 'X'],
['a', 2, 'Y'],
['a', 3, 'X'],
['a', 3, 'Y']]
myList2 = [['a'], [1, 2, 3], ['X', 'Y'], [8, 9]]
setsCombo2 = [['a', 1, 'X', 8],
['a', 1, 'X', 9],
['a', 1, 'Y', 8],
['a', 1, 'Y', 9],
['a', 2, 'X', 8],
['a', 2, 'X', 9],
['a', 2, 'Y', 8],
['a', 2, 'Y', 9],
['a', 3, 'X', 8],
['a', 3, 'X', 9],
['a', 3, 'Y', 8],
['a', 3, 'Y', 9]]
I looked a bit into itertools but couldn't really find anything quickly that is appropriate...
itertools.product with unpacking * (almost) does that:
>>> from itertools import product
>>> list(product(*myList1))
[('a', 1, 'X'),
('a', 1, 'Y'),
('a', 2, 'X'),
('a', 2, 'Y'),
('a', 3, 'X'),
('a', 3, 'Y')]
To cast the inner elements to lists, we map:
>>> list(map(list, product(*myList1)))
[['a', 1, 'X'],
['a', 1, 'Y'],
['a', 2, 'X'],
['a', 2, 'Y'],
['a', 3, 'X'],
['a', 3, 'Y']]

Python: how to replicate the same row of a matrix?

How can I copy each row of an array n times?
So if I have a 2x3 array, and I copy each row 3 times, I will have a 6x3 array. For example, I need to convert A to B below:
A = np.array([[1, 2, 3],
[4, 5, 6]])
B = np.array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[4, 5, 6],
[4, 5, 6],
[4, 5, 6]])
If possible, I would like to avoid a for loop.
If I read correctly, this is probably what you want assuming you started with mat:
transformed = np.concatenate([np.vstack([mat[:, i]] * 3).T for i in range(mat.shape[1])], axis=1)
Here's a verifiable example:
# mocking a starting array
import string
mat = np.random.choice(list(string.ascii_lowercase), size=(5,3))
>>> mat
array([['s', 'r', 'e'],
['g', 'v', 'c'],
['i', 'b', 'd'],
['f', 'g', 's'],
['o', 'm', 'w']], dtype='<U1')
Transform it:
# this repeats it 3 times for sake of displaying
transformed = np.concatenate([np.vstack([mat[i, :]] * 3).T for i in range(mat.shape[0])], axis=1).T
>>> transformed
array([['s', 'r', 'e'],
['s', 'r', 'e'],
['s', 'r', 'e'],
['g', 'v', 'c'],
['g', 'v', 'c'],
['g', 'v', 'c'],
['i', 'b', 'd'],
['i', 'b', 'd'],
['i', 'b', 'd'],
['f', 'g', 's'],
['f', 'g', 's'],
['f', 'g', 's'],
['o', 'm', 'w'],
['o', 'm', 'w'],
['o', 'm', 'w']], dtype='<U1')
The idea of this is to use vstack to concatenate each column to itself multiple time, and then concatenate the result of that to get the final array.
You can use np.repeat with integer positional indexing:
B = A[np.repeat(np.arange(A.shape[0]), 3)]
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[4, 5, 6],
[4, 5, 6],
[4, 5, 6]])
v1=[3,2]
v3=v1[:]*10
print(v3)
np.repeat is exactly what you are looking for. You can use the axis option to specify that you want to duplicate rows.
B = np.repeat(A, 3, axis=0)

python: output data from a list

I'm trying to figure out how to output list items. the code below is taking answers and checking them against a key to see which answers are correct. for each student correct answers are stored in correct_count. Then I'm sorting in ascending order based on the correct count.
def main():
answers = [
['A', 'B', 'A', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
['D', 'B', 'A', 'B', 'C', 'A', 'E', 'E', 'A', 'D'],
['E', 'D', 'D', 'A', 'C', 'B', 'E', 'E', 'A', 'D'],
['C', 'B', 'A', 'E', 'D', 'C', 'E', 'E', 'A', 'D'],
['A', 'B', 'D', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
['B', 'B', 'E', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
['B', 'B', 'A', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
['E', 'B', 'E', 'C', 'C', 'D', 'E', 'E', 'A', 'D']]
keys = ['D', 'B', 'D', 'C', 'C', 'D', 'A', 'E', 'A', 'D']
grades = []
# Grade all answers
for i in range(len(answers)):
# Grade one student
correct_count = 0
for j in range(len(answers[i])):
if answers[i][j] == keys[j]:
correct_count += 1
grades.append([i, correct_count])
grades.sort(key=lambda x: x[1])
# print("Student", i, "'s correct count is", correct_count)
if __name__ == '__main__':
main()
if I print out grades the output looks like this
[[0, 7]]
[[1, 6], [0, 7]]
[[2, 5], [1, 6], [0, 7]]
[[3, 4], [2, 5], [1, 6], [0, 7]]
[[3, 4], [2, 5], [1, 6], [0, 7], [4, 8]]
[[3, 4], [2, 5], [1, 6], [0, 7], [5, 7], [4, 8]]
[[3, 4], [2, 5], [1, 6], [0, 7], [5, 7], [6, 7], [4, 8]]
[[3, 4], [2, 5], [1, 6], [0, 7], [5, 7], [6, 7], [7, 7], [4, 8]]
what I'm interested in is the last row. The first number of each set corresponds to a student id and it's sorted in ascending order based on the 2nd number which represents a grade (4, 5, 6, 7, 7, 7, 7, 8).
I'm not sure how to grab that last row and iterate through it so that i get output like
student 3 has a grade of 4 and student 2 has a grade of 5
[[3, 4], [2, 5], [1, 6], [0, 7], [5, 7], [6, 7], [7, 7], [4, 8]]
def main():
answers = [
['A', 'B', 'A', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
['D', 'B', 'A', 'B', 'C', 'A', 'E', 'E', 'A', 'D'],
['E', 'D', 'D', 'A', 'C', 'B', 'E', 'E', 'A', 'D'],
['C', 'B', 'A', 'E', 'D', 'C', 'E', 'E', 'A', 'D'],
['A', 'B', 'D', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
['B', 'B', 'E', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
['B', 'B', 'A', 'C', 'C', 'D', 'E', 'E', 'A', 'D'],
['E', 'B', 'E', 'C', 'C', 'D', 'E', 'E', 'A', 'D']]
keys = ['D', 'B', 'D', 'C', 'C', 'D', 'A', 'E', 'A', 'D']
grades = []
# Grade all answers
for i in range(len(answers)):
# Grade one student
correct_count = 0
for j in range(len(answers[i])):
if answers[i][j] == keys[j]:
correct_count += 1
grades.append([i, correct_count])
grades.sort(key=lambda x: x[1])
for student, correct in grades:
print("Student", student,"'s correct count is", correct)
if __name__ == '__main__':
main()
What you were doing was printing grades while you were still in the loop. If you would've printed grades after both loops, you would've only seen the last line: [[3, 4], [2, 5], [1, 6], [0, 7], [5, 7], [6, 7], [7, 7], [4, 8]], then just loop through grades and python will "unpack" the list into the student, and grade, respectively ash shown above.
Here is the output:
Student 3 's correct count is 4
Student 2 's correct count is 5
Student 1 's correct count is 6
Student 0 's correct count is 7
Student 5 's correct count is 7
Student 6 's correct count is 7
Student 7 's correct count is 7
Student 4 's correct count is 8
Don't forget to click the check mark if you like this answer.
What about something like the following:
students_grade = {}
for id, ans in enumerate(answers):
students_grade[id] = sum([x == y for x, y in zip(ans, key)])
Now you have a dictionary with the id of students mapping to their score ;)
Of course, you can change the enumerate to have the true list of ids instead!
While MMelvin0581 already addressed the problem in your code, You can also use nested list comprehension to achieve the same results
>>> [(a,sum([1 if k==i else 0 for k,i in zip(keys,j)])) for a,j in enumerate(answers)]
This will produce output like:
>>> [(0, 7), (1, 6), (2, 5), (3, 4), (4, 8), (5, 7), (6, 7), (7, 7)]
Then you can sort your results based on the criteria
>>> from operator import itemgetter
>>> sorted(out, key=itemgetter(1))
Note: itemgetter will have slight performance benefit over lambda. The above operation will produce output like:
>>> [(3, 4), (2, 5), (1, 6), (0, 7), (5, 7), (6, 7), (7, 7), (4, 8)]
Then finally print your list like:
for item in sorted_list:
print("Student: {} Scored: {}".format(item[0],item[1]))

Slicing flat list into multi-level nested list efficiently

For example, I have a flat list
[1, 2, 3, 4, 5, 6, 7, 8, 9, 'A', 'B', 'C', 'D', 'E', 'F', 'G']
I want to transform it into 4-deep list
[[[[1, 2], [3, 4]], [[5, 6], [7, 8]]], [[[9, 'A'], ['B', 'C']], [['D', 'E'] ['F', 'G']]]]
Is there a way to do it without creating a separate variable for every level? What is the most memory- and performance-efficient way?
UPDATE:
Also, is there a way to do it in a non-symmetrical fashion?
[[[[1, 2, 3], 4], [[5, 6, 7], 8]]], [[[9, 'A', 'B'], 'C']], [['D', 'E', 'F'], 'G']]]]
Note that your first list has 15 elements instead of 16. Also, what should A be? Is it a constant you've defined somewhere else? I'll just assume it's a string : 'A'.
If you work with np.arrays, you could simply reshape your array:
import numpy as np
r = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 'A', 'B', 'C', 'D', 'E', 'F', 'G'])
r.reshape(2,2,2,2)
It outputs:
array([[[['1', '2'],
['3', '4']],
[['5', '6'],
['7', '8']]]
[[['9', 'A'],
['B', 'C']],
[['D', 'E'],
['F', 'G']]]
dtype='<U11')
This should be really efficient because numpy doesn't change the underlying data format. It's still a flat array, displayed differently.
Numpy doesn't support irregular shapes. You'll have to work with standard python lists then:
i = iter([1, 2, 3, 4, 5, 6, 7, 8, 9, 'A', 'B', 'C', 'D', 'E', 'F', 'G'])
l1 = []
for _ in range(2):
l2 = []
for _ in range(2):
l3 = []
l4 = []
for _ in range(3):
l4.append(next(i))
l3.append(l4)
l3.append(next(i))
l2.append(l3)
l1.append(l2)
print(l1)
# [[[[1, 2, 3], 4], [[5, 6, 7], 8]], [[[9, 'A', 'B'], 'C'], [['D', 'E', 'F'], 'G']]]
As you said, you'll have to define a temporary variable for each level. I guess you could use list comprehensions, but they wouldn't be pretty.

Organize list of lists in Python

Let's say a have a list of lists in Python:
list_of_values = [[a, b, c], [d, e, f], [g, h, i], [j, k, l]]
And I want to convert automatically to independent lists like:
list1 = [[a, b, c],[d + g + j, e + h + k, f + i + l]]
list2 = [[d, e, f], [g + j, h + k, i + l]]
list3 = [[g, h, i], [j, k, l]]
Let's say I have a list of lists of integers in Python:
list_of_values = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
And I want to convert automatically to independent lists like:
list1 = [[1, 2, 3],[4 + 7 + 10, 5 + 8 + 11, 6 + 9 + l2]]
list2 = [[4, 5, 6], [7 + 10, 8 + 11, 9 + 12]]
list3 = [[7, 8, 9], [10, 11, l2]]
Performing the math:
list1 = [[1, 2, 3], [21, 24, 27]]
list2 = [[4, 5, 6], [17, 19, 21]]
list3 = [[7, 8, 9], [10, 11, l2]]
For your updated Question:
Suppose you have "list of lists of strings" like below:
s = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i'], ['j', 'k', 'l']]
Then you can use: join to concatenate:
>>> for i in range(len(s)):
... [s[i], map(lambda t: ''.join(t), zip(*s[i + 1:]))]
...
[['a', 'b', 'c'], ['dgj', 'ehk', 'fil']]
[['d', 'e', 'f'], ['gj', 'hk', 'il']]
[['g', 'h', 'i'], ['j', 'k', 'l']]
[['j', 'k', 'l'], []]
If you don't need last line in output then just use range argument less then one of length:
>>> for i in range(len(s)-1):
... [s[i], map(''.join, zip(*s[i + 1:]))] # remove lambda function
...
[['a', 'b', 'c'], ['dgj', 'ehk', 'fil']]
[['d', 'e', 'f'], ['gj', 'hk', 'il']]
[['g', 'h', 'i'], ['j', 'k', 'l']]
But suppose if you have "list of lists of numbers" e.g.:
l = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
Then you can use sum function:
>>> for i in range(len(l) - 1):
... [l[i], map(sum, zip(*l[i + 1:]))]
...
[[1, 2, 3], [21, 24, 27]]
[[4, 5, 6], [17, 19, 21]]
[[7, 8, 9], [10, 11, 12]]
Edit:..
If you wants to make single function for both strings and number then you canmake use of add() operator from operator library.
Check add() function:
>>> from operator import add
>>> add(1, 2)
3
>>> add('1', '2') # this is like + works
'12'
Now, using it make a new my_add() that add all elements in a sequence, check following codes:
>>> def my_add(t):
... return reduce(add, t)
...
>>> my_add(('a', 'b'))
'ab'
>>> my_add((2, 1))
3
Now, write a function using my_add() function that will so your work:
def do_my_work(s):
for i in range(len(s)-1):
print [s[i], map(my_add, zip(*s[i + 1:]))]
Now, see how this works for you:
>>> s
[['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i'], ['j', 'k', 'l']]
>>> do_my_work(s)
[['a', 'b', 'c'], ['dgj', 'ehk', 'fil']]
[['d', 'e', 'f'], ['gj', 'hk', 'il']]
[['g', 'h', 'i'], ['j', 'k', 'l']]
>>> l
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
>>> do_my_work(l) # so same function for str and int both!
[[1, 2, 3], [21, 24, 27]]
[[4, 5, 6], [17, 19, 21]]
[[7, 8, 9], [10, 11, 12]]
>>> import itertools
>>> lst = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i'], ['j', 'k', 'l']]
>>> for i, item in enumerate(lst):
print [item, itertools.chain.from_iterable(lst[i+1:])]
[['a', 'b', 'c'], ['d', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l']]
[['d', 'e', 'f'], ['g', 'h', 'i', 'j', 'k', 'l']]
[['g', 'h', 'i'], ['j', 'k', 'l']]
[['j', 'k', 'l'], []]
for i in range(len(list_of_values) - 1):
print [list_of_values[i]] + [map(list, zip(*list_of_values[i+1:]))]
Output
[['a', 'b', 'c'], [['d', 'g', 'j'], ['e', 'h', 'k'], ['f', 'i', 'l']]]
[['d', 'e', 'f'], [['g', 'j'], ['h', 'k'], ['i', 'l']]]
[['g', 'h', 'i'], [['j'], ['k'], ['l']]]
For the numbers, you can simply do
list_of_values = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
for i in range(len(list_of_values) - 1):
print [list_of_values[i]] + [map(sum, zip(*list_of_values[i+1:]))]
Output
[[1, 2, 3], [21, 24, 27]]
[[4, 5, 6], [17, 19, 21]]
[[7, 8, 9], [10, 11, 12]]

Categories

Resources