Convert a dict to numpy multi-dimensional array

I have a python dictionary defined as follows, where the innermost items are two-element array:
mydict = {1: {1: [1, 2], 2: [3, 4]}, 2: {1: [5, 6], 2: [7, 8]}}
What I need now is to collect all the 0th elements into a new array, i.e., use a[:,:,0] or a[...,0] to return [1,3,5,7]. However, neither a[:,:,0] nor a[...,0] works in this case, as shown below.
import numpy as np
import pandas as pd
a = np.array(pd.DataFrame.from_dict(mydict))
print(a)
which gives the following output:
[[[1, 2] [5, 6]]
[[3, 4] [7, 8]]]
It seems that this is a 2x2x2 array. There is no problem with accessing the corresponding element using separate brackets, e.g., a[0][0][0] returns 1. However, a[0,0,0] causes an error.
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-150-f68aba7de42a> in <module>()
----> 1 a[0,0,0]
IndexError: too many indices for array
It seems that the two-element arrays are considered as elements in the 2x2 array -- but what I need is a 2x2x2 array in order to achieve my goal. Is there any way to convert this to a 2x2x2 array?

Your issue comes from the fact that pandas treats your initial entries (lists) as objects, so when you convert to a numpy array, your innermost entries are list objects. For example,
> type(a)
numpy.ndarray
> type(a[0])
numpy.ndarray
> type(a[0,0])
list
If you know the shape you ultimately want (2x2x2), you could always do:
> b = np.array(list(map(np.array, a.flat))).reshape(2,2,2)
> b.shape
(2, 2, 2)
> b[0,0,0]
1
Edit: Or even simpler:
> b = np.array(a.tolist())
array([[[1, 2],
[5, 6]],
[[3, 4],
[7, 8]]])
If you want the first item of each innermost row, e.g. 1,3,5,7, you could do b[...,0] or b[...,0].flatten() depending on the resulting shape you want.

Without Pandas I can recreate your array with:
In [1723]: mydict = {1: {1: [1, 2], 2: [3, 4]}, 2: {1: [5, 6], 2: [7, 8]}}
In [1724]: mydict
Out[1724]: {1: {1: [1, 2], 2: [3, 4]}, 2: {1: [5, 6], 2: [7, 8]}}
In [1725]: mydict[1]
Out[1725]: {1: [1, 2], 2: [3, 4]}
In [1726]: mydict[2]
Out[1726]: {1: [5, 6], 2: [7, 8]}
In [1727]: a=np.empty((2,2),dtype=object)
In [1728]: for i in range(2):
      ...:     for j in range(2):
      ...:         a[i,j]=mydict[i+1][j+1]
      ...:
In [1729]: a
Out[1729]:
array([[[1, 2], [3, 4]],
[[5, 6], [7, 8]]], dtype=object)
In [1730]: print(a)
[[[1, 2] [3, 4]]
[[5, 6] [7, 8]]]
This last print is the same as yours.
Elements of this array are lists
In [1735]: a[0,1]
Out[1735]: [3, 4]
In [1736]: type(a[0,1])
Out[1736]: list
The easiest way to turn this into a 3d array is with tolist:
In [1737]: a.tolist()
Out[1737]: [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
In [1738]: np.array(a.tolist())
Out[1738]:
array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]])
In [1739]: _.shape
Out[1739]: (2, 2, 2)
# dtype('int32')
tolist unpacks the array into a nested list; np.array then creates the highest-dimension array it can from that list structure.
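As a minimal sketch of the same idea without the intermediate object array, the nested list can be built straight from the dict and handed to np.array. The names nested, arr, first_items and flat_first are only illustrative, and ordering both axes by sorted key is an assumption about the layout you want:

```python
import numpy as np

mydict = {1: {1: [1, 2], 2: [3, 4]}, 2: {1: [5, 6], 2: [7, 8]}}

# Build a nested list ordered by sorted outer and inner keys (an assumed
# axis order), then let NumPy infer the 3-D shape from the list structure.
nested = [[mydict[i][j] for j in sorted(mydict[i])] for i in sorted(mydict)]
arr = np.array(nested)

first_items = arr[..., 0]         # first element of every innermost list
flat_first = first_items.ravel()  # the same values as a 1-D array
```

Here arr[0, 0, 0] works because arr is a genuine 2x2x2 integer array rather than a 2x2 array of list objects.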

You need to dig into each dictionary element, and then into each sub-dictionary, and pull out the first element of each of the leaf lists.
a = [mydict[x][y][0] for x in mydict for y in mydict[x]]
Result as a Python list:
[1, 3, 5, 7]
I believe this is what you actually want.

Related

Python - delete columns in 2D list

I have a 2D list = [[1, 8, 3], [4, 5, 6], [0, 5, 7]], and I want to delete columns in a loop.
For example, columns with index: 0(first) and 2(last) - - the result after deletions should be: [8, 5, 5].
There is a problem, because when I delete the 0th column, the size of the list is decreased to (0,1), and the 2nd index is out of scope.
What is the fastest method to delete columns in a loop without the out-of-scope problem?
For a better picture:
[[1, 8, 3],
[4, 5, 6],
[0, 5, 7]]

There is no such shortcut in python except for iterating over all the list items and removing those index values.
However, you can use pandas which is meant for some other purpose but will do the task.
import pandas as pd
s = [[1, 8, 3], [4, 5, 6], [0, 5, 7]]
df = pd.DataFrame(s,columns=['val1','val2','val3'])
li = df.drop('val1',axis=1).values.tolist()
now li will look like this
[[8, 3], [5, 6], [5, 7]]

You can use numpy like this:
import numpy as np
my_list = np.array([[1, 8, 3], [4, 5, 6], [0, 5, 7]])
new_list = my_list[:, 1].copy()
print(new_list)
Output:
[8 5 5]
numpy.delete(your_list, index, axis) also does the job:
new_list = np.delete(my_list,(0, 2), axis=1)
(0, 2) are the indices of the columns to delete (columns 0 and 2).
axis=1 tells numpy that (0, 2) are column indices, not row indices.
If you want to delete rows 0 and 2 instead, change axis=1 to axis=0.
Output is a little different:
>>> array([[8],
[5],
[5]])
For a pure python approach:
my_list = [[1, 8, 3], [4, 5, 6], [0, 5, 7]]
new_list = [value[1] for value in my_list]
print(new_list)
Output:
>>> [8, 5, 5]

L is 2D list:
print(list(map(lambda x: x[1:], L)))

data= [[1, 8, 3], [4, 5, 6], [0, 5, 7]]
index_to_remove=[0,2]
[list(x) for x in zip(*[d for i,d in enumerate(zip(*data)) if i not in index_to_remove])]
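The one-liner above transposes the data, filters the columns, and transposes back. A possibly more readable sketch of the same result filters each row by index instead. The helper name drop_columns is hypothetical, not part of any library:

```python
def drop_columns(rows, to_remove):
    """Return new rows without the column indices listed in to_remove."""
    skip = set(to_remove)  # set lookup keeps the membership test cheap
    return [[v for i, v in enumerate(row) if i not in skip] for row in rows]

data = [[1, 8, 3], [4, 5, 6], [0, 5, 7]]
trimmed = drop_columns(data, [0, 2])   # each row keeps only column 1
middle = [row[0] for row in trimmed]   # flatten the single remaining column
```

Because a new list is built rather than deleting in place, the indices never shift while iterating, which avoids the out-of-range problem from the question.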

If I understood your question correctly, you want to keep the middle element (index 1) of each list. In that case I would suggest creating a new list. There may well be better ways, but you could try this:
twoD_list = [[1, 8, 3], [4, 5, 6], [0, 5, 7]]
def keep_col( twoD_list ,index_to_keep = 1):
    final_list = []
    for x in twoD_list:
        final_list.append(x[index_to_keep])
    return final_list
final_list = keep_col( twoD_list , 1)
Final output:
[8,5,5]

Assuming you always want only the second element and the inner lists always have at least two elements.
Pure python with list comprehension:
lst = [
[1, 8, 3],
[4, 5, 6],
[0, 5, 7],
]
filtered_lst = [
inner_element
for inner_lst in lst
for i, inner_element in enumerate(inner_lst)
if i == 1
]
print(filtered_lst)
# [8, 5, 5]
If you want, you can then reassign the new list to the old variable:
lst = filtered_lst
The advantages of this method are:
no need to worry about the list being altered while you iterate it,
no need to import other libraries
list comprehension is built-in
list comprehension is often the fastest way to filter a list (see for example this article)
easier to read and maintain than other solutions (in my opinion).

Via itemgetter to extract the value at index 1.
from operator import itemgetter
my_list = [[1, 8, 3], [4, 5, 6], [0, 5, 7]]
result = list(map(itemgetter(1), my_list))

try this
my_list = [[1, 8, 3], [4, 5, 6], [0, 5, 7]]
filter_col=[0,2]
col_length=3
my_list=[[x[i] for i in range(col_length) if i not in filter_col] for x in my_list]
u do not want to directly mutate the list that you are working on
this performs a list comprehension to create a new list from the existing list
edit:
just saw u wanted only a flat list
assuming u only want one element for the list u can use
my_list=[x[1] for x in my_list]

How to do Math Functions on Lists within a List

I'm very new to python (using python3) and I'm trying to add numbers from one list to another list. The only problem is that the second list is a list of lists. For example:
[[1, 2, 3], [4, 5, 6]]
What I want is to, say, add 1 to each item in the first list and 2 to each item in the second, returning something like this:
[[2, 3, 4], [6, 7, 8]]
I tried this:
original_lst = [[1, 2, 3], [4, 5, 6]]
transposition_lst = [1, 2]
new_lst = [x+y for x,y in zip(original_lst, transposition_lst)]
print(new_lst)
When I do this, I get an error
can only concatenate list (not "int") to list
This leads me to believe that I can't operate in this way on the lists as long as they are nested within another list. I want to do this operation without flattening the nested list. Is there a solution?

One approach using enumerate
Demo:
l = [[1, 2, 3], [4, 5, 6]]
print( [[j+i for j in v] for i,v in enumerate(l, 1)] )
Output:
[[2, 3, 4], [6, 7, 8]]

You can use enumerate:
l = [[1, 2, 3], [4, 5, 6]]
new_l = [[c+i for c in a] for i, a in enumerate(l, 1)]
Output:
[[2, 3, 4], [6, 7, 8]]

Why not use numpy instead?
import numpy as np
mat = np.array([[1, 2, 3], [4, 5, 6]])
mul = np.array([1,2])
m = np.ones(mat.shape)
res = (m.T *mul).T + mat
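The same result can also be sketched with NumPy broadcasting, which avoids the helper array of ones and keeps the integer dtype. The name offsets is only illustrative:

```python
import numpy as np

mat = np.array([[1, 2, 3], [4, 5, 6]])
offsets = np.array([1, 2])

# offsets[:, None] has shape (2, 1), so each value is broadcast across
# the corresponding row of mat.
res = mat + offsets[:, None]
```

Call res.tolist() if you need the nested-list form back.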

You were very close with your original method. You just fell one step short.
Small addition
original_lst = [[1, 2, 3], [4, 5, 6]]
transposition_lst = [1, 2]
new_lst = [[xx + y for xx in x] for x, y in zip(original_lst, transposition_lst)]
print(new_lst)
Output
[[2, 3, 4], [6, 7, 8]]
Reasoning
If you print your original zip it is easy to see the issue. Your original zip yielded this:
In:
original_lst = [[1, 2, 3], [4, 5, 6]]
transposition_lst = [1, 2]
for x,y in zip(original_lst, transposition_lst):
print(x, y)
Output
[1, 2, 3] 1
[4, 5, 6] 2
Now it is easy to see that you are trying to add an integer to a list (hence the error), which Python doesn't support. If they were both integers it would add them, and if they were both lists it would concatenate them.
To fix this you need to do one extra step with your code to add the integer to each value in the list. Hence the addition of the extra list comprehension in the solution above.

A different approach than numpy that could work even for lists of different lengths is
lst = [[1, 2, 3], [4, 5, 6, 7]]
c = [1, 2]
res = [[l + c[i] for l in lst[i]] for i in range(len(c))]

Checking for duplicates in list of list and sorting them

I have a table containing:
table = [[5, 7],[4, 3],[3, 3],[2, 3],[1, 3]]
The first value in each list (5,4,3,2,1) is the ID of a person, and the second value (7,3,3,3,3) is a score. What I'm trying to do is detect duplicate values in the second column, which in this case are the 3s. Because four of the lists have 3 as the second value, I now want to sort them based on the first value.
In the table, notice that [1,3] has 1 as its first value, so it should take the position of [4,3]. Likewise, [2,3] should take the position of [3,3].
Expected output: [[5,7],[1,3],[2,3],[3,3],[4,3]]
I attempted:
def checkDuplicate(arr):
    i = 0
    while (i<len(arr)-1):
        if arr[i][1] == arr[i+1][1] and arr[i][0] > arr[i+1][0]:
            arr[i],arr[i+1] = arr[i+1],arr[i]
        i+=1
    return arr
checkDuplicate(table)
The code doesn't fulfil the output i wanted and i would appreciate some help on this matter.

You can use sorted with a key.
table = [[5, 7], [4, 3], [3, 3], [2, 3], [1, 3]]
# Sorts by second index in decreasing order and then by first index in increasing order
sorted_table = sorted(table, key=lambda x: (-x[1], x[0]))
# sorted_table: [[5, 7], [1, 3], [2, 3], [3, 3], [4, 3]]

You should sort the entire list by the second column, using the first to break ties. This has the advantage of correctly grouping the threes even when the seven is interspersed among them, e.g. something like
table = [[4, 3],[3, 3],[5, 7],[2, 3],[1, 3]]
In Python, you can do it with a one-liner:
result = sorted(table, key=lambda x: (-x[1], x[0]))
If you want an in-place sort, do
table.sort(key=lambda x: (-x[1], x[0]))
Another neat thing you can do in this situation is to rely on the stability of Python's sorting algorithm. The docs actually suggest doing multiple sorts in complex cases like this, in the reverse order of the keys. Using the functions from operator supposedly speeds up the code as well:
from operator import itemgetter
result = sorted(table, key=itemgetter(0))
result.sort(key=itemgetter(1), reverse=True)
The first sort will arrange the IDs in the correct order. The second will sort by score, in descending order, leaving the IDs undisturbed for identical scores since the sort is stable.
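As a quick check of that reasoning, the sketch below runs both approaches on the shuffled table and compares them. The variable names by_tuple and two_pass are only illustrative:

```python
from operator import itemgetter

table = [[4, 3], [3, 3], [5, 7], [2, 3], [1, 3]]

# Single pass: score descending, then ID ascending.
by_tuple = sorted(table, key=lambda x: (-x[1], x[0]))

# Two passes, least significant key first; the second sort is stable,
# so rows with equal scores keep their ascending-ID order.
two_pass = sorted(table, key=itemgetter(0))
two_pass.sort(key=itemgetter(1), reverse=True)
```

Both produce the same ordering here. The tuple key needs numeric scores so that they can be negated, while the two-pass form works for any comparable keys.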

If you want to leave the list items with non-duplicate second elements untouched, and the ability to deal with the cases where multiple second items can be duplicate, I think you'll need more than the built-in sort.
What my function achieves:
Say your list is: table = [[5, 7], [6, 1], [8, 9], [3, 1], [4, 3], [3, 3], [2, 3], [1, 3]]
It will not touch the items [5, 7] and [8, 9], but will sort the remaining items by swapping them based on their second elements. The result will be:
[[5, 7], [3, 1], [8, 9], [6, 1], [1, 3], [2, 3], [3, 3], [4, 3]]
Here is the code:
from collections import Counter

def secondItemSort(table):
    # First get your second values
    secondVals = [e[1] for e in table]
    # The second values that are duplicate
    dups = [k for k,v in Counter(secondVals).items() if v>1]
    # The indices of those duplicate second values
    indices = dict()
    for d in dups:
        for i, e in enumerate(table):
            if e[1]==d:
                indices.setdefault(d, []).append(i)
    # Now do the sort by swapping the items intelligently
    for dupVal, indexList in indices.items():
        sortedItems = sorted([table[i] for i in indexList])
        c = 0
        for i in range(len(table)):
            if table[i][1] == dupVal:
                table[i] = sortedItems[c]
                c += 1
    # And return the intelligently sorted list
    return table
Test
Let's test on a little bit more complicated table:
table = [[5, 7], [6, 1], [8, 9], [3, 1], [4, 3], [3, 9], [3, 3], [2, 2], [2, 3], [1, 3]]
Items that should stay in their places: [5, 7] and [2, 2].
Items that should be swapped:
[6, 1] and [3, 1].
[8, 9] and [3, 9]
[4, 3], [3, 3], [2, 3], [1, 3]
Drumroll...
In [127]: secondItemSort(table)
Out[127]:
[[5, 7],
[3, 1],
[3, 9],
[6, 1],
[1, 3],
[8, 9],
[2, 3],
[2, 2],
[3, 3],
[4, 3]]

In the for-loop, when I use 'append' to add a new element to the list, the one that has added before also change

I want to get the transpose of matrix B without using Numpy. When I use 'append' to add a new element to the list, the elements that were added before also change. How can I fix it?
from decimal import *
B = [[1,2,3,5],
     [2,3,3,5],
     [1,2,5,1]]
def shape(M):
    r = len(M)
    c = len(M[0])
    return r,c
def matxRound(M, decPts=4):
    for p in M:
        for index in range(len(M[0])):
            p[index] = round(p[index], decPts)
def transpose(M):
    c_trans, r_trans = shape(M)
    new_row = [0]*c_trans
    trans_M = []
    for i in range(r_trans):
        for j in range(c_trans):
            new_row[j] = M[j][i]
        print('new_row',new_row)
        print('trans_M before append',trans_M)
        trans_M.append(new_row)
        print('trans_M after append',trans_M)
    return trans_M
print(transpose(B))
The output is here:
new_row [1, 2, 1]
trans_M before append []
trans_M after append [[1, 2, 1]]
new_row [2, 3, 2]
trans_M before append [[2, 3, 2]]
trans_M after append [[2, 3, 2], [2, 3, 2]]
new_row [3, 3, 5]
trans_M before append [[3, 3, 5], [3, 3, 5]]
trans_M after append [[3, 3, 5], [3, 3, 5], [3, 3, 5]]
new_row [5, 5, 1]
trans_M before append [[5, 5, 1], [5, 5, 1], [5, 5, 1]]
trans_M after append [[5, 5, 1], [5, 5, 1], [5, 5, 1], [5, 5, 1]]
[[5, 5, 1], [5, 5, 1], [5, 5, 1], [5, 5, 1]]

To expand on @glibdud's comment:
You create new_row once, before the loop, sized to fit a row of the transpose.
You then create your new matrix, trans_M.
On each iteration you fill new_row and append it to trans_M, without ever creating a new row list.
So each iteration modifies the very list you appended previously and then appends it again.
In the end you have appended the same list 4 times. Since all 4 entries refer to the same object in memory, your new matrix has 4 identical rows.
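A minimal sketch of one possible fix is to create a fresh row list inside the outer loop, so that every appended row is a distinct object. This keeps the structure of the original function but drops the debugging prints and inlines shape:

```python
def transpose(M):
    rows, cols = len(M), len(M[0])
    trans_M = []
    for i in range(cols):
        new_row = [0] * rows       # a new list object on every iteration
        for j in range(rows):
            new_row[j] = M[j][i]
        trans_M.append(new_row)    # each append stores a different list
    return trans_M

B = [[1, 2, 3, 5],
     [2, 3, 3, 5],
     [1, 2, 5, 1]]
result = transpose(B)
```

Appending a copy, for example trans_M.append(list(new_row)), would work equally well with the original loop.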

The most pythonic way I know to perform matrix transposition without using Numpy (that should be the preferred way), is by using list unpacking (list expansion) and the builtin zip function transposed = list(zip(*B)).
However, zip() returns tuples while your original matrix is a list of lists. So, if you want to keep your structure, you can use transposed = [list(i) for i in zip(*B)]

Printing a column of a 2-D List in Python

Suppose A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Then A[0][:] prints [1, 2, 3]
But why does A[:][0] print [1, 2, 3] again ?
It should print the column [1, 4, 7], shouldn't it?

[:] is equivalent to copy.
A[:][0] is the first row of a copy of A.
A[0][:] is a copy of the first row of A.
The two are the same.
To get the first column: [a[0] for a in A]
Or use numpy and np.array(A)[:,0]
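A short sketch to make the copy behaviour concrete. The variable names are only illustrative. Note that A[:] is a shallow copy: the outer list is new, but the rows inside it are the same objects:

```python
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

copy_of_A = A[:]                       # new outer list, same row objects
same_values = copy_of_A == A           # equal contents
different_list = copy_of_A is not A    # but a distinct outer list
shared_row = copy_of_A[0] is A[0]      # rows are shared (shallow copy)

first_column = [row[0] for row in A]   # what A[:, 0] would mean in numpy
```

This is why A[:][0] and A[0][:] both show the first row: neither slice ever reaches across rows.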

When you don't specify a start or end index Python returns the entire array:
A[:] = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

[:] matches the entire list.
So A[:] is the same as A. So A[:][0] is the same as A[0].
And A[0][:] is the same as A[0].

A[:] returns a copy of the entire list. which is A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
A[:][0] Thus selects [1, 2, 3].
If you want the first column, do a loop:
col = []
for row in A:
    col.append(row[0])

A is actually a list of lists, not a matrix. With A[:][0] you are accessing the first element (the list [1,2,3]) of the full slice of the list A. The [:] is Python slice notation (explained in the relevant Stack Overflow question).
To get [1,4,7] you would have to use something like [sublist[0] for sublist in A], which is a list comprehension, a vital element of the Python language.

Note that [:] just gives you a copy of all the content of the list. So what you are getting is perfectly normal. I think you wanted to use this operator as you would in numpy or Matlab. This does not do the same in regular Python.
A[0] is [1, 2, 3]
Therefore A[0][:] is also [1, 2, 3]
A[:] is [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Therefore A[:][0] is [1, 2, 3]
If you wanted the first column you should try:
[e[0] for e in A]
# [1, 4, 7]

Problem
A is not a 2-D list: it is a list of lists. In consideration of that:
A[0] is the first list in A:
>>> A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> A[0]
[1, 2, 3]
Consequently, A[0][:] is every element of the first list:
>>> A[0][:]
[1, 2, 3]
A[:] is every element of A, in other words it is a copy of A:
>>> A[:]
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Consequently, A[:][0] is the first element of that copy of A.
>>> A[:][0]
[1, 2, 3]
Solution
To get what you want, use numpy:
>>> import numpy as np
>>> A = np.array( [[1, 2, 3], [4, 5, 6], [7, 8, 9]] )
A is now a true two-dimensional array. We can get the first row of A:
>>> A[0,:]
array([1, 2, 3])
We can similarly get the first column of A:
>>> A[:,0]
array([1, 4, 7])