How can I reshape a 2D array into 1D in python?

Let me edit my question again. I know how flatten works, but I am asking whether it is possible to remove the inner brackets and keep only the two outer brackets, the way MATLAB displays it, while keeping the (3,4) layout. Here I have arrays inside an array, and I want a single array so that I can plot it easily and get the same results as in MATLAB.
For example I have the following matrix (which is arrays inside array):
s=np.arange(12).reshape(3,4)
print(s)
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
Is it possible to reshape or flatten() it and get results like this:
[ 0 1 2 3
4 5 6 7
8 9 10 11]

First answer
If I understood your question correctly (and 4 other answers say I didn't), your problem is not how to flatten() or reshape(-1) an array, but how to ensure that, even after reshaping, it still displays with 4 elements per line.
Strictly speaking, I don't think you can. Arrays are just a bunch of elements; they carry no indication of how we want to see them. That is a printing problem, which you are supposed to solve when printing. You can see [here][1] that people who want to do that... start by reshaping the array to 2D.
That being said, without writing your own printing function, you can control how numpy displays arrays, using np.set_printoptions.
Still, it is tricky, because this function only lets you specify how many characters, not elements, are printed per line. So you need to know how many characters each element will need in order to force line breaks.
In your example:
np.set_printoptions(formatter={"all":lambda x:"{:>6}".format(x)}, linewidth=7+(6+2)*4)
The formatter ensures that each number uses 6 chars.
The linewidth must account for the "array([" prefix and the closing "])" (9 chars), plus the 2-char ", " between elements. Since we want 4 elements per line, it must be 9+6×4+2×3: 9 chars for "array([...])", 6×4 for the 4 numbers, and 2×3 for the 3 ", " separators. Equivalently, 7+(6+2)×4.
You can also apply it to a single print only:
with np.printoptions(formatter={"all":lambda x:"{:>6}".format(x)}, linewidth=7+(6+2)*4):
    print(s.reshape(-1))
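If the character arithmetic feels fragile, the "own printing function" route mentioned above can be sketched as follows. This is only an illustration: format_per_line and its per_line and width parameters are hypothetical names, not part of numpy, and only the printed text changes while the data stays a plain 1D array.

```python
import numpy as np

def format_per_line(arr, per_line, width=2):
    """Return the flattened array as text with per_line elements per row.

    Hypothetical helper (not a numpy function).
    """
    flat = np.asarray(arr).reshape(-1)
    lines = []
    for start in range(0, flat.size, per_line):
        chunk = flat[start:start + per_line]
        # Right-align every element to the same width so the columns line up
        lines.append(" ".join(str(x).rjust(width) for x in chunk))
    # One pair of outer brackets; continuation rows are indented by one space
    return "[" + "\n ".join(lines) + "]"

s = np.arange(12).reshape(3, 4)
text = format_per_line(s, per_line=4)
print(text)
```

The width has to be at least as large as the longest element, which is the same constraint as the 6-char formatter above.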
Edit, some time later: subclass
Another method that came to mind is to subclass ndarray so that it behaves the way you want:
import numpy as np

class MyArr(np.ndarray):
    # Create a new array from ls (number of elements to print per line)
    # and arr (a normal array or sequence to take the data from)
    def __new__(cls, ls, arr):
        n = np.ndarray.__new__(MyArr, (len(arr),))
        n.ls = ls
        n[:] = arr[:]
        return n

    def __init__(self, *args):
        pass

    # Make .ls "viral": whenever an array is created by an operation on an
    # array that has .ls, the .ls is copied to the new array
    def __array_finalize__(self, obj):
        if not hasattr(self, 'ls') and type(obj) == MyArr and hasattr(obj, 'ls'):
            self.ls = obj.ls

    # Print the array with .ls elements per line
    def __repr__(self):
        # For anything other than a 1D array, just use the standard representation
        if len(self.shape) != 1:
            return super().__repr__()
        mxsize = max(len(str(s)) for s in self)
        s = '['
        for i in range(len(self)):
            if i % self.ls == 0 and i > 0:
                s += '\n '
            s += f'{{:{mxsize}}}'.format(self[i])
            if i + 1 < len(self):
                s += ', '
        s += ']'
        return s
Now you can use this MyArr to build your own 1D array
MyArr(4, range(12))
shows
[ 0.0, 1.0, 2.0, 3.0,
4.0, 5.0, 6.0, 7.0,
8.0, 9.0, 10.0, 11.0]
And you can use it anywhere a 1D ndarray is legal. Most of the time, the .ls attribute will follow (I say "most of the time" because I cannot guarantee that no function will build a new ndarray and fill it with the data from this one).
a=MyArr(4, range(12))
a*2
#[ 0.0, 2.0, 4.0, 6.0,
# 8.0, 10.0, 12.0, 14.0,
# 16.0, 18.0, 20.0, 22.0]
a*a
#[ 0.0, 1.0, 4.0, 9.0,
# 16.0, 25.0, 36.0, 49.0,
# 64.0, 81.0, 100.0, 121.0]
a[8::-1]
#[8.0, 7.0, 6.0, 5.0,
# 4.0, 3.0, 2.0, 1.0,
# 0.0]
# It even resists reshaping
b=a.reshape((3,4))
b
#MyArr([[ 0., 1., 2., 3.],
# [ 4., 5., 6., 7.],
# [ 8., 9., 10., 11.]])
b.reshape((12,))
#[ 0.0, 1.0, 2.0, 3.0,
# 4.0, 5.0, 6.0, 7.0,
# 8.0, 9.0, 10.0, 11.0]
# Or fancy indexing
a[np.array([1,2,5,5,5])]
#[1.0, 2.0, 5.0, 5.0,
# 5.0]
# Or matrix operations
M=np.eye(12,k=1)+2*np.identity(12) # Just a matrix
M@a
#[ 1.0, 4.0, 7.0, 10.0,
# 13.0, 16.0, 19.0, 22.0,
# 25.0, 28.0, 31.0, 22.0]
np.diag(M*a)
#[ 0.0, 2.0, 4.0, 6.0,
# 8.0, 10.0, 12.0, 14.0,
# 16.0, 18.0, 20.0, 22.0]
# But of course, sometimes you lose the MyArr class
import pandas as pd
pd.DataFrame(a, columns=['v']).v.values
#array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11.])
[1]: https://stackoverflow.com/questions/25991666/how-to-efficiently-output-n-items-per-line-from-numpy-array

Simply, using reshape function with -1 as shape should do:
print(s)
print(s.reshape(-1))
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[ 0 1 2 3 4 5 6 7 8 9 10 11]

Try .ravel():
s = np.arange(12).reshape(3, 4)
print(s.ravel())
Prints:
[ 0 1 2 3 4 5 6 7 8 9 10 11]
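One practical difference between the options above, shown as a small sketch: ravel() returns a view of the original data when the memory layout allows it (as it does for this array), whereas flatten() always returns an independent copy. The variable names are illustrative.

```python
import numpy as np

s = np.arange(12).reshape(3, 4)

flat_view = s.ravel()      # a view here, because s is contiguous in memory
flat_copy = s.flatten()    # always a new, independent copy

shares_view = np.shares_memory(s, flat_view)
shares_copy = np.shares_memory(s, flat_copy)
print(shares_view, shares_copy)

# Writing through the view changes the original; writing to the copy does not
flat_view[0] = 100
flat_copy[1] = -1
print(s[0, 0], s[0, 1])
```

So if the flattened array will be modified and the 2D original must stay intact, flatten() (or an explicit .copy()) is the safer choice.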

You can use itertools.chain:
from itertools import chain
import numpy as np
s=np.arange(12).reshape(3,4)
print(list(chain(*s)))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(s.reshape(12,)) # this will also work
print(s.reshape(s.shape[0] * s.shape[1],)) # if you don't know the number of elements beforehand

Related

Get an array of corresponding values in a reference array from very big input array

I have the following array:
table = np.array([
[1.0, 1.0, 3.0, 5.0],
[1.0, 2.0, 5.0, 3.0],
...
[2.0, 5.0, 2.0, 1.0],
[8.0, 9.0, 7.0, 2.0]])
Let's name the columns ['a', 'b', 'm', 'n'], respectively.
"table" is my reference table, from which I want to extract 'm' and 'n' given the 'a' and 'b' values contained in a list we will call 'my_list'. Duplicate (a, b) pairs are allowed in that list.
N.B.: Here "list" means an array (not a list in the Python sense).
It is easy to do with a for loop. But in my problem, 'my_list' can contain more than 100000 (a, b) pairs, so a for loop is not optimal for my work.
How can I do it with numpy functions or pandas functions in a few lines (1 to 3 lines)?
An example of what I want: Given the following list
my_list = np.array([
[1.0, 2.0],
[1.0, 2.0],
[8.0, 9.0]])
I want to have the following result:
results = np.array([
[5.0, 3.0],
[5.0, 3.0],
[7.0, 2.0]])
Thank you in advance
Edit 1: equivalence with for loop
Here is the equivalent with a for loop (the simplest way, without binary search):
results = []
for x in my_list:
    for y in table:
        if (x[0] == y[0]) and (x[1] == y[1]):
            results.append([y[2], y[3]])
            break
print(results)
One possible approach using pandas is to perform an inner merge:
pd.DataFrame(table).merge(pd.DataFrame(my_list))[[2, 3]].to_numpy()
array([[5., 3.],
[5., 3.],
[7., 2.]])
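One thing to watch with the inner merge: the rows come back in the order of the left frame (table), which happens to match my_list in this example. A variant that keeps the order of my_list is sketched below. The small table is made-up sample data standing in for the elided rows, the column labels follow the a, b, m, n naming from the question, and the sketch assumes each (a, b) pair appears at most once in table.

```python
import numpy as np
import pandas as pd

# Made-up reference table (columns a, b, m, n), deliberately not in my_list order
table = np.array([
    [8.0, 9.0, 7.0, 2.0],
    [1.0, 1.0, 3.0, 5.0],
    [1.0, 2.0, 5.0, 3.0],
    [2.0, 5.0, 2.0, 1.0]])
my_list = np.array([
    [1.0, 2.0],
    [8.0, 9.0],
    [1.0, 2.0]])

ref = pd.DataFrame(table, columns=['a', 'b', 'm', 'n'])
queries = pd.DataFrame(my_list, columns=['a', 'b'])

# how='left' keeps one output row per query row, in query order;
# pairs that are missing from the table would show up as NaN
results = queries.merge(ref, on=['a', 'b'], how='left')[['m', 'n']].to_numpy()
print(results)
```

Putting my_list on the left is the design choice that matters here: a left merge follows the key order of the left frame, so duplicates and ordering in my_list carry over to the result.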

Flatten inner tuples inside 3D NumPy array and save to CSV as floats

I would like to flatten the innermost and outermost axes of a 3(/4) dimensional NumPy array, when the innermost rows contain mixed-type entries: both floats and arrays/tuples of floats. I would then like to save this to a CSV file with all values as type float64.
have = [[[0.0 array([24.0,25.0]) 2.0 3.0]
[4.0 array([26.0,27.0]) 6.0 7.0]
[8.0 array([28.0,29.0]) 10.0 11.0]]
[[12.0 array([30.0,31.0]) 14.0 15.0]
[16.0 array([30.0,31.0]) 18.0 19.0]
[20.0 array([30.0,31.0]) 22.0 23.0]]]
target = [[0.0 24.0 25.0 2.0 3.0]
[4.0 26.0 27.0 6.0 7.0]
[8.0 28.0 29.0 10.0 11.0]
[12.0 30.0 31.0 14.0 15.0]
[16.0 30.0 31.0 18.0 19.0]
[20.0 30.0 31.0 22.0 23.0]]
np.savetxt('target.csv', target, delimiter=',')
Your question uses the example of a 3D array with an array element pointer at a central index, however the existing answers do not directly apply to your example input/output without further nontrivial steps. Here is a solution which achieves everything you have asked for (and will generalise) from your example to the CSV file in an efficient and concise way.
First let's create your example array have.
>>> import numpy as np
>>> have = np.arange(24).reshape(2,3,4).astype('object')
>>> ins = np.arange(24,36)
>>> c = 0
>>> for i in range(2):
...     for j in range(3):
...         have[i][j][1] = np.array([ins[c], ins[c+1]])
...         c += 2
>>> print(have) # The exact array as in the question.
[[[0 array([24, 25]) 2 3]
[4 array([26, 27]) 6 7]
[8 array([28, 29]) 10 11]]
[[12 array([30, 31]) 14 15]
[16 array([32, 33]) 18 19]
[20 array([34, 35]) 22 23]]]
Now we will create a new array to become your target array.
# Axis 1 is not needed as output will be all rows
#
>>> all_rows = np.vstack(have)
# Efficiently create dummy array of correct size
#
>>> target = np.empty((have.shape[0]*have.shape[1], 4+1))
# LHS floats in correct positions
#
>>> target[:,:1] = all_rows[:,:1]
# RHS floats in correct positions
#
>>> target[:,-2:] = all_rows[:,-2:]
# Slicing at a single index converts each array to floats
#
>>> target[:,1:3] = np.vstack(all_rows[:,1])
>>> print(target) # The exact array as in the question.
[[ 0. 24. 25. 2. 3.]
[ 4. 26. 27. 6. 7.]
[ 8. 28. 29. 10. 11.]
[12. 30. 31. 14. 15.]
[16. 32. 33. 18. 19.]
[20. 34. 35. 22. 23.]]
This results in an array target with all entries of type float64 and so we can save to a CSV file in the current directory exactly as you have suggested in your question.
>>> np.savetxt('target.csv', target, delimiter=',')
Also just to let you know, you can include inline code/highlights on Stack Overflow using single 'backticks' instead of the normal single quotation mark used in your question. This is often found immediately below the Esc key on a keyboard.
When you display have, numpy only knows that it has a (2,4) array in which some of the entries are pointers to other objects (numpy doesn't know what those are, but uses their own representations when you print have). As such, any solution is going to require gathering up these other arrays and re-assigning them elsewhere, since numpy cannot know in advance that all the arrays are the same size.
import numpy as np
have = np.array([[np.array([1,2]),5],[np.array([3,4]),7]], dtype=object) #dtype=object is needed for mixed entries on recent NumPy versions
target = np.empty((2,3)) #Create some empty space
target[:,2] = have[:,1] #Reassign all the float points to their locations
target[:,:2] = np.vstack(have[:,0]) #Stack all the other arrays (assumes same size)
This code should work for the specific scenario in the question:
1. Access the innermost list.
2. Check the type of each element.
3. If it is of type <class 'numpy.ndarray'>, call .tolist() to convert it to a Python list and use the + operator to merge it into the existing inner list.
Code:
import numpy as np
have = [
[
[0.0, np.array([24.0,25.0]), 2.0, 3.0],
[4.0, np.array([26.0,27.0]), 6.0, 7.0],
[8.0, np.array([28.0,29.0]), 10.0, 11.0]
],
[
[12.0, np.array([30.0,31.0]), 14.0, 15.0],
[16.0, np.array([30.0,31.0]), 18.0, 19.0],
[20.0, np.array([30.0,31.0]), 22.0, 23.0]
]
]
tar = []
for i, e1 in enumerate(have):
    tar.append([])
    for j, e2 in enumerate(e1):
        temp = []
        for e3 in e2:
            if isinstance(e3, np.ndarray):
                temp += e3.tolist()
            else:
                temp.append(e3)
        tar[i].append(temp)
print(tar)
Output:
[[[0.0, 24.0, 25.0, 2.0, 3.0],
[4.0, 26.0, 27.0, 6.0, 7.0],
[8.0, 28.0, 29.0, 10.0, 11.0]],
[[12.0, 30.0, 31.0, 14.0, 15.0],
[16.0, 30.0, 31.0, 18.0, 19.0],
[20.0, 30.0, 31.0, 22.0, 23.0]]]
I think it's fair to say that the core of your problem is merging numpy.ndarrays into existing lists of floats, rather than manipulations of the outer dimensions of the lists. The piece of code below addresses that problem, using your first inner list as an example:
import numpy as np
iList = [0.0, np.array([24.0,25.0]), 2.0, 3.0]
newList = []
for i in iList:
    if type(i) == np.ndarray:
        lList = i.tolist()
        for l in lList:
            newList.append(l)
    else:
        newList.append(i)
If you run this test code you can confirm that all the results are now floats:
for x in newList:
    print(type(x))
which outputs:
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
Saving to CSV is well-covered elsewhere.
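For completeness, here is a sketch that combines the flattening step with the CSV export in one pass. It uses a shortened version of the question's data, and the names rows, buffer and csv_text are illustrative. It writes to an in-memory buffer so it can be run without touching the disk; passing a filename such as 'target.csv' to np.savetxt instead would write the file.

```python
import io
import numpy as np

# Shortened version of the nested input from the question
have = [
    [[0.0, np.array([24.0, 25.0]), 2.0, 3.0],
     [4.0, np.array([26.0, 27.0]), 6.0, 7.0]],
    [[12.0, np.array([30.0, 31.0]), 14.0, 15.0],
     [16.0, np.array([30.0, 31.0]), 18.0, 19.0]],
]

# Flatten the two outer levels into one list of rows and splice each inner
# array into its row; np.atleast_1d turns a bare float into a 1-element array
rows = [
    np.concatenate([np.atleast_1d(item) for item in row])
    for block in have
    for row in block
]
target = np.array(rows, dtype=np.float64)   # shape (4, 5), all float64

# Write CSV text to an in-memory buffer (a filename works the same way)
buffer = io.StringIO()
np.savetxt(buffer, target, delimiter=',')
csv_text = buffer.getvalue()
print(target.shape, target.dtype)
```

This assumes every row ends up with the same number of values after splicing; otherwise np.array cannot build a rectangular float array.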

sum elements of list under conditions of second list

I'm trying to add up certain elements of two related lists. I will give an example so you understand what I'm talking about. At the end I include the code I have; it works, but I want to optimize it, otherwise I have to write lots of things by hand. Apologies if the question is not interesting.
list1 = [4.0, 8.0, 14.0, 20.0, 22.0, 26.0, 28.0, 30.0, 32.0, 34.0, 36.0, 38.0, 40.0]
list2 = [2.1, 1.8, 9.5, 5., 5.4, 6.7, 3.3, 5.3, 8.8, 9.4, 5., 9.3, 3.1]
List 1 corresponds to time, so what I want to do is group everything into intervals of 10 [units of time]. From list1 I can see that the first and second elements belong to the range 0-10, so I need to add up their corresponding values in list2. Next, the third and fourth elements of list1 belong to the range (10 < time <= 20), so I add the corresponding elements of list2; for the third range I need to add the following 4 elements of list2, and so on. In the end I would like to create 2 new lists:
list3 = [10., 20., 30., 40.]
list4 = [3.9, 14.5, 20.7, 35.6]
The code I wrote is the following:
import numpy
list1 = [4.0, 8.0, 14.0, 20.0, 22.0, 26.0, 28.0, 30.0, 32.0, 34.0, 36.0, 38.0, 40.0]
list2 = [2.1, 1.8, 9.5, 5., 5.4, 6.7, 3.3, 5.3, 8.8, 9.4, 5., 9.3, 3.1]
list3 = numpy.arange(10., 50., 10.)  # upper edge of each interval: 10, 20, 30, 40
a = [[] for i in range(4)]
for i, j in enumerate(list1):
    if 0.<=j<=10.:
        a[0].append(list2[i])
    elif 10.<j<=20.:
        a[1].append(list2[i])
    elif 20.<j<=30.:
        a[2].append(list2[i])
    elif 30.<j<=40.:
        a[3].append(list2[i])
list4 = [sum(i) for i in a]
It works; however, in reality list1 is much larger (by a few orders of magnitude), and I don't want to write all the ifs (or the sublists) by hand. Any suggestions will be appreciated.
First of all, if we are talking about huge sets, I would use numpy, pandas, or another tool designed for this. In my experience, plain Python is not designed for working with more than 10M elements (unless there is structure in the data you can exploit).
With numpy, this can be done as follows:
import numpy as np
# convert the lists to numpy arrays
l1 = np.array(list1)
l2 = np.array(list2)
# determine the "groups" of the values
g = (l1-0.00001)//10
# create a boolean mask that determines where the groups change
flag = np.concatenate(([True], g[1:] != g[:-1]))
# determine the indices where a new group starts
inv_idx, = flag.nonzero()
# calculate the sum per subrange
result = np.add.reduceat(l2,inv_idx)
For your sample output, this gives:
>>> result
array([ 3.9, 14.5, 20.7, 35.6])
The 0.00001 pushes a value such as 20.0 down to 19.99999, so that it is assigned to group 1 instead of group 2. The advantages of this approach are that (a) it works for an arbitrary number of "groups" and (b) a fixed number of passes are made over the list, so it scales linearly with the number of elements. Note that reduceat sums contiguous runs, so list1 is assumed to be sorted, as it is in the example.
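For comparison, here is a sketch of the same grouping done with np.bincount, which does not need list1 to be sorted. It assumes non-negative times and a bin width of 10 with the upper edge inclusive, as in the question; the names width and groups are illustrative.

```python
import numpy as np

list1 = np.array([4.0, 8.0, 14.0, 20.0, 22.0, 26.0, 28.0, 30.0, 32.0, 34.0, 36.0, 38.0, 40.0])
list2 = np.array([2.1, 1.8, 9.5, 5., 5.4, 6.7, 3.3, 5.3, 8.8, 9.4, 5., 9.3, 3.1])
width = 10.0

# ceil(t / width) - 1 maps (0, 10] -> 0, (10, 20] -> 1, ... (upper edge inclusive)
groups = np.ceil(list1 / width).astype(int) - 1
groups = np.clip(groups, 0, None)   # a time of exactly 0 falls into the first bin

# Sum list2 per group; weights makes bincount add values instead of counting
list4 = np.bincount(groups, weights=list2)
# Upper edge of each bin: 10, 20, 30, 40
list3 = width * (np.arange(len(list4)) + 1)
print(list3, list4)
```

Bins that contain no times simply get a sum of 0, so the output always covers every interval up to the largest time.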
If you convert your lists to numpy.array, there is an easy way to select elements of one 1D array based on another:
import numpy
list1 = numpy.array([4.0, 8.0, 14.0, 20.0, 22.0, 26.0, 28.0, 30.0, 32.0, 34.0, 36.0, 38.0, 40.0])
list2 = numpy.array([2.1, 1.8, 9.5, 5., 5.4, 6.7, 3.3, 5.3, 8.8, 9.4, 5., 9.3, 3.1])
step = 10
r, s = range(0,50,10), []
for i in r:
    s.append(numpy.sum([l for l in list2[(list1 > i) & (list1 <= i+step)]]))
print(list(r[1:]), s[:-1])
# prints [10, 20, 30, 40] and sums approximately equal to [3.9, 14.5, 20.7, 35.6]
Edit
In one line:
s = [numpy.sum([l for l in list2[(list1 > i) & (list1 <= i+step)]]) for i in r]

Python: Make Copy of List [duplicate]

This question already has answers here:
How do I clone a list so that it doesn't change unexpectedly after assignment?
(24 answers)
Closed 6 years ago.
How do I make a copy of a list, so that I can edit the copy without affecting the original? Ex:
x = [1., 2., 3., 4.]
y = x
y[0] = 9.
The output is:
x: [9.0, 2.0, 3.0, 4.0]
y: [9.0, 2.0, 3.0, 4.0]
when I want x to be:
x: [1.0, 2.0, 3.0, 4.0]
So how do I make a copy of a variable while keeping the original unchanged?
Thanks in advance,
Eric
Just wrap x with python's list function when declaring y and it works!
x = [1, 2, 3, 4]
y = list(x)
y[0] = 9
print(x)
print(y)
# This prints the following:
#[1, 2, 3, 4]
#[9, 2, 3, 4]
You can, in this case, use:
x = [1., 2., 3., 4.]
y = x[:]
y[0] = 9.
Output for x and y:
[1.0, 2.0, 3.0, 4.0]
[9.0, 2.0, 3.0, 4.0]
But read this.
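A caveat that applies to both list(x) and x[:]: they make shallow copies, so only the outer list is duplicated. For a flat list of floats like the one above that is all you need, but with nested lists the inner lists are still shared. The sketch below illustrates the difference using copy.deepcopy from the standard library; the variable names are illustrative.

```python
import copy

nested = [[1.0, 2.0], [3.0, 4.0]]

shallow = list(nested)          # same as nested[:]: new outer list, shared inner lists
deep = copy.deepcopy(nested)    # the inner lists are copied too

shallow[0][0] = 9.0             # also visible through nested
deep[1][1] = -1.0               # nested is unaffected

print(nested)   # [[9.0, 2.0], [3.0, 4.0]]
print(deep)     # [[1.0, 2.0], [3.0, -1.0]]
```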

Python Linear Regression Error

I have two arrays with the following values:
>>> x = [24.0, 13.0, 12.0, 22.0, 21.0, 10.0, 9.0, 12.0, 7.0, 14.0, 18.0,
... 1.0, 18.0, 15.0, 13.0, 13.0, 12.0, 19.0, 13.0]
>>> y = [10.0, 9.0, 22.0, 7.0, 4.0, 7.0, 56.0, 5.0, 24.0, 25.0, 11.0, 2.0,
... 9.0, 1.0, 9.0, 12.0, 9.0, 4.0, 2.0]
I used the scipy library to calculate r-squared:
>>> from scipy.interpolate import polyfit
>>> p1 = polyfit(x, y, 1)
When I run the code below:
>>> yfit = p1[0] * x + p1[1]
>>> yfit
array([], dtype=float64)
The yfit array is empty. I don't understand why.
The problem is you are performing scalar addition with an empty list.
The reason you have an empty list is because you try to perform scalar multiplication with a python list rather than with a numpy.array. The scalar is converted to an integer, 0, and creates a zero length list.
We'll explore this below, but to fix it you just need your data in numpy arrays instead of in lists. Either create it originally, or convert the lists to arrays:
>>> import numpy
>>> x = numpy.array([24.0, 13.0, 12.0, 22.0, 21.0, 10.0, 9.0, 12.0, 7.0, 14.0,
...     18.0, 1.0, 18.0, 15.0, 13.0, 13.0, 12.0, 19.0, 13.0])
An explanation of what was going on follows:
Let's unpack the expression yfit = p1[0] * x + p1[1].
The component parts are:
>>> p1[0]
-0.58791208791208893
p1[0] isn't a plain Python float, however; it's a numpy data type:
>>> type(p1[0])
<class 'numpy.float64'>
x is as given above.
>>> p1[1]
20.230769230769241
Similar to p1[0], the type of p1[1] is also numpy.float64:
>>> type(p1[1])
<class 'numpy.float64'>
Multiplying a list by a non-integer truncates the number to an integer, so p1[0], which is -0.58791208791208893, becomes 0:
>>> p1[0] * x
[]
as
>>> 0 * [1, 2, 3]
[]
Finally, you are adding the empty list to p1[1], which is a numpy.float64.
This doesn't try to append the value to the empty list. It performs scalar addition, i.e. it adds 20.230769230769241 to each entry in the list.
However, since the list is empty there is no effect, other than it returns an empty numpy array with the type numpy.float64:
>>> [] + p1[1]
array([], dtype=float64)
An example of a scalar addition having an effect:
>>> [10, 20, 30] + p1[1]
array([ 30.23076923, 40.23076923, 50.23076923])
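Putting the fix together, here is a minimal sketch of the whole computation with the data held in numpy arrays from the start. It uses numpy.polyfit directly (an assumption on my part; the question imported polyfit through scipy) and adds the r-squared calculation the question was aiming for, using the usual residual/total sum-of-squares formula.

```python
import numpy as np

x = np.asarray([24.0, 13.0, 12.0, 22.0, 21.0, 10.0, 9.0, 12.0, 7.0, 14.0, 18.0,
                1.0, 18.0, 15.0, 13.0, 13.0, 12.0, 19.0, 13.0])
y = np.asarray([10.0, 9.0, 22.0, 7.0, 4.0, 7.0, 56.0, 5.0, 24.0, 25.0, 11.0, 2.0,
                9.0, 1.0, 9.0, 12.0, 9.0, 4.0, 2.0])

# Degree-1 fit: p1[0] is the slope, p1[1] the intercept
p1 = np.polyfit(x, y, 1)
yfit = p1[0] * x + p1[1]          # elementwise, because x is an ndarray

# Coefficient of determination from residual and total sums of squares
ss_res = np.sum((y - yfit) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r_squared = 1.0 - ss_res / ss_tot
print(yfit.shape, r_squared)
```

Because x is an array, the multiplication is elementwise and yfit has one fitted value per data point.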
