Copy xarray into larger xarray in python?

I have an xarray of shape [3,] that looks like this
data= [2,4,6]
and I'm trying to copy it into array so that it looks like this:
data= [[2,4,6],[2,4,6],[2,4,6]]
(ie the entire array is copied three times).
I've tried a few different methods but keep getting:
data= [2 2 2,4 4 4,6 6 6]
Anyone know how I should go about doing this? (Also, sorry if I wrote this not according to the stack overflow rules, this is my first question...)

The first two answers don't actually copy the original array/list. Rather, they both just reference the array three times inside a new list. If you change one of the values inside the original array or any of the "copies" inside the new list, all of the "copies" of the array will change, because they're really all the same structure referenced in multiple places.
If you want to create a list containing three unique copies of your original array (xarray or list), you can do this:
new_list = [data[:] for _ in range(3)]
or if you want a new xarray containing your original array:
new_array = xarray.DataArray([data[:] for _ in range(3)])
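To make the difference concrete, here is a minimal sketch that uses a plain Python list in place of the xarray object (an assumption, so that it runs without extra packages). The names aliased and copied are illustrative only.

```python
data = [2, 4, 6]

# Three references to the same list object.
aliased = [data] * 3
# Three independent shallow copies made by slicing.
copied = [data[:] for _ in range(3)]

data[0] = 99  # mutate the original

print(aliased)  # [[99, 4, 6], [99, 4, 6], [99, 4, 6]]
print(copied)   # [[2, 4, 6], [2, 4, 6], [2, 4, 6]]
```

Slicing makes a shallow copy, which is enough here because the elements are plain integers.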

I think the safest approach would be this:
import itertools
data = [2,4,6]
res = list(itertools.chain.from_iterable(itertools.repeat(x, 3) for x in [data]))
print(res)
Output: [[2, 4, 6], [2, 4, 6], [2, 4, 6]]
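It may be worth checking whether the rows produced this way are independent of each other. itertools.repeat yields the same object on every iteration, so the sketch below compares it with copying each row; copy.deepcopy is one possible alternative and is not part of the answer above.

```python
import copy
import itertools

data = [2, 4, 6]

# itertools.repeat hands back the same list object every time.
res = list(itertools.repeat(data, 3))
print(all(row is data for row in res))  # True

# One way to get independent rows: copy each repeated item.
independent = [copy.deepcopy(row) for row in itertools.repeat(data, 3)]
independent[0][0] = 99
print(data)         # [2, 4, 6]
print(independent)  # [[99, 4, 6], [2, 4, 6], [2, 4, 6]]
```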

Related

Is there any way of getting multiple ranges of values in numpy array at once?

Let's say we have a simple 1D ndarray. That is:
import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9,10])
I want to get the first 3 and the last 2 values, so that the output would be [ 1 2 3 9 10].
I have already solved this by slicing and concatenating the slices as follows:
b= a[:2]
c= a[-2:]
a=np.concatenate([b,c])
However, I would like to know if there is a more direct way to achieve this using slices, such as a[:2 and -2:] for instance. As an alternative I already tried this:
a = a[np.r_[:2, -2:]]
but it does not seem to work. It returns only the first 2 values, that is [1 2].
Thanks in advance!
Slicing a numpy array needs to be continuous AFAIK. The np.r_[-2:] does not work because it does not know how big the array a is. You could do np.r_[:2, len(a)-2:len(a)], but this will still copy the data since you are indexing with another array.
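As a minimal sketch of the np.r_ suggestion, using the question's example array and the "first 3, last 2" target: the second form, with an explicit stop of 0 on the negative slice, is an addition of mine rather than something stated in the answer.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Spell out both ends of the tail slice so np.r_ can build the indices.
idx = np.r_[:3, len(a) - 2:len(a)]
print(idx)     # [0 1 2 8 9]
print(a[idx])  # [ 1  2  3  9 10]

# Negative indices also work if the stop is given explicitly.
idx_neg = np.r_[:3, -2:0]
print(a[idx_neg])  # [ 1  2  3  9 10]
```

Both forms index with an integer array, so the result is a copy rather than a view.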
If you want to avoid copying data or doing any concatenation operation you could use np.lib.stride_tricks.as_strided:
ds = a.dtype.itemsize
np.lib.stride_tricks.as_strided(a, shape=(2,2), strides=(ds * 8, ds)).ravel()
Output:
array([ 1, 2, 9, 10])
But since you want the first 3 and last 2 values the stride for accessing the elements will not be equal. This is a bit trickier, but I suppose you could do:
np.lib.stride_tricks.as_strided(a, shape=(2,3), strides=(ds * 8, ds)).ravel()[:-1]
Output:
array([ 1, 2, 3, 9, 10])
However, this is a potentially dangerous operation because the last element is read from outside the allocated memory.
On reflection, I cannot find a way to do this operation without copying the data somehow. The numpy ravel in the code snippets above is forced to make a copy of the data. If you can live with using the shapes (2,2) or (2,3) it might work in some cases, but you should only have read access to a strided view, and this should be enforced by setting the keyword writeable=False.
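A minimal sketch of the read-only strided view for the in-bounds (2,2) case, again using the question's array. It only illustrates the writeable=False point and the copy made by ravel; it is not a recommendation to use as_strided for this task.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
ds = a.dtype.itemsize

# Rows start at elements 0 and 8; each row holds 2 consecutive elements,
# so every read stays inside the 10-element buffer.
view = as_strided(a, shape=(2, 2), strides=(ds * 8, ds), writeable=False)

print(view.tolist())              # [[1, 2], [9, 10]]
print(np.shares_memory(view, a))  # True: the view copies nothing
print(view.flags.writeable)       # False

# Flattening a non-contiguous view forces a copy.
flat = view.ravel()
print(np.shares_memory(flat, a))  # False
```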
You could try to access the elements with a list of indices.
import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9,10])
b = a[[0,1,2,8,9]] # b should now be array([ 1, 2, 3, 9, 10])
Obviously, if your array is too long, you would not want to type out all the indices.
Instead, you could build the index list from ranges.
Something like this:
index_list = list(range(3)) + list(range(8, 10))
b = a[index_list] # b should now be array([ 1, 2, 3, 9, 10])
Therefore, as long as you know where your desired elements are, you can access them individually.
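The index-list idea can be wrapped in a small helper so the positions are computed from the array length. This is a sketch only; head_tail is a hypothetical name, not a NumPy function.

```python
import numpy as np

def head_tail(arr, n_head, n_tail):
    """Return the first n_head and last n_tail elements of a 1D array (a copy)."""
    idx = np.concatenate([np.arange(n_head),
                          np.arange(len(arr) - n_tail, len(arr))])
    return arr[idx]

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(head_tail(a, 3, 2))  # [ 1  2  3  9 10]
```

If the head and tail ranges overlap, the overlapping elements appear twice in the result.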

Python: multidimensional pandas DataFrame

This is my first question.
I have many sets of data. Each of them should be presented in a DataFrame. I have tried to implement this by having a DataFrame as an item of a multidimensional tuple, e.g.:
data[0][1].Glucose.val
data[0][1].Glucose.time
I have predefined the tuple like this:
data = tuple([data_type for _ in range(3)] for _ in range(8))
Addressing this works fine, but if I try to fill the df with new values, all elements in the tuple are overwritten:
for condition in range(8):
    for index in range(3):
        loop_it = condition + row_mult * index
        exp_setting = expIDs[loop_it]
        tempval = pd.read_csv(f"raw_data/{exp_setting}_Glucose.csv", delimiter="\t")
        rundata[condition][index].DOT.val = tempval.val.values
        rundata[condition][index].DOT.time = tempval.t
What the hell am I doing wrong?
THANKS
Tuples are immutable, so you can't replace individual items without overwriting the whole tuple. You could use lists of DataFrames instead.
If your DataFrames all have the same shape, and all the values are numerical, you could also use just one multi-dimensional NumPy array for all the data, e.g.:
import numpy as np
data = np.array([[[1, 2], [3, 4]],
                 [[5, 6], [7, 8]]])
# replace the first item in the second row of the first frame with 9
data[0, 1, 0] = 9
print(data)
Output:
[[[1 2]
  [9 4]]

 [[5 6]
  [7 8]]]
By the way, pandas did have special data structures for 3- and 4-dimensional data (Panel and Panel4D) in earlier versions, but they were deprecated and later removed, with users pointed to MultiIndex and the xarray package instead. Maybe you can stack the data into one DataFrame with two dimensions. For that, you may want to look into pandas' MultiIndex functionality.
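As a rough sketch of the stacking idea, pd.concat with keys= adds an outer index level that records which frame each row came from. The frames, column names and key labels below are made up for illustration.

```python
import pandas as pd

# Two small made-up frames standing in for per-experiment data.
run0 = pd.DataFrame({"time": [0, 1], "val": [5.0, 4.5]})
run1 = pd.DataFrame({"time": [0, 1], "val": [6.0, 5.5]})

# keys= adds an outer index level identifying the source frame.
stacked = pd.concat([run0, run1], keys=["run0", "run1"], names=["run", "row"])

# Select one experiment back out by its key.
print(stacked.xs("run1", level="run")["val"].tolist())  # [6.0, 5.5]
```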
As mentioned here: Multidimensional list of classes - overwriting issue
The issue was that I failed to instantiate the class correctly.
Wrong:
data = tuple([data_type for _ in range(3)] for _ in range(8))
Right:
data = tuple([data_type() for _ in range(3)] for _ in range(8))
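A minimal sketch of why the missing parentheses matter. DataType is a hypothetical stand-in for the question's data_type class, whose real definition is not shown.

```python
class DataType:
    """Hypothetical stand-in for the question's data_type class."""
    pass

# Wrong: every slot holds the class itself, so there is only one object.
shared = tuple([DataType for _ in range(3)] for _ in range(8))
shared[0][0].val = 1          # sets an attribute on the class
print(shared[7][2].val)       # 1
print(shared[0][0] is shared[7][2])  # True

# Right: calling the class creates a separate instance per slot.
separate = tuple([DataType() for _ in range(3)] for _ in range(8))
separate[0][0].time = 2       # sets an attribute on one instance only
print(hasattr(separate[7][2], "time"))   # False
print(separate[0][0] is separate[7][2])  # False
```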

Variable changes after I change the element of array I defined it with?

I cannot comprehend what is happening in this picture. I define a variable temp and it changes on its own after I change the array I defined it with. It's not how I thought it worked at all.
I'm using Python 3.6.1 if it matters.
A numpy array is not like Python lists. The array is a single object, and when you index it you get slices that refer to parts of the array. The rows are not independent objects, they're just views into the array.
So the value of temp is a reference to the first row of the array. Assigning to matrika[0] modifies the array. It's analogous to doing a slice assignment with regular lists, e.g.
matrika = [[1, 2, 3], [4, 5, 6], [5, 5, 5], [53, 1, 2]]
temp = matrika[0]
matrika[0][:] = matrika[1]
print(temp)
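The same behaviour with an actual NumPy array, as a short sketch. The array values are taken from the list example above; temp_view and temp_copy are illustrative names.

```python
import numpy as np

matrika = np.array([[1, 2, 3], [4, 5, 6], [5, 5, 5], [53, 1, 2]])

temp_view = matrika[0]         # view: shares memory with matrika
temp_copy = matrika[0].copy()  # independent snapshot of the first row

matrika[0] = matrika[1]        # writes into the array's first row

print(temp_view)  # [4 5 6]
print(temp_copy)  # [1 2 3]
```

Calling .copy() is the usual way to keep the old values when the array will be modified later.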

best way to create a numpy array from a list and additional individual values

I want to create an array from list entries and some additional individual values.
I am using the following approach which seems clumsy:
x=[1,2,3]
y=some_variable1
z=some_variable2
x.append(y)
x.append(z)
arr = np.array(x)
#print arr --> [1 2 3 some_variable1 some_variable2]
Is there a better solution to the problem?
You can use list addition to add the variables all placed in a list to the larger one, like so:
arr = np.array(x + [y, z])
Appending or concatenating lists is fine, and probably fastest.
Concatenating at the array level works as well:
In [456]: np.hstack([x,y,z])
Out[456]: array([1, 2, 3, 4, 5])
This is compact, but under the covers it does
np.concatenate([np.array(x),np.array([y]),np.array([z])])
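For comparison, the options mentioned above side by side. The values 4 and 5 are placeholders for some_variable1 and some_variable2, and np.r_ is an extra variant not mentioned in the answers.

```python
import numpy as np

x = [1, 2, 3]
y, z = 4, 5  # placeholder values for some_variable1 / some_variable2

via_list = np.array(x + [y, z])    # builds a new list; x is left unchanged
via_hstack = np.hstack([x, y, z])  # promotes scalars to 1D, then concatenates
via_r = np.r_[x, y, z]             # index-style shorthand for concatenation

print(via_list)    # [1 2 3 4 5]
print(via_hstack)  # [1 2 3 4 5]
print(via_r)       # [1 2 3 4 5]
print(x)           # [1, 2, 3]
```

Unlike the append calls in the question, none of these modify x in place.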

list comprehension to merge various lists in python

I need to plot a lot of data samples, each stored in a list of integers. I want to create a list from a lot of concatenated lists, in order to plot it with enumerate(big_list) in order to get a fixed-offset x coordinate.
My current code is:
biglist = []
for n in xrange(number_of_lists):
    biglist.extend(recordings[n][chosen_channel])
for x,y in enumerate(biglist):
    print x,y
Notes: number_of_lists and chosen_channel are integer parameters defined elsewhere, and print x,y is just an example (actually there are other statements to plot the points).
My question is:
Is there a better way, for example a list comprehension or some other operation, to achieve the same result (a merged list) without the loop and the pre-declared empty list?
Thanks
import itertools
for x,y in enumerate(itertools.chain(*(recordings[n][chosen_channel] for n in xrange(number_of_lists)))):
    print x,y
You can think of itertools.chain() as managing an iterator over the individual lists. It remembers which list you are in and where you are in that list. This saves you all the memory you would otherwise need to create the big list.
>>> import itertools
>>> l1 = [2,3,4,5]
>>> l2=[9,8,7]
>>> itertools.chain(l1,l2)
<itertools.chain object at 0x100429f90>
>>> list(itertools.chain(l1,l2))
[2, 3, 4, 5, 9, 8, 7]
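A Python 3 sketch of the same idea using itertools.chain.from_iterable, which takes the generator directly instead of unpacking it with *. The recordings data below is made up; only its nesting (recording, then channel, then samples) follows the question.

```python
from itertools import chain

# Made-up data: recordings[n][channel] is a list of integer samples.
recordings = [
    [[1, 2], [10, 20]],
    [[3], [30]],
    [[4, 5], [40, 50]],
]
chosen_channel = 1
number_of_lists = len(recordings)

# chain.from_iterable consumes the generator lazily, one list at a time.
samples = chain.from_iterable(
    recordings[n][chosen_channel] for n in range(number_of_lists)
)
points = list(enumerate(samples))
print(points)  # [(0, 10), (1, 20), (2, 30), (3, 40), (4, 50)]
```

The list() call is only there to show the result; when plotting, the enumerate object can be iterated directly without building the full list.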
