Modify numpy array column by column inside a loop - python

Is there a way to modify a numpy array inside a loop column by column?
I expect this could be done by some code like that:
import numpy as n
cnA=n.array([[10,20]]).T
mnX=n.array([[1,2],[3,4]])
for cnX in n.nditer(mnX.T, <some params>):
    cnX = cnX + cnA
Which parameters should I use to obtain mnX=[[11,12],[23,24]]?
I am aware that the problem could be solved using the following code:
cnA=n.array([10,20])
mnX=n.array([[1,2],[3,4]])
for col in range(mnX.shape[1]):
    mnX[:,col] = mnX[:,col] + cnA
However, in Python we usually loop over the objects to be modified, not over indices, so the question is: is it possible to loop directly over the columns that need to be modified in place?

Just so you know, some of us in Python do iterate over indices rather than objects when it is helpful. In NumPy, though, the general rule is not to iterate explicitly unless there is no other way out: for your problem, the simplest approach is to skip the iteration entirely and rely on broadcasting:
mnX += cnA
If you insist on iterating, I think the simplest would be to iterate over the transposed array:
for col in mnX.T:
    col += cnA[:, 0]
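For reference, here is a minimal, self-contained version of both approaches, using the arrays from the question:
import numpy as np

cnA = np.array([[10, 20]]).T            # column vector, shape (2, 1)
mnX = np.array([[1, 2], [3, 4]])

# Broadcasting: the (2, 1) column is added to every column of the (2, 2) array
mnX += cnA
print(mnX)                              # [[11 12]
                                        #  [23 24]]

# Equivalent explicit iteration: each row of mnX.T is a view of a column of
# mnX, so the in-place addition modifies mnX itself
mnX = np.array([[1, 2], [3, 4]])
for col in mnX.T:
    col += cnA[:, 0]
print(mnX)                              # same result as the broadcast above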

Related

Can I use apply() or anything else on a 1-dim dataframe to construct a list of dataframes?

Is there any way to construct a list of dataframes from a 1-dim dataframe or list? I thought apply would do the trick but it seems not to be the case. The job can be done easily by using a for loop but I wish to avoid that. More details down below.
This is the code I tried but it wouldn't work
pd.DataFrame([1,2,3,4,5]).apply(lambda x: pd.DataFrame([x]))
This is the code that does the trick, but a for loop is what I wish to avoid at all costs. Do run it so that you know what I actually try to achieve:
list = [1,2,3,4,5]
j = []
for i in list:
    i = pd.DataFrame([i])
    j = j + [i]
In the project I work on, what I wish to do is much more complex than turning an element into a 1x1 dataframe: each element would be transformed into a huge dataframe, and eventually all of the generated dataframes would be put into a list. The only bottleneck is exactly the issue I described.
Thanks in advance.
You can simplify and speed up your loop by using a list comprehension, which avoids most of the bookkeeping of an explicit for loop.
Note: I renamed your list to lst, since list is a built-in name in Python; don't shadow it with a variable name.
dfs = [pd.DataFrame([x]) for x in lst]
Now we can access each dataframe:
print(dfs[0])
   0
0  1
print(dfs[1])
   0
0  2
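As a runnable sketch, with lst defined so the snippet stands alone:
import pandas as pd

lst = [1, 2, 3, 4, 5]
dfs = [pd.DataFrame([x]) for x in lst]   # one 1x1 DataFrame per element
print(len(dfs))                          # 5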

For loop through a numpy array of strings

I imported a csv file containing zip codes as a string using the following line:
my_data = genfromtxt(r'path\to\file.csv', delimiter=',', dtype=str, autostrip=True)
I am importing as a string in order to keep the leading zeroes some zip codes may contain. Now I need to also loop through the entire numpy array and I wanted to do so like this:
for i in np.nditer(my_data):
    do something with my_data[i]
But unfortunately it is returning the following error:
Arrays used as indices must be of integer (or boolean) type
Any idea how I can loop through each element of this numpy array?
While looping over NumPy arrays is often not a good solution, you can do it like this:
for i in range(len(my_data)):
    do something with my_data[i]
You might be better off reading your data into a list, processing the strings, and converting to a NumPy array afterwards.
You should do something with i, not with my_data[i]. i is already your element (a part of my_data).
That's why my_data[i] is not working: i is not an index, it is a NumPy array.
If you want to use index, and the given element too, use enumerate()
Example:
lista = [20,50,70]
for idx, element in enumerate(lista):
    print(idx, element)
For more info, see the NumPy iteration tutorial.
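If you want a NumPy-native equivalent of enumerate(), np.ndenumerate yields an index tuple together with each element. A small sketch with made-up zip codes (the data here is hypothetical):
import numpy as np

my_data = np.array(['01234', '90210', '00501'])   # hypothetical zip codes
for idx, zipcode in np.ndenumerate(my_data):
    print(idx, zipcode)                           # e.g. (0,) 01234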

Concatenate dicts of numpy arrays retaining numpy dtype

I'm concatenating python dicts within a loop (not shown). I declare a new empty dict (dsst_mean_all) on the first instance of the loop:
if station_index == 0:
    dsst_mean_all = {}
    for key in dsst_mean:
        dsst_mean_all[key] = []

source = [dsst_mean_all, dsst_mean]
for key in source[0]:
    dsst_mean_all[key] = np.concatenate([d[key] for d in source])
and then, as you can see in the second part of the code above, I concatenate the dict that has been obtained within the loop (dsst_mean) with the large dict that's going to hold all the data (dsst_mean_all).
Now dsst_mean is a dict whose elements are numpy arrays of different types. Mostly they are float32. My question is, how can I retain the datatype during concatenation? My dsst_mean_all dict ends up being float64 numpy arrays for all elements. I need these to match dsst_mean to save memory and reduce file size. Note that dsst_mean for all iterations of the loop has the same structure and elements of the same dtype.
Thanks.
You can define the dtype of your arrays in the list comprehension.
Either hardecoded:
dsst_mean_all[key] = np.concatenate([np.asarray(d[key], dtype='float32') for d in source])
Or dynamically, casting to the dtype of the data gathered in the loop:
dsst_mean_all[key] = np.concatenate([d[key] for d in source]).astype(dsst_mean[key].dtype)
Docs: https://docs.scipy.org/doc/numpy-1.13.0/user/basics.types.html
OK, one way to solve this is to avoid declaring dsst_mean_all as a new empty dict. That, I think, is why everything is being cast to float64 by default: concatenating a float32 array with an empty Python list promotes the result to float64. With an if/else statement, on the first iteration simply set dsst_mean_all to dsst_mean, and for all subsequent iterations do the concatenation as shown in my original question.
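A minimal sketch of that second approach, using hypothetical float32 data in place of the real station loop:
import numpy as np

dsst_mean_all = {}
for station_index in range(3):                            # stand-in for the real loop
    dsst_mean = {'sst': np.arange(4, dtype=np.float32)}   # hypothetical per-station data
    if station_index == 0:
        dsst_mean_all = {k: v.copy() for k, v in dsst_mean.items()}
    else:
        for key in dsst_mean_all:
            # both operands are float32, so no float64 upcast occurs
            dsst_mean_all[key] = np.concatenate([dsst_mean_all[key], dsst_mean[key]])

print(dsst_mean_all['sst'].dtype)                         # float32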

Convert array to python scalar

I really need help; please check out this code:
import math
dose =20.0
a = [[[2,3,4],[5,8,9],[12,56,32]],
     [[25,36,45],[21,65,987],[21,58,89]],
     [[78,21,98],[54,36,78],[23,12,36]]]
PAC = math.exp(-dose*a)
This is what I would like to do. However, the error I am getting is:
TypeError: only length-1 arrays can be converted to Python scalars
If you want to perform mathematical operations on arrays (whatever their dimensions...), you should really consider using NumPy which is designed just for that. In your case, the corresponding NumPy command would be:
import numpy as np
PAC = np.exp(-dose * np.array(a))
If NumPy is not an option, you'll have to loop on each element of a, compute your math.exp, store the result in a list... Really cumbersome and inefficient. That's because the math functions require a scalar as input (as the exception told you), when you're passing a list (of lists). You can combine all the loops in a single list comprehension, though:
PAC = [[[math.exp(-dose*j) for j in elem] for elem in row] for row in a]
but once again, I would strongly recommend NumPy.
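Both versions produce the same numbers; a quick self-contained check (with the list a repaired as above):
import math
import numpy as np

dose = 20.0
a = [[[2, 3, 4], [5, 8, 9], [12, 56, 32]],
     [[25, 36, 45], [21, 65, 987], [21, 58, 89]],
     [[78, 21, 98], [54, 36, 78], [23, 12, 36]]]

PAC_np = np.exp(-dose * np.array(a))                                           # vectorised
PAC_py = [[[math.exp(-dose * j) for j in elem] for elem in row] for row in a]  # pure Python
print(np.allclose(PAC_np, np.array(PAC_py)))                                   # True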
You should really use NumPy for that.
And here is how you should do it using nested loops:
>>> for item in a:
...     for sub in item:
...         for idx, number in enumerate(sub):
...             print(number, math.exp(-dose*number))
...             sub[idx] = math.exp(-dose*number)
Building a new array with append is slow, because every step copies the previous data and stacks the new item onto it.
Using enumerate changes the numbers in place. If you want to keep a copy of a, make a deep copy first (a[:] is only a shallow copy, so the nested lists would still be shared):
import copy
acopy = copy.deepcopy(a)
If you don't have many numbers and NumPy is overkill, the above could be done a tiny bit faster using list comprehensions.
If you want each element of the array to be multiplied by -dose and then have math.exp applied to the result, you need a loop (three levels deep, since a is a list of lists of lists):
new_a = []
for block in a:
    new_block = []
    for row in block:
        new_row = []
        for element in row:
            new_row.append(math.exp(-dose * element))
        new_block.append(new_row)
    new_a.append(new_block)
Alternatively, if you have a MATLAB background, you could look into NumPy, which enables such transformations on whole arrays.

numpy: Replacing values in a recarray

I'm pretty new to numpy, and I'm trying to replace a value in a recarray. So I have this array:
import numpy as np
d = [('1', ''),('4', '5'),('7', '8')]
a = np.array(d, dtype=[('first', 'a5'), ('second', 'a5')])
I would like to do something like this:
ind = a=='' #Replace all blanks
a[ind] = '12345'
but that doesn't work properly. I was able to do this:
col = a['second']
ind = col=='' #Replace all blanks
col[ind] = '54321'
a['second'] = col
Which works, but I would rather have a way to do it over the entire recarray. Anyone have a better solution?
The "element-by-element" operations of numpy (with wich you can perform some function on all elements of the array at once without a loop) don't work with recarrays as far as I know. You can only do that with the individual columns.
If you want to use recarrays, I think the easiest solution is to loop over the different columns. I know you wanted to avoid that, but you can do it fairly automatically like this:
for fieldname in a.dtype.names:
    ind = a[fieldname] == ''
    a[fieldname][ind] = '54321'
But maybe you should consider whether you really need recarrays and can't just use a normal ndarray. Certainly if you have only one data type (as in the example), the only advantage is the column names.
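A self-contained version of that loop; note that with the 'a5' dtype the fields hold bytes, so on Python 3 you compare against b'':
import numpy as np

d = [('1', ''), ('4', '5'), ('7', '8')]
a = np.array(d, dtype=[('first', 'a5'), ('second', 'a5')])

for fieldname in a.dtype.names:
    ind = a[fieldname] == b''        # blanks in this field
    a[fieldname][ind] = '54321'

print(a)                             # the blank in 'second' is now b'54321'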
One possible solution (note that this replaces only the first blank found):
a[np.where(a['second'] == '')[0][0]]['second'] = '12345'
