numpy: Replacing values in a recarray - python

I'm pretty new to numpy, and I'm trying to replace a value in a recarray. So I have this array:
import numpy as np
d = [('1', ''),('4', '5'),('7', '8')]
a = np.array(d, dtype=[('first', 'a5'), ('second', 'a5')])
I would like to do something like this:
ind = a=='' #Replace all blanks
a[ind] = '12345'
but that doesn't work properly. I was able to do this:
col = a['second']
ind = col=='' #Replace all blanks
col[ind] = '54321'
a['second'] = col
Which works, but I would rather have a way to do it over the entire recarray. Anyone have a better solution?

The "element-by-element" operations of numpy (with which you can apply a function to all elements of the array at once, without a loop) don't work on recarrays as far as I know. You can only use them on the individual columns.
If you want to use recarrays, I think the easiest solution is to loop over the different columns. I know you wanted to avoid a loop, but you can make it fairly automatic like this:
for fieldname in a.dtype.names:
    ind = a[fieldname] == ''
    a[fieldname][ind] = '54321'
But maybe you should consider whether you really need recarrays, or whether a normal ndarray would do. Certainly if you have only one data type (as in the example), the only advantage is the column names.
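To illustrate that last point: with a plain ndarray of a single dtype, the boolean-mask replacement from the question works across the whole array in one step. A minimal sketch (using a unicode 'U5' dtype instead of 'a5'):

```python
import numpy as np

# A plain 2-D array with one dtype for every element.
b = np.array([['1', ''], ['4', '5'], ['7', '8']], dtype='U5')

# The element-wise mask now covers all "columns" at once.
b[b == ''] = '12345'
```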

One possible solution, although it only replaces the first blank it finds in the 'second' field:
a[np.where(a['second'] == '')[0][0]]['second'] = '12345'

Key-Range does not stay rounded when put into dictionary in Python

I am trying to bin the values in my data and put them in a dictionary in Python.
However, after creating the dictionary, its key range produces weird artifacts, like 0.6900000000000001 instead of 0.69. They only appear after creating the dictionary, though; the initial array "key_range" has only normal values. Therefore, the last two lines of my code produce KeyErrors, since the value 0.69 does not exist.
Does anyone know what is going on? Is it wrong to use the zip-function? Can I not create a functioning dictionary like this? I suppose I can iterate through the key values and round them manually, but I imagine there are more elegant solutions.
Cheers, and thanks
import numpy as np
key_range = np.arange(0, 1, 0.01) # these numbers are perfectly OK.
values = [0] * len(key_range)
value_dict = dict(zip(key_range, values)) # and here, I get weird artifacts.
print(value_dict)
for i in range(0, len(data)):
    value_dict[data[i]] = value_dict[data[i]] + 1
You wrote: "I suppose I can iterate through the key values and round them manually, but I imagine there are more elegant solutions." For what it is worth, you can fix them within the expression that creates value_dict, which still looks pretty elegant to me:
value_dict = dict(zip(map(lambda x: round(x,2), key_range), values))
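To see why rounding at key-creation time works: the artifact comes from binary floating point itself, not from dict or zip; np.arange merely exposes it. A minimal sketch using plain Python floats:

```python
# 0.69 has no exact binary representation, so 69 * 0.01 lands on a
# neighbouring float that prints as 0.6900000000000001.
step = 0.01
print(69 * step == 0.69)   # False

# Rounding each key to 2 decimals snaps it back to the float you expect.
keys = [round(i * step, 2) for i in range(100)]
value_dict = dict(zip(keys, [0] * len(keys)))
print(0.69 in value_dict)  # True
```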

Python - convert a list to an array

I have a list of values and I would like to convert it to an array in order to easily extract columns, but the embedded " characters prevent me from using: x = np.array(a, dtype=float)
['"442116.503118","442116.251106"',
'"442141.502863","442141.247462"',
...
The message obtained is :
"could not convert string to float: "442116.503118","442116.251106""
Answering based on the very limited information given, but if that is your list, it looks like a list of nested strings, not floats. Try
x = np.array([float(i.replace("\"","")) for i in a], dtype=float)
This is just wrong... This does the trick for me though:
import numpy as np
wtf = ['"442116.503118","442116.251106"',
       '"442141.502863","442141.247462"']
to_list = []
for nest1 in wtf:
    nest2 = nest1.split(',')
    for each in nest2:
        to_list.append(float(each.strip('"')))
to_array = np.asarray(to_list)
Not exactly elegant. You need to deal with each level of nesting in your input data. I'd recommend you reconsider the way you're formatting the data you're inputting.
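If the goal is easy column extraction, the split-and-strip logic above can also be folded into one nested list comprehension that builds a 2-D array directly (variable names here are illustrative):

```python
import numpy as np

a = ['"442116.503118","442116.251106"',
     '"442141.502863","442141.247462"']

# One row per input string, one column per quoted number.
x = np.array([[float(f.strip('"')) for f in row.split(',')]
              for row in a])

first_col = x[:, 0]   # all first numbers in one slice
```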

Output and Import list of lists to Pandas DataFrame

I want to be able to append to a .txt file each time I run a function.
The output I am trying to write from the function is something like this:
somelist = ['a','b','b','c']
somefloat = -0.64524
sometuple = (235,633,4245,524)
output = (somelist, somefloat, sometuple) (the output does not need to be in tuple format.)
Right now, I am outputting like this:
outfile = open('log.txt','a')
out = str(output)+'\n'
outfile.write(out)
This kind of works, but I have to import it like this:
with open('log.txt', "r") as myfile:
    mydata = myfile.readlines()
for line in mydata:
    line = eval(line)
Ideally, I would like to be able to import it back directly into a Pandas DataFrame something like this:
dflog = pd.read_csv('log.txt')
and have it generate a three column dataset with the first column containing a list (string format is fine), the second column containing a float, and the third column containing a tuple (same deal as the list).
My questions are:
Is there a way to append the output in a format that can be more easily imported into pandas?
Is there a simpler way of doing this? This seems like a pretty common task; I wouldn't be surprised if somebody has reduced it to a line or two of code.
One way to do this is to separate your columns with a custom separator such as '|'
Say:
somelist = ['a','b','b','c']
somefloat = -0.64524
sometuple = (235,633,4245,524)
output = str(somelist) + "|" + str(somefloat) + "|" + str(sometuple)
(if you want many more columns, use "|".join(...) or something like that)
Then, just as before:
outfile = open('log.txt','a')
out = output + '\n'
outfile.write(out)
Then just read the whole file back with
pd.read_csv("log.txt", sep='|')
Do note that storing lists or tuples in a pandas entry is discouraged (I couldn't find an official reference for that, though). For speed, you might consider dividing your tuples or lists into separate columns so that you're left with floats, integers, or simple strings. Pandas can easily handle automatic naming if you need it.
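A sketch of that last suggestion, flattening the tuple into separate scalar columns with the standard csv module (the io.StringIO here stands in for the real log file, and the column layout is just an illustration):

```python
import csv
import io

somelist = ['a', 'b', 'b', 'c']
somefloat = -0.64524
sometuple = (235, 633, 4245, 524)

buf = io.StringIO()                      # stands in for open('log.txt', 'a')
writer = csv.writer(buf, delimiter='|')
# Unpack the tuple so each of its members becomes its own column.
writer.writerow([str(somelist), somefloat, *sometuple])

line = buf.getvalue().strip()
```

Reading such a file back with pd.read_csv('log.txt', sep='|', header=None) then yields one scalar per column, no eval needed.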

Python dot-multiply lists on list of lists, without using numpy

I am quite new to python and getting my head around arrays as such, and I am stuck on a rather simple problem. I have a list of lists, like so:
a = [[1,0,1,0,1],[0,0,1,0,1],[0,0,1,0,1],[1,1,1,0,1],[1,0,1,0,0]]
and I would like to multiply elements of each list with each other. Something like:
a_dot = [1,0,1,0,1]*[0,0,1,0,1]*[0,0,1,0,1]*[1,1,1,0,1]*[1,0,1,0,0]
=[0,0,1,0,0]
Was wondering if I can do the above without using numpy/scipy.
Thanks.
from functools import reduce  # reduce is not a builtin in Python 3
import operator
a_dot = [reduce(operator.mul, col, 1) for col in zip(*a)]
But if all your data is 0s and 1s:
a_dot = [int(all(col)) for col in zip(*a)]  # int() keeps 0/1 instead of False/True
Did you try the reduce function? You call it with a function (see it as your operator) and a list and it applies it the way you described.
You can solve it with the code below:
from functools import reduce  # needed on Python 3

def multiply(list_a, list_b):
    c = []
    for x, y in zip(list_a, list_b):
        c.append(x * y)
    return c

reduce(lambda list_a, list_b: multiply(list_a, list_b), a)
Happy coding!!!!
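Putting the reduce-over-columns idea together with the sample data from the question (note the functools import, which Python 3 requires):

```python
from functools import reduce
import operator

a = [[1,0,1,0,1],[0,0,1,0,1],[0,0,1,0,1],[1,1,1,0,1],[1,0,1,0,0]]

# zip(*a) yields the columns; reduce multiplies each column's entries.
a_dot = [reduce(operator.mul, col, 1) for col in zip(*a)]
print(a_dot)   # [0, 0, 1, 0, 0]
```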

Modify numpy array column by column inside a loop

Is there a way to modify a numpy array inside a loop column by column?
I expect this could be done by some code like that:
import numpy as n
cnA=n.array([[10,20]]).T
mnX=n.array([[1,2],[3,4]])
for cnX in n.nditer(mnX.T, <some params>):
    cnX = cnX + cnA
Which parameters should I use to obtain mnX=[[11,12],[23,24]]?
I am aware that the problem could be solved using the following code:
cnA=n.array([10,20])
mnX=n.array([[1,2],[3,4]])
for col in range(mnX.shape[1]):
    mnX[:,col] = mnX[:,col]+cnA
However, in python we loop through modified objects, not indexes, so the question is: is it possible to loop through columns (that need to be modified in-place) directly?
Just so you know, some of us, in Python, do iterate over indices and not modified objects when it is helpful. Although in NumPy, as a general rule, we don't explicitly iterate unless there is no other way out: for your problem, the simplest approach would be to skip the iteration and rely on broadcasting:
mnX += cnA
If you insist on iterating, I think the simplest would be to iterate over the transposed array:
for col in mnX.T:
    col += cnA[:, 0]
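For completeness, the broadcasting one-liner applied to the arrays from the question:

```python
import numpy as np

cnA = np.array([[10, 20]]).T          # column vector, shape (2, 1)
mnX = np.array([[1, 2], [3, 4]])

# Broadcasting stretches cnA across every column; no loop needed.
mnX += cnA
```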
