This is what I wrote:
import numpy as np
a = np.array([[]])
np.insert(a, 0, 1, axis=1)
My code just seems to ignore the insert line for some reason. I even tried np.put_along_axis(), but it throws an error.
I just want to insert, append, or put a number into an ndarray. Right now this forces me to turn it into a normal list, append, and turn it back.
Please help.
Referring to the documentation:
Returns: out: ndarray
A copy of arr with values inserted. Note that insert does not occur in-place: a new array is returned. If axis is None, out is a flattened array.
So I think all that's missing here is to assign the modified array back to a:
a = np.insert(a, 0, 1, axis=1)
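For example, a minimal check of the fix (this just reproduces the snippet above with the assignment added):
import numpy as np

a = np.array([[]])                # shape (1, 0)
a = np.insert(a, 0, 1, axis=1)    # assign the returned copy back to a
print(a)                          # [[1.]]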
I feel I'm making this harder than it should be: what I have is a dataframe with some columns whose entries each contain numpy arrays (the names of the columns containing these arrays are stored in an array called names_of_cols_that_contain_arrays). What I want to do is filter out rows for which these numpy arrays have a sum of zero. This is a similar question on which my code is based, but it doesn't seem to work with the iterator over rows in each column.
What I have currently in my code is
for col_name in names_of_cols_that_contain_arrays:
    for i in range(len(df[col_name])):
        df = df[df[col_name][i].sum() > 0.0]
which doesn't seem that efficient, but is a first attempt that explicitly goes through what I thought would be the correct method. But this appears to return a boolean, i.e.
Traceback
...
KeyError: True
In fact, for most variations of the code above I get some error associated with a boolean being returned. Any pointers would be appreciated, thanks in advance!
IIUC:
You can try:
df = df.loc[df['names_of_cols_that_contain_arrays'].map(sum) > 0]
# OR
df = df.loc[df['names_of_cols_that_contain_arrays'].map(np.sum).gt(0)]
Sample dataframe used:
import pandas as pd
from numpy import array

d = {'names_of_cols_that_contain_arrays': {0: array([-1, 0, -8]),
                                           1: array([-1, -2, 5])}}
df = pd.DataFrame(d)
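If names_of_cols_that_contain_arrays is really a list of several column names (as the question suggests) rather than a single column, one way to combine the per-column conditions could be the following sketch (assuming pandas as pd and numpy as np are imported and every listed column holds numpy arrays):
# keep only the rows where every array-column has a positive sum
mask = pd.Series(True, index=df.index)
for col_name in names_of_cols_that_contain_arrays:
    mask &= df[col_name].map(np.sum) > 0
df = df[mask]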
I have a dataframe that I am concatenating from dataframes and arrays.
Somehow it has inherited the index of the original dataframe, hence I am trying to exclude rows based on one of the columns that should not have missing values.
If I view my dataframe, it shows as this:
print(model_data2['is_62p_days_overdue'][0:11])
now, when I run:
print(model_data2['is_62p_days_overdue'].where(model_data2['is_62p_days_overdue'] != np.nan)[0:11])
I get the exact same output.
And when I run:
print(model_data2['is_62p_days_overdue'].where(model_data2['is_62p_days_overdue'] == np.nan)[0:11])
What am I missing? This is driving me nuts!
I've tried resetting the index - but this also does nothing.
IIUC:
Instead of this:
print(model_data2['is_62p_days_overdue'].where(model_data2['is_62p_days_overdue'] != np.nan)[0:11])
try the loc accessor with the notna() method:
print(model_data2.loc[model_data2['is_62p_days_overdue'].notna(),'is_62p_days_overdue'][0:11])
Answer to the comment:
There are two reasons for it:
1. You can't compare NaNs the way you do in your method:
model_data2['is_62p_days_overdue'] != np.nan
# this is wrong; use the notna() method instead
2. You are using the where method: even with a corrected condition, where does not drop rows, it only replaces the non-matching values with NaN, so the missing rows stay in the output:
model_data2['is_62p_days_overdue'].where(model_data2['is_62p_days_overdue'].notna())
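A quick way to see why the comparison itself can never work (NaN never compares equal to anything, including itself) is this small, self-contained example:
import numpy as np
import pandas as pd

s = pd.Series([0.0, np.nan, 1.0])
print(np.nan == np.nan)        # False
print((s != np.nan).tolist())  # [True, True, True]  -- every row passes, NaN included
print(s.notna().tolist())      # [True, False, True] -- only this detects the missing value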
See the "# rows you may want to see" line at the bottom of my code:
import pandas as pd
import numpy as np
# make a dataset
d = {'is_62p_days_overdue': [0, 0, 0, 0, 0, None, None, 0, None, 0, None]}
data = pd.DataFrame(d)
print(data)
# append the numbers 1~10 (DataFrame.append was removed in pandas 2.0, so use pd.concat)
data = pd.concat([data, pd.DataFrame({'is_62p_days_overdue': list(range(1, 10 + 1))})],
                 ignore_index=True)
data
# rows you may want to see
data.loc[~(data.is_62p_days_overdue.isna())]
You can use .dropna() to drop the rows with NaN values.
Use this:
model_data2.dropna(subset = ['is_62p_days_overdue'], inplace = True)
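For a quick feel of the difference between the two approaches, here is a minimal example: where() keeps the missing rows (as NaN), while dropna() actually removes them:
import numpy as np
import pandas as pd

s = pd.Series([0.0, np.nan, 1.0])
print(s.where(s.notna()))   # same length, the missing row is still NaN
print(s.dropna())           # the missing row is gone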
I have a pandas dataframe df whose elements are each a whole numpy array. For example the 6th row of column 'x_grid':
>>> e = df.loc[6,'x_grid']
>>> print(e)
[-11.52616579 -11.48006112 -11.43395646 -11.3878518 -11.34174713
-11.29564247 -11.24953781 -11.20343315 -11.15732848 -11.11122382
-11.06511916 -11.01901449 ...
But I cannot use this as a numpy array as it is just given as a string:
>>> print(type(e))
<class 'str'>
How can I store a numpy array to a dataframe so it does not get converted to a string? Or convert this string back to a numpy array in a nice way?
If you just want to convert all those strings in each row into a list, the following will work:
df['x_grid'].str[1:-1].str.split(" ").apply(lambda x: list(map(float, x)))
# or for a numpy array
df['x_grid'].str[1:-1].str.split(" ").apply(lambda x: np.array(list(map(float, x))))
Hope that helps.
Thanks to Erfan and hpaulj for the suggestions that combined to answer this question.
The solution is that when setting an element of the dataframe I first convert the numpy array x to a list (so it is comma-separated, not space-separated):
df = pd.concat([df, pd.DataFrame({'x_grid': [list(x)]})], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
Then after saving to a csv, and loading back in, I extract it back into a numpy array using np.array() and ast.literal_eval() (Note: requires import ast):
x = np.array(ast.literal_eval(df.loc[entry,'x_grid']))
This then returns a correct numpy array x.
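Putting the pieces together, a minimal round-trip sketch (the file name here is made up, and tolist() is used so the cell holds plain Python floats that ast.literal_eval can parse back):
import ast
import numpy as np
import pandas as pd

x = np.linspace(-11.5, -11.0, 5)              # hypothetical grid
df = pd.DataFrame({'x_grid': [x.tolist()]})   # store the array as a plain list
df.to_csv('grid.csv', index=False)

df2 = pd.read_csv('grid.csv')                 # the cell comes back as a string
x_back = np.array(ast.literal_eval(df2.loc[0, 'x_grid']))
print(x_back.dtype, x_back.shape)             # float64 (5,)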
Want to extend Rafal's answer to avoid numpy throwing an exception on the empty strings resulting from x.split:
df['x_grid'].str[1:-1].apply(lambda x: list(filter(None, x.split(' ')))).apply(lambda x: np.array(x).astype(float))
From the pandas documentation, I gather that Series.axes will return a list, and indeed it is a list:
$ python3 process_data.py
<class 'list'>
However, when I attempt to print the string representation of the list, I get something unexpected.
Running print directly:
print(row.axes)
$ python3 process_data.py
Index(['rank', 'name','high', 'low', 'analysis'],
dtype='object')
Which doesn't look like a normal list at all.
>>> [1,2,3,4,5]
[1, 2, 3, 4, 5]
I can still access the information in this weird list by doing list_name[0][index], as if it were a two-dimensional list. I mean, if its internal type is list, how can it have this behavior? And if it is a numpy-array-like object, why is the internal type still list?
EDIT:
def process_nextfile(date, catagory):
    df1 = pd.read_csv('{}all_csv/{}/catagory{}'.format(BASE_DIR, date, catagory),
                      header=None, names=CATAGORY_HEADER[catagory - 1])
    for index, row in df1.iterrows():
        print(row.axes.__name__)
        break

if __name__ == '__main__':
    process_nextfile('2016-04-05', 2)
When you use iterrows(), every row is a pandas Series, and its axes attribute returns a list containing the Series' index. So what the list contains are Index objects; check this simple example:
s = pd.Series([1,2,3])
s.axes
# [RangeIndex(start=0, stop=3, step=1)]
To get a normal list, you can access the index object and then convert it to a list:
s.axes[0].tolist()
# [0, 1, 2]
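Applied to the iterrows() loop from the question, a minimal sketch (with a made-up one-row dataframe) would be:
import pandas as pd

df1 = pd.DataFrame({'rank': [1], 'name': ['a'], 'high': [2.0], 'low': [1.0], 'analysis': ['x']})
for index, row in df1.iterrows():
    labels = row.axes[0].tolist()   # row is a Series; axes[0] is its Index of column labels
    print(labels)                   # ['rank', 'name', 'high', 'low', 'analysis']
    break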
Hello all, I have a list of delimiter-separated strings:
lists = ['1|Abra|23|43|0', '2|Cadabra|15|18|0', '3|Grabra|4|421|0', '4|Lol|1|15|0']
I need to convert it to a numpy array and then sort it just like Excel does: first by Column 3, then by Column 2, and finally by the last column.
I've tried this:
def man():
    a = np.array(lists[0].split('|'))
    for line in lists:
        temp = np.array(line.split('|'))
        a = np.concatenate((a, temp))
    a.sort(order=[0, 1])

man()
Of course, no luck, because it is wrong! Unfortunately I'm not strong with numpy arrays. Can somebody help me, please? :(
This works just perfectly for me, but here numpy builds the array from a file, so to make it work I had to write my list of strings to a file, then read it back and convert it to an array:
import numpy as np
# let numpy guess the type with dtype=None
my_data = np.genfromtxt('Selector/tmp.txt', delimiter='|', dtype=None,
                        names=["Num", "Date", "Desc", "Rgh", "Prc", "Color", "Smb", "MType"])
my_data.sort(order=["Color", "Prc", "Rgh"])
print(my_data)
How can I keep everything as is, but change the conversion so that it builds the same array from my list instead of from a file?
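As a side note, np.genfromtxt also accepts a list of strings directly, so the temporary file may not be needed at all. A sketch with made-up field names for the five columns:
import numpy as np

lists = ['1|Abra|23|43|0', '2|Cadabra|15|18|0', '3|Grabra|4|421|0', '4|Lol|1|15|0']

# genfromtxt can read from an iterable of strings, not only from a file
my_data = np.genfromtxt(lists, delimiter='|', dtype=None, encoding='utf-8',
                        names=["num", "name", "col3", "col4", "col5"])
my_data.sort(order=["col3", "name", "col5"])
print(my_data)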
There may be better solutions, but for a start I would sort the array once by each column in reverse order.
I assume you want to sort by column 3 and ties are resolved by column 2. Finally, remaining ties are resolved by the last column. Thus, you'd actually sort by the last column first, then by 2, then by 3.
Furthermore, you can easily convert the list to an array using a list comprehension.
import numpy as np
lists=['1|Abra|23|43|0','2|Cadabra|15|18|0','3|Grabra|4|421|0','4|Lol|1|15|0']
# convert to numpy array by splitting each row
a = np.array([l.split('|') for l in lists])
# specify columns to sort by, in order
sort_cols = [3, 2, -1]
# sort by columns in reverse order.
# A stable sort is required so that earlier sort keys are preserved for ties.
for sc in sort_cols[::-1]:
    order = np.argsort(a[:, sc], kind='stable')
    a = a[order]
print(a)
You can use a list comprehension to split your strings and convert the digits to int. Then create your numpy array with a proper structured dtype, and use the np.sort() function, passing the expected order:
>>> dtype = [('1st', int), ('2nd', '|S7'), ('3rd', int), ('4th', int), ('5th', int)]
>>>
>>> a = np.array([tuple([int(i) if i.isdigit() else i for i in sub.split('|')]) for sub in lists], dtype=dtype)
>>> np.sort(a, axis=0, order=['3rd','2nd', '5th'])
array([(4, 'Lol', 1, 15, 0), (3, 'Grabra', 4, 421, 0),
(2, 'Cadabra', 15, 18, 0), (1, 'Abra', 23, 43, 0)],
dtype=[('1st', '<i8'), ('2nd', 'S7'), ('3rd', '<i8'), ('4th', '<i8'), ('5th', '<i8')])
You can also do this in plain Python, which for shorter data sets is more efficient: simply use the sorted() function with a proper key function.
from operator import itemgetter
# 0-based indices: column 3, then column 2, then the last column
sorted([[int(i) if i.isdigit() else i for i in sub.split('|')] for sub in lists], key=itemgetter(2, 1, 4))