Assign images to the elements of a pandas DataFrame in Python

I have a pandas DataFrame in which one of the columns holds images (single-channel uint8 2-D images as NumPy arrays).
I am iterating through the rows with iterrows(), processing each image, and I want to assign the result (another image, in the same format) to an element of another column of the DataFrame. The DataFrame already has a column reserved for the processed images.
for index, row in df.iterrows():
    image = row['image']
    processed = process_image(image)
    df.loc[index, 'processed_image'] = processed
However, when I try to use either .loc or .at (or .iloc, .iat), I get an error like this (for .loc and .at respectively):
ValueError: cannot set using a multi-index selection indexer with a different length than the value
ValueError: setting an array element with a sequence.
Presumably .loc and .at expect a single value; they assume an array is meant to fill several cells of the DataFrame. But that is not what I want: I want the array stored as a single element.
I couldn't find this exact question anywhere else on the internet. The closest I found was initializing the DataFrame with array elements by hand, not assigning them inside an iterrows() loop.
Does anyone know how to solve this? Thanks in advance.

Try adding a new column as a function of the existing columns via the .apply() method, e.g.
df['new_col'] = df.apply(lambda row: ..., axis=1)
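Alternatively, the per-row assignment in the question can be made to work by giving the target column object dtype first, so each cell can hold a whole array; then .at stores the array as a single element. A minimal sketch, with a made-up process_image that simply inverts a uint8 image:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's process_image: invert a uint8 image.
def process_image(img):
    return 255 - img

df = pd.DataFrame({"image": [np.zeros((2, 2), dtype=np.uint8),
                             np.full((2, 2), 10, dtype=np.uint8)]})

# Pre-create the target column with object dtype so each cell can hold an array.
df["processed_image"] = pd.Series([None] * len(df), dtype=object)

for index, row in df.iterrows():
    # .at does scalar (single-cell) assignment, so the array is stored whole.
    df.at[index, "processed_image"] = process_image(row["image"])
```

Without the object-dtype column, .at and .loc try to broadcast the array across multiple cells, which produces exactly the errors quoted above.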

Related

How to split dataframe or array by unique column value with multiple unique values

So I have a dataframe that looks like this for example:
In this example, I need to split the dataframe into multiple dataframes (or arrays, because I will convert them anyway) based on account_id. I want each account id (ab123982173 and bc123982173) to become an individual DataFrame or array. Since the actual dataset is thousands of rows long, splitting into a temporary array in a loop was my original thought.
Any help would be appreciated.
You can get a subset of your dataframe.
Using your dataframe as an example:
subset_dataframe = dataframe[dataframe["Account_ID"] == "ab123982173"]
Here is a link from the pandas documentation that has visual examples:
https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html
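If you want one sub-DataFrame per unique account id without writing a filter per value, groupby does the splitting in one pass. A small sketch, assuming a column named "Account_ID" as in the answer above:

```python
import pandas as pd

df = pd.DataFrame({"Account_ID": ["ab123982173", "ab123982173", "bc123982173"],
                   "value": [1, 2, 3]})

# groupby yields (key, sub-DataFrame) pairs; collect them into a dict
# keyed by the account id.
frames = {account_id: group for account_id, group in df.groupby("Account_ID")}
```

Each entry of frames is itself a DataFrame, so .to_numpy() on any of them gives the array version mentioned in the question.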

Getting "cannot set using a multi-index selection indexer with a different length than the value" error while using np.where function

I am trying to append two data frames row-wise iteratively. After that I am trying to fill 0 values in one column with the values in the other column and vice versa. I am using the np.where function to fill the 0 values. When I do it separately it gives the correct result, but when I use it in a loop it throws a "cannot set using a multi-index selection indexer with a different length than the value" error. My code looks like this:
def myfunc(dd1, dd2, dfc):
    n = dd1.shape[0]
    for i in range(n):
        dfc2 = dd1.iloc[i:i+1].append(dd2.iloc[i:i+1])
        dfc = dfc.append(dfc2)
    m = dfc.shape[0]
    for j in range(m):
        dfc.iloc[j:j+1, 2:3] = np.where(dfc.iloc[j:j+1, 2:3] == 0, dfc.iloc[j+1:j+2, 3:4], dfc.iloc[j:j+1, 2:3])
        dfc.iloc[j+1:j+2, 3:4] = np.where(dfc.iloc[j+1:j+2, 3:4] == 0, dfc.iloc[j:j+1, 2:3], dfc.iloc[j+1:j+2, 3:4])
    return dfc
Here dd1 and dd2 are my dataframes; I am appending their rows iteratively to an empty dataframe dfc, and using row and column indices to fill the values. Any help on this would be appreciated.
This is not how np.where works. The inputs to np.where are list-like objects. Instead of looping over every row of the dataframe and feeding each one into np.where, you should pass the entire column to a single np.where call:
dfc.iloc[:,2:3] = np.where(dfc.iloc[:,2:3]==0,dfc.iloc[:,3:4].shift(-1),dfc.iloc[:,2:3])
dfc.iloc[:,3:4] = np.where(dfc.iloc[:,3:4]==0,dfc.iloc[:,2:3],dfc.iloc[:,3:4].shift(-1))
This should work now. Be careful with pd.DataFrame.iloc and avoid it when assigning new values; I would recommend using loc instead. My script may have a potential bug depending on your pandas version.
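To illustrate the vectorized idea on a toy frame (using plain column labels and ignoring the one-row offset that shift(-1) handles in the code above): each np.where call operates on whole columns at once, so no Python loop over rows is needed.

```python
import numpy as np
import pandas as pd

# Toy frame where zeros in one column should be filled from the other column.
dfc = pd.DataFrame({"a": [0, 5, 0], "b": [7, 0, 9]})

# One vectorized np.where per column replaces the per-row loop.
dfc["a"] = np.where(dfc["a"] == 0, dfc["b"], dfc["a"])
dfc["b"] = np.where(dfc["b"] == 0, dfc["a"], dfc["b"])
```

The per-row version fails because a slice like dfc.iloc[j:j+1, 2:3] is a one-cell selection, while np.where returns an array, so lengths can disagree at the boundary rows.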

Pandas: after slicing along specific columns, get "values" without returning entire dataframe

Here is what is happening:
df = pd.read_csv('data')
important_region = df[df.columns.get_loc('A'):df.columns.get_loc('C')]
important_region_arr = important_region.values
print(important_region_arr)
Now, here is the issue:
print(important_region.shape)
output: (5,30)
print(important_region_arr.shape)
output: (5,30)
print(important_region)
output: my columns, in the panda way
print(important_region_arr)
output: first 5 rows of the dataframe
How, having indexed my columns, do I transition to the numpy array?
Alternatively, I could just convert to numpy from the get-go and run the slicing operation within numpy. But, how is this done in pandas?
Here is how you can slice the dataset along specific columns. loc gives you access to a group of rows and columns: the part before the comma selects rows, the part after selects columns, and a bare : means all rows.
data.loc[:,'A':'C']
For more understanding, please look at the documentation.
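Putting the two steps together on a small frame (column names assumed, as in the question): slice with .loc, then call .to_numpy() (the modern spelling of .values) to get the raw array.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6], "D": [7, 8]})

# Label-based slice: all rows, columns 'A' through 'C' inclusive.
important_region = df.loc[:, "A":"C"]

# .to_numpy() drops the index and column labels and returns the raw ndarray.
important_region_arr = important_region.to_numpy()
```

The original attempt, df[df.columns.get_loc('A'):df.columns.get_loc('C')], slices rows by integer position rather than columns, which is why it printed the first rows of the frame.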

Extracting values from pandas DataFrame using a pandas Series

I have a pandas Series that contains key-value pairs, where the key is the name of a column in my pandas DataFrame and the value is an index in that column of the DataFrame.
For example (the Series and the DataFrame were posted as screenshots, not reproduced here):
Therefore, from my DataFrame I want to extract the value at index 12 in column 'A', which is 435.81. I want to put all these values into another Series, so something like { 'A': 435.81, 'AAP': 468.97, ... }.
I think this indexing is what you're looking for.
pd.Series(np.diag(df.loc[ser,ser.axes[0]]), index=df.columns)
df.loc allows you to index based on string labels. You get your rows from the values in ser (the first positional argument to df.loc) and your column locations from the labels of ser (ser.index is the usual way to get a Series' labels; ser.axes[0] is equivalent). The values you want lie along the main diagonal of the result, so you take just the diagonal and associate it with the column labels.
The indexing above only works if your DataFrame uses integer row indices, or if the data type of your Series values matches the DataFrame row indices. If you have a DataFrame with non-integer row indices but still want to select by integer row position, use the following instead (note, however, that all positions from your Series must be within range of the DataFrame, which is not the case with 'AAL' being 1758 when there are only 12 rows, for example):
pd.Series(np.diag(df.iloc[ser,:].loc[:,ser.axes[0]]), index=df.columns)
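A self-contained sketch of the diagonal trick on a tiny made-up frame (two columns, integer row index, values chosen to match the question's example numbers):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [10.0, 435.81], "AAP": [468.97, 20.0]})
ser = pd.Series({"A": 1, "AAP": 0})  # column label -> row index in that column

# df.loc with both lists yields a len(ser) x len(ser) block; the wanted
# (row, column) pairs sit on its main diagonal.
result = pd.Series(np.diag(df.loc[ser, ser.index]), index=ser.index)
```

Note that the block materialized by df.loc is square in len(ser), so for very wide Series a per-pair lookup may be cheaper than taking the diagonal.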

pandas: Select one-row data frame instead of series [duplicate]

I have a huge dataframe, and I index it like so:
df.ix[<integer>]
Depending on the index, sometimes this will have only one row of values. Pandas automatically converts this to a Series, which, quite frankly, is annoying because I can't operate on it the same way I can on a DataFrame.
How do I either:
1) Stop pandas from converting and keep it as a dataframe ?
OR
2) easily convert the resulting series back to a dataframe ?
pd.DataFrame(df.ix[<integer>]) does not work because it doesn't keep the original columns. It treats the <integer> as the column, and the columns as indices. Much appreciated.
You can do df.ix[[n]] to get a one-row dataframe of row n. (Note that .ix has since been removed from pandas; use df.iloc[[n]] for positional indexing or df.loc[[label]] for label-based indexing.)
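The list-of-labels trick carries over to the modern indexers: a scalar indexer returns a Series, while a single-element list returns a one-row DataFrame. A quick sketch with .iloc:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})

row_series = df.iloc[0]    # scalar position -> Series (columns become the index)
row_frame = df.iloc[[0]]   # one-element list -> one-row DataFrame, columns kept
```

The same distinction holds for label-based selection: df.loc[label] gives a Series, df.loc[[label]] gives a DataFrame.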
