List from Column elements - python

I want to create a column 'List' from column 'Numbers' such that each row gets a list of all the other values in the column, excluding the value in that row, in pandas.
Table:
| Numbers | List |
| -------- | -------------- |
| 1 | [2,3,4,1] |
| 2 | [3,4,1,1] |
| 3 | [4,1,1,2] |
| 4 | [1,1,2,3] |
| 1 | [1,2,3,4] |
Can anyone help with this, please?

For a general solution that also works with duplicated values, first repeat the values with numpy.tile, then remove the diagonal to drop each row's own value:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Numbers':[1,2,3,4,1]})
A = np.tile(df['Numbers'], len(df)).reshape(-1, len(df))
#https://stackoverflow.com/a/46736275/2901002
df['new'] = A[~np.eye(A.shape[0],dtype=bool)].reshape(A.shape[0],-1).tolist()
print (df)
   Numbers           new
0        1  [2, 3, 4, 1]
1        2  [1, 3, 4, 1]
2        3  [1, 2, 4, 1]
3        4  [1, 2, 3, 1]
4        1  [1, 2, 3, 4]
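To see why the masking step drops exactly one value per row, here is a small standalone sketch (my addition, not part of the original answer): `~np.eye(n, dtype=bool)` is True everywhere except the diagonal, so row i keeps every value but its own.

```python
import numpy as np

# True everywhere except the diagonal: row i keeps all but position i
mask = ~np.eye(3, dtype=bool)

A = np.tile([10, 20, 30], 3).reshape(-1, 3)   # each row is a full copy of the column
result = A[mask].reshape(3, -1).tolist()
print(result)  # [[20, 30], [10, 30], [10, 20]]
```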

Try this:
import pandas as pd

df = pd.DataFrame({'numbers': range(1, 5)})
df['list'] = df['numbers'].apply(lambda x: [i for i in df['numbers'] if i != x])
df
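Note that this comprehension filters by value, so with duplicates (as in the question's data, which has two 1s) it removes every occurrence of the current value. A duplicate-safe sketch that compares row positions instead of values could look like this (my variation, not the original answer):

```python
import pandas as pd

df = pd.DataFrame({'numbers': [1, 2, 3, 4, 1]})

# compare row positions, not values, so duplicate numbers survive
df['list'] = [[v for j, v in enumerate(df['numbers']) if j != i]
              for i in range(len(df))]
print(df)
```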

import pandas as pd

df = pd.DataFrame({'Numbers': [1, 2, 3, 4, 5]})
# for every cell with element x in Numbers, return Numbers without that element
df['List'] = df['Numbers'].apply(lambda x: [y for y in df['Numbers'] if y != x])
which results in:
df
Numbers List
0 1 [2, 3, 4, 5]
1 2 [1, 3, 4, 5]
2 3 [1, 2, 4, 5]
3 4 [1, 2, 3, 5]
4 5 [1, 2, 3, 4]

@Abhinar Khandelwal: if the task is to get every other number besides the current row's value, then my answer can be fixed as follows:
import numpy as np
import pandas as pd

rng = np.random.default_rng()
numbers = rng.integers(5, size=7)
df = pd.DataFrame({'numbers': numbers})
df['list'] = df.reset_index()['index'].apply(lambda x: df[df.index != x].numbers.values)
df
But this way is much faster: https://stackoverflow.com/a/73275614/18965699 :)


How to concatenate many more columns to one column with list format?

I need to convert a list of lists to a DataFrame; the default method works like this:
lst = [[1,2,3],[1,2,3],[1,2,3]]
pd.DataFrame(lst)
out:
0 1 2
0 1 2 3
1 1 2 3
2 1 2 3
but I want the format to be like:
0
0 [1, 2, 3]
1 [1, 2, 3]
2 [1, 2, 3]
just one column.
EDIT: the other situation, with a NumPy array:
import numpy as np
lst = np.array([[1,2,3],[1,2,3],[1,2,3]])
pd.DataFrame(lst)
You may initialize a single column:
In [279]: pd.DataFrame({0: lst})
Out[279]:
0
0 [1, 2, 3]
1 [1, 2, 3]
2 [1, 2, 3]
In the array scenario, use apply with list and to_frame:
lst = np.array([[1,2,3],[1,2,3],[1,2,3]])
pd.DataFrame(lst).apply(lambda x: list(x), axis=1).to_frame()
0
0 [1, 2, 3]
1 [1, 2, 3]
2 [1, 2, 3]
Make a Series and then a DataFrame of that:
>>> pd.DataFrame(pd.Series(lst))
0
0 [1, 2, 3]
1 [1, 2, 3]
2 [1, 2, 3]
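One caveat worth adding (my note; behavior checked against recent pandas, but verify on your version): `pd.Series` rejects a 2-D NumPy array with a ValueError, so for the array case convert it to a list of rows first:

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])

# tolist() turns the 2-D array into a list of plain row lists,
# which pd.Series stores as one object per cell
out = pd.DataFrame(pd.Series(arr.tolist()))
print(out)
```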

Appending Pandas DataFrame column based on another column

I have Pandas DataFrame that looks like this:
| Index | Value |
|-------|--------------|
| 1 | [1, 12, 123] |
| 2 | [12, 123, 1] |
| 3 | [123, 12, 1] |
and I want to append a third column containing the lengths of the elements of each list:
| Index | Value | Expected_value |
|-------|--------------|----------------|
| 1 | [1, 12, 123] | [1, 2, 3] |
| 2 | [12, 123, 1] | [2, 3, 1] |
| 3 | [123, 12, 1] | [3, 2, 1] |
I've tried to use a lambda function with map, like this:
dataframe["Expected_value"] = dataframe.Value.map(lambda x: len(str(x)))
but instead of a list I got the sum of those lengths:
| Index | Value | Expected_value |
|-------|--------------|----------------|
| 1 | [1, 12, 123] | 6 |
| 2 | [12, 123, 1] | 6 |
| 3 | [123, 12, 1] | 6 |
You can use list comprehension with map:
dataframe["Expected_value"] = dataframe.Value.map(lambda x: [len(str(y)) for y in x])
Or nested list comprehension:
dataframe["Expected_value"] = [[len(str(y)) for y in x] for x in dataframe.Value]
It is also possible to use an alternative that computes the number of digits of each integer mathematically:
import math
dataframe["Expected_value"] = [[int(math.log10(y))+1 for y in x] for x in dataframe.Value]
print (dataframe)
Index Value Expected_value
0 1 [1, 12, 123] [1, 2, 3]
1 2 [12, 123, 1] [2, 3, 1]
2 3 [123, 12, 1] [3, 2, 1]
Use a list comprehension:
[[len(str(y)) for y in x] for x in df['Value'].tolist()]
# [[1, 2, 3], [2, 3, 1], [3, 2, 1]]
df['Expected_value'] = [[len(str(y)) for y in x] for x in df['Value'].tolist()]
df
Index Value Expected_value
0 1 [1, 12, 123] [1, 2, 3]
1 2 [12, 123, 1] [2, 3, 1]
2 3 [123, 12, 1] [3, 2, 1]
If you need to handle missing data:
import numpy as np

def foo(x):
    try:
        return [len(str(y)) for y in x]
    except TypeError:
        return np.nan

df['Expected_value'] = [foo(x) for x in df['Value'].tolist()]
df
Index Value Expected_value
0 1 [1, 12, 123] [1, 2, 3]
1 2 [12, 123, 1] [2, 3, 1]
2 3 [123, 12, 1] [3, 2, 1]
This is probably the best option in terms of performance when dealing with object dtype data. More reading at For loops with pandas - When should I care?
Another solution with pd.DataFrame, applymap and agg:
pd.DataFrame(df['Value'].tolist()).astype(str).applymap(len).agg(list, axis=1)
0 [1, 2, 3]
1 [2, 3, 1]
2 [3, 2, 1]
dtype: object
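Yet another option (my addition; requires pandas 0.25+ for `Series.explode`) is to explode the lists, measure each element as a string, and collect the lengths back per row:

```python
import pandas as pd

df = pd.DataFrame({'Value': [[1, 12, 123], [12, 123, 1], [123, 12, 1]]})

# explode keeps the original row index, so grouping by it (level=0)
# reassembles one list of digit lengths per original row
df['Expected_value'] = (df['Value'].explode()
                                   .astype(str).str.len()
                                   .groupby(level=0).agg(list))
print(df)
```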

remove elements from list based on index in pandas Dataframe

How do I remove elements from each list based on an index range in a pandas DataFrame?
Suppose the DataFrame looks like:
df:
values size
0 [1,2,3,4,5,6,7] 2 #delete first 2 elements from list
1 [1,2,3,4] 3 #delete first 3 elements from list
2 [9,8,7,6,5,4,3] 5 #delete first 5 elements from list
Expected Output is
df:
values size
0 [3,4,5,6,7] 2
1 [4] 3
2 [4,3] 5
Use a list comprehension with slicing:
df['values'] = [i[j:] for i, j in zip(df['values'], df['size'])]
print (df)
values size
0 [3, 4, 5, 6, 7] 2
1 [4] 3
2 [4, 3] 5
Using df.apply
import pandas as pd
df = pd.DataFrame({"values": [[1,2,3,4,5,6,7], [1,2,3,4], [9,8,7,6,5,4,3]], "size": [2, 3, 5]})
df["values"] = df.apply(lambda x: x["values"][x['size']:], axis=1)
print(df)
Output:
size values
0 2 [3, 4, 5, 6, 7]
1 3 [4]
2 5 [4, 3]
Using map in base Python, you could do
dat['values'] = pd.Series(map(lambda x, y : x[y:], dat['values'], dat['size']))
which returns
dat
Out[34]:
values size
0 [3, 4, 5, 6, 7] 2
1 [4] 3
2 [4, 3] 5

Python - pick a value from a list basing on another list

I've got a dataframe. Column A contains a list of integers, column B an integer. I want to pick the n-th value of the list in column A, where n is the number from column B. So if column A holds [1,5,6,3,4] and column B holds 2, I want to get 6.
I tried this:
result = [y[x] for y in df['A'] for x in df['B']]
But it doesn't work. Please help.
Use zip with a list comprehension:
df['new'] = [y[x] for x, y in zip(df['B'], df['A'])]
print (df)
A B new
0 [1, 2, 3, 4, 5] 1 2
1 [1, 2, 3, 4] 2 3
You can go for apply, i.e.:
df = pd.DataFrame({'A':[[1,2,3,4,5],[1,2,3,4]],'B':[1,2]})
A B
0 [1, 2, 3, 4, 5] 1
1 [1, 2, 3, 4] 2
# df.apply(lambda x: np.array(x['A'])[x['B']], 1)
# You don't need np.array here; use it when column B is also a list.
df.apply(lambda x: x['A'][x['B']], 1)  # Thanks #Zero
0 2
1 3
dtype: int64
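If some of the positions in B might be out of range for their list, a guarded variant of the zip approach (my addition, falling back to None) avoids an IndexError:

```python
import pandas as pd

df = pd.DataFrame({'A': [[1, 2, 3, 4, 5], [1, 2, 3, 4]], 'B': [1, 9]})

# fall back to None when the position does not exist in the list
picked = [y[x] if -len(y) <= x < len(y) else None
          for x, y in zip(df['B'], df['A'])]
print(picked)  # [2, None]
```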

Find all duplicate rows in a pandas dataframe

I would like to be able to get the indices of all the instances of a duplicated row in a dataset without knowing the name and number of columns beforehand. So assume I have this:
col
1 | 1
2 | 2
3 | 1
4 | 1
5 | 2
I'd like to be able to get [1, 3, 4] and [2, 5]. Is there any way to achieve this? It sounds really simple, but since I don't know the columns beforehand I can't do something like df[col == x...].
First filter all duplicated rows, then use groupby with apply, or convert the index to_series:
df = df[df.col.duplicated(keep=False)]
a = df.groupby('col').apply(lambda x: list(x.index))
print (a)
col
1 [1, 3, 4]
2 [2, 5]
dtype: object
a = df.index.to_series().groupby(df.col).apply(list)
print (a)
col
1 [1, 3, 4]
2 [2, 5]
dtype: object
And if need nested lists:
L = df.groupby('col').apply(lambda x: list(x.index)).tolist()
print (L)
[[1, 3, 4], [2, 5]]
If you need to use only the first column, it is possible to select it by position with iloc:
a = (df[df.iloc[:, 0].duplicated(keep=False)]
       .groupby(df.iloc[:, 0]).apply(lambda x: list(x.index)))
print (a)
print (a)
col
1 [1, 3, 4]
2 [2, 5]
dtype: object
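Since the question says the column names are not known beforehand, the same idea can be applied across all columns at once (a sketch of mine, grouping duplicated rows by every column):

```python
import pandas as pd

df = pd.DataFrame({'col': [1, 2, 1, 1, 2]}, index=[1, 2, 3, 4, 5])

# keep rows appearing more than once, considering every column,
# then collect the index labels of each duplicate group
dupes = df[df.duplicated(keep=False)]
groups = [list(idx) for idx in dupes.groupby(list(df.columns)).groups.values()]
print(groups)  # [[1, 3, 4], [2, 5]]
```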
