List from Column elements - python

I want to create a column 'List' from column 'Numbers' such that each row gets a list of all the other values in the column, excluding the value in that row, in pandas.
Table:
| Numbers | List |
| -------- | -------------- |
| 1 | [2,3,4,1] |
| 2 | [3,4,1,1] |
| 3 | [4,1,1,2] |
| 4 | [1,1,2,3] |
| 1 | [1,2,3,4] |
Can anyone help with this, please?

For a general solution that also works with duplicated values, first repeat the values with numpy.tile, then remove the diagonal to drop each row's own value:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Numbers':[1,2,3,4,1]})
A = np.tile(df['Numbers'], len(df)).reshape(-1, len(df))
#https://stackoverflow.com/a/46736275/2901002
df['new'] = A[~np.eye(A.shape[0],dtype=bool)].reshape(A.shape[0],-1).tolist()
print (df)
   Numbers           new
0        1  [2, 3, 4, 1]
1        2  [1, 3, 4, 1]
2        3  [1, 2, 4, 1]
3        4  [1, 2, 3, 1]
4        1  [1, 2, 3, 4]
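To see why the masking step drops exactly one value per row, here is a small standalone sketch (my addition, not part of the original answer): `~np.eye(n, dtype=bool)` is True everywhere except the diagonal, so row i keeps every value but its own.

```python
import numpy as np

# True everywhere except the diagonal: row i keeps all but position i
mask = ~np.eye(3, dtype=bool)

A = np.tile([10, 20, 30], 3).reshape(-1, 3)   # each row is a full copy of the column
result = A[mask].reshape(3, -1).tolist()
print(result)  # [[20, 30], [10, 30], [10, 20]]
```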

Try this:
import pandas as pd

df = pd.DataFrame({'numbers': range(1, 5)})
df['list'] = df['numbers'].apply(lambda x: [i for i in df['numbers'] if i != x])
df
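Note that this comprehension filters by value, so with duplicates (as in the question's data, which has two 1s) it removes every occurrence of the current value. A duplicate-safe sketch that compares row positions instead of values could look like this (my variation, not the original answer):

```python
import pandas as pd

df = pd.DataFrame({'numbers': [1, 2, 3, 4, 1]})

# compare row positions, not values, so duplicate numbers survive
df['list'] = [[v for j, v in enumerate(df['numbers']) if j != i]
              for i in range(len(df))]
print(df)
```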

import pandas as pd

df = pd.DataFrame({'Numbers': [1, 2, 3, 4, 5]})
# for every cell with element x in Numbers, return Numbers without that element
df['List'] = df['Numbers'].apply(lambda x: [y for y in df['Numbers'] if y != x])
which results in:
df
Numbers List
0 1 [2, 3, 4, 5]
1 2 [1, 3, 4, 5]
2 3 [1, 2, 4, 5]
3 4 [1, 2, 3, 5]
4 5 [1, 2, 3, 4]

@Abhinar Khandelwal: if the task is to get every other number besides the current row's value, then my answer can be fixed as follows:
import numpy as np
import pandas as pd

rng = np.random.default_rng()
numbers = rng.integers(5, size=7)
df = pd.DataFrame({'numbers': numbers})
df['list'] = df.reset_index()['index'].apply(lambda x: df[df.index != x].numbers.values)
df
But this way is much faster: https://stackoverflow.com/a/73275614/18965699 :)


How to concatenate many more columns to one column with list format?

I need to convert a list of lists to a DataFrame; the default method works like this:
lst = [[1,2,3],[1,2,3],[1,2,3]]
pd.DataFrame(lst)
out:
0 1 2
0 1 2 3
1 1 2 3
2 1 2 3
but I want the format to be like:
0
0 [1, 2, 3]
1 [1, 2, 3]
2 [1, 2, 3]
just one column.
EDIT: the other situation, with a NumPy array:
import numpy as np
lst = np.array([[1,2,3],[1,2,3],[1,2,3]])
pd.DataFrame(lst)
You may initialize a single column:
In [279]: pd.DataFrame({0: lst})
Out[279]:
0
0 [1, 2, 3]
1 [1, 2, 3]
2 [1, 2, 3]
In the array scenario, use apply with list and to_frame:
lst = np.array([[1,2,3],[1,2,3],[1,2,3]])
pd.DataFrame(lst).apply(lambda x: list(x), axis=1).to_frame()
0
0 [1, 2, 3]
1 [1, 2, 3]
2 [1, 2, 3]
Make a Series and then a DataFrame of that:
>>> pd.DataFrame(pd.Series(lst))
0
0 [1, 2, 3]
1 [1, 2, 3]
2 [1, 2, 3]
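One caveat worth adding (my note; behavior checked against recent pandas, but verify on your version): `pd.Series` rejects a 2-D NumPy array with a ValueError, so for the array case convert it to a list of rows first:

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])

# tolist() turns the 2-D array into a list of plain row lists,
# which pd.Series stores as one object per cell
out = pd.DataFrame(pd.Series(arr.tolist()))
print(out)
```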

Appending Pandas DataFrame column based on another column

I have Pandas DataFrame that looks like this:
| Index | Value |
|-------|--------------|
| 1 | [1, 12, 123] |
| 2 | [12, 123, 1] |
| 3 | [123, 12, 1] |
and I want to append a third column containing the lengths of the elements of each list:
| Index | Value | Expected_value |
|-------|--------------|----------------|
| 1 | [1, 12, 123] | [1, 2, 3] |
| 2 | [12, 123, 1] | [2, 3, 1] |
| 3 | [123, 12, 1] | [3, 2, 1] |
I've tried to use a lambda function with map, like this:
dataframe["Expected_value"] = dataframe.Value.map(lambda x: len(str(x)))
but instead of a list I got the sum of those lengths:
| Index | Value | Expected_value |
|-------|--------------|----------------|
| 1 | [1, 12, 123] | 6 |
| 2 | [12, 123, 1] | 6 |
| 3 | [123, 12, 1] | 6 |
You can use list comprehension with map:
dataframe["Expected_value"] = dataframe.Value.map(lambda x: [len(str(y)) for y in x])
Or nested list comprehension:
dataframe["Expected_value"] = [[len(str(y)) for y in x] for x in dataframe.Value]
It is also possible to use an alternative that computes the number of digits of each integer mathematically:
import math
dataframe["Expected_value"] = [[int(math.log10(y))+1 for y in x] for x in dataframe.Value]
print (dataframe)
Index Value Expected_value
0 1 [1, 12, 123] [1, 2, 3]
1 2 [12, 123, 1] [2, 3, 1]
2 3 [123, 12, 1] [3, 2, 1]
Use a list comprehension:
[[len(str(y)) for y in x] for x in df['Value'].tolist()]
# [[1, 2, 3], [2, 3, 1], [3, 2, 1]]
df['Expected_value'] = [[len(str(y)) for y in x] for x in df['Value'].tolist()]
df
Index Value Expected_value
0 1 [1, 12, 123] [1, 2, 3]
1 2 [12, 123, 1] [2, 3, 1]
2 3 [123, 12, 1] [3, 2, 1]
If you need to handle missing data:
import numpy as np

def foo(x):
    try:
        return [len(str(y)) for y in x]
    except TypeError:
        return np.nan

df['Expected_value'] = [foo(x) for x in df['Value'].tolist()]
df
Index Value Expected_value
0 1 [1, 12, 123] [1, 2, 3]
1 2 [12, 123, 1] [2, 3, 1]
2 3 [123, 12, 1] [3, 2, 1]
This is probably the best option in terms of performance when dealing with object dtype data. More reading at For loops with pandas - When should I care?
Another solution with pd.DataFrame, applymap and agg:
pd.DataFrame(df['Value'].tolist()).astype(str).applymap(len).agg(list, axis=1)
0 [1, 2, 3]
1 [2, 3, 1]
2 [3, 2, 1]
dtype: object
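Yet another option (my addition; requires pandas 0.25+ for `Series.explode`) is to explode the lists, measure each element as a string, and collect the lengths back per row:

```python
import pandas as pd

df = pd.DataFrame({'Value': [[1, 12, 123], [12, 123, 1], [123, 12, 1]]})

# explode keeps the original row index, so grouping by it (level=0)
# reassembles one list of digit lengths per original row
df['Expected_value'] = (df['Value'].explode()
                                   .astype(str).str.len()
                                   .groupby(level=0).agg(list))
print(df)
```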

remove elements from list based on index in pandas Dataframe

How do I remove elements from each list based on an index range in a pandas DataFrame?
Suppose the DataFrame looks like:
df:
values size
0 [1,2,3,4,5,6,7] 2 #delete first 2 elements from list
1 [1,2,3,4] 3 #delete first 3 elements from list
2 [9,8,7,6,5,4,3] 5 #delete first 5 elements from list
Expected Output is
df:
values size
0 [3,4,5,6,7] 2
1 [4] 3
2 [4,3] 5
Use a list comprehension with slicing:
df['values'] = [i[j:] for i, j in zip(df['values'], df['size'])]
print (df)
values size
0 [3, 4, 5, 6, 7] 2
1 [4] 3
2 [4, 3] 5
Using df.apply
import pandas as pd
df = pd.DataFrame({"values": [[1,2,3,4,5,6,7], [1,2,3,4], [9,8,7,6,5,4,3]], "size": [2, 3, 5]})
df["values"] = df.apply(lambda x: x["values"][x['size']:], axis=1)
print(df)
Output:
size values
0 2 [3, 4, 5, 6, 7]
1 3 [4]
2 5 [4, 3]
Using map in base Python, you could do
dat['values'] = pd.Series(map(lambda x, y : x[y:], dat['values'], dat['size']))
which returns
dat
Out[34]:
values size
0 [3, 4, 5, 6, 7] 2
1 [4] 3
2 [4, 3] 5

Python - pick a value from a list basing on another list

I've got a dataframe. Column A contains a list of integers, column B an integer. I want to pick the n-th value of the list in column A, where n is the number from column B. So if column A holds [1,5,6,3,4] and column B holds 2, I want to get 6.
I tried this:
result = [y[x] for y in df['A'] for x in df['B']]
But it doesn't work. Please help.
Use zip with a list comprehension:
df['new'] = [y[x] for x, y in zip(df['B'], df['A'])]
print (df)
A B new
0 [1, 2, 3, 4, 5] 1 2
1 [1, 2, 3, 4] 2 3
You can go for apply, i.e.:
df = pd.DataFrame({'A':[[1,2,3,4,5],[1,2,3,4]],'B':[1,2]})
A B
0 [1, 2, 3, 4, 5] 1
1 [1, 2, 3, 4] 2
# df.apply(lambda x: np.array(x['A'])[x['B']], 1)
# You don't need np.array here; use it when column B is also a list.
df.apply(lambda x: x['A'][x['B']], 1)  # Thanks #Zero
0 2
1 3
dtype: int64
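If some of the positions in B might be out of range for their list, a guarded variant of the zip approach (my addition, falling back to None) avoids an IndexError:

```python
import pandas as pd

df = pd.DataFrame({'A': [[1, 2, 3, 4, 5], [1, 2, 3, 4]], 'B': [1, 9]})

# fall back to None when the position does not exist in the list
picked = [y[x] if -len(y) <= x < len(y) else None
          for x, y in zip(df['B'], df['A'])]
print(picked)  # [2, None]
```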

Find all duplicate rows in a pandas dataframe

I would like to be able to get the indices of all the instances of a duplicated row in a dataset without knowing the name and number of columns beforehand. So assume I have this:
col
1 | 1
2 | 2
3 | 1
4 | 1
5 | 2
I'd like to be able to get [1, 3, 4] and [2, 5]. Is there any way to achieve this? It sounds really simple, but since I don't know the columns beforehand I can't do something like df[col == x...].
First filter all duplicated rows, then use groupby with apply, or convert the index to_series:
df = df[df.col.duplicated(keep=False)]
a = df.groupby('col').apply(lambda x: list(x.index))
print (a)
col
1 [1, 3, 4]
2 [2, 5]
dtype: object
a = df.index.to_series().groupby(df.col).apply(list)
print (a)
col
1 [1, 3, 4]
2 [2, 5]
dtype: object
And if need nested lists:
L = df.groupby('col').apply(lambda x: list(x.index)).tolist()
print (L)
[[1, 3, 4], [2, 5]]
If you need to use only the first column, it is possible to select it by position with iloc:
a = (df[df.iloc[:, 0].duplicated(keep=False)]
       .groupby(df.iloc[:, 0]).apply(lambda x: list(x.index)))
print (a)
print (a)
col
1 [1, 3, 4]
2 [2, 5]
dtype: object
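Since the question says the column names are not known beforehand, the same idea can be applied across all columns at once (a sketch of mine, grouping duplicated rows by every column):

```python
import pandas as pd

df = pd.DataFrame({'col': [1, 2, 1, 1, 2]}, index=[1, 2, 3, 4, 5])

# keep rows appearing more than once, considering every column,
# then collect the index labels of each duplicate group
dupes = df[df.duplicated(keep=False)]
groups = [list(idx) for idx in dupes.groupby(list(df.columns)).groups.values()]
print(groups)  # [[1, 3, 4], [2, 5]]
```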
