Appending Pandas DataFrame column based on another column


I have a Pandas DataFrame that looks like this:
| Index | Value |
|-------|--------------|
| 1 | [1, 12, 123] |
| 2 | [12, 123, 1] |
| 3 | [123, 12, 1] |
and I want to append a third column containing a list with the length (number of digits) of each array element:
| Index | Value | Expected_value |
|-------|--------------|----------------|
| 1 | [1, 12, 123] | [1, 2, 3] |
| 2 | [12, 123, 1] | [2, 3, 1] |
| 3 | [123, 12, 1] | [3, 2, 1] |
I tried using a Python lambda function with map, roughly like this:
dataframe["Expected_value"] = dataframe.Value.map(lambda x: len(str(x)))
but instead of a list I got the sum of those lengths:
| Index | Value | Expected_value |
|-------|--------------|----------------|
| 1 | [1, 12, 123] | 6 |
| 2 | [12, 123, 1] | 6 |
| 3 | [123, 12, 1] | 6 |

You can use a list comprehension with map:
dataframe["Expected_value"] = dataframe.Value.map(lambda x: [len(str(y)) for y in x])
Or a nested list comprehension:
dataframe["Expected_value"] = [[len(str(y)) for y in x] for x in dataframe.Value]
An alternative way to get the lengths of integers is math.log10, which works only for positive integers:
import math
dataframe["Expected_value"] = [[int(math.log10(y))+1 for y in x] for x in dataframe.Value]
print (dataframe)
Index Value Expected_value
0 1 [1, 12, 123] [1, 2, 3]
1 2 [12, 123, 1] [2, 3, 1]
2 3 [123, 12, 1] [3, 2, 1]
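As a minimal sketch, the string-based and log10-based variants above can be checked against each other on the question's sample data. The frame below is rebuilt from the question's table, and the names by_str and by_log are only illustrative. Note that the log10 variant is defined only for positive integers, while the string variant would count a minus sign as a character.

```python
import math

import pandas as pd

# Sample data rebuilt from the question's table.
dataframe = pd.DataFrame({
    "Index": [1, 2, 3],
    "Value": [[1, 12, 123], [12, 123, 1], [123, 12, 1]],
})

# Digit count via string conversion: works for any integer, but a leading
# "-" on a negative number is counted as a character.
by_str = [[len(str(y)) for y in x] for x in dataframe["Value"]]

# Digit count via log10: raises ValueError for zero or negative numbers.
by_log = [[int(math.log10(y)) + 1 for y in x] for x in dataframe["Value"]]

dataframe["Expected_value"] = by_str
```

For the sample data both variants agree, so the choice mostly depends on whether zero or negative values can occur.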

Use a list comprehension:
[[len(str(y)) for y in x] for x in df['Value'].tolist()]
# [[1, 2, 3], [2, 3, 1], [3, 2, 1]]
df['Expected_value'] = [[len(str(y)) for y in x] for x in df['Value'].tolist()]
df
Index Value Expected_value
0 1 [1, 12, 123] [1, 2, 3]
1 2 [12, 123, 1] [2, 3, 1]
2 3 [123, 12, 1] [3, 2, 1]
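If you need to handle missing data, wrap the comprehension in a function that falls back to NaN:
import numpy as np

def foo(x):
    try:
        return [len(str(y)) for y in x]
    except TypeError:
        # x is not iterable (for example NaN or None)
        return np.nan

df['Expected_value'] = [foo(x) for x in df['Value'].tolist()]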
df
Index Value Expected_value
0 1 [1, 12, 123] [1, 2, 3]
1 2 [12, 123, 1] [2, 3, 1]
2 3 [123, 12, 1] [3, 2, 1]
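The output above contains no missing rows, so the following sketch adds one to show the fallback in action. The None row and the name element_lengths are assumptions made for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd


def element_lengths(x):
    """Return per-element digit counts, or NaN when x is not iterable."""
    try:
        return [len(str(y)) for y in x]
    except TypeError:
        return np.nan


# Hypothetical data: the middle row is missing.
df = pd.DataFrame({"Value": [[1, 12, 123], None, [123, 12, 1]]})
df["Expected_value"] = [element_lengths(x) for x in df["Value"].tolist()]
result = df["Expected_value"].tolist()
```

One caveat of catching TypeError this way: a string cell is iterable, so it would be measured character by character rather than treated as missing.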
A list comprehension is probably the best option in terms of performance when dealing with object-dtype data. For more, see "For loops with pandas - When should I care?".
Another solution with pd.DataFrame, applymap and agg (applymap is deprecated since pandas 2.1 in favor of DataFrame.map):
pd.DataFrame(df['Value'].tolist()).astype(str).applymap(len).agg(list, axis=1)
0 [1, 2, 3]
1 [2, 3, 1]
2 [3, 2, 1]
dtype: object

Related

List from Column elements

I want to create a column 'List' from the column 'Numbers' such that each row holds a list of all the numbers except the one in that row.
Table:
| Numbers | List |
| -------- | -------------- |
| 1 | [2,3,4,1] |
| 2 | [3,4,1,1] |
| 3 | [4,1,1,2] |
| 4 | [1,1,2,3] |
| 1 | [1,2,3,4] |
Can anyone help with this, please?

For a general solution that also works with duplicated values, first repeat the values with numpy.tile, then remove the diagonal so that each row loses its own value:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Numbers':[1,2,3,4,1]})
A = np.tile(df['Numbers'], len(df)).reshape(-1, len(df))
#https://stackoverflow.com/a/46736275/2901002
df['new'] = A[~np.eye(A.shape[0],dtype=bool)].reshape(A.shape[0],-1).tolist()
print (df)
Numbers new
0 1 [2, 3, 4, 1]
1 2 [1, 3, 4, 1]
2 3 [1, 2, 4, 1]
3 4 [1, 2, 3, 1]
4 1 [1, 2, 3, 4]

Try this:
df = pd.DataFrame({'numbers':range(1, 5)})
df['list'] = df['numbers'].apply(lambda x: [i for i in df['numbers'] if i != x])
df

import pandas as pd
df = pd.DataFrame({'Numbers':[1,2,3,4,5]})
df['List'] = df['Numbers'].apply(
    # for every cell with element x in Numbers, return Numbers without that element
    lambda x: [y for y in df['Numbers'] if y != x])
which results in:
df
Numbers List
0 1 [2, 3, 4, 5]
1 2 [1, 3, 4, 5]
2 3 [1, 2, 4, 5]
3 4 [1, 2, 3, 5]
4 5 [1, 2, 3, 4]

Abhinar Khandelwal, if the task is to get every number except the one in the current row, then my answer can be adjusted as follows:
import numpy as np
import pandas as pd
rng = np.random.default_rng()
numbers = rng.integers(5, size=7)
df = pd.DataFrame({'numbers':numbers})
df['list'] = df.reset_index()['index'].apply(lambda x: df[df.index != x].numbers.values)
df
But this way is much faster https://stackoverflow.com/a/73275614/18965699 :)
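The answers above differ in how they treat duplicates, which the following sketch tries to make visible. It uses the duplicated data from the numpy.tile answer. np.delete is swapped in here as a simple position-based alternative and is not what any of the answers use.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Numbers": [1, 2, 3, 4, 1]})
values = df["Numbers"].to_numpy()

# Value-based filter: drops every element equal to the row's value,
# so both 1s disappear from the lists of rows 0 and 4.
by_value = [[y for y in values.tolist() if y != x] for x in values.tolist()]

# Position-based filter: drops only the element at the row's own position.
by_position = [np.delete(values, i).tolist() for i in range(len(values))]

df["List"] = by_position
```

With unique values the two filters give the same lists. They only diverge when a value occurs more than once.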

Combine lists from several columns into one nested list pandas

Here is my dataframe:
| col1 | col2 | col3 |
|-----------|-----------|-----------|
| [1,2,3,4] | [1,2,3,4] | [1,2,3,4] |
I also have this function:
def joiner(col1,col2,col3):
    snip = []
    snip.append(col1)
    snip.append(col2)
    snip.append(col3)
    return snip
I want to call this on each of the columns and assign it to a new column.
My end goal would be something like this:
| col1 | col2 | col3 | col4 |
|-----------|-----------|-----------|-----------------------------------|
| [1,2,3,4] | [1,2,3,4] | [1,2,3,4] | [[1,2,3,4],[1,2,3,4],[1,2,3,4]] |

Just apply list with axis=1; it creates one list per row:
>>> df['col4'] = df.apply(list, axis=1)
OUTPUT:
col1 col2 col3 col4
0 [1, 2, 3, 4] [1, 2, 3, 4] [1, 2, 3, 4] [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]

You can just do
df['col'] = df.values.tolist()
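A small sketch of the values.tolist() approach on the question's single-row frame. Selecting the source columns explicitly is an added precaution, not part of the original answers: without it, running the assignment a second time would also fold the new column into the nested list.

```python
import pandas as pd

# Frame rebuilt from the question's table.
df = pd.DataFrame({
    "col1": [[1, 2, 3, 4]],
    "col2": [[1, 2, 3, 4]],
    "col3": [[1, 2, 3, 4]],
})

# Explicit column list, so a rerun does not pick up col4 itself.
source_cols = ["col1", "col2", "col3"]

# Each row of the 2-D object array becomes one nested list.
df["col4"] = df[source_cols].values.tolist()
```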

Why this code is giving unexpected output?

Hi I have just started to learn Python programming.
I wrote this code:
a = [[1, 2, 3], [4, 5, 6]]
b = [[1, 2, 3], [4, 5, 6]]
c = []
d = []
for i in range(len(a)):
    for j in range(len(a[0])):
        d.append(a[i][j]+b[i][j])
    c.append(d)
print(c)
I got this output:
[[2, 4, 6, 8, 10, 12], [2, 4, 6, 8, 10, 12]]
But to my understanding the output should be:
[[2, 4, 6], [2, 4, 6, 8, 10, 12]]
Could someone please explain the output?
Thank you.

You need to append a copy of d to get your desired output. c.append(d) stores a reference to the same list object, so every entry of c reflects later appends to d.
a = [[1, 2, 3], [4, 5, 6]]
b = [[1, 2, 3], [4, 5, 6]]
c = []
d = []
d1=[]
for i in range(len(a)):
    for j in range(len(a[0])):
        d.append(a[i][j]+b[i][j])
    # Append a copy of d, so values added after [2, 4, 6] do not show up in the earlier entry of c.
    d1 = d.copy()
    c.append(d1)
print(c)

I tried to debug and got the same answer
| i = | j = | d = |
| --- | --- | ------------- |
| 0 | 0 | [ 2 ] |
| 0 | 1 | [ 2, 4 ] |
| 0 | 2 | [ 2, 4, 6 ] |
end of i = 0 iteration, so d = [ 2, 4, 6 ]
end of i = 0 iteration, so c = [ [ 2, 4, 6 ] ]
| i = | j = | d = |
| --- | --- | ---------------------- |
| 1 | 0 | [ 2, 4, 6, 8 ] |
| 1 | 1 | [ 2, 4, 6, 8, 10 ] |
| 1 | 2 | [ 2, 4, 6, 8, 10, 12 ] |
end of i = 1 iteration, so d = [ 2, 4, 6, 8, 10, 12 ]
end of i = 1 iteration, so c = [ [ 2, 4, 6 ], [ 2, 4, 6, 8, 10, 12 ] ]
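The trace above follows the contents of d step by step, but c does not store snapshots of d; it stores references to the one list object. The sketch below reruns the question's loop to show that sharing, then gives two ways to avoid it. The names shared, snapshots, running and row_sums are illustrative only.

```python
a = [[1, 2, 3], [4, 5, 6]]
b = [[1, 2, 3], [4, 5, 6]]

# Original pattern: the single list d is appended to c twice.
c = []
d = []
for i in range(len(a)):
    for j in range(len(a[0])):
        d.append(a[i][j] + b[i][j])
    c.append(d)          # stores a reference to d, not a snapshot
shared = c[0] is c[1]    # both entries are the same object

# Running totals as separate snapshots (the output the question expected).
snapshots = []
running = []
for row_a, row_b in zip(a, b):
    running.extend(x + y for x, y in zip(row_a, row_b))
    snapshots.append(list(running))  # copy, so later extends do not leak in

# Plain element-wise sum: build a fresh list for every row.
row_sums = [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]
```

If the goal is simply the element-wise sum of the two matrices, the last form is probably what was intended, since it never reuses a list across rows.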

How to concatenate many more columns to one column with list format?

I need to convert an array to a DataFrame. The default method works like this:
lst = [[1,2,3],[1,2,3],[1,2,3]]
pd.DataFrame(lst)
out:
0 1 2
0 1 2 3
1 1 2 3
2 1 2 3
but I want this format instead:
0
0 [1, 2, 3]
1 [1, 2, 3]
2 [1, 2, 3]
just one column.
EDIT: the other situation is a NumPy array:
lst = np.array([[1,2,3],[1,2,3],[1,2,3]])
pd.DataFrame(lst)

You may initialize a single column:
In [279]: pd.DataFrame({0: lst})
Out[279]:
0
0 [1, 2, 3]
1 [1, 2, 3]
2 [1, 2, 3]

In the array scenario, use apply with list and to_frame
lst = np.array([[1,2,3],[1,2,3],[1,2,3]])
pd.DataFrame(lst).apply(lambda x: list(x), axis=1).to_frame()
0
0 [1, 2, 3]
1 [1, 2, 3]
2 [1, 2, 3]

Make a Series and then build a DataFrame from it:
>>> pd.DataFrame(pd.Series(lst))
0
0 [1, 2, 3]
1 [1, 2, 3]
2 [1, 2, 3]
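For the NumPy array case from the question's edit, one possible sketch is to convert the array to a list of row lists first and then build the single column from that. This is an alternative to the apply-based answer above, not a restatement of it; the name single is illustrative.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])

# arr.tolist() turns the 2-D array into a list of row lists; each row list
# then becomes one cell of a single object-dtype column named 0.
single = pd.Series(arr.tolist()).to_frame()
```

The cells end up as plain Python lists of Python ints, which matches the layout requested in the question.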

Dataframe with column of ranges. Given number, select row where number occurs

I have a dataframe with a column of a range of numbers, and then more columns of data
[1, 2, 3, ..., 10] | a | b
[11, 12, 13, 14, ...] | c | d
Given a number like 10, 14, etc. how do I select the row where that number is in the range, i.e for 10 I want [1, 2, 3, ..., 10] | a | b row to be returned.
So far I've tried dfs['A'].ix[10 in dfs['A']['B']], where dfs is a dictionary of dataframes, 'A' is a dataframe, and 'B' is the column with ranges.
How do I do this?

Use apply to loop through column B and check each element individually which returns a logical index for subsetting:
df = pd.DataFrame({"B": [list(range(1,11)), list(range(11,21))], "col1":["a", "b"], "col2":["c", "d"]})
df[df["B"].apply(lambda x: 10 in x)]
# B col1 col2
# 0 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] a c

df = pd.DataFrame({'ranges':[range(11), range(11,20)], 'dat1':['a','c'], 'dat2':['b','d']})
mask = df.ranges.apply(lambda x: 10 in x)
# .ix was removed in pandas 1.0; .loc accepts the boolean mask
df.loc[mask]
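The mask-based selection can be wrapped in a small helper, sketched below on the sample frame from the first answer. The function name rows_containing is hypothetical, and .loc is used because .ix is no longer available in current pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "B": [list(range(1, 11)), list(range(11, 21))],
    "col1": ["a", "b"],
    "col2": ["c", "d"],
})


def rows_containing(frame, column, number):
    """Return the rows of frame whose list-like cell in column contains number."""
    mask = frame[column].apply(lambda cell: number in cell)
    return frame.loc[mask]


hit = rows_containing(df, "B", 14)
miss = rows_containing(df, "B", 99)
```

A number that falls in no range simply yields an empty frame, so callers may want to check for that case.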
