Appending Pandas DataFrame column based on another column - python

I have a Pandas DataFrame that looks like this:
| Index | Value |
|-------|--------------|
| 1 | [1, 12, 123] |
| 2 | [12, 123, 1] |
| 3 | [123, 12, 1] |
and I want to append a third column containing the length (number of digits) of each list element:
| Index | Value | Expected_value |
|-------|--------------|----------------|
| 1 | [1, 12, 123] | [1, 2, 3] |
| 2 | [12, 123, 1] | [2, 3, 1] |
| 3 | [123, 12, 1] | [3, 2, 1] |
I've tried using a Python lambda function with map, a little bit like this:
dataframe["Expected_value"] = dataframe.value.map(lambda x: len(str(x)))
but instead of a list I got the sum of those lengths:
| Index | Value | Expected_value |
|-------|--------------|----------------|
| 1 | [1, 12, 123] | 6 |
| 2 | [12, 123, 1] | 6 |
| 3 | [123, 12, 1] | 6 |

You can use a list comprehension with map:
dataframe["Expected_value"] = dataframe.Value.map(lambda x: [len(str(y)) for y in x])
Or a nested list comprehension:
dataframe["Expected_value"] = [[len(str(y)) for y in x] for x in dataframe.Value]
An alternative way to get the lengths of positive integers is math.log10 (it does not work for 0 or negative numbers):
import math
dataframe["Expected_value"] = [[int(math.log10(y))+1 for y in x] for x in dataframe.Value]
print(dataframe)
   Index         Value Expected_value
0      1  [1, 12, 123]      [1, 2, 3]
1      2  [12, 123, 1]      [2, 3, 1]
2      3  [123, 12, 1]      [3, 2, 1]
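To compare the two digit-counting approaches side by side, here is a minimal self-contained sketch. The helper names `digits_via_str` and `digits_via_log10` are hypothetical and only used for illustration; the data is the sample from the question.

```python
import math

import pandas as pd

dataframe = pd.DataFrame({
    "Index": [1, 2, 3],
    "Value": [[1, 12, 123], [12, 123, 1], [123, 12, 1]],
})


def digits_via_str(n):
    # Works for any integer; abs() avoids counting a minus sign.
    return len(str(abs(n)))


def digits_via_log10(n):
    # Only valid for n > 0: math.log10(0) raises ValueError.
    return int(math.log10(n)) + 1


by_str = dataframe["Value"].map(lambda x: [digits_via_str(y) for y in x])
by_log = dataframe["Value"].map(lambda x: [digits_via_log10(y) for y in x])

print(by_str.tolist())  # [[1, 2, 3], [2, 3, 1], [3, 2, 1]]
```

For the sample data both versions agree. The string version is the safer default if the lists may contain zeros or negative numbers.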

Use a list comprehension:
[[len(str(y)) for y in x] for x in df['Value'].tolist()]
# [[1, 2, 3], [2, 3, 1], [3, 2, 1]]
df['Expected_value'] = [[len(str(y)) for y in x] for x in df['Value'].tolist()]
df
   Index         Value Expected_value
0      1  [1, 12, 123]      [1, 2, 3]
1      2  [12, 123, 1]      [2, 3, 1]
2      3  [123, 12, 1]      [3, 2, 1]
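If you need to handle missing data, wrap the comprehension in a function that catches TypeError:
import numpy as np

def foo(x):
    try:
        return [len(str(y)) for y in x]
    except TypeError:
        # x is not iterable (e.g. NaN), so keep it as missing
        return np.nan

df['Expected_value'] = [foo(x) for x in df['Value'].tolist()]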
df
   Index         Value Expected_value
0      1  [1, 12, 123]      [1, 2, 3]
1      2  [12, 123, 1]      [2, 3, 1]
2      3  [123, 12, 1]      [3, 2, 1]
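The output above contains no missing rows, so the try/except path is never exercised. A small sketch with one NaN row shows what it does; the middle row and the function name `digit_lengths` are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the middle row is missing.
df = pd.DataFrame({"Value": [[1, 12, 123], np.nan, [123, 12, 1]]})


def digit_lengths(x):
    try:
        return [len(str(y)) for y in x]
    except TypeError:
        # A float NaN is not iterable, so the row stays missing.
        return np.nan


df["Expected_value"] = [digit_lengths(x) for x in df["Value"].tolist()]
print(df)
```

The first and last rows get their list of lengths, while the missing row remains NaN instead of raising.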
A list comprehension is probably the best option in terms of performance when dealing with object dtype data. For more, see "For loops with pandas - When should I care?".
Another solution with pd.DataFrame, applymap and agg (applymap is deprecated since pandas 2.1 in favor of DataFrame.map):
pd.DataFrame(df['Value'].tolist()).astype(str).applymap(len).agg(list, axis=1)
0 [1, 2, 3]
1 [2, 3, 1]
2 [3, 2, 1]
dtype: object
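Since `applymap` was renamed to `DataFrame.map` in pandas 2.1, a version-tolerant variant of the same idea might look like the sketch below. The `hasattr` check is my own assumption about how to bridge versions, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"Value": [[1, 12, 123], [12, 123, 1], [123, 12, 1]]})

# Expand the lists into columns and convert every cell to a string.
wide = pd.DataFrame(df["Value"].tolist()).astype(str)

# DataFrame.map exists from pandas 2.1; older versions only have applymap.
elementwise = wide.map if hasattr(wide, "map") else wide.applymap

# Take len() of every cell, then collapse each row back into a list.
lengths = elementwise(len).agg(list, axis=1)
print(lengths.tolist())
```

Note that this approach assumes all lists have the same length; ragged lists would be padded with missing values before the string conversion.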

Related

List from Column elements

I want to create a column 'List' from column 'Numbers' such that each row holds a list of all the values in 'Numbers', leaving out the element of the corresponding row.
Table:
| Numbers | List |
| -------- | -------------- |
| 1 | [2,3,4,1] |
| 2 | [3,4,1,1] |
| 3 | [4,1,1,2] |
| 4 | [1,1,2,3] |
| 1 | [1,2,3,4] |
Can anyone help with this, please?
For a general solution that also works with duplicated values, first repeat the values with numpy.tile, then remove the diagonal so that each row drops its own value:
df = pd.DataFrame({'Numbers':[1,2,3,4,1]})
A = np.tile(df['Numbers'], len(df)).reshape(-1, len(df))
#https://stackoverflow.com/a/46736275/2901002
df['new'] = A[~np.eye(A.shape[0],dtype=bool)].reshape(A.shape[0],-1).tolist()
print(df)
   Numbers           new
0        1  [2, 3, 4, 1]
1        2  [1, 3, 4, 1]
2        3  [1, 2, 4, 1]
3        4  [1, 2, 3, 1]
4        1  [1, 2, 3, 4]
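The one-liner above is dense, so here is the same technique split into named steps. This is a sketch with the imports spelled out; the intermediate names `n` and `keep` are mine.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Numbers": [1, 2, 3, 4, 1]})
n = len(df)

# Each row of A is a full copy of the column, so A has shape (n, n).
A = np.tile(df["Numbers"].to_numpy(), (n, 1))

# True everywhere except on the diagonal, i.e. except position i in row i.
keep = ~np.eye(n, dtype=bool)

# Boolean indexing flattens the array, so reshape back to n rows of n - 1 values.
df["new"] = A[keep].reshape(n, n - 1).tolist()
print(df["new"].tolist())
```

Because the mask works on positions rather than values, the duplicated 1 in the last row is kept in the other rows' lists.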
Try this:
df = pd.DataFrame({'numbers':range(1, 5)})
df['list'] = df['numbers'].apply(lambda x: [i for i in df['numbers'] if i != x])
df
import pandas as pd
df = pd.DataFrame({'Numbers':[1,2,3,4,5]})
df['List'] = df['Numbers'].apply(
    # for every cell with element x in Numbers, return Numbers without that element
    lambda x: [y for y in df['Numbers'] if y != x])
which results in:
df
   Numbers          List
0        1  [2, 3, 4, 5]
1        2  [1, 3, 4, 5]
2        3  [1, 2, 4, 5]
3        4  [1, 2, 3, 5]
4        5  [1, 2, 3, 4]
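The two value-based answers above use data without duplicates. With the question's data, which contains 1 twice, filtering by value drops both occurrences. The following sketch contrasts that with a position-based variant; the slicing approach in `by_position` is an illustrative alternative of mine, not taken from the answers.

```python
import pandas as pd

df = pd.DataFrame({"Numbers": [1, 2, 3, 4, 1]})

# Filter by value: removes every element equal to x, duplicates included.
by_value = df["Numbers"].apply(lambda x: [y for y in df["Numbers"] if y != x])

# Filter by position: removes only the element sitting in the current row.
values = df["Numbers"].tolist()
by_position = [values[:i] + values[i + 1:] for i in range(len(values))]

print(by_value.tolist()[0])  # [2, 3, 4]
print(by_position[0])        # [2, 3, 4, 1]
```

Which behavior is wanted depends on whether "leaving out the element of the row" means the value or the row itself.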
@Abhinar Khandelwal, if the task is to get every other number besides the current row's value, then my answer can be adjusted as follows:
import numpy as np
rng = np.random.default_rng()
numbers = rng.integers(5, size=7)
df = pd.DataFrame({'numbers':numbers})
df['list'] = df.reset_index()['index'].apply(lambda x: df[df.index != x].numbers.values)
df
But this way is much faster https://stackoverflow.com/a/73275614/18965699 :)

Combine lists from several columns into one nested list pandas

Here is my dataframe:
| col1 | col2 | col3 |
|-----------|-----------|-----------|
| [1,2,3,4] | [1,2,3,4] | [1,2,3,4] |
I also have this function:
def joiner(col1, col2, col3):
    snip = []
    snip.append(col1)
    snip.append(col2)
    snip.append(col3)
    return snip
I want to call this on each of the columns and assign it to a new column.
My end goal would be something like this:
| col1 | col2 | col3 | col4 |
|-----------|-----------|-----------|---------------------------------|
| [1,2,3,4] | [1,2,3,4] | [1,2,3,4] | [[1,2,3,4],[1,2,3,4],[1,2,3,4]] |
Just apply list along axis=1; it creates a list for each row:
>>> df['col4'] = df.apply(list, axis=1)
OUTPUT:
col1 col2 col3 col4
0 [1, 2, 3, 4] [1, 2, 3, 4] [1, 2, 3, 4] [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
You can just do
df['col'] = df.values.tolist()
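If the goal is to keep using the `joiner` function from the question, one option is to walk the three columns in parallel with `zip`. This is a sketch under the assumption that only `col1`, `col2` and `col3` should be combined; the `cols` list is a name I introduce for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [[1, 2, 3, 4]],
    "col2": [[1, 2, 3, 4]],
    "col3": [[1, 2, 3, 4]],
})


def joiner(col1, col2, col3):
    return [col1, col2, col3]


cols = ["col1", "col2", "col3"]

# zip yields one (col1, col2, col3) tuple per row.
df["col4"] = [joiner(a, b, c) for a, b, c in zip(df["col1"], df["col2"], df["col3"])]

# Selecting the columns explicitly gives the same nested lists
# and keeps col4 itself out of the result.
same = df[cols].values.tolist()
print(df["col4"].iloc[0])
```

Selecting the source columns explicitly matters once `col4` exists, because `df.apply(list, axis=1)` and `df.values.tolist()` operate on every column of the frame.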

Why this code is giving unexpected output?

Hi I have just started to learn Python programming.
I wrote this code:
a = [[1, 2, 3], [4, 5, 6]]
b = [[1, 2, 3], [4, 5, 6]]
c = []
d = []
for i in range(len(a)):
    for j in range(len(a[0])):
        d.append(a[i][j]+b[i][j])
    c.append(d)
print(c)
I got this output:
[[2, 4, 6, 8, 10, 12], [2, 4, 6, 8, 10, 12]]
But to my understanding the output should be:
[[2, 4, 6], [2, 4, 6, 8, 10, 12]]
Could someone please explain this output?
Thank you.
c.append(d) stores a reference to d, not a snapshot of it, so both entries of c point to the same list and both show every value appended later. To get your desired output, append a copy instead:
a = [[1, 2, 3], [4, 5, 6]]
b = [[1, 2, 3], [4, 5, 6]]
c = []
d = []
for i in range(len(a)):
    for j in range(len(a[0])):
        d.append(a[i][j] + b[i][j])
    # Copy d and append the copy to c, so that values added to d
    # after [2, 4, 6] do not show up in the row already stored in c.
    d1 = d.copy()
    c.append(d1)
print(c)
I tried to debug and got the same answer
| i = | j = | d = |
| --- | --- | ------------- |
| 0 | 0 | [ 2 ] |
| 0 | 1 | [ 2, 4 ] |
| 0 | 2 | [ 2, 4, 6 ] |
end of i = 0 iteration, so d = [ 2, 4, 6 ]
end of i = 0 iteration, so c = [ [ 2, 4, 6 ] ]
| i = | j = | d = |
| --- | --- | ---------------------- |
| 1 | 0 | [ 2, 4, 6, 8 ] |
| 1 | 1 | [ 2, 4, 6, 8, 10 ] |
| 1 | 2 | [ 2, 4, 6, 8, 10, 12 ] |
end of i = 1 iteration, so d = [ 2, 4, 6, 8, 10, 12 ]
end of i = 1 iteration, so c = [ [ 2, 4, 6 ], [ 2, 4, 6, 8, 10, 12 ] ]
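The aliasing can be checked directly with `is`. The sketch below reruns the original loop, then shows a variant that creates a fresh list per row; that variant assumes the underlying goal is an element-wise sum of `a` and `b`, which the question does not state explicitly.

```python
a = [[1, 2, 3], [4, 5, 6]]
b = [[1, 2, 3], [4, 5, 6]]

# Original pattern: one list d is reused, so c stores two references to it.
c = []
d = []
for i in range(len(a)):
    for j in range(len(a[0])):
        d.append(a[i][j] + b[i][j])
    c.append(d)
print(c[0] is c[1])  # True: both entries are the same list object

# Creating a fresh list per outer iteration gives independent rows.
rows = []
for i in range(len(a)):
    row = []
    for j in range(len(a[0])):
        row.append(a[i][j] + b[i][j])
    rows.append(row)
print(rows)  # [[2, 4, 6], [8, 10, 12]]
```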

How to concatenate many more columns to one column with list format?

I need to convert an array to a DataFrame. The default method looks like this:
lst = [[1,2,3],[1,2,3],[1,2,3]]
pd.DataFrame(lst)
out:
   0  1  2
0  1  2  3
1  1  2  3
2  1  2  3
but I want this format instead:
           0
0  [1, 2, 3]
1  [1, 2, 3]
2  [1, 2, 3]
with just one column.
EDIT: the other situation, where the input is a NumPy array:
lst = np.array([[1,2,3],[1,2,3],[1,2,3]])
pd.DataFrame(lst)
You may initialize a single column:
In [279]: pd.DataFrame({0: lst})
Out[279]:
0
0 [1, 2, 3]
1 [1, 2, 3]
2 [1, 2, 3]
In the array scenario, use apply with list and to_frame:
lst = np.array([[1,2,3],[1,2,3],[1,2,3]])
pd.DataFrame(lst).apply(lambda x: list(x), axis=1).to_frame()
0
0 [1, 2, 3]
1 [1, 2, 3]
2 [1, 2, 3]
Make a Series and then build a DataFrame from it:
>>> pd.DataFrame(pd.Series(lst))
0
0 [1, 2, 3]
1 [1, 2, 3]
2 [1, 2, 3]
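For the NumPy case, another option is to convert the array to nested Python lists first and reuse the single-column dictionary form. A minimal sketch; the variable names `arr` and `single` are mine.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])

# arr.tolist() yields a list of Python lists, one per row of the array,
# so each row ends up as a single cell in column 0.
single = pd.DataFrame({0: arr.tolist()})
print(single.shape)  # (3, 1)
```

This avoids the row-wise apply, at the cost of converting the array to Python objects up front.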

Dataframe with column of ranges. Given number, select row where number occurs

I have a dataframe with a column of a range of numbers, and then more columns of data
[1, 2, 3, ..., 10] | a | b
[11, 12, 13, 14, ...] | c | d
Given a number like 10, 14, etc., how do I select the row where that number is in the range? For example, for 10 I want the [1, 2, 3, ..., 10] | a | b row to be returned.
So far I've tried dfs['A'].ix[10 in dfs['A']['B']], where dfs is a dictionary of dataframes, 'A' is a dataframe, and 'B' is the column with ranges.
How do I do this?
Use apply to loop through column B and check each element individually; this returns a boolean mask that you can use for subsetting:
df = pd.DataFrame({"B": [list(range(1,11)), list(range(11,21))], "col1":["a", "b"], "col2":["c", "d"]})
df[df["B"].apply(lambda x: 10 in x)]
# B col1 col2
# 0 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] a c
df = pd.DataFrame({'ranges':[range(11), range(11,20)], 'dat1':['a','c'], 'dat2':['b','d']})
mask = df.ranges.apply(lambda x: 10 in x)
df.loc[mask]  # .ix was removed in pandas 1.0; .loc accepts a boolean mask
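Both answers build a boolean mask with `apply` and use it to select rows. Wrapped in a small helper, the pattern could look like this sketch; the function name `rows_containing` is hypothetical, and the ranges are stored as plain lists, as in the first answer.

```python
import pandas as pd

df = pd.DataFrame({
    "ranges": [list(range(1, 11)), list(range(11, 21))],
    "dat1": ["a", "c"],
    "dat2": ["b", "d"],
})


def rows_containing(frame, number):
    # True for each row whose list in 'ranges' contains the number.
    mask = frame["ranges"].apply(lambda r: number in r)
    return frame.loc[mask]


print(rows_containing(df, 14))
```

A number that falls in no range simply yields an empty DataFrame rather than an error.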
