I am trying something like this:
List append in pandas cell
But the problem is that the post is old and everything in it is deprecated and should not be used anymore.
d = {'col1': ['TEST', 'TEST'], 'col2': [[1, 2], [1, 2]], 'col3': [35, 89]}
df = pd.DataFrame(data=d)
   col1    col2  col3
0  TEST  [1, 2]    35
1  TEST  [1, 2]    89
My DataFrame looks like this, where col2 is the column I am interested in. I need to add [0, 0] to the list in col2 for every row in the DataFrame. My real DataFrame has a dynamic shape, so I can't just set every cell on its own.
End result should look like this:
   col1          col2  col3
0  TEST  [1, 2, 0, 0]    35
1  TEST  [1, 2, 0, 0]    89
I fooled around with df.apply and df.assign but I can't seem to get it to work.
I tried:
df['col2'] += [0, 0]
df = df.col2.apply(lambda x: x.append([0,0]))
Which returns a Series that looks nothing like what I need.
df = df.assign(new_column = lambda x: x + list([0, 0]))
Not sure if this is the best way to go, but option 2 works with a little modification:
import pandas as pd
d = {'col1': ['TEST', 'TEST'], 'col2': [[1, 2], [1, 2]], 'col3': [35, 89]}
df = pd.DataFrame(data=d)
df["col2"] = df["col2"].apply(lambda x: x + [0,0])
print(df)
Firstly, if you want to add all members of an iterable to a list, use .extend instead of .append. However, your second attempt still doesn't work because both methods modify the list in place and return None, so every value in "col2" becomes None; use list concatenation instead, which returns a new list. Finally, you want to assign the modified column back to the original DataFrame, not overwrite the whole DataFrame with the result of .apply (that is why you ended up with a bare Series).
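A minimal sketch of that pitfall, using the example DataFrame from the question:
import pandas as pd

d = {'col1': ['TEST', 'TEST'], 'col2': [[1, 2], [1, 2]], 'col3': [35, 89]}
df = pd.DataFrame(data=d)

# .append mutates each list in place and returns None,
# so the Series produced by apply holds only None values
bad = df['col2'].apply(lambda x: x.append([0, 0]))
print(bad.tolist())         # [None, None]

# side effect: [0, 0] was appended as a single nested element
print(df['col2'].tolist())  # [[1, 2, [0, 0]], [1, 2, [0, 0]]]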
One idea is to use a list comprehension:
df["col2"] = [x + [0,0] for x in df["col2"]]
print (df)
col1 col2 col3
0 TEST [1, 2, 0, 0] 35
1 TEST [1, 2, 0, 0] 89
for val in df['col2']:
    val.extend([0, 0])  # extend mutates each list in place
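Since the loop mutates the list objects the column already holds, nothing needs to be assigned back. A self-contained sketch using the example data from the question:
import pandas as pd

d = {'col1': ['TEST', 'TEST'], 'col2': [[1, 2], [1, 2]], 'col3': [35, 89]}
df = pd.DataFrame(data=d)

for val in df['col2']:
    val.extend([0, 0])  # in-place mutation of each list

print(df)
#    col1          col2  col3
# 0  TEST  [1, 2, 0, 0]    35
# 1  TEST  [1, 2, 0, 0]    89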
Related
Say I have the data below:
df = pd.DataFrame({'col1': [1, 2, 1],
'col2': [2, 4, 3],
'col3': [3, 6, 5],
'col4': [4, 8, 7]})
Is there a way to use list comprehensions to filter data efficiently? For example, if I wanted to find all cases where col2 was even OR col3 was even OR col4 was even, is there a simpler way than just writing this?
df[(df['col2'] % 2 == 0) | (df['col3'] % 2 == 0) | (df['col4'] % 2 == 0)]
It would be nice if I could pass in a list of columns and the condition to check.
df[(df[cols] % 2 == 0).any(axis=1)]
where cols is your list of columns
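For instance, a short sketch using the sample data above; the choice of cols here is just for illustration:
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 1],
                   'col2': [2, 4, 3],
                   'col3': [3, 6, 5],
                   'col4': [4, 8, 7]})

cols = ['col2', 'col3', 'col4']

# keep rows where any of the chosen columns holds an even value
filtered = df[(df[cols] % 2 == 0).any(axis=1)]
print(filtered)  # rows 0 and 1; row 2 has only odd values in cols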
df = pd.read_csv("school_data.csv")
col1 col2
0 [1,2,3] [4,5,6]
1 [0,5,3] [6,2,5]
Wanted output:
col1 col2 col3
0 [1,2,3] [4,5,6] [1,2,3,4,5,6]
1 [0,5,3] [6,2,5] [0,5,3,6,2,5]
The col1 and col2 values are unique. I am using pandas.
The simplest way would be to do this:
df['col3'] = df['col1'] + df['col2']
Example:
import pandas as pd
row1 = [[1,2,3], [4,5,6]]
row2 = [[0,5,3], [6,2,5]]
df = pd.DataFrame(data=[row1, row2], columns=['col1', 'col2'])
df['col3'] = df['col1'] + df['col2']
print(df)
Output:
col1 col2 col3
0 [1, 2, 3] [4, 5, 6] [1, 2, 3, 4, 5, 6]
1 [0, 5, 3] [6, 2, 5] [0, 5, 3, 6, 2, 5]
You can use the apply function on more than one column at once, like this:
def func(x):
    return x['col1'] + x['col2']

df['col3'] = df[['col1','col2']].apply(func, axis=1)
Why not do a simple df['col1'] + df['col2']?
Assume col1 holds lists stored as strings. In that case you can modify func to:
def func(x):
    return x['col1'][1:-1].split(',') + x['col2']
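Note that split leaves the parsed elements as strings rather than numbers. A sketch of a more robust alternative, assuming the strings are valid Python list literals, is to parse them with ast.literal_eval (the same approach used in a later answer below):
from ast import literal_eval

def func(x):
    # parse the string '[1,2,3]' back into a real list of ints
    return literal_eval(x['col1']) + x['col2']

df['col3'] = df[['col1', 'col2']].apply(func, axis=1)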
Given a DataFrame, I'd like to count the number of NaN values in each column, to show the proportion as a histogram.
I've come up with
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
nan_dict = {}
for col in df:
    nan_dict[col] = df[col].value_counts(dropna=False)[0]
and then build the histogram from the dict. This seems really cumbersome; also, it fails when there are no NaNs.
Is there a way I could apply value_counts along all columns so that I get back a Series with NaN values per column?
df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
print(dict(zip(df.columns, df.isna().sum())))
Prints:
{'col1': 0, 'col2': 0}
For the DataFrame:
col1 col2
0 1 3.0
1 2 NaN
Prints:
{'col1': 0, 'col2': 1}
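If you just want a Series with the NaN count per column, which is what the question asks for, df.isna().sum() already returns one; a short sketch that also plots the proportions (matplotlib is assumed to be installed for the plot):
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3.0, None]})

nan_counts = df.isna().sum()     # Series: NaN count per column
nan_fraction = df.isna().mean()  # Series: proportion of NaN per column

print(nan_counts)
print(nan_fraction)

nan_fraction.plot.bar()          # bar chart of NaN proportions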
I would like to create a new column in a dataframe that has a list at every row. I'm looking for something that will accomplish the following:
df = pd.DataFrame(data={'A': [1, 2, 3], 'B': ['x', 'y', 'z']})
list_=[1,2,3]
df['new_col'] = list_
A B new_col
0 1 x [1,2,3]
1 2 y [1,2,3]
2 3 z [1,2,3]
Does anyone know how to accomplish this? Thank you!
df = pd.DataFrame(data={'A': [1, 2, 3], 'B': ['x', 'y', 'z']})
list_=[1,2,3]
df['new_col'] = [list_]*len(df)
Output:
A B new_col
0 1 x [1, 2, 3]
1 2 y [1, 2, 3]
2 3 z [1, 2, 3]
Tip: list as a variable name is not advised; list is a built-in type like str, int, etc.
df['new_col'] = pd.Series([mylist for x in range(len(df.index))])
("list" is a terrible variable name, so I'm using "mylist" in this example).
df['new_col'] = [[1,2,3] for j in range(df.shape[0])]
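One general Python caveat, not specific to any of these answers: [list_]*len(df) and the pd.Series comprehension above place the same list object in every row, so mutating it through one row changes all rows. If each row needs an independent list, a sketch using per-row copies:
import pandas as pd

df = pd.DataFrame(data={'A': [1, 2, 3], 'B': ['x', 'y', 'z']})
list_ = [1, 2, 3]

# each row gets its own copy of the list instead of a shared reference
df['new_col'] = [list_.copy() for _ in range(len(df))]

df.loc[0, 'new_col'].append(4)  # only row 0 is affected
print(df)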
I have a dataframe where one of its columns holds a list at each index. I want to concatenate these lists into one list. I am using
ids = df.loc[0:index, 'User IDs'].values.tolist()
However, this results in
['[1,2,3,4......]'], which is a string. Somehow each value in my list column is of type str. I have tried converting with list() and literal_eval(), but it does not work. list() just splits the string into individual characters, e.g. from '[12,13,14...]' to ['[', '1', '2', ',', '1', '3', ...].
How do I concatenate a pandas column with list values into one list? Kindly help out; I have been banging my head on this for several hours.
Consider the DataFrame df:
df = pd.DataFrame(dict(col1=[[1, 2, 3]] * 2))
print(df)
col1
0 [1, 2, 3]
1 [1, 2, 3]
pandas simplest answer
df.col1.sum()
[1, 2, 3, 1, 2, 3]
numpy.concatenate
np.concatenate(df.col1)
array([1, 2, 3, 1, 2, 3])
chain
from itertools import chain
list(chain(*df.col1))
[1, 2, 3, 1, 2, 3]
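As an aside, and assuming pandas 0.25 or newer, Series.explode offers another way to flatten the column:
df.col1.explode().tolist()
# [1, 2, 3, 1, 2, 3]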
response to comments:
I think your columns are strings
from ast import literal_eval
df.col1 = df.col1.apply(literal_eval)
If instead your column is string values that look like lists
df = pd.DataFrame(dict(col1=['[1, 2, 3]'] * 2))
print(df) # will look the same
col1
0 [1, 2, 3]
1 [1, 2, 3]
However, pd.Series.sum does not work the same way here.
df.col1.sum()
'[1, 2, 3][1, 2, 3]'
We need to evaluate the strings as if they are literals and then sum
df.col1.apply(literal_eval).sum()
[1, 2, 3, 1, 2, 3]
If you want to flatten the list, this is a Pythonic way to do it:
import pandas as pd

df = pd.DataFrame({'A': [[1,2,3], [4,5,6]]})
a = df['A'].tolist()
a = [i for j in a for i in j]  # flatten the list of lists
print(a)  # [1, 2, 3, 4, 5, 6]