Assign a list to a pandas dataframe element - python

I want to add a column to a DataFrame and set each of its elements to a list. After running the code below, nothing changed:
import pandas as pd
df = pd.DataFrame({'A':[1,2,3],'B':[4,5,6]})
df['C'] = 0
for i in range(len(df)):
    lst = [6,7,8]
    df.iloc[i]['C'] = []
    df.iloc[i]['C'] = lst
Also, based on "Assigning a list value to pandas dataframe", I tried df.at[i,'C'] in the code above, and the following error appeared: 'setting an array element with a sequence.'

You can use np.tile with np.ndarray.tolist:
import numpy as np
l = len(df)
df['C'] = np.tile([6,7,8],(l,1)).tolist()
df
   A  B          C
0  1  4  [6, 7, 8]
1  2  5  [6, 7, 8]
2  3  6  [6, 7, 8]

One idea is to use a list comprehension:
lst = [6,7,8]
df['C'] = [lst for _ in df.index]
print(df)
   A  B          C
0  1  4  [6, 7, 8]
1  2  5  [6, 7, 8]
2  3  6  [6, 7, 8]
Your loop-based solution works for me when written like this:
df['C'] = ''
for i in range(len(df)):
    lst = [6,7,8]
    df.iloc[i, df.columns.get_loc('C')] = lst
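Regarding the 'setting an array element with a sequence' error from df.at: a likely cause is that column C was created with df['C'] = 0, which gives it an integer dtype that cannot hold a list. A minimal sketch, assuming it is acceptable to create the column with object dtype first (the None placeholder is just an illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Start from an object-dtype column so each cell can hold any Python object.
df['C'] = pd.Series([None] * len(df), index=df.index, dtype=object)

for i in df.index:
    # A fresh list per row, so the rows do not share one list object.
    df.at[i, 'C'] = [6, 7, 8]

print(df)
```

Creating a new list inside the loop also avoids the aliasing you get with [lst for _ in df.index], where every row points at the same list.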

Related

(python) subtract value in a list from value in the same list in a for loop / list comprehension

Suppose I have
list1 = [3, 4, 6, 8, 13]
In a for loop, I want to subtract each value from the value that comes right after it. In the above example: 4-3, 6-4, 8-6, 13-8. (The first value is skipped and kept unchanged.)
Desired result:
list2 = [3, 1, 2, 2, 5]
Can I do this in a for loop or a list comprehension?
More specifically, I want to do this in a DataFrame:
   list1
0      3
1      4
2      6
3      8
4     13
and after the operation
   list1  list2
0      3      3
1      4      1
2      6      2
3      8      2
4     13      5
I have tried for loops, lambda functions and list comprehensions, and tried to access the positional index with enumerate(), but I can't figure out how to access the value just before the one I want to subtract from.
Edit: the answers below worked. Thank you very much!
The DataFrame solution has already been posted. This is an implementation for lists:
list1 = [3, 4, 6, 8, 13]
list2 = []
for i, v in enumerate(list1):
    list2.append(list1[i] - list1[i-1])
# For i == 0, list1[i-1] wraps around to the last element, so restore the first value.
list2[0] = list1[0]
print(list2) # [3, 1, 2, 2, 5]
And lastly, as a list comprehension:
list2 = [list1[i] - list1[i-1] for i, v in enumerate(list1)]
list2[0] = list1[0]
You should use shift to access the previous row:
df['list2'] = df['list1'].sub(df['list1'].shift(fill_value=0))
Or, using diff with fillna (diff returns floats, so cast back to int):
df['list2'] = df['list1'].diff().fillna(df['list1']).astype(int)
Output:
   list1  list2
0      3      3
1      4      1
2      6      2
3      8      2
4     13      5
For a pure Python solution:
list1 = [3, 4, 6, 8, 13]
list2 = [a-b for a,b in zip(list1, [0]+list1)]
Output: [3, 1, 2, 2, 5]
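You could loop backwards with for x in range(len(list1) - 1, 0, -1): and do the calculation in place: list1[x] = list1[x] - list1[x - 1]. Going backwards means each subtraction still sees the original, not yet overwritten, previous value.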
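The backwards loop described above can be sketched as follows. The name values is only an illustration (it also avoids shadowing the built-in list), and the list is modified in place:

```python
values = [3, 4, 6, 8, 13]

# Walk from the last index down to 1; index 0 is left untouched.
for x in range(len(values) - 1, 0, -1):
    values[x] = values[x] - values[x - 1]

print(values)  # [3, 1, 2, 2, 5]
```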
Try this code; it works:
import pandas as pd
list1 = [3, 4, 6, 8, 13]
list2 = [list1[i+1]-list1[i] for i in range(len(list1)-1)]
list2.insert(0, list1[0])
data = {
    "list1": list1,
    "list2": list2
}
df = pd.DataFrame(data)
print(df)
Output:
$ python3 solution.py
   list1  list2
0      3      3
1      4      1
2      6      2
3      8      2
4     13      5

remove values from pandas df and move remaining upwards

I have a DataFrame with categorical data in it.
I have come up with a procedure to keep only the desired categories, while moving the remaining values up into the cells left empty by the deleted ones.
But I want to do it without the intermediate lists, if possible.
import pandas as pd
mydf = pd.DataFrame(data={'a': [9, 6, 3, 8, 5],
                          'b': [4, 3, 5, 6, 7],
                          'c': [5, 3, 6, 9, 10]})
selecList = [5,8,4,6] # only these categories shall remain
mydf
   a  b   c
0  9  4   5
1  6  3   3
2  3  5   6
3  8  6   9
4  5  7  10
Desired output:
   a  b     c
0  6  4     5
1  8  5     6
2  5  6  <NA>
My workaround:
myList = mydf.T.values.tolist()
myList
[[9, 6, 3, 8, 5], [4, 3, 5, 6, 7], [5, 3, 6, 9, 10]]
filtered_list = [[x for x in y if x in selecList] for y in myList]
filtered_list
[[6, 8, 5], [4, 5, 6], [5, 6]]
filtered_df = pd.DataFrame(filtered_list).T
filtered_df.columns = list(mydf)
filtered_df = filtered_df.astype('Int64')
Unsuccessful attempt:
pd.DataFrame(mydf.apply(lambda y: [x for x in y if x in selecList])).T
Here is an alternative solution (it masks the unwanted values as NaN, but does not by itself move the remaining values up):
mydf.where(mydf.isin(selecList)).dropna(how='all')
Here is another solution:
(mydf.where(mydf.isin(selecList))
     .stack()
     .droplevel(0)
     .to_frame()
     .assign(i=lambda x: x.groupby(level=0).cumcount())
     .set_index('i', append=True)[0]
     .unstack(level=0))
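For comparison, here is a minimal sketch of another way to avoid the nested lists. It is not taken from the answers above. It filters each column separately, resets the index, and lets pd.concat pad the shorter columns. The names mydf and selecList come from the question, and casting to 'Int64' follows the question's workaround:

```python
import pandas as pd

mydf = pd.DataFrame({'a': [9, 6, 3, 8, 5],
                     'b': [4, 3, 5, 6, 7],
                     'c': [5, 3, 6, 9, 10]})
selecList = [5, 8, 4, 6]

# Filter each column, renumber its index from 0, then let concat align the
# columns on that index and pad the shorter ones with missing values.
filtered_df = pd.concat(
    {col: s[s.isin(selecList)].reset_index(drop=True) for col, s in mydf.items()},
    axis=1,
).astype('Int64')

print(filtered_df)
```

This still builds a small dict of Series, so it is a trade-off in readability rather than a strictly intermediary-free approach.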

How to update cell containing a list using values from a pd.Series?

I have the following DataFrame:
df = pd.DataFrame({'cols': ['a', 'b', 'c'], 'vals': [[1,2], [3,4], [5,6]]})
series = pd.Series([3,5])
df
OUT:
  cols    vals
0    a  [1, 2]
1    b  [3, 4]
2    c  [5, 6]
series
OUT:
0    3
1    5
I would like to get the following result:
  cols       vals
0    a  [1, 2, 3]
1    b  [3, 4, 5]
2    c     [5, 6]
How can I achieve this without using iterrows?
Good old += with index alignment:
df.loc[series.index, 'vals'] += pd.Series([[i] for i in series], index=series.index)
Alternatively, with explode (Series.append was removed in pandas 2.0, so pd.concat is used here):
df['vals'] = pd.concat([df['vals'].explode(), series]).groupby(level=0).agg(list)
print(df)
  cols       vals
0    a  [1, 2, 3]
1    b  [3, 4, 5]
2    c     [5, 6]
You could use a list comprehension and slice-assign back to vals (this assumes the index is a normal range):
df.loc[:len(series)-1, 'vals'] = [i+[j] for i,j in zip(df.loc[:len(series)-1, 'vals'], series)]
print(df)
  cols       vals
0    a  [1, 2, 3]
1    b  [3, 4, 5]
2    c     [5, 6]
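If the index is not a plain range, the same idea can be sketched by aligning the series to the frame's index first. This is only an illustration of one way to do it; the helper name extra is hypothetical, and the int() cast assumes the series holds integers:

```python
import pandas as pd

df = pd.DataFrame({'cols': ['a', 'b', 'c'], 'vals': [[1, 2], [3, 4], [5, 6]]})
series = pd.Series([3, 5])

# Align the series to the frame's index; rows without a match become NaN.
extra = series.reindex(df.index)

# Build new lists, appending only where a value is present.
df['vals'] = [
    vals + [int(e)] if pd.notna(e) else vals
    for vals, e in zip(df['vals'], extra)
]

print(df)
```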

Unique identifier for each list of list in data frame

I have a list of lists,
lst = [[2, 0, 1, 6, 7, 8], [4, 3, 5]]
and I want to flatten it and assign a unique id to each sublist, with the result merged into a DataFrame.
Desired output:
   value  group
0      2      0
1      0      0
2      1      0
3      6      0
4      7      0
5      8      0
6      4      1
7      3      1
8      5      1
You're going to need to do some fancy flattening:
flattened = [(item, index) for index, sublist in enumerate(lst) for item in sublist]
df = pd.DataFrame(flattened, columns=['value','group'])
If you want a pandas DataFrame:
import pandas as pd
lst = [[2, 0, 1, 6, 7, 8], [4, 3, 5]]
final_list = []
for i, l in enumerate(lst):
    for num in l:
        final_list.append({'value': num, 'group': i})
df = pd.DataFrame(final_list)
You can use this code:
new_lst = []
# enumerate avoids lst.index(group), which returns the first match if two sublists are equal
for i, group in enumerate(lst):
    for n in group:
        new_lst.append({"group": i, "value": n})
You should try something before asking for the desired output.
To loop through a list of lists while keeping a unique identifier, you may want to use the function enumerate, which yields the index of each sublist along with the sublist itself.
for i, sublist in enumerate(lst):
    identifier = i
    [(value, identifier) for value in sublist]
    ....
Hope this helps.
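As a pandas-only variant of the enumerate idea, here is a minimal sketch using Series.explode, where the index of the original Series plays the role of the group id. The int cast assumes all values are integers:

```python
import pandas as pd

lst = [[2, 0, 1, 6, 7, 8], [4, 3, 5]]

# One row per sublist; after explode, each item keeps its sublist's index.
s = pd.Series(lst).explode()

df = pd.DataFrame({'value': s.to_numpy().astype(int),
                   'group': s.index.to_numpy()})
print(df)
```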

remove elements from list based on index in pandas Dataframe

How can I remove elements from a list based on an index range in a pandas DataFrame?
Suppose the DataFrame is like
df:
            values  size
0  [1,2,3,4,5,6,7]     2   # delete the first 2 elements from the list
1        [1,2,3,4]     3   # delete the first 3 elements from the list
2  [9,8,7,6,5,4,3]     5   # delete the first 5 elements from the list
Expected output is
df:
        values  size
0  [3,4,5,6,7]     2
1          [4]     3
2        [4,3]     5
Use a list comprehension with slicing:
df['values'] = [i[j:] for i, j in zip(df['values'], df['size'])]
print(df)
            values  size
0  [3, 4, 5, 6, 7]     2
1              [4]     3
2           [4, 3]     5
Using df.apply:
import pandas as pd
df = pd.DataFrame({"values": [[1,2,3,4,5,6,7], [1,2,3,4], [9,8,7,6,5,4,3]], "size": [2, 3, 5]})
df["values"] = df.apply(lambda x: x["values"][x['size']:], axis=1)
print(df)
Output:
   size           values
0     2  [3, 4, 5, 6, 7]
1     3              [4]
2     5           [4, 3]
Using map from base Python, you could do
df['values'] = pd.Series(map(lambda x, y: x[y:], df['values'], df['size']))
which returns
df
Out[34]:
            values  size
0  [3, 4, 5, 6, 7]     2
1              [4]     3
2           [4, 3]     5
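One caveat with wrapping the result in pd.Series: without an explicit index it gets a default 0..n-1 index, which only lines up with the frame if the frame also uses one. A minimal sketch, assuming a frame with a non-default index (the labels 10, 20, 30 are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {"values": [[1, 2, 3, 4, 5, 6, 7], [1, 2, 3, 4], [9, 8, 7, 6, 5, 4, 3]],
     "size": [2, 3, 5]},
    index=[10, 20, 30],  # hypothetical non-default index
)

# Pair rows positionally, then reuse the frame's own index so the
# assignment aligns row for row.
df["values"] = pd.Series(
    [v[n:] for v, n in zip(df["values"], df["size"])],
    index=df.index,
)

print(df)
```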
