I would like to create a new column in a dataframe that has a list at every row. I'm looking for something that will accomplish the following:
df = pd.DataFrame(data={'A': [1, 2, 3], 'B': ['x', 'y', 'z']})
list_=[1,2,3]
df['new_col'] = list_
A B new_col
0 1 x [1,2,3]
1 2 y [1,2,3]
2 3 z [1,2,3]
Does anyone know how to accomplish this? Thank you!
df = pd.DataFrame(data={'A': [1, 2, 3], 'B': ['x', 'y', 'z']})
list_=[1,2,3]
df['new_col'] = [list_]*len(df)
Output:
A B new_col
0 1 x [1, 2, 3]
1 2 y [1, 2, 3]
2 3 z [1, 2, 3]
Tip: using list as a variable name is not advised, because list is a built-in type like str and int.
df['new_col'] = pd.Series([mylist for x in range(len(df.index))])
("list" is a terrible variable name, so I'm using "mylist" in this example).
df['new_col'] = [[1,2,3] for j in range(df.shape[0])]
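One difference between the answers above is worth checking: [list_]*len(df) repeats a reference to the same list object, while a comprehension that builds or copies the list gives each row its own. A minimal sketch, assuming you later mutate a cell in place (the column names shared and copied are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'z']})
values = [1, 2, 3]

# Multiplying the outer list repeats a reference to the same inner list.
df['shared'] = [values] * len(df)
# Copying inside a comprehension gives every row an independent list.
df['copied'] = [list(values) for _ in range(len(df))]

# Mutate the list stored in row 0 of each column.
df.loc[0, 'shared'].append(99)
df.loc[0, 'copied'].append(99)

shared_lengths = df['shared'].map(len).tolist()
copied_lengths = df['copied'].map(len).tolist()
print(shared_lengths)  # [4, 4, 4] - every row sees the change
print(copied_lengths)  # [4, 3, 3] - only row 0 changed
```

If the lists are never modified in place, either form should be fine.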
Related
With pandas, I'm trying to keep only the rows whose list in a given column contains a specified value. I tried Python's in operator, but it doesn't work. Is there a way to do this without looping?
import pandas as pd
df = pd.DataFrame({'A' : [1,2,3,4], 'B' : [[1, 2, 3], [2, 3], [3], [1, 2, 3]]})
result = 1 in df['B']
A B
0 1 [1, 2, 3]
1 2 [2, 3]
2 3 [3]
3 4 [1, 2, 3]
I'm trying to keep the rows where column B contains 1, so the expected output is:
A B
0 1 [1, 2, 3]
3 4 [1, 2, 3]
but the output is always True.
Any help or explanation is welcome. Thank you!
You need to use a loop/list comprehension:
out = df[[1 in l for l in df['B']]]
A pandas version would be more verbose and less efficient:
out = df[df['B'].explode().eq(1).groupby(level=0).any()]
Output:
A B
0 1 [1, 2, 3]
3 4 [1, 2, 3]
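As for why the attempt in the question always gives True: in applied to a Series tests the index labels, not the values, and 1 happens to be a label here. A small sketch of that, with Series.apply shown as one more way to build the row mask (it still loops internally, so treat it as an equivalent spelling rather than a speed-up):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [[1, 2, 3], [2, 3], [3], [1, 2, 3]]})

# `in` on a Series looks at the index labels (0, 1, 2, 3), not the stored lists.
label_check = 1 in df['B']    # True, because 1 is an index label
missing_label = 99 in df['B']  # False, because 99 is not an index label

# Build one boolean per row, then use it as a mask.
mask = df['B'].apply(lambda values: 1 in values)
out = df[mask]
print(out)
```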
I am trying to convert a dictionary into a dataframe.
import pandas as pd
dict = {'A': [1,2,3], 'B': [1,2,3,4]}
pd.DataFrame.from_dict(dict, orient = 'index').T
Expect:
A B
0 [1,2,3] [1,2,3,4]
But got instead:
A B
-----------
0 1 a
1 2 b
2 3 c
3 None d
Try putting the dictionary inside a list ([]):
import pandas as pd
dct = {"A": [1, 2, 3], "B": [1, 2, 3, 4]}
df = pd.DataFrame([dct])
print(df)
Prints:
A B
0 [1, 2, 3] [1, 2, 3, 4]
Note: Don't use built-in names such as dict for variable names; doing so shadows the built-in.
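An equivalent way to get the same single-row frame, in case it reads more clearly, is to wrap each value in a one-element list so that pandas treats the inner list as a single cell. This is just a sketch of the same idea, not a different technique:

```python
import pandas as pd

dct = {"A": [1, 2, 3], "B": [1, 2, 3, 4]}

# Each column gets a list of length 1 whose only element is the original list.
df = pd.DataFrame({key: [value] for key, value in dct.items()})
print(df)
```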
I have the following dataframe:
df = pd.DataFrame({'cols': ['a', 'b', 'c'], 'vals': [[1,2], [3,4], [5,6]]})
series = pd.Series([3,5])
df
OUT:
cols vals
0 a [1, 2]
1 b [3, 4]
2 c [5, 6]
series
OUT:
0 3
1 5
I would like to get the following result:
cols vals
0 a [1, 2, 3]
1 b [3, 4, 5]
2 c [5, 6]
How can I achieve this without using iterrows?
Good old += with index alignment:
df.loc[series.index, 'vals'] += pd.Series([[i] for i in series], index=series.index)
Alternatively, with explode (Series.append was removed in pandas 2.0, so pd.concat is used here):
df['vals'] = pd.concat([df['vals'].explode(), series]).groupby(level=0).agg(list)
print(df)
cols vals
0 a [1, 2, 3]
1 b [3, 4, 5]
2 c [5, 6]
You could use a list comprehension and slice assign back to vals (this assumes the index is a normal range):
df.loc[:len(series)-1, 'vals'] = [i+[j] for i,j in zip(df.loc[:len(series)-1, 'vals'], series)]
print(df)
cols vals
0 a [1, 2, 3]
1 b [3, 4, 5]
2 c [5, 6]
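If the index is not a plain range, a label-based variant of the list comprehension may be safer. A minimal sketch, assuming the series shares labels with df.index (the name extras is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'cols': ['a', 'b', 'c'], 'vals': [[1, 2], [3, 4], [5, 6]]})
series = pd.Series([3, 5])

# Map each index label of the series to the value that should be appended.
extras = series.to_dict()

# Append only where the row's label appears in the series; keep other rows as they are.
df['vals'] = [
    lst + [extras[label]] if label in extras else lst
    for label, lst in zip(df.index, df['vals'])
]
print(df)
```

This builds new lists rather than mutating the existing ones, so the original list objects are left untouched.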
I've got a dataframe. In column A there is a list of integers, and in column B an integer. I want to pick the n-th value of the column A list, where n is the number from column B. So if column A holds [1,5,6,3,4] and column B holds 2, I want to get '6'.
I tried this:
result = [y[x] for y in df['A'] for x in df['B']]
But it doesn't work. Please help.
Use zip with list comprehension:
df['new'] = [y[x] for x, y in zip(df['B'], df['A'])]
print (df)
A B new
0 [1, 2, 3, 4, 5] 1 2
1 [1, 2, 3, 4] 2 3
You can also use apply, i.e.:
df = pd.DataFrame({'A':[[1,2,3,4,5],[1,2,3,4]],'B':[1,2]})
A B
0 [1, 2, 3, 4, 5] 1
1 [1, 2, 3, 4] 2
# df.apply(lambda x: np.array(x['A'])[x['B']], axis=1)
# You don't need np.array here; use it when column B is also a list.
df.apply(lambda x: x['A'][x['B']], axis=1)  # Thanks @Zero
0 2
1 3
dtype: int64
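To see why the attempt in the question goes wrong: two for clauses in one comprehension nest, so every list in A gets paired with every number in B, whereas zip pairs the two columns row by row. A short sketch using the same sample data as above:

```python
import pandas as pd

df = pd.DataFrame({'A': [[1, 2, 3, 4, 5], [1, 2, 3, 4]], 'B': [1, 2]})

# Nested loops: 2 lists x 2 positions = 4 results, not one per row.
nested = [y[x] for y in df['A'] for x in df['B']]

# zip walks both columns in step, giving exactly one result per row.
paired = [y[x] for x, y in zip(df['B'], df['A'])]

print(nested)  # [2, 3, 2, 3]
print(paired)  # [2, 3]
```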
I have a pandas.Series named matches like this:
When I called pandas.Series.str.get method on it, it returns a new Series with its values all NaN:
I have read the documentation for pandas.Series.str.get, but I still can't understand it.
str.get(1) returns the second element of each iterable; it is the same as str[1]:
df = pd.DataFrame({"A": [[1,2,3], [0,1,3]], "B":['aswed','yuio']})
print (df)
A B
0 [1, 2, 3] aswed
1 [0, 1, 3] yuio
df['C'] = df['A'].str.get(1)
df['C1'] = df['A'].str[1]
df['D'] = df['B'].str.get(1)
df['D1'] = df['B'].str[1]
print (df)
A B C C1 D D1
0 [1, 2, 3] aswed 2 2 s s
1 [0, 1, 3] yuio 1 1 u u
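One possible source of NaN is a position that is out of range for an element, since str.get returns a missing value there instead of raising. Whether that is what happened in the question depends on the data, which isn't shown, so this is only a sketch with made-up values:

```python
import pandas as pd

s = pd.Series([[1, 2, 3], [7]])

# Position 1 exists in the first list but not in the second one.
second = s.str.get(1)
# Negative positions count from the end, as with normal Python indexing.
last = s.str.get(-1)

print(second)
print(last)
```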