How does the pandas.Series.str.get method work? - python

I have a pandas.Series named matches like this:
When I call the pandas.Series.str.get method on it, it returns a new Series with all of its values NaN:
I have read the documentation for pandas.Series.str.get, but I still can't understand it.

It returns the second element from each iterable; it is the same as str[1]:
import pandas as pd

df = pd.DataFrame({"A": [[1,2,3], [0,1,3]], "B": ['aswed', 'yuio']})
print (df)
           A      B
0  [1, 2, 3]  aswed
1  [0, 1, 3]   yuio
df['C'] = df['A'].str.get(1)
df['C1'] = df['A'].str[1]
df['D'] = df['B'].str.get(1)
df['D1'] = df['B'].str[1]
print (df)
           A      B  C  C1  D D1
0  [1, 2, 3]  aswed  2   2  s  s
1  [0, 1, 3]   yuio  1   1  u  u
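If the position you ask for does not exist in a given element, str.get returns NaN instead of raising an error, which is the usual reason a call like the one in the question comes back all NaN. A minimal sketch (the sample Series below is made up for illustration):
import pandas as pd

s = pd.Series(['ab', 'c', None])
print(s.str.get(1))
# 0      b
# 1    NaN   <- 'c' has no element at position 1
# 2    NaN   <- missing values stay missing
# dtype: object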

Related

count number of elements in a list inside a dataframe

Assume that we have a DataFrame and inside one of its columns we have lists. How can I count the number of elements per list? For example:
A        B
(1,2,3)  (1,2,3,4)
(1)      (1,2,3)
I would like to create 2 new columns with the count for each column, something like the following:
A        B          C  D
(1,2,3)  (1,2,3,4)  3  4
(1)      (1,2,3)    1  3
where C is the number of elements in the list in column A for that row, and D is the number of elements in the list in column B for that row.
I cannot just do
df['A'] = len(df['A'])
because that returns the length of my DataFrame.
You can use the .apply method on the Series for the column df['A'].
>>> import pandas as pd
>>> pd.DataFrame({"column": [[1, 2], [1], [1, 2, 3]]})
      column
0     [1, 2]
1        [1]
2  [1, 2, 3]
>>> df = pd.DataFrame({"column": [[1, 2], [1], [1, 2, 3]]})
>>> df["column"].apply
<bound method Series.apply of 0       [1, 2]
1          [1]
2    [1, 2, 3]
Name: column, dtype: object>
>>> df["column"].apply(len)
0    2
1    1
2    3
Name: column, dtype: int64
>>> df["column"] = df["column"].apply(len)
>>>
See Python Pandas, apply function for a more general discussion of apply.
You can use pandas' apply with the len function on each column, like below, to obtain what you are looking for:
# package importation
import pandas as pd

# creating a sample dataframe
df = pd.DataFrame(
    {
        'A': [[1,2,3], [32,4], [45,67,23,54,3], [], [0]],
        'B': [[2], [3], [2,3], [5,6,1], [98,44]]
    },
    index=['z','y','m','n','o']
)

# computing the length of each list in the columns
df['items_in_A'] = df['A'].apply(len)
df['items_in_B'] = df['B'].apply(len)

# check the output
print(df)
Output:
                     A          B  items_in_A  items_in_B
z            [1, 2, 3]        [2]           3           1
y              [32, 4]        [3]           2           1
m  [45, 67, 23, 54, 3]     [2, 3]           5           2
n                   []  [5, 6, 1]           0           3
o                  [0]   [98, 44]           1           2
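As a side note, the .str accessor's len also counts the elements of list values, so the following sketch should give the same counts as .apply(len) on the frame above:
# .str.len() works element-wise on lists as well as strings
df['items_in_A'] = df['A'].str.len()
df['items_in_B'] = df['B'].str.len()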

python pandas DataFrame - assign a list to multiple cells

I have a DataFrame like
name  col1  col2
a     aa     123
a     bb     123
b     aa     234
and a list
[1, 2, 3]
I want to replace col2 of every row where col1 == 'aa' with the list, like:
name  col1  col2
a     aa    [1, 2, 3]
a     bb    123
b     aa    [1, 2, 3]
I tried something like
df.loc[df['col1'] == 'aa', 'col2'] = [1, 2, 3]
but it gives me the error:
ValueError: could not broadcast input array from shape (xx,) into shape (yy,)
How should I get around this?
Keep it simple: np.where should do. Code below:
df['col2'] = np.where(df['col1'] == 'aa', str(lst), df['col2'])
Alternatively, use pd.Series with the list wrapped in double brackets:
df['col2'] = np.where(df['col1'] == 'aa', pd.Series([lst]), df['col2'])
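For what it is worth, a small sketch of what the first variant stores (assuming lst = [1, 2, 3] and the df from the question): because str(lst) is a single string, the matching cells end up holding the text '[1, 2, 3]' rather than a list object:
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["a", "a", "b"],
                   "col1": ["aa", "bb", "aa"],
                   "col2": [123, 123, 234]})
lst = [1, 2, 3]
df['col2'] = np.where(df['col1'] == 'aa', str(lst), df['col2'])
print(type(df['col2'].iloc[0]))  # <class 'str'>, not a list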
import pandas as pd
df = pd.DataFrame({"name":["a","a","b"],"col1":["aa","bb","aa"],"col2":[123,123,234]})
l = [1,2,3]
df["col2"] = df.apply(lambda x: l if x.col1 == "aa" else x.col2, axis =1)
df
A list comprehension with an if/else should work
df['col2'] = [x['col2'] if x['col1'] != 'aa' else [1,2,3] for ind,x in df.iterrows()]
It is safe to do this with a for loop:
df.col2 = df.col2.astype(object)
for x in df.index:
    if df.at[x,'col1'] == 'aa':
        df.at[x,'col2'] = [1,2,3]
df
  name col1       col2
0    a   aa  [1, 2, 3]
1    a   bb        123
2    b   aa  [1, 2, 3]
You can also use:
data = {'aa':[1,2,3]}
df['col2'] = np.where(df['col1'] == 'aa', df['col1'].map(data), df['col2'])
Use this with care: every matching row references the same list object, so mutating it changes both locations:
df['col2'].loc[0].append(5)
print(df)
#OUTPUT
  name col1          col2
0    a   aa  [1, 2, 3, 5]
1    a   bb           123
2    b   aa  [1, 2, 3, 5]
But this is fine:
df = df.loc[1:]
print(df)
#OUTPUT
  name col1       col2
1    a   bb        123
2    b   aa  [1, 2, 3]
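If the shared-list behaviour above is unwanted, one possible workaround (a sketch, reusing the .at loop from the earlier answer) is to store a fresh copy of the list in each matching cell:
import pandas as pd

df = pd.DataFrame({"name": ["a", "a", "b"],
                   "col1": ["aa", "bb", "aa"],
                   "col2": [123, 123, 234]})
df['col2'] = df['col2'].astype(object)
lst = [1, 2, 3]
for x in df.index:
    if df.at[x, 'col1'] == 'aa':
        df.at[x, 'col2'] = list(lst)   # independent copy per row
df['col2'].loc[0].append(5)            # now only row 0 changes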

Python - pick a value from a list based on another list

I've got a DataFrame. Column A contains a list of integers and column B an integer. I want to pick the n-th value from the list in column A, where n is the number in column B. So if column A contains [1,5,6,3,4] and column B contains 2, I want to get 6.
I tried this:
result = [y[x] for y in df['A'] for x in df['B']]
But it doesn't work. Please help.
Use zip with a list comprehension:
df['new'] = [y[x] for x, y in zip(df['B'], df['A'])]
print (df)
                 A  B  new
0  [1, 2, 3, 4, 5]  1    2
1     [1, 2, 3, 4]  2    3
You can go for apply, i.e.:
df = pd.DataFrame({'A':[[1,2,3,4,5],[1,2,3,4]],'B':[1,2]})
                 A  B
0  [1, 2, 3, 4, 5]  1
1     [1, 2, 3, 4]  2
# df.apply(lambda x : np.array(x['A'])[x['B']],1)
# You don't need np.array here; use it when column B is also a list.
df.apply(lambda x : x['A'][x['B']], 1)  # Thanks @Zero
0    2
1    3
dtype: int64

Create new pandas column with same list as every row?

I would like to create a new column in a DataFrame that has the same list in every row. I'm looking for something that will accomplish the following:
df = pd.DataFrame(data={'A': [1, 2, 3], 'B': ['x', 'y', 'z']})
list_=[1,2,3]
df['new_col'] = list_
   A  B  new_col
0  1  x  [1,2,3]
1  2  y  [1,2,3]
2  3  z  [1,2,3]
Does anyone know how to accomplish this? Thank you!
df = pd.DataFrame(data={'A': [1, 2, 3], 'B': ['x', 'y', 'z']})
list_=[1,2,3]
df['new_col'] = [list_]*len(df)
Output:
   A  B    new_col
0  1  x  [1, 2, 3]
1  2  y  [1, 2, 3]
2  3  z  [1, 2, 3]
Tip: list as a variable name is not advised; list is a built-in type like str, int, etc.
df['new_col'] = pd.Series([mylist for x in range(len(df.index))])
("list" is a terrible variable name, so I'm using "mylist" in this example).
df['new_col'] = [[1,2,3] for j in range(df.shape[0])]
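One caveat worth noting: with [list_]*len(df) every row points at the same list object, so if the list might be mutated later, a hedged alternative is to give each row its own copy:
# each row gets an independent copy of the list
df['new_col'] = [list_.copy() for _ in range(len(df))]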

Checking which rows contain a value efficiently

I am trying to write a function that checks for the presence of a value in a row across columns. I have a script that does this by iterating through columns, but I am worried that this will be inefficient when used on large datasets.
Here is my current code:
import pandas as pd
a = [1, 2, 3, 4]
b = [2, 3, 3, 2]
c = [5, 6, 1, 3]
d = [1, 0, 0, 99]
df = pd.DataFrame({'a': a,
                   'b': b,
                   'c': c,
                   'd': d})
cols = ['a', 'b', 'c', 'd']
df['e'] = 0
for col in cols:
    df['e'] = df['e'] + df[col] == 1
print(df)
result:
   a  b  c   d      e
0  1  2  5   1   True
1  2  3  6   0  False
2  3  3  1   0   True
3  4  2  3  99  False
As you can see, column e keeps record of whether the value "1" exists in that row. I was wondering if there was a better/more efficient way of achieving these results.
You can check whether the values in the DataFrame equal one and see if any is True in a row (with axis=1):
df['e'] = df.eq(1).any(1)
df
#    a  b  c   d      e
#0   1  2  5   1   True
#1   2  3  6   0  False
#2   3  3  1   0   True
#3   4  2  3  99  False
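If only a subset of columns should be checked, the same idea works on a column selection; a sketch assuming the cols list from the question:
# restrict the check to the listed columns; axis=1 evaluates per row
df['e'] = df[cols].eq(1).any(axis=1)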
Python supports 'in' and 'not in'.
EXAMPLE:
>>> a = [1, 2, 5, 1]
>>> b = [2, 3, 6, 0]
>>> c = [5, 6, 1, 3]
>>> d = [1, 0, 0, 99]
>>> 1 in a
True
>>> 1 not in a
False
>>> 99 in d
True
>>> 99 not in d
False
By using this, you don't have to iterate over the array by yourself for this case.
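To get the same membership test across every row of a DataFrame without writing the loop yourself, isin is the vectorised counterpart of in; a sketch assuming the df from the question:
# isin marks each cell equal to 1; any(axis=1) turns that into one flag per row
df['e'] = df.isin([1]).any(axis=1)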
