Grab Updated rows of pandas column while looping through dataframe - python

I am trying the following:
import pandas as pd

df = pd.DataFrame({'Col1': {0: 'A', 1: 'A', 2: 'B', 3: 'B', 4: 'B'},
                   'Col2': {0: 'a', 1: 'a', 2: 'b', 3: 'b', 4: 'c'},
                   'Col3': {0: 42, 1: 28, 2: 56, 3: 62, 4: 48}})
ii = 1
for idx, row in df.iterrows():
    print(row)
    df.at[:, 'Col2'] = 'asd{}'.format(ii)
    ii += 1
But the print statements above don't reflect the change made by df.at[:, 'Col2'] = 'asd{}'.format(ii). I need the printed rows to reflect that change.
Edit: Since I am updating all rows of df, I was expecting idx and row to pick up the new values from the DataFrame.
If this is not the right way to grab updated values from df through idx and row, what is the correct approach? I need idx and row to reflect the new values.
Expected output:
Col1 A
Col2 a
Col3 42
Name: 0, dtype: object
Col1 A
Col2 asd1
Col3 28
Name: 1, dtype: object
Col1 B
Col2 asd2
Col3 56
.....

From iterrows documentation:
You should never modify something you are iterating over. This is not
guaranteed to work in all cases. Depending on the data types, the
iterator returns a copy and not a view, and writing to it will have no
effect.
As per your request for an alternative solution, here is one using DataFrame.apply:
df['Col2'] = df.apply(lambda row: 'asd{}'.format(row.name), axis=1)
Other examples (also using Series.apply) that may be useful for your eventual goal, though it isn't clear yet what that is:
df['Col2'] = df['Col2'].apply(lambda x: 'asd{}'.format(x))
df['Col2'] = df.apply(lambda row: 'asd{}'.format(row['Col3']), axis=1)
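If the goal really is for the loop to observe the earlier in-place updates, one workaround (a sketch, not the only approach) is to loop over the index and read each row fresh from the DataFrame itself, so the copy issue described in the docs never arises:

```python
import pandas as pd

df = pd.DataFrame({'Col1': {0: 'A', 1: 'A', 2: 'B', 3: 'B', 4: 'B'},
                   'Col2': {0: 'a', 1: 'a', 2: 'b', 3: 'b', 4: 'c'},
                   'Col3': {0: 42, 1: 28, 2: 56, 3: 62, 4: 48}})

ii = 1
for idx in df.index:
    # read the row from df itself, so writes from earlier iterations are visible
    print(df.loc[idx])
    df['Col2'] = 'asd{}'.format(ii)  # update every row, as in the question
    ii += 1
```

This prints 'a' for row 0, then 'asd1', 'asd2', and so on for the following rows, matching the expected output.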

Here is something you can try,
import pandas as pd
df = pd.DataFrame({'Col1': {0: 'A', 1: 'A', 2: 'B', 3: 'B', 4: 'B'},
                   'Col2': {0: 'a', 1: 'a', 2: 'b', 3: 'b', 4: 'c'},
                   'Col3': {0: 42, 1: 28, 2: 56, 3: 62, 4: 48}})
print(
    df.assign(idx=df.index)[['idx', 'Col2']]
      .apply(lambda x: x['Col2'] if x['idx'] == 0 else f"asd{x['idx']}", axis=1)
)
0 a
1 asd1
2 asd2
3 asd3
4 asd4
dtype: object
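If the pattern is simply 'asd' plus the row position (with row 0 left alone), no loop or apply is needed at all; a vectorized sketch under that assumption:

```python
import pandas as pd

df = pd.DataFrame({'Col1': {0: 'A', 1: 'A', 2: 'B', 3: 'B', 4: 'B'},
                   'Col2': {0: 'a', 1: 'a', 2: 'b', 3: 'b', 4: 'c'},
                   'Col3': {0: 42, 1: 28, 2: 56, 3: 62, 4: 48}})

# build 'asd<position>' for every row, then keep the original value at row 0
new_col = 'asd' + pd.Series(df.index, index=df.index).astype(str)
df['Col2'] = new_col.where(df.index != 0, df['Col2'])
print(df['Col2'])
```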

Related

Replace values in a row with values from a list in pandas?

I have the following df:
import pandas as pd
d = {
    'Group': ['Month', 'Sport'],
    'col1': [1, 'A'],
    'col2': [4, 'B'],
    'col3': [9, 'C']
}
df = pd.DataFrame(d)
I would like to convert all of the values in row index[0] excluding 'Month' to actual months. I've tried the following:
import datetime as dt

m_lst = []
for i in df.iloc[0]:
    if type(i) != str:
        x = dt.date(1900, i, 1).strftime('%B')
        m_lst.append(x)
df.iloc[0][1:] = m_lst  # (doesn't work)
So the for loop creates a list of months that correlate to the value in the dataframe. I just can't seem to figure out how to replace the original values with the values from the list. If there's an easier way of doing this, that would be great as well.
You can convert those values to datetime using pandas.to_datetime and then use the Series.dt.month_name() method:
import pandas as pd
d = {
    'Group': ['Month', 'Sport'],
    'col1': [1, 'A'],
    'col2': [4, 'B'],
    'col3': [9, 'C']
}
df = pd.DataFrame(d)
df.iloc[0, 1:] = pd.to_datetime(df.iloc[0, 1:], format='%m').dt.month_name()
Output:
>>> df
Group col1 col2 col3
0 Month January April September
1 Sport A B C
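If the to_datetime round-trip feels heavy for this, the standard library's calendar.month_name sequence maps a month number straight to its English name; a sketch of the same replacement:

```python
import calendar
import pandas as pd

d = {
    'Group': ['Month', 'Sport'],
    'col1': [1, 'A'],
    'col2': [4, 'B'],
    'col3': [9, 'C']
}
df = pd.DataFrame(d)

# calendar.month_name[1] == 'January', calendar.month_name[4] == 'April', etc.
df.iloc[0, 1:] = df.iloc[0, 1:].map(lambda m: calendar.month_name[m])
print(df)
```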
Assuming your month numbers are always in the same position, row 0, I'd use iloc and apply lambda like this:
import datetime as dt
import pandas as pd
def month_number_to_str(m: int):
    return dt.datetime.strptime(str(m), '%m').strftime('%B')

d = {
    'Group': ['Month', 'Sport'],
    'col1': [1, 'A'],
    'col2': [4, 'B'],
    'col3': [9, 'C']
}
df = pd.DataFrame(d)
df.iloc[0, 1:] = df.iloc[0, 1:].apply(lambda x: month_number_to_str(x))
print(df)
Output:
Group col1 col2 col3
0 Month January April September
1 Sport A B C
Another way is to use Series.map.
It can translate values for you, e.g., based on a dictionary like this
(where you get it is up to you):
months = {1: 'January',
          2: 'February',
          3: 'March',
          4: 'April',
          5: 'May',
          6: 'June',
          7: 'July',
          8: 'August',
          9: 'September',
          10: 'October',
          11: 'November',
          12: 'December'}
Then it's just a matter of selecting the right part of df and mapping the values:
>>> df.iloc[0, 1:] = df.iloc[0, 1:].map(months)
>>> df
Group col1 col2 col3
0 Month January April September
1 Sport A B C

get the length of dictionary within a dataframe

I am currently learning pandas and would like to know how I can filter the rows whose column value (a dictionary) has more than 3 keys in it. For example,
import pandas as pd

data = {'id': [1, 2, 3],
        'types': [{1: 'a', 2: 'b', 3: 'c'},
                  {1: 'a', 2: 'b', 3: 'c', 4: 'd'},
                  {1: 'a', 2: 'b', 3: 'c'}]}
df = pd.DataFrame(data)
How can I get the rows where the length of the dictionary in column 'types' is > 3?
I tried doing
df[len(df['types']) > 3]
but it doesn't work. Any simple solution out there?
Use Series.apply or Series.map:
df = df[df['types'].apply(len) > 3]
#alternative
#df = df[df['types'].map(len) > 3]
print (df)
id types
1 2 {1: 'a', 2: 'b', 3: 'c', 4: 'd'}
Or Series.str.len:
df = df[df['types'].str.len() > 3]
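A plain list comprehension also works, and avoids the slightly surprising use of .str methods on non-string data; a sketch:

```python
import pandas as pd

data = {'id': [1, 2, 3],
        'types': [{1: 'a', 2: 'b', 3: 'c'},
                  {1: 'a', 2: 'b', 3: 'c', 4: 'd'},
                  {1: 'a', 2: 'b', 3: 'c'}]}
df = pd.DataFrame(data)

# one boolean per row: does this row's dict have more than 3 keys?
mask = [len(t) > 3 for t in df['types']]
print(df[mask])
```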

pandas filtering lists with lists

How do I get all of the ids where an element in the 'combo' list matches any element in the 'search' list?
# setup df
import pandas as pd

d = {'id': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5},
     'combo': {0: ['a', 'b'], 1: ['a'], 2: ['c', 'd'],
               3: ['c', 'e'], 4: ['d'], 5: ['c', 'f']}}
df = pd.DataFrame(d)
search = ['a','d']
The following works, but can I get the ids as a one-liner, instead of writing an intermediate column to the DataFrame?
df['check'] = df.apply(lambda x: any(i in search for i in x['combo']), axis=1)
df['id'][(df['check'] == True)]
Try:
df.loc[df['combo'].explode().isin(search).any(level=0),'id']
Output:
0 0
1 1
2 2
4 4
Name: id, dtype: int64
Try with
out = df.id[pd.DataFrame(df.combo.tolist()).isin(['a','d']).any(1).values]
Out[61]:
0 0
1 1
2 2
4 4
Name: id, dtype: int64
You can use set():
ids = df.id[df.combo.apply(lambda x: bool(set(x).intersection(search)))]
print(ids)
Prints:
0 0
1 1
2 2
4 4
Name: id, dtype: int64
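set.isdisjoint gives another compact one-liner: a row qualifies exactly when its combo list is not disjoint from search. A sketch:

```python
import pandas as pd

d = {'id': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5},
     'combo': {0: ['a', 'b'], 1: ['a'], 2: ['c', 'd'],
               3: ['c', 'e'], 4: ['d'], 5: ['c', 'f']}}
df = pd.DataFrame(d)
search = {'a', 'd'}

# isdisjoint is True when there is no overlap, so negate it to keep matches
ids = df.loc[~df['combo'].map(search.isdisjoint), 'id'].tolist()
print(ids)  # [0, 1, 2, 4]
```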

Python: Pandas Dataframe Column Headers Look Strange After Groupby

I implemented the following groupby statement in my code. The purpose of the code below is to provide the minimum date from the "DTIN" column by unique EVENTID.
df_EVENT5_future_2 = df_EVENT5_future.groupby('EVENTID').agg({'DTIN': [np.min]})
df_EVENT5_future_3 = df_EVENT5_future_2.reset_index()
The output table is as follows:
EVENTID DTIN
amin
A 1/3/2019
B 1/19/2019
C 2/10/2019
I would like the table to output like this. I don't want the amin to be in the column header.
EVENTID DTIN
A 1/3/2019
B 1/19/2019
C 2/10/2019
Any help is greatly appreciated.
This is as per @Wen's suggestion. You don't need to use agg for this. Simply use groupby.min() and set as_index=False:
result = df.groupby('EVENTID', as_index=False)['DTIN'].min()
Example
df = pd.DataFrame({'DTIN': {0: 4, 1: 3, 2: 9, 3: 1, 4: 2, 5: 5, 6: 6, 7: 5},
'EVENTID': {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'C', 5: 'B', 6: 'B', 7: 'C'}})
result = df.groupby('EVENTID', as_index=False)['DTIN'].min()
# EVENTID DTIN
# 0 A 3
# 1 B 1
# 2 C 2
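Named aggregation (available since pandas 0.25) is another way to keep flat column names when you do want to go through agg; a sketch:

```python
import pandas as pd

df = pd.DataFrame({'DTIN': {0: 4, 1: 3, 2: 9, 3: 1, 4: 2, 5: 5, 6: 6, 7: 5},
                   'EVENTID': {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'C',
                               5: 'B', 6: 'B', 7: 'C'}})

# DTIN=('DTIN', 'min') names the output column directly, so no MultiIndex
# header like "amin" ever appears
result = df.groupby('EVENTID', as_index=False).agg(DTIN=('DTIN', 'min'))
print(result)
```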

Get most common column for each column value

I want the most common letter for each number. I've tried a variety of things; not sure what's the right way.
import pandas as pd
from pandas import DataFrame, Series
original = DataFrame({
'letter': {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B'},
'number': {0: '01', 1: '01', 2: '02', 3: '02', 4: '02'}
})
expected = DataFrame({'most_common_letter': {'01': 'A', '02': 'B'}})
Ideally I'm looking to maximize readability.
We can use the DataFrame.mode() method:
In [43]: df.groupby('number')[['letter']] \
.apply(lambda x: x.mode()) \
.reset_index(level=1, drop=True)
Out[43]:
letter
number
01 A
02 B
Use groupby + apply + value_counts and take the first index value, because value_counts sorts by descending count.
Then convert the Series with to_frame and remove the index name with rename_axis:
df = (original.groupby('number')['letter']
              .apply(lambda x: x.value_counts().index[0])
              .to_frame('most_common_letter')
              .rename_axis(None))
print (df)
most_common_letter
01 A
02 B
Similar solution:
from collections import Counter
df = original.groupby('number')['letter'] \
.apply(lambda x: Counter(x).most_common(1)[0][0]) \
.to_frame('most_common_letter') \
.rename_axis(None)
print (df)
most_common_letter
01 A
02 B
Or use Series.mode:
df = (original.groupby('number')['letter']
              .apply(lambda x: x.mode()[0])
              .to_frame('most_common_letter')
              .rename_axis(None))
print (df)
most_common_letter
01 A
02 B
>>> df = pd.DataFrame({
...     'letter': {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B'},
...     'number': {0: '01', 1: '01', 2: '02', 3: '02', 4: '02'}})
>>> df['most_common_letter']=df.groupby('number')['letter'].transform(max)
>>> df = df.iloc[:,1:].drop_duplicates().set_index('number')
>>> df.index.name = None
>>> df
most_common_letter
01 A
02 B
Or this way if it helps readability:
>>> df['most_common_letter']=df.groupby('number')['letter'].transform(max)
>>> df = df.drop('letter', axis=1).drop_duplicates().set_index('number').rename_axis(None)
>>> df
most_common_letter
01 A
02 B
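If readability is the priority, value_counts().idxmax() states the intent ("the letter with the highest count per number") fairly directly; a sketch combining it with the to_frame/rename_axis pattern shown above:

```python
import pandas as pd

original = pd.DataFrame({
    'letter': {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B'},
    'number': {0: '01', 1: '01', 2: '02', 3: '02', 4: '02'}})

# idxmax on value_counts returns the value with the highest count in each group
most_common = (original.groupby('number')['letter']
               .agg(lambda s: s.value_counts().idxmax())
               .to_frame('most_common_letter')
               .rename_axis(None))
print(most_common)
```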
