# Toy frame: one mark (1..12) per calendar month abbreviation (jan..dec).
data = {
    "marks": list(range(1, 13)),
    "month": ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
              'jul', 'aug', 'sep', 'oct', 'nov', 'dec'],
}
df2 = pd.DataFrame(data)
So far I have tried the code below, but I am not getting the result described above:
# NOTE(review): this is the question's failing attempt, with the lost
# indentation restored. It never produces per-row quarters: each iteration
# assigns the WHOLE 'q' column, so the last month processed ('dec', which
# falls into the else branch) wins and every row ends up 'other'.
for i in df2['month']:
    if (i == 'jan' or i == 'feb' or i == 'mar'):
        df2['q'] = '1Q'
    else:
        df2['q'] = 'other'
Convert the column to datetimes, use Series.dt.quarter, and prepend "q":
# Parse the month abbreviations ('%b') as datetimes, then take the
# calendar quarter number and label it 'q1'..'q4'.
month_as_date = pd.to_datetime(df2['month'], format='%b')
df2['new'] = 'q' + month_as_date.dt.quarter.astype(str)
Or use Series.map by dictionary:
# Quarter label for every month abbreviation: positions 0-2 -> q1,
# 3-5 -> q2, 6-8 -> q3, 9-11 -> q4.
d = {month: 'q' + str(pos // 3 + 1)
     for pos, month in enumerate(['jan', 'feb', 'mar', 'apr', 'may', 'jun',
                                  'jul', 'aug', 'sep', 'oct', 'nov', 'dec'])}
df2['new'] = df2['month'].map(d)
Related
I have a DataFrame:
value,combined,value_shifted,Sequence_shifted,long,short
12834.0,2.0,12836.0,3.0,2.0,-2.0
12813.0,-2.0,12781.0,-3.0,-32.0,32.0
12830.0,2.0,12831.0,3.0,1.0,-1.0
12809.0,-2.0,12803.0,-3.0,-6.0,6.0
12822.0,2.0,12805.0,3.0,-17.0,17.0
12800.0,-2.0,12807.0,-3.0,7.0,-7.0
12773.0,2.0,12772.0,3.0,-1.0,1.0
12786.0,-2.0,12787.0,1.0,1.0,-1.0
12790.0,2.0,12784.0,3.0,-6.0,6.0
I want to combine the long and short columns according to the value of the combined column
If df.combined == 2 then we leave the value long
If df.combined == -2 then we leave the value short
Expected result:
value,combined,value_shifted,Sequence_shifted,calc
12834.0,2.0,12836.0,3.0,2.0
12813.0,-2.0,12781.0,-3.0,32
12830.0,2.0,12831.0,3.0,1.0
12809.0,-2.0,12803.0,-3.0,6.0
12822.0,2.0,12805.0,3.0,-17
12800.0,-2.0,12807.0,-3.0,-1.0
12773.0,2.0,12772.0,3.0,-1.0
12786.0,-2.0,12787.0,1.0,-6.0
12790.0,2.0,12784.0,3.0,20.0
If the combined column may contain 2, -2 or other values, use numpy.select:
# Pair each condition with its source column; np.select picks, per row,
# the first choice whose condition is True (default 0 otherwise).
conditions = [df['combined'].eq(2), df['combined'].eq(-2)]
choices = [df['long'], df['short']]
df['calc'] = np.select(conditions, choices)
Or, if the column contains only the values 2 and -2, use numpy.where:
# Two-way pick: 'long' where combined == 2, 'short' everywhere else.
df['calc'] = np.where(df['combined'] == 2, df['long'], df['short'])
Try this:
# Same pick expressed with Series.mask: start from 'short' and overwrite
# the rows where combined == 2 with 'long'.
df['calc'] = df['short'].mask(df['combined'] == 2, df['long'])
# Start with NaN, then copy 'long' where combined == 2 and 'short' where
# combined == -2; rows matching neither keep the NaN placeholder.
df['calc'] = np.nan
for target_value, source_col in ((2, 'long'), (-2, 'short')):
    row_sel = df['combined'] == target_value
    df.loc[row_sel, 'calc'] = df.loc[row_sel, source_col]
then you can drop the long and short columns:
# Remove the now-merged source columns in place (axis=1 == columns).
df.drop(['long', 'short'], axis=1, inplace=True)
Consider the following dataframe
# Two-row sample frame; the Date column is parsed to datetime64 up front.
Y = pd.DataFrame({'Date': pd.to_datetime(["2021-10-11", "2021-10-12"]),
                  'Name': ["john", "wick"]})
Now consider the following code snippet, in which I try to print slices of the dataframe filtered on the column "Date". However, it prints an empty dataframe.
# NOTE(review): the question's failing loop, indentation restored. The
# f-string interpolates the date WITHOUT quotes, so query() sees something
# like `Date == 2021-10-11` — an arithmetic expression, not a date string —
# and matches nothing.
for date in set(Y['Date']):
    print(Y.query(f'Date == {date.date()}'))
Essentially, I wanted to filter the dataframe on the column "Date" and do some processing on that in the loop. How do I achieve that?
The local `date` variable needs to be referenced with pandas' `@` syntax inside the query command:
# Filter with DataFrame.query, referencing the loop variable via '@':
# query() resolves '@name' against the calling scope's local variables.
# (The source had '#date', which is not valid query syntax — fixed to '@date'.)
Y = pd.DataFrame([("2021-10-11","john"),("2021-10-12","wick")],columns = ['Date','Name'])
for date in set(Y['Date']):
    print(Y.query('Date == @date'))
Alternatively, wrap the interpolated value in quotation marks: the f-string substitutes the raw value without quotes, so query() raises an error (or matches nothing) without them:
# Convert the column to datetimes first, then quote the interpolated value
# so query() compares against a string literal the datetime column can parse.
# (Indentation of the loop body restored from the flattened source.)
Y = pd.DataFrame([("2021-10-11","john"),("2021-10-12","wick")],columns = ['Date','Name'])
Y['Date'] = pd.to_datetime(Y['Date'])
for date in set(Y['Date']):
    print(Y.query(f'Date == "{date}"'))
I read a csv file into a pandas dataframe and got all column types as objects. I need to convert the second and third columns to float.
I tried using
# The question's attempt: values like '0,20939' use a decimal comma, which
# to_numeric cannot parse, so errors='coerce' turns every row into NaN.
df["Quantidade"] = pd.to_numeric(df["Quantidade"], errors='coerce')
but got NaN.
Here's my dataframe. Do I need to use a regex on the third column to get rid of the "R$ " prefix?
Try this:
# sample dataframe
d = {'Quantidade':['0,20939', '0,0082525', '0,009852', '0,012920', '0,0252'],
     'price':['R$ 165.000,00', 'R$ 100.000,00', 'R$ 61.500,00', 'R$ 65.900,00', 'R$ 49.375,12']}
df = pd.DataFrame(data=d)
# Second column: swap the decimal comma for a dot, then cast to float.
df["Quantidade"] = df["Quantidade"].str.replace(',', '.', regex=False).astype(float)
# Third column: strip the "R$ " prefix (regex), drop the thousands dots,
# turn the decimal comma into a dot, then cast. The regex= flags are
# explicit because str.replace's default changed across pandas versions —
# with regex=True a bare '.' pattern would match (and delete) EVERY char.
df['price'] = (df.price.str.replace(r'\w+\$\s+', '', regex=True)
                       .str.replace('.', '', regex=False)
                       .str.replace(',', '.', regex=False)
                       .astype(float))
Output:
Quantidade price
0 0.209390 165000.00
1 0.008252 100000.00
2 0.009852 61500.00
3 0.012920 65900.00
4 0.025200 49375.12
Try something like this:
# Swap the decimal comma for a dot, then cast to float once.
# (The original's second astype(float) line was a redundant no-op: the
# column is already float after this statement; regex=False makes the
# literal comma replacement explicit across pandas versions.)
df["Quantidade"] = df["Quantidade"].str.replace(',', '.', regex=False).astype(float)
I need a DataFrame of one column ['Week'] that has all values from 0 to 100 inclusive.
I need it as a Dataframe so I can perform a pd.merge
So far I have tried creating an empty DataFrame, creating a series of 0-100 and then attempting to append this series to the DataFrame as a column.
# NOTE(review): the question's failing attempt, kept verbatim. It tries to
# add a column by appending a Series as ROWS and assigning append's return
# value back into a column — not how columns are created. Also note that
# DataFrame.append was deprecated and removed in recent pandas versions,
# so on a current install this presumably raises — confirm against the
# pandas version in use.
alert_count_list = pd.DataFrame()
week_list= pd.Series(range(0,101))
alert_count_list['week'] = alert_count_list.append(week_list)
Try this:
# Build the frame with its data directly: assigning a length-101 array via
# .loc on a zero-row frame cannot create the rows (length mismatch), so
# the original two-step version did not produce weeks 0-100.
df = pd.DataFrame({"week": np.arange(101)})
# The requirement is the values 0..100 inclusive — np.zeros(101) gives a
# column of 101 zeros, not the week numbers; np.arange(101) is what's needed.
alert_count_list = pd.DataFrame(np.arange(101), columns=['week'])
or
# One-column frame holding the week numbers 0-100 inclusive.
alert_count_list = pd.DataFrame({'week': list(range(0, 101))})
You can try:
# Collect the week numbers 0..100, then attach them as a column.
week_vals = []
for i in range(0, 101):
    week_vals.append(i)
# It is pd.DataFrame (capital F) — 'pd.Dataframe' raises AttributeError.
df = pd.DataFrame(columns=['week'])
df['week'] = week_vals
I am trying to update the names in a pandas dataframe column. I want:
[IN]
B17.31
107.34
34
B50.56
[OUT]
B17.31
B107.34
B34
B50.56
The code I am using is:
# NOTE(review): the question's code with its flattened indentation
# restored. The bug it illustrates: DataFrame.replace returns a NEW frame
# and the return value is discarded, so df1 is never updated — which is
# why 'final' prints correctly but the column and the CSV stay unchanged.
for file in df1.loc[:, '#filename']:
    new = str(file)
    # Prefix 'B' only when the name doesn't already start with it.
    if new[0] != 'B':
        final = new[:0] + 'B' + new[0:]
    else:
        final = new
    print((final))
    df1.replace(new, final)  # <-- result discarded: df1 unchanged
print(df1['#filename'])
df1.to_csv('updated_name_data.csv')
I can not work out why it will print out the updated name but will not update in the dataframe or csv. Any help or a pointer in the right direction would be greatly appreciated.
This should work:
# Same loop with the fix applied (indentation restored from the
# flattened source): replace() must be told to mutate df1 in place.
for file in df1.loc[:, '#filename']:
    new = str(file)
    if new[0] != 'B':
        final = new[:0] + 'B' + new[0:]
    else:
        final = new
    print((final))
    # inplace=True is required: without it replace() returns a new frame
    # and the original df1 is left untouched.
    df1.replace(new, final, inplace=True)
print(df1['#filename'])
df1.to_csv('updated_name_data.csv')
You should aim to use vectorised operations rather than a manual loop. For example, you can isolate numeric values and prefix with "B":
s = pd.Series(['B17.31', 107.34, 34, 'B50.56'])
# Rows that parse as plain numbers are exactly the ones missing the 'B'
# prefix; to_numeric coerces the already-prefixed strings to NaN.
mask = pd.to_numeric(s, errors='coerce').notnull()
s[mask] = 'B' + s[mask].astype(str)
print(s)
0 B17.31
1 B107.34
2 B34
3 B50.56
dtype: object
In pandas it is best to avoid loops, because they are slow; it is better to use vectorized functions.
So you can create a boolean mask with str.startswith and then add 'B' to the original column with numpy.where:
# Mark rows whose filename already starts with 'B'; keep those as-is and
# prefix the rest. (str cast hoisted so it is computed once.)
names = df1['#filename'].astype(str)
mask = names.str.startswith('B')
df1['#filename'] = np.where(mask, df1['#filename'], 'B' + names)
Another, similar solution inverts the mask with ~:
# Alternative to the np.where line above: write only the rows that lack
# the prefix, selected by inverting the precomputed boolean mask with ~.
# NOTE(review): run this INSTEAD of the np.where assignment, not after
# it — applied sequentially, the ~mask rows would get a second 'B'.
df1.loc[~mask, '#filename'] = 'B' + df1['#filename'].astype(str)
print (df1)
#filename
0 B17.31
1 B107.34
2 B34
3 B50.56