add a dot in 4th index for each string - python

I want to add a dot after the 4th character of each string in a dataframe; for example, 49454170 becomes 4945.4170.
Below is the frame:
Missed Trades
0 49454170
1 49532878
2 49511387
3 49451350
4 49402211
5 49403961
6 49331707
7 49320696

Here's one approach using str.findall and joining the resulting lists back together with str.join:
df['Missed Trades'].str.findall(r'(\d{4})').str.join('.')
0 4945.4170
1 4953.2878
2 4951.1387
3 4945.1350
4 4940.2211
5 4940.3961
6 4933.1707
7 4932.0696
Name: Missed Trades, dtype: object

Something like this might work -
df['Missed Trades'] = df['Missed Trades'].astype(int) / 10000
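Note that this yields a float (e.g. 4945.417), so the trailing zero is not kept in the display. If the goal is a string with exactly four digits after the dot, a small hedged variation (assuming 8-digit values) is to format the quotient explicitly:
# Sketch: format the quotient so all four digits after the dot are kept.
df['Missed Trades'] = df['Missed Trades'].astype(int).map(lambda x: f"{x / 10000:.4f}")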

df['Missed Trades'] = df['Missed Trades'].map(lambda x: x[0:4] + '.' + x[4:8])
Missed Trades
0 4945.4170
1 4953.2878
2 4951.1387
3 4945.1350
4 4940.2211
5 4940.3961
6 4933.1707
7 4932.0696
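For completeness, a minimal self-contained sketch (the frame below is a hypothetical reproduction of the one above, assuming the column holds 8-digit strings) using vectorized str.slice:
import pandas as pd

df = pd.DataFrame({'Missed Trades': ['49454170', '49532878', '49511387', '49451350',
                                     '49402211', '49403961', '49331707', '49320696']})

# Insert the dot after the 4th character using vectorized string slicing.
df['Missed Trades'] = df['Missed Trades'].str.slice(0, 4) + '.' + df['Missed Trades'].str.slice(4)
print(df)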


Printing a number, the same number of times

I would like to create the following pattern:
1
2 2
3 3 3
4 4 4 4
5 5 5 5 5
Here is my attempt:
def print_numbers(number):
    for i in range(number):
        print(i * i)
Now clearly this will print the product of the number with itself. I would like this function to print out the number that many times, so something like print(i) * i, which of course is not correct syntax. How would I go about doing this?
As mentioned by @Barmar, you can convert the number into a string and multiply it by the same number, e.g. str(4)*3 == '444', i.e. 4 printed 3 times.
def print_numbers(number):
    for i in range(number + 1):
        print((str(i) + " ") * i)  # convert the number into a string and repeat it i times

print_numbers(5)
Output:
1
2 2
3 3 3
4 4 4 4
5 5 5 5 5
If you want a blank line between the rows, you can print it as print((str(i)+" ")*i+"\n"). This will give:
1

2 2

3 3 3

4 4 4 4

5 5 5 5 5
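A slight variant (not from the original answer) that avoids the trailing space on each row joins i copies of the number instead:
def print_numbers(number):
    for i in range(1, number + 1):
        # Join i copies of str(i) with single spaces, e.g. i=3 -> "3 3 3".
        print(' '.join([str(i)] * i))

print_numbers(5)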

How to count occurrences in a column dataframe Python

I have this dataframe
   ORF    IDClass    genName  ORFDesc
0  b186   [1,1,1,0]  'bglS'   beta-glucosidase
1  b2202  [1,1,1,0]  'cbhK'   carbohydrate kinase
2  b727   [1,1,1,0]  'fucA'   L-fuculose phosphate aldolase
3  b1731  [1,1,1,0]  'gabD1'  succinate-semialdehyde dehydrogenase
4  b234   [1,1,1,0]  'gabD2'  succinate-semialdehyde dehydrogenase
and I need to count how many records have IDClass = [1,1,1,0], IDClass = [1,2,0,0], etc.
I'm using the str.count().sum() function, but it returns more occurrences than there are records in my dataset. What am I doing wrong?
Ex:
IN: count = df2.IDClass.str.count('[1,1,1,0]').sum()
OUT: [3924 rows x 4 columns]
21552
If I do:
IN: count = df2.IDClass.str.count('[1,1,1,0]')
OUT: [3924 rows x 4 columns]
0 7
1 7
2 7
3 7
4 7
..
3919 6
3920 6
3921 6
3922 6
3923 6
Any idea?
Thanks in advance,
If your IDClass is string type, you can just do:
df['IDClass'].value_counts()
If that gives an error, it's likely that your IDClass is list type. Then you can use tuple:
df['IDClass'].apply(tuple).value_counts()
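For what it's worth, str.count over-counts because '[1,1,1,0]' is interpreted as a regex: the square brackets form a character class matching any single '1', ',' or '0', so every such character in each cell is counted (hence 7 per row). A minimal sketch (assuming IDClass holds plain strings; the sample frame below is hypothetical) that counts whole values instead:
import pandas as pd

# Hypothetical sample mirroring the frame above; IDClass stored as strings.
df2 = pd.DataFrame({'IDClass': ['[1,1,1,0]', '[1,1,1,0]', '[1,2,0,0]',
                                '[1,1,1,0]', '[1,2,0,0]']})

# Count each distinct IDClass value once per row.
print(df2['IDClass'].value_counts())

# Or count a single class by exact match (no regex involved).
print((df2['IDClass'] == '[1,1,1,0]').sum())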

Python Groupby and Count

I'm working on creating a Sankey plot and have the raw data mapped so that I know the source and target node. I'm having an issue with grouping the source & target and then counting the number of times each combination occurs, e.g. using the table below, finding out how many times 0 -> 4 occurs and recording that in the dataframe.
index event_action_num next_action_num
227926 0 6
227928 1 5
227934 1 6
227945 1 7
227947 1 6
227951 0 7
227956 0 6
227958 2 6
227963 0 6
227965 1 6
227968 1 5
227972 3 6
Where I want to end up is:
event_action_num next_action_num count_of
0 4 1728
0 5 2382
0 6 3739
etc
Have tried:
df_new_2 = df_new.groupby(['event_action_num', 'next_action_num']).count()
but it doesn't give me the result I'm looking for.
Thanks in advance
Try to use agg('size') instead of count():
df_new_2 = df_new.groupby(['event_action_num', 'next_action_num']).agg('size')
For your sample data, the output is shown in the reproduction below.
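A minimal reproduction sketch (the rows are the sample above; reset_index(name='count_of') is one way to turn the sizes into the desired count_of column):
import pandas as pd

df_new = pd.DataFrame({
    'event_action_num': [0, 1, 1, 1, 1, 0, 0, 2, 0, 1, 1, 3],
    'next_action_num':  [6, 5, 6, 7, 6, 7, 6, 6, 6, 6, 5, 6],
})

# size() counts rows per (source, target) pair; reset_index turns the result back into columns.
df_new_2 = (df_new.groupby(['event_action_num', 'next_action_num'])
                  .size()
                  .reset_index(name='count_of'))
print(df_new_2)
#    event_action_num  next_action_num  count_of
# 0                 0                6         3
# 1                 0                7         1
# 2                 1                5         2
# 3                 1                6         3
# 4                 1                7         1
# 5                 2                6         1
# 6                 3                6         1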

Change some values based on condition

Can you help with the following task? I have a dataframe column such as:
index df['Q0']
0 1
1 2
2 3
3 5
4 5
5 6
6 7
7 8
8 3
9 2
10 4
11 7
I want to substitute the values in df.loc[3:8,'Q0'] with the values in df.loc[0:2,'Q0'] if df.loc[0,'Q0']!=df.loc[3,'Q0']
The result should look like the one below:
index df['Q0']
0 1
1 2
2 3
3 1
4 2
5 3
6 1
7 2
8 3
9 2
10 4
11 7
I tried the following line:
df.loc[3:8,'Q0'].where(~df.loc[0,'Q0']!=df.loc[3,'Q0']),other=df.loc[0:2,'Q0'],inplace=True)
or
df['Q0'].replace(to_replace=df.loc[3:8,'Q0'], value=df.loc[0:2,'Q0'], inplace=True)
But it doesn't work. Most possible I am doing something wrong.
Any suggestions?
You can use the cycle function:
from itertools import cycle

c = cycle(df["Q0"][0:3])
if df.Q0[0] != df.Q0[3]:
    df["Q0"][3:8] = [next(c) for _ in range(5)]
Thanks for the replies. I tried the suggestions but I have some issues:
@adnanmuttaleb -
When I applied the function to a dataframe with more than 1 column (e.g. 12x2 or larger), I noticed that the value in df.Q0[8] didn't change. Why?
@jezrael -
When I adjust to your suggestion I get the error:
ValueError: cannot copy sequence with size 5 to array axis with dimension 6
When I change the range to 6, I am getting wrong results
import pandas as pd
from itertools import cycle

data = {'Q0': [1,2,3,5,5,6,7,8,3,2,4,7],
        'Q0_New': [0,0,0,0,0,0,0,0,0,0,0,0]}
df = pd.DataFrame(data)

##### version 1
c = cycle(df["Q0"][0:3])
if df.Q0[0] != df.Q0[3]:
    df['Q0_New'][3:8] = [next(c) for _ in range(5)]

##### version 2
d = cycle(df.loc[0:3,'Q0'])
if df.Q0[0] != df.Q0[3]:
    df.loc[3:8,'Q0_New'] = [next(d) for _ in range(6)]
Why do we have different behaviors, and what corrections need to be made?
Thanks once more guys.
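For what it's worth, the differing behavior most likely comes from indexing: df["Q0"][3:8] is a positional slice that stops before index 8 (5 rows), while df.loc[3:8] is label-based and includes index 8 (6 rows); likewise df.loc[0:3,'Q0'] contains four values (1, 2, 3, 5), so cycling it no longer repeats just 1, 2, 3. A hedged sketch that keeps everything label-based:
import pandas as pd
from itertools import cycle

df = pd.DataFrame({'Q0': [1, 2, 3, 5, 5, 6, 7, 8, 3, 2, 4, 7]})

if df.loc[0, 'Q0'] != df.loc[3, 'Q0']:
    # Cycle over the first three values (labels 0..2) and fill labels 3..8 inclusive (6 rows).
    c = cycle(df.loc[0:2, 'Q0'])
    df.loc[3:8, 'Q0'] = [next(c) for _ in range(6)]

print(df['Q0'].tolist())
# [1, 2, 3, 1, 2, 3, 1, 2, 3, 2, 4, 7]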

Rolling sum on a dynamic window

I am new to python and the last time I coded was in the mid-80's so I appreciate your patient help.
It seems .rolling(window) requires the window to be a fixed integer. I need a rolling window where the window or lookback period is dynamic and given by another column.
In the table below, I seek LookbackSum, which is the rolling sum of Data over the window given by the Lookback column.
import pandas as pd

d = {'Data': [1,1,1,2,3,2,3,2,1,2],
     'Lookback': [0,1,2,2,1,3,3,2,3,1],
     'LookbackSum': [1,2,3,4,5,8,10,7,8,3]}
df = pd.DataFrame(data=d)
eg:
Data Lookback LookbackSum
0 1 0 1
1 1 1 2
2 1 2 3
3 2 2 4
4 3 1 5
5 2 3 8
6 3 3 10
7 2 2 7
8 1 3 8
9 2 1 3
You can create a custom function for use with df.apply, eg:
def lookback_window(row, values, lookback, method='sum', *args, **kwargs):
    # Position of the current row within `values`, and its lookback length.
    loc = values.index.get_loc(row.name)
    lb = lookback.loc[row.name]
    # Slice the window ending at the current row and apply the requested method.
    return getattr(values.iloc[loc - lb: loc + 1], method)(*args, **kwargs)
Then use it as:
df['new_col'] = df.apply(lookback_window, values=df['Data'], lookback=df['Lookback'], axis=1)
There may be some corner cases but as long as your indices align and are unique - it should fulfil what you're trying to do.
Here is one with a list comprehension: it enumerates the index and value of the df['Lookback'] column, reverses the Data values up to each row, and slices according to the lookback value:
df['LookbackSum'] = [sum(df.loc[:e,'Data'][::-1].to_numpy()[:i+1])
for e,i in enumerate(df['Lookback'])]
print(df)
Data Lookback LookbackSum
0 1 0 1
1 1 1 2
2 1 2 3
3 2 2 4
4 3 1 5
5 2 3 8
6 3 3 10
7 2 2 7
8 1 3 8
9 2 1 3
An exercise in pain, if you want to try an almost fully vectorized approach. Sidenote: I don't think it's worth it here. At all.
Inspired by Divakar's answer here
Given:
import numpy as np
import pandas as pd

d = {'Data': [1,1,1,2,3,2,3,2,1,2],
     'Lookback': [0,1,2,2,1,3,3,2,3,1],
     'LookbackSum': [1,2,3,4,5,8,10,7,8,3]}
df = pd.DataFrame(data=d)
Using the function from Divakar's answer, but slightly modified
from skimage.util.shape import view_as_windows as viewW

def strided_indexing_roll(a, r, fill_value=np.nan):
    # Concatenate with sliced to cover all rolls
    p = np.full((a.shape[0], a.shape[1]-1), fill_value)
    a_ext = np.concatenate((p, a, p), axis=1)

    # Get sliding windows; use advanced-indexing to select appropriate ones
    n = a.shape[1]
    return viewW(a_ext, (1, n))[np.arange(len(r)), -r + (n-1), 0]
Now, we just need to prepare a 2d array for the data and independently shift the rows according to our desired lookback values.
arr = df['Data'].to_numpy().reshape(1, -1).repeat(len(df), axis=0)
shifter = np.arange(len(df) - 1, -1, -1) #+ d['Lookback'] - 1
temp = strided_indexing_roll(arr, shifter, fill_value=0)
out = strided_indexing_roll(temp, (len(df) - 1 - df['Lookback'])*-1, 0).sum(-1)
Output:
array([ 1, 2, 3, 4, 5, 8, 10, 7, 8, 3], dtype=int64)
We can then just assign it back to the dataframe as needed and check.
df['out'] = out
#output:
Data Lookback LookbackSum out
0 1 0 1 1
1 1 1 2 2
2 1 2 3 3
3 2 2 4 4
4 3 1 5 5
5 2 3 8 8
6 3 3 10 10
7 2 2 7 7
8 1 3 8 8
9 2 1 3 3
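As a sanity check (not from any of the answers above), a plain, readable loop gives the same numbers and makes the window definition explicit:
import pandas as pd

d = {'Data': [1, 1, 1, 2, 3, 2, 3, 2, 1, 2],
     'Lookback': [0, 1, 2, 2, 1, 3, 3, 2, 3, 1]}
df = pd.DataFrame(d)

def dynamic_rolling_sum(data, lookback):
    # For each row i, sum Data over the window [i - lookback[i], i] inclusive.
    return [data[max(i - lb, 0): i + 1].sum()
            for i, lb in enumerate(lookback)]

df['LookbackSum'] = dynamic_rolling_sum(df['Data'].to_numpy(), df['Lookback'].to_numpy())
print(df['LookbackSum'].tolist())
# [1, 2, 3, 4, 5, 8, 10, 7, 8, 3]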
