Create Duplicate Rows and Change Values in Specific Columns - python

How can I create x duplicates of a row in a dataframe and change the value of one or more specific columns in the copies? The duplicated rows should then be appended to the end of the same dataframe.
A B C D E F
0 1 1 0 1 1 0
1 2 2 1 1 1 0
2 2 2 1 1 1 0
3 2 2 1 1 1 0
4 1 1 0 1 1 0 <- Create 25 Duplicates of this row (4) and change variable C to 1
5 1 1 0 1 1 0
6 2 2 1 1 1 0
7 2 2 1 1 1 0
8 2 2 1 1 1 0
9 1 1 0 1 1 0

I repeat the row only 10 times to keep the output short.
# 10 is the number of repeats
pd.concat([df, df.loc[[4] * 10].assign(C=1)], ignore_index=True)
A B C D E F
0 1 1 0 1 1 0
1 2 2 1 1 1 0
2 2 2 1 1 1 0
3 2 2 1 1 1 0
4 1 1 0 1 1 0
5 1 1 0 1 1 0
6 2 2 1 1 1 0
7 2 2 1 1 1 0
8 2 2 1 1 1 0
9 1 1 0 1 1 0
10 1 1 1 1 1 0
11 1 1 1 1 1 0
12 1 1 1 1 1 0
13 1 1 1 1 1 0
14 1 1 1 1 1 0
15 1 1 1 1 1 0
16 1 1 1 1 1 0
17 1 1 1 1 1 0
18 1 1 1 1 1 0
19 1 1 1 1 1 0
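Per the comments, try passing the column as a dict, which also works for column names that are not valid Python identifiers:
pd.concat([df, df.loc[[4] * 10].assign(**{'C': 1})], ignore_index=True)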

I use repeat and reindex:
s = df.iloc[[4]]  # pick the row you want to repeat
s = s.reindex(s.index.repeat(25))  # repeat the row the given number of times
# s = pd.DataFrame([df.iloc[4].tolist()] * 25, columns=df.columns)  # faster alternative to the line above
s.loc[:, 'C'] = 1  # change the value
pd.concat([df, s], ignore_index=True)  # append to the original df and renumber the index
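Since the question asks about changing one or several columns, the approaches above can be wrapped in a small helper. This is a minimal sketch, not part of the original answers: `append_modified_copies` and its parameters are hypothetical names, the dataframe is rebuilt from the table in the question, and `F=5` is an arbitrary second change added for illustration. It relies on `pd.concat`, which works on current pandas releases (`DataFrame.append` was removed in pandas 2.0).

```python
import pandas as pd


def append_modified_copies(df, row_label, n, **changes):
    """Return df with n copies of one row appended, after overriding columns.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Selecting the same label n times yields n identical rows.
    copies = df.loc[[row_label] * n].assign(**changes)
    # ignore_index=True renumbers the result 0..len-1.
    return pd.concat([df, copies], ignore_index=True)


# Dataframe reconstructed from the table in the question.
df = pd.DataFrame(
    {
        "A": [1, 2, 2, 2, 1, 1, 2, 2, 2, 1],
        "B": [1, 2, 2, 2, 1, 1, 2, 2, 2, 1],
        "C": [0, 1, 1, 1, 0, 0, 1, 1, 1, 0],
        "D": [1] * 10,
        "E": [1] * 10,
        "F": [0] * 10,
    }
)

# 25 copies of row 4 with C set to 1 and (as an extra example) F set to 5.
result = append_modified_copies(df, 4, 25, C=1, F=5)
print(result.tail())
```

The original dataframe is left untouched; assign the return value back to `df` if the change should replace it.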

Related

How to count consecutive same values in a pythonic way that looks iterative

So I am trying to count the number of consecutive same values in a dataframe and put that information into a new column in the dataframe, but I want the count to look iterative.
Here is what I have so far:
df = pd.DataFrame(np.random.randint(0,3, size=(15,4)), columns=list('ABCD'))
df['subgroupA'] = (df.A != df.A.shift(1)).cumsum()
dfg = df.groupby(by='subgroupA', as_index=False).apply(lambda grp: len(grp))
dfg.rename(columns={None: 'numConsec'}, inplace=True)
df = df.merge(dfg, how='left', on='subgroupA')
df
Here is the result:
A B C D subgroupA numConsec
0 2 1 1 1 1 1
1 1 2 1 0 2 2
2 1 0 2 1 2 2
3 0 1 2 0 3 1
4 1 0 0 1 4 1
5 0 2 2 1 5 2
6 0 2 1 1 5 2
7 1 0 0 1 6 1
8 0 2 0 0 7 4
9 0 0 0 2 7 4
10 0 2 1 1 7 4
11 0 2 2 0 7 4
12 1 2 0 1 8 1
13 0 1 1 0 9 1
14 1 1 1 0 10 1
The problem is that I don't want the full count on every row of the numConsec column. I want a running count, as it would appear if you walked down the dataframe row by row. My dataframe is too large to loop over, as that would be too slow, so I need a vectorized, pythonic approach that produces this:
A B C D subgroupA numConsec
0 2 1 1 1 1 1
1 1 2 1 0 2 1
2 1 0 2 1 2 2
3 0 1 2 0 3 1
4 1 0 0 1 4 1
5 0 2 2 1 5 1
6 0 2 1 1 5 2
7 1 0 0 1 6 1
8 0 2 0 0 7 1
9 0 0 0 2 7 2
10 0 2 1 1 7 3
11 0 2 2 0 7 4
12 1 2 0 1 8 1
13 0 1 1 0 9 1
14 1 1 1 0 10 1
Any ideas?
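One possible approach, which is not taken from the original thread: once each run of equal values has its own `subgroupA` label, `groupby(...).cumcount()` numbers the rows inside each run without an explicit loop. The sketch below assumes column A holds the values shown in the example output above; the other columns are omitted because they do not affect the count.

```python
import pandas as pd

# Column A as it appears in the example output of the question.
df = pd.DataFrame({"A": [2, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1]})

# A new run starts wherever A differs from the previous row.
df["subgroupA"] = (df["A"] != df["A"].shift()).cumsum()

# cumcount numbers the rows of each run 0, 1, 2, ...; add 1 for a 1-based count.
df["numConsec"] = df.groupby("subgroupA").cumcount() + 1

print(df)
```

This avoids the intermediate `apply` and `merge` steps, since the running count is computed directly on the original rows.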

ffill with a group by and matching a condition

I want to forward-fill the value of log within each id, starting from the first 1 found in the log column.
Example:
df
id log
1 0
1 1
1 0
1 0
2 1
2 0
3 1
3 0
3 1
to
id log ffil_log
1 0 0
1 1 1
1 0 1
1 0 1
2 1 1
2 0 1
3 1 1
3 0 1
3 1 1
My try was:
df['ffil_log']=df.log.where(df.log==1).groupby(df.id).ffill()
You can use cummax and groupby such as:
df['ffil_log'] = df.groupby('id')['log'].cummax()
For each id, once log reaches 1 in a row, the running maximum stays at 1 for every following row, which gives the expected result:
id log ffil_log
0 1 0 0
1 1 1 1
2 1 0 1
3 1 0 1
4 2 1 1
5 2 0 1
6 3 1 1
7 3 0 1
8 3 1 1
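To see why the attempt in the question falls short, here is a small self-contained comparison. It is a sketch that assumes log only ever holds 0 or 1 (which is what makes `cummax` equivalent to a forward fill); the column name `ffil_log_alt` is made up for the illustration. Masking with `where` turns every 0 into NaN, so rows that come before the first 1 of an id have nothing to fill from and stay NaN until `fillna(0)` repairs them.

```python
import pandas as pd

# Data from the example in the question.
df = pd.DataFrame(
    {
        "id": [1, 1, 1, 1, 2, 2, 3, 3, 3],
        "log": [0, 1, 0, 0, 1, 0, 1, 0, 1],
    }
)

# Approach from the answer: running maximum within each id.
df["ffil_log"] = df.groupby("id")["log"].cummax()

# Attempt from the question: zeros become NaN, then are forward-filled per id.
attempt = df["log"].where(df["log"] == 1).groupby(df["id"]).ffill()

# Rows before the first 1 of an id are still NaN; fill them with 0.
df["ffil_log_alt"] = attempt.fillna(0).astype(int)

print(df)
```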

Python append dataframe such that only columns remain the same

I have the following dataframes in python pandas:
A:
1 2 3 4 5 6 7 8 9 10
0 1 1 1 1 1 1 1 0 0 1 1
B:
1 2 3 4 5 6 7 8 9 10
1 0 1 1 1 1 1 1 0 0 1 0
C:
1 2 3 4 5 6 7 8 9 10
2 0 1 1 1 0 0 0 0 0 1 0
I want to concatenate them together such that the column titles remain the same while row index and values get appended so the new dataframe is:
df:
1 2 3 4 5 6 7 8 9 10
0 1 1 1 1 1 1 1 0 0 1 1
1 0 1 1 1 1 1 1 0 0 1 0
2 0 1 1 1 0 0 0 0 0 1 0
I have tried using append and concat, but neither seems to produce the output I am trying to achieve. Any suggestions?
Here is what I tried:
df = pd.concat([df,pd.concat([A,B,C], ignore_index=True)], axis=1)
This is a plain vanilla concat
pd.concat([A, B, C])
1 2 3 4 5 6 7 8 9 10
0 1 1 1 1 1 1 1 0 0 1 1
1 0 1 1 1 1 1 1 0 0 1 0
2 0 1 1 1 0 0 0 0 0 1 0
A simple pd.concat does the job; you overcomplicated the task a little:
pd.concat([A,B,C], axis=0, ignore_index=True)

Identifying groups with same column value and count them

I am working with a dataframe, consisting of a continuity column df['continuity'] and a column group df['group'].
Both are binary columns.
I want to add an extra column 'group_id' that gives consecutive rows of 1s the same integer value, where the first group of rows gets a 1, the next a 2, and so on. Each time the continuity value of a row is 0, the counting should start again at 1.
Since this question is rather specific, I'm not sure how to tackle this in a vectorized way. Below is an example, where the first two columns are the input and the third column is the output I'd like to have.
continuity group group_id
1 0 0
1 1 1
1 1 1
1 1 1
1 0 0
1 1 2
1 1 2
1 1 2
1 0 0
1 0 0
1 1 3
1 1 3
0 1 1
0 0 0
1 1 1
1 1 1
1 0 0
1 0 0
1 1 2
1 1 2
I believe you can use:
# label consecutive runs in both columns
b = df[['continuity','group']].ne(df[['continuity','group']].shift()).cumsum()
# identify the first row of each run where group is 1
c = ~b.duplicated() & (df['group'] == 1)
# cumulative sum of those first rows per continuity run where group is 1, else 0
df['new'] = np.where(df['group'] == 1,
                     c.groupby(b['continuity']).cumsum(),
                     0).astype(int)
print(df)
continuity group group_id new
0 1 0 0 0
1 1 1 1 1
2 1 1 1 1
3 1 1 1 1
4 1 0 0 0
5 1 1 2 2
6 1 1 2 2
7 1 1 2 2
8 1 0 0 0
9 1 0 0 0
10 1 1 3 3
11 1 1 3 3
12 0 1 1 1
13 0 0 0 0
14 1 1 1 1
15 1 1 1 1
16 1 0 0 0
17 1 0 0 0
18 1 1 2 2
19 1 1 2 2

How to concatenate all values of a pandas dataframe into an integer in python?

I have the following dataframe:
1 2 3 4 5 6 7 8 9 10
dog cat 1 1 0 1 1 1 0 0 1 0
dog 1 1 1 1 1 1 0 0 1 1
fox 1 1 1 1 1 1 0 0 1 1
jumps 1 1 1 1 1 1 0 1 1 1
over 1 1 1 1 1 1 0 0 1 1
the 1 1 1 1 1 1 1 0 1 1
I want to first drop all labels from both rows and columns so the df becomes:
1 1 0 1 1 1 0 0 1 0
1 1 1 1 1 1 0 0 1 1
1 1 1 1 1 1 0 0 1 1
1 1 1 1 1 1 0 1 1 1
1 1 1 1 1 1 0 0 1 1
1 1 1 1 1 1 1 0 1 1
And then concatenate the values into one long int number so it becomes:
110111001011111100111111110011111111011111111100111111111011
Does anyone know a way of doing it in the shortest snippet of code possible? I appreciate the suggestions. Thank you.
Option 1
apply(str.join) + str.cat:
df.astype(str).apply(''.join, 1).str.cat(sep='')
'110111001011111100111111110011111111011111111100111111111011'
Option 2
apply + np.sum, proposed by Wen:
np.sum(df.astype(str).apply(np.sum, 1))
'110111001011111100111111110011111111011111111100111111111011'
IIUC
''.join(str(x) for x in sum(df.values.tolist(),[]))
Out[344]: '110111001011111100111111110011111111011111111100111111111011'
Or
''.join(map(str,sum(df.values.tolist(),[])))
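All of the options above return a string, while the question asks for an integer. A minimal sketch of the last step, assuming the dataframe from the question (rebuilt here by hand) and using `to_numpy()` as one more way to drop the labels: Python integers have arbitrary precision, so the 60-digit string converts exactly with `int()`. The variable names `digits` and `number` are illustrative.

```python
import pandas as pd

# Dataframe reconstructed from the table in the question.
df = pd.DataFrame(
    [
        [1, 1, 0, 1, 1, 1, 0, 0, 1, 0],
        [1, 1, 1, 1, 1, 1, 0, 0, 1, 1],
        [1, 1, 1, 1, 1, 1, 0, 0, 1, 1],
        [1, 1, 1, 1, 1, 1, 0, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 0, 0, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
    ],
    index=["dog cat", "dog", "fox", "jumps", "over", "the"],
    columns=range(1, 11),
)

# to_numpy() drops the row and column labels; ravel() flattens row by row.
digits = "".join(df.to_numpy().astype(str).ravel())

# Convert the digit string to a Python int (no overflow for long values).
number = int(digits)

print(digits)
print(number)
```

Note that a fixed-width type such as NumPy's int64 cannot hold a value this large, so the result should stay a plain Python int.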
