Here are my DataFrame and lists:
X Y Z X1
1 2 3 3
2 7 2 6
3 10 5 4
4 3 7 9
5 3 3 4
list1=[3,5,6]
list2=[4,3,7,4]
I want to append the lists to the DataFrame as shown in the expected output below. I have tried some code, but it raises an error and does not work.
#Expected Output
X Y Z X1
1 2 3 3
2 7 2 6
3 10 5 4
4 3 7 9
5 3 3 4
3 4
5 3
6 7
4
#here is my code
list1 = [3,5, 6]
df_length = len(df1)
df1.loc[df_length] = list1
Please help me to solve this problem.
Thanks in advance.
Use series.append() to create the new series (X & X1), and create the output df using pd.concat():
s1 = df.X.append(pd.Series(list1)).reset_index(drop=True)
s2 = df.X1.append(pd.Series(list2)).reset_index(drop=True)
df = pd.concat([s1, df.Y, df.Z, s2], axis=1).rename(columns={0: 'X', 1: 'X1'})
df
X Y Z X1
0 1.0 2.0 3.0 3
1 2.0 7.0 2.0 6
2 3.0 10.0 5.0 4
3 4.0 3.0 7.0 9
4 5.0 3.0 3.0 4
5 3.0 NaN NaN 4
6 5.0 NaN NaN 3
7 6.0 NaN NaN 7
8 NaN NaN NaN 4
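Note that Series.append() was deprecated in pandas 1.4 and removed in 2.0. A minimal sketch of the same result using only pd.concat(), assuming df still holds the original five rows:
import pandas as pd

list1 = [3, 5, 6]
list2 = [4, 3, 7, 4]

# extend X and X1 with the list values, then reassemble; Y and Z are padded with NaN
s1 = pd.concat([df['X'], pd.Series(list1, name='X')], ignore_index=True)
s2 = pd.concat([df['X1'], pd.Series(list2, name='X1')], ignore_index=True)
out = pd.concat([s1, df['Y'], df['Z'], s2], axis=1)
print(out)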
'''
X Y Z X1
1 2 3 3
2 7 2 6
3 10 5 4
4 3 7 9
5 3 3 4
'''
import pandas as pd
import numpy as np

list1 = [3, 5, 6]
list2 = [4, 3, 7, 4]
ls_empty = []

df = pd.read_clipboard()                                 # reads the table shown above from the clipboard
df1 = pd.DataFrame([list1, ls_empty, ls_empty, list2])   # one row per target column; short rows are padded with NaN
df1 = df1.T                                              # transpose so the lists become the X and X1 columns
df1.columns = df.columns
df2 = pd.concat([df, df1]).replace(np.nan, '', regex=True).reset_index(drop=True).astype({'X1': int})
print(df2)
Output:
X Y Z X1
0 1 2 3 3
1 2 7 2 6
2 3 10 5 4
3 4 3 7 9
4 5 3 3 4
5 3 4
6 5 3
7 6 7
8 4
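A shorter sketch of the same idea, assuming df already holds the original five rows: wrapping the lists in Series lets pandas pad the unequal lengths with NaN, and concat fills the missing Y and Z values.
import pandas as pd

list1 = [3, 5, 6]
list2 = [4, 3, 7, 4]

# Series alignment pads the shorter list with NaN automatically
extra = pd.DataFrame({'X': pd.Series(list1), 'X1': pd.Series(list2)})
out = pd.concat([df, extra], ignore_index=True)
print(out)  # Y and Z are NaN in the appended rows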
I have this big data in a CSV file, which I managed to open in a Jupyter Notebook.
An example of the data in the CSV: 1 2 3 4 5 6 7 8 9 10
I want to view it as a rolling window of 3 values, without applying any aggregation (sum, mean, etc.).
The output I want in the notebook is:
First, read the CSV to get the first column:
import pandas as pd
df = pd.read_csv("filename.csv")
Here io is used only to simulate reading the data from a file:
text = """first
1
2
3
4
5
6
7
8
9
10"""
import pandas as pd
import io
df = pd.read_csv(io.StringIO(text))
Result
first
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
Next, you can use shift() to create the other columns:
df['second'] = df['first'].shift(-1)
df['third'] = df['first'].shift(-2)
Result
first second third
0 1 2.0 3.0
1 2 3.0 4.0
2 3 4.0 5.0
3 4 5.0 6.0
4 5 6.0 7.0
5 6 7.0 8.0
6 7 8.0 9.0
7 8 9.0 10.0
8 9 10.0 NaN
9 10 NaN NaN
Finally, you can remove the last two rows (which contain NaN) and convert everything to integers:
df = df[:-2].astype(int)
or, if there are no NaN values elsewhere:
df = df.dropna().astype(int)
Result:
first second third
0 1 2 3
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
5 6 7 8
6 7 8 9
7 8 9 10
Minimal working code
text = """first
1
2
3
4
5
6
7
8
9
10"""
import pandas as pd
import io
df = pd.read_csv(io.StringIO(text))
#df = pd.DataFrame(range(1,11), columns=['first'])
print(df)
df['second'] = df['first'].shift(-1) #, fill_value=0)
df['third'] = df['first'].shift(-2)
print(df)
#df = df.dropna().astype(int)
df = df[:-2].astype(int)
print(df)
EDIT:
The same approach, using a for loop to create any number of columns:
text = """col 1
1
2
3
4
5
6
7
8
9
10"""
import pandas as pd
import io
df = pd.read_csv(io.StringIO(text))
#df = pd.DataFrame(range(1,11), columns=['col 1'])
print(df)
number = 5
for x in range(1, number+1):
    df[f'col {x+1}'] = df['col 1'].shift(-x)
print(df)
#df = df.dropna().astype(int)
df = df[:-number].astype(int)
print(df)
Result
col 1
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
col 1 col 2 col 3 col 4 col 5 col 6
0 1 2.0 3.0 4.0 5.0 6.0
1 2 3.0 4.0 5.0 6.0 7.0
2 3 4.0 5.0 6.0 7.0 8.0
3 4 5.0 6.0 7.0 8.0 9.0
4 5 6.0 7.0 8.0 9.0 10.0
5 6 7.0 8.0 9.0 10.0 NaN
6 7 8.0 9.0 10.0 NaN NaN
7 8 9.0 10.0 NaN NaN NaN
8 9 10.0 NaN NaN NaN NaN
9 10 NaN NaN NaN NaN NaN
col 1 col 2 col 3 col 4 col 5 col 6
0 1 2 3 4 5 6
1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9
4 5 6 7 8 9 10
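As an alternative to building the columns one shift at a time, NumPy (1.20 or newer) can produce all the windows in a single call. A minimal sketch on the same 10-value column:
import numpy as np
import pandas as pd

df = pd.DataFrame({'first': range(1, 11)})

# every length-3 window of the column becomes one row
windows = np.lib.stride_tricks.sliding_window_view(df['first'].to_numpy(), 3)
out = pd.DataFrame(windows, columns=['first', 'second', 'third'])
print(out)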
I am trying to find a way to calculate the number of values randomly removed from a data frame and the number of values removed consecutively (one after another).
The code I have so far is:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#Sampledata
x=[1,2,3,4,5,6,7,8,9,10]
y=[1,2,3,4,5,6,7,8,9,10]
df = pd.DataFrame({'col_1':y,'col_2':x})
drop_indices = np.random.choice(df.index, 5,replace=False )
df_subset = df.drop(drop_indices)
print(df_subset)
print(df)
This randomly removes 5 rows from the data frame and gives the following output:
col_1 col_2
0 1 1
1 2 2
2 3 3
5 6 6
8 9 9
col_1 col_2
0 1 1
1 2 2
2 3 3
3 4 4
4 5 5
5 6 6
6 7 7
7 8 8
8 9 9
9 10 10
I would like to turn this into the following data frame:
col_1 col_2 col_2 N_removedvalues N_consecutive
0 1 1 1 0 0
1 2 2 2 0 0
2 3 3 3 0 0
3 4 4 1 1
4 5 5 2 2
5 6 6 6 2 0
6 7 7 3 1
7 8 8 4 2
8 9 9 9 4 0
9 10 10 5 1
# left-merge: rows that were dropped get NaN in col_2
res = df.merge(df_subset, on='col_1', suffixes=['_1', ''], how='left')

# running count of removed rows, forward-filled onto the kept rows
res["N_removedvalues"] = np.where(res['col_2'].isna(), res.groupby(res['col_2'].isna()).cumcount().add(1), np.nan)
res["N_removedvalues"] = res["N_removedvalues"].ffill().fillna(0)

# flag the first row of each consecutive removed run; rows removed mid-run become NaN
res['N_consecutive'] = np.logical_and(res['col_2'].isna(), np.logical_or(~res['col_2'].shift().isna(), res.index == res.index[0]))
res.loc[np.logical_and(res['N_consecutive'] == 0, res['col_2'].isna()), 'N_consecutive'] = np.nan
# give each run an id, spread it over the run, then count the position within each run
res['N_consecutive'] = res.groupby('N_consecutive')['N_consecutive'].cumsum().ffill()
res.loc[res['N_consecutive'].gt(0), 'N_consecutive'] = res.loc[res['N_consecutive'].gt(0)].groupby('N_consecutive').cumcount().add(1)
Outputs:
col_1 col_2_1 col_2 N_removedvalues N_consecutive
0 1 1 1.0 0.0 0.0
1 2 2 2.0 0.0 0.0
2 3 3 NaN 1.0 1.0
3 4 4 4.0 1.0 0.0
4 5 5 NaN 2.0 1.0
5 6 6 NaN 3.0 2.0
6 7 7 7.0 3.0 0.0
7 8 8 8.0 3.0 0.0
8 9 9 NaN 4.0 1.0
9 10 10 NaN 5.0 2.0
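For comparison, a more compact sketch of the same bookkeeping, assuming df and df_subset exist as created above: mark which rows were dropped, then take a running total and the position inside each dropped run.
import numpy as np
import pandas as pd

removed = ~df.index.isin(df_subset.index)            # True where a row was dropped

res = df.copy()
res['N_removedvalues'] = removed.cumsum()            # running total of dropped rows

run_id = (~removed).cumsum()                         # constant within each run of dropped rows
run_pos = pd.Series(removed.astype(int)).groupby(run_id).cumsum()
res['N_consecutive'] = np.where(removed, run_pos, 0) # position within the current dropped run
print(res)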
I have a row of a pandas dataframe, e.g.:
x p y q z
---------
1 4 2 5 3
I want to append only some columns ('x','y','z') of it to another dataframe as new columns with names 'a','b','c'.
Before:
A B
---
7 8
9 6
8 5
After
A B a b c
---------
7 8 1 2 3
9 6 1 2 3
8 5 1 2 3
Try this:
df1 = pd.DataFrame({'x': [1], 'y': [2], 'z': [3]})
df2 = pd.DataFrame({'A': [7, 9, 8], 'B': [8, 6, 5]})
print(pd.concat([df2, df1], axis=1).ffill().rename(columns={'x': 'a', 'y': 'b', 'z': 'c'}))
A B a b c
0 7 8 1.0 2.0 3.0
1 9 6 1.0 2.0 3.0
2 8 5 1.0 2.0 3.0
Use assign() with a Series created by selecting the first row of df1:
cols = ['x','y','z']
new_cols = ['a','b','c']
df = df2.assign(**pd.Series(df1[cols].iloc[0].values, index=new_cols))
print (df)
A B a b c
0 7 8 1 2 3
1 9 6 1 2 3
2 8 5 1 2 3
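Or, as an even smaller sketch with the same df1 and df2: assign each value directly and let pandas broadcast the scalar down the new column.
for new, old in zip(['a', 'b', 'c'], ['x', 'y', 'z']):
    df2[new] = df1[old].iloc[0]   # a scalar broadcasts to every row
print(df2)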
I have a dataframe like this, and I want to add a new column that is the equivalent of applying shift n times. For example, let n = 2:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 10, (10, 2)), columns=['a', 'b'])
a b
0 0 3
1 7 0
2 6 6
3 6 0
4 5 0
5 0 7
6 8 0
7 8 7
8 4 4
9 2 2
df['c'] = df['b'].shift(1) + df['b'].shift(2)
a b c
0 0 3 NaN
1 7 0 NaN
2 6 6 3.0
3 6 0 6.0
4 5 0 6.0
5 0 7 0.0
6 8 0 7.0
7 8 7 7.0
8 4 4 7.0
9 2 2 11.0
In this manner, column c gets the sum of the previous n values from column b.
Other than a loop, is there a better way to accomplish this for a large n?
You can use the rolling() method with a window of 2:
df['c'] = df.b.rolling(window = 2).sum().shift()
df
a b c
0 0 3 NaN
1 7 0 NaN
2 6 6 3.0
3 6 0 6.0
4 5 0 6.0
5 0 7 0.0
6 8 0 7.0
7 8 7 7.0
8 4 4 7.0
9 2 2 11.0
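For a general n the same pattern applies; a small sketch with n as a variable, assuming df is the frame above:
n = 4  # any window size
df['c'] = df['b'].shift(1).rolling(window=n).sum()   # sum of the previous n values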
I have a dataframe df that I split into 3 dataframes of the same size. However, I want to create one dataframe from these 3, whose columns are the transposed versions of the 3 pieces, i.e. there will be 3 columns.
In [4]: np.array_split(df, 3)
Out[4]:
[ A B C D
0 foo one -0.174067 -0.608579
1 bar one -0.860386 -1.210518
2 foo two 0.614102 1.689837,
A B C D
3 bar three -0.284792 -1.071160
4 foo two 0.843610 0.803712
5 bar two -1.514722 0.870861,
A B C D
6 foo one 0.131529 -0.968151
7 foo three -1.002946 -0.257468
8 foo three -1.002946 -0.257468]
UPDATE
Sliced and transposed
In [2]: df
Out[2]:
a b c
0 9 9 7
1 1 7 6
2 5 9 1
3 7 4 0
4 5 2 3
5 2 4 6
6 6 3 6
7 0 2 7
8 9 1 4
9 2 9 3
In [3]: dfs = [pd.DataFrame(a).T for a in np.array_split(df, 3)]
In [4]: dfs[0]
Out[4]:
0 1 2 3
a 9 1 5 7
b 9 7 9 4
c 7 6 1 0
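To lay the three transposed pieces side by side in a single frame, a minimal sketch continuing from dfs above (the keys argument just labels the three blocks):
combined = pd.concat(dfs, axis=1, keys=['part1', 'part2', 'part3'])
print(combined)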
OLD version
One option would be to use this:
In [114]: dfs = [pd.DataFrame(a) for a in np.array_split(df, 3)]
In [115]: dfs[0]
Out[115]:
a b c
0 NaN 7.0 0
1 0.0 NaN 4
2 2.0 NaN 4
3 1.0 7.0 0
In [116]: df
Out[116]:
a b c
0 NaN 7.0 0
1 0.0 NaN 4
2 2.0 NaN 4
3 1.0 7.0 0
4 1.0 3.0 9
5 7.0 4.0 9
6 2.0 6.0 9
7 9.0 6.0 4
8 3.0 0.0 9
9 9.0 0.0 1
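If the goal is literally three columns, one per piece, a minimal sketch under that reading (using 9 values so the pieces split evenly; part1/part2/part3 are made-up labels):
import numpy as np
import pandas as pd

s = pd.Series(range(1, 10))                  # 9 values -> three equal pieces
parts = np.array_split(s.to_numpy(), 3)
out = pd.DataFrame({f'part{i + 1}': p for i, p in enumerate(parts)})
print(out)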