Why Python Pandas append to DataFrame like this?

I want to add the list l to column 'A', but append creates a new column and puts the values there instead. Why is this happening, and how can I get the result I want?
import pandas as pd
l=[1,2,3]
df = pd.DataFrame(columns =['A'])
df = df.append(l, ignore_index=True)
df = df.append(l, ignore_index=True)
print(df)
A 0
0 NaN 1.0
1 NaN 2.0
2 NaN 3.0
3 NaN 1.0
4 NaN 2.0
5 NaN 3.0

Edited
Is this what you want to do? (On pandas 2.0 and later, where append no longer exists, pd.concat does the same job.)
In[6]:df=pd.concat([df.A,pd.Series(l)]).reset_index(drop=True).to_frame('A')
In[7]:df
Out[7]:
A
0 1
1 2
2 3
Then you can add a list of any length.
Suppose:
a=[9,8,7,6,5]
In[11]:df=pd.concat([df.A,pd.Series(a)]).reset_index(drop=True).to_frame('A')
In[12]:df
Out[12]:
A
0 1
1 2
2 3
3 9
4 8
5 7
6 6
7 5
Previously
Are you looking for this?
df=pd.DataFrame(l,columns=['A'])
df
Out[5]:
A
0 1
1 2
2 3

You can just pass a dictionary to the DataFrame constructor, that is, if I understand your question correctly.
l = [1,2,3]
df = pd.DataFrame({'A': l})
df
A
0 1
1 2
2 3
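As for why the extra column appears: a plain list carries no column label, so when it is appended it is treated like pd.DataFrame(l), whose only column is named 0. Rows are aligned by column name, so the values land in a new column 0 and 'A' is filled with NaN. Here is a minimal sketch of that behaviour using pd.concat (append was removed in pandas 2.0). The names misaligned and aligned are only for illustration.

```python
import pandas as pd

l = [1, 2, 3]
df = pd.DataFrame(columns=['A'])

# A bare list becomes a frame whose only column is labelled 0,
# so it does not line up with the existing column 'A'.
misaligned = pd.concat([df, pd.DataFrame(l)], ignore_index=True)
print(misaligned.columns.tolist())

# Labelling the values as column 'A' makes them line up.
aligned = pd.concat([df, pd.DataFrame({'A': l})], ignore_index=True)
aligned = pd.concat([aligned, pd.DataFrame({'A': l})], ignore_index=True)
print(aligned)
```

Recent pandas versions may emit a FutureWarning about concatenating an empty frame; starting from the first non-empty frame avoids it.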

Related

How can I combine two columns in one dataframe?

I have a dataset like this.
A B C A2
1 2 3 4
5 6 7 8
and I want to combine A and A2.
A B C
1 2 3
5 6 7
4
8
how can I combine two columns?
Hope for help. Thank you.

I don't think it is possible directly, but you can do it with a few lines of code:
df = pd.DataFrame({'A':[1,5],'B':[2,6],'C':[3,7],'A2':[4,8]})
# rename A2 to A so that concat stacks it under the existing A column
df_A2 = df[['A2']].rename(columns={'A2': 'A'})
df = pd.concat([df.drop(['A2'],axis=1),df_A2])
You will get this if you print df:
A B C
0 1 2.0 3.0
1 5 6.0 7.0
0 4 NaN NaN
1 8 NaN NaN

You could append the last column after renaming it (with pd.concat, since DataFrame.append was removed in pandas 2.0):
pd.concat([df, df[['A2']].set_axis(['A'], axis=1)]).drop(columns='A2')
It gives, as expected:
A B C
0 1 2.0 3.0
1 5 6.0 7.0
0 4 NaN NaN
1 8 NaN NaN

If the index is not important to you:
import pandas as pd
pd.concat([df[['A','B','C']], df[['A2']].rename(columns={'A2': 'A'})]).reset_index(drop=True)

fill missing values based on the last value [duplicate]

I am dealing with pandas DataFrames like this:
id x
0 1 10
1 1 20
2 2 100
3 2 200
4 1 NaN
5 2 NaN
6 1 300
7 1 NaN
I would like to replace each NaN 'x' with the previous non-NaN 'x' from a row with the same 'id' value:
id x
0 1 10
1 1 20
2 2 100
3 2 200
4 1 20
5 2 200
6 1 300
7 1 300
Is there some slick way to do this without manually looping over rows?

You could perform a groupby/forward-fill operation on each group:
import numpy as np
import pandas as pd
df = pd.DataFrame({'id': [1,1,2,2,1,2,1,1], 'x':[10,20,100,200,np.nan,np.nan,300,np.nan]})
df['x'] = df.groupby(['id'])['x'].ffill()
print(df)
yields
id x
0 1 10.0
1 1 20.0
2 2 100.0
3 2 200.0
4 1 20.0
5 2 200.0
6 1 300.0
7 1 300.0

df
id val
0 1 23.0
1 1 NaN
2 1 NaN
3 2 NaN
4 2 34.0
5 2 NaN
6 3 2.0
7 3 NaN
8 3 NaN
df.sort_values(['id','val']).groupby('id').ffill()
id val
0 1 23.0
1 1 23.0
2 1 23.0
4 2 34.0
3 2 34.0
5 2 34.0
6 3 2.0
7 3 2.0
8 3 2.0
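Use sort_values, groupby and ffill so that a NaN in the first value (or first few values) of a group also gets filled. Note that sort_values puts NaN last by default, so each NaN is filled with the largest val in its group rather than with the previous value in the original row order.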
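If the original row order matters, one alternative is to forward-fill and then back-fill within each group, without sorting. This is only a sketch on the same sample data as above, and it assumes that filling a leading NaN from the next value in the group is acceptable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id':  [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'val': [23, np.nan, np.nan, np.nan, 34, np.nan, 2, np.nan, np.nan],
})

# Forward-fill inside each id, then back-fill whatever is still missing
# (here, the NaN that opens group 2).
df['val'] = df.groupby('id')['val'].ffill()
df['val'] = df.groupby('id')['val'].bfill()
print(df)
```

The row order and index stay exactly as they were in the input.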

Solution for multi-key problem:
In this example, the data has the key [date, region, type]. Date is the index on the original dataframe.
import os
import pandas as pd
# sort to make indexing faster
df.sort_values(by=['date','region','type'], inplace=True)
# collect all possible regions and types
regions = list(set(df['region']))
types = list(set(df['type']))
# record column names
df_cols = df.columns
# delete ffill_df.csv so we can begin anew
try:
    os.remove('ffill_df.csv')
except FileNotFoundError:
    pass
# steps:
# 1) grab rows with a particular region and type
# 2) use forwardfill to fill nulls
# 3) use backwardfill to fill remaining nulls
# 4) append to file
for r in regions:
    for t in types:
        group_df = df[(df.region == r) & (df.type == t)].copy()
        group_df.ffill(inplace=True)
        group_df.bfill(inplace=True)
        group_df.to_csv('ffill_df.csv', mode='a', header=False, index=True)
Checking the result:
# load in the ffill_df
ffill_df = pd.read_csv('ffill_df.csv', header=None, index_col=None)
# the file holds the date index first, followed by the original columns
ffill_df.columns = ['date'] + list(df_cols)
ffill_df.index= ffill_df.date
ffill_df.drop('date', axis=1, inplace=True)
ffill_df.head()
#compare new and old dataframe
print(df.shape)
print(ffill_df.shape)
print()
print(pd.isnull(ffill_df).sum())
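The same per-group forward-fill and back-fill can probably be done in memory with a multi-key groupby, which avoids the temporary CSV file. This is a minimal sketch under assumptions: the sample data is made up, the column value is hypothetical, and date is a regular column here rather than the index.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 'value' stands in for the real data columns.
df = pd.DataFrame({
    'date':   ['2020-01-01', '2020-01-01', '2020-01-02',
               '2020-01-02', '2020-01-03', '2020-01-03'],
    'region': ['N', 'S', 'N', 'S', 'N', 'S'],
    'type':   ['a', 'b', 'a', 'b', 'a', 'b'],
    'value':  [np.nan, 5, 1, np.nan, np.nan, 7],
})

keys = ['region', 'type']
df = df.sort_values(['date'] + keys)

# Forward-fill within each (region, type) pair, then back-fill the rest.
df['value'] = df.groupby(keys)['value'].ffill()
df['value'] = df.groupby(keys)['value'].bfill()
print(df)
```

Because the result keeps the original index, the shape comparison and null count from the check above can be run directly on df.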

Shift rows in a Column when equal to specific value

I want to shift rows in a pandas df when values are equal to a specific value in a Column. For the df below, I'm trying to shift the values in Column B to Column A when values in A == x.
import pandas as pd
df = pd.DataFrame({
'A' : [1,'x','x','x',5],
'B' : ['x',2,3,4,'x'],
})
This is my attempt:
df = df.loc[df.A.shift(-1) == df.A.shift(1), 'x'] = df.A.shift(1)
Intended Output:
A B
0 1 x
1 2
2 3
3 4
4 5 x

You can use:
m = df.A.eq('x')
df[m]=df[m].shift(-1,axis=1)
print(df)
A B
0 1 x
1 2 NaN
2 3 NaN
3 4 NaN
4 5 x

You can use:
df[df.A=='x'] = df.shift(-1,axis=1)
print(df)
A B
0 1 x
1 2 NaN
2 3 NaN
3 4 NaN
4 5 x
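Both answers leave NaN in column B, while the intended output shows blank cells. If blanks are wanted, one possible variant is to copy the values with loc and then overwrite B explicitly. This is a sketch only, and using the empty string as the blank is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 'x', 'x', 'x', 5],
    'B': ['x', 2, 3, 4, 'x'],
})

m = df['A'].eq('x')             # rows where A holds the placeholder
df.loc[m, 'A'] = df.loc[m, 'B'] # move B's value into A
df.loc[m, 'B'] = ''             # blank out the moved-from cells
print(df)
```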

Randomly assign values to subset of rows in pandas dataframe

I am using Python 2.7.11 with Anaconda.
I understand how to set the value of a subset of rows of a Pandas DataFrame, as in "Modifying a subset of rows in a pandas dataframe", but I need to set these values randomly.
Say I have the dataframe df below. How can I randomly set the values of group == 2 so they are not all equal to 1.0?
import pandas as pd
import numpy as np
df = pd.DataFrame([1,1,1,2,2,2], columns = ['group'])
df['value'] = np.nan
df.loc[df['group'] == 2, 'value'] = np.random.randint(0,5)
print df
group value
0 1 NaN
1 1 NaN
2 1 NaN
3 2 1.0
4 2 1.0
5 2 1.0
df should look something like the below:
print df
group value
0 1 NaN
1 1 NaN
2 1 NaN
3 2 1.0
4 2 4.0
5 2 2.0

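np.random.randint(0,5) returns a single number, which is then broadcast to every selected row. Pass size with the number of rows in group 2 to draw one value per row: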
g2 = df['group'] == 2
df.loc[g2, 'value'] = np.random.randint(5, size=g2.sum())
print(df)
group value
0 1 NaN
1 1 NaN
2 1 NaN
3 2 3.0
4 2 4.0
5 2 2.0
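If the random values need to be reproducible, a seeded NumPy Generator can be used in the same way. This is a sketch assuming a NumPy version that provides np.random.default_rng (1.17 or later), so it does not apply to the Python 2.7 setup in the question. The seed value is arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 1, 1, 2, 2, 2], columns=['group'])
df['value'] = np.nan

rng = np.random.default_rng(seed=0)  # arbitrary seed, for repeatable draws
g2 = df['group'] == 2

# One integer in [0, 5) per selected row.
df.loc[g2, 'value'] = rng.integers(0, 5, size=g2.sum())
print(df)
```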

Setting values in one dataframe from the boolean values in another

I have a MWE that can be reproduced with the following code:
import pandas as pd
a = pd.DataFrame([[1,2],[3,4]], columns=['A', 'B'])
b = pd.DataFrame([[True,False],[False,True]], columns=['A', 'B'])
Which creates the following dataframes:
In [8]: a
Out[8]:
A B
0 1 2
1 3 4
In [9]: b
Out[9]:
A B
0 True False
1 False True
My question is, how can I change the values for dataframe A based on the boolean values in dataframe B?
Say for example if I wanted to make NAN values in dataframe A where there's an instance of False in dataframe B?

If you need to replace False with NaN:
print (a[b])
A B
0 1.0 NaN
1 NaN 4.0
or:
print (a.where(b))
A B
0 1.0 NaN
1 NaN 4.0
and if you need to replace True with NaN:
print (a[~b])
A B
0 NaN 2.0
1 3.0 NaN
or:
print (a.mask(b))
A B
0 NaN 2.0
1 3.0 NaN
You can also use where or mask with a scalar replacement value:
print (a.where(b, 7))
A B
0 1 7
1 7 4
print (a.mask(b, 7))
A B
0 7 2
1 3 7
print (a.where(b, 'TEST'))
A B
0 1 TEST
1 TEST 4
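All of the calls above return a new DataFrame and leave a unchanged. To actually change a, assign the result back. The short sketch below also checks that the indexing and method forms agree; the name kept is only for illustration.

```python
import pandas as pd

a = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
b = pd.DataFrame([[True, False], [False, True]], columns=['A', 'B'])

kept = a.where(b)                     # NaN wherever b is False
assert kept.equals(a[b])              # boolean indexing gives the same frame
assert a.mask(b).equals(a.where(~b))  # mask is where with the condition inverted

a = a.where(b)                        # assign back to change a itself
print(a)
```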
