Pandas DataFrame: values split across two columns instead of one - python

I want to end up with 81 rows x 1 column.
How can I correct this distortion?

Use fillna. Basically, use the values in the second column to fill holes in the first column:
df['first_column'].fillna(df['second_column'])
For example, if you have DataFrame df:
a b
0 1.0 NaN
1 2.0 NaN
2 NaN 100.0
then
df['a'] = df['a'].fillna(df['b'])
df = df.drop(columns=['b'])
Output:
a
0 1.0
1 2.0
2 100.0
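A minimal runnable sketch of the approach above, reusing the example data:

```python
import pandas as pd
import numpy as np

# Build the example frame: 'a' has holes exactly where 'b' has values
df = pd.DataFrame({'a': [1.0, 2.0, np.nan], 'b': [np.nan, np.nan, 100.0]})

# Fill holes in 'a' from 'b', then drop 'b' to get a single column
df['a'] = df['a'].fillna(df['b'])
df = df.drop(columns=['b'])
```

After this, df has a single column `a` containing 1.0, 2.0, 100.0.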

Related

Python Dataframe Duplicated Columns while Merging multiple times

I have a main dataframe and a sub dataframe. I want to merge each column of the sub dataframe into the main dataframe, with the main dataframe column as a reference. I have successfully arrived at my desired answer, except that I see duplicated columns of the main dataframe. Below are my expected and present answers.
Present solution:
df = pd.DataFrame({'Ref':[1,2,3,4]})
df1 = pd.DataFrame({'A':[2,3],'Z':[1,2]})
df = [df.merge(df1[col_name],left_on='Ref',right_on=col_name,how='left') for col_name in df1.columns]
df = pd.concat(df,axis=1)
df =
Ref A Ref Z
0 1 NaN 1 1.0
1 2 2.0 2 2.0
2 3 3.0 3 NaN
3 4 NaN 4 NaN
Expected Answer:
df =
Ref A Z
0 1 NaN 1.0
1 2 2.0 2.0
2 3 3.0 NaN
3 4 NaN NaN
Update
Use duplicated:
>>> df.loc[:, ~df.columns.duplicated()]
Ref A Z
0 1 NaN 1.0
1 2 2.0 2.0
2 3 3.0 NaN
3 4 NaN NaN
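For reference, a self-contained sketch of the duplicated-columns fix, reproducing the merge from the question first (using `df1[[c]]` so each right-hand side is a one-column DataFrame):

```python
import pandas as pd

df = pd.DataFrame({'Ref': [1, 2, 3, 4]})
df1 = pd.DataFrame({'A': [2, 3], 'Z': [1, 2]})

# Merge each sub-column separately, then concat -> duplicated 'Ref' columns
parts = [df.merge(df1[[c]], left_on='Ref', right_on=c, how='left')
         for c in df1.columns]
out = pd.concat(parts, axis=1)

# Keep only the first occurrence of each column label
out = out.loc[:, ~out.columns.duplicated()]
```

`columns.duplicated()` marks every repeat of a label after its first occurrence, so the mask drops the second `Ref` while keeping `A` and `Z`.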
Old answer
You can use:
# Your code
...
df = pd.concat(df, axis=1)
# Use pop and insert to cleanup your dataframe
df.insert(0, 'Ref', df.pop('Ref').iloc[:, 0])
Output:
>>> df
Ref A Z
0 1 NaN 1.0
1 2 2.0 2.0
2 3 3.0 NaN
3 4 NaN NaN
What about setting the 'Ref' column as the index while building the dataframe list, and then resetting the index to get 'Ref' back as a column?
df = pd.DataFrame({'Ref':[1,2,3,4]})
df1 = pd.DataFrame({'A':[2,3],'Z':[1,2]})
df = [df.merge(df1[col_name],left_on='Ref',right_on=col_name,how='left').set_index('Ref') for col_name in df1.columns]
df = pd.concat(df,axis=1)
df = df.reset_index()
Ref A Z
0 1 NaN 1.0
1 2 2.0 2.0
2 3 3.0 NaN
3 4 NaN NaN
This is a reduction process. Instead of the list comprehension, use a for loop, or even reduce:
from functools import reduce
reduce(lambda x, y : x.merge(df1[y],left_on='Ref',right_on=y,how='left'), df1.columns, df)
Ref A Z
0 1 NaN 1.0
1 2 2.0 2.0
2 3 3.0 NaN
3 4 NaN NaN
The above is similar to:
for y in df1.columns:
    df = df.merge(df1[y],left_on='Ref',right_on=y,how='left')
df
Ref A Z
0 1 NaN 1.0
1 2 2.0 2.0
2 3 3.0 NaN
3 4 NaN NaN
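A self-contained version of the reduce approach above, folding each sub-column into the accumulator with a left merge:

```python
from functools import reduce
import pandas as pd

df = pd.DataFrame({'Ref': [1, 2, 3, 4]})
df1 = pd.DataFrame({'A': [2, 3], 'Z': [1, 2]})

# Each step merges one column of df1 into the accumulated result,
# so no duplicated 'Ref' columns are ever produced
out = reduce(lambda acc, col: acc.merge(df1[[col]], left_on='Ref',
                                        right_on=col, how='left'),
             df1.columns, df)
```

Because each merge only adds the single right-hand column, the result comes out as Ref, A, Z directly, with no deduplication step needed.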

how to append two dataframes with different column names and avoid columns with nan values

xyarr= [[0,1,2],[1,1,3],[2,1,2]]
df1 = pd.DataFrame(xyarr, columns=['a', 'b','c'])
df2 = pd.DataFrame([['text','text2']], columns=['x','y'])
df3 = pd.concat([df1,df2],axis=0, ignore_index=True)
df3 will have NaN values, from the empty columns a b c.
a b c x y
0 0.0 1.0 2.0 NaN NaN
1 1.0 1.0 3.0 NaN NaN
2 2.0 1.0 2.0 NaN NaN
3 NaN NaN NaN text text2
I want to save df3 to a csv, but without the extra commas.
Any suggestions?
As pd.concat performs an outer join by default, you get the NaN values from the empty columns a, b, c. If you use another Pandas function, e.g. .join(), which performs a left join by default, you can get around the problem here.
You can try using .join(), as follows:
df3 = df1.join(df2)
Result:
print(df3)
a b c x y
0 0 1 2 text text2
1 1 1 3 NaN NaN
2 2 1 2 NaN NaN
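A runnable sketch of the join approach, including the CSV step the question asks about (to_csv writes NaN cells as empty fields by default, so no literal "NaN" text ends up in the file):

```python
import pandas as pd

xyarr = [[0, 1, 2], [1, 1, 3], [2, 1, 2]]
df1 = pd.DataFrame(xyarr, columns=['a', 'b', 'c'])
df2 = pd.DataFrame([['text', 'text2']], columns=['x', 'y'])

# Left join keeps df1's rows and integer dtypes;
# df2's values land on row 0 only
df3 = df1.join(df2)

csv_text = df3.to_csv(index=False)
```

Note that rows 1 and 2 still have empty x/y fields in the CSV, since the joined frame has five columns; the join only avoids the extra all-NaN row that concat produced.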

Move values in rows in a new column in pandas

I have a DataFrame with an ids column and several columns with data, like the column "value" in this example.
For this DataFrame I want to move all the values that correspond to the same id into a new column in the row, as shown below:
I guess there is an opposite function to "melt" that allows this, but I can't work out how to pivot this DF.
The dicts for the input and output DFs are:
d = {"id":[1,1,1,2,2,3,3,4,5],"value":[12,13,1,22,21,23,53,64,9]}
d2 = {"id":[1,2,3,4,5],"value1":[12,22,23,64,9],"value2":[1,21,53,"","",],"value3":[1,"","","",""]}
Create a MultiIndex with cumcount, reshape with unstack, and change the column names with add_prefix:
df = (df.set_index(['id',df.groupby('id').cumcount()])['value']
.unstack()
.add_prefix('value')
.reset_index())
print (df)
id value0 value1 value2
0 1 12.0 13.0 1.0
1 2 22.0 21.0 NaN
2 3 23.0 53.0 NaN
3 4 64.0 NaN NaN
4 5 9.0 NaN NaN
Missing values can be replaced with fillna, but this mixes numeric and string data, so some functions may fail:
df = (df.set_index(['id',df.groupby('id').cumcount()])['value']
.unstack()
.add_prefix('value')
.reset_index()
.fillna(''))
print (df)
id value0 value1 value2
0 1 12.0 13 1
1 2 22.0 21
2 3 23.0 53
3 4 64.0
4 5 9.0
You can GroupBy to a list, then expand the series of lists:
df = pd.DataFrame(d) # create input dataframe
res = df.groupby('id')['value'].apply(list).reset_index() # groupby to list
res = res.join(pd.DataFrame(res.pop('value').values.tolist())) # expand lists to columns
print(res)
id 0 1 2
0 1 12 13.0 1.0
1 2 22 21.0 NaN
2 3 23 53.0 NaN
3 4 64 NaN NaN
4 5 9 NaN NaN
In general, such operations will be expensive as the number of columns is arbitrary. Pandas / NumPy solutions work best when you can pre-allocate memory, which isn't possible here.
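A self-contained version of the groupby-to-list expansion, using the question's data and renaming the positional columns to match the expected value1/value2/value3 layout:

```python
import pandas as pd

d = {'id': [1, 1, 1, 2, 2, 3, 3, 4, 5],
     'value': [12, 13, 1, 22, 21, 23, 53, 64, 9]}
df = pd.DataFrame(d)

# Collect each id's values into a list, then expand the lists into columns;
# shorter lists are padded with NaN automatically
res = df.groupby('id')['value'].apply(list).reset_index()
wide = res.join(pd.DataFrame(res.pop('value').tolist()))

# Rename the positional columns 0,1,2 -> value1,value2,value3
wide = wide.rename(columns={i: f'value{i + 1}' for i in range(3)})
```

The pop keeps the join clean: after it, res holds only the id column, and the expanded lists line up with it by index.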

Randomly assign values to subset of rows in pandas dataframe

I am using Python 2.7.11 with Anaconda.
I understand how to set the value of a subset of rows of a Pandas DataFrame like Modifying a subset of rows in a pandas dataframe, but I need to randomly set these values.
Say I have the dataframe df below. How can I randomly set the values of group == 2 so they are not all equal to 1.0?
import pandas as pd
import numpy as np
df = pd.DataFrame([1,1,1,2,2,2], columns = ['group'])
df['value'] = np.nan
df.loc[df['group'] == 2, 'value'] = np.random.randint(0,5)
print df
group value
0 1 NaN
1 1 NaN
2 1 NaN
3 2 1.0
4 2 1.0
5 2 1.0
df should look something like the below:
print df
group value
0 1 NaN
1 1 NaN
2 1 NaN
3 2 1.0
4 2 4.0
5 2 2.0
You must determine the size of group 2. Without the size argument, np.random.randint returns a single scalar, which gets broadcast to every selected row; passing size=g2.sum() draws one value per row:
g2 = df['group'] == 2
df.loc[g2, 'value'] = np.random.randint(5, size=g2.sum())
print(df)
group value
0 1 NaN
1 1 NaN
2 1 NaN
3 2 3.0
4 2 4.0
5 2 2.0
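A runnable sketch of the fix (seeding is optional, added here only for reproducibility):

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # optional: reproducible draws

df = pd.DataFrame([1, 1, 1, 2, 2, 2], columns=['group'])
df['value'] = np.nan

# size=g2.sum() draws one random integer per matching row,
# instead of one scalar broadcast to all of them
g2 = df['group'] == 2
df.loc[g2, 'value'] = np.random.randint(5, size=g2.sum())
```

The group-1 rows stay NaN; the group-2 rows each get an independent draw from 0-4.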

I am trying to fill all NaN values in rows with number data types to zero in pandas

I have a DataFrame with a mixture of string and float columns. The float columns are all still whole numbers and were only changed to floats because there were missing values. I want to fill all the NaN values in the numeric columns with zero, while leaving the NaN in columns that are strings. Here is what I have currently.
df.select_dtypes(include=['int', 'float']).fillna(0, inplace=True)
This doesn't work, and I think it is because .select_dtypes() returns a view of the DataFrame, so the .fillna() doesn't work. Is there a method similar to this to fill all the NaNs on only the float columns?
Use either DF.combine_first (does not act inplace):
df.combine_first(df.select_dtypes(include=[np.number]).fillna(0))
or DF.update (modifies inplace):
df.update(df.select_dtypes(include=[np.number]).fillna(0))
The reason fillna fails is that DF.select_dtypes returns a completely new dataframe: although it forms a subset of the original DF, it is not really a part of it. It behaves as a completely new entity in itself, so any modifications done to it will not affect the DF it was derived from.
Note that np.number selects all numeric types.
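A minimal runnable sketch of the in-place update route, with hypothetical column names chosen just for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['x', None, 'y'],
                   'score': [1.0, np.nan, 3.0]})

# update writes the filled numeric values back into df in place;
# the string column is never touched
df.update(df.select_dtypes(include=[np.number]).fillna(0))
```

After the update, 'score' reads 1.0, 0.0, 3.0 while the missing 'name' entry is still missing.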
Your pandas.DataFrame.select_dtypes approach is good; you've just got to cross the finish line:
>>> df = pd.DataFrame({'A': [np.nan, 'string', 'string', 'more string'], 'B': [np.nan, np.nan, 3, 4], 'C': [4, np.nan, 5, 6]})
>>> df
A B C
0 NaN NaN 4.0
1 string NaN NaN
2 string 3.0 5.0
3 more string 4.0 6.0
Don't try to perform the in-place fillna here (there's a time and place for inplace=True, but here is not one). What's returned by select_dtypes is effectively a new DataFrame (a copy), so filling it in place never reaches the original. Create a new dataframe called filled and join the filled (or "fixed") columns back with your original data:
>>> filled = df.select_dtypes(include=['int', 'float']).fillna(0)
>>> filled
B C
0 0.0 4.0
1 0.0 0.0
2 3.0 5.0
3 4.0 6.0
>>> df = df.join(filled, rsuffix='_filled')
>>> df
A B C B_filled C_filled
0 NaN NaN 4.0 0.0 4.0
1 string NaN NaN 0.0 0.0
2 string 3.0 5.0 3.0 5.0
3 more string 4.0 6.0 4.0 6.0
Then you can drop whatever original columns you had to keep only the "filled" ones:
>>> df.drop([x[:x.find('_filled')] for x in df.columns if '_filled' in x], axis=1, inplace=True)
>>> df
A B_filled C_filled
0 NaN 0.0 4.0
1 string 0.0 0.0
2 string 3.0 5.0
3 more string 4.0 6.0
Consider a dataframe like this
col1 col2 col3 id
0 1 1 1 a
1 0 NaN 1 a
2 NaN 1 1 NaN
3 1 0 1 b
You can select the numeric columns and fillna:
num_cols = df.select_dtypes(include=[np.number]).columns
df[num_cols] = df.select_dtypes(include=[np.number]).fillna(0)
col1 col2 col3 id
0 1 1 1 a
1 0 0 1 a
2 0 1 1 NaN
3 1 0 1 b
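Putting that together as a runnable sketch (using the earlier answer's example frame, since it mixes string and numeric columns):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 'string', 'string', 'more string'],
                   'B': [np.nan, np.nan, 3, 4],
                   'C': [4, np.nan, 5, 6]})

# Fill NaN only in the numeric columns; string columns keep their NaN
num_cols = df.select_dtypes(include=[np.number]).columns
df[num_cols] = df[num_cols].fillna(0)
```

Assigning back through df[num_cols] writes into the original frame, which is what the inplace attempt on the select_dtypes copy could not do.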
