How to get and show counts of values in each column - python

I have a pandas dataframe.
df:
col1 col2 col3 col4 col5
0 1.0 1.0 NaN NaN 1.0
1 NaN 1.0 1.0 2.0 1.0
2 2.0 NaN 1.0 NaN 1.0
I want to get, for each column, the count of rows containing each value, like the following.
Output:
col1 col2 col3 col4 col5
1.0 1 2 2 0 3
2.0 1 0 0 1 0
Or only the counts for a single value:
col1 col2 col3 col4 col5
1.0 1 2 2 0 3
Are there any ways to get my expected output?

You could use the value_counts method of a pandas Series and then fillna to fill the NaN values with 0:
In [7]: df
Out[7]:
col1 col2 col3 col4 col5
0 1.0 1.0 NaN NaN 1.0
1 NaN 1.0 1.0 2.0 1.0
2 2.0 NaN 1.0 NaN 1.0
In [8]: df.apply(lambda x: x.value_counts()).fillna(0)
Out[8]:
col1 col2 col3 col4 col5
1.0 1 2.0 2.0 0.0 3.0
2.0 1 0.0 0.0 1.0 0.0
If you need int values instead of floats, you can also chain astype(int):
In [9]: df.apply(lambda x: x.value_counts()).fillna(0).astype(int)
Out[9]:
col1 col2 col3 col4 col5
1.0 1 2 2 0 3
2.0 1 0 0 1 0

Edit: updated to use df.fillna(0) instead of df.replace(np.NaN, 0); fillna is the idiomatic way to fill missing values with 0.
To count the occurrences of a value in each column, use value_counts. Values that never occur in a column come back as NaN, so they need to be replaced with 0:
>>> df.apply(pd.value_counts).fillna(0)
col1 col2 col3 col4 col5
1 1 2 2 0 3
2 1 0 0 1 0
To retrieve a particular row:
>>> df.apply(pd.value_counts).fillna(0).loc[1:1]
col1 col2 col3 col4 col5
1 1 2 2 0 3
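For reference, here is a self-contained version of the pattern both answers use, runnable as-is (note that the top-level pd.value_counts function is deprecated in recent pandas, so calling value_counts on each Series is the safer spelling):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1.0, np.nan, 2.0],
    "col2": [1.0, 1.0, np.nan],
    "col3": [np.nan, 1.0, 1.0],
    "col4": [np.nan, 2.0, np.nan],
    "col5": [1.0, 1.0, 1.0],
})

# Count every value in every column; values absent from a column
# come back as NaN after alignment, so fill with 0 and cast to int.
counts = df.apply(lambda s: s.value_counts()).fillna(0).astype(int)
print(counts)

# Counts of the value 1.0 only, as a Series:
print(counts.loc[1.0])
```

Using counts.loc[[1.0]] instead keeps the result as a one-row DataFrame rather than a Series.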

Related

How to insert NaN if value is not between fence_high and fence_low columns

I have to replace the values in the first three columns with NaN if they are >= fence_high or <= fence_low.
I have a dataframe like this:
col1 col2 col3 fence_high fence_low
0 1 3 9 9 1.5
1 2 4 6 7 1
2 4 7 -1 6.5 0
This is what I would like to achieve:
col1 col2 col3 fence_high fence_low
0 NaN 3 NaN 9 1.5
1 2 4 6 7 1
2 4 NaN NaN 6.5 0
So far I tried df_new = df[(df < df["fence_high"]) & (df > df["fence_low"])], but this gives me all NaN.
We can simply keep values where they fall between fence_low and fence_high, using gt and lt with axis=0 to maintain index alignment:
df.loc[:, 'col1':'col3'] = df.loc[:, 'col1':'col3'].where(
    lambda x: x.gt(df['fence_low'], axis=0) & x.lt(df['fence_high'], axis=0)
)
df
col1 col2 col3 fence_high fence_low
0 NaN 3.0 NaN 9.0 1.5
1 2.0 4.0 6.0 7.0 1.0
2 4.0 NaN NaN 6.5 0.0
If you need a new DataFrame instead, join back the untouched columns after where:
new_df = df.loc[:, 'col1':'col3'].where(
    lambda x: x.gt(df['fence_low'], axis=0) & x.lt(df['fence_high'], axis=0)
).join(df[['fence_high', 'fence_low']])
new_df:
col1 col2 col3 fence_high fence_low
0 NaN 3.0 NaN 9.0 1.5
1 2.0 4.0 6.0 7.0 1.0
2 4.0 NaN NaN 6.5 0.0
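Put together as a self-contained snippet, using the data from the question:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 3, 9, 9, 1.5], [2, 4, 6, 7, 1], [4, 7, -1, 6.5, 0]],
    columns=["col1", "col2", "col3", "fence_high", "fence_low"],
)

vals = df.loc[:, "col1":"col3"]
# Keep a value only where it lies strictly between the row's fences;
# everything else becomes NaN.
mask = vals.gt(df["fence_low"], axis=0) & vals.lt(df["fence_high"], axis=0)
new_df = vals.where(mask).join(df[["fence_high", "fence_low"]])
print(new_df)
```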
One way is to use apply. See if this helps:
import pandas as pd
import numpy as np

cols_list = ["col1", "col2", "col3"]

def compare_val(val, high, low):
    if val >= high or val <= low:
        return np.nan
    return val

def compare(row):
    result = []
    for i in cols_list:
        result.append(
            compare_val(val=row[i], high=row["fence_high"], low=row["fence_low"])
        )
    # Index the Series by the column names so the assignment below
    # aligns correctly instead of relying on positional matching.
    return pd.Series(result, index=cols_list)

data = [[1, 3, 9, 9, 1.5], [2, 4, 6, 7, 1], [4, 7, -1, 6.5, 0]]
df = pd.DataFrame(data, columns=[*cols_list, "fence_high", "fence_low"])
print("Original:\n", df.head())
df[cols_list] = df.apply(compare, axis=1)
print("Transformed:\n", df.head())
Output:
Original:
col1 col2 col3 fence_high fence_low
0 1 3 9 9.0 1.5
1 2 4 6 7.0 1.0
2 4 7 -1 6.5 0.0
Transformed:
col1 col2 col3 fence_high fence_low
0 NaN 3.0 NaN 9.0 1.5
1 2.0 4.0 6.0 7.0 1.0
2 4.0 NaN NaN 6.5 0.0

Shift all NaN values in pandas to the left

I have a (250, 33866) dataframe. As you can see in the picture, all the NaN values are at the end of each row. I would like to shift those NaN values to the left of the dataframe, while keeping column 0 (which holds the Id) in its place as the first column.
I was going to define a function that loops over all rows and columns, but figured that would be very inefficient for large data. Any other options? Thanks
You could reverse the columns of df, drop the NaNs in each row, build a DataFrame from the result, and reverse it back:
out = pd.DataFrame(df.iloc[:, ::-1].apply(lambda x: x.dropna().tolist(), axis=1).tolist(),
                   columns=df.columns[::-1]).iloc[:, ::-1]
For example, for a DataFrame that looks like below:
col0 col1 col2 col3 col4
1 1.0 2.0 3.0 10.0 20.0
2 1.0 2.0 3.0 NaN NaN
3 1.0 2.0 NaN NaN NaN
the above code produces:
col0 col1 col2 col3 col4
0 1.0 2.0 3.0 10.0 20.0
1 NaN NaN 1.0 2.0 3.0
2 NaN NaN NaN 1.0 2.0
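If apply over 33k columns is too slow, a NumPy-based alternative may help (a sketch, assuming the shifted columns are all numeric): a stable argsort on each row's "is present" mask moves the NaNs to the front while preserving the order of the remaining values. To keep the Id column in place, run this on df.iloc[:, 1:] only and reattach it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.0, 2.0, 3.0, 10.0, 20.0],
     [1.0, 2.0, 3.0, np.nan, np.nan],
     [1.0, 2.0, np.nan, np.nan, np.nan]],
    columns=["col0", "col1", "col2", "col3", "col4"],
)

arr = df.to_numpy(dtype=float)
# Stable ascending sort on notna (False sorts before True) puts NaNs
# first in each row and keeps non-NaN values in their original order.
order = np.argsort(df.notna().to_numpy(), axis=1, kind="stable")
out = pd.DataFrame(np.take_along_axis(arr, order, axis=1),
                   columns=df.columns, index=df.index)
print(out)
```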

Fill NaN values in a pandas DataFrame depending on values of cells to its left

I'm trying to fill NaNs in a very large pandas dataframe with zeros, but only if there is a non-NaN value somewhere in the same row in a cell to their left. For example, starting from this input DataFrame,
df = pd.DataFrame([[1, np.nan, 1.5, np.nan], [np.nan, 2, np.nan, np.nan]], index=['A', 'B'], columns=['col1', 'col2', 'col3', 'col4'])
which looks like:
col1 col2 col3 col4
A 1.0 NaN 1.5 NaN
B NaN 2.0 NaN NaN
The expected output would be:
col1 col2 col3 col4
A 1.0 0 1.5 0
B NaN 2.0 0 0
See how [B, col1] remains NaN because there is no non-NaN value to its left, while [A, col2], [A, col4], [B, col3] and [B, col4] have all been filled with zeros (because each has a non-NaN value somewhere to its left).
Does anyone have any idea on how to go on about this? Thanks a lot!
Forward-fill along the rows, test which positions are then not missing, combine that with a test for the originally missing values, and assign 0 through the resulting mask:
df[df.ffill(axis=1).notna() & df.isna()] = 0
print (df)
col1 col2 col3 col4
A 1.0 0.0 1.5 0.0
B NaN 2.0 0.0 0.0
Or you can use a cumulative sum along the rows and test for values not equal to 0 (this variant assumes the real data contains no leading values that would keep the cumulative sum at 0):
df[df.fillna(0).cumsum(axis=1).ne(0) & df.isna()] = 0
print (df)
col1 col2 col3 col4
A 1.0 0.0 1.5 0.0
B NaN 2.0 0.0 0.0
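The first approach as a self-contained snippet, with the DataFrame from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, np.nan, 1.5, np.nan], [np.nan, 2, np.nan, np.nan]],
    index=["A", "B"],
    columns=["col1", "col2", "col3", "col4"],
)

# A cell gets 0 exactly when it is NaN now but a forward fill along
# the row would reach it, i.e. some value exists to its left.
df[df.ffill(axis=1).notna() & df.isna()] = 0
print(df)
```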

How to fill the data frame with using the match between columns and column list and value list using pandas?

I have a data frame like this:
df
col1 col2 col3 col4 col5 col6 col7
1
2
3
4
5
The values from col2 to col7 are empty for now. I have two lists:
list1=[['col2'],['col5','col6'],[],['col3','col4','col5','col6'],['col7','col4']]
list2=[['1'],['2','3'],[],['4','5','6','7'],['8','9']]
I want to fill the data frame where the column names match list1, using the corresponding values from list2.
The resulting df should look like:
col1 col2 col3 col4 col5 col6 col7
1 1 NA NA NA NA NA
2 NA NA NA 2 3 NA
3 NA NA NA NA NA NA
4 NA 4 5 6 7 NA
5 NA NA 9 NA NA 8
What is the most efficient way to do this with pandas?
What I would do:
df.update(pd.concat([pd.DataFrame(data=[z], columns=y, index=[x])
                     for x, (y, z) in enumerate(zip(list1, list2))]))
df
col1 col2 col3 col4 col5 col6 col7
0 1 1 NaN NaN NaN NaN NaN
1 2 NaN NaN NaN 2 3 NaN
2 3 NaN NaN NaN NaN NaN NaN
3 4 NaN 4 5 6 7 NaN
4 5 NaN NaN 9 NaN NaN 8
Use a loop solution with zip and enumerate for the counter:
for i, (a, b) in enumerate(zip(list1, list2)):
    df.loc[i, a] = b
print (df)
print (df)
col1 col2 col3 col4 col5 col6 col7
0 1 1 NaN NaN NaN NaN NaN
1 2 NaN NaN NaN 2 3 NaN
2 3 NaN NaN NaN NaN NaN NaN
3 4 NaN 4 5 6 7 NaN
4 5 NaN NaN 9 NaN NaN 8
Or create a 3-column DataFrame and then pivot (keyword arguments are required for pivot in pandas 2.0+):
a = [(i, a1, b1) for i, (a, b) in enumerate(zip(list1, list2)) for a1, b1 in zip(a, b)]
df1 = pd.DataFrame(a).pivot(index=0, columns=1, values=2)
print (df1)
1 col2 col3 col4 col5 col6 col7
0
0 1 NaN NaN NaN NaN NaN
1 NaN NaN NaN 2 3 NaN
3 NaN 4 5 6 7 NaN
4 NaN NaN 9 NaN NaN 8
and then DataFrame.join:
df = df[['col1']].join(df1)
print (df)
col1 col2 col3 col4 col5 col6 col7
0 1 1 NaN NaN NaN NaN NaN
1 2 NaN NaN NaN 2 3 NaN
2 3 NaN NaN NaN NaN NaN NaN
3 4 NaN 4 5 6 7 NaN
4 5 NaN NaN 9 NaN NaN 8
With simple loop:
In [54]: for i, col_names in enumerate(list1):
    ...:     df.loc[i, col_names] = list2[i]
    ...:
In [55]: df
Out[55]:
col1 col2 col3 col4 col5 col6 col7
0 1 1 NaN NaN NaN NaN NaN
1 2 NaN NaN NaN 2 3 NaN
2 3 NaN NaN NaN NaN NaN NaN
3 4 NaN 4 5 6 7 NaN
4 5 NaN NaN 9 NaN NaN 8
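The loop variant is easy to make self-contained and robust. In this sketch the string values from list2 are converted to numbers before assignment (an assumption on my part, so the filled columns stay numeric rather than mixing strings into float columns), and empty column lists are skipped:

```python
import pandas as pd

list1 = [["col2"], ["col5", "col6"], [], ["col3", "col4", "col5", "col6"], ["col7", "col4"]]
list2 = [["1"], ["2", "3"], [], ["4", "5", "6", "7"], ["8", "9"]]

cols = ["col1", "col2", "col3", "col4", "col5", "col6", "col7"]
df = pd.DataFrame({"col1": [1, 2, 3, 4, 5]}).reindex(columns=cols)

# Write each row's values into its matching columns; rows with an
# empty column list are left untouched.
for i, (names, vals) in enumerate(zip(list1, list2)):
    if names:
        df.loc[i, names] = pd.to_numeric(vals)
print(df)
```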

Pandas dataframe a particular case of merging

How can I merge the rows of dataframe1 into dataframe2?
If one of the corresponding values is NaN then the value should be
copied from the other.
If both are NaN then NaN.
If none are NaN then the first one.
[Dataframe1 and Dataframe2 were shown as images in the original post.]
Thanks in advance
You can use combine_first:
df
Out:
col1 col2 col3 col4
0 NaN NaN 3.0 4
1 1.0 2.0 NaN 5
df.loc[0].combine_first(df.loc[1])
Out:
col1 1.0
col2 2.0
col3 3.0
col4 4.0
Name: 0, dtype: float64
In the specified format:
df.loc[0].combine_first(df.loc[1]).to_frame('Row1-2').T
Out:
col1 col2 col3 col4
Row1-2 1.0 2.0 3.0 4.0
An alternative:
df.loc[[0]].fillna(df.loc[1])
Out:
col1 col2 col3 col4
0 1.0 2.0 3.0 4
And a cleaner version of filling from #MaxU:
df.bfill().iloc[[0]]
Out:
col1 col2 col3 col4
0 1.0 2.0 3.0 4
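The same idea extends from single rows to whole frames: DataFrame.combine_first takes values from the caller wherever it has one, fills its NaNs from the other frame, and leaves NaN only where both are NaN, which matches the three rules above. A sketch with made-up values, since the original frames were posted as images:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"col1": [np.nan, 1.0], "col2": [7.0, np.nan]})
df2 = pd.DataFrame({"col1": [5.0, 2.0], "col2": [np.nan, np.nan]})

# df1 wins wherever it has a value; its NaNs are filled from df2;
# cells that are NaN in both frames stay NaN.
merged = df1.combine_first(df2)
print(merged)
```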
