Concatenate two pandas dataframe with the same index but on different positions - python

I have a data frame like
id value_right color_right value_left color_left
1 asd red dfs blue
2 dfs blue afd green
3 ccd yellow asd blue
4 hty red hrr red
I need to get the left values below the right values, something like
id value color
1 asd red
1 dfs blue
2 dfs blue
2 afd green
3 ccd yellow
3 asd blue
4 hty red
4 hrr red
I tried to split in two data frames and to interleave using the id, but I got only half of the data on it, using the mod of the value of the id. Any ideas?

Take a view of the desired left and right side dfs, then rename the columns and then concat them and sort on 'id' column:
In [205]:
left = df[['id','value_left','color_left']].rename(columns={'value_left':'value','color_left':'color'})
right = df[['id','value_right','color_right']].rename(columns={'value_right':'value','color_right':'color'})
merged = pd.concat([right,left]).sort_values('id', kind='mergesort')
merged
Out[205]:
id value color
0 1 asd red
0 1 dfs blue
1 2 dfs blue
1 2 afd green
2 3 ccd yellow
2 3 asd blue
3 4 hty red
3 4 hrr red
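For reference, the same approach can be written as a self-contained sketch. The helper name `stack_sides` is made up for illustration, and the frame is rebuilt from the sample data in the question. A stable sort (`kind='mergesort'`) is assumed here so that each right row stays ahead of its matching left row.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "value_right": ["asd", "dfs", "ccd", "hty"],
    "color_right": ["red", "blue", "yellow", "red"],
    "value_left": ["dfs", "afd", "asd", "hrr"],
    "color_left": ["blue", "green", "blue", "red"],
})


def stack_sides(frame):
    """Stack the right and left column pairs under common names (hypothetical helper)."""
    parts = []
    for side in ("right", "left"):
        part = frame[["id", f"value_{side}", f"color_{side}"]]
        part = part.rename(
            columns={f"value_{side}": "value", f"color_{side}": "color"}
        )
        parts.append(part)
    # A stable sort keeps the concat order (right, then left) within each id.
    stacked = pd.concat(parts).sort_values("id", kind="mergesort")
    return stacked.reset_index(drop=True)


result = stack_sides(df)
print(result)
```

Resetting the index is optional; it only replaces the duplicated 0, 0, 1, 1, ... labels with a fresh 0..7 range.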

Related

Finding combinations of values across multiple rows and columns in pandas dataframe

I am trying to create all possible sets of values from a pandas dataframe, taking 3 values from each row.
For example, if I consider the dataframe below:
0 1 2 3 4 5 6
Table Black Brown Red Blue Green Amber
Chair Magenta Turquoise White Orange Violet Pink
Window Indigo Yellow Cerulean Grey Peach Aqua
I want to generate all possible solution sets, by taking 3 values from each of the first, second, and third rows.
This is what I tried:
from itertools import product
uniques = [df[i].unique().tolist() for i in df.columns ]
pd.DataFrame(product(*uniques), columns = df.columns)
But this generates all combinations with all 6 columns like this:
0 1 2 3 4 5 6
Table Black Brown Red Blue Green Aqua
Table Black Brown Red Blue Green Violet
Here, all values of Row 1 remain the same except for the last value, and all combinations are generated like this.
But what I need is this:
0 1 2 3 4 5 6 7 8 9
Table Black Red Blue Magenta White Orange Yellow Peach Aqua
Here, the first three values are from Row 1, the second 3 values are from Row 2, and the last 3 values are from Row 3.
Similarly, I want to display all such sets, and store them in a new dataframe.
Any help will be appreciated.
df
###
0 1 2 3 4 5 6
0 Table Black Brown Red Blue Green Amber
1 Chair Magenta Turquoise White Orange Violet Pink
2 Window Indigo Yellow Cerulean Grey Peach Aqua
from itertools import product
import random
uniques = [df[i].unique().tolist() for i in df.columns]
rl = list(product(*uniques))
pd.DataFrame(random.choices(rl))
product() generates every combination across the sets; random.choices() then picks one of them at random, for example:
0 1 2 3 4 5 6
0 Table Black Brown White Orange Peach Aqua
Supplement: how the combinations are counted
Take 3 sets with 2, 2, and 3 elements.
If you select one element from each set, how many combinations are there?
2 * 2 * 3 = 12
Let's check whether product() really yields 12 combinations.
list_of_lists = [['Yellow','Blue'],['Cat','Dog'],['Swim','Run','Sleep']]
combination = product(*list_of_lists)
combination_list = list(combination)
pd.DataFrame(combination_list)
###
0 1 2
0 Yellow Cat Swim
1 Yellow Cat Run
2 Yellow Cat Sleep
3 Yellow Dog Swim
4 Yellow Dog Run
5 Yellow Dog Sleep
6 Blue Cat Swim
7 Blue Cat Run
8 Blue Cat Sleep
9 Blue Dog Swim
10 Blue Dog Run
11 Blue Dog Sleep
Choosing one of the rows above at random then gives one set drawn from all the combinations.
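If the goal is to enumerate every set rather than draw one at random, one possible reading of the question is: choose 3 values from each row, then combine the per-row choices. The sketch below follows that reading using `itertools.combinations` and `itertools.product`. The function name `pick_k_from_each_row` is made up, and it assumes the first column (Table, Chair, Window) is a label and not one of the values to choose from.

```python
from itertools import chain, combinations, product

# The six colour values of each row from the question (labels left out by assumption).
rows = [
    ["Black", "Brown", "Red", "Blue", "Green", "Amber"],
    ["Magenta", "Turquoise", "White", "Orange", "Violet", "Pink"],
    ["Indigo", "Yellow", "Cerulean", "Grey", "Peach", "Aqua"],
]


def pick_k_from_each_row(rows, k=3):
    """Return every way to choose k values from each row (hypothetical helper)."""
    # All k-element choices for each row, keeping the row's own order.
    per_row = [list(combinations(row, k)) for row in rows]
    # One choice per row, flattened into a single tuple of len(rows) * k values.
    return [tuple(chain.from_iterable(choice)) for choice in product(*per_row)]


all_sets = pick_k_from_each_row(rows)
print(len(all_sets))  # 20 choices per row, so 20 ** 3 sets in total
print(all_sets[0])
```

The resulting list of tuples can be passed to `pd.DataFrame(all_sets)` to store the sets in a new dataframe, one set per row.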

Using groupby() and value_counts()

The goal is to identify the count of colors in each groupby().
In the expected outcome, Blue appears 3 times in the first group, Yellow appears 2 times in the second group, and Blue appears 5 times in the third group.
So far this is what I got.
df.groupby(['X','Y','Color']).Color.value_counts()
but this produces only a count of 1, since the color for each row appears once.
The final output should be like this:
Thanks in advance for any assistance.
If you pass 'size' to transform, the group size is broadcast back to every row, so the result keeps its tabular form instead of being aggregated.
df['Count'] = df.groupby(['X','Y']).Color.transform('size')
df.set_index(['X','Y'], inplace=True)
df
      Color  Count
X Y
A B    Blue      3
  B    Blue      3
  B    Blue      3
C D  Yellow      2
  D  Yellow      2
E F    Blue      5
  F    Blue      5
  F    Blue      5
  F    Blue      5
  F    Blue      5
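A self-contained sketch of the same idea follows. The frame is reconstructed from the output above, since the original input is not shown, so treat the data as an assumption. It also adds Color to the grouping keys, which matters if a single X/Y group can contain more than one color; the column name `ColorCount` is made up for illustration.

```python
import pandas as pd

# Data reconstructed from the printed output above (assumed, not from the question).
df = pd.DataFrame({
    "X": ["A"] * 3 + ["C"] * 2 + ["E"] * 5,
    "Y": ["B"] * 3 + ["D"] * 2 + ["F"] * 5,
    "Color": ["Blue"] * 3 + ["Yellow"] * 2 + ["Blue"] * 5,
})

# Row-level count: the size of each (X, Y) group, repeated on every row.
df["Count"] = df.groupby(["X", "Y"])["Color"].transform("size")

# Variant that counts each color separately within its (X, Y) group.
df["ColorCount"] = df.groupby(["X", "Y", "Color"])["Color"].transform("size")

# Aggregated alternative: one row per (X, Y, Color) combination.
summary = df.groupby(["X", "Y", "Color"]).size().reset_index(name="Count")

print(df)
print(summary)
```

With this sample data both count columns agree, because every group holds a single color.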

Pandas count size of groupby groups idiomatically

I often want a dataframe of counts for how many members are in each group after a groupby operation in pandas. I have a verbose way of doing it with size and reset index and rename, but I'm sure there is a better way.
Here's an example of what I'd like to do:
import pandas as pd
import numpy as np
np.random.seed(0)
colors = ['red','green','blue']
cdf = pd.DataFrame({
'color1':np.random.choice(colors,10),
'color2':np.random.choice(colors,10),
})
print(cdf)
#better way to do next line? (somehow use agg?)
gb_count = cdf.groupby(['color1','color2']).size().reset_index().rename(columns={0:'num'})
print(gb_count)
#cdf.groupby(['color1','color2']).count() #<-- this doesn't work
Final output:
color1 color2 num
0 blue green 1
1 blue red 1
2 green blue 3
3 red green 4
4 red red 1
To avoid getting your MultiIndex, use as_index=False:
cdf.groupby(['color1','color2'], as_index=False).size()
color1 color2 size
0 blue green 1
1 blue red 1
2 green blue 3
3 red green 4
4 red red 1
If you explicitly want to name your new column num, you can use reset_index with name=..., since size() on a groupby returns a Series:
cdf.groupby(['color1','color2']).size().reset_index(name='num')
color1 color2 num
0 blue green 1
1 blue red 1
2 green blue 3
3 red green 4
4 red red 1
Another way is to call size through agg, convert the resulting Series with to_frame (passing the preferred column name), and then reset the index:
gb_count = cdf.groupby(['color1','color2']).agg('size').to_frame('num').reset_index()
color1 color2 num
0 blue green 1
1 blue red 1
2 green blue 3
3 red green 4
4 red red 1
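A further option, not mentioned in the answers above, is `DataFrame.value_counts`, which should be available in pandas 1.1 and later. The sketch below compares it with the `size().reset_index(name=...)` idiom on a small made-up frame (the data is not the random sample from the question). Because the row order returned by `value_counts` can depend on the pandas version and the `sort` argument, the result is sorted by the key columns before comparing.

```python
import pandas as pd

# Small made-up sample, chosen so the counts are easy to verify by hand.
cdf = pd.DataFrame({
    "color1": ["red", "red", "green", "blue", "red", "green"],
    "color2": ["green", "green", "blue", "red", "red", "blue"],
})

keys = ["color1", "color2"]

# Idiom from the answers above: size() returns a Series, named on reset.
by_size = cdf.groupby(keys).size().reset_index(name="num")

# Alternative: count unique (color1, color2) rows directly.
by_value_counts = (
    cdf.value_counts(keys)
    .reset_index(name="num")
    .sort_values(keys)          # normalise row order for comparison
    .reset_index(drop=True)
)

print(by_size)
print(by_value_counts)
```

Both frames list the same four key pairs with the same counts.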

How to repeatedly loop through list to assign values

I have two pandas data frames. Within df1 I have a string column with a finite list of unique values. I want to make those values a list, then loop through and append a new column onto df2. The value would loop through the list and then start over for the entire range of the second data frame.
df1
my_value
0 A
1 B
2 C
df2
color
0 red
1 orange
2 yellow
3 green
4 blue
5 indigo
6 violet
7 maroon
8 brown
9 black
What I want
color my_value
0 red A
1 orange B
2 yellow C
3 green A
4 blue B
5 indigo C
6 violet A
7 maroon B
8 brown C
9 black A
# create list
my_list = pd.Series(df1.my_value.values).to_list()
# create column
my_new_column = []
for i in range(len(df2)):
    assigned_value = my_list[i]
    my_new_column.append(assigned_value)
df2['my_new_column'] = my_new_column
return df2
The list and the range have different lengths, which is where I'm getting hung up.
This is super straightforward and I'm completely looking past the solution, so please feel free to link me to another question if this is answered elsewhere. Thanks for your input!
# You can use zip with itertools.cycle() to cycle through the shorter list/Series
import itertools
import pandas as pd
df1 = pd.Series(data=['a','b','c'], name='my_values')
df2 = pd.Series(data=['red','orange','yellow','green','blue','indigo','violet','maroon','brown','black'], name='color')
df2 = pd.concat([df2, pd.Series([b for a, b in zip(df2, itertools.cycle(df1))], name='my_value')], axis=1)
df2
color my_value
0 red a
1 orange b
2 yellow c
3 green a
4 blue b
5 indigo c
6 violet a
7 maroon b
8 brown c
9 black a
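The same cycling idea can be applied to the two DataFrames from the question instead of Series. This is a minimal sketch that rebuilds `df1` and `df2` from the sample data and uses `itertools.islice` to cut the endless cycle to the length of `df2`.

```python
from itertools import cycle, islice

import pandas as pd

# Frames rebuilt from the sample data in the question.
df1 = pd.DataFrame({"my_value": ["A", "B", "C"]})
df2 = pd.DataFrame({
    "color": ["red", "orange", "yellow", "green", "blue",
              "indigo", "violet", "maroon", "brown", "black"],
})

values = df1["my_value"].tolist()

# cycle() repeats A, B, C forever; islice() takes exactly len(df2) items.
df2["my_value"] = list(islice(cycle(values), len(df2)))

print(df2)
```

An equivalent fix for the original loop is to index with `my_list[i % len(my_list)]`, which wraps around once `i` passes the end of the list.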

Count the amount of NaNs in each group

In this dataframe, I'm trying to count how many NaN's there are for each color within the color column.
This is what the sample data looks like. In reality, there's 100k rows.
color value
0 blue 10
1 blue NaN
2 red NaN
3 red NaN
4 red 8
5 red NaN
6 yellow 2
I'd like the output to look like this:
color count
0 blue 1
1 red 3
2 yellow 0
You can use Series.isna, group the result by the color column, and sum to add up all True rows in each group:
df.value.isna().groupby(df.color).sum().reset_index()
color value
0 blue 1.0
1 red 3.0
2 yellow 0.0
Also you may use agg() and isnull() or isna() as follows:
df.groupby('color').agg({'value': lambda x: x.isnull().sum()}).reset_index()
Or use apply with isna().sum():
df.groupby('color').value.apply(lambda x: x.isna().sum())
color
blue 1
red 3
yellow 0
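Another option uses size and count: size counts all rows in a group, count only the non-NaN ones, so their difference is the number of NaNs: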
g=df.groupby('color')['value']
g.size()-g.count()
Out[115]:
color
blue 1
red 3
yellow 0
Name: value, dtype: int64
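To try these approaches end to end, here is a small sketch that rebuilds the sample frame from the question and produces the requested `color`/`count` layout. The cast with `astype(int)` is an assumption added to avoid float counts such as 1.0 on pandas versions that return them.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "color": ["blue", "blue", "red", "red", "red", "red", "yellow"],
    "value": [10, np.nan, np.nan, np.nan, 8, np.nan, 2],
})

# Flag NaNs, group the flags by color, and add up the True values.
nan_counts = (
    df["value"].isna()
    .groupby(df["color"])
    .sum()
    .astype(int)
    .reset_index(name="count")
)

# Cross-check: all rows per group minus non-NaN rows per group.
g = df.groupby("color")["value"]
via_size = g.size() - g.count()

print(nan_counts)
print(via_size)
```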
