I am trying to create all possible sets of values from a pandas dataframe, taking 3 values from each row.
For example, if I consider the dataframe below:
0 1 2 3 4 5 6
Table Black Brown Red Blue Green Amber
Chair Magenta Turquoise White Orange Violet Pink
Window Indigo Yellow Cerulean Grey Peach Aqua
I want to generate all possible solution sets by taking 3 values from each of the three rows.
This is what I tried:
from itertools import product
uniques = [df[i].unique().tolist() for i in df.columns ]
pd.DataFrame(product(*uniques), columns = df.columns)
But this generates combinations across all of the columns, like this:
0 1 2 3 4 5 6
Table Black Brown Red Blue Green Aqua
Table Black Brown Red Blue Green Violet
Here, all values of Row 1 remain the same except for the last value, and all combinations are generated like this.
But what I need is this:
0 1 2 3 4 5 6 7 8 9
Table Black Red Blue Magenta White Orange Yellow Peach Aqua
Here, the first three values are from Row 1, the next three values are from Row 2, and the last three values are from Row 3.
Similarly, I want to display all such sets, and store them in a new dataframe.
Any help will be appreciated.
df
###
0 1 2 3 4 5 6
0 Table Black Brown Red Blue Green Amber
1 Chair Magenta Turquoise White Orange Violet Pink
2 Window Indigo Yellow Cerulean Grey Peach Aqua
from itertools import product
import random
import pandas as pd
uniques = [df[i].unique().tolist() for i in df.columns]
rl = list(product(*uniques))
pd.DataFrame(random.choices(rl))
product() generates every combination that takes one value from each set; if you want a single random result, pick it from the list of combinations.
0 1 2 3 4 5 6
0 Table Black Brown White Orange Peach Aqua
Supplement
Combinations with 3 sets
Given three sets of sizes 2, 2 and 3, how many combinations are there if you select one element from each set?
2 * 2 * 3 = 12
Let's check whether the total number of combinations really is 12.
list_of_lists = [['Yellow','Blue'],['Cat','Dog'],['Swim','Run','Sleep']]
combination = product(*list_of_lists)
combination_list = list(combination)
pd.DataFrame(combination_list)
###
0 1 2
0 Yellow Cat Swim
1 Yellow Cat Run
2 Yellow Cat Sleep
3 Yellow Dog Swim
4 Yellow Dog Run
5 Yellow Dog Sleep
6 Blue Cat Swim
7 Blue Cat Run
8 Blue Cat Sleep
9 Blue Dog Swim
10 Blue Dog Run
11 Blue Dog Sleep
Choosing one row at random from the table above then gives one set drawn from the combinations.
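If the goal is instead every set that takes 3 values from each row, as the question describes, one possible approach is to build the 3-value picks per row with itertools.combinations and then join one pick per row with product. This is only a sketch: it assumes column 0 holds a row label and is left out, and that the order of values within a row does not matter. The names per_row, records and all_sets are introduced here for illustration.

```python
from itertools import chain, combinations, product

import pandas as pd

df = pd.DataFrame([
    ["Table", "Black", "Brown", "Red", "Blue", "Green", "Amber"],
    ["Chair", "Magenta", "Turquoise", "White", "Orange", "Violet", "Pink"],
    ["Window", "Indigo", "Yellow", "Cerulean", "Grey", "Peach", "Aqua"],
])

# Assumption: column 0 is a row label, so only columns 1..6 are candidates.
per_row = [list(combinations(row[1:], 3)) for row in df.to_numpy().tolist()]

# Take one 3-value pick from every row and flatten it into a 9-value record.
records = [tuple(chain.from_iterable(picks)) for picks in product(*per_row)]
all_sets = pd.DataFrame(records)
```

With 6 candidate values per row there are 20 picks per row, so all_sets has 20 ** 3 = 8000 rows of 9 values each.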
I have two pandas data frames. Within df1 I have a string column with a finite list of unique values. I want to make those values a list, then loop through and append a new column onto df2. The value would loop through the list and then start over for the entire range of the second data frame.
df1
my_value
0 A
1 B
2 C
df2
color
0 red
1 orange
2 yellow
3 green
4 blue
5 indigo
6 violet
7 maroon
8 brown
9 black
What I want
color my_value
0 red A
1 orange B
2 yellow C
3 green A
4 blue B
5 indigo C
6 violet A
7 maroon B
8 brown C
9 black A
#create list
my_list = pd.Series(df1.my_value.values).to_list()
# create column
my_new_column = []
for i in range(len(df2)):
    assigned_value = my_list[i]
    my_new_column.append(assigned_value)
df2['my_new_column'] = my_new_column
return df2
The list and the range have different lengths, which is where I'm getting hung up.
This is probably super straightforward and I'm completely overlooking the solution, so please feel free to link me to another question if this is answered elsewhere. Thanks for your input!
# You can use zip with itertools.cycle() to cycle through the shorter list/Series
import itertools
import pandas as pd
df1 = pd.Series(data=['a','b','c'], name='my_values')
df2 = pd.Series(data=['red','orange','yellow','green','blue','indigo','violet','maroon','brown','black'], name='color')
# zip stops when df2 is exhausted, while cycle() keeps repeating df1
df2 = pd.concat([df2, pd.Series([b for a, b in zip(df2, itertools.cycle(df1))], name='my_value')], axis=1)
df2
color my_value
0 red a
1 orange b
2 yellow c
3 green a
4 blue b
5 indigo c
6 violet a
7 maroon b
8 brown c
9 black a
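The loop from the question can also be kept almost as is: taking the index modulo the length of the list makes it wrap around instead of running past the end. A minimal sketch, assuming df1 and df2 are data frames shaped like the ones in the question:

```python
import pandas as pd

df1 = pd.DataFrame({"my_value": ["A", "B", "C"]})
df2 = pd.DataFrame({"color": ["red", "orange", "yellow", "green", "blue",
                              "indigo", "violet", "maroon", "brown", "black"]})

my_list = df1["my_value"].tolist()

# i % len(my_list) wraps the position back to 0 after the last list element.
df2["my_value"] = [my_list[i % len(my_list)] for i in range(len(df2))]
```

This avoids the IndexError from the original loop, because the list is never indexed beyond its last position.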
Good afternoon, I am trying to split text in a column into a specific format.
Here is my table:
UserId Application
1 Grey Blue::Black Orange;White:Green
2 Yellow Purple::Orange Grey;Blue Pink::Red
I would like it to read the following:
UserId Application Role
1 Grey Blue Black Orange
1 White Green
2 Yellow Purple Orange Grey
2 Blue Pink Red
So far my code is
def unnesting(df, explode):
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat([pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    return df1.join(df.drop(explode, axis=1), how='left')
df['Application'] = df.Roles.str.split(';|::|:').map(lambda x: x[0::2])
unnesting(df.drop('Roles', axis=1), ['Application'])
The output reads:
UserId Application
1 Grey Blue
1 White
2 Yellow Purple
2 Blue Pink
I do not know how to add the second column (Role) in the code, i.e. the part that follows the :: in each entry.
Given this dataframe:
UserId Application
0 1 Grey Blue::Black Orange;White::Green
1 2 Yellow Purple::Orange Grey;Blue Pink::Red
you could at least achieve the last two columns directly via
df.Application.str.split(';', expand=True).stack().str.split('::', expand=True).reset_index().drop(columns=['level_0', 'level_1'])
which results in
0 1
0 Grey Blue Black Orange
1 White Green
2 Yellow Purple Orange Grey
3 Blue Pink Red
However, setting UserId as the index first also carries the proper UserId column through:
result = df.set_index('UserId').Application.str.split(';', expand=True).stack().str.split('::', expand=True).reset_index().drop(columns=['level_1'])
result.columns = ['UserId', 'Application', 'Role']
UserId Application Role
0 1 Grey Blue Black Orange
1 1 White Green
2 2 Yellow Purple Orange Grey
3 2 Blue Pink Red
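The same two-step split can also be written with DataFrame.explode (available since pandas 0.25), which may read more easily than stack. This is a sketch that assumes every entry uses "::" between application and role, as in the data frame above; the names pairs and parts are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "UserId": [1, 2],
    "Application": [
        "Grey Blue::Black Orange;White::Green",
        "Yellow Purple::Orange Grey;Blue Pink::Red",
    ],
})

# Step 1: one row per "application::role" pair; UserId is repeated automatically.
pairs = df.assign(Application=df["Application"].str.split(";")).explode("Application")

# Step 2: split each pair into its two halves.
parts = pairs["Application"].str.split("::", expand=True)

# to_numpy() sidesteps index alignment, since explode leaves duplicate index labels.
result = pd.DataFrame({
    "UserId": pairs["UserId"].to_numpy(),
    "Application": parts[0].to_numpy(),
    "Role": parts[1].to_numpy(),
})
```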
I have a pandas DataFrame like this:
FRUITS COLOURS
0 apple red
1 berry black
2 apple green
3 grapes green
4 apple black
5 grapes red
6 tomato black
7 tomato green
Keeping in mind the priority order of COLOURS (red > green > black), I want to eliminate all the duplicate entries in FRUITS, keeping the highest-priority colour for each fruit.
Desired output should be:
FRUITS COLOURS
0 apple red
1 berry black
2 grapes red
3 tomato green
You can set the order by converting COLOURS to an ordered categorical, then sorting and dropping the duplicate FRUITS:
df['COLOURS'] = pd.Categorical(df['COLOURS'], categories=['red','green','black'],ordered=True)
df.sort_values('COLOURS').drop_duplicates('FRUITS').sort_index()
FRUITS COLOURS
0 apple red
1 berry black
5 grapes red
7 tomato green
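An alternative that leaves the COLOURS dtype untouched is to map each colour to a numeric rank and keep the best-ranked row per fruit. A sketch under the assumption that every colour appears in the mapping; the names priority, rank and best are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "FRUITS": ["apple", "berry", "apple", "grapes", "apple", "grapes", "tomato", "tomato"],
    "COLOURS": ["red", "black", "green", "green", "black", "red", "black", "green"],
})

# Lower rank means higher priority: red > green > black.
priority = {"red": 0, "green": 1, "black": 2}
rank = df["COLOURS"].map(priority)

# idxmin() returns, for each fruit, the index label of its best-ranked row.
best = df.loc[rank.groupby(df["FRUITS"]).idxmin()].sort_index()
```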
I have a column with NaN values that I want to fill with values from another data frame, matched on a key. I was wondering if there is any simple way to do so.
Example:
I have a data frame of objects and their colors like this:
object color
0 chair black
1 ball yellow
2 door brown
3 ball **NaN**
4 chair white
5 chair **NaN**
6 ball grey
I want to fill na values in the color column with default color from the following data frame:
object default_color
0 chair brown
1 ball blue
2 door grey
So the result will be this:
object color
0 chair black
1 ball yellow
2 door brown
3 ball **blue**
4 chair white
5 chair **brown**
6 ball grey
Is there any "easy" way to do this?
Thanks :)
Use np.where together with map, after setting the key column as the index of df2, i.e.
df['color']= np.where(df['color'].isnull(),df['object'].map(df2.set_index('object')['default_color']),df['color'])
or Series.where
df['color'] = df['color'].where(df['color'].notnull(), df['object'].map(df2.set_index('object')['default_color']))
object color
0 chair black
1 ball yellow
2 door brown
3 ball blue
4 chair white
5 chair brown
6 ball grey
First create a Series of default colors and then replace the NaNs:
s = df1['object'].map(df2.set_index('object')['default_color'])
print (s)
0 brown
1 blue
2 grey
3 blue
4 brown
5 brown
6 blue
Name: object, dtype: object
df1['color']= df1['color'].mask(df1['color'].isnull(), s)
Or:
df1.loc[df1['color'].isnull(), 'color'] = s
Or:
df1['color'] = df1['color'].combine_first(s)
Or:
df1['color'] = df1['color'].fillna(s)
print (df1)
object color
0 chair black
1 ball yellow
2 door brown
3 ball blue
4 chair white
5 chair brown
6 ball grey
If the values in object are unique:
df = (df1.set_index('object')['color']
         .combine_first(df2.set_index('object')['default_color'])
         .reset_index())
Or:
df = (df1.set_index('object')['color']
         .fillna(df2.set_index('object')['default_color'])
         .reset_index())
Using loc + map:
m = df.color.isnull()
df.loc[m, 'color'] = df.loc[m, 'object'].map(df2.set_index('object').default_color)
df
object color
0 chair black
1 ball yellow
2 door brown
3 ball blue
4 chair white
5 chair brown
6 ball grey
If you're going to be doing a lot of these replacements, you should call set_index on df2 just once and save its result.
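As a sketch of that last point, the lookup Series can be built once and then reused for every fill. The data frames below are rebuilt from the question, and the name defaults is introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "object": ["chair", "ball", "door", "ball", "chair", "chair", "ball"],
    "color": ["black", "yellow", "brown", np.nan, "white", np.nan, "grey"],
})
df2 = pd.DataFrame({
    "object": ["chair", "ball", "door"],
    "default_color": ["brown", "blue", "grey"],
})

# Build the object -> default_color lookup a single time.
defaults = df2.set_index("object")["default_color"]

# Reuse it wherever a fill is needed; only the NaN entries are replaced.
df["color"] = df["color"].fillna(df["object"].map(defaults))
```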
I have a data frame like
id value_right color_right value_left color_left
1 asd red dfs blue
2 dfs blue afd green
3 ccd yellow asd blue
4 hty red hrr red
I need to get the left values below the right values, something like
id value color
1 asd red
1 dfs blue
2 dfs blue
2 afd green
3 ccd yellow
3 asd blue
4 hty red
4 hrr red
I tried to split in two data frames and to interleave using the id, but I got only half of the data on it, using the mod of the value of the id. Any ideas?
Select the left-side and right-side columns into separate dfs, rename the columns so they match, then concat them and sort on the 'id' column:
In [205]:
left = df[['id','value_left','color_left']].rename(columns={'value_left':'value','color_left':'color'})
right = df[['id','value_right','color_right']].rename(columns={'value_right':'value','color_right':'color'})
merged = pd.concat([right,left]).sort_values('id', kind='mergesort')  # stable sort keeps right above left
merged
Out[205]:
id value color
0 1 asd red
0 1 dfs blue
1 2 dfs blue
1 2 afd green
2 3 ccd yellow
2 3 asd blue
3 4 hty red
3 4 hrr red
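The same idea can be written as a loop over the column suffixes, which avoids repeating the rename for each side. This is a sketch that rebuilds the data frame from the question; the list sides fixes the order in which rows appear within each id, and a stable sort is used so that order survives the sort.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "value_right": ["asd", "dfs", "ccd", "hty"],
    "color_right": ["red", "blue", "yellow", "red"],
    "value_left": ["dfs", "afd", "asd", "hrr"],
    "color_left": ["blue", "green", "blue", "red"],
})

# Right rows first, then left rows, matching the desired output.
sides = ["right", "left"]
parts = [
    df[["id", f"value_{s}", f"color_{s}"]].rename(
        columns={f"value_{s}": "value", f"color_{s}": "color"})
    for s in sides
]

# mergesort is stable, so within each id the right row stays above the left row.
merged = (pd.concat(parts)
            .sort_values("id", kind="mergesort")
            .reset_index(drop=True))
```

Resetting the index at the end replaces the duplicated labels (0, 0, 1, 1, ...) with a plain 0..7 range.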