Replace & filter dataframe row values - python

I have two dataframes: one with expressions and another with values. The Criteria column of DataFrame 1 contains expressions that reference column names of DataFrame 2. I need to take each row of values from DataFrame 2 and substitute them into the DataFrame 1 criteria, without a loop.
How can I do this in an optimized way?
DataFrame 1:
   Criteria           point
0  chgdsl='10'        1
1  chgdt ='01022007'  2
3  chgdsl='9'         3
DataFrame 2:
   chgdsl  chgdt     chgname
0  10      01022007  namrr
1  9       02022007  chard
2  9       01022007  exprr
I expect that when I take the first row of DataFrame 2, the output for DataFrame 1 will be 10='10', 01022007 ='01022007', 10='9'.
I need to take one row at a time from DataFrame 2 and substitute its values into all rows of DataFrame 1.
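There is no answer in this thread, but a minimal sketch of one way to do the substitution for a single row of DataFrame 2 (an assumption, not the asker's code; it builds a regex from DataFrame 2's column names and uses Series.str.replace with a callable, assuming the criteria only mention those column names) could look like this:
import re
import pandas as pd

df1 = pd.DataFrame({'Criteria': ["chgdsl='10'", "chgdt ='01022007'", "chgdsl='9'"],
                    'point': [1, 2, 3]})
df2 = pd.DataFrame({'chgdsl': ['10', '9', '9'],
                    'chgdt': ['01022007', '02022007', '01022007'],
                    'chgname': ['namrr', 'chard', 'exprr']})

# one regex that matches any column name of DataFrame 2
pattern = re.compile('|'.join(map(re.escape, df2.columns)))

row = df2.iloc[0]  # take one row of DataFrame 2 at a time
# replace each column-name token in Criteria with that row's value
result = df1['Criteria'].str.replace(pattern, lambda m: str(row[m.group(0)]), regex=True)
print(result.tolist())
# ["10='10'", "01022007 ='01022007'", "10='9'"]
Applying this to every row of DataFrame 2 still means iterating over its rows (for example with df2.iterrows()), but the substitution itself runs column-wise without an explicit loop over DataFrame 1.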

Return the index of a row based on value in different dataframe's row

I have two Pandas dataframes. Both contain float values with three decimal places.
Dataframe A is a one-column dataframe with 12 rows. Dataframe B is a one-column dataframe with over 40,000 rows, which contains the 12 values of Dataframe A spread out randomly.
I need to find the indices of the values of Dataframe A within Dataframe B.
I have tried .query(), .index.value() and .where() but am unable to return the indices.
Dataframe A
Row Index  Time
0          148.521
1          112.379
...        ...
12         510.121
Dataframe B
Row Index  Time
0          0.000
1          0.025
...        ...
46871      1171.675
You can use df.loc[]:
for i in dataframe_A['Time']:
    # select the row(s) of B whose Time equals this value, keeping their index
    matches = dataframe_B.loc[dataframe_B['Time'] == i]
    print(matches)
This should return the twelve values along with their row indices from dataframe B.
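A vectorized alternative (a sketch that, like the loop above, assumes the floats compare exactly equal) is to build a boolean mask with isin and read the index off the filtered frame:
# rows of B whose Time value appears in A, together with their index labels
matches = dataframe_B[dataframe_B['Time'].isin(dataframe_A['Time'])]
print(matches.index.tolist())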

Copying (assembling) the column from smaller data frames into the bigger data frame with pandas

I have a data frame with measurements for several groups of participants, and I am doing some calculations for each group. I want to add a column to a big data frame (all participants) from secondary data frames (partial lists of participants).
When I merge a couple of times (merging a new data frame into the existing one), it creates a duplicate of the column instead of a single column.
As the sizes of the dataframes are different, I cannot compare them directly.
I tried
# df1 - main, bigger dataframe; df2 - smaller dataframe containing a subset of df1's participants
for i in range(len(df1)):
    # checking indices to place the data with the correct participant:
    if df1.index[i] not in df2['index']:
        pass
    else:
        df1['rate'][i] = list(df2['rate'][df2['index'] == i])
It does not work properly though. Can you please help with the correct way of assembling the column?
Update: where the index of the main dataframe matches the "index" column of the calculation dataframe, copy the rate value from the calculation into the main df.
main dataframe df1
index  rate
1      0
2      0
3      0
4      0
5      0
6      0
dataframe with calculated values
index  rate
1      value
4      value
6      value
output df
index  rate
1      value
2      0
3      0
4      value
5      0
6      value
Try this, using .join() to merge the dataframes on their indices and .combine_first() to combine the two rate columns:
df = df1.join(df2, lsuffix="_df1", rsuffix="_df2")
df["rate"] = df["rate_df2"].combine_first(df["rate_df1"])
EDIT:
This assumes both dataframes use a matching index. If that is not the case for df2, run this first:
df2 = df2.set_index('index')
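Applied to the example data above, a runnable sketch (the 'value' entries are placeholders taken from the question's tables):
import pandas as pd

df1 = pd.DataFrame({'index': [1, 2, 3, 4, 5, 6], 'rate': [0, 0, 0, 0, 0, 0]}).set_index('index')
df2 = pd.DataFrame({'index': [1, 4, 6], 'rate': ['value', 'value', 'value']}).set_index('index')

df = df1.join(df2, lsuffix="_df1", rsuffix="_df2")
# take the calculated rate where it exists, otherwise keep the original 0
df["rate"] = df["rate_df2"].combine_first(df["rate_df1"])
print(df["rate"])
# rows 1, 4 and 6 get 'value'; the rest keep 0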

How to append column values of one dataframe to column of another dataframe

I'm working with 2 dataframes, A and B. Dataframe A is populated with values, while dataframe B is empty except for a header structure.
I want to take the values of a column in dataframe A and append them to the corresponding column in dataframe B.
I placed the values of the dataframe A column I want to append in a list and tried setting the destination column equal to that list:
dataframeB[x] = list(dataframeA[A])
This yields the following error:
ValueError: Length of values does not match length of index
The result I expect is that Dataframe A's column A transfers over to Dataframe B's column x.
Dataframe A
A  B  C  D
1  2  3  4
1  2  3  4
Dataframe B
x  y
-  -
Create the dataframe with the data already in it:
dataframeB = pd.DataFrame({'x': dataframeA['A']})
Then you can add columns in from the other dataframe:
dataframeB['y'] = dataframeA['B']
Result:
x y
1 2
1 2
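For completeness, a runnable sketch with the question's sample data (the numbers are taken from the example tables above):
import pandas as pd

dataframeA = pd.DataFrame({'A': [1, 1], 'B': [2, 2], 'C': [3, 3], 'D': [4, 4]})

# building B directly from A's column sidesteps the length mismatch that
# assigning a list into an empty (0-row) dataframe raises
dataframeB = pd.DataFrame({'x': dataframeA['A']})
dataframeB['y'] = dataframeA['B']
print(dataframeB)
#    x  y
# 0  1  2
# 1  1  2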

Occurrence frequency from a list against each row in a Pandas dataframe

Let's say I have a list of 6 integers named ‘base’ and a dataframe of 100,000 rows with 6 columns of integers as well.
I need to create an additional column which shows the frequency of occurrences of the list ‘base’ against each row in the dataframe.
The order of the integers, both in the list ‘base’ and in the dataframe, is to be ignored in this case.
The occurrence frequency can have a value ranging from 0 to 6.
0 means that none of the 6 integers in the list ‘base’ match any of the 6 columns of a row in the dataframe.
Can anyone shed some light on this please?
you can try this:
import pandas as pd

# create a frame with six columns of ints
df = pd.DataFrame({'a': [1, 2, 3, 4, 10],
                   'b': [8, 5, 3, 2, 11],
                   'c': [3, 7, 1, 8, 8],
                   'd': [3, 7, 1, 8, 8],
                   'e': [3, 1, 1, 8, 8],
                   'f': [7, 7, 1, 8, 8]})

# list of ints
base = [1, 2, 3, 4, 5, 6]

# define a function to count membership in the list
def base_count(y):
    return sum(True for x in y if x in base)

# apply the function row-wise using the axis=1 parameter
df.apply(base_count, axis=1)
outputs:
0 4
1 3
2 6
3 2
4 0
dtype: int64
then assign it to a new column:
df['g'] = df.apply(base_count, axis=1)
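A vectorized alternative (a sketch of the same count using DataFrame.isin instead of a Python-level apply, which is usually much faster on 100,000 rows):
# isin marks which cells appear in `base`; summing across columns counts matches per row
df['g'] = df.isin(base).sum(axis=1)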

Dividing two columns of an unstacked dataframe

I have two columns in a pandas dataframe.
Column 1 is ed and contains strings (e.g. 'a','a','b','c','c','a')
ed column = ['a','a','b','c','c','a']
Column 2 is job and also contains strings (e.g. 'aa','bb','aa','aa','bb','cc')
job column = ['aa','bb','aa','aa','bb','cc'] #these are example values from column 2 of my pandas data frame
I then generate a two column frequency table like this:
my_counts= pdata.groupby(['ed','job']).size().unstack().fillna(0)
Now how do I divide the frequencies in one column of that frequency table by the frequencies in another column? I want to take that ratio and use it with argsort() so that I can sort by the calculated ratio, but I don't know how to reference each column of the resulting table.
I initialized the data as follows:
ed_col = ['a','a','b','c','c','a']
job_col = ['aa','bb','aa','aa','bb','cc']
pdata = pd.DataFrame({'ed':ed_col, 'job':job_col})
my_counts= pdata.groupby(['ed','job']).size().unstack().fillna(0)
Now my_counts looks like this:
job  aa  bb  cc
ed
a     1   1   1
b     1   0   0
c     1   1   0
To access a column, you could use my_counts.aa or my_counts['aa'].
To access a row, you could use my_counts.loc['a'].
So the frequencies of aa divided by bb are my_counts['aa'] / my_counts['bb']
and now, if you want to get it sorted, you can do:
my_counts.iloc[(my_counts['aa'] / my_counts['bb']).argsort()]
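Equivalently (a sketch, not part of the original answer), you can compute the ratio once and reorder with sort_values, which reads a little more explicitly:
# compute the ratio, then reindex the frequency table in that order
ratio = my_counts['aa'] / my_counts['bb']
sorted_counts = my_counts.loc[ratio.sort_values().index]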
