Copy Row(s) from One DataFrame to Another with Regex [duplicate] - python

This question already has answers here:
How to test if a string contains one of the substrings in a list, in pandas?
(4 answers)
Closed 5 months ago.
I am trying to extract specific rows from a dataframe where values in a column contain a designated string. For example, the current dataframe looks like:
df1=
Location  Value  Name    Type
Up        10     Test A  X
Up        12     Test B  Y
Down      11     Prod 1  Y
Left      8      Test C  Y
Down      15     Prod 2  Y
Right     30     Prod 3  X
I am trying to build a new dataframe with all rows that have "Test" in the 'Name' column:
df2=
Location  Value  Name    Type
Up        10     Test A  X
Up        12     Test B  Y
Left      8      Test C  Y
Is there a way to do this with regex or match?

Try:
df_out = df1[df1["Name"].str.contains("Test")]
print(df_out)
Prints:
Location Value Name Type
0 Up 10 Test A X
1 Up 12 Test B Y
3 Left 8 Test C Y

How about: df2 = df1.loc[['Test' in name for name in df1.Name]]
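Since the question asks specifically about regex or match, here is a minimal sketch that rebuilds df1 from the example. `str.contains` treats its pattern as a regular expression by default (`regex=True`), `str.match` anchors the pattern at the start of each string, and `na=False` treats missing names as non-matches. The `^test` pattern with `case=False` is only an illustrative variant, not part of the original answers.

```python
import pandas as pd

# Rebuild df1 from the question
df1 = pd.DataFrame({
    "Location": ["Up", "Up", "Down", "Left", "Down", "Right"],
    "Value": [10, 12, 11, 8, 15, 30],
    "Name": ["Test A", "Test B", "Prod 1", "Test C", "Prod 2", "Prod 3"],
    "Type": ["X", "Y", "Y", "Y", "Y", "X"],
})

# Regex search anywhere in the string; na=False treats NaN names as non-matches
df2 = df1[df1["Name"].str.contains(r"Test", regex=True, na=False)]

# str.match anchors the pattern at the start of the string, like re.match
df2_match = df1[df1["Name"].str.match(r"Test")]

# Illustrative case-insensitive variant: names that start with "test"
df2_ci = df1[df1["Name"].str.contains(r"^test", case=False, na=False)]

print(df2)
```

If you plan to modify df2 afterwards, appending `.copy()` to the selection gives an independent frame rather than a view-like slice of df1.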

Related

Pandas replace value with other column value [duplicate]

This question already has answers here:
Pandas conditional creation of a series/dataframe column
(13 answers)
Closed 25 days ago.
I have a table and I want to replace column values with other columns' values based on a condition:
Table:
A  B  C     D     E
x  1  test  fool  bar
y  3  test  fool  bar
If column C contains the word test -> value should be replaced with content of column A
If column D contains the word fool -> value should be replaced with content of column B
Expected result:
A  B  C  D  E
x  1  x  1  bar
y  3  y  3  bar
How can I create this table?
We can use np.where here:
import numpy as np
df["C"] = np.where(df["C"] == "test", df["A"], df["C"])
df["D"] = np.where(df["D"] == "fool", df["B"], df["D"])
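The question says a column "contains" the word, while the answer compares with `==`, which only matches cells equal to exactly "test" or "fool". A minimal sketch of a substring-based version, assuming the example table is loaded as `df`:

```python
import numpy as np
import pandas as pd

# The example table from the question
df = pd.DataFrame({
    "A": ["x", "y"],
    "B": [1, 3],
    "C": ["test", "test"],
    "D": ["fool", "fool"],
    "E": ["bar", "bar"],
})

# Replace C with A wherever C contains "test" (substring match, not equality)
df["C"] = np.where(df["C"].str.contains("test", na=False), df["A"], df["C"])
# Replace D with B wherever D contains "fool"
df["D"] = np.where(df["D"].str.contains("fool", na=False), df["B"], df["D"])

print(df)
```

Note that column D ends up with object dtype because it mixes the original strings with the integers from B.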

fill a dataframe with value of another dataframe according to columns value [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 10 months ago.
I have two dataframes.
The first one, let's say dfrA:
x,y,z
0,0,1
0,1,2
0,2,3
0,3,4
1,0,5
1,1,6
1,2,7
1,3,8
2,0,9
2,1,10
2,2,11
2,3,12
3,0,13
3,1,14
3,2,15
3,3,16
and another one, let's say dfrB
x,y
1,2
2,3
I would like to add a column z to dfrB containing the z value from the row of dfrA that has the same x and y.
In other words I expect:
x,y,z
1,2,7
2,3,12
I am able to add an empty column to dfrB:
df_support = pd.DataFrame(columns=['z'])
dfrB = dfrB.join(df_support, how="outer")
How can I now fill column z in dfrB? I would like to avoid a loop full of if statements.
You can try pandas.DataFrame.merge
dfrB['z'] = dfrB.merge(dfrA, on=['x', 'y'], how='left')['z']
print(dfrB)
x y z
0 1 2 7
1 2 3 12
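A slightly simpler variant keeps the merged frame directly. This sketch assumes dfrB does not already contain a z column (if the empty z column from the join is present, merge would produce z_x and z_y instead). The `validate="many_to_one"` argument is an optional safeguard that raises if an (x, y) pair appears more than once in dfrA.

```python
import pandas as pd

# dfrA: every (x, y) pair on a 4x4 grid with z = 1..16
dfrA = pd.DataFrame({
    "x": [i for i in range(4) for _ in range(4)],
    "y": [j for _ in range(4) for j in range(4)],
    "z": list(range(1, 17)),
})
dfrB = pd.DataFrame({"x": [1, 2], "y": [2, 3]})

# Left merge keeps every row of dfrB and pulls z from the matching dfrA row
dfrB = dfrB.merge(dfrA, on=["x", "y"], how="left", validate="many_to_one")
print(dfrB)
```

Rows of dfrB with no match in dfrA would get NaN in z, which also turns the column into float.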

Two column DataFrame to transition table (pivot) [duplicate]

This question already has answers here:
Get statistics for each group (such as count, mean, etc) using pandas GroupBy?
(9 answers)
How can I pivot a dataframe?
(5 answers)
Closed 3 years ago.
I have a pandas dataframe with two columns. I want to measure the transition count, that is, the number of times each unique value in the first column occurs together with each unique value in the second column. This should be a pivot or pivot_table, but I am stuck. In the code below, trial is the input dataframe and ans is the dataframe I would like to obtain by manipulating trial.
I did not find a similar question about a dataframe with only two columns. The others used pivot on a third column with a mean or sum aggfunc, and their columns were numeric so they could be aggregated. Here there are only two columns, and I want to count occurrences of non-numeric values.
If there is a similar question, it would be very helpful if someone could point me to it.
trial=pd.DataFrame({'col1':list('AABCCCDDDD'),'col2':list('XYXXXYYXZZ')})
index col1 col2
0 A X
1 A Y
2 B X
3 C X
4 C X
5 C Y
6 D Y
7 D X
8 D Z
9 D Z
ans=pd.DataFrame({'col1':list('ABCD'),'X':[1,1,2,1],'Y':[1,0,1,1],'Z':[0,0,0,2]})
ans.set_index('col1')
col1 X Y Z
A 1 1 0
B 1 0 0
C 2 1 0
D 1 1 2
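No answer is shown for this question here; one common approach (offered as a sketch, not as the linked duplicates' exact answer) is `pd.crosstab`, which counts co-occurrences of two columns. The same table can be built with `groupby(...).size().unstack(fill_value=0)`.

```python
import pandas as pd

trial = pd.DataFrame({"col1": list("AABCCCDDDD"), "col2": list("XYXXXYYXZZ")})

# Count how often each col1 value appears together with each col2 value
counts = pd.crosstab(trial["col1"], trial["col2"])

# Equivalent with groupby: size of each (col1, col2) group, pivoted to columns
counts_gb = trial.groupby(["col1", "col2"]).size().unstack(fill_value=0)

print(counts)
```

crosstab labels the row and column axes with the source column names (col1 and col2); the counts themselves match ans.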

Getting a cell value from a column based on cell value from another column [duplicate]

This question already has an answer here:
Pandas select rows and columns based on boolean condition
(1 answer)
Closed 4 years ago.
I have this dataframe
d = {'Number': [1, 2,3,4,5,6,7], 'Letters': ["a", "d","z","f","u","p","g"]}
df = pd.DataFrame(data=d)
Number Letters
0 1 a
1 2 d
2 3 z
3 4 f
4 5 u
5 6 p
6 7 g
I want to get a value from the Letters column based on the Number column.
Let's say I want to get the letter where the number is 3.
What I did was
letter = df.loc[df['Number'] == 3]
dfletter = pd.DataFrame(data=letter.values, columns = ['Number', 'Letter'])
dfletter = dfletter.drop(columns = 'Number')
which gives me what I want:
Letter
0 z
But this seems like a dumb workaround, so I am looking for a better solution
output = df.loc[df['Number'] == 3, 'Letters']
>>> df[df.Number == 3].Letters
2 z
Name: Letters, dtype: object
Or, if you really need a scalar value:
>>> df[df.Number == 3].Letters.values[0]
'z'
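A sketch of a few ways to get the scalar directly: `.loc` with a boolean mask and a column label returns a Series, `.iloc[0]` takes the first match by position, and `Series.item()` additionally raises a ValueError unless exactly one row matched. If Number values are unique, making Number the index allows a direct label lookup with `.at`.

```python
import pandas as pd

df = pd.DataFrame({"Number": [1, 2, 3, 4, 5, 6, 7],
                   "Letters": ["a", "d", "z", "f", "u", "p", "g"]})

# Boolean mask + column label in one .loc call gives a Series
matches = df.loc[df["Number"] == 3, "Letters"]

first = matches.iloc[0]   # first match, by position
exact = matches.item()    # raises ValueError unless exactly one match

# If Number is unique, use it as the index for a direct lookup
by_number = df.set_index("Number").at[3, "Letters"]

print(first, exact, by_number)
```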

Error subsetting a data frame in python [duplicate]

This question already has an answer here:
Python - splitting dataframe into multiple dataframes based on column values and naming them with those values [duplicate]
(1 answer)
Closed 4 years ago.
I am learning python and pandas and am having trouble overcoming an error while trying to subset a data frame.
I have an input data frame:
df0-
Index Group Value
1 A 10
2 A 15
3 B 20
4 C 10
5 C 10
df0.dtypes-
Group object
Value float64
I am trying to split it into separate dataframes, one per unique value of the Group column, with the output looking something like this:
df1-
Index Group Value
1 A 10
2 A 15
df2-
Index Group Value
3 B 20
df3-
Index Group Value
4 C 10
5 C 10
So far I have written this code to subset the input:
UniqueGroups = df0['Group'].unique().tolist()
OutputFrame = {}
for x in UniqueAgencies:
    ReturnFrame[str('ConsolidateReport_')+x] = UniqueAgencies[df0['Group']==x]
The code above returns the following error, which I can't quite wrap my head around. Can anyone point me in the right direction?
*** TypeError: list indices must be integers or slices, not str
You can use groupby to split df0 by the Group column:
for _, g in df0.groupby('Group'):
    print(g)
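As posted, the loop applies the boolean mask to a Python list (and mixes the names UniqueGroups/UniqueAgencies and OutputFrame/ReturnFrame); the mask has to index the dataframe df0 instead. A minimal sketch of both a corrected loop and a groupby-based dictionary, keeping the question's 'ConsolidateReport_' key prefix:

```python
import pandas as pd

df0 = pd.DataFrame({"Group": ["A", "A", "B", "C", "C"],
                    "Value": [10.0, 15.0, 20.0, 10.0, 10.0]},
                   index=[1, 2, 3, 4, 5])

# Corrected loop: filter df0 (not the list of groups) with the boolean mask
output_frames = {}
for x in df0["Group"].unique():
    output_frames["ConsolidateReport_" + x] = df0[df0["Group"] == x]

# groupby version: one sub-frame per unique Group value
output_frames_gb = {f"ConsolidateReport_{key}": group
                    for key, group in df0.groupby("Group")}

print(output_frames["ConsolidateReport_B"])
```

A dictionary keyed by group name is usually easier to work with than creating separate variables df1, df2, df3.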
