Querying a pandas DataFrame column against the main data source - python

Hi guys, I'm new to Python and I want to learn how to query a data column against my main data source.
This is my pandas DataFrame:
[In] top10_athletes = athletes.head(10)
top10_athletes = top10_athletes.rename(columns={'index': 'Name', 'Name': 'Medal Count'})
top10_athletes.index = np.arange(1, len(top10_athletes) + 1)
top10_athletes
[Out]
Name Medal Count
1 Michael Fred Phelps, II 28
2 Larysa Semenivna Latynina (Diriy-) 18
3 Nikolay Yefimovich Andrianov 15
4 Ole Einar Bjørndalen 13
5 Borys Anfiyanovych Shakhlin 13
6 Edoardo Mangiarotti 13
7 Takashi Ono 13
8 Birgit Fischer-Schmidt 12
9 Paavo Johannes Nurmi 12
10 Sawao Kato 12
I want to query all the values in the Name column against my main data source.
The only way that I could think of is this piece of code I found by searching Google:
df.query("Name == 'Michael Fred Phelps, II'")
Thanks guys!

To query by the Name column you can use the following:
df[df["Name"] == 'Michael Fred Phelps, II']

Related

How to delete a column in pandas if a row does not contain the value SpaceX?

I have an Excel file to analyze, but it has a lot of data that I don't need. Can we delete a column if we don't find the string SpaceX in its first row, like the following?
SL#  State  District  10/01/2021  10/01/2021  10/01/2021  11/01/2021  11/01/2021  11/01/2021
                      SpaceX in   Star in     StarX out   SpaceX out  Star out    StarX in
1    wb     al        10          11          12          13          14          15
2    wb     not       23          22          20          24          25          25
Now I want to delete the columns where SpaceX does not appear in that row, and then delete the SpaceX label row itself so the remaining rows shift up. The final output should look like this:
SL#  State  District  10/01/2021  11/01/2021
1    wb     al        10          13
2    wb     not       23          24
I tried the loc and iloc functions but have no clue at the moment.
I also checked the answer Drop columns if rows contain a specific value in Pandas, but it's different: I'm checking for a substring, not an exact value match.
First, create a boolean mask with the startswith() and fillna() methods:
mask = df.loc[0].str.startswith('SpaceX').fillna(True)
Finally, use the transpose attribute (T), the loc accessor, and the drop() method:
df = df.T.loc[mask].T.drop(0)
Output of df:
SL# State District 2021-01-10 00:00:00 2021-01-11 00:00:00 2021-01-12 00:00:00
1 1.0 wb al 10 13 16
2 2.0 wb not 23 13 16
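
For reference, here is a self-contained sketch of the same approach on a toy frame mirroring the question. The .1/.2 column suffixes are an assumption about how pandas deduplicates the repeated date headers when reading the file:

import pandas as pd

# Row 0 carries the second-level labels; the ID columns hold None there.
df = pd.DataFrame({
    'SL#': [None, 1, 2],
    'State': [None, 'wb', 'wb'],
    'District': [None, 'al', 'not'],
    '10/01/2021': ['SpaceX in', 10, 23],
    '10/01/2021.1': ['Star in', 11, 22],
    '10/01/2021.2': ['StarX out', 12, 20],
    '11/01/2021': ['SpaceX out', 13, 24],
    '11/01/2021.1': ['Star out', 14, 25],
    '11/01/2021.2': ['StarX in', 15, 25],
})

# True for columns whose first row starts with 'SpaceX'; the ID columns
# produce NaN (no string to test) and are kept via fillna(True).
mask = df.loc[0].str.startswith('SpaceX').fillna(True)

# Transpose, keep the masked columns, transpose back, drop the label row.
result = df.T.loc[mask].T.drop(0)
print(result)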

How to recode columns based on a condition

I have big data to analyze, with many rows and columns.
I would like to make a new column ('Recode_Brand') that copies the 'Brand' column, but only keeps the top 10 brands and recodes everything else as 'Others'.
How can I write that condition and logic?
It would be perfect if I could use a condition like the one below:
Brand_list = ['Google', 'Apple', 'Amazon', 'Microsoft', 'Tencent', 'Facebook', 'Visa', 'McDonald's', 'Alibaba', 'AT&T']
I am quite new to pandas and need your support. Much appreciated in advance.
Just use the 2018 column, for example:
df['Recode_Brand'] = df.apply(lambda row: row['Brand'] if row['2018'] <= 10 else 'Other', axis=1)
Otherwise, if you need that brands list, you can do:
Brand_list = ["Google", "Apple", "Amazon", "Microsoft", "Tencent", "Facebook", "Visa", "McDonald's", "Alibaba", "AT&T"]
df['Recode_Brand'] = df.apply(lambda row: row['Brand'] if row['Brand'] in Brand_list else 'Other', axis=1)
NB: if your string contains a ' character, as in McDonald's, you have to either wrap the whole string in double quotes (") or escape that character with \'.
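For example, both of these spellings produce the same string:

a = "McDonald's"   # wrap the whole string in double quotes
b = 'McDonald\'s'  # or escape the apostrophe
assert a == b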
Use numpy.where to check whether each Brand is in the top 10 and add a new column:
import numpy as np
import pandas as pd

df = pd.DataFrame({'2018': [7, 8, 3, 12, 15, 16, 10, 9, 4, 5, 11, 1, 14, 2, 13, 6],
                   'Brand': ['Google', 'Apple', 'Amazon', 'Microsoft', 'Tencent', 'Facebook',
                             'Visa', 'McDonalds', 'Alibaba', 'AT&T', 'IBM', 'Verizon',
                             'Marlboro', 'Coca-Cola', 'MasterCard', 'UPS']})
Create a new DataFrame with the top 10 brands:
top10 = df.nsmallest(10, '2018')
Then add a new column, Recode_Brand: the brand itself if it is in top10, else 'Others':
df['Recode_Brand'] = np.where(df['Brand'].eq(top10['Brand']), df['Brand'], 'Others')
print(df)
2018 Brand Recode_Brand
0 7 Google Google
1 8 Apple Apple
2 3 Amazon Amazon
3 12 Microsoft Others
4 15 Tencent Others
5 16 Facebook Others
6 10 Visa Visa
7 9 McDonalds McDonalds
8 4 Alibaba Alibaba
9 5 AT&T AT&T
10 11 IBM Others
11 1 Verizon Verizon
12 14 Marlboro Others
13 2 Coca-Cola Coca-Cola
14 13 MasterCard Others
15 6 UPS UPS

Pandas, DataFrame: conditional sum of column for each row

I am new to Python and trying to move some of my work from Excel to Python, and I want an Excel SUMIFS equivalent in pandas, for example something like:
SUMIFS(F:F, D:D, "<="&C2, B:B, B2, F:F, ">"&0)
In my case, I have 6 columns: a unique trade ID, an issuer, a trade date, a release date, a trader, and a quantity. I want a column that shows the sum of the quantity available for release at each row, something like the below:
A B C D E F G
ID Issuer TradeDate ReleaseDate Trader Quantity SumOfAvailableRelease
1 Horse 1/1/2012 13/3/2012 Amy 7 0
2 Horse 2/2/2012 15/5/2012 Dave 2 0
3 Horse 14/3/2012 NaN Dave -3 7
4 Horse 16/5/2012 NaN John -4 9
5 Horse 20/5/2012 10/6/2012 John 2 9
6 Fish 6/6/2013 20/6/2013 John 11 0
7 Fish 25/6/2013 9/9/2013 Amy 4 11
8 Fish 8/8/2013 15/9/2013 Dave 5 11
9 Fish 25/9/2013 NaN Amy -3 20
Usually, in Excel, I just drag the SUMIFS formula down the whole column and it works; I am not sure how I can do that in Python.
Many thanks!
What you could do is use df.where.
So, for example, you could say:
Qdf = df.where(df["Quantity"] >= 5)
and then do your sum. I don't know exactly what you want to do, since I have zero knowledge of Excel, but I hope this helps.
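
For the SUMIFS logic itself, one row-wise translation uses apply: for each row, build a boolean mask that mirrors the three SUMIFS criteria and sum Quantity over it. A minimal sketch, assuming the date columns are parsed as datetimes (only the columns the formula touches are included):

import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Issuer': ['Horse', 'Horse', 'Horse'],
    'TradeDate': pd.to_datetime(['2012-01-01', '2012-02-02', '2012-03-14']),
    'ReleaseDate': pd.to_datetime(['2012-03-13', '2012-05-15', None]),
    'Quantity': [7, 2, -3],
})

def sum_available(row):
    # Excel: SUMIFS(F:F, D:D, "<="&C2, B:B, B2, F:F, ">"&0)
    mask = (
        (df['ReleaseDate'] <= row['TradeDate'])   # D:D <= C2
        & (df['Issuer'] == row['Issuer'])         # B:B = B2
        & (df['Quantity'] > 0)                    # F:F > 0
    )
    return df.loc[mask, 'Quantity'].sum()

df['SumOfAvailableRelease'] = df.apply(sum_available, axis=1)
print(df)  # the three rows get 0, 0, 7, matching the table above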

How to extract info from original dataframe after doing some analysis on it?

So I had a dataframe and I had to do some cleansing to minimize the duplicates. To do that, I created a dataframe that had only 8 of the original 40 columns. Now I need two columns from the original dataframe for further analysis, but they would have messed with the desired outcome if I had used them in my previous analysis. Does anyone have an idea how to "extract" these columns based on the new "clean" dataframe I have?
You can merge the new "clean" dataframe with the other two variables by using the indexes. Let me use a practical example. Suppose the "initial" dataframe, called "df", is:
df
name year reports location
0 Jason 2012 4 Cochice
1 Molly 2012 24 Pima
2 Tina 2013 31 Santa Cruz
3 Jake 2014 2 Maricopa
4 Amy 2014 3 Yuma
while the "clean" dataframe is:
d1
year location
0 2012 Cochice
2 2013 Santa Cruz
3 2014 Maricopa
The remaining columns are saved in dataframe "d2" ( d2 = df[['name','reports']] ):
d2
name reports
0 Jason 4
1 Molly 24
2 Tina 31
3 Jake 2
4 Amy 3
By using an inner join on the indexes, d1.merge(d2, how='inner', left_index=True, right_index=True), you get the following result:
name year reports location
0 Jason 2012 4 Cochice
2 Tina 2013 31 Santa Cruz
3 Jake 2014 2 Maricopa
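
Put together as a runnable sketch (rebuilding the toy frames above):

import pandas as pd

df = pd.DataFrame({
    'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
    'year': [2012, 2012, 2013, 2014, 2014],
    'reports': [4, 24, 31, 2, 3],
    'location': ['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'],
})
d1 = df.loc[[0, 2, 3], ['year', 'location']]  # the "clean" dataframe
d2 = df[['name', 'reports']]                  # the columns to bring back

# The inner join on the row indexes keeps only the rows that
# survived the cleaning step.
merged = d1.merge(d2, how='inner', left_index=True, right_index=True)
print(merged)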
You can make a new dataframe with the specified columns:
import pandas as pd

# If your columns are named a, b, c, d, etc.
df1 = df[['a', 'b']]

# This will extract columns 0 and 1 by position
# (remember that pandas indexes columns from zero,
# and the end of an iloc slice is exclusive!)
df2 = df.iloc[:, 0:2]
If you could provide a sample piece of data, that would make it easier for us to help you.

Split one table into multiple tables based on one column [duplicate]

This question already has an answer here:
Convert pandas.groupby to dict
(1 answer)
Closed 4 years ago.
Given a table (/DataFrame) x:
name day earnings revenue
Oliver 1 100 44
Oliver 2 200 69
John 1 144 11
John 2 415 54
John 3 33 10
John 4 82 82
Is it possible to split the table into two tables based on the name column (which acts as an index), and nest the two tables under the same object (I'm not sure about the exact terms to use)? So in the example above, tables[0] would be:
name day earnings revenue
Oliver 1 100 44
Oliver 2 200 69
and tables[1] will be:
name day earnings revenue
John 1 144 11
John 2 415 54
John 3 33 10
John 4 82 82
Note that the number of rows in each 'sub-table' may vary.
Cheers,
Create a dictionary of DataFrames:
dfs = dict(tuple(df.groupby('name')))
Then select by key, i.e. the value of the name column:
print (dfs['Oliver'])
print (dfs['John'])
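
Spelled out on the sample data, the whole thing is a few lines:

import pandas as pd

df = pd.DataFrame({
    'name': ['Oliver', 'Oliver', 'John', 'John', 'John', 'John'],
    'day': [1, 2, 1, 2, 3, 4],
    'earnings': [100, 200, 144, 415, 33, 82],
    'revenue': [44, 69, 11, 54, 10, 82],
})

# groupby yields (key, sub-frame) pairs; dict() turns them into a
# name -> DataFrame mapping, so each 'sub-table' keeps its own length.
dfs = dict(tuple(df.groupby('name')))
print(dfs['Oliver'])
print(dfs['John'])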
