Get value from pandas df with only one line - python

I am creating a simulator for a football dice game and I am running into an issue getting values from the dataframe.
My code for the result of a play is this:
def off_play(off_team_chart):
    off_UPCID = int(input('Enter UPCID for Offense Play: '))
    off_play = off_team_chart[off_team_chart['UPCID'] == off_UPCID]
    oRoll = dice.oDice()
    oPlay = off_play[off_play['DieRoll'] == oRoll]
    print(oPlay)
    oResult = oPlay['ResultCodeID']
    oYards = oPlay['Yards']
    return oResult, oYards
which when run outputs the following:
        TeamChartDetailD  TeamChartID  UPCID  ...  ResultCodeID  Yards  OutOfBounds
108292            866811          874      8  ...             8     19        False

[1 rows x 7 columns]
108292    8
Name: ResultCodeID, dtype: int64
108292    19
Name: Yards, dtype: object
I would like oResult to be the int 8 and oYards to be the int 19 in this scenario. The pandas documentation seemed to suggest that I would need to know the index label 108292 in order to get the value. Is there a way around this?

The code is printing the Series along with its row index. To get the scalar values back, use the .item() method as follows:
oResult = oPlay['ResultCodeID'].item()
oYards = oPlay['Yards'].item()
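A minimal runnable sketch of the fix; the one-row frame below is a stand-in for the oPlay selection above:

```python
import pandas as pd

# stand-in for the one-row oPlay selection (index label taken from the output above)
oPlay = pd.DataFrame({'ResultCodeID': [8], 'Yards': [19]}, index=[108292])

oResult = oPlay['ResultCodeID'].item()  # plain Python int, no index label needed
oYards = oPlay['Yards'].item()
print(oResult, oYards)  # 8 19
```

Note that .item() raises a ValueError unless the Series holds exactly one element; .iloc[0] takes the first value regardless of how many rows matched.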

Related

How to filter the rows in a dataframe where a column has a list of strings as its value?

                   domain          intents  expressionId                                               name
0                      []               []             8  When I dare to be powerful – to use my strengt...
1                      []               []             7  When one door of happiness closes, another ope...
2                      []               []             6  The first step toward success is taken when yo...
3                      []               []             5  Get busy living or get busy dying
4  [financial, education]    [resolutions]             4  You know you’re in love when you can’t fall as...
5   [financial, business]  [materialistic]             3  Honesty is the best policy
Here is my dataframe, which has a domain column holding lists of strings. What I want is to fetch only the rows whose 'domain' contains 'financial', so that I get the results below:
                   domain          intents  expressionId                                               name
4  [financial, education]    [resolutions]             4  You know you’re in love when you can’t fall as...
5   [financial, business]  [materialistic]             3  Honesty is the best policy
What I have tried so far is the command below:
df['domain'].map(lambda x: 'financial' in x)
This returns a column with dtype 'bool':
0 False
1 False
2 False
3 False
4 True
5 True
Name: domain, dtype: bool
But what I want is the filtered rows, not the boolean values. Please help me with this. Thank you.
dfFinancials = df[df['domain'].map(lambda x: 'financial' in x)]
This just uses that boolean Series as a mask to select the rows. You were almost there.
dfFinancials = df[df['domain'].str.contains('financial')]
would be more elegant for a plain string column, but note that .str.contains does not work when the cells hold lists.
df[df.domain.apply(lambda v: 'financial' in v)]
df.domain.contains('financial') didn't work for me.
Without actually trying to make your dataframe, I would suggest:
new_df = old_df[old_df['domain'] == 'financial']
(Note: this exact-match comparison only applies if the column held plain strings rather than lists.)
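To summarize the thread, a runnable sketch of the membership-test approach, the only one above that works when the column holds lists (with a small frame modeled on the question's data):

```python
import pandas as pd

# small frame modeled on the question's data (two columns are enough to show the idea)
df = pd.DataFrame({
    'domain': [[], [], ['financial', 'education'], ['financial', 'business']],
    'expressionId': [8, 7, 4, 3],
})

# build a boolean mask with a plain `in` test per cell, then select with it
mask = df['domain'].map(lambda cell: 'financial' in cell)
df_financials = df[mask]
print(df_financials)
```

The same mask also works with .apply, as one of the answers shows; map and apply are interchangeable for an elementwise test on a single Series.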

How to match input data with data in a df, and subtract in a for loop

I want the input string to match a string in a file that has a fixed number of rows, and then subtract 1 from the score column of the matching row.
1!! == I think this is the for loop that finds the matching string line by line, from first to last.
2!! == this is where, once the input string has matched, the score of the matched row should be reduced by 1.
CSV file:
article = pd.read_csv('Customer_List.txt', delimiter=',',
                      names=['ID', 'NAME', 'LASTNAME', 'SCORE', 'TEL', 'PASS'])
y = len(article.ID)
line = article.readlines()
for x in range(0, y):  # 1!!
    if word in line:
        newarticle = int(article.SCORE[x]) - 1  # 2!!
        print(newarticle)
    else:
        x = x + 1
P.S. I have only been studying Python for 5 days, so please give me suggestions. Thank you.
Since I see you are using pandas, I will give a solution without any loops, as it is much easier.
You have, for example:
df = pd.DataFrame()
df['ID'] = [216, 217]
df['NAME'] = ['Chatchai', 'Bigm']
df['LASTNAME'] = ['Karuna', 'Koratuboy']
df['SCORE'] = [25, 15]
You need to do:
lookfor = str(input("Enter the name: "))
df.loc[df.NAME == lookfor, 'SCORE'] -= 1
In the lines above, you look for the entered name in the NAME column of your dataframe and reduce the score by 1 wherever there is a match, which is what you want if I understand your question correctly.
Example:
Now, let's say you search for a person called Alex. Since there is no such person, you get the same dataframe back:
Enter the name: Alex
ID NAME LASTNAME SCORE
0 216 Chatchai Karuna 25
1 217 Bigm Koratuboy 15
Now, let's say you search for Chatchai. Since there is a match and you want the score to be reduced, you will get:
Enter the name: Chatchai
ID NAME LASTNAME SCORE
0 216 Chatchai Karuna 24
1 217 Bigm Koratuboy 15
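For completeness, the whole flow can be condensed into a self-contained sketch; the input call is replaced with a fixed name here so it runs non-interactively:

```python
import pandas as pd

# the sample frame from the answer above
df = pd.DataFrame()
df['ID'] = [216, 217]
df['NAME'] = ['Chatchai', 'Bigm']
df['LASTNAME'] = ['Karuna', 'Koratuboy']
df['SCORE'] = [25, 15]

lookfor = 'Chatchai'  # stands in for input("Enter the name: ")
# decrement SCORE only on rows whose NAME matches; no-op when nothing matches
df.loc[df.NAME == lookfor, 'SCORE'] -= 1
print(df)
```

Because the boolean mask selects zero rows for an unknown name, the frame comes back unchanged in that case, exactly as the Alex example shows.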

Assignment to DataFrame not working but dtypes changed

New to data science. I want to assign target_frame to empty_frame, but the assignment does not work until I assign a second time. During the assignments, the dtypes of empty_frame change from int32 to float64 and finally end up as int64.
I tried to simplify my model in the code below; it has the same problem.
import pandas as pd
import numpy as np
dataset = [[[i for i in range(5)], ] for i in range(5)]
dataset = pd.DataFrame(dataset, columns=['test'])
empty_numpy = np.arange(25).reshape(5, 5)
empty_numpy.fill(np.nan)
# Solution 1: change the below code into 'empty_frame = pd.DataFrame(empty_numpy)' then everything will be fine
empty_frame = pd.DataFrame(empty_numpy, columns=[str(i) for i in range(5)])
series = dataset['test']
target_frame = pd.DataFrame(list(series))
# Solution 2: run `empty_frame[:] = target_frame` twice, work fine to me.
# ==================================================================
# First try.
empty_frame[:] = target_frame
print("="*40)
print(f"Data types of empty_frame: {empty_frame.dtypes}")
print("="*40)
print("Result of first try: ")
print(empty_frame)
print("="*40)
# Second try.
empty_frame[:] = target_frame
print(f"Data types of empty_frame: {empty_frame.dtypes}")
print("="*40)
print("Result of second try: ")
print(empty_frame)
print("="*40)
# ====================================================================
I expect the output of code above should be:
========================================
Data types of empty_frame: 0 int64
1 int64
2 int64
3 int64
4 int64
dtype: object
========================================
Result of first try:
0 1 2 3 4
0 0 1 2 3 4
1 0 1 2 3 4
2 0 1 2 3 4
3 0 1 2 3 4
4 0 1 2 3 4
========================================
but it does not work on my first try.
There are two workarounds for this problem, but I don't know why they work:
as shown in my code, run the assignment twice in one run.
remove the column names when creating empty_frame.
Two things I want to figure out:
why empty_frame's data types changed.
why the workarounds shown in my code solve this assignment problem.
Thanks.
If I understand your question correctly, your problem starts when you create the empty_numpy matrix.
My favourite solution would be to use empty_numpy = np.empty([5, 5]) instead (the default dtype is float64 here). Then the "Result of first try:" is correct. That means:
import pandas as pd
import numpy as np
dataset = [[[i for i in range(5)],] for i in range(5)]
dataset = pd.DataFrame(dataset, columns=['test'])
empty_numpy = np.empty([5,5])
# here you may add empty_numpy.fill(np.nan) but it's not necessary,result is the same
empty_frame = pd.DataFrame(empty_numpy, columns=[str(i) for i in range(5)])
series = dataset['test']
target_frame = pd.DataFrame(list(series))
# following assignment is correct then
empty_frame[:] = target_frame
print('='*40)
print(f'Data types of empty_frame: {empty_frame.dtypes}')
print('='*40)
print("Result of first try: ")
print(empty_frame)
print("="*40)
Or just add the dtype argument to your np.arange call, like this:
empty_numpy = np.arange(25, dtype=float).reshape(5, 5)
Then it works too (but that's a little boring ;o).
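A condensed sketch of the np.empty fix; note that it also leaves the column labels as the default RangeIndex so they line up with target_frame's integer columns (label alignment is a second pitfall in the original code, where empty_frame had string column names):

```python
import numpy as np
import pandas as pd

# same construction as the question: a column of lists, exploded into a frame
dataset = pd.DataFrame([[[i for i in range(5)]] for _ in range(5)], columns=['test'])
target_frame = pd.DataFrame(list(dataset['test']))

# float64 from the start (np.empty), default RangeIndex columns 0..4
empty_frame = pd.DataFrame(np.empty([5, 5]))
empty_frame[:] = target_frame  # lands on the first try
print(empty_frame)
```

The underlying point: an integer array has no representation for NaN, so building empty_frame from an int array and filling it with np.nan cannot produce the intended "empty" float frame; starting from a float64 array sidesteps the dtype juggling entirely.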

Converting an object column to float in pandas while removing a $ sign

I am fairly new to pandas and I am working on a project where I have a column that looks like the following:
AverageTotalPayments
$7064.38
$7455.75
$6921.90
ETC
I am trying to get the cost factor out of it, where the cost could be anything above 7000. First, this column is an object dtype, so I know I probably cannot compare it with a number directly. My code looks like the following:
import pandas as pd
health_data = pd.read_csv("inpatientCharges.csv")
state = input("What is your state: ")
issue = input("What is your issue: ")
# Create a new dataframe based on the two-letter state code
state_data = health_data[(health_data.ProviderState == state)]
# With the new data set, search for the injury the person has
issue_data = state_data[state_data.DRGDefinition.str.contains(issue.upper())]
# Replace the $ sign with '' so I have a number; I believe my code
# may be starting to break down at this point
issue_data = issue_data['AverageTotalPayments'].str.replace('$', '')
# Since the previous line took out the $, convert from object to float
issue_data = issue_data[['AverageTotalPayments']].astype(float)
# Attempt to print out the values
cost = issue_data[(issue_data.AverageTotalPayments >= 10000)]
print(cost)
When I run this code I simply get NaN back, which is not exactly what I want. Any help with what is wrong would be great! Thank you in advance.
Try this:
In [83]: df
Out[83]:
AverageTotalPayments
0 $7064.38
1 $7455.75
2 $6921.90
3 aaa
In [84]: df.AverageTotalPayments.str.extract(r'.*?(\d+\.*\d*)', expand=False).astype(float) > 7000
Out[84]:
0 True
1 True
2 False
3 False
Name: AverageTotalPayments, dtype: bool
In [85]: df[df.AverageTotalPayments.str.extract(r'.*?(\d+\.*\d*)', expand=False).astype(float) > 7000]
Out[85]:
AverageTotalPayments
0 $7064.38
1 $7455.75
Consider the pd.Series s
s
0 $7064.38
1 $7455.75
2 $6921.90
Name: AverageTotalPayments, dtype: object
This gets the float values
pd.to_numeric(s.str.replace('$', ''), 'ignore')
0 7064.38
1 7455.75
2 6921.90
Name: AverageTotalPayments, dtype: float64
Filter s
s[pd.to_numeric(s.str.replace('$', ''), 'ignore') > 7000]
0 $7064.38
1 $7455.75
Name: AverageTotalPayments, dtype: object
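Both answers condense into a short runnable sketch. Passing regex=False makes the literal replacement explicit: '$' is a regex metacharacter (end-of-string anchor), and the default handling of str.replace has changed across pandas versions.

```python
import pandas as pd

s = pd.Series(['$7064.38', '$7455.75', '$6921.90'], name='AverageTotalPayments')

# strip the literal '$' (regex=False avoids treating it as a regex anchor),
# then parse the remainder as floats
values = pd.to_numeric(s.str.replace('$', '', regex=False))

# filter the original strings by the numeric values
high = s[values > 7000]
print(high)
```

The question's comparison against the object column returned NaN/empty results because string cells like '$7064.38' never compare numerically; converting first, then filtering, is the fix.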

comparing column values based on other column values in pandas

I have a dataframe:
import pandas as pd
import numpy as np
df = pd.DataFrame([['M', 2014, 'Seth', 5],
                   ['M', 2014, 'Spencer', 5],
                   ['M', 2014, 'Tyce', 5],
                   ['F', 2014, 'Seth', 25],
                   ['F', 2014, 'Spencer', 23]],
                  columns=['sex', 'year', 'name', 'number'])
print(df)
I would like to find the most gender ambiguous name for 2014. I have tried many ways but haven't had any luck yet.
NOTE: I do write a function at the end of my answer, but I decided to run through the code part by part for better understanding.
Obtaining Gender Ambiguous Names
First, you would want to get the list of gender ambiguous names. I would suggest using set intersection:
>>> male_names = df[df.sex == "M"].name
>>> female_names = df[df.sex == "F"].name
>>> gender_ambiguous_names = list(set(male_names).intersection(set(female_names)))
Now, you want to actually subset the data to show only gender ambiguous names in 2014. You would want to use membership conditions and chain the boolean conditions as a one-liner:
>>> gender_ambiguous_data_2014 = df[(df.name.isin(gender_ambiguous_names)) & (df.year == 2014)]
Aggregating the Data
Now you have this as gender_ambiguous_data_2014:
>>> gender_ambiguous_data_2014
sex year name number
0 M 2014 Seth 5
1 M 2014 Spencer 5
3 F 2014 Seth 25
4 F 2014 Spencer 23
Then you just have to aggregate by number:
>>> gender_ambiguous_data_2014.groupby('name').number.sum()
name
Seth 30
Spencer 28
Name: number, dtype: int64
Extracting the Name(s)
Now, the last thing you want is the name with the highest number. But in reality, several gender ambiguous names might share the same total. Let's store the previous result in a new variable gender_ambiguous_numbers_2014 and play with it:
>>> gender_ambiguous_numbers_2014 = gender_ambiguous_data_2014.groupby('name').number.sum()
>>> # get the max and find the list of names:
>>> gender_ambiguous_max_2014 = gender_ambiguous_numbers_2014[gender_ambiguous_numbers_2014 == gender_ambiguous_numbers_2014.max()]
Now you get this:
>>> gender_ambiguous_max_2014
name
Seth 30
Name: number, dtype: int64
Cool, let's extract the index names then!
>>> gender_ambiguous_max_2014.index
Index([u'Seth'], dtype='object')
Wait, what the heck is this type? (HINT: it's pandas.core.index.Index)
No problem, just apply list coercion:
>>> list(gender_ambiguous_max_2014.index)
['Seth']
Let's Write This in a Function!
So, in this case, our list has only one element. But maybe we want to write a function that returns a string for the sole contender, or a list of strings if several gender ambiguous names share the same total number in that year.
In the wrapper function below, I abbreviated my variable names with ga to shorten the code. Of course, this is assuming the data set is in the same format you have shown and is named df. If it's named otherwise just change the df accordingly.
def get_most_popular_gender_ambiguous_name(year):
    """Get the gender ambiguous name with the most numbers in a certain year.

    Returns:
        a string, or a list of strings

    Note:
        'gender_ambiguous' will be abbreviated as 'ga'
    """
    # get the gender ambiguous names
    male_names = df[df.sex == "M"].name
    female_names = df[df.sex == "F"].name
    ga_names = list(set(male_names).intersection(set(female_names)))
    # filter by year
    ga_data = df[(df.name.isin(ga_names)) & (df.year == year)]
    # aggregate to get total numbers
    ga_total_numbers = ga_data.groupby('name').number.sum()
    # find the max number
    ga_max_number = ga_total_numbers.max()
    # subset the Series to only those that have max numbers
    ga_max_data = ga_total_numbers[ga_total_numbers == ga_max_number]
    # get the index (the names) for those satisfying the conditions
    most_popular_ga_names = list(ga_max_data.index)  # list coercion
    # if the list only contains one element, return that element
    if len(most_popular_ga_names) == 1:
        return most_popular_ga_names[0]
    return most_popular_ga_names
Now, calling this function is as easy as it gets:
>>> get_most_popular_gender_ambiguous_name(2014) # assuming df is dataframe var name
'Seth'
Not sure what you mean by 'most gender ambiguous', but you can start from this:
>>> dfy = (df.year == 2014)
>>> dfF = df[(df.sex == 'F') & dfy][['name', 'number']]
>>> dfM = df[(df.sex == 'M') & dfy][['name', 'number']]
>>> pd.merge(dfF, dfM, on=['name'])
name number_x number_y
0 Seth 25 5
1 Spencer 23 5
If you want just the name with highest total number then:
>>> dfT = pd.merge(dfF, dfM, on=['name'])
>>> dfT
name number_x number_y
0 Seth 25 5
1 Spencer 23 5
>>> dfT['total'] = dfT['number_x'] + dfT['number_y']
>>> dfT.sort_values('total', ascending=False).head(1)
name number_x number_y total
0 Seth 25 5 30
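As a compact alternative to both answers, groupby plus filter can find the names used by both sexes in one pass; a sketch, using the sample frame from the question:

```python
import pandas as pd

df = pd.DataFrame([['M', 2014, 'Seth', 5],
                   ['M', 2014, 'Spencer', 5],
                   ['M', 2014, 'Tyce', 5],
                   ['F', 2014, 'Seth', 25],
                   ['F', 2014, 'Spencer', 23]],
                  columns=['sex', 'year', 'name', 'number'])

# keep only names that appear with both sexes in 2014
ambiguous = df[df.year == 2014].groupby('name').filter(lambda g: g.sex.nunique() == 2)

# total counts per ambiguous name; idxmax gives the most popular one
totals = ambiguous.groupby('name').number.sum()
print(totals.idxmax())
```

This still interprets "most gender ambiguous" as "highest total count among names used by both sexes", the same reading as the answers above; a stricter definition (e.g. the most even male/female split) would need a different aggregation.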
