Reading an Excel file based on 2 conditions in Python 3.4 - python

I have an Excel file that has 5 columns holding player data.
Column A: Player name, Column B: Team, Column C: Points, Column D: Cost, Column E: Position.
I know how to get the price of the player by entering the player name as follows:
from openpyxl import load_workbook

print("Going to execute the Player Choices")
total = {}
for player in range(4):
    player = input("Enter player name")
    wb = load_workbook("LeaguePlayers.xlsx")
    ws = wb.active
    for cell in ws.columns[0]:  # get first column
        if cell.value == player:
            cost = ws.cell(row=cell.row, column=4).value
            position = ws.cell(row=cell.row, column=5).value
            print("{0} ({2}) costs {1}".format(player, cost, position))
            total[player] = cost
            break
print("Total Spend is: ", sum(total.values()), "Million")
print("End of player choices")
print(total)
What I want to know is how to get a player's price only if the player I have searched for has the position "Midfielder" in column E. So, just to be clear: if I want the price of a midfielder and I type Rooney, it should look in column E, realise this is not a midfielder, and prompt me to enter again until I enter a player who is a midfielder, at which point the price is displayed.
Any pointers much appreciated.
Thanks

To be honest, the best solution to this problem is to take the data from Excel and put it in a database which you can then query. Alternatively, you might want to query the data with Excel using the excellent xlwings.
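If you want to stay with openpyxl, here is a minimal sketch of how the position check could look, assuming the column layout described in the question (column A = name, column D = cost, column E = position). It keeps prompting until the entered player is found and is a midfielder; treat it as an outline rather than a tested solution:
from openpyxl import load_workbook

wb = load_workbook("LeaguePlayers.xlsx")
ws = wb.active

wanted_position = "Midfielder"
total = {}

for _ in range(4):
    while True:
        player = input("Enter player name: ")
        found = False
        for name_cell in ws["A"]:  # column A holds the player names
            if name_cell.value == player:
                found = True
                cost = ws.cell(row=name_cell.row, column=4).value      # column D: cost
                position = ws.cell(row=name_cell.row, column=5).value  # column E: position
                break
        if found and position == wanted_position:
            print("{0} ({1}) costs {2}".format(player, position, cost))
            total[player] = cost
            break
        print("Player not found or not a {0}, please try again".format(wanted_position))

print("Total Spend is:", sum(total.values()), "Million")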

Related

How to compare all rows of a DataFrame with each other and alter values in a timely manner? [duplicate]

This question already has answers here:
How can I pivot a dataframe?
I have a pandas DataFrame of 70,000 tennis games (rows) with two issues:
Every game is duplicated, because for every game between player A and player B there is a row where A is player A and B is player B, and another row where B is player A and A is player B. This happens because I extracted all games played for each player, so I have all the games Nadal played, and then all the games Federer played. For the games I extracted from Nadal's page, Nadal is player A and Federer is player B, and for the games I extracted from Federer's page, Federer is player A and Nadal is player B.
The second issue is that for every game I only have info about player A. Using the example above, for the games I extracted where Nadal is player A, facing Federer, I have Nadal's height, age and ranking, but I don't have that info for Federer. And for the games I extracted where Federer is player A, facing Nadal, I have Federer's height, age and ranking, but I don't have that info for Nadal.
Below is an example of the data for a better understanding:
| Player A | Rank | Height | Age | Tourn. | Year | Round | Player B | Result |
| Nadal | 3 | 185 | 37 | US Open | 2019 | Finals | Federer | W |
| Federer | 7 | 183 | 40 | US Open | 2019 | Finals | Nadal | L |
My objective is to add in the same row the information of both players like this:
| Player A | Rank | Height | Age | Tourn. | Year | Round | Player B | Rank_B | Height_B | Age_B | Result |
| Nadal | 3 | 185 | 37 | US Open | 2019 | Finals | Federer | 7 | 183 | 40 | W |
And then remove all duplicate lines.
I have already solved the issue with a for loop inside a for loop, comparing every line. Once the criteria I set are met, I change the lines. I consider a game a duplicate if the same players face each other in the same year, tournament and round.
import pandas as pd
import numpy as np

games = pd.read_csv("games.csv")

# create the new columns to hold the opponent's info:
games["Rank_B"] = np.nan
games["Height_B"] = np.nan
games["Age_B"] = np.nan

# loop through every line:
for i in range(0, len(games)):
    # if the row was already marked for deletion, skip it
    if games.loc[i, "p_name"] == "Delete":
        continue
    # compare each line to every other line:
    for j in range(0, len(games)):
        if (games.loc[i, "Tourn."] == games.loc[j, "Tourn."]
                and games.loc[i, "Year"] == games.loc[j, "Year"]
                and games.loc[i, "Round"] == games.loc[j, "Round"]
                and games.loc[i, "Player A"] == games.loc[j, "Player B"]):
            games.loc[i, "Height_B"] = games.loc[j, "Height"]
            games.loc[i, "Rank_B"] = games.loc[j, "Rank"]
            games.loc[i, "Age_B"] = games.loc[j, "Age"]
            # mark the row as a duplicate to delete later:
            games.loc[j, "p_name"] = "Delete"
            break

games = games[games["p_name"].str.contains("Delete") == False]
The problem is that my solution is very slow, taking a whopping 12 hours to run for 70,000 rows. If I want to run this code on a DataFrame of 1,000,000 rows, this solution is impractical.
Can you think of a better way to accomplish my objective?
Try with merge:
df = pd.merge(left=df, right=df, on=['Tourn.','Round','Year'])
Then remove duplicates:
df.drop_duplicates(subset=['Tourn.','Round','Year'], inplace=True)
After that, you just need to rename the columns.
You can then leave only rows with the same playerA & playerB:
df = df[df['Player A_x'] == df['Player A_y']]
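For reference, here is a fuller sketch of the merge idea using the column names from the sample data above; it is an untested outline rather than a drop-in solution:
import pandas as pd

games = pd.read_csv("games.csv")

# Pair every row with its mirrored duplicate: same tournament, year and round,
# with the two players swapped. Right-hand columns get a "_B" suffix.
merged = pd.merge(games, games, on=["Tourn.", "Year", "Round"], suffixes=("", "_B"))
merged = merged[
    (merged["Player A"] == merged["Player B_B"])
    & (merged["Player B"] == merged["Player A_B"])
]

# Keep the original columns plus the opponent's attributes.
result = merged[
    ["Player A", "Rank", "Height", "Age", "Tourn.", "Year", "Round",
     "Player B", "Rank_B", "Height_B", "Age_B", "Result"]
]

# Each game still appears twice (once from each player's page), so keep one row
# per unordered player pair within the same tournament, year and round.
result = result.assign(
    pair=["|".join(sorted(p)) for p in zip(result["Player A"], result["Player B"])]
).drop_duplicates(subset=["Tourn.", "Year", "Round", "pair"]).drop(columns="pair")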
Just a thought, but typically when I loop through a DataFrame I use the iterrows function.
Instead of:
for i in range(len(games)):
use something like:
for index, row in games.iterrows():
and then use row['Column'] to locate the value you are interested in. I think this will speed up the loop a bit.
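As a rough sketch, the loop from the question could read like this with iterrows (same column names as above; note it is still quadratic, so the merge approach above will scale far better):
for i, row_i in games.iterrows():
    if games.loc[i, "p_name"] == "Delete":  # read the live frame, not the iterrows snapshot
        continue
    for j, row_j in games.iterrows():
        if (row_i["Tourn."] == row_j["Tourn."]
                and row_i["Year"] == row_j["Year"]
                and row_i["Round"] == row_j["Round"]
                and row_i["Player A"] == row_j["Player B"]):
            games.loc[i, ["Rank_B", "Height_B", "Age_B"]] = (
                row_j["Rank"], row_j["Height"], row_j["Age"])
            games.loc[j, "p_name"] = "Delete"  # mark the mirrored row for removal
            break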

How to sum specific values in a csv file in Python?

I am trying to search through a CSV file for certain criteria and print the sum of everything that fits those criteria.
Example data:
| city | state | college | cases |
|Huntsville | Alabama | Alabama A&M University | 42 |
etc., for hundreds of lines. I would like to be able to search the data, for example for the state of Alabama, and sum all cases for that state.
This is what I have so far:
category = input("What would you like to look up? Please enter 'city', 'state', or 'college': ")
if category == "city":
    city = input("Enter a city: ")
    for row in reader:
        if row[0] == city:
            print("The city of", city, "has had a total of", row[3], "cases at", row[2])
    print("All cities with the name", city, "have a total of", sum(row[3]), "cases.")
The indexes used (row[0], row[2], row[3]) correspond to the columns I need in the original CSV file. All the code works, except for my last line, where the sum call clearly does not work. While playing around with different options, it does not like that row[3] is a string (even though the cases are all numbers). Is there a better way to do this? Thank you.
sum(row[3]), assuming it works at all, is just going to return row[3]. You need to change your code as follows.
category = input("What would you like to look up? Please enter 'city', 'state', or 'college': ")
if category == "city":
    city = input("Enter a city: ")
    sum = 0  # running total (note: this shadows the built-in sum())
    for row in reader:
        if row[0] == city:
            print("The city of", city, "has had a total of", row[3], "cases at", row[2])
            sum += int(row[3])
    print("All cities with the name", city, "have a total of", sum, "cases.")
You won't know the total for the city until you have read all the rows for city.
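For completeness, here is a minimal self-contained version of that approach; the file name and the header skip are assumptions, since the question does not show how reader is created:
import csv

city = input("Enter a city: ")
total = 0

with open("cases.csv", newline="") as f:  # hypothetical file name
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        if row[0] == city:        # column 0 is the city
            total += int(row[3])  # column 3 is the cases count

print("All cities with the name", city, "have a total of", total, "cases.")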
You're getting a data structure from csv.reader that is either a list or a dictionary. I'll assume it's a list. The easy way is:
total = 0
for line in csvdata:
    if line[1] == 'Alabama':
        total += int(line[3])
That can be turned into list comprehension form:
total = sum([int(x[3]) for x in csvdata if x[1] == 'Alabama'])
(Updated, thanks for the corrections.)

How would I use a for loop to display a list of numbers as the output?

I recently came across this website that uses an API to call up companies and display their company numbers as output. I used it as a basis and was then trying to start from one specific company number, adding 1 to it each time the code loops. How would I be able to do this? At the moment the display is random numbers; any help would be great, thank you.
For example, currently the code runs and displays a series of random numbers. I want it to display numbers starting from 09628955 and then add 1 to that number each time: 09628956, 09628957, etc.
ComapnySICS.py file
import datetime
import pandas as pd

df = pd.read_csv(company_numbers_file)
ch_api = CompaniesHouseService("API KEY")

# Start timer
tic = datetime.datetime.now()

for index, row in df.iterrows():
    company_number = row["Company Number"]
    ch_profile = ch_api.get_company_profile(company_number)
    df.at[index, "Company Name"] = ch_profile.get("company_name", None)
    sics = ch_profile.get("sic_codes", [None])
    for i in range(0, len(sics)):
        df.at[index, f"SIC {i+1}"] = sics[i]
    print(f"Number: {row['Company Number']} | "
          f"Name: {df.at[index, 'Company Name']}")

# End timer
toc = datetime.datetime.now()
avg_time = ((toc - tic).total_seconds()) / (len(df.index) - 1)
print(f"Average time between API calls: {avg_time:0.2f} seconds")
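One way to get sequential numbers instead of whatever order the CSV happens to contain is to generate the company numbers yourself and loop over them. The example number in the question is an 8-character string with a leading zero, so the number needs to be zero-padded when it is formatted. A rough sketch (the starting number is taken from the question; the API call is only indicated as a comment because CompaniesHouseService comes from the linked tutorial):
start = 9628955  # first company number from the question, as an integer
how_many = 5     # hypothetical count of numbers to generate

for offset in range(how_many):
    company_number = "{:08d}".format(start + offset)  # re-add the leading zero: 09628955, 09628956, ...
    print(company_number)
    # ch_profile = ch_api.get_company_profile(company_number)
    # ...same per-company processing as in the loop above...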

Similar functions in Python don't produce the same result

I'm having an issue with two functions I have defined in Python. Both functions perform similar operations in the first few lines of the function body, yet one runs fine and the other produces a KeyError. I will explain more below, but here are the two functions first.
# define function that looks at the number of claims that have a decider id that was dealer
# normalize by business amount
def decider(df):
    # subset dataframe by date
    df_sub = df[(df['vehicle_repair_date'] >= Q1_sd) & (df['vehicle_repair_date'] <= Q1_ed)]
    # get the dealer id
    did = df_sub['dealer_id'].unique()
    # subset data further by selecting only records where 'dealer_decide' equals 1
    df_dealer_decide = df_sub[df_sub['dealer_decide'] == 1]
    # count the number of unique warranty claims
    dealer_decide_count = df_dealer_decide['warranty_claim_number'].nunique()
    # get the total sales amount for that dealer
    total_sales = float(df_sub['amount'].max())
    # get the number of warranty claims decided by dealer per $100k in dealer sales
    decider_count_phk = dealer_decide_count * (100000 / total_sales)
    # create a dictionary to store results
    output_dict = dict()
    output_dict['decider_phk'] = decider_count_phk
    output_dict['dealer_id'] = did
    output_dict['total_claims_dealer_dec_Q1_2019'] = dealer_decide_count
    output_dict['total_sales2019'] = total_sales
    # convert resultant dictionary to dataframe
    sum_df = pd.DataFrame.from_dict(output_dict)
    # return the summarized dataframe
    return sum_df

# apply the 'decider' function to each dealer in dataframe 'data'
decider_count = data.groupby('dealer_id').apply(decider)
# define a function that looks at the percentage change between 2018Q4 and 2019Q1
# in terms of the number of claims processed
def turnover(df):
    # subset dealer records for Q1
    df_subQ1 = df[(df['vehicle_repair_date'] >= Q1_sd) & (df['vehicle_repair_date'] <= Q1_ed)]
    # subset dealer records for Q4
    df_subQ4 = df[(df['vehicle_repair_date'] >= Q4_sd) & (df['vehicle_repair_date'] <= Q4_ed)]
    # get the dealer id
    did = df_subQ1['dealer_id'].unique()
    # get the unique number of claims for Q1
    unique_Q1 = df_subQ1['warranty_claim_number'].nunique()
    # get the unique number of claims for Q4
    unique_Q4 = df_subQ4['warranty_claim_number'].nunique()
    # determine percent change from Q4 to Q1
    percent_change = round((1 - (unique_Q1 / unique_Q4)) * 100, ndigits=1)
    # create a dictionary to store results
    output_dict = dict()
    output_dict['nclaims_Q1_2019'] = unique_Q1
    output_dict['nclaims_Q4_2018'] = unique_Q4
    output_dict['dealer_id'] = did
    output_dict['quarterly_pct_change'] = percent_change

# apply the 'turnover' function to each dealer in the 'data' dataframe
dealer_turnover = data.groupby('dealer_id').apply(turnover)
Each function is applied to the exact same dataset and I am obtaining the dealer id (the variable did in the function body) in the same way. I am also using the same groupby-then-apply code, but when I run the two functions, decider runs as expected while turnover gives the following error:
KeyError: 'dealer_id'.
At first I thought it might be a scoping issue, but that doesn't really make sense, so if anyone can shed some light on what might be happening I would greatly appreciate it.
Thanks,
Curtis
IIUC, you are applying the turnover function after the decider function. You are getting the KeyError since dealer_id is present as the index and not as a column.
Try replacing
decider_count = data.groupby('dealer_id').apply(decider)
with
decider_count = data.groupby('dealer_id', as_index=False).apply(decider)

IndexError: list index out of range - python

I have the following error:
currency = row[0]
IndexError: list index out of range
Here is the code:
import csv

crntAmnt = int(input("Please enter the amount of money to convert: "))
print(currencies)  # 'currencies' is defined elsewhere in the program
exRtFile = open('exchangeRate.csv', 'r')
exchReader = csv.reader(exRtFile)
crntCurrency = input("Please enter the current currency: ")
validateloop = 0
while validateloop == 0:
    for row in exchReader:
        currency = row[0]
        if currency == crntCurrency:
            crntRt = row[1]
            validateloop += 1
Here's the CSV file:
Japanese Yen,169.948
US Dollar,1.67
Pound Sterling,1
Euro,5.5
Here's an input/output example:
Please enter the amount of money to convert: 100
['Pound Sterling', 'Euro', 'US Dollar', 'Japanese Yen']
Please enter the current currency: Pound Sterling
You probably have a blank row in your CSV file, causing it to produce an empty list.
There are a couple of solutions:
1. Check whether the row has elements, and only proceed if it does:
for row in exchReader:
    if len(row):  # can also just do "if row:"
        currency = row[0]
        if currency == crntCurrency:
2. Short-circuit with the and operator so that currency becomes an empty list, which won't match crntCurrency:
for row in exchReader:
    currency = row and row[0]
    if currency == crntCurrency:
Try printing out the row. The convention for variable names in Python is like_this, not likeThis. You might find the break keyword useful:
for row in exch_reader:
    currency = row[0]
    if currency == crnt_currency:
        crnt_rt = row[1]
        break
To only index the row when the row actually contains something:
currency = row and row[0]
Here row[0] is only executed if row evaluates to True, which would be when it has at least one element.
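Putting those pieces together, here is a minimal sketch of the lookup loop that skips blank rows and stops once the currency is found (same CSV layout as in the question):
import csv

crnt_amnt = int(input("Please enter the amount of money to convert: "))
crnt_currency = input("Please enter the current currency: ")

crnt_rt = None
with open('exchangeRate.csv', newline='') as ex_rt_file:
    for row in csv.reader(ex_rt_file):
        if not row:  # skip blank lines
            continue
        if row[0] == crnt_currency:
            crnt_rt = float(row[1])
            break

if crnt_rt is not None:
    print(crnt_amnt * crnt_rt)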
