data['Year'] = input("Select a Year: ")
data['Month'] = input("Select a Month: ")
grouping = data.groupby(["Year", "Month"])
monthly_averages = grouping.aggregate({"Value":np.mean})
print(monthly_averages)
Guys - I'm trying to pick a year and a month, then show the mean value for that month. The last 3 lines alone will show the average for every year and month, but I want to be able to select just one. I'm new to Python and not sure how to apply the choice to the grouping.
Do you have an example table you can show us? Something like this should work but I can't test it without an example. I'd recommend reading up on the loc method.
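# input() returns a string; int() assumes Year and Month are stored as integers
year = int(input('Select a year: '))
month = int(input('Select a month: '))
# The mask must compare the column itself, not the column name
df2 = data.loc[data['Year'] == year]
df3 = df2.loc[df2['Month'] == month]
grouping = df3.groupby(["Year", "Month"])
monthly_averages = grouping.aggregate({"Value": "mean"})
print(monthly_averages)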
I don't think you need grouping if you're picking one month and one year:
df[(df['Year'] == input_year) & (df['Month'] == input_month)]['Value'].mean()
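A minimal sketch of filtering first and averaging afterwards, assuming `Year` and `Month` are stored as integers and the column to average is called `Value`, as in the question. The sample frame and the `mean_for_month` helper are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's table
data = pd.DataFrame({
    "Year": [2020, 2020, 2020, 2021],
    "Month": [1, 1, 2, 1],
    "Value": [10.0, 20.0, 30.0, 40.0],
})


def mean_for_month(df, year, month):
    """Return the mean of Value for one year and month (NaN if nothing matches)."""
    mask = (df["Year"] == year) & (df["Month"] == month)
    return df.loc[mask, "Value"].mean()


print(mean_for_month(data, 2020, 1))
```

With user input, the values would come from something like `int(input("Select a Year: "))`, since `input()` returns a string and would not match integer columns.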
Related
I have a file that has a large list of countries, years, and life expectancies. I cannot figure out how to make sure the user is only allowed to input a year that actually exists. After figuring this out, I will need to call only those years (with corresponding country name, code, and living expectancies). How can I do this?
import pathlib

cwd = pathlib.Path(__file__).parent.resolve()
data_file = f'{cwd}/life-expectancy.csv'
with open(data_file) as f:
    while True:
        user_year = input('Enter the year of interest: ')
        for lines in f:
            cat = lines.strip().split(',')
            country = cat[0]
            code = cat[1]
            year = cat[2]
            age = cat[3]
            if any( [year in user_year for year in cat[2]] ):
                print(f'Your year is {user_year}. That is one of our known years.')
                print(year)
                print()
                continue
            else:
                print('Please enter a valid year (1751-2019)')
        print('test')
Solution 1
If every year from 1751 to 2019 is in your file, then you don't need to read the file to check the input. You can simply do this:
# Ask the user for the year
prompt_text = "Enter the year of interest: "
user_year = int(input(prompt_text))
while not 1751 <= user_year <= 2019:
    print("Please enter a valid year (1751-2019)")
    user_year = int(input(prompt_text))
After that you can read your file and store the data only if the years are matching:
# Get the data for the asked year
# Example of final data: [("France", "FR", 45), ("Espagne", "ES", 29)]
data = []
with open(data_file, "r", encoding="utf-8") as file:
    for line in file:
        country, code, year, age = line.strip().split(",")
        if int(year) == user_year:
            data.append((country, code, int(age)))
Solution 2
If you really need to check the year in your file, e.g. because 1845 is not in it, then read the file once and store all the data in a dictionary indexed by the year and return the data of the asked year if it is present:
data = {}
with open(data_file, "r", encoding="utf-8") as file:
    for line in file:
        country, code, year, age = line.strip().split(",")
        year = int(year)
        if year in data:
            data[year].append((country, code, int(age)))
        else:
            data[year] = [(country, code, int(age))]

prompt_text = "Enter the year of interest: "
user_year = int(input(prompt_text))
while user_year not in data:
    print("The year is not present in the file")
    user_year = int(input(prompt_text))
print(data[user_year])
One could use DataFrames to handle such cases. To learn more about them, take a look at the pandas.DataFrame documentation.
To select specific columns from the dataframe: df[[<col_1>, <col_2>]]
Applied to this kind of data, that could look like the following.
import pandas as pd

df = pd.read_csv("Life Expectancy Data.csv")
year = int(input("Enter the year of interest: "))
df = df[["Country", "Year", "Life expectancy "]]
if year in df["Year"].values:
    print(f'Your year is {year}. That is one of our known years.')
    # display() exists only in Jupyter/IPython; use print() in a plain script
    display(df.loc[df["Year"] == year])
else:
    print("Please enter a valid year (2000-2015)")
Your question includes two questions.
1. Question and answer
I cannot figure out how to make sure the user is only allowed to
input a year that actually exists.
Your range of accepted years is 1751-2019. You could create a list with these integers and check that the user input is within that range. Note that range() excludes its end value, so the end must be 2020 to include 2019. E.g.
allowed_answers = list(range(1751, 2020))
There are multiple ways to check the user input, and the one you want depends on how the user interaction should work. Here are a few examples:
1. Stop the program immediately if the user input is invalid
# input() returns a string, so convert it before comparing with integers
user_year = int(input('Enter the year of interest: '))
allowed_answers = list(range(1751, 2020))
assert user_year in allowed_answers, "User input is invalid"
...
2. Ask the user to input a number until it is accepted
allowed_answers = list(range(1751, 2020))
user_year = 0
while int(user_year) not in allowed_answers:
    print('Please enter a valid year (1751-2019)')
    user_year = input('Enter the year of interest: ')
3. Combine the two solutions to limit the number of prompts
allowed_answers = list(range(1751, 2020))
user_year = 0
for i in range(0, 5):
    print('Please enter a valid year (1751-2019)')
    user_year = input('Enter the year of interest: ')
    if int(user_year) in allowed_answers:
        input_valid = True
        break
    else:
        input_valid = False
assert input_valid, "No correct input after five tries."
Note that all these solutions only handle inputs that can be converted to an integer. To work around that, you might need a try ... except clause around the conversion from string to integer, or you could convert the items of allowed_answers to strings.
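A minimal sketch of the try ... except idea, assuming the accepted range is 1751-2019 as in the question. The helper names `parse_year` and `ask_year` are made up for illustration.

```python
def parse_year(text, first=1751, last=2019):
    """Return text as an int if it is a whole number in [first, last], else None."""
    try:
        year = int(text)
    except ValueError:
        # Covers inputs such as "abc", "18.5" or an empty string
        return None
    return year if first <= year <= last else None


def ask_year(max_tries=5):
    """Prompt until a valid year is entered, giving up after max_tries attempts."""
    for _ in range(max_tries):
        year = parse_year(input("Enter the year of interest: "))
        if year is not None:
            return year
        print("Please enter a valid year (1751-2019)")
    raise ValueError(f"No correct input after {max_tries} tries.")
```

Keeping the parsing in its own function means the check can be tried out without typing anything at a prompt, for example `parse_year("abc")` returns `None`.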
2. Question and answer
After figuring this out, I will need to call only those years (with corresponding country name, code, and living expectancies). How can I do this?
I would read the file only once and make it into a dictionary. Then you only need to do the indexing once and can search from there for as long as your program is running. See https://docs.python.org/3/tutorial/datastructures.html#dictionaries .
With these suggestions, I would read the data and transform it into a dictionary outside (and before) your while loop.
I'm trying to do this for our assignment. We are only allowed to use pandas, NumPy and basic Python, and we have to do it by accessing our uni timetable, which is an Excel file with 5 separate sheets. If we cannot use the loc function on data frames, how do we do this? Also, how do you find the least populated rows? I'm either getting a lot of errors or I can't find any way to solve these problems.
Read the timetable using pandas.
Remove unnecessary top rows using pandas.
Return all the classes of a particular subject (asked from the user) along with Time, Day and Room Number.
Ask the user for a list of subjects and return the classes of these subjects along with Time, Day and Room Number.
Return the empty slots of the given day (Monday, Tuesday, Wednesday, Thursday, Friday).
Which classroom is the least busy over the whole week?
Which lab is the busiest over the whole week? Return its name.
Here is my code:
import pandas as pd
import numpy as np
#accessing excel file
path = (r'C:\Users\user\Downloads\TimeTable, FSC, Fall-2022.xlsx')
data = pd.read_excel(path,sheet_name=None)
df = pd.concat(data[frame] for frame in data.keys())
#allocating variables(names of days) to their respective file paths
mon = pd.read_excel(path,sheet_name = "Monday")
tues = pd.read_excel(path,sheet_name = "Tuesday")
wed = pd.read_excel(path,sheet_name = "Wednesday")
thur = pd.read_excel(path,sheet_name = "Thursday")
fri = pd.read_excel(path,sheet_name = "Friday")
#timetable.head()
#creating a dictionary that has access to (names of days) variables -- a dictionary for the timetable of whole week
empty_dictionary = {}
timetable = {
    "Monday" : mon,
    "Tuesday" : tues,
    "Wednesday" : wed,
    "Thursday" : thur,
    "Friday" : fri,
}
#dropping unnecessary rows from all sheets (not dropping 0th row because dropping it removes the name of day from the top left corner)
mon.drop([1,2], axis = 0, inplace = True)
tues.drop([1,2], axis = 0, inplace = True)
wed.drop([1,2], axis = 0, inplace = True)
thur.drop([1,2], axis = 0, inplace = True)
fri.drop([1,2], axis = 0, inplace = True)
mon.head()
#This part is not working
df.loc[df[timetable(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'])==subject]]
df.loc[df["Monday"]==subject]
#Part 3 of question
#asking user to enter the subject name
subject = str(input("Enter the subject you want to search classes for: "))
reads and stores the data in this file.
Prompts the user for two integers corresponding to start and end years, and finds and lists the year of publication, title, and author, in that order, of all books published during that period.
It repeats the previous step till the user enters -1 when prompted for the start year.
This is what I have so far (see picture)
def main():
    file = open("resources.txt","r")
    myList = []
    year1 = int(input("Enter the first year:"))
    year2 = int(input("Enter the second year: "))
    for x in range(year1, year2):
        print(yearofpublication,title, author)
and the file is 1000 lines
I need help with #2 mainly.
Thank you
Here is a solution that doesn't use pandas. I have added comments to break the code down according to the steps you requested. Step 1 reads the text file, removes all tabs and newline characters, and splits each line on the semicolon to create a list of lists.
Step 2 iterates through all the books and compares index 3 (the year) of each book to the specified years. Step 3 creates an infinite loop and breaks out of it only when the user enters -1.
#step 1
data = open('resources.txt', 'r')
book_list = []
for line in data:
    new_line = line.rstrip('\n').replace('\t', '').split(';')
    book_list.append(new_line)
#step 3
while True:
    year1 = int(input("Enter the first year:"))
    if year1 == -1:
        break
    year2 = int(input("Enter the second year: "))
    #step2
    for book in book_list:
        if year1 <= int(book[3]) <= year2:
            print(f'Publication Year: {book[3]}, Title: {book[1]}, Author: {book[2]}')
Assuming you have a txt file like below that is ; separated with a consistent format and no headers.
1 ; A ; X ;1220
2 ; B ; Y ;1245
You can load the file using pandas which will allow you to easily filter the data on conditions.
import pandas
df = pandas.read_csv("data.txt", sep=";", names=["id", "author", "title", "year"])
Then for your step 2, you can filter the dataframe based on year1 and year2 and keep the result (use > and < instead if the end years should be excluded):
filtered = df[(df['year'] >= year1) & (df['year'] <= year2)]
print(filtered.head())
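A small self-contained sketch of that pandas filter, using the two sample rows above read from an in-memory string instead of `data.txt`. The `books_between` helper is made up for illustration, and treating both end years as inclusive is an assumption about what "during that period" means.

```python
import io

import pandas as pd

# The two sample rows from above, held in memory instead of a file
sample = io.StringIO("1 ; A ; X ;1220\n2 ; B ; Y ;1245\n")
df = pd.read_csv(sample, sep=";", names=["id", "author", "title", "year"])

# The sample has spaces around the separators, so trim the text columns
for column in ["author", "title"]:
    df[column] = df[column].str.strip()


def books_between(frame, year1, year2):
    """Return the rows whose year lies in [year1, year2], both ends included."""
    return frame[frame["year"].between(year1, year2)]


result = books_between(df, 1200, 1230)
print(result[["year", "title", "author"]])
```

`Series.between` includes both bounds by default, which avoids writing the two comparisons by hand.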
I have this data frame of clients' purchases and I would like to create a function that gives me the total purchases for a given month and year.
I have a dataframe (df) with lots of columns, but I'm going to use only 3 ("year", "month", "value").
This is what I'm trying, but it's not working:
def total_purchases():
    y = input('Which year do you want to consult?')
    m = int(input('Which month do you want to consult?')
    sum = []
    if df[df['year']== y] & df[df['month']== m]:
        for i in df:
            sum = sum + df[df['value']]
    return sum
You're close; you need to ditch the if statement and the for loop.
Additionally, when combining multiple conditions with logical operators in pandas, you need to wrap each condition in parentheses to separate them.
def total_purchases(df):
    y = input('Which year do you want to consult? ')
    m = int(input('Which month do you want to consult? '))
    return df[(df['year'].eq(y)) & (df['month'].eq(m))]['value'].sum()
Setup
df_p = pd.DataFrame({'year' : ['2011','2011','2012','2013'],
                     'month' : [1,2,1,2],
                     'value' : [200,500,700,900]})
Test
total_purchases(df_p)
Which year do you want to consult? 2011
Which month do you want to consult? 2
500
I'm filtering a dataframe by hour and weekday:
if type == 'daily':
    hour = data.index.hour
    day = data.index.weekday
    selector = ((hour != 17)) | ((day!=5) & (day!=6))
    data = data[selector]
if type == 'weekly':
    day = data.index.weekday
    selector = ((day!=5) & (day!=6))
    data = data[selector]
Then I'm using a for loop in which I need to write some conditions based on the weekday and hour, but row.index doesn't have that information. What can I do in this case?
I need to do something like this (it won't work, since row.index doesn't have weekday or hour info):
for index, row in data.iterrows():
if type == 'weekly' & row.index.weekday == 1 & row.index.hour == 0 & row.index.min == 0 | \
type == 'daily' & row.index.hour == 18 & row.index.min == 0:
Thx in advance
I know this is not the most elegant way, but you could store those values in columns:
df['Hour'] = df.index.hour
If you need a min or a max based on those variables, you could create another column with a rolling window calculation such as df['Hour'].rolling(n).min(). (The older rolling_min style functions have been removed from pandas.)
Once you have your columns, you can iterate as you please with the loop you suggested.
The pandas DatetimeIndex documentation lists the available index properties.
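A minimal sketch of the column approach, assuming the frame has a DatetimeIndex. The sample timestamps and the `price` column are made up for illustration.

```python
import pandas as pd

# Hypothetical data: 2023-01-02 is a Monday, 2023-01-07 a Saturday
idx = pd.to_datetime([
    "2023-01-02 17:00",
    "2023-01-03 00:00",
    "2023-01-03 18:00",
    "2023-01-07 18:00",
])
df = pd.DataFrame({"price": [1.0, 2.0, 3.0, 4.0]}, index=idx)

# Copy the index properties into ordinary columns
df["Hour"] = df.index.hour
df["Weekday"] = df.index.weekday  # Monday=0 ... Sunday=6

# Collect the rows that fall on a Tuesday at midnight
matches = []
for timestamp, row in df.iterrows():
    if row["Weekday"] == 1 and row["Hour"] == 0:
        matches.append(timestamp)

print(matches)
```

Note that the first value yielded by `iterrows()` is the index label itself, here a `Timestamp`, so `timestamp.hour` and `timestamp.weekday()` are also available inside the loop without the extra columns.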