Looking for any matching terms from file - python

I have a file that contains a large list of countries, years, and life expectancies. I cannot figure out how to make sure the user is only allowed to input a year that actually exists in the file. After figuring this out, I will need to retrieve only the rows for that year (with the corresponding country name, code, and life expectancy). How can I do this?
import pathlib
cwd = pathlib.Path(__file__).parent.resolve()
data_file = f'{cwd}/life-expectancy.csv'
with open(data_file) as f:
    while True:
        user_year = input('Enter the year of interest: ')
        for lines in f:
            cat = lines.strip().split(',')
            country = cat[0]
            code = cat[1]
            year = cat[2]
            age = cat[3]
            if any( [year in user_year for year in cat[2]] ):
                print(f'Your year is {user_year}. That is one of our known years.')
                print(year)
                print()
                continue
            else:
                print('Please enter a valid year (1751-2019)')
        print('test')

Solution 1
If every year from 1751 to 2019 is in your file, then you don't need to read the file to validate the input. You can simply check the range:
# Ask the user for the year
prompt_text = "Enter the year of interest: "
user_year = int(input(prompt_text))
while not 1751 <= user_year <= 2019:
    print("Please enter a valid year (1751-2019)")
    user_year = int(input(prompt_text))
After that you can read your file and store the data only if the years are matching:
# Get the data for the asked year
# Example of final data: [("France", "FR", 45), ("Espagne", "ES", 29)]
data = []
with open(data_file, "r", encoding="utf-8") as file:
    for line in file:
        country, code, year, age = line.strip().split(",")
        if int(year) == user_year:
            data.append((country, code, int(age)))
Solution 2
If you really need to check the year in your file, e.g. because 1845 is not in it, then read the file once and store all the data in a dictionary indexed by the year and return the data of the asked year if it is present:
data = {}
with open(data_file, "r", encoding="utf-8") as file:
    for line in file:
        country, code, year, age = line.strip().split(",")
        year = int(year)
        if year in data:
            data[year].append((country, code, int(age)))
        else:
            data[year] = [(country, code, int(age))]
prompt_text = "Enter the year of interest: "
user_year = int(input(prompt_text))
while user_year not in data:
    print("The year is not present in the file")
    user_year = int(input(prompt_text))
print(data[user_year])

One could use a pandas DataFrame to handle such cases. For more information, see the pandas.DataFrame documentation.
To select specific columns from a DataFrame, use df[[<col_1>, <col_2>]].
Assuming the data has been loaded from a CSV file, the check could look like this:
import pandas as pd
df = pd.read_csv("Life Expectancy Data.csv")
year = int(input("Enter the year of interest: "))
df = df[["Country", "Year", "Life expectancy "]]
if year in df["Year"].values:
    print(f'Your year is {year}. That is one of our known years.')
    # print() works in a plain script; display() is only available in IPython/Jupyter
    print(df.loc[df["Year"] == year])
else:
    print("Please enter a valid year (2000-2015)")

Your question includes two questions.
1. Question and answer
I cannot figure out how to make sure the user is only allowed to
input a year that actually exists.
Your range of accepted years is 1751-2019. You could create a list of these integers and check that the user input is in it. Note that range() excludes its stop value, so the upper bound must be 2020 for 2019 to be accepted. E.g.
allowed_answers = list(range(1751, 2020))
There are multiple ways to check the user input, and the one to use depends on how you want the user interaction to work. Here are a few examples:
1. Stop the program immediately if the user input is invalid
# convert to int, otherwise the string never matches the integers in the list
user_year = int(input('Enter the year of interest: '))
allowed_answers = list(range(1751, 2020))
assert user_year in allowed_answers, "User input is invalid"
...
2. Ask the user to input a number until it is accepted
allowed_answers = list(range(1751, 2020))
user_year = 0
while int(user_year) not in allowed_answers:
    print('Please enter a valid year (1751-2019)')
    user_year = input('Enter the year of interest: ')
3. Combine the two solutions to limit the number of prompts
allowed_answers = list(range(1751, 2020))
user_year = 0
for i in range(0, 5):
    print('Please enter a valid year (1751-2019)')
    user_year = input('Enter the year of interest: ')
    if int(user_year) in allowed_answers:
        input_valid = True
        break
    else:
        input_valid = False
assert input_valid, "No correct input after five tries."
Note that all these solutions only handle inputs that can be converted to an integer. To work around that, you might need try ... except clauses around the conversion from string to integer, or you could convert the items of allowed_answers to strings.
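The try ... except idea could look roughly like the sketch below. The helper name ask_year and its read parameter are assumptions made for illustration and are not part of the original code; read defaults to the built-in input and exists only so the function can be exercised without a keyboard.

```python
def ask_year(first=1751, last=2019, read=input):
    """Prompt until the reply is an integer between first and last (inclusive)."""
    while True:
        reply = read('Enter the year of interest: ')
        try:
            year = int(reply)
        except ValueError:
            # int() raises ValueError for text such as 'abc' or '18.5'
            print('Please enter a whole number.')
            continue
        if first <= year <= last:
            return year
        print(f'Please enter a valid year ({first}-{last})')
```

Calling ask_year() with no arguments prompts on the console and returns only once a valid year has been typed.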
2. Question and answer
After figuring this out, I will need to call only those years (with corresponding country name, code, and living expectancies. How can I do this?
I would read the file only once and turn it into a dictionary. Then you only need to do the indexing once, and you can look up years in it for as long as your program is running. See https://docs.python.org/3/tutorial/datastructures.html#dictionaries .
With these suggestions, I would read the data and build the dictionary outside (and before) your while loop.
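A minimal sketch of that structure, assuming the file has the four columns described in the question (country, code, year, life expectancy) and possibly a header row. The function name load_by_year and the sample rows are made up for illustration; the csv module is used instead of str.split so that quoted country names containing commas would still parse.

```python
import csv
import io
from collections import defaultdict


def load_by_year(lines):
    """Group (country, code, life_expectancy) tuples by year."""
    by_year = defaultdict(list)
    for row in csv.reader(lines):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        country, code, year, age = row
        try:
            year_number, value = int(year), float(age)
        except ValueError:
            continue  # skip a header row or non-numeric values
        by_year[year_number].append((country, code, value))
    return dict(by_year)


# Hypothetical sample standing in for open(data_file, newline='')
sample = io.StringIO(
    "Entity,Code,Year,Life expectancy\n"
    "France,FRA,1900,45.1\n"
    "Spain,ESP,1900,34.8\n"
    "France,FRA,1901,45.9\n"
)
data = load_by_year(sample)  # read once, before any input loop
print(data[1900])
```

With the dictionary built, `user_year in data` answers the first question and `data[user_year]` answers the second.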

Related

Appending data to an existing DataFrame with pandas

I have been trying to find a way to add information to an Excel file using pandas by appending to it, but I can't seem to get it to work. The information is the input from the user.
Every time the program runs, the data in the Excel sheet overwrites what was there before instead of being appended as a new row.
FirstName = input('What is your First Name? \n')
LastName = input('What is Your Last Name? \n')
ageCustomer = int(input('What is your current Age? \n'))
genderCustomer = input('What is your biological assigned gender? \n')
socialCustomer = int(input('What is your Social Security Number or ITIN? (Must Be 6 digits) \n'))
bdDayCustomer = int(input('What is the day of your birthday? (Enter Just Number) \n'))
bdMonthCustomer = int(input('What is the month of your birthday? (Enter Just Number) \n'))
bdYearCustomer = int(input('What is the year of your birthday? (Must be 4 digits) \n'))
InAmountCustomer = int(input('What is the Initial Amount Being deposited? \n'))
df1 = pd.DataFrame({'FirstN':[''],
'LastN':[''],
'Age':[0],
'Gender':[''],
'SSN':[0],
'bdDay':[0],
'bdMonth':[0],
'bdyear':[0],
'InAmount':[0],
})
row_to_add = pd.DataFrame({'FirstN':[FirstName],
'LastN':[LastName],
'Age':[ageCustomer],
'Gender':[genderCustomer],
'SSN':[socialCustomer],
'bdDay':[bdDayCustomer],
'bdMonth':[bdMonthCustomer],
'bdyear':[bdYearCustomer],
'InAmount':[InAmountCustomer],
})
df_final = df1.append(row_to_add, ignore_index=True)
writer = pd.ExcelWriter('CustomerInfo.xlsx')
df_final.to_excel(writer)
writer.save()
print(df_final)

Multiple lines of data, want to get an index or figure out how to get it in order

The program:
1. Reads and stores the data in this file.
2. Asks the user for two integers corresponding to start and end years, and finds and lists the year of publication, title, and author, in that order, of all books published during that period.
3. Repeats the previous step until the user enters -1 when prompted for the start year.
This is what I have so far:
def main():
    file = open("resources.txt","r")
    myList = []
    year1 = int(input("Enter the first year:"))
    year2 = int(input("Enter the second year: "))
    for x in range(year1, year2):
        print(yearofpublication,title, author)
The file is 1000 lines long.
I need help with #2 mainly.
Thank you
Here is a solution that doesn't use pandas. I have added comments to break down the code according to the steps you requested. Step 1 reads the text file, strips all tabs and newline characters, and splits each line on the semicolon to create a list of lists.
Step 2 iterates through all the books and compares index 3 (year) of each book to the specified years. Step 3 creates an infinite loop and breaks out of it only when the user enters -1.
#step 1
data = open('resources.txt', 'r')
book_list = []
for line in data:
    new_line = line.rstrip('\n').replace('\t', '').split(';')
    book_list.append(new_line)
#step 3
while True:
    year1 = int(input("Enter the first year:"))
    if year1 == -1:
        break
    year2 = int(input("Enter the second year: "))
    #step2
    for book in book_list:
        if year1 <= int(book[3]) <= year2:
            print(f'Publication Year: {book[3]}, Title: {book[1]}, Author: {book[2]}')
Assuming you have a txt file like the one below, separated by ; with a consistent format and no header row:
1 ; A ; X ;1220
2 ; B ; Y ;1245
You can load the file with pandas, which makes it easy to filter the data on conditions.
import pandas
df = pandas.read_csv("data.txt", sep=";", names=["id", "author", "title", "year"])
Then, for your step 2, you can filter the dataframe on year1 and year2 (using >= and <= so that both end years are included) and print the matching rows:
matches = df[(df['year'] >= year1) & (df['year'] <= year2)]
print(matches[['year', 'title', 'author']])

print the average of each year from txt.file

I need to print the average per year for an assignment. I have the following:
a text file that is like this with over 2000 lines:
Unit 42;2017;7.0
Love Your Garden;2011;8.0
Limmy's Show;2010;8.3
Nazi Megastructures;2013;8.0
Omniscient;2020;6.3
Green Frontier;2019;7.4
Los Briceño;2019;8.4
Aftermath;2014;
Sugar;2006;
Beyond Stranger Things;2017;
Men on a Mission;2018;
Click for Murder;2017;
As you can see, some movies don't have a grade, so these need to be ignored.
Now I need to output it like this:
2000: 1,1111
2001: 2,2222
etc. up until 2020
I wrote the following code to extract the right parts from the txt file:
file = open("tv_shows.txt", "r", encoding='utf8')
#content = file.read()
result = {}
for line in file:
    year, number = line.split(';')[1], line.split(';')[2]
    if len(number) <3:
        continue
    year = int(year)
    number = float(number)
    try:
        result[year].append(number)
    except KeyError:
        result[year] = [number]
for k, v in sorted(result.items()):
    print('{}: {:.4f}'.format(k, sum(v) / len(v)))
It gives me this, which is a lot better, but it raises a new question for me: how can I remove the redundant zeros in the averages?
2000: 7.7000
2001: 7.4000
2002: 7.1000
2003: 7.0091
2004: 7.6667
2005: 7.7333
2006: 7.2579
2007: 7.5080
2008: 7.1630
2009: 7.3884
2010: 7.3904
2011: 7.3507
2012: 7.0787
2013: 7.0418
2014: 7.2427
2015: 7.2462
2016: 7.1730
2017: 7.1478
2018: 7.0034
2019: 7.1191
2020: 6.8130
If you are not allowed to use pandas,
file = open("tv_shows.txt", "r", encoding='utf8')
years = {}
for a in file:
    _, year, number = a.split(';')
    if len(number) <3:
        continue
    year = int(year)
    number = float(number)
    if year not in years:
        years[year] = [] # Add a new list to the years dict
    years[year].append(number) # Append the current number to the correct list.
avgyears = {}
for year, numberlist in years.items():
    # iterate over the dict, find the mean of each list
    avgyears[year] = sum(numberlist) / len(numberlist)
The question was edited while I was writing my answer. The modified question asks "How can I remove the redundant zero's in the average numbers?"
The extra zeros are added because you ask Python to format your number to four decimal places. To remove the zeros from the right side of the string, you can use str.rstrip(). A second rstrip(".") removes the decimal point that would otherwise be left dangling after a whole number such as 7.0000.
for year, numberlist in years.items():
    # iterate over the dict, find the mean of each list
    avgyears[year] = sum(numberlist) / len(numberlist)
    num = f"{avgyears[year]:.4f}".rstrip("0").rstrip(".")
    print(f"{year}: {num}")
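One detail worth checking with str.rstrip is an average that happens to be a whole number: stripping only zeros from 7.0000 leaves 7. with a dangling decimal point. A small sketch that strips the point as well (the helper name format_average is hypothetical):

```python
def format_average(value, places=4):
    """Format value with at most `places` decimals and no trailing zeros."""
    text = f"{value:.{places}f}"
    # strip zeros first, then a decimal point left at the end, if any
    return text.rstrip("0").rstrip(".")


for sample in (7.7, 7.0091, 7.0):
    print(format_average(sample))
```

This prints 7.7, 7.0091 and 7, one per line.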
If you are allowed to use pandas, then
import pandas as pd
df = pd.read_csv("tv_shows.txt", delimiter=";", header=None,
                 names=['name', 'year', 'rating'])
df = df.dropna()
df.groupby(['year'])['rating'].mean().reset_index()
How about you keep a dictionary whose keys are years and whose values are lists of scores in that year? Populate the dict as you loop (don't forget to convert str to float). Then at the end you can just average each list.

User input to control output

data['Year'] = input("Select a Year: ")
data['Month'] = input("Select a Month: ")
grouping = data.groupby(["Year", "Month"])
monthly_averages = grouping.aggregate({"Value":np.mean})
print(monthly_averages)
Guys - trying to pick a year, and a month, then show the mean value for that month. The last 3 lines alone will show every year and month average, but I want to be able to select one. New to python, not sure how to apply the choice to the grouping.
Do you have an example table you can show us? Something like this should work but I can't test it without an example. I'd recommend reading up on the loc method.
# assumes the Year and Month columns hold integers; drop int() if they hold strings
year = int(input('Select a year: '))
month = int(input('Select a month: '))
df2 = data.loc[data['Year'] == year]
df3 = df2.loc[df2['Month'] == month]
grouping = df3.groupby(["Year", "Month"])
monthly_averages = grouping.aggregate({"Value":np.mean})
print(monthly_averages)
I don't think you need grouping if you're picking one month and one year. Here input_year and input_month stand for the values entered by the user:
df[(df['Year'] == input_year) & (df['Month'] == input_month)]['Value'].mean()

Python: dict and sorting in alphabetical order

I need to write a program that accepts a year from the user, reads information from a CSV file, and prints the result on the screen. The CSV source file has the format year, name, count, gender. The output should list boys only, in the format Name Count, in alphabetical order.
Input file:
2010,Ruby,440,Female
2010,Cooper,493,Male
Output:
Please enter the year: 2010
Popular boy names in year 2010 are:
Aidan 112
I get an error when I run the program:
Please enter the year: 2014
Traceback (most recent call last):
File "E:\SIT111\A1\printBoynameTPL.py", line 26, in <module>
year, name, count, gender = row
ValueError: need more than 0 values to unpack
This is my code:
'''
This program accepts a year as input from a user and print boys' information of the year. The output should be sorted by name in alphabetical order.
Steps:
1. Receive a year from a user
2. Read CSV files:
Format: year, name, count, gender
3. Display popular boy names on the screen:
Format: Name Count
'''
import csv
inputYear = raw_input('Please enter the year: ')
inFile = open('output/babyQldAll.csv', 'rU')
cvsFile = csv.reader(inFile, delimiter=',')
dict = {}
for row in cvsFile:
    year, name, count, gender = row
    if (year == inputYear) and (gender == 'Boy'):
        dict[name] = count
print('Popular boy names in year %s are:' % inputYear)
# +++++ You code here ++++
# According to informaiton in 'dict', print (name, count) sorted by 'name' in alphabetical order
sortedName = shorted(dict.keys())
for name in sortedName:
    print(name, dict[name])
print("Print boy names... ")
inFile.close()
I edited it a bit:
for row in cvsFile:
    if row:
        year, name, count, gender = row
        if (year == inputYear) and (gender == 'Male'):
            dict[name] = count
print('Popular boy names in year %s are:' % inputYear)
# +++++ You code here ++++
# According to informaiton in 'dict', print (name, count) sorted by 'name' in alphabetical order
sortedName = sorted(dict.keys())
for name in sortedName:
    print(name,dict[name])
print("Print boy names... ")
Did I do something wrong? Indentation, maybe?
result:
>>>
Please enter the year: 2013
Popular boy names in year 2013 are:
Print boy names...
>>>
You seem to have empty lines in your CSV file, which produce empty rows when you iterate over it. You can simply check whether the row is empty before doing the rest of the logic. Note also that the sample input uses 'Male' rather than 'Boy' in the gender column, so the comparison has to match that value. Example -
for row in cvsFile:
    if row:
        year, name, count, gender = row
        if (year == inputYear) and (gender == 'Male'):
            dict[name] = count
Also, you should not use dict as a variable name, because it shadows the built-in dict() .
Also, you have another typo in your program -
sortedName = shorted(dict.keys())
I am guessing you intended to use sorted() .
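Putting these points together, a Python 3 sketch might look like the following. It assumes the gender column holds Male/Female as in the sample input, and it reads from an in-memory string with made-up rows so that it runs on its own; boy_names is a hypothetical helper name.

```python
import csv
import io


def boy_names(rows, wanted_year):
    """Return {name: count} for male names in wanted_year, skipping empty rows."""
    counts = {}  # named 'counts' so the built-in dict is not shadowed
    for row in rows:
        if not row:
            continue  # csv.reader yields [] for blank lines
        year, name, count, gender = row
        if year == wanted_year and gender == 'Male':
            counts[name] = int(count)
    return counts


# Made-up sample data, including a blank line like the one causing the error
sample = io.StringIO(
    "2010,Ruby,440,Female\n"
    "\n"
    "2010,Cooper,493,Male\n"
    "2010,Aidan,112,Male\n"
)
counts = boy_names(csv.reader(sample), '2010')
for name in sorted(counts):
    print(name, counts[name])
```

For a real file, the StringIO object would be replaced by open('output/babyQldAll.csv', newline='').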
