How to split the textfile - python

04-05-1993:1.068
04-12-1993:1.079
04-19-1993:1.079
06-06-1994:1.065
06-13-1994:1.073
06-20-1994:1.079
I have a text file of gas prices, one month-day-year:price entry per line, and I want to calculate the average gas price for each year. So I tried to split the lines:
with open('c:/Gasprices.txt','r') as f:
    fullfile=[x.strip() for x in f.readlines()]
    datesprices=[(x.split('-')[0], x.split(':')[1]) for x in fullfile]
    print(datesprices)
But instead of the year and price, I get the month and price, like this:
('04', '1.068'), ('04', '1.079')
Please let me know what I am missing.
Also, if you can, please show me how to use the split data to calculate the average price per year using a dictionary.

I see no need to split the input lines as they have a fixed format for the date - i.e., its length is known. Therefore we can just slice.
with open('gas.txt') as gas:
    td = dict()
    for line in gas:
        year = line[6:10]
        price = float(line[11:])
        td.setdefault(year, []).append(price)
    for k, v in td.items():
        print(f'{k} {sum(v)/len(v):.3f}')
Output:
1993 1.075
1994 1.072
Note:
There is no check here for blank lines. It is assumed that there are none and that the sample shown in the question is malformed.
Also, no need to strip the incoming lines as float() is impervious to leading/trailing whitespace
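If blank lines turn out to be possible after all, a slightly more defensive variant might look like the sketch below. The function name average_by_year and the in-memory sample list are illustrative assumptions, not part of the answer above, and it splits on ':' and '-' rather than slicing.

```python
def average_by_year(lines):
    """Return {year: average price} for lines shaped like 'MM-DD-YYYY:price'."""
    prices = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        date, _, price = line.partition(':')
        year = date.split('-')[-1]  # last piece of MM-DD-YYYY
        prices.setdefault(year, []).append(float(price))
    return {year: sum(v) / len(v) for year, v in prices.items()}


# Sample data from the question, with one blank line added on purpose.
sample = ['04-05-1993:1.068', '', '04-12-1993:1.079', '04-19-1993:1.079',
          '06-06-1994:1.065', '06-13-1994:1.073', '06-20-1994:1.079']
for year, avg in average_by_year(sample).items():
    print(f'{year} {avg:.3f}')
```

To use it on a real file, an open file object can be passed directly, since the function only iterates over its argument.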

As already mentioned, getting the year requires a slightly more complex split. But since your format is very consistent, you could simply slice (converting the price to float so that it can be averaged later):
datesprices=[(x[6:10], float(x[11:])) for x in fullfile]
But how do you get the average? You need to store a list of prices for each year somewhere.
from statistics import mean
my_dict = {} # could be defaultdict too
for year, price in datesprices:
    if year not in my_dict:
        my_dict[year] = []
    my_dict[year].append(price)
for year, prices in my_dict.items():
    print(year, mean(prices))
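The defaultdict alternative mentioned in the comment could be sketched roughly as follows. The datesprices list here is typed in by hand from the question's sample so that the snippet runs on its own; the names by_year and averages are illustrative.

```python
from collections import defaultdict
from statistics import mean

# (year, price) pairs taken from the sample in the question.
datesprices = [('1993', 1.068), ('1993', 1.079), ('1993', 1.079),
               ('1994', 1.065), ('1994', 1.073), ('1994', 1.079)]

by_year = defaultdict(list)  # a missing year starts out as an empty list
for year, price in datesprices:
    by_year[year].append(price)

averages = {year: mean(prices) for year, prices in by_year.items()}
for year, avg in averages.items():
    print(year, round(avg, 3))
```

The only difference from the plain dict version is that the "if year not in my_dict" check disappears.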

TRY THIS
with open('c:/Gasprices.txt','r') as f:
    fullfile=[x.strip() for x in f.readlines()]
    datesprices=[(x.split('-')[0],x.split('-')[-1].split(':')[0], x.split(':')[1]) for x in fullfile]
    print(datesprices)
OUTPUT
[('04', '1993', '1.068'), ('04', '1993', '1.079'), ('04', '1993', '1.079'), ('06', '1994', '1.065'), ('06', '1994', '1.073'), ('06', '1994', '1.079')]
OR
with open('c:/Gasprices.txt','r') as f:
    fullfile=[x.strip() for x in f.readlines()]
    datesprices=[(x.split('-')[-1].split(':')[0], x.split(':')[1]) for x in fullfile]
    print(datesprices)
OUTPUT
[('1993', '1.068'), ('1993', '1.079'), ('1993', '1.079'), ('1994', '1.065'), ('1994', '1.073'), ('1994', '1.079')]

txt = ['04-05-1993:1.068', '04-12-1993:1.079', '04-19-1993:1.079', '06-06-1994:1.065', '06-13-1994:1.073', '06-20-1994:1.079']
price_per_year = {}   # year -> sum of prices
number_of_years = {}  # year -> number of prices seen
for i in txt:
    x = i.split(':')
    date = x[0]
    price = float(x[1])
    year = date.split('-')[2]
    if year not in price_per_year:
        price_per_year[year] = price
        number_of_years[year] = 1
    else:
        price_per_year[year] += price
        number_of_years[year] += 1
# the keys are strings, so look them up as '1993', not 1993
av_price_1993 = price_per_year['1993'] / number_of_years['1993']
av_price_1994 = price_per_year['1994'] / number_of_years['1994']

Related

How could I create a tuple with for-loops to create a list variable using a file with headers and its corresponding data

So I have a text file containing 22 lines and three headers, which are:
economy name
unique economy code given the World Bank standard (3 uppercase letters)
Trade-to-GDP from year 1990 to year 2019 (30 years, 30 data points); 0.3216 means that the trade-to-GDP ratio for Australia in 1990 is 32.16%
The code I have used to import this file and open/read it is:
def Input(filename):
    f = open(filename, 'r')
    lines = f.readlines()
    lines = [l.strip() for l in lines]
    f.close()
    return lines
However, once I have done that, I have to write code with for-loops that creates a list variable named result. It should contain 22 tuples, and each tuple contains four elements:
economy name,
World Bank economy code,
average trade-to-gdp ratio for this economy from 1990 to 2004,
average trade-to-gdp ratio for this economy from 2005 to 2019.
Coming out like
('Australia', 'AUS', '0.378', '0.423')
So far the code I have written looks like this:
def result:
    name, age, height, weight = zip(*[l.split() for l in text_file.readlines()])
I am having trouble getting started, and I do not know how to handle the multiple years required and output all the countries with their corresponding ratios. Here is the table of all the data I have on the text file.
I would suggest using pandas for this.
You can simply do:
import pandas as pd
df = pd.read_csv('filename.csv')
for index, row in df.iterrows():
    pass  # do something with row here
Inside the for loop you can use row['columnName'] to get the data, for example row['code'] or row['1999'].
This approach makes it a lot easier to carry out operations and process the data.
To answer with your own approach as well:
You can iterate over the lines and extract the data by index.
Try the code below:
def Input(filename):
    f = open(filename, 'r')
    lines = f.readlines()
    lines = [l.strip().split() for l in lines]
    f.close()
    return lines
lines = Input(filename)  # filename: path to your text file
for line in lines[1:]:  # skip the header line
    total = sum([float(x) for x in line[2:17]])  # sum of the values from 1990 to 2004
    total2 = sum([float(x) for x in line[17:]])  # sum of the values from 2005 to 2019
    val = (line[0], line[1], total, total2)  # this gives you the tuple
You can continue this approach: divide each sum by 15 to get the average, and append the tuple to result in each iteration of the loop.
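Carried through to the end, the for-loop approach might look like the sketch below. It assumes whitespace-separated columns, a single header row, economy names without spaces, and exactly 30 yearly values per row; the function name build_result and the demo row are made up for illustration and are not real World Bank data.

```python
def build_result(lines):
    """lines: rows already split into fields, with the header row first."""
    result = []
    for fields in lines[1:]:  # skip the header row
        name, code = fields[0], fields[1]
        values = [float(x) for x in fields[2:]]
        first, second = values[:15], values[15:30]  # 1990-2004, 2005-2019
        result.append((name, code,
                       f'{sum(first) / len(first):.3f}',
                       f'{sum(second) / len(second):.3f}'))
    return result


# Made-up demo data: 15 values of 0.3 followed by 15 values of 0.5.
header = ['name', 'code'] + [str(y) for y in range(1990, 2020)]
demo = ['Exampleland', 'EXL'] + ['0.3'] * 15 + ['0.5'] * 15
print(build_result([header, demo]))
```

The averages are formatted as strings with three decimals to match the ('Australia', 'AUS', '0.378', '0.423') shape shown in the question. If economy names can contain spaces, splitting on whitespace will not be enough and a different delimiter would be needed.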

How to find specific items in a CSV file using inputs?

I'm still new to python, so forgive me if my code seems rather messy or out of place. However, I need help with an assignment for university. I was wondering how I am able to find specific items in a CSV file? Here is what the assignment says:
Allow the user to type in a year, then, find the average life expectancy for that year. Then find the country with the minimum and the one with the maximum life expectancies for that year.
import csv
country = []
digit_code = []
year = []
life_expectancy = []
count = 0
lifefile = open("life-expectancy.csv")
with open("life-expectancy.csv") as lifefile:
    for line in lifefile:
        count += 1
        if count != 1:
            line.strip()
            parts = line.split(",")
            country.append(parts[0])
            digit_code.append(parts[1])
            year.append(parts[2])
            life_expectancy.append(float(parts[3]))
highest_expectancy = max(life_expectancy)
country_with_highest = country[life_expectancy.index(max(life_expectancy))]
print(f"The country that has the highest life expectancy is {country_with_highest} at {highest_expectancy}!")
lowest_expectancy = min(life_expectancy)
country_with_lowest = country[life_expectancy.index(min(life_expectancy))]
print(f"The country that has the lowest life expectancy is {country_with_lowest} at {lowest_expectancy}!")
It looks like you only want the first and fourth tokens from each row in your CSV. Therefore, let's simplify it like this:
Hong Kong,,,85.29
Japan,,,85.03
Macao,,,84.68
Switzerland,,,84.25
Singapore,,,84.07
You can then process it like this:
FILE = 'life-expectancy.csv'
data = []
with open(FILE) as csv:
    for line in csv:
        tokens = line.split(',')
        data.append((float(tokens[3]), tokens[0]))
hi = max(data)
lo = min(data)
print(f'The country with the highest life expectancy {hi[0]:.2f} is {hi[1]}')
print(f'The country with the lowest life expectancy {lo[0]:.2f} is {lo[1]}')
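The assignment also asks for a year typed in by the user, which the simplified data above leaves out. A rough sketch of that filtering step follows, assuming the four-column layout from the question (country, code, year, life expectancy). The function name summarize_year and the sample rows are hypothetical.

```python
def summarize_year(rows, year):
    """rows: iterable of (country, code, year, life_expectancy) tuples."""
    selected = [(float(life), country)
                for country, _code, row_year, life in rows
                if int(row_year) == year]
    if not selected:
        return None  # no data for that year
    average = sum(life for life, _ in selected) / len(selected)
    # tuples compare on their first element, the life expectancy
    return average, min(selected), max(selected)


# Made-up rows for illustration only.
rows = [('Examplia', 'EXA', '2000', '70.0'),
        ('Samplestan', 'SAM', '2000', '80.0'),
        ('Examplia', 'EXA', '2001', '71.0')]
average, lowest, highest = summarize_year(rows, 2000)
print(f'Average: {average:.2f}')
print(f'Lowest: {lowest[1]} ({lowest[0]:.2f})')
print(f'Highest: {highest[1]} ({highest[0]:.2f})')
```

In the real program the year would come from something like int(input('Enter a year: ')), and the rows from splitting each line of the CSV after the header.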

print the average of each year from txt.file

I need to print the average rating per year for an assignment.
I have a text file with over 2000 lines that looks like this:
Unit 42;2017;7.0
Love Your Garden;2011;8.0
Limmy's Show;2010;8.3
Nazi Megastructures;2013;8.0
Omniscient;2020;6.3
Green Frontier;2019;7.4
Los Briceño;2019;8.4
Aftermath;2014;
Sugar;2006;
Beyond Stranger Things;2017;
Men on a Mission;2018;
Click for Murder;2017;
As you can see, some movies don't have a grade, so these need to be ignored.
Now I need to output it like this:
2000: 1,1111
2001: 2,2222
etc. up until 2020
I wrote the following code to extract the right parts from the txt file:
file = open("tv_shows.txt", "r", encoding='utf8')
#content = file.read()
result = {}
for line in file:
    year, number = line.split(';')[1], line.split(';')[2]
    if len(number) <3:
        continue
    year = int(year)
    number = float(number)
    try:
        result[year].append(number)
    except KeyError:
        result[year] = [number]
for k, v in sorted(result.items()):
    print('{}: {:.4f}'.format(k, sum(v) / len(v)))
It gives me this, which is a lot better, but it raises a new question for me: how can I remove the redundant zeros in the average numbers?
2000: 7.7000
2001: 7.4000
2002: 7.1000
2003: 7.0091
2004: 7.6667
2005: 7.7333
2006: 7.2579
2007: 7.5080
2008: 7.1630
2009: 7.3884
2010: 7.3904
2011: 7.3507
2012: 7.0787
2013: 7.0418
2014: 7.2427
2015: 7.2462
2016: 7.1730
2017: 7.1478
2018: 7.0034
2019: 7.1191
2020: 6.8130
If you are not allowed to use pandas,
file = open("tv_shows.txt", "r", encoding='utf8')
years = {}
for a in file:
    _, year, number = a.split(';')
    if len(number) <3:
        continue
    year = int(year)
    number = float(number)
    if year not in years:
        years[year] = [] # Add a new list to the years dict
    years[year].append(number) # Append the current number to the correct list.
avgyears = {}
for year, numberlist in years.items():
    # iterate over the dict, find the mean of each list
    avgyears[year] = sum(numberlist) / len(numberlist)
The question was edited while I was writing my answer. The modified question asks "How can I remove the redundant zeros in the average numbers?"
The extra zeros are added because you ask Python to format your number to four decimal places. To remove the zeros from the right side of the string, you can simply use str.rstrip(). A second rstrip(".") removes the decimal point that would otherwise be left behind by a whole number such as 7.0000.
for year, numberlist in years.items():
    # iterate over the dict, find the mean of each list
    avgyears[year] = sum(numberlist) / len(numberlist)
    num = f"{avgyears[year]:.4f}".rstrip("0").rstrip(".")
    print(f"{year}: {num}")
If you are allowed to use pandas, then:
import pandas as pd
df = pd.read_csv("tv_shows.txt", delimiter=";", header=None,
                 names=['name', 'year', 'rating'])
df = df.dropna()
df.groupby(['year'])['rating'].mean().reset_index()
How about you keep a dictionary whose keys are years and whose values are lists of the scores in that year? Populate the dict as you loop (don't forget to convert str to float). Then at the end you can just average each list.
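That dictionary-of-lists idea could be sketched as below. The names yearly_averages and trim are illustrative, the sample lines are a few rows copied from the question, and checking for an empty rating field (instead of testing the string length) is a choice made here, not something taken from the answers.

```python
def yearly_averages(lines):
    """Return {year: mean rating}, ignoring rows with an empty rating field."""
    ratings = {}
    for line in lines:
        # rsplit keeps a title intact even if it contains a ';'
        _title, year, rating = line.rstrip('\n').rsplit(';', 2)
        if not rating.strip():
            continue  # no grade for this show
        ratings.setdefault(int(year), []).append(float(rating))
    return {year: sum(v) / len(v) for year, v in ratings.items()}


def trim(value, places=4):
    """Format to `places` decimals, then drop trailing zeros and a bare dot."""
    return f'{value:.{places}f}'.rstrip('0').rstrip('.')


sample = ['Unit 42;2017;7.0', 'Beyond Stranger Things;2017;',
          'Green Frontier;2019;7.4', 'Los Briceño;2019;8.4']
for year, avg in sorted(yearly_averages(sample).items()):
    print(f'{year}: {trim(avg)}')
```

Passing the open file object instead of the sample list works the same way, because the function only iterates over lines.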

How to make categories out of my text file and calculate average out of the numbers?

I am working on an assignment, but I am stuck and I do not know how to proceed.
I need to build categories from the column names in the first line of the txt file and calculate the average of every numerical value. The program has to keep working flawlessly when I add new lines to the txt file.
Category;currency;sellerRating;Duration;endDay;ClosePrice;OpenPrice;Competitive?
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Automotive/Game;US;3249;5;Mon;0,01;0,01;No
Music/Automotive/Game;US;3249;5;Mon;0,01;0,01;No
This is the text file. I tried to make different categories out of it, but I do not know whether I did it correctly, or how to tell Python that it has to calculate over all the numbers from one group.
with open('bijlage2.txt') as bestand:
    maak_er_lists_van = [(line.strip()).split(';') for line in bestand]
    keys = maak_er_lists_van[0]
    lijst = list(zip([keys]*len(maak_er_lists_van[1:]),
                     maak_er_lists_van[1:]))
    x = [zip(i[0], i[1]) for i in lijst]
    maak_dict = [dict(i) for i in x]
for i in maak_dict:
    categorieen =[i['Category'], i['currency'], i['sellerRating'],
                  i['Duration'], i['endDay'], i['ClosePrice'], i['OpenPrice'],
                  i['Competitive?']]
    categorieen = list(map(int, categorieen))
This is what I have so far. I am a Python beginner so the whole text file thing is new to me. Can somebody help me or explain what I have to do so that I can work further on this project? Many thanks in advance!
Here's how I would do it. I had to use locale.atof() because where I am a . is used as the decimal point, not a comma. You may have to change the locale setting as indicated in the comments.
The csv module is used to read the file, and the averages are computed in a two-step process. First the values for each category are summed, and then afterwards, the average value of each one is calculated based on the number of values read.
import csv
import locale
from pprint import pprint
#locale.setlocale(locale.LC_ALL, '') # empty string for platform's default settings
# Following used for testing to force ',' to be considered as a decimal point.
locale.setlocale(locale.LC_ALL, 'French_France.1252')
avg_names = 'sellerRating', 'Duration', 'ClosePrice', 'OpenPrice'
averages = {avg_name: 0 for avg_name in avg_names} # Initialize.
# Find total of each category of interest.
num_values = 0
with open('bijlage2.txt', newline='') as bestand:
    csvreader = csv.DictReader(bestand, delimiter=';')
    for row in csvreader:
        num_values += 1
        for avg_name in avg_names:
            averages[avg_name] += locale.atof(row[avg_name])
# Calculate average of each summed value.
for avg_name, total in averages.items():
    averages[avg_name] = total / num_values
print('raw results:')
pprint(averages)
print() # Formatted output
print('Averages:')
for avg_name in avg_names:
    rounded = locale.format_string('%.2f', round(averages[avg_name], 2),
                                   grouping=True)
    print(' {:<13} {:>10}'.format(avg_name, rounded))
Output:
raw results:
{'ClosePrice': 0.01, 'Duration': 5.0, 'OpenPrice': 0.01, 'sellerRating': 3249.0}
Averages:
sellerRating 3 249,00
Duration 5,00
ClosePrice 0,01
OpenPrice 0,01
Your way of reading the file and creating a dictionary with the categories and values is fine, imo. Your list maak_dict contains one dictionary for every line. To calculate an average for one category, you could do something like this:
def calc_average(categ):
    # the values are read as strings such as '0,01', so convert them to float first
    values = [float(i[categ].replace(',', '.')) for i in maak_dict]
    average = sum(values)/len(values)
    return average
assuming that you want to calculate the arithmetic mean. categ has to be a string naming one of the numeric columns.
After that, you can create a new dictionary that contains all the averages:
new_dict = {}
for category in ('sellerRating', 'Duration', 'ClosePrice', 'OpenPrice'):
    avg = calc_average(category)
    new_dict[category] = avg
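If the averages are wanted per value of the Category column rather than over the whole file, one possible sketch groups the rows first. It assumes the four numeric columns named below and decimal commas as in the sample; the function name averages_per_category is made up, and io.StringIO stands in for the real file so the snippet runs on its own.

```python
import csv
import io
from collections import defaultdict

NUMERIC = ('sellerRating', 'Duration', 'ClosePrice', 'OpenPrice')


def averages_per_category(fileobj):
    """Return {category: {column: average}} for a ';'-separated file object."""
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in csv.DictReader(fileobj, delimiter=';'):
        counts[row['Category']] += 1
        for col in NUMERIC:
            # '0,01' -> 0.01: swap the decimal comma before converting
            totals[row['Category']][col] += float(row[col].replace(',', '.'))
    return {cat: {col: totals[cat][col] / counts[cat] for col in NUMERIC}
            for cat in counts}


# Two rows from the question's sample file, held in memory.
sample = io.StringIO(
    'Category;currency;sellerRating;Duration;endDay;ClosePrice;OpenPrice;Competitive?\n'
    'Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No\n'
    'Music/Automotive/Game;US;3249;5;Mon;0,01;0,01;No\n'
)
for category, avgs in averages_per_category(sample).items():
    print(category, avgs)
```

Because the totals are rebuilt from the file on every run, new lines added to the txt file are picked up without any code changes.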

Print the lowest numeric value from a text file

I have a text file consisting of some stocks, their prices and so on. I am trying to print out the stock which has the lowest price along with the name of the company. Here is my code:
stocks = open("P:\COM661\stocks.txt")
name_lowest = ""
price_lowest = 0
for line in stocks:
    rows = line.split("\t")
    price = float(rows[2])
    if price>price_lowest:
        price_lowest = price
        name_lowest = rows[1]
print(name_lowest + "\t" + str(price_lowest))
I'm trying to go through the file and compare each numeric value to the one before it to see if it is higher or lower and then at the end it should have saved the lowest price and print it along with the name of the company.
Instead it prints the value of the last company in the file along with its name.
How can I fix this?
You made 2 mistakes.
First, you initialised price_lowest to 0.
You should initialise it to the largest available float in Python:
import sys
price_lowest = sys.float_info.max
Or else you could initialise it to the first element.
Second, your if statement should be:
if price<price_lowest:
Initialize:
price_lowest = 999999 # start with absurdly high value, or take first one
Plus your if check is the opposite.
Should be:
if price < price_lowest:
Others already suggested a solution that fixes your current code. However, using Python you can have a shorter solution (the name is in column 1 and the price in column 2, as in your code):
with open('file') as f:
    print(min(
        [(i.split('\t')[1], float(i.split('\t')[2])) for i in f.readlines()],
        key=lambda t: t[1]
    ))
Your "if" logic is backwards; it should be price < price_lowest.
Just make a little adjustment: start price_lowest at None, set it to the first price you encounter, and compare from there on.
stocks = open(r"P:\COM661\stocks.txt")  # raw string so the backslashes are kept literally
name_lowest = ""
price_lowest = None
for line in stocks:
    rows = line.split("\t")
    price = float(rows[2])
    if price_lowest is None:
        price_lowest = price
        name_lowest = rows[1]
    elif price < price_lowest:
        price_lowest = price
        name_lowest = rows[1]
print(name_lowest + "\t" + str(price_lowest))
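For completeness, the same search can be written with min() and a key function, which also copes with an empty file. This is only a sketch: it assumes tab-separated rows with the company name in column 1 and the price in column 2, as in the question's code, and the function name cheapest and the sample rows are invented.

```python
def cheapest(lines):
    """Return (name, price) of the lowest-priced row, or None if there are no rows."""
    parsed = []
    for line in lines:
        fields = line.rstrip('\n').split('\t')
        parsed.append((fields[1], float(fields[2])))
    # key picks the price; default avoids a ValueError on empty input
    return min(parsed, key=lambda pair: pair[1], default=None)


# Made-up rows: id, company name, price (tab-separated).
sample = ['1\tAcme\t12.50', '2\tGlobex\t7.25', '3\tInitech\t9.00']
name, price = cheapest(sample)
print(name + '\t' + str(price))
```

With a real file, the open file object can be passed in place of the sample list.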
