I have a text file with some cities and temperatures, like this:
City 1 16
City 2 4
...
City100 20
And I'm showing the city with the highest temperature with the code below.
But I would like to show the 5 cities with the highest temperatures. Do you see a way to do this? I've been running some tests, but I always end up showing the same city 5 times.
#!/usr/bin/env python
import sys
current_city = None
current_max = 0
city = None
for line in sys.stdin:
    line = line.strip()
    city, temperature = line.rsplit('\t', 1)
    try:
        temperature = float(temperature)
    except ValueError:
        continue
    if temperature > current_max:
        current_max = temperature
        current_city = city
print '%s\t%s' % (current_city, current_max)
You can use heapq.nlargest:
import sys
import heapq
# Read (city, temperature) pairs
pairs = [
    (c, float(t))
    for line in sys.stdin for c, t in [line.strip().rsplit('\t', 1)]
]
# Find the 5 largest pairs based on the second field, which is the temperature
for city, temperature in heapq.nlargest(5, pairs, key=lambda p: p[1]):
    print city, temperature
I like pandas. This is not a complete answer, but I like to encourage people on their way of research. Check this out...
listA = [1,2,3,4,5,6,7,8,9]
import pandas as pd
df = pd.DataFrame(listA)
df = df.sort_values(0)  # DataFrame.sort() is gone from current pandas; sort_values() replaces it
df.tail()  # the last 5 rows by default
With pandas, you'll want to learn about Series and DataFrames. DataFrames have a lot of functionality: you can name your columns, create them directly from input files, and sort by almost anything. They also borrow the common Unix words head and tail (beginning and end), and you can specify the number of rows returned. I liked the book "Python for Data Analysis".
Store the list of temperatures and cities in a list. Sort the list. Then, take the last 5 elements: they will be your five highest temperatures.
Read the data into a list, sort the list, and show the first 5:
import sys
cities = []
for line in sys.stdin:
    line = line.strip()
    city, temp = line.rsplit('\t', 1)
    cities.append((city, float(temp)))
# The key function receives one (city, temp) pair; negate temp to sort descending
cities.sort(key=lambda pair: -pair[1])
for city, temp in cities[:5]:
    print city, temp
This stores the city, temperature pairs in a list, which is then sorted. The key function in the sort tells the list to sort by temperature descending, so the first 5 elements of the list [:5] are the five highest temperature cities.
The following code performs exactly what you need:
from operator import itemgetter
fname = "haha.txt"
with open(fname) as f:
    content = f.readlines()
# Split on the last run of whitespace so city names containing spaces stay intact
content = [line.rsplit(None, 1) for line in content if line.strip()]
for line in content:
    line[1] = float(line[1])
# Sort ascending by temperature
content = sorted(content, key=itemgetter(1))
print content
To get the city with the highest temperature:
print content[-1]
To get the 5 cities with the highest temperatures:
print content[-5:]
Related
So I have a text file containing 22 lines and three headers, which are:
economy name
unique economy code given the World Bank standard (3 uppercase letters)
Trade-to-GDP from year 1990 to year 2019 (30 years, 30 data points); for example, 0.3216 means that the trade-to-GDP ratio for Australia in 1990 is 32.16%
The code I have used to import this file and open/read it is:
def Input(filename):
    f = open(filename, 'r')
    lines = f.readlines()
    lines = [l.strip() for l in lines]
    f.close()
    return lines
However, once I have done that, I have to write code with for-loops to create a list variable named result. It should contain 22 tuples, and each tuple contains four elements:
economy name,
World Bank economy code,
average trade-to-gdp ratio for this economy from 1990 to 2004,
average trade-to-gdp ratio for this economy from 2005 to 2019.
Coming out like
('Australia', 'AUS', '0.378', '0.423')
So far the code I have written looks like this:
def result:
name, age, height, weight = zip(*[l.split() for l in text_file.readlines()])
I am having trouble getting started, and I don't know how to handle the multiple years required and output all the countries with their corresponding ratios. Here is the table of all the data I have on the text file.
I would suggest using pandas for this.
You can simply do:
import pandas as pd
df = pd.read_csv('filename.csv')
for index, row in df.iterrows():
    pass  # do something with each row here
Inside the for loop you can use row['columnName'] to get the data, for example row['code'] or row['1999'].
This approach makes it a lot easier to carry out operations and process the data.
To follow your own approach instead:
You can iterate over the lines and extract the data by index.
Try the code below:
def Input(filename):
    f = open(filename, 'r')
    lines = f.readlines()
    lines = [l.strip().split() for l in lines]
    f.close()
    return lines
lines = Input('filename.txt')  # placeholder file name
result = []
for line in lines[1:]:  # skip the header row
    values = [float(x) for x in line[2:]]
    avg1 = sum(values[:15]) / 15  # average of the values from 1990 to 2004
    avg2 = sum(values[15:]) / len(values[15:])  # average of the values from 2005 to 2019
    result.append((line[0], line[1], avg1, avg2))  # one tuple per economy
This creates one tuple in each pass of the for loop and collects them in result. Note that split() assumes economy names contain no spaces.
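As a hedged sketch of the loop-based approach: this assumes each data line ends with exactly 30 numeric values preceded by the 3-letter code, and that any header lines have already been dropped. The name build_result and its parameters are hypothetical, not part of the assignment. Reading the fields from the right-hand side keeps multi-word economy names intact, and the averages are formatted to three decimals to match the example tuple in the question.

```python
def build_result(lines, n_years=30, split_at=15):
    """Return (name, code, avg_first_period, avg_second_period) tuples.

    Assumes header lines were removed by the caller and that every data
    line ends with n_years numeric fields.
    """
    result = []
    for line in lines:
        parts = line.split()
        if len(parts) < n_years + 2:
            continue  # skip blank or incomplete lines
        values = [float(x) for x in parts[-n_years:]]
        code = parts[-n_years - 1]
        name = " ".join(parts[:-n_years - 1])  # everything before the code
        first = sum(values[:split_at]) / split_at
        second = sum(values[split_at:]) / (n_years - split_at)
        result.append((name, code, "%.3f" % first, "%.3f" % second))
    return result
```

With the Input function from the question, this could be called as build_result(Input(filename)[1:]) if the file has a single header line; adjust the slice if it has more.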
I'm still new to Python, so forgive me if my code seems rather messy or out of place. However, I need help with a university assignment. How can I find specific items in a CSV file? Here is what the assignment says:
Allow the user to type in a year, then, find the average life expectancy for that year. Then find the country with the minimum and the one with the maximum life expectancies for that year.
import csv
country = []
digit_code = []
year = []
life_expectancy = []
count = 0
with open("life-expectancy.csv") as lifefile:
    for line in lifefile:
        count += 1
        if count != 1:  # skip the header row
            line = line.strip()
            parts = line.split(",")
            country.append(parts[0])
            digit_code.append(parts[1])
            year.append(parts[2])
            life_expectancy.append(float(parts[3]))
highest_expectancy = max(life_expectancy)
country_with_highest = country[life_expectancy.index(max(life_expectancy))]
print(f"The country that has the highest life expectancy is {country_with_highest} at {highest_expectancy}!")
lowest_expectancy = min(life_expectancy)
country_with_lowest = country[life_expectancy.index(min(life_expectancy))]
print(f"The country that has the lowest life expectancy is {country_with_lowest} at {lowest_expectancy}!")
It looks like you only want the first and fourth tokens from each row in your CSV. Therefore, let's simplify it like this:
Hong Kong,,,85.29
Japan,,,85.03
Macao,,,84.68
Switzerland,,,84.25
Singapore,,,84.07
You can then process it like this:
FILE = 'life-expectancy.csv'
data = []
with open(FILE) as csv:
    for line in csv:
        tokens = line.split(',')
        data.append((float(tokens[3]), tokens[0]))
hi = max(data)
lo = min(data)
print(f'The country with the highest life expectancy {hi[0]:.2f} is {hi[1]}')
print(f'The country with the lowest life expectancy {lo[0]:.2f} is {lo[1]}')
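The snippet above does not filter by year, which the assignment asks for. A minimal sketch of that step follows, assuming the column order from the question's code (country, code, year, life expectancy) and a single header row. The function names life_stats_for_year and read_rows are hypothetical.

```python
import csv

def read_rows(path):
    """Read the CSV and return its data rows, skipping the header row."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # assumed: the first line is a header
        return [row for row in reader if len(row) >= 4]

def life_stats_for_year(rows, year):
    """Return (average, (min_value, country), (max_value, country)) for `year`.

    Returns None when no row matches the requested year.
    """
    data = [(float(r[3]), r[0]) for r in rows if r[2].strip() == str(year)]
    if not data:
        return None
    average = sum(value for value, _ in data) / len(data)
    return average, min(data), max(data)

# Possible usage (the file name is taken from the question):
# stats = life_stats_for_year(read_rows("life-expectancy.csv"), input("Year: "))
```

Storing (value, country) tuples lets min and max compare by life expectancy first, so the country name comes along without a second lookup.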
You have a CSV file of individual song ratings and you'd like to know the average rating for a particular song. The file will contain a single 1-5 rating for a song per line.
Write a function named average_rating that takes two strings as parameters where the first string represents the name of a CSV file containing song ratings in the format: "YouTubeID, artist, title, rating" and the second parameter is the YouTubeID of a song. The YouTubeID, artist, and title are all strings while the rating is an integer in the range 1-5. This function should return the average rating for the song with the inputted YouTubeID.
Note that each line of the CSV file is individual rating from a user and that each song may be rated multiple times. As you read through the file you'll need to track a sum of all the ratings as well as how many times the song has been rated to compute the average rating. (My code below)
import csv
def average_rating(csvfile, ID):
    with open(csvfile) as f:
        file = csv.reader(f)
        total = 0
        total1 = 0
        total2 = 0
        for rows in file:
            for items in ID:
                if rows[0] == items[0]:
                    total = total + int(rows[3])
                    for ratings in total:
                        total1 = total1 + int(ratings)
                        total2 = total2 + 1
        return total1 / total2
I am getting an error on input ['ratings.csv', 'RH5Ta6iHhCQ']: division by zero. How would I go about resolving the problem?
You can do this by using pandas DataFrame.
import pandas as pd
df = pd.read_csv('filename.csv')
total_sum = df[df['YouTubeID'] == 'RH5Ta6iHhCQ'].rating.sum()
n_rating = len(df[df['YouTubeID'] == 'RH5Ta6iHhCQ'].rating)
average = total_sum/n_rating
There are a few confusing things here; I think renaming variables and refactoring would be a smart decision. It might even make things more obvious if one function were tasked with getting all the rows for a specific YouTube ID and another with calculating the average.
import csv
def average_rating(csvfile, id):
    '''
    Calculate the average rating of a youtube video
    params: - csvfile: the location of the source rating file
            - id: the id of the video we want the average rating of
    '''
    total_ratings = 0
    count = 0
    with open(csvfile) as f:
        file = csv.reader(f)
        for rating in file:
            if rating[0] == id:
                count += 1
                total_ratings += int(rating[3])  # csv fields are strings, so convert first
    if count == 0:
        return 0
    return total_ratings / count
import csv
def average_rating(csvfile, ID):
    with open(csvfile) as f:
        file = csv.reader(f)
        cont = 0
        total = 0
        for rows in file:
            if rows[0] == ID:
                cont = cont + 1
                total = total + int(rows[3])
    return total / cont  # raises ZeroDivisionError if ID never appears in the file
This works.
I looked around for a while and didn't find anything that matched what I was doing.
I have this code:
import csv
import datetime
legdistrict = []
reader = csv.DictReader(open('active.txt', 'rb'), delimiter='\t')
for row in reader:
    if '27' in row['LegislativeDistrict']:
        legdistrict.append(row)
ages = []
for i,value in enumerate(legdistrict):
    dates = datetime.datetime.now() - datetime.datetime.strptime(value['Birthdate'], '%m/%d/%Y')
    ages.append(int(datetime.timedelta.total_seconds(dates) / 31556952))
total_values = len(ages)
total = sum(ages) / total_values
print total_values
print sum(ages)
print total
which searches a tab-delimited text file and finds the rows in the column named LegislativeDistrict that contain the string 27. (So, finding all rows that are in the 27th LD.) It works well, but I run into issues if the string is a single digit number.
When I run the code with 27, I get this result:
0 ;) eric#crunchbang ~/sbdmn/May 2014 $ python data.py
74741
3613841
48
Which means there are 74,741 values that contain 27, with combined ages of 3,613,841, and an average age of 48.
But when I run the code with 4 I get this result:
0 ;) eric#crunchbang ~/sbdmn/May 2014 $ python data.py
1177818
58234407
49
The first result (1,177,818) is much too large. There are no LDs in my state over 170,000 people, and my lists deal with voters only.
Because of this, I'm assuming using 4 is finding all the values that have 4 in them... so 14, 41, and 24 would all be used thus causing the huge number.
Is there a way I can search for a value in a specific column and use a regex or exact search? Regex works, but I can't get it to search just one column -- it searches the entire text file.
My data looks like this:
StateVoterID CountyVoterID Title FName MName LName NameSuffix Birthdate Gender RegStNum RegStFrac RegStName RegStType RegUnitType RegStPreDirection RegStPostDirection RegUnitNum RegCity RegState RegZipCode CountyCode PrecinctCode PrecinctPart LegislativeDistrict CongressionalDistrict Mail1 Mail2 Mail3 Mail4 MailCity MailZip MailState MailCountry Registrationdate AbsenteeType LastVoted StatusCode
IDNUMBER OTHERIDNUMBER NAME MI 01/01/1900 M 123 FIRST ST W CITY STATE ZIP MM 123 4 AGE 5 01/01/1950 N 01/01/2000 B
'4' in '400' returns True because in performs a substring check on strings. Use '4' == '400' instead, which returns True only if the two strings are identical:
if '4' == row['LegislativeDistrict']:
    (...)
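A small self-contained sketch of the exact-match approach, written for Python 3 and run against made-up in-memory data rather than the real voter file. The helper name rows_in_district is hypothetical, and stripping whitespace around the field is an added assumption in case the column is padded.

```python
import csv
import io

def rows_in_district(lines, district):
    """Yield rows whose LegislativeDistrict field equals `district` exactly."""
    reader = csv.DictReader(lines, delimiter="\t")
    target = str(district)
    for row in reader:
        # Compare whole strings, so '4' no longer matches '14' or '41'
        if (row.get("LegislativeDistrict") or "").strip() == target:
            yield row

# Made-up sample data with only two of the real columns
sample = io.StringIO(
    "FName\tLegislativeDistrict\n"
    "Ann\t4\n"
    "Bob\t14\n"
    "Cy\t41\n"
    "Di\t 4 \n"
)
matches = [row["FName"] for row in rows_in_district(sample, 4)]
print(matches)
```

Only the rows for Ann and Di are kept; 14 and 41 are rejected because the comparison is on the whole field, not a substring.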
I have written a program in Python that uses a hash table to read data from a file and sum the values in the last column, grouped by the values in the 2nd column. For example, for all entries with the same value in column 2, the corresponding last-column values are added together.
I have implemented the above successfully. Now I want to sort the table in descending order by the summed last-column values and print those values along with the corresponding 2nd-column (key) values. I can't figure out how to do this. Can anyone please help?
The pmt.txt file is of the form
0.418705 2 3 1985 20 0
0.420657 4 5 119 3849 5
0.430000 2 3 1985 20 500
and so on...
So, for example, for the number 2 in column 2, I have added up all the last-column values from the rows that have '2' in the 2nd column. The same process continues for the next numbers in column 2, like 4, 5, etc.
I'm using Python 3.
import math
source_ip = {}
f = open("pmt.txt","r",1)
lines = f.readlines()
for line in lines:
    s_ip = line.split()[1]
    bit_rate = int(line.split()[-1]) + 40
    if s_ip in source_ip.keys():
        source_ip[s_ip] = source_ip[s_ip] + bit_rate
        print (source_ip[s_ip])
    else:
        source_ip[s_ip] = bit_rate
f.close()
for k in source_ip.keys():
    print(str(k)+": "+str(source_ip[k]))
print ("-----------")
It sounds like you want to use the sorted function with a key parameter that gets the value from each key/value tuple. Pass reverse=True to get descending order:
sorted_items = sorted(source_ip.items(), key=lambda x: x[1], reverse=True)
You could also use itemgetter from the operator module, rather than a lambda function:
import operator
sorted_items = sorted(source_ip.items(), key=operator.itemgetter(1), reverse=True)
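To illustrate the whole flow on the three sample lines from the question, here is a minimal sketch that sums per key and then prints the totals from largest to smallest. The helper name totals_by_key and the overhead parameter (standing in for the + 40 in the question's code) are hypothetical.

```python
from collections import defaultdict
from operator import itemgetter

def totals_by_key(lines, overhead=40):
    """Sum (last column + overhead) per value of the 2nd column."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        totals[fields[1]] += int(fields[-1]) + overhead
    return totals

# The sample lines shown in the question
sample = [
    "0.418705 2 3 1985 20 0",
    "0.420657 4 5 119 3849 5",
    "0.430000 2 3 1985 20 500",
]

# Sort the (key, total) pairs by total, largest first
ranked = sorted(totals_by_key(sample).items(), key=itemgetter(1), reverse=True)
for key, total in ranked:
    print("{}: {}".format(key, total))
```

For this sample, key 2 accumulates 580 and key 4 accumulates 45, so key 2 is printed first.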
How about something like this?
#!/usr/local/cpython-3.4/bin/python
import collections
source_ip = collections.defaultdict(int)
with open("pmt.txt","r",1) as file_:
    for line in file_:
        fields = line.split()
        s_ip = fields[1]
        bit_rate = int(fields[-1]) + 40
        source_ip[s_ip] += bit_rate
        print (source_ip[s_ip])
# Sort by the summed value, largest first
for key, value in sorted(source_ip.items(), key=lambda kv: kv[1], reverse=True):
    print('{}: {}'.format(key, value))
print ("-----------")