How to make a dict from list components - python

I have a list of dates:
dates = ['2018-11-13 ', '2018-11-14 ']
and I have a list of weather data for various cities:
weather_data = [('Carbondale', 1875.341, '2018-11-13 '), ('Carbondale', 1286.16, '2018-11-14 '), ('Davenport', 708.5, '2018-11-13 '), ('Davenport', 506.1, '2018-11-14 ')]
i[1] of each tuple in weather_data is a climate score, based on climatic info for each day. I have shortened the above lists for the sake of this example.
My goal is to find the city with the lowest climate score for each day. I thought a good way to do that would be to put them in a dictionary.
An example of what I want is...
conditions_dict = {'2018-11-13': [('Carbondale', 1875.341), ('Davenport', 708.5)]}
and my end output would be...
The best weather on 2018-11-13 is in Davenport with a value of 708.5
Basically, if I had a dict with a date as the key and a list of (city, value) pairs as the value, I could then easily find the city with the lowest value for each day.
However, I cannot figure out how to make my dictionary look like this. The part I am really struggling with is how to match one date to multiple readings for various cities on that day.
Is using a dictionary even a good way to do this?

You don't really need an intermediate dict with all cities and scores for each date if your goal is to find the minimum score and its city for each date. You can simply iterate through weather_data and keep track, in a dict, of the lowest score so far and its associated city for each date:
min_score_of_date = {}
for city, score, date in weather_data:
    # keep this reading if it is the first for its date or beats the current minimum
    if date not in min_score_of_date or score < min_score_of_date[date][1]:
        min_score_of_date[date] = (city, score)
Given your sample input, min_score_of_date would become:
{'2018-11-13 ': ('Davenport', 708.5), '2018-11-14 ': ('Davenport', 506.1)}
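If you do want the intermediate dict described in the question (a date mapped to a list of (city, score) pairs), a minimal sketch using collections.defaultdict(list) could look like this. It assumes weather_data has the (city, score, date) layout shown above and strips the stray trailing space from each date; group_by_date and best_by_date are illustrative names, not part of any library.

```python
from collections import defaultdict

weather_data = [('Carbondale', 1875.341, '2018-11-13 '), ('Carbondale', 1286.16, '2018-11-14 '),
                ('Davenport', 708.5, '2018-11-13 '), ('Davenport', 506.1, '2018-11-14 ')]


def group_by_date(readings):
    """Build {date: [(city, score), ...]} from (city, score, date) tuples."""
    grouped = defaultdict(list)
    for city, score, date in readings:
        # strip() removes the trailing space present in the sample dates
        grouped[date.strip()].append((city, score))
    return dict(grouped)


def best_by_date(grouped):
    """Pick the (city, score) pair with the lowest score for each date."""
    return {date: min(pairs, key=lambda pair: pair[1]) for date, pairs in grouped.items()}


conditions_dict = group_by_date(weather_data)
for date, (city, score) in best_by_date(conditions_dict).items():
    print(f"The best weather on {date} is in {city} with a value of {score}")
```

The defaultdict creates an empty list the first time a date is seen, which is exactly the "match one date to multiple readings" step the question was stuck on.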

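Here is another way to go about it, assuming the readings have already been grouped into a dict keyed by date (like the conditions_dict in your question) but the lowest score for each date has not yet been picked out.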
# each date maps to a tuple of (city, score) pairs
conditions = {
    '2018-11-13': (
        ('Carbondale', 1875.341),
        ('Davenport', 708.5)
    )
}
# loop through every date
for date, cities in conditions.items():
    # for every date, collect the score of each city
    temperatures = [reading[1] for reading in cities]
    # then find the minimum score
    min_temperature = min(temperatures)
    # loop through all cities
    for city in cities:
        # if this city's score matches the minimum, report it
        if city[1] == min_temperature:
            # city name
            name = city[0]
            # city score as a string
            temperature = str(city[1])
            print(
                "The best weather on "
                + date
                + " is in "
                + name + " with a value of "
                + temperature
            )

Related

pandas returning line number and type?

I have a CSV file and am using Python to get the highest average price of avocados from the data. Everything works fine until I print the region:
import pandas as pd
avocadoesDB = pd.read_csv("avocado.csv")
avocadoesDB = pd.DataFrame(avocadoesDB)
avocadoesDB = avocadoesDB[['AveragePrice', 'type', 'year', 'region']]
regions = avocadoesDB[['AveragePrice', 'region']]
regionMax = max(regions['AveragePrice'])
region = regions.loc[regions['AveragePrice']==regionMax]
print(f"The highest average price for both types of potatoes is ${regionMax} from {region['region']}.")
Output:
The highest average price for both types of potatoes is $3.25 from 14125 SanFrancisco
Name: region, dtype: object.
Expected:
The highest average price for both types of potatoes is $3.25 from SanFrancisco.
So I tried a similar method on a simple dataset and it seems to work. Here's the code snippet:
mx = max(df1['Salary'])
plc = df1.loc[df1['Salary']==mx]['Name']
print('Max Sal : ' + str(plc.iloc[0]))
Output:
Max Sal : Farah
According to a post on Stack Overflow, df1.loc[df1['Salary']==mx]['Name'] returns a Series object, so to retrieve the value itself you take its first element with .iloc[0], if I understood the post correctly.
So for your code, you can replace
region = regions.loc[regions['AveragePrice']==regionMax]
print(f"The highest average price for both types of potatoes is ${regionMax} from {region['region']}.")
with
region = regions.loc[regions['AveragePrice']==regionMax]['region'].iloc[0]
print(f"The highest average price for both types of potatoes is ${regionMax} from {region}.")
This should work. Hope this helps!
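A slightly more direct route, sketched here under the assumption that the DataFrame has the AveragePrice and region columns used above, is Series.idxmax(), which returns the index label of the (first) maximum. .loc then fetches that single row, so indexing it by column gives a plain scalar rather than a Series. The tiny inline DataFrame is made-up sample data, not the real avocado.csv.

```python
import pandas as pd

# made-up sample rows standing in for avocado.csv
regions = pd.DataFrame({
    "AveragePrice": [1.33, 3.25, 2.10],
    "region": ["Albany", "SanFrancisco", "Boston"],
})

# label of the row holding the maximum price (the first one if there are ties)
best_label = regions["AveragePrice"].idxmax()
best_row = regions.loc[best_label]

regionMax = best_row["AveragePrice"]
region = best_row["region"]  # a plain string, so no index or dtype is printed
print(f"The highest average price is ${regionMax} from {region}.")
```

If several rows can share the maximum, regions.loc[regions['AveragePrice'] == regionMax, 'region'].tolist() returns all of their regions instead of just the first.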

How to find specific items in a CSV file using inputs?

I'm still new to Python, so forgive me if my code seems rather messy or out of place. I need help with an assignment for university: how can I find specific items in a CSV file? Here is what the assignment says:
Allow the user to type in a year, then, find the average life expectancy for that year. Then find the country with the minimum and the one with the maximum life expectancies for that year.
import csv
country = []
digit_code = []
year = []
life_expectancy = []
count = 0
with open("life-expectancy.csv") as lifefile:
    for line in lifefile:
        count += 1
        if count != 1:  # skip the header row
            line = line.strip()
            parts = line.split(",")
            country.append(parts[0])
            digit_code.append(parts[1])
            year.append(parts[2])
            life_expectancy.append(float(parts[3]))
highest_expectancy = max(life_expectancy)
country_with_highest = country[life_expectancy.index(highest_expectancy)]
print(f"The country that has the highest life expectancy is {country_with_highest} at {highest_expectancy}!")
lowest_expectancy = min(life_expectancy)
country_with_lowest = country[life_expectancy.index(lowest_expectancy)]
print(f"The country that has the lowest life expectancy is {country_with_lowest} at {lowest_expectancy}!")
It looks like you only want the first and fourth tokens from each row in your CSV, so let's simplify the data like this:
Hong Kong,,,85.29
Japan,,,85.03
Macao,,,84.68
Switzerland,,,84.25
Singapore,,,84.07
You can then process it like this:
FILE = 'life-expectancy.csv'
data = []
with open(FILE) as csv_file:
    for line in csv_file:
        tokens = line.split(',')
        # store (life expectancy, country) so max()/min() compare by the number first
        data.append((float(tokens[3]), tokens[0]))
hi = max(data)
lo = min(data)
print(f'The country with the highest life expectancy ({hi[0]:.2f}) is {hi[1]}')
print(f'The country with the lowest life expectancy ({lo[0]:.2f}) is {lo[1]}')
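Neither snippet above filters by the year the user types in, which is what the assignment asks for. A minimal sketch using the standard csv module, assuming the file has a header row followed by country, code, year and life-expectancy columns (the layout the question's code implies); life_expectancy_stats is an illustrative name:

```python
import csv
import io


def life_expectancy_stats(lines, year):
    """Return (average, (min_country, min_value), (max_country, max_value)) for one year.

    Assumes rows of: country, code, year, life expectancy, after a header row.
    Returns None if no row matches the year.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    # keep only rows whose year column matches the requested year (compared as text)
    readings = [(row[0], float(row[3])) for row in reader if row[2].strip() == year]
    if not readings:
        return None
    average = sum(value for _, value in readings) / len(readings)
    # compare by the numeric value, not by country name
    lowest = min(readings, key=lambda r: r[1])
    highest = max(readings, key=lambda r: r[1])
    return average, lowest, highest
```

In the assignment you would pass the open file and the typed year, e.g. with open("life-expectancy.csv", newline="") as f: result = life_expectancy_stats(f, input("Year: ").strip()). Using csv.reader rather than split(",") also copes with quoted country names that contain commas.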

Iterating string elements in a list and appending a portion of that string into an empty list

In the code below there is a transaction list, each entry containing the name, price, color and date of a transaction. I want to append the names (say 'John', 'Jay') to the customers list, the prices (say $1.21, $2.12) to the sales list and the colors (say 'white', 'red') to the color list.
Iterating over the list just gives me each whole string between the quotes. How do I append the names, prices and colors specifically to these empty lists?
transaction = ['John:$1.21:white:09/15/17','Jay:$2.12:red:09/15/17','Leo:$3,5:blue:09/15/17']
customers = [names_of_customer]
sales = [price_of_goods]
color = [color_of_goods]
You can use the below code snippet which uses the split method to achieve the required output.
transaction = ['John:$1.21:white:09/15/17','Jay:$2.12:red:09/15/17','Leo:$3,5:blue:09/15/17']
customers = []
sales = []
color = []
for tran in transaction:
    # 'John:$1.21:white:09/15/17' -> ['John', '$1.21', 'white', '09/15/17']
    elems = tran.split(':')
    customers.append(elems[0])
    sales.append(elems[1])
    color.append(elems[2])
print(customers)
print(sales)
print(color)
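Unpacking each split record into named variables makes it explicit which field goes where, and it raises a ValueError if a record ever has more or fewer than four fields instead of silently misfiling data. A minimal sketch, assuming every record has exactly the name:price:color:date layout shown:

```python
transaction = ['John:$1.21:white:09/15/17', 'Jay:$2.12:red:09/15/17', 'Leo:$3,5:blue:09/15/17']

customers = []
sales = []
color = []
for tran in transaction:
    # four fields per record; the date is not needed here
    name, price, colour, _date = tran.split(':')
    customers.append(name)
    sales.append(price)
    color.append(colour)

print(customers)  # ['John', 'Jay', 'Leo']
print(sales)      # ['$1.21', '$2.12', '$3,5']
print(color)      # ['white', 'red', 'blue']
```

The prices stay as strings here; converting them to numbers would need a decision about entries like '$3,5', which the question does not settle.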

Show the 5 cities with the highest temperature from a text file

I have a text file with some cities and temperatures, like this:
City 1 16
City 2 4
...
City100 20
And I'm showing the city with the highest temperature with the code below.
But I would like to show the 5 cities with the highest temperatures. Do you see a way to do this? I've been doing some tests, but I always end up showing the same city 5 times.
#!/usr/bin/env python3
import sys
current_city = None
current_max = float('-inf')  # so all-negative temperatures still work
city = None
for line in sys.stdin:
    line = line.strip()
    city, temperature = line.rsplit('\t', 1)
    try:
        temperature = float(temperature)
    except ValueError:
        continue
    if temperature > current_max:
        current_max = temperature
        current_city = city
print('%s\t%s' % (current_city, current_max))
You can use heapq.nlargest:
import sys
import heapq
# Read (city, temperature) pairs
pairs = [
    (c, float(t))
    for line in sys.stdin for c, t in [line.strip().rsplit('\t', 1)]
]
# Find the 5 largest pairs based on the second field, which is the temperature
for city, temperature in heapq.nlargest(5, pairs, key=lambda p: p[1]):
    print(city, temperature)
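The sample lines in the question look space-separated ("City 1 16") rather than tab-separated, and some city names contain spaces. Calling rsplit(None, 1) splits once on the last run of any whitespace, so it handles both layouts. The sketch below also skips blank lines and lines whose temperature is not a number, as the original code did with its try/except. top_cities is an illustrative helper, not an existing API.

```python
import heapq


def top_cities(lines, n=5):
    """Return the n (city, temperature) pairs with the highest temperature."""
    pairs = []
    for line in lines:
        # split once on the last whitespace run: "City 1 16" -> ["City 1", "16"]
        parts = line.strip().rsplit(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        city, temp = parts
        try:
            pairs.append((city, float(temp)))
        except ValueError:
            continue  # non-numeric temperature
    # nlargest(n, ...) keeps only the n biggest by the key, largest first
    return heapq.nlargest(n, pairs, key=lambda pair: pair[1])
```

Used on standard input, this would be: for city, temperature in top_cities(sys.stdin): print(city, temperature).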
I like pandas. This is not a complete answer, but I like to encourage people in their research. Check this out:
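import pandas as pd
listA = [1, 2, 3, 4, 5, 6, 7, 8, 9]
df = pd.DataFrame(listA)
df = df.sort_values(0)  # DataFrame.sort was removed from pandas; sort_values replaces it
df.tail()  # the last 5 rows, i.e. the 5 largest values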
With pandas, you'll want to learn about Series and DataFrames. DataFrames have a lot of functionality: you can name your columns, create them directly from input files, and sort by almost anything. There are the common Unix words head and tail (beginning and end), and you can specify the number of rows returned, and so on. I liked the book "Python for Data Analysis".
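For this particular problem, DataFrame.nlargest does the sort-and-take-top-n step in one call. A minimal sketch, assuming the file is tab-separated with no header row (as the original rsplit('\t', 1) code implies); the column names city and temperature are chosen here for illustration, and the inline data stands in for the real file:

```python
import io

import pandas as pd

# stand-in for the real tab-separated file: city <TAB> temperature
raw = "City 1\t16\nCity 2\t4\nCity 3\t30\nCity 4\t25\nCity 5\t18\nCity 6\t20\n"

df = pd.read_csv(io.StringIO(raw), sep="\t", header=None, names=["city", "temperature"])

# the 5 rows with the highest temperature, largest first
top5 = df.nlargest(5, "temperature")
print(top5.to_string(index=False))
```

With a real file you would pass its path to read_csv instead of the StringIO object.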
Store the temperatures and cities in a list. Sort the list. Then take the last 5 elements: they will be your five highest temperatures.
Read the data into a list, sort the list, and show the first 5:
import sys
cities = []
for line in sys.stdin:
    line = line.strip()
    city, temp = line.rsplit('\t', 1)
    cities.append((city, float(temp)))
# sort by temperature, highest first
cities.sort(key=lambda pair: -pair[1])
for city, temp in cities[:5]:
    print(city, temp)
This stores the city, temperature pairs in a list, which is then sorted. The key function in the sort tells the list to sort by temperature descending, so the first 5 elements of the list [:5] are the five highest temperature cities.
The following code performs exactly what you need:
from operator import itemgetter
fname = "haha.txt"
with open(fname) as f:
    content = f.readlines()
# split each non-blank line once, on the last whitespace, into [city, temperature]
content = [line.rsplit(None, 1) for line in content if line.strip()]
for line in content:
    line[1] = float(line[1])
content = sorted(content, key=itemgetter(1))
print(content)
To get the city with the highest temperature:
print(content[-1])
To get the 5 cities with the highest temperatures:
print(content[-5:])

How to efficiently iterate through a dictionary?

I'm new to Python and programming.
My textbook says I have to do the following problem set:
Create a second purchase summary that accumulates total investment by ticker symbol. In the above sample data, there are two blocks of CAT. These can easily be combined by creating a dict where the key is the ticker and the value is the list of blocks purchased. The program makes one pass through the data to create the dict. A pass through the dict can then create a report showing each ticker symbol and all blocks of stock.
I cannot think of a way, apart from hard-coding, to add the two entries of the 'CAT' stock.
## Stock Reports
stockDict = {"GM":"General Motors", "CAT":"Caterpillar", "EK":"Eastman Kodak",
             "FB":"Facebook"}
# symbol,prices,dates,shares
purchases = [("GM",100,"10-sep-2001",48), ("CAT",100,"01-apr-1999",24),
             ("FB",200,"01-jul-2013",56), ("CAT", 200,"02-may-1999",53)]
# purchase history:
print("Company", "\t\tPrice", "\tDate\n")
for stock in purchases:
    price = stock[1] * stock[3]
    name = stockDict[stock[0]]
    print(name, "\t\t", price, "\t", stock[2])
print("\n")
# THIS IS THE PROBLEM SET I NEED HELP WITH:
# accumulate total investment by ticker symbol
byTicker = {}
# create dict
for stock in purchases:
    ticker = stock[0]
    block = [stock]
    if ticker in byTicker:
        byTicker[ticker] += block
    else:
        byTicker[ticker] = block
for i in byTicker.values():
    shares = i[0][3]
    price = i[0][1]
    investment = shares * price
    print(investment)
Right now, the output is:
4800
11200
2400
It's not good because it does not calculate the two CAT stocks. Right now it only calculates one. The code should be flexible enough that I could add more CAT stocks.
Your problem is in the last part of your code. The penultimate part, which builds a list of all purchases for each ticker, is fine; the trouble is here:
for i in byTicker.values():
    shares = i[0][3]
    price = i[0][1]
    investment = shares * price
    print(investment)
Here you only use the zeroth stock for each ticker. Instead, try:
for name, blocks in byTicker.items():
    # each block is (symbol, price, date, shares)
    investment = sum(price * shares for _, price, _, shares in blocks)
    print(name, investment)
This will add up all of the stocks for each ticker, and for your example gives me:
CAT 13000
FB 11200
GM 4800
The problem with your code is that you are not iterating over the purchases, but just taking the first element of each ticker's value. That is, byTicker looks something like:
byTicker = {
"GM": [("GM",100,"10-sep-2001",48)],
"CAT": [("CAT",100,"01-apr-1999",24), ("CAT", 200,"02-may-1999",53)],
"FB": [("FB",200,"01-jul-2013",56)]
}
So when you iterate over the values, you get three lists. But when you process each list, you only access its first purchase:
price = i[0][1]
For the value corresponding to "CAT", i[0] is ("CAT",100,"01-apr-1999",24). You should look at i[1] as well! Consider iterating over the different purchases:
for company, blocks in byTicker.items():
    investment = 0
    for purchase in blocks:
        investment += purchase[1] * purchase[3]
    print(company, investment)
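The textbook exercise asks for one pass to build a dict mapping each ticker to its list of purchase blocks, then a second pass to report each ticker with all of its blocks. A minimal sketch of that two-pass structure, using dict.setdefault to start each list; it assumes the (symbol, price, date, shares) tuple layout from the question, and group_blocks and report are illustrative names:

```python
stockDict = {"GM": "General Motors", "CAT": "Caterpillar", "EK": "Eastman Kodak",
             "FB": "Facebook"}
# symbol, price, date, shares
purchases = [("GM", 100, "10-sep-2001", 48), ("CAT", 100, "01-apr-1999", 24),
             ("FB", 200, "01-jul-2013", 56), ("CAT", 200, "02-may-1999", 53)]


def group_blocks(purchases):
    """First pass: map each ticker to the list of its purchase blocks."""
    by_ticker = {}
    for block in purchases:
        # setdefault creates an empty list the first time a ticker appears
        by_ticker.setdefault(block[0], []).append(block)
    return by_ticker


def report(by_ticker):
    """Second pass: print every block per ticker and return each ticker's total."""
    totals = {}
    for ticker, blocks in by_ticker.items():
        print(stockDict[ticker])
        total = 0
        for _symbol, price, date, shares in blocks:
            print("  ", date, shares, "shares at", price)
            total += price * shares
        print("  total investment:", total)
        totals[ticker] = total
    return totals


totals = report(group_blocks(purchases))
```

Adding another CAT purchase to the list needs no code change: it simply lands in the same list and is included in the total.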
Maybe something like this:
## Stock Reports
stockDict = {"GM":"General Motors", "CAT":"Caterpillar", "EK":"Eastman Kodak",
             "FB":"Facebook"}
# symbol,prices,dates,shares
purchases = [("GM",100,"10-sep-2001",48), ("CAT",100,"01-apr-1999",24),
             ("FB",200,"01-jul-2013",56), ("CAT", 200,"02-may-1999",53)]
# purchase history:
print("Company", "\t\tPrice", "\tDate\n")
for stock in purchases:
    price = stock[1] * stock[3]
    name = stockDict[stock[0]]
    print(name, "\t\t", price, "\t", stock[2])
print("\n")
# THIS IS THE PROBLEM SET I NEED HELP WITH:
# accumulate total investment by ticker symbol
byTicker = {}
# create dict
for stock in purchases:
    ticker = stock[0]
    price = stock[1] * stock[3]
    if ticker in byTicker:
        byTicker[ticker] += price
    else:
        byTicker[ticker] = price
for ticker, price in byTicker.items():
    print(ticker, price)
The output I get is:
Company Price Date
General Motors 4800 10-sep-2001
Caterpillar 2400 01-apr-1999
Facebook 11200 01-jul-2013
Caterpillar 10600 02-may-1999
GM 4800
FB 11200
CAT 13000
which appears to be correct.
Testing whether a ticker is in the byTicker dict tells you whether a purchase has already been recorded for that stock. If there is one, you add to it; if not, you start fresh. This is basically what you were doing, except that you were collecting all of the purchase records for a given stock in that dict, when all you really cared about was the cost of each purchase.
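The same running total can be written without the if/else by giving dict.get a default of 0 (collections.defaultdict(int) is another common choice). A minimal sketch, reusing the question's (symbol, price, date, shares) layout:

```python
purchases = [("GM", 100, "10-sep-2001", 48), ("CAT", 100, "01-apr-1999", 24),
             ("FB", 200, "01-jul-2013", 56), ("CAT", 200, "02-may-1999", 53)]

byTicker = {}
for ticker, price, _date, shares in purchases:
    # get() returns 0 the first time a ticker is seen
    byTicker[ticker] = byTicker.get(ticker, 0) + price * shares

for ticker, total in byTicker.items():
    print(ticker, total)
```

Unpacking the tuple in the for statement also removes the stock[1] and stock[3] index lookups, so it is clearer which field is the price and which is the share count.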
You could build the dict the same way you were originally, and then iterate over the items stored under each key, and add them up. Something like this:
totals = []
for ticker in byTicker:
    total = 0
    for purchase in byTicker[ticker]:
        total += purchase[1] * purchase[3]
    totals.append((ticker, total))
for ticker, total in totals:
    print(ticker, total)
And just for kicks, you could compress it all into one line with generator expressions:
print("\n".join("%s: %d" % (ticker, sum(purchase[1]*purchase[3] for purchase in byTicker[ticker])) for ticker in byTicker))
Neither of these last two is really necessary, though: since you're already iterating through every purchase, you may as well accumulate the total price for each stock as you go, as I showed in the first example.
