Iterating over a csv file

Iterating over a csv file - python

I have to produce a function that will return a tuple containing the maximum temperature in the country and a list of the cities in which that temperature has been recorded. The data was input as a CSV file in the format:
Month, Jan, Feb, March....etc.
So far I have managed to get the maximum temperature but when it tries to find the cities in which that temperature was found, it comes up as an empty list.
def hottest_city(csv_filename):
    import csv
    file = open(csv_filename)
    header = next(file)
    data = csv.reader(file)
    city_maxlist = []
    for line in data:
        maxtemp = 0.0
        for temp in line[1:]:
            if float(temp >= maxtemp:
                maxtemp = float(temp)
        city_maxlist.append([maxtemp, line[0])
    maxtempcity = 0.0
    for city in city_maxlist:
        if city[0] >= maxtempcity:
            maxtempcity = city[0]
    citylist = []
    for line in data:
        for temp in line:
            if temp == maxtempcity:
                citylist.append(line[0])
    return (maxtempcity, citylist)

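Your main problem is that data is an iterator, so you can iterate over it only once. By the time the second loop over data starts, the reader is already exhausted, so the loop body never runs. Reading all the rows into a list first, with
data = list(csv.reader(file))
should help. Also, as Hans Then noticed, the code you posted cannot be the code you actually ran (for example, the line if float(temp >= maxtemp: is not valid Python), so I will leave it to you to correct the test in the second loop over data. Note that temp is a string there while maxtempcity is a float, so the two never compare equal.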
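For reference, here is a minimal sketch of one way the whole function could look once the rows are stored in a list. It assumes the first column of each row holds the city name and every remaining column holds a numeric temperature, which is a guess at the layout described in the question rather than something the post confirms.

```python
import csv


def hottest_city(csv_filename):
    """Return (maximum temperature, list of cities that recorded it)."""
    with open(csv_filename, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        # Store the rows in a list so they can be traversed more than once.
        rows = [row for row in reader if row]

    # Assumption: row[0] is the city name, row[1:] are the temperatures.
    city_max = [(max(float(t) for t in row[1:]), row[0]) for row in rows]
    hottest = max(temp for temp, _ in city_max)
    cities = [city for temp, city in city_max if temp == hottest]
    return hottest, cities
```

Comparing floats with floats (rather than a string from the file with a float) is what lets the final filter find the matching cities.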

Related

How to write CSV files with Python using print statements with variables in them

I'm new here and I have a pretty straightforward question about writing CSV output files with Python. I've been googling this for a while and I can't find an answer to my question. I have a bunch of tasks which have to output answers to the terminal as well as write the answers to a CSV output file. I've got all the correct answers in the terminal but I can't figure out how to write them to a CSV file. My print statements contain variables, and I need the value of the variable printed. For example, "The total profit/loss for this period is: $22564198" should be written to the CSV, not the print statement format, which is: 'The total profit/loss for this period is: ${total}'
I'm copying my code below.
import os
import csv
date = []
profloss = []
changes = []
total = 0
totalChange = 0
mo2mo = {}
budget_csv = os.path.join(xxxxxx)
with open(budget_csv) as csvfile:
    csvreader = csv.reader(csvfile, delimiter=",")
    #splitting the two columns into separate lists
    for row in csvreader:
        date.append(row[0])
        profloss.append(row[1])
#removing header rows
date.pop(0)
profloss.pop(0)
#printing how many months are in the data set
dataLen = len(date)
countMonths = "This data set has " + str(len(date)) + " months."
#calculating total profit/loss
for i in range(0, len(profloss)):
    profloss[i] = int(profloss[i])
    total = total + (profloss[i])
print(f'The total profit/loss for this period is: ${total}')
#calculating the difference between months and adding it to a list
for i in range(0, len(profloss)-1):
    difference = (profloss[i+1]) - (profloss[i])
    changes.append(difference)
#removing the first element in date to make a dictionary for dates: changes
date.pop(0)
#creating a dictionary of months as keys and change as values, starting with the second month
mo2mo = {date[i]: changes[i] for i in range(len(date))}
#calculating the average change from one month to the next
for i in range(0, len(changes)):
    totalChange = totalChange + changes[i]
avChange = totalChange/len(changes)
print(f'The average change from month to month for this dataset is: {round((avChange),2)}')
#getting the month with the maximum increase
keyMax = max(mo2mo, key= lambda x: mo2mo[x])
for key,value in mo2mo.items():
    if key == keyMax:
        print(f'The month with the greatest increase was: {key} ${value}')
#getting the month with the maximum decrease
keyMin = min(mo2mo, key= lambda x: mo2mo[x])
for key, value in mo2mo.items():
    if key == keyMin:
        print(f'The maximum decrease in profits was: {key} ${value}')
outputCSV = ("countMonths",)
output_path = os.path.join("..", "output", "PyBankAnswers.csv")
#writing outcomes to csv file
with open(output_path,"w") as datafile:
    writer = csv.writer(datafile)
    for row in writer.writerow()
I've only got experience printing whole lists to csv files and not actual statements of text. I've tried to find how to do this but I'm not having luck. There's got to be a way without me just writing out the sentence I want printed by hand and having the CSV writer print that statement? OR do I just have to copy the sentence from the terminal and then print those statements row by row?

The print() function accepts a file keyword argument that sends the output to an open file stream instead of the terminal. So you could make every print() call twice, once to the terminal and once to the file. To avoid all this duplication, you can put the two calls into a function, then call that function instead of print() everywhere.
import os
import csv
date = []
profloss = []
changes = []
total = 0
totalChange = 0
mo2mo = {}
def print_to_terminal_and_file(f, *args, **kwargs):
    print(*args, **kwargs)
    print(*args, file=f, **kwargs)
budget_csv = os.path.join(xxxxxx)
with open(budget_csv) as csvfile:
    csvreader = csv.reader(csvfile, delimiter=",")
    #skipping the header row, so that int() below only sees numbers
    next(csvreader)
    #splitting the two columns into separate lists
    for row in csvreader:
        date.append(row[0])
        profloss.append(int(row[1]))
output_path = os.path.join("..", "output", "PyBankAnswers.txt")
#writing outcomes to text file
with open(output_path,"w") as datafile:
    #printing how many months are in the data set
    print_to_terminal_and_file(datafile, f"This data set has {len(date)} months.")
    #calculating total profit/loss
    total = sum(profloss)
    print_to_terminal_and_file(datafile, f'The total profit/loss for this period is: ${total}')
    #calculating the difference between months and adding it to a list
    for i in range(0, len(profloss)-1):
        difference = (profloss[i+1]) - (profloss[i])
        changes.append(difference)
    #removing the first element in date to make a dictionary for dates: changes
    date.pop(0)
    #creating a dictionary of months as keys and change as values, starting with the second month
    mo2mo = {date[i]: changes[i] for i in range(len(date))}
    #calculating the average change from one month to the next
    for i in range(0, len(changes)):
        totalChange = totalChange + changes[i]
    avChange = totalChange/len(changes)
    print_to_terminal_and_file(datafile, f'The average change from month to month for this dataset is: {round((avChange),2)}')
    #getting the month with the maximum increase
    keyMax = max(mo2mo, key= lambda x: mo2mo[x])
    for key,value in mo2mo.items():
        if key == keyMax:
            print_to_terminal_and_file(datafile, f'The month with the greatest increase was: {key} ${value}')
    #getting the month with the maximum decrease
    keyMin = min(mo2mo, key= lambda x: mo2mo[x])
    for key, value in mo2mo.items():
        if key == keyMin:
            print_to_terminal_and_file(datafile, f'The maximum decrease in profits was: {key} ${value}')
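If the output really does need to be a CSV file rather than a text file, one option is to write each result as a label/value row with csv.writer. The following is only a sketch of that idea: the write_summary helper, the "metric"/"value" header, and the sample numbers (apart from the total quoted in the question) are made up for illustration, and an in-memory buffer stands in for the real output file.

```python
import csv
import io


def write_summary(stream, results):
    """Write each (label, value) pair of a dict as one CSV row."""
    writer = csv.writer(stream)
    writer.writerow(["metric", "value"])  # hypothetical header row
    for label, value in results.items():
        writer.writerow([label, value])


# Hypothetical results; in the real script these would be the computed values.
results = {"Total months": 3, "Total profit/loss": 22564198}

buffer = io.StringIO()  # stands in for open(output_path, "w", newline="")
write_summary(buffer, results)
print(buffer.getvalue())
```

Keeping the label and the number in separate columns makes the file easier to load back into a spreadsheet than a full sentence per row would be.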

How to find minimum value from CSV file row in Python?

I'm a beginner at Python and I am using it for my project.
I want to extract the minimum value from column 4 of a CSV file and I am not sure how to.
I can print the whole of column[4] but am not sure how to print just the minimum value from column[4].
CSV File: https://www.emcsg.com/marketdata/priceinformation
I'm downloading the Uniform Singapore Energy Price & Demand Forecast for 9 Sep.
Thank you in advance.
This is my current code:
import csv
import operator
with open('sep.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    header = next(readCSV)
    data = []
    for row in readCSV:
        Date = row[0]
        Time = row[1]
        Demand = row[2]
        RCL = row[3]
        USEP = row[4]
        EHEUR = row[5]
        LCP = row[6]
        Regulations = row[7]
        Primary = row[8]
        Secondary = row[9]
        Contingency = row[10]
        Last_Updated = row[11]
        print(header[4])
        print(row[4])

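I'm not sure how you are reading the values. However, you can add all the values to a list and then take the minimum:
values = []
for row in readCSV: # the reader from your code, after the header has been skipped
    values.append(float(row[4])) # convert to a number so min() does not compare strings
min_value = min(values)
Note: values.insert(index, value) also works if a value has to go at a particular position; index is the 'place' where the value gets inserted.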

Your phrasing is a bit ambiguous. At first I thought you meant the minimum of the fourth row, but looking at the data you seem to want the minimum of the column at index 4 (USEP($/MWh)). For that (assuming that "Realtime_Sep-2017.csv" is the filename) you can do:
import pandas as pd
df = pd.read_csv("Realtime_Sep-2017.csv")
print(min(df["USEP($/MWh)"]))
Other options include df.min()["USEP($/MWh)"], df.min()[4], and min(df.iloc[:,4])

EDIT 2 :
Solution for a column without pandas module:
with open("Realtime_Sep-2017.csv") as file:
    lines = file.read().split("\n") #Read lines
num_list = []
for line in lines:
    try:
        item = line.split(",")[4][1:-1] #Choose the column at index 4 and strip the surrounding quotes
        num_list.append(float(item)) #Try to parse
    except (IndexError, ValueError):
        pass #If it can't parse, the string is not a number
print(max(num_list)) #Prints maximum value
print(min(num_list)) #Prints minimum value
Output:
81.92
79.83
EDIT :
Here is the solution for a column:
import pandas as pd
df = pd.read_csv("Realtime_Sep-2017.csv")
row_count = df.shape[0]
column_list = []
for i in range(row_count):
    item = df.at[i, df.columns.values[4]] #column at index 4
    column_list.append(float(item)) #parse float and append list
print(max(column_list)) #Prints maximum value
print(min(column_list)) #Prints minimum value
BEFORE EDIT :
(solution for a row)
Here is a simple code block:
with open("Realtime_Sep-2017.csv") as file:
    lines = file.read().split("\n") #Reading lines
num_list = []
line = lines[3] #Choosing 4th row.
for item in line.split(","):
    try:
        num_list.append(float(item[1:-1])) #Try to parse
    except ValueError:
        pass #If it can't parse, the string is not a number
print(max(num_list)) #Prints maximum value
print(min(num_list)) #Prints minimum value
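A variation on the pandas-free approach is to let csv.reader split the lines, since it already understands quoted fields and removes the need to slice off the quotes by hand. The sketch below is only an illustration: column_min_max is a hypothetical helper, and the two-row sample (including its header names and the dates) is invented to stand in for the real file, reusing the two values shown in the output above.

```python
import csv
import io


def column_min_max(lines, index):
    """Return (min, max) of the numeric values found in one column."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    values = []
    for row in reader:
        try:
            values.append(float(row[index]))
        except (IndexError, ValueError):
            continue  # short row or non-numeric cell: ignore it
    return min(values), max(values)


# Invented sample standing in for the downloaded file.
sample = io.StringIO(
    '"Date","Period","Demand","RCL","USEP"\n'
    '"09 Sep 2017","1","5000","0","81.92"\n'
    '"09 Sep 2017","2","4900","0","79.83"\n'
)
print(column_min_max(sample, 4))
```

With a real file, the StringIO object would be replaced by the handle returned from open(filename, newline="").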

Add column from one .csv to another .csv file using Python

I am currently writing a script where I am creating a csv file ('tableau_input.csv') composed both of other csv files columns and columns created by myself. I tried the following code:
def make_tableau_file(mp, current_season = 2016):
    # Produces a csv file containing predicted and actual game results for the current season
    # Tableau uses the contents of the file to produce visualization
    game_data_filename = 'game_data' + str(current_season) + '.csv'
    datetime_filename = 'datetime' + str(current_season) + '.csv'
    with open('tableau_input.csv', 'wb') as writefile:
        tableau_write = csv.writer(writefile)
        tableau_write.writerow(['Visitor_Team', 'V_Team_PTS', 'Home_Team', 'H_Team_PTS', 'True_Result', 'Predicted_Results', 'Confidence', 'Date'])
        with open(game_data_filename, 'rb') as readfile:
            scores = csv.reader(readfile)
            scores.next()
            for score in scores:
                tableau_content = score[1::]
                # Append True_Result
                if int(tableau_content[3]) > int(tableau_content[1]):
                    tableau_content.append(1)
                else:
                    tableau_content.append(0)
                # Append 'Predicted_Result' and 'Confidence'
                prediction_results = mp.make_predictions(tableau_content[0], tableau_content[2])
                tableau_content += list(prediction_results)
                tableau_write.writerow(tableau_content)
        with open(datetime_filename, 'rb') as readfile2:
            days = csv.reader(readfile2)
            days.next()
            for day in days:
                tableau_write.writerow(day)
'tableau_input.csv' is the file I am creating. The columns 'Visitor_Team', 'V_Team_PTS', 'Home_Team', 'H_Team_PTS' come from 'game_data_filename'(e.g tableau_content = score[1::]). The columns 'True_Result', 'Predicted_Results', 'Confidence' are columns created in the first for loop.
So far, everything works but finally I tried to add to the 'Date' column data from the 'datetime_filename' using the same structure as above but when I open my 'tableau_input' file, there is no data in my 'Date' column. Can someone solve this problem?
For info, below are screenshots of csv files respectively for 'game_data_filename' and 'datetime_filename' (nb: datetime values are in datetime format)

It's hard to test this as I don't really know what the input should look like, but try something like this:
def make_tableau_file(mp, current_season=2016):
    # Produces a csv file containing predicted and actual game results for the current season
    # Tableau uses the contents of the file to produce visualization
    game_data_filename = 'game_data' + str(current_season) + '.csv'
    datetime_filename = 'datetime' + str(current_season) + '.csv'
    with open('tableau_input.csv', 'wb') as writefile:
        tableau_write = csv.writer(writefile)
        tableau_write.writerow(
            ['Visitor_Team', 'V_Team_PTS', 'Home_Team', 'H_Team_PTS', 'True_Result', 'Predicted_Results', 'Confidence', 'Date'])
        with open(game_data_filename, 'rb') as readfile, open(datetime_filename, 'rb') as readfile2:
            scoreReader = csv.reader(readfile)
            scores = [row for row in scoreReader]
            scores = scores[1::]
            daysReader = csv.reader(readfile2)
            days = [day for day in daysReader]
            days = days[1::]  # skip the header here too, as your days.next() did
            if(len(scores) != len(days)):
                print("File lengths do not match")
            else:
                for i in range(len(days)):
                    tableau_content = scores[i][1::]
                    tableau_date = days[i]
                    # Append True_Result
                    if int(tableau_content[3]) > int(tableau_content[1]):
                        tableau_content.append(1)
                    else:
                        tableau_content.append(0)
                    # Append 'Predicted_Result' and 'Confidence'
                    prediction_results = mp.make_predictions(tableau_content[0], tableau_content[2])
                    tableau_content += list(prediction_results)
                    tableau_content += tableau_date
                    tableau_write.writerow(tableau_content)
This combines both of the file reading parts into one.
As per your questions below:
scoreReader = csv.reader(readfile)
scores = [row for row in scoreReader]
scores = scores[1::]
This uses a list comprehension to create a list called scores, with every element being one of the rows from scoreReader. As scoreReader is an iterator, every time we ask it for a row it hands us the next one, until there are no more.
The second line scores = scores[1::] just chops off the first element of the list, as you don't want the header.
For more info try these:
Generators on Wiki
List Comprehensions
Good luck!
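The code in this thread is written for Python 2 ('rb'/'wb' modes and reader.next()). As a rough Python 3 illustration of the same row-pairing idea, the sketch below uses zip to walk two readers in step. The append_column helper and the tiny in-memory "files" are hypothetical stand-ins, not the asker's real data.

```python
import csv
import io


def append_column(base_rows, extra_rows):
    """Yield each base row extended with the matching extra row."""
    base = list(base_rows)
    extra = list(extra_rows)
    if len(base) != len(extra):
        raise ValueError("File lengths do not match")
    for left, right in zip(base, extra):
        yield left + right


# Invented stand-ins for the game data file and the datetime file.
games = io.StringIO("Visitor_Team,V_Team_PTS\nA,90\nB,101\n")
dates = io.StringIO("Date\n2016-10-25\n2016-10-26\n")

game_reader = csv.reader(games)
date_reader = csv.reader(dates)
header = next(game_reader) + next(date_reader)  # consume both header rows

out = io.StringIO()  # stands in for open("tableau_input.csv", "w", newline="")
writer = csv.writer(out)
writer.writerow(header)
writer.writerows(append_column(game_reader, date_reader))
print(out.getvalue())
```

The key point is the same as in the answer: a value only lands in the Date column if it is appended to the row before that row is written.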

Transforming a text file into column vectors

I have a text file that I would like to break up into column vectors:
dtstamp ozone ozone_8hr_avg
06/18/2015 14:00:00 0.071 0.059
06/18/2015 13:00:00 0.071 0.053
How do I produce output in the following format?
dtstamp = [06/18/2015 14:00:00, 06/18/2015 13:00:00]
ozone = [0.071, 0.071]
etc.

import datetime
dtstamp = [] # initialize the dtstamp list
ozone = [] # initialize the ozone list
with open('file.txt', 'r') as f:
    next(f) # skip the title line
    for line in f: # iterate through the file
        if not line.strip(): continue # skip blank lines
        day, time, value, _ = line.split() # split up the line
        dtstamp.append(datetime.datetime.strptime(' '.join((day, time)),
                                                  '%m/%d/%Y %H:%M:%S')) # add a date
        ozone.append(float(value)) # add a value
You can then combine these lists with zip to work with corresponding dates/values:
for date, value in zip(dtstamp, ozone):
    print(date, value) # just an example

A few of the other answers seem to give errors when run.
Try this, it should work like a charm!
dtstmp = []
ozone = []
ozone_8hr_avg = []
with open('file.txt', 'r') as file:
    next(file)
    for line in file:
        if not line.strip(): #If a blank line occurs
            continue
        words = line.split() #Extract the words
        dtstmp.append(' '.join(words[0:2])) #Join the date and the time
        ozone.append(words[2]) #Add ozone
        ozone_8hr_avg.append(words[3]) #Add the 8-hour average
print("dtstmp =", dtstmp)
print("ozone =", ozone)
print("ozone_8hr_avg =", ozone_8hr_avg)

I would check out pandas (http://pandas.pydata.org) or the csv module. With csv you'll have to build the columns yourself, since it gives you rows.
rows = [row for row in csv.reader(file, delimiter='\t')] # get the rows
col0 = [row[0] for row in rows] # construct a column from element 0 of each row
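To build every column at once rather than one index at a time, the rows can be transposed with zip(*rows). This is a small sketch under the same assumption as the snippet above, namely a tab-delimited file; the sample content and its column names are invented for illustration.

```python
import csv
import io

# Invented tab-delimited sample standing in for the real file.
sample = io.StringIO("a\tb\tc\n1\t2\t3\n4\t5\t6\n")

rows = list(csv.reader(sample, delimiter="\t"))
header, body = rows[0], rows[1:]

# zip(*body) regroups the cells by position, i.e. it turns rows into columns.
columns = {name: list(values) for name, values in zip(header, zip(*body))}
print(columns["b"])
```

The values stay strings here; they would still need float() or datetime parsing, as in the other answers.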

Try this, my friend:
# -*- coding: utf8 -*-
with open("./file.txt") as file:
    lines = file.readlines()[1:] # skip the header line
data = []
data_hour = []
ozone = []
ozone_8hr_avg = []
for i_line in lines:
    if not i_line.strip(): # skip blank lines
        continue
    data.append(i_line.split()[0:2])
    data_hour.append(' '.join(data[-1]))
    ozone.append(i_line.split()[2])
    ozone_8hr_avg.append(i_line.split()[3])
#print (data)
print (data_hour)
print (ozone)
print (ozone_8hr_avg)
If that helps, remember to accept the answer.

Finding maximum values within a data set given restrictions

I have a task where I have been given a set of data as follows
Station1.txt sample #different sets of data for different no. stations
Date Temperature
19600101 46.1
19600102 46.7
19600103 99999.9 #99999 = not recorded
19600104 43.3
19600105 38.3
19600106 40.0
19600107 42.8
I am trying to create a function
display_maxs(stations, dates, data, start_date, end_date) which displays
a table of maximum temperatures for the given station/s and the given date
range. For example:
stations = load_stations('stations2.txt')
5
data = load_all_stations_data(stations)
dates = load_dates(stations)
display_maxs(stations, dates, data, '20021224', '20021228') #these are dates in yyyymmdd format
I have created functions for data
def load_all_stations_data(stations):
    data = {}
    file_list = ("Brisbane.txt", "Rockhampton.txt", "Cairns.txt", "Melbourne.txt", "Birdsville.txt", "Charleville.txt") )
    for file_name in file_list:
        file = open(stations(), 'r')
        station = file_name.split()[0]
        data[station] = []
        for line in file:
            values = line.strip().strip(' ')
            if len(values) == 2:
                data[station] = values[1]
        file.close()
    return data
functions for stations
def load_all_stations_data(stations):
    stations = []
    f = open(stations[0] + '.txt', 'r')
    stations = []
    for line in f:
        x = (line.split()[1])
        x = x.strip()
        temp.append(x)
    f.close()
    return stations
and functions for dates
def load_dates(stations):
    f = open(stations[0] + '.txt', 'r')
    dates = []
    for line in f:
        dates.append(line.split()[0])
    f.close()
    return dates
Now I just need help with creating the table which displays the max temp for any given date restrictions and calls the above functions with data, dates and station.

I'm not really sure what those functions are supposed to do, particularly as two of them seem to have the same name. Also, there are many errors in your code.
    file = open(stations(), 'r')
Here you try to call stations as a function, but it seems to be a list.
    station = file_name.split()[0]
The file names contain no spaces, so this has no effect. Did you mean split('.')?
    values = line.strip().strip(' ')
Probably one of those strip calls should be split?
    data[station] = values[1]
This overwrites data[station] in each iteration. You probably wanted to append the value?
    temp.append(x)
The variable temp is not defined; did you mean stations?
Also, instead of reading the dates and the values into two separate lists, I suggest you create a list of tuples. This way you will only need a single function:
def get_data(filename):
    with open(filename) as f:
        data = []
        for line in f:
            try:
                date, value = line.split()
                data.append((int(date), float(value)))
            except ValueError:
                pass # pass on header, empty lines, etc.
        return data
If this is not an option, you can create a list of tuples by zipping the lists of dates and values, i.e. data = list(zip(dates, values)). Then you can use the max builtin function together with a generator expression that keeps only the values between the two dates, and a key function that compares the tuples by value.
def display_maxs(data, start_date, end_date):
    return max(((d, v) for (d, v) in data
                if start_date <= d <= end_date and v < 99999),
               key=lambda x: x[1])
print(display_maxs(get_data("Station1.txt"), 19600103, 19600106))
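One thing to be aware of is that max raises a ValueError when the filter leaves nothing to compare, for example when every reading in the range is the 99999.9 "not recorded" marker. A possible variation, sketched below, passes default=None to max (available in Python 3.4 and later). The name max_in_range and the missing parameter are made up for this example; the sample values are taken from the question.

```python
def max_in_range(data, start_date, end_date, missing=99999.0):
    """Return the (date, value) pair with the highest value, or None."""
    candidates = (
        (d, v) for d, v in data
        if start_date <= d <= end_date and v < missing
    )
    # default=None avoids a ValueError when no reading passes the filter.
    return max(candidates, key=lambda pair: pair[1], default=None)


sample = [(19600101, 46.1), (19600102, 46.7), (19600103, 99999.9), (19600104, 43.3)]
print(max_in_range(sample, 19600101, 19600104))
print(max_in_range(sample, 19600103, 19600103))
```

Returning None lets the caller decide how to show a date range with no valid readings in the table.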

Use pandas. Reading in each text file takes a single function call, which handles comments, missing data (99999.9), and dates. The code below reads the files from a sequence of file names, handling comments and converting 99999.9 to a "missing" value. It then selects the date range from start to stop and the given sequence of station names (the file names minus the extensions), and gets the maximum of each (in maxdf).
import pandas as pd
import os
def load_all_stations_data(stations):
    """Load the stations defined in the sequence of file names."""
    sers = []
    for fname in stations:
        # read_csv's squeeze=True argument was removed in pandas 2.0;
        # calling .squeeze("columns") on the result does the same job
        ser = pd.read_csv(fname, sep=r'\s+', header=0, index_col=0,
                          comment='#', engine='python', parse_dates=True,
                          na_values=['99999.9']).squeeze("columns")
        ser.name = os.path.splitext(fname)[0]
        sers.append(ser)
    return pd.concat(sers, axis=1)
def get_maxs(df, start, stop, sites):
    """Get the stations and date range given, then get the max for each"""
    return df.loc[start:stop, list(sites)].max(skipna=True)
Usage of the second function would be like so, where df is the frame returned by load_all_stations_data:
maxdf = get_maxs(df, '20021224','20021228', ("Rockhampton", "Cairns"))
If the #99999 = not recorded comment is not actually in your files, you can get rid of the engine='python' and comment='#' arguments:
def load_all_stations_data(stations):
    """Load the stations defined in the sequence of file names."""
    sers = []
    for fname in stations:
        ser = pd.read_csv(fname, sep=r'\s+', header=0, index_col=0,
                          parse_dates=True,
                          na_values=['99999.9']).squeeze("columns")
        ser.name = os.path.splitext(fname)[0]
        sers.append(ser)
    return pd.concat(sers, axis=1)
