Disclaimer: new to Python.
I am trying to print a count of an attribute across a list of objects, but when I run the code every count comes back as zero. The list of objects prints fine; counting the countries the students are from is the part that fails.
Below are the txt file I am reading from, the class I have set up, and the code. Any help would be greatly appreciated. I have attached a screenshot of the output at the bottom.
(I have had to space out the data from the text file.)
B123, Jones, Barry, 24, Wales
B134, Kerry, Jane, 21, Scotland
B456, Smith, Percy, 19, England
B788, Roberts, Mary, 20, England
B543, Brown, Sinead, 22, Scotland
B777, Wilson, Rachel, 24, Wales
B321, Taylor, Peter, 20, England
B448, Anderson, Jill, 18, England
B999, Moore, Misty, 20, Wales
B278, Jackson, Bob, 23, Scotland
class Student:
    def __init__(self, student_id, surname, forename, age, country):
        self.student_id = student_id
        self.surname = surname
        self.forename = forename
        self.age = age
        self.country = country

    def printStudentDetails(self):
        print("StudentID: ", self.student_id)
        print("Surname: ", self.surname)
        print("Forename: ", self.forename)
        print("Age: ", self.age)
        print("Country: ", self.country)
from Student import *

students_list = []
students_text = open("studentsText.txt", "r")
for line in students_text:
    split_line = line.split(", ")
    students = Student(*split_line)
    students_list.append(students)
students_text.close()

def print_students1():
    english_count = 0
    scotland_count = 0
    wales_count = 0
    for studentObj in students_list:
        studentObj.printStudentDetails()
        if studentObj.country == "England":
            english_count += 1
        elif studentObj.country == "Scotland":
            scotland_count += 1
        elif studentObj.country == "Wales":
            wales_count += 1
    print("The amount of students is ", len(students_list))
    print("English Students: ", english_count)
    print("Scottish Students: ", scotland_count)
    print("Welsh Students: ", wales_count)
Output for print(studentObj)
It looks like whitespace characters could be causing the if statements to always evaluate to false: the last field of each line still ends with the newline read from the file, so studentObj.country is "Wales\n" rather than "Wales". Try using the .strip() method:
        if studentObj.country.strip() == "England":
            english_count += 1
        elif studentObj.country.strip() == "Scotland":
            scotland_count += 1
        elif studentObj.country.strip() == "Wales":
            wales_count += 1
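As a minimal sketch of the same idea, the whitespace can also be removed once, while parsing, instead of in every comparison. The sample lines are inlined here rather than read from studentsText.txt, and parse_student_line is a hypothetical helper name, not part of the original code.

```python
from collections import Counter

# Sample lines as they come out of the file, trailing newline included.
sample_lines = [
    "B123, Jones, Barry, 24, Wales\n",
    "B456, Smith, Percy, 19, England\n",
    "B788, Roberts, Mary, 20, England\n",
]


def parse_student_line(line):
    """Split one line on commas and strip whitespace from every field."""
    return [field.strip() for field in line.split(",")]


# Without strip(), the last field keeps its newline and never equals "Wales".
print(repr(sample_lines[0].split(", ")[-1]))

rows = [parse_student_line(line) for line in sample_lines]
# The country is the fifth field (index 4) of each cleaned row.
country_counts = Counter(row[4] for row in rows)
print(country_counts["England"], country_counts["Wales"])
```

The cleaned fields could then be passed to Student(*fields) exactly as before.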
The following could be helpful. "blankpaper.txt" is a file with the text you provided pasted in. The program uses a counts dictionary to store the number of observations by country. The program reads the file, line-by-line. For each line, the count for the corresponding country is incremented in counts.
The code is designed so that if you need to make use of information in more than one column (presently we only need information from the last column) then it would be easy to modify. To illustrate this, snippet 2 illustrates how to also compute the min and max ages from the file (while also counting the number of observations by country).
I hope this helps. If there are any questions and/or if there is any way that you think I can help please let me know!
Snippet 1 (counts observations by country)
import csv

# dictionary to store number of observations by country
counts = {"Wales": 0, "England": 0, "Scotland": 0}

with open("blankpaper.txt", newline = '') as f:
    # returns reader object used to iterate over lines of f
    spamreader = csv.reader(f, delimiter = ',')
    # each row read from file is returned as a list of strings
    for index_a, row in enumerate(spamreader):
        # reversed() returns reverse iterator (start from end of list of str)
        for index_b, i in enumerate(reversed(row)):
            # if last element of line (where countries are)
            if index_b == 0:
                for key in counts:
                    if key in i:
                        counts[key] += 1
                        break
                break

print(f"Counts: {counts}")
Output
Counts: {'Wales': 3, 'England': 4, 'Scotland': 3}
Snippet 2 (counts observations by country and computes max and min ages)
import csv

# dictionary to store number of observations by country
counts = {"Wales": 0, "England": 0, "Scotland": 0}

with open("blankpaper.txt", newline = '') as f:
    # returns reader object used to iterate over lines of f
    spamreader = csv.reader(f, delimiter = ',')
    # each row read from file is returned as a list of strings
    for index_a, row in enumerate(spamreader):
        # reversed() returns reverse iterator (start from end of list of str)
        for index_b, i in enumerate(reversed(row)):
            # if last element of line (where countries are)
            if index_b == 0:
                for key in counts:
                    if key in i:
                        counts[key] += 1
                        break
                continue
            # second to last element of line
            i = int(i)
            # if first line, second to last element
            if index_a == 0:
                # initialize max_age and min_age
                max_age = min_age = i
            break
        #print(row)
        # if encounter an age greater than current max, make that the max
        if i > max_age:
            max_age = i
        # if encounter an age less than current min, make that the min
        if i < min_age:
            min_age = i

print(f"\nMax age from file: {max_age}")
print(f"Min age from file: {min_age}")
print(f"Counts: {counts}")
Output
Max age from file: 24
Min age from file: 18
Counts: {'Wales': 3, 'England': 4, 'Scotland': 3}
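For comparison, here is a shorter sketch of the same counting and min/max computation. It relies on the csv module's skipinitialspace option to drop the space after each comma, and it reads the columns by position instead of iterating over each row in reverse. The data is inlined through io.StringIO as a stand-in for blankpaper.txt, which is an assumption made only to keep the example self-contained.

```python
import csv
import io
from collections import Counter

# Inline stand-in for blankpaper.txt (assumption: same comma-separated layout).
data = io.StringIO(
    "B123, Jones, Barry, 24, Wales\n"
    "B134, Kerry, Jane, 21, Scotland\n"
    "B448, Anderson, Jill, 18, England\n"
)

counts = Counter()
ages = []
# skipinitialspace=True ignores the whitespace that follows each delimiter.
for row in csv.reader(data, skipinitialspace=True):
    if not row:  # skip blank lines
        continue
    ages.append(int(row[3]))  # fourth column holds the age
    counts[row[4]] += 1       # fifth column holds the country

print(f"Max age: {max(ages)}, min age: {min(ages)}")
print(f"Counts: {dict(counts)}")
```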
Related
I have a txt file with this structure of data:
3
100 name1
200 name2
50 name3
2
1000 name1
2000 name2
0
The input contains several sets. Each set starts with a row containing one natural number N, the number of bids, 1 ≤ N ≤ 100. Next, there are N rows containing the player's price and his name separated by a space. The player's price is an integer and ranges from 1 to 2*10^9.
Expected output is:
Name2
Name2
How can I find the highest price and name for each set of data?
I tried this (to find the highest price):
offer = []
name = []
with open("futbal_zoznam_hracov.txt", "r") as f:
    for line in f:
        maximum = []
        while not line.isdigit():
            price = line.strip().split()[0]
            offer.append(int(price))
            break
    maximum.append(max(offer[1:]))
print(offer)
print(maximum)
This creates a list of all sets but not one by one. Thank you for your advice.
You'll want to manually loop over each set using the numbers, rather than a for loop over the whole file.
For example:
with open("futbal_zoznam_hracov.txt") as f:
    while True:
        try:  # until end of file
            bids = int(next(f).strip())
            if bids == 0:
                continue  # or break if this is guaranteed to be end of the file
            max_price = float("-inf")
            max_player = None
            for _ in range(bids):
                player = next(f).strip().split()
                price = int(player[0])
                if price > max_price:
                    max_price = price
                    max_player = player[1]
            print(max_player)
        except (StopIteration, ValueError):
            # StopIteration: end of file; ValueError: blank or malformed line
            break
EDITED:
The lines in the input file containing a single token are irrelevant, so this can be greatly simplified:
with open('futbal_zoznam_hracov.txt') as f:
    _set = []
    for line in f:
        p, *n = line.split()
        if n:
            _set.append((float(p), n[0]))
        else:
            if _set:
                print(max(_set)[1])
                _set = []
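The set-by-set reading can also be wrapped in a function that returns the winners instead of printing them, which makes it easier to check. This is only a sketch: highest_bidders is a hypothetical name, the input is passed as a list of lines rather than read from the file, and a 0 header is assumed to end the input.

```python
def highest_bidders(lines):
    """Return the name of the highest bidder for each set in `lines`."""
    winners = []
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:      # tolerate blank lines between sets
            continue
        n = int(header)     # number of bids in this set
        if n == 0:          # assumption: 0 marks the end of the input
            break
        bids = []
        for _ in range(n):
            price, name = next(it).split()
            bids.append((int(price), name))
        # Compare on the price only; the name is just carried along.
        winners.append(max(bids, key=lambda bid: bid[0])[1])
    return winners


sample = ["3", "100 name1", "200 name2", "50 name3",
          "2", "1000 name1", "2000 name2", "0"]
print(highest_bidders(sample))
```

With a real file, the same function could be called as highest_bidders(f) on the open file object, since it only needs an iterable of lines.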
The first line gives the number of entries. Further, each entry contains the name of the candidate and the number of votes cast for him in one of the states. Summarize the results of the elections: for each candidate, determine the number of votes cast for him. Use dictionaries to complete the tasks.
Input:
Number of voting records (integer number), then pairs <candidate name> - <number of votes>
Output:
Print the solution of the problem.
Example:
Input:
5
McCain 10
McCain 5
Obama 9
Obama 8
McCain 1
Output:
McCain 16
Obama 17
My problem is at the step when I have to sum keys with same names but different values.
My code is:
cand_n = int(input())
count = 0
countd = 0
cand_list = []
cand_f = []
num = []
surname = []
edict = {}
while count < cand_n:
    cand = input()
    count += 1
    cand_f = cand.split(' ')
    cand_list.append(cand_f)
for k in cand_list:
    for i in k:
        if i.isdigit():
            num.append(int(i))
        else:
            surname.append(i)
while countd < cand_n:
    edict[surname[countd]] = num[countd]
    countd += 1
print(edict)
You can add the name and vote to the dictionary directly instead of using an extra for loop and while loop.
If the name is not in the dictionary yet, add it with its votes. If the name is already in the dictionary, add the new votes to its total.
cand_n = int(input())
count = 0
cand_list = []
cand_f = []
edict = {}
while count < cand_n:
    cand = input()
    count += 1
    cand_f = cand.split(' ')
    if cand_f[0] in edict:
        edict[cand_f[0]] += int(cand_f[1])
    else:
        edict[cand_f[0]] = int(cand_f[1])
print(edict)
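The same accumulation can be sketched with collections.defaultdict, which removes the explicit membership test. Here tally_votes is a hypothetical helper that takes the records as a list of strings instead of calling input(), purely so the example can run on its own.

```python
from collections import defaultdict


def tally_votes(records):
    """Sum the votes per candidate from 'name votes' strings."""
    totals = defaultdict(int)  # missing names start at 0
    for record in records:
        name, votes = record.split()
        totals[name] += int(votes)
    return dict(totals)


sample = ["McCain 10", "McCain 5", "Obama 9", "Obama 8", "McCain 1"]
for name, votes in sorted(tally_votes(sample).items()):
    print(name, votes)
```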
I am having difficulty with finding the mean, median, mode, counting occurrences of a value within a csv file.
This section of the file is a column of letters 'M' or 'F'
This specific excerpt of code displays a problem I am facing:
I am not sure why the counting variables are not being incremented.
Any assistance would be greatly appreciated
citations2 = open('Non Traffic Citations.csv')
data2 = csv.reader(citations2)
gender = []
for row in data2:
    gender.append(row[2])
del gender[0]
male_count = 0
female_count = 0
for item in gender:
    # print(item) - shows that the list has values within it
    if 'M' == item:
        male_count = + 1
    if 'F' == item:
        female_count = + 1
print(male_count)
print(female_count)
If you are trying to increment the gender counts, the syntax in your loop is incorrect: male_count = + 1 assigns positive one on every match instead of adding to the count. Use += instead:
for item in gender:
    if 'F' == item:
        female_count += 1
    elif 'M' == item:
        male_count += 1
print(male_count)
print(female_count)
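To make the difference concrete, here is a small sketch that runs both forms side by side on made-up sample values (the list below is not taken from the CSV file).

```python
genders = ["M", "F", "M", "M"]  # made-up sample values

wrong_count = 0
right_count = 0
for item in genders:
    if item == "M":
        wrong_count = + 1  # parsed as wrong_count = (+1): assigns 1 each time
        right_count += 1   # adds 1 to the running total

print(wrong_count, right_count)
```

The first counter never gets past 1, while the second ends up equal to the number of "M" entries.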
You can use pandas:
import pandas as pd
df=pd.read_csv('Non Traffic Citations.csv')
df.describe()
I am quite new to working with Python, so I hope you can help me out here. I have to write a program that opens a CSV file, reads it, and lets you select the columns you want by entering their numbers. Those columns have to be written to a new file. The problem is: after entering the columns I want and entering "X" to start the main part, it generates exactly what I want, but only by printing inside a loop, not by printing a variable that contains the result. For the csv writer I need a variable containing it. Any ideas? Here is my code; feel free to ask questions. The CSV file looks like this:
john, smith, 37, blue, michigan
tom, miller, 25, orange, new york
jack, o'neill, 40, green, Colorado Springs
...etc
Code is:
import csv

with open("test.csv","r") as t:
    t_read = csv.reader(t, delimiter=",")
    t_list = []
    max_row = 0
    for row in t_read:
        if len(row) != 0:
            if max_row < len(row):
                max_row = len(row)
            t_list = t_list + [row]
            print([row], sep = "\n")
    twrite = csv.writer(t, delimiter = ",")
    tout = []
    counter = 0
    matrix = []
    for i in range(len(t_list)):
        matrix.append([])
    print(len(t_list), max_row, len(matrix), "Rows / Columns / Matrix Dimension")
    eingabe = input("Enter column number you need or X to start generating output: ")
    nr = int(eingabe)
    while type(nr) == int:
        colNr = nr-1
        if max_row > colNr and colNr >= 0:
            nr = int(nr)
            # print (type(nr))
            for i in range(len(t_list)):
                row_A=t_list[i]
                matrix[i].append(row_A[int(colNr)])
                print(row_A[int(colNr)])
            counter = counter +1
            matrix.append([])
        else:
            print("ERROR")
        nr = input("Enter column number you need or X to start generating output: ")
        if nr == "x":
            print("\n"+"Generating Output... " + "\n")
            for row in matrix:
                # Loop over columns.
                for column in row:
                    print(column + " ", end="")
                print(end="\n")
        else:
            nr = int(nr)
    print("\n")
t.close()
Well, you have everything you need in matrix, apart from an erroneous line that adds an unneeded row:
            counter = counter +1
            matrix.append([]) # <= remove this line
        else:
            print("ERROR")
You can then simply do:
        if nr == "x":
            print("\n"+"Generating Output... " + "\n")
            # newline="" is what the csv docs recommend for files passed to csv.writer
            with open("testout.csv", "w", newline="") as out:
                wr = csv.writer(out, delimiter=",")
                wr.writerows(matrix)
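Stripped of the interactive input, the core of this can be sketched as follows: build the list of selected columns, then hand it to writerows in one call. select_columns is a hypothetical helper, the rows are inlined instead of read from test.csv, and the output goes to a temporary directory; all three are assumptions made to keep the example runnable.

```python
import csv
import os
import tempfile


def select_columns(rows, column_numbers):
    """Keep only the given 1-based columns from every row."""
    return [[row[n - 1] for n in column_numbers] for row in rows]


rows = [
    ["john", "smith", "37", "blue", "michigan"],
    ["tom", "miller", "25", "orange", "new york"],
]
matrix = select_columns(rows, [1, 3])

# Written to a temporary directory so the sketch does not touch real files.
out_path = os.path.join(tempfile.mkdtemp(), "testout.csv")
with open(out_path, "w", newline="") as out:
    csv.writer(out).writerows(matrix)

# Read the file back to confirm what was written.
with open(out_path, newline="") as check:
    written = list(csv.reader(check))
print(written)
```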
I have a list with approximately 177071007 items,
and I'm trying to perform the following operations:
a) get the first and last occurrence of each unique item in the list.
b) get the number of occurrences.
def parse_data(file, op_file_test):
    ins = csv.reader(open(file, 'rb'), delimiter = '\t')
    pc = list()
    rd = list()
    deltas = list()
    reoccurance = list()
    try:
        for row in ins:
            pc.append(int(row[0]))
            rd.append(int(row[1]))
    except:
        print row
        pass
    unique_pc = set(pc)
    unique_pc = list(unique_pc)
    print "closing file"
    #takes a long time from here!
    for a in range(0, len(unique_pc)):
        index_first_occurance = pc.index(unique_pc[a])
        index_last_occurance = len(pc) - 1 - pc[::-1].index(unique_pc[a])
        delta_rd = rd[index_last_occurance] - rd[index_first_occurance]
        deltas.append(int(delta_rd))
        reoccurance.append(pc.count(unique_pc[a]))
        print unique_pc[a] , delta_rd, reoccurance[a]
    print "printing to file"
    map_file = open(op_file_test,'a')
    for a in range(0, len(unique_pc)):
        # reoccurance[a], not the whole list, is what "%d" expects here
        print >>map_file, "%d, %d, %d" % (unique_pc[a], deltas[a], reoccurance[a])
    map_file.close()
However, each pass through the loop is on the order of O(n), since index() and count() scan the whole list.
Would it be possible to make the for loop run faster? For example, do you think yielding would make it fast, or is there any other way? Unfortunately, I don't have numpy.
Try the following:
import csv
from collections import defaultdict

# Keep a dictionary of our rd and pc values, with the value as a list of the line numbers each occurs on
# e.g. {10: [1, 45, 79]}
pc_elements = defaultdict(list)
rd_elements = defaultdict(list)
with open(file, 'rb') as f:
    line_number = 0
    csvin = csv.reader(f, delimiter='\t')
    for row in csvin:
        try:
            pc_elements[int(row[0])].append(line_number)
            rd_elements[int(row[1])].append(line_number)
            line_number += 1
        except ValueError:
            print("Not a number")
            print(row)
            line_number += 1
            continue

for pc, indexes in pc_elements.iteritems():
    print("pc {0} appears {1} times. First on row {2}, last on row {3}".format(
        pc,
        len(indexes),
        indexes[0],
        indexes[-1]
    ))
This works by building a dictionary while reading the TSV, with the pc value as the key and a list of the rows it occurs on as the value. By the nature of a dict the keys must be unique, so we avoid the set, and the list values are only used to record the rows each key occurs on.
Example:
pc_elements = {10: [4, 10, 18, 101], 8: [3, 12, 13]}
would output:
"pc 10 appears 4 times. First on row 4, last on row 101"
"pc 8 appears 3 times. First on row 3, last on row 13"
As you scan items from your input file, put the items into a collections.defaultdict(list) where the key is the item and the value is a list of occurrence indices. It will take linear time to read the file and build up this data structure, constant time to get the first and last occurrence index of an item, and constant time to get the number of occurrences of an item.
Here's how it might work:
import collections

mydict = collections.defaultdict(list)
# itemfilereader stands for whatever yields (item, index) pairs from the file
for item, index in itemfilereader: # O(n)
    mydict[item].append(index)
# first occurrence of item, O(1)
mydict[item][0]
# last occurrence of item, O(1)
mydict[item][-1]
# number of occurrences of item, O(1)
len(mydict[item])
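Since the question only needs the rd delta between the first and last occurrence plus the occurrence count, one possible variation is to keep just three numbers per key instead of a full list of indices, which matters with this many rows. The sketch below is written in Python 3 syntax (the code in the question is Python 2), and summarize and the sample pairs are made up for illustration.

```python
def summarize(pairs):
    """Map each pc to [first_rd, last_rd, count] in a single pass."""
    summary = {}
    for pc, rd in pairs:
        entry = summary.get(pc)
        if entry is None:
            # First time this pc is seen: first and last rd coincide.
            summary[pc] = [rd, rd, 1]
        else:
            entry[1] = rd   # latest rd seen so far
            entry[2] += 1   # one more occurrence
    return summary


# Made-up (pc, rd) pairs standing in for the rows of the TSV file.
sample = [(10, 5), (8, 7), (10, 9), (10, 20), (8, 11)]
for pc, (first_rd, last_rd, count) in sorted(summarize(sample).items()):
    print(pc, last_rd - first_rd, count)
```

Each row is handled with one dictionary lookup, so the whole pass is linear in the number of rows.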
Maybe it's worth changing the data structure used. I'd use a dict that uses pc as key and the occurrences as values.
lookup = {}
counter = 0
for line in ins:
    # each value is a list of (row index, rd) tuples for that pc
    values = lookup.setdefault(int(line[0]), [])
    values.append((counter, int(line[1])))
    counter += 1
for key, val in lookup.iteritems():
    # index 0 is the first occurrence, index -1 the last
    value_of_first_occurence = lookup[key][0][1]
    value_of_last_occurence = lookup[key][-1][1]
    first_occurence = lookup[key][0][0]
    last_occurence = lookup[key][-1][0]
    value = lookup[key][0]
Try replacing the lists with dicts; lookup in a dict is much faster than in a long list.
That could be something like this:
def parse_data(file, op_file_test):
    ins = csv.reader(open(file, 'rb'), delimiter = '\t')
    # Dict of pc -> [rd first occurence, rd last occurence, list of occurences]
    occurences = {}
    # a csv reader has no len() and cannot be indexed, so enumerate it
    for i, row in enumerate(ins):
        try:
            pc = int(row[0])
            rd = int(row[1])
        except:
            print row
            continue
        if pc not in occurences:
            occurences[pc] = [rd, rd, i]
        else:
            occurences[pc][1] = rd
            occurences[pc].append(i)
    # (Remove the sorted if you don't need them sorted but need them faster)
    for value in sorted(occurences.keys()):
        print "value: %d, delta: %d, occurences: %s" % (
            value, occurences[value][1] - occurences[value][0],
            ", ".join(map(str, occurences[value][2:]))