Reading CSV files and pulling specific data

Here is some sample data:
Game,Date,HomeTeam,FT,HT,AwayTeam
1,(Fri) 10 Aug 2018 (W32),Manchester United FC,2-1,1-0,Leicester City FC
2,(Sat) 11 Aug 2018 (W32),AFC Bournemouth,2-0,1-0,Cardiff City FC
3,(Sat) 11 Aug 2018 (W32),Fulham FC,0-2,0-1,Crystal Palace FC
Based on user input, I need to report the total number of goals scored by a specific team throughout the season.
I also need to ask the user for a game number and report the names of both teams and the score for that game.
This is what I have so far (note that I'm not allowed to use pandas):
def t_goals():
    f = open("EPL_18-19_HW2.txt")
    next(f)
    total_goals = 0
    for lines in f:
        game = lines.strip().split(',')
        goals = game[3].split("-")
        for num in goals:
            total_goals += int(num)
    f.close()
    return total_goals

If you are not using pandas, use the built-in csv module (https://docs.python.org/3/library/csv).
Read your file like this:
import csv

def t_goals():
    with open('your_csv_file.csv', 'r', newline='') as csv_file:
        # create a reader for your file
        input_reader = csv.reader(csv_file, delimiter=',')
        # skip the first line, which has the column names
        next(input_reader)
        total_goals = 0
        # all the lines can be read by iteration
        for line in input_reader:
            # the comma-separated values of each line are read as a list,
            # so read the FT column with line[3]
            goals = line[3].split("-")
            for num in goals:
                total_goals += int(num)
    # if you open a file using 'with', you don't have to close it explicitly
    return total_goals

Here are some quick functions I wrote that I think achieve what you want.
For the first problem, check whether the team you passed to the function is the home or the away team, then take the corresponding half of the score and add it to total_goals:
def get_total_goals(target_team='Manchester United FC'):
    total_goals = 0
    with open('sample.csv', 'r') as f:
        next(f)
        for line in f:
            current_home_team = line.strip().split(',')[2].strip()
            current_away_team = line.strip().split(',')[5].strip()
            if current_home_team == target_team:
                score = int(line.strip().split(',')[3].split('-')[0])
                total_goals += score
            elif current_away_team == target_team:
                score = int(line.strip().split(',')[3].split('-')[1])
                total_goals += score
    return total_goals
For the next problem, iterate through the rows and check whether the game number in the row equals the game number you passed into the function. If there's a match, the required details are returned in a dictionary; if not, "Game not found" is returned.
def get_game_details(game_number=1):
    with open('sample.csv', 'r') as f:
        next(f)
        for line in f:
            if int(line.strip().split(',')[0]) == game_number:
                return {
                    'HomeTeam':line.strip().split(',')[2],
                    'AwayTeam':line.strip().split(',')[5],
                    'FT':line.strip().split(',')[3],
                    'HT':line.strip().split(',')[4]
                }
    return "Game not found"
These should give you a starting point; you can make changes as required for your use case. You can also use the csv module included with Python, as mentioned by anotherGatsby in their answer.
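The two answers above can be combined: the sketch below uses csv.DictReader so columns are addressed by header name rather than by position. It assumes the file is comma-separated with the header row Game,Date,HomeTeam,FT,HT,AwayTeam; the helper names load_games, total_goals_for and game_summary are made up for illustration, and io.StringIO stands in for the real file.

```python
import csv
import io

# Sample rows in the assumed layout; io.StringIO stands in for open("EPL_18-19_HW2.txt").
SAMPLE = """Game,Date,HomeTeam,FT,HT,AwayTeam
1,(Fri) 10 Aug 2018 (W32),Manchester United FC,2-1,1-0,Leicester City FC
2,(Sat) 11 Aug 2018 (W32),AFC Bournemouth,2-0,1-0,Cardiff City FC
3,(Sat) 11 Aug 2018 (W32),Fulham FC,0-2,0-1,Crystal Palace FC
"""

def load_games(lines):
    """Parse CSV lines into a list of dicts keyed by the header names."""
    return list(csv.DictReader(lines))

def total_goals_for(games, team):
    """Sum the full-time goals a team scored, whether home or away."""
    total = 0
    for game in games:
        home_goals, away_goals = (int(n) for n in game["FT"].split("-"))
        if game["HomeTeam"].strip() == team:
            total += home_goals
        elif game["AwayTeam"].strip() == team:
            total += away_goals
    return total

def game_summary(games, number):
    """Return 'Home FT Away' for the given game number, or None if absent."""
    for game in games:
        if int(game["Game"]) == number:
            return f'{game["HomeTeam"]} {game["FT"]} {game["AwayTeam"]}'
    return None

games = load_games(io.StringIO(SAMPLE))
print(total_goals_for(games, "Manchester United FC"))
print(game_summary(games, 3))
```

To drive this from user input, the team name and game number could come from input(), with the number passed through int() before the lookup.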

Related

Cleaning text file data after reading line by line without using pandas

Let's say I have text file data like this:
|-------|
|Arsenal|
|-------|
|2021
|-------|
|Rnd|A|W|D|L|Venu|Date|
|R1|Tottenham|1|0|0|Emirates|March|
|R2|Man utd|0|1|0|Old Trafford|March|
|Total|Average|1234|5678|
|Arsenal|
|-------|
|2020|
|-------|
|Rnd|A|W|D|L|Venu|Date|
|R1|Chelsea|1|0|0|Stamford Bridge|March|
|R2|Mancity|0|1|0|Ethiad|March|
|Total|Average|1234|5678|
I want to convert this file into a 2D array (a list of lists) without using pandas. I am hoping for output like this:
Arsenal 2021 R1 Tottenham 1 0 0 Emirates March
Arsenal 2021 R2 Man utd 0 1 0 Old Trafford March
Arsenal 2020 R1 Chelsea 1 0 0 Stamford Bridge March
Arsenal 2020 R2 Man city 0 1 0 Ethiad March
So I need to ignore the |----|, |Rnd|... and |Total|Average|1234|5678| lines, and attach Arsenal and 2021 to every row of the first block and Arsenal and 2020 to every row of the next year's block.
I used a for loop to go through the file line by line and build the list of lists, but without pandas I couldn't remove the header lines (Rnd, T, W, D, L, Venu) or the Total/Average lines while doing so.

You can use a flag variable such as first_part = True/False to run different code inside the loop.
You can also use next(file) to read the next line(s) from the file, so in the first part you can read several lines to get the word and the year and then set first_part = False. In the second part you only have to prepend this word and year to each line, and check whether the line starts with '|Total|' to set first_part = True again.
Minimal working example.
I use io to simulate a file, but you should use open().
text = '''|Arsenal|
|-------|
|2021
|-------|
|Rnd|A|W|D|L|Venu|Date|
|R1|Tottenham|1|0|0|Emirates|March|
|R2|Man utd|0|1|0|Old Trafford|March|
|Total|Average|1234|5678|
|Arsenal|
|-------|
|2020|
|-------|
|Rnd|A|W|D|L|Venu|Date|
|R1|Chelsea|1|0|0|Stamford Bridge|March|
|R2|Mancity|0|1|0|Ethiad|March|
|Total|Average|1234|5678|'''
import io
#fh = open('data.csv')
fh = io.StringIO(text)
first_part = True
for line in fh:
    if first_part:
        word = line.rstrip('\n').rstrip('|')
        line = next(fh)
        line = next(fh)
        year = line.rstrip('\n').rstrip('|')
        line = next(fh)
        line = next(fh)
        first_part = False
    else:
        if line.startswith('|Total|'):
            first_part = True
        else:
            new_line = word + year + line
            print(new_line, end='')
Result:
|Arsenal|2021|R1|Tottenham|1|0|0|Emirates|March|
|Arsenal|2021|R2|Man utd|0|1|0|Old Trafford|March|
|Arsenal|2020|R1|Chelsea|1|0|0|Stamford Bridge|March|
|Arsenal|2020|R2|Mancity|0|1|0|Ethiad|March|
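Since the question asks for a list of lists rather than printed lines, here is a variation that classifies each line by its content instead of counting lines with next(). It is only a sketch: the function name parse_fixture_blocks is hypothetical, and it assumes team names are never purely numeric and that every data row has more than one cell.

```python
import io

# A shortened copy of the sample data; io.StringIO stands in for open('Arsenal.txt').
SAMPLE = """|Arsenal|
|-------|
|2021
|-------|
|Rnd|A|W|D|L|Venu|Date|
|R1|Tottenham|1|0|0|Emirates|March|
|R2|Man utd|0|1|0|Old Trafford|March|
|Total|Average|1234|5678|
|Arsenal|
|-------|
|2020|
|-------|
|Rnd|A|W|D|L|Venu|Date|
|R1|Chelsea|1|0|0|Stamford Bridge|March|
|R2|Mancity|0|1|0|Ethiad|March|
|Total|Average|1234|5678|"""

def parse_fixture_blocks(lines):
    """Return one [team, year, *cells] list per data row."""
    rows = []
    team = year = None
    for raw in lines:
        # Drop the newline and outer pipes, then split into cells.
        cells = [c.strip() for c in raw.strip().strip("|").split("|")]
        first = cells[0]
        if not first or set(first) == {"-"}:
            continue  # blank line or |-------| separator
        if first in ("Rnd", "Total"):
            continue  # column header or summary line
        if len(cells) == 1:
            # A single cell is either the year or the team name.
            if first.isdigit():
                year = first
            else:
                team = first
            continue
        rows.append([team, year] + cells)
    return rows

for row in parse_fixture_blocks(io.StringIO(SAMPLE)):
    print(" ".join(row))
```

Because each line is identified by what it contains, this tolerates a missing or extra separator line, which the fixed sequence of next() calls does not.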

with open('Arsenal.txt', 'r') as f:
    for line in f:
        if not line.startswith(('| --- |', '| Rnd |','| Totals |','| Averages |')) :
            line= line.strip()
            field= line.split('|')
            print(field)
@furas this is the code I tried

How do i skip the first two lines from my text file? [duplicate]

This question already has answers here:
How to ignore the first line of data when processing CSV data?
(18 answers)
Closed 3 years ago.
This is my code:
from collections import Counter
counter = Counter()
with open('demo.txt') as f:
    for line in f:
        splits = line.split(';')
        change = float(splits[6])
        country = splits[1].strip()
        counter[country] += change
# Percentage Change By Countries
print()
print("Percentage Change By Countries")
for country, change_sum in counter.most_common():
    print(country, change_sum, "%")
This is the text file "Demo.txt"
World Population Data 2019 from the United Nations
Rank; Country; 2018; 2019; % Share; Pop Change; % Change; Continent
1; China; 1427647786; 1433783686; 18.6; 6135900; 0.43; Asia
2; India; 1352642280; 1366417754; 17.7; 13775474; 1.02; Asia
3; United States of America; 327096265; 329064917; 4.27; 1968652; 0.60; North America
4; Indonesia; 267670543; 270625568; 3.51; 2955025; 1.10; Asia
5; Pakistan; 212228286; 216565318; 2.81; 4337032; 2.04; Asia
6; Brazil; 209469323; 211049527; 2.74; 1580204; 0.75; South America
7; Nigeria; 195874683; 200963599; 2.61; 5088916; 2.60; Africa
8; Bangladesh; 161376708; 163046161; 2.11; 1669453; 1.03; Asia
9; Russian Federation; 145734038; 145872256; 1.89; 138218; 0.09; Europe
10; Mexico; 126190788; 127575529; 1.65; 1384741; 1.10; North America
I tried readlines() but I received an "out of range" error. How do I skip the first two lines?

If you want to skip the first n lines, you can just call next on the file object n times:
with open("demo.txt") as f:
    for _ in range(2):
        next(f)
    for line in f:
        ...
This solution avoids having to call f.readlines(), which will allocate a list containing all lines, which you then slice to allocate another list.
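Another way to skip leading lines without building a list is itertools.islice, which can wrap the file object directly. The sketch below applies it to the question's semicolon-separated data; the function name change_by_country is made up for illustration, and io.StringIO stands in for open("demo.txt").

```python
from collections import Counter
from itertools import islice
import io

# A few lines in the question's layout; io.StringIO stands in for the real file.
SAMPLE = """World Population Data 2019 from the United Nations
Rank; Country; 2018; 2019; % Share; Pop Change; % Change; Continent
1; China; 1427647786; 1433783686; 18.6; 6135900; 0.43; Asia
2; India; 1352642280; 1366417754; 17.7; 13775474; 1.02; Asia
3; United States of America; 327096265; 329064917; 4.27; 1968652; 0.60; North America
"""

def change_by_country(lines, skip=2):
    """Sum the '% Change' column per country, ignoring the first `skip` lines."""
    counter = Counter()
    # islice(lines, skip, None) yields everything from line index `skip` onward.
    for line in islice(lines, skip, None):
        fields = [field.strip() for field in line.split(";")]
        if len(fields) < 7:
            continue  # blank or malformed line
        counter[fields[1]] += float(fields[6])
    return counter

result = change_by_country(io.StringIO(SAMPLE))
for country, change_sum in result.most_common():
    print(country, change_sum, "%")
```

Like the next() loop, islice consumes the skipped lines lazily, so the whole file is never held in memory at once.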

def file_search():
    userInput = input('Enter a country: ').lower()
    result = []
    with open("json.txt", 'r') as f:
        for x in f:
            if userInput in x.lower():
                result.append(x.split(';'))
    for s in result:
        print(s[1] + "check:" + s[3])
file_search()
from collections import Counter
counter = Counter()
with open('json.txt') as f:
    for i in range(0,2):
        next(f)
    for line in f:
        splits = line.split(';')
        change = float(splits[6])
        country = splits[1].strip()
        counter[country] += change
# Percentage Change By Countries
print()
print("Percentage Change By Countries")
for country, change_sum in counter.most_common():
    print(country, change_sum, "%")

This may help.
You can also read all the lines and slice off the first two:
with open(file) as f:
    content = f.readlines()
    content = content[2:]

Either use @SayandipDutta's comment, so instead of:
if userInput in x.lower():
You use:
if x[0].isnumeric() and userInput in x.lower():
Or do:
with open("demo.txt", 'r') as f:
    for x in f.readlines()[2:]:
        if userInput in x.lower():
            result.append(x.split(';'))

Python File Reading & Writing

So I need to write a program that reads a text file, and copies its contents to another file. I then have to add a column at the end of the text file, and populate that column with an int that is calculated using the function calc_bill. I can get it to copy the contents of the original file to the new one, but I cannot seem to get my program to read in the ints necessary for calc_bill to run.
Any help would be greatly appreciated.
Here are the first 3 lines of the text file I am reading from:
CustomerID Title FirstName MiddleName LastName Customer Type
1 Mr. Orlando N. Gee Residential 297780 302555
2 Mr. Keith NULL Harris Residential 274964 278126
It is copying the file exactly as it is supposed to to the new file. What is not working is writing the bill_amount (calc_bill)/ billVal(main) to the new file in a new column. Here is the expected output to the new file:
CustomerID Title FirstName MiddleName LastName Customer Type Company Name Start Reading End Reading BillVal
1 Mr. Orlando N. Gee Residential 297780 302555 some number
2 Mr. Keith NULL Harris Residential 274964 278126 some number
And here is my code:
def main():
    file_in = open("water_supplies.txt", "r")
    file_in.readline()
    file_out = input("Please enter a file name for the output:")
    output_file = open(file_out, 'w')
    lines = file_in.readlines()
    for line in lines:
        lines = [line.split('\t')]
        #output_file.write(str(lines)+ "\n")
        billVal = 0
        c_type = line[5]
        start = int(line[7])
        end = int(line[8])
        billVal = calc_bill(c_type, start, end)
        output_file.write(str(lines)+ "\t" + str(billVal) + "\n")
def calc_bill(customer_type, start_reading, end_reading):
    price_per_gallon = 0
    if customer_type == "Residential":
        price_per_gallon = .012
    elif customer_type == "Commercial":
        price_per_gallon = .011
    elif customer_type == "Industrial":
        price_per_gallon = .01
    if start_reading >= end_reading:
        print("Error: please try again")
    else:
        reading = end_reading - start_reading
        bill_amount = reading * price_per_gallon
        return bill_amount
main()

There are the issues mentioned above, but here is a small change to your main() method that works correctly.
def main():
    file_in = open("water_supplies.txt", "r")
    # skip the headers in the input file, and save for output
    headers = file_in.readline()
    # changed to raw_input to not require quotes
    file_out = raw_input("Please enter a file name for the output: ")
    output_file = open(file_out, 'w')
    # write the headers back into output file
    output_file.write(headers)
    lines = file_in.readlines()
    for line in lines:
        # renamed variable here to split
        split = line.split('\t')
        bill_val = 0
        c_type = split[5]
        start = int(split[6])
        end = int(split[7])
        bill_val = calc_bill(c_type, start, end)
        # line is already a string, don't need to cast it
        # added rstrip() to remove trailing newline
        output_file.write(line.rstrip() + "\t" + str(bill_val) + "\n")
Note that the line variable in your loop includes the trailing newline, so you will need to strip that off as well if you're going to write it to the output file as-is. Your start and end indices were off by 1 as well, so I changed to split[6] and split[7].
It is a good idea not to require the user to type quotes around the filename, so keep that in mind as well. In Python 2, an easy way is to use raw_input instead of input; in Python 3, input already returns the typed text as a string.
Sample input file (from OP):
CustomerID Title FirstName MiddleName LastName Customer Type
1 Mr. Orlando N. Gee Residential 297780 302555
2 Mr. Keith NULL Harris Residential 274964 278126
$ python test.py
Please enter a file name for the output:test.out
Output (test.out):
1 Mr. Orlando N. Gee Residential 297780 302555 57.3
2 Mr. Keith NULL Harris Residential 274964 278126 37.944

There are a couple of things. The inconsistent spacing in your column names makes counting the actual columns a bit confusing, but I believe there are 9 column names there. However, each of your rows of data has only 8 elements, so it looks like you've got an extra column name (maybe "CompanyName"). So get rid of that, or fix the data.
Then your "start" and "end" variables are pointing to indexes 7 and 8, respectively. However, since there are only 8 elements in the row, I think the indexes should be 6 and 7.
Another problem could be that inside your for-loop through "lines", you set "lines" to the elements in that line. I would suggest renaming the second "lines" variable inside the for-loop to something else, like "elements".
Aside from that, I'd just caution you about naming consistency. Some of your column names are camel-case and others have spaces. Some of your variables are separated by underscores and others are camel-case.
Hopefully that helps. Let me know if you have any other questions.

You have two errors in handling your variables, both in the same line:
lines = [line.split()]
You put this into your lines variable, which is the entire file contents. You just lost the rest of your input data.
You made a new list-of-list from the return of split.
Try this line:
line = line.split()
I got reasonable output with that change, once I make a couple of assumptions about your placement of tabs.
Also, consider not overwriting a variable with a different data semantic; it confuses the usage. For instance:
for record in lines:
    line = record.split()
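Pulling the suggestions in these answers together, a Python 3 sketch might look like the following. Several things here are assumptions rather than facts from the question: the file is taken to be tab-separated, the customer type and the two readings are taken to be the last three fields of each row (which sidesteps the index counting discussed above), and the names RATES and add_bill_column are invented for illustration. io.StringIO stands in for the real input and output files.

```python
import io

# Price per gallon by customer type, taken from the question's calc_bill.
RATES = {"Residential": 0.012, "Commercial": 0.011, "Industrial": 0.01}

def calc_bill(customer_type, start_reading, end_reading):
    """Return the bill amount, or None if the type or readings are unusable."""
    rate = RATES.get(customer_type)
    if rate is None or start_reading >= end_reading:
        return None
    return (end_reading - start_reading) * rate

def add_bill_column(src, dst):
    """Copy src to dst, appending a BillVal column to every row."""
    header = next(src).rstrip("\n")
    dst.write(header + "\tBillVal\n")
    for record in src:
        fields = record.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # blank or malformed row
        # Assumption: type, start reading and end reading are the last three fields.
        bill = calc_bill(fields[-3], int(fields[-2]), int(fields[-1]))
        bill_text = "" if bill is None else f"{bill:.2f}"
        dst.write("\t".join(fields) + "\t" + bill_text + "\n")

# Assumed tab-separated layout for the demonstration.
SAMPLE = (
    "CustomerID\tTitle\tFirstName\tMiddleName\tLastName\tCustomer Type\tStart Reading\tEnd Reading\n"
    "1\tMr.\tOrlando\tN.\tGee\tResidential\t297780\t302555\n"
    "2\tMr.\tKeith\tNULL\tHarris\tResidential\t274964\t278126\n"
)

out = io.StringIO()
add_bill_column(io.StringIO(SAMPLE), out)
print(out.getvalue(), end="")
```

With real files, the two StringIO objects would be replaced by files opened in a with block, for example `with open("water_supplies.txt") as src, open(file_out, "w") as dst:`, so both are closed automatically.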

Trouble with dictionaries

The idea for this function is to take a file as input. This file contains politicians with their respective parties: independent is 1, republican is 2, democrat is 3, and not known is 4. What has to be returned is the number of times each party is represented.
The file has independent 6, republican 16, democrat 22, and not known 6.
The output should look like this:
Independent 6
Republican 16
Democrat 22
Not Known 6
But what I have is:
4 6
3 22
2 16
1 6
and I'm not sure how to change the number representing the parties to the names of the actual parties.
def polDict(s1):
    infile=open(s1,'r')
    content=infile.read()
    counters={}
    party='1234'
    wordList = content.split()
    for i in wordList:
        if i in party:
            if i in counters:
                counters[i]+=1
            else:
                counters[i]=1
    for i in counters:
        print('{:2} {}'.format(i,counters[i]))

You haven't provided much information about what your file looks like. That said, if I understood your code correctly, what you need to do is define a dictionary that maps the party numbers to their names, and then edit your print statement to print the party name for i instead of i itself. Note that the keys must be strings, because the tokens read from the file are strings:
def polDict(s1):
    infile=open(s1,'r')
    content=infile.read()
    counters={}
    party='1234'
    # keys are strings so they match the tokens produced by split()
    party_names = {'1':'Independent', '2':'Republican', '3':'Democrat', '4':'Not known'}
    wordList = content.split()
    for i in wordList:
        if i in party:
            if i in counters:
                counters[i]+=1
            else:
                counters[i]=1
    for i in counters:
        print('{:2} {}'.format(party_names[i], counters[i]))

You forgot to close your open() which is one of many reasons to use the with block. Anyways, I'm assuming this is the style of the input file:
Clinton 3
Cruz 2
Sanders 3
Trump 2
Dutter 1
And you want the output to be:
Republican 2
Democratic 2
Independent 1
If this is not correct, then this function should be changed to fit exactly what you want.
from collections import defaultdict
def getCandidates(infile):
    parties = {1: "Independent", 2: "Republican", 3: "Democratic", 4: "Unknown"}
    candidates = defaultdict(int)
    with open(infile, "r") as fin:
        for line in fin: # assuming only 2 columns and the last column is the number
            candidates[parties[int(line.split()[-1])]] += 1
    for party, count in candidates.items(): #.iteritems() in python 2.7
        print("{} {}".format(party, count))
getCandidates("test.txt")
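If the output has to follow the fixed order in the question (Independent, Republican, Democrat, Not Known), one option is to count the codes first and then walk the name table in order. This is a sketch under the same assumption as the answer above, namely that the party code is the last whitespace-separated token on each line; PARTY_NAMES and count_parties are illustrative names.

```python
from collections import Counter
import io

# Party codes as strings, in the order the question wants them printed.
PARTY_NAMES = {"1": "Independent", "2": "Republican", "3": "Democrat", "4": "Not Known"}

def count_parties(lines):
    """Return (party name, count) pairs in the order of PARTY_NAMES."""
    counts = Counter()
    for line in lines:
        tokens = line.split()
        # Assumption: the party code is the last token on the line.
        if tokens and tokens[-1] in PARTY_NAMES:
            counts[tokens[-1]] += 1
    return [(name, counts[code]) for code, name in PARTY_NAMES.items()]

# io.StringIO stands in for the real file.
SAMPLE = """Clinton 3
Cruz 2
Sanders 3
Trump 2
Dutter 1
"""

for name, count in count_parties(io.StringIO(SAMPLE)):
    print(name, count)
```

Checking `tokens[-1] in PARTY_NAMES` is an exact key match, so a token such as 12 is ignored, whereas the substring test `i in '1234'` in the question would count it.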

Return the average mark for all student in that Section

I know this was asked already, but the answers were super unclear.
The first requirement is to open a file (sadly I have no idea how to do that)
The second requirement is a section of code that does the following:
Each line represents a single student and consists of a student number, a name, a section code and a midterm grade, all separated by whitespace
So I don't think I can target that element, since the fields are separated by whitespace?
Here is an excerpt of the file, showing line structure
987654322 Xu Carolyn L0101 19.5
233432555 Jones Billy Andrew L5101 16.0
555432345 Patel Amrit L0101 13.5
888332441 Fletcher Bobby L0201 18
777998713 Van Ryan Sarah Jane L5101 20
877633234 Zhang Peter L0102 9.5
543444555 Martin Joseph L0101 15
876543222 Abdolhosseini Mohammad Mazen L0102 18.5
I was provided the following hints:
Notice that the number of names per student varies.
Use rstrip() to get rid of extraneous whitespace at the end of the lines.
I don't understand the second hint.
This is what I have so far:
counter = 0
elements = -1
for sets in the_file
    elements = elements + 1
    if elements = 3
I know it has something to do with readlines() and targeting the section code.

marks = [float(line.strip().split()[-1]) for line in open('path/to/input/file')]
average = sum(marks)/len(marks)
Hope this helps

See the documentation on opening and writing to files and on the strip method.
Something like this?
data = {}
with open(filename) as f:  # open a file
    for line in f:  # proceed through the file lines
        # split on whitespace; split() with no arguments already drops empty pieces
        st_data = line.split()
        if not st_data:
            continue  # skip blank lines
        # the number of name parts varies, so take the fixed fields from both ends
        student_n, section_code, midterm_grade = st_data[0], st_data[-2], st_data[-1]
        student_name = ' '.join(st_data[1:-2])
        if section_code not in data:
            data[section_code] = []
        # building dict: key is a section code, value is a list of student records
        data[section_code].append([student_n, student_name, float(midterm_grade)])
# make calculations
for k, v in data.items():  # items() gives you a (key, value) pair on each iteration
    average = sum(x[2] for x in v) / len(v)  # average grade for the section
    print('Section:' + k + ' Grade:' + str(average))

more or less:
infile = open('grade_file.txt', 'r')
score = 0
n = 0
for line in infile.readlines():
    score += float(line.rstrip().split()[-1])
    n += 1
avg = score / n
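The snippets above average every mark in the file. To average only one section, as the title asks, the section code can be read as the second-to-last field, which works however many name parts a student has. A sketch, with section_average as an illustrative name and io.StringIO standing in for the opened file:

```python
import io

# The excerpt from the question; io.StringIO stands in for open('grade_file.txt').
SAMPLE = """987654322 Xu Carolyn L0101 19.5
233432555 Jones Billy Andrew L5101 16.0
555432345 Patel Amrit L0101 13.5
888332441 Fletcher Bobby L0201 18
777998713 Van Ryan Sarah Jane L5101 20
877633234 Zhang Peter L0102 9.5
543444555 Martin Joseph L0101 15
876543222 Abdolhosseini Mohammad Mazen L0102 18.5
"""

def section_average(lines, section):
    """Average the midterm grades of one section, or None if it has no students."""
    grades = []
    for line in lines:
        # rstrip() removes the trailing newline and spaces before splitting.
        fields = line.rstrip().split()
        # Names vary in length, so index from the end: [-2] is the section, [-1] the grade.
        if len(fields) >= 4 and fields[-2] == section:
            grades.append(float(fields[-1]))
    if not grades:
        return None
    return sum(grades) / len(grades)

print(section_average(io.StringIO(SAMPLE), "L0101"))
```

Indexing from the end of the list is what makes the varying number of names harmless: only the first field and the last two have fixed meanings.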
