Trouble with dictionaries - python

The idea for this function is to take a file as input. This file contains politicians with their respective parties: independent is 1, republican is 2, democrat is 3, and not known is 4. What has to be returned is the number of times each party is represented.
The file has independent 6, republican 16, democrat 22, and not known 6.
The output should look like this:
Independent 6
Republican 16
Democrat 22
Not Known 6
But what I have is:
4 6
3 22
2 16
1 6
I'm not sure how to change the numbers representing the parties into the names of the actual parties.
def polDict(s1):
    infile=open(s1,'r')
    content=infile.read()
    counters={}
    party='1234'
    wordList = content.split()
    for i in wordList:
        if i in party:
            if i in counters:
                counters[i]+=1
            else:
                counters[i]=1
    for i in counters:
        print('{:2} {}'.format(i,counters[i]))

You haven't said much about what your file looks like. That said, if I understood your code correctly, you need to define a dictionary that maps the party numbers to the party names, and then change your print statement to print the party name for i instead of i itself:
def polDict(s1):
    infile=open(s1,'r')
    content=infile.read()
    infile.close()
    counters={}
    party='1234'
    # keys are strings because the values read from the file are strings
    party_names = {'1':'Independent', '2':'Republican', '3':'Democrat', '4':'Not Known'}
    wordList = content.split()
    for i in wordList:
        if i in party:
            if i in counters:
                counters[i]+=1
            else:
                counters[i]=1
    for i in counters:
        print('{} {}'.format(party_names[i], counters[i]))

You forgot to close the file you opened with open(), which is one of many reasons to use a with block. Anyway, I'm assuming this is the style of the input file:
Clinton 3
Cruz 2
Sanders 3
Trump 2
Dutter 1
And you want the output to be:
Republican 2
Democratic 2
Independent 1
If this is not correct, then this function should be changed to fit exactly what you want.
from collections import defaultdict
def getCandidates(infile):
    parties = {1: "Independent", 2: "Republican", 3: "Democratic", 4: "Unknown"}
    candidates = defaultdict(int)
    with open(infile, "r") as fin:
        for line in fin: # assuming only 2 columns and the last column is the number
            candidates[parties[int(line.split()[-1])]] += 1
    for party, count in candidates.items(): #.iteritems() in python 2.7
        print("{} {}".format(party, count))
getCandidates("test.txt")
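For comparison, the counting step can also be written with collections.Counter. This is only a minimal sketch, assuming the same two-column layout guessed above (name, then party code). The names count_parties and PARTY_NAMES are illustrative and not part of the original code. The function takes any iterable of lines, so an open file object works as well as a list.

```python
from collections import Counter

# Mapping from the numeric codes in the question to party names.
PARTY_NAMES = {"1": "Independent", "2": "Republican", "3": "Democrat", "4": "Not Known"}


def count_parties(lines):
    """Count how often each party occurs; assumes the code is the last field."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        code = fields[-1]
        # Codes outside 1-4 are ignored; that is an assumption of this sketch.
        if code in PARTY_NAMES:
            counts[PARTY_NAMES[code]] += 1
    return counts


sample = ["Clinton 3", "Cruz 2", "Sanders 3", "Trump 2", "Dutter 1"]
for party, count in count_parties(sample).items():
    print("{} {}".format(party, count))
```

Keeping the codes as strings avoids converting every field with int() and sidesteps the string-versus-integer key mismatch.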

Related

How to print out nicely formatted tables from a dictionary

For the sake of practicing how to be more comfortable and fluent in working with dictionaries, I have written a little program that reads the content of a file and adds it to a dictionary as a key: value pair. This is no problem, but when I got curious about how to print the content out again in the same format as the table in the datafile using for-loops, I ran into trouble.
My question is: How can I print out the content of the dictionary onto the terminal using for-loops?
The datafile is:
Name Age School
Anne 10 Eiksmarka
Tom 15 Marienlyst
Vidar 18 Persbråten
Ayla 18 Kongshavn
Johanne 17 Wang
Silje 16 Eikeli
Per 19 UiO
Ali 25 NTNU
My code is:
infile = open("table.dat", "r")
data = {}
headers = infile.readline().split()
for i in range(len(headers)):
    data[headers[i]] = []
for line in infile:
    words = line.split()
    for i in range(len(headers)):
        data[headers[i]].append(words[i])
infile.close()
I would like to print the data back onto the terminal. Ideally, the printout should look something like this:
Name Age School
Anne 10 Eiksmarka
Tom 15 Marienlyst
Vidar 18 Persbråten
Ayla 18 Kongshavn
Johanne 17 Wang
Silje 16 Eikeli
Per 19 UiO
Ali 25 NTNU
If someone can help me with this, I would be grateful.
The easiest solution is to use a library such as tabulate. Here is an example of its output (you can customize it further):
>>> from tabulate import tabulate
>>> table = [["Sun",696000,1989100000],["Earth",6371,5973.6],
... ["Moon",1737,73.5],["Mars",3390,641.85]]
>>> print(tabulate(table))
-----  ------  -------------
Sun    696000     1.9891e+09
Earth    6371  5973.6
Moon     1737    73.5
Mars     3390   641.85
-----  ------  -------------
Otherwise, if you MUST use your own custom for-loop, you can add tabs to line up the columns, as in:
print(a+"\t"), where \t is the horizontal tab escape character.
Edit: An example of how this can be used is below:
infile = open("table.dat", "r")
data = {}
headers = infile.readline().split()
for i in range(len(headers)):
    data[headers[i]] = []
for line in infile:
    words = line.split()
    for i in range(len(headers)):
        data[headers[i]].append(words[i])
        print(words[i],end= '\t')
    print()
infile.close()
Things to note:
1- For each field, we use print(...,end= '\t'), which ends the output with a tab instead of a new line. We might also consider adding more tabs (e.g. end='\t\t') or spaces, or any other formatting such as a separator character (e.g. end='\t|\t').
2- After each line, we use print(), which prints only a new line and moves the cursor down to the next row.
Take a look at the .ljust, .rjust and .center methods of str. Consider the following simple example:
d = {"Alpha": 1, "Beta": 10, "Gamma": 100, "ExcessivelyLongName": 1}
for key, value in d.items():
    print(key.ljust(5), str(value).rjust(3))
Output:
Alpha   1
Beta   10
Gamma 100
ExcessivelyLongName   1
Note that ljust pads with spaces (by default) to reach the specified width and does nothing if the string is already longer than that. Also, since the values are integers, they need to be converted to str first if you want to use any of these methods.
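Building on ljust, a hard-coded width such as 5 can be replaced by a width computed from the data. The following is only a sketch, assuming the dictionary of lists built in the question (keyed by header, with equally long lists of strings). The helper name format_table is hypothetical.

```python
def format_table(headers, data):
    """Return the table as a list of strings, one padded column per header."""
    # Width of each column: the longest of the header and all of its values.
    widths = {h: max([len(h)] + [len(v) for v in data[h]]) for h in headers}
    rows = [" ".join(h.ljust(widths[h]) for h in headers).rstrip()]
    n_rows = len(data[headers[0]]) if headers else 0
    for i in range(n_rows):
        cells = (data[h][i].ljust(widths[h]) for h in headers)
        rows.append(" ".join(cells).rstrip())
    return rows


headers = ["Name", "Age", "School"]
data = {
    "Name": ["Anne", "Tom"],
    "Age": ["10", "15"],
    "School": ["Eiksmarka", "Marienlyst"],
}
for row in format_table(headers, data):
    print(row)
```

Returning the rows instead of printing them inside the helper keeps the formatting separate from the output, so the same rows could also be written to a file.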
You can do this using pandas, although the styling isn't exactly the same as yours:
import pandas as pd
with open('filename.csv') as f:
    headers, *data = map(str.split, f.readlines())
df = pd.DataFrame(dict(zip(headers, zip(*data))))
print(df.to_string(index=False))
Name Age School
Anne 10 Eiksmarka
Tom 15 Marienlyst
Vidar 18 Persbråten
Ayla 18 Kongshavn
Johanne 17 Wang
Silje 16 Eikeli
Per 19 UiO
Ali 25 NTNU

Computational tractability of algorithm for matching names in two files in python

So I have two .txt files that I'm trying to match up. The first .txt file is just lines of about 12,500 names.
John Smith
Jane Smith
Joe Smith
The second .txt file also contains lines with names (that might repeat) but also extra info, about 17GB total.
584 19423 john smith John Smith 79946792 5 5 11 2016-06-24
584 19434 john smith John Smith 79923732 5 4 11 2018-03-14
584 19423 jane smith Jane Smith 79946792 5 5 11 2016-06-24
My goal is to find all the names from File 1 in File 2, and then spit out the File 2 lines that contain any of those File 1 names.
Here is my python code:
with open("Documents/File1.txt", "r") as t:
    terms = [x.rstrip('\n') for x in t]
with open("Documents/File2.txt", "r") as f, open("Documents/matched.txt","w") as w:
    for line in f:
        if any([term in line for term in terms]):
            w.write(line)
So this code definitely works, but it has been running for 3 days (and is still going!!!). I did some back-of-the-envelope calculations, and I'm very worried that my algorithm is computationally intractable (or hyper inefficient) given the size of the data.
Would anyone be able to provide feedback re: (1) whether this is actually intractable and/or extremely inefficient and if so (2) what an alternative algorithm might be?
Thank you!!
First, when testing membership, set and dict are going to be much, much faster, so terms should be a set:
with open("Documents/File1.txt", "r") as t:
    terms = set(line.strip() for line in t)
Next, I would split each line into a list and check whether the name is in the set, rather than checking whether each member of the set is in the line. Each of those substring checks is O(N), where N is the length of the line, and you run about 12,500 of them per line. This way you can directly pick out the columns (via slicing) that contain the first and last name:
with open("Documents/File2.txt", "r") as f, open("Documents/matched.txt","w") as w:
    for line in f:
        # split the line on whitespace
        names = line.split()
        # Your names seem to occur here
        name = ' '.join(names[4:6])
        if name in terms:
            w.write(line)
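If the position of the name columns is not known in advance, one option is to test every pair of adjacent tokens against the set. This is a sketch only, assuming every name in File 1 consists of exactly two words. The helpers line_has_known_name and filter_lines are hypothetical names.

```python
def line_has_known_name(line, terms, words_per_name=2):
    """True if any run of adjacent tokens in the line is a name in `terms`."""
    tokens = line.split()
    last_start = len(tokens) - words_per_name
    return any(
        " ".join(tokens[i:i + words_per_name]) in terms
        for i in range(last_start + 1)
    )


def filter_lines(lines, terms):
    """Yield only the lines that mention a known name, one line at a time."""
    for line in lines:
        if line_has_known_name(line, terms):
            yield line


terms = {"John Smith", "Jane Smith", "Joe Smith"}
sample = [
    "584 19423 john smith John Smith 79946792 5 5 11 2016-06-24",
    "584 19423 bob jones Bob Jones 79946792 5 5 11 2016-06-24",
]
for line in filter_lines(sample, terms):
    print(line)
```

Each line then costs a handful of set lookups, independent of how many names File 1 contains, and the generator never holds more than one line of File 2 in memory.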

Text files help (Python 3)

Students.txt
64 Mary Ryan
89 Michael Murphy
22 Pepe
78 Jenny Smith
57 Patrick James McMahon
89 John Kelly
22 Pepe
74 John C. Reilly
My code
f = open("students.txt","r")
for line in f:
    words = line.strip().split()
    mark = (words[0])
    name = " ".join(words[1:])
    for i in (mark):
        print(i)
The output I'm getting is
6
4
8
9
2
2
7
8
etc...
My expected output is
64
89
22
78
etc..
Just curious to know how I would print the whole integer, not just a single digit at a time.
Any help would be much appreciated.
As I understand it, each line of your text file has an integer followed by a string, and you want your code to print only the full integer.
You can use this code:
f = open("Students.txt","r")
for line in f:
    l = line.split(" ")
    print(l[0])
In Python, when you do this:
for i in (mark):
    print(i)
and mark is of type string, you are asking Python to iterate over each character in the string. So, if your string holds the digits of an integer and you iterate over the string, you'll get one digit at a time.
I believe in your code the line
mark = (words[0])name = " ".join(words[1:])
is a typo. If you fix that we can help you with what's missing (it's most likely a statement like mark = something.split(), but not sure what something is based on the code).
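The character-by-character iteration described above is easy to reproduce in isolation. A small sketch using the first line of the sample file:

```python
line = "64 Mary Ryan"
words = line.strip().split()
mark = words[0]  # writing (words[0]) would not create a tuple, it is still a str

# Iterating over a string yields its characters one at a time.
digits = [ch for ch in mark]
print(digits)

# Printing the string itself gives the whole mark.
print(mark)

# int() converts it when a number is needed for arithmetic.
mark_value = int(mark)
print(mark_value + 1)
```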
You should be using context managers when you open files so that they are automatically closed for you when the scope ends. Also mark should be a list to which you append the first element of the line split. All together it will look like this:
with open("students.txt","r") as f:
    mark = []
    for line in f:
        mark.append(line.strip().split()[0])
for i in mark:
    print(i)
The line
for i in (mark):
is the same as this, because mark is a string and the parentheses alone do not create a tuple:
for i in mark:
I believe you want to make mark an element of some iterable. You can create a tuple with a single item like this:
for i in (mark,):
and this should give you what you want.
In your line:
line.strip().split()
you're relying on the default behavior of split(), which splits on any run of whitespace. To split explicitly on a single space, try the following:
str(line).strip().split(" ")
A quick one with list comprehensions:
with open("students.txt","r") as f:
    mark = [line.strip().split()[0] for line in f]
for i in mark:
    print(i)

Python: I have a list of words, and want to check the number of occurrences of those words in each line in a file

So I want to count the occurrences of certain words, per line, in a text file. How many times each specific word occurred doesn't matter, just how many times any of them occurred per line. I have a file containing a list of words, delimited by newline characters. It looks like this:
amazingly
astoundingly
awful
bloody
exceptionally
frightfully
.....
very
I then have another text file containing lines of text. Let's say, for example:
frightfully frightfully amazingly Male. Don't forget male
green flag stops? bloody bloody bloody bloody
I'm biased.
LOOKS like he was headed very
green flag stops?
amazingly exceptionally exceptionally
astoundingly
hello world
I want my output to look like:
3
4
0
1
0
3
1
0
Here's my code:
def checkLine(line):
    count = 0
    with open("intensifiers.txt") as f:
        for word in f:
            if word[:-1] in line:
                count += 1
    print(count)
for line in open("intense.txt", "r"):
    checkLine(line)
Here's my actual output:
4
1
0
1
0
2
1
0
Any ideas?
How about this:
def checkLine(line):
    with open("intensifiers.txt") as fh:
        line_words = line.rstrip().split(' ')
        check_words = [word.rstrip() for word in fh]
        print(sum(line_words.count(w) for w in check_words))
for line in open("intense.txt", "r"):
    checkLine(line)
Output:
3
4
0
1
0
3
1
0
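The difference between the two versions comes down to substring matching versus whole-word matching: a test like word[:-1] in line is true when the word appears anywhere in the string, so it counts each listed word at most once per line and also matches inside longer words. The sketch below counts whole tokens instead and keeps the word list in a set that is built once rather than re-read for every line. The function name count_listed_words is hypothetical, and punctuation attached to a word is not stripped, which is an assumption of this sketch.

```python
def count_listed_words(line, listed_words):
    """Count the tokens in `line` that are exactly one of `listed_words`."""
    return sum(1 for token in line.split() if token in listed_words)


# Word list from the question (the elided entries are left out).
intensifiers = {"amazingly", "astoundingly", "awful", "bloody",
                "exceptionally", "frightfully", "very"}

text = [
    "frightfully frightfully amazingly Male. Don't forget male",
    "green flag stops? bloody bloody bloody bloody",
    "I'm biased.",
    "LOOKS like he was headed very",
    "green flag stops?",
    "amazingly exceptionally exceptionally",
    "astoundingly",
    "hello world",
]
counts = [count_listed_words(line, intensifiers) for line in text]
for count in counts:
    print(count)
```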

Return the average mark for all student in that Section

I know this has been asked already, but the answers were super unclear.
The first requirement is to open a file (sadly I have no idea how to do that)
The second requirement is a section of code that does the following:
Each line represents a single student and consists of a student number, a name, a section code and a midterm grade, all separated by whitespace
So I don't think I can target that element, since the fields are only separated by whitespace?
Here is an excerpt of the file, showing line structure
987654322 Xu Carolyn L0101 19.5
233432555 Jones Billy Andrew L5101 16.0
555432345 Patel Amrit L0101 13.5
888332441 Fletcher Bobby L0201 18
777998713 Van Ryan Sarah Jane L5101 20
877633234 Zhang Peter L0102 9.5
543444555 Martin Joseph L0101 15
876543222 Abdolhosseini Mohammad Mazen L0102 18.5
I was provided the following hints:
Notice that the number of names per student varies.
Use rstrip() to get rid of extraneous whitespace at the end of the lines.
I don't understand the second hint.
This is what I have so far:
counter = 0
elements = -1
for sets in the_file
elements = elements + 1
if elements = 3
I know it has something to do with readlines() and targeting the section code.
marks = [float(line.strip().split()[-1]) for line in open('path/to/input/file')]
average = sum(marks)/len(marks)
Hope this helps
Open and writing to files
strip method
Something like this?
data = {}
with open(filename) as f:  # open the file
    for line in f:  # go through the file line by line
        fields = line.split()  # split on whitespace; a blank line gives []
        if not fields:
            continue
        # the name can span several words, so take the fixed fields from both ends
        studentN, sectionCode, midtermGrade = fields[0], fields[-2], fields[-1]
        studentName = ' '.join(fields[1:-2])
        if sectionCode not in data:
            data[sectionCode] = []
        # building dict, key is a section code, value is a list of student records
        data[sectionCode].append([studentN, studentName, float(midtermGrade)])
# make calculations: average grade per section
for k, v in data.items():  # items() gives a (key, value) pair on each iteration
    print('Section:' + k + ' Average grade:' + str(sum(x[2] for x in v) / len(v)))
more or less:
infile = open('grade_file.txt', 'r')
score = 0
n = 0
for line in infile.readlines():
    score += float(line.rstrip().split()[-1])
    n += 1
avg = score / n
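To get the average for one particular section rather than for the whole file, the section code can be read from the second-to-last field, since the number of name words varies. This is a sketch under that assumption. The name section_average is hypothetical, and the function accepts any iterable of lines, including an open file.

```python
def section_average(lines, section):
    """Average midterm grade of the students in `section`, or None if there are none."""
    grades = []
    for line in lines:
        fields = line.rstrip().split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # Names vary in length, so index from the end: [-2] section, [-1] grade.
        if fields[-2] == section:
            grades.append(float(fields[-1]))
    if not grades:
        return None
    return sum(grades) / len(grades)


sample = [
    "987654322 Xu Carolyn L0101 19.5",
    "233432555 Jones Billy Andrew L5101 16.0",
    "555432345 Patel Amrit L0101 13.5",
    "543444555 Martin Joseph L0101 15",
]
print(section_average(sample, "L0101"))
```

Returning None for a section with no students avoids a division by zero. Raising an exception would be an equally reasonable choice.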
