I'm taking the "Python for Everyone" course. I recently got to database structures, and while I managed to successfully pass the assignment, it was due to stumbling upon the solution.
The assignment wanted a program to read through a text file of e-mail mailbox addresses, collect them all in a dictionary, and count up each occurrence, then print out the address with the highest occurrence, as well as the number of times it appeared. This is the program I wrote:
name = input("Enter file: ")
if len(name) < 1:
    name = "mbox-short.txt"
handle = open(name)
sender = dict()
for line in handle:
    line = line.rstrip()
    words = line.split()
    if line.startswith("From "):
        w = words[1]
        sender[w] = sender.get(w, 0) + 1
    else:
        continue
print(w, sender[w])
Originally I had the final print call indented one level, which put it inside the for loop. The output was a complete list of every address with its current tally next to it. When I moved the print call out of the loop, it returned the desired output.
Why does w, when moved out of the loop, return the desired email address instead of giving me all of the email addresses (since it's words[1])? Is it because it's part of the dictionary's behavior to do so?
In the for loop you reassign a string to w on each matching iteration, so when you print w after the loop you only get the value most recently assigned to it, that is, the sender on the last "From " line. That this happens to be the most frequent address is a coincidence of the file, not something the dictionary does. If you want to print all senders after the loop, you could do something like this:
for s, c in sender.items():
    print(s, c)
Or if you want to output only the one(s) with the highest occurrence:
maxcount = max(sender.values())
for s, c in sender.items():
    if c == maxcount:
        print(s, c)
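As an alternative, here is a minimal sketch that assumes the standard-library collections.Counter is acceptable for the assignment. The function name most_common_sender and the sample lines are made up for illustration; they are not part of the course material.

```python
from collections import Counter


def most_common_sender(lines):
    """Return (address, count) for the most frequent sender, or None."""
    senders = Counter()
    for line in lines:
        words = line.split()
        # Only envelope lines such as "From someone@example.com Sat Jan ..." count;
        # header lines like "From: someone@example.com" do not start with "From ".
        if line.startswith("From ") and len(words) > 1:
            senders[words[1]] += 1
    if not senders:
        return None
    # most_common(1) returns a one-element list of (key, count) pairs.
    return senders.most_common(1)[0]


# Hypothetical sample data standing in for the mailbox file.
sample = [
    "From a@example.com Sat Jan  5 09:14:16 2008",
    "From: a@example.com",
    "From b@example.com Sat Jan  5 09:20:00 2008",
    "From a@example.com Sat Jan  5 09:25:00 2008",
]
print(most_common_sender(sample))
```

Because the result is computed from the whole dictionary, it does not depend on which address happened to be seen last.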
I have to go through a txt file which contains all manner of info and pull out the email address that occurs most often in it.
My code is as follows, but it does not work. It prints no output and I am not sure why. Here is the code:
name = input("Enter file:")
if len(name) < 1 : name = "mbox-short.txt"
handle = open(name)
names = handle.readlines()
count = dict()
for name in names:
    name = name.split()
    for letters in name:
        if '#' not in letters:
            name.remove(letters)
        else:
            continue
    name = str(name)
    if name not in count:
        count[name] = 1
    else:
        count[name] = count[name]+ 1
print(max(count, key=count.get(1)))
As I understand it, this code works as follows:
we first open the file, then we read the lines, then we create an empty dict
Then in the first for loop, we split the txt file into a list based on each line.
Then, in the second for loop, for each item in each line, if there is no #, then it is removed.
We then return for the original for loop, where, if the name is not a key in dict, it is added with a value of 1; else one is added to its value.
Finally, we print the max key & value.
Where did I go wrong???
Thank you for your help in advance.
You need to change the last line to:
print(max(count, key=count.get))
EDIT
For sake of more explanation:
By writing key=count.get(1) you gave max() the wrong ordering function.
count.get(1) is evaluated immediately and returns None, because 1 is not a key in the dictionary and no default value was passed to get().
max() then runs with key=None and simply returns the largest key by string comparison (as long as all your keys are strings and your dictionary is not empty).
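The difference can be seen in a small sketch. The dictionary below is made-up sample data using addresses from this thread; it is not the asker's actual file.

```python
# Hypothetical tallies, keyed by address.
count = {"whatsap#hola.com": 2, "klk#klk.com": 5}

# key=count.get passes the bound method itself; max() calls it on every
# key and picks the key whose looked-up value is largest.
by_count = max(count, key=count.get)

# count.get(1) is evaluated right away. 1 is not a key, so it is None,
# and max(count, key=None) compares the keys themselves as strings.
by_name = max(count, key=count.get(1))

print(by_count)  # klk#klk.com
print(by_name)   # whatsap#hola.com
```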
Please use the following code:
names = '''hola#hola.com
whatsap#hola.com
hola#hola.com
hola#hola.com
klk#klk.com
klk#klk.com
klk#klk.com
klk#klk.com
klk#klk.com
whatsap#hola.com'''
count = list(names.split("\n"))
sett = set(names.split("\n"))
highest = count.count(count[0])
theone = count[0]
for i in sett:
    l = count.count(i)
    if l > highest:
        highest = l
        theone = i
print(theone)
Output:
klk#klk.com
Import Regular Expressions (re) as it will help in getting emails.
import re
name = input("Enter file:")
if len(name) < 1 : name = "mbox-short.txt"
handle = open(name)
names = "\n".join(handle.readlines())
email_ids = re.findall(r"[0-9a-zA-Z._+%]+#[0-9a-zA-Z._+%]+[.][0-9a-zA-Z.]+", names)
# list.sort() returns None, so build the ordered list with sorted() instead
email_ids = sorted(((email_ids.count(email_id), email_id) for email_id in set(email_ids)), reverse=True)
# keep a list here: converting to a set would throw the ordering away
email_ids = [i[1] for i in email_ids]
In the variable email_ids you will get a list of the distinct emails arranged by number of occurrences, in descending order.
I know that the code is lengthy and has a few redundant lines, but they are there to make the code self-explanatory.
Sorry if this question has already been asked, but I can't find any answer that solves my problem.
I am using Python 3.8 with PyCharm on Mac (if that information can help).
I just started learning Python and have a solid C and MATLAB background.
My goal here is to read some information about train stations from a file in the format
and then ask the user for a station and give out the names of the stations that are connected by trains. Here is my code:
fin = open('trains', 'r')
string = fin.read()
lines = string.split('\n')
print(lines)
station = input("Insert station name\n")
from_station = [] #stations from which trains arrive at the user's station
to_station = [] #stations to which trains arrive from user's station
for i in range(0,len(lines)):
    words = lines[i].split()
    for i in range(0,4):
        print(words[i]) #put to check if the words list actually stores the different words
    if words[0] == station:
        to_station.append(words[2])
    if words[2] == station:
        from_station.append(words[0])
print("Trains arriving from stations: ")
print(from_station)
print("Trains going to stations: ")
print(to_station)
fin.close()
I keep getting an index out of bounds error for print(words[i]) on line 17, even though my interpreter manages to print the right information without any problem.
I cannot get the code after the end of the for loop to run.
Thank you in advance for your help
EDIT: Even if I make that correction you suggested - I didn't notice in the inner loop that mistake - I still keep getting that error. I get that error even if I remove that inner loop altogether.
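The issue is in this line:
    words = lines[i].split()
You need to check len(words) each time and confirm that every index you use is within bounds. Splitting the file contents on '\n' leaves an empty final string when the file ends with a newline, and an empty line splits into an empty list, so words[0] raises an IndexError.
Looking closely at your data should show which line is too short.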
Use a variable other than 'i' in the inner loop.
The problem comes from your inner loop reusing 'i' and from indexing the list words over a fixed range. If a line yields a list of only two words, you get an index out of bounds error.
fin = open('trains', 'r')
string = fin.read()
lines = string.split('\n')
print(lines)
station = input("Insert station name\n")
from_station = [] #stations from which trains arrive at the user's station
to_station = [] #stations to which trains arrive from user's station
for i in range(0,len(lines)):
    words = lines[i].split()
    for j in range(0,len(words)):
        print(words[j]) #put to check if the words list actually stores the different words
    if words[0] == station:
        to_station.append(words[2])
    if words[2] == station:
        from_station.append(words[0])
print("Trains arriving from stations: ")
print(from_station)
print("Trains going to stations: ")
print(to_station)
fin.close()
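Note that words[0] and words[2] can still fail on a blank or short line. The following sketch adds a length check before indexing. The file format was not shown in the question, so the sample lines ("Rome to Milan 08:00") and the function name connections are assumptions made only for illustration.

```python
def connections(lines, station):
    """Return (from_station, to_station) lists for the given station."""
    from_station = []  # stations from which trains arrive at `station`
    to_station = []    # stations reached by trains leaving `station`
    for line in lines:
        words = line.split()
        # Blank or truncated lines yield fewer than three words; skip them
        # instead of indexing past the end of the list.
        if len(words) < 3:
            continue
        if words[0] == station:
            to_station.append(words[2])
        if words[2] == station:
            from_station.append(words[0])
    return from_station, to_station


# Hypothetical file contents; the trailing newlines produce empty strings
# after split("\n"), which is what triggers the IndexError in the question.
sample = "Rome to Milan 08:00\nMilan to Turin 10:30\n\n".split("\n")
print(connections(sample, "Milan"))
```

Iterating over the lines directly also removes the need for an index variable, so the outer and inner loops can no longer clash over 'i'.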
Beginning Python programmer here. I am currently stuck writing a small Python script that opens a txt source file, finds a specific number in it with a regular expression (107.5 in this case), and replaces that 107.5 with a new number. The new number comes from a second txt file which contains 30 numbers. Each time a number has been replaced, the script uses the next number for its replacement. Although the command prompt does seem to print a successful find and replace, an "IndexError: list index out of range" occurs after the 30th loop...
My hunch is that I somehow have to limit my loop with something like "for i in range x". However, I am not sure which list this should be based on or how I can incorporate that limit into my current code. Any help is much appreciated!
nTemplate = [" "]
output = open(r'C:\Users\Sammy\Downloads\output.txt','rw+')
count = 0
for line in templateImport:
    priceValue = re.compile(r'107.5')
    if priceValue.sub(pllines[count], line) != None:
        priceValue.sub(pllines[count], line)
        nTemplate.append(line)
        count = count + 1
        print('found a match. replaced ' + '107.5 ' + 'with ' + pllines[count] )
        print(nTemplate)
    else:
        nTemplate.append(line)
The IndexError is raised because you are incrementing count in each iteration of the loop, but haven't added an upper limit based on how many values the pllines list actually contains. You should break out of the loop when it reaches len(pllines) in order to avoid the error.
Another issue which you may not have noticed is with your usage of the re.sub() method. It returns a new string with the appropriate replacements, and does not modify the original.
If the pattern doesn't exist in the string, it'll return the original itself. So your nTemplate list probably never had any of the replaced strings appended to it. Unless you need to do some other actions if the pattern was found in the line, you can do away with the if condition (as I have in the example below).
Since the priceValue object is the same for all lines, it can be moved outside the loop.
The following code should work:
nTemplate = [" "]
output = open(r'C:\Users\Sammy\Downloads\output.txt','rw+')
count = 0
priceValue = re.compile(r'107\.5')  # escape the dot so it matches only a literal "."
for line in templateImport:
    if count == len(pllines):
        break
    nTemplate.append(priceValue.sub(pllines[count], line))
    count = count + 1
print(nTemplate)
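The loop above uses up one replacement per line, whether or not the line contains 107.5, and stops copying lines once pllines is exhausted. If that is not what you want, here is a sketch of a variation that takes the next number only when a match is actually found and keeps all remaining lines. The function name replace_sequentially and the sample data are hypothetical.

```python
import re


def replace_sequentially(lines, replacements, pattern=r"107\.5"):
    """Replace each match with the next unused replacement, in order."""
    price = re.compile(pattern)
    values = iter(replacements)

    def next_value(match):
        # Once the replacements run out, leave the matched text unchanged.
        return next(values, match.group(0))

    # re.sub accepts a function; it is called once per match.
    return [price.sub(next_value, line) for line in lines]


template = ["a 107.5", "no price here", "c 107.5 107.5", "d 107.5"]
print(replace_sequentially(template, ["1", "2", "3"]))
# ['a 1', 'no price here', 'c 2 3', 'd 107.5']
```

Using an iterator with a default in next() avoids the manual counter, so there is no index that can run past the end of the list.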
I want to populate a dictionary newDict in following code:
def sessions():
    newDict = {}
    output = exe(['loginctl','list-sessions']) # uses subprocess.check_output(). returns shell command's multiline output
    i = 0;
    for line in output.split('\n'):
        words = line.split()
        newDict[i] = {'session':words[0], 'uid':words[1], 'user':words[2], 'seat':words[4]}
        i += 1
    stdout(newDict) # prints using pprint.pprint(newDict)
But it only keeps giving me error:
newDict[i] = {'session':words[0], 'uid':words[1], 'user':words[2], 'seat':words[4]}
IndexError: list index out of range
If I do print words in the loop, here's what I get:
['c3', '1002', 'john', 'seat0']
['c4', '1003', 'jeff', 'seat0']
What am I doing wrong?
I think, it is a typo:
You use words[4] instead of words[3].
BTW:
Here is a slightly improved version of your code. It uses splitlines() instead of split('\n') and skips empty lines. It also uses enumerate(), which is a pretty neat function for counting entries while iterating over a collection.
def sessions():
    newDict = {}
    output = exe(['loginctl','list-sessions']) #returns shell command's multiline output
    for i, line in enumerate(output.splitlines()):
        if len(line.strip()) == 0:
            continue
        words = line.split()
        print words
        newDict[i] = {'session':words[0], 'uid':words[1], 'user':words[2], 'seat':words[3]}
    stdout(newDict) # prints using pprint.pprint(newDict)
IMO you should check that "words" isn't too short. The problem is most likely the list length after splitting some line (it does not have enough elements).
My best guess is that words does not always hold five items;
please try printing len(words) before assigning the dict.
As far as I can tell, this has nothing to do with the dictionary itself, but with parsing the output. Here is an example of the output I obtain:
SESSION UID USER SEAT
c2 1000 willem seat0
1 sessions listed.
Or the string version:
' SESSION UID USER SEAT \n c2 1000 willem seat0 \n\n1 sessions listed.\n'
All of this appears on stdout. The problem, as you can see, is that not every line contains four words (there is the empty line near the bottom). In Python terms:
>>> lines[2].split()
[]
You thus have to implement a check whether the line has four columns:
def sessions():
    newDict = {}
    output = exe(['loginctl','list-sessions']) # uses subprocess.check_output(). returns shell command's multiline output
    i = 0;
    for line in output.split('\n'):
        words = line.split()
        if len(words) >= 4:
            newDict[i] = {'session':words[0], 'uid':words[1], 'user':words[2], 'seat':words[3]}
            i += 1
    stdout(newDict)
The changes are the len(words) >= 4 check and the rewrite of words[4] to words[3].
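The same check can be tried without running loginctl by feeding in the sample string shown earlier. This is only a sketch in Python 3: parse_sessions is a hypothetical name, and it assumes the first line of the output is always the column header.

```python
def parse_sessions(output):
    """Parse `loginctl list-sessions`-style text into a dict of rows."""
    sessions = {}
    # Assumption: the first line is the "SESSION UID USER SEAT" header.
    for line in output.splitlines()[1:]:
        words = line.split()
        # Blank lines and the "N sessions listed." summary have fewer
        # than four words, so they are skipped here.
        if len(words) < 4:
            continue
        sessions[len(sessions)] = {
            'session': words[0],
            'uid': words[1],
            'user': words[2],
            'seat': words[3],
        }
    return sessions


sample = ' SESSION UID USER SEAT \n c2 1000 willem seat0 \n\n1 sessions listed.\n'
print(parse_sessions(sample))
```

Dropping the header explicitly matters because it also has four words and would otherwise pass the length check.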
I am trying to write a function that will allow the user to enter a name or phone number, check if it is present in a file, and if it is prints out the entire line in which that element has been found. I have so far:
def searchPlayer():
    with open("players.txt") as f:
        data = f.readlines()
    print "Enter 0 to go back"
    nameSearch = str(raw_input("Enter player surname, forname, email, or phone number: "))
    if any(nameSearch in s for s in data):
        #Finding the element in the list works
        #Can't think of a way to print the entire line with the player's information
    else:
        print nameSearch + " was not found in the database"
The file is formatted like so:
Joe;Bloggs;j.bloggs#anemailaddress.com;0719451625
Sarah;Brown;s.brown#anemailaddress.com;0749154184
So if nameSearch == Joe, the output should be Joe;Bloggs;j.bloggs#anemailaddress.com;0719451625
Any help would be appreciated, thank you
Why not use a loop?
for s in data:
    if nameSearch in s:
        print s
        break
any() loops anyway; according to the docs, it is equivalent to:
def any(iterable):
    for element in iterable:
        if element:
            return True
    return False
Seems too complicated, just do
with open("players.txt") as f:
    for line in f:
        if nameSearch in line:
            print line
You can't use any as others have mentioned, but you can use next if you want to keep the more compact code. Instead of:
if any(nameSearch in s for s in data):
you'd use next with a default value:
entry = next((s for s in data if nameSearch in s), None)
if entry is not None:
    print entry,
else:
    print nameSearch, "was not found in the database"
Note: You might want to use csv.reader or the like to parse here, as otherwise you end up mistaking formatting for data; if a user enters ; you'll blindly return the first record, even though the ; was formatting, not field data. Similarly, a search for Jon would find the first person named Jon or Jonathan or any other name that might exist that begins with Jon.
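A minimal sketch of that idea in Python 3, assuming the fields are always separated by ';' as in the sample file. The function name find_players is made up for illustration, and the match is an exact comparison against whole fields.

```python
import csv


def find_players(lines, query):
    """Return every record that has a field exactly equal to `query`."""
    matches = []
    # csv.reader accepts any iterable of strings, such as an open file
    # or a list of lines.
    for row in csv.reader(lines, delimiter=";"):
        # `query in row` tests whole fields, not substrings of the line.
        if query in row:
            matches.append(row)
    return matches


sample = [
    "Joe;Bloggs;j.bloggs#anemailaddress.com;0719451625\n",
    "Sarah;Brown;s.brown#anemailaddress.com;0749154184\n",
]
print(find_players(sample, "Joe"))
print(find_players(sample, ";"))   # the separator is no longer data
print(find_players(sample, "Jo"))  # partial names no longer match
```

Whether exact matching is desirable depends on the use case; a case-insensitive or prefix comparison per field would be an easy change.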
As @alexis mentioned in a comment, you shouldn't use any() if you want to know which line matched. In this case, you should use a loop instead:
found = False
for s in data:
    if nameSearch in s:
        print s
        found = True
        #break # Optional. Put this in if you want to print only the first match.
if not found:
    print "{} was not found in the database".format(nameSearch)
If you want to print only the first match, take out the hash sign before break and change if not found: to else:.
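For reference, the for/else variant described in the last sentence could look like the following sketch, written in Python 3. The function name first_match is hypothetical, and it returns the text instead of printing it so that it is easy to test.

```python
def first_match(data, name_search):
    """Return the first line containing name_search, or a not-found message."""
    for s in data:
        if name_search in s:
            result = s.rstrip("\n")
            break
    else:
        # The else branch of a for loop runs only if the loop finished
        # without hitting break, i.e. when nothing matched.
        result = "{} was not found in the database".format(name_search)
    return result


players = [
    "Joe;Bloggs;j.bloggs#anemailaddress.com;0719451625\n",
    "Sarah;Brown;s.brown#anemailaddress.com;0749154184\n",
]
print(first_match(players, "Sarah"))
print(first_match(players, "Zed"))
```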