Question about using dictionaries other than using Counter(email_lst).most_common() - python

I have a question about my Python program. Before I ask it, here are the instructions that I have to complete:
[Write a program to read through the example_messages.txt file and figure out who has sent the greatest number of mail messages.
The program looks for 'From ' lines and takes the second word of those lines as the person who sent the mail. The program creates a Python dictionary that maps the sender's mail address to a count of the number of times they appear in the file.]
I attached a link to the program that I created. I also tried to attach the "example_messages" text file to Stack Overflow, but it wouldn't let me, so I attached a photo instead.
[Program that I created] https://ibb.co/nm3dBYt
[Photo of example_messages.txt] https://ibb.co/qkmfrLn
I used the "Counter(email_lst).most_common()" function in my program to complete the task. This method works, but according to the assignment I have to use a dictionary, and I am having a difficult time coming up with a way to do that. The program should be no more than 10 lines of code. Does anyone have any ideas or suggestions?
Best Regards

You can "manually" add 1 to entries in the dictionary and use the max() function at the end to get the entry with the highest count:
counts = dict()
with open('example_messages.txt') as f:
    for line in f:
        # only 'From ' lines (with the trailing space) name the sender
        if not line.startswith("From "): continue
        email = line.split()[1]
        counts[email] = counts.get(email, 0) + 1
# pick the (email, count) pair with the largest count
email, count = max(counts.items(), key=lambda ec: ec[1])
print(email, "has sent the most emails:", count)

Related

Find IP address which occurs most frequent and count the number of times that it appears

Hi all. This is my first time having to look for assistance, but I am at a brick wall for now. I have been learning Python since August and I have been given a challenge to complete by the end of November, and I hope there could be some help in making my code work. My task is to find the IP address that occurs most frequently and count the number of times it appears; this information must also be displayed to the user. I have been given four .txt files that contain the IPs. I am also required to make use of non-trivial data structures and built-in Python sorting and/or searching functionality, and to make use of functions, parameter passing and return values in the program. Below is the sample structure they have recommended that I use:
def analyse_logs(parameter):
    # Your Code Here
    return something
def extract_ip(parameter):
    # Your Code Here
    return something
def find_most_frequent(parameter):
    # Your Code Here
    return something
# Test Program
def main():
    # Your Code Here
# Call Test Program
main()
Below is what I have come up with. The code is completely different from the sample that was provided, and what I have done doesn't give me output straight back; instead it creates a new text file that has been sorted, which is not what I am looking for:
def sorting(filename):
    infile = open(filename)
    ip_addr = []
    for line in infile:
        temp = line.split()
        for i in temp:
            ip_addr.append(i)
    infile.close()
    ip_addr.sort()
    outfile = open("result.txt", "w")
    for i in ip_addr:
        outfile.writelines(i)
        outfile.writelines(" ")
    outfile.close()
sorting("sample_log_1.txt")
The code that I have created sorts everything that is in the .txt file and outputs it from the most frequently used all the way to the least frequent. All I am looking for is an algorithm that can go through the .txt file, find the IP address that is most frequent, and then print that IP and how many times it appears. I hope I have provided everything. I am sure this is probably something very basic, but I just can't get my head around it.
You should keep track of the number of times each IP address is repeated. You can use a dictionary for this:
ip_count_dict = {"IP1": repeat_count, "IP2": repeat_count}
The first time you find an IP in your list, set its repeat_count to 1; after that, if you find the same IP again, just increase the counter.
For example,
ip_count_dict = {}
ip_list = ['1.1.1.1','1.1.1.2','1.1.1.3','1.1.1.1']
#Loop and count ips
#Final version of ip_count_dict {'1.1.1.1':2 , '1.1.1.2':1, '1.1.1.3':1}
With this dictionary you can store all the IPs and sort them by their value.
P.S.: A dictionary keeps key/value pairs; you can search for "sort dictionary by value" once all the counting is done.
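To make the counting and sorting steps above concrete, here is a minimal sketch. The helper names extract_ips and count_ips are my own (hypothetical, not part of your assignment skeleton), and I am assuming every whitespace-separated token in a log line is an IP, as in your sorting() function:

```python
def extract_ips(lines):
    """Collect every whitespace-separated token from the given lines."""
    ips = []
    for line in lines:
        ips.extend(line.split())
    return ips

def count_ips(ips):
    """Map each IP to the number of times it appears."""
    counts = {}
    for ip in ips:
        # first sighting starts at 0 + 1, later sightings add 1
        counts[ip] = counts.get(ip, 0) + 1
    return counts

def find_most_frequent(counts):
    """Return the (ip, count) pair with the highest count.

    Raises IndexError if counts is empty.
    """
    # sort (ip, count) pairs by count, highest first
    ranked = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[0]

ip_list = extract_ips(['1.1.1.1 1.1.1.2', '1.1.1.3 1.1.1.1'])
ip_count_dict = count_ips(ip_list)
most_common_ip, times = find_most_frequent(ip_count_dict)
print(most_common_ip, "appears", times, "times")
```

For a real log you could pass the open file object to extract_ips instead of the list of strings, since iterating over a file yields its lines.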

How to use the most recently printed line as an input?

I am trying to create a Twitter bot that posts a random line from a text file. I have gone as far as generating the random lines, which print one at a time, and giving the bot access to my Twitter app, but I can't for the life of me figure out how to use a printed line as a status.
I am using Tweepy. My understanding is that I need to use api.update_status(status=X), but I don't know what X needs to be for the status to match the most recently printed line.
This is the relevant section of what I have so far:
import time
from random import choice
x = 1
while True:
    file = open('quotes.txt')
    content = file.read()
    lines = content.splitlines()
    print(choice(lines))
    api.update_status(status=(choice(lines)))
    time.sleep(3600)
The bot is accessing Twitter no problem. It is currently posting another random quote generated by (choice(lines)), but I'd like it to match what prints immediately before.
I may not fully understand your question, but from the very top, where it says, "How to use the most recently printed line as an input", I think I can answer that. Whenever you use the print() command, store the argument into a string variable that overwrites its last value. Then it saves the last printed value.
Instead of directly printing a choice:
print(choice(lines))
create a new variable and use it in your print() and your api.update_status():
selected_quote = choice(lines)
print(selected_quote)
api.update_status(status=selected_quote)

Python code to read from a file and finding count of a word

This is the problem statement
Write a program to read through the mbox-short.txt and figure out who has sent the greatest number of mail messages. The program looks for 'From ' lines and takes the second word of those lines as the person who sent the mail. The program creates a Python dictionary that maps the sender's mail address to a count of the number of times they appear in the file. After the dictionary is produced, the program reads through the dictionary using a maximum loop to find the most prolific committer.
name = input("Enter file:")
if len(name) < 1 :
    name = "mbox-short.txt"
handle = open(name)
a=dict()
for line in handle:
    if line.startswith('From'):
        line=line.rstrip()
        words=line.split()
        for word in words:
            a[word]=a.get(name,0)+1
print(a)
This is the code I have written. I know it is incomplete, as I am not able to figure out the logic to do this. Please help. The desired output is cwen@iupui.edu 5
where cwen@iupui.edu is the word which appears the most and 5 is the count. So cwen@iupui.edu appears 5 times in the text file. Remember that every sender email ID comes right after the first word (so the first word is From and the second word is the sender ID; the IDs are all different, with different domains). So I need the maximum count for the most frequently appearing sender ID. Hope this helps. Please let me know if you want more details. And please help, as I am stuck on this.
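One possible way to finish this, as a rough sketch rather than the official solution: count only the second word of each 'From ' line, then walk the dictionary with a maximum loop. The function name most_prolific_sender and the sample lines are made up for illustration; I am assuming the file has the usual mbox layout where 'From ' (with a trailing space) starts the envelope line.

```python
def most_prolific_sender(lines):
    """Return (sender, count) for the address seen on the most 'From ' lines."""
    counts = {}
    for line in lines:
        # 'From ' with the space skips the 'From:' header lines
        if not line.startswith('From '):
            continue
        words = line.split()
        if len(words) < 2:
            continue
        sender = words[1]
        counts[sender] = counts.get(sender, 0) + 1

    # maximum loop: remember the largest count seen so far
    best_sender, best_count = None, 0
    for sender, count in counts.items():
        if count > best_count:
            best_sender, best_count = sender, count
    return best_sender, best_count

# made-up sample lines standing in for the real file
sample = [
    "From alice@example.com Sat Jan  5 09:14:16 2008",
    "From: alice@example.com",
    "From bob@example.com Sat Jan  5 09:15:00 2008",
    "From alice@example.com Sat Jan  5 09:16:00 2008",
]
sender, count = most_prolific_sender(sample)
print(sender, count)
```

To run it on the real file, pass the open handle instead of the list, for example most_prolific_sender(open(name)), since a file object yields its lines when iterated.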

Infinite For Loop issue in Python

I need to extract an ID from a JSON string that is needed for loading information into a MySQL database. The ID is a 5 or 6 digit number, but the JSON key that contains this number is the URL net_devices resource string that has the number at the end like this example:
{u'router': u'https://www.somecompany.com/api/v2/routers/123456/'}
Since there is not a key with just the ID, I have used the following to return just the ID from the JSON key string:
url = 'https://www.somecompany.com/api/v2/net_devices/?fields=router,service_type'
r = json.loads(s.get((url), headers=headers).text)
status = r["data"]
for item in status:
    type = item['service_type']
    router_url = item['router']
    router_id = router_url.replace("https://www.somecompany.com/api/v2/routers/", "")
    id = router_id.replace("/", "")
    print id
This does indeed return just the ID values I want, and it doesn't matter if the result varies in the number of digits.
The problem: This code creates an infinite loop when I include the two lines above the print statement.
How can I change the syntax to allow the loop to run through all the returned IDs once, but still strip out everything except the numerical ID?
I am new to Python, and just starting to write code again after a very long hiatus since college. Any help would be greatly appreciated!
UPDATE
Thanks everyone for the feedback! With the help from David and Gerrat, I was able to find the issue that was causing the infinite loop and it was not this segment of the code, but another segment that was not properly indented. I am learning how to properly indent loops in Python, and this was one of my silly mistakes! Thanks again for the help!
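For anyone landing here for the ID extraction itself, here is a small sketch of an alternative to the two replace() calls: take whatever follows the last slash in the URL. The helper name extract_router_id and the sample items are hypothetical; the only assumption is that the ID is always the final path segment, as in the example URL above. It is written for Python 3.

```python
def extract_router_id(router_url):
    """Return the last path segment of the URL, e.g. '123456'."""
    # drop any trailing slash, then keep what follows the last '/'
    return router_url.rstrip('/').rsplit('/', 1)[-1]

# made-up items shaped like the entries in r["data"]
items = [
    {'router': 'https://www.somecompany.com/api/v2/routers/123456/'},
    {'router': 'https://www.somecompany.com/api/v2/routers/54321/'},
]
router_ids = [extract_router_id(item['router']) for item in items]
print(router_ids)
```

This does not depend on the exact prefix of the URL, so it keeps working if the host or API version changes.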

Using Python to split a Unicode file object into dictionary Keys and values

Hi and thanks for reading. I’ll admit that this is a progression on from a previous question I asked earlier, after I partially solved the issue. I am trying to process a block of text (file_object) in an earlier working function. The text or file_object happens to be in Unicode, but I have managed to convert to ascii text and split on a line by line basis. I am hoping to then further split the text on the ‘=’ symbol so that I can drop the text into a dictionary. For example Key: Value as ‘GPS Time’:’ 14:18:43’ so removing the trailing '.000' from the time (though this is a second issue).
Here’s the file_object format…
2015 Jan 01 20:07:16.047 GPS Info #Log packet ID
GPS Time = 14:18:43.000
Longitude = 000.65341
Latitude = +41.25385
Altitude = +111.400
This is my partially working function…
def process_data(file_object):
    file_object = file_object.encode('ascii','ignore')
    split = file_object.split('\n')
    for i in range(len(split)):
        while '=' in split[i]:
            processed_data = (split[i].split('=', 1) for _ in xrange(len(split)))
            return {k.strip(): v.strip() for k, v in processed_data}
This is the initial section of the main script that prompts the above function, and then sets GPS Time as the Dictionary key…
while (mypkt.Next()): # mypkt.Next is an API function in the log processor app I am using; essentially it grabs the whole GPS Info packet shown above
    data = process_data(mypkt.Text, 1)
    packets[data['GPS Time']] = data
The code above has no problem splitting the first instance, 'GPS Time', but it ignores Longitude, Latitude, etc. To make matters worse, there is sometimes a blank line between each packet item too. I guess I need to store the previous dictionary-related splits before the 'return', but I am having difficulty finding out how to do this.
The dict output I am currently getting is…
'14:19:09.000': {'GPS Time': '14:19:09.000'},
But What I am hoping for is…
'14:19:09': {'GPS Time': '14:19:09',
             'Longitude': '000.65341',
             'Latitude': '+41.25385',
             'Altitude': '+111.400'},
Thanks in advance for any help.
MikG
All this use of range(len(whatever)) is nonsense. You almost never need to do that in Python. Just iterate through the thing.
Your problem however is more fundamental: you return from inside the while loop. That means you only ever get one element, because as soon as that first line is processed, you return and the function ends.
Also, you have a while loop, which means that processing will end as soon as the program encounters a line without an equals sign; but you have blank lines between each data line, so again execution would never proceed past the first one.
So all you need is:
def process_data(file_object):
    split_data = file_object.split('\n')
    result = {}
    for line in split_data:
        # lines without '=' (the header and blank lines) are skipped
        if '=' in line:
            key, value = line.split('=', 1)
            result[key.strip()] = value.strip()
    return result
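As for the second issue in the question (dropping the trailing '.000' from the time), here is one way it might be handled, shown together with the loop above so the snippet runs on its own. The helper name drop_fraction is made up, the packet text is copied from the question with a blank line added, and this assumes Python 3 where the text is already a str.

```python
def process_data(text):
    """Turn 'key = value' lines into a dictionary, skipping other lines."""
    result = {}
    for line in text.split('\n'):
        if '=' in line:
            key, value = line.split('=', 1)
            result[key.strip()] = value.strip()
    return result

def drop_fraction(time_text):
    """'14:18:43.000' -> '14:18:43'; text without a '.' is returned unchanged."""
    return time_text.split('.', 1)[0]

packet_text = """2015 Jan 01 20:07:16.047 GPS Info #Log packet ID
GPS Time = 14:18:43.000

Longitude = 000.65341
Latitude = +41.25385
Altitude = +111.400"""

data = process_data(packet_text)
data['GPS Time'] = drop_fraction(data['GPS Time'])
packets = {data['GPS Time']: data}
print(packets)
```

Only the 'GPS Time' value is trimmed here; the other values keep their decimals because they are never passed to drop_fraction.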
