Find the email address that occurs the most in a txt file

I have to go through a txt file that contains all sorts of information and pull out the email address that occurs most often in it.
My code is below, but it does not work: it prints no output, and I am not sure why. Here is the code:
name = input("Enter file:")
if len(name) < 1 : name = "mbox-short.txt"
handle = open(name)
names = handle.readlines()
count = dict()
for name in names:
    name = name.split()
    for letters in name:
        if '#' not in letters:
            name.remove(letters)
        else:
            continue
    name = str(name)
    if name not in count:
        count[name] = 1
    else:
        count[name] = count[name]+ 1
print(max(count, key=count.get(1)))
As I understand it, this code works as follows:
we first open the file, then we read the lines, then we create an empty dict
Then in the first for loop, we split the txt file into a list based on each line.
Then, in the second for loop, for each item in each line, if there is no #, then it is removed.
We then return for the original for loop, where, if the name is not a key in dict, it is added with a value of 1; else one is added to its value.
Finally, we print the max key & value.
Where did I go wrong???
Thank you for your help in advance.

You need to change the last line to:
print(max(count, key=count.get))
EDIT
For sake of more explanation:
By writing key=count.get(1), you were passing max() the wrong ordering function.
count.get(1) is evaluated immediately; because 1 is not a key in the dictionary, get() returns its default value, None, so the call becomes max(count, key=None).
With key=None, max() compares the keys themselves and outputs the largest string key in your dictionary (as long as all your keys are strings and your dictionary is not empty).
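To make the difference concrete, here is a minimal sketch with a made-up two-entry dictionary (the addresses are hypothetical and use '#' as the separator, as in the question):

```python
# Hypothetical counts; the keys are illustrative only.
count = {"zed#a.org": 1, "amy#b.org": 3}

# count.get(1) looks up the key 1, which is absent, so it returns None.
looked_up = count.get(1)

# key=None makes max() compare the keys themselves (alphabetical order here).
by_name = max(count, key=count.get(1))

# Passing the method itself makes max() compare the stored counts.
by_count = max(count, key=count.get)

print(looked_up, by_name, by_count)
```

Only the last call picks the address with the highest count.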

Please use the following code:
names = '''hola#hola.com
whatsap#hola.com
hola#hola.com
hola#hola.com
klk#klk.com
klk#klk.com
klk#klk.com
klk#klk.com
klk#klk.com
whatsap#hola.com'''
count = list(names.split("\n"))
sett = set(names.split("\n"))
highest = count.count(count[0])
theone = count[0]
for i in sett:
    l = count.count(i)
    if l > highest:
        highest = l
        theone = i
print(theone)
Output:
klk#klk.com

Import the regular expression module (re); it helps with extracting the emails.
import re
name = input("Enter file:")
if len(name) < 1 : name = "mbox-short.txt"
handle = open(name)
names = handle.read()
email_ids = re.findall(r"[0-9a-zA-Z._+%]+#[0-9a-zA-Z._+%]+[.][0-9a-zA-Z.]+", names)
# pair each distinct address with its count and sort, most frequent first
# (sorted() returns the new list; list.sort() would return None)
email_ids = sorted([(email_ids.count(email_id), email_id) for email_id in set(email_ids)], reverse=True)
email_ids = [i[1] for i in email_ids]
In the variable email_ids you will get a list of the distinct emails arranged by number of occurrences, in descending order, so email_ids[0] is the most frequent one (a set would not keep this order).
I know that the code is lengthy and has a few redundant lines, but they are there to make the code self-explanatory.
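As an alternative sketch of the same idea, collections.Counter from the standard library can do the counting and ordering in one step. The pattern is the one from this answer (with '#' as the separator); the function name rank_addresses and the sample text are made up for illustration.

```python
import re
from collections import Counter

# Same pattern as in the answer above.
EMAIL_RE = re.compile(r"[0-9a-zA-Z._+%]+#[0-9a-zA-Z._+%]+[.][0-9a-zA-Z.]+")

def rank_addresses(text):
    """Return (address, count) pairs, most frequent first."""
    return Counter(EMAIL_RE.findall(text)).most_common()

# Hypothetical sample text standing in for the file contents.
sample = "From hola#hola.com\nTo klk#klk.com\nCc klk#klk.com\n"
ranking = rank_addresses(sample)
print(ranking)
```

The first pair in the returned list holds the most frequent address and its count.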

Related

How to save input in dictionary (but separate the words when there's a space)?

dict = {}
name_surname = input("Enter your name and surname: ").split(" ")
dict["Name and surname"] = name_surname
print(dict)
I need to make it so that when the user inputs their name and surname (example: Michael Jokester Scott), the program separates the name and the surname(s), so I can use each of them later.
The purpose of this is to be able to take a randomized combination of someone's name and surname(s) and append "#gmail.com" at the end. This way you get a "randomized" but personal email address.
So in the end, I should be able to make a randomized email such as "jokester.scott.michael#gmail.com".
What I have so far is pretty bad. I'm new to Python and I don't really understand dicts well; lists are easier for me, but I need to learn this as well.

If I understood the problem correctly, you can store lists in a dict.
import random
sample_dct = {'Name and Surname': []}
# take input from the user
name_surname_list = input("Enter your name and surname: ").split()
sample_dct['Name and Surname'].append(name_surname_list)
# pick one stored entry at random
entry = random.choice(sample_dct['Name and Surname'])
# for the surname, use the [-1] index; for the name parts, use [:-1] and shuffle them
random_surname = entry[-1]
random_name = entry[:-1]
random.shuffle(random_name)
random_name_full = '.'.join(random_name)
random_mail = '.'.join([random_name_full, random_surname]).lower() + '#gmail.com'

Is this what you are looking for?
You never know what names you are going to get.
dict_data = {}
name_surname = input("Enter your name and surname: ").split(" ")
arr_size = len(name_surname)
def name(data):
    count = 1
    if arr_size == len(data):
        dict_data['name'] = data[0]
        data.pop(0)
        dict_data['last_name'] = data[-1]
        data.pop(-1)
    while arr_size > len(data) != 0:
        name = 'middle_name_' + str(count)
        dict_data[name] = data[0]
        data.pop(0)
        count += 1
name(name_surname)
print(dict_data)

Extract data between two lines from text file

Say I have hundreds of text files like this example :
NAME
John Doe
DATE OF BIRTH
1992-02-16
BIO
THIS is
a PRETTY
long sentence
without ANY structure
HOBBIES
//..etc..
NAME, DATE OF BIRTH, BIO, and HOBBIES (and others) are always there, but text content and the number of lines between them can sometimes change.
I want to iterate through the file and store the string between each of these keys. For example, a variable called Name should contain the value stored between 'NAME' and 'DATE OF BIRTH'.
This is what I came up with:
lines = f.readlines()
for line_number, line in enumerate(lines):
    if "NAME" in line:
        name = lines[line_number + 1] # In all files, Name is one line long.
    elif "DATE OF BIRTH" in line:
        date = lines[line_number + 2] # Date is also always two lines after
    elif "BIO" in line:
        for x in range(line_number + 1, line_number + 20): # Length of other data can be randomly bigger
            if "HOBBIES" not in lines[x]:
                bio += lines[x]
            else:
                break
    elif "HOBBIES" in line:
        #...
This works well enough, but I feel like instead of using many double loops, there must be a smarter and less hacky way to do it.
I'm looking for a general solution where NAME would store everything until DATE OF BIRTH, BIO would store everything until HOBBIES, and so on, with the intention of cleaning up and removing extra blank lines later.
Is it possible?
Edit : While I was reading through the answers, I realized I forgot a really significant detail, the keys will sometimes be repeated (in the same order).
That is, a single text file can contain more than one person. A list of persons should be created. The key Name signals the start of a new person.

I did it storing everything in a dictionary, see code below.
f = open("test.txt")
lines = f.readlines()
dict_text = {"NAME":[], "DATEOFBIRTH":[], "BIO":[]}
for line_number, line in enumerate(lines):
    if not ("NAME" in line or "DATE OF BIRTH" in line or "BIO" in line):
        text = line.replace("\n","")
        dict_text[location].append(text)
    else:
        location = "".join((line.split()))

You could use a regular expression:
import re
keys = """
NAME
DATE OF BIRTH
BIO
HOBBIES
""".strip().splitlines()
key_pattern = '|'.join(f'{key.strip()}' for key in keys)
pattern = re.compile(fr'^({key_pattern})', re.M)
# uncomment to see the pattern
# print(pattern)
with open(filename) as f:
    text = f.read()
parts = pattern.split(text)
# ... process parts ...
parts will be a list of strings. The odd-indexed positions (parts[1], parts[3], ...) will be the keys ('NAME', etc.) and the even-indexed positions (parts[2], parts[4], ...) will be the text in between the keys. parts[0] will be whatever came before the first key.
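One way the "process parts" step might look, including the repeated-keys case from the question's edit, is sketched below. The function name parse_people, the whitespace clean-up, and the sample text are assumptions for illustration; the pattern here is also slightly stricter than the one above, since it only treats a key as a heading when it sits alone on its line.

```python
import re

KEYS = ["NAME", "DATE OF BIRTH", "BIO", "HOBBIES"]
# A key counts only when it is the whole line (an assumption about the files).
PATTERN = re.compile(
    r"^(%s)[ \t]*$" % "|".join(re.escape(k) for k in KEYS), re.M
)

def parse_people(text):
    """Split text on the key lines and group the values per person."""
    parts = PATTERN.split(text)
    people = []
    # parts[0] precedes the first key; after it, keys and values alternate.
    for key, value in zip(parts[1::2], parts[2::2]):
        if key == "NAME" or not people:
            people.append({})  # NAME signals the start of a new person
        people[-1][key] = " ".join(value.split())  # collapse line breaks
    return people

# Hypothetical file contents with two persons.
sample = (
    "NAME\nJohn Doe\nDATE OF BIRTH\n1992-02-16\nBIO\nTHIS is\na PRETTY\n"
    "HOBBIES\nchess\nNAME\nJane Roe\nDATE OF BIRTH\n1990-01-01\nBIO\nshort\n"
    "HOBBIES\nnone\n"
)
people = parse_people(sample)
print(people)
```

Each dictionary in the returned list holds one person's fields, keyed by the heading text.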

Instead of reading lines, you could read the whole file into one long string. Use string.index() to find the start index of each trigger word, then store everything from the end of one trigger word to the start of the next in a variable.
Something like:
def get_data(string, previous_end_index, current_start_index):
    usable_data = string[previous_end_index: current_start_index]
    return usable_data
string = f.read()  # str(f) would only give the file object's description, not its contents
important_words = ['NAME', 'DATE OF BIRTH']
last_phrase = None
for phrase in important_words:
    phrase_start = string.index(phrase)
    phrase_end = phrase_start + len(phrase)
    if last_phrase is not None:
        print(get_data(string, last_phrase, phrase_start))
    last_phrase = phrase_end
Better/shorter variable names should probably be used

You can just read the text in as one long string and then make use of .split().
This will only work if the categories are in order and don't repeat.
Like so:
Categories = ["NAME", "DATE OF BIRTH", "BIO"]  # in the order they appear in the text
Output = {}
Text = f.read().split(Categories[0], 1)[1]  # drop everything up to and including the first category
for i in range(1,len(Categories)):
    SplitText = Text.split(Categories[i], 1)
    Output.update({Categories[i-1] : SplitText[0] })
    Text = SplitText[1]
Output.update({Categories[-1] : Text})

You can try the following.
keys = ["NAME","DATE OF BIRTH","BIO","HOBBIES"]
f = open("data.txt", "r")
result = {}
for line in f:
    line = line.strip('\n')
    if any(v in line for v in keys):
        last_key = line
    else:
        result[last_key] = result.get(last_key, "") + line
print(result)
Output
{'NAME': 'John Doe', 'DATE OF BIRTH': '1992-02-16', 'BIO ': 'THIS is a PRETTY long sentence without ANY structure ', 'HOBBIES ': '//..etc..'}

How to split a single line input string having Name(1 or more words) and Number into ["Name" , "Number"] in Python?

I am a newbie. I failed one of the test cases in a phone book problem. As per the question, the user enters a single line of input that contains a name (one or more words) followed by a number. I have to split the input into ["name","number"] and store it in a dictionary. Note that the name will have one or more words (e.g. John Conor Jr. or Apollo Creed). I am confused by the splitting part. I tried the split() function and re.split(), but I am not sure I can solve this with them.
Sample input 1 : david james 93930000
Sample Input 2 : hshhs kskssk sshs 99383000
Output: num = {"david james" : "93930000", "hshhs kskssk sshs" : "99383000"}
I need to store it in a dictionary where the key:value is "david james": "93930000"
Please help. Thank you
=====>I found a solution<==========
if __name__ == '__main__':
    N=int(input())
    phonebook={}
    (*name,num) = input().split()
    name = ' '.join(name)
    phonebook.update({name:num})
    print(phonebook)
The asterisk method works. But for a large data set this might slow me down. Not sure.
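Another way to get the same split, sketched here as an alternative rather than as the required solution, is str.rsplit with maxsplit=1, which cuts the line once at its last run of whitespace. The helper name parse_entry is made up; the sample lines are the ones from the question.

```python
def parse_entry(line):
    """Split 'first [middle ...] number' into (name, number)."""
    # rsplit with maxsplit=1 splits once, starting from the right.
    name, number = line.strip().rsplit(maxsplit=1)
    return name, number

lines = ["david james 93930000", "hshhs kskssk sshs 99383000"]
phonebook = dict(parse_entry(line) for line in lines)
print(phonebook)
```

Because only one split is made per line, the name keeps its internal spaces no matter how many words it has.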

I'm assuming that the inputs you mention come from a user. If that is the case, you could change your code to something like this. You can change the range depending on how many inputs you want.
name = {}
for i in range(5):
    student_name = input("Enter student's name: ")
    student_mark = input("Enter student's mark: ")
    name[student_name.title()] = student_mark
print(name)
This should print the results in the way you mentioned!

Please check the updated answer to see if this is what you are looking for.
# Sample text in a line...
# With a name surname and number
txt = "Tamer Jar 9000"
# We define a dictionary variable
name_dictionary = {}
# We make a list in order to append the name and surname to it
name_with_surname = []
# We split the text and if we print it out it should look something like this
# "Tamer", "Jar", "9000"
# But you need the name and surname together so we do that below
x = txt.split()
# We take the first value of the split text which is "Tamer"
# And we take the second value of the split text, which is "Jar"
name = x[0]
surname = x[1]
# And here we append them to our list
name_with_surname.append(name + " " + surname)
#print(name_with_surname)
# Finally here to display the values in a dictionary format
# We take the value of the list which is "Tamer Jar" and the value of the number "9000"
name_dictionary[name_with_surname[0]] = x[2]
print(name_dictionary)

The above answers can't handle a line in which the name has many parts.
Try my code below.
You can just loop over however many inputs you want.
phonebook = {}
total_inputs = int(input())
for i in range(total_inputs):
    name_marks = input().split() # taking input and splitting it by spaces
    name = " ".join(name_marks[:-1]) # extracting the name
    marks = name_marks[-1] # extracting the marks
    phonebook[name] = marks # storing the marks in the dictionary
This way you can store the marks for each name. It works even when one input has many name parts.

How to check for an empty element within a list of elements in python

Say, for example, I have a list of lists that contain data like this:
customer1 = ['Dan','24','red']
customer2 = ['Bob',' ','Blue']
customerlist = [customer1, customer2]
I would like to run a line of code that will run a function if one of these elements is empty. For example something like this:
for c in customerlist:
    if not in c:
        ***RUN CODE***
    else:
        print('Customer Complete')
That way if a customer is missing data i can run some code.
Thanks for the help!

Instead of this:
if not in c:
You want this:
for val in c:
    if not val.strip():
Which basically checks if any of the strings is empty (empty strings are "falsey" in Python). Stripping first detects strings which only contain whitespace.
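Put together with the customer lists from the question, the check might look like this sketch. The helper name has_blank and the use of any() are choices made here for illustration, not part of the answer above.

```python
customer1 = ['Dan', '24', 'red']
customer2 = ['Bob', ' ', 'Blue']
customerlist = [customer1, customer2]

def has_blank(fields):
    """True if any field is empty or contains only whitespace."""
    return any(not field.strip() for field in fields)

# Collect the names of customers with missing data.
incomplete = [c[0] for c in customerlist if has_blank(c)]
print(incomplete)
```

For the data above, only Bob's record is reported as incomplete.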

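You can use in to check for ' '. Note that this only finds an element that is exactly one space; an empty string or a longer run of whitespace will not match.
for c in customerlist:
    if ' ' in c:
        RUN CODE
    else:
        print('Customer Complete')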

Both of the answers given by Guy and John are correct, but perhaps it would interest you to look into objects:
class Customer:
    def __init__(self, name, age = None, color = None):
        self.name = name
        self.age = age if age else age_function_generator()
        self.color = color if color else color_function_generator()
To create a customer, then, simply do:
c1 = Customer(name = "Dan", age = 24, color = "red")
c2 = Customer(name = "Bob", color = "Blue")
In the case of c2 the function age_function_generator() (not defined here) would be called. To access the attributes of the customer object one would do:
print(c1.name, c1.age, c1.color)

You can use a Python regular expression to search for blank entries in the list. A regular expression is a sequence of characters that defines a pattern. For more information on Python regular expressions, kindly visit:
w3school link and Google Developer link
Kindly replace the following code
for c in customerlist:
    if not in c:
with the following code:
emptylist = []
for i in range(len(customerlist)):
    for j in range(len(customerlist[i])):
        if re.fullmatch(r'\s*', customerlist[i][j]):
            emptylist.append(customerlist[i][j])
Don't forget to include 'import re' at the beginning of the code to import the Python re module.
The complete code:
import re
customer1 = ['Dan','24','red']
customer2 = ['Bob',' ','Blue', ' ']
customerlist = [customer1, customer2]
emptylist = []
for i in range(len(customerlist)):
    for j in range(len(customerlist[i])):
        if re.fullmatch(r'\s*', customerlist[i][j]):
            emptylist.append(customerlist[i][j])
if(len(emptylist) == 0):
    print('There are no blank entries')
else:
    print('There are blank entries')
    #code goes here to do something
The output:
There are blank entries
In the code:
if re.fullmatch(r'\s*', customerlist[i][j]):
re.fullmatch() succeeds only when the whole string consists of zero or more (*) whitespace characters (\s), so empty and whitespace-only entries are collected in emptylist. re.findall('\s*', ...) would not work for this test, because it always returns at least one (empty) match, even for an entry such as 'Dan'. customerlist[i][j] is used because customerlist is a list of lists.

Accessing Values from Text Dictionary

I am trying to create a "This is Your New Name Generator" program. I am doing this by asking the user for their first and last name. The program then takes the first letter of their first name and the last letter of their last name, and pulls from two text files to give them their new first and last name.
I've gotten as far as getting the user's first and last name, and pulling information from a file, however it always gives me the last line of the file.
I thought I could setup the files like dictionaries and then use the user's input as keys, but it doesn't seem to be working.
Any advice?
firstName = input("What is your first Name? ")
lastName = input("What is your last Name? ")
fN = firstName[0].lower()
lN_len = len(lastName) -1
lN = lastName[lN_len]
fNdict = {}
with open('firstName.txt') as f:
    for line in f:
        (fN, fNval) = line.split(",")
        fNdict[fN] = fNval
lNdict = {}
with open('lastName.txt') as fileobj:
    for line in fileobj:
        lNkey, lNvalue = line.split(",")
        lNdict[lN] = lNvalue
newFirstName = fNval
newLastName = lNvalue
print("Your zombie Name is: %s %s "%(newFirstName,newLastName))

When you run these lines:
newFirstName = fNval
newLastName = lNvalue
fNval and lNvalue have the last values they had in their respective loops. I think you mean to use the user's first and last names as keys to the dictionaries, e.g.
newFirstName = fNdict[fN]
newLastName = lNdict[lN]
Note that this will fail if fN and lN aren't in the dictionaries. You might want to create defaultdicts instead.
Note also that Python has an official style guide that most Python developers follow. Please consider reading it and writing your code accordingly. The code you've shared is very hard to read.
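A minimal sketch of the whole flow is shown below, assuming each file line has the form "letter,name". Two details of the question's loops also matter: the first loop reuses fN as its loop variable, which overwrites the user's letter, and the second stores every value under lN instead of lNkey. The helper names load_mapping and zombie_name, the "Unknown" fallback, and the sample lines are made up for illustration.

```python
def load_mapping(lines):
    """Build {letter: new_name} from lines shaped like 'letter,name'."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key, value = line.split(",", 1)  # separate loop names; nothing is overwritten
        mapping[key.strip().lower()] = value.strip()
    return mapping

# Hypothetical contents of firstName.txt and lastName.txt.
first_names = load_mapping(["a,Crafty", "b,Brainy"])
last_names = load_mapping(["n,Decapitator", "s,McBrains"])

def zombie_name(first, last, default="Unknown"):
    """Look up the first letter of first and the last letter of last."""
    return "{} {}".format(
        first_names.get(first[0].lower(), default),
        last_names.get(last[-1].lower(), default),
    )

print(zombie_name("Alice", "Jones"))
```

Using dict.get with a default avoids the KeyError mentioned above when a letter is missing from the files; with real files, an open file object can be passed to load_mapping in place of the list.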

You could follow a slightly different implementation to achieve the same result.
Create two Python dictionaries holding all the associations: letter to first name and letter to last name.
Write them to a file using json. This file will replace your firstName.txt and lastName.txt.
This needs to be done only once, to create the file with the names.
Then your name generator is a script which:
Loads those two dictionaries.
Asks the user for input to obtain the keys.
Retrieves the names from the dictionaries using the user input.
The first two steps (creating the dictionaries and writing them to a file) are implemented in this way:
import json
#these are just brief examples, provide complete dictionaries.
firstnames = {"A": "Crafty", "B": "Brainy"}
lastnames = {"A": "Decapitator", "B": "McBrains"}
with open("fullnames.txt", "w") as ff:
    json.dump(firstnames, ff)
    ff.write('\n')
    json.dump(lastnames, ff)
This would be a script to generate the file with the names.
The name generator would be:
import json
with open("fullnames.txt", "r") as ff:
    ll = ff.readlines()
firstnames = json.loads(ll[0].strip())
lastnames = json.loads(ll[1].strip())
inputfirst = input("What is your first Name? ")
inputlast = input("What is your last Name? ")
fn = inputfirst[0].upper()
ln = inputlast[-1].upper() # negative indexes count from the end of the sequence, so -1 is the last element
print("Your zombie Name is: {} {} ".format(firstnames[fn], lastnames[ln])) # using the string format method, better than the old %
