Extracting name, email and phone number and saving them into variables - python

I want to extract the name, email and phone number of all the conversations and then save them into different variables. I want to save it like this: a=max, b=email and so on.
This is my text file:
[11:23] max : Name : max
Email : max#gmail.com
Phone : 01716345678
[11:24] harvey : hello there how can i help you
[11:24] max : can you tell me about the latest feature
and this is my code. What am I missing here?
in_file = open("chat.txt", "rt")
contents = in_file.read()
#line: str
for line in in_file:
    if line.split('Name :'):
        a=line
        print(line)
    elif line.split('Email :'):
        b = line
    elif line.split('Phone :'):
        c = line
    else:
        d = line

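That's not what split does at all. line.split('Name :') returns a list, and that list is never empty, so the first if is true for every line. You might be confusing it with the in operator, which tests whether a substring occurs in the line.
In any case, a regular expression will do: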
import re
string = '''[11:23] max : Name : max
Email : max#gmail.com
Phone : 01716345678
[11:24] harvey : hello there how can i help you
[11:24] max : can you tell me about the latest feature'''
keys = ['Name', 'Email', 'Phone', 'Text']
result = re.search(r'.+Name : (\w+).+Email : ([\w#\.]+).+Phone : (\d+)(.+)', string, flags=re.DOTALL).groups()
{key: data for key, data in zip(keys, result)}
Output:
{'Name': 'max',
'Email': 'max#gmail.com',
'Phone': '01716345678',
'Text': '\n[11:24] harvey : hello there how can i help you\n[11:24] max : can you tell me about the latest feature'}
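To end up with separate variables, as the question asks, the captured groups can be unpacked directly. This is a minimal sketch that assumes the same chat layout as above; the group names `name`, `email` and `phone` and the pattern itself are illustrative choices, not part of the original answer.

```python
import re

chat = """[11:23] max : Name : max
Email : max#gmail.com
Phone : 01716345678
[11:24] harvey : hello there how can i help you"""

# Named groups make each captured field self-describing.
pattern = re.compile(
    r"Name : (?P<name>\w+).+?Email : (?P<email>[\w#.]+).+?Phone : (?P<phone>\d+)",
    flags=re.DOTALL,
)

match = pattern.search(chat)
if match is None:
    # search() returns None when the text does not contain all three fields
    a = b = c = None
else:
    a, b, c = match.group("name", "email", "phone")

print(a, b, c)
```

Checking for `None` before unpacking avoids an AttributeError on chats that lack one of the three fields.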

Remove this line from your code:
contents = in_file.read()
read() consumes the whole file, so the for loop that follows has nothing left to iterate over.
Also, use in instead of split:
with open("chat.txt", "rt") as in_file:
    for line in in_file:
        if 'Name' in line:
            a = line
            print(a)
        elif 'Email' in line:
            b = line
            print(b)
        elif 'Phone' in line:
            c = line
            print(c)
        else:
            d = line
            print(d)
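The loop above stores the whole line in each variable. If only the value after the label is wanted, one possible approach is to split on the label itself. The sketch below assumes the labels always appear as `Name : `, `Email : ` and `Phone : `; the helper name `parse_contact` is hypothetical.

```python
def parse_contact(lines):
    """Collect the Name, Email and Phone values from an iterable of chat lines."""
    found = {}
    for line in lines:
        for key in ("Name", "Email", "Phone"):
            marker = key + " : "
            if marker in line:
                # Keep only the text after the marker, e.g. "max" rather than the whole line
                found[key] = line.split(marker, 1)[1].strip()
    return found

sample = [
    "[11:23] max : Name : max\n",
    "Email : max#gmail.com\n",
    "Phone : 01716345678\n",
    "[11:24] harvey : hello there how can i help you\n",
]
contact = parse_contact(sample)
a = contact.get("Name")
b = contact.get("Email")
c = contact.get("Phone")
print(a, b, c)
```

An open file can be passed in place of `sample`, since iterating over a file yields its lines.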

Related

Program should read from a file and returns a dictionary but returning a type error

The dataset looks like this-
Action|10|Golden Tree (2012)
Drama|3|Titanic (1967)
So it is Genre|SerialNo|Movie
Required output is-
{ "Toy Story (1995)" : "Adventure", "Golden Tree (2012)" : "Action" }
Currently the only output is "Action". I tried to write some code to fix it, but it raises a type error. How do I fix this?
from collections import defaultdict
def read_genre_data(file):
    movie_genre_dict = {}
    ratings = defaultdict(list)
    for line in open(file):
        genre, num, movie = line.split('|')
        #movie[genre].append(movie)
    return genre
readGenre = read_genre_data("genreMovieSample.txt")
print(readGenre)
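You need to add each entry to the dictionary and then return the dictionary. At the moment you return only the value of genre from the last line of the file. Strip the trailing newline as well, otherwise it ends up in the movie title.
def read_genre_data(file):
    movie_genre_dict = {}
    with open(file) as f:
        for line in f:
            # drop the newline so it does not become part of the title
            genre, num, movie = line.rstrip('\n').split('|')
            movie_genre_dict[movie] = genre
    return movie_genre_dict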
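To check the behaviour without the original data file, the function can be run against a temporary file holding the two sample rows from the question. This is only a sketch: the temporary file stands in for genreMovieSample.txt, and skipping blank lines is an added assumption.

```python
import os
import tempfile

def read_genre_data(path):
    """Map each movie title to its genre from lines of Genre|SerialNo|Movie."""
    movie_genre = {}
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # ignore blank lines
            genre, num, movie = line.split("|")
            movie_genre[movie] = genre
    return movie_genre

# Write the two sample rows from the question to a temporary file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Action|10|Golden Tree (2012)\nDrama|3|Titanic (1967)\n")

result = read_genre_data(tmp.name)
os.remove(tmp.name)
print(result)
```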

How to split a text file into a nested array?

Working on a project creating a Python Flask website that stores user logins in a text file. I have a text file where each line is one user and each user has 7 parameters stored on the line. All user parameters are separated by a ; character.
Parameters are:
username
password
first name
last name
background color
title
avatar
Sample of the text file:
joebob;pass1;joe;bob;yellow;My title!!;https://upload.wikimedia.org/wikipedia/commons/c/cd/Stick_Figure.jpg
richlong;pass2;rich;long;blue;My title2!!;https://www.iconspng.com/images/stick-figure-walking/stick-figure-walking.jpg
How do I go about storing the parameters in a Python array, and how do I access them later when I need to reference log-ins?
Here is what I wrote so far:
accounts = { }
def readAccounts():
    file = open("assignment11-account-info.txt", "r")
    for accounts in file: #line
        tmp = accounts.split(';')
        for data in tmp: #data in line
            accounts[data[0]] = {
                'user': data[0],
                'pass': data[1],
                'first': data[2],
                'last': data[3],
                'color': data[4],
                'title': data[5],
                'avatar': data[6].rstrip()
            }
    file.close()
You can use Python's built-in csv module to parse the file:
import csv
with open("assignment11-account-info.txt", "r") as file:
    reader = csv.reader(file, delimiter=';')
    result = []
    for row in reader:
        fields = ('user', 'passwd', 'first', 'last', 'color','title','avatar')
        res = dict(zip(fields, row))
        result.append(res)
Or, equivalently, as a list comprehension, which is more idiomatic but harder for a beginner to read:
with open("assignment11-account-info.txt", "r") as file:
    reader = csv.reader(file, delimiter=';')
    fields = ('user', 'passwd', 'first', 'last', 'color','title','avatar')
    result = [dict(zip(fields, row)) for row in reader]
Here's what I might do (note that pass is a reserved word in Python, so the variable is called password):
accounts = {}
with open("assignment11-account-info.txt", "r") as file:
    for line in file:
        fields = line.rstrip().split(";")
        user = fields[0]
        password = fields[1]
        first = fields[2]
        last = fields[3]
        color = fields[4]
        title = fields[5]
        avatar = fields[6]
        accounts[user] = {
            "user" : user,
            "pass" : password,
            "first" : first,
            "last" : last,
            "color" : color,
            "title" : title,
            "avatar" : avatar
        }
By using with, the file handle file is closed for you automatically. This is the idiomatic ("Pythonic") way to work with files.
So long as user is unique, you won't overwrite any entries as you read through the file assignment11-account-info.txt.
If user can be repeated in the file assignment11-account-info.txt, store a list ([...]) of records under each user key instead of a single record. Otherwise a repeated user overwrites the earlier entry in accounts, which is rarely what you want.
If that is the case, I might do the following:
accounts = {}
with open("assignment11-account-info.txt", "r") as file:
    for line in file:
        fields = line.rstrip().split(";")
        user = fields[0]
        password = fields[1]  # "pass" is a reserved word and cannot be a variable name
        first = fields[2]
        last = fields[3]
        color = fields[4]
        title = fields[5]
        avatar = fields[6]
        if user not in accounts:
            accounts[user] = []
        accounts[user].append({
            "user" : user,
            "pass" : password,
            "first" : first,
            "last" : last,
            "color" : color,
            "title" : title,
            "avatar" : avatar
        })
In this way, you preserve any cases where user is duplicated.
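The question also asks how to access the stored values later when checking log-ins. A minimal sketch, assuming unique usernames and the seven-field layout from the sample file; `load_accounts` and `check_login` are hypothetical helper names, and the plain-text password comparison simply mirrors how the question stores its data.

```python
import io

FIELDS = ("user", "pass", "first", "last", "color", "title", "avatar")

def load_accounts(lines):
    """Build {username: record} from semicolon-separated lines."""
    accounts = {}
    for line in lines:
        values = line.rstrip("\n").split(";")
        if len(values) != len(FIELDS):
            continue  # skip blank or malformed lines
        record = dict(zip(FIELDS, values))
        accounts[record["user"]] = record
    return accounts

def check_login(accounts, username, password):
    """Return the matching record, or None if the user or password is wrong."""
    record = accounts.get(username)
    if record is not None and record["pass"] == password:
        return record
    return None

# StringIO stands in for the real account file
sample = io.StringIO(
    "joebob;pass1;joe;bob;yellow;My title!!;https://upload.wikimedia.org/wikipedia/commons/c/cd/Stick_Figure.jpg\n"
    "richlong;pass2;rich;long;blue;My title2!!;https://www.iconspng.com/images/stick-figure-walking/stick-figure-walking.jpg\n"
)
accounts = load_accounts(sample)
print(check_login(accounts, "joebob", "pass1")["color"])
print(check_login(accounts, "joebob", "wrong"))
```

Keying the dictionary by username makes the lookup a single `accounts.get(username)` call instead of a scan over every record.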

I need help on how to save dictionary elements into a csv file

I intend to save a contact list with name and phone number in a .csv file from user input through a dictionary.
The problem is that only the name is saved on the .csv file and the number is omitted.
contacts={}
def phone_book():
    running=True
    while running:
        command=input('A(dd D)elete L)ook up Q)uit: ')
        if command=='A' or command=='a':
            name=input('Enter new name: ')
            print('Enter new number for', name, end=':' )
            number=input()
            contacts[name]=number
        elif command=='D' or command=='d':
            name= input('Enter the name to delete: ')
            del contacts[name]
        elif command=='L' or command=='l':
            name= input('Enter name to search: ')
            if name in contacts:
                print(name, contacts[name])
            else:
                print("The name is not in the phone book, use A or a to save")
        elif command=='Q' or command=='q':
            running= False
        elif command =='list':
            for k,v in contacts.items():
                print(k,v)
        else:
            print(command, 'is not a valid command')
def contact_saver():
    import csv
    global name
    csv_columns=['Name', 'Phone number']
    r=[contacts]
    with open(r'C:\Users\Rigelsolutions\Documents\numbersaver.csv', 'w') as f:
        dict_writer=csv.writer(f)
        dict_writer.writerow(csv_columns)
        for data in r:
            dict_writer.writerow(data)
phone_book()
contact_saver()
As I read your code, contacts will look like
{
    'name1': '1',
    'name2': '2'
}
The keys are the names and the values are the numbers.
When you do r = [contacts] and then loop with for data in r, data is the whole dictionary. writerow iterates over whatever it is given, and iterating over a dictionary yields only its keys, so the row contains the names and no numbers. writerow expects a list such as [name, number].
You can do one of two things here. Either iterate over the items of contacts:
for k, v in contacts.items():
    dict_writer.writerow([k, v])
Or build the contacts as a list of dictionaries
[{
    'name': 'name1',
    'number': 1
}]
so that you can use a DictWriter:
fieldnames = ['name', 'number']
writer = csv.DictWriter(f, fieldnames=fieldnames)
...
# then you can insert by
for contact in contacts:
    writer.writerow(contact) # which should look like writer.writerow({'name': 'name1', 'number': 1})
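Put together, the first option might look like the sketch below. The function name `save_contacts` is hypothetical, and an in-memory buffer is used in place of the real file path so that the result can be inspected.

```python
import csv
import io

def save_contacts(contacts, out_file):
    """Write a {name: number} dictionary as two-column CSV rows."""
    writer = csv.writer(out_file)
    writer.writerow(["Name", "Phone number"])
    for name, number in contacts.items():
        writer.writerow([name, number])

contacts = {"name1": "1", "name2": "2"}

# StringIO stands in for a real file so the result can be inspected directly
buffer = io.StringIO(newline="")
save_contacts(contacts, buffer)
print(buffer.getvalue())
```

When writing to a real file, opening it with `open(path, "w", newline="")` is what the csv module documentation recommends; without it, extra blank lines can appear between rows on Windows.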

How to get information with python when data is heavily nested

I have a text file which contains some data to be mined.
The structure is shown below
name (personA {
    field1 : data1
    field2 : data2
    fieldN : dataN
    subfield() {
        fieldx1 : datax1
        fieldxN : dataxN
    }
}
name (personB {
    field1 : data11
    field2 : data12
    fieldN : data1N
}
In some people's records the subfield is absent, and the output should report the subfield as unknown in that case. Below is the code I use to extract the data:
import re
data = dict()
with open('data.txt', 'r') as fin:
    FLAG, FLAGP, FLAGS = False, False, False
    for line in fin:
        if FLAG:
            if re.search('field1', line):
                d1 = line.split()[2]
                data['field1'] = d1
            if re.search('fieldN', line):
                dN = line.split()[2]
                data['fieldN'] = dN
                data['fieldxn'] = 'unknown'
                FLAGP = True
        if FLAGS:
            if re.search('fieldxN', line):
                dsN = line.split()[2]
                data['fieldxn'] = dsN
        if re.search(r'name \(', line):
            pn = line.split()[1]
            FLAG = True
            data['name'] = pn
        if re.search('subfield', line):
            FLAGS = True
        if len(data) == 4:
            if FLAGP:
                print(data)
                FLAGP = False
                FLAG = False
                FLAGS = False
The output is shown below
{'field1': 'data1', 'fieldN': 'dataN', 'name': '(personA', 'fieldxn': 'unknown'}
{'field1': 'data11', 'fieldN': 'data1N', 'name': '(personB', 'fieldxn': 'unknown'}
The problem is that I don't know where to print data, so currently I am using the statement below, which is wrong:
if len(data) == 4:
    if FLAGP:
        print(data)
        FLAGP = False
        FLAG = False
        FLAGS = False
I would appreciate any suggestions on how to retrieve the data correctly.
I would take a different approach to parsing, storing the subfields (and other fields) in a dictionary.
with open('data.txt', 'rt') as fin:
    data = fin.read()
### Given a string containing lines of "fieldX : valueY"
### return a dictionary of values
def getFields(field_data):
    fields = {}
    if field_data is not None:
        field_lines = field_data.strip().split("\n")
        for pair in field_lines:
            if ":" not in pair:
                continue  # skip blank lines and stray braces
            name, value = pair.split(":", 1)
            fields[name.strip()] = value.strip()
    return fields
### Split the data by name
people_data = data.strip().split("name (")[1:]
### Loop through every person record
for person_data in people_data:
    name, person_data = person_data.split(" {", 1) # split the name and the fields
    # Split out the subfield data, if any
    subfield_data = None
    if "subfield()" in person_data:
        field_data, subfield_data = person_data.split("subfield() {", 1)
        subfield_data = subfield_data.split("}")[0]
    else:
        # No subfield block: the fields run up to the closing brace
        field_data = person_data.split("}")[0]
    # Separate the fields into single lines of pairs
    fields = getFields(field_data)
    # and any subfields
    subfields = getFields(subfield_data)
    print("Person: "+str(name))
    print("Fields: "+str(fields))
    print("Sub_Fields:"+str(subfields))
Which gives me:
Person: personA
Fields: {'field1': 'data1', 'field2': 'data2', 'fieldN': 'dataN'}
Sub_Fields:{'fieldx1': 'datax1', 'fieldxN': 'dataxN'}
Person: personB
Fields: {'field1': 'data11', 'field2': 'data12', 'fieldN': 'data1N'}
Sub_Fields:{}
So you could adjust your output based on whether subfields is empty or not. The idea is to get your input into more flexible structures rather than "brute-force" parsing as you have done. In the above I use split() a lot to give a more flexible way through, rather than relying on finding exact names. Obviously it depends on your design requirements too.
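Once each person has been parsed into a name, a fields dictionary and a subfields dictionary, producing the "unknown" placeholder the question asks for is a matter of `dict.get` with a default. The helper `summarize` below is a hypothetical illustration, and the choice of which keys to report is an assumption based on the question's expected output.

```python
def summarize(name, fields, subfields):
    """Flatten one parsed person into a single record.

    'fieldxN' falls back to 'unknown' when the person has no subfield block.
    """
    return {
        "name": name,
        "field1": fields.get("field1", "unknown"),
        "fieldN": fields.get("fieldN", "unknown"),
        "fieldxN": subfields.get("fieldxN", "unknown"),
    }

# Parsed values for the two people in the sample data
person_a = summarize(
    "personA",
    {"field1": "data1", "field2": "data2", "fieldN": "dataN"},
    {"fieldx1": "datax1", "fieldxN": "dataxN"},
)
person_b = summarize(
    "personB",
    {"field1": "data11", "field2": "data12", "fieldN": "data1N"},
    {},
)
print(person_a)
print(person_b)
```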

Converting a text file into csv file using python

I need to convert my text files to CSV, and I am using Python to do it. My text file looks like this:
Employee Name : XXXXX
Employee Number : 12345
Age : 45
Hobbies: Tennis
Employee Name: xxx
Employee Number :123456
Hobbies : Football
I want my CSV file to have the columns Employee Name, Employee Number, Age and Hobbies, and when a particular value is not present, that cell should contain NA. Is there a simple way to do this? Thanks in advance.
You can do something like this:
records = """Employee Name : XXXXX
Employee Number : 12345
Age : 45
Hobbies: Tennis
Employee Name: xxx
Employee Number :123456
Hobbies : Football"""
for record in records.split('Employee Name'):
    if not record.strip():
        continue  # skip the empty chunk before the first record
    fields = record.split('\n')
    name = 'NA'
    number = 'NA'
    age = 'NA'
    hobbies = 'NA'
    for field in fields:
        if ':' not in field:
            continue  # skip blank lines
        # strip whitespace so that "Age " and " 45" compare cleanly
        field_name, field_value = [part.strip() for part in field.split(':', 1)]
        if field_name == "": # This is employee name, since we split on it
            name = field_value
        if field_name == "Employee Number":
            number = field_value
        if field_name == "Age":
            age = field_value
        if field_name == "Hobbies":
            hobbies = field_value
    print(name, number, age, hobbies)  # one row per employee
Of course, this method assumes that there is (at least) an Employee Name field in every record.
Maybe this helps you get started. It only produces the output for the first employee's four lines; you would still need to wrap it in a loop over the whole file. There is very likely a more elegant solution, but this is how you would do it without a single import statement ;)
with open('test.txt', 'r') as f:
    content = f.readlines()
# take the value after the first ':' on each of the first four lines
output_line = ";".join([line.split(':', 1)[1].strip() for line in content[0:4]])
print(output_line)
I followed very simple steps for this. They may not be optimal, but they solve the problem. The important case I can see here is that the same key ("Employee Name" etc.) can occur several times in a single file.
Steps
Read the txt file into a list of lines.
Convert the list to a dict that maps each key to the list of its values (this logic can be refined further).
Use pandas to turn the dict into a table.
Below is the code:
import pandas
txt_file = r"test.txt"
with open(txt_file, "r") as txt:
    txt_lines = txt.read().split("\n")
txt_dict = {}
for txt_line in txt_lines:
    if ":" not in txt_line:
        continue  # skip blank lines
    k, v = txt_line.split(":", 1)
    k = k.strip()
    v = v.strip()
    # collect every value seen for this key
    txt_dict.setdefault(k, []).append(v)
print(pandas.DataFrame.from_dict(txt_dict, orient="index"))
Output (on Python 3.7+ the rows follow the order in which the keys first appear):
                      0         1
Employee Name     XXXXX       xxx
Employee Number   12345    123456
Age                  45      None
Hobbies          Tennis  Football
I hope this helps.
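For the CSV layout the question describes (one row per employee, NA for missing values), the standard library's `csv.DictWriter` has a `restval` parameter that fills in absent columns. The sketch below assumes every record starts with an "Employee Name" line; `parse_records` is a hypothetical helper, and an in-memory buffer stands in for the output file.

```python
import csv
import io

COLUMNS = ["Employee Name", "Employee Number", "Age", "Hobbies"]

def parse_records(lines):
    """Group 'key : value' lines into one dict per employee."""
    records = []
    for line in lines:
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Employee Name" or not records:
            records.append({})  # a new employee starts here
        records[-1][key] = value
    return records

text = """Employee Name : XXXXX
Employee Number : 12345
Age : 45
Hobbies: Tennis
Employee Name: xxx
Employee Number :123456
Hobbies : Football"""

out = io.StringIO(newline="")
# restval fills in any column that a record does not have;
# extrasaction="ignore" drops keys that are not listed in COLUMNS
writer = csv.DictWriter(out, fieldnames=COLUMNS, restval="NA", extrasaction="ignore")
writer.writeheader()
writer.writerows(parse_records(text.splitlines()))
print(out.getvalue())
```

To write an actual file, replace the buffer with `open("output.csv", "w", newline="")`, where the file name is a placeholder.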
