Parsing data from a text file

I have built a contact form that sends me an email for every user registration. My question is about parsing some text data into CSV format. I have received information for multiple users in my mailbox, which I copied into a text file. The data looks like this:
Name: testuser2
Email: testuser2#gmail.com
Cluster Name: o b
Contact No.: 12346971239
Coming: Yes

Name: testuser3
Email: testuser3#gmail.com
Cluster Name: Mediternea
Contact No.: 9121319107
Coming: Yes

Name: testuser4
Email: tuser4#yahoo.com
Cluster Name: Mediterranea
Contact No.: 7892174896
Coming: Yes

Name: tuser5
Email: tuserner5#gmail.com
Cluster Name: River Retreat A
Contact No.: 7583450912
Coming: Yes
Members Participating: 2

Name: Test User
Email: testuser#yahoo.co.in
Cluster Name: RD
Contact No.: 09833123445
Coming: Yes
Members Participating: 2
As you can see, the records share some common fields, but some fields are missing from some records. I am looking for a solution or suggestion on how to parse this data so that, under the heading "Name", I collect the name information in that column, and similarly for the other fields. For "Members Participating" I can just pick the numbers and add them to the Excel sheet under the same heading; if this information is not present for a user, the cell can be blank.

Let's decompose the problem into smaller subproblems:
Split the large block of text into separate registrations
Convert each of those registrations to a dictionary
Write the list of dictionaries to CSV
First, let's break the blocks of registration data into different elements:
DATA = '''
Name: testuser2
Email: testuser2#gmail.com
Cluster Name: o b
Contact No.: 12346971239
Coming: Yes

Name: testuser3
Email: testuser3#gmail.com
Cluster Name: Mediternea
Contact No.: 9121319107
Coming: Yes
'''

def parse_registrations(data):
    data = data.strip()
    # Registrations are separated by a blank line
    return data.split('\n\n')
This function gives us a list with one string per registration:
>>> regs = parse_registrations(DATA)
>>> regs
['Name: testuser2\nEmail: testuser2#gmail.com\nCluster Name: o b\nContact No.: 12346971239\nComing: Yes', 'Name: testuser3\nEmail: testuser3#gmail.com\nCluster Name: Mediternea\nContact No.: 9121319107\nComing: Yes']
>>> regs[0]
'Name: testuser2\nEmail: testuser2#gmail.com\nCluster Name: o b\nContact No.: 12346971239\nComing: Yes'
>>> regs[1]
'Name: testuser3\nEmail: testuser3#gmail.com\nCluster Name: Mediternea\nContact No.: 9121319107\nComing: Yes'
Next, we can convert those substrings to a list of (key, value) pairs:
>>> [field.split(': ', 1) for field in regs[0].split('\n')]
[['Name', 'testuser2'], ['Email', 'testuser2#gmail.com'], ['Cluster Name', 'o b'], ['Contact No.', '12346971239'], ['Coming', 'Yes']]
The dict() function can convert a list of (key, value) pairs into a dictionary:
>>> dict(field.split(': ', 1) for field in regs[0].split('\n'))
{'Coming': 'Yes', 'Cluster Name': 'o b', 'Name': 'testuser2', 'Contact No.': '12346971239', 'Email': 'testuser2#gmail.com'}
We can pass these dictionaries to a csv.DictWriter to write the records as CSV. Any column missing from a dictionary is filled with the writer's restval value, which defaults to an empty string.
>>> import csv
>>> w = csv.DictWriter(open("/tmp/foo.csv", "w", newline=""), fieldnames=["Name", "Email", "Cluster Name", "Contact No.", "Coming", "Members Participating"])
>>> w.writeheader()
>>> w.writerow({'Name': 'Steve'})
12
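If you would rather see an explicit marker such as NA in Excel instead of an empty cell, you can pass restval yourself. A minimal sketch, writing to an in-memory io.StringIO buffer instead of a real file purely for illustration:

```python
import csv
import io

COLUMNS = ["Name", "Email", "Cluster Name", "Contact No.", "Coming", "Members Participating"]

buf = io.StringIO()
# restval is written for every column that is missing from a row dict
writer = csv.DictWriter(buf, fieldnames=COLUMNS, restval="NA")
writer.writeheader()
writer.writerow({"Name": "Steve"})
print(buf.getvalue())
```

The opposite case, a dictionary key that is not in fieldnames, raises ValueError by default; passing extrasaction='ignore' drops such keys instead.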
Now, let's combine these all together!
import csv

DATA = '''
Name: testuser2
Email: testuser2#gmail.com
Cluster Name: o b
Contact No.: 12346971239
Coming: Yes

Name: tuser5
Email: tuserner5#gmail.com
Cluster Name: River Retreat A
Contact No.: 7583450912
Coming: Yes
Members Participating: 2
'''

COLUMNS = ["Name", "Email", "Cluster Name", "Contact No.", "Coming", "Members Participating"]

def parse_registration(reg):
    # Turn the "Key: value" lines of one registration into a dict
    return dict(field.split(': ', 1) for field in reg.split('\n'))

def parse_registrations(data):
    data = data.strip()
    regs = data.split('\n\n')
    return [parse_registration(r) for r in regs]

def write_csv(data, filename):
    regs = parse_registrations(data)
    # newline='' stops the csv module from producing blank lines on Windows
    with open(filename, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(regs)

if __name__ == '__main__':
    write_csv(DATA, "/tmp/test.csv")
Output:
$ python3 write_csv.py
$ cat /tmp/test.csv
Name,Email,Cluster Name,Contact No.,Coming,Members Participating
testuser2,testuser2#gmail.com,o b,12346971239,Yes,
tuser5,tuserner5#gmail.com,River Retreat A,7583450912,Yes,2
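Splitting on '\n\n' assumes the separator lines are truly empty. Text copied out of a mail client may contain Windows line endings or stray spaces on the "blank" lines, so a slightly more forgiving variant might help. This is a sketch that swaps str.split for re.split and tolerates lines without a colon; the sample string is made up for illustration:

```python
import re

def parse_registrations(data):
    # Normalise Windows line endings, then split on blank lines that
    # may contain spaces or tabs
    data = data.replace('\r\n', '\n').strip()
    blocks = re.split(r'\n\s*\n', data)
    regs = []
    for block in blocks:
        reg = {}
        for line in block.split('\n'):
            if ':' in line:  # ignore lines that are not "Key: value"
                key, value = line.split(':', 1)
                reg[key.strip()] = value.strip()
        regs.append(reg)
    return regs

sample = "Name: a\r\nComing: Yes\r\n   \r\nName: b\r\nMembers Participating: 2\r\n"
print(parse_registrations(sample))
```

The returned list of dictionaries can be handed to write_csv's DictWriter exactly as before.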

The program below might meet your requirement. The general strategy:
First, read in all of the email files, parsing the data "by hand".
Second, write the data to a CSV file using csv.DictWriter.writerows().
import sys
import csv

# Usage:
# python cfg2csv.py input1.cfg input2.cfg ...
# The data is combined and written to 'output.csv'

def parse_file(data):
    """Turn an iterable of lines into a list of dicts, one per blank-line-separated record."""
    total_result = []
    single_result = []
    for line in data:
        line = line.strip()
        if line:
            single_result.append([item.strip() for item in line.split(':', 1)])
        else:
            if single_result:
                total_result.append(dict(single_result))
            single_result = []
    # The last record may not be followed by a blank line
    if single_result:
        total_result.append(dict(single_result))
    return total_result

def read_file(filename):
    with open(filename) as fp:
        return parse_file(fp)

# First parse the data:
data = sum((read_file(filename) for filename in sys.argv[1:]), [])
keys = set().union(*data)

# Next write the data to a CSV file
with open('output.csv', 'w', newline='') as fp:
    writer = csv.DictWriter(fp, sorted(keys))
    writer.writeheader()
    writer.writerows(data)
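sorted(keys) puts the columns in alphabetical order. If you would rather have a fixed preferred order, with any unexpected fields appended at the end, one possible approach is sketched below; PREFERRED and column_order are assumed names you would adapt, not part of the answer above:

```python
PREFERRED = ["Name", "Email", "Cluster Name", "Contact No.", "Coming", "Members Participating"]

def column_order(records, preferred=PREFERRED):
    """Return the preferred columns that occur in records, then any extras, sorted."""
    seen = set().union(*records)  # union of all dict keys
    known = [col for col in preferred if col in seen]
    extras = sorted(seen - set(preferred))
    return known + extras

records = [{"Name": "a", "Coming": "Yes"}, {"Name": "b", "Referrer": "x"}]
print(column_order(records))
```

You would then pass column_order(data) to csv.DictWriter in place of sorted(keys).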

You can use the empty line between records to mark the end of each record. Then process the input file line by line and construct a list of dictionaries. Finally, write the dictionaries out to a CSV file.
from csv import DictWriter
from collections import OrderedDict

with open('input') as infile:
    registrations = []
    fields = OrderedDict()  # remembers the order in which keys are first seen
    d = {}
    for line in infile:
        line = line.strip()
        if line:
            key, value = [s.strip() for s in line.split(':', 1)]
            d[key] = value
            fields[key] = None
        else:
            if d:
                registrations.append(d)
                d = {}
    else:
        if d:    # handle EOF
            registrations.append(d)

fieldnames = list(fields)
# fieldnames = ['Name', 'Email', 'Cluster Name', 'Contact No.', 'Coming', 'Members Participating']

with open('registrations.csv', 'w', newline='') as outfile:
    writer = DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(registrations)
This code collects the field names automatically and orders them as each unique key is first seen in the input. If you require a specific field order in the output, you can pin it down by uncommenting the appropriate line.
Running this code on your sample input produces this:
Name,Email,Cluster Name,Contact No.,Coming,Members Participating
testuser2,testuser2#gmail.com,o b,12346971239,Yes,
testuser3,testuser3#gmail.com,Mediternea,9121319107,Yes,
testuser4,tuser4#yahoo.com,Mediterranea,7892174896,Yes,
tuser5,tuserner5#gmail.com,River Retreat A,7583450912,Yes,2
Test User,testuser#yahoo.co.in,RD,09833123445,Yes,2
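Since Python 3.7, plain dicts preserve insertion order, so the OrderedDict is no longer strictly needed. A compact variant of the same idea, sketched as a function over a list of lines rather than a file (parse_lines is a hypothetical name):

```python
def parse_lines(lines):
    registrations, current = [], {}
    fields = {}  # used as an ordered set: keys in first-seen order
    for line in lines:
        line = line.strip()
        if line:
            key, value = (s.strip() for s in line.split(':', 1))
            current[key] = value
            fields.setdefault(key)
        elif current:
            registrations.append(current)
            current = {}
    if current:  # the last record has no trailing blank line
        registrations.append(current)
    return list(fields), registrations

fieldnames, regs = parse_lines(["Name: a", "Coming: Yes", "", "Name: b", "Members Participating: 2"])
print(fieldnames)
```

Because an open file is itself an iterable of lines, the same function can be called as parse_lines(infile).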

The following will convert your input text file automatically to a CSV file. The headings are automatically generated based on the longest entry.
import csv, re

with open("input.txt", "r") as f_input, open("output.csv", "w", newline="") as f_output:
    csv_output = csv.writer(f_output)
    # Each entry starts with "Name:" and runs to the next blank line or the end of the file
    entries = re.findall(r"^(Name: .*?)(?:\n\n|\Z)", f_input.read(), re.M | re.S)

    # Determine the entry with the most fields for the CSV headers
    headings = []
    for entry in entries:
        headings = max(headings, [line.split(":")[0] for line in entry.split("\n")], key=len)
    csv_output.writerow(headings)

    # Write the entries
    for entry in entries:
        csv_output.writerow([line.split(":", 1)[1].strip() for line in entry.split("\n")])
This produces a CSV text file that can be opened in Excel as follows:
Name,Email,Cluster Name,Contact No.,Coming,Members Participating
testuser2,testuser2#gmail.com,o b,12346971239,Yes
testuser3,testuser3#gmail.com,Mediternea,9121319107,Yes
testuser4,tuser4#yahoo.com,Mediterranea,7892174896,Yes
tuser5,tuserner5#gmail.com,River Retreat A,7583450912,Yes,2
Test User,testuser#yahoo.co.in,RD,09833123445,Yes,2
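One caveat: each row is written positionally, so it only lines up with the headings while every entry lists its fields in the same order and any missing field comes last (as Members Participating does here). If a field in the middle could be missing, a sketch that keeps the regex idea but builds a dictionary per entry and lets csv.DictWriter align the columns could look like this (the sample text is made up, and an in-memory buffer replaces output.csv):

```python
import csv
import io
import re

text = "Name: a\nEmail: a#x.com\nComing: Yes\n\nName: b\nComing: No\nMembers Participating: 2"

entries = re.findall(r"^(Name: .*?)(?:\n\n|\Z)", text, re.M | re.S)
# Turn each "Key: value" line of an entry into a dict item
rows = [dict(re.findall(r"^([^:\n]+):\s*(.*)$", entry, re.M)) for entry in entries]

# Headings in first-seen order across all entries
headings = []
for row in rows:
    for key in row:
        if key not in headings:
            headings.append(key)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=headings)
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Here the second entry has no Email, yet its Coming value still lands in the Coming column.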

Related

Getting KeyError when trying to assign data from a dictionary into class and object in python

The data file used:
Austin = null|Stone Cold Austin|996003892|987045321|Ireland
keller = null|Mathew Keller|02/05/2002|0199999999|0203140819|019607892|9801 2828 5596 0889
The nested dictionary:
data = {'Austin': {'Full Name': 'Stone Cold Steve Austin', 'Contact Details': '996003892', 'Emergency Contact Number': '987045321', 'Country': 'Ireland'}}
The class and object that I want to use to hold the dict data:
class member2:
    def __init__(self, realname, phone, emergencyContact, country):
        self.realname = realname
        self.phone = phone
        self.emergencyContact = emergencyContact
        self.country = country
Assigning the text file data to a nested dictionary:
with open("something.txt", 'r') as f:
    for line in f:
        key, values = line.strip().split(" = ")  # note the space around =, to avoid trailing space in key
        values = values.split('|')
        data2 = {key: dict(zip(keys, values[1:]))}
        # To assign data to the class (NOT WORKING)
        member2.realname = data2[values[2]]
        print(member2)
        if key == username:
            data2 = {key: dict(zip(keys, values[1:]))}
Output
member2.realname = data2[values[2]]
KeyError: 'Stone Cold Steve Austin'

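You are referring to a non-existent key, 'Stone Cold Steve Austin'. The only top-level key in data2 is key (the name before " = ", e.g. 'Austin'); the full name is a value inside the nested dictionary.
Maybe you want to access something like data2[key][keys[0]]: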
keys = ["Full Name", "Contact Details", "Emergency Contact Number", "Country"]
with open("we.txt", 'r') as f:
    for line in f:
        key, values = line.strip().split(" = ")  # note the space around =, to avoid trailing space in key
        values = values.split('|')
        data2 = {key: dict(zip(keys, values[1:]))}
        print(data2[key][keys[0]])
Output:
Stone Cold Austin
Mathew Keller
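To get the data into the class itself, create an instance by calling member2(...) with the values, rather than assigning to member2.realname on the class. A sketch, assuming the same keys list as above and reading from a list of strings instead of we.txt (members is a hypothetical name):

```python
class member2:
    def __init__(self, realname, phone, emergencyContact, country):
        self.realname = realname
        self.phone = phone
        self.emergencyContact = emergencyContact
        self.country = country

keys = ["Full Name", "Contact Details", "Emergency Contact Number", "Country"]
lines = ["Austin = null|Stone Cold Austin|996003892|987045321|Ireland"]

members = {}
for line in lines:
    key, values = line.strip().split(" = ")
    record = dict(zip(keys, values.split('|')[1:]))
    # Build one member2 object per line, keyed by the name before " = "
    members[key] = member2(record["Full Name"], record["Contact Details"],
                           record["Emergency Contact Number"], record["Country"])

print(members["Austin"].realname)
```

Note that the keller line has more fields than there are keys, so zip silently drops the extras and pairs "Contact Details" with the date; the field layout of that line needs checking before it is mapped this way.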

How to split a text file into a nested array?

I'm working on a project: a Python Flask website that stores user logins in a text file. Each line of the file is one user, and each user has 7 parameters stored on that line, separated by a ; character.
Parameters are:
username
password
first name
last name
background color
title
avatar
Sample of the text file:
joebob;pass1;joe;bob;yellow;My title!!;https://upload.wikimedia.org/wikipedia/commons/c/cd/Stick_Figure.jpg
richlong;pass2;rich;long;blue;My title2!!;https://www.iconspng.com/images/stick-figure-walking/stick-figure-walking.jpg
How do I go about storing the parameters in a Python array, and how do I access them later when I need to look up logins?
Here is what I wrote so far:
accounts = { }
def readAccounts():
    file = open("assignment11-account-info.txt", "r")
    for accounts in file: #line
        tmp = accounts.split(';')
        for data in tmp: #data in line
            accounts[data[0]] = {
                'user': data[0],
                'pass': data[1],
                'first': data[2],
                'last': data[3],
                'color': data[4],
                'title': data[5],
                'avatar': data[6].rstrip()
            }
    file.close()

You can use Python's built-in csv module to parse the file:
import csv

with open("assignment11-account-info.txt", "r", newline="") as file:
    reader = csv.reader(file, delimiter=';')
    fields = ('user', 'passwd', 'first', 'last', 'color', 'title', 'avatar')
    result = []
    for row in reader:
        res = dict(zip(fields, row))
        result.append(res)
Or, equivalently, with a more Pythonic list comprehension (though it may be harder to read for a beginner):
with open("assignment11-account-info.txt", "r", newline="") as file:
    reader = csv.reader(file, delimiter=';')
    fields = ('user', 'passwd', 'first', 'last', 'color', 'title', 'avatar')
    result = [dict(zip(fields, row)) for row in reader]
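To look up logins later, it is usually more convenient to key the records by username. A sketch, assuming the same seven fields, with io.StringIO standing in for the real file and check_login as a hypothetical helper:

```python
import csv
import io

FIELDS = ('user', 'passwd', 'first', 'last', 'color', 'title', 'avatar')

sample = io.StringIO(
    "joebob;pass1;joe;bob;yellow;My title!!;https://upload.wikimedia.org/wikipedia/commons/c/cd/Stick_Figure.jpg\n"
    "richlong;pass2;rich;long;blue;My title2!!;https://www.iconspng.com/images/stick-figure-walking/stick-figure-walking.jpg\n"
)

# Map each username to the dict of that user's fields
accounts = {row[0]: dict(zip(FIELDS, row)) for row in csv.reader(sample, delimiter=';')}

def check_login(username, password):
    """Hypothetical helper: True if the user exists and the password matches."""
    account = accounts.get(username)
    return account is not None and account['passwd'] == password

print(accounts['richlong']['color'])
print(check_login('joebob', 'pass1'))
```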

Here's what I might do:
accounts = {}
with open("assignment11-account-info.txt", "r") as file:
    for line in file:
        fields = line.rstrip().split(";")
        user = fields[0]
        password = fields[1]  # "pass" is a reserved keyword, so it cannot be a variable name
        first = fields[2]
        last = fields[3]
        color = fields[4]
        title = fields[5]
        avatar = fields[6]
        accounts[user] = {
            "user" : user,
            "pass" : password,
            "first" : first,
            "last" : last,
            "color" : color,
            "title" : title,
            "avatar" : avatar
        }
By using with, the file handle file is closed automatically, even if an error occurs. This is the most "Pythonic" way of doing things.
So long as user is unique, you won't overwrite any entries you put in as you read through the file assignment11-account-info.txt.
If you need to handle the case where user is repeated in the file assignment11-account-info.txt, then you need to store a list ([...]) of entries for each user rather than a single dictionary ({...}). This is because reusing the value of user as a key overwrites the previous entry for that user in accounts. Silently overwriting existing entries is almost always a bad thing when using dictionaries!
If that is the case, I might do the following:
accounts = {}
with open("assignment11-account-info.txt", "r") as file:
    for line in file:
        fields = line.rstrip().split(";")
        user = fields[0]
        password = fields[1]  # "pass" is a reserved keyword, so it cannot be a variable name
        first = fields[2]
        last = fields[3]
        color = fields[4]
        title = fields[5]
        avatar = fields[6]
        if user not in accounts:
            accounts[user] = []
        accounts[user].append({
            "user" : user,
            "pass" : password,
            "first" : first,
            "last" : last,
            "color" : color,
            "title" : title,
            "avatar" : avatar
        })
In this way, you preserve any cases where user is duplicated.
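The `if user not in accounts` check can also be handled by collections.defaultdict, which creates the empty list the first time a user is seen. A sketch with two illustrative lines for the same user:

```python
from collections import defaultdict

FIELDS = ("user", "pass", "first", "last", "color", "title", "avatar")
lines = [
    "joebob;pass1;joe;bob;yellow;My title!!;a.jpg\n",
    "joebob;pass9;joe;bob;green;Second title;b.jpg\n",
]

accounts = defaultdict(list)  # a missing key starts out as an empty list
for line in lines:
    fields = line.rstrip().split(";")
    accounts[fields[0]].append(dict(zip(FIELDS, fields)))

print(len(accounts["joebob"]))
```

Using the string "pass" as a dictionary key is fine; only the bare name pass is reserved.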

I need help on how to save dictionary elements into a csv file

I intend to save a contact list with name and phone number in a .csv file from user input through a dictionary.
The problem is that only the names are saved to the .csv file and the numbers are omitted.
contacts={}

def phone_book():
    running=True
    while running:
        command=input('A(dd D)elete L)ook up Q)uit: ')
        if command=='A' or command=='a':
            name=input('Enter new name: ')
            print('Enter new number for', name, end=':' )
            number=input()
            contacts[name]=number
        elif command=='D' or command=='d':
            name= input('Enter the name to delete: ')
            del contacts[name]
        elif command=='L' or command=='l':
            name= input('Enter name to search: ')
            if name in contacts:
                print(name, contacts[name])
            else:
                print("The name is not in the phone book, use A or a to save")
        elif command=='Q' or command=='q':
            running= False
        elif command =='list':
            for k,v in contacts.items():
                print(k,v)
        else:
            print(command, 'is not a valid command')

def contact_saver():
    import csv
    global name
    csv_columns=['Name', 'Phone number']
    r=[contacts]
    with open(r'C:\Users\Rigelsolutions\Documents\numbersaver.csv', 'w') as f:
        dict_writer=csv.writer(f)
        dict_writer.writerow(csv_columns)
        for data in r:
            dict_writer.writerow(data)

phone_book()
contact_saver()

As I read your code, contacts will look like this:
{
'name1': '1',
'name2': '2'
}
The keys are the names and the values are the numbers.
But when you set r = [contacts] and iterate with for data in r, data is the whole dictionary. Passing a dictionary to writerow writes its keys (iterating over a dict yields its keys), which is why only the names end up in the file. writerow expects a sequence such as [name, number].
You can do one of two things here. Either iterate over the contacts' items:
for k, v in contacts.items():
    dict_writer.writerow([k, v])
Or store the contacts as a list of dictionaries:
[{
    'name': 'name1',
    'number': 1
}]
so that you can use a csv.DictWriter:
fieldnames = ['name', 'number']
writer = csv.DictWriter(f, fieldnames=fieldnames)
...
# then you can insert rows with
for contact in contacts:
    writer.writerow(contact)  # which should look like writer.writerow({'name': 'name1', 'number': 1})
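Putting the first option into the question's saver, a minimal sketch might look like this. It takes the dictionary as a parameter instead of relying on a global, splits the writing into a hypothetical write_contacts helper so it can be tried against an in-memory buffer, and passes newline='' to open() as the csv module documentation recommends for files used with csv writers:

```python
import csv
import io

def write_contacts(contacts, f):
    """Write a {name: number} dict to the file object f as a two-column CSV."""
    writer = csv.writer(f)
    writer.writerow(['Name', 'Phone number'])
    for name, number in contacts.items():
        writer.writerow([name, number])

def contact_saver(contacts, path):
    # newline='' avoids blank lines between rows on Windows
    with open(path, 'w', newline='') as f:
        write_contacts(contacts, f)

buf = io.StringIO()
write_contacts({'alice': '555-0100', 'bob': '555-0199'}, buf)
print(buf.getvalue())
```

In the program itself you would call contact_saver(contacts, r'C:\Users\Rigelsolutions\Documents\numbersaver.csv') after phone_book() returns.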

How do I create a loop such that I get all the queries into one csv in through python?

I have created a function that fetches the price, rating, etc. after calling an API:
def is_priced(business_id):
    try:
        priced_ind = get_business(API_KEY, business_id)
        priced_ind1 = priced_ind['price']
    except:
        priced_ind1 = 'None'
    return priced_ind1

priced_ind = is_priced(b_id)
print(priced_ind)
Similarly for rating:
def is_rated(business_id):
    try:
        rated_ind = get_business(API_KEY, business_id)
        rated_ind1 = rated_ind['rating']
    except:
        rated_ind1 = 'None'
    return rated_ind1
However, I want my function to loop through the business names in my CSV file, collect all this data, and export it to a new CSV file with these two values next to each business name.
The CSV file has the name of each business along with its address, city, state, zip, and country.
For example:
Name address city state zip country
XYZ(The) 5* WE 223899th St. New York NY 19921 US
My output:
Querying https://api.xyz.com/v3/businesses/matches ...
True
Querying https://api.xyz.com/v3/businesses/matches ...
4.0
Querying https://api.xyz.com/v3/businesses/matches ...
$$
Querying https://api.xyz.com/v3/businesses/matches ...
Querying https://api.xyz.com/v3/businesses/matches ...
The real issue is that my output CSV contains only the business ID, and the rating etc., as you can see, is only printed to the console. How do I set up a loop that writes the information I want for all the businesses into a single CSV?

The csv module is useful for this sort of thing, e.g.:
import csv

with open('f.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='"')
    with open('tmp.csv', 'w', newline='') as output:
        writer = csv.writer(output)
        for row in reader:
            business_id = row[0]  # assumes the first column holds the business ID
            row.append(is_priced(business_id))
            row.append(is_rated(business_id))
            writer.writerow(row)

You can read the business names from the CSV file, iterate over them using a for loop, hit the API and store the results, and write to a new CSV file.
import csv

data = []
with open('businesses.csv', newline='') as fp:
    reader = csv.reader(fp)
    header = next(reader)  # skip header line
    for row in reader:
        b_name = row[0]
        # not sure how you get the business ID:
        b_id = get_business_id(b_name)
        p = is_priced(b_id)
        r = is_rated(b_id)
        data.append((b_name, p, r))

# write out the results
with open('business_data.csv', 'w', newline='') as fp:
    writer = csv.writer(fp)
    writer.writerow(['name', 'price', 'rating'])
    for row in data:
        writer.writerow(row)
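A variant that keeps every input column and appends the two new ones can use csv.DictReader and csv.DictWriter. In this sketch get_business_id, is_priced and is_rated are hypothetical stubs with made-up return values, to be replaced by your real API calls, and in-memory buffers take the place of the input and output files:

```python
import csv
import io

# Hypothetical stubs standing in for the real API calls
def get_business_id(name):
    return name.lower().replace(' ', '-')

def is_priced(business_id):
    return '$$'

def is_rated(business_id):
    return 4.0

source = io.StringIO("Name,address,city,state,zip,country\n"
                     "XYZ(The),5* WE 223899th St.,New York,NY,19921,US\n")
output = io.StringIO()

reader = csv.DictReader(source)
# Keep every input column and add two new ones at the end
writer = csv.DictWriter(output, fieldnames=reader.fieldnames + ['price', 'rating'])
writer.writeheader()
for row in reader:
    b_id = get_business_id(row['Name'])
    row['price'] = is_priced(b_id)
    row['rating'] = is_rated(b_id)
    writer.writerow(row)

print(output.getvalue())
```

Because the header is handled by DictReader, no API call is made for the header row.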

You can do this easily using pandas:
import pandas as pd

df = pd.read_csv('your_csv.csv', usecols=['business_name'])  # since you only need the name
# each function receives one row and returns a single value for the new column
df['price'] = df.apply(is_priced, axis=1)
df['rating'] = df.apply(is_rated, axis=1)
df.to_csv('result.csv', index=False)
All you have to do in your functions is:
def is_priced(row):
    business_name = row['business_name']
    business_id = ??
    ...

Converting a text file into csv file using python

I need to convert my text files into CSV, and I am using Python to do it. My text file looks like this:
Employee Name : XXXXX
Employee Number : 12345
Age : 45
Hobbies: Tennis
Employee Name: xxx
Employee Number :123456
Hobbies : Football
I want my CSV file to have the columns Employee Name, Employee Number, Age, and Hobbies, and when a particular value is missing, that cell should contain NA. Is there a simple way to do this? Thanks in advance.

You can do something like this:
records = """Employee Name : XXXXX
Employee Number : 12345
Age : 45
Hobbies: Tennis
Employee Name: xxx
Employee Number :123456
Hobbies : Football"""

rows = []
for record in records.split('Employee Name'):
    if not record.strip():
        continue  # skip the empty chunk before the first "Employee Name"
    name = 'NA'
    number = 'NA'
    age = 'NA'
    hobbies = 'NA'
    for field in record.split('\n'):
        if ':' not in field:
            continue  # skip blank lines
        # strip spaces, since some lines have them around the colon and some don't
        field_name, field_value = [s.strip() for s in field.split(':', 1)]
        if field_name == "":  # This is the employee name, since we split on it
            name = field_value
        elif field_name == "Employee Number":
            number = field_value
        elif field_name == "Age":
            age = field_value
        elif field_name == "Hobbies":
            hobbies = field_value
    rows.append([name, number, age, hobbies])
Of course, this method assumes that every record has (at least) an Employee Name field, and that no value contains the text "Employee Name".

Maybe this helps you get started. It only produces static output for the first employee's data; you would need to wrap it in some sort of iteration over the file. There is very likely a more elegant solution, but this is how you could do it without a single import statement ;)
with open('test.txt', 'r') as f:
    content = f.readlines()
output_line = "".join([line.split(':')[1].replace('\n', ';').strip() for line in content[0:4]])
print(output_line)

I followed some very simple steps. This may not be optimal, but it solves the problem. The important case I can see here is that the same keys ("Employee Name", etc.) can appear multiple times in a single file.
Steps
Read the text file into a list of lines.
Convert the list to a dict (the logic could be improved here, or more complex lambdas added).
Use pandas to convert the dict to a DataFrame, which DataFrame.to_csv can then write out as CSV.
Below is the code:
import pandas

txt_file = r"test.txt"
with open(txt_file, "r") as txt:
    txt_string = txt.read()
txt_lines = txt_string.split("\n")

txt_dict = {}
for txt_line in txt_lines:
    if not txt_line.strip():
        continue  # skip blank lines
    k, v = txt_line.split(":", 1)
    k = k.strip()
    v = v.strip()
    # collect every value seen for this key, in order
    txt_dict.setdefault(k, []).append(v)

print(pandas.DataFrame.from_dict(txt_dict, orient="index"))
Output:
0 1
Employee Number 12345 123456
Age 45 None
Employee Name XXXXX xxx
Hobbies Tennis Football
I hope this helps.
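One limitation of collecting values per key: if a field is missing from an earlier record rather than the last one, the later values shift into the wrong column, because each list simply grows in the order values appear. A sketch that instead starts a new record at every Employee Name line (assuming every record begins with one) and lets csv.DictWriter fill gaps with NA, writing to an in-memory buffer for illustration:

```python
import csv
import io

text = """Employee Name : XXXXX
Employee Number : 12345
Age : 45
Hobbies: Tennis
Employee Name: xxx
Employee Number :123456
Hobbies : Football"""

COLUMNS = ["Employee Name", "Employee Number", "Age", "Hobbies"]

records = []
for line in text.splitlines():
    if ':' not in line:
        continue  # skip blank or malformed lines
    key, value = (part.strip() for part in line.split(':', 1))
    if key == "Employee Name":
        records.append({})  # a new employee starts here
    records[-1][key] = value

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=COLUMNS, restval="NA")
writer.writeheader()
writer.writerows(records)
print(out.getvalue())
```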
