Converting a text file into csv file using python

I need to convert my text files to CSV, and I am using Python to do it. My text file looks like this:
Employee Name : XXXXX
Employee Number : 12345
Age : 45
Hobbies: Tennis
Employee Name: xxx
Employee Number :123456
Hobbies : Football
I want my CSV file to have the columns Employee Name, Employee Number, Age and Hobbies. When a particular value is not present, the file should contain NA in its place. Are there any simple solutions for this? Thanks in advance.

You can do something like this:
records = """Employee Name : XXXXX
Employee Number : 12345
Age : 45
Hobbies: Tennis
Employee Name: xxx
Employee Number :123456
Hobbies : Football"""
for record in records.split('Employee Name'):
    if not record.strip():
        continue  # skip the empty chunk before the first record
    name = number = age = hobbies = 'NA'
    for field in record.split('\n'):
        if ':' not in field:
            continue  # skip blank lines
        field_name, field_value = (part.strip() for part in field.split(':', 1))
        if field_name == "":  # This is the employee name, since we split on it
            name = field_value
        elif field_name == "Employee Number":
            number = field_value
        elif field_name == "Age":
            age = field_value
        elif field_name == "Hobbies":
            hobbies = field_value
    print(','.join([name, number, age, hobbies]))  # one CSV row per record
Of course, this method assumes that every record contains at least an Employee Name field.
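For completeness, here is a minimal sketch of how the parsed records could be written out as CSV with NA for missing values, using csv.DictWriter and its restval argument. It keeps the same assumption that every record starts with an Employee Name line. The helper name parse_records and the in-memory buffer are illustrative choices, not part of the original answer.

```python
import csv
import io

FIELDS = ["Employee Name", "Employee Number", "Age", "Hobbies"]

def parse_records(text):
    """Group 'key : value' lines into one dict per employee (hypothetical helper)."""
    records = []
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Employee Name" or not records:
            records.append({})  # assumption: a name line starts each record
        records[-1][key] = value
    return records

sample = """Employee Name : XXXXX
Employee Number : 12345
Age : 45
Hobbies: Tennis
Employee Name: xxx
Employee Number :123456
Hobbies : Football"""

# In-memory buffer for the demo; use open('employees.csv', 'w', newline='') for a real file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="NA", lineterminator="\n")
writer.writeheader()
writer.writerows(parse_records(sample))  # missing keys are filled with restval
csv_text = buffer.getvalue()
print(csv_text)
```

With restval="NA", the second employee's missing Age is written as NA without any special-casing in the parsing loop.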

Maybe this helps you get started? It only produces the output for the first employee's data, so you would still need to wrap it in some sort of iteration over the file. There is very likely a more elegant solution, but this is how you would do it without a single import statement ;)
with open('test.txt', 'r') as f:
    content = f.readlines()
output_line = "".join([line.split(':')[1].replace('\n',';').strip() for line in content[0:4]])
print(output_line)

I followed very simple steps for this. They may not be optimal, but they solve the problem. The important case I can see here is that the same keys ("Employee Name" etc.) can occur multiple times in a single file.
Steps:
1. Read the txt file into a list of lines.
2. Convert the list to a dict (the logic can be improved, or more complex lambdas can be added here).
3. Simply use pandas to convert the dict to CSV.
Below is the code:
import pandas

txt_file = r"test.txt"
with open(txt_file, "r") as txt:
    txt_lines = txt.read().split("\n")

txt_dict = {}
for txt_line in txt_lines:
    if ":" not in txt_line:
        continue  # skip blank or malformed lines
    k, v = txt_line.split(":", 1)
    # Collect every value seen for a key, in file order
    txt_dict.setdefault(k.strip(), []).append(v.strip())

print(pandas.DataFrame.from_dict(txt_dict, orient="index"))
Output (Python 3.7+, where dicts keep insertion order):
                      0         1
Employee Name     XXXXX       xxx
Employee Number   12345    123456
Age                  45      None
Hobbies          Tennis  Football
I hope this helps.

Related

Python and CSV: find a row that is given by the user and give column value

The task is to write a function that searches for a name given by the user and prints that person's email.
Here is a small part of the CSV file, which is structured like this:
Name     Email
mahmoud  mahmoud.123#gmail.com
sam      sam.123#gmail.com
import pandas as pd
data = pd.read_csv(r'C:\Users\samer\Downloads\cvs\mail list.csv')
print (data)
df= pd.DataFrame()
df['Name1'] = input("Enter Name:")
for row in data:
    if row == "Name1":
        print(row)
I am stuck here: I don't know how to print the email after checking whether the name the user gives exists.
Here's an example of the output:
if the user enters the name sam,
the output will be sam.123#gmail.com

You can assign the Name column to the index when loading the CSV to a DataFrame and then use .loc to locate the name explicitly:
import pandas as pd
df = pd.read_csv(r'./mail list.csv', index_col='Name')
name = input("Enter Name:")
email = df.loc[name, 'Email']
print(email)
Which results in
sam.123#gmail.com

Assuming all the names in the Name column are unique:
def get_email(df, name):
    email_list = list(df.loc[df["Name"] == name]["Email"])
    if email_list:
        print(email_list[0])
    else:
        print("User does not exist")
get_email(df, "sam")
[Out] : sam.123#gmail.com
get_email(df, "Jean")
[Out] : User does not exist
Edit: I used this data to run the test:
Name Email
mahmoud mahmoud.123#gmail.com
sam sam.123#gmail.com
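If pandas is not available, the same lookup can be sketched with the standard csv module. The function name find_email and the inline sample are assumptions for illustration; the sample mirrors the test data above and assumes the real file is comma-delimited.

```python
import csv
import io

def find_email(csv_file, name):
    """Return the Email of the first row whose Name matches, or None."""
    for row in csv.DictReader(csv_file):
        if row["Name"] == name:
            return row["Email"]
    return None

# Inline stand-in for the real file; assumed to be comma-delimited
sample = "Name,Email\nmahmoud,mahmoud.123#gmail.com\nsam,sam.123#gmail.com\n"

print(find_email(io.StringIO(sample), "sam"))
print(find_email(io.StringIO(sample), "Jean"))
```

Returning None for an unknown name leaves it to the caller to decide what message to show.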

Extract data from text file using Python (or any language)

I have a text file that looks like:
First Name Bob
Last name Smith
Phone 555-555-5555
Email bob#bob.com
Date of Birth 11/02/1986
Preferred Method of Contact Text Message
Desired Appointment Date 04/29
Desired Appointment Time 10am
City Pittsburgh
Location State
IP Address x.x.x.x
User-Agent (Browser/OS) Apple Safari 14.0.3 / OS X
Referrer http://www.example.com
First Name john
Last name Smith
Phone 555-555-4444
Email john#gmail.com
Date of Birth 03/02/1955
Preferred Method of Contact Text Message
Desired Appointment Date 05/22
Desired Appointment Time 9am
City Pittsburgh
Location State
IP Address x.x.x.x
User-Agent (Browser/OS) Apple Safari 14.0.3 / OS X
Referrer http://www.example.com
.... and so on
I need to extract each entry to a csv file, so the data should look like: first name, last name, phone, email, etc. I don't even know where to start on something like this.

First of all, you'll need to open the text file in read mode.
I'd suggest using a context manager, like so:
with open('path/to/your/file.txt', 'r') as file:
    for line in file.readlines():
        # do something with the line (it is a string)
        pass
As for managing the info, you could build some intermediate structure, for example a dictionary or a list of dictionaries, and then translate that into a CSV file with the csv module.
You could, for example, split the file whenever there is a blank line, maybe like this:
with open('Downloads/test.txt', 'r') as f:
    my_list = list()  # this will be the final list
    entry = dict()  # this contains each user info as a dict
    for line in f.readlines():
        if line.strip() == "":  # if line is empty start a new dict
            my_list.append(entry)  # and append the old one to the list
            entry = dict()
        else:  # otherwise split the line and create new dict
            line_items = line.split(r' ')
            print(line_items)
            entry[line_items[0]] = line_items[1]
    print(my_list)
This code won't work as-is because your text is not formatted consistently: you need a reliable way to split each line into "title" and "content" (like "First Name" and "Bob"). I suggest looking at regular expressions, or fixing the txt file so that the spacing is more consistent.

Assuming the data resides in a:
a="""
First Name Bob
Last name Smith
Phone 555-555-5555
Email bob#bob.com
Date of Birth 11/02/1986
Preferred Method of Contact Text Message
Desired Appointment Date 04/29
Desired Appointment Time 10am
City Pittsburgh
Location State
IP Address x.x.x.x
User-Agent (Browser/OS) Apple Safari 14.0.3 / OS X
Referrer http://www.example.com

First Name john
Last name Smith
Phone 555-555-4444
Email john#gmail.com
Date of Birth 03/02/1955
Preferred Method of Contact Text Message
Desired Appointment Date 05/22
Desired Appointment Time 9am
City Pittsburgh
Location State
IP Address x.x.x.x
User-Agent (Browser/OS) Apple Safari 14.0.3 / OS X
Referrer http://www.example.com
"""
line_sep = "\n"  # CHANGE ME ACCORDING TO DATA
fields = ["First Name", "Last name", "Phone",
          "Email", "Date of Birth", "Preferred Method of Contact",
          "Desired Appointment Date", "Desired Appointment Time",
          "City", "Location", "IP Address", "User-Agent", "Referrer"]
records = a.split(line_sep * 2)
all_records = []
for record in records:
    splitted_record = record.split(line_sep)
    one_record = {}
    csv_record = []
    for f in fields:
        found = False
        for one_field in splitted_record:
            if one_field.startswith(f):
                data = one_field[len(f):].strip()
                one_record[f] = data
                csv_record.append(data)
                found = True
        if not found:
            csv_record.append("")
    all_records.append(";".join(csv_record))
one_record holds the record as a dictionary, and csv_record holds it as a list of fields (ordered like the fields variable).
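To go from per-record field lists to actual CSV output, the csv module can take care of delimiters and quoting. The sketch below is a compact variant of the approach above. It assumes that records are separated by a blank line and that each line begins with one of the known field names. split_record and the shortened field list are hypothetical, and the sample is trimmed from the question's data with one email left out to show a missing value.

```python
import csv
import io

FIELDS = ["First Name", "Last name", "Phone", "Email"]  # shortened for the example

def split_record(block, fields=FIELDS):
    """Return the values of one record, ordered like `fields` ('' if missing)."""
    values = {}
    for line in block.splitlines():
        for field in fields:
            if line.startswith(field):
                values[field] = line[len(field):].strip()
                break  # a line belongs to at most one field
    return [values.get(field, "") for field in fields]

text = """First Name Bob
Last name Smith
Phone 555-555-5555
Email bob#bob.com

First Name john
Last name Smith
Phone 555-555-4444
"""

# In-memory buffer for the demo; use open('people.csv', 'w', newline='') for a real file
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(FIELDS)
for block in text.strip().split("\n\n"):
    writer.writerow(split_record(block))
csv_text = out.getvalue()
print(csv_text)
```

Using csv.writer rather than ";".join means a value that happens to contain the delimiter is quoted automatically.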

Edited to add: ignore this answer; the code from Koko Jumbo looks infinitely more sensible and actually gives you a CSV file at the end of it! It was a fun exercise though :)
Just to expand on fcagnola's code a bit.
If it's a quick and dirty one-off, and you know that the data will be consistently presented, the following should work to create a list of dictionaries with the correct key/value pairing. Each line is processed by splitting the line and comparing the line number (reset to 0 with each new dict) against an array of values that represent where the boundary between key and value falls.
For example, "First Name Bob" becomes ["First","Name","Bob"]. The function has been told that linenumber= 0 so it checks entries[linenumber] to get the value "2", which it uses to join the key name (items 0 & 1) and then join the data (items 2 onwards). The end result is ["First Name", "Bob"] which is then added to the dictionary.
class Extract:
    def extractEntry(self,linedata,lineindex):
        # Hardcoded list! The quick and dirty part.
        # This is specific to the example data provided. The entries
        # represent the index to be used when splitting the string
        # between the key and the data
        entries = (2,2,1,1,3,4,3,3,1,1,2,2,1)
        return self.createNewEntry(linedata,entries[lineindex])
    def createNewEntry(self,linedata,dataindex):
        list_data = linedata.split()
        key = " ".join(list_data[:dataindex])
        data = " ".join(list_data[dataindex:])
        return [key,data]
with open('test.txt', 'r') as f:
    my_list = list()  # this will be the final list
    entry = dict()  # this contains each user info as a dict
    extr = Extract()  # class for splitting the entries into key/value
    x = 0
    for line in f.readlines():
        if line.strip() == "":  # if line is empty start a new dict
            my_list.append(entry)  # and append the old one to the list
            entry = dict()
            x = 0
        else:  # otherwise split the line and create new dict
            extracted_data = extr.extractEntry(line,x)
            entry[extracted_data[0]] = extracted_data[1]
            x += 1
    my_list.append(entry)
    print(my_list)

How to check if User Input is present in CSV Python

I am trying to create a program that asks the user for a city and a gender, and tests the input to make sure it is valid and corresponds to values in a CSV file that holds city names and the population of each gender.
How do I check whether the user input for both gender and city name is in the CSV file? If the user does not enter a valid city or gender, the program should tell them to choose a different city and/or gender.
Here is what the CSV looks like:
name,gen_male,gen_female
Tokyo,5000,4500
San_Francsico,400,500
Manila,600,700
New_York,8000,9000
Paris,5600,5600
Chicago,500,6000
Can anyone help me to figure out a way to check user input if a given value is within the csv file.
Here is my script:
import csv
with open('C:/Users/PycharmProjects/CityGen.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    city = raw_input('Which city?:')
    gender = raw_input('What gender?:')
    yearPop = 'gen_' + year
    try:
        for row in reader:
            if row['name'] == city:
                print row[yearPop]
    except ValueError:
        print 'incorrect value'

I would first read the CSV file and save it into a dictionary with the city name as the key and, as the value, a small dictionary that maps each gender to its population. Then ask the user for the city's name and look it up in the dictionary; if it's there, ask for the gender and check whether it is valid for that city.
import csv

with open('C:/Users/PycharmProjects/CityGen.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    dictionary = {}
    for row in reader:
        # DictReader rows are keyed by column name, not by position
        dictionary[row['name']] = {'male': row['gen_male'],
                                   'female': row['gen_female']}

city = input('Which city?:')  # Python 3; use raw_input() on Python 2
if city in dictionary:
    gender = input('What gender?:')
    if gender in dictionary[city]:
        # gender and city are valid
        print(dictionary[city][gender])
    else:
        print('gender is not valid')
else:
    print('city is not valid')
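The lookup-then-validate idea can also be wrapped in small functions that are easy to test without a file or keyboard input. This is a sketch under assumptions: load_cities and population are hypothetical names, and the CSV content is inlined from part of the question's sample.

```python
import csv
import io

def load_cities(csv_file):
    """Map each city name to its {'male': ..., 'female': ...} populations."""
    cities = {}
    for row in csv.DictReader(csv_file):
        cities[row["name"]] = {"male": int(row["gen_male"]),
                               "female": int(row["gen_female"])}
    return cities

def population(cities, city, gender):
    """Return the population, or raise ValueError naming the invalid input."""
    if city not in cities:
        raise ValueError("unknown city: " + city)
    if gender not in cities[city]:
        raise ValueError("unknown gender: " + gender)
    return cities[city][gender]

# Inline stand-in for CityGen.csv (two rows from the question's sample)
sample = "name,gen_male,gen_female\nTokyo,5000,4500\nManila,600,700\n"
cities = load_cities(io.StringIO(sample))

print(population(cities, "Tokyo", "female"))
try:
    population(cities, "Berlin", "male")
except ValueError as err:
    print(err)  # tells the caller which input to ask for again
```

A caller could loop on input() until population stops raising ValueError, which gives the "choose a different city or gender" behaviour the question asks for.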

How do I merge two csv files?

I have two csv files. EMPLOYEES contains a record for every employee at a company, with 10 columns of information about each one. SOCIAL contains records for employees who filled out a survey, with 8 columns of information. Every employee in the survey is also in the master file. Both files have a unique identifier (the EXTENSION).
I want to say "If an employee is in SOCIAL, add columns 4, 5 and 6 to their row in EMPLOYEES." In other words, if an employee filled out a survey, the additional information should be appended to the master record.
Currently, my program pulls out all information from EMPLOYEES for employees who have taken the SURVEY. But I don't know how to add the additional columns of information to the EMPLOYEES csv. I have spent much of the day reading StackOverflow about DictReader and dictionaries and am still confused.
Thank you in advance for your guidance.
Sample EMPLOYEE:
Name   Extension  Job
Bill   1111       plumber
Alice  2222       fisherman
Carl   3333       rodeo clown
Sample SURVEY:
Extension  Favorite Color  Book
2222       blue            A Secret Garden
3333       green           To Kill a Mockingbird
Sample OUTPUT:
Name   Extension  Job          Favorite Color  Favorite Book
Bill   1111       plumber
Alice  2222       fisherman    blue            A Secret Garden
Carl   3333       rodeo clown  green           To Kill a Mockingbird
import csv
with open('employees.csv', "rU") as npr_employees:
    employees = csv.DictReader(npr_employees)
    all_employees = {}
    total_employees = {}
    for employee in employees:
        all_employees[employee['Extension']] = employee
with open('social.csv', "rU") as social_employees:
    social_employee = csv.DictReader(social_employees)
    for row in social_employee:
        print all_employees.get(row['Extension'], None)

You can merge two dictionaries in Python 3.5+ using (on Python 2, the equivalent was dict(d1.items() + d2.items())):
{**d1, **d2}
Using a dict, all_employees, with 'Extension' as the key works perfectly to link a "social employee" row with its corresponding "employee" row.
Then you need to go through all the updated employee info and output the fields in a consistent order. To fix that order, we keep a list of the headers, output_headers, as we see them.
import csv

# Store all the info about the employees
all_employees = {}
output_headers = []

# First, get all employee record info
with open('employees.csv', newline='') as npr_employees:
    employees = csv.DictReader(npr_employees)
    for employee in employees:
        ext = employee['Extension']
        all_employees[ext] = employee
    # Add headers from "all employees"
    output_headers.extend(employees.fieldnames)

# Then, get all info from social, and update employee info
with open('social.csv', newline='') as social_file:
    social_employees = csv.DictReader(social_file)
    for social_employee in social_employees:
        ext = social_employee['Extension']
        # Combine the two dictionaries.
        all_employees[ext] = {**all_employees[ext], **social_employee}
    # Add headers from "social employees", but don't add duplicate fields
    output_headers.extend(
        [field for field in social_employees.fieldnames
         if field not in output_headers]
    )

# Finally, output the records ordered by extension
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(output_headers)
    # Write the new employee rows. If a field doesn't exist,
    # write an empty string.
    for employee in sorted(all_employees.values(),
                           key=lambda e: e['Extension']):
        writer.writerow(
            [employee.get(field, '') for field in output_headers]
        )
outputs:
Name,Extension,Job,Favorite Color,Book
Bill,1111,plumber,,
Alice,2222,fisherman,blue,A Secret Garden
Carl,3333,rodeo clown,green,To Kill a Mockingbird
Let me know if you have any questions!

You could try:
for row in social_employee:
    employee = all_employees.get(row['Extension'], None)
    if employee is not None:
        # Copy the extra survey fields onto the matching employee record
        employee['additionalinfo1'] = row['additionalinfo1']
        employee['additionalinfo2'] = row['additionalinfo2']
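The same merge can be expressed as a small left join over in-memory rows, which makes the behaviour easy to check. merge_on_key is a hypothetical helper, and the rows are taken from the question's samples; treat this as a sketch rather than the only way to do it.

```python
def merge_on_key(master_rows, extra_rows, key):
    """Left join: copy fields from extra_rows into the matching master rows."""
    extra_by_key = {row[key]: row for row in extra_rows}
    merged = []
    for row in master_rows:
        combined = dict(row)  # copy so the input rows are not modified
        combined.update(extra_by_key.get(row[key], {}))
        merged.append(combined)
    return merged

employees = [
    {"Name": "Bill", "Extension": "1111", "Job": "plumber"},
    {"Name": "Alice", "Extension": "2222", "Job": "fisherman"},
]
survey = [
    {"Extension": "2222", "Favorite Color": "blue", "Book": "A Secret Garden"},
]

merged = merge_on_key(employees, survey, "Extension")
for row in merged:
    print(row)
```

Rows produced by csv.DictReader can be passed in directly, and csv.DictWriter with restval='' will leave the survey columns blank for employees such as Bill who have no survey entry.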

list indices must be integers, not str

I am very new to Python and am really struggling to find a solution to this issue.
I just don't understand why I need to include only integers in my list when I thought lists were supposed to support multiple data types.
I've got a very simple field entry system for an account registration and I just can't add the items to a list.
Any help would be greatly appreciated. I have included my code and the message I receive.
useraccounts = {}
group = []
forename = input('Forename: ')
surname = input('Surname: ')
DOB = input('DOB: ')
stu_class = input('Class: ')
group['forename'] = forename
group['surname'] = surname
group['dob'] = DOB
group['class'] = stu_class
group.append(useraccounts)
This is the error message:
Traceback (most recent call last):
  File "/Users/admin/Documents/Homework/Computing/testing/testing.py", line 11, in <module>
    group['forename'] = forename
TypeError: list indices must be integers, not str

It looks like you want group to be a dict, and useraccounts to be a list. You have them backwards, as well as the append:
useraccounts = [] # <-- list
group = {} # <-- dict
forename = input('Forename: ')
surname = input('Surname: ')
DOB = input('DOB: ')
stu_class = input('Class: ')
group['forename'] = forename
group['surname'] = surname
group['dob'] = DOB
group['class'] = stu_class
useraccounts.append(group) # <-- reversed. this will append group to useraccounts
As written, you were trying to index group, a list, with string keys, and then append useraccounts, an empty dict, to it.

What you want is a dictionary:
group = {}
group['forename'] = forename
group['surname'] = surname
group['dob'] = DOB
group['class'] = stu_class
In your original code useraccounts stays an empty dict that you just append to the list. If you wanted to add group to useraccounts:
useraccounts['key'] = group

group is a list, it cannot take string indices. It looks like you wanted to use a dictionary instead:
useraccounts = []
group = {}
group['forename'] = forename
group['surname'] = surname
group['dob'] = DOB
group['class'] = stu_class
useraccounts.append(group)
Note that you probably wanted useraccounts to be the list here; your code tried to call .append() on the group object.
Or inline the keys and values directly into the dictionary definition:
useraccounts.append({
    'forename': forename,
    'surname': surname,
    'dob': DOB,
    'class': stu_class})
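Putting the pieces together, a list of dicts is a natural shape for holding several accounts. The sketch below uses a hypothetical make_account helper and made-up values in place of input() so that it runs unattended.

```python
def make_account(forename, surname, dob, stu_class):
    """Build one account as a dict with string keys."""
    return {"forename": forename, "surname": surname,
            "dob": dob, "class": stu_class}

useraccounts = []  # list: ordered collection, indexed by integer position

# Made-up example values; in the real program these would come from input()
useraccounts.append(make_account("Ada", "Lovelace", "10/12/1815", "7A"))
useraccounts.append(make_account("Alan", "Turing", "23/06/1912", "7B"))

# Integer index selects the account from the list, string key selects the field
print(useraccounts[0]["forename"])
print(len(useraccounts))
```

The list is indexed by position and each dict by field name, which is exactly the distinction the TypeError was pointing at.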
