Parsing a file with readlines and split function in python


I have files I'm trying to parse. I want to print the date_of_birth on each line. The code below only returns the first line. I don't want to use readlines, as some of my files are very large.
HEADER: Date_of_birth, ID, First_Name, Last_Name
1/1/1970, 1, John, Smith
12/31/1969, 2, Peter, Smith
with open("test.csv", "r") as f:
    lines = f.readline().split()[0]
    print(lines)

I suggest the csv module, though your file format is slightly odd because it starts with "HEADER: " followed by the actual headers you care about. One option is to read the first 8 characters, verify that they are the string "HEADER: ", discard them, and then pass the open file handle to csv to parse the rest of the file.
Here's a simple example, which you might want to tweak to do more graceful handling of any errors:
import csv
with open('test.csv') as f:
    start_bytes = f.read(8)
    assert start_bytes == 'HEADER: '
    c = csv.reader(f)
    header_row = next(c)
    column_number = header_row.index('Date_of_birth')
    for row in c:
        print(row[column_number])
Update: thanks to another contributor for suggesting csv.DictReader. Similarly it seems that you can instantiate this with a file object positioned at some non-zero offset to discard the initial bytes containing "HEADER: " from the start of the file.
import csv
with open('test.csv') as f:
    start_bytes = f.read(8)
    assert start_bytes == 'HEADER: '
    c = csv.DictReader(f)
    for row in c:
        print(row['Date_of_birth'])
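The first answer suggests tweaking the code to handle errors more gracefully. Below is a minimal sketch of one way to do that. It assumes the prefix is always exactly "HEADER: ", and it uses a hypothetical helper, read_dates, that raises ValueError instead of relying on assert. io.StringIO stands in for test.csv so the example runs on its own:

```python
import csv
import io

PREFIX = "HEADER: "  # assumed literal prefix on the first line, as in the question


def read_dates(f):
    """Yield the Date_of_birth column from an open text file whose first
    line starts with the literal prefix 'HEADER: '."""
    start = f.read(len(PREFIX))
    if start != PREFIX:
        # A real exception, unlike assert, is not skipped when Python runs with -O.
        raise ValueError(f"expected {PREFIX!r} at start of file, got {start!r}")
    # skipinitialspace=True strips the blank after each comma, so the keys are
    # 'ID', 'First_Name', ... rather than ' ID', ' First_Name', ...
    reader = csv.DictReader(f, skipinitialspace=True)
    for row in reader:
        yield row["Date_of_birth"]


sample = io.StringIO(
    "HEADER: Date_of_birth, ID, First_Name, Last_Name\n"
    "1/1/1970, 1, John, Smith\n"
    "12/31/1969, 2, Peter, Smith\n"
)
for dob in read_dates(sample):
    print(dob)
```

To use it with a real file, open it with open("test.csv", newline="") and loop over read_dates(f). Because read_dates is a generator, rows are parsed one at a time, so memory use stays flat on large files.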

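Use the csv module (this assumes the first line is a plain header row, without the "HEADER: " prefix):
import csv
with open("test.csv", "r", newline='') as f:
    reader = csv.DictReader(f)
    for line in reader:
        print(line['Date_of_birth'])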

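Iterating over the file object reads one line at a time, so even a very large file is never loaded into memory all at once:
dates = []
with open("test.csv") as f:
    for row in f:
        # split on commas; a plain split() would keep the trailing comma ("1/1/1970,")
        dates.append(row.split(',')[0])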

The readline method returns only one line per call, so you need a while loop to read all of the lines:
with open("test.csv", "r") as f:
    dates = []
    while True:
        line = f.readline()
        if not line:  # readline() returns '' only at end of file; a blank line is '\n'
            break  # stop the loop
        dates.append(line.split(',')[0])
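Both loops above already avoid readlines(). The sketch below wraps the same idea in a generator, so dates are produced one at a time instead of being collected in a list. first_fields is a hypothetical name. The plain comma split assumes no field contains a quoted comma; use the csv module if one might:

```python
import io


def first_fields(lines, skip_header=True):
    """Lazily yield the first comma-separated field of each line.

    `lines` can be any iterable of strings, including an open file object,
    which is then read one line at a time rather than all at once.
    """
    it = iter(lines)
    if skip_header:
        next(it, None)  # discard the header line, if there is one
    for line in it:
        line = line.strip()
        if not line:  # ignore blank lines, such as a trailing empty line
            continue
        yield line.split(",", 1)[0]


sample = io.StringIO(
    "HEADER: Date_of_birth, ID, First_Name, Last_Name\n"
    "1/1/1970, 1, John, Smith\n"
    "12/31/1969, 2, Peter, Smith\n"
)
print(list(first_fields(sample)))  # ['1/1/1970', '12/31/1969']
```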

If the file has no header row (that is, its first row is data rather than Date_of_birth, ID, First_Name, Last_Name), supply the field names yourself:
import csv
with open("test.csv", "r", newline='') as f:
    fieldnames = ['Date_of_birth', 'ID', 'First_Name', 'Last_Name']
    rdr = csv.DictReader(f, fieldnames=fieldnames)
    for row in rdr:
        date_of_birth = row['Date_of_birth']
        print(date_of_birth)
Otherwise:
import csv
with open("test.csv", "r", newline='') as f:
    rdr = csv.DictReader(f)
    for row in rdr:
        date_of_birth = row['Date_of_birth']
        print(date_of_birth)
If the file's first row actually contains HEADER: Date_of_birth, ID, First_Name, Last_Name, then you must use the first alternative code but add logic to skip the first row.
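Here is one way to add the "skip the first row" logic. This sketch assumes the first line is exactly the HEADER: line from the question and can be discarded without parsing. dates_of_birth is an illustrative name, and io.StringIO stands in for test.csv:

```python
import csv
import io

# Column names taken from the question's header line.
FIELDNAMES = ["Date_of_birth", "ID", "First_Name", "Last_Name"]


def dates_of_birth(f):
    """Return the Date_of_birth values, skipping a 'HEADER: ...' first line."""
    next(f, None)  # throw away the first row without parsing it
    # With explicit fieldnames, DictReader treats every remaining line as data.
    rdr = csv.DictReader(f, fieldnames=FIELDNAMES, skipinitialspace=True)
    return [row["Date_of_birth"] for row in rdr]


sample = io.StringIO(
    "HEADER: Date_of_birth, ID, First_Name, Last_Name\n"
    "1/1/1970, 1, John, Smith\n"
    "12/31/1969, 2, Peter, Smith\n"
)
print(dates_of_birth(sample))  # ['1/1/1970', '12/31/1969']
```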
My answer would have been 60% shorter had you been 10% clearer.

Related

When I try to remove a row from the CSV, the file size keeps multiplying

I want to create a program which generates the numbers from 0 to 100000, stores them in a file, and then removes the numbers I give as input.
I have written the code for generating the numbers and storing them in a CSV file:
import csv
nums = list(range(0,100000))
with open('codes.csv', 'w') as f:
    writer = csv.writer(f)
    for val in nums:
        writer.writerow([val])
and I tried to delete the row I wanted with this:
import csv
import os
lines = list()
while True:
    members = input("Please enter a number to be deleted: ")
    with open('codes.csv', 'r') as readFile:
        reader = csv.reader(readFile)
        for row in reader:
            lines.append(row)
            for field in row:
                if field == members:
                    lines.remove(row)
    os.remove('codes.csv')
    with open('codes.csv', 'a+') as writeFile:
        writer = csv.writer(writeFile)
        writer.writerows(lines)
but the file size multiplies each time I remove a number. Please help.

Add a check before appending to your list; something like this should work:
with open('codes.csv', 'r') as readFile:
    reader = csv.reader(readFile)
    for row in reader:
        if all(field != members for field in row):
            lines.append(row)
P.S.: Don't forget to clear lines by adding lines = [] at the beginning of the while loop (I assume you know what you're doing).

There are two problems:
The lines list is never cleared, so whenever a number is entered, the entire file is appended to lines again.
When writing, the file is opened in a+ mode, which means "append and update".
Try recreating the list inside the outer while loop and overwriting the file contents by opening the file in w mode, like this:
import csv
import os
while True:
    members = input("Please enter a number to be deleted: ")
    lines = list()  # start with an empty list on every pass
    with open('codes.csv', 'r', newline='') as readFile:
        reader = csv.reader(readFile)
        for row in reader:
            lines.append(row)
            for field in row:
                if field == members:
                    lines.remove(row)
    os.remove('codes.csv')
    # 'w' overwrites the file; newline='' avoids extra blank rows on Windows
    with open('codes.csv', 'w', newline='') as writeFile:
        writer = csv.writer(writeFile)
        writer.writerows(lines)
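Putting both fixes together, the sketch below turns the delete step into a function. It assumes one number per row, as in codes.csv. delete_value is a hypothetical name, the interactive while loop and input() are left out, and a temporary file stands in for codes.csv so the example runs anywhere:

```python
import csv
import os
import tempfile


def delete_value(path, value):
    """Rewrite the CSV at `path` without any row that contains `value`.

    Returns the number of rows removed.
    """
    # The list is rebuilt from the file on every call, so rows never
    # accumulate across deletions.
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    kept = [row for row in rows if value not in row]
    # "w" truncates the file before writing, unlike "a+", which appends.
    # newline="" stops the csv module from producing blank lines on Windows.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(kept)
    return len(rows) - len(kept)


# Demonstration on a small temporary file instead of codes.csv
path = os.path.join(tempfile.mkdtemp(), "codes.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows([n] for n in range(10))
print(delete_value(path, "3"))  # 1
print(delete_value(path, "3"))  # 0, the row is already gone
```

Because the list is rebuilt on every call and the file is rewritten with "w", repeated deletions should leave the file the same size or smaller.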

How to delete only one row from a CSV file with python?

I'm trying to make a program which stores a list of names in a CSV file, and I'm trying to add a function to delete rows, which isn't working as it deletes everything in the CSV file.
I've tried using writer.writerow(row), which hasn't worked.
memberName = input("Please enter a member's name to be deleted.")
imp = open('mycsv.csv' , 'rb')
out = open('mycsv.csv' , 'wb')
writer = csv.writer(out)
for row in csv.reader(imp):
    if row == memberName:
        writer.writerow(row)
imp.close()
out.close()
I expected the program to only delete rows which contained memberName, but it deletes every row in the CSV file.
How do I change it to only delete a single row?

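Opening the file with 'wb' truncates it immediately, so by the time the reader starts there is nothing left to read and nothing is written back. You can't safely write to the same file while reading it. Instead, use another file for output, e.g.: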
import csv
member_name = input("Please enter a member's name to be deleted: ")
with open('in_file.csv', newline='') as in_file, open('out_file.csv', 'w', newline='') as out_file:
    reader = csv.reader(in_file)
    writer = csv.writer(out_file)
    for row in reader:
        if member_name not in row:  # exclude rows that contain the name
            writer.writerow(row)
Alternatively, you could store needed rows in memory and write them back to the input file after resetting the file pointer:
import csv
member_name = input("Please enter a member's name to be deleted: ")
with open('in_file.csv', 'r+', newline='') as in_file:
    reader = csv.reader(in_file)
    rows = [row for row in reader if member_name not in row]
    in_file.seek(0)  # go back to the start of the file
    in_file.truncate()  # discard the old contents
    writer = csv.writer(in_file)
    writer.writerows(rows)
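The next sketch is a variation on the separate-output-file approach that still ends with the original file name. It is not taken from the answers here. The kept rows are streamed into a temporary file in the same directory, which is then swapped into place with os.replace(). delete_member and the sample names are hypothetical. Unlike the r+ version, this never holds all rows in memory and leaves the original untouched if something fails partway:

```python
import csv
import os
import tempfile


def delete_member(path, member_name):
    """Drop every row containing member_name, writing through a temporary file."""
    # Create the temporary file next to the original so os.replace() does not
    # have to move data across filesystems.
    fd, tmp_path = tempfile.mkstemp(suffix=".csv", dir=os.path.dirname(os.path.abspath(path)))
    try:
        # Open the temp file first so its descriptor is closed even if
        # opening the input file fails.
        with os.fdopen(fd, "w", newline="") as out_file, \
                open(path, newline="") as in_file:
            writer = csv.writer(out_file)
            for row in csv.reader(in_file):
                if member_name not in row:
                    writer.writerow(row)
        # Swap the filtered copy into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temporary file, then re-raise the error.
        os.remove(tmp_path)
        raise


folder = tempfile.mkdtemp()
members = os.path.join(folder, "mycsv.csv")
with open(members, "w", newline="") as f:
    csv.writer(f).writerows([["Alice"], ["Bob"], ["Carol"]])

delete_member(members, "Bob")
with open(members, newline="") as f:
    print(list(csv.reader(f)))  # [['Alice'], ['Carol']]
```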

This worked for me: you could write the contents of the csv file to a list, then edit the list in python, then write the list back to the csv file.
import csv
lines = list()
memberName = input("Please enter a member's name to be deleted.")
with open('mycsv.csv', 'r', newline='') as readFile:
    reader = csv.reader(readFile)
    for row in reader:
        lines.append(row)
        for field in row:
            if field == memberName:
                lines.remove(row)
                break  # remove each matching row only once
with open('mycsv.csv', 'w', newline='') as writeFile:
    writer = csv.writer(writeFile)
    writer.writerows(lines)

(Simple Python) CSV input to usernames

I have a CSV file names.csv
First_name, Last_name
Mike, Hughes
James, Tango
, Stoke
Jack,
....etc
What I want is to take the first letter of the First_name and the full Last_name and output them on screen as usernames, but not include the people whose First_name or Last_name is empty. I'm completely stuck; any help would be greatly appreciated.
import csv
ifile = open('names.csv', "rb")
reader = csv.reader(ifile)
rownum = 0
for row in reader:
    if rownum == 0:
        header = row
    else:
        colnum = 0
        for col in row:
            print '%-8s: %s' % (header[colnum], col)
            colnum += 1
    rownum += 1
ifile.close()
Attempt #2
import csv
dataFile = open('names.csv','rb')
reader = csv.reader(dataFile)
next(reader, None)
for row in reader:
    if (row in reader )
        print (row[0])
I haven't saved many attempts because none of them have worked :S

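import csv
with open('names.csv', newline='') as dataFile:
    reader = csv.reader(dataFile, delimiter=',', quoting=csv.QUOTE_NONE, skipinitialspace=True)
    next(reader, None)  # skip the First_name, Last_name header
    for row in reader:
        if len(row) < 2 or not row[0] or not row[1]:
            continue
        print((row[0][0] + row[1]).lower())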
Or
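import csv
with open('names.csv', newline='') as dataFile:
    reader = csv.reader(dataFile, delimiter=',', quoting=csv.QUOTE_NONE, skipinitialspace=True)
    next(reader, None)  # skip the header row
    usernames = [(row[0][0] + row[1]).lower() for row in reader
                 if len(row) >= 2 and row[0] and row[1]]
print(usernames)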

Once you get the text from the .csv, you can use the str.split() method to break up the text at the newlines. Your sample text is a little inconsistent, but if I understand your question correctly, you can say:
with open('names.csv') as dataFile:
    text = dataFile.read()
lines = text.split('\n')
for line in lines:
    print(line)
Or if you want to break it up by commas just replace the '\n' with ','

Maybe like this
from csv import DictReader
with open('names.csv', newline='') as f:
    reader = DictReader(f, skipinitialspace=True)
    fullnames = filter(lambda n: n['First_name'] and n['Last_name'], reader)
    for name in fullnames:
        print('{}{}'.format(name['First_name'][0], name['Last_name']))
You have headings in your CSV, so use a DictReader, filter out the rows with an empty first or last name, and display the remaining names.
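For comparison, the sketch below turns the DictReader approach into a reusable generator that also strips stray whitespace. usernames is a hypothetical name, the sample rows come from the question, and io.StringIO stands in for names.csv:

```python
import csv
import io


def usernames(f):
    """Yield first initial + last name for rows where both names are present."""
    reader = csv.DictReader(f, skipinitialspace=True)
    for row in reader:
        # Short rows get None for missing fields, so fall back to "".
        first = (row.get("First_name") or "").strip()
        last = (row.get("Last_name") or "").strip()
        if first and last:
            yield first[0] + last


sample = io.StringIO(
    "First_name, Last_name\n"
    "Mike, Hughes\n"
    "James, Tango\n"
    ", Stoke\n"
    "Jack,\n"
)
print(list(usernames(sample)))  # ['MHughes', 'JTango']
```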

How to read multiple records from a CSV file?

I have a csv file, l__cyc.csv, that contains this:
trip_id, time, O_lat, O_lng, D_lat, D_lng
130041910101,1300,51.5841153671,0.134444590094,51.5718053872,0.134878021928
130041910102,1335,51.5718053872,0.134878021928,51.5786920389,0.180940040247
130041910103,1600,51.5786920389,0.180940040247,51.5841153671,0.134444590094
130043110201,1500,51.5712712038,0.138532882664,51.5334949484,0.130489470325
130043110202,1730,51.5334949484,0.130489470325,51.5712712038,0.138532882664
And I am trying to pull out separate values, using:
with open('./l__cyc.csv', 'rU') as csvfile:
    reader = csv.DictReader(csvfile)
    origincoords = ['{O_lat},{O_lng}'.format(**row) for row in reader]
with open('./l__cyc.csv', 'rU') as csvfile:
    reader = csv.DictReader(csvfile)
    trip_id = ['{trip_id}'.format(**row) for row in reader]
with open('./l__cyc.csv', 'rU') as csvfile:
    reader = csv.DictReader(csvfile)
    destinationcoords = ['{D_lat},{D_lng}'.format(**row) for row in reader]
Where, for the first row, origincoords should be 51.5841153671, 0.134444590094, trip_id should be 130041910101, and destinationcoords should be 51.5718053872, 0.134878021928.
However, I get a KeyError:
KeyError: 'O_lat'
Is this something simple and there's something fundamental I'm misunderstanding?

Just remove the spaces after the commas in the header row:
trip_id,time,O_lat,O_lng,D_lat,D_lng
Or tell the reader to skip the whitespace that follows each delimiter:
reader = csv.DictReader(csvfile, skipinitialspace=True)
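The sketch below combines the skipinitialspace fix with the single-pass advice from the next answer and fills all three lists in one loop. Only the first two data rows from the question are used, and io.StringIO stands in for l__cyc.csv:

```python
import csv
import io

src = io.StringIO(
    "trip_id, time, O_lat, O_lng, D_lat, D_lng\n"
    "130041910101,1300,51.5841153671,0.134444590094,51.5718053872,0.134878021928\n"
    "130041910102,1335,51.5718053872,0.134878021928,51.5786920389,0.180940040247\n"
)

origincoords, trip_ids, destinationcoords = [], [], []
# skipinitialspace=True turns ' O_lat' into 'O_lat', so the keys match.
reader = csv.DictReader(src, skipinitialspace=True)
for row in reader:  # one pass over the file fills all three lists
    origincoords.append("{O_lat},{O_lng}".format(**row))
    trip_ids.append(row["trip_id"])
    destinationcoords.append("{D_lat},{D_lng}".format(**row))

print(trip_ids[0])       # 130041910101
print(origincoords[0])   # 51.5841153671,0.134444590094
```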

First things first: you get the KeyError because the key does not exist in your dictionary. The header has a space after each comma, so the keys are ' O_lat', ' O_lng', and so on, not 'O_lat'.
Next, I would advise against running through the file three times when you can do it once!
It worked for me when I passed explicit fieldnames to the reader.
import csv
from io import StringIO
src = """trip_id, time, O_lat, O_lng, D_lat, D_lng
130041910101,1300,51.5841153671,0.134444590094,51.5718053872,0.134878021928
130041910102,1335,51.5718053872,0.134878021928,51.5786920389,0.180940040247
130041910103,1600,51.5786920389,0.180940040247,51.5841153671,0.134444590094
130043110201,1500,51.5712712038,0.138532882664,51.5334949484,0.130489470325
130043110202,1730,51.5334949484,0.130489470325,51.5712712038,0.138532882664
"""
f = StringIO(src)
# determine the fieldnames
fieldnames = "trip_id,time,O_lat,O_lng,D_lat,D_lng".split(",")
# read the file
reader = csv.DictReader(f, fieldnames=fieldnames)
# storage
origincoords = []
trip_id = []
destinationcoords = []
# iterate the rows
for row in reader:
    origincoords.append('{O_lat},{O_lng}'.format(**row))
    trip_id.append('{trip_id}'.format(**row))
    destinationcoords.append('{D_lat},{D_lng}'.format(**row))
# pop the header off the list (with explicit fieldnames, the header line is read as data)
origincoords.pop(0)
trip_id.pop(0)
destinationcoords.pop(0)
# show the result
print(origincoords)
print(trip_id)
print(destinationcoords)
I don't really know what you are trying to achieve there, but I'm sure there is a better way of doing it!

python csv write only certain fieldnames, not all

I must be missing something, but I don't get it. I have a CSV with 1200 fields, and I'm only interested in 30 of them. How do I get that to work? I can read and write the whole shebang, which is OK, but I'd really like to write out just the 30. I have a list of the fieldnames, and I'm kind of hacking the header.
How would I translate below to use DictWriter/Reader?
for file in glob.glob( os.path.join(raw_path, 'P12*.csv') ):
    fileReader = csv.reader(open(file, 'rb'))
    fileLength = len(file)
    fileGeom = file[fileLength-7:fileLength-4]
    table = TableValues[fileGeom]
    filename = file.split(os.sep)[-1]
    with open(out_path + filename, "w") as fileout:
        for line in fileReader:
            writer = csv.writer(fileout, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
            if 'ID' in line:
                outline = line.insert(0,"geometryTable")
            else:
                outline = line.insert(0,table) #"%s,%s\n" % (line, table)
            writer.writerow(line)

Here's an example of using DictWriter to write out only fields you care about. I'll leave the porting work to you:
import csv
headers = ['a','b','d','g']
with open('in.csv', newline='') as _in, open('out.csv', 'w', newline='') as out:
    reader = csv.DictReader(_in)
    writer = csv.DictWriter(out, headers, extrasaction='ignore')
    writer.writeheader()
    for line in reader:
        writer.writerow(line)
in.csv
a,b,c,d,e,f,g,h
1,2,3,4,5,6,7,8
2,3,4,5,6,7,8,9
Result (out.csv)
a,b,d,g
1,2,4,7
2,3,5,8
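To port the question's loop, the extra geometryTable column can be added as one more field name. The sketch below makes these assumptions: write_subset is a hypothetical helper, the table argument stands in for TableValues[fileGeom], "T1" is a made-up table value, and io.StringIO replaces the real input and output files:

```python
import csv
import io


def write_subset(in_file, out_file, wanted, table):
    """Copy only the `wanted` columns, prefixed by a geometryTable column.

    `table` stands in for TableValues[fileGeom] from the question.
    """
    reader = csv.DictReader(in_file)
    fieldnames = ["geometryTable"] + list(wanted)
    # extrasaction="ignore" drops every column not named in fieldnames;
    # the default, "raise", would raise ValueError on the extra keys.
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()  # header row: geometryTable followed by the wanted names
    for row in reader:
        row["geometryTable"] = table  # same value on every data row
        writer.writerow(row)


src = io.StringIO("a,b,c,d,e,f,g,h\n1,2,3,4,5,6,7,8\n2,3,4,5,6,7,8,9\n")
dst = io.StringIO()
write_subset(src, dst, ["a", "b", "d", "g"], "T1")
print(dst.getvalue())
```

With real files, both would be opened in text mode with newline='' (for example open(file, newline='') and open(out_path + filename, 'w', newline='')) and passed to write_subset.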

Develop Reference
