I am attempting to read in all the data from a .csv file. First, I tried using csv.reader(), but it would skip the first line of my file. I was able to remedy this using .readlines(), but I am wondering why this happens with csv.reader() and how I can make it read my first line.
import glob
import csv

new_cards = []
path = 'C:\\Users\\zrc\\Desktop\\GCData2\\*.asc'
files = glob.glob(path)

# First Method
for name in files:
    with open(name) as f:
        for line in f:
            reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
            for row in reader:
                new_cards.append(row)
print(len(new_cards))
# Second Method
for name in files:
    with open(name) as f:
        m = f.readlines()
        for line in m:
            new_cards.append(line)
print(len(new_cards))
In your first method you don't need `for line in f:` — that loop consumes the first line, and then the reader starts from the second.
The correct way would be:
for name in files:
    with open(name) as f:
        reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
        for row in reader:
            new_cards.append(row)
print(len(new_cards))
You don't need to iterate over each line yourself, because `for row in reader:` already does that.
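To see why the extra loop loses a line, note that a file object is a one-shot iterator: anything that pulls a line from it advances its position, and a csv.reader created afterwards picks up wherever the iterator left off. A minimal sketch of this, using an in-memory file with made-up data:

```python
import csv
import io

# In-memory stand-in for one of the .asc files (made-up data).
f = io.StringIO("a,1\nb,2\nc,3\n")

next(f)  # consume the first line, as the stray `for line in f:` does
rows = list(csv.reader(f))
assert rows == [['b', '2'], ['c', '3']]  # the first row is already gone
```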
So, I have a file which has some 40 lines starting with '##'. After those lines there is a TSV table structure which I want to read using csv.DictReader().
I am trying the following code:
f = open(file, 'r')
for line in f.readlines():
    if line.startswith('##'):
        next(line)
However, I am not sure how to load the data into csv.DictReader after ignoring these lines. Any suggestions as to how to go about this?
You can use a generator, which does not materialize the whole file in memory (a concern if the file is big):
import csv

def read_fn():
    path = "./text.tsv"
    with open(path, "r") as f:
        for line in f:
            if line.startswith('##'):
                continue
            yield line

reader = csv.DictReader(read_fn())
for row in reader:
    print(row)
Basically you need to build an intermediate list of lines that you then pass to DictReader (I am also adding a with statement, since that is the conventional, Pythonic way of properly handling files in case of exceptions):
good_lines = []
with open(file, 'r') as f:
    for line in f:
        if line.startswith('##'):
            continue
        good_lines.append(line)

dr = csv.DictReader(good_lines)
I am trying to do the following:
reader = csv.DictReader(open(self.file_path), delimiter='|')
reader_length = sum([_ for item in reader])
for line in reader:
    print line
However, doing the reader_length line, makes the reader itself unreadable. Note that I do not want to do a list() on the reader, as it is too big to read on my machine entirely from memory.
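The underlying issue is that a DictReader wraps a one-shot iterator: once the sum() has consumed it, nothing is left to read. A minimal sketch of the symptom with in-memory data:

```python
import csv
import io

f = io.StringIO("a|b\n1|2\n3|4\n")
reader = csv.DictReader(f, delimiter='|')

count = sum(1 for _ in reader)  # consumes the reader entirely
assert count == 2
assert list(reader) == []       # nothing left to iterate over
```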
Use enumerate with a start value of 1; when you get to the end of the file you will have the line count:
for count, line in enumerate(reader, 1):
    # do work
print count
Or, if you need the count up front for some reason, sum using a generator expression and then seek back to the start of the file:
with open(self.file_path) as f:
    reader = csv.DictReader(f, delimiter='|')
    count = sum(1 for _ in reader)
    f.seek(0)
    reader = csv.DictReader(f, delimiter='|')
    for line in reader:
        print(line)
reader = list(csv.DictReader(open(self.file_path), delimiter='|'))
print len(reader)
is one way to do this, I suppose.
Another way to do it would be:
reader = csv.DictReader(open(self.file_path), delimiter='|')
for i, row in enumerate(reader):
    ...
num_rows = i + 1
I have a ten-line CSV file. From this file, I only want, say, the fourth line. What's the quickest way to do this? I'm looking for something like:
with open(file, 'r') as my_file:
    reader = csv.reader(my_file)
    print reader[3]
where reader[3] is obviously incorrect syntax for what I want to achieve. How do I move the reader to line 4 and get its content?
If all you have is 10 lines, you can load the whole file into a list:
with open(file, 'r') as my_file:
    reader = csv.reader(my_file)
    rows = list(reader)
    print rows[3]
For a larger file, use itertools.islice():
from itertools import islice

with open(file, 'r') as my_file:
    reader = csv.reader(my_file)
    print next(islice(reader, 3, 4))
I am new to Python (coming from PHP background) and I have a hard time figuring out how do I put each line of CSV into a list. I wrote this:
import csv

data = []
reader = csv.reader(open("file.csv", "r"), delimiter=',')
for line in reader:
    if "DEFAULT" not in line:
        data += line
print(data)
But when I print out data, I see that it's treated as one string. I want a list. I want to be able to loop and append every line that does not have "DEFAULT" in a given line. Then write to a new file.
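What you are seeing is the difference between `+=` and `append` on a list: `data += line` extends `data` with each field of the row, while `data.append(line)` keeps the whole row as one nested list. A small illustration:

```python
row = ['x', 'y']

flat = []
flat += row          # extends with the row's individual fields
assert flat == ['x', 'y']

nested = []
nested.append(row)   # appends the whole row as a single item
assert nested == [['x', 'y']]
```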
How about this?
import csv
reader = csv.reader(open("file.csv", "r"), delimiter=',')
print([line for line in reader if 'DEFAULT' not in line])
or if it's easier to understand:
import csv
reader = csv.reader(open("file.csv", "r"), delimiter=',')
data = [line for line in reader if 'DEFAULT' not in line]
print(data)
and of course the ultimate one-liner:
import csv
print([l for l in csv.reader(open("file.csv"), delimiter=',') if 'DEFAULT' not in l])
I have a .csv with 3000 rows of data in 2 columns like this:
uc007ayl.1 ENSMUSG00000041439
uc009mkn.1 ENSMUSG00000031708
uc009mkn.1 ENSMUSG00000035491
In another folder I have graphs with names like this:
uc007csg.1_nt_counts.txt
uc007gjg.1_nt_counts.txt
Notice that those graph filenames share the format of my 1st column.
I am trying to use Python to identify the rows that have a graph and write the name in the 2nd column to a new .txt file.
This is the code I have:
import csv

with open("C:/*my dir*/UCSC to Ensembl.csv", "r") as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        print row[0]
But this is as far as I can get, and I am stuck.
You're almost there:
import csv
import os.path

with open("C:/*my dir*/UCSC to Ensembl.csv", "rb") as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        graph_filename = os.path.join("C:/folder", row[0] + "_nt_counts.txt")
        if os.path.exists(graph_filename):
            print(row[1])
Note that the repeated calls to os.path.exists may slow down the process, especially if the directory lies on a remote filesystem and does not contain significantly more files than the number of lines in the CSV file. You may want to use os.listdir instead:
import csv
import os

graphs = set(os.listdir("C:/graph folder"))

with open("C:/*my dir*/UCSC to Ensembl.csv", "rb") as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        if row[0] + "_nt_counts.txt" in graphs:
            print(row[1])
First, try to see if print row[0] really gives the correct file identifier.
Second, concatenate the path to the files with row[0] and check whether this full path exists (that is, whether the file exists) with os.path.exists(path) (see http://docs.python.org/library/os.path.html#os.path.exists ).
If it exists, you can write row[1] (the second column) to a new file with f2.write("%s\n" % row[1]) (after opening f2 for writing, of course).
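The steps above can be sketched as a small function (the directory and file names are placeholders, not from the question):

```python
import csv
import os.path

def write_matches(csv_path, graph_dir, out_path):
    # For each row, keep column 2 if a graph file named after column 1 exists.
    with open(csv_path, newline="") as f, open(out_path, "w") as f2:
        for row in csv.reader(f):
            graph = os.path.join(graph_dir, row[0] + "_nt_counts.txt")
            if os.path.exists(graph):
                f2.write("%s\n" % row[1])
```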
Well, the next step would be to check whether the file exists. There are a few ways, but I like the EAFP approach.
try:
    with open(os.path.join(the_dir, row[0])) as f:
        pass
except IOError:
    print 'Oops no file'
the_dir is the directory where the files are.
result = open('result.txt', 'w')
for line in open('C:/*my dir*/UCSC to Ensembl.csv', 'r'):
    line = line.split(',')
    try:
        open('/path/to/dir/' + line[0] + '_nt_counts.txt', 'r')
    except IOError:
        continue
    else:
        result.write(line[1] + '\n')
result.close()
import csv
import os

# get prefixes of all graphs in another directory
suff = '_nt_counts.txt'
graphs = set(fn[:-len(suff)] for fn in os.listdir('another dir') if fn.endswith(suff))

with open(r'c:\path to\file.csv', 'rb') as f:
    # extract 2nd column if the 1st one is a known graph prefix
    names = (row[1] for row in csv.reader(f, delimiter='\t') if row[0] in graphs)
    # write one name per line
    with open('output.txt', 'w') as output_file:
        for name in names:
            print >>output_file, name