I have this code which I am using to read my dataset from csv file with the CSV module:
keys = []

def ahencrypt(row):
    for field in row:
        print field

def read_data(filename):
    import csv
    with open(filename, "rb") as csvfile:
        datareader = csv.reader(csvfile)
        for row in datareader:
            yield row

for row in read_data('dataset/car.data'):
    print ahencrypt(row)
My data has just 7 columns, but after reading each row the program prints a redundant None value. I can't understand the problem here. Can anyone please have a look at it?
PS: I am using this dataset
Your ahencrypt function prints things and returns None. Your code at the end prints the return value of ahencrypt, which is None. You can just remove the print in the last line of your code.
Your ahencrypt function prints a line and implicitly returns a None. You then print that None in this loop:
for row in read_data('dataset/car.data'):
    print ahencrypt(row)
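A minimal fix, sketched with an in-memory CSV so it runs standalone (Python 3 syntax; the sample row and the idea of simply returning the fields are assumptions, not the original assignment's requirements):

```python
import csv
import io

def ahencrypt(row):
    # Return the fields instead of printing inside the function,
    # so the caller's print shows the row rather than None.
    return list(row)

# One sample row in the shape of car.data, simulated in memory.
data = io.StringIO("vhigh,vhigh,2,2,small,low,unacc\n")
for row in csv.reader(data):
    print(ahencrypt(row))
```

Either return a value and print the result, or print inside the function and drop the outer print; doing both is what produced the extra None.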
For the last part of a Python assignment, I need to iterate through the lists of lists and print to the console the rows with category 'Hardware'. This is the csv file:
Hardware,Hammer,10,10.99
Hardware,Wrench,12,5.75
Food,Beans,32,1.99
Paper,Plates,100,2.59
The following is the code for the last part, which simply opens the file to be read and passed into a list:
import csv

def read_text():
    with open("products.csv", "r", newline="") as file:
        reader = csv.reader(file)
        temp_prod = list(reader)
I'm having an issue with coding the right for loop to pull out the 'Hardware' rows. Help would be appreciated! Thanks.
When you are stuck, try doing a lot of printing to see what is going on. You were at a good point: temp_prod gives you a list of lists. You have to iterate through this list of lists and check whether the first item is equal to your search criterion.
Your code should look like this:
def read_text(my_file):
    import csv
    with open(my_file, "r", newline="") as file:
        reader = csv.reader(file)
        temp_prod = list(reader)
        for row in temp_prod:
            if row[0] == "Hardware":
                print(row)
Call the function like this:
read_text("filepath.csv")
Please note that this function (as you requested) doesn't return anything, it only prints.
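If a return value is wanted later, a variant along these lines could collect the matches instead of printing them (the function name and the in-memory sample here are mine, not the original assignment's):

```python
import csv
import io

def find_rows(csv_text, category):
    # Return the matching rows rather than printing, so the caller
    # can print, count, or process them further.
    reader = csv.reader(io.StringIO(csv_text))
    return [row for row in reader if row and row[0] == category]

sample = ("Hardware,Hammer,10,10.99\n"
          "Food,Beans,32,1.99\n"
          "Hardware,Wrench,12,5.75\n")
for row in find_rows(sample, "Hardware"):
    print(row)
```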
I am trying to build a small crawler to grab Twitter handles. I cannot for the life of me get around an error I keep having. It seems to be the exact same error for re.search, re.findall and re.finditer. The error is TypeError: expected string or buffer.
The data is structured as followed from the CSV:
30,"texg",#handle,,,,,,,,
Note that print row works fine; the test = re.... line errors out before getting to the print line.
import csv
import re

def read_urls(filename):
    f = open(filename, 'rb')
    reader = csv.reader(f)
    data = open('Data.txt', 'w')
    dict1 = {}
    for row in reader:
        print row
        test = re.search(r'#(\w+)', row)
        print test.group(1)
Also note, I have been working through this problem in a number of different threads, but none of the solutions explained have worked. It just seems like re isn't able to read the row value...
Take a look at your code carefully:
for row in reader:
    print row
    test = re.search(r'#(\w+)', row)
    print test.group(1)
Note that row is a list, not a string, and according to the re.search documentation:
Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
That means you should create a string and check whether test is not None:
for row in reader:
    print row
    test = re.search(r'#(\w+)', ''.join(row))
    if test:
        print test.group(1)
Also, open the file without the b flag:
f = open(filename, 'r')
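Joining the whole row works, but if the handle always sits in one column it may be clearer to search that field directly. A sketch in Python 3 syntax (treating column index 2 as the handle column is an assumption based on the sample line):

```python
import csv
import io
import re

# The sample line from the question, simulated in memory.
data = io.StringIO('30,"texg",#handle,,,,,,,,\n')
for row in csv.reader(data):
    # row is a list of strings; re.search needs one string, so we
    # pass the single field that holds the handle.
    match = re.search(r'#(\w+)', row[2])
    if match:
        print(match.group(1))
```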
You're trying to read a list after you run the file through the reader.
import re

f = open('file1.txt', 'r')
for row in f:
    print(row)
    test = re.search(r'#(\w+)', row)
    print(test.group(1))
f.close()
https://repl.it/JCng/1
If you want to use the CSV reader, you can loop through the list.
My .csv has two columns: SKU and LongDesc. I want to search the two rows below, specifically in the LongDesc column, for specific strings. If a string is found, a variable will increase; it increases only once for each time a string is found. If the string is not found, the variable will remain the same.
I started out with this:
import csv

answer = 0
with open('sku.csv') as f:
    reader = csv.reader(f)
    for row in reader:
        def test(x):
            while 'x' in row:
                answer == answer+1
                return answer
        print test('e')
Trying to search for the string "e" within the file. However, the only result I get is zero. I'm clearly not coding correctly to have the reader check each row, and it's not searching for the right string.
import csv

def main():
    find_text = 'this'
    with open('sku.csv') as f:
        reader = csv.reader(f)
        found = sum(1 for sku, descr in reader if descr.find(find_text) > -1)
        print found

if __name__ == "__main__":
    main()
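The same counting idea as a self-contained sketch (the function name and sample rows here are made up for illustration), using the in operator, which reads a little more directly than find(...) > -1:

```python
import csv
import io

def count_matches(csv_text, find_text):
    # Count rows whose description column contains the search text.
    reader = csv.reader(io.StringIO(csv_text))
    return sum(1 for sku, descr in reader if find_text in descr)

sample = ("A100,this widget\n"
          "B200,that gadget\n"
          "C300,this gizmo\n")
print(count_matches(sample, "this"))  # → 2
```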
I have some code that is meant to convert CSV files into tab delimited files. My problem is that I cannot figure out how to write the correct values in the correct order. Here is my code:
for file in import_dir:
    data = csv.reader(open(file))
    fields = data.next()
    new_file = export_dir+os.path.basename(file)
    tab_file = open(export_dir+os.path.basename(file), 'a+')
    for row in data:
        items = zip(fields, row)
        item = {}
        for (name, value) in items:
            item[name] = value.strip()
        tab_file.write(item['name']+'\t'+item['order_num']...)
        tab_file.write('\n'+item['amt_due']+'\t'+item['due_date']...)
Now, since both my write statements are in the for row in data loop, my headers are being written multiple times over. If I outdent the first write statement, I'll have an obvious formatting error. If I move the second write statement above the first and then outdent, my data will be out of order. What can I do to make sure that the first write statement gets written once as a header, and the second gets written for each line in the CSV file? How do I extract the first 'write' statement outside of the loop without breaking the dictionary? Thanks!
The csv module contains methods for writing as well as reading, making this pretty trivial:
import csv

with open("test.csv") as file, open("test_tab.csv", "w") as out:
    reader = csv.reader(file)
    writer = csv.writer(out, dialect=csv.excel_tab)
    for row in reader:
        writer.writerow(row)
No need to do it all yourself. Note my use of the with statement, which should always be used when working with files in Python.
Edit: Naturally, if you want to select specific values, you can do that easily enough. You appear to be making your own dictionary to select the values - again, the csv module provides DictReader to do that for you:
import csv

with open("test.csv") as file, open("test_tab.csv", "w") as out:
    reader = csv.DictReader(file)
    writer = csv.writer(out, dialect=csv.excel_tab)
    for row in reader:
        writer.writerow([row["name"], row["order_num"], ...])
As kirelagin points out in the comments, writer.writerows() could also be used, here with a generator expression:
writer.writerows([row["name"], row["order_num"], ...] for row in reader)
Extract the code that writes the headers outside the main loop, in such a way that it only gets written exactly once at the beginning.
Also, consider using the CSV module for writing CSV files (not just for reading), don't reinvent the wheel!
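The shape being suggested, sketched with made-up rows and an in-memory output so it runs standalone: pull the header out of the data, write it once before the loop, then write one line per data row.

```python
import csv
import io

rows = [["name", "order_num"],
        ["alice", "17"],
        ["bob", "42"]]

out = io.StringIO()
writer = csv.writer(out, delimiter="\t")

header, data = rows[0], rows[1:]
writer.writerow(header)        # header written exactly once
for row in data:
    writer.writerow(row)       # one write per data row

print(out.getvalue())
```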
Ok, so I figured it out, but it's not the most elegant solution. Basically, I just ran the first loop, wrote to the file, then ran it a second time and appended the results. See my code below. I would love any input on a better way to accomplish what I've done here. Thanks!
for file in import_dir:
    data = csv.reader(open(file))
    fields = data.next()
    new_file = export_dir+os.path.basename(file)
    tab_file = open(export_dir+os.path.basename(file), 'a+')
    for row in data:
        items = zip(fields, row)
        item = {}
        for (name, value) in items:
            item[name] = value.strip()
        tab_file.write(item['name']+'\t'+item['order_num']...)
    tab_file.close()

for file in import_dir:
    data = csv.reader(open(file))
    fields = data.next()
    new_file = export_dir+os.path.basename(file)
    tab_file = open(export_dir+os.path.basename(file), 'a+')
    for row in data:
        items = zip(fields, row)
        item = {}
        for (name, value) in items:
            item[name] = value.strip()
        tab_file.write('\n'+item['amt_due']+'\t'+item['due_date']...)
    tab_file.close()
Apparently some csv output implementation somewhere truncates field separators from the right on the last row and only the last row in the file when the fields are null.
Example input csv, fields 'c' and 'd' are nullable:
a|b|c|d
1|2||
1|2|3|4
3|4||
2|3
In something like the script below, how can I tell whether I am on the last line so I know how to handle it appropriately?
import csv

reader = csv.reader(open('somefile.csv'), delimiter='|', quotechar=None)
header = reader.next()
for line_num, row in enumerate(reader):
    assert len(row) == len(header)
    ....
Basically you only know you've run out after you've run out. So you could wrap the reader iterator, e.g. as follows:
def isLast(itr):
    old = itr.next()
    for new in itr:
        yield False, old
        old = new
    yield True, old
and change your code to:
for line_num, (is_last, row) in enumerate(isLast(reader)):
    if not is_last: assert len(row) == len(header)
etc.
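The wrapper works on any iterator, so it is easy to check against a plain list (spelled with Python 3's built-in next here, where the original uses Python 2's itr.next()):

```python
def is_last(itr):
    # Hold one item back so that, when yielding it, we already
    # know whether anything follows it.
    old = next(itr)
    for new in itr:
        yield False, old
        old = new
    yield True, old

print(list(is_last(iter(["a", "b", "c"]))))
# → [(False, 'a'), (False, 'b'), (True, 'c')]
```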
I am aware it is an old question, but I came up with a different answer than the ones presented. The reader object already increments the line_num attribute as you iterate through it. Then I get the total number of lines at first using row_count, then I compare it with the line_num.
import csv

def row_count(filename):
    with open(filename) as in_file:
        return sum(1 for _ in in_file)

in_filename = 'somefile.csv'
reader = csv.reader(open(in_filename), delimiter='|')
last_line_number = row_count(in_filename)
for row in reader:
    if last_line_number == reader.line_num:
        print "It is the last line: %s" % row
If you have an expectation of a fixed number of columns in each row, then you should be defensive against:
(1) ANY row being shorter -- e.g. a writer (SQL Server / Query Analyzer IIRC) may omit trailing NULLs at random; users may fiddle with the file using a text editor, including leaving blank lines.
(2) ANY row being longer -- e.g. commas not quoted properly.
You don't need any fancy tricks. Just an old-fashioned if-test in your row-reading loop:
for row in csv.reader(...):
    ncols = len(row)
    if ncols != expected_cols:
        appropriate_action()
If you want to get exactly the last row, try this code:
with open("\\".join([myPath, files]), 'r') as f:
    print f.readlines()[-1]  # or your own manipulations
If you want to continue working with values from row, do the following:
f.readlines()[-1].split(",")[0]  # this would let you get columns by their index
Just extend the row to the length of the header:
for line_num, row in enumerate(reader):
    while len(row) < len(header):
        row.append('')
    ...
Could you not just catch the error when the csv reader reads the last line, in a

try:
    ... do your stuff here ...
except StopIteration:
    ...

block?
See the following Python code on Stack Overflow for an example of how to use try/except: Python CSV DictReader/Writer issues
If you use for row in reader:, it will just stop the loop after the last item has been read.
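To illustrate that point: a for loop consumes StopIteration internally, so you only see it (and need to catch it) when calling next() by hand. A Python 3 sketch with an in-memory file:

```python
import csv
import io

reader = csv.reader(io.StringIO("a,b\nc,d\n"))
rows = []
try:
    while True:
        rows.append(next(reader))  # raises StopIteration after the last row
except StopIteration:
    pass                           # a for loop would handle this for you

print(rows)  # → [['a', 'b'], ['c', 'd']]
```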