My .csv has two columns: SKU and LongDesc. I want to search the two rows below, specifically in the LongDesc column, for specific strings. If a string is found, a variable will increase. The variable will increase only once each time a string is found. If the string is not found, the variable will remain the same.
I started out with this:
import csv
answer=0
with open('sku.csv') as f:
    reader = csv.reader(f)
    for row in reader:
        def test(x):
            while 'x' in row:
                answer== answer+1
            return answer
print test('e')
I'm trying to search for the string "e" within the file. However, the only result I get is zero. I'm clearly not coding this correctly: the reader isn't checking each row, and it isn't searching for the right string.
import csv
def main():
    find_text = 'this'
    with open('sku.csv') as f:
        reader = csv.reader(f)
        # count each row whose description contains find_text, once per row
        found = sum(1 for sku,descr in reader if descr.find(find_text) > -1)
    print(found)
if __name__=="__main__":
    main()
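The same counting idea can be sketched as a small reusable function. This is only an illustration, written for Python 3: the name count_rows_containing and the in-memory sample data are assumptions, not part of the original post, and the sample stands in for sku.csv so the snippet can be run without a file.

```python
import csv
import io


def count_rows_containing(rows, text):
    """Count rows whose second column (LongDesc) contains `text`.

    Each row adds at most 1 to the total, however often `text` occurs in it.
    """
    return sum(1 for row in rows if len(row) > 1 and text in row[1])


# Hypothetical sample data standing in for sku.csv
sample = io.StringIO("SKU,LongDesc\n1,red widget\n2,blue gadget\n3,green widget\n")
reader = csv.reader(sample)
next(reader)  # skip the header row
widget_count = count_rows_containing(reader, "widget")
print(widget_count)
```

Using `text in row[1]` is equivalent to the `descr.find(find_text) > -1` test above; the length check just skips blank or short rows instead of raising an error.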
I am trying to build a small crawler to grab Twitter handles. I cannot for the life of me get around an error I keep having. It seems to be the exact same error for re.search, re.findall and re.finditer. The error is TypeError: expected string or buffer.
The data from the CSV is structured as follows:
30,"texg",#handle,,,,,,,,
Note that the print row works fine, the test = re.... errors out before getting to the print line.
def read_urls(filename):
    f = open(filename, 'rb')
    reader = csv.reader(f)
    data = open('Data.txt', 'w')
    dict1 = {}
    for row in reader:
        print row
        test = re.search(r'#(\w+)', row)
        print test.group(1)
Also note that I have been working through this problem in a number of different threads, but none of the solutions explained there has worked. It just seems like re isn't able to read the row...
Take a look at your code carefully:
for row in reader:
    print row
    test = re.search(r'#(\w+)', row)
    print test.group(1)
Note that row is a list, not a string, and according to the re.search documentation:
Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
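That means you should build a string from the row and check that test is not None before using it:
for row in reader:
    print row
    test = re.search(r'#(\w+)', ''.join(row))
    if test:
        print test.group(1)
Also, if you are on Python 3, open the file in text mode, without the b flag (on Python 2 the csv documentation recommends keeping 'rb'):
f = open(filename, 'r')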
You're trying to read a list after you run the file through the reader.
import re
f = open('file1.txt', 'r')
for row in f:
    print(row)
    test = re.search(r'#(\w+)', row)
    if test:  # re.search returns None when the line has no match
        print(test.group(1))
f.close()
https://repl.it/JCng/1
If you want to use the CSV reader, you can loop through the list.
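A minimal sketch of that second option, assuming Python 3. The function name find_handles is made up for illustration, and the sample line is the one quoted in the question, fed from memory instead of a file.

```python
import csv
import io
import re

HANDLE = re.compile(r'#(\w+)')


def find_handles(lines):
    """Return every #handle found in any field of any CSV row."""
    handles = []
    for row in csv.reader(lines):
        for field in row:  # row is a list, so search each string inside it
            match = HANDLE.search(field)
            if match:  # search returns None when the field has no handle
                handles.append(match.group(1))
    return handles


# Sample line taken from the question
sample = io.StringIO('30,"texg",#handle,,,,,,,,\n')
handles = find_handles(sample)
print(handles)
```

Searching field by field avoids joining the row into one string, so a match can never span two neighbouring columns by accident.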
I have this code which I am using to read my dataset from csv file with the CSV module:
keys = []
def ahencrypt(row):
    for field in row:
        print field
def read_data(filename):
    import csv
    with open(filename, "rb") as csvfile:
        datareader = csv.reader(csvfile)
        for row in datareader:
            yield row
for row in read_data('dataset/car.data'):
    print ahencrypt(row)
My data has just 7 columns, but after reading each row the program gives me a redundant None value. I can't understand the problem here. Can anyone please have a look at it?
PS: I am using this dataset
Your ahencrypt function prints things and returns None. Your code at the end prints the return value of ahencrypt, which is None. You can just remove the print in the last line of your code.
Your ahencrypt function prints each field and then implicitly returns None. You then print that None in this loop:
for row in read_data('dataset/car.data'):
    print ahencrypt(row)
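To make the behaviour concrete, here is a small sketch for Python 3. The rows are made-up sample data, not the asker's dataset. Calling the function directly prints only the fields, while its return value is None.

```python
def ahencrypt(row):
    # Prints each field; there is no return statement, so a call evaluates to None
    for field in row:
        print(field)


# Made-up sample rows standing in for the asker's dataset
rows = [["a", "b", "c"], ["d", "e", "f"]]

for row in rows:
    ahencrypt(row)  # call the function directly instead of printing its result

result = ahencrypt(rows[0])
print(result is None)
```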
I have a csv file in which I need to add a zero in front of a number if it's shorter than 4 digits.
I only have to update a particular row:
import csv
f = open('csvpatpos.csv')
csv_f = csv.reader(f)
for row in csv_f:
    print row[5]
Then I want to parse through that row and add a 0 to the front of any number that is shorter than 4 digits, and write the adjusted data to a new csv file.
You want to use string formatting for these things:
>>> '{:04}'.format(99)
'0099'
Format String Syntax documentation
When you think about parsing, you need to think about either regex or pyparsing. In this case, a regex handles the parsing quite easily.
But that's not all: once you are able to parse the numbers, you need to zero-fill them. For that, use str.format to pad and justify each string accordingly.
Consider your string
st = "parse through that row and add a 0 to the front of any number that is shorter than 4 digits."
For the string above, you can do something like this:
Implementation
import re
# (\d+) captures each whole run of digits, so longer numbers are not split apart
parts = re.split(r"(\d+)", st)
''.join("{:>04}".format(elem) if elem.isdigit() else elem for elem in parts)
Output
'parse through that row and add a 0000 to the front of any number that is shorter than 0004 digits.'
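As an alternative sketch, not the answerer's code, the split-and-join step can be replaced by a single re.sub call with a replacement function. The name zero_fill_numbers and the width parameter are assumptions made for this example.

```python
import re


def zero_fill_numbers(text, width=4):
    """Left-pad every run of digits in `text` with zeros up to `width` characters."""
    # The lambda receives each match object; zfill leaves longer numbers untouched
    return re.sub(r'\d+', lambda m: m.group(0).zfill(width), text)


st = "parse through that row and add a 0 to the front of any number that is shorter than 4 digits."
result = zero_fill_numbers(st)
print(result)
```

Numbers that already have four or more digits pass through unchanged, because str.zfill never truncates.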
The following code reads the given csv file, iterates through each row and each item in each row, zero-fills every item to 4 characters, and writes the result to a new csv file.
import csv
import os
f = open('csvpatpos.csv')
# open temp .csv file for output
out = open('csvtemp.csv','w')
csv_f = csv.reader(f)
for row in csv_f:
    # create a temporary list for this row
    temp_row = []
    # iterate through all of the items in the row
    for item in row:
        # add the zero filled value of each temporary item to the list
        temp_row.append(item.zfill(4))
    # join the current temporary list with commas and write it to the out file
    out.write(','.join(temp_row) + '\n')
out.close()
f.close()
Your results will be in csvtemp.csv. If you want to save the data under the original filename, add the following code to the end of the script:
# remove original file
os.remove('csvpatpos.csv')
# rename temp file to original file name
os.rename('csvtemp.csv','csvpatpos.csv')
Pythonic Version
The code above is deliberately verbose to make it easy to follow. Here is the same code refactored to be more Pythonic:
import csv
new_rows = []
with open('csvpatpos.csv','r') as f:
    csv_f = csv.reader(f)
    for row in csv_f:
        row = [ x.zfill(4) for x in row ]
        new_rows.append(row)
with open('csvpatpos.csv','wb') as f:
    csv_f = csv.writer(f)
    csv_f.writerows(new_rows)
I'll leave you with two hints:
s = "486"
s.isdigit() == True
for finding what things are numbers.
And
s = "486"
s.zfill(4) == "0486"
for filling in zeroes.
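Putting the two hints together, a minimal sketch might look like the following. The function name pad_numbers and the sample row are made up for illustration; only fields that consist entirely of digits are padded, and everything else is left as it is.

```python
def pad_numbers(row, width=4):
    """Zero-fill the purely numeric fields of a row; leave other fields unchanged."""
    return [item.zfill(width) if item.isdigit() else item for item in row]


# Made-up sample row: numbers of various lengths, a word, and an empty field
padded_row = pad_numbers(["486", "abc", "12345", "7", ""])
print(padded_row)
```

Note that isdigit() is False for the empty string and for values such as "33.0" or "-5", so those fields pass through untouched.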
I'm trying to determine the number of columns that are present in a CSV file in python v2.6. This has to be in general, as in, for any input that I pass, I should be able to obtain the number of columns in the file.
Sample input file: love hurt hit
Other input files: car speed beforeTune afterTune repair
So far, what I have tried to do is read the file (with lots of rows), get the first row, and then count the number of words in the first row. Delimiter is ,. I ran into a problem when I try to split headings based on the sample input, and next len(headings) gives me 14 which is wrong as it should give me 3. Any ideas? I am a beginner.
with open(filename1, 'r') as f1:
    csvlines = csv.reader(f1, delimiter=',')
    for lineNum, line in enumerate(csvlines):
        if lineNum == 0:
            #colCount = getColCount(line)
            headings = ','.join(line) # gives me `love, hurt, hit`
            print len(headings) # gives me 14; I need 3
        else:
            a.append(line[0])
            b.append(line[1])
            c.append(line[2])
len(headings) is 14 because headings is a string, so len counts its characters.
The len you want is of line, which is a list:
print len(line)
This outputs the number of columns, rather than the number of characters.
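A short sketch of the same idea for Python 3, reading only the first row. The function name column_count is an assumption for this example, and the headings come from the sample inputs in the question, supplied from memory instead of a file.

```python
import csv
import io


def column_count(lines, delimiter=','):
    """Return the number of fields in the first row, or 0 if there are no rows."""
    reader = csv.reader(lines, delimiter=delimiter)
    first_row = next(reader, [])  # the default avoids StopIteration on empty input
    return len(first_row)


n_small = column_count(io.StringIO("love,hurt,hit\n1,2,3\n"))
n_large = column_count(io.StringIO("car,speed,beforeTune,afterTune,repair\n"))
print(n_small, n_large)
```

Letting csv.reader do the splitting also keeps the count correct when a quoted field contains the delimiter.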
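# old school
import csv
c=0
field={}
with open('csvmsdos.csv', 'r') as csvFile:
    reader = csv.reader(csvFile)
    for row in reader:
        field[c]=row  # store each row under its index
        print(field[c])
        c=c+1
columns=len(field[0])  # number of fields in the first row
rows=len(field)  # number of rows read
# no explicit close is needed: the with block closes the file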
A simple solution:
with open(filename1) as file:
    # for each row in a given file
    for row in file:
        # split that row into list elements
        # using comma (",") as a separator,
        # count the elements and print
        print(len(row.split(",")))
        # break out of the loop after
        # first iteration
        break
I have a .xls file that I convert to .csv. I then want to read this .csv until I reach one specific line that contains the word clientegen, take that row, and put it in an array.
This is my code so far:
import xlrd
import csv
def main():
    print "Converts xls to csv and reads csv"
    wb = xlrd.open_workbook('ejemplo.xls')
    sh = wb.sheet_by_name('Hoja1')
    archivo_csv = open('fichero_csv.csv', 'wb')
    wr = csv.writer(archivo_csv, quoting=csv.QUOTE_ALL)
    for rownum in xrange(sh.nrows):
        wr.writerow(sh.row_values(rownum))
    archivo_csv.close()
    f = open('fichero_csv.csv', 'r')
    for lines in f:
        print lines
if __name__ == '__main__':
    main()
This prints:
[... a lot of more stuff ...]
"marco 4","","","","","","","","","","","","","","",""
"","","","","","","","","","","","","","","",""
"","","","","","","","","","","","","","","",""
"clientegen","maier","embega","Jegan ","tapa pure","cil HUF","carcHUF","tecla NSS","M1 NSS","M2 nss","M3 nss","doble nss","tapon","sagola","clip volvo","pillar"
"pz/bast","33.0","40.0","34.0","26.0","80.0","88.0","18.0","16.0","8.0","6.0","34.0","252.0","6.0","28.0","20.0"
"bast/Barra","5.0","3.0","6.0","8.0","10.0","4.0","10.0","10.0","10.0","10.0","8.0","4.0","6.0","10.0","6.0"
[... a lot of more stuff ...]
What I want to do is take that clientegen line and save the contents of the row in a new string array, named finalarray for example:
finalarray = ["maier", "embega", "Jegan", "tapa pure", "cil HUF", "carcHUF", "tecla NSS", "M1 NSS", "M2 nss", "M3 nss", "doble nss", "tapon", "sagola", "clip volvo", "pillar"]
I'm not very familiar with reading and writing files in Python, so I would appreciate it if someone could give me a hand finding that line, getting those values and putting them in an array. Thanks in advance.
If you swap this for loop out for your for loop, it should do the trick:
for rownum in xrange(sh.nrows):
    row = sh.row_values(rownum)
    if row[0] == "clientegen": # Check if "clientegen" is the first element of the row
        finalarray = row[1:] # If so, copy the values after the label into `finalarray`
    wr.writerow(row)
If there will ever be more than one "clientegen" line, we can adjust this code to save all of them.
If you are just looking for the line that contains clientegen, then you could try:
import csv
finalarray = list()
with open("fichero_csv.csv") as f:
    for row in csv.reader(f): #loop through all the lines, parsed into fields
        if "clientegen" in row: #check to see if your word is one of the fields
            finalarray = row #if so, put the list of fields in your finalarray
            break #stop checking any further
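For a version that can be tried without the spreadsheet, here is a hedged sketch for Python 3. The function name find_row is made up, and the sample is a shortened excerpt of the output shown in the question, supplied from memory. It drops the leading label and strips stray whitespace such as the trailing space in "Jegan ".

```python
import csv
import io


def find_row(lines, key):
    """Return the values following `key` in the first row that starts with it, or None."""
    for row in csv.reader(lines):
        if row and row[0] == key:
            # Skip the label itself and trim surrounding whitespace from each value
            return [value.strip() for value in row[1:]]
    return None


# Shortened excerpt of the CSV output from the question
sample = io.StringIO(
    '"marco 4","","",""\n'
    '"clientegen","maier","Jegan ","tapa pure"\n'
    '"pz/bast","33.0","40.0","34.0"\n'
)
finalarray = find_row(sample, "clientegen")
print(finalarray)
```

Returning None when the label is missing lets the caller distinguish "not found" from a row that has no values after the label.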