Multiple passes through csv.reader in python - python

I am trying to implement a nested "for" loop to search across CSV files: a name found in one CSV file should be searched for in the other file. Here is a code example:
import csv
import re

# Open the input file
with open("Authentication.csv", "r") as citiAuthen:
    with open("Authorization.csv", "r") as citiAuthor:
        # Set up CSV reader and process the header
        csvAuthen = csv.reader(citiAuthen, quoting=csv.QUOTE_ALL, skipinitialspace=True)
        headerAuthen = next(csvAuthen)
        userIndex = headerAuthen.index("'USERNAME'")
        statusIndex = headerAuthen.index("'STATUS'")

        csvAuthor = csv.reader(citiAuthor)
        headerAuthor = next(csvAuthor)
        userAuthorIndex = headerAuthor.index("'USERNAME'")
        iseAuthorIndex = headerAuthor.index("'ISE_NODE'")

        # Make an empty list
        userList = []
        usrNumber = 0

        # Loop through the authen file and build a list of
        for row in csvAuthen:
            user = row[userIndex]
            #status = row[statusIndex]
            #if status == "'Pass'" :
            for rowAuthor in csvAuthor:
                userAuthor = rowAuthor[userAuthorIndex]
                print userAuthor
What is happening is that "print userAuthor" makes just one pass, while it should make as many passes as there are rows in csvAuthen.
What am I doing wrong? Any help is really appreciated.

You're reading both files line by line from storage. When you iterate over csvAuthor the first time, the file pointer ends up at the end of the file after the search. The next search starts at the end of the file and returns immediately. You would need to reset the file pointer to the beginning of the file (and skip the header again) before each search. It is probably better just to read both files into memory before you start searching them.
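For example, a minimal sketch of the read-into-memory approach, reusing the column setup from the question (the comparison in the inner loop is only an illustration of where the matching would go):

import csv

with open("Authentication.csv", "r") as citiAuthen, open("Authorization.csv", "r") as citiAuthor:
    csvAuthen = csv.reader(citiAuthen, quoting=csv.QUOTE_ALL, skipinitialspace=True)
    headerAuthen = next(csvAuthen)
    userIndex = headerAuthen.index("'USERNAME'")

    csvAuthor = csv.reader(citiAuthor)
    headerAuthor = next(csvAuthor)
    userAuthorIndex = headerAuthor.index("'USERNAME'")

    # read the authorization rows into memory once so they can be scanned repeatedly
    authorRows = list(csvAuthor)

    for row in csvAuthen:
        user = row[userIndex]
        for rowAuthor in authorRows:
            if rowAuthor[userAuthorIndex] == user:
                print(rowAuthor)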

Related

For loop isn't working in python 3

I'm trying to save a file from a URL into a folder on my computer, but I have 732 URLs (each of which, when saved, gives experimental data) in a list. I'm trying to run a for loop over all those URLs to save each data set into its own file. This is what I'm doing right now:
for i in ExperimentURLs:
    myurl123 = str(i)
    myreq = urllib.request.urlopen(myurl123)
    mydata = myreq.read()
    with open('/Users/lauren/Desktop/IDData/file', 'wb') as ofile:
        ofile.write(mydata)
ExperimentURLs is my list of URLs, but I don't know how to handle the for loop to save each data set into a new file. Currently, this code only writes a single experiment's data into a file and stops there. If I try to save it to a different file name, it takes a different experiment's data and saves that to the file. Help?
First, you need to automatically generate a new output file name every time through the loop. I'll give you the trivial version below. Also, note that the URLs are already strings; you don't have to convert them.
pos = 0
for myurl123 in ExperimentURLs:
    myreq = urllib.request.urlopen(myurl123)
    mydata = myreq.read()
    out_file = '/Users/lauren/Desktop/IDData/file' + str(pos)
    with open(out_file, 'wb') as ofile:
        ofile.write(mydata)
    pos += 1
Does that solve your problem?
BTW, you can do the two iterations in parallel with
for i, myurl123 in enumerate(ExperimentURLs):
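For instance, a small sketch of the same loop using enumerate, assuming ExperimentURLs is the list from the question:

import urllib.request

for pos, myurl123 in enumerate(ExperimentURLs):
    myreq = urllib.request.urlopen(myurl123)
    mydata = myreq.read()
    # enumerate supplies the counter, so no manual pos += 1 is needed
    out_file = '/Users/lauren/Desktop/IDData/file' + str(pos)
    with open(out_file, 'wb') as ofile:
        ofile.write(mydata)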
Your mistake is simply at the point of writing the files, not that the for loop is not working. You are writing to the same file again and again. Here is a modified version using requests. All you need to do is change the file name when saving.
import requests

ExperimentURLs = [
    "https://www.google.com",
    "https://www.yahoo.com"
]

counter = 0
for i in ExperimentURLs:
    myurl123 = str(i)
    r = requests.get(myurl123)
    mydata = r.text.encode('utf-8').strip()
    fileName = counter
    # open in binary mode, since mydata is bytes after .encode();
    # the results/ folder must already exist
    with open("results/" + str(fileName) + ".html", 'wb') as ofile:
        ofile.write(mydata)
    counter += 1

Reading CSV file with python

import os
import time

filename = 'NTS.csv'
mycsv = open(filename, 'r')
mycsv.seek(0, os.SEEK_END)
while 1:
    time.sleep(1)
    where = mycsv.tell()
    line = mycsv.readline()
    if not line:
        mycsv.seek(where)
    else:
        arr_line = line.split(',')
        var3 = arr_line[3]
        print(var3)
I have this Python code which reads the values from a csv file every time a new line is printed to the csv by an external program. My problem is that the csv file is periodically completely rewritten, and then Python stops reading the new lines. My guess is that Python is stuck on some line number and the new update can add maybe 50 lines more or less. So, for example, Python is now waiting for a new line at line 70 while the new line has come at line 95. I think the solution is to have mycsv.seek(0, os.SEEK_END) updated, but I'm not sure how to do that.
What you want to do is difficult to accomplish without rewinding the file every time to make sure that you are truly on the last line. If you know approximately how many characters there are on each line, then there is a shortcut you could take using mycsv.seek(-end_buf, os.SEEK_END), as outlined in this answer. So your code could work somehow like this:
avg_len = 50  # use an appropriate number here
end_buf = 3 * avg_len // 2  # integer offset, in bytes, to back up from the end

filename = 'NTS.csv'
# binary mode is required to seek relative to the end of the file in Python 3
mycsv = open(filename, 'rb')
mycsv.seek(-end_buf, os.SEEK_END)
last = mycsv.readlines()[-1].decode()
while 1:
    time.sleep(1)
    mycsv.seek(-end_buf, os.SEEK_END)
    line = mycsv.readlines()[-1].decode()
    if not line == last:
        arr_line = line.split(',')
        var3 = arr_line[3]
        print(var3)
        last = line  # remember the new last line so it is only printed once
Here, in each iteration of the while loop, you seek to a position close to the end of the file, just far back enough that you know for sure the last line will be contained in what remains. Then you read in all the remaining lines (this will probably include a partial second- or third-to-last line) and check whether the last of these lines is different from what you had before.
There is a simpler way of reading lines in your program. Instead of trying to use seek to get what you need, try using readlines on the file object mycsv.
You can do the following:
mycsv = open('NTS.csv', 'r')
csv_lines = mycsv.readlines()
for line in csv_lines:
    arr_line = line.split(',')
    var3 = arr_line[3]
    print(var3)

Python function only runs once

I am pre-processing a csv file and want to output 3 dictionaries comprised of the csv file data filtered by a field.
The set-up is:
import csv
from m_functions import region_goals
csvFile = # a file path
mnDict = dict()
nlDict = dict()
neDict = dict()
# READ CSV
weekList = csv.reader(open(csvFile))
# CREATE DICTIONARY FOR THIS WEEK AND REGION
region_goals(weekList, "STR1", neDict)
region_goals(weekList, "STR2", mnDict)
region_goals(weekList, "STR3", nlDict)
The region_goals function is:
def region_goals(csv, region, region_dictionary):
    firstline = True
    for row in csv:
        if firstline:
            firstline = False
            continue
        if row[14] == region:
            if row[16] not in region_dictionary:
                region_dictionary[row[16]] = float(row[6])
            else:
                region_dictionary[row[16]] += float(row[6])
        else:
            continue
    return region_dictionary
The output is always as expected for the first use of the function. The next two times I use the function, empty dictionaries are returned.
I'm sure I'm missing something small, but I am new to Python and have been struggling to fix this for a while. Thanks in advance for your responses.
After the first pass, you're at the end of your CSV file and there's nothing left to read, so you need to re-open it.
Also, it's not the best idea to modify objects in place with functions; it's better to return a new object each time.
import csv
from m_functions import region_goals

csvFile = # a file path
regions = ['STR1', 'STR2', 'STR3']

for region in regions:
    # re-open the file on every pass so the reader starts from the top
    with open(csvFile) as f:
        weekList = csv.reader(f)
        region_dict = dict()
        output = region_goals(weekList, region, region_dict)
Your title is wrong in the sense that the function obviously is executed multiple times; otherwise you would not get back empty dicts. The reason for the empty dicts is that csv.reader already returns an object that behaves like an iterator, so you can iterate over it only once. The next two calls will not get any more data. You have to call csv.reader again, or you have to read the data into memory and process it three times.
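For example, a minimal sketch of the read-into-memory variant, keeping region_goals unchanged (the file path is a placeholder for the one elided in the question):

import csv
from m_functions import region_goals

csvFile = "week.csv"  # placeholder path

# materialise the rows once so they can be iterated three times
with open(csvFile) as f:
    weekList = list(csv.reader(f))

neDict = region_goals(weekList, "STR1", dict())
mnDict = region_goals(weekList, "STR2", dict())
nlDict = region_goals(weekList, "STR3", dict())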
You have already read the file after the first function call; you could do a seek(0) on the opened file. Try something like this:
# READ CSV
f = open(csvFile)
weekList = csv.reader(f)
region_goals(weekList, "STR1", neDict)
f.seek(0)
region_goals(weekList, "STR2", mnDict)
EDIT:
If the file is not too big and/or you can handle more memory usage, you could do something like:
# READ CSV
weekList = list(csv.reader(open(csvFile)))
And your code should work, but keep in mind that the whole file will be loaded into memory.
The best solution would be to refactor things to populate those three dicts in one pass and call that function once.
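A hedged sketch of that one-pass refactor could look like the following; the column indices and region strings come from the question, while the function name and file path are made up for illustration:

import csv

def goals_by_region(rows, regions):
    # build one dictionary per region in a single pass over the data rows
    result = {region: dict() for region in regions}
    for row in rows[1:]:  # skip the header row
        region = row[14]
        if region in result:
            key = row[16]
            result[region][key] = result[region].get(key, 0.0) + float(row[6])
    return result

with open("week.csv") as f:  # placeholder path
    rows = list(csv.reader(f))

goals = goals_by_region(rows, ["STR1", "STR2", "STR3"])
neDict, mnDict, nlDict = goals["STR1"], goals["STR2"], goals["STR3"]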
Per g.d.d.c's suggestion I modified the function to include the reader and pass the file location rather than the read-in csv.
import csv

def region_goals(csvfile, region, region_dictionary):
    weeklist = csv.reader(open(csvfile))
    firstline = True
    for row in weeklist:
        if firstline:
            firstline = False
            continue
        if row[14] == region:
            if row[16] not in region_dictionary:
                region_dictionary[row[16]] = float(row[6])
            else:
                region_dictionary[row[16]] += float(row[6])
        else:
            continue
    return region_dictionary
Thank you for all responses!

Using Python to Merge Single Line .dat Files into one .csv file

I am a beginner in the programming world and would like some tips on how to solve a challenge.
Right now I have ~10 000 .dat files each with a single line following this structure:
Attribute1=Value&Attribute2=Value&Attribute3=Value...AttributeN=Value
I have been trying to use python and the CSV library to convert these .dat files into a single .csv file.
So far I have been able to write something that reads all the files, stores the contents of each file on a new line, and substitutes the "&" with ",", but since Attribute1, Attribute2, ..., AttributeN are exactly the same for every file, I would like to make them into column headers and remove them from every other line.
Any tips on how to go about that?
Thank you!
Since you are a beginner, I prepared some code that works, and is at the same time very easy to understand.
I assume that you have all the files in the folder called 'input'. The code beneath should be in a script file next to the folder.
Keep in mind that this code should be used to understand how a problem like this can be solved. Optimisations and sanity checks have been left out intentionally.
You might additionally want to check what happens when a value is missing in some line, what happens when an attribute is missing, what happens with corrupted input, etc. :)
Good luck!
import os

# this function splits the attribute=value pairs into two lists
# the first list holds all the attributes
# the second list holds all the values
def getAttributesAndValues(line):
    attributes = []
    values = []
    # first we split the input over the &
    AtributeValues = line.split('&')
    for attrVal in AtributeValues:
        # we split the attribute=value over the '=' sign
        # the left part goes to split[0], the value goes to split[1]
        split = attrVal.split('=')
        attributes.append(split[0])
        values.append(split[1])
    # return the attributes list and values list
    return attributes, values

# test the function using the line beneath so you understand how it works
# line = "Attribute1=Value&Attribute2=Value&Attribute3=Value&AttributeN=Value"
# print(getAttributesAndValues(line))

# this function appends a single input file to the output csv file
def writeToCsv(inFile='', wfile="outFile.csv", delim=","):
    f_in = open(inFile, 'r')   # only reading the file
    f_out = open(wfile, 'a+')  # file is opened for reading and appending
    # read the whole file line by line
    lines = f_in.readlines()
    # loop through every line in the file and write its values
    for line in lines:
        line = line.strip()  # drop the trailing newline, if any
        # let's check if the output file is empty and write the headers then
        f_out.seek(0)               # peek at the first character
        first_char = f_out.read(1)
        f_out.seek(0, os.SEEK_END)  # jump back to the end before appending
        header, values = getAttributesAndValues(line)
        # we write the header only if the file is empty
        if not first_char:
            for attribute in header:
                f_out.write(attribute + delim)
            f_out.write("\n")
        # we write the values
        for value in values:
            f_out.write(value + delim)
        f_out.write("\n")
    f_in.close()
    f_out.close()

# read all the file names in the input folder
allInputFiles = os.listdir('input/')

# loop through all the files and write their values to the csv file
for singleFile in allInputFiles:
    writeToCsv('input/' + singleFile)
"but since Attribute1, Attribute2, ..., AttributeN are exactly the same for every file, I would like to make them into column headers and remove them from every other line."
input = 'Attribute1=Value1&Attribute2=Value2&Attribute3=Value3'
Once, for the first file:
','.join(k for (k,v) in map(lambda s: s.split('='), input.split('&')))
for each file's content:
','.join(v for (k,v) in map(lambda s: s.split('='), input.split('&')))
Maybe you need to trim the strings additionally; don't know how clean your input is.
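Put together, a small sketch of that approach might look like this; the input folder, file glob, and output name are assumptions:

import glob

dat_files = sorted(glob.glob('input/*.dat'))  # assumed input folder

with open('merged.csv', 'w') as out:          # assumed output file name
    for i, path in enumerate(dat_files):
        with open(path) as f:
            content = f.read().strip()
        pairs = [s.split('=') for s in content.split('&')]
        if i == 0:
            # write the header once, taken from the first file
            out.write(','.join(k for (k, v) in pairs) + '\n')
        # write the values for every file
        out.write(','.join(v for (k, v) in pairs) + '\n')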
Put the dat files in a folder called myDats. Put this script next to the myDats folder along with a file called temp.txt. You will also need your output.csv. [That is, you will have output.csv, myDats, and mergeDats.py in the same folder]
mergeDats.py
import csv
import os

g = open("temp.txt", "w")
for file in os.listdir('myDats'):
    f = open("myDats/" + file, "r")
    tempData = f.readlines()[0]
    tempData = tempData.replace("&", "\n")
    g.write(tempData)
    g.write("\n")  # make sure each file's last value ends on its own line
    f.close()
g.close()

h = open("temp.txt", "r")
arr = h.read().split("\n")
h.close()

my_dict = {}
for x in arr:
    if not x:
        continue  # skip blank lines
    temp2 = x.split("=")
    my_dict[temp2[0]] = temp2[1]

with open('output.csv', 'w') as output:  # use 'wb' in Python 2.x
    w = csv.DictWriter(output, my_dict.keys())
    w.writeheader()
    w.writerow(my_dict)

Python script not iterating through array

So, I recently got into learning Python, and at work we wanted some way to make the process of finding specific keywords in our log files easier, so it is simpler to tell which IPs to add to our block list.
I decided to write a Python script that would take in a log file and a file with a list of key terms, look for those key terms in the log file, and then write the lines matching the session IDs where a key term was found to a new file.
import sys
import time
import linecache
from datetime import datetime

def timeStamped(fname, fmt='%Y-%m-%d-%H-%M-%S_{fname}'):
    return datetime.now().strftime(fmt).format(fname=fname)

importFile = open('rawLog.txt', 'r') #pulling in log file
importFile2 = open('keyWords.txt', 'r') #pulling in keywords
exportFile = open(timeStamped('ParsedLog.txt'), 'w') #writing the parsed log

FILE = importFile.readlines()
keyFILE = importFile2.readlines()

logLine = 1 #for debugging purposes when testing
parseString = ''
holderString = ''
sessionID = []
keyWords = []
j = 0

for line in keyFILE: #go through each line in the keyFile
    keyWords = line.split(',') #add each word to the array
    print(keyWords) #for debugging purposes when testing, this DOES give all the correct results

for line in FILE:
    if keyWords[j] in line:
        parseString = line[29:35] #pulling in session ID
        sessionID.append(parseString) #saving session IDs to a list
    elif importFile == '' and j < len(keyWords): #if importFile is at end of file and we are not at the end of the array
        importFile.seek(0) #goes back to the start of the file
        j+=1 #advance the keyWords array
    logLine +=1 #for debugging purposes when testing

importFile2.close()
print(sessionID) #for debugging purposes when testing
importFile.seek(0) #goes back to the start of the file

i = 0
for line in FILE:
    if sessionID[i] in line[29:35]: #checking if the sessionID matches (doing it this way since I ran into issues where some sessionIDs matched parts of the log file that were not sessionIDs)
        holderString = line #pulling the line of log file
        exportFile.write(holderString) #writing the log file line to a new text file
        print(holderString) #for debugging purposes when testing
    if i < len(sessionID):
        i+=1

importFile.close()
exportFile.close()
It is not iterating across my keyWords list; I probably made some stupid rookie mistake, but I am not experienced enough to realize what I messed up. When I check the output, it is only searching for the first item in the keyWords list in the rawLog.txt file.
The third loop does return results based on the sessionIDs that the second loop pulls, and it does attempt to iterate (this gives an out-of-bounds exception because i ends up no longer being less than the length of the sessionID list, since sessionID only has one value).
The program does write to and name the new logfile successfully, with a DateTime followed by ParsedLog.txt.
It looks to me like your second loop needs an inner loop instead of an inner if statement. E.g.
for line in FILE:
    for word in keyWords:
        if word in line:
            parseString = line[29:35] #pulling in session ID
            sessionID.append(parseString) #saving session IDs to a list
            break # Assuming there will only be one keyword per line, else remove this
    logLine +=1 #for debugging purposes when testing

importFile2.close()
print(sessionID) #for debugging purposes when testing
Assuming I have understood correctly, that is.
If the elif is never True, you never increase j, so you either need to increment always or check that the elif statement actually ever evaluates to True:
for line in FILE:
    if keyWords[j] in line:
        parseString = line[29:35] #pulling in session ID
        sessionID.append(parseString) #saving session IDs to a list
    elif importFile == '' and j < len(keyWords): #if importFile is at end of file and we are not at the end of the array
        importFile.seek(0) #goes back to the start of the file
    j+=1 # always increase
Looking at the above loop: you create the file object with importFile = open('rawLog.txt', 'r') earlier in your code, so comparing elif importFile == '' will never be True, as importFile is a file object, not a string.
You assign FILE = importFile.readlines(), which exhausts the file while creating the FILE list; you then importFile.seek(0) but don't actually use the file object anywhere again.
So basically you loop once over FILE, j never increases, and your code then moves on to the next block.
What you actually need are nested loops, using any to see whether any word from keyWords is in each line, and forget about your elif:
for line in FILE:
    if any(word in line for word in keyWords):
        parseString = line[29:35] #pulling in session ID
        sessionID.append(parseString) #saving session IDs to a list
The same logic applies to your next loop:
for line in FILE:
    if any(sess in line[29:35] for sess in sessionID): #checking if the sessionID matches (doing it this way since I ran into issues where some sessionIDs matched parts of the log file that were not sessionIDs)
        exportFile.write(line) #writing the log file line to a new text file
holderString = line does nothing but refer to the same object as line, so you can simply exportFile.write(line) and forget the assignment.
On a side note, use lowercase and underscores for variable names (holderString -> holder_string), and use with to open your files, as it also closes them for you.
with open('rawLog.txt') as import_file:
    log_lines = import_file.readlines()
I also changed FILE to log_lines; using more descriptive names makes your code easier to follow.
