Append Function Nested Inside IF Statement Body Not Working - python

I am fairly new to Python (just started learning in the last two weeks) and am trying to write a script to parse a csv file to extract some of the fields into a List:
from string import Template
import csv
import string
site1 = 'D1'
site2 = 'D2'
site3 = 'D5'
site4 = 'K0'
site5 = 'K1'
site6 = 'K2'
site7 = '0'
site8 = '0'
site9 = '0'
lbl = 1
portField = 'y'
sw = 5
swpt = 6
cd = 0
pt = 0
natList = []
with open(name=r'C:\Users\dtruman\Documents\PROJECTS\SCRIPTING - NATAERO DEPLOYER\NATAERO DEPLOYER V1\nataero_deploy.csv') as rcvr:
    for line in rcvr:
        fields = line.split(',')
        Site = fields[0]
        siteList = [site1,site2,site3,site4,site5,site6,site7,site8,site9]
        while Site in siteList == True:
            Label = fields[lbl]
            Switch = fields[sw]
            if portField == 'y':
                Switchport = fields[swpt]
                natList.append([Switch,Switchport,Label])
            else:
                Card = fields[cd]
                Port = fields[pt]
                natList.append([Switch,Card,Port,Label])
print natList
Even if I strip the else branch away and break into my code right after the if clause, I can verify that "Switchport" (the first statement in the if clause) is successfully being populated with a str from my csv file, as are "Switch" and "Label". However, "natList" is not being appended with the fields parsed from each line of my csv for some reason. Python returns no errors; it just does not append to "natList" at all.
This is actually going to be a function (once I get the code itself to work), but for now, I am simply setting the function parameters as global variables for the sake of being able to run it in an iPython console without having to call the function.
The "lbl", "sw", "swpt", "cd", and "pt" refer to column#'s in my csv (the finished function will allow user to enter values for these variables).
I assume I am running into some issue with "natList" scope-- but I have tried moving the "natList = []" statement to various places in my code to no avail.
I can run the above in a console, and then run "natList.append([Switch,Switchport,Label])" separately and it works for some reason....?
Thanks for any assistance!

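The while condition needs an extra pair of parentheses: while (Site in siteList) == True:, or, in the much cleaner form suggested by Padraic, while Site in siteList:.
Python chains comparison operators, so Site in siteList == True is evaluated as (Site in siteList) and (siteList == True). A list never equals True, so the condition is always False and the loop body never runs. Note that with the condition fixed, a while loop would repeat forever for a matching line, because Site never changes inside the loop; an if statement is what is wanted here.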
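Python treats both in and == as comparison operators and chains them, which is why the unparenthesized condition never holds. A minimal sketch in Python 3 that shows the three spellings side by side (the variable names and values are illustrative, not taken from the asker's file):

```python
site_list = ['D1', 'D2', 'D5']
site = 'D1'

# `in` and `==` chain like `a < b < c`, so this is evaluated as
# (site in site_list) and (site_list == True)
chained = site in site_list == True

# Parentheses force the membership test to be evaluated first.
parenthesized = (site in site_list) == True

# The idiomatic form: the membership test is already a boolean.
plain = site in site_list

print(chained, parenthesized, plain)
```

Only the first form is False, because a list never compares equal to True.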

Change
while Site in siteList == True:
to
if Site in siteList:

You might want to look into the csv module as this module attempts to make reading and writing csv files simpler, e.g.:
import csv
with open('<file>') as fp:
    ...
    reader = csv.reader(fp)
    if portfield == 'y':
        natlist = [[row[i] for i in [sw, swpt, lbl]] for row in reader if row[0] in sitelist]
    else:
        natlist = [[row[i] for i in [sw, cd, pt, lbl]] for row in reader if row[0] in sitelist]
print natlist
Or alternatively using a csv.DictReader which takes the first row as the fieldnames and then returns dictionaries:
import csv
with open('<file>') as fp:
    ...
    reader = csv.DictReader(fp)
    if portfield == 'y':
        fields = ['Switch', 'card/port', 'Label']
    else:
        fields = ['Switch', '??', '??', 'Label']
    natlist = [[row[f] for f in fields] for row in reader if row['Building/Site'] in sitelist]
print natlist
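For anyone who wants to try the csv.reader approach without the original file, here is a small self-contained sketch written for Python 3. The sample rows and the lowercase variable names are made up for illustration; only the column numbers (lbl, sw, swpt) come from the question.

```python
import csv
import io

# Hypothetical sample data standing in for the real CSV file.
sample = io.StringIO(
    "D1,lab-a,x,x,x,sw1,gi0/1\n"
    "ZZ,lab-b,x,x,x,sw2,gi0/2\n"
    "K0,lab-c,x,x,x,sw3,gi0/3\n"
)
site_list = ['D1', 'D2', 'D5', 'K0', 'K1', 'K2']
lbl, sw, swpt = 1, 5, 6  # column numbers, as in the question

nat_list = []
for row in csv.reader(sample):
    # `if`, not `while`: each row is tested exactly once.
    if row and row[0] in site_list:
        nat_list.append([row[sw], row[swpt], row[lbl]])

print(nat_list)
```

Unlike line.split(','), csv.reader also drops the trailing newline from the last field and handles quoted fields that contain commas.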

Related

Cannot save modifications made in xlsx file

I read a .xlsx file and update it, but I'm not able to save it
from xml.dom import minidom as md
[... some code ....]
sheet = workDir + '/xl/worksheets/sheet'
sheet1 = sheet + '1.xml'
importSheet1 = open(sheet1,'r')
whole_file= importSheet1.read()
data_Sheet = md.parseString(whole_file)
[... some code ....]
self.array_mem_name = []
y = 1
x = 5 #first useful row
day = int(day)
found = 0
while x <= len_array_shared:
    readrow = data_Sheet.getElementsByTagName('row')[x]
    c_data = readrow.getElementsByTagName('c')[0]
    c_attrib = c_data.getAttribute('t')
    if c_attrib == 's':
        vName = c_data.getElementsByTagName('v')[0].firstChild.nodeValue
        #if int(vName) != broken:
        mem_name = self.array_shared[int(vName)]
        if mem_name != '-----':
            if mem_name == old:
                c_data = readrow.getElementsByTagName('c')[day]
                c_attrib = c_data.getAttribute('t')
                if (c_attrib == 's'):
                    v_Attrib = c_data.getElementsByTagName('v')[0].firstChild.nodeValue
                    if v_Attrib != '':
                        #loc = self.array_shared[int(v_Attrib)]
                        index = self.array_shared.index('--')
                        c_data.getElementsByTagName('v')[0].firstChild.nodeValue = index
with open(sheet1, 'w') as f:
    f.write(whole_file)
As you can see, I call f.write(whole_file), but whole_file does not contain the change made with index.
In the debugger I can see that the new value has been set on the node, but I can't save sheet1 with the modified value.
I switched to openpyxl instead, as Lei Yang suggested in a comment. It worked better for my purposes: reading cell values is much easier with openpyxl than with xml.dom.minidom.
My only concern is that openpyxl seems considerably slower than minidom at loading the workbook; maybe memory was overloaded. But I cared more about having something simpler than about this minor performance issue.
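For readers who stay with xml.dom.minidom: the write in the question saves whole_file, which is the string read before parsing, so edits made to the DOM never reach it. The tree has to be serialized again, for example with toxml(). A minimal sketch in Python 3, assuming a heavily simplified XML fragment in place of a real worksheet file:

```python
from xml.dom import minidom

# Hypothetical, heavily simplified stand-in for a worksheet XML file.
whole_file = '<sheetData><row><c t="s"><v>3</v></c></row></sheetData>'
doc = minidom.parseString(whole_file)

# Change the text node of the first <v> element in the parsed tree.
v_node = doc.getElementsByTagName('v')[0]
v_node.firstChild.nodeValue = '7'  # use a str; the serializer expects text

# The source string is untouched; only the tree carries the edit.
updated = doc.toxml()
print(whole_file)
print(updated)
```

Writing updated (rather than whole_file) back to the sheet file is what would persist the change.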

Txt file to excel conversion in python

I'm trying to convert a text file to an Excel sheet in Python. The txt file contains data in the format specified below.
Column names: reg no, zip code, loc id, emp id, lastname, first name. Each record has one or more error numbers. Each record has its column names listed above the values. I would like to create an Excel sheet containing reg no, firstname, lastname, and the errors listed in separate rows for each record.
How can I put the records in an Excel sheet? Should I be using regular expressions? And how can I insert the error numbers in different rows for the corresponding record?
Here is the link to the input file:
https://github.com/trEaSRE124/Text_Excel_python/blob/master/new.txt
Any code snippets or suggestions are kindly appreciated.
Here is a draft code. Let me know if any changes needed:
# import pandas as pd
from collections import OrderedDict
from datetime import date
import csv
with open('in.txt') as f:
    with open('out.csv', 'wb') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
        # Remove initial clutter
        while("INPUT DATA" not in f.readline()):
            continue
        header = ["REG NO", "ZIP CODE", "LOC ID", "EMP ID", "LASTNAME", "FIRSTNAME", "ERROR"]
        data = list()
        errors = list()
        spamwriter.writerow(header)
        print header
        while True:
            line = f.readline()
            errors = list()
            # Stop at the END marker, or at end of file
            if not line or "END" in line:
                break
            try:
                int(line.split()[0])
            except (IndexError, ValueError):
                # Not a record line (blank, or first token is not a number)
                continue
            data = line.strip().split()
            f.readline() # get rid of \n
            line = f.readline()
            while "ERROR" in line:
                errors.append(line.strip())
                line = f.readline()
            spamwriter.writerow(data + errors)
            csvfile.flush() # csv writer objects have no flush(); flush the file instead
Run this with Python 2. The errors are appended as subsequent columns. Laying them out the way you want is slightly more complicated; I can fix it if still needed.
You can do this using the openpyxl library which is capable of depositing items directly into a spreadsheet. This code shows how to do that for your particular situation.
NEW_PERSON, ERROR_LINE = 1,2
def Line_items():
    with open('katherine.txt') as katherine:
        for line in katherine:
            line = line.strip()
            if not line:
                continue
            items = line.split()
            if items[0].isnumeric():
                yield NEW_PERSON, items
            elif items[:2] == ['ERROR', 'NUM']:
                yield ERROR_LINE, line
            else:
                continue
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
ws['A2'] = 'REG NO'
ws['B2'] = 'LASTNAME'
ws['C2'] = 'FIRSTNAME'
ws['D2'] = 'ERROR'
row = 2
for kind, data in Line_items():
    if kind == NEW_PERSON:
        row += 2
        ws['A{:d}'.format(row)] = int(data[0])
        ws['B{:d}'.format(row)] = data[-2]
        ws['C{:d}'.format(row)] = data[-1]
        first = True
    else:
        if first:
            first = False
        else:
            row += 1
        ws['D{:d}'.format(row)] = data
wb.save(filename='katherine.xlsx')

list index out of range error for loan data

I am trying to recreate this analysis: https://rstudio-pubs-static.s3.amazonaws.com/203258_d20c1a34bc094151a0a1e4f4180c5f6f.html
I could not get the shell script to work on my computer so I created a code to essentially do just that:
import sys
input_file = sys.argv[1]
output_file = sys.argv[2]
in_fp = open(input_file,"r")
out_fp = open(output_file,"w")
count = 0
for line in in_fp:
    if count == 1:
        out_fp.write(line+"\n")
    elif count>1:
        elems = line.split(",")
        loan = elems[16].upper()
        if loan == "FULLY PAID" or loan == "LATE (31-120 DAYS)" or loan == "DEFAULT" or loan == "CHARGED OFF":
            out_fp.write(line+"\n")
    count+=1
in_fp.close()
out_fp.close()
While this code works for the year 2015 data, when I run it for 2012-2013 data I get the error message:
  File "ShellScript.py", line 16, in <module>
    loan = elems[16].upper()
IndexError: list index out of range
Can someone please tell me how to fix this error to get the data to sort? Thank you
One of your lines doesn't have 17 elements therefore elems[16] fails. This is usually caused by a blank line in your data. It can also be caused by a quoted field with embedded newlines. If it's a quoted field with embedded newlines you will need to use the csv module.
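Both failure modes are easy to reproduce. The sketch below, written for Python 3 with made-up three-column data, compares splitting physical lines on commas with what csv.reader produces for the same text:

```python
import csv
import io

# Hypothetical data: a quoted field containing a newline, then a blank line.
raw = 'id,note,status\n1,"first\nsecond",Fully Paid\n\n2,plain,Default\n'

# Naive approach: iterate over physical lines and split on commas.
naive = [line.split(",") for line in io.StringIO(raw)]
naive_lengths = [len(fields) for fields in naive]

# csv.reader keeps the quoted field together; a blank line becomes an empty row.
parsed = list(csv.reader(io.StringIO(raw)))
parsed_lengths = [len(row) for row in parsed]

print(naive_lengths)
print(parsed_lengths)
```

With the naive split, the quoted record is broken into two short lines and the blank line yields a one-element list, so any fixed index such as elems[16] can fail. With csv.reader, only the blank line is short, and it can be skipped with a length check.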
Here is a rewrite using the csv module. It reports and skips short lines. I have changed it to be more Pythonic.
import sys
import csv
input_file = sys.argv[1]
output_file = sys.argv[2]
ncolumns = 17 # IS THIS RIGHT?
keep_loans = {"FULLY PAID", "LATE (31-120 DAYS)", "DEFAULT", "CHARGED OFF"}
# with statement automatically closes files after block
with open(input_file, "rb") as in_fp, open(output_file, "wb") as out_fp:
    reader = csv.reader(in_fp)
    writer = csv.writer(out_fp)
    # you are currently skipping line 0
    next(reader)
    # copy headers
    writer.writerow(next(reader))
    # you are currently adding an extra newline to headers
    # writer.writerow([]) # uncomment if you want that extra newline
    for row_num, row in enumerate(reader, start=2):
        if len(row) < ncolumns:
            # report and skip short rows
            print "row %s shorter than expected. skipping row. row: %s" % (row_num, row)
            continue
        # use `in` rather than multiple == statements
        if row[16].upper() in keep_loans:
            writer.writerow(row)

File Operation in Python

What I am trying to do:
I am trying to use 'Open' in python and this is the script I am trying to execute. I am trying to give "restaurant name" as input and a file gets saved (reviews.txt).
Script: (in short, the script goes to a page and scrapes the reviews)
from bs4 import BeautifulSoup
from urllib import urlopen
queries = 0
while queries <201:
    stringQ = str(queries)
    page = urlopen('http://www.yelp.com/biz/madison-square-park-new-york?start=' + stringQ)
    soup = BeautifulSoup(page)
    reviews = soup.findAll('p', attrs={'itemprop':'description'})
    authors = soup.findAll('span', attrs={'itemprop':'author'})
    flag = True
    indexOf = 1
    for review in reviews:
        dirtyEntry = str(review)
        while dirtyEntry.index('<') != -1:
            indexOf = dirtyEntry.index('<')
            endOf = dirtyEntry.index('>')
            if flag:
                dirtyEntry = dirtyEntry[endOf+1:]
                flag = False
            else:
                if(endOf+1 == len(dirtyEntry)):
                    cleanEntry = dirtyEntry[0:indexOf]
                    break
                else:
                    dirtyEntry = dirtyEntry[0:indexOf]+dirtyEntry[endOf+1:]
        f=open("reviews.txt", "a")
        f.write(cleanEntry)
        f.write("\n")
        f.close
    queries = queries + 40
Problem:
It's using append mode 'a', and according to the documentation, 'w' is the write mode, which overwrites. When I change it to 'w', nothing happens.
f=open("reviews.txt", "w") #does not work!
Actual Question:
EDIT: Let me clear the confusion.
I just want ONE reviews.txt file with all the reviews. Every time I run the script, I want it to overwrite the existing reviews.txt with new reviews according to my input.
Thank you,
If I understand properly what behavior you want, then this should be the right code:
with open("reviews.txt", "w") as f:
    for review in reviews:
        dirtyEntry = str(review)
        while dirtyEntry.index('<') != -1:
            indexOf = dirtyEntry.index('<')
            endOf = dirtyEntry.index('>')
            if flag:
                dirtyEntry = dirtyEntry[endOf+1:]
                flag = False
            else:
                if(endOf+1 == len(dirtyEntry)):
                    cleanEntry = dirtyEntry[0:indexOf]
                    break
                else:
                    dirtyEntry = dirtyEntry[0:indexOf]+dirtyEntry[endOf+1:]
        f.write(cleanEntry)
        f.write("\n")
This opens the file for writing only once and writes all the entries to it. Otherwise, if the open call is nested in the for loop, the file is opened for each review and thus overwritten by the next review.
The with statement ensures that the file is closed when the program leaves the block. It also makes the code easier to read.
I'd also suggest avoiding parentheses around the condition in an if statement, so instead of
if(endOf+1 == len(dirtyEntry)):
it's better to use just
if endOf + 1 == len(dirtyEntry):
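The difference between reopening the file in 'w' mode on every iteration and opening it once can be checked in isolation. A small Python 3 sketch, using a temporary directory and made-up entries rather than scraped reviews:

```python
import os
import tempfile

entries = ["review one", "review two", "review three"]  # hypothetical data
workdir = tempfile.mkdtemp()

# Reopening with "w" on every iteration truncates the file each time.
reopened = os.path.join(workdir, "reopened.txt")
for entry in entries:
    with open(reopened, "w") as f:
        f.write(entry + "\n")

# Opening once, outside the loop, keeps every entry.
opened_once = os.path.join(workdir, "opened_once.txt")
with open(opened_once, "w") as f:
    for entry in entries:
        f.write(entry + "\n")

with open(reopened) as f:
    reopened_lines = f.read().splitlines()
with open(opened_once) as f:
    once_lines = f.read().splitlines()

print(reopened_lines)
print(once_lines)
```

The first file ends up holding only the last entry, while the second holds all three, and rerunning the script replaces the old contents in both cases.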
If you want to write every record to a different new file, you must name each file differently, because this way you are always overwriting your old data with new data, and you are left with only the latest record.
You could increment your filename like this:
# at the beginning, above the loop:
i=1
# inside the loop, for each record:
f=open("reviews_{0}.txt".format(i), "a")
f.write(cleanEntry)
f.write("\n")
f.close()
i+=1
UPDATE
According to your recent update, I see that this is not what you want. To achieve what you want, you just need to move f=open("reviews.txt", "w") and f.close() outside of the for loop. That way, you won't be opening it multiple times inside a loop, every time overwriting your previous entries:
f=open("reviews.txt", "w")
for review in reviews:
    # ... other code here ... #
    f.write(cleanEntry)
    f.write("\n")
f.close()
But, I encourage you to use with open("reviews.txt", "w") as described in Alexey's answer.

Parsing specific contents in a file

I have a file that looks like this
!--------------------------------------------------------------------------DISK
[DISK]
DIRECTION = 'OK'
TYPE = 'normal'
!------------------------------------------------------------------------CAPACITY
[CAPACITY]
code = 0
ID = 110
I want to read sections [DISK] and [CAPACITY].. there will be more sections like these. I want to read the parameters defined under those sections.
I wrote the following code:
file_open = open(myFile,"r")
all_lines = file_open.readlines()
count = len(all_lines)
file_open.close()
my_data = {}
section = None
data = ""
for line in all_lines:
    line = line.strip() #remove whitespace
    line = line.replace(" ", "")
    if len(line) != 0: # remove white spaces between data
        if line[0] == "[":
            section = line.strip()[1:]
            data = ""
        if line[0] !="[":
            data += line + ","
            my_data[section] = [bit for bit in data.split(",") if bit != ""]
print my_data
key = my_data.keys()
print key
Unfortunately I am unable to get those sections and the data under that. Any ideas on this would be helpful.
As others already pointed out, you should be able to use the ConfigParser module.
Nonetheless, if you want to implement the reading/parsing yourself, you should split it up into two parts.
Part 1 would be the parsing at file level: splitting the file up into blocks (in your example you have two blocks: DISK and CAPACITY).
Part 2 would be parsing the blocks itself to get the values.
You know you can ignore the lines starting with !, so let's skip those:
with open('myfile.txt', 'r') as f:
    content = [l for l in f.readlines() if not l.startswith('!')]
Next, read the lines into blocks:
def partition_by(l, f):
    t = []
    for e in l:
        if f(e):
            if t: yield t
            t = []
        t.append(e)
    yield t
blocks = partition_by(content, lambda l: l.startswith('['))
and finally read in the values for each block:
def parse_block(block):
    gen = iter(block)
    block_name = next(gen).strip()[1:-1]
    splitted = [e.split('=') for e in gen]
    values = {t[0].strip(): t[1].strip() for t in splitted if len(t) == 2}
    return block_name, values
result = [parse_block(b) for b in blocks]
That's it. Let's have a look at the result:
for section, values in result:
    print section, ':'
    for k, v in values.items():
        print '\t', k, '=', v
output:
DISK :
    DIRECTION = 'OK'
    TYPE = 'normal'
CAPACITY :
    code = 0
    ID = 110
Are you able to make a small change to the text file? If you can make it look like this (only changed the comment character):
#--------------------------------------------------------------------------DISK
[DISK]
DIRECTION = 'OK'
TYPE = 'normal'
#------------------------------------------------------------------------CAPACITY
[CAPACITY]
code = 0
ID = 110
Then parsing it is trivial:
from ConfigParser import SafeConfigParser
parser = SafeConfigParser()
parser.read('filename')
And getting data looks like this:
(Pdb) parser
<ConfigParser.SafeConfigParser instance at 0x100468dd0>
(Pdb) parser.get('DISK', 'DIRECTION')
"'OK'"
Edit based on comments:
If you're using <= 2.7, then you're a little SOL.. The only way really would be to subclass ConfigParser and implement a custom _read method. Really, you'd just have to copy/paste everything in Lib/ConfigParser.py and edit the values in line 477 (2.7.3):
if line.strip() == '' or line[0] in '#;': # add new comment characters in the string
However, if you're running 3'ish (not sure what version it was introduced in offhand, I'm running 3.4(dev)), you may be in luck: ConfigParser added the comment_prefixes __init__ param to allow you to customize your prefix:
parser = ConfigParser(comment_prefixes=('#', ';', '!'))
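As a quick check of the comment_prefixes route, the sketch below feeds the sample from the question, with its original ! comment lines, to the Python 3 configparser module. It uses read_string on an inline copy of the text instead of reading a file, which is an assumption made to keep the example self-contained.

```python
import configparser

# The sample file from the question, with its original "!" comment lines.
text = """\
!--------------------------------------------------------------------------DISK
[DISK]
DIRECTION = 'OK'
TYPE = 'normal'
!------------------------------------------------------------------------CAPACITY
[CAPACITY]
code = 0
ID = 110
"""

# Treat "!" as a comment prefix in addition to the defaults "#" and ";".
parser = configparser.ConfigParser(comment_prefixes=('#', ';', '!'))
parser.read_string(text)

sections = parser.sections()
direction = parser.get('DISK', 'DIRECTION')  # quotes are kept as written
disk_id = parser.getint('CAPACITY', 'ID')
print(sections, direction, disk_id)
```

Option names are matched case-insensitively by default, so 'DIRECTION' and 'ID' can be looked up as written in the file.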
If the file is not big, you can load it and use Regexes to find parts that are of interest to you.
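One way the regex idea could look, as a rough Python 3 sketch: one pattern recognizes [SECTION] headers and another recognizes key = value pairs, while lines starting with ! match neither and are ignored. The patterns and the shortened sample text are assumptions for illustration, not a tested parser for every variant of the format.

```python
import re

# Shortened copy of the sample from the question.
text = """\
!---------------------------------DISK
[DISK]
DIRECTION = 'OK'
TYPE = 'normal'
!---------------------------------CAPACITY
[CAPACITY]
code = 0
ID = 110
"""

section_re = re.compile(r'^\[(?P<name>[^\]]+)\]\s*$')
# A key may not start with "=", "!" or whitespace, so comment lines never match.
pair_re = re.compile(r'^(?P<key>[^=!\s][^=]*?)\s*=\s*(?P<value>.*?)\s*$')

my_data = {}
section = None
for line in text.splitlines():
    m = section_re.match(line)
    if m:
        # Start a new section and collect its parameters in a dict.
        section = m.group('name')
        my_data[section] = {}
        continue
    m = pair_re.match(line)
    if m and section is not None:
        my_data[section][m.group('key')] = m.group('value')

print(my_data)
```

All values come back as strings, including the quotes around 'OK', so any type conversion would be a separate step.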
