Python for loop only running once?

This script checks a CSV of species names against a database stored in another CSV and reports the names that appear in both. The problem is that, although it reads all the search terms correctly, it only searches for the first one; i.e. if I print speciesl before 'for row in p', all species names are returned correctly.
from pathlib import Path
import os
import csv
p = csv.reader(open('Paldat.csv','r',newline=''), delimiter=',')
with open('newsssssss.csv','r',newline='\n')as r:
    for line in r:
        taxons=line.split(',')
        no = ['\r\n']
        noo = ['\n']
        if taxons == no:
            continue
        elif taxons == noo:
            continue
        else:
            speciesl = []
            for val in taxons:
                val = val.replace('\n','')
                speciesl.append(val)
            g=speciesl[0].lower()
            if len(speciesl) < 2:
                continue
            else:
                s=speciesl[1].lower()
                for row in p: #This loop seems to be the issue
                    genus = row[0].lower()
                    species = row[1].lower()
                    if g == genus and s == species:
                        print('Perfect match')
                        print(g)
                    elif s == species:
                        print(speciesl)
                        print('Species found')
                    else:
                        continue
                else:
                    continue
Here is part of Paldat.csv:
Camassia,leichtlinii,monad,monad,large (51-100 µm),-,-,-,-,-,sulcate,heteropolar,oblate,-,elliptic,-,-,boat-shaped,no suitable term,aperture(s) sunken,1,sulcus,sulcate,aperture membrane ornamented,-,-,-,"reticulate, heterobrochate, perforate",-,-,-,-,-,-,-,-,-,-,-,present,,
Cistus,parviflorus,monad,monad,medium-sized (26-50 µm),-,-,-,-,-,colporate,isopolar,-,spheroidal,circular,-,-,spheroidal,circular,"aperture(s) sunken, not infolded",3,colporus,"colporate, tricolporate",-,-,-,-,striato-reticulate,-,-,-,-,-,-,-,-,-,-,-,absent,,
Camellia,japonica,monad,monad,medium-sized (26-50 µm),41-50 µm,36-40 µm,41-50 µm,41-50 µm,41-50 µm,colpate,isopolar,-,spheroidal,circular,oblique,prolate,-,triangular,aperture(s) sunken,3,colpus,"colpate, tricolpate",operculum,"granulate, scabrate, reticulate",-,-,microreticulate,-,-,-,-,-,-,-,-,-,-,-,-,,
Camellia,sinensis,monad,monad,medium-sized (26-50 µm),41-50 µm,36-40 µm,41-50 µm,41-50 µm,41-50 µm,colporate,isopolar,oblate,-,triangular,oblique,isodiametric,-,triangular,aperture(s) sunken,3,colporus,"colporate, tricolporate",operculum,"scabrate, verrucate, gemmate",-,-,"verrucate, perforate",-,-,-,-,-,-,-,-,-,-,-,-,,
And part of newsssssss.csv:
Camassia,leichtlinii
Camellia,japonica
Camellia,sinensis
Chrysanthemum,leucanthemum
Cirsium,arvense
Cissus,quadrangularis

Try removing "newline='\n'" from the "open" line.
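Independently of the newline argument, it is worth checking a second cause: csv.reader returns an iterator, so the first pass of 'for row in p' consumes every row of Paldat.csv and later passes find nothing left to read. Below is a minimal sketch that reads the database once and reuses it, assuming genus and species sit in the first two columns as in the samples above. The helper names load_pairs and classify are hypothetical, and the in-memory strings stand in for the two files.

```python
import csv
import io

def load_pairs(handle):
    """Read (genus, species) pairs once into a list so they can be reused."""
    pairs = []
    for row in csv.reader(handle):
        if len(row) >= 2:  # skip blank or short rows
            pairs.append((row[0].strip().lower(), row[1].strip().lower()))
    return pairs

def classify(query, db_pairs, db_species):
    """Return 'perfect', 'species', or None for one (genus, species) query."""
    if query in db_pairs:
        return 'perfect'
    if query[1] in db_species:
        return 'species'
    return None

# Stand-ins for Paldat.csv and newsssssss.csv (rows taken from the question).
paldat = io.StringIO("Camassia,leichtlinii,monad\nCamellia,japonica,monad\n")
queries = io.StringIO("Camassia,leichtlinii\n\nCamellia,japonica\nCirsium,arvense\n")

database = load_pairs(paldat)                    # read the database exactly once
db_pairs = set(database)                         # exact genus + species matches
db_species = {species for _, species in database}  # species-only matches

results = [(q, classify(q, db_pairs, db_species)) for q in load_pairs(queries)]
for q, outcome in results:
    print(q, outcome)
```

With real files, passing open('Paldat.csv', newline='') to load_pairs should behave the same way; the point is that the rows live in a list or set, which can be searched any number of times.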

Related

Find and replace regex within text file (mac addresses)

This has been asked in other places, but I had no joy with those solutions. I am trying to search and replace using open(file) instead of fileinput. The reason is that I am printing an "x of y completed" message as it works (fileinput puts that in the file and not on the terminal). My test file is 100 MAC addresses separated by newlines.
All I would like to do is find the regex matching a MAC address and replace it with "MAC ADDRESS WAS HERE". Below is what I have; it only puts the replacement string once at the bottom of the file.
#!/usr/bin/env python3
import sys
import getopt
import re
import socket
import os
import fileinput
import time
file = sys.argv[1]
regmac = re.compile("^(([a-fA-F0-9]{2}-){5}[a-fA-F0-9]{2}|([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}|([0-9A-Fa-f]{4}\.){2}[0-9A-Fa-f]{4})?$")
regmac1 = "^(([a-fA-F0-9]{2}-){5}[a-fA-F0-9]{2}|([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}|([0-9A-Fa-f]{4}\.){2}[0-9A-Fa-f]{4})?$"
regv4 = re.compile(r'^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$')
regv41 = '^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$'
menu = {}
menu['1']="MAC"
menu['2']="IPV4"
menu['3']="IPV6"
menu['4']="STRING"
menu['5']="EXIT"
while True:
    options=menu.keys()
    sorted(options)
    for entry in options:
        print(entry, menu[entry])
    selection = input("Please Select:")
    if selection == '1':
        print("MAC chosen...")
        id = str('mac')
        break
    elif selection == '2':
        print("IPV4 chosen")
        id = str('ipv4')
        break
    elif selection == '3':
        print("IPV6 chosen")
        id = str('ipv6')
        break
    elif selection == '4':
        print("String chosen")
        id = str('string')
        break
    elif selection == '5':
        print("Exiting...")
        exit()
    else:
        print("Invalid selection!")
macmatch = 0
total = 0
while id == 'mac':
    with open(file, 'r') as i:
        for line in i.read().split('\n'):
            matches = regmac.findall(line)
            macmatch += 1
        print("I found",macmatch,"MAC addresses")
        print("Filtering found MAC addresses")
        i.close()
    with open(file, 'r+') as i:
        text = i.readlines()
        text = re.sub(regmac, "MAC ADDRESS WAS HERE", line)
        i.write(text)
The above puts "MAC ADDRESS WAS HERE" at the end of the last line without replacing any MAC addresses.
I am fundamentally missing something. If someone would please point me in the right direction, that would be great!
One caveat: I have this working via fileinput, but I cannot display progress from it, so I am trying the approach above. Thanks again!

All, I figured it out. Posting working code just in case someone happens upon this post.
#!/usr/bin/env python3
#Rewriting Sanitizer script from bash
#Import Modules, trying to not download any additional packages. Using regex to make this python2 compatible (does not have ipaddress module).
import sys
import getopt
import re
import socket
import os
import fileinput
import time
#Usage Statement sanitize.py /path/to/file, add help statement
#Test against calling entire directories, * usage
#Variables
file = sys.argv[1]
regmac = re.compile("^(([a-fA-F0-9]{2}-){5}[a-fA-F0-9]{2}|([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}|([0-9A-Fa-f]{4}\.){2}[0-9A-Fa-f]{4})?$")
regmac1 = "^(([a-fA-F0-9]{2}-){5}[a-fA-F0-9]{2}|([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}|([0-9A-Fa-f]{4}\.){2}[0-9A-Fa-f]{4})?$"
regv4 = re.compile(r'^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$')
regv41 = '^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$'
#Functions
menu = {}
menu['1']="MAC"
menu['2']="IPV4"
menu['3']="IPV6"
menu['4']="STRING"
menu['5']="EXIT"
while True:
    options=menu.keys()
    sorted(options)
    for entry in options:
        print(entry, menu[entry])
    selection = input("Please Select:")
    if selection == '1':
        print("MAC chosen...")
        id = str('mac')
        break
    elif selection == '2':
        print("IPV4 chosen")
        id = str('ipv4')
        break
    elif selection == '3':
        print("IPV6 chosen")
        id = str('ipv6')
        break
    elif selection == '4':
        print("String chosen")
        id = str('string')
        break
    elif selection == '5':
        print("Exiting...")
        exit()
    else:
        print("Invalid selection!")
macmatch = 0
total = 0
while id == 'mac':
    with open(file, 'r') as i:
        for line in i.read().split('\n'):
            matches = regmac.findall(line)
            macmatch += 1
        print("I found",macmatch,"MAC addresses")
        print("Filtering found MAC addresses")
        i.close()
    with open(file, 'r') as i:
        lines = i.readlines()
    with open(file, 'w') as i:
        for line in lines:
            line = re.sub(regmac, "MAC ADDRESS WAS HERE", line)
            i.write(line)
        i.close()
    break
The above overwrites the regex match (the found MAC address) with "MAC ADDRESS WAS HERE". Hopefully this helps someone. Any suggestions to make this more efficient, or another way to accomplish it, are welcome. I will mark this as the answer once I am able to, in 2 days.
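Two details of the working code may be worth tightening: macmatch is incremented for every line rather than for every match, and the trailing "?" in the pattern lets an empty line count as a match. The following is a rough sketch of a variant that counts only real replacements and prints the "x of y completed" progress message. The names MAC_RE and sanitize_lines and the sample lines are hypothetical, and the pattern is a rewrite of the one above as a raw string without the "?".

```python
import re

# Same three MAC formats as the pattern above, written as a raw string and
# without the trailing "?" so that empty lines no longer count as matches.
MAC_RE = re.compile(
    r'^(?:(?:[0-9A-Fa-f]{2}-){5}[0-9A-Fa-f]{2}'
    r'|(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}'
    r'|(?:[0-9A-Fa-f]{4}\.){2}[0-9A-Fa-f]{4})$'
)

def sanitize_lines(lines, placeholder="MAC ADDRESS WAS HERE"):
    """Return (sanitized_lines, number_of_lines_replaced)."""
    sanitized = []
    replaced = 0
    total = len(lines)
    for number, line in enumerate(lines, start=1):
        # subn returns the new string and how many substitutions were made
        new_line, hits = MAC_RE.subn(placeholder, line)
        replaced += hits
        sanitized.append(new_line)
        print("{} of {} completed".format(number, total))
    return sanitized, replaced

# Hypothetical sample standing in for the lines read from the input file.
sample = ["00:1a:2b:3c:4d:5e\n", "\n", "not a mac\n", "001a.2b3c.4d5e\n"]
cleaned, count = sanitize_lines(sample)
print("I found", count, "MAC addresses")
```

To apply it to a file, read the lines with readlines() first, then reopen the file in 'w' mode and write the returned list back with writelines(), as in the working code above.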

Having trouble parsing a .CSV file into a dict

I've done some simple .csv parsing in python but have a new file structure that's giving me trouble. The input file is from a spreadsheet converted into a .CSV file. Here is an example of the input:
[Image: Layout]
Each set can have many layouts, and each layout can have many layers. Each layer has only one layer and name.
Here is the code I am using to parse it in. I suspect it's a logic/flow control problem because I've parsed things in before, just not this deep. The first header row is skipped via code. Any help appreciated!
import csv
import pprint
def import_layouts_schema(layouts_schema_file_name = 'C:\\layouts\\LAYOUT1.csv'):
    class set_template:
        def __init__(self):
            self.set_name =''
            self.layout_name =''
            self.layer_name =''
            self.obj_name =''
    def check_layout(st, row, layouts_schema):
        c=0
        if st.layout_name == '':
            st.layer_name = row[c+2]
            st.obj_name = row[c+3]
            layer = {st.layer_name : st.obj_name}
            layout = {st.layout_name : layer}
            layouts_schema.update({st.set_name : layout})
        else:
            st.layout_name = row[c+1]
            st.layer_name = row[c+2]
            st.obj_name = row[c+3]
            layer = {st.layer_name : st.obj_name}
            layout = {st.layout_name : layer}
            layouts_schema.update({st.set_name : layout})
        return layouts_schema
    def layouts_schema_parsing(obj_list_raw1): #, location_categories, image_schema, set_location):
        #------ init -----------------------------------
        skipfirst = True
        c = 0
        firstrow = True
        layouts_schema = {}
        end_flag = ''
        st = set_template()
        #---------- start parsing here -----------------
        print('Now parsing layouts schema list')
        for row in obj_list_raw1:
            #print ('This is the row: ', row)
            if skipfirst==True:
                skipfirst=False
                continue
            if row[c] != '':
                st.set_name = row[c]
                st.layout_name = row[c+1]
                st.layer_name = row[c+2]
                st.obj_name = row[c+3]
                print('FOUND A NEW SET. SET details below:')
                print('Set name:', st.set_name, 'Layout name:', st.layout_name, 'Layer name:', st.layer_name, 'Object name:', st.obj_name)
                if firstrow == True:
                    print('First row of layouts import!')
                    layer = {st.layer_name : st.obj_name}
                    layout = {st.layout_name : layer}
                    layouts_schema = {st.set_name : layout}
                    firstrow = False
                    check_layout(st, row, layouts_schema)
                    continue
                elif firstrow == False:
                    print('Not the first row of layout import')
                    layer = {st.layer_name : st.obj_name}
                    layout = {st.layout_name : layer}
                    layouts_schema.update({st.set_name : layout})
            check_layout(st, row, layouts_schema)
        return layouts_schema
    #begin subroutine main
    layouts_schema_file_name ='C:\\Users\\jason\\Documents\\RAY\\layout_schemas\\ANIBOT_LAYOUTS_SCHEMA.csv'
    full_path_to_file = layouts_schema_file_name
    print('============ Importing LAYOUTS schema from: ', full_path_to_file , ' ==============')
    openfile = open(full_path_to_file)
    reader_ob = csv.reader(openfile)
    layout_list_raw1 = list(reader_ob)
    layouts_schema = layouts_schema_parsing(layout_list_raw1)
    print('=========== End of layouts schema import =========')
    return layouts_schema
layouts_schema = import_layouts_schema()
Feel free to throw away any part that doesn't work. I suspect I've gotten inside my own head a bit here. A for loop or another while loop may do the trick. Ultimately I just want to parse the file into a dict with the same key structure shown, i.e. the final dict's first line would look like:
{'RESTAURANT': {'RR_FACING1': {'BACKDROP': 'restaurant1'}}}
And the rest would follow from there. Ultimately I am going to use this key structure and the dict for other purposes. I just can't get the parsing down!

Wow, that's a lot of code!
Maybe try something simpler:
with open('file.csv') as f:
    keys = f.readline().rstrip('\n').split(';') # assuming ";" is your csv fields separator
    for line in f:
        vals = line.rstrip('\n').split(';')
        d = dict(zip(keys, vals))
        print(d)
Then either make a better data file (without blanks), or have the parser remember the previous values.
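The "remember the previous values" option can be sketched as follows. This assumes a comma-separated file with one header row and the columns set, layout, layer, object in that order, which is how the question's code indexes each row. The function name parse_layouts and every sample row except the first are hypothetical.

```python
import csv
import io

def parse_layouts(handle):
    """Build {set: {layout: {layer: object}}}, filling blank set/layout cells
    with the most recent non-blank value seen above them."""
    schema = {}
    current_set = ''
    current_layout = ''
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 4:
            continue  # ignore blank or incomplete rows
        set_name, layout_name, layer_name, obj_name = (c.strip() for c in row[:4])
        if set_name:
            current_set = set_name
        if layout_name:
            current_layout = layout_name
        # setdefault creates a level only when it is missing, so earlier
        # layouts and layers of the same set are kept rather than replaced
        schema.setdefault(current_set, {}).setdefault(current_layout, {})[layer_name] = obj_name
    return schema

# Hypothetical sample; only the first data row comes from the question.
sample = io.StringIO(
    "SET,LAYOUT,LAYER,OBJECT\n"
    "RESTAURANT,RR_FACING1,BACKDROP,restaurant1\n"
    ",,TABLE,table1\n"
    ",RR_FACING2,BACKDROP,restaurant2\n"
)
schema = parse_layouts(sample)
print(schema)
```

The nested setdefault calls are the main design choice here: calling update() with a fresh {set: layout} dict, as the question's code does, replaces everything stored for that set so far.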

While I agree with @AK47 that the code review site may be the better place for this, I have received so much help from SO that I'll try to give back a little: IMHO you are overthinking the problem. Below is an approach that should point you in the right direction and doesn't even require converting from Excel to CSV (I like the xlrd module, it's very easy to use). If you already have a CSV, just exchange the loop in the process_sheet() function. Basically, I store the last value seen for "SET" and "LAYOUT", and if the new one is different (and not empty), I set the new value. Hope that helps. And yes, you should think about a better data structure (redundancy is not always bad if it lets you avoid empty cells :-) ).
import xlrd
def process_sheet(sheet : xlrd.sheet.Sheet):
    curr_set = ''
    curr_layout = ''
    for rownum in range(1, sheet.nrows):
        row = sheet.row(rownum)
        set_val = row[0].value.strip()
        layout_val = row[1].value.strip()
        if set_val != '' and set_val != curr_set:
            curr_set = set_val
        if layout_val != '' and layout_val != curr_layout:
            curr_layout = layout_val
        result = {curr_set: {curr_layout: {row[2].value: row[3].value}}}
        print(repr(result))
def main():
    # open a workbook (adapt your filename)
    # then get the first sheet (index 0)
    # and call the process function
    # note: xlrd 2.0 and later read only .xls files, so an .xlsx input needs an older xlrd
    wbook = xlrd.open_workbook('/tmp/test.xlsx')
    sheet = wbook.sheet_by_index(0)
    process_sheet(sheet)
if __name__ == '__main__':
    main()

Use of 'and' in if statements

I need to check thousands of directories for two kinds of files. I have restricted the index, idx, to less than four, since the two kinds of files that need to be found, the '.jpg' and the '.thmb', would fall within that range. But I need the if statement to require that both kinds of files are in the directory. The if statement:
if ('.jpg' in val) and ('thmb' in val):
works, except that I keep getting printouts from the else branch saying that data is missing when that is not true:
Data missing W:\\North2015\200\10 200001000031.jpg 0
Data missing W:\\North2015\200\10 200001000032.jpg 1
Data missing W:\\North2015\200\100 200014000001.jpg 0
Data missing W:\\North2015\200\100 200014000002.jpg 1
Data missing W:\\North2015\200\101 200014100081.jpg 2
Here is the code below:
def missingFileSearch():
    for folder in setFinder():
        for idx,val in enumerate(os.listdir(folder)):
            if idx < 4:
                if ('.jpg' in val) and ('thmb' in val):
                    pass
                else:
                    print'Data missing',folder,val,idx
So I am wondering why I am getting the output from the else branch.
Also, this line of code gets hung up:
if val.endswith('.jpg') and ('thmb' in val):
    print'Data is here!',folder,val,idx
This is chiefly what I need the code to do.

I would do this:
def missingFileSearch():
    folders_with_missing = []
    for folder in setFinder():
        thmb_found = False
        jpg_found = False
        for fname in os.listdir(folder):
            thmb_found |= 'thmb' in fname
            jpg_found |= fname.endswith('.jpg')
            if thmb_found and jpg_found:
                break # break inner loop, move on to check next folder
        else: # loop not broken
            if not thmb_found and not jpg_found:
                desc = "no thmb, no .jpg"
            elif not thmb_found:
                desc = "no thmb"
            else:
                desc = "no .jpg"
            folders_with_missing.append((folder, desc))
    return folders_with_missing
I have tested a slightly modified version of this code (no setFinder() function):
def missingFileSearch():
    folders_with_missing = []
    for folder in os.listdir(my_dir):
        thmb_found = False
        jpg_found = False
        for fname in os.listdir(os.path.join(my_dir, folder)):
            thmb_found |= 'thmb' in fname
            jpg_found |= fname.endswith('.jpg')
            if thmb_found and jpg_found:
                break # break inner loop, move on to check next folder
        else: # loop not broken
            if not thmb_found and not jpg_found:
                desc = "no thmb, no .jpg"
            elif not thmb_found:
                desc = "no thmb"
            else:
                desc = "no .jpg"
            folders_with_missing.append((folder, desc))
    return folders_with_missing
I created four test folders with self-explanatory names:
>>> os.listdir(my_dir)
['both_thmb_jpg', 'missing_jpg', 'missing_thmb', 'no_files']
Then ran the function:
>>> missingFileSearch()
[('missing_jpg', 'no .jpg'), ('missing_thmb', 'no thmb'), ('no_files', 'no thmb, no .jpg')]
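As for why the original else branch fires: the condition ('.jpg' in val) and ('thmb' in val) tests a single file name for both substrings, so a name like 200001000031.jpg fails it even when a thmb file sits next to it in the same folder. Here is a small sketch of checking the whole listing instead. The helper name has_both is hypothetical, and so is the .thmb file name, which is made up for illustration.

```python
def has_both(filenames):
    """True when the listing contains a .jpg file and a thmb file."""
    jpg_found = any(name.endswith('.jpg') for name in filenames)
    thmb_found = any('thmb' in name for name in filenames)
    return jpg_found and thmb_found

# One file name on its own never satisfies both tests...
single_name_check = ('.jpg' in '200001000031.jpg') and ('thmb' in '200001000031.jpg')
# ...but the folder listing as a whole can.
complete_folder = has_both(['200001000031.jpg', '200001000031.thmb'])
incomplete_folder = has_both(['200001000031.jpg'])

print(single_name_check, complete_folder, incomplete_folder)
```

In the real script the argument would be os.listdir(folder); the answer's flag-based loop does the same job in a single pass over the listing.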

unknown error in jython when use startswith()

I'm using Python to analyze some records in bib and ris files. I wrote one function for each type. The first function is the one you see below:
def limpiarlineasris(self, data):
    cont = data
    dic = cont.splitlines()
    cont = ""
    con = []
    i = 0
    for a in dic:
        if len(a) != 0:
            con.append(a)
    for a in con:
        cont = cont + a + "\n"
    return cont
That works well and I can compile it without problems. The problem arises when I write the second function, shown below:
def limpiarlineasbib(self, data):
    cont = data
    dic = cont.splitlines()
    cont = ""
    con = []
    separador = "°-°-°"
    for a in dic:
        if len(a)!= 0:
            if a.startswith('#'):
                con.append(separador)
            else:
                con.append(a)
    for a in con:
        cont = cont + a + "\n"
    return cont
Building the first function causes no problem. But when I compile the second one, the compiler reports an error without telling me exactly what or where it is, because I am using plyjy, a jar that creates Jython objects, and the console only shows a plyjy exception without the line where it occurs. I'm using NetBeans to compile.

MapReduce: Join data files and summarize information

I have the following data sets:
Data set #1 that provides shows and the number of viewers of that show:
TVShow1,25
TVShow2,30
TVShow3,7
TVShow1,15
Data set #2 that provides channels that broadcast each show:
TVShow4,BBC
TVShow2,COM
TVShow1,TNT
TVShow3,TNT
I want to calculate the total number of viewers of each show on the channel TNT, e.g.
TVShow1 40
TVShow3 7
I have the following mapper:
#!/usr/bin/env python
import sys
for line in sys.stdin:
    line = line.strip()
    key_value = line.split(",")
    key_in = key_value[0]
    value_in = key_value[1]
    if (value_in == 'TNT' or value_in.isdigit()):
        print( '%s\t%s' % (key_in, value_in) )
And the following reducer:
#!/usr/bin/env python
import sys
prev_TV_show = " "
line_cnt = 0
tnt_found = False
curr_TV_show_total_cnt = 0
for line in sys.stdin:
    line = line.strip()
    key_value = line.split('\t')
    line_cnt = line_cnt+1
    curr_TV_show = key_value[0]
    value_in = key_value[1]
    if curr_TV_show != prev_TV_show:
        prev_TV_show = curr_TV_show
        if (line_cnt>1 and tnt_found == True):
            print('{0} {1}'.format(curr_TV_show,curr_TV_show_total_cnt))
        tnt_found = False
        curr_TV_show_total_cnt = 0
    if (value_in == 'TNT'):
        tnt_found = True
    else:
        curr_TV_show_total_cnt += int(value_in)
Then I tested the code as follows:
cat data_file*.txt | ./my_mapper.py | sort | ./my_reducer.py
However, it seems that the total number of viewers on the first line is incorrect. It looks like it is merged between two TV shows. Is there an error in the code related to handling the first line?

I think that there are 2 problems in your code:
1. Updating prev_TV_show before printing causes you to print the wrong value. You actually want to print prev_TV_show with its count, not curr_TV_show.
2. Printing the last iteration's value: you need to add an additional print (plus the condition) outside the loop.
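Both fixes can be sketched in one reducer, written here as a generator so it runs without Hadoop. It assumes the mapper output shown in the question, one 'show<TAB>value' record per line and sorted by show; the function name reduce_lines is hypothetical, and the sample records are the mapper's output for the question's data.

```python
def reduce_lines(lines):
    """Yield (show, total_viewers) for every show that has a TNT record.
    Expects mapper output lines 'show<TAB>value', sorted by show."""
    prev_show = None
    total = 0
    tnt_found = False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        show, value = line.split('\t')
        if show != prev_show:
            # Key changed: report the show that just ended, then reset.
            if prev_show is not None and tnt_found:
                yield prev_show, total
            prev_show = show
            total = 0
            tnt_found = False
        if value == 'TNT':
            tnt_found = True
        else:
            total += int(value)
    # Report the final show, which no later key change will flush.
    if prev_show is not None and tnt_found:
        yield prev_show, total

# Mapper output for the two data sets in the question, sorted like `sort`.
mapped = sorted([
    "TVShow1\t25", "TVShow2\t30", "TVShow3\t7", "TVShow1\t15",
    "TVShow1\tTNT", "TVShow3\tTNT",
])
results = list(reduce_lines(mapped))
for show, total in results:
    print('{0} {1}'.format(show, total))
```

As a streaming reducer script, the same function could be fed sys.stdin in place of the sample list.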
