Python for loop only running once? - python
This script checks a CSV of species names against a database stored in another CSV and reports whether each species appears in both. The issue is that although it reads all the search terms fine, it only searches for the first one; i.e., if I print speciesl before 'for row in p', all species names are returned correctly.
```python
from pathlib import Path
import os
import csv

p = csv.reader(open('Paldat.csv','r',newline=''), delimiter=',')
with open('newsssssss.csv','r',newline='\n') as r:
    for line in r:
        taxons=line.split(',')
        no = ['\r\n']
        noo = ['\n']
        if taxons == no:
            continue
        elif taxons == noo:
            continue
        else:
            speciesl = []
            for val in taxons:
                val = val.replace('\n','')
                speciesl.append(val)
            g=speciesl[0].lower()
            if len(speciesl) < 2:
                continue
            else:
                s=speciesl[1].lower()
                for row in p: #This loop seems to be the issue
                    genus = row[0].lower()
                    species = row[1].lower()
                    if g == genus and s == species:
                        print('Perfect match')
                        print(g)
                    elif s == species:
                        print(speciesl)
                        print('Species found')
                    else:
                        continue
                else:
                    continue
```
Here is part of Paldat.csv:
```
Camassia,leichtlinii,monad,monad,large (51-100 µm),-,-,-,-,-,sulcate,heteropolar,oblate,-,elliptic,-,-,boat-shaped,no suitable term,aperture(s) sunken,1,sulcus,sulcate,aperture membrane ornamented,-,-,-,"reticulate, heterobrochate, perforate",-,-,-,-,-,-,-,-,-,-,-,present,,
Cistus,parviflorus,monad,monad,medium-sized (26-50 µm),-,-,-,-,-,colporate,isopolar,-,spheroidal,circular,-,-,spheroidal,circular,"aperture(s) sunken, not infolded",3,colporus,"colporate, tricolporate",-,-,-,-,striato-reticulate,-,-,-,-,-,-,-,-,-,-,-,absent,,
Camellia,japonica,monad,monad,medium-sized (26-50 µm),41-50 µm,36-40 µm,41-50 µm,41-50 µm,41-50 µm,colpate,isopolar,-,spheroidal,circular,oblique,prolate,-,triangular,aperture(s) sunken,3,colpus,"colpate, tricolpate",operculum,"granulate, scabrate, reticulate",-,-,microreticulate,-,-,-,-,-,-,-,-,-,-,-,-,,
Camellia,sinensis,monad,monad,medium-sized (26-50 µm),41-50 µm,36-40 µm,41-50 µm,41-50 µm,41-50 µm,colporate,isopolar,oblate,-,triangular,oblique,isodiametric,-,triangular,aperture(s) sunken,3,colporus,"colporate, tricolporate",operculum,"scabrate, verrucate, gemmate",-,-,"verrucate, perforate",-,-,-,-,-,-,-,-,-,-,-,-,,
```
And part of newsssssss.csv:
```
Camassia,leichtlinii
Camellia,japonica
Camellia,sinensis
Chrysanthemum,leucanthemum
Cirsium,arvense
Cissus,quadrangularis
```
Try removing "newline='\n'" from the "open" line.
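Independently of the newline argument, there is a second likely cause worth checking: p is a csv.reader, which is a one-shot iterator. The first time the inner `for row in p` loop runs, it consumes every row of Paldat.csv, so for every later species there is nothing left to iterate. The minimal sketch below uses in-memory stand-ins for the two files (the sample text and the demo_exhaustion and find_matches helpers are illustrative, not from the question). It shows the exhaustion and the usual fix of reading the database into a list once, before the outer loop:

```python
import csv
import io

# In-memory stand-ins for Paldat.csv and newsssssss.csv so the sketch is self-contained
PALDAT = "Camassia,leichtlinii,monad\nCamellia,japonica,monad\nCamellia,sinensis,monad\n"
QUERIES = "Camassia,leichtlinii\nCamellia,sinensis\nCirsium,arvense\n"

def demo_exhaustion():
    reader = csv.reader(io.StringIO(PALDAT))
    first = sum(1 for _ in reader)   # consumes every row
    second = sum(1 for _ in reader)  # nothing left: a csv.reader can only be iterated once
    return first, second

def find_matches(paldat_text, query_text):
    # Read the database ONCE into a list, which can be scanned again for every query
    database = [(row[0].lower(), row[1].lower())
                for row in csv.reader(io.StringIO(paldat_text)) if len(row) >= 2]
    matches = []
    for row in csv.reader(io.StringIO(query_text)):
        if len(row) < 2:
            continue  # skip blank or one-column lines
        g, s = row[0].strip().lower(), row[1].strip().lower()
        for genus, species in database:
            if g == genus and s == species:
                matches.append((g, s))
    return matches

print(demo_exhaustion())                # (3, 0)
print(find_matches(PALDAT, QUERIES))    # [('camassia', 'leichtlinii'), ('camellia', 'sinensis')]
```

With the real files, the equivalent is `with open('Paldat.csv', newline='') as f: database = list(csv.reader(f))`, done once before looping over the species list.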
Related
Find and replace regex within text file (mac addresses)
This has been asked elsewhere, but I had no joy with those solutions. I am trying to search and replace using open(file) instead of fileinput. The reason is that I am printing an "x of y completed" message as it works (fileinput puts that message in the file rather than on the terminal). My test file contains 100 MAC addresses separated by newlines. All I want to do is find the regex matching a MAC address and replace it with "MAC ADDRESS WAS HERE". Below is what I have, and it only puts the replacement string once, at the bottom of the file.

```python
#!/usr/bin/env python3
import sys
import getopt
import re
import socket
import os
import fileinput
import time

file = sys.argv[1]
regmac = re.compile("^(([a-fA-F0-9]{2}-){5}[a-fA-F0-9]{2}|([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}|([0-9A-Fa-f]{4}\.){2}[0-9A-Fa-f]{4})?$")
regmac1 = "^(([a-fA-F0-9]{2}-){5}[a-fA-F0-9]{2}|([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}|([0-9A-Fa-f]{4}\.){2}[0-9A-Fa-f]{4})?$"
regv4 = re.compile(r'^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$')
regv41 = '^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$'

menu = {}
menu['1']="MAC"
menu['2']="IPV4"
menu['3']="IPV6"
menu['4']="STRING"
menu['5']="EXIT"

while True:
    options=menu.keys()
    sorted(options)
    for entry in options:
        print(entry, menu[entry])
    selection = input("Please Select:")
    if selection == '1':
        print("MAC chosen...")
        id = str('mac')
        break
    elif selection == '2':
        print("IPV4 chosen")
        id = str('ipv4')
        break
    elif selection == '3':
        print("IPV6 chosen")
        id = str('ipv6')
        break
    elif selection == '4':
        print("String chosen")
        id = str('string')
        break
    elif selection == '5':
        print("Exiting...")
        exit()
    else:
        print("Invalid selection!")

macmatch = 0
total = 0
while id == 'mac':
    with open(file, 'r') as i:
        for line in i.read().split('\n'):
            matches = regmac.findall(line)
            macmatch += 1
        print("I found",macmatch,"MAC addresses")
        print("Filtering found MAC addresses")
    i.close()
    with open(file, 'r+') as i:
        text = i.readlines()
        text = re.sub(regmac, "MAC ADDRESS WAS HERE", line)
        i.write(text)
```

The above puts "MAC ADDRESS WAS HERE" at the end of the last line while not replacing any MAC addresses. I am fundamentally missing something; if someone could point me in the right direction, that would be great! Caveat: I have this working via fileinput, but I cannot display progress with it, so I am trying the approach above. Thanks again!
All, I figured it out. Posting the working code in case someone happens upon this post.

```python
#!/usr/bin/env python3
#Rewriting Sanitizer script from bash
#Import Modules, trying to not download any additional packages. Using regex to make this python2 compatible (does not have ipaddress module).
import sys
import getopt
import re
import socket
import os
import fileinput
import time

#Usage Statement sanitize.py /path/to/file, add help statement
#Test against calling entire directories, * usage

#Variables
file = sys.argv[1]
regmac = re.compile("^(([a-fA-F0-9]{2}-){5}[a-fA-F0-9]{2}|([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}|([0-9A-Fa-f]{4}\.){2}[0-9A-Fa-f]{4})?$")
regmac1 = "^(([a-fA-F0-9]{2}-){5}[a-fA-F0-9]{2}|([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}|([0-9A-Fa-f]{4}\.){2}[0-9A-Fa-f]{4})?$"
regv4 = re.compile(r'^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$')
regv41 = '^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$'

#Functions
menu = {}
menu['1']="MAC"
menu['2']="IPV4"
menu['3']="IPV6"
menu['4']="STRING"
menu['5']="EXIT"

while True:
    options=menu.keys()
    sorted(options)
    for entry in options:
        print(entry, menu[entry])
    selection = input("Please Select:")
    if selection == '1':
        print("MAC chosen...")
        id = str('mac')
        break
    elif selection == '2':
        print("IPV4 chosen")
        id = str('ipv4')
        break
    elif selection == '3':
        print("IPV6 chosen")
        id = str('ipv6')
        break
    elif selection == '4':
        print("String chosen")
        id = str('string')
        break
    elif selection == '5':
        print("Exiting...")
        exit()
    else:
        print("Invalid selection!")

macmatch = 0
total = 0
while id == 'mac':
    with open(file, 'r') as i:
        for line in i.read().split('\n'):
            matches = regmac.findall(line)
            macmatch += 1
        print("I found",macmatch,"MAC addresses")
        print("Filtering found MAC addresses")
    i.close()
    with open(file, 'r') as i:
        lines = i.readlines()
    with open(file, 'w') as i:
        for line in lines:
            line = re.sub(regmac, "MAC ADDRESS WAS HERE", line)
            i.write(line)
    i.close()
    break
```

The above overwrites each regex match (found MAC address) with "MAC ADDRESS WAS HERE". Hopefully this helps someone. Any suggestions to make this more efficient, or other ways to accomplish it, are welcome. I will mark this as the answer once I am able to (in 2 days).
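For the request about efficiency, here is a sketch (not the poster's code) that reads the file once, counts actual replacements rather than lines, and prints the "x of y completed" progress that motivated avoiding fileinput. Two deliberate changes, both assumptions about intent: the pattern is written as a raw string, and the trailing `?` is dropped, because `^(...)?$` also matches an empty line, so blank lines would be replaced too. The names MAC_RE, sanitize_lines and sanitize_file are hypothetical.

```python
import re

# The question's three MAC formats (aa-bb-..., aa:bb:..., aabb.ccdd.eeff), as a raw
# string and WITHOUT the trailing "?", which would also let blank lines match
MAC_RE = re.compile(
    r"^(?:(?:[0-9A-Fa-f]{2}-){5}[0-9A-Fa-f]{2}"
    r"|(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}"
    r"|(?:[0-9A-Fa-f]{4}\.){2}[0-9A-Fa-f]{4})$"
)

def sanitize_lines(lines, replacement="MAC ADDRESS WAS HERE", progress=print):
    """Return (new_lines, replaced_count), reporting "x of y completed" per line."""
    new_lines = []
    replaced = 0
    total = len(lines)
    for n, line in enumerate(lines, start=1):
        # "$" also matches just before a trailing "\n", so the newline is preserved
        new_line, count = MAC_RE.subn(replacement, line)
        replaced += count
        new_lines.append(new_line)
        progress("{0} of {1} completed".format(n, total))
    return new_lines, replaced

def sanitize_file(path):
    """Read the file once, rewrite it in place, and return the number of replacements."""
    with open(path) as f:
        lines = f.readlines()
    new_lines, replaced = sanitize_lines(lines)
    with open(path, "w") as f:
        f.writelines(new_lines)
    print("Replaced", replaced, "MAC addresses")
    return replaced
```

After the menu selection, calling `sanitize_file(sys.argv[1])` would replace the `while id == 'mac':` block. The explicit `i.close()` calls are unnecessary, since `with` already closes the file.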
Having trouble parsing a .CSV file into a dict
I've done some simple .csv parsing in Python but have a new file structure that's giving me trouble. The input file is from a spreadsheet converted into a .CSV file. Here is an example of the input:

[image: Layout]

Each set can have many layouts, and each layout can have many layers. Each layer has only one layer and name.

Here is the code I am using to parse it. I suspect it's a logic/flow-control problem, because I've parsed files before, just not this deeply nested. The first header row is skipped in the code. Any help appreciated!

```python
import csv
import pprint

def import_layouts_schema(layouts_schema_file_name = 'C:\\layouts\\LAYOUT1.csv'):
    class set_template:
        def __init__(self):
            self.set_name =''
            self.layout_name =''
            self.layer_name =''
            self.obj_name =''

    def check_layout(st, row, layouts_schema):
        c=0
        if st.layout_name == '':
            st.layer_name = row[c+2]
            st.obj_name = row[c+3]
            layer = {st.layer_name : st.obj_name}
            layout = {st.layout_name : layer}
            layouts_schema.update({st.set_name : layout})
        else:
            st.layout_name = row[c+1]
            st.layer_name = row[c+2]
            st.obj_name = row[c+3]
            layer = {st.layer_name : st.obj_name}
            layout = {st.layout_name : layer}
            layouts_schema.update({st.set_name : layout})
        return layouts_schema

    def layouts_schema_parsing(obj_list_raw1): #, location_categories, image_schema, set_location):
        #------ init -----------------------------------
        skipfirst = True
        c = 0
        firstrow = True
        layouts_schema = {}
        end_flag = ''
        st = set_template()
        #---------- start parsing here -----------------
        print('Now parsing layouts schema list')
        for row in obj_list_raw1:
            #print ('This is the row: ', row)
            if skipfirst==True:
                skipfirst=False
                continue
            if row[c] != '':
                st.set_name = row[c]
                st.layout_name = row[c+1]
                st.layer_name = row[c+2]
                st.obj_name = row[c+3]
                print('FOUND A NEW SET. SET details below:')
                print('Set name:', st.set_name, 'Layout name:', st.layout_name, 'Layer name:', st.layer_name, 'Object name:', st.obj_name)
                if firstrow == True:
                    print('First row of layouts import!')
                    layer = {st.layer_name : st.obj_name}
                    layout = {st.layout_name : layer}
                    layouts_schema = {st.set_name : layout}
                    firstrow = False
                    check_layout(st, row, layouts_schema)
                    continue
                elif firstrow == False:
                    print('Not the first row of layout import')
                    layer = {st.layer_name : st.obj_name}
                    layout = {st.layout_name : layer}
                    layouts_schema.update({st.set_name : layout})
                check_layout(st, row, layouts_schema)
        return layouts_schema

    #begin subroutine main
    layouts_schema_file_name ='C:\\Users\\jason\\Documents\\RAY\\layout_schemas\\ANIBOT_LAYOUTS_SCHEMA.csv'
    full_path_to_file = layouts_schema_file_name
    print('============ Importing LAYOUTS schema from: ', full_path_to_file , ' ==============')
    openfile = open(full_path_to_file)
    reader_ob = csv.reader(openfile)
    layout_list_raw1 = list(reader_ob)
    layouts_schema = layouts_schema_parsing(layout_list_raw1)
    print('=========== End of layouts schema import =========')
    return layouts_schema

layouts_schema = import_layouts_schema()
```

Feel free to throw away any part that doesn't work. I suspect I've gotten a little too far inside my own head here. A for loop or another while loop may do the trick. Ultimately I just want to parse the file into a dict with the same key structure shown, i.e. the final dict's first line would look like:

```python
{'RESTAURANT': {'RR_FACING1': {'BACKDROP': 'restaurant1'}}}
```

and so on from there. I am going to use this key structure and the dict for other purposes. I just can't get the parsing down!
Wow, that's a lot of code! Maybe try something simpler:

```python
with open('file.csv') as f:
    keys = f.readline().rstrip('\n').split(';')  # assuming ";" is your CSV field separator
    for line in f:
        vals = line.rstrip('\n').split(';')
        d = dict(zip(keys, vals))
        print(d)
```

Then either make a better data file (without blanks), or have the parser remember the previous values.
While I agree with @AK47 that the Code Review site may be the better venue, I've received so much help from SO that I'll try to give back a little. IMHO you are overthinking the problem. Below is an approach that should get you in the right direction and doesn't even require converting from Excel to CSV (I like the xlrd module; it's very easy to use). If you already have a CSV, just swap out the loop in the process_sheet() function.

Basically, I just store the last value seen for "SET" and "LAYOUT", and if a row's value is different (and not empty), I set the new value. Hope that helps. And yes, you should think about a better data structure (redundancy is not always bad, if it lets you avoid empty cells :-) ).

```python
import xlrd

def process_sheet(sheet : xlrd.sheet.Sheet):
    curr_set = ''
    curr_layout = ''
    for rownum in range(1, sheet.nrows):
        row = sheet.row(rownum)
        set_val = row[0].value.strip()
        layout_val = row[1].value.strip()
        if set_val != '' and set_val != curr_set:
            curr_set = set_val
        if layout_val != '' and layout_val != curr_layout:
            curr_layout = layout_val
        result = {curr_set: {curr_layout: {row[2].value: row[3].value}}}
        print(repr(result))

def main():
    # open a workbook (adapt your filename)
    # then get the first sheet (index 0)
    # and call the process function
    wbook = xlrd.open_workbook('/tmp/test.xlsx')
    sheet = wbook.sheet_by_index(0)
    process_sheet(sheet)

if __name__ == '__main__':
    main()
```
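If you already have the CSV, the same carry-forward idea can be written with only the standard-library csv module, which also sidesteps the fact that xlrd 2.0 and later read only legacy .xls files. Unlike process_sheet(), this sketch merges every row into one nested dict using dict.setdefault instead of printing a separate dict per row. The function name, header names and sample rows are hypothetical; only the first data row mirrors the example dict from the question.

```python
import csv
import io

def build_layouts_schema(rows):
    """Nest CSV rows as {set: {layout: {layer: object}}}.

    Blank SET/LAYOUT cells inherit the last non-empty value above them.
    The first row is assumed to be a header and is skipped.
    """
    schema = {}
    curr_set = curr_layout = ''
    rows = iter(rows)
    next(rows, None)  # skip the header row
    for row in rows:
        if len(row) < 4:
            continue  # ignore blank or short rows
        set_val, layout_val, layer, obj = (cell.strip() for cell in row[:4])
        if set_val:
            curr_set = set_val
        if layout_val:
            curr_layout = layout_val
        # Create the inner dicts on first use, then add this layer
        schema.setdefault(curr_set, {}).setdefault(curr_layout, {})[layer] = obj
    return schema

# Hypothetical sample: blank cells mean "same as the row above"
SAMPLE = """SET,LAYOUT,LAYER,OBJECT
RESTAURANT,RR_FACING1,BACKDROP,restaurant1
,,TABLE,table1
,RR_FACING2,BACKDROP,restaurant2
"""

print(build_layouts_schema(csv.reader(io.StringIO(SAMPLE))))
```

With a real file this would be `with open(path, newline='') as f: schema = build_layouts_schema(csv.reader(f))`.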
Use of 'and' in if statements
I need to check thousands of directories for two kinds of files. I have restricted the index (idx) to less than four, since within that range there should be the two kinds of files that need to be found, the '.jpg' and the '.thmb'. But I need the if statement to require that both kinds of files are in the directory. The if statement:

```python
if ('.jpg' in val) and ('thmb' in val):
```

works, except that I keep getting output through the else branch saying data is missing, when that is not true:

```
Data missing W:\\North2015\200\10 200001000031.jpg 0
Data missing W:\\North2015\200\10 200001000032.jpg 1
Data missing W:\\North2015\200\100 200014000001.jpg 0
Data missing W:\\North2015\200\100 200014000002.jpg 1
Data missing W:\\North2015\200\101 200014100081.jpg 2
```

Here is the code:

```python
def missingFileSearch():
    for folder in setFinder():
        for idx,val in enumerate(os.listdir(folder)):
            if idx < 4:
                if ('.jpg' in val) and ('thmb' in val):
                    pass
                else:
                    print'Data missing',folder,val,idx
```

So I am wondering why I am getting output through the else branch. Also, this line of code gets hung up:

```python
if val.endswith('.jpg') and ('thmb' in val):
    print'Data is here!',folder,val,idx
```

This is chiefly what I need the code to do.
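The reason for the unexpected output: `('.jpg' in val) and ('thmb' in val)` is evaluated for one filename at a time, so it is true only for a single name that contains both strings. A .jpg file and a separate thumbnail file each fail the test on their own. To ask whether the folder contains both kinds, test the whole listing, for example with any(). A minimal sketch; has_both and the .thmb filename are hypothetical:

```python
def has_both(filenames):
    """True if the listing has at least one name ending in '.jpg' and at least
    one name containing 'thmb'; they may be different files."""
    names = list(filenames)  # accepts any iterable, e.g. os.listdir(folder)
    has_jpg = any(name.endswith('.jpg') for name in names)
    has_thmb = any('thmb' in name for name in names)
    return has_jpg and has_thmb

# The question's condition, applied to a single filename:
name = '200001000031.jpg'
print(('.jpg' in name) and ('thmb' in name))  # False

# The folder-level check, on a hypothetical listing with a separate thumbnail file:
print(has_both(['200001000031.jpg', '200001000031.thmb']))  # True
print(has_both(['200001000031.jpg']))                       # False
```

In the question's loop, calling `has_both(os.listdir(folder))` once per folder and reporting the folder when it returns False would replace the inner loop, and the `idx < 4` restriction is no longer needed.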
I would do this:

```python
import os

def missingFileSearch():
    folders_with_missing = []
    for folder in setFinder():
        thmb_found = False
        jpg_found = False
        for fname in os.listdir(folder):
            thmb_found |= 'thmb' in fname
            jpg_found |= fname.endswith('.jpg')
            if thmb_found and jpg_found:
                break  # break inner loop, move on to check next folder
        else:  # loop not broken
            if not thmb_found and not jpg_found:
                desc = "no thmb, no .jpg"
            elif not thmb_found:
                desc = "no thmb"
            else:
                desc = "no .jpg"
            folders_with_missing.append((folder, desc))
    return folders_with_missing
```

I have tested a slightly modified version of this code (without the setFinder() function):

```python
def missingFileSearch():
    folders_with_missing = []
    for folder in os.listdir(my_dir):
        thmb_found = False
        jpg_found = False
        for fname in os.listdir(os.path.join(my_dir, folder)):
            thmb_found |= 'thmb' in fname
            jpg_found |= fname.endswith('.jpg')
            if thmb_found and jpg_found:
                break  # break inner loop, move on to check next folder
        else:  # loop not broken
            if not thmb_found and not jpg_found:
                desc = "no thmb, no .jpg"
            elif not thmb_found:
                desc = "no thmb"
            else:
                desc = "no .jpg"
            folders_with_missing.append((folder, desc))
    return folders_with_missing
```

I created four test folders with self-explanatory names:

```
>>> os.listdir(my_dir)
['both_thmb_jpg', 'missing_jpg', 'missing_thmb', 'no_files']
```

Then ran the function:

```
>>> missingFileSearch()
[('missing_jpg', 'no .jpg'), ('missing_thmb', 'no thmb'), ('no_files', 'no thmb, no .jpg')]
```
Unknown error in Jython when using startswith()
I'm using Python to analyze some records in .bib and .ris files. I made two functions, one for each type. The first function is the one you see below:

```python
def limpiarlineasris(self, data):
    cont = data
    dic = cont.splitlines()
    cont = ""
    con = []
    i = 0
    for a in dic:
        if len(a) != 0:
            con.append(a)
    for a in con:
        cont = cont + a + "\n"
    return cont
```

That works well, and I can compile it without problems. The problem arises when I write the second function, shown below:

```python
def limpiarlineasbib(self, data):
    cont = data
    dic = cont.splitlines()
    cont = ""
    con = []
    separador = "°-°-°"
    for a in dic:
        if len(a)!= 0:
            if a.startswith('#'):
                con.append(separador)
            else:
                con.append(a)
    for a in con:
        cont = cont + a + "\n"
    return cont
```

The first function builds without problems. But when I compile the second, the compiler reports an error without saying exactly what or where it is, because I am using PlyJy (a jar that creates Jython objects), and the console only shows a PlyJy exception without the line where it occurs. I'm using NetBeans to compile.
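The thread shows no answer, but one plausible cause, offered as an assumption rather than a confirmed diagnosis: the second function is the only one containing a non-ASCII character, the degree sign in separador. Jython 2.x follows Python 2 source rules (PEP 263), which reject non-ASCII bytes in a source file that has no encoding declaration. Either of the two options below avoids that. limpiar_lineas_bib is a standalone, renamed version of the question's method, used only for illustration.

```python
# -*- coding: utf-8 -*-
# Option 1: the PEP 263 declaration above (it must be on the first or second line
# of the file) lets Python 2 / Jython 2.x accept the original separador literal.

# Option 2: keep the source pure ASCII by spelling the degree sign (U+00B0)
# as a Unicode escape; the resulting string is the same three-degree separator.
SEPARADOR = u"\u00b0-\u00b0-\u00b0"

def limpiar_lineas_bib(data, separador=SEPARADOR):
    """Drop empty lines and replace lines starting with '#' by the separator."""
    out = []
    for line in data.splitlines():
        if not line:
            continue  # same effect as the original len(a) != 0 check
        out.append(separador if line.startswith('#') else line)
    # Rebuild the text with a newline after every kept line, as the original does
    return "".join(line + "\n" for line in out)
```

Because the line that finally reaches the PlyJy exception hides the real position, trying the module directly in a Jython console (outside NetBeans) may also show the underlying SyntaxError message.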
MapReduce: Join data files and summarize information
I have the following data sets.

Data set #1 gives shows and the number of viewers of each show:

```
TVShow1,25
TVShow2,30
TVShow3,7
TVShow1,15
```

Data set #2 gives the channel that broadcasts each show:

```
TVShow4,BBC
TVShow2,COM
TVShow1,TNT
TVShow3,TNT
```

I want to calculate the total number of viewers of each show on the channel TNT, e.g.

```
TVShow1 40
TVShow3 7
```

I have the following mapper:

```python
#!/usr/bin/env python
import sys

for line in sys.stdin:
    line = line.strip()
    key_value = line.split(",")
    key_in = key_value[0]
    value_in = key_value[1]
    if (value_in == 'TNT' or value_in.isdigit()):
        print( '%s\t%s' % (key_in, value_in) )
```

And the following reducer:

```python
#!/usr/bin/env python
import sys

prev_TV_show = " "
line_cnt = 0
tnt_found = False
curr_TV_show_total_cnt = 0

for line in sys.stdin:
    line = line.strip()
    key_value = line.split('\t')
    line_cnt = line_cnt+1
    curr_TV_show = key_value[0]
    value_in = key_value[1]
    if curr_TV_show != prev_TV_show:
        prev_TV_show = curr_TV_show
        if (line_cnt>1 and tnt_found == True):
            print('{0} {1}'.format(curr_TV_show,curr_TV_show_total_cnt))
        tnt_found = False
        curr_TV_show_total_cnt = 0
    if (value_in == 'TNT'):
        tnt_found = True
    else:
        curr_TV_show_total_cnt += int(value_in)
```

Then I tested the code as follows:

```
cat data_file*.txt | ./my_mapper.py | sort | ./my_reducer.py
```

However, it seems that the total number of viewers on the first output line is incorrect; it looks like counts from two TV shows are being merged. Is there an error in the code related to handling the first line?
I think there are two problems in your code:

1. Updating prev_TV_show before printing causes you to print the wrong value. You actually want to print prev_TV_show with its count, not curr_TV_show.
2. The last show is never printed: you need to add an additional print (with the same condition) after the loop.
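Putting both fixes together, here is a sketch of the reducer, restructured into functions so it can be tested (reduce_tnt and main are hypothetical names). It emits the finished show before switching keys and flushes the last group after the loop. Like the original, it relies on the mapper output being sorted by key, which `sort` provides in the question's pipeline.

```python
import sys

def reduce_tnt(lines):
    """Yield (show, total_viewers) for every show that has a TNT record.

    lines must be mapper output ("show<TAB>value") sorted by show, as produced
    by `sort` in the test pipeline or by Hadoop streaming's shuffle phase.
    """
    prev_show = None
    tnt_found = False
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        show, value = line.split('\t')
        if show != prev_show:
            # Fix 1: emit the group that just ended, i.e. prev_show, not the new key
            if prev_show is not None and tnt_found:
                yield prev_show, total
            prev_show, tnt_found, total = show, False, 0
        if value == 'TNT':
            tnt_found = True
        else:
            total += int(value)
    # Fix 2: the last group is never followed by a new key, so flush it here
    if prev_show is not None and tnt_found:
        yield prev_show, total

def main():
    for show, total in reduce_tnt(sys.stdin):
        print('{0} {1}'.format(show, total))
```

Saved as my_reducer.py, the file would end with `if __name__ == '__main__': main()` (plus the shebang line at the top) and run in the same pipeline. For the sample data it should print `TVShow1 40` and `TVShow3 7`.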