Python List of Dictionaries Only See Last Element - python

Struggling to figure out why this doesn't work. It should. But when I create a list of dictionaries and then look through that list, I only ever see the final entry from the list:
import re

alerts = []
alertDict = {}
af = open(r"C:\snort.txt")
for line in af:
    m = re.match(r'([0-9/]+)-([0-9:.]+)\s+.*?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})\s+->\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})', line)
    if m:
        attacktime = m.group(2)
        srcip = m.group(3)
        srcprt = m.group(4)
        dstip = m.group(5)
        dstprt = m.group(6)
        alertDict['Time'] = attacktime
        alertDict['Source IP'] = srcip
        alertDict['Destination IP'] = dstip
        alerts.append(alertDict)
for alert in alerts:
    if alert["Time"] == "13:13:42.443062":
        print("Found Time")

You create exactly one dict at the beginning of the script, and then append that one dict to the list multiple times.
Try creating multiple individual dicts, by moving the initialization to the inside of the loop.
alerts = []
af = open(r"C:\snort.txt")
for line in af:
    alertDict = {}
    #rest of loop goes here
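To see why moving the initialization matters, here is a minimal sketch with made-up timestamps standing in for the parsed log lines (the `times` list and the variable names are assumptions for illustration only). Appending the same dict object repeatedly stores several references to one dict, so every list entry shows the last values written.

```python
# Hypothetical sample data standing in for values parsed from the log file.
times = ["13:13:40", "13:13:41", "13:13:42"]

# Buggy pattern: one dict created once, appended on every iteration.
shared = {}
aliased = []
for t in times:
    shared["Time"] = t
    aliased.append(shared)   # appends a reference to the same dict each time

# Fixed pattern: a fresh dict is created inside the loop.
separate = []
for t in times:
    entry = {}
    entry["Time"] = t
    separate.append(entry)

print([a["Time"] for a in aliased])    # every entry shows the last time
print([s["Time"] for s in separate])   # each entry keeps its own time
print(aliased[0] is aliased[2])        # True: same object
```

The `is` check at the end confirms that all entries of the first list are the very same object.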

Related

Use exact value from a list to search another list. Append variable with results from search

I need to create a list/dataframe that has component ID's along with their description. I have a list containing the component ID and another list containing the component ID with a description. Only components with an ID in both lists should be displayed along with its description.
I have tried to use the component ID list to exact search in the component and description list. I wasn't able to get a desired output.
desclist = ['R402 MSG ='4k2 1%'','R403 MSG ='100 1%'','R404 MSG ='4k 1%'']
component = ['R402','R403','R404']
combinedlist = []
while count<(len(component) - 1):
    while True:
        for c in desclist:
            if c in component[count]:
                combinedlist.append(c)
                print(comp[count]+ ' , ' + desclist[count])
                count = count + 1
This is not code I've actually tried, but I believe it is similar to what I need. I'm aware there is no "loop until" construct in Python.
I expect the output to be something like:
R402 , MSG ='4k2 1%'
This will require me to remove everything before the equals in the description list.
This is a simple (easy to understand) way to accomplish what you need!
desclist = ['R402 MSG = Desc402','R403 MSG = Desc403',
            'R404 MSG = Desc404','R405 MSG = Desc405']
component = ['R402','R403','R404','R406']
combinedlist = []
for i in range(len(component)):
    found = False
    for j in range(len(desclist)):
        if str(component[i]) == str(desclist[j]).split(' ')[0]:
            found = True
            combinedlist.append(component[i] + ', ' + desclist[j].split(' ',1)[1])
            print(component[i], ',', desclist[j].split(' ',1)[1])
            #print('Comp : ', component[i], 'Desc : ', desclist[j].split(' ',1)[1])
            break
    if not found:
        print(component[i], ' not found in Description List')
print('Combined List : ', combinedlist)
Output:
R402 , MSG = Desc402
R403 , MSG = Desc403
R404 , MSG = Desc404
R406 not found in Description List
Combined List : ['R402, MSG = Desc402', 'R403, MSG = Desc403', 'R404, MSG = Desc404']
I have changed your description & component lists to cover all scenarios you may face. Also, your description list has extra quotes in each element. You would have to use escape characters if you want to keep these quotes in your list.
In your combined list, if you want to remove everything before the equal to sign (in description list) then use any one of the below (depending on all the elements in your description list).
desclist[j].split('=',1)[1]
desclist[j].rpartition('=')[2]
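The two expressions only differ when a description itself contains an equals sign. A small sketch, where the `tricky` string is a hypothetical entry invented to show the difference:

```python
desc = "R402 MSG = Desc402"
tricky = "R500 MSG = gain=2"   # hypothetical entry with a second '='

# split('=', 1) cuts at the FIRST '=' and keeps everything after it.
after_first = desc.split("=", 1)[1].strip()
tricky_first = tricky.split("=", 1)[1].strip()

# rpartition('=') cuts at the LAST '=' and keeps only what follows it.
after_last = desc.rpartition("=")[2].strip()
tricky_last = tricky.rpartition("=")[2].strip()

print(after_first, "|", after_last)      # same result when there is one '='
print(tricky_first, "|", tricky_last)    # different when there are two
```

If descriptions may legitimately contain `=`, the `split('=', 1)` form is probably the safer choice.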
Try this:
>>> desclist = ['R402 MSG = "4k2 1%"','R403 MSG ="100 1%"','R404 MSG ="4k 1%"', 'R407 MSG ="4k 1%"']
# For testing I have added 'R407 MSG ="4k 1%"'
>>> component = ['R402','R403','R404']
>>> from itertools import chain
>>> new_list = [[desc for desc in desclist if cid in desc] for cid in component]
>>> list(chain(*new_list))
Output:
['R402 MSG = "4k2 1%"', 'R403 MSG ="100 1%"', 'R404 MSG ="4k 1%"']
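An alternative sketch, assuming every description starts with the component ID followed by a space: build a dict keyed by ID once, then look each component up. Unlike the substring test `cid in desc`, this matches IDs exactly, so a hypothetical entry such as `R4021` is not mistaken for `R402`. The sample lists and the names `desc_by_id` and `combined` are illustrative only.

```python
desclist = ['R402 MSG = Desc402', 'R403 MSG = Desc403',
            'R404 MSG = Desc404', 'R4021 MSG = Desc4021']  # R4021 is a made-up entry
component = ['R402', 'R403', 'R404', 'R406']

# Map each component ID (text before the first space) to its description.
desc_by_id = {}
for entry in desclist:
    comp_id, _, description = entry.partition(' ')
    desc_by_id[comp_id] = description

# Keep only components that appear in both lists, in component order.
combined = [(cid, desc_by_id[cid]) for cid in component if cid in desc_by_id]

for cid, description in combined:
    print(cid, ',', description)
```

The list of tuples can be passed to `pandas.DataFrame(combined, columns=[...])` if a dataframe is wanted.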

Having trouble parsing a .CSV file into a dict

I've done some simple .csv parsing in python but have a new file structure that's giving me trouble. The input file is from a spreadsheet converted into a .CSV file. Here is an example of the input:
[Image: Layout]
Each set can have many layouts, and each layout can have many layers. Each layer has only one layer and name.
Here is the code I am using to parse it in. I suspect it's a logic/flow control problem because I've parsed things in before, just not this deep. The first header row is skipped via code. Any help appreciated!
import csv
import pprint
def import_layouts_schema(layouts_schema_file_name = 'C:\\layouts\\LAYOUT1.csv'):
    class set_template:
        def __init__(self):
            self.set_name =''
            self.layout_name =''
            self.layer_name =''
            self.obj_name =''
    def check_layout(st, row, layouts_schema):
        c=0
        if st.layout_name == '':
            st.layer_name = row[c+2]
            st.obj_name = row[c+3]
            layer = {st.layer_name : st.obj_name}
            layout = {st.layout_name : layer}
            layouts_schema.update({st.set_name : layout})
        else:
            st.layout_name = row[c+1]
            st.layer_name = row[c+2]
            st.obj_name = row[c+3]
            layer = {st.layer_name : st.obj_name}
            layout = {st.layout_name : layer}
            layouts_schema.update({st.set_name : layout})
        return layouts_schema
    def layouts_schema_parsing(obj_list_raw1): #, location_categories, image_schema, set_location):
        #------ init -----------------------------------
        skipfirst = True
        c = 0
        firstrow = True
        layouts_schema = {}
        end_flag = ''
        st = set_template()
        #---------- start parsing here -----------------
        print('Now parsing layouts schema list')
        for row in obj_list_raw1:
            #print ('This is the row: ', row)
            if skipfirst==True:
                skipfirst=False
                continue
            if row[c] != '':
                st.set_name = row[c]
                st.layout_name = row[c+1]
                st.layer_name = row[c+2]
                st.obj_name = row[c+3]
                print('FOUND A NEW SET. SET details below:')
                print('Set name:', st.set_name, 'Layout name:', st.layout_name, 'Layer name:', st.layer_name, 'Object name:', st.obj_name)
                if firstrow == True:
                    print('First row of layouts import!')
                    layer = {st.layer_name : st.obj_name}
                    layout = {st.layout_name : layer}
                    layouts_schema = {st.set_name : layout}
                    firstrow = False
                    check_layout(st, row, layouts_schema)
                    continue
                elif firstrow == False:
                    print('Not the first row of layout import')
                    layer = {st.layer_name : st.obj_name}
                    layout = {st.layout_name : layer}
                    layouts_schema.update({st.set_name : layout})
                    check_layout(st, row, layouts_schema)
        return layouts_schema
    #begin subroutine main
    layouts_schema_file_name ='C:\\Users\\jason\\Documents\\RAY\\layout_schemas\\ANIBOT_LAYOUTS_SCHEMA.csv'
    full_path_to_file = layouts_schema_file_name
    print('============ Importing LAYOUTS schema from: ', full_path_to_file , ' ==============')
    openfile = open(full_path_to_file)
    reader_ob = csv.reader(openfile)
    layout_list_raw1 = list(reader_ob)
    layouts_schema = layouts_schema_parsing(layout_list_raw1)
    print('=========== End of layouts schema import =========')
    return layouts_schema
layouts_schema = import_layouts_schema()
Feel free to throw away any part that doesn't work. I suspect I've gotten inside my own head a little bit here. A for loop or another while loop may do the trick. Ultimately I just want to parse the file into a dict with the same key structure shown, i.e. the final dict's first line would look like:
{'RESTAURANT': {'RR_FACING1': {'BACKDROP': 'restaurant1'}}}
And the rest follows from there. Ultimately I am going to use this key structure and the dict for other purposes. I just can't get the parsing down!
Wow, that's a lot of code!
Maybe try something simpler:
with open('file.csv') as f:
    keys = f.readline().strip().split(';') # assuming ";" is your csv fields separator
    for line in f:
        vals = line.strip().split(';')
        d = dict(zip(keys, vals))
        print(d)
Then either make a better data file (without blanks), or have the parser remember the previous values.
While I agree with @AK47 that the code review site may be the better place for this, I have received so much help from SO that I'll try to give back a little. IMHO you are overthinking the problem. Below is an approach that should get you going in the right direction and doesn't even require converting from Excel to CSV (I like the xlrd module, it's very easy to use). If you already have a CSV, just exchange the loop in the process_sheet() function. Basically, I store the last value seen for "SET" and "LAYOUT", and if the current cell is non-empty and different, I take the new value. Hope that helps. And yes, you should think about a better data structure (redundancy is not always bad if it lets you avoid empty cells :-) ).
import xlrd
def process_sheet(sheet : xlrd.sheet.Sheet):
    curr_set = ''
    curr_layout = ''
    for rownum in range(1, sheet.nrows):
        row = sheet.row(rownum)
        set_val = row[0].value.strip()
        layout_val = row[1].value.strip()
        if set_val != '' and set_val != curr_set:
            curr_set = set_val
        if layout_val != '' and layout_val != curr_layout:
            curr_layout = layout_val
        result = {curr_set: {curr_layout: {row[2].value: row[3].value}}}
        print(repr(result))
def main():
    # open a workbook (adapt your filename)
    # then get the first sheet (index 0)
    # and call the process function
    wbook = xlrd.open_workbook('/tmp/test.xlsx')
    sheet = wbook.sheet_by_index(0)
    process_sheet(sheet)
if __name__ == '__main__':
    main()
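For a file that is already in CSV form, the same "remember the last value seen" idea might look like the sketch below. It assumes four columns in the order set, layout, layer, object, with blank cells meaning "same as the row above". The sample rows, the column headers, and the function name `parse_layouts` are made up for illustration; the real file may differ.

```python
import csv
import io

# Hypothetical sample input; blank cells inherit the value from the row above.
SAMPLE = """SET,LAYOUT,LAYER,OBJECT
RESTAURANT,RR_FACING1,BACKDROP,restaurant1
,,TABLE,table1
,RR_FACING2,BACKDROP,restaurant2
KITCHEN,KT_FACING1,BACKDROP,kitchen1
"""

def parse_layouts(csv_file):
    """Build {set: {layout: {layer: object}}} from an open CSV file."""
    schema = {}
    curr_set = ''
    curr_layout = ''
    reader = csv.reader(csv_file)
    next(reader, None)                      # skip the header row
    for row in reader:
        if len(row) < 4:
            continue                        # ignore short or empty rows
        set_name, layout_name, layer_name, obj_name = (c.strip() for c in row[:4])
        if set_name:
            curr_set = set_name             # a new set starts here
        if layout_name:
            curr_layout = layout_name       # a new layout starts here
        # setdefault creates the nested dicts on first use and reuses them later
        schema.setdefault(curr_set, {}).setdefault(curr_layout, {})[layer_name] = obj_name
    return schema

layouts_schema = parse_layouts(io.StringIO(SAMPLE))
print(layouts_schema)
```

With a real file, `parse_layouts(open(path, newline=''))` can replace the `io.StringIO` call. Using `setdefault` instead of `update` avoids overwriting the layouts already collected for a set.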

Python - Getting Attributes From A File of Constants

I have a file of constant variables that I need to query and I am not sure how to go about it.
I have a database query which is returning user names and I need to find the matching user name in the file of constant variables.
The file looks like this:
SALES_MANAGER_01 = {"user_name": "BO01", "password": "password", "attend_password": "BO001",
"csm_password": "SM001", "employee_num": "BOSM001"}
There is just a bunch of users just like the one above.
My function looks like this:
@attr("user_test")
def test_get_user_for_login(self):
    application_code = 'BO'
    user_from_view = self.select_user_for_login(application_code=application_code)
    users = [d['USER'] for d in user_from_view]
    user_with_ent = choice(users)
    user_wo_ent = user_with_ent[-4:]
    password = ""
    global_users = dir(gum)
    for item in global_users:
        if user_wo_ent not in item.__getattr__("user_name"):
            user_with_ent = choice(users)
            user_wo_ent = user_with_ent[-4:]
        else:
            password = item.__getattr__("password")
    print(user_wo_ent, password)
Here gum is my file of constants, which I list with global_users = dir(gum). I know I am doing something wrong, since I am getting AttributeError: 'str' object has no attribute '__getattr__'. I am just not sure how to go about resolving it.
You should reverse your looping as you want to compare each item to your match condition. Also, you have a dictionary, so use it to do some heavy lifting.
You need to add some imports
import re
from ast import literal_eval
I've changed the dir(gum) bit to be this function.
def get_global_users(filename):
    gusers = {} # create a global users dict
    p_key = re.compile(r'\b\w*\b') # regex to get first part, e.g. SALES_MANAGER_01
    p_value = re.compile(r'\{.*\}') # regex to grab everything in {}
    with open(filename) as f: # open the file and work through it
        for line in f: # for each line
            gum_key = p_key.match(line) # pull out the key
            gum_value = p_value.search(line) # pull out the value
            ''' Here is the real action. update a dictionary
            with the match of gum_key and with match of gum_value'''
            gusers[gum_key.group()] = literal_eval(gum_value.group())
    return gusers # return the dictionary
The bottom of your existing code is replaced with this.
global_users = get_global_users(gum) # assign return to global_users
for key, value in global_users.items(): # walk through all key, value pairs
    if value['user_name'] != user_wo_ent:
        user_with_ent = choice(users)
        user_wo_ent = user_with_ent[-4:]
    else:
        password = value['password']
So a very simple answer was to get the dir of the constants file and then loop over it like so:
global_users = dir(gum)
for item in global_users:
    o = gum.__dict__[item]
    if type(o) is not dict:
        continue
    if gum.__dict__[item].get("user_name") == user_wo_ent:
        print(user_wo_ent, o.get("password"))
    else:
        print("User was not in global_user_mappings")
I was able to find the answer by doing the following:
def get_user_for_login(application_code='BO'):
    user_from_view = BaseServiceTest().select_user_for_login(application_code=application_code)
    users = [d['USER'] for d in user_from_view]
    user_with_ent = choice(users)
    user_wo_ent = user_with_ent[4:]
    global_users = dir(gum)
    user_dict = {'user_name': '', 'password': ''}
    for item in global_users:
        o = gum.__dict__[item]
        if type(o) is not dict:
            continue
        if user_wo_ent == o.get("user_name"):
            user_dict['user_name'] = user_wo_ent
            user_dict['password'] = o.get("password")
    return user_dict
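The original error comes from the fact that dir() returns attribute names as strings, not the objects themselves. A compact sketch of the same lookup using vars(), which yields name/value pairs directly, is shown below. The `gum` module here is a stand-in built with `types.ModuleType`, and the second user, the `TIMEOUT_SECONDS` constant, and the function name `find_user` are invented for illustration.

```python
import types

# Hypothetical stand-in for the real constants module ("gum" in the question).
gum = types.ModuleType("gum")
gum.SALES_MANAGER_01 = {"user_name": "BO01", "password": "password"}
gum.SALES_MANAGER_02 = {"user_name": "BO02", "password": "secret"}
gum.TIMEOUT_SECONDS = 30   # a non-dict constant that must be skipped

def find_user(module, user_name):
    """Return the first dict constant whose 'user_name' matches, else None."""
    for name, value in vars(module).items():
        # vars() also contains entries such as __name__; isinstance filters them out
        if isinstance(value, dict) and value.get("user_name") == user_name:
            return value
    return None

match = find_user(gum, "BO02")
print(match["password"] if match else "User was not found")
print(all(isinstance(n, str) for n in dir(gum)))   # dir() gives only names
```

Returning `None` for a missing user leaves it to the caller to decide whether to retry with another user name.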

Python multiprocessing: Reading a file and updating a dictionary

Let's assume that I have a text file with only 2 rows as follows:
File.txt:
100022441 #DavidBartonWB Guarding Constitution
100022441 RT #frankgaffney 2nd Amendment Guy.
First column is user id and second column is user tweet. I'd like to read the above text file and update the following dictionary:
d={'100022441':{'#frankgaffney': 0, '#DavidBartonWB': 0}}.
Here is my code:
def f(line):
    data = line.split('\t')
    uid = data[0]
    tweet = data[1]
    if uid in d.keys():
        for gn in d[uid].keys():
            if gn in tweet:
                return uid, gn, 1
            else:
                return uid, gn, 0
p = Pool(4)
with open('~/File.txt') as source_file:
    for uid, gn, r in p.map(f, source_file):
        d[uid][gn] += r
So basically I need to read each line of the file and determine whether the user is in my dictionary, and if so, whether the tweet contains any of the user's keys in the dictionary (e.g. '#frankgaffney' and '#DavidBartonWB'). So based on the two lines I wrote above, the code should produce:
d = {'100022441':{'#frankgaffney': 1, '#DavidBartonWB': 1 }}
But it gives:
d = {'100022441':{'#frankgaffney': 1, '#DavidBartonWB': 0 }}
For some reason the code always loses one of the keys for all users. Any idea what is wrong in my code?
Your file is tab delimited, and you are always checking the third column for the mention; it works correctly for the first mention because you are passing in the entire file to the function, not each line. So effectively you are doing this:
>>> s = '100022441\t#DavidBartonWB Guarding Constitution\n100022441\tRT#frankgaffney 2nd Amendment Guy.'
>>> s.split('\t')
['100022441', '#DavidBartonWB Guarding Constitution\n100022441', 'RT#frankgaffney 2nd Amendment Guy.']
I recommend two approaches:
Map your function to each line in the file.
Use regular expressions for a more robust search.
Try this version:
import re
d = {'100022441':{'#frankgaffney': 0, '#DavidBartonWB': 0}}
e = r'(#\w+)'
def parser(line):
    key, tweet = line.split('\t')
    data = d.get(key)
    if data:
        mentions = re.findall(e, tweet)
        for mention in mentions:
            if mention in data.keys():
                d[key][mention] += 1
with open('~/File.txt') as f:
    for line in f:
        parser(line)
print(d)
Once you've confirmed it's working correctly, then you can multi-process it:
import itertools, re
from multiprocessing import Process, Manager
def parse(queue, d, m):
    while True:
        line = queue.get()
        if line is None:
            return # we are done with this worker process
        key, tweet = line.split('\t')
        data = d.get(key)
        e = r'(#\w+)'
        if data:
            mentions = re.findall(e, tweet)
            for mention in mentions:
                if mention in data:
                    if mention not in m:
                        m[mention] = 1
                    else:
                        m[mention] += 1
if __name__ == '__main__':
    workers = 2
    manager = Manager()
    d = manager.dict()
    d2 = manager.dict()
    d = {'100022441': ['#frankgaffney', '#DavidBartonWB']}
    queue = manager.Queue(workers)
    worker_pool = []
    for i in range(workers):
        p = Process(target=parse, args=(queue, d, d2))
        p.start()
        worker_pool.append(p)
    # Fill the queue with data for the workers
    with open(r'tweets2.txt') as f:
        iters = itertools.chain(f, (None,)*workers)
        for line in iters:
            queue.put(line)
    for p in worker_pool:
        p.join()
    for i,data in d.items():
        print('For ID: {}'.format(i))
        for key in data:
            print(' {} - {}'.format(key, d2.get(key, 0)))
second column is data[1], not data[2]
the fact that data[2] works means that you are splitting into words, not columns
if you want to search for the user key as a separate word (as opposed to substring), you need tweet=data[1:]
if you want to search for a substring you need to split into exactly two pieces: uid,tweet=line.split(None,1)
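Another detail in the question's f() is worth a look: both branches inside the for loop return, so the function leaves after checking only the first key of d[uid], and the other keys are never tested. That would explain one key always staying at 0. A minimal sketch that reports every key instead is below; it uses the built-in map so it runs anywhere, and the function name `count_mentions` and the in-memory `lines` list are assumptions standing in for the real file.

```python
d = {'100022441': {'#frankgaffney': 0, '#DavidBartonWB': 0}}

def count_mentions(line):
    """Return one (uid, key, hit) tuple for EVERY key of the user, not just the first."""
    uid, tweet = line.rstrip('\n').split('\t', 1)
    return [(uid, gn, 1 if gn in tweet else 0) for gn in d.get(uid, {})]

# Hypothetical in-memory stand-in for the two lines of File.txt (tab separated).
lines = [
    '100022441\t#DavidBartonWB Guarding Constitution\n',
    '100022441\tRT #frankgaffney 2nd Amendment Guy.\n',
]

# The parent applies the updates, so workers never need to mutate d themselves.
for results in map(count_mentions, lines):
    for uid, gn, hit in results:
        d[uid][gn] += hit

print(d)
```

Since the function only reads d and returns its findings, swapping map for a multiprocessing Pool's map should work in the same way, provided d is defined at module level so the worker processes can see it.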

Getting multiple children's values using minidom

As you can see from the xml here there are multiple <item> nodes with a set of children such as <summary>, <status> and <key>.
The problem I've encountered is that in using minidom, it's possible to get values of the firstChild and lastChild, but not necessarily any values in between.
I've created the below which doesn't work, but I think is a close approximation of what I need to be doing
import xml.dom.minidom
xml = xml.dom.minidom.parse(result) # or xml.dom.minidom.parseString(xml_string)
itemList = xml.getElementsByTagName('item')
for item in itemList[1:]:
    summaryList = item.getElementsByTagName('summary')
    statusList = item.getElementsByTagName('status')
    keyList = item.getElementsByTagName('key')
    lineText = (summaryList[0].nodeValue + " " + statusList[0].nodeValue + " " + keyList[0].nodeValue)
    p = Paragraph(lineText, style)
    Story.append(p)
Define a get_text() function that joins all of the text child nodes (see this answer):
import xml.dom.minidom
def get_text(element):
    return " ".join(t.nodeValue for t in element[0].childNodes
                    if t.nodeType == t.TEXT_NODE)
dom = xml.dom.minidom.parseString(data)
itemList = dom.getElementsByTagName('item')
for item in itemList[1:]:
    summaryList = item.getElementsByTagName('summary')
    statusList = item.getElementsByTagName('status')
    keyList = item.getElementsByTagName('key')
    print(get_text(summaryList))
    print(get_text(statusList))
    print(get_text(keyList))
    print("----")
prints:
Unapprove all pull request reviewers after major change
Needs Triage
STASH-4473
----
Allow using left/right arrow to move side by side diff left/right
Needs Triage
STASH-4478
----
Hope that helps.
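The reason the question's code fails is that an element node's nodeValue is None in minidom; the text lives in the element's child text nodes. A self-contained sketch follows, using a tiny made-up feed (the XML content and the helper name `child_text` are assumptions, not the real data from the question):

```python
import xml.dom.minidom

# Hypothetical miniature feed with the same tag names as in the question.
SAMPLE = """<rss><channel>
<item><summary>Fix login bug</summary><status>Open</status><key>DEMO-1</key></item>
<item><summary>Add dark mode</summary><status>Needs Triage</status><key>DEMO-2</key></item>
</channel></rss>"""

def child_text(item, tag):
    """Return the text of the first <tag> element under item, or '' if absent."""
    elements = item.getElementsByTagName(tag)
    if not elements:
        return ""
    return "".join(node.data for node in elements[0].childNodes
                   if node.nodeType == node.TEXT_NODE)

dom = xml.dom.minidom.parseString(SAMPLE)
rows = []
for item in dom.getElementsByTagName("item"):
    rows.append(" ".join(child_text(item, tag) for tag in ("summary", "status", "key")))

for row in rows:
    print(row)

# An element itself carries no value; only its text children do.
print(dom.getElementsByTagName("summary")[0].nodeValue)   # None
```

Each string in `rows` could then be passed to `Paragraph(lineText, style)` as in the question.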
How about something like
for item in itemList:
    lineText = ' '.join(child.nodeValue for child in item.childNodes)
    p = Paragraph(lineText, style)
    Story.append(p)
