How to convert a semicolon-delimited file to a nested dict? - python

I am trying to convert a semicolon-delimited file to a nested dict. I've been working on this for quite a bit this morning, and I'm guessing I am overlooking something simple:
Input (Sample)
This is actually about 200 lines long. Just a small sample.
key;name;desc;category;type;action;range;duration;skill;strain_mod;apt_bonus
ambiencesense;Ambience Sense;This sleight provides the async with an instinctive sense about an area and any potential threats nearby. The async receives a +10 modifier to all Investigation, Perception, Scrounging, and Surprise Tests.;psi-chi;passive;automatic;self;constant;;0;
cogboost;Cognitive Boost;The async can temporarily elevate their cognitive performance.;psi-chi;active;quick;self;temp;;-1;{'COG': 5}
Current Output
[['key',
'name',
'desc',
'category',
'type',
'action',
'range',
'duration',
'skill',
'strain_mod',
'apt_bonus'],
['ambiencesense',
'Ambience Sense',
'This sleight provides the async with an instinctive sense about an area and any potential threats nearby. The async receives a +10 modifier to all Investigation, Perception, Scrounging, and Surprise Tests.',
'psi-chi',
'passive',
'automatic',
'self',
'constant',
'',
'0',
''],
['cogboost',
'Cognitive Boost',
'The async can temporarily elevate their cognitive performance.',
'psi-chi',
'active',
'quick',
'self',
'temp',
'',
'-1',
"{'COG': 5}"]]
Desired Output
blahblah = {
'ambiencesense': {
'name': 'Ambience Sense',
'desc': 'This sleight provides the async with an instinctive sense about an area and any potential threats nearby. The async receives a +10 modifier to all Investigation, Perception, Scrounging, and Surprise Tests.',
'category': 'psi-chi',
'type': 'passive',
'action': 'automatic',
'range': 'self',
'duration': 'constant',
'skill': '',
'strain_mod': '0',
'apt_bonus': '',
},
'cogboost': {
'name': 'Cognitive Boost',
'desc': 'The async can temporarily elevate their cognitive performance.',
'category': 'psi-chi',
'type': 'active',
'action': 'quick',
'range': 'self',
'duration': 'temp',
'skill': '',
'strain_mod': '-1',
'apt_bonus': {'COG': 5},
},
...
Script (Nonfunctional)
#!/usr/bin/env python
# Usage: ./csvdict.py <filename to convert to dict> <file to output>
import csv
import sys
import pprint

def parse(filename):
    with open(filename, 'rb') as csvfile:
        dialect = csv.Sniffer().sniff(csvfile.read(), delimiters=';')
        csvfile.seek(0)
        reader = csv.reader(csvfile, dialect)
        dict_list = []
        for line in reader:
            dict_list.append(line)
        return dict_list
    # unreachable: the function has already returned above
    new_dict = {}
    for item in dict_list:
        key = item.pop('key')
        new_dict[key] = item

output = parse(sys.argv[1])
with open(sys.argv[2], 'wt') as out:
    pprint.pprint(output, stream=out)
Working Script
#!/usr/bin/env python
# Usage: ./csvdict.py <input filename> <output filename>
import sys
import pprint

data = {}
error = 'Incorrect number of arguments.\nUsage: ./csvdict.py <input filename> <output filename>'
if len(sys.argv) != 3:
    print(error)
else:
    file_name = sys.argv[1]  # read argv only after the argument count check
    with open(file_name, 'r') as test_fh:
        header_line = next(test_fh)
        header_line = header_line.strip()
        headers = header_line.split(';')
        index_headers = {index: header for index, header in enumerate(headers)}
        for line in test_fh:
            line = line.strip()
            values = line.split(';')
            index_vals = {index: val for index, val in enumerate(values)}
            data[index_vals[0]] = {index_headers[key]: value for key, value in index_vals.items() if key != 0}
    with open(sys.argv[2], 'wt') as out:
        pprint.pprint(data, stream=out)
The only thing this doesn't handle well is the embedded dicts. Any ideas how to clean this up? (see apt_bonus)
'cogboost': {'action': 'quick',
'apt_bonus': "{'COG': 5}",
'category': 'psi-chi',
'desc': 'The async can temporarily elevate their cognitive performance.',
'duration': 'temp',
'name': 'Cognitive Boost',
'range': 'self',
'skill': '',
'strain_mod': '-1',
'type': 'active'},
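For the embedded apt_bonus dict, one hedged option (a sketch, not the only way) is ast.literal_eval, which safely parses strings that are valid Python literals and leaves everything else alone:

```python
import ast

def parse_field(value):
    """Interpret a field like "{'COG': 5}" as a Python literal;
    fall back to the raw string if it isn't one."""
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value

print(parse_field("{'COG': 5}"))  # {'COG': 5}
print(parse_field("self"))        # self
```

Note that applying this to every value would also turn strain_mod into an int, which may or may not be what you want.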

Here's another version that is a bit more drawn out, but has no dependencies.
file_name = "<path>/test.txt"
data = {}
with open(file_name, 'r') as test_fh:
    header_line = next(test_fh)
    header_line = header_line.strip()
    headers = header_line.split(';')
    index_headers = {index: header for index, header in enumerate(headers)}
    for line in test_fh:
        line = line.strip()
        values = line.split(';')
        index_vals = {index: val for index, val in enumerate(values)}
        data[index_vals[0]] = {index_headers[key]: value for key, value in index_vals.items() if key != 0}
print(data)

This is pretty easy to do with pandas:
In [7]: import pandas as pd
In [8]: pd.read_clipboard(sep=";", index_col=0).T.to_dict()
Out[8]:
{'ambiencesense': {'action': 'automatic',
'apt_bonus': nan,
'category': 'psi-chi',
'desc': 'This sleight provides the async with an instinctive sense about an area and any potential threats nearby. The async receives a +10 modifier to all Investigation, Perception, Scrounging, and Surprise Tests.',
'duration': 'constant',
'name': 'Ambience Sense',
'range': 'self',
'skill': nan,
'strain_mod': 0,
'type': 'passive'},
'cogboost': {'action': 'quick',
'apt_bonus': "{'COG': 5}",
'category': 'psi-chi',
'desc': 'The async can temporarily elevate their cognitive performance.',
'duration': 'temp',
'name': 'Cognitive Boost',
'range': 'self',
'skill': nan,
'strain_mod': -1,
'type': 'active'}}
In your case, you'd use pd.read_csv() instead of .read_clipboard() but it would look roughly the same. You might also need to tweak it a little if you want to parse the apt_bonus column as a dictionary.
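One way to sketch that tweak (assuming the apt_bonus cells are Python-literal strings as in the sample; the maybe_literal helper is hypothetical): pass a converter for that column to pd.read_csv.

```python
import ast
import io
import pandas as pd

# minimal stand-in for the real file, using the same delimiter and columns
csv_text = """key;name;apt_bonus
cogboost;Cognitive Boost;{'COG': 5}"""

def maybe_literal(value):
    # parse "{'COG': 5}" into a dict; leave plain strings alone
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value

df = pd.read_csv(io.StringIO(csv_text), sep=";", index_col=0,
                 converters={"apt_bonus": maybe_literal})
data = df.T.to_dict()
print(data["cogboost"]["apt_bonus"])  # {'COG': 5}
```

The converter runs on the raw string before pandas does its own type inference, so the nested dict survives into the final to_dict() output.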

Try this Pythonic way, which uses no libraries:
s = '''key;name;desc;category;type;action;range;duration;skill;strain_mod;apt_bonus
ambiencesense;Ambience Sense;This sleight provides the async with an instinctive sense about an area and any potential threats nearby. The async receives a +10 modifier to all Investigation, Perception, Scrounging, and Surprise Tests.;psi-chi;passive;automatic;self;constant;;0;
cogboost;Cognitive Boost;The async can temporarily elevate their cognitive performance.;psi-chi;active;quick;self;temp;;-1;{'COG': 5}'''
lists = [delim.split(';') for delim in s.split('\n')]
keyIndex = lists[0].index('key')
nested = {lst[keyIndex]:{lists[0][i]:lst[i] for i in range(len(lists[0])) if i != keyIndex} for lst in lists[1:]}
That results in:
{
'cogboost': {
'category': 'psi-chi',
'name': 'Cognitive Boost',
'strain_mod': '-1',
'duration': 'temp',
'range': 'self',
'apt_bonus': "{'COG': 5}",
'action': 'quick',
'skill': '',
'type': 'active',
'desc': 'The async can temporarily elevate their cognitive performance.'
},
'ambiencesense': {
'category': 'psi-chi',
'name': 'Ambience Sense',
'strain_mod': '0',
'duration': 'constant',
'range': 'self',
'apt_bonus': '',
'action': 'automatic',
'skill': '',
'type': 'passive',
'desc': 'This sleight provides the async with an instinctive sense about an area and any potential threats nearby. The async receives a +10 modifier to all Investigation, Perception, Scrounging, and Surprise Tests.'
}
}

Related

[SOLVED] Can't turn a list file into rows [closed]

I'm using an API from Anomali to gather an intel list, and I want to know how I could run the code so that it outputs all the needed column headers to an Excel file.
So I wrote some code that pulls out the needed columns:
import requests
import json
import pandas as pd
import csv

url = 'https://api.threatstream.com/api/v2/intelligence/?itype=bot_ip'
csv_columns = ['ip', 'source_created', 'status', 'itype', 'expiration_ts', 'is_editable', 'feed_id', 'update_id',
               'value', 'ispublic', 'threat_type', 'workgroups', 'rdns', 'confidence', 'uuid', 'retina_confidence',
               'trusted_circle_ids', 'id', 'source', 'owner_organization_id', 'import_session_id', 'source_modified',
               'type', 'sort', 'description', 'tags', 'threatscore', 'latitude', 'modified_ts', 'org', 'asn',
               'created_ts', 'tlp', 'is_anonymous', 'country', 'source_reported_confidence', 'can_add_public_tags',
               'subtype', 'meta', 'resource_uri']

with open("AnomaliThreat.csv", "a", newline='') as filecsv:
    writer = csv.DictWriter(filecsv, fieldnames=csv_columns)
    writer.writeheader()

headers = {
    'Accept': 'application/json',
    'Authorization': 'apikey testing:wdwfawaf12321rfewawafa'
}
response = requests.get(url=url, headers=headers)
json_Data = json.loads(response.content)
result = json_Data["objects"]

with open("AnomaliThreat.csv", "a", newline='') as filecsv:
    writer = csv.DictWriter(filecsv, fieldnames=csv_columns)
    writer.writerow(result)
If I run this code, all I get is 'list' object has no attribute 'keys'. My guess is that inside the response there's a list inside the list, or another string inside the list, for example like this:
'trusted_circle_ids': [1241412, 212141241]
or this
'tags': [{'id': 'fwafwff', 'name': 'wfwafwawf'},
{'id': '31231ewfw',
'name': 'fwafwafwafaw#gmail.com.wafawfawfds.com'}],
And this is what's inside the Anomali response:
[{'source_created': None,
'status': 'inactive',
'itype': 'bot_ip',
'expiration_ts': '',
'ip': '231.24124.1241.412',
'is_editable': False,
'feed_id': 23112231,
'update_id': 231231,
'value': '124124124141224141',
'is_public': False,
'threat_type': 'bot',
'workgroups': [],
'rdns': None,
'confidence': 12,
'uuid': '3123414124124142',
'retina_confidence': 52414,
'trusted_circle_ids': [1241412, 212141241],
'id': fwaffewaewafw1231231,
'source': 'wfawfwaefwadfwa',
'owner_organization_id': 2,
'import_session_id': None,
'source_modified': None,
'type': 'ip',
'sort': [312312424124141241, '1241414214241'],
'description': None,
'tags': [{'id': 'fwafwff', 'name': 'wfwafwawf'},
{'id': '31231ewfw',
'name': 'fwafwafwafaw#gmail.com.wafawfawfds.com'}],
'threatscore': 412,
'latitude': wafefwaf,
'modified_ts': 'wawafwadfd',
'org': 'fawfwafawe',
'asn': 'fwafwa2131231',
'created_ts': '41241241241241',
'tlp': None,
'is_anonymous': False,
'country': 'fwafw',
'source_reported_confidence': 21,
'can_add_public_tags': False,
'longitude': --321412,
'subtype': None,
'meta': {'detail2': 'bi2141412412342424',
'severity': '3123124r3'},
'resource_uri': '/api/v2/intelligence/241fsdfsf241325/'},
{'source_created': None,
'status': 'inactive',
'itype': 'bot_ip',
'expiration_ts': '',
'ip': '231.24124.1241.412',
'is_editable': False,
'feed_id': 23112231,
'update_id': 231231,
'value': '124124124141224141',
'is_public': False,
'threat_type': 'bot',
'workgroups': [],
'rdns': None,
'confidence': 12,
'uuid': '3123414124124142',
'retina_confidence': 52414,
'trusted_circle_ids': [1241412, 212141241],
'id': fwaffewaewafw1231231,
'source': 'wfawfwaefwadfwa',
'owner_organization_id': 2,
'import_session_id': None,
'source_modified': None,
'type': 'ip',
'sort': [312312424124141241, '1241414214241'],
'description': None,
'tags': [{'id': 'fwafwff', 'name': 'wfwafwawf'},
{'id': '31231ewfw',
'name': 'fwafwafwafaw#gmail.com.wafawfawfds.com'}],
'threatscore': 412,
'latitude': wafefwaf,
'modified_ts': 'wawafwadfd',
'org': 'fawfwafawe',
'asn': 'fwafwa2131231',
'created_ts': '41241241241241',
'tlp': None,
'is_anonymous': False,
'country': 'fwafw',
'source_reported_confidence': 21,
'can_add_public_tags': False,
'longitude': --321412,
'subtype': None,
'meta': {'detail2': 'bi2141412412342424',
'severity': '3123124r3'},
'resource_uri': '/api/v2/intelligence/241fsdfsf241325/'}]
I'm open to any suggestions on how to make it so that the results can be written to an Excel file.
Problem solved!
I needed to iterate over the result, so I added these lines:
csv_writer = csv.writer(data_file)
count = 0
for res in result:
    if count == 0:
        header = res.keys()
        csv_writer.writerow(header)
        count += 1
    csv_writer.writerow(res.values())
data_file.close()
You can try doing something like this, if I understood correctly:
import requests
import json
import pandas as pd

url = 'https://api.threatstream.com/api/v2/intelligence/?itype=bot_ip'
csv_columns = ['ip', 'source_created', 'status', 'itype', 'expiration_ts', 'is_editable', 'feed_id', 'update_id',
               'value', 'ispublic', 'threat_type', 'workgroups', 'rdns', 'confidence', 'uuid', 'retina_confidence',
               'trusted_circle_ids', 'id', 'source', 'owner_organization_id', 'import_session_id', 'source_modified',
               'type', 'sort', 'description', 'tags', 'threatscore', 'latitude', 'modified_ts', 'org', 'asn',
               'created_ts', 'tlp', 'is_anonymous', 'country', 'source_reported_confidence', 'can_add_public_tags',
               'subtype', 'meta', 'resource_uri']
headers = {
    'Accept': 'application/json',
    'Authorization': 'apikey testing:wdwfawaf12321rfewawafa'
}
response = requests.get(url=url, headers=headers)
json_Data = json.loads(response.content)
result = json_Data["objects"]

# result is a list of dicts, so build one row per record,
# keeping only the keys listed in csv_columns
rows = []
for res in result:
    rows.append({key: value for key, value in res.items() if key in csv_columns})
dataframe_1 = pd.DataFrame(rows, columns=csv_columns)
dataframe_1.to_csv("AnomaliThreat.csv")
Something along those lines: iterate through the key/value pairs within each record of the result, check whether the key is in csv_columns, keep that pair, and finally, once all that is done, just use DataFrame.to_csv.
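One wrinkle the approach above doesn't address: nested fields such as tags or trusted_circle_ids don't fit into a single CSV/Excel cell. A hedged sketch of one workaround (the sample records here are made up, shaped like the API response in the question) is to serialize nested values to JSON strings before writing:

```python
import io
import json
import pandas as pd

# hypothetical records shaped like the API response above
result = [{"ip": "1.2.3.4",
           "trusted_circle_ids": [1241412, 212141241],
           "tags": [{"id": "a", "name": "b"}]}]

df = pd.DataFrame(result)

# CSV/Excel cells hold scalars, so serialize nested lists/dicts to JSON strings
for col in df.columns:
    df[col] = df[col].map(lambda v: json.dumps(v) if isinstance(v, (list, dict)) else v)

buf = io.StringIO()           # write to a buffer here; a filename works the same way
df.to_csv(buf, index=False)
print(buf.getvalue().splitlines()[0])  # ip,trusted_circle_ids,tags
```

The JSON strings round-trip: json.loads on the cell recovers the original list or dict later.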

How to iterate a for loop over dictionaries in Python

I am new to Python. I am working on a project in which I need to add agents to a room; then the user can call the available agents. But I am stuck on how to add the agents.
I need to add agents to a dict like:
{1: {'fkb1bDXXF_qh7AgyAAAD': {'Type': 'Agent', 'First_Name': '', 'Last_Name': '', 'Data': ''}}, 2: {'Ttr-d9HWdzkgaPrsAAAB': {'Type': 'Agent', 'First_Name': '', 'Last_Name': '', 'Data': ''}}}
The agent number and session_id are different for each agent, but I am getting output like this:
{1: {'fkb1bDXXF_qh7AgyAAAD': {'Type': 'Agent', 'First_Name': '', 'Last_Name': '', 'Data': ''}}}
{1: {'Ttr-d9HWdzkgaPrsAAAB': {'Type': 'Agent', 'First_Name': '', 'Last_Name': '', 'Data': ''}}}
I need help with this.
import socketio

sio = socketio.Server()
app = socketio.WSGIApp(sio, static_files={
    '/': './public/'
})

agents = {}
total_agents = 0

def add_agents(sid):
    total_agents = 0
    global total_rooms
    global users
    global sid_room_identifier
    for key in agents.keys():
        agents[key][sid] = {}
        agents[key][sid] = {
            'Type': 'Agent',
            'First_Name': '',
            'Last_Name': '',
            'Data': ''
        }
    if (total_agents == 0):
        total_agents += 1
        agents[total_agents] = {
            sid: {
                'Type': 'Agent',
                'First_Name': '',
                'Last_Name': '',
                'Data': ''
            }
        }
        print(total_agents)
        print(agents)
        return total_agents
    return agents

@sio.event
def connect(sid, environ):
    print(sid, 'connected')
    add_agents(sid)

@sio.event
def disconnect(sid):
    print(sid, 'disconnected')
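For what it's worth, a minimal sketch of the accumulation the question seems to be after (the socketio wiring is omitted; only the counting logic is shown). The key point is that total_agents must not be reset to 0 inside the function; declaring it global keeps the count across calls:

```python
agents = {}
total_agents = 0

def add_agents(sid):
    """Register a new agent under the next agent number, keeping earlier ones."""
    global total_agents          # persist the counter across calls
    total_agents += 1
    agents[total_agents] = {
        sid: {'Type': 'Agent', 'First_Name': '', 'Last_Name': '', 'Data': ''}
    }
    return total_agents

add_agents('fkb1bDXXF_qh7AgyAAAD')
add_agents('Ttr-d9HWdzkgaPrsAAAB')
print(agents)  # both agents now live in one dict, under keys 1 and 2
```

Calling add_agents from the connect handler would then grow the single agents dict instead of producing a fresh one-entry dict each time.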

DuckDuckGo API not responding when space in encoded url query

I kind of have two real questions. Both relate to this code:
import urllib.parse
import requests

def query(q):
    base_url = "https://api.duckduckgo.com/?q={}&format=json"
    resp = requests.get(base_url.format(urllib.parse.quote(q)))
    return resp.json()
One is this: When I query something like this: "US Presidents", I get back something like this:
{'Abstract': '', 'AbstractSource': '', 'AbstractText': '', 'AbstractURL': '', 'Answer': '', 'AnswerType': '', 'Definition': '', 'DefinitionSource': '', 'DefinitionURL': '', 'Entity': '', 'Heading': '', 'Image': '', 'ImageHeight': '', 'ImageIsLogo': '', 'ImageWidth': '', 'Infobox': '', 'Redirect': '', 'RelatedTopics': [], 'Results': [], 'Type': '', 'meta': {'attribution': None, 'blockgroup': None, 'created_date': '2021-03-24', 'description': 'testing', 'designer': None, 'dev_date': '2021-03-24', 'dev_milestone': 'development', 'developer': [{'name': 'zt', 'type': 'duck.co', 'url': 'https://duck.co/user/zt'}], 'example_query': '', 'id': 'just_another_test', 'is_stackexchange': 0, 'js_callback_name': 'another_test', 'live_date': None, 'maintainer': {'github': ''}, 'name': 'Just Another Test', 'perl_module': 'DDG::Lontail::AnotherTest', 'producer': None, 'production_state': 'offline', 'repo': 'fathead', 'signal_from': 'just_another_test', 'src_domain': 'how about there', 'src_id': None, 'src_name': 'hi there', 'src_options': {'directory': '', 'is_fanon': 0, 'is_mediawiki': 0, 'is_wikipedia': 0, 'language': '', 'min_abstract_length': None, 'skip_abstract': 0, 'skip_abstract_paren': 0, 'skip_icon': 0, 'skip_image_name': 0, 'skip_qr': '', 'src_info': '', 'src_skip': ''}, 'src_url': 'Hello there', 'status': None, 'tab': 'is this source', 'topic': [], 'unsafe': None}}
Basically, everything is empty. Even the Heading key, which I know was sent as "US Presidents" encoded into url form. This issue seems to affect all queries I send with a space in them. Even when I go to this url: "https://api.duckduckgo.com/?q=US%20Presidents&format=json&pretty=1" in a browser, all I get is a bunch of blank json keys.
My next question is this. When I send in something like this: "1+1", the json response's "Answer" key is this:
{'from': 'calculator', 'id': 'calculator', 'name': 'Calculator', 'result': '', 'signal': 'high', 'templates': {'group': 'base', 'options': {'content': 'DDH.calculator.content'}}}
Everything else seems to be correct, but the 'result' should be '2', should it not? The entire rest of the json seems to be correct, including all 'RelatedTopics'
Any help with this would be greatly appreciated.
Basically, the DuckDuckGo API is not a real search engine; it is more like a dictionary of instant answers. So try US%20President instead of US%20Presidents and you'll get an answer. For encoding, spaces work, but if it's not a fixed term I would prefer the plus sign, which you can get by using urllib.parse.quote_plus().
With the calculation you're right. But I see absolutely no use case for a calculator API within Python code; it is like using a trampoline to travel to the moon when there is a rocket available. Maybe they see it the same way and simply do not offer calculation services in their API.
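A small sketch of the encoding suggestion (build_url is a hypothetical helper; the actual response content still depends on DuckDuckGo):

```python
import urllib.parse

def build_url(q):
    base_url = "https://api.duckduckgo.com/?q={}&format=json"
    # quote_plus encodes spaces as '+' rather than '%20'
    return base_url.format(urllib.parse.quote_plus(q))

print(build_url("US Presidents"))
# https://api.duckduckgo.com/?q=US+Presidents&format=json
```

quote() and quote_plus() differ only in how they treat spaces (and '+'); for query-string parameters the plus form is the conventional choice.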

Fetch specific record from Kafka in Python using KafkaConsumer by filitering the transactions with sessionID

Context:
We have a requirement where I need to capture a specific transaction using the KafkaConsumer library. I use a unique session ID as the key, which I obtain from a different tool (let's call it the Feeder tool), to capture the transaction.
I run my code immediately once the session ID is obtained from the Feeder.
Problem
I am able to fetch multiple records from Kafka, but I don't see the record which I'm trying to filter using the session ID.
Code
from kafka import KafkaConsumer
import json

SESSION = 'sessionID'

def consumeRecords(topic, group, bootstrapserver, auto_offset_reset, mySession, auto_commit_boolean):
    consumer = KafkaConsumer(topic, group_id=group, bootstrap_servers=bootstrapserver,
                             auto_offset_reset=auto_offset_reset,
                             enable_auto_commit=auto_commit_boolean)
    consumer.topics()
    consumer.seek_to_beginning()
    try:
        while True:
            print("CALLING POLL")
            records = consumer.poll(timeout_ms=1000)
            print("RETURNED FROM POLL")
            if records:
                for consumedRecord in records.values():
                    for msg in consumedRecord:
                        json_data = json.loads(msg.value.decode('utf-8'))
                        # print(json_data)
                        if 'alias' in json_data.keys() and json_data['alias'] == 'myServer':
                            current_session = json_data[SESSION]
                            print("SESSION is :", current_session)
                            if mySession == current_session:
                                print('My record is ', json_data)
    except Exception as e:
        print("Unable to find any related sessions")
        print(e)

if __name__ == '__main__':
    KAFKA_TOPIC = 'e-commerce.request'
    KAFKA_GROUP = 'test'
    KAFKA_BROKERS = ['ABC.net:9092', 'DEF:9092']
    auto_commit = False
    consumeRecords(KAFKA_TOPIC, KAFKA_GROUP, KAFKA_BROKERS, 'earliest',
                   '38l87jondkvnefNW886QMTWVcN6S4my5Y-No167ZzqF', auto_commit)
I'm supposed to print the following JSON data consumed from Kafka, but my code doesn't fetch this record, and hence it prints nothing and runs for an infinite time:
{'Type': 'request', 'requestID': '2018100819564659-5', 'payload': {'timing': {'startTime': '20181008195624322', 'total': '0.063', 'totalActions': '0', 'jsp': '0.063'}, 'user': {'orgID': '', 'userID': '', 'newComer': 'FALSE', 'helpdeskUserID': '', 'helpdeskUserOrgID': '', 'travelerID': ''}, 'version': '1.0.0', 'client': {'referrer': '', 'ip': ''}, 'url': {'parameters': {'JSESSIONID': '38l87jondkvnefNW886QMTWVcN6S4my5Y-No167ZzqF!430553578!-1652153437'}, 'baseUrl': 'http://server_url', 'path': 'DUMMY', 'method': 'POST'}, 'actions': [{'cumulTiming': '0', 'name': 'OverrideServlet', 'isChained': 'FALSE', 'features': '[GlobalTimeSpent = 0][PRE_RULES = 0][POST_RULES = 0]', 'chainParent': ''}], 'context': {'sessionSize': '12|12', 'fatalError': 'FALSE', 'requestType': 'XML', 'error': [], 'requestedSessionID': '', 'templateName': ''}}, 'Hostname': 'DummyAgain', 'sessionID': '38l87jondkvnefNW886QMTWVcN6S4my5Y-No167ZzqF', 'Site': 'ABCDEFGH', 'ClientId': 1234551515439, 'Epoch': 1539028584353, 'IP': 'A.B.C.D', 'alias': 'myServer', 'SeqNb': 21845, 'Source': 'eCommerce'}
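To rule out a bug in the matching logic itself, the filtering step can be pulled into a small, testable function (matches_session is a hypothetical name; the field names come from the sample record above, and 'abc123' is a made-up session ID):

```python
import json

def matches_session(raw_value, my_session, session_key='sessionID'):
    """Decode a Kafka message value and return the record if its alias is
    'myServer' and its session ID matches; otherwise return None."""
    record = json.loads(raw_value.decode('utf-8'))
    if record.get('alias') == 'myServer' and record.get(session_key) == my_session:
        return record
    return None

sample = json.dumps({'alias': 'myServer', 'sessionID': 'abc123'}).encode('utf-8')
print(matches_session(sample, 'abc123') is not None)  # True
```

If this logic checks out against the expected record, the remaining suspects are on the consumer side, e.g. whether the record was ever produced to this topic, or whether seek_to_beginning actually took effect for the assigned partitions.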

What's causing local variable reference error?

Also: can someone tell me why it's not picking up 'website'?
I'm trying to figure out what's causing this error, which arises on line 60:
UnboundLocalError: local variable 'name' referenced before assignment
Code:
import re
import csv
import json
import jsonpickle
from nameparser import HumanName
from pprint import pprint
from string import punctuation, whitespace

def parse_ieca_gc(s):
    ########################## HANDLE NAME ELEMENT ###############################
    degrees = ['M.A.T.', 'Ph.D.', 'MA', 'J.D.', 'Ed.M.', 'M.A.', 'M.B.A.', 'Ed.S.', 'M.Div.', 'M.Ed.', 'RN', 'B.S.Ed.', 'M.D.']
    degrees_list = []
    # check whether the name string has an area / has a comma
    if ',' in s['name']:
        # separate area of practice from name and degree and bind this to var 'area'
        split_area_nmdeg = s['name'].split(',')
        area = split_area_nmdeg.pop()
        print 'split area nmdeg'
        print area
        print split_area_nmdeg
        # Split the name and deg by spaces. If there's a deg, it will match one of the elements and be stored in the deg list. The deg is removed from the name_deg list and all that's left is the name.
        split_name_deg = re.split('\s', split_area_nmdeg[0])
        for word in split_name_deg:
            for deg in degrees:
                if deg == word:
                    degrees_list.append(split_name_deg.pop())
                    name = ' '.join(split_name_deg)
    # if the name string does not contain a comma, just parse as normal string
    else:
        area = []
        split_name_deg = re.split('\s', s['name'])
        for word in split_name_deg:
            for deg in degrees:
                if deg == word:
                    degrees_list.append(split_name_deg.pop())
                    name = ' '.join(split_name_deg)
    # area of practice
    category = area
    # name
    name = HumanName(name)
    first_name = name.first
    middle_name = name.middle
    last_name = name.last
    title = name.title
    full_name = dict(first_name=first_name, middle_name=middle_name, last_name=last_name, title=title)
    # degrees
    degrees = degrees_list
    # website
    website = s.get('website', '')
    gc_ieca = dict(
        name = name,
        website = website,
        degrees = degrees,
    ),

myjson = []  # myjson = list of dictionaries where each dictionary
with open("ieca_first_col_fake_text.txt", "rU") as f:
    sheet = csv.DictReader(f, delimiter="\t")
    for row in sheet:
        myjson.append(row)
for i in range(4):
    s = myjson[i]
    a = parse_ieca_gc(s)
    pprint(a)
example data (made up data):
name phone email website
Diane Grant Albrecht M.S.
"Lannister G. Cersei M.A.T., CEP" 111-222-3333 cersei#got.com www.got.com
Argle D. Bargle Ed.M.
Sam D. Man Ed.M. 000-000-1111 dman123#gmail.com www.daManWithThePlan.com
D G Bamf M.S.
Amy Tramy Lamy Ph.D.
range 4
split area nmdeg
CEP
['Lannister G. Cersei M.A.T.']
({'additionaltext': '',
'bio': '',
'category': ' CEP',
'certifications': [],
'company': '',
'counselingoptions': [],
'counselingtype': [],
'datasource': {'additionaltext': '',
'linktext': '',
'linkurl': '',
'logourl': ''},
'degrees': ['M.A.T.'],
'description': '',
'email': {'emailtype': [], 'value': 'cersei#got.com'},
'facebook': '',
'languages': 'english',
'linkedin': '',
'linktext': '',
'linkurl': '',
'location': {'address': '',
'city': '',
'country': 'united states',
'geo': {'lat': '', 'lng': ''},
'loc_name': '',
'locationtype': '',
'state': '',
'zip': ''},
'logourl': '',
'name': {'first_name': u'Lannister',
'last_name': u'Cersei',
'middle_name': u'G.',
'title': u''},
'phone': {'phonetype': [], 'value': '1112223333'},
'photo': '',
'price': {'costrange': [], 'costtype': []},
'twitter': '',
'website': ''},)
({'additionaltext': '',
'bio': '',
'category': [],
'certifications': [],
'company': '',
'counselingoptions': [],
'counselingtype': [],
'datasource': {'additionaltext': '',
'linktext': '',
'linkurl': '',
'logourl': ''},
'degrees': ['Ed.M.'],
'description': '',
'email': {'emailtype': [], 'value': ''},
'facebook': '',
'languages': 'english',
'linkedin': '',
'linktext': '',
'linkurl': '',
'location': {'address': '',
'city': '',
'country': 'united states',
'geo': {'lat': '', 'lng': ''},
'loc_name': '',
'locationtype': '',
'state': '',
'zip': ''},
'logourl': '',
'name': {'first_name': u'Argle',
'last_name': u'Bargle',
'middle_name': u'D.',
'title': u''},
'phone': {'phonetype': [], 'value': ''},
'photo': '',
'price': {'costrange': [], 'costtype': []},
'twitter': '',
'website': ''},)
({'additionaltext': '',
'bio': '',
'category': [],
'certifications': [],
'company': '',
'counselingoptions': [],
'counselingtype': [],
'datasource': {'additionaltext': '',
'linktext': '',
'linkurl': '',
'logourl': ''},
'degrees': ['Ed.M.'],
'description': '',
'email': {'emailtype': [], 'value': 'dman123#gmail.com'},
'facebook': '',
'languages': 'english',
'linkedin': '',
'linktext': '',
'linkurl': '',
'location': {'address': '',
'city': '',
'country': 'united states',
'geo': {'lat': '', 'lng': ''},
'loc_name': '',
'locationtype': '',
'state': '',
'zip': ''},
'logourl': '',
'name': {'first_name': u'Sam',
'last_name': u'Man',
'middle_name': u'D.',
'title': u''},
'phone': {'phonetype': [], 'value': '0000001111'},
'photo': '',
'price': {'costrange': [], 'costtype': []},
'twitter': '',
'website': ''},)
({'additionaltext': '',
'bio': '',
'category': [],
'certifications': [],
'company': '',
'counselingoptions': [],
'counselingtype': [],
'datasource': {'additionaltext': '',
'linktext': '',
'linkurl': '',
'logourl': ''},
'degrees': ['M.S.'],
'description': '',
'email': {'emailtype': [], 'value': ''},
'facebook': '',
'languages': 'english',
'linkedin': '',
'linktext': '',
'linkurl': '',
'location': {'address': '',
'city': '',
'country': 'united states',
'geo': {'lat': '', 'lng': ''},
'loc_name': '',
'locationtype': '',
'state': '',
'zip': ''},
'logourl': '',
'name': {'first_name': u'D',
'last_name': u'Bamf',
'middle_name': u'G',
'title': u''},
'phone': {'phonetype': [], 'value': ''},
'photo': '',
'price': {'costrange': [], 'costtype': []},
'twitter': '',
'website': ''},)
You are using a local variable name here:
name = HumanName(name)
You do set name before that point, but only if certain conditions match. When those conditions do not match, name is never assigned to and the exception is thrown.
For example, in the first if branch, the loop is:
for word in split_name_deg:
for deg in degrees:
if deg == word:
degrees_list.append(split_name_deg.pop())
name = ' '.join(split_name_deg)
If deg == word never matches, then name is never set either.
Your function also doesn't return anything, so the line a = parse_ieca_gc(s) will only ever assign None to a. You need to use the return keyword to set a return value for your function.
Last but not least, you only pass the first row from your CSV file to the function, and that first row has no website associated with it:
Diane Grant Albrecht M.S.
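A small sketch of the fix, under the assumption that the degree tokens can simply be filtered out (extract_name, words, and degrees are hypothetical names, not from the original code): assigning name unconditionally means no branch can leave it undefined.

```python
def extract_name(words, degrees):
    """Split a list of name/degree tokens; `name` is assigned on every path,
    so it can never trigger an UnboundLocalError."""
    degrees_list = [w for w in words if w in degrees]
    name = ' '.join(w for w in words if w not in degrees)
    return name, degrees_list

print(extract_name(['Sam', 'D.', 'Man', 'Ed.M.'], ['Ed.M.', 'M.S.']))
# ('Sam D. Man', ['Ed.M.'])
```

Equivalently, initializing name = '' before the conditional logic in the original function would remove the error, though returning a value (as noted above) is still needed for a to be anything but None.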
I thought of leaving a comment, but I guess this actually qualifies as an answer. You seem to like programming (or at least be serious about it), so please take my answer positively: not as another piece of criticism but as advice on how to avoid similar errors/problems in the future.
These are just a few points that I came up with after reading your code:
1) Add entry points
The code is messy, which makes it difficult to find and follow the main line of your thinking (the program logic). Since you are not just prototyping or experimenting, but writing a functioning program, you should really add an entry point. In Python, one first defines the module with all its elements (mainly imports, constants, and functions), and only then sets the entry point with an if __name__ == '__main__': section at the bottom of the module.
2) Break code into many functions
The program is not that big, but because you are trying to do too much (very quick-and-dirty) in just a few lines, it becomes dangerous. Your code grew organically very fast, exposing it to errors like this one. Please take your time and learn how to break your code into functions, which are the basic building blocks of each module. Try to define many small, self-consistent functions in your module and call them from the main part of the program. If you manage to give them proper names, your code will be very readable, especially starting from the __main__ part.
Treat each function as a small program (divide and conquer). Keep each function small in number of lines (<= 20) and compact in number of arguments (<= 5-7). It has many advantages:
you can test the code of each function individually and be sure that it works, using calls from __main__, doctests, or unittests. This way, you will always have full control of your program even before/without applying sophisticated debugging techniques
each function has its own scope, and variables assigned inside it are bound to that local scope. This helps avoid complications with global variables that may lead to side effects.
functions can be imported by other modules. This way you get better organization of the code and greater reusability.
3) Never write too much code without running it
Progressing slowly allows you to keep a constant overview of your idea while writing the program. Even if the code ends up uglier than you would wish, any incremental change to your code should be traceable (you roughly know/observe how much code is added with each step). You can also start using version control, even locally (just for yourself), which will let you progress slowly by keeping your commits atomic and self-contained.
4) Print and die
If you still feel like you went too far or wrote too much code without running it, you end up in a situation similar to yours now. Another trick is to put an exit() call in the middle of, or before, the newly written code that breaks (by checking the line number in the exception info). In most cases, printing out variables and checking whether their values match your expectations helps to find the problem. Otherwise, just comment out a section of your program in order to take a few steps back (cutting it down until whatever remains works).
5) Cyclic complexity
Avoid too many nested loops and conditional constructions. Try to use no more than 2-3 nested blocks per function. This matters. Use tools like pylint and a PEP 8 checker to assess the quality of your code. You will be surprised how many complaints those tools can find about code that looks decent. For example, there is a lot of motivation behind the 80-character limit per line: it really does prevent writing too much hanging and nested code. Ideally, code is always compact: each function is not too wide, and not too tall.
6) Avoid
Finally, try to avoid:
reassigning a variable to itself, especially if you are also changing its type: name = HumanName(name)
defining "hanging" variables inside nested blocks and expressions which may or may not execute
dependency on global variables (precisely the reason to define functions!) and too many cross-dependencies. Unless you are programming a recursive algorithm, you should be totally fine with a top-down approach. Avoid depending on badly tested, uncertain outcomes.
disrespecting indentation - always replace tabs! You will not make it far in Python if you don't (in Vim: set ts=4 | set sw=4 | set et)
If you write a line of code that takes you too long to think about afterwards, consider correcting it. If you write a function that you later don't understand, consider throwing it away. If you do everything right and still get an error that you don't understand, consider going to sleep.
PS
Don't forget to read the famous
>>> import this
Hope some of the points are useful.
GL!
