Trouble with handling URI data (attempting to load direct to DB) - python

I am working on a script that makes an API call to a vendor. The initial call returns a JSON document containing a list of URIs to get my data from. When I connect to one of those URIs and retrieve the data, it is not JSON that comes back; it's comma-delimited. I can write this to a CSV file without a problem.
What I want to do is write it directly to my DB, and therein lies the issue. The rows are \n-delimited and the fields are comma-delimited, and sometimes they are enclosed in double quotes, sometimes not. Compounding the issue, some of the fields enclosed in double quotes have commas in them.
I need to be able to get the headers (which I have figured out) so I can use them as the field names when writing to the DB (the vendor likes to change the order and occasionally exclude fields). I cannot just dump the data into the table, since fields could be new, missing, or out of order. I've tried a number of methods and nothing is splitting this string correctly.
Here is an example of one row in the dataset:
"July Test", "", 'nothing to see here', "1043 E Main, Dallas, TX 40565", more random crap
What I need is
"July Test", "", "nothing to see here", "1043 E Main, Dallas, TX 40565", "more random crap"
Here is my HTTP call and handling the return. Maybe I should do it differently? I've commented out everything I have tried and failed with.
# Takes the URL for the most current file, opens the connection, and exports the data
from urllib.request import Request, urlopen
import csv

site = str(x["full_csv_url"])
hdr = {'User-Agent': 'Mozilla/5.0'}
req = Request(site, headers=hdr)
req.add_header('Authorization', token)
with urlopen(req) as x:
    data = x.read().decode('utf-8')
try:
    #for i in data.split('\n'):
    #    list = print([i])
    list_of_lines = data.splitlines(True)
    new_split_data = []
    for i in range(1, 2):  # nlines
        ith_line = str(list_of_lines[i])
        ith_line = ith_line.replace("\n", "")
        ith_line = ith_line.replace("\r", "")
        # Split a python-tokenizable expression on comma operators
        #compos = [-1]  # compos stores the positions of the relevant commas in the argument string
        #compos.extend(t[2][1] for t in generate_tokens(StringIO(ith_line).readline) if t[1] == ',')
        #compos.append(len(ith_line))
        #new_ith_line = [ith_line[compos[i]+1:compos[i+1]] for i in xrange(len(compos)-1)]
        #for i in new_ith_line:
        #    print([i])
        print(ith_line)
        print("New Line")
        #new_ith_line = re.split(r', (?=(?:"[^"]*?(?: [^"]*)*))|, (?=[^",]+(?:,|$))', ith_line)
        new_ith_line = list(csv.reader([ith_line], delimiter=','))  # csv.reader wants an iterable of lines, not one string
        #new_ith_line = re.split(r',(?=")', ith_line)
        #new_ith_line = new_ith_line.replace("'\"", "'")
        #new_ith_line = new_ith_line.replace("\"'", "'")
        print(new_ith_line)
        ## Didn't work -- split fields with commas between double quotes
        ##newstr = ith_line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)")
        # Didn't work, only returned the first 2 columns
        #print(pp.commaSeparatedList.parseString(ith_line).asList())
        # Didn't work, returned an error
        #newStr = ['"{}"'.format(x) for x in list(csv.reader([ith_line], delimiter=',', quotechar='"'))[0]]
        #print(newStr)
        #print(ith_line)
        #each_line = data.body.getText().partition("\n")[i]
except Exception as exc:  # except clause restored so the snippet parses
    print(exc)

I managed to find a regular expression that worked for my situation, with a small tweak.
This code: new_list = re.findall(r'(?:[^,"]|"(?:\.|[^"])*")+', ith_line)
Gave me: "July Test", "", "nothing to see here", "1043 E Main, Dallas, TX 40565", "more random crap"
I was then able to create a list and load it into the DB.
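For what it's worth, the standard csv module can do this split on its own as long as the quoting is standard double quotes (the stray single-quoted field in the sample is exactly the kind of thing it won't fix, which is where the regex above earns its keep). A minimal sketch, assuming data holds the decoded payload from the code above:

import csv
import io

# Sketch: csv.reader keeps commas inside double-quoted fields intact,
# and skipinitialspace drops the blank after each delimiter.
reader = csv.reader(io.StringIO(data), skipinitialspace=True)
header = next(reader)  # vendor field names, in whatever order they arrived
for row in reader:
    record = dict(zip(header, row))  # map value -> field name before the DB insert
    print(record)

Mapping each row through the header like this is what makes reordered or missing vendor fields harmless at insert time.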

Related

Loop through a spreadsheet

I made a Python program using tkinter and pandas to select rows and send them by email.
The program lets the user decide which Excel file to operate on;
then asks which sheet of that file to operate on;
then asks how many rows to select (using the .tail function);
then it is supposed to iterate through the selected rows and read the email address from a cell;
then it sends the correct row to the correct address.
I'm stuck at the iteration step.
Here's the code:
import pandas as pd
import smtplib

def invio_mail(my_tailed_df):  # the function receives the sliced (.tail) dataframe
    gmail_user = '###'
    gmail_password = '###'
    sent_from = gmail_user
    server = smtplib.SMTP_SSL('smtp.gmail.com', 465)
    server.ehlo()
    server.login(gmail_user, gmail_password)
    list = my_tailed_df
    customers = list['CUSTOMER']
    wrs = list['WR']
    phones = list['PHONE']
    streets = list['STREET']
    cities = list["CITY"]
    mobiles = list['MOBILE']
    mobiles2 = list['MOBILE2']
    mails = list['INST']
    for i in range(len(mails)):
        customer = customers[i]
        wr = wrs[i]
        phone = phones[i]
        street = streets[i]
        city = cities[i]
        mobile = mobiles[i]
        mobile2 = mobiles2[i]
        """
        for i in range(len(mails)):
            if mails[i] == "T138":
                mail = "email_1#gmail.com"
            elif mails[i] == "T139":
                mail = "email_2#gmail.com"
        """
        subject = 'My subject'
        body = f"There you go \n Name: {customer} \n WR: {wr} \n Phone: {phone} \n Street: {street} \n City: {city} \n Mobile: {mobile} \n Mobile2: {mobile2}"
        email_text = """\
From: %s
To: %s
Subject: %s
%s
""" % (sent_from, ", ".join(mail), subject, body)
        try:
            server.sendmail(sent_from, [mail], email_text)
            server.close()
            print('Your email was sent!')
        except:
            print("Some error")
The program raises a KeyError: 0 as soon as it enters the for loop, on the first line inside the loop: customer = customers[i].
I know that the commented part (the nested for loop) will raise the same error.
I'm banging my head against the wall; I think I've read and tried everything.
Where's my error?
Things start to go wrong here: list = my_tailed_df. In Python, list is a built-in type, and with list = my_tailed_df you are shadowing it. You can check this:
# before the line:
print(type(list))
# <class 'type'>

list = my_tailed_df

# after the line:
print(type(list))
# <class 'pandas.core.frame.DataFrame'>  (assuming that your df is an actual df!)
This is bad practice: it adds no functionality and only buys confusion. E.g. customers = list['CUSTOMER'] does exactly the same as customers = my_tailed_df['CUSTOMER'] would, namely create a pd.Series with the index from my_tailed_df. So the first thing to do is get rid of list = my_tailed_df and change all those list[...] snippets into my_tailed_df[...].
Next, let's look at your error. for i in range(len(mails)): generates i = 0, 1, ..., len(mails)-1, so you are trying to access the pd.Series at the labels 0, 1, etc. The error KeyError: 0 simply means that the index of your original df does not contain the label 0 (e.g. the index is a list of IDs or something similar).
If you don't need the original index (as seems to be the case), you could remedy the situation by resetting the index:
my_tailed_df.reset_index(drop=True, inplace=True)
print(my_tailed_df.index)
# will get you: RangeIndex(start=0, stop=x, step=1)
# where x = len(my_tailed_df) (== len(mails)); stop is exclusive
Implement the reset before the line customers = my_tailed_df['CUSTOMER'] (so, instead of list = my_tailed_df), and you should be good to go.
Alternatively, you could keep the original index and change for i in range(len(mails)): into for i in mails.index:.
Finally, you could also do for idx, element in enumerate(mails.index): if you want to keep track both of the position of the index element (idx) and its value (element).
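Putting the pieces together, a minimal sketch of the loop after the reset (column names taken from the question; .iloc is positional, so it sidesteps index labels entirely):

my_tailed_df = my_tailed_df.reset_index(drop=True)
for i in range(len(my_tailed_df)):
    row = my_tailed_df.iloc[i]  # positional access, immune to the original index
    customer = row['CUSTOMER']
    wr = row['WR']
    # ...and so on for the remaining columns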

Converting a string representation of a json/dict to something usable with python request

I finally had to give up and ask for help. I am retrieving a document (with requests) that is in a JSON-like format (but not valid JSON: no double quotes around the keys) and trying to extract the data as a normal dict. Here is what I have; this works and will get you the output from which I am trying to extract the data.
import requests

def test():
    url = "http://www.sgx.com/JsonRead/JsonstData"
    payload = {}
    payload['qryId'] = 'RSTIc'
    payload['timeout'] = 60
    header = {'User-Agent': 'Mozilla/5.0 (compatible; MSIE 10.0; Linux i686; Trident/2.0)',
              'Content-Type': 'text/html; charset=utf-8'}
    req = requests.get(url, headers=header, params=payload)
    print(req.url)
    prelim = req.content.decode('utf-8')
    print(type(prelim))
    print(prelim)

test()
What I would like to have after that is (assuming a properly functioning dict):
for stock in prelim['items']:
    print(stock['N'])
which should give me a list of all the stock names.
I have tried most of the json functions - prelim.json(), loads, load, dump, dumps, parse - and none seems to work, because the data is not formatted properly. I also tried ast.literal_eval() without success. I tried some examples from Stack Overflow for converting such a string into a proper dict, but no luck. I don't seem to be able to make that string behave as a proper dictionary. If you can point me in the right direction, that would be much appreciated.
Good Samaritans have asked for an example of the data. The data coming from the above request is a bit longer, but I removed a few 'items' so people can see the general shape of the retrieved data.
{}&& {identifier:'ID', label:'As at 19-03-2018 8:38 AM',items:[{ID:0,N:'AscendasReit',SIP:'',NC:'A17U',R:'',I:'',M:'',LT:0,C:0,VL:97.600,BV:485.300,B:'2.670',S:'2.670',SV:1009.100,O:0,H:0,L:0,V:259811.200,SC:'9',PV:2.660,P:0,P_:'X',V_:''},
{ID:1,N:'CapitaComTrust',SIP:'',NC:'C61U',R:'',I:'',M:'',LT:0,C:0,VL:126.349,BV:1467.300,B:'1.800',S:'1.800',SV:620.900,O:0,H:0,L:0,V:228691.690,SC:'9',PV:1.810,P:0,P_:'X',V_:''},
{ID:2,N:'CapitaLand',SIP:'',NC:'C31',R:'',I:'',M:'',LT:0,C:0,VL:78.000,BV:184.900,B:'3.670',S:'3.670',SV:372.900,O:0,H:0,L:0,V:286026.000,SC:'9',PV:3.660,P:0,P_:'X',V_:''},
{ID:28,N:'Wilmar Intl',SIP:'',NC:'F34',R:'CD',I:'',M:'',LT:0,C:0,VL:0.000,BV:32.000,B:'3.210',S:'3.210',SV:73.100,O:0,H:0,L:0,V:0.000,SC:'2',PV:3.220,P:0,P_:'',V_:''},
{ID:29,N:'YZJ Shipbldg SGD',SIP:'',NC:'BS6',R:'',I:'',M:'',LT:0,C:0,VL:0.000,BV:349.500,B:'1.330',S:'1.330',SV:417.700,O:0,H:0,L:0,V:0.000,SC:'2',PV:1.340,P:0,P_:'',V_:''}]}
Following the recent comments, I know I could do this:
def test2():
    my_text = "{}&& {identifier:'ID', label:'As at 19-03-2018 8:38 AM',items:[{ID:0,N:'AscendasReit',SIP:'',NC:'A17U',R:'',I:'',M:'',LT:0,C:0,VL:97.600,BV:485.300,B:'2.670',S:'2.670',SV:1009.100,O:0,H:0,L:0,V:259811.200,SC:'9',PV:2.660,P:0,P_:'X',V_:''}, {ID:1,N:'CapitaComTrust',SIP:'',NC:'C61U',R:'',I:'',M:'',LT:0,C:0,VL:126.349,BV:1467.300,B:'1.800',S:'1.800',SV:620.900,O:0,H:0,L:0,V:228691.690,SC:'9',PV:1.810,P:0,P_:'X',V_:''}, {ID:2,N:'CapitaLand',SIP:'',NC:'C31',R:'',I:'',M:'',LT:0,C:0,VL:78.000,BV:184.900,B:'3.670',S:'3.670',SV:372.900,O:0,H:0,L:0,V:286026.000,SC:'9',PV:3.660,P:0,P_:'X',V_:''}, {ID:28,N:'Wilmar Intl',SIP:'',NC:'F34',R:'CD',I:'',M:'',LT:0,C:0,VL:0.000,BV:32.000,B:'3.210',S:'3.210',SV:73.100,O:0,H:0,L:0,V:0.000,SC:'2',PV:3.220,P:0,P_:'',V_:''}, {ID:29,N:'YZJ Shipbldg SGD',SIP:'',NC:'BS6',R:'',I:'',M:'',LT:0,C:0,VL:0.000,BV:349.500,B:'1.330',S:'1.330',SV:417.700,O:0,H:0,L:0,V:0.000,SC:'2',PV:1.340,P:0,P_:'',V_:''}]}"
    prelim = my_text.split("items:[")[1].replace("}]}", "}")
    temp_list = prelim.split(", ")
    end_list = []
    main_dict = {}
    for tok1 in temp_list:
        temp_dict = {}
        temp = tok1.replace("{", "").replace("}", "").split(",")
        for tok2 in temp:
            my_key = tok2.split(":")[0]
            my_value = tok2.split(":")[1].replace("'", "")
            temp_dict[my_key] = my_value
        end_list.append(temp_dict)
    main_dict['items'] = end_list
    for stock in main_dict['items']:
        print(stock['N'])

test2()
This produces the desired result. I am just asking if there is an easier (more elegant/Pythonic) way of doing this.
You need to convert the string into JSON-parsable text first, then use json.loads to get a dictionary.
prelim is not in JSON format, and its property names are not surrounded by ":
Remove '{}&& '
Surround the property names with "
Apply json.loads(new_text) to get the dictionary representation
i.e.
import requests, json
import functools as fn  # needed for fn.reduce below

# replacement tuples
reps = (('identifier:', '"identifier":'),
        ('label:', '"label":'),
        ('items:', '"items":'),
        ('NC:', '"NC":'),
        ('ID:', '"ID":'),
        ('N:', '"N":'),
        ('SIP:', '"SIP":'),
        ('SC:', '"SC":'),
        ('R:', '"R":'),
        ('I:', '"I":'),
        ('M:', '"M":'),
        ('LT:', '"LT":'),
        ('C:', '"C":'),
        ('VL:', '"VL":'),
        ('BV:', '"BV":'),
        ('BL:', '"BL":'),
        ('B:', '"B":'),
        ('S:', '"S":'),
        ('SV:', '"SV":'),
        ('O:', '"O":'),
        ('H:', '"H":'),
        ('L:', '"L":'),
        ('PV:', '"PV":'),
        ('V:', '"V":'),
        ('P_:', '"P_":'),
        ('P:', '"P":'),
        ('V_:', '"V_":'))

# getting rid of the invalid json prefix
prelim = prelim.replace('{}&& ', '')
# replacing single quotes with double quotes
prelim = prelim.replace("'", "\"")
print(prelim)
# reduce applies every replacement in turn
dict_text = fn.reduce(lambda a, kv: a.replace(*kv), reps, prelim)
dic = json.loads(dict_text)
print(dic)
print(dic)
Get the items:
for x in dic['items']:
    print(x['N'])
Output:
2ndChance W200123
3Cnergy
3Cnergy W200528
800 Super
8Telecom^
A-Smart
A-Sonic Aero^
AA
....
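A shorter variant is possible if a single regex is allowed to quote every bare key at once. A hedged sketch, assuming keys are plain identifiers that follow '{' or ',' and values contain no embedded quotes:

import json
import re

def loosen_to_json(text):
    text = text.replace('{}&& ', '')  # strip the anti-hijacking prefix
    # quote each bare key (an identifier right after '{' or ',')
    text = re.sub(r'([{,]\s*)([A-Za-z_]\w*)\s*:', r'\1"\2":', text)
    text = text.replace("'", '"')  # single-quoted values -> double-quoted
    return json.loads(text)

dic = loosen_to_json(prelim)
for x in dic['items']:
    print(x['N'])

This avoids maintaining the replacement table by hand when the vendor adds a new field.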

urllib.request: Data Not Writing to Outfile

I've got a script here which (ideally) iterates through multiple pages X of JSON data for each entity Y (in this case, multiple loans X for each team Y). The way the API is constructed, I believe I must physically change a subdirectory within the URL in order to iterate through multiple entities. Here is the explicit documentation and URL:
GET /teams/:id/loans
Returns loans belonging to a particular team.
Example: http://api.kivaws.org/v1/teams/2/loans.json
Parameters:
id (number) - Required. The team ID for which to return loans.
page (number) - The page position of results to return. Default: 1
sort_by (string) - The order by which to sort results. One of: oldest, newest. Default: newest
app_id (string) - The application ID in reverse DNS notation.
ids_only (string) - Return IDs only to make the return object smaller. One of: true, false. Default: false
Response: loan_listing - HTML, JSON, XML, RSS
Status: Production
And here is my script, which does run and appears to extract the correct data, but doesn't seem to write any data to the outfile:
# -*- coding: utf-8 -*-
import urllib.request as urllib
import json
import time

# storing team loans dict; the key is the team id, the value is the list of loans
team_loans = {}
url = "http://api.kivaws.org/v1/teams/"

# team ids range 1 - 11885
for i in range(1, 100):
    params = dict(id=i)
    try:
        handle = urllib.urlopen(str(url + str(i) + "/loans.json"))
        print(handle)
    except:
        print("Could not handle url")
        continue
    # reading response
    item_html = handle.read().decode('utf-8')
    # converting bytes to str
    data = str(item_html)
    # converting to json
    data = json.loads(data)
    # getting number of pages to crawl
    numPages = data['paging']['pages']
    # deleting paging data
    data.pop('paging')
    # calling additional pages
    if numPages > 1:
        for pa in range(2, numPages + 1, 1):
            handle = urllib.urlopen(str(url + str(i) + "/loans.json?page=" + str(pa)))
            print("Pulling loan data from team " + str(i) + "...")
            # reading response
            item_html = handle.read().decode('utf-8')
            # converting bytes to str
            datatemp = str(item_html)
            # converting to json
            datatemp = json.loads(datatemp)
            # pagings are redundant headers
            datatemp.pop('paging')
            # adding data to the initial list
            for loan in datatemp['loans']:
                data['loans'].append(loan)
            time.sleep(2)
    # recording loans by team in dict
    team_loans[i] = data['loans']
    if (data['loans']):
        print("===Data added to the team_loan dictionary===")
    else:
        print("!!!FAILURE to add data to team_loan dictionary!!!")
    # recording data to file when 10 teams are read
    print("===Finished pulling from team " + str(i) + "===")
    if (int(i) % 10 == 0):
        outfile = open("team_loan.json", "w")
        print("===Now writing data to outfile===")
        json.dump(team_loans, outfile, sort_keys=True, indent=2, ensure_ascii=True)
        outfile.close()
    else:
        print("!!!FAILURE to write data to outfile!!!")
    # compliance with the API's request limits
    time.sleep(2)

print('Done! Check your outfile (team_loan.json)')
I know that may be a heady amount of code to throw in your faces, but it's a pretty sequential process.
Again, this program is pulling the correct data, but it is not writing that data to the outfile. Can anyone see why?

For others who may read this post: the script does in fact write data to an outfile. It was simply the test logic in the code that was wrong. Ignore the print statements I had put into place.
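As an aside, wrapping the write step in a context manager makes this kind of bug easier to rule out, since the file is guaranteed to be flushed and closed even if an exception fires. A minimal sketch, assuming team_loans is populated as above:

import json

# 'with' closes and flushes the file on any exit path, so a silently
# unwritten outfile cannot be blamed on a dangling handle
with open("team_loan.json", "w") as outfile:
    json.dump(team_loans, outfile, sort_keys=True, indent=2, ensure_ascii=True)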

Parsing already parsed results with BeautifulSoup

I have a question about using Python and BeautifulSoup.
My end-result program basically fills out a form on a website and brings me back the results, which I will eventually output to an lxml file. I'll be taking the results from https://interactive.web.insurance.ca.gov/survey/survey?type=homeownerSurvey&event=HOMEOWNERS and I want to get a list for every city, all into some Excel documents.
Here is my code, I put it on pastebin:
http://pastebin.com/bZJfMp2N
MY RESULTS ARE ALMOST GOOD :D except now I'm getting "            355" for my "correct value" instead of "355", for example. I want to parse that and keep only the number; you will see when you run this in Python.
However, anything I have tried does NOT work. There is no way I can parse that values_2 variable, because the results are a bs4.element.ResultSet when I think I need to parse a string. Sorry if I am a noob, I am still learning and have worked very long on this program.
Would anyone have any input? Anything would be appreciated! I've read that my results are in a list or something and I can't parse lists? How would I go about doing this?
Here is the code:
__author__ = 'kennytruong'
# THE PROBLEM HERE IS TO PARSE THE RESULTS PROPERLY!!
import urllib.parse, urllib.request
import re
from bs4 import BeautifulSoup

URL = "https://interactive.web.insurance.ca.gov/survey/survey?type=homeownerSurvey&event=HOMEOWNERS"

# Goes through these locations, strips the whitespace in the string and creates
# a list that starts at every new line
LOCATIONS = '''
ALAMEDA ALAMEDA
'''.strip().split('\n')  # strip() basically removes whitespace
print('Available locations to choose from:', LOCATIONS)

INSURANCE_TYPES = '''
HOMEOWNERS,CONDOMINIUM,MOBILEHOME,RENTERS,EARTHQUAKE - Single Family,EARTHQUAKE - Condominium,EARTHQUAKE - Mobilehome,EARTHQUAKE - Renters
'''.strip().split(',')  # strips the whitespace and splits the list at every comma
print('Available insurance types to choose from:', INSURANCE_TYPES)

COVERAGE_AMOUNTS = '''
15000,25000,35000,50000,75000,100000,150000,200000,250000,300000,400000,500000,750000
'''.strip().split(',')
print('All options for coverage amounts:', COVERAGE_AMOUNTS)

HOME_AGE = '''
New,1-3 Years,4-6 Years,7-15 Years,16-25 Years,26-40 Years,41-70 Years
'''.strip().split(',')
print('All Home Age Options:', HOME_AGE)

def get_premiums(location, coverage_type, coverage_amt, home_age):
    formEntries = {'location': location,
                   'coverageType': coverage_type,
                   'coverageAmount': coverage_amt,
                   'homeAge': home_age}
    inputData = urllib.parse.urlencode(formEntries)
    inputData = inputData.encode('utf-8')
    request = urllib.request.Request(URL, inputData)
    response = urllib.request.urlopen(request)
    responseData = response.read()
    soup = BeautifulSoup(responseData, "html.parser")
    parseResults = soup.find_all('tr', {'valign': 'top'})
    for eachthing in parseResults:
        parse_me = eachthing.text
        # find all runs starting with a capital letter; . matches any character,
        # + means 1 or more of it
        name = re.findall(r'[A-Z].+', parse_me)
        # find any run of digits, between 1 and 10 digits long
        values = re.findall(r'\d{1,10}', parse_me)
        values_2 = eachthing.find_all('div', {'align': 'right'})
        print('raw code for this part:\n', eachthing, '\n')
        print('here is the name: ', name[0], values)
        print('stuff on sheet 1 - company name:', name[0], '- Premium Price:', values[0], '- Deductible:', values[1])
        print('but here are the correct values - ', values_2)  # NEED TO STRIP THESE VALUES
        # print(type(values_2))  # gives <class 'bs4.element.ResultSet'>; need to parse bs4 elements
        # values_3 = re.split(r'\d', values_2)
        # print(values_3)  # anything like this will not work because the results aren't strings
        print('\n\n')

def main():
    for location in LOCATIONS:  # loops over each location/area
        print('Here are the options that you selected: ', location, "HOMEOWNERS", "150000", "New", '\n\n')
        get_premiums(location, "HOMEOWNERS", "150000", "New")  # calls get_premiums with the chosen parameters

if __name__ == "__main__":  # prevents the module-level code from running on import
    main()
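A hedged sketch of the parsing step the question is after: a bs4 ResultSet behaves like a list of Tag objects, so iterating it and calling get_text(strip=True) on each tag drops the surrounding whitespace, including the non-breaking spaces that pad the number.

# Sketch, reusing values_2 from inside get_premiums above: each element
# is a Tag, and get_text(strip=True) strips ordinary and non-breaking spaces.
for div in values_2:
    cleaned = div.get_text(strip=True)  # e.g. '355' instead of '\xa0\xa0 355'
    print(cleaned)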

Creating loop for __main__

I am new to Python, and I want your advice on something.
I have a script that runs on one input value at a time, and I want it to be able to run over a whole list of such values without me typing them in one at a time. I have a hunch that a for loop is needed in the main method listed below. The value is gene_name, so effectively I want to feed in a list of gene_names that the script can run through nicely.
Hope I phrased the question correctly, thanks! The chunk in question seems to be
def get_probes_from_genes(gene_names)
import json
import urllib2
import os
import pandas as pd

api_url = "http://api.brain-map.org/api/v2/data/query.json"

def get_probes_from_genes(gene_names):
    if not isinstance(gene_names, list):
        gene_names = [gene_names]
    # in case there are white spaces in gene names
    gene_names = ["'%s'" % gene_name for gene_name in gene_names]
    api_query = "?criteria=model::Probe"
    api_query += ",rma::criteria,[probe_type$eq'DNA']"
    api_query += ",products[abbreviation$eq'HumanMA']"
    api_query += ",gene[acronym$eq%s]" % (','.join(gene_names))
    api_query += ",rma::options[only$eq'probes.id','name']"
    data = json.load(urllib2.urlopen(api_url + api_query))
    d = {probe['id']: probe['name'] for probe in data['msg']}
    if not d:
        raise Exception("Could not find any probes for %s gene. Check " \
                        "http://help.brain-map.org/download/attachments/2818165/HBA_ISH_GeneList.pdf?version=1&modificationDate=1348783035873 " \
                        "for list of available genes." % gene_name)
    return d

def get_expression_values_from_probe_ids(probe_ids):
    if not isinstance(probe_ids, list):
        probe_ids = [probe_ids]
    # in case there are white spaces in gene names
    probe_ids = ["'%s'" % probe_id for probe_id in probe_ids]
    api_query = "?criteria=service::human_microarray_expression[probes$in%s]" % (','.join(probe_ids))
    data = json.load(urllib2.urlopen(api_url + api_query))
    expression_values = [[float(expression_value)
                          for expression_value in data["msg"]["probes"][i]["expression_level"]]
                         for i in range(len(probe_ids))]
    well_ids = [sample["sample"]["well"] for sample in data["msg"]["samples"]]
    donor_names = [sample["donor"]["name"] for sample in data["msg"]["samples"]]
    well_coordinates = [sample["sample"]["mri"] for sample in data["msg"]["samples"]]
    return expression_values, well_ids, well_coordinates, donor_names

def get_mni_coordinates_from_wells(well_ids):
    package_directory = os.path.dirname(os.path.abspath(__file__))
    frame = pd.read_csv(os.path.join(package_directory, "data", "corrected_mni_coordinates.csv"),
                        header=0, index_col=0)
    return list(frame.ix[well_ids].itertuples(index=False))

if __name__ == '__main__':
    probes_dict = get_probes_from_genes("SLC6A2")
    expression_values, well_ids, well_coordinates, donor_names = get_expression_values_from_probe_ids(probes_dict.keys())
    print get_mni_coordinates_from_wells(well_ids)
Whoa, first things first. Python ain't Java, so do yourself a favor and use a nice """xxx\nyyy""" string, with triple quotes for multiline text:
api_query = """?criteria=model::Probe
,rma::criteria,[probe_type$eq'DNA']
...
"""
or something like that. You will get the whitespace exactly as typed, so you may need to adjust.
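(A hedged aside: adjacent string literals concatenate at compile time, so if the stray whitespace from a triple-quoted string is a problem, the query can also be laid out across lines without embedding any newlines at all.)

# Sketch: adjacent literals are joined into one string; nothing extra
# leaks into the final query.
api_query = ("?criteria=model::Probe"
             ",rma::criteria,[probe_type$eq'DNA']"
             ",products[abbreviation$eq'HumanMA']")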
If, as suggested, you opt to loop over calls to your function from a file, you will need to either try/except your data-not-found exception or handle missing data without throwing an exception. I would opt for returning an empty result myself and letting the caller worry about what to do with it.
If you do opt for raising an exception, create your own rather than using a generic Exception. That way your code can catch your expected exception first.
class MyNoDataFoundException(Exception):
    pass

# replace your current raise code with...
if not d:
    raise MyNoDataFoundException("your message here")
A clarification about catching exceptions, using the accepted answer as a starting point:
if __name__ == '__main__':
    with open(r"/tmp/genes.txt", "r") as f:
        for line in f.readlines():
            # keep track of your input data
            search_data = line.strip()
            try:
                probes_dict = get_probes_from_genes(search_data)
            except MyNoDataFoundException, e:
                # ...do whatever you feel you need to do here...
                print "bummer about search_data:%s:\nexception:%s" % (search_data, e)
                continue  # probes_dict is unset on failure, so skip to the next gene
            expression_values, well_ids, well_coordinates, donor_names = get_expression_values_from_probe_ids(probes_dict.keys())
            print get_mni_coordinates_from_wells(well_ids)
You may want to create a file with gene names, then read the contents of the file and call your function in a loop. Here is an example:
if __name__ == '__main__':
    with open(r"/tmp/genes.txt", "r") as f:
        for line in f.readlines():
            probes_dict = get_probes_from_genes(line.strip())
            expression_values, well_ids, well_coordinates, donor_names = get_expression_values_from_probe_ids(probes_dict.keys())
            print get_mni_coordinates_from_wells(well_ids)
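If reading from a file is more than you need, a minimal sketch with the list typed inline works the same way (the gene symbols below are placeholders; Python 2 syntax to match the script above):

if __name__ == '__main__':
    gene_names = ["SLC6A2", "SLC6A3", "SLC6A4"]  # placeholder gene symbols
    for gene_name in gene_names:
        probes_dict = get_probes_from_genes(gene_name)
        expression_values, well_ids, well_coordinates, donor_names = get_expression_values_from_probe_ids(probes_dict.keys())
        print get_mni_coordinates_from_wells(well_ids)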
