Duplicate output with arrays in Python

I'm trying to gather the data for 6 of the stocks in the array, but when the API can't find the data for a stock I want it to move on to the next item while still selecting just 6 in total.
I tried this code and other variants, but nothing seems to work. The output always duplicates one stock and I don't know why:
portfolio = ['NVDA', 'SPCE', 'IMGN', 'SUMR', 'EXPE', 'PWM.V', 'SVMK', 'DXCM']
tt = 0
irange = 6
for i in range(irange):
    try:
        t = requests.get('https://finnhub.io/api/v1/stock/profile?symbol='+portfolio[tt])
        t = t.json()
    except Exception as e:
        print("Error calling API, waiting 70 seconds and trying again...")
        time.sleep(70)
        t = requests.get('https://finnhub.io/api/v1/stock/profile?symbol='+portfolio[tt])
        t = t.json()
    try:
        coticker = t['ticker']
        coexchange = t['exchange']
        coname = t['name']
        codesc = t['ggroup']
        coipo = t['ipo']
        cosector = t['gsector']
        costate = t['state']
        coweburl = t['weburl']
    except Exception as e:
        print("Information not available")
        irange = irange+1
    print("THE TT IS:"+str(tt))
    tt = tt+1
    print("")
    print(coticker,coexchange,coname,codesc,coipo,cosector,costate,coweburl)
This is the output:
THE TT IS:0
NVDA -- GATHERED DATA
Information not available
THE TT IS:1
NVDA -- GATHERED DATA
THE TT IS:2
IMGN -- GATHERED DATA
THE TT IS:3
SUMR -- GATHERED DATA
THE TT IS:4
EXPE -- GATHERED DATA
Information not available
THE TT IS:5
EXPE -- GATHERED DATA
As you can see, when there is no information available it doesn't move on to the next one; it repeats the same one. What's the mistake? Thanks in advance for your kind help.

Put the line that prints the information inside the try block that sets all the variables. Otherwise, you'll print the variables from the previous stock.
To make it keep going past 6 items when you have failures, don't use range(irange). Loop over the entire list with for symbol in portfolio:, and use a variable to count the successful attempts. Then break out of the loop when you've printed 6 stocks.
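For reference, here is a minimal sketch of those two changes while keeping the question's try/except style (the 70-second retry is left out for brevity; only the import line is added):

import requests

portfolio = ['NVDA', 'SPCE', 'IMGN', 'SUMR', 'EXPE', 'PWM.V', 'SVMK', 'DXCM']
wanted = 6
printed = 0

for symbol in portfolio:
    if printed >= wanted:
        break  # stop once 6 stocks have been printed
    try:
        t = requests.get('https://finnhub.io/api/v1/stock/profile?symbol=' + symbol).json()
        # printing inside the try block means a failed lookup can never
        # re-print the variables left over from the previous stock
        print(t['ticker'], t['exchange'], t['name'], t['weburl'])
        printed += 1
    except Exception:
        print("Information not available for " + symbol)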
I've changed the code to use if statements instead of try/except to handle empty responses.
import requests
import time

portfolio = ['NVDA', 'SPCE', 'IMGN', 'SUMR', 'EXPE', 'PWM.V', 'SVMK', 'DXCM']
irange = 6
successes = 0
for symbol in portfolio:
    try:
        t = requests.get('https://finnhub.io/api/v1/stock/profile?symbol='+symbol)
        t = t.json()
    except Exception as e:
        print("Error calling API, waiting 70 seconds and trying again...")
        time.sleep(70)
        t = requests.get('https://finnhub.io/api/v1/stock/profile?symbol='+symbol)
        if t:
            t = t.json()
    if t:
        coticker = t['ticker']
        coexchange = t['exchange']
        coname = t['name']
        codesc = t['ggroup']
        coipo = t['ipo']
        cosector = t['gsector']
        costate = t['state']
        coweburl = t['weburl']
        print("")
        print(coticker,coexchange,coname,codesc,coipo,cosector,costate,coweburl)
        successes += 1
        if successes >= irange:
            break
    else:
        print("Information not available for "+symbol)

Related

Set variable to change on next line in given file during loop python

I'm working on a script that will go through and analyze all of the .mp3 files listed in a .txt document, where each line contains the full path to one file. It then prints the metadata along with the file path. This analysis runs in a loop until all lines in the document have been read. I can do this for an individual file if I manually specify its location, but not when the variable has to change to the next line of the document on each pass of the loop. I have tried putting all of these functions under one large function and looping it, but then I still get an undefined-variable error for "song". Do note: I am a complete noob at Python and this is my first attempt at building this small project.
The goal: a loop that goes through every line of the .txt and sets that line of text as the "song" variable, so the analyzer function can then read the file and print its metadata.
Here is the code which works when the file location is manually set:
song = "Z:/Vibe Playlists\Throwbacks\Taio Cruz - Hangover ft. Flo Rida.mp3"
audio1 = ID3(song) #artist,album,title,bpm,initial key,date,
audio2 = MP3(song) #length
audio3 = eyed3.load(song) #publisher
audio4 = mp3.Mp3AudioFile(song) #bitrate
###########################
#global variables for JSON#
###########################
artist = ""
title = ""
album = ""
genre = ""
bpm = ""
initial_key = ""
date = ""
length = ""
publisher = ""
bitrate = ""
song_location = song
###############################
def analyze_metadata():
    try:
        Artist = audio1['TPE1'].text[0]
        # print("Artist:",Artist) #Artist
        global artist
        artist = Artist
    except KeyError:
        # print("no artist") - Debugging
        pass
    try:
        Title = audio1['TIT2'].text[0]
        global title
        title = Title
        # print("Title:",audio1['TIT2'].text[0]) #title
    except KeyError:
        # print("no title") - Debugging
        pass
    try:
        Album = audio1['TALB'].text[0]
        global album
        album = Album
        # print("Album:",audio1['TALB'].text[0]) #album
    except KeyError:
        # print("no album") - Debugging
        pass
    try:
        Genre = audio3.tag.genre.name
        global genre
        genre = Genre
        # print("Genre:",audio3.tag.genre.name) #genre
    except AttributeError:
        # print("no genre") - Debugging
        pass
    try:
        Bpm = audio1['TBPM'].text[0]
        global bpm
        bpm = Bpm
        # print("BPM:",audio1['TBPM'].text[0]) #bpm
    except KeyError:
        # print("no bpm") - Debugging
        pass
    try:
        Initial_Key = audio1['TKEY'].text[0]
        global initial_key
        initial_key = Initial_Key
        # print("Initial Key:",audio1['TKEY'].text[0]) #initial key
    except KeyError:
        # print("no key") - Debugging
        pass
    try:
        Date = audio1['TDRC'].text[0]
        global date
        date = Date
        # print("Date:",audio1['TDRC'].text[0]) #date/year
    except KeyError:
        # print("no date") - Debugging
        pass
    song_length = (audio2.info.length) #length
    def convert(seconds):
        return time.strftime("%M:%S", time.gmtime(song_length))
    # print("Length",convert(song_length))
    global length
    length = time.strftime("%M:%S", time.gmtime(song_length))
    Publisher = audio3.tag.publisher
    # print("Publisher:",audio3.tag.publisher) #publisher
    global publisher
    publisher = Publisher
    song_bitrate = str(audio4.info.bit_rate)
    song_bitrate = re.sub('[^0-9]', '', song_bitrate)
    # print("Bit rate:",song_bitrate, "kbps") #bitrate
    global bitrate
    bitrate = song_bitrate

#to call function u do this#
analyze_metadata()

#testing global variables for json
def Print_Metadata():
    print("Artist:",artist)
    print("Title:",title)
    print("Album:",album)
    print("Genre:",genre)
    print("BPM:",bpm)
    print("Initial Key:",initial_key)
    print("Date:",date)
    print("Length:",length)
    print("Publisher:",publisher)
    print("Bitrate:",bitrate)
    print("Location:", song_location)

Print_Metadata()
This prints the metadata to the shell, but when I try to run the same code within a while loop that changes the "song" variable to the location on the second line, then the third and so on, I get an undefined-variable error.
def make_filepath_variable():
    filepath = 'py_test.txt'
    with open(filepath) as fp:
        line = fp.readline()
        cnt = 1
        while line:
            #print("Line {}: {}".format(cnt, line.strip()))
            line = fp.readline()
            cnt += 1
            global song
            song = line
            print(song)

make_filepath_variable()
When this code is run before all the other functions, the rest of the code above can't grab the variable information, although with this code I can print every line of the text file and go down one by one.
I've been going through forums for almost 2 days now trying to find a way to do this and I'm unable to find anything similar or usable that works in my situation. Some insight or tips would be greatly appreciated. Thanks!
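No answer is recorded for this one, but a minimal sketch of the usual fix is to pass the path into the function as a parameter instead of setting a global, then call the function once per line (assuming mutagen's ID3 and MP3 classes as in the question; the eyed3/bitrate parts are omitted for brevity):

from mutagen.id3 import ID3
from mutagen.mp3 import MP3

def analyze_and_print(song):
    # 'song' is a parameter, so no global state is shared between iterations
    audio1 = ID3(song)
    audio2 = MP3(song)
    artist = audio1['TPE1'].text[0] if 'TPE1' in audio1 else ""
    title = audio1['TIT2'].text[0] if 'TIT2' in audio1 else ""
    print("Artist:", artist)
    print("Title:", title)
    print("Length:", audio2.info.length)
    print("Location:", song)

with open('py_test.txt') as fp:
    for line in fp:
        path = line.strip()  # drop the trailing newline each line carries
        if path:             # skip blank lines
            analyze_and_print(path)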

Passing over errors in loop for my web-scraper

I currently have a loop running for my web scraper. If it encounters an error (i.e. it can't load the page), I have it set to ignore it and continue with the loop.
for i in links:
    try:
        driver.get(i)
        d = driver.find_elements_by_xpath('//p[@class = "list-details__item__date"]')
        s = driver.find_elements_by_xpath('//p[@class = "list-details__item__score"]')
        m = driver.find_elements_by_xpath('//span[@class="list-breadcrumb__item__in"]')
        o = driver.find_elements_by_xpath('//tr[@data-bid]')
        l = len(o)
        lm = len(m)
        for i in range(l):
            a = o[i].text
        for i in range(lm):
            b = m[i].text
            c = s[i].text
            e = d[i].text
            odds.append((a,b,c,e))
    except:
        pass
However, I now wish for there to be a note of some kind when an error was encountered so that I can see which pages didn't load. Even if they are just left blank in the output table, that would be fine.
Thanks for any help.
You can add a catch for the exception and then do something with that catch. This should be suitable for your script.
# ... your initial imports go here ...
import io
import traceback

for i in links:
    try:
        driver.get(i)
        d = driver.find_elements_by_xpath('//p[@class = "list-details__item__date"]')
        s = driver.find_elements_by_xpath('//p[@class = "list-details__item__score"]')
        m = driver.find_elements_by_xpath('//span[@class="list-breadcrumb__item__in"]')
        o = driver.find_elements_by_xpath('//tr[@data-bid]')
        l = len(o)
        lm = len(m)
        for i in range(l):
            a = o[i].text
        for i in range(lm):
            b = m[i].text
            c = s[i].text
            e = d[i].text
            odds.append((a,b,c,e))
    except Exception as error_script:
        print(traceback.format_exc())
        odds.append('Error: could not add')
Essentially, you catch the exception using the except Exception as error_script: line. Afterwards, you can print the actual error message to the console using the traceback.format_exc() command.
But most importantly, you can append a placeholder string to the list from inside the except block; after the append, the loop simply continues on to the next iteration.
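If you would rather keep the failures out of the odds table entirely, a small variation (a sketch, reusing the driver, links and odds names from the question) is to collect the failing URLs in their own list:

failed_links = []

for link in links:
    try:
        driver.get(link)
        # ... the same scraping and odds.append(...) code as above ...
    except Exception:
        failed_links.append(link)  # record which page did not load

print("Pages that failed to load:", failed_links)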

How to parse a single-column text file into a table using python?

I'm new here to StackOverflow, but I have found a LOT of answers on this site. I'm also a programming newbie, so I figured I'd join and finally become part of this community, starting with a question about a problem that's been plaguing me for hours.
I log in to a website and scrape a big body of text within the <b> tag that needs to be converted into a proper table. The layout of the resulting Output.txt looks like this:
BIN STATUS
8FHA9D8H 82HG9F RECEIVED SUCCESSFULLY AWAITING STOCKING PROCESS
INVENTORY CODE: FPBC *SOUP CANS LENTILS
BIN STATUS
HA8DHW2H HD0138 RECEIVED SUCCESSFULLY AWAITING STOCKING PROCESS
8SHDNADU 00A123 #2956- INVALID STOCK COUPON CODE (MISSING).
93827548 096DBR RECEIVED SUCCESSFULLY AWAITING STOCKING PROCESS
There are a bunch of pages with the exact same blocks, but I need them to be combined into an ACTUAL table that looks like this:
BIN INV CODE STATUS
HA8DHW2HHD0138 FPBC-*SOUP CANS LENTILS RECEIVED SUCCESSFULLY AWAITING STOCKING PROCESS
8SHDNADU00A123 FPBC-*SOUP CANS LENTILS #2956- INVALID STOCK COUPON CODE (MISSING).
93827548096DBR FPBC-*SOUP CANS LENTILS RECEIVED SUCCESSFULLY AWAITING STOCKING PROCESS
8FHA9D8H82HG9F SSXR-98-20LM NM CORN CREAM RECEIVED SUCCESSFULLY AWAITING STOCKING PROCESS
Essentially, all of the separate text blocks in this example would become part of this table, with the INV code repeated alongside its BIN values. I would post my attempts at parsing this data (I have tried pandas / BeautifulSoup / openpyxl / csv writer), but I'll admit they are a little embarrassing, as I cannot find any information on this specific problem. Is there any benevolent soul out there that can help me out? :)
(Also, I am using Python 2.7.)
A simple custom parser like the following should do the trick.
from __future__ import print_function

def parse_body(s):
    line_sep = '\n'
    getting_bins = False
    inv_code = ''
    for l in s.split(line_sep):
        if l.startswith('INVENTORY CODE:') and not getting_bins:
            inv_data = l.split()
            inv_code = inv_data[2] + '-' + ' '.join(inv_data[3:])
        elif l.startswith('INVENTORY CODE:') and getting_bins:
            print("unexpected inventory code while reading bins:", l)
        elif l.startswith('BIN') and l.endswith('STATUS'):
            getting_bins = True
        elif getting_bins == True and l:
            bin_data = l.split()
            # need to add exception handling here to make sure:
            # 1) we have an inv_code
            # 2) bin_data is at least 3 items big (assuming two for
            #    bin_id and at least one for message)
            # 3) maybe some constraint checking to ensure that we have
            #    a valid instance of an inventory code and bin id
            bin_id = ''.join(bin_data[0:2])
            message = ' '.join(bin_data[2:])
            # we now have a bin, an inv_code, and a message to add to our table
            print(bin_id.ljust(20), inv_code.ljust(30), message, sep='\t')
        elif getting_bins == True and not l:
            # done getting bins for current inventory code
            getting_bins = False
            inv_code = ''
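A quick way to try it on the question's Output.txt (a sketch; only the filename comes from the question):

with open('Output.txt') as fh:
    parse_body(fh.read())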
A rather complex one, but this might get you started:
import re
import pandas as pd
from pandas import DataFrame

rx = re.compile(r'''
    (?:INVENTORY\ CODE:)\s*
    (?P<inv>.+\S)
    [\s\S]+?
    ^BIN.+[\n\r]
    (?P<bin_msg>(?:(?!^\ ).+[\n\r])+)
    ''', re.MULTILINE | re.VERBOSE)

string = your_string_here

# set up the dataframe
df = DataFrame(columns = ['BIN', 'INV', 'MESSAGE'])

for match in rx.finditer(string):
    inv = match.group('inv')
    bin_msg_raw = match.group('bin_msg').split("\n")
    rxbinmsg = re.compile(r'^(?P<bin>(?:(?!\ {2}).)+)\s+(?P<message>.+\S)\s*$', re.MULTILINE)
    for item in bin_msg_raw:
        for m in rxbinmsg.finditer(item):
            # append it to the dataframe
            df.loc[len(df.index)] = [m.group('bin'), inv, m.group('message')]

print(df)
Explanation
It looks for INVENTORY CODE and sets up the groups (inv and bin_msg) for further processing in the loop that follows (note: it would be easier if you had only one line of bin/msg, as the bin_msg group has to be split afterwards).
Afterwards, it splits the bin and msg parts and appends each row to the df object.
I had code written for a website-scraping project which may help you.
Basically what you need to do is right-click on the web page, inspect the HTML, find the tag for the table you are looking for, and extract the information using a module (I am using BeautifulSoup). I am creating JSON as I need to store it into MongoDB; you can create a table instead.
#! /usr/bin/python
import sys
import requests
import re
from BeautifulSoup import BeautifulSoup
import pymongo

def req_and_parsing():
    url2 = 'http://businfo.dimts.in/businfo/Bus_info/EtaByRoute.aspx?ID='
    list1 = ['534UP','534DOWN']
    for Route in list1:
        final_url = url2 + Route
        #r = requests.get(final_url)
        #parsing_file(r.text,Route)
    outdict = []
    outdict = [parsing_file(requests.get(url2+Route).text, Route) for Route in list1]
    print outdict
    conn = f_connection()
    for i in range(len(outdict)):
        insert_records(conn, outdict[i])

def parsing_file(txt, Route):
    soup = BeautifulSoup(txt)
    table = soup.findAll("table", {"id": "ctl00_ContentPlaceHolder1_GridView2"})
    #trtags = table[0].findAll('tr')
    tdlist = []
    trtddict = {}
    """
    for trtag in trtags:
        print 'print trtag- ', trtag.text
        tdtags = trtag.findAll('td')
        for tdtag in tdtags:
            print tdtag.text
    """
    divtags = soup.findAll("span", {"id": "ctl00_ContentPlaceHolder1_ErrorLabel"})
    for divtag in divtags:
        print "div tag - ", divtag.text
        if divtag.text in ("Currently no bus is running on this route",
                           "This is not a cluster (orange bus) route"):
            print "Page not displayed. Errored with the below message for Route-", Route, " , ", divtag.text
            sys.exit()
    trtags = table[0].findAll('tr')
    for trtag in trtags:
        tdtags = trtag.findAll('td')
        if len(tdtags) == 2:
            trtddict[tdtags[0].text] = sub_colon(tdtags[1].text)
    return trtddict

def sub_colon(tag_str):
    return re.sub(';', ',', tag_str)

def f_connection():
    try:
        conn = pymongo.MongoClient()
        print "Connected successfully!!!"
    except pymongo.errors.ConnectionFailure, e:
        print "Could not connect to MongoDB: %s" % e
    return conn

def insert_records(conn, stop_dict):
    db = conn.test
    print db.collection_names()
    mycoll = db.stopsETA
    mycoll.insert(stop_dict)

if __name__ == "__main__":
    req_and_parsing()

Send an email from Python script whenever the threshold is met for a given machine?

I have a URL which gives me the JSON string below if I hit it in the browser.
Below is my URL; let's say it is URL-A, and I have around three URLs like this -
http://hostnameA:1234/Service/statistics?%24format=json
And below is the JSON string which I get back from the URL -
{
    "description": "",
    "statistics": {
        "dataCount": 0
    }
}
Now I have written a Python script which scans all three of my URLs and parses the JSON string to extract the value of dataCount from each. It should keep running every few seconds, scanning each URL and parsing the result.
Below are my URLs -
hostnameA http://hostnameA:1234/Service/statistics?%24format=json
hostnameB http://hostnameB:1234/Service/statistics?%24format=json
hostnameC http://hostnameC:1234/Service/statistics?%24format=json
And the data which I am seeing on the console after running my Python script is like this -
hostnameA - dataCount
hostnameB - dataCount
hostnameC - dataCount
Below is my Python script, which works fine:
def get_data_count(url):
    try:
        req = requests.get(url)
    except requests.ConnectionError:
        return 'could not get page'
    try:
        data = json.loads(req.content)
        return int(data['statistics']['dataCount'])
    except TypeError:
        return 'field not found'
    except ValueError:
        return 'not an integer'

def send_mail(data):
    sender = 'user@host.com'
    receivers = ['some_name@host.com']
    message = """\
From: user@host.com
To: some_name@host.com
Subject: Testing Script
"""
    body = '\n\n'
    for item in data:
        body = body + '{name} - {res}\n'.format(name=item['name'], res=item['res'])
    message = message + body
    try:
        smtpObj = smtplib.SMTP('some_server_name')
        smtpObj.sendmail(sender, receivers, message)
        print "Mail sent"
    except smtplib.SMTPException:
        print "Mail sending failed!"

def main():
    urls = [
        ('hostnameA', 'http://hostnameA:1234/Service/statistics?%24format=json'),
        ('hostnameB', 'http://hostnameB:1234/Service/statistics?%24format=json'),
        ('hostnameC', 'http://hostnameC:1234/Service/statistics?%24format=json')
    ]
    count = 0
    while True:
        data = []
        print('')
        for name, url in urls:
            res = get_data_count(url)
            print('{name} - {res}'.format(name=name, res=res))
            data.append({'name':name, 'res':res})
        if len([item['res'] for item in data if item['res'] >= 20]) >= 1: count = count+1
        else: count = 0
        if count == 2:
            send_mail(data)
            count = 0
        sleep(10.)

if __name__=="__main__":
    main()
What I am also doing with the above script: if any machine's dataCount value is greater than or equal to 20 two times continuously, I send out an email, and that also works fine.
One issue which I am noticing: suppose hostnameB is down for whatever reason. Then the script will print like this the first time -
hostnameA - 1
hostnameB - could not get page
hostnameC - 10
And the second time it will also print out like this -
hostnameA - 5
hostnameB - could not get page
hostnameC - 7
So my above script sends out an email for this case as well, since could not get page appeared two times continuously, even though hostnameB's dataCount value was never greater than or equal to 20. Right? So there is some bug in my script and I'm not sure how to solve it.
I just need to send out an email if any hostname's dataCount value is greater than or equal to 20 two times continuously. If a machine is down for whatever reason, I will skip that case, but the script should keep on running.
Without changing the get_data_count function:
I took the liberty of making data a dictionary with the server name as its index; this makes looking up the last value easier.
I store the last result and then compare the current and old values against 20. In Python 2, most strings compare greater than 19, so I create an int object from the result; this throws an exception when the result is a string, which I can catch to prevent shut-down servers from being counted.
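As a quick illustration of why the explicit int() conversion matters in Python 2 (an added snippet, not part of the original answer — any str silently compares greater than any int):

>>> 'could not get page' >= 20
True
>>> int('could not get page')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'could not get page'

With that in mind, the reworked loop: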
last = False
while True:
    data = {}
    hit = False
    print('')
    for name, url in urls:
        res = get_data_count(url)
        print('{name} - {res}'.format(name=name, res=res))
        data[name] = res
        try:
            if int(res) > 19:
                hit = True
        except ValueError:
            continue
    if hit and last:
        send_mail(data)
    last = hit
    sleep(10.)
Pong Wizard is right, you should not handle errors like that. Either return False or None and check the value later, or just throw an exception.
You should use False for a failed request instead of the string "could not get page". This would be cleaner, but note that a False value will also double as a 0 if it is treated as an int:
>>> True + False
1
Summing two or more False values will therefore equal 0.
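If you want the "two times continuously" check applied per host rather than across the whole fleet, here is a sketch along those lines (it reuses urls, send_mail and sleep from the question; None marks a host that could not be reached, so down machines are skipped rather than counted):

def get_data_count(url):
    # return None when the page cannot be fetched or parsed, so a down
    # host is never compared against the threshold
    try:
        req = requests.get(url)
        return int(json.loads(req.content)['statistics']['dataCount'])
    except (requests.ConnectionError, KeyError, TypeError, ValueError):
        return None

streak = dict((name, 0) for name, url in urls)  # consecutive hits per host

while True:
    data = []
    for name, url in urls:
        res = get_data_count(url)
        print('{name} - {res}'.format(name=name, res=res))
        data.append({'name': name, 'res': res})
        if res is not None and res >= 20:
            streak[name] += 1   # another consecutive high reading
        else:
            streak[name] = 0    # reset on a low value or an unreachable host
    if any(count >= 2 for count in streak.values()):
        send_mail(data)
        streak = dict((name, 0) for name in streak)
    sleep(10.)

Keying the streaks by host name keeps one host's outage from masking or inflating another host's threshold breaches.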

Strange issue when using google DFP API python client

Have a look at this code:
order_service = client.GetService('OrderService', version='v201208')
creative_service = client.GetService('CreativeService', version='v201208')

with open('/tmp/urls.txt', 'w') as f:
    for i in range(0, 3929, 100):
        print 'ORDER BY ID LIMIT 100 OFFSET '+str(i)
        creatives = creative_service.getCreativesByStatement({'query':'ORDER BY ID LIMIT 100 OFFSET '+str(i)})
        try:
            for creative in creatives[0]['results']:
                try:
                    for var in creative['creativeTemplateVariableValues']:
                        if var['uniqueName'] == 'DetailsPageURL':
                            print var['value']
                            f.write(creative['advertiserId']+','+var['value']+"\n")
                except:
                    pass
        except:
            raise
On the second iteration, when the offset is 200, it will complain at for creative in creatives[0]['results'] with a KeyError for 'results'. But if I change the try/except statement to if creative.has_key('creativeTemplateVariableValues'): like the following, it fixes the problem:
order_service = client.GetService('OrderService', version='v201208')
creative_service = client.GetService('CreativeService', version='v201208')

with open('/tmp/urls.txt', 'w') as f:
    for i in range(0, 3929, 100):
        print 'ORDER BY ID LIMIT 100 OFFSET '+str(i)
        creatives = creative_service.getCreativesByStatement({'query':'ORDER BY ID LIMIT 100 OFFSET '+str(i)})
        try:
            print creatives[0]['results']
        except:
            print creatives
        #creatives = creative_service.getCreativesByStatement({'query':'ORDER BY ID LIMIT 10 OFFSET 200'})
        try:
            for creative in creatives[0]['results']:
                if creative.has_key('creativeTemplateVariableValues'):
                    for var in creative['creativeTemplateVariableValues']:
                        if var['uniqueName'] == 'DetailsPageURL':
                            print var['value']
                            f.write(creative['advertiserId']+','+var['value']+"\n")
        except:
            raise
Why???
The field 'creativeTemplateVariableValues' exists only on creatives of type 'TemplateCreative', so if you have other creatives on your network that are not TemplateCreatives, they will not have the field and will throw the KeyError you have seen. You can do the has_key check as you have done, or an alternative is to do a type check:
if creative['Creative_Type'] == 'TemplateCreative':
    for var in creative['creativeTemplateVariableValues']:
        ...
If you only care about TemplateCreatives, I would suggest using a statement filter for that particular creative type. Please see the get_creatives_by_statement example (http://code.google.com/p/google-api-ads-python/source/browse/trunk/examples/adspygoogle/dfp/v201208/get_creatives_by_statement.py)
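A sketch of what that server-side filter could look like (the creativeType column and exact query syntax here are assumptions — verify them against the linked example):

# hypothetical statement filter; check the creativeType column name
# against the linked get_creatives_by_statement example
statement = {'query': "WHERE creativeType = 'TemplateCreative' LIMIT 100 OFFSET 0"}
creatives = creative_service.getCreativesByStatement(statement)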
For future questions regarding DFP API and the related client libraries, please post to the DFP API forum: https://groups.google.com/forum/#!forum/google-doubleclick-for-publishers-api
