Separate python tuple with newlines - python

I am working on a simple script that collects earthquake data and sends me a text with the info. For some reason I am not able to get my data to be separated by new lines. I'm sure I am missing something easy, but I'm still pretty new to programming, so any help is greatly appreciated! Some of the script is below:
import urllib.request
import json
from twilio.rest import Client
import twilio

events_list = []

def main():
    # Site to pull quake json data
    urlData = "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.geojson"
    webUrl = urllib.request.urlopen(urlData)
    if (webUrl.getcode() == 200):
        data = webUrl.read()
        # Use the json module to load the string data into a dictionary
        theJSON = json.loads(data)
        # collect the events that only have a magnitude greater than 4
        for i in theJSON["features"]:
            if i["properties"]["mag"] >= 4.0:
                events_list.append(("%2.1f" % i["properties"]["mag"], i["properties"]["place"]))
        print(events_list)
        # send with twilio
        body = events_list
        client = Client(account_sid, auth_token)
        if len(events_list) > 0:
            client.messages.create(
                body=body,
                to=my_phone_number,
                from_=twilio_phone_number
            )
    else:
        print("Received an error from server, cannot retrieve results " + str(webUrl.getcode()))

if __name__ == "__main__":
    main()

To separate the tuple elements with newlines, call "\n".join(). However, you first need to convert all of the elements in the tuple into strings.
The following expression should work on a given tuple:
"\n".join(str(el) for el in mytuple)
Note that this is different from converting the entire tuple into a string. Instead, it iterates over the tuple and converts each element into its own string.

Since you have the list of tuples stored in "events_list" you could probably do something like this:
for event in events_list:
    print(event[0], event[1])
It will give you something like this:
4.12 10km near Florida
5.00 4km near Bay
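Putting both pieces together, here is a minimal sketch (using the events_list structure your script already builds, with made-up sample values) that formats each (magnitude, place) tuple as one line and joins the lines with "\n", so the whole list becomes a single newline-separated message body for Twilio:
# events_list holds (magnitude, place) tuples, e.g. built by your loop above
events_list = [("4.1", "10km near Florida"), ("5.0", "4km near Bay")]

# format each tuple as one line, then join all the lines with newlines
body = "\n".join("{} {}".format(mag, place) for mag, place in events_list)
print(body)
# 4.1 10km near Florida
# 5.0 4km near Bay

# body is now a plain string and can be passed to client.messages.create(body=body, ...)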

Related

How to optimize retrieval of 10 most frequent words inside a json data object?

I'm looking for ways to make the code more efficient (runtime and memory complexity).
Should I use something like a max-heap?
Is the bad performance due to the string concatenation, sorting the dictionary not in-place, or something else?
Edit: I replaced the dictionary/map object with a Counter applied to a list of all retrieved names (with duplicates).
Minimal requirement: the script should take less than 30 seconds.
Current runtime: it takes 54 seconds.
# Try to implement the program efficiently (running the script should take less than 30 seconds)
import requests
# Requests is an elegant and simple HTTP library for Python, built for human beings.
# Requests is the only Non-GMO HTTP library for Python, safe for human consumption.
# Requests is not a built-in module (it does not come with the default Python installation), so you will have to install it:
# http://docs.python-requests.org/en/v2.9.1/
# installing it for PyCharm is not so easy and takes a lot of troubleshooting (problems with pip's main version)
# use conda/pip install requests instead
import json
# dict subclass for counting hashable objects
from collections import Counter
#import heapq
import datetime

url = 'https://api.namefake.com'
# a "global" list object. TODO: try to make it "static" (local to the file)
words = []

#####################################################################################
# Calls the site http://www.namefake.com 100 times and retrieves random names
# Examples for the format of the names from this site:
# Dr. Willis Lang IV
# Lily Purdy Jr.
# Dameon Bogisich
# Ms. Zora Padberg V
# Luther Krajcik Sr.
# Prof. Helmer Schaden etc....
#####################################################################################
requests.packages.urllib3.disable_warnings()
t = datetime.datetime.now()

for x in range(100):
    # for each name, break it into first and last name
    # no need for authentication
    # http://docs.python-requests.org/en/v2.3.0/user/quickstart/#make-a-request
    responseObj = requests.get(url, verify=False)
    # Decode the JSON data from the returned response object text
    # (deserialize a str/bytes/bytearray containing a JSON document to a Python object)
    jsonData = json.loads(responseObj.text)
    x = jsonData['name']
    newName = ""
    for full_name in x:
        # make a string from the decoded python object concatenation
        newName += str(full_name)
    # split by whitespace
    y = newName.split()
    # parse the first name (check first whether a title exists: Prof., Dr., Ms., Miss)
    if "." in y[0] or "Miss" in y[0]:
        words.append(y[2])
    else:
        words.append(y[0])
    words.append(y[1])

# Return the top 10 words that appear most frequently, together with the number of times each word appeared.
# Output example: ['Weber', 'Kris', 'Wyman', 'Rice', 'Quigley', 'Goodwin', 'Lebsack', 'Feeney', 'West', 'Marlen']
# (We don't care whether the word was a first or a last name)
# list of tuples
top_ten = Counter(words).most_common(10)
top_names_list = [name[0] for name in top_ten]

print((datetime.datetime.now() - t).total_seconds())
print(top_names_list)
You are calling an endpoint of an API that generates dummy information one person at a time - that takes a considerable amount of time.
The rest of the code takes almost no time.
Change the endpoint you are using (there is no bulk name gathering on the one you use) or use built-in dummy data provided by Python modules.
You can clearly see that "counting and processing names" is not the bottleneck here:
from faker import Faker # python module that generates dummy data
from collections import Counter
import datetime
fake = Faker()
c = Counter()
# get 10.000 names, split them and add 1st part
t = datetime.datetime.now()
c.update( (fake.name().split()[0] for _ in range(10000)) )
print(c.most_common(10))
print((datetime.datetime.now()-t).total_seconds())
Output for 10000 names:
[('Michael', 222), ('David', 160), ('James', 140), ('Jennifer', 134),
('Christopher', 125), ('Robert', 124), ('John', 120), ('William', 111),
('Matthew', 111), ('Lisa', 101)]
in
1.886564 # seconds
General advice for code optimization: measure first, then optimize the bottlenecks.
If you want a code review, check https://codereview.stackexchange.com/help/on-topic to see whether your code fits the requirements of the Code Review Stack Exchange site. As with SO, some effort should be put into the question first - i.e. analyzing where the majority of your time is being spent.
Edit - with performance measurements:
import requests
import json
from collections import defaultdict
import datetime

# defaultdict is (in this case) better than Counter because you add 1 name at a time
# Counter is superior if you update whole iterables of names at a time
d = defaultdict(int)

def insertToDict(n):
    d[n] += 1

url = 'https://api.namefake.com'
api_times = []
process_times = []
requests.packages.urllib3.disable_warnings()

for x in range(10):
    # for each name, break it into first and last name
    try:
        t = datetime.datetime.now()  # start time for API call
        # no need for authentication
        responseObj = requests.get(url, verify=False)
        jsonData = json.loads(responseObj.text)
        # end time for API call
        api_times.append((datetime.datetime.now() - t).total_seconds())
        x = jsonData['name']
        t = datetime.datetime.now()  # start time for name processing
        newName = ""
        for name_char in x:
            # make a string from the decoded python object concatenation
            newName = newName + str(name_char)
        # split by whitespace
        y = newName.split()
        # parse the first name (check first whether a title exists: Prof., Dr., Ms., Miss)
        if "." in y[0] or "Miss" in y[0]:
            insertToDict(y[2])
        else:
            insertToDict(y[0])
        insertToDict(y[1])
        # end time for name processing
        process_times.append((datetime.datetime.now() - t).total_seconds())
    except:
        continue

newA = sorted(d, key=d.get, reverse=True)[:10]
print(newA)
print(sum(api_times))
print(sum(process_times))
Output:
['Ruecker', 'Clare', 'Darryl', 'Edgardo', 'Konopelski', 'Nettie', 'Price',
'Isobel', 'Bashirian', 'Ben']
6.533625
0.000206
You can make the parsing part better; I did not, because it does not matter.
It is better to use timeit for performance testing (it calls the code multiple times and averages, smoothing out artifacts due to caching/lag/...) (thx #bruno desthuilliers) - in this case I did not use timeit because I do not want to call the API 100000 times to average results.
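If you do have to keep calling this per-name endpoint, the only real lever left is issuing the requests concurrently instead of one at a time. A rough sketch under that assumption (same https://api.namefake.com endpoint and JSON shape as above; the worker count of 10 is arbitrary, and the name parsing is simplified):
import json
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

import requests

url = 'https://api.namefake.com'
requests.packages.urllib3.disable_warnings()

def fetch_name(_):
    # one API call -> one random full name string
    return json.loads(requests.get(url, verify=False).text)['name']

# run the 100 calls on a small thread pool so the network waits overlap
with ThreadPoolExecutor(max_workers=10) as pool:
    names = list(pool.map(fetch_name, range(100)))

words = []
for full_name in names:
    parts = full_name.split()
    # drop a leading title such as "Dr." or "Miss"
    if "." in parts[0] or "Miss" in parts[0]:
        parts = parts[1:]
    words.extend(parts[:2])  # count first and last name

print(Counter(words).most_common(10))
The total time then roughly tracks the slowest batch of concurrent calls rather than the sum of all 100, while the counting itself stays negligible.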

Why would Python's sys.stdout.write() not work after I index a list obtained from a file object using os.popen?

Updated Question:
I have a JQuery $.ajax call that looks like this:
$.ajax({
    url: 'http://website.domain.gov/cgi-bin/myFolder/script.py',
    type: 'post',
    dataType: 'json',
    data: {'lat': '30.5', 'lon': '-80.2'},
    success: function(response){
        alert(response.data.lat);
        otherFunction();
    }
});
In script.py, which runs with no errors on its own, I do the following:
#!/usr/bin/python
import sys, json, cgi
import os

fs = cgi.FieldStorage()
d = {}
for k in fs.keys():
    d[k] = fs.getvalue(k)

ilat = d['lat']
ilon = d['lon']
pntstr = ilat + ',' + ilon

my_list = []
#os.chdir('..'); os.chdir('..')
#os.chdir('a_place/where_data/exists')
# If you'd like to run this code, you probably don't have grib files or 'degrib',
# so another unix command will have to be utilized to open some sort of dummy data
f = os.popen('degrib ' + 'datafilename ' +
             'options ' + pntstr)
out = f.read()

# data has 5 columns, many rows
j = 0
while j < (5 * num_lines_to_read):
    my_list.append(out[j + 4])
    j += 5
f.close()
#os.chdir('..') etc until I get back to /cgi-bin/myFolder directory...

x = my_list[0]

result = {}
result['data'] = d

sys.stdout.write('Content-Type: application/json\n\n')
sys.stdout.write(json.dumps(result))
sys.stdout.close()
NOTE: 'd' is the dummy value and has nothing to do with 'my_list'...for now
When I update the web page I'm developing and try to return 'd' from Python (in the form of an alert), I get a "very helpful" server 500 error.
I've narrowed the problem down to the "x = my_list[0]" line. If I instead type "x = my_list", I get no error. sys.stdout.write() only stops working when I try to index "my_list". I tried printing the list. It's not empty and contains the expected values with expected types. I get successful output from sys.stdout.write() if I place it before "x = my_list[0]". Problem is, I'll eventually use "my_list" to create the output that is written to stdout.
Is there some obscure file I/O thing that I'm missing here?
The expression my_list[0] is throwing an exception. Either it isn't a list, or whatever it is doesn't support the array operator.
Try examining your web server log files. Try printing type(my_list) and len(my_list) after you emit the content-type header.
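To surface the actual exception in the browser rather than a bare 500, you could also drop something like the following into script.py (a debugging sketch only; it assumes the sys/json imports already present in your script, and uses the standard-library cgitb module, which is best enabled near the top of the script):
import cgitb
cgitb.enable()  # uncaught exceptions now render as an HTML traceback instead of a bare 500

# later, after my_list is built: emit the header first, then diagnostics about what my_list really is
sys.stdout.write('Content-Type: application/json\n\n')
sys.stdout.write(json.dumps({
    'type': str(type(my_list)),      # is it actually a list?
    'length': len(my_list),          # is it actually non-empty?
    'preview': repr(my_list)[:200],  # a peek at the first elements
}))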

Is is possible to search gmail via imap for multiple X-GM-THRIDs in one request?

I have many X-GM-THRIDs, and I want to retrieve the list of UIDs for each of those threads.
https://developers.google.com/gmail/imap_extensions describes how to find these for a single X-GM-THRID. However, it would be nice if I could make one request for all of the X-GM-THRIDs that I want the data for. When I try this, I only get error codes.
Does anyone know if this is possible?
I'm using python and doing something like:
res, ids = imap_conn.uid('search', 'X-GM-THRID', thread_id)
# this works fine.
But I'd like to do something like:
res, ids = imap_conn.uid('search', 'X-GM-THRID', [thread_id1, thread_id2])
UPDATE:
after #max's answer, I'm now doing something like this:
res, ids = imap_conn.uid(
    'search', None,
    '(OR (OR (X-GM-THRID 123) X-GM-THRID 456) X-GM-THRID 789)')
# 123, 456, 789 are fake gmail thread ids
I'm not sure exactly what or how gmail limits search length - I think it's probably number of characters rather than number of 'OR's in the query. I can get away with about 9 before gmail starts returning errors.
You would need to add "OR" qualifiers and then build an extended search query. See RFC 3501 for the search syntax.
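For example, a small helper (hypothetical, just following the nested-OR form your update already uses) could build that search string from a list of thread IDs:
def thrid_or_query(thread_ids):
    # IMAP's OR takes exactly two operands, so chain them left to right
    query = '(X-GM-THRID %s)' % thread_ids[0]
    for tid in thread_ids[1:]:
        query = '(OR %s X-GM-THRID %s)' % (query, tid)
    return query

# thrid_or_query(['123', '456', '789']) produces
# '(OR (OR (X-GM-THRID 123) X-GM-THRID 456) X-GM-THRID 789)'
# res, ids = imap_conn.uid('search', None, thrid_or_query(thread_ids))
Whether the server accepts it still depends on Gmail's limit on the overall query length, as you noticed.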
I am able to get UIDs from a Thread-ID using the following code.
from imaplib import IMAP4_SSL

mail = IMAP4_SSL('imap.gmail.com')
mail.login(<Email>, <password>)
mail.select('INBOX')

result, uid = mail.uid('search', None, "ALL")  # Get all the UIDs
uid_list = []
for u in uid[0].split():
    uid_list.append(u)

for p in uid_list:
    result, data = mail.uid('fetch', p, '(X-GM-THRID)')
    q = data[0].split()  # splitting the fetch response into a list
    m = q[2]             # the thread ID
    res, ids = mail.uid('search', 'X-GM-THRID', m)
    w = ids[0].split()
    print(w)  # printing the UIDs related to this Thread-ID
First I get all the UIDs from the Inbox, use those UIDs to fetch each message's Thread-ID, and then use those Thread-IDs to search again for the UIDs belonging to each particular thread.

Using Python gdata to clear the rows in worksheet before adding data

I have a Google Spreadsheet which I'm populating with values using a Python script and the gdata library. If I run the script more than once, it appends new rows to the worksheet; I'd like the script to first clear all the data from the rows before populating them, so that I have a fresh set of data every time I run the script. I've tried using:
UpdateCell(row, col, value, spreadsheet_key, worksheet_id)
but short of running two for loops like this, is there a cleaner way? Also, this loop seems to be horrendously slow:
for x in range(2, 45):
    for i in range(1, 5):
        self.GetGDataClient().UpdateCell(x, i, '',
                                         self.spreadsheet_key,
                                         self.worksheet_id)
Not sure if you got this sorted out or not, but regarding speeding up the clearing out of current data, try using a batch request. For instance, to clear out every single cell in the sheet, you could do:
cells = client.GetCellsFeed(key, wks_id)
batch_request = gdata.spreadsheet.SpreadsheetsCellsFeed()

# Iterate through every cell in the CellsFeed, replacing each one with ''
# Note that this does not make any calls yet - it all happens locally
for i, entry in enumerate(cells.entry):
    entry.cell.inputValue = ''
    batch_request.AddUpdate(cells.entry[i])

# Now send the entire batch request as a single HTTP request
updated = client.ExecuteBatch(batch_request, cells.GetBatchLink().href)
If you want to do things like save the column headers (assuming they are in the first row), you can use a CellQuery:
# Set up a query that starts at row 2
query = gdata.spreadsheet.service.CellQuery()
query.min_row = '2'

# Pull just those cells
no_headers = client.GetCellsFeed(key, wks_id, query=query)
batch_request = gdata.spreadsheet.SpreadsheetsCellsFeed()

# Iterate through every cell in the CellsFeed, replacing each one with ''
# Note that this does not make any calls yet - it all happens locally
for i, entry in enumerate(no_headers.entry):
    entry.cell.inputValue = ''
    batch_request.AddUpdate(no_headers.entry[i])

# Now send the entire batch request as a single HTTP request
updated = client.ExecuteBatch(batch_request, no_headers.GetBatchLink().href)
Alternatively, you could use this to update your cells as well (perhaps more in line with what you want). The link to the documentation provides a basic way to do that, which is (copied from the docs in case the link ever changes):
import gdata.spreadsheet
import gdata.spreadsheet.service
client = gdata.spreadsheet.service.SpreadsheetsService()
# Authenticate ...
cells = client.GetCellsFeed('your_spreadsheet_key', wksht_id='your_worksheet_id')
batchRequest = gdata.spreadsheet.SpreadsheetsCellsFeed()
cells.entry[0].cell.inputValue = 'x'
batchRequest.AddUpdate(cells.entry[0])
cells.entry[1].cell.inputValue = 'y'
batchRequest.AddUpdate(cells.entry[1])
cells.entry[2].cell.inputValue = 'z'
batchRequest.AddUpdate(cells.entry[2])
cells.entry[3].cell.inputValue = '=sum(3,5)'
batchRequest.AddUpdate(cells.entry[3])
updated = client.ExecuteBatch(batchRequest, cells.GetBatchLink().href)

How do I change ADO ResultSet format in python?

I have the following code to query a database using an ADO COM object in Python. It connects to a time-series database (OSIPI), and this is the only way we've been able to get Python connected to that database.
from win32com.client import Dispatch

oConn = Dispatch('ADODB.Connection')
oRS = Dispatch('ADODB.RecordSet')
oConn.ConnectionString = <my connection string>
oConn.Open()
oRS.ActiveConnection = oConn

if oConn.State == adStateOpen:
    print "Connected to DB"
else:
    raise SystemError('Database Connection Failed')

cmd = """SELECT tag, dataowner FROM pipoint WHERE tag LIKE 'TEST_TAG1%'"""
oRS.Open(cmd)

result = oRS.GetRows(1)
print result
result2 = oRS.GetRows(2)
print result2

if oConn.State == adStateOpen:
    oConn.Close()
    oConn = None
This code returns the following two lines as results to the query:
result = ((u'TEST_TAG1.QTY.BLACK',), (u'piadmin',))
result2 = ((u'TEST_TAG1.QTY.BLACK', u'TEST_TAG1.QTY.PINK'), (u'piadmin', u'piuser'))
This is not the expected format. In this case, I was expecting something like this:
result = ((u'TEST_TAG1.QTY.BLACK',u'piadmin'))
result2 = ((u'TEST_TAG1.QTY.BLACK',u'piadmin'),
(u'TEST_TAG1.QTY.PINK',u'piuser'))
Is there a way to adjust the results of an ADO query so everything related to row 1 is in the same tuple and everything in row 2 is in the same tuple?
What you're seeing is not really a Python thing but the output of GetRows(), which returns a two-dimensional array organized by field and then by row.
Fortunately, Python has the zip() function that will make the relevant change for you. Try changing your code from:
result = oRS.GetRows(1)
to:
result = zip(*oRS.GetRows(1))
etc.
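As a quick illustration of what the transposition does (sample data only; note that on Python 3, zip() returns an iterator, so wrap it in list() if you need to index or print the result):
# the data as GetRows() returns it: one inner tuple per field
rows_by_field = (('TEST_TAG1.QTY.BLACK', 'TEST_TAG1.QTY.PINK'),
                 ('piadmin', 'piuser'))

# zip(*...) transposes it into one tuple per row
result2 = list(zip(*rows_by_field))
print(result2)
# [('TEST_TAG1.QTY.BLACK', 'piadmin'), ('TEST_TAG1.QTY.PINK', 'piuser')]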
