I'm working on a Divvy dataset project.
I want to scrape the information for each suggested location, along with its comments, from http://suggest.divvybikes.com/.
Am I able to scrape this information from Mapbox? It is displayed on a map, so the data must be available somewhere.
I visited the page and logged my network traffic using Google Chrome's Developer Tools. Filtering the requests to view only XHR (XMLHttpRequest) requests, I saw a lot of HTTP GET requests to various REST APIs. These REST APIs return JSON, which is ideal. Only two of them seem to be relevant for your purposes: one is for places, the other for comments associated with those places. The places API's JSON contains interesting information, such as place IDs and coordinates. The comments API's JSON contains all comments for a specific place, identified by its ID. Mimicking those calls is pretty straightforward with the third-party requests module. Fortunately, the APIs don't seem to care about request headers, but the query-string parameters (the params dictionary) do need to be well-formed.
I was able to come up with the following two functions. get_places makes multiple calls to the same API, each time with a different page query-string parameter. It seems that "page" is the term they use internally to split their data into chunks: all the places/features/stations are spread across multiple pages, and you can only get one page per API call. The loop accumulates all places in one big list and keeps going until we receive a response telling us there are no more pages; once the loop ends, we return the list of places.
The other function is get_comments, which takes a place ID (string) as a parameter. It makes an HTTP GET request to the appropriate API and returns a list of comments for that place. This list may be empty if the place has no comments.
def get_places():
    import requests
    from itertools import count

    api_url = "http://suggest.divvybikes.com/api/places"
    page_counter = count(1)
    places = []

    for page_nr in page_counter:
        params = {
            "page": str(page_nr),
            "include_submissions": "true"
        }

        response = requests.get(api_url, params=params)
        response.raise_for_status()

        content = response.json()
        places.extend(content["features"])

        # Stop once the API reports that there is no next page.
        if content["metadata"]["next"] is None:
            break

    return places
def get_comments(place_id):
    import requests

    api_url = "http://suggest.divvybikes.com/api/places/{}/comments".format(place_id)

    response = requests.get(api_url)
    response.raise_for_status()

    # The "results" key holds the (possibly empty) list of comments.
    return response.json()["results"]
def main():
    from operator import itemgetter

    places = get_places()
    place_id = places[12]["id"]

    print("Printing comments for the thirteenth place (id: {})\n".format(place_id))
    for comment in map(itemgetter("comment"), get_comments(place_id)):
        print(comment)

    return 0

if __name__ == "__main__":
    import sys
    sys.exit(main())
Output:
Printing comments for the thirteenth place (id: 107062)
I contacted Divvy about this five years ago and would like to pick the conversation back up! The Evanston Divvy bikes are regularly spotted in Wilmette and we'd love to expand the system for riders. We could easily have four stations - at the Metra Train Station, and the CTA station, at the lakefront Gillson Park and possibly one at Edens Plaza in west Wilmette. Please, please, please contact me directly. Thanks.
For this example, I'm printing all the comments for the 13th place in our list of places. I picked that one because it is the first place that actually has comments (places 0 through 11 had none; most places don't seem to have any). In this case, the place only had one comment.
EDIT - If you wanted to save the place IDs, coordinates (longitude/latitude) and comments in a CSV, you can try changing the main function to:
def main():
    import csv

    print("Getting places...")
    places = get_places()
    print("Got all places.")

    # GeoJSON coordinates are stored as [longitude, latitude].
    fieldnames = ["place id", "longitude", "latitude", "comments"]

    print("Writing to CSV file...")
    with open("output.csv", "w", newline="") as file:
        writer = csv.DictWriter(file, fieldnames)
        writer.writeheader()

        num_places_to_write = 25
        for place_nr, place in enumerate(places[:num_places_to_write], start=1):
            print("Writing place #{}/{} with id {}".format(place_nr, num_places_to_write, place["id"]))
            row = [place["id"], *place["geometry"]["coordinates"],
                   [c["comment"] for c in get_comments(place["id"])]]
            writer.writerow(dict(zip(fieldnames, row)))

    return 0
With this, I got results like:
place id,longitude,latitude,comments
107098,-87.6711076553,41.9718155716,[]
107097,-87.759540081,42.0121073671,[]
107096,-87.747695446,42.0263916146,[]
107090,-87.6642036438,42.0162096564,[]
107089,-87.6609444613,41.8852953922,[]
107083,-87.6007853815,41.8199433342,[]
107082,-87.6355862613,41.8532736671,[]
107075,-87.6210737228,41.8862644836,[]
107074,-87.6210737228,41.8862644836,[]
107073,-87.6210737228,41.8862644836,[]
107065,-87.6499611139,41.9627251578,[]
107064,-87.6136027649,41.8332984674,[]
107062,-87.7073025402,42.0760990584,"[""I contacted Divvy about this five years ago and would like to pick the conversation back up! The Evanston Divvy bikes are regularly spotted in Wilmette and we'd love to expand the system for riders. We could easily have four stations - at the Metra Train Station, and the CTA station, at the lakefront Gillson Park and possibly one at Edens Plaza in west Wilmette. Please, please, please contact me directly. Thanks.""]"
In this case, I used the list-slicing syntax (places[:num_places_to_write]) to write only the first 25 places to the CSV file, just for demonstration purposes. However, after roughly the first thirteen were written, I got this exception message:
A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
So, I'm guessing that the comments API doesn't expect to receive so many requests in such a short amount of time. You may have to sleep in the loop for a bit to get around this. It's also possible that the API doesn't care and the request just happened to time out.
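If it does turn out to be rate-limiting, here is a minimal sketch of the sleep idea, reusing the get_comments function from above (the helper name and the one-second delay are my own choices, not anything the API documents):
import time

def get_comments_politely(place_ids, delay_seconds=1.0):
    # Fetch comments for each place ID, pausing between requests
    # so we don't hammer the comments API.
    comments_by_place = {}
    for place_id in place_ids:
        comments_by_place[place_id] = get_comments(place_id)
        time.sleep(delay_seconds)
    return comments_by_place
If the failures turn out to be sporadic timeouts rather than rate-limiting, wrapping the requests.get call in a retry loop (and passing a timeout argument) would be the other option.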
I would like to know if there is a place from which I can download metadata for a given stock. I was reading about REST APIs some time back, and I thought I could maybe use something like this:
import requests

stock_code = "GME"
base_url = "https://somestockmarkekpage.com/api/stock?code={}"

resp = requests.get(base_url.format(stock_code))
print(resp.json()['short_ratio'])
The problem is that I don't know any base_url from which I can download this data, and I don't even know if one exists for free. However, any other API or service you could suggest is very welcome.
There is a free API provided by Yahoo that contains up-to-date data for many tickers. You can see the API details here. One example of extracting metadata for a ticker would be:
import json

import yfinance as yf

stock_obj = yf.Ticker("GME")

# Here are some fixes on the JSON-like string it returns
validated = str(stock_obj.info).replace("'", "\"").replace("None", "\"NULL\"").replace("False", "\"FALSE\"").replace("True", "\"TRUE\"")

# Parsing the JSON here
meta_obj = json.loads(validated)

# Some of the short-interest fields
print("sharesShort: " + str(meta_obj['sharesShort']))
print("shortRatio: " + str(meta_obj['shortRatio']))
print("shortPercentOfFloat: " + str(meta_obj['shortPercentOfFloat']))
The output for the ticker you are interested in would be:
sharesShort: 61782730
shortRatio: 2.81
shortPercentOfFloat: 2.2642
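As a side note, yfinance's Ticker.info is already a plain Python dictionary, so under that assumption you can skip the string fix-ups and JSON parsing and read the fields directly (using .get avoids a KeyError for tickers that don't report a field):
import yfinance as yf

stock_obj = yf.Ticker("GME")
info = stock_obj.info  # already a dict, no JSON round-trip needed

print("sharesShort:", info.get("sharesShort"))
print("shortRatio:", info.get("shortRatio"))
print("shortPercentOfFloat:", info.get("shortPercentOfFloat"))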
You can use the free Yahoo Finance API through its most popular Python library, yfinance.
Link: https://pypi.org/project/yfinance/
Sample Code:
import yfinance as yf

GME_data = yf.Ticker("GME")

# get stock info (a dictionary of metadata)
print(GME_data.info)
Other than that, there are many other APIs you can use; you can browse RapidAPI and search for "Stock".
I have a data frame, and I want to automatically insert its details into Quip. I have searched online, but couldn't find any satisfactory answer. Please help.
Here is my answer, based on a similar problem and this article: https://towardsdatascience.com/updating-a-quip-spreadsheet-with-python-api-1b4bb24d4aac
First, follow the steps to get a personal access token from quip.com/dev/token; this will take care of your authentication.
Then, you can get an updated client version from Lynn Zheng's Medium post (linked above), https://github.com/RuolinZheng08/quip-api, for local import.
My imports look like this:
import quip_update as quip #from Zheng's repo
from login_token import login_token #this is a variable that holds the value of the token I got from their auth website
Then you set up and authorize the client with the following:
quip_client = quip.QuipClient(access_token=login_token, base_url='https://platform.quip.com')
user = quip_client.get_authenticated_user()
If your company has a contract with quip, it might look like base_url='https://platform.quip-amazon.com'
I like to print(user) to see basic info and confirm that the client connected.
Then, again mostly following Zheng's article, you can use one of the client functions to insert rows into a spreadsheet:
def add_to_spreadsheet(self, thread_id, *rows, **kwargs):
    '''Adds the given rows to the named (or first) spreadsheet in the
    given document.

        client = quip.QuipClient(...)
        client.add_to_spreadsheet(thread_id, ["5/1/2014", 2.24])'''
(from quip.py)
So you can create a spreadsheet in the Quip document manually and then refer to it by name="name of spreadsheet" to append rows from pandas. So, for example:
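Here is a minimal sketch of that last step, assuming you have a pandas DataFrame df, a document thread ID (the THREAD_ID below is a placeholder), and that add_to_spreadsheet accepts the name keyword as described above:
import pandas as pd

# Hypothetical example data frame
df = pd.DataFrame({"date": ["5/1/2014", "5/2/2014"], "value": [2.24, 3.10]})

THREAD_ID = "your_thread_id_here"  # placeholder for the document's thread ID

# Convert each data frame row to a plain list and append it to the named spreadsheet
for row in df.itertuples(index=False):
    quip_client.add_to_spreadsheet(THREAD_ID, list(row), name="name of spreadsheet")
Passing one row per call mirrors the docstring example above; since the method takes *rows, you could also batch several rows into a single call.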
I am trying to connect the Knack online database with my Python data-handling scripts in order to refresh objects/tables directly in my Knack app builder. I discovered pyknackhq, a Python API for KnackHQ, which can fetch objects and return JSON for an object's records. So far so good.
However, following the documentation (http://www.wbh-doc.com.s3.amazonaws.com/pyknackhq/quick%20start.html) I have tried to fetch all rows (records, in Knack terms) for my object/table (344 records in total).
My code was:
i = 0
for rec in undec_obj.find():
    print(rec)
    i = i + 1
print(i)
# output: 25
Indeed, the first 25 records were returned, but the rest (up to the 344th) never were. The documentation of the pyknackhq library is fairly sparse, so I couldn't find a way around the problem there. Is there a way to get all my records/rows? (I have also changed the setting in Knack so that all my records appear on the same page, page 1.)
The ultimate goal is to take all records and make them a pandas dataframe.
thank you!
I haven't worked with that library, but I've written another Python Knack API wrapper that should help:
https://github.com/cityofaustin/knackpy
The docs should get you where you want to go. Here's an example:
>>> from knackpy import Knack

# download data from knack object
# will fetch records in chunks of 1000 until all records have been downloaded
# optionally pass a rows_per_page and/or page_limit parameter to limit record count
>>> kn = Knack(
...     obj='object_3',
...     app_id='someappid',
...     api_key='topsecretapikey',
...     page_limit=10,       # not needed; this is the default
...     rows_per_page=1000   # not needed; this is the default
... )

>>> for row in kn.data:
...     print(row)

{'store_id': 30424, 'inspection_date': 1479448800000, 'id': '58598262bcb3437b51194040'},...
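Since your ultimate goal is a pandas DataFrame, and kn.data is a list of dictionaries like the record shown above, here is a short sketch of the final step (assuming pandas is installed):
import pandas as pd

# kn.data is a list of dicts, one per record, so it maps directly onto a DataFrame
df = pd.DataFrame(kn.data)
print(df.head())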
Hope that helps. Open a GitHub issue if you have any questions using the package.
Using any Facebook API for Python, I am trying to get the number of people who shared my post, and who those people are. I currently have the first part:
>>> from facepy import *
>>> graph = GraphAPI("CAAEr")
>>> g = graph.get('apple/posts?limit=20')
>>> g['data'][10]['shares']
That gets the count, but I want to know who those people are.
The sharedposts connection will give you more information about the shares of a post. You have to GET POST_ID?fields=sharedposts, as the code below does. This information doesn't appear in the docs.
This follows your code:
# Going through each of your posts one by one
for post in g['data']:
    # Getting the post ID; it has the form USERID_POSTID,
    # and the post part is the last 17 numerals (post['id'][-17:]).
    post_id = post['id']

    # Another Graph request to get the shared posts
    shares = graph.get(post_id + '?fields=sharedposts')

    print('Post', post_id, 'was shared by:')

    # Display the name of each sharer; the shared posts are nested
    # under the 'sharedposts' field of the response.
    for share in shares.get('sharedposts', {}).get('data', []):
        print(share['from']['name'])
It is my first time with Python, so there may still be rough edges in the code, but you get the idea.
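If you'd rather not depend on facepy, the same request can be made directly against the Graph API with requests. This is only a sketch: the access token and post ID below are placeholders, and you should check the sharedposts field against the Graph API version you target:
import requests

ACCESS_TOKEN = "your_access_token"  # placeholder
POST_ID = "PAGEID_POSTID"           # placeholder: the full post ID

response = requests.get(
    "https://graph.facebook.com/{}".format(POST_ID),
    params={"fields": "sharedposts", "access_token": ACCESS_TOKEN},
)
response.raise_for_status()

# The sharers (if any) are listed under the 'sharedposts' key
for share in response.json().get("sharedposts", {}).get("data", []):
    print(share["from"]["name"])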