I'm working on my final-semester project, which is based on real-time transit. I want to develop a website (using Python + the Django framework) that shows the real-time geolocation of all our university buses (via the Google Maps API, or any other API you'd suggest as best) to university students and professors, and also shows bus arrival times for the stop nearest to them, wherever they are. This is the main module of my project. So my questions are:
May I use Google Transit for that? Google's documentation, under "Why Google Maps?", states:
"If you provide a transportation service that is open to the public,
and operates with fixed schedules and routes, we welcome your
participation"
In my case the website would only be accessible to people from our university, so I think it may violate Google's policy. And the second question:
If I implement this, will the data be accessible from anywhere other than my website? If it is, it would be meaningless to people outside our university, and that could cause unpleasant situations.
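For context, here is a minimal sketch of the kind of Django view I have in mind for serving the bus positions to the map frontend; the model and field names are just placeholders:

    # models.py -- minimal sketch; model and field names are hypothetical
    from django.db import models

    class Bus(models.Model):
        number = models.CharField(max_length=20)
        latitude = models.FloatField()
        longitude = models.FloatField()
        updated_at = models.DateTimeField(auto_now=True)

    # views.py -- returns the latest position of every bus as JSON for the map
    from django.http import JsonResponse

    def bus_positions(request):
        buses = Bus.objects.values("number", "latitude", "longitude", "updated_at")
        return JsonResponse({"buses": list(buses)})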
About Q1: (May I use Google Transit for that? ...)
You might be able to use Google Transit by using the Google Maps Directions API (https://developers.google.com/maps/documentation/directions/).
But there are several limitations: the transportation agency must be in a supported city (http://www.google.com/landing/transit/cities/index.html) and must be a public transportation agency (https://developers.google.com/transit/gtfs-realtime/).
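If your agency were covered, a transit-mode Directions request is just an HTTP call. A rough sketch using the requests library; the API key and the coordinates are placeholders:

    # Sketch of a transit-mode Google Maps Directions API request.
    # The API key and coordinates below are placeholders.
    import requests

    params = {
        "origin": "23.0300,72.5800",       # user's current location (lat,lng)
        "destination": "23.0370,72.5620",  # university campus
        "mode": "transit",
        "key": "YOUR_API_KEY",
    }
    resp = requests.get(
        "https://maps.googleapis.com/maps/api/directions/json", params=params
    )
    for route in resp.json().get("routes", []):
        leg = route["legs"][0]
        print(leg["departure_time"]["text"], "->", leg["arrival_time"]["text"])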
About Q2: (If I implement this, is it accessible ...)
Other people may access your transit feed info.
But there is a Quality Assurance Review (https://support.google.com/transitpartners/answer/1106422?hl=en&ref_topic=1095593) of the info before you launch it.
Related
I want to know if there are ways to get Bloomberg data into Python. I see we can connect through the blpapi/pdblp packages.
I also wanted to check what the pricing is for this. I'd appreciate it if anyone can help me here.
Ways to connect Python to Bloomberg data
Bloomberg has a number of products which support the real-time API known as the BLP API. It is a microservice-based API: there are microservices for streaming market data (//blp/mktdata), requesting static reference data (//blp/refdata), contributing OTC pricing (//firm/c-gdco), submitting orders (//blp/emsx), and so on. The API supports a number of languages including Python, Perl, C++ and .NET. The API pattern requires setting up a session where you target/connect to a delivery point. There are several flavours of delivery point depending on which Bloomberg products you buy: for the Bloomberg (Professional) Terminal there is the Desktop API (DAPI), there is also the Server API (SAPI), there is B-PIPE, and another is EMSX. They all present delivery points, and they all support the same BLP API.
The Bloomberg Terminal's delivery point is localhost:8194. No Bloomberg Terminal, no localhost delivery point. However, maybe your organisation has bought an Enterprise B-PIPE product, in which case you don't need a Bloomberg Terminal, and the delivery point will sit on at least two servers (IPs), again on port 8194.
So, bottom line, the API library is available and you can develop against it. The problem is that the first few lines of creating a session object and connecting to the endpoint will fail unless you have a Bloomberg product. There's no sandbox, sadly.
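Those first few lines look roughly like the sketch below with the Python blpapi package, assuming the Desktop API delivery point on localhost:8194 (it will fail at start() without a running Bloomberg product):

    # Sketch: open a BLP API session against the Desktop API delivery point.
    # Fails at start()/openService() if no Bloomberg product is running locally.
    import blpapi

    options = blpapi.SessionOptions()
    options.setServerHost("localhost")
    options.setServerPort(8194)

    session = blpapi.Session(options)
    if not session.start():
        raise RuntimeError("Could not connect to the delivery point")
    if not session.openService("//blp/refdata"):
        raise RuntimeError("Could not open //blp/refdata")

    refdata = session.getService("//blp/refdata")
    request = refdata.createRequest("ReferenceDataRequest")
    request.append("securities", "IBM US Equity")
    request.append("fields", "PX_LAST")
    session.sendRequest(request)
    # ...then loop over session.nextEvent() to read the response messages.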
Pricing depends on the product, and unfortunately you'll also need to consider your application's use case. As an example, if you're writing a systematic trading application, the licensing of the Bloomberg (Professional) Terminal will not permit that; a B-PIPE, however, includes a licence that will permit it (plus hefty exchange fees if the data isn't OTC).
Good luck.
Let's say you have a proprietary Python + Selenium script that needs to run daily. If you host it on AWS, Google Cloud, Azure, etc., are they allowed to see your script? What is the best practice for "hiding" such a script when it is hosted online?
Is there any way to "obfuscate" the logic, such as converting the Python script to a binary?
Can the cloud vendors access your script/source code/program/data?
I am not including government/legal subpoenas in this answer.
They own the infrastructure. They govern access. They control security.
However, in the real world there are numerous firewalls in place with auditing, logging and governance. A cloud vendor employee would risk termination and/or prison time for bypassing these controls.
Secrets (or rumors) are never secret for long and the valuation of AWS, Google, etc. would vaporize if they violated customer trust.
Therefore the answer is yes, it is possible but extremely unlikely. Professionally, I trust the cloud vendors with the same respect I give my bank.
Here you can find information regarding Google Cloud Enterprise Privacy Commitments.
It describes how Google protects the privacy of Google Cloud Platform and Google Workspace customers.
You control your data. Customer data is your data, not Google’s. We only process your data according to your agreement(s).
We never use your data for ads targeting. We do not process your customer data or service data to create ads profiles or improve Google Ads products.
We are transparent about data collection and use. We’re committed to transparency, compliance with regulations like the GDPR, and privacy best practices.
We never sell customer data or service data. We never sell customer data or service data to third parties.
Security and privacy are primary design criteria for all of our products. Prioritizing the privacy of our customers means protecting the data you trust us with. We build the strongest security technologies into our products.
Therefore I believe it is extremely unlikely that Google will check the content of your scripts.
I am working on a project wherein I need to catalog all the movie and TV show titles from major OTT platforms such as Netflix, Hotstar, Hulu, and so on. The metadata collected would be the title name, genre, release date, and the platform(s) it is available on.
Further, is there any automated way to update my list every day with the latest movies/shows added?
I did some research of my own and understand that none of these platforms offers a public API, so that option is closed. Scraping the titles is illegal, I believe, as it may harm their servers.
What are my options to do so?
The basic idea is to display what movie is available on which platform.
There are a few apps for that, like JustWatch and Reelgood. I don't understand how they stay so up to date. Are they scraping, or something else?
Nevertheless, I need to understand the legal way of extracting the data.
Thanks
Websites/projects like Reelgood and similar pages rely on third-party APIs for the most part.
For Netflix and Hulu there are no public APIs (anymore). (For Netflix, the first-party API is only available to partners.)
This article about the shutdown of the public Netflix API also mentions alternatives and an API for Hulu. If you decide to take advantage of such third-party APIs, do it at your own risk.
There are also paid services, such as Guidebox, that provide APIs.
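Whichever provider you go with, the daily update itself is simple to automate: fetch the provider's catalogue once a day and diff it against what you stored last time. A rough sketch follows; the endpoint, parameters and JSON fields are hypothetical placeholders for whatever API you end up using:

    # Sketch of a daily catalog refresh against a third-party availability API.
    # The endpoint, parameters and JSON field names are hypothetical placeholders.
    import json
    import requests

    CATALOG_FILE = "catalog.json"
    API_URL = "https://api.example-provider.com/v1/titles"

    def fetch_titles():
        resp = requests.get(API_URL, params={"country": "US", "api_key": "YOUR_KEY"})
        resp.raise_for_status()
        return {str(t["id"]): t for t in resp.json()["titles"]}

    def refresh_catalog():
        try:
            with open(CATALOG_FILE) as f:
                known = json.load(f)
        except FileNotFoundError:
            known = {}
        latest = fetch_titles()
        for title_id in set(latest) - set(known):
            t = latest[title_id]
            print("New:", t["name"], t["genre"], t["release_date"], t["available_on"])
        with open(CATALOG_FILE, "w") as f:
            json.dump(latest, f)

    if __name__ == "__main__":
        refresh_catalog()  # run once a day, e.g. from a scheduler/cron job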
I have an App Engine app that I'm happy with, and I've localized it for several countries and languages. Now I want to localize it for China and Hong Kong, but I believe these areas block App Engine, so Google services cannot be used. What, then, is the best development plan to make my app available in China? Should I switch to a different backend (Django + MySQL?) and deploy it to some Hong Kong or China hosting, or is there a simpler way? I use the GAE Blobstore, GAE models, the GAE Search API, memcache, GNU gettext, Jinja2 and the GAE Mail API (but no Compute Engine). So the best way I can think of would be to migrate the backend to something that can run and be accessed from China and Hong Kong. Or is there a better way?
There are a few different options:
If you're planning on deploying a separate, localized version for China (with a separate data set), you can use AppScale and run in a Chinese public cloud (Aliyun is pretty easy to get started with). This ensures accessibility for your users in China, but means that this deployment would function as an isolated, China-specific deployment.
If you want to have users around the globe (including Chinese/Hong Kong users) access a single app with a shared backend, you could:
keep the app in GAE and use a reverse proxy (such as CloudFlare) to get around the firewall issues
use AppScale to run the app (unmodified) in a different public cloud, such as EC2
rewrite the app
In my experience, a reverse proxy is only a viable solution if performance isn't crucial. For a user-facing web app like KoolBusiness, it might pose a few challenges latency-wise. But I work for AppScale, so most of the users I talk with are already exploring options other than a reverse proxy.
Proxying your website will be easier than moving to a different host. You could try using CloudFlare for this.
A potentially simpler option is just mapping the app to a custom domain; see https://stackoverflow.com/a/19093384/4495081.
I used the http://www.greatfirewallofchina.org tool to check your app and it reported all green:
http://www.greatfirewallofchina.org/index.php?siteurl=http%3A%2F%2Fwww.koolbusiness.com
There are different solutions available according to your budget:
1- Host your servers in Hong Kong. Contact the hosting platform and inform them that you need your data to be available in China; they will be able to host your data in the fastest data center (the one with the fastest connection to Mainland China). It won't be the fastest option available, but it will be the cheapest.
2- Use public cloud providers such as AWS or Microsoft Azure (outside of China) + their respective CDNs. Most of the cloud providers have endpoints in Hong Kong and you will be able to get acceptable bandwidth. This will be cost-efficient.
3- Use Chinese cloud providers such as AliCloud (the China branch, not international), AWS China or Azure China. For that, you will need to have a Chinese company established, and you may need to get an ICP license (https://en.wikipedia.org/wiki/ICP_license) if you have public content. Expensive, but that's the best option if you decide to operate in China for the long run.
4- Use your own backend and add a professional CDN such as Akamai; they have an option called China CDN that will help you get through the Great Firewall (budget roughly $10k per month). This will be very expensive.
I want to compare different information (citations, h-index, etc.) about professors in a specific field at different institutions all over the world using data mining and analysis techniques. But I have no idea how to extract this data for hundreds (or even thousands) of professors, since Google does not provide an official API for it. So I am wondering: are there any other ways to do that?
This Google Code tool will calculate an individual's h-index. If you do this on demand for a limited number of people in a particular field, you will not break the terms of use - they don't specifically refer to limits on access, but they do refer to disruption of service (e.g. bulk requests may potentially cause this). The export questions state:
I wrote a program to download lots of search results, but you blocked my computer from accessing Google Scholar. Can you raise the limit?
Err, no, please respect our robots.txt when you access Google Scholar using automated software. As the wearers of crawler's shoes and webmaster's hat, we cannot recommend adherence to web standards highly enough.
Web of Science does have an API available and a collaboration agreement with Google Scholar, but Web of Science access is only available to certain individuals.
A solution could be to request the user's Web of Science credentials (or use your own) to return the information on demand - perhaps for the top people in the field - and then store it as you planned. Google Scholar only updates a few times per week, so this would not be excessive use.
The other option is to request permission from Google, which is mentioned in the terms of use, although it seems unlikely to be granted.
I've done a project exactly on this.
You provide an input text file to the script with the names of the professors you'd like to retrieve information about, and the script crawls Google Scholar and collects the info you are interested in.
The project also provides functionality for automatically downloading the profile pictures of the researchers/professors.
In order to respect the constraints imposed by the portal, you can set a delay between requests. If you have >1k profiles to crawl it might take a while, but it works.
A concurrency-enabled script has also been implemented, and it runs way faster than the basic sequential approach.
Note: in order to specify the information you need, you have to know either the id or the class name of the HTML elements generated by Google Scholar.
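To give an idea, here is a stripped-down sketch of that crawl loop with requests + BeautifulSoup; the URL pattern, the delay, and the gsc_* ids/class names are assumptions you would need to verify against the HTML Google Scholar actually serves:

    # Sketch: fetch a few Scholar profiles with a polite delay between requests.
    # Assumes each input line is already a Scholar profile id; the URL pattern
    # and the element ids/class names below are assumptions - check the real HTML.
    import time
    import requests
    from bs4 import BeautifulSoup

    DELAY_SECONDS = 30  # delay between requests to avoid hammering the portal

    def fetch_profile(user_id):
        url = f"https://scholar.google.com/citations?user={user_id}"
        html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}).text
        soup = BeautifulSoup(html, "html.parser")
        name = soup.find(id="gsc_prf_in")        # assumed id of the name element
        stats = soup.select("td.gsc_rsb_std")    # assumed class of citation/h-index cells
        return (name.get_text() if name else None, [s.get_text() for s in stats])

    with open("professors.txt") as f:
        for line in f:
            print(fetch_profile(line.strip()))
            time.sleep(DELAY_SECONDS)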
good luck!