Let's say you have a proprietary Python + Selenium script that needs to run daily. If you host it on AWS, Google Cloud, Azure, etc., are they allowed to see your script? What is the best practice to "hide" such a script when hosted online?
Is there any way to "obfuscate" the logic, such as converting the Python script to a binary?
Can the cloud vendors access your script/source code/program/data?
I am not including government/legal subpoenas in this answer.
They own the infrastructure. They govern access. They control security.
However, in the real world there are numerous firewalls in place with auditing, logging and governance. A cloud vendor employee would risk termination and/or prison time for bypassing these controls.
Secrets (or rumors) are never secret for long and the valuation of AWS, Google, etc. would vaporize if they violated customer trust.
Therefore the answer is: yes, it is possible, but extremely unlikely. Professionally, I trust the cloud vendors the same way I trust my bank.
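As for the "convert to binary" part of the question: you can compile the script to a native extension, e.g. with Cython. A minimal sketch, where my_script.py is a hypothetical stand-in for your Selenium logic:

```python
# setup.py -- a minimal sketch; compiles my_script.py into a native extension.
# pip install cython setuptools
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="my_script",
    ext_modules=cythonize(
        "my_script.py",  # hypothetical module holding your proprietary logic
        compiler_directives={"language_level": "3"},
    ),
)
```

Build with `python setup.py build_ext --inplace` and deploy the resulting .so/.pyd plus a thin launcher instead of the source (PyInstaller's --onefile mode is a similar option). Bear in mind this only raises the bar against casual inspection: whoever controls the host can still dump memory and recover the logic, so treat it as obfuscation, not security.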
Here you can find information regarding the Google Cloud Enterprise Privacy Commitments, which describe how Google protects the privacy of Google Cloud Platform and Google Workspace customers:
You control your data. Customer data is your data, not Google's. We only process your data according to your agreement(s).
We never use your data for ads targeting. We do not process your customer data or service data to create ads profiles or improve Google Ads products.
We are transparent about data collection and use. We're committed to transparency, compliance with regulations like the GDPR, and privacy best practices.
We never sell customer data or service data. We never sell customer data or service data to third parties.
Security and privacy are primary design criteria for all of our products. Prioritizing the privacy of our customers means protecting the data you trust us with. We build the strongest security technologies into our products.
Therefore I believe it is extremely unlikely that Google will inspect the content of your scripts.
I'm new to the world of Android development. I'm self-taught in Python and still learning. I've spent basically 10-14 hours a day, every day, for 7 months developing an app (because I'm addicted to programming now, hehe).
Anyway, my app is functional but not yet on the Play Store, though I'm looking into it. As a result, it got me thinking about another rabbit hole to go down...
The app collects location data, phone numbers, and some other sensitive data. That data gets stored in the Firebase Realtime Database as JSON. Because Firebase is Google-owned, uses HTTPS, and requires authorization for access, would the JSON of sensitive info I'm storing be classed as "secure" on Firebase in the eyes of proper developers, Google, etc., or do I have to learn about Python encryption and such as well?
No.
It's not necessarily secure just because of where it's hosted.
Security is a multi-layered discipline that involves awareness of threat vectors, sensitivity to the threat environment and consideration of the data stored, not to mention legal requirements and financial risks!
Multi-layered Security
To use a concrete example, imagine an extremely "secure" vault with 15-inch thick steel walls. Inside is a priceless treasure trove. However, for the sake of convenience, the owner of the vault has left the key taped to the front door and for the sake of cost has not hired security guards or paid for cameras.
While the vault may be impressive, the way in which it is used makes it an easy target for anyone who wants to break in and steal the contents.
Your Firebase database may be physically secure (since it lives inside a Google data center somewhere), but your app holds the keys to the database. If your app is easy to hack into, then the security of your database is compromised.
When we say that security is "multi-layered", it means that you shouldn't be overly-reliant on any one layer of security. Perhaps your database has a password. But if the password is compromised, then all of that data is now compromised. Likewise, if your data is encrypted, but the encryption key is compromised, then all of your data is compromised. But if your database requires a password AND the data is encrypted, then an attacker would need both the password and the encryption key. Having one would not be enough. This is an example of multi-layered security.
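To make the encryption layer concrete, here is a minimal sketch in Python using the cryptography package's Fernet recipe. The data is made up, and in practice the key must live somewhere other than the database it protects (a secrets manager, for example), or you are back to one layer:

```python
# pip install cryptography
from cryptography.fernet import Fernet

# Generate once and store OUTSIDE the database it protects.
# If this key leaks, the encryption layer is gone -- hence multi-layering.
key = Fernet.generate_key()
fernet = Fernet(key)

# Encrypt a sensitive field before it is written to the database...
token = fernet.encrypt(b"+1-555-0100")  # hypothetical phone number

# ...and decrypt only after the caller has already cleared the other layers.
phone = fernet.decrypt(token).decode()
```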
Security Doesn't Stop at Your Database
Unfortunately, the need to access data requires, by definition, a breach of your security walls around the database. Again, to use a concrete example, this is like the classic movie trope of a laundry truck entering a maximum security prison. All the barbed wire and guards may be undone by the perfectly ordinary and expected laundry truck driving out the front gate. So in addition to database security, you need to consider app security.
For example, how easy is it for a user to spoof another user in your app? It doesn't matter if your database has many layers of security if an attacker can just use your app to access data for any and all users. (For the sake of this conversation, your "app" includes the service endpoints which your locally installed Android app uses to communicate with the server, which can be easily sniffed out by even an amateur hacker.)
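One common server-side mitigation is to verify the caller's Firebase ID token before touching any data, so a spoofed request never reaches the database layer. A minimal sketch using the firebase_admin SDK (the service-account path is a placeholder):

```python
# pip install firebase-admin
import firebase_admin
from firebase_admin import auth, credentials

# Placeholder path to a service-account key file.
cred = credentials.Certificate("service-account.json")
firebase_admin.initialize_app(cred)

def authenticated_uid(id_token: str) -> str:
    """Return the verified uid of the caller, or raise if the token
    is forged, expired, or revoked."""
    decoded = auth.verify_id_token(id_token)
    return decoded["uid"]
```

On the database side, Firebase security rules can complement this by restricting each uid to its own subtree.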
No One-Size-Fits-All Advice
Security is a non-trivial topic, so it's not possible to give you a complete recipe for securing your app and database. The best advice I can give is to be very thoughtful about what you choose to store. If you are going to have a central database, then assume it will be breached and all the contents leaked. How bad will that be for you? If it would expose you to legal or financial risk, then it may be cost-effective to hire a professional who can help you provide the necessary security for your app. Note that privacy laws are very complicated and vary widely across jurisdictions, so if you are going to store sensitive user data you might need to consult a lawyer.
Here's a quick handful of laws you may need to consider when storing sensitive user data, but there are many, many more:
USA's COPPA (Children's Online Privacy Protection Act)
The EU's GDPR (General Data Protection Regulation)
California's CCPA (California Consumer Privacy Act)
However, it sounds like this is a hobby. If so, then consider alternatives to a central database (which can get very expensive if your app goes viral!). Maybe use a local database so that all data is stored on the user's personal device (and, perhaps, provide an easy way to export/import that data). Some users (myself included!) would actually find that to be a valuable feature! Or consider a hybrid model, as sketched below, where sensitive information is stored locally and general, non-personally-identifiable (non-PII) information is stored centrally (so you can run usage reports, etc.).
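To illustrate the hybrid split (purely a sketch: the schema is made up, and on an actual Android device the local store would be SQLite/Room rather than Python, but the split is the point):

```python
import sqlite3

# Local, on-device store: sensitive fields never leave the phone.
local = sqlite3.connect("on_device.db")
local.execute(
    "CREATE TABLE IF NOT EXISTS contacts (name TEXT, phone TEXT, lat REAL, lon REAL)"
)
local.execute(
    "INSERT INTO contacts VALUES (?, ?, ?, ?)",
    ("Alice", "+1-555-0100", 51.5072, -0.1276),
)
local.commit()

def build_central_payload(conn: sqlite3.Connection) -> dict:
    """Only non-PII aggregates are sent to the central database."""
    (count,) = conn.execute("SELECT COUNT(*) FROM contacts").fetchone()
    return {"contact_count": count, "app_version": "1.0"}  # no names, numbers, coords

print(build_central_payload(local))
```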
Security is a balancing act between accessibility and secrecy, so there is not going to be any one-size-fits-all advice.
Learn More
Firebase: Privacy and Security in Firebase
Android: Security tips
Oracle: What is Data Security?
FTC: Mobile Health App Developers: FTC Best Practices
Note that the FTC's #1 Tip is:
1. Minimize Data.
Do you need to collect and retain people’s information? Remember, if you don’t collect data in the first place, you don’t have to go to the effort of securing it. If the data you collect is integral to your product or service, that’s fine, but take reasonable steps to secure the data you transmit and store, and delete it once you no longer have a legitimate business need to retain it. If you collect and retain it, you must protect it.
Can you keep the data in a de-identified form? When data is de-identified, it can’t be reasonably associated with a particular individual. A key to effective de-identification is to ensure that the data cannot be reasonably re-identified. For example, U.S. Department of Health and Human Services regulations require entities covered by the Health Insurance Portability and Accountability Act (HIPAA) either to remove specific identifiers, including date of birth and five-digit zip code, from protected health information or to have a privacy and data security expert determine that the risk of re-identification is “very small.” Appropriately de-identified data can protect people’s privacy while still allowing for beneficial use. For example, if your app collects geolocation information as part of an effort to map asthma outbreaks in a metropolitan area, consider whether you can provide the same functionality while maintaining and using that information in de-identified form. You can reduce the risk of re-identification of location data by not collecting highly specific location data about individual users in the first place, by limiting the number of locations stored for each user, or aggregating location data across users.
Since re-identification is always a risk, it’s important to keep up with technological developments. Publicly commit not to re-identify the data. And make sure your contracts with third parties require them to commit not to re-identify the data. Then monitor the third parties to make sure they live up to their promises.
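As a toy illustration of that location advice (the rounding precision is an arbitrary assumption, not a guarantee of anonymity):

```python
from collections import Counter

def coarsen(lat: float, lon: float, places: int = 2) -> tuple:
    """Round coordinates to roughly 1 km cells instead of storing exact positions."""
    return (round(lat, places), round(lon, places))

# Aggregate across users rather than keeping per-user location histories.
reports = [(40.7128, -74.0060), (40.7131, -74.0059), (40.7580, -73.9855)]
cell_counts = Counter(coarsen(lat, lon) for lat, lon in reports)
print(cell_counts)  # the first two reports collapse into the same coarse cell
```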
I want to know if there are ways to get Bloomberg data into Python. I see we can connect through the blpapi/pdblp packages.
I also wanted to check the pricing for this. I'd appreciate it if anyone could help me here.
Bloomberg has a number of products which support the real-time API known as the BLP API. This is a microservice-based API: there are microservices for streaming market data (//blp/mktdata), requesting static reference data (//blp/refdata), contributing OTC pricing (//firm/c-gdco), submitting orders (//blp/emsx), and so on. The API supports a number of languages including Python, Perl, C++, and .NET. The API pattern requires setting up a session where you target/connect to a delivery point. There are several flavours of delivery point depending on which Bloomberg products you buy: for the Bloomberg (Professional) Terminal there is the Desktop API (DAPI); there are also the Server API (SAPI), B-PIPE, and EMSX. They all present delivery points, and they all support the same BLP API.
The Bloomberg Terminal's delivery point is localhost:8194. No Bloomberg Terminal, no localhost delivery point. However, maybe your organisation has bought an Enterprise B-PIPE product, in which case you don't need a Bloomberg Terminal, and the delivery point will sit on at least two servers (IPs), again on port 8194.
So, bottom line: the API library is available and you can develop against it. The problem is that the first few lines, creating a session object and connecting to the endpoint, will fail unless you have a Bloomberg product. There's no sandbox, sadly.
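For reference, those first few lines look roughly like this, a sketch against the Desktop API delivery point, which fails exactly as described if no Terminal (or B-PIPE) is present:

```python
# pip install blpapi  (distributed via Bloomberg's own package index)
import blpapi

options = blpapi.SessionOptions()
options.setServerHost("localhost")  # Desktop API delivery point
options.setServerPort(8194)

session = blpapi.Session(options)
if not session.start():  # this is where it fails with no Bloomberg product
    raise RuntimeError("No delivery point -- is a Terminal or B-PIPE available?")

if not session.openService("//blp/refdata"):
    raise RuntimeError("Could not open //blp/refdata")
```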
Pricing depends on product, and unfortunately you'll also need to consider your application use-case. As an example, if you're writing a systematic trading application, then the licensing of the Bloomberg (Professional) Terminal will not permit that, however, a B-PIPE will include a licence that will permit that (plus hefty exchange fees if not OTC).
Good luck.
Hi there. I am moving forward with using Google Cloud services to store Django media files. But one thing that stops me is the Google and Amazon free tiers. I have read the Google Cloud docs, but I am confused about many things. For the free tier, new customers also get $300 in free credits to run, test, and deploy workloads. What I want to know is whether they are going to automatically charge me for using Cloud Storage after the 3-month trial is over, since I will have entered my bank details. The same question applies to AWS buckets, which allow you to store media files for 1 year; what happens after that? Are they going to auto-charge me?
I have never used Google Cloud before. For the AWS free tier, you can use the storage with the limited features they allow free-tier accounts. Regarding charges, you can definitely set up a CloudWatch alert in AWS which will notify you if your usage goes beyond the free-tier limit or you are about to be charged. So you can set that up and be assured you won't get a surprise bill before being alerted.
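A minimal sketch of such an alarm with boto3 (billing metrics are only published in us-east-1, the SNS topic ARN is a made-up placeholder, and you need to enable "Receive Billing Alerts" in the account preferences first):

```python
# pip install boto3
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="free-tier-overrun",
    Namespace="AWS/Billing",
    MetricName="EstimatedCharges",
    Dimensions=[{"Name": "Currency", "Value": "USD"}],
    Statistic="Maximum",
    Period=21600,            # evaluated every 6 hours
    EvaluationPeriods=1,
    Threshold=1.0,           # alarm once estimated charges pass $1
    ComparisonOperator="GreaterThanThreshold",
    # Hypothetical SNS topic that emails you; create it separately.
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:billing-alerts"],
)
```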
Hope this helps. Good luck with your free tier experience.
I'm working on my final-semester project, which is based on real-time transit. I want to develop a website (using Python + the Django framework) that shows the real-time geolocation of all our university buses (Google Maps API, or please suggest a better alternative) to university students and professors, and also shows bus arrival times for the bus stop nearest to them (from wherever they are). This is the main module of my project. So my questions are:
May I use Google transit for that? I ask because of the statement
"If you provide a transportation service that is open to the public,
and operates with fixed schedules and routes, we welcome your
participation"
in Google's documentation on "Why Google Maps?".
In this case my website would only be accessible to people at our university, so I think it may violate Google's policy. And a second question:
If I implement this, will it be accessible from anywhere other than my website? If it is, then it will be meaningless for those who are not part of our university, and so may cause nasty situations.
About Q1: (May I use Google transit for that? ...)
You might be able to use Google transit via the Google Maps Directions API (https://developers.google.com/maps/documentation/directions/).
But there are several limitations: the transportation agency must be in a supported city (http://www.google.com/landing/transit/cities/index.html) and must be a public transportation agency (https://developers.google.com/transit/gtfs-realtime/).
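To get a feel for what a transit routing call looks like, here is a sketch against the Directions API web service (the API key and the stop names are placeholders):

```python
# pip install requests
import requests

resp = requests.get(
    "https://maps.googleapis.com/maps/api/directions/json",
    params={
        "origin": "University Main Gate",       # placeholder stop
        "destination": "Engineering Building",  # placeholder stop
        "mode": "transit",
        "key": "YOUR_API_KEY",                  # placeholder
    },
    timeout=10,
)
data = resp.json()
if data["status"] == "OK":
    leg = data["routes"][0]["legs"][0]
    print("Arrival:", leg["arrival_time"]["text"])
```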
About Q2:(If I implement this is it accessible ...)
Other people may be able to access your transit feed info.
But there is a Quality Assurance Review (https://support.google.com/transitpartners/answer/1106422?hl=en&ref_topic=1095593) of the info before you launch it.
I want to compare different information (citations, h-index, etc.) about professors in a specific field at different institutions all over the world using data mining and analysis techniques. But I have no idea how to extract these data for hundreds (or even thousands) of professors, since Google does not provide an official API for it. So I am wondering: are there any other ways to do that?
This Google Code tool will calculate an individual's h-index. If you do this on demand for a limited number of people in a particular field, you will not break the terms of use. The terms don't specifically refer to limits on access, but they do refer to disruption of service (e.g., bulk requests may potentially cause this). The support questions state:
I wrote a program to download lots of search results, but you blocked my computer from accessing Google Scholar. Can you raise the limit?
Err, no, please respect our robots.txt when you access Google Scholar using automated software. As the wearers of crawler's shoes and webmaster's hat, we cannot recommend adherence to web standards highly enough.
Web of Science does have an API available and a collaboration agreement with Google Scholar, but the Web of Science API is only available to certain individuals.
A solution could be to request the user's Web of Science credentials (or use your own) to return the information on demand, perhaps for the top researchers in the field, and then store it as you planned. Google Scholar only updates a few times per week, so this would not be excessive use.
The other option is to request permission from Google, which is mentioned in the terms of use, although it seems unlikely to be granted.
I've done a project on exactly this.
You provide the script with an input text file containing the names of the professors you'd like to retrieve information about, and the script crawls Google Scholar and collects the info you are interested in.
The project also provides functionality for automatically downloading the profile pictures of the researchers/professors.
To respect the constraints imposed by the portal, you can set a delay between requests. If you have more than 1,000 profiles to crawl it might take a while, but it works.
A concurrency-enabled script has also been implemented, and it runs much faster than the basic sequential approach.
Note: in order to specify the information you need, you have to know either the id or the class name of the HTML elements generated by Google Scholar, as in the sketch below.
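For instance, a stripped-down sketch of that per-request pattern (the element id is the kind of generated name you would have to inspect for yourself; treat it, the URL, and the delay as assumptions, and mind the robots.txt caveat from the other answer):

```python
# pip install requests beautifulsoup4
import time
import requests
from bs4 import BeautifulSoup

def fetch_profile_name(profile_url: str) -> str:
    """Fetch one Scholar profile page and pull a field out by its element id."""
    html = requests.get(profile_url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    node = soup.find(id="gsc_prf_in")  # hypothetical generated id for the name field
    return node.get_text(strip=True) if node else ""

for url in ["https://scholar.google.com/citations?user=EXAMPLE_ID"]:  # placeholder
    print(fetch_profile_name(url))
    time.sleep(5)  # the per-request delay the answer recommends
```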
Good luck!