Can you use secret tests in GitHub Classroom autograding? - python

For a new machine learning course we're looking to design a series of coding assignments in which students get some starter code, and make improvements until the unit tests pass. Then they commit and push their code back to the remote where an autograding workflow runs more tests to see if they did adequate work.
What we'd like to do is give the students some tests that they can look into, to see what the general programming goal is, but also have a secret unit test that tries their code on data the students have never seen. On this unseen test data they'd have to reach at least a certain accuracy score to get a passing grade.
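For concreteness, the kind of hidden check we have in mind would look roughly like the pytest sketch below; the module name student_model, the function train_and_predict, the hidden_data.json file and the 0.8 threshold are all made-up placeholders for illustration.

# test_hidden.py -- would live only in the grading workflow, not in the starter repo
import json

from student_model import train_and_predict  # hypothetical function the students implement

def test_hidden_accuracy():
    # data the students never see
    with open("hidden_data.json") as f:
        data = json.load(f)
    predictions = train_and_predict(data["X_train"], data["y_train"], data["X_test"])
    correct = sum(p == y for p, y in zip(predictions, data["y_test"]))
    accuracy = correct / len(data["y_test"])
    assert accuracy >= 0.8, "accuracy %.2f is below the passing threshold" % accuracy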
The question is: can this be done in GitHub Classroom? The default setup seems to be to give all the tests openly in the starter code repository. But we want some tests the students can't see, so that we can tell whether they're only writing narrowly to the visible tests or actually writing a properly generic solution.
If this isn't directly possible, is there a workaround strategy?

No idea if this could or would work, but maybe try the top answer from here:
"
GitHub Pages (like Bitbucket Pages and GitLab Pages) only serve static pages, so the only solution is something client-side (JavaScript).
A solution could be, instead of using real authentication, just to share a secret (password) with all the authorized persons and implement one of the following schemes:
Put all the private files in a (not listed) subdirectory and name it with the hash of the chosen password. The index page asks you (with JavaScript) for the password and builds the correct start link by calculating the hash.
See for example: https://github.com/matteobrusa/Password-protection-for-static-pages
PRO: very simple approach protecting a whole subdirectory tree
CONS:
possible attack: sniffing the following requests to obtain the name of the subdirectory
the admins on the hosting site have access to the full contents
Encrypt the page with a password and decrypt it on the fly with JavaScript.
See for example: https://github.com/robinmoisson/staticrypt
PRO: no plaintext page code lying around (decrypting happens on the client side)
CONS:
covers just a single page, and you need to re-enter the password on every refresh
an admin could change your JavaScript code to obtain the password when you enter it"

Related

Lettuce BDD - Values in scenario defined in config file

I'm pretty new to BDD and Lettuce and I came across an issue which I'm not sure how best to handle.
I want to create a Lettuce test suite which I can then run against different environments, where some parameters in the scenario would be different for each environment.
So following the Lettuce documentation I have this example scenario:
Scenario: Create correct config
Given I have IP "127.0.0.0:8000"
And I specify username "myuser" and password "mypassword"
When I connect to the server
Then I get return code 200
In this case I would have to change the IP, user, and password for each environment. But this is not practical; I want to have a config file for each environment that contains the values for these parameters.
I found out about terrain.py and saw that you can set variables in this file which you can access from your steps.py using world.
So it would be possible to re-word the scenario like this:
Scenario: Create correct config
Given I have a correct IP
And I specify correct credentials
When I connect to the server
Then I get return code 200
Now in the step definition for "I have a correct IP" you can use world.correctIP, which will be defined in terrain.py.
This would work the way I need it to, but I'm not convinced this is the correct way to do it, or that terrain.py was intended to be used like this... or is there a different way to handle this situation?
I would say that hiding the implementation details is a good approach. That is, I have a correct IP is a better way to go than spelling the IP out in the scenario; keeping that detail in a config file is fine.
BDD is all about communication. If it is enough to know that you use the correct IP, then there is no need to know which IP when you read the example.
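For what it's worth, a minimal sketch of the terrain.py approach could look like the following; the environment.json file name, its keys, and the use of requests are illustrative assumptions, not part of Lettuce itself.

# terrain.py -- Lettuce loads this automatically before the features run
import json
from lettuce import world

with open('environment.json') as f:   # one such file per environment
    config = json.load(f)

world.correct_ip = config['ip']
world.username = config['username']
world.password = config['password']

# steps.py
import requests
from lettuce import step, world

@step(u'I have a correct IP')
def have_correct_ip(step):
    world.base_url = 'http://%s' % world.correct_ip

@step(u'I specify correct credentials')
def specify_credentials(step):
    world.auth = (world.username, world.password)

@step(u'I connect to the server')
def connect(step):
    world.response = requests.get(world.base_url, auth=world.auth)

@step(r'I get return code (\d+)')
def check_return_code(step, expected):
    assert world.response.status_code == int(expected)

Whether the values come from JSON, an INI file, or environment variables is then a detail hidden behind the scenario wording, which is exactly the point.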

How to load dummy/fake/mock emails and email folders to a shared IMAP account to test MUA configs?

Summary
How do I create a set of on-demand, mock/dummy/fake emails in numerous
IMAP folders? Email content needs to be share-able in a public forum via
a commonly-accessed IMAP-server account (typically for testers/developers
trying to debug MUA problems/configurations), with no privacy
risks.
I haven't yet found a solution to do this. Unless I can find something
(I'm totally looking for suggestions), I'm relegated to writing my own
software client. If so, I'd do it in Python, and I'm looking for general
pointers on which tools/libraries/methods/approaches I should employ to
most-quickly get a first, working prototype.
How should I solve the above, given the context below?
Purpose
I want to test various MUA deployments, sharing the same IMAP
account between many users/testers/developers (of any MUA) in a
public arena. Example: I might ask on a Notmuch email list:
"Why is my mbsync/Notmuch
config not working? Here's a shared gmail.com account we can
collectively use as a common IMAP server to minimize server-side
variables and thus collectively help debug stuff."
IMAP-Client Requirements
The IMAP-client program:
must be able to create a variable number of nested IMAP folders with any
number of emails,
must prove all email content and folder names are share-able,
with no privacy concerns (any reasonable content will do,
and it doesn't have to make sense; e.g., variants of Lorem
ipsum might work, or the user/caller could provide input
for 3 or more example emails), so long as the emails can
be opened and read by MUAs, and their attachments are "real"
enough to be opened by the attachment-file's
corresponding application,
will include some number of emails with 1 or more attachments,
will optimally (but not required for initial versions) be capable
of generating GBs of content by creating many thousands of emails in
hundreds of nested IMAP folders. The client can leverage many or large
file attachments to help do this.
must be able to do all of the above on-demand, given any new/fresh
IMAP-server account credentials.
As an implementation shortcut, it's ok for the client to duplicate
much/most of the email content, so long as there's significant variance
in Date:, To:, From:, and Subject: headers and email-folder names (all
of which are presumably easy to "randomize").
More Details
I've pondered trying to non-private-ize existing emails/folders from
IMAP accounts I already have (that serve the above requirements), but
that work appears way too hard. Too much personal, sensitive information
would need to be "converted"/"private-ized." However, I'd like to hear
options for ways to easily privatize (scramble, encrypt, something?)
this existing email content. Such a path might save me having to write
the software.
The only way I see to solve this properly: leverage an IMAP client
program (again, I'm presumably writing it) that can create emails and
email folders on any designated IMAP server/account. Program input can
include example (presumably private) email content, number of folders
and nesting levels, randomness, date ranges (of emails), etc.
I've not yet found anything that does this.
GreenMail appears to set up
the IMAP server, but not the IMAP server content--unless I'm overlooking
something?
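If I do end up writing this myself, I'm picturing a first prototype roughly along these lines, using only the standard library's imaplib and email modules (Python 3); the host, credentials, folder names, and counts below are placeholders, and real code would need the server's actual hierarchy delimiter plus error handling:

import imaplib
import random
import time
from email.message import EmailMessage
from email.utils import formatdate

HOST = 'imap.example.com'                       # placeholder account
USER = 'shared-test-account@example.com'
PASSWORD = 'app-specific-password'
LOREM = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. '

def make_message(folder, n):
    msg = EmailMessage()
    msg['From'] = 'sender%d@example.com' % random.randint(1, 50)
    msg['To'] = USER
    msg['Subject'] = 'Test message %d for %s' % (n, folder)
    # spread the dates over roughly the last year
    msg['Date'] = formatdate(time.time() - random.randint(0, 365 * 24 * 3600))
    msg.set_content(LOREM * random.randint(1, 20))
    if n % 5 == 0:                              # every fifth mail gets an attachment
        msg.add_attachment(b'fake attachment payload', maintype='application',
                           subtype='octet-stream', filename='blob%d.bin' % n)
    return msg

imap = imaplib.IMAP4_SSL(HOST)
imap.login(USER, PASSWORD)
try:
    for folder in ('Archive', 'Archive/2023', 'Projects/Alpha'):  # '/' delimiter is server-dependent
        imap.create(folder)
        for n in range(25):                     # 25 dummy mails per folder
            imap.append(folder, '', imaplib.Time2Internaldate(time.time()),
                        make_message(folder, n).as_bytes())
finally:
    imap.logout()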

How to encrypt and decrypt passwords for selenium testing?

The context is testing of a web app with Selenium while using a number of virtual user accounts we created for this very purpose. And so the testing process needs to access our sites and log on with the virtual user's ID and password.
None of these accounts are critical and they are flagged as testing accounts so no damage can be done. Still, it would probably be a good idea to encrypt the passwords and decrypt them prior to use.
If it matters, our test app is written in Python and Django and uses PostgreSQL for the database. It runs on a small Linode instance.
What might best practices be for something like this?
EDIT 1
The other thought I had was to store the credentials on a second machine and access them through an API, while only allowing that access to happen from a known server's non-public IP. In other words, get two instances at Linode and create a private machine-to-machine connection within the data center.
In this scenario, access to the first machine would allow someone to potentially make requests to the second machine if they are able to de-obfuscate the API code. If someone really wants the data they can certainly get it.
We could add two factor authentication as a way to gate the tests. In other words, even if you had our unencrypted test_users table you couldn't do anything with them because of the 2FA mechanism in place just for these users.
Being that this is for testing purposes only, I am starting to think the best solution might very well be to populate the test_users table with valid passwords only while running a test. We could keep the data safe elsewhere and have a script that uploads the data to the test server when we want to run a test suite. Someone with access to this table could not do anything with it because all the passwords would be invalid. In fact, we could probably use this fact to detect such a breach.
I just hate the idea of storing unencrypted passwords even if it is for test users that can't really do any damage to the actual app (their transactions being virtual).
EDIT 2
An improvement to that would be to go ahead and encrypt the data and keep it in the test server. However, every time the tests are run the system would reach out to us for the crypto key. And, perhaps, after the test is run the data is re-encrypted with a new key. A little convoluted but it would allow for encrypted passwords (and even user id's, just to make it harder) on the test server. The all-important key would be nowhere near the server and it would self-destruct after each use.
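Roughly what I have in mind, sketched with the cryptography package's Fernet; the fetch_key_from_us call is the placeholder piece that would reach out to a box we control:

from cryptography.fernet import Fernet

def fetch_key_from_us():
    # placeholder: in practice an HTTPS call to a machine we control,
    # answered only for the test server's non-public IP
    raise NotImplementedError

def decrypt_credentials(encrypted_rows):
    f = Fernet(fetch_key_from_us())
    return [(f.decrypt(user).decode(), f.decrypt(password).decode())
            for user, password in encrypted_rows]

def reencrypt_credentials(plain_rows):
    new_key = Fernet.generate_key()          # rotate the key after every run
    f = Fernet(new_key)
    rows = [(f.encrypt(u.encode()), f.encrypt(p.encode())) for u, p in plain_rows]
    return new_key, rows                     # new_key goes back to us, rows into test_users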
What is generally done in a case like this is to put the password through a cryptographic hash function, and store the hashed password.
To verify a login, hash the provided password and compare the calculated hash to the stored version.
The idea behind this is that it is considered impossible to reverse a good cryptographic hash function. So it doesn't matter if an attacker could read the hashed passwords.
Example in Python3:
In [1]: import hashlib
In [2]: hashlib.sha256('This is a test'.encode('utf8')).hexdigest()
Out[2]: 'c7be1ed902fb8dd4d48997c6452f5d7e509fbcdbe2808b16bcf4edce4c07d14e'
In [3]: hashlib.sha256('This is a tist'.encode('utf8')).hexdigest()
Out[3]: 'f80b4162fc28f1f67d1a566da60c6c5c165838a209e89f590986333d62162cba'
In [4]: hashlib.sha256('This is a tst.'.encode('utf8')).hexdigest()
Out[4]: '1133d07c24ef5f46196ff70026b68c4fa703d25a9f12405ff5384044db4e2adf'
(for Python2, just leave out the encode.)
As you can see, even one-letter changes lead to a big change in the hash value.

Fetching images from URL and saving on server and/or Table (ImageField)

I'm not seeing much documentation on this. I'm trying to get an image from a URL uploaded onto the server. Ideally I'd like to keep things simple, but I'm in two minds as to whether using an ImageField is the best way or whether it's simpler to store the file on the server and display it as a static file. I'm not uploading any files, so I need to fetch them in. Can anyone suggest any decent code examples before I try and re-invent the wheel?
Given a URL, say http://www.xyx.com/image.jpg, I'd like to download that image to the server and put it into a suitable location after renaming. My question is general, as I'm looking for examples of what people have already done. So far I just see examples relating to uploading images, but that doesn't apply. This should be a simple case and I'm looking for a canonical example that might help.
This is for uploading an image from the user: Django: Image Upload to the Server
So are there any examples out there that just deal with the process of fetching an image and storing it on the server and/or in an ImageField?
Well, just fetching an image and storing it into a file is straightforward:
import urllib2
# 'wb' because image data is binary
with open('/path/to/storage/' + make_a_unique_name(), 'wb') as f:
    f.write(urllib2.urlopen(your_url).read())
Then you need to configure your Web server to serve files from that directory.
But this comes with security risks.
A malicious user could come along and type a URL that points nowhere. Or that points to their own evil server, which accepts your connection but never responds. This would be a typical denial of service attack.
A naive fix could be:
urllib2.urlopen(your_url, timeout=5)
But then the adversary could build a server that accepts a connection and writes out a line every second indefinitely, never stopping. The timeout doesn’t cover that.
So a proper solution is to run a task queue, also with timeouts, and a carefully chosen number of workers, all strictly independent of your Web-facing processes.
Another kind of attack is to point your server at something private. Suppose, for the sake of example, that you have an internal admin site that is running on port 8000, and it is not accessible to the outside world, but it is accessible to your own processes. Then I could type http://localhost:8000/path/to/secret/stats.png and see all your valuable secret graphs, or even modify something. This is known as server-side request forgery or SSRF, and it’s not trivial to defend against. You can try parsing the URL and checking the hostname against a blacklist, or explicitly resolving the hostname and making sure it doesn’t point to any of your machines or networks (including 127.0.0.0/8).
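A rough version of that resolve-and-check step might look like this (Python 3 standard library only); on its own it is not a complete defense, since redirects and DNS rebinding can still get around it:

import ipaddress
import socket

def points_at_private_network(hostname):
    """Return True if the hostname resolves to a loopback/private/reserved address."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return True                                   # unresolvable: treat as suspicious
    for family, _, _, _, sockaddr in infos:
        addr = ipaddress.ip_address(sockaddr[0].split('%')[0])  # strip any IPv6 scope id
        if addr.is_loopback or addr.is_private or addr.is_link_local or addr.is_reserved:
            return True
    return False

You would call this with the hostname parsed out of the user-supplied URL and refuse to fetch anything for which it returns True.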
Then of course, there is the problem of validating that the file you receive is actually an image, not an HTML file or a Windows executable. But this is common to the upload scenario as well.

Security measures for controlling access to web-services/API

I have a webapp with some functionality that I'd like to be made accessible via an API or webservice. My problem is that I want to control where my API can be accessed from, that is, I only want the apps that I create or approve to have access to my API. The API would be a web-based REST service. My users do not login, so there is no authentication of the user. The most likely use case, and the one to work with now, is that the app will be an iOS app. The API will be coded with django/python.
Given that it is not possible to view the source-code of an iOS app (I think, correct me if I'm wrong), my initial thinking is that I could just have some secret key that is passed in as a parameter to the API. However, anyone listening in on the connection would be able to see this key and just use it from anywhere else in the world.
My next thought is that I could add a prior step. Before the app gets to use the API it must pass a challenge. On first request, my API will create a random phrase and encrypt it with some secret key (RSA?). The original, unencrypted phrase will be sent to the app, which must also encrypt the phrase with the same secret key and send back the encrypted text with its request. If the encryptions match up, the app gets access; if not, it doesn't.
My question is: Does this sound like a good methodology and, if so, are there any existing libraries out there that can do these types of things? I'll be working in python server-side and objective-c client side for now.
The easiest solution would be IP whitelisting if you expect the API consumer to be requesting from the same IP all the time.
If you want to support the ability to 'authenticate' from anywhere, then you're on the right track; it would be a lot easier to share an encryption method and then have requesting users send a request with an encrypted API consumer handle / password / request date. Your server decodes the encrypted value, checks the handle / password against a whitelist you control, and then verifies that the request date is within some valid timeframe; i.e., if the request date wasn't within the last minute, deny the request (that way, if someone intercepts the encrypted value, it's only valid for one minute). The encrypted value keeps changing because the request time is changing, so the key for authentication keeps changing.
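A toy illustration of that idea in Python, using the cryptography package's Fernet for the shared symmetric encryption (the handle, the whitelist, and the 60-second window are arbitrary choices for the sketch; the real client would be Objective-C producing the same token format with the same shared key):

import json
import time
from cryptography.fernet import Fernet

SHARED_KEY = Fernet.generate_key()   # in reality generated once and configured on both client and server
APPROVED_HANDLES = {'ios-app-v1'}

def make_token(handle):
    # client side: encrypt the consumer handle together with the request time
    payload = json.dumps({'handle': handle, 'ts': time.time()})
    return Fernet(SHARED_KEY).encrypt(payload.encode())

def check_token(token, max_age=60):
    # server side: decrypt, check the whitelist, and reject stale requests
    try:
        payload = json.loads(Fernet(SHARED_KEY).decrypt(token))
    except Exception:
        return False
    return payload['handle'] in APPROVED_HANDLES and time.time() - payload['ts'] < max_age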
That's my take anyways.
In addition to Tejs' answer, one known way is to bind the Product ID of the OS (or another unique ID of the client machine) with a specific password that is known to the user but not stored in the application, and use those to encrypt/decrypt messages. So for example, when you get the unique number of the machine from the user, you supply them with a password, such that the two complete each other to create a seed X (for RC4, for example) that is used for encryption/decryption. This seed X is known to the server as well, and it also uses it for encryption/decryption. I won't tell you this is the best way of course, but assuming you trust the end user (though not necessarily anyone who has access to their computer), it seems sufficient to me.
Also, a good Python library for cryptography is pycrypto.
On first request, my API will create a random phrase and encrypt it with some secret key (RSA?)
Read up on http://en.wikipedia.org/wiki/Digital_signature to see the whole story behind this kind of handshake.
Then read up on
http://en.wikipedia.org/wiki/Lamport_signature
And it's cousin
http://en.wikipedia.org/wiki/Hash_tree
The idea is that a signature can be used once. Compromise of the signature in your iOS code doesn't matter since it's a one-use-only key.
If you use a hash tree, you can get a number of valid signatures by building a hash tree over the iOS binary file itself. The server and the iOS app both have access to the same
file being used to generate the signatures.
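To get a feel for the one-use idea, a toy Lamport signature (nowhere near production code) can be written in Python with nothing but hashlib and os:

import hashlib
import os

def lamport_keypair():
    # secret key: 256 pairs of random 32-byte values; public key: their SHA-256 hashes
    sk = [(os.urandom(32), os.urandom(32)) for _ in range(256)]
    pk = [(hashlib.sha256(a).digest(), hashlib.sha256(b).digest()) for a, b in sk]
    return sk, pk

def sign(message, sk):
    digest = hashlib.sha256(message).digest()
    bits = [(digest[i // 8] >> (i % 8)) & 1 for i in range(256)]
    # reveal one secret from each pair, selected by the corresponding message-hash bit
    return [pair[bit] for pair, bit in zip(sk, bits)]

def verify(message, signature, pk):
    digest = hashlib.sha256(message).digest()
    bits = [(digest[i // 8] >> (i % 8)) & 1 for i in range(256)]
    return all(hashlib.sha256(s).digest() == pair[bit]
               for s, pair, bit in zip(signature, pk, bits))

Each keypair must sign only one message (every signature leaks half of the secret key), which is exactly why the hash-tree construction is needed if you want many usable signatures under one published root.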
