Please redirect me to the right place if I'm in the wrong one, but I have a presentation to showcase an application that I built. The audience loves to hear big numbers, so just to impress them I'm trying to include a number that gives an idea of how many calculations are done during the entire application demo.
I could estimate it as the number of calculations the processor is capable of per second multiplied by the time taken, but I was looking for a more direct way.
If it helps the code of my application is in Python.
Maybe one of Python's built-in profilers could help you, bearing in mind they're made for optimizing code and may be too high-level for your purpose. cProfile is one example, and it can give you the number of times functions were called.
Note: there are a lot of good, easy tutorials out there on good first steps with these complex libraries.
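For example, a minimal sketch (demo here is just a hypothetical stand-in for your application's entry point): the first line of cProfile's report is the total number of function calls, which may be exactly the kind of big number you're after.

import cProfile

def demo():
    # stand-in workload; swap in your application's entry point
    return sum(i * i for i in range(100000))

# The report starts with "N function calls in S seconds" and then
# lists per-function call counts and timings.
cProfile.run("demo()")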
If you start with a list of hundreds or perhaps thousands of separate items, and you want Python to choose one at a time at random (for creating a ciphertext), how "random" will it really be? It's highly important that there be no repeats of the same item (integers, strings) whatsoever, because of the cryptographic nature of the app. But is there some way to confidently perform random selection from dictionaries?
Thanks for the suggested duplicates, but this question is not a duplicate of the two possibilities listed. For one thing, the range of items up for selection needs to be entirely dynamic; for brevity's sake, I've limited the description of the mechanics of the app, which is intended for educational/entertainment purposes and not for saving the world ;-)
From the random module docs:
Warning: The pseudo-random generators of this module should not be used for security purposes. Use os.urandom() or SystemRandom if you require a cryptographically secure pseudo-random number generator.
If you're using Python 3.6 you can use:
from secrets import choice
choice(your_options)
According to the module documentation:
The secrets module is used for generating cryptographically strong random numbers suitable for managing data such as passwords, account authentication, security tokens, and related secrets.
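If the no-repeats requirement is the sticking point, one hedged sketch (the items list is illustrative) is to draw without replacement via random.SystemRandom, which is backed by os.urandom():

import secrets
import random

items = ["alpha", "bravo", "charlie", "delta", "echo"]

# One cryptographically strong pick:
pick = secrets.choice(items)

# Several picks with no repeats: SystemRandom uses os.urandom()
# under the hood, so sample() draws without replacement from the
# OS entropy source.
no_repeats = random.SystemRandom().sample(items, k=3)
print(pick, no_repeats)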
First off, what you're talking about is how random a human perceives a generator to be, not how random something actually is. There's a good post on how Spotify shuffles music to seem more random to humans while actually reducing entropy (or at least how they used to do it).
Guaranteeing that the same number/string is never used twice in the same message is itself a security flaw, worse than the property used to crack the Enigma during WW2 (a letter could never encrypt to itself).
Second, by "how random", you probably mean "how much entropy".
Third, the random module in Python is not cryptographically secure, as others have pointed out. Don't use it for cryptography-related code. There's os.urandom(), SystemRandom or secrets, but you should probably not use any of them, because:
Fourth, and most important, you should never roll your own crypto unless you have a degree in cryptography. Check what the state of the art is, and use that instead. Crypto SE knows their stuff, and so does Security SE.
One of the big additions in the recently released Python 3.6 is the secrets module, for generating cryptographically strong random numbers.
Project Euler
I have recently begun to solve some of the Project Euler riddles. I found the discussion forum on the site a bit frustrating (most of the discussions are closed and poorly threaded), so I have decided to publish my Python solutions on Launchpad for discussion.
The problem is that it seems quite unethical to publish these solutions, as it would let other people gain reputation without doing the programming work, which the site deeply discourages.
My Encryption problem
I want to encrypt my answers so that only those who have already solved the riddles can see my code. The logical key would be the answer to the riddle, which is always numeric.
In order to prevent brute-force attacks on my answers, I want to find an encryption algorithm that takes a significantly long time (a few seconds) to run.
Do you know any such algorithm? I would fancy a Python package, which I can attach to the code, over an external program that might have portability issues.
Thanks,
Adam
It sounds like people will have to write their own decryption utility, or assemble one from off-the-shelf components, to decrypt your posts.
PBKDF2 is a standardized algorithm for password-based key derivation, defined in PKCS #5. Basically, you can tune "iterations" parameter so that deriving the key from a password (the answer to the Euler problem) would take several seconds. The key can then be used for any common symmetric encryption algorithm, like AES-128.
This has the advantage that most crypto libraries already support PBKDF2. In fact, you might find mail clients that support password-based encryption for S/MIME messages. Then you could just post an S/MIME and people could read it with the mail client. Unfortunately, my mail client (Thunderbird) only supports public-key encryption.
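A hedged sketch of that approach, deriving the key with the standard library's hashlib.pbkdf2_hmac and encrypting with the third-party cryptography package's Fernet recipe as a stand-in for "any common symmetric algorithm" (the answer, salt, and iteration count are all illustrative):

import base64
import hashlib
from cryptography.fernet import Fernet  # pip install cryptography

answer = b"1234567"              # the numeric answer acts as the password
salt = b"euler-problem-123"      # publish one fixed salt per post

# Third argument is the salt, fourth the iteration count: tune it
# until derivation takes a few seconds on current hardware.
key = hashlib.pbkdf2_hmac("sha256", answer, salt, 2_000_000)

f = Fernet(base64.urlsafe_b64encode(key))
token = f.encrypt(b"print('solution source goes here')")
print(f.decrypt(token))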
I think Yin Zhu pegged the social aspect of it and Whirlwind the technical. Using your preferred approach of:
python decrypt.py --problem=123 --key=1234567
the key number is readily available via Google, and even without that, slamming through a million keys (a median key length of 5 decimal digits yields less than 20 bits of key) is pretty fast. If I wanted to be more clever, I could use plaintext assumptions (e.g. import, for) and vastly reduce my search space.
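To make that concrete, a back-of-envelope sketch (the published hash is hypothetical; the point is only how quickly a small keyspace falls):

import hashlib

target = hashlib.sha256(b"1234567").hexdigest()  # pretend this is published

# Exhausting every key up to 7 decimal digits takes only seconds.
for key in range(10_000_000):
    if hashlib.sha256(str(key).encode()).hexdigest() == target:
        print("found:", key)
        break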
For all the trouble, you're probably best off using something really complicated like:
>>> import codecs
>>> codecs.encode('import codecs', 'rot_13')
'vzcbeg pbqrpf'
And if you want the solution to Project Euler problem 123, you'll have to beat it out of me...
Yes, you can do this with virtually any symmetric encryption algorithm: DES or AES, for example. Just use the integer as the key, pad the key out to the length required by the encryption algorithm, and use that key to decrypt the answer.
Keep in mind that if you extend a short key, the encryption won't be very good. The strength of the encryption has a lot more to do with key length and the algorithm itself than how long it takes to run.
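For illustration only, and subject to exactly the weakness just described, a hypothetical sketch of padding a numeric answer out to a 16-byte key:

# Zero-pad the decimal answer to 16 bytes; fine as a puzzle hurdle,
# weak as real cryptography, as noted above.
answer = 1234567
key = str(answer).encode().ljust(16, b"\0")
assert len(key) == 16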
This question seems to have some examples of libraries to use with Python.
Just use Triple DES with a different key for each iteration, using the number to generate each of the 3 keys. Pad up the key length with some text, and you're good.
Triple DES was designed to increase effectiveness against brute force.
It's not the world's most secure option, but it'll keep most brute-forcers at bay.
If you encrypt your answers, those who have already solved a problem simply won't want to go to such effort to see yours, given that they already have plenty of answers to read on the answer page; those who haven't solved it can't see them at all. Your work then becomes less useful.
By the way, there are many places providing answers to Project Euler, e.g. Haskell answers, Clojure answers, F# answers. If somebody only wants the answer to a question, he/she could simply run the program. And given how popular Python is, googling "Python Euler xx" will give you plenty of blogs solving any specific problem.
The simplest approach would be to hash the answer using a secure hash function such as SHA-1, then provide the hash so users can verify their answer. If you want to make brute-forcing more difficult, iterate the hash: e.g., provide the result of n recursive applications of SHA-1, where n is some parameter you choose to make brute-forcing expensive.
If the number of possible answers is small, though, it'll be difficult to impossible to prevent someone from brute-forcing it even with an expensive hash function.
Edit: Sorry, I misread your original question. If you want to encrypt your answer, you could do that by using the resulting hash, above, as the encryption key for your answer, rather than posting the hash.
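A minimal sketch of that idea, assuming a string answer and an illustrative iteration count (the function name is hypothetical):

import hashlib

def stretched(answer: str, n: int = 1_000_000) -> bytes:
    # n recursive SHA-1 applications of the answer
    digest = answer.encode()
    for _ in range(n):
        digest = hashlib.sha1(digest).digest()
    return digest

# Publish stretched("1234567").hex() for verification, or use the
# bytes as the encryption key for the answer, as in the edit above.
print(stretched("1234567").hex())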
If you want an encryption routine that is easy to use and distribute, I recommend Paul Rubin's p3.py. It's probably on the fast side, for how secure it is, but since you seem to be in need of a hurdle to be jumped rather than a siege-resistant wall, it may be a good choice for your purposes.
You could also look into rijndael.py, which is an implementation of AES, and slower than p3.py.
I recently developed a billing application for my company with Python/Django. For a few months everything was fine, but now I am observing that performance is dropping as more and more users use the application. The application is now very critical for the finance team, and they are after me to sort out the performance issue. I have no option but to find a way to increase the performance of the billing application.
So do you know of any performance optimization techniques in Python that will really help me with this scalability issue?
We are using a MySQL database, hosted behind the Apache web server on a Linux box. What I have noticed, moreover, is that the overall application is slow, not the database transactions. For example, once the application has loaded it works fine, but navigating to another link in the application takes a whole lot of time.
And yes, we are using HTML, CSS and JavaScript.
As I said in a comment, you must start by finding out which part of your code is slow.
Nobody can help you without this information.
You can profile your code with the Python profilers, then come back to us with the results.
If it's a web app, the first suspect is generally the database. If it's a calculation-intensive GUI app, then look at the algorithms first.
But remember that performance issues can be highly unintuitive, and therefore an objective assessment is the only way to go.
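As a hedged starting point (generate_invoice stands in for one of your slow code paths), the standard library's cProfile plus pstats will show where the time actually goes:

import cProfile
import io
import pstats

def generate_invoice():
    # stand-in for a slow code path in your application
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
generate_invoice()
profiler.disable()

# Print the ten entries with the largest cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
print(stream.getvalue())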
OK, not entirely to the point, but before you go and start fixing things, make sure everyone understands the situation. It seems to me that they're putting some pressure on you to fix the "problem".
Well, first of all, when you wrote the application, did they specify performance requirements? Did they tell you that operation X must take less than Y seconds to complete? Did they specify how many concurrent users must be supported without penalty to performance? If not, then tell them to back off: that was iteration (phase, stage, whatever) one of the deployment, and the main goal was functionality and testing. Phase two is performance improvement. Let them (with your help, obviously) come up with some non-functional requirements for the performance of your system.
By doing all this, a) you'll relieve the pressure applied by the finance team (and I know they can be a real pain in the bum), b) both you and your clients will have a clear idea of what you mean by "performance", c) you'll have a baseline against which to measure your progress, and most importantly, d) you'll have some agreed time in which to implement/fix the performance issues.
P.S. That aside, look at the indexing... :)
A surprising feature of Python is that pythonic code is quite efficient... So a few general hints:
Use built-ins and standard functions whenever possible, they're already quite well optimized.
Try to use lazy generators instead of one-off temporary lists.
Use numpy for vector arithmetic.
Use psyco if running on x86 32bit.
Write performance critical loops in a lower level language (C, Pyrex, Cython, etc.).
When calling the same method of a collection of objects, get a reference to the class function and use it; it will save lookups in the objects' dictionaries (this one is a micro-optimization; not sure it's worth it).
And of course, if scalability is what matters:
Use O(n) (or better) algorithms! Otherwise your system cannot scale linearly.
Write multiprocessor aware code. At some point you'll need to throw more computing power at it, and your software must be ready to use it!
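A small sketch of two of the tips above (the names are illustrative):

# Lazy generator instead of a one-off temporary list:
total = sum(x * x for x in range(1_000_000))  # no intermediate list is built

# Hoisting a method lookup out of a hot loop (the micro-optimization
# mentioned above):
buckets = [[] for _ in range(1000)]
append = list.append          # one lookup instead of one per object
for bucket in buckets:
    append(bucket, 0)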
Before you can "fix" something you need to know what is "broken". In software development that means profiling, profiling, profiling. Did I mention profiling? Without profiling you don't know where CPU cycles and wall-clock time are going. As others have said, to get more useful answers you need to post the details of your entire stack: Python version, what you are using to store the data (MySQL, PostgreSQL, flat files, etc.), what web server interface (CGI, FastCGI, WSGI, Passenger, etc.), and how you are generating the HTML, CSS and, I assume, JavaScript. Then you can get more specific answers for those tiers.
You may be interested in this document I found some time ago.
As personal advice, be as pythonic as you can: lazy evaluation is the key, so learn to use iterators and generators.
For the type of application you are describing (a web application probably backed by a database) your performance problems are unlikely to be language specific. They are far more likely to stem from design or architecture issues, though they could be simple coding problems too.
To sort this out you need to figure out where the bottlenecks are in your application and for that you need some sort of profiler.
Once you have found your bottlenecks you will be in a much better position. You can then evaluate the problem areas for common issues, including:
Design and Architecture issues
SQL anti-patterns
Incorrect usage of your framework (perhaps relying on inappropriate defaults)
Badly structured algorithms
The specifics of any solution are going to depend on the specifics of the bottlenecks you find.
http://wiki.python.org/moin/PythonSpeed/PerformanceTips
I optimized some Python code a while back; the most surprising thing to me was how much each function call costs. If you minimize function calls or replace loops with builtins, you'll be running much faster.
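A quick, hedged illustration with the standard library's timeit (absolute numbers will vary by machine):

import timeit

loop = "t = 0\nfor i in range(10000): t += i"
builtin = "t = sum(range(10000))"

# The builtin version typically wins by a wide margin, because the
# explicit loop pays Python-level overhead on every iteration.
print("loop:   ", timeit.timeit(loop, number=1000))
print("builtin:", timeit.timeit(builtin, number=1000))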
There are some great suggestions here… so let me suggest an implementation detail. I have found the runprofileserver command from django-command-extensions very convenient for profiling my Django code.
I am not sure if this would solve the problem, but you should have a look at Psyco.
We are maintaining a web application built on Classic ASP using VBScript as the primary language. We agree that our backend (framework, if you will) is outdated and doesn't provide us with the proper tools to move forward quickly. We have pretty much embraced the webMVC pattern that is now all over the place, and cannot adopt it in a reasonable manner with the current technology. The big missing features are proper dispatching and templating with inheritance, amongst others.
Currently there are two paths being discussed:
Port the existing application to Classic ASP using JScript, which will hopefully allow us to go from there to .NET MSJscript without too much trouble, and eventually end up on the .NET platform (preferably once the MVC stuff is done; ASP.NET isn't much better than where we are now, in our opinion). This has been argued as the safer path, with less risk than the next option, albeit one that might take slightly longer.
Completely rewrite the application using some other technology; right now the leader of the pack is Python WSGI with a custom framework, ORM, and a good templating solution. There is wiggle room here for Django and other pre-built solutions. This would hopefully be the quickest path, as we would probably run a beta alongside the actual product, but it has the potential to be a big waste of time if we can't or don't get it right.
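For reference, WSGI itself is a tiny interface; a minimal sketch using only the standard library, with everything else (framework, ORM, templating) layered on top:

from wsgiref.simple_server import make_server

def app(environ, start_response):
    # The entire contract: take the request environ, send status and
    # headers through start_response, return an iterable of bytes.
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Hello from WSGI\n"]

if __name__ == "__main__":
    make_server("127.0.0.1", 8000, app).serve_forever()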
This does not mean that our logic is gone, as what we have built over the years is fairly stable, as noted just difficult to deal with. It is built on SQL Server 2005 with heavy use of stored procedures and published on IIS 6, just for a little more background.
Now, the question. Has anyone taken either of the two paths above? If so, was it successful, how could it have been better, etc. We aren't looking to deviate much from doing one of those two things, but some suggestions or other solutions would potentially be helpful.
Don't throw away your code!
It's the single worst mistake you can make (on a large codebase). See Things You Should Never Do, Part 1.
You've invested a lot of effort into that old code and worked out many bugs. Throwing it away is a classic developer mistake (and one I've made many times). It makes you feel "better", like a spring cleaning. But you don't need to buy a new apartment and all new furniture to outfit your house. You can work on one room at a time... and maybe some things just need a new paint job. This is where refactoring comes in.
For new functionality in your app, write it in C# and call it from your classic ASP. You'll be forced to be modular when you rewrite this new code. When you have time, refactor parts of your old code into C# as well, and work out the bugs as you go. Eventually, you'll have replaced your app with all new code.
You could also write your own compiler. We wrote one for our classic ASP app a long time ago to allow us to output PHP. It's called Wasabi and I think it's the reason Jeff Atwood thought Joel Spolsky went off his rocker. Actually, maybe we should just ship it, and then you could use that.
It allowed us to switch our entire codebase to .NET for the next release while only rewriting a very small portion of our source. It also caused a bunch of people to call us crazy, but writing a compiler is not that complicated, and it gave us a lot of flexibility.
Also, if this is an internal-only app, just leave it. Don't rewrite it; you are the only customer, and if the requirement is that you need to run it as Classic ASP, you can meet that requirement.
Use this as an opportunity to remove unused features! Definitely go with the new language. Call it 2.0. It will be a lot less work to rebuild the 80% of it that you really need.
Start by wiping your brain clean of the whole application. Sit down with a list of its overall goals, then decide which features are needed based on which ones are used. Then redesign it with those features in mind, and build.
(I love to delete code.)
It works out better than you'd believe.
Recently I did a large reverse-engineering job on a hideous old collection of C code. Function by function I reallocated the features that were still relevant into classes, wrote unit tests for the classes, and built up what looked like a replacement application. It had some of the original "logic flow" through the classes, and some classes were poorly designed. [Mostly this was because of a subset of the global variables that was too hard to tease apart.]
It passed unit tests at the class level and at the overall application level. The legacy source was mostly used as a kind of "specification in C" to ferret out the really obscure business rules.
Last year, I wrote a project plan for replacing 30-year-old COBOL. The customer was leaning toward Java. I prototyped the revised data model in Python using Django as part of the planning effort. I could demo the core transactions before I was done planning.
Note: it was quicker to build the model and admin interface in Django than to plan the project as a whole.
Because of the "we need to use Java" mentality, the resulting project will be larger and more expensive than finishing the Django demo. With no real value to balance that cost.
Also, I did the same basic "prototype in Django" for a VB desktop application that needed to become a web application. I built the model in Django, loaded legacy data, and was up and running in a few weeks. I used that working prototype to specify the rest of the conversion effort.
Note: I had a working Django implementation (model and admin pages only) that I used to plan the rest of the effort.
The best part about doing this kind of prototyping in Django is that you can mess around with the model, unit tests and admin pages until you get it right. Once the model's right, you can spend the rest of your time fiddling around with the user interface until everyone's happy.
Whatever you do, see if you can manage to follow a plan where you do not have to port the application in one big bang. It is tempting to throw it all away and start from scratch, but if you can manage to do it gradually, the mistakes you make will not cost so much or cause so much panic.
Half a year ago I took over a large web application (fortunately already in Python) which had some major architectural deficiencies (templates and code mixed, code duplication, you name it...).
My plan is to eventually have the system respond to WSGI, but I am not there yet. I found the best way to do it is in small steps. Over the last 6 months, code reuse has gone up and progress has accelerated.
General principles which have worked for me:
Throw away code that is unused or commented out
Throw away comments that are not useful
Define a layer hierarchy (models, business logic, view/controller logic, display logic, etc.) for your application. This does not have to be a very clear-cut architecture, but should rather help you think about the various parts of your application and categorize your code better.
If something grossly violates this hierarchy, change the offending code: move it, re-implement it elsewhere, and at the same time adjust the rest of your application to use the new code. Throw the old code away once it is no longer used.
Keep your APIs simple!
Progress can be painstakingly slow, but should be worth it.
I would not recommend JScript as that is definitely the road less traveled.
ASP.NET MVC is rapidly maturing, and I think you could begin a migration to it now, ramping up on the ASP.NET MVC framework as it is finalized.
Another option would be to use something like ASP.NET w/Subsonic or NHibernate.
Don't try to go 2.0 (more features than currently exist or are scheduled); instead, build your new platform with the intent of resolving the current issues with the code base (maintainability/speed/WTFs) and go from there.
A good place to begin if you're considering the move to Python is to rewrite your administrator interface in Django. This will help you get some of the kinks worked out in terms of getting Python up and running with IIS (or to migrate it to Apache). Speaking of which, I recommend isapi-wsgi. It's by far the easiest way to get up and running with IIS.
I agree with Michael Pryor and Joel that it's almost always a better idea to continue evolving your existing code base rather than re-writing from scratch. There are typically opportunities to just re-write or re-factor certain components for performance or flexibility.