Implementation of string functions in Python? [closed] - python

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 9 years ago.
Is there any documentation on how Python's string functions are implemented in Python?
I understand that str is a built-in type, and so its methods are implemented in C.
But isn't there Python code for it anyway? How about in PyPy? From what I've read so far, they've re-implemented a lot of the built-ins in Python itself.
Example Question: How is the split method of strings implemented? (Without writing my own implementation of it)
EDIT: I am not looking for an implementation written in C (which is the default implementation in the source code of Python/CPython).

This doesn't really answer the question posed, it's just a bit too long to be a comment.
Some quick digging through the source shows that PyPy has two implementations of split(), this high-level, readable version and this lower-level version, which appears to be the implementation of split() in RPython itself.
Neither of these implementations is equivalent to CPython's split() method (most obviously, they do not handle the special case that CPython handles when sep is not supplied). However, if you are interested in the basic algorithm rather than the details, PyPy's implementations could serve as a guide (at a quick glance, they appear to do basically the same thing as both CPython and Jython).
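To make the basic algorithm concrete, here is a minimal pure-Python sketch of splitting on an explicit separator. It is not PyPy's or CPython's code; split_sketch is a hypothetical name, and the sketch deliberately leaves out the sep-not-supplied case, just like the versions mentioned above.

```python
def split_sketch(s, sep, maxsplit=-1):
    """Illustrative split on an explicit, non-empty separator."""
    if not sep:
        raise ValueError("empty separator")
    parts = []
    start = 0
    # A negative maxsplit never reaches 0, so it means "no limit".
    while maxsplit != 0:
        idx = s.find(sep, start)
        if idx < 0:
            break
        parts.append(s[start:idx])
        start = idx + len(sep)
        maxsplit -= 1
    # Whatever is left after the last separator is the final piece.
    parts.append(s[start:])
    return parts


# The special case the sketch does not cover: with no sep, str.split()
# splits on runs of whitespace and drops empty strings.
no_sep = "a  b".split()          # ['a', 'b']
explicit_sep = "a  b".split(" ")  # ['a', '', 'b']
```

With an explicit separator the sketch should agree with str.split, for example split_sketch("a--b--c", "--", 1) gives ['a', 'b--c'].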
As a general resource, though, there's no reason to think that PyPy's implementation of all string functions would mirror the algorithms used in CPython -- PyPy is, after all, intended as an optimized version of Python running in a JIT, and this may have significant impact on what the most reasonable implementation of a method is (especially string functions, which frequently can be performance bottlenecks and which the implementors of an "optimized" runtime therefore have incentive to optimize).
Thinking about the more general question, there's very little incentive for the CPython developers to maintain a separate set of pure Python implementations of the low-level library already maintained in C. There is too much risk that the mirror implementations would grow stale or drift from what is actually being done, which could ultimately harm people trying to understand the inner workings of Python without reading the C code.

Related

Machine learning development environment [closed]

Closed 9 years ago.
I use Python for prototyping in machine learning but have often been frustrated with the slow interpreter. Is there a language that is good for prototyping (with enough libraries like sklearn, numpy, scipy) but is also fast and powerful?
What I am looking for is something that I can prototype in and deploy in production as well. What do people commonly use?
As far as I know, Python is as good as it gets if you want a real language with lots of libraries.
MATLAB is probably the most popular commercial solution for prototyping. It has numerous built-ins and is easy to handle. In terms of performance, MATLAB is currently king in prototyping, second only to compiled languages for production (C, Fortran, C++, ...). It's not a proper language, though, so I guess this isn't what you are looking for.
Python is pretty much as good as it gets for the sort of prototyping you describe. However, I have to ask, if you're frustrated with its speed as a numeric language: how are you writing your code? The way to do this in Python is with NumPy, a package for numerical computing in which the underlying operations on arrays (matrices) are performed by compiled C code. It does mean learning how to express your computations as matrix operations, however, so if you're not used to linear algebra/matrix manipulation it might take a bit of getting used to. It's basically a MATLAB-like environment.
My experience: if you write your Python code using a lot of loops, element-wise operations, etc., it is slow and ugly. Once you learn the equivalent NumPy/SciPy way, the speed gains are phenomenal (and what you write is much closer to the mathematical expression too).
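As a rough illustration of the difference, here is the same dot product written as an explicit loop and as a single array operation. The function names are made up for this sketch, and it assumes NumPy is installed; actual speedups depend on array size and hardware, so no timings are claimed here.

```python
import numpy as np


def dot_loop(xs, ys):
    """Element-by-element version: every step runs in the interpreter."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_vectorized(xs, ys):
    """Array version: the loop happens inside NumPy's compiled code."""
    return float(np.dot(xs, ys))


xs = np.arange(1000, dtype=np.float64)
ys = np.ones(1000, dtype=np.float64)
```

Both functions compute the same number; the second also reads more like the mathematical expression x . y.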
You can use R from within Python via RPy. This way you can call R functionality from a Python program.
Depending on what you want to do, you can also have a look at OpenCV's Python bindings for lower-level machine learning tools (SVM ...).

What makes C faster than Python? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I know this is probably a very obvious answer and that I'm exposing myself to less-than-helpful snarky comments, but I don't know the answer so here goes.
If Python compiles to bytecode at runtime, is it just that initial compiling step that takes longer? If that's the case, wouldn't that just be a small upfront cost (i.e. if the code runs over a long period of time, do the differences between C and Python diminish)?
It's not merely the fact that Python code is interpreted which makes it slower, although that definitely sets a limit to how fast you can get.
If the bytecode-centric perspective were right, then to make Python code as fast as C all you'd have to do is replace the interpreter loop with direct calls to the functions, eliminating any bytecode, and compile the resulting code. But it doesn't work like that. You don't have to take my word for it, either: you can test it for yourself. Cython converts Python code to C, but a typical Python function converted and then compiled doesn't show C-level speed. All you have to do is look at some typical C code thus produced to see why.
The real challenge is dynamic dispatch (or whatever the right jargon is -- I can't keep it all straight), by which I mean that whereas a+b can compile down to one op in C if a and b are both known to be integers or floats, in Python you have to do a lot more to compute a+b (get the objects that the names are bound to, go via __add__, etc.)
This is why to make Cython reach C speeds you have to specify the types in the critical path; this is how Shedskin makes Python code fast using (Cartesian product) type inference to get C++ out of it; and how PyPy can be fast -- the JIT can pay attention to how the code is behaving and specialize on things like types. Each approach eliminates dynamism, whether at compile time or at runtime, so that it can generate code which knows what it's doing.
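A small sketch of what that dynamism looks like from the Python side. The Meters class and the add function are hypothetical, made up for illustration: the single expression a + b has to find out at runtime which __add__ (or __radd__) applies.

```python
class Meters:
    """Toy numeric type with its own addition rules."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        return NotImplemented  # tells Python to try the other operand

    def __radd__(self, other):
        # Reached only after the left operand's __add__ gave up.
        if other == 0:
            return self
        return NotImplemented


def add(a, b):
    # One line of source, but the work done depends on the runtime types.
    return a + b


int_result = add(1, 2)                           # int addition
str_result = add("a", "b")                       # string concatenation
obj_result = add(Meters(1), Meters(2)).value     # Meters.__add__
fallback_result = add(0, Meters(5)).value        # int gives up, Meters.__radd__ runs
```

A C compiler that knows both operands are ints can emit one add instruction; nothing in add() above gives the interpreter that knowledge ahead of time.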
Bytecode is not native to the CPU, so it needs to be interpreted (by a native program called the interpreter). The advantage of bytecode is that it enables optimizations and pre-computations, and saves space. A C compiler produces machine code, which does not need interpretation because it is native to the CPU.

Suitable Language for RSA implementation [closed]

Closed 10 years ago.
I want to implement an RSA cryptosystem algorithm for a university project and I'm trying to decide which programming language to use. I am very familiar with C, so it would be a convenient choice. However, the algorithm will have to deal with very large numbers (it will include a primality subroutine), and I have heard that using Python will result in a better implementation. Is that right?
Thank you in advance.
Of course you could use any language to implement RSA, even Assembler. The question is probably not about a "better" implementation, but maybe about what's easier to grasp when looking at the resulting code a couple of weeks later.
Let's recap what you would need for an RSA implementation:
Large integer support
modular exponentiation
modular inverse
primality testing for key generation
The more support the language of your choice has for these, the cleaner and easier to understand will be the result. Lower-level languages like C(++) won't have native support for large integers, but a library like gmp will provide you with everything that's necessary. Java has the BigInteger class for that.
But still, the result will probably not be as easy to understand as an implementation in a language that has built-in big integer support, such as Python, Ruby or Haskell for example. The resulting code will pretty much look like the textbook description of the algorithms that are used. On the downside, they tend to be slower than for example the highly optimized gmp code.
But since performance is probably not what you are after at this point, I would recommend using a higher-level language. You don't have to deal with low-level maintenance and can concentrate on the task at hand; pick the one you like best or have experience in. If you want to draw on your familiarity with C, no problem: use an arbitrary-precision library such as gmp and you're good to go, too.
For the missing parts that are probably not built into the language by default, you can use the following as references:
modular exponentiation is built into Python (pow with 3 arguments); for other languages you may try the "square-and-multiply" method
the modular inverse can be computed with the Extended Euclidean Algorithm. gmp and its Python wrapper gmpy2 have these algorithms built in.
for primality testing I'd recommend Miller-Rabin, which is also not too hard to implement, but you could also find implementations, for example in PyCrypto
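The three building blocks can be sketched in plain Python as follows. This is a teaching sketch only: the function names are made up here, the key in the usage note is a toy example, and nothing in it is suitable for real cryptographic use.

```python
import secrets


def mod_pow(base, exponent, modulus):
    """Square-and-multiply; same result as pow(base, exponent, modulus)."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:                    # current low bit is set
            result = (result * base) % modulus
        base = (base * base) % modulus      # square for the next bit
        exponent >>= 1
    return result


def extended_gcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def mod_inverse(a, modulus):
    """Inverse of a modulo modulus via the Extended Euclidean Algorithm."""
    g, x, _ = extended_gcd(a % modulus, modulus)
    if g != 1:
        raise ValueError("no inverse: a and modulus are not coprime")
    return x % modulus


def is_probable_prime(n, rounds=20):
    """Miller-Rabin test; a composite slips through with tiny probability."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = secrets.randbelow(n - 3) + 2    # random witness in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                    # a proves n composite
    return True
```

With the toy key p = 61, q = 53 (n = 3233, phi = 3120) and e = 17, mod_inverse(17, 3120) gives d = 2753, mod_pow(65, 17, 3233) gives the ciphertext 2790, and mod_pow(2790, 2753, 3233) recovers 65.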
Although you probably know this already, for the sake of completeness let me warn you that such a "textbook RSA" implementation will not be secure to use in production, since a lot of things have not been addressed yet. There's RSA blinding to prevent side-channel attacks; for RSA to be secure as an encryption scheme you will also need to implement some form of padding; it's crucial to use a cryptographically secure random generator for your keys; and so on.
I don't know if Python will result in a "better" implementation, since better is rather subjective here. You can find numerical libraries for both that will allow you to deal with large numbers easily. Python has the advantage (imo) of built-in arbitrary-precision integers, and its code is generally more human-readable, which often leads to easier debugging.
Using a scripting language or any more high level language than C (e.g. C# or Java) will most likely be easier since you don't have to deal with memory management and other tasks not really related to your project.

python implemented in assembly [closed]

Closed 12 years ago.
I'm just curious, but I would like to know whether Python can be implemented in assembly and, if so, why this has not been done to help with speed issues. Forgive my naivete in matters of programming languages.
The main implementation is written in C, and that's compiled to machine code (i.e. assembly made readable for the CPU). So writing it in assembly is certainly possible, and if it's possible for a compiler, it's possible for humans, in theory. In practice, it is not even remotely practical. Not only is asm even more low-level than C (increasing development time significantly, perhaps even exponentially in the project size), it's also highly platform-specific, so each port takes a huge amount of work (and maintenance is multiplied by the number of supported platforms, quite a few in the case of CPython).
Apart from that, it's highly questionable whether this would give a notable speed bonus. Writing it closer to the metal doesn't magically make stuff go faster (the contrary can be the case: you'd be hard-pressed to find a programmer who can consistently write better assembly than the four or five well-known C compilers). And much of Python's slowness comes from the many abstractions and indirections the language consists of, not from a sloppy implementation of these.
A more promising approach (which is indeed followed by several alternative implementations) is a clever just-in-time (JIT) compiler, which preserves all the dynamism but exploits the fact that most Python programs make little use of it, by recognizing the most common paths at runtime and optimizing for these. Such complex programs are, again, not written in asm.
Native code isn't a magic make-it-go-faster operation. The language semantics really dictate quite a bit about how fast (or not) a language is. (For instance, Erlang compiled to native code via HiPE is still fairly slow.)

Do you use Python mostly for its functional or object-oriented features? [closed]

Closed 9 years ago.
I see what seems like a majority of Python developers on StackOverflow endorsing the use of concise functional tools like lambdas, maps, filters, etc., while others say their code is clearer and more maintainable by not using them. What is your preference?
Also, if you are a die-hard functional programmer or hardcore into OO, what other specific programming practices do you use that you think are best for your style?
Thanks in advance for your opinions!
I mostly write Python in object-oriented and procedural styles. Python is actually not particularly well-suited to functional programming.
A lot of people think they are writing functional Python code by using lots of lambda, map, filter, and reduce, but this is a bit over-simplified. The hallmark feature of functional programming is a lack of state or side effects. Important elements of a functional style are pure functions, recursive algorithms, and first class functions.
Here are my thoughts on functional programming and Python:
Pure functions are great. I do my best to make my module-level functions pure.
Pure functions can be tested. Since they do not depend on outside state, they are much easier to test.
Pure functions are able to support other optimizations, such as memoization and trivial parallelization.
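A minimal sketch of the memoization point, using functools.lru_cache from the standard library. The function digit_sum_of_power is a made-up example; because its result depends only on its arguments, caching cannot change its behavior.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def digit_sum_of_power(base, exponent):
    """Pure: no outside state is read or modified."""
    return sum(int(ch) for ch in str(base ** exponent))
```

Calling digit_sum_of_power(2, 15) twice computes 26 once; the second call is answered from the cache, which digit_sum_of_power.cache_info() will report as a hit.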
Class-based programming can be pure. If you want an equivalent to pure functions using Python classes (which is sometimes but not always what you want):
Make your instances immutable. In particular, this mainly means making your methods always return new instances of your class rather than changing the current one.
Use dependency injection rather than getting stuff (like imported modules) from global scope.
This might not always be exactly what you want.
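One way to sketch immutable instances is a frozen dataclass whose methods return new objects. The Account class is hypothetical and only meant to illustrate the idea; other approaches (namedtuple, __slots__ with read-only properties) would work too.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Account:
    owner: str
    balance: int = 0

    def deposit(self, amount):
        # Return a new instance instead of mutating self.
        return replace(self, balance=self.balance + amount)
```

After before = Account("ann", 10) and after = before.deposit(5), before.balance is still 10 and after.balance is 15; assigning to before.balance raises an error because the instance is frozen.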
Don't try to avoid state altogether. This isn't a reasonable strategy in Python. For example, use some_list.append(foo) rather than new_list = some_list + [foo]; the former is more idiomatic and efficient. (Indeed, a ton of the "functional" solutions I see people use in Python are algorithmically suboptimal compared to just-as-simple or simpler solutions that are not functional or are just as functional but don't use the functional-looking tools.)
Learn the best lessons from functional programming, for example that mutable state is dangerous. Ask yourself: do I really want to change this X, or do I want a new X?
One really common place this comes up is when processing a list. I would use
foo = [bar(item.baz()) for item in foo]
rather than
for index, _ in enumerate(foo):
    foo[index] = bar(foo[index].baz())
and stuff like it. This avoids confusing bugs where the same list object is stored elsewhere and shouldn't be changed. (If it should be changed, then there is a decent chance you have a design error. Mutating some list you have referenced multiple places isn't a great way to share state.)
Don't use map and friends gratuitously. There is nothing more functional about doing this.
map/filter are not more functional than list comprehensions. List comprehensions were borrowed from Haskell, a pure functional language. map and especially filter can be harder to understand than a list comprehension. I would never use map or filter with a lambda but might if I had a function that already existed; I use map a decent bit.
The same goes for itertools.imap/ifilter compared to generator expressions. (These things are somewhat lazy, which is something great we can borrow from the functional world.)
Don't use map and filter for side effects. I see this with map a lot; it makes for hard-to-understand code, creates unneeded lists, and is decidedly not functional (despite people thinking it must be because of map). Just use a for loop.
reduce is confusing except for very simple cases. Python has for loops and there is no hurt in using them.
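A short sketch of both points, with made-up data and function names. Note that it assumes Python 3, where map is lazy, so a map call used only for its side effect does nothing at all unless its result is consumed.

```python
from functools import reduce

words = ["alpha", "beta", "gamma"]

# map() used only for a side effect: in Python 3 nothing runs here,
# because the lazy map object is never consumed.
collected_by_map = []
map(collected_by_map.append, words)

# The plain loop says what it does and does it.
collected_by_loop = []
for word in words:
    collected_by_loop.append(word)


def longest_with_reduce(items):
    """Works, but the reader has to unpack the lambda."""
    return reduce(lambda best, w: w if len(w) > len(best) else best, items, "")


def longest_with_loop(items):
    """Same result with an explicit loop."""
    best = ""
    for w in items:
        if len(w) > len(best):
            best = w
    return best
```

Both longest_* functions return the first longest word; the loop version is longer but needs no explanation.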
Don't use recursive algorithms. This is one part of functional programming Python just does not support well. CPython (and I think all other Pythons) does not support tail call optimization. Use iteration instead.
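A small sketch of why, using two hypothetical functions that compute the same sum. The recursive one is written in tail-call style, but CPython still adds a stack frame per call and gives up once sys.getrecursionlimit() is exceeded.

```python
import sys


def total_recursive(n, acc=0):
    """Tail-recursive in form, but each call still uses a stack frame."""
    if n == 0:
        return acc
    return total_recursive(n - 1, acc + n)


def total_iterative(n):
    """Same computation as a loop; depth is not an issue."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


recursion_limit = sys.getrecursionlimit()
```

total_iterative(100000) is fine, while total_recursive with an argument larger than the recursion limit raises RecursionError.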
Only use lambda when you are defining functions on the fly. Anonymous functions aren't better than named functions, the latter of which are often more robust, maintainable, and documented.
I use the features of the language that get the job done with the shortest, cleanest code possible. If that means that I have to mix the two, which I do quite often, then that's what gets done.
I am both a die-hard OOP and functional programmer and these styles work very well together, mostly because they are completely orthogonal. There are plenty of object-oriented, functional languages and Python is one of them.
So basically, decomposing an application into classes is very helpful when designing a system. When you're doing the actual implementation, FP helps you write correct code.
Also, I find it very offensive that you imply that functional programming just means "use folds everywhere". That is probably the biggest and worst misconception about FP. Much has been written on that topic, so I'll just say that the great thing about FP is the idea of combining simple (correct and reusable) functions into new, more and more complex functions. That way it's pretty hard to write "almost correct" code: either the whole thing does exactly what you want, or it breaks completely.
FP in Python mostly revolves around writing generators and their relatives (list comprehensions) and the things in the itertools module. Explicit map/filter/reduce calls are just unneeded.
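A minimal sketch of that style, with made-up function names: a generator produces an endless stream, a generator expression filters it lazily, and itertools.islice takes only what is needed.

```python
from itertools import count, islice


def squares():
    """Infinite lazy stream of 1, 4, 9, 16, ..."""
    for n in count(1):
        yield n * n


def first_even_squares(how_many):
    # Nothing is computed until islice pulls values through the pipeline.
    even_squares = (s for s in squares() if s % 2 == 0)
    return list(islice(even_squares, how_many))
```

first_even_squares(5) returns [4, 16, 36, 64, 100] without ever building an intermediate list.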
Python has only marginal functional programming features, so I would be surprised if many people used it especially for that. For example, there is no standard way to do function composition, and in Python 3 reduce() was moved out of the built-ins into functools, with explicit loops recommended instead.
Also, I don't think that map() or filter() are generally endorsed. On the contrary, list comprehensions usually seem to be preferred.
Most answers on StackOverflow are short and concise, and the functional aspects of Python make writing that kind of answer easy.
Python's OO-features simply aren't needed in 10-20 line answers, so you don't see them around here as much.
I select Python when I'm taking on a problem that maps well to an OO solution. Python only provides a limited ability to program in a functional manner compared to full-blown functional languages.
If I really want functional programming, I use Lisp.
