What does abstraction mean in programming?

What does abstraction mean in programming? - python

I'm learning python and I'm not sure of understanding the following statement : "The function (including its name) can capture our mental chunking, or abstraction, of the problem."
It's the part that is in bold that I don't understand the meaning in terms of programming. The quote comes from http://www.openbookproject.net/thinkcs/python/english3e/functions.html
How to think like a computer scientist, 3 edition.
Thank you !

Abstraction is a core concept in all of computer science. Without abstraction, we would still be programming in machine code or worse not have computers in the first place. So IMHO that's a really good question.
What is abstraction
Abstracting something means to give names to things, so that the name captures the core of what a function or a whole program does.
One example is given in the book you reference, where it says
Suppose we’re working with turtles, and a common operation we need is
to draw squares. “Draw a square” is an abstraction, or a mental chunk,
of a number of smaller steps. So let’s write a function to capture the
pattern of this “building block”:
Forget about the turtles for a moment and just think of drawing a square. If I tell you to draw a square (on paper), you immediately know what to do:
draw a square => draw a rectangle with all sides of the same length.
You can do this without further questions because you know by heart what a square is, without me telling you step by step. Here, the word square is the abstraction of "draw a rectangle with all sides of the same length".
Abstractions run deep
But wait, how do you know what a rectangle is? Well, that's another abstraction for the following:
rectangle => draw two lines parallel to each other, of the same length, and then add another two parallel lines perpendicular to the other two lines, again of the same length but possibly of different length than the first two.
Of course it goes on and on - lines, parallel, perpendicular, connecting are all abstractions of well-known concepts.
Now, imagine each time you want a rectangle or a square to be drawn you have to give the full definition of a rectangle, or explain lines, parallel lines, perpendicular lines and connecting lines -- it would take far too long to do so.
The real power of abstraction
That's the first power of abstractions: they make talking and getting things done much easier.
The second power of abstractions comes from the nice property of composability: once you have defined abstractions, you can compose two or more abstractions to form a new, larger abstraction: say you are tired of drawing squares, but you really want to draw a house. Assume we have already defined the triangle, so then we can define:
house => draw a square with a triangle on top of it
Next, you want a village:
village => draw multiple houses next to each other
Oh wait, we want a city -- and we have a new concept street:
city => draw many villages close to each other, fill empty spaces with more houses, but leave room for streets
street => (some definition of street)
and so on...
How does this all apply to programmming?
If in the course of planning your program (a process known as analysis and design), you find good abstractions to the problem you are trying to solve, your programs become shorter, hence easier to write and - maybe more importantly - easier to read. The way to do this is to try and grasp the major concepts that define your problems -- as in the (simplified) example of drawing a house, this was squares and triangles, to draw a village it was houses.
In programming, we define abstractions as functions (and some other constructs like classes and modules, but let's focus on functions for now). A function essentially names a set of single statements, so a function essentially is an abstraction -- see the examples in your book for details.
The beauty of it all
In programming, abstractions can make or break productivity. That's why often times, commonly used functions are collected into libraries which can be reused by others. This means you don't have to worry about the details, you only need to understand how to use the ready-made abstractions. Obviously that should make things easier for you, so you can work faster and thus be more productive:
Example:
Imagine there is a graphics library called "nicepic" that contains pre-defined functions for all abstractions discussed above: rectangles, squares, triangles, house, village.
Say you want to create a program based on the above abstractions that paints a nice picture of a house, all you have to write is this:
import nicepic
draw_house()
So that's just two lines of code to get something much more elaborate. Isn't that just wonderful?

A great way to understand abstraction is through abstract classes.
Say we are writing a program which models a house. The house is going to have several different rooms, which we will represent as objects. We define a class for a Bathroom, Kitchen, Living Room, Dining Room, etc.
However, all of these are Rooms, and thus share several properties (# of doors/windows, square feet, etc.) BUT, a Room can never exist on it's own...it's always going to be some type of room.
It then makes sense to create an abstract class called Room, which will contain the properties all rooms share, and then have the classes of Kitchen, Living Room, etc, inherit the abstract class Room.
The concept of a room is abstract and only exists in our head, because any room that actually exists isn't just a room; it's a bedroom or a living room or a classroom.
We want our code to thus represent our "mental chunking". It makes everything a lot neater and easier to deal with.

As defined on wikipedia: Abstraction_(computer_science)
Abstraction tries to factor out details from a common pattern so that
programmers can work close to the level of human thought, leaving out
details which matter in practice, but are not exigent to the problem being
solved.
Basically it is removing the details of the problem. For example, to draw a square requires several steps, but I just want a function that draws a square.

Let's say you write a function which receives a bunch of text as parameter, then reads credentials in a config file, then connects to a SMTP server using those credentials and sends a mail using that text.
The function should be named sendMail(text), not parseTextReadCredentialsInFileConnectToSmtpThenSend(text) because it is more easy to represent what it does this way, to yourself and when presenting the API to coworkers or users... even though the 2nd name is more accurate, the first is a better abstraction.

In a simple sentence, I can say: The essence of abstraction is to extract essential properties while omitting inessential details. But why should we omit inessential details? The key motivator is preventing the risk of change.

The best way to to describe something is to use examples:
A function is nothing more than a series of commands to get stuff done. Basically you can organize a chunk of code that does a single thing. That single thing can get re-used over and over and over through your program.
Now that your function does this thing, you should name it so that it's instantly identifiable as to what it does. Once you have named it you can re-use it all over the place by simply calling it's name.
def bark():
print "woof!"
Then to use that function you can just do something like:
bark();
What happens if we wanted this to bark 4 times? Well you could write bark(); 4 times.
bark();
bark();
bark();
bark();
Or you could modify your function to accept some type of input, to change how it works.
def bark(times):
i=0
while i < times:
i = i + 1
print "woof"
Then we could just call it once:
bark(4);
When we start talking about Object Oriented Programming (OOP) abstraction means something different. You'll discover that part later :)

Abstraction: is a very important concept both in hardware and software.
Importance: We the human can not remember all the things all the times. For example, if your friend speaks 30 random numbers quickly and asks you to add them all, it won't be possible for you. Reason? You might not be able to keep all those numbers in mind. You might write those numbers on a paper even then you will be adding right most digits one by one ignoring the left digits at one time and then ignoring the right most digits at the other time having added the right most ones.
It shows that at one time we the human can focus at some particular issue while ignoring those which are already solved and moving focus towards what are left to be solved.
Ignoring less important thing and focusing the most important (for the time being and in particular context) is called Abstraction
Here is how abstraction works in programming.
Below is the world's famous hello world program in C language:
//C hello world example hello.c
#include <stdio.h>
int main()
{
printf("Hello world\n");
return 0;
}
This is the simplest and usually the first computer program a programmer writes. When you compile and run this program on command prompt, the output may appear like this:
Here are the serious questions
Computer only understands the binary code how was it able to run your English like code? You may say that you compiled the code to binary using compiler. Did you write compiler to make your program work? No. You needed not to. You installed GNU C compiler on your Linux system and just used it by giving command:
gcc -o hello hello.c
And it converted your English like C language code to binary code and you could run that code by giving command:
./hello
So, for writing an application in C program, you never need to know how C compiler converts C language code to binary code. So you used GCC compiler as an abstraction.
Did you write code for main() and printf() functions? No. Both are already defined by someone else in C language. When we run a C program, it looks for main() function as starting point of program while printf() function prints output on computer screen and is already defined in stdio.h so we have to include it in program. If both the functions were not already written, we had to write them ourselves just to print two words and computers would be the most boring machines on earth ever. Here again you use abstraction i.e. you don't need to know how printf prints text on monitor and all you need to know is how to give input to printf function so that it shows the desired output.
I did not expand the answer to abstraction of operating system, kernel, firmware and hardware for the sake of simplicity.
Things to remember:
While doing programming, you can use abstraction in a variety of ways to make your program simple and easy.
Example 1: You can use a constant to abstract value of PI 3.14159 in your program because PI is easy to remember than 3.14159 for the rest of program
Example 2: You can write a function which returns square of a given number and then anyone, including you, can use that function by giving it input as parameters and getting a return value from it.
Example 3: In an Object-oriented programming (OOP), like Java, you may define an object which encapsulates data and methods and you can use that object by invoking its methods.
Example 4: Many applications provide you API which you use to interact with that application. When you use API methods, you never need to know how they are implemented. So abstraction is there.
Through all the examples, you can realize the importance of abstraction and how it is implemented in programming. One key thing to remember is that abstraction is contextual i.e. keeps on changing as per context

Related

Which is preferred for mapping values: reusing mapping functions OR building reference tables to do lookups?

Which is preferred for mapping values: reusing mapping functions OR building reference tables to do lookups?
This is a very general high-level question, which, I believe is mostly language-independent. For the example, I will use SAS, but I also would like to know how people feel about this in R and python.
I am on a team that likes to use mapping functions to take an input and generate some output, usually in a one-to-one manner. For example, here is some modified SAS code that shows what my team typically does:
proc fcmp;
function mapFunction($ input);
output = .;
if input= "Day zero" then output= 0;
else if input = "Day one" then output = 1;
else if input = "Day two" then output = 2;
return(output);
endsub;
run;
The team would then use mapFunction in multiple other programs when mapping needs to be done.
This goes against my instinct. My instinct would be to use the mapping function once to generate a set of key:value pairs or a dictionary or two-column table and thereafter use lookup/indexing/joining/merging operations to refer to the table.
I have a hard time explaining why this is my instinct, but it is. It feels better to have a reference table instead of calling a custom function repeatedly. However, I don't trust my feelings, and I'd like to hear from others to see if I can rationalize one technique over the other.
Can someone provide arguments/counter-arguments to support one technique or the other?
Thank you!

I've done a lot of conversions, which involve re-mapping codes. Your situation may be a little different. Conversions being one-off events, rather than on-going transformations like overnight DW refreshes.
The default position is to use a table, as you suggest. It has a real benefit in that a) you can query data to check for non-existent values in the map before you convert; b) the responsibility goes back to the business to confirm the code mappings are correct.
It gets trickier where the business rules for the conversion are in flux. For example, your first cut might provide a simple map, but the someone says 'except for these codes, where you need to look up an extra value' and suddenly you need a function. At that point, you need to revisit the code base to find where the mapped table is used and update to a function. That can be a lesson learned, depending on the size of the code base.
I have used functions to wrap the use of a mapping table. If you don't know which code is going to be problematic in a month's time, this can be useful. Function just selects a value from a mapping table as you would do in a join. It will be slower to execute but if you're optimizing programmer hours rather than execution time then that works. In any case, you get the data visibility from the code map as well as flexibility for scope-creep from the function. When the scope changes, just update the function.
When you say 'multiple other programs', sounds like a red flag. If you store the data in mapping tables, would the database in which they are stored be visible to these programs? That would be a consideration, big one.

What's your evaluation criteria?
Are you looking for:
program run time
developer coding time
usability
ease of maintenance
Different people prioritize different things. The one good thing about the table method is that it's generic, simple and makes sense in almost any language. But...that does not mean it's the best for that language. How each language implements things under the hood will affect the criteria above.
In SAS, I would rarely use FCMP, but Formats are one of the fastest ways there and really have a table behind it that can be easily updated, so that would be my choice in SAS. It's slightly slower than the HASH and SET+KEY methods but it's easier to use and maintain and those end up saving more time in the long run. And I prioritize human time over computer time for 99% of my projects.
If you search on lexjansen.com you'll find many papers on the different ways to do look ups and a comparison of their run times. https://www.lexjansen.com/phuse/2007/cs/CS06.pdf

ABAQUS python scripting inconsistencies when selecting regions

This may sound more like a rant to some extent, but I also would like to have your opinion on how to deal with the inconsistencies when using python scripting in abaqus.
here my example: in my rootAssembly (ra) I have three instances called a, b, c. in the script below I assign general seed, then mesh control, and element types, finally I generate the mesh:
ra.seedPartInstance(regions=(a,b,c), size=1.0)
ra.setMeshControls(elemShape=QUAD,
regions=(a.faces+b.faces+c.faces),
technique=STRUCTURED)
ra.setElementType(
elemTypes=eltyp,
regions=(a.faces,b.faces,c.faces))
ra.generateMesh(regions=(a,b,c))
As you can see, ABAQUS requires you to define the same region in several different modes.
Even though the argument is called "regions", ABAQUS either asks for a Set, or a Vertex, or a GeomSequence.
how do you deal with this? scripting feels a lot like trial and error, as there is no way to know in advance what is expected.
any suggestions?

Yes, there is clearly "a way to know in advance what is expected" - the docs. These spell out exactly what arguments are allowed.
But seriously - I see no inconsistency in your example. In practice, the reuse of the argument regions makes complete sense when you consider the context for what each of the functions actually do. Consider how the word "region" is a useful conceptual framework that can be adapted to easily allow the user to specify the necessary info for a variety of different tasks.
Now consider the complexity of the underlying system that the Python API exposes, and the variety of tasks that different users want to control and do with that underlying system. I doubt it would be simpler if the args were named something like seq_of_geomCells_geomFaces_or_geomSets. Or even worse, if there were a different argument for each allowable model entity that the function was designed to handle - that would be a nightmare. In this respect, the reuse of the keyword regions as a logical conceptual framework makes complete sense.

ok, i read now from the documentation of the three commands used above:
seedPartInstance(...)
regions: A sequence of PartInstance objects specifying the part instances to seed.
setMeshControls(...)
regions: A sequence of Face or Cell regions specifying the regions for which to set the mesh control parameters.
setElementType(...)
regions: A sequence of Geometry regions or MeshElement objects, or a Set object containing either geometry regions or elements, specifying the regions to which element types are to be assigned.
ok, i get the difference between partInstances and faces, but still it's not extremely clear why one is appended (using commas) and the other is added (using +), since they both call for a Sequence, and at this point, how does setElementType even works when passing faces objects to it?
I will take some more time to learn ABAQUS and to think through it, hopefully i can understand truly these differences.

How do you implement experiments with conditional branching in PsychoPy Builder?

Many behavioural experimental designs in psychology/neuroscience require conditional branching (e.g. only proceed to the test phase if a requisite performance level has been reached in an initial practice phase). PsychoPy’s Builder view allows one to generate a Python script to run an experiment using largely graphical controls. But it doesn't seem to have built-in support for conditional branching.
Can skipping a particular routine on a given run be implemented in Builder by using Python snippets in a Code component? Or does it require moving to the full Python Coder environment?

The Coder view in PsychoPy gives you full access to the Python programming language and hence you can implement arbitrarily complex experimental designs.
PsychoPy’s graphical Builder view, meanwhile, emphasises ease of use and simplicity over flexibility. One thing it does not cater for directly is conditional branching. It can, however, be hacked to achieve it indirectly.
Let’s say you have a three-phase experiment: a practice block, followed by two possible experimental blocks, ConditionA or ConditionB. After completing the practice block, high-performing subjects are assigned to conditionA, while low-performing subjects are assigned to conditionB.
To implement this in Builder, create three routines to represent each of the task blocks (Practice, conditionA, and conditionB). Each will also be surrounded by a loop (practice_loop, A_loop, and B_loop, respectively.) Also insert a routine between Practice and conditionA (called, say, assignCondition).
In the assignCondition routine, place a Code component. Assume in this case that a performance score counter was maintained in the Practice routine. We can use this to change the number of repetitions of subsequent routines. That is, by setting the repetition number of a loop to zero, we ensure that the routines inside that loop will not be executed. Hence the number of repetitions of these loops will not be a fixed value, but instead a variable (say, repetitionsA and repetitionsB).
In the "Begin Routine" tab of the assignCondition routine's Code component, put a Python snippet like this:
if performanceScore > 25:
repetitionsA = 50 # run this routine 50 times
repetitionsB = 0 # don't run this condition at all
else:
repetitionsA = 0 # vice versa: don't run this
repetitionsB = 50 # do run this
A fuller description of this technique is given by Matt Wall in a blog post here (with an fMRI block design as example, in which the order of blocks needs to be variable):
http://computingforpsychologists.wordpress.com/2013/11/12/how-to-hack-conditional-branching-in-the-psychopy-builder/

Preserve code readability while optimising

I am writing a scientific program in Python and C with some complex physical simulation algorithms. After implementing algorithm, I found that there are a lot of possible optimizations to improve performance. Common ones are precalculating values, getting calculations out of cycle, replacing simple matrix algorithms with more complex and other. But there arises a problem. Unoptimized algorithm is much slower, but its logic and connection with theory look much clearer and readable. Also, it's harder to extend and modify optimized algorithm.
So, the question is - what techniques should I use to keep readability while improving performance? Now I am trying to keep both fast and clear branches and develop them in parallel, but maybe there are better methods?

Just as a general remark (I'm not too familiar with Python): I would suggest you make sure that you can easily exchange the slow parts of the 'reference implementation' with the 'optimized' parts (e.g., use something like the Strategy pattern).
This will allow you to cross-validate the results of the more sophisticated algorithms (to ensure you did not mess up the results), and will keep the overall structure of the simulation algorithm clear (separation of concerns). You can place the optimized algorithms into separate source files / folders / packages and document them separately, in as much detail as necessary.
Apart from this, try to avoid the usual traps: don't do premature optimization (check if it is actually worth it, e.g. with a profiler), and don't re-invent the wheel (look for available libraries).

Yours is a very good question that arises in almost every piece of code, however simple or complex, that's written by any programmer who wants to call himself a pro.
I try to remember and keep in mind that a reader newly come to my code has pretty much the same crude view of the problem and the same straightforward (maybe brute force) approach that I originally had. Then, as I get a deeper understanding of the problem and paths to the solution become clearer, I try to write comments that reflect that better understanding. I sometimes succeed and those comments help readers and, especially, they help me when I come back to the code six weeks later. My style is to write plenty of comments anyway and, when I don't (because: a sudden insight gets me excited; I want to see it run; my brain is fried), I almost always greatly regret it later.
It would be great if I could maintain two parallel code streams: the naïve way and the more sophisticated optimized way. But I have never succeeded in that.
To me, the bottom line is that if I can write clear, complete, succinct, accurate and up-to-date comments, that's about the best I can do.
Just one more thing that you know already: optimization usually doesn't mean shoehorning a ton of code onto one source line, perhaps by calling a function whose argument is another function whose argument is another function whose argument is yet another function. I know that some do this to avoid storing a function's value temporarily. But it does very little (usually nothing) to speed up the code and it's a bitch to follow. No news to you, I know.

It is common to assume you must give up readability to get performance.
That's not necessarily so.
You need to find out What exactly is it spending much time doing, and why?
Notice, I didn't say you need to do any measuring.
Here's an example of what I mean.
Chances are very good that you can do some simple changes to avoid waste motion, but don't fix anything until the program itself has told you what to fix.

def whatYouShouldDo(servings, integration_method=oven):
"""
Make chicken soup
"""
# Comments:
# They are important. With some syntax highlighting, the comments are
# the first thing a new programmer will look for. Therefore, they should
# motivate your algorithm, outline it, and break it up into stages.
# You can MAKE IT FEEL AS IF YOU ARE READING TEXT, interspersing code
# amongst the text.
#
# Algorithm overview:
# To make chicken soup, we will acquire chicken broth and some noodles.
# Preprocessing ingredients is done to optimize cooking time. Then we
# will output in SOUP format via stdout.
#
# BEGIN ALGORITHM
#
# Preprocessing:
# 1. Thaw chicken broth.
broth = chickenstore.deserialize()
# 2. Mix with noodles
if not noodles in cache:
with begin_transaction(locals=poulty) as t:
cache[noodles] = t.buy(noodles) # get from local store
noodles = cache[noodles]
# 3. Perform 4th-order Runge-Kutta numerical integration
import kitchensink import * # FIXME: poor form, better to from kitchensink import pan at beginning
result = boilerplate.RK4(broth `semidirect_product` noodles)
# 4. Serve hot
log.debug('attempting to serve')
return result
log.debug('server successful')
also see http://en.wikipedia.org/wiki/Literate_programming#Example
I've also heard that this is what http://en.wikipedia.org/wiki/Aspect-oriented_programming attempts to help with, though I haven't really looked into it. (It just seems to be a fancy way of saying "put your optimizations and your debugs and your other junk outside of the function you're writing".)

How to organise the structure of a 3D game?

This is a bit of a vague question but bear with me. I am in the process of writing a game using Python/Pyglet and openGL. I currently have it structured so that there is an object called 'world', in this are other objects with other objects inside them etc. I did it this way because for instance one part of the game is a platform with other objects on it, and when I tilt the platform I want the objects on it to tilt with it. So I do platform.draw() which calls glRotate, glTranslate, then draw each of the objects saving the modelview matrix inbetween, this way all the objects on the platform move together.
The first question is, is this a sensible way to organise things or should I be using some other method?
I don't have a camera class, currently I am just translating the whole world or parts of it to give the illusion of movement. However, in the future I want to be able to switch viewpoints between objects, so for instance switch from looking down onto the world from above to a 1st person view from one of the objects in the world. So the second question is what is the best way to structure my program so that this will be achievable in the future?

Not probably a full answer, but i think a full one would take a whole book...
What you are doing with your representation of the world in a hierarchy is very similar to what is usually known as "scene-graph", and it is a good idea in many cases. A good example of a library to do precisly this is Open Scene Graph.
About "translating the world", unless you are really transforming all the vertices every time, it is also perfectly fine. It is just a matter of relative point of view and what your really have is the same that a camera matrix. You can see the transformation as placing the camera in the world, or as moving the world in front of the camera.

You could put the logic into a seperate module / into seperate classes/functions.
In my 2D-Game I have a GameLogic class which simplifies registering it's methods for certain events or scheduling them (and unregistering+unscheduling them), and I created a #state_wrapper decorator which injects a simple new-style object as state-storage for that method. If you do it like that you don't have to pass the pointer to all your world objects, only the event-methods have to get access to your objects.
But I wouldn't claim that this is the best solution ;)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.