Refactoring a loop in Python - python

I know this is not good coding and I'm looking to improve it. I want to get the first name by default if no name is supplied. My hack still enters the for loop, which would be inefficient for a big list. But if I repeated in the outer if all the assignments that I'm doing in the if inside the for loop, I would duplicate code. And I wouldn't write a function just to set attributes, so I'm not sure how to proceed here.
if not name:
    name = self.data['big_list'][0]['name']
for i in range(len(self.data['big_list'])):
    if self.data['big_list'][i]['name'] == name:
        self.name = name
        self.address = self.data['big_list'][i]['address']
        ...
        return

I think there is a general bad practice or a misunderstanding of classes here. There is a class, right? You check whether your instance has the same name as one of the names in your list, and then the instance sets its own attributes.
-> Your instance seems to hold large data, containing a list of people AND the attributes for one person. That does not sound correct.
Let's say the class is called Person.
You can create an __init__() method that takes one row of your data['big_list'], instead of setting each attribute in a loop.
You might also want to create an __eq__() method that checks whether a person is the same as someone else (to check for duplicates).
Consider taking your large data (self.data) out of that class.
Could you provide us with a little more context?
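A minimal sketch of that suggestion, assuming each row of big_list is a dict with 'name' and 'address' keys (the only two fields visible in the question). The class name Person comes from the answer above; find_person and the sample rows are hypothetical and only serve as illustration.

```python
class Person:
    """One entry of big_list; the list itself lives outside the class."""

    def __init__(self, row):
        # row is assumed to be a dict such as {'name': ..., 'address': ...}
        self.name = row['name']
        self.address = row['address']

    def __eq__(self, other):
        # Two persons count as equal when name and address both match.
        if not isinstance(other, Person):
            return NotImplemented
        return (self.name, self.address) == (other.name, other.address)

    def __hash__(self):
        # Keep instances hashable, consistent with __eq__.
        return hash((self.name, self.address))


def find_person(big_list, name=None):
    """Hypothetical helper: build a Person from the first row whose name
    matches, defaulting to the first row's name when none is given."""
    if not name:
        name = big_list[0]['name']
    for row in big_list:
        if row['name'] == name:
            return Person(row)
    return None


# Made-up sample data for illustration only.
big_list = [
    {'name': 'Ann', 'address': '1 First St'},
    {'name': 'Bob', 'address': '2 Second St'},
]
default_person = find_person(big_list)   # falls back to the first row
bob = find_person(big_list, 'Bob')
```

With this split, the attribute assignments exist in exactly one place (__init__), and the default-name case no longer needs any duplicated code.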

Here are some comments (that might be insufficient because I do not really understand what you want to achieve with the program).
The purpose of the for loop seems to be to find an item of the list self.data['big_list'] that meets the condition self.data['big_list'][i]['name'] == name, get some data and then terminate. Each entry of self.data['big_list'] is a dict.
This is a good job for a technique called list comprehension, which is usually more concise than an explicit for loop and often somewhat faster per item.
The expression
filtered = [item for item in self.data['big_list'][1:] if item['name'] == name]
results in a list of dicts that are not the first one and meet the name condition. The expression self.data['big_list'][1:] is all of self.data['big_list'] except the first entry. This technique is called list slicing. I assume you are not interested in the first entry, because you got the name from it and are searching for other entries with the same name (which your for loop doesn't do, btw).
There may be more than one entry in filtered, or, none. I assume you are only interested in the first match, because your program does return when a match happens. Therefore, the second part of the program would be
if len(filtered) > 0:
    first_match = filtered[0]
    self.name = name
    self.address = first_match['address']
    ...
This way the structure of the program is clearer and other readers can better understand what the program does. Furthermore, it is often faster ;-), although the comprehension always scans the whole list while your loop stops at the first match.
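If only the first match is needed, a generator expression passed to the built-in next() stops scanning at the first hit, which keeps the early exit of the original loop. This is only a sketch, assuming each entry is a dict with 'name' and 'address' keys; first_match_for and the sample data are hypothetical, and unlike the slice above it also considers the first entry so that the default name resolves to it.

```python
def first_match_for(big_list, name=None):
    """Return the first entry whose 'name' equals name, or None.

    Hypothetical helper; when name is empty, the first entry's name is used.
    """
    if not name:
        name = big_list[0]['name']
    # next() pulls items from the generator only until one matches.
    return next((item for item in big_list if item['name'] == name), None)


# Made-up sample data for illustration only.
big_list = [
    {'name': 'Ann', 'address': '1 First St'},
    {'name': 'Bob', 'address': '2 Second St'},
    {'name': 'Bob', 'address': '3 Third St'},
]
match = first_match_for(big_list, 'Bob')
```

The second argument to next() is the value returned when nothing matches, so the caller can test for None instead of catching StopIteration.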

Related

Why isn't there an ignore special variable in python?

Let's say I want to partition a string. It returns a tuple of 3 items. I do not need the second item.
I have read that _ is used when a variable is to be ignored.
bar = 'asd cds'
start,_,end = bar.partition(' ')
If I understand it correctly, the _ is still filled with the value. I haven't heard of a way to ignore some part of the output.
Wouldn't that save cycles?
A bigger example would be
def foo():
    return list(range(1000))
start,*_,end = foo()
It wouldn't really save any cycles to ignore the return argument, no, except for those which are trivially saved, by noting that there is no point to binding a name to a returned object that isn't used.
Python doesn't do any function inlining or cross-function optimization. It isn't targeting that niche in the slightest. Nor should it, as that would compromise many of the things that python is good at. A lot of core python functionality depends on the simplicity of its design.
Same for your list unpacking example. It's easy to think of syntactic sugar to have Python pick the last and first item of the list, given that syntax. But there is no way, staying within the defining constraints of Python, to actually not construct the whole list first. Who is to say that the construction of the list does not have relevant side effects? Python, as a language, will certainly not guarantee you any such thing. As a dynamic language, Python does not have the slightest clue, nor does it try to concern itself with the fact, that foo might return a list until the moment that it actually does so.
And will it return a list? What if you rebound the list identifier?
As per the docs, a valid variable name can be of this form
identifier ::= (letter|"_") (letter | digit | "_")*
This means that the first character of a variable name can be a letter or an underscore, and the rest of the name can contain letters, digits, or underscores. So _ is a valid variable name in Python, just a less commonly used one, and people normally use it as a throwaway variable.
And the syntax you have shown is not valid. It should have been
start,*_,end = foo()
Anyway, this kind of unpacking will work only in Python 3.x
In your example, you have used
list(range(1000))
So, the entire list is already constructed. When you return it, you are actually returning a reference to the list, the values are not copied actually. So, there is no specific way to ignore the values as such.
There certainly is a way to extract just a few elements. To wit:
l = foo()
start, end = l[0], l[-1]
The question you're asking is, then, "Why doesn't there exist a one-line shorthand for this?" There are two answers to that:
It's not common enough to need shorthand for. The two line solution is adequate for this uncommon scenario.
Features don't need a good reason to not exist. It's not like Guido van Rossum compiled a list of all possible ideas and then struck out yours. If you have an idea for improved syntax you could propose it to the Python community and see if you could get them to implement it.

Python Lists append mutable variable

Still new to programming/scripting, and this one's been bothering me. I have a function that searches through a list of names, comparing it to a template list of names, and when it finds a match, it places it in my final list in the correct order. For some later functions to work correctly, I need to be able to append some of these names as arrays/lists with. I'm running into the problem that every time I need to add a list to the final list, as soon as I change the variable, the final list updates with it. How do I fix this?
light = ['template of names in here in correct order']
listUser = ['names gathered from user input']
for userChan in listUser:
    for channelName in light:
        #check if channelName is a list or string
        if isinstance(channelName, basestring):
            #search for matches in userchan
            print channelName, 'is a string'
            if channelName in userChan.lower():
                matchFound = True
                listLight.append(userChan)
        else:
            print channelName, 'is a list'
            for piece in channelName:
                print 'searching %s in %s' %(piece, userChan.lower())
                if piece in userChan.lower():
                    print "found %s in %s" %(piece, userChan.lower())
                    lightMultList.append(piece)
                    matchFound = True
                    if len(lightMultList) == 2:
                        listLight.append(lightMultList)
                        del lightMultList[:]
So my problem is with the lightMultList. It's always going to be limited to 2 elements, but it changes. Hopefully this wasn't worded too horribly..
The problem is that you're only ever creating one lightMultList. You repeatedly clear it out (with del lightMultList[:]) and re-fill it, and append the same object over and over to listLight.
The simple fix is to just create a new lightMultList each time. Which you can do by changing this line:
del lightMultList[:]
… to:
lightMultList = []
This kind of problem is often a result of trying to directly port C or C++ code, or just thinking in C++. If you were expecting listLight.append(lightMultList) to call a "copy constructor", that's the root problem: there is no such thing in Python. Assigning a value to a variable, appending it to a list, etc., doesn't copy anything; it just binds another reference to the same value.
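The difference can be seen in isolation with a small sketch. The names shared, aliased, current, and independent are made up for this illustration and stand in for lightMultList and the final list in the question.

```python
# Reusing one list: every append stores a reference to the same object.
shared = []
aliased = []
for pair in (['a', 'b'], ['c', 'd']):
    shared.extend(pair)
    aliased.append(shared)
    del shared[:]          # empties the list that aliased already refers to

# Rebinding to a fresh list: each appended list stays intact.
current = []
independent = []
for pair in (['a', 'b'], ['c', 'd']):
    current.extend(pair)
    independent.append(current)
    current = []           # new object; the appended one is left alone
```

After the first loop, aliased holds the same empty list twice; after the second, independent holds two separate two-element lists.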
Also, a C++ programmer might try to optimize performance by avoiding the wasteful creation of all those temporary objects by trying to reuse the same one, but in Python, the cost of creating a new list is about the same as the cost of iterating one step over listUser in the first place. If it's slow enough to worry about, you're going to have to reorganize your code or move the whole thing to C or Cython anyway; this isn't going to help. (That being said, it's rarely a useful optimization in C++ either; the right thing to do there, on the rare occasions where it matters, is to construct the new vector in-place within the containing vector…)

Passing a collection argument without unpacking its contents

Question: What are the pros and cons of writing an __init__ that takes a collection directly as an argument, rather than unpacking its contents?
Context: I'm writing a class to process data from several fields in a database table. I iterate through some large (~100 million rows) query result, passing one row at a time to a class that performs the processing. Each row is retrieved from the database as a tuple (or optionally, as a dictionary).
Discussion: Assume I'm interested in exactly three fields, but what gets passed into my class depends on the query, and the query is written by the user. The most basic approach might be one of the following:
class Direct:
    def __init__(self, names):
        self.names = names
class Simple:
    def __init__(self, names):
        self.name1 = names[0]
        self.name2 = names[1]
        self.name3 = names[2]
class Unpack:
    def __init__(self, names):
        self.name1, self.name2, self.name3 = names
Here are some examples of rows that might be passed to a new instance:
good = ('Simon', 'Marie', 'Kent') # Exactly what we want
bad1 = ('Simon', 'Marie', 'Kent', '10 Main St') # Extra field(s) behind
bad2 = ('15', 'Simon', 'Marie', 'Kent') # Extra field(s) in front
bad3 = ('Simon', 'Marie') # Forgot a field
When faced with the above, Direct always runs (at least to this point) but is very likely to be buggy (GIGO). It takes one argument and assigns it exactly as given, so this could be a tuple or list of any size, a Null value, a function reference, etc. This is the most quick-and-dirty way I can think of to initialize the object, but I feel like the class should complain immediately when I give it data it's clearly not designed to handle.
Simple handles bad1 correctly, is buggy when given bad2, and throws an error when given bad3. It's convenient to be able to effectively truncate the inputs from bad1 but not worth the bugs that would come from bad2. This one feels naive and inconsistent.
Unpack seems like the safest approach, because it throws an error in all three "bad" cases. The last thing we want to do is silently fill our database with bad information, right? It takes the tuple directly, but allows me to identify its contents as distinct attributes instead of forcing me to keep referring to indices, and complains if the tuple is the wrong size.
On the other hand, why pass a collection at all? Since I know I always want three fields, I can define __init__ to explicitly accept three arguments, and unpack the collection using the *-operator as I pass it to the new object:
class Explicit:
    def __init__(self, name1, name2, name3):
        self.name1 = name1
        self.name2 = name2
        self.name3 = name3
names = ('Guy', 'Rose', 'Deb')
e = Explicit(*names)
The only differences I see are that the __init__ definition is a bit more verbose and we raise TypeError instead of ValueError when the tuple is the wrong size. Philosophically, it seems to make sense that if we are taking some group of data (a row of a query) and examining its parts (three fields), we should pass a group of data (the tuple) but store its parts (the three attributes). So Unpack would be better.
If I wanted to accept an indeterminate number of fields, rather than always three, I still have the choice to pass the tuple directly or use arbitrary argument lists (*args, **kwargs) and *-operator unpacking. So I'm left wondering, is this a completely neutral style decision?
This question is probably best answered by trying out the different approaches and seeing what makes the most sense to you and is the most easily understood by others reading your code.
Now that I have the benefit of more experience, I'd ask myself, how do I plan to access these values?
When I access any one of the values in this collection, am I likely to be using most or all of the values in that same subroutine or section of code? If so, the "Direct" approach is a good choice; it's the most compact and it lets me think about the collection as a collection until the point that I absolutely need to pay attention to what's inside.
On the other hand, if I'm using some values here and some values there, I don't want to have to constantly remember which index to access, or add verbosity in the form of dictionary keys, when I could just be referring directly to the values using separately named attributes. I would probably avoid the "Direct" approach in this case so that I only have to think about the fact that there's a collection when the class is first initialized.
Each of the remaining approaches involves splitting the collection up into different attributes, and I think the clear winner here is the "Explicit" approach. The "Simple" and "Unpack" approaches share a hidden dependency on the order of the collection, without offering any real advantage.
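The behaviour described in the question for "Unpack" and "Explicit" can be checked with a short sketch. The two classes and the sample rows are copied from the question; error_name and results are hypothetical names introduced here to record which exception, if any, each row triggers.

```python
class Unpack:
    def __init__(self, names):
        self.name1, self.name2, self.name3 = names


class Explicit:
    def __init__(self, name1, name2, name3):
        self.name1 = name1
        self.name2 = name2
        self.name3 = name3


def error_name(build, row):
    """Hypothetical helper: name of the exception raised, or None."""
    try:
        build(row)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


good = ('Simon', 'Marie', 'Kent')
bad1 = ('Simon', 'Marie', 'Kent', '10 Main St')
bad3 = ('Simon', 'Marie')

results = {
    'Unpack': [error_name(Unpack, row) for row in (good, bad1, bad3)],
    'Explicit': [error_name(lambda r: Explicit(*r), row)
                 for row in (good, bad1, bad3)],
}
```

Both classes reject rows of the wrong length; neither can detect a row like bad2, which has the right shape problem in a different position, without an additional check on the contents.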

attribute naming, tuple vs multiple names

I was thinking about parts of my classes' APIs, and one thing that came up was the following:
Should I use a tuple/list of equal attributes or should I use several attributes, e.g. let's say I've got a Controller class which reads several thermometers.
class Controller(object):
    def __init__(self):
        self.temperature1 = Thermometer()
        self.temperature3 = Thermometer()
        self.temperature2 = Thermometer()
        self.temperature4 = Thermometer()
vs.
class Controller(object):
    def __init__(self):
        self.temperature = tuple(Thermometer() for _ in range(4))
Is there a best practice when I should use which style?
(Let's assume the number of Thermometers will not be changed, otherwise choosing the second style with a list would be obvious.)
A tuple or list, 100%. variable1, variable2, etc... is a really common anti-pattern.
Think about how you will code later: it's likely you'll want to do similar things to these items. In a data structure, you can loop over them to perform operations; with numbered variable names, you'll have to do it manually. Not only that, but it makes it easier to add more values, it makes your code more generic and therefore more reusable, and it means you can add new values mid-execution easily.
Why make the assumption the number will not be changed? More often than not, assumptions like that end up being wrong. Regardless, you can already see that the second example exemplifies the do not repeat yourself idiom that is central to clear, efficient code.
Even if you had more relevant names eg: cpu_temperature, hdd_temperature, I would say that if you ever see yourself performing the same operations on them, you want a data structure, not lots of variables. In this case, a dictionary:
temperatures = {
    "cpu": ...,
    "hdd": ...,
    ...
}
The main thing is that by storing the data in a data structure, you are giving the software the information about the grouping you are providing. If you just give them the variable names, you are only telling the programmer(s) - and if they are numbered, then you are not even really telling the programmer(s) what they are.
Another option is to store them as a dictionary:
{1: temp1, 2: temp2}
The most important thing in deciding how to store data is relaying the data's meaning, if these items are essentially the same information in a slightly different context then they should be grouped (in terms of data-type) to relay that - i.e. they should be stored as either a tuple or a dictionary.
Note: if you use a tuple and then later insert more data, e.g. a temp0 at the beginning, then there could be backwards-compatibility issues where you've grabbed individual variables. (With a dictionary temp[1] will always return temp1.)
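A rough sketch of the dictionary variant, to show how grouping lets one loop cover every sensor. The Thermometer class is not shown in the question, so the stub below, its read() method, the placeholder value range, and the read_all() method are all assumptions made for illustration.

```python
import random


class Thermometer:
    """Hypothetical stand-in; the real class is not shown in the question."""

    def read(self):
        # Placeholder reading in degrees Celsius.
        return random.uniform(20.0, 80.0)


class Controller:
    def __init__(self, names=('cpu', 'hdd')):
        # One dict instead of numbered attributes.
        self.temperatures = {name: Thermometer() for name in names}

    def read_all(self):
        # The same operation applied to every sensor in one pass.
        return {name: t.read() for name, t in self.temperatures.items()}


controller = Controller()
readings = controller.read_all()
```

Adding a sensor then means passing one more name, and read_all() needs no change.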

Referencing Python list elements without specifying indexes

How can I store values in a list without specifying index numbers?
For example
outcomeHornFive = 5
someList = []
someList.append(outcomeHornFive)
instead of doing this,
someList[0] # to reference horn five outcome
How can I do something like this? The reason is there are many items that I need to reference within the list and I just think it's really inconvenient to keep track of which index is what.
someList.hornFive
You can use another data structure if you'd like to reference things by attribute access (or otherwise via a name).
You can put them in a dict, or create a class, or do something else. It depends what kind of other interaction you want to have with that object.
(P.S., we call those lists, not arrays).
Instead of using a list you can use a dictionary.
See data types in the python documentation.
A dictionary allows you to lookup a value using a key:
my_dict["HornFive"] = 20
You cannot and you shouldn't. If you could do that, how would you refer to the list itself? And you will need to refer to the list itself.
The reason is there are many items that I need to reference within the list and I just think it's really inconvenient to keep track of which index is what.
You'll need to do something of that ilk anyway, no matter how you organize your data. If you had separate variables, you'd need to know which variable stores what. If you had your way with this, you'd still need to know that a bare someList refers to "horn five" and not to, say, "horn six".
One advantage of lists and dicts is that you can factor out this knowledge and write generic code. A dictionary, or even a custom class (if there is a finite number of semantically distinct attributes, and you'd never have to use it as a collection), may help with the readability by giving it an actual name instead of a numeric index.
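As a sketch of the two options mentioned in these answers, the snippet below stores the same values once in a dictionary and once in a types.SimpleNamespace from the standard library, which gives attribute access without writing a class. The key hornSix and its value are made up for illustration; only hornFive with the value 5 comes from the question.

```python
from types import SimpleNamespace

# Dictionary: look values up by key.
outcomes = {'hornFive': 5, 'hornSix': 6}
five_from_dict = outcomes['hornFive']

# SimpleNamespace: the same data with attribute access.
outcome_ns = SimpleNamespace(**outcomes)
five_from_attr = outcome_ns.hornFive

# Generic code still works on the dictionary as a collection.
total = sum(outcomes.values())
```

The dictionary is the better fit when the items are also processed as a group; the namespace or a custom class reads more naturally when there is a fixed set of distinct attributes.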
referenced from http://parand.com/say/index.php/2008/10/13/access-python-dictionary-keys-as-properties/
Say you want to access the values of your dictionary via the dot notation instead of the dictionary syntax. That is, you have:
d = {'name':'Joe', 'mood':'grumpy'}
And you want to get at “name” and “mood” via
d.name
d.mood
instead of the usual
d['name']
d['mood']
Why would you want to do this? Maybe you’re fond of the Javascript Way. Or you find it more aesthetic. In my case I need to have the same piece of code deal with items that are either instances of Django models or plain dictionaries, so I need to provide a uniform way of getting at the attributes.
Turns out it’s pretty simple:
class DictObj(object):
    def __init__(self, d):
        self.d = d
    def __getattr__(self, m):
        return self.d.get(m, None)
d = DictObj(d)
d.name
# prints Joe
d.mood
# prints grumpy
