Select columns in Python based on a condition - python

I am new to Python!
I have an input vector p. I am trying to select the columns of p such that p(i)>2 and put them into a new vector y, e.g. something like the line below, which gives an error:
y=(p[i]>2)

If I understand correctly, your question is not about a pandas DataFrame but about a regular Python list. If so, you can use a list comprehension.
A list comprehension is a short syntax for iterating through a list and picking the elements that satisfy a certain condition.
Let's see first how you can accomplish what you want with a regular for loop (the non-pythonic way):
my_list = [1, 4, 6, 1, 0]
my_new_list = []
for n in my_list:
    if n > 2:
        my_new_list.append(n)
Now, Python makes such a selection of elements from a list very easy using the list comprehension syntax:
my_new_list = [n for n in my_list if n > 2]
where the first n refers to what we append to my_new_list, then comes the for loop and finally the filtering condition.
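If the positions of the selected elements matter as well, the same comprehension pattern can be combined with enumerate. This is only a minimal sketch: the name p and the threshold 2 come from the question, while the sample values are made up for illustration.

```python
# Sample data (made up for illustration); p stands in for the asker's vector.
p = [1, 4, 6, 1, 0]

# Values greater than 2, in their original order.
y = [value for value in p if value > 2]

# Positions of those values; enumerate pairs each value with its index.
positions = [i for i, value in enumerate(p) if value > 2]

print(y)          # [4, 6]
print(positions)  # [1, 2]
```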

With a pandas DataFrame you have to give the column name inside the brackets: name the column first so that you can access it, and then the condition will work. Like this:
y = dataframe[p['i'] > 80]
The result you get back will itself be a DataFrame. Visit this website for more information.

Related

how to apply function to a list element within a list of lists?

I have a list of lists. Here is an example of 2 of the lists inside a list:
global_tp_old = [[2, 1, 0.8333595991134644],[2, 1, 0.8530714511871338]]
I want to look up a DataFrame entry whose index is given by the first element of each inner list. So far I have tried:
global_tp_new = []
for element in global_tp_old:
    element[:][0] = df_unique[element[:][0]]
    global_tp_new.append(element)
where df_unique is a pandas dataframe produced like this:
['img1.png', 'img2.png', 'img3.png']
I'm trying to match the first element from the list defined above to the number in df_unique.
I should get:
'img3.png'
as it's the 3rd element (0 indexing)
However, I get the incorrect output where it essentially returns the first element every time. It's probably obvious but what do I do to fix this?
Remember that your element array is actually a reference into the original list. If you modify the list, you'll modify global_tp_old as well.
Something like this, although you may need to change the dataframe indexing depending on whether you're looking for rows or columns.
global_tp_old = [[2, 1, 0.8333595991134644],[2, 1, 0.8530714511871338]]
global_tp_new = []
for element in global_tp_old:
    element = [df_unique.iloc[element[0]]] + element[1:]
    global_tp_new.append(element)
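The aliasing point can be checked without pandas. In this sketch a plain list called names stands in for df_unique; that substitution is an assumption made only to keep the example self-contained.

```python
# 'names' is a stand-in for df_unique, assumed here to be a simple sequence.
names = ['img1.png', 'img2.png', 'img3.png']
global_tp_old = [[2, 1, 0.8333595991134644], [2, 1, 0.8530714511871338]]

# Build a new inner list for every row, so the originals are left untouched.
global_tp_new = []
for element in global_tp_old:
    global_tp_new.append([names[element[0]]] + element[1:])

print(global_tp_new[0][0])  # img3.png
print(global_tp_old[0][0])  # 2 (the original list is unchanged)

# By contrast, assigning through 'element' changes global_tp_old itself,
# because 'element' is the very same inner list object.
for element in global_tp_old:
    element[0] = names[element[0]]
print(global_tp_old[0][0])  # img3.png
```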
List comprehension might be useful to apply a function fun to the first element of each list in a list of lists (LoL).
LoL = [[61, 1, 0.8333595991134644],[44, 1, 0.8530714511871338]]
newL = [fun(l_loc[0]) for l_loc in LoL]
No need to use a Pandas DataFrame.
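Note that the comprehension above returns only the transformed first elements. If the remaining values of each inner list should be kept, one possible variation is sketched below. Here fun is a hypothetical example function and the sample rows are adapted from the question.

```python
def fun(index):
    """Hypothetical example function: turn a 0-based index into a file name."""
    return 'img{}.png'.format(index + 1)

LoL = [[2, 1, 0.8333595991134644], [0, 1, 0.8530714511871338]]

# Apply fun to the first element and keep the rest of each inner list.
# A new inner list is built each time, so LoL itself is not modified.
newL = [[fun(row[0])] + row[1:] for row in LoL]

print(newL[0])  # ['img3.png', 1, 0.8333595991134644]
print(newL[1])  # ['img1.png', 1, 0.8530714511871338]
```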

Pyspark RDD: find index of an element

I am new to pyspark and I am trying to convert a list in python to rdd and then I need to find elements index using the rdd. For the first part I am doing:
list = [[1,2],[1,4]]
rdd = sc.parallelize(list).cache()
So now the rdd is actually my list. I want to find the index of an arbitrary element, something like the index function that works for Python lists. I am aware of a function called zipWithIndex, which assigns an index to each element, but I could not find a proper example in Python (there are examples in Java and Scala).
Thanks.
Use filter and zipWithIndex:
# pair is an (item, index) tuple; indexing it works in both Python 2 and 3
(rdd.zipWithIndex()
    .filter(lambda pair: pair[0] == [1, 2])
    .map(lambda pair: pair[1])
    .collect())
Note that [1,2] here can be easily changed to a variable name and this whole expression can be wrapped within a function.
How It Works
zipWithIndex simply returns a tuple of (item,index) like so:
rdd.zipWithIndex().collect()
> [([1, 2], 0), ([1, 4], 1)]
filter keeps only those that match a particular criterion (in this case, that the item equals a specific sublist):
rdd.zipWithIndex().filter(lambda pair: pair[0] == [1, 2]).collect()
> [([1, 2], 0)]
map then extracts just the index from each remaining pair:
(rdd.zipWithIndex()
    .filter(lambda pair: pair[0] == [1, 2])
    .map(lambda pair: pair[1])
    .collect())
> [0]
and then we can simply get the first element by indexing [0] if you want.
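To see the three steps without a Spark cluster, here is a plain-Python analogue on an ordinary list. It is not PySpark code: enumerate stands in for zipWithIndex, and the built-in filter and map stand in for the RDD methods of the same name.

```python
data = [[1, 2], [1, 4]]
target = [1, 2]

# zipWithIndex yields (item, index); enumerate yields (index, item), so swap them.
indexed = [(item, index) for index, item in enumerate(data)]
print(indexed)  # [([1, 2], 0), ([1, 4], 1)]

# Keep only the pairs whose item equals the target sublist.
matches = filter(lambda pair: pair[0] == target, indexed)

# Extract the index from each remaining pair.
indices = list(map(lambda pair: pair[1], matches))
print(indices)  # [0]
```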

Find index of a sublist in a list

Trying to find the index of a sub-list that contains a given element. I'm not sure how to specify the problem exactly (which may be why I've overlooked it in a manual); however, my problem is this:
list1 = [[1,2],[3,4],[7,8,9]]
I want to find the first sub-list in list1 where 7 appears (in this case the index is 2, but lll could be very very long). (It will be the case that each number will appear in only 1 sub-list – or not at all. Also these are lists of integers only)
I.e. a function like
spam = My_find(list1, 7)
would give spam = 2
I could try looping to make a Boolean index
[7 in x for x in lll]
and then .index to find the 'true' - (as per Most efficient way to get indexposition of a sublist in a nested list)
However surely having to build a new boolean list is really inefficient..
My code starts with list1 being relatively small, but it keeps growing (eventually there will be 1 million numbers arranged in approx. 5000 sub-lists of list1).
Any thoughts?
I could try looping to make a Boolean index
[7 in x for x in lll]
and then .index to find the 'true' … However surely having to build a new boolean list is really inefficient
You're pretty close here.
First, to avoid building the list, use a generator expression instead of a list comprehension, by just replacing the [] with ().
sevens = (7 in x for x in lll)
But how do you do the equivalent of .index when you have an arbitrary iterable, instead of a list? You can use enumerate to associate each value with its index, then just filter out the non-sevens with filter or dropwhile or another generator expression, then next will give you the index and value of the first True.
For example:
indexed_sevens = enumerate(sevens)
seven_indexes = (index for index, value in indexed_sevens if value)
first_seven_index = next(seven_indexes)
You can of course collapse all of this into one big expression if you want.
And, if you think about it, you don't really need that initial expression at all; you can do that within the later filtering step:
first_seven_index = next(index for index, value in enumerate(lll) if 7 in value)
Of course this will raise a StopIteration exception instead of a ValueError exception if there are no sevens, but otherwise it does the same thing as your original code, without building the list and without continuing to test values after the first match.
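If a ValueError is preferred when nothing matches, the expression can be wrapped in a small function that passes a default to next. This is a sketch only; the name my_find is adapted from the question and the error message is an arbitrary choice.

```python
def my_find(list_of_lists, wanted):
    """Return the index of the first sub-list containing wanted.

    Raises ValueError if no sub-list contains it, mirroring list.index.
    """
    matches = (index for index, sub in enumerate(list_of_lists) if wanted in sub)
    found = next(matches, None)  # None is returned when the generator is empty
    if found is None:
        raise ValueError('{!r} is not in any sub-list'.format(wanted))
    return found

list1 = [[1, 2], [3, 4], [7, 8, 9]]
print(my_find(list1, 7))  # 2
```

The generator is still lazy, so the search stops at the first sub-list that contains the value.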

Indexing According to Number in the Names of Objects in a List in Python

Apologies for my title not being the best. Here is what I am trying to accomplish:
I have a list:
list1 = [a0_something, a2_something, a1_something, a4_something, a3_something]
I have another list whose entries are tuples that include a name, such as:
list2 = [(x1,y1,z1,'bob'),(x2,y2,z2,'alex')...]
The 0th name in the second list corresponds to a0_something, and the name in the 1st entry of the second list corresponds to a1_something. Basically, the second list is in the right order but the first list isn't.
The program I am working with has a setName function I would like to do this
a0_something.setName(list2[0][4])
and so on with a loop.
So that I can really just say
for i in range(len(list1)):
    a(i)_something.setName(list2[i][4])
Is there any way I can refer to that number in the a#_something so that I can iterate with a loop?
No.
Variable names have no meaning in run-time. (Unless you're doing introspection, which I guarantee you is something you should not be doing.)
Use a proper list such that:
lst = [a0_val, a1_val, a2_val, a3_val, a4_val]
and then address it by lst[0].
Alternatively, if those names have meanings, use a dict where:
dct = {
    'a0' : a0_val,
    'a1' : a1_val,
    # ...
}
and use it with dct['a0'].
The enumerate function lets you get the value and the index of the current item. So, for your example, you could do:
for i, asomething in enumerate(list1):
    asomething.setName(list2[i][3])
Since each tuple in your list2 has length 4, the final element is at index 3 (you could also use -1).
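Because list1 in the question is not in numeric order, pairing by position alone would give a2_something the name meant for a1_something. One possible way around this, assuming each object can report its own identifier, is to build a dict keyed by the number in that identifier. Everything in the sketch below is hypothetical: the Item class, its object_id attribute, and the sample tuples are stand-ins for the asker's program.

```python
import re

class Item:
    """Hypothetical stand-in for the program's objects."""
    def __init__(self, object_id):
        self.object_id = object_id  # e.g. "a2_something" (assumed attribute)
        self.name = None

    def setName(self, name):
        self.name = name

def index_of(item):
    """Extract the number from an identifier such as 'a2_something'."""
    match = re.match(r'a(\d+)_', item.object_id)
    return int(match.group(1))

# Same ordering as in the question: 0, 2, 1, 4, 3.
list1 = [Item('a0_something'), Item('a2_something'), Item('a1_something'),
         Item('a4_something'), Item('a3_something')]
# Made-up sample tuples; only the last field (the name) matters here.
list2 = [(0, 0, 0, 'bob'), (1, 1, 1, 'alex'), (2, 2, 2, 'carol'),
         (3, 3, 3, 'dave'), (4, 4, 4, 'erin')]

# Map each number to its object, then name the objects in list2's order.
by_index = {index_of(item): item for item in list1}
for i, entry in enumerate(list2):
    by_index[i].setName(entry[-1])

print(list1[1].object_id, list1[1].name)  # a2_something carol
```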

How to unpack only some arguments from zip, not all?

My sql query:
select id,value,zvalue from axis
gives me result like this:
ans=((1,23,34),(12,34,35),(31,67,45),(231,3412,234))
Now, if I want all 3 of these variables as 3 separate lists,
id,value,zvalue=zip(*ans)
gives me 3 separate lists.
But if I only want id and value as separate lists, it gives me a "too many values to unpack" error:
id,value =zip(*ans)
Is there any way to create any number of lists from the SQL query? If there are 10 parameters in the query, do I have to use all of them when using zip?
Please help.
The number of names on the left must match the number of values; this is a rule in Python 2. For Python 3, you can use * to capture the remaining values into a list.
The common pythonic (2.x) workaround is to use _ to denote variables you won't use, i.e.:
id,value,_ = zip(*ans) # only works for exactly three values
As DSM commented, for Python 3, you can use * to grab "remaining" args as a list:
id, value, *_ = zip(*ans) # _ will be a list of zero or more args
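Or, simplest, just slice the return from zip (in Python 3, zip returns an iterator, which cannot be sliced directly, so wrap it in list() first):
id,value = list(zip(*ans))[:2] # ignore all but first two values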
If you are using Python 3 you can use this for unpacking n additional elements:
In [0]: a, b, *_ = (1, 2, 3, 4)
In [1]: a
1
I think you might be looking for something like this:
ids = [t[0] for t in ans]
values = [t[1] for t in ans]
The first list comprehension gets the first column in all tuples in ans, that is, the id column. The second list comprehension gets the second column for all tuples in ans, that is, the value column.
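Another option that works in Python 3 without materialising every column is itertools.islice, which stops after the first n columns. A minimal sketch, using the sample rows from the question:

```python
from itertools import islice

# Rows as they might come back from the query (sample values from the question).
ans = [(1, 23, 34), (12, 34, 35), (31, 67, 45), (231, 3412, 234)]

# zip(*ans) yields one tuple per column; islice stops after the first two.
ids, values = islice(zip(*ans), 2)

print(ids)     # (1, 12, 31, 231)
print(values)  # (23, 34, 67, 3412)
```

Changing the 2 to another number selects that many leading columns, provided the number of names on the left matches.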
