Insert number to a list - python

I have an ordered dictionary like the following:
source =([('a',[1,2,3,4,5,6,7,11,13,17]),('b',[1,2,3,12])])
First I want to calculate the length of each key's value, then take the square root of that length; call it L.
Then I want to pair L with every item whose position is divisible by L without remainder, and pair "1" with every other item.
For example, source['a'] = [1,2,3,4,5,6,7,11,13,17] the length of it is 9.
Thus sqrt of len(source['a']) is 3.
Insert the number 3 after each item whose position is exactly divisible by 3 (e.g. position 3, position 6, position 9); if the position of the item is not exactly divisible by 3, insert 1 after it.
The result should look like the following:
result=([('a',["1,1","2,1","3,3","4,1","5,1","6,3","7,1","11,1","13,3","10,1"]),('b',["1,1","2,2","3,1","12,2"])])
I don't know how to change each item in the list to a string pair. BTW, this is not a homework assignment. I am trying to build a boolean retrieval engine; the source data is too big, so I created a simple sample here to explain what I want to achieve :)

As this looks like homework, I will only help with the part you are having trouble with:
I dont know how to change the item in the list to a string pair.
As the entire list needs to be updated, it's better to recreate it than to update it in place, though updating in place is possible because lists are mutable.
Consider a list
lst = [1,2,3,4,5]
To convert it to a list of strings, you can use a list comprehension:
lst = [str(e) for e in lst]
You may also use the built-in map as map(str,lst), but remember that in Py3.X map returns a map object rather than a list, so wrap it in list() if you need a list.
A condition inside a comprehension is best written as a conditional expression:
<TRUE-VALUE> if <condition> else <FALSE-VALUE>
To get the index of any item in a list, your best bet is to use the built-in enumerate
If you need to build a formatted string from a sequence of items, it's best to use the format method of str:
"{},{}".format(a,b)
The length of any sequence including a list can be calculated through the built-in len
You can use the operator ** with fractional power or use the math module and invoke the sqrt function to calculate the square-root
Now you just have to combine the above suggestions to solve your problem.
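One way the hints above could be combined is sketched below. It assumes Python 3, positions counted from 1, and a square root truncated to an integer (the sample list for 'a' actually holds 10 items, so its square root is not exactly 3). The helper name pair_with_root is made up for illustration.

```python
import math
from collections import OrderedDict

source = OrderedDict([('a', [1, 2, 3, 4, 5, 6, 7, 11, 13, 17]),
                      ('b', [1, 2, 3, 12])])

def pair_with_root(values):
    # Hypothetical helper: L is the truncated square root of the list length.
    root = int(math.sqrt(len(values)))
    # enumerate(..., start=1) gives 1-based positions; the conditional
    # expression picks L for divisible positions and 1 otherwise.
    return ["{},{}".format(value, root if position % root == 0 else 1)
            for position, value in enumerate(values, start=1)]

result = OrderedDict((key, pair_with_root(values))
                     for key, values in source.items())
print(result['b'])  # ['1,1', '2,2', '3,1', '12,2']
```

The list is rebuilt rather than modified in place, as suggested above.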

Related

Python disregarding zeros when sorting numbers in a list

I'm trying to sort a list of dollar amounts from lowest to highest using Python's built-in sorting, but when I call it, the numbers come out in a strange order. It starts at $10,000, goes up to $19,000 (which is the highest), then jumps down to $2,000 and counts up from there, ostensibly because 2 is bigger than 1. I don't know how to correct for this. The code I've used is below.
numbers=[['$10014.710000000001'], ['$10014.83'],['$11853.300000000001'],
['$19060.010000000006'],['$2159.1099999999997'],['$3411.1400000000003']]
print(sorted(numbers))
The key insight here is that the values in your list are actually strings, and strings are compared lexically: each character in the string is compared one at a time until the first non-matching character. So "aa" sorts before "ab", but that also means that "a1000" sorts before "a2". If you want to sort in a different way, you need to tell the sort method (or the sorted function) what it is you want to sort by.
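A quick illustration of the difference, using two shortened, made-up amounts rather than the exact values from the question:

```python
# Strings are compared character by character: after the shared "$",
# "1" sorts before "2", so the larger amount comes first.
as_strings = sorted(["$2159.11", "$10014.83"])
print(as_strings)   # ['$10014.83', '$2159.11']

# Real numbers are compared by value.
as_numbers = sorted([2159.11, 10014.83])
print(as_numbers)   # [2159.11, 10014.83]
```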
In this case, you should probably use the decimal module together with the key argument of the sort method. This sorts your existing list in place and uses the converted values only for comparison during sorting.
import decimal

def extract_sortable_value(value):
    # value is a list, so take the first element
    first_value = value[0]
    return decimal.Decimal(first_value.lstrip('$'))

numbers.sort(key=extract_sortable_value)
Equivalently, you could do:
print(sorted(numbers, key=extract_sortable_value))
Demo: https://repl.it/repls/MiserableDarkPatches
You are not sorting numbers but strings, which explains the "weird" result. Instead, change your type to float and sort the resulting list:
In [3]: sorted([[float(el[0][1:])] for el in numbers])
Out[3]:
[[2159.1099999999997],
[3411.1400000000003],
[10014.710000000001],
[10014.83],
[11853.300000000001],
[19060.010000000006]]
I need the el[0] because every number is inside its own list, which is not a good style, but I guess you have your reasons for this. The [1:] strips away the $ sign.
EDIT: a really good point was made in the comments. A more robust solution:
from decimal import Decimal
import decimal
decimal.getcontext().prec = 4
sorted([Decimal(el[0][1:]) for el in numbers])
Out[8]:
[Decimal('2159.1099999999997'),
Decimal('3411.1400000000003'),
Decimal('10014.710000000001'),
Decimal('10014.83'),
Decimal('11853.300000000001'),
Decimal('19060.010000000006')]
Your numbers are currency values. So as pointed out in the comments below, it might make sense to use Python's decimal module which offers several advantages over the float datatype. (See link for further information.)
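One of those advantages can be seen in a small sketch (the values 0.1, 0.2 and 0.3 are chosen only for illustration): Decimal values built from strings represent decimal fractions exactly, while binary floats do not.

```python
from decimal import Decimal

float_sum = 0.1 + 0.2
decimal_sum = Decimal("0.1") + Decimal("0.2")

print(float_sum == 0.3)                # False: binary floats cannot store 0.1 exactly
print(decimal_sum == Decimal("0.3"))   # True: decimal arithmetic is exact here
```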
If, however, this is only an exercise to get to know Python better, as I suspect, you might look for a simpler solution.
Your sorting doesn't work because each number is stored as a string inside its own inner list. You have to convert the strings to integers or floats before sorting has the effect you're looking for:
numbers=[
['$10014.710000000001'],
['$10014.83'],
['$11853.300000000001'],
['$19060.010000000006'],
['$2159.1099999999997'],
['$3411.1400000000003']
]
numbers_float = [float(number[0][1:]) for number in numbers]
numbers_float.sort()
print(numbers_float)
Which prints:
[2159.1099999999997, 3411.1400000000003, 10014.710000000001, 10014.83, 11853.300000000001, 19060.010000000006]
When you look at float(number[0][1:]), then [0] takes the first (and only) number of your (inner) number list, [1:] strips the $ sign and finally float does the conversion to floating point number.
If you want the $ sign back:
for number in numbers_float:
    print("${}".format(number))
Which prints:
$2159.1099999999997
$3411.1400000000003
$10014.710000000001
$10014.83
$11853.300000000001
$19060.010000000006

How would I have my code remove the lowest integer?

def calc_average(scores):
    x = scores[1:]
    return (sum(x) / float(len(x)))
Here is some code that is supposed to find the average of a list of numbers without counting the lowest one. I'm not sure how to find the lowest number in the list, remove it, and then find the average of what is left. Can anyone help? Thanks! (The lowest number isn't always the first number in the list; that's the mistake I made...)
You don't have to literally remove the item from the list; you can just remove it from the calculation.
def calc_average(scores):
    return (sum(scores) - min(scores)) / float(len(scores) - 1)
I would sort the list, then remove the first element.
a = [ 3,2,6,3,8,7,2,1]
a.sort()
a = a[1:]
print(a)
print(float(sum(a)) / len(a))
The benefit of doing it this way, even though it is more work for removing just the single lowest value, is that it is scalable. If you wanted to remove the lowest two or three values, that would be easy once the list is sorted.
You can use the min() function to find the smallest value of a list, and then you can remove the value with the list.remove() function.
Check this out for reference https://docs.python.org/2/library/functions.html#min or https://docs.python.org/3.6/library/functions.html#min
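A minimal sketch of the min() plus list.remove() approach. Working on a copy is an assumption made here so that the caller's list is left unchanged; the function name comes from the question.

```python
def calc_average(scores):
    remaining = list(scores)            # copy, so the original list is not modified
    remaining.remove(min(remaining))    # remove() drops only the first occurrence
    return sum(remaining) / float(len(remaining))

scores = [3, 2, 6, 3, 8, 7, 2, 1]
average = calc_average(scores)
print(average)
```

If the lowest value occurs more than once, only one copy of it is dropped.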
You could use
return float((sum(scores) - min(scores))) / (len(scores) - 1)
That finds the sum of all the scores, then removes the smallest score, before dividing by the number of scores after the removal.
Note there is no error-checking here, so this will not work properly for a list of length zero or one. That float() is there to ensure floating point division in Python 2.x. This uses the full speed of Python functions, but it does loop over the scores twice. A hand-coded function would be faster in looping only once but would slow down again in executing multiple commands.
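If the missing error-checking matters, one possible guard is sketched below. Raising ValueError for lists shorter than two items is an assumption, not something the question asks for, and the function name is made up for illustration.

```python
def calc_average_drop_lowest(scores):
    # With fewer than two scores nothing is left to average once the
    # lowest one is dropped, so fail loudly instead of dividing by zero.
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    return float(sum(scores) - min(scores)) / (len(scores) - 1)

print(calc_average_drop_lowest([1, 2, 3]))  # 2.5
```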

Find index of a sublist in a list

I'm trying to find the index of the sublist that contains a given element. I’m not sure how to specify the problem exactly (which may be why I’ve overlooked it in a manual), however my problem is thus:
list1 = [[1,2],[3,4],[7,8,9]]
I want to find the first sub-list in list1 where 7 appears (in this case the index is 2, but list1, called lll below, could be very long). Each number will appear in only one sub-list, or not at all, and these are lists of integers only.
I.e. a function like
spam = My_find(list1, 7)
would give spam = 2
I could try looping to make a Boolean index
[7 in x for x in lll]
and then .index to find the 'true' (as per Most efficient way to get indexposition of a sublist in a nested list).
However, surely having to build a new boolean list is really inefficient.
My code starts with list1 being relatively small, but it keeps growing (eventually there will be 1 million numbers arranged in approx. 5000 sub-lists of list1).
Any thoughts?
I could try looping to make a Boolean index
[7 in x for x in lll]
and then .index to find the 'true' … However surely having to build a new boolean list is really inefficient
You're pretty close here.
First, to avoid building the list, use a generator expression instead of a list comprehension, by just replacing the [] with ().
sevens = (7 in x for x in lll)
But how do you do the equivalent of .index when you have an arbitrary iterable, instead of a list? You can use enumerate to associate each value with its index, then just filter out the non-sevens with filter or dropwhile or another generator expression, then next will give you the index and value of the first True.
For example:
indexed_sevens = enumerate(sevens)
seven_indexes = (index for index, value in indexed_sevens if value)
first_seven_index = next(seven_indexes)
You can of course collapse all of this into one big expression if you want.
And, if you think about it, you don't really need that initial expression at all; you can do that within the later filtering step:
first_seven_index = next(index for index, value in enumerate(lll) if 7 in value)
Of course this will raise a StopIteration exception instead of a ValueError exception if there are no sevens, but otherwise it does the same thing as your original code, without building the list and without continuing to test values after the first match.
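If the StopIteration is unwanted, next accepts a default as its second argument. A minimal sketch, where the function name my_find follows the question and returning None for "not found" is an assumption:

```python
def my_find(list_of_lists, target, default=None):
    # The generator stops at the first sub-list containing target;
    # next() returns default when the generator yields nothing.
    return next(
        (index for index, sublist in enumerate(list_of_lists) if target in sublist),
        default,
    )

list1 = [[1, 2], [3, 4], [7, 8, 9]]
spam = my_find(list1, 7)
print(spam)                # 2
print(my_find(list1, 42))  # None
```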

List comprehension

I have some trouble with list comprehensions. I thought I already knew how to use them well, but evidently I don't.
So here is my code:
vector1=[x for x in range(0,351,10)]
first=list(range(0,91))
second=list(range(100,181))
third=list(range(190,271))
fourth=list(range(280,351))
Quadrants=first+second+third+fourth
string=['First']*91+['Second']*81+['Third']*81+['Fourth']*71
vector2=dict(zip(Quadrants,string))
Quadrant=[]
for n in range(len(vector1)):
    Quadrant+=[vector2[vector1[n]]]
So I want to replace the for loop with a list comprehension, but I can't... I tried this:
Quadrant=[y3 for y3 in [vector2[vector1[i]]] for i in range (len(vector1))]
Here's the code you're trying to convert to a listcomp:
Quadrant=[]
for n in range(len(vector1)):
    Quadrant+=[y[vector1[n]]]
First, you have to convert that into a form using append. There's really no reason to build a 1-element list out of y[vector1[n]] in the first place, so just scrap that and we have something we can append directly:
Quadrant=[]
for n in range(len(vector1)):
    Quadrant.append(y[vector1[n]])
And now, we have something we can convert directly into a list comprehension:
Quadrant = [y[vector1[n]] for n in range(len(vector1))]
That's all there is to it.
However, I'm not sure why you're doing for n in range(len(vector1)) in the first place if the only thing you need n for is vector1[n]. Just loop over vector1 directly:
Quadrant=[]
for value in vector1:
    Quadrant.append(y[value])
Which, again, can be converted directly:
Quadrant = [y[value] for value in vector1]
However, all of this assumes that your original explicit loop is correct in the first place, which obviously it isn't. Your vector1 is a dict, not a list. Looping over the keys from 0 to len(vector1) is just going to raise KeyErrors all over the place. Changing it to loop directly over vector1 will solve that problem, but it means you're looping over the keys. So I have no idea what your code was actually trying to do, but get the simple but verbose version right first, and you can probably convert it to a comprehension just as easily as the above.
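For reference, here is a runnable sketch of the final comprehension using the names as they appear in the question as posted here, where vector1 is a list built from range and vector2 is the lookup dict (the answer calls it y). The variable names labels and angles are introduced only for this sketch.

```python
vector1 = list(range(0, 351, 10))

# Angles covered by each quadrant, exactly as listed in the question.
angles = (list(range(0, 91)) + list(range(100, 181))
          + list(range(190, 271)) + list(range(280, 351)))
labels = ['First'] * 91 + ['Second'] * 81 + ['Third'] * 81 + ['Fourth'] * 71
vector2 = dict(zip(angles, labels))

# Iterate over the values directly instead of over range(len(vector1)).
Quadrant = [vector2[value] for value in vector1]
print(Quadrant[:3], Quadrant[-1])
```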

Python Spark split list into sublists divided by the sum of value inside elements

I am trying to split a list of objects in Python into sublists based on the cumulative value of one of the parameters in the objects. Let me illustrate with an example:
I have a list of objects like this:
[{x:1, y:2}, {x:3, y:2}, ..., {x:5, y: 1}]
and I want to divide this list into sub-lists where the total sum of x values inside a sublist will be the same (or roughly the same) so the result could look like this:
[[{x:3, y:1}, {x:3, y:1}, {x:4, y:1}], [{x:2, y:1}, {x:2, y:1}, {x:6, y:1}]]
Here the sum of the x values in each sub-list is equal to 10. The objects I am working with are a little more complicated, and my x values are floats. So I want to aggregate values from the ordered list until the sum of x values is >= 10, and then start a new sub-list.
In my case, the input is an ordered list, and the summation has to respect that order.
I have already done something like this in C#, where I iterate through all my elements and keep a running total of the "x" value. I add up x for consecutive objects until the total hits my threshold, then I create a new sub-list and reset the total.
Now I want to reimplement it in Python and then use it with Spark, so I am looking for a more "functional" implementation, maybe something that works nicely with a map-reduce framework. I can't figure out another way than the iterative approach.
If you have any suggestions, or possible solutions, I would welcome all constructive comments.
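For reference, the iterative approach described above might look like this in plain Python. This is only a sketch of that baseline, not a Spark or map-reduce solution; it assumes the objects are dicts with an "x" key, and the function name, the threshold parameter and keeping a final partial sub-list are all choices made for illustration.

```python
def split_by_cumulative_x(items, threshold):
    chunks = []       # finished sub-lists
    current = []      # sub-list being filled
    running = 0.0     # running total of x for the current sub-list
    for item in items:
        current.append(item)
        running += item["x"]
        if running >= threshold:
            chunks.append(current)
            current, running = [], 0.0
    if current:       # keep any leftover items that never reached the threshold
        chunks.append(current)
    return chunks

items = [{"x": 3, "y": 1}, {"x": 3, "y": 1}, {"x": 4, "y": 1},
         {"x": 2, "y": 1}, {"x": 2, "y": 1}, {"x": 6, "y": 1}]
chunks = split_by_cumulative_x(items, 10)
print([[item["x"] for item in chunk] for chunk in chunks])  # [[3, 3, 4], [2, 2, 6]]
```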
