UnboundLocalError on line 10 - Python

Line 10 of the Python code below raises an UnboundLocalError. Can anyone explain how to fix this?
def answer(data, n):
    new_data = []
    for each_integer in data:
        new_data = [each_integer for each_integer in data if data.count(each_integer) == n]
    if n > 1:
        new_data = data
    print("\n\nNew Data")
    print(new_data)
supplied_data = [53, 85, 29, 23, 29, 26, 88, 78, 5, 75, 74, 44, 33, 62, 98, 50, 89, 93, 24, 14, 74, 49, 83, 45, 41, 14, 68, 76, 68, 8, 77, 85, 17, 3, 9, 30, 71, 48, 18, 25, 86, 55, 55, 20, 74, 76, 99, 87, 59, 87, 36, 29, 29, 8, 22, 65, 1, 18, 23, 5, 13, 60, 7, 5, 98, 61, 78, 64, 36, 60, 49, 57, 31, 32, 41, 86, 52, 90, 9, 55, 35, 35, 2, 44, 8, 19, 96, 81, 68, 7, 8, 51, 9, 76, 12, 96, 61, 99, 74]
answer(supplied_data, 0)
answer(supplied_data, 1)
answer(supplied_data, 6)
The traceback:
>>> def answer(data, n):
... for each_integer in data:
... new_data = [each_integer for each_integer in data if data.count(each_integer) == n]
... if n > 1:
... new_data = data
... print("\n\nNew Data")
... print(new_data)
...
... supplied_data = [53, 85, 29, 23, 29, 26, 88, 78, 5, 75, 74, 44, 33, 62, 98, 50, 89, 93, 24, 14, 74, 49, 83, 45, 41, 14, 68, 76, 68, 8, 77, 85, 17, 3, 9, 30, 71, 48, 18, 25, 86, 55, 55, 20,
74, 76, 99, 87, 59, 87, 36, 29, 29, 8, 22, 65, 1, 18, 23, 5, 13, 60, 7, 5, 98, 61, 78, 64, 36, 60, 49, 57, 31, 32, 41, 86, 52, 90, 9, 55, 35, 35, 2, 44, 8, 19, 96, 81, 68, 7, 8, 51, 9, 76, 12
, 96, 61, 99, 74]
File "<stdin>", line 9
supplied_data = [53, 85, 29, 23, 29, 26, 88, 78, 5, 75, 74, 44, 33, 62, 98, 50, 89, 93, 24, 14, 74, 49, 83, 45, 41, 14, 68, 76, 68, 8, 77, 85, 17, 3, 9, 30, 71, 48, 18, 25, 86, 55, 55, 20,
74, 76, 99, 87, 59, 87, 36, 29, 29, 8, 22, 65, 1, 18, 23, 5, 13, 60, 7, 5, 98, 61, 78, 64, 36, 60, 49, 57, 31, 32, 41, 86, 52, 90, 9, 55, 35, 35, 2, 44, 8, 19, 96, 81, 68, 7, 8, 51, 9, 76, 12
, 96, 61, 99, 74]
^
SyntaxError: invalid syntax
>>> answer(supplied_data, 0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'answer' is not defined
>>> answer(supplied_data, 1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'answer' is not defined
>>> answer(supplied_data, 6)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'answer' is not defined
>>>

The problem is that one branch of your logic uses new_data, but new_data may never have been assigned. This happens when data is empty (so the for-loop body never runs and new_data is never created) and n <= 1 (so you enter the else-block, which uses new_data before it has been defined).
See:
>>> answer([],1)
New Data
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 10, in answer
UnboundLocalError: local variable 'new_data' referenced before assignment
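A minimal sketch of the same failure mode, using hypothetical helpers first_even_broken and first_even_fixed (not from the question): a name that is only assigned inside a loop body is unbound when the loop never runs, and binding it before the loop fixes that.

```python
def first_even_broken(values):
    for v in values:
        if v % 2 == 0:
            result = v
            break
    # If values is empty (or holds no even number), result was never assigned,
    # so this line raises UnboundLocalError.
    return result


def first_even_fixed(values):
    result = None  # bind the name up front so every code path defines it
    for v in values:
        if v % 2 == 0:
            result = v
            break
    return result


try:
    first_even_broken([])
except UnboundLocalError as exc:
    print("broken:", exc)

print("fixed:", first_even_fixed([]))  # prints: fixed: None
```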
You could fix this quickly by putting new_data = data at the top of your function. However, this approach is inefficient: some_list.count iterates over the entire list on every call, so the list comprehension alone runs in quadratic time (and the surrounding for-loop rebuilds that same list once per element). It is better to make one pass to count the elements and a second pass to filter, which runs in linear time. In the session below, data is the question's supplied_data list:
>>> from collections import Counter
>>> def answer(data, n):
...     counts = Counter(data)
...     return [e for e in data if counts[e] == n]
...
>>> answer(data, 6)
[]
>>> answer(data, 2)
[85, 23, 78, 44, 98, 14, 49, 41, 14, 85, 18, 86, 99, 87, 87, 36, 18, 23, 60, 7, 98, 61, 78, 36, 60, 49, 41, 86, 35, 35, 44, 96, 7, 96, 61, 99]
>>> answer(data, 3)
[5, 68, 76, 68, 9, 55, 55, 76, 5, 5, 9, 55, 68, 9, 76]
>>> answer(data, 4)
[29, 29, 74, 74, 8, 74, 29, 29, 8, 8, 8, 74]
>>> answer(data, 5)
[]

Related

Keep remaining numbers in range 100 except numbers in the array

Array
a = (0, 3, 5, 8, 11, 12, 14, 15, 18, 20, 21, 22, 26, 26, 28, 33, 38, 41, 42, 42, 51, 52, 61, 62, 64, 65, 67, 69, 73, 76, 79, 82, 83, 84, 85, 86, 93, 94, 96, 97)
How can I print the remaining numbers in the range 0-100, excluding those in a?
You can use sets and subtract set(a) from the set of numbers 0-100:
a = (0, 3, 5, 8, 11, 12, 14, 15, 18, 20, 21, 22, 26, 26, 28, 33, 38, 41, 42, 42, 51, 52, 61, 62, 64, 65, 67, 69, 73, 76, 79, 82, 83, 84, 85, 86, 93, 94, 96, 97)
print(set(range(101)) - set(a))
Prints:
{1, 2, 4, 6, 7, 9, 10, 13, 16, 17, 19, 23, 24, 25, 27, 29, 30, 31, 32, 34, 35, 36, 37, 39, 40, 43, 44, 45, 46, 47, 48, 49, 50, 53, 54, 55, 56, 57, 58, 59, 60, 63, 66, 68, 70, 71, 72, 74, 75, 77, 78, 80, 81, 87, 88, 89, 90, 91, 92, 95, 98, 99, 100}
If order matters, you can filter the range instead, removing the items found in a, while still using set(a) to keep the membership tests efficient.
a = (0, 3, 5, 8, 11, 12, 14, 15, 18, 20, 21, 22, 26, 26, 28, 33, 38, 41, 42, 42, 51, 52, 61, 62, 64, 65, 67, 69, 73, 76, 79, 82, 83, 84, 85, 86, 93, 94, 96, 97)
s_a = set(a)
filtered = [n for n in range(101) if n not in s_a]
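A quick check of the two approaches above, reusing the tuple a from the question: the list comprehension keeps ascending order, and sorting the set difference yields the same list. sorted() is shown here only as an alternative for comparison.

```python
a = (0, 3, 5, 8, 11, 12, 14, 15, 18, 20, 21, 22, 26, 26, 28, 33, 38, 41, 42, 42,
     51, 52, 61, 62, 64, 65, 67, 69, 73, 76, 79, 82, 83, 84, 85, 86, 93, 94, 96, 97)

s_a = set(a)  # average O(1) membership tests instead of scanning the tuple
filtered = [n for n in range(101) if n not in s_a]

# Sorting the set difference produces the same ordered list.
via_sets = sorted(set(range(101)) - s_a)

print(filtered == via_sets)  # True
print(len(filtered))         # 63 (101 numbers minus 38 distinct values in a)
```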

PySpark RDD filtered-out elements coming back

I am trying to implement a Sieve of Eratosthenes using PySpark.
For this, I apply many filters to my RDD, but on each iteration, whatever was filtered out during previous iterations keeps coming back, and I wonder why.
Here's the code:
from math import ceil
from math import sqrt
min_number = 2
max_number = 101
rdd = sc.parallelize(range(min_number, max_number), 4)
pivot = min_number
max_pivot = ceil(sqrt(max_number))
while pivot <= max_pivot:
    print "RDD for pivot = " + str(pivot) + ":"
    rdd = rdd.filter(lambda x: x <= pivot or x % pivot != 0)
    pivot = rdd.filter(lambda x: x > pivot).reduce(min)
    rdd.collect()
And the output:
Pivot = 2
[2, 3, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19, 20, 22, 23, 25, 26, 28, 29, 31, 32, 34, 35, 37, 38, 40, 41, 43, 44, 46, 47, 49, 50, 52, 53, 55, 56, 58, 59, 61, 62, 64, 65, 67, 68, 70, 71, 73, 74, 76, 77, 79, 80, 82, 83, 85, 86, 88, 89, 91, 92, 94, 95, 97, 98, 100]
Pivot = 3
[2, 3, 4, 5, 6, 7, 9, 10, 11, 13, 14, 15, 17, 18, 19, 21, 22, 23, 25, 26, 27, 29, 30, 31, 33, 34, 35, 37, 38, 39, 41, 42, 43, 45, 46, 47, 49, 50, 51, 53, 54, 55, 57, 58, 59, 61, 62, 63, 65, 66, 67, 69, 70, 71, 73, 74, 75, 77, 78, 79, 81, 82, 83, 85, 86, 87, 89, 90, 91, 93, 94, 95, 97, 98, 99]
Pivot = 4
[2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 16, 17, 18, 19, 21, 22, 23, 24, 26, 27, 28, 29, 31, 32, 33, 34, 36, 37, 38, 39, 41, 42, 43, 44, 46, 47, 48, 49, 51, 52, 53, 54, 56, 57, 58, 59, 61, 62, 63, 64, 66, 67, 68, 69, 71, 72, 73, 74, 76, 77, 78, 79, 81, 82, 83, 84, 86, 87, 88, 89, 91, 92, 93, 94, 96, 97, 98, 99]
Pivot = 5
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23, 25, 26, 27, 28, 29, 31, 32, 33, 34, 35, 37, 38, 39, 40, 41, 43, 44, 45, 46, 47, 49, 50, 51, 52, 53, 55, 56, 57, 58, 59, 61, 62, 63, 64, 65, 67, 68, 69, 70, 71, 73, 74, 75, 76, 77, 79, 80, 81, 82, 83, 85, 86, 87, 88, 89, 91, 92, 93, 94, 95, 97, 98, 99, 100]
Pivot = 6
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 29, 30, 31, 32, 33, 34, 36, 37, 38, 39, 40, 41, 43, 44, 45, 46, 47, 48, 50, 51, 52, 53, 54, 55, 57, 58, 59, 60, 61, 62, 64, 65, 66, 67, 68, 69, 71, 72, 73, 74, 75, 76, 78, 79, 80, 81, 82, 83, 85, 86, 87, 88, 89, 90, 92, 93, 94, 95, 96, 97, 99, 100]
Pivot = 7
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 29, 30, 31, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44, 45, 46, 47, 49, 50, 51, 52, 53, 54, 55, 57, 58, 59, 60, 61, 62, 63, 65, 66, 67, 68, 69, 70, 71, 73, 74, 75, 76, 77, 78, 79, 81, 82, 83, 84, 85, 86, 87, 89, 90, 91, 92, 93, 94, 95, 97, 98, 99, 100]
Pivot = 8
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23, 24, 25, 26, 28, 29, 30, 31, 32, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44, 46, 47, 48, 49, 50, 51, 52, 53, 55, 56, 57, 58, 59, 60, 61, 62, 64, 65, 66, 67, 68, 69, 70, 71, 73, 74, 75, 76, 77, 78, 79, 80, 82, 83, 84, 85, 86, 87, 88, 89, 91, 92, 93, 94, 95, 96, 97, 98, 100]
Pivot = 9
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 52, 53, 54, 55, 56, 57, 58, 59, 61, 62, 63, 64, 65, 66, 67, 68, 69, 71, 72, 73, 74, 75, 76, 77, 78, 79, 81, 82, 83, 84, 85, 86, 87, 88, 89, 91, 92, 93, 94, 95, 96, 97, 98, 99]
Pivot = 10
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 100]
Pivot = 11
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 97, 98, 99, 100]
As you can see, on each iteration only multiples of the current pivot are filtered out, while numbers that had already been filtered out keep coming back, even though I replace the rdd reference on each iteration.
In case it is of any help, I am running PySpark 2.0.1 on Python 2.7.10 for Mac.
Thanks!
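Python closures look up free variables when the function is called, not when it is created (late binding). Spark transformations are lazy, so the filter functions only run when an action such as reduce or collect executes, and by then pivot has its current value.
As a result, in the first iteration rdd is evaluated as: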
(sc.parallelize(range(min_number, max_number), 4)
.filter(lambda x: x <= 2 or x % 2 != 0))
in the second one:
(sc.parallelize(range(min_number, max_number), 4)
.filter(lambda x: x <= 3 or x % 3 != 0)
.filter(lambda x: x <= 3 or x % 3 != 0))
in the third one:
(sc.parallelize(range(min_number, max_number), 4)
.filter(lambda x: x <= 4 or x % 4 != 0)
.filter(lambda x: x <= 4 or x % 4 != 0)
.filter(lambda x: x <= 4 or x % 4 != 0))
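and each time, pivot is resolved to its current value in the enclosing scope.
A correct implementation binds the current pivot with a default argument, which is evaluated once, when f is defined:
while pivot <= max_pivot:
    # pivot=pivot freezes the value pivot has right now
    def f(x, pivot=pivot):
        return x <= pivot or x % pivot != 0
    rdd = rdd.filter(f)
    # min() is an action that runs immediately, so this lambda sees the current pivot
    pivot = rdd.filter(lambda x: x > pivot).min()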
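The late-binding effect can be reproduced without Spark. In this plain-Python sketch, a list of predicates stands in for the chained rdd.filter calls (an illustrative assumption; apply_all is a hypothetical helper). The lambdas created in the first loop all see the final value of pivot, while the pivot=pivot default freezes each value at definition time.

```python
# Late binding: each lambda looks up `pivot` when it is called, not when it is created.
late = []
for pivot in (2, 3, 5):
    late.append(lambda x: x <= pivot or x % pivot != 0)

# Default-argument binding: `pivot=pivot` is evaluated once, when each lambda is defined.
early = []
for pivot in (2, 3, 5):
    early.append(lambda x, pivot=pivot: x <= pivot or x % pivot != 0)

numbers = range(2, 31)


def apply_all(predicates, values):
    """Keep values that pass every predicate, like a chain of rdd.filter calls."""
    return [v for v in values if all(p(v) for p in predicates)]


print(apply_all(late, numbers))   # every predicate uses pivot == 5; only multiples of 5 above 5 go
print(apply_all(early, numbers))  # predicates use 2, 3 and 5: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```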

convert list of strings from file to list of integers

I have a large file of integers separated by commas and whitespace. I am trying to read it 1KB at a time and convert each chunk into a list of integers.
This code works fine:
import tempfile
with open('test_age.txt', 'r+') as inf:
    with open('test_age_out.txt', 'r+') as outf:
        sorted_list = []
        a = [x.strip() for x in inf.read(1000).split(',')]
        int_a = map(int, a)
        f = tempfile.TemporaryFile()
        outf_array = sorted(int_a)
        f.write(str(outf_array))
        f.seek(0)
        #etc...
output:
[1, 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, etc...
But once I add in a while loop to read the next 1KB:
with open('test_age.txt', 'r+') as inf:
    with open('test_age_out.txt', 'r+') as outf:
        sorted_list = []
        while True:
            a = [x.strip() for x in inf.read(1000).split(',')]
            int_a = map(int, a)
            if not a:
                break
            f = tempfile.TemporaryFile()
            outf_array = sorted(int_a)
            print outf_array
            f.write(str(outf_array))
            f.seek(0)
I get the output and a ValueError:
[1, 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8,
8, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 12, 12, 12,
12, 12, 12, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 15, 15, 16, 17, 18,
19, 19, 20, 20, 20, 20, 21, 21, 22, 22, 22, 23, 23, 24, 24, 24, 24, 25,
25, 25, 25, 25, 26, 26, 26, 26, 27, 27, 27, 28, 28, 29, 30, 30, 30, 30,
31, 31, 31, 32, 32, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 35, 35,
35, 35, 35, 36, 36, 37, 37, 37, 37, 38, 38, 39, 39, 39, 39, 39, 39, 40,
40, 40, 40, 41, 41, 42, 43, 43, 43, 44, 44, 44, 44, 44, 45, 46, 46, 46,
46, 47, 47, 47, 47, 47, 48, 48, 48, 48, 48, 48, 49, 49, 49, 50, 50, 50,
50, 50, 50, 51, 51, 51, 51, 51, 51, 52, 52, 52, 52, 52, 52, 53, 53, 54,
54, 54, 55, 55, 55, 55, 56, 56, 56, 56, 56, 57, 57, 57, 57, 58, 58, 58,
59, 59, 60, 60, 60, 61, 62, 62, 62, 62, 63, 63, 63, 63, 63, 63, 63, 64,
64, 64, 65, 66, 66, 67, 67, 67, 67, 68, 68, 68, 68, 68, 69, 69, 69, 69,
69, 69, 69, 70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 74, 75, 76, 76,
76, 76, 77, 77, 77, 77, 78, 78, 79, 79, 79, 79, 81, 81, 81, 81, 82, 82,
82, 82, 82, 83, 83, 83, 83, 84, 85, 85, 85, 85, 86, 86, 86, 87, 87, 87,
87, 87, 87, 88, 88, 88, 88, 88, 88, 88, 89, 89, 89, 89, 90, 90, 90, 91,
91, 91, 91, 91, 91, 91, 92, 92, 93, 93, 93, 94, 94, 94, 94, 95, 95,
96, 96, 96, 97, 97, 98, 99, 100, 100, 100, 100, 100]
[2, 3, 3, 3, 3, 4, 4, 5, 5, 6, 8, 9, 10, 10, 11, 11, 11, 11, 12, 12,12,
13, 14, 15, 17, 17, 17, 17, 17, 17, 18, 18, 18, 20, 21, 22, 22, 22, 22,
23, 23, 24, 24, 24, 26, 27, 27, 27, 27, 28, 28, 29, 29, 29, 29, 30, 32,
32, 32, 32, 33, 33, 34, 34, 36, 37, 37, 37, 37, 38, 39, 41, 41, 42, 43,
44, 44, 46, 46, 47, 48, 49, 49, 49, 49, 51, 51, 52, 52, 52, 52, 53, 54,
54, 54, 55, 55, 56, 60, 60, 61, 61, 61, 62, 63, 63, 64, 65, 65, 65, 65,
66, 66, 67, 68, 68, 68, 70, 70, 73, 73, 73, 74, 74, 75, 75, 75, 77, 77,
77, 77, 78, 78, 78, 78, 79, 80, 81, 81, 82, 82, 83, 83, 83, 83, 84, 84,
85, 85, 85, 85, 86, 87, 88, 90, 91, 91, 91, 92, 93, 93, 93, 94, 95, 97,
98, 98, 99, 100]
int_a = map(int, a)
ValueError: invalid literal for int() with base 10: ''
I am not sure why this is happening. When I print the lists, they seem to be created and sorted correctly. However, the ValueError still occurs. What gives?
Look at what str.split returns when the delimiter appears at the start or end of a string:
>>> ', 3, 5'.split(', ')
['', '3', '5']
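That empty string is what your program is trying (and failing) to parse as an integer. The same thing happens at the end of the file: inf.read(1000) returns '', and ''.split(',') returns [''] rather than an empty list, so your if not a check never breaks the loop, and map(int, a) runs (and fails) before that check anyway. Calling strip() on '' doesn't help (and isn't necessary for int(), by the way, since int() ignores leading and trailing whitespace). I recommend reading blocks that are guaranteed to be complete and valid, such as lines. If the file is just one big line, you'll have to do some extra work: save the trailing characters of each chunk (which may be a partial number) and prepend them to the next chunk. Don't forget to process the remaining characters after the loop.
new = ''
while True:
    line = inf.read(1000)
    if not line:  # end of file
        break
    new += line
    current, delimiter, new = new.rpartition(', ')
    # process current
# process the remaining characters in new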
If the file can comfortably fit in your system's memory, you could just read the entire file and split it in one go:
numbers = map(int, inf.read().split(', '))
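A runnable sketch of the carry-over idea, assuming the file is one long line of comma-separated integers; read_ints is a hypothetical helper, and io.StringIO stands in for the open file. It splits on ',' alone and relies on int() to strip the surrounding whitespace, rather than splitting on ', '.

```python
import io


def read_ints(stream, chunk_size=1000, sep=","):
    """Yield integers from a stream of sep-separated numbers, chunk_size characters at a time.

    Text after the last separator in a chunk may be a partial number, so it is
    carried over and prepended to the next chunk.
    """
    leftover = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # read() returns '' at end of file
            break
        complete, _, leftover = (leftover + chunk).rpartition(sep)
        for piece in complete.split(sep):
            if piece.strip():  # skip empty pieces, e.g. when complete == ''
                yield int(piece)  # int() ignores surrounding whitespace
    if leftover.strip():  # process whatever remains after the loop
        yield int(leftover)


data = io.StringIO("5, 3, 12, 7,\n 100, 42")
print(sorted(read_ints(data, chunk_size=4)))  # [3, 5, 7, 12, 42, 100]
```

A small chunk_size such as 4 forces numbers like 100 to straddle chunk boundaries, which exercises the carry-over; in practice you would keep a larger size such as 1000.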

not all duplicates being removed

This is sample code I am running to remove duplicates from a sorted, comma-separated list, but it is not removing some duplicates.
import sys
beginning=1;
prev=0;
f=open(sys.argv[1]);
for line in f:
    lst=line.split(",")
    for num in lst:
        if(beginning==1):
            sys.stdout.write("if case ")
            sys.stdout.write(num)
            beginning=0
            prev=num
        else:
            if(num==prev):
                continue;
            else:
                sys.stdout.write("else case ")
                sys.stdout.write(",")
                sys.stdout.write(num)
                prev=num
    beginning=1
I have tried many times to figure out what is wrong; the same logic works fine in Java.
You don't need that whole process when you can use set().
For example:
>>> my_list = [1,4,2,3,4,4,3,1,1,5,6,4,3,2]
>>> set(my_list)
set([1, 2, 3, 4, 5, 6])
>>>
set() removes all duplicate items from your list, leaving one of each item. Note that a set is unordered, so it does not necessarily preserve the sorted order of your input.
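If you need to keep the original (sorted) order, a set is not the right tool. One likely cause of the question's missed duplicates (an assumption, since the input file isn't shown) is that split(",") leaves surrounding whitespace and the trailing newline on each piece, so '3' and '3\n' compare unequal. A minimal sketch that converts each piece with int() and removes duplicates while keeping first-seen order via dict.fromkeys (dicts preserve insertion order in Python 3.7+); dedupe_line is a hypothetical helper name.

```python
def dedupe_line(line):
    """Return the unique integers in a comma-separated line, in first-seen order."""
    # int() ignores surrounding whitespace, so ' 4' and '4\n' both become 4.
    numbers = [int(piece) for piece in line.split(",") if piece.strip()]
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(numbers))


print(dedupe_line("1, 1, 2, 3, 3, 3, 4\n"))            # [1, 2, 3, 4]
print(",".join(map(str, dedupe_line("5,5,6,7,7\n"))))  # 5,6,7
```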
Given a file k.txt:
k.txt
1, 2, 3, 4, 5, 6, 4, 2, 3, 2, 1, 4, 6, 7, 4, 3, 4, 8, 9, 0, 0, 0
You can do the following:
import numpy as np
# split on commas and convert each piece to int; int() ignores the surrounding spaces and the newline
with open('k.txt') as f:
    a = [int(x) for x in f.read().split(',')]
np.unique(a)  # returns the unique values and sorts them for you :)
Why is this better than set? Here is a timing comparison on a large array (with time imported via from time import time):
a = np.random.randint(0,100,size=(100000))
>>> b = time(); set(a); print time()-b
set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
0.0197851657867
>>> b = time(); np.unique(a); print time()-b
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
0.00981211662292
So np.unique runs roughly twice as fast here (about 0.0098 s versus 0.0198 s for set).

Percent list slicing

I'm using Python 3.2.3 IDLE, and this is my code:
originalList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
newList = originalList[0.05:0.95] #<<<<I have no idea what I'm doing here
print (newList)
I have an original list of the numbers 1-100, and I want to make a new list from it that contains only the data in the 5%-95% sub-range of the original list, so the new list should look like [5, 6, 7, 8, ..., 95]. How do I do that? I know my newList code is wrong.
originalList.sort()
newList = originalList[int(len(originalList) * .05) : int(len(originalList) * .95)]
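The same index arithmetic can be wrapped in a small helper. percent_slice is a hypothetical name; it takes whole-number percentages so the indices come from integer arithmetic (Python 3 rejects float slice indices). It drops the first start_pct percent of items, so for 1-100 it starts at 6; subtract 1 from the start index if you want 5 included, as slice(4, 95) does.

```python
def percent_slice(seq, start_pct, end_pct):
    """Return the items of seq between start_pct and end_pct percent of its length.

    Integer arithmetic (len * pct // 100) avoids float indices, which Python 3
    rejects, and float rounding surprises.
    """
    n = len(seq)
    return seq[n * start_pct // 100 : n * end_pct // 100]


original = list(range(1, 101))
middle = percent_slice(original, 5, 95)
print(middle[0], middle[-1], len(middle))  # 6 95 90
```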
sl = slice(4, 95)
print(originalList[sl])
Also see http://docs.python.org/2/library/functions.html#slice
size = len(originalList)
newList = originalList[int(0.05*size) - 1:int(0.95*size)]
If you want to get part of a list, the syntax is:
List = [1,2,3,4,5,6,7,8,9,10]
newList = List[start_index:stop_index]
The first number is the index where the sublist starts, and the second is the index at which it stops (that index is not included).
Hope this helps!
I'd also generate the original list with range() instead of typing it out by hand; it's less error-prone.
originalList = list(range(1, 101))
newList = originalList[int(len(originalList)*.05) - 1:int(len(originalList)*.95)]
print(newList)
This gives the desired result, [5, 6, ..., 95].
Edit: Changed range to be more concise per comment below.
For lists of arbitrary length, you could do:
>>> l = range(200)
>>> percentage = 5
>>> skip = int(len(l) * (float(percentage) / 100) / 2)
>>> len(l[skip:len(l) - skip])
190
You could use the third-party fidx module, which allows percentages as indexes:
import fidx
originalList = fidx([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100])
# or better: originalList = fidx.list(range(1,101))
newList = originalList[0.05:0.95]
print (newList)
which returns
[6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95]
