Select first 20% of list, then next 20% of list - python

I have a list like this with about 141 entries:
training = [40.0,49.0,77.0,...... 3122.0]
and my goal is to select the first 20% of the list. I did it like this:
testfile_first20 = training[0:int(len(set(training))*0.2)]
testfile_second20 = training[int(len(set(training))*0.2):int(len(set(training))*0.4)]
testfile_third20 = training[int(len(set(training))*0.4):int(len(set(training))*0.6)]
testfile_fourth20 = training[int(len(set(training))*0.6):int(len(set(training))*0.8)]
testfile_fifth20 = training[int(len(set(training))*0.8):]
Is there any way to do this automatically in a loop? This is my way of selecting the Kfold.
Thank you.

You can use list comprehensions:
div_length = int(0.2*len(set(training)))
testfile_divisions = [training[i*div_length:(i+1)*div_length] for i in range(5)]
This will give you your results stacked in a list:
>>> [testfile_first20, testfile_second20, testfile_third20, testfile_fourth20, testfile_fifth20]
If len(training) does not divide equally into five parts, then you can either have five full divisions with a sixth taking the remainder as follows:
import math
div_length = math.floor(0.2*len(set(training)))
testfile_divisions = [training[i*div_length:min(len(training), (i+1)*div_length)] for i in range(6)]
or you can have four full divisions with the fifth taking the remainder as follows:
import math
div_length = math.ceil(0.2*len(set(training)))
testfile_divisions = [training[i*div_length:min(len(training), (i+1)*div_length)] for i in range(5)]

Here's a simple take with list comprehension
lst = list('abcdefghijkl')
l = len(lst)
[lst[i:i+l//5] for i in range(0, l, l//5)]
# [['a', 'b'],
# ['c', 'd'],
# ['e', 'f'],
# ['g', 'h'],
# ['i', 'j'],
# ['k', 'l']]
Edit: Actually now that I look at my answer, it's not a true 20% representation as it returns 6 sublists instead of 5. What is expected to happen when the list cannot be equally divided into 5 parts? I'll leave this up for now until further clarifications are given.

You can loop this by just storing the "size" of 20% and the current starting point in two variables. Then add one to the other:
start = 0
twenty_pct = len(training) // 5
parts = []
for k in range(5):
    parts.append(training[start:start+twenty_pct])
    start += twenty_pct
However, I suspect there are numpy/pandas/scipy operations that might be a better match for what you want. For example, sklearn includes a function called KFold: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html

Something like this, though you may lose an element due to rounding.
tlen = float(len(training))
testfiles = [ training[ int(i*0.2*tlen): int((i+1)*0.2*tlen) ] for i in range(5) ]
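One way to avoid dropping elements when the length is not a multiple of five is to spread the remainder over the first few parts. The following is only a sketch: the helper name split_into_folds and the stand-in data are assumptions for illustration, not part of the question.

```python
def split_into_folds(items, k=5):
    """Split items into k consecutive chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` chunks each take one additional element.
        size = base + (1 if i < extra else 0)
        folds.append(items[start:start + size])
        start += size
    return folds


# Stand-in data with 141 entries, as in the question (values are made up).
training = [float(x) for x in range(141)]
folds = split_into_folds(training)
print([len(f) for f in folds])  # [29, 28, 28, 28, 28]
```

Concatenating the chunks gives back the original list, so no element is lost or duplicated.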

Related

Creating a Python list with given indexes for each repeating element

First list: contains the lists of indexes for each corresponding category name
Second list: contains the category names as strings
Intervals=[[Indexes_Cat1],[Indexes_Cat2],[Indexes_Cat3], ...]
Category_Names=["cat1","cat2","cat3",...]
Desired Output:
list=["cat1", "cat1","cat2","cat3","cat3"]
where the index of each element in the output list is given by the Intervals list.
Ex1:
Intervals=[[0,4], [2,3] , [1,5]]
Category_Names=["a","b","c"]
Ex: Output1
["a","c","b","b","a","c"]
Edit: More Run Cases
Ex2:
Intervals=[[0,1], [2,3] , [4,5]]
Category_Names=["a","b","c"]
Ex: Output2
["a","a","b","b","c","c"]
Ex3:
Intervals=[[3,4], [1,5] , [0,2]]
Category_Names=["a","b","c"]
Ex: Output3
["c","b","c","a","a","b"]
My solution:
Create an empty array of size n.
Run a for loop for each category.
output=[""]*n
for i in range(len(Category_Names)):
    for index in Intervals[i]:
        output[index]=Category_Names[i]
Is there a better solution, or a more pythonic way? Thanks
def categorise(Intervals=[[0,4], [2,3] , [1,5]],
               Category_Names=["a","b","c"]):
    flattened = sum(Intervals, [])
    answer = [None] * (max(flattened) + 1)
    for indices, name in zip(Intervals, Category_Names):
        for i in indices:
            answer[i] = name
    return answer
assert categorise() == ['a', 'c', 'b', 'b', 'a', 'c']
assert categorise([[3,4], [1,5] , [0,2]],
                  ["a","b","c"]) == ['c', 'b', 'c', 'a', 'a', 'b']
Note that in this code you will get None values in the answer if the "intervals" don't cover all integers from zero to the max interval number. It is assumed that the input is compatible.
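If the input cannot be trusted to be compatible, the gaps and overlaps can be detected while filling the list. This is a minimal sketch under that assumption; the name categorise_checked and the choice of raising ValueError are hypothetical, not part of the answer above.

```python
def categorise_checked(intervals, category_names):
    """Build the output list, rejecting index lists that overlap or leave gaps."""
    # If every index is in range and none repeats, all positions get filled.
    size = sum(len(indices) for indices in intervals)
    answer = [None] * size
    for indices, name in zip(intervals, category_names):
        for i in indices:
            if not 0 <= i < size:
                raise ValueError(f"index {i} leaves a gap in the output")
            if answer[i] is not None:
                raise ValueError(f"index {i} is assigned twice")
            answer[i] = name
    return answer


print(categorise_checked([[0, 4], [2, 3], [1, 5]], ["a", "b", "c"]))
# ['a', 'c', 'b', 'b', 'a', 'c']
```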
I am not sure if there is a way to avoid the nested loop (I can't think of any right now) so it seems your solution is good.
A way you could do it a bit better is to construct the output array with one of the categories:
output = [Category_Names[0]]*n
and then start the iteration skipping that category:
for i in range(1, len(Category_Names)):
If you know there is a category that appears more than the others then you should use that as the one initializing the array.
I hope this helps!
You can reduce the amount of strings created and use enumerate to avoid range(len(..)) for indexing.
Intervals=[[0,4], [2,3] , [1,5]]
Category_Names=["a","b","c"]
n = max(x for a in Intervals for x in a) + 1
# do not construct strings that get replaced anyhow
output=[None] * n
for i,name in enumerate(Category_Names):
    for index in Intervals[i]:
        output[index]=name
print(output)
Output:
['a', 'c', 'b', 'b', 'a', 'c']

all combination of a complicated list

I want to find all possible combination of the following list:
data = ['a','b','c','d']
I know it looks a straightforward task and it can be achieved by something like the following code:
comb = [c for i in range(1, len(data)+1) for c in combinations(data, i)]
but what I want is actually a way to give each element of the list data two possibilities ('a' or '-a').
An example of the combinations can be ['a','b'] , ['-a','b'], ['a','b','-c'], etc.
without something like the following case of course ['-a','a'].
You could write a generator function that takes a sequence and yields each possible combination of negations. Like this:
import itertools
def negations(seq):
    for prefixes in itertools.product(["", "-"], repeat=len(seq)):
        yield [prefix + value for prefix, value in zip(prefixes, seq)]
print(list(negations(["a", "b", "c"])))
Result (whitespace modified for clarity):
[
[ 'a', 'b', 'c'],
[ 'a', 'b', '-c'],
[ 'a', '-b', 'c'],
[ 'a', '-b', '-c'],
['-a', 'b', 'c'],
['-a', 'b', '-c'],
['-a', '-b', 'c'],
['-a', '-b', '-c']
]
You can integrate this into your existing code with something like
comb = [x for i in range(1, len(data)+1) for c in combinations(data, i) for x in negations(c)]
Once you have the regular combinations generated, you can do a second pass to generate the ones with "negation." I'd think of it like a binary number, with the number of elements in your list being the number of bits. Count from 0b0000 to 0b1111 via 0b0001, 0b0010, etc., and wherever a bit is set, negate that element in the result. This will produce 2^n combinations for each input combination of length n.
Here is a one-liner, but it can be hard to follow:
from itertools import product
comb = [sum(t, []) for t in product(*[([x], ['-' + x], []) for x in data])]
First, map each element of data to the list of forms it can take in a result: itself, its negation, or absent. Then take the product (product(*...)) to get all possibilities. Finally, flatten each combination with sum.
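To see what the one-liner produces, here is the same expression run on a shortened two-element input (the shortened data is an assumption for illustration). Note that the result also contains the empty combination, which may need to be filtered out.

```python
from itertools import product

data = ['a', 'b']  # shortened input, assumed for illustration

# Each element may appear as itself, negated, or not at all.
options = [([x], ['-' + x], []) for x in data]
comb = [sum(t, []) for t in product(*options)]
print(comb)
# [['a', 'b'], ['a', '-b'], ['a'], ['-a', 'b'], ['-a', '-b'], ['-a'], ['b'], ['-b'], []]

# The last entry is the empty combination; drop it if it is not wanted.
non_empty = [c for c in comb if c]
print(len(non_empty))  # 8
```

With three choices per element, a list of n elements gives 3**n results including the empty one.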
My solution basically has the same idea as John Zwinck's answer. After you have produced the list of all combinations
comb = [c for i in range(1, len(data)+1) for c in combinations(data, i)]
you generate all possible positive/negative combinations for each element of comb. I do this by iterating through the total number of sign combinations, 2**N, and treating each count as a binary number, where each binary digit stands for the sign of one element. (E.g. a two-element list would have 4 possible combinations, 0 to 3, represented by 0b00 => (+,+), 0b01 => (-,+), 0b10 => (+,-) and 0b11 => (-,-).)
def twocombinations(it):
    sign = lambda c, i: "-" if c & 2**i else ""
    l = list(it)
    if len(l) < 1:
        return
    # for each possible combination, make a tuple with the appropriate
    # sign before each element
    for c in range(2**len(l)):
        yield tuple(sign(c, i) + el for i, el in enumerate(l))
Now we apply this function to every element of comb and flatten the resulting nested iterator:
l = itertools.chain.from_iterable(map(twocombinations, comb))

Python: Append double items to new array

Let's say I have an array "array_1" with these items:
A b A c
I want to get a new array "array_2" which looks like this:
b A c A
I tried this:
array_1 = ['A','b','A','c' ]
array_2 = []
for item in array_1:
    if array_1[array_1.index(item)] == array_1[array_1.index(item)].upper():
        array_2.append(array_1[array_1.index(item)+1]+array_1[array_1.index(item)])
The problem: The result looks like this:
b A b A
Does anyone know how to fix this? This would be really great!
Thanks, Nico.
It's because you have two 'A's in your array. In both cases, for 'A',
array_1[array_1.index(item)+1]
will equal 'b' because the index method returns the first index of 'A'.
To correct this behavior, I suggest using an integer that you increment for each item. In that case you'll retrieve the n-th item of the array and your program won't return the same 'A' twice.
Responding to your comment, let's take back your code and add the integer:
array_1 = ['A','b','A','c' ]
array_2 = []
i = 0
for item in array_1:
    if array_1[i] == array_1[i].upper():
        array_2.append(array_1[i+1]+array_1[i])
    i = i + 1
In that case it works, but be careful: you need to add an if statement for the case where the last item of your array is an 'A', for example, because then array_1[i+1] won't exist.
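The guard for a trailing upper case item could look roughly like this. It is a sketch only: the function name pair_with_next is hypothetical, and enumerate stands in for the manual counter.

```python
def pair_with_next(items):
    """Join each upper case item with the item that follows it, if there is one."""
    result = []
    for i, item in enumerate(items):
        # Skip an upper case item at the very end: it has no successor.
        if item == item.upper() and i + 1 < len(items):
            result.append(items[i + 1] + item)
    return result


print(pair_with_next(['A', 'b', 'A', 'c']))       # ['bA', 'cA']
print(pair_with_next(['A', 'b', 'A', 'c', 'A']))  # ['bA', 'cA']
```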
I think that a simple flat list is the wrong data structure for the job if each lower case letter is paired with the consecutive upper case letter. I would turn it into a list of two-tuples, i.e.:
['A', 'b', 'A', 'c'] becomes [('A', 'b'), ('A', 'c')]
Then if you are looping through the items in the list:
for item in output_list:
    print(item[0]) # prints 'A'
    print(item[1]) # prints 'b' (for first item)
To do this:
input_list = ['A', 'b', 'A', 'c']
output_list = []
i = 0
while i < len(input_list):
    output_list.append((input_list[i], input_list[i+1]))
    i = i + 2
Then you can swap the order of the upper case letters and the lower case letters really easily using a list comprehension:
swapped = [(item[1], item[0]) for item in output_list]
Edit:
As you might have more than one lower case letter for each upper case letter you could use a list for each group, and then have a list of these groups.
def group_items(input_list):
    output_list = []
    current_group = []
    for current_item in input_list:
        if current_item == current_item.upper() and current_group:
            # Upper case letter, so close the previous group and start a new one
            output_list.append(current_group)
            current_group = []
        current_group.append(current_item)
    # Keep the final group as well
    if current_group:
        output_list.append(current_group)
    return output_list
Then you can reverse each of the internal lists really easily:
[list(reversed(group)) for group in group_items(input_list)]
According to your last comment, you can get what you want using this:
array_1 = "SMITH Mike SMITH Judy".split()
surnames = array_1[0::2]
names = array_1[1::2]
print(array_1)
array_1[0::2] = names
array_1[1::2] = surnames
print(array_1)
You get:
['SMITH', 'Mike', 'SMITH', 'Judy']
['Mike', 'SMITH', 'Judy', 'SMITH']
If I understood your question correctly, then you can do this.
It will work for any array of even length.
array_1 = ['A','b','A','c' ]
array_2 = []
for index,itm in enumerate(array_1):
    if index % 2 == 0:
        array_2.append(array_1[index+1])
        array_2.append(array_1[index])
print(array_2)
Output:
['b', 'A', 'c', 'A']

python how to efficiently cycle through few elements in a list

I have a very long list in which I would like to replace strings. I have made a simplified example below to illustrate my problem.
my_list = ['a7_1_1', 'a7_2_1', 'a7_3_1','a7_1_2', 'a7_2_2', 'a7_3_2','a7_1_3', 'a7_2_3', 'a7_3_3']
Out[12]:
['a7_1_1',
'a7_2_1',
'a7_3_1',
'a7_1_2',
'a7_2_2',
'a7_3_2',
'a7_1_3',
'a7_2_3',
'a7_3_3']
I would like to replace the strings with a suffix added to the first 3 strings so the final list should look like:
my_new_list = ['a7_1_1', 'a7_2_1', 'a7_3_1','a7_1_1.1', 'a7_2_1.1', 'a7_3_1.1','a7_1_1.2', 'a7_2_1.2', 'a7_3_1.2']
Out[15]:
['a7_1_1',
'a7_2_1',
'a7_3_1',
'a7_1_1.1',
'a7_2_1.1',
'a7_3_1.1',
'a7_1_1.2',
'a7_2_1.2',
'a7_3_1.2']
Is there an easy way to do this?
Using the itertools.cycle() function:
import itertools as it #1
def cycle_first_n(lst, n):
    """ cycles through first n elements of the list """
    c = it.cycle(lst[:n]) #2
    for idx in range(len(lst)): #3
        sfx = idx // n
        yield next(c) + ('.' + str(sfx) if sfx > 0 else '') #4
#1 itertools is a library for creating iterators for efficient looping
#2 creates an iterator that cycles through a slice of the first n elements of the list
#3 range is lazy in Python 3, so it avoids creating a presumably long list in memory (see the question); in Python 2, use xrange instead
#4 yield means we are creating a generator, again to avoid creating a long list in memory
How to use the function
lst = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
for o in cycle_first_n(lst, 3):
    print(o, end=' ')
Output
a b c a.1 b.1 c.1 a.2 b.2
I'm not sure what you mean. Check whether this is what you want:
>>> my_list = ['a7_1_1', 'a7_2_1', 'a7_3_1','a7_1_2', 'a7_2_2', 'a7_3_2','a7_1_3', 'a7_2_3', 'a7_3_3']
>>> my_new_list = sum([[x, x+'.1', x+'.2'] for x in my_list[:3]], [])
>>> print(my_new_list)
['a7_1_1', 'a7_1_1.1', 'a7_1_1.2', 'a7_2_1', 'a7_2_1.1', 'a7_2_1.2', 'a7_3_1', 'a7_3_1.1', 'a7_3_1.2']
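The result above groups the suffixed names by base string, while the question lists them round by round. If that order matters, indexing with the remainder and quotient of each position is one way to get it. This is a sketch; the function name suffix_cycle is hypothetical.

```python
def suffix_cycle(items, n):
    """Repeat the first n items over the whole length, adding '.k' in round k."""
    base = items[:n]
    result = []
    for idx in range(len(items)):
        round_no = idx // n  # 0 for the first n items, then 1, 2, ...
        name = base[idx % n]
        result.append(name if round_no == 0 else f"{name}.{round_no}")
    return result


my_list = ['a7_1_1', 'a7_2_1', 'a7_3_1', 'a7_1_2', 'a7_2_2', 'a7_3_2',
           'a7_1_3', 'a7_2_3', 'a7_3_3']
print(suffix_cycle(my_list, 3))
# ['a7_1_1', 'a7_2_1', 'a7_3_1', 'a7_1_1.1', 'a7_2_1.1', 'a7_3_1.1',
#  'a7_1_1.2', 'a7_2_1.2', 'a7_3_1.2']
```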

How to pop() a list n times in Python?

I have a photo chooser function that counts the number of files in a given directory and makes a list of them. I want it to return only 5 image URLs. Here's the function:
from os import listdir
from os.path import join, isfile
from random import shuffle
def choose_photos(account):
    photos = []
    # photos dir
    pd = join(r'C:\omg\photos', account)
    # of photos
    nop = len([name for name in listdir(pd) if isfile(join(pd, name))]) - 1
    # list of photos
    pl = list(range(0, nop))
    if len(pl) > 5:
        extra = len(pl) - 5
        # How can I pop extra times, so I end up with a list of 5 numbers
    shuffle(pl)
    for p in pl:
        photos.append(join(r'C:\omg\photos', account, str(p) + '.jpg'))
    return photos
I'll go ahead and post a couple answers. The easiest way to get some of a list is using slice notation:
pl = pl[:5] # get the first five elements.
If you really want to pop from the list this works:
while len(pl) > 5:
    pl.pop()
If you're after a random selection of the choices from that list, this is probably most effective:
import random
random.sample(range(10), 3)
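Applied to the list from the question, that might look like the sketch below. The helper name pick_random and the stand-in list are assumptions; the min() call is there because random.sample raises ValueError when asked for more items than the list holds.

```python
import random


def pick_random(items, k=5):
    """Return up to k distinct items chosen at random, leaving items unchanged."""
    return random.sample(items, min(k, len(items)))


pl = list(range(12))  # stand-in for the list of photo numbers
chosen = pick_random(pl)
print(len(chosen))  # 5
```

Because the sample is already in random order, a separate shuffle is not needed afterwards.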
Since this is a list, you can just get the last five elements by slicing it:
last_photos = photos[-5:]
This returns a shallow copy: the new list is a separate object, but it holds references to the same elements, so a change to a mutable element is visible through both lists. If you don't want this behaviour, make a deep copy first.
import copy
last_photos = copy.deepcopy(photos)[-5:]
edit:
should of course have been [-5:] instead of [:-5]
But if you actually want to 'pop' it 5 times, this means you want the list without its last 5 elements...
In most languages pop() removes and returns the last element from a collection. So to remove and return n elements at the same time, how about:
def getPhotos(num):
    return [pl.pop() for _ in range(0,num)]
Quick and simple -
a = list("test string")
print(a[5:])#['s', 't', 'r', 'i', 'n', 'g']
a[:5] = []
print(a)#['s', 't', 'r', 'i', 'n', 'g']
