Splitting a list of integers into a list of digits

I have a list which contains 8-digit integers, where each integer represents a flag. e.g:
qc = [11221427, 23414732, 144443277,...]
I want to create 8 new variables where first variable is the first digit of all the numbers and so on. e.g:
qc1 = [1,2,1]
qc2 = [1,3,4]
I am able to calculate it using the following code:
qc_str = [str(e) for e in qc]
k,l = 0,0
for item in qc_str:
    qc1[k] = int(qc_str[k][l])
    qc2[k] = int(qc_str[k][l+1])
    qc3[k] = int(qc_str[k][l+2])
    qc4[k] = int(qc_str[k][l+3])
    qc5[k] = int(qc_str[k][l+4])
    qc6[k] = int(qc_str[k][l+5])
    qc7[k] = int(qc_str[k][l+6])
    qc8[k] = int(qc_str[k][l+7])
    k += 1
It takes a long time to run on 100,000 rows. Is there a better or faster way of doing it? Any thoughts would be appreciated.

This is one way:
qc = [11221427, 23414732, 144443277]
lst = [list(map(int, i)) for i in zip(*map(str, qc))]
# [[1, 2, 1],
# [1, 3, 4],
# [2, 4, 4],
# [2, 1, 4],
# [1, 4, 4],
# [4, 7, 3],
# [2, 3, 2],
# [7, 2, 7]]
If you really need these as separate variables, either use lst[idx] or a dictionary {i: j for i, j in enumerate(lst, 1)}.

If by faster you mean a lower processing time, note that the str() and int() conversions are computationally fairly expensive.
Consider using integer division and modulus to extract the individual digits:
k-th digit (from the right) = number // 10^(k-1) % 10
Here is some quick and dirty code I used to confirm my hypothesis.
import time
l = [x for x in range(1000000,9999999)]
l2 = []
l3 = []
# Variant 1: convert to a string and index the second digit from the right
start = time.time()
for x in l:
    a = str(x)
    l2.append(int(a[-2]))
stop = time.time()
print("Elapsed time: ", stop-start)
# Variant 2: integer division and modulus
start = time.time()
for x in l:
    l3.append(x//10 % 10)
stop = time.time()
print("Elapsed time: ", stop-start)
Basically, I compare the time taken by a str()/int() round trip against integer division and modulus when extracting the second digit from the right.
I get the following output:
13.855608940124512
5.115100622177124
That's roughly a 2.7x performance boost.
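The integer-division idea above can be extended to produce all eight digit lists in one pass. The following is only a sketch: the name digit_columns and the width parameter are made up for illustration, and it assumes every flag has exactly width digits (a longer number would silently lose its leading digits).

```python
def digit_columns(numbers, width=8):
    """Return `width` lists; list i holds digit i (counted from the left) of every number.

    Assumption: each number has exactly `width` digits.
    """
    columns = [[] for _ in range(width)]
    for n in numbers:
        # Peel digits off the right-hand end, filling the columns right to left.
        for pos in range(width - 1, -1, -1):
            n, digit = divmod(n, 10)
            columns[pos].append(digit)
    return columns


qc = [11221427, 23414732, 14444327]
qc1, qc2, qc3, qc4, qc5, qc6, qc7, qc8 = digit_columns(qc)
print(qc1)  # [1, 2, 1]
print(qc2)  # [1, 3, 4]
```

Whether this beats the zip/map one-liner on real data is worth measuring rather than assuming, since the inner loop runs in pure Python.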

Related

lookup function for repeated elements in an array

I have a list-like python object of positive integers and I want to get which locations on that list have repeated values. For example
if the input is [0,1,1] the function should return [[1, 2]], because the value 1, which is the element at positions 1 and 2 of the input array, appears twice. Similarly:
[0,13,13] should return [[1, 2]]
[0,1,2,1,3,4,2,2] should return [[1, 3], [2, 6, 7]] because 1 appears twice, at positions [1, 3] of the input array and 2 appears 3 times at positions [2, 6, 7]
[1, 2, 3] should return an empty array []
What I have written is this:
import numpy as np
def get_locations(labels):
    out = []
    label_set = set(labels)
    for label in list(label_set):
        # Scans the whole input once per distinct label
        temp = [i for i, j in enumerate(labels) if j == label]
        if len(temp) > 1:
            out.append(np.array(temp))
    return np.array(out)
While it works fine for small input arrays, it gets too slow as the size grows. For instance, the code below, on my PC, jumps from 0.14 s when n = 1000 to 12 s when n = 10000:
from timeit import default_timer as timer
start = timer()
n = 10000
a = np.arange(n)
b = np.append(a, a[-1]) # append the last element to the end
out = get_locations(b)
end = timer()
print(out)
print(end - start) # Time in seconds
How can I speed this up please? Any ideas highly appreciated
Your nested loop results in O(n ^ 2) in time complexity. You can instead create a dict of lists to map indices to each label, and extract the sub-lists of the dict only if the length of the sub-list is greater than 1, which reduces the time complexity to O(n):
def get_locations(labels):
    positions = {}
    for index, label in enumerate(labels):
        positions.setdefault(label, []).append(index)
    return [indices for indices in positions.values() if len(indices) > 1]
so that get_locations([0, 1, 2, 1, 3, 4, 2, 2]) returns:
[[1, 3], [2, 6, 7]]
Your code is slow because of the nested for-loop. You can solve this in a more efficient way by using another data structure:
from collections import defaultdict
mylist = [0,1,2,1,3,4,2,2]
output = defaultdict(list)
# Loop once over mylist, store the indices of all unique elements
for i, el in enumerate(mylist):
    output[el].append(i)
# Filter out elements that occur only once
output = {k:v for k, v in output.items() if len(v) > 1}
This produces the following output for your example b:
{1: [1, 3], 2: [2, 6, 7]}
You can turn this result into the desired format:
list(output.values())
> [[1, 3], [2, 6, 7]]
Be aware, however, that this relies on the dictionary being insertion ordered, which is an implementation detail of CPython 3.6 and is guaranteed by the language only from Python 3.7 onwards.
Here's some code I implemented. It runs in linear time:
l = [0,1,2,1,3,4,2,2]
dict1 = {}
for j,i in enumerate(l): # O(n)
    temp = dict1.get(i) # O(1) most cases
    if not temp:
        dict1[i] = [j]
    else:
        dict1[i].append(j) # O(1)
print([item for item in dict1.values() if len(item) > 1]) # O(n)
Output:
[[1, 3], [2, 6, 7]]
This is essentially a time-complexity issue. Your algorithm has nested for loops that iterate through the list twice, so the time complexity is of the order of n^2, where n is the size of the list. So when you multiply the size of the list by 10 (from 1,000 to 10,000), you see an approximate time increase of 10^2 = 100. This is why it goes from 0.14 s to 12 s.
Here is a simple solution with no extra libraries required:
def get_locations(labels):
    locations = {}
    for index, label in enumerate(labels):
        if label in locations:
            locations[label].append(index)
        else:
            locations[label] = [index]
    return [locations[i] for i in locations if len(locations[i]) > 1]
Since the for loops are not nested, the running time is approximately proportional to 2n, i.e. linear, so you should see about a 2-times increase in time whenever the problem size is doubled.
You can try using the Counter class from the collections module:
from collections import Counter
list1 = [1,1,2,3,4,4,4]
Counter(list1)
You will get output similar to this, which gives the number of occurrences of each value but not their positions:
Counter({4: 3, 1: 2, 2: 1, 3: 1})
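Counter on its own only counts occurrences, so it needs a second pass to recover the positions the question asks for. A minimal sketch of that combination follows (still linear time; the function body is illustrative and not taken from any of the answers):

```python
from collections import Counter


def get_locations(labels):
    """Return the index lists of every value that occurs more than once."""
    counts = Counter(labels)  # first pass: how often does each value occur?
    positions = {}
    for index, label in enumerate(labels):  # second pass: record indices of repeats only
        if counts[label] > 1:
            positions.setdefault(label, []).append(index)
    return list(positions.values())


print(get_locations([0, 1, 2, 1, 3, 4, 2, 2]))  # [[1, 3], [2, 6, 7]]
print(get_locations([1, 2, 3]))                 # []
```

Compared with the single-pass dictionary answers, this variant never stores index lists for values that occur once, at the cost of iterating over the input twice.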

Error in converting an array to 2D matrix in Python

I have the problem regarding the output of this algorithm. For example: for input chunk([1, 2, 3, 4, 5, 6, 7, 8], 3) it should return [[ 1, 2, 3], [4, 5, 6], [7, 8, '']] but instead it returns [[7, 8, 6], [7, 8, 6], [7, 8, 6]].
However, when m_list is defined under the loop for r in range(rows):, it returns correct value.
I can't figure out why it returns wrong value if m_list is defined outside the loop for r in range(rows):. What could be the reason ?
# --- Directions
# Given an array and chunk size, divide the array into many subarrays
# where each subarray is of length size
# --- Examples
# chunk([1, 2, 3, 4], 2) --> [[ 1, 2], [3, 4]]
# chunk([1, 2, 3, 4, 5], 2) --> [[ 1, 2], [3, 4], [5, '']]
# chunk([1, 2, 3, 4, 5, 6, 7], 3) --> [[ 1, 2, 3], [4, 5, 6], [7, '', '']]
import math
def chunk (array, size):
    rows = 0
    l = len(array)
    if l % size == 0:
        rows = l/size
    else:
        rows = int(math.floor(l/size) + 1)
    m_list = ['' for e in range(size)]
    m_matrix = [['' for g in range(size)] for w in range(rows)]
    i = 0
    for r in range(rows):
        for u in range(size):
            if i == l:
                break
            else:
                m_list[u] = array[i]
                i += 1
        m_matrix[r] = m_list
    return m_matrix
length = int(raw_input('how many elements you want in the array?: '))
m_inputArray = ['' for q in range(length)]
print 'Debug0:--> ' + str(m_inputArray)
for z in range(length):
    p = int(raw_input('Enter the value at index %i: ' %(z)))
    m_inputArray[z] = p
m_inputSize = int(raw_input('Enter the size: '))
result = chunk(m_inputArray, m_inputSize)
print result
There are several things wrong with your code. Firstly, on every pass of the outer loop the starting value of m_list is the previous list (so the first time it is ['','',''], but the second time it is [1,2,3], and the third time it is [4,5,6]). With the seven-element example, only one value is left in the array on the third pass, so only the first value in m_list gets overwritten, resulting in an m_list of [7,5,6].
Secondly, by writing m_matrix[r] = m_list you are storing a reference to m_list; you are not copying m_list into m_matrix. This means that once m_list changes, so do the values in m_matrix. In the end you will have defined m_matrix to be [m_list,m_list,m_list], resulting in [[7,5,6],[7,5,6],[7,5,6]]. A solution for this would be to store a slice (a copy) of m_list instead, like this: m_matrix[r] = m_list[:].
This is how I would do the whole thing:
def chunk(inputarray,size):
    array = inputarray[:]
    m_matrix = []
    while len(array) > 0:
        if len(array[:size]) < size:
            array.extend(['' for j in range(size-len(array[:size]))])
        m_matrix.append(array[:size])
        del array[:size]
    return m_matrix
If you don't need the original array anymore you can also remove the array = inputarray[:] line of code. Also, probably not the fastest/best way of doing this, but I just wanted to provide something quick. This was done in python 2.7, so if you're using another version you might have to alter some things.
That seems a bit overcomplicated. This is what I came up with.
It is written for Python 3 but also works in Python 2.
def pop_with_replace(array, index=0, blank=''):
    try:
        return array.pop(index)
    except IndexError:
        return blank
def chunk(array, size):
    out = []
    while array:
        t_list = []
        for i in range(size):
            t_list.append(pop_with_replace(array))
        out.append(t_list)
    return out
if __name__ == '__main__':
    print(chunk(list(range(10)), 3))
There are some things we could change as well, such as replacing the pop_with_replace helper with a conditional expression. I didn't put this in the first solution, as they can be awkward to read if you're not used to them.
t_list.append(array.pop(0) if array else '')
Looking at this, we could roll it all up into a list comprehension, but it's starting to get hard to read.
while array:
    out.append([array.pop(0) if array else '' for x in range(size)])
But it does leave the final code looking nice and small.
def chunk(array, size):
    out = []
    while array:
        out.append([array.pop(0) if array else '' for x in range(size)])
    return out
In this example there is only one m_list, whose values you keep updating, so the final result is [m_list, m_list, m_list ... m_list].
If you define m_list inside the loop, a new list will be created on each pass.
You can also assign directly with m_matrix[r][u] = array[i].
Note that you are working with a list of lists, not a true matrix, and m_matrix[r] = m_list replaces the list at the r-th position with a reference to m_list.
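To make the aliasing point concrete, here is a small sketch. The first part shows two rows that are really the same list object; the second part is one possible slicing-based chunk that builds a fresh list per row and leaves the input untouched. The fill parameter is an illustrative addition, not part of the original exercise.

```python
# Part 1: storing the same list twice stores two references, not two copies.
shared = ['', '', '']
aliased = [shared, shared]
shared[0] = 7
print(aliased)  # [[7, '', ''], [7, '', '']]


# Part 2: build a new list for every row so rows cannot affect each other.
def chunk(array, size, fill=''):
    rows = []
    for start in range(0, len(array), size):
        row = list(array[start:start + size])    # fresh list for each row
        row.extend([fill] * (size - len(row)))   # pad only the last, short row
        rows.append(row)
    return rows


print(chunk([1, 2, 3, 4, 5, 6, 7, 8], 3))  # [[1, 2, 3], [4, 5, 6], [7, 8, '']]
```

Slicing avoids the repeated pop(0) calls of the shorter versions above, each of which shifts the whole remaining list.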

merge adjacent number in a list in python

I have a list that contains a random number of ints.
I would like to iterate over this list, and if a number and the successive number are within one numeric step of one another, I would like to concatenate them into a sublist.
For example:
input = [1,2,4,6,7,8,10,11]
output = [[1,2],[4],[6,7,8],[10,11]]
The input list will always contain positive ints sorted in increasing order.
I tried some of the code from here.
initerator = iter(inputList)
outputList = [c + next(initerator, "") for c in initerator]
Although I can concat every two entries in the list, I cannot seem to add a suitable if in the list comprehension.
Python version = 3.4
Unless you have to have a one-liner, you could use a simple generator function, combining elements until you hit a non consecutive element:
def consec(lst):
    it = iter(lst)
    prev = next(it)
    tmp = [prev]
    for ele in it:
        if prev + 1 != ele:
            yield tmp
            tmp = [ele]
        else:
            tmp.append(ele)
        prev = ele
    yield tmp
Output:
In [2]: lst = [1, 2, 4, 6, 7, 8, 10, 11]
In [3]: list(consec(lst))
Out[3]: [[1, 2], [4], [6, 7, 8], [10, 11]]
A nice way: find the "splitting" indices and then slice:
input = [1,2,4,6,7,8,10,11]
idx = [0] + [i+1 for i,(x,y) in enumerate(zip(input,input[1:])) if x+1!=y] + [len(input)]
[ input[u:v] for u,v in zip(idx, idx[1:]) ]
#output:
[[1, 2], [4], [6, 7, 8], [10, 11]]
using enumerate() and zip().
Simplest version I have without any imports:
def mergeAdjNum(l):
    r = [[l[0]]]
    for e in l[1:]:
        if r[-1][-1] == e - 1:
            r[-1].append(e)
        else:
            r.append([e])
    return r
About 33% faster than one liners.
This one handles the character prefix grouping mentioned in a comment:
import re
def groupPrefStr(l):
    # Split each item into a lowercase prefix and a numeric suffix
    pattern = re.compile(r'([a-z]+)([0-9]+)')
    r = [[l[0]]]
    pp, vv = re.match(pattern, l[0]).groups()
    vv = int(vv)
    for e in l[1:]:
        p,v = re.match(pattern, e).groups()
        v = int(v)
        if p == pp and v == vv + 1:
            r[-1].append(e)
        else:
            r.append([e])
        pp, vv = p, v
    return r
This is way slower than the number only one. Knowing the exact format of the prefix (only one char ?) could help avoid using the re module and speed things up.
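For the number-only case, another commonly used idiom is itertools.groupby keyed on value minus index: within a run of consecutive integers that difference stays constant. A minimal sketch, assuming (as the question states) a sorted list of integers; the name merge_adjacent is made up here:

```python
from itertools import groupby


def merge_adjacent(numbers):
    """Group runs of consecutive integers from a sorted list into sublists."""
    # value - index is constant inside a run of consecutive numbers
    runs = groupby(enumerate(numbers), key=lambda pair: pair[1] - pair[0])
    return [[value for _, value in run] for _, run in runs]


print(merge_adjacent([1, 2, 4, 6, 7, 8, 10, 11]))
# [[1, 2], [4], [6, 7, 8], [10, 11]]
```

Unlike consec and mergeAdjNum above, this version also returns an empty list for empty input instead of raising.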

Python: compute average of n-th elements in list of lists with different lengths

Suppose I have the following list of lists:
a = [
[1, 2, 3],
[2, 3, 4],
[3, 4, 5, 6]
]
I want to have the average of each n-th element in the arrays. However, when wanting to do this in a simple way, Python generated out-of-bounds errors because of the different lengths. I solved this by giving each array the length of the longest array, and filling the missing values with None.
Unfortunately, doing this made it impossible to compute an average, so I converted the arrays into masked arrays. The code shown below works, but it seems rather cumbersome.
import numpy as np
import numpy.ma as ma
a = [ [1, 2, 3],
[2, 3, 4],
[3, 4, 5, 6] ]
# Determine the length of the longest list
lenlist = []
for i in a:
    lenlist.append(len(i))
max = np.amax(lenlist)
# Fill each list up with None's until required length is reached
for i in a:
    if len(i) <= max:
        for j in range(max - len(i)):
            i.append(None)
# Fill temp_array up with the n-th element
# and add it to temp_array
temp_list = []
masked_arrays = []
for j in range(max):
    for i in range(len(a)):
        temp_list.append(a[i][j])
    masked_arrays.append(ma.masked_values(temp_list, None))
    del temp_list[:]
# Compute the average of each array
avg_array = []
for i in masked_arrays:
    avg_array.append(np.ma.average(i))
print avg_array
Is there a way to do this more quickly? The final list of lists will contain 600000 'rows' and up to 100 'columns', so efficiency is quite important :-).
itertools.izip_longest would do all the padding with None values for you (in Python 3 it is called itertools.zip_longest), so your code can be reduced to:
import numpy as np
import numpy.ma as ma
from itertools import izip_longest
a = [ [1, 2, 3],
[2, 3, 4],
[3, 4, 5, 6] ]
averages = [np.ma.average(ma.masked_values(temp_list, None)) for temp_list in izip_longest(*a)]
print(averages)
[2.0, 3.0, 4.0, 6.0]
I have no idea what the fastest way is with regard to the numpy logic, but this is definitely going to be a lot more efficient than your own code.
If you wanted a faster pure python solution:
from itertools import izip_longest, imap
a = [[1, 2, 3],
[2, 3, 4],
[3, 4, 5, 6]]
def avg(x):
    x = filter(None, x)
    return sum(x, 0.0) / len(x)
filt = imap(avg, izip_longest(*a))
print(list(filt))
[2.0, 3.0, 4.0, 6.0]
If you have 0's in the arrays, that won't work, as 0 is treated as falsy. In that case you will have to use a list comprehension to filter, but it will still be faster:
def avg(x):
    x = [i for i in x if i is not None]
    return sum(x, 0.0) / len(x)
filt = imap(avg, izip_longest(*a))
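The snippets above are Python 2 (izip_longest, imap, and a filter that returns a list). A rough Python 3 equivalent of the same idea might look like the sketch below; column_means is a made-up name, and it assumes None never occurs as a real data value.

```python
from itertools import zip_longest


def column_means(rows):
    """Average the n-th elements of lists of differing lengths."""
    means = []
    # zip_longest pads the shorter rows with None, one tuple per column
    for column in zip_longest(*rows):
        present = [value for value in column if value is not None]  # keeps zeros
        means.append(sum(present) / len(present))
    return means


a = [[1, 2, 3],
     [2, 3, 4],
     [3, 4, 5, 6]]
print(column_means(a))  # [2.0, 3.0, 4.0, 6.0]
```

Every column produced by zip_longest contains at least one real value, so the division cannot hit an empty list.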
Here's an almost* fully vectorized solution based on np.bincount and np.cumsum -
# Store lengths of each list and their cumulative and entire summations
lens = np.array([len(i) for i in a]) # Only loop to get lengths
C = lens.cumsum()
N = lens.sum()
# Create ID array such that the first element of each list is 0,
# the second element as 1 and so on. This is needed in such a format
# for use with bincount later on.
shifts_arr = np.ones(N,dtype=int)
shifts_arr[C[:-1]] = -lens[:-1]+1
id_arr = shifts_arr.cumsum()-1
# Use bincount to get the summations and thus the
# averages across all lists based on their positions.
avg_out = np.bincount(id_arr,np.concatenate(a))/np.bincount(id_arr)
* Almost, because we are getting the lengths of the lists with a loop, but with minimal computation involved there, it should not affect the total runtime much.
Sample run -
In [109]: a = [ [1, 2, 3],
...: [2, 3, 4],
...: [3, 4, 5, 6] ]
In [110]: lens = np.array([len(i) for i in a])
...: C = lens.cumsum()
...: N = lens.sum()
...:
...: shifts_arr = np.ones(N,dtype=int)
...: shifts_arr[C[:-1]] = -lens[:-1]+1
...: id_arr = shifts_arr.cumsum()-1
...:
...: avg_out = np.bincount(id_arr,np.concatenate(a))/np.bincount(id_arr)
...:
In [111]: avg_out
Out[111]: array([ 2., 3., 4., 6.])
You can already clean your code to compute the max length: this single line does the job:
len(max(a,key=len))
Combining with other answer you will get the result like so:
[np.mean([x[i] for x in a if len(x) > i]) for i in range(len(max(a,key=len)))]
You can also avoid the masked array and use np.nan instead:
def replaceNoneTypes(x):
    return tuple(np.nan if isinstance(y, type(None)) else y for y in x)
a = [np.nanmean(replaceNoneTypes(temp_list)) for temp_list in zip_longest(*df[column], fillvalue=np.nan)]
On your test array:
[np.mean([x[i] for x in a if len(x) > i]) for i in range(4)]
returns
[2.0, 3.0, 4.0, 6.0]
If you are using Python version >= 3.4, then import the statistics module
from statistics import mean
if using lower versions, create a function to calculate mean
def mean(array):
    sum = 0
    if (not(type(array) == list)):
        print("there is some bad format in your input")
    else:
        for elements in array:
            try:
                sum = sum + float(elements)
            except:
                print("non numerical entry found")
        average = (sum + 0.0) / len(array)
        return average
Create a list of lists, for example
myList = [[1,2,3],[4,5,6,7,8],[9,10],[11,12,13,14],[15,16,17,18,19,20,21,22],[23]]
iterate through myList
for i, lists in enumerate(myList):
    print(i, mean(lists))
This will print down the sequence n, and the average of nth list.
To find particularly the average of only nth list, create a function
def mean_nth(array, n):
    if((type(n) == int) and n >= 1 and type(array) == list):
        # Use the list passed in, not the global myList
        return mean(array[n-1])
    else:
        print("there is some bad format of your input")
Note that index starts from zero, so for instance if you are looking for the mean of 5th list, it will be at index 4. this explains n-1 in the code.
And then call the function, for example
avg_5thList = mean_nth(myList, 5)
print(avg_5thList)
Running the above code on myList yields following result:
0 2.0
1 6.0
2 9.5
3 12.5
4 18.5
5 23.0
18.5
where the first six lines are generated from the iterative loop, and display the index of nth list and list average. Last line (18.5) displays the average of 5th list as a result of mean_nth(myList, 5) call.
Further, for a list like yours,
a = [
[1, 2, 3],
[2, 3, 4],
[3, 4, 5, 6]
]
Let's say you want the average of the 1st elements, i.e. (1+2+3)/3 = 2, or of the 2nd elements, i.e. (2+3+4)/3 = 3, or of the 4th elements, i.e. 6/1 = 6. You will need to find the length of each list so that you can identify whether the nth element exists in a list or not. For that, you first need to arrange your list of lists in the order of length of lists.
You can either
1) first sort the main list according to size of constituent lists iteratively, and then go through the sorted list to identify if the constituent lists are of sufficient length
2) or you can iteratively look into the original list for length of constituent lists.
(I can definitely get back with working out a faster recursive algorithm if needed)
Computationally second one is more efficient, so assuming that your 5th element means 4th in the index(0, 1, 2, 3, 4), or nth element means (n-1)th element, lets go with that and create a function
def find_nth_average(array, n):
    if(not(type(n) == int and (int(n) >= 1))):
        return "Bad input format for n"
    else:
        if (not(type(array) == list)):
            return "Bad input format for main list"
        else:
            total = 0
            count = 0
            for i, elements in enumerate(array):
                if(not(type(elements) == list)):
                    return("non list constituent found at location " + str(i+1))
                else:
                    listLen = len(elements)
                    if(int(listLen) >= n):
                        try:
                            total = total + elements[n-1]
                            count = count + 1
                        except:
                            return ("non numerical entity found in constituent list " + str(i+1))
            if(int(count) == 0):
                return "No such n-element exists"
            else:
                average = float(total)/float(count)
                return average
Now lets call this function on your list a
print(find_nth_average(a, 0))
print(find_nth_average(a, 1))
print(find_nth_average(a, 2))
print(find_nth_average(a, 3))
print(find_nth_average(a, 4))
print(find_nth_average(a, 5))
print(find_nth_average(a, 'q'))
print(find_nth_average(a, 2.3))
print(find_nth_average(5, 5))
The corresponding results are:
Bad input format for n
2.0
3.0
4.0
6.0
No such n-element exists
Bad input format for n
Bad input format for n
Bad input format for main list
If you have an erratic list, like
a = [[1, 2, 3], 2, [3, 4, 5, 6]]
that contains a non - list element, you get an output:
non list constituent found at location 2
If your constituent list is erratic, like:
a = [[1, 'p', 3], [2, 3, 4], [3, 4, 5, 6]]
that contains a non - numerical entity in a list, and find the average of 2nd elements by print(find_nth_average(a, 2))
you get an output:
non numerical entity found in constituent list 1

Split list into smaller lists (split in half)

I am looking for a way to easily split a python list in half.
So that if I have an array:
A = [0,1,2,3,4,5]
I would be able to get:
B = [0,1,2]
C = [3,4,5]
A = [1,2,3,4,5,6]
B = A[:len(A)//2]
C = A[len(A)//2:]
If you want a function:
def split_list(a_list):
    half = len(a_list)//2
    return a_list[:half], a_list[half:]
A = [1,2,3,4,5,6]
B, C = split_list(A)
A little more generic solution (you can specify the number of parts you want, not just split 'in half'):
def split_list(alist, wanted_parts=1):
    length = len(alist)
    return [ alist[i*length // wanted_parts: (i+1)*length // wanted_parts]
             for i in range(wanted_parts) ]
A = [0,1,2,3,4,5,6,7,8,9]
print(split_list(A, wanted_parts=1))
print(split_list(A, wanted_parts=2))
print(split_list(A, wanted_parts=8))
f = lambda A, n=3: [A[i:i+n] for i in range(0, len(A), n)]
f(A)
n - the predefined length of result arrays
def split(arr, size):
    arrs = []
    while len(arr) > size:
        piece = arr[:size]
        arrs.append(piece)
        arr = arr[size:]
    arrs.append(arr)
    return arrs
Test:
x=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
print(split(x, 5))
result:
[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13]]
If you don't care about the order...
def split(list):
    return list[::2], list[1::2]
list[::2] gets every second element in the list starting from the 0th element.
list[1::2] gets every second element in the list starting from the 1st element.
Using list slicing. The syntax is basically my_list[start_index:end_index]
>>> i = [0,1,2,3,4,5]
>>> i[:3] # same as i[0:3] - grabs from first to third index (0->2)
[0, 1, 2]
>>> i[3:] # same as i[3:len(i)] - grabs from fourth index to end
[3, 4, 5]
To get the first half of the list, you slice from the first index to len(i)//2 (where // is integer division, so 3//2 gives the floored result of 1, instead of the invalid list index of 1.5):
>>> i[:len(i)//2]
[0, 1, 2]
..and then swap the values around to get the second half:
>>> i[len(i)//2:]
[3, 4, 5]
B,C=A[:len(A)//2],A[len(A)//2:]
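Here is a common solution that splits arr into count parts (note that elements are dealt out round-robin, so each part holds every count-th element rather than a contiguous block):
def split(arr, count):
    return [arr[i::count] for i in range(count)]
def splitter(A):
    B = A[0:len(A)//2]
    C = A[len(A)//2:]
    return (B,C)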
I tested, and the double slash is required to force int division in python 3. My original post was correct, although wysiwyg broke in Opera, for some reason.
If you have a big list, it's better to use itertools and write a function to yield each part as needed:
from itertools import islice
def make_chunks(data, SIZE):
    it = iter(data)
    # use `xrange` if you are in python 2.7:
    for i in range(0, len(data), SIZE):
        yield [k for k in islice(it, SIZE)]
You can use this like:
A = [0, 1, 2, 3, 4, 5, 6]
size = len(A) // 2
for sample in make_chunks(A, size):
print(sample)
The output is:
[0, 1, 2]
[3, 4, 5]
[6]
Thanks to @thefourtheye and @Bede Constantinides
This is similar to other solutions, but a little faster.
# Usage: split_half([1,2,3,4,5]) Result: ([1, 2], [3, 4, 5])
def split_half(a):
    half = len(a) >> 1
    return a[:half], a[half:]
There is an official Python recipe for the more generalized case of splitting an array into smaller arrays of size n.
from itertools import izip_longest
def grouper(n, iterable, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)
This code snippet is from the python itertools doc page.
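izip_longest exists only in Python 2; in Python 3 the same function is itertools.zip_longest. A sketch of the recipe adapted for Python 3 follows. The argument order here (iterable first) is a choice made for this example and differs from the snippet quoted above.

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks, padding the last one with fillvalue."""
    # n references to the SAME iterator: each output tuple pulls n consecutive items
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


print([''.join(group) for group in grouper('ABCDEFG', 3, 'x')])
# ['ABC', 'DEF', 'Gxx']
print(list(grouper([0, 1, 2, 3, 4], 2)))
# [(0, 1), (2, 3), (4, None)]
```

Note that the chunks come back as tuples and the last one is padded, which is not the same as a plain split in half.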
10 years later.. I thought - why not add another:
arr = 'Some random string' * 10; n = 4
print([arr[e:e+n] for e in range(0,len(arr),n)])
While the answers above are more or less correct, you may run into trouble if the size of your array isn't divisible by 2, as the result of a / 2, a being odd, is a float in python 3.0, and in earlier version if you specify from __future__ import division at the beginning of your script. You are in any case better off going for integer division, i.e. a // 2, in order to get "forward" compatibility of your code.
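A short illustration of that point under Python 3, where / always returns a float (even for even lengths) and a float cannot be used as a slice index. The helper names are invented for this sketch.

```python
def halves(items):
    """Split with integer division; for odd lengths the second half is longer."""
    middle = len(items) // 2
    return items[:middle], items[middle:]


def float_index_fails(items):
    """Return True if slicing with len(items) / 2 raises TypeError (Python 3)."""
    try:
        items[:len(items) / 2]
    except TypeError:
        return True
    return False


A = [0, 1, 2, 3, 4]
print(halves(A))             # ([0, 1], [2, 3, 4])
print(float_index_fails(A))  # True on Python 3
```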
#for python 3
A = [0,1,2,3,4,5]
l = len(A)/2
B = A[:int(l)]
C = A[int(l):]
General solution split list into n parts with parameter verification:
def sp(l,n):
    # split list l into n parts
    if l:
        p = len(l) if n < 1 else len(l) // n # no split
        p = p if p > 0 else 1 # split down to elements
        for i in range(0, len(l), p):
            yield l[i:i+p]
    else:
        yield [] # empty list split returns empty list
Since there was no restriction put on which package we can use: NumPy has a function called split with which you can split an array into equal parts. Note that np.split raises a ValueError when the array cannot be divided equally; np.array_split accepts uneven lengths.
Example
import numpy as np
A = np.array(list('abcdefgh'))
np.split(A, 2)
With hints from @ChristopheD
def line_split(N, K=1):
    length = len(N)
    return [N[i*length//K:(i+1)*length//K] for i in range(K)]
A = [0,1,2,3,4,5,6,7,8,9]
print(line_split(A,1))
print(line_split(A,2))
Another take on this problem in 2020. Here's a generalization of the problem. I interpret 'divide a list in half' to mean two lists only, with no spillover into a third list when there is an odd one out. For instance, if the list length is 19, chunking with a chunk size of 19 // 2 = 9 gives two lists of length 9 and a third list of length 1 (three lists in total). If we want a general solution that always gives two lists, I will assume that we are happy with the two resulting lists not being equal in length (one will be longer than the other), and that it is OK for the order to be mixed (alternating in this case).
"""
arrayinput --> is an array of length N that you wish to split 2 times
"""
ctr = 1 # lets initialize a counter
holder_1 = []
holder_2 = []
for i in range(len(arrayinput)):
    if ctr == 1 :
        holder_1.append(arrayinput[i])
    elif ctr == 2:
        holder_2.append(arrayinput[i])
    ctr += 1
    if ctr > 2 : # if it exceeds 2 then we reset
        ctr = 1
This concept works for any number of partitions you'd like (you'd have to tweak the code depending on how many parts you want), and it is rather straightforward to interpret. To speed things up, you can even write this loop in Cython / C / C++. Then again, I've tried this code on relatively small lists (~10,000 rows) and it finishes in a fraction of a second.
Just my two cents.
Thanks!
from itertools import islice
Input = [2, 5, 3, 4, 8, 9, 1]
small_list_length = [1, 2, 3, 1]
Input1 = iter(Input)
Result = [list(islice(Input1, elem)) for elem in small_list_length]
print("Input list :", Input)
print("Split length list: ", small_list_length)
print("List after splitting", Result)
You can try something like this with numpy
import numpy as np
np.array_split([1,2,3,4,6,7,8], 2)
result:
[array([1, 2, 3, 4]), array([6, 7, 8])]
