I have a list of values that may or may not contain repeating cycles of elements.
I wrote the code below to find the index and length of each run that is immediately repeated in the list. My problem is that I now have multiple index/length pairs for the repeated runs. How do I remove these elements from the main list so that the cycles are gone?
data = [1,2,3,1,2,3,4,5,6,7,4,5,6,7,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,23,18]
minrun = 1
lendata = len(data)
for runlen in range(minrun, lendata // 2):
    i = 0
    while i < lendata - runlen * 2:
        # print("i",i)
        # print("runlen", runlen)
        # print(lendata - runlen * 2)
        s1 = data[i:i + runlen]
        # print("s1",s1)
        s2 = data[i + runlen:i + runlen * 2]
        # print("s2",s2)
        if s1 == s2:
            print(i, runlen, s1)
            i += runlen
        else:
            i += 1
Is there a better way to do this?
This could work, but I have only tested it on your example, not on other sequences.
data = [1,2,3,1,2,3,4,5,6,7,4,5,6,7,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,23,18]
print(data)
checkindex = []
for j in range(1, len(data)):
    for k in range(len(data)):
        try:
            if data[k] == data[j+k]:
                if j+k not in checkindex:
                    checkindex.append(j+k)
        except IndexError:
            continue
checkindex = sorted(checkindex)
for i in range(len(checkindex)-1, -1, -1):
    del data[checkindex[i]]
print(data)
Output:
[1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 23, 18]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 23, 18]
Explanation:
for k in range(len(data)):
    if data[k] == data[j+k]:
For example, for j=3 this means:
data[0] == data[3+0]: True
data[1] == data[3+1]: True
data[2] == data[3+2]: True
data[3] == data[3+3]: False
For every True, j+k is appended unless it is already in the list. I recommend adding print statements here and there to see what happens, for example:
data = [1,2,3,1,2,3,4,5,6,7,4,5,6,7,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,23,18]
print(data)
checkindex = []
for j in range(1, len(data)):
    for k in range(len(data)):
        try:
            if data[k] == data[j+k]:
                print(j, k, j+k)
                if j+k not in checkindex:
                    checkindex.append(j+k)
        except IndexError:
            continue
    print(checkindex)
checkindex = sorted(checkindex)
for i in range(len(checkindex)-1, -1, -1):
    del data[checkindex[i]]
print(data)
Your output will be:
[1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 23, 18]
[]
[]
3 0 3
3 1 4
3 2 5
[3, 4, 5]
4 6 10
4 7 11
4 8 12
4 9 13
4 10 14
4 11 15
4 12 16
4 13 17
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17]
8 6 14
8 7 15
8 8 16
8 9 17
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17]
10 14 24
10 15 25
10 16 26
10 17 27
10 18 28
10 19 29
10 20 30
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30]
14 10 24
14 11 25
14 12 26
14 13 27
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30]
18 3 21
18 4 22
18 5 23
18 6 24
18 7 25
18 8 26
18 9 27
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
21 0 21
21 1 22
21 2 23
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 17, 24, 25, 26, 27, 28, 29, 30, 21, 22, 23]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 23, 18]
Personally, I find that the print statements help me understand the code better.
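One caveat about the approach above: since every offset j is compared at every position k, an index is marked whenever its value has already appeared anywhere earlier in the list. On the example data that happens to match removing the cycles, but in general it amounts to order-preserving de-duplication. If only immediately repeated runs should be dropped, a sketch along the following lines may be closer to the question. The function name collapse_repeats is made up for illustration, and it has only been checked against the example list.

```python
def collapse_repeats(data, minrun=1):
    """Drop every run that is immediately followed by an identical copy of itself."""
    result = list(data)              # work on a copy, leave the input untouched
    changed = True
    while changed:                   # repeat until a full pass removes nothing
        changed = False
        for runlen in range(minrun, len(result) // 2 + 1):
            i = 0
            while i + 2 * runlen <= len(result):
                if result[i:i + runlen] == result[i + runlen:i + 2 * runlen]:
                    del result[i + runlen:i + 2 * runlen]   # remove the second copy
                    changed = True
                else:
                    i += 1
    return result


data = [1,2,3,1,2,3,4,5,6,7,4,5,6,7,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,23,18]
print(collapse_repeats(data))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 23, 18]

# Difference from de-duplication: the two 1s below are not adjacent repeats.
print(list(dict.fromkeys([1, 2, 1, 3])))   # [1, 2, 3]
print(collapse_repeats([1, 2, 1, 3]))      # [1, 2, 1, 3]
```

The outer while loop is there because removing one repeat can expose another one (here the run of length 10 only becomes a clean repeat after the shorter cycles are gone).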
I'm interested in reordering the bits within a number, and since I want to do it several trillion times, I want to do it fast.
Here are the details: given a number num and an order matrix order.
order contains up to ~6000 lines of permutations of the numbers 0..31.
These are the positions to which the bits change.
Simplified example: binary(num) = 1001, order[1]=[0,1,3,2], reordered number for order[1] would be 1010 (binary).
Now I want to know whether my input number num is the smallest of these (~6000) reordered numbers. I'm searching for all 32-bit numbers which fulfill this criterion.
My current approach is too slow, so I'm looking for a speedup.
Minimal reproducible example:
num = 1753251840
order = [[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31],
[ 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, 19, 18, 17, 16, 23, 22, 21, 20, 27, 26, 25, 24, 31, 30, 29, 28],
[15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16],
[31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
[ 0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23, 8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31],
[21, 20, 23, 22, 29, 28, 31, 30, 17, 16, 19, 18, 25, 24, 27, 26, 5, 4, 7, 6, 13, 12, 15, 14, 1, 0, 3, 2, 9, 8, 11, 10]]
patterns = set()
bits = format(num, '032b')
for perm in order:
    bitsn = [bits[perm[i]] for i in range(32)]
    patterns.add(int(''.join(bitsn), 2))
print(min(patterns) == num)
Where can I start to improve this?
Extracting bits via strings is generally very inefficient (whatever the language). The same applies to parsing. Moreover, for such a fast low-level operation, you need a JIT or a compiled language, as the comments already pointed out.
Here is a prototype using Numba's JIT (assuming all numbers are unsigned):
import numpy as np
from numba import njit

npOrder = np.array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31],
[ 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, 19, 18, 17, 16, 23, 22, 21, 20, 27, 26, 25, 24, 31, 30, 29, 28],
[15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16],
[31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
[ 0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23, 8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31],
[21, 20, 23, 22, 29, 28, 31, 30, 17, 16, 19, 18, 25, 24, 27, 26, 5, 4, 7, 6, 13, 12, 15, 14, 1, 0, 3, 2, 9, 8, 11, 10]], dtype=np.uint32)

@njit
def extractBits(num):
    # bits[i] holds bit i of num, counted from the least significant bit
    bits = np.empty(32, dtype=np.int32)
    for i in range(32):
        bits[i] = (num >> i) & 0x01
    return bits

@njit
def permuteAndMerge(bits, perm):
    bitsnFinal = 0
    for i in range(32):
        # String position i (most significant bit first) is bit 31-i, so the
        # shift is 31-i to reproduce the result of the string-based version.
        bitsnFinal |= bits[31-perm[i]] << (31-i)
    return bitsnFinal

@njit
def computeOptimized(num):
    bits = extractBits(num)
    permCount = npOrder.shape[0]
    patterns = np.empty(permCount, dtype=np.uint32)
    for i in range(permCount):
        patterns[i] = permuteAndMerge(bits, npOrder[i])
    # The array can be converted to a set if needed here with: set(patterns)
    return min(patterns) == num
This code is about 25 times faster than the original on my machine (measured over 5 000 000 runs).
You can also use Numba to accelerate and parallelize the loop that calls computeOptimized, which gives a significant additional speed-up.
Note that this can be much faster still in C or C++ using low-level processor instructions (available, for example, on many x86_64 processors). With that and parallelism, the throughput should be on the order of a billion permutations per second.
A couple of possible speed-ups, staying with Python and the current algorithm:
Bail out as soon as you find a pattern less than num; once one like that is found, the condition cannot possibly be true. (You also don't need to store patterns; at most a flag whether an equal one was found, if that's not guaranteed by the problem.)
bitsn could be a generator expression, and doesn't need to be in a variable; you'll have to measure whether that's faster.
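The early-exit idea above can be sketched in plain Python with integer bit operations instead of strings. The helper names permute_bits and is_smallest are invented for this illustration, the two-row order list is a shortened stand-in for the real matrix, and the bit convention is assumed to match the string version in the question (position 0 is the most significant bit).

```python
def permute_bits(num, perm):
    """Output position i (MSB first) receives the bit at input position perm[i]."""
    out = 0
    for i, src in enumerate(perm):
        bit = (num >> (31 - src)) & 1    # read input position src
        out |= bit << (31 - i)           # write it to output position i
    return out


def is_smallest(num, order):
    """True if no permutation in order yields a value below num."""
    for perm in order:
        if permute_bits(num, perm) < num:
            return False                 # bail out at the first smaller pattern
    return True


# Shortened stand-in for the order matrix: identity and full bit reversal.
order = [list(range(32)), list(range(31, -1, -1))]

print(is_smallest(1, order))         # True: reversing 1 gives 2**31, which is larger
print(is_smallest(1 << 31, order))   # False: reversing 2**31 gives 1
```

No set of patterns is stored here; whether this is fast enough still has to be measured, and it does not replace the compiled approaches mentioned elsewhere in this thread.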
More fundamental improvements:
If you want to find all the numbers (rather than just test a particular one), it feels like there ought to be a faster algorithm by considering what the bits mean. A couple of hours thinking could potentially let you process just the 6000 lists, rather than all 2³² integers.
As others have written, if you're after pure speed, Python is not the ideal language. How much that matters depends on the balance between the time you want to spend programming and the time spent running the program.
Side note:
Are the 32-bit integers signed or unsigned?
I'm trying to save some data as CSV using pandas, and then load and use that data somewhere else. This is my code for saving the data:
shuffledInteractionsFirstSeq = [[11, 15, 19, 9, 6, 8, 15, 9, 12, 21, 14, 7, 10, 20, 15, 6, 6, 17, 10, 10, 10, 6, 11, 10, 11, 8, 2, 16, 1, 19, 4, 9, 10, 20, 19, 17, 19, 21, 21, 6, 19, 13, 19, 20, 9, 4, 1, 17, 17, 17, 10, 5, 2, 1, 16, 3, 1, 9, 1, 21, 3, 17, 4, 19, 7, 12, 19, 20, 1, 17, 7, 1, 2, 19, 13, 17, 3, 13, 12, 13, 14, 4, 8, 19, 10, 4, 12, 19, 17, 4, 12, 5, 12, 11, 20, 9, 12, 12, 11, 19, 4, 14, 11, 7, 4, 3, 8, 8, 16, 10, 20, 3, 14, 16, 10, 9, 13, 2, 19, 9, 10, 17, 13, 10, 2, 19, 17, 10, 7, 2, 17, 12, 10, 9, 12, 1, 17, 12, 17, 9, 16, 16, 12, 20, 9, 4, 11, 3, 15, 6, 4, 8, 9, 12, 2, 16, 5, 9, 19, 17, 17, 16, 8, 15, 12, 9, 11, 14, 9, 4, 21, 1, 10, 5, 21, 9, 10, 3, 19, 19, 13, 8, 3, 12, 3, 12, 17, 16, 21, 9, 10, 8, 12, 2, 12, 17, 16, 19, 8, 17, 14, 1, 2, 13, 9, 19, 16, 5, 4, 13, 8, 13, 8, 7, 21, 2, 1, 13, 1, 6, 5, 1, 8, 10, 9, 2, 12, 3, 9, 9, 5, 12, 6, 16, 6, 13, 2, 17, 12, 19, 16, 17, 19, 14, 2, 17, 7, 6, 8, 15, 13, 19, 19, 16, 17, 14, 10, 10, 10, 12, 6, 16, 10, 1, 4, 4, 6, 19, 19, 8, 15, 16, 4, 12, 5, 17, 3, 12, 1, 9, 17, 8, 8, 19, 14, 10, 9, 4, 16, 19, 4, 8, 12, 2, 17, 15, 13, 12, 12, 12, 17, 15, 9, 16, 8, 17, 8, 6, 13, 6, 15, 1, 5, 21, 1, 17, 6, 3, 8, 8, 6, 3, 8, 15, 14, 1, 7, 2, 12, 8, 16, 6, 4, 9, 20, 12, 12, 17, 10, 9, 14, 8, 19, 17, 9, 10, 14, 1, 14, 5, 6, 12, 9, 17, 8, 19, 5, 9, 14, 16, 16, 6, 6, 3, 13, 4, 8, 19, 11, 7, 16, 5, 12, 2, 6, 6, 4, 5, 5, 21, 2, 12, 16, 17, 14, 10, 5, 12, 16, 17, 20, 12, 12, 17, 8, 6, 13, 12, 12, 17, 12, 6, 17, 8, 17, 10, 13, 2, 15, 8, 9, 14, 8, 8, 12, 15, 20, 14, 4, 19, 6, 9, 1, 11, 21, 1, 13, 13, 8, 15, 6, 14, 8, 15, 2, 16, 16, 12, 8, 17, 6, 10, 10, 10, 17, 15, 3, 6, 6, 9, 4, 8, 16, 12, 17, 17, 4, 8, 5, 15, 13, 6, 6, 6, 3, 11, 15, 3, 12, 20, 15, 16, 4, 10, 21, 9, 21, 9, 19, 19, 9, 8, 4, 13, 10, 6, 19, 1, 13, 17, 9, 1, 9, 15, 15, 19, 19, 14, 15, 4, 9, 15, 1, 19, 17, 10, 6, 1, 11, 5, 10, 6, 5, 10, 6, 1, 1, 6, 16, 17, 11, 6, 1, 15, 16, 10, 17, 10, 17, 19, 14, 1, 15, 14, 10, 10, 16, 6, 8, 19, 14, 14, 14, 12, 12, 10, 10, 15, 1, 8, 4, 
1, 14, 14, 7, 10, 10, 14, 10, 17, 19, 20, 6, 8, 9, 14, 10, 14, 1, 15, 19, 10, 1, 19, 4, 15, 21, 10, 9, 3, 14, 14, 10, 10, 6, 8, 20, 6, 2, 16, 6, 9, 10, 8, 2, 17, 17, 1, 19, 13, 20, 12, 1, 16, 20, 16, 12, 9, 16, 10, 3, 14, 8, 20, 12, 12, 11, 17, 20, 11, 4, 20, 4, 15, 4, 8, 3, 12, 21, 17, 12, 10, 8, 21, 17, 10, 8, 4, 4, 16, 14, 12, 14, 14, 4, 9, 12, 4, 14, 4, 10, 10, 4, 10, 3, 9, 20, 1, 16, 10, 20, 12, 20, 5, 3, 8, 16, 9, 20, 10, 20, 21, 8, 9, 8, 5, 8, 11, 8, 19, 6, 6, 10, 19, 6, 10, 15, 8, 19, 5, 17, 19, 10, 16, 8, 19, 12, 15, 19, 15, 14, 6, 21, 16, 13, 10, 16, 5, 14, 17, 15, 5, 13, 1, 13, 15, 6, 13, 3, 15, 13, 4, 6, 8, 4, 4, 4, 6, 6, 4, 15, 3, 15, 3, 15, 16, 16, 13, 10, 19, 7, 6, 10, 10, 1, 10, 8, 20, 3, 3, 10, 15, 16, 10, 2, 10, 5, 16, 21, 7, 15, 10, 15, 3, 10, 8, 10, 8, 1, 1, 15, 8, 19, 4, 10, 10, 6, 15, 15, 6, 20, 4, 1, 10, 9, 21, 20, 6, 12, 10, 10, 14, 21, 20, 8, 14, 4, 10, 9, 12, 16, 1, 19, 16, 10, 5, 3, 1, 8, 1, 8, 1, 19, 1, 4, 6, 17, 3, 15, 8, 8, 4, 19, 1, 14, 15, 8, 6, 15, 1, 5, 10, 7, 8, 13, 15, 15, 8, 15, 14, 6, 5, 4, 15, 1, 10, 10]]
shuffledInteractionsSecondSeq = [[11, 15, 17, 10, 1, 8, 10, 1, 1, 8, 10, 10, 19, 1, 10, 14, 1, 14, 1, 4, 13, 10, 14, 1, 15, 1, 3, 4, 19, 1, 1, 1, 13, 4, 14, 8, 1, 1, 3, 8, 13, 4, 19, 19, 19, 16, 10, 1, 20, 3, 4, 16, 10, 1, 13, 9, 7, 13, 6, 16, 15, 9, 12, 11, 1, 2, 21, 2, 15, 8, 13, 1, 2, 8, 1, 6, 4, 15, 15, 21, 6, 17, 2, 8, 21, 14, 6, 15, 10, 20, 1, 5, 2, 2]]
print(max([len(seq) for seq in shuffledInteractionsFirstSeq]))
# 847
print(max([len(seq) for seq in shuffledInteractionsSecondSeq]))
# 94
from pandas import DataFrame, read_csv
DataFrame(data={'Sequence1': shuffledInteractionsFirstSeq, 'Sequence2': shuffledInteractionsSecondSeq}).to_csv('interactionDataSet.csv', index=False)
interactionsCSV = read_csv('interactionDataSet.csv')
interactionsSequence1 = list(interactionsCSV.get('Sequence1'))
interactionsSequence2 = list(interactionsCSV.get('Sequence2'))
print(max([len(seq) for seq in interactionsSequence1]))
# 3026
print(max([len(seq) for seq in interactionsSequence2]))
# 327
As you can see, the data changes after saving. I'm probably doing something wrong, but I couldn't figure out what.
This occurs because read_csv doesn't recognize complex types like list and reads them as strings:
type(interactionsCSV.at[0, 'Sequence1'])
# <class 'str'>
One possible workaround is to use the pandas.eval function (with pandas imported as pd):
interactionsCSV['Sequence1'] = pd.eval(interactionsCSV['Sequence1'])
type(interactionsCSV.at[0, 'Sequence1'])
# <class 'list'>
max([len(s) for s in interactionsCSV.get('Sequence1')])
# 847
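The larger numbers in the question (3026 and 327) are consistent with len() counting the characters of the stored text rather than list elements. The sketch below reproduces the effect with only the standard library and restores the list with ast.literal_eval, which parses Python literals without evaluating arbitrary expressions. The tiny rows sample is made up for illustration; in pandas the same parser could presumably be applied per cell, for example with interactionsCSV['Sequence1'].apply(ast.literal_eval).

```python
import ast
import csv
import io

# Made-up miniature of the data set in the question.
rows = [{"Sequence1": [11, 15, 19], "Sequence2": [11, 15]}]

# Write the rows to an in-memory CSV; each list is stored as its text form.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Sequence1", "Sequence2"])
writer.writeheader()
writer.writerows(rows)

# Read it back: every cell is now a plain string.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
raw_cell = loaded[0]["Sequence1"]
print(type(raw_cell), len(raw_cell))        # <class 'str'> 12

# Parse the text back into a real list.
parsed_cell = ast.literal_eval(raw_cell)
print(type(parsed_cell), len(parsed_cell))  # <class 'list'> 3
```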
I have a list in Python called dates with this content:
[datetime.datetime(2012, 1, 11, 17, 24, 12, 676000), datetime.datetime(2012, 2, 3, 11, 25, 17, 73000), datetime.datetime(2012, 2, 3, 14, 9, 23, 699000), datetime.datetime(2012, 2, 4, 9, 15, 26, 644000), datetime.datetime(2012, 2, 4, 17, 14, 36, 65000), datetime.datetime(2012, 2, 5, 6, 18, 31, 139000), datetime.datetime(2012, 2, 5, 14, 55, 28, 62000), datetime.datetime(2012, 2, 5, 18, 28, 59, 379000), datetime.datetime(2012, 2, 6, 12, 24, 21, 768000), datetime.datetime(2012, 2, 6, 17, 32, 46, 675000), datetime.datetime(2012, 2, 14, 11, 33, 6, 74000), datetime.datetime(2012, 2, 14, 11, 36, 48, 11000), datetime.datetime(2012, 2, 16, 8, 54, 14, 175000), datetime.datetime(2012, 2, 16, 18, 33, 9, 200000), datetime.datetime(2012, 2, 20, 8, 41, 2, 550000), datetime.datetime(2012, 2, 20, 9, 4, 37, 446000), datetime.datetime(2012, 2, 20, 10, 10, 42, 950000), datetime.datetime(2012, 2, 20, 21, 21, 21, 986000), datetime.datetime(2012, 2, 21, 9, 1, 8, 429000), datetime.datetime(2012, 2, 21, 12, 5, 20, 475000), datetime.datetime(2012, 2, 21, 13, 23, 25, 281000), datetime.datetime(2012, 2, 21, 15, 4, 29, 366000), datetime.datetime(2012, 2, 21, 15, 12, 21, 729000), datetime.datetime(2012, 2, 21, 15, 29, 10, 723000), datetime.datetime(2012, 2, 21, 18, 10, 24, 822000), datetime.datetime(2012, 2, 22, 10, 42, 11, 689000), datetime.datetime(2012, 2, 22, 13, 10, 1, 309000), datetime.datetime(2012, 2, 22, 20, 28, 34, 260000), datetime.datetime(2012, 2, 27, 17, 53, 19, 225000), datetime.datetime(2012, 2, 28, 8, 13, 57, 139000), datetime.datetime(2012, 3, 2, 7, 55, 11, 505000), datetime.datetime(2012, 3, 2, 21, 6, 35, 270000), datetime.datetime(2012, 3, 5, 8, 10, 47, 76000), datetime.datetime(2012, 3, 5, 9, 15, 15, 448000), datetime.datetime(2012, 3, 7, 18, 15, 35, 401000), datetime.datetime(2012, 3, 15, 8, 6, 56, 968000), datetime.datetime(2012, 3, 16, 15, 34, 10, 59000), datetime.datetime(2012, 3, 20, 18, 19, 13, 687000), datetime.datetime(2012, 3, 22, 8, 50, 28, 983000), 
datetime.datetime(2012, 3, 23, 8, 26, 5, 468000), datetime.datetime(2012, 3, 27, 7, 50, 14, 474000), datetime.datetime(2012, 3, 27, 15, 14, 35, 59000), datetime.datetime(2012, 4, 5, 7, 23, 1, 374000), datetime.datetime(2012, 4, 6, 13, 8, 59, 578000), datetime.datetime(2012, 4, 6, 13, 34, 24, 843000), datetime.datetime(2012, 4, 6, 15, 35, 40, 538000), datetime.datetime(2012, 4, 10, 7, 0, 37, 455000), datetime.datetime(2012, 4, 10, 7, 12, 37, 199000), datetime.datetime(2012, 4, 10, 7, 39, 16, 366000), datetime.datetime(2012, 4, 10, 7, 55, 51, 228000), datetime.datetime(2012, 4, 11, 7, 53, 31, 699000), datetime.datetime(2012, 4, 11, 15, 32, 21, 582000), datetime.datetime(2012, 4, 13, 10, 22, 4, 673000), datetime.datetime(2012, 4, 16, 7, 17, 20, 578000), datetime.datetime(2012, 11, 29, 16, 5, 21, 53000), datetime.datetime(2012, 11, 29, 16, 6, 15, 244000), datetime.datetime(2013, 1, 25, 9, 45, 48, 921000), datetime.datetime(2013, 2, 4, 18, 1, 1, 418000), datetime.datetime(2013, 2, 5, 6, 14, 55, 728000), datetime.datetime(2013, 2, 5, 17, 2, 11, 959000), datetime.datetime(2013, 2, 7, 6, 4, 8, 629000), datetime.datetime(2013, 2, 7, 18, 6, 47, 247000), datetime.datetime(2013, 2, 8, 5, 36, 55, 702000), datetime.datetime(2013, 2, 8, 8, 51, 46, 261000), datetime.datetime(2013, 2, 12, 5, 56, 37, 233000), datetime.datetime(2013, 2, 12, 16, 6, 25, 126000), datetime.datetime(2013, 2, 13, 7, 45, 33, 448000), datetime.datetime(2013, 2, 13, 10, 43, 15, 749000), datetime.datetime(2013, 2, 14, 6, 10, 27, 562000), datetime.datetime(2013, 2, 14, 16, 44, 45, 469000), datetime.datetime(2013, 2, 15, 6, 3, 12, 787000), datetime.datetime(2013, 2, 15, 14, 8, 40, 281000), datetime.datetime(2013, 2, 17, 11, 46, 41, 983000), datetime.datetime(2013, 2, 20, 15, 32, 52, 455000), datetime.datetime(2013, 2, 21, 16, 0, 40, 165000), datetime.datetime(2013, 2, 22, 9, 12, 55, 32000), datetime.datetime(2013, 2, 22, 15, 11, 45, 979000), datetime.datetime(2013, 2, 25, 6, 52, 49, 991000), 
datetime.datetime(2013, 2, 25, 8, 52, 8, 947000), datetime.datetime(2013, 2, 25, 9, 27, 7, 716000), datetime.datetime(2013, 2, 25, 9, 33, 21, 121000), datetime.datetime(2013, 2, 26, 7, 15, 0, 135000), datetime.datetime(2013, 2, 26, 16, 15, 39, 693000), datetime.datetime(2013, 2, 27, 6, 33, 23, 745000), datetime.datetime(2013, 2, 27, 17, 28, 47, 793000), datetime.datetime(2013, 2, 28, 5, 43, 32, 479000), datetime.datetime(2013, 2, 28, 17, 22, 15, 510000), datetime.datetime(2013, 3, 1, 6, 54, 21, 676000), datetime.datetime(2013, 3, 1, 15, 47, 19, 912000), datetime.datetime(2013, 3, 4, 17, 39, 55, 809000), datetime.datetime(2013, 3, 5, 6, 40, 35, 101000), datetime.datetime(2013, 3, 5, 17, 5, 4, 324000), datetime.datetime(2013, 3, 6, 6, 39, 42, 235000), datetime.datetime(2013, 3, 6, 16, 6, 29, 410000), datetime.datetime(2013, 3, 7, 6, 32, 56, 197000), datetime.datetime(2013, 3, 7, 17, 31, 39, 249000), datetime.datetime(2013, 3, 8, 6, 56, 44, 369000), datetime.datetime(2013, 3, 11, 7, 17, 20, 748000), datetime.datetime(2013, 3, 11, 17, 27, 43, 102000), datetime.datetime(2013, 3, 12, 7, 10, 24, 751000), datetime.datetime(2013, 3, 12, 10, 23, 44, 759000), datetime.datetime(2013, 3, 12, 15, 42, 20, 461000), datetime.datetime(2013, 3, 13, 7, 12, 40, 494000), datetime.datetime(2013, 3, 13, 12, 7, 24, 986000), datetime.datetime(2013, 3, 14, 6, 52, 10, 779000), datetime.datetime(2013, 3, 14, 16, 39, 12, 776000), datetime.datetime(2013, 3, 15, 7, 4, 26, 454000), datetime.datetime(2013, 3, 15, 16, 40, 37, 98000), datetime.datetime(2013, 3, 18, 6, 53, 56, 937000), datetime.datetime(2013, 3, 18, 16, 53, 26, 914000), datetime.datetime(2013, 3, 19, 6, 34, 41, 813000), datetime.datetime(2013, 3, 19, 17, 19, 59, 721000), datetime.datetime(2013, 3, 20, 6, 57, 37, 141000), datetime.datetime(2013, 3, 20, 15, 15, 43, 458000), datetime.datetime(2013, 3, 21, 15, 36, 12, 949000), datetime.datetime(2013, 3, 22, 6, 57, 21, 973000), datetime.datetime(2013, 3, 22, 15, 36, 14, 388000), 
datetime.datetime(2013, 3, 25, 7, 0, 43, 602000), datetime.datetime(2013, 3, 25, 18, 27, 0, 693000), datetime.datetime(2013, 3, 26, 17, 20, 48, 194000), datetime.datetime(2013, 3, 27, 7, 11, 17, 665000), datetime.datetime(2013, 3, 27, 18, 27, 41, 894000), datetime.datetime(2013, 3, 28, 7, 2, 8, 624000), datetime.datetime(2013, 3, 28, 11, 12, 22, 53000), datetime.datetime(2013, 4, 3, 5, 45, 23, 995000), datetime.datetime(2013, 4, 4, 6, 5, 39, 243000), datetime.datetime(2013, 4, 8, 6, 4, 34, 667000), datetime.datetime(2013, 4, 8, 17, 6, 8, 718000), datetime.datetime(2013, 4, 9, 6, 2, 32, 813000), datetime.datetime(2013, 4, 9, 15, 16, 46, 622000), datetime.datetime(2013, 4, 10, 5, 26, 16, 694000), datetime.datetime(2013, 4, 10, 18, 50, 54, 809000), datetime.datetime(2013, 4, 11, 15, 12, 29, 376000), datetime.datetime(2013, 4, 12, 6, 9, 38, 925000), datetime.datetime(2013, 4, 12, 14, 42, 32, 607000), datetime.datetime(2013, 4, 15, 10, 0, 59, 995000), datetime.datetime(2013, 4, 15, 10, 11, 42, 16000), datetime.datetime(2013, 4, 16, 6, 8, 3, 838000), datetime.datetime(2013, 4, 16, 15, 27, 35, 147000), datetime.datetime(2013, 4, 17, 6, 4, 44, 272000), datetime.datetime(2013, 4, 17, 15, 23, 0, 924000), datetime.datetime(2013, 4, 18, 6, 9, 55, 454000), datetime.datetime(2013, 4, 18, 15, 5, 43, 601000), datetime.datetime(2013, 4, 19, 6, 0, 38, 132000), datetime.datetime(2013, 4, 19, 16, 35, 26, 14000), datetime.datetime(2013, 4, 19, 17, 44, 17, 116000), datetime.datetime(2013, 4, 19, 17, 51, 48, 43000), datetime.datetime(2013, 4, 19, 17, 54, 30, 44000), datetime.datetime(2013, 4, 21, 14, 58, 56, 363000), datetime.datetime(2013, 4, 21, 15, 8, 11, 276000), datetime.datetime(2013, 4, 23, 6, 24, 57, 124000), datetime.datetime(2013, 4, 23, 15, 44, 30, 503000), datetime.datetime(2013, 4, 25, 6, 13, 9, 269000), datetime.datetime(2013, 4, 25, 15, 41, 11, 370000), datetime.datetime(2013, 4, 26, 6, 2, 17, 877000), datetime.datetime(2013, 4, 27, 16, 17, 34, 97000), 
datetime.datetime(2013, 4, 27, 18, 20, 57, 975000), datetime.datetime(2013, 4, 29, 10, 17, 41, 746000), datetime.datetime(2013, 4, 29, 16, 45, 18, 65000), datetime.datetime(2013, 4, 30, 6, 13, 2, 333000), datetime.datetime(2013, 4, 30, 15, 3, 22, 343000), datetime.datetime(2013, 5, 1, 7, 22, 40, 401000), datetime.datetime(2013, 5, 1, 11, 16, 38, 525000), datetime.datetime(2013, 5, 2, 6, 7, 7, 749000), datetime.datetime(2013, 5, 3, 12, 48, 22, 617000), datetime.datetime(2013, 5, 6, 6, 1, 1, 168000), datetime.datetime(2013, 5, 6, 14, 56, 48, 236000), datetime.datetime(2013, 5, 7, 16, 47, 4, 597000), datetime.datetime(2013, 5, 8, 15, 26, 52, 105000), datetime.datetime(2013, 5, 10, 6, 10, 39, 379000), datetime.datetime(2013, 5, 13, 6, 9, 57, 990000), datetime.datetime(2013, 5, 13, 19, 56, 15, 354000), datetime.datetime(2013, 5, 15, 16, 39, 9, 127000), datetime.datetime(2013, 5, 16, 5, 59, 27, 609000), datetime.datetime(2013, 5, 16, 14, 18, 33, 253000), datetime.datetime(2013, 5, 17, 6, 20, 11, 853000), datetime.datetime(2013, 5, 21, 15, 38, 10, 53000), datetime.datetime(2013, 5, 22, 5, 59, 8, 126000), datetime.datetime(2013, 5, 22, 15, 48, 55, 877000), datetime.datetime(2013, 5, 23, 5, 47, 4, 779000), datetime.datetime(2013, 5, 23, 16, 59, 16, 948000), datetime.datetime(2013, 5, 24, 10, 57, 34, 831000), datetime.datetime(2013, 5, 24, 12, 29, 17, 332000), datetime.datetime(2013, 5, 27, 17, 0, 14, 513000), datetime.datetime(2013, 6, 20, 7, 28, 45, 975000), datetime.datetime(2013, 6, 20, 13, 31, 13, 228000), datetime.datetime(2013, 6, 21, 6, 18, 47, 789000), datetime.datetime(2013, 7, 1, 6, 12, 3, 640000), datetime.datetime(2013, 7, 1, 14, 33, 9, 251000), datetime.datetime(2013, 7, 2, 14, 59, 0, 421000), datetime.datetime(2013, 7, 3, 6, 12, 58, 282000), datetime.datetime(2013, 7, 3, 17, 23, 38, 745000), datetime.datetime(2013, 7, 5, 13, 40, 44, 719000), datetime.datetime(2013, 7, 9, 14, 51, 27, 348000), datetime.datetime(2013, 7, 10, 5, 12, 3, 104000)]
It should be ordered by date. What I need to know is how many distinct days appear in this array. If there are several entries on the same day, I count that day only once.
I could do it "by hand", iterating over each entry, comparing against a temporary variable and counting the days, but isn't there a faster, proper way to "unique" by day?
thanks
You can use the datetime.date() method:
s = {d.date() for d in dates}
print(len(s))
Since dates are hashable, you can put them in a set just fine...
Note that you could also get a count of how many times each date appeared:
import collections
print(collections.Counter(d.date() for d in dates))
Or even build lists of datetime instances keyed by date:
import collections
d = collections.defaultdict(list)
for dt in dates:
    d[dt.date()].append(dt)
Although, I suppose that since the input is sorted, you could do the same thing more or less with itertools.groupby:
import itertools
for date, dt_group in itertools.groupby(dates, key=lambda dt: dt.date()):
    print(date, list(dt_group))
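For reference, the three ideas above can be tried on a small sample. This is only a sketch: the four timestamps are taken from the start of the list in the question (microseconds dropped), and the variable names are chosen for this example.

```python
import collections
import datetime
import itertools

# First four entries of the question's list, microseconds dropped.
dates = [
    datetime.datetime(2012, 1, 11, 17, 24, 12),
    datetime.datetime(2012, 2, 3, 11, 25, 17),
    datetime.datetime(2012, 2, 3, 14, 9, 23),
    datetime.datetime(2012, 2, 4, 9, 15, 26),
]

# Number of distinct calendar days.
unique_days = {d.date() for d in dates}
print(len(unique_days))                      # 3

# How many entries fall on each day.
per_day = collections.Counter(d.date() for d in dates)
print(per_day[datetime.date(2012, 2, 3)])    # 2

# Group the entries by day; this relies on the input being sorted.
grouped = {
    day: list(group)
    for day, group in itertools.groupby(dates, key=lambda d: d.date())
}
print(len(grouped[datetime.date(2012, 2, 3)]))   # 2
```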
If you simply want to count how many unique days there are, the following works:
print(len({(i.day, i.month, i.year) for i in dates}))
Including the month and year ensures that it is in fact the same date and not just the same day number: the 1st of November has the same .day as the 1st of December, but they are obviously not the same day.
I've been trying to wrap my head around the best way to split up this list of numbers, which is ordered but broken into sections. Example:
data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 29, 30, 31, 32, 33, 35, 36, 44, 45, 46, 47]
I'd like the output to be this:
sliced_data = [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],[29, 30, 31, 32, 33],[35, 36],[44, 45, 46, 47]]
I've been trying a while loop that runs until the list is empty, but that isn't working too well.
Edit:
for each_half_hour in half_hour_blocks:
    if next_number != each_half_hour:
        skippers.append(half_hour_blocks[:next_number])
        del half_hour_blocks[:next_number]
    next_number = each_half_hour + 1
>>> data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 29, 30, 31, 32, 33, 35, 36, 44, 45, 46, 47]
>>> from itertools import groupby, count
>>> [list(g) for k,g in groupby(data, key=lambda i, c=count():i-next(c))]
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [29, 30, 31, 32, 33], [35, 36], [44, 45, 46, 47]]
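The one-liner works because, within a run of consecutive integers, the value minus its position stays constant, so groupby sees one key per run. A slightly more explicit sketch of the same idea follows; it uses enumerate instead of the default-argument counter, and the function name split_consecutive is chosen for this example.

```python
from itertools import groupby


def split_consecutive(values):
    """Split a sorted list of integers into runs of consecutive numbers."""
    runs = []
    # value - index is constant inside a run and changes at every gap
    for _, group in groupby(enumerate(values), key=lambda pair: pair[1] - pair[0]):
        runs.append([value for _, value in group])
    return runs


data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
        29, 30, 31, 32, 33, 35, 36, 44, 45, 46, 47]
print(split_consecutive(data))
# [[0, 1, ..., 19], [29, 30, 31, 32, 33], [35, 36], [44, 45, 46, 47]]
```

Unlike the counter hidden in a default argument, this version can be called repeatedly without the key drifting between calls.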
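I don't see why a while loop wouldn't work here, unless you're after something more efficient or succinct.
Something like:
# Note: this consumes data (it is empty afterwards) and assumes it is non-empty.
current = [data.pop(0)]
sliced_data = []
while data:
    if data[0] == current[-1] + 1:
        # next value continues the current run
        current.append(data.pop(0))
    else:
        # gap found: store the finished run and start a new one
        sliced_data.append(current)
        current = [data.pop(0)]
sliced_data.append(current)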