Python - fastest way to find unique "words" in a string
I have a string, s, with a length of 2*2*1092-1 = 4367 characters. Here are the first 15 characters:
0 0 0 1 0 1 0 0
I am interested in the characters as couples; i.e. for the first 15 characters:
00, 01, 01, 00
My string only contains 0 and 1 and the possible couples are thus:
00, 01, 10 and 11
I want to get an unordered list of all the couples present in my string; i.e. for the first 15 characters:
00 and 01
What is the fastest way to do this in Python? Below are 3 methods (Python 3):
    def method1(s):
        l_couples = [s[4*i:4*i+3] for i in range(1092)]
        set_couples = set(l_couples)

    def method2(s):
        set_couples = []
        for i in range(1092):
            couple = s[4*i:4*i+3]
            if not couple in set_couples:
                set_couples += [couple]

    def method3(s):
        set_couples = set()
        for i in range(1092):
            couple = s[4*i:4*i+3]
            set_couples |= set([couple])
I looped over each method 10k times and got these run times:
method1: 3.94s
method2: 5.64s
method3: 10.7s
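A measurement like the one above could be reproduced with the standard timeit module. The following is only a sketch: the string s is a hypothetical stand-in with the same length and layout as the real data, method1 is given a return value so the result can be checked, and the loop count of 1000 is an arbitrary choice (the question used 10k).

```python
import timeit

def method1(s):
    # Same slicing as in the question, but the number of couples is derived
    # from len(s) and the set is returned so it can be inspected.
    return set(s[4*i:4*i+3] for i in range((len(s) + 1) // 4))

# Hypothetical test string: 2184 digits separated by single spaces (4367 chars).
s = " ".join("0001" * 546)

# Total wall-clock time for 1000 calls, in seconds.
elapsed = timeit.timeit(lambda: method1(s), number=1000)
print(len(s), method1(s), elapsed)
```

Absolute numbers depend on the machine and Python version, so only the relative ordering of methods measured on the same system is meaningful.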
Here is the entire string consisting of 4367 characters:
0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 1 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 0 0 1 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 1 0 1 0 1 0 0 0 1 0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Thank you.
Only optimize on such a low level if you really need to!
There is no reason to micro-optimize for such a small amount of input data. A string of ~4000 characters is next to nothing.
Donald Knuth: "We should forget about small efficiencies, say about
97% of the time: premature optimization is the root of all evil"
So in this case the most readable and comprehensible solution would be the best.
But to be constructive, here is my proposal:
    def method5(s):
        # range instead of xrange, since the question targets Python 3
        return {s[x:x+3] for x in range(0, len(s), 4)}
%timeit method5(s)
10000 loops, best of 3: 123 us per loop
and here the method by another user:
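    def method4(s):
        # floor division keeps this valid on Python 3; note that it skips the
        # final couple when len(s) is not a multiple of 4
        return {s[4*i:4*i + 3] for i in range(len(s) // 4)}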
%timeit method4(s)
10000 loops, best of 3: 176 us per loop
and the benchmarks for the other methods:
%timeit method1(s)
10000 loops, best of 3: 184 us per loop
%timeit method2(s)
10000 loops, best of 3: 185 us per loop
%timeit method3(s)
1000 loops, best of 3: 513 us per loop
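All of the slicing variants above return couples that still contain the separating space ('0 0', '0 1'). If the couples are wanted as '00' and '01', one possible sketch is to drop the spaces first with a stride-2 slice. It assumes the digits are separated by exactly one space; unique_couples is a name made up for this example, and no timing claim is made for it.

```python
def unique_couples(s):
    # Every second character is a digit when digits are single-space separated.
    digits = s[::2]
    # Walk the digit string two characters at a time and collect distinct pairs.
    return {digits[i:i+2] for i in range(0, len(digits), 2)}

print(sorted(unique_couples("0 0 0 1 0 1 0 0")))  # ['00', '01']
```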
Instead of brute force indexing through the string like it was an array, learn about Python's features like split, and iter, and zip:
    def uniqueCouples(s):
        # get all the characters in a list, splitting on whitespace
        items = s.split()
        # create an iterator over this list
        it = iter(items)
        # using zip(it,it) will create a sequence of tuples, taking the generated list
        # in pairs; then use ''.join() to merge the tuples into couples; then find the
        # set of unique pairs
        return set(''.join(couple) for couple in zip(it,it))
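The zip(it, it) idiom works because both arguments are the same iterator, so each tuple consumes two consecutive items. A small illustration on the first 15 characters of the question's string (the variable names are only for this example):

```python
items = "0 0 0 1 0 1 0 0".split()   # ['0', '0', '0', '1', '0', '1', '0', '0']
it = iter(items)

# zip pulls one item for each argument in turn; since both arguments are the
# same iterator, every tuple holds two neighbouring items.
pairs = list(zip(it, it))
print(pairs)  # [('0', '0'), ('0', '1'), ('0', '1'), ('0', '0')]
```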
EDIT: adding performance numbers
To normalize out differences in hardware, I ran @michaelkrisper's method5 on my system and got 189 us (note also that method5 returns '0 1' and '0 0', not '01' and '00' as requested).
Testing the above solution as posted gives a time of about 370 us. Then I realized that I was calling ''.join on every zipped tuple, instead of just on the reduced number of items after the set removed all the duplicates - who cares now if join is slow, we're only going to call it a couple of times. Changing the return statement to:
        return [''.join(cpl) for cpl in set(zip(it,it))]
cuts the time to 209 us.
So maybe split() is slowing us down, so I'll change to using islice, creating a slicing iterator to walk through the input source string (of course this is now a bit more fragile, as any deviation in the input source format, like an extra space between values, will break our code, whereas using split, while a little slower, is more robust). Changed to:
    from itertools import islice

    def uniqueCouples(s):
        it = islice(s, 0, None, 2)
        return [''.join(cpl) for cpl in set(zip(it,it))]
And the time now drops to 197 us. Changing the list comprehension to use map(''.join, set(zip(it,it))) drops us down to about 194 us.
So I'm not sure where you get your information that split and join are slow - the big inefficiency I had in my submission was that I was calling join before using set to remove the duplicates.
Related
Hi, I took a Python algorithm Question concerning 2-dimensional arrays
I'm struggling to solve a Python programming question concerning 2-dimensional arrays. First, a 19*19 array is given. Then the number of coordinates is given, followed by the coordinates themselves. For example:

    0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0
    1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0
    1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0
    2
    10 10
    12 12

Each coordinate flips every value on the row and the column that meet at that coordinate, from 0 to 1 or from 1 to 0. Accordingly, the example above returns the result below.

    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The code I made for this algorithm is this.
    d = []
    for i in range(1, 20):
        d.append(input().split())
    n = int(input())
    for i in range(n):
        x, y = map(int, input().split())
        x, y = x-1, y-1
        for j in range(1, 20):
            if d[j - 1][int(y)] == 0:
                d[j - 1][int(y)] = 1
            else:
                d[j - 1][int(y)] = 0
            if d[int(x)][j - 1] == 0:
                d[int(x)][j - 1] = 1
            else:
                d[int(x)][j - 1] = 0
    for i in range(1, 20):
        for j in range(1, 20):
            print(d[i - 1][j - 1], end=' ')
        print()

However, when the values below are assigned, the result is incorrect. These are the assigned values which make my code return a wrong result.

    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    2
    1 1
    19 19

Below is what my code returns, which is a wrong answer.
    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

But the right answer should be the same as below.

    0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
    1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
    0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0

I've been struggling for over an hour but I still don't see what's wrong with my code. Thank you for reading my question; I hope you can point me in the right direction.
Here is a working solution minus the input feature. You might have to adapt the variables to best suit your needs.

    # array is the 19x19 array (holding ints, not strings)
    # change is the coordinates to change, namely [10, 10] and [12, 12]
    # the +1 mod 2 simply flips the 0s and 1s
    for c in change:
        # flip the column given by the second coordinate
        for i in range(19):
            array[i][c[1]-1] = (array[i][c[1]-1]+1)%2
        # flip the row given by the first coordinate
        for i in range(19):
            array[c[0]-1][i] = (array[c[0]-1][i]+1)%2
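One likely cause of the wrong output in the question is that input().split() produces strings, so a comparison such as d[j - 1][y] == 0 is never true for the original cells ('0' is not equal to 0). The sketch below converts the rows to ints first and flips with XOR. The helper names read_grid and toggle_cross are made up for this example, and the grid is built in code rather than read from standard input.

```python
def read_grid(lines):
    # Convert whitespace-separated text rows into lists of ints.
    return [[int(tok) for tok in line.split()] for line in lines]

def toggle_cross(grid, coords):
    # Flip every cell on the row and the column of each 1-based (x, y) pair.
    size = len(grid)
    for x, y in coords:
        r, c = x - 1, y - 1
        for j in range(size):
            grid[j][c] ^= 1  # flip column c
            grid[r][j] ^= 1  # flip row r (the crossing cell is flipped twice)
    return grid

grid = read_grid(["0 " * 19] * 19)          # 19x19 grid of zeros
result = toggle_cross(grid, [(1, 1), (19, 19)])
for row in result:
    print(" ".join(map(str, row)))
```

With the all-zero grid and the coordinates 1 1 and 19 19, this prints the grid that the question lists as the right answer.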
Is there a function in TensorFlow 2 where I can create a one-dimensional tensor of ones and zeros with equal distance between ones?
I am creating a Spiking Neural Network and want to create a spike train with 1000 timesteps that spikes every x timesteps.

    inp_spikes = tf.reshape(tf.random.categorical(tf.math.log([[((1000-49)/1000), (49/1000)]]), 1*1000), [1, 1000])

The above code will give me a tensor with approx 49 ones and the rest zeros, but it is random. In the interest of reproducibility, I want to take the randomness out and also have the ones equally spread out.
I'll assume you want a tensor of length 1000, with a 1 every 20 indexes. There's probably a simpler method, but one way would be to just use a range, a mod, a comparison to 0, and a cast (in TensorFlow 2 the mod op lives under tf.math):

    inp_spikes = tf.cast(tf.less_equal(tf.math.floormod(tf.range(1000), 20), 0), tf.int32)

or alternatively you can use tf.scatter_nd:

    list_of_inds = list(range(0, 1000, 20))
    indices = tf.constant([[i] for i in list_of_inds])
    updates = tf.constant([1] * len(list_of_inds))
    shape = tf.constant([1000])
    inp_spikes = tf.scatter_nd(indices, updates, shape)
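To see what the range/mod/compare construction produces without needing TensorFlow, here is a plain-Python sketch of the same arithmetic. The function name spike_train and its parameters are made up for this illustration; it is not a TensorFlow API.

```python
def spike_train(n_steps, period):
    # 1 at every index divisible by period, 0 elsewhere.
    return [1 if t % period == 0 else 0 for t in range(n_steps)]

train = spike_train(1000, 20)
print(sum(train))                                    # number of spikes
print([t for t, v in enumerate(train) if v][:5])     # first spike positions
```

With a length of 1000 and a period of 20 this gives 50 spikes at indices 0, 20, 40, and so on, which is the pattern the modulo-based tensor expression is meant to encode.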
Since you have relatively few non-zero values in the desired tensor, it is best to define it as a sparse tensor: import tensorflow as tf x = 20 # timesteps between spikes and index of first spike n = 1000 # overall number of timesteps s = tf.math.floordiv(n, x) # number of spikes indices = tf.cast(tf.linspace([x], [n-1], s), tf.int64) # indices of spikes values = tf.repeat(1, s) # values of spikes dense_shape = [n] # the dense_shape of the sparse tensor inp_spikes = tf.sparse.SparseTensor(indices=indices, values=values, dense_shape=dense_shape) print(tf.sparse.to_dense(inp_spikes)) # just to illustrate the result tf.Tensor( [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1], shape=(1000,), dtype=int32)
How to put data produced by my terminal into a numpy array
I am a beginner to programming in general, and my situation is as follows. I am doing a computation using software (polymake) that I'm running interactively with my terminal, and my computation output some numeric data that looks like this: facet 1 contains vertices: 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 -8566355578160561/9007199254740992 5566755204060609/18014398509481984 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 1 8566355578160561/9007199254740992 5566755204060609/18014398509481984 0 0 0 0 1 0 0 0 0 0 0 1 1323574716436937/2251799813685248 -7286977229400801/9007199254740992 0 0 0 0 0 1 0 0 0 0 0 1 -4044484486813853/18014398509481984 5566755204060609/18014398509481984 0 0 0 0 0 0 1 0 0 0 0 1 4044484486813853/18014398509481984 5566755204060609/18014398509481984 0 0 0 0 0 0 0 1 0 0 0 1 -3272056622340821/9007199254740992 -4252622667048423/36028797018963968 0 0 0 0 0 0 0 0 1 0 0 1 0 -6880887921216781/18014398509481984 0 0 0 0 0 0 0 0 0 1 0 1 1000927696824871/2251799813685248 -6629910960894707/18014398509481984 0 0 0 0 0 0 0 0 0 0 1 1 0 0 2 0 0 0 0 0 0 0 0 0 0 1 -8566355578160561/9007199254740992 5566755204060609/18014398509481984 0 0 2 0 0 0 0 0 0 0 0 1 0 1 0 0 0 2 0 0 0 0 0 0 0 1 8566355578160561/9007199254740992 5566755204060609/18014398509481984 0 0 0 0 2 0 0 0 0 0 0 1 1323574716436937/2251799813685248 -7286977229400801/9007199254740992 0 0 0 0 0 2 0 0 0 0 0 1 -4044484486813853/18014398509481984 5566755204060609/18014398509481984 0 0 0 0 0 0 2 0 0 0 0 1 4044484486813853/18014398509481984 5566755204060609/18014398509481984 0 0 0 0 0 0 0 2 0 0 0 1 -3272056622340821/9007199254740992 -4252622667048423/36028797018963968 0 0 0 0 0 0 0 0 2 0 0 1 0 -6880887921216781/18014398509481984 0 0 0 0 0 0 0 0 0 2 0 1 1000927696824871/2251799813685248 -6629910960894707/18014398509481984 0 0 0 0 0 0 0 0 0 0 2 facet 2 contains vertices: 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 -1323574716436937/2251799813685248 -7286977229400801/9007199254740992 0 1 0 0 0 0 0 0 0 0 0 1 
-8566355578160561/9007199254740992 5566755204060609/18014398509481984 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 1 8566355578160561/9007199254740992 5566755204060609/18014398509481984 0 0 0 0 1 0 0 0 0 0 0 1 1323574716436937/2251799813685248 -7286977229400801/9007199254740992 0 0 0 0 0 1 0 0 0 0 0 1 -4044484486813853/18014398509481984 5566755204060609/18014398509481984 0 0 0 0 0 0 1 0 0 0 0 1 4044484486813853/18014398509481984 5566755204060609/18014398509481984 0 0 0 0 0 0 0 1 0 0 0 1 0 -6880887921216781/18014398509481984 0 0 0 0 0 0 0 0 0 1 0 1 1000927696824871/2251799813685248 -6629910960894707/18014398509481984 0 0 0 0 0 0 0 0 0 0 1 1 0 0 2 0 0 0 0 0 0 0 0 0 0 1 -1323574716436937/2251799813685248 -7286977229400801/9007199254740992 0 2 0 0 0 0 0 0 0 0 0 1 -8566355578160561/9007199254740992 5566755204060609/18014398509481984 0 0 2 0 0 0 0 0 0 0 0 1 0 1 0 0 0 2 0 0 0 0 0 0 0 1 8566355578160561/9007199254740992 5566755204060609/18014398509481984 0 0 0 0 2 0 0 0 0 0 0 1 1323574716436937/2251799813685248 -7286977229400801/9007199254740992 0 0 0 0 0 2 0 0 0 0 0 1 -4044484486813853/18014398509481984 5566755204060609/18014398509481984 0 0 0 0 0 0 2 0 0 0 0 1 4044484486813853/18014398509481984 5566755204060609/18014398509481984 0 0 0 0 0 0 0 2 0 0 0 1 0 -6880887921216781/18014398509481984 0 0 0 0 0 0 0 0 0 2 0 1 1000927696824871/2251799813685248 -6629910960894707/18014398509481984 0 0 0 0 0 0 0 0 0 0 2 I need to use this data to do computations, which I am doing using Python. 
In order for me to run my algorithm on the data, I need to first organize it into numpy arrays as follows: F_2 = np.array([ [0,0,1,0,0,0,0,0,0,0,0,0,0], [-1323574716436937/2251799813685248,-7286977229400801/9007199254740992,0,1,0,0,0,0,0,0,0,0,0], [-8566355578160561/9007199254740992,5566755204060609/18014398509481984,0,0,1,0,0,0,0,0,0,0,0], [0,1,0,0,0,1,0,0,0,0,0,0,0], [8566355578160561/9007199254740992,5566755204060609/18014398509481984,0,0,0,0,1,0,0,0,0,0,0], [1323574716436937/2251799813685248,-7286977229400801/9007199254740992,0,0,0,0,0,1,0,0,0,0,0], [-4044484486813853/18014398509481984,5566755204060609/18014398509481984,0,0,0,0,0,0,1,0,0,0,0], [4044484486813853/18014398509481984,5566755204060609/18014398509481984,0,0,0,0,0,0,0,1,0,0,0], [0,-6880887921216781/18014398509481984,0,0,0,0,0,0,0,0,0,1,0], [1000927696824871/2251799813685248,-6629910960894707/18014398509481984,0,0,0,0,0,0,0,0,0,0,1], [0,0,2,0,0,0,0,0,0,0,0,0,0], [-1323574716436937/2251799813685248,-7286977229400801/9007199254740992,0,2,0,0,0,0,0,0,0,0,0], [-8566355578160561/9007199254740992,5566755204060609/18014398509481984,0,0,2,0,0,0,0,0,0,0,0], [0,1,0,0,0,2,0,0,0,0,0,0,0], [8566355578160561/9007199254740992,5566755204060609/18014398509481984,0,0,0,0,2,0,0,0,0,0,0], [1323574716436937/2251799813685248,-7286977229400801/9007199254740992,0,0,0,0,0,2,0,0,0,0,0], [-4044484486813853/18014398509481984,5566755204060609/18014398509481984,0,0,0,0,0,0,2,0,0,0,0], [4044484486813853/18014398509481984,5566755204060609/18014398509481984,0,0,0,0,0,0,0,2,0,0,0], [0,-6880887921216781/18014398509481984,0,0,0,0,0,0,0,0,0,2,0], [1000927696824871/2251799813685248,-6629910960894707/18014398509481984,0,0,0,0,0,0,0,0,0,0,2] ]) This is extremely tedious to do by hand, since I have to place the data manually into a 2D numpy array. This involves having to place commas separating the numbers, and putting the sequences of numbers on each line between square brackets to form the rows of the 2D array etc. 
I am wondering if there is a way I can do this much faster with programming commands (especially since I have to do this many times)? Thank you very much in advance.
Use pandas:

    import pandas as pd
    df = pd.read_csv('yourContent', delimiter=r' ')
You could copy-paste your data into a text file and then use numpy.genfromtxt(), e.g.:

    import numpy as np
    arr = np.genfromtxt(filepath)

More info on how to use it is in the linked documentation. An even more efficient approach would be to collect the output of your script. One way of doing this in Python is by running the output-producing script via subprocess functionalities (e.g. subprocess.run()).
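Because the output contains exact fractions such as -8566355578160561/9007199254740992, one option is to parse the saved text with the standard fractions module before handing the rows to numpy. The sketch below rests on several assumptions: each vertex is on its own line, each block starts with a line like "facet 1 contains vertices:", and the leading 1 of every row is dropped (as in the F_2 array shown in the question). The name parse_facets is made up for this example.

```python
from fractions import Fraction

def parse_facets(text):
    # Map facet number -> list of rows, each row a list of floats.
    facets = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("facet"):
            current = int(line.split()[1])       # "facet 2 contains ..." -> 2
            facets[current] = []
        elif current is not None:
            # Fraction accepts both "3" and "-3/4"; convert to float afterwards.
            values = [float(Fraction(tok)) for tok in line.split()]
            facets[current].append(values[1:])   # drop the leading coordinate
    return facets

sample = "facet 1 contains vertices:\n1 0 1/2\n1 -3/4 2\n"
print(parse_facets(sample))
```

Each list of rows can then be wrapped with np.array(...) to obtain the 2D arrays described in the question.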
Python Multi-Index: Finding coordinates with level 2 index, DataFrame
I have an empty DataFrame with a MultiIndex on both the index and the columns. I also have a list of string pairs that are coordinates in the second-level indexes. Since all of my second-level index values are unique, I am hoping to find the coordinates and input values using my list of strings. Take a look at the example below:

    df=
    DNA        Cat2 ....
    Item       A B C D E F F H I J
    DNA  Item
    Cat2 A     0 0 0 0 0 0 0 0 0 0
         B     0 0 0 0 0 0 0 0 0 0
         C     0 0 0 0 0 0 0 0 0 0
         D     0 0 0 0 0 0 0 0 0 0
         E     0 0 0 0 0 0 0 0 0 0
         F     0 0 0 0 0 0 0 0 0 0
    ....

    str_cord = [(A,B),(A,H),(A,I),(B,H),(B,I),(H,I)]

    #and my output should be like below.
    df_result=
    DNA        Cat2 ....
    Item       A B C D E F F H I J
    DNA  Item
    Cat2 A     0 1 0 0 0 0 0 1 1 0
         B     0 0 0 0 0 0 0 1 1 0
         C     0 0 0 0 0 0 0 0 0 0
         D     0 0 0 0 0 0 0 0 0 0
         E     0 0 0 0 0 0 0 0 0 0
         F     0 0 0 0 0 0 0 0 0 0
         H     0 0 0 0 0 0 0 0 1 0
    ....

It looks kind of complicated, but all I want to do is use the entries of str_cord (e.g. str_cord[0]) as coordinates for df_result. I tried .loc, but it seems I need to supply the level 1 index. I am looking for a way to find the coordinates with the level 2 strings only, without having to supply the level 1 index. Hope it makes sense, and thanks in advance! (Oh, and the data itself is very big, so it should be as efficient as possible.)
You can use:

    for i, j in str_cord:
        idx = pd.IndexSlice
        df.loc[idx[:, i], idx[:, j]] = 1

Sample:

    L = list('ABCDEFGHIJ')
    mux = pd.MultiIndex.from_product([['Cat1','Cat2'], L])
    df = pd.DataFrame(0, index=mux, columns=mux)
    print (df)
           Cat1                          Cat2
             A  B  C  D  E  F  G  H  I  J  A  B  C  D  E  F  G  H  I  J
    Cat1 A   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         B   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         C   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         D   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         E   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         F   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         G   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         H   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         I   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         J   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
    Cat2 A   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         B   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         C   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         D   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         E   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         F   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         G   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         H   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         I   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         J   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

    str_cord = [('A','B'),('A','H'),('A','I'),('B','H'),('B','I'),('H','I')]
    for i, j in str_cord:
        idx = pd.IndexSlice
        df.loc[idx[:, i], idx[:, j]] = 1
    print (df)
           Cat1                          Cat2
             A  B  C  D  E  F  G  H  I  J  A  B  C  D  E  F  G  H  I  J
    Cat1 A   0  1  0  0  0  0  0  1  1  0  0  1  0  0  0  0  0  1  1  0
         B   0  0  0  0  0  0  0  1  1  0  0  0  0  0  0  0  0  1  1  0
         C   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         D   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         E   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         F   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         G   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         H   0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  1  0
         I   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         J   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
    Cat2 A   0  1  0  0  0  0  0  1  1  0  0  1  0  0  0  0  0  1  1  0
         B   0  0  0  0  0  0  0  1  1  0  0  0  0  0  0  0  0  1  1  0
         C   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         D   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         E   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         F   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         G   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         H   0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  1  0
         I   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
         J   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
how to read integers only from a file, and ignore all spaces, so the output will be a table or list
many thanks the reason I asked this question is that I want to analyse a table which I pulled out from a network switch, I only need to analyse the number; but there are a few difficulties, please see the table I pulled out as below 1, the table contains random number of spaces between numbers and string, some are 5, and some are 12; 2, I only need to analyze the number or integer, I want to eliminate the strings, 3, its better to save this into 2-dimensional list, I searched my question in this website, and tried a bit of functions people mentioned, like replace, split, and also tried a couple of lib like Anacoda; also tried different ideas, like replace space with , or read from .csv instead of .txt, or read only numbers, but none of them are working; I am still a new in programming, so definitely need to think about more, and definitely need people's help thanks this is my current code to read the data, but can not analyze it lines = [line.rstrip('\n') for line in open ('result.txt')] Port Align-Err FCS-Err Xmit-Err Rcv-Err UnderSize OutDiscards Gi1/0/1 0 0 0 0 0 0 Gi1/0/2 0 0 0 0 0 443 Gi1/0/3 0 0 0 0 0 0 Gi1/0/4 0 80 0 86 0 4029 Gi1/0/5 0 0 0 0 0 0 Gi1/0/6 0 0 0 0 0 0 Gi1/0/7 0 0 0 0 0 0 Gi1/0/8 0 0 0 0 0 626 Gi1/0/9 0 0 0 0 0 0 Gi1/0/10 0 0 0 0 0 0 Gi1/0/11 0 0 0 0 0 0 Gi1/0/12 0 0 0 0 0 0 Gi1/0/13 0 0 0 0 0 626 Gi1/0/14 0 0 0 0 0 626 Gi1/0/15 0 0 0 0 0 0 Gi1/0/16 0 0 0 0 0 626 Gi1/0/17 0 0 0 0 0 0 Gi1/0/18 0 0 0 0 0 0 Gi1/0/19 0 0 0 0 0 0 Gi1/0/20 0 0 0 0 0 0 Gi1/0/21 0 0 0 0 0 0 Gi1/0/22 0 0 0 0 0 20 Gi1/0/23 0 0 0 0 0 0 Gi1/0/24 0 0 0 0 0 0 Gi1/0/25 0 0 0 0 0 0 Gi1/0/26 0 0 0 0 0 0 Gi1/0/27 0 0 0 0 0 0 Gi1/0/28 0 0 0 0 0 0 Gi1/0/29 0 0 0 0 0 0 Gi1/0/30 0 0 0 0 0 0 Gi1/0/31 0 0 0 0 0 0 Gi1/0/32 0 0 0 0 0 0 Gi1/0/33 0 0 0 0 0 0 Gi1/0/34 0 0 0 0 0 0 Gi1/0/35 0 0 0 0 0 0 Gi1/0/36 0 0 0 0 0 0 Gi1/0/37 0 0 0 0 0 0 Gi1/0/38 0 0 0 0 0 0 Gi1/0/39 0 0 0 0 0 0 Gi1/0/40 0 0 0 0 0 0 Gi1/0/41 0 0 0 0 0 0 Gi1/0/42 0 0 0 0 0 33 Gi1/0/43 0 0 0 0 0 0 Gi1/0/44 0 0 0 0 0 0 
Gi1/0/45 0 0 0 0 0 0 Gi1/0/46 0 0 0 0 0 0 Gi1/0/47 0 0 0 0 0 0 Gi1/0/48 0 0 0 0 0 0 Gi1/1/1 0 0 0 0 0 462 Gi1/1/2 0 0 0 0 0 623 Gi1/1/3 0 0 0 0 0 62 Gi1/1/4 0 0 0 0 0 78 Gi2/0/1 0 0 0 0 0 0 Gi2/0/2 0 0 0 0 0 0 Gi2/0/3 0 0 0 0 0 0 Gi2/0/4 0 0 0 0 0 0 Gi2/0/5 0 0 0 0 0 0 Gi2/0/6 0 0 0 0 0 0 Gi2/0/7 0 0 0 0 0 0 Gi2/0/8 0 0 0 0 0 629 Gi2/0/9 0 0 0 0 0 0 Gi2/0/10 0 0 0 0 0 0 Gi2/0/11 0 0 0 0 0 0 Gi2/0/12 0 0 0 0 0 0 Gi2/0/13 0 0 0 0 0 628 Gi2/0/14 0 0 0 0 0 0 Gi2/0/15 0 0 0 0 0 0 Gi2/0/16 0 0 0 0 0 0 Gi2/0/17 0 0 0 0 0 0 Gi2/0/18 0 0 0 0 0 0 Gi2/0/19 0 0 0 0 0 0 Gi2/0/20 0 0 0 0 0 0 Gi2/0/21 0 0 0 0 0 0 Gi2/0/22 0 0 0 0 0 0 Gi2/0/23 0 0 0 0 0 0 Gi2/0/24 0 0 0 0 0 0 Gi2/0/25 0 0 0 0 0 0 Gi2/0/26 0 0 0 0 0 0 Gi2/0/27 0 0 0 0 0 0 Gi2/0/28 0 0 0 0 0 0 Gi2/0/29 0 0 0 0 0 0 Gi2/0/30 0 0 0 0 0 0 Gi2/0/31 0 0 0 0 0 0 Gi2/0/32 0 0 0 0 0 0 Gi2/0/33 0 0 0 0 0 0 Gi2/0/34 0 0 0 0 0 0 Gi2/0/35 0 0 0 0 0 0 Gi2/0/36 0 0 0 0 0 0 Gi2/0/37 0 0 0 0 0 0 Gi2/0/38 0 0 0 0 0 0 Gi2/0/39 0 0 0 0 0 0 Gi2/0/40 0 0 0 0 0 0 Gi2/0/41 0 0 0 0 0 0 Gi2/0/42 0 0 0 0 0 0 Gi2/0/43 0 0 0 0 0 148 Gi2/0/44 0 0 0 0 0 0 Gi2/0/45 0 0 0 0 0 0 Gi2/0/46 0 0 0 0 0 0 Gi2/0/47 0 0 0 0 0 0 Gi2/0/48 0 0 0 0 0 0 Gi2/1/1 0 0 0 0 0 0 Gi2/1/2 0 0 0 0 0 0 Gi2/1/3 0 0 0 0 0 0 Gi2/1/4 0 0 0 0 0 0 Po2 0 0 0 0 0 0 Po11 0 0 0 0 0 0 Po12 0 0 0 0 0 181 Po13 0 0 0 0 0 0 Po14 0 0 0 0 0 0 Po20 0 0 0 0 0 0 Po21 0 0 0 0 0 462 Po22 0 0 0 0 0 623 Po23 0 0 0 0 0 62 Po24 0 0 0 0 0 78 Po25 0 0 0 0 0 443
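Try this:

    with open('data', 'r') as content_file:
        content = content_file.read()
    # split() with no arguments splits on any run of whitespace, including newlines
    content = content.split()
    # group the tokens into rows of 7 (port name plus six counters)
    content = [content[i:i+7] for i in range(0, len(content), 7)]
    print(content)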
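If only the integers are wanted, a line-by-line variant may be easier to follow. The sketch below assumes that every data line starts with a port name followed only by integer counters, and that the header line is recognisable because its remaining fields are not integers. The function name parse_counters and the shortened sample text are made up for this example.

```python
def parse_counters(text):
    # Return (ports, rows): port names and a 2D list of integer counters.
    ports, rows = [], []
    for line in text.splitlines():
        fields = line.split()          # any run of spaces counts as one separator
        if len(fields) < 2:
            continue                   # skip blank or one-word lines
        try:
            numbers = [int(tok) for tok in fields[1:]]
        except ValueError:
            continue                   # header line such as "Port Align-Err ..."
        ports.append(fields[0])
        rows.append(numbers)
    return ports, rows

sample = """Port        Align-Err    FCS-Err
Gi1/0/1             0          0
Gi1/0/4             0         80
"""
ports, rows = parse_counters(sample)
print(ports)
print(rows)
```

For the real file, the text could be obtained with open('result.txt').read() and passed to the same function; rows[i][j] is then the j-th counter of the i-th port.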