Python - Counter merges elements from list together - python

I have a list full of Windows API calls:
listOfSequences =
['GetSystemDirectoryA',
'IsDBCSLeadByte',
'LocalAlloc',
'CreateSemaphoreW',
'CreateSemaphoreA',
'GlobalAddAtomW',
'lstrcpynW',
'LoadLibraryExW',
'SearchPathW',
'CreateFileW',
'CreateFileMappingW',
'MapViewOfFileEx',
'GetSystemMetrics',
'RegisterClipboardFormatW',
'SystemParametersInfoW',
'GetDC',
'GetDeviceCaps',
'ReleaseDC', ...... and so on .....]
Since some of them occur several times, I wanted to count their occurrences, so I used collections.Counter.
But it concatenates some APIs together:
lCountedAPIs = Counter(listOfSequences)
When I print lCountedAPIs I get the following:
Counter({'IsRectEmptyLocalAlloc': 2,
'DdePostAdvise': 3,
'DispatchMessageWGetModuleFileNameA': 2,
'FindResourceExW': 50318,
'ReleaseDCGetModuleFileNameW': 7,
'DefWindowProcAGetThreadLocale': 1,
'CoGetCallContext': 40,
'CoGetTreatAsClassGetCommandLineA': 1,
'GetForegroundWindowGetSystemDirectoryW': 1,
'GetModuleHandleWGetSystemTimeAsFileTime': 2,
'WaitForSingleObjectExIsChild': 1,
'LoadIconAGetWindowsDirectoryW': 2,
'GlobalFreeLocalAlloc': 10,
'GetMapModeCreateSemaphoreW': 1,
'HeapLock': 11494, <---------- A
'CharNextAGetCurrentProcessId': 11, <---------- B
'RemovePropWGetStartupInfoA': 1,
'GetTickCountGetVersionExW': 55,
So, for example:
HeapLock (see A) was not merged with another API.
But CharNextA was concatenated with GetCurrentProcessId (see B).
Can somebody tell me why this happens and how to fix it?
Thanks in advance & best regards :)

Check your list definition. Python concatenates adjacent string literals, so you must have missed a comma somewhere in the middle:
listOfSequences = [
'GetSystemDirectoryA',
'IsDBCSLeadByte',
'LocalAlloc',
...
'CharNextA'
# ^ comma missing here
'GetCurrentProcessId',
...
]
This has bitten me several times.

Nothing in Counter does that. You must necessarily have 11 occurrences of 'CharNextAGetCurrentProcessId' in listOfSequences. You can check this by running 'CharNextAGetCurrentProcessId' in listOfSequences.
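The missing-comma effect described above can be reproduced with a short sketch. The list below is a made-up four-line sample, not the asker's data, and the comma after 'CharNextA' is left out on purpose:

```python
from collections import Counter

# Hypothetical sample; the comma after 'CharNextA' is deliberately missing,
# so Python joins it with the next string literal.
calls = [
    'HeapLock',
    'CharNextA'
    'GetCurrentProcessId',
    'HeapLock',
]

counts = Counter(calls)

print(len(calls))   # the list holds 3 strings, not 4
print(counts)
```

Counter only counts what is already in the list, so a merged name in the result means the merged string was present in the input.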

Related

Find every element's first index in a list

Here is a list a=[1,1,1,2,4,2,4,32,1,4,35,23,24,23]
I do this in python:
unique_number=list(set(a))
ans=map(lambda x:a.index(x),unique_number)
output:
<map at 0x2b98b307828>
I want to know what's wrong with my code and find a more efficient way to achieve this.
This code would work as you expected in Python 2. In Python 3, map returns an iterator. You could, e.g., convert it to a list:
>>> ans=map(lambda x:a.index(x),unique_number)
>>> list(ans)
[7, 0, 3, 10, 4, 11, 12]
You can avoid repeatedly re-indexing and building a set first. Simply build a dict while iterating over a backwards: the dictionary keeps only the last value assigned to a key, which in this case is the earliest index at which that value appears, e.g.:
a=[1,1,1,2,4,2,4,32,1,4,35,23,24,23]
first_index = {v:len(a) - k for k,v in enumerate(reversed(a), 1)}
# {1: 0, 2: 3, 4: 4, 23: 11, 24: 12, 32: 7, 35: 10}
This way you're only scanning the sequence once.
Try this:
for value in map(lambda x:a.index(x),unique_number):
    print(value)
or append this:
for var in ans:
    print(var)
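For Python 3, a single forward pass is another option. This is only a sketch and is not taken from the answers above; the function name first_indices is made up for illustration. dict.setdefault stores an index only the first time a value is seen:

```python
def first_indices(seq):
    """Map each distinct value to the index of its first occurrence."""
    first = {}
    for index, value in enumerate(seq):
        # setdefault leaves an existing entry untouched, so later
        # occurrences of the same value do not overwrite the first index.
        first.setdefault(value, index)
    return first

a = [1, 1, 1, 2, 4, 2, 4, 32, 1, 4, 35, 23, 24, 23]
result = first_indices(a)
print(result)
```

Like the reversed-dict approach, this scans the list once instead of calling a.index for every distinct value.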

Keeping track of skipped numbers in Python

I'm pretty new to Python and I'm looking for a way for Python to keep track of skipped numbers in a sequence. For example, if I have a folder with pictures numbered 1-100, but 47, 58 and 98 are missing in the directory, how can I keep track of this?
You can subtract the set of numbers you actually have from a complete set of all the numbers, e.g.:
>>> incomplete_set = { 0, 1, 2, 3, 4, 6, 8, 9 }
>>> complete_set = set(range(10))
>>> complete_set - incomplete_set
set([5, 7])
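Applied to the picture example from the question, the same set difference might look like the sketch below. It assumes the picture numbers have already been extracted from the file names into a list; the helper name missing_numbers and the sample data are made up for illustration:

```python
def missing_numbers(present, first, last):
    """Return the sorted numbers in first..last (inclusive) not in present."""
    return sorted(set(range(first, last + 1)) - set(present))

# Hypothetical data: pictures 1-100 with 47, 58 and 98 absent.
present = [n for n in range(1, 101) if n not in (47, 58, 98)]
gaps = missing_numbers(present, 1, 100)
print(gaps)
```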

Piping a pipe-delimited flat file into python for use in Pandas and Stats

I have searched a lot, but haven't found an answer to this.
I am trying to pipe in a flat file with data and put it into something Python can read and that I can do analysis with (for instance, perform a t-test).
First, I created a simple pipe delimited flat file:
1|2
3|4
4|5
1|6
2|7
3|8
8|9
and saved it as "simpledata".
Then I created a bash script in nano as
#!/usr/bin/env python
import sys
from scipy import stats
A = sys.stdin.read()
print A
paired_sample = stats.ttest_rel(A[:,0],A[:,1])
print "The t-statistic is %.3f and the p-value is %.3f." % paired_sample
Then I save the script as pairedttest.sh and run it as
cat simpledata | pairedttest.sh
The error I get is
TypeError: string indices must be integers, not tuple
Thanks for your help in advance
Are you trying to call this?:
paired_sample = stats.ttest_rel([1,3,4,1,2,3,8], [2,4,5,6,7,8,9])
If so, it won't work as written. A is just a string when you read it from stdin, so you can't index it like a two-dimensional array. You need to build the two lists from the string. The most obvious way is like this:
left = []
right = []
for line in A.splitlines():
    l, r = line.split("|")
    left.append(int(l))
    right.append(int(r))
print left
print right
This will output:
[1, 3, 4, 1, 2, 3, 8]
[2, 4, 5, 6, 7, 8, 9]
So you can call stats.ttest_rel(left, right)
Or to be really clever and make a (nearly impossible to read) one-liner out of it:
z = zip(*[map(int, line.split("|")) for line in A.splitlines()])
This sets z to:
[(1, 3, 4, 1, 2, 3, 8), (2, 4, 5, 6, 7, 8, 9)]
So you can call stats.ttest_rel(*z)
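A Python 3 version of the parsing step could look like the following sketch. The function name parse_columns is hypothetical, and the sample text stands in for what sys.stdin.read() would return when the file is piped in:

```python
def parse_columns(text, sep="|"):
    """Split 'a|b' lines into two lists of ints, skipping blank lines."""
    left, right = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate a trailing blank line
        l, r = line.split(sep)
        left.append(int(l))
        right.append(int(r))
    return left, right

# Stand-in for sys.stdin.read(); same contents as the "simpledata" file.
sample = "1|2\n3|4\n4|5\n1|6\n2|7\n3|8\n8|9\n"
left, right = parse_columns(sample)
print(left)
print(right)
```

The two lists can then be passed to stats.ttest_rel(left, right) as in the answer above.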

Using combinatorics in Python to list 4-digit passcodes

I came across this interesting article about why using 3 unique numbers for a 4-digit passcode is the most secure: (LINK)
The math involved is pretty straightforward - if you have to guess a phone's 4-digit passcode based on the smudges left on the screen, then:
4 smudges indicate that there are 4 unique numbers in the passcode. Since each of them must be used at least once, then we have 4! = 24 possible passcodes.
With 3 distinct numbers, the passcode becomes a little more secure. Since there are three smudges, one number is repeated - but we don't know which one. So accounting for multiplicity, we get (4!/2!) x 3 = 36 possible passcodes.
Similarly, with 2 distinct numbers, we get 14 possible passcodes.
My question is, is there any way I can "prove" the above in Python? A way to justify that 3 numbers gives the most secure passcode, with Python code, probably something that lists out all possible passcodes if you give it some numbers? I was thinking about using itertools, with itertools.permutations as a starting point, but then I found that Python has a combinatorics module, which may be a far more elegant way. Would someone be kind enough to show me how to use it? I'm reading the documentation right now but some syntax is escaping me.
There is no combinatorics module in the standard distribution, but this is easy to do regardless. For example,
def guess(smudged_numbers):
    from itertools import product
    num_smudges = len(smudged_numbers)
    for raw in product(smudged_numbers, repeat=4):
        if len(set(raw)) == num_smudges:
            yield raw

count = 0
for nums in guess([1, 8]):
    print nums
    count += 1
print "total", count
That prints:
(1, 1, 1, 8)
(1, 1, 8, 1)
(1, 1, 8, 8)
(1, 8, 1, 1)
(1, 8, 1, 8)
(1, 8, 8, 1)
(1, 8, 8, 8)
(8, 1, 1, 1)
(8, 1, 1, 8)
(8, 1, 8, 1)
(8, 1, 8, 8)
(8, 8, 1, 1)
(8, 8, 1, 8)
(8, 8, 8, 1)
total 14
The search space is very small (len(smudged_numbers)**4, which is at most 4**4 = 256), so there's no point in doing anything fancier ;-)
How it works: it generates all possible (product) 4-tuples (repeat=4) containing the passed-in sequence of smudged numbers. So for [1, 8], it generates all 2**4 = len(smudged_numbers)**4 = 16 possibilities for a 4-tuple containing nothing but 1's and 8's.
Converting a raw possibility to a set then tells us how many (len) different numbers appear in the raw 4-tuple. We only want those containing all the smudged numbers. That's all there is to it. In the [1, 8] case, this step only weeds out 2 of the 16 raw 4-tuples: (1, 1, 1, 1) and (8, 8, 8, 8).
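To compare the cases from the question side by side, the same filter can be wrapped in a counting function. This is a Python 3 sketch; the name count_passcodes and the use of 0..n-1 as stand-in digits are choices made here for illustration:

```python
from itertools import product

def count_passcodes(smudged_digits, length=4):
    """Count codes of the given length that use every smudged digit at least once."""
    needed = set(smudged_digits)
    return sum(
        1
        for code in product(smudged_digits, repeat=length)
        if set(code) == needed
    )

# Which digits are smudged does not affect the count, only how many there are.
counts = {n: count_passcodes(list(range(n))) for n in range(1, 5)}
for n, total in counts.items():
    print(n, "distinct digits:", total, "possible passcodes")
```

The counts for 2, 3 and 4 distinct digits come out as 14, 36 and 24, matching the figures in the question, with 3 distinct digits giving the most candidates.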
My try with the permutations method in the itertools module.
I have added the shuffle function from the random module to generate more random tries, as a normal cracker would. (To try your luck you would never go serially, would you?!) But if you want the serial method, you can just remove the shuffle(all_codes) line.
from itertools import permutations
from random import shuffle

def crack_the_code(smudges, actual_code):
    """ Takes a list of digit strings (smudges) & generates all possible
    permutations of it (4 digits long). It then compares the actual given
    code & returns the index of it in the generated list, which basically
    becomes the number of tries.
    """
    no_smudges = len(smudges)
    if no_smudges == 3:
        all_codes = ["".join(digits)
                     for repeated_num in smudges
                     for digits in permutations([repeated_num]+smudges)
                     ]
        all_codes = list(set(all_codes)) # remove duplicates
    elif no_smudges == 4:
        all_codes = ["".join(digits)
                     for digits in permutations(smudges)
                     ]
    else:
        print "Smudges aren't 3 or 4"
        raise ValueError
    shuffle(all_codes)
    return all_codes.index(actual_code)

print crack_the_code(["1","2","3"],"1232")
# above prints random values between 0 & 35 inclusive.
Note - You may play around with the function if you like int & not str.
PS - I have kept the code self-explanatory, but you can always comment & ask something you don't understand.

Backspace does not seem to work in python

network={1:[2,3,4],2:[1,3,4], 3:[1,2], 4:[1,3,5], 5:[6,7,8], 6:[5,8],7:[5,6], 8:[5,6,7]}
str1='network.csv'
output = open(str1,'w')
for ii1 in network.keys():
    output.write(repr(ii1)+":[")
    for n in network[ii1]:
        output.write(' %s,'%(repr(n)))
    output.write('\b'+']\n')
output.close()
What I expect is something like:
1:[ 2, 3, 4]
2:[ 1, 3, 4]
3:[ 1, 2]
4:[ 1, 3, 5]
5:[ 6, 7, 8]
6:[ 5, 8]
7:[ 5, 6]
8:[ 5, 6, 7]
but what I get is:
1:[ 2, 3, 4,]
2:[ 1, 3, 4,]
3:[ 1, 2,]
4:[ 1, 3, 5,]
5:[ 6, 7, 8,]
6:[ 5, 8,]
7:[ 5, 6,]
8:[ 5, 6, 7,]
I am a newbie....could someone please help?
The "\b" simply inserts the ASCII backspace character; it does not remove the just-written character from the output file. This is why your code doesn't behave as you expect.
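That the backspace ends up in the output as an ordinary character can be checked with a small Python 3 sketch. It uses io.StringIO as an in-memory stand-in for the output file, which is an assumption made here so the example runs without touching the disk:

```python
import io

buf = io.StringIO()              # in-memory stand-in for the output file
buf.write("1:[ 2, 3, 4,")
buf.write("\b" + "]\n")          # same final write as in the question

written = buf.getvalue()
print(repr(written))             # the comma is still there, followed by \x08
```

The trailing comma is still in the data; a terminal may hide it when displaying the text, but the file contents are unchanged.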
Now, to fix it you could replace
for ii1 in network.keys():
    output.write(repr(ii1)+":[")
    for n in network[ii1]:
        output.write(' %s,'%(repr(n)))
    output.write('\b'+']\n')
with
for ii1 in network.keys():
    output.write(repr(ii1)+":[ ")
    output.write(", ".join(map(repr, network[ii1])))
    output.write(']\n')
or, to improve it further, with
for k, v in network.items():
    print >>output, "%s:[ %s]" % (repr(k), ", ".join(map(repr, v)))
Lastly, if the keys are simple integers as your example indicates, then the repr(k) can be simplified to just k. Also, if the values in the dictionary are lists of integers or somesuch, then the entire ", ".join(map(repr, v)) dance might be unnecessary.
Use str.join to generate Comma-Separated-Values, to avoid the need for backspace:
str.join(iterable)
Return a string which is the concatenation of the strings in the iterable iterable. The separator between elements is the string providing this method.
A simpler approach is, for example, list comprehensions iterating over dictionary items:
>>> [output.write("%s:%s\n" % item) for item in network.items()]
Why not use str(dict)?
for k, v in network.iteritems():
    output.write(str({k: v})[1:-1] + '\n')
You can't delete characters written in a file in general.
However, with a little redesigning of your code, you can get this:
network={1:[2,3,4],2:[1,3,4], 3:[1,2], 4:[1,3,5], 5:[6,7,8], 6:[5,8],7:[5,6], 8:[5,6,7]}
str1='network.csv'
output = open(str1,'w')
for ii1 in network.keys():
    output.write(repr(ii1)+":[")
    first=True
    for n in network[ii1]:
        if first:
            first=False  # no separator before the first value
        else:
            output.write(',')
        output.write('%s'%(repr(n)))
    output.write(']\n')
output.close()
Whether or not the backspace character actually 'backspaces' is probably dependent on the shell you're using.
It is much simpler and easier (and proper) to just output the data yourself as you want it formatted.
network={1:[2,3,4],2:[1,3,4], 3:[1,2], 4:[1,3,5], 5:[6,7,8], 6:[5,8],7:[5,6], 8:[5,6,7]}
output = open('network.csv','w')
for key,values in network.items():
    str_values = [str(x) for x in values]
    output.write('%s:[%s]\n' % (key,','.join(str_values)))
output.close()
Try this:
network={1:[2,3,4],2:[1,3,4], 3:[1,2], 4:[1,3,5], 5:[6,7,8], 6:[5,8],7:[5,6], 8:[5,6,7]}
str1='network.csv'
with open(str1, 'w') as output:
    for ii1 in network.keys():
        output.write(repr(ii1)+":[")
        output.write(','.join(repr(n) for n in network[ii1]))
        output.write(']\n')
Output in network.csv:
1:[2,3,4]
2:[1,3,4]
3:[1,2]
4:[1,3,5]
5:[6,7,8]
6:[5,8]
7:[5,6]
8:[5,6,7]
Some points:
I'm using with ... as ...:. This guarantees that the file will be closed properly.
I'm using ','.join to create the comma-separated list. This is the 'pythonic' way to merge lists (or, more precisely, iterables) of strings.
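The two points above can be combined in a Python 3 sketch that builds the text first and writes it afterwards. The helper name format_network is made up for illustration, and the dictionary is a shortened copy of the one in the question:

```python
def format_network(network):
    """Return one 'key:[v1,v2,...]' line per entry, with no trailing commas."""
    lines = []
    for key, neighbours in network.items():
        # join puts the separator only between items, never after the last one
        lines.append("%s:[%s]" % (key, ",".join(str(n) for n in neighbours)))
    return "\n".join(lines) + "\n"

network = {1: [2, 3, 4], 2: [1, 3, 4], 3: [1, 2]}
text = format_network(network)
print(text, end="")
```

To write the result to disk, the string can be passed to output.write(text) inside a with open('network.csv', 'w') as output: block.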
