I'm a Python newbie trying to parse a file to make a table of memory allocations. My input file is in the following format:
48 bytes allocated at 0x8bb970a0
24 bytes allocated at 0x8bb950c0
48 bytes allocated at 0x958bd0e0
48 bytes allocated at 0x8bb9b060
96 bytes allocated at 0x8bb9afe0
24 bytes allocated at 0x8bb9af60
My first objective is to make a table that counts the instances of a particular number of byte allocations. In other words, my desired output for the above input would be something like:
48 bytes -> 3 times
96 bytes -> 1 times
24 bytes -> 2 times
(for now, I'm not concerned about the memory addresses)
Since I'm using Python, I thought doing this using a dictionary would be the right way to go (based on about 3 hours' worth of reading Python tutorials). Is that a good idea?
In trying to do this using a dictionary, I decided to make the number of bytes the 'key', and a counter as the 'value'. My plan was to increment the counter on every occurrence of the key. As of now, my code snippet is as follows:
# Create an empty dictionary
allocationList = {}
# Open file for reading
with open("allocFile.txt") as fp:
    for line in fp:
        # Split the line into a list (using space as delimiter)
        lineList = line.split(" ")
        # Extract the number of bytes
        numBytes = lineList[0];
        # Store in a dictionary
        if allocationList.has_key('numBytes')
            currentCount = allocationList['numBytes']
            currentCount += 1
            allocationList['numBytes'] = currentCount
        else
            allocationList['numBytes'] = 1
for bytes, count in allocationList.iteritems()
    print bytes, "bytes -> ", count, " times"
With this, I get a syntax error in the 'has_key' call, which leads me to question whether it is even possible to use variables as dictionary keys. All examples I have seen so far assume that keys are available upfront. In my case, I can get my keys only when I'm parsing the input file.
(Note that my input file can run into thousands of lines, with hundreds of different keys)
Thank you for any help you can provide.
Learning a language is as much about the syntax and basic types as it is about the standard library. Python already has a class that makes your task very easy: collections.Counter.
from collections import Counter
with open("allocFile.txt") as fp:
    counter = Counter(line.split()[0] for line in fp)
for bytes, count in counter.most_common():
    print bytes, "bytes -> ", count, " times"
You get a syntax error because you are missing the colon at the end of this line:
if allocationList.has_key('numBytes')
                                     ^
Your approach is fine, but it might be easier to use dict.get() with a default value:
allocationList[numBytes] = allocationList.get(numBytes, 0) + 1
Since your allocationList is a dictionary and not a list, you might want to choose a different name for the variable.
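A minimal sketch of the dict.get() approach, run on the sample lines from the question rather than on a file. The names sample_lines and allocation_counts are made up for illustration, and the print call is written so that it works on both Python 2 and Python 3:

```python
# Sample input copied from the question; a real run would iterate over the file.
sample_lines = [
    "48 bytes allocated at 0x8bb970a0",
    "24 bytes allocated at 0x8bb950c0",
    "48 bytes allocated at 0x958bd0e0",
    "48 bytes allocated at 0x8bb9b060",
    "96 bytes allocated at 0x8bb9afe0",
    "24 bytes allocated at 0x8bb9af60",
]

allocation_counts = {}  # hypothetical name: maps byte size (as a string) to a count
for line in sample_lines:
    num_bytes = line.split()[0]  # first whitespace-separated field
    # get() returns 0 when the key has not been seen yet
    allocation_counts[num_bytes] = allocation_counts.get(num_bytes, 0) + 1

for num_bytes, count in allocation_counts.items():
    print("%s bytes -> %d times" % (num_bytes, count))
```

The keys stay strings here; wrap num_bytes in int() if numeric sorting is needed later.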
The dict.has_key() method was removed in Python 3. To replace it, use the in keyword:
if numBytes in allocationList:  # use the variable directly, not the string 'numBytes'
    # do the stuff
But in your case, you can also replace all of
if allocationList.has_key('numBytes')
    currentCount = allocationList['numBytes']
    currentCount += 1
    allocationList['numBytes'] = currentCount
else
    allocationList['numBytes'] = 1
with a single line using get:
allocationList[numBytes] = allocationList.get(numBytes, 0) + 1
You most definitely can use variables as dict keys. However, you have a variable called numBytes, but are using a string containing the text "numBytes" - you're using a string constant, not the variable. That won't cause the error, but is a problem. Instead, try:
if numBytes in allocationList:
    # do stuff
Additionally, consider a Counter. This is a convenient class for handling the case you're looking at.
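To make the difference between the variable and the string constant concrete, here is a small sketch. The dictionary name counts and the value "48" are made up for illustration:

```python
counts = {}
numBytes = "48"

counts['numBytes'] = 1  # the key is the literal text "numBytes"
counts[numBytes] = 1    # the key is the variable's current value, "48"

print(sorted(counts))   # ['48', 'numBytes']
```

With the quoted form, every line of the input would update the same single entry, no matter what was parsed from the file.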
Related
I'm creating a program that saves its data in encoded form, using a method that I created.
Basically, when I start the program it creates two variables:
one is the alphabet (uppercase, lowercase, digits, ...)
the other is exactly the same, except that I use random.shuffle() to randomize it.
That's the "key". I give each key a random number using random.randint(1000000000,9999999999), and I call this number the name of the "key".
All the keys and their names are stored in a file.
In the program you can enter something like a name. That name is encrypted using the key generated when the program started, then stored in a file along with the name of the key.
(Note that the keys always have different names. The same key may have been used to encrypt several pieces of data, which are stored in another file.)
I read from the key file first
{NOTE: the keys are stored using a \n pattern, example
file.write(f'{key}\n{key_name}\n') }
With my method the number of lines is always divisible by 2, so I use a variable initialized before the for loop and increase it inside the loop. I use that same variable to index into the list (the result of reading the key file) and pair each key with its name, for example:
{4819572: 'varoabIfokdbKjs3918', 40101846: 'opqihduqv', 8291073: 'hqowirhwip', ...}
My keys are 354 characters long, so these are just short examples.
Here is the code described above:
sep = os.sep
Ot = platform.system()
file_name = os.path.basename(__file__)
path_to_file = __file__.replace(file_name, '')
with open(f'{path_to_file}database{sep}history.dll', 'r', encoding='utf-8') as file:
    keys = file.readlines()
num = 0
archive = {}
for _ in range(int(len(keys)/2)):
    key_name = str(keys[num+1]).replace('\n','')
    key = str(keys[num]).replace('\n','')
    archive[int(key_name)] = key
    num =+ 2
num1 = 0
num2 = 0
After this I use the key_name to get the key which is used in a decrypt function along with the encrypted text.
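The problem is that even though there should be 16 keys in the dictionary, there are only 2. I really don't know how to resolve this or why it is happening.
You have written "num =+ 2"; I think you meant to write "num += 2". As written, the statement assigns +2 to num on every iteration instead of adding 2 to it.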
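A short sketch of why the two spellings behave differently, followed by one possible way to pair the lines without a manual index. The sample lines list is a made-up stand-in for the result of file.readlines():

```python
# "=+" is not an operator: "num =+ 2" is parsed as "num = (+2)".
wrong = 0
for _ in range(3):
    wrong =+ 2   # reassigns 2 on every pass
right = 0
for _ in range(3):
    right += 2   # adds 2 on every pass
print(wrong, right)  # 2 6

# Hypothetical stand-in for file.readlines(): a key line, then its name line.
lines = ["abcXYZ\n", "4819572\n", "qrsTUV\n", "8291073\n"]

# Pair even-indexed lines (keys) with odd-indexed lines (names) via slicing.
archive = {
    int(name.strip()): key.strip()
    for key, name in zip(lines[0::2], lines[1::2])
}
print(archive)  # {4819572: 'abcXYZ', 8291073: 'qrsTUV'}
```

Because the index gets stuck at 2, the original loop keeps reading the same pair of lines, which would explain why the dictionary ends up with only 2 entries.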
tools = {"Wooden_Sword1" : 10, "Bronze_Helmet1" : 20}
I have code written to add items. I'm adding an item like so:
tools[key_to_find] = int(b)
key_to_find is the tool and b is the durability. I need to find a way so that if I'm adding an item and Wooden_Sword1 already exists, it adds Wooden_Sword2 instead. This has to work with other items as well.
As user3483203 and ShadowRanger commented, it's probably a bad idea to use numbers in your key string as part of the data. Manipulating those numbers will be awkward, and there are better alternatives. For instance, rather than storing a single value for each numbered key, use simple keys and store a list. The index into the list will take the place of the number in the key.
Here's how you could implement it:
tools = {"Wooden_Sword" : [10], "Bronze_Helmet" : [20]}
Add a new wooden sword with durability 10:
tools.setdefault("Wooden_Sword", []).append(10)
Find how many bronze helmets we have:
helmets = tools.get("Bronze_Helmet", [])
print("we have {} helmets".format(len(helmets)))
Find the first bronze helmet with a non-zero durability, and reduce it by 1:
helmets = tools.get("Bronze_Helmet", [])
for i, durability in enumerate(helmets):  # enumerate yields (index, value) pairs
    if durability > 0:
        helmets[i] -= 1
        break
else: # this runs if the break statement was never reached and the loop ran to completion
    take_extra_damage() # or whatever
You could simplify some of this code by using a collections.defaultdict instead of a regular dictionary, but if you learn how to use get and setdefault it's not too hard to get by with the regular dict.
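As a rough sketch of the defaultdict variant, assuming the same item names and durabilities as above:

```python
from collections import defaultdict

# Missing keys are created on first access with an empty list as the value.
tools = defaultdict(list)

tools["Wooden_Sword"].append(10)   # no setdefault needed
tools["Bronze_Helmet"].append(20)
tools["Wooden_Sword"].append(10)   # a second wooden sword

print(len(tools["Wooden_Sword"]))  # 2
```

One thing to keep in mind: merely reading tools[name] for an unknown name inserts an empty list, so use `name in tools` or tools.get(name) when you only want to look.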
To ensure a key name is not taken yet, and add a number if it is, create the new name and test. Then increment the number if it is already in your list. Just repeat until none is found.
In code:
def next_name(basename, lookup):
    if basename not in lookup:
        return basename
    number = 1
    while basename+str(number) in lookup:
        number += 1
    return basename+str(number)
While this code does what you ask, you may want to look at other methods. A possible drawback is that there is no association between, say, WoodenShoe1 and WoodenShoe55 – if 'all wooden shoes' need their price increased, you'd have to iterate over all possible names between 1 and 55, just in case these existed at some time.
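A possible usage sketch for next_name, with the function repeated so the example runs on its own. The inventory contents are made up:

```python
def next_name(basename, lookup):
    """Return basename if it is free, otherwise basename plus the first unused number."""
    if basename not in lookup:
        return basename
    number = 1
    while basename + str(number) in lookup:
        number += 1
    return basename + str(number)

tools = {"Wooden_Sword": 10, "Bronze_Helmet": 20}  # made-up inventory

first = next_name("Wooden_Sword", tools)   # "Wooden_Sword" is taken
tools[first] = 8
second = next_name("Wooden_Sword", tools)  # "Wooden_Sword1" is now taken too
tools[second] = 5
print(first, second)  # Wooden_Sword1 Wooden_Sword2
```

Note that this version hands out the bare name first and only starts numbering from the second item onward.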
From what I understand of the question, your keys have 2 parts: "Name" and "ID". The ID is just an integer that starts at 1, so you can initialize a counter for every name:
numOfWoodenSwords = 0
And to add to the dictionary:
numOfWoodenSwords += 1
tools["Wooden_Sword" + str(numOfWoodenSwords)] = int(b)
If you need to have an unknown amount of tools, I recommend looking at the re module: https://docs.python.org/3/library/re.html.
Or you could iterate over tools.keys to see if the entry exists.
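One way the re suggestion could look in practice, as a sketch only. The helper name next_numbered_key is hypothetical, and it assumes every key is a base name followed directly by digits:

```python
import re

def next_numbered_key(base, existing):
    """Return base + N, where N is one more than the highest number already used."""
    # Match keys that are exactly the base name followed by digits.
    pattern = re.compile(re.escape(base) + r"(\d+)")
    used = [int(m.group(1)) for m in map(pattern.fullmatch, existing) if m]
    return base + str(max(used, default=0) + 1)

tools = {"Wooden_Sword1": 10, "Wooden_Sword2": 5, "Bronze_Helmet1": 20}
print(next_numbered_key("Wooden_Sword", tools))  # Wooden_Sword3
print(next_numbered_key("Iron_Axe", tools))      # Iron_Axe1
```

This scans all keys on every call, so it avoids keeping a separate counter per item type at the cost of a linear pass over the dictionary.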
You could write a function that determines if a character is a letter:
def is_letter(char):
    return 65 <= ord(char) <= 90 or 97 <= ord(char) <= 122
Then when you are looking at a key in your dictionary, simply:
if is_letter(key[-1]):
    ...
I’m having a problem with a dictionary. I’m using Python 3. I’m sure there’s something easy that I’m just not seeing.
I’m reading lines from a file to create a dictionary. The first 3 characters of each line are used as keys (they are unique). From there, I create a list from the information in the rest of the line. Each 4 characters make up a member of the list. Once I’ve created the list, I write to the dictionary, with the list being the value and the first three characters of the line being the key.
The problem is, each time I add a new key:value pair to the dictionary, it seems to overlay (or update) the values in the previously written dictionary entries. The keys are fine, just the values are changed. So, in the end, all of the keys have a value equivalent to the list made from the last line in the file.
I hope this is clear. Any thoughts would be greatly appreciated.
A snippet of the code is below
formatDict = dict()
sectionList = list()
for usableLine in formatFileHandle:
    lineLen = len(usableLine)
    section = usableLine[:3]
    x = 3
    sectionList.clear()
    while x < lineLen:
        sectionList.append(usableLine[x:x+4])
        x += 4
    formatDict[section] = sectionList
for k, v in formatDict.items():
    print ("for key= ", k, "value =", v)
formatFileHandle.close()
You always clear, then append and then insert the same sectionList, that's why it always overwrites the entries - because you told the program it should.
Always remember: In Python assignment never makes a copy!
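A small sketch of that aliasing effect, using made-up data in place of the file:

```python
rows = [("aaa", [1, 2]), ("bbb", [3])]  # made-up (key, values) pairs

shared = []
aliased = {}
for key, values in rows:
    shared.clear()
    shared.extend(values)
    aliased[key] = shared  # stores a reference to the one shared list

print(aliased)                           # {'aaa': [3], 'bbb': [3]}
print(aliased["aaa"] is aliased["bbb"])  # True: both keys point at the same object

independent = {}
for key, values in rows:
    independent[key] = list(values)      # a new list per key

print(independent)                       # {'aaa': [1, 2], 'bbb': [3]}
```

Every dictionary value in the first loop is the same list object, so clearing it for the next line also empties what the earlier keys appear to hold.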
Simple fix
Just insert a copy:
formatDict[section] = sectionList.copy() # changed here
Instead of inserting a reference:
formatDict[section] = sectionList
Complicated fix
There are several things you could improve. You could use functions for subtasks such as the grouping. Files should be opened with a with statement so that they are closed automatically even if an exception occurs. And while loops should be avoided when the number of iterations is known in advance.
Personally I would use code like this:
def groups(seq, width):
    """Group a sequence (seq) into width-sized blocks. The last block may be shorter."""
    length = len(seq)
    for i in range(0, length, width): # range supports a step argument!
        yield seq[i:i+width]

# Printing the dictionary could be useful in other places as well -> so
# I also created a function for this.
def print_dict_line_by_line(dct):
    """Print dictionary where each key-value pair is on one line."""
    for key, value in dct.items():
        print("for key =", key, "value =", value)

def mytask(filename):
    formatDict = {}
    with open(filename) as formatFileHandle:
        # I don't "strip" each line (remove leading and trailing whitespaces/newlines)
        # but if you need that you could also use:
        # for usableLine in (line.strip() for line in formatFileHandle):
        # instead.
        for usableLine in formatFileHandle:
            section = usableLine[:3]
            # each member of the list is 4 characters wide
            sectionList = list(groups(usableLine[3:], 4))
            formatDict[section] = sectionList
    # upon exiting the "with" scope the file is closed automatically!
    print_dict_line_by_line(formatDict)

if __name__ == '__main__':
    mytask('insert your filename here')
You could simplify your code here by using a with statement to auto close the file and chunk the remainder of the line into groups of four, avoiding the re-use of a single list.
from itertools import islice
with open('somefile') as fin:
    stripped = (line.strip() for line in fin)
    format_dict = {
        line[:3]: list(iter(lambda it=iter(line[3:]): ''.join(islice(it, 4)), ''))
        for line in stripped
    }
for key, value in format_dict.items():
    print('key=', key, 'value=', value)
I'm new to these boards and understand there is a protocol; any critique is appreciated. I began programming in Python a few days ago and am trying to play catch-up. The program is meant to read a file and convert each occurrence of a specific string into a dictionary of positions within the document. Issues abound; I'll take all responses.
Here is my code:
f = open('C:\CodeDoc\Mm9\sampleCpG.txt', 'r')
cpglist = f.read()
def buildcpg(cpg):
    return "\t".join(["%d" % (k) for k in cpg.items()])
lookingFor = 'CG'
i = 0
index = 0
cpgdic = {}
try:
    while i < len(cpglist):
        index = cpglist.index(lookingFor, i)
        i = index + 1
        for index in range(len(cpglist)):
            if index not in cpgdic:
                cpgdic[index] = index
        print (buildcpg(cpgdic))
except ValueError:
    pass
f.close()
The cpgdic is supposed to act as a dictionary of the position reference obtained in the index. Each read of index should be entering cpgdic as a new value, and the print (buildcpg(cpgdic)) is my hunch of where the logic fails. I believe(??) it is passing cpgdic into the buildcpg function, where it should be returned as an output of all the positions of 'CG', however the error "TypeError:not all arguments converted during string formatting" shows up. Your turn!
ps. this destroys my 2GB memory; I need to improve with much more reading
cpg.items is yielding tuples. As such, k is a tuple (length 2) and then you're trying to format that as a single integer.
As a side note, you'll probably be a bit more memory efficient if you leave off the [ and ] in the join line. This turns your list comprehension into a generator expression, which is a bit nicer. If you're on Python 2.x, you could also use cpg.iteritems() instead of cpg.items() to save a little memory.
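A sketch of one way buildcpg could be repaired, assuming only the positions (the keys) need to be printed. The sample dictionary is made up:

```python
def buildcpg(cpg):
    # Iterating over a dict yields its keys, so each k is a single integer.
    # The generator expression avoids building a temporary list.
    return "\t".join("%d" % k for k in cpg)

cpgdic = {0: 0, 5: 5, 10: 10}  # made-up positions
print(buildcpg(cpgdic))

# The original failure, reproduced on a single (key, value) tuple:
try:
    "%d" % (0, 0)
except TypeError as exc:
    message = str(exc)
    print(message)  # not all arguments converted during string formatting
```

The % operator treats a tuple on its right side as the full argument list, which is why a two-element tuple cannot fill a single %d.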
It also makes little sense to store a dictionary where the keys and the values are the same. In this case, a simple list is probably more elegant. I would probably write the code this way:
with open(r'C:\CodeDoc\Mm9\sampleCpG.txt') as fin:  # raw string keeps the backslashes literal
    cpgtxt = fin.read()
indices = [i for i,_ in enumerate(cpgtxt) if cpgtxt[i:i+2] == 'CG']
print '\t'.join(map(str, indices))  # join needs strings, not ints
Here it is in action:
>>> s = "CGFOOCGBARCGBAZ"
>>> indices = [i for i,_ in enumerate(s) if s[i:i+2] == 'CG']
>>> print indices
[0, 5, 10]
Note that
i for i,_ in enumerate(s)
is roughly the same thing as
i for i in range(len(s))
except that I don't like range(len(s)) and the former version will work with any iterable -- Not just sequences.
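As an alternative sketch (not the approach used above), repeated str.find calls jump straight from one match to the next instead of slicing at every position. The helper name find_all is hypothetical:

```python
def find_all(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    positions = []
    start = text.find(pattern)       # -1 when there is no match
    while start != -1:
        positions.append(start)
        # Resume one character later so overlapping matches are also found.
        start = text.find(pattern, start + 1)
    return positions

s = "CGFOOCGBARCGBAZ"
print(find_all(s, "CG"))  # [0, 5, 10]
```

Unlike str.index, str.find signals "no more matches" with -1 rather than raising ValueError, so no try/except is needed.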
This is kind of convoluted, so if I'm missing out on an easy construct for this, please let me know :)
I'm analysing the results of some matching experiments. At the end game, I want to be able to query things such as experiments[0]["cat"]["cat"], which yields the number of times "cat" was matched against "cat". Conversely, I could do experiments[0]["cat"]["dog"], when the first query was a cat and the match attempt was a dog.
The following is my code to populate this structure:
# initializing the first layer, a list of dictionaries.
experiments = []
for assignment in assignments:
    match_sums = {}
    experiments.append(match_sums)
for i in xrange(len(classes)):
    for experiment in xrange(len(experiments)):
        # experiments[experiment][classes[i]] should hold a dictionary,
        # where the keys are the things that were matched against classes[i],
        # and the value is the number of times this occurred.
        experiments[experiment][classes[i]] = collections.defaultdict(dict)
        # matches[experiment][i] is an integer for what the i'th match was in an experiment.
        # classes[j] for some integer j is the string name of the i'th match. could be "dog" or "cat".
        experiments[experiment][classes[i]][classes[matches[experiment][i]]] += 1
        total_class_sums[classes[i]] = total_class_sums.get(classes[i], 0) + 1
print experiments[0]["cat"]["cat"]
exit()
So clearly this is a bit convoluted. And I'm getting a value of "1" for the last match, rather than a full dictionary at experiments[0]["cat"]. Have I approached this wrong? What could the bug here be? Sorry for the craziness and thanks for any possible help!
Two points:
Dictionary keys can be tuples; and
If you're counting things, use collections.Counter. (You can use defaultdict(int), but Counter is more useful.)
So, instead of
experiments[experiment][classes[i]][classes[matches[experiment][i]]] += 1
write
experiments = Counter()
...
experiments[experiment, classes[i], classes[matches[experiment][i]]] += 1
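A self-contained sketch of the tuple-key idea. The observations list is made-up data standing in for the (experiment, query, match) triples derived from classes and matches:

```python
from collections import Counter

# Hypothetical (experiment index, query class, matched class) observations.
observations = [
    (0, "cat", "cat"),
    (0, "cat", "dog"),
    (0, "cat", "cat"),
    (1, "dog", "dog"),
]

experiments = Counter()
for experiment, query, match in observations:
    experiments[experiment, query, match] += 1  # the key is a 3-tuple

print(experiments[0, "cat", "cat"])  # 2
print(experiments[0, "dog", "cat"])  # 0, missing keys count as zero
```

The lookup syntax changes from experiments[0]["cat"]["cat"] to experiments[0, "cat", "cat"], but no nested dictionaries have to be initialized up front.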
I'm just trying to guess your needs, so I changed the order of your dimensions.
experiments = {}
for classIdx, className in enumerate(classes):  # enumerate yields (index, value)
    experiment = collections.defaultdict(list)
    experiments[className] = experiment
    for assignmentIdx, assignment in enumerate(assignments):
        counterpart = classes[matches[assignmentIdx][classIdx]]
        experiment[counterpart].append((assignment, assignmentIdx))
print(len(experiments["cat"]["cat"]), len(experiments["cat"]))