Python- parse .txt files with multiple dictionaries - python

I have the following as a .txt file
{"a": 1, "b": 2, "c": 3}
{"d": 4, "e": 5, "f": 6}
{"g": 7, "h": 8, "i": 9}
How can I use python to open the file, and write a comma to separate each dictionary?
I.e. what regular expression can find every instance of "} {" and put a comma there?
(the real file is much larger (~10GB), and this issue prevents the file from being a syntactically correct JSON object that I can parse with json.loads())

You can use str.join with ',' as the delimiter, reading the file line by line in a generator expression. Then put [] around the contents to make it valid JSON.
import json
with open(filename, 'r') as f:
    contents = '[' + ','.join(line for line in f) + ']'
data = json.loads(contents)
This results in data
[
{'a': 1, 'b': 2, 'c': 3},
{'d': 4, 'e': 5, 'f': 6},
{'g': 7, 'h': 8, 'i': 9}
]
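For a file as large as the ~10GB one mentioned in the question, joining everything into one string may not fit in memory. Since each line is already a complete JSON object, one possible alternative is to parse the file one line at a time. This is a minimal sketch under that assumption; the helper name iter_records and the small sample file are hypothetical and stand in for the real data.

```python
import json
import os
import tempfile


def iter_records(path):
    """Yield one parsed dictionary per non-empty line of the file at `path`."""
    with open(path, "r") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


# Small stand-in for the real file (hypothetical sample data).
sample = '{"a": 1, "b": 2, "c": 3}\n{"d": 4, "e": 5, "f": 6}\n{"g": 7, "h": 8, "i": 9}\n'
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)

# list() is used only for the demo; on a huge file, loop over the generator instead.
records = list(iter_records(tmp.name))
os.remove(tmp.name)
print(records)
```

Because iter_records is a generator, only one line needs to be held in memory at a time when it is consumed in a for loop.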

Related

Is it possible to get the frequency of characters in more than 1 string in Python?

I'm aware that we are able to get the frequency of characters in a string by doing this:
def freq_of_chars(string):
    all_freq = {}
    for i in string:
        if i in all_freq:
            all_freq[i] += 1
        else:
            all_freq[i] = 1
    print("Count of all characters is: " + str(all_freq))
freq_of_chars('Test')
However, if I want to get the frequency of characters in more than 1 string, how do I do it? I tried this, but the output shows the frequency count of each word instead of each character.
def freq_of_chars(string1,string2):
    freq = {}
    for i in string1,string2:
        if i in freq:
            freq[i] += 1
        else:
            freq[i] = 1
    print("Count of all characters is: " + str(freq))
freq_of_chars('First','Second')
This seems like a good opportunity to learn about *args. I'll modify your function:
def freq_of_chars(*strings):
    all_strings = "".join(strings)
    all_freq = {}
    for i in all_strings:
        if i in all_freq:
            all_freq[i] += 1
        else:
            all_freq[i] = 1
    print("Count of all characters is: " + str(all_freq))
Here, I've turned your parameter string into *strings, and then added a line which joins all the strings. What *strings does as a parameter is allow you to pass any number of strings to your function. The "".join() simply joins all the strings so that you can iterate over one big string.
Here's an example of how to use the function:
>>> freq_of_chars("hello", "this", "is", "a", "test", "function", "call", "with", "an", "arbitrary", "number", "of", "strings")
Count of all characters is: {'h': 3, 'e': 3, 'l': 4, 'o': 3, 't': 7, 'i': 6, 's': 5, 'a': 5, 'f': 2, 'u': 2, 'n': 5, 'c': 2, 'w': 1, 'r': 5, 'b': 2, 'y': 1, 'm': 1, 'g': 1}
Now, it's a bit tedious to have to write all those strings out, so if you happen to already have your strings all together in a collection, you can simply unpack that collection in the function call:
>>> strings = ["hello", "this", "is", "a", "test", "function", "call", "with", "an", "arbitrary", "number", "of", "strings"]
>>> freq_of_chars(*strings)
Count of all characters is: {'h': 3, 'e': 3, 'l': 4, 'o': 3, 't': 7, 'i': 6, 's': 5, 'a': 5, 'f': 2, 'u': 2, 'n': 5, 'c': 2, 'w': 1, 'r': 5, 'b': 2, 'y': 1, 'm': 1, 'g': 1}
The simplest way would be to use freq_of_chars(string1+string2)
Add them together:
def freq_of_chars(string1, string2):
    freq = {}
    for i in string1 + string2:
        if i in freq:
            freq[i] += 1
        else:
            freq[i] = 1
    print("Count of all characters is: " + str(freq))
But also you could use:
def freq_of_chars(*strings):
    freq = {}
    for i in ''.join(strings):
        if i in freq:
            freq[i] += 1
        else:
            freq[i] = 1
    print("Count of all characters is: " + str(freq))
This way you could enter unlimited strings!
Like:
freq_of_chars('First', 'Second', 'Third', 'Fourth', 'this', 'is', 'a', 'string', 'infinity!')
But I suggest Counter from collections:
from collections import Counter
def freq_of_chars(*strings):
    return Counter(''.join(strings))
But why not just:
freq_of_chars = lambda *strings: Counter(''.join(strings))
Or just no function:
freq_of_chars = Counter(''.join(strings))
Rather than implementing the logic for counting the chars yourself, you could use collections.Counter instead:
from collections import Counter
def freq_of_chars(*strings):
    freq = Counter()
    for string in strings:
        freq.update(string)
    return freq
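As a short usage sketch of the Counter-based version above (the input strings are just sample values): the returned Counter behaves like a dictionary, reports 0 for characters it never saw, and treats upper and lower case as different characters.

```python
from collections import Counter


def freq_of_chars(*strings):
    freq = Counter()
    for string in strings:
        freq.update(string)  # adds the count of every character in `string`
    return freq


freq = freq_of_chars("First", "Second")
print(freq["s"])  # lowercase 's' appears only in "First"
print(freq["S"])  # uppercase 'S' appears only in "Second"
print(freq["z"])  # a missing key counts as 0 instead of raising KeyError
```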

How does this for loop work on the following string?

I have been working through Automate the Boring Stuff by Al Sweigart. I'm struggling to understand the code below:
INPUT
message = 'It was a bright cold day in April, and the clocks were striking thirteen.'
count = {}
for character in message:
    count.setdefault(character, 0)
    count[character] = count[character] + 1
print(count)
OUTPUT
{'I': 1, 't': 6, ' ': 13, 'w': 2, 'a': 4, 's': 3, 'b': 1, 'r': 5, 'i': 6, 'g': 2, 'h': 3, 'c': 3, 'o': 2, 'l': 3, 'd': 3, 'y': 1, 'n': 4, 'A': 1, 'p': 1, ',': 1, 'e': 5, 'k': 2, '.': 1}
QUESTION
Since it does not matter what the variable in a for loop is called (i.e. character can be changed to x, pie, etc.), how does the code know to run the loop through each character in the string?
It's not about the variable's name, it's about the object this variable points to. The implementation of the loop in the Python virtual machine knows how to iterate over objects based on their types.
Iterating over something is implemented as iterating over iter(something), which in turn is the same as iterating over something.__iter__(). Different classes implement their own versions of __iter__, so that loops work correctly.
str.__iter__ iterates over the individual characters of a string, list.__iter__ - over the list's elements and so on.
You could create your own object and iterate over it:
class MyClass:
    def __iter__(self):
        return iter([1,2,3,4])
my_object = MyClass()
for x in my_object:
    print(x)
This will print the numbers from 1 to 4.
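To make the iter() mechanism described above concrete, here is a rough sketch of what a for loop over a string does step by step, written out with iter() and next(). The variable names are only illustrative, and this is an approximation of the loop's behaviour, not the interpreter's actual implementation.

```python
message = "abc"

iterator = iter(message)  # the for loop first asks the object for an iterator
collected = []
while True:
    try:
        character = next(iterator)  # fetch the next item, here one character
    except StopIteration:  # raised once the iterator is exhausted
        break
    collected.append(character)

print(collected)
```

The name character plays no special role here: it is simply the variable that receives whatever next() returns.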
A string is a sequence in Python. So when you loop over a string, you loop over each of its characters; in your case, each character read is assigned to the variable character.
Then, setdefault maps character to 0 if character is not yet in the dict. The rest looks quite straightforward.
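A small sketch of how setdefault behaves on the first and on later occurrences of the same key (the key "t" is just a sample value):

```python
count = {}

count.setdefault("t", 0)  # key is missing: stores 0 under "t"
count["t"] = count["t"] + 1

count.setdefault("t", 0)  # key already exists: the stored 1 is left untouched
count["t"] = count["t"] + 1

print(count)
```

Without the setdefault call, the first count["t"] + 1 would raise a KeyError, because the key would not exist yet.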
Strings in Python are sequences of characters: https://docs.python.org/3/library/stdtypes.html#textseq. Therefore, the for c in m: line iterates over every element of the sequence m, i.e. over every character of the string.

Python Create Count of Letters from Text File [duplicate]

I'm trying to do a count of the letters only from a text file. I want to exclude any punctuation and spaces. Here's what I have so far. I've been searching for ways to do this, but I keep getting errors whenever I try to exclude certain characters. Any help is greatly appreciated.
#Read Text Files To Memory
with open("encryptedA.txt") as A:
    AText = A.read()
with open("encryptedB.txt") as B:
    BText = B.read()
#Create Dictionary Object
from collections import Counter
CountA = Counter(AText)
print(CountA)
CountB = Counter(BText)
print(CountB)
You are on the right track, but you want to filter the text-file based on characters that are alphabetic (using isalpha()) or alphanumeric (using isalnum()):
from collections import Counter
with open('data.txt') as f:
    print(Counter(c for c in f.read() if c.isalpha()))  # or c.isalnum()
For my sample file, this prints:
Counter({'t': 3, 'o': 3, 'r': 3, 'y': 2, 's': 2, 'h': 2, 'p': 2, 'a': 2, 'g': 1, 'G': 1, 'H': 1, 'i': 1, 'e': 1, 'M': 1, 'S': 1})
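If upper and lower case should be counted together, one possible variation is to lowercase each letter before counting. This is a sketch on a hypothetical sample string rather than on the files from the question. Note that isalpha() is also true for non-ASCII letters.

```python
from collections import Counter

text = "Hello, World! 123"  # hypothetical sample text

# Keep alphabetic characters only and fold them to lower case before counting.
letter_counts = Counter(c.lower() for c in text if c.isalpha())
print(letter_counts)
```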

Python - Read text file and report count of individual characters

I have a Python script that prints out a randomly generated password, then writes it to a text file, adding a new row every time it is called. For example:
PSWD = (str(pswd01)) + (str(pswd02)) + (str(pswd03)) + (str(pswd04))
# Note, variables pswd01, pswd02 etc are randomly created earlier in the script.
print(PSWD)
with open('PSWD_output.txt','a') as f:
    f.write(PSWD + '\n')
f.close()  # redundant: the with block has already closed the file
Note, the variable PSWD contains lower case, upper case, and numbers.
I then want to read the file and count the number of individual characters, and print a report to a different text file, but not sure how I can do this. I have asked a similar question here, which answers how to print a character report to the console.
Any idea how I can read PSWD_output.txt, and count each different character, writing the result to a separate text file?
Any help appreciated!
Use a dictionary to count how often each character occurs, as below:
my_dictionary = {}
with open('PSWD_output.txt', 'r') as f:
    for line in f:
        for letter in line:
            if letter in my_dictionary:
                my_dictionary[letter] += 1
            else:
                my_dictionary[letter] = 1
print(my_dictionary)
For a text file containing:
salam
asjf
asjg;
asdkj
14kj'asdf
5
returns:
{'a': 6, 'j': 4, 'd': 2, 'g': 1, 'f': 2, 'k': 2, '\n': 6, 'm': 1, 'l': 1, "'": 1, '1': 1, 's': 5, '5': 1, '4': 1, ';': 1}
You can use a Counter. It takes an iterable (and strings are iterable: iterate over the characters) and stores the result as a dictionary
>>> from collections import Counter
>>> c = Counter("foobarbaz")
>>> c
Counter({'o': 2, 'a': 2, 'b': 2, 'z': 1, 'f': 1, 'r': 1})
>>> c.most_common(2)
[('o', 2), ('a', 2)]
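The question also asks for the result to be written to a separate text file. One way this could look, combining Counter with a second open() call, is sketched below. The function name write_char_report, the report file name, and the "char: count" line format are assumptions made for illustration, and the demo runs on throwaway files instead of the real PSWD_output.txt.

```python
import os
import tempfile
from collections import Counter


def write_char_report(in_path, out_path):
    """Count the characters in in_path (newlines excluded) and write one
    'char: count' line per character to out_path."""
    counts = Counter()
    with open(in_path, "r") as src:
        for line in src:
            counts.update(line.rstrip("\n"))  # drop the line break before counting
    with open(out_path, "w") as dst:
        for char, n in sorted(counts.items()):  # sorted by character
            dst.write(f"{char}: {n}\n")
    return counts


# Demo on temporary files (hypothetical stand-ins for the real input and report).
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "PSWD_output.txt")
out_path = os.path.join(workdir, "PSWD_report.txt")
with open(in_path, "w") as f:
    f.write("aB3\nab1\n")

counts = write_char_report(in_path, out_path)
with open(out_path) as f:
    report = f.read()
print(report)
```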

Reading in a pprinted file in Python

I have a long-running script that collates a bunch of data for me. Without thinking too much about it, I set it up to periodically serialize all this data collected out to a file using something like this:
pprint.pprint(data, open('my_log_file.txt', 'w'))
The output of pprint is perfectly valid Python code. Is there an easy way to read in the file into memory so that if I kill the script I can start where I left off? Basically, is there a function which parses a text file as if it were a Python value and returns the result?
If I understand the problem correctly, you are writing one object to a log file? In that case you can simply use eval to turn it back into a Python object. Only do this with files you wrote yourself, because eval executes whatever code the file contains.
from pprint import pprint
# make a simple data structure
dct = {k: v for k, v in zip('abcdefghijklmnopqrstuvwxyz', range(26))}
# define a filename
filename = '/tmp/foo.txt'
# write it to the log file
with open(filename, 'w') as f:
    pprint(dct, f)
# read the whole file back and evaluate its contents as a Python expression
with open(filename, 'r') as f:
    obj = eval(f.read())
print(type(obj))
print(obj)
It gets a little trickier if you are trying to write multiple objects to this file, but that is still doable.
The output of the above script is
<type 'dict'>
{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3, 'g': 6, 'f': 5, 'i': 8, 'h': 7, 'k': 10, 'j': 9, 'm': 12, 'l': 11, 'o': 14, 'n': 13, 'q': 16, 'p': 15, 's': 18, 'r': 17, 'u': 20, 't': 19, 'w': 22, 'v': 21, 'y': 24, 'x': 23, 'z': 25}
Does this solve your problem?
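As a variation on the eval approach above, ast.literal_eval from the standard library only accepts Python literals (strings, numbers, tuples, lists, dicts, sets, booleans, None) and refuses arbitrary code, which makes it a more cautious choice when the data consists of such plain values. This is a minimal sketch; the data dictionary and the temporary file are hypothetical stand-ins for the real log.

```python
import ast
import os
import tempfile
from pprint import pprint

# Hypothetical data made only of literals that pprint can write out as valid Python.
data = {"name": "run-1", "values": [1, 2, 3], "done": False}

# Write the pretty-printed data to a temporary file.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    pprint(data, stream=f)

# Read it back; literal_eval parses the literal without executing any code.
with open(path) as f:
    restored = ast.literal_eval(f.read())
os.remove(path)

print(restored == data)
```

This only works if everything in the data is a plain literal; custom class instances printed by pprint would not round-trip this way.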
