dict = {'A': 71.07884,
        'B': 110,
        'C': 103.14484,
        'D': 115.08864,
        'E': 129.11552,
        'F': 147.1766,
        'G': 57.05196,
        'H': 137.1412
        }

def search_replace(search, replacement, searchstring):
    p = re.compile(search)
    searchstring = p.sub(replacement, searchstring)
    return searchstring

def main():
    with open(sys.argv[1]) as filetoread:
        lines = filetoread.readlines()
    file = ""
    for i in range(len(lines)):
        file += lines[i]
    file = search_replace('(?<=[BC])', ' ', file)
    letterlist = re.split(r'\s+', file)
    for j in range(len(letterlist)):
        print(letterlist[j])

if __name__ == '__main__':
    import sys
    import re
    main()
My program opens a file and splits the text into groups of letters after each B or C.
The file looks like:
ABHHFBFEACEGDGDACBGHFEDDCAFEBHGFEBCFHHHGBAHGBCAFEEAABCHHGFEEEAEAGHHCF
Now I want to sum each line using the values from the dict.
For example:
AB = 181.07884
HHFB = 531.4590000000001
And so on.
I don't know how to start. Thanks a lot for all your answers.
You already did most of the work! All you're missing is the sum for each substring.
As substrings can occur more than once, I'll do the summation only once per distinct substring and store the value for each substring encountered in a dict (and I renamed your letter-to-value dict above to mydict, to avoid shadowing the built-in dict):
snippets = {}
for snippet in letterlist:
    if snippet not in snippets:
        value = 0
        for s in snippet:
            value += mydict.get(s)
        snippets[snippet] = value

print(snippets)
That gives me an output of
{
'AB': 181.07884,
'HHFB': 531.4590000000001,
'FEAC': 450.5158,
'EGDGDAC': 647.6204,
'B': 110,
'GHFEDDC': 803.8074,
'AFEB': 457.37096,
'HGFEB': 580.4852800000001,
'C': 103.14484,
'FHHHGB': 725.6521600000001,
'AHGB': 375.272,
'AFEEAAB': 728.64416,
'HHGFEEEAEAGHHC': 1571.6099199999999,
'F': 147.1766}
Try to simplify things...
Given you already have a string s and a dictionary d:
ctr = 0
temp = ''
for letter in s:
    ctr += d[letter]
    temp += letter
    if letter in 'BC':
        print(temp, ctr)
        ctr = 0
        temp = ''
In the case you supplied where:
s = "ABHHFBFEACEGDGDACBGHFEDDCAFEBHGFEBCFHHHGBAHGBCAFEEAABCHHGFEEEAEAGHHCF"
d = {'A': 71.07884,
     'B': 110,
     'C': 103.14484,
     'D': 115.08864,
     'E': 129.11552,
     'F': 147.1766,
     'G': 57.05196,
     'H': 137.1412
     }
You get the results (printed to terminal):
('AB', 181.07884)
('HHFB', 531.4590000000001)
('FEAC', 450.5158)
('EGDGDAC', 647.6204)
('B', 110)
('GHFEDDC', 803.8074)
('AFEB', 457.37096)
('HGFEB', 580.4852800000001)
('C', 103.14484)
('FHHHGB', 725.6521600000001)
('AHGB', 375.272)
('C', 103.14484)
('AFEEAAB', 728.64416)
('C', 103.14484)
('HHGFEEEAEAGHHC', 1571.6099199999999)
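Note that the trailing 'F' never shows up here, because the string does not end in B or C, so the last group is never flushed. If you want that last group too, a final check after the loop (my own addition) would cover it:

if temp:
    print(temp, ctr)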
Open your file and read it character by character, then look each character up in the dictionary and add its value to your total:

sum_ = 0
letters = "letters_file"
with open(letters, "r") as opened:
    for row in opened:
        for char in row.strip():  # strip the trailing newline
            sum_ += your_dictionary[char]
print(sum_)
You can use re.split with itertools.zip_longest in a dict comprehension:
import re
from itertools import zip_longest
i = iter(re.split('([BC])', s))
{w: sum(d[c] for c in w) for p in zip_longest(i, i, fillvalue='') for w in (''.join(p),)}
This returns:
{'AB': 181.07884, 'HHFB': 531.4590000000001, 'FEAC': 450.5158, 'EGDGDAC': 647.6204, 'B': 110, 'GHFEDDC': 803.8074, 'AFEB': 457.37096, 'HGFEB': 580.4852800000001, 'C': 103.14484, 'FHHHGB': 725.6521600000001, 'AHGB': 375.272, 'AFEEAAB': 728.64416, 'HHGFEEEAEAGHHC': 1571.6099199999999, 'F': 147.1766}
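For readability, the same pairing trick can be unrolled into an explicit loop (a sketch, assuming s and d as defined earlier; result is my own name):

import re
from itertools import zip_longest

parts = re.split('([BC])', s)  # chunks and their B/C delimiters, alternating
i = iter(parts)
result = {}
for chunk, delim in zip_longest(i, i, fillvalue=''):
    word = chunk + delim  # re-attach the delimiter to its chunk
    result[word] = sum(d[c] for c in word)
print(result)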
I'm writing JSON to a file using DataFrame.to_json() with the indent option:
df.to_json(path_or_buf=file_json, orient="records", lines=True, indent=2)
The important part here is indent=2, otherwise it works.
Then how do I read this file using DataFrame.read_json()?
I'm trying the code below, but it expects the file to be a JSON object per line, so the indentation messes things up:
df = pd.read_json(file_json, lines=True)
I didn't find any options in read_json to make it handle the indentation.
How else could I read this file created by to_json, possibly avoiding writing my own reader?
The combination of lines=True, orient='records', and indent=2 doesn't actually produce valid json.
lines=True is meant to create line-delimited json, but indent=2 adds extra lines. You can't have your delimiter be line breaks, AND have extra line breaks!
If you do just orient='records', and indent=2, then it does produce valid json.
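For instance, a minimal round trip without lines=True should work directly (a sketch; the file name is made up):

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
df.to_json("data.json", orient="records", indent=2)  # writes a valid JSON array
df2 = pd.read_json("data.json", orient="records")    # reads it back fine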
The current read_json(lines=True) code can be found here:
def _combine_lines(self, lines) -> str:
    """
    Combines a list of JSON objects into one JSON object.
    """
    return (
        f'[{",".join([line for line in (line.strip() for line in lines) if line])}]'
    )
You can see that it expects to read the file line by line, which isn't possible when indent has been used.
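Since a file written with orient='records' and indent=2 (but without lines=True) is a valid JSON array, one workaround is to parse it with the standard json module and build the frame from the records (a sketch, reading the whole file into memory; the file name is made up):

import json
import pandas as pd

with open("data.json") as fh:
    records = json.load(fh)
df = pd.DataFrame.from_records(records)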
The other answer is good, but it turned out that approach requires reading the entire file into memory. I ended up writing a simple lazy parser, included below. It requires removing the lines=True argument in df.to_json.
Usage is as follows:
for obj, pos, length in lazy_read_json('file.json'):
    print(obj['field'])  # access json object
It also yields pos, the object's start position in the file, and length, the length of the object in the file; this enables some extra functionality for me, like being able to index objects and load them into memory on demand.
The parser is below:
import json

def lazy_read_json(filename: str):
    """
    :return generator returning (json_obj, pos, length)

    >>> test_objs = [{'a': 11, 'b': 22, 'c': {'abc': 'z', 'zzz': {}}}, \
                     {'a': 31, 'b': 42, 'c': [{'abc': 'z', 'zzz': {}}]}, \
                     {'a': 55, 'b': 66, 'c': [{'abc': 'z'}, {'z': 3}, {'y': 3}]}, \
                     {'a': 71, 'b': 62, 'c': 63}]
    >>> json_str = json.dumps(test_objs, indent=4, sort_keys=True)
    >>> _create_file("/tmp/test.json", [json_str])
    >>> g = lazy_read_json("/tmp/test.json")
    >>> next(g)
    ({'a': 11, 'b': 22, 'c': {'abc': 'z', 'zzz': {}}}, 120, 116)
    >>> next(g)
    ({'a': 31, 'b': 42, 'c': [{'abc': 'z', 'zzz': {}}]}, 274, 152)
    >>> next(g)
    ({'a': 55, 'b': 66, 'c': [{'abc': 'z'}, {'z': 3}, {'y': 3}]}, 505, 229)
    >>> next(g)
    ({'a': 71, 'b': 62, 'c': 63}, 567, 62)
    >>> next(g)
    Traceback (most recent call last):
    ...
    StopIteration
    """
    with open(filename) as fh:
        state = 0
        json_str = ''
        cb_depth = 0  # curly brace depth
        line = fh.readline()
        while line:
            if line[-1] == "\n":
                line = line[:-1]
            line_strip = line.strip()
            if state == 0 and line == '[':
                state = 1
                pos = fh.tell()
            elif state == 1 and line_strip == '{':
                state = 2
                json_str += line + "\n"
            elif state == 2:
                if len(line_strip) > 0 and line_strip[-1] == '{':  # count nested objects
                    cb_depth += 1
                json_str += line + "\n"
                if cb_depth == 0 and (line_strip == '},' or line_strip == '}'):
                    # end of parsing an object
                    if json_str[-2:] == ",\n":
                        json_str = json_str[:-2]  # remove trailing comma
                    state = 1
                    obj = json.loads(json_str)
                    yield obj, pos, len(json_str)
                    pos = fh.tell()
                    json_str = ""
                elif line_strip == '}' or line_strip == '},':
                    cb_depth -= 1
            line = fh.readline()

# this helper is only for the doctest,
# because doctest can't input newline characters :(
def _create_file(filename, lines):
    f = open(filename, "w")
    for line in lines:
        f.write(line)
        f.write("\n")
    f.close()
I have a list that already quite resembles a dictionary:
l=["'S':'NP''VP'", "'NP':'DET''N'", "'VP':'V'", "'DET':'a'", "'DET':'an'", "'N':'elephant'", "'N':'elephants'", "'V':'talk'", "'V':'smile'"]
I want to create a dictionary keeping all information:
dict = {'S': [['NP','VP']],
        'NP': [['DET', 'N']],
        'VP': [['V']],
        'DET': [['a'], ['an']],
        'N': [['elephants'], ['elephant']],
        'V': [['talk'], ['smile']]}
I tried using this:
d = {}
elems = filter(str.isalnum,l.replace('"',"").split("'"))
values = elems[1::2]
keys = elems[0::2]
d.update(zip(keys,values))
and this:
s = l.split(",")
dictionary = {}
for i in s:
    dictionary[i.split(":")[0].strip('\'').replace("\"", "")] = i.split(":")[1].strip('"\'')
print(dictionary)
You can use collections.defaultdict with re:
import re, collections
l=["'S':'NP''VP'", "'NP':'DET''N'", "'VP':'V'", "'DET':'a'", "'DET':'an'", "'N':'elephant'", "'N':'elephants'", "'V':'talk'", "'V':'smile'"]
d = collections.defaultdict(list)
for i in l:
    d[(k := re.findall(r'\w+', i))[0]].append(k[1:])
print(dict(d))
Output:
{'S': [['NP', 'VP']], 'NP': [['DET', 'N']], 'VP': [['V']], 'DET': [['a'], ['an']], 'N': [['elephant'], ['elephants']], 'V': [['talk'], ['smile']]}
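The := assignment expression requires Python 3.8+; on older versions the same loop can be written without it (equivalent sketch):

import re
import collections

d = collections.defaultdict(list)
for i in l:
    k = re.findall(r'\w+', i)  # e.g. ['S', 'NP', 'VP']
    d[k[0]].append(k[1:])
print(dict(d))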
I have a dict of dicts, but a given entry might not exist. For example, I have the following dict where the entry for c is missing:
graph = {
    'a': {'w': 14, 'x': 7, 'y': 9},
    'b': {'w': 9, 'c': 6},  # <- c is not in dict
    'w': {'a': 14, 'b': 9, 'y': 2},
    'x': {'a': 7, 'y': 10, 'z': 15},
    'y': {'a': 9, 'w': 2, 'x': 10, 'z': 11},
    'z': {'b': 6, 'x': 15, 'y': 11}
}
My current code:

import heapq

start = 'a'
end = 'z'
queue, seen = [(0, start, [])], set()
while True:
    (distance, vertex, path) = heapq.heappop(queue)
    if vertex not in seen:
        path = path + [vertex]
        seen.add(vertex)
        if vertex == end:
            print(distance, path)
            break  # new line, based on solutions below

        # new line
        if vertex not in graph:  # new line
            continue  # new line
        for (next_v, d) in graph[vertex].items():
            heapq.heappush(queue, (distance + d, next_v, path))
Right now I am getting the error:
for (next_v, d) in graph[vertex].items():
KeyError: 'c'
EDIT 1: If the key is not found in the dict, skip ahead.
EDIT 2: Even with the newly added code I get an error, this time:
(distance, vertex, path) = heapq.heappop(queue)
IndexError: index out of range
Here is the data file I use
https://s3-eu-west-1.amazonaws.com/citymapper-assets/citymapper-coding-test-graph.dat
Here is the file format:
<number of nodes>
<OSM id of node>
...
<OSM id of node>
<number of edges>
<from node OSM id> <to node OSM id> <length in meters>
...
<from node OSM id> <to node OSM id> <length in meters>
And here is the code to create the graph
from itertools import islice, groupby
from operator import itemgetter

with open(filename, 'r') as reader:
    num_nodes = int(reader.readline())
    edges = []
    for line in islice(reader, num_nodes + 1, None):
        values = line.split()
        values[2] = int(values[2])
        edges.append(tuple(values))

graph = {k: dict(x[1:] for x in grp) for k, grp in groupby(sorted(edges), itemgetter(0))}
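For instance, with a toy edge list (made-up ids), the comprehension nests each edge under its source node (a sketch):

from itertools import groupby
from operator import itemgetter

edges = [('1', '2', 5), ('1', '3', 7), ('2', '3', 1)]
graph = {k: dict(x[1:] for x in grp) for k, grp in groupby(sorted(edges), itemgetter(0))}
# {'1': {'2': 5, '3': 7}, '2': {'3': 1}}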
Change start and end to:
start = '876500321'
end = '1524235806'
Any help/advice is highly appreciated.
Thanks
Before accessing graph[vertex], make sure it is in the dict:
if vertex not in graph:
    continue

for (next_v, d) in graph[vertex].items():
    heapq.heappush(queue, (distance + d, next_v, path))
You can check whether the vertex is in the graph before executing that final for loop:
if vertex in graph:
    for (next_v, d) in graph[vertex].items():
        heapq.heappush(queue, (distance + d, next_v, path))
You could use .get and fall back to an empty {} in case the key is not there, so that the .items() call won't break:

for (next_v, d) in graph.get(vertex, {}).items():
    heapq.heappush(queue, (distance + d, next_v, path))
I am using Python 2; how can I turn a flat array into multiple dimensions? For example:
a = ['a', 'b', 'c', ...]
To:
foo['a']['b']['c']...
And how can I check whether an entry exists? For example, with multiple arrays:
a = ['a', 'b', 'c']
b = ['a', 'x', 'y']
The result:
foo['a'] -> ['b'], ['x']
foo['a']['b'] -> ['c']
foo['a']['x'] -> ['y']
I need this to build file/directory tree navigation: for each path discovered I need to add its subpaths and files (the paths come from a database). The navigation should be separated, for example:
http://foo.site/a ->
/b
/c
/d
http://foo.site/a/b ->
/file1.jpg
/file2.jpg
For each path, I make a split on / and need to build a multidimensional array or dictionary with each path and the files available.
It's not really clear what you are asking.
Nevertheless, you can define a simple tree structure like this:
import collections
def tree():
    return collections.defaultdict(tree)
And use it as follows:
foo = tree()
foo['a']['b']['c'] = "x"
foo['a']['b']['d'] = "y"
You get:
defaultdict(<function tree at 0x7f9e4829f488>,
{'a': defaultdict(<function tree at 0x7f9e4829f488>,
{'b': defaultdict(<function tree at 0x7f9e4829f488>,
{'c': 'x',
'd': 'y'})})})
Which is similar to:
{'a': {'b': {'c': 'x', 'd': 'y'}}}
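Note that merely indexing a defaultdict creates the missing nodes as a side effect; to test whether a path exists without mutating the tree, walk it with a membership check (a sketch; exists is my own name):

def exists(node, path):
    # walk the tree without creating new nodes
    for part in path:
        if part not in node:
            return False
        node = node[part]
    return True

exists(foo, ['a', 'b', 'c'])  # True
exists(foo, ['a', 'z'])       # False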
EDIT
But you also asked: “For each path, make a split on / and build a multidimensional array or dictionary with each path and the files available.”
I usually use os.walk to search files in directories:
import os
import os.path
start_dir = ".."
result = {}
for root, dirnames, filenames in os.walk(start_dir):
    relpath = os.path.relpath(root, start_dir)
    result[relpath] = filenames
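Combining the two ideas, each relative path can then be split on the separator and inserted into the nested tree (a sketch; the '__files__' marker key is my own invention):

import os

root = tree()
for relpath, files in result.items():
    node = root
    for part in relpath.split(os.sep):
        node = node[part]  # defaultdict creates missing levels
    node['__files__'] = files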
This solution works for me, using eval and a dictionaries-of-dictionaries merge:
from collections import MutableMapping  # Python 2 location; collections.abc in Python 3

def __init__(self):
    self.obj = {}

def setPathToObject(self, path):
    path_parts = path.replace('://', ':\\\\').split('/')
    obj_parts = eval('{ \'' + ('\' : { \''.join(path_parts)) + '\' ' + ('}' * len(path_parts)))
    obj_fixed = str(obj_parts).replace('set([\'', '[\'').replace('])}', ']}').replace(':\\\\', '://')
    obj = eval(obj_fixed)
    self.obj = self.recMerge(self.obj.copy(), obj.copy())
    return obj

def recMerge(self, d1, d2):
    for k, v in d1.items():
        if k in d2:
            if all(isinstance(e, MutableMapping) for e in (v, d2[k])):
                d2[k] = self.recMerge(v, d2[k])
            elif all(isinstance(item, list) for item in (v, d2[k])):
                d2[k] = v + d2[k]
    d3 = d1.copy()
    d3.update(d2)
    return d3
Testing:
setPathToObject('http://www.google.com/abc/def/ghi/file1.jpg')
setPathToObject('http://www.google.com/abc/xyz/file2.jpg')
setPathToObject('http://www.google.com/abc/def/123/file3.jpg')
setPathToObject('http://www.google.com/abc/def/123/file4.jpg')
print self.obj
> { 'http://www.google.com': { 'abc': { 'def': { 'ghi': [ 'file1.jpg' ], '123': [ 'file3.jpg', 'file4.jpg' ] }, 'xyz': [ 'file2.jpg' ] } } }
Works on Python 2.
I know there are many questions with the same title. My situation is a little different. I have a string like:
"Cat(Money(8)Points(80)Friends(Online(0)Offline(8)Total(8)))Mouse(Money(10)Points(10000)Friends(Online(10)Offline(80)Total(90)))"
(Notice that there are parentheses nested inside one another.)
and I need to parse it into nested dictionaries like for example:
d["Cat"]["Money"] == 8
d["Cat"]["Points"] = 80
d["Mouse"]["Friends"]["Online"] == 10
and so on. I would like to do this without libraries and regex. If you choose to use these, please explain the code in great detail.
Thanks in advance!
Edit:
Although this code will not make much sense on its own, this is what I have so far:
o_str = "Jake(Money(8)Points(80)Friends(Online(0)Offline(8)Total(8)))Mouse(Money(10)Points(10000)Friends(Online(10)Offline(80)Total(90)))"
spl = o_str.split("(")
def reverseIndex(str1, str2):
try:
return len(str1) - str1.rindex(str2)
except Exception:
return len(str1)
def app(arr,end):
new_arr = []
for i in range(0,len(arr)):
if i < len(arr)-1:
new_arr.append(arr[i]+end)
else:
new_arr.append(arr[i])
return new_arr
spl = app(spl,"(")
ends = []
end_words = []
op = 0
cl = 0
for i in range(0,len(spl)):
print i
cl += spl[i].count(")")
op += 1
if cl == op-1:
ends.append(i)
end_words.append(spl[i])
#break
print op
print cl
print
print end_words
The end words are the sections at the beginning of each statement. I plan on using recursion to do the rest.
Now that was interesting. You really nerd-sniped me on this one...
def parse(tokens):
    """ take iterator of tokens, parse to dictionary or atom """
    dictionary = {}
    # iterate tokens...
    for token in tokens:
        if token == ")" or next(tokens) == ")":
            # token is ')' -> end of dict; next is ')' -> 'leaf'
            break
        # add sub-parse to dictionary
        dictionary[token] = parse(tokens)
    # return dict, if non-empty, else token
    return dictionary or int(token)
Setup and demo:
>>> s = "Cat(Money(8)Points(80)Friends(Online(0)Offline(8)Total(8)))Mouse(Money(10)Points(10000)Friends(Online(10)Offline(80)Total(90)))"
>>> tokens = iter(s.replace("(", " ( ").replace(")", " ) ").split())
>>> from pprint import pprint
>>> pprint(parse(tokens))
{'Cat': {'Friends': {'Offline': 8, 'Online': 0, 'Total': 8},
'Money': 8,
'Points': 80},
'Mouse': {'Friends': {'Offline': 80, 'Online': 10, 'Total': 90},
'Money': 10,
'Points': 10000}}
Alternatively, you could also use a series of string replacements to turn that string into an actual Python dictionary string and then evaluate that, e.g. like this:
as_dict = eval("{'" + s.replace(")", "'}, ")
                      .replace("(", "': {'")
                      .replace(", ", ", '")
                      .replace(", ''", "")[:-3] + "}")
This will wrap the leaves in singleton sets of strings, e.g. {'8'} instead of 8, but this should be easy to fix in a post-processing step.
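For instance, one possible post-processing pass (a sketch; fix_leaves is my own name) recursively converts those singleton sets back into ints:

def fix_leaves(node):
    if isinstance(node, set):  # a leaf like {'8'}
        return int(next(iter(node)))
    return {k: fix_leaves(v) for k, v in node.items()}

d = fix_leaves(as_dict)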