I have an odd single-line string in which the fields alternate between key and value. It looks like this:
key1\val1\key2\val2\key3\val3\...\keyn\valn
What would be the best way to convert this notation to a Python dictionary?
Split the string into a temporary list, then pair up the even-indexed and odd-indexed items:
s = 'key1\\val1\\key2\\val2\\key3\\val3'
temp = s.split('\\')
d = {k: v for k, v in zip(temp[0::2], temp[1::2])}
Simple answer.
a = "key1\\val1\\key2\\val2\\key3\\val3"
b = a.split('\\')
dc = {}
for i in range(0, len(b), 2):
    dc[b[i]] = b[i+1]
Here is what I came up with:
import re
string = 'key1\\val1\\key2\\val2\\key3\\val3'
dictionary = {match.group(1): match.group(2) for match in re.finditer(r'(\w+)\\(\w+)', string)}
print(dictionary)
However, note that this works only if the keys and values consist solely of word characters (letters, digits and underscores, which is what \w matches); spaces, hyphens and other punctuation are not matched. To accommodate such cases, you would have to modify the simple regex used in the code above.
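One possible way to loosen that regex, offered as a sketch rather than the answerer's own code: match "anything except a backslash" instead of \w. The function name parse_pairs is made up for this example, and it assumes that neither keys nor values are empty or contain a backslash.

```python
import re

def parse_pairs(s):
    # [^\\]+ matches one or more characters that are not a backslash,
    # so spaces, hyphens and other punctuation are accepted.
    pattern = r'([^\\]+)\\([^\\]+)'
    return {m.group(1): m.group(2) for m in re.finditer(pattern, s)}

print(parse_pairs('my key\\val-1\\k_2\\v 2'))
```

Empty fields (two backslashes in a row) would still be skipped by this pattern, so the split-based answers are the safer choice if those can occur.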
This does it without any imports:
s = """key1\\val1\\key2\\val2\\key3\\val3\\...\\keyn\\valn"""
spl = s.split("\\")
m = {}
for i in range(0, len(spl)-1, 2):
    m[spl[i]] = spl[i+1]
print(m)
Use split and itertools.islice:
import itertools
def parse(ss):
    inp = ss.split('\\')
    # tee gives two independent iterators over the same fields
    keys, vals = itertools.tee(inp)
    keys = itertools.islice(keys, 0, None, 2)  # fields 0, 2, 4, ...
    vals = itertools.islice(vals, 1, None, 2)  # fields 1, 3, 5, ...
    nd = {}
    for key, val in zip(keys, vals):
        nd[key] = val
    return nd
In Python 2.7.12:
line = "key1\\val1\\key2\\val2\\key3\\val3"
line_data = line.split("\\")
line_dict = {}
print line_data
for i in range(0, len(line_data), 2):
    key = line_data[i]
    value = line_data[i+1]
    line_dict[key] = value
print line_dict
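The answers above differ in how they treat a string with an odd number of fields: the loops that read index i+1 over the full range raise IndexError, while the zip-based version and the loop that stops at len(spl)-1 silently drop the trailing key. A minimal sketch that makes the choice explicit is shown here; the function name pairs_to_dict and the decision to raise ValueError are assumptions for illustration, not part of any answer.

```python
def pairs_to_dict(s, sep='\\'):
    fields = s.split(sep)
    # An odd number of fields means the last key has no value.
    if len(fields) % 2:
        raise ValueError('expected an even number of fields, got %d' % len(fields))
    # Even-indexed fields are keys, odd-indexed fields are values.
    return dict(zip(fields[0::2], fields[1::2]))

print(pairs_to_dict('key1\\val1\\key2\\val2\\key3\\val3'))
```

Note that an empty input string splits into a single empty field, so this sketch rejects it as well.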
In my list (not a dictionary) I have these strings:
"K:60",
"M:37",
"M_4:47",
"M_5:89",
"M_6:91",
"N:15",
"O:24",
"P:50",
"Q:50",
"Q_7:89"
In the output I need to have:
"K:60",
"M_6:91",
"N:15",
"O:24",
"P:50",
"Q_7:89"
What is a possible solution?
Or, more generally, how can I keep, among the strings that share a tag, the one with the maximum value?
Use re.split and list comprehension as shown below. Use the fact that when the dictionary dct is created, only the last value is kept for each repeated key.
import re
lst = [
"K:60",
"M:37",
"M_4:47",
"M_5:89",
"M_6:91",
"N:15",
"O:24",
"P:50",
"Q:50",
"Q_7:89"
]
dct = dict([ (re.split(r'[:_]', s)[0], s) for s in lst])
lst_uniq = list(dct.values())
print(lst_uniq)
# ['K:60', 'M_6:91', 'N:15', 'O:24', 'P:50', 'Q_7:89']
Probably far from the cleanest but here is a method quite easy to understand.
l = ["K:60", "M:37", "M_4:47", "M_5:89", "M_6:91", "N:15", "O:24", "P:50", "Q:50", "Q_7:89"]
reponse = []
val = []
complete_val = []
for x in l:
    if x[0] not in reponse:
        reponse.append(x[0])
        complete_val.append(x.split(':')[0])
        val.append(int(x.split(':')[1]))
    elif int(x.split(':')[1]) > val[reponse.index(x[0])]:
        val[reponse.index(x[0])] = int(x.split(':')[1])
for x in range(len(complete_val)):
    print(str(complete_val[x]) + ":" + str(val[x]))
K:60
M:91
N:15
O:24
P:50
Q:89
I do not see a straightforward built-in for this; you have to iterate over the whole list and track the maximum yourself. I have written the version below so that it does not require the values in your input to be sorted.
That said, I like the answer posted by Timur Shtatland; you can make use of it if your values are already sorted in the input.
# a is the input list of strings
intermediate = {}
for item in a:
    key, val = item.split(':')
    key = key.split('_')[0]
    val = int(val)
    # keep the item only if it beats the best value seen so far for this key
    if intermediate.get(key, (float('-inf'), None))[0] < val:
        intermediate[key] = (val, item)
ans = [x[1] for x in intermediate.values()]
print(ans)
which gives:
['K:60', 'M_6:91', 'N:15', 'O:24', 'P:50', 'Q_7:89']
I am trying to print the following dictionary in a hierarchy format
fam_dict = {'6081740103': ['60817401030000','60817401030100','60817401030200',
'60817401030300','60817401030400','60817401030500','60817401030600']}
as shown here:
60817401030000
    60817401030100
        60817401030200
            60817401030400
                60817401030500
                    60817401030600
So far I have the following code, which works, but I have to type the index by hand on each line. How can I rewrite it recursively (or as a loop) so that I don't have to count the lines and enter each index value manually?
my_p = node(fam_dict['6081740103'][0], None)
my_c = node(fam_dict['6081740103'][1], my_p)
my_d = node(fam_dict['6081740103'][2], my_c)
my_e = node(fam_dict['6081740103'][4], my_d)
my_f = node(fam_dict['6081740103'][5], my_e)
my_g = node(fam_dict['6081740103'][6], my_f)
print (my_p.name)
print_children(my_p)
You can try this:
fam_dict = {'6081740103':['60817401030000','60817401030100','60817401030200',
'60817401030300','60817401030400','60817401030500','60817401030600']}
for i, val in enumerate(fam_dict['6081740103']):
    print(' ' * i * 4 + val)
Which outputs your desired hierarchy:
60817401030000
    60817401030100
        60817401030200
            60817401030300
                60817401030400
                    60817401030500
                        60817401030600
You can create a counter variable and increment it each time through the loop. Multiplying the tab character '\t' by that counter gives you a string with as many tabs as you want. Here is an example:
lines = 0
fam_dict = {'6081740103': ['60817401030000','60817401030100','60817401030200',
'60817401030300','60817401030400','60817401030500','60817401030600']}
for k, val in fam_dict.items():
    for v in val:
        lines += 1
        t = '\t'
        t = t * lines
        print(t + str(v))
Here is your output:
	60817401030000
		60817401030100
			60817401030200
				60817401030300
					60817401030400
						60817401030500
							60817401030600
You can do it this way too.
for key in fam_dict.keys():
    for i in range(len(fam_dict[key])):
        print(i*"\t" + fam_dict[key][i])
Here is an example:
fam_dict = {'6081740103':['60817401030000','60817401030100','60817401030200','60817401030300','60817401030400','60817401030500','60817401030600']}
for k, v in fam_dict.items():
    for i, s in enumerate(v):
        print("%s%s" % ("\t"*i, s))
In case you want to make nodes for it:
fam_dict = {'6081740103':['60817401030000','60817401030100','60817401030200','60817401030300','60817401030400','60817401030500','60817401030600']}
node_list = []
for k, v in fam_dict.items():
    last_parent = None
    for i, s in enumerate(v):
        print("%s%s" % ("\t"*i, s))
        # node is the class from the question: node(name, parent)
        node_list.append(node(s, last_parent))
        last_parent = node_list[-1]
The parent node will be node_list[0].
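The question's node class and print_children function are not shown, so the following is only a sketch of how a recursive version could look. The Node class, build_chain and render are hypothetical names invented for this example; the real node may have a different constructor.

```python
class Node:
    """Hypothetical stand-in for the question's node class."""
    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        if parent is not None:
            parent.children.append(self)

def build_chain(names):
    """Link each name as the only child of the previous one; return the root."""
    root = None
    parent = None
    for name in names:
        current = Node(name, parent)
        if root is None:
            root = current
        parent = current
    return root

def render(node, depth=0, indent="    "):
    """Recursively collect one indented line per node."""
    lines = [indent * depth + node.name]
    for child in node.children:
        lines.extend(render(child, depth + 1, indent))
    return lines

fam_dict = {'6081740103': ['60817401030000', '60817401030100', '60817401030200']}
root = build_chain(fam_dict['6081740103'])
print("\n".join(render(root)))
```

Because render recurses over children, the same function would also handle a real tree with several children per node, not just a single chain.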
Try this:
fam_dict = {'6081740103':['60817401030000','60817401030100','60817401030200',
'60817401030300','60817401030400','60817401030500','60817401030600']}
l = fam_dict['6081740103']
for i in l:
    print(' '*l.index(i)*4 + i)
Output:
60817401030000
    60817401030100
        60817401030200
            60817401030300
                60817401030400
                    60817401030500
                        60817401030600
I have several strings that are each built by appending a different string to an original string. Both the original string and the appended string contain one integer placeholder. As far as I know, both of the following approaches work, but which is the best way to do it, or is there a better way?
Or is there any way I can write something like:
newstrg = '{}{}'.format(org % orgInt, appd % appdInt)
First method:
org = "org__%s"
appd = "appd__%s"
orgInt = 1
appdInt = 7
newstrg = org % orgInt + appd % appdInt
print(newstrg)
org__1appd__7
Second method:
org = "org__{}"
appd = "appd__{}"
orgInt = 1
appdInt = 7
newstrg = (org + appd).format(orgInt, appdInt)
org__1appd__7
Here is another way:
org_appd = {'org': 1, 'appd': 7}
org = "org__{org}"
appd = "appd__{appd}"
newstrg = (org + appd).format(**org_appd)
What about "org__{org}appd__{appd}".format(org=1, appd=7) or similar? Your format string can be arbitrary, and it's cleaner to use named placeholders.
Edit:
If the tokens and the numbers are variable, feed them in as a list of token-value pairs:
tokenpairs = [('org',1), ('appd', 7)] # etc
unit = lambda t,v : "{0}__{1}".format(t ,v)
renamed = "".join([unit (t, v) for t, v in tokenpairs])
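On Python 3.6 or later, an f-string is another option for the fixed two-token case. This is a sketch, not something taken from the answers above, and the function name build_name is made up for the example.

```python
def build_name(org_int, appd_int):
    # The expressions in braces are evaluated and formatted in place.
    return f"org__{org_int}appd__{appd_int}"

print(build_name(1, 7))
```

An f-string is evaluated where it is written, so it suits templates known in the code; if the templates are stored in variables such as org and appd, str.format remains the tool to use.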
I have a file like this :
2.nseasy.com.|['azeaonline.com']
ns1.iwaay.net.|['alchemistrywork.com', 'dha-evolution.biz', 'hidada.net', 'sonifer.biz']
ns2.hd28.co.uk.|['networksound.co.uk']
Expected result:
2.nseasy.com.|'azeaonline.com'
ns1.iwaay.net.|'alchemistrywork.com'
ns1.iwaay.net.|'dha-evolution.biz'
ns1.iwaay.net.|'hidada.net'
ns1.iwaay.net.|'sonifer.biz'
ns2.hd28.co.uk.|'networksound.co.uk'
When I try to do that, instead of the items of domain_list I get the individual characters of the domains, which means that the values in dictionary d are not recognized as lists but as strings. Here is my code:
import json
from collections import defaultdict
from time import time

d = defaultdict(list)
f = open(file,'r')
start = time()
for line in f:
    NS,domain_list = line.split('|')
    s = json.dumps(domain_list)
    d[NS] = json.loads(s)
for NS, domains in d.items():
    for domain in domains:
        print (NS, domain)
example of the current result:
w
o
o
d
l
a
n
d
f
a
r
m
e
r
s
m
a
r
k
e
t
.
o
r
g
'
]
What you are doing with json is not correct. domain_list is already a string (the text after the |), so s = json.dumps(domain_list) merely encodes that string as a JSON string literal, and json.loads(s) decodes it back to the same string. You then iterate over that string and print it, hence the single characters in the output.
Try something like:
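import json
from collections import defaultdict
from time import time

d = defaultdict(list)
f = open(file,'r')
start = time()
for line in f:
    NS,domain_list = line.split('|')
    # JSON requires double quotes, so convert the single quotes first
    d[NS] = json.loads(domain_list.replace("'", '"'))
for NS, domains in d.items():
    for domain in domains:
        print (NS, domain)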
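The round trip described above can be checked directly. This is a small sketch using one made-up line fragment in the same shape as the question's data; ast.literal_eval is included for comparison because it accepts the single-quoted list as is.

```python
import ast
import json

raw = "['alchemistrywork.com', 'hidada.net']"  # text after the | on one line

# dumps() encodes the str as a JSON string literal; loads() decodes it back,
# so the result is the same str, and iterating over it yields characters.
round_tripped = json.loads(json.dumps(raw))
print(type(round_tripped).__name__)

# Two ways to get an actual list out of the text:
via_json = json.loads(raw.replace("'", '"'))  # JSON needs double quotes
parsed = ast.literal_eval(raw)                # accepts Python list syntax
print(parsed)
```

The quote replacement would corrupt any value that itself contains an apostrophe, which is one reason to prefer ast.literal_eval for data written as Python literals.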
Here's another one (assuming names.txt contains your data):
with open('names.txt') as f:  # Open the file for reading
    for line in f:  # Iterate over each line
        host, parts = line.strip().split('|')  # Split the parts on the |
        parts = parts.replace('[', '').replace(']', '')  # Remove the [] chars
        parts_a = map(str.strip, parts.split(','))  # Split on the comma, and remove any spaces
        for part in parts_a:  # Iterate through each split part
            print('{0}|{1}'.format(host, part))  # Print the host and part separated by a |
Note: You could replace the 4th and 5th lines with parts_a = json.loads(parts), but only if the part after the | is valid JSON. The sample data uses single quotes, which json.loads rejects, so you would have to convert them to double quotes first or use ast.literal_eval(parts) instead.
You don't need json in this case, as it doesn't solve your problem. You can use ast.literal_eval and itertools.repeat inside a list comprehension to create the desired pairs (here s holds the file contents as a single string):
>>> from itertools import repeat
>>> import ast
>>> sp_l = [(i.split('|')[0], ast.literal_eval(i.split('|')[1])) for i in s.split('\n')]
>>> for k in [zip(repeat(i, len(j)), j) for i, j in sp_l]:
...     for item in k:
...         print('|'.join(item))
...
2.nseasy.com.|azeaonline.com
ns1.iwaay.net.|alchemistrywork.com
ns1.iwaay.net.|dha-evolution.biz
ns1.iwaay.net.|hidada.net
ns1.iwaay.net.|sonifer.biz
ns2.hd28.co.uk.|networksound.co.uk
Try:
import ast
with open(file, "r") as f:
    d = {k: ast.literal_eval(v) for k, v in map(lambda s: s.split("|"), f)}
for NS, domains in d.items():
    for domain in domains:
        print("%s|'%s'" % (NS, domain))
Or even just:
import ast
with open('file.xyz') as f:
    for thing in f:
        q, r = thing.split('|')
        r = ast.literal_eval(r)
        for other in r:
            print('{}|{}'.format(q, other))
Here is a regex solution:
import re
text = '''2.nseasy.com.|['azeaonline.com']
ns1.iwaay.net.|['alchemistrywork.com', 'dha-evolution.biz', 'hidada.net', 'sonifer.biz']
ns2.hd28.co.uk.|['networksound.co.uk']'''
for line in text.split('\n'):
    splitted = line.split('|')
    left = splitted[0]
    # capture each quoted domain; digits are allowed as well as letters, dots and hyphens
    right = re.findall(r"'([a-z0-9.-]+?)'", splitted[1])
    for domain in right:
        print('{0}|{1}'.format(left, domain))
Outputs:
2.nseasy.com.|azeaonline.com
ns1.iwaay.net.|alchemistrywork.com
ns1.iwaay.net.|dha-evolution.biz
ns1.iwaay.net.|hidada.net
ns1.iwaay.net.|sonifer.biz
ns2.hd28.co.uk.|networksound.co.uk
I have a list containing strings as ['Country-Points'].
For example:
lst = ['Albania-10', 'Albania-5', 'Andorra-0', 'Andorra-4', 'Andorra-8', ...other countries...]
I want to calculate the average for each country without creating a new list. So the output would be (in the case above):
lst = ['Albania-7.5', 'Andorra-4.0', ...other countries...]
I would really appreciate it if anyone could help me with this.
EDIT:
This is what I've got so far. "data" is actually a dictionary, where the keys are countries and the values are lists of the points other countries gave to this country (the one used as the key). Again, I'm new to Python, so I don't really know all the built-in functions.
for key in self.data:
    lst = []
    index = 0
    score = 0
    cnt = 0
    s = str(self.data[key][0]).split("-")[0]
    for i in range(len(self.data[key])):
        if s in self.data[key][i]:
            a = str(self.data[key][i]).split("-")
            score += int(float(a[1]))
            cnt+=1
            index+=1
        if i+1 != len(self.data[key]) and not s in self.data[key][i+1]:
            lst.append(s + "-" + str(float(score/cnt)))
            s = str(self.data[key][index]).split("-")[0]
            score = 0
    self.data[key] = lst
itertools.groupby with a suitable key function can help:
import itertools
def get_country_name(item):
    return item.split('-', 1)[0]
def get_country_value(item):
    return float(item.split('-', 1)[1])
def country_avg_grouper(lst):
    for ctry, group in itertools.groupby(lst, key=get_country_name):
        values = list(get_country_value(c) for c in group)
        avg = sum(values)/len(values)
        yield '{country}-{avg}'.format(country=ctry, avg=avg)
lst[:] = country_avg_grouper(lst)
The key here is that I wrote a function to do the change out of place and then I can easily make the substitution happen in place by using slice assignment.
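One caveat worth spelling out: itertools.groupby only merges adjacent items with the same key, so the approach above relies on each country's entries being consecutive in the list. The sketch below sorts by country first to drop that assumption; the function name average_in_place and the sample scores are made up for illustration.

```python
import itertools

def country_name(item):
    return item.split('-', 1)[0]

def country_value(item):
    return float(item.split('-', 1)[1])

def average_in_place(lst):
    # Sorting makes each country's entries adjacent, which groupby requires.
    ordered = sorted(lst, key=country_name)
    averaged = []
    for name, group in itertools.groupby(ordered, key=country_name):
        values = [country_value(item) for item in group]
        averaged.append('{}-{}'.format(name, sum(values) / len(values)))
    # Slice assignment replaces the contents of the same list object.
    lst[:] = averaged

scores = ['Andorra-0', 'Albania-10', 'Andorra-4', 'Albania-5', 'Andorra-8']
average_in_place(scores)
print(scores)
```

Sorting changes the order of the output to alphabetical by country; if the original order matters and the input is already grouped, the unsorted version above is enough.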
I would probably do this with an intermediate dictionary.
def country(s):
    return s.split('-')[0]
def value(s):
    return float(s.split('-')[1])
def country_average(lst):
    # maps country -> (running total, number of entries)
    country_map = {}
    for point in lst:
        c = country(point)
        v = value(point)
        old = country_map.get(c, (0, 0))
        country_map[c] = (old[0]+v, old[1]+1)
    return ['%s-%f' % (name, total/count)
            for (name, (total, count)) in country_map.items()]
It traverses the original list only once, at the expense of quite a few tuple allocations.