Try to get hierarchical string structure into dict - python

I'm pretty new to Python and have been struggling for two days to get a hierarchical string structure into a Python dict/list structure so that it is easier to handle:
Example Strings:
Operating_System/Linux/Apache
Operating_System/Linux/Nginx
Operating_System/Windows/Docker
Operating_System/FreeBSD/Nginx
What I'm trying to achieve is to split each string and pack it into a Python dict that should look
something like:
{'Operating_System': [{'Linux': ['Apache', 'Nginx']}, {'Windows': ['Docker']}, {'FreeBSD': ['Nginx']}]}
I tried multiple approaches, including zip() and splitting each string with split('/') followed by
nested iteration, but I could not solve it yet. Does anyone know a good/elegant way to achieve
something like this with Python 3?
best regards,
Chris

One way to go about it: defaultdict could help here.
# assumption: the input is a collection of strings
strings = ["Operating_System/Linux/Apache",
           "Operating_System/Linux/Nginx",
           "Operating_System/Windows/Docker",
           "Operating_System/FreeBSD/Nginx"]
from collections import defaultdict
d = defaultdict(dict)
e = defaultdict(list)
m = [entry.split('/') for entry in strings]
print(m)
[['Operating_System', 'Linux', 'Apache'],
['Operating_System', 'Linux', 'Nginx'],
['Operating_System', 'Windows', 'Docker'],
['Operating_System', 'FreeBSD', 'Nginx']]
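for a, b, c in m:
    e[b].append(c)  # group the leaves under their second-level key
    d[a] = e        # note: e is shared, so this assumes a single top-level key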
print(d)
defaultdict(dict,
{'Operating_System': defaultdict(list,
{'Linux': ['Apache', 'Nginx'],
'Windows': ['Docker'],
'FreeBSD': ['Nginx']})})
If you want the result exactly as shown in your expected output, you can skip the defaultdict(dict) part:
mapp = {'Operating_System':[{k:v} for k,v in e.items()]}
mapp
{'Operating_System': [{'Linux': ['Apache', 'Nginx']},
{'Windows': ['Docker']},
{'FreeBSD': ['Nginx']}]
}
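The loop above shares one inner mapping across every top-level key, which is fine for the sample data because there is only one root. A minimal sketch of a variant that keeps a separate mapping per root, assuming every string has exactly three parts; the function names and the extra 'Database/SQL/Postgres' entry are made up for illustration:

```python
from collections import defaultdict

def group_paths(paths):
    """Group 'root/middle/leaf' strings into {root: {middle: [leaf, ...]}}."""
    tree = defaultdict(lambda: defaultdict(list))
    for path in paths:
        root, middle, leaf = path.split('/')  # assumes exactly three parts
        tree[root][middle].append(leaf)
    # convert the nested defaultdicts back to plain dicts
    return {root: dict(children) for root, children in tree.items()}

def as_list_of_dicts(tree):
    """Reshape into the {root: [{middle: [leaves]}, ...]} form from the question."""
    return {root: [{k: v} for k, v in children.items()]
            for root, children in tree.items()}

strings = ["Operating_System/Linux/Apache",
           "Operating_System/Linux/Nginx",
           "Operating_System/Windows/Docker",
           "Operating_System/FreeBSD/Nginx",
           "Database/SQL/Postgres"]  # hypothetical second root

tree = group_paths(strings)
print(as_list_of_dicts(tree))
```

The plain nested dict returned by group_paths is usually easier to query (tree['Operating_System']['Linux']) than the list-of-dicts form.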
this post was also useful

Related

Efficient group substring search in Python?

Lets say I've loaded some information from a file into a Python3 dict and the result looks like this.
d = {
'hello' : ['hello', 'hi', 'greetings'],
'goodbye': ['bye', 'goodbye', 'adios'],
'lolwut': ['++$(#$(#%$(##*', 'ASDF #!## TOW']
}
Let's say I'm going to analyze a bunch, I mean an absolute ton, of strings. If a string contains any of the values for a given key of d, then I want to categorize it as being in that key.
For example...
'My name is DDP, greetings' => 'hello'
Obviously I can loop through the keys and values like this...
def classify(s, d):
    for k, v in d.items():
        if any([x in s for x in v]):
            return k
    return ''
But I want to know if there's a more efficient algorithm for this kind of bulk searching; more efficient than my naive loop. Is anyone aware of such an algorithm?
You can use a regex to avoid the inner loop. Join the words with a pipe character (escaping each one, since some of them contain regex metacharacters) and pass the pattern to re.search(). Since neither the order nor the exact word matters to you, this tells you whether any of the values occurs in the given string.
import re
def classify(s, d):
    for k, v in d.items():
        # escape each value on its own; escaping the joined string would also escape the '|'
        regex = re.compile('|'.join(re.escape(x) for x in v))
        if regex.search(s):
            return k
Also note that instead of returning k you can yield it to get an iterator over all matching keys, or store the matches in a dictionary, etc.
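When a very large number of strings has to be classified, it may help to build the pattern once instead of once per key per call. A minimal sketch, assuming every key has at least one value; build_classifier and the g0, g1, ... group names are made up here. One behavioral difference from the loop: it returns the key whose value occurs leftmost in the string, not the first matching key in dict order.

```python
import re

def build_classifier(d):
    """Compile one regex for the whole dict and return a classify(s) function."""
    keys = list(d)
    groups = []
    for i, key in enumerate(keys):
        # one named group per key; every value is escaped individually
        alternatives = '|'.join(re.escape(x) for x in d[key])
        groups.append('(?P<g{}>{})'.format(i, alternatives))
    pattern = re.compile('|'.join(groups))

    def classify(s):
        m = pattern.search(s)
        if m is None:
            return ''
        # lastgroup is the name of the group that matched, e.g. 'g1'
        return keys[int(m.lastgroup[1:])]

    return classify

d = {
    'hello': ['hello', 'hi', 'greetings'],
    'goodbye': ['bye', 'goodbye', 'adios'],
    'lolwut': ['++$(#$(#%$(##*', 'ASDF #!## TOW'],
}
classify = build_classifier(d)
print(classify('My name is DDP, greetings'))
```

Whether this is actually faster than the plain loop depends on the data, so it is worth timing both on a realistic sample.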

Python Best Way to construct Dictionary from Inputs

I'm trying to construct a nested dictionary from user inputs. The only issue is, the user can opt to not enter some of these inputs. In these cases, I want the dictionary to completely exclude that field. For Instance:
ids = 1234
dmas = 5678
I would like the dictionary to look like this:
d = {profile:{"dma_targets":dmas, "id":ids}}
However, if user decides not to include certain input:
ids = None
dmas = 5678
I would like the dictionary to look like this:
d = {profile:{"dma_targets":dmas}}
I'm a bit stuck here, and it seems like a very simple thing, as it would be easy to do if I wanted a list instead of a dict. One of the problems I'm running into is:
x = "dma_targets":dmas
is not a valid object, so I'm having a hard time constructing the pieces of this, then adding them into the dictionary. Thanks so much!
How about a little dict comprehension?
fkeys = ['dma_targets', 'ids']
fvals = [5678, None]
d = {'profile': {k:v for (k,v) in zip(fkeys, fvals) if v is not None}}
which yields d as
{'profile': {'dma_targets': 5678}}
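If the values live in separate variables, as in the question, the same filter can be wrapped in a small helper. A minimal sketch; build_profile is a made-up name, and it treats only None as "not entered", so 0 or an empty string would still be kept:

```python
def build_profile(**fields):
    """Return {'profile': {...}} with every None-valued field left out."""
    return {'profile': {k: v for k, v in fields.items() if v is not None}}

ids = None
dmas = 5678
d = build_profile(dma_targets=dmas, id=ids)
print(d)
```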
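d = {'profile': {}}
# compare against None so that valid falsy inputs such as 0 are still stored
if ids is not None:
    d['profile']['ids'] = ids
if dmas is not None:
    d['profile']['dma_targets'] = dmas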
If I understand correctly you want a nested dictionary with different types (python allows you to do this).
from collections import defaultdict
d = defaultdict(lambda: defaultdict(list))
d['a']['b'].append('bla')
d[15] = 15

How can i convert the dictionary items into flat strings in python

I have the dictionary of items from which i am generating the URLs like this
request.build_absolute_uri("/myurl/" + urlencode(myparams))
The output i am getting is like this
number=['543543']&region=['5,36,37']
but i want the url to be
number=543543&region=5,36,37
all those items are in myparams dictionary
You'll probably find for ease of use that passing doseq=True will be useful (albeit not exactly what you want - but does mean that any url parsing library should be able to handle the input without custom coding...)
>>> from urllib.parse import urlencode  # Python 3; in Python 2 this was `from urllib import urlencode`
>>> a = list(range(3))
>>> urlencode({'test': a})
'test=%5B0%2C+1%2C+2%5D'
>>> urlencode({'test': a}, True)
'test=0&test=1&test=2'
Otherwise, you'll have to write custom code to ','.join(str(el) for el in your_list) for values in myparams where it's a list/similar... (then .split(',') the other end)
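The join approach could look like the sketch below. It assumes Python 3 and that myparams maps each name to a list of values, which is a guess based on the output shown in the question; passing safe=',' keeps the commas from being percent-encoded.

```python
from urllib.parse import urlencode

# assumed shape of myparams, guessed from the question's output
myparams = {'number': ['543543'], 'region': ['5,36,37']}

# collapse every list into one comma-separated string
flat = {k: ','.join(str(el) for el in v) for k, v in myparams.items()}

# safe=',' leaves commas unescaped in the query string
query = urlencode(flat, safe=',')
print(query)
```

The receiving end then has to split each value on ',' again, as noted above.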
It looks like myparams is a dict that has lists as values.
new_params = {k: v[0] for k, v in myparams.items()}
will construct a new dict that holds the first element of each list.

How do I make a list with the same name as a dictionary key?

I have a dictionary, containing several hundred entries, of format:
>>>dict
{'1620': 'aaaaaa'}
I would like to make new empty lists named '1620', etc. I have tried variations of the following but it doesn't recognize eachkey as a variable to be used when creating the list. Instead, it names the list literally "eachkey" and my key, in this example '1620', is not connected to the new list.
>>>for eachkey in dict.keys():
>>> eachkey=[]
>>>
>>>eachkey
[]
>>>'1620'
1620
Edited to add:
Maybe I could make the list at the same time as I make the dictionary? Slip it in here below? The str(eachfile[-4:]) is what I want the list named.
files=open(sys.argv[1])
dict={}
for eachfile in files:
    value=open(eachfile)
    key=str(eachfile[-4:])
    dict[key]=value
    eachfile.close()
Edit: it would be fine for me to add letters along w/ the numbers if that's what it needs.
I don't think it's possible to change the integer literal 1620 so that it gives you an object other than the integer 1620. Similarly I don't think you can change the string literal '1620' to give you a list instead of a string.
You could do it if you prefix the variable names with some letters to make them valid names. For example you could use my1620 instead of 1620. I wouldn't advise doing this, but it's possible:
>>> d = {'1620': 'aaaaaa'}
>>> for k, v in d.items():
...     locals()['my' + k] = []
...
>>> my1620
[]
With a dict like this:
d1 = {'foo':'bar', '1621':'hello'}
Try doing this:
d2 = dict((k,list()) for k in d1.keys())
Now d2 is:
{'1621': [], 'foo': []}
And you can reference your lists like so:
d2['1621'].append(20)
d2['foo'].append(5)
d2['foo'].append('zig')
Which makes d2:
{'1621': [20], 'foo': [5, 'zig']}
As Gareth said, it's VERY unlikely you really want to do what you're asking to do. This is probably better.
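For the edited part of the question (creating the lists while building the dictionary), a defaultdict(list) creates each list the first time its key is used. A minimal sketch; the file names below are hypothetical stand-ins for the lines read from sys.argv[1]:

```python
from collections import defaultdict

# hypothetical names standing in for the lines of the input file
filenames = ['sample_1620\n', 'sample_1621\n', 'other_1620\n']

lists_by_key = defaultdict(list)
for name in filenames:
    name = name.strip()   # drop the trailing newline before slicing
    key = name[-4:]       # last four characters, as in the question
    lists_by_key[key].append(name)

print(dict(lists_by_key))
```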

culling values in csv.DictReader

I'm working with a huge CSV that I am parsing with csv.DictReader. What would be the most efficient way to trim the data in the resulting dictionary based on the key name?
Say, just keep the keys that contain "JAN".
Thanks !
result = {key:val for key, val in row.items() if 'JAN' in key}
where row is a dictionary obtained from DictReader.
Okay, here's a dirt stupid example of using csv.DictReader with /etc/passwd
#!python
import csv
keepers = dict()
r = csv.DictReader(open('/etc/passwd', 'r'), delimiter=":",
                   fieldnames=('login', 'pw', 'uid', 'gid', 'gecos', 'homedir', 'shell'))
for i in r:
    # DictReader yields strings, so convert before comparing numerically
    if int(i['uid']) < 1:
        continue
    keepers[i['login']] = i
Now, trying to apply that to your question ... I'm only guessing that you were building a dictionary of dictionaries based on the phrase "from the resulting dictionary." It seems obvious that the reader object is going to return a dictionary for every input record. So there will be one resulting dictionary for every line of your file (assuming any of the common CSV "dialects").
Naturally I could have used if int(i['uid']) > 1 or if "Jan" in i['gecos'] and only added to my "keepers" when the condition holds. I wrote it this way to emphasize how you can easily skip the records you're not interested in, so that the rest of your for suite can do various interesting things with the records that are of interest.
However, this answer is so simple that I have to suspect that I'm not understanding the question. (I'm using ''/etc/passwd'' and a colon separated list simply because it's an extremely well known format and world-readable copies are readily available on Linux, Unix, and MacOS X systems).
You could do something like this:
>>> import csv
>>> with open('file.csv') as f:
...     culled = [{k: d[k] for k in d if "JAN" in k} for d in csv.DictReader(f)]
When I tried this on a simple CSV file with the following contents:
JAN11,FEB11,MAR11,APR11,MAY11,JUN11,JUL11,AUG11,SEP11,OCT11,NOV11,DEC11,JAN12,FEB12,MAR12,APR12
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32
I got the following result:
>>> with open('file.csv') as f:
...     culled = [{k: d[k] for k in d if "JAN" in k} for d in csv.DictReader(f)]
...
>>> culled
[{'JAN11': '1', 'JAN12': '13'}, {'JAN11': '17', 'JAN12': '29'}]
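Since the file in the question is described as huge, a generator may be preferable to building the whole list in memory. A minimal sketch; cull_rows is a made-up name, and io.StringIO stands in for the real file so the example is self-contained:

```python
import csv
import io

def cull_rows(f, needle):
    """Yield one filtered dict per CSV row, keeping only keys that contain needle."""
    for row in csv.DictReader(f):
        yield {k: v for k, v in row.items() if needle in k}

# in-memory stand-in for open('file.csv')
sample = io.StringIO("JAN11,FEB11,JAN12\n1,2,13\n17,18,29\n")
for culled in cull_rows(sample, "JAN"):
    print(culled)
```

Each row is filtered and handed over as it is read, so only one row is held at a time.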
