Searching for subfolders ("submaps") in folders - python

I want to search for folder ("map") names inside folders. To make it clearer what I want, this is the folder structure:
\All data
    \Submap1
        \SubSubmap1
            \some files
        \Subsubmap2
    \Submap2
    \Submap3
What I want to do is search for the sub-subfolders (SubSubmap1, Subsubmap2, ...) by their names.
I hope you can give me a head start, because I can't find any way to search for a folder by its name.

Let's use the explore() function from here to save the result of os.walk() into a dictionary.
After that, just iterate over the names and match them against a pattern.
My folders:
.\all_data
.\all_data\sub1
.\all_data\sub1\subsub1
.\all_data\sub1\subsub1\some_files
.\all_data\sub1\subsub2
.\all_data\sub2
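import os

def explore(starting_path):
    alld = {'': {}}
    # os.walk visits directories top-down, so every parent dict exists before its children
    for dirpath, dirnames, filenames in os.walk(starting_path):
        d = alld
        dirpath = dirpath[len(starting_path):]
        # follow the relative path down to this directory's dict
        for subd in dirpath.split(os.sep):
            based = d
            d = d[subd]
        if dirnames:
            for dn in dirnames:
                d[dn] = {}
        else:
            # leaf directory: store its list of files instead of a dict
            based[subd] = filenames
    return alld['']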
data = explore('.')
for k, v in data['all_data'].items():
    if v:
        for key in v:
            if 'subsub' in key:
                print(key)
>>> {'all_data': {'sub1': {'subsub1': {'some_files': []}, 'subsub2': []},
'sub2': []}}
>>> subsub2
>>> subsub1
You could use a smarter check than if 'subsub' in key: here, such as a regular expression.
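If you only need the matching folders rather than the whole tree, you can skip the intermediate dictionary and test each directory name from os.walk against a regular expression directly. This is a minimal Python 3 sketch, not the answer's method; the helper name find_dirs and the pattern are hypothetical, and the demo builds a throwaway copy of the folder layout listed above.

```python
import os
import re
import tempfile


def find_dirs(start_path, pattern):
    """Yield paths of directories below start_path whose *name* fully
    matches the regular expression `pattern` (hypothetical helper)."""
    regex = re.compile(pattern)
    for dirpath, dirnames, _filenames in os.walk(start_path):
        for name in dirnames:
            # Match the folder name only, not the full path.
            if regex.fullmatch(name):
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    # Build a throwaway copy of the folder layout from the answer.
    with tempfile.TemporaryDirectory() as root:
        for rel in ("all_data/sub1/subsub1/some_files",
                    "all_data/sub1/subsub2",
                    "all_data/sub2"):
            os.makedirs(os.path.join(root, *rel.split("/")))
        # (?i) makes the match case-insensitive, so "SubSubmap1" would match too.
        for path in sorted(find_dirs(root, r"(?i)subsub\w*")):
            print(os.path.relpath(path, root))
```

re.fullmatch requires Python 3.4 or later; use re.search instead if a partial match anywhere in the name is enough.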

Related

Trying to pull all the values that match a key into one key

I've been trying to group all the values that match a key under that one key. I can't wrap my head around how to do this. Please help.
list_dir = ['192586_Sample_010_Test.pdf', '192586_Sample_020_Test.pdf', '192586_Sample_050_Test.pdf', '192120_Sample_020_Test.pdf', '192120_Sample_050_Test.pdf', '192163_Sample_010_Test.pdf', '192163_Sample_020_Test.pdf', '192145_Sample_010_Test.pdf', '192145_Sample_020_Test.pdf', '192145_Sample_050_Test.pdf', '192051_Sample_010_Test.pdf', '192051_Sample_020_Test.pdf', '192051_Sample_050_Test.pdf']
dict = {}
match = []
for i in list_dir:
    match.append((i.split("_", 1)[-2]))
for i in match:
    for x in list_dir:
        if i in x:
            dict[i] = list_dir
print(dict)
The output I'm looking for is:
{'192586': ['192586_Sample_010_Test.pdf', '192586_Sample_020_Test.pdf', '192586_Sample_050_Test.pdf'],
 '192120': ['192120_Sample_020_Test.pdf', '192120_Sample_050_Test.pdf'],
 '192163': ['192163_Sample_010_Test.pdf', '192163_Sample_020_Test.pdf'],
 '192145': ['192145_Sample_010_Test.pdf', '192145_Sample_020_Test.pdf', '192145_Sample_050_Test.pdf'],
 '192051': ['192051_Sample_010_Test.pdf', '192051_Sample_020_Test.pdf', '192051_Sample_050_Test.pdf']}
Just extract the key from the string and check whether it's already in the dict. If it is, append to its list; otherwise, create a new list.
Like this:
dct = {}
for i in list_dir:
    key = i.split("_")[0]
    if key in dct:
        dct[key].append(i)
    else:
        dct[key] = [i]
print(dct)
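The same grouping can be written with collections.defaultdict, which creates the empty list the first time a key is seen, so the if/else branch disappears. A minimal Python 3 sketch, run here on a shortened copy of list_dir:

```python
from collections import defaultdict

# A few entries from the question's list_dir (shortened for the example).
list_dir = ['192586_Sample_010_Test.pdf', '192586_Sample_020_Test.pdf',
            '192120_Sample_020_Test.pdf', '192163_Sample_010_Test.pdf']

# defaultdict(list) returns a fresh empty list for a missing key.
grouped = defaultdict(list)
for name in list_dir:
    prefix = name.split("_", 1)[0]  # text before the first underscore
    grouped[prefix].append(name)

print(dict(grouped))
```

With a plain dict, the equivalent one-liner inside the loop is dct.setdefault(key, []).append(i).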

Recursively accessing paths and values of a nested dictionary

In Python 2.7, how does one dynamically access and print out the keys and values of a nested dictionary? Here's a nonsensical example: https://jsoneditoronline.org/?id=da7a486dc2e24bf8b94add9f04c71b4d
Normally, I would do something like:
import json
json_sample = 'sample_dict.json'
json_file = open(json_sample, 'r')
json_data = json.load(json_file)
items = json_data['sample_dict']
for item in items:
    dict_id = item['dict_id']
    person = item['person']['person_id']
    family = item['family']['members']
    print dict_id
    print person
    print family
I can hard-code it like this and it gives me the desired results, but how would I access each of the keys and values dynamically so that:
The first row just prints the keys (dict_id, person['person_id'], person['name'], family['members']['father'])
The second row prints the values respectively (5, 15, "Martin", "Jose")
The end result should be in a CSV file.
You can use a recursive visitor/generator which returns all the path/value pairs of the leaves:
def visit_dict(d, path=[]):
    for k, v in d.items():
        if not isinstance(v, dict):
            yield path + [k], v
        else:
            yield from visit_dict(v, path + [k])
(yield from was added in Python 3.3; in Python 2.7 or earlier Python 3 versions, replace the last line with: for item in visit_dict(v, path + [k]): yield item)
Getting the keys:
>>> ','.join('/'.join(k) for k, v in visit_dict(json_data['sample_dict'][0]))
'dict_id,person/person_id,person/name,person/age,family/person_id,family/members/father,family/members/mother,family/members/son,family/family_id,items_id,furniture/type,furniture/color,furniture/size,furniture/purchases'
and the values:
>>> ','.join(str(v) for k, v in visit_dict(json_data['sample_dict'][0]))
'5,15,Martin,18,20,Jose,Maddie,Jerry,2,None,Chair,Brown,Large,[]'
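Since the end result should be a CSV file, the two joined rows above can be handed to the standard csv module, which also takes care of quoting values that contain commas. This is a minimal Python 3 sketch, assuming one header row of paths and one row of values per record as described in the question; the helper dict_to_csv_rows and the sample record (a trimmed, made-up stand-in for the linked JSON) are hypothetical. It uses a tuple for path to avoid a mutable default argument.

```python
import csv
import io


def visit_dict(d, path=()):
    """Yield (path, value) pairs for every non-dict leaf of a nested dict."""
    for k, v in d.items():
        if isinstance(v, dict):
            yield from visit_dict(v, path + (k,))
        else:
            yield path + (k,), v


def dict_to_csv_rows(record):
    """Return [header_row, value_row] for one nested record (keys must be str)."""
    pairs = list(visit_dict(record))
    header = ["/".join(p) for p, _ in pairs]
    values = [v for _, v in pairs]
    return [header, values]


# Hypothetical record shaped like the question's sample (not the real file).
record = {"dict_id": 5,
          "person": {"person_id": 15, "name": "Martin"},
          "family": {"members": {"father": "Jose"}}}

buf = io.StringIO()  # use open("out.csv", "w", newline="") to write a real file
csv.writer(buf).writerows(dict_to_csv_rows(record))
print(buf.getvalue())
```

For several records, write the header once and then append only the value row of each record, assuming they all share the same keys.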

Searching a dictionary in Python 2

I have created a dictionary in Python; this is some sample output from it.
filesAndHashes = dict()
...
>>> print filesAndHashes
{
"/home/rob/Desktop/test.txt":"1c52fe8fbb1463d541c2d971d9890c24",
"/home/rob/Desktop/file.dat":"6386ba70e82f11aa027bfc9874cd58cb",
"/home/rob/Desktop/test2.exe":"5b73c2a88fab97f558a07d40cc1e9d8e"
}
Each entry is just a file path and the MD5 hash of that file.
I have found some MD5 hashes of interest and put them in a list. Now I want to search the dictionary for each MD5 in my list and get back the file path for each hash.
Because of the way the program works, there will never be an MD5 in my list that isn't in the dictionary, so I'm not worried about error checking for that.
Please feel free to ask for more information.
Thanks.
You have a path -> hash mapping, but you need a hash -> path mapping. Assuming the hashes are unique, reverse the dictionary:
>>> filesAndHashes = {'foo': '123', 'bar': '456'}
>>> hashesAndFiles = {hash:fname for fname,hash in filesAndHashes.iteritems()}
>>> hashesAndFiles
{'123': 'foo', '456': 'bar'}
Now just iterate over your list and report matches:
>>> hashes = ['456']
>>> for hash in hashes:
...     filename = hashesAndFiles[hash]
...     print(filename)
...
bar
If you cannot rule out duplicate hashes (which is possible in theory), use a defaultdict.
>>> from collections import defaultdict
>>> hashesAndFiles = defaultdict(list)
>>>
>>> filesAndHashes = {'foo': '123', 'bar': '456', 'baz': '456'}
>>> for fname, hash in filesAndHashes.items():
...     hashesAndFiles[hash].append(fname)
...
>>> hashesAndFiles
defaultdict(<type 'list'>, {'123': ['foo'], '456': ['baz', 'bar']})
>>>
>>> hashes = ['456']
>>> for hash in hashes:
...     for filename in hashesAndFiles[hash]:
...         print(filename)
...
baz
bar
Catch KeyErrors as needed (from your question, I assumed you don't expect any non-existent hashes in your list).
Reverse the dictionary so that the keys are the hashes, since you want to search by hash.
Then simply look up each hash in the reversed dictionary with filesAndHashes_reversed.get(hash_value, None), which returns None instead of raising KeyError when the hash is missing:
filesAndHashes_reversed = {value: key for key, value in filesAndHashes.iteritems()}
hash_list = [hash_1, hash_2, hash_3]  # your hashes of interest
for hash in hash_list:
    filename = filesAndHashes_reversed.get(hash, None)
    if filename is None:
        print("Not Found")
    else:
        print(filename)
You probably aren't using the right approach, but first I'll answer the question as asked.
To find the FIRST match you can do this:
def find_item(md5hash):
    for k, v in filesAndHashes.iteritems():
        if v == md5hash:
            return k
    return None  # no entry has this hash
Note that this returns the first match. In theory, multiple entries could have the same hash, but the OP has said that the hashes are expected to be unique. In that case, why not use them as the keys? That makes them easy to search for:
hashes_and_files = dict()
hashes_and_files["1c52fe8fbb1463d541c2d971d9890c24"]="/home/rob/Desktop/test.txt"
hashes_and_files["6386ba70e82f11aa027bfc9874cd58cb"]="/home/rob/Desktop/file.dat"
hashes_and_files["5b73c2a88fab97f558a07d40cc1e9d8e"]="/home/rob/Desktop/test2.exe"
# finding is trivial
find_hash = "5b73c2a88fab97f558a07d40cc1e9d8e"
file_name = hashes_and_files[find_hash]
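For comparison with the find_item loop, the same linear first-match scan can be written in Python 3 with next() over a generator expression, where the second argument to next() is the value returned when nothing matches. A minimal sketch; passing the dictionary in as a parameter (rather than using a global) is a choice made here, not part of the original answer.

```python
files_and_hashes = {
    "/home/rob/Desktop/test.txt": "1c52fe8fbb1463d541c2d971d9890c24",
    "/home/rob/Desktop/file.dat": "6386ba70e82f11aa027bfc9874cd58cb",
    "/home/rob/Desktop/test2.exe": "5b73c2a88fab97f558a07d40cc1e9d8e",
}


def find_item(mapping, md5hash):
    """Return the first path whose hash equals md5hash, or None (O(n) scan)."""
    return next((path for path, h in mapping.items() if h == md5hash), None)


print(find_item(files_and_hashes, "6386ba70e82f11aa027bfc9874cd58cb"))
```

This still scans the whole dictionary per lookup, so for many lookups building the reversed dictionary once is the better choice.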

Count unique values per unique keys in python dictionary

I have a dictionary like this:
yahoo.com|98.136.48.100
yahoo.com|98.136.48.105
yahoo.com|98.136.48.110
yahoo.com|98.136.48.114
yahoo.com|98.136.48.66
yahoo.com|98.136.48.71
yahoo.com|98.136.48.73
yahoo.com|98.136.48.75
yahoo.net|98.136.48.100
g03.msg.vcs0|98.136.48.105
It has repeated keys and values. What I want is a final dictionary with unique keys (IPs) and, for each one, the count of unique values (domains). I already have the code below:
for dirpath, dirs, files in os.walk(path):
    for filename in fnmatch.filter(files, '*.txt'):
        with open(os.path.join(dirpath, filename)) as f:
            for line in f:
                if line.startswith('.'):
                    ip = line.split('|',1)[1].strip('\n')
                    semi_domain = (line.rsplit('|',1)[0]).split('.',1)[1]
                    d[ip]= semi_domains
                    if ip not in d:
                        key = ip
                        val = [semi_domain]
                        domains_per_ip[key]= val
But this is not working properly. Can somebody help me with this?
Use a defaultdict:
from collections import defaultdict
d = defaultdict(set)
with open('somefile.txt') as the_file:
    for line in the_file:
        if line.strip():
            # strip first so the IP key does not keep a trailing newline
            value, key = line.strip().split('|')
            d[key].add(value)
for k,v in d.iteritems(): # use d.items() in Python3
    print('{} - {}'.format(k, len(v)))
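The same defaultdict(set) idea can be wrapped in a Python 3 function that returns the final dictionary (IP mapped to the number of distinct domains) instead of printing it. This is a sketch assuming every non-blank line has the form domain|ip; the function name unique_domains_per_ip is hypothetical, and it does not reproduce the asker's semi_domain trimming. Any iterable of lines works, including an open file object.

```python
from collections import defaultdict


def unique_domains_per_ip(lines):
    """Map each IP to the number of distinct domains seen with it.

    Expects lines of the form "domain|ip"; blank lines are skipped.
    """
    domains = defaultdict(set)
    for line in lines:
        line = line.strip()  # drop the trailing newline before splitting
        if not line:
            continue
        domain, ip = line.split("|", 1)
        domains[ip].add(domain)
    return {ip: len(names) for ip, names in domains.items()}


# A few lines from the question's data.
sample = """yahoo.com|98.136.48.100
yahoo.com|98.136.48.105
yahoo.net|98.136.48.100
g03.msg.vcs0|98.136.48.105
"""
print(unique_domains_per_ip(sample.splitlines()))
```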
You can use the zip function to separate the IPs and domains into two lists, then use set to get the unique entries:
>>>f=open('words.txt','r').readlines()
>>> zip(*[i.split('|') for i in f])
[('yahoo.com', 'yahoo.com', 'yahoo.com', 'yahoo.com', 'yahoo.com', 'yahoo.com', 'yahoo.com', 'yahoo.com', 'yahoo.net', 'g03.msg.vcs0'), ('98.136.48.100\n', '98.136.48.105\n', '98.136.48.110\n', '98.136.48.114\n', '98.136.48.66\n', '98.136.48.71\n', '98.136.48.73\n', '98.136.48.75\n', '98.136.48.100\n', '98.136.48.105')]
>>> [set(dom) for dom in zip(*[i.split('|') for i in f])]
[set(['yahoo.com', 'g03.msg.vcs0', 'yahoo.net']), set(['98.136.48.71\n', '98.136.48.105\n', '98.136.48.100\n', '98.136.48.105', '98.136.48.114\n', '98.136.48.110\n', '98.136.48.73\n', '98.136.48.66\n', '98.136.48.75\n'])]
Then, with len, you can find the number of unique objects, all in one line with a list comprehension:
>>> [len(i) for i in [set(dom) for dom in zip(*[i.split('|') for i in f])]]
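[3, 9]
Note that 9 is one too many: the last line has no trailing newline, so '98.136.48.105' and '98.136.48.105\n' are counted as different IPs. Splitting i.strip() instead of i gives the correct count of 8 unique IPs. Also note that this counts unique domains and IPs over the whole file, not unique domains per IP.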

How to parse a directory structure into dictionary?

I have a list of directory paths such as:
['/a/b', '/a/b/c', '/a/b/c/d', '/a/b/c/e', '/a/b/c/f/g', '/a/b/c/f/h', '/a/b/c/f/i']
I want to convert it into a dict with a tree structure, like this:
{'/': {'a': {'b': {'c':
                    [{'d': None},
                     {'e': None},
                     {'f': [{'g': None}, {'h': None}, {'i': None}]}
                    ]
                  }
            }
      }
}
I'm stuck on where to start. Which data structure would be suitable?
Thanks.
Basically:
lst = ['/a/b', '/a/b/c', '/a/b/c/d', '/a/b/c/e', '/a/b/c/f/g', '/a/b/c/f/h', '/a/b/c/f/i']
dct = {}
for item in lst:
    p = dct
    for x in item.split('/'):
        p = p.setdefault(x, {})
print dct
produces
{'': {'a': {'b': {'c': {'e': {}, 'd': {}, 'f': {'i': {}, 'h': {}, 'g': {}}}}}}}
This is not exactly your structure, but it should give you the basic idea.
As Sven Marnach said, the output data structure should be more consistent, e.g. only nested dictionaries, where folders map to dicts and files map to None.
Here is a script that uses os.walk. It does not take a list as input, but it should do what you want if your end goal is to parse the file system.
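Building on the setdefault loop, a second pass can turn it into the consistent shape suggested here: nested dicts for folders and None for entries that have no children. This Python 3 sketch is an assumption-laden variant, not the asker's list-of-dicts format; the function name paths_to_tree is hypothetical, and because only paths are given, an empty directory is indistinguishable from a file and also becomes None.

```python
def paths_to_tree(paths):
    """Build nested dicts from '/'-separated paths; childless entries become None."""
    root = {}
    for path in paths:
        node = root.setdefault("/", {})
        for part in path.strip("/").split("/"):
            if part:  # skip empty pieces, e.g. from "a//b"
                node = node.setdefault(part, {})

    def finalize(node):
        # An entry with no children is treated as a leaf: replace {} with None.
        return {name: (finalize(child) if child else None)
                for name, child in node.items()}

    return finalize(root)


lst = ['/a/b', '/a/b/c', '/a/b/c/d', '/a/b/c/e',
       '/a/b/c/f/g', '/a/b/c/f/h', '/a/b/c/f/i']
print(paths_to_tree(lst))
```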
import os
from pprint import pprint

def set_leaf(tree, branches, leaf):
    """ Set a terminal element to *leaf* within nested dictionaries.
    *branches* defines the path through dictionaries.
    Example:
    >>> t = {}
    >>> set_leaf(t, ['b1','b2','b3'], 'new_leaf')
    >>> print(t)
    {'b1': {'b2': {'b3': 'new_leaf'}}}
    """
    if len(branches) == 1:
        tree[branches[0]] = leaf
        return
    if branches[0] not in tree:
        tree[branches[0]] = {}
    set_leaf(tree[branches[0]], branches[1:], leaf)

startpath = '.'
tree = {}
for root, dirs, files in os.walk(startpath):
    branches = [startpath]
    if root != startpath:
        # relpath uses the OS separator, so split on os.sep rather than '/'
        branches.extend(os.path.relpath(root, startpath).split(os.sep))
    # folders start as empty dicts (filled in as the walk reaches them); files map to None
    set_leaf(tree, branches, dict([(d, {}) for d in dirs] +
                                  [(f, None) for f in files]))
print('tree:')
pprint(tree)
Start by looking at os.listdir or os.walk. They let you traverse directories recursively, either automatically (os.walk) or semi-automatically (os.listdir, which you call yourself on each subdirectory). You could then store what you find in a dictionary.
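The "semi-automatic" os.listdir route means doing the recursion yourself. A minimal Python 3 sketch, assuming the same convention as the previous answer (folders map to dicts, files map to None); the function name dir_tree is hypothetical, and the demo builds a throwaway directory so it runs anywhere.

```python
import os
import tempfile


def dir_tree(path):
    """Recursively map a directory: sub-folders -> dict, files -> None."""
    tree = {}
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        # os.path.isdir follows symlinks, so a symlink pointing back up the
        # tree would make this recurse without end.
        tree[name] = dir_tree(full) if os.path.isdir(full) else None
    return tree


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "a", "b"))
        open(os.path.join(root, "a", "b", "c.txt"), "w").close()
        print(dir_tree(root))  # {'a': {'b': {'c.txt': None}}}
```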
