I have a dictionary like this:
yahoo.com|98.136.48.100
yahoo.com|98.136.48.105
yahoo.com|98.136.48.110
yahoo.com|98.136.48.114
yahoo.com|98.136.48.66
yahoo.com|98.136.48.71
yahoo.com|98.136.48.73
yahoo.com|98.136.48.75
yahoo.net|98.136.48.100
g03.msg.vcs0|98.136.48.105
in which I have repeated keys and values. What I want is a final dictionary with unique keys (IPs) and a count of the unique values (domains) per key. I already have the code below:
for dirpath, dirs, files in os.walk(path):
    for filename in fnmatch.filter(files, '*.txt'):
        with open(os.path.join(dirpath, filename)) as f:
            for line in f:
                if line.startswith('.'):
                    ip = line.split('|', 1)[1].strip('\n')
                    semi_domain = (line.rsplit('|', 1)[0]).split('.', 1)[1]
                    d[ip] = semi_domains
                    if ip not in d:
                        key = ip
                        val = [semi_domain]
                        domains_per_ip[key] = val
but this is not working properly. Can somebody help me out with this?
Use a defaultdict:
from collections import defaultdict
d = defaultdict(set)
with open('somefile.txt') as the_file:
    for line in the_file:
        if line.strip():
            value, key = line.strip().split('|')
            d[key].add(value)

for k, v in d.iteritems():  # use d.items() in Python 3
    print('{} - {}'.format(k, len(v)))
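With the sample data above, this should print something like the following (in arbitrary order, since plain dicts are unordered here):
98.136.48.100 - 2
98.136.48.105 - 2
98.136.48.110 - 1
98.136.48.114 - 1
98.136.48.66 - 1
98.136.48.71 - 1
98.136.48.73 - 1
98.136.48.75 - 1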
You can use the zip function to separate the IPs and domains into two lists (stripping the trailing newlines first), then use set to get the unique entries:
>>> f = open('words.txt', 'r').readlines()
>>> zip(*[i.strip().split('|') for i in f])
[('yahoo.com', 'yahoo.com', 'yahoo.com', 'yahoo.com', 'yahoo.com', 'yahoo.com', 'yahoo.com', 'yahoo.com', 'yahoo.net', 'g03.msg.vcs0'), ('98.136.48.100', '98.136.48.105', '98.136.48.110', '98.136.48.114', '98.136.48.66', '98.136.48.71', '98.136.48.73', '98.136.48.75', '98.136.48.100', '98.136.48.105')]
>>> [set(dom) for dom in zip(*[i.strip().split('|') for i in f])]
[set(['yahoo.com', 'g03.msg.vcs0', 'yahoo.net']), set(['98.136.48.71', '98.136.48.105', '98.136.48.100', '98.136.48.114', '98.136.48.110', '98.136.48.73', '98.136.48.66', '98.136.48.75'])]
and then with len you can find the number of unique objects, all in one line with a list comprehension:
>>> [len(i) for i in [set(dom) for dom in zip(*[i.strip().split('|') for i in f])]]
[3, 8]
For some reason my code refuses to convert to uppercase and I can't figure out why. I'm trying to write the dictionary to a file, with the uppercase dictionary values being inserted into a sort of template file.
#!/usr/bin/env python3
import fileinput
from collections import Counter

# take every word from a file and put into dictionary
newDict = {}
dict2 = {}
with open('words.txt', 'r') as f:
    for line in f:
        k, v = line.strip().split(' ')
        newDict[k.strip()] = v.strip()
print(newDict)
choice = input('Enter 1 for all uppercase keys or 2 for all lowercase, 3 for capitalized case or 0 for unchanged \n')
print("Your choice was " + choice)
if choice == 1:
    for k, v in newDict.items():
        newDict.update({k.upper(): v.upper()})
if choice == 2:
    for k, v in newDict.items():
        dict2.update({k.lower(): v})
# find keys and replace with word
print(newDict)
with open("tester.txt", "rt") as fin:
    with open("outwords.txt", "wt") as fout:
        for line in fin:
            fout.write(line.replace('{PETNAME}', str(newDict['PETNAME:'])))
            fout.write(line.replace('{ACTIVITY}', str(newDict['ACTIVITY:'])))
myfile = open("outwords.txt")
txt = myfile.read()
print(txt)
myfile.close()
In Python 3 you cannot do this:
for k, v in newDict.items():
    newDict.update({k.upper(): v.upper()})
because it changes the dictionary while iterating over it, and Python doesn't allow that (it doesn't happen in Python 2 because items() returned a copy of the entries as a list). Besides, even if it worked, it would keep the old keys as well (also: creating a new dictionary at each iteration is very slow...)
Instead, rebuild your dict in a dict comprehension:
newDict = {k.upper():v.upper() for k,v in newDict.items()}
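For example, with made-up values for the two keys the template expects:
>>> newDict = {'petname:': 'spot', 'activity:': 'runs'}
>>> {k.upper(): v.upper() for k, v in newDict.items()}
{'PETNAME:': 'SPOT', 'ACTIVITY:': 'RUNS'}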
You should not change dictionary items as you iterate over them. The docs state:
Iterating views while adding or deleting entries in the dictionary may
raise a RuntimeError or fail to iterate over all entries.
One way to update your dictionary as required is to pop values and reassign them in a for loop over a snapshot of the items, so you are not iterating the dictionary itself while changing it. For example:
d = {'abc': 'xyz', 'def': 'uvw', 'ghi': 'rst'}
for k, v in list(d.items()):
    d[k.upper()] = d.pop(k).upper()
print(d)
{'ABC': 'XYZ', 'DEF': 'UVW', 'GHI': 'RST'}
An alternative is a dictionary comprehension, as shown by @Jean-François Fabre.
I have a text file and its content is something like this:
A:3
B:5
C:7
A:8
C:6
I need to print:
A numbers: 3, 8
B numbers: 5
C numbers: 7, 6
I'm a beginner so if you could give some help I would appreciate it. I have made a dictionary but that's pretty much all I know.
You could use an approach that keeps the values in a dictionary:
d = {}  # create an empty dictionary
for line in open(filename):  # open the file and iterate over its lines
    k, v = line.strip().split(':')  # split each line into the part before ':' and the part after
    if k in d:  # add the value to the dictionary
        d[k].append(v)
    else:
        d[k] = [v]
This gives you a dictionary containing your file in a format that you can utilize to get the desired output:
for key, values in sorted(d.items()):
    print(key, 'numbers:', ', '.join(values))
The sorted is required because dictionaries are unordered.
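With the example file, this prints exactly the desired output:
A numbers: 3, 8
B numbers: 5
C numbers: 7, 6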
Note that using collections.defaultdict instead of a normal dict could simplify the approach somewhat. This:
d = {}
...
if k in d:  # add the values to the dictionary
    d[k].append(v)
else:
    d[k] = [v]
could then be replaced by:
from collections import defaultdict
d = defaultdict(list)
...
d[k].append(v)
Short version (which should sort in alphabetical order):
d = {}
lines = [line.rstrip('\n') for line in open('filename.txt')]
[d.setdefault(line[0], []).append(line[2]) for line in lines]
[print(key, 'numbers:', ', '.join(values)) for key,values in sorted(d.items())]
Or, if you want to maintain the order in which they appear in the file (file order):
from collections import OrderedDict
d = OrderedDict() # Empty dict
lines = [line.rstrip('\n') for line in open('filename.txt')] # Get the lines
[d.setdefault(line[0], []).append(line[2]) for line in lines] # Add lines to dictionary
[print(key, 'numbers:', ', '.join(values)) for key,values in d.items()] # Print lines
Tested with Python 3.5.
You can treat your file as CSV (comma-separated values), so you can use the csv module to parse it in one line. Then use defaultdict, passing the list class to its constructor so that a new list is created whenever a key does not exist yet. Then use the OrderedDict class, because a standard dictionary doesn't keep the order of your keys.
import csv
from collections import defaultdict, OrderedDict

values = list(csv.reader(open('your_file_name'), delimiter=":"))  # [['A', '3'], ['B', '5'], ['C', '7'], ['A', '8'], ['C', '6']]
dct_values = defaultdict(list)
for k, v in values:
    dct_values[k].append(v)
dct_values = OrderedDict(sorted(dct_values.items()))
Then you can simply print it by iterating over the dictionary.
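For instance, a minimal sketch that matches the output format asked for in the question:
for key, values in dct_values.items():
    print(key, 'numbers:', ', '.join(values))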
A very easy way to group by key is with an external library; if you are interested, try PyFunctional.
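For illustration, a rough, untested sketch, assuming PyFunctional's seq, group_by, and sorted behave as documented:
from functional import seq  # pip install pyfunctional
with open('filename.txt') as f:
    pairs = [line.strip().split(':') for line in f if line.strip()]
# group_by yields (key, [elements]) pairs; sorted orders them alphabetically
for key, group in seq(pairs).group_by(lambda p: p[0]).sorted(lambda kv: kv[0]):
    print(key, 'numbers:', ', '.join(v for _, v in group))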
I want to search for folder ("map") names inside folders. To make it more clear what I want, this is the folder structure:
\All data
    \Submap1
        \SubSubmap1
            \some files
        \Subsubmap2
    \Submap2
    \Submap3
What I want to do is search for SubSubmaps, by the name of the subsubmap.
I hope you guys can give me a head start, because I can't find any way to search by the name of a folder.
Let's use the explore() function from here to save the result of os.walk() into a dictionary.
After that, just iterate over the names and match them against a pattern.
My folders:
.\all_data
.\all_data\sub1
.\all_data\sub1\subsub1
.\all_data\sub1\subsub1\some_files
.\all_data\sub1\subsub2
.\all_data\sub2
import os

def explore(starting_path):
    alld = {'': {}}
    for dirpath, dirnames, filenames in os.walk(starting_path):
        d = alld
        dirpath = dirpath[len(starting_path):]
        for subd in dirpath.split(os.sep):
            based = d
            d = d[subd]
        if dirnames:
            for dn in dirnames:
                d[dn] = {}
        else:
            based[subd] = filenames
    return alld['']

data = explore('.')
for k, v in data['all_data'].iteritems():
    if v:
        for key in v:
            if 'subsub' in key:
                print key
>>> {'all_data': {'sub1': {'subsub1': {'some_files': []}, 'subsub2': []}, 'sub2': []}}
>>> subsub2
>>> subsub1
You could use smarter verification here than if 'subsub' in key:, such as a regex.
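For example, a minimal sketch (the exact pattern is only an illustration):
import re
pattern = re.compile(r'^subsub\d+$')
for key in v:
    if pattern.match(key):
        print key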
I have an option file in this format:
key value\t\n
N.B.: some values have a tab after them.
I use code like:
src = open("conf.cfg").readlines()
dict_ = {}
for item in src:
    item = item.strip().split(" ")[0:2]
    key = item[0]
    value = item[1]
    dict_[key] = value
Can I use a generator expression to get the same result?
You could use a dictionary comprehension, for example:
with open("conf.cfg") as f:
dict_ = {key: value
for key, value in (line.strip().split(" ")[:2]
for line in f)}
Maybe this way:
with open('config.txt') as f:
    dict_ = {k: v for k, v in (line.split() for line in f.readlines())}
Yes, you can use a generator expression; it would look like this:
with open("conf.cfg") as f:
    dict_ = dict(line.split() for line in f)
I am assuming you don't have spaces in the keys or values. If you do have spaces in them (and they are "quoted" strings), then it will be easier to read your file with the csv module.
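For instance, a minimal sketch of that csv-based variant (the delimiter choice is an assumption based on the space-separated format shown):
import csv
with open("conf.cfg") as f:
    dict_ = {row[0]: row[1] for row in csv.reader(f, delimiter=" ") if row}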
Using a generator function:
def myGenerator():
    with open("conf.cfg") as f:
        for line in f.readlines():
            yield line.split()

result = {k: v for k, v in myGenerator()}
print(result)
I have a dictionary with many, many key/value pairs.
The keys are dates and the values are worldwide top-level domains.
I want to output the dictionary to a text file so that it counts and alpha sorts similar values but only within the same key
for example:
key: value1:count value2:count
date1: au:4 be:12 com:44
date2: az:4 com:14 net:5
Code:
with open('access_logshort.txt', 'rU') as f:
    for line in f:
        list1 = re.search(r'(?P<Date>[0-9]{2}/[a-zA-Z]{3}/[0-9]{4})(.+)(GET|POST)\s(http://|https://)([a-zA-Z.]+)(\.)(?P<tld>[a-zA-Z]+)(/).+?"\s200', line)
        if list1 != None:
            print list1.groupdict()
            one_tuple = list1.group(1, 7)
            my_dict[one_tuple[0]] = one_tuple[1]
Output:
print my_dict
{'09/Mar/2004': 'hu'}
{'09/Mar/2004': 'hu'}
{'09/Mar/2004': 'com'}
{'09/Mar/2004': 'ru'}
{'09/Mar/2004': 'ru'}
{'09/Mar/2004': 'com'}
This should suit your case.
from collections import defaultdict
from dateutil.parser import parse
import csv
import re

data = defaultdict(lambda: defaultdict(int))
with open('access_logshort.txt', 'rU') as f:
    for line in f:
        list1 = re.search(r'(?P<Date>[0-9]{2}/[a-zA-Z]{3}/[0-9]{4})(.+)(GET|POST)\s(http://|https://)([a-zA-Z.]+)(\.)(?P<tld>[a-zA-Z]+)(/).+?"\s200', line)
        if list1 is not None:
            date, domain = list1.group(1, 7)
            data[date][domain.lower()] += 1

with open('my_data.csv', 'wb') as ofile:
    # add delimiter='\t' to the argument list of csv.writer if you want
    # tsv rather than csv
    writer = csv.writer(ofile)
    for key, value in sorted(data.iteritems(), key=lambda x: parse(x[0])):
        domains = sorted(value.iteritems())
        writer.writerow([key] + ['{}:{}'.format(*d) for d in domains])
Output:
09/Mar/2004,com:2,hu:2,ru:2
10/Mar/2004,com:2,hu:2,ru:2