How to store regex results in a dictionary in Python?

How do I store regex results inside a dictionary? I have a text file stored on my computer and I want to go through it line by line. I need to use a regex to get some information. So far I'm good. My problem is that I do not know how to store it in a dictionary: only the information that I need, not the other stuff that comes with it.
I want to save the data as the key, with the value being how many times it appeared.
Input: syslog.log ... output: the email address and how many times it appeared.

You can use the approach below: first read the file and create an empty dictionary. Then, for each line, check if that line is already in the dictionary; if so, increment the value, otherwise initialise the value with 1.
d = {}
with open('input.txt', 'r') as fp:
    for data in fp.readlines():
        data = data.strip()
        if data in d:
            d[data] += 1
        else:
            d[data] = 1
print(d)
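This counts whole lines. Since the question specifically asks about counting email addresses matched by a regex, here is a minimal sketch of the same counting idea applied to regex matches (the pattern and the syslog.log filename are assumptions based on the question, not part of the original answer):

import re
from collections import Counter

# Naive email pattern - an assumption; adjust it to your log format.
EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+\.[\w.-]+')

counts = Counter()
with open('syslog.log') as fp:
    for line in fp:
        # findall() returns every match in the line; each match becomes a key
        counts.update(EMAIL_RE.findall(line))

print(dict(counts))  # e.g. {'user@example.com': 3, ...}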

Related

How can I convert a text file to a dictionary to be used in Python?

I am doing a small programming project for school. One of the main elements of my program is to allow the user to enter a problem (e.g. "my phone screen is blank"), then iterate through each word of the string and compare it against a dictionary, and the dictionary must be an external text file. I would like help on the dictionary aspect of this. I have tried a few other methods from different Stack Overflow posts but they do not seem to work.
Some lines of my dictionary include:
battery Have you tried charging your phone? You may need to replace your battery.
sound Your volume may be on 0 or your speakers may be broken, try using earphones.
blank Your screen may be dead, you may need a replacement.
Here, "battery,"sound" and "blank" are they keys with their respective values following on after, how do I plan on doing this?
Thank you!
[edited code]
def answerInput():
    print("Hello, how may I help you with your mobile device")
    a = input("Please enter your query below\n")
    return a

def solution(answer):
    my_dict = {}
    with open("Dictionary.txt") as f:
        for line in f:
            key, value = line.strip("\n").split(maxsplit=1)
            my_dict[key] = value
    for word in answer.split():
        if word in my_dict:       # test the word against the keys
            print(my_dict[word])  # print the advice stored for it

answer = answerInput()
solution(answer)
my_dict = {}  # defining it as a dictionary
with open("file.txt") as f:  # opening txt file
    for line in f:
        # key is before the first whitespace, value is everything after it
        key, value = line.strip("\n").split(maxsplit=1)
        my_dict[key] = value
This will work well; it's something I have used personally.
It initialises a dictionary called my_dict, opens the file, and for each line strips the \n and splits it once only. This creates two values, which we call key and value, and which we add to the dictionary in the form {key: value}.
Or, using dict comprehension
with open("file.txt") as f: # opening txt file
my_dict = {k: v for k,v in (line.strip("\n").split(maxsplit=1) for line in f)}
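Once the dictionary is built, matching the user's words against it is a plain membership test on the keys. A short sketch using the sample query from the question:

query = "my phone screen is blank"
for word in query.split():
    if word in my_dict:       # membership tests check the keys
        print(my_dict[word])  # e.g. "Your screen may be dead, you may need a replacement."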

How to extract every nth line from a txt file and assign them to key:value pairs in Python 3?

I'm learning how to code and I've run into a problem I don't have an answer to. I have a text file from which I have to make three dictionaries:
Georgie Porgie
87%
$$$
Canadian, Pub Food
Queen St. Cafe
82%
$
Malaysian, Thai
For the purpose of this thread I just want to ask how to extract the first line of each text block and store it as a key, and the second line of each block as its value. I am supposed to write the code using nothing more than the very basic functions and loops.
Here is my code (once the file is opened):
d = {}
a = 0
for i in file:
    d[i] = i + 1
    a = i + 5
return(d)
Thank you.
First you have to read the file:
with open("data.txt") as file:
lines = file.readlines()
The with clause ensures the file is closed after it is read. Next, according to your description, a line contains a key if its index % 5 is 0, and the next line contains the value. With only "basic" elements of the language, you could construct your dictionary like this:
dic = {lines[idx].strip(): lines[idx + 1].strip()
       for idx in range(0, len(lines), 5)}
This is a dictionary comprehension, which can also be written unfolded.
Now you can also zip the keys and values first, so you can iterate them quite easily. This makes the dictionary comprehension more readable. The strip method is necessary though, since we want to get rid of the line breaks.
entries = zip(lines[::5], lines[1::5])
dic = {key.strip(): value.strip() for key, value in entries}
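The same slicing idea extends to the other fields. Here is a sketch that builds all three dictionaries the question mentions, assuming (as the answer above does) that each record is four lines followed by a blank separator line, hence the stride of 5:

with open("data.txt") as file:
    lines = [line.strip() for line in file]

names    = lines[0::5]   # 'Georgie Porgie', 'Queen St. Cafe', ...
ratings  = lines[1::5]   # '87%', '82%', ...
prices   = lines[2::5]   # '$$$', '$', ...
cuisines = lines[3::5]   # 'Canadian, Pub Food', 'Malaysian, Thai', ...

rating_by_name  = dict(zip(names, ratings))
price_by_name   = dict(zip(names, prices))
cuisine_by_name = dict(zip(names, cuisines))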

Reading a file and parsing it into sections

Okay, so I have a file that contains ID numbers followed by names, just like this:
10 alex de souza
11 robin van persie
9 serhat akin
I need to read this file and break each record up into 2 fields: the ID and the name. I need to store the entries in a dictionary where the ID is the key and the name is the satellite data. Then I need to output, in 2 columns, one entry per line, all the entries in the dictionary, sorted (numerically) by ID. dict.keys and list.sort might be helpful (I guess). Finally, the input filename needs to be the first command-line argument.
Thanks for your help!
I have this so far, however I can't get any further.
fin = open("ids","r") #Read the file
for line in fin: #Split lines
string = str.split()
if len(string) > 1: #Seperate names and grades
id = map(int, string[0]
name = string[1:]
print(id, name) #Print results
We need sys.argv to get the command-line argument (careful: the name of the script is always the 0th element of the returned list).
Now we open the file (no error handling; you should add that) and read in the lines individually. We now have 'number firstname secondname' strings for each line in the list lines.
Then we create an empty dictionary out and loop over the individual strings in lines, splitting them at every space and storing them in the temporary variable tmp (which is now a list of strings: ('number', 'firstname', 'secondname')).
Following that we just fill the dictionary, using the number as the key and the space-joined rest of the names as the value.
To print the dictionary sorted, just loop over the list of numbers returned by sorted(out), using the key=int option for numerical sorting. Then print the id (the number) followed by the corresponding value, looked up with a string representation of the id.
import sys

try:
    infile = sys.argv[1]
except IndexError:
    infile = raw_input('Enter file name: ')  # raw_input, since this is Python 2

with open(infile, 'r') as file:
    lines = file.readlines()

out = {}
for fullstr in lines:
    tmp = fullstr.split()
    out[tmp[0]] = ' '.join(tmp[1:])

for id in sorted(out, key=int):
    print id, out[str(id)]
This works for Python 2.7 with ASCII strings. I'm pretty sure it should be able to handle other encodings as well (German umlauts work, at least), but I can't test that any further. You may also want to add a lot of error handling in case the input file is formatted differently.
Just a suggestion, this code is probably simpler than the other code posted:
import sys

with open(sys.argv[1], "r") as handle:
    lines = handle.readlines()

data = dict([i.strip().split(' ', 1) for i in lines])
for idx in sorted(data, key=int):
    print idx, data[idx]
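For Python 3, the same approach needs only print() calls and can skip the intermediate list (a sketch, not from the original thread):

import sys

with open(sys.argv[1]) as handle:
    # each 'ID name...' line splits once into (ID, rest-of-line)
    data = dict(line.strip().split(' ', 1) for line in handle)

for idx in sorted(data, key=int):
    print(idx, data[idx])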

Python 2 - iterating through a csv, treating specific lines as dictionary separators

I generated a csv from multiple dictionaries (to be readable and editable too) with the help of this question. The output is simple:
//Dictionary
key,value
key2,value2
//Dictionary2
key4, value4
key5, value5
I want the double slash to be the separator that starts a new dictionary, but csv.reader(open("input.csv")) just iterates over every line, so I have no use for:
import csv

dict = {}
for key, val in csv.reader(open("input.csv")):
    dict[key] = val
Thanks for helping me out.
Edit: I made this piece of... well, "code". I'll be glad if you can check it out and review it:
#! /usr/bin/python
import csv

# list of dictionaries
l = []
# evaluate through csv
for row in csv.reader(open("test.csv")):
    if row[0].startswith("//"):
        # stripped "//" line is the name for the dictionary
        n = row[0][2:]
        # append stripped "//" line as the name for the dictionary
        # debug
        print n
        l.append(n)
        # debug: print l[:]
    elif len(row) == 2:
        # debug
        print "len(row) %s" % len(row)
        # debug
        print "row[:] %s" % row[:]
        for key, val in row:
            # print key, val
            l[-1] = dic
            dic = {}
            dic[key] = val
# debug
for d in l:
    print l
    for key, value in d:
        print key, value
Unfortunately I got this error:
DictName
len(row) 2
row[:] ['key', ' value']
Traceback (most recent call last):
  File "reader.py", line 31, in <module>
    for key, val in row:
ValueError: too many values to unpack
Consider not using CSV
First of all, your overall strategy for the data problem is probably not optimal. The less tabular your data looks, the less sense it makes to keep it in a CSV file (though your needs aren't too far out of the realm).
For example, it would be really easy to solve this problem using json:
import json

# First the data
data = dict(dict1=dict(key1="value1", key2="value2"),
            dict2=dict(key3="value3", key4="value4"))

# Convert and write
js = json.dumps(data)
f = file("data.json", 'w')
f.write(js)
f.close()

# Now read back
f = file("data.json", 'r')
data = json.load(f)
print data
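For reference, the same round trip in Python 3 would use open() with json.dump()/json.load(), since file() no longer exists there (a sketch):

import json

data = {"dict1": {"key1": "value1", "key2": "value2"},
        "dict2": {"key3": "value3", "key4": "value4"}}

with open("data.json", "w") as f:
    json.dump(data, f)     # serialize straight into the file

with open("data.json") as f:
    print(json.load(f))    # read it back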
Answering the question as written
However, if you are really set on this strategy, you can do something along the lines suggested by jonrsharpe. You can't just use the csv module to do all the work for you; you actually have to go through and filter out (and split on) the "//" lines.
import csv
import re

def header_matcher(line):
    "Returns something truthy if the line looks like a dict separator"
    return re.match("//", line)

# Open the file and ...
f = open("data.csv")
# create some containers we can populate as we iterate
data = []
d = {}

for line in f:
    if not header_matcher(line):
        # We have a non-header row, so we make a new entry in our draft dictionary
        key, val = line.strip().split(',')
        d[key] = val
    else:
        # We've hit a new header, so we should throw our draft dictionary in our data list
        if d:
            # ... but only if we actually have had data since the last header
            data.append(d)
        d = {}

# The very last chunk will need to be captured as well
if d:
    data.append(d)

# And we're done...
print data
This is quite a bit messier, and if there is any chance of needing to escape commas, it will get messier still. If you needed to, you could probably find a clever way of chunking the file up into generators that you read with CSV readers, but it won't be particularly clean or easy (I started an approach like this but it looked like pain...). This is all a testament to your approach likely being the wrong way to store this data.
An alternative if you're set on CSV
Another way to go, if you really want CSV but aren't stuck on the exact data format you specified: add a column in the CSV file naming the dictionary each row should go into. Imagine a file (data2.csv) that looks like this:
dict1,key1,value1
dict1,key2,value2
dict2,key3,value3
dict2,key4,value4
Now we can do something cleaner, like the following:
import csv

data = dict()
for chunk, key, val in csv.reader(file('data2.csv')):
    try:
        # If we already have a dict for the given chunk id, this adds the key/value pair
        data[chunk][key] = val
    except KeyError:
        # Otherwise, we catch the exception and add a fresh dictionary with the key/value pair
        data[chunk] = {key: val}
print data
Much nicer...
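The try/except can also be avoided with collections.defaultdict, which creates the inner dictionary on first access (a sketch of the same idea, not from the original answer):

import csv
from collections import defaultdict

data = defaultdict(dict)  # a missing chunk id gets a fresh inner dict automatically
for chunk, key, val in csv.reader(open('data2.csv')):
    data[chunk][key] = val
print dict(data)  # plain dict view of the result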
The only good argument for doing something closer to what you originally had in mind is if there is LOTS of data and space is a concern, but that is unlikely to be the case in most situations.
And pandas
Oh yes... one more possible solution is pandas. I haven't used it much yet, so I'm not as much help, but it provides a groupby operation which would let you group by the first column if you end up structuring the data as in the 3-column CSV approach.
I decided to use json instead
Reading this is easier for the program and there's no need to filter the text. Generating the data into an external .json file will serve the Python program as its database.
#! /usr/bin/python
import json

category1 = {"server name1": "ip address1", "server name2": "ip address2"}
category2 = {"server name3": "ip address3", "server name4": "ip address4"}
servers = {"category Alias1": category1, "category Alias2": category2}

js = json.dumps(servers)
f = file("servers.json", "w")
f.write(js)
f.close()

# Now read back
f = file("servers.json", "r")
data = json.load(f)
print data
So the output is a dictionary with the categories as keys and further dictionaries as values. Exactly as I wanted.
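Looking up a server's address afterwards is just two key lookups (a short sketch against the structure above):

print data["category Alias1"]["server name1"]  # -> "ip address1"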

Read an initially unknown number of N lines from a file into a nested dictionary, and start at line N+1 on the next iteration

I want to process a text file line by line. An (initially unknown) number of consecutive lines belong to the same entity (i.e. they carry the same identifier within the line). For example:
line1: stuff, stuff2, stuff3, ID1, stuff4, stuff5
line2: stuff, stuff2, stuff3, ID1, stuff4, stuff5
line3: stuff, stuff2, stuff3, ID1, stuff4, stuff5
line4: stuff, stuff2, stuff3, ID2, stuff4, stuff5
line5: stuff, stuff2, stuff3, ID2, stuff4, stuff5
...
In this dummy example, lines 1-3 belong to the entity ID1 and lines 4-5 to ID2. I want to read each of these lines into a dictionary and then nest those into a dictionary containing all the dictionaries of IDX (e.g. a dictionary ID1 with 3 nested dictionaries for lines 1-3, respectively).
More specifically I would like to define a function that:
opens the file
reads all (but only) the lines of entity ID1 into individual dictionaries
returns the dictionary which carries the nested dictionaries of the ID1 lines
I want to be able to call the function some time later again to read in the next dictionary of all the lines of the following identifier (ID2) and later ID3 etc. One of the problems I am having is that I need to test in every line whether my current line is still carrying the ID of interest or already a new one. If it is a new one, I sure can stop and return the dictionary but in the next round (say, ID2) the first line of ID2 has then already been read and I thus seem to lose that line.
In other words: I would like to somehow reset the counter in the function once it encounters a line with new ID so that in the next iteration this first line with the new ID is not lost.
This seems such a straightforward task but I cannot figure out a way to do that elegantly. I currently pass some "memory"-flags/variables between functions in order to keep track of whether the first line of a new ID was already read in a previous iteration. That is quite bulky and error prone.
Thanks for reading... any ideas/hints are highly appreciated. If some points are unclear please ask.
Here is my "solution". It seems to work in the sense that it prints the dictionary correctly (although I am sure there is a more elegant way to do that).
I also forgot to mention that the textfile is very large and I hence want to process it ID by ID instead of reading the whole file into memory.
with open(infile, "r") as f:
    newIDLine = None
    for line in f:
        if not line:
            break
        # the following function returns the ID
        ID = get_ID_from_line(line)
        counter = 1
        ID_Dic = dict()
        # if the first line is completely new (i.e. the first line in infile)
        if newIDLine is None:
            currID = ID
            # the following function returns the line as a dic
            ID_Dic[counter] = process_line(line)
        # if the first line of a new ID was already read in
        # the previous "while" iteration (see below).
        if newIDLine is not None:
            # if the current "line" has the same ID as the
            # previous one: put the previous and current lines in
            # the same dic and start the while loop.
            if ID == oldID:
                ID_Dic[counter] = process_line(newIDLine)
                counter += 1
                ID_Dic[counter] = process_line(line)
                currID = ID
        # iterate over the following lines until the file ends or
        # a new ID starts. In the latter case: keep the info in
        # the objects newIDLine and oldID
        while True:
            newLine = next(f)
            if not newLine:
                break
            ID = get_ID_from_line(newLine)
            if ID == currID:
                counter += 1
                ID_Dic[counter] = process_line(newLine)
            # new ID; save the line for the upcoming ID dic
            if not ID == currID:
                newIDLine = newLine
                oldID = ID
                break
        # at this point it would be great to return the dictionary of
        # the current ID to the calling function, but on return to this
        # function continue where I left off.
        print ID_Dic
If you want this function to lazily return a dict for each id, you should make it a generator function by using yield instead of return. At the end of each id, yield the dict for that id. Then you can iterate over that generator.
To handle the file, write a generator function that iterates over a source unless you send it a value, in which case it returns that value next, then goes back to iterating. (For example, here's a module I wrote to do this for myself: politer.py.)
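That module isn't reproduced here, but a minimal version of such a "polite" generator might look like this (a sketch built from the description above, not the author's actual implementation):

def politer(iterable):
    """Yield from iterable; a value passed in via send() is yielded again next."""
    buffer = []
    it = iter(iterable)
    while True:
        if buffer:
            sent = yield buffer.pop()
        else:
            try:
                sent = yield next(it)
            except StopIteration:
                return
        while sent is not None:
            buffer.append(sent)  # stash the sent-back value
            sent = yield         # pause so that send() itself returns immediately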
Then you can solve this problem easily by sending the value "back" if you don't want it:
# (inside a generator function, since it uses yield)
with open(infile, 'r') as f:
    polite_f = politer(f)
    current_id = None
    while True:
        id_dict = {}
        for i, line in enumerate(polite_f):
            id = get_id_from_line(line)
            if id != current_id:
                polite_f.send(line)
                break
            else:
                id_dict[i] = process_line(line)
        if current_id is not None:
            yield id_dict
        current_id = id
Note that this keeps the state handling abstracted in the generator where it belongs.
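If you'd rather not maintain a custom generator, itertools.groupby from the standard library already groups consecutive lines that share a key, and it does so lazily, so the large file is never read into memory at once. A sketch reusing the question's own helpers get_ID_from_line and process_line:

from itertools import groupby

def dicts_by_id(infile):
    """Lazily yield one {counter: processed_line} dict per consecutive ID."""
    with open(infile) as f:
        for current_id, lines in groupby(f, key=get_ID_from_line):
            yield {i: process_line(line) for i, line in enumerate(lines, 1)}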
You could use a dictionary to keep track of all the IDX values and just append each line to the appropriate list in the dictionary, something like:
from collections import defaultdict
import csv

all_lines_dict = defaultdict(list)
with open('your_file') as f:
    csv_reader = csv.reader(f)
    for line_list in csv_reader:
        all_lines_dict[line_list[3]].append(line_list)
The csv reader is part of the Python standard library and makes reading csv files easy: it reads each line as a list of its columns.
This differs from your requirements in that each key maps not to a dictionary of dictionaries but to a list of the lines that share that IDX key.
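If you do need the nested-dictionary shape from the question instead, the lists convert easily afterwards (a sketch):

nested = {idx: {i: line for i, line in enumerate(lines, 1)}
          for idx, lines in all_lines_dict.items()}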

Categories

Resources