Python - converting text file contents into dictionary keys/values easily

Let's say I have a text file with the following:
line = "this is line 1"
line2 = "this is the second line"
line3 = "here is another line"
line4 = "yet another line!"
And I want to quickly convert these into dictionary keys/values, with the name on the left (line, line2, ...) as the key and the text in quotes as the value, dropping the equals sign.
What would be the best way to do this in Python?

f = open(filepath, 'r')
answer = {}
for line in f:
    # split on the first '=' only, so values may contain '='
    k, v = line.strip().split('=', 1)
    answer[k.strip()] = v.strip()
f.close()
Hope this helps.

In one line:
d = dict(line.strip().split(' = ', 1) for line in open(filename))
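Note that both snippets above keep the surrounding double quotes in the values. Here is a minimal sketch that also strips them, assuming every non-blank line has the form name = "text". The function name parse_assignments is made up for illustration:

```python
def parse_assignments(lines):
    """Build a dict from lines shaped like: name = "text" (assumed format)."""
    result = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        key, sep, value = line.partition('=')
        if not sep:
            continue  # no '=' on this line, nothing to store
        # drop whitespace first, then the surrounding double quotes
        result[key.strip()] = value.strip().strip('"')
    return result


sample = ['line = "this is line 1"\n', 'line2 = "this is the second line"\n']
print(parse_assignments(sample))
```

An open file can be passed directly, since iterating over a file yields its lines: parse_assignments(open(filepath)).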

Here's what the urlopen version of inspectorG4dget's answer might look like:
from urllib.request import urlopen
url = 'https://raw.githubusercontent.com/sedeh/github.io/master/resources/states.txt'
response = urlopen(url)
lines = response.readlines()
state_names_dict = {}
for line in lines:
    state_code, state_name = line.decode().split(":")
    state_names_dict[state_code.strip()] = state_name.strip()

Related

Find a dot in a text file and add a newline to the file in Python?

I read from a file; wherever it finds a ".", it should add a newline "\n" to the text and write the result to a file. I tried this code but still have the problem.
inp = open('rawCorpus.txt', 'r')
out = open("testFile.text", "w")
for line in iter(inp):
    l = line.split()
    if l.endswith(".")
        out.write("\n")
    s = '\n'.join(l)
print(s)
out.write(str(s))
inp.close()
out.close()
Try this (the normal way):
with open("rawCorpus.txt", 'r') as read_file:
    raw_data = read_file.readlines()
my_save_data = open("testFile.text", "a")
for lines in raw_data:
    if "." in lines:
        re_lines = lines.replace(".", ".\r\n")
        my_save_data.write(re_lines)
    else:
        # lines already ends with "\n", so write it unchanged
        my_save_data.write(lines)
my_save_data.close()
If your text file is not big, you can try this too:
with open("rawCorpus.txt", 'r') as read_file:
    raw_data = read_file.read()
re_data = raw_data.replace(".", ".\n")
with open("testFile.text", "w") as save_data:
    save_data.write(re_data)
UPDATE: how the new lines are displayed also depends on your text viewer, because some text editors treat "\n" as a line break while others expect "\r\n".
Input sample:
This is a book. i love it.
This is a apple. i love it.
This is a laptop. i love it.
This is a pen. i love it.
This is a mobile. i love it.
Code:
read_lines = [line.rstrip('\n') for line in open('input.txt')]
my_save_data = open("output.txt", "a")
for lines in read_lines:
    re_make_lines = lines.split(".")
    for items in re_make_lines:
        if items.replace(" ", "") == "":
            # nothing but spaces between two periods, skip it
            pass
        else:
            result = items.strip() + ".\r\n"
            my_save_data.write(result)
my_save_data.close()
Output will be:
This is a book.
i love it.
This is a apple.
i love it.
This is a laptop.
i love it.
This is a pen.
i love it.
This is a mobile.
i love it.
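The same output can be sketched with a single regular-expression substitution, which also swallows the space after each period. This assumes every "." ends a sentence, so abbreviations and decimal numbers would be split too. The name put_sentences_on_lines is hypothetical:

```python
import re


def put_sentences_on_lines(text):
    # Replace each "." plus any whitespace after it with "." and one newline.
    return re.sub(r'\.\s*', '.\n', text)


sample = "This is a book. i love it.\nThis is a apple. i love it.\n"
print(put_sentences_on_lines(sample), end="")
```

Because the whole text is processed as one string, this fits the "file is not big" case; for large files the line-by-line loop above may be the safer choice.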
You are overwriting the string s in every loop iteration with s = '\n'.join(l).
Initialize s = '' as an empty string before the for loop and append to it on every iteration, e.g. with s += '\n'.join(l) (short for s = s + '\n'.join(l)).
This should work:
inp = open('rawCorpus.txt', 'r')
out = open('testFile.text', 'w')
s = '' # empty string
for line in iter(inp):
    l = line.split('.')
    s += '.\n'.join(l) # split('.') drops the periods, so join with '.\n' to restore them
print(s)
out.write(str(s))
inp.close()
out.close()
Here is my own solution, but I still want one more newline after each ".", which this solution does not produce:
read_lines = [line.rstrip('\n') for line in open('rawCorpus.txt')]
words = []
my_save_data = open("my_saved_data.txt", "w")
for lines in read_lines:
    words.append(lines)
for word in words:
    w = word.rstrip().replace('.', '\n.')
    w = w.split()
    my_save_data.write(str("\n".join(w)))
    print("\n".join(w))
my_save_data.close()

Python I/O, URL Reading, Strings, Count

I'm having issues with my Python program. It is supposed to read URL addresses from a text file, fetch each page, and count the occurrences of, for example, div tags.
I got an error on line 23, in
di[ffline[k]]-=1
import urllib
with open('top5_BRZ.txt') as urlf:
    uf=urlf.readlines()
for i in range(len(uf)):
    link = uf[i]
    f = urllib.urlopen(link)
    myfile = f.read()
    fline=myfile.split('\n')
    di={}
    for j in range(len(fline)):
        line = fline[j]
        line = line.replace('"', " ")
        line = line.replace("'", " ")
        line = line.replace('<', " ")
        line = line.replace('>', " ")
        line = line.replace('=', " ")
        line = line.replace('/', " ")
        line = line.replace("\\", " ")
        ffline=line.split(' ')
        for k in range(len(ffline)):
            di[ffline[k]]-=1
    sx = sorted(di.items(), key=operator.itemgetter(1))
    rr=0
    for key, value in di:
        if(rr==25): break
        print key,value
        rr+=1
I agree with @brian. You can use the code below (on line 22), which checks whether the key is already in the dictionary before decrementing its value:
for k in range(len(ffline)):
    if ffline[k] in di:
        di[ffline[k]] -= 1
    else:
        di[ffline[k]] = something  # whatever starting value you choose
The dict di doesn't have any keys in it when di[ffline[k]]-=1 is run. di is still an empty dict when you try to decrement the value of the ffline[k] key.
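One way to avoid the KeyError altogether is collections.Counter, which treats a missing key as 0. This is a minimal sketch, assuming the goal is to count the tokens that remain after removing the same punctuation as in the question. The name count_tokens is illustrative only:

```python
import re
from collections import Counter


def count_tokens(html_text):
    # Split on whitespace and on the characters the question replaces
    # with spaces: " ' < > = / and backslash.
    tokens = re.split(r"""[\s"'<>=/\\]+""", html_text)
    # Counter starts every unseen key at 0, so no membership check is needed.
    return Counter(t for t in tokens if t)


sample = '<div class="a"><div>x</div></div>'
counts = count_tokens(sample)
print(counts.most_common(2))
```

counts.most_common(25) would then return the 25 most frequent tokens, which seems to be what the decrement-and-sort approach in the question is aiming for.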
Rather than splitting the text by hand, you can parse the HTML with html5lib:
import html5lib
import urllib
def main():
    for link in ["http://www.google.com/"]:
        f = urllib.urlopen(link)  # Python 2; in Python 3 use urllib.request.urlopen
        tree = html5lib.parse(f)
        divs = len(tree.findall("*//{http://www.w3.org/1999/xhtml}div"))
        print("{}: {} divs".format(link, divs))
main()
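If installing html5lib is not an option, a similar count can be sketched with the standard library's html.parser module (Python 3). This swaps in a different parser from the one in the answer above, and the names TagCounter and count_tags are made up for illustration:

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how often each opening tag appears."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Called once per opening tag; tag names arrive lowercased.
        self.counts[tag] += 1


def count_tags(html_text):
    parser = TagCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts


sample = '<div><p>hi</p><div class="x"></div><br/></div>'
print(count_tags(sample)['div'])
```

The page text itself would still come from urlopen, decoded to a string before it is passed to count_tags.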

Copy complete sentences from text file and add to list

I am trying to extract complete sentences from a long text file and add them as strings to a list in Python 2.7. I want to automate this rather than cut and paste them into the list.
Here is what I have:
from sys import argv
script, filename = argv # script = alien.py; filename = roswell.txt
listed = []
text = open(filename, 'rw')
for i in text:
    lines = readline(i)
    listed.append(lines)
print listed
text.close()
Nothing loads to the list.
You can do it with a while loop:
listed = []
with open(filename,"r") as text:
    Line = text.readline()
    while Line!='':
        listed.append(Line)
        Line = text.readline()
print listed
The previous example assumes that each sentence is on its own line. If that's not the case, use this code instead:
listed = []
with open(filename,"r") as text:
    Line = text.readline()
    while Line!='':
        Line1 = Line.split(".")
        for Sentence in Line1:
            listed.append(Sentence)
        Line = text.readline()
print listed
And on a side note, try using with open(...) as text: instead of text = open(...)
Sentences are normally separated by '. ' (period plus space), not by '\n'. In that case, split on period plus space, without the newline:
listed = []
fd = open(filename,"r")
try:
    data = fd.read()
    sentences = data.split(". ")
    for sentence in sentences:
        listed.append(sentence)
    print listed
finally:
    fd.close()
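Splitting on ". " removes the period from each sentence and misses sentences that end in "!" or "?". Here is a rough sketch that keeps the punctuation by splitting on the whitespace that follows it. It assumes sentences end in ".", "!" or "?" followed by whitespace, so abbreviations such as "Dr. Smith" would still be split, and split_sentences is an illustrative name:

```python
import re


def split_sentences(text):
    # Split on whitespace that comes right after ".", "!" or "?".
    # The lookbehind keeps the punctuation attached to its sentence.
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]


sample = "It landed. Nobody saw it!\nOr did they?"
print(split_sentences(sample))
```

Since \s+ also matches newlines, sentences that wrap across lines in the file are handled when the whole file is read with read().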

read a specific string from a file in python?

I want to read the above file foo.txt, take only UDE from the first line and store it in a variable, then take Unspecified from the second line and store it in another variable, and so on.
Should I use read or readlines? Should I use a regex for this?
My program below reads the entire line. How do I read a specific word from the line?
fo = open("foo.txt", "r+")
line = fo.readline()
left, right = line.split(':')
result = right.strip()
File_Info_Domain = result
print File_Info_Domain
line = fo.readline()
left, right = line.split(':')
result = right.strip()
File_Info_Intention = result
print File_Info_Intention
line = fo.readline()
left, right = line.split(':')
result = right.strip()
File_Info_NLU_Result = result
print File_Info_NLU_Result
fo.close()
You can use readline() (without the s in the name) to read lines one by one, and then split(':') to get the value from each line.
fo = open("foo.txt", "r+")
# read first line
line = fo.readline()
# split line only on first ':'
elements = line.split(':', 1)
if len(elements) < 2:
    print("there is no ':' or there is no value after ':' ")
else:
    # remove spaces and "\n"
    result = elements[1].strip()
    print(result)
#
# time for second line
#
# read second line
line = fo.readline()
# split line only on first ':'
elements = line.split(':', 1)
if len(elements) < 2:
    print("there is no ':' or there is no value after ':' ")
else:
    # remove spaces and "\n"
    result = elements[1].strip()
    print(result)
# close
fo.close()
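The read, split, and strip steps repeat for every line, so they can be folded into a loop. This is a minimal sketch, assuming each line has the form "name: value". The contents of foo.txt are not shown here, so the sample lines below are made up, and read_fields is a hypothetical helper:

```python
def read_fields(lines):
    """Collect 'name: value' lines into a dict (assumed line format)."""
    fields = {}
    for line in lines:
        # split only on the first ':' so values may contain ':' themselves
        elements = line.split(':', 1)
        if len(elements) < 2:
            continue  # no ':' on this line
        fields[elements[0].strip()] = elements[1].strip()
    return fields


# Made-up sample standing in for the lines of foo.txt.
sample = ["Domain: UDE\n", "Intention: Unspecified\n", "no separator here\n"]
fields = read_fields(sample)
print(fields["Domain"])
```

With a real file, the open file object can be passed in place of sample.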
While you can use @furas's answer or a regex, I would recommend using a config file for this instead of a plain txt file. Your config file would look like:
[settings]
Domain=UDE
Intention=Unspecified
nlu_slot_details={"Location": {"literal": "18 Slash 6/2015"}, "Search-phrase": {"literal": "18 slash 6/2015"}}
And in your python code:
import configparser
config = configparser.RawConfigParser()
config.read("foo.cfg")
domain = config.get('settings', 'Domain')
intention = config.get('settings', 'Intention')
nlu_slot_details = config.get('settings', 'nlu_slot_details')
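To try the idea without creating foo.cfg first, the same parser can read from a string. This is a small sketch, assuming a cut-down version of the config shown above (SAMPLE is a made-up stand-in for the file contents):

```python
import configparser

# Stand-in for the contents of foo.cfg, reduced to two options.
SAMPLE = """\
[settings]
Domain=UDE
Intention=Unspecified
"""

config = configparser.RawConfigParser()
config.read_string(SAMPLE)  # same parsing as config.read(), but from a string

domain = config.get('settings', 'Domain')
intention = config.get('settings', 'Intention')
print(domain, intention)
```

By default configparser lowercases option names, so 'Domain' and 'domain' refer to the same option.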

Why is my code printing incorrectly to the text file?

I have this code:
with open("pool2.txt", "r") as f:
    content = f.readlines()
for line in content:
    line = line.strip().split(' ')
    try:
        line[0] = float(line[0])+24
        line[0] = "%.5f" % line[0]
        line = ' ' + ' '.join(line)
    except:
        pass
with open("pool3.txt", "w") as f:
    f.writelines(content)
It should take lines that look like this:
-0.597976 -6.85293 8.10038
and turn each into a line that has 24 added to the first number, like so:
23.402024 -6.85293 8.10038
When I print the line inside the loop, it is correct, but the lines written to the text file are the original ones.
The original text file can be found here.
When you loop through an iterable like:
for line in content:
    line = ...
line is a copy [1] of the element (more precisely, a name bound to it), so reassigning line does not change content.
What can you do? You can iterate over the indices, so that you access the current element directly:
for i in range(len(content)):
    content[i] = ...
[1] See @MarkRansom's comment.
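Applied to the question's task, the idea could look like the sketch below, which builds a new list instead of rebinding the loop variable. It assumes whitespace-separated columns and keeps the "%.5f" format from the question. The name shift_first_column and the offset parameter are illustrative:

```python
def shift_first_column(lines, offset=24.0):
    """Return new lines with offset added to the first number of each line."""
    result = []
    for line in lines:
        fields = line.split()
        try:
            fields[0] = "%.5f" % (float(fields[0]) + offset)
        except (IndexError, ValueError):
            # blank line or non-numeric first field: keep the line unchanged
            result.append(line)
            continue
        result.append(" ".join(fields) + "\n")
    return result


sample = ["-0.597976 -6.85293 8.10038\n", "header line\n"]
print(shift_first_column(sample))
```

The returned list can then be passed to f.writelines() in place of content.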
