Creating a Dictionary from txt


I have a few lines in a text file. Using a function, I want to turn the first word of each line (words are separated by spaces) into a key and the remaining words into its values.
This is what the text contains:
FFFB10 11290 Charlie
1A9345 37659 Delta
221002 93323 Omega
The idea is to use the first word as the key and also to display each record vertically (one field per row). So the first word (FFFB10) is the key and the rest are values, meaning:
Entered: FFFB10
Location: 11290
Name: Charlie
I tried with this as a beginning:
def code(codeenter, file):
    for line in file.splitlines():
        if codeenter in line:
            parts = line.split(' ')
But I don't know how to continue (I erased most of the code). Any suggestions?

Assuming you managed to extract a list of lines without the newline character at the end:
def MakeDict(lines):
    return {key: (location, name) for key, location, name in (line.split() for line in lines)}
This is an ordinary dictionary comprehension fed by a generator expression. The comprehension is everything inside the curly braces; the generator expression is the part inside the last pair of parentheses. line.split() splits a line on whitespace.
Example run:
>>> data = '''FFFB10 11290 Charlie
... 1A9345 37659 Delta
... 221002 93323 Omega'''
>>> lines = data.split('\n')
>>> lines
['FFFB10 11290 Charlie', '1A9345 37659 Delta', '221002 93323 Omega']
>>> def MakeDict(lines):
...     return {key: (location, name) for key, location, name in (line.split() for line in lines)}
...
>>>
>>> MakeDict(lines)
{'FFFB10': ('11290', 'Charlie'), '1A9345': ('37659', 'Delta'), '221002': ('93323', 'Omega')}
How to format the output:
for key, values in MakeDict(lines).items():
    print("Key: {}\nLocation: {}\nName: {}".format(key, *values))
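Putting the two pieces together, here is a minimal sketch of the lookup the question describes: enter a code and get its record back, one field per row. The names make_dict and describe_code are made up for illustration, and the sketch assumes every line holds exactly three whitespace-separated fields.

```python
def make_dict(lines):
    # Map the first word of each line to a (location, name) tuple.
    return {key: (location, name)
            for key, location, name in (line.split() for line in lines)}


def describe_code(entered, records):
    # Return the row-under-row text for one key, or None if it is unknown.
    if entered not in records:
        return None
    location, name = records[entered]
    return "Entered: {}\nLocation: {}\nName: {}".format(entered, location, name)


data = """FFFB10 11290 Charlie
1A9345 37659 Delta
221002 93323 Omega"""
records = make_dict(data.splitlines())
print(describe_code("FFFB10", records))
```

A line with more or fewer than three fields would raise a ValueError during unpacking, which may or may not be what you want for malformed input.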

See ForceBru's answer on how to construct the dictionary. Here's the printing part:
for k, (v1, v2) in your_dict.items():
    print("Entered: {}\nLocation: {}\nName: {}\n".format(k, v1, v2))

You can try this:
f = [i.strip('\n').split() for i in open('filename.txt')]
final_dict = {i[0]:i[1:] for i in f}
Assuming the data is structured like this:
FFFB10 11290 Charlie
1A9345 37659 Delta
221002 93323 Omega
Your output will be:
{'FFFB10': ['11290', 'Charlie'], '221002': ['93323', 'Omega'], '1A9345': ['37659', 'Delta']}

You may want to consider using namedtuple.
from collections import namedtuple
code = {}
Code = namedtuple('Code', 'Entered Location Name')
filename = '/Users/ca_mini/Downloads/junk.txt'
with open(filename, 'r') as f:
    for row in f:
        row = row.split()
        code[row[0]] = Code(*row)
>>> code
{'1A9345': Code(Entered='1A9345', Location='37659', Name='Delta'),
'221002': Code(Entered='221002', Location='93323', Name='Omega'),
'FFFB10': Code(Entered='FFFB10', Location='11290', Name='Charlie')}

Related

Adding words between lines to an array

This is the content of my file:
david C001 C002 C004 C005 C006 C007
* C008 C009 C010 C011 C016 C017 C018
* C019 C020 C021 C022 C023 C024 C025
anna C500 C521 C523 C547 C555 C556
* C557 C559 C562 C563 C566 C567 C568
* C569 C571 C572 C573 C574 C575 C576
* C578
charlie C701 C702 C704 C706 C707 C708
* C709 C712 C715 C716 C717 C718
I want my output to be:
david=[C001,C002,C004,C005,C006,C007,C008,C009,C010,C011,C016,C017,C018,C019,C020,C021,C022,C023,C024,C025]
anna=[C500,C521,C523,C547,C555,C556,C557,C559,C562,C563,C566,C567,C568,C569,C571,C572,C573,C574,C575,C576,C578]
charlie=[C701,C702,C704,C706,C707,C708,C709,C712,C715,C716,C717,C718]
I am able to create:
david=[C001,C002,C004,C005,C006,C007]
anna=[C500,C521,C523,C547,C555,C556]
charlie=[C701,C702,C704,C706,C707,C708]
counting the number of words in a line and using line[0] as the array name and adding the remaining words to the array.
However, I don't know how to take the continuation of words in the next lines starting with "*" to the array.
Can anyone help?

NOTE: This solution relies on dictionaries (and therefore defaultdict) preserving insertion order. That is an implementation detail of CPython 3.6 and is guaranteed by the language from Python 3.7 onwards.
Somewhat naive approach:
from collections import defaultdict
# Create a dictionary of people
people = defaultdict(list)
# Open up your file in read-only mode
with open('your_file.txt', 'r') as f:
    # Iterate over all lines, stripping them and splitting them into words
    for line in filter(bool, map(str.split, map(str.strip, f))):
        # Retrieve the name of the person
        # either from the current line or use the name of the last person processed
        name, words = list(people)[-1] if line[0] == '*' else line[0], line[1:]
        # Add all remaining words to that person's record
        people[name].extend(words)
print(people['anna'])
# ['C500', 'C521', 'C523', 'C547', 'C555', 'C556', 'C557', 'C559', 'C562', 'C563', 'C566', 'C567', 'C568', 'C569', 'C571', 'C572', 'C573', 'C574', 'C575', 'C576', 'C578']
It also has the additional benefit of returning an empty list for unknown names:
print(people['matt'])
# []
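If you would rather not depend on dictionary ordering, one alternative is to remember the current name in a variable. This is only a sketch: the function name group_continuations and the shortened sample data are invented for illustration, and it assumes a continuation line always starts with a lone "*" token.

```python
def group_continuations(lines):
    # Collect the words of each record, folding '*' continuation lines
    # into the most recent named record.
    people = {}
    current = None
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip blank lines
        if words[0] == '*':
            if current is None:
                raise ValueError("continuation line before any name")
            people[current].extend(words[1:])
        else:
            current = words[0]
            people.setdefault(current, []).extend(words[1:])
    return people


sample = """david C001 C002
* C008 C009
anna C500
* C557
* C578"""
result = group_continuations(sample.splitlines())
print(result)
```

Unlike the defaultdict version, looking up an unknown name here raises KeyError instead of silently creating an empty entry.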

You could read the lists into a dictionary using regular expressions:
import re
with open('file_name') as file:
    contents = file.read()
    res_list = re.findall(r"[a-z]+\s+[^a-z]+",contents)
    res_dict = {}
    for p in res_list:
        elt = p.split()
        res_dict[elt[0]] = [e for e in elt[1:] if e != '*']
print(res_dict)

I figured out a way myself. Thanks to the ones who gave their own solution. It gave me new perspective.
Below is my code:
persons_library={}
persons=['david','anna','charlie']
for i,person in enumerate(persons,start=0):
    persons_library[person]=[]
with open('data.txt','r') as f:
    for line in f:
        line=line.replace('*',"")
        line=line.split()
        for i,val in enumerate(line,start=0):
            if val in persons_library:
                key=val
            else:
                persons_library[key].append(val)
print(persons_library)

Python splitting data record

I have a record as below:
29 16
A 1.2595034 0.82587254 0.7375044 1.1270138 -0.35065323 0.55985355
0.7200067 -0.889543 0.2300735 0.56767654 0.2789483 0.32296127 -0.6423197 0.26456305 -0.07363393 -1.0788593
B 1.2467299 0.78651106 0.4702038 1.204216 -0.5282698 0.13987103
0.5911153 -0.6729466 0.377103 0.34090135 0.3052503 0.028784657 -0.39129165 0.079238065 -0.29310825 -0.99383247
I want to split the data into key-value pairs, ignoring the top row (29 16).
The output should be something like this:
x = A , B
y = 1.2595034 0.82587254 0.7375044 1.1270138 -0.35065323 0.55985355 0.7200067 -0.889543 0.2300735 0.56767654 0.2789483 0.32296127 -0.6423197 0.26456305 -0.07363393 -1.0788593
1.2467299 0.78651106 0.4702038 1.204216 -0.5282698 0.13987103 0.5911153 -0.6729466 0.377103 0.34090135 0.3052503 0.028784657 -0.39129165 0.079238065 -0.29310825 -0.99383247
I am able to neglect the first line using the below code:
f = open(fileName, 'r')
lines = f.readlines()[1:]
Now how do I separate the rest of each record in Python?

So here's my take :D I expect you'd want to have the numbers parsed as well?
def generate_kv(fileName):
    with open(fileName, 'r') as file:
        # ignore first line
        file.readline()
        for line in file:
            if '' == line.strip():
                # empty line
                continue
            values = line.split(' ')
            try:
                yield values[0], [float(x) for x in values[1:]]
            except ValueError:
                print(f'one of the elements was not a float: {line}')
if __name__ == '__main__':
    x = []
    y = []
    for key, value in generate_kv('sample.txt'):
        x.append(key)
        y.append(value)
    print(x)
    print(y)
This assumes that the values in sample.txt look like this (each record on a single line):
% cat sample.txt
29 16
A 1.2595034 0.82587254 0.7375044 1.1270138 -0.35065323 0.55985355 0.7200067 -0.889543 0.2300735 0.56767654 0.2789483 0.32296127 -0.6423197 0.26456305 -0.07363393 -1.0788593
B 1.2467299 0.78651106 0.4702038 1.204216 -0.5282698 0.13987103 0.5911153 -0.6729466 0.377103 0.34090135 0.3052503 0.028784657 -0.39129165 0.079238065 -0.29310825 -0.99383247
and the output:
% python sample.py
['A', 'B']
[[1.2595034, 0.82587254, 0.7375044, 1.1270138, -0.35065323, 0.55985355, 0.7200067, -0.889543, 0.2300735, 0.56767654, 0.2789483, 0.32296127, -0.6423197, 0.26456305, -0.07363393, -1.0788593], [1.2467299, 0.78651106, 0.4702038, 1.204216, -0.5282698, 0.13987103, 0.5911153, -0.6729466, 0.377103, 0.34090135, 0.3052503, 0.028784657, -0.39129165, 0.079238065, -0.29310825, -0.99383247]]
Alternatively, if you want a dictionary, do:
if __name__ == '__main__':
    print(dict(generate_kv('sample.txt')))
That will collect the generated key-value pairs into a dictionary and output:
{'A': [1.2595034, 0.82587254, 0.7375044, 1.1270138, -0.35065323, 0.55985355, 0.7200067, -0.889543, 0.2300735, 0.56767654, 0.2789483, 0.32296127, -0.6423197, 0.26456305, -0.07363393, -1.0788593], 'B': [1.2467299, 0.78651106, 0.4702038, 1.204216, -0.5282698, 0.13987103, 0.5911153, -0.6729466, 0.377103, 0.34090135, 0.3052503, 0.028784657, -0.39129165, 0.079238065, -0.29310825, -0.99383247]}

You can use this script if your file is a text file:
filename='file.text'
with open(filename) as f:
    data = f.readlines()[1:]  # skip the header row
x=[data[0][0],data[1][0]]
y=[data[0][1:],data[1][1:]]

If you're happy to store the data in a dictionary here is what you can do:
records = dict()
with open(filename, 'r') as f:
    f.readline()  # skip the first line
    for line in f:
        key, value = line.split(maxsplit=1)
        records[key] = value.split()
The structure of records would be:
{
    'A': ['1.2595034', '0.82587254', '0.7375044', ... ],
    'B': ['1.2467299', '0.78651106', '0.4702038', ... ]
}
What's happening
Using with ... as f, we open the file within a context manager. This automatically closes the file when the block finishes.
Because the open file keeps track of its position, we can use f.readline() to move past the first line.
line.split() turns a string into a list of strings. With the maxsplit=1 argument it splits only once, at the first run of whitespace.
e.g. after x, y = 'foo bar baz'.split(maxsplit=1), x = 'foo' and y = 'bar baz'
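The code above assumes each record sits on a single line. In the question's sample the numbers appear to wrap onto a second line; if the file really looks like that (an assumption), one possible sketch is to start a new record whenever the first token is not a number. The helper names is_number and parse_records and the shortened sample data are invented for illustration.

```python
def is_number(token):
    # True if the token parses as a float.
    try:
        float(token)
    except ValueError:
        return False
    return True


def parse_records(lines):
    # Skip the header line, then treat any line whose first token is not
    # a number as the start of a new record; other lines continue the
    # previous record.
    records = {}
    key = None
    for line in lines[1:]:
        tokens = line.split()
        if not tokens:
            continue
        if not is_number(tokens[0]):
            key, tokens = tokens[0], tokens[1:]
            records[key] = []
        if key is None:
            raise ValueError("numeric line before any key")
        records[key].extend(float(t) for t in tokens)
    return records


text = """29 16
A 1.5 0.25
0.75 -0.5
B 2.0
-1.0 3.5"""
records = parse_records(text.splitlines())
print(records)
```

Note that float() also accepts strings such as "nan" and "inf", so this heuristic would misread a key with one of those names.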

If I understood correctly, you want the numbers to be collected in a list. One way of doing this is:
import string
text = '''
29 16
A 1.2595034 0.82587254 0.7375044 1.1270138 -0.35065323 0.55985355 0.7200067 -0.889543 0.2300735 0.56767654 0.2789483 0.32296127 -0.6423197 0.26456305 -0.07363393 -1.0788593
B 1.2467299 0.78651106 0.4702038 1.204216 -0.5282698 0.13987103 0.5911153 -0.6729466 0.377103 0.34090135 0.3052503 0.028784657 -0.39129165 0.079238065 -0.29310825 -0.99383247
'''
lines = text.split('\n')
x = [
    line[1:].strip().split()
    for i, line in enumerate(lines)
    if line and line[0].lower() in string.ascii_letters]
This will produce a list of lists: the outer list has one entry per record (A, B, etc.) and each inner list contains the numbers associated with that record.
This code assumes that you are interested in lines starting with any single letter (case-insensitive).
For more elaborate conditions you may want to look into regular expressions.
Obviously, if your text is in a file, you could substitute lines = ... with:
with open(filepath, 'r') as lines:
    x = ...
Also, if the items in x should not be separated but kept as a single string, you may want to replace line[1:].strip().split() with line[1:].strip().
If instead you want the numbers as floats rather than strings, replace line[1:].strip().split() with [float(value) for value in line[1:].strip().split()].
EDIT:
Alternatively to line[1:].strip().split() you may want to do:
line.split(maxsplit=1)[1].split()
as suggested in some other answer. This would generalize better if the first token is not a single character.

Parsing file structure based on a specific pattern

I have a text file with multiple lines that are in the order of name, location, website, then 'END' to indicate the end of one person's profile, then again name, location, website, and so on.
I need to add the name as a key to a dictionary and the rest (location, website) as its values.
So if I have a file:
name1
location1
website1
END
name2
location2
website2
END
name3
location3
website3
END
the outcome would be:
dict = {'name1': ['location1', 'website1'],
        'name2': ['location2', 'website2'],
        'name3': ['location3', 'website3']}
edit: the value would be a list, sorry about that
I have no idea how to approach this, can someone point me in the right direction?

First, there appears to be a misconception underlying this question about the structure of a dictionary or, more generally, of associative containers.
The structure of a dict is, in Python-like syntax:
{
    key : whatever_value1,
    another_key: whatever_value2,
    # ...
}
Second, if you trim the trailing digit from
name1
location1
website1
you naturally arrive at a struct-like ADT for the END-separated individual entries of that file, namely
class Whatever(object):
    def __init__(self, name, location, website):
        self.name = name
        self.location = location
        self.website = website
(your mileage will vary regarding the name of the class)
Thus what you could use is a Python dict that maps a key (likely the name attribute of your records) to (a reference to) an instance of that type.
To process the input file, you simply read the file line by line until you encounter END, and then commit a Whatever instance to the dictionary using (e.g.) its name as the key.
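The reading loop described in the last step might look like the sketch below. The class name Profile, the function read_profiles and the in-memory sample list are placeholders chosen for illustration; the sketch assumes every record has exactly three fields before END.

```python
class Profile(object):
    def __init__(self, name, location, website):
        self.name = name
        self.location = location
        self.website = website


def read_profiles(lines):
    # Accumulate lines until END, then commit one Profile keyed by name.
    profiles = {}
    fields = []
    for line in lines:
        line = line.strip()
        if line == "END":
            if len(fields) != 3:
                raise ValueError("expected 3 fields before END, got %d" % len(fields))
            profile = Profile(*fields)
            profiles[profile.name] = profile
            fields = []
        elif line:
            fields.append(line)
    return profiles


sample = ["name1\n", "location1\n", "website1\n", "END\n",
          "name2\n", "location2\n", "website2\n", "END\n"]
profiles = read_profiles(sample)
print(profiles["name2"].website)
```

Because read_profiles only iterates over its argument, an open file object can be passed in place of the sample list.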

Use the fact that "END" delimits each section: itertools.groupby will split the file on END, and we just need to create each key/value pair as we iterate over the groupby object.
from itertools import groupby
from collections import OrderedDict
with open("test.txt") as f:
    d = OrderedDict((next(v), list(v))
                    for k, v in groupby(map(str.rstrip, f), key=lambda x: x[:3] != "END") if k)
Output:
OrderedDict([('name1', ['location1', 'website1']),
('name2', ['location2', 'website2']),
('name3', ['location3', 'website3'])])
Or use a regular for loop: change the key each time we hit END, storing the lines for each section in a tmp list:
from collections import OrderedDict
with open("test.txt") as f:
    # itertools.imap for python2
    data = map(str.rstrip, f)
    d, tmp, k = OrderedDict(), [], next(data)
    for line in data:
        if line == "END":
            d[k] = tmp
            k, tmp = next(data, ""), []
        else:
            tmp.append(line)
Output will be the same:
OrderedDict([('name1', ['location1', 'website1']),
('name2', ['location2', 'website2']),
('name3', ['location3', 'website3'])])
Both code examples will work for sections of any length, not just three lines.

It has been answered, but you can shorten things by applying Python's very own dict and list comprehension:
with open(file, 'r') as f:
    triplets = [data.strip().split('\n') for data in f.read().strip().split('END') if data]
d = {name: [line, site] for name, line, site in triplets}

You can take a slice of four lines at a time from the file without having to load it all into memory. One way to do this is with islice from itertools.
from itertools import islice
data = dict()
with open('file.path') as infile:
    while True:
        batch = tuple(x.strip() for x in islice(infile, 4))
        if not batch:
            break
        name, location, website, end = batch
        data[name] = (location, website)
Verification:
> from pprint import pprint
> pprint(data)
{'name1': ('location1', 'website1'),
'name2': ('location2', 'website2'),
'name3': ('location3', 'website3')}

If you are guaranteed that you will always get this data in this format, then you could do the following:
profiles = {}  # renamed from dict to avoid shadowing the built-in
name = None
location = None
website = None
count = 0
with open(file, 'r') as f:  # where file is the file name
    for each in f:
        each = each.strip()  # drop the trailing newline so 'END' compares equal
        count += 1
        if count == 1:
            name = each
        elif count == 2:
            location = each
        elif count == 3:
            website = each
        elif count == 4 and each == 'END':
            count = 0  # reset the counter for the next record
            profiles[name] = (location, website)  # stored as a tuple: one key maps to one value
        else:
            print("Well, something went amiss %i %s" % (count, each))

Python Replacing Words from Definitions in Text File

I've got an old Informix database that was written for COBOL. All the field names are codes, so my SQL queries look like this:
SELECT uu00012 FROM uu0001;
This is pretty hard to read.
I have a text file with the field definitions like
uu00012 client
uu00013 date
uu00014 f_name
uu00015 l_name
I would like to swap out each code for its more readable English name, perhaps by running a Python script over the queries and saving a file with the English names.
What's the best way to do this?

If each piece is definitely a separate word, re.sub is definitely the way to go here:
import re
# Create a mapping of old names to new names.
with open('definitions') as f:
    d = dict(x.split() for x in f)
def my_replace(match):
    # If the match is in the dictionary, replace it; otherwise return it unchanged.
    return d.get(match.group(), match.group())
with open('inquiry') as f:
    for line in f:
        print(re.sub(r'\w+', my_replace, line), end='')

Conceptually,
I would probably first build a mapping of codings -> english (in memory or o.
Then, for each coding in your map, scan your file and replace it with the code's mapped English equivalent.
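As a rough sketch of those two steps, using the sample definitions from the question: the function name translate is made up for illustration, and the sketch matches whole words only, because a plain string replace of a table code such as uu0001 would also hit the start of uu00012.

```python
import re

definitions = """uu00012 client
uu00013 date
uu00014 f_name
uu00015 l_name"""

# Build the code -> English mapping from the definitions text.
mapping = dict(line.split() for line in definitions.splitlines())


def translate(sql, mapping):
    # Replace whole words only, so 'uu0001' inside 'uu00012' is left alone.
    def lookup(match):
        word = match.group()
        return mapping.get(word, word)
    return re.sub(r"\w+", lookup, sql)


print(translate("SELECT uu00012 FROM uu0001;", mapping))
```

Codes that have no entry in the mapping (here the table name uu0001) pass through unchanged.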

infile = open('filename.txt','r')
namelist = []
for each in infile.readlines():
    namelist.append((each.split()[0],each.split()[1]))
This will give you a list of (key, value) pairs.
I don't know what you want to do with the results from there, though; you need to be more explicit.

dictionary = '''uu00012 client
uu00013 date
uu00014 f_name
uu00015 l_name'''
dictionary = dict(map(lambda x: (x[1], x[0]), [x.split() for x in dictionary.split('\n')]))
def process_sql(sql, d):
    for k, v in d.items():
        sql = sql.replace(k, v)
    return sql
sql = process_sql('SELECT f_name FROM client;', dictionary)
This builds the dictionary:
{'date': 'uu00013', 'l_name': 'uu00015', 'f_name': 'uu00014', 'client': 'uu00012'}
Then it runs through your SQL and replaces the human-readable names with the coded ones. The result is:
SELECT uu00014 FROM uu00012;

import re
d = {}
with open("dictfile.txt") as f:
    for mapping in f:
        l, r = mapping.split()
        d[re.compile(l)] = r
# file() exists only in Python 2; open() works in both versions
with open("orig.sql") as sql, open("translated.sql", "w") as out:
    for line in sql:
        for r in d:
            line = r.sub(d[r], line)
        out.write(line)

Python: create dict from list and auto-gen/increment the keys (list is the actual key values)?

I've searched pretty hard and can't find a question that exactly pertains to what I want to do.
I have a file called "words" that has about 1000 lines of alphabetically sorted words:
10th
1st
2nd
3rd
4th
5th
6th
7th
8th
9th
a
AAA
AAAS
Aarhus
Aaron
AAU
ABA
Ababa
aback
abacus
abalone
abandon
abase
abash
abate
abater
abbas
abbe
abbey
abbot
Abbott
abbreviate
abc
abdicate
abdomen
abdominal
abduct
Abe
abed
Abel
Abelian
I am trying to load this file into a dictionary where the words are the values and the keys are auto-generated (auto-incremented) for each word,
e.g. {0: '10th', 1: '1st', 2: '2nd'} and so on.
Below is the code I've cobbled together so far. It sort of works, but it only shows me the last entry in the file as the only key-value pair in the dict:
f3data = open('words')
mydict = {}
for line in f3data:
    print line.strip()
    cmyline = line.split()
    key = +1
    mydict [key] = cmyline
print mydict

key = +1
+1 is the same thing as 1. I assume you meant key += 1. I also can't see a reason why you'd split each line when there's only one item per line.
However, there's really no reason to do the looping yourself.
with open('words') as f3data:
    mydict = dict(enumerate(line.strip() for line in f3data))

dict(enumerate(x.rstrip() for x in f3data))
But your error is writing key = +1 where you meant key += 1.
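For comparison, here is a small sketch of the manual loop with the counter fixed. The words list is a stand-in for the lines of the file, not the real data.

```python
words = ["10th\n", "1st\n", "2nd\n"]  # stand-in for the lines of the file

mydict = {}
key = 0  # the counter must exist before the loop
for line in words:
    mydict[key] = line.strip()
    key += 1  # increment; `key = +1` would assign 1 on every pass

print(mydict)
```

With the counter fixed, the loop builds the same mapping as dict(enumerate(...)) applied to the stripped lines.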

f3data = open('words')
print f3data.readlines()

The use of zero-based numeric keys in a dict is very suspicious. Consider whether a simple list would suffice.
Here is an example using a list comprehension:
>>> mylist = [word.strip() for word in open('/usr/share/dict/words')]
>>> mylist[1]
'A'
>>> mylist[10]
"Aaron's"
>>> mylist[100]
"Addie's"
>>> mylist[1000]
"Armand's"
>>> mylist[10000]
"Loyd's"
I use str.strip() to remove whitespace and newlines, which are present in /usr/share/dict/words. This may not be necessary with your data.
However, if you really need a dictionary, Python's enumerate() built-in function is your friend here, and you can pass the output directly into the dict() function to create it:
>>> mydict = dict(enumerate(word.strip() for word in open('/usr/share/dict/words')))
>>> mydict[1]
'A'
>>> mydict[10]
"Aaron's"
>>> mydict[100]
"Addie's"
>>> mydict[1000]
"Armand's"
>>> mydict[10000]
"Loyd's"

With keys that dense, you don't want a dict, you want a list.
with open('words') as fp:
    data = list(map(str.strip, fp.readlines()))  # list() so this is also a list on Python 3
But if you really can't live without a dict:
with open('words') as fp:
    data = dict(enumerate(X.strip() for X in fp))

{index: x.strip() for index, x in enumerate(open('filename.txt'))}
This code uses a dictionary comprehension and the enumerate built-in, which takes an input sequence (in this case, the file object, which yields each line when iterated through) and returns an index along with the item. Then, a dictionary is built up with the index and text.
One question: why not just use a list if all of your keys are integers?
Finally, your original code should be
f3data = open('words')
mydict = {}
for index, line in enumerate(f3data):
    cmyline = line.strip()
    mydict[index] = cmyline
print mydict

Putting the words in a dict makes no sense. If you're using numbers as keys you should be using a list.
from __future__ import with_statement
with open('words.txt', 'r') as f:
    lines = f.readlines()
words = {}
for n, line in enumerate(lines):
    words[n] = line.strip()
print words
