Convert a text file into a dictionary - python

I have a text file in this format:
key:object,
key2:object2,
key3:object3
How can I convert this into a dictionary in Python for the following process?
Open the file
Check whether string s matches any key in the dictionary
If it does, set s to the object linked to that key.
If not, nothing happens
Close the file.
I've tried the following code to split them on commas, but the output was incorrect. It turned each key:object pair from the text file into both a single key and a single object, effectively duplicating it:
Code:
file = open("foo.txt","r")
dict = {}
for line in file:
    x = line.split(",")
    a = x[0]
    b = x[0]
    dict[a] = b
Incorrect output:
key:object, key:object
key2:object2, key2:object2
key3:object3, key3:object3
Thank you

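m = {}
with open("foo.txt") as file:
    for line in file:
        x = line.strip().replace(",", "")  # remove the newline and the comma, if present
        if not x:
            continue  # skip blank lines
        y = x.split(':', 1)  # split key and value on the first colon
        m[y[0]] = y[1]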
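The steps listed in the question (open the file, check whether s is a key, replace it with the linked object if so, otherwise leave it alone, close the file) can be sketched as follows. This is a minimal sketch assuming the key:object, layout shown above; the helper names parse_key_values and translate are hypothetical, and splitting on the first colon only is an assumption so that an object may itself contain a colon.

```python
def parse_key_values(lines):
    """Build a dict from lines shaped like 'key:object,' (trailing comma optional)."""
    result = {}
    for line in lines:
        line = line.strip().rstrip(",")      # drop the newline and the trailing comma
        if not line:
            continue                         # ignore blank lines
        key, _, value = line.partition(":")  # split on the first colon only
        result[key.strip()] = value.strip()
    return result


def translate(s, mapping):
    """Return the object linked to s if s is a key; otherwise return s unchanged."""
    return mapping.get(s, s)


# Stand-in for the lines of foo.txt
sample = ["key:object,\n", "key2:object2,\n", "key3:object3\n"]
mapping = parse_key_values(sample)
print(mapping)                      # {'key': 'object', 'key2': 'object2', 'key3': 'object3'}
print(translate("key2", mapping))   # object2
print(translate("other", mapping))  # other
```

With the real file you would pass the file object itself, e.g. mapping = parse_key_values(f) inside with open("foo.txt") as f:, which closes the file when the block ends. dict.get(s, s) covers the "if not, nothing happens" case by returning s unchanged.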

# -*- coding:utf-8 -*-
key_dict = {"key": '', "key5": '', "key10": ''}
File = open('/home/wangxinshuo/KeyAndObject', 'r')
List = File.readlines()
File.close()
for i in range(0, len(List)):
    for j in range(0, len(List[i])):
        if List[i][j] == ':':
            if List[i][0:j] in key_dict:
                # take the text after the colon (j + 1 skips the colon itself)
                for final_result in List[i][j + 1:].split(','):
                    if final_result.strip() != '':
                        key_dict[List[i][0:j]] = final_result.strip()
            break  # only the first colon separates the key from the object
print(key_dict)
I am using your file in "/home/wangxinshuo/KeyAndObject"

You can convert the content of your file to a dictionary with a one-liner like the one below (f is the open file):
result = {k:v for k,v in [line.strip().replace(",","").split(":") for line in f if line.strip()]}
If you want the dictionary values stripped of surrounding whitespace, use v.strip() instead of v in the comprehension.

Related

Python splitting data record

I have a record as below:
29 16
A 1.2595034 0.82587254 0.7375044 1.1270138 -0.35065323 0.55985355
0.7200067 -0.889543 0.2300735 0.56767654 0.2789483 0.32296127 -0.6423197 0.26456305 -0.07363393 -1.0788593
B 1.2467299 0.78651106 0.4702038 1.204216 -0.5282698 0.13987103
0.5911153 -0.6729466 0.377103 0.34090135 0.3052503 0.028784657 -0.39129165 0.079238065 -0.29310825 -0.99383247
I want to split the data into key-value pairs, ignoring the first row (i.e. 29 16).
The output should be something like this:
x = A , B
y = 1.2595034 0.82587254 0.7375044 1.1270138 -0.35065323 0.55985355 0.7200067 -0.889543 0.2300735 0.56767654 0.2789483 0.32296127 -0.6423197 0.26456305 -0.07363393 -1.0788593
1.2467299 0.78651106 0.4702038 1.204216 -0.5282698 0.13987103 0.5911153 -0.6729466 0.377103 0.34090135 0.3052503 0.028784657 -0.39129165 0.079238065 -0.29310825 -0.99383247
I am able to skip the first line using the code below:
f = open(fileName, 'r')
lines = f.readlines()[1:]
Now how do I split the rest of the records in Python?
So here's my take :D I expect you'd want to have the numbers parsed as well?
def generate_kv(fileName):
    with open(fileName, 'r') as file:
        # ignore first line
        file.readline()
        for line in file:
            if '' == line.strip():
                # empty line
                continue
            # split() with no argument also copes with repeated spaces
            values = line.split()
            try:
                yield values[0], [float(x) for x in values[1:]]
            except ValueError:
                print(f'one of the elements was not a float: {line}')

if __name__ == '__main__':
    x = []
    y = []
    for key, value in generate_kv('sample.txt'):
        x.append(key)
        y.append(value)
    print(x)
    print(y)
This assumes that the values in sample.txt look like this:
% cat sample.txt
29 16
A 1.2595034 0.82587254 0.7375044 1.1270138 -0.35065323 0.55985355 0.7200067 -0.889543 0.2300735 0.56767654 0.2789483 0.32296127 -0.6423197 0.26456305 -0.07363393 -1.0788593
B 1.2467299 0.78651106 0.4702038 1.204216 -0.5282698 0.13987103 0.5911153 -0.6729466 0.377103 0.34090135 0.3052503 0.028784657 -0.39129165 0.079238065 -0.29310825 -0.99383247
and the output:
% python sample.py
['A', 'B']
[[1.2595034, 0.82587254, 0.7375044, 1.1270138, -0.35065323, 0.55985355, 0.7200067, -0.889543, 0.2300735, 0.56767654, 0.2789483, 0.32296127, -0.6423197, 0.26456305, -0.07363393, -1.0788593], [1.2467299, 0.78651106, 0.4702038, 1.204216, -0.5282698, 0.13987103, 0.5911153, -0.6729466, 0.377103, 0.34090135, 0.3052503, 0.028784657, -0.39129165, 0.079238065, -0.29310825, -0.99383247]]
Alternatively, if you want a dictionary, do:
if __name__ == '__main__':
    print(dict(generate_kv('sample.txt')))
That passes the generated (key, value) pairs to dict() and outputs:
{'A': [1.2595034, 0.82587254, 0.7375044, 1.1270138, -0.35065323, 0.55985355, 0.7200067, -0.889543, 0.2300735, 0.56767654, 0.2789483, 0.32296127, -0.6423197, 0.26456305, -0.07363393, -1.0788593], 'B': [1.2467299, 0.78651106, 0.4702038, 1.204216, -0.5282698, 0.13987103, 0.5911153, -0.6729466, 0.377103, 0.34090135, 0.3052503, 0.028784657, -0.39129165, 0.079238065, -0.29310825, -0.99383247]}
You can use this script if your data is in a text file:
filename = 'file.text'
with open(filename) as f:
    data = f.readlines()[1:]  # skip the "29 16" header line
x = [data[0].split()[0], data[1].split()[0]]
y = [data[0].split()[1:], data[1].split()[1:]]
If you're happy to store the data in a dictionary, here is what you can do:
records = dict()
with open(filename, 'r') as f:
    f.readline()  # skip the first line
    for line in f:
        key, value = line.split(maxsplit=1)
        records[key] = value.split()
The structure of records would be:
{
    'A': ['1.2595034', '0.82587254', '0.7375044', ... ],
    'B': ['1.2467299', '0.78651106', '0.4702038', ... ]
}
What's happening
with ... as f: we're opening the file within a context manager. This lets us close the file automatically when the block finishes.
Because the open file keeps track of its position, we can use f.readline() to move past the first line.
line.split() turns a string into a list of strings. With the maxsplit=1 argument it only splits at the first run of whitespace.
e.g. after x, y = 'foo bar baz'.split(maxsplit=1), x is 'foo' and y is 'bar baz'
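To see the readline() and split(maxsplit=1) behaviour described above without a real file, the sketch below uses io.StringIO as a stand-in for the open file (an assumption for illustration) and keeps only the first two numbers of each record from the question.

```python
import io

# io.StringIO stands in for the open file: it supports readline() and iteration.
f = io.StringIO("29 16\nA 1.2595034 0.82587254\nB 1.2467299 0.78651106\n")

f.readline()  # consume the "29 16" header; iteration continues from the next line

records = {}
for line in f:
    key, value = line.split(maxsplit=1)  # 'A', '1.2595034 0.82587254\n'
    records[key] = value.split()         # split() also drops the trailing newline

print(records)  # {'A': ['1.2595034', '0.82587254'], 'B': ['1.2467299', '0.78651106']}

x, y = 'foo bar baz'.split(maxsplit=1)
print(x)  # foo
print(y)  # bar baz
```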
If I understood correctly, you want the numbers to be collected in a list. One way of doing this is:
import string
text = '''
29 16
A 1.2595034 0.82587254 0.7375044 1.1270138 -0.35065323 0.55985355 0.7200067 -0.889543 0.2300735 0.56767654 0.2789483 0.32296127 -0.6423197 0.26456305 -0.07363393 -1.0788593
B 1.2467299 0.78651106 0.4702038 1.204216 -0.5282698 0.13987103 0.5911153 -0.6729466 0.377103 0.34090135 0.3052503 0.028784657 -0.39129165 0.079238065 -0.29310825 -0.99383247
'''
lines = text.split('\n')
x = [
    line[1:].strip().split()
    for line in lines
    if line and line[0].lower() in string.ascii_letters]
This will produce a list of lists where each inner list contains the numbers associated with A, B, etc. (the letters themselves are not kept).
This code assumes that you are interested in lines starting with any single letter (case-insensitive).
For more elaborate conditions you may want to look into regular expressions.
Obviously, if your text is in a file, you could substitute lines = ... with:
with open(filepath, 'r') as lines:
    x = ...
Also, if the numbers on each line should be kept together as a single string rather than split into separate items, you may want to replace line[1:].strip().split() with line[1:].strip().
Instead, if you want the numbers as float and not string, you should replace line[1:].strip().split() with [float(value) for value in line[1:].strip().split()].
EDIT:
As an alternative to line[1:].strip().split(), you may want to use:
line.split(maxsplit=1)[1].split()
as suggested in some other answer. This would generalize better if the first token is not a single character.
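The answers above assume one record per line, while the record in the question wraps each record's numbers onto a second line. A minimal sketch for that layout, assuming a record starts with a non-numeric label and any line whose first token parses as a float continues the previous record; parse_wrapped_records is a hypothetical name, and the sample keeps only two numbers per line from the question.

```python
def parse_wrapped_records(lines):
    """Collect labelled records whose numbers may wrap onto following lines.

    Assumptions: the first line is a header to skip, a record starts with a
    non-numeric label such as 'A', and a line whose first token parses as a
    float continues the most recent record.
    """
    records = {}
    current = None
    it = iter(lines)
    next(it, None)  # skip the "29 16" header line
    for line in it:
        tokens = line.split()
        if not tokens:
            continue  # ignore blank lines
        try:
            float(tokens[0])
            continuation = True
        except ValueError:
            continuation = False
        if continuation:
            if current is not None:
                records[current].extend(float(t) for t in tokens)
        else:
            current = tokens[0]
            records[current] = [float(t) for t in tokens[1:]]
    return records


sample = [
    "29 16\n",
    "A 1.2595034 0.82587254\n",
    "0.7200067 -0.889543\n",
    "B 1.2467299 0.78651106\n",
    "0.5911153 -0.6729466\n",
]
records = parse_wrapped_records(sample)
x = list(records)           # ['A', 'B']
y = list(records.values())  # one list of floats per label
print(x)
print(y)
```

An open file object can be passed in place of the sample list, since the function only iterates over lines.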

Removing unnecessary double quotes from keys in a dictionary - Python

The goal of my program is to create a dictionary of items (key) and their count (values). The keys are extracted from a text file, in which they're organized as lists.
Example: ['synonymous_variant'] ['splice_region_variant&synonymous_variant'] ['synonymous_variant'] (each list is on a new line, without any separators)
Code:
from collections import Counter
file = open('/home/becquart/Stagiaire_refinement_construct_peptides/Travail5/RE__[Allogenomics]_travail_Vcf/results.txt', 'r').read()
for char in '""-.,\n[]':
    file = file.replace(char,' ')
    for i in char:
        file = file.replace('""', ' ')
file = file.lower()
word_list = file.split()
d = dict(Counter(word_list).most_common())
print(d)
The output is something like: {"'coding_sequence_variant&3_prime_utr_variant'": 6, "'inframe_insertion&nmd_transcript_variant'": 17 etc.
I would like to remove the " from the keys, but I am having a hard time figuring it out as I'm very new to programming. I would be extremely happy if I could get this solved.
Thank you in advance!
Edit:
Input file here: https://ufile.io/v1tm0
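The double quotes in that output are not part of the keys. When Python prints a dict it shows each key with repr(), and a string that itself contains single quotes is displayed wrapped in double quotes. The single quotes come from the file, and ' is not among the characters being replaced. A minimal sketch of the fix adds ' to that list; count_terms is a hypothetical name and the three-line sample is reconstructed from the example above.

```python
from collections import Counter


def count_terms(text):
    """Count the terms in text after removing brackets, quotes and punctuation."""
    # Same characters as the question, plus the single quote that was left behind.
    for char in '"\'-.,\n[]':
        text = text.replace(char, ' ')
    return dict(Counter(text.lower().split()).most_common())


# repr() of a string containing single quotes uses double quotes on the outside.
print(repr("'synonymous_variant'"))  # "'synonymous_variant'"

sample = ("['synonymous_variant']\n"
          "['splice_region_variant&synonymous_variant']\n"
          "['synonymous_variant']\n")
print(count_terms(sample))
# {'synonymous_variant': 2, 'splice_region_variant&synonymous_variant': 1}
```

Note that print d in the question is Python 2 syntax; print(d) works in both Python 2 and 3.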

Python: convert and join rows of a text file into a dictionary with list values

I have a text file with an arbitrary (non-Python) list of blocks of four lines, as follows:
WHAT
EVER
0.00000904
17577
FOO
BAR
7.00000031
426
The file comprises thousands of such blocks. How can I convert the data in the file into a dictionary of lists, where the key is the first two lines of each block, concatenated, and the next two lines are the list values? For example:
{'WHATEVER': [0.00000904, 17577], 'FOOBAR': [7.00000031, 426]}
Try the following:
import re
# Open the file
with open('odd_lines.txt') as f:
    data = f.read()
# Split on the double newline characters (this assumes a blank line between blocks)
data = data.split("\n\n")
# Split each element of the data list on the newline characters followed by a float
data = [re.split(r"\n(\d+\.\d+)", x) for x in data]
# Put the data in a dictionary with the key being the first element of each element of the data list.
# Make sure to remove the newline character from the key
output = {x[0].replace("\n", ""): [float(y) for y in x[1:]] for x in data}
print(output)
This should yield:
#{'WHATEVER': [9.04e-06, 17577.0], 'FOOBAR': [7.00000031, 426.0]}
The following is the starting file (odd_lines.txt):
WHAT
EVER
0.00000904
17577
FOO
BAR
7.00000031
426
I hope this helps.
You could do the following:
import os
# set base path to the directory containing the target file
root = os.getcwd()
# split on blank lines (double newlines); strip() drops the trailing newline at the end of the file
with open(os.path.join(root, 'test.txt'), 'r') as f:
    vals = f.read().strip().split('\n\n')
# create empty dictionary to store values
valdict = {}
# iterate over each item which should contain the keys and values
for val in vals:
    # the first two lines form the key; the remaining lines become a list of floats
    lines = val.split('\n')
    key = ''.join(lines[0:2])
    nums = list(map(float, lines[2:]))  # list() is needed in Python 3, where map is lazy
    valdict[key] = nums
print(valdict)
# output: {'WHATEVER': [9.04e-06, 17577.0], 'FOOBAR': [7.00000031, 426.0]}
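Both answers above rely on a blank line between blocks, while the file in the question shows none. A sketch that instead groups every four non-blank lines, assuming every block has exactly four lines; parse_blocks and to_number are hypothetical helper names. It also keeps whole numbers such as 17577 as int, which matches the desired output in the question.

```python
def to_number(text):
    """Return an int for whole numbers such as 17577, otherwise a float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_blocks(lines):
    """Turn four-line blocks (name part, name part, number, number) into a dict.

    Assumes every block has exactly four non-blank lines; blank separator
    lines, if any, are ignored.
    """
    rows = [line.strip() for line in lines if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("number of non-blank lines is not a multiple of four")
    result = {}
    for i in range(0, len(rows), 4):
        key = rows[i] + rows[i + 1]  # e.g. 'WHAT' + 'EVER'
        result[key] = [to_number(v) for v in rows[i + 2:i + 4]]
    return result


sample = ["WHAT\n", "EVER\n", "0.00000904\n", "17577\n",
          "FOO\n", "BAR\n", "7.00000031\n", "426\n"]
print(parse_blocks(sample))
# {'WHATEVER': [9.04e-06, 17577], 'FOOBAR': [7.00000031, 426]}
```

With the real file: with open('odd_lines.txt') as f: output = parse_blocks(f).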

Python: dictionary to collection

I have a file with 2 columns:
Anzegem Anzegem
Gijzelbrechtegem Anzegem
Ingooigem Anzegem
Aalst Sint-Truiden
Aalter Aalter
The first column is a town and the second column is the district of that town.
I made a dictionary of that file like this:
def readTowns(text):
    input = open(text, 'r')
    file = input.readlines()
    dict = {}
    verzameling = set()
    for line in file:
        tmp = line.split()
        dict[tmp[0]] = tmp[1]
    return dict
If I set a variable writeTown equal to readTowns(text) and then do writeTown['Anzegem'], I want to get the collection {'Anzegem', 'Gijzelbrechtegem', 'Ingooigem'}.
Does anybody know how to do this?
I think you can just create another function that builds the data structure you need. Otherwise you will end up writing code that manipulates the dictionary returned by readTowns to produce the data you want, so why not keep the code clean and put that in its own function? Just create a district-to-list-of-towns dictionary and you are all set.
def writeTowns(text):
    input = open(text, 'r')
    file = input.readlines()
    input.close()
    dict = {}
    for line in file:
        tmp = line.split()
        dict[tmp[1]] = dict.get(tmp[1]) or []
        dict.get(tmp[1]).append(tmp[0])
    return dict
writeTown = writeTowns('file.txt')
print(writeTown['Anzegem'])
And if you don't want to read the same file twice, you can build both dictionaries in one pass:
def readTowns(text):
    input = open(text, 'r')
    file = input.readlines()
    input.close()
    dict2town = {}
    town2dict = {}
    for line in file:
        tmp = line.split()
        dict2town[tmp[0]] = tmp[1]
        town2dict[tmp[1]] = town2dict.get(tmp[1]) or []
        town2dict.get(tmp[1]).append(tmp[0])
    return dict2town, town2dict
dict2town, town2dict = readTowns('file.txt')
print(town2dict['Anzegem'])
You could do something like this, although please have a look at @ubadub's answer; there are better ways to organise your data.
[town for town, region in dic.items() if region == 'Anzegem']
It sounds like you want to make a dictionary where the keys are the districts and the values are a list of towns.
A basic way to do this is:
def readTowns(text):
    with open(text, 'r') as f:
        file = f.readlines()
    my_dict = {}
    for line in file:
        tmp = line.split()
        if tmp[1] in my_dict:
            my_dict[tmp[1]].append(tmp[0])
        else:
            my_dict[tmp[1]] = [tmp[0]]
    return my_dict
The if/else blocks can also be replaced with Python's collections.defaultdict, a dict subclass, but I've used the if/else statements here for readability.
Also, dict is a built-in Python type (and file was one in Python 2), so it is bad practice to overwrite these names with your own local variables (notice I've changed dict to my_dict in the code above).
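The defaultdict variant mentioned above could look like this. A minimal sketch, assuming the two whitespace-separated columns from the question; read_districts is a hypothetical name, and it takes any iterable of lines so that an open file works as well.

```python
from collections import defaultdict


def read_districts(lines):
    """Map each district (second column) to the list of its towns (first column)."""
    districts = defaultdict(list)  # a missing key starts as an empty list
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue               # skip blank or malformed lines
        town, district = parts[0], parts[1]
        districts[district].append(town)
    return dict(districts)         # plain dict again, so unknown keys raise KeyError


sample = ["Anzegem Anzegem\n", "Gijzelbrechtegem Anzegem\n", "Ingooigem Anzegem\n",
          "Aalst Sint-Truiden\n", "Aalter Aalter\n"]
print(read_districts(sample)['Anzegem'])
# ['Anzegem', 'Gijzelbrechtegem', 'Ingooigem']
```

With the real file: with open('file.txt') as f: districts = read_districts(f).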
If you build your dictionary as {town: district}, so the town is the key and the district is the value, you can't do this easily*, because a dictionary is not meant to be used that way. Dictionaries let you quickly find the value associated with a given key. So if you want to find all the towns in a district, you are better off building your dictionary as:
{district: [list_of_towns]}
So for example the district Anzegem would appear as {'Anzegem': ['Anzegem', 'Gijzelbrechtegem', 'Ingooigem']}
And of course the value is your collection.
*you could probably do it by iterating through the entire dict and checking where your matches occur, but this isn't very efficient.
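If you already have the {town: district} dictionary from readTowns, you can invert it once and then look districts up directly. A sketch using dict.setdefault with sets, since the question asks for a collection like {'Anzegem', 'Gijzelbrechtegem', 'Ingooigem'}; invert is a hypothetical name. The inversion walks the dictionary a single time, after which each lookup is an ordinary dict access.

```python
def invert(town_to_district):
    """Build {district: set of towns} from {town: district} in a single pass."""
    district_to_towns = {}
    for town, district in town_to_district.items():
        district_to_towns.setdefault(district, set()).add(town)
    return district_to_towns


towns = {'Anzegem': 'Anzegem', 'Gijzelbrechtegem': 'Anzegem',
         'Ingooigem': 'Anzegem', 'Aalst': 'Sint-Truiden', 'Aalter': 'Aalter'}
by_district = invert(towns)
print(sorted(by_district['Anzegem']))
# ['Anzegem', 'Gijzelbrechtegem', 'Ingooigem']
```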

Creating a dictionary of lists from a file

I have a list in the following format in a txt file :
Shoes, Nike, Addias, Puma,...other brand names
Pants, Dockers, Levis,...other brand names
Watches, Timex, Tiesto,...other brand names
How can I put these into a dictionary in this format:
dictionary={Shoes: [Nike, Addias, Puma,.....]
Pants: [Dockers, Levis.....]
Watches:[Timex, Tiesto,.....]
}
How can I do this in a for loop rather than by typing it in manually?
I have tried:
clothes=open('clothes.txt').readlines()
clothing=[]
stuff=[]
for line in clothes:
    items=line.replace("\n","").split(',')
    clothing.append(items[0])
    stuff.append(items[1:])
Clothing:{}
for d in clothing:
    Clothing[d]= [f for f in stuff]
Here's a more concise way to do things, though you'll probably want to split it up a bit for readability:
wordlines = [line.split(', ') for line in open('clothes.txt').read().splitlines() if line]
d = {w[0]: w[1:] for w in wordlines}
How about:
file = open('clothes.txt')
clothing = {}
for line in file:
    items = [item.strip() for item in line.split(",")]
    clothing[items[0]] = items[1:]
Try this; it strips the line breaks for you and is quite simple, but effective:
clothes = {}
with open('clothes.txt', 'r') as clothesfile:
    for line in clothesfile:
        fields = [field.strip() for field in line.split(',')]  # strip spaces and the trailing newline
        key = fields[0]
        value = fields[1:]
        clothes[key] = value
The 'with' statement will make sure the file reader is closed after your code to implement the dictionary is executed. From there you can use the dictionary to your heart's content!
Using a list comprehension you could do:
clothes = [line.strip() for line in open('clothes.txt').readlines()]
clothingDict = {}
for line in clothes:
    arr = line.split(",")
    clothingDict[arr[0]] = [item.strip() for item in arr[1:]]
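The standard csv module can also do the splitting. This is a swapped-in technique rather than one of the answers above: csv.reader with skipinitialspace=True drops the space after each comma. read_brands is a hypothetical name and the sample lines are shortened from the question.

```python
import csv


def read_brands(lines):
    """Map the first field of each comma-separated line to the remaining fields."""
    result = {}
    # skipinitialspace=True ignores the space that follows each comma
    for row in csv.reader(lines, skipinitialspace=True):
        if row:  # csv.reader yields an empty list for a blank line
            result[row[0]] = row[1:]
    return result


sample = ["Shoes, Nike, Addias, Puma\n", "Pants, Dockers, Levis\n",
          "Watches, Timex, Tiesto\n"]
print(read_brands(sample))
# {'Shoes': ['Nike', 'Addias', 'Puma'], 'Pants': ['Dockers', 'Levis'], 'Watches': ['Timex', 'Tiesto']}
```

With the real file: with open('clothes.txt', newline='') as f: clothes = read_brands(f). The newline='' argument is what the csv documentation recommends for files passed to csv.reader.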
