I'm trying to clean up the content of a csv file and then create a new dictionary out of it. I want the new dictionary to be globally available:
import csv
input_file = csv.DictReader(open("test_file3.csv"))
final_dict = {}  # this should get filled with the new dictionary
for row in input_file:  # cleaning the dictionary
    new_dict = {}
    for key, value in row.items():
        if key == "Start Date":
            new_dict[key] = value
        else:
            first_replace = value.replace(".", "")
            second_replace = first_replace.replace(",", ".")
            all_replaced = second_replace.replace(" €", "")
            new_dict[key] = all_replaced
It works inside the first loop, but I don't know how to get the dictionary in new_dict into final_dict.
Ideally I want final_dict = new_dict.
You don't need to create a new_dict inside your for loop, just access final_dict inside it:
final_dict = {}
for row in input_file:
    for key, value in row.items():
        if key == "Start Date":
            final_dict[key] = value
        else:
            first_replace = value.replace(".", "")
            second_replace = first_replace.replace(",", ".")
            all_replaced = second_replace.replace(" €", "")
            final_dict[key] = all_replaced
print(final_dict)
If there are multiple entries with the same key, only the last one will be included in final_dict.
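Because every CSV row has the same column names, the note above means final_dict ends up holding only the last row. If every row should be kept, one option is to collect one cleaned dictionary per row in a list. The following is a minimal sketch under that assumption; clean_value, clean_rows and the in-memory sample data are hypothetical names introduced for illustration, not part of the original code.

```python
import csv
import io


def clean_value(value):
    """Turn text such as '1.234,56 €' into '1234.56' (same replacements as above)."""
    return value.replace(".", "").replace(",", ".").replace(" €", "")


def clean_rows(csv_file):
    """Return one cleaned dictionary per CSV row instead of overwriting a single one."""
    cleaned = []
    for row in csv.DictReader(csv_file):
        cleaned.append({
            key: value if key == "Start Date" else clean_value(value)
            for key, value in row.items()
        })
    return cleaned


# Hypothetical sample standing in for test_file3.csv.
sample = io.StringIO(
    'Start Date,Amount\n'
    '01.02.2020,"1.234,56 €"\n'
    '03.04.2020,"7,80 €"\n'
)
all_rows = clean_rows(sample)
print(all_rows)
```

Defining all_rows at module level makes it available to the rest of the script, which is what "globally available" usually amounts to here.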
I have a directory with many pickled files, each containing a dictionary. The filename indicates the setting of the dictionary. E.g.: 20NewsGroup___10___Norm-False___Probs-True___euclidean.pickle.
I want to combine these different dicts all in one large dict. To do this, I have written the following code:
import os
import pickle
PATH = '<SOME PATH>'
all_dicts = os.listdir(PATH)
one_dict = dict()
for i, filename in enumerate(all_dicts):
    infile = open(PATH+filename, 'rb')
    new_dict = pickle.load(infile)
    infile.close()
    splitted = filename.split('___')
    splitted[4] = splitted[4].split('.')[0]
    one_dict[splitted[0]][splitted[1]][splitted[2]][splitted[3]][splitted[4]] = new_dict
However, when I run this, I get a KeyError, as the key for splitted[0] does not yet exist. What is the best way to populate a dictionary similar to what I envision?
You need to create the intermediate dictionaries before you can assign into them.
Example:
import os
import pickle
from typing import List
PATH = '<SOME PATH>'
all_dicts = os.listdir(PATH)
def check_exist_path(target_dict, paths: List[str]):
    # Walk down the keys, creating an empty dict for each missing level.
    for key in paths:
        if key not in target_dict:
            target_dict[key] = {}
        target_dict = target_dict[key]
one_dict = dict()
for filename in all_dicts:
    infile = open(PATH+filename, 'rb')
    new_dict = pickle.load(infile)
    infile.close()
    splitted = filename.split('___')
    splitted[4] = splitted[4].split('.')[0]
    # Make sure the first four levels exist before assigning the fifth.
    check_exist_path(one_dict, splitted[:4])
    one_dict[splitted[0]][splitted[1]][splitted[2]][splitted[3]][splitted[4]] = new_dict
You could use a try-except block:
for i, filename in enumerate(all_dicts):
    infile = open(PATH+filename, 'rb')
    new_dict = pickle.load(infile)
    infile.close()
    splitted = filename.split('___')
    splitted[4] = splitted[4].split('.')[0]
    try:
        one_dict[splitted[0]][splitted[1]][splitted[2]][splitted[3]][splitted[4]] = new_dict
    except KeyError:
        # Create only the levels that are missing; existing ones are kept.
        level = one_dict
        for key in splitted[:4]:
            level = level.setdefault(key, {})
        level[splitted[4]] = new_dict
This code first tries to assign into the nested dictionaries. If any level is missing, the except branch creates only the missing levels with setdefault, so entries stored earlier under the same keys are kept.
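Both answers boil down to "create each missing level, then assign". As a rough sketch of that idea in isolation, the helper below walks a list of keys with dict.setdefault. The names set_nested and keys_from_filename, the second file name, and the placeholder payloads are assumptions made for illustration; in the real script the payload would come from pickle.load.

```python
def set_nested(target, keys, value):
    """Store value at target[keys[0]][keys[1]]...[keys[-1]], creating missing levels."""
    level = target
    for key in keys[:-1]:
        # setdefault returns the existing sub-dict, or inserts and returns a new one.
        level = level.setdefault(key, {})
    level[keys[-1]] = value


def keys_from_filename(filename):
    """Drop the extension and split on the '___' separator used in the file names."""
    stem = filename.rsplit(".", 1)[0]
    return stem.split("___")


one_dict = {}
names = [
    "20NewsGroup___10___Norm-False___Probs-True___euclidean.pickle",
    "20NewsGroup___10___Norm-False___Probs-True___cosine.pickle",  # hypothetical
]
for name in names:
    # Placeholder payload; the original code stores the unpickled dictionary here.
    set_nested(one_dict, keys_from_filename(name), {"source": name})

print(one_dict["20NewsGroup"]["10"]["Norm-False"]["Probs-True"].keys())
```

Unlike the fixed five-index assignment, this works for any number of '___'-separated parts, which may or may not be desirable.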
I have one nested list, and one list for "numbers"
test_keys = [["tobbe", "kalle"],["karl", "Clara"],["tobbe"],["tank"]]
test_values = ['123', '234','345','456']
res = {}
for key in test_keys:
    for value in test_values:
        res[value] = key
        test_values.remove(value)
        break
with open("myfile.txt", 'w') as f:
    for key, value in res.items():
        f.write('%s;%s;\n' % (key, value))
This provides the file
123;['tobbe', 'kalle'];
234;['karl', 'Clara'];
345;['finis'];
456;['tank'];
Now I want to load the data back into a dictionary without the ";" and later back into the corresponding lists.
Try this:
res = {}
with open("myfile.txt") as file:
    for line in file:
        chunks = line.split(';')
        names = chunks[1][1:-1].split(', ')
        res[chunks[0]] = [name[1:-1] for name in names]
print(res)
test_keys = []
test_values = []
for key in res:
    test_keys.append(key)
    test_values.append(res[key])
print(test_keys)
print(test_values)
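The slicing above relies on the exact spacing and quoting of the list's printed form. Since the file stores a Python list literal, another option is ast.literal_eval from the standard library, which parses literals safely without running arbitrary code. This is only a sketch: parse_line is a hypothetical helper, the lines are held in memory instead of being read from myfile.txt, and it assumes no name contains a ";".

```python
import ast


def parse_line(line):
    """Split 'key;[list literal];' into (key, list)."""
    key, raw_list, _rest = line.rstrip("\n").split(";", 2)
    # literal_eval only accepts Python literals, so it is safer than eval.
    return key, ast.literal_eval(raw_list)


# Lines in the format written to myfile.txt above.
lines = ["123;['tobbe', 'kalle'];\n", "456;['tank'];\n"]
res = dict(parse_line(line) for line in lines)
print(res)
```

If the file format is still open to change, writing it with the json module would avoid depending on Python's repr format at all.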
Here is my txt file with all of its lines. What I want to do is create a dictionary, access a key, and get a list of values:
Finance:JPMC
Software:Microsoft
Conglomerate:L&T
Conglomerate:Amazon
Software:Palantir
Defense:BAE
Defense:Lockheed
Software:TCS
Retail:TjMax
Retail:Target
Oil:Exxon
Oil:Chevron
Oil:BP
Oil:Gulf
Finance:Square
FMCG:PnG
FMCG:JohnsonNJohnson
FMCG:Nestle
Retail:Sears
Retail:FiveBelow
Defense:Boeing
Finance:Citadel
Finance:BridgeWater
Conglomerate:GE
Conglomerate:HoneyWell
Oil:ONGC
FMCG:Unilever
Semiconductor:Intel
Semiconductor:Nvidia
Semiconductor:Qualcomm
Semiconductor:Microchip
Conglomerate:Samsung
Conglomerate:LG
Finance:BoA
Finance:Discover
Software:TCS
Defense:Raytheon
Semiconductor:Microsemi
Defense:BAE
Software:Meta
Oil:SinoPec
Defense:Saab
Defense:Dassault
Defense:Airbus
Software:Adobe
Semiconductor:TSMC
FMCG:CocoCola
FMCG:Pesico
Retail:Kohls
Here is my attempted code
f = open("companyList.txt", "r")
sector, company = [], []
for line in f:
    first, second = line.split(":")
    sector.append(first)
    company.append(second)
dictionary = {}
for key in sector:
    for element in company:
        dictionary[sector].append(element)
print(dictionary)
Since there are multiple duplicate keys, I wanted to append to a list for that particular key, as Python doesn't allow duplicate keys.
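If I understand your question correctly, you can do this:
from collections import defaultdict
dictionary = defaultdict(list)
with open("companyList.txt", "r") as f:
    for line in f:
        # strip() removes the trailing newline from the company name
        first, second = line.strip().split(":")
        dictionary[first].append(second)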
I think this is what you want:
pairs = {}
with open("tst.txt", "r") as f:
    while True:
        line = f.readline().strip()
        if not line:
            break
        sector, value = line.split(":", 1)
        if sector not in pairs:
            pairs[sector] = []
        pairs[sector].append(value)
print(pairs)
You should do:
f = open("companyList.txt", "r")
sector, company = [], []
for line in f:
    first, second = line.strip().split(":")
    sector.append(first)
    company.append(second)
f.close()
dictionary = {}
for sectory, companyy in zip(sector, company):
    # Append to the sector's list instead of overwriting the previous company.
    dictionary.setdefault(sectory, []).append(companyy)
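The answers above all group values under a repeated key. As a small self-contained sketch of that pattern, the function below works on any iterable of "sector:company" lines, so it can be tried without the file. group_companies and the sample lines are illustrative assumptions, not taken from the answers.

```python
from collections import defaultdict


def group_companies(lines):
    """Map each sector to the list of companies that appear under it."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. at the end of the file
        # maxsplit=1 keeps any further ':' inside the company name.
        sector, company = line.split(":", 1)
        groups[sector].append(company)
    return dict(groups)


sample_lines = [
    "Finance:JPMC\n",
    "Software:Microsoft\n",
    "Finance:Square\n",
    "\n",
    "Software:TCS\n",
]
by_sector = group_companies(sample_lines)
print(by_sector["Finance"])
```

With the real file, the call would be group_companies(f) inside a with open("companyList.txt") block. Repeated entries in the input stay repeated in the lists; using a set as the default value instead would drop them, at the cost of losing order.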
I have written a script to convert a text file into a dictionary.
script.py
l=[]
d={}
count=0
f=open('/home/asha/Desktop/test.txt','r')
for row in f:
    rowcount+=1
    if row[0] == ' ' in row:
        l.append(row)
    else:
        if count == 0:
            temp = row
            count+=1
        else:
            d[temp]=l
            l=[]
            count=0
print d
textfile.txt
Time
 NtGetTickCount
 NtQueryPerformanceCounter
 NtQuerySystemTime
 NtQueryTimerResolution
 NtSetSystemTime
 NtSetTimerResolution
 RtlTimeFieldsToTime
 RtlTimeToTime
System informations
 NtQuerySystemInformation
 NtSetSystemInformation
 Enumerations
 Structures
The output I got is
{'Time\n': [' NtGetTickCount\n', ' NtQueryPerformanceCounter\n', ' NtQuerySystemTime\n', ' NtQueryTimerResolution\n', ' NtSetSystemTime\n', ' NtSetTimerResolution\n', ' RtlTimeFieldsToTime\n', ' RtlTimeToTime\n']}
It only converts up to the 9th line of the text file. Please suggest where I am going wrong.
You never commit the final list to the dictionary (i.e. run d[temp] = l), so the last section is lost.
You can simply commit the list when you read the title row:
d = {}
cur = []
for row in f:
    if row[0] == ' ':  # line in section
        cur.append(row)
    else:  # new title row: store a fresh list and keep appending to it
        d[row] = cur = []
print(d)
Using dict.setdefault to create dictionary with lists as values will make your job easier.
d = {}
with open('input.txt') as f:
    key = ''
    for row in f:
        if row.startswith(' '):
            d.setdefault(key, []).append(row.strip())
        else:
            key = row
print(d)
Output:
{'Time\n': ['NtGetTickCount', 'NtQueryPerformanceCounter', 'NtQuerySystemTime', 'NtQueryTimerResolution', 'NtSetSystemTime', 'NtSetTimerResolution', 'RtlTimeFieldsToTime', 'RtlTimeToTime'], 'System informations\n': ['NtQuerySystemInformation', 'NtSetSystemInformation', 'Enumerations', 'Structures']}
A few things to note here:
Always use with open(...) for file operations.
If you want to check the first character, or the first few characters, use str.startswith()
The same can be done using collections.defaultdict:
from collections import defaultdict
d = defaultdict(list)
with open('input.txt') as f:
    key = ''
    for row in f:
        if row.startswith(' '):
            d[key].append(row)
        else:
            key = row
You need to know two things at any given time while looping over the file:
1) Are we on a title level or content level (by indentation) and
2) What is the current title
In the following code, we first check whether the current line is a title (it does not start with a space). If so, we set currentTitle to it and insert it into the dictionary as a key with an empty list as its value.
If it is not a title, we just append the line to the current title's list.
with open('49359186.txt', 'r') as infile:
    topics = {}
    currentTitle = ''
    for line in infile:
        line = line.rstrip()
        if not line:  # skip blank lines so line[0] cannot fail
            continue
        if line[0] != ' ':
            currentTitle = line
            topics[currentTitle] = []
        else:
            topics[currentTitle].append(line)
print(topics)
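The two pieces of state described above (the current title, and whether a line is indented) can also be wrapped in a function that accepts any iterable of lines, which makes the logic easy to try without a file. This is a sketch under assumptions: parse_sections is a hypothetical name, the sample text is a shortened version of the question's file, and raising an error for an indented line before the first title is a design choice, not something the answers specify.

```python
import io


def parse_sections(lines):
    """Group indented lines under the most recent non-indented title line."""
    sections = {}
    current = None
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        if line[0].isspace():
            if current is None:
                raise ValueError("indented line before any title")
            sections[current].append(line.strip())
        else:
            current = line.strip()
            sections[current] = []
    return sections


sample = io.StringIO(
    "Time\n"
    " NtGetTickCount\n"
    " NtSetSystemTime\n"
    "System informations\n"
    " NtQuerySystemInformation\n"
)
topics = parse_sections(sample)
print(topics)
```

Because each title is stored the moment it is read, the last section needs no special handling after the loop.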
Try this:
d = {}
key = None
with open('/home/asha/Desktop/test.txt','r') as file:
    for line in file:
        if line.startswith(' '):
            d[key].append(line.strip())
        else:
            key = line.strip()
            d[key] = []
print(d)
Just for the sake of adding my 2 cents:
This problem is easier to tackle backwards. Consider iterating through your file in reverse and storing the collected values in the dictionary whenever a header is reached.
d = {}
l = []
with open('test.txt', 'r') as f:
    rows = f.read().split('\n')
for row in reversed(rows):
    if not row:  # skip blank lines, e.g. after the trailing newline
        continue
    if row[0] == ' ':
        l.append(row)
    else:
        l.reverse()  # restore the original file order
        d.update({row: l})
        l = []
Just keep track of the lines that start with ' ' and you are done with only one loop:
final=[]
keys=[]
flag=True
with open('new_text.txt','r') as f:
    data = []
    for line in f:
        if not line.startswith(' '):
            if line.strip():
                keys.append(line.strip())
            flag=False
            if data:
                final.append(data)
            data=[]
            flag=True
        else:
            if flag==True:
                data.append(line.strip())
final.append(data)
print(dict(zip(keys,final)))
output:
{'Example': ['data1', 'data2'], 'Time': ['NtGetTickCount', 'NtQueryPerformanceCounter', 'NtQuerySystemTime', 'NtQueryTimerResolution', 'NtSetSystemTime', 'NtSetTimerResolution', 'RtlTimeFieldsToTime', 'RtlTimeToTime'], 'System informations': ['NtQuerySystemInformation', 'NtSetSystemInformation', 'Enumerations', 'Structures']}
I have a text file with the following:
1 cdcdm
1 dhsajdhsa
2 ffdm
2 mdff
3 ccdfm
3 cdmfc
3 fmdcc
My goal is for the output to look like this:
1 : cdcdm, dhsajdhsa
2 : ffdm, mdff
3 : ccdfm, cdmfc, fmdcc
I wrote the following code, but for some reason, I'm not getting the expected output.
value_list = ''
cur_key = None
key = None
f = open('example.txt', 'r')
for line in f.readlines():
    try:
        key, value = line.split()
        key = key.strip()
        value = value.strip()
        if cur_key == key:
            value_list = value_list + "," + value
        else:
            if cur_key:
                print(cur_key + ":" +value_list)
                cur_key = key
                value_list = ''
            else:
                cur_key = key
    except Exception as e:
        continue
I'm getting the following output:
1:,dhsajdhsa
2:,mdff
How can I modify my code to get this to work?
A minimally changed implementation might look like this
with open('example.txt', 'r') as f:
    cur_key = None
    value_list = []
    for line in f.readlines():
        key, value = line.split()
        value = value.strip()
        if not cur_key:
            cur_key = key
        if cur_key == key:
            value_list.append(value)
        else:
            print(cur_key + ":" + ', '.join(value_list))
            cur_key = key
            value_list = [value]
    print(cur_key + ":" +', '.join(value_list))
output:
1:cdcdm, dhsajdhsa
2:ffdm, mdff
3:ccdfm, cdmfc, fmdcc
We need to make sure cur_key has a value on the first iteration, so we set it if it is still None. Also, when we find a new key, we shouldn't reset value_list to be empty; it should be set to the value read on that line so that the line is not skipped. Finally, to catch the last group, we print the values once more after the loop.
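Use itertools.groupby, which groups consecutive lines that share a key:
import itertools
with open('example.txt') as f:
    # Use the first whitespace-separated field as the key, so multi-digit keys work too.
    for key, strings in itertools.groupby(f, lambda s: s.split(None, 1)[0]):
        print('{}: {}'.format(
            key, ', '.join(s.split(None, 1)[1].strip() for s in strings)))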
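To make the groupby approach easier to experiment with, here is a sketch that returns a dictionary instead of printing, using in-memory lines in place of example.txt. group_sorted_lines is a hypothetical name. It assumes every non-blank line has both a key and a value, and that equal keys are adjacent as in the question's file; groupby only merges consecutive items, so with scattered keys a later group would overwrite an earlier one.

```python
from itertools import groupby


def group_sorted_lines(lines):
    """Group 'key value' lines by key; equal keys must be adjacent."""
    # Split each non-blank line once into [key, value].
    rows = (line.split(None, 1) for line in lines if line.strip())
    return {
        key: [value.strip() for _key, value in group]
        for key, group in groupby(rows, key=lambda row: row[0])
    }


sample_lines = [
    "1 cdcdm\n", "1 dhsajdhsa\n",
    "2 ffdm\n", "2 mdff\n",
    "3 ccdfm\n", "3 cdmfc\n", "3 fmdcc\n",
]
grouped = group_sorted_lines(sample_lines)
for key, values in grouped.items():
    print("{} : {}".format(key, ", ".join(values)))
```

If the input is not already ordered by key, sorting the rows by key first (or using a defaultdict) avoids the adjacency requirement.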
Here's an answer based on your code:
value_list = []
cur_key = None
f = open('example.txt', 'r')
for line in f:
    key, value = line.split()
    key = key.strip()
    value = value.strip()
    if cur_key == key or cur_key is None:
        value_list.append(value)
    else:
        print('{}: {}'.format(cur_key, ','.join(value_list)))
        value_list = [value]
    cur_key = key
if value_list:
    print('{}: {}'.format(cur_key, ','.join(value_list)))
I recommend throwing that away and using a collections.defaultdict. Then you can add values to a list for the corresponding key, and print the completed dictionary when you're done:
import collections
d = collections.defaultdict(list)
with open('example.txt') as f:
    for line in f:
        k,v = line.split()
        d[k].append(v.strip())
for k,v in sorted(d.items()):
    print('{} : {}'.format(k, ', '.join(v)))
I also believe there are better ways to do it, but if you really want to stick to the basics, at least use lists instead of concatenating text. Here's yet another version of your code, with slight changes:
lists = []
cur_key = None
key = None
f = open('example.txt', 'r')
for line in f.readlines():
    try:
        key, value = line.split()
        key = key.strip()
        value = value.strip()
        if cur_key != key:
            if(cur_key):
                lists.append(value_list)
            value_list = []
            cur_key = key
        value_list.append(value)
    except Exception as e:
        continue
lists.append(value_list)
for i,l in enumerate(lists):
    print(str(i+1) + ' : ' + ', '.join(l))