How to group arrays of the same name using Python?

I have over a thousand array categories in a text file, for example Category A1 and Category A2 (arrays in MATLAB code):
A1={[2,1,2]};
A1={[4,2,1,2,3]};
A2={[3,3,2,1]};
A2={[4,4,2,2]};
A2={[2,2,1,1,1]};
I would like to use Python to help me read the file and group them into:
A1=[{[2,1,2]} {[4,2,1,2,3]}];
A2=[{[3,3,2,1]} {[4,4,2,2]} {[2,2,1,1,1]}];

Use a dict to group them. I presume you mean grouping them as strings, since they are not valid Python containers coming from a MATLAB file:
from collections import OrderedDict
from pprint import pprint as pp

od = OrderedDict()
with open("infile") as f:
    for line in f:
        name, data = line.split("=")
        od.setdefault(name, []).append(data.rstrip(";\n"))

pp(list(od.values()))
[['{[2,1,2]}', '{[4,2,1,2,3]}'],
['{[3,3,2,1]}', '{[4,4,2,2]}', '{[2,2,1,1,1]}']]
To write the grouped data back to the file:
with open("infile", "w") as f:
    for k, v in od.items():
        f.write("{}=[{}];\n".format(k, " ".join(v)))
Output:
A1=[{[2,1,2]} {[4,2,1,2,3]}];
A2=[{[3,3,2,1]} {[4,4,2,2]} {[2,2,1,1,1]}];
This is your desired output: the semicolons are removed from each sub-array, the elements are grouped, and a semicolon is added at the end of each group to keep the data valid in your MATLAB file.
collections.OrderedDict keeps the order from your original file; a normal dict has no guaranteed order before Python 3.7 (from 3.7 on, a plain dict also preserves insertion order).
A safer approach when updating a file is to write to a temporary file and then replace the original with it, using NamedTemporaryFile and shutil.move:
from collections import OrderedDict
from shutil import move
from tempfile import NamedTemporaryFile

od = OrderedDict()
# mode "w" so the temp file accepts str (the default mode is binary "w+b")
with open("infile") as f, NamedTemporaryFile("w", dir=".", delete=False) as temp:
    for line in f:
        name, data = line.split("=")
        od.setdefault(name, []).append(data.rstrip("\n;"))
    for k, v in od.items():
        temp.write("{}=[{}];\n".format(k, " ".join(v)))
# replace the original only after both files are closed
move(temp.name, "infile")
If the code raises an error in the loop or your computer crashes during the write, your original file is preserved.
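A minimal sketch of the same grouping as a function that takes any iterable of lines, so it can be checked without touching the disk. It assumes Python 3.7+ (plain dicts, and therefore defaultdict, keep insertion order) and that every non-blank line looks like NAME={...};. The name group_lines is hypothetical.

```python
from collections import defaultdict


def group_lines(lines):
    # Group 'NAME={...};' lines by NAME and return MATLAB-style group lines.
    groups = defaultdict(list)  # keeps first-seen order of names (Python 3.7+)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, data = line.split("=", 1)  # split only on the first '='
        groups[name].append(data.rstrip(";"))
    return ["{}=[{}];".format(name, " ".join(items)) for name, items in groups.items()]


sample = [
    "A1={[2,1,2]};\n",
    "A1={[4,2,1,2,3]};\n",
    "A2={[3,3,2,1]};\n",
    "A2={[4,4,2,2]};\n",
    "A2={[2,2,1,1,1]};\n",
]
for out_line in group_lines(sample):
    print(out_line)
```

On the real file you would pass the open file object (with open("infile") as f: new_lines = group_lines(f)) and then write the lines back, for example with the temporary-file approach above.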

You can loop over your lines, split each one on =, then use str.strip and ast.literal_eval to extract the list inside the brackets, and finally use a dictionary's setdefault method to get your expected result:
import ast

d = {}
with open('file_name') as f:
    for line in f:
        var, set_ = line.split('=')
        d.setdefault(var, []).append(ast.literal_eval(set_.strip("{}\n;")))
print(d)
Result:
{'A1': [[2, 1, 2], [4, 2, 1, 2, 3]], 'A2': [[3, 3, 2, 1], [4, 4, 2, 2], [2, 2, 1, 1, 1]]}
If you want the result in exactly your expected format, you can do:
d = {}
with open('ex.txt') as f, open('new', 'w') as out:
    for line in f:
        var, set_ = line.split('=')
        d.setdefault(var, []).append(set_.strip(";\n"))
    print(d)
    for i, j in d.items():
        out.write('{}=[{}];\n'.format(i, ' '.join(j)))
Finally, the file new will contain:
A1=[{[2,1,2]} {[4,2,1,2,3]}];
A2=[{[3,3,2,1]} {[4,4,2,2]} {[2,2,1,1,1]}];
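If you parse the arrays into integer lists with literal_eval (the first snippet) but later need MATLAB text again, a small formatter can rebuild the cell syntax. This is a sketch that assumes each array holds integers; the helper names to_matlab_cell and format_groups are hypothetical.

```python
import ast


def to_matlab_cell(values):
    # Render a list of ints as a one-element MATLAB cell, e.g. {[2,1,2]}
    return "{[" + ",".join(str(v) for v in values) + "]}"


def format_groups(groups):
    # {'A1': [[2, 1, 2], ...]} -> ['A1=[{[2,1,2]} ...];', ...]
    return [
        "{}=[{}];".format(name, " ".join(to_matlab_cell(arr) for arr in arrays))
        for name, arrays in groups.items()
    ]


# Parse as in the answer above: strip braces/semicolon/newline, then literal_eval.
lines = ["A1={[2,1,2]};\n", "A1={[4,2,1,2,3]};\n", "A2={[3,3,2,1]};\n"]
groups = {}
for line in lines:
    name, data = line.split("=")
    groups.setdefault(name, []).append(ast.literal_eval(data.strip("{}\n;")))

print("\n".join(format_groups(groups)))
```

Keeping the parsed integers lets you work with the numbers in Python, while the formatter keeps the output valid for MATLAB.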

Related

Write a dictionary of lists to a tab-delimited file in Python, with the dictionary keys as columns, without pandas

The dictionary I am using is:
dict={'item': [1,2,3], 'id':['a','b','c'], 'car':['sedan','truck','moped'], 'color': ['r','b','g'], 'speed': [2,4,10]}
I am trying to produce tab-delimited output like this:
item id
1 a
2 b
3 c
The code I have written:
from csv import DictWriter

with open('file.txt', 'w') as tab_file:
    dict_writer = DictWriter(tab_file, dict.keys(), delimiter = '\t')
    dict_writer.writeheader()
    dict_writer.writerows(dict)
Specifically, I am struggling with writing to the file column by column: the dictionary keys should form the header, and each key's values should run vertically underneath it. Also, I do NOT have the luxury of using pandas.
This solution works for an arbitrary number of items and sub-items in the dict:
d = {'item': [1, 2, 3], 'id': [4, 5, 6]}
for i in d:
    print(i + "\t", end="")
    numSubItems = len(d[i])
print()
for level in range(numSubItems):
    for i in d:
        print(str(d[i][level]) + "\t", end="")
    print()
EDIT:
To implement this with writing to a text file:
d = {'item': [1, 2, 3], 'id': [4, 5, 6], 'test': [6, 7, 8]}
with open('file.txt', 'w') as f:
    for i in d:
        f.write(i + "\t")
        numSubItems = len(d[i])
    f.write("\n")
    for level in range(numSubItems):
        for i in d:
            f.write(str(d[i][level]) + "\t")
        f.write("\n")
Here's a way to do this using a one-off function and zip:
d = {
'item': [1, 2, 3],
'id': ['a', 'b', 'c'],
'car': ['sedan', 'truck', 'moped'],
'color': ['r', 'b', 'g'],
'speed': [2, 4, 10],
}
def row_printer(row):
    print(*row, sep='\t')

row_printer(d.keys())  # Print header
for t in zip(*d.values()):  # Print rows
    row_printer(t)
To print to a file, pass an open file object rather than a filename: print(..., file=f), where f = open('file.txt', 'w').
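print()'s file argument expects an object with a write() method (such as an open file), not a filename. A sketch that threads the file object through row_printer; the extra out parameter and the write_table helper are additions for illustration, not part of the original answer.

```python
import io


def row_printer(row, out=None):
    # print() writes to any object with write(); file=None means stdout
    print(*row, sep="\t", file=out)


def write_table(d, out=None):
    row_printer(d.keys(), out)    # header: the dict keys
    for t in zip(*d.values()):    # transpose columns into rows
        row_printer(t, out)


d = {"item": [1, 2, 3], "id": ["a", "b", "c"]}

with open("file.txt", "w") as f:  # an open file object, not a filename
    write_table(d, f)

buf = io.StringIO()               # in-memory file, handy for checking the output
write_table(d, buf)
print(buf.getvalue(), end="")
```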
You can use a simple loop with a zip:
d = {'item': [1,2,3], 'id': ["a","b","c"]}
print('item\tid')
for num, letter in zip(d['item'], d['id']):
    print('\t'.join((str(num), letter)))
item id
1 a
2 b
3 c
EDIT:
If you don't want to hard-code the column names, you can use this:
d = {'item': [1,2,3], 'id': ["a","b","c"]}
print('\t'.join(d.keys()))
for row in zip(*d.values()):
    print('\t'.join(map(str, row)))
However, a plain dictionary only guarantees the column order in Python 3.7+. On older versions, use an OrderedDict instead, built from a list of pairs (a dict literal passed to OrderedDict would already have lost its order):
from collections import OrderedDict
d = OrderedDict([('item', [1,2,3]), ('id', ["a","b","c"])])
print('\t'.join(d.keys()))
for row in zip(*d.values()):
    print('\t'.join(map(str, row)))
Instead of csv.DictWriter, you could also use a library such as pandas, if it is available:
import pandas as pd
df = pd.DataFrame.from_dict(d)
df.to_csv("test.csv", sep="\t", index=False)
You may have to install it first:
pip3 install pandas
See here for an example.
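Since the question itself uses csv.DictWriter, here is a sketch of making that work without pandas. DictWriter.writerows expects an iterable of dicts, one per row, so passing the dict of columns directly fails; the columns have to be transposed with zip first. This assumes all lists have equal length (zip stops at the shortest one), and write_columns is a hypothetical name.

```python
import csv
import io


def write_columns(d, out):
    # Write a dict of equal-length lists as tab-separated columns.
    writer = csv.DictWriter(out, fieldnames=list(d), delimiter="\t", lineterminator="\n")
    writer.writeheader()
    # zip(*d.values()) yields one tuple per row; pair each value with its key
    writer.writerows(dict(zip(d, row)) for row in zip(*d.values()))


d = {"item": [1, 2, 3], "id": ["a", "b", "c"]}
buf = io.StringIO()
write_columns(d, buf)
print(buf.getvalue(), end="")
```

For a real file, open it with open('file.txt', 'w', newline='') as the csv module documentation recommends, and pass that file object instead of buf.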

I want to convert a dictionary with tuples as keys to a specific format and then store it in a file

I have a dictionary dic = {(1,2,3): 3, (2,3,4): 2, (3,4,8): 5}
I want to save it in the text file output.txt in this format:
1 2 3 (3)
2 3 4 (2)
3 4 8 (5)
How can I modify the following code for this task?
dic = {(1,2,3): 3, (2,3,4): 2, (3,4,8): 5}
with open('output.txt', 'w') as file:
    file.write(str(dic))
Iterate the dictionary and write content to text file.
Ex:
dic = {(1,2,3): 3, (2,3,4): 2, (3,4,8): 5}
with open('output.txt', 'w') as file:
    for k, v in dic.items():  # Iterate dic
        file.write("{} ({})\n".format(" ".join(map(str, k)), v))  # Write to file.
dic = {(1,2,3): 3, (2,3,4): 2, (3,4,8): 5}
with open('output.txt', 'w') as file:
    for k, v in dic.items():  # Iterate dic
        file.write("{} ({})\n".format(k, v))  # Write to file.
Here we just pass the key and value to the format function. Note that this writes the tuple as-is, so the lines look like (1, 2, 3) (3) rather than 1 2 3 (3); to get exactly the requested format, join the key's elements first with " ".join(map(str, k)).
str.format() is one of Python 3's string formatting methods; it allows multiple substitutions and value formatting, and lets you combine elements within a string through positional formatting.
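A sketch that separates the line formatting from the file writing, so the exact requested format can be checked before writing output.txt. The helper names format_entry and write_dict are hypothetical.

```python
import io


def format_entry(key, value):
    # (1, 2, 3), 3 -> '1 2 3 (3)'
    return "{} ({})".format(" ".join(map(str, key)), value)


def write_dict(dic, out):
    for k, v in dic.items():
        out.write(format_entry(k, v) + "\n")


dic = {(1, 2, 3): 3, (2, 3, 4): 2, (3, 4, 8): 5}

buf = io.StringIO()   # in-memory check of the exact text
write_dict(dic, buf)
print(buf.getvalue(), end="")

with open("output.txt", "w") as file:
    write_dict(dic, file)
```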

Error in retrieving dictionary keys from a file in Python

There are similar questions/answers on SO, but this refers to a specific error, and I have referred to the relevant SO topics to solve this, but with no luck.
The code I have seeks to retrieve lines from a text file and read them into a dictionary. It works, but as you can see below, not completely.
File
"['a', 5]"
"['b', 2]"
"['c', 3]"
"['d', 0]"
Code
def readfiletodict():
    with open("testfile.txt","r",newline="") as f:
        mydict={} #create a dictionary called mydict
        for line in f:
            (key,val) = line.split(",")
            mydict[key]=val
    print(mydict) #test
    for keys in mydict:
        print(keys) #test to see if the keys are being retrieved correctly
readfiletodict()
Desired output:
I want the dictionary to hold the keys a, b, c, d and the corresponding values shown in the file, without the unwanted characters. Similarly, I need the values to be stored in the dictionary as integers (so that they can be worked with later).
For quick replication see: https://repl.it/KgQe/0 for the whole code and problem
Current (erroneous) output:
Python 3.6.1 (default, Dec 2015, 13:05:11)
[GCC 4.8.2] on linux
{'"[\'a\'': ' 5]"\r\n', '"[\'b\'': ' 2]"\r\n', '"[\'c\'': ' 3]"\r\n', '"[\'d\'': ' 0]"\r\n'}
"['a'
"['b'
"['c'
"['d'
The Stackoverflow answer I have used in my current code is from: Python - file to dictionary? but it doesn't quite work for me...
Your code slightly modified; the key is to strip out all the chars that we don't care about ([Python]: str.strip([chars])):
def readfiletodict():
    with open("testfile.txt", "r") as f:
        mydict = {}  # create a dictionary called mydict
        for line in f:
            key, val = line.strip("\"\n[]").split(",")
            mydict[key.strip("'")] = val.strip()
    print(mydict)  # test
    for key in mydict:
        print(key)  # test to see if the keys are being retrieved correctly
readfiletodict()
Output:
(py35x64_test) c:\Work\Dev\StackOverflow\q46041167>python a.py
{'d': '0', 'c': '3', 'a': '5', 'b': '2'}
d
c
a
b
The efficient way to do this would be using Python lists, as suggested by @Tico.
However, if for some reason you can't, you can try this:
import re
lineFormat = re.sub('[^A-Za-z0-9,]+', '', line)
This will transform "['a', 5]" to a,5. Now you can apply your split function:
(key, val) = lineFormat.split(",")
mydict[key] = val
It's much easier if you turn each line into a real Python list, so you don't need to parse it by hand. Use json.loads:
import json
...
list_line = json.loads(line)
...
Hope it helps!
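One caveat: each line in this file is itself a double-quoted string, and the inner text uses single quotes, which JSON does not allow. So json.loads(line) returns the inner text "['a', 5]" as a str rather than a list. A sketch that swaps in ast.literal_eval (a different tool from the json answer) and applies it twice: once to unwrap the outer quotes, once to parse the list. The name parse_line is hypothetical.

```python
import ast


def parse_line(line):
    # Example line (with its outer double quotes): "['a', 5]"
    inner = ast.literal_eval(line.strip())  # 1st pass: unwrap the quotes -> "['a', 5]"
    key, value = ast.literal_eval(inner)    # 2nd pass: parse the list -> ['a', 5]
    return key, int(value)


lines = ['"[\'a\', 5]"\n', '"[\'b\', 2]"\n', '"[\'c\', 3]"\n', '"[\'d\', 0]"\n']
mydict = dict(parse_line(line) for line in lines if line.strip())
print(mydict)

# With the real file:
# with open("testfile.txt") as f:
#     mydict = dict(parse_line(line) for line in f if line.strip())
```

ast.literal_eval only evaluates Python literals, so unlike eval() it will not run arbitrary code from the file.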
You can use regex and a dict-comprehension to do that:
#!/usr/bin/env python
import re

with open('file.txt', 'r') as f:
    l = f.read().splitlines()
d = {''.join(re.findall(r'[a-zA-Z]+', i)): int(''.join(re.findall(r'\d', i))) for i in l}
Result:
{'a': 5, 'c': 3, 'b': 2, 'd': 0}
Using only a very basic knowledge of Python:
>>> mydict = {}
>>> with open('temp.txt') as the_input:
...     for line in the_input:
...         values = line.replace('"', '').replace("'", '').replace(',', '').replace('[', '').replace(']', '').rstrip().split(' ')
...         mydict[values[0]] = int(values[1])
...
>>> mydict
{'a': 5, 'b': 2, 'c': 3, 'd': 0}
In other words, discard all of the punctuation, leaving only the blank between the two values needed for the dictionary. Split on that blank, then put the pieces from the split into the dictionary.
Edit: In a similar vein, using a regex. re.sub looks for any of the alternative characters given in its first argument and replaces each one found with its second argument, an empty string. The alternatives are separated by the '|' character in a regex pattern. Some of them, such as '[', must be escaped with a '\' because on their own they have special meanings in a regex.
>>> import re
>>> mydict = {}
>>> with open('temp.txt') as the_input:
...     for line in the_input:
...         values = re.sub(r'"|\'|\[|\]|,', '', line).split(' ')
...         mydict[values[0]] = int(values[1])
...
>>> mydict
{'a': 5, 'b': 2, 'c': 3, 'd': 0}
You were almost there, missing two things:
stripping the keys
converting the values
The following code does what you need (I think):
#!/usr/bin/env python
# -*- coding: utf-8 -*-
output = dict()
with open('input', 'r') as inputfile:
    for line in inputfile:
        line = line.strip('"[]\n')
        key, val = line.split(',')
        output[key.strip("'")] = int(val)
Be careful, however: this code is very brittle and won't correctly handle any variation on the input format you have provided. To build on it, I'd recommend at least wrapping the int conversion in try/except ValueError and reconsidering which characters to strip.
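Following that recommendation, a sketch that collects malformed lines instead of crashing. Skipping bad lines and returning them as (line number, text) pairs is an assumption; raising an error may suit your data better. The name parse_pairs is hypothetical.

```python
def parse_pairs(lines):
    # Parse lines such as "['a', 5]" into a dict; also return the bad lines.
    output = {}
    bad = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip().strip('"[]')  # drop whitespace, outer quotes, brackets
        if not stripped:
            continue                          # ignore blank lines
        key, sep, val = stripped.partition(",")
        if not sep:
            bad.append((lineno, line))        # no comma at all
            continue
        try:
            output[key.strip().strip("'\"")] = int(val)  # int() tolerates surrounding spaces
        except ValueError:
            bad.append((lineno, line))        # value is not an integer
    return output, bad


data = ['"[\'a\', 5]"\n', '"[\'b\', x]"\n', "\n", '"[\'c\', 3]"\n']
result, problems = parse_pairs(data)
print(result)
print(problems)
```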

How to print values from a file?

I have a text file and its content is something like this:
A:3
B:5
C:7
A:8
C:6
I need to print:
A numbers: 3, 8
B numbers: 5
C numbers: 7, 6
I'm a beginner so if you could give some help I would appreciate it. I have made a dictionary but that's pretty much all I know.
You could use an approach that keeps the values in a dictionary:
d = {}  # create an empty dictionary
with open(filename) as f:  # open the file
    for line in f:
        k, v = line.strip().split(':')  # split each line into the part before ':' and after
        if k in d:  # add the value to the dictionary
            d[k].append(v)
        else:
            d[k] = [v]
This gives you a dictionary containing your file in a format that you can use to get the desired output:
for key, values in sorted(d.items()):
    print(key, 'numbers:', ', '.join(values))
sorted() is used because dictionaries do not preserve insertion order before Python 3.7; it also puts the keys in alphabetical order.
Note that using collections.defaultdict instead of a normal dict simplifies the approach somewhat. This part:
d = {}
...
if k in d:  # add the value to the dictionary
    d[k].append(v)
else:
    d[k] = [v]
could then be replaced by:
from collections import defaultdict
d = defaultdict(list)
...
d[k].append(v)
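Put together, a sketch of the defaultdict version that strips each line (otherwise every value keeps its trailing newline and the printed numbers break across lines) and skips blank lines. The names group_values and format_report are hypothetical.

```python
from collections import defaultdict


def group_values(lines):
    # 'A:3' lines -> {'A': ['3', ...]}, values kept in file order
    d = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        k, v = line.split(":", 1)
        d[k.strip()].append(v.strip())
    return d


def format_report(d):
    return ["{} numbers: {}".format(k, ", ".join(vs)) for k, vs in sorted(d.items())]


sample = ["A:3\n", "B:5\n", "C:7\n", "A:8\n", "C:6\n"]
for out in format_report(group_values(sample)):
    print(out)
```

With the real file, pass the open file object: with open(filename) as f: d = group_values(f).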
Short version (sorted in alphabetical order):
d = {}
lines = [line.rstrip('\n') for line in open('filename.txt')]
[d.setdefault(line[0], []).append(line[2]) for line in lines]
[print(key, 'numbers:', ', '.join(values)) for key,values in sorted(d.items())]
Or, if you want to keep the order in which the keys appear in the file:
from collections import OrderedDict
d = OrderedDict() # Empty dict
lines = [line.rstrip('\n') for line in open('filename.txt')] # Get the lines
[d.setdefault(line[0], []).append(line[2]) for line in lines] # Add lines to dictionary
[print(key, 'numbers:', ', '.join(values)) for key,values in d.items()] # Print lines
Tested with Python 3.5.
You can treat your file as a CSV (delimiter-separated values) file with ':' as the delimiter, so the csv module can parse it in one line. Then use a defaultdict, passing list to its constructor, so that a new empty list is created whenever a key does not exist yet. Finally, use the OrderedDict class, because a standard dictionary does not keep the order of your keys (before Python 3.7).
import csv
from collections import defaultdict, OrderedDict

with open('your_file_name', newline='') as f:
    values = list(csv.reader(f, delimiter=":"))  # [['A', '3'], ['B', '5'], ['C', '7'], ['A', '8'], ['C', '6']]
dct_values = defaultdict(list)
for k, v in values:
    dct_values[k].append(v)
dct_values = OrderedDict(sorted(dct_values.items()))
Then you can simply print iterating the dictionary.
A very easy way to group by key is with an external library; if you are interested, try PyFunctional.

Stripping string from lines in text file and reading columns into dictionary with lists as values

I've been struggling a bit with getting my input file into the right format for my algorithm.
I want to read this text file:
1 -> 7,8
11 -> 1,19
219 -> 1,9,8
Into this dictionary:
{ 1: [7, 8], 11: [1, 19], 219: [1, 9, 8]}
I've tried this code:
with open("file.txt", "r+") as f:
    f.write(f.read().replace("->", " "))
    f.close()
d = {}
with open("file.txt") as file:
    for line in file:
        (key, val) = line.split()
        d[key] = val
But this code gets stuck because there are more than two items in the second column. How can I make a list out of the elements in the second column and use that list as the value for each key?
There is no need to do that pre-processing step to remove the '->'. Simply use:
d = {}
with open("file.txt") as file:
    for line in file:
        left, right = line.split('->')
        d[int(left)] = [int(v) for v in right.split(',')]
You can even use dictionary comprehension and make it a one-liner:
with open("file.txt") as file:
    d = {int(left): [int(v) for v in right.split(',')]
         for left, right in (line.split('->') for line in file)}
This gives:
>>> d
{1: [7, 8], 11: [1, 19], 219: [1, 9, 8]}
Nest a generator expression with str.split in a dictionary comprehension, converting the key to an integer and mapping the value to integers:
with open('file.txt') as f:
    result = {int(k): list(map(int, v.split(','))) for k, v in (line.split(' -> ') for line in f)}
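Both answers assume every line is well formed. A sketch that also tolerates blank lines, extra spaces and a trailing comma, and uses str.partition so a line without '->' produces a clear error message instead of a tuple-unpacking error. The name parse_adjacency is hypothetical.

```python
def parse_adjacency(lines):
    # '1 -> 7,8' lines -> {1: [7, 8]}; raises ValueError on malformed lines
    d = {}
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue                          # skip blank lines
        left, arrow, right = line.partition("->")
        if not arrow:
            raise ValueError("line {}: missing '->': {!r}".format(lineno, line))
        # int() ignores surrounding whitespace, so ' 7' and '8\n' are fine
        d[int(left)] = [int(v) for v in right.split(",") if v.strip()]
    return d


sample = ["1 -> 7,8\n", "11 -> 1,19\n", "\n", "219 -> 1, 9, 8\n"]
print(parse_adjacency(sample))
```

With the real file: with open("file.txt") as f: d = parse_adjacency(f).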
