I have a dictionary in this format:
data_dict = {'a' : [1,2,3], 'b' : [[4,5],[6,7],[8,9]]}
What I would like to do is write the data from the dictionary to a CSV file in column format, so each key becomes a column title and its values go underneath. The output should look like:
a b
1 [4,5]
2 [6,7]
3 [8,9]
I have tried csv.DictWriter and csv.writer, but nothing has worked for me.
You can use zip to aggregate elements from multiple iterables:
>>> rows = zip([1,2,3], [[4,5],[6,7],[8,9]])
>>> for row in rows:
... print(row)
...
(1, [4, 5])
(2, [6, 7])
(3, [8, 9])
import csv
import sys

data_dict = {'a': [1, 2, 3], 'b': [[4, 5], [6, 7], [8, 9]]}

keys = sorted(data_dict)  # sort to get the keys in a fixed order
values = [data_dict[key] for key in keys]

writer = csv.writer(sys.stdout, delimiter='\t')  # replace `sys.stdout` with a file object as needed
writer.writerow(keys)
for row in zip(*values):
    writer.writerow(row)
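For example, a minimal sketch that writes the same thing to an actual file (the name output.csv is just a placeholder; newline='' stops the csv module from writing extra blank lines on Windows):
import csv

data_dict = {'a': [1, 2, 3], 'b': [[4, 5], [6, 7], [8, 9]]}
keys = sorted(data_dict)

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerow(keys)                                    # header row
    writer.writerows(zip(*(data_dict[key] for key in keys)))  # data rows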
the dictionary I am using is:
dict={'item': [1,2,3], 'id':['a','b','c'], 'car':['sedan','truck','moped'], 'color': ['r','b','g'], 'speed': [2,4,10]}
I am trying to produce tab-delimited output like this:
item id
1 a
2 b
3 c
The code I have written:
from csv import DictWriter

with open('file.txt', 'w') as tab_file:
    dict_writer = DictWriter(tab_file, dict.keys(), delimiter='\t')
    dict_writer.writeheader()
    dict_writer.writerows(dict)
Specifically, I am struggling with writing to the file in a column-based manner, meaning that the dictionary keys populate the header and the dictionary values populate vertically underneath the associated header. Also, I do NOT have the luxury of using Pandas.
This solution will work for an arbitrary number of items and subitems in the dict:
d = {'item': [1, 2, 3], 'id': [4, 5, 6]}

for i in d:
    print(i + "\t", end="")
    numSubItems = len(d[i])
print()

for level in range(numSubItems):
    for i in d:
        print(str(d[i][level]) + "\t", end="")
    print()
EDIT:
To implement this with writing to a text file:
d = {'item': [1, 2, 3], 'id': [4, 5, 6], 'test': [6, 7, 8]}

with open('file.txt', 'w') as f:
    for i in d:
        f.write(i + "\t")
        numSubItems = len(d[i])
    f.write("\n")

    for level in range(numSubItems):
        for i in d:
            f.write(str(d[i][level]) + "\t")
        f.write("\n")
Here's a way to do this using a one-off function and zip:
d = {
    'item': [1, 2, 3],
    'id': ['a', 'b', 'c'],
    'car': ['sedan', 'truck', 'moped'],
    'color': ['r', 'b', 'g'],
    'speed': [2, 4, 10],
}

def row_printer(row):
    print(*row, sep='\t')

row_printer(d.keys())       # print header
for t in zip(*d.values()):  # print rows
    row_printer(t)
To print to a file, pass an open file object via print's file keyword argument (a bare filename string will not work).
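A minimal sketch of that file-writing variant, reusing d from above (file.txt is just a placeholder):
def row_printer(row, file):
    print(*row, sep='\t', file=file)

with open('file.txt', 'w') as f:
    row_printer(d.keys(), f)        # header
    for t in zip(*d.values()):      # rows
        row_printer(t, f)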
You can use a simple loop with zip:
d = {'item': [1, 2, 3], 'id': ["a", "b", "c"]}

print('item\tid')
for num, letter in zip(d['item'], d['id']):
    print('\t'.join([str(num), letter]))
item id
1 a
2 b
3 c
EDIT:
If you don't want to hard-code the column names you can use this:
d = {'item': [1, 2, 3], 'id': ["a", "b", "c"]}

print('\t'.join(d.keys()))
for num, letter in zip(*d.values()):
    print('\t'.join([str(num), letter]))
However, the order of the columns is only guaranteed for a plain dictionary in Python 3.7+ (insertion order). On earlier Python versions use an OrderedDict instead, like this:
from collections import OrderedDict

# pass the key/value pairs as a list of tuples so the order is preserved
# even on versions where dict literals are unordered
d = OrderedDict([('item', [1, 2, 3]), ('id', ["a", "b", "c"])])
print('\t'.join(d.keys()))
for num, letter in zip(*d.values()):
    print('\t'.join([str(num), letter]))
Instead of using csv.DictWriter you can also use a module like pandas for this:
import pandas as pd

df = pd.DataFrame.from_dict(d)
df.to_csv("test.csv", sep="\t", index=False)
You may have to install it first using
pip3 install pandas
See here for an example.
I have the following dict:
items = {'people': ['Peter', 'Danny'], 'numbers': [1,2,3,4], 'cities': ['London']}
And I would like to write that dict to a CSV file by columns, that is, with the following format:
people,numbers,cities
Peter,1,London
Danny,2,
,3,
,4,
My current approach won't work because I get the CSV file by rows:
people,Peter,Danny
numbers,1,2,3,4
cities,London
How can I do what I need?
Or you can use Pandas for that, which takes only a couple of lines:
import pandas as pd

# wrapping each list in a Series lets pandas pad unequal columns with NaN
pd.DataFrame({k: pd.Series(v) for k, v in items.items()}).fillna('').to_csv('file_path', index=False)
You can use itertools.zip_longest (itertools.izip_longest in Python2):
from itertools import zip_longest
import csv
items = {'people': ['Peter', 'Danny'], 'numbers': [1,2,3,4], 'cities': ['London']}
headers = ['people', 'numbers', 'cities']
with open('filename.csv', 'w', newline='') as f:
    # zip_longest pads the shorter columns with None; turn those into empty strings
    full_listing = [['' if b is None else b for b in i]
                    for i in zip_longest(*[items[c] for c in headers])]
    write = csv.writer(f)
    write.writerows([headers] + full_listing)
Output:
people,numbers,cities
Peter,1,London
Danny,2,
,3,
,4,
A simple way is to calculate the length of the longest list in your dictionary, and then append '' to all the lists so they have this length.
num_rows = max(len(x) for x in items.values())
items = {k: items[k] + [''] * (num_rows - len(items[k])) for k in items}
print(items)
#{'cities': ['London', '', '', ''],
# 'numbers': [1, 2, 3, 4],
# 'people': ['Peter', 'Danny', '', '']}
Then write the dict to a CSV file using the csv module.
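A minimal sketch using the padded items from above (the file name is arbitrary):
import csv

with open('filename.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(items.keys())           # header row
    writer.writerows(zip(*items.values()))  # one row per index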
Or you can build a pandas DataFrame from your (padded) dictionary:
import pandas as pd
df = pd.DataFrame(items)
print(df)
# cities numbers people
#0 London 1 Peter
#1 2 Danny
#2 3
#3 4
Now you can write it to a file using the to_csv() method.
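For example (index=False keeps pandas' row index out of the file; the file name is arbitrary):
df.to_csv('filename.csv', index=False)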
If you do not want to rely on external dependencies like pandas, you can quickly achieve this in pure Python with the join method of str objects.
items = {'people': ['Peter', 'Danny'],
'numbers': [1, 2, 3, 4],
'cities': ['London']}
def to_csv(items):
    # names of columns
    header = ','.join(items.keys())
    # building lines
    lines = []
    max_len = max(len(v) for v in items.values())
    for i in range(max_len):
        # emit an empty field (not nothing) when a column has run out of
        # values, so the columns stay aligned
        lines.append(','.join(
            str(items[key][i]) if i < len(items[key]) else ''
            for key in items))
    # return header and lines separated by newlines
    return '\n'.join([header] + lines)

print(to_csv(items))
outputs:
people,numbers,cities
Peter,1,London
Danny,2,
,3,
,4,
I want to do the following in Python. The CSV file is:
item1,item2,item2,item3
item2,item3,item4,item1
I want to make a dictionary with unique keys item1, item2, item3 and item4, i.e. dictionary = {item1: value1, item2: value2, ...}, where each value is how many times the key appears in the CSV file. How can I do this?
Obtain a list of all items from your csv:
with open('your.csv') as f:
    content = f.read().splitlines()  # strips the newlines as well
items = ','.join(content).split(',')
Then build the mapping:
mapping = {}
for item in items:
    mapping[item] = mapping.get(item, 0) + 1
and you will get the following:
>>> mapping
{'item2': 3, 'item3': 2, 'item1': 2, 'item4': 1}
import csv
from collections import Counter

# define a generator that will yield field after field,
# ignoring newlines:
def iter_fields(filename):
    with open(filename, newline='') as f:
        reader = csv.reader(f)
        for row in reader:
            for field in row:
                yield field

# now use collections.Counter to count your values:
counts = Counter(iter_fields('stackoverflow.csv'))
print(counts)
# output:
# Counter({'item3': 2, 'item2': 2, 'item1': 1,
# ' item1': 1, ' item2': 1, 'item4': 1})
see https://docs.python.org/2/library/collections.html#collections.Counter
import csv

temp = {}
with open('stackoverflow.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        for x in row:
            if x in temp:
                temp[x] += 1
            else:
                temp[x] = 1
print(temp)
The output is:
{'item2': 3, 'item3': 2, 'item1': 2, 'item4': 1}
I have two tab-separated files with multiple columns. I used two dictionaries to store the specific columns of interest.
import csv

dic1 = {}
dic2 = {}

with open("Table1.tsv") as samplefile:
    reader = csv.reader(samplefile, delimiter="\t")
    columns = zip(*reader)
    for column in columns:
        A, B, C, D = columns

with open("Table2.tsv") as samplefile1:
    reader = csv.reader(samplefile1, delimiter="\t")
    columns = zip(*reader)
    for column1 in columns:
        A1, B1, C1 = columns

dic1['PMID'] = A   # the first dictionary storing the data of column "A"
dic2['PMID'] = A1  # the second dictionary storing the data of column "A1"

# statement to compare the data in dic1['PMID'] with dic2['PMID'] and print the common
Problem: what is the proper logic or conditional statement to compare the two dictionaries and print the data common to both?
You can use set intersection:
>>> d1={'a':2,'b':3,'c':4,'d':5}
>>> d2={'a':2,'f':3,'c':4,'b':5,'q':17}
>>> dict(set(d1.items()) & set(d2.items()))
{'a': 2, 'c': 4}
For your specific problem, this is the code:
>>> dic1={}
>>> dic2={}
>>> dic1['PMID']=[1,2,34,2,3,4,5,6,7,3,5,16]
>>> dic2['PMID']=[2,34,1,3,4,15,6,17,31,34,16]
>>> common=list(set(dic1['PMID']) & set(dic2['PMID']))
>>> common
[1, 2, 3, 4, 6, 34, 16]
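Applied to the original two-file setup, a minimal sketch, assuming the PMID values sit in the first column of each tab-separated file:
import csv

def read_column(path, index):
    # read a single column from a tab-separated file
    with open(path, newline='') as f:
        return [row[index] for row in csv.reader(f, delimiter='\t')]

common = set(read_column('Table1.tsv', 0)) & set(read_column('Table2.tsv', 0))
print(sorted(common))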
Assume you have a data set, something like a CSV file, that contains mildly sensitive information, like who passed a note to whom in a 12th-grade English class. While it's not a crisis if this data got out, it would be nice to strip out the identifying information so the data could be made public, shared with collaborators, etc. The data looks something like this:
Giver, Recipient:
Anna,Joe
Anna,Mark
Mark,Mindy
Mindy,Joe
How would you process this list, assign each name a unique but arbitrary identifier, then strip out the names and replace them with said identifier in Python, such that you end up with something like:
1,2
1,3
3,4
4,2
You can use hash() to generate an arbitrary identifier; it always returns the same integer for a particular string within one run of the program (note that in Python 3, string hashes are randomized between runs):
with open("data1.txt") as f:
lis=[x.split(",") for x in f]
items=[map(lambda y:hash(y.strip()),x) for x in lis]
for x in items:
print ",".join(map(str,x))
....:
-1319295970,1155173045
-1319295970,-1963774321
-1963774321,-1499251772
-1499251772,1155173045
Or you can also use itertools.count:
from itertools import chain, count

c = count(1)

with open("data1.txt") as f:
    lis = [[y.strip() for y in x.split(",")] for x in f]

dic = {}
for x in set(chain(*lis)):
    dic.setdefault(x, next(c))

for x in lis:
    print(",".join(str(dic[y]) for y in x))
3,2
3,4
4,1
1,2
Or, improving my previous answer with the unique_everseen recipe from the itertools docs, you can get the exact expected answer:
from itertools import chain, count, filterfalse

c = count(1)

def unique_everseen(iterable, key=None):
    # yield unique elements, preserving order of first occurrence
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

with open("data1.txt") as f:
    lis = [[y.strip() for y in x.split(",")] for x in f]

dic = {}
for x in unique_everseen(chain(*lis)):
    dic.setdefault(x, next(c))

for x in lis:
    print(",".join(str(dic[y]) for y in x))
1,2
1,3
3,4
4,2
names = """
Anna,Joe
Anna,Mark
Mark,Mindy
Mindy,Joe
"""
nameset = set((",".join(names.strip().splitlines())).split(","))
for i,name in enumerate(nameset):
names = names.replace(name,str(i))
print names
2,1
2,3
3,0
0,1
You could use hash to get a unique ID for each name, or you could use a dictionary mapping names to counters (if you want the numbers to be as in your example):
data = [("Anna", "Joe"), ("Anna", "Mark"), ("Mark", "Mindy"), ("Mindy", "Joe")]
names = {}

def anon(name):
    # assign the next number the first time a name is seen
    if name not in names:
        names[name] = len(names) + 1
    return names[name]

result = []
for n1, n2 in data:
    result.append((anon(n1), anon(n2)))

print(names)
print(result)
When run, this will give:
{'Mindy': 4, 'Joe': 2, 'Anna': 1, 'Mark': 3}
[(1, 2), (1, 3), (3, 4), (4, 2)]
First, read your file into a list of rows:
import csv

with open('myFile.csv') as f:
    rows = [row for row in csv.reader(f)]
At this point, you could build a dict to hold the mapping:
nameSet = set()
for row in rows:
    for name in row:
        nameSet.add(name)
# call it mapping rather than map, to avoid shadowing the builtin
mapping = dict((name, i) for i, name in enumerate(nameSet))
Alternatively, you could build the dict directly:
nextID = 0
mapping = {}
for row in rows:
    for name in row:
        if name not in mapping:
            mapping[name] = nextID
            nextID += 1
Either way, you go through the rows again and apply the mapping:
output = [[mapping[name] for name in row] for row in rows]
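If you then want the anonymized rows back on disk, a minimal sketch (anonymized.csv is just a placeholder name):
import csv

with open('anonymized.csv', 'w', newline='') as f:
    csv.writer(f).writerows(output)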
To genuinely anonymize the data, you need random aliases for the names. Hashes are good for that, but if you just want to map each name to an integer, you could do something like this:
from random import shuffle
data = [("Anna", "Joe"), ("Anna", "Mark"), ("Mark", "Mindy"), ("Mindy", "Joe")]
names = list(set(x for pair in data for x in pair))
shuffle(names)
aliases = dict((k, v) for v, k in enumerate(names))
munged = [(aliases[a], aliases[b]) for a, b in data]
That'll give you something like:
>>> data
[('Anna', 'Joe'), ('Anna', 'Mark'), ('Mark', 'Mindy'), ('Mindy', 'Joe')]
>>> names
['Mindy', 'Joe', 'Anna', 'Mark']
>>> aliases
{'Mindy': 0, 'Joe': 1, 'Anna': 2, 'Mark': 3}
>>> munged
[(2, 1), (2, 3), (3, 0), (0, 1)]
You can then (if you need to) get the name from the alias, and vice versa:
>>> aliases["Joe"]
1
>>> names[2]
'Anna'