Making a CSV list in Python - python

I am new to Python and need help. I am trying to make a list of comma-separated values.
I have this data.
EasternMountain 84,844 39,754 24,509 286 16,571 3,409 315
EasternHill 346,373 166,917 86,493 1,573 66,123 23,924 1,343
EasternTerai 799,526 576,181 206,807 2,715 6,636 1,973 5,214
CentralMountain 122,034 103,137 13,047 8 2,819 2,462 561
Now how do I get something like this:
"EasternMountain": 84844,
"EasternHill": 346373,
and so on?
So far I have been able to do this:
fileHandle = open("testData", "r")
data = fileHandle.readlines()
fileHandle.close()
dataDict = {}
for i in data:
    temp = i.split(" ")
    dataDict[temp[0]]=temp[1]
    with_comma='"'+temp[0]+'"'+':'+temp[1]+','
    print with_comma

Use the csv module:
import csv
with open('k.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ')
    my_dict = {}
    for row in reader:
        my_dict[row[0]] = [''.join(e.split(',')) for e in row[1:]]
print my_dict
k.csv is a text file containing:
EasternMountain 84,844 39,754 24,509 286 16,571 3,409 315
EasternHill 346,373 166,917 86,493 1,573 66,123 23,924 1,343
EasternTerai 799,526 576,181 206,807 2,715 6,636 1,973 5,214
CentralMountain 122,034 103,137 13,047 8 2,819 2,462 561
Output:
{'EasternHill': ['346373', '166917', '86493', '1573', '66123', '23924', '1343', ''], 'EasternTerai': ['799526', '576181', '206807', '2715', '6636', '1973', '5214', ''], 'CentralMountain': ['122034', '103137', '13047', '8', '2819', '2462', '561', ''], 'EasternMountain': ['84844', '39754', '24509', '286', '16571', '3409', '315', '']}

Try this:
def parser(file_path):
    d = {}
    with open(file_path) as f:
        for line in f:
            # Lines read from a file keep their newline, so strip before testing for blanks
            if not line.strip():
                continue
            parts = line.split()
            d[parts[0]] = [part.replace(',', '') for part in parts[1:]]
    return d
Running it:
result = parser("testData")
for key, value in result.items():
    print key, ':', value
Result:
EasternHill : ['346373', '166917', '86493', '1573', '66123', '23924', '1343']
EasternTerai : ['799526', '576181', '206807', '2715', '6636', '1973', '5214']
CentralMountain : ['122034', '103137', '13047', '8', '2819', '2462', '561']
EasternMountain : ['84844', '39754', '24509', '286', '16571', '3409', '315']
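If the goal is only the first number per region as an integer, as in the question's desired output, a Python 3 sketch along these lines may help. The function name `first_values` is made up for illustration, and the sketch assumes every non-blank line is a name followed by whitespace-separated numbers that use commas as thousands separators.

```python
def first_values(lines):
    """Map each region name to its first number, parsed as an int."""
    result = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or name-only lines
        # "84,844" -> 84844
        result[parts[0]] = int(parts[1].replace(",", ""))
    return result


sample = [
    "EasternMountain 84,844 39,754 24,509 286 16,571 3,409 315",
    "EasternHill 346,373 166,917 86,493 1,573 66,123 23,924 1,343",
]
for name, value in first_values(sample).items():
    print('"{}": {},'.format(name, value))
```

With a real file, passing the open file object works too, for example `first_values(open("testData"))`, since the function only iterates over lines.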

Related

How to save these elements read from a .txt file into an array/matrix in python

I have a .txt file that contains elements that look like this:
Smith 25 35 NC
Johnson 12 4 OH
Jones 23 14 FL
Lopez 2 7 TX
And I want to read the .txt file line by line, and save each of the elements (Name, number, number, state) in an array matrix or a list 4 x number_of_people , while ignoring any blank spaces. I'm trying to not use split() for it, but could use a "manual" form of split() instead, like shown below with split1.
def split1(line,delim):
    s=[]
    j=0
    for i in range (len(line)):
        if delim== line [i]:
            s.append(line[j:i])
            j=i+1
    s.append (line[j:])
    return s
f = open("Names.txt")
number_of_people = 0
#This portion is meant to go through the entire .txt file one time and count how many people are listed in the file so I can make an appropriately sized matrix; in the example that is 4
while True:
    file_eof = f.readline()
    if file_eof != '':
        number_of_people = number_of_people + 1
    if file_eof == '':
        break
#This portion reads through the .txt file again and saves the names of the list
while True:
    file_eof = f.readline()
    if file_eof != '':
        split1(file_eof, '')
        #print(file_eof)
    if file_eof == '':
        print('No more names on the list')
        break
f.close()
I know there could be things missing here, and that's exactly what I would need help with here. If there is any "better" way of dealing with this than what I got please let me know and show me if possible.
Thank you for your time!
I don't understand why you want to create an array of a specific size first. I suppose you have a background in C? How large is the file?
Here are 2 pythonic ways to read and store that information:
filename = r"data.txt"
# Access items by index, e.g. people_as_list[0][0] is "Smith"
with open(filename) as f: # with statement = context manager = implicit/automatic closing of the file
    people_as_list = [line.split() for line in f] # List comprehension
# Access items by index, then key, e.g. people_as_dict[0]["name"] is "Smith"
people_as_dict = []
with open(filename) as f:
    for line in f:
        name, number1, number2, state = line.split() # Iterable unpacking
        person = {}
        person["name"] = name
        person["number1"] = number1
        person["number2"] = number2
        person["state"] = state
        people_as_dict.append(person)
print(people_as_list)
print(people_as_dict)
Output:
[['Smith', '25', '35', 'NC'], ['Johnson', '12', '4', 'OH'], ['Jones', '23', '14', 'FL'], ['Lopez', '2', '7', 'TX']]
[{'name': 'Smith', 'number1': '25', 'number2': '35', 'state': 'NC'}, {'name': 'Johnson', 'number1': '12', 'number2': '4', 'state': 'OH'}, {'name': 'Jones', 'number1': '23', 'number2': '14', 'state': 'FL'}, {'name': 'Lopez', 'number1': '2', 'number2': '7', 'state': 'TX'}]
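Since the question asks to avoid split(), here is a rough Python 3 sketch of a manual splitter that ignores runs of blanks. The names `split_fields` and `read_people` are invented for this example. Note also that in the question's code the second while loop reads nothing, because the file position is already at the end after the counting loop; a Python list grows as needed, so the counting pass can simply be dropped.

```python
def split_fields(line):
    """Split a line on runs of whitespace without calling str.split()."""
    fields = []
    current = ""
    for ch in line:
        if ch in " \t\r\n":
            if current:          # a field just ended
                fields.append(current)
                current = ""
        else:
            current += ch
    if current:                  # last field when the line has no trailing whitespace
        fields.append(current)
    return fields


def read_people(lines):
    """Return one [name, number, number, state] list per non-blank line."""
    people = []
    for line in lines:
        fields = split_fields(line)
        if fields:
            people.append(fields)
    return people


sample = ["Smith 25 35 NC\n", "Johnson  12 4 OH\n", "\n", "Lopez 2 7 TX"]
print(read_people(sample))
```

With the real file this would be called as `with open("Names.txt") as f: people = read_people(f)`.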

How to print current columns and to be lowercase without punctuation in CSV file?

I have a small script that uses a regex to make my records lowercase and strip all punctuation from them, but when I run it I get this error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 5387: character maps to <undefined>
I want to extract the Record ID and Title of the records whose Languages value is English.
import csv
import re
import numpy
filename = ('records.csv')
def reg_test(name):
    reg_result = ''
    with open(name, 'r') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            row = re.sub('[^A-Za-z0-9]+', '', str(row))
            reg_result += row + ','
            if (row['Languages'] == 'English')
    return reg_result
print(reg_test(filename).lower())
import re, csv
# sample.csv - contains some samples from original csv file.
with open('sample.csv', 'rb') as f:
    patt = r'[:;\'".`~!##$?-_*()=\[\]\/]+'
    puncs = re.findall(patt, f.read())
    f.close()
with open('sample.csv', 'rb') as f:
    reader = csv.reader(f)
    next(reader) # leaving the header of csv file
    data = []
    for row in reader:
        data.append(row)
    f.close()
new_data = []
for i, j in enumerate(data):
    d = ','.join(j)
    nop = [c for c in d if c not in puncs]
    nop = ''.join(nop)
    new_data.append(nop.split(','))
print new_data
output:
[['UkEN000561198', 'article', 'text', '00310182', '', 'QE500', '56045', 'Mesozoic radiolarian biostratigraphy of Japan and collage tectonics along the eastern continental margin of Asia', '', 'Kojima', ' S Mizutani', ' S', '', 'Netherlands', 'PALAEOGEOGRAPHY PALAEOCLIMATOLOGY PALAEOECOLOGY', 'monthly', '1992', '96', '2Jan', '', '', '', '367', '', 'PALAEOGEOGRAPHY PALAEOCLIMATOLOGY PALAEOECOLOGY 9612', ' 367 1992', '634345'],
['UkEN001027396', 'article', 'text', '03778398', '', 'QE719', '560', 'Late Pliocene climate in the Southeast Atlantic Preliminary results from a multidisciplinary study of DSDP Site 532', '', 'Hall', ' M A Heusser', ' L Sancetta', ' C', '', 'Netherlands', 'MARINE MICROPALAEONTOLOGY', '4 issues per year', '1992', '20', '1', '', '', '', '59', '', 'MARINE MICROPALAEONTOLOGY 201', ' 59 1992', '53764']]
Hope this helps.
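For the filtering part of the question, a Python 3 sketch could look like the following. It assumes the CSV header really contains columns named `Record ID`, `Title` and `Languages` (taken from the question's wording); the function name `english_titles` is made up for this example.

```python
import csv
import io
import re


def english_titles(csvfile):
    """Yield (record_id, title), lowercased and without punctuation, for English records."""
    for row in csv.DictReader(csvfile):
        if row.get("Languages") != "English":
            continue
        # Keep letters, digits and spaces; drop everything else.
        cleaned = [re.sub(r"[^A-Za-z0-9 ]+", "", row.get(col) or "").lower()
                   for col in ("Record ID", "Title")]
        yield tuple(cleaned)


sample = io.StringIO(
    "Record ID,Title,Languages\n"
    "UkEN000561198,Mesozoic radiolarian biostratigraphy of Japan,English\n"
    "XX0001,Ein Beispiel,German\n"
)
print(list(english_titles(sample)))
```

The UnicodeDecodeError in the question most likely comes from open() falling back to the platform's default codec. With a real file, passing an explicit encoding is worth trying, for example `open("records.csv", newline="", encoding="utf-8", errors="replace")`; the actual encoding of the file is not stated in the question, so `utf-8` here is only a guess.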

Read CSV data and add it into dictionary

This is an empty Dictionary
d = {}
This is the csv file data
M, Max, Sporting, Football, Cricket
M, Jack, Sporting, Cricket, Tennis
M, Kevin, Sporting, Cricket, Basketball
M, Ben, Sporting, Football, Rugby
I tried to use the following code to append data from the csv to dictionary.
with open('example.csv', "r") as csvfile:
    csv_reader = csv.reader(csvfile)
    for row in csv_reader:
        if row:
            d.setdefault(row[0], {})[row[1]] = {row[2]: [row[3]]}
But it gives me an error:
d.setdefault(row[0], {})[row[1]] = {row[2]: [row[3]]}
IndexError: list index out of range
Is there any way I can add data from the csv to the dictionary in this form:
d = {'M': {'Max': {'Sporting': ['Football', 'Cricket']}, 'Jack': {'Sporting': ['Cricket', 'Tennis']}}}
I am new to this, so any help is appreciated.
import csv
d={}
with open('JJ.csv', "r") as csvfile:
    csv_reader = csv.reader(csvfile)
    for row in csv_reader:
        if row:
            d.setdefault(row[0],{})[row[1]] = {row[2]: [row[3],row[4]]}
print(d)
{'M': {' Max': {' Sporting': [' Football', ' Cricket']}, ' Jack': {' Sporting': [' Cricket', ' Tennis']}, ' Kevin': {' Sporting': [' Cricket', ' Basketball']}, ' Ben': {' Sporting': [' Football', ' Rugby']}}}
To remove the leading/trailing spaces in the output, you can use the line below instead. There may be a better way that I'm not aware of.
d.setdefault(row[0],{})[row[1].strip()] = {row[2].strip(): [row[3].strip(),row[4].strip()]}
You can use a nested collections.defaultdict tree and check if the rows are long enough:
from collections import defaultdict
def tree():
    return defaultdict(tree)
d = tree()
# ...
for row in csv_reader:
    if len(row) >= 3:
        d[row[0]][row[1]][row[2]] = row[3:]
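A self-contained Python 3 version of the defaultdict-tree idea might look like this. The helper name `load_tree` is invented here, and the sketch assumes rows of the form "key, key, key, value, value" as in the question's data.

```python
import csv
import io
import json
from collections import defaultdict


def tree():
    """A dict whose missing keys are themselves trees."""
    return defaultdict(tree)


def load_tree(csvfile):
    d = tree()
    # skipinitialspace drops the blank after each comma in "M, Max, ..."
    for row in csv.reader(csvfile, skipinitialspace=True):
        if len(row) >= 4:        # three keys plus at least one value
            d[row[0]][row[1]][row[2]] = row[3:]
    return d


sample = io.StringIO(
    "M, Max, Sporting, Football, Cricket\n"
    "\n"
    "M, Jack, Sporting, Cricket, Tennis\n"
)
result = load_tree(sample)
# Round-trip through json to show plain nested dicts instead of defaultdicts
print(json.loads(json.dumps(result)))
```

Blank lines come back from csv.reader as empty lists, so the length check skips them instead of raising IndexError.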
Change "for column in csv_reader:" to "for row in csv_reader:"
Straightforwardly:
import csv, collections
with open('example.csv', 'r') as f:
    reader = csv.reader(f, skipinitialspace=True)
    result = collections.defaultdict(dict)
    for r in reader:
        if not result[r[0]].get(r[1]): result[r[0]][r[1]] = {}
        if not result[r[0]][r[1]].get(r[2]):
            result[r[0]][r[1]][r[2]] = r[-2:]
print(dict(result))
The output:
{'M': {'Kevin': {'Sporting': ['Cricket', 'Basketball']}, 'Max': {'Sporting': ['Football', 'Cricket']}, 'Jack': {'Sporting': ['Cricket', 'Tennis']}, 'Ben': {'Sporting': ['Football', 'Rugby']}}}

How to read a file block-wise in python

I am a bit stuck reading a file block-wise, and I am having difficulty extracting selected data from each block.
Here is my file content:
DATA.txt
#-----FILE-----STARTS-----HERE--#
#--COMMENTS CAN BE ADDED HERE--#
BLOCK IMPULSE DATE 01-JAN-2010 6 DEHDUESO203028DJE \
SEQUENCE=ai=0:at=221:ae=3:lu=100:lo=NNU:ei=1021055:lr=1: \
USERID=ID=291821 NO_USERS=3 GROUP=ONE id_info=1021055 \
CREATION_DATE=27-JUNE-2013 SN=1021055 KEY ="22WS \
DE34 43RE ED54 GT65 HY67 AQ12 ES23 54CD 87BG 98VC \
4325 BG56"
BLOCK PASSION DATE 01-JAN-2010 6 DEHDUESO203028DJE \
SEQUENCE=ai=0:at=221:ae=3:lu=100:lo=NNU:ei=324356:lr=1: \
USERID=ID=291821 NO_USERS=1 GROUP=ONE id_info=324356 \
CREATION_DATE=27-MAY-2012 SN=324356 KEY ="22WS \
DE34 43RE 342E WSEW T54R HY67 TFRT 4ER4 WE23 XS21 \
CD32 12QW"
BLOCK VICTOR DATE 01-JAN-2010 6 DEHDUESO203028DJE \
SEQUENCE=ai=0:at=221:ae=3:lu=100:lo=NNU:ei=324356:lr=1: \
USERID=ID=291821 NO_USERS=5 GROUP=ONE id_info=324356 \
CREATION_DATE=27-MAY-2012 SN=324356 KEY ="22WS \
DE34 43RE 342E WSEW T54R HY67 TFRT 4ER4 WE23 XS21 \
CD32 12QW"
#--BLOCK--ENDS--HERE#
#--NEW--BLOCKS--CAN--BE--APPENDED--HERE--#
I am only interested in the block name, NO_USERS, and id_info of each block.
These three values should be saved to a data structure (let's say a dict), which is then stored in a list:
[{Name: IMPULSE ,NO_USER=3,id_info=1021055},{Name: PASSION ,NO_USER=1,id_info=324356}. . . ]
Any other data structure that can hold the info would also be fine.
So far I have tried getting the block names by reading line by line:
fOpen = open('DATA.txt')
unique =[]
for row in fOpen:
    if "BLOCK" in row:
        unique.append(row.split()[1])
print unique
I am thinking of a regular-expression approach, but I have no idea where to start.
Any help would be appreciated. Meanwhile I am also trying, and will update if I get something.
You could use groupby to find each block, use a regex to extract the info and put the values in dicts:
from itertools import groupby
import re
with open("test.txt") as f:
    data = []
    # find NO_USERS= 1+ digits or id_info= 1+ digits
    r = re.compile(r"NO_USERS=\d+|id_info=\d+")
    grps = groupby(f,key=lambda x:x.strip().startswith("BLOCK"))
    for k,v in grps:
        # if k is True we have a block line
        if k:
            # get name after BLOCK
            name = next(v).split(None,2)[1]
            # get lines after BLOCK and get the second of those
            t = next(grps)[1]
            # we want two lines after BLOCK
            _, l = next(t), next(t)
            d = dict(s.split("=") for s in r.findall(l))
            # add name to dict
            d["Name"] = name
            # add dict to data list
            data.append(d)
print(data)
Output:
[{'NO_USERS': '3', 'id_info': '1021055', 'Name': 'IMPULSE'},
{'NO_USERS': '1', 'id_info': '324356', 'Name': 'PASSION'},
{'NO_USERS': '5', 'id_info': '324356', 'Name': 'VICTOR'}]
Or without groupby: as your file follows a fixed format, we just need to extract the second line after the BLOCK line:
with open("test.txt") as f:
    data = []
    r = re.compile(r"NO_USERS=\d+|id_info=\d+")
    for line in f:
        # if True we have a new block
        if line.startswith("BLOCK"):
            # call next twice to get the second line after BLOCK
            _, l = next(f), next(f)
            # get name after BLOCK
            name = line.split(None,2)[1]
            # find our substrings from l
            d = dict(s.split("=") for s in r.findall(l))
            d["Name"] = name
            data.append(d)
print(data)
Output:
[{'NO_USERS': '3', 'id_info': '1021055', 'Name': 'IMPULSE'},
{'NO_USERS': '1', 'id_info': '324356', 'Name': 'PASSION'},
{'NO_USERS': '5', 'id_info': '324356', 'Name': 'VICTOR'}]
To extract values you can iterate:
for dct in data:
    print(dct["NO_USERS"])
Output:
3
1
5
If you want a dict of dicts and to access each section from 1-n, you can store them as nested dicts using 1-n as the key:
from itertools import count
import re
with open("test.txt") as f:
    data, cn = {}, count(1)
    r = re.compile(r"NO_USERS=\d+|id_info=\d+")
    for line in f:
        if line.startswith("BLOCK"):
            _, l = next(f), next(f)
            name = line.split(None,2)[1]
            d = dict(s.split("=") for s in r.findall(l))
            d["Name"] = name
            data[next(cn)] = d
    data["num_blocks"] = next(cn) - 1
Output:
from pprint import pprint as pp
pp(data)
{1: {'NO_USERS': '3', 'Name': 'IMPULSE', 'id_info': '1021055'},
2: {'NO_USERS': '1', 'Name': 'PASSION', 'id_info': '324356'},
3: {'NO_USERS': '5', 'Name': 'VICTOR', 'id_info': '324356'},
'num_blocks': 3}
'num_blocks' will tell you exactly how many blocks you extracted.
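The approaches above rely on NO_USERS and id_info always sitting on the second line after BLOCK. If that cannot be guaranteed, one alternative is to join the backslash-continued lines into a single record first and then search the whole record. This is only a sketch: `parse_blocks` is a made-up name, it assumes a trailing backslash always means "record continues", and it converts the numbers to int.

```python
import re


def parse_blocks(text):
    """Return one dict per BLOCK record with Name, NO_USERS and id_info."""
    # A trailing backslash continues the record, so glue those lines together.
    joined = re.sub(r"\\[ \t]*\n", " ", text)
    records = []
    for line in joined.splitlines():
        if not line.startswith("BLOCK"):
            continue             # comments and anything outside a block
        users = re.search(r"NO_USERS=(\d+)", line)
        id_info = re.search(r"id_info=(\d+)", line)
        records.append({
            "Name": line.split()[1],
            "NO_USERS": int(users.group(1)) if users else None,
            "id_info": int(id_info.group(1)) if id_info else None,
        })
    return records


sample = (
    "#--COMMENTS CAN BE ADDED HERE--#\n"
    "BLOCK IMPULSE DATE 01-JAN-2010 6 DEHDUESO203028DJE \\\n"
    "SEQUENCE=ai=0:at=221:ae=3:lu=100:lo=NNU:ei=1021055:lr=1: \\\n"
    "USERID=ID=291821 NO_USERS=3 GROUP=ONE id_info=1021055 \\\n"
    "CREATION_DATE=27-JUNE-2013 SN=1021055 KEY =\"22WS \\\n"
    "4325 BG56\"\n"
    "BLOCK PASSION DATE 01-JAN-2010 6 DEHDUESO203028DJE \\\n"
    "USERID=ID=291821 NO_USERS=1 GROUP=ONE id_info=324356 \\\n"
    "CREATION_DATE=27-MAY-2012 SN=324356 KEY =\"22WS\"\n"
)
print(parse_blocks(sample))
```

For the real file, `parse_blocks(open("DATA.txt").read())` would feed it the whole content at once, which is fine as long as the file fits comfortably in memory.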

How to store CSV data in a Nested Dictionary that has a dictionary and a list?

I have the following CSV Data,
Rule1,Status1,1
Rule1,Status2,1
Rule1,Status3,1
Rule1,Status4,2
Rule2,Status1,2
Rule2,Status2,1
Rule2,Status3,1
Rule2,Status4,3
I have unique rules (first column) stored in a list called Rules. I want my dictionary to look like the following:
DictionaryFull = {
'Rule1' : {1 : [Status1, Status2, Status3], 2 : [Status4]},
'Rule2' : {1 : [Status2, Status3], 2 : [Status1], 3 : [Status4]}
}
Here is what I tried:
openfile = ('data.csv', 'rU')
finalfile = csv.reader(openfile, delimiter=',')
FullDictionary = {}
for row in finalfile:
    for j in range (0, 300): #300 number of rules
        if Rules[j] not in FullDictionary:
            for i in range(1, 71): #These are third column numbers 1 - 71
                if i == int(row[2]) and row[0] == Rules[j]:
                    FullDictionary = {Rules[j] : { i : [].append[row[1]}}
                    print FullDictionary
But I am getting the following as the result:
{'Rule1': {1 : None}} and so on
Am I doing something wrong? How to accomplish this task of having a dictionary with both another dictionary and a list.
I tried this:
def something():
    full_dictionary = {}
    with open(DataFilePath) as f:
        reader = csv.reader(f)
    for row in reader:
        rule = row[2], status = row[0], num = int(row[5])
        r = full_dictionary.setdefault(rule, {})
        r.setdefault(num, []).append(status)
    print full_dictionary
The error: ValueError: I/O operation on closed file
How about using collections.defaultdict:
import csv
from collections import defaultdict
full_dictionary = defaultdict(lambda: defaultdict(list))
with open('data.csv') as f:
    reader = csv.reader(f)
    for rule, status, num in reader:
        full_dictionary[rule][num].append(status)
print full_dictionary
output:
defaultdict(<function <lambda> at 0x00000000025A6438>, {
'Rule2': defaultdict(<type 'list'>, {
'1': ['Status2', 'Status3'],
'3': ['Status4'],
'2': ['Status1']
}),
'Rule1': defaultdict(<type 'list'>, {
'1': ['Status1', 'Status2', 'Status3'],
'2': ['Status4']
})
})
If you don't want to use defaultdict, you have to handle new keys yourself.
For example, using dict.setdefault:
import csv
full_dictionary = {}
with open('data.csv') as f:
    reader = csv.reader(f)
    for rule, status, num in reader:
        r = full_dictionary.setdefault(rule, {})
        r.setdefault(num, []).append(status)
print full_dictionary
output:
{'Rule1': {'1': ['Status1', 'Status2', 'Status3'], '2': ['Status4']},
'Rule2': {'1': ['Status2', 'Status3'], '2': ['Status1'], '3': ['Status4']}}
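Both outputs above use the strings '1', '2', '3' as inner keys, while the question's target dictionary uses integers. A Python 3 sketch that converts the third column with int() could look like this; `group_statuses` is an invented name and the three-column layout is assumed from the sample data.

```python
import csv
import io


def group_statuses(csvfile):
    """Build {rule: {number: [status, ...]}} with integer numbers as keys."""
    full = {}
    for row in csv.reader(csvfile):
        if len(row) != 3:
            continue             # skip blank or malformed rows
        rule, status, num = row
        full.setdefault(rule, {}).setdefault(int(num), []).append(status)
    return full


sample = io.StringIO(
    "Rule1,Status1,1\n"
    "Rule1,Status2,1\n"
    "Rule1,Status4,2\n"
    "Rule2,Status1,2\n"
)
print(group_statuses(sample))
```

Keeping the loop inside the `with open(...)` block when reading a real file also avoids the "I/O operation on closed file" error, because csv.reader reads lazily from the file object.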
list.append returns None, so the expression [].append(row[1]) evaluates to None, and that None is what gets stored under key i.
Amend that to:
FullDictionary = {Rules[j] : { i : [row[1]]}}
or
old_value = Rules[j].get(i, [])
old_value.append(row[1])
depending on what you're wishing to achieve.
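A tiny Python 3 demonstration of the point about list.append, using made-up variable names:

```python
statuses = []
returned = statuses.append("Status1")   # mutates the list in place
print(returned)    # None
print(statuses)    # ['Status1']

# Create the list first, then append to the stored list
nested = {"Rule1": {1: []}}
nested["Rule1"][1].append("Status1")
print(nested)      # {'Rule1': {1: ['Status1']}}
```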
