I'm trying to take an Excel file with two fields, ID and xy coordinates, and create a dictionary so that each ID is a key mapping to all of its xy coordinate values.
For example, the Excel file looks like this (screenshot): http://i.stack.imgur.com/P2agc.png
There are more than 900 oID values. I want the final format to be something like:
[('0', [-121.129247, 37.037939, -121.129247, 37.037939, -121.056516, 36.997779]),
 ('1', [all, the, coordinates, with, oID, of, 1]),
 ('2', [all, the, coordinates, with, oID, of, 2]), etc.]
I am trying to use a for statement to iterate through the Excel sheet to populate a list with the first 200 rows, and then put that into a defaultdict.
Here is what I have done so far:
wb = openpyxl.load_workbook('simpleCoordinate.xlsx')
sheet = wb['Sheet1']

from collections import defaultdict
CoordDict = defaultdict(list)

for i in range(1, 201, 1):
    coordinate_list = [(sheet.cell(row=i, column=1).value, sheet.cell(row=i, column=2).value)]

for oID, xy in coordinate_list:
    CoordDict[oID].append(xy)

print(list(CoordDict.items()))
which returns:
[(11, ['-121.177487,35.49885'])]
That's only the 200th line of the Excel sheet, rather than the whole thing. I'm not sure what I'm doing wrong; is it something with the for statement? Am I thinking about this in the wrong way? I'm a total newbie to Python, so any advice would be helpful!
You are overwriting coordinate_list on every pass through the loop, 200 times in total. Instead, create it once, then append to it with the += operator.
import openpyxl
from collections import defaultdict

wb = openpyxl.load_workbook('simpleCoordinate.xlsx')
sheet = wb.get_sheet_by_name('Sheet1')

coordinate_list = list()
for i in range(1, 201, 1):
    coordinate_list += [(sheet.cell(row=i, column=1).value, sheet.cell(row=i, column=2).value)]

coord_dict = defaultdict(list)
for oid, xy in coordinate_list:
    coord_dict[oid].append(xy)

print(list(coord_dict.items()))
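The two loops can also be collapsed into a single pass: append each xy value straight to the defaultdict as you read it, with no intermediate coordinate_list at all. A minimal sketch of the pattern in Python 3, with hard-coded (oID, xy) pairs standing in for the worksheet rows, since the asker's simpleCoordinate.xlsx isn't available here:

```python
from collections import defaultdict

# Stand-in for the worksheet rows: (oID, "x,y") pairs.
# With openpyxl you would produce these from the sheet's cells instead.
rows = [
    (0, '-121.129247,37.037939'),
    (0, '-121.056516,36.997779'),
    (1, '-121.177487,35.49885'),
]

coord_dict = defaultdict(list)
for oid, xy in rows:
    coord_dict[oid].append(xy)  # append, don't assign, so values accumulate per key

print(list(coord_dict.items()))
```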
I am trying to read the density entry of a list of arrays within a .json file. Here's a small portion of the file from the beginning:
["time_tag","density","speed","temperature"],["2019-04-14 18:20:00.000","4.88","317.9","11495"],["2019-04-14 18:21:00.000","4.89","318.4","11111"]
This is the code I have thus far:
import json

with open('plasma.json', 'r') as myfile:
    data = myfile.read()

obj = json.loads(data)
print(str(obj['density']))
It should print everything under the density column, but I'm getting an error saying that the file can't be opened.
First, your JSON file is not valid. If you want to read it with a single call obj = json.loads(data), the file should be:
[["time_tag","density","speed","temperature"],["2019-04-14 18:20:00.000","4.88","317.9","11495"],["2019-04-14 18:21:00.000","4.89","318.4","11111"]]
Notice the extra square bracket, making it a single list of sublists.
That said, since obj is a list of lists, print(str(obj['density'])) can't possibly work. You need to loop over the list to print what you want, or convert it to a dataframe first.
Looping directly
# get the index of the 'density' entry from the first
# list in obj, the header
idx = obj[0].index('density')

for row in obj[1:]:  # loop over all sublists except the first one
    print(row[idx])
Using a dataframe (pandas)
import pandas as pd

# convert to a dataframe, using the first row as the column header
df = pd.DataFrame(obj[1:], columns=obj[0])
print(df['density'])
Are you sure your data is valid JSON and not CSV? The snippet of data provided above matches a csv file rather than a json one.
You will be able to read the density key of the csv with:
import csv

input_file = csv.DictReader(open("plasma.csv"))
for row in input_file:
    print(row['density'])
Data formatted as csv
["time_tag","density","speed","temperature"]
["2019-04-14 18:20:00.000","4.88","317.9","11495"]
["2019-04-14 18:21:00.000","4.89","318.4","11111"]
Result
4.88
4.89
I have a csv file as below:
a,green
a,red
a,blue
b,white
b,black
b,brown
I want to read it into python dictionary as below
{'a':{'green','red','blue'},'b':{'white','black','brown'}}
How can I do this? Help me please.
I will assume you want a dict with a list of values per key
{'a':['green','red','blue'],'b':['white','black','brown']}
If so, a possible quick workaround could be something like this
import csv

# Get all the rows in the format [[k, v], ...]
rows = list(csv.reader(open('file_path_here', 'r')))

# Get all the unique keys
keys = set(r[0] for r in rows)

# Get a list of values for the given key
def get_values_list(key, _rows):
    return [r[1] for r in _rows if r[0] == key]

# Generate the dict
keys_dict = dict((k, get_values_list(k, rows)) for k in keys)
print keys_dict
But I'm pretty sure this solution has a lot of room for improvement if you spend some time on it.
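One such improvement: a collections.defaultdict builds the mapping in a single pass instead of rescanning the rows once per key. And since the asker's target output actually uses sets ({'green', 'red', 'blue'}), this Python 3 sketch uses defaultdict(set); an in-memory string stands in for the real csv file:

```python
import csv
import io
from collections import defaultdict

# In-memory stand-in for the csv file from the question
data = io.StringIO("a,green\na,red\na,blue\nb,white\nb,black\nb,brown\n")

result = defaultdict(set)
for key, value in csv.reader(data):
    result[key].add(value)  # sets also deduplicate repeated values

print(dict(result))
# e.g. {'a': {'green', 'red', 'blue'}, 'b': {'white', 'black', 'brown'}}
```

Use defaultdict(list) with .append() instead if the order of the values matters.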
I've searched thoroughly and can't quite find the guidance I am looking for on this issue so I hope this question is not redundant. I have several .csv files that represent raster images. I'd like to perform some statistical analysis on them so I am trying to create a Pandas dataframe for each file so I can slice 'em dice 'em and plot 'em...but I am having trouble looping through the list of files to create a DF with a meaningful name for each file.
Here is what I have so far:
import glob
import os
from pandas import *

# list of .csv files
# I'd like to turn each file into a dataframe
dataList = glob.glob(r'C:\Users\Charlie\Desktop\Qvik\textRasters\*.csv')

# name that I'd like to use for each data frame
nameList = []
for raster in dataList:
    path_list = raster.split(os.sep)
    name = path_list[6][:-4]
    nameList.append(name)

# zip these lists into a dict
dataDct = {}
for k, v in zip(nameList, dataList):
    dataDct[k] = dataDct.get(k, "") + v
dataDct
So now I have a dict where the key is the name I want for each dataframe and the value is the path for read_csv(path):
{'Aspect': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\Aspect.csv',
'Curvature': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\Curvature.csv',
'NormalZ': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\NormalZ.csv',
'Slope': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\Slope.csv',
'SnowDepth': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\SnowDepth.csv',
'Vegetation': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\Vegetation.csv',
'Z': 'C:\\Users\\Charlie\\Desktop\\Qvik\\textRasters\\Z.csv'}
My instinct was to try variations of this:
for k, v in dataDct.iteritems():
    k = read_csv(v)
but that leaves me with a single dataframe, 'k' , that is filled with data from the last file read in by the loop.
I'm probably missing something fundamental here but I am starting to spin my wheels on this so I'd thought I'd ask y'all...any ideas are appreciated!
Cheers.
Are you trying to get all of the data frames separately in a dictionary, one data frame per key? If so, this will leave you with the dict you showed, but with the data frames as the values instead of the file paths.
dataDct = {}
for k, v in zip(nameList, dataList):
    dataDct[k] = read_csv(v)
So now, you could do this for example:
dataDct['SnowDepth'][['cola','colb']].plot()
It's unclear why you're overwriting your object here. I think you want either a list or a dict of the dataframes:
df_list = []
for k, v in dataDct.iteritems():
    df_list.append(read_csv(v))
or
df_dict = {}
for k, v in dataDct.iteritems():
    df_dict[k] = read_csv(v)
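Either loop can also be written as a comprehension. A sketch of the pattern; note that read_csv here is a hypothetical stub standing in for pandas.read_csv, so the example runs without real csv files on disk:

```python
# Stub standing in for pandas.read_csv, purely for illustration
def read_csv(path):
    return 'DataFrame({})'.format(path)

# {name: path} dict, as built in the question
dataDct = {'Slope': 'Slope.csv', 'Aspect': 'Aspect.csv'}

# Build {name: dataframe} in one expression
df_dict = {k: read_csv(v) for k, v in dataDct.items()}
print(df_dict)
```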
Let's say I have a dictionary:
dict = {'R150': 'PN000123', 'R331': 'PN000873', 'C774': 'PN000064', 'L7896': 'PN000447', 'R0640': 'PN000878', 'R454': 'PN000333'}.
I need to fill in this sample csv file: https://www.dropbox.com/s/c95mlitjrvyppef/sheet.csv
example rows
HEADER,ID,ReferenceID,Value,Location X-Coordinate,Location Y-Coordinate,ROOM,ALT_SYMBOLS,Voltage,Thermal_Rating,Tolerance,PartNumber,MPN,Description,Part_Type,PCB Footprint,SPLIT_INST,SWAP_INFO,GROUP,Comments,Wattage,Tol,Population Notes,Gender,ICA_MFR_NAME,ICA_PARTNUM,Order#,CLASS,INSTALLED,TN,RATING,OriginalSymbolOrigin,Rated_Current,Manufacturer 2,Status,Need To Mirror/Rotate Pin Display Properties,TOLERANCE,LEVEL
,,R150,1,,,,,<null>,<null>,<null>,,,to be linked,Resistor,TODO,<null>,<null>,<null>,<null>,1/16W,?,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>
,,R4737,1,,,,,<null>,<null>,<null>,,,to be linked,Resistor,TODO,<null>,<null>,<null>,<null>,1/16W,?,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>
,,R4738,1,,,,,<null>,<null>,<null>,,,to be linked,Resistor,TODO,<null>,<null>,<null>,<null>,1/16W,?,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>,<null>
Specifically, I need to fill in the PartNumber column based on the keys of the dict I created. So I need to iterate through column ReferenceID and compare that value to my keys in dict. If there is a match I need to fill in the corresponding PartNumber cell with that value (in the dict)....
I'm sorry if this is all confusing I am new to python and am having trouble with the csv module.
To get you started, here's something that uses the csv.DictReader object and loops through your file one row at a time. Where the row's ReferenceID exists in your_dict, it sets PartNumber to that value; otherwise it sets it to an empty string.
If you use this in conjuction with the docs at http://docs.python.org/2/library/stdtypes.html#typesmapping and http://docs.python.org/2/library/csv.html - you should be able to write out the data and better understand what's happening.
import csv

your_dict = {'R150': 'PN000123', 'R331': 'PN000873', 'C774': 'PN000064', 'L7896': 'PN000447', 'R0640': 'PN000878', 'R454': 'PN000333'}

with open('your_csv_file.csv') as fin:
    csvin = csv.DictReader(fin)
    for row in csvin:
        row['PartNumber'] = your_dict.get(row['ReferenceID'], '')
        print row
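To go one step further and write the updated rows back out, a csv.DictWriter that reuses the reader's fieldnames works well. This Python 3 sketch uses in-memory files and a cut-down three-column header, since the asker's real sheet.csv isn't available here:

```python
import csv
import io

your_dict = {'R150': 'PN000123', 'R331': 'PN000873'}

# In-memory stand-ins for the input and output files
# (header cut down to three columns for the example)
fin = io.StringIO("ReferenceID,Value,PartNumber\nR150,1,\nR4737,1,\n")
fout = io.StringIO()

reader = csv.DictReader(fin)
writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
writer.writeheader()
for row in reader:
    # Fill PartNumber from the dict, falling back to an empty string
    row['PartNumber'] = your_dict.get(row['ReferenceID'], '')
    writer.writerow(row)

print(fout.getvalue())
```

With real files, replace the two StringIO objects with open(...) calls inside a with block.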
Here's a sample csv file
id, serial_no
2, 500
2, 501
2, 502
3, 600
3, 601
This is the output I'm looking for (a list of serial_nos within a list for each id):
[2, [500,501,502]]
[3, [600, 601]]
I have implemented my solution but it's too much code and I'm sure there are better solutions out there. Still learning Python and I don't know all the tricks yet.
file = 'test.csv'
data = csv.reader(open(file))
fields = data.next()

for row in data:
    each_row = []
    each_row.append(row[0])
    each_row.append(row[1])
    zipped_data.append(each_row)

for rec in zipped_data:
    if rec[0] not in ids:
        ids.append(rec[0])

for id in ids:
    for rec in zipped_data:
        if rec[0] == id:
            ser_no.append(rec[1])
    tmp.append(id)
    tmp.append(ser_no)
    print tmp
    tmp = []
    ser_no = []
**I've omitted variable initialization for simplicity of the code.
The print tmp in the last loop gives me the output I mentioned above. I know there's a better, more pythonic way to do this; it's just too messy! Any suggestions would be great!
import csv
from collections import defaultdict

records = defaultdict(list)
file = 'test.csv'
data = csv.reader(open(file))
fields = data.next()

for row in data:
    records[row[0]].append(row[1])

# sorting by ids since keys don't maintain order
results = sorted(records.items(), key=lambda x: x[0])
print results
If the list of serial_nos needs to be unique, just replace defaultdict(list) with defaultdict(set), and records[row[0]].append(row[1]) with records[row[0]].add(row[1]).
Instead of a list, I'd make it a collections.defaultdict(list), and then just call the append() method on the value.
import collections

result = collections.defaultdict(list)
for row in data:
    result[row[0]].append(row[1])
Here's a version I wrote, looks like there are plenty of answers for this one already though.
You might like using csv.DictReader, gives you easy access to each column by field name (from the header / first line).
#!/usr/bin/python
import csv

myFile = open('sample.csv', 'rb')
csvFile = csv.DictReader(myFile)
# first row will be used for field names (by default)

myData = {}
for myRow in csvFile:
    myId = myRow['id']
    if not myData.has_key(myId): myData[myId] = []
    myData[myId].append(myRow['serial_no'])

for myId in sorted(myData):
    print '%s %s' % (myId, myData[myId])

myFile.close()
Some observations:
0) file is a built-in (a synonym for open), so it's a poor choice of name for a variable. Further, the variable actually holds a file name, so...
1) The file can be closed as soon as we're done reading from it. The easiest way to accomplish that is with a with block.
2) The first loop appears to go over all the rows, grab the first two elements from each, and make a list with those results. However, your rows already all contain only two elements, so this has no net effect. The CSV reader is already an iterator over rows, and the simple way to create a list from an iterator is to pass it to the list constructor.
3) You proceed to make a list of unique ID values, by manually checking. A list of unique things is better known as a set, and the Python set automatically ensures uniqueness.
4) You have the name zipped_data for your data. This is telling: applying zip to the list of rows would produce a list of columns - and the IDs are simply the first column, transformed into a set.
5) We can use a list comprehension to build the list of serial numbers for a given ID. Don't tell Python how to make a list; tell it what you want in it.
6) Printing the results as we get them is kind of messy and inflexible; better to create the entire chunk of data (then we have code that creates that data, so we can do something else with it other than just printing it and forgetting it).
Applying these ideas, we get:
import csv

filename = 'test.csv'
with open(filename) as in_file:
    data = csv.reader(in_file)
    data.next()  # ignore the field labels
    rows = list(data)  # read the rest of the rows from the iterator

print [
    # We want a list of all serial numbers from rows with a matching ID...
    [serial_no for row_id, serial_no in rows if row_id == id]
    # ...for each of the IDs there is to match, which come from making
    # a set out of the first column of the data.
    for id in set(zip(*rows)[0])
]
We can probably do even better than this by using the groupby function from the itertools module.
An example using itertools.groupby. Note that this only works if the rows are already grouped by id.
from csv import DictReader
from itertools import groupby
from operator import itemgetter

filename = 'test.csv'
# the context manager ensures that infile is closed when it goes out of scope
with open(filename) as infile:
    # group by id - this requires that the rows are already grouped by id
    groups = groupby(DictReader(infile), key=itemgetter('id'))
    # loop through the groups, printing a list for each one
    for i, j in groups:
        print [i, map(itemgetter(' serial_no'), list(j))]
Note the space in front of ' serial_no'. It's there because of the space after the comma in the input file's header.
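If the rows were not already grouped by id, sorting them first makes groupby safe, since groupby only merges adjacent equal keys. A Python 3 sketch of the same idea on in-memory rows:

```python
from itertools import groupby
from operator import itemgetter

# Deliberately unsorted (id, serial_no) pairs
rows = [('3', '600'), ('2', '500'), ('2', '501'), ('3', '601'), ('2', '502')]

# groupby only merges *adjacent* equal keys, so sort by id first
rows.sort(key=itemgetter(0))

result = [[key, [serial for _, serial in group]]
          for key, group in groupby(rows, key=itemgetter(0))]
print(result)  # [['2', ['500', '501', '502']], ['3', ['600', '601']]]
```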