I'm trying to write out a CSV file by printing out an output of average readings, maximum readings and outlier readings. I have the readings, but they are not in order. Here is my output when I try to print out outlier readings, printing out the sensor name beforehand. This happens for every reading I try in the functions I have made. I would like it to print from 1 to 25, instead of starting at 22 and then becoming randomized.
sensor_22
[5]
sensor_23
[5, 6]
sensor_20
[5, 6, 5]
sensor_21
[5, 6, 5, 1]
sensor_24
[5, 6, 5, 1, 1]
sensor_25
[5, 6, 5, 1, 1, 6]
sensor_9
[5, 6, 5, 1, 1, 6, 8]
sensor_8
[5, 6, 5, 1, 1, 6, 8, 5]
sensor_3
[5, 6, 5, 1, 1, 6, 8, 5, 0]
sensor_2
[5, 6, 5, 1, 1, 6, 8, 5, 0, 6]
sensor_1
[5, 6, 5, 1, 1, 6, 8, 5, 0, 6, 9]...etc
The values are mismatched to the sensor name in my final CSV file.
Here is some CSV file data. It is a fairly large file.
sensor_1,2018-01-02,115
sensor_1,2018-01-03,51
sensor_1,2018-01-04,30
sensor_1,2018-01-05,198
Here is my current code. I cannot use any pandas, numpy..etc
import csv
options = []
options_readings = {}
max_readings= []
avg_readings = []
outlier_readings = []
with open('sensor_data.csv', 'rb') as f:
    reader = csv.reader(f, delimiter = ',')
    for row in reader:
        if row[0] not in options:
            options.append(row[0])
            options_readings[row[0]] = []
        sensor_name = row[0]
        datastamp = row[1]
        readings = float(row[2])
        options_readings[row[0]].append(readings)
def calculateNumberOfOutlierReadings():
    for option in options_readings:
        print(option)
        readings = options_readings[option]
        count = 0
        for row in readings:
            if(row > 180 or row <0):
                count +=1
        outlier_readings.append(count)
        print(outlier_readings)
def calculateMaxReadings():
    for options in options_readings:
        max_readings.append(max(options_readings[options]))
def calculateAverageReadings():
    for option in options_readings:
        readings = options_readings[option]
        avg_readings.append(sum(readings)/len(readings))
calculateMaxReadings()
calculateAverageReadings()
calculateNumberOfOutlierReadings()
zip(options, avg_readings, max_readings, outlier_readings)
with open('output.csv', 'wb') as out:
    writer = csv.writer(out, delimiter = ',')
    header = ["Sensor Name", "Average Reading", "Maximum Reading", "Number of Outlier Readings"]
    writer.writerow([i for i in header])
    writer.writerows(zip(options, avg_readings, max_readings, outlier_readings))
I believe using options_readings as a dictionary and trying to sort those keys are my issue. Any help would be appreciated.
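At the end of your solution, sort the data before you write it to a file. Note that this only fixes the order of the rows: options is filled in file order while options_readings is iterated in dictionary order, so the values can still be paired with the wrong sensor name. The suggested improvements further down avoid that by keeping each sensor name together with its own results.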
....
data = sorted(zip(options, avg_readings, max_readings, outlier_readings))
with open('output.csv', 'w') as out:
    writer = csv.writer(out, delimiter = ',')
    header = ["Sensor Name", "Average Reading", "Maximum Reading", "Number of Outlier Readings"]
    writer.writerow([i for i in header])
    writer.writerows(data)
Suggested improvements:
Rewrite the functions so that they return a result. Many (most?) people prefer functions to return value(s) instead of mutating things external to the function.
def calculateNumberOfOutlierReadings(readings):
    count = 0
    for reading in readings:
        if reading < 0 or reading > 180:
            count += 1
    return count
def calculateMaxReadings(readings):
    return max(readings)
def calculateAverageReadings(readings):
    return sum(readings) / len(readings)
Keep all the data from the csv in a dictionary - {sensor_name:[value,value,...]}.
import collections
import csv
data = collections.defaultdict(list)
with open('foo.csv') as f:
    reader = csv.reader(f)
    for (sensor,date,value) in reader:
        data[sensor].append(float(value))
Iterate over the data; perform calcs on each item's values; put the calcs in a tuple using the sensor name for the first item; store the tuple in a list.
results = []
for sensor,readings in data.items():
    number_of_outliers = calculateNumberOfOutlierReadings(readings)
    maximum = calculateMaxReadings(readings)
    average = calculateAverageReadings(readings)
    results.append((sensor,average,maximum,number_of_outliers))
Sort the results
results.sort()
For a csv that looks like
sensor_4,2018-01-05,198
sensor_2,2018-01-02,115
sensor_4,2018-01-03,51
sensor_2,2018-01-04,30
sensor_1,2018-01-02,115
sensor_4,2018-01-04,30
sensor_2,2018-01-05,198
sensor_1,2018-01-04,30
sensor_3,2018-01-04,30
sensor_1,2018-01-05,198
sensor_4,2018-01-02,115
sensor_2,2018-01-05,198
sensor_3,2018-01-02,115
sensor_1,2018-01-03,51
sensor_3,2018-01-03,51
sensor_2,2018-01-03,51
sensor_3,2018-01-05,198
results ends up being in the order you want:
In [16]: results
Out[16]:
[('sensor_1', 98.5, 198.0, 1),
('sensor_2', 118.4, 198.0, 2),
('sensor_3', 98.5, 198.0, 1),
('sensor_4', 98.5, 198.0, 1)]
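One caveat with a plain sort: the tuples are compared by sensor name as strings, so with 25 sensors 'sensor_10' would come before 'sensor_2'. Below is a minimal sketch of a numeric sort key, assuming every name ends in an underscore followed by an integer. The helper name sensor_number and the sample tuples are made up for illustration.

```python
def sensor_number(result):
    """Return the integer suffix of the sensor name in a result tuple."""
    name = result[0]                      # e.g. 'sensor_12'
    return int(name.rsplit('_', 1)[1])    # the part after the last underscore

# Made-up result tuples in the (name, average, maximum, outliers) layout.
results = [
    ('sensor_10', 90.0, 150.0, 0),
    ('sensor_2', 118.4, 198.0, 2),
    ('sensor_1', 98.5, 198.0, 1),
]

plain = sorted(results)                       # string order
numeric = sorted(results, key=sensor_number)  # numeric order

print([r[0] for r in plain])    # ['sensor_1', 'sensor_10', 'sensor_2']
print([r[0] for r in numeric])  # ['sensor_1', 'sensor_2', 'sensor_10']
```

Passing key=sensor_number to results.sort() works the same way if sorting in place is preferred.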
Related
I have an csv file like this:
student_id,event_id,score
1,1,20
3,1,20
4,1,18
5,1,13
6,1,18
7,1,14
8,1,14
9,1,11
10,1,19
...
and I need to convert it into multiple arrays/lists like I did using pandas here:
scores = pd.read_csv("/content/score.csv", encoding = 'utf-8',
index_col = [])
student_id = scores['student_id'].values
event_id = scores['event_id'].values
score = scores['score'].values
print(scores.head())
As you can see, I get three arrays, which I need in order to run the data analysis. How can I do this using Python's CSV library? I have to do this without the use of pandas. Also, how can I export data from multiple new arrays into a csv file when I am done with this data? I, again, used panda to do this:
avg = avgScore
max = maxScore
min = minScore
sum = sumScore
id = student_id_data
dict = {'avg(score)': avg, 'max(score)': max, 'min(score)': min, 'sum(score)': sum, 'student_id': id}
df = pd.DataFrame(dict)
df.to_csv(r'/content/AnalyzedData.csv', index=False)
Those first 5 are arrays if you are wondering.
Here's a partial answer which will produce a separate list for each column in the CSV file.
import csv
csv_filepath = "score.csv"
with open(csv_filepath, "r", newline='') as csv_file:
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames
    lists = {column: [] for column in columns}  # Lists for each column.
    for row in reader:
        for column in columns:
            lists[column].append(int(row[column]))
for column_name, column in lists.items():
    print(f'{column_name}: {column}')
Sample output:
student_id: [1, 3, 4, 5, 6, 7, 8, 9, 10]
event_id: [1, 1, 1, 1, 1, 1, 1, 1, 1]
score: [20, 20, 18, 13, 18, 14, 14, 11, 19]
You also asked how to do the reverse of this. Here's an example that I hope is self-explanatory:
# Dummy sample analysis data
length = len(lists['student_id'])
avgScore = list(range(length))
maxScore = list(range(length))
minScore = list(range(length))
sumScore = list(range(length))
student_ids = lists['student_id']
csv_output_filepath = 'analysis.csv'
fieldnames = ('avg(score)', 'max(score)', 'min(score)', 'sum(score)', 'student_id')
with open(csv_output_filepath, 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames)
    writer.writeheader()
    for values in zip(avgScore, maxScore, minScore, sumScore, student_ids):
        row = dict(zip(fieldnames, values))  # Combine into dictionary.
        writer.writerow(row)
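If the analysis itself also has to be done without pandas, the per-student statistics can be computed from the column lists with a dictionary. The following is a rough sketch, assuming the goal is the average, maximum, minimum and sum of score for each student_id. The sample values are made up, and io.StringIO stands in for a real output file.

```python
import csv
import io
from collections import defaultdict

# Made-up column lists in the same shape as the `lists` dictionary above.
student_id = [1, 3, 1, 3, 4]
score = [20, 20, 10, 14, 18]

# Group the scores by student.
scores_by_student = defaultdict(list)
for sid, s in zip(student_id, score):
    scores_by_student[sid].append(s)

fieldnames = ('avg(score)', 'max(score)', 'min(score)', 'sum(score)', 'student_id')
out = io.StringIO()  # stand-in for open('analysis.csv', 'w', newline='')
writer = csv.DictWriter(out, fieldnames)
writer.writeheader()
for sid in sorted(scores_by_student):
    values = scores_by_student[sid]
    writer.writerow({
        'avg(score)': sum(values) / len(values),
        'max(score)': max(values),
        'min(score)': min(values),
        'sum(score)': sum(values),
        'student_id': sid,
    })

print(out.getvalue())
```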
What you want to do does not require the csv module, it's just three lines of code (one of them admittedly dense)
# strip() drops the trailing newline so the last label is 'score', not 'score\n'
splitted_lines = (line.strip().split(',') for line in open('/path/to/you/data.csv'))
labels = next(splitted_lines)
arr = dict(zip(labels,zip(*((int(i) for i in ii) for ii in splitted_lines))))
splitted_lines is a generator that iterates over your data file one line at a time and yields a list with the three (in your example) items of each line.
next(splitted_lines) returns the list with the split content of the first line, that is, our three labels.
We fit our data in a dictionary; by calling dict it is possible to initialize it from an iterable of 2-tuples, here the result of a zip:
the 1st argument of that zip is labels, so the keys of the dictionary will be the labels of the columns;
the 2nd argument is the result of an inner zip, used here because zipping the starred form of a sequence of sequences has the effect of transposing it, so the value associated with each key will be the transpose of what follows the *;
what follows the * is simply (the generator equivalent of) a list of lists with (in your example) 9 rows of three integer values, so that
the second argument to the 1st zip is consequently a sequence of three sequences of nine integers, which are going to be coupled to the corresponding three keys/labels.
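The transposing step is easier to see on a tiny input. Here is a small illustration with made-up rows (not the asker's file):

```python
rows = [
    [1, 1, 20],
    [3, 1, 20],
    [4, 1, 18],
]
labels = ['student_id', 'event_id', 'score']

columns = list(zip(*rows))         # transpose: one tuple per column
arr = dict(zip(labels, columns))   # pair each label with its column

print(columns)       # [(1, 3, 4), (1, 1, 1), (20, 20, 18)]
print(arr['score'])  # (20, 20, 18)
```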
Here I have an example of using the data collected by the previous three lines of code
In [119]: print("\n".join("%15s:%s"%(l,','.join("%3d"%i for i in arr[l])) for l in labels))
...:
student_id: 1, 3, 4, 5, 6, 7, 8, 9, 10
event_id: 1, 1, 1, 1, 1, 1, 1, 1, 1
score: 20, 20, 18, 13, 18, 14, 14, 11, 19
In [120]: print(*arr['score'])
20 20 18 13 18 14 14 11 19
PS If the question were about an assignment in a sort of Python 101 it's unlikely that my solution would be deemed acceptable
I am trying to write a CSV file with DictWriter.
So far I have created a list and a loop, but it only writes the last row of the list, repeated once for each item in the list.
This is my code:
fieldnames = ['Code', 'Hight', 'Country']
with open('write.csv', 'w',newline="") as f:
    w = csv.DictWriter(f,fieldnames=fieldnames,delimiter = "\t")
    w.writeheader()
    for i in my_list:
        w.writerow({"Code":code,"Hight":hight,"Country":country})
You are writing the same variable on each iteration of the for loop. If my_list is a list of dictionaries, the code should be:
for i in my_list:
    w.writerow({"Code": i['code'], "Hight": i['hight'], "Country": i['country']})
If my_list is a list of lists,
for i in my_list:
    w.writerow({"Code": i[0], "Hight": i[1], "Country": i[2]})
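When my_list is a list of dictionaries, the whole list can also be written in one call to writerows. This is only a sketch: it assumes the dictionaries use the lowercase keys from the first snippet above, the sample rows are invented, and io.StringIO stands in for the real file.

```python
import csv
import io

# Invented sample data; the real my_list comes from the asker's program.
my_list = [
    {'code': 'A1', 'hight': 170, 'country': 'NL'},
    {'code': 'B2', 'hight': 182, 'country': 'DE'},
]
fieldnames = ['Code', 'Hight', 'Country']

f = io.StringIO()  # stand-in for open('write.csv', 'w', newline='')
w = csv.DictWriter(f, fieldnames=fieldnames, delimiter='\t')
w.writeheader()
# Build one row dictionary per item, so each row gets its own values.
w.writerows(
    {'Code': item['code'], 'Hight': item['hight'], 'Country': item['country']}
    for item in my_list
)

print(f.getvalue())
```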
This might help you
import csv
nms = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
# newline='' lets the csv module control the line endings itself (Python 3)
with open('numbers2.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for row in nms:
        writer.writerow(row)
I'm working with python and I read out a column from a CSV file. I save the values in an array by grouping them. This array looks something like this:
[1, 5, 10, 15, 7, 3]
I want to create a second array where I take the number of the array and make the sum with the previous values. So in this case I would like the have the following output:
[1, 6, 16, 31, 38, 41]
My code is as follows:
import csv
import itertools
with open("c:/test", 'rb') as f:
    reader = csv.reader(f, delimiter=';')
    data1 = []
    data2 = []
    for column in reader:
        data1.append(column[2])
results = data1
results = [int(i) for i in results]
results = [len(list(v)) for _, v in itertools.groupby(results)]
print results
data2.append(results[0])
data2.append(results[0]+results[1])
data2.append(results[0]+results[1]+results[2])
print data2
So I can make the array by doing it manually, but this costs a lot of time and is probably not the best way to do it. So what is the best way to do something like this?
You are looking for the cumulative sum of a list. The easiest way is to let numpy do it.
>>> import numpy as np
>>> np.cumsum([1, 5, 10, 15, 7, 3])
array([ 1, 6, 16, 31, 38, 41])
a = [1, 5, 10, 15, 7, 3]
b = [a[0]]
for i in range(1, len(a)):
    b.append(b[-1]+ a[i])
a is your column from the .csv file. b is a list that starts with one value in it, the first item of a. Then we loop through a starting from its second item, add each subsequent value to the last item of b, and append the result to b.
Using your code objects, what you look for would be something like:
from __future__ import print_function
import csv
import itertools
"""
with open("c:/test", 'rb') as f:
    reader = csv.reader(f, delimiter=';')
    for column in reader:
        data1.append(column[2])
"""
data1 = [1, 5, 10, 15, 7, 3]
results = [data1[0]]
for i in range(1, len(data1)):
    results.append(results[i-1] + data1[i])
print(data1, results)
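On Python 3 the standard library can also do this without numpy: itertools.accumulate yields the running totals. A short sketch using the list from the question:

```python
from itertools import accumulate

a = [1, 5, 10, 15, 7, 3]
b = list(accumulate(a))  # running totals; addition is the default operation
print(b)  # [1, 6, 16, 31, 38, 41]
```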
I am attempting to merge several rows of csv data into one long row, given two cells contain the same data. For instance, take the following csv:
one, two, three
1, 2, 3
4, 5, 6
7, 8, 9
1, 1, 1
4, 4, 4
If two rows share the same value at row[0], I want the second row appended to the first. So my end product should look like this:
one, two, three
1, 2, 3, 1, 1, 1
4, 5, 6, 4, 4, 4
7, 8, 9
Here is my attempt so far:
import csv
uniqueNum = []
uniqueMaster = []
count = -1
with open("Test.csv", "rb") as source:
    reader = csv.reader(source)
    header = next(reader)
    for row in reader:
        if row[0] not in uniqueNum:
            uniqueMaster.append(row)
            uniqueNum.append(row[0])
            count = count + 1
        for row in reader:
            if row[0] in uniqueNum:
                uniqueMaster[count].append(row)
with open("holding.csv","wb") as result:
    writer = csv.writer(result)
    writer.writerow(header)
    for row in uniqueMaster:
        writer.writerow(row)
Things LOOK ok to me, but my script only outputs the following:
one, two, three
1, 2, 3, ['1', '1', '1']
This is obviously wrong for two reasons. First, it doesn't iterate through the entire csv, and second, the appended values are being squeezed into one cell, rather than individual cells. If anyone had any advice on getting this to work right I'd highly appreciate it!
Use a dictionary instead. Starting from the middle of your code(assume I have declared a dict called my_dict):
for row in reader:
    if row[0] in my_dict.keys():
        my_dict[row[0]].extend(row)
    else:
        my_dict[row[0]]=row
#...now we are at the bottom of your code, writing to the csv
for v in my_dict.values():
    writer.writerow(v)
import csv
csv_dict = {}
with open("Test.csv", "r") as source:
    reader = csv.reader(source)
    header = next(reader)
    for row in reader:
        if row[0] in csv_dict:
            csv_dict[row[0]] += row
        else:
            csv_dict[row[0]] = row
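To finish the job, the merged rows still have to be written out. Below is a minimal Python 3 sketch of the whole round trip. It assumes Python 3.7+ so that the dictionary keeps first-seen order, uses io.StringIO objects as stand-ins for Test.csv and holding.csv, and passes skipinitialspace=True because the sample data has a space after each comma.

```python
import csv
import io

# Stand-in for Test.csv, using the sample data from the question.
source = io.StringIO(
    "one, two, three\n"
    "1, 2, 3\n"
    "4, 5, 6\n"
    "7, 8, 9\n"
    "1, 1, 1\n"
    "4, 4, 4\n"
)

merged = {}
reader = csv.reader(source, skipinitialspace=True)
header = next(reader)
for row in reader:
    # setdefault creates the list the first time a key is seen;
    # extend then appends the cells individually rather than as one nested list.
    merged.setdefault(row[0], []).extend(row)

result = io.StringIO()  # stand-in for open("holding.csv", "w", newline="")
writer = csv.writer(result)
writer.writerow(header)
writer.writerows(merged.values())

print(result.getvalue())
```

Using extend rather than append is what keeps each value in its own cell, which addresses the second problem described in the question.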
I have a dictionary that looks like this...
cla_1results= {"Tom":[1,7,4],"Dunc":[3,9,4],"Jack":[1,3,5]}
I want to write this dictionary to a csv so that it is in the following format
Don't have the rep to post images but it would be something like this...
Tom, 1, 7, 4
Dunc 3, 9, 4
Jack 1, 3, 5
Nothing I've tried has worked. My recent effort is below but I'm a real beginner with Python and programming in general.
import csv
cla_1results= {"Tom":[1,7,4],"Dunc":[3,9,4],"Jack":[1,3,5]}
cla_2results = {"Jane":[1,7,4],"Lynda":[3,9,4],"Dave":[1,3,5]}
cla_3results = {"Gemma":[1,7,4],"Steve":[3,9,4],"Jay":[1,3,5]}
b = open ('test.csv','w')
a = csv.writer(b)
data = cla_1results= {"Tom":[1,7,4],"Dunc":[3,9,4],"Jack":[1,3,5]}
a.writerows(data)
b.close()
which unfortunately only gives me:
T, o, m
D, u, n, c
J, a, c, k
etc
This should work. Each call to writerow just needs a list, and that list can be built on the fly.
import csv
cla_1results= {"Tom":[1,7,4],"Dunc":[3,9,4],"Jack":[1,3,5]}
# Python 2 code; on Python 3 use open('test.csv', 'w', newline='') and .items()
with open('test.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile, delimiter=',')
    for key,value in cla_1results.iteritems():
        writer.writerow([key]+value)
One of many ways to do it:
import csv
cla_1results = {
    "Tom": [1, 7, 4],
    "Dunc": [3, 9, 4],
    "Jack": [1, 3, 5]
}
with open("test.csv", 'w+') as file:
    writer = csv.writer(file)
    for name in cla_1results:
        writer.writerow([name, ] + [i for i in cla_1results[name]])
You can use the DataFrame.from_dict() classmethod to convert dict to DataFrame and then can use to_csv to convert the dataframe to csv. I have used header=False to strip off the headers.
from pandas import DataFrame
cla_1results = {"Tom": [1, 7, 4], "Dunc": [3, 9, 4], "Jack": [1, 3, 5]}
df = DataFrame.from_dict(cla_1results, orient='index')
print(df.to_csv(header=False))
Dunc,3,9,4
Jack,1,3,5
Tom,1,7,4
Try:
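import csv
# cla is the dictionary to write, e.g. cla_1results from the question (Python 2 code)
with open('test.csv', 'wb') as csvfile:
    c = csv.writer(csvfile)
    for key, value in cla.iteritems():
        line = [key]      # start a fresh row for each name
        for i in value:
            line.append(i)
        c.writerow(line)  # write one row per dictionary entry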
data = {"Tom":[1,7,4],"Dunc":[3,9,4],"Jack":[1,3,5]}
with open('test.csv', 'w') as f:
    for k, vals in data.items():
        # a list comprehension instead of map(), so the concatenation also works on Python 3
        line = ','.join([k] + [str(v) for v in vals]) + '\n'
        f.write(line)
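Several of the snippets above are written for Python 2 ('wb' mode, iteritems()). Here is a sketch of the same idea for Python 3, where the file would be opened in text mode with newline=''. io.StringIO stands in for test.csv so the result can be inspected directly.

```python
import csv
import io

cla_1results = {"Tom": [1, 7, 4], "Dunc": [3, 9, 4], "Jack": [1, 3, 5]}

csvfile = io.StringIO()  # stand-in for open('test.csv', 'w', newline='')
writer = csv.writer(csvfile)
for name, scores in cla_1results.items():
    writer.writerow([name] + scores)  # one row: the name followed by its scores

print(csvfile.getvalue())
```

On Python 3.7+ the rows come out in the order the dictionary was written (Tom, Dunc, Jack). Wrap the loop in sorted(cla_1results.items()) if alphabetical order is wanted instead.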