CSV parsing in Python: outputting only the rows that have a specific value

I want to use Python to parse a CSV file and output only the rows that have a specific value. This is the code I have so far:
import csv
f = open('alerts2.csv')
csv_f = csv.reader(f)
li1 = []
header = next(csv_f)
for row in csv_f:
    # li1.append(row[5])
    # li1.append(row[0])
    severity = int(row[0])  # integer value from 10 to 40
    Status = str(row[1])
    PolicyName = str(row[2])
    PolicyBlockName = str(row[3])
    PolicyRuleName = str(row[4])
    Summary = str(row[5])
    li1.append(severity)
    li1.append(Summary)  # string variables
print(li1)
f.close()
This outputs the severity and summary values of every row, but I want it to output them only if the severity value is 10.
I was thinking of searching through the list li1 and outputting the values whenever 10 is found. Any suggestions? I am a Python newbie.

import pandas as pd
alerts_df = pd.read_csv('alerts2.csv')
print(alerts_df[alerts_df['severity'] == 10]['Summary'])

Just add this check to your loop over the CSV rows:
for row in csv_f:
    severity = int(row[0])
    if severity != 10:
        continue
If the severity value is not 10, the loop continues with the next row and skips everything that follows for the current row.
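Putting that check into a complete loop, a minimal Python 3 sketch might look like the following. It assumes, as in the question, that severity is in column 0 and the summary in column 5. The inline sample data and the helper name rows_with_severity are made up for illustration.

```python
import csv
import io

# Hypothetical sample standing in for alerts2.csv (column order as in the question).
SAMPLE = """severity,Status,PolicyName,PolicyBlockName,PolicyRuleName,Summary
10,open,p1,b1,r1,disk almost full
40,open,p2,b2,r2,service down
10,closed,p3,b3,r3,login failure
"""

def rows_with_severity(lines, wanted=10):
    """Return (severity, summary) pairs for rows whose severity equals wanted."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    matches = []
    for row in reader:
        severity = int(row[0])
        if severity != wanted:
            continue  # skip rows with any other severity
        matches.append((severity, row[5]))
    return matches

result = rows_with_severity(io.StringIO(SAMPLE))
print(result)
```

With a real file, open('alerts2.csv', newline='') could be passed in place of the io.StringIO object.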

Related

Remove multiple lines from csv

This is my code so far. My CSV has many lines that I would like to keep, but one header line keeps repeating.
This is the line I'd like to omit whenever it is not on the third row:
Curriculum Name,,Organization Employee Number,Employee Department,Employee Name,Employee Email,Employee Status,Date Assigned,Completion Date,Completion Status,Manager Name,Manager Email
It appears every 10 lines or so. I want to keep its first occurrence (always the third row) and remove every later one.
import csv, glob, sys, os
# Read the CSV file, skipping the first 130 lines based on mylist
scanReport = open('Audit.csv', 'r')
scanReader = csv.reader(scanReport)
# Search the rows in the CSV and print out the list
for file in glob.glob(r'C:\sans\Audit.csv'):
    lineNumber = 0
    str = "Curriculum Name"
with open('first.csv', newline='') as inp, open('first_edit.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    for row in csv.reader(inp):
        if row[2] != " 0":
            writer.writerow(row)

You want something like this in that loop:
index = 0
for row in csv.reader(inp):
    if (index != 3) or (index == 3 and row[2] != " 0"):
        writer.writerow(row)
    index += 1
I am not familiar with the csv module, so I kept the rest of your code, assuming it is correct (I don't think you need that module for what you are doing, though).
The manual counter could also be replaced with enumerate.
EDIT:
To check whether it's that line:
def IsThatLine(row):
    return row[0] == "Curriculum Name" and row[1] == "" and row[2] == "Organization Employee Number" and ....
Then the if can become:
if (index != 3) or (index == 3 and not IsThatLine(row)):
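As an alternative to spelling out every column, a Python 3 sketch could remember the header row the first time it is seen and drop any later row identical to it. The sample data below is a shortened, made-up stand-in for Audit.csv, and drop_repeated_header is a hypothetical helper name. The header is assumed to sit on the third row (index 2), as the question states.

```python
import csv
import io

# Hypothetical, shortened stand-in for Audit.csv: two title lines, the header
# on the third row, and the same header repeated further down.
SAMPLE = """Report,,
Generated,,
Curriculum Name,,Organization Employee Number
Safety,,1001
Curriculum Name,,Organization Employee Number
Ethics,,1002
"""

def drop_repeated_header(lines, header_row_index=2):
    """Keep every row except later copies of the row at header_row_index."""
    header = None
    kept = []
    for index, row in enumerate(csv.reader(lines)):
        if index == header_row_index:
            header = row  # remember the first (legitimate) header
        elif header is not None and row == header:
            continue  # a repeated header: skip it
        kept.append(row)
    return kept

cleaned = drop_repeated_header(io.StringIO(SAMPLE))
for row in cleaned:
    print(row)
```

Comparing whole rows avoids listing each column by hand, at the cost of requiring the repeated lines to match the first header exactly.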

Could you please be more specific in your question?
Would you like to remove any line containing the following description?
Curriculum Name,,Organization Employee Number,Employee Department,Employee Name,Employee Email,Employee Status,Date Assigned,Completion Date,Completion Status,Manager Name,Manager Email
Or would you like to remove only the third line (row) of this csv file?

How to find minimum value from CSV file row in Python?

I'm a beginner at Python and I am using it for my project.
I want to extract the minimum value from column4 of a CSV file and I am not sure how to.
I can print the whole of column[4] but am not sure how to print the minimum value (just one column) from column[4].
CSV File: https://www.emcsg.com/marketdata/priceinformation
I'm downloading the Uniform Singapore Energy Price & Demand Forecast for 9 Sep.
Thank you in advance.
This is my current code:
import csv
import operator
with open('sep.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    header = next(readCSV)
    data = []
    for row in readCSV:
        Date = row[0]
        Time = row[1]
        Demand = row[2]
        RCL = row[3]
        USEP = row[4]
        EHEUR = row[5]
        LCP = row[6]
        Regulations = row[7]
        Primary = row[8]
        Secondary = row[9]
        Contingency = row[10]
        Last_Updated = row[11]
        print(header[4])
        print(row[4])

I'm not sure how you are reading the values. However, you can add all the values to a list and then take the minimum:
values = []
for row in readCSV:
    values.append(float(row[4]))  # convert the text to a number before comparing
min_value = min(values)
Note: the values must be converted to numbers first; otherwise min compares them as strings.

Your phrasing is a bit ambiguous. At first I thought you meant the minimum of the fourth row, but looking at the data you seem to want the minimum of the column at index 4 (USEP($/MWh)). For that (assuming "Realtime_Sep-2017.csv" is the filename) you can do:
import pandas as pd
df = pd.read_csv("Realtime_Sep-2017.csv")
print(min(df["USEP($/MWh)"]))
Other options include df["USEP($/MWh)"].min() and min(df.iloc[:,4]).

EDIT 2:
Solution for a column without the pandas module:
with open("Realtime_Sep-2017.csv") as file:
    lines = file.read().split("\n")  # Read lines
num_list = []
for line in lines:
    try:
        item = line.split(",")[4][1:-1]  # Take the column at index 4 and strip the surrounding quotes
        num_list.append(float(item))  # Try to parse
    except (ValueError, IndexError):
        pass  # Not a number (or the line is too short), so skip it
print(max(num_list))  # Prints maximum value
print(min(num_list))  # Prints minimum value
Output:
81.92
79.83
EDIT:
Here is the solution for a column:
import pandas as pd
df = pd.read_csv("Realtime_Sep-2017.csv")
row_count = df.shape[0]
column_list = []
for i in range(row_count):
    item = df.at[i, df.columns.values[4]]  # column at index 4
    column_list.append(float(item))  # parse float and append to the list
print(max(column_list))  # Prints maximum value
print(min(column_list))  # Prints minimum value
BEFORE EDIT:
(solution for a row)
Here is a simple code block:
with open("Realtime_Sep-2017.csv") as file:
    lines = file.read().split("\n")  # Reading lines
num_list = []
line = lines[3]  # Choosing 4th row.
for item in line.split(","):
    try:
        num_list.append(float(item[1:-1]))  # Try to parse
    except ValueError:
        pass  # If it can't be parsed, the string is not a number
print(max(num_list))  # Prints maximum value
print(min(num_list))  # Prints minimum value
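For comparison, the same column minimum can be sketched with the csv module, which strips the surrounding quotes itself so no manual slicing is needed. The sample rows and all column names except USEP($/MWh) are invented for illustration, and column_values is a hypothetical helper name.

```python
import csv
import io

# Hypothetical sample; the real file has more columns.
SAMPLE = '''"Date","Period","Demand","TCL","USEP($/MWh)"
"09 Sep 2017","1","5000.1","0","81.92"
"09 Sep 2017","2","4950.3","0","79.83"
"09 Sep 2017","3","4900.0","0","80.50"
'''

def column_values(lines, column_index):
    """Collect the numeric values found in one column, skipping the header."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    values = []
    for row in reader:
        try:
            values.append(float(row[column_index]))
        except (ValueError, IndexError):
            continue  # not a number, or the row is too short
    return values

usep = column_values(io.StringIO(SAMPLE), 4)
print(min(usep))
print(max(usep))
```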

Python - Extract data from csvfile1 and write to csvfile2 based on values in columns

I have data stored in a csv file :
ID;Event;Date
ABC;In;05/01/2015
XYZ;In;05/01/2016
ERT;In;05/01/2014
... ... ...
ABC;Out;05/01/2017
First, I am trying to extract all rows where Event is "In" and save those rows in a new CSV file. Here is the code I've tried so far:
[UPDATED : 05/18/2017]
with open('csv_in', 'r') as f, open('csv_out','w') as f2:
    fieldnames=['ID','Event','Date']
    reader = csv.DictReader(f, delimiter=';', lineterminator='\n',
                            fieldnames=fieldnames)
    wr = csv.DictWriter(f2,dialect='excel',delimiter=';',
                        lineterminator='\n',fieldnames=fieldnames)
    rows = [row for row in reader if row['Event'] == 'In']
    for row in rows:
        wr.writerows(row)
I am getting the following error: "ValueError: dict contains fields not in fieldnames: 'I', 'D'"
[/UPDATED]
1/ Any thoughts on how to fix this?
2/ Next step: how would you do a "lookup" on the ID (when it appears several times, as ID "ABC" does) and extract the "Date" value where Event is "Out"?
Desired output:
ID Date Exit date
ABC 05/01/2015 05/01/2017
XYZ 05/01/2016
ERT 05/01/2014
Thanks in advance for your input.
PS: I can't use pandas, only the standard library.

You can process the raw CSV with the standard library like so:
oldcsv = open('csv_in.csv', 'r').read().split('\n')
newcsv = []
# this next part checks for events that are "In"
for line in oldcsv:
    if 'In' in line.split(';'):
        newcsv.append(line)
new_csv_file = open('new_csv.csv', 'w')
for line in newcsv:
    new_csv_file.write(line + '\n')
new_csv_file.close()
You would use the same method for your lookup: change the keyword in that for loop. If the newly generated list contains more than one item, your ID occurs more than once, so modify the condition to check for two keywords.

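The error here is because you have not added a delimiter.
Syntax:
csv.DictReader(f, delimiter=';')
In the updated code, which already passes the delimiter, the ValueError comes from calling wr.writerows(row) on a single dict: writerows expects a sequence of dicts, so it treats each key string as a row. Use wr.writerow(row) inside the loop, or wr.writerows(rows) once.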
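A minimal standard-library sketch of the first part is shown here. It lets DictReader take the field names from the header line and writes each matching dict with writerow. The sample rows are taken from the question, while filter_events and the in-memory files are assumptions made so the example is self-contained.

```python
import csv
import io

# Sample rows taken from the question.
SAMPLE = """ID;Event;Date
ABC;In;05/01/2015
XYZ;In;05/01/2016
ERT;In;05/01/2014
ABC;Out;05/01/2017
"""

def filter_events(src, dst, event="In"):
    """Copy the header and every row whose Event column equals event."""
    reader = csv.DictReader(src, delimiter=';')  # header row supplies the field names
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                            delimiter=';', lineterminator='\n')
    writer.writeheader()
    for row in reader:
        if row['Event'] == event:
            writer.writerow(row)  # one dict at a time: writerow, not writerows

out = io.StringIO()
filter_events(io.StringIO(SAMPLE), out)
print(out.getvalue())
```

With real files, the two io.StringIO objects would be replaced by open('csv_in', newline='') and open('csv_out', 'w', newline='').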
For Part 2:
import csv
import datetime
with open('csv_in', 'r') as f, open('csv_out','w') as f2:
    reader = csv.DictReader(f, delimiter=';')
    wr = csv.writer(f2,dialect='excel',lineterminator='\n')
    result = {}
    for row in reader:
        if row['ID'] not in result:
            # Assign values if the ID is not in the dictionary yet
            if row['Event'] == 'In':
                result[row['ID']] = {'IN' : datetime.datetime.strptime(row['Date'], '%d/%m/%Y') }
            else:
                result[row['ID']] = {'OUT' : datetime.datetime.strptime(row['Date'], '%d/%m/%Y') }
        else:
            # Compare dates with those already stored for this ID.
            if row['Event'] == 'In':
                # if 'IN' is not present, use the max value of datetime to compare
                result[row['ID']]['IN'] = min(result[row['ID']].get('IN', datetime.datetime.max), datetime.datetime.strptime(row['Date'], '%d/%m/%Y'))
            else:
                # Similarly, if 'OUT' is not present, use the min value of datetime to compare
                result[row['ID']]['OUT'] = max(result[row['ID']].get('OUT', datetime.datetime.min), datetime.datetime.strptime(row['Date'], '%d/%m/%Y'))
    # format the results back to the desired representation
    for v1 in result.values():
        for k2,v2 in v1.items():
            v1[k2] = datetime.datetime.strftime(v2, '%d/%m/%Y')
    wr.writerow(['ID', 'Entry', 'Exit'])
    for row in result:
        wr.writerow([row, result[row].get('IN'), result[row].get('OUT')])
This code should work just fine. I have tested it on a small input.

Split a row into multiple rows in csv based on column values

I have a CSV as shown below and need to split each line into multiple rows, based on the value in column 3, to load them into a database.
Due to restrictions I can only use the csv module for this, and that is where I am stuck. The problem I am facing is that when I write an insert query, it does not pick up all the rows: it only takes the last record of each for loop and inserts that into the table.
1,2,3,4,5
10,20,30,50
100,200,300,400
Rule: in column 4 of the table, write 'y' for the value that came from column 3 and 'n' for every other value.
Output:
1,2,3,y
1,2,4,n
1,2,5,n
10,20,30,y
10,20,50,n
100,200,300,y
100,200,400,n
Here is my code:
import csv
import os
# Test-new to clean.csv
fRead=open("clean.csv")
csv_r=csv.reader(fRead)
# to skip first two lines
leave=0
for record in csv_r:
    if leave<2:
        leave+=1
        continue
    # storing the values of column 3,4,5 as an array
    JMU=[]
    for t in [2, 3, 4]:
        if not(record[t] in ["", "NA"]):
            JMU.append(record[t].strip())
    #print len(JMU)
    #print "2"
    if len(JMU)==0:
        #print "0"
        pass
    else:
        # check if the name contains WRK
        isWRK = "Table"
        for data in JMU:
            print(data)
            if data[:3].lower()=="wrk" or data[-3:].lower()=="wrk":
                isWRK="Work"
                print(isWRK)
            else:
                isWRK = "table"
        # check if column 2 value is "Yes" or "No"
        fourthColumn="N"
        if not(record[2] in ["", "NA"]):
            #print record[2]
            if record[3].strip().lower()=="no":
                # print record[3]
                fourthColumn = "I"
            else:
                fourthColumn = "N"
        for i in JMU:
            iWRK = "Table"
            if record[2]==i:
                newRecord = [record[0], record[1], i, fourthColumn, isWRK,]
                #print newRecord
            elif record[3] == i:
                newRecord = [record[0], record[1], i, "N", isWRK]
                #print newRecord
            else:
                newRecord = [record[0], record[1], i, "N", isWRK]
        print ("insert into table (column_a,column_b,column_c,column_d,column_e) values (%s,%s,%s,%s,%s)"% (record[0],record[1],record[2],record[3],record[4]))
fRead.close()
fWrite.close()

I'm assuming you want to keep the first two columns constant and make a new row for each further number present on the same input line.
Initially I came up with this one-line awk command:
$ cat data
1,2,3,4,5
10,20,30,50
100,200,300,400
$ awk -F, -v OFS=, '{for(i=3;i<=NF;i++) print $1, $2, $i, (i==3?"y":"n")}' data
1,2,3,y
1,2,4,n
1,2,5,n
10,20,30,y
10,20,50,n
100,200,300,y
100,200,400,n
Then I replicated the same logic in Python using the csv module:
import csv
with open('data', 'r') as f:
    reader=csv.reader(f)
    for row in reader:
        l=list(map(int, row))
        for i in range(2, len(l)):
            print(l[0], l[1], l[i], 'y' if i==2 else 'n', sep=',')
Here is a sample run, which matches awk's output:
1,2,3,y
1,2,4,n
1,2,5,n
10,20,30,y
10,20,50,n
100,200,300,y
100,200,400,n
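The symptom in the question (only the last record reaching the table) appears to come from handling the record after the inner loop has finished, so only the final value survives. One way to avoid that, sketched below using only the csv module, is to produce every new row inside the loop and hand each one on immediately. The generator name expand_rows is hypothetical, and the database insert is left out because the question does not show the connection.

```python
import csv
import io

# Sample data from the question.
SAMPLE = """1,2,3,4,5
10,20,30,50
100,200,300,400
"""

def expand_rows(lines):
    """Yield one [col1, col2, value, flag] row per value from column 3 onwards."""
    for row in csv.reader(lines):
        first, second = row[0], row[1]
        for position, value in enumerate(row[2:]):
            flag = 'y' if position == 0 else 'n'  # 'y' only for the original column 3
            yield [first, second, value, flag]

expanded = list(expand_rows(io.StringIO(SAMPLE)))
for new_row in expanded:
    print(','.join(new_row))
```

Each yielded row could be passed to the insert statement as it is produced, rather than after the loop ends.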

Splitting a delimited file and storing into new column

I am trying to split a CSV file. After reading the delimited file, I want to split one of the columns further. My sample code:
import csv
import os
sample = open(os.path.expanduser('~/sample.txt'))
adr = csv.reader(sample, delimiter='|')
for row in adr:
    a = row[0]
    b = row[1]
    c = row[2]
    d = row[3]
    new = ""
    new = row[4].split(",")
    for row1 in new:
        print(row1)
sample.txt file contains:
aa|bb|cc|dd|1,2,3,4|xx
ab|ax|am|ef|1,5,6|jk
cx|kd|rd|j|1,9|k
The above code produces this output:
[1,2,3,4]
[1,5,6]
[1,9]
I am trying to split the new column further and will use the split output for comparison. For example, the desired output after splitting would be:
aa|bb|cc|dd|1|2|3|4|xx
ab|ax|am|ef|1|5|6| |jk
cx|kd|rd|j|1|9| | |k
I also want to store blank or NULL values for the missing entries of the new columns, as in the example above, where [1,5,6] is shorter than [1,2,3,4]. Is there a better way to split?

You're pretty much there already! A few more lines after new = row[4].split(",") are all you need.
for i in range(len(new), 4):
    new.append('')
newrow = row[0:4] + new + row[5:]
print('|'.join(newrow))
Edit 2: to address your comments below in the simplest way possible, just loop through the file twice, looking for the longest "subarray" the first time. Regarding the extra printing: you likely copied the code to the wrong place or indentation, so that it ends up inside the loop.
Full code:
import csv
import os
sample = open(os.path.expanduser('~/sample.txt'))
adr = csv.reader(sample, delimiter='|')
longest = 0
for row in adr:
    curLen = len(row[4].split(','))
    if curLen > longest:
        longest = curLen
sample.seek(0)  # rewind the file for the second pass
for row in adr:
    new = row[4].split(",")
    for i in range(len(new), longest):
        new.append(' ')
    newrow = row[0:4] + new + row[5:]
    print('|'.join(newrow))
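If the file fits in memory, a variant that reads it once and pads afterwards is also possible. The following is only a sketch under that assumption. split_and_pad is a hypothetical helper, and the sample is the data from the question.

```python
import csv
import io

# Sample data from the question.
SAMPLE = """aa|bb|cc|dd|1,2,3,4|xx
ab|ax|am|ef|1,5,6|jk
cx|kd|rd|j|1,9|k
"""

def split_and_pad(lines, column=4, filler=' '):
    """Split one column on commas and pad every row to the longest split."""
    rows = list(csv.reader(lines, delimiter='|'))
    parts = [row[column].split(',') for row in rows]
    longest = max(len(p) for p in parts)
    result = []
    for row, p in zip(rows, parts):
        padded = p + [filler] * (longest - len(p))  # fill the missing entries
        result.append(row[:column] + padded + row[column + 1:])
    return result

padded_rows = split_and_pad(io.StringIO(SAMPLE))
for row in padded_rows:
    print('|'.join(row))
```

Reading all rows into a list avoids rewinding the file, at the cost of holding the whole file in memory.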
