I am working on converting a BLF file into a tab-separated file. I am able to extract all the useful information from the file into a list. I want to calculate the difference between consecutive timestamp values in one column. Here is my code so far:
import can
import csv
import datetime
import pandas as pd
filename = open('C:\\Users\\shraddhasrivastav\\Downloads\\BLF File\\output.csv', "w")
log = can.BLFReader('C:\\Users\\shraddhasrivastav\\Downloads\\BLF File\\test.blf')
# print ("We are here!")
log_output = []
for msg in log:
    msg = str(msg).split()
    #print (msg)
    data_list = msg[7:(7 + int(msg[6]))]
    log_output_entry = [(msg[1]), msg[3], msg[6], " ".join(data_list), msg[-1]]
    log_output_entry.insert(1, 'ID=')
    test_entry = " \t ".join(log_output_entry)  # join the list and remove string quotes in the csv file
    filename.write(test_entry + '\n')

df = pd.DataFrame(log_output)
df.columns = ['Timestamp', 'ID', 'DLC', 'Channel']

filename.close()  # Close the file outside the loop
The output I am getting so far is below:
Under my first column, I want the difference between consecutive timestamp values (for example: 2nd row timestamp minus 1st row timestamp, 4th row timestamp minus 3rd row timestamp, and so on). What should I add to my code to achieve this?
Below is a screenshot of how I want my file's Timestamp field to look (calculating the difference between consecutive rows).
You can use pandas.DataFrame.shift:
df['Time Delta'] = df['Timestamp'] - df['Timestamp'].shift(periods=1, axis=0)
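Equivalently, `Series.diff()` computes the same consecutive differences in one call; a small sketch with made-up timestamp values:

```python
import pandas as pd

# hypothetical timestamps (in seconds) standing in for the BLF values
df = pd.DataFrame({'Timestamp': [0.0, 0.5, 1.5, 1.75]})

# diff() subtracts each row from the row before it (the first row becomes NaN)
df['Time Delta'] = df['Timestamp'].diff()

print(df['Time Delta'].tolist())  # [nan, 0.5, 1.0, 0.25]
```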
Keep in mind that the text file you currently write has a variable number of columns per row, so it might be hard to load directly into pandas. Maybe the following would work:
import can
import csv
import datetime
import pandas as pd
#filename = open('C:\\Users\\shraddhasrivastav\\Downloads\\BLF File\\output.csv', "w")
log = can.BLFReader('C:\\Users\\shraddhasrivastav\\Downloads\\BLF File\\test.blf')
# print ("We are here!")
log_output = []
for msg in log:
    msg = str(msg).split()
    data_list = msg[7:(7 + int(msg[6]))]
    # Timestamp, ID, DLC, data bytes, channel
    log_output_entry = [msg[1], msg[3], msg[6], " ".join(data_list), msg[-1]]
    assert len(log_output_entry) == 5
    log_output.append(log_output_entry)

df = pd.DataFrame(log_output)
df.columns = ['Timestamp', 'ID', 'DLC', 'Data', 'Channel']
df['Timestamp'] = pd.to_datetime(df['Timestamp'].astype(float), unit='s')
df['Time Delta'] = df['Timestamp'] - df['Timestamp'].shift(periods=1, axis=0)
df.to_csv('C:\\Users\\shraddhasrivastav\\Downloads\\BLF File\\output_df.csv')
Going nuts trying to update a column of time entries in a dataframe. I am opening a csv file that has a column of time entries in UTC. I can take these times, convert them to Alaska Standard time, and print that new time out just fine. But when I attempt to put the time back into the dataframe, while I get no errors, I also don't get the new time in the dataframe. The old UTC time is retained. Code is below, I'm curious what it is I am missing. Is there something special about times?
import glob
import os
import pandas as pd
from datetime import datetime
from statistics import mean
def main():
    AKST = 9
    allDirectories = os.listdir('c:\\MyDir\\')
    for directory in allDirectories:
        curDirectory = directory.capitalize()
        print('Gathering data from: ' + curDirectory)
        dirPath = 'c:\\MyDir\\' + directory + '\\*.csv'
        # Files are named by date, so sorting by name gives us a proper date order
        files = sorted(glob.glob(dirPath))
        df = pd.DataFrame()
        for i in range(0, len(files)):
            data = pd.read_csv(files[i], usecols=['UTCDateTime', 'current_humidity', 'pm2_5_cf_1', 'pm2_5_cf_1_b'])
            dfTemp = pd.DataFrame(data)  # Temporary dataframe to hold our new info
            df = pd.concat([df, dfTemp], axis=0)  # Add new info to end of dataFrame
        print("Converting UTC to AKST, this may take a moment.")
        for index, row in df.iterrows():
            convertedDT = datetime.strptime(row['UTCDateTime'], '%Y/%m/%dT%H:%M:%Sz') - pd.DateOffset(hours=AKST)
            print("UTC: " + row['UTCDateTime'])
            df.at[index, 'UTCDateTime'] = convertedDT
            print("AKST: " + str(convertedDT))
            print("row['UTCDateTime'] = " + row['UTCDateTime'] + '\n')  # Should be updated with AKST, but is not!
Edit - Alternatively: Is there a way to go about converting the date when it is first read in to the dataframe? Seems like that would be faster than having two for loops.
From your code, it looks like the data is getting updated correctly in the dataframe, but you are printing the row, which is not updated, as it was fetched from the dataframe before the update!
# You are updating df here
df.at[index, 'UTCDateTime'] = convertedDT

# ...but below you are printing `row`, which still holds the old value
print("row['UTCDateTime'] = " + row['UTCDateTime'])
See sample code below and its output for the explanation.
data = pd.DataFrame({'Year': [1982, 1983], 'Statut': ['Yes', 'No']})
for index, row in data.iterrows():
    data.at[index, 'Year'] = '5000' + str(index)
    print('Printing row which is unchanged : ', row['Year'])
print('Updated Dataframe\n', data)
Output
Printing row which is unchanged : 1982
Printing row which is unchanged : 1983
Updated Dataframe
     Year Statut
0   50000    Yes
1   50001     No
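As an aside on the asker's edit: yes, the conversion can be done on the whole column at once, which avoids `iterrows()` entirely. A sketch with made-up timestamps in the question's UTC format:

```python
import pandas as pd

# hypothetical sample rows in the question's '%Y/%m/%dT%H:%M:%Sz' format
df = pd.DataFrame({'UTCDateTime': ['2021/03/01T12:00:00z', '2021/03/01T13:30:00z']})

AKST = 9  # hours behind UTC, as in the question
# parse the whole column in one vectorised call, then shift it - no per-row loop needed
df['UTCDateTime'] = pd.to_datetime(df['UTCDateTime'], format='%Y/%m/%dT%H:%M:%Sz') - pd.Timedelta(hours=AKST)

print(df['UTCDateTime'].iloc[0])  # 2021-03-01 03:00:00
```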
I am trying to download a number of .csv files, which I convert to pandas dataframes and append to each other.
Each csv can be accessed via a url that contains the date, so using datetime the urls can easily be generated and put in a list.
I am able to open these individually from the list.
When I try to open a number of them and append them together, I get an empty dataframe. The code looks like this:
#Imports
import datetime
import pandas as pd
#Testing can open .csv file
data = pd.read_csv('https://promo.betfair.com/betfairsp/prices/dwbfpricesukwin01022018.csv')
data.iloc[:5]
#Taking heading to use to create new dataframe
data_headings = list(data.columns.values)
#Setting up string for url
path_start = 'https://promo.betfair.com/betfairsp/prices/dwbfpricesukwin'
file = ".csv"
#Getting dates which are used in url
start = datetime.datetime.strptime("01-02-2018", "%d-%m-%Y")
end = datetime.datetime.strptime("04-02-2018", "%d-%m-%Y")
date_generated = [start + datetime.timedelta(days=x) for x in range(0, (end-start).days)]
#Creating new dataframe which is appended to
for heading in data_headings:
    data = {heading: []}
df = pd.DataFrame(data, columns=data_headings)

#Creating list of url
date_list = []
for date in date_generated:
    date_string = date.strftime("%d%m%Y")
    x = path_start + date_string + file
    date_list.append(x)

#Opening and appending csv files from list which contains url
for full_path in date_list:
    data_link = pd.read_csv(full_path)
    df.append(data_link)
print(df)
I have checked that the csv files are not just empty, and they are not. Any help would be appreciated.
Cheers,
Sandy
You are never storing the appended dataframe. The line:
df.append(data_link)
Should be
df = df.append(data_link)
However, this may be the wrong approach. You really want to use the array of URLs and concatenate them. Check out this similar question and see if it can improve your code!
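For instance, the usual pattern is to read each file into a list and call `pd.concat` once at the end, which also avoids repeatedly copying the growing dataframe (sketched here with small in-memory buffers standing in for the Betfair URLs):

```python
import io
import pandas as pd

# two tiny in-memory CSVs standing in for the daily URLs
files = [io.StringIO("a,b\n1,2\n"), io.StringIO("a,b\n3,4\n")]

# read every file into a list first, then concatenate once
frames = [pd.read_csv(f) for f in files]
df = pd.concat(frames, ignore_index=True)

print(df.shape)  # (2, 2)
```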
I really can't understand what you wanted to do here:
#Creating new dataframe which is appended to
for heading in data_headings:
    data = {heading: []}
df = pd.DataFrame(data, columns=data_headings)
By the way, try this:
for full_path in date_list:
    data_link = pd.read_csv(full_path)
    df = df.append(data_link.copy())
I have a simple *.csv file where some of the columns are dates in the format mm/dd/yy. Here is an example:
$ cat somefile.csv
05/09/15,8,Apple,05/09/15
06/10/15,5,Banana,06/10/12
05/11/18,4,Carrot,09/03/18
02/09/15,2,Apple,01/09/15
I want to easily determine whether a column only contains valid dates, but I find myself struggling with counting '/' characters and counting string lengths. Surely there is some simple way of doing this, right?
EDIT (Answer from #RahulAgarwal)
Here's my script (which still doesn't work :(( )
###########
# IMPORTS #
###########
import csv
import sys
import numpy
from dateutil.parser import parse
###########################
# [1] Open input csv file #
###########################
myfile=open("input4.csv","r")
myreader = csv.reader(myfile)
############################
# [2] read header csv file #
############################
for myline in myreader:
    myheader = myline
    break

####################################################################
# [3] read and put in ds only data originating in specific columns #
####################################################################

for myline in myreader:
    for myColIndex in range(len(myline)):
        if (parse(myline[myColIndex])):
            print("column = {0}".format(myColIndex))
######################
# [4] Close csv file #
######################
myfile.close()
You can try below to check for valid dates:
from dateutil.parser import parse
parse("05/09/15")
You can use one set to track the column numbers seen in the file and another set for the columns that failed to parse as a valid date; the difference between the two is the set of columns that did parse as dates, eg:
import csv
from datetime import datetime
with open('yourfile.csv') as fin:
    seen_columns = set()
    invalid_columns = set()
    for row in csv.reader(fin):
        for colno, col in enumerate(row, 1):
            # We've seen it contains a non-date - don't try and parse it again
            if colno in invalid_columns:
                continue
            # Make a note we've seen column N
            seen_columns.add(colno)
            # Try and see if we can parse it to the desired date format
            try:
                datetime.strptime(col, '%m/%d/%y')
            # Nope - we couldn't... not a date - so don't bother checking again
            except ValueError:
                invalid_columns.add(colno)

# Columns containing dates are those we've seen that didn't fail to parse as a date
valid_columns = seen_columns - invalid_columns
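Run against the sample data from the question (using an in-memory buffer here for the sketch), this flags columns 1 and 4 as the date columns:

```python
import csv
import io
from datetime import datetime

# the sample rows from the question, as an in-memory "file"
sample = """05/09/15,8,Apple,05/09/15
06/10/15,5,Banana,06/10/12
05/11/18,4,Carrot,09/03/18
02/09/15,2,Apple,01/09/15
"""

seen_columns, invalid_columns = set(), set()
for row in csv.reader(io.StringIO(sample)):
    for colno, col in enumerate(row, 1):
        if colno in invalid_columns:
            continue
        seen_columns.add(colno)
        try:
            datetime.strptime(col, '%m/%d/%y')
        except ValueError:
            invalid_columns.add(colno)

print(seen_columns - invalid_columns)  # {1, 4}
```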
You could use the strptime method of the datetime object:
from datetime import datetime
def isDateValid(date, pattern="%d/%m/%y"):
    try:
        datetime.strptime(date, pattern)
        return True
    except ValueError:
        return False
The strptime method raises a ValueError if the string doesn't match the pattern.
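For instance, a matching string parses to a datetime object while a non-matching one raises:

```python
from datetime import datetime

# a string that matches the pattern parses fine
ok = datetime.strptime("05/09/15", "%d/%m/%y")  # 5 September 2015

# a string that doesn't match raises ValueError
try:
    datetime.strptime("Apple", "%d/%m/%y")
    matched = True
except ValueError:
    matched = False

print(ok, matched)
```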
EDIT:
To make this work on a whole file:
from datetime import datetime
def isDateValid(date, pattern="%d/%m/%y"):
    try:
        datetime.strptime(date, pattern)
        return True
    except ValueError:
        return False

# load file
with open("filename.csv") as f:
    # split file into lines
    lines = f.readlines()

# replace new-line character
lines = [x.replace("\n", "") for x in lines]

# extract the header
header = lines[0]
# extract rows
rows = lines[1:]

# loop over every row
for rowNumber, row in enumerate(rows, 1):
    # split row into the separate columns
    columns = row.split(",")
    # setting default value for every row
    gotValidDate = False
    # loop over every column
    for column in columns:
        # check if the column got a valid date
        if isDateValid(column):
            gotValidDate = True
    # if at least one out of all columns in that row got a valid date
    # the row number gets printed
    if gotValidDate:
        print(f"Row {rowNumber} got at least one valid date")
(Code is written in Python 3.7)
filenameA = "ApptA.csv"
filenameAc = "CheckoutA.csv"

def checkouttenantA():
    global filenameA
    global filenameAc
    import csv
    import datetime
    with open(filenameA, 'r') as inp, open(filenameAc, 'a', newline="") as out:
        my_writer = csv.writer(out)
        for row in csv.reader(inp):
            my_date = datetime.date.today()
            string_date = my_date.strftime("%d/%m/%Y")
            if row[5] <= string_date:
                my_writer.writerow(row)
Dates are saved in the format %d/%m/%Y in column [5] of the csv file. I am trying to compare the dates in the file with the actual date, but it is only comparing the %d part. I assume this is because the dates are in string format.
Ok, so there are a few improvements to make as well, which I'll put as an edit to this, but the core problem is that you're converting today's date to a string with strftime() and comparing the two strings; you should instead convert the string date from the csv file to a datetime object and compare those.
I'll add plenty of comments to try and explain the code and the reasoning behind it.
# imports should go at the top
import csv
# notice we are importing datetime from datetime (we are importing the `datetime` type from the module datetime)
from datetime import datetime

# try to avoid globals where possible (they're not needed here)
def check_dates_in_csv(input_filepath):
    ''' function to load csv file and compare dates to todays date'''
    # create a list to store the rows which meet our criteria
    # appending the rows to this will make a list of lists (nested list)
    output_data = []
    # get todays date before loop to avoid calling now() every line
    # we only need this once and it'll slow the loop down calling it every row
    todays_date = datetime.now()
    # open your csv here using the function argument
    with open(input_filepath) as csv_file:
        reader = csv.reader(csv_file)
        # iterate over the rows and grab the date in each row
        for row in reader:
            string_date = row[5]
            # convert the string to a datetime object
            csv_date = datetime.strptime(string_date, '%d/%m/%Y')
            # compare the dates and append if it meets the criteria
            if csv_date <= todays_date:
                output_data.append(row)
    # function should only do one thing, compare the dates
    # save the output after
    return output_data

# then run the script here
# this comparison is basically the entry point of the python program
# this answer explains it better than I could: https://stackoverflow.com/questions/419163/what-does-if-name-main-do
if __name__ == "__main__":
    # use our new function to get the output data
    output_data = check_dates_in_csv("input_file.csv")
    # save the data here
    with open("output.csv", "w", newline="") as output_file:
        writer = csv.writer(output_file)
        writer.writerows(output_data)
I would recommend to use Pandas for such tasks:
import pandas as pd

filenameA = "ApptA.csv"
filenameAc = "CheckoutA.csv"

today = pd.Timestamp.today()
df = pd.read_csv(filenameA, parse_dates=[5])
df.loc[df.iloc[:, 5] <= today].to_csv(filenameAc, index=False)
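One caveat worth checking: since the dates are %d/%m/%Y, the default month-first parsing may swap day and month; passing dayfirst=True to read_csv, or an explicit format, avoids the ambiguity. A small sketch:

```python
import pandas as pd

# hypothetical %d/%m/%Y strings like the ones in column [5]
s = pd.Series(['05/09/2015', '06/10/2015'])

# an explicit format removes any day/month ambiguity
parsed = pd.to_datetime(s, format='%d/%m/%Y')

print(parsed.iloc[0])  # 2015-09-05 00:00:00 (day 5, month 9)
```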
So I have two csv files. One is in the following format:
last name, first name, Number
The other is in this format:
number, quiz
I want to create a new output file that takes these two csv files and gives me a file in the following format:
last name, first name, number, quiz.
I have tried the following code and it works, but only for the first person listed in the first two input files. I am not sure what I am doing wrong. Also, I do not want to assume that the two input files follow the same order.
import sys, re
import numpy as np
import smtplib
from random import randint
import csv
import math
col = sys.argv[1]
source = sys.argv[2]
target = sys.argv[3]
newtarg = sys.argv[4]
input_source = csv.DictReader(open(source))
input_target = csv.DictReader(open(target))
data = {}
t = ()
for row in input_target:
    t = row['First Name'], row['number']
    for rows in input_source:
        if rows['number'] == row['number']:
            t = t + (rows[col],)
            name = row['Last Name']
            data[name] = [t]
        rows.next()
    row.next()

with open(newtarg, 'w') as out:
    csv_out = csv.writer(out)
    for key, val in data.items():
        csv_out.writerow([key] + list(val))
This might be a job for pandas, the Python Data Analysis Library:
import pandas as pd
x1 = pd.read_csv('x1.csv')
x2 = pd.read_csv('x2.csv')
result = pd.merge(x1, x2, on='number')
result.to_csv('result.csv',
              index=False,
              columns=['Last Name', 'First Name', 'number', 'quiz'])
Reference: https://chrisalbon.com/python/pandas_join_merge_dataframe.html
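As a sketch of that merge with made-up rows (column names mirroring the question's files), note that the row order of the two inputs doesn't matter, since rows are matched on 'number':

```python
import pandas as pd

# tiny stand-ins for the two input files
x1 = pd.DataFrame({'Last Name': ['Doe', 'Roe'], 'First Name': ['Jane', 'Rick'], 'number': [7, 8]})
x2 = pd.DataFrame({'number': [8, 7], 'quiz': [88, 95]})  # deliberately in a different order

# inner join on the shared key column
result = pd.merge(x1, x2, on='number')

print(result)
```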
I think the following will work. Note: I've taken out all the stuff in the code in your question that's not being used (as you should have done before posting it). I've also hardcoded the input values for testing.
import csv
source = 'source1.csv'
target = 'target1.csv'
newtarg = 'new_output.csv'
targets = {}
with open(target) as file:
    for row in csv.DictReader(file):
        targets[row['number']] = row['quiz']

with open(source) as src, open(newtarg, 'w') as out:
    reader = csv.DictReader(src)
    writer = csv.writer(out)
    writer.writerow(reader.fieldnames + ['quiz'])  # create a header row (optional)
    for row in reader:
        row.update({'quiz': targets.get(row['Number'], 'no match')})
        writer.writerow(row.values())