First off, I apologize for the terrible title; I didn't know how to summarize my problem. Okay, so here are the first few lines of my .csv file. The first column is the timestamp. The program I'm getting this data from samples 24 times per second, so there are 24 rows that start with 15:40:15, 24 that start with 15:40:16, and so on. Instead of 24 rows with the same timestamp, I want the timestamp to increase in increments of 1/24 of a second, i.e. .042 seconds: 15:40:15.042, 15:40:15.084, etc.
Another problem is that there aren't 24 rows for the first second, because recording starts in the middle of a second. For example, there are only 13 rows starting with 15:40:14. For those, it should preferably count backwards from 15:40:15.000, subtracting .042 seconds for each row.
How can I do this in Python? Thanks in advance!
CPUtime,Displacement Into Surface,Load On Sample,Time On Sample,Raw Load,Raw Displacement
15:40:14,-990.210561,-0.000025,1.7977E+308,-115.999137,-989.210000
15:40:14,-989.810561,-0.000025,1.7977E+308,-115.999105,-988.810000
15:40:14,-989.410561,-0.000025,1.7977E+308,-115.999073,-988.410000
15:40:14,-989.010561,-0.000025,1.7977E+308,-115.999041,-988.010000
15:40:14,-988.590561,-0.000025,1.7977E+308,-115.999007,-987.590000
15:40:14,-988.170561,-0.000025,1.7977E+308,-115.998974,-987.170000
15:40:14,-987.770561,-0.000025,1.7977E+308,-115.998942,-986.770000
15:40:14,-987.310561,-0.000025,1.7977E+308,-115.998905,-986.310000
15:40:14,-986.870561,-0.000025,1.7977E+308,-115.998870,-985.870000
15:40:14,-986.430561,-0.000025,1.7977E+308,-115.998834,-985.430000
15:40:14,-985.990561,-0.000025,1.7977E+308,-115.998799,-984.990000
15:40:14,-985.570561,-0.000025,1.7977E+308,-115.998766,-984.570000
15:40:14,-985.170561,-0.000025,1.7977E+308,-115.998734,-984.170000
15:40:15,-984.730561,-0.000025,1.7977E+308,-115.998698,-983.730000
15:40:15,-984.310561,-0.000025,1.7977E+308,-115.998665,-983.310000
15:40:15,-983.890561,-0.000025,1.7977E+308,-115.998631,-982.890000
15:40:15,-983.490561,-0.000025,1.7977E+308,-115.998599,-982.490000
15:40:15,-983.090561,-0.000025,1.7977E+308,-115.998567,-982.090000
I'd add to @robert king's answer that you could use itertools.groupby() to group rows with the same timestamp:
import csv
import shutil
from itertools import groupby
n = 24
time_increment = 1./n
fractions = [("%.3f" % (i*time_increment,)).lstrip('0') for i in xrange(n)]
with open('input.csv', 'rb') as f, open('output.csv', 'wb') as fout:
    writer = csv.writer(fout)
    # assume the file is sorted by timestamp
    for timestamp, group in groupby(csv.reader(f), key=lambda row: row[0]):
        sametime = list(group)  # all rows that have the same timestamp
        assert n >= len(sametime)
        for i, row in enumerate(sametime, start=n-len(sametime)):
            row[0] += fractions[i]  # append fractions of a second
        writer.writerows(sametime)
shutil.move('output.csv', 'input.csv') # update input file
The 'b' file mode is mandatory for the csv module in Python 2; otherwise entries that span several physical lines won't be handled correctly.
If there are fewer than n entries with the same timestamp, the code assumes they are consecutive values from the end of a second (this is what the start=n-len(sametime) offset implements).
Open the csv file and create a csv reader as per http://docs.python.org/library/csv.html.
Also create a csv writer as per http://docs.python.org/library/csv.html.
Now loop through each row of the file. On each row, modify the timestamp and then write it to your new csv file.
If you want the new csv file to replace the old one, use shutil (http://docs.python.org/library/shutil.html) at the end to replace it.
I recommend that inside your loop you keep a variable called "current_timestamp" and a variable called "current_increment". If the timestamp in the row is equal to current_timestamp, simply add the increment; otherwise update them both appropriately, as sketched below.
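A minimal Python 3 sketch of that loop, assuming the 24 Hz sample rate from the question and an input file named input.csv (the file name is an assumption):

import csv
import shutil

INCREMENT = 1.0 / 24  # seconds between samples (assumed 24 Hz rate)

with open('input.csv', newline='') as f, \
        open('output.csv', 'w', newline='') as fout:
    reader = csv.reader(f)
    writer = csv.writer(fout)
    writer.writerow(next(reader))            # copy the header row unchanged
    current_timestamp = None
    current_increment = 0.0
    for row in reader:
        if row[0] == current_timestamp:
            current_increment += INCREMENT   # same second: step forward
        else:
            current_timestamp = row[0]       # new second: reset the offset
            current_increment = 0.0
        # append the fraction, e.g. "15:40:15" -> "15:40:15.042"
        row[0] = current_timestamp + ("%.3f" % current_increment).lstrip('0')
        writer.writerow(row)

shutil.move('output.csv', 'input.csv')       # replace the original file

Note this counts forward from .000 within each second; the partial first second would still need the backward counting that the start offset in the groupby answer above provides.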
I'm new to the Python language and I'm facing a small challenge that I haven't been able to figure out so far.
I receive a csv file with around 30-40 columns and 5-50 rows, with various details in each cell. The 1st row of the csv has the title for each column, and from the 2nd row onwards I have item values.
What I want to do is create a Python script which will read the csv file and do the following every time:
Add a row after the actual 1st item row (literally after the 2nd row, because the 1st row is titles). That new 3rd row should contain the same information as the one above it, with one difference only: in the column "item_subtotal" I want to add the value from the column "discount_total".
All the rows below should remain as they are, and the modified csv should be saved as a new file with the word "edited" added to the file name.
I could really use some help, because so far I've only managed to open the csv file with the Python script I'm developing; I'm not able yet to copy the contents of the row above into that newly created row and replace that specific value.
Looking forward to any help.
Thank you
Here I'm attaching the CSV with some values changed for privacy reasons.
order_id,order_number,date,status,shipping_total,shipping_tax_total,fee_total,fee_tax_total,tax_total,discount_total,order_total,refunded_total,order_currency,payment_method,shipping_method,customer_id,billing_first_name,billing_last_name,billing_company,billing_email,billing_phone,billing_address_1,billing_address_2,billing_postcode,billing_city,billing_state,billing_country,shipping_first_name,shipping_last_name,shipping_address_1,shipping_address_2,shipping_postcode,shipping_city,shipping_state,shipping_country,shipping_company,customer_note,item_id,item_product_id,item_name,item_sku,item_quantity,item_subtotal,item_subtotal_tax,item_total,item_total_tax,item_refunded,item_refunded_qty,item_meta,shipping_items,fee_items,tax_items,coupon_items,order_notes,download_permissions_granted,admin_custom_order_field:customer_type_5
15001_TEST_2,,"2017-10-09 18:53:12",processing,0,0.00,0.00,0.00,5.36,7.06,33.60,0.00,EUR,PayoneCw_PayPal,"0,00",0,name,surname,,name.surname#gmail.com,0123456789,"address 1",,41541_TEST,location,,DE,name,surname,address,01245212,14521,location,,DE,,,1328,302,"product title",103,1,35.29,6.71,28.24,5.36,0.00,0,,"id:1329|method_id:free_shipping:3|method_title:0,00|total:0.00",,id:1330|rate_id:1|code:DE-MWST-1|title:MwSt|total:5.36|compound:,"id:1331|code:#getgreengent|amount:7.06|description:Launchcoupon for friends","text string",1,
You can also use pandas to manipulate the data from the csv like this:
import pandas
import copy
Read the csv file into a pandas dataframe:
df = pandas.read_csv(filename)
Make a deepcopy of the first row of data and add the discount total to the item subtotal:
new_row = copy.deepcopy(df.loc[0])
new_row['item_subtotal'] += new_row['discount_total']
Concatenate the first row with the new row and then everything after that:
df = pandas.concat([df.loc[:0], new_row.to_frame().T, df.loc[1:]], ignore_index=True)
Change the filename and write out the new csv file:
filename = filename[:-4] + '_edited.csv'
df.to_csv(filename, index=False)
I hope this helps! Pandas is great for cleanly handling massive amounts of data, but may be overkill for what you are trying to do. Then again, maybe not. It would help to see an example data file.
The first step is to turn that .csv into something that is a little easier to work with. Fortunately, python has the 'csv' module which makes it easy to turn your .csv file into a much nicer list of lists. The below will give you a way to both turn your .csv into a list of lists and turn the modified data back into a .csv file.
import csv
import copy
def csv2list(ifile):
    """
    ifile = the path of the csv to be converted into a list of lists
    """
    olist = []
    with open(ifile, 'rb') as f:
        for line in csv.reader(f, dialect='excel'):
            olist.append(line)  # collect each row as a list of cells
    return olist

#------------------------------------------------------------------------------
def list2csv(ilist, ofile):
    """
    ilist = the list of lists to be converted
    ofile = the output path for your csv file
    """
    with open(ofile, 'wb') as csvfile:
        csvwriter = csv.writer(csvfile, delimiter=',',
                               quotechar='|', quoting=csv.QUOTE_MINIMAL)
        for row in ilist:
            csvwriter.writerow(row)
Now, you can simply copy ilist[1] (the first item row, since ilist[0] holds the titles) and change the appropriate element to reflect your summed value:
listTemp = copy.deepcopy(ilist[1])
listTemp[n] = listTemp[n] + listTemp[n-x]  # here n is the index of item_subtotal and n-x that of discount_total
ilist.insert(2, listTemp)
As for how to change the file name, just use:
import os
newFileName = os.path.splitext(oldFileName)[0] + "edited" + os.path.splitext(oldFileName)[1]
Hopefully this will help you out!
So, I've seen this done in other questions asked here, but I'm still a little confused. I've been learning Python 3 for the last few days and figured I'd start working on a project to really get my hands dirty. I need to loop through a certain number of CSV files and make edits to those files. I'm having trouble with going to a specific column, and also with for loops in Python in general. I'm used to the convention (int i = 0; i < expression; i++), but in Python it's a little different. Here's my code so far, and I'll explain where my issue is.
import os
import csv
pathName = os.getcwd()
numFiles = []
fileNames = os.listdir(pathName)
for fileNames in fileNames:
    if fileNames.endswith(".csv"):
        numFiles.append(fileNames)
for i in numFiles:
    file = open(os.path.join(pathName, i), "rU")
    reader = csv.reader(file, delimiter=',')
    for column in reader:
        print(column[4])
My issue falls on this line:
for column in reader:
    print(column[4])
So in the Docs it says column is the variable and reader is what I'm looping through. But when I write 4 I get this error:
IndexError: list index out of range
What does this mean? If I write 0 instead of 4 it prints out all of the values in column 0 cell 0 of each CSV file. I basically need it to go through the first row of each CSV file and find a specific value and then go through that entire column. Thanks in advance!
It could be that you don't have 5 columns in your .csv file.
Python is zero-indexed, which means it starts counting at 0, so the first column would be column[0] and the second would be column[1].
Also you may want to change your
for column in reader:
to
for row in reader:
because reader iterates through the rows, not the columns.
This code loops through each row and then each column in that row allowing you to view the contents of each cell.
for i in numFiles:
    file = open(os.path.join(pathName, i), "rU")
    reader = csv.reader(file, delimiter=',')
    for row in reader:
        for column in row:
            print(column)
            if column == "SPECIFIC VALUE":
                pass  # do stuff
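Since the stated goal is to find a specific value in the first row (the headers) and then walk that entire column, here is a minimal Python 3 sketch of that approach; the file name example.csv and the header text "SPECIFIC VALUE" are placeholders:

import csv

with open('example.csv', newline='') as f:    # placeholder file name
    reader = csv.reader(f, delimiter=',')
    header = next(reader)                     # first row holds the column names
    col = header.index("SPECIFIC VALUE")      # position of the wanted column
    for row in reader:
        if len(row) > col:                    # guard against short rows
            print(row[col])                   # every cell in that column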
Welcome to Python! I suggest you print some debugging messages.
You could add this to your printing loop:
for row in reader:
    try:
        print(row[4])
    except IndexError:
        print("ERROR: %s in file %s doesn't contain 5 columns" % (row, i))
This will print the bad lines (as lists, because that is how csv.reader represents them) so you can fix the CSV files.
Some notes:
It is common to use snake_case in Python, not camelCase.
Name your variables appropriately (csv_filename instead of i, row instead of column, etc.).
Use the with statement to handle files, as in the sketch below.
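For example, the question's loop could be rewritten along these lines (a sketch only, applying the notes above):

import csv
import os

path_name = os.getcwd()
csv_filenames = [f for f in os.listdir(path_name) if f.endswith(".csv")]

for csv_filename in csv_filenames:
    with open(os.path.join(path_name, csv_filename), newline='') as csv_file:
        for row in csv.reader(csv_file, delimiter=','):
            try:
                print(row[4])
            except IndexError:
                print("ERROR: %s in file %s doesn't contain 5 columns"
                      % (row, csv_filename))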
Enjoy!
I have two csv files. I am trying to look up each value from the first column of one file (file 1) in the first column of the other file (file 2). If they match, then print the row from file 2.
Pseudo code:
read file1.csv
read file2.csv
loop through file1
    compare each row with each row of file 2 in turn
    if file1[0] == file2[0]:
        print row of file 2
file1:
45,John
46,Fred
47,Bill
File2:
46,Roger
48,Pete
49,Bob
I want it to print :
46 Roger
EDIT - these are examples, the actual file is much bigger (5,000 rows, 7 columns)
I have the following:
import csv
with open('csvfile1.csv', 'rt') as csvfile1, open('csvfile2.csv', 'rt') as csvfile2:
    csv1reader = csv.reader(csvfile1)
    csv2reader = csv.reader(csvfile2)
    for rowcsv1 in csv1reader:
        for rowcsv2 in csv2reader:
            if rowcsv1[0] == rowcsv2[0]:
                print(rowcsv1)
However I am getting no output.
I am aware there are other ways of doing it (with dict, pandas), but I am keen to know why my approach is not working.
EDIT: I now see that it is only iterating through the first row of file 1 and then closing, but I am unclear how to stop it closing (I also understand that this is not the best way to do it).
You open csv2reader = csv.reader(csvfile2), then iterate through it against the first row of csv1reader - by then it has reached end of file and will not produce any more data.
So for the second through last rows of csv1reader you are comparing against an empty sequence, i.e. no comparison takes place.
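A quick way to see the exhaustion for yourself (a minimal sketch, assuming csvfile2.csv exists):

import csv

with open('csvfile2.csv', 'rt') as f:
    reader = csv.reader(f)
    print(list(reader))  # prints every row
    print(list(reader))  # prints [] -- the reader is already exhausted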
In any case, this is a very inefficient method; especially as the files grow, it would be much better to do
import csv

# load second file as lookup table
data = {}
with open("csv2file.csv") as inf2:
    for row in csv.reader(inf2):
        data[row[0]] = row

# now process first file against it
with open("csv1file.csv") as inf1:
    for row in csv.reader(inf1):
        if row[0] in data:
            print(data[row[0]])
See Hugh Bothwell's answer for why your code isn't working. For a fast way of doing what you stated you want to do in your question, try this:
import csv
with open('csvfile1.csv', 'rt') as csvfile1, open('csvfile2.csv', 'rt') as csvfile2:
    csv1 = list(csv.reader(csvfile1))
    csv2 = list(csv.reader(csvfile2))
    duplicates = {a[0] for a in csv1} & {a[0] for a in csv2}
    for row in csv2:
        if row[0] in duplicates:
            print(row)
It gets the duplicate numbers from the two csv files, then loops through the second csv file, printing the row if the number at index 0 is in the first csv file. This is a much faster algorithm than what you were attempting.
If order matters, as @hugh-bothwell mentioned in @will-da-silva's answer, you could do:
import csv
from collections import OrderedDict
with open('csvfile1.csv', 'rt') as csvfile1, open('csvfile2.csv', 'rt') as csvfile2:
    csv1 = list(csv.reader(csvfile1))
    csv2 = list(csv.reader(csvfile2))
    d = {row[0]: row for row in csv2}
    keys = OrderedDict.fromkeys([a[0] for a in csv1]).keys()
    duplicate_keys = [key for key in keys if key in d]
    for key in duplicate_keys:
        print(d[key])
I'm pretty sure there's a better way to do this, but try out this solution. Reading file 2 into a list up front means the inner loop can run once per row of file 1 (see Hugh Bothwell's answer for why the original version produced nothing):
import csv

with open('csvfile1.csv', 'rt') as csvfile1, open('csvfile2.csv', 'rt') as csvfile2:
    csv1reader = csv.reader(csvfile1)
    csv2rows = list(csv.reader(csvfile2))  # materialize so it can be scanned repeatedly
    for rowcsv1 in csv1reader:
        for rowcsv2 in csv2rows:
            if rowcsv1[0] == rowcsv2[0]:
                print(rowcsv2)
Hello, I'm really new here as well as in the world of Python.
I have some (~1000) .csv files, each including ~1,800,000 rows of information. The files are in the following form:
5302730,131841,-0.29999999999999999,NULL,2013-12-31 22:00:46.773
5303072,188420,28.199999999999999,NULL,2013-12-31 22:27:46.863
5350066,131841,0.29999999999999999,NULL,2014-01-01 00:37:21.023
5385220,-268368577,4.5,NULL,2014-01-01 03:12:14.163
5305752,-268368587,5.1900000000000004,NULL,2014-01-01 03:11:55.207
So, I would like, for all of the files:
(1) to remove the 4th (NULL) column
(2) to keep in every file only certain rows (depending on the value of the first column, e.g. 5302730: keep only the rows containing that value)
I don't know if this is even possible, so any answer is appreciated!
Thanks in advance.
Have a look at the csv module.
You can use the csv.reader function to generate an iterator of lines, with each line's cells as a list.
import csv

for line in csv.reader(open("filename.csv")):
    # Remove the 4th column; remember Python starts counting at 0
    line = line[:3] + line[4:]
    if line[0] == "thevalueforthefirstcolumn":
        dosomethingwith(line)
If you wish to do this sort of operation on CSV files more than once, with different parameters for which column to skip, which column to use as the key, and what to filter on, you can use something like this:
import csv

def read_csv(filename, column_to_skip=None, key_column=0, key_filter=None):
    data_from_csv = []
    with open(filename) as csvfile:
        csv_reader = csv.reader(csvfile)
        for row in csv_reader:
            # Skip data in specific column
            if column_to_skip is not None:
                del row[column_to_skip]
            # Filter out rows where the key doesn't match
            if key_filter is not None:
                key = row[key_column]
                if key_filter != key:
                    continue
            data_from_csv.append(row)
    return data_from_csv

def write_csv(filename, data_to_write):
    # newline='' avoids blank rows on Windows when writing csv in Python 3
    with open(filename, 'w', newline='') as csvfile:
        csv_writer = csv.writer(csvfile)
        for row in data_to_write:
            csv_writer.writerow(row)

data = read_csv('data.csv', column_to_skip=3, key_filter='5302730')
write_csv('data2.csv', data)
I am writing a script (to be run just once) where I read data from an Excel file. For that data I create an id based on the date and time. I have one missing variable, which is contained in a txt file. The txt file also has date and time, from which I create an id.
Now I would like to link the data from the Excel file and the txt file based on the id. Right now I am building two lists from the txt file: one containing the id and the other containing the value I need. Then I get the index from the id list where the id is the same in both data sets, using the enumerate function, and I use that index to get the value from the value list. The code looks something like this:
datelist = []
valuelist = []
txtfile = open(folder + os.sep + "Textfile.txt", "r")
ILines = txtfile.readlines()
for i, row in enumerate(ILines):
    datelist.append(row.split(",")[1])
    valuelist.append(row.split(",")[2])

rows = myexceldata
for row in rows:
    x = row[id]
    row = row + valuelist[[i for i, e in enumerate(datelist) if e == x][0]]
However, that takes ages, and I wonder if there is a better way to do that.
The files look like this:
Excelfile:
Date Time Var1 Var2
03.02.2016 12:53:24 10 27
03.02.2016 12:53:25 10 27
03.02.2016 12:53:26 10 27
Textfile:
Date Time Var3
03.02.2016 12:53:24 16
03.02.2016 12:53:25 20
Result:
Date Time Var1 Var2 Var3
03.02.2016 12:53:24 10 27 16
03.02.2016 12:53:25 10 27 20
03.02.2016 12:53:26 10 27 *)
*) It would be perfect if this were the same value as above, but empty would be OK, too.
OK, I forgot one important thing, sorry about that: not all times of the Excel file are in the text file. The best option would be to get Var3 from the time in the text file just before the time of the Excel file, but it would also be an option to leave it blank then.
If both of your files are sorted in time order then the following kind of approach would be fast:
from heapq import merge
from itertools import groupby, chain
import csv
with open('excel.txt', 'rb') as f_excel, open('textfile.txt', 'rb') as f_text, open('output.txt', 'wb') as f_output:
    csv_excel = csv.reader(f_excel)
    csv_text = csv.reader(f_text)
    csv_output = csv.writer(f_output)

    header_excel = next(csv_excel)
    header_text = next(csv_text)
    csv_output.writerow(header_excel + [header_text[-1]])

    for k, g in groupby(merge(csv_text, csv_excel), key=lambda x: x[0:2]):
        csv_output.writerow(k + list(chain.from_iterable(cols[2:] for cols in g)))
This assumes your two input files are both in csv format, and works as follows:
Create csv readers/writers for all of the files. This allows the files to automatically be read in as lists of columns without requiring each line to be split.
Extract the headers from both of the files and write a combined form to the output.
Take the two input files and pass them to merge. This returns a row at a time from either input file in order.
Pass this to groupby to group rows with the same date and time together. This returns a key and a group, where the key is the date and time that matched, and the group is an iterable of the matching rows.
For each grouped entry, write the key and columns 2 onwards from each row to the output file. chain is used to produce a flat list.
This would give you an output file as follows:
Date,Time,Var1,Var2,Var3
03.02.2016,12:53:24,10,27,16
03.02.2016,12:53:25,10,27,20
As you already have the Excel data in memory, it would need to be passed to merge as a list of rows/cols instead of csv_excel.
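For the *) case in the question, where a time exists only in the Excel data, a small post-processing pass could carry the last seen Var3 forward. This is a sketch only, under the assumption that, as in the output above, matched rows have five columns and Excel-only rows four:

def fill_var3(rows):
    """Carry the most recent Var3 forward into rows that lack one."""
    last_var3 = ''
    for row in rows:
        if len(row) >= 5:            # row already carries a Var3 value
            last_var3 = row[4]
        else:                        # Excel-only time: reuse the previous Var3
            row = row + [last_var3]
        yield row

You would then write out fill_var3(merged_rows) instead of writing each merged row directly.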