Changing date values in a .csv using a for loop - python

I have a CSV file in which I need to change the date value in each row. The date to be changed appears in the exact same column in every row of the CSV.
import csv

firstfile = open('example.csv', "r")
firstReader = csv.reader(firstfile, delimiter='|')
firstData = list(firstReader)
DateToChange = firstData[1][25]
ChangedDate = '2018-09-30'
for row in firstReader:
    for column in row:
        print(column)
        if column == DateToChange:
            # Change the date
            outputFile = open("output.csv", "w")
            outputFile.writelines(firstfile)
            outputFile.close()
I am trying to grab a date already stored in the CSV, change it using a for loop, and then write the original file back out with the changed dates. However, the code above doesn't seem to do anything at all. I am new to Python, so I might not be understanding how to use a for loop correctly.
Any help is greatly appreciated!

When you call list(firstReader), you read all of the CSV data into the firstData list. When you later call for row in firstReader:, the reader is already exhausted, so nothing is looped over. Instead, change it to for row in firstData:.
Also, when writing to the file, you are trying to write firstfile (the input file object) rather than the altered row. I'll leave you to figure out how to update the date in the row, but after that you'll need to give the file a string to write. Since your data is pipe-delimited, that string should be '|'.join(row) plus a newline, so outputFile.write('|'.join(row) + '\n').
Finally, you should open your output file once, not on every pass through the loop. Move the open call above the loop and the close call after it. Then, when you have a moment, search for 'python context manager open file' for a better way to manage the open file.
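Put together, those three fixes might look like the sketch below; a made-up two-column pipe-delimited file stands in for the real 26-plus-column example.csv:

```python
import csv

# Small made-up file standing in for example.csv (the real one has 26+ columns).
with open('example.csv', 'w', newline='') as f:
    csv.writer(f, delimiter='|').writerows([
        ['id', 'date'],
        ['1', '2018-01-15'],
        ['2', '2018-01-15'],
    ])

# Read every row into a list once; after this the reader is exhausted,
# so all later loops go over firstData, not the reader.
with open('example.csv', 'r', newline='') as firstfile:
    firstData = list(csv.reader(firstfile, delimiter='|'))

DateToChange = firstData[1][1]  # column 25 in the real file
ChangedDate = '2018-09-30'

# Open the output file once, via a context manager, and write each altered row.
with open('output.csv', 'w', newline='') as outputFile:
    writer = csv.writer(outputFile, delimiter='|')
    for row in firstData:
        writer.writerow([ChangedDate if col == DateToChange else col for col in row])
```

Using csv.writer here instead of a manual '|'.join keeps quoting correct if any field ever contains the delimiter.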

You could use pandas and NumPy. Here I create a DataFrame from scratch, but you could load it directly from a .csv:

import pandas as pd
import numpy as np

date_df = pd.DataFrame({
    'col1': ['12', '14', '14', '3412', '2'],
    'col2': ['2018-09-30', '2018-09-14', '2018-09-01', '2018-09-30', '2018-12-01'],
})

date_to_change = '2018-09-30'
replacement_date = '2018-10-01'
date_df['col2'] = np.where(date_df['col2'] == date_to_change, replacement_date, date_df['col2'])

Related

What is going on with this CSV file?

I have a csv file that is for some reason creating unnecessary Unnamed columns any time I do something.
The first two lines of said CSV file before anything is done to it:
Date,Type,Action,Symbol,Instrument Type,Description,Value,Quantity,Average Price,Commissions,Fees,Multiplier,Underlying Symbol,Expiration Date,Strike Price,Call or Put
2020-04-29T06:37:14-0400,Receive Deliver,BUY_TO_OPEN,USO1 201016C00014500,Equity Option,Reverse split: Open 2 USO1 201016C00014500,-10,2,-5,,0,100,USO,10/16/2020,14.5,CALL
Note how there are no extraneous commas. When I read the file as is, it seems to have Date as the index, but I can't reach it: indexing with ['Date'], with [0], and using .index all give errors. So I created a new column and made it the index, and all seemed well. I then ran a function that converts the Date values to timestamps and reverses the index. Yet when I try to reset the index with drop=True or inplace=True or anything else I can think of, I get an unnamed column set as the index, with my index column pushed to the end.
,Date,Type,Action,Symbol,Instrument Type,Description,Value,Quantity,Average Price,Commissions,Fees,Multiplier,Underlying Symbol,Expiration Date,Strike Price,Call or Put,ind
0,2019-12-10 17:00:00,Money Movement,,,,ACH DEPOSIT,"1,500.00",0,,,0.0,,,,,,709
This is the current snippet I have:
import pandas as pd

# set csv file as constant
TRADER_READER = pd.read_csv('TastyTrades.csv')
TRADER_READER['ind'] = TRADER_READER.index

# change date format, make date into timestamp object, set date as index, write changes to csv file
def clean_date():
    TRADER_READER['Date'] = TRADER_READER['Date'].replace({'T': ' ', '-0500': '', '-0400': ''}, regex=True)
    TRADER_READER['Date'] = pd.to_datetime(TRADER_READER['Date'], format="%Y-%m-%d %H:%M:%S")
    reversed_index = TRADER_READER.iloc[::-1].reset_index(drop=True)
    reversed_index.to_csv('TastyTrades.csv')
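The thread does not include an answer, but the symptom described (an unnamed index column appearing after every round trip) usually comes from to_csv writing the index back out by default. A minimal sketch of the usual fix, using a made-up two-row frame in place of TastyTrades.csv:

```python
import pandas as pd

# Hypothetical two-row frame standing in for TastyTrades.csv.
df = pd.DataFrame({'Date': ['2019-12-10 17:00:00', '2020-04-29 06:37:14'],
                   'Value': [1500.00, -10.0]})

# to_csv writes the index as an unnamed leading column by default, which
# read_csv then surfaces as 'Unnamed: 0'. Either suppress it on write
# (index=False, below) or absorb it on read (pd.read_csv(..., index_col=0)).
df.to_csv('TastyTrades_clean.csv', index=False)
df2 = pd.read_csv('TastyTrades_clean.csv')
```

After this round trip df2 has only the original Date and Value columns, with no Unnamed column.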

Turning a Pandas column into a text file separated by line breaks

I would like to create a txt file where every line is a so-called "ticker symbol" (= the symbol for a stock). As a first step, I downloaded all the tickers I want via a Wikipedia API:
import pandas as pd
import wikipedia as wp
html1 = wp.page("List of S&P 500 companies").html().encode("UTF-8")
df = pd.read_html(html1,header =0)[0]
df = df.drop(['SEC filings','CIK', 'Headquarters Location', 'Date first added', 'Founded'], axis = 1)
df.columns = df.columns.str.replace('Symbol', 'Ticker')
Secondly, I would like to create a txt file as mentioned above with all the ticker names from the "Ticker" column of df. To do so, I probably have to do something similar to:
f = open("tickertest.txt","w+")
f.write("MMM\nABT\n...etc.")
f.close()
Now my problem: does anybody know how to bring the Ticker column of df into one big string where there is a \n between every ticker, i.e. every ticker is on a new line?
You can use to_csv for this.
df.to_csv("test.txt", columns=["Ticker"], header=False, index=False)
This provides flexibility to include other columns, column names, and index values at some future point (should you need to do some sleuthing, or in case your boss asks for more information). You can even change the separator. This would be a simple modification (obvious changes, e.g.):
df.to_csv("test.txt", columns=["Ticker", "Symbol",], header=True, index=True, sep="\t")
I think the benefit of this method over jfaccioni's answer is flexibility and ease of adaptability. It also gets you away from explicitly opening a file. However, if you still want to open a file explicitly, you should consider using with, which automatically closes the file when you leave the current indentation, e.g.:

with open("test.txt", "w") as fid:
    fid.write("MMM\nABT\n...etc.")
This should do the trick:
'\n'.join(df['Ticker'].astype(str).values)
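The join-based answer can be made end-to-end runnable; here a made-up three-ticker frame stands in for the scraped S&P 500 table:

```python
import pandas as pd

# Made-up frame standing in for the scraped Wikipedia table.
df = pd.DataFrame({'Ticker': ['MMM', 'ABT', 'ABBV']})

# Join the column into one newline-separated string and write it out.
with open('tickertest.txt', 'w') as f:
    f.write('\n'.join(df['Ticker'].astype(str)))
```

This writes one ticker per line with no trailing newline; add + '\n' if a final line break is wanted.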

Copy the first column of a .csv file into a new file

I know this is a very easy task, but I am stuck right now and don't get it solved. I need to copy the first column of a .csv file, including the header, into a newly created file. My code:
import csv
import pandas as pd

station = 'SD_01'
df = pd.read_csv(str(station) + "_ED.csv", delimiter=';')
list1 = []
matrix1 = df[df.columns[0]].as_matrix()
list1 = matrix1.tolist()
with open('{0}_RRS.csv'.format(station), "r+") as f:
    writer = csv.writer(f)
    writer.writerows(map(lambda x: [x], list1))
As a result, my file has an empty line between the values, has no header (I could continue without the header, though), and something at the bottom which I cannot identify.
>350
>
>351
>
>352
>
>...
>
>949
>
>950
>
>Ž‘’“”•–—˜™š›œžŸ ¡¢
Just a short impression of the 1200+ lines.
I am pretty sure that this is a very clunky way to do this; easier ways are always welcome.
How do I get rid of all the empty lines and the crazy stuff at the end?
When you get a column from a DataFrame, it's returned as type Series, and the Series has a built-in to_csv method you can use. So you don't need to do any matrix casting or anything like that.

import pandas as pd

df = pd.read_csv('name.csv', delimiter=';')
first_column = df[df.columns[0]]
first_column.to_csv('new_file.csv')
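A self-contained version of that answer, with a small made-up frame in place of the SD_01_ED.csv station file:

```python
import pandas as pd

# Made-up frame standing in for pd.read_csv('SD_01_ED.csv', delimiter=';').
df = pd.DataFrame({'wavelength': [350, 351, 352], 'value': [1, 2, 3]})

# A single column is a Series; its to_csv writes one value per line,
# with no blank lines in between, and keeps the header when asked.
df[df.columns[0]].to_csv('SD_01_RRS.csv', index=False, header=True)
```

Writing through pandas (mode 'w', not 'r+') also avoids the leftover bytes at the end of the file that appear when overwriting a longer existing file in place.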

python: How to clean the csv file

I am a beginner Python user and would like to clean a CSV file for analysis purposes. However, I am facing a problem with the code.
from csv import reader

def open_dataset(file_name):
    opened_file = open(file_name)
    read_file = reader(opened_file, delimiter=",")
    data = list(read_file)
    return data

def column(filename):
    filename = open_dataset(filename)
    for row in filename:
        print(row)

With the code above, the output looks like:
['Name;Product;Sales;Country;Website']
[';Milk;30;Germany;something.com']
[';;;USA;']
['Chris;Milk;40;;']
I would like to have the output following:
['Name','Product','Sales','Country','Website']
[NaN,'Milk','30','Germany','something.com']
[NaN,NaN,NaN,'USA',NaN]
['Chris','Milk',40,NaN,NaN]
I defined the delimiter as ",", but ";" is still used; I don't know why that is happening. Also, when I tried to replace the empty values with "NaN", every cell ended up replaced by "NaN".
I would really appreciate it if someone could give me tips.
After all, I would like to analyse each column, e.g. the percentage of "NaN" values.
Thank you!
You can get the result that you want by:
- specifying ';' as the delimiter when constructing a reader object
- passing each row through a function that converts empty cells to 'NaN' (or some other value of your choice)
Here is some sample code:

import csv

def row_factory(row):
    return [x if x != '' else 'NaN' for x in row]

with open(filename, 'r', newline='') as f:
    reader = csv.reader(f, delimiter=';')
    for row in reader:
        print(row_factory(row))
Output:
['Name', 'Product', 'Sales', 'Country', 'Website']
['NaN', 'Milk', '30', 'Germany', 'something.com']
['NaN', 'NaN', 'NaN', 'USA', 'NaN']
['Chris', 'Milk', '40', 'NaN', 'NaN']
You need to specify the correct delimiter:
read_file = reader(opened_file, delimiter=";")
Your CSV file appears to be using a semicolon rather than a comma, so you need to tell reader() what to use.
Tip:
filename = open_dataset(filename)
Don't reassign a variable to mean something else. Before this line executes, filename is a string with the name of the file to open. After this assignment, filename is now a list of rows from the file. Use a different variable name instead:
rows = open_dataset(filename)
Now the two variables are distinct and their meaning is clear from their names. Of course, feel free to use something other than rows; the point is not to reuse filename.
You might want to look into using pandas. It can make data processing a lot easier, up to and including reading many file formats. If you want to read a CSV:

import pandas as pd

my_file = '/pat/to/my_csv.csv'
pd.read_csv(my_file)
That's because your lists contain only one element, and that element is a single string; to parse the string into a list, you can split it.
This should do what you need:

for row in filename:
    parsed_row = row[0].split(';')
    for i in range(len(parsed_row)):
        if parsed_row[i] == '':
            parsed_row[i] = None
    print(parsed_row)
I made you a Repl.it example

how to edit a csv in python and add one row after the 2nd row that will have the same values in all columns except 1

I'm new to the Python language and I'm facing a small challenge that I haven't been able to figure out so far.
I receive a CSV file with around 30-40 columns and 5-50 rows, with various details in each cell. The first row of the CSV holds the title of each column, and from the second row onward I have item values.
What I want to do is create a Python script which will read the CSV file and every time do the following:
Add a row after the actual first item row (literally after the second row, since the first row is titles), and have that new third row contain the same information as the one above it, with one difference only: in the column "item_subtotal" I want to add the value from the column "discount_total".
All the rows below should remain as they are, and the modified CSV should be saved as a new file with the word "edited" added to the file name.
I could really use some help, because so far I've only managed to open the CSV file with the Python script I'm developing, but I'm not able to add the contents of the row above to the newly created row and replace that specific value.
Looking forward to any help.
Thank you!
Here I'm attaching the CSV with some values changed for privacy reasons.
order_id,order_number,date,status,shipping_total,shipping_tax_total,fee_total,fee_tax_total,tax_total,discount_total,order_total,refunded_total,order_currency,payment_method,shipping_method,customer_id,billing_first_name,billing_last_name,billing_company,billing_email,billing_phone,billing_address_1,billing_address_2,billing_postcode,billing_city,billing_state,billing_country,shipping_first_name,shipping_last_name,shipping_address_1,shipping_address_2,shipping_postcode,shipping_city,shipping_state,shipping_country,shipping_company,customer_note,item_id,item_product_id,item_name,item_sku,item_quantity,item_subtotal,item_subtotal_tax,item_total,item_total_tax,item_refunded,item_refunded_qty,item_meta,shipping_items,fee_items,tax_items,coupon_items,order_notes,download_permissions_granted,admin_custom_order_field:customer_type_5
15001_TEST_2,,"2017-10-09 18:53:12",processing,0,0.00,0.00,0.00,5.36,7.06,33.60,0.00,EUR,PayoneCw_PayPal,"0,00",0,name,surname,,name.surname#gmail.com,0123456789,"address 1",,41541_TEST,location,,DE,name,surname,address,01245212,14521,location,,DE,,,1328,302,"product title",103,1,35.29,6.71,28.24,5.36,0.00,0,,"id:1329|method_id:free_shipping:3|method_title:0,00|total:0.00",,id:1330|rate_id:1|code:DE-MWST-1|title:MwSt|total:5.36|compound:,"id:1331|code:#getgreengent|amount:7.06|description:Launchcoupon for friends","text string",1,
You can also use pandas to manipulate the data from the csv like this:

import pandas
import copy

Read the csv file into a pandas dataframe (the header row becomes the column names, so the first item row is index 0):

df = pandas.read_csv(filename)

Make a deep copy of the first item row and add the discount total to the item subtotal (note the header spells the column discount_total):

new_row = copy.deepcopy(df.loc[0])
new_row['item_subtotal'] += new_row['discount_total']

Insert the copy directly after the first item row and renumber the index:

df = pandas.concat([df.loc[:0], new_row.to_frame().T, df.loc[1:]], ignore_index=True)

Change the filename and write out the new csv file (str.strip removes characters, not a suffix, so slice the extension off instead):

filename = filename[:-len('.csv')] + '_edited.csv'
df.to_csv(filename, index=False)
I hope this helps! Pandas is great for cleanly handling massive amounts of data, but may be overkill for what you are trying to do. Then again, maybe not. It would help to see an example data file.
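The steps above can be assembled into one runnable sketch. A made-up three-column frame stands in for the real ~56-column file; with the header row as column names, the first item row is df.loc[0], and the header spells the column discount_total:

```python
import pandas as pd

# Made-up frame standing in for pd.read_csv on the real order file.
df = pd.DataFrame({
    'order_id': ['15001_TEST_2'],
    'discount_total': [7.06],
    'item_subtotal': [35.29],
})

# Copy the first item row and fold the discount into its subtotal.
new_row = df.loc[0].copy()
new_row['item_subtotal'] += new_row['discount_total']

# Re-insert the copy directly after the original row.
df = pd.concat([df.loc[:0], new_row.to_frame().T, df.loc[1:]], ignore_index=True)
```

The frame now has two rows: the original (subtotal 35.29) followed by the copy with the discount folded in (35.29 + 7.06 = 42.35).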
The first step is to turn that .csv into something that is a little easier to work with. Fortunately, Python has the csv module, which makes it easy to turn your .csv file into a much nicer list of lists. The below gives you a way to both turn your .csv into a list of lists and turn the modified data back into a .csv file.

import csv
import copy

def csv2list(ifile):
    """
    ifile = the path of the csv to be converted into a list of lists
    """
    olist = []
    with open(ifile, 'r', newline='') as f:
        for line in csv.reader(f, dialect='excel'):
            olist.append(line)  # update the outer list
    return olist

def list2csv(ilist, ofile):
    """
    ilist = the list of lists to be converted
    ofile = the output path for your csv file
    """
    with open(ofile, 'w', newline='') as csvfile:
        csvwriter = csv.writer(csvfile, delimiter=',',
                               quotechar='|', quoting=csv.QUOTE_MINIMAL)
        for x in ilist:
            csvwriter.writerow(x)

Now you can simply copy ilist[1] and change the appropriate element to reflect your summed value using:

listTemp = copy.deepcopy(ilist[1])
listTemp[n] = listTemp[n] + listTemp[n-x]
ilist.insert(2, listTemp)

Here n stands for the position of the item_subtotal column and n-x for the discount_total column.
As for how to change the file name, just use:
import os
newFileName = os.path.splitext(oldFileName)[0] + "edited" + os.path.splitext(oldFileName)[1]
Hopefully this will help you out!
