Sort CSV with unformatted date - python

How do I sort the following CSV file by date, from newest to oldest? The dates are inconsistently formatted; I know I can reformat them, but what methods can be applied to both formats as they are?
IDN,NAME,Gender,DOJ,JOB ID,SALARY
100,Alpha Fenn,M,17-06-2003,AD_PRES,24000
101,Axpire Ced,F,2-9-2005,AD_VP,17000
102,Winston Cor,M,13-01-2001,AD_VP,17000
103,Relv Dest,M,3/1/2006,IT_PROG,9000
Is there any way to sort the whole CSV file with the order of DATE (DOJ)?
Sorted_Data = sorted(csv.reader(open('Empl.csv')), key=lambda x: datetime.strptime(x[3], "%d/%m/%Y"), reverse=True)
The above code does work, but only if the date is well formatted, and it only handles that one format.
After sorting it should look like this:
IDN,NAME,Gender,DOJ,JOB ID,SALARY
103,Relv Dest,M,3/1/2006,IT_PROG,9000
101,Axpire Ced,F,2-9-2005,AD_VP,17000
100,Alpha Fenn,M,17-06-2003,AD_PRES,24000
102,Winston Cor,M,13-01-2001,AD_VP,17000

Use pandas and sort_values
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO("""IDN,NAME,Gender,DOJ,JOB ID,SALARY
100,Alpha Fenn,M,17-06-2003,AD_PRES,24000
101,Axpire Ced,F,2-9-2005,AD_VP,17000
102,Winston Cor,M,13-01-2001,AD_VP,17000
103,Relv Dest,M,3/1/2006,IT_PROG,9000"""))
# Or if you have it in a csv file then use
# df = pd.read_csv('file_name.csv')
df['DOJ'] = pd.to_datetime(df['DOJ'], dayfirst=True)  # the dates in the file are day-first
df.sort_values(by=['DOJ'], ascending=False, inplace=True)
df.to_csv()
Output:
',IDN,NAME,Gender,DOJ,JOB ID,SALARY\n
3,103,Relv Dest,M,2006-01-03,IT_PROG,9000\n
1,101,Axpire Ced,F,2005-09-02,AD_VP,17000\n
0,100,Alpha Fenn,M,2003-06-17,AD_PRES,24000\n
2,102,Winston Cor,M,2001-01-13,AD_VP,17000\n'
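For completeness, here is a minimal standard-library sketch (no pandas) that handles both separators by trying several strptime formats; the output file name Empl_sorted.csv is just an assumption, not from the question:
import csv
from datetime import datetime

def parse_doj(value):
    # Try each candidate day-first format until one matches.
    for fmt in ("%d-%m-%Y", "%d/%m/%Y"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError("Unrecognised date: " + value)

with open('Empl.csv', newline='') as f:
    reader = csv.reader(f)
    header = next(reader)  # keep the header row out of the sort
    rows = sorted(reader, key=lambda row: parse_doj(row[3]), reverse=True)

with open('Empl_sorted.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(rows)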

Related

Partial string filter pandas

On Pandas 1.3.4 and Python 3.9.
So I'm having issues filtering for a partial piece of a string. The "Date" column is in the format MM/DD/YYYY HH:MM:SS AM/PM, with the most recent date on top. If the day is a single digit (for example, November 3rd), it has no leading zero, so it appears as 11/3 instead of 11/03. Basically I want Python to read part of the string in the "Date" column and filter for only today.
This is what the original CSV looks like, and this is what I want to do to the file: look for a specific date (ignoring the time on that date) and apply the equivalent of the =RIGHT() formula. However, this is what I end up with using the following code:
from datetime import date
import pandas as pd
df = pd.read_csv(r'file.csv', dtype=str)
today = date.today()
d1 = today.strftime("%m/%#d/%Y") # to find out what today is
df = pd.DataFrame(df, columns=['New Phone', 'Phone number', 'Date'])
df['New Phone'] = df['Phone number'].str[-10:]
df_today = df['Date'].str.contains(f'{d1}',case=False, na=False)
df_today.to_csv(r'file.csv', index=False)
This line is wrong:
df_today = df['Date'].str.contains(f'{d1}',case=False, na=False)
All you're doing there is creating a mask, which is essentially just a Pandas Series containing True or False in each row according to the condition you built the mask from. The spreadsheet gets only FALSE, as you showed, because none of the items in the Date column contain the string that the variable d1 holds.
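For illustration, a minimal sketch (with made-up rows) of what that mask is and how it would normally be used to filter rows:
import pandas as pd

df = pd.DataFrame({'Date': ['11/3/2021 9:15:00 AM', '11/2/2021 4:05:00 PM']})
mask = df['Date'].str.contains('11/3/2021', regex=False)  # a plain True/False Series
print(mask)      # 0 True, 1 False - this kind of object is what got written to your CSV
print(df[mask])  # applying the mask keeps only the matching rows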
Instead, try this:
from datetime import date
import pandas as pd
# Load the CSV file, and change around the columns
df = pd.DataFrame(pd.read_csv(r'file.csv', dtype=str), columns=['New Phone', 'Phone number', 'Date'])
# Take the last ten chars of each phone number
df['New Phone'] = df['Phone number'].str[-10:]
# Convert each date string to a pd.Timestamp, removing the time
df['Date'] = pd.to_datetime(df['Date'].str.split(r'\s+', n=1).str[0])
# Get the phone numbers that are from today
df_today = df[df['Date'] == date.today().strftime('%m/%d/%Y')]
# Write the result to the CSV file
df_today.to_csv(r'file.csv', index=False)
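As a side note (untested against your actual file): once the column has been converted with pd.to_datetime, you could also compare against today's date directly instead of building a date string, e.g.
df_today = df[df['Date'].dt.date == date.today()]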

How can I group by a part of timestamp value in pandas?

I'm trying to split a CSV file and write each part out, using the date as the file name, but I don't need the time.
So I want to split on Order_Date, which is a timestamp that has both a date and a time.
How can I group by part of a value in pandas?
Here is my code:
import csv
import re
import pandas as pd
import os
df = pd.read_csv('test.csv',delimiter='|')
for i, x in df.groupby('Order_Date'):
    p = os.path.join(r'~/Desktop/', ("data_{}.csv").format(i.lower()))
    x.to_csv(p, sep='|', index=False)
Now I can get this:
data_2019-07-23 00:06:00.csv
data_2019-07-23 00:06:50.csv
data_2019-07-23 00:06:55.csv
data_2019-07-28 12:31:00.csv
Example test.csv data:
Channel|Store_ID|Store_Code|Store_Type|Order_ID|Order_Date|Member_ID|Member_Tier|Coupon_ID|Order_Total|Material_No|Material_Name|Size|Quantity|Unit_Price|Line_Total|Discount_Amount
ECOM|ECOM|ECOM|ECOM|A190700|2019-07-23 00:06:00||||1064.00|7564|Full Zip|750|1.00|399.00|168.00|231.00
ECOM|ECOM|ECOM|ECOM|A190700|2019-07-23 00:06:00||||1064.00|1361|COOL TEE|200|1.00|199.00|84.00|115.00
ECOM|ECOM|ECOM|ECOM|A190700|2019-07-23 00:06:00||||1064.00|7699|PANT|690|1.00|499.00|210.00|289.00
ECOM|ECOM|ECOM|ECOM|A190700|2019-07-23 00:06:00||||1064.00|8700|AI DRESS|690|1.00|399.00|196.00|203.00
ECOM|ECOM|ECOM|ECOM|A190700|2019-07-23 00:06:50||||1064.00|8438|COPA|690|1.00|229.00|112.00|117.00
ECOM|ECOM|ECOM|ECOM|A190700|2019-07-23 00:06:55||||1064.00|8324|CLASS|350|1.00|599.00|294.00|305.00
ECOM|ECOM|ECOM|ECOM|A190701|2019-07-28 12:31:00||||798.00|3689|DRESS|500|1.00|699.00|294.00|405.00
Expect I get this:
data_2019-07-23.csv
data_2019-07-28.csv
Any help would be very much appreciated.
You need to convert Order_Date to dates - stripping the time information. One quick way to do this is:
df['Order_Date1'] = pd.to_datetime(df['Order_Date']).dt.strftime('%Y-%m-%d')
Then proceed with a groupby using Order_Date1.
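Put together with the loop from the question, that would look roughly like this (note the helper column Order_Date1 will also appear in the output files unless you drop it):
df['Order_Date1'] = pd.to_datetime(df['Order_Date']).dt.strftime('%Y-%m-%d')
for i, x in df.groupby('Order_Date1'):
    p = os.path.join(r'~/Desktop/', "data_{}.csv".format(i))
    x.to_csv(p, sep='|', index=False)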
You can try converting i to a string, calling .split() on it, and then taking index 0:
str(i).split()[0]
So, substituted into your code:
for i, x in df.groupby('Order_Date'):
    p = os.path.join(r'~/Desktop/', ("data_{}.csv").format(str(i).split()[0]))
    x.to_csv(p, sep='|', index=False)

How do I add two dates that are saved in .json files?

I am having a hard time summing two dates that are saved in two separate JSON files. I want to add together set dates that are stored in these separate files.
The first file (A1.json) contains: {"expires": "2019-09-11"}
The second file (Whitelist.json) contains: {"expires": "0000-01-00"}
These dates are created with tkcalendar and are later exported to these separate files, the idea being that summing them lets me set a date one month into the future. However, I can't seem to add them together without some form of error.
I have tried converting the JSON files to strings in Python and then adding them, and also using strptime to sum the dates.
Here is the relevant chunk of the code:
with open('A1.json') as f:
    data = json.loads(f.read())
    for material in data.items():
        A1 = (format(material[1]['expires']))

with open('Whitelist.json') as f:
    data = json.loads(f.read())
    for material in data.items():
        A2 = (format(material[1]['expires']))

print(A1 + A2)
When this is used, they just get pasted one after another. They don't get summed the way I need.
I also have tried the following code:
t1 = dt.datetime.strptime('A1', '%d-%m-%Y')
t2 = dt.datetime.strptime('Whitelist', '%d-%m-%Y')
time_zero = dt.datetime.strptime('00:00:00', '%d/%m/%Y')
print((t1 - time_zero + Whitelist).time())
However, this constantly gives ValueError: time data does not match format '%y:%m:%d'.
What I expect is that summing 2019-09-11 and 0000-01-00 gives 2019-10-11. However, the result is 2019-09-110000-01-00. Trying the strptime method gives ValueErrors such as: ValueError: time data does not match format '%y:%m:%d'.
Thank you in advance, and I apologize if I did something wrong on my first post.
Use pandas:
The actual format of the JSON files isn't provided, so use something like the following to get the data into a DataFrame:
pd.read_json('A1.json', orient='records') - the parameters will depend on the format of the file
json_normalize is another option
d2 is not a proper datetime format, so don't try to convert it.
The Code section below uses a dict to set up the DataFrame for the example.
json files to DataFrames:
df1 = pd.read_json('A1.json', orient='records')
df2 = pd.read_json('Whitelist.json', orient='records')
df = pd.DataFrame()
df['expires'] = df1.expires
df['d2'] = df2.expires
Code:
import pandas as pd
df = pd.DataFrame({"expires": ["2019-09-11", "2019-10-11", "2019-11-11"],
                   "d2": ["0000-01-00", "0000-02-00", "0000-03-00"]})
Expand d2 using str.split:
df.expires = pd.to_datetime(df.expires)
df[['y', 'm', 'd']] = df.d2.str.split('-', expand=True)
Use pd.DateOffset:
df['expires_new'] = df[['expires', 'm']].apply(lambda x: x[0] + pd.DateOffset(months=int(x[1])), axis=1)
If d2 is expected to have more than just a new m (month) value, the lambda expression can be changed to call a function that adjusts for the y, m, and d values, as sketched below.
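For example, a sketch of such a function, assuming d2 always follows the 'yyyy-mm-dd'-style offset layout shown above:
def add_offset(row):
    # Split the '0000-01-00'-style offset into year/month/day amounts.
    y, m, d = (int(part) for part in row['d2'].split('-'))
    return row['expires'] + pd.DateOffset(years=y, months=m, days=d)

df['expires_new'] = df.apply(add_offset, axis=1)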

Why can't I search for a row in a pandas df using a date as part of a tuple index?

I am trying to search a pandas DataFrame I made which has tuples as its index. The first part of each tuple is a date and the second part is a forex pair. I've tried a few things, but I can't seem to search using a date-formatted string as part of a tuple with .loc or .ix.
My df looks like this:
Open Close
(11-01-2018, AEDAUD) 0.3470 0.3448
(11-01-2018, AEDCAD) 0.3415 0.3408
(11-01-2018, AEDCHF) 0.2663 0.2656
(11-01-2018, AEDDKK) 1.6955 1.6838
(11-01-2018, AEDEUR) 0.2277 0.2261
Here is the complete code :
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
forex_11 = pd.read_csv('FOREX_20180111.csv', sep=',', parse_dates=['Date'])
forex_12 = pd.read_csv('FOREX_20180112.csv', sep=',', parse_dates=['Date'])
time_format = '%d-%m-%Y'
forex = forex_11.append(forex_12, ignore_index=False)
forex['Date'] = forex['Date'].dt.strftime(time_format)
GBP = forex[forex['Symbol'] == "GBPUSD"]
forex.index = list(forex[['Date', 'Symbol']].itertuples(index=False, name=None))
forex_open_close = pd.DataFrame(np.array(forex[['Open','Close']]), index=forex.index)
forex_open_close.columns = ['Open', 'Close']
print(forex_open_close.head())
print(forex_open_close.ix[('11-01-2018', 'GBPUSD')])
How do I get the row which has index ('11-01-2018', 'GBPUSD') ?
Can you try putting the tuple in a list using brackets?
Like this:
print(forex_open_close.ix[[('11-01-2018', 'GBPUSD')]])
I would recommend using the pandas MultiIndex. In your case you could do the following:
tuples = list(data[['Date', 'Symbol']].itertuples(index=False, name=None))
data.index = pd.MultiIndex.from_tuples(tuples, names=['Date', 'Symbol'])
# And then to index
data.loc['2018-01-11', 'AEDCAD']
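An equivalent way to build that index from the question's own DataFrame is set_index, sketched here with the string dates kept in the '%d-%m-%Y' format used above:
forex_open_close = forex.set_index(['Date', 'Symbol'])[['Open', 'Close']]
forex_open_close.loc[('11-01-2018', 'GBPUSD')]  # look up one (date, pair) row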

How to extract date/time parameters from a list of strings?

I have a pandas DataFrame with a column like this:
from pandas import DataFrame
df = DataFrame({'column_name': [u'Monday,30 December,2013', u'Delivered', u'19:23', u'1']})
Now I want to extract everything from it and store it in three columns:
date status time
[30/December/2013] ['Delivered'] [19:23]
So far I have used this:
import dateutil.parser as dparser
dparser.parse([u'Monday,30 December,2013', u'Delivered', u'19:23', u'1'])
but this throws an error. Can anyone please guide me to a solution?
You can apply() a function to a column; see the whole example:
from pandas import DataFrame
df = DataFrame({'date': ['Monday,30 December,2013'], 'delivery': ['Delivered'], 'time': ['19:23'], 'status':['1']})
# delete the status column
del df['status']
def splitter(val):
    parts = val.split(',')
    return parts[1]
df['date'] = df['date'].apply(splitter)
This yields a DataFrame with the date, delivery and time columns.
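If an actual datetime value is wanted rather than the '30 December' substring, a possible follow-up (assuming every row shares the 'Monday,30 December,2013' layout) is to parse the full string with an explicit format instead of splitting it:
import pandas as pd

df = pd.DataFrame({'date': ['Monday,30 December,2013'], 'delivery': ['Delivered'], 'time': ['19:23']})
df['date'] = pd.to_datetime(df['date'], format='%A,%d %B,%Y')  # -> 2013-12-30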
