Using 3 criteria for a table lookup in Python

Backstory: I'm fairly new to Python and have only ever worked in MATLAB before.
I am looking to take a specific value from a table based off of data I have.
The data I have is
Temperatures = [0.8,0.1,-0.8,-1.4,-1.7,-1.5,-2,-1.7,-1.7,-1.3,-0.7,-0.2,0.3,1.4,1.4,1.5,1.2,1,0.9,1.3,1.7,1.7,1.6,1.6]
Hours of the Day = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]
This is all data for a Monday.
My Monday table looks like this:
Temp | Hr0 | Hr1 | Hr2 ...
-15 < t <= -10 | 0.01 | 0.02 | 0.06 ...
-10 < t <= -5 | 0.04 | 0.03 | 0.2 ...
with the temperature bins incrementing by 5 up to 30, and the hours of the day running up to 23. The values in the table are constants that I would like to look up based on the temperature and hour.
For example, I'd like to be able to write:
print(monday(1, 1))   # -> 0.01
I would also be doing this for every day of the week for a mass data analysis, hence the need for it to be efficient.
What I've done so far:
So I have stored all of my tables in dictionaries that look kind of like this:
monday_hr0 = [0.01,0.04, ... ]
So first by column, then calling them by the temperature value.
What I have now is a bunch of loops that look like this:
for i in range(0, 365):
    for j in range(0, 24):
        if Day[i] == monday:
            if hr[i + 24*j] == 0:
                if temp[i] == -15:
                    constant.append(monday_hr1[0])
                ...
            if hr[i + 24*j] == 1:
                if temp[i] == -15:
                    constant.append(monday_hr2[0])
                ...
            ...
        elif Day[i] == tuesday:
            if hr[i + 24*j] == 0:
                if temp[i] == -15:
                    constant.append(tuesday_hr1[0])
                ...
            if hr[i + 24*j] == 1:
                if temp[i] == -15:
                    constant.append(tuesday_hr2[0])
                ...
            ...
        ...
I'm basically saying here if it's a monday, use this table. Then if it's this hour use this column. Then if it's this temperature, use this cell. This is VERY VERY inefficient however.
I'm sure there's a quicker way but I can't wrap my head around it. Thank you very much for your help!

Okay, bear with me here, I'm on mobile. I'll try to write up a solution.
I am assuming the following:
you have a dictionary called day_data which contains the table of data for each day of the week.
you have a dictionary called days which maps 0-6 to a day of the week: 0 is Monday, 6 is Sunday.
you have a list of temperatures you want something done with
you have a time of the day you want to use to pick out the appropriate data from your day_data. You want to do this for each day of the year.
We should only have to iterate once through all 365 days and once through each hour of the day.
heat_load_days = {}
for day_index in range(365):
    day = days[day_index % 7]
    # day is now the day of the week
    data = day_data[day]
    heat_load = []
    for hour in range(24):
        # still unsure how to select which temperature row from the data table
        heat_load.append(day_data_selected)
    heat_load_days[day_index] = heat_load
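To fill in the row-selection step left open above, here is a minimal, hedged sketch. It assumes a layout that is not in the original post: day_data[day] is a 2-D list indexed [temperature_bin][hour], with the 5-degree bins from the question, so the right row can be found with bisect instead of a chain of ifs.

import bisect

# Upper edges of the bins -15<t<=-10, -10<t<=-5, ..., 25<t<=30 (assumed layout)
bin_edges = list(range(-10, 31, 5))

def lookup(day, temperature, hour, day_data):
    """Return the table constant for a given day name, temperature and hour."""
    row = bisect.bisect_left(bin_edges, temperature)   # which 5-degree bin the temperature falls in
    return day_data[day][row][hour]

# Toy example (all values are placeholders, not the real table):
day_data = {'monday': [[0.01, 0.02, 0.06] + [0.0] * 21] * 9}
print(lookup('monday', -12.3, 1, day_data))   # -> 0.02 (bin -15 < t <= -10, hour 1)

With something like this, the whole year's constants come from a single pass over the hourly records, with no per-day if/elif chains.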


Is there a better way to iterate through this calculation?

Running this code produces the error message:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
I have 6 years' worth of competitors' results from a 1/2 marathon in one csv file.
The function year_runners aims to create a new column for each year with a difference in finishing time between each runner.
Is there a more efficient way of producing the same result?
Thanks in advance.
Pos Gun_Time Chip_Time Name Number Category
1 1900-01-01 01:19:15 1900-01-01 01:19:14 Steve Hodges 324 Senior Male
2 1900-01-01 01:19:35 1900-01-01 01:19:35 Theo Bately 92 Supervet Male
# calculating the time difference between each finisher in a year and adding this result into a new column called time_diff
def year_runners(year, x, y):
    print('Event held in', year)
    # x is the first number (position) for the runner of that year,
    # y is the last number (position) for that year e.g. 2016 event spans df[246:534]
    time_diff = 0
    for index, row in df.iterrows():
        time_diff = df2015.loc[(x + 1), 'Gun_Time'] - df2015.loc[(x), 'Chip_Time']
        # using Gun time as the start-time for all.
        # using chip time as finishing time for each runner.
        # work out the time difference between the x-placed runner and the runner behind (x + 1)
        df2015.loc[x, 'time_diff'] = time_diff  # set the time_diff column to the value of time_diff
        # for each row x in the dataframe
        print("Runner", (x + 1), "time, minus runner", x, "=", time_diff)
        x += 1
        if x > y:
            break
Hi everyone, this was solved using the shift technique.
youtube.com/watch?v=nZzBj6n_abQ
df2015['shifted_Chip_Time'] = df2015['Chip_Time'].shift(1)
df2015['time_diff'] = df2015['Gun_Time'] - df2015['shifted_Chip_Time']
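Since the sample file spans several years, a hedged variant of the same shift idea keeps the first finisher of each year from being compared with the last finisher of the previous one. It assumes a Year column, which is not shown in the sample data:

# 'Year' is an assumed helper column identifying which event each row belongs to
df2015['shifted_Chip_Time'] = df2015.groupby('Year')['Chip_Time'].shift(1)
df2015['time_diff'] = df2015['Gun_Time'] - df2015['shifted_Chip_Time']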

function to get month values N-(x) from today's month in a dataframe

I have been spending hours trying to write a function to detect a trend in a time series by taking the past 4 months of data prior to today. I organized my monthly data with dt.month, but the issue is that I cannot get the previous year's 12th month if today is January. Here is a toy dataset:
data1 = pd.DataFrame({'Id' : ['001','001','001','001','001','001','001','001','001',
'002','002','002','002','002','002','002','002','002',],
'Date': ['2020-01-12', '2019-12-30', '2019-12-01','2019-11-01', '2019-08-04', '2019-08-04', '2019-08-01', '2019-07-20', '2019-06-04',
'2020-01-11', '2019-12-12', '2019-12-01','2019-12-01', '2019-09-10', '2019-08-10', '2019-08-01', '2019-06-20', '2019-06-01'],
'Quantity' :[3,5,6,72,1,5,6,3,9,3,6,7,3,2,5,74,3,4]
})
and my data cleaning to get the format that I need is this:
data1['Date'] =pd.to_datetime(data1['Date'], format='%Y-%m')
data2 = data1.groupby('Id').apply(lambda x: x.set_index('Date').resample('M').sum())['Quantity'].reset_index()
data2['M'] =pd.to_datetime(data2['Date']).dt.month
data2['Y'] =pd.to_datetime(data2['Date']).dt.year
data = pd.DataFrame(data2.groupby(['Id','Date','M','Y'])['Quantity'].sum())
data = data.rename(columns={0 : 'Quantity'})
and my function looks like this:
def check_trend():
    today_month = int(time.strftime("%-m"))
    data['n3-n4'] = data['Quantity'].loc[data['M'] == (today_month - 3)] - data['Quantity'].loc[data['M'] == (today_month - 4)]
    data['n2-n3'] = data['Quantity'].loc[data['M'] == (today_month - 2)] - data['Quantity'].loc[data['M'] == (today_month - 3)]
    data['n2-n1'] = data['Quantity'].loc[data['M'] == (today_month - 2)] - data['Quantity'].loc[data['M'] == (today_month - 1)]
    if data['n3-n4'] < 0 and data['n2-n3'] < 0 and data['n2-n1'] < 0:
        data['Trend'] = 'Yes'
    elif data['n3-n4'] > 0 and data['n2-n3'] > 0 and data['n2-n1'] > 0:
        data['Trend'] = 'Yes'
    else:
        data['Trend'] = 'No'
print(check_trend())
I have looked at this: Get (year,month) for the last X months but it does not seem to be working for a specific groupby object.
I would really appreciate a hint! At least I would love to know if this method to identify trend in a dataset is a good one. After that I plan on using exponential smoothing if there is no trend and Holt's method if there is trend.
UPDATE: thanks to @Vorsprung durch Technik, I have the function working well, but I still struggle to incorporate the result into a new dataframe containing the Ids from data2:
forecast = pd.DataFrame()
forecast['Id'] = data1['Id'].unique()
for k,g in data2.groupby(level='Id'):
    forecast['trendup'] = g.tail(5)['Quantity'].is_monotonic_increasing
    forecast['trendown'] = g.tail(5)['Quantity'].is_monotonic_decreasing
This returns the same value for every row of the dataframe, as if it were doing the calculation for only one Id. How can I ensure that it gets calculated for EACH Id value?
I don't think you need check_trend().
There are built-in functions for this:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.is_monotonic_increasing.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.is_monotonic_decreasing.html
Let me know if this does what you need:
data2 = data1.groupby('Id').apply(lambda x: x.set_index('Date').resample('M').sum())
for k,g in data2.groupby(level='Id'):
    print(g.tail(4)['Quantity'].is_monotonic_increasing)
    print(g.tail(4)['Quantity'].is_monotonic_decreasing)
This is what is returned by g.tail(4):
Quantity
Id Date
001 2019-10-31 0
2019-11-30 72
2019-12-31 11
2020-01-31 3
Quantity
Id Date
002 2019-10-31 0
2019-11-30 0
2019-12-31 16
2020-01-31 3
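For the follow-up in the question's update (the same value appearing on every row of forecast), a minimal sketch that collects one record per Id and only then builds the dataframe, instead of assigning a scalar to a whole column on each pass:

records = []
for k, g in data2.groupby(level='Id'):
    records.append({
        'Id': k,
        'trendup': g.tail(5)['Quantity'].is_monotonic_increasing,
        'trendown': g.tail(5)['Quantity'].is_monotonic_decreasing,
    })
forecast = pd.DataFrame(records)
# one row per Id, each with its own trendup/trendown flags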

Select dataframe rows defining a date range that contains a needle date

There are lots of answers which make it easy to select some date range and get the ones that fall into that range.
I don't want that.
I have data like this:
id other_flags d_dt_start d_dt_end
0 28 ... 1993-02-12 1993-12-31
1 28 ... 1993-02-12 1993-12-31
2 46 ... 1986-01-15 1993-09-30
3 46 ... 1986-01-15 1993-09-30
4 46 ... 1986-01-15 1993-09-30
I want to select the ones that match when I have a date, say, 1986-06-15, thus giving me the subset of indices 2, 3, and 4. Currently I'm doing it with something like this:
subs = subs[(time >= subs['d_dt_start1']) # later
& (time <= subs['d_dt_end1'])] # before
There has got to be a more elegant way to do this, something like the between method, just the other way around.
Basically, instead of saying 'you have a date, I have a date range', 'you have a date range, I have a date'.
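One option is simply to wrap the boolean mask in a helper so it reads like between; an IntervalIndex is another route. A small sketch of the helper (column names follow the question, the function name is made up):

import pandas as pd

def rows_containing(df, needle, start_col='d_dt_start', end_col='d_dt_end'):
    # Return the rows whose [start, end] interval contains the needle date
    needle = pd.Timestamp(needle)
    return df[(df[start_col] <= needle) & (needle <= df[end_col])]

# rows_containing(subs, '1986-06-15') should give the rows with indices 2, 3 and 4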

I need help making pandas perform better with dataframe interactions

I'm a newbie and have been studying pandas for a few days, and started my first project with it. I wanted to use it to create a product stock prediction timeline for the current month.
Basically I get the stock and predicted daily reduction and trace a line from today to the end of the month with the predicted stock. Also, if there is a purchase order to be delivered on day XYZ, I add the delivery amount on that day.
I have a dataframe that contains the stock for today and the predicted daily reduction for this month
ITEM STOCK DAILY_DEDUCTION
A 1000 20
B 2000 15
C 800 8
D 10000 100
And another dataframe that contains pending purchase orders and amount that will be delivered.
ITEM DATE RECEIVING_AMOUNT
A 2018-05-16 20
B 2018-05-23 15
A 2018-05-17 8
D 2018-05-29 100
I created this loop to iterate through the dataframe and do the following:
subtract the DAILY_DEDUCTION for the item
if the date is the same as a purchase order date, then add the RECEIVING_AMOUNT
df_dates = pd.date_range(start=today, end=endofmonth, freq='D')
temptable = []
for row in df_stock.itertuples(index=True):
    predicted_stock = getattr(row, "STOCK")
    item = getattr(row, "ITEM")
    for date in df_dates:
        date_format = date.strftime('%Y-%m-%d')
        predicted_stock = predicted_stock - getattr(row, "DAILY_DEDUCTION")
        order_qty = df_purchase_orders.loc[(df_purchase_orders['DATE'] == date_format)
                                           & (df_purchase_orders['ITEM'] == item), 'RECEIVING_AMOUNT']
        if len(order_qty.index) > 0:
            predicted_stock = predicted_stock + order_qty.item()
        lista = [date_format, item, int(predicted_stock)]
        temptable.append(lista)
And... well, it did the job, but it's quite slow. I run this on 100k rows give or take, and was hoping to find some insight on how I can solve this problem in a way that performs better?
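One way to avoid the per-row .loc lookups (which dominate the runtime here) is to build the whole item-by-date timeline at once and use cumulative sums. A hedged sketch, assuming pandas >= 1.2 for how='cross'; today/endofmonth and the column names follow the question, while DAY_N, PREDICTED and RECEIVED are made-up helper columns:

import pandas as pd

# one row per (ITEM, DATE): cross-join the stock table with the date range
dates = pd.date_range(start=today, end=endofmonth, freq='D')
timeline = df_stock.merge(pd.DataFrame({'DATE': dates}), how='cross')

# cumulative daily deduction: day 1 subtracts one deduction, day 2 two, and so on
timeline['DAY_N'] = timeline.groupby('ITEM').cumcount() + 1
timeline['PREDICTED'] = timeline['STOCK'] - timeline['DAY_N'] * timeline['DAILY_DEDUCTION']

# purchase orders: cumulative receipts per item, carried forward after each delivery date
po = df_purchase_orders.assign(DATE=pd.to_datetime(df_purchase_orders['DATE']))
receipts = (po.groupby(['ITEM', 'DATE'])['RECEIVING_AMOUNT'].sum()
              .groupby(level='ITEM').cumsum()
              .rename('RECEIVED')
              .reset_index())
timeline = timeline.merge(receipts, on=['ITEM', 'DATE'], how='left')
timeline['RECEIVED'] = timeline.groupby('ITEM')['RECEIVED'].ffill().fillna(0)
timeline['PREDICTED'] = timeline['PREDICTED'] + timeline['RECEIVED']

This produces the same date/item/predicted-stock triples as the loop, but in a handful of vectorized operations.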

Need to compare very large files around 1.5GB in python

"DF","00000000#11111.COM","FLTINT1000130394756","26JUL2010","B2C","6799.2"
"Rail","00000.POO#GMAIL.COM","NR251764697478","24JUN2011","B2C","2025"
"DF","0000650000#YAHOO.COM","NF2513521438550","01JAN2013","B2C","6792"
"Bus","00009.GAURAV#GMAIL.COM","NU27012932319739","26JAN2013","B2C","800"
"Rail","0000.ANU#GMAIL.COM","NR251764697526","24JUN2011","B2C","595"
"Rail","0000MANNU#GMAIL.COM","NR251277005737","29OCT2011","B2C","957"
"Rail","0000PRANNOY0000#GMAIL.COM","NR251297862893","21NOV2011","B2C","212"
"DF","0000PRANNOY0000#YAHOO.CO.IN","NF251327485543","26JUN2011","B2C","17080"
"Rail","0000RAHUL#GMAIL.COM","NR2512012069809","25OCT2012","B2C","5731"
"DF","0000SS0#GMAIL.COM","NF251355775967","10MAY2011","B2C","2000"
"DF","0001HARISH#GMAIL.COM","NF251352240086","22DEC2010","B2C","4006"
"DF","0001HARISH#GMAIL.COM","NF251742087846","12DEC2010","B2C","1000"
"DF","0001HARISH#GMAIL.COM","NF252022031180","09DEC2010","B2C","3439"
"Rail","000AYUSH#GMAIL.COM","NR2151120122283","25JAN2013","B2C","136"
"Rail","000AYUSH#GMAIL.COM","NR2151213260036","28NOV2012","B2C","41"
"Rail","000AYUSH#GMAIL.COM","NR2151313264432","29NOV2012","B2C","96"
"Rail","000AYUSH#GMAIL.COM","NR2151413266728","29NOV2012","B2C","96"
"Rail","000AYUSH#GMAIL.COM","NR2512912359037","08DEC2012","B2C","96"
"Rail","000AYUSH#GMAIL.COM","NR2517612385569","12DEC2012","B2C","96"
Above is the sample data.
Data is sorted according to email addresses and the file is very large, around 1.5 GB.
I want output in another csv file something like this
"DF","00000000#11111.COM","FLTINT1000130394756","26JUL2010","B2C","6799.2",1,0 days
"Rail","00000.POO#GMAIL.COM","NR251764697478","24JUN2011","B2C","2025",1,0 days
"DF","0000650000#YAHOO.COM","NF2513521438550","01JAN2013","B2C","6792",1,0 days
"Bus","00009.GAURAV#GMAIL.COM","NU27012932319739","26JAN2013","B2C","800",1,0 days
"Rail","0000.ANU#GMAIL.COM","NR251764697526","24JUN2011","B2C","595",1,0 days
"Rail","0000MANNU#GMAIL.COM","NR251277005737","29OCT2011","B2C","957",1,0 days
"Rail","0000PRANNOY0000#GMAIL.COM","NR251297862893","21NOV2011","B2C","212",1,0 days
"DF","0000PRANNOY0000#YAHOO.CO.IN","NF251327485543","26JUN2011","B2C","17080",1,0 days
"Rail","0000RAHUL#GMAIL.COM","NR2512012069809","25OCT2012","B2C","5731",1,0 days
"DF","0000SS0#GMAIL.COM","NF251355775967","10MAY2011","B2C","2000",1,0 days
"DF","0001HARISH#GMAIL.COM","NF251352240086","09DEC2010","B2C","4006",1,0 days
"DF","0001HARISH#GMAIL.COM","NF251742087846","12DEC2010","B2C","1000",2,3 days
"DF","0001HARISH#GMAIL.COM","NF252022031180","22DEC2010","B2C","3439",3,10 days
"Rail","000AYUSH#GMAIL.COM","NR2151213260036","28NOV2012","B2C","41",1,0 days
"Rail","000AYUSH#GMAIL.COM","NR2151313264432","29NOV2012","B2C","96",2,1 days
"Rail","000AYUSH#GMAIL.COM","NR2151413266728","29NOV2012","B2C","96",3,0 days
"Rail","000AYUSH#GMAIL.COM","NR2512912359037","08DEC2012","B2C","96",4,9 days
"Rail","000AYUSH#GMAIL.COM","NR2512912359037","08DEC2012","B2C","96",5,0 days
"Rail","000AYUSH#GMAIL.COM","NR2517612385569","12DEC2012","B2C","96",6,4 days
"Rail","000AYUSH#GMAIL.COM","NR2517612385569","12DEC2012","B2C","96",7,0 days
"Rail","000AYUSH#GMAIL.COM","NR2151120122283","25JAN2013","B2C","136",8,44 days
"Rail","000AYUSH#GMAIL.COM","NR2151120122283","25JAN2013","B2C","136",9,0 days
i.e. if an entry occurs for the first time I need to append 1, if it occurs a second time I need to append 2, and so on; in other words, I need to count the number of occurrences of an email address in the file. If an email exists twice or more I also want the difference between the dates, and since the dates are not sorted they have to be sorted per email address as well. I am looking for a solution in Python using the NumPy or pandas library, or any other library that can handle this amount of data without running out of memory. I have a dual-core processor with CentOS 6.3 and 4 GB of RAM.
Make sure you have pandas 0.11, read these docs: http://pandas.pydata.org/pandas-docs/dev/io.html#hdf5-pytables, and these recipes: http://pandas.pydata.org/pandas-docs/dev/cookbook.html#hdfstore (esp. the 'merging on millions of rows' section).
Here is a solution that seems to work. The workflow:
read data from your csv by chunks, appending to an HDFStore
iterate over the store, which creates another store that does the combining
Essentially we take a chunk from the table and combine it with a chunk from every other part of the file. The combiner function does not reduce; instead it calculates your function (the diff in days) between all elements in that chunk, eliminating duplicates as you go and keeping the latest data after each loop. Kind of like a recursive reduce, almost.
This should be O(num_of_chunks**2) in memory and calculation time.
chunksize could be, say, 1m (or more) in your case
processing [0] [datastore.h5]
processing [1] [datastore_0.h5]
count date diff email
4 1 2011-06-24 00:00:00 0 0000.ANU#GMAIL.COM
1 1 2011-06-24 00:00:00 0 00000.POO#GMAIL.COM
0 1 2010-07-26 00:00:00 0 00000000#11111.COM
2 1 2013-01-01 00:00:00 0 0000650000#YAHOO.COM
3 1 2013-01-26 00:00:00 0 00009.GAURAV#GMAIL.COM
5 1 2011-10-29 00:00:00 0 0000MANNU#GMAIL.COM
6 1 2011-11-21 00:00:00 0 0000PRANNOY0000#GMAIL.COM
7 1 2011-06-26 00:00:00 0 0000PRANNOY0000#YAHOO.CO.IN
8 1 2012-10-25 00:00:00 0 0000RAHUL#GMAIL.COM
9 1 2011-05-10 00:00:00 0 0000SS0#GMAIL.COM
12 1 2010-12-09 00:00:00 0 0001HARISH#GMAIL.COM
11 2 2010-12-12 00:00:00 3 0001HARISH#GMAIL.COM
10 3 2010-12-22 00:00:00 13 0001HARISH#GMAIL.COM
14 1 2012-11-28 00:00:00 0 000AYUSH#GMAIL.COM
15 2 2012-11-29 00:00:00 1 000AYUSH#GMAIL.COM
17 3 2012-12-08 00:00:00 10 000AYUSH#GMAIL.COM
18 4 2012-12-12 00:00:00 14 000AYUSH#GMAIL.COM
13 5 2013-01-25 00:00:00 58 000AYUSH#GMAIL.COM
import pandas as pd
import StringIO
import numpy as np
from time import strptime
from datetime import datetime
# your data
data = """
"DF","00000000#11111.COM","FLTINT1000130394756","26JUL2010","B2C","6799.2"
"Rail","00000.POO#GMAIL.COM","NR251764697478","24JUN2011","B2C","2025"
"DF","0000650000#YAHOO.COM","NF2513521438550","01JAN2013","B2C","6792"
"Bus","00009.GAURAV#GMAIL.COM","NU27012932319739","26JAN2013","B2C","800"
"Rail","0000.ANU#GMAIL.COM","NR251764697526","24JUN2011","B2C","595"
"Rail","0000MANNU#GMAIL.COM","NR251277005737","29OCT2011","B2C","957"
"Rail","0000PRANNOY0000#GMAIL.COM","NR251297862893","21NOV2011","B2C","212"
"DF","0000PRANNOY0000#YAHOO.CO.IN","NF251327485543","26JUN2011","B2C","17080"
"Rail","0000RAHUL#GMAIL.COM","NR2512012069809","25OCT2012","B2C","5731"
"DF","0000SS0#GMAIL.COM","NF251355775967","10MAY2011","B2C","2000"
"DF","0001HARISH#GMAIL.COM","NF251352240086","22DEC2010","B2C","4006"
"DF","0001HARISH#GMAIL.COM","NF251742087846","12DEC2010","B2C","1000"
"DF","0001HARISH#GMAIL.COM","NF252022031180","09DEC2010","B2C","3439"
"Rail","000AYUSH#GMAIL.COM","NR2151120122283","25JAN2013","B2C","136"
"Rail","000AYUSH#GMAIL.COM","NR2151213260036","28NOV2012","B2C","41"
"Rail","000AYUSH#GMAIL.COM","NR2151313264432","29NOV2012","B2C","96"
"Rail","000AYUSH#GMAIL.COM","NR2151413266728","29NOV2012","B2C","96"
"Rail","000AYUSH#GMAIL.COM","NR2512912359037","08DEC2012","B2C","96"
"Rail","000AYUSH#GMAIL.COM","NR2517612385569","12DEC2012","B2C","96"
"""
# read in and create the store
data_store_file = 'datastore.h5'
store = pd.HDFStore(data_store_file,'w')
def dp(x, **kwargs):
    return [ datetime(*strptime(v,'%d%b%Y')[0:3]) for v in x ]
chunksize=5
reader = pd.read_csv(StringIO.StringIO(data),names=['x1','email','x2','date','x3','x4'],
header=0,usecols=['email','date'],parse_dates=['date'],
date_parser=dp, chunksize=chunksize)
for i, chunk in enumerate(reader):
    chunk['indexer'] = chunk.index + i*chunksize
    # create the global index, and keep it in the frame too
    df = chunk.set_index('indexer')
    # need to set a minimum size for the email column
    store.append('data',df,min_itemsize={'email' : 100})
store.close()
# define the combiner function
def combiner(x):
    # given a group of emails (the same), return a combination
    # with the new data
    # sort by the date
    y = x.sort('date')
    # calc the diff in days (an integer)
    y['diff'] = (y['date']-y['date'].iloc[0]).apply(lambda d: float(d.item().days))
    y['count'] = pd.Series(range(1,len(y)+1),index=y.index,dtype='float64')
    return y
# reduce the store (and create a new one by chunks)
in_store_file = data_store_file
in_store1 = pd.HDFStore(in_store_file)
# iter on the store 1
for chunki, df1 in enumerate(in_store1.select('data',chunksize=2*chunksize)):
    print "processing [%s] [%s]" % (chunki,in_store_file)
    out_store_file = 'datastore_%s.h5' % chunki
    out_store = pd.HDFStore(out_store_file,'w')
    # iter on store 2
    in_store2 = pd.HDFStore(in_store_file)
    for df2 in in_store2.select('data',chunksize=chunksize):
        # concat & drop dups
        df = pd.concat([df1,df2]).drop_duplicates(['email','date'])
        # group and combine
        result = df.groupby('email').apply(combiner)
        # remove the mi (that we created in the groupby)
        result = result.reset_index('email',drop=True)
        # only store those rows which are in df2!
        result = result.reindex(index=df2.index).dropna()
        # store to the out_store
        out_store.append('data',result,min_itemsize={'email' : 100})
    in_store2.close()
    out_store.close()
    in_store_file = out_store_file
in_store1.close()
# show the reduced store
print pd.read_hdf(out_store_file,'data').sort(['email','diff'])
Use the built-in sqlite3 database: you can insert the data, sort and group as necessary, and there's no problem using a file which is larger than available RAM.
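A minimal sketch of that route (the table and file names are made up; the date is stored in ISO format so that ORDER BY sorts it chronologically):

import csv, sqlite3
from datetime import datetime

conn = sqlite3.connect('bookings.db')          # on-disk, so the 1.5 GB never has to fit in RAM
conn.execute('CREATE TABLE IF NOT EXISTS bookings '
             '(mode TEXT, email TEXT, ref TEXT, date TEXT, segment TEXT, amount REAL)')
with open('input.csv', newline='') as f:
    rows = ((r[0], r[1], r[2],
             datetime.strptime(r[3], '%d%b%Y').strftime('%Y-%m-%d'),
             r[4], r[5]) for r in csv.reader(f))
    conn.executemany('INSERT INTO bookings VALUES (?, ?, ?, ?, ?, ?)', rows)
conn.commit()
# rows now come back grouped by email and in date order; counting occurrences and
# day differences per email is then a simple single pass, as in the question
for row in conn.execute('SELECT * FROM bookings ORDER BY email, date'):
    pass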
Another possible (system-admin) way, avoiding databases and SQL queries plus a whole lot of requirements in runtime processes and hardware resources.
Update 20/04: added more code and a simplified approach:
Convert the timestamp to seconds (from Epoch) and use UNIX sort, using the email and this new field (that is: sort -k2 -k4 -n -t, < converted_input_file > output_file)
Initialize 3 variables: EMAIL, PREV_TIME and COUNT
Iterate over each line; if a new email is encountered, add "1,0 days". Update PREV_TIME=timestamp, COUNT=1, EMAIL=new_email
Next line: 3 possible scenarios
a) if same email, different timestamp: calculate days, increment COUNT, update PREV_TIME, add "COUNT, difference_in_days"
b) if same email, same timestamp: increment COUNT, add "COUNT, 0 days"
c) if new email, start from 3.
An alternative to 1. is to add a new field TIMESTAMP and remove it when printing out the line.
Note: if 1.5 GB is too huge to sort in one go, split it into smaller chunks, using the email as the split point. You can run these chunks in parallel on different machines.
/usr/bin/gawk -F'","' '{
    split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ");
    for (i=1; i<=12; i++) mdigit[month[i]]=i;
    print $0 "," mktime(substr($4,6,4) " " mdigit[substr($4,3,3)] " " substr($4,1,2) " 00 00 00")
}' < input.txt | /usr/bin/sort -k2 -k7 -n -t, > output_file.txt
output_file.txt:
"DF","00000000#11111.COM","FLTINT1000130394756","26JUL2010","B2C","6799.2",1280102400
"DF","0001HARISH#GMAIL.COM","NF252022031180","09DEC2010","B2C","3439",1291852800
"DF","0001HARISH#GMAIL.COM","NF251742087846","12DEC2010","B2C","1000",1292112000
"DF","0001HARISH#GMAIL.COM","NF251352240086","22DEC2010","B2C","4006",1292976000
...
You then pipe the output to a Perl, Python or AWK script to process steps 2 through 4.
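For what it's worth, a minimal Python sketch of steps 2 through 4, reading the sorted output on stdin (field positions follow the sample data; the appended epoch field is dropped again before printing, as suggested above):

import sys
from datetime import datetime

prev_email, prev_time = None, None
count = 0
for line in sys.stdin:
    line = line.rstrip('\n')
    original = line.rsplit(',', 1)[0]        # drop the appended epoch field again
    fields = line.split('","')
    email = fields[1]
    date = datetime.strptime(fields[3], '%d%b%Y')
    if email != prev_email:
        count, diff_days = 1, 0              # first occurrence of this email
    else:
        count += 1
        diff_days = (date - prev_time).days  # days since the previous record of this email
    print('%s,%d,%d days' % (original, count, diff_days))
    prev_email, prev_time = email, date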
