There are several types of commands in the third column of the text file, and I am using regular expressions to count the occurrences of each type. For example, ACTIVE occurs 3 times and REFRESH 2 times. I would like to make the program more flexible by also recording the time of each command.
Since one command can occur more than once, if the script associates each command with its time, users will know which ACTIVE occurs at what time. Any guidance or suggestions are welcome.
My code:
import re

a = a_1 = b = b_1 = c = d = e = 0
lines = open("page_stats.txt", "r").readlines()
for line in lines:
    if re.search(r"WRITING_A", line):
        a_1 += 1
    elif re.search(r"WRITING", line):
        a += 1
    elif re.search(r"READING_A", line):
        b_1 += 1
    elif re.search(r"READING", line):
        b += 1
    elif re.search(r"PRECHARGE", line):
        c += 1
    elif re.search(r"ACTIVE", line):
        d += 1
File content:
-----------------------------------------------------------------
| Number | Time | Command | Data |
-----------------------------------------------------------------
| 1 | 0015 | ACTIVE | |
| 2 | 0030 | WRITING | |
| 3 | 0100 | WRITING_A | |
| 4 | 0115 | PRECHARGE | |
| 5 | 0120 | REFRESH | |
| 6 | 0150 | ACTIVE | |
| 7 | 0200 | READING | |
| 8 | 0314 | PRECHARGE | |
| 9 | 0318 | ACTIVE | |
| 10 | 0345 | WRITING_A | |
| 11 | 0430 | WRITING_A | |
| 12 | 0447 | WRITING | |
| 13 | 0503 | PRECHARGE | |
| 14 | 0610 | REFRESH | |
Assuming you want to count the occurrences of each command and store
the timestamps of each command as well, would you please try:
import re

count = {}
timestamps = {}
with open("page_stats.txt", "r") as f:
    for line in f:
        m = re.split(r"\s*\|\s*", line)
        if len(m) > 3 and re.match(r"\d+", m[1]):
            count[m[3]] = count[m[3]] + 1 if m[3] in count else 1
            if m[3] in timestamps:
                timestamps[m[3]].append(m[2])
            else:
                timestamps[m[3]] = [m[2]]

# see a limited result (example)
#print(count["ACTIVE"])
#print(timestamps["ACTIVE"])

# see the results
for key in count:
    print("%-10s: %2d, %s" % (key, count[key], timestamps[key]))
Output:
REFRESH : 2, ['0120', '0610']
WRITING : 2, ['0030', '0447']
PRECHARGE : 3, ['0115', '0314', '0503']
ACTIVE : 3, ['0015', '0150', '0318']
READING : 1, ['0200']
WRITING_A : 3, ['0100', '0345', '0430']
m = re.split(r"\s*\|\s*", line) splits the line on a pipe character, which may be preceded and/or followed by blank characters.
The list elements m[1], m[2], m[3] are then assigned the Number, Time, and Command fields, in that order.
The condition if len(m) > 3 and re.match(r"\d+", m[1]) skips the header lines.
Then the dictionary variables count and timestamps are assigned, incremented, or appended to, line by line.
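A minimal sketch of the same bookkeeping with collections.defaultdict, which removes the in/else branches. The inline sample stands in for open("page_stats.txt") so the snippet is self-contained; swap it for the real file in practice:

```python
import re
from collections import defaultdict

def tally(lines):
    """Count each command and collect its timestamps."""
    count = defaultdict(int)
    timestamps = defaultdict(list)
    for line in lines:
        # split on "|" with optional surrounding whitespace
        m = re.split(r"\s*\|\s*", line)
        # data rows have a number in the first column
        if len(m) > 3 and re.match(r"\d+", m[1]):
            count[m[3]] += 1
            timestamps[m[3]].append(m[2])
    return count, timestamps

# a small inline sample; replace with open("page_stats.txt") for the real file
sample = [
    "| 1      | 0015 | ACTIVE    |      |",
    "| 5      | 0120 | REFRESH   |      |",
    "| 6      | 0150 | ACTIVE    |      |",
]
count, timestamps = tally(sample)
```

defaultdict supplies the missing-key default (0 or []) automatically, so counting and appending need no membership test.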
I have a csv which has data that looks like this
| id | code | date                      |
|----|------|---------------------------|
| 1  | 2    | 2022-10-05 07:22:39+00:00 |
| 1  | 0    | 2022-11-05 02:22:35+00:00 |
| 2  | 3    | 2021-01-05 10:10:15+00:00 |
| 2  | 0    | 2019-01-11 10:05:21+00:00 |
| 2  | 1    | 2022-01-11 10:05:22+00:00 |
| 3  | 2    | 2022-10-10 11:23:43+00:00 |
I want to remove duplicate ids based on the following conditions:
For the code column, choose the value that is not equal to 0 and, among those, the one with the latest timestamp.
Add another column prev_code, containing a list of all the remaining code values that were not chosen for the code column.
Something like this -
| id | code | prev_code |
|----|------|-----------|
| 1  | 2    | [0]       |
| 2  | 1    | [0,2]     |
| 3  | 2    | []        |
There is probably a sleeker solution, but something along the following lines should work:
df = pd.read_csv('file.csv')
# latest non-zero code per id
lastcode = df[df.code != 0].groupby('id').apply(lambda block: block[block['date'] == block['date'].max()]['code'])
# every other code value per id
prev_codes = df.groupby('id').agg(code=('code', lambda x: [val for val in x if val != lastcode[x.name].values[0]]))['code']
pd.DataFrame({'id': [i[0] for i in lastcode.index], 'code': lastcode.values, 'prev_code': prev_codes.values})
So I currently have a large csv containing data for a number of events.
Column one contains a number of dates as well as some ids for each event.
Basically, I want to write something in Python so that whenever there is an id number (AL.....) it creates a new csv, titled with that id number, containing all the data before the next id number, so I end up with a csv for each event.
For info, the whole csv contains 8 columns, but the division into individual csvs is predicated only on column one.
Use Python to split a CSV file with multiple headers
I notice this question is quite similar, but in my case I have AL followed by a different string of numbers each time, and I also want to name the new csvs by the id numbers.
You can achieve this using pandas, so let's first generate some data:
import pandas as pd
import numpy as np
def date_string():
return str(np.random.randint(1, 32)) + "/" + str(np.random.randint(1, 13)) + "/1997"
l = [date_string() for x in range(20)]
l[0] = "AL123"
l[10] = "AL321"
df = pd.DataFrame(l, columns=['idx'])
# -->
| | idx |
|---:|:-----------|
| 0 | AL123 |
| 1 | 24/3/1997 |
| 2 | 8/6/1997 |
| 3 | 6/9/1997 |
| 4 | 31/12/1997 |
| 5 | 11/6/1997 |
| 6 | 2/3/1997 |
| 7 | 31/8/1997 |
| 8 | 21/5/1997 |
| 9 | 30/1/1997 |
| 10 | AL321 |
| 11 | 8/4/1997 |
| 12 | 21/7/1997 |
| 13 | 9/10/1997 |
| 14 | 31/12/1997 |
| 15 | 15/2/1997 |
| 16 | 21/2/1997 |
| 17 | 3/3/1997 |
| 18 | 16/12/1997 |
| 19 | 16/2/1997 |
So the interesting positions are 0 and 10, where the AL* strings sit...
Now to filter the AL* rows you can use:
idx = df.index[df['idx'].str.startswith('AL')]  # every index where an AL* id appears
dfs = np.split(df, idx)  # splits the data at those positions
for out in dfs[1:]:
    name = out.iloc[0, 0]
    out.to_csv(name + ".csv", index=False, header=False)  # saves each block
This gives you two csv files named AL123.csv and AL321.csv with the first line being the AL* string.
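If you'd rather stay with the standard library, the same split can be sketched with plain csv. split_by_id is a hypothetical helper name, and each output keeps the AL* marker as its first line, matching the pandas version:

```python
import csv

def split_by_id(rows, prefix="AL"):
    """Group rows into {id: [rows...]}, each block starting at an AL* marker row."""
    blocks = {}
    current = None
    for row in rows:
        if row and row[0].startswith(prefix):
            current = row[0]
            blocks[current] = [row]       # keep the marker as the first line
        elif current is not None:
            blocks[current].append(row)   # everything else joins the current block
    return blocks

rows = [["AL123"], ["24/3/1997"], ["8/6/1997"], ["AL321"], ["8/4/1997"]]
blocks = split_by_id(rows)
for name, body in blocks.items():
    with open(name + ".csv", "w", newline="") as f:
        csv.writer(f).writerows(body)     # one csv per id, named after the marker
```

In the real case, the rows list would come from csv.reader over the big input file.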
I have the following table.
| product | check | check1 | type | amount |
|---------|-------|--------|------|--------|
| A | 1 | a | c | -10 |
| A | 1 | a | p | 20 |
| B | 2 | b | c | 20 |
| B | 2 | b | p | 20 |
| C | 3 | c | c | -10 |
| D | 4 | d | p | 15 |
| D | 4 | d | c | -15 |
I want to sum the amount for rows where the first three columns are equal, one row's type is 'c' and the other's is 'p', the 'c' row's amount is negative, and the 'p' row's amount is positive; otherwise the rows should not be summed. When rows are summed, the resulting type should be 'c' if the amount is negative and 'p' otherwise. See the required output below:
| product | check | check1 | type | amount |
|---------|-------|--------|------|--------|
| A | 1 | a | p | 10 |
| B | 2 | b | c | 20 |
| B | 2 | b | p | 20 |
| C | 3 | c | c | -10 |
| D | 4 | d | p | 0 |
I have tried groupby on the first three columns and then applying a lambda function:
df = df.groupby(['product', 'check', 'check1']).apply(lambda x, y : x + y, x.loc[(x['type']=='c')], y.loc[(y['type']=='p')], 'amount')
This gives a NameError saying 'x' is not defined. I am also not sure whether this is the right way to go, so if you have any tips please let me know!
Here is a solution for this; maybe not the most efficient, but it works!
new_df = pd.DataFrame()
for product in df['product'].unique():
    for check in df[df['product'] == product].check.unique():
        for check1 in df[(df['product'] == product) & (df.check == check)].check1.unique():
            tmp = df[(df['product'] == product) & (df.check == check) & (df.check1 == check1)]
            if len(tmp[((tmp.type == 'c') & (tmp.amount < 0)) |
                       ((tmp.type == 'p') & (tmp.amount > 0))]) != 2:
                new_df = pd.concat([new_df, tmp], ignore_index=True)
            else:
                amount = tmp['amount'].sum()
                row_type = 'c' if amount < 0 else 'p'
                elt = {
                    'product': product,
                    'check': check,
                    'check1': check1,
                    'type': row_type,
                    'amount': amount,
                }
                new_df = pd.concat([new_df, pd.DataFrame([elt])], ignore_index=True)
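The triple loop can also be collapsed into a single groupby over the three key columns. This is a sketch of the same rule with the question's table inlined; collapse is a hypothetical helper name:

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['A', 'A', 'B', 'B', 'C', 'D', 'D'],
    'check':   [1, 1, 2, 2, 3, 4, 4],
    'check1':  ['a', 'a', 'b', 'b', 'c', 'd', 'd'],
    'type':    ['c', 'p', 'c', 'p', 'c', 'p', 'c'],
    'amount':  [-10, 20, 20, 20, -10, 15, -15],
})

def collapse(g):
    # sum only a (negative c, positive p) pair; otherwise keep the rows as-is
    neg_c = (g['type'] == 'c') & (g['amount'] < 0)
    pos_p = (g['type'] == 'p') & (g['amount'] > 0)
    if len(g) == 2 and neg_c.sum() == 1 and pos_p.sum() == 1:
        total = g['amount'].sum()
        row = g.iloc[[0]].copy()
        row['type'] = 'c' if total < 0 else 'p'
        row['amount'] = total
        return row
    return g

out = (df.groupby(['product', 'check', 'check1'], group_keys=False)
         .apply(collapse)
         .reset_index(drop=True))
```

Each group is visited once, so there is no repeated filtering of the whole frame.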
I would like to take values from two csv files and put them into a single csv file.
Please refer to the data in these two csv files:
CSV 1:
| | Status | P | F | B | IP | NI | NA | CO | U |
|---|----------|----|---|---|----|----|----|----|---|
| 0 | Sanity 1 | 14 | | | | | | 1 | |
| 1 | Sanity 2 | 13 | | 1 | | | | 1 | |
| | | | | | | | | | |
CSV 2:
| | Status | P | F | B | IP | NI | NA | CO | U |
|---|------------|-----|---|---|----|----|----|----|---|
| 0 | P0 Dry Run | 154 | 1 | | | 1 | | | 5 |
| | | | | | | | | | |
| | | | | | | | | | |
Code:
I tried the following code:
filenames = glob.glob("C:\\Users\\gomathis\\Downloads\\To csv\\*.csv")
wf = csv.writer(open("C:\\Users\\gomathis\\Downloads\\To csv\\FinalTR.csv", 'wb'))
for f in filenames:
    rd = csv.writer(open(f, 'r'))
    next(rd)
    for row in rd:
        wf.writerow(row)
Actual result:
While trying the above code, I didn't get any values from those CSV files.
Expected result:
I need the two files to be combined into a single csv file and saved locally.
Modified code:
filenames = glob.glob("C:\\Users\\gomathis\\Downloads\\To csv\\*.csv")
wf = csv.writer(open("C:\\Users\\gomathis\\Downloads\\To csv\\FinalTR.csv", 'w'))
print(filenames)
for f in filenames:
    rd = csv.reader(open(f, 'r', newline=''))
    next(rd)
    for row in rd:
        wf.writerow(row)
Latest result:
I got the result below after modifying the code, but I didn't get the header row (Status, P, F, B, etc.). Please refer to the latest result:
| 0 | P0 Dry Run - 15/02/18 | 154 | 1 | | | 1 | | | 5 |
|---|--------------------------------|-----|---|---|---|---|---|---|---|
| | | | | | | | | | |
| 0 | Sanity in FRA Prod - 15/02/18 | 14 | | | | | | 1 | |
| | | | | | | | | | |
| 1 | Sanity in SYD Gamma - 15/02/18 | 13 | | 1 | | | | 1 | |
You need to call the csv reader method over your csv files in the loop.
rd = csv.reader(open(f,'r'))
import csv
import glob

dest_fname = "C:\\Users\\gomathis\\Downloads\\To csv\\FinalTR.csv"
src_fnames = glob.glob("C:\\Users\\gomathis\\Downloads\\To csv\\*.csv")

with open(dest_fname, 'w', newline='') as f_out:
    writer = csv.writer(f_out)
    copy_headers = True
    for src_fname in src_fnames:
        # don't want to overwrite the destination file
        if src_fname.endswith('FinalTR.csv'):
            continue
        with open(src_fname, 'r', newline='') as f_in:
            reader = csv.reader(f_in)
            # the header row is copied from the first csv and skipped on the rest
            if copy_headers:
                copy_headers = False
            else:
                next(reader)  # skip header
            for row in reader:
                writer.writerow(row)
Notes:
Placed your open() in with-statements for automatic closing of files.
Removed the binary flag from file modes and added newline='' which is needed for files passed to csv.reader and csv.writer in Python 3.
Changed from csv.writer to csv.reader for the files you were reading from.
Added a copy_headers flag to enable copying headers from the first file and skipping copying of headers from any files after that.
Check source filename and skip it when it matches the destination filename.
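If pandas is available, the same merge can be sketched in a few lines; merge_csvs is a hypothetical helper, not part of the original answer, and the glob pattern is illustrative:

```python
import glob
import os
import pandas as pd

def merge_csvs(src_glob, dest):
    """Concatenate every csv matched by src_glob into dest, keeping one header."""
    frames = [pd.read_csv(f) for f in sorted(glob.glob(src_glob))
              if os.path.abspath(f) != os.path.abspath(dest)]
    pd.concat(frames, ignore_index=True).to_csv(dest, index=False)
```

Usage would look like merge_csvs("C:\\Users\\gomathis\\Downloads\\To csv\\*.csv", "C:\\Users\\gomathis\\Downloads\\To csv\\FinalTR.csv"); read_csv consumes each header, and to_csv writes a single one for the combined frame.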
I have an example csv file with name 'r2.csv':
Factory | Product_Number | Date      | Avg_Noshow | Walk_Cost | Room_Rev
--------|----------------|-----------|------------|-----------|---------
A       | 1              | 01MAY2017 | 5.6        | 125       | 275
A       | 1              | 02MAY2017 | 0          | 200       | 300
A       | 1              | 03MAY2017 | 6.6        | 150       | 250
A       | 1              | 04MAY2017 | 7.5        | 175       | 325
And I would like to read the file and calculate an output column using the code below.
I have the following Python code to read the csv file and transfer the columns to arrays:
# Read csv file
import numpy as np
import scipy.stats as stats
from scipy.stats import poisson, norm
import csv

with open('r2.csv', 'r') as infile:
    reader = csv.DictReader(infile)
    data = {}
    for row in reader:
        for header, value in row.items():
            try:
                data[header].append(value)
            except KeyError:
                data[header] = [value]

# Transfer the columns from lists to arrays for later calculation.
mu = data['Avg_Noshow']
cs = data['Walk_Cost']
co = data['Room_Rev']
mu = map(float, mu)
cs = map(float, cs)
co = map(float, co)
The prior part works fine and it reads data. Following is the function for calculation.
# The following map() call calculates the overbooking number
Overbooking_Number = map(lambda mu_, cs_, co_: np.ceil(poisson.ppf(co_ / (cs_ + co_), mu_)), mu, cs, co)
data['Overbooking_Number'] = Overbooking_Number
header = 'LOC_ID', 'Prod_Class_ID', 'Date', 'Avg_Noshow', 'Walk_Cost', 'Room_Rev', 'Overbooking_Number'

# Write to an output file
with open("output.csv", 'wb') as resultFile:
    wr = csv.writer(resultFile, quoting=csv.QUOTE_ALL)
    wr.writerow(header)
    z = zip(data['LOC_ID'], data['Prod_Class_ID'], data['Date'], data['Avg_Noshow'], data['Walk_Cost'], data['Room_Rev'], data['Overbooking_Number'])
    for i in z:
        wr.writerow(i)
It works fine as well.
However, how can I calculate and output 'Overbooking_Number' with the above function only when 'Avg_Noshow > 0', and output 'Overbooking_Number = 0' when 'Avg_Noshow = 0'?
For example, the output table may look like below:
Factory | Product_Number | Date      | Avg_Noshow | Walk_Cost | Room_Rev | Overbooking_Number
--------|----------------|-----------|------------|-----------|----------|-------------------
A       | 1              | 01MAY2017 | 5.6        | 125       | 275      | ...
A       | 1              | 02MAY2017 | 0          | 200       | 300      | 0
A       | 1              | 03MAY2017 | 6.6        | 150       | 250      | ...
A       | 1              | 04MAY2017 | 7.5        | 175       | 325      | ...
What conditional shall I add to my map(+lambda) function?
Thank you!
If I understand correctly, the condition is that mu should be higher than zero. In that case, you can simply use Python's conditional expression ("ternary operator") inside the lambda, taking care to close all the parentheses:
Overbooking_Number = map(lambda mu_, cs_, co_:
                         np.ceil(poisson.ppf(co_ / (cs_ + co_), mu_))
                         if mu_ > 0 else 0,
                         mu, cs, co)
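The same conditional also reads naturally as a list comprehension. In this sketch, overbook is a hypothetical stand-in for the np.ceil(poisson.ppf(...)) call so the snippet stays self-contained:

```python
def overbook(mu_, cs_, co_):
    # placeholder for np.ceil(poisson.ppf(co_ / (cs_ + co_), mu_))
    return round(mu_ + co_ / (cs_ + co_))

mu = [5.6, 0.0, 6.6]
cs = [125.0, 200.0, 150.0]
co = [275.0, 300.0, 250.0]

# compute only when Avg_Noshow > 0, otherwise emit 0
numbers = [overbook(m, c, o) if m > 0 else 0
           for m, c, o in zip(mu, cs, co)]
```

Unlike map in Python 3, the comprehension yields a list directly, which matters if the result is iterated more than once (e.g. written to csv and also stored).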