There are several types of commands in the third column of the text file, so I am using regular expressions to count the number of occurrences of each command type.
For example, ACTIVE occurs 3 times and REFRESH 2 times. To make my program more flexible, I would like to record the time of each command as well.
Since a command can occur more than once, storing the time alongside each command would let users see which ACTIVE occurred at what time. Any guidance or suggestions are welcome.
My code:
import re
a = a_1 = b = b_1 = c = d = e = 0
lines = open("page_stats.txt", "r").readlines()
for line in lines:
    if re.search(r"WRITING_A", line):
        a_1 += 1
    elif re.search(r"WRITING", line):
        a += 1
    elif re.search(r"READING_A", line):
        b_1 += 1
    elif re.search(r"READING", line):
        b += 1
    elif re.search(r"PRECHARGE", line):
        c += 1
    elif re.search(r"ACTIVE", line):
        d += 1
File content:
-----------------------------------------------------------------
| Number | Time | Command | Data |
-----------------------------------------------------------------
| 1 | 0015 | ACTIVE | |
| 2 | 0030 | WRITING | |
| 3 | 0100 | WRITING_A | |
| 4 | 0115 | PRECHARGE | |
| 5 | 0120 | REFRESH | |
| 6 | 0150 | ACTIVE | |
| 7 | 0200 | READING | |
| 8 | 0314 | PRECHARGE | |
| 9 | 0318 | ACTIVE | |
| 10 | 0345 | WRITING_A | |
| 11 | 0430 | WRITING_A | |
| 12 | 0447 | WRITING | |
| 13 | 0503 | PRECHARGE | |
| 14 | 0610 | REFRESH | |
Assuming you want to count the occurrences of each command and store
the timestamps of each command as well, would you please try:
import re
count = {}
timestamps = {}
with open("page_stats.txt", "r") as f:
    for line in f:
        m = re.split(r"\s*\|\s*", line)
        if len(m) > 3 and re.match(r"\d+", m[1]):
            count[m[3]] = count[m[3]] + 1 if m[3] in count else 1
            if m[3] in timestamps:
                timestamps[m[3]].append(m[2])
            else:
                timestamps[m[3]] = [m[2]]
# see the limited result (example)
#print(count["ACTIVE"])
#print(timestamps["ACTIVE"])
# see the results
for key in count:
    print("%-10s: %2d, %s" % (key, count[key], timestamps[key]))
Output:
REFRESH : 2, ['0120', '0610']
WRITING : 2, ['0030', '0447']
PRECHARGE : 3, ['0115', '0314', '0503']
ACTIVE : 3, ['0015', '0150', '0318']
READING : 1, ['0200']
WRITING_A : 3, ['0100', '0345', '0430']
m = re.split(r"\s*\|\s*", line) splits the line on a pipe character which
may be preceded and/or followed by blank characters.
Because each data line starts with a pipe, m[0] is an empty string, and the list elements m[1], m[2], m[3] hold the Number, Time and Command
in that order.
The condition if len(m) > 3 and re.match(r"\d+", m[1]) skips the
header and separator lines.
Then the dictionaries count and timestamps are initialized,
incremented or appended to, one line at a time.
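The same bookkeeping can be written a little more compactly with collections.defaultdict, which removes the explicit membership checks. This is only a sketch: the helper name collect_timestamps and the inline SAMPLE text are made up for illustration, and the column layout is assumed to match the file shown in the question.

```python
import re
from collections import defaultdict

# Hypothetical sample in the same layout as page_stats.txt (illustration only).
SAMPLE = """\
-----------------------------------
| Number | Time | Command   | Data |
-----------------------------------
| 1      | 0015 | ACTIVE    |      |
| 2      | 0030 | WRITING   |      |
| 3      | 0100 | WRITING_A |      |
| 4      | 0150 | ACTIVE    |      |
"""


def collect_timestamps(lines):
    """Map each command to the list of times at which it occurs."""
    timestamps = defaultdict(list)
    for line in lines:
        fields = re.split(r"\s*\|\s*", line.strip())
        # Data rows start with a pipe, so fields[0] is empty and
        # fields[1], fields[2], fields[3] are Number, Time, Command.
        if len(fields) > 3 and fields[1].isdigit():
            timestamps[fields[3]].append(fields[2])
    return dict(timestamps)


timestamps = collect_timestamps(SAMPLE.splitlines())
for command, times in timestamps.items():
    # The count is simply the length of the list of times.
    print("%-10s: %2d, %s" % (command, len(times), times))
```

With a real file, the same function could be called as collect_timestamps(f) inside a with open(...) block, since a file object also iterates line by line.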
I have a dataframe with a grouped date and a count. If there is a gap in this time series, I have to fill it with the excess from previous stacks; if there are no gaps, I have to extend the series until all counts equal 1.
All of these examples fall within the same month.
NOTE: day_date is a timestamp with daily frequency where missing values are 0; I used integers in the examples for simplicity.
An example with missing gaps but no previous stacks:
| day_date | stack |
| -------- | ----- |
| 1 | 0 |
| 2 | 2 |
Produces
| day_date | stack |
| -------- | ----- |
| 1 | 0 |
| 2 | 1 | #
| 3 | 1 | # The entire period flattens to a daily frequency with value = 1
An example of days being over stacked and filling gaps:
| day_date | stack |
| -------- | ----- |
| 1 | 0 |
| 2 | 2 | # this row won't be able to fill the gap up to the 6th
| 6 | 3 | # this row and the ones below will create overlap
| 8 | 2 |
| 15 | 1 | # there is a big gap here that will get filled as much as possible from previous overlap
Produces:
| day_date | stack |
| -------- | ----- |
| 1 | 0 |
| 2 | 1 |
| 3 | 1 |
| 4 | 0 | # the previous stack covered only until the 3rd.
| 5 | 0 |
| 6 | 1 |
| 7 | 1 |
| 8 | 1 | # Here the last day of the stack from the 6th overlaps with the 2 days from the 8th, so the two days from the 8th move forward to fill gaps, as the 8th is already covered by the earlier stack.
| 9 | 1 | # the big gap here gets filled as far as possible by the excess from the 8th: 2 days, which fill the 9th and 10th.
| 10 | 1 |
| 11 | 0 |
| 12 | 0 |
| 13 | 0 |
| 14 | 0 |
| 15 | 1 | #last stack.
Note that the 9th and 10th have a 1 because of the excess from the 8th, which was already covered by the refill on the 6th (covering the 6th to the 8th).
EDIT: using timestamps
Maybe a more readable solution (for beginners) using for loops and a bunch of if statements:
import pandas as pd
lst = [[pd.Timestamp(year=2017, month=1, day=1), 0],
       [pd.Timestamp(year=2017, month=1, day=2), 2],
       [pd.Timestamp(year=2017, month=1, day=10), 3],
       [pd.Timestamp(year=2017, month=2, day=1), 2],
       [pd.Timestamp(year=2017, month=2, day=3), 2]]
df = pd.DataFrame(lst, columns=['day_date', 'stack'])
n_days = (df.day_date.max() - df.day_date.min()).days + 1
stack = 0
for index in range(n_days):
    stack += df.loc[index, 'stack']
    # insert new day
    if index + 1 < len(df):  # if you are not at the end of the dataframe
        next_day = df.loc[index+1].day_date  # compute the next day in dataframe
        this_day = df.loc[index].day_date  # compute this day
        if df.loc[index, 'stack'] >= 1:
            df.loc[index, 'stack'] = 1
            stack -= 1
        if this_day + pd.DateOffset(1) != next_day:  # if there is a gap in days
            for new_day in range(1, (next_day - this_day).days):
                if stack > 0:
                    df.loc[len(df)] = [this_day + pd.DateOffset(new_day), 1]
                    stack -= 1
                else:
                    df.loc[len(df)] = [this_day + pd.DateOffset(new_day), 0]
            df = df.sort_values('day_date').reset_index(drop=True)
    else:
        if df.loc[index, 'stack'] >= 1:
            df.loc[index, 'stack'] = 1
            stack -= 1
        while stack >= 1:
            this_day = df.loc[len(df)-1].day_date
            df.loc[len(df)] = [this_day + pd.DateOffset(1), 1]
            stack -= 1
This is not such an easy task (if it needs to be done in a vectorized way).
You can first calculate the remaining days to carry over to the next date, then use reindexing to duplicate/fill the rows:
remainder = (df['stack'].add(df['day_date'].diff(-1))
             .fillna(0, downcast='infer').clip(lower=0)
             )
df2 = (df
   # shift extra "stack" to next stack
   .assign(stack=df['stack'].sub(remainder).add(remainder.shift(fill_value=0)))
   # repeat rows using "stack" value with a minimum of 1
   .loc[lambda d: d.index.repeat(d['stack'].clip(lower=1))]
   # make stack>1 equal to 1
   # and increment the days per group
   .assign(stack=lambda d: d['stack'].clip(upper=1),
           day_date=lambda d: d['day_date'].add(
               (m:=d['day_date'].duplicated())
               .astype(int)
               .groupby((~m).cumsum())
               .cumsum()
           )
          )
   # fill missing days (all remaining lines)
   .set_index('day_date')
   .reindex(range(df['day_date'].min(), df['day_date'].max()+1))
   .fillna(0, downcast='infer')
   .reset_index()
)
output:
day_date stack
0 1 0
1 2 1
2 3 1
3 4 0
4 5 0
5 6 1
6 7 1
7 8 1
8 9 1
9 10 1
10 11 0
11 12 0
12 13 0
13 14 0
14 15 1
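For comparison, the carry-over rule from the question can also be sketched in plain Python, without pandas. This assumes integer days as in the question's simplified examples and non-negative stack values; the function name spread_stacks and the dict input are made up for illustration and are not part of either answer above.

```python
def spread_stacks(stacks):
    """Spread stacked days forward, one per day, starting at the first day.

    `stacks` maps an integer day to its stack count; the result is a list
    of (day, 0 or 1) pairs covering every day in the range.
    """
    first, last = min(stacks), max(stacks)
    carry = 0  # days that still have to be placed
    result = []
    day = first
    # Walk every calendar day; keep going past `last` while a surplus remains.
    while day <= last or carry > 0:
        carry += stacks.get(day, 0)
        if carry > 0:
            result.append((day, 1))
            carry -= 1
        else:
            result.append((day, 0))
        day += 1
    return result


# The two examples from the question.
short = spread_stacks({1: 0, 2: 2})
long = spread_stacks({1: 0, 2: 2, 6: 3, 8: 2, 15: 1})
print(short)
print([value for _, value in long])
```

The single carry counter plays the role of the "excess from previous stacks": it grows when a day brings more than one unit and is drained one unit per day, including on gap days.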
I have a column in a dataframe as follows:
| Category |
------------
| B5050.88
| 5051.90
| B5050.97Q
| 5051.23B
| 5051.78E
| B5050.11
| 5051.09
| Z5052
I want to extract the text after the period. For example, from B5050.88, I want only "88"; from 5051.78E, I want only "78E"; for Z5052, it would be nothing as there's no period.
Expected output:
| Category | Digits |
---------------------
| B5050.88 | 88 |
| 5051.90 | 90 |
| B5050.97Q| 97Q |
| 5051.23B | 23B |
| 5051.78E | 78E |
| B5050.11 | 11 |
| 5051.09 | 09 |
| Z5052 | - |
I tried using this
df['Digits'] = df.Category.str.extract('.(.*)')
But I'm not getting the right answer. Using the above, for B5050.88, I'm getting the same B5050.88; for 5051.09, I'm getting NaN. Basically NaN if there's no text.
You can do
splits = [str(p).split(".") for p in df["Category"]]
df["Digits"] = [p[1] if len(p)>1 else "-" for p in splits]
i.e
df = pd.DataFrame({"Category":["5050.88","5051.90","B5050.97","5051.23B","5051.78E",
"B5050.11","5051.09","Z5052"]})
#df
# Category
# 0 5050.88
# 1 5051.90
# 2 B5050.97
# 3 5051.23B
# 4 5051.78E
# 5 B5050.11
# 6 5051.09
# 7 Z5052
splits = [str(p).split(".") for p in df["Category"]]
splits
# [['5050', '88'],
# ['5051', '90'],
# ['B5050', '97'],
# ['5051', '23B'],
# ['5051', '78E'],
# ['B5050', '11'],
# ['5051', '09'],
# ['Z5052']]
df["Digits"] = [p[1] if len(p)>1 else "-" for p in splits]
df
# Category Digits
# 0 5050.88 88
# 1 5051.90 90
# 2 B5050.97 97
# 3 5051.23B 23B
# 4 5051.78E 78E
# 5 B5050.11 11
# 6 5051.09 09
# 7 Z5052 -
not so pretty but it works
EDIT:
Added the "-" instead of NaN and the code snippet
Another way
df.Category.str.split(r'[\.]').str[1]
0 88
1 90
2 97Q
3 23B
4 78E
5 11
6 09
7 NaN
Alternatively
df.Category.str.extract(r'(?<=[.])(\w+)')
You need to escape your first . and do fillna:
df["Digits"] = df["Category"].astype(str).str.extract(r"\.(.*)", expand=False).fillna("-")
print(df)
Output:
Category Digits
0 B5050.88 88
1 5051.90 90
2 B5050.97Q 97Q
3 5051.23B 23B
4 5051.78E 78E
5 B5050.11 11
6 5051.09 09
7 Z5052 -
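To see why the escape matters, here is a small sketch using only the standard re module rather than pandas. An unescaped . matches any character, so the capture group starts right after the first character of the string; an escaped \. matches only a literal period. The helper name text_after_period is hypothetical.

```python
import re


def text_after_period(value, default="-"):
    """Return everything after the first literal period, or `default`."""
    match = re.search(r"\.(.*)", str(value))
    return match.group(1) if match else default


# Unescaped dot: "." consumes the first character, the group takes the rest.
unescaped = re.search(r".(.*)", "B5050.88").group(1)

# Escaped dot: the group starts after the literal period.
results = [text_after_period(v) for v in ["B5050.88", "5051.78E", "Z5052"]]
print(unescaped, results)
```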
Try the following:
df['Category'].apply(lambda x: x.split(".")[-1] if "." in x else "-")
I know that I've seen some example somewhere before but for the life of me I cannot find it when googling around.
I have some rows of data:
data = [[1,2,3],
[4,5,6],
[7,8,9],
]
And I want to output this data in a table, e.g.
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+
| 7 | 8 | 9 |
+---+---+---+
Obviously I could use a library like prettytable or download pandas or something, but I'd rather not do that.
I just want to output my rows as tables in my Jupyter notebook cell. How do I do this?
There is a nice trick: wrap the data with pandas DataFrame.
import pandas as pd
data = [[1, 2], [3, 4]]
pd.DataFrame(data, columns=["Foo", "Bar"])
It displays data like:
| Foo | Bar |
0 | 1 | 2 |
1 | 3 | 4 |
I just discovered that tabulate has a HTML option and is rather simple to use.
Update: As of Jupyter v6 and later, the returned table should just render via the output cell:
import tabulate
data = [["Sun",696000,1989100000],
["Earth",6371,5973.6],
["Moon",1737,73.5],
["Mars",3390,641.85]]
table = tabulate.tabulate(data, tablefmt='html')
table
As for Jupyter v5 or earlier, you may need to be more explicit, similar to Werner's answer:
from IPython.display import HTML, display
display(HTML(table))
Still looking for something simple to use to create more complex table layouts like with latex syntax and formatting to merge cells and do variable substitution in a notebook:
Allow references to Python variables in Markdown cells #2958
I finally re-found the jupyter/IPython documentation that I was looking for.
I needed this:
from IPython.display import HTML, display
data = [[1,2,3],
        [4,5,6],
        [7,8,9],
        ]
display(HTML(
    '<table><tr>{}</tr></table>'.format(
        '</tr><tr>'.join(
            '<td>{}</td>'.format('</td><td>'.join(str(_) for _ in row)) for row in data)
        )
))
(I may have slightly mucked up the comprehensions, but display(HTML('some html here')) is what we needed)
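The nested format/join calls can be easier to follow when the HTML string is built in a small helper first. The sketch below uses only the standard library; the function name rows_to_html is made up for illustration, and the html.escape call is an addition so that cell values containing < or & do not break the markup.

```python
from html import escape


def rows_to_html(rows):
    """Build an HTML table string from a list of rows."""
    body = "".join(
        "<tr>"
        + "".join("<td>{}</td>".format(escape(str(cell))) for cell in row)
        + "</tr>"
        for row in rows
    )
    return "<table>{}</table>".format(body)


markup = rows_to_html([[1, 2], ["a<b", 4]])
print(markup)
```

In a notebook, the resulting string would then be passed to display(HTML(markup)), as in the snippet above.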
tabletext fits this well
import tabletext
data = [[1,2,30],
        [4,23125,6],
        [7,8,999],
        ]
print(tabletext.to_text(data))
result:
┌───┬───────┬─────┐
│ 1 │ 2 │ 30 │
├───┼───────┼─────┤
│ 4 │ 23125 │ 6 │
├───┼───────┼─────┤
│ 7 │ 8 │ 999 │
└───┴───────┴─────┘
If you don't mind using a bit of html, something like this should work.
from IPython.display import HTML, display
def display_table(data):
    html = "<table>"
    for row in data:
        html += "<tr>"
        for field in row:
            html += "<td><h4>%s</h4></td>"%(field)
        html += "</tr>"
    html += "</table>"
    display(HTML(html))
And then use it like this
data = [[1,2,3],[4,5,6],[7,8,9]]
display_table(data)
I used to have the same problem. I could not find anything that would help me so I ended up making the class PrintTable--code below. There is also an output. The usage is simple:
ptobj = PrintTable(yourdata, column_captions, column_widths, text_aligns)
ptobj.print()
or in one line:
PrintTable(yourdata, column_captions, column_widths, text_aligns).print()
Output:
-------------------------------------------------------------------------------------------------------------
Name | Column 1 | Column 2 | Column 3 | Column 4 | Column 5
-------------------------------------------------------------------------------------------------------------
Very long name 0 | 0 | 0 | 0 | 0 | 0
Very long name 1 | 1 | 2 | 3 | 4 | 5
Very long name 2 | 2 | 4 | 6 | 8 | 10
Very long name 3 | 3 | 6 | 9 | 12 | 15
Very long name 4 | 4 | 8 | 12 | 16 | 20
Very long name 5 | 5 | 10 | 15 | 20 | 25
Very long name 6 | 6 | 12 | 18 | 24 | 30
Very long name 7 | 7 | 14 | 21 | 28 | 35
Very long name 8 | 8 | 16 | 24 | 32 | 40
Very long name 9 | 9 | 18 | 27 | 36 | 45
Very long name 10 | 10 | 20 | 30 | 40 | 50
Very long name 11 | 11 | 22 | 33 | 44 | 55
Very long name 12 | 12 | 24 | 36 | 48 | 60
Very long name 13 | 13 | 26 | 39 | 52 | 65
Very long name 14 | 14 | 28 | 42 | 56 | 70
Very long name 15 | 15 | 30 | 45 | 60 | 75
Very long name 16 | 16 | 32 | 48 | 64 | 80
Very long name 17 | 17 | 34 | 51 | 68 | 85
Very long name 18 | 18 | 36 | 54 | 72 | 90
Very long name 19 | 19 | 38 | 57 | 76 | 95
-------------------------------------------------------------------------------------------------------------
The code for the class PrintTable
# -*- coding: utf-8 -*-
# Class
class PrintTable:
    def __init__(self, values, captions, widths, aligns):
        if not all([len(values[0]) == len(x) for x in [captions, widths, aligns]]):
            raise Exception()
        self._tablewidth = sum(widths) + 3*(len(captions)-1) + 4
        self._values = values
        self._captions = captions
        self._widths = widths
        self._aligns = aligns

    def print(self):
        self._printTable()

    def _printTable(self):
        formattext_head = ""
        formattext_cell = ""
        for i,v in enumerate(self._widths):
            formattext_head += "{" + str(i) + ":<" + str(v) + "} | "
            formattext_cell += "{" + str(i) + ":" + self._aligns[i] + str(v) + "} | "
        formattext_head = formattext_head[:-3]
        formattext_head = " " + formattext_head.strip() + " "
        formattext_cell = formattext_cell[:-3]
        formattext_cell = " " + formattext_cell.strip() + " "
        print("-"*self._tablewidth)
        print(formattext_head.format(*self._captions))
        print("-"*self._tablewidth)
        for w in self._values:
            print(formattext_cell.format(*w))
        print("-"*self._tablewidth)
Demonstration
# Demonstration
headername = ["Column {}".format(x) for x in range(6)]
headername[0] = "Name"
data = [["Very long name {}".format(x), x, x*2, x*3, x*4, x*5] for x in range(20)]
PrintTable(data,
           headername,
           [70, 10, 10, 10, 10, 10],
           ["<",">",">",">",">",">"]).print()
You could try to use the following function
def tableIt(data):
    for lin in data:
        print("+---"*len(lin)+"+")
        for inlin in lin:
            print("|",str(inlin),"", end="")
        print("|")
    print("+---"*len(lin)+"+")

data = [[1,2,3,2,3],[1,2,3,2,3],[1,2,3,2,3],[1,2,3,2,3]]
tableIt(data)
OK, so this was a bit harder than I thought:
def print_matrix(list_of_list):
    number_width = len(str(max([max(i) for i in list_of_list])))
    cols = max(map(len, list_of_list))
    output = '+'+('-'*(number_width+2)+'+')*cols + '\n'
    for row in list_of_list:
        for column in row:
            output += '|' + ' {:^{width}d} '.format(column, width = number_width)
        output+='|\n+'+('-'*(number_width+2)+'+')*cols + '\n'
    return output
This should work for variable number of rows, columns and number of digits (for numbers)
data = [[1,2,30],
[4,23125,6],
[7,8,999],
]
print(print_matrix(data))
+-------+-------+-------+
|   1   |   2   |  30   |
+-------+-------+-------+
|   4   | 23125 |   6   |
+-------+-------+-------+
|   7   |   8   |  999  |
+-------+-------+-------+
A general purpose set of functions to render any python data structure (dicts and lists nested together) as HTML.
from IPython.display import HTML, display
def _render_list_html(l):
    o = []
    for e in l:
        o.append('<li>%s</li>' % _render_as_html(e))
    return '<ol>%s</ol>' % ''.join(o)

def _render_dict_html(d):
    o = []
    for k, v in d.items():
        o.append('<tr><td>%s</td><td>%s</td></tr>' % (str(k), _render_as_html(v)))
    return '<table>%s</table>' % ''.join(o)

def _render_as_html(e):
    # Returns an HTML fragment only, so nested calls do not
    # produce nested <html><body> wrappers.
    o = []
    if isinstance(e, list):
        o.append(_render_list_html(e))
    elif isinstance(e, dict):
        o.append(_render_dict_html(e))
    else:
        o.append(str(e))
    return ''.join(o)

def render_as_html(e):
    # Wrap the rendered fragment once, at the top level.
    display(HTML('<html><body>%s</body></html>' % _render_as_html(e)))
I recently used prettytable for rendering a nice ASCII table. It's similar to the postgres CLI output.
import pandas as pd
from prettytable import PrettyTable
data = [[1,2,3],[4,5,6],[7,8,9]]
df = pd.DataFrame(data, columns=['one', 'two', 'three'])
def generate_ascii_table(df):
    x = PrettyTable()
    x.field_names = df.columns.tolist()
    for row in df.values:
        x.add_row(row)
    print(x)
    return x

generate_ascii_table(df)
Output:
+-----+-----+-------+
| one | two | three |
+-----+-----+-------+
| 1 | 2 | 3 |
| 4 | 5 | 6 |
| 7 | 8 | 9 |
+-----+-----+-------+
I want to output a table where each column has the smallest possible width,
where columns are padded with white space (but this can be changed) and rows are separated by newlines (but this can be changed) and where each item is formatted using str (but...).
def ftable(tbl, pad=' ', sep='\n', normalize=str):
    # normalize the content to the most useful data type
    strtbl = [[normalize(it) for it in row] for row in tbl]
    # next, for each column we compute the maximum width needed
    w = [0 for _ in tbl[0]]
    for row in strtbl:
        for ncol, it in enumerate(row):
            w[ncol] = max(w[ncol], len(it))
    # a string is built iterating on the rows and the items of `strtbl`:
    # items are prepended with white space to a uniform column width
    # formatted items are `join`ed using `pad` (by default " ")
    # eventually we join the rows using `sep` (by default a newline) and return
    return sep.join(pad.join(' '*(wid-len(it))+it for wid, it in zip(w, row))
                    for row in strtbl)
The function signature, ftable(tbl, pad=' ', sep='\n', normalize=str), with its default arguments is intended to
provide for maximum flexibility.
You can customize
the column padding,
the row separator (e.g., pad='&', sep='\\\\\n' to have the bulk of a LaTeX table),
the function to be used to normalize the input to a common string format --- by default, for the maximum generality, it is str, but if you know that all your data is floating point, lambda item: "%.4f"%item could be a reasonable choice, etc.
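The customization options can be tried out with a compact restatement of the function. This is a sketch, not the original code: ftable_sketch is a hypothetical name, and it uses str.rjust in place of the manual ' '*(wid-len(it)) padding, which should give the same right-aligned columns.

```python
def ftable_sketch(tbl, pad=' ', sep='\n', normalize=str):
    """Right-align every column to its widest item (compact variant)."""
    strtbl = [[normalize(it) for it in row] for row in tbl]
    # Width of each column: the longest string found in that column.
    widths = [max(len(row[i]) for row in strtbl) for i in range(len(strtbl[0]))]
    return sep.join(
        pad.join(it.rjust(w) for w, it in zip(widths, row)) for row in strtbl
    )


# Body of a LaTeX table: cells joined by "&", rows ended by "\\" and a newline.
latex_body = ftable_sketch([[1, 22], [333, 4]], pad='&', sep='\\\\\n')
print(latex_body)

# Fixed-precision floats via a custom normalize function.
fixed = ftable_sketch([[1.5, 2], [3, 4.25]], normalize=lambda item: "%.2f" % item)
print(fixed)
```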
Superficial testing:
I need some test data, possibly involving columns of different width
so that the algorithm needs to be a little more sophisticated (but just a little bit;)
In [1]: from random import randrange
In [2]: table = [[randrange(10**randrange(10)) for i in range(5)] for j in range(3)]
In [3]: table
Out[3]:
[[974413992, 510, 0, 3114, 1],
[863242961, 0, 94924, 782, 34],
[1060993, 62, 26076, 75832, 833174]]
In [4]: print(ftable(table))
974413992 510     0  3114      1
863242961   0 94924   782     34
  1060993  62 26076 75832 833174
In [5]: print(ftable(table, pad='|'))
974413992|510|    0| 3114|     1
863242961|  0|94924|  782|    34
  1060993| 62|26076|75832|833174
You can add your own formatters. Recursion is optional but really nice.
Try this in JupyterLite:
from html import escape

fmtr = get_ipython().display_formatter.formatters['text/html']

def getfmtr(obj, func=None):
    # use the registered HTML formatter for this type, if there is one
    if fmtr.for_type(type(obj)):
        return fmtr.for_type(type(obj))(obj)
    else:
        return escape(obj.__str__()).replace("\n", "<br>")

def strfmtr(obj):
    return escape(obj.__str__()).replace("\n", "<br>")

fmtr.for_type(str, strfmtr)

def listfmtr(self):
    _repr_ = []
    _repr_.append("<table>")
    for item in self:
        _repr_.append("<tr>")
        _repr_.append("<td>")
        _repr_.append(getfmtr(item))
        _repr_.append("</td>")
        _repr_.append("</tr>")
    _repr_.append("</table>")
    return str().join(_repr_)

fmtr.for_type(list, listfmtr)

def dictfmtr(self):
    _repr_ = []
    _repr_.append("<table>")
    # header cells: one per key
    for key in self:
        _repr_.append("<th>")
        _repr_.append(getfmtr(key))
        _repr_.append("</th>")
    # a single row holding the values
    _repr_.append("<tr>")
    for key, value in self.items():
        _repr_.append("<td>")
        _repr_.append(getfmtr(value))
        _repr_.append("</td>")
    _repr_.append("</tr>")
    _repr_.append("</table>")
    return str().join(_repr_)

fmtr.for_type(dict, dictfmtr)
[
    "Jupyter is really cool!",
    [1, 2],
    [
        {"Name": "Adams", "Age": 32},
        {"Name": "Baker", "Age": 32}
    ]
]