I am writing a converter for our Data Department to convert fixed-width files into delimited files. Normally we import the file into Excel, use the text import wizard to set the field lengths, and then save it as a CSV. However, we have started getting files that are millions of records long and thus can't be imported into Excel. The files do not always have spaces between the fields, especially between value fields like phone numbers or zip codes. The headers are also often filled in completely, with no spaces.
A sample of a typical fixed width file we are dealing with:
SequenSack and PaFull Name****************************]JOB TITLE****************]HOSP NAME******************************]Delivery Address***********************]Alternate 1 Address********************]Calculated Text**********************************]POSTNET Bar
000001T1 P1 Sample A Sample 123 Any Street Anytown 12345-6789 12345678900
000002T1 P1 Sample A Sample Director of Medicine 123 Any Street Po Box 1234 Anytown 12345-6789 12345678900
The program needs to break the file into the following delimited fields:
Sequen
Sack and Pa
Full name
Job Title
Hosp Name
Delivery Address
Alternate Address 1
Calculated Text
POSTNET Bar
Each file has a slightly different width for each field, depending on the rest of the job. What I am looking for is a GUI-oriented delimiter tool much like the Excel import wizard for fixed-width files. I am writing this tool in Python as part of a larger tool that does many other file operations, such as breaking up files into multiple up, reversing a file, converting from delimited to fixed width, and check digit checking. I am using Tkinter for the rest of the tools, and it would be ideal if the solution used it as well.
Any help appreciated
If I understand the problem correctly (and there's a good chance I don't...), the simplest solution might be to use a text widget.
Make the first line be a series of spaces the same length as the row. Use a couple of alternating tags (eg: "even" and "odd") to give each character an alternate color so they stand out from one another. The second line would be the header, and any remaining lines would be a couple lines of sample data.
Then, set up bindings on the first row to convert a space into an "x" when the user clicks on a character. If they click on an "x", convert it back to a space. They can then go and click on the character that is the start of each column. When the user is done, you can get the first line of the text widget and it will have an "x" for each column. You then just need a little function that translates that into whatever format you need.
It would look roughly like this (though obviously the colors would be different than what appears on this website)
x x x ...
SequenSack and PaFull Name****************************]JOB...
000001T1 P1 Sample A Sample ...
Here's a quick hack to illustrate the general idea. It's a little sloppy but I think it illustrates the technique. When you run it, click on an area in the first row to set or clear a marker. This will cause the header to be highlighted in alternate colors for each marker.
import sys
import Tkinter as tk
import tkFont

class SampleApp(tk.Tk):
    def __init__(self, *args, **kwargs):
        tk.Tk.__init__(self, *args, **kwargs)
        header = "SequenSack and PaFull Name****************************]JOB TITLE****************]HOSP NAME******************************]Delivery Address***********************]Alternate 1 Address********************]Calculated Text**********************************]POSTNET Bar"
        sample = "000001T1 P1 Sample A Sample 123 Any Street Anytown 12345-6789 12345678900"
        widget = DelimiterWidget(self, header, sample)
        hsb = tk.Scrollbar(orient="horizontal", command=widget.xview)
        widget.configure(xscrollcommand=hsb.set)
        hsb.pack(side="bottom", fill="x")
        widget.pack(side="top", fill="x")

class DelimiterWidget(tk.Text):
    def __init__(self, parent, header, samplerow):
        fixedFont = tkFont.nametofont("TkFixedFont")
        tk.Text.__init__(self, parent, wrap="none", height=3, font=fixedFont)
        self.configure(cursor="left_ptr")
        self.tag_configure("header", background="gray")
        self.tag_configure("even", background="#ffffff")
        self.tag_configure("header_even", background="bisque")
        self.tag_configure("header_odd", background="lightblue")
        self.tag_configure("odd", background="#eeeeee")
        # Line 1: one blank marker cell per header character, in alternating colors
        for i in range(len(header)):
            tag = "even" if i%2==0 else "odd"
            self.insert("end", " ", (tag,))
        self.insert("end", "\n")
        # Line 2: the header; line 3: a row of sample data
        self.insert("end", header+"\n", "header")
        self.insert("end", samplerow, "sample")
        self.configure(state="disabled")
        self.bind("<1>", self.on_click)
        self.bind("<Double-1>", self.on_click)
        self.bind("<Triple-1>", self.on_click)

    def on_click(self, event):
        '''Handle a click on a marker'''
        # "@x,y" is the index of the character under the mouse pointer
        index = self.index("@%s,%s" % (event.x, event.y))
        (line, column) = index.split(".")
        current = self.get(index)
        if line != "1" or current == "\n":
            # Only the cells of the first (marker) row can be toggled
            return "break"
        self.configure(state="normal")
        self.delete(index)
        tag = "even" if int(column)%2 == 0 else "odd"
        char = " " if current == "x" else "x"
        self.insert(index, char, tag)
        self.configure(state="disabled")
        self.highlight_header()
        return "break"

    def highlight_header(self):
        '''Highlight the header based on marker positions'''
        self.tag_remove("header_even", 1.0, "end")
        self.tag_remove("header_odd", 1.0, "end")
        markers = self.get(1.0, "1.0 lineend")
        i = 0
        start = "2.0"
        tag = "header_even"
        while True:
            try:
                i = markers.index("x", i+1)
                end = "2.%s" % i
                self.tag_add(tag, start, end)
                start = self.index(end)
                tag = "header_even" if tag == "header_odd" else "header_odd"
            except ValueError:
                break

if __name__ == "__main__":
    app = SampleApp()
    app.mainloop()
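The "little function" that turns the marker row into field definitions could look roughly like the sketch below. The name markers_to_fields and the (start, width) output format are assumptions made for illustration; like highlight_header above, it treats column 0 as the implicit start of the first field.

```python
def markers_to_fields(markers, line_length=None):
    """Turn a marker row into a list of (start, width) pairs.

    Hypothetical helper: each "x" marks the first character of a column.
    """
    starts = [i for i, ch in enumerate(markers) if ch == "x"]
    # The first field always starts at column 0, whether marked or not
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    # The last field runs to the end of the line
    end = len(markers) if line_length is None else line_length
    bounds = starts + [end]
    return [(start, stop - start) for start, stop in zip(bounds, bounds[1:])]


# Marker row with columns starting at 0, 6 and 11 in a 14-character line
row = "x" + " " * 5 + "x" + " " * 4 + "x" + " " * 2
print(markers_to_fields(row))
```

In the widget above, the marker row is what self.get("1.0", "1.0 lineend") returns once the user has finished clicking.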
edit: I now see that you are looking for a gui. I'll leave this incorrect answer for posterity.
import csv

def fixedwidth2csv(fw_name, csv_name, field_info, headings=None):
    # field_info is a sequence of (start position, width) pairs, one per field
    with open(fw_name, 'r') as fw_in:
        # 'wb' is for Python 2; on Python 3 use open(csv_name, 'w', newline='')
        with open(csv_name, 'wb') as csv_out:
            wtr = csv.writer(csv_out)
            if headings:
                wtr.writerow(headings)
            for line in fw_in:
                wtr.writerow([line[pos:pos+width].strip() for pos, width in field_info])
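For reference, a Python 3 flavoured sketch of the same idea is shown below. It reads the input one line at a time, so the file size should not matter much. The function name fixed_width_to_csv, the file names, and the field widths in the usage comment are assumptions for illustration only.

```python
import csv


def fixed_width_to_csv(fw_name, csv_name, field_info, headings=None):
    """Stream a fixed-width file to CSV one line at a time (Python 3).

    field_info is a sequence of (start, width) pairs, one per output field.
    """
    with open(fw_name, "r") as fw_in, open(csv_name, "w", newline="") as csv_out:
        writer = csv.writer(csv_out)
        if headings:
            writer.writerow(headings)
        for line in fw_in:
            line = line.rstrip("\r\n")
            # Slice each field out of the line and trim the padding
            writer.writerow([line[pos:pos + width].strip() for pos, width in field_info])


# Hypothetical usage, with the first two fields of the sample file:
# fixed_width_to_csv("input.txt", "output.csv",
#                    [(0, 6), (6, 11)], ["Sequen", "Sack and Pa"])
```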
Related
I am trying to retrieve each new line in a text widget separately. I have 3 instances of the Container class. The first instance prints the data as expected, but for the second instance duplicates of the first line are returned.
I am using object.get('current linestart', 'current lineend') to return new lines separately
full code: https://pastebin.com/mLR3zbFg
class Container(tk.Frame):
    def __init__(self, parent = None, priority = 3, bg = 'bisque'):
        self.inpList = []
        self.b = tk.Button(self.f, text = 'add', command = lambda: self.add_task(priority))

    def add_task(self, priority): # refer 1.2_text for implementation
        finished = False
        self.t = tk.Text(self.f)
        self.t.configure(height = 10, wrap = 'word')
        self.t.bind("<Return>", self.save_task)
        self.t.pack(fill = tk.X)

    def print_all(self):
        print(self.inpList)

    def save_task(self, event):
        td = self.t.get('current linestart', 'current lineend')
        self.inpList.append(td)

if __name__ == '__main__':
    window = tk.Tk()
    window.minsize(300, 600)
    p1 = Container(window, priority = 1)
    p2 = Container(window, bg = 'blue', priority = 2)
    p3 = Container(window, bg = 'red', priority = 3)
    window.mainloop()
figbeam's answer could work, but it wouldn't be great if you had a lot of text inputted and it seems like you want to read each line individually. Here's a better solution in my opinion:
According to the docs, current doesn't do what you expect: it refers to the character closest to the mouse pointer (and is only updated when you actually move the mouse). This could be why you noticed odd behavior in the lower widgets but not in the top one.
A better solution is to move to the end of the text, then up one line, and then use the linestart and lineend selectors you had prior. Namely change
td = self.t.get('current linestart', 'current lineend')
to
td = self.t.get('end - 1 lines linestart', 'end - 1 lines lineend')
Things should work as expected after this change!
Is it necessary to read each line separately? Otherwise you could read the lines into a list and then work with the list items separately.
As far as I can tell the delimiter for lines is newline, which can make it look odd if you use wrap.
Here's an example of reading multiple lines into a list:
line_list = textbox.get('1.0', 'end').split('\n')
for line in line_list:
    pass  # Process line
Just a thought.
The line td = self.t.get('current linestart', 'current lineend') doesn't work as expected. One solution is to read the whole content of the text box at each update (as suggested also in https://stackoverflow.com/a/55739714/2314737).
In the code substitute the function save_task() with:
def save_task(self, event):
    td = self.t.get("1.0",'end').rstrip()
    self.inpList = td.split('\n')
This also takes care of any deleted lines, which would otherwise stay in inpList.
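One detail worth spelling out: the text widget always reports a trailing newline before the 'end' index, so a plain split('\n') on get('1.0', 'end') produces an extra empty item. That is why the rstrip() in save_task matters. The sketch below shows the effect on plain strings; content_to_lines is a hypothetical helper name.

```python
def content_to_lines(content):
    """Split the string returned by text.get("1.0", "end") into lines.

    Hypothetical helper: strips the trailing newline(s) first so that no
    empty item appears at the end of the list.
    """
    return content.rstrip("\n").splitlines()


raw = "first task\nsecond task\n"   # what get("1.0", "end") would return
print(raw.split("\n"))              # the last item is an empty string
print(content_to_lines(raw))        # only the two real lines
```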
See also this similar question: How to read the input(line by line) from a multiline Tkinter Textbox in Python?
I'm working on a simple markdown parser in tkinter. The concept is that headings can be surrounded by asterisk symbols, for example *Heading 1*, **Heading 2**.
I'm using regex to find strings in this format, tag them, and change the style of the tags.
The part that I am struggling with is removing the asterisk symbols from the text after they've been found. I tried some code (included but commented out), but it just removes the tagged text.
My code correctly finds *Heading 1* and styles it as a heading, but doesn't remove the markdown symbols to leave just Heading 1.
Can anyone help me with an algorithm to remove the asterisk symbols from the headings that retains the formatting?
import tkinter as tk
from tkinter.scrolledtext import ScrolledText
from tkinter import font

class HelpDialog(tk.Toplevel):
    """Separate window to show the results of SSO Search"""
    def __init__(self, parent,text):
        super().__init__(parent)
        self.title("Help")
        self.defaultfont = font.Font(family="Sans Serif",size=12)
        self.textbox = ScrolledText(self,height=40,width=80,font=self.defaultfont)
        self.textbox.config(wrap=tk.WORD)
        self.textbox.grid()
        self.textbox.insert(0.0,text)
        self.style()

    def style(self):
        self.h1font = font.Font(family="Sans Serif", size=18, weight="bold")
        self.h2font = font.Font(family="Sans Serif", size=14, weight="bold")
        self.h3font = font.Font(family="Sans Serif", size=12, weight="bold", slant="italic")
        self.textbox.tag_configure("h1",font=self.h1font)
        self.textbox.tag_configure("h2",font=self.h2font)
        self.textbox.tag_configure("h3",font=self.h3font)
        self.tag_match(r"^[\*]{1}[\w\d -]+[\*]{1}$", "h1")
        self.tag_match(r"^[\*]{2}[\w\d -]+[\*]{2}$", "h2")
        self.tag_match(r"^[\*]{3}[\w\d -]+[\*]{3}$", "h3")

    def tag_match(self,regex,tag):
        count = tk.IntVar()
        self.textbox.mark_set("matchStart", "1.0")
        self.textbox.mark_set("matchEnd", "1.0")
        while True:
            index = self.textbox.search(regex,"matchEnd","end",count=count,regexp=True)
            if index=="": break
            self.textbox.mark_set("matchStart",index)
            self.textbox.mark_set("matchEnd", "%s+%sc" % (index, count.get()))
            self.textbox.tag_add(tag,"matchStart","matchEnd")
            #Futile attempt to remove the ** from the headings
            #text = self.textbox.get("matchStart", "matchEnd")
            #orig_length = len(text)
            #text = text.replace("*","").ljust(orig_length, " ")
            #self.textbox.delete("matchStart", "matchEnd")
            #self.textbox.insert("matchStart", text)

if __name__ == '__main__':
    text = """*Heading 1*
A paragraph
**Heading 2**
Some more text
***Heading 3***
Conclusion
"""
    root = tk.Tk()
    root.withdraw()
    HelpDialog(root,text)
    # Start the event loop so the dialog is actually displayed
    root.mainloop()
The short answer is that you can use the delete method of the text widget to delete the characters at the start and end of the range. You can do simple arithmetic on the indexes to adjust them. For example, to delete the character just before "matchEnd" (the mark itself sits just after the last character of the matched range), you can call delete("matchEnd-1c"), where -1c is shorthand for "minus one character".
At the very end of your loop inside tag_match, add the following two lines:
self.textbox.delete("matchStart")
self.textbox.delete("matchEnd-1c")
However, this code assumes that the markup is just a single character on each side. You will need to pass information in to tell the function how many characters on each side of the text to delete, since that information doesn't otherwise exist.
For example, you could pass it in like this:
self.tag_match(r"^[\*]{1}[\w\d -]+[\*]{1}$", "h1", 1)
You will then need to adjust the code that deletes the characters to take this information into account. For example, assuming you pass that number in as the variable n, it would look something like this:
def tag_match(self, regex, tag, n):
    ...
    while True:
        ...
        self.textbox.delete("matchEnd-{}c".format(n), "matchEnd")
        self.textbox.delete("matchStart", "matchStart+{}c".format(n))
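As a variation on passing n in by hand, the marker length could be read off the match itself. The sketch below is an alternative built on assumptions rather than the code from the question: it uses a single pattern with a backreference and Python's re module instead of the text widget's search, and heading_info is a hypothetical name.

```python
import re

# One to three asterisks, the heading text, then the same run of asterisks
HEADING = re.compile(r"^(\*{1,3})([\w -]+)\1$")


def heading_info(line):
    """Return (tag, n, text) for a heading line, or None if it is not one."""
    match = HEADING.match(line)
    if match is None:
        return None
    n = len(match.group(1))          # characters to delete on each side
    return ("h%d" % n, n, match.group(2))


print(heading_info("**Heading 2**"))
```

The returned n is the value that would go into "matchStart+{}c".format(n) and "matchEnd-{}c".format(n).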
I want to highlight the last added text in my text widget.
I have seen an example of this in "How to highlight text in a tkinter Text widget". The problem is that I add text ending with "\n", so the program considers the current line to be the new, empty line and highlights that instead.
Do you have any idea how I can alter the program? Here is my code:
import time
import tkinter as tk
from threading import Thread

class MyApp:
    def __init__(self, master):
        self.master = master
        self.text = tk.Text(self.master)
        self.text.pack(side="top", fill="both", expand=True)
        self.text.tag_configure("current_line", background="#e9e9e9")
        self.start_adding_text()
        self._highlight_current_line()

    def start_adding_text(self):
        thrd1 = Thread(target=self.add_tex)
        thrd1.start()

    def add_tex(self):
        text = "This is demo text\n"
        for _ in range(20):
            self.text.insert(tk.END, text)
            time.sleep(0.1)
        return

    def _highlight_current_line(self, interval=100):
        '''Updates the 'current line' highlighting every "interval" milliseconds'''
        self.text.tag_remove("current_line", 1.0, "end")
        self.text.tag_add("current_line", "insert linestart", "insert lineend+1c")
        self.master.after(interval, self._highlight_current_line)

if __name__ == '__main__':
    root = tk.Tk()
    app = MyApp(master=root)
    root.mainloop()
Your function _highlight_current_line is doing what it is supposed to do: it highlights the line of the insert-cursor. But what you want is to highlight the last inserted text which is something different. You can simply create a new tag.
Let's name it 'last_insert':
self.text.tag_configure("last_insert", background="#e9e9e9")
And when you add text, you can specify the tag(s) attached to the inserted text:
self.text.insert(tk.END, text, ('last_insert',))
Of course, if you want only the last inserted text to be highlighted, remove the tag from the existing text before each insert:
self.text.tag_remove("last_insert", 1.0, "end")
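Put together, the two calls could be wrapped in a small helper so that the order (clear the old highlight, then insert with the tag) is always the same. This is only a sketch: insert_highlighted is a hypothetical name, and text_widget is assumed to be a tk.Text on which the "last_insert" tag has already been configured.

```python
def insert_highlighted(text_widget, new_text, tag="last_insert"):
    """Append new_text so that it is the only text carrying `tag`."""
    # Clear the highlight from whatever was inserted previously
    text_widget.tag_remove(tag, "1.0", "end")
    # Insert the new text at the end with the tag attached
    text_widget.insert("end", new_text, (tag,))
```

In add_tex, this would replace the direct call to self.text.insert(tk.END, text).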
Remark: The tkinter function tag_add takes the arguments tag, start, end, where start and end are text indices in the form of a string 'a.b', where a is the line index (starting with 1 at the top) and b is the character index inside that line (starting with 0). You can modify the index with expressions (see here: http://effbot.org/tkinterbook/text.htm). Further, "insert" is a mark (read up on it at the aforementioned link), and tkinter replaces "insert linestart" with the index "line.0", where line is the line the insert cursor is currently on.
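The 'a.b' index format is easy to take apart in plain Python, which is handy when a position has to be compared or adjusted outside the widget. A minimal sketch, with split_index as a hypothetical helper name:

```python
def split_index(index):
    """Split a Tk text index such as "3.14" into (line, column) integers.

    Lines count from 1 and columns from 0. Only plain "line.column"
    indices are handled, not expressions such as "insert linestart".
    """
    line, column = index.split(".")
    return int(line), int(column)


print(split_index("3.14"))
```

A symbolic index can be converted to the plain form first with the widget's index() method, for example text.index("insert linestart").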
You could check if you are at the last line and remove your newline:
def add_tex(self):
    loop_times = 20
    text = "This is demo text\n"
    for i in range(loop_times):
        if i == loop_times - 1:
            # Drop the trailing newline on the last insert
            text = "This is demo text"
        self.text.insert(tk.END, text)
        time.sleep(0.1)
    return
Hi, I am new to Python and am otherwise a self-taught programmer.
I am trying to get the copy and paste methods working for the clipboard in wxPython.
I have implemented what I have found on the topic, but there is an issue when it is used on my Mac (OS X 10.10.5).
The attached code is a sample application that works fine within itself (given limits of grid). It also works fine for copying from the grid cells and pasting to an external notepad or spreadsheet. Meaning to me that the SetData is getting and maintaining the tab delimiters and new lines when building the clipboard data.
However, if I select tab delimited and multiline data from the same notepad or spreadsheet and proceed to paste into the grid, I get a single column of data. This means to me that the tab delimiters and newline characters are lost in the GetData.
With a data selection of
1 2 3
4 5 6
in a spreadsheet.
Using print repr(data) to see what the clipboard is holding, as suggested:
Copying and pasting within the application results in
pasting data - print repr(data) = u'1\t2\t3\n4\t5\t6\n'
When the data is copied from an external source, the pasted text seems to contain only \r return characters and a \ufeff character, which I don't know about. Perhaps that's the key? (on the Mac)
print repr(data) = u'\ufeff\r1\r2\r3\r4\r5\r6\r\r'
Now this works fine on my Windows machine, but not on my Mac.
Is this a known issue? Is there a workaround, or is there a setting that I am missing or that I don't understand?
Much appreciate any help or direction.
Thanks
Frank
import wx
import wx.grid as dg

class cp(wx.Panel):
    def __init__(self, parent):
        wx.Panel.__init__(self, parent, wx.ID_ANY, size = (600,600))
        self.dgGrid = dg.Grid(self, size = (500,500))
        self.dgGrid.CreateGrid(10,5)
        self.dgGrid.Bind(wx.EVT_KEY_DOWN, self.OnKeyPress)

    def OnKeyPress(self, event):
        # If Ctrl+V is pressed...
        if event.ControlDown() and event.GetKeyCode() == 86:
            print "Ctrl+V"
            # Call paste method
            self.Paste()
        if event.ControlDown() and event.GetKeyCode() == 67:
            print "Ctrl+C"
            # Call copy method
            self.copy()
        event.Skip()

    def copy(self):
        print "Copy method"
        # Number of rows and cols
        rows = self.dgGrid.GetSelectionBlockBottomRight()[0][0] - self.dgGrid.GetSelectionBlockTopLeft()[0][0] + 1
        cols = self.dgGrid.GetSelectionBlockBottomRight()[0][1] - self.dgGrid.GetSelectionBlockTopLeft()[0][1] + 1
        # data variable contain text that must be set in the clipboard
        data = ''
        # For each cell in selected range append the cell value in the data variable
        # Tabs '\t' for cols and '\n' for rows
        for r in range(rows):
            for c in range(cols):
                data = data + str(self.dgGrid.GetCellValue(self.dgGrid.GetSelectionBlockTopLeft()[0][0] + r, self.dgGrid.GetSelectionBlockTopLeft()[0][1] + c))
                if c < cols - 1:
                    data = data + '\t'
            data = data + '\n'
        # Create text data object
        clipboard = wx.TextDataObject()
        # Set data object value
        clipboard.SetText(data)
        # Put the data in the clipboard
        if wx.TheClipboard.Open():
            wx.TheClipboard.SetData(clipboard)
            wx.TheClipboard.Close()
        else:
            wx.MessageBox("Can't open the clipboard", "Error")

    def Paste(self):
        print "Paste method"
        clipboard = wx.TextDataObject()
        if wx.TheClipboard.Open():
            wx.TheClipboard.GetData(clipboard)
            wx.TheClipboard.Close()
        else:
            wx.MessageBox("Can't open the clipboard", "Error")
            return
        data = clipboard.GetText()
        y = -1
        # Convert text in a array of lines
        for r in data.splitlines():
            y = y +1
            x = -1
            print r
            # Convert c in a array of text separated by tab
            for c in r.split('\t'):
                x = x +1
                print c
                self.dgGrid.SetCellValue(self.dgGrid.GetGridCursorRow() + y, self.dgGrid.GetGridCursorCol() + x, c)

if __name__ == '__main__':
    print ' running locally not imported '
    app = wx.App(False)
    MainFrame = wx.Frame(None, title = "TestingCopy and Paste", size = (600,600))
    cppanel = cp(MainFrame)
    MainFrame.Refresh()
    MainFrame.Show()
    app.MainLoop()
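As a partial hardening of the Paste method, the clipboard text could be normalized before it is split into cells. The sketch below is only an assumption about what might help, written for Python 3: it strips a leading \ufeff (a byte order mark) and relies on str.splitlines(), which treats \r, \n, and \r\n (among other line boundaries) as row breaks. It cannot restore tab characters that are already missing from the clipboard data, as in the Mac output shown in the question, and it drops empty rows. clipboard_rows is a hypothetical helper name.

```python
def clipboard_rows(data):
    """Split clipboard text into a list of rows, each a list of cell strings."""
    data = data.lstrip("\ufeff")       # drop a leading byte order mark, if any
    rows = []
    for line in data.splitlines():     # handles \n, \r, and \r\n
        if line:                       # skip empty lines
            rows.append(line.split("\t"))
    return rows


print(clipboard_rows("1\t2\t3\n4\t5\t6\n"))
```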
I have syntax highlighting implemented in Python using Tkinter. For example, I can make it automatically highlight "derp". The problem is that when I modify the string to, say, "dERP" or something similar, it will still highlight the "d" (i.e. the only remaining original character). How do I clear this formatting? I've considered creating a tag that sets the background to white for the entire document, but that creates problems with highlighting.
code:
from Tkinter import *
import sys, os

class easyTex(Text):
    def __init__(self,base,**args):
        Text.__init__(self,base,**args)
        self.tag_configure("default", background="white")
        self.tag_configure("search", background="blue")

    def highlightPattern(self, pattern, tag):
        start = "1.0"
        countVar = StringVar()
        while True:
            pos = self.search(pattern, start, stopindex="end", count=countVar, regexp=True)
            if not pos: break
            self.tag_add(tag, pos, "%s+%sc" % (pos, countVar.get()))
            start = "%s+%dc" % (pos, int(countVar.get()) + 1)

    def highlightSyntax(self):
        self.highlightPattern(".*", "default")
        self.highlightPattern("a red car", "search")

base = Tk()
editor = easyTex(base)
base.bind("<Escape>", lambda e: sys.exit())
base.bind("<Key>", lambda e: editor.highlightSyntax())
editor.pack(fill=BOTH, expand=1)
base.call('wm', 'attributes', '.', '-topmost', True)
mainloop()
(this is using the regex: "a red car":)
To remove the effects of a tag, remove the tag from the range of characters. You can remove a tag with tag_remove, giving it a starting and ending range that you want the tag removed from.
For example, to remove the "search" tag from the entire document, do this:
self.tag_remove("search", "1.0", "end")
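In the question's code, that call would typically go at the start of each highlighting pass, so that stale tags are cleared before the pattern is applied again. A minimal sketch under assumptions: rehighlight is a hypothetical function, text_widget is assumed to be a Tkinter Text widget, and Python's re module stands in for the widget's own search method.

```python
import re


def rehighlight(text_widget, pattern, tag="search"):
    """Clear `tag` everywhere, then re-apply it to every current match."""
    # Remove the tag from the whole document so stale highlights disappear
    text_widget.tag_remove(tag, "1.0", "end")
    content = text_widget.get("1.0", "end-1c")   # text without the final newline
    for match in re.finditer(pattern, content):
        # "1.0+Nc" means N characters after the start of the text
        start = "1.0+%dc" % match.start()
        end = "1.0+%dc" % match.end()
        text_widget.tag_add(tag, start, end)
```

With this in place, highlightSyntax could call rehighlight(self, "a red car") instead of painting a "default" tag over everything.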