How to remove the last 2 numbers from a string? - python

I am trying to take a display name and keypad code from an Excel document and add them to my company's website. My problem is that the Excel document shows 4240, but when I parse the data and add it to the website it comes through as 4240.0. How can I remove the ".0" when I parse the data?
This is the code I currently have. The only problem is that, for some reason, it does not pick up a "0" at the front or end of a code.
For example, if the code is 0420, it only picks up 42 and drops the leading and trailing 0. I tried changing the Excel cell format to text so that the value is not read as a number, but that didn't help either.
I think the best method would be to remove the last 2 characters by index?
def addCodesA():
    workbook = xlrd.open_workbook(path)
    sheet = workbook.sheet_by_index(0)
    for y in range(sheet.nrows):
        names = []
        codes = []
        convertedcodes = []
        names.append(str(sheet.cell_value(y, 0)))
        codes.append(str(sheet.cell_value(y, 1)))
        for strippedcode in codes:
            convertedcodes.append(strippedcode.strip('.0'))
        print(names)
        print(codes)
        driver.find_element_by_xpath('//*[@id="device_keypad_relay"][@value="0"]').click()
        time.sleep(1)
        codeadd = driver.find_element_by_name('keypad_code_1')
        nameadd = driver.find_element_by_name('keypad_code_1_display')
        codeadd.clear()
        nameadd.clear()
        codeadd.send_keys(convertedcodes)
        nameadd.send_keys(names)
        driver.find_element_by_class_name('btn-form-end').send_keys(Keys.SHIFT,Keys.ENTER)
        time.sleep(6)
        driver.get(customercodes)
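A note on the symptom described above: str.strip('.0') does not remove the substring ".0". It treats its argument as a set of characters and removes any run of "." and "0" from both ends, which is why 0420 collapses to 42. Below is a minimal sketch of the "remove the last 2 characters by index" idea, assuming the cell value has already been turned into a string with str(). The helper name drop_point_zero is made up for illustration.

```python
def drop_point_zero(text):
    """Remove one trailing '.0' from text, leaving every other character alone."""
    if text.endswith('.0'):
        return text[:-2]  # slice off exactly the last two characters
    return text


# str.strip('.0') removes ANY run of '.' and '0' characters from both ends.
print('4240.0'.strip('.0'))
# The suffix check only touches a literal '.0' at the very end.
print(drop_point_zero('4240.0'))
print(drop_point_zero('0420'))
```

If Excel stored the code as a number, the leading zero is already gone before Python reads the cell, so this only helps with the trailing ".0".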

Related

Data Formatting and Fixing

I am trying to clean user reviews that were crawled from the web. When I read them with pandas, there is no warning or error. Then I print the length of the dataframe.
Then I would like to apply a normalization step. But I am working with Turkish, so I cannot use a Python library. I will use third-party software.
For this purpose, I am trying to write the review column to a text file. When I write these data to the text file, the length of the sample is
and target size:
Basically I do this:
Note: As I mentioned, these are customer reviews, so as expected they are dirty and noisy. Some of the samples contain many newline characters; for example, approximately 56 of the samples contain "\n\n\n\n". I have tried to solve this problem in Python by cleaning the data, but every time I lose samples. I also tried to fix it in Excel, but that did not work.
Question: Do you have any suggestions for fixing the data?
It seems that you are producing two CSV files from your df and then reading them back as reviews and targets.
If you use pd.read_csv to read them back, note that it has the argument skip_blank_lines=True by default, which skips blank lines. If some rows of your original df contain nothing but '\n' characters, they end up as empty lines in your new CSVs, and those lines are skipped the next time the files are read.
You can verify this by counting the empty reviews and targets with two counter variables and checking whether the totals match the 'loss'.
num_empty_review = 0
num_empty_target = 0
for ..., ... in df.iterrows():
    review = ...replace('\n', '')
    target = ...replace('\n', '')
    if review.replace(' ', '') == '':
        num_empty_review += 1
    if target.replace(' ', '') == '':
        num_empty_target += 1
    ...
    ...
print(num_empty_review, num_empty_target)
Lastly, next time, please paste your code here in text form, like I did above :)
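To make the line-count mismatch concrete without the original dataframe, here is a minimal standard-library sketch. It assumes one review is written per line of a text file. The sample reviews, the helper names and the '<EMPTY>' placeholder are all made up for illustration.

```python
import io


def to_single_line(text):
    """Collapse every run of whitespace (including '\n') into a single space."""
    return ' '.join(text.split())


def write_reviews(reviews, handle):
    """Write one review per line; an empty review gets a placeholder, not a blank line."""
    for review in reviews:
        flat = to_single_line(review)
        handle.write((flat if flat else '<EMPTY>') + '\n')


reviews = ['good product', 'bad\n\n\n\nservice', '\n\n\n\n']

# Naive write: embedded newlines spill one review over several lines.
naive = io.StringIO()
naive.write('\n'.join(reviews) + '\n')
naive_lines = naive.getvalue().splitlines()

# Flattened write: exactly one line per review, so the counts stay aligned.
safe = io.StringIO()
write_reviews(reviews, safe)
safe_lines = safe.getvalue().splitlines()

print(len(reviews), len(naive_lines), len(safe_lines))
```

Keeping a placeholder for empty reviews means the review file and the target file keep the same number of lines, so rows can still be matched up after the third-party normalization step.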

Rewriting Single Words in a .txt with Python

I need to create a database using Python and a .txt file.
Creating new items is no problem. The inside of the Database.txt looks like this:
Index Objektname Objektplace Username
i.e:
1 Pen Office Daniel
2 Saw Shed Nic
6 Shovel Shed Evelyn
4 Knife Room6 Evelyn
I get the index from a QR scanner (OpenCV), and the other information is entered via Tkinter Entry widgets. If an object is already saved in the database, you should be able to rewrite Objektplace and Username.
My problems now are the following:
If I scan the code with the index 6, how do I navigate to that entry, even if it's not in line 6, without causing a problem with the Room6?
How do I, for example, replace only the "Shed" from index 4 when that object is moved to, e.g., Room6?
The same goes for the usernames.
Up until now I've tried different methods, but nothing has worked so far.
The last try looked something like this:
def DBChange():
    #Removes unwanted bits from the scanned code
    data2 = data.replace("'", "")
    Index = data2.replace("b","")
    #Gets the Data from the Entry-Widgets
    User = Nutzer.get()
    Einlagerungsort = Ort.get()
    #Adds a whitespace at the end of the entries to separate them
    Userlen = len(User)
    User2 = User.ljust(Userlen)
    Einlagerungsortlen = len(Einlagerungsort)+1
    Einlagerungsort2 = Einlagerungsort.ljust(Einlagerungsortlen)
    #Navigate to the exact line of the scanned Index and replace the words
    #for the place and the user ONLY in this line
    file = open("Datenbank.txt","r+")
    lines=file.readlines()
    for word in lines[Index].split():
        List.append(word)
    checkWords = (List[2],List[3])
    repWords = (Einlagerungsort2, User2)
    for line in file:
        for check, rep in zip(checkWords, repWords):
            line = line.replace(check, rep)
        file.write(line)
    file.close()
    Return()
Thanks in advance
I'd suggest using pandas to read and write your text file. That way you can just use the index to select the appropriate line. And if there is no specific reason to use your text format, I would switch to CSV for ease of use.
import pandas as pd

def DBChange():
    # Removes unwanted bits from the scanned code
    # I haven't changed this part, since I guess you need this for some input data
    data2 = data.replace("'", "")
    # Convert to int so it matches the integer labels of the Index column
    Indexnr = int(data2.replace("b",""))
    # Gets the data from the Entry widgets
    User = Nutzer.get()
    Einlagerungsort = Ort.get()
    # I removed the lines here. This isn't necessary when using csv and Pandas
    # Read in the csv file, using the Index column as the row labels
    df = pd.read_csv("Datenbank.csv", index_col="Index")
    # Select line with index and replace value
    df.loc[Indexnr, 'Username'] = User
    df.loc[Indexnr, 'Objektplace'] = Einlagerungsort
    # Write back to csv
    df.to_csv("Datenbank.csv")
    Return()
Since I can't reproduce your specific problem, I haven't tested it. But something like this should work.
Edit
To read and write the text file, use ' ' as the separator. (I assume that no value contains spaces, and that your text file uses exactly 1 space between values.)
Reading:
df = pd.read_csv('Datenbank.txt', sep=' ')
Writing:
df.to_csv('Datenbank.txt', sep=' ')
First of all, this is a terrible way to store data. My suggestion is not particularly good code; don't do this in production!
newlines = []
for line in lines:
    entry = line.split()
    if entry[0] == Index:
        #line now is the correct line
        #Index 2 is the place, index 0 the ID, etc
        entry[2] = Einlagerungsort2
    newlines.append(" ".join(entry))
# Now write newlines back to the file
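The split-and-compare idea above can be sketched as a small self-contained function. This is only an illustration under the question's stated layout (Index, Objektname, Objektplace, Username separated by single spaces, no spaces inside values). The function name update_entry and the sample values are made up.

```python
def update_entry(lines, index, place, user):
    """Return new lines where the row whose first field equals index is updated."""
    new_lines = []
    for line in lines:
        fields = line.split()
        # Compare only the first field, so index "6" never matches "Room6".
        if fields and fields[0] == str(index):
            fields[2] = place  # Objektplace
            fields[3] = user   # Username
        new_lines.append(' '.join(fields))
    return new_lines


database = [
    '1 Pen Office Daniel',
    '2 Saw Shed Nic',
    '6 Shovel Shed Evelyn',
    '4 Knife Room6 Evelyn',
]
updated = update_entry(database, 6, 'Room6', 'Nic')
print('\n'.join(updated))
```

Because the match is on the first field rather than on the line number or on a substring, the entry is found wherever it sits in the file, and only that row changes.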

Writing to file random loss and gain of 0s

I am writing a Python script to convert old DayLite contacts into CSV format to be imported into Outlook. The script works almost perfectly except for one small issue, but because there is so much data, fixing it by hand in the file would take far too long.
The list of contacts is very long, 1,100+ rows in the spreadsheet. When the text gets written into the CSV file everything is good, except that certain seemingly random phone numbers lose their leading 0 and gain a '.0' at the end. However, the majority of the phone numbers are left in their exact format.
This is my script code:
import xlrd
import xlwt
import csv
import numpy

##########################
# Getting XLS Data sheet #
##########################
oldFormatContacts = xlrd.open_workbook('DayliteContacts_Oct16.xls')
ofSheet = oldFormatContacts.sheet_by_index(0)

##################################
# Storing values in array medium #
##################################
rowVal = [''] * ofSheet.nrows
x = 1
for x in range(ofSheet.nrows):
    rowVal[x] = (ofSheet.row_values(x))

######################
# Getting CSV titles #
######################
csvTemp = xlrd.open_workbook('Outlook.xls')
csvSheet = csvTemp.sheet_by_index(0)
csv_title = csvSheet.row_values(0)
rowVal[0] = csv_title

##############################################################
# Append and padding data to contain commas for empty fields #
##############################################################
x = 0
q = '"'
for x in range(ofSheet.nrows):
    temporaryRow = rowVal[x]
    temporaryRow = str(temporaryRow).strip('[]')
    if x > 0:
        rowVal[x] = (','+str(q+temporaryRow.split(',')[0]+q)+',,'+str(q+temporaryRow.split(',')[1]+q)+',,'+str(q+temporaryRow.split(',')[2]+q)+',,,,,,,,,,,,,,,,,,,,,,,,,,'+str(q+temporaryRow.split(',')[4]+q)+','+str(q+temporaryRow.split(',')[6]+q)+',,,,,,,,,,,,,,,,,,,,,,,,,'+str(q+temporaryRow.split(',')[8])+q)
        j = 0
        for j in range(0,21):
            rowVal[x] += ','
    tempString = str(rowVal[x])
    tempString = tempString.replace("'","")
    #tempString = tempString.replace('"', '')
    #tempString = tempString.replace(" ", "")
    rowVal[x] = tempString

#####################################
# Open and write values to new file #
#####################################
csv_file = open('csvTestFile.csv', 'w')
rownum = 0
for rownum in range(ofSheet.nrows):
    csv_file.write(rowVal[rownum])
    csv_file.write("\n")
csv_file.close()
Sorry if my code is incoherent; I am a beginner at Python scripting.
Unfortunately I cannot show or provide the contact details for privacy reasons; however, I will give some examples in the exact format in which the problem occurs.
So in the DayLite document a contact would be saved as "First name, Second name, Company, phone number 1, phone number 2, email", for example:
"Joe, Black, Stack Overflow, 07472329584,"
but when written into the CSV file it will be
"Joe","Black","Stack Overflow","7472329584.0".
This is odd because for each occurrence of that problem there will be 10 or so fine numbers that get saved in exactly the same way, e.g. in DayLite: "+446738193583", when written in CSV: "+446738193583".
I forgot to mention (this is an edit) that many phone numbers KEEP their leading 0 and do not gain a trailing '.0'. It's probably 1 in 20 phone numbers that gets messed up.
It seems to me to be a very weird error, which is why I have come here for help! If anyone has any ideas I'd be more than happy to hear them. Cheers guys.
The issue lay in the Excel document, although I had assumed it was in my script. I placed an ' (apostrophe) before each number that was causing a format error, so that Excel stores it as text. This meant that when the values were read from the sheet there were no formatting issues, and they were written to the file successfully.
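For cells that cannot be fixed in the spreadsheet itself, the trailing ".0" can also be dropped on the Python side. xlrd returns numeric cells as Python floats, so a whole-number float can be converted through int before it is turned into text. The following is a minimal sketch; cell_to_text is a made-up helper name and the row is invented sample data.

```python
def cell_to_text(value):
    """Convert a cell value to text, dropping the '.0' of whole-number floats."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)


# A numeric cell arrives as a float; a text cell arrives as a string.
row = ['Joe', 'Black', 'Stack Overflow', 7472329584.0, '+446738193583']
print([cell_to_text(value) for value in row])
```

This does not bring back a leading 0: once Excel has stored the phone number as a number, that digit is no longer in the file, which is why marking the cell as text in Excel was the real fix.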

Use Python xlsxwriter module to write srt data into an Excel file

This time I tried to use Python's xlsxwriter module to write data from a .srt file into an Excel file.
The subtitle file looks like this in Sublime Text:
but I want to write the data into an Excel file, so it looks like this:
It's my first time coding Python for this, so I'm still in the trial-and-error stage... I tried to write some code like the below
but I don't think it makes sense...
I'll keep trying, but if you know how to do it, please let me know. I'll read your code and try to understand it! Thank you! :)
The following breaks the problem into a few pieces:
1. Parsing the input file. parse_subtitles is a generator that takes a source of lines and yields a sequence of records in the form {'index':'N', 'timestamp':'NN:NN:NN,NNN --> NN:NN:NN,NNN', 'subtitles':['TEXT', ...]}. The approach I took was to track which of three distinct states we're in:
   - 'seeking to next entry', for when we're looking for the next index number, which should match the regular expression ^\d*$ (nothing but a bunch of digits);
   - 'looking for timestamp', when an index is found and we expect a timestamp on the next line, which should match the regular expression ^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$ (HH:MM:SS,mmm --> HH:MM:SS,mmm); and
   - 'reading subtitles', while consuming actual subtitle text, with blank lines and EOF interpreted as subtitle termination points.
2. Writing the above records to rows in a worksheet. write_dict_to_worksheet accepts a row and worksheet, plus a record and a dictionary defining the Excel 0-indexed column numbers for each of the record's keys, and then it writes the data appropriately.
3. Organizing the overall conversion. convert accepts an input filename (e.g. 'Wildlife.srt') that will be opened and passed to the parse_subtitles function, and an output filename (e.g. 'Subtitle.xlsx') that will be created using xlsxwriter. It then writes a header and, for each record parsed from the input file, writes that record to the XLSX file.
Logging statements are left in for self-commenting purposes, and because when reproducing your input file I fat-fingered a : to a ; in a timestamp, making it unrecognized, and having the error pop up was handy for debugging!
I've put a text version of your source file, along with the below code, in this Gist
import xlsxwriter
import re
import logging

def parse_subtitles(lines):
    line_index = re.compile(r'^\d*$')
    line_timestamp = re.compile(r'^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$')
    line_seperator = re.compile(r'^\s*$')
    current_record = {'index':None, 'timestamp':None, 'subtitles':[]}
    state = 'seeking to next entry'
    for line in lines:
        line = line.strip('\n')
        if state == 'seeking to next entry':
            if line_index.match(line):
                logging.debug('Found index: {i}'.format(i=line))
                current_record['index'] = line
                state = 'looking for timestamp'
            else:
                logging.error('HUH: Expected to find an index, but instead found: [{d}]'.format(d=line))
        elif state == 'looking for timestamp':
            if line_timestamp.match(line):
                logging.debug('Found timestamp: {t}'.format(t=line))
                current_record['timestamp'] = line
                state = 'reading subtitles'
            else:
                logging.error('HUH: Expected to find a timestamp, but instead found: [{d}]'.format(d=line))
        elif state == 'reading subtitles':
            if line_seperator.match(line):
                logging.info('Blank line reached, yielding record: {r}'.format(r=current_record))
                yield current_record
                state = 'seeking to next entry'
                current_record = {'index':None, 'timestamp':None, 'subtitles':[]}
            else:
                logging.debug('Appending to subtitle: {s}'.format(s=line))
                current_record['subtitles'].append(line)
        else:
            logging.error('HUH: Fell into an unknown state: `{s}`'.format(s=state))
    if state == 'reading subtitles':
        # We must have finished the file without encountering a blank line. Dump the last record
        yield current_record

def write_dict_to_worksheet(columns_for_keys, keyed_data, worksheet, row):
    """
    Write a subtitle-record to a worksheet.
    Return the row number after those that were written (since this may write multiple rows)
    """
    current_row = row
    #First, horizontally write the entry and timecode
    for (colname, colindex) in columns_for_keys.items():
        if colname != 'subtitles':
            worksheet.write(current_row, colindex, keyed_data[colname])
    #Next, vertically write the subtitle data
    subtitle_column = columns_for_keys['subtitles']
    for morelines in keyed_data['subtitles']:
        worksheet.write(current_row, subtitle_column, morelines)
        current_row+=1
    return current_row

def convert(input_filename, output_filename):
    workbook = xlsxwriter.Workbook(output_filename)
    worksheet = workbook.add_worksheet('subtitles')
    columns = {'index':0, 'timestamp':1, 'subtitles':2}
    next_available_row = 0
    records_processed = 0
    headings = {'index':"Entries", 'timestamp':"Timecodes", 'subtitles':["Subtitles"]}
    next_available_row=write_dict_to_worksheet(columns, headings, worksheet, next_available_row)
    with open(input_filename) as textfile:
        for record in parse_subtitles(textfile):
            next_available_row = write_dict_to_worksheet(columns, record, worksheet, next_available_row)
            records_processed += 1
    print('Done converting {inp} to {outp}. {n} subtitle entries found. {m} rows written'.format(inp=input_filename, outp=output_filename, n=records_processed, m=next_available_row))
    workbook.close()

convert(input_filename='Wildlife.srt', output_filename='Subtitle.xlsx')
Edit: Updated to split multiline subtitles across multiple rows in output

Add page number to a Word document using Python

Is there a way to add page numbers to the lower right corner of a Word document using Python win32com? I am able to add headers and footers, but I can't find a way to add page numbers in the format PageNumber of TotalPages (for example: 1 of 5)
Below is the code to add centered headers and footers to a page
from win32com.client import Dispatch as MakeDoc
filename = name + '.doc'
WordDoc = MakeDoc("Word.Application")
WordDoc = WordDoc.Documents.Add()
WordDoc.Sections(1).Headers(1).Range.Text = name
WordDoc.Sections(1).Headers(1).Range.ParagraphFormat.Alignment = 1
WordDoc.Sections(1).Footers(1).Range.Text = filename
WordDoc.Sections(1).Footers(1).Range.ParagraphFormat.Alignment = 1
Thanks
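To insert page numbers, use the following statements:
# 2 = wdAlignPageNumberRight, True = also show the number on the first page
WordDoc.Sections(1).Footers(1).PageNumbers.Add(2,True)
# 57 = wdPageNumberStyleNumberInDash
WordDoc.Sections(1).Footers(1).PageNumbers.NumberStyle = 57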
However, the format of the page number is -page number-. Documentation for inserting the page number is here, and the one for the number style is here
I know this is an old question but I was banging my head against a wall trying to figure out the same thing and ended up working out a solution that's rather ugly, but gets the job done. Note that I had to redefine activefooter after inserting wdFieldPage or else the resulting footer would look like of 12 rather than 1 of 2.
The answer to this vba question was helpful when I was trying to figure out the formatting.
I'm using Python 3.4; testdocument.doc is just an existing .doc file with some random text spread across two pages and no existing footer.
import win32com.client

w = win32com.client.gencache.EnsureDispatch("Word.Application")
w.Visible = 0
adoc = w.Documents.Open("C:\\temp1\\testdocument.doc")
# Right-align the primary footer of the first section
activefooter = adoc.Sections(1).Footers(win32com.client.constants.wdHeaderFooterPrimary).Range
activefooter.ParagraphFormat.Alignment = win32com.client.constants.wdAlignParagraphRight
activefooter.Collapse(0)  # 0 = wdCollapseEnd
# Insert the current page number field
activefooter.Fields.Add(activefooter,win32com.client.constants.wdFieldPage)
# Re-fetch the footer range so the following text lands after the page field
activefooter = adoc.Sections(1).Footers(win32com.client.constants.wdHeaderFooterPrimary).Range
activefooter.Collapse(0)
activefooter.InsertAfter(Text = ' of ')
activefooter.Collapse(0)
# Insert the total page count field
activefooter.Fields.Add(activefooter,win32com.client.constants.wdFieldNumPages)
adoc.Save()
adoc.Close()
w.Quit()
