Writing to file random loss and gain of 0s - python

I am writing a Python script to convert old DayLite contacts into CSV format to be imported into Outlook. The script works almost perfectly except for one small issue, but because of the amount of data, fixing it by hand in the file would take far too long.
The list of contacts is very long: 1,100+ rows in the spreadsheet. When the text gets written into the CSV file everything is fine, except that certain, seemingly random phone numbers lose their leading 0 and gain a '.0' at the end. The majority of the phone numbers keep their exact format.
This is my script code:
import xlrd
import xlwt
import csv
import numpy

##########################
# Getting XLS Data sheet #
##########################
oldFormatContacts = xlrd.open_workbook('DayliteContacts_Oct16.xls')
ofSheet = oldFormatContacts.sheet_by_index(0)

##################################
# Storing values in array medium #
##################################
rowVal = [''] * ofSheet.nrows
for x in range(ofSheet.nrows):
    rowVal[x] = ofSheet.row_values(x)

######################
# Getting CSV titles #
######################
csvTemp = xlrd.open_workbook('Outlook.xls')
csvSheet = csvTemp.sheet_by_index(0)
csv_title = csvSheet.row_values(0)
rowVal[0] = csv_title

##############################################################
# Append and padding data to contain commas for empty fields #
##############################################################
q = '"'
for x in range(ofSheet.nrows):
    temporaryRow = rowVal[x]
    temporaryRow = str(temporaryRow).strip('[]')
    if x > 0:
        rowVal[x] = (','+str(q+temporaryRow.split(',')[0]+q)+',,'+str(q+temporaryRow.split(',')[1]+q)+',,'+str(q+temporaryRow.split(',')[2]+q)+',,,,,,,,,,,,,,,,,,,,,,,,,,'+str(q+temporaryRow.split(',')[4]+q)+','+str(q+temporaryRow.split(',')[6]+q)+',,,,,,,,,,,,,,,,,,,,,,,,,'+str(q+temporaryRow.split(',')[8])+q)
    for j in range(0, 21):
        rowVal[x] += ','
    tempString = str(rowVal[x])
    tempString = tempString.replace("'", "")
    #tempString = tempString.replace('"', '')
    #tempString = tempString.replace(" ", "")
    rowVal[x] = tempString

#####################################
# Open and write values to new file #
#####################################
csv_file = open('csvTestFile.csv', 'w')
for rownum in range(ofSheet.nrows):
    csv_file.write(rowVal[rownum])
    csv_file.write("\n")
csv_file.close()
Sorry if my coding is incoherent; I am a beginner at Python scripting.
Unfortunately I cannot show the actual contact details for privacy reasons, but I will give some examples in the exact format in which the problem occurs.
So in the DayLite document a contact would be saved as "First name, Second name, Company, phone number 1, phone number 2, email" for example:
"Joe, Black, Stack Overflow, 07472329584,"
but when written into the CSV file it will be
"Joe","Black","Stack Overflow","7472329584.0".
This is odd because for each occurrence of the problem there are 10 or so numbers that come through fine and are saved exactly as entered, e.g. in DayLite: "+446738193583"; when written to the CSV: "+446738193583".
I forgot to mention (this is an edit) that many phone numbers KEEP their leading 0 and do not gain a trailing 0. It's probably 1 in 20 phone numbers that gets mangled.
It seems to me a very weird error, which is why I have come here for help! If anyone has any ideas I'd be more than happy to hear them. Cheers guys.

The issue lay within the Excel document, but I had assumed it lay within my script. I placed an ' before each number that was causing a format error. This meant there were no format issues when the values were read from the sheet, and they were written back to the file successfully.
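For reference, the same fix can be done in code rather than in the spreadsheet: xlrd returns number-formatted cells as Python floats, so 07472329584 is already 7472329584.0 by the time the row is stringified. A minimal sketch of reading such cells as text (the zfill(11) re-padding assumes UK-style 11-digit numbers, which is an assumption about the data, since the numeric value itself no longer carries the leading 0):

import xlrd

book = xlrd.open_workbook('DayliteContacts_Oct16.xls')
sheet = book.sheet_by_index(0)

def cell_as_text(row, col):
    # Number-formatted cells come back as floats (7472329584.0),
    # which is where the trailing '.0' in the CSV comes from.
    if sheet.cell_type(row, col) == xlrd.XL_CELL_NUMBER:
        digits = str(int(sheet.cell_value(row, col)))
        # Assumption: phone numbers should be 11 digits; restore a lost leading 0.
        return digits.zfill(11)
    return str(sheet.cell_value(row, col))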

Related

How to remove the last 2 numbers from a string?

I am trying to take a Display Name / Keypad code from an Excel document and add it into my company's website. My problem is that when I parse the data from the Excel document, the document shows 4240, but when the script goes to add it to the website it picks it up as 4240.0. How can I remove the ".0" when I parse the data?
This is the code I currently have. The only problem with it is that for some reason it will not pick up a "0" at the front or end of a code.
For example, if the code is 0420, it only picks up 42 and doesn't keep the leading and trailing 0. I tried changing the Excel format to text so it wouldn't be read as a number, but that didn't help either.
I think the best method would be to remove the last 2 characters by index?
def addCodesA():
    workbook = xlrd.open_workbook(path)
    sheet = workbook.sheet_by_index(0)
    for y in range(sheet.nrows):
        names = []
        codes = []
        convertedcodes = []
        names.append(str(sheet.cell_value(y, 0)))
        codes.append(str(sheet.cell_value(y, 1)))
        for strippedcode in codes:
            convertedcodes.append(strippedcode.strip('.0'))
        print(names)
        print(codes)
        driver.find_element_by_xpath('//*[@id="device_keypad_relay"][@value="0"]').click()
        time.sleep(1)
        codeadd = driver.find_element_by_name('keypad_code_1')
        nameadd = driver.find_element_by_name('keypad_code_1_display')
        codeadd.clear()
        nameadd.clear()
        codeadd.send_keys(convertedcodes)
        nameadd.send_keys(names)
        driver.find_element_by_class_name('btn-form-end').send_keys(Keys.SHIFT, Keys.ENTER)
        time.sleep(6)
        driver.get(customercodes)
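For what it's worth, the likely culprit is strip('.0'): the argument to str.strip() is a set of characters, not a suffix, so it removes every leading and trailing '0' and '.' character, which is exactly why '0420.0' comes back as '42'. A sketch of a safer clean-up (clean_code is a hypothetical helper; the zfill(4) re-padding assumes 4-digit codes):

def clean_code(raw):
    # Cut only a literal '.0' suffix instead of stripping characters.
    text = str(raw)
    if text.endswith('.0'):
        text = text[:-2]
    # If Excel stored the code as a number, the leading zero is already
    # gone from the value itself; re-pad to the expected width.
    return text.zfill(4)

print(clean_code('420.0'))   # -> '0420'
print(clean_code('4240.0'))  # -> '4240'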

Rewriting Single Words in a .txt with Python

I need to create a database, using Python and a .txt file.
Creating new items is no problem; the inside of the Database.txt looks like this:
Index Objektname Objektplace Username
e.g.:
1 Pen Office Daniel
2 Saw Shed Nic
6 Shovel Shed Evelyn
4 Knife Room6 Evelyn
I get the index from a QR scanner (OpenCV), and the other information comes from Tkinter Entries. If an object is already saved in the database, you should be able to rewrite its Objektplace and Username.
My problems now are the following:
If I scan the code with index 6, how do I navigate to that entry, even if it's not in line 6, without colliding with the Room6 value?
How do I, for example, replace only the "Shed" from index 4 when that object is moved to, e.g., Room6?
The same goes for the usernames.
Up until now I've tried different methods, but nothing has worked so far.
My last attempt looked something like this:
def DBChange():
    # Removes unwanted bits from the scanned code
    data2 = data.replace("'", "")
    Index = data2.replace("b", "")
    # Gets the data from the Entry widgets
    User = Nutzer.get()
    Einlagerungsort = Ort.get()
    # Adds a whitespace at the end of the entries to separate them
    Userlen = len(User)
    User2 = User.ljust(Userlen)
    Einlagerungsortlen = len(Einlagerungsort) + 1
    Einlagerungsort2 = Einlagerungsort.ljust(Einlagerungsortlen)
    # Navigate to the exact line of the scanned index and replace the words
    # for the place and the user ONLY in this line
    file = open("Datenbank.txt", "r+")
    lines = file.readlines()
    for word in lines[Index].split():
        List.append(word)
    checkWords = (List[2], List[3])
    repWords = (Einlagerungsort2, User2)
    for line in file:
        for check, rep in zip(checkWords, repWords):
            line = line.replace(check, rep)
        file.write(line)
    file.close()
    Return()
Thanks in advance
I'd suggest using pandas to read and write your text file. That way you can just use the index to select the appropriate line. And if there is no specific reason to use your text format, I would switch to CSV for ease of use.
import pandas as pd

def DBChange():
    # Removes unwanted bits from the scanned code
    # I haven't changed this part, since I guess you need it for some input data
    data2 = data.replace("'", "")
    Indexnr = data2.replace("b", "")
    # Gets the data from the Entry widgets
    User = Nutzer.get()
    Einlagerungsort = Ort.get()
    # I removed the padding lines here; that isn't necessary when using csv and pandas
    # Read in the csv file
    df = pd.read_csv("Datenbank.csv")
    # Select the line with the index and replace the values
    df.loc[Indexnr, 'Username'] = User
    df.loc[Indexnr, 'Objektplace'] = Einlagerungsort
    # Write back to csv
    df.to_csv("Datenbank.csv")
    Return()
Since I can't reproduce your specific problem, I haven't tested it. But something like this should work.
Edit
To read and write the text file, use ' ' as the separator. (I assume the values contain no spaces, and your text file currently uses one space between values.)
Reading:
df = pd.read_csv('Datenbank.txt', sep=' ')
Writing:
df.to_csv('Datenbank.txt', sep=' ')
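Note that to_csv writes the DataFrame's index as an extra first column by default, so round-tripping the file this way will grow an unnamed column on each save; passing index=False (df.to_csv('Datenbank.txt', sep=' ', index=False)) avoids that.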
First of all, this is a terrible way to store data. My suggestion is not particularly well-written code either; don't do this in production!
newlines = []
for line in lines:
    entry = line.split()
    if entry[0] == Index:
        # line now is the correct line
        # index 2 is the place, index 0 the ID, etc.
        entry[2] = Einlagerungsort2
    newlines.append(" ".join(entry))
# Now write newlines back to the file
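And to finish the final step from the comment above, the write-back could be as simple as (assuming newlines was built as sketched):

# Rewrite the whole file from the updated lines.
with open("Datenbank.txt", "w") as f:
    f.write("\n".join(newlines) + "\n")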

Python Programming Error for DataScience DataFrame

I am reading my data from a CSV file using pandas, and it works fine when reading up to 700 rows. But as soon as I go above 700 and try to append to a list in Python, it shows me "list index out of range", even though the CSV has around 500K rows.
Can anyone help me with why this is happening?
Thanks in advance.
import pandas as pd

df_email = pd.read_csv('emails.csv', nrows=800)
test_email = df_email.iloc[:, -1]
list_of_emails = []
for i in range(len(test_email)):
    # split one email on newlines, giving a Python list of all the strings in the email
    var_email = test_email[i].split("\n")
    email = {}
    message_body = ''
    for _ in var_email:
        if ":" in _:
            # use ":" to find the elements in the list that contain ":"
            var_sentence = _.split(":")
            for j in range(len(var_sentence)):
                if var_sentence[j].lower().strip() == "from":
                    email['from'] = var_sentence[var_sentence.index(var_sentence[j+1])].lower().strip()
                elif var_sentence[j].lower().strip() == "to":
                    email['to'] = var_sentence[var_sentence.index(var_sentence[j+1])].lower().strip()
                elif var_sentence[j].lower().strip() == 'subject':
                    if var_sentence[var_sentence.index(var_sentence[j+1])].lower().strip() == 're':
                        email['subject'] = var_sentence[var_sentence.index(var_sentence[j+2])].lower().strip()
                    else:
                        email['subject'] = var_sentence[var_sentence.index(var_sentence[j+1])].lower().strip()
        elif ":" not in _:
            message_body += _.strip()
    email['body'] = message_body
    list_of_emails.append(email)
I am not sure what you are trying to say here (you might as well put example inputs and outputs in the question), but I came across a problem, which might be of the same nature, some weeks ago.
CSV files are comma-separated, which means every comma in a line is taken as a column separator. If some dirty string input is present in your CSV file, it will mess up the columns you are expecting to have.
The best solution here is to write some code to clean up your CSV file: change its delimiter to another character (probably '|', '&', or anything else that doesn't clash with the data) and revise your code to reflect these changes, as sketched below.
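A minimal sketch of that clean-up pass with the standard csv module (the output filename and the '|' delimiter are just placeholders):

import csv

# Re-write the file with '|' as the delimiter. csv.reader honours the
# quoting in the source file, so commas inside quoted fields survive.
with open('emails.csv', newline='') as src, open('emails_pipe.csv', 'w', newline='') as dst:
    writer = csv.writer(dst, delimiter='|')
    for row in csv.reader(src):
        writer.writerow(row)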
Use the pandas library to read the file.
It is very efficient and saves you the time of writing the code yourself.
E.g.:
import pandas as pd
training_data = pd.read_csv("train.csv", sep=",", header=None)

Python: Writing peoples scores to individual lines

I have a task where I need to record people's scores in a text file. My idea was to set it out like this:
Jon: 4, 1, 3
Simon: 1, 3, 6
This has the name they entered along with their last 3 scores (only 3 should be recorded).
Now for my question: can anyone point me in the right direction to do this? I'm not asking you to write my code for me, I'm simply asking for some tips.
Thanks.
Edit: I'm guessing it would look something like this. I don't know how I'd add scores after their first one though, like above.
def File():
    score = str(Name) + ": " + str(correct)
    File = open('Test.txt', 'w+')
    File.write(score)
    File.close()

Name = input("Name: ")
correct = input("Number: ")
File()
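One thing to watch in that draft: mode 'w+' truncates Test.txt on every call, so earlier scores are lost. A rough sketch of the read-update-rewrite cycle the task needs (save_score is a hypothetical helper; the file format matches the example above):

def save_score(name, score, path='Test.txt'):
    # Load existing scores into a dict of name -> list of score strings.
    scores = {}
    try:
        with open(path) as f:
            for line in f:
                person, _, rest = line.partition(': ')
                scores[person] = rest.strip().split(', ')
    except FileNotFoundError:
        pass  # first run: no file yet
    # Append the new score and keep only the last three.
    scores.setdefault(name, []).append(str(score))
    scores[name] = scores[name][-3:]
    # Rewrite the whole file in the "Name: s1, s2, s3" format.
    with open(path, 'w') as f:
        for person, vals in scores.items():
            f.write(person + ': ' + ', '.join(vals) + '\n')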
You could use pandas' to_csv() function and store your data in a dictionary. It will be much easier than creating your own format.
from pandas import DataFrame, read_csv
import pandas as pd

def tfile(names):
    df = DataFrame(data=names, columns=names.keys())
    with open('directory', 'w') as f:  # 'directory' is the output file path
        f.write(df.to_string(index=False, header=True))

names = {}
for i in range(num_people):  # range, not xrange: input() here is Python 3; num_people is defined elsewhere
    name = input('Name: ')
    if name not in names:
        names[name] = []
    for j in range(3):
        score = input('Score: ')
        names[name].append(score)

tfile(names)
Simon  Jon
    1    4
    3    1
    6    3
This should meet your text requirement now. It converts the data to a string and then writes the string to the .txt file. If you need to read it back in, you can use pandas read_table(). Here's a link if you want to read about it.
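Reading it back in might look like this (a sketch; 'Test.txt' stands in for whatever path you wrote, and sep=r'\s+' treats any run of whitespace as one separator, matching to_string()'s column padding):

import pandas as pd

# Read the space-aligned table back into a DataFrame.
df = pd.read_table('Test.txt', sep=r'\s+')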
Since you are not asking for the exact code, here is an idea and some pointers.
Collect the last three scores per person in a list variable called last_three.
Do something like:
",".join(last_three)  # this gives you the format 4,1,3 etc.
Write to the file an entry such as:
name + ":" + ",".join(last_three)
You'll need to do this for each "line" you process.
I'd recommend using a with clause to open the file in write mode and process your data (as opposed to a bare open call), since with handles the try/except/finally bookkeeping of opening and closing file handles... So:
with open(my_file_path, "w") as f:
    for x in my_formatted_data:
        # assuming x is a list of two elements: a name and last_three (example: ["Harry", [1, 4, 5]])
        name, last_three = x
        # note: join() needs strings, so store the scores as str (or use map(str, last_three))
        f.write(name + ":" + ",".join(last_three))
        f.write("\n")  # a new line
This way you don't need to open and close the file yourself, as the with clause takes care of it for you.

Use Python's xlsxwriter module to write .srt data into an Excel file

This time I tried to use Python's xlsxwriter module to write data from a .srt file into an Excel sheet.
The subtitle file looks like this in Sublime Text:
but I want to write the data into an Excel sheet, so it looks like this:
It's my first time coding Python for this, so I'm still in the trial-and-error stage... I tried to write some code like the below,
but I don't think it makes sense...
I'll keep trying, but if you know how to do it, please let me know. I'll read your code and try to understand it! Thank you! :)
The following breaks the problem into a few pieces:
Parsing the input file. parse_subtitles is a generator that takes a source of lines and yields a sequence of records in the form {'index': 'N', 'timestamp': 'NN:NN:NN,NNN --> NN:NN:NN,NNN', 'subtitles': ['TEXT', ...]}. The approach I took was to track which of three distinct states we're in:
seeking to next entry, for when we're looking for the next index number, which should match the regular expression ^\d*$ (nothing but a bunch of digits);
looking for timestamp, for when an index has been found and we expect a timestamp in the next line, which should match the regular expression ^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$ (HH:MM:SS,mmm --> HH:MM:SS,mmm); and
reading subtitles, while consuming actual subtitle text, with blank lines and EOF interpreted as subtitle termination points.
Writing the above records to a row in a worksheet. write_dict_to_worksheet accepts a row and worksheet, plus a record and a dictionary defining the Excel 0-indexed column numbers for each of the record's keys, and then it writes the data appropriately.
Organizing the overall conversion. convert accepts an input filename (e.g. 'Wildlife.srt') that'll be opened and passed to the parse_subtitles function, and an output filename (e.g. 'Subtitle.xlsx') that will be created using xlsxwriter. It then writes a header and, for each record parsed from the input file, writes that record to the XLSX file.
Logging statements are left in for self-commenting purposes, and because when reproducing your input file I fat-fingered a : into a ; in a timestamp, making it unrecognized; having the error pop up was handy for debugging!
I've put a text version of your source file, along with the below code, in this Gist
import xlsxwriter
import re
import logging

def parse_subtitles(lines):
    line_index = re.compile(r'^\d*$')
    line_timestamp = re.compile(r'^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$')
    line_seperator = re.compile(r'^\s*$')
    current_record = {'index': None, 'timestamp': None, 'subtitles': []}
    state = 'seeking to next entry'
    for line in lines:
        line = line.strip('\n')
        if state == 'seeking to next entry':
            if line_index.match(line):
                logging.debug('Found index: {i}'.format(i=line))
                current_record['index'] = line
                state = 'looking for timestamp'
            else:
                logging.error('HUH: Expected to find an index, but instead found: [{d}]'.format(d=line))
        elif state == 'looking for timestamp':
            if line_timestamp.match(line):
                logging.debug('Found timestamp: {t}'.format(t=line))
                current_record['timestamp'] = line
                state = 'reading subtitles'
            else:
                logging.error('HUH: Expected to find a timestamp, but instead found: [{d}]'.format(d=line))
        elif state == 'reading subtitles':
            if line_seperator.match(line):
                logging.info('Blank line reached, yielding record: {r}'.format(r=current_record))
                yield current_record
                state = 'seeking to next entry'
                current_record = {'index': None, 'timestamp': None, 'subtitles': []}
            else:
                logging.debug('Appending to subtitle: {s}'.format(s=line))
                current_record['subtitles'].append(line)
        else:
            logging.error('HUH: Fell into an unknown state: `{s}`'.format(s=state))
    if state == 'reading subtitles':
        # We must have finished the file without encountering a blank line. Dump the last record
        yield current_record

def write_dict_to_worksheet(columns_for_keys, keyed_data, worksheet, row):
    """
    Write a subtitle-record to a worksheet.
    Return the row number after those that were written (since this may write multiple rows)
    """
    current_row = row
    # First, horizontally write the entry and timecode
    for (colname, colindex) in columns_for_keys.items():
        if colname != 'subtitles':
            worksheet.write(current_row, colindex, keyed_data[colname])
    # Next, vertically write the subtitle data
    subtitle_column = columns_for_keys['subtitles']
    for morelines in keyed_data['subtitles']:
        worksheet.write(current_row, subtitle_column, morelines)
        current_row += 1
    return current_row

def convert(input_filename, output_filename):
    workbook = xlsxwriter.Workbook(output_filename)
    worksheet = workbook.add_worksheet('subtitles')
    columns = {'index': 0, 'timestamp': 1, 'subtitles': 2}
    next_available_row = 0
    records_processed = 0
    headings = {'index': "Entries", 'timestamp': "Timecodes", 'subtitles': ["Subtitles"]}
    next_available_row = write_dict_to_worksheet(columns, headings, worksheet, next_available_row)
    with open(input_filename) as textfile:
        for record in parse_subtitles(textfile):
            next_available_row = write_dict_to_worksheet(columns, record, worksheet, next_available_row)
            records_processed += 1
    print('Done converting {inp} to {outp}. {n} subtitle entries found. {m} rows written'.format(inp=input_filename, outp=output_filename, n=records_processed, m=next_available_row))
    workbook.close()

convert(input_filename='Wildlife.srt', output_filename='Subtitle.xlsx')
Edit: Updated to split multiline subtitles across multiple rows in output
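If you want to see the DEBUG/INFO messages mentioned above while testing, a one-line logging setup is enough (a sketch; pick whichever level suits you):

import logging

# Show DEBUG and above on stderr so the parser's state transitions are visible.
logging.basicConfig(level=logging.DEBUG)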
