Python write to a text file after certain column

I am using the following code:
f.write(str(foo) + ' ' + str(bar) + '\n')
The problem is that the number of letters in foo is different for each value and I get the following output:
Account Category DORMANT
Last Made Update 21/12/2013
Mortgages Partly Satisfied 0
The problem is that I use the same single space (' ') for every row, so when the label is longer, as with Mortgages Partly Satisfied, the value 0 ends up further to the right. What I would like the output to be is:
Account Category            DORMANT
Last Made Update            21/12/2013
Mortgages Partly Satisfied  0
My question is: Is there a way to insert the second value, bar, at a fixed column so that the values are always aligned?
I hope I was clear enough.

It's probably best to use string formatting with the str.format method, like so:
items = [
('Account Category', 'DORMANT'),
('Last Made Update', '21/12/2013'),
('Mortgages Partly Satisfied', '0'),
]
for label, value in items:
    f.write('{:28} {}\n'.format(label, value))
The :28 is the width specifier: it pads the label to at least 28 characters (strings are left-aligned by default). See the format string docs for more info.
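If the labels are not known in advance, the width does not have to be hard-coded. The following is only a minimal sketch, assuming all pairs fit in memory; `format_rows` and its `gap` parameter are hypothetical names, not part of the answer above.

```python
def format_rows(items, gap=2):
    """Return lines in which every value starts in the same column."""
    # Width of the longest label plus a gap (just the gap for empty input).
    width = max((len(str(label)) for label, _ in items), default=0) + gap
    # A nested replacement field lets the width be supplied at run time.
    return ['{:{w}}{}'.format(str(label), value, w=width)
            for label, value in items]


items = [
    ('Account Category', 'DORMANT'),
    ('Last Made Update', '21/12/2013'),
    ('Mortgages Partly Satisfied', 0),
]
for line in format_rows(items):
    print(line)
```

When writing to a file, each returned line can be passed to f.write(line + '\n').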

Python lets you add padding for strings by specifying the number of characters a given field should use. This can be used when writing to your file as follows:
data = [["Account Category", "DORMANT"], ["Last Made Update", "21/12/2013"], ["Mortgages Partly Satisfied", "0"]]
with open('output.txt', 'w') as f:
    for v1, v2 in data:
        f.write("{:28} {}\n".format(v1, v2))
Giving you:
Account Category             DORMANT
Last Made Update             21/12/2013
Mortgages Partly Satisfied   0

You can use the str.ljust method, which returns the string left-justified in a field of the given width.
Try it:
f.write(str(foo).ljust(40) + str(bar) + '\n')
You can also check the other string methods in the docs.
This gives you the following output:
Last Made Update                        21/12/2013
Account Category                        DORMANT
Mortgages Partly Satisfied              0

Related

Google Kickstart 2014 Round D Sort a scrambled itinerary - Do I need to bring the input in a ready-to-use array format?

Problem:
Once upon a day, Mary bought a one-way ticket from somewhere to somewhere with some flight transfers.
For example: SFO->DFW DFW->JFK JFK->MIA MIA->ORD.
Obviously, transfer flights at a city twice or more doesn't make any sense. So Mary will not do that.
Unfortunately, after she received the tickets, she messed up the tickets and she forgot the order of the ticket.
Help Mary rearrange the tickets to make the tickets in correct order.
Input:
The first line contains the number of test cases T, after which T cases follow.
Each case starts with an integer N, followed by N flight tickets.
Each ticket takes 2 lines: the source on the first line and the destination on the second.
Output:
For each test case, output one line containing "Case #x: itinerary", where x is the test case number (starting from 1) and the itinerary is a sorted list of flight tickets that represent the actual itinerary.
Each flight segment in the itinerary should be outputted as pair of source-destination airport codes.
Sample Input:
2
1
SFO
DFW
4
MIA
ORD
DFW
JFK
SFO
DFW
JFK
MIA
Sample Output:
Case #1: SFO-DFW
Case #2: SFO-DFW DFW-JFK JFK-MIA MIA-ORD
My question:
I am a beginner in the field of competitive programming. My question is how to interpret the given input in this case. How did Googlers program this input? When I write a function with a Python array as its argument, will this argument be in a ready-to-use array format or will I need to deal with the above mentioned T and N numbers in the input and then arrange airport strings in an array format to make it ready to be passed in the function's argument?
I have looked up at the following Google Kickstart's official Python solution to this problem and was confused how they simply pass the ticket_list argument in the function. Don't they need to clear the input from the numbers T and N and then arrange the airport strings into an array, as I have explained above?
Also, I could not understand how the attributes first and second can simply appear when no class has been defined. But I think this should be another question...
def print_itinerary(ticket_list):
    arrival_map = {}
    destination_map = {}
    for ticket in ticket_list:
        arrival_map[ticket.second] += 1
        destination_map[ticket.first] += 1
    current = FindStart(arrival_map)
    while current in destination_map:
        next = destination_map[current]
        print current + "-" + next
        current = next

You need to read the data from standard input and write the results to standard output yourself; your function is not handed a ready-made list.
Sample code for reading from standard input and writing to standard output can be found in the coding section of the FAQ on the Kick Start web site.
If you write the solution to this problem in Python, you can get T and N as follows.
T = int(input())
for t in range(1, T + 1):
    N = int(input())
    ...
Then if you want to get the source and destination of the flight ticket as a list, you can use the same input method to get them in the list.
ticket_list = [[input(), input()] for _ in range(N)]
# [['MIA', 'ORD'], ['DFW', 'JFK'], ['SFO', 'DFW'], ['JFK', 'MIA']]
If you want to use first and second, try a namedtuple.
from collections import namedtuple
Pair = namedtuple('Pair', ['first', 'second'])
ticket_list = [Pair(input(), input()) for _ in range(N)]
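Putting the pieces together, a complete solution might look roughly like the sketch below. It is not the official solution: the function names `read_cases`, `order_tickets` and `solve` are made up for illustration, it assumes every case has at least one ticket and that the tickets form a single chain, and it reads from an `io.StringIO` holding the sample input so that it can be run without a judge.

```python
import io
from collections import namedtuple

Pair = namedtuple('Pair', ['first', 'second'])


def read_cases(stream):
    """Yield one list of Pair tickets per test case."""
    T = int(stream.readline())
    for _ in range(T):
        N = int(stream.readline())
        # Each ticket is two lines: source, then destination.
        yield [Pair(stream.readline().strip(), stream.readline().strip())
               for _ in range(N)]


def order_tickets(ticket_list):
    """Return the tickets in travel order."""
    next_stop = {t.first: t.second for t in ticket_list}
    destinations = set(next_stop.values())
    # The starting airport is the only source that is never a destination.
    current = next(src for src in next_stop if src not in destinations)
    ordered = []
    while current in next_stop:
        ordered.append(Pair(current, next_stop[current]))
        current = next_stop[current]
    return ordered


def solve(stream):
    """Build the 'Case #x: ...' output line for every test case."""
    lines = []
    for x, tickets in enumerate(read_cases(stream), start=1):
        itinerary = ' '.join(f'{t.first}-{t.second}'
                             for t in order_tickets(tickets))
        lines.append(f'Case #{x}: {itinerary}')
    return lines


sample = "2\n1\nSFO\nDFW\n4\nMIA\nORD\nDFW\nJFK\nSFO\nDFW\nJFK\nMIA\n"
for line in solve(io.StringIO(sample)):
    print(line)
```

In an actual submission you would pass sys.stdin to solve instead of the StringIO object.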

String Formatting Dollar Sign In Python

So I'm trying to get the dollar sign to appear directly to the left of the digits under the column 'Cost'. I'm having trouble figuring this out and any help is appreciated.
# Question 5
print("\nQuestion 5.")
# Get amount of each type of ticket to be purchased
adultTickets = input("\nHow many adult tickets you want to order: ")
adultCost = int(adultTickets) * 50.5
childTickets = input("How many children (>=10 years old) tickets: ")
childCost = int(childTickets) * 10.5
youngChildTickets = input("How many children (<10 years old) tickets: ")
# Display ticket info
print("\n{0:<17} {1:>17} {2:>12}".format("Type", "Number of tickets",
                                         "Cost"))
print("{0:<17} {1:>17}${2:>12.2f}".format("Adult", adultTickets, adultCost))
print("{0:<17} {1:>17}${2:>12.2f}".format("Children (>=10)", childTickets,
                                          childCost))
print("{0:<17} {1:>17} {2:>12}".format("Children (<10)", youngChildTickets,
                                       "free"))
# Calculate total cost and total amount of tickets
totalTickets = (int(adultTickets) + int(childTickets) +
                int(youngChildTickets))
totalCost = adultCost + childCost
print("{0:<17} {1:>17}${2:>12.2f}".format("Total", totalTickets, totalCost))
I also want the cost column to be formatted right, which for some reason it isn't when I run the program.
(My output):
Edit: I also can't format the '$' onto the cost, as I have to keep the 2 decimal places in the formatting

If I understand correctly, you want to format a floating point number into a string, and have a dollar sign appear in the output in between the number and the padding that you use to space it out in the string. For example, you want to be able to create these two strings with the same formatting code (just different values):
foo:   $1.23
foo:  $12.34
Unfortunately, you can't do this with just one string formatting operation. When you apply padding to a number, the padding characters will appear in between the number and any prefixing text, like the dollar signs in your current code. You probably need to format the number in two steps. First make the numbers into strings prefixed with the dollar signs, then format again to insert the dollar strings into the final string with the appropriate padding.
Here's how I'd produce the example strings above:
a = 1.23
b = 12.34
a_dollars = "${:.2f}".format(a) # make strings with leading dollar sign
b_dollars = "${:.2f}".format(b)
a_padded = "foo:{:>8}".format(a_dollars) # insert them into strings with padding
b_padded = "foo:{:>8}".format(b_dollars)

I had the same issue, and I was able to solve it with a string like this:
print("%5d" % month,"%3s"%'$',"%12.2f" % balance,"%4s"%'$',"%11.2f" % interest,"%4s"%'$',"%13.2f" % principal)
It printed this in a nice even table.
Month | Current Balance | Interest Owned | Principal Owned
1 $ 9000.00 $ 90.00 $ 360.00
This indented the $ to the start of each table column, while also leaving enough space for very large figures. It's not ideal though since it would be nice to nest it right next to the dollar number.

Is there a way to modify a string to remove a decimal?

I have a folder with a lot of images. Each image is named something like:
100304.jpg
100305.jpg
100306.jpg
etc...
I also have a spreadsheet. Each image is a row; the first value in the row is the name, and the values after it are various decimals and 0's that describe features of each image.
The issue is that when I pull the name from the sheet, something adds a decimal to it, which then prevents shutil.move() from moving the file.
import xlrd
import shutil
dataLocation = "C:/Users/User/Documents/Python/Project/sort_solutions_rev1.xlsx"
imageLocBase = "C:/Users/User/Documents/Python/Project/unsorted"
print("Specify which folder to put images in. Type the number only.")
print("1")
print("2")
print("3")
typeOfSet = int(input(""))
#Sorting for folder 1
if int(typeOfSet) == 1:
    #Identifying what to move
    name = str(sheet.cell(int(nameRow), 0).value)
    sortDataStorage = (sheet.cell(int(nameRow), 8).value) #float
    sortDataStorageNoFloat = str(sortDataStorage) #non-float
    print("Processing: " + name)
    print(name + " has a correlation of " + (sortDataStorageNoFloat))
    #sorting for this folder utilizes the information in column 8
    if sortDataStorage >= sortAc:
        print("test success")
        folderPath = "C:/Users/User/Documents/Python/Project/Image Folder/Folder1"
        shutil.move(imageLocBase + "/" + name, folderPath)
        print(name + " has been sorted.")
    else:
        print(name + " does not meet correlation requirement. Moving to next image.")
The issue I'm having occurs with the shutil.move(imageLocBase + "/" +name, folderPath)
For some reason my code takes the name from the spreadsheet (e.g. 100304) and adds ".0" to it, so it tries to move 100304.0 (which doesn't exist) instead of 100304.

Using pandas to read your Excel file.
As suggested in a comment on the original question, here is a quick example of how to use pandas to read your Excel file, along with an example of the data structure.
Any questions, feel free to shout, or have a look into the docs.
import pandas as pd
# My path looks a little different as I'm on Linux.
path = '~/Desktop/so/MyImages.xlsx'
df = pd.read_excel(path)
Data Structure
This is completely contrived as I don't have an example of your actual file.
   IMAGE_NAME  FEATURE_1  FEATURE_2  FEATURE_3
0  100304.jpg     0.0111      0.111      1.111
1  100305.jpg     0.0222      0.222      2.222
2  100306.jpg     0.0333      0.333      3.333
Hope this helps get you started.
Suggestion:
Excel likes to think it's clever and does 'unexpected' things, as you're experiencing with the decimal (data type) issue. Perhaps consider storing your image data in a database (SQLite) or as a plain old CSV file. Pandas can read from either of these as well! :-)

splitOn = '.'
nameOfFile = name.split(splitOn, 1)[0]
This should work.
Take your file name, e.g. 12345.0, and store it in a variable as a string:
name = "12345.0"
Now we need to split this string, in this case on the dot, so we save the separator in a second variable:
splitOn = '.'
Then we call Python's str.split method on name, passing the separator and a maximum of one split:
nameOfFile = name.split(splitOn, 1)[0]
Read literally: take 12345.0, split it at the dot, make only one split, and get back a list with two values, '12345' at position 0 (1st value) and '0' at position 1 (2nd value). We keep the first value; since all lists are 0-based, we ask for [0].
(If you ever get confused with lists, arrays etc., just start counting from 0 instead of 1 on your fingers: positions 0 1 2 3 4 = 1st, 2nd, 3rd, 4th, 5th value.)
So:
name = "12345.0"
splitOn = '.'
nameOfFile = name.split(splitOn, 1)[0]
print(nameOfFile)
The output will be
12345
I hope that helps.
https://www.geeksforgeeks.org/python-string-split/
OR
as highlighted below, convert the float to an int:
https://www.geeksforgeeks.org/type-conversion-python/
If name is stored as a float, e.g. 12345.0:
newName = int(name)
This drops the fractional part (which is 0 here, so nothing is lost).
OR
if the float is stored as a string:
print(int(float(name)))

Apparently the value you retrieve from the spreadsheet comes parsed as a float, so when you cast it to string it retains the decimal part.
You can trim the “.0” from the string value, or cast it to integer before casting to string.
You could also check the spreadsheet's cell format and make sure the column is stored as something other than a number (I don't know the exact setting). With that fixed, your data probably won't come with the .0 anymore.
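The "cast it to integer before casting to string" option can be sketched as a small helper. `cell_to_name` is a hypothetical name, and the sketch assumes the cell holds either a whole-number float (as the question describes) or text.

```python
def cell_to_name(value):
    """Turn a spreadsheet cell value into text without a trailing '.0'."""
    # Whole-number floats such as 100304.0 become '100304'.
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    # Anything else is returned as text with surrounding whitespace removed.
    return str(value).strip()


print(cell_to_name(100304.0))      # a numeric cell
print(cell_to_name('100305.jpg'))  # a text cell is left alone
```

In the question's code this would replace the plain str(...) call around sheet.cell(...).value. If the sheet stores only the number, the ".jpg" extension still has to be appended before calling shutil.move.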

If ".0" is always added to the end of the value, you can read the string name this way:
shutil.move(imageLocBase + "/" + name[:-2], folderPath)
A string is like a list: we can choose which elements to read.
This method is called slicing.
Sorry for my English. Bye

All these people have taken time to reply, please out of politeness rate the replies.

Trying to format a string into columns with python

I am trying to format a string to display two columns for a high score table. Python is able to do this well when using print
print '{0:2d} {1:3d} {2:4d}'.format(x, x*x, x*x*x)
but with a formatted string it seems to become more challenging. This is the result I am trying to get:
#for name, score in list: prints the following
1. FirstName LastName    45000
2. First LastName        78000
3. Fst Lst               11123
4. Name Name             40404
5. llll lll              12345
This is all one string that goes into a pygtk label. Currently I have this:
score_string += "%i. %-15.12s\t%15s\n" % (index, name, score)
which yields untrustworthy results. My current test data is displayed as the following:
1. Firstname Lastname 49900
2. First Last 93000
3. Name Name 6400
Because the first name in that list is longer than the rest (in width, not count of characters) the tab forces the score out of position. Is there a way to do this that not only takes the length of the string, but the width of the string into account as well?

Is something like this acceptable?
>>> names = ["Firstname Lastname", "First Last", "Name Name"]
>>> scores = [49900, 93000, 6400]
>>> for i,v in enumerate(zip(names, scores)):
...     name, score = v[0], v[1]
...     print "% *d. % -*s %d" % (3, i, 30, name, score)
...
  0. Firstname Lastname             49900
  1. First Last                     93000
  2. Name Name                      6400
Here the "name" field is padded with spaces to a max width of 30 characters.
Edit: I now see that the width of the font is also a problem. I did not realize that at first from your question. I'll leave this up in case future Googlers end up here for a different reason.

You have two options:
Change the font of the PyGTK label to a font that has equal width characters (not unreasonable in a game, it reminds us of the old arcade days). You can do this with set_markup of pango.Layout.
Use two labels next to each other and use the method set_alignment of the class pango.Layout. The first label aligns the name to the left, the second label contains the score and aligns it to the right. As long as there is enough space, the names and scores will align nicely to the left and right respectively.

Schwartzian sort example in "Text Processing in Python"

I was browsing through "Text Processing in Python" and tried its example about Schwartzian sort.
I used following structure for sample data which also contains empty lines. I sorted this data by fifth column:
383230 -49 -78 1 100034 '06 text' 9562 'text' 720 'text' 867
335067 -152 -18 3 100030 'text' 2400 'text' 2342 'text' 696
136592 21 230 3 100035 '03. text' 10368 'text' 1838 'text' 977
Code used for Schwartzian sorting:
for n in range(len(lines)):       # Create the transform
    lst = string.split(lines[n])
    if len(lst) >= 4:             # Tuple w/ sort info first
        lines[n] = (lst[4], lines[n])
    else:                         # Short lines to end
        lines[n] = (['\377'], lines[n])
lines.sort()                      # Native sort
for n in range(len(lines)):       # Restore original lines
    lines[n] = lines[n][1]
open('tmp.schwartzian','w').writelines(lines)
I don't get how the author intended short or empty lines to go to the end of the file with this code. The lines are sorted after the if-else structure, which raises the empty lines to the top of the file. Short lines of course work as intended with the custom sort (the fourth_word function) as implemented in the example.
This is now bugging me, so any ideas? If I'm correct about this then how would you ensure that short lines actually stay at end of file?
EDIT: I noticed the square brackets around '\377'. This messed up sort() so I removed those brackets and output started working.
else:                         # Short lines to end
    lines[n] = (['\377'], lines[n])
    print type(lines[n][0])
>>> <type 'list'>
I accepted nosklo's answer for good clarification about the meaning of '\377' and for his improved algorithm. Many thanks for the other answers also!
If curious, I used 2 MB sample file which took 0.95 secs with the custom sort and 0.09 with the Schwartzian sort while creating identical output files. It works!

Not directly related to the question, but note that in recent versions of Python (since 2.4), the transform and untransform can be performed automatically using the key argument to sort() or sorted(), e.g.:
import string

def key_func(line):
    lst = string.split(line)
    if len(lst) >= 4:
        return lst[4]
    else:
        return '\377'
lines.sort(key=key_func)

I don't know what the question is, so I'll try to clarify things in a general way.
This algorithm sorts lines by taking the fifth field (lst[4]) and placing it in front of each line. The built-in sort() then uses this field to sort, and afterwards the original line is restored.
Lines that are empty or too short (fewer than 4 fields, according to the test) fall into the else part of this structure:
if len(lst) >= 4:             # Tuple w/ sort info first
    lines[n] = (lst[4], lines[n])
else:                         # Short lines to end
    lines[n] = (['\377'], lines[n])
It puts ['\377'] in the first position of the tuple to sort. The algorithm does that in the hope that '\377' (byte value 255, the highest single-byte character; ASCII itself ends at 127) will be bigger than any string found in the 5th field, so the original line should go to the bottom when sorting.
I hope that clarifies the question. If not, perhaps you should indicate exactly what it is that you want to know.
A better, generic version of the same algorithm:
def sort_by_field(list_of_str, field_number, separator=' ', defaultvalue='\xFF'):
    # decorates each value:
    for i, line in enumerate(list_of_str):
        fields = line.split(separator)
        try:
            # places original line as second item:
            list_of_str[i] = (fields[field_number], line)
        except IndexError:
            list_of_str[i] = (defaultvalue, line)
    list_of_str.sort() # sorts list, in place
    # undecorates values:
    for i, group in enumerate(list_of_str):
        list_of_str[i] = group[1] # the second item is original line
The algorithm you provided is equivalent to this one.

An empty line won't pass the test
if len(lst) >= 4:
so it will have ['\377'] as its sort key, not the 5th column of your data, which is lst[4] ( lst[0] is the first column).

Well, it will sort short lines almost at the end, but not quite always.
Actually, both the "naive" and the schwartzian version are flawed (in different ways). Nosklo and wbg already explained the algorithm, and you probably learn more if you try to find the error in the schwartzian version yourself, therefore I will give you only a hint for now:
Long lines that contain certain text
in the fourth column will sort later
than short lines.
Add a comment if you need more help.

Although the use of the Schwartzian transform is pretty outdated in Python, it is worth mentioning that you could have written the code this way to avoid a line whose lst[4] starts with \377 being sorted into the wrong place:
for n in range(len(lines)):
    lst = lines[n].split()
    if len(lst)>4:
        lines[n] = ((0, lst[4]), lines[n])
    else:
        lines[n] = ((1,), lines[n])
Since tuples are compared elementwise, the tuples starting with 1 will always be sorted to the bottom.
Also note that the test should be len(lst) > 4 instead of >= 4, because lst[4] only exists when the line has at least five fields.
The same logic applies when using the modern equivalent AKA the key= function
def key_func(line):
    lst = line.split()
    if len(lst)>4:
        return 0, lst[4]
    else:
        return 1,
lines.sort(key=key_func)
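For a quick check in Python 3, the key function can be run against a few lines modelled on the question's sample data. This is only a sketch: the shortened data lines and the name `fifth_field_key` are made up for illustration, and the field is compared as text, as in the original code.

```python
def fifth_field_key(line):
    """Sort key: (0, fifth field) for full lines, (1,) for short or empty ones."""
    fields = line.split()
    if len(fields) > 4:
        return (0, fields[4])
    # Tuples compare element by element, so (1,) sorts after every (0, ...).
    return (1,)


lines = [
    "383230 -49 -78 1 100034 '06 text' 9562\n",
    "\n",
    "335067 -152 -18 3 100030 'text' 2400\n",
    "short line\n",
    "136592 21 230 3 100035 '03. text' 10368\n",
]
ordered = sorted(lines, key=fifth_field_key)
print(''.join(ordered), end='')
```

Because Python's sort is stable, the short and empty lines keep their original relative order at the end of the output.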
