is there a way to modify a string to remove a decimal? - python

I have a file with a lot of images. Each image is named something like:
100304.jpg
100305.jpg
100306.jpg
etc...
I also have a spreadsheet. Each image is a row: the first value in the row is the name, and the values after the name are various decimals and 0's that describe features of each image.
The issue is that when I pull the name from the sheet, something adds a decimal, which then prevents the file from being moved via shutil.move().
import xlrd
import shutil

dataLocation = "C:/Users/User/Documents/Python/Project/sort_solutions_rev1.xlsx"
imageLocBase = "C:/Users/User/Documents/Python/Project/unsorted"
print("Specify which folder to put images in. Type the number only.")
print("1")
print("2")
print("3")
typeOfSet = input("")
#Sorting for folder 1
if int(typeOfSet) == 1:
    #Identifying what to move
    name = str(sheet.cell(int(nameRow), 0).value)
    sortDataStorage = (sheet.cell(int(nameRow), 8).value) #float
    sortDataStorageNoFloat = str(sortDataStorage) #non-float
    print("Proccessing: " + name)
    print(name + " has a correlation of " + (sortDataStorageNoFloat))
    #sorting for this folder utilizes the information in column 8)
    if sortDataStorage >= sortAc:
        print("test success")
        folderPath = "C:/Users/User/Documents/Python/Project/Image Folder/Folder1"
        shutil.move(imageLocBase + "/" + name, folderPath)
        print(name + " has been sorted.")
    else:
        print(name + " does not meet correlation requirement. Moving to next image.")
The issue I'm having occurs with the shutil.move(imageLocBase + "/" +name, folderPath)
For some reason my code takes the name from the spreadsheet (ex: 100304) and then adds a ".0", so when trying to move a file, it tries to move 100304.0 (which doesn't exist) instead of 100304.

Using pandas to read your Excel file.
As suggested in a comment on the original question, here is a quick example of how to use pandas to read your Excel file, along with an example of the data structure.
Any questions, feel free to shout, or have a look into the docs.
import pandas as pd
# My path looks a little different as I'm on Linux.
path = '~/Desktop/so/MyImages.xlsx'
df = pd.read_excel(path)
Data Structure
This is completely contrived as I don't have an example of your actual file.
   IMAGE_NAME  FEATURE_1  FEATURE_2  FEATURE_3
0  100304.jpg     0.0111      0.111      1.111
1  100305.jpg     0.0222      0.222      2.222
2  100306.jpg     0.0333      0.333      3.333
Hope this helps get you started.
Suggestion:
Excel likes to think it's clever and does 'unexpected' things, as you're experiencing with the decimal (data type) issue. Perhaps consider storing your image data in a database (SQLite) or as a plain old CSV file. Pandas can read from either of these as well! :-)
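A rough sketch of the CSV suggestion, using the standard library's csv module rather than pandas. The column names and sample rows are invented for illustration and only mirror the contrived table above. Because csv hands back every field as a string, the image name never passes through a float:

```python
import csv
import io

# Hypothetical CSV export of the spreadsheet; column names are made up.
sample = io.StringIO(
    "IMAGE_NAME,FEATURE_1,FEATURE_2\n"
    "100304,0.0111,0.111\n"
    "100305,0.0222,0.222\n"
)

rows = list(csv.DictReader(sample))
names = [row["IMAGE_NAME"] for row in rows]           # stays text, no ".0"
features = [float(row["FEATURE_1"]) for row in rows]  # convert only what you need

print(names)
```

With a real file you would pass an open file object to csv.DictReader instead of the io.StringIO stand-in.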

splitOn = '.'
nameOfFile = name.split(splitOn, 1)[0]
should work.
If we take your file name, e.g. 12345.0, and store it in a variable as a string:
name = "12345.0"
Now we need to split this variable, in this case on the "." character, so we save that separator in a second variable:
splitOn = '.'
Then we call Python's .split() on the text (the variable name), passing the separator and the maximum number of splits.
To make it literal:
12345.0
split at .
make only one split and return the two pieces in a list
(so we have 12345 at position 0, the 1st value,
and 0 at position 1, the 2nd value)
keep the 1st piece
(as all lists are 0-based, we ask for [0];
if you ever get confused with lists, arrays, etc., just start counting
from 0 instead of 1 on your fingers and then you know,
i.e. positions 0 1 2 3 4 = 1st value, 2nd value, 3rd value, 4th value, 5th value)
nameOfFile = name.split(splitOn, 1)[0]
12345.0 split (split on ".", only one split), keep position 0, i.e. the first value.
So.....
name = "12345.0"
splitOn = '.'
nameOfFile = name.split(splitOn, 1)[0]
print(nameOfFile)
The output will be
12345
I hope that helps
https://www.geeksforgeeks.org/python-string-split/
OR
as highlighted below, convert the float to an int
https://www.geeksforgeeks.org/type-conversion-python/
If the value is stored as a float:
name = 12345.0
newName = int(name)
int() drops the fractional part (it truncates toward zero), and since that part is .0, nothing is lost.
OR
if the float is stored as a string:
print(int(float(name)))

Apparently the value you retrieve from the spreadsheet comes parsed as a float, so when you cast it to string it retains the decimal part.
You can trim the “.0” from the string value, or cast it to integer before casting to string.
You could also check the spreadsheet's cell format and make sure it is not a number format (I don't know the exact setting, but something like text). With that fixed, your data probably won't come with the .0 anymore.
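The "cast it to integer before casting to string" option could look roughly like this. The helper name cell_to_name is hypothetical, and the sketch assumes numeric cells arrive as Python floats while text cells arrive as strings:

```python
def cell_to_name(value):
    """Return a cell value as a file-name stem without a trailing '.0'."""
    # Numeric cells are assumed to arrive as floats such as 100304.0.
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    # Text cells (or non-whole numbers) are passed through unchanged.
    return str(value)


print(cell_to_name(100304.0))
print(cell_to_name("100304"))
```

In the question's code this would replace the plain str(...) call around sheet.cell(...).value.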

If ".0" is always added to the end of the variable, you need to read the string "name" in this way:
shutil.move(imageLocBase + "/" + name[:-2], folderPath)
A string is like a list: we can choose which elements to read.
This method is called slicing.
Sorry for my English. Bye
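Slicing off the last two characters assumes the ".0" is always there. A slightly more defensive sketch (the function name strip_point_zero is made up for illustration) checks for the suffix first:

```python
def strip_point_zero(name):
    """Remove a trailing '.0' only when it is actually present."""
    if name.endswith(".0"):
        return name[:-2]  # slice off the last two characters
    return name


print(strip_point_zero("100304.0"))
print(strip_point_zero("100304"))
```

This way a name that was already clean is not shortened by accident.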

All these people have taken time to reply, please out of politeness rate the replies.

Related

Turning a text file into a tabular format [duplicate]

This question already has answers here:
How do I print parameters of multiple objects in table form? [duplicate]
(2 answers)
Line up columns of numbers (print output in table format)
(7 answers)
Closed 4 years ago.
I'm having issues trying to properly format a text file to fit the needed criteria for a school project. I've been stuck on it for a while and I'm still very new to coding and wanted to know if anyone has an answer that I can understand and implement, hopefully I can learn from those much more experienced.
I want to convert a text file that can be entered by a user that looks like this within the file:
Lennon 12 3.33
McCartney 57 7
Harrison 11 9.1
Starr 3 4.13
and create it to fit a tabular format like this:
Name          Hours    Total Pay
Lambert          34       357.00
Osborne          22       137.50
Giacometti        5       503.50
I can create the headers, though it may not be pretty code, but when I print the contents of the test file it usually turns out like this:
Name Hour Total pay
Lennon 12 3.33
McCartney 57 7
Harrison 11 9.1
Starr 3 4.13
And I don't understand how to properly format it to look like a proper table that's right justified and properly in line with the actual headers, I'm not sure how to really tackle it or where to even start as I haven't made any real ground on this.
I've gutted my code and broken it down to just the skeleton after trying things like file_open.read().rstrip("\n") and .format, which made a mess of the indexes and sometimes somehow left only single letters appearing:
file_name = input("Enter the file name: ")
print("Name" + " " * 12 + "Hour" + " " * 6 + "Total pay")
with open(file_name, 'r') as f:
    for line in f:
        print(line, end='')
I know it looks simple, and that's because it is. Our instructor wanted us to work with the "open" command and try to stay away from things that could make it less readable, while keeping it as compact as possible. This rules out importing third-party tools, which shot down the chance to use things like beautifultable that a few friends offered as an easier way out.
I had a classmate say to read the lines that turns it into a list and adjust it from there with some formatting, and another classmate said I could probably format it without listing it; although I found that the newline character "\n" appears at the end of each list index if turning it into a list
ex: ['Lennon 12 3.33\n', 'McCartney 57 7\n', 'Harrison 11 9.1\n', 'Starr 3 4.13']
Though what I don't understand is how to format the things that are within the list so that the name can be separated from each of the number variables and in line with the header as I don't have much experience with for loops that many say can be an easy fix within my class if I have that down pat.
I'm not exactly looking for straight coded answers, but rather a point in the right direction or where to read up on how to manipulate listed content
Here's something to get you headed in the right direction:
data_filename = 'employees.txt'
headers = 'Name', 'Hours', 'Rate'  # Column names.

# Read the data from file into a list-of-lists table.
with open(data_filename) as file:
    datatable = [line.split() for line in file.read().splitlines()]

# Find the longest data value or header to be printed in each column.
widths = [max(len(value) for value in col)
          for col in zip(*(datatable + [headers]))]

# Print heading followed by the data in datatable.
# (Uses '>' to right-justify the data in some columns.)
format_spec = '{:{widths[0]}} {:>{widths[1]}} {:>{widths[2]}}'
print(format_spec.format(*headers, widths=widths))
for fields in datatable:
    print(format_spec.format(*fields, widths=widths))
Output:
Name      Hours Rate
Lennon       12 3.33
McCartney    57    7
Harrison     11  9.1
Starr         3 4.13
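If the third column should be a total rather than the raw value from the file, a variation with f-strings might look like the sketch below. It assumes that total pay is hours multiplied by the rate in the file, which the question does not state outright; the column widths and the function name format_table are also arbitrary choices:

```python
import io

# Sample data standing in for the user's file (same rows as the question).
sample = io.StringIO(
    "Lennon 12 3.33\nMcCartney 57 7\nHarrison 11 9.1\nStarr 3 4.13\n"
)


def format_table(lines):
    """Return aligned table lines; total pay = hours * rate is an assumption."""
    out = [f"{'Name':<12}{'Hours':>6}{'Total Pay':>12}"]
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        name, hours, rate = line.split()
        total = int(hours) * float(rate)
        # '<' left-justifies the name, '>' right-justifies the numbers.
        out.append(f"{name:<12}{hours:>6}{total:>12.2f}")
    return out


table = format_table(sample)
print("\n".join(table))
```

Passing the object returned by open(file_name) instead of the io.StringIO stand-in keeps this within the "open only, no third-party tools" constraint.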
You can use pandas for this; a DataFrame will do the required job:
import pandas as pd
# The file has no header row, so supply the column names explicitly.
df = pd.read_csv('file.txt', sep=r'\s+', header=None, names=['Name', 'Hours', 'Total Pay'])
print(df)
Hope this helps.

Problems reading a file in, and editing certain contents of it

So I have a file with
first name(space)last name(tab)a grade as such.
Example
Wanda Barber 96
I'm having trouble reading this in as a list and then editing the number.
My current code is,
def TopStudents(n):
    original = open(n)
    contents = original.readlines()
    x = contents.split('/t')
    for y in x[::2]:
        y - 100
        if y > 0: (????)
Here is the point where I'm confused. I am just trying to get the first and last names of students who scored over 100%. I thought of creating a new list for students that meet this qualification, but I'm not sure how I would write the corresponding first and last name. I know I need to take the stride of every other location in the list, as odd will always be the first and last names. Thank you in advance for the help!
There are several things wrong with your code:
- The open file must be closed (#1)
- readlines must be called as a function, using () to call it (#2)
- The split uses a forward slash (/t) instead of a backslash (\t) (#3)
- The way you loop with [::2] is not optimal if you are looking to access all the members (#4)
- A for statement must end in a : (#5)
- You must store the result of that calculation somewhere (#6)
def TopStudents(n):
    original = open(n) #1
    contents = original.readlines #2
    x = contents.split('/t') #3
    for y in x[::2] #4, #5
        y - 100 #6
        if y > 0:
That said, a fixed version could be:
def TopStudents(n):
    original = open(n, 'r')
    for line in original:
        name, score = line.split('\t')
        # If needed, you could split the name into first and last name:
        # first_name, last_name = name.split(' ')
        # 'score' is a string, we must convert it to an int before comparing to one, so...
        score = int(score)
        if score > 100:
            print("The student " + name + " has the score " + str(score))
    original.close() #1 - Closed the file
Note: I have focused on readability, with several comments to help you understand the code.
I always prefer to use ‘with open()’ because it closes the file automatically. I used a txt with comma separations for simplicity for me, but you can just replace the comma with \t.
def TopStudents():
    with open('temp.txt', 'r') as original:
        contents = list(filter(None, (line.strip().strip('\n') for line in original)))
    x = list(part.split(',') for part in contents)
    for y in x:
        if int(y[1]) > 100:
            print(y[0], y[1])

TopStudents()
This opens and loads all lines into contents as a list, removing blank lines and line breaks. Then it separates into a list of lists.
You then iterate through each list in x, looking for the second value (y[1]) which is your grade. If the int() is greater than 100, print each segment of y.
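Since the question asks for a new list of the qualifying students, here is a minimal sketch that returns the names instead of printing them. It assumes tab-separated lines as described in the question; the function name top_students, the threshold parameter, and every sample row except Wanda Barber are invented for illustration:

```python
def top_students(lines, threshold=100):
    """Return the names whose tab-separated score exceeds threshold."""
    winners = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        name, score = line.split("\t")
        if int(score) > threshold:
            winners.append(name)
    return winners


# Stand-in for the lines of the file; only the first row is from the question.
sample = ["Wanda Barber\t96\n", "Ada Lovelace\t104\n", "\n", "Alan Turing\t101"]
print(top_students(sample))
```

An open file object can be passed in place of the sample list, for example inside a with open(...) block.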

Split string in multiple places

New to programming and currently working with python. I am trying to take a user inputted string (containing letters, numbers and special characters), I then need to split it multiple times at different points to reform new strings. I have done research on the splitting of strings (and lists) and feel I understand it but I still know there must be a better way to do this than I can think of.
This is what I currently have
ass=input("Enter Assembly Number: ")
#Sample Input 1 - BF90UQ70321-14
#Sample Input 2 - BS73OA91136-43
ass0=ass[0]
ass1=ass[1]
ass2=ass[2]
ass3=ass[3]
ass4=ass[4]
ass5=ass[5]
ass6=ass[6]
ass7=ass[7]
ass8=ass[8]
ass9=ass[9]
ass10=ass[10]
ass11=ass[11]
ass12=ass[12]
ass13=ass[13]
code1=ass0+ass2+ass3+ass4+ass5+ass6+ass13
code2=ass0+ass2+ass3+ass4+ass5+ass6+ass9
code3=ass1+ass4+ass6+ass7+ass12+ass6+ass13
code4=ass1+ass2+ass4+ass5+ass6+ass9+ass12
# require 21 different code variations
Please tell me that there is a better way to do this.
Thank you
Give a look to this code and Google "python string slicing" (a nice tutorial for beginners is at https://www.youtube.com/watch?v=EqAgMUPRh7U).
String (and list) slicing is used a lot in Python. Be sure to learn it well. The upper index could be not so intuitive, but it becomes second nature.
ass="ABCDEFGHIJKLMN"
code1 = ass[0] + ass[2:7] + ass[13] # ass[2:7] is to extract 5 chars starting from index 2 (7 is excluded)
code2 = ass[0] + ass[2:7] + ass[9]
code3 = ass[1] + ass[4] + ass[6:8] + ass[12] + ass[6] + ass[13]
code4 = ass[1:3] + ass[4:7] + ass[9] + ass[12]
PS: You probably need also to check if the string length is 14 before working with it.
EDIT: Second solution
Here is another solution, perhaps it is easier to follow:
def extract_chars(mask):
    chars = ""
    for i in mask:
        chars += ass[i]
    return chars

mask = [0,2,3,4,5,6,13]
print(extract_chars(mask))
Here you define a mask of indexes of the chars you want to extract.
You can try something like this,
input1 = 'BF90UQ70321-14'
code = lambda anum, pos: ''.join(anum[p] for p in pos)
code4 = code(input1, (1,2,4,5,6,9,12))
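With 21 variations required, the masks themselves could live in one dictionary so each new code is a single extra line. This is only a sketch: the names MASKS and build_codes are hypothetical, and just two of the question's masks are shown. It also includes the length check suggested above:

```python
# Index masks copied from code1 and code4 in the question; add the rest here.
MASKS = {
    "code1": (0, 2, 3, 4, 5, 6, 13),
    "code4": (1, 2, 4, 5, 6, 9, 12),
}


def build_codes(assembly, masks=MASKS, expected_len=14):
    """Build every code by picking characters of assembly at the mask indexes."""
    if len(assembly) != expected_len:
        raise ValueError("assembly number must be %d characters" % expected_len)
    return {
        label: "".join(assembly[i] for i in indexes)
        for label, indexes in masks.items()
    }


codes = build_codes("BF90UQ70321-14")
print(codes)
```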

Function takes exactly 3 arguments (1 given)? Help formatting print statement

Here are my questions:
Create a function called "numSchools" that counts the schools of a specific type. The function should have three input parameters, (1) a string for the workspace, (2) a string for the shapefile name, and (3) a string for the facility type (e.g. "HIGH SCHOOL"), and one output parameter, (1) an integer for the number of schools of that facility type in the shapefile.
import arcpy
shapefile = "Schools.shp"
work = r"c:\Scripts\Lab 6 Data"
sTyp = "HIGH SCHOOL"
def numSchools(work, shapefile, sTyp):
    whereClause = "\"FACILITY\" = 'HIGH SCHOOL' " # where clause for high schools
    field = ['FACILITY']
    searchCurs = arcpy.SearchCursor(shapefile, field, whereClause)
    row = searchCurs.next()
    for row in searchCurs:
        # using getValue() to get the name of the high school
        value = row.getValue("NAME")
    high_schools = [row[0] for row in arcpy.SearchCursor(shapefile, field, whereClause)]
    count = arcpy.GetCount_management(high_schools)
    return count
numSchools(work, shapefile, sTyp)
print ("There are a total of: "),count
So this is my code that runs perfectly, but it is accomplished by scripting. I need to wrap it into a Python function (MY WEAKNESS). It seems there are some problems with the last line of my code.
I am not quite sure how to format this last line of code to read
(there are a total of 29 high schools) while including necessary arguments.
You need to explicitly pass the arguments and store the value the function returns:
count = numSchools(work, shapefile, sTyp)
print("There are a total of: ", count)
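The same pattern, shown without arcpy so it can be run anywhere: a plain list stands in for the shapefile's FACILITY values, and count_facilities is a hypothetical name, not part of the assignment. The point is only that the function returns the count and the caller captures it before formatting the message:

```python
def count_facilities(facility_values, facility_type):
    """Count entries equal to facility_type (plain-Python stand-in for a cursor)."""
    return sum(1 for value in facility_values if value == facility_type)


# Invented sample data; a real script would read these from the shapefile.
facilities = ["HIGH SCHOOL", "ELEMENTARY", "HIGH SCHOOL", "MIDDLE SCHOOL"]

count = count_facilities(facilities, "HIGH SCHOOL")
message = "There are a total of {} high schools".format(count)
print(message)
```

A variable assigned inside a function is local to it, which is why count has to come back through return and be assigned at the call site.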

adding '+' to all the numbers as a prefix (numbers are stored in a csv file) using a python script

goal
All the numbers in the csv file that I exported from hotmail are stored as 91123456789 whereas to complete a call i need to dial +91123456789. These contacts will be converted to a batch of vcf files and exported to my phone. I want to add the + to all my contacts at the beginning.
approach
write a python script that can do this for an indefinite number of contacts.
pre-conditions
none of the numbers in the csv file will have a + in them.
problem
(a) there is a possibility that the number itself may have a 91 in it, like: +919658912365. This makes adding a plus very difficult.
explanation: I am adding this as a problem because if the 91 appeared only at the beginning of a number, we could simply check the first two digits and add the + when they match 91; otherwise we would not add the + and could move on to the next pair of digits.
(b) the fields are separated by commas. I want to add the + as a prefix only in front of the field which has the header mobile, and not in any other field where the digits 91 may appear (like landline numbers or fax numbers)
research
I tried this with Excel, but the process would take an unreasonable amount of time (like 2 hours!)
specs
I have 400 contacts.
Windows XP SP 3
please help me solve this problem.
Something like below?
import csv
for row in csv.reader(['num1, 123456789', 'num2, 987654321', 'num3, +23456789']):
    phoneNumber = row[1].strip()
    if not phoneNumber.startswith('+'):
        phoneNumber = '+' + phoneNumber
    print(phoneNumber)
You could use enumerate to test each phone number as below:
phone_numbers = ['12234', '91232324', '913746', '3453', '9145653', '95843']
for i, number in enumerate(phone_numbers):
    phone_numbers[i] = ''.join(['+', phone_numbers[i]]) if number.startswith('91') else phone_numbers[i]
Hope that helps
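To touch only the column whose header is mobile, as point (b) of the question requires, csv.DictReader and csv.DictWriter can address fields by header name. The sketch below uses in-memory text in place of the exported file, and the other column names and all sample rows are invented:

```python
import csv
import io

# Stand-in for the exported contacts file; only "mobile" comes from the question.
source = io.StringIO(
    "name,mobile,landline\n"
    "Asha,91123456789,9122334455\n"
    "Ravi,919658912365,\n"
)

reader = csv.DictReader(source)
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=reader.fieldnames, lineterminator="\n")
writer.writeheader()

for row in reader:
    number = row["mobile"].strip()
    # Prefix only non-empty mobile numbers that lack a '+'; other columns untouched.
    if number and not number.startswith("+"):
        row["mobile"] = "+" + number
    writer.writerow(row)

result = output.getvalue()
print(result)
```

Because only the start of the mobile field is checked, a 91 appearing later in the number or in the landline column does not matter.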
