I am trying to return a list of unit numbers from about 1000 CSV file names. I can read them in, have Python strip the junk from around each unit number, and replace the 5th character to get the format I need. I would like to return a list of all the unit numbers, like ['6726-0501', '6826-1144']. What I currently get is each unit number printed one by one, without being saved. I have looked through previous questions but can't get the approach of creating a list, appending the unit numbers to it, and saving that list to a variable to work. Does anyone know a good method for simply modifying this to output a list and save it for later use?
Thanks,
Robin
file_names = ['job_1106_unit_672600501_las_PN23074.LAS.csv', 'job_1108_unit_682601144_las_PN23072.LAS.csv']

def change(file_names):
    for comps in file_names:
        comps_of_comps = list(comps)
        unit_num = comps_of_comps[14:23] #[672600501]
        a = (unit_num[0:4]) #[6726]
        b = (unit_num[5:9]) #[0501]
        unit_num = a + list('-') + b #[6,7,2,6,-,0,5,0,1]
        unit_num = ''.join(unit_num) #6726-0501
        print(unit_num)

change(file_names)
You can initialize a new list, append each unit number to it, and return that list, like this:
file_names = ['job_1106_unit_672600501_las_PN23074.LAS.csv', 'job_1108_unit_682601144_las_PN23072.LAS.csv']

def change(file_names):
    result = []
    for comps in file_names:
        comps_of_comps = list(comps)
        unit_num = comps_of_comps[14:23] #[672600501]
        a = (unit_num[0:4]) #[6726]
        b = (unit_num[5:9]) #[0501]
        unit_num = a + list('-') + b #[6,7,2,6,-,0,5,0,1]
        unit_num = ''.join(unit_num) #6726-0501
        result.append(unit_num)
    return result

print(change(file_names))
OR
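import re

def change(file_names):
    result = []
    for i in file_names:
        # Capture everything between "unit_" and "_las"
        s = re.match(r'.*unit_(.*)_las.*', i).group(1)
        # Integer division (//) keeps the slice index an int on Python 3 as well
        result.append(s[:len(s)//2] + "-" + s[(len(s)//2)+1:])
    return result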
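As a variation on the regex approach, here is a minimal sketch that captures the two four-digit halves directly and skips names that do not match. The function name unit_numbers and the pattern are assumptions for illustration; the pattern assumes every unit number is nine digits with the fifth digit dropped, as in the two sample file names.

```python
import re

# Assumed layout: "unit_" + 4 digits + 1 digit to drop + 4 digits + "_"
UNIT_PATTERN = re.compile(r"unit_(\d{4})\d(\d{4})_")

def unit_numbers(file_names):
    """Return the formatted unit number for every file name that matches."""
    result = []
    for name in file_names:
        match = UNIT_PATTERN.search(name)
        if match:  # names without a recognizable unit number are skipped
            result.append(match.group(1) + "-" + match.group(2))
    return result

file_names = ['job_1106_unit_672600501_las_PN23074.LAS.csv',
              'job_1108_unit_682601144_las_PN23072.LAS.csv']
units = unit_numbers(file_names)  # the list is saved for later use
print(units)
```

Skipping non-matching names is a design choice; raising an error instead may be preferable if every file is expected to carry a unit number.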
I'm new to programming and Python, and I'm looking for a way to distinguish between two input formats in the same input text file. For example, let's say I have an input file like this, where values are comma-separated:
5
Washington,A,10
New York,B,20
Seattle,C,30
Boston,B,20
Atlanta,D,50
2
New York,5
Boston,10
The format is N followed by N lines of Data1, then M followed by M lines of Data2. I tried opening the file, reading it line by line, and storing it in one single list, but I'm not sure how to produce two lists for Data1 and Data2, such that I would get:
Data1 = ["Washington,A,10", "New York,B,20", "Seattle,C,30", "Boston,B,20", "Atlanta,D,50"]
Data2 = ["New York,5", "Boston,10"]
My initial idea was to iterate through the list until I found an integer i, remove the integer from the list and continue for the next i iterations all while storing the subsequent values in a separate list, until I found the next integer and then repeat. However, this would destroy my initial list. Is there a better way to separate the two data formats in different lists?
You could use itertools.islice and a list comprehension:
from itertools import islice

string = """
5
Washington,A,10
New York,B,20
Seattle,C,30
Boston,B,20
Atlanta,D,50
2
New York,5
Boston,10
"""

result = [[x for x in islice(parts, idx + 1, idx + 1 + int(line))]
          for parts in [string.split("\n")]
          for idx, line in enumerate(parts)
          if line.isdigit()]

print(result)
This yields
[['Washington,A,10', 'New York,B,20', 'Seattle,C,30', 'Boston,B,20', 'Atlanta,D,50'], ['New York,5', 'Boston,10']]
For a file, you need to change it to:
with open("testfile.txt", "r") as f:
    result = [[x for x in islice(parts, idx + 1, idx + 1 + int(line))]
              for parts in [f.read().split("\n")]
              for idx, line in enumerate(parts)
              if line.isdigit()]
    print(result)
You're definitely on the right track.
If you want to preserve the original list here, you don't actually have to remove integer i; you can just go on to the next item.
Code:
originalData = []
formattedData = []

with open("data.txt", "r") as f :
    f = list(f)
    originalData = f

    i = 0
    while i < len(f): # Iterate through every line
        try:
            n = int(f[i]) # See if line can be cast to an integer
            originalData[i] = n # Change string to int in original
            formattedData.append([])
            for j in range(n):
                i += 1
                item = f[i].replace('\n', '')
                originalData[i] = item # Remove newline char in original
                formattedData[-1].append(item)
        except ValueError:
            print("File has incorrect format")
        i += 1

print(originalData)
print(formattedData)
The following code produces a list, results, which is equal to [Data1, Data2].
The code assumes that each count matches exactly the number of entries that follow it. That means it will not work for a file like this:
2
New York,5
Boston,10
Seattle,30
The code:
# get the data from the text file
with open('filename.txt', 'r') as file:
    lines = file.read().splitlines()

results = []
index = 0
while index < len(lines):
    # Find the start and end values.
    start = index + 1
    end = start + int(lines[index])
    # Everything from the start up to and excluding the end index gets added
    results.append(lines[start:end])
    # Update the index
    index = end
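For comparison, the count-prefixed layout from the question can also be consumed with a single iterator, which leaves the source list untouched and reports a mismatch between a count and the lines that follow it. This is only a sketch; the helper name split_blocks is an assumption, not part of any answer above.

```python
from itertools import islice

def split_blocks(lines):
    """Split count-prefixed lines into one list per block.

    `lines` can be any iterable of strings, including an open text file.
    """
    it = iter(lines)
    blocks = []
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between blocks
        count = int(header)  # raises ValueError if a count line is missing
        # Pull exactly `count` lines from the same iterator
        block = [line.strip() for line in islice(it, count)]
        if len(block) != count:
            raise ValueError("expected %d lines, found %d" % (count, len(block)))
        blocks.append(block)
    return blocks

sample = ["5", "Washington,A,10", "New York,B,20", "Seattle,C,30",
          "Boston,B,20", "Atlanta,D,50", "2", "New York,5", "Boston,10"]
data1, data2 = split_blocks(sample)
print(data1)
print(data2)
```

With a file such as the three-entry example above (a count of 2 followed by three lines), the extra line is read as a count and int() raises ValueError, so the malformed input is detected rather than silently misparsed.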
I've been playing around with a program that takes in information from two files and then writes it out to a single file in sorted order.
What I did was store each line of the file as an element in a list. I created another function that splits each element into a 2D array where I can easily access the name variables. From there I want to create a nested for loop that, as it iterates, checks for the highest value in the array, removes that value from the list, and appends it to a new list until there is a sorted list.
I think I am about 90% of the way there, but I am having trouble wrapping my head around the logic of sorting algorithms. The problem keeps getting more complex and I keep wanting to use pointers. If someone could shed some light on the subject I would greatly appreciate it.
import os
from http.cookiejar import DAYS
from macpath import split
# This program reads a given input file and finds its longest line.

class Employee:
    def __init__(self, EmployeeID, name, wage, days):
        self.EmployeeID = EmployeeID
        self.name = name
        self.wage = wage
        self.days = days

def Extraction(file,file2):
    employList = []
    while True:
        line1 = file.readline().strip()
        line2 = file2.readline().strip()
        #print(type(line1))
        employList.append(line1)
        #print(line1)
        employList.append(line2)
        #print(line2)
        if line1 == '' or line2 == '':
            break
    return employList

def Sort(mylist):
    splitlist = []
    sortedlist = []
    print(len(mylist))
    for items in range(len(mylist)):
        #print(mylist[items].split())
        splitlist.append(mylist[items].split())
    print(splitlist)
    #print(splitlist[1][1])
    #print(splitlist[1][2])
    highest = "z"
    print(highest)
    sortingLength = len(splitlist)
    for i in range(10):
        for items in range(len(splitlist)-2):
            if highest > splitlist[items][2]:
                istrue = highest < splitlist[items][2]
                highest = splitlist[items][1]
                print(items)
                print(istrue)
                print('marker')
                print(splitlist[items][2])
            if items == (len(splitlist)-2):
                print("End of list",splitlist[items][2])
    print(highest)
    print(splitlist.index(highest))
    print(splitlist[len(splitlist)-1][2])
    print(sortingLength)

fPath = 'C:/Temp'
fileName = 'payroll1.txt'
fullFileName = os.path.join(fPath,fileName)
fileName2 = 'payroll2.txt'
fullFileName2 = os.path.join(fPath,fileName2)
f = open(fullFileName,'r')
f2 = open(fullFileName2, 'r')
employeeList = Extraction(f,f2)#pulling out each line in the file and placing into a list
Sort(employeeList)
ReportName= "List of Employees:"
marker = '-'* len(ReportName)
print (ReportName + ' \n' + marker)
total = 0
f.close()
Once I have the highest value, I am having trouble appending it to sortedlist, removing it from splitlist, and re-running the code.
Using the built-in sorted function is much easier, per Joran's suggestion. I've edited your reading function so that it builds two lists of tuples, each holding the line number, the length of the line, and a marker for the source file. sorted will return a list sorted according to the key (line length) in descending order (reverse=True).
import os
from operator import itemgetter

class Employee:
    def __init__(self, EmployeeID, name, wage, days):
        self.EmployeeID = EmployeeID
        self.name = name
        self.wage = wage
        self.days = days

def Extraction(file,file2):
    employList = []
    mylines = [(i, len(l.strip()), 'file1') for i,l in enumerate(file.readlines())]
    mylines2 = [(i, len(l.strip()), 'file2') for i,l in enumerate(file2.readlines())]
    employList = [*mylines, *mylines2]
    return employList

fPath = 'C:/Temp'
fileName = 'payroll1.txt'
fullFileName = os.path.join(fPath,fileName)
fileName2 = 'payroll2.txt'
fullFileName2 = os.path.join(fPath,fileName2)
f = open(fullFileName,'r')
f2 = open(fullFileName2, 'r')
employeeList = Extraction(f,f2)#pulling out each line in the file and placing the line_number and length into a list
f.close()
f2.close()
# Itemgetter will sort on the second element of the tuple, len(line)
# and reverse will put it in descending order
ReportName = sorted(employeeList, key=itemgetter(1), reverse=True)
EDIT: I've added markers in the tuples so that you can keep track of what lines came from what file. Might be a bit confusing without them
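If the goal is to order the merged records by employee name rather than by line length, the same sorted call works with a different key. The sketch below rests on assumptions: it guesses that each payroll line holds four whitespace-separated fields in the order of the Employee constructor (EmployeeID, name, wage, days), and the sample rows are invented purely for illustration.

```python
def parse_line(line):
    """Split one payroll line into (EmployeeID, name, wage, days).

    Assumed layout: four whitespace-separated fields, e.g. "101 Smith 15.50 5".
    """
    employee_id, name, wage, days = line.split()
    return (employee_id, name, float(wage), int(days))

def merge_and_sort(lines1, lines2, reverse=False):
    """Combine the lines of both files and sort the records by name."""
    records = [parse_line(line)
               for line in list(lines1) + list(lines2)
               if line.strip()]  # ignore blank lines
    return sorted(records, key=lambda record: record[1], reverse=reverse)

# Invented sample data standing in for payroll1.txt and payroll2.txt
payroll1 = ["101 Smith 15.50 5\n", "102 Adams 20.00 4\n"]
payroll2 = ["201 Jones 18.25 3\n"]
for record in merge_and_sort(payroll1, payroll2):
    print(record)
```

Passing reverse=True would give the highest-first order described in the question, without a hand-written selection loop.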
Here is my code:
import linecache

def option_A():
    print("Pick a Fixture!")
    fixture_choice = int(input("Enter: "))
    file = open("firesideFixtures.txt", "r")
    fixture_number = file.readlines(fixture_choice)
    fixture = [linecache.getline("firesideFixtures.txt", fixture_choice)]
    print(fixture)
    file.close()
The first line from the file I am using is:
1,02/09/15,18:00,RNGesus,Ingsoc,Y,Ingsoc
The expected result is:
1, 02/09/15, RNGesus, Ingsoc, Y, Ingsoc
The result I get:
['1,02/09/15,18:00,RNGesus,Ingsoc,Y,Ingsoc\n']
How can I do this?
Print the only element of your list by indexing into it:
print(fixture[0])
Output:
1,02/09/15,18:00,RNGesus,Ingsoc,Y,Ingsoc
Or, even better, don't create a list in the first place (note the missing []):
fixture = linecache.getline("firesideFixtures.txt", fixture_choice)
How can I remove the "18:00" part from the output because all I need is "1,02/09/15, RNGesus, Ingsoc, Y, Ingsoc" (from comment)
Now, remove the time:
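fixture = linecache.getline("firesideFixtures.txt", fixture_choice)
# strip() drops the trailing newline before splitting
parts = fixture.strip().split(',')
# Keep the first two fields, skip the time at index 2, keep the rest
res = ','.join(parts[:2] + parts[3:])
print(res)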
Output:
1,02/09/15,RNGesus,Ingsoc,Y,Ingsoc
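The expected result in the question separates the fields with a comma and a space. A small helper can drop a field by position and rejoin with whichever separator is wanted. This is a sketch; the name drop_field and its parameters are assumptions for illustration.

```python
def drop_field(line, index, sep=",", out_sep=None):
    """Remove the field at `index` from a delimited line and rejoin it."""
    parts = line.strip().split(sep)  # strip() removes the trailing newline
    del parts[index]
    return (sep if out_sep is None else out_sep).join(parts)

line = "1,02/09/15,18:00,RNGesus,Ingsoc,Y,Ingsoc\n"
print(drop_field(line, 2))                # same separator as the input
print(drop_field(line, 2, out_sep=", "))  # spaced form from the question
```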
How can I solve this duplicate-renaming problem without resorting to renaming with something unique like "_DUPLICATED_#NO"? The names have to be unique when finished, preferably with incrementing numbers denoting the number of duplicates.
from collections import defaultdict

l = ["hello1","hello2","hello3",
     "hello","hello","hello"]

tally = defaultdict(lambda:-1)
for i in range(len(l)):
    e = l[i]
    tally[e] += 1
    if tally[e] > 0:
        e += str(tally[e])
    l[i] = e
print (l)
results:
['hello1', 'hello2', 'hello3', 'hello', 'hello1', 'hello2']
As you can see, the names are not unique.
This seems simple enough. You start with a list of filenames:
l = ["hello1","hello2","hello3",
     "hello","hello","hello"]
Then you iterate through them to build the finished filenames, incrementing a trailing number by 1 for as long as the candidate name is already taken.
result = {}
for fname in l:
    orig = fname
    i=1
    while fname in result:
        fname = orig + str(i)
        i += 1
    result[fname] = orig
This should leave you with a dictionary like:
{"hello1": "hello1",
"hello2": "hello2",
"hello3": "hello3",
"hello": "hello",
"hello4": "hello",
"hello5": "hello"}
Of course if you don't care about mapping the originals to the duplicate names, you can drop that part.
result = set()
for fname in l:
    orig = fname
    i=1
    while fname in result:
        fname = orig + str(i)
        i += 1
    result.add(fname)
If you want a list afterward, just convert it. Note that a set is unordered, so the list will not necessarily keep the original order.
final = list(result)
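When the original order matters, the set can be kept only for membership tests while the renamed entries are appended to a list as they are produced. A minimal sketch, with the function name make_unique chosen here for illustration:

```python
def make_unique(names):
    """Return the names in their original order, with numeric suffixes
    appended to duplicates so that every entry is unique."""
    seen = set()    # fast membership test for names already used
    unique = []     # result list, preserves input order
    for name in names:
        candidate = name
        suffix = 1
        while candidate in seen:
            candidate = name + str(suffix)
            suffix += 1
        seen.add(candidate)
        unique.append(candidate)
    return unique

l = ["hello1", "hello2", "hello3",
     "hello", "hello", "hello"]
print(make_unique(l))
# ['hello1', 'hello2', 'hello3', 'hello', 'hello4', 'hello5']
```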
Note that if you're creating files, this is exactly what the tempfile module is designed to do.
import tempfile
l = ["hello1","hello2","hello3",
     "hello","hello","hello"]
fs = [tempfile.NamedTemporaryFile(prefix=fname, delete=False, dir="/some/directory/") for fname in l]
This will not create nicely incrementing filenames, but they are guaranteed unique, and fs will be a list of the (open) file objects rather than a list of names, although NamedTemporaryFile.name will give you the filename.
I've got an assignment and part of it asks to define a process_filter_description. Basically I have a list of images I want to filter:
images = ["1111.jpg", "2222.jpg", "circle.JPG", "square.jpg", "triangle.JPG"]
Now I have an association list that I can use to filter the images:
assc_list = [ ["numbers", ["1111.jpg", "2222.jpg"]] , ["shapes", ["circle.JPG", "square.jpg", "triangle.JPG"]] ]
I can use a filter description to select which association list I want to apply the filter to (the keyword is enclosed by colons):
f = ':numbers:'
I'm not exactly sure how to start it. In words I can at least think:
Filter is ':numbers:'
Compare each term of images to each term associated with numbers in the association list.
If term matches, then append term to empty list.
Right now I am just trying to get my code to print only the terms from the numbers association list, but it prints out all of them.
def process_filter_description(f, images, ia):
    return_list = []
    f = f[1:-1]
    counter = 0
    if f == ia[counter][0]:
        #print f + ' is equal to ' + ia[counter][0]
        for key in ia:
            for item in key[1]:
                print(item)
                #return_list.append(item)
    return return_list
Instead of an "associative list", how about using a dictionary?
filter_assoc = {'numbers': ['1111.jpg', '2222.jpg'],
                'shapes': ['circle.JPG', 'square.jpg', 'triangle.JPG']}
Now, just see which images are in each group:
>>> filter_assoc['numbers']
['1111.jpg', '2222.jpg']
>>>
>>> filter_assoc['shapes']
['circle.JPG', 'square.jpg', 'triangle.JPG']
Your processing function would become immensely simpler:
def process_filter_description(filter, association):
    return association[filter[1:-1]]
I'll just think aloud here, so this is what I'd use as a function to perform the task of the dictionary:
def process_filter_description(index, images, association):
    return_list = []
    index = index[1:-1]
    for top_level in association:
        if top_level[0] == index:
            for item in top_level[1]:
                return_list.append(item)
            break
    return return_list
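The two ideas can be combined: keep the assignment's association list as input, convert it to a dictionary inside the function, and return only those entries of images that belong to the selected group. This is a sketch under the assumption that the association list always consists of [keyword, list_of_files] pairs, as in the question; returning an empty list for an unknown keyword is a choice made here, not a requirement from the assignment.

```python
def process_filter_description(description, images, assoc_list):
    """Return the images that belong to the group named in `description`.

    `description` is a keyword wrapped in colons, e.g. ':numbers:'.
    """
    keyword = description.strip(":")
    lookup = dict(assoc_list)                # [key, value] pairs -> dict
    allowed = set(lookup.get(keyword, []))   # unknown keyword -> empty set
    # Compare each image against the group and keep the order of `images`
    return [image for image in images if image in allowed]

images = ["1111.jpg", "2222.jpg", "circle.JPG", "square.jpg", "triangle.JPG"]
assc_list = [["numbers", ["1111.jpg", "2222.jpg"]],
             ["shapes", ["circle.JPG", "square.jpg", "triangle.JPG"]]]
print(process_filter_description(":numbers:", images, assc_list))
print(process_filter_description(":shapes:", images, assc_list))
```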