Splitting a line separated list into multiple smaller lists - python

I have a list of proxies, one per line. These proxies need to be split into separate lists with sizes that I choose.
So I want the program to ask how many lists of 10, 25, and 50 I need, then output the new lists as text files. The same proxy cannot be present in two separate lists.
This is what I've got so far, which simply counts the proxies:
filename = input('Enter a file name: ')
with open(filename) as f:
    line_count = 0
    for line in f:
        line_count += 1
print("Number of proxies: " + str(line_count))
Any tips on how to proceed?

You can achieve that with something like this:
def split_list(filename, size):
    new_content = []
    with open(filename) as f:
        content = f.readlines()
    for chunk in range(0, len(content), size):
        new_content.append(content[chunk:chunk+size])
    return new_content
The code generates numbers (range) from 0 to the number of lines in the file. Using the step parameter of range, the starting point increases by size on every iteration.
It then goes through the list and uses slicing to take chunks of elements from it. Those chunks become the elements of a new list, new_content.
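To see the slicing on its own, here is a minimal sketch that works on an in-memory list rather than a file. The helper name chunk and the placeholder entries p1..p7 are made up for illustration; they are not from the question.

```python
def chunk(items, size):
    """Return consecutive slices of `items`, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    # range() steps through the start index of every slice
    return [items[start:start + size] for start in range(0, len(items), size)]

# Placeholder entries standing in for proxy lines (hypothetical data)
proxies = ["p{}".format(n) for n in range(1, 8)]
chunks = chunk(proxies, 3)
print(chunks)
```

Note that the last chunk is shorter when the list length is not a multiple of size.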

For variable sizes, try this:
def split_list(filename, sizes):
    with open(filename) as f:
        content = f.readlines()
    new_content = []
    start = 0
    for size in sizes:
        stop = start + size
        new_content.append(content[start:stop])
        start += size
    return new_content
splitted_list = split_list('data.txt', [5, 2, 3])
for i, l in enumerate(splitted_list):
    with open('{}.txt'.format(i), 'w') as f:
        f.writelines(l)
Given data.txt is
1
2
3
4
5
6
7
8
9
10
it will generate three files (as specified in the second argument of the split_list function):
0.txt with the first 5 lines (the first specified chunk):
1
2
3
4
5
1.txt with the following 2 lines (the second chunk):
6
7
Finally 2.txt with the last 3 lines (third chunk):
8
9
10
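The question asks for "how many lists of 10, 25, and 50", so the sizes list could be built from such counts. The following is only a sketch under that reading: build_sizes and split_by_sizes are hypothetical helper names, the data is a placeholder, and raising an error when there are too few items is an assumption about the desired behaviour.

```python
def build_sizes(counts):
    """Expand {size: how_many} into a flat list of sizes, e.g. {10: 2} -> [10, 10]."""
    sizes = []
    for size, how_many in counts.items():
        sizes.extend([size] * how_many)
    return sizes

def split_by_sizes(items, sizes):
    """Slice `items` into consecutive, non-overlapping lists of the given sizes."""
    if sum(sizes) > len(items):
        raise ValueError(
            "need {} items but only {} available".format(sum(sizes), len(items))
        )
    result, start = [], 0
    for size in sizes:
        result.append(items[start:start + size])
        start += size
    return result

items = [str(n) for n in range(1, 101)]      # placeholder data, not real proxies
sizes = build_sizes({10: 2, 25: 1, 50: 1})   # two lists of 10, one of 25, one of 50
groups = split_by_sizes(items, sizes)
print([len(g) for g in groups])
```

Because the slices are consecutive and never overlap, no item can end up in two lists.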

Related

what's a better way to read information from TXT and put into list in python

I have a TXT file in the following form, which holds the data for a knapsack problem:
The 100 in the first row is the total capacity.
The 10 in the first row is the number of items.
Each following row contains the value and weight of one item. For example, the second row, 9 46, represents an item with value 9 and weight 46.
100 10
9 46
28 31
15 42
13 19
31 48
36 11
13 27
42 17
28 19
1 31
I use the code below to read the information and put it into separate lists.
with open(path) as f:
    capacity,total_number = f.readline().split(' ')
    capacity = int(capacity)
    total_number = int(total_number)
    value_list = [int(x.split(' ')[0]) for x in f.readlines()]
    f.seek(0)
    next(f)
    weight_list = [int(x.split(' ')[1]) for x in f.readlines()]
assert total_number==len(weight_list)==len(value_list)
But it kind of feels redundant.
Could anyone help me improve it?
You can use a list comprehension to cast the entire line to integers at once.
You can use the zip(*...) idiom to transpose a list; here, to transpose a list of [(value, weight), (value, weight), (value, weight)] pairs to [value, value, value...] and [weight, weight, weight...].
with open(path) as f:
    capacity, total_number = [int(num) for num in f.readline().split()]
    values_and_weights = [[int(num) for num in l.split()] for l in f.readlines()]
value_list, weight_list = zip(*values_and_weights)
In fact, since all lines are just number pairs,
with open(path) as f:
    data = [[int(num) for num in l.split()] for l in f.readlines()]
capacity, total_number = data.pop(0)  # remove and unpack first line
value_list, weight_list = zip(*data)
is even more succinct.
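As a quick check of the zip(*...) transposition, this sketch runs the same logic on an in-memory file. It uses io.StringIO in place of a real path and only the first three item rows of the sample data, so the item count in the header will not match here; that truncation is purely for illustration.

```python
import io

# First rows of the sample data only; StringIO stands in for open(path)
sample = io.StringIO("100 10\n9 46\n28 31\n15 42\n")

data = [[int(num) for num in line.split()] for line in sample]
capacity, total_number = data.pop(0)    # header row: capacity and item count
value_list, weight_list = zip(*data)    # transpose the (value, weight) pairs

print(capacity, total_number)
print(value_list)
print(weight_list)
```

zip returns tuples, so wrap the results in list() if you need to modify them later. The unpacking also fails if there are no item rows at all, which may be worth guarding against.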

convert list to str and write in new .mtx file

I have this code, which prompts the user for an .mtx file (e.g. Mydata.mtx) that contains a matrix of integers. The program takes this matrix, transposes it, then creates a new file with the transposed matrix.
The file is a simple .mtx file.
The original file (Mydata.mtx):
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15 #separated by a tab "\t" to be a matrix
Here is the code:
def readMatrix(filename):
    listOfLists = []
    file = open(filename)
    for i in file:
        listOfLists.append(i.split())
    return listOfLists
def transpose(M):
    mtranspose = [list(i) for i in zip(*M)]
    return mtranspose
def writeMatrix(M, filename):
    for i in M:
        convertListToStr = str(i)
    newFile = open("T_" + filename, "w")
    newFile.write(convertListToStr)
    newFile.close()
callFile = input("Enter the file name: ")
toReadFile = readMatrix(callFile)
toTranspose = transpose(toReadFile)
ToWriteMatrix = writeMatrix(toTranspose, callFile)
The code runs, in that it transposes the matrix and creates a new file. The problem is in the third function, writeMatrix, in the for i in M loop: it does not write out the whole matrix, only the last line, and in list form. I need it in string form.
my output (in the new file):
T_Mydata.mtx
['5', '10', '15']
desired output:
1 6 11
2 7 12
3 8 13
4 9 14
5 10 15
Can anyone help?
You need to write each row of the matrix as you loop over the rows. You also need to join the elements of each row instead of converting the entire row to a string:
def writeMatrix(M, filename):
    with open("T_" + filename, "w") as f:
        for row in M:
            f.write(" ".join(row) + "\n")

Finding missing lines in file

I have a .txt file of 7000+ lines, each containing a description and a path to an image, ordered by image number. Example:
abnormal /Users/alex/Documents/X-ray-classification/data/images/1.png
abnormal /Users/alex/Documents/X-ray-classification/data/images/2.png
normal /Users/alex/Documents/X-ray-classification/data/images/3.png
normal /Users/alex/Documents/X-ray-classification/data/images/4.png
Some lines are missing. I want to automate the search for the missing lines. My first attempt was:
f = open("data.txt", 'r')
lines = f.readlines()
num = 1
for line in lines:
    if num in line:
        continue
    else:
        print (line)
    num+=1
But of course it didn't work, since lines are strings.
Is there any elegant way to sort this out? Using regex maybe?
Thanks in advance!
The following should hopefully work. It grabs the number out of the filename, checks whether it is more than 1 higher than the previous number, and if so, works out all the in-between numbers and prints them. Printing the number (and reconstructing the filename from it) is needed because line will never contain the names of missing files during iteration.
# Set this to the first number in the series -1
num = lastnum = 0
with open("data.txt", 'r') as f:
    for line in f:
        # Pick the digits out of the line (assumes the only digits are the file number)
        num = int(''.join(x for x in line if x.isdigit()))
        if num - lastnum > 1:
            for i in range(lastnum+1, num):
                print("Missing: {}.png".format(str(i)))
        lastnum = num
The main advantage of working this way is that as long as your files are sorted in the list, it can handle starting at numbers other than 1, and also reports more than one missing number in the sequence.
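Since the question mentions regex, here is a sketch of that variant. It assumes every line ends in <number>.png, as in the example; the helper name missing_numbers and the sample lines (with 2, 4 and 5 left out on purpose) are made up for illustration. Collecting the numbers into a set means the lines do not have to be sorted.

```python
import re

def missing_numbers(lines):
    """Return the numbers absent between the smallest and largest N in 'N.png'."""
    pattern = re.compile(r"(\d+)\.png\s*$")   # digits right before a trailing .png
    seen = set()
    for line in lines:
        match = pattern.search(line)
        if match:
            seen.add(int(match.group(1)))
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]

# Hypothetical sample in the question's format
sample = [
    "abnormal /Users/alex/Documents/X-ray-classification/data/images/1.png\n",
    "normal /Users/alex/Documents/X-ray-classification/data/images/3.png\n",
    "normal /Users/alex/Documents/X-ray-classification/data/images/6.png\n",
]
print(missing_numbers(sample))
```

Anchoring the pattern at the end of the line avoids picking up digits elsewhere in the path. Numbers missing before the first or after the last line cannot be detected this way.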
You can try this:
lines = ["abnormal /Users/alex/Documents/X-ray-classification/data/images/1.png","normal /Users/alex/Documents/X-ray-classification/data/images/3.png","normal /Users/alex/Documents/X-ray-classification/data/images/4.png"]
maxvalue = 4 # or any other maximum value
missing = []
i = 0
for num in range(1, maxvalue+1):
    if str(num) not in lines[i]:
        missing.append(num)
    else:
        i += 1
print(missing)
Or if you want to check for the line ending with XXX.png:
lines = ["abnormal /Users/alex/Documents/X-ray-classification/data/images/1.png","normal /Users/alex/Documents/X-ray-classification/data/images/3.png","normal /Users/alex/Documents/X-ray-classification/data/images/4.png"]
maxvalue = 4 # or any other maximum value
missing = []
i = 0
for num in range(1, maxvalue+1):
    if not lines[i].endswith(str(num) + ".png"):
        missing.append(num)
    else:
        i += 1
print(missing)

How to find all instances of list values(ex: [1,2,3]) in a file at a specific index

I want to find a list of elements in a file at a specific column index.
For example, below are the contents of the file "temp.txt":
line_0 1
line_1 2
line_2 3
line_3 4
line_4 1
line_5 1
line_6 2
line_7 1
line_8 2
line_9 3
line_10 4
Now I need to find where the values [1,2,3] occur in sequence in column 2 of the lines in the above file.
The output should look like this:
line_2 3
line_9 3
I have tried the logic below, but it is somehow not working:
inf = open("temp.txt", "rt")
count = 0
pos = 0
ListSeq = ["1","2","3"]
for line_no, line in enumerate(inf):
    arr = line.split()
    if len(arr) > 1:
        if count == 1 :
            pos = line_no
        if ListSeq[count] == arr[1] :
            count += 1
        elif count > 0 :
            inf.seek(pos)
            line_no = pos
            count = 0
        else :
            count = 0
        if count >= 3 :
            print(line)
            count = 0
Can somebody help me find the issue with the above code? A different logic that gives the correct output is also fine.
Your code is flawed. Most prominent bug: trying to seek in a text file using line number is never going to work: you have to use byte offset for that. Even if you did that, it would be wrong because you're iterating on the lines, so you shouldn't attempt to change file pointer while doing that.
My approach:
The idea is to "transpose" your file to work with vertical vectors, find the sequence in the second vertical vector, and use the found index to extract data from the first vertical vector.
Split the lines to get text and number, then zip the results to get two vectors: one of text, one of numbers.
At this point, one list contains ["line_0","line_1",...] and the other one contains ["1","2","3","4",...].
Find the indexes of the sequence in the number list, and print the text/number pair when found.
Code:
with open("temp.txt") as f:
    sequence = ('1','2','3')
    txt,nums = list(zip(*(l.split()[:2] for l in f))) # [:2] in case there are more columns
    for i in range(len(nums)-len(sequence)+1):
        if nums[i:i+len(sequence)]==sequence:
            # i+2 is the last element of the 3-item sequence
            print("{} {}".format(txt[i+2],nums[i+2]))
Result:
line_2 3
line_9 3
The last for loop can be replaced by a list comprehension to generate the tuples:
result = [(txt[i+2],nums[i+2]) for i in range(len(nums)-len(sequence)+1) if nums[i:i+len(sequence)]==sequence ]
Result:
[('line_2', '3'), ('line_9', '3')]
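Generalizing for any sequence and any column:
sequence = ['1','2','3']
col = 1
with open(filename, 'r') as infile:
    idx = 0
    for line in infile:
        token = line.strip().split()[col]
        if token == sequence[idx]:
            if idx == len(sequence)-1:
                print(line.rstrip())
                idx = 0
            else:
                idx += 1
        else:
            # restart; the current line may itself begin a new match
            # (not sufficient for sequences that overlap themselves, e.g. 1,1,2)
            idx = 1 if token == sequence[0] else 0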
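If the sequence can overlap itself (for example 1,1,2), a fixed-size sliding window avoids the restart problem altogether. The sketch below uses collections.deque with maxlen; the function name find_sequence and the sample lines are hypothetical, and treating a short or blank line as a break in the run is an assumption.

```python
from collections import deque

def find_sequence(lines, sequence, col=1):
    """Yield each line whose `col` value completes `sequence` over consecutive lines."""
    target = list(sequence)
    window = deque(maxlen=len(target))   # keeps only the most recent values
    for line in lines:
        fields = line.split()
        if len(fields) <= col:
            window.clear()               # assumption: a malformed line breaks the run
            continue
        window.append(fields[col])
        if list(window) == target:
            yield line.rstrip("\n")

# Hypothetical sample; note the repeated 1 before the first match
sample = ["line_0 1", "line_1 1", "line_2 2", "line_3 3",
          "line_4 4", "line_5 1", "line_6 2", "line_7 3"]
matches = list(find_sequence(sample, ["1", "2", "3"]))
print(matches)
```

Unlike the index-reset approach, the window is not cleared after a match, so overlapping occurrences are reported too. Passing an open file object as lines works the same way.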

Python txt files, average, information

I have a .txt file like this (the names can be arbitrary):
My Name 4 8 7
Your Name 5 8 7
You U 5 9 7
My My 4 8 5
Y Y 8 7 9
I need to write the names plus the average of the numbers to a text file, results.txt. How do I do that?
with open(r'stuff.txt') as f:
    mylist = list(f)
    i = 0
    sk = len(mylist)
    while i < sk - 4:
        print(mylist[i], mylist[i+1], mylist[i+2], mylist[i+3])
        i = i + 3
First, open both the input and output files:
with open("stuff.txt") as in_file:
    with open("results.txt", "w") as out_file:
Since each line can be processed independently, a simple loop over the lines suffices:
        for line in in_file:
Split each line at whitespace into a list of strings (row):
            row = line.split()
The numbers occur after the first two fields:
            str_nums = row[2:]
However, these are still strings, so they must be converted to floating-point numbers before arithmetic can be performed on them. This results in a list of floats (nums):
            nums = [float(s) for s in str_nums]
Now calculate the average:
            avg = sum(nums) / len(nums)
Finally, write the names and the average into the output file:
            out_file.write("{} {} {}\n".format(row[0], row[1], avg))
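The per-line work can also be pulled into a small function and tried on sample lines before any file is written. This is a sketch only: average_line is a hypothetical name, it assumes the last three fields are the numbers (so names may have any number of words), and rounding to two decimals is a formatting choice, not something the question asks for.

```python
def average_line(line, n_scores=3):
    """Return 'name average' for a line whose last `n_scores` fields are numbers."""
    fields = line.split()
    name = " ".join(fields[:-n_scores])              # everything before the numbers
    scores = [float(x) for x in fields[-n_scores:]]
    return "{} {:.2f}".format(name, sum(scores) / len(scores))

sample = ["My Name 4 8 7", "Y Y 8 7 9"]              # two lines from the question
results = [average_line(line) for line in sample]
print(results)
```

Each returned string can then be written to results.txt followed by a newline.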
What about this?
with open(fname) as f:
    lines = f.readlines()
new_lines = []
for each in lines:
    col = each.split()
    # average of the last three columns
    average = (int(col[-1]) + int(col[-2]) + int(col[-3])) / 3
    new_lines.append(col[0] + ' ' + col[1] + ' ' + str(average) + '\n')
# the input file was opened read-only, so write the new lines to a separate file
with open('results.txt', 'w') as out:
    out.writelines(new_lines)
I tried, and this worked:
inputtxt=open("stuff.txt", "r")
outputtxt=open("output.txt", "w")
output=""""""
for i in inputtxt.readlines():
    nums=[]
    name=""
    for k in i:
        # goes character by character, so it only handles single-digit numbers
        try:
            nums.append(int(k))
        except ValueError:
            if not k.isspace():   # skip spaces and the trailing newline
                name+=k
    avrg=0
    for j in nums:
        avrg+=j
    avrg/=len(nums)
    line=name+" "+str(avrg)+"\n"
    output+=line
outputtxt.write(output)
inputtxt.close()
outputtxt.close()
