Split a list into multiple sequential files

I have a file of more than 2000 jokes that I've preprocessed and now have in a list. I want to write them out 20 per text file (except for the last file, which may have fewer), numbered sequentially (jokeset1.txt, jokeset2.txt, etc.), but when I try something like this, it only creates the first and last files. How can I fix this?
i = 0
while len(jokes)>0:
    i+=1
    with open('Jokeset/jokeset' + str(i)+'.txt', 'w') as newfile:
        if len(jokes)<20:
            x = len(jokes)
        else:
            x = 20
        random.shuffle(jokes)
        for i in range(x):
            print(jokes[i], file = newfile)
    del jokes[:x]

You're overwriting your i variable inside the loop!
On the first pass of the outer loop i = 1, so the first file is written, but then when you do
for i in range(x):
    print(jokes[i], file = newfile)
i ends up as 19. The i += 1 at the start of the next pass makes it 20, so you write to 'Jokeset/jokeset20.txt'. The inner loop then resets i to 19 again, so every later pass writes to (and overwrites) jokeset20.txt, and no other files are ever created.
Simply rename one of your i variables and you will be good!
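A minimal sketch of the fixed loop, assuming the jokes are already a list of strings and that the Jokeset directory may need creating. The function name write_joke_sets and its parameters are hypothetical. It keeps the file counter separate from the per-joke loop, walks the list in slices of 20 instead of deleting from it, and (a design choice, not from the question) shuffles once up front rather than on every pass.

```python
import os
import random


def write_joke_sets(jokes, directory="Jokeset", per_file=20):
    """Write jokes in batches of `per_file` to jokeset1.txt, jokeset2.txt, ...

    Hypothetical helper; returns the list of paths written.
    """
    os.makedirs(directory, exist_ok=True)
    jokes = list(jokes)        # copy, so the caller's list is left intact
    random.shuffle(jokes)      # shuffle once instead of on every pass
    paths = []
    # range(0, len(jokes), per_file) yields the start index of each batch;
    # file_num is its own variable, so no inner loop can overwrite it.
    for file_num, start in enumerate(range(0, len(jokes), per_file), start=1):
        path = os.path.join(directory, "jokeset" + str(file_num) + ".txt")
        with open(path, "w") as newfile:
            for joke in jokes[start:start + per_file]:  # last slice may be short
                print(joke, file=newfile)
        paths.append(path)
    return paths
```

For example, 45 jokes would produce jokeset1.txt and jokeset2.txt with 20 lines each and jokeset3.txt with the remaining 5.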

Related

Is there a way to assign random.choice results from a txt doc to another txt doc?

I am trying to assign the results of
x = 1
while (x < 11):
    names =(random.choice(list(open('AllTickers.txt'))))
    with open("NewStockList.txt", "w") as output:
        for line in names:
            output.write(str(names))
            output.write('\n')
    x = x + 1
to a new text doc.
When I ran it, it just wrote the same string 5 times.

That's a lot of code to refactor. First of all, you need to decide how the strings are separated. Read 'AllTickers.txt' once, outside the loop, for performance reasons, using either open or pathlib.Path.
If the words (or whatever you are checking) are separated by line breaks, you can call .splitlines() on the resulting text; otherwise, you might need to use .split() with the separator.
This gives you a list that you can pick from with random.choice.
Something like:
import pathlib
import random
tickers = pathlib.Path('AllTickers.txt').read_text()
tickers = tickers.splitlines()  # or .split(separator)
x = 1
while x < 11:  # your loop
    ticker = random.choice(tickers)
    # Now do whatever you need with that variable
    x = x + 1
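Putting that together for the question's task, here is a minimal sketch that picks 10 tickers and writes them one per line. It assumes AllTickers.txt holds one ticker per line; the function name write_random_tickers and its parameters are hypothetical. The output file is opened once, outside the loop, so write mode does not erase earlier picks.

```python
import pathlib
import random


def write_random_tickers(src, dest, count=10):
    """Pick `count` random tickers (repeats allowed) from `src`, assumed to
    hold one ticker per line, and write them one per line to `dest`."""
    # Read and split the source file once, skipping blank lines
    tickers = [t for t in pathlib.Path(src).read_text().splitlines() if t.strip()]
    picks = [random.choice(tickers) for _ in range(count)]
    # Open the output once, so "w" mode does not wipe earlier picks
    with open(dest, "w") as output:
        for ticker in picks:
            output.write(ticker + "\n")
    return picks
```

If the same ticker must not appear twice, random.sample(tickers, count) picks without replacement instead (it raises ValueError if count exceeds the number of tickers).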

Why use both random.choice and a for loop? random.choice returns a single item each time, so looping over that item with for just walks through its characters, writing the same string once per character.
If you need to get a random item from AllTickers.txt and save it to NewStockList.txt, read every line of AllTickers.txt into a list, take a random item from that list, and save it to NewStockList.txt.
Also, you need to open NewStockList.txt in append mode, not write mode. Otherwise, each pass of your while loop overwrites the previously saved items.
import random
with open("AllTickers.txt") as f:
    names = f.readlines()  # read all lines once, before the loop
x = 0
while x < 10:  # 10 picks, like x = 1 .. 10 in the question
    with open("NewStockList.txt", "a") as output:  # "a" keeps earlier picks
        line = random.choice(names)
        output.write(line.rstrip("\n") + "\n")  # exactly one newline per pick
    x = x + 1
If you instead want to copy every item from AllTickers.txt into NewStockList.txt, use a for loop over every line of AllTickers.txt. No while loop is needed; wrapping this in one would append the whole list once per pass.
with open("AllTickers.txt") as f:
    names = f.readlines()
with open("NewStockList.txt", "a") as output:
    for line in names:
        output.write(line)

Basic python, how to return lists of lists when reading a text file

I'm trying to store each line of a text file as a separate list within a list, where each character of the line is its own element in that nested list. Right now it only appends the last character of each line, and I'm not sure why, given the nested while loop. Can anyone see the mistake? Thanks.
def read_lines(filename):
    ls_1 = []
    x = open(filename, 'r')
    i = 0
    t = 0
    while True:  # nested while loop to read lines and separate lines into individual characters (cells)
        read = x.readline()
        if read == '':
            break
        st = read.strip("''\n''")
        while t < len(st):
            ls_2 = []
            ls_2.append(st[t])
            t += 1
        ls_1.append(ls_2)  # append a new list to the original list every time the while loop resets and a new line is read
        #ls_2.clear()  # removes contents so the next loop doesn't repeat the first readline (doesn't work for unknown reason)
        t = 0  # resets the index of read so the next new line can be read from the start of the line
        i += 1
    x.close()
    return ls_1
Whole txt file:
Baby on board, how I've adored
That sign on my car's windowpane.
Bounce in my step,
Loaded with pep,
'Cause I'm driving in the carpool lane.
Call me a square,
Friend, I don't care.
That little yellow sign can't be ignored.
I'm telling you it's mighty nice.
Each trip's a trip to paradise
With my baby on board!

The reason you are only getting the last character is that you create a new list inside your inner loop:
while t < len(st):
    ls_2 = []
    ls_2.append(st[t])
    t += 1
ls_1.append(ls_2)
Instead, you would have to do:
ls_2 = []
while t < len(st):
    ls_2.append(st[t])
    t += 1
ls_1.append(ls_2)
However, don't use while loops to read from files; file objects are iterators, so just use a for loop. Similarly, don't use a while loop to iterate over a string.
Here is how you would do it, Pythonically:
result = []
with open(filename) as f:
    for line in f:
        result.append(list(line.strip()))
Or with a list comprehension:
with open(filename) as f:
    result = [list(line.strip()) for line in f]
You rarely need while loops for this kind of task in Python; most iteration is iterator-based.

I suggest using the file method readlines(), so you can iterate over each line of the opened file. Then convert each string to a list; that gives you a list of all the characters in the string (which seems to be what you want).
Try using the following code:
def read_lines(filename):
    x = open(filename, 'r')
    ls_1 = [list(line.strip()) for line in x.readlines()]
    x.close()
    return ls_1
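One detail worth checking in the approaches above: line.strip() removes all leading and trailing whitespace, and the question's read.strip("''\n''") also strips apostrophes, so the line 'Cause I'm driving in the carpool lane. would lose its leading '. A minimal sketch that removes only the line ending, assuming that is the desired behaviour; the function name read_char_lists is hypothetical.

```python
def read_char_lists(filename):
    """Return one list of characters per line of `filename`.

    Only the trailing newline is removed, so leading spaces and quote
    characters (such as the apostrophe in "'Cause") are kept.
    """
    with open(filename) as f:
        return [list(line.rstrip("\n")) for line in f]
```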

Taking info from one file and printing to another in Python

I am trying to copy the contents of a file that has MANY words into another file. The original file has 3-letter words that I'd like to filter out. Unfortunately, I haven't been able to get it to work. I am new to Python with some Java experience, so I'm trying to keep this pretty basic. Code is as follows:
# Files that we're going to open
filename = 'words.txt'
file_two = 'new_words.txt'
# Variables we're going to use in the program
# Program lists to transfer long words
words = []
# We open the file and store it in our list here
with open(filename, 'r') as file_object:
    for line in file_object:
        words.append(line.rstrip("\n"))
# We transfer the info into the new file
with open(file_two, 'a') as file:
    x = int(0)
    for x in words:
        if len(words[x]) >= 5:
            print(words[x])
            file.write(words[x])
        x += 1
I understand my problem is at the bottom, where I write to the new file; perhaps a simple explanation might get me there. Many thanks.

The problem is here:
with open(file_two, 'a') as file:
    x = int(0)
    for x in words:
        if len(words[x]) >= 5:
            print(words[x])
            file.write(words[x])
        x += 1
The reason for the error you're getting (a TypeError, because list indices must be integers) is that x isn't a number once the loop begins. It is a string.
I think you misunderstand how for loops work in Python. They're more akin to foreach loops in other languages. When you do for x in words, x is given the value of the first element in words, then the second, and so on for each iteration. You, however, are trying to treat it like a traditional index-based for loop, going through the list by index. Of course, this doesn't work.
There are two ways to fix your code. You can either take the foreach approach:
with open(file_two, 'w') as file:  # 'w' replaces the file on each run
    for x in words:  # x is a word
        if len(x) >= 5:
            print(x)
            file.write(x + '\n')  # write() adds no newline itself
Or use range(len(words)) to loop over the indices of the list. This gives behavior similar to a traditional for loop:
with open(file_two, 'a') as file:  # 'a' appends to the existing file
    for x in range(len(words)):  # x is a number
        if len(words[x]) >= 5:
            print(words[x])
            file.write(words[x] + '\n')
There is also no need to manually increment x, or to give x an initial value, as it is reassigned at the beginning of each iteration of the for loop.
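A minimal sketch of the whole task as one function that reads and filters in a single pass, assuming words.txt holds one word per line. copy_long_words and min_len are hypothetical names; the threshold of 5 comes from the question's >= 5 check. Each kept word gets its own newline, since write() does not add one.

```python
def copy_long_words(src, dest, min_len=5):
    """Copy every word of at least `min_len` characters from `src` to `dest`,
    one per line. Assumes `src` holds one word per line. Returns the count kept."""
    kept = 0
    with open(src) as infile, open(dest, "w") as outfile:
        for line in infile:               # for-each: line is a string, not an index
            word = line.rstrip("\n")
            if len(word) >= min_len:
                outfile.write(word + "\n")  # write() adds no newline itself
                kept += 1
    return kept
```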

Python - how to get last line in a loop

I have some CSV files that I have to modify which I do through a loop. The code loops through the source file, reads each line, makes some modifications and then saves the output to another CSV file. In order to check my work, I want the first line and the last line saved in another file so I can confirm that nothing was skipped.
What I've done is put all of the lines into a list and then get the last one with the index len(list) - 1. This works, but I'm wondering if there is a more elegant way to accomplish this.
Code sample:
def CVS1():
    fb = open('C:\\HP\\WS\\final-cir.csv','wb')
    check = open('C:\\HP\\WS\\check-all.csv','wb')
    check_count = 0
    check_list = []
    with open('C:\\HP\\WS\\CVS1-source.csv','r') as infile:
        skip_first_line = islice(infile, 3, None)
        for line in skip_first_line:
            check_list.append(line)
            check_count += 1
            if check_count == 1:
                check.write(line)
            # [CSV modifications become a string called "newline"]
            fb.write(newline)
    final_check = check_list[len(check_list)-1]
    check.write(final_check)
    fb.close()

If you actually need check_list for something, then, as the other answers suggest, using check_list[-1] is equivalent to but better than check_list[len(check_list)-1].
But do you really need the list? If all you want to keep track of is the first and last lines, you don't. If you keep track of the first line specially, and keep track of the current line as you go along, then at the end, the first line and the current line are the ones you want.
In fact, since you appear to be writing the first line into check as soon as you see it, you don't need to keep track of anything but the current line. And the current line, you've already got that, it's line.
So, let's strip all the other stuff out:
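from itertools import islice

def CVS1():
    # text mode ('w'): the lines read from infile are str, which 'wb' rejects in Python 3
    fb = open('C:\\HP\\WS\\final-cir.csv','w')
    check = open('C:\\HP\\WS\\check-all.csv','w')
    first_line = True
    with open('C:\\HP\\WS\\CVS1-source.csv','r') as infile:
        skip_first_line = islice(infile, 3, None)  # skips the first three lines
        for line in skip_first_line:
            if first_line:
                check.write(line)
                first_line = False
            # [CSV modifications become a string called "newline"]
            fb.write(newline)
        if not first_line:  # at least one line was read
            check.write(line)  # after the loop, line is the last line read
    fb.close()
    check.close()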

You can enumerate the rows of the input file and check the index to find the first one. An islice object has no len(), so the last row can't be detected by its index inside the loop; instead, write whatever line holds once the loop ends:
from itertools import islice

def CVS1():
    with open('C:\\HP\\WS\\final-cir.csv','w') as fb, open('C:\\HP\\WS\\check-all.csv','w') as check, open('C:\\HP\\WS\\CVS1-source.csv','r') as infile:
        skip_first_line = islice(infile, 3, None)
        line = None
        for idx, line in enumerate(skip_first_line):
            if idx == 0:
                check.write(line)
            #[CSV modifications become a string called "newline"]
            fb.write(newline)
        if line is not None:
            check.write(line)  # the last line read
I've replaced the open statements with a with block, so the interpreter takes care of closing the file handles.

You can access index -1 directly:
final_check = check_list[-1]
which is nicer than what you have now:
final_check = check_list[len(check_list)-1]

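As long as the file isn't empty, you can do the following (for a one-line file, the first and last lines are the same line):
with open(root_to_file, 'r') as my_file:  # root_to_file: the path to your file
    my_lines = my_file.readlines()
first_line = my_lines[0]
last_line = my_lines[-1]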
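An alternative that avoids holding every line in memory: remember the first line as you go and rely on the loop variable still holding the last line once the loop ends. A minimal sketch; the name first_and_last and the skip parameter (mirroring the question's islice(infile, 3, None)) are hypothetical, and it returns None for both when there are no lines left to read.

```python
from itertools import islice


def first_and_last(lines, skip=0):
    """Return (first, last) of `lines` after skipping `skip` items.

    Works on any iterable, including an open file, in one pass and without
    storing every line. Returns (None, None) if nothing is left.
    """
    first = last = None
    for idx, line in enumerate(islice(lines, skip, None)):
        if idx == 0:
            first = line
        last = line        # after the loop, this is the final line
    return first, last
```

This helper only reads. When the CSV modifications are written in the same pass, tracking the first and last lines inside that loop, as in the answers above, avoids reading the file twice.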

Update iteration value in Python for loop

I'm pretty new to Python and have been writing a script to pick out certain lines of a basic log file.
Basically, the function searches the lines of the file, and when it finds one I want to output to a separate file, it adds it to a list, then also adds the five lines following it. This list then gets written to a separate file at the end, in a different function.
What I've been trying to do next is make the loop continue from the last of those five lines, rather than going over them again. I thought the last line in the code would solve the problem, but unfortunately it doesn't.
Are there any recommended variations of a for loop I could use for this purpose?
def readSingleDayLogs(aDir):
    print 'Processing files in ' + str(aDir) + '\n'
    lineNumber = 0
    try:
        open_aDirFile = open(aDir)  #open the log file
        for aLine in open_aDirFile:  #total the num. lines in file
            lineNumber = lineNumber + 1
        lowerBound = 0
        for lineIDX in range(lowerBound, lineNumber):
            currentLine = linecache.getline(aDir, lineIDX)
            if (bunch of logic conditions):
                issueList.append(currentLine)
                for extraLineIDX in range(1, 6):  #loop over the next five lines of the error and append to issue list
                    extraLine = linecache.getline(aDir, lineIDX+ extraLineIDX)  #get the x extra line after problem line
                    issueList.append(extraLine)
                issueList.append('\n\n')
                lowerBound = lineIDX

You should use a while loop:
line = lowerBound
while line < lineNumber:
    ...
    if conditions:
        ...
        for lineIDX in range(line, line+6):
            ...
        line = line + 6
    else:
        line = line + 1

A for loop takes its values from an iterator over the range, so reassigning the loop variable inside the body has no effect: the next iteration simply takes the next value from the range.
Consider using a while loop instead. That way, you can update the line index directly.
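A small sketch of the difference, using a plain list of log lines in place of the file. The function names and the "ERROR" test are hypothetical stand-ins for the question's conditions. Reassigning the index inside a for loop does not skip anything, while a while loop moves past the five extra lines.

```python
def collect_with_for(lines):
    """Tries to skip ahead by reassigning idx; the for loop ignores it."""
    found = []
    for idx in range(len(lines)):
        if "ERROR" in lines[idx]:
            found.append(lines[idx:idx + 6])   # matching line + next five
            idx += 6                            # no effect on the next iteration
    return found


def collect_with_while(lines):
    """Skips past the five extra lines by advancing the index directly."""
    found = []
    idx = 0
    while idx < len(lines):
        if "ERROR" in lines[idx]:
            found.append(lines[idx:idx + 6])
            idx += 6          # resume after the last of the five extra lines
        else:
            idx += 1
    return found
```

With ["ERROR 1", "ERROR 2", "a", "b", "c", "d", "e", "f"] as input, the for version reports "ERROR 2" a second time even though it was already captured among the first match's five extra lines; the while version returns a single group.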

I would look at something like:
from itertools import islice
with open('somefile') as fin:
    line_count = 0
    my_lines = []
    for line in fin:
        line_count += 1
        if some_logic(line):  # some_logic: your own condition
            my_lines.append(line)
            next_5 = list(islice(fin, 5))
            line_count += len(next_5)
            my_lines.extend(next_5)
This way, by using islice on the input, you're able to move the iterator ahead and resume after the 5 lines (perhaps fewer if near the end of the file) are exhausted.
This assumes I'm understanding correctly that you read forward through the file, identify a line, want only a fixed number of lines after that point, and then resume looping as normal. (You may not even need the line counting if that's all you're after, as it only appears to be used for getline and not for any other purpose.)
If you do want to take the next 5 lines but still consider each of them as a possible match, you can use itertools.tee to branch at the faulty line, islice one branch, and let the other branch resume on the next line. After calling tee, use only the iterators it returns, not fin itself.
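A minimal sketch of that itertools.tee variant, assuming the goal is to capture the next few lines after each match while still checking those lines for matches themselves. collect_with_context and is_issue are hypothetical names. Because the original iterator must not be used after tee, the loop rebinds it to one of the two branches; a plain for loop would keep pulling from the old iterator, so the sketch uses next() in a while loop.

```python
from itertools import islice, tee


def collect_with_context(lines, is_issue, extra=5):
    """Return one list per matching line: the line plus up to `extra` following lines.

    Lines inside a captured block are still checked themselves, so overlapping
    matches each get their own block.
    """
    it = iter(lines)
    blocks = []
    sentinel = object()
    while True:
        line = next(it, sentinel)
        if line is sentinel:      # iterator exhausted
            break
        if is_issue(line):
            # Branch here: `ahead` is consumed for the context lines, while `it`
            # (the other branch) still yields those lines on the next passes.
            it, ahead = tee(it)
            blocks.append([line] + list(islice(ahead, extra)))
    return blocks
```

Each match wraps the iterator in another tee layer, so with very many matches, reading the lines into a list and using an index-based while loop may be simpler.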
