How to convert list to 20 column array [duplicate] - python

This question already has answers here:
How to read a text file into a list or an array with Python
(6 answers)
Closed 3 years ago.
I am making a program that takes a text file that looks something like this:
1
0
1
1
1
and converts it into a list:
['1','0','1','1','1']
The file has 400 lines so I want to convert it into an array that's 20 columns by 20 rows.

Just use slicing to chunk it every 20 entries:
lines = [*range(1,401)]
rows_cols = [lines[i:i + 20] for i in range(0, len(lines), 20)]

Read the characters one by one while counting how many you have collected. There are two cases: you read a data character and the counter is less than 20, or you read a newline character, which does not update the counter. In the first case, append the character to the current row and increment the column counter; in the second, skip the newline and continue with the next character of the text file. Once the counter reaches 20, start a new row of the list and reset the counter.

This saves the characters in rows of 20 columns; if the number of lines is not a multiple of 20, the final row will have fewer than 20 elements and is still added to the main list:
solu = []
leng = 20
with open('result.txt', 'r') as f:
    sol = f.readlines()
tmp = []
for i in sol:
    if len(tmp) < leng:
        tmp.append(i.strip('\n'))
    else:
        print(tmp)
        solu.append(tmp)
        tmp = [i.strip('\n')]  # start the next row with the current line so it isn't dropped
solu.append(tmp)
print(solu)

Related

Finding identical numbers in large files python

I have two data files in python, each containing two-column data as below:
3023084 5764
9152549 5812
18461998 5808
45553152 5808
74141469 5753
106932238 5830
112230478 5795
135207137 5800
148813978 5802
154818883 5798
There are about 10M entries in each file (~400 MB).
I have to sort through each file and check if any number in the first column of one file matches any number in the first column in another file.
The code I currently have converts the files to lists:
ch1 = []
with open('ch1.txt', 'r+') as file:
    for line in file:
        if ':' not in line:
            line = line.split()
            ch1.append([line[0], line[1]])
ch2 = []
with open('ch2.txt', 'r+') as file:
    for line in file:
        if ':' not in line:
            line = line.split()
            ch2.append([line[0], line[1]])
I then iterate through both of the lists looking for a match. When a match is found, I wish to add the sum of the right-hand columns to a new list, coin:
coin = []
for item1 in ch1:
for item2 in ch2:
if item1[0] == item2[0]:
coin.append(int(item1[1]) + int(item2[1]))
The issue is that this is taking a very long time and/or crashing. Is there a more efficient way of doing this?
There are lots of ways to improve this; for example:
Since you only scan through the contents of ch1.txt once (in the outer loop), you don't need to read it into a list; streaming it line by line takes less memory, though it probably won't speed things up much.
If you sort each of your lists, you can check for matches much more efficiently. Something like:
i1, i2 = 0, 0
while i1 < len(ch1) and i2 < len(ch2):
    if ch1[i1][0] == ch2[i2][0]:
        # Do what you do for matches
        ...
        # Advance both indices
        i1 += 1
        i2 += 1
    elif ch1[i1][0] < ch2[i2][0]:
        # Advance the index of the smaller value
        i1 += 1
    else:  # ch1[i1][0] > ch2[i2][0]
        i2 += 1
If the data in the files are already sorted, you can combine both ideas: instead of advancing an index, you simply read in the next line of the corresponding file. This should improve efficiency in time and space.
A few ideas to improve this:
store your data in dictionaries so that the first column is the key and the second column is the value,
a match is then any key in the intersection of the two dictionaries' key sets.
Code example:
# store your data in dicts as follows, inside your existing file-reading loops
ch1_dict = {}
ch2_dict = {}
# ... for each split line:
ch1_dict[line[0]] = line[1]
ch2_dict[line[0]] = line[1]
# this is what you want to achieve
coin = [int(ch1_dict[key]) + int(ch2_dict[key]) for key in ch1_dict.keys() & ch2_dict.keys()]

Why does this error occur when my text files have clearly more than 1 lines?

I'm a beginner in Python. I've checked my text files, and they definitely have more than one line, so I don't understand why it gives me the error on
---> 11 Coachid.append(split[1].rstrip())
IndexError: list index out of range
The problem are the lines:
split=line.split(",")
Coachname.append(split[0].rstrip())
Coachid.append(split[1].rstrip())
The first line assumes that line contains at least one comma, so that after split is called, the variable split is a list of length at least two. But if line contains no commas, then split will have length 1 and Coachid.append(split[1].rstrip()) will raise the error you are getting. You need to add a conditional test on the length of split.
Update
Your code should look like (assuming that the correct action is to append an empty string to the Coachid list if it is missing from the input):
split=line.split(",")
split_length = len(split)
Coachname.append(split[0].rstrip())
# append '' if split_length is less than 2:
Coachid.append('' if split_length < 2 else split[1].rstrip())
etc. for the other fields
If you want to loop over the lines of a file, you can iterate over the file object directly (no readlines() call is needed):
for line in f:
    ...

Add apostrophes to string to make it a certain length python [duplicate]

This question already has answers here:
How can I fill out a Python string with spaces?
(14 answers)
Closed 2 years ago.
I am writing a script which gets information about stock items from a csv. The issue is that most of the stock item IDs are 4 digits long, but some of them are 2 or 3 digits, and the remaining digits are replaced with apostrophes (872' or 99'', for example). Because the user can pass in specific stock item IDs, it would be better if they did not have to include the apostrophes in their input just so the code runs, so I want to append apostrophes to their input IDs.
At the moment, the stock item IDs to get information for are retrieved using this code:
if args.ID:
    if args.ID[0].endswith('.txt'):
        with open(args.ID[0], 'r') as f:
            IDs = [line for line in f]
    else:
        IDs = args.FTID
else:
    IDs = [ID[25:29] for ID in df['Stock Items'].unique()]
Then I iterate through the dataframe:
for index, row in df.iterrows():
    if row['Stock Items'][25:29] in FTIDs:
        # Processing
I need to be able to make sure that any input IDs are in the format above.
If you have a str and you are 100% sure it is never longer than 4 characters, you can use .ljust to pad it to the required length with ' in the following way:
myid = "99"
formatted_id = myid.ljust(4, "'")
print(formatted_id)
Output:
99''
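Going the other way around the question, a whole list of user-supplied IDs can be normalized before comparing them with the csv values; a small sketch (the helper name, the 4-character width, and the apostrophe padding are taken as assumptions from the question):

```python
def normalize_ids(ids, width=4, pad="'"):
    """Pad short stock-item IDs so they match the padded form stored in the csv."""
    # strip() also removes trailing newlines from IDs read out of a .txt file
    return [i.strip().ljust(width, pad) for i in ids]
```

IDs already at full width are left unchanged, since ljust never truncates.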

Python: calculate the sum of a column after splitting the column

I'm new to writing Python and thought I would rewrite some of my programs that are in Perl.
I have a tab-delimited file where columns 9 through the end (the number varies) need to be further split, and then part of each split column summed.
for instance, input (only looking at columns 9-12):
0:0:1:0 0:0:2:0 0:0:3:0 0:0:4:0
0:0:1:0 0:0:2:0 0:0:3:0 0:0:4:0
0:0:1:0 0:0:2:0 0:0:3:0 0:0:4:0
0:0:1:0 0:0:2:0 0:0:3:0 0:0:4:0
output (the sum of each column's element [2]):
4
8
12
16
All I've got so far is
datacol = line.rstrip("\n").split("\t")
for element in datacol[9:len(datacol)]:
    splitcol = int(element.split(r":")[2])
    totalcol += splitcol
print(totalcol)
which doesn't work and gives me the sum of column[2] for each row.
Thanks
mysum = 0
with open('myfilename', 'r') as f:
    for line in f:
        mysum += int(line.split()[3])
line.split() will turn "123 Hammer 20 36" into ["123", "Hammer", "20", "36"].
We take the fourth value 36 using the index [3]. This is still a string, and can be converted to an integer using int or a decimal (floating-point) number using float.
To check for empty lines, add the condition if line: inside the for loop. In your particular case you might do something like:
for line in f:
    words = line.split()
    if len(words) > 3:
        mysum += int(words[3])
Try this:
totalcol = [0, 0, 0, 0]  # store the sum results in a list
with open('myfilename', 'r') as f:
    for line in f:
        # split the line and keep columns 9, 10, 11, 12
        # (counting from 1, assuming there are only 12 columns)
        datacol = line.rstrip("\n").split("\t")[8:]  # lists start at index 0!
        # loop through each column and sum its third element
        for i, element in enumerate(datacol):
            splitcol = int(element.split(":")[2])
            totalcol[i] += splitcol
print(totalcol)

Python - Splitting a large string by number of delimiter occurrences

I'm still learning Python, and I have a question I haven't been able to solve. I have a very long string (millions of lines long) which I would like to be split into a smaller string length based on a specified number of occurrences of a delimeter.
For instance:
ABCDEF
//
GHIJKLMN
//
OPQ
//
RSTLN
//
OPQR
//
STUVW
//
XYZ
//
In this case I would want to split based on "//" and return a string of all lines before the nth occurrence of the delimeter.
So an input of splitting the string by // by 1 would return:
ABCDEF
an input of splitting the string by // by 2 would return:
ABCDEF
//
GHIJKLMN
an input of splitting the string by // by 3 would return:
ABCDEF
//
GHIJKLMN
//
OPQ
And so on. However, the length of the original two-million-line string appeared to be a problem when I simply tried to split the entire string by "//" and work with the individual indexes (I was getting a memory error). Perhaps Python can't handle so many pieces in one split? So I can't do that.
I'm looking for a way to avoid splitting the entire string into hundreds of thousands of pieces when I may only need 100; instead I want to start from the beginning, stop at a certain point, and return everything before it, which I assume may also be faster. I hope my question is as clear as possible.
Is there a simple or elegant way to achieve this? Thanks!
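For a string already in memory, one way to stop early is to advance with str.find instead of splitting everything; a sketch (the function name is illustrative, and the delimiter is assumed to appear literally in the text as in the example):

```python
def head_by_delimiter(text, delimiter, n):
    """Return everything before the nth occurrence of `delimiter` in `text`."""
    pos = 0
    for _ in range(n):
        pos = text.find(delimiter, pos)
        if pos == -1:              # fewer than n occurrences: return the whole text
            return text
        pos += len(delimiter)      # continue searching after this occurrence
    # drop the trailing delimiter itself
    return text[:pos - len(delimiter)]
```

Only the scanned prefix is ever copied, so asking for the first few blocks of a multi-million-line string stays cheap.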
If you want to work with files instead of strings in memory, here is another answer.
This version is written as a function that reads lines and immediately prints them out until the specified number of delimiters have been found (no extra memory needed to store the entire string).
def file_split(file_name, delimiter, n=1):
    with open(file_name) as fh:
        for line in fh:
            line = line.rstrip()  # use .rstrip("\n") to only strip newlines
            if line == delimiter:
                n -= 1
                if n <= 0:
                    return
            print(line)
file_split('data.txt', '//', 3)
You can use this to write the output to a new file like this:
python split.py > newfile.txt
With a little extra work, you can use argparse to pass parameters to the program.
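That argparse wiring might look like the following (the flag name -n and help text are assumptions):

```python
import argparse

def parse_args(argv=None):
    """Command-line parameters for the file-splitting function above."""
    p = argparse.ArgumentParser(
        description="Print lines up to the nth delimiter line.")
    p.add_argument("file_name", help="path to the input file")
    p.add_argument("delimiter", help="delimiter line, e.g. //")
    p.add_argument("-n", type=int, default=1,
                   help="number of delimiter occurrences before stopping")
    return p.parse_args(argv)
```

A script could then call file_split(args.file_name, args.delimiter, args.n) after parsing.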
As a more efficient approach, you can read just the first N chunks separated by your delimiter. If you are sure that every chunk is separated by a delimiter line, you can use itertools.islice to do the job:
from itertools import islice
with open('filename') as f:
    lines = islice(f, 0, 2 * N - 1)
The method that comes to my mind when I read your question uses a for loop where you cut the string into chunks (for example, the 100 you mentioned) and iterate through each substring.
thestring = ""  # your string
steps = 100     # length of the chunks you are going to iterate through
log = 0
substring = thestring[:log + steps]  # the chunk you will split and iterate through
thelist = substring.split("//")
for element in thelist:
    if element_is_wanted:  # pseudocode: your own condition
        # do your thing with the line
        ...
    else:
        log = log + steps
        # and go again from the start, only with this offset
Now you can go through all the elements of the whole two-million(!) line string.
The best thing to do here is actually to make a recursive function out of this (if that is what you want):
thestring = ""  # your string
steps = 100     # length of the chunks you are going to iterate through

def iterateThroughHugeString(beginning):
    substring = thestring[:beginning + steps]  # the chunk you will split and iterate through
    thelist = substring.split("//")
    for element in thelist:
        if element_is_wanted:  # pseudocode: your own condition
            # do your thing with the line
            ...
        else:
            iterateThroughHugeString(beginning + steps)
            # and go again from the start, only with this offset
For instance:
i = 0
s = ""
fd = open("...")
for l in fd:
    if l[:-1] == delimiter:  # skip the trailing '\n'
        i += 1
        if i >= max_split:
            break
    s += l
fd.close()
Since you are learning Python, it would be a good challenge to model a complete, dynamic solution. Here's a notion of how you can model one.
Note: The following code snippet only works for file(s) which is/are in the given format (see the 'For Instance' in the question). Hence, it is a static solution.
num = int(input("Enter delimiter count: ")) * 2
with open("./data.txt") as myfile:
    print([next(myfile) for x in range(num - 1)])
Now that you have the idea, you can use pattern matching and so on.
