reading a file that is detected as being one column

I have a file full of numbers in the following form:
010101228522 0 31010 3 3 7 7 43 0 2 4 4 2 2 3 3 20.00 89165.30
01010222852313 3 0 0 7 31027 63 5 2 0 0 3 2 4 12 40.10 94170.20
0101032285242337232323 7 710153 9 22 9 9 9 3 3 4 80.52 88164.20
0101042285252313302330302323197 9 5 15 9 15 15 9 9 110.63 98168.80
01010522852617 7 7 3 7 31330 87 6 3 3 2 3 2 5 15 50.21110170.50
...
...
I am trying to read this file but I am not sure how to go about it. When I use the built-in function open, loadtxt from numpy, or even convert to pandas, the file is read as one column, that is, its shape is (364, 1). I want the numbers separated into columns, with the blank spaces replaced by zeros. Any help would be appreciated. NOTE: in some places there are two spaces in a row.

If the column's content type is a string, have you tried str.split()? It turns the string into a list, splitting it at each gap so that each number becomes a separate item. You could then use a for loop over the items in that list to build a table out of it. I'm not quite sure this answers the question; sorry if not.
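A small illustration of how str.split() behaves when a line contains a double space (the sample string is made up, not taken from the data file). With no argument, split() treats any run of whitespace as a single separator, so blank positions disappear; split(' ') instead produces an empty string for each extra space, which can then be mapped to "0". Whether one empty string lines up with one blank column depends on the file's actual layout, so treat this as a sketch.

```python
line = "12 3  45"  # hypothetical sample line containing a double space

print(line.split())     # ['12', '3', '45']: runs of spaces collapse
print(line.split(' '))  # ['12', '3', '', '45']: '' marks the extra space

# Map the empty strings to "0" if blanks should count as zeros
zeros_filled = [tok if tok else "0" for tok in line.split(' ')]
print(zeros_filled)     # ['12', '3', '0', '45']
```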

So I finally solved my problem: I had to strip the lines and then read each "letter" (character) from the line. In my case I am picking the individual digits from the stripped line and appending them to an array. Here is the code for my solution:
import pandas as pd

arr = []
with open('Kp2001', 'r') as f:
    for ii, line in enumerate(f):
        arr.append([])  # Start a new row for this line
        cnt = line.strip()  # Strip the newline (and any leading/trailing spaces)
        for letter in cnt:  # Get each 'letter' from the line; in my case, the individual digits
            arr[ii].append(letter)  # Append them individually so Python does not read them as one string
df = pd.DataFrame(arr)  # Converting to a DataFrame gives proper columns and keeps the spaces in their respective columns
df2 = df.replace(' ', 0)  # Replace the spaces with whatever you want
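The character-by-character approach splits every field into single characters, so a decimal value such as 20.00 or 89165.30 ends up spread over several columns. If the file is fixed-width (each field occupies a known number of characters), slicing each line at those positions keeps each field together and still lets blank fields become zeros. The sketch below assumes you know the field widths; the widths in the demo are hypothetical and would have to come from the file's format description. The function names are illustrative, not from the original post. pandas also provides pd.read_fwf for reading fixed-width files.

```python
def split_fixed_width(line, widths, fill="0"):
    """Cut one line into consecutive fields of the given widths.

    Fields that are entirely blank (or past the end of a short line)
    are replaced by `fill`.
    """
    fields = []
    pos = 0
    for width in widths:
        field = line[pos:pos + width].strip()
        fields.append(field if field else fill)
        pos += width
    return fields


def read_fixed_width(path, widths, fill="0"):
    """Read every non-empty line of `path` into a list of field lists."""
    with open(path) as f:
        return [split_fixed_width(line.rstrip("\n"), widths, fill)
                for line in f if line.strip()]


# Hypothetical layout: four fields of 2 characters each
print(split_fixed_width("12 3  45", [2, 2, 2, 2]))  # ['12', '3', '0', '45']
```

Converting the resulting strings with int() or float() is left to the caller, since the right type depends on the column.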

Related

How to make a grid of the size a rows x b columns from a list containing exactly a*b items?

How do I make a 3x5 grid out of a list containing 15 items/strings?
I have a list containing 15 symbols, but it could just as well be a list such as mylist = list(range(15)), that I want to display in a grid with 3 rows and 5 columns. How does that work without importing another module?
I've been playing around with for loops to try to find a way, but it's not very intuitive yet, so I keep printing long lines like 0 0 0 0 0 1 1 1 1 1 2 2 2 2 2. I apologize for this 'dumb' question, but I'm an absolute beginner, as you can tell, and I don't know how to move forward with this simple problem.
This is the output I was expecting. I want to slowly work my way up to making a playing field or a tic-tac-toe game, but first I want to understand displaying grids, lists, etc. as well as possible:
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15

An m x n grid? There are multiple ways to do it.
Method 1
Print every n elements as one row:
mylist = list(range(15))
n = 5
# Generator of consecutive slices, each n items long
chunks = (mylist[i:i+n] for i in range(0, len(mylist), n))
for chunk in chunks:
    print(*chunk)
This gives a 3x5 grid:
0 1 2 3 4
5 6 7 8 9
10 11 12 13 14
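Since the question asks for a solution without imports and mentions experimenting with for loops, here is a sketch using two nested loops and index arithmetic: the item in row r, column c sits at index r * cols + c. The function name grid_lines is hypothetical; it checks that the list holds exactly rows * cols items before building the rows.

```python
def grid_lines(items, rows, cols):
    """Return one space-separated string per row of a rows x cols grid."""
    if len(items) != rows * cols:
        raise ValueError("expected exactly rows * cols items")
    lines = []
    for r in range(rows):
        # Row r holds the items at indexes r*cols .. r*cols + cols - 1
        row = [items[r * cols + c] for c in range(cols)]
        lines.append(" ".join(str(x) for x in row))
    return lines


for line in grid_lines(list(range(1, 16)), 3, 5):
    print(line)
```

With list(range(1, 16)) this prints the 1 to 15 layout from the question; returning strings instead of printing directly makes the same function reusable for a game board later.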
Method 2
If you want a more cosmetic output, you can try the third-party tabulate package. Install it with:
pip install tabulate
Code
from tabulate import tabulate

mylist = list(range(15))
wrap = [mylist[x:x+5] for x in range(0, len(mylist), 5)]
print(tabulate(wrap))
Gives:
-- -- -- -- --
0 1 2 3 4
5 6 7 8 9
10 11 12 13 14
-- -- -- -- --

Edit columns in a text file

I have a text file containing 4 columns. I need to remove the first two columns and replace them with one column. The value I should put in the new column is produced in a loop. Here is what I am trying to do.
The input is like this:
1 2 3 4
5 6 7 8
9 1 2 3
The output should be like this:
d 3 4
d 7 8
d 2 3
where "d" is a variable that is produced in a loop for each line.
with open('EQ.txt','r') as f:
    i = 0
    for line in f
        ...
        ...
        d=r+d
        with open(c.txt, "w") as wrt:
            new_line = d\n.format(line[2], line[3])
            wrt.write(new_line)

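What you want is:
new_line = "%d %s %s\n" % (d, fields[2], fields[3])
The format string has to be in quotes. %-style placeholders are filled in with the % operator, not with .format(): %d prints d as a decimal integer and %s inserts the other two columns as strings. Also note that line[2] is the third character of the line, not the third column, so split the line first with fields = line.split().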
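A fuller sketch of the task, under a few assumptions: the values of d are computed elsewhere (the question elides that part with "...") and passed in as a list with one value per line, and the function names are hypothetical. It also opens the output file only once; in the question's code, opening c.txt with "w" inside the loop truncates the file on every iteration, so only the last line would survive.

```python
def replace_first_two_columns(lines, new_values):
    """Return new lines whose first two whitespace-separated columns are
    replaced by a single value from `new_values` (one value per line).

    zip() stops at the shorter input, so supply one value per line.
    """
    result = []
    for line, d in zip(lines, new_values):
        fields = line.split()
        # Put d in front and keep columns 3 onward unchanged
        result.append("{} {}\n".format(d, " ".join(fields[2:])))
    return result


def rewrite_file(src, dst, new_values):
    """Read src, skipping blank lines, and write the rewritten lines to dst.

    dst is opened once, outside any per-line loop, so it is not
    truncated again for every input line.
    """
    with open(src) as f:
        lines = [line for line in f if line.strip()]
    with open(dst, "w") as wrt:
        wrt.writelines(replace_first_two_columns(lines, new_values))


sample = ["1 2 3 4\n", "5 6 7 8\n", "9 1 2 3\n"]
print("".join(replace_first_two_columns(sample, ["d", "d", "d"])), end="")
```

For the files in the question this would be called as rewrite_file('EQ.txt', 'c.txt', values), where values is the list your loop produces.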

Why is set not calculating my unique integers?

I just started teaching myself Python last night via Python documentation, tutorials and SO questions.
So far I can ask a user for a file, open and read the file, remove all # and beginning \n in the file, read each line into an array, and count the number of integers per line.
I want to calculate the number of unique integers per line. I realized that Python has sets, which I thought would work perfectly for this calculation. However, I always get a value one greater than the previous value (I will show you). I looked at other SO posts about sets and cannot see what I am missing, and I have been stumped for a while.
Here is the code:
with open(filename, 'r') as file:
    for line in file:
        if line.strip() and not line.startswith("#"):
            #calculate the number of integers per line
            names_list.append(line)
            #print "There are ", len(line.split()), " numbers on this line"
            #print names_list
            #calculate the number of unique integers
            myset = set(names_list)
            print myset
            myset_count = len(myset)
            print "unique:",myset_count
For further explanation:
names_list is:
['1 2 3 4 5 6 5 4 5\n', '14 62 48 14\n', '1 3 5 7 9\n', '123 456 789 1234 5678\n', '34 34 34 34 34\n', '1\n', '1 2 2 2 2 2 3 3 4 4 4 4 5 5 6 7 7 7 1 1\n']
and myset is:
set(['1 2 3 4 5 6 5 4 5\n', '1 3 5 7 9\n', '34 34 34 34 34\n', '14 62 48 14\n', '1\n', '1 2 2 2 2 2 3 3 4 4 4 4 5 5 6 7 7 7 1 1\n', '123 456 789 1234 5678\n'])
The output I receive is:
unique: 1
unique: 2
unique: 3
unique: 4
unique: 5
unique: 6
unique: 7
The output that should occur is:
unique: 6
unique: 3
unique: 5
unique: 5
unique: 1
unique: 1
unique: 7
Any suggestions as to why my set per line is not calculating the correct number of unique integers per line? I would also like any suggestions on how to improve my code in general (if you would like) because I just started learning Python by myself last night and would love tips. Thank you.

The problem is that as you are iterating over your file you are appending each line to the list names_list. After that, you build a set out of these lines. Your text file does not seem to have any duplicate lines, so printing the length of your set just displays the current number of lines you have processed.
Here's a commented fix:
with open(filename, 'r') as file:
    for line in file:
        if line.strip() and not line.startswith("#"):
            numbers = line.split()  # splits the string by whitespace and gives you a list
            unique_numbers = set(numbers)  # builds a set of the strings in numbers
            print(len(unique_numbers))  # prints number of items in the set
Note that we use the line currently being processed and build a set from it (after splitting the line). Your original code stores all lines and then, in each iteration, builds a set from all the lines seen so far.

myset = set(names_list)
should be
myset = set(line.split())
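Putting the fix together as a small, testable Python 3 function (the name unique_counts is hypothetical). It converts each token with int() so that, for example, "01" and "1" count as the same number; this assumes every token on a data line is an integer, since int() raises ValueError otherwise. Run on the question's names_list, it reproduces the expected output.

```python
def unique_counts(lines):
    """Return the number of distinct integers on each data line,
    skipping blank lines and lines that start with '#'."""
    counts = []
    for line in lines:
        if line.strip() and not line.startswith("#"):
            # Build the set from this line only, not from all lines so far
            numbers = {int(tok) for tok in line.split()}
            counts.append(len(numbers))
    return counts


names_list = ['1 2 3 4 5 6 5 4 5\n', '14 62 48 14\n', '1 3 5 7 9\n',
              '123 456 789 1234 5678\n', '34 34 34 34 34\n', '1\n',
              '1 2 2 2 2 2 3 3 4 4 4 4 5 5 6 7 7 7 1 1\n']
for count in unique_counts(names_list):
    print("unique:", count)
```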

Finding the most frequent items in a dataset

I am working with a big dataset and thus I only want to use the items that are most frequent.
Simple example of a dataset:
1 2 3 4 5 6 7
1 2
3 4 5
4 5
4
8 9 10 11 12 13 14
15 16 17 18 19 20
4 has 4 occurrences,
5 has 3 occurrences,
1 has 2 occurrences,
2 has 2 occurrences,
3 has 2 occurrences,
I want to be able to generate a new dataset with only the most frequent items, in this case the 5 most common (every item that occurs more than once):
The wanted result:
1 2 3 4 5
1 2
3 4 5
4 5
4
I can find the 50 most common items, but I am failing to write the filtered dataset out correctly (my output ends up being the same as the original dataset).
Here is my code:
from collections import Counter
with open('dataset.dat', 'r') as f:
    lines = []
    for line in f:
        lines.append(line.split())
c = Counter(sum(lines, []))
p = c.most_common(50);
with open('dataset-mostcommon.txt', 'w') as output:
    ..............
Can someone please help me on how I can achieve it?

You have to iterate over the dataset again and, for each line, show only the items that are in the most-common set.
If the input lines are sorted, you can simply take a set intersection and print it in sorted order. If they are not, iterate over each line's data and check each item:
most_common_elements = {item for item, count in p}  # p = c.most_common(50) from the question
for line in dataset:  # dataset: the input lines, e.g. the reopened file
    for element in line.split():
        if element in most_common_elements:
            print(element, end=' ')
    print()
PS: For Python 2, add from __future__ import print_function at the top of your script.
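A self-contained sketch of that approach (the function name filter_most_common is hypothetical). It keeps the original order of items within each line and drops lines that end up empty, which matches the wanted result in the question when n=5. If several items tie at the cutoff count, Counter.most_common keeps the ones encountered first, so the choice of n matters.

```python
from collections import Counter


def filter_most_common(lines, n):
    """Keep only the n most common items on each line, preserving order.

    Lines left with no items are dropped.
    """
    rows = [line.split() for line in lines]
    counts = Counter(item for row in rows for item in row)
    keep = {item for item, _ in counts.most_common(n)}
    result = []
    for row in rows:
        kept = [item for item in row if item in keep]
        if kept:
            result.append(" ".join(kept))
    return result


dataset = ["1 2 3 4 5 6 7", "1 2", "3 4 5", "4 5", "4",
           "8 9 10 11 12 13 14", "15 16 17 18 19 20"]
for row in filter_most_common(dataset, 5):
    print(row)
```

To produce the output file from the question, pass the lines of dataset.dat and write each returned row followed by "\n" to dataset-mostcommon.txt.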

According to the documentation, c.most_common() returns a list of (element, count) tuples, so you can get the desired output as follows:
with open('dataset-mostcommon.txt', 'w') as output:
    for item, occurrence in p:
        # item is a string produced by split(), so format it with %s
        output.write("%s has %d occurrences,\n" % (item, occurrence))

Formatting lists

I need to create a function that takes lists from the user and prints them as follows:
>>> print_table([[0,1,2,3,4,5],[0,1,4,9,16,25],[0,1,8,27,64,125]])
0 1 2 3 4 5
0 1 4 9 16 25
0 1 8 27 64 125
>>> print_table(times_table(6,6))
0 0 0 0 0 0 0
0 1 2 3 4 5 6
0 2 4 6 8 10 12
0 3 6 9 12 15 18
0 4 8 12 16 20 24
0 5 10 15 20 25 30
0 6 12 18 24 30 36
The times_table refers to my current code:
def times_table(s):
    n = int(input('Please enter a positive integer between 1 and 15: '))
    for row in range(n+1):
        s = ''
        for col in range(n+1):
            s += '{:3} '.format(row * col)
        print(s)
Help me if you can....

To get two values as input from the user, i.e. number of columns and rows, you can do as follows:
in_values = input('Please enter two positive integers between 1 and 15, separated by comma (e.g. 2,3): ')
m,n = map(int, in_values.split(','))
print(m,n)

To print out a formatted list of lists, you may wish to consider using string formatting through the format() method of strings. One thing I notice in your upper example is that you only get to 3 digits, and the space between the numbers seems to be unchanging. For lists with large numbers, this will likely mess up the formatting of the table. By using the format() method, you can take this into account and keep your table nicely spaced.
The easiest way I can think of to accomplish this is to determine which entry in the entire list of lists needs the most characters and then incorporate that width in the formatting. I would recommend reading up on string formatting in Python (including the format specification mini-language).
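Assuming s is the argument passed to print_table:
maxchars = max(len(str(x)) for row in s for x in row)
This gives the largest number of characters in any single entry of the list of lists. (max(max(s)) is not a safe shortcut: max(s) compares the rows lexicographically, so it does not necessarily pick the row containing the widest number.) You can then use this number when formatting the rows in a for loop: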
for lst in s:
    output = ""
    for i in lst:
        output += "{0:<{1}} ".format(i, maxchars)
    print(output)
The line output += "{0:<{1}} ".format(i, maxchars) formats the number ({0} maps to i in the call to format) left-aligned (<) in a field maxchars characters wide ({1} maps to maxchars in the call to format), followed by a space.
So given your list of lists above, it will print:
0   1   2   3   4   5
0   1   4   9   16  25
0   1   8   27  64  125
but if any of the numbers are much larger (for example, if 125 is replaced with 125125), it will unfortunately look like this, because every item is padded to the width of a 6-character number:
0      1      2      3      4      5
0      1      4      9      16     25
0      1      8      27     64     125125
The above example takes a variable number of characters into account. If a fixed width is sufficient, you could instead replace {1} with an integer and drop maxchars entirely (both computing it and passing it to format):
output += "{0:<4} ".format(i)
Optionally, you could figure out how to determine the largest number in a given column and then just format that column appropriately, however I am not going to put that in this answer.
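The per-column variant mentioned above could be sketched like this. The names format_table and times_table(rows, cols) are hypothetical (the two-argument signature is inferred from the question's print_table(times_table(6,6)) call), and the sketch assumes every row has the same length, because zip(*table) stops at the shortest row. Columns stay left-aligned, as in the answer.

```python
def times_table(rows, cols):
    """Multiplication table with rows 0..rows and columns 0..cols."""
    return [[r * c for c in range(cols + 1)] for r in range(rows + 1)]


def format_table(table):
    """Return the table as lines, each column padded to its own width."""
    # zip(*table) yields the columns; assumes all rows have equal length
    widths = [max(len(str(x)) for x in column) for column in zip(*table)]
    lines = []
    for row in table:
        cells = ["{0:<{1}}".format(x, w) for x, w in zip(row, widths)]
        lines.append(" ".join(cells).rstrip())
    return lines


def print_table(table):
    for line in format_table(table):
        print(line)


print_table([[0, 1, 2, 3, 4, 5], [0, 1, 4, 9, 16, 25], [0, 1, 8, 27, 64, 125]])
print_table(times_table(6, 6))
```

Keeping the formatting in format_table and the printing in print_table makes the layout easy to check without capturing printed output.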
