Edit columns in a text file - python

I have a text file containing 4 columns. I need to remove the first two columns and replace them with one new column. The value I should put in the new column is produced in a loop. Here is what I am trying to do.
The input is like this:
1 2 3 4
5 6 7 8
9 1 2 3
The output should be like this:
d 3 4
d 7 8
d 2 3
but "d" is a variable that is being produced in a loop for each line.
with open('EQ.txt', 'r') as f:
    i = 0
    for line in f:
        ...
        ...
        d = r + d
        with open(c.txt, "w") as wrt:
            new_line = d\n.format(line[2], line[3])
            wrt.write(new_line)

What you want is:
new_line = "%s %s %s\n" % (d, cols[2], cols[3])
The format string has to be in quotes, with %s placeholders for the values, and the three values listed in a tuple after the % operator. Your original line mixes two styles: %-formatting uses the % operator, while .format() uses {} placeholders. Also note that line[2] indexes a single character of the string; split the line first (cols = line.split()) to get whole columns.
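A minimal end-to-end sketch of the whole task (compute_d is a hypothetical stand-in for whatever value your loop produces; file names are taken from the question). Note that the output file is opened once, outside the loop: reopening it with "w" on every line would overwrite what was written before.

with open('EQ.txt') as f, open('c.txt', 'w') as wrt:
    for line in f:
        cols = line.split()   # e.g. ['1', '2', '3', '4']
        d = compute_d(cols)   # hypothetical: the value your loop produces for this line
        wrt.write("%s %s %s\n" % (d, cols[2], cols[3]))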

Related

Reading a file that is detected as being one column

I have a file full of numbers in the form;
010101228522 0 31010 3 3 7 7 43 0 2 4 4 2 2 3 3 20.00 89165.30
01010222852313 3 0 0 7 31027 63 5 2 0 0 3 2 4 12 40.10 94170.20
0101032285242337232323 7 710153 9 22 9 9 9 3 3 4 80.52 88164.20
0101042285252313302330302323197 9 5 15 9 15 15 9 9 110.63 98168.80
01010522852617 7 7 3 7 31330 87 6 3 3 2 3 2 5 15 50.21110170.50
...
...
I am trying to read this file, but I am not sure how to go about it. Whether I use the built-in open, numpy's loadtxt, or even pandas, the file is read as one column, that is, its shape is (364 x 1). I want the numbers separated into columns, with the blank spaces replaced by zeros. Any help would be appreciated. NOTE: in some places two spaces follow each other.
If the column contents are strings, have you tried str.split()? It turns the string into a list, with each number split at each gap. You could then loop over the items of that list to build a table out of it. Not quite sure this answers the question; sorry if not.
So I finally solved my problem. I actually had to strip the lines and then read each "letter" from the line; in my case I am picking individual numbers from the stripped line and appending them to an array. Here is the code for my solution:
import pandas as pd

arr = []
with open('Kp2001', 'r') as f:
    for ii, line in enumerate(f):
        arr.append([])              # creates an n-d array
        cnt = line.strip()          # strip the lines
        for letter in cnt:          # get each 'letter' from the line, in my case the individual numbers
            arr[ii].append(letter)  # append them individually so Python does not read them as one string
df = pd.DataFrame(arr)    # converting to DataFrame gives proper columns and keeps the spaces in their respective columns
df2 = df.replace(' ', 0)  # replace the spaces with what you will
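Since the file looks fixed-width, pandas' read_fwf can often read it directly. A sketch, where the widths list is only an illustrative guess and must be adjusted to the file's real column layout:

import pandas as pd

# Illustrative guess at the column widths; adjust to the actual layout of Kp2001.
widths = [12] + [2] * 8 + [3] + [2] * 8 + [6, 9]
df = pd.read_fwf('Kp2001', widths=widths, header=None)
df = df.fillna(0)  # blank fixed-width fields come back as NaN; turn them into zeros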

Finding particular column value via regex

I have a txt file containing multiple rows as below.
56.0000 3 1
62.0000 3 1
74.0000 3 1
78.0000 3 1
82.0000 3 1
86.0000 3 1
90.0000 3 1
94.0000 3 1
98.0000 3 1
102.0000 3 1
106.0000 3 1
110.0000 3 0
116.0000 3 1
120.0000 3 1
Now I am looking for the row which has '0' in the third column.
I am using Python's re package. I have tried re.match("(.*)\s+(0-9)\s+(1)", line), but to no avail.
What regular expression pattern should I be looking for?
You probably don't need a regex for this. You can strip trailing whitespaces from the right side of the line and then check the last character:
if line.rstrip()[-1] == "0":  # since your last column only contains 0 or 1
    ...
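If you do want a regex, something along these lines works (a sketch; it assumes three whitespace-separated columns as in the sample):

import re

line = "110.0000 3 0"
# \S+ matches each non-space column; the final column must be exactly 0.
if re.match(r"\S+\s+\S+\s+0\s*$", line):
    print("third column is 0")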
Just split the line and read the value from the resulting list.
>>> line = "56.0000 3 1"
>>> a=line.split()
>>> a
['56.0000', '3', '1']
>>> print a[2]
1
>>>
Summary:
f = open("sample.txt",'r')
for line in f:
tmp_list = line.split()
if int(tmp_list[2]) == 0:
print "Line has 0"
print line
f.close()
Output:
C:\Users\dinesh_pundkar\Desktop>python c.py
Line has 0
110.0000 3 0

Python script for copying a column

This seems like a simple question, but I can't find an answer.
Input:
a 3 4
b 1 4
c 8 3
d 3 8
Wanted output:
a a 3 4
b b 1 4
c c 8 3
d d 3 8
Note: the input .txt file has many rows in the first column.
You didn't ask for it, but would you want awk? You could do:
awk '{$1=$1 OFS $1}1' Input
(reassigning $1 rebuilds the record with the first field duplicated, and the trailing 1 triggers the default print action), or the more obvious but less flexible:
awk '{print $1, $1, $2, $3}' Input
Assuming you've read your results into an array, you want:
values = ["a",1,2,3]
values.insert(0,values[0])
This inserts the value of index 0 (in this case "a") at position 0, moving all the other contents of values to the right.
This will also work on strings, so if your results are read as a string you can do the following - please note that I am including the spaces after each digit and am doing it a bit differently:
values="a 1 2 3"
values = values[:2] + values
In this example we take the first two characters (values[:2], equivalently values[0:2], i.e. "a ") and prepend them to the existing string.
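A minimal sketch tying this to the file from the question (input.txt is an assumed file name):

with open('input.txt') as f:
    for line in f:
        values = line.split()        # e.g. ['a', '3', '4']
        values.insert(0, values[0])  # duplicate the first column
        print(' '.join(values))      # -> a a 3 4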
Hope this helps!
Try this:
fin = open("text.txt")
content = fin.readlines()
fin.close()
for elem in content:
print(elem[0],elem[0]+elem[1:-1])
Output:
a a 3 4
b b 1 4
c c 8 3
d d 3 8
with open("sample.csv") as inputs:
for line in inputs:
trimed_line = line.strip()
parts = trimed_line.split()
print("{0} {1}".format(parts[0], trimed_line))
output:
a a 3 4
b b 1 4
c c 8 3
d d 3 8

Why is set not calculating my unique integers?

I just started teaching myself Python last night via Python documentation, tutorials and SO questions.
So far I can ask a user for a file, open and read the file, remove all # and beginning \n in the file, read each line into an array, and count the number of integers per line.
I want to calculate the number of unique integers per line. I realized that Python has a set type, which I thought would work perfectly for this calculation. However, each value I receive is one greater than the prior value (I will show you). I looked at other SO posts related to sets, cannot see what I am missing, and have been stumped for a while.
Here is the code:
with open(filename, 'r') as file:
    for line in file:
        if line.strip() and not line.startswith("#"):
            # calculate the number of integers per line
            names_list.append(line)
            #print "There are ", len(line.split()), " numbers on this line"
            #print names_list
            # calculate the number of unique integers
            myset = set(names_list)
            print myset
            myset_count = len(myset)
            print "unique:", myset_count
For further explanation:
names_list is:
['1 2 3 4 5 6 5 4 5\n', '14 62 48 14\n', '1 3 5 7 9\n', '123 456 789 1234 5678\n', '34 34 34 34 34\n', '1\n', '1 2 2 2 2 2 3 3 4 4 4 4 5 5 6 7 7 7 1 1\n']
and my_set is:
set(['1 2 3 4 5 6 5 4 5\n', '1 3 5 7 9\n', '34 34 34 34 34\n', '14 62 48 14\n', '1\n', '1 2 2 2 2 2 3 3 4 4 4 4 5 5 6 7 7 7 1 1\n', '123 456 789 1234 5678\n'])
The output I receive is:
unique: 1
unique: 2
unique: 3
unique: 4
unique: 5
unique: 6
unique: 7
The output that should occur is:
unique: 6
unique: 3
unique: 5
unique: 5
unique: 1
unique: 1
unique: 7
Any suggestions as to why my set per line is not calculating the correct number of unique integers per line? I would also like any suggestions on how to improve my code in general (if you would like) because I just started learning Python by myself last night and would love tips. Thank you.
The problem is that as you are iterating over your file you are appending each line to the list names_list. After that, you build a set out of these lines. Your text file does not seem to have any duplicate lines, so printing the length of your set just displays the current number of lines you have processed.
Here's a commented fix:
with open(filename, 'r') as file:
    for line in file:
        if line.strip() and not line.startswith("#"):
            numbers = line.split()         # splits the string by whitespace and gives you a list
            unique_numbers = set(numbers)  # builds a set of the strings in numbers
            print(len(unique_numbers))     # prints the number of items in the set
Note that we use the currently processed line and build a set from it (after splitting the line). Your original code stores all lines and then builds a set from those lines on each iteration.
myset = set(names_list)
should be
myset = set(line.split())
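A quick interactive check on the first line of names_list shows why this gives the expected count of 6:

>>> line = '1 2 3 4 5 6 5 4 5\n'
>>> sorted(set(line.split()))
['1', '2', '3', '4', '5', '6']
>>> len(set(line.split()))
6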

Finding the most frequent items in a dataset

I am working with a big dataset and thus I only want to use the items that are most frequent.
Simple example of a dataset:
1 2 3 4 5 6 7
1 2
3 4 5
4 5
4
8 9 10 11 12 13 14
15 16 17 18 19 20
4 has 4 occurrences,
1 has 2 occurrences,
2 has 2 occurrences,
5 has 2 occurrences,
I want to be able to generate a new dataset just with the most frequent items, in this case the 4 most common:
The wanted result:
1 2 3 4 5
1 2
3 4 5
4 5
4
I am finding the first 50 most common items, but I am failing to print them out correctly (my output ends up being the same as the input dataset).
Here is my code:
from collections import Counter

with open('dataset.dat', 'r') as f:
    lines = []
    for line in f:
        lines.append(line.split())

c = Counter(sum(lines, []))
p = c.most_common(50)

with open('dataset-mostcommon.txt', 'w') as output:
    ..............
Can someone please help me on how I can achieve it?
You have to iterate over the dataset again and, for each line, print only the items that are in the most-common set.
If the input lines are sorted, you may just do a set intersection and print the result in sorted order. If they are not, iterate over your line data and check each item:
for line in dataset:
    for element in line.split():
        if element in most_common_elements:
            print(element, end=' ')
    print()
PS: For Python 2, add from __future__ import print_function at the top of your script.
According to the documentation, c.most_common() returns a list of tuples, so you can get the desired output as follows:
with open('dataset-mostcommon.txt', 'w') as output:
    for item, occurrence in p:
        output.write("%s has %d occurrences,\n" % (item, occurrence))
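Putting the pieces together, a minimal sketch of the full pipeline (file names and the threshold of 50 as in the question):

from collections import Counter

# Read the dataset and count every token.
with open('dataset.dat') as f:
    lines = [line.split() for line in f]
counts = Counter(token for line in lines for token in line)

# Keep only the 50 most common tokens.
most_common_elements = {token for token, _ in counts.most_common(50)}

# Rewrite the dataset, keeping only those tokens.
with open('dataset-mostcommon.txt', 'w') as output:
    for line in lines:
        kept = [token for token in line if token in most_common_elements]
        output.write(' '.join(kept) + '\n')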
