Count all +1's in the file - python

I have the following data:
1 3 4 2 6 7 8 8 93 23 45 2 0 0 0 1
0 3 4 2 6 7 8 8 90 23 45 2 0 0 0 1
0 3 4 2 6 7 8 6 93 23 45 2 0 0 0 1
-1 3 4 2 6 7 8 8 21 23 45 2 0 0 0 1
-1 3 4 2 6 7 8 8 0 23 45 2 0 0 0 1
The above data is in a file. I want to count the number of 1's, 0's, and -1's, but only in the first column. I am reading the file from standard input, and the only way I could think of is to do it like this:
cnt = 0
cnt1 = 0
cnt2 = 0
for line in sys.stdin:
    (t1, <having 15 different variables as that many columns are in the file>) = re.split(r"\s+", line.strip())
    if re.match(r"\+?1", t1):
        cnt = cnt + 1
    if re.match("-1", t1):
        cnt1 = cnt1 + 1
    if re.match("0", t1):
        cnt2 = cnt2 + 1
How can I make it better, especially the 15-different-variables part, as that's the only place where I will be using those variables?

Use collections.Counter:
from collections import Counter

with open('abc.txt') as f:
    c = Counter(int(line.split(None, 1)[0]) for line in f)
print c
Output:
Counter({0: 2, -1: 2, 1: 1})
Here str.split(None, 1) splits the line just once:
>>> s = "1 3 4 2 6 7 8 8 93 23 45 2 0 0 0 1"
>>> s.split(None, 1)
['1', '3 4 2 6 7 8 8 93 23 45 2 0 0 0 1']
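Since a Counter is a dict subclass, the individual counts the question asks for can be read back by indexing it:
>>> c[1], c[0], c[-1]
(1, 2, 2)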
NumPy makes it even easier:
>>> import numpy as np
>>> from collections import Counter
>>> Counter(np.loadtxt('abc.txt', usecols=(0,), dtype=int))
Counter({0: 2, -1: 2, 1: 1})
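If you are staying in NumPy anyway, a sketch of an alternative that skips Counter, using np.unique with return_counts=True (available in NumPy 1.9+):
>>> col = np.loadtxt('abc.txt', usecols=(0,), dtype=int)
>>> np.unique(col, return_counts=True)
(array([-1,  0,  1]), array([2, 2, 1]))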

If you only want the first column, then only split off the first column, and use a dictionary to store the counts for each value.
count = dict()
for line in sys.stdin:
    (t1, rest) = line.split(' ', 1)
    try:
        count[t1] += 1
    except KeyError:
        count[t1] = 1
for item in count:
    print '%s occurs %i times' % (item, count[item])

Instead of using tuple unpacking, which requires exactly as many variables as there are parts returned by split(), you can just take the first element of those parts:
parts = re.split(r"\s+", line.strip())
t1 = parts[0]
or equivalently, simply
t1 = re.split(r"\s+", line.strip())[0]
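On Python 3 you could also keep the tuple-unpacking shape without naming all the columns, via extended iterable unpacking; a minimal sketch:
t1, *rest = line.split()  # rest collects the remaining 15 columns as a list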

import collections

def countFirstColumn(fileName):
    res = collections.defaultdict(int)
    with open(fileName) as f:
        for line in f:
            key = line.split(" ")[0]
            res[key] += 1
    return res
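Called on the sample data, assuming it is saved as abc.txt, this would give:
>>> dict(countFirstColumn('abc.txt'))
{'1': 1, '0': 2, '-1': 2}
Note that the keys are strings here, since the first field is never converted to int.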

rows = []
for line in f:  # f is an open file object, e.g. f = open('abc.txt')
    columns = line.strip().split(" ")
    rows.append(columns)
This gives you a two-dimensional list.
1st column:
for row in rows:
    print row[0]
output:
1
0
0
-1
-1

This is from a script of mine that reads from a file. Note that it makes two passes, so the input must be seekable; for standard input you would have to buffer the lines first:
dictionary = {}
for line in someInfile:
    f = line.strip('\n').split()
    dictionary[f[0]] = 0
someInfile.seek(0)  # rewind for the second pass
for line in someInfile:
    f = line.strip('\n').split()
    dictionary[f[0]] += 1
print dictionary
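Since sys.stdin cannot be rewound for a second pass, here is a single-pass sketch of the same idea using dict.get:
import sys

counts = {}
for line in sys.stdin:
    key = line.split()[0]
    counts[key] = counts.get(key, 0) + 1
print counts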

Related

Random replace only on select indices at a set frequency

I am trying to replace data at a set frequency, but only in selected columns. With the help of AKX here, I was able to find a solution that randomly replaces values across an entire list. I feel bad for asking because I already asked a similar question, but I can't seem to find a solution for this case regardless. What I am trying to do exactly: if I have a list that contains 4 values, I want to be able to select which values are randomly replaced based on their indices. For example, if I select indices 2 and 4, I only want to replace values at those indices, while indices 1 and 3 remain unaltered.
import random

vals = ["*"]

def replace_random(lst, min_n, max_n, replacements):
    n = random.randint(min_n, max_n)
    if n == 0:
        return lst
    indexes = set(random.sample(range(len(lst)), n))
    return [
        random.choice(replacements)
        if index in indexes
        else value
        for index, value
        in enumerate(lst)
    ]
Example of applying it:
with open("test2.txt", "w") as out, open("test.txt", "rt") as f:
    for line in f:
        li = line.strip()
        tabs = li.split("\t")
        geno = tabs[1:]
        new_geno = replace_random(geno, 0, 5, vals)
        print(new_geno)
Here is what I have been trying to do to achieve the goal:
M = [1, 3]
with open("test2.txt", "w") as out, open("test.txt", "rt") as f:
    for line in f:
        li = line.strip()
        tabs = li.split("\t")
        geno = tabs[1:]
        new_geno = replace_random(geno[M], 0, 1, vals)  # fails here
        print(new_geno)
However, I get the following error when I try this:
TypeError: list indices must be integers or slices, not list
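The error occurs because a Python list cannot be indexed with another list. To pick out only the elements at those indices you would need something like a comprehension (a hypothetical helper line, not a complete fix):
selected = [geno[i] for i in M]  # M = [1, 3] picks out the 2nd and 4th elements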
Example data:
Input:
123 1 2 1 4
234 - 2 0 4
345 - 2 - 4
456 0 2 1 4
567 1 2 1 4
678 0 2 0 4
789 - 2 1 4
890 0 2 1 4
Output:
123 1 * 1 4
234 - 2 0 4
345 - 2 - *
456 0 2 1 *
567 1 2 1 4
678 0 2 0 4
789 - 2 1 4
890 0 * 1 4
Edit:
One thing I forgot to mention: I thought about removing the indices that I did not want to edit, performing the replacement on the remaining indices, and then joining everything back together, but I wasn't sure how to restore the original order. Here is what I tried:
with open("test2.txt", "w") as out, open("start.test.txt", "rt") as f:
    for line in f:
        li = line.strip()
        tabs = li.split("\t")
        geno = tabs[1:]
        geno_alt = [i for j, i in enumerate(geno) if j not in M]
        geno_alt = replace_random(geno_alt, 0, 1, vals)
        print(geno_alt)
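For completeness, the replaced values could be merged back into their original positions by consuming geno_alt with an iterator; a sketch based on the variables above:
it = iter(geno_alt)
merged = [geno[j] if j in M else next(it) for j in range(len(geno))]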
If all you're trying to do is replace values at specific indices on each line of a file (taking the example data you provided), making n replacements, with n randomly selected from some range and each replacement randomly drawn from some values, this would work:
from random import sample, choice

def make_replacements(fn_in, fn_out, indices, values, frequency):
    with open(fn_out, "w") as out, open(fn_in, "r") as f:
        for line in f:
            indices_sample = sample(indices, choice(frequency))
            line = '\t'.join(
                choice(values)
                if n in indices_sample
                else v
                for n, v in enumerate(line.strip().split())
            ) + '\n'
            out.write(line)

make_replacements("start.test.txt", "out.txt", [2, 4], ['*'], [0, 1])
An example output:
123 1 2 1 *
234 - 2 0 4
345 - 2 - 4
456 0 2 1 4
567 1 2 1 *
678 0 * 0 4
789 - 2 1 *
890 0 2 1 *
I've updated the code and example output according to your changes in the question and the comments, and I believe this is what you were after.
Based on the answer given by Grismar and the answer given by AKX in my last post here, I was able to come up with this answer to solve my problem.
import random

def select_replacements(lst, indices, min_n, max_n, values):
    n = random.randint(min_n, max_n)
    if n == 0:
        return lst
    indices_sample = random.sample(indices, n)
    return [
        random.choice(values)
        if n in indices_sample
        else v
        for n, v in enumerate(lst)
    ]
program:
with open("test2.txt", "w") as out, open("start.test2.txt", "rt") as f:
    for line in f:
        li = line.strip()
        tabs = li.split("\t")
        geno = tabs[1:]
        geno = select_replacements(geno, [1, 3], 0, 2, ['*'])  # (lst, indices, min_n, max_n, values)
        geno = '\t'.join(geno)
        merge = f"{tabs[0]}\t{geno}\n"
        out.write(merge)
input:
123 1 2 1 4
234 - 2 0 4
345 - 2 - 4
456 0 2 1 4
567 1 2 1 4
678 0 2 0 4
789 - 2 1 4
890 0 2 1 4
output:
123 1 2 1 4
234 - * 0 4
345 - * - *
456 0 * 1 4
567 1 * 1 *
678 0 * 0 *
789 - 2 1 4
890 0 * 1 *

Python to evaluate duplicate elements in csv files

I have 2 csv files:
csv 1:
CHANNEL
3
3
4
1
2
1
4
5
csv 2:
CHANNEL
1
2
2
3
4
4
4
5
I want to evaluate the state of each channel by finding duplicate channels: if a channel appears more than once, its state is 0; otherwise its state is 1.
output csv:
index  channel 1  channel 2  channel 3  channel 4  channel 5
1      0          1          0          0          0
2      1          0          1          0          1
So far I have counted the duplicate channels, but only for one file. Now I don't know how to read both csv files and create the output file.
import csv
import collections

with open("csvfile.csv") as f:
    csv_data = csv.reader(f, delimiter=",")
    next(csv_data)
    count = collections.Counter()
    for row in csv_data:
        channel = row[0]
        count[channel] += 1
for channel, nb in count.items():
    if nb > 1:
        ...
You can read each file into a list then check the channel counts of each list.
Try this code:
ss1 = '''
CHANNEL
3
3
4
1
2
1
4
5
'''.strip()
ss2 = '''
CHANNEL
1
2
2
3
4
4
4
5
'''.strip()
with open("csvfile1.csv",'w') as f: f.write(ss1) # write test file 1
with open("csvfile2.csv",'w') as f: f.write(ss2) # write test file 2
#############################
with open("csvfile1.csv") as f:
    lines1 = f.readlines()[1:]  # skip header
lines1 = [int(x) for x in lines1]  # convert to ints
with open("csvfile2.csv") as f:
    lines2 = f.readlines()[1:]  # skip header
lines2 = [int(x) for x in lines2]  # convert to ints
lines = [lines1, lines2]  # make list for iteration
state = [[0]*5, [0]*5]  # default zero for each state
for ci in [0, 1]:  # each file
    for ch in range(5):  # each channel
        state[ci][ch] = 0 if lines[ci].count(ch+1) > 1 else 1  # check channel count, set state
# write to terminal
print('Index', 'Channel 1', 'Channel 2', 'Channel 3', 'Channel 4', 'Channel 5', sep=' ')
print(' ', 1, ' ', ' '.join(str(c) for c in state[0]))
print(' ', 2, ' ', ' '.join(str(c) for c in state[1]))
# write to csv
with open('state.csv', 'w') as f:
    f.write('Index,Channel 1,Channel 2,Channel 3,Channel 4,Channel 5\n')
    f.write('1,' + ','.join(str(c) for c in state[0]) + '\n')
    f.write('2,' + ','.join(str(c) for c in state[1]) + '\n')
Output (terminal)
Index Channel 1 Channel 2 Channel 3 Channel 4 Channel 5
1 0 1 0 0 1
2 1 0 1 0 1
Output (state.csv)
Index,Channel 1,Channel 2,Channel 3,Channel 4,Channel 5
1,0,1,0,0,1
2,1,0,1,0,1
You can use collections.Counter:
from collections import Counter

# read the two files
with open('file_0.csv', 'r') as source:
    zero = source.readlines()
with open('file_1.csv', 'r') as source:
    one = source.readlines()
# convert to integers
# if the last item is not '\n', you only need [1:]
zero = [int(item) for item in zero[1:-1]]
one = [int(item) for item in one[1:-1]]
# combine the two lists
zero += one
# count the values with Counter
channels_counts = Counter(zero)
unique_channels = sorted(set(channels_counts.keys()))
res = [0 if channels_counts[item] > 1 else 1 for item in unique_channels]
for channel, state in zip(unique_channels, res):
    print('channel %i' % channel, state)
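The snippet above pools both files into a single Counter, so the per-file rows of the desired output are lost. To keep the two files separate, the same idea can be applied per file; a sketch assuming the filenames used above and channels 1-5 as in the example:
for index, fname in enumerate(['file_0.csv', 'file_1.csv'], start=1):
    with open(fname) as source:
        counts = Counter(int(line) for line in source.readlines()[1:] if line.strip())
    states = [0 if counts[ch] > 1 else 1 for ch in range(1, 6)]
    print(index, states)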

Python read from file that has multiple values

Ben
5 0 0 0 0 0 0 1 0 1 -3 5 0 0 0 5 5 0 0 0 0 5 0 0 0 0 0 0 0 0 1 3 0 1 0 -5 0 0 5 5 0 5 5 5 0 5 5 0 0 0 5 5 5 5 -5
Moose
5 5 0 0 0 0 3 0 0 1 0 5 3 0 5 0 3 3 5 0 0 0 0 0 5 0 0 0 0 0 3 5 0 0 0 0 0 5 -3 0 0 0 5 0 0 0 0 0 0 5 5 0 3 0 0
Reuven
I was wondering how to read multiple lines from this sort of file into a list or dictionary, so that the ratings (the numbers) stay with the name of the person they correspond to.
You could read the file in pairs of lines and populate a dictionary.
path = ...  # path to your file
out = {}
with open(path) as f:
    # iterate over the lines in the file
    for line in f:
        # the 1st, 3rd, ... line contains the name
        name = line
        # the 2nd, 4th, ... line contains the ratings
        ratings = next(f)  # by calling next here, we jump two lines per iteration
        # write values to the dictionary, using strip to get rid of whitespace
        out[name.strip()] = [int(rating.strip()) for rating in ratings.strip().split(' ')]
It could also be done with a while loop:
path = ...  # path to your file
out = {}
with open(path) as f:
    while True:
        # read name and ratings, which are on consecutive lines
        name = f.readline()
        ratings = f.readline()
        # stop condition: end of file reached
        if name == '':
            break
        # write values to the dictionary:
        # use the name as key and convert the ratings to integers,
        # using strip to get rid of whitespace
        out[name.strip()] = [int(rating.strip()) for rating in ratings.strip().split(' ')]
You can use zip to combine the lines in pairs to form the dictionary:
with open("file.txt", "r") as f:
    lines = f.read().split("\n")
d = {n: [*map(int, r.split())] for n, r in zip(lines[::2], lines[1::2])}
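Assuming file.txt holds the sample data above, the resulting dictionary can then be queried by name:
>>> d["Ben"][:5]
[5, 0, 0, 0, 0]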

How to sort a dictionary alphabetically?

def wordCount(inPath):
    inFile = open(inPath, 'r')
    lineList = inFile.readlines()
    counter = {}
    for line in range(len(lineList)):
        currentLine = lineList[line].rstrip("\n")
        for letter in range(len(currentLine)):
            if currentLine[letter] in counter:
                counter[currentLine[letter]] += 1
            else:
                counter[currentLine[letter]] = 1
    sorted(counter.keys(), key=lambda counter: counter[0])
    for letter in counter:
        print('{:3}{}'.format(letter, counter[letter]))

inPath = "file.txt"
wordCount(inPath)
This is the output:
a 1
k 1
u 1
l 2
12
h 5
T 1
r 4
c 2
d 1
s 5
i 6
o 3
f 2
H 1
A 1
e 10
n 5
x 1
t 5
This is the output I want:
12
A 1
H 1
T 1
a 1
c 2
d 1
e 10
f 2
h 5
i 6
k 1
l 2
n 5
o 3
r 4
s 5
t 5
u 1
x 1
How do I sort the "counter" dictionary alphabetically?
I've tried simply sorting by keys and by values, but it doesn't come back alphabetically with the capitals first.
Thank you for your help!
sorted(counter.keys(), key=lambda counter: counter[0])
alone does nothing: it returns a result which isn't used at all (unless you recall it with _, but that's rather an interactive-interpreter practice).
As opposed to what you can do with a list's .sort() method, you cannot sort dictionary keys in place. But what you can do is iterate over a sorted version of the keys:
for letter in sorted(counter.keys()):
The key=lambda counter: counter[0] part is useless here: your keys are single letters, so counter[0] is just the key itself.
Aside: your whole code could be simplified a great deal by using collections.Counter to count the letters.
import collections

c = collections.Counter("This is a Sentence")
for k, v in sorted(c.items()):
    print("{} {}".format(k, v))
result (including space char):
3
S 1
T 1
a 1
c 1
e 3
h 1
i 2
n 2
s 2
t 1

How to search and replace multiple values in a text file in Python?

My data:
1 255 59 0 1 255 0 1 1 0 4 0 5
0 1 255 1 253 0 90 1 1 0 2 0 233
I'm new to Python and I need to replace every value in a text file according to these conditions:
If a value in the line is 255, replace it with 1
If a value in the line is less than 255 (254, 253, 252, ...), replace it with 0
So, my target data is:
1 1 0 0 1 1 0 1 1 0 0 0 0
0 1 1 1 0 0 0 1 1 0 0 0 0
How can I solve this problem? I tried, but my attempt doesn't work with my data.
This is my code:
f1 = open('20130103.txt', 'r')
f2 = open('20130103_2.txt', 'w')
count = 255
for line in f1:
    line = line.replace('255', '1')
    count = count - 1
    line = line.replace('%d' % (count), '0')
    f2.write(line)
f1.close()
f2.close()
And the result is:
1 1 59 0 1 1 0 1 1 0 4 0 5
0 1 1 1 0 0 90 1 1 0 2 0 233
The eye-catcher here is the number 255: it is the only value that should map to 1, while everything below it maps to 0. The only thing that looks strange is that your expected output, contrary to your stated requirement, leaves the value 1 untouched; in that case you have to leave 1 out of the transformation.
If I have to trust your requirement
>>> with open("test.in") as fin, open("test.out", "w") as fout:
        for line in fin:
            line = (1 if e == 255 else 0 for e in map(int, line.split()))
            fout.write(' '.join(map(str, line)))
            fout.write('\n')
If I have to trust your data
>>> with open("test.in") as fin, open("test.out", "w") as fout:
        for line in fin:
            line = (1 if e in (1, 255) else 0 for e in map(int, line.split()))
            fout.write(' '.join(map(str, line)))
            fout.write('\n')
Another alternate view of this problem, using integer division:
>>> with open("test.in") as fin, open("test.out", "w") as fout:
        for line in fin:
            line = (e // 255 if e != 1 else 1 for e in map(int, line.split()))
            fout.write(' '.join(map(str, line)))
            fout.write('\n')
Here's an example of how you can pass re.sub a function -- instead of a replacement string -- to gain really fine control over how replacements are done:
import re

lines = ['1 255 59 0 1 255 0 1 1 0 4 0 5',
         '0 1 255 1 253 0 90 1 1 0 2 0 233']

def do_replace(match):
    number = int(match.group(0))
    if number == 0 or number == 1:
        return str(number)
    elif number == 255:
        return '1'
    elif number < 255:
        return '0'
    else:
        raise ValueError

for line in lines:
    print re.sub(r'\d+', do_replace, line)
prints:
1 1 0 0 1 1 0 1 1 0 0 0 0
0 1 1 1 0 0 0 1 1 0 0 0 0
You could do something like this:
with open('20130103.txt', 'r') as f1, open('20130103_2.txt', 'w') as f2:
    for line in f1:
        values = line.split()
        new_values = ' '.join('1' if value == '255' else '0' for value in values)
        f2.write(new_values + '\n')
line.split() splits the line into chunks: 'a b'.split() == ['a', 'b'].
'1' if value == '255' else '0' for value in values is a generator that yields '1' or '0', depending on the value of each item in your list of values.
' '.join() joins the values in the list (or generator, in this case) with a space.
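Put together on a shortened sample line, the pieces behave like this:
>>> ' '.join('1' if value == '255' else '0' for value in '1 255 59 0'.split())
'0 1 0 0'
Note that, per the stated conditions, a 1 in the input also becomes '0' here.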
