Replacing numbers by random numbers in a changing file in Python/C++ - python

I need to mix my data. I've got some numbers in a file and I need them mixed: for example, change every 4 to 20, but without turning 14 into 120 while doing so. I've thought about it a lot and I'm not sure it's possible, because there are a lot of numbers and I need to do the replacement hundreds of times with random values.
Has anyone done something like that? Does anyone know whether it's possible?

Here is a Python example that might help you:
import re
import random
def writeInFile(fileName, tab):
    # Append the numbers in tab to fileName, separated by spaces, followed by a newline
    with open(fileName, 'a') as n:
        n.write(' '.join(tab) + '\n')
def main():
    # Read the file containing the numbers
    with open('file.txt', 'r') as f:
        text = f.read()
    # Get every number with a regexp and put them in a list (as strings)
    tab = re.findall(r'\d+', text)
    # Generate a random integer >= 0 and <= 100
    randomDigit = random.randint(0, 100)
    # Manually set the number to replace
    numberToReplace = "4"
    # Browse the list, replacing every "4" (but not 14, 444, ...) with the random integer
    for i in range(len(tab)):
        if tab[i] == numberToReplace:
            tab[i] = str(randomDigit)
    # Write the results
    writeInFile("output.txt", tab)
if __name__ == "__main__":
    main()
Example:
file.txt contains: 4 14 4 444 20
output.txt will contain: 60 14 60 444 20, assuming the randomly generated integer was 60.
Important: this example assumes your file contains only positive integers. To handle negative numbers you will have to modify the regexp, and you will need to adapt it a bit if the file contains characters other than digits and whitespace.
It might not be exactly what you need, but I think it's a good start.
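The example above reuses one random value for every 4 and writes the numbers back as a single line. If each occurrence should get its own random value and the rest of the file (line breaks, other text) should stay as it is, `re.sub` with a replacement function is one way to do it. This is a minimal sketch under stated assumptions: the function name `replace_whole_number` and its `low`, `high` and `rng` parameters are hypothetical, and it assumes the file holds plain integers (a 4 in front of a decimal point, as in 4.5, would still be replaced).

```python
import random
import re


def replace_whole_number(text, target, low=0, high=100, rng=random):
    """Replace each standalone occurrence of the integer string `target`.

    Every match gets its own random integer in [low, high]. The negative
    lookbehind and lookahead in the pattern require that no digit touches the
    match, so with target "4" the numbers 14, 41 and 444 are left alone.
    Everything else in `text` (spaces, line breaks, other characters) is kept.
    """
    pattern = re.compile(r"(?<!\d)" + re.escape(target) + r"(?!\d)")
    # The lambda is called once per match, so each 4 gets a fresh value.
    return pattern.sub(lambda match: str(rng.randint(low, high)), text)
```

For example, `replace_whole_number("4 14 4 444 20", "4")` keeps 14, 444 and 20 and replaces each 4 with an independent random value. Reading file.txt with `f.read()`, calling the function, and writing the result to output.txt keeps the original layout. Passing `random.Random(seed)` as `rng` makes a run reproducible.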

Related

How to read a text file and convert into a list for use with statistics package in Python

The code I am running so far is as follows
import os
import math
import statistics
def main():
    infile = open('USPopulation.txt', 'r')
    values = infile.read()
    infile.close()
    index = 0
    while index < len(values):
        values(index) = int(values(index))
        index += 1
    print(values)
main()
The text file contains 41 rows of numbers each entered on a single line like so:
151868
153982
156393
158956
161884
165069
168088
etc.
My task is to create a program that shows the average change in population during the time period, the year with the greatest increase in population during the time period, and the year with the smallest increase in population (from the previous year) during the time period.
The code prints each of the text file's entries on its own line, but when I try to convert them to int for use with the statistics package, I get the following error:
values(index) = int(values(index))
SyntaxError: can't assign to function call
I took the values(index) = int(values(index)) line from my reading as well as from resources on Stack Overflow.
You can change values = infile.read() to values = list(infile.read()), and it will give you a list instead of a string. Note, however, that this is a list of single characters (including the '\n' characters), not of whole numbers.
One thing that tends to happen when reading a file like this is that every line ends with an invisible '\n' marking the line break. So an easy way to split the text into lines and then turn them into integers is to use values = values.split('\n') instead of values = list(infile.read()), as long as values already holds the file contents. If the file ends with a newline, split('\n') leaves an empty string at the end, which int() cannot convert; values.split() with no argument splits on any whitespace and avoids that.
The while loop you have can easily be replaced with a for loop that runs up to len(values).
The actual error comes from values(index): the parentheses make Python treat values as a function being called, and you cannot assign to a function call. Indexing uses square brackets, so in a for loop you can write values[i] = int(values[i]) (or values[index] in your while loop) to turn each entry into an integer, and values then becomes a list of integers.
Here is how I would set it up:
import statistics
def main():
    with open('USPopulation.txt', 'r') as infile:
        values = infile.read()
    values = values.split()  # Splits on line breaks (any whitespace), ignoring a trailing newline
    for i in range(0, len(values)):  # Loops over values and turns each part into an integer
        values[i] = int(values[i])
    changes = []
    # Use a for loop to get the change between each pair of consecutive numbers.
    for i in range(0, len(values) - 1):  # -1 because values[i + 1] would be out of range at the last index
        changes.append(values[i + 1] - values[i])  # Difference between the next number and the current one
    print('The average change:', statistics.mean(changes))
    print('The max change:', max(changes), 'The minimal change:', min(changes))
    # changes has one item fewer than values: changes[i] is the change from values[i] to values[i + 1].
    # changes.index(max(changes)) gets the position of the largest change; values at that position is the population before it.
    print('A change of:', max(changes), 'happened after', values[changes.index(max(changes))])
    print('A change of:', min(changes), 'happened after', values[changes.index(min(changes))])  # Same as above, with the minimum
    # If you wanted the population after the change, you would use values[changes.index(min(changes)) + 1]
main()
If you need any clarification on anything I did in the code, just ask.
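I personally would use numpy to read the text file.
In your case I would do it like this:
import numpy as np
def main():
    population = np.loadtxt('USPopulation.txt')  # One number per line -> 1-D array of floats
    changes = np.diff(population)  # changes[i] = population[i + 1] - population[i]
    maxidx = np.argmax(changes)  # argmax/argmin return positions, not values
    minidx = np.argmin(changes)
    print(f'average change = {changes.mean()}')
    print(f'largest increase = {changes[maxidx]}, from row {maxidx} to row {maxidx + 1}')
    print(f'smallest increase = {changes[minidx]}, from row {minidx} to row {minidx + 1}')
main()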
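The task asks for years, while the answers report populations or row positions. A minimal sketch, assuming the first row of USPopulation.txt is the population of a known start year and each following row is the next year; the file itself contains no years, so `start_year` is an assumed input, and `read_populations` and `population_stats` are hypothetical names.

```python
import statistics


def read_populations(path):
    """Read one integer per line; split() also skips a trailing newline."""
    with open(path) as f:
        return [int(token) for token in f.read().split()]


def population_stats(values, start_year):
    """Return (average change, year of largest increase, year of smallest increase).

    Assumes values[i] is the population in year start_year + i, so the change
    values[i + 1] - values[i] is credited to year start_year + i + 1.
    """
    if len(values) < 2:
        raise ValueError("need at least two years of data")
    changes = [later - earlier for earlier, later in zip(values, values[1:])]
    positions = range(len(changes))
    largest = max(positions, key=lambda i: changes[i])   # first position on ties
    smallest = min(positions, key=lambda i: changes[i])
    return (statistics.mean(changes),
            start_year + largest + 1,
            start_year + smallest + 1)
```

Usage would look like `population_stats(read_populations('USPopulation.txt'), start_year)`, with `start_year` set to whatever year the first row actually represents. Pairing each later value with its predecessor via `zip(values, values[1:])` avoids the index arithmetic that caused the original out-of-range concerns.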

How to do this recursion method to print out every 3 letters from a string on a separate line?

I'm writing a function that takes a string and prints it in pieces, one per line, according to a window size.
For example:
I want to output every 3 letters of my string on a separate line.
Input : "Advantage"
Output:
Adv
ant
age
Input2: "23141515"
Output:
231
141
515
My code:
def print_method(input):
    mywindow = 3
    start_index = input[0]
    if(start_index == input[len(input)-1]):
        exit()
    print(input[1:mywindow])
    printmethod(input[mywindow:])
However, I get a runtime error. Can someone help?
I think this is what you're trying to get. Here's what I changed:
Renamed input to input_str. input is a built-in function in Python, so it's not a good name for a variable (it would shadow the built-in).
Added the missing _ in the recursive call to print_method
Print from 0:mywindow instead of 1:mywindow (which would skip the first character). When you start at 0, you can also just say :mywindow to get the same result.
Changed the exit statement (was that sys.exit?) to a return, which is probably what you want, and changed the if condition so that the function returns once an empty string is passed in. The last string printed might be shorter than 3 characters; if you don't want that, you could instead use if len(input_str) < 3: return
def print_method(input_str):
    mywindow = 3
    if not input_str:  # or you could do if len(input_str) == 0
        return
    print(input_str[:mywindow])
    print_method(input_str[mywindow:])
Edit: sorry, I missed the title. If this is not a learning exercise for recursion, you shouldn't use recursion here, because it is less efficient and slices the string more often.
def chunked_print(string, window=3):
    # Step through the string in strides of window and print each slice
    for i in range(0, len(string), window):
        print(string[i:i + window])
Because the loop steps by window, no empty line is printed when the window size divides the string length, and the last line is simply shorter when it doesn't. You can modify that according to your needs.
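If the point is to practise recursion, one way to avoid re-slicing the rest of the string on every call is to recurse on a start index instead of on `input_str[mywindow:]`. A minimal sketch; `collect_chunks` and `print_chunks` are hypothetical names, and the pieces are collected in a list so they can be reused or checked before printing.

```python
def collect_chunks(s, window=3, start=0, out=None):
    """Recursively collect consecutive pieces of s, each `window` characters long.

    Only the current piece is sliced on each call; the rest of the string is
    reached by moving `start`, not by copying s[window:].
    """
    if out is None:          # avoid a shared mutable default argument
        out = []
    if start >= len(s):      # base case: nothing left to cut
        return out
    out.append(s[start:start + window])
    return collect_chunks(s, window, start + window, out)


def print_chunks(s, window=3):
    """Print each piece on its own line."""
    for piece in collect_chunks(s, window):
        print(piece)
```

`collect_chunks("Advantage")` gives `['Adv', 'ant', 'age']`, and `collect_chunks("23141515")` gives `['231', '415', '15']`. Each call still adds a stack frame, so very long strings can hit Python's recursion limit (see `sys.getrecursionlimit()`); the loop version has no such limit.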

Conversion to Logn Python 3.7

I have this code that works great and does what I want, but it runs in linear time, which is way too slow for the size of my data files, so I want to convert it to logarithmic time. I tried this code and many others posted here but still had no luck getting it to work. I will post both sets of code and give examples of what I expect.
import pandas
import fileinput
'''This code runs fine and does what I expect removing duplicates from big
file that are in small file, however it is a linear function.'''
with open('small.txt') as fin:
    exclude = set(line.rstrip() for line in fin)
for line in fileinput.input('big.txt', inplace=True):
    if line.rstrip() not in exclude:
        print(line, end='')
    else:
        print('')
'''This code is my attempt at conversion to a log function.'''
def log_search(small, big):
    first = 0
    last = len(big.txt) - 1
    while first <= last:
        mid = (first + last) / 2
        if str(mid) == small.txt:
            return True
        elif small.txt < str(mid):
            last = mid - 1
        else:
            first = mid + 1
    with open('small.txt') as fin:
        exclude = set(line.rstrip() for line in fin)
    for line in fileinput.input('big.txt', inplace=True):
        if line.rstrip() not in exclude:
            print(line, end='')
        else:
            print('')
    return log_search(small, big)
The big file has millions of lines of integer data.
The small file has hundreds of lines of integer data.
Compare the data and remove the duplicated data from the big file, but leave the line blank.
Running the first block of code works, but it takes too long to search through the big file. Maybe I am approaching the problem the wrong way. My attempt at converting it to logarithmic time runs without error but does nothing.
I don't think there is a better or faster way to do this than what you are currently doing in your first approach. (Update: There is, see below.) Storing the lines from small.txt in a set and iterating the lines in big.txt, checking whether each is in that set, has complexity O(b), with b being the number of lines in big.txt.
What you seem to be trying is to reduce this to O(s*logb), with s being the number of lines in small.txt, by using binary search to check, for each line in small.txt, whether it is in big.txt, and then removing/overwriting it.
This would work well if all the lines were in a list with random access to any element, but you have just the file, which does not allow random access to any line. It does, however, allow random access to any character with file.seek, which (at least in some cases?) seems to be O(1). But then you still have to find the line break preceding that position before you can actually read that line. Also, you cannot just replace lines with empty lines; you have to overwrite the number with the same number of characters, e.g. spaces.
So, yes, theoretically it can be done in O(s*logb), provided the numbers in big.txt are sorted (binary search depends on that), if you do the following:
implement binary search, searching not on the lines, but on the characters of the big file
for each position, backtrack to the last line break, then read the line to get the number
try again in the lower/upper half as usual with binary search
if the number is found, replace with as many spaces as there are digits in the number
repeat with the next number from the small file
On my system, reading and writing a file with 10 million lines of numbers only took 3 seconds each, or about 8 seconds with fileinput.input and print. Thus, IMHO, this is not really worth the effort, but of course this may depend on how often you have to do this operation.
Okay, so I got curious myself (and who needs a lunch break anyway?), so I tried to implement this... and it works surprisingly well. This will find the given number in the file and replace it with a matching number of - characters (not just a blank line; that's impossible without rewriting the entire file). Note that I did not thoroughly test the binary-search algorithm for edge cases, off-by-one errors, etc.
import os
import random
def getlineat(f, pos):
    pos = f.seek(pos)
    while pos > 0 and f.read(1) != "\n":
        pos = f.seek(pos - 1)
    return pos + 1 if pos > 0 else 0
def bsearch(f, num):
    lower = 0
    upper = os.stat(f.name).st_size - 1
    while lower <= upper:
        mid = (lower + upper) // 2
        pos = getlineat(f, mid)
        line = f.readline()
        if not line: break  # end of file
        val = int(line)
        if val == num:
            return (pos, len(line.strip()))
        elif num < val:
            upper = mid - 1
        elif num > val:
            lower = mid + 1
    return (-1, -1)
def overwrite(filename, to_remove):
    with open(filename, "r+") as f:
        positions = [bsearch(f, n) for n in to_remove]
        for n, (pos, length) in sorted(zip(to_remove, positions)):
            print(n, pos)
            if pos != -1:
                f.seek(pos)
                f.write("-" * length)
to_remove = [random.randint(-500, 1500) for _ in range(10)]
overwrite("test.txt", to_remove)
This will first collect all the positions to be overwritten and then do the actual overwriting in a second step; otherwise the binary search would run into problems when it hits one of the previously "removed" lines. I tested this with a file holding all the numbers from 0 to 1,000 in sorted order and a list of random numbers (both in- and out-of-bounds) to be removed, and it worked just fine.
Update: Also tested it with a file with random numbers from 0 to 100,000,000 in sorted order (944 MB) and overwriting 100 random numbers, and it finished immediately, so this should indeed be O(s*logb), at least on my system (the complexity of file.seek may depend on file system, file type, etc.).
The bsearch function could also be generalized to accept another parameter value_function instead of hardcoding val = int(line). Then it could be used for binary-searching in arbitrary files, e.g. huge dictionaries, gene databases, csv files, etc., as long as the lines are sorted by that same value function.
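Following that suggestion, here is a minimal sketch of a generalized variant. It is not the code above. It opens the file in binary mode, so offsets are plain byte positions and '\r\n' line endings are not translated. It takes a `value_function` parameter, and on every comparison it skips the whole line it just read instead of moving one character. It assumes every line is non-empty and that the lines are sorted by `value_function`. Lines already overwritten with - would break `int()`, so, as above, collect the positions before overwriting. `line_start` is a hypothetical helper name.

```python
import io
import os


def line_start(f, pos):
    """Seek f to the start of the line containing byte offset pos; return that offset."""
    while pos > 0:
        f.seek(pos - 1)
        if f.read(1) == b"\n":   # the previous byte ends the previous line
            break
        pos -= 1
    f.seek(pos)
    return pos


def bsearch(f, target, value_function=int):
    """Binary-search a file opened in binary mode whose lines are sorted by value_function.

    value_function receives each line as bytes without its line ending.
    Returns (offset, length) of a matching line, or (-1, -1) if none matches.
    """
    lower = 0
    upper = f.seek(0, os.SEEK_END) - 1       # seek returns the new absolute offset
    while lower <= upper:
        start = line_start(f, (lower + upper) // 2)
        raw = f.readline()
        line = raw.rstrip(b"\r\n")
        value = value_function(line)
        if value == target:
            return (start, len(line))
        if target < value:
            upper = start - 1                # target can only be in earlier lines
        else:
            lower = start + len(raw)         # target can only be in later lines
    return (-1, -1)


if __name__ == "__main__":
    # In-memory demo: the multiples of 3 from 0 to 999, one per line.
    data = "".join(f"{n}\n" for n in range(0, 1000, 3)).encode()
    f = io.BytesIO(data)
    for wanted in (0, 300, 301, 999):
        print(wanted, bsearch(f, wanted))
```

For a real file, open it with `open(path, "rb")` (or `"r+b"` to overwrite afterwards) and pass the handle in. For a CSV sorted by its first column, `value_function=lambda line: line.split(b",")[0]` compares that column as bytes. Because both bounds always land on line boundaries, each iteration discards at least one whole line, so the loop always terminates.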

Python - Zip code to Barcode

The code is supposed to take a 5 digit zip code input and convert it to bar codes as the output. The bar code for each digit is:
{1:'...!!',2:'..!.!',3:'..!!.',4:'.!..!',5:'.!.!.',6:'.!!..',7:'!...!',8:'!..!.',9:'!.!..',0:'!!...'}
For example, the zip code 95014 is supposed to produce:
!!.!.. .!.!. !!... ...!! .!..! ...!!!
There is an extra ! at the start and at the end, which marks where the bar code starts and stops. Notice that at the end of the bar code there is an extra ...!!, which is a 1. This is the check digit, and you get the check digit by:
Adding up all the digits in the zipcode to make the sum Z
Choosing the check digit C so that Z + C is a multiple of 10
For example, the zipcode 95014 has a sum of Z = 9 + 5 + 0 + 1 + 4 = 19, so the check digit C is 1 to make the total sum Z + C equal to 20, which is a multiple of 10.
def printDigit(digit):
    digit_dict = {1:'...!!',2:'..!.!',3:'..!!.',4:'.!..!',5:'.!.!.',6:'.!!..',7:'!...!',8:'!..!.',9:'!.!..',0:'!!...'}
    return digit_dict[digit]
def printBarCode(zip_code):
    sum_digits=0
    num=zip_code
    while num!=0:
        sum_digits+=(num%10)
        num/=10
    rem = 20-(sum_digits%20)
    answer=[]
    for i in str(zip_code):
        answer.append(printDigit(int(i)))
    final='!'+' '.join(answer)+'!'
    return final
print printBarCode(95014)
The code I currently have produces an output of
!!.!.. .!.!. !!... ...!! .!..!!
for the zip code 95014, which is missing the check digit. Is there something missing in my code that causes it not to output the check digit? Also, what should I add to my code to have it ask the user for the zip code?
Your code computes rem based on the sum of the digits, but you never use it to add the check-digit bars to the output (answer and final). You need to add code to do that in order to get the right answer. I suspect you're also not computing rem correctly, since you're using %20 rather than %10.
I'd replace the last few lines of your function with:
    rem = (10 - sum_digits) % 10  # correct computation for the check digit
    answer=[]
    for i in str(zip_code):
        answer.append(printDigit(int(i)))
    answer.append(printDigit(rem))  # add the check digit to the answer!
    final='!'+' '.join(answer)+'!'
    return final
Interesting problem. I noticed that you solved the problem as a C-style programmer; I'm guessing your background is in C/C++. I'd like to offer a more Pythonic way:
def printBarCode(zip_code):
    digit_dict = {1:'...!!',2:'..!.!',3:'..!!.',4:'.!..!',5:'.!.!.',
                  6:'.!!..',7:'!...!',8:'!..!.',9:'!.!..',0:'!!...'}
    zip_code_list = [int(num) for num in str(zip_code)]
    bar_code = ' '.join([digit_dict[num] for num in zip_code_list])
    # The outer % 10 maps a digit sum that is already a multiple of 10 to 0, not 10
    check_code = digit_dict[(10 - sum(zip_code_list) % 10) % 10]
    return '!{} {}!'.format(bar_code, check_code)
print(printBarCode(95014))
I used list comprehensions to work with each digit rather than explicit loops. I could have used the map() function to make it more readable, but list comprehensions are more Pythonic. I also used str.format(), the Python 3.x style of string formatting. Here is the output:
!!.!.. .!.!. !!... ...!! .!..! ...!!!
>>>
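The answers fix the check digit but not the second part of the question, reading the zip code from the user. A minimal sketch for Python 3, using the bar patterns from the question; `zip_to_barcode`, `check_digit` and `DIGIT_BARS` are hypothetical names. It takes the zip code as a string, so a leading zero survives (passing it as an integer would lose it, since `int('02134')` is 2134), and it rejects input that is not exactly five digits.

```python
DIGIT_BARS = {1: '...!!', 2: '..!.!', 3: '..!!.', 4: '.!..!', 5: '.!.!.',
              6: '.!!..', 7: '!...!', 8: '!..!.', 9: '!.!..', 0: '!!...'}


def check_digit(digits):
    """Return the digit C such that sum(digits) + C is a multiple of 10."""
    return -sum(digits) % 10      # Python's % is never negative for a positive modulus


def zip_to_barcode(zip_code):
    """Convert a 5-digit zip code string to bars framed by '!', ending with the check digit."""
    if len(zip_code) != 5 or not all(ch in "0123456789" for ch in zip_code):
        raise ValueError("expected exactly 5 digits, got %r" % zip_code)
    digits = [int(ch) for ch in zip_code]
    groups = [DIGIT_BARS[d] for d in digits + [check_digit(digits)]]
    return '!' + ' '.join(groups) + '!'
```

To ask the user, something like `print(zip_to_barcode(input("Enter a 5-digit zip code: ").strip()))` should work; on Python 2, `raw_input` plays the role of `input`. For 95014 this produces the expected `!!.!.. .!.!. !!... ...!! .!..! ...!!!`.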

List Index Out of Range on Python. Nothing works

I have already reviewed multiple threads with similar answers to my question. Nothing seems to be working no matter what I try.
I am trying to create 100 random numbers, and put those random numbers into a list. However I keep getting
File "E:\WorkingWithFiles\funstuff.py", line 17, in randNumbs
numbList[index]+=1
IndexError: list index out of range
My code is:
def randNumbs(numbCount):
    numbList=[0]*100
    i=1
    while i < 100:
        index = random.randint(1,100)
        numbList[index]+=1
        i+=1
    print (numbList)
    return (numbList)
After reviewing multiple threads and tinkering around I cannot seem to get an answer.
Before I continue, here is the scope of the project:
I have a .txt file that's a dictionary with some number of words in it. First, I write a function to calculate how many words are in the .txt file. Second, I generate 100 random numbers between 1 and the number of words in the .txt file. Lastly, I need to create a .txt file that contains
"Number Word"
120 Bologna
and so on. I am having trouble generating the random numbers. If anybody has any idea why my list index is out of range and how to fix it, all help would be appreciated! Thank you!
Edit: the .txt file is 113k words long
You made a list of size 100 here:
numbList=[0]*100
Your problem is that you create indexes from 1 to 100 when you should be accessing indexes 0-99. Given a list of size n, the valid list indexes are 0 to n-1
Change your code to
index = random.randint(0,99)
Looks like an off-by-one error. randint will return numbers 1 to 100, while your list has indexes 0 to 99.
Also, you can rewrite your code like this:
import random
def randNumbs(numbCount):
    return [random.randint(1, 100) for i in range(numbCount)]
I would approach the problem a little differently:
from random import sample
SAMPLE_SIZE = 100
# load words
with open("dictionary.txt") as inf:
    words = inf.read().splitlines()  # assumes one word per line
# pick word indices
# Note: this returns only unique indices,
# i.e. a given word will not be returned twice
num_words = len(words)
which_words = sample(range(num_words), SAMPLE_SIZE)
# Note: if you did not need the word indices, you could just call
# which_words = sample(words, SAMPLE_SIZE)
# and get back a list of 100 words directly
# if you want words in sorted order
which_words.sort()
# display selected words
print("Number Word")
for w in which_words:
    print("{:6d} {}".format(w, words[w]))
which gives something like
Number Word
198 abjuring
2072 agitates
2564 alevin
6345 atrophies
8108 barrage
9155 begloom
10237 biffy
11078 bleedings
11970 booed
14131 burials
14531 cabal
# etc...
Here, I’ve tried to fix your code. Explanations in comments.
import random
def rand_numbs(numb_count):
    # this will generate a list of length 100
    # it will have indexes from 0 to 99
    numbList = [0] * 100
    # don't use a while loop...
    # when a for loop will do
    for _ in range(numb_count):
        # randint(i, j) will generate a number
        # between i and j, both inclusive!
        # which means that both i and j can be generated
        index = random.randint(0, 99)
        # remember that Python lists are 0-indexed:
        # the first element is numbList[0]
        # and the last element is numbList[99]
        numbList[index] += 1
    print(numbList)
    return numbList
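The project description asks for numbers between 1 and the word count, written to a .txt file as "Number Word" lines, which the fixes above do not cover. A minimal sketch, assuming one word per line and distinct numbers; `pick_numbered_words` and `write_report` are hypothetical names. It keeps the 1-based numbering from the question and subtracts 1 only when indexing, which is exactly where the original IndexError came from.

```python
import random


def pick_numbered_words(words, count=100, rng=random):
    """Pick `count` distinct words and return (number, word) pairs sorted by number.

    Numbers run from 1 to len(words), as in the question, so number n refers
    to words[n - 1]. random.sample raises ValueError if count > len(words).
    """
    numbers = rng.sample(range(1, len(words) + 1), count)
    return [(n, words[n - 1]) for n in sorted(numbers)]


def write_report(path, pairs):
    """Write a 'Number Word' header line, then one 'number word' line per pair."""
    with open(path, "w") as out:
        out.write("Number Word\n")
        for n, word in pairs:
            out.write(f"{n} {word}\n")
```

Usage might be: read the words with `inf.read().splitlines()` as in the answer above, then call `write_report("output.txt", pick_numbered_words(words))`, where output.txt is a placeholder name. With 113k words, `random.sample` over a `range` does not build a 113k-element list of numbers first.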
