Making two compressed files form into one paragraph - python

I have code that compresses a paragraph, and that part works. When I go on to decompress it, though, I get a ValueError:
List.append(dic[int(bob)])
ValueError: invalid literal for int() with base 10: '1,2,3,4,5,6,7,8,9,'
This is the code...
def menu():
    print("..........................................................")
    para = input("Please enter a paragraph.")
    print()
    s = para.split() # splits sentence
    another = [0] # will gradually hold all of the numbers repeated or not
    index = [] # empty index
    word_dictionary = [] # names word_dictionary variable
    for count, i in enumerate(s): # has a count and an index for enumerating the split sentence
        if s.count(i) < 2: # if it is repeated
            another.append(max(another) + 1) # adds the other count to another
        else: # if is has not been repeated
            another.append(s.index(i) +1) # adds the index (i) to another
    new = " " # this has been added because other wise the numbers would be 01234567891011121341513161718192320214
    another.remove(0) # takes away the 0 so that it doesn't have a 0 at the start
    for word in s: # for every word in the list
        if word not in word_dictionary: # If it's not in word_dictionary
            word_dictionary.append(word) # adds it to the dicitonary
        else: # otherwise
            s.remove(word) # it will remove the word
    fo = open("indx.txt","w+") # opens file
    for index in another: # for each i in another
        index= str(index) # it will turn it into a string
        fo.write(index) # adds the index to the file
        fo.write(new) # adds a space
    fo.close() # closes file
    fo=open("words.txt", "w+") # names a file sentence
    for word in word_dictionary:
        fo.write(str(word )) # adds sentence to the file
        fo.write(new)
    fo.close() # closes file
menu()
index=open("indx.txt","r+").read()
dic=open("words.txt","r+").read()
index= index.split()
dic = dic.split()
Num=0
List=[]
while Num != len(index):
    bob=index[Num]
    List.append(dic[int(bob)])
    Num+=1
print (List)
The problem is on line 50, with 'List.append(dic[int(bob)])'.
Is there a way to stop the error and make the code output the sentence exactly as it was entered?
After changing the code, a new error message appears:
List.append(dic[int(bob)])
IndexError: list index out of range
When I run the code, I input "This is a sentence. This is another sentence, with commas."

The issue is that index = index.split() splits on whitespace by default and, as the exception shows, your numbers are separated by commas.
Without seeing indx.txt I can't be certain this fixes all of your indexes, but for the error in the question you can specify what to split on, namely a comma:
index = index.split(',')
If the file ends with a trailing comma, as in the traceback, split(',') also yields an empty final string, which you need to skip before calling int().
The second error, IndexError: list index out of range on List.append(dic[int(bob)]), is an off-by-one problem: your indexes start at 1, not 0, so every lookup lands one position too far when you rebuild the list.
This can be fixed with:
List.append(dic[int(bob) - 1])
Additionally you're doing a lot more work than you need to. This:
fo = open("indx.txt","w+") # opens file
for index in another: # for each i in another
    index= str(index) # it will turn it into a string
    fo.write(index) # adds the index to the file
    fo.write(new) # adds a space
fo.close() # closes file
is equivalent to:
with open("indx.txt","w") as fo:
    for index in another:
        fo.write(str(index) + new)
and this:
Num=0
List=[]
while Num != len(index):
    bob=index[Num]
    List.append(dic[int(bob)])
    Num+=1
is equivalent to
List = []
for item in index:
    List.append(dic[int(item)])
Also, take a moment to review PEP-8 and try to follow those standards. Your code is very difficult to read because it doesn't follow them. I fixed the formatting on your comments so StackOverflow's parser could parse your code, but most of them only add clutter.
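The fixes above can be combined into a small round trip. This is only a minimal sketch, not the asker's exact program: the helper names compress and decompress are made up for illustration, the file handling is left out so the logic is easy to check, and each word is numbered by its 1-based position in the list of unique words, which is an assumption that differs from the max(another) + 1 scheme in the question.

```python
def compress(text):
    """Return (unique_words, positions) for a whitespace-separated text.

    Hypothetical helper: positions are 1-based indexes into unique_words.
    """
    unique_words = []
    positions = []
    for word in text.split():
        if word not in unique_words:
            unique_words.append(word)
        # 1-based position of this word in the list of unique words
        positions.append(unique_words.index(word) + 1)
    return unique_words, positions


def decompress(unique_words, positions):
    """Rebuild the text; subtract 1 because positions start at 1."""
    return " ".join(unique_words[p - 1] for p in positions)


sentence = "This is a sentence. This is another sentence, with commas."
words, numbers = compress(sentence)
print(numbers)
print(decompress(words, numbers))
```

Writing the two lists to files and reading them back only requires that the same separator is used for writing and for split().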

Related

Read a file line by line, subtract one from each number, replace hyphens with colons, and print the output on one single line

I have a text file (s1.txt) containing the following information. There are some lines that contain only one number and others that contain two numbers separated by a hyphen.
1
3-5
10
11-13
111
113-150
1111
1123-1356
My objective is to write a program that reads the file line by line, subtracts one from each number, replaces hyphens with colons, and prints the output on one single line. The following is my expected outcome.
{0 2:4 9 10:12 110 112:149 1110 1122:1355}
Using the following code, I am receiving an output that is quite different from what I expected. Please, let me know how I can correct it.
s1_file = input("Enter the name of the S1 file: ")
s1_text = open(s1_file, "r")
# Read contents of the S1 file to string
s1_data = s1_text.read()
for atoms in s1_data.split('\n'):
    if atoms.isnumeric():
        qm_atom = int(atoms) - 1
        #print(qm_atom)
    else:
        qm_atom = atoms.split('-')
    print(qm_atom)
If your goal is to output directly to the screen as a single line you should add end=' ' to the print function.
Or you can store the values in a variable and print everything at the end.
Apart from that, your else branch never subtracts 1 from the values and never joins them back together with the join function. join is called on a separator string and builds a new string from the values of a list (all values must be strings), separated by that string.
For example ', '.join(['car', 'bike', 'truck']) would get 'car, bike, truck'.
s1_file = input("Enter the name of the S1 file: ")
s1_text = open(s1_file, "r")
# Read contents of the S1 file to string
s1_data = s1_text.read()
s1_text.close()
output = []
for atoms in s1_data.split('\n'):
    if not atoms:
        # skip blank lines, such as the one left by a trailing newline
        continue
    if atoms.isnumeric():
        qm_atom = int(atoms) - 1
        output.append(str(qm_atom))
    else:
        qm_atom = atoms.split('-')
        # loop over the list to subtract 1 from each number
        qm_atom_subtracted = [str(int(q) - 1) for q in qm_atom]
        # join the numbers back together with ':'
        output.append(':'.join(qm_atom_subtracted))
# join everything with spaces so it prints on one line
print(' '.join(output))
An alternative way of doing it could be:
s1_file = input("Enter the name of the S1 file: ")
with open (s1_file) as f:
    output_string = ""
    for line in f:
        elements = line.strip().split('-')
        elements = [int(element) - 1 for element in elements]
        elements = [str(element) for element in elements]
        elements = ":".join(elements)
        output_string += elements + " "
print(output_string)
Why complicate a simple task by checking whether an element is numeric and then handling the two cases differently?
Your code also gave bad output because the else clause is incomplete: it only splits the line into a sub-list and never joins that sub-list with ':'.
Anyway, here is my complete code:
f = open(s1_file, 'r')
t = f.readlines()  # reading all lines
f.close()
for i in range(0, len(t)):
    t[i] = t[i].rstrip('\n')  # removing \n (safe even if the last line has none)
    t[i] = t[i].replace('-', ':')  # replacing - with :
    try:
        t[i] = int(t[i]) - 1  # convert str into int & process
    except ValueError:
        # range case such as "3:5": subtract 1 from both ends
        t[i] = f"{int(t[i].split(':')[0])-1}:{int(t[i].split(':')[1])-1}"
print(t)
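None of the snippets above prints the braces shown in the expected outcome. A minimal sketch that produces that exact format could look like the following; the function name shift_ranges is made up for illustration, and it takes a list of lines instead of a file name so it can be tried without s1.txt.

```python
def shift_ranges(lines):
    """Subtract 1 from every number, turn '-' into ':', wrap in braces."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # "3-5" -> ["3", "5"] -> "2:4"; a single number passes through the same path
        parts.append(":".join(str(int(n) - 1) for n in line.split("-")))
    return "{" + " ".join(parts) + "}"


sample = ["1", "3-5", "10", "11-13", "111", "113-150", "1111", "1123-1356"]
print(shift_ranges(sample))
```

With a real file, the same function can be called as shift_ranges(open(s1_file)), since iterating over a file object yields its lines.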

String searching in text file and dict values combinations

I'm a total beginner at Python. I'm studying it at university and our professor gave us some work to do before the exam. I've been stuck on this program for almost 2 weeks now; the rule is that we can't use any library.
Basically I have a dictionary with several possible translations from an ancient language to English, a dictionary from English to Italian (only 1 key - 1 value pairs), a text file in the ancient language and another text file in Italian. So far I scan the ancient-language file and search for strings that match the dictionary (using the .strip(".,:;?!") method), and I save the matching strings that contain at least 2 words in a list of strings.
Now comes the hard part: I need to try all possible combinations of translations (values from the ancient language to English), then take these translations from English to Italian with the other dictionary and check whether that string exists in the Italian file. If it does, I save the result and the paragraph where it was found (a result spread over different paragraphs doesn't count, it must be in the same one; I've written a small piece of code to count the paragraphs).
I'm having issues here for the following reasons:
In the strings that I've found, how am I supposed to replace the words and keep the punctuation? The returned result must contain all the punctuation, otherwise the output will be wrong.
If the string is present but spread over 2 different lines of the text, how should I proceed to make it work? For example, I have a string of 5 words: the first 2 words match at the end of one line, but the remaining 3 words are the first 3 words of the next line.
As mentioned before, the dict from the ancient language to English is huge and can have up to 7 values (translations) for each key (ancient-language word). Is there an efficient way to try all the combinations while searching for the string in a text file? This is probably the hardest part.
Probably the best way to process this is to scan word by word every time and, if the sequence is broken, reset it somehow and keep scanning the text file...
Any idea?
Here is the commented code of what I've managed to do so far:
k = 2 #Random value, the whole program gonna be a function and the "k" value will be different each time
file = [ line.strip().split(';') for line in open('lexicon-GR-EN.csv', encoding="utf8").readlines() ] #Opening CSV file with possible translations from ancient Greek to English
gr_en = { words[0]: tuple(words[1:]) for words in file } #Creating a dictionary with the several translations (values)
file = open('lexicon-EN-IT.csv', encoding="utf8") # Opening 2nd CSV file
en_it = {} # Initializing dictionary
for row in file: # Scanning each row of the CSV file (From English to Italian)
    L = row.rstrip("\n").split(';') # Clearing newline char and splitting the words
    x = L[0]
    t1 = L[1]
    en_it[x] = t1 # Since in this CSV file all the words are 1 - 1 is not necesary any check for the length (len(L) is always 2 basically)
file = open('odyssey.txt', encoding="utf8") # Opening text file
result = () # Empty tuple
spacechecker = 0 # This is the variable that i need to determine if i'm on a even or odd line, if odd the line will be scanned normaly otherwise word order and words will be reversed
wordcount = 0 # Counter of how many words have been found
paragraph = 0 # Paragraph counter, starts at 0
paragraphspace = 0 # Another paragraph variable, i need this to prevent double-space to count as paragraph
string = "" # Empty string to store corresponding sequences
foundwords = [] # Empty list to store words that have been found
completed_sequences = [] # Empty list, here will be stored all completed sequences of words
completed_paragraphs = [] # Paragraph counter, this shows in which paragraph has been found each sequence of completed_sequences
for index, line in enumerate(file.readlines()): # Starting line by line scan of the txt file
    words = line.split() # Splitting words
    if not line.isspace() and index == 0: # Since i don't know nothing about the "secret tests" that will be conducted with this program i've set this check for the start of the first paragraph to prevent errors: if first line is not space
        paragraph += 1 # Add +1 to paragraph counter
        spacechecker += 1 # Add +1 to spacechecker
    elif not line.isspace() and paragraphspace == 1: # Checking if the previous line was space and the current is not
        paragraphspace = 0 # Resetting paragraphspace (precedent line was space) value
        spacechecker += 1 # Increasing the spacechecker +1
        paragraph +=1 # This means we're on a new paragraph so +1 to paragraph
    elif line.isspace() and paragraphspace == 1: # Checking if the current line is space and the precedent line was space too.
        continue # Do nothing and cycle again
    elif line.isspace(): # Checking if the current line is space
        paragraphspace += 1 # Increase paragraphspace (precedent line was space variable) +1
        continue
    else:
        spacechecker += 1 # Any other case increase spacechecker +1
    if spacechecker % 2 == 1: # Check if spacechecker is odd
        for i in range(len(words)): # If yes scan the words in normal order
            if words[i].strip(",.!?:;-") in gr_en != "[unavailable]": # If words[i] without any special char is in dictionary
                currword = words[i] # If yes, we will call it "currword"
                foundwords.append(currword) # Add currword to the foundwords list
                wordcount += 1 # Increase wordcount +1
            elif (words[i].strip(",.!?:;-") in gr_en == "[unavailable]" and wordcount >= k) or (currword not in gr_en and wordcount >= k): #Elif check if it's not in dictionary but wordcount has gone over k
                string = " ".join(foundwords) # We will put the foundwords list in a string
                completed_sequences.append(string) # And add this string to the list of strings of completed_sequences
                completed_paragraphs.append(paragraph) # Then add the paragraph of that string to the list of completed_paragraphs
                result = list(zip(completed_sequences, completed_paragraphs)) # This the output format required, a tuple with the string and the paragraph of that string
                wordcount = 0
                foundwords.clear() # Clearing the foundwords list
            else: # If none of the above happened (word is not in dictionary and wordcounter still isn't >= k)
                wordcount = 0 # Reset wordcount to 0
                foundwords.clear() # Clear foundwords list
                continue # Do nothing and cycle again
    else: # The case of spacechecker being not odd,
        words = words[::-1] # Reverse the word order
        for i in range(len(words)): # Scanning the row of words
            currword = words[i][::-1] # Currword in this case will be reversed since the words in even lines are written in reverse.
            if currword.strip(",.!?:;-") in gr_en != "[unavailable]": # If currword without any special char is in dictionary
                foundwords.append(currword) # Append it to the foundwords list
                wordcount += 1 # Increase wordcount +1
            elif (currword.strip(",.!?:;-") in gr_en == "[unavailable]" and wordcount >= k) or (currword.strip(",.!?:;-") not in gr_en and wordcount >= k): #Elif check if it's not in dictionary but wordcount has gone over k
                string = " ".join(foundwords) # Add the words that has been found to the string
                completed_sequences.append(string) # Append the string to completed_sequences list
                completed_paragraphs.append(paragraph) # Append the paragraph of the strings to the completed_paragraphs list
                result = list(zip(completed_sequences, completed_paragraphs)) # Adding to the result the tuple combination of strings and corresponding paragraphs
                wordcount = 0 # Reset wordcount
                foundwords.clear() # Clear foundwords list
            else: # In case none of above happened
                wordcount = 0 # Reset wordcount to 0
                foundwords.clear() # Clear foundwords list
                continue # Do nothing and cycle again
I'd probably take the following approach to solving this:
Try to collapse down the 2 word dictionaries into one (ancient_italian below), removing English from the equation. For example, if ancient->English has {"canus": ["dog","puppy", "wolf"]} and English->Italian has {"dog":"cane"} then you can create a new dictionary {"canus": "cane"}. (Of course if the English->Italian dict has all 3 English words, you need to either pick one, or display something like cane|cucciolo|lupo in the output).
Come up with a regular expression that can distinguish between words, and the separators (punctuation), and output them in order into a list (word_list below). I.e something like ['ecce', '!', ' ', 'magnus', ' ', 'canus', ' ', 'esurit', '.']
Step through this list, generating a new list. Something like:
translation = []
for item in word_list:
    if item.isalpha():
        # It's a word - translate it and add to the list
        translation.append(ancient_italian[item])
    else:
        # It's a separator - add to the list as-is
        translation.append(item)
Finally join the list back together: ''.join(translation)
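Because the question rules out libraries, the regular expression in the second step can be swapped for a plain character scan. The following is a minimal sketch under that assumption: tokenize and translate are hypothetical helper names, the one-entry ancient_italian dictionary only reuses the canus/cane pair from the example above, and words without an entry are left untranslated instead of raising a KeyError.

```python
def tokenize(text):
    """Split text into words and single-character separators, in order."""
    tokens = []
    current = ""
    for ch in text:
        if ch.isalpha():
            current += ch            # still inside a word
        else:
            if current:
                tokens.append(current)
                current = ""
            tokens.append(ch)        # punctuation or whitespace, kept as-is
    if current:
        tokens.append(current)
    return tokens


def translate(text, dictionary):
    """Translate known words and keep every separator untouched."""
    return "".join(dictionary.get(tok, tok) for tok in tokenize(text))


ancient_italian = {"canus": "cane"}  # hypothetical collapsed dictionary
print(tokenize("ecce! magnus canus esurit."))
print(translate("ecce! magnus canus esurit.", ancient_italian))
```

Since separators are stored as tokens of their own, the punctuation and spacing of the original line survive the translation unchanged.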
I'm unable to reply to your comment on the answer by match, but this may help.
It's not the most elegant approach, but it should work:
GR_IT = {}
for greek,eng in GR_EN.items():
    for word in eng:
        try:
            GR_IT[greek] = EN_IT[word]
        except KeyError:
            # no Italian translation for this English word
            pass
If there's no translation for a word, it will be ignored, though.
To get a list of words and punctuation split try this:
def repl_punc(s):
    punct = ['.',',',':',';','?','!']
    for p in punct:
        s=s.replace(p,' '+p+' ')
    return s
repl_punc(s).split()
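The question also asks how to try every combination of translations without importing anything. A minimal sketch of that step, building the combinations one word at a time, is shown below; the function name all_combinations and the sample translations are made up for illustration.

```python
def all_combinations(options):
    """Return every way of picking one item from each list in options."""
    combos = [[]]
    for choices in options:
        # extend every partial combination with every choice for this word
        combos = [combo + [choice] for combo in combos for choice in choices]
    return combos


# Hypothetical English translations for a two-word sequence.
candidates = [["dog", "puppy", "wolf"], ["big", "great"]]
for combo in all_combinations(candidates):
    print(" ".join(combo))
```

The number of combinations is the product of the list lengths, so with up to 7 translations per word it grows quickly. Checking each word against the Italian text as soon as it is chosen, and dropping partial combinations that already fail to match, keeps the work manageable.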

Number not Printing in python when returning amount

I have some code which reads from a text file and is meant to print the max and min altitudes, but the min altitude is not printing and there are no errors.
altitude = open("Altitude.txt","r")
read = altitude.readlines()
count = 0
for line in read:
    count += 1
count = count - 1
print("Number of Different Altitudes: ",count)
def maxAlt(read):
    maxA = (max(read))
    return maxA
def minAlt(read):
    minA = (min(read))
    return minA
print()
print("Max Altitude:",maxAlt(read))
print("Min Altitude:",minAlt(read))
altitude.close()
I will include the Altitude text file if it is needed and once again the minimum altitude is not printing
I'm assuming your file contains numbers and line breaks (\n).
You are reading it here:
read = altitude.readlines()
At this point read is a list of strings.
Now, when you do:
minA = (min(read))
it looks for "the smallest string in read".
Strings are compared character by character, so the smallest one is most likely a blank line ("\n"), which probably exists at the end of your file.
So your minAlt is actually getting printed; it just happens to be a blank line.
You can fix it by parsing the lines you read into numbers and skipping the blank ones:
read = [float(a) for a in altitude.readlines() if a.strip()]
Try the solution below:
altitudeFile = open("Altitude.txt","r")
Altitudes = [float(line) for line in altitudeFile if line.strip()] # get file data into a list, skipping blank lines
Max_Altitude = max(Altitudes)
Min_Altitude = min(Altitudes)
altitudeFile.close()
Change your code to this
with open('numbers.txt') as nums:
    lines = nums.read().splitlines()
results = list(map(int, lines))
print(results)
print(max(results))
The first two lines read the file and store it as a list of strings. The third line converts the strings to integers, and the last one returns the maximum of the list; use min for the minimum.
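To see why comparing the raw lines gives surprising results, here is a minimal sketch with made-up file contents; parse_altitudes is a hypothetical helper name and the three sample lines stand in for whatever Altitude.txt really contains.

```python
def parse_altitudes(lines):
    """Convert text lines to floats, skipping blank lines."""
    return [float(line) for line in lines if line.strip()]


# Hypothetical contents, in the form readlines() would return them.
raw = ["100\n", "20\n", "\n"]

# Compared as strings: "\n" sorts before any digit, and "20" sorts after "100".
smallest_text = min(raw)
largest_text = max(raw)
print(repr(smallest_text), repr(largest_text))

# Compared as numbers, the results are what you would expect.
altitudes = parse_altitudes(raw)
print("Max Altitude:", max(altitudes))
print("Min Altitude:", min(altitudes))
```

The same conversion also fixes the maximum, which is only correct for strings when every number has the same number of digits.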

Strip list items with multiple arguments error

I'm trying to remove a lot of stuff from a text file to rewrite it.
The text file has several hundred items each consisting of 6 lines of.
I got my code working to the point where it puts all lines in an array, identifies the only 2 important lines in every item and deletes the whitespace, but any further stripping gives me the following error:
'list' object has no attribute 'strip'
Here is my code:
x = 0
y = 0
names = []
colors = []
array = []
with open("AA_Ivory.txt", "r") as ins:
    for line in ins:
        array.append(line)
def Function (currentElement, lineInSkinElement):
    name = ""
    color = ""
    string = array[currentElement]
    if lineInSkinElement == 1:
        string = [string.strip()]
        # string = [string.strip()]
        # name = [str.strip("\n")]
        # name = [str.strip(";")]
        # name = [str.strip(" ")]
        # name = [str.strip("=")]
        names.append(name)
        return name
    # if lineInSkinElement == 2:
    # color = [str.strip("\t")]
    # color = [str.strip("\n")]
    # color = [str.strip(";")]
    # color = [str.strip(" ")]
    # color = [str.strip("=")]
    # colors.append(color)
    # return color
    print "I got called %s times" % currentElement
    print lineInSkinElement
    print currentElement
for val in array:
    Function(x, y)
    x = x +1
    y = x % 6
#print names
#print colors
In the if statement for the names, deleting the first # will give me the error.
I tried converting the list item to string, but then I get extra [] around the string.
The if statement for color can be ignored, I know it's faulty and trying to fix this is what got me to my current issue.
Regarding "but then I get extra [] around the string":
You can loop through the list to get at the string inside it. For example:
for item in string:
    item = item.strip("\n")
    item = item.strip(";")
    item = item.strip(" ")
    item = item.strip("=")
    names.append(item)
This gets you to the string within the list, so you can append the stripped string.
If this isn't what you were looking for, post some of the data you're working with to clarify.
Alright, I found the solution. It was a rather dumb mistake of mine. The error occurred because of the [] around the strip call, which turned the result into a list. Removing them fixed it. Feeling relieved now, a bit stupid, but relieved.
You can also do that in one line using the following code.
item = item.strip("\n").strip("=").strip(";").strip()
The last strip will strip the white spaces.
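Chained calls depend on the order of the characters at the ends of the line, because each strip only removes its own character. str.strip also accepts a whole set of characters, which avoids that problem. A minimal sketch, with the helper name clean and the sample lines made up for illustration:

```python
# Every character in this string is removed from both ends of the line.
STRIP_CHARS = "\n\t ;="


def clean(line):
    """Strip newlines, tabs, spaces, semicolons and equals signs from both ends."""
    return line.strip(STRIP_CHARS)


print(repr(clean("\tname = Ivory;\n")))

# The chained version leaves the "=" behind when it is followed by ";".
chained = "x=;".strip("\n").strip("=").strip(";").strip()
print(repr(chained), repr(clean("x=;")))
```

Characters in the middle of the line, such as the " = " between a key and its value, are never touched by strip; splitting on "=" is needed for those.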

How to .replace from a line to another in a .txt, if the line to be replaced is above the line where i'll get the value for replacement?

First of all, sorry for my long title.. i couldn't explain it in a simple way.. feel free to edit it!
I need some help with the logic to solve my problem...
I have a .txt file with the following characteristics:
Each line corresponds to a specific operation...
Each field of the line is delimited by a "|" separator
My .txt has 120k+ lines, but only about 60k are for my invoices
I have to copy the values from the lines that start with C195, and put them in specific fields of the C100 line that is the closest one above the C195 line.
Example:
The invoice's part of my .txt is like:
|C100|1|1238761|128,82|1002,21|0,00|0,00|0,00|0,00|
|C170|1|414859|Mini Leitoa|Kg|21,80|KG|
|C190|363,53|0,00|0,00|0,00|0,00|0,00|0,00|
|C195|C195|1|Base de Cálculo ST: 193,56 - Valor da ST: 10,10|
|C195|C195|2|Valor do IPI: 7,10|
What it was supposed to be:
|C100|1|1238761|128,82|1002,21|0,00|193,56|10,10|7,10|
|C170|1|414859|Mini Leitoa|Kg|21,80|KG|
|C190|363,53|0,00|0,00|0,00|0,00|0,00|0,00|
|C195|1|Base de Cálculo ST: 193,56 - Valor da ST: 10,10|
|C195|2|Valor do IPI: 7,10|
What I did until now:
Created a program to read the lines of my txt and store them in input_lines = []
Get the index() of the lines in the input_lines that starts with C100, and store them in pos_c100 = []
Since my C195 field can hold values of either Tax1 (ICMS) or Tax2 (IPI), I used re.search(param,string) to find out whether the line contains "ICMS" or "IPI".
If the line contains "ICMS", it will contain 2 values: the first one is the icms_basis and the other is the icms_value
|C195|1|Base de Cálculo ST: 193,56 - Valor da ST: 10,10|
If the line contains "IPI", it will contain 1 value: only the ipi_value
|C195|1|Valor do IPI: 10,10|
I've extracted the values out of the string using re.findall() and stored them in a "tax specific dictionary" together with the line position
Since each value has a specific position where it must be written, and I already know these positions, I created 2 dictionaries to hold the index of the C195 line and its values, one dict for IPI and another for ICMS.
My data layout:
pos_c100 = [line_c100]
dic_icms = {line_c195 : [icms_basis, icms_value]}
dic_ipi = {line_c195 : ipi_value}
What I got after running my script now, for example:
input_lines = ["|C100|1..", "|C170|1..", "|C195|1|IPI..", "C195|1|ICMS.."] #the output of `file.readlines()`
pos_c100 = [2, 4, 8, 10] #the positions of the lines that start with C100
dic_icms = {6 : ["200,15", "15,80"]} #{key: [icms_basis, icms_value]}
dic_ipi = {7 : "7,15"} # {key: ipi_value}
#key is the position of the lines that startswith c195 in input_lines
Using the above dic_icms for example:
How can I get the "200,15" and the "15,80" from dic_icms,
which belong to the line at position 6 of input_lines, and
write them into specific fields of the line at position 4 of
input_lines, using a loop over my dictionaries?
I need a way to check which C100 line is the closest one above and then
replace its values with the ones from the dict...
maybe with a
for key in dic_ipi:
    for item in pos_c100:
        dists = []
        dist = key - item
        dists.append(dist)
and
linha = (linha[0:posInicialBcICMS] + linha[posInicialBcICMS:posFinalBcICMS].replace("0,00", ICMS_BASIS) + linha[posInicialVlrICMS:posFinalVlrICMS].replace("0,00", ICMS_VALUE) + linha[posInicialVlrIPI:posFinalVlrIPI].replace("0,00", IPI_VALUE) + linha[posFinalVlrIPI + 1 : len(linha)])
Let's start with reading the file:
file = open("blabla.txt", "r")
data = file.read()
file.close()
Our data contains the whole text. By using split:
data = data.split("\n") #splitting by \n
you will get a big list looking like this: ['line1', 'line2', 'line3'...]. We need to create a new table:
new_table = []
And now it's time for the heavy operations. Since your line looks like |C170|1|414859|Mini Leitoa|Kg|21,80|KG| you can use split() once more:
for line in data:
    new_table.append(line.split("|")) # split returns a new list split by "|". Append adds this list to new_table in the last position.
new_table looks like this: [[line1_element1, line1_element2, line1_element3], [line2_element1, line2_element2, line2_element3],...]. Now you should be able to use the len() function to get the length of a specific table. For example:
for i in range(len(new_table)): # iterating through lines, 'i' is an integer from 0 to len(new_table) - 1
    for j in range(len(new_table[i])): # iterating through each element in line 'i' (i is an index of new_table)
        print(new_table[i][j]) # this will print element j of line i in new_table
If you can access every element in every line by its index, you can easily compare everything by using an if statement. For example:
if new_table[i][j] == "C165":
    do_something()
Hope it helps.
--EDIT--
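You were asking about the closest line. Neighbouring lines are the elements at index i - 1 and i + 1 of new_table, so from a C195 line you can step upward one index at a time until you reach a C100 line.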
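Because the file is read from top to bottom, the closest C100 above a C195 line is simply the last C100 seen so far, so no distance calculation is needed. The following is a minimal sketch of that idea, with several assumptions: the name fill_c100 is made up, the field positions 7, 8 and 9 are read off the example C100 line in the question, a C195 line with two values is treated as ICMS and one mentioning "IPI" with a single value as IPI, and the description is taken to be the last field before the closing "|".

```python
import re

# Matches values such as "193,56" (an optional thousands "." is tolerated).
VALUE = re.compile(r"\d[\d.]*,\d{2}")

# Assumed positions in the "|"-split C100 line, taken from the example.
ICMS_BASIS, ICMS_VALUE, IPI_VALUE = 7, 8, 9


def fill_c100(lines):
    """Copy the tax values of each C195 line into the nearest C100 above it."""
    rows = [line.rstrip("\n").split("|") for line in lines]
    last_c100 = None
    for row in rows:
        if len(row) < 2:
            continue                      # blank or malformed line
        if row[1] == "C100":
            last_c100 = row               # remember the most recent C100
        elif row[1] == "C195" and last_c100 is not None:
            description = row[-2]         # last field before the closing "|"
            values = VALUE.findall(description)
            if "IPI" in description and len(values) == 1:
                last_c100[IPI_VALUE] = values[0]
            elif len(values) == 2:
                last_c100[ICMS_BASIS], last_c100[ICMS_VALUE] = values
    return ["|".join(row) for row in rows]


invoice = [
    "|C100|1|1238761|128,82|1002,21|0,00|0,00|0,00|0,00|",
    "|C170|1|414859|Mini Leitoa|Kg|21,80|KG|",
    "|C190|363,53|0,00|0,00|0,00|0,00|0,00|0,00|",
    "|C195|1|Base de Cálculo ST: 193,56 - Valor da ST: 10,10|",
    "|C195|2|Valor do IPI: 7,10|",
]
for line in fill_c100(invoice):
    print(line)
```

A single pass like this touches each of the 120k+ lines once. A C100 line with fewer fields than the example would raise an IndexError, so real data may need a length check first.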
