Parsing a simple text file in python

I have the following text file, exported from a CSV file. The file is too long to display properly here, so here is how its lines start. The file has 5 lines:
The 1st one starts with ETIQUETAS
The 2nd one starts with RECURSOS
The 3rd one starts with DATOS CLIENTE Y PIEZA
The 4th one starts with Numero Referencia
The 5th and last one starts with BRIDA Al
ETIQUETAS:;;;;;;;;;START;;;;;;;;;;;;;;;;;;;;;END;;
RECURSOS:;;;;;;;;;0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;1;1;1;0;1;0;;Nota: 0 equivale a infinito, para decir que no existen recursos usar un numero negativo
DATOS CLIENTE Y PIEZA;;;;PLAZOS Y PROCESOS;;;;;;;;;;hoja de ruta;MU;;;;;;;;;;;;;;;;;
Numero Referencia;Descripcion Referencia;Nombre Cliente;Codigo Cliente;PLAZO DE ENTREGA;piezas;PROCESO;MATERIAL;stock;PROVEEDOR;tiempo ida pulidor;pzas dia;TPO;tiempo vuelta pulidor;TIEMPO RECEPCION;CONTROL CALIDAD DE ENTRADA;TIEMPO CONTROL CALIDAD DE ENTRADA;ALMACEN A (ANTES DE ENTRAR MAQUINA);GRANALLA;TPO;LIMPIADO;TPO;BRILLADO;TPO;;CARGA;MAQUINA;SOLTAR;control;EMPAQUETADO;ALMACENB;TIEMPO;
BRIDA Al;BRIDA Al;AEROGRAFICAS AHE, S.A.;394;;;niquelado;aluminio;;;;matriz;;;5min;NO;;3dias;;;;;;;;1;1;1;;1;4D;;
I want to do two things:
1. Count the fields between START and END on the first line, both inclusive, and save the count as TOTAL_NUMBERS. For example, START;;END must count 3: START itself, the empty field between the two semicolons, and END itself. In the example file, START;;;;;;;;;;;;;;;;;;;;;END must count 22.
What I've tried so far:
f = open("lt.csv", 'r')
array = []
for line in f:
    if 'START' in line:
        for i in line.split(";"):
            array.append(i)
i = 0
while i < len(array):
    if i == 'START':
        # START COUNTING, I DONT KNOW HOW TO CONTINUE
        pass
    i = i + 1
2. Scan the file until the word PROVEEDOR appears, and save that word and the fields that follow it, TOTAL_NUMBERS fields in total (22 in the example), in an array.
This means it has to save:
final_array = ['PROVEEDOR', 'tiempo ida pulidor', 'pzas dia', 'TPO', 'tiempo vuelta pulidor', 'TIEMPO RECEPCION', 'CONTROL CALIDAD DE ENTRADA', 'TIEMPO CONTROL CALIDAD DE ENTRADA', 'ALMACEN A (ANTES DE ENTRAR MAQUINA)', 'GRANALLA', 'TPO', 'LIMPIADO', 'TPO', 'BRILLADO', 'TPO', '', 'CARGA', 'MAQUINA', 'SOLTAR', 'control', 'EMPAQUETADO', 'ALMACENB']
Thanks in advance.

I am assuming the file is split into two lines; the first line with START and END and then a long line which needs to be parsed. This should work:
with open('somefile.txt') as f:
    first_row = next(f).strip().split(';')
    TOTAL_NUMBER = len(first_row[first_row.index('START'):first_row.index('END')+1])
    bits = ''.join(line.rstrip() for line in f).split(';')
    final_array = bits[bits.index('PROVEEDOR'):bits.index('PROVEEDOR')+TOTAL_NUMBER]
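The same idea can be tried without a file on disk. The following is only a sketch: the sample text is a shortened, made-up stand-in for the real file, and the names SAMPLE and parse_fields are hypothetical, not part of the answer above.

```python
import io

# Shortened, made-up stand-in for the real file (assumption for illustration).
SAMPLE = (
    "ETIQUETAS:;;START;;;END;;\n"
    "Numero Referencia;PROVEEDOR;tiempo ida pulidor;pzas dia;TPO;TIEMPO;\n"
)

def parse_fields(f):
    """Return (count of fields from START to END inclusive, fields from PROVEEDOR on)."""
    first_row = next(f).strip().split(';')
    start = first_row.index('START')
    end = first_row.index('END')
    total = end - start + 1  # both START and END are counted

    # Collect the fields of every remaining line into one flat list.
    fields = []
    for line in f:
        fields.extend(line.rstrip('\n').split(';'))

    pos = fields.index('PROVEEDOR')
    return total, fields[pos:pos + total]

total, final_array = parse_fields(io.StringIO(SAMPLE))
print(total, final_array)
```

With a real file, open it with open(...) and pass the file object in place of the io.StringIO object. Both index calls raise ValueError if START, END or PROVEEDOR is missing, which may be worth catching.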

Related

Use of random in a Hangman game

This is a Hangman game. The player gets one hint, which reveals one letter of the word. I need the hint to pick among the letters that are still unknown. At the moment it picks a position at random, so when the word is almost complete, the revealed letter is most likely one the player already has, and the hint is of no real help.
How can I modify the code so that it reveals a letter that hasn't been revealed yet?
import random
# HANGMAN
lista_palabras=['programacion', 'python', 'algoritmo', 'computacion', 'antioquia', 'turing', 'ingenio', 'AYUDA']
vidas=['🧡','🧡','🧡','🧡','🧡','🧡','🧡']
num_word=random.randint(0,6)
palabra=lista_palabras[num_word]
print(' _ '*len(palabra))
print('Inicias con siete vidas', "".join(vidas),'\n', 'Pista: escribe AYUDA para revelar una letra (sólo tienes disponible 1)')
#print(palabra)
palabra_actual=['_ ']*len(palabra)
posicion=7
contador_pistas=0
letra=''  # initialised so the first win check below does not raise NameError
while True:
    fullword="".join(palabra_actual)
    # win condition
    if letra==palabra or fullword==palabra:
        print(palabra)
        print('¡GANASTE!')
        break
    letra=input('Inserta una letra: ')
    # add a correctly guessed letter
    if letra in palabra:
        orden=[i for i in range(len(palabra)) if palabra[i] == letra]
        for letras in orden:
            palabra_actual[letras]=letra
        print(''.join(palabra_actual))
    # hint (AYUDA) condition
    elif letra=='AYUDA' and contador_pistas==0:
        pista=random.randint(0,len(palabra)-1)
        palabra_actual[pista]=palabra[pista]
        print(''.join(palabra_actual))
        contador_pistas+=1
    # hint limit reached
    elif letra=='AYUDA' and contador_pistas>=1:
        print('Ya no te quedan pistas restantes')
    # lose-a-life condition
    elif letra not in lista_palabras:
        posicion-=1
        vidas[posicion]='💀'
        print('¡Perdiste una vida!',''.join(vidas))
        if posicion==0:
            print('GAME OVER')
            break
Thank you <3

Create a list of index values for the word, and remove indexes from the list as the user guesses the correct characters. Then, when they ask for help, you can call random.choice on the remaining indexes, so you are guaranteed to get one that hasn't been revealed yet.
...
...
palabra_actual=['_ ']*len(palabra)
posicion=7
contador_pistas=0
indexes = list(range(len(palabra))) # list of index values
while True:
    fullword="".join(palabra_actual)
    if letra==palabra or fullword==palabra:
        print(palabra)
        print('¡GANASTE!')
        break
    letra=input('Inserta una letra: ')
    if letra in palabra:
        orden=[i for i in range(len(palabra)) if palabra[i] == letra]
        for letras in orden:
            if letras in indexes: # a repeated guess must not remove the same index twice
                indexes.remove(letras) # remove index values for selected characters
            palabra_actual[letras]=letra
        print(''.join(palabra_actual))
    elif letra=='AYUDA' and contador_pistas==0:
        pista=random.choice(indexes) # choose from remaining index values using random.choice
        palabra_actual[pista]=palabra[pista]
        print(''.join(palabra_actual))
        contador_pistas+=1
...
...
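A variation on the same idea, offered only as a sketch: instead of keeping a separate list of indexes in sync, the hidden positions can be recomputed from palabra_actual each time a hint is requested. The helper name reveal_hint and the constant PLACEHOLDER are hypothetical; the placeholder string '_ ' is the one used in the question's code.

```python
import random

PLACEHOLDER = '_ '  # the placeholder string used in the question's code

def reveal_hint(palabra, palabra_actual, rng=random):
    """Reveal one still-hidden letter in place.

    Returns the revealed index, or None when nothing is hidden any more.
    """
    # Positions that still show the placeholder are the only candidates.
    hidden = [i for i, shown in enumerate(palabra_actual) if shown == PLACEHOLDER]
    if not hidden:
        return None
    pista = rng.choice(hidden)
    palabra_actual[pista] = palabra[pista]
    return pista

palabra = 'python'
palabra_actual = ['p', '_ ', 't', 'h', '_ ', 'n']  # 'y' and 'o' not guessed yet
pista = reveal_hint(palabra, palabra_actual)
print(pista, ''.join(palabra_actual))
```

Because the candidates are derived from the current board, there is no second list to keep consistent with the guesses.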

Python Index out of range Error in lib loop issue

I hope everything is fine.
I'm dealing with this issue: list index out of range.
Message shown in the console:
c:\Users.....\Documents\t.py:41: FutureWarning: As the xlwt package is no longer maintained, the xlwt engine will be removed in a future version of pandas. This is the only engine in pandas that supports writing in the xls format. Install openpyxl and write to an xlsx file instead. You can set the option io.excel.xls.writer to 'xlwt' to silence this warning. While this option is deprecated and will also raise a warning, it can be globally set and the warning suppressed.
read_file.to_excel(planilhaxls, index = None, header=True)
The goal: I need to create a loop that stores a specific row of a worksheet such as sheet_1.csv, the corresponding row of sheet_2.csv, and that of a third sheet, as 3 columns in sheet_output.csv.
Issue: It raises an index out of range error and I don't know how to fix it.
Question: Is there any other way I can do it?
The code is below:
(Please ignore the Portuguese in the code.)
import xlrd as ex
import pyautogui as pag
import os
import pyperclip as pc
import pandas as pd
import pygetwindow as pgw
import openpyxl
# Inputs
numerolam = int(input('Escolha o número da lamina: '))
amostra = input('Escoha a amostra: (X, Y, W ou Z): ')
milimetro_inicial = int(input("Escolha o milimetro inicial: "))
milimetro_final = int(input("Escolha o milimetro final: "))
tipo = input("Escolha o tipo - B para Branco & E para Espelho: ")
linha = int(input("Escolha a linha da planilha: "))
# Code conversion
if tipo == 'B':
    tipo2 = 'BRA'
else:
    tipo2 = 'ESP'
# xlsx file
#planilhaxlsx = f'A{numerolam}{amostra}{milimetro_inicial}{tipo2}.xlsx'
#planilhaxls = f'A{numerolam}{amostra}{milimetro_inicial}{tipo2}.xls'
#planilhacsv = f'A{numerolam}{amostra}{milimetro_inicial}{tipo2}.csv'
#planilhacsv_ = f'A{numerolam}{amostra}{milimetro_final}{tipo2}.csv'
#arquivoorigin = f'A{numerolam}{amostra}{milimetro_inicial}{tipo2}.opj'
# Folder
pasta = f'L{numerolam}{amostra}'
while milimetro_inicial < milimetro_final:
    planilhaxlsx = f'A{numerolam}{amostra}{milimetro_inicial}{tipo2}.xlsx'
    planilhaxls = f'A{numerolam}{amostra}{milimetro_inicial}{tipo2}.xls'
    planilhacsv = f'A{numerolam}{amostra}{milimetro_inicial}{tipo2}.csv'
    planilhacsv_ = f'A{numerolam}{amostra}{milimetro_final}{tipo2}.csv'
    arquivoorigin = f'A{numerolam}{amostra}{milimetro_inicial}{tipo2}.opj'
    # Convert the .csv file to .xls and .xlsx
    read_file = pd.read_csv(planilhacsv)
    read_file.to_excel(planilhaxls, index = None, header=True)
    #read_file.to_excel(planilhaxlsx, index = None, header=True)
    # Open the .xls file with xlrd (Excel file).
    book = ex.open_workbook(planilhaxls)
    sh = book.sheet_by_index(0)
    # Variable declarations.
    coluna_inicial = 16 # Q - zero-based
    valor = []
    index = 0
    # Loop that stores the row's values for columns Q-Z in the list valor, indices 0..(len-1)
    while coluna_inicial < 25:
        # ERROR ON THE LINE BELOW
        temp = sh.cell_value(linha, coluna_inicial)
        valor.append(temp) # Append the value
        print(index)
        print(valor[index])
        index +=1
        coluna_inicial += 1
    # Open the output workbook
    wb = openpyxl.Workbook()
    ws = wb.active
    # Start the write loop
    colunas = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']
    idx_colunas = 0
    contador_loop = colunas[idx_colunas]
    linha_loop = 1
    index_out = 0
    s = f'{contador_loop}{linha_loop}'
    print(s)
    while linha_loop < len(valor):
        valor[index_out] = "{}".format(valor[index_out])
        ws[s].value = valor[index_out]
        print(valor[index_out] + ' feito')
        linha_loop += 1
        idx_colunas += 1
        index_out += 1
    # Save the output workbook
    wb.save("teste.xlsx")
    milimetro_inicial += 1

Your problem is on this line:
temp = sh.cell_value(linha, coluna_inicial)
Two index parameters are used, linha and coluna_inicial. 'linha' appears to be a fixed value, so (assuming it is itself below sh.nrows) the problem would seem to be with 'coluna_inicial', which is increased by 1 on each iteration:
coluna_inicial += 1
The loop continues while 'coluna_inicial' is less than 25. I suggest you check the number of columns in the sheet 'sh' using
sh.ncols
either for debugging or as the preferred upper bound of your loop. If it is less than 25, you will get the index error once 'coluna_inicial' reaches 'sh.ncols', because valid column indexes run from 0 to sh.ncols - 1.
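Using sh.ncols as the upper bound could look like the sketch below. It relies only on the nrows and ncols attributes and the cell_value(rowx, colx) method of an xlrd sheet. The helper read_row_span and the class FakeSheet are hypothetical; FakeSheet is a stand-in so the example runs without an .xls file.

```python
def read_row_span(sheet, row, first_col, last_col):
    """Return the values of `row` for columns first_col..last_col-1,
    clipped to the number of columns the sheet really has."""
    if row >= sheet.nrows:
        raise IndexError(
            f"row {row} requested, but the sheet has only {sheet.nrows} rows"
        )
    stop = min(last_col, sheet.ncols)  # never index past the last column
    return [sheet.cell_value(row, col) for col in range(first_col, stop)]


class FakeSheet:
    """Hypothetical stand-in exposing the three sheet members used above."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)
        self.ncols = max((len(r) for r in rows), default=0)

    def cell_value(self, rowx, colx):
        return self._rows[rowx][colx]


# A sheet with 21 columns, so a loop up to column 25 would overrun it.
sheet = FakeSheet([list(range(21)), list(range(100, 121))])
valor = read_row_span(sheet, 1, 16, 25)
print(valor)
```

In the question's code, the call would be read_row_span(sh, linha, 16, 25). A result shorter than expected then signals that the sheet has fewer columns than assumed.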
<---------------Additional Information---------------->
Since this is an xls file, there is no need for delimiter settings; your code as is should open it correctly. However, the xls workbook to be opened is determined by parameters the user enters at the start, which presumably means there are several in the directory to choose from. Are you sure the xls file you are checking is the one your code is opening? Also, if there is more than one sheet in the workbook(s), are you opening the correct sheet?
You can print the workbook name to be sure which one is being opened. Also, if you add verbosity to the open_workbook call (level 2 should be high enough), it will print to the console, on opening the book, details of the sheets available, including the number of rows and columns in each.
print(planilhaxls)
book = ex.open_workbook(planilhaxls, verbosity=2)
sh = book.sheet_by_index(0)
print(sh.name)
E.g.
BOF: op=0x0809 vers=0x0600 stream=0x0010 buildid=14420 buildyr=1997 -> BIFF80
sheet 0('Sheet1') DIMENSIONS: ncols=21 nrows=21614
BOF: op=0x0809 vers=0x0600 stream=0x0010 buildid=14420 buildyr=1997 -> BIFF80
sheet 1('Sheet2') DIMENSIONS: ncols=13 nrows=13
The print(sh.name) call, as shown, prints the name of the sheet that 'sh' is assigned to.

How to add a progress bar while executing process in tkinter?

I'm building a little app that generates a Word document with several tables. Since this process takes a while to finish, I'd like to add a progress bar to make the wait a little less boring for the user. So far I have managed to write the following method:
def loading_screen(self,e,data_hold=None):
    """
    Generates a progress bar to display elapsed time
    """
    loading = Toplevel()
    loading.geometry("300x50")
    #loading.overrideredirect(True)
    progreso = Progressbar(loading,orient=HORIZONTAL,length=280,mode="determinate")
    progreso.pack()
    progreso.start(10)
    #progreso.destroy()
This method is supposed to run right after the user clicks a button of the next Toplevel.
def validate_data(self):
    """
    Opens a window to validate the entered data and make any corrections
    before the final deposits are generated
    """
    if self.datos:
        venta = Toplevel()
        venta.title("Listado de depositos por envasadora")
        venta.geometry("600x300")
        columnas = ["ID","Banco","Monto","Envasadora"]
        self.Tree_datos = ttk.Treeview(venta,columns=columnas,show="headings")
        self.Tree_datos.pack()
        for i in columnas:
            self.Tree_datos.heading(i,text=i)
        for contact in self.datos:
            self.Tree_datos.insert('', END, values=contact)
        self.Tree_datos.column("ID",width=20)
        #Tree_datos.column("Banco",width=100)
        imprimir_depositos = Button(venta,text="Generar Depositos",command=self.generar_depositos)
        imprimir_depositos.pack(fill=BOTH,side=LEFT)
        editar_deposito = Button(venta,text="Editar seleccion",command=self.edit_view)
        editar_deposito.pack(fill=BOTH,side=RIGHT)
        imprimir_depositos.bind("<Button-1>",self.loading_screen)
        #return get_focus ()
        def get_focus(e):
            self.valor_actualizado = self.Tree_datos.item(self.Tree_datos.focus()) ["values"]
        self.Tree_datos.bind("<ButtonRelease-1>",get_focus)
    else:
        messagebox.showinfo(message="No hay datos que mostra por el momento",title="No hay datos!")
The command that generates the doc file is this (along with other methods that are not relevant for now I guess):
def generar_depositos(self):
    documento = Document()
    add_style = documento.styles ["Normal"]
    font_size = add_style.font
    font_size.name = "Calibri"
    font_size.size = Pt(9)
    table_dict = {"BPD":self.tabla_bpd,"BHD":self.tabla_bhd,"Reservas":self.tabla_reservas}
    self.tabla_bhd(documento,"La Jagua","2535")
    #for banco,env,deposito in datos_guardados:
        #table_dict [banco] (documento,env,deposito)
        # documento.add_paragraph()
    self.set_doc_dimx(documento,margen_der=0.38,margen_izq=0.9,margen_sup=0.3,margen_infe=1)
    sc = documento.sections
    for sec in sc:
        print(sc)
    documento.save("depositos.docx")
So basically what I want is to display the animated progress bar while this method is running. I have read about threading, but I don't know how to implement it in my code.

How to set a value for an empty list

I am starting to learn to program using BeautifulSoup. What I want to achieve with this code is to save prices from different pages. To achieve this I store the prices of each page in a list and all those lists in a list. The problem is some pages do not save the prices so there are some lists that are completely empty. What I am looking for is that those empty lists are assigned the elements of the "ListaR" so that later I do not have problems. Here's my code:
from bs4 import BeautifulSoup
import requests
import pandas as pd
from decimal import Decimal
from typing import List
AppID = ['495570', '540190', '607210', '575780', '338840', '585830', '637330', '514360', '575760', '530540', '361890', '543170', '346500', '555930', '575700', '595780', '362400', '562360', '745670', '763360', '689360', '363610', '575770', '467310', '380560']
ListaPrecios = list()
ListaUrl = list() #<------- LIST
Blanco = [""]
ListaR = ["$0.00 USD", "$0.00 USD"]
for x in AppID: # <--------- For each AppID...
    #STR#
    url = "https://steamcommunity.com/market/search?category_753_Game%5B%5D=tag_app_"+x+"&category_753_cardborder%5B%5D=tag_cardborder_0&category_753_item_class%5B%5D=tag_item_class_2#p1_price_asc" # <------ Use the AppID to build its market link
    ListaUrl += [url] # <---------- ADD EACH LINK TO A LIST
PageCromos = [requests.get(x) for x in ListaUrl]
SoupCromos = [BeautifulSoup(x.content, "html.parser") for x in PageCromos]
PrecioCromos = [x.find_all("span", {"data-price": True}) for x in SoupCromos] # <--------- STORES LISTS INSIDE LISTS, WITH THE HTML CODE
min_CromoList = []
for item in PrecioCromos:
    CromoList = [float(i.text.strip('USD$')) for i in item]
    min_CromoList.append(min(CromoList)) # <---------------- List with the minimum card price for each game
print(min_CromoList)
Output:
ValueError: min() arg is an empty sequence

You can change this line
min_CromoList.append(min(CromoList))
to:
if not CromoList: # this will evaluate to True if the list is empty
    min_CromoList.append(min(ListaR))
else:
    min_CromoList.append(min(CromoList))
A neat feature of python is that empty lists evaluate to False and non-empty lists evaluate to True.
Since min(ListaR) will always evaluate to '$0.00 USD' it is probably neater to write this as:
if not CromoList:
    min_CromoList.append('$0.00 USD')
else:
    min_CromoList.append(min(CromoList))
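Another option, sketched here as an alternative rather than as part of the answer above, is the default keyword argument of the built-in min, which is returned when the iterable is empty. This version keeps every entry a float instead of mixing floats with the string '$0.00 USD'. The helper names parse_price and min_prices and the sample data are hypothetical.

```python
def parse_price(text):
    """Turn a string such as '$0.05 USD' into the float 0.05."""
    return float(text.strip().strip('USD$').strip())

def min_prices(price_lists, fallback=0.0):
    """Minimum price per list; `fallback` is used for empty lists."""
    return [
        min((parse_price(p) for p in prices), default=fallback)
        for prices in price_lists
    ]

# Made-up sample: the second game returned no prices at all.
sample = [["$0.05 USD", "$0.03 USD"], [], ["$1.20 USD"]]
result = min_prices(sample)
print(result)
```

Whether 0.0 is a sensible fallback depends on what is done with the numbers later; None may be a safer marker for "no price found".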

Compare two columns and take their position

I have two files:
file1.txt:
-33.68;-53.48;Chuí;Rio Grande do Sul;Brazil;
-33.68;-53.4;Chuí;Rio Grande do Sul;Brazil;
-33.68;-53.32;Santa Vitória do Palmar;Rio Grande do Sul;Brazil;
-33.6;-53.48;Santa Vitória do Palmar;Rio Grande do Sul;Brazil;
-33.6;-53.4;Chuí;Rio Grande do Sul;Brazil;
file2.txt:
-37.6 -57.72 13
-37.6 -57.48 15
-33.6 -53.4 12
-33.6 -53.48 5
I want to compare lat and lon and join the lines
Expected result:
-33.6;-53.48;Santa Vitória do Palmar;Rio Grande do Sul;Brazil;5
-33.6;-53.4;Chuí;Rio Grande do Sul;Brazil;12
Code:
fileWrite = open("out.txt","w")
gg2=[]
with open("ddprecip.txt", encoding="utf8", mode='r') as file5:
    bruto = [line.split() for line in file5]
for dd in range(len(bruto)):
    lat = float(bruto[dd][0])
    lon = float(bruto[dd][1])
    valor= int(float(bruto[dd][2]))
    gg2.append(str(lat)+";"+str(lon)+";"+str(valor))
with open("geo2.txt", encoding="utf8", mode='r') as f:
    text = f.readlines()
for ind in range(len(bruto)):
    coord2 = (gg2[ind].split(";")[0]+";"+gg2[ind].split(";")[1])
    match = [i for i,x in enumerate(text) if coord2 in x]
    if match:
        variaveis = text[match[0]].split(";")
        show = coord2+";"+variaveis[2]+";"+variaveis[3]+";"+gg2[ind].split(";")[2]+";"+variaveis[4]
        print(show)
        fileWrite.write(str(show.encode("utf-8"))+";\n")
fileWrite.close()
Problem:
If the input has a lat/lon of 3.6;-53.4,
the code returns the line: -33.6;-53.4;Chuí;Rio Grande do Sul;Brazil;
I need the lat and lon to match exactly in both files.

I think you're making this much harder than it needs to be, and doing it in a relatively slow way that uses a lot more memory than necessary.
To speed up the whole process, a dictionary named geo_dict is first created from the place-name file (geo2.txt in the code below). It maps each unique (lat, lon) pair of values to the place names on that line. This makes checking for matches much quicker than a linear search through the list of all of them, and because the whole pair is the key, only exact matches are found.
It is also unnecessary to convert the values to floats and ints. In fact, it might be better not to, because comparing float values can be problematic on a computer.
Anyway, after the dictionary is created, each line of the other file (ddprecip.txt) can be read and processed sequentially. Note that lines with no match are skipped.
from pprint import pprint
with open("geo2.txt", encoding="utf8", mode='r') as file2:
    geo_dict = {tuple(line[:2]): line[2:5] for line in (line.split(';') for line in file2)}
pprint(geo_dict)
print()
with open("ddprecip.txt", encoding="utf8", mode='r') as file1, \
     open("out.txt", "w", encoding="utf8") as fileWrite:  # utf8 so names like Chuí are written safely
    for line in (line.split() for line in file1):
        lat, lon, valor = line[:3]
        match = geo_dict.get((lat, lon))
        if match:
            show = ';'.join(line[:2] + match[:3] + [valor])
            fileWrite.write(show + "\n")
print('Done')
On-screen output:
{('-33.6', '-53.4'): ['Chuí', 'Rio Grande do Sul', 'Brazil'],
('-33.6', '-53.48'): ['Santa Vitória do Palmar',
'Rio Grande do Sul',
'Brazil'],
('-33.68', '-53.32'): ['Santa Vitória do Palmar',
'Rio Grande do Sul',
'Brazil'],
('-33.68', '-53.4'): ['Chuí', 'Rio Grande do Sul', 'Brazil'],
('-33.68', '-53.48'): ['Chuí', 'Rio Grande do Sul', 'Brazil']}
Done
Contents of the out.txt file created:
-33.6;-53.4;Chuí;Rio Grande do Sul;Brazil;12
-33.6;-53.48;Santa Vitória do Palmar;Rio Grande do Sul;Brazil;5
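To see why the dictionary lookup avoids the false match described in the question, here is a small sketch that compares a substring test with an exact key lookup. The two sample lines are taken from file1.txt above; the variable names are only illustrative.

```python
geo_lines = [
    "-33.68;-53.4;Chuí;Rio Grande do Sul;Brazil;",
    "-33.6;-53.4;Chuí;Rio Grande do Sul;Brazil;",
]

# Substring test, as in the question: "3.6;-53.4" occurs inside
# "-33.6;-53.4;..." although the latitude is different.
substring_hits = [line for line in geo_lines if "3.6;-53.4" in line]

# Exact test: the (lat, lon) strings form the dictionary key.
geo_dict = {}
for line in geo_lines:
    fields = line.split(";")
    geo_dict[(fields[0], fields[1])] = fields[2:5]

exact_wrong = geo_dict.get(("3.6", "-53.4"))    # no such key
exact_right = geo_dict.get(("-33.6", "-53.4"))  # exact key exists

print(substring_hits)
print(exact_wrong)
print(exact_right)
```

Note that the keys are compared as strings, so "-33.60" and "-33.6" would count as different coordinates. If both spellings can occur, the values would need to be normalised first.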
