Python Pandas create new columns from existing one avoiding row iteration

I have this df['title'] column:
Apartamento en Venta
Proyecto Nuevo de Apartamentos
Proyecto Nuevo de Apartamentos
Lote en Venta
Casa Campestre en Venta
Proyecto Nuevo de Apartamentos
Based on this column I want to create three new ones:
df['property_type'] => (House, Apartment, Lot, etc)
df['property_status'] => (New, Used)
df['ofert_type'] => (Sale, Rent)
I'm achieving this through row iteration and splitting:
df['property_type'] = ''
df['property_status'] = ''
df['ofert_type'] = ''
for data in range(len(df)):
    if 'Proyecto Nuevo de' in df.loc[data,'title']:
        df.loc[data,'property_type'] = df.loc[data,'title'].split('Proyecto Nuevo de')[1]
        df.loc[data,'property_type'] = str(df.loc[data,'property_type']).split(' ')[1][:-1]
        df.loc[data,'property_status'] = 'new'
        df.loc[data,'ofert_type'] = 'sale'
    else:
        df.loc[data,'property_type'] = df.loc[data,'title'].split(' en ')[0]
        df.loc[data,'property_status'] = 'used'
        df.loc[data,'ofert_type'] = df.loc[data,'title'].split(' en ')[1].split(' ')[0].lower()
But it seems this approach takes too much time to process the entire data frame. I'm in search of a more "pandas" solution.
Thank you for your help

You can write a function and pass it to .apply. It might be faster than an explicit loop, although it still iterates over the rows.
def property_split(row):
    # replace 'delta_points' with the column you want to test
    if row['delta_points'] == 'apartment':
        return 1
    else:
        return 0
df['apartment'] = df.apply(property_split, axis=1)
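For comparison, here is a vectorized sketch that avoids both the explicit loop and .apply. It assumes every title has one of the two shapes seen in the sample, either "Proyecto Nuevo de <plural type>" or "<type> en <offer>". The sample DataFrame and the helper names (is_new, new_type, parts) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, copied from the titles in the question.
df = pd.DataFrame({"title": [
    "Apartamento en Venta",
    "Proyecto Nuevo de Apartamentos",
    "Lote en Venta",
    "Casa Campestre en Venta",
]})

# Boolean mask: True for the "Proyecto Nuevo de ..." rows.
is_new = df["title"].str.startswith("Proyecto Nuevo de")

# New projects: drop the prefix, then drop the trailing plural "s".
new_type = (df["title"]
            .str.replace("Proyecto Nuevo de ", "", regex=False)
            .str[:-1])

# Other listings: capture "<type> en <offer>" (NaN where there is no match).
parts = df["title"].str.extract(r"^(.+?) en (\S+)")

df["property_type"] = np.where(is_new, new_type, parts[0])
df["property_status"] = np.where(is_new, "new", "used")
df["ofert_type"] = np.where(is_new, "sale", parts[1].str.lower())
```

Each step works on whole columns at once, so the cost no longer grows with a Python-level loop over df.loc. Titles that fit neither pattern would end up with NaN in property_type and ofert_type, which makes them easy to spot.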

delete an element of a tuple within a list [Python, Tuples, Lists]

I am creating a menu for a billing program that stores clients as tuples inside a list. How can I delete a client requested by the user, removing all of its values?
menu = """
(1) Añadir Cliente
(2) Eliminar Cliente
(3) Añadir Factura
(4) Procesar facturacion del mes
(5) Listar todos los clientes
(6) Terminar
"""
lista_nom = []
list_borra2 = []
ventas = []
while True:
    print(menu)
    opc = input("Escriba una opcion: ")
    opc = int(opc)
    if opc == 1:
        nombre = input("Escribe el nombre del cliente: ")
        cif = input('Escribe el cif de cliente: ')
        direccion = input("Escribe el direccion de cliente: ")
        lista_nom1 = (nombre,cif,direccion)
        lista_nom.append(lista_nom1)
        print(lista_nom)
        # print(lista_nom1)  same data as a tuple
    if opc == 2:
        nombre = input('Escriba el nombre del cliente a borrar: ')
        for nombre in lista_nom[0]:
            lista_nom.remove[(nombre)]
        else:
            print('No existe el cliente con el nombre', nombre)
        # for nom in range(len(lista_nom)):
        #     for eli in range(len(lista_nom[nom])):
        #         if lista_nom[nom][eli][0] == nombre:
        #             del lista_nom[nom:nom+1]
        #             print(lista_nom)
I tried to reach the element with nested for loops, but it simply does nothing.
I also tried creating a new list to hold the deleted data and then merging it into the first list that holds the values to be deleted:
# list_borra2.append(nombre)
# for nom in range(len(lista_nom)):
#     for eli in range(len(lista_nom[nom])):
#         if lista_nom[nom][eli][0] == nombre:
#             del lista_nom[nom:nom+1]
#             print(lista_nom)
# lista_nom.remove(nombre)
# list_borra2 += lista_nom
# # print(list_borra2)
# print(list_borra2)

Deleting an element inside a tuple is not possible, because tuples are immutable. Instead, you can build a new sequence that leaves out the item at index nom by slicing:
nom_remove = lista_nom[:nom] + lista_nom[(nom+1):]

In the end I solved the problem as follows. As the other answer points out, tuples are immutable, so I work on the list and reach the tuples through it:
if opc == 2:
    nombre = input('Escriba el nombre del cliente a borrar: ')
    vivo = 0
    for kilo in ventas:
        # kilo is the sale entry itself, not an index into ventas
        vivo += kilo.count(nombre)
    if vivo == 0:
        for nom in range(len(lista_nom)):
            if lista_nom[nom].count(nombre) > 0:
                del lista_nom[nom:nom+1]
                print(lista_nom)
                break
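An alternative sketch that sidesteps index bookkeeping: rebuild the list with a comprehension that keeps every client tuple whose first field differs from the requested name. The helper name remove_client and the sample clients are hypothetical. The sketch assumes the name is always the first element of each tuple, as in option 1 of the menu.

```python
def remove_client(clients, name):
    """Return a new list without the tuples whose first field equals name."""
    return [client for client in clients if client[0] != name]


# Hypothetical sample data in the (nombre, cif, direccion) layout of the question.
lista_nom = [("Ana", "A-001", "Calle 1"), ("Luis", "B-002", "Calle 2")]

lista_nom = remove_client(lista_nom, "Ana")      # drops Ana's tuple
unchanged = remove_client(lista_nom, "Nadie")    # unknown name: nothing removed
```

Because the tuples themselves are never modified, their immutability is not an obstacle. Only the surrounding list changes.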

python - Change one-row table value

I'm developing a script that processes a table. The table arrives with decimal values, and I'm trying to convert those values in the code.
To do that, I take each decimal value and multiply it by 1000, but there is a problem I really can't understand. It works up to a certain row, then the values of some
rows of the table change and the final result is an immense value. I don't know what is going on. Can someone give me a hand?
Here is the entire code of the function:
def DESTINOS_CLUBE_CLIENTE_MOD072():
    # Path to the xlsx file. sheet_name: name of the table or sheet. Note: it must be EXACTLY as written in Excel.
    table = pd.read_excel('./planilha.xlsx', sheet_name='LP')
    oneCard = ""
    # Remove rows that contain no values
    for i in range(table.shape[0]):
        if pd.isna(table[1][i]) == True:
            table = table.drop(labels=i, axis=0)
    table.reset_index(inplace=True, drop=True)
    # Create a new table
    table2 = pd.DataFrame(table)
    # Reset the index of the new table
    table2.reset_index(inplace=True, drop=True)
    for i in range(8):
        print(table2[1][i])
        valor2 = int(table2[1][i] * 1000)
        table2[1] = table2[1].replace(table2[1][i], valor2)
        print("------")
    # table2 = table2.astype({1 : 'int32'})
    # table2 = table2.astype({'1.1': 'int32'})
    print(table2)
    # Take the values from the table and insert them into their respective fields
    for i in range(table2.shape[0]):
        values = 'element{}.querySelector("{}").value="{}"; element{}.querySelector("{}").value="{}"; element{}.querySelector("{}").value="{}"; element{}.querySelector("{}").value="{}"; element{}.querySelector("{}").value="{}";'.format(i, "input[name*='Origem']", table2['ORIGEM'][i], i, "input[name*='Destino']", table2['DESTINOS'][i], i, "input[name*='Clube_valor']", table2[1][i], i, "input[name*='Geral_valor']", table2['1.1'][i], i, "input[name*='Link_botao']", table2['LINKS'][i])
        oneCard += " let element"+str(i)+" = document.querySelectorAll('[data-fieldname=Itens]')"+str([i])+"; setTimeout(()=>{"+values+"},10000);"
    # Complete code
    cod = "let totalElements = document.querySelectorAll('[data-fieldname=Itens]').length == "+str(table2.shape[0])+" ? 0 : "+str(table2.shape[0])+" - document.querySelectorAll('[data-fieldname=Itens]').length; let element = document.querySelector('[data-fieldname=Itens]'); for(i = 0; i < totalElements; i++){element.children[13].click()} setTimeout(() => {"+oneCard+"}, 10000); setTimeout(() => {console.log('Pronto, pode publicar!')}, 20050);"
    # Write the code into a text file
    file(cod)
    # Success message
    message()
DESTINOS_CLUBE_CLIENTE_MOD072()

The problem was that I was passing only the value to replace, so every equal value in the column was replaced at once. I used loc to select the row instead:
for i in range(table2.shape[0]):
    valor2 = int(table2[1][i] * 1000)
    table2.loc[i] = table2.loc[i].replace(table2.loc[i][1], valor2)
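The same conversion can probably be done without any loop. In the original loop, Series.replace rewrites every cell equal to the current value, so a row that was already converted can be multiplied by 1000 again on a later iteration, which would explain the immense values. A minimal sketch, assuming column 1 holds plain floats; the sample frame is invented, and round() is an addition here (the question uses int(), which truncates):

```python
import pandas as pd

# Hypothetical sample with a repeated value in column 1, as in the question.
table2 = pd.DataFrame({1: [1.5, 2.25, 1.5], "1.1": [0.5, 0.75, 0.5]})

# Multiply the whole column once, then convert to integers.
# round() guards against float artefacts such as 2249.9999999.
table2[1] = (table2[1] * 1000).round().astype(int)
```

Each cell is touched exactly once, so duplicates cannot be converted twice.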

Pandas Excel row extract changes fields

I am working on a small translation program. I have an .xlsx file with 5 columns, each in a different language (English, French, German, Spanish, Italian).
The program provides a drop-down list in which each row of the .xlsx file is one of the available options (English only). Selecting an option takes its English value and adds it to a list.
I then use the following code to extract the whole row of the other languages for each selected English value, separated by a delimiter (;):
instructionList = ['Avoid contact with light-coloured fabrics, leather and upholstery. Colour may transfer due to the nature of indigo-dyed denim.']
for i in range(len(instructionList)):
    newInst.append(translationFile.loc[translationFile['English'] == instructionList[i]].to_string(index=False, header=False))
newInst = [i.replace(' ', ',;') for i in newInst]
strippedInst = [item.lstrip() for item in newInst]
print('strippedInst: ', strippedInst)
The output I get from this code is:
strippedInst: ['Avoid contact with light-coloured fabrics, lea...,;Éviter le contact avec les tissus clairs, le c...,;Kontakt mit hellen Stoffen, Leder und Polsterm...,;Evitar el contacto con tejidos de colores clar...,;Evitare il contatto con capi dai colori delica...']
After running this code, every language is cut off partway and the rest of the sentence is replaced with '...'. (Note the English text in strippedInst and compare it with what was passed into the loop in instructionList.)
The output is cut only when the sentence is long. I tried shorter phrases and they all come through fine.
This is the Expected output:
strippedInst:
['
Avoid contact with light-coloured fabrics, leather and upholstery. Colour may transfer due to the nature of indigo-dyed denim.,;
Éviter le contact avec les tissus clairs, le cuir et les tissus d'ameublement. Les couleurs peuvent déteindre en raison de la nature de la teinture indigo du denim.,;
Kontakt mit hellen Stoffen, Leder und Polstermöbeln vermeiden. Aufgrund der Indigofärbung kann sich die Farbe übertragen,;
Evitar el contacto con tejidos de colores claros, con cuero y con tapicerías. El tinte índigo de los vaqueros podría transferirse a dichas superficies.,;
Evitare il contatto con capi dai colori delicati, pelli e tappezzerie. Si potrebbe verificare una perdita del colore blu intenso del tessuto di jeans.,
']
EDIT:
Here is the entire standalone working function:
import pandas as pd
excel_file = 'C:/Users/user/Desktop/Translation_Table_Edit.xlsx'
translationFile = pd.read_excel(excel_file, encoding='utf-8')
compList = ['Avoid contact with light-coloured fabrics, leather and upholstery. Colour may transfer due to the nature of indigo-dyed denim.', 'Do not soak']
newComp = []
def myFunction():
    global newComp
    for i in range(len(compList)):
        newComp.append(translationFile.loc[translationFile['English'] == compList[i]].to_string(index=False, header=False))
    newComp = [i.replace(' ', ';') for i in newComp]
myFunction()
strippedComp = [item.lstrip() for item in newComp]
print(strippedComp)
This outputs following:
['Avoid contact with light-coloured fabrics, lea...;Éviter le contact avec les tissus clairs, le c...;Kontakt mit hellen Stoffen, Leder und Polsterm...;Vermijd contact met lichtgekleurde stoffen, le...;Evitar el contacto con tejidos de colores clar...;Evitare il contatto con capi dai colori delica...', 'Do not soak;Ne pas laisser tremper;Nicht einweichen;Niet weken;No dejar en remojo;Non lasciare in ammollo']

The issue lies in calling to_string on a DataFrame. Instead, first extract the row's values into an array (df_sub.iloc[0].values), and then join the elements of that array (';'.join(...)).
This should do the trick:
def myFunction():
    global newComp
    for i in range(len(compList)):
        df_sub = translationFile.loc[translationFile['English'] == compList[i]]
        if df_sub.shape[0] > 0:
            newComp.append(';'.join(df_sub.iloc[0].values))
EDIT: suggested code improvements
In addition, in my opinion your code could be improved by using pandas functionality instead of looping, following the PEP 8 naming conventions, and avoiding global variables:
import pandas as pd
df_translations = pd.read_excel('./Translation_Table_Edit.xlsx', encoding='utf-8')
to_translate = ['Avoid contact with light-coloured fabrics, leather and upholstery. Colour may transfer due to the nature of indigo-dyed denim.',
                'Do not soak']
def get_translations(df_translations, to_translate, language='English'):
    """Looks up translations for all items in to_translate.
    Returns a list of semicolon-separated translations (empty if none are found)."""
    df_sub = df_translations[df_translations[language].isin(to_translate)].copy()  # filter translations
    df_sub = df_sub.apply(lambda x: x.str.strip())  # strip each cell
    # format and combine translations into a list
    ret = []
    for translation in df_sub.values:
        ret.append(';'.join(translation))
    return ret
translations = get_translations(df_translations, to_translate)
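The truncation in the question appears to come from pandas' display settings: the cut-off strings are 50 characters long including the '...', which matches the default of the display.max_colwidth option. Joining the cell values directly never goes through that display layer. Below is a small self-contained sketch of the idea. The three-column frame, the shortened French and German sentences, and the helper name row_as_text are all illustrative, not taken from the real spreadsheet.

```python
import pandas as pd

long_en = ("Avoid contact with light-coloured fabrics, leather and upholstery. "
           "Colour may transfer due to the nature of indigo-dyed denim.")

# Hypothetical stand-in for the Excel sheet (only three languages shown).
translation_file = pd.DataFrame({
    "English": [long_en, "Do not soak"],
    "French": ["Éviter le contact avec les tissus clairs, le cuir et les tissus d'ameublement.",
               "Ne pas laisser tremper"],
    "German": ["Kontakt mit hellen Stoffen, Leder und Polstermöbeln vermeiden.",
               "Nicht einweichen"],
})


def row_as_text(df, english, sep=";"):
    """Join all cells of the first row whose English column equals `english`."""
    match = df.loc[df["English"] == english]
    if match.empty:
        return None  # no such phrase in the table
    # astype(str) keeps the join safe if a cell is not a string (e.g. NaN).
    return sep.join(match.iloc[0].astype(str).str.strip())


short = row_as_text(translation_file, "Do not soak")
full = row_as_text(translation_file, long_en)
missing = row_as_text(translation_file, "Not in the table")
```

Here full contains the complete sentences, however long they are, because nothing is formatted for display along the way.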

Why is the second loop never executed?

Hi, I am working on a Python program in which I need to read a CSV file and use data.append(line) to fill a data array.
I wrote the following part of the program:
print "Lecture du fichier", table1
lecfi = csv.reader(open(table1,'r'),skipinitialspace = 'true',delimiter='\t')
# delimiter = character used to separate the values
tempSize = 0
tempLast = ""
oldSize = 0
# initialise the file size and the last line of the file
if os.path.exists(newFilePath):
    tempSize = os.path.getsize(newFilePath)
else:
    tempSize = 0
if os.path.exists(newFilePath) and tempSize != 0:
    # If the buffer file does not exist, create it
    # Read the buffer file
    lecofi = csv.reader(open(newFilePath,'r'),skipinitialspace = 'true',delimiter='\t')
    csvFileArray = []
    for lo in lecofi:
        csvFileArray.append(lo)
    tempLast = str(csvFileArray[0])
    tempLast = tempLast[2:-2]
    oldSize = csvFileArray[1]
    print "Tempon de Last : ", tempLast
    print "Taille du fichier : ", str(oldSize)
    # retrieve the line that was the last line of the old file
else:
    # if the file does not exist, keep this default value for the processing below
    tempLast = None
# fill the data variable with the contents of the pulse file
cpt = 0
indLast = 0
fileSize = os.path.getsize(table1)
if oldSize != fileSize:
    for lecline in lecfi:
        cpt = cpt + 1
        last = str(lecline)
        if tempLast != None and last == tempLast:
            print "TEMPLAST != NONE", cpt
            indLast = cpt
            print "Indice de la derniere ligne : ", indLast
            print last, tempLast
    print "Variable indLast : ", indLast
    i = 0
    for co in lecfi:
        print "\nCOOOOOOO : ", co
        if i == indLast:
            data.append(co[0])
        i=i+1
    for da in data:
        print "\n Variable data : ", da
Here is the printed output:
Lecture du fichier Data_Q1/2018-05-23/2018-5-23_13-1-35_P_HOURS_Q1
Tempon de Last : ['(2104.72652']
Taille du fichier : ['20840448']
TEMPLAST != NONE 317127
Indice de la derniere ligne : 317127
['(2104.72652'] ['(2104.72652']
Variable indLast : 317127
It seems the program ignores everything that follows my first for loop. I assume it is a really basic mistake, but I can't find it.
Any help?

You are trying to iterate over the CSV reader twice without resetting it. That is why your data array is empty.
The first loop actually iterates over the file:
for lecline in lecfi:
By the second loop, the iterator has already reached its end and yields nothing:
for co in lecfi:
As Johnny Mopp mentioned in the comments, one possible solution is described here:
Python csv.reader: How do I return to the top of the file?
Hope this explains your issue.

Here:
for lecline in lecfi:
    cpt = cpt + 1
    # ...
you are reading the whole file. After this loop, the file pointer is at the end of the file and there is nothing more to read. Hence here:
i = 0
for co in lecfi:
    # ...
the body of this second loop is indeed never executed. You need to either reset the file pointer, close and reopen the file, or read it into a list right from the start and iterate over that list instead.
FWIW, note that opening files and not closing them is bad practice and can lead to file corruption (not so much in your case, since you are only reading, but still). A proper implementation would look like:
with open(table1) as tablefile:
    lecfi = csv.reader(tablefile, ....)
    for lecline in lecfi:
        # ....
    tablefile.seek(0)
    for lecline in lecfi:
        # ....
Also, this:
lecofi = csv.reader(open(newFilePath,'r'),skipinitialspace = 'true',delimiter='\t')
csvFileArray = []
for lo in lecofi:
    csvFileArray.append(lo)
would be better rewritten as:
with open(newFilePath) as newFile:
    lecofi = csv.reader(newFile, ...)
    csvFileArray = list(lecofi)
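The exhaustion effect is easy to reproduce in isolation. The sketch below is written for Python 3 (the question uses Python 2 print statements) and uses an in-memory io.StringIO in place of the real file, so the sample rows are invented.

```python
import csv
import io

# In-memory stand-in for the tab-separated file from the question.
tablefile = io.StringIO("a\tb\nc\td\n")
reader = csv.reader(tablefile, delimiter="\t")

first_pass = [row for row in reader]    # consumes every line
second_pass = [row for row in reader]   # nothing left: the loop body never runs

tablefile.seek(0)                       # rewind the underlying file object
third_pass = [row for row in reader]    # the same reader yields the rows again
```

If the file fits in memory, keeping first_pass around and looping over that list twice is usually the simplest option, since a list can be iterated as often as needed.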

Add a column in a numpy_array Python

I'm using a numpy array with Python and I would like to know how I can add a new column at the end of my array?
I have an array with N rows and I calculate for each row a new value which is named X. I would like, for each row, to add this new value in a new column.
My script is (the interesting part is at the end of my script) :
#!/usr/bin/python
# coding: utf-8
from astropy.io import fits
import numpy as np
#import matplotlib.pyplot as plt
import math
##########################################
# File containing the list of fields     #
##########################################
with open("liste_essai.txt", "r") as f :
    fichier_entier = f.read()
    files = fichier_entier.split("\n")
for fichier in files :
    with open(fichier, 'r') :
        reading = fits.open(fichier)  # Open the file with astropy
        tbdata = reading[1].data  # Read the FITS data
        ##########################################
        # Filter according to various parameters #
        ##########################################
        #mask1 = tbdata['CHI'] < 1.0  # Create a mask for the CHI condition
        #tbdata_temp1 = tbdata[mask1]
        #print "Tri effectué sur CHI"
        #mask2 = tbdata_temp1['PROB'] > 0.01  # Create a second mask for the PROB condition
        #tbdata_temp2 = tbdata_temp1[mask2]
        #print "Tri effectué sur PROB"
        #mask3 = tbdata_temp2['SHARP'] > -0.4  # Create a 3rd mask for the SHARP condition (1/2)
        #tbdata_temp3 = tbdata_temp2[mask3]
        #mask4 = tbdata_temp3['SHARP'] < 0.1  # Create a 4th mask for the SHARP condition (2/2)
        #tbdata_final = tbdata_temp3[mask4]
        #print "Création de la nouvelle table finale"
        #print tbdata_final  # Display the table after all conditions
        #fig = plt.figure()
        #plt.plot(tbdata_final['G'] - tbdata_final['R'], tbdata_final['G'], '.')
        #plt.title('Diagramme Couleur-Magnitude')
        #plt.xlabel('(g-r)')
        #plt.ylabel('g')
        #plt.xlim(-2,2)
        #plt.ylim(15,26)
        #plt.gca().invert_yaxis()
        #plt.show()
        #fig.savefig()
        #print "Création du Diagramme"
        #hdu = fits.BinTableHDU(data=tbdata_final)
        #hdu.writeto('{}_{}'.format(fichier,'traité'))  # Write the result to a new FITS file
        #print "Ecriture du nouveau fichier traité"
        #############################################
        # Determine the extreme values of the field #
        #############################################
        RA_max = np.max(tbdata['RA'])
        RA_min = np.min(tbdata['RA'])
        #print "RA_max vaut : " + str(RA_max)
        #print "RA_min vaut : " + str(RA_min)
        DEC_max = np.max(tbdata['DEC'])
        DEC_min = np.min(tbdata['DEC'])
        #print "DEC_max vaut : " + str(DEC_max)
        #print "DEC_min vaut : " + str(DEC_min)
        ##########################################
        # Compute the central value of the field #
        ##########################################
        RA_central = (RA_max + RA_min)/2.
        DEC_central = (DEC_max + DEC_min)/2.
        #print "RA_central vaut : " + str(RA_central)
        #print "DEC_central vaut : " + str(DEC_central)
        print " "
        print " ######################################### "
        #####################
        # Determine X and Y #
        #####################
        i = 0
        N = len(tbdata)
        for i in range(0,N) :
            print "Valeur de RA à la ligne " + str(i) + " est : " + str(tbdata['RA'][i])
            print "Valeur de RA_moyen est : " + str(RA_central)
            print "Valeur de DEC_moyen est : " + str(DEC_central)
            X = (tbdata['RA'][i] - RA_central)*math.cos(DEC_central)
            Add_column = np.vstack(tbdata, X)  # ==> ????
            print "La valeur de X est : " + str(X)
            print " "
I tried something, but I'm not sure it works.
I also have a second question, if possible. In the plotting part, I would like to save the plot for each file under that file's name. I think I need to write something like:
plt.savefig('graph',"{}_{}".format(fichier,png))

NumPy arrays are always stored in a contiguous block of memory. That means that once an array has been created, making it any bigger forces NumPy to copy the original array so that the added data sits beside the original data in memory.
If you have a general idea of how many columns you will be adding, you can create the original array with additional columns of zeros. This reserves the space in memory up front, and you can then "add" columns by overwriting the left-most column of zeros.
If you have memory to spare, you can always over-estimate the number of columns you will need and remove the extra columns of zeros later on. As far as I know, this is the only way to avoid copying when adding new columns to a NumPy array.
For example:
my_array = np.random.rand(200,3) # the original array
zeros = np.zeros((200,400)) # anticipates 400 additional columns
my_array = np.hstack((my_array,zeros)) # join my_array with the array of zeros (only this step will make a copy)
current_column = 3 # keeps track of left most column of zeros
new_columns = [] # put list of new columns to add here
for col in new_columns:
    my_array[:,current_column] = col
    current_column += 1
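When only one column has to be added and a single copy is acceptable, a simpler route is to compute X for all rows at once and append it with np.column_stack. The sketch below uses a plain 2-D float array with made-up RA/DEC values. The FITS table in the question is a record array, so this illustrates the idea and is not a drop-in replacement. It also assumes the coordinates are in degrees and converts DEC to radians before taking the cosine, whereas the question passes DEC_central to math.cos directly.

```python
import numpy as np

# Hypothetical coordinates in degrees: column 0 is RA, column 1 is DEC.
ra = np.array([10.0, 10.5, 11.0])
dec = np.array([-1.0, 0.0, 1.0])
table = np.column_stack((ra, dec))

# Centre of the field, as computed in the question.
ra_central = (ra.max() + ra.min()) / 2.0
dec_central = (dec.max() + dec.min()) / 2.0

# X for every row in one vectorized expression (no Python loop).
x = (table[:, 0] - ra_central) * np.cos(np.radians(dec_central))

# Append X as a new last column; this makes one copy of the data.
table_with_x = np.column_stack((table, x))
```

The result has one more column than table, and the loop over range(0, N) is no longer needed.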
