Get the last column of the last row from a CSV file using Python

Hello, I have a CSV file that contains these columns:
index, text, author, date
I want to select the last column of the last inserted row.
What I did so far:
inputFile = 'bouyguesForum_results.csv'
f1 = open(inputFile, "r")
last_line = f1.readlines()[-1]
f1.close()
print (last_line)
This code gets me the last inserted row, but I want to select only the last column, which is the date column.
Code output:
9,"J'ai souscrit à un abonnement Bbox de 6€99 + 3€ de location de box, sauf que j'ai été prélevé de 19€99 ce mois-ci, sachant que je n'ai eu aucune consommation supplémentaire, ni d'appel, et je n'ai souscrit à rien, et rien n'est précisé sur ma facture. Ce n'est pas normal, et je veux une explication.",JUSTINE,17 novembre 2021
thank you for your time.

You can do this if you want the last field of the very last row:
with open('data.csv', 'r') as f:  # named f so it does not shadow the csv module
    data = [[x.strip() for x in line.strip().split(',')] for line in f.readlines()][-1][-1]
print(data)
Or, if you want the last element of every row:
with open('data.csv', 'r') as f:
    data = [line.strip().split(',')[-1] for line in f.readlines()]
print(data)

Since you already have the last row, you can simply split it into a list. For example:
last_line = last_line.strip("\n")
last_line = [x for x in last_line.split(",") if x!=""]
last_date = last_line[-1]
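Splitting on "," works here only because the date field itself contains no comma, while the quoted text field does. A minimal sketch of the same lookup with the standard csv module, which handles quoted fields, could look like the following. The helper name last_field_of_last_row and the sample rows are hypothetical and only stand in for the real file.

```python
import csv
import io

def last_field_of_last_row(csv_file):
    """Return the last column of the last non-empty row, or None if there are no rows."""
    last_row = None
    for row in csv.reader(csv_file):
        if row:  # skip blank lines
            last_row = row
    return last_row[-1] if last_row else None

# Hypothetical sample standing in for bouyguesForum_results.csv
sample = io.StringIO(
    "index,text,author,date\n"
    '8,"Bonjour, ma box ne marche plus",PAUL,16 novembre 2021\n'
    '9,"Ce n\'est pas normal, et je veux une explication.",JUSTINE,17 novembre 2021\n'
)
print(last_field_of_last_row(sample))  # 17 novembre 2021
```

With the real file, the same function can be called on open(inputFile, newline='', encoding='utf-8') inside a with block.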

Related

Writing to an Excel file with ExcelWriter produces no file

I have 9 subdirectories, each containing three files, and I want to write those files to Excel files. I start by walking all the subdirectories, then I read each file into a list, convert the list to a DataFrame, and export it using "to_excel" and "ExcelWriter". For some strange reason, the code does not produce any file.
# Path to the different files
path = r"C:\Users\Emmanuelle\Documents\tal\retrotraduction\corpus_amazon\corpus_retraduit"
for root, subdirs, files in os.walk(path):
    #print(root)
    for file in files:
        print(file)
        f_name = file[:-7]
        print(f_name)
        #print(files)
        print("-----File in processed :", file)
        with open(os.path.join(root, file), "r", encoding='utf-8') as b_translate_file:
            liste = [line.rstrip() for line in b_translate_file]
            if liste[0] != 'Contenu':
                #print(liste)
                if len(liste) == 2020:
                    print("-------------")
                    print("-----File of freins category identified :" , len(liste))
                    print("-------------")
                    df = pd.DataFrame(liste)
                    print(df)
                    writer = pd.ExcelWriter(os.path.join(path, "/{}.xlsx".format(f_name)), engine ='xlsxwriter')
                    df.to_excel(writer, sheet_name = f_name)
I expected each file with 2020 elements to be written to an Excel file.
df looks like this:
----File of freins category identified : 2020
-------------
0
0 Malheureusement, l'impression de violence, bie...
1 Tout cela ne me donne pas envie d'utiliser un ...
2 """Mettre 5 étoiles dans le pétrin pour cet al...
3 "c'est bien écrit, c'est fluide, la seule pris...
4 Oui, bien sûr, il y a la super introduction de...
... ...
2015 m'a plongé dans une nuit blanche pour ce roman...
2016 Ce disque n'est pas mauvais en soi, mais il ne...
2017 "En voulant changer l'esprit de la série, les ...
2018 "Voici le déclin et la décadence d'une ancienn...
2019 "C'est l'ensemble le plus complet, à ma connai...
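Try simplifying this:
os.path.join(path, "/{}.xlsx".format(f_name)) into "{}.xlsx".format(f_name)
The leading "/" makes the second argument a rooted path, so os.path.join drops the directories in path. Note also that the writer in your code is never saved or closed, so nothing is written to disk.
Also, how about trying:
df.to_excel(path_name, sheet_name=f_name) instead, without the use of ExcelWriter? Here path_name stands for the full output path, for example os.path.join(path, "{}.xlsx".format(f_name)).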
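The effect of the leading slash can be checked without writing any file. The sketch below uses ntpath, the Windows flavour of os.path that can be imported on any platform. The directory and file stem are hypothetical values chosen for illustration.

```python
import ntpath  # Windows flavour of os.path, importable on any platform

base = r"C:\Users\me\out"  # hypothetical output directory
f_name = "report"          # hypothetical file stem

# A component starting with "/" is rooted, so the directories in `base`
# are dropped and only the drive letter survives.
broken = ntpath.join(base, "/{}.xlsx".format(f_name))

# Without the leading slash the file name is appended to `base` as intended.
fixed = ntpath.join(base, "{}.xlsx".format(f_name))

print(broken)  # C:/report.xlsx
print(fixed)   # C:\Users\me\out\report.xlsx
```

So even once the writer is saved, the original call would target the root of the drive instead of the corpus directory.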

Python Pandas create new columns from existing one avoiding row iteration

I have this df['title'] column:
Apartamento en Venta
Proyecto Nuevo de Apartamentos
Proyecto Nuevo de Apartamentos
Lote en Venta
Casa Campestre en Venta
Proyecto Nuevo de Apartamentos
Based on this column I want to create three new ones:
df['property_type'] => (House, Apartment, Lot, etc)
df['property_status'] => (New, Used)
df['ofert_type'] => (Sale, Rent)
I'm achieving this through row iteration and splitting:
df['property_type'] = ''
df['property_status'] = ''
df['ofert_type'] = ''
for data in range(len(df)):
    if 'Proyecto Nuevo de' in df.loc[data,'title']:
        df.loc[data,'property_type'] = df.loc[data,'title'].split('Proyecto Nuevo de')[1]
        df.loc[data,'property_type'] = str(df.loc[data,'property_type']).split(' ')[1][:-1]
        df.loc[data,'property_status'] = 'new'
        df.loc[data,'ofert_type'] = 'sale'
    else:
        df.loc[data,'property_type'] = df.loc[data,'title'].split(' en ')[0]
        df.loc[data,'property_status'] = 'used'
        df.loc[data,'ofert_type'] = df.loc[data,'title'].split(' en ')[1].split(' ')[0].lower()
But it seems this approach takes too much time to process the entire data frame. I'm in search of a more "pandas" solution.
Thank you for your help
You can write a function and use .apply. It might be faster, although you are still iterating over the rows.
def property_split(row):
    # replace 'delta_points' with the column you want to test
    if row['delta_points'] == 'apartment':
        return 1
    else:
        return 0
df['apartment'] = df.apply(property_split, axis=1)
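For a version that avoids row iteration altogether, the same rules from the question can be expressed with pandas string methods. This is only a sketch under the assumption that every title either starts with "Proyecto Nuevo de " or has the form "<type> en <offer>", as in the sample data. The intermediate names is_new, new_type and used are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"title": [
    "Apartamento en Venta",
    "Proyecto Nuevo de Apartamentos",
    "Lote en Venta",
    "Casa Campestre en Venta",
]})

# Boolean mask for the "new project" titles
is_new = df["title"].str.startswith("Proyecto Nuevo de ", na=False)

# New projects: drop the prefix and the trailing plural "s"
new_type = df["title"].str.replace("Proyecto Nuevo de ", "", regex=False).str[:-1]

# Other titles: text before " en " is the type, first word after it is the offer
used = df["title"].str.extract(r"^(.+?) en (\S+)")

df["property_type"] = new_type.where(is_new, used[0])
df["property_status"] = is_new.map({True: "new", False: "used"})
df["ofert_type"] = used[1].str.lower().where(~is_new, "sale")
print(df)
```

Each step operates on whole columns at once, so the cost of the Python-level loop over df.loc disappears.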

Pandas Excel row extract changes fields

I am working on a small translation program. I have an .xlsx file with 5 columns, each in a different language (English, French, German, Spanish, Italian).
The program provides a drop-down list in which each row of the .xlsx file is one of the available options (English only). Selecting an option takes the English value and adds it to a list.
I then use the following code to extract the whole row of translations based on the selected English text, with the languages separated by a delimiter (;):
instructionList = ['Avoid contact with light-coloured fabrics, leather and upholstery. Colour may transfer due to the nature of indigo-dyed denim.']
newInst = []
for i in range(len(instructionList)):
    newInst.append(translationFile.loc[translationFile['English'] == instructionList[i]].to_string(index=False, header=False))
newInst = [i.replace(' ', ',;') for i in newInst ]
strippedInst = [item.lstrip() for item in newInst ]
print('strippedInst: ', strippedInst)
The output I get from the above code is:
strippedInst: ['Avoid contact with light-coloured fabrics, lea...,;Éviter le contact avec les tissus clairs, le c...,;Kontakt mit hellen Stoffen, Leder und Polsterm...,;Evitar el contacto con tejidos de colores clar...,;Evitare il contatto con capi dai colori delica...']
After running this code, every language gets cut off partway and the rest of the sentence is replaced with '...' (note the English entry in strippedInst and compare it with what was passed into the loop via instructionList).
The output is cut only when the sentence is long. I tried shorter phrases and they all come through fine.
This is the expected output:
strippedInst:
['
Avoid contact with light-coloured fabrics, leather and upholstery. Colour may transfer due to the nature of indigo-dyed denim.,;
Éviter le contact avec les tissus clairs, le cuir et les tissus d'ameublement. Les couleurs peuvent déteindre en raison de la nature de la teinture indigo du denim.,;
Kontakt mit hellen Stoffen, Leder und Polstermöbeln vermeiden. Aufgrund der Indigofärbung kann sich die Farbe übertragen,;
Evitar el contacto con tejidos de colores claros, con cuero y con tapicerías. El tinte índigo de los vaqueros podría transferirse a dichas superficies.,;
Evitare il contatto con capi dai colori delicati, pelli e tappezzerie. Si potrebbe verificare una perdita del colore blu intenso del tessuto di jeans.,
']
EDIT:
Here is the entire standalone working function:
import pandas as pd
excel_file = 'C:/Users/user/Desktop/Translation_Table_Edit.xlsx'
translationFile = pd.read_excel(excel_file, encoding='utf-8')
compList = ['Avoid contact with light-coloured fabrics, leather and upholstery. Colour may transfer due to the nature of indigo-dyed denim.', 'Do not soak']
newComp = []
def myFunction():
    global newComp
    for i in range(len(compList)):
        newComp.append(translationFile.loc[translationFile['English'] == compList[i]].to_string(index=False, header=False))
    newComp = [i.replace(' ', ';') for i in newComp]
myFunction()
strippedComp = [item.lstrip() for item in newComp]
print(strippedComp)
This outputs the following:
['Avoid contact with light-coloured fabrics, lea...;Éviter le contact avec les tissus clairs, le c...;Kontakt mit hellen Stoffen, Leder und Polsterm...;Vermijd contact met lichtgekleurde stoffen, le...;Evitar el contacto con tejidos de colores clar...;Evitare il contatto con capi dai colori delica...', 'Do not soak;Ne pas laisser tremper;Nicht einweichen;Niet weken;No dejar en remojo;Non lasciare in ammollo']
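The issue lies in calling to_string on a DataFrame: it builds a display-oriented string, and as your output shows, long cell values come out truncated with '...'. Instead, first extract the values into an array (df_sub.iloc[0].values), and then join the elements of that array (';'.join(...)).
This should do the trick: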
def myFunction():
    global newComp
    for i in range(len(compList)):
        df_sub = translationFile.loc[translationFile['English'] == compList[i]]
        if df_sub.shape[0] > 0:
            newComp.append(';'.join(df_sub.iloc[0].values))
EDIT: suggested code improvements
In addition, (in my opinion) your code could be improved by using pandas functionality instead of looping, following the PEP 8 naming conventions, and avoiding global variables:
import pandas as pd
df_translations = pd.read_excel('./Translation_Table_Edit.xlsx')
to_translate = ['Avoid contact with light-coloured fabrics, leather and upholstery. Colour may transfer due to the nature of indigo-dyed denim.',
                'Do not soak']
def get_translations(df_translations, to_translate, language='English'):
    """Looks up translations for all items in to_translate.
    Returns a list of semicolon-separated translations (empty if none are found)."""
    df_sub = df_translations[df_translations[language].isin(to_translate)].copy()  # filter translations
    df_sub = df_sub.apply(lambda x: x.str.strip())  # strip each cell
    # format and combine translations into a list
    ret = []
    for translation in df_sub.values:
        ret.append(';'.join(translation))
    return ret
translations = get_translations(df_translations, to_translate)

Why is the second loop never executed?

Hi, I am currently working on a Python program in which I need to read a CSV file and use data.append(line) to fill a data array.
I wrote the following part of the program:
print "Lecture du fichier", table1
lecfi = csv.reader(open(table1,'r'),skipinitialspace = 'true',delimiter='\t')
# delimiter = character used to separate the values
tempSize = 0
tempLast = ""
oldSize = 0
# initialise the file size and the last line of the file
if os.path.exists(newFilePath):
    tempSize = os.path.getsize(newFilePath)
else:
    tempSize = 0
if os.path.exists(newFilePath) and tempSize != 0:
    # if the buffer file does not exist, create it
    # read the buffer file
    lecofi = csv.reader(open(newFilePath,'r'),skipinitialspace = 'true',delimiter='\t')
    csvFileArray = []
    for lo in lecofi:
        csvFileArray.append(lo)
    tempLast = str(csvFileArray[0])
    tempLast = tempLast[2:-2]
    oldSize = csvFileArray[1]
    print "Tempon de Last : ", tempLast
    print "Taille du fichier : ", str(oldSize)
    # retrieve the line that was the last line of the old file
else:
    # if the file does not exist, keep this default value for the processing below
    tempLast = None
# fill the data variable with the contents of the pulse file
cpt = 0
indLast = 0
fileSize = os.path.getsize(table1)
if oldSize != fileSize:
    for lecline in lecfi:
        cpt = cpt + 1
        last = str(lecline)
        if tempLast != None and last == tempLast:
            print "TEMPLAST != NONE", cpt
            indLast = cpt
            print "Indice de la derniere ligne : ", indLast
    print last, tempLast
    print "Variable indLast : ", indLast
    i = 0
    for co in lecfi:
        print "\nCOOOOOOO : ", co
        if i == indLast:
            data.append(co[0])
        i=i+1
    for da in data:
        print "\n Variable data : ", da
Now look at the output:
Lecture du fichier Data_Q1/2018-05-23/2018-5-23_13-1-35_P_HOURS_Q1
Tempon de Last : ['(2104.72652']
Taille du fichier : ['20840448']
TEMPLAST != NONE 317127
Indice de la derniere ligne : 317127
['(2104.72652'] ['(2104.72652']
Variable indLast : 317127
It seems like the program ignores everything that follows my first for loop. I assume it is a really basic mistake, but I can't figure it out.
Any help?
You are trying to iterate over the CSV reader twice without resetting it. This is the reason your data array is empty.
The first time, you actually iterate over the file:
for lecline in lecfi:
The second time, the original iterator has already reached its end and is empty:
for co in lecfi:
As mentioned in the comments by Johnny Mopp, one possible solution is described in the following question:
Python csv.reader: How do I return to the top of the file?
Hope this explains your issue.
Here:
for lecline in lecfi:
    cpt = cpt + 1
    # ...
you are reading the whole file. After this loop, the file pointer is at the end of the file and there's nothing more to be read. Hence here:
i = 0
for co in lecfi:
    # ...
this second loop is never executed, indeed. You'd need to either reset the file pointer, or close and reopen the file, or read it in a list right from the start and iterate over this list instead.
FWIW, note that opening files and not closing them is bad practice and can lead to file corruption (not that much in your case since you're only reading but...). A proper implementation would look like:
with open(table1) as tablefile:
    lecfi = csv.reader(tablefile, ....)
    for lecline in lecfi:
        # ....
    tablefile.seek(0)
    for lecline in lecfi:
        # ....
Also, this:
lecofi = csv.reader(open(newFilePath,'r'),skipinitialspace = 'true',delimiter='\t')
csvFileArray = []
for lo in lecofi:
    csvFileArray.append(lo)
would be better rewritten as:
with open(newFilePath) as newFile:
    lecofi = csv.reader(newFile, ...)
    csvFileArray = list(lecofi)
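The exhausted-iterator behaviour is easy to reproduce in isolation. The sketch below is written for Python 3 and uses an in-memory, tab-separated sample (hypothetical data standing in for the real file) to show the empty second pass and the two fixes mentioned above.

```python
import csv
import io

# Hypothetical tab-separated sample standing in for the real data file
tablefile = io.StringIO("a\t1\nb\t2\nc\t3\n")
reader = csv.reader(tablefile, delimiter="\t")

first_pass = [row for row in reader]
second_pass = [row for row in reader]     # the reader is exhausted: nothing left
print(len(first_pass), len(second_pass))  # 3 0

# Fix 1: rewind the underlying file and build a fresh reader
tablefile.seek(0)
rows_again = list(csv.reader(tablefile, delimiter="\t"))

# Fix 2: keep the rows in a list and loop over that list as often as needed
for row in rows_again:
    pass
for row in rows_again:
    pass
print(len(rows_again))  # 3
```

Reading into a list costs memory proportional to the file size, so for very large files rewinding with seek(0) may be the better choice.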

Python: Split CSV with character count

I need help importing a CSV file into Python.
My CSV file:
0,Donc, 2 jours, je me suis rendu compte que Musikfest est le lendemain de voir dmb, quel problème. Signifie que je ne peux pas aller ...
0,Le son est définitivement gâché.Noooooo mon bb
0,Il est le mien! Haha il me suit: ') m'aime et me veut.haha.i wana vivre en Amérique annie
I want to split the above file into 2 columns:
Column1 ---- Column2
0 ---- Donc, 2 jours, je me suis rendu compte que Musikfest est le
lendemain de voir dmb, quel problème. Signifie que je ne peux pas
aller ...
0 ---- Le son est définitivement gâché.Noooooo mon bb
0 ---- Il est le mien! Haha il me suit: ') m'aime et me veut.haha.i wana
vivre en Amérique annie
My text has embedded commas, and the value I need is always the first character. Is it possible to read my CSV file by splitting off the first character from the rest of the text?
You can use str.split() and specify a maximum of 1 split. In other words, if you just want to split each line on the first comma, do not read the file as a CSV. Instead, read it line by line and split each line using line.split(',', 1).
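A minimal sketch of that approach follows. The function name split_label_and_text is hypothetical, and the code assumes every non-empty line contains at least one comma. The sample lines are taken from the question.

```python
import io

def split_label_and_text(lines):
    """Split each line on its first comma only, so commas inside the text survive."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        label, text = line.split(",", 1)  # maxsplit=1: at most two parts
        rows.append((label, text))
    return rows

sample = io.StringIO(
    "0,Donc, 2 jours, je me suis rendu compte que Musikfest est le lendemain "
    "de voir dmb, quel problème. Signifie que je ne peux pas aller ...\n"
    "0,Le son est définitivement gâché.Noooooo mon bb\n"
)
for label, text in split_label_and_text(sample):
    print(label, "----", text)
```

With a real file, the same function can be called on the open file object, since iterating over a file yields its lines.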
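Alternatively, you can use the csv library to work with CSV files: https://docs.python.org/3/library/csv.html#csv.reader
Since csv.reader splits on every comma, join the remaining fields back together with a comma:
import csv
result = []
with open('test.csv') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        # row[0] is the label; re-join the rest so the commas in the text are kept
        result.append((row[0], ','.join(row[1:])))
print(result)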
