I am trying to print the rows of a dataframe one by one, but I only manage to loop over the columns instead of the rows.
First I import from a CSV:
table_csv = pd.read_csv(r'C:\Users\xxx\Desktop\table.csv',sep=';', error_bad_lines=False)
Next, I convert it into a DataFrame using pandas:
table_dataframe = DataFrame(table_csv)
I then start the for loop as follows:
for row in table_dataframe:
    print(row)
This, however, loops over the columns instead of the rows, and I need to perform alterations on rows. Does anybody know where this goes wrong, or have an alternative solution?
Check out answers to this question
In short this is how you'd do it:
for index, row in table_dataframe.iterrows():
    print(row)
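If you don't need a full Series per row, `itertuples()` is a commonly recommended, faster alternative to `iterrows()`. A minimal sketch with made-up column names and data:

```python
import pandas as pd

# Small example frame; the column names and values are made up.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# itertuples() yields one namedtuple per row and is typically faster
# than iterrows(), which constructs a Series for every row.
rows = []
for row in df.itertuples(index=False):
    rows.append(row.a + row.b)

print(rows)  # [11, 22, 33]
```

The trade-off: `itertuples()` gives attribute access (`row.a`) instead of label indexing, and column names that aren't valid identifiers get positional names.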
I have a csv file, the columns are:
Date,Time,Spread,Result,Direction,Entry,TP,SL,Bal
And typical entries would look like:
16/07/21,01:25:05,N/A,No Id,Null,N/A,N/A,N/A,N/A
16/07/21,01:30:06,N/A,No Id,Null,N/A,N/A,N/A,N/A
16/07/21,01:35:05,8.06,Did not qualify,Long,N/A,N/A,N/A,N/A
16/07/21,01:38:20,6.61,Trade,Long,1906.03,1912.6440000000002,1900.0,1000.0
16/07/21,01:41:06,N/A,No Id,Null,N/A,N/A,N/A,N/A
How would I access the latest entry where the Result column is equal to Trade, preferably without looping through the whole file?
If it must be a loop, it would have to loop backwards from latest to earliest because it is a large csv file.
If you want to use pandas, try using read_csv with loc:
df = pd.read_csv('yourcsv.csv')
print(df.loc[df['Result'] == 'Trade'].iloc[[-1]])
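Note the double brackets in `.iloc[[-1]]`: they keep the result as a one-row DataFrame, whereas single brackets return a Series. A small sketch with made-up data in the question's shape:

```python
import pandas as pd

# Made-up sample in the question's format (trimmed to two columns).
df = pd.DataFrame({"Result": ["No Id", "Trade", "No Id", "Trade"],
                   "Entry": [0.0, 1906.03, 0.0, 1910.5]})

trades = df.loc[df["Result"] == "Trade"]
last_as_frame = trades.iloc[[-1]]   # 1-row DataFrame (keeps columns/layout)
last_as_series = trades.iloc[-1]    # Series (one value per column)

print(last_as_frame.shape)        # (1, 2)
print(last_as_series["Entry"])    # 1910.5
```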
Load your .csv into a pd.DataFrame and you can get all the rows where df.Results equals Trade like this:
df[df.Result == 'Trade']
If you only want the last one, use .iloc:
df[df.Result == 'Trade'].iloc[-1]
I hope this is what you are looking for.
I suggest you use pandas, but in case you really cannot, here's an approach.
Assuming the data is in data.csv:
from csv import reader

with open("data.csv") as data:
    rows = list(reader(data))

col = rows[0].index('Result')
res = [row for i, row in enumerate(rows) if i > 0 and row[col] == 'Trade']
I advise against using this; it is way too brittle.
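Since the question asked for the latest match without scanning forward, the same stdlib approach can scan the parsed rows in reverse and stop at the first hit. A sketch using an in-memory sample instead of a real file (the data is made up in the question's format):

```python
import csv
import io

# Stand-in for open("data.csv"): a small in-memory sample.
sample = """Date,Time,Spread,Result,Direction,Entry,TP,SL,Bal
16/07/21,01:25:05,N/A,No Id,Null,N/A,N/A,N/A,N/A
16/07/21,01:38:20,6.61,Trade,Long,1906.03,1912.644,1900.0,1000.0
16/07/21,01:41:06,N/A,No Id,Null,N/A,N/A,N/A,N/A
"""

rows = list(csv.reader(io.StringIO(sample)))
col = rows[0].index("Result")

# Walk the data rows from the end; next() stops at the first match,
# so nothing earlier than the latest Trade row is examined.
latest_trade = next((r for r in reversed(rows[1:]) if r[col] == "Trade"), None)
print(latest_trade[0], latest_trade[1])  # 16/07/21 01:38:20
```

This still reads the whole file into memory first; truly reading a large file backwards from disk needs chunked seeks, which the stdlib does not provide out of the box.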
I am trying to create a DataFrame from a simple if statement result with no success. Could you show me the right method, please? This is what I have so far but the value of discrep is not added to the DataFrame.
discrepancy_value = round(system_availability.iloc[0, 0] - data_av.iloc[0, 0], 2)

discrep = []
if discrepancy_value > 1:
    discrep = discrepancy_value
else:
    discrep = r'Discrepancy is not significant'

discrepancy = pd.DataFrame()
discrepancy['Discrepancy'] = discrep
Your problem is that you are trying to insert a single value into the dataframe. The dataframe column needs a list, not a scalar value.
What you should be doing is:
discrep = []
if discrepancy_value > 1:
    discrep.append(discrepancy_value)
else:
    discrep.append(r'Discrepancy is not significant')

discrepancy = pd.DataFrame()
discrepancy['Discrepancy'] = discrep
On one line:
discrepancy = pd.DataFrame({'Discrepancy': [discrepancy_value if discrepancy_value > 1 else r'Discrepancy is not significant']})
You are trying to set a column on an empty dataframe with 0 rows. If there were already rows in the dataframe, the following would add the same value to all of them:
discrepancy['Discrepancy']=discrep
But because there are no rows in the dataframe, the column is not added to any row.
You could append a new row with the column value like this (note that append returns a new dataframe rather than modifying discrepancy in place):
discrepancy = discrepancy.append([{'Discrepancy': discrep}])
Or add the row when you create the dataframe:
discrepancy=pd.DataFrame([{'Discrepancy': discrep}])
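One caveat about the append variant above: DataFrame.append was deprecated and then removed in pandas 2.0, so on current pandas the row-adding step would use pd.concat instead. A sketch with a made-up existing row and value:

```python
import pandas as pd

discrep = "Discrepancy is not significant"  # example value, as in the question

# Pretend the dataframe already holds one row (made-up value).
discrepancy = pd.DataFrame([{"Discrepancy": 0.0}])

# DataFrame.append was removed in pandas 2.0; pd.concat is the
# replacement for adding rows to an existing frame. Like append,
# it returns a new object rather than mutating its inputs.
new_row = pd.DataFrame([{"Discrepancy": discrep}])
discrepancy = pd.concat([discrepancy, new_row], ignore_index=True)

print(discrepancy["Discrepancy"].tolist())
```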
Trying to iterate over a dataframe using iterrows, but it's telling me it is not defined.
After opening the Excel file with read_excel and getting the data into what I believe to be a dataframe, it will not let me use iterrows() on the dataframe:
df = pd.read_excel('file.xlsx')
objDF = pd.DataFrame(df['RDX'])  # throws "does not exist"
for i, r in objDF.iterrows():
    # do stuff
    pass
Expected to be able to iterate over the rows and perform a calculation
Why are you trying to create a dataframe from a dataframe? Is the sole intention to just iterate across one column of the original dataframe? If so, you could access the column as follows:
df = pd.read_excel('file.xlsx')
for index, row in df.iterrows():
    print(row['RDX'])
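And if the goal is "perform a calculation" on that column, a vectorized operation on the whole column is usually both simpler and much faster than iterrows. A sketch with made-up values standing in for the spreadsheet data (the column name `RDX` comes from the question; the calculation is illustrative):

```python
import pandas as pd

# Made-up data standing in for the Excel column from the question.
df = pd.DataFrame({"RDX": [1.0, 2.5, 4.0]})

# Vectorized: operates on the whole column at once, no Python-level loop.
df["RDX_doubled"] = df["RDX"] * 2

print(df["RDX_doubled"].tolist())  # [2.0, 5.0, 8.0]
```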
I have an array of unique elements and a dataframe.
I want to find out whether the elements in the array exist in every row of the dataframe.
P.S. I am new to Python.
This is the piece of code I've written.
for i in uniqueArray:
    for index, row in newDF.iterrows():
        if i in row['MKT']:
            # do something to find out if the element i exists in all rows
Also, this way of iterating is quite expensive, is there any better way to do the same?
Thanks in Advance.
Pandas allows you to filter a whole column as if it were Excel:
import pandas

df = pandas.DataFrame(tableData)
Imagine your column names are "Column1", "Column2", etc.
df2 = df[df["Column1"] == "ValueToFind"]
df2 now has only the rows that have "ValueToFind" in df["Column1"]. You can combine several filters with the element-wise logical operators & (AND) and | (OR).
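Combining filters this way needs `&`/`|` rather than Python's `and`/`or`, and each condition must be parenthesized because of operator precedence. A sketch with made-up data:

```python
import pandas as pd

# Made-up table matching the "Column1", "Column2" naming above.
df = pd.DataFrame({"Column1": ["a", "b", "a", "c"],
                   "Column2": [1, 2, 3, 4]})

# Element-wise & and | replace Python's `and`/`or` for boolean Series;
# the parentheses around each condition are required.
both = df[(df["Column1"] == "a") & (df["Column2"] > 1)]
either = df[(df["Column1"] == "a") | (df["Column2"] == 4)]

print(len(both), len(either))  # 1 3
```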
You can try:
for i in uniqueArray:
    if newDF['MKT'].str.contains(i).any():
        # do your task
Note the .str accessor: a Series has no .contains method of its own; the string method lives under .str.
You can use the isin() method of the pd.Series object.
Assuming you have a dataframe named df, you can check which rows of your column 'MKT' match any items of your uniqueArray:
new_df = df[df.MKT.isin(uniqueArray)].copy()
new_df will only contain the rows where the value of MKT is contained in uniqueArray.
Now do your things on new_df, and join/merge/concat to the former df as you wish.
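A small self-contained sketch of the isin approach, with made-up values for uniqueArray and MKT; it also shows a loop-free way to get the reverse answer (which array elements appear anywhere in the column):

```python
import pandas as pd

uniqueArray = ["AAPL", "MSFT", "TSLA"]            # made-up elements
df = pd.DataFrame({"MKT": ["AAPL", "GOOG", "MSFT", "AAPL"]})

# Rows whose MKT value appears in the array:
new_df = df[df.MKT.isin(uniqueArray)].copy()

# The reverse check -- which array elements appear in the column --
# is a plain set intersection, no iterrows needed:
present = set(uniqueArray) & set(df["MKT"])

print(len(new_df), sorted(present))  # 3 ['AAPL', 'MSFT']
```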
I am trying to insert or add from one dataframe to another dataframe. I am going through the original dataframe looking for certain words in one column. When I find one of these terms I want to add that row to a new dataframe.
I get the row by using:
entry = df.loc[df['A'] == item]
But when trying to add this row to another dataframe using .add, .insert, .update, or other methods, I just get an empty dataframe.
I have also tried adding the column to a dictionary and turning that into a dataframe, but it writes data for the entire row rather than just the column value. So is there a way to add one specific row to a new dataframe from my existing variable?
So the entry is a dataframe containing the rows you want to add?
You can simply concatenate two dataframes using the concat function if both have the same column names:
import pandas as pd

entry = df.loc[df['A'] == item]
concat_df = pd.concat([new_df, entry])
pandas.concat reference:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html
The append function expects a list of rows in this format:
[row_1, row_2, ..., row_N]
where each row is a list representing the value for each column.
So, assuming you're trying to add one row, you should use:
entry = df.loc[df['A'] == item]
df2 = df2.append([entry])
Notice that unlike Python's list, the DataFrame.append function returns a new object rather than changing the object it was called on.
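Be aware that DataFrame.append was removed in pandas 2.0, so on current pandas this answer's pattern becomes a pd.concat call. A sketch with made-up frames (the column name 'A' and the filter mirror the question):

```python
import pandas as pd

# Made-up source frame and destination frame.
df = pd.DataFrame({"A": ["x", "y", "x"], "B": [1, 2, 3]})
df2 = pd.DataFrame({"A": ["w"], "B": [0]})

entry = df.loc[df["A"] == "x"]   # the matching rows, as a DataFrame

# pd.concat is the post-2.0 replacement for df2.append(entry);
# like append, it returns a new object instead of mutating df2.
df2 = pd.concat([df2, entry], ignore_index=True)

print(df2["A"].tolist())  # ['w', 'x', 'x']
```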
Not sure how large your operations will be, but from an efficiency standpoint, you're better off adding all of the found rows to a list, and then concatenating them together at once using pandas.concat, and then using concat again to combine the found entries dataframe with the "insert into" dataframe. This will be much faster than using concat each time. If you're searching from a list of items search_keys, then something like:
entries = []
for key in search_keys:
    entry = df.loc[df['A'] == key]
    entries.append(entry)

found_df = pd.concat(entries)
result_df = pd.concat([old_df, found_df])
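The batching idea above can be seen end to end in a runnable sketch; all the frame contents and search keys here are made up for illustration:

```python
import pandas as pd

# Made-up source data, destination frame, and search keys.
df = pd.DataFrame({"A": ["x", "y", "z", "x"], "B": [1, 2, 3, 4]})
old_df = pd.DataFrame({"A": ["w"], "B": [0]})
search_keys = ["x", "z"]

# Collect every matching slice first, then concatenate once:
# one concat over N frames is much cheaper than N incremental concats,
# each of which would copy the accumulated result.
entries = [df.loc[df["A"] == key] for key in search_keys]
found_df = pd.concat(entries)
result_df = pd.concat([old_df, found_df], ignore_index=True)

print(result_df["A"].tolist())  # ['w', 'x', 'x', 'z']
```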