This is my code below. Whenever I run my program, I receive an error stating "AttributeError: 'generator' object has no attribute 'loc'".
I'm trying to replace specific values in a specific column with different values, across every csv file in a folder.
I'm not sure why this is happening
# Get CSV files list from a folder
csv_files = glob.glob(dest_dir + "/*.csv")
# Read each CSV file into DataFrame
# This creates a list of dataframes
df = (pd.read_csv(file) for file in csv_files)
df.loc[df['Plan_Code'].str.contains('NABVCI'), 'Plan_Code'] = 'CLEAR_BV'
df.loc[df['Plan_Code'].str.contains('NAMVCI'), 'Plan_Code'] = 'CLEAR_MV'
df.loc[df['Plan_Code'].str.contains('NA_NRF'), 'Plan_Code'] = 'FA_GUAR'
df.to_csv(csv_files, index=False)
Thanks!
You wrote this:
df = (pd.read_csv(file) for file in csv_files)
Rather than that generator expression, you probably intended to write a list comprehension:
df = [pd.read_csv(file) for file in csv_files]
Additionally, you likely want to call pd.concat(), so that multiple CSVs get incorporated into a single dataframe.
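A minimal sketch of that approach (assuming dest_dir is already defined, as in the question):
import glob
import pandas as pd

csv_files = glob.glob(dest_dir + "/*.csv")

# Read every CSV into its own DataFrame, then stack them into one.
frames = [pd.read_csv(file) for file in csv_files]
combined = pd.concat(frames, ignore_index=True)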
Alternatively, you might prefer to build up a list of dicts pulled from csv.DictReader and then call pd.DataFrame() on that list. Multiple .csv files could contribute rows to the list: one dict per row, without regard to which file the row appears in.
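A rough sketch of that alternative, assuming the same csv_files list as above:
import csv
import pandas as pd

rows = []
for file in csv_files:
    with open(file, newline="") as fh:
        # One dict per row; rows from every file end up in the same list.
        rows.extend(csv.DictReader(fh))

df = pd.DataFrame(rows)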
Because you use round brackets and not square brackets when creating df, df becomes a generator object and not a list of dataframes. But even if you switch to square brackets you will still have a problem: df will now be a list, but lists don't have a loc attribute either, only dataframes -- individual elements of that list -- have it. So df.loc still wouldn't work.
If I understand your intent correctly, you want something like this instead:
csv_files = glob.glob(dest_dir + "/*.csv")
for file in csv_files:
    df = pd.read_csv(file)  # now df is a dataframe, so df.loc makes sense
    # do your df.loc manipulations, then save each df to its own file
    df.to_csv(file, index=False)
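Filling in the manipulations from the question, the loop might look like this (a sketch, assuming every file has a Plan_Code column):
import glob
import pandas as pd

csv_files = glob.glob(dest_dir + "/*.csv")
for file in csv_files:
    df = pd.read_csv(file)
    df.loc[df['Plan_Code'].str.contains('NABVCI'), 'Plan_Code'] = 'CLEAR_BV'
    df.loc[df['Plan_Code'].str.contains('NAMVCI'), 'Plan_Code'] = 'CLEAR_MV'
    df.loc[df['Plan_Code'].str.contains('NA_NRF'), 'Plan_Code'] = 'FA_GUAR'
    df.to_csv(file, index=False)  # overwrite each file with the updated values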
Related
I am trying to remove the 5th and 6th item of each line of my csv file (each line is a list), but when I try to run it I get a "DataFrame constructor not properly called!" error. Please help.
I have tried everything I can, but I can't find a simple way to remove the last 2 items of every list, and after that I want to add 2 items onto every list, each a random int between two different numbers.
Just edit how you're reading the file.
You're reading the file incorrectly, and that's why you are getting that error. You should use
df = pd.read_csv('database.csv')
df = pd.DataFrame() creates a new, empty dataframe.
To read an existing file, you should use
df = pd.read_csv("filename.csv")
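Once the file is read correctly, the rest of what the question describes could look roughly like this (a sketch; the filename, header=None, and the random-int bounds are assumptions):
import numpy as np
import pandas as pd

df = pd.read_csv('database.csv', header=None)

# Drop the last two columns of every row.
df = df.iloc[:, :-2]

# Append two new columns filled with random ints between two chosen bounds.
df[df.shape[1]] = np.random.randint(1, 100, size=len(df))
df[df.shape[1]] = np.random.randint(1, 100, size=len(df))

df.to_csv('database.csv', index=False, header=False)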
I am trying to load multiple CSVs into a single pandas dataframe. They are all in one folder, and all have the same column structure. I have tried a few different methods from a few different threads, and all return the error 'ValueError: No objects to concatenate.' I'm sure the problem is something dumb like my file path? This is what I've tried:
temps = pd.concat(map(pd.read_csv, glob.glob(os.path.join('./Resources/temps', "*.csv"))))
Also this:
path = r'./Resources/temps'
temps_csvs = glob.glob(os.path.join(path, "*.csv"))
df_for_each_csv = (pd.read_csv(f) for f in temps_csvs)
temps_df = pd.concat(df_for_each_csv, ignore_index=True)
Thanks for any help!
It might not be as helpful as other answers, but when I tried running your code, it worked perfectly fine. The only change I made was to the path, like this:
temps_csvs = glob.glob(os.path.join(os.getcwd(), "*.csv"))
df_for_each_csv = (pd.read_csv(f) for f in temps_csvs)
temps_df = pd.concat(df_for_each_csv, ignore_index = True)
and I put the script in the same folder as the csv files.
EDIT: I saw your comment about getting the error ParserError: Error tokenizing data. C error: Expected 5 fields in line 1394, saw 6.
This means the csv files don't have the same number of columns. Here is a question that deals with a similar issue; maybe it will help:
Reading a CSV file with irregular number of columns using Pandas
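If you just want to skip the malformed rows rather than fix the files, one option (not from the linked question, just a sketch that needs pandas 1.3 or newer for on_bad_lines) is:
import glob
import os
import pandas as pd

temps_csvs = glob.glob(os.path.join('./Resources/temps', "*.csv"))

# on_bad_lines="skip" drops rows whose field count doesn't match the header.
frames = [pd.read_csv(f, on_bad_lines="skip") for f in temps_csvs]
temps_df = pd.concat(frames, ignore_index=True)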
Change the round brackets to square brackets on the third line so you get a list comprehension:
[pd.read_csv(f) for f in temps_csvs]
or wrap the expression in tuple(): tuple(pd.read_csv(f) for f in temps_csvs)
There is no "tuple comprehension" in Python; round brackets give you a generator expression instead.
See Why is there no tuple comprehension in Python?
I have to work with 50+ .txt files each containing 2 columns and 631 rows where I have to do different operations to each (sometimes with each other) before doing data analysis. I was hoping there was a way to import each text file under a different dataframe in pandas instead of doing it individually. The code I've been using individually has been
df = pd.read_table(file_name, skiprows=1, index_col=0)
print(df)
I use index_col=0 because the first column is the x-value. I use skiprows=1 because I have to drop the title, which is the first row (and the file name in the folder) of each .txt file. I was thinking maybe I could use the glob package, import everything from the folder as a single dataframe, and then split it into different dataframes while keeping the first column as the name of each variable? Is there a feasible way to import all of these files at once as different dataframes from a folder and store them under the first column name? All .txt files would be dataframes of 2 columns x 631 rows, not including the first title row. All values in the columns are integers.
Thank you
Yes. If you store your files in a list named filelist (maybe using glob), you can use the following to read all the files and store them in a dict:
dfdict = {f: pd.read_table(f, ...) for f in filelist}
Then you can access each dataframe with dfdict["filename.txt"].
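A fuller sketch of that approach, assuming the .txt files sit in the current directory and using the read_table arguments from the question:
import glob
import pandas as pd

filelist = glob.glob("*.txt")

# Key each DataFrame by its filename; skiprows/index_col as in the question.
dfdict = {f: pd.read_table(f, skiprows=1, index_col=0) for f in filelist}

for name, frame in dfdict.items():
    print(name, frame.shape)  # quick sanity check of each file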
I have searched a lot but could not find a solution to my problem.
Here is what i am trying to do:
Read multiple txt(csv) files into dataframes and merge them into one big data frame. They have identical columns! (no problem)
Now when I try to write the final (concatenated) dataframe back into a txt(csv) file, to_csv writes each row as a single quoted string. I need it to be a normal comma-separated csv file, not one string per row.
What am I doing wrong? Maybe I'm not using the concat or append function correctly?
My code:
Reading csv files into dataframes and appending them into a list
dfs = []
for f in filenames:
    df = pd.read_csv(f, delimiter='\t', header=None)
    dfs.append(df)
concat them into final dataframe
dfs=pd.concat(dfs,axis=0)
Writing the final dataframe into one txt(csv) file
dfs.to_csv('merged.txt',header=None,index=False)
Here is the problem: the first row of the merged.txt file looks like this (only copying a few values from a single row):
"131118091409,-400.198565,-0.018061"
How can I write the file without the double quotes (strings) per row? Thanks for the help.
Instead of this:
df = pd.read_csv(f, delimiter='\t', header=None)
remove the delimiter='\t', or use this instead:
df = pd.read_csv(f, delimiter=',', header=None)
Because the input files are comma separated, parsing them with a tab delimiter leaves each whole line in a single column, and that one big value (commas and all) gets quoted when you write it back out.
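Putting the whole flow together, a minimal sketch might be (assuming filenames is the same list as in the question):
import pandas as pd

dfs = []
for f in filenames:
    # The files are comma separated, so let read_csv use its default delimiter.
    dfs.append(pd.read_csv(f, header=None))

merged = pd.concat(dfs, axis=0)
merged.to_csv('merged.txt', header=False, index=False)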
My ultimate goal is to merge the contents of a folder full of .xlsx files into one big file.
I thought the below code would suffice, but it only does the first file, and I can't figure out why it stops there. The files are small (~6 KB), so it shouldn't be a matter of waiting. If I print f_list, it shows the complete list of files. So, where am I going wrong? To be clear, there is no error returned, it just does not do the entire for loop. I feel like there should be a simple fix, but being new to Python and coding, I'm having trouble seeing it.
I'm doing this with Anaconda on Windows 8.
import pandas as pd
import glob
f_list = glob.glob("C:\\Users\\me\\dt\\xx\\*.xlsx") # creates my file list
all_data = pd.DataFrame() # creates my DataFrame
for f in f_list: # basic for loop to go through file list but doesn't
    df = pd.read_excel(f) # reads .xlsx file
    all_data = all_data.append(df) # appends file contents to DataFrame
all_data.to_excel("output.xlsx") # creates new .xlsx
Edit with new information:
After trying some of the suggested changes, I noticed the output claiming the files are empty, except for 1 of them which is slightly larger than the others. If I put them into the DataFrame, it claims the DataFrame is empty. If I put it into the dict, it claims there are no values associated. Could this have something to do with the file size? Many, if not most, of these files have 3-5 rows with 5 columns. The one it does see has 12 rows.
I strongly recommend reading the DataFrames into a dict:
sheets = {f: pd.read_excel(f) for f in f_list}
For one thing this is very easy to debug: just inspect the dict in the REPL.
Another is that you can then concat these into one DataFrame efficiently in one pass:
pd.concat(sheets.values())
Note: This is significantly faster than append, which has to allocate a temporary DataFrame at each append-call.
Another possibility is that your glob is not picking up all the files; you should check that it is by printing f_list.
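Putting those pieces together, a sketch of the whole script might look like this (path as in the question):
import glob
import pandas as pd

f_list = glob.glob("C:\\Users\\me\\dt\\xx\\*.xlsx")
print(f_list)  # confirm the glob actually found every file

# One DataFrame per workbook, keyed by filename -- easy to inspect in the REPL.
sheets = {f: pd.read_excel(f) for f in f_list}

all_data = pd.concat(sheets.values())
all_data.to_excel("output.xlsx")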