Having problem with csv export in python with jupyter - python

I've tried several solutions found on Stack Overflow, and each one gives me a different error. The last one I tried is this:
df = pd.read_csv('arima1.csv', sep=';',parse_dates={'Month':[0, 1]}, index_col = 'Month')
df.head()
plt.xlabel('Data')
plt.ylabel('Receita')
plt.plot(df)
And I get this error:
IndexError: list index out of range
This is my CSV file:
https://drive.google.com/file/d/1BlDo10_Oz1RzFEcosiVgdGickXs4elSA/view?usp=sharing
Thanks.

The CSV file needs cleaning:
df = pd.read_csv("arima1.csv",sep='\"+')
# df['Month']= pd.to_datetime(df['Month,'],format="%m/%d/%Y,")
# df['Receita'] = df['Receita'].apply(lambda x: float(x.replace("R$","").replace(",","")))
# df.set_index(['Month'])['Receita'].plot()

Your separator is ',' not ';'.
When you try to separate by ';', you don't get a column named 'Month'.
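To see why the separator matters, here is a minimal sketch on made-up data. The sample text below is a hypothetical stand-in for arima1.csv (the real file may also need the cleaning described above). With the wrong separator pandas finds only one column, so a request for columns 0 and 1 in parse_dates has nothing to index, which is a plausible source of the IndexError.

```python
import io
import pandas as pd

# Hypothetical stand-in for arima1.csv: comma-delimited, dates as month/day/year.
sample = "Month,Receita\n01/01/2020,100.5\n02/01/2020,120.0\n"

# Wrong separator: every line is kept whole, so there is only one column.
wrong = pd.read_csv(io.StringIO(sample), sep=";")
print(wrong.columns.tolist())

# Right separator: two columns, then convert the dates and use them as the index.
df = pd.read_csv(io.StringIO(sample), sep=",")
df["Month"] = pd.to_datetime(df["Month"], format="%m/%d/%Y")
df = df.set_index("Month")
print(df)
```

Printing the column list right after read_csv is a quick way to check the separator before adding parse_dates or index_col.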

Related

How to preserve complicated excel header formats when manipulating data using Pandas Python?

I am parsing a large excel data file to another one, however the headers are very abnormal. I tried to use "read_excel skiprows" and that did not work. I also tried to include the header in
df = pd.read_excel(user_input, header= [1:3], sheet_name = 'PN Projection'), but then I get this error "ValueError: cannot join with no overlapping index names." To get around this I tried to name the columns by location and that did not work either.
When I run the code as shown below everything works fine, but past cell "U" the header titles become "unnamed1, 2, ...". I understand this is because pandas treats the first row (which is empty) as the header, but how do I fix this? Is there a way to preserve the headers without manually typing in the format for each cell? Any and all help is appreciated, thank you!
small section of the excel file header
the code I am trying to run
#!/usr/bin/env python
import sys
import os
import pandas as pd
#load source excel file
user_input = input("Enter the path of your source excel file (omit 'C:'): ")
#reads the source excel file
df = pd.read_excel(user_input, sheet_name = 'PN Projection')
#Filtering dataframe
#Filters out rows with 'EOL' in column 'item status' and 'xcvr' in 'description'
df = df[~(df['Item Status'] == 'EOL')]
df = df[~(df['Description'].str.contains("XCVR", na=False))]
#Filters in rows with "XC" or "spartan" in 'description' column
df = df[(df['Description'].str.contains("XC", na=False) | df['Description'].str.contains("Spartan", na=False))]
print(df)
#Saving to a new spreadsheet called Filtered Data
df.to_excel('filtered_data.xlsx', sheet_name='filtered_data')
If you do not need the top 2 rows, then:
df = pd.read_excel(user_input, sheet_name='PN Projection', skiprows=range(0, 2))
This has worked for me when handling several strangely formatted files. Let me know if this isn't what you're looking for, or if there are additional issues.
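If the header spans more than one row, another option is to read those rows as a multi-level header and then flatten them. The sketch below is only an illustration: the layout and column names are invented, and it uses read_csv on in-memory text so that it runs without an Excel file. read_excel accepts the same skiprows and header=[0, 1] arguments.

```python
import io
import pandas as pd

# Invented layout: two junk rows, then a two-row header, then data.
raw = (
    "junk line 1,,\n"
    "junk line 2,,\n"
    "Item,Forecast,Forecast\n"
    "Status,Q1,Q2\n"
    "Active,10,20\n"
    "EOL,5,7\n"
)

# Skip the junk rows and treat the next two rows as a two-level header.
df = pd.read_csv(io.StringIO(raw), skiprows=2, header=[0, 1])

# Flatten the two header levels into single names such as "Item Status".
flat = df.copy()
flat.columns = [" ".join(col).strip() for col in df.columns]

# The flattened names can now be used for filtering as in the question.
active = flat[flat["Item Status"] != "EOL"]
print(active)
```

Flattening trades the grouped header for simple names; if the grouping has to survive in the output file, keeping the MultiIndex columns and writing them back out is the alternative.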

How to solve the following errors "Name 'X' is not defined" when Subtracting multiple column data from two CSV files?

I want to subtract multiple columns of data between two text files. Each text file contains 6 unnamed columns, so I named them No, X, Y, Z, Date, Time, separated with commas. I want to compute X-X1, Y-Y1, Z-Z1. Date and Time are not important and are only for reference. To do this I opened the files as separate dataframes, used concat, and wrote a CSV file that contains the data from both text files. When I subtract the columns (X-X1, Y-Y1, Z-Z1) I get the error "Name 'X' is not defined". I also get "AttributeError: 'tuple' object has no attribute 'to_csv'" when trying to produce the file named "difference.csv".
Please help me to solve this error. Below is my code.
import pandas as pd
import os
import numpy as np
df1 = pd.read_csv('D:\\Work\\Data1.txt', names=['No1','X1','Y1','Z1','Date1','Time1'], sep='\s+')
df2 = pd.read_csv('D:\\Work\\Data19.txt', names=['No','X','Y','Z','Date','Time'], sep='\s+')
total=pd.concat([df1,df2], axis=1)
total.to_csv("merge.csv")
cols = ['X','Y','Z','X1','Y1','Z1']
print(total)
df3 = pd.read_csv('C:\\Users\\Admin\\PycharmProjects\\project1\\merge.csv')
df4[X,Y,Z] = df3[X,Y,Z]-df3[X1,Y1,Z1]
print(df4)
df4.to_csv("difference.csv")
How about fixing your code like this?
df1 = pd.read_table('D:\\Work\\Data1.txt', names=['No1','X1','Y1','Z1','Date1','Time1'], sep=r'\s+')
df2 = pd.read_table('D:\\Work\\Data19.txt', names=['No','X','Y','Z','Date','Time'], sep=r'\s+')
total.to_csv('merge.zip', index = False)
Also, I think the columns of the data frame resulting from the code below
total=pd.concat([df1,df2], axis=1)
are
['No1','X1','Y1','Z1','Date1','Time1','No','X','Y','Z','Date','Time']
I hope my answer is helpful.
Try this. Column names must be quoted strings; a bare X is looked up as a Python variable, which causes the NameError.
df3["X2"] = df3["X"]-df3["X1"]
df3["Y2"] = df3["Y"]-df3["Y1"]
df3["Z2"] = df3["Z"]-df3["Z1"]
df4 = df3.loc[:, ["X2", "Y2", "Z2"]]
df4 = df4.rename(columns={"X2": "X", "Y2": "Y", "Z2": "Z"}) #optional
df4.to_csv("difference.csv")
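A shorter variant, sketched here on invented in-memory data rather than the real Data1.txt and Data19.txt, is to give both frames the same column names and subtract the selected columns directly. pandas aligns the two frames on row index and column name, so no intermediate merge file is needed.

```python
import io
import pandas as pd

# Invented whitespace-separated samples standing in for the two text files.
data1 = "1 10.0 20.0 30.0 2020-01-01 00:00:00\n2 11.0 21.0 31.0 2020-01-01 00:01:00\n"
data2 = "1 10.5 20.5 30.5 2020-01-02 00:00:00\n2 11.5 22.0 33.0 2020-01-02 00:01:00\n"

names = ["No", "X", "Y", "Z", "Date", "Time"]
df1 = pd.read_csv(io.StringIO(data1), names=names, sep=r"\s+")
df2 = pd.read_csv(io.StringIO(data2), names=names, sep=r"\s+")

# Column labels are strings, so they go in quotes inside a list.
coords = ["X", "Y", "Z"]
difference = df2[coords] - df1[coords]
print(difference)
```

This assumes both files have the same number of rows in the same order; difference.to_csv("difference.csv") would then write the result.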

Pandas Concatenate dataframes

This is driving me nuts! I have several DataFrames that I am trying to concatenate with pandas. The index is the filename. When I use df.to_csv for individual data frames I can see the index column (filename) along with the column of interest. When I concatenate along the filename axis I only get the column of interest and numbers. No filename.
Here is the code I am using as is. It works as I expect up until the "all_filenames" line.
for filename in os.listdir(directory):
    if filename.endswith("log.csv"):
        df = pd.read_fwf(filename, skiprows=186, nrows=1, names=["Attribute"])
        df['System_Library_Name'] = [x.split('/')[6] for x in df['Attribute']]
        df2= pd.concat([df for filename in os.listdir(directory)], keys=[filename])
        df2.to_csv(filename+"log_info.csv", index=filename)
all_filenames = glob.glob(os.path.join(directory,'*log_info.csv'))
cat_log = pd.concat([pd.read_csv(f) for f in all_filenames ])
cat_log2= cat_log[['System_Library_Name']]
cat_log2.to_excel("log.xlsx", index=filename)
I have tried adding keys=filename to the 3rd to last line and giving the index a name with df.index.name=
I have used similar code before and had it work fine, however this is only one column that I am using from a larger original input file if that makes a difference.
Any advice is greatly appreciated!
df = pd.concat(
    # this is just reading one value from each file, yes?
    [pd.read_fwf(filename, skiprows=186, nrows=1, names=["Attribute"])
       .set_index(pd.Index([filename]))
       .applymap(lambda x: x.split('/')[6])
       .rename(columns={'Attribute':'System_Library_Name'})
     for filename in glob.glob(os.path.join(directory,'*log.csv'))
    ]
)
df.to_excel("log_info.xlsx")
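The general pattern can be tried without any files. In this sketch the frames and file names are invented; passing a dict to pd.concat uses its keys as an index level, and reading the result back with index_col keeps the filename as the index instead of replacing it with row numbers.

```python
import io
import pandas as pd

# Invented one-row frames keyed by the (hypothetical) file they came from.
frames = {
    "a_log.csv": pd.DataFrame({"System_Library_Name": ["libA"]}),
    "b_log.csv": pd.DataFrame({"System_Library_Name": ["libB"]}),
}

# The dict keys become the outer index level, named "filename" here.
combined = pd.concat(frames, names=["filename", "row"])
# Drop the inner row-number level so only the filename remains as the index.
combined = combined.reset_index(level="row", drop=True)

# Round trip through CSV text: index_col restores the filename index on read.
buffer = io.StringIO()
combined.to_csv(buffer)
buffer.seek(0)
restored = pd.read_csv(buffer, index_col="filename")
print(restored)
```

In the question, the second pd.read_csv call has no index_col, so the filename comes back as an ordinary column and is then dropped by selecting only System_Library_Name.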

Error in calling a column from pandas using the column name [duplicate]

I'm trying to read in a CSV file into a pandas dataframe and select a column, but keep getting a key error.
The file reads in successfully and I can view the dataframe in an iPython notebook, but when I want to select a column any other than the first one, it throws a key error.
I am using this code:
import pandas as pd
transactions = pd.read_csv('transactions.csv',low_memory=False, delimiter=',', header=0, encoding='ascii')
transactions['quarter']
This is the file I'm working on:
https://www.dropbox.com/s/81iwm4f2hsohsq3/transactions.csv?dl=0
Thank you!
Use sep=r'\s*,\s*' so that the spaces around the commas do not end up in the column names:
transactions = pd.read_csv('transactions.csv', sep=r'\s*,\s*',
                           header=0, encoding='ascii', engine='python')
Alternatively, make sure that your CSV file has no unquoted spaces and use your command unchanged.
Proof:
print(transactions.columns.tolist())
Output:
['product_id', 'customer_id', 'store_id', 'promotion_id', 'month_of_year', 'quarter', 'the_year', 'store_sales', 'store_cost', 'unit_sales', 'fact_count']
If you need to select multiple columns from a dataframe, use two pairs of square brackets, e.g.:
df[["product_id","customer_id","store_id"]]
I ran into the same problem: key errors when selecting columns after reading from a CSV.
Reason
The main cause is the extra leading whitespace in your CSV file (found in your uploaded CSV file, e.g. , customer_id, store_id, promotion_id, month_of_year, ).
Proof
To check this, try print(list(df.columns)); the column names will be ['product_id', ' customer_id', ' store_id', ' promotion_id', ' month_of_year', ...].
Solution
The direct way to solve this is to pass skipinitialspace to pd.read_csv(), for example:
pd.read_csv('transactions.csv',
sep = ',',
skipinitialspace = True)
Reference: https://stackoverflow.com/a/32704818/16268870
A key error generally occurs when the key doesn't match any of the dataframe's column names exactly.
You could also try:
import csv
import pandas as pd
import re
with open(filename, "r") as file:
    df = pd.read_csv(file, delimiter=",")
# remove one leading and one trailing space from each column name
df.columns = df.columns.str.replace(r"^ ", "", regex=True).str.replace(r" $", "", regex=True)
print(df.columns)
Give the full path of the CSV file in the pd.read_csv(). This works for me.
Datasets split on ',' can end up with feature names that start with a space. Removing the space with a regex might help.
For the time being I did this:
label_name = ' Label'
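The effect described in these answers can be reproduced on a small made-up sample (the data below is invented, not taken from transactions.csv). It compares a plain read, a read with skipinitialspace=True, and stripping the names after the fact.

```python
import io
import pandas as pd

# Invented sample with a space after each comma, as in the question's file.
sample = "product_id, customer_id, quarter\n1, 10, Q1\n2, 20, Q2\n"

# Plain read: every name after the first keeps its leading space.
raw = pd.read_csv(io.StringIO(sample))
print(raw.columns.tolist())

# skipinitialspace=True drops the space after each delimiter, in names and values.
clean = pd.read_csv(io.StringIO(sample), skipinitialspace=True)
print(clean["quarter"].tolist())

# Alternative: strip the column names afterwards (string values keep their spaces).
stripped = raw.rename(columns=str.strip)
print(stripped.columns.tolist())
```

With the plain read, raw["quarter"] raises a KeyError while raw[" quarter"] works, which matches the label_name = ' Label' workaround above.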
