I am having a rough time getting my code (Python 3) to read a txt file. I am using pandas, and it reads the file with the right number of rows, but it treats each line as a single value, so the entire DataFrame ends up in one column, 0. Here is an example of the code.
import pandas as pd
import numpy as np
data = pd.read_csv(r'file.txt',header=None)
I have also tried setting the delimiter/separator in that line, for example sep='\t' or sep=' ', but then it could not read the file.
Here is an example of what the file looks like.
JK+0923 7.05 19.3 200.4 -56.1 0.140 0.022 2010 GHT-Jermi
As you can see, there is no header.
Either way, would like help. Thanks.
I want it to read the columns correctly.
import pandas as pd
import numpy as np
data = pd.read_csv(r'asd.txt',header=None,sep='\t')
This should work if the delimiter in your file is a tab.
Alternatively, you can pass a regex such as r'\s+' as the value of sep, so that any run of whitespace (one or more spaces or tabs) is treated as the delimiter.
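A minimal sketch of the regex option, assuming the file is separated by runs of whitespace. The first row is the sample line from the question; the second row and its mixed spacing are made up for illustration, and io.StringIO stands in for the real file.

```python
import io

import pandas as pd

# First row is the sample line from the question; the second row is invented
# and deliberately mixes several spaces and a tab between fields.
sample = (
    "JK+0923 7.05 19.3 200.4 -56.1 0.140 0.022 2010 GHT-Jermi\n"
    "JK+0924   7.10\t19.8 201.0 -55.9 0.141 0.023 2011 GHT-Jermi\n"
)

# header=None because the file has no header row;
# sep=r"\s+" splits on any run of whitespace.
data = pd.read_csv(io.StringIO(sample), header=None, sep=r"\s+")

print(data.shape)
print(data.head())
```

With a real file, replace io.StringIO(sample) with the path, e.g. r'file.txt'.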
By default, pd.read_csv() treats the first line of the file as a header. You can turn that off with the header=None parameter; see this question for more details:
Pandas read in table without headers
As you pointed out in your question, you have already tried to specify the delimiter when reading in the file, so the combination of both should help you read the file in correctly:
data = pd.read_csv(r'file.txt',header=None, sep='\t')
My code is:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset= pd.read_csv('libro1.csv')
When I open the file in Excel it has 60 rows and 14 columns,
but pandas shows me a DataFrame of shape (59, 1).
Pandas parses the first row as a header, so 59 rows is correct in your case. You can disable this with the header=None parameter.
Regarding the columns: your csv file probably uses a non-standard delimiter such as \t. Pandas assumes a comma by default. Open the file in a simple text editor, check which delimiter it uses, and set the sep parameter if it is not a comma.
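If checking the delimiter by hand is inconvenient, pandas can also try to detect it. This is a rough sketch, assuming a semicolon-separated file; the sample text and column names are made up. Passing sep=None together with engine="python" asks pandas to sniff the delimiter from the first line, which is a guess and can be wrong on unusual files.

```python
import io

import pandas as pd

# Hypothetical file content: semicolon-separated, with a header row.
text = "id;name;price\n1;apple;0.5\n2;pear;0.75\n"

# sep=None requires the Python engine; pandas then uses csv.Sniffer
# on the first line to guess the delimiter.
dataset = pd.read_csv(io.StringIO(text), sep=None, engine="python")

print(dataset.shape)
print(list(dataset.columns))
```

If the detected columns look wrong, fall back to setting sep explicitly.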
Is this truly a .csv file, or is it an Excel (.xls / .xlsx) file?
If it is an Excel file, you should open it with read_excel rather than read_csv.
I just want to import this csv file. It can read it but somehow it doesn't create columns. Does anyone know why?
This is my code:
import pandas as pd
songs_data = pd.read_csv('../datasets/spotify-top50.csv', encoding='latin-1')
songs_data.head(n=10)
Result that I see in Jupyter:
P.S.: I'm kinda new to Jupyter and programming, but from everything I found, this should work properly. I don't know why it doesn't.
To load a csv file properly you should specify some parameters. For example, in your case you need to specify quotechar:
df = pd.read_csv('../datasets/spotify-top50.csv',quotechar='"',sep=',', encoding='latin-1')
df.head(10)
If you still have a problem you should have a look at your CSV file again and also pandas documentation, so that you can set parameters to match with your CSV file structure.
My csv file looks like the following:
As you can see, there are 7 comma-separated columns. I have spent hours trying to read and plot the first column, which starts with 31364, using the following code:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('test.csv', sep=',', header=None, names=['colA','colB','colC','colD','colE','colF','colG'])
y = df['colA']
plt.plot(y)
But the code outputs this plot which does not match the data at all:
I'm using Spyder with Anaconda. What could be the problem?
Is column A all values in the 31,000 range? You're not plotting the whole file.
edit: I'm not sure what result you're looking for. In your code, the first column of your csv is used as the index of the DataFrame (after you read the csv, enter df at the Python prompt to see what your dataset looks like).
If you don't want the first column of the csv used as the index, add index_col=False to the parameters when you read the csv in.
Also, it is not a good idea to end lines in a csv with the delimiter (a comma in this case).
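A small sketch of the index_col=False fix, assuming every line ends with a trailing comma as described. The data values and the three column names are invented for illustration (the question's file has seven columns), and io.StringIO stands in for test.csv.

```python
import io

import pandas as pd

# Hypothetical data: three real columns, each line ending with a trailing comma,
# so the parser sees four fields per row.
text = "31364,1.0,2.0,\n31365,3.0,4.0,\n31366,5.0,6.0,\n"
names = ["colA", "colB", "colC"]

# index_col=False tells pandas not to use the first column as the index,
# so colA really is the first field of each line.
df = pd.read_csv(
    io.StringIO(text), sep=",", header=None, names=names, index_col=False
)

y = df["colA"]
print(y)
```

Without index_col=False, the extra field per row makes pandas shift the names by one, so df['colA'] would hold the second field instead of the first.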
import pandas as pd
check = pd.read_csv('1.csv')
nocheck = check['CUSIP'].str[:-1]
nocheck = nocheck.to_frame()
nocheck['CUSIP'] = nocheck['CUSIP'].astype(str)
nocheck.to_csv('NoCheck.csv')
This works, but when the resulting csv file is opened in Excel, an identifier such as 0003418 (type = str) is shown as 3418 (type = General). How do I avoid this?
I couldn't find a dupe for this question, so I'll post my comment as a solution.
This is an Excel issue, not a python error. Excel autoformats numeric columns to remove leading 0's. You can "fix" this by forcing pandas to quote when writing:
import csv
# insert pandas code from question here
# use csv.QUOTE_ALL when writing CSV.
nocheck.to_csv('NoCheck.csv', quoting=csv.QUOTE_ALL)
Note that this will actually put quotes around each value in your CSV. It will render the way you want in Excel, but you may run into issues if you try to read the file some other way.
Another solution is to write the CSV without quoting and import the column into Excel as "Text" rather than letting Excel treat it as a number.
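To see what csv.QUOTE_ALL actually writes, here is a minimal sketch with made-up identifiers, using an in-memory buffer instead of NoCheck.csv. The dtype=str round trip at the end is an addition, not part of the answer above: it shows one way to keep the leading zeros when the file is read back with pandas, which would otherwise parse the quoted values as integers.

```python
import csv
import io

import pandas as pd

# Hypothetical identifiers with leading zeros, stored as strings.
nocheck = pd.DataFrame({"CUSIP": ["0003418", "0012345"]})

# Write with every field quoted.
buf = io.StringIO()
nocheck.to_csv(buf, index=False, quoting=csv.QUOTE_ALL)
text = buf.getvalue()
print(text)

# Reading back: force the column to str so the zeros are not dropped.
roundtrip = pd.read_csv(io.StringIO(text), dtype={"CUSIP": str})
print(roundtrip["CUSIP"].tolist())
```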
I'm using the following Python code with the pandas library. The purpose of the code is to join 2 CSV files, and it works as expected. In the CSV files all the values are enclosed in "". After going through pandas, the quotes disappear. I wonder what I can do to keep them? I have read the documentation and tried lots of options but can't seem to get it right.
Any help is much appreciated.
Code:
import pandas
csv1 = pandas.read_csv('WS-Produktlista-2015-01-25.csv', quotechar='"',comment='"')
csv2 = pandas.read_csv('WS-Prislista-2015-01-25.csv', quotechar='"', comment='"')
merged = csv1.merge(csv2, on='id')
merged.to_csv("output.csv", index=False)
Instead of getting a line like this:
"1","Cologne","4711","4711","100ml",
I'm getting:
1,Cologne,4711,4711,100ml,
EDIT:
I have now found the problem. My files contain a header with 16 columns. The data lines contain 16 values separated by ",".
I just found that some lines contain values within "" that themselves contain ",". This confuses the parser: instead of the expected 15 commas, it finds 18. One example below:
"23210","Cosmetic","Lancome","Eyes Virtuose Palette Makeup",**"7,2g"**,"W","Decorative range","5x**1,2**g Eye Shadow + **1,2**g Powder","http://image.jpg","","3660732000104","","No","","1","1"
How can I make the parser ignore commas within ""?
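A possible approach, sketched with a cut-down, made-up version of the data (three columns instead of 16). It assumes the default quotechar='"' is enough for commas inside quoted values to be kept as part of the value, and it leaves out comment='"', since that option tells the parser to treat everything from a quote onward as a comment. Writing with quoting=csv.QUOTE_ALL puts the quotes back in the output, and dtype=str keeps values such as ids unchanged.

```python
import csv
import io

import pandas as pd

# Hypothetical, shortened stand-in for one of the CSV files.
raw = (
    '"id","name","size"\n'
    '"23210","Eyes Virtuose Palette Makeup","7,2g"\n'
    '"1","Cologne","100ml"\n'
)

# The default quotechar is '"', so the comma inside "7,2g" is not a separator.
csv1 = pd.read_csv(io.StringIO(raw), dtype=str)
print(csv1.shape)

# Quote every field on output to keep the "..." around each value.
out = io.StringIO()
csv1.to_csv(out, index=False, quoting=csv.QUOTE_ALL)
print(out.getvalue())
```

The same read and write settings can be applied to both input files before and after the merge.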