Cannot read content in CSV File in Pandas - python

I have a dataset from the State Security Department in my county that has some problems.
I can't read any records from the CSV version of the file: pandas returns only empty records. When I convert the file to XLSX, it is read correctly.
Is there any way to solve this?
The dataset is available at: here or here.
I tried the code below, but I only get nulls, except for the first row of the first column:
df = pd.read_csv('mensal_ss.csv', sep=';', names=cols, encoding='latin1')
Thank you!

If you use utf-16 as the encoding, it seems to work. Note, however, that the year rows complicate the parsing, so depending on what you want to do with the data, you may need some extra manipulation of the CSV to work around them.
df = pd.read_csv('mensal_ss.csv', sep=';', encoding='utf-16')
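Why the encoding matters here, as a minimal sketch: the sample text below is hypothetical (the real file's contents are not shown), and it assumes the original file is UTF-16 with a byte-order mark (BOM), which is what the answer's fix implies. UTF-16 stores a 0x00 byte next to every ASCII character, so decoding it as latin1 produces text full of null characters instead of readable values.

```python
import io

import pandas as pd

# Hypothetical sample, encoded the way the answer assumes the real file is:
# UTF-16 with a BOM (Python's "utf-16" codec prepends a BOM when encoding).
text = "Natureza;Jan\nHomicidio;10\nRoubo;25\n"
raw = text.encode("utf-16")

# Decoding as latin1 keeps the BOM bytes and the 0x00 byte stored beside
# every ASCII character, so the text is corrupted before parsing even starts.
print(repr(raw[:8].decode("latin1")))

# The "utf-16" codec reads the BOM, picks the byte order and drops the BOM.
df = pd.read_csv(io.BytesIO(raw), sep=";", encoding="utf-16")
print(df)
```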

Try 'utf-16-le':
import pandas as pd
df = pd.read_csv('mensal_ss.csv', sep=';', encoding='utf-16-le')
print(df.head())
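Both answers work when the file starts with a UTF-16 little-endian BOM. The difference is that the "utf-16" codec reads the BOM to decide the byte order and strips it, while "utf-16-le" assumes little-endian and does not consume the BOM itself. If you don't know the encoding in advance, a hypothetical helper like `sniff_bom_encoding` below can guess it from the first bytes using the BOM constants in Python's `codecs` module. It is a sketch: files without a BOM simply fall back to the `default` you pass.

```python
import codecs
import os
import tempfile

import pandas as pd


def sniff_bom_encoding(path, default="latin1"):
    """Guess an encoding from the file's byte-order mark (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(4)
    # Check UTF-32 first: the UTF-32-LE BOM starts with the UTF-16-LE BOM bytes.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec consumes the BOM and picks the byte order
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the UTF-8 BOM
    return default  # no BOM: fall back to the caller's guess


# Demo on a throwaway file written as UTF-16 (with BOM).
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write("a;b\n1;2\n".encode("utf-16"))
enc = sniff_bom_encoding(tmp.name)
df = pd.read_csv(tmp.name, sep=";", encoding=enc)
os.remove(tmp.name)
print(enc, df.shape)
```

With the question's file this would be used as `pd.read_csv('mensal_ss.csv', sep=';', encoding=sniff_bom_encoding('mensal_ss.csv'))`.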

Related

Export DataFrame from Python to CSV

I have a data frame I created with my original data appended with the topics from topic modeling. I keep running into errors when trying to export the data table into csv.
I've tried both csv module and pandas but get errors from both.
The data table has 1765 rows so writing the file row by row is not really an option.
When using pandas, the most common errors are
DataFrame constructor not properly called!
and
function object has no attribute 'to_csv'
Code used:
import pandas as pd
data = (before.head)
df = pd.DataFrame(before.head)
df.to_csv (r'C:\Users\***\Desktop\beforetopics.csv', index = False, header=True)
print (df)
For the CSV module, there have been several errors such as
iterable expected, not method
Basically, how do I export this table (screenshot attached) into a csv file?
What is the command that you're trying to run?
Try this:
dataframe.to_csv('file_name.csv')
Or, if it's a Unicode error you're running into, try this:
dataframe.to_csv('file_name.csv', header=True, index=False, encoding='utf-8')
Since your DataFrame is named before, try this:
before.to_csv('file_name.csv', header=True, index=False, encoding='utf-8')
You can use the to_csv function:
before.to_csv('file_name.csv')
If you need extra options, check the documentation here.
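The likely source of the errors in the question is `before.head` written without parentheses: that expression is the bound method itself, not data, so `pd.DataFrame(before.head)` and the csv module both receive a method instead of rows. A minimal sketch, with a small hypothetical `before` DataFrame standing in for the real one and an in-memory buffer standing in for the output path:

```python
import io

import pandas as pd

# Hypothetical stand-in for the question's `before` DataFrame.
before = pd.DataFrame({"doc": ["a", "b", "c"], "topic": [0, 2, 1]})

# Without parentheses, `before.head` is a method object, not a DataFrame.
print(callable(before.head))
# With parentheses it returns a DataFrame, but only the first 5 rows.
print(type(before.head()))

# To export all rows, call to_csv on the DataFrame itself; no copy is needed.
buf = io.StringIO()  # stands in for a real path such as 'beforetopics.csv'
before.to_csv(buf, index=False, header=True)
print(buf.getvalue())
```

Note that even the corrected `before.head()` would export only five rows, so for all 1765 rows call `to_csv` on `before` directly.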

importing a dat file into pandas dataframe

I have a .dat file downloaded from the address below:
ratings
I need to import it as a pandas DataFrame. I've used the code below:
ratings = pd.read_csv('ratings.dat', sep='::', header=None, names=['user_id', 'movie_id', 'rating', 'timestamp'])
But the resulting DataFrame, shown below, is wrong.
Should I use another method to import .dat files?
I've also checked the below link but that doesn't help me.
Read data (.dat file) with Pandas
Works well for me:
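import pandas as pd
DAT = pd.read_csv('https://raw.githubusercontent.com/wesm/pydata-book/1st-edition/ch02/movielens/ratings.dat',
                  sep='::', engine='python',  # multi-character separators are only supported by the python engine
                  header=None, names=['user_id', 'movie_id', 'rating', 'timestamp'])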
DAT.head(5)
Are you sure you read the proper file (the raw one from GitHub)?
The URL you mention does not point to the proper file; use the raw one.
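To check the parsing separately from the download, here is a minimal offline sketch: the three lines below are sample data in the `user::movie::rating::timestamp` layout of the MovieLens ratings file, held in memory. A `sep` longer than one character is treated as a regular expression, which only the Python engine supports, so passing `engine='python'` explicitly avoids pandas' fallback warning. If the same call on your downloaded file still looks wrong, the file is probably the GitHub HTML page rather than the raw data.

```python
import io

import pandas as pd

# Sample lines in the ratings.dat layout: user::movie::rating::timestamp.
sample = "1::1193::5::978300760\n1::661::3::978302109\n2::1357::5::978298709\n"

ratings = pd.read_csv(
    io.StringIO(sample),
    sep="::",          # multi-character separator, handled as a regex
    engine="python",   # the C engine does not support multi-character separators
    header=None,       # the file has no header row
    names=["user_id", "movie_id", "rating", "timestamp"],
)
print(ratings)
```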

CSV copy with pandas

I know this topic has been covered extensively, but I can't get what I want; sorry for what is probably a newbie question. I have a CSV like this:
Date,"Tmax","Tmin","Tmedia","Rachas","Vmax","LT","L1","L2","L3","L4"
23 nov 2018,"14.0 (15:30)","7.3 (23:59)","10.7","12 (14:50)","5 (14:50)","2.0","1.6","0.4","0.0","0.0"
I get a new CSV like this one each day, with multiple rows, but I'm only interested in the first row after the header. I want to copy that first row each day into a new CSV, so that by the end of the week that CSV has seven rows. I'd also like to check whether that date is already in the output file. The problem is that I'm not getting the new CSV right; here's my attempt:
import pandas as pd
df = pd.read_csv('file.csv', skiprows=4, header=None)
writer=df[df.index.isin([0])].to_csv('output.csv',header=None)
The problem with this code is that it overwrites the file output.csv each time. Then I considered changing it to:
writer=df[df.index.isin([0])]
pd.read_csv('output.csv').append(writer).to_csv('output.csv',header=None)
The problem now is that it needs the file to exist already, and even then the data is not copied correctly to the new file. I think it must be simpler than this, but I'm stuck. Thanks for your help.
If you only want the first row after the header, read the header and just use nrows=1. Then use open in append mode to write your one-row dataframe to the end of the csv file. The header=False argument deals nicely with excluding the header when writing.
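import pandas as pd
df = pd.read_csv('file.csv', nrows=1)  # header plus the first data row only
with open('output.csv', 'a', newline='') as fout:  # append mode; newline='' as pandas recommends for handles
    df.to_csv(fout, header=False, index=False)  # write no header and no index column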
I've omitted skiprows=4 because it's not clear how this relates to your input data.
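The question also asks to skip days that are already recorded. A minimal sketch of that, using a hypothetical helper `append_first_row`: it assumes the date column is named `Date` (as in the sample header) and that the daily file and the output file share the same columns. It also writes the header only when the output file does not exist yet, so the first run needs no pre-existing file.

```python
import os
import tempfile

import pandas as pd


def append_first_row(daily_csv, output_csv, date_col="Date"):
    """Append the first data row of daily_csv to output_csv unless its date is there (hypothetical helper)."""
    row = pd.read_csv(daily_csv, nrows=1)
    if os.path.exists(output_csv):
        seen = pd.read_csv(output_csv, usecols=[date_col])[date_col]
        if row[date_col].iloc[0] in set(seen):
            return False  # that date is already recorded
        write_header = False
    else:
        write_header = True  # first run: create the file with a header
    row.to_csv(output_csv, mode="a", header=write_header, index=False)
    return True


# Demo in a temporary directory, using the first lines of the sample in the question.
with tempfile.TemporaryDirectory() as d:
    daily = os.path.join(d, "file.csv")
    out = os.path.join(d, "output.csv")
    with open(daily, "w") as f:
        f.write('Date,"Tmax","Tmin"\n23 nov 2018,"14.0 (15:30)","7.3 (23:59)"\n')
    first = append_first_row(daily, out)   # creates output.csv with header + row
    again = append_first_row(daily, out)   # same date: nothing is appended
    result = pd.read_csv(out)
print(first, again)
print(result)
```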

Problems with creating a CSV file using Excel

I have some data in an Excel file. I would like to analyze them using Python. I started by creating a CSV file using this guide.
So I created a CSV (Comma delimited) file filled with the following data:
I wrote a few lines of code in Python using Spyder:
import pandas
colnames = ['GDP', 'Unemployment', 'CPI', 'HousePricing']
data = pandas.read_csv('Dane_2.csv', names = colnames)
GDP = data.GDP.tolist()
print(GDP)
The output is not what I expected:
As you can see, the output differs a lot from the figures in the GDP column. I would appreciate any tips or hints that help with this problem.
It looks like the GDP column holds the decimal parts of the values in the first column of the .csv file together with the first digits of the second column. Either something is wrong with the .csv you created or, more likely, you need to specify the separator in the pandas.read_csv call. Also, adding header=None makes it explicit that the first line of the file is data rather than a header (when names is passed, pandas already treats it that way by default).
Try this:
import pandas
colnames = ['GDP', 'Unemployment', 'CPI', 'HousePricing']
data = pandas.read_csv('Dane_2.csv', names = colnames, header=None, sep=';')
GDP = data.GDP.tolist()
print(GDP)
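One likely cause, stated as an assumption since the data itself is not shown: Excel's "CSV (Comma delimited)" export follows the system's regional settings, and in locales where the comma is the decimal mark it typically writes semicolons between fields and commas inside numbers. In that case `sep=';'` alone leaves values such as `2,5` as strings, and `decimal=','` is also needed. A minimal sketch with invented numbers in that layout:

```python
import io

import pandas as pd

# Invented sample in the assumed layout: ';' between fields, ',' as decimal mark.
sample = "2,5;5,8;101,2;3,1\n1,9;6,0;102,0;2,7\n"

colnames = ["GDP", "Unemployment", "CPI", "HousePricing"]
data = pd.read_csv(
    io.StringIO(sample),   # stands in for 'Dane_2.csv'
    names=colnames,
    header=None,
    sep=";",               # field separator
    decimal=",",           # parse '2,5' as the float 2.5
)
GDP = data.GDP.tolist()
print(GDP)
```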

Unable to get correct output from tsv file using pandas

I have a TSV file that I am trying to read with pandas. The first two rows of the file are of no use and need to be ignored. However, the output has only two columns: the first is named Index, and the second is named after a random row from the csv file.
import pandas as pd
data = pd.read_csv('zahlen.csv', sep='\t', skiprows=2)
Please refer to the screenshot below.
The second column name, in bold black, is one of the rows from the file. Moreover, using '\t' as the delimiter does not split the values into separate columns. I am using the Spyder IDE. Am I doing something wrong here?
Try this:
data = pd.read_table('zahlen.csv', header=None, skiprows=2)
read_table() suits TSV files because its default delimiter is a tab; otherwise it uses the same parser as read_csv(), whose default delimiter is a comma. Then header=None makes the first row (after the skipped ones) data instead of a header.
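A minimal sketch showing that `read_table` behaves like `read_csv` with `sep='\t'`, on a hypothetical in-memory TSV with two junk lines on top. Without `header=None`, pandas would take the first line after the skipped ones as the column names, which matches the symptom in the question where a data row appeared as a column name.

```python
import io

import pandas as pd

# Hypothetical TSV: two junk lines, then tab-separated data with no header row.
sample = "report: zahlen\ngenerated by export tool\nA\t1\t2.5\nB\t3\t4.0\n"

t = pd.read_table(io.StringIO(sample), header=None, skiprows=2)
c = pd.read_csv(io.StringIO(sample), sep="\t", header=None, skiprows=2)

print(t.equals(c))  # same parser, same result
print(t)
```

If `'\t'` still leaves everything in one column on the real file, printing `repr()` of one raw line is a quick way to see which delimiter it actually uses.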
