I have some data in an Excel file. I would like to analyze them using Python. I started by creating a CSV file using this guide.
Thus I have created a CSV (Comma delimited) file filled with the following data:
I wrote a few lines of code in Python using Spyder:
import pandas
colnames = ['GDP', 'Unemployment', 'CPI', 'HousePricing']
data = pandas.read_csv('Dane_2.csv', names = colnames)
GDP = data.GDP.tolist()
print(GDP)
The output is not what I expected:
The output clearly differs from the figures in the GDP column. I would appreciate any tips or hints that help me solve this problem.
It looks like the GDP column contains the decimal part of the first column in the .csv file followed by the first digits of the second column. Either something is wrong with the .csv you created or, more probably, you need to specify the separator in the pandas.read_csv call. Also, add header=None to state explicitly that the file has no header row, so the first line is read as data and the column names come from colnames.
Try this:
import pandas
colnames = ['GDP', 'Unemployment', 'CPI', 'HousePricing']
data = pandas.read_csv('Dane_2.csv', names = colnames, header=None, sep=';')
GDP = data.GDP.tolist()
print(GDP)
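As a rough illustration of the separator fix, here is a minimal sketch that reads an in-memory sample instead of Dane_2.csv. The sample rows are invented, and the decimal=',' argument is an assumption that only applies if your Excel export writes decimal commas (common with semicolon-delimited exports); drop it if your numbers use a decimal point.

```python
import io

import pandas as pd

# Invented sample: semicolon-separated fields with decimal commas.
raw = "1,5;4,25;2,5;100,0\n2,25;4,0;2,75;101,5\n"
colnames = ['GDP', 'Unemployment', 'CPI', 'HousePricing']

# sep=';' splits the fields; decimal=',' parses "1,5" as the float 1.5.
data = pd.read_csv(io.StringIO(raw), names=colnames, header=None,
                   sep=';', decimal=',')
gdp = data['GDP'].tolist()
print(gdp)
```

If the printed list still looks wrong, open the .csv in a text editor to check which separator and decimal mark it really uses.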
This is my database:
https://archive.ics.uci.edu/ml/datasets/Parkinson+Speech+Dataset+with++Multiple+Types+of+Sound+Recordings
This database consists of training data and test data. The training data has many features, one per column. I intend to convert each column into a separate Excel sheet.
The following is my Python code that I formulated to convert the entire text file into a CSV. But I intend to convert the entire text file into Excel sheets. For example, the entire text file contains 10 columns, so I want to create 10 Excel sheets with each column separated into one Excel sheet. Can any expert guide me on how to do it? I am completely new to Python so I hope someone can help me.
import pandas as pd
read_file = pd.read_csv (r'C://Users/RichardStone/Pycharm/Project/train_data.txt')
read_file.to_csv (r'C://Users/RichardStone/Pycharm/Project/train_data.csv', index=None)
Try this.
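# Build one name per column: Sheet1, Sheet2, ...
sheetnames = list()
for i in range(len(read_file.columns)):
    sheetnames.append('Sheet' + str(i+1))
# Write each column to its own workbook: Sheet1.xlsx, Sheet2.xlsx, ...
for i in range(len(read_file.columns)):
    read_file.iloc[:, i].to_excel(sheetnames[i] + '.xlsx', index = False)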
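If you would rather have one workbook with one sheet per column instead of a separate .xlsx file per column, a sketch along these lines might work. The helper names split_columns and write_sheets are made up for this example, and writing .xlsx files assumes an Excel engine such as openpyxl is installed.

```python
import pandas as pd


def split_columns(frame):
    """Map 'Sheet1', 'Sheet2', ... to single-column DataFrames."""
    return {f"Sheet{i + 1}": frame.iloc[:, [i]]
            for i in range(frame.shape[1])}


def write_sheets(frame, path):
    """Write every column of frame to its own sheet in one workbook."""
    # pd.ExcelWriter needs an Excel engine (e.g. openpyxl) to be installed.
    with pd.ExcelWriter(path) as writer:
        for sheet_name, column in split_columns(frame).items():
            column.to_excel(writer, sheet_name=sheet_name, index=False)
```

You would call it as write_sheets(read_file, 'train_data.xlsx'), where the output file name is only a placeholder.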
My csv file looks like the following:
As you can see, there are 7 comma-separated columns. I have spent hours trying to read and plot the first column, which starts with 31364, using the following code:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('test.csv', sep=',', header=None, names=['colA','colB','colC','colD','colE','colF','colG'])
y = df['colA']
plt.plot(y)
But the code outputs this plot which does not match the data at all:
I'm using Spyder with Anaconda. What could be the problem?
Is column A all values in the 31,000 range? You're not plotting the whole file.
Edit: I'm not sure what result you're looking for. In your code, the first column of the csv is used as the index of the dataframe. After you read the csv, enter df at the Python prompt to see what your dataset looks like.
If you don't want the first column of the csv used as the index, add index_col=False to the parameters when you read the csv.
Also, it's not a good idea to end lines in a csv with the delimiter, a comma in this case.
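To make the index_col=False suggestion concrete, here is a small sketch with invented rows that end in a trailing comma, as the file in the question appears to. The values are placeholders, not the asker's data.

```python
import io

import pandas as pd

# Invented rows: seven values each, followed by a trailing comma.
raw = "31364,1,2,3,4,5,6,\n31365,7,8,9,10,11,12,\n"
names = ['colA', 'colB', 'colC', 'colD', 'colE', 'colF', 'colG']

# index_col=False tells pandas not to use the first column as the index,
# which is the documented workaround for lines ending in a delimiter.
fixed = pd.read_csv(io.StringIO(raw), header=None, names=names,
                    index_col=False)
col_a = fixed['colA'].tolist()
print(col_a)
```

Recent pandas versions may emit a ParserWarning here because the names list is shorter than the number of fields; the empty trailing field is what gets dropped.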
I tried to get some basic statistics for the columns in my csv file, but apparently I can't even get the contents of a column in my output.
I tried data['columnname']
import pandas as p
data = p.read_csv('Amazon.csv',delimiter='~}',na_values='nan')
data.columns
data['Title']
I expect to get the contents of 'Title' in my output
Without knowing the exact format of the .csv it's a bit hard, but hopefully something like this helps:
data = pd.read_csv("Amazon.csv", ... , header=0)
Setting header=0 will read the first line of the file and make it the column names.
If names aren't defined by the .csv you can either add them to the first line and use header=0 or use names=<array-like>.
data = pd.read_csv("Amazon.csv", ... , names=['Title',...,'LastCol'])
See: Pandas Docs for read_csv
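Here is a minimal sketch of the two options on in-memory data. The sample rows and the Price column are invented, and a plain comma is assumed as the separator rather than the '~}' used in the question.

```python
import io

import pandas as pd

# Invented samples: one with a header line, one without.
with_header = "Title,Price\nBook A,10\nBook B,12\n"
no_header = "Book A,10\nBook B,12\n"

# header=0: the first line of the file supplies the column names.
named_from_file = pd.read_csv(io.StringIO(with_header), header=0)

# names=[...]: the file has no header line, so the names are given by hand.
named_by_hand = pd.read_csv(io.StringIO(no_header), names=['Title', 'Price'])

titles = named_from_file['Title'].tolist()
print(titles)
```

Either way, data['Title'] only works once a column with exactly that name exists, so printing data.columns first is a quick way to check what pandas actually parsed.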
I'm working on a rather large Excel file. As part of that, I'd like to add two columns at the far right and save the result to a new Excel file. That works, but whenever I do so an unnamed column filled with numbers appears at the far left.
I've tried the .drop method, tried writing to a new file, and read about the same problem with CSV files, but I can't work out how to apply any of it here, so nothing has solved it.
wdf = pd.read_excel(tLoc)
sheet_wdf_map = pd.read_excel(tLoc, sheet_name=None)
wdf['Adequate'] = np.nan
wdf['Explanation'] = np.nan
wdf = wdf.drop(" ", axis=1)
I expect the output to be my original columns with only the two new columns being on the far right without the unnamed column.
Add index_col=[0] as an argument to read_excel.
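The unnamed column is usually the DataFrame index that an earlier save wrote into the file. The sketch below reproduces this with an in-memory CSV rather than Excel, only so that it runs without an Excel engine; read_excel and to_excel accept the same index_col and index arguments. The column names a and b are placeholders.

```python
import io

import pandas as pd

original = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Saving with the default index=True writes the index as a first column
# that has no name.
buffer = io.StringIO()
original.to_csv(buffer)

# Reading it back naively turns that column into 'Unnamed: 0'.
buffer.seek(0)
leaked = pd.read_csv(buffer)

# index_col=0 tells pandas that the first column is the index.
buffer.seek(0)
fixed = pd.read_csv(buffer, index_col=0)

print(list(leaked.columns))
print(list(fixed.columns))
```

Passing index=False when saving avoids writing the extra column in the first place.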
I have a TSV file that I am trying to read with pandas. The first two rows of the file are of no use and need to be ignored. However, the output comes out as two columns: the first is named Index, and the name of the second is a seemingly random row from the file.
import pandas as pd
data = pd.read_csv('zahlen.csv', sep='\t', skiprows=2)
Please refer to the screenshot below.
The second column name is in bold black; it is one of the rows from the file. Moreover, using '\t' as the delimiter does not separate the values into different columns. I am using the Spyder IDE. Am I doing something wrong here?
Try this:
data = pd.read_table('zahlen.csv', header=None, skiprows=2)
read_table() is the same reader as read_csv() but uses a tab as its default separator, which suits TSV files. header=None then makes pandas treat the first remaining row as data instead of as a header.
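A small sketch of that call on in-memory data, assuming the file really is tab-separated after its first two lines. The sample content is invented.

```python
import io

import pandas as pd

# Invented sample: two lines to skip, then tab-separated values.
raw = "junk line 1\njunk line 2\n1\t2\t3\n4\t5\t6\n"

# skiprows=2 drops the first two lines; header=None keeps "1 2 3" as data.
data = pd.read_table(io.StringIO(raw), header=None, skiprows=2)
print(data)
```

If everything still lands in a single column, the file may not actually contain tab characters; opening it in a text editor that shows whitespace would confirm which delimiter it uses.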