Plot normal distribution in Python from a .csv file - python

The following script plots a normal distribution fitted to a given set of data.
import numpy as np
import scipy.stats as stats
import pylab as pl
h = sorted ([0.9, 0.6, 0.5, 0.73788,...]) #Data that I would like to change
fit = stats.norm.pdf(h, np.mean(h), np.std(h))
pl.plot(h,fit,'-o')
pl.show()
I would like to plot data taken from a .csv file instead of entering it manually. Suppose the data I want is in the 2nd column of a given .csv file. The only way I know to isolate that data is to create an intermediate file, but maybe that is not even necessary.
with open('infile.csv','rb') as inf, open('outfile.csv','wb') as outf:
    incsv = csv.reader(inf, delimiter=',')
    outcsv = csv.writer(outf, delimiter=',')
    outcsv.writerows(row[1] in incsv)
Basically, my two questions are:
- Would this correctly write the second column of a .csv into a new .csv file?
- How could I merge those two scripts so that the static data in the first one is replaced by the data in a column of a .csv file?

It seems very roundabout to write the data back out to a file, presumably to read it back in again later. Why not create a list of the data?
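import csv

def import_data(filename):
    """Import data in the second column of the supplied filename as floats."""
    # Python 3: open in text mode with newline='' (the original 'rb' mode is Python 2 style)
    with open(filename, newline='') as inf:
        return [float(row[1]) for row in csv.reader(inf)]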
You can then call this function to get the data you want to plot
h = sorted(import_data('infile.csv'))
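Putting the two pieces together, a minimal sketch of the merged script could look like the following. To stay dependency-free it swaps in the standard library's statistics.NormalDist for scipy.stats.norm.pdf, and the helper names read_column and normal_pdf_points are hypothetical. It assumes the file has no header row and that every non-empty row has a numeric second column.

```python
import csv
import io
import statistics


def read_column(lines, index=1):
    """Return the values of one CSV column as a sorted list of floats."""
    # Skip completely empty rows; assumes there is no header row.
    return sorted(float(row[index]) for row in csv.reader(lines) if row)


def normal_pdf_points(values):
    """Pair each value with the density of the normal fitted to the data."""
    # pstdev is the population standard deviation, matching np.std's default.
    dist = statistics.NormalDist(statistics.fmean(values), statistics.pstdev(values))
    return [(x, dist.pdf(x)) for x in values]


# Stand-in for open('infile.csv', newline=''); any iterable of lines works.
sample = io.StringIO("a,0.9\nb,0.6\nc,0.5\nd,0.73788\n")
h = read_column(sample)
fit = [density for _, density in normal_pdf_points(h)]
```

The resulting h and fit can then be passed to pl.plot(h, fit, '-o') as in the original script.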
As to your question "Would I be writing correctly the second column of a .csv into a new .csv file?", the answer is: test it and find out.
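For reference, the snippet from the question would likely fail as written: `row[1] in incsv` is a membership test rather than a loop, and `row` is never defined, so Python raises a NameError before anything is written. A hedged sketch of a version that does copy the second column follows. The helper name copy_second_column is hypothetical, and in-memory files stand in for infile.csv and outfile.csv.

```python
import csv
import io


def copy_second_column(src, dst):
    """Write the second field of every row in src to dst, one per line."""
    reader = csv.reader(src, delimiter=",")
    writer = csv.writer(dst, delimiter=",", lineterminator="\n")
    # writerows expects an iterable of rows, so each value is wrapped in a list;
    # passing a bare string would split it into one field per character.
    writer.writerows([row[1]] for row in reader)


source = io.StringIO("a,1,x\nb,2,y\n")
target = io.StringIO()
copy_second_column(source, target)
copied = target.getvalue()
```

With real files, both would be opened in text mode with newline='' on Python 3.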

Related

How to write in a specific cell in a CSV file?

I have an assignment in which I need to input random grades for different students in a csv file using Python 3 and get each student's average. I know how to generate the random grades and compute the averages; what I don't know is how to write the grades to specific columns and rows (the highlighted ones).
Highlighted area is the space in which I need to write random grades:
Is there any way this can be done? I'm fairly new to programming and Python 3, and as far as I've read, specific cells can't be changed by normal means.
The csv module doesn't have functions to modify specific cells.
You can read the rows from the original file, append the grades, and write the modified rows to a new file:
import random
import csv
# newline='' lets the csv module control line endings (Python 3)
inputFile = open('grades.csv', 'r', newline='')
outputFile = open('grades_out.csv', 'w', newline='')
reader = csv.reader(inputFile)
writer = csv.writer(outputFile)
for row in reader:
    grades = row.copy()
    # Append five random grades to this student's row
    for i in range(5):
        grades.append(random.randint(1, 5))
    writer.writerow(grades)
inputFile.close()
outputFile.close()
Then you can delete the original file and rename the new one. (It is not a good idea to read the whole original file into a variable, close it, reopen it in write mode, and write the data back, because the file can be big.)
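The delete-and-rename step can be sketched with os.replace, which overwrites the destination in a single call. This is only an illustration: the function name append_random_grades, the ".tmp" suffix, and the demo file contents are assumptions, not part of the original answer.

```python
import csv
import os
import random
import tempfile


def append_random_grades(path, count=5, low=1, high=5):
    """Append `count` random grades to every row, then swap the new file in."""
    tmp_path = path + ".tmp"  # hypothetical temporary name next to the original
    with open(path, newline="") as src, open(tmp_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):  # only one row is held in memory at a time
            writer.writerow(row + [random.randint(low, high) for _ in range(count)])
    os.replace(tmp_path, path)  # replaces the original with the new file


# Small demonstration on a throwaway file.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "grades.csv")
with open(demo_path, "w", newline="") as f:
    f.write("Ann\nBob\n")
append_random_grades(demo_path)
with open(demo_path, newline="") as f:
    result = list(csv.reader(f))
```

Note that a header row, if the file has one, would also get grades appended; skipping it would need an extra next(reader) call.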

Extracting metadata from csv without loading data in python

I am trying to get the dimensions (shape) of a data frame using pandas in python without reading the entire data frame first in memory given that the file is quite large.
To get the number of columns with minimal loading of the file into the memory, I can for example use the argument below.
import pandas as pd
df = pd.read_csv("myData.csv", nrows=1)  # named df so the pandas module (pd) is not overwritten
print(df.shape)
To get the number of rows I can use the argument usecols=[1] when reading the file, but there must be a simpler way of doing this.
If other packages or scripts can easily give me such metadata, I would be happy to use them as well. It is really metadata I am looking for, such as column names, number of rows, and number of columns, but I don't want to read the entire file in!
You don't even need pandas for this. Use the built-in csv module to parse the file:
import csv
with open('myData.csv', newline='') as fp:
    reader = csv.reader(fp)
    headers = next(reader) # The header row is now consumed
    ncol = len(headers)
    nrow = sum(1 for _ in reader) # What remains are the data rows

How to read, edit, merge and save all csv files from one folder?

I'm new to Python. I'm trying to read all the .csv files in one folder and add the third column (Dataset 1) from every file to a new .csv file (or Excel file). I have no problem working with a single file (reading it, cutting rows and columns, adding columns, and computing simple statistics).
This is an example of one of my CSV files: Imgur
I have more than 2000 of them, each with 1123 rows.
This should be fairly easy with something like the csv library, if you don't want to get into learning dataframes.
import os
import csv
new_data = []
for filename in os.listdir('./csv_dir'):
    if filename.endswith('.csv'):
        with open('./csv_dir/' + filename, mode='r') as curr_file:
            reader = csv.reader(curr_file, delimiter=',')
            for row in reader:
                new_data.append(row[2]) # Or whichever column you need
with open('./out_dir/output.txt', mode='w') as out_file:
    for row in new_data:
        out_file.write('{}\n'.format(row))
Your new_data list will then contain one value per row per file, roughly 2000 * 1123 entries in total.
This may not be the most efficient way to do this, but it'll get the job done and grab each CSV. You'll need to do the work of making sure the CSV files have the correct structure, or adding in checks in the code for validating the columns before appending to new_data.
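If the goal is one output column per input file rather than a single long list, one possible variation is sketched below. The function name merge_third_columns, the use of file names as headers, and the padding of shorter files with empty strings are all assumptions made for illustration. Rows that are too short to have the requested column are skipped, which is one simple form of the validation mentioned above.

```python
import csv
import glob
import os
import tempfile
from itertools import zip_longest


def merge_third_columns(csv_dir, out_path, column=2):
    """Write column `column` of every .csv in csv_dir side by side to out_path."""
    names, columns = [], []
    for path in sorted(glob.glob(os.path.join(csv_dir, "*.csv"))):
        with open(path, newline="") as curr_file:
            # Keep only rows long enough to contain the requested column.
            columns.append([row[column] for row in csv.reader(curr_file) if len(row) > column])
        names.append(os.path.basename(path))
    with open(out_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        writer.writerow(names)  # header: one file name per column
        writer.writerows(zip_longest(*columns, fillvalue=""))
    return len(names)


# Demonstration with two tiny files; the output goes to a separate directory
# so that it is not picked up as an input on a later run.
in_dir, out_dir = tempfile.mkdtemp(), tempfile.mkdtemp()
for name, text in [("a.csv", "1,2,3\n4,5,6\n"), ("b.csv", "7,8,9\n")]:
    with open(os.path.join(in_dir, name), "w", newline="") as f:
        f.write(text)
merged_path = os.path.join(out_dir, "output.csv")
file_count = merge_third_columns(in_dir, merged_path)
with open(merged_path, newline="") as f:
    merged_rows = list(csv.reader(f))
```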
Maybe try
csv_file = csv.reader(open(path, "r"), delimiter=",")
csv_file1 = csv.reader(open(path, "r"), delimiter=",")
csv_file2 = csv.reader(open(path, "r"), delimiter=",")
and then read like
for row in csv_file:
    pass  # your code here
for row in csv_file1:
    pass  # your code here
for row in csv_file2:
    pass  # your code here

Extracting columns containing a certain name

I'm trying to use Python to manipulate data in large txt-files.
I have a txt-file with more than 2000 columns, and about a third of these have a title which contains the word 'Net'. I want to extract only these columns and write them to a new txt file. Any suggestion on how I can do that?
I have searched around a bit but haven't been able to find something that helps me. Apologies if similar questions have been asked and solved before.
EDIT 1: Thank you all! At the moment of writing 3 users have suggested solutions and they all work really well. I honestly didn't think people would answer so I didn't check for a day or two, and was happily surprised by this. I'm very impressed.
EDIT 2: I've added a picture that shows what a part of the original txt-file can look like, in case it will help anyone in the future:
One way of doing this, without the installation of third-party modules like numpy/pandas, is as follows. Given an input file, called "input.csv" like this:
a,b,c_net,d,e_net
0,0,1,0,1
0,0,1,0,1
(remove the blank lines in between, they are just for formatting the content in this post)
The following code does what you want.
import csv
input_filename = 'input.csv'
output_filename = 'output.csv'
# Instantiate a CSV reader, check if you have the appropriate delimiter
reader = csv.reader(open(input_filename, newline=''), delimiter=',')
# Get the first row (assuming this row contains the header)
# next(reader) works in Python 3; reader.next() was Python 2 only
input_header = next(reader)
# Filter out the columns that you want to keep by storing the column
# index
columns_to_keep = []
for i, name in enumerate(input_header):
    if 'net' in name:
        columns_to_keep.append(i)
# Create a CSV writer to store the columns you want to keep
writer = csv.writer(open(output_filename, 'w', newline=''), delimiter=',')
# Construct the header of the output file
output_header = []
for column_index in columns_to_keep:
    output_header.append(input_header[column_index])
# Write the header to the output file
writer.writerow(output_header)
# Iterate over the remainder of the input file, construct a row
# with columns you want to keep and write this row to the output file
for row in reader:
    new_row = []
    for column_index in columns_to_keep:
        new_row.append(row[column_index])
    writer.writerow(new_row)
Note that there is no error handling. There are at least two cases that should be handled. The first is checking that the input file exists (hint: check the functionality provided by the os and os.path modules). The second is handling blank lines or lines with an inconsistent number of columns.
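The two checks mentioned above could be added roughly as follows. This is a sketch under assumptions: the function name extract_columns, the choice to skip (rather than reject) malformed rows, and the returned count of skipped rows are all illustrative.

```python
import csv
import os
import tempfile


def extract_columns(input_filename, output_filename, keyword="net"):
    """Copy columns whose header contains `keyword`; return the number of skipped rows."""
    if not os.path.isfile(input_filename):  # check 1: the input file must exist
        raise FileNotFoundError(input_filename)
    skipped = 0
    with open(input_filename, newline="") as src, \
            open(output_filename, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:  # empty input: nothing to copy
            return skipped
        keep = [i for i, name in enumerate(header) if keyword in name]
        writer.writerow([header[i] for i in keep])
        for row in reader:
            # check 2: skip blank lines and rows with the wrong number of columns
            if len(row) != len(header):
                skipped += 1
                continue
            writer.writerow([row[i] for i in keep])
    return skipped


# Demonstration: one blank line and one short row are skipped.
work_dir = tempfile.mkdtemp()
in_path = os.path.join(work_dir, "input.csv")
out_path = os.path.join(work_dir, "output.csv")
with open(in_path, "w", newline="") as f:
    f.write("a,b,c_net,d,e_net\n0,0,1,0,1\n\n0,0\n0,0,1,0,1\n")
skipped_rows = extract_columns(in_path, out_path)
with open(out_path, newline="") as f:
    kept_rows = list(csv.reader(f))
```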
This could be done for instance with Pandas,
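import pandas as pd
# Raw string so that \s is passed to the regex engine unchanged
df = pd.read_csv('path_to_file.txt', sep=r'\s+')
print(df.columns) # check that the columns are parsed correctly
selected_columns = [col for col in df.columns if "net" in col]
df_filtered = df[selected_columns]
# index=False leaves the DataFrame index out of the output file
df_filtered.to_csv('new_file.txt', index=False)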
Of course, since we don't have the structure of your text file, you would have to adapt the arguments of read_csv to make this work in your case (see the corresponding documentation).
This will load the whole file into memory and then filter out the unnecessary columns. If your file is so large that it cannot be loaded into RAM at once, you can load only specific columns with the usecols argument.
You can use the pandas filter function to select columns whose names match a regex:
data_filtered = data.filter(regex='net')

How to export data (which is as result of Python program) from command line?

I am working on a Python program whose results are printed on the command line.
Now I need to analyze the results, so I need them all exported in some format such as SQL, Excel, or CSV.
Can someone tell me how I can do that?
import csv
x1=1
x2=2
while True:
    show = [ dict(x1=x1+1 , x2=x2+2)]
    print('Received', show )
    with open('large1.csv','w') as f1:
        writer=csv.writer(f1, delimiter=' ',lineterminator='\n\n',)
        writer.writerow(show)
    x1=x1+1
    x2=x2+1
This is an infinite loop, and I want a csv file with two columns, x1 and x2, that is updated regularly with the values of x1 and x2, row by row (one row per iteration).
But with this code I get a csv file named 'large1.csv' that contains only one row (the last updated values of x1 and x2).
So how can I get all the values of x1 and x2 written row-wise in Python?
Just use the csv format. It can easily be imported into Excel, and the Python standard library supports csv out of the box. See the Python csv module documentation.
One way to handle this is to use the CSV module, and specifically a Writer object to write the output to a CSV file (perhaps even instead of writing to stdout). The documentation has several examples, including this one:
import csv
with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(someiterable)
You should then be able to import the CSV file easily in Excel if that is what you want.
Have a look at the open() documentation.
Mode w will truncate the file, which means it will replace the contents. Since you call that every loop iteration, you are continuously deleting the file and replacing it with a new one. Mode a appends to the file and is maybe what you want. You also might consider opening the file outside of the loop, in that case w might be the correct mode.
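The "open the file outside of the loop" option might look like the sketch below. It is a simplified stand-in for the question's code, not a drop-in fix: the function name log_values is hypothetical, the loop is bounded so the example terminates, and both counters are simply incremented by one per iteration.

```python
import csv
import os
import tempfile


def log_values(path, iterations):
    """Write one (x1, x2) row per iteration to a CSV file opened once."""
    x1, x2 = 1, 2
    # Mode 'w' truncates only once here, because open() runs before the loop.
    with open(path, "w", newline="") as f1:
        writer = csv.writer(f1)
        writer.writerow(["x1", "x2"])  # header row
        for _ in range(iterations):
            x1 += 1
            x2 += 1
            writer.writerow([x1, x2])
            f1.flush()  # push each row to disk so the file stays current


log_path = os.path.join(tempfile.mkdtemp(), "large1.csv")
log_values(log_path, 3)
with open(log_path, newline="") as f:
    logged_rows = list(csv.reader(f))
```

Opening the file in mode 'a' inside the loop would also keep every row, at the cost of reopening the file on each iteration.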
