Classifying raw data into csv with python

I'm a real beginner in Python and I was asked to use it to retrieve some data.
I managed to get the data, but now I need to store it in an Excel sheet or a CSV file that can be used later on.
The data I have was in this format:
2005-02-04T01:00:00+02:00,1836.910000#2005-02-05T01:00:00+02:00
I managed to organize it better by doing this:
>>> date_value = np.array( [ (dateutil.parser.parse(d), float(v))
... for d,v in [l.split(',') for l in values.text.split('#')]] )
>>> ts = pd.Series(date_value[:,1],index=date_value[:,0])
>>> ts
and now I have it in this format:
2005-02-04 01:00:00+02:00 1836.91
2005-02-05 01:00:00+02:00 1821.45
But I can't find a way to store it as an Excel or CSV file.
If you have any advice...?
Thanks
G.

Since you seem to have quickly figured out how to use pandas to read data, the next step is to use it to write CSV, and that's done with:
pandas.DataFrame.to_csv
Excel is perfectly able to read CSV files, so there is no need to convert to Excel yourself. However, if an Excel file really is a project requirement, see:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html
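As a minimal sketch of the to_csv step: the sample string below follows the format from the question, with the second value taken from the printed series. The column names and the file name "values.csv" are assumptions, not something from the original post, and the timestamps are kept as plain text rather than parsed.

```python
import pandas as pd

# Sample in the question's format: "timestamp,value" pairs separated by "#".
raw = "2005-02-04T01:00:00+02:00,1836.910000#2005-02-05T01:00:00+02:00,1821.450000"

# Split into [timestamp, value] pairs.
rows = [item.split(",") for item in raw.split("#")]

# Column names are assumptions chosen for this example.
df = pd.DataFrame(rows, columns=["timestamp", "value"])
df["value"] = df["value"].astype(float)

# "values.csv" is a hypothetical file name; index=False skips the row numbers.
df.to_csv("values.csv", index=False)
```

A pandas Series such as ts in the question has a to_csv method as well, so ts.to_csv("values.csv") should also work without building a DataFrame first.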

Related

exporting to csv converts text to date

From Python I want to export a dataframe to CSV format.
The dataframe contains two columns like this
So when I write this:
df['NAME'] = df['NAME'].astype(str) # or .astype('string')
df.to_csv('output.csv',index=False,sep=';')
the CSV file opened in Excel shows this:
and Excel reads the value "MAY8218" as a date, "may-18", while I want it to be read as "MAY8218".
I've tried many ways but none of them works. I don't want a workaround like putting quotation marks to the left and the right of the value.
Thanks.
If you want to export the dataframe to use it in Excel, just export it as xlsx. It works for me and keeps the value as a string in its original format.
df.to_excel('output.xlsx',index=False)
The CSV format is a text format. The file contains no hint about the type of each field. The problem is that Excel has the worst possible support for CSV files: when you open one, it assumes the file follows its own conventions. In short, an Excel installation can only read correctly what it has written itself...
That means you cannot prevent Excel from interpreting the CSV data the way it wants, at least when you open a CSV file. Fortunately you have other options:
import the csv file instead of opening it. This time you have options to configure the way the file should be processed.
use LibreOffice Calc for processing CSV files. LibreOffice is a little behind Microsoft Office on most points, except for CSV file handling, where it has excellent support.
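To check that the file itself is fine and only Excel's interpretation is off, here is a small sketch. The second column "QTY" and the second row are made up for illustration, since the question does not show the real data. Reading back with dtype=str plays the same role as telling an import wizard that the column is text.

```python
import pandas as pd

# "QTY" and the second row are hypothetical; only "MAY8218" comes from the question.
df = pd.DataFrame({"NAME": ["MAY8218", "JUN0119"], "QTY": [1, 2]})
df.to_csv("output.csv", index=False, sep=";")

# The raw file contains the literal text, with no type information attached.
with open("output.csv") as f:
    text = f.read()
print(text)

# Reading it back with every column forced to string keeps the values untouched.
back = pd.read_csv("output.csv", sep=";", dtype=str)
print(back["NAME"].tolist())
```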

Output to CSV changing datatype

So I have a CSV file with a column called reference_id. The values in reference_id are 15 characters long, so something like '162473985649957'. When I open the CSV file, Excel has changed the datatype to General and the numbers look something like '1.62474E+14'. To fix this in Excel, I change the column type to Number and remove the decimals, and it displays the correct value. I should add that it only does this with the CSV file; if I output to xlsx, it works fine. The problem is, the file has to be CSV.
Is there a way to fix this using Python? I'm trying to automate a process. I have tried using the following to convert it to a string. It works in the sense that it converts the column to a string, but it still shows up incorrectly in the CSV file.
df['reference_id'] = df['reference_id'].astype(str)
df.to_csv(r'Prev Day Branch Transaction Mems.csv')
Thanks
When I open the CSV file, excel has changed the data
This is an Excel problem. You can't fix how Excel decides to interpret your CSV. (You can work around some issues by using the text import format, but that's cumbersome.)
Either use XLS/XLSX files when working with Excel, or use e.g. Gnumeric or something else that doesn't wantonly mangle your data.
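A short sketch of why astype(str) makes no difference here: the CSV text pandas produces is the same whether the column holds integers or strings, because CSV carries no type information. The one-row frame below is a made-up stand-in for the real data.

```python
import pandas as pd

# Hypothetical one-row frame using the example value from the question.
as_int = pd.DataFrame({"reference_id": [162473985649957]})
as_str = as_int.astype({"reference_id": str})

# Calling to_csv without a path returns the CSV text instead of writing a file.
csv_from_int = as_int.to_csv(index=False)
csv_from_str = as_str.to_csv(index=False)
print(csv_from_int)

# Both variants produce identical text with all 15 digits intact.
same = csv_from_int == csv_from_str
print(same)
```

So the digits are all in the file; the scientific notation only appears once Excel decides how to display the column.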

Pandas txt to csv output only displays the first two lines of values, how do I get the full data to show?

My issue is as follows.
I've gathered some contact data from SurveyMonkey using the SM API, and I've converted that data into a txt file. When opening the txt file, I see the full data from the survey that I'm trying to convert into csv, however when I use the following code:
df = pd.read_csv("my_file.txt",sep =",", encoding = "iso-8859-10")
df.to_csv('my_file.csv')
It creates a CSV file with only two lines of values (and cuts off in the middle of the second line). Similarly, if I try to organize the data within a pandas dataframe, it only registers the first two lines, meaning most of my txt file is not being read.
As I've never run into this problem before and I've been able to convert into CSV without issues, I'm wondering if anyone here has ideas as to what might be causing this issue to occur and how I could go about solving it?
All help is much appreciated.
Edit:
I was able to get the data to display properly in CSV when I converted it directly from JSON instead of converting it to a txt file first. However, I was not able to figure out what went wrong in the conversion from txt to CSV, as I tried multiple different encodings but got the same result.
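For anyone landing here, a rough sketch of the JSON-to-CSV route described in the edit. The payload shape and field names below are invented for illustration and are not the actual SurveyMonkey response format. pandas.json_normalize flattens nested objects into dotted column names, and to_csv quotes values that contain commas, which a hand-made txt file may not do.

```python
import json
import pandas as pd

# Hypothetical payload; the real API response will look different.
payload = json.loads("""
{"data": [
  {"first_name": "Ada", "email": "ada@example.com", "custom": {"company": "Example, Inc."}},
  {"first_name": "Bob", "email": "bob@example.com", "custom": {"company": "Sample Ltd"}}
]}
""")

# Flatten the list of records; nested keys become columns like "custom.company".
df = pd.json_normalize(payload["data"])

# "contacts.csv" is a hypothetical file name.
df.to_csv("contacts.csv", index=False)

# Reading the file back confirms that all rows survived the round trip.
check = pd.read_csv("contacts.csv")
print(check)
```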

Converting output into .csv file

I am new to Python, and I am currently writing code to parse through an excel sheet of websites, look at websites that have been modified more than three months ago, and then pull out the names and emails of contacts at those sites. My problem now is that whenever I run the code in my terminal, it only shows me some of the output, so I'd like to export it to a .csv file or really anything that lets me see all the values but I'm not sure how.
import pandas as pd
data = pd.read_csv("filename.csv")
data.sort_values("Last change", inplace = True)
filter1 = data["Last change"]<44285
data.where(filter1, inplace = True)
print(data)
note: the 44285 came from me converting the dates in excel to integers so I didn't have to in Python, lazy I know but I'm learning
You can try converting it to a csv.
data.to_csv('data.csv')
Alternately if you want to just view more records, for example 50, you could do this:
print(data.head(50))
If you can also share your parser code, I think we can save you the hassle of editing the Excel file in the middle of the process. Or maybe something from here with a few more lines.
To solve your problem use
data.to_csv("resultfile.csv")
or if you want an excel file
data.to_excel("resultfile.xlsx")
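One thing worth noting about the code in the question: data.where(filter1, inplace=True) keeps every row and fills the non-matching ones with NaN, so the output is padded with empty rows. A sketch using boolean indexing instead, with made-up sample data (the "Site" and "Contact" columns are assumptions; only "Last change" and 44285 come from the question):

```python
import pandas as pd

# Hypothetical sample standing in for pd.read_csv("filename.csv").
data = pd.DataFrame({
    "Site": ["a.example", "b.example", "c.example"],
    "Last change": [44100, 44300, 44200],
    "Contact": ["ann@a.example", "bo@b.example", "cy@c.example"],
})

# Boolean indexing drops the rows that fail the filter instead of blanking them.
old = data[data["Last change"] < 44285].sort_values("Last change")

# Write only the matching rows, without the index column.
old.to_csv("resultfile.csv", index=False)

# To see every row in the terminal, lift the display limit temporarily.
with pd.option_context("display.max_rows", None):
    print(old)
```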

split excel sheet for every nrows using python

I have an Excel file with more than 1 million rows. Now I need to split it every n rows and save each part in a new file. I am very new to Python. Any help is much appreciated and needed.
As suggested by OhAuth, you can save the Excel document to a CSV file. That would be a good start for processing your data.
To process your data you can use the Python csv library. That does not require any installation since it comes with Python.
If you want something more "powerful" you might want to look into pandas. However, that requires installing the module.
If you do not want to use Python's csv module or the pandas module because you do not want to read the docs, you could also do something like this:
with open("myCSVfile", "r") as f:
    for row in f:
        # replace the "," with the delimiter you chose to separate your columns
        singleRow = row.rstrip("\n").split(",")
        print(singleRow)
# prints e.g. ['value1', 'value2', 'value3', ...]
# split returns a list, and lists (and list comprehensions) are well documented and easy to understand, so further processing won't be difficult
However, I strongly recommend looking into the modules, since they handle CSV data better and more efficiently, and in the long run they will save you time and trouble.
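If you go the pandas route, a minimal sketch of the split could look like the following. It assumes the sheet has already been saved as CSV; split_csv, the "part" prefix and the file names are all hypothetical. The chunksize argument of read_csv yields the file n rows at a time, so the whole file never has to fit in memory.

```python
import pandas as pd

def split_csv(src_path, n_rows, prefix="part"):
    """Write every n_rows rows of src_path to its own CSV and return the paths."""
    paths = []
    # chunksize makes read_csv return an iterator of DataFrames of n_rows each.
    for i, chunk in enumerate(pd.read_csv(src_path, chunksize=n_rows), start=1):
        out_path = f"{prefix}_{i}.csv"
        chunk.to_csv(out_path, index=False)  # each part gets the header row
        paths.append(out_path)
    return paths

# Small demo file standing in for the real export (10 rows instead of a million).
pd.DataFrame({"id": range(10)}).to_csv("big.csv", index=False)

parts = split_csv("big.csv", 4)
sizes = [len(pd.read_csv(p)) for p in parts]
print(parts, sizes)
```

With 10 rows and n = 4 this gives three files, the last one holding the remaining 2 rows.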
