I have a running Bitcoin node which downloads all block .dat files.
Now I would like to parse these files in a Python script and extract only the transaction data.
Afterwards, I am planning to push the data into Google BigQuery for analysis.
Does anyone have a good source for such a Python script?
Many thanks in advance!
Basically, all you need to do is read these .dat files with Python and send the content to GBQ.
Here is a thread on how to process .dat files:
reading and doing calculation from .dat file in python
After converting the .dat data into a pandas DataFrame, you can use the pandas-gbq package to save it in Google BigQuery. Documentation: https://pandas-gbq.readthedocs.io/en/latest/
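A minimal sketch of that pipeline might look like the following; parse_transactions is a hypothetical helper standing in for whatever .dat parsing you end up with, and the dataset/table/project names are placeholders:

import pandas as pd
import pandas_gbq

# Hypothetical parser: turn the raw bytes of a blk*.dat file into a list of
# dicts, one per transaction (txid, value, timestamp, ...).
def parse_transactions(path):
    with open(path, "rb") as f:
        raw = f.read()
    # ... actual block/transaction parsing goes here ...
    return []

rows = parse_transactions("blk00000.dat")
df = pd.DataFrame(rows)

# Placeholder dataset, table and project names.
pandas_gbq.to_gbq(df, "my_dataset.transactions", project_id="my-project", if_exists="append")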
I have a DataFrame that I would like to store as a CSV file in SharePoint.
It seems that the only way is to first save the CSV file locally and then, using SharePlum, upload the file to SharePoint.
Is there a way to save the DataFrame directly to SharePoint as a CSV file, without creating a local file?
Thanks a lot for your help.
It should be possible to write the CSV content to an in-memory text buffer (e.g. StringIO or BytesIO) rather than to a local file - here is an example (last section of the page).
After that, you could use a library to write the content directly to SharePoint: this discussion shows several approaches to doing that, including the Office365-REST-Python-Client and also SharePlum, which you have already mentioned; a rough sketch follows below the links.
Here are two more sources (Microsoft technical doc) that you might find useful:
How can I upload a file to Sharepoint using Python?
How to get and upload files from sharepoint with python?
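For illustration, a sketch using StringIO and SharePlum could look like this; the site URL, credentials, folder path and file name are placeholders, and the exact calls may need adjusting for your tenant:

import io
import pandas as pd
from shareplum import Office365, Site
from shareplum.site import Version

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Write the CSV into an in-memory text buffer instead of a local file.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Placeholder SharePoint site and credentials.
authcookie = Office365("https://yourorg.sharepoint.com", username="user@yourorg.com", password="secret").GetCookies()
site = Site("https://yourorg.sharepoint.com/sites/MySite", version=Version.v365, authcookie=authcookie)
folder = site.Folder("Shared Documents/exports")
folder.upload_file(buffer.getvalue(), "data.csv")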
I have read the documentation but am just not able to figure out how to do it.
My CSV is in this format.
I have downloaded Telegraf for this purpose.
I want to automate the ETL process from .csv files into SQL Server.
First of all, I have an issue with .csv files that have this structure (see the next line). As you can see, I need to delete the first line and the last 4.
https://ibb.co/Z6rrbPY
I tried using pandas and the csv module in Python, but I haven't found a solution. I'm stuck on this part, and it's only the beginning of what I'm trying to do.
Let me know what I can do.
Thanks a lot.
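Not a full answer, but one way to drop those rows with pandas and push the result to SQL Server is sketched below; the file name, connection string and table name are placeholders, and skipfooter requires the Python parsing engine:

import pandas as pd
from sqlalchemy import create_engine

# Drop the first line and the last 4 lines while reading.
df = pd.read_csv("input.csv", skiprows=1, skipfooter=4, engine="python")

# Placeholder SQL Server connection string (requires the pyodbc driver).
engine = create_engine("mssql+pyodbc://user:password@SERVER/MyDatabase?driver=ODBC+Driver+17+for+SQL+Server")
df.to_sql("my_table", engine, if_exists="append", index=False)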
Currently I am working on a project where a user will be able to upload a CSV file and the data in the CSV file will be stored in a database. The project is being developed with the Falcon framework as the back-end, and API requests are sent from an Angular 4 client.
On the Angular side, I can parse the CSV file data into JSON; there are packages available for this, for example ngx-papaparse. Is there another way around this, such as receiving the CSV file in Python and processing its data to be stored in the database? What is the best way to do this?
Python is quite flexible for processing data; you can also convert your CSV data to JSON using pandas in Python:
import pandas as pd
df = pd.read_csv('filename.csv')
df.to_json('jsonfilename.json')
You can also store CSV files in MySQL by exporting them using Python; please read http://www.vallinme.com/v1/?p=95
Petl is another library for this purpose.
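To tie this back to the Falcon side of the question, here is a rough sketch of a resource that accepts an uploaded CSV; it assumes Falcon 3.x and that the client posts the raw CSV as the request body (not multipart form data), and the route name is a placeholder:

import falcon
import pandas as pd

class CSVUploadResource:
    def on_post(self, req, resp):
        # Read the raw request body (the uploaded CSV) into pandas.
        df = pd.read_csv(req.bounded_stream)
        # ... store df in the database here, e.g. with df.to_sql ...
        resp.media = {"rows_received": len(df)}

app = falcon.App()
app.add_route("/upload", CSVUploadResource())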
I extracted a .csv file from Google BigQuery with 2 columns and 10 million rows.
I downloaded the file locally as a .csv of about 170 MB, then uploaded it to Google Drive, and I want to use the pandas.read_csv() function to read it into a pandas DataFrame in my Jupyter Notebook.
Here is the code I used, with the specific file ID that I want to read.
# read into pandasDF from .csv stored on Google Drive.
follow_network_df = pd.read_csv("https://drive.google.com/uc?export=download&id=1WqHWdgMVLPKVbFzIIprBBhe3I9faq4HA")
Then here is what I got:
It seems the 170 MB CSV file is read as an HTML link?
Meanwhile, when I tried the same code with another CSV file of 40 MB, it worked perfectly:
# another csv file of 40Mb.
user_behavior_df = pd.read_csv("https://drive.google.com/uc?export=download&id=1NT3HZmrrbgUVBz5o6z_JwW5A5vRXOgJo")
Can anyone give me some hint on the root cause of the difference?
Any ideas on how to read a CSV file of 10 million rows and 170 MB from online storage? I know it's possible to read the 10 million rows into a pandas DataFrame using the BigQuery interface or from the local machine, but I have to include this as part of my submission, so I can only read from an online source.
The problem is that your first file is too large for Google Drive to scan for viruses, so there's a user prompt that gets displayed instead of the actual file. You can see this if you access the first file's link.
I'd say click on the user prompt and use the following url with pd.read_csv.
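As a sketch of one possible workaround (the confirm=t parameter and the HTML check are assumptions about how the Drive warning page behaves, not a guaranteed fix):

import io
import pandas as pd
import requests

# Same file id as in the question above.
file_id = "1WqHWdgMVLPKVbFzIIprBBhe3I9faq4HA"
# confirm=t is an assumption: it often skips the virus-scan warning page.
url = f"https://drive.google.com/uc?export=download&id={file_id}&confirm=t"

resp = requests.get(url)
resp.raise_for_status()

# If Drive still returned the warning page, the body is HTML rather than CSV.
if resp.text.lstrip().startswith("<"):
    raise RuntimeError("Got the Drive warning page instead of the CSV file")

follow_network_df = pd.read_csv(io.StringIO(resp.text))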