Use pandas (Python) to read really big CSV files from Google Drive - python

Good afternoon!
While using pandas to read CSV data files larger than 500 MB from my Drive, instead of getting the CSV file I receive the "Google Drive can't scan this file for viruses" HTML page. I've tried a lot but can't find a workaround. Can anyone tell me if it's possible to bypass that?
Sample file:- https://drive.google.com/file/d/1EQbD11iRnbXVJMZNTVExfrRP5WYIcAjk/view
PS: Can someone also suggest a better (preferably free) service for uploading multiple big CSV files so that I can use pandas to get the data from them? I have >40 GB of data to work with.
Thanks :)

I found this and it's working for me as of 14/10/2020, though it has since been removed from the documentation: http://web.archive.org/web/20190621105530/https://developers.google.com/drive/api/v3/manage-downloads

Related

Read CSV in SharePoint to DataFrame and Upload back to Sharepoint as CSV

I have no issues loading the data using the sharepy module, but I am struggling to upload it back to SharePoint.
If anyone has a textbook method (I don't mind using another module than sharepy), I'd be very grateful for the help.
I know posting code helps, but frankly there is not much to show; I've been trying bits and pieces found in various forums, and I'm open to any library out there.
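One approach, sketched below under assumptions: since sharepy's session wraps authenticated requests, you can POST the CSV bytes to SharePoint's REST `Files/add` endpoint. The site host, folder path, and function names here are made up for illustration; adjust them to your tenant:

```python
import pandas as pd

def upload_endpoint(site, folder, filename):
    """Build the SharePoint REST endpoint for adding a file to a folder."""
    return (
        f"https://{site}/_api/web"
        f"/GetFolderByServerRelativeUrl('{folder}')"
        f"/Files/add(url='{filename}',overwrite=true)"
    )

def upload_df(session, df, site, folder, filename):
    """Serialise the DataFrame to CSV bytes and POST it back to SharePoint."""
    payload = df.to_csv(index=False).encode("utf-8")
    return session.post(upload_endpoint(site, folder, filename), data=payload)

# Usage with sharepy (same module used for the download):
# import sharepy
# s = sharepy.connect("example.sharepoint.com")
# upload_df(s, df, "example.sharepoint.com",
#           "/sites/team/Shared Documents", "out.csv")
```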

Is there a function to download a pickle file via requests.post(url) and load it into a dataframe without saving it locally

I am trying to download a pickle file from a web-based API via the requests.post(url) function in Python. I was able to download and load the pickle file into a dataframe, however I had to save it locally before loading it. I wanted to check if there is a way to load the pickle file directly into the dataframe without having to save it locally. I was able to do it for CSV files (as seen below), but not for pickle files:
import io
import pandas as pd
import requests

r = requests.post(url)
data = r.content.decode()
df = pd.read_csv(io.StringIO(data), header=0, engine=None)
Any help is appreciated, thanks.
Just a guess at something that might work for you, since it looks like the pickle file contains text/CSV-like data:
df = pd.read_csv(io.StringIO(pd.read_pickle(url)), header=0, engine=None)
Thanks for your suggestion. Basically I ended up using the following, and it worked:
r = requests.post(url)
data = r.content
df = pd.read_pickle(io.BytesIO(data))

How to convert a .csv file to .bag in Python?

I have some VLP-16 LiDAR data in .csv file format and have to load the data into ROS RViz, for which I need a rosbag file (.bag). I have tried finding it in the ROS tutorials; what I found was how to convert .bag to .csv.
I'm not actually an expert in processing .bag files, but I think you need to go through your CSV file and manually add the values using the rosbag Python API.
Not a direct answer, but check this Python script, which might help you.
Regarding C++, I propose this repository: convert_csv_to_rosbag, which is even closer to what you asked for.
However, it seems that you need to do it yourself based on these examples.
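The "go through the CSV and add values via the rosbag API" idea might look like the sketch below. The CSV column names (time, x, y, z, intensity) are assumptions about the VLP-16 export; the rosbag part is left as comments because it only runs inside a ROS environment:

```python
import csv
from collections import defaultdict

def group_points_by_stamp(csv_path):
    """Group LiDAR CSV rows into per-timestamp point lists.

    Assumes columns named time, x, y, z, intensity -- adjust to your export.
    """
    clouds = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            clouds[float(row["time"])].append(
                (float(row["x"]), float(row["y"]),
                 float(row["z"]), float(row["intensity"]))
            )
    return dict(clouds)

# In a ROS environment, each group would become one PointCloud2 message:
# import rosbag, rospy
# import sensor_msgs.point_cloud2 as pc2
# from std_msgs.msg import Header
# with rosbag.Bag("out.bag", "w") as bag:
#     for stamp, points in group_points_by_stamp("scan.csv").items():
#         header = Header(frame_id="velodyne", stamp=rospy.Time.from_sec(stamp))
#         cloud = pc2.create_cloud_xyz32(header, [p[:3] for p in points])
#         bag.write("/velodyne_points", cloud, t=header.stamp)
```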

Where to upload a csv to then read it when coding on Jupyter

I need to upload a few CSV files somewhere on the internet, to be able to use them in Jupyter later using read_csv.
What would be some easy ways to do this?
The CSV contains a database. I want to upload it somewhere and use it in Jupyter using read_csv so that other people can run the code when I send them my file.
Since the CSV contains a database, I would not suggest uploading it on Github as mentioned by Steven K in the previous answer. It would be a better option to upload it to either Google Drive or Dropbox as rightly said in the previous answer.
To read the file from Google Drive, you could try the following:
Upload the file on Google Drive, click on "Get Shareable Link", and ensure that anybody with the link can access it.
Click on copy link and get the file ID associated with the CSV.
Ex: If this is the URL https://drive.google.com/file/d/108ARMaD-pUJRmT9wbXfavr2wM0Op78mX/view?usp=sharing then 108ARMaD-pUJRmT9wbXfavr2wM0Op78mX is the file ID.
Simply use the file ID in the following sample code
import pandas as pd
gdrive_file_id = '108ARMaD-pUJRmT9wbXfavr2wM0Op78mX'
data = pd.read_csv(f'https://docs.google.com/uc?id={gdrive_file_id}&export=download', encoding='ISO-8859-1')
Here you are opening up the CSV to anybody with access to the link. A better and more controlled approach would be to share the access with known people and use a library like PyDrive which is a wrapper around Google API's official Python client.
NOTE: Since your question does not mention the version of Python you are using, I've assumed Python 3.6+ and used an f-string in the last line of the code. On any version before 3.6, you would have to use the str.format method to substitute the variable into the string.
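For reference, the pre-3.6 equivalent of that line with str.format looks like this:

```python
import pandas as pd

gdrive_file_id = '108ARMaD-pUJRmT9wbXfavr2wM0Op78mX'
url = 'https://docs.google.com/uc?id={}&export=download'.format(gdrive_file_id)
# data = pd.read_csv(url, encoding='ISO-8859-1')  # same call as above
```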
You could use any cloud storage provider like Dropbox or Google Drive. Alternatively, you could use Github.
To do this in your notebook, import pandas and call read_csv as you normally would for a local file, passing the raw URL instead of a path.
import pandas as pd

url = "https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
c = pd.read_csv(url)

How to create insights from an Excel file using either Python or R? I'm a beginner in both languages

I have a large Excel spreadsheet which has lots and lots of historical data of an organisation. I want to be able to read that Excel file and create valuable insights from it. I don't expect anyone to do this for me, but I'm just hoping someone can point me to how I should go about doing this in Python/R, or suggest any online resource I can access to get this done.
You should take a look at openpyxl: http://openpyxl.readthedocs.io/en/default/.
It is a great library for reading, writing and processing Excel spreadsheets in Python. You can easily process the data and extract valuable insights from it.
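As a minimal sketch of what "insights" might look like: pandas can load the sheet directly (using openpyxl under the hood for .xlsx files) and then aggregate it. The file name and column names below are invented for illustration; a small inline DataFrame stands in for the real spreadsheet:

```python
import pandas as pd

# With the real file you would start from the spreadsheet itself, e.g.:
# df = pd.read_excel("history.xlsx", engine="openpyxl")
# The columns below are made-up placeholders for historical org data.
df = pd.DataFrame({
    "year":    [2019, 2019, 2020, 2020, 2021, 2021],
    "region":  ["North", "South", "North", "South", "North", "South"],
    "revenue": [120, 80, 150, 95, 170, 110],
})

# A first insight: totals per year, and year-on-year growth.
per_year = df.groupby("year")["revenue"].sum()
growth = per_year.pct_change()
print(per_year)
print(growth)
```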
