downloading a CSV file without a direct link or API

I am trying to find a solution to download a CSV file from an internal company website.
My first thought was to use an API, but one doesn't exist, at least not yet.
Second, I figured I would try a direct link to the CSV file on the website, but that's also a no-go: the link is sort of static, and even copy-pasting it into a new tab doesn't work.
My third idea was to record mouse movements, but that's too inefficient, as I would like this process to run in the background.
Please, I welcome any ideas, code, software, anything. The most preferable solution would be in VBA, then Python, then anything else.
I'm sorry if this is not a Stack Overflow-worthy question :/

Related

How to automate uploading files and do some tasks on website with python?

I'm new to this and I could really use help. I put my designs on a site (https://society6.com - not my site), but it's not a simple upload process, because the website doesn't load metadata (title, description, and keywords), which takes a lot of time. I was wondering how I can automate the whole process: uploading, ticking all the required checkboxes (always the same), copying the metadata, and finally publishing.
I use Linux and I would like to upload multiple images, one after another.
I'm comfortable with Python, but I have never used modules like these. I have found two: Selenium (https://selenium-python.readthedocs.io/) and PyAutoGUI (https://pyautogui.readthedocs.io/en/latest/). Can you recommend the better one for me and help me save time? And if you know of similar projects, please let me know about them.
Thanks in advance!

My Google Docs getting weird in formatting and difficult to edit

Crop of my google docs
So, I don't know what happened to my Google Docs, but everything was fine until I tried to open the ".xlsx" file using the openpyxl module in Python.
I am also not sure whether it was caused by the module itself or by something else.
Basically, my Google Docs document has become difficult to edit. When I try to move my cursor to certain points, it always jumps to the beginning of the sentence on that line. Next, if I type something, it always appears at the section "IT APPEARS HERE". Lastly, if I type something without a space or Enter, it does not wrap to a new line but goes past the page border. I don't know what is going on here.
I am wondering if it was caused by my trial and error in the Python code, but there seems to be no direct correlation. So I need your advice on this. I really appreciate your help.

.xlsx is the file format for Microsoft Excel; try opening it with Google Sheets.

Might be related to the issue described here: https://www.androidpolice.com/2021/04/14/if-youre-using-an-ad-blocker-in-google-docs-you-could-run-into-weird-problems/
Basically: Disable your adblocker for google docs for now.

Download latest version from Github, unzip to folder and overwrite contents?

I have a Node project that's bundled and added to GitHub as releases. At the moment, it checks my GitHub for a new release via the API and lets the user download it. The user must then stop the Node server, unzip the release.zip to the folder and overwrite everything to update the project.
What I'm trying to do is write a Python script that I can execute in Node by spawning a new process. This will kill the Node server using PM2, and the Python script will then check the GitHub API, grab the download URL, download the file, unzip the contents to the current folder, delete the zip, and start the Node server again.
What I'm struggling with, though, is checking the GitHub API and downloading the latest release file. Can anyone point me in the right direction? I've read that wget shouldn't be used in Python and that urlopen should be used instead.

If you are asking for ways to get data from a web server, the two main libraries are:
Requests
Urllib
Personally, I prefer requests. They both have good documentation.
With requests, getting JSON data is as simple as:
import requests
r = requests.get("https://example.com")  # use a URL that returns JSON
data = r.json()
You can add headers and other information easily, and requests supports both HTTP and HTTPS (it verifies TLS certificates by default).
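For the GitHub part of the question, here is a minimal sketch using only the standard library (urllib, the other option listed above). It assumes the public GitHub REST endpoint /repos/{owner}/{repo}/releases/latest and that the release has a .zip asset attached; the function names, the User-Agent string, and the owner/repo values are hypothetical and not taken from the question.

```python
import io
import json
import urllib.request
import zipfile

# Assumed endpoint: GitHub's REST API for the latest release of a repository.
API_URL = "https://api.github.com/repos/{owner}/{repo}/releases/latest"


def pick_asset_url(release, suffix=".zip"):
    """Return the download URL of the first asset whose name ends with suffix."""
    for asset in release.get("assets", []):
        if asset.get("name", "").endswith(suffix):
            return asset["browser_download_url"]
    return None


def fetch_latest_release(owner, repo):
    """Ask the GitHub API for the latest release and return the parsed JSON."""
    request = urllib.request.Request(
        API_URL.format(owner=owner, repo=repo),
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "updater-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def download_and_unzip(url, dest="."):
    """Download a zip archive into memory and extract it into dest."""
    request = urllib.request.Request(url, headers={"User-Agent": "updater-sketch"})
    with urllib.request.urlopen(request) as response:
        data = response.read()
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(dest)  # existing files with the same names are overwritten
        return archive.namelist()
```

A possible call sequence would be fetch_latest_release("some-owner", "some-repo"), then pick_asset_url on the result, then download_and_unzip on the returned URL. Because the archive is held in memory, there is no zip file to delete afterwards; stopping and restarting the server with PM2 is left out of this sketch.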

You need to map out your workflow and dataflow better. You can do it in words or pictures. If you can express your problem clearly and completely in words, step by step in list format, then translate it to pseudocode. Python is great because you can go almost immediately from a good written description, to pseudocode, to a working implementation. Then at least you have something that works, and you can optimize performance or simplify functionality and usability from there. This is the process of translating a problem into a solution.
When asking questions on SO, you need to show your current thinking, what you've already tried, preferably with your code that doesn't yet work, or work the way you need it to work. People can vote you down and give you negative reputation points if you ask a question with just a vague description, a question that is an obvious cry for help with homework (yours is not that), or a muse or a vague question with not even an attempt at a solution, because it does not contribute back to the community in any way.
Do you have any code or detailed pseudocode steps for checking the GitHub API and checking for the "latest release" of file(s) you are trying to update?

Code for web crawling with Python 2.7.3 in mac terminal?

I am a social scientist and a complete newbie/noob when it comes to coding. I have searched through the other questions/tutorials but am unable to get the gist of how to crawl a news website targeting the comments section specifically. Ideally, I'd like to tell python to crawl a number of pages and return all the comments as a .txt file. I've tried
from bs4 import BeautifulSoup
import urllib2
url="http://www.xxxxxx.com"
and that's as far as I can go before I get an error message saying bs4 is not a module. I'd appreciate any kind of help on this, and please, if you decide to respond, DUMB IT DOWN for me!
I can run wget on terminal and get all kinds of text from websites which is awesome IF I could actually figure out how to save the individual output html files into one big .txt file. I will take a response to either question.

Try Scrapy. It is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

You will most likely encounter this as you go, but in some cases, if the site is employing 3rd party services for comments, like Disqus, you will find that you will not be able to pull the comments down in this manner. Just a heads up.
I've gone down this route before and have had to tailor the script to a particular site's layout/design/etc.
I've found libcurl to be extremely handy, if you don't mind doing the post-processing using Python's string handler functions.
If you don't need to implement it purely in Python, you can make use of wget's recursive mirroring option to handle the content pull, then write your python code to parse the downloaded files.
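As a rough sketch of the "parse the downloaded files" step, the following collects the visible text of every .html file in a folder (for example one produced by wget) into a single .txt file. It is written for Python 3 and uses only the standard library rather than BeautifulSoup; the class and function names are made up for illustration, and it does not try to isolate the comments section, which depends on each site's markup.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def merge_html_files(source_dir, output_file):
    """Write the text of every .html file under source_dir into output_file."""
    paths = sorted(Path(source_dir).rglob("*.html"))
    with open(output_file, "w", encoding="utf-8") as out:
        for path in paths:
            html = path.read_text(encoding="utf-8", errors="replace")
            out.write("==== %s ====\n%s\n\n" % (path, html_to_text(html)))
    return len(paths)
```

Calling merge_html_files with the mirror folder and a target file name writes one block per page, each preceded by a header line with the file path, and returns the number of pages processed.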

I'll add my two cents here as well.
The first things to check are that you installed beautiful soup, and that it lives somewhere that it can be found. There's all kinds of things that can go wrong here.
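A quick way to run that check, sketched here for Python 3, is to ask the interpreter whether it can find the module at all. The helper name is made up; importlib.util.find_spec is part of the standard library.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if the running interpreter can find the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Show which interpreter is running and whether it can see bs4.
print("Interpreter:", sys.executable)
print("bs4 importable:", is_importable("bs4"))
```

If this reports that bs4 is not importable, the package (published as beautifulsoup4) is probably installed for a different interpreter than the one printed, or not installed at all.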
My experience is similar to yours: I work at a web startup, and we have a bunch of users who register, but give us no information about their job (which is actually important for us). So my idea was to scrape the homepage and the "About us" page from the domain in their email address, and try to put a learning algorithm around the data that I captured to predict their job. The results for each domain are stored as a text file.
Unfortunately (for you...sorry), the code I ended up with was a bit complicated. The problem is that you'll end up getting a lot of garbage when you do the scraping, and you'll have to filter it out. You'll also end up with encoding issues, and (assuming you want to do some learning here) you'll have to get rid of low-value words. The total code is about 1000 lines, and I'll post some important pieces that may help you out here, if you're interested.
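To give an idea of the "get rid of low-value words" step, here is a tiny sketch. It is not the 1000-line code mentioned above; the stop-word list, the minimum length, and the function name are all illustrative assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "we", "our"}


def keywords(text, stop_words=STOP_WORDS, min_length=3):
    """Count the words in text, dropping stop words and very short words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(
        word for word in words
        if word not in stop_words and len(word) >= min_length
    )
```

The resulting counts can then be written to the per-domain text file or fed to whatever learning step comes next.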

Create a 'single-serving site' with python

I want to make a Python script available as a service on the net. The script, which is my first 'proper' Python program, takes a txt file as argument and writes an image into the work directory. So:
How difficult is it for somebody who is new to Python and web development?
How much work is it?
Do I need a framework (Django, cherryPy, web2py)?
Are there good tutorials?
How do I avoid the server to be compromised?
What are my next steps?
==> What is the easiest way?
In the end, a white page with some text and a button that opens a file dialog when clicked is enough. After the txt is processed, the server should just return the image, which was written to the hard drive. Through a friend, I already have access to a server with Ubuntu installed.
[update]
Thanks for all your answers. After reading them I want to stress again, that I want to have it as minimal as possible. Srikar's suggestion sounds like the easiest one:
Put it in executable directory of your OS (commonly known as CGI
path). Provide a simple HTML form & upon form submission hit this
script which executes & returns back the image you want to display.
Any objections or comments? Do you know any tutorials for that?
[update2]
I found this SO answer: File Sharing Site in Python. Is this a sensible approach?

It's not too difficult. Actually, it sounds like a good first project.
That's too subjective to answer. An hour to days.
No, you don't need one, but I'd use one if I were you. They abstract away some of the stuff you really don't care about, and you'll learn a tool you can use again in the future.
Plenty. If you want a real rundown of how Python works for the web, read the HOWTO from Python.org. If you just want to learn how to do this one project, pick a framework and do their tutorial.
This question is so broad and complex that I'm not going to try to answer it. Search this site, or Google, for questions like that.
Your next step should be to pick a framework; I've used Django successfully. Just download it, follow the installation instructions, and work your way through their tutorial; it should tell you everything you need to know to do what you want. If you still have questions once you've learned how to do the basics, come back and ask again!
Edit: The answer to that other question will certainly work for you. There, they just receive a GET request and respond with data from a Python file. You need to receive a GET request, respond with an HTML page (easy enough), then respond to a POST request that includes an uploaded file (slightly more complicated) and run your python routine on the uploaded file and then respond with the created image (or a link to it).
Take a look at this page which includes a simple Python script to do file uploads. You should easily be able to modify it to do what you want.
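The GET/POST flow described above can be sketched with the standard library alone. This is not the script from the linked page and not a framework-based solution; it uses wsgiref for the server and the email package to parse the multipart form body. make_image is a hypothetical stand-in for the asker's script (it returns a trivial PBM image), and the form field name, size limit, and port are assumptions.

```python
from email import message_from_bytes
from wsgiref.simple_server import make_server

MAX_BYTES = 1000000  # assumed upload limit, to avoid reading huge bodies

FORM = b"""<!doctype html>
<form method="post" enctype="multipart/form-data">
  <input type="file" name="txt">
  <button type="submit">Upload</button>
</form>"""


def make_image(text):
    """Hypothetical placeholder for the real script: one black pixel per character."""
    width = max(len(text), 1)
    return ("P1\n%d 1\n%s\n" % (width, " ".join("1" * width))).encode("ascii")


def extract_file(content_type, body):
    """Return the bytes of the first uploaded file in a multipart/form-data body."""
    header = b"Content-Type: " + content_type.encode("latin-1") + b"\r\n\r\n"
    message = message_from_bytes(header + body)
    for part in message.walk():
        if part.get_filename():
            return part.get_payload(decode=True)
    return None


def app(environ, start_response):
    """WSGI app: GET returns the form, POST returns the generated image."""
    if environ["REQUEST_METHOD"] == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        if length > MAX_BYTES:
            start_response("413 Payload Too Large", [("Content-Type", "text/plain")])
            return [b"Upload too large"]
        body = environ["wsgi.input"].read(length)
        data = extract_file(environ.get("CONTENT_TYPE", ""), body)
        if data is None:
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b"No file uploaded"]
        image = make_image(data.decode("utf-8", errors="replace"))
        start_response("200 OK", [("Content-Type", "image/x-portable-bitmap")])
        return [image]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [FORM]


def serve(port=8000):
    """Run the app locally; binds to 127.0.0.1 only."""
    make_server("127.0.0.1", port, app).serve_forever()
```

Calling serve() starts the server on localhost. The uploaded text is handled in memory and never written to disk under a user-supplied name, which sidesteps one common source of trouble; anything exposed to the internet would need more care than this sketch takes.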

How difficult is it for somebody who is new to Python and web development?
Depends on your level of knowledge.
How much work is it?
Depends on which method you choose to solve the problem.
Do I need a framework (Django, cherryPy, web2py)?
Not necessarily - you could get started by using the CGI (http://docs.python.org/library/cgi.html)
Are there good tutorials?
Yes, there are plenty. The Python docs are an excellent place to start.
How do I avoid the server to be compromised?
Again, depends on the method you choose to solve the problem, although there are commonalities.
What are my next steps?
Dare I say it again, choose a method, read the docs, have a play!

If it's just as simple as you have described, then you might not even need Django. You could simply use CGI scripting. All of these design decisions depend on whether:
You need (or foresee) a SQL storage?
or a Content-Management-System?
Will you need multiple-user support?
Do you need tight security?
Do you need different privileges for different users?
Do you need an Admin to manage your site?
If the answer to at least 60% of the above questions is yes, then you might consider Django. Otherwise, just write a Python script. Put it in executable directory of your OS (commonly known as CGI path). Provide a simple HTML form & upon form submission hit this script which executes & returns back the image you want to display. So, it all depends on the features you need...

In the end, I created what I needed with Flask.
They have a well-documented pattern/tutorial on Uploading Files. The tutorial is understandable even for people with little Python and web experience.
It took me 2h to get a first working version, and the resulting code was only 50 lines. This includes starting the web server, having an HTML file/form with file upload, and serving a file back to the user.
