Access Hadoop on server from python

First of all, I need to say that I am a girl who knows very little about remote servers. Many of the similar questions asked here are very difficult for me to understand, so I have come to ask.
My task is to write a script that helps me fetch some data from a server.
The data is stored in Hadoop. Usually I log in to the server with a user name and a temporary password and run Hive queries there. Once I have all the data on the server, I download it and then manipulate it on my own computer with Python.
Now I hope to do all of this with one Python script.
I found the thrift package, but I don't know where to begin with it.
Should I install Hive on my computer and run it from my script, or should the script log in to the server and run Hive there?
In either case, can thrift help me log in to the server?
Thanks very much!

Although Thrift can certainly help you, I recommend using a higher-level client. These are usually well tested and keep all (or most) of the low-level details away from you. In particular, HBase looks promising in your case. I'd recommend having a look at that one and comparing it with the Hadoop Python Thrift Client described in this tutorial.
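If a Thrift client turns out to be more than you need, the workflow from the question (log in, run Hive, bring the result back) can also be sketched with the standard library and the ssh command-line client. This is only a sketch under assumptions: the function names and the example host are hypothetical, the hive command is assumed to be on the server's PATH, and ssh will still ask for your password unless key-based login is set up.

```python
import shlex
import subprocess


def build_hive_command(user, host, query):
    """Return the argument list that runs one Hive query on the server over SSH."""
    # "hive -S -e <query>" runs a single query in silent mode on the server.
    remote = "hive -S -e " + shlex.quote(query)
    return ["ssh", f"{user}@{host}", remote]


def parse_hive_output(text):
    """The Hive CLI prints one row per line with tab-separated columns."""
    return [line.split("\t") for line in text.splitlines() if line]


def fetch_rows(user, host, query):
    """Run the query remotely and return the rows as lists of strings."""
    result = subprocess.run(
        build_hive_command(user, host, query),
        capture_output=True,
        text=True,
        check=True,  # raise if ssh or hive exits with an error
    )
    return parse_hive_output(result.stdout)
```

Calling fetch_rows("alice", "hadoop.example.com", "SELECT ...") would then give you the rows directly in Python, so the separate download step is not needed for small results.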

Related

What strategy should I use to periodically extract information from a specific folder

With this question I would like to gain some insights/verify that I'm on the right track with my thinking.
The request is as follows: I would like to create a database on a server. This database should be updated periodically by adding information that is present in a certain folder, on a different computer. Both the server and the computer will be within the same network (I may be running into some firewall issues).
The method I am thinking of using is as follows. Create a tunnel between the two systems. I will run a script that periodically (hourly or daily) searches through the specified directory, converts the files to data and adds it to the database. I am planning to use Python, which I am fairly familiar with.
Note: I don't think I will be able to install Python on the PC with the files.
Is this at all doable? Is my approach solid? Please let me know if additional information is required.

"Create a tunnel between the two systems."
If you mean setting up the firewall between the two machines to allow connections, then yes. Just open the PostgreSQL port. Check postgresql.conf for the port number in case it isn't the default. Also add the right entry to pg_hba.conf so that the computer's IP can connect.
"I will run a script that periodically (hourly or daily) searches through the specified directory, converts the files to data and adds it to the database. I am planning to use Python, which I am fairly familiar with."
Yeah, that's pretty standard. No problem.
"Note: I don't think I will be able to install Python on the PC with the files."
On Windows you can install Anaconda for all users or just the current user. The latter doesn't require admin privileges, so that may help.
If you can't install Python, you can use a packaging tool such as PyInstaller to turn your Python program into an executable that bundles all the libraries, so you just have to drop it into a folder on the computer and execute it.
If you absolutely cannot install anything or execute any program, then you'll have to create a scheduled task that copies the data over the network to a computer that has Python, and run the Python script there, but that's an extra complication.
If the source computer is automatically backed up to a server, you can also use the backup as a data source, but there will be a delay depending on how often it runs.
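The periodic import itself could look roughly like the sketch below. To keep it self-contained it uses the standard library's sqlite3 module as a stand-in for the PostgreSQL server, and the table name, the *.txt pattern and the function name are assumptions made only for illustration.

```python
import sqlite3
from pathlib import Path


def ingest_folder(folder, conn):
    """Add every file that is not in the database yet; return how many were added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (name TEXT PRIMARY KEY, content TEXT)"
    )
    # Remember which files earlier runs have already imported.
    seen = {row[0] for row in conn.execute("SELECT name FROM files")}
    added = 0
    for path in sorted(Path(folder).glob("*.txt")):
        if path.name in seen:
            continue
        conn.execute(
            "INSERT INTO files (name, content) VALUES (?, ?)",
            (path.name, path.read_text()),
        )
        added += 1
    conn.commit()
    return added
```

A scheduler (cron on Linux, Task Scheduler on Windows) would run this hourly or daily. Because already imported files are skipped, running it more often than necessary does no harm.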

Send message to a kafka topic using java

After several weeks of looking for information here and on Google, I've decided to post here to see whether anyone who has had the same problem can give me a hand.
I have a Java application developed in Eclipse Ganymede that uses Tomcat to connect to my local database. I want to send a simple message ("Hello World") to a Kafka topic published on a public server. I've imported the libraries and written the Kafka function, but something goes wrong when I run in debug mode. There are no issues or visible errors when compiling, but when I run the application and press the button that triggers this function, it stops in the KafkaProducer call with a NoClassDefFoundError for kafka.producer..... It seems the library is not being found, although I can see that it is properly imported in the build path.
I am not sure whether the problem is Kafka's compatibility with Eclipse or with the Java SDK (3.6). Could that be it? Does anyone know the minimum Java version required for Kafka?
Also, I have read that Kafka is really used with Scala, and I want to know whether I can keep using this Eclipse IDE version rather than change it.
Another solution I found is to call a Python script from the Java application. I have followed several tutorials and none of them worked, but I have to keep trying because it seems the easier option. I have written the .py script and it works with the Kafka server; now I have to find a way to exchange variables between Java and Python. If anyone knows a good tutorial for this, please let me know.
After this summary of my days of banging my head against the wall, maybe someone has run into this error before and can help me find the solution. I would really appreciate it, and sorry for the long story.

Please include the Kafka client library in the WAR file of the Java application that you are deploying to Tomcat.

Please use org.apache.kafka.clients.producer.KafkaProducer rather than kafka.producer.Producer (which is the old client API) and make sure you have the Kafka client library on the classpath. The client library is entirely in Java. It's the old API that's written in Scala, as is the server-side code. You don't need to import the server library in your code or add it to the classpath if you use the new client API.

In the end, the problem was that the library had not been added correctly. I had to add it in the build.xml file and import the library there. Maybe this is useful for people who use an old Eclipse version.
Now it finds the library, but I have to update the Java version, which is another matter. So this is solved.

Can Python server code be read?

I am working on a Python WebSocket server. I initiate it by running the python server.py command in Terminal. After this, the server runs fine and actually pretty well for what I'm using it for. The server runs on port 8000.
My question is, if I keep the server.py file outside of my localhost directory or any sub-directory, can the Python file be read and the code viewed by anyone else?
Thanks.

It is hard to give a definite yes or no answer, because there are a million ways in which your server may expose the .py file. The crucial point, though, is that your server needs to actively expose the file to the outside world. A computer with no network-enabled services running does not expose anything on the network, period. Only physical access to the computer would allow you access to the file.
From this absolute point, it's a slow erosion of security with every additional service that offers a network component. Your Python server itself (presumably) doesn't expose its own source code; it only offers the services it's programmed to offer. However, you may have other servers running on the machine which actively do offer the file for download, or perhaps can be tricked into doing so. That's where an absolute "No" is hard to give, because one would need to run a full audit of your machine to be able to give a definitive answer.
Suffice it to say that a properly configured server without gaping security holes will not enable users to download the underlying source code through the network.
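To make "actively expose" a little more concrete: a file server usually only hands out files that resolve to somewhere inside its document root. The sketch below shows one way such a check can be written. The function name and the example paths are hypothetical, and real servers each have their own implementation.

```python
import os


def is_inside_root(root, requested):
    """Return True only if the requested path resolves to a location inside root."""
    root = os.path.realpath(root)
    # Resolve "..", symlinks and absolute paths before comparing.
    target = os.path.realpath(os.path.join(root, requested))
    return os.path.commonpath([root, target]) == root
```

With a check like this, a request for "../server.py" is refused because it resolves to a location outside the document root. This is the reason keeping server.py outside the served directory helps, as long as no other service on the machine offers that location.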

How to run remote shell command using python script

I tried to run a remote command to back up my server's database. After dumping it, I can't figure out how to retrieve the data remotely.
https://docs.python.org/2/library/subprocess.html
I found some documentation, but it didn't really help me.

Most databases have some remote-backup service available, so I'd look into that first.
That said, you could use a library that simplifies secure-shell operations. One of those is Fabric, which is based on paramiko.
Fabric was designed for things like remote backups (or deployments). You want to look specifically at its get operation capabilities.
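If installing Fabric is not an option, the same dump-then-fetch flow can be sketched with the subprocess module mentioned in the question and the ssh and scp command-line clients. This is a sketch under assumptions: the function names are hypothetical, the dump command (for example "pg_dump mydb") depends on your database, which the question does not name, and it is passed to the remote shell unchanged.

```python
import shlex
import subprocess


def backup_commands(user, host, dump_cmd, remote_path, local_path):
    """Build the two commands: dump on the server, then copy the file back."""
    target = f"{user}@{host}"
    # dump_cmd is trusted input; only the output path is quoted for the remote shell.
    run_dump = ["ssh", target, f"{dump_cmd} > {shlex.quote(remote_path)}"]
    fetch = ["scp", f"{target}:{remote_path}", local_path]
    return [run_dump, fetch]


def run_backup(user, host, dump_cmd, remote_path, local_path):
    """Run both steps in order and stop at the first failure."""
    for cmd in backup_commands(user, host, dump_cmd, remote_path, local_path):
        subprocess.run(cmd, check=True)
```

The second command plays the role of Fabric's get operation: it copies the dump file from the server to the local machine.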

I advise you to read this documentation.
=> https://docs.python.org/2/library/ssl.html#socket-creation
Best regards,
Greg

How can we call the CLI executables commands using Python

How can we call CLI executables and commands using Python?
For example, I have three Linux servers at a remote location, and I want to execute some commands on them, such as finding the operating system version. How can we do this in Python? I know this is done through some sort of web service (SOAP or REST) or API, but I am not sure. Could you please guide me?

It depends on how you want to design your software.
You could write stand-alone scripts that act as servers listening for requests on specific ports,
or you could use a web server that runs Python scripts, so you just have to access a URL.
REST is one option to implement the latter.
You should then look for frameworks for REST development with Python or, if the logic is simple and there are not many possible requests, you can write it yourself as a web script.
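As a rough illustration of the "write it yourself" option, here is a minimal sketch that uses only the standard library's http.server. The URL names and the whitelist are assumptions made for the example, and a real deployment would also need authentication and encryption.

```python
import json
import platform
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical whitelist: only these read-only queries can be requested.
ALLOWED_QUERIES = {
    "os-version": platform.platform,
    "hostname": platform.node,
}


class InfoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        name = self.path.strip("/")
        query = ALLOWED_QUERIES.get(name)
        if query is None:
            self.send_error(404, "unknown query")
            return
        body = json.dumps({name: query()}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; remove this to get request logs


def make_server(host="127.0.0.1", port=0):
    """Create (but do not start) the server; port 0 lets the OS pick a free port."""
    return HTTPServer((host, port), InfoHandler)
```

Each remote machine would run make_server(...).serve_forever(), and the controlling script would request /os-version from each of them. Exposing a fixed whitelist instead of arbitrary shell commands keeps the service from becoming a remote shell for anyone who can reach the port.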

Maybe you should take a look at Pushy, which lets you connect to remote machines through SSH and make them execute various Python functions. I like using it because there are no server-side dependencies except the SSH server and a Python interpreter, so it is really easy to deploy.
Edit: But if you wish to code this yourself, I think SOAP is a nice solution; the SOAPpy module is great and very easy to use.

You can use Twisted.
It makes it easy to create SSH clients or servers.
Examples:
http://twistedmatrix.com/documents/current/conch/examples/
