Hadoop proxy server

Using Hadoop through a SOCKS proxy?

Our Hadoop cluster runs on a set of nodes and can only be accessed from those nodes: you SSH in and do your work there.

That is quite annoying, but (understandably) nobody wants to go near configuring access control so that the cluster becomes usable from outside for selected users. So I'm trying the next best thing: using SSH to run a SOCKS proxy into the cluster:

$ ssh -D localhost:10000 the.gateway cat
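(The trailing cat just keeps the session from exiting; ssh -N -D 10000 the.gateway should do the same without running a remote command. As a sanity check that the tunnel itself works, something like the following ought to reach a cluster web UI through it; 50070 is the classic NameNode HTTP port and, like the hostname, an assumption about the setup. --socks5-hostname makes curl resolve the name through the proxy, since it probably doesn't resolve locally:)

$ curl --socks5-hostname localhost:10000 http://reachable.from.behind.proxy:50070/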

There are whispers of SOCKS support (naturally I haven't found any documentation), and apparently that goes into core-site.xml:

<property><name>fs.default.name</name><value>hdfs://reachable.from.behind.proxy:1234/</value></property>
<property><name>mapred.job.tracker</name><value>reachable.from.behind.proxy:5678</value></property>
<property><name>hadoop.rpc.socket.factory.class.default</name><value>org.apache.hadoop.net.SocksSocketFactory</value></property>
<property><name>hadoop.socks.server</name><value>localhost:10000</value></property>
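(For one-off tests, the same settings should also be passable per invocation through Hadoop's generic -D options, leaving core-site.xml untouched; a sketch, assuming the proxy from above is listening:)

$ hadoop fs \
    -D hadoop.rpc.socket.factory.class.default=org.apache.hadoop.net.SocksSocketFactory \
    -D hadoop.socks.server=localhost:10000 \
    -ls /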

Either way, hadoop fs -ls / still fails, with no mention of SOCKS anywhere in its output.
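(One way to check whether the SOCKS factory is being picked up at all is to turn on client debug logging; the stock hadoop scripts honor HADOOP_ROOT_LOGGER:)

$ HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls / 2>&1 | grep -i -e socks -e socketfactory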

Any tips?

I'm only trying to run jobs, not administer the cluster: all I need is to access HDFS and submit jobs, through SOCKS. (There seems to be an entirely separate topic of SSL/proxies between the cluster nodes; I don't want that. My machine shouldn't be part of the cluster, just a client.)

Is there any useful documentation on this? To illustrate how little I've managed to turn up: I found the configuration values above by running the hadoop client under strace -f and looking at the configuration files it read.
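(Roughly along these lines; the exact syscall filter is an approximation:)

$ strace -f -e trace=open,openat hadoop fs -ls / 2>&1 | grep '\.xml'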

Is there a description anywhere of which configuration values the client even reacts to? (I have found literally zero reference documentation, just tutorials in various states of outdatedness; I hope I've been missing something.)

Source: stackoverflow.com
