Hadoop proxy server
Our Hadoop cluster runs on a set of nodes and can only be accessed from those nodes: you SSH in and do your work there.
Since that's quite annoying, but (understandably) nobody wants to go near configuring access control so that the cluster is usable from outside for some people, I'm trying the next best thing, i.e. using SSH to run a SOCKS proxy into the cluster:
$ ssh -D localhost:10000 the.gateway cat
There are whispers of SOCKS support in the Hadoop client (naturally I haven't found any documentation for it), and apparently the relevant settings go into core-site.xml:
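For reference, this is what I've pieced together so far. The property names (hadoop.socks.server and hadoop.rpc.socket.factory.class.default) are the ones I turned up myself, so take them with a grain of salt; the proxy address matches the ssh -D tunnel above:

```xml
<configuration>
  <!-- route client RPC connections through a SOCKS proxy
       (assumption: this is the right socket factory class) -->
  <property>
    <name>hadoop.rpc.socket.factory.class.default</name>
    <value>org.apache.hadoop.net.SocksSocketFactory</value>
  </property>
  <!-- address of the SOCKS proxy opened by ssh -D -->
  <property>
    <name>hadoop.socks.server</name>
    <value>localhost:10000</value>
  </property>
</configuration>
```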
Except hadoop fs -ls / still fails, without any mention of SOCKS.
I'm only trying to run jobs, not administer the cluster. I just need to access HDFS and submit jobs, through SOCKS. (There seems to be an entirely separate topic about using SSL/proxies between the cluster nodes themselves; I don't want that. My machine shouldn't be part of the cluster, just a client.)
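One workaround I'm considering (untested, and an assumption on my part): skipping Hadoop's own SOCKS settings entirely and pushing the proxy down to the JVM level via Java's standard networking properties, assuming the hadoop wrapper script passes HADOOP_OPTS through to the client JVM:

```shell
# Assumption: the hadoop launcher forwards HADOOP_OPTS to the JVM, and
# Java's built-in SOCKS support (socksProxyHost/socksProxyPort) then
# applies to the client's sockets.
export HADOOP_OPTS="-DsocksProxyHost=localhost -DsocksProxyPort=10000"
# then, e.g.:
# hadoop fs -ls /
```

If that works it would at least sidestep the undocumented core-site.xml properties, though it proxies all of the JVM's traffic rather than just Hadoop RPC.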
Is there any useful documentation on this? To illustrate my failure to turn up anything useful: I only found the configuration values above by running the hadoop client under strace -f and looking at which configuration files it read.
Is there a description anywhere of which configuration values the client even reacts to? (I have literally found zero reference documentation, just variously outdated tutorials; I hope I've been missing something.)