[Solved] Why am I getting these strange connection errors when reading or writing to Hadoop file system with a Python script?

I wrote a python code to read and write to a hadoop file system with IP hdfs_ip. It takes 3 arguments –

  1. local_file_path
  2. remote_file_path
  3. read_or_write

When I want to run it, I will first need to export the namenode IP, and then run the python code with the 3 arguments. However, I am getting some strange errors.

$ export namenode='http://hdfs_ip1:50470,http://hdfs_ip2:50470'
$ python3 python_hdfs.py ./1.png /user/testuser/new_1.png read
['http://hdfs_ip1:50470', 'http://hdfs_ip2:50470']
./1.png /user/testuser/new_1.png
http://hdfs_ip1:50470
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 377, in _make_request
    httplib_response = conn.getresponse(buffering=True)
TypeError: getresponse() got an unexpected keyword argument 'buffering'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 560, in urlopen
    body=body, headers=headers)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 379, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.5/http/client.py", line 1197, in getresponse
    response.begin()
  File "/usr/lib/python3.5/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.5/http/client.py", line 279, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: 


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 376, in send
    timeout=timeout
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 610, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 247, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/lib/python3/dist-packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 560, in urlopen
    body=body, headers=headers)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 379, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.5/http/client.py", line 1197, in getresponse
    response.begin()
  File "/usr/lib/python3.5/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.5/http/client.py", line 279, in _read_status
    raise BadStatusLine(line)
requests.packages.urllib3.exceptions.ProtocolError: ('Connection aborted.', BadStatusLine('x15x03x03x00x02x02n',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "python_hdfs.py", line 63, in <module>
    status, name, nnaddress= check_node_status(node)
  File "python_hdfs.py", line 18, in check_node_status
    request = requests.get("%s/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"%name,verify=False).json()
  File "/usr/lib/python3/dist-packages/requests/api.py", line 67, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 426, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine('x15x03x03x00x02x02n',))

What might be the issue here, if anyone could point out please?


Below is the Python script I am trying to run, python_hdfs.py:

import requests
import json
import os
import kerberos
import sys

node = os.getenv("namenode").split(",")
print (node)

local_file_path = sys.argv[1]
remote_file_path = sys.argv[2]
read_or_write = sys.argv[3]
print (local_file_path,remote_file_path)

def check_node_status(node):
    for name in node:
        print (name)
        request = requests.get("%s/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"%name,
                               verify=False).json()
        status = request["beans"][0]["State"]
        if status =="active":
            nnhost = request["beans"][0]["HostAndPort"]
            splitaddr = nnhost.split(":")
            nnaddress = splitaddr[0]
            print(nnaddress)
            break
    return status,name,nnaddress

def kerberos_auth(nnaddress):
    __, krb_context = kerberos.authGSSClientInit("[email protected]%s"%nnaddress)
    kerberos.authGSSClientStep(krb_context, "")
    negotiate_details = kerberos.authGSSClientResponse(krb_context)
    headers = {"Authorization": "Negotiate " + negotiate_details,
                "Content-Type":"application/binary"}
    return headers

def kerberos_hdfs_upload(status,name,headers):
    print("running upload function")
    if status =="active":
        print("if function")
        data=open('%s'%local_file_path, 'rb').read()
        write_req = requests.put("%s/webhdfs/v1%s?op=CREATE&overwrite=true"%(name,remote_file_path),
                                 headers=headers,
                                 verify=False, 
                                 allow_redirects=True,
                                 data=data)
        print(write_req.text)

def kerberos_hdfs_read(status,name,headers):
    if status == "active":
        read = requests.get("%s/webhdfs/v1%s?op=OPEN"%(name,remote_file_path),
                            headers=headers,
                            verify=False,
                            allow_redirects=True)

        if read.status_code == 200:
            data=open('%s'%local_file_path, 'wb')
            data.write(read.content)
            data.close()
        else : 
            print(read.content)


status, name, nnaddress= check_node_status(node)
headers = kerberos_auth(nnaddress)
if read_or_write == "write":
    kerberos_hdfs_upload(status,name,headers)
elif read_or_write == "read":
    print("fun")
    kerberos_hdfs_read(status,name,headers)

Solution #1:

I got the same error when substituting urllib for requests module. The problem is that the exception trace is conceptually in the wrong order. The real problem is the last exception in the trace.

More about this here: https://github.com/requests/requests/issues/2510

In your case a connection error has been raised in python_hdfs.py
This is the real problem

Respondent: Kristada673

The answers/resolutions are collected from stackoverflow, are licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0 .

Leave a Reply

Your email address will not be published.