Friday, May 17, 2019

CDH 6.2 impala-shell requires Python 2.7.9 or later to support TLS 1.2

On CDH 6.2, if you lock down the TLS protocol used by Impala to TLS 1.2 and disable the older versions (i.e. --ssl_minimum_version=tlsv1.2), impala-shell breaks and fails to connect to Impala with this error:

Error connecting: TTransportException, Could not connect to localhost:21000: 
[Errno 8] _ssl.c:504: EOF occurred in violation of protocol

This is because impala-shell in CDH 6.2 uses a newer version of Thrift, which requires Python 2.7.9 or higher to support TLS 1.2. Since RHEL 7.6 ships with Python 2.7.5 by default, you have to either upgrade Python or configure Impala to accept TLS 1.0.
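
The version requirement can be checked up front on the host that runs impala-shell. The following is a minimal sketch and not part of impala-shell itself; the helper name supports_tls12 is hypothetical. It also probes for ssl.PROTOCOL_TLSv1_2, which the standard ssl module gained in Python 2.7.9.

```python
import ssl
import sys

# Minimum interpreter version that can negotiate TLS 1.2, per the post.
MIN_TLS12_PYTHON = (2, 7, 9)


def supports_tls12(version_info):
    """Return True if a (major, minor, micro) version is 2.7.9 or later."""
    return tuple(version_info[:3]) >= MIN_TLS12_PYTHON


if __name__ == "__main__":
    # Combine the version check with a direct probe of the ssl module.
    ok = supports_tls12(sys.version_info) and hasattr(ssl, "PROTOCOL_TLSv1_2")
    print("Python %d.%d.%d" % tuple(sys.version_info[:3]))
    print("TLS 1.2 capable: %s" % ok)
```

Run it with the same interpreter that impala-shell uses. On a stock RHEL 7.6 host with Python 2.7.5 the check is expected to report that TLS 1.2 is not available.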

The limitation is documented in the following JIRAs:

https://issues.apache.org/jira/browse/IMPALA-6990
https://issues.apache.org/jira/browse/IMPALA-8407

"When impala-shell is used to connect to an impala cluster with --ssl_minimum_version=tlsv1.2, if the Python version being used is < 2.7.9 the connection will fail due to a limitation of TSSLSocket."

Thursday, January 3, 2019

Cloudera Parcel Stuck in Activating State

Has it ever happened to you that a new parcel you were distributing got stuck in the activating state? It can be very annoying, as Cloudera Manager may not show any error and does not allow you to cancel the activation. Deleting the parcel from the Cloudera Manager parcel repository and restarting the Cloudera Manager server and all the agents does not help either.

This situation may happen when one or two nodes hit a problem downloading or activating the parcel. A common cause is wrong permissions or ownership on one of the parcel directories on those nodes. One thing you can try is to abort the parcel activation using the Cloudera Manager REST API. To do this, use curl to invoke the deactivate command, as in the following example:
curl -k -X POST -u admin:admin https://localhost:7183/api/v17/clusters/mycluster/parcels/products/SPARK2/versions/2.3.0.cloudera5-1.cdh5.13.3.p0.802571/commands/deactivate
Replace the cluster name, product name and version in the URL to match your environment. If you are unsure of the product name and version, use curl to retrieve the information:
curl -k -u admin:admin https://localhost:7183/api/v17/clusters/mycluster/parcels
The command returns a JSON result like the one below:
{
  "items" : [ {
    "product" : "CDH",
    "version" : "5.15.3-1.cdh5.15.3.p0.77",
    "stage" : "ACTIVATED",
    "clusterRef" : {
      "clusterName" : "Cluster 1"
    }
  }, {
    "product" : "SPARK2",
    "version" : "2.3.0.cloudera5-1.cdh5.13.3.p0.802571",
    "stage" : "ACTIVATING",
    "clusterRef" : {
      "clusterName" : "Cluster 1"
    }
  }, {
  .....
  } ]
}
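
Picking the stuck parcel out of that response and assembling the deactivate URL can also be scripted. The following is a minimal Python 3 sketch under stated assumptions: the function names are hypothetical, the base URL and API version (v17) are copied from the curl examples above, and the sample data mirrors the JSON shown. It only builds the URL; sending the POST is left to curl or an HTTP client of your choice.

```python
import json
from urllib.parse import quote

# Trimmed sample of the parcels response shown above.
SAMPLE = """
{"items": [
  {"product": "CDH", "version": "5.15.3-1.cdh5.15.3.p0.77",
   "stage": "ACTIVATED", "clusterRef": {"clusterName": "Cluster 1"}},
  {"product": "SPARK2", "version": "2.3.0.cloudera5-1.cdh5.13.3.p0.802571",
   "stage": "ACTIVATING", "clusterRef": {"clusterName": "Cluster 1"}}
]}
"""


def find_activating_parcels(parcels_json):
    """Return (cluster, product, version) for each parcel in ACTIVATING stage."""
    items = json.loads(parcels_json).get("items", [])
    return [
        (p["clusterRef"]["clusterName"], p["product"], p["version"])
        for p in items
        if p.get("stage") == "ACTIVATING"
    ]


def deactivate_url(base_url, cluster, product, version):
    """Build the deactivate command URL used in the curl example."""
    # quote() percent-encodes characters such as the space in "Cluster 1".
    return "%s/clusters/%s/parcels/products/%s/versions/%s/commands/deactivate" % (
        base_url.rstrip("/"),
        quote(cluster, safe=""),
        quote(product, safe=""),
        quote(version, safe=""),
    )


if __name__ == "__main__":
    for cluster, product, version in find_activating_parcels(SAMPLE):
        print(deactivate_url("https://localhost:7183/api/v17",
                             cluster, product, version))
```

Note that a cluster name containing a space, such as "Cluster 1", has to be percent-encoded in the URL path, which is why the sketch passes every path segment through quote().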

Monday, October 8, 2018

Set Up DBeaver to Connect to Impala Using Kerberos

With Cloudera, you can use the Hue web interface to run your SQL queries, but sometimes you may find it easier and more convenient to use a SQL editor tool. One of the free SQL editor tools available is DBeaver, but it is a bit tricky to set up the connection to Impala using JDBC if your Cloudera cluster is fully secured with TLS and Kerberos. Here are the steps that may help you get it to connect successfully.

1. Download the JDBC drivers

The very first step is of course to download the latest Cloudera JDBC drivers from their website. Extract the downloaded zip file and you will see two other zip files containing the JDBC JAR files - ImpalaJDBC4.jar and ImpalaJDBC41.jar.

For this post, I am using version 2.6.4 and the ImpalaJDBC41.jar driver.

2. TLS Certificate Truststore

If your cluster has TLS enabled, you will need to have the root and intermediate CA certificates in Java keystore format as your truststore.

3. Create a New Connection in DBeaver

Start DBeaver on your laptop and create a new connection.

Navigate to Hadoop and select Cloudera Impala.

Enter the following information:


  • Host - hostname of the Impala daemon (coordinator), or the load balancer if there is one. The hostname has to be fully qualified (FQDN) for Kerberos to work.
  • Database/Schema - the name of the Impala database to connect to.
Next, click on "Edit Driver Settings". If this is the first time you are configuring the Impala driver, you need to add the JDBC driver JAR file to Libraries.

We also need to configure the JDBC connection string to include the Kerberos and TLS properties. Click on "Connection Properties" and add the following properties:


  • AuthMech - the authentication mechanism to use. Value of 1 indicates Kerberos.
  • KrbHostFQDN - the fully qualified domain name (FQDN) of the host you are connecting to. This should be the FQDN of the load balancer if you are connecting via the load balancer.
  • KrbRealm - Kerberos realm of the Cloudera cluster.
  • KrbServiceName - the Kerberos service name. In this case it is impala.
  • SSL - 1 if the Cloudera cluster has SSL/TLS enabled.
  • SSLTrustStore - the path to the truststore in Java keystore format that contains the root CA certificate and any intermediate CA certificates.
  • SSLTrustStorePwd - the truststore password.
If you have gotten the properties right, click on "Test Connection" and you should see that DBeaver connects successfully. Remember that you need to run kinit first to get a valid Kerberos ticket.
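
DBeaver assembles the JDBC URL from these settings for you, but it can help to see what they amount to as a single connection string, for example when reusing them in another JDBC client. The following is a minimal sketch under stated assumptions: the helper name is hypothetical, the host, realm, truststore path and password are placeholders, port 21050 is assumed rather than taken from this post, and the jdbc:impala://host:port/schema;Key=Value layout reflects my understanding of the Cloudera driver's URL format.

```python
def build_impala_jdbc_url(host, database, properties, port=21050):
    """Join host, port, schema and driver properties into one JDBC URL.

    properties is a list of (key, value) pairs so that order is preserved.
    The default port is an assumption; adjust it to your cluster.
    """
    parts = ["jdbc:impala://%s:%d/%s" % (host, port, database)]
    parts.extend("%s=%s" % (key, value) for key, value in properties)
    return ";".join(parts)


# Placeholder values for illustration only.
EXAMPLE_PROPERTIES = [
    ("AuthMech", "1"),                          # 1 = Kerberos
    ("KrbHostFQDN", "impala-lb.example.com"),
    ("KrbRealm", "EXAMPLE.COM"),
    ("KrbServiceName", "impala"),
    ("SSL", "1"),
    ("SSLTrustStore", "/path/to/truststore.jks"),
    ("SSLTrustStorePwd", "changeit"),
]

if __name__ == "__main__":
    print(build_impala_jdbc_url("impala-lb.example.com", "default",
                                EXAMPLE_PROPERTIES))
```

Keeping the properties as an ordered list makes it easy to append the logging properties described in the troubleshooting section when needed.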

4. Troubleshooting

If for any reason you are having issues connecting with the JDBC driver, you can add the following properties to enable logging in the driver:
  • LogLevel - set the value to 6.
  • LogPath - the path to the directory where the driver will write the logs.
A new log file is created for each connection attempt with the filename Impala_connection_XX.log, where XX is an incrementing number.
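
Since every attempt produces a new numbered file, the log for the most recent attempt is the one with the highest number. The following is a minimal sketch for picking it out of a directory listing; the helper name is hypothetical and it assumes the filename pattern described above.

```python
import re

# Filename pattern described above: Impala_connection_XX.log
LOG_NAME = re.compile(r"^Impala_connection_(\d+)\.log$")


def latest_connection_log(filenames):
    """Return the log filename with the highest number, or None if none match."""
    numbered = []
    for name in filenames:
        match = LOG_NAME.match(name)
        if match:
            # Compare numerically so that 10 sorts after 2.
            numbered.append((int(match.group(1)), name))
    return max(numbered)[1] if numbered else None
```

For example, calling latest_connection_log(os.listdir(log_path)), with log_path set to the LogPath directory, returns the file to open first.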