Apache Zeppelin is a multi-purpose software that can be used for data visualization, data ingestion, data discovery and data analytics. You can check the below mentioned URL to know more about it and to download the software.
https://zeppelin.apache.org/
In download section, I’ll suggest to download the “Binary package with all interpreters” which provides number of interpreters that can be used to connect to various sources. In today’s tutorial, we will see:
- How to create interpreter to be used for Teradata in Zeppelin?
- How to connect to Teradata using interpreter and run queries?
- How to create simple pie-chart for data visualization report?
Note: We assume you have already downloaded and installed Zeppelin. In our case we have HDP 2.5 installed on my local vm.
Open the Zeppelin URL
If you have installed Zeppelin on localhost then open http://127.0.0.1:9995 else use the server ip where you have installed Zeppelin.
*in your case, check for the port number on which you have installed it. By default, port is 8080 which may result in some conflict with other software. In HDP, it is 9995.
Creating a new Interpreter:
Once Zeppelin is open, click on the drop down button at top-right corner next to user "anonymous" and select "interpreter"
Adding JDBC interpreter
We will be using TERADATA JDBC connection to connect to Teradata. So let’s scroll down to JDBC interpreter and add following 4 properties:
Name: teradata.driver
Value: com.teradata.jdbc.TeraDriver
Name: teradata.url
Value: jdbc:teradata://192.168.145.100 (change IP with your Teradata SERVERIP)
Name: teradata.user
Value: DBC (change user with your Teradata USER)
Name: teradata.password
Value: DBC (change password with your Teradata password)
Now scroll down to dependencies and give the path of 2 jdbc jar (tdgssconfig.jar, terajdbc4.jar
) files which are required to connect to Teradata. You can download JDBC driver for Teradata from below mentioned link:
https://downloads.teradata.com/download/connectivity/jdbc-driver
I have placed these two jar files @ below mentioned path:
/usr/hdp/2.5.0.0-1245/zeppelin/interpreter/jdbc/tdgssconfig.jar
/usr/hdp/2.5.0.0-1245/zeppelin/interpreter/jdbc/terajdbc4.jar
Hence added these two paths in dependencies artifact in jdbc interpreter.
Now save & restart JDBC interpreter.
Dedicated Teradata Interpreter
Now we want to create a dedicated TERADATA interpreter. So click on “Create” button on top-right corner in Interpreter screen.
Now give the name of interpreter as “teradata”. Select interpreter group as JDBC.
In the default parameters put the value you have mentioned in Step2.a.
- default.driver
- default.url
- default.user
- default.password
You can remove other sources like postgresql, tajo etc.
Again add in dependencies, the two jar files path. Save & Restart the interpreter.
Create new notebook
You have done the configuration required to connect to Teradata via Zeppelin. Now click on “Notebook” at the top and click on “Create new note”. Give it some name – like TD1. In the paragraph write below command:
%teradata
select tablename,tablekind from dbc.tables where databasename='dbc' order by tablename
Now run the paragraph. First time execution is slow however subsequent queries execution will be much faster. Once you run the query you can see data.
You can see the data in Tabular format by default. If you want to see various charts , then click on chart icons present above the table.Example you clicked on pie-chart icon you can see pie-chart from the data. If you do not see any chart or to change the key columns, click on settings which will appear as soon as you click on pie-chart icon. Select a column and add to keys and other column as values.
So now you have connected to Zeppelin and executed basic query and also build simple pie-chart with the data.
Let me know if you face any issue in connecting Teradata via Zeppelin.
Thank You Raj.. I will keep you posted on my experience. I also have similar use case like showing charts etc., But also need that full download functionality for business. I read some where that anything more than few thousand would take up browser memory. I will have to test that as well. Please keep doing these educational articles which are very helpful for people like us. I am sharing my Linkedin profile in my signature and I would love to connect with experts like you.
Thanks again,
-Raghu
https://www.linkedin.com/in/raghunakka/
Thank you for this post. Really helpful. What was your experience with JDBC connector on Teradata?
Hi Raghu
Happy to help. In reference to TD jdbc connection in Zeppelin, it may take some time for first request however subsequent response shall be much faster. Overall it was not that quick as I was expecting it.
Raj – Thank you for the quick response. What do you mean by not quick? I guess majority of it is based on Teradata performance right? As far as my use case I have a procedure on Teradata that returns a 10,000 row dataset based on the input parameters provided using Zeppelin. I am in the process of creating the jdbc interpreter. But really appreciate any inputs or thoughts you may have regarding the approach I just outlined. Or any issues you may have faced using Zeppelin and Teradata.
Hi Raghu
I meant the response time I typically get when using BTEQ or SQL Assistant or Studio Express is faster than what I get in Zeppelin. Also sometimes the Zeppelin notebook thread goes into hung state, so you may want to restart Zeppelin if it happens.
For me , use-case what to show ETL data flow across multiple tables from Source to Staging to EDW to Data Marts and I wanted to show it using charts rather than table to client hence I used it.
—
Raj