ETL SQL

Learn Extract Transform Load using SQL & PySpark

Author: Nitin Srivastava

  • DataWarehouse Migration Projects – Beginner Guide

    For the past few years I have been involved in many projects where the requirement was to migrate the existing on-premises data warehouse to modern platform. In this post I will share my experience and observation about the migration projects and why it is fun to be part of Migration projects. Wave of Migration Projects…

  • Reverse engineering of SQL queries to generate Data Model

    In this post I will share a quick method which you can use to create a data model from any existing SQLs in your data warehouse environment. Data Model actually plays a very vital role in building understanding of your data warehouse. When you are new to any project, understanding the existing data model can…

  • Redshift Stored Procedure for beginners

    In this post, we will see how you can create Redshift Stored Procedure with example. We will also talk about redshift stored procedure parameters and dynamic sql in stored procedure. What is a stored procedure ? Stored Procedure (SP) is a database object which encapsulates business transformation logic into one unit. Stored Procedure can consists…

  • AWS Certificate – Architect & Data Analytics

    My experience of AWS Certified Solutions Architect – Associate I recently appeared for AWS Certified Solutions Architect (Associate) exam and cleared it comfortably. I would like to share my experience and journey so far which may help you if you are preparing for it too. So let’s begin this post by sharing my background. My…

  • Redshift create table & copy data from S3

    In this post, we will see a very simple example in which we will create a Redshift table with basic structure and then we will see what all additional properties Redshift will add to it by default. Then we will quickly discuss about those properties and in subsequent posts we will see how these properties…

  • Teradata – ambiguous table & multiple source rows update error

    Table/view/trigger name is ambiguous – Not the obvious reason. In some cases, we write database name explicitly in our queries however most of the time we rely on default database to select the view/table used in queries. However do you know in Teradata , you can have more than one default database for your query….

  • Regular expression in Teradata

    With the use of regular expression many complex tasks actually become a one-line code in SQL. Below are few cases in which using REGULAR EXPRESSION can be really useful in SQL. Regex to check if string consists of repeated character There may be a case when source is sending you some data with some filler…

  • Recursive queries in Teradata

    Recently Krish asked us a query in which data in one row should be concatenated with data in other row. And this should continue for all subsequent rows. It looks like kind-of cumulative sum kind of operation for character columns. So we thought of doing it using rescursive queries:Let us see the example below: This…

  • ROWS BETWEEN in Window Function in Teradata

    ROWS BETWEEN is used in Teradata to determine group for calculating aggregation in Analytical Functions. Whenever you use window functions or OLAP functions you specify the rows which shall be considered for calculating measures. If you don’t specify anything then by default all the rows in the partition participate in calculating measures. Some of the…

  • Teradata combine multiple rows into one row

    We have couple of functions available which can help in combining rows into a single line. In this post, we will discuss about 2 such UDF which can give single value for multiple rows. First, XMLAGG: With Teradata supporting various operations on XML datatype now, we can leverage this function to combine multiple rows. Let’s…

  • Numeric overflow occurs during computation

    Numeric overflow in Teradata occurs while handling numeric values in computation or transformation. Numeric overflow occurs when you load value which is bigger than the range for numeric datatype. You can solve the error by casting the column to bigger datatype. Maximum possible value is decimal(38,0). Why Numeric overflow occurred during computation Numeric overflow occurs…

  • Teradata SQL – C3 (cast , concatenate , coalesce) with examples

    Teradata COALESCE with examples Teradata COALESCE function is used for NULL Handling. It returns the first not null value which is passed to it. If all the values passed to COALESCE function are NULL then output would be NULL else output would be first NOT NULL value encountered. Teradata Coalesce Date You can use coalesce…

  • Teradata – Slowly Changing Dimensions (SCD-1 & SCD-2) with examples

    Slowly changing dimensions or SCD are the dimensions that change over time. It called “slowly” because the change is expected to come rarely or rather I shall say not frequently. Let’s understand this with one real life example. You went to a coffee shop and ordered some coffee and snacks. The cashier asked for your…

  • Teradata SQL Query Performance Check & Measures

    One very common question people ask is about possible reasons for long running queries in Teradata. First thing you should understand is Teradata is very good in handling huge volume of data and handling several million or even billion records should not break it. If your query is taking unusual long time then your query…

  • Teradata data distribution & Indexes (Primary/Secondary)

    Teradata uses a HASHING algorithm on PRIMARY INDEX column/s to distribute rows among various AMP’s. The process of rows distribution is the main reason behind the massive parallel core architecture of TERADATA. Teradata HASHROW, HASHBUCKET, HASHAMP TERADATA uses indexes to determine the distribution of rows. Teradata uses a hashing algorithm which processes the index and…

  • Teradata Stored Procedure for beginners

    Stored Procedures in Teradata are very useful powerful objects especially with every new Teradata versions released. With so many options available for condition handling and dynamic sql, Teradata Stored Procedures makes a SQL Developer life lot better.Let us take an example to see how we can create a new Stored Procedure in Teradata. In the…

  • Handle Teradata Date Functions like pro – the ultimate guide

    Teradata Date Functions Date consists of 3 parts – Day , Month & Year. When we combine all 3 parts together we get the output of DATE datatype. In Teradata, any function which you can apply on DATE datatype can be considered as DATE Function. In my opinion, Teradata Date Functions can be classified into…

  • LOCKS IN TERADATA – Explained

    In Teradata , LOCKS are used to maintain integrity of data present in the tables. Teradata , itself decides the kind of lock to implement depending on the type of query however in some cases users can also decide what kind of lock, the query should use. In TERADATA, locks can be implemented at three…

  • Teradata Create Table (Permanent/Volatile) Example- SET/Multiset/CTAS

    Tables are two-dimensional representation of data in rows & columns. Tables are the core of any RDBMS environment. In TERADATA, table creation is not as straightforward as it looks as it comes with so many options. teradata create table You can create Tables with minimal options as we have in any ANSI standard. However creating…

  • Teradata storage space for beginners

    Storage space is used to store data as well as metadata into physical disks. Some data is persistent which remains in disks even when session is over while some may be cleared once application session completes. TERADATA has three kinds of Storage Space namely PERMANENT, SPOOL and TEMPORARY. Permanent Space Permanent Space is used for…

  • What is Teradata TPT ? Everything you must know

    What is Teradata TPT ? Everything you must know

    TPT is short form of Teradata Parallel Transporter. Teradata TPT is a single utility which internally calls other Teradata Utilities depending on the operators specified by user. So now you don’t have to write separate scripts for different utilities for data ingestion or data export. One utility which replaces all Teradata utilities. If you want…

  • Write your first BTEQ script

    Write your first BTEQ script

    Write your first BTEQ script In most of the Teradata environment I have seen the preferred medium to call BTEQ is shell script. It means that Teradata developer will write a shell script which will invoke BTEQ utility to run SQL commands on Teradata. I will show you how to write a sample BTEQ script…

  • Teradata SQL Cheat Sheet – Free Download

    Teradata SQL Cheat Sheet – Free Download

    I am sharing below some Teradata SQL queries which can quickly help you brush your SQL. I have tried to cover all the common SQL operations in Teradata and will keep on adding more. Hope this helps. [DDL] CREATE / ALTER / DROP / RENAME TABLE 1 CREATE table with column list 2 CREATE table…

  • Generate Teradata scripts automatically

    Generate Teradata scripts automatically

    Generate Teradata Import Scripts with just one click – saves ~99% man-hours Step 1 Enter required details of the file/utilty Step 2 Click on required utility button Step 3 Copy load script into file & run it Check the video to see how to use this utility – Teradata Import Script Generator. Click a button…