Uncategorized

FastLoad in TeraData

Sending
User Rating 5 (1 vote)
FASTLOAD
Fast Load is a utility that can be used to quickly load large amounts of data to an empty table on Teradata. It uses multiple sessions to load data to the Teradata target table. FastLoad divides its job into two phases, both designed for speed. They are typically known as Phase 1 and Phase 2. Sometimes they are referred to Acquisition Phase and Application Phase.

PHASE 1: Acquisition
The primary function of Phase 1 is to transfer data from the host computer to the Access Module Processors (AMPs) as quickly as possible. When the Parsing Engine (PE) receives the INSERT command, it uses one session to parse the SQL just once. The PE is the Teradata software processor responsible for parsing syntax and generating a plan to execute the request. It then opens a Teradata session from the FastLoad client directly to the AMPs. Then by default, one session is created for each AMP.


Therefore, on large systems, it is normally a good idea to limit the number of sessions using the SESSIONS command. Now one of the client sessions begins loading raw data in 64K blocks for transfer to an AMP. The first priority of Phase 1 is to get the data onto the AMPs as fast as possible. To accomplish this, the rows are packed, unhashed, into large blocks and sent to the AMPs without any concern for which AMP gets the block.

PHASE 2: Application
In this phase, each AMP sorts the rows in its worktable. Then it writes the rows into the table space on disks where they will permanently reside. Rows of a table are stored on the disks in data blocks. The AMP uses the block size as defined when the target table was created. If the table is Fallback protected, then the Fallback will be loaded after the Primary table has finished loading. This enables the Primary table to become accessible as soon as possible.
 
FastLoad Commands
Below are the some basic commands which are used in the fast Load Utility.
 

1. BEGIN LOADING This identifies and locks the FastLoad target table for the duration of the load. It also identifies the two error tables to be used for the load.
2. CREATE TABLE This defines the target table and follows normal syntax. If used, this should only be in the initial script. If the table is being loaded, it cannot be created a second time.
3. DEFINE
 
This names the Input file and describes the columns in that file and the data types for those columns.
4. DELETE Deletes all the rows of a table. This will only work in the initial run of the script. Upon restart, it will fail because the table is locked.
5. DROP TABLE Drops a table and its data. It is used in FastLoad to drop previous Target and error tables. At the same time, this is not a good thing to do within a FastLoad script since it cancels the ability to restart.
6. END LOADING Success! This command indicates the point at which that all the data has been transmitted. It tells FastLoad to proceed to Phase II. As mentioned earlier, it can be used as a way to partition data loads to the same table. This is true because the table remains empty until after Phase II.
7. ERRLIMIT It specifies the maximum number of rejected ROWS allowed in error table 1 (Phase I). ERRLIMIT provides you with a safety valve. We can specify a particular number of error rows beyond which FastLoad will immediately precede to the abort. This provides the option to restart the FastLoad or to scrub the input data more before loading it.
8. INSERT It inserts rows into the target table.
9. LOGON/LOGOFF or, QUIT
 
LOGON simply begins a session. LOGOFF ends a session. QUIT is the same as LOGOFF.
10. NOTIFY
 
NOTIFY command used to inform the job that some event has occurred. It calls a user exit or predetermined activity when such events occur.
11. RECORD It specifies the beginning record number of the Input data source, to be read by FastLoad. Syntactically, This command is placed before the INSERT keyword. It enables FastLoad to bypass input records that are not needed such as tape headers, manual restart, etc.
12. SET RECORD
 
Used only in the LAN environment, this command states in what format the data are there into the Input file.
13. SESSIONS This command specifies the number of FastLoad sessions to establish with Teradata. The default is 1 session per available AMP.
14. SLEEP SLEEP command specifies the amount minutes to wait before retrying to logon and establish all sessions. This situation can occur if all of the loader slots are used or if the number of requested sessions is not available. The default is 6 minutes. For example, suppose that Teradata sessions are already maxed-out when your job is set to run. If TENACITY were set at 4 and SLEEP at 10, then FastLoad would attempt to logon every 10 minutes for up to 4 hours. If there were no success by that time, all efforts to logon would cease.
15. TENACITY TENACITY specifies the amount of time, in hours, to retry to obtain a loader slot or to establish all requested sessions to logon. The default for FastLoad is “no tenacity”, meaning that it will not retry at all.
 
The FastLoad script runs with below sequences:
  • Logging onto Teradata
  • Defining the Teradata table that you want to load (target table)
  • Defining the INPUT data file
  • Telling the system to start loading
Below is the sample FastLoad script
SESSIONS 100; /*or, the number of sessions supportable*/
 
TENACITY 4; /* the default is no tenacity, means no retry */
SLEEP 10; /* the default is 6, means retry in 6 minutes */
LOGON Databasename/username,password;
SHOW VERSIONS; /* Shows the Utility’s release number */
/* Set the Record type to a comma delimited for FastLoad */ RECORD 2;
SET RECORD VARTEXT ‘,’;
/*Define the Textfile Layout and Input file*/
DEFINE Employee_No (VARCHAR(10))
                ,Last_name (VARCHAR(20))
                ,First_name (VARCHAR(12))
               ,Salary (VARCHAR(5))
               ,Dept No (VARCHAR(6))
FILE=EMPS.TXT;
/* Begin the Load and Insert Process into the */
/* Employee_Profile Table */
BEGIN LOADING Databasename.TableName
ERRORFILES Databasename.ErrorTable1,Databasename.ErrorTable2
CHECKPOINT 100000;
INSERT INTO Databasename.TableName VALUES
                           (:Employee_No
                            ,:Last_name
                            ,:First_name
                            ,:Salary
                            ,:Dept No
                            );
END LOADING;
LOGOFF;
Specify the number of sessions to logon.
Tenacity is set to 4 hr;
Wait 10 Min between retries.
 
Display the version of FastLoad.
Starts with the second record.
Specifies if record layout is vartext with a comma delimiter.
Notice that all fields are defined as VARCHAR. When using VARTEXT, the fields do not contain the length field like in these formats: text, FastLoad, or unformatted.
 
Defines the flat file name.
Specifies table to load and lock.
 
Names the error tables. Sets the number of rows at which to pause & record progress in the restart log before loading further.
 
Defines the insert statement to use for loading the rows.
 
Continues loading process with Phase 2.
 
 
Logs off of Teradata.
 
Step One: Before logging onto Teradata, it is important to specify how many sessions you need. The syntax is [SESSIONS {n}].

Step Two: LOGON to the Teradata system.

Step Three: If the input file is not a FastLoad format, before We describe the INPUT FILE structure in the DEFINE statement, We must first set the RECORD layout type for the file being passed by FastLoad. We have used VARTEXT in our example with a comma delimiter. The other options are FastLoad, TEXT, UNFORMATTED OR VARTEXT.

Step Four: Next, comes the DEFINE statement. FastLoad must know the structure and the name of the flat file to be used as the input FILE, or source file for the load.

Step Five: FastLoad will automatically create the error tables once we provide the name. Phase 1 uses “Error_Table1” because it comes first and Phase 2 uses “Error_Table2”.

Step Six: FastLoad focuses on its task of loading data blocks to AMPs and it will not proceed to Phase 2 without the END LOADING command.

Step Seven: Now all the sessions must LOGOFF. This will be the last utility command in our script. At this point the table lock is released and if there are no rows in the error tables, they are dropped automatically. However, if a single row is in one of them, we are responsible to check it, take the appropriate action and drop the table manually.
 
What Happens When FastLoad Finishes
The most important thing to do is verify that FastLoad completed successfully. This is accomplished by looking at the last output in the report and making sure that it is a return code or status code of zero (0). Any other value indicates that something wasn’t perfect and needs to be fixed.
The locks will not be removed and the error tables will not be dropped without a successful completion. This is because FastLoad assumes that it will need them for its restart. At the same time, the lock on the target table will not be released either.

Constraints of FastLoad:

  • It loads empty tables at high speed.
  • The target tables must be empty in order to use FastLoad.
  • It supports insert only – it is not possible to perform updates or deletes in FastLoad.
  • Although Fastload uses multiple sessions to load the data, only one target table can be processed at a time.
  • Teradata Fastload does not support join indexes, foreign key references in target tables and tables with secondary index defined. It is necessary to drop any of the constraints listed before loading and recreate them afterwards.
  • The maximum number of concurrent Teradata Fastload tasks can be adjusted by a system administrator.
  • Fastload runs in two operating modes: Interactive and Batch
  • Duplicate rows will not be loaded 

Share your Thoughts