Extract, Transform, Load



Definition


ETL is used for managing databases - extracting data from databases and make it ready to import into a new or existing database.



Description




Please note that there is also ELT - Extract, Load, Transform - question: what is the difference and when would you use ELT in place of ETL?

This following is from here.

Extract, transform, load (ETL) refers to three separate functions combined into a single programming tool.

First, the extract function reads data from a specified source database and extracts a desired subset of data. For example in a supermarket, you may want to extract all the data of a customers and what they have purchased.

Next, the transform function works with the acquired data - using rules or lookup tables, or creating combinations with other data - to convert it to the desired state. For example the database that you want to import the data into may require the gender field to be 'M' or "F" but your data may be male or female. You therefore need to transform the data to the correct format - for example find all instances of Male in a particular field and replace with M.
You might not need all the fields in the records extracted from the original source - for example you may not want to know the loyalty card number or the customer name/personal details.

Finally, the load function is used to write the resulting data (either all of the subset or just the changes) to a target database, which may or may not previously exist.


ETL can be used to acquire a temporary subset of data for reports or other purposes, or a more permanent data set may be acquired for other purposes such as: the population of a data mart or data warehouse; conversion from one database type to another; and the migration of data from one database or platform to another.



Explanation and application





Here is an extract from a googlebook on the ELT process:
Screen Shot 2014-11-30 at 8.13.23 pm.png



Here is a link to a Data Warehouse ELT programmer for Roundy's Supermarkets, Inc., one of the US's premiere food retailers, who is looking for a Senior Programmer Analyst to be located at its home office in downtown Milwaukee. Roundy's today is a company of 21,000 employees, exceeding $4 billion in annual sales, and undergoing tremendous growth. Company-owned and operated supermarkets include Pick 'n Save stores and Copps Food Centers in Wisconsin and Rainbow Foods in Minnesota.

Some of skills that the progammer will need are:
  • Bachelor's Degree (BA or BS) from four-year college or university; five to ten years related experience or training; or equivalent combination of education and experience
  • Knowledge of Windows OS (95, 98, W2k, XP) and MS applications required
  • Ability to negotiate through conflict and complex challenges within project team
  • Knowledge of Teradata (preferred) or other large Data Warehouses (Oracle, DB2, SQL Server) is required.
  • Primary ETL tool is Informatica. Secondary ETL tool is SQL Server Integration Services. Knowledge of either of these or other ETL tools such as Aesential, DataStage, or AbInitio, etc. is required.

RESPONSIBILITIES:

  • Formulates and defines system scope, objectives, and specifications.
  • Assists users in defining needs with no supervision required.
  • Estimates work effort based on specifications and scope.
  • Designs, creates, and/or modifies data transformations to solve highly complex problems through system integration. Uses judgment and experience to resolve complex issues and challenges.
  • Develops functional and technical requirements and design documentation including test scripts and end user documentation.
  • Develops and may oversee application integration and external interface development.
  • Conducts unit testing and participates in system and user acceptance testing.
  • Ensures project completion and user satisfaction.
  • Makes solid technical decisions and negotiates through conflict and complex challenges ensuring positive results.
  • Participates in off-hours on-call rotation.





Links to Social and Ethical Issues


Please note that not all issues need to be addressed. Please add the URL or source of any examples to support your suggestion. It may be helpful to RANK the issues in the THIRD column.

Social & Ethical Issue
Examples that specifically link to the concept and/or definition in the Case Study
Ranking
1.1 Reliability and integrity


1.2 Security


1.3 Privacy and anonymity


1.4 Intellectual property


1.5 Authenticity


1.6 The digital divide and equality of access


1.7 Surveillance


1.8 Globalization and cultural diversity


1.9 Policies


1.10 Standards and protocols


1.11 People and machines


1.12 Digital citizenship





References and resources