The danger of assuming that everyone knows…

I cannot count the number of times I have been in a meeting, or a stand-up, and someone has rattled off something technical to the group that I have subsequently had to Google (or beg assistance from a friendly technical person). I do not have a computing degree (in fact I have a …art degree…), so I have had to teach myself and rely on explanations from patient managers, developers and DBAs over the years. I have been blessed with some truly excellent mentors, but I have also had my fair share of misunderstandings because of my lack of formal education in this discipline.

One of the first times this happened to me was when I was collating some requirements for a Business Intelligence application. I was an Analyst at the time, and very green. As the technical teams rattled on shaping their requirements, I was trying to take notes and I accidentally wrote down ELT instead of ETL. Anyone familiar with data integration will know that both exist (although back then ELT was a pretty rare concept) and that, while they are similar, there is a fundamental difference. By capturing one instead of the other, I started a chain of events which confused a lot of teams and wasted a fair chunk of time.

It was my mistake and I was in the fortunate position that I felt able to own up to it without feeling I would be castigated for my error. Long story short, it resulted in a very useful learning session with one of the OBIEE DBAs who very kindly took me through some of the concepts relating to that technology. While my understanding is obviously simplified compared to someone who works with the technology every day, I would not make the same mistake again, and I could subsequently use that knowledge to help my Product Owners understand the concepts without resorting to too much jargon (a very useful skill that should be in any Environment Manager’s repertoire!).

Lessons learned?

If you are technical, try to remember that not everyone has the same background knowledge as you. It is well worth investing some time in breaking down some of your more complex concepts into simpler terms.

If you are not technical, don’t be afraid to ask. The worst thing you can do is keep quiet. Treat everything as a learning opportunity.

If you are management, try to foster an environment where people are not afraid to fail. If I had not felt comfortable that I was not going to be ‘punished’ for my lack of knowledge, I might not have been so inclined to admit my ignorance, leading to more time wasted.

Just in case you didn’t know… what is ETL?

I would be remiss if I didn’t, right?

Central to virtually all Data Integration is the basic process of Extract, Transform and Load (ETL). It can be scheduled, event-driven or real-time. In my world of Non-production Environments Management it is usually event-driven and often ad hoc, but in Operations it tends to be real-time in order to ensure the latest data is available on demand.

E – Extract: Selecting the right data and extracting it from its source. The extracted data is usually staged in a holding area somewhere. If there is a lot of data or the processing is complex, this is often where one might see batch processing to perform the selection or identify changed data to extract.

T – Transform: Takes the selected data and makes it compatible with its target. This may mean transforming for a single target or for multiple targets, and may include cases where the data is used to trigger events without being persisted. Transformation may be done in batch or real-time.

Examples could include:

  • changing formats from one code set to another
  • changing the structure of data from denormalised to normalised
  • de-duplication (for unique keys or records)
  • reordering to fit a specific format
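
To make those four examples a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the rows, the field names, the mapping table and the function names are all hypothetical, and a real integration tool would do this very differently.

```python
# Hypothetical source rows: one denormalised record per customer,
# with the customer's orders embedded in the same record.
rows = [
    {"id": 2, "name": "Bob", "country": "GB", "orders": ["A1", "B7"]},
    {"id": 1, "name": "Alice", "country": "US", "orders": ["C3"]},
    {"id": 2, "name": "Bob", "country": "GB", "orders": ["A1", "B7"]},  # duplicate
]

# Assumed mapping from one code set (two-letter) to another (three-letter).
COUNTRY_CODES = {"GB": "GBR", "US": "USA"}


def recode(records):
    """Change a field from one code set to another."""
    return [{**r, "country": COUNTRY_CODES[r["country"]]} for r in records]


def deduplicate(records):
    """Keep only the first record seen for each unique key."""
    seen = set()
    unique = []
    for r in records:
        if r["id"] not in seen:
            seen.add(r["id"])
            unique.append(r)
    return unique


def reorder(records):
    """Sort records into the order the target expects (here: by id)."""
    return sorted(records, key=lambda r: r["id"])


def normalise(records):
    """Split each denormalised record into a customer row and order rows."""
    customers = [
        {"id": r["id"], "name": r["name"], "country": r["country"]}
        for r in records
    ]
    orders = [
        {"customer_id": r["id"], "order_ref": ref}
        for r in records
        for ref in r["orders"]
    ]
    return customers, orders


customers, orders = normalise(reorder(deduplicate(recode(rows))))
print(customers)
print(orders)
```

The order of the steps here is a choice, not a rule; de-duplicating before normalising simply avoids creating duplicate order rows.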

L – Load: The load step of ETL stores or presents the results of the transformation process in the target system. It is quite possible further processing will be required to make the data viable for final use or integration with other systems.
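
Putting the three steps together, an ETL run might look something like the sketch below. It is a toy example under stated assumptions: the source is a hard-coded list, the target is an in-memory SQLite database, and the table and column names (`payments`, `amount_minor`) are invented for illustration.

```python
import sqlite3

# Hypothetical source data; in practice this would come from a file,
# an API or another database.
SOURCE = [
    {"id": 1, "amount": "10.50", "currency": "gbp"},
    {"id": 2, "amount": "3.00", "currency": "usd"},
]


def extract(source):
    """Select the rows of interest and copy them into a staging list."""
    return [dict(row) for row in source]


def transform(staged):
    """Make the staged rows fit the target: integer minor units, upper-case codes."""
    return [
        (r["id"], round(float(r["amount"]) * 100), r["currency"].upper())
        for r in staged
    ]


def load(conn, rows):
    """Write the already-transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(id INTEGER PRIMARY KEY, amount_minor INTEGER, currency TEXT)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    conn.commit()


target = sqlite3.connect(":memory:")  # stand-in for the real target system
load(target, transform(extract(SOURCE)))

loaded = target.execute(
    "SELECT id, amount_minor, currency FROM payments ORDER BY id"
).fetchall()
print(loaded)
```

The point to notice is the order of the calls: the data is transformed before it ever reaches the target, so the target only sees rows that already fit its structure.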

And what about ELT?

If the target system has more transformation capability than either the source or an intermediary system, the order of processes may become Extract, Load, Transform. ELT allows transformations to occur after the load to the target system; the source data is instantiated on the target system as raw data. This is a fairly common approach for data lakes. (Paraphrased from DMBOK 2nd ed.)
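
For contrast, here is the same kind of toy example arranged as ELT. Again this is only a sketch under assumptions: an in-memory SQLite database stands in for the target, and the table names (`raw_payments`, `payments`) and sample rows are hypothetical. The raw data is landed untouched, and the target's own SQL engine does the transformation afterwards.

```python
import sqlite3

# Hypothetical source rows, including a duplicate, loaded exactly as they arrive.
SOURCE = [
    (1, "10.50", "gbp"),
    (2, "3.00", "usd"),
    (2, "3.00", "usd"),
]

target = sqlite3.connect(":memory:")  # stand-in for the real target system

# Extract + Load: land the raw data in the target with no changes.
target.execute("CREATE TABLE raw_payments (id INTEGER, amount TEXT, currency TEXT)")
target.executemany("INSERT INTO raw_payments VALUES (?, ?, ?)", SOURCE)

# Transform: performed inside the target, using its own SQL capability.
target.execute(
    """
    CREATE TABLE payments AS
    SELECT DISTINCT
           id,
           CAST(ROUND(CAST(amount AS REAL) * 100) AS INTEGER) AS amount_minor,
           UPPER(currency) AS currency
    FROM raw_payments
    """
)
target.commit()

raw_count = target.execute("SELECT COUNT(*) FROM raw_payments").fetchone()[0]
clean = target.execute(
    "SELECT id, amount_minor, currency FROM payments ORDER BY id"
).fetchall()
print(raw_count, clean)
```

The raw table is still there after the transformation, duplicate and all, which is exactly the difference I managed to blur in my notes all those years ago.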
