Published | March 8, 2018

Data Lake Query Quality

Does your attribute model natively unite data?

Many organizations have established, or are presently establishing, data lakes as a cost-effective means of provisioning operational intelligence query and analytics capabilities directly to the field personnel who need them the most, understand the data the best, and are the most capable of actioning the insights gleaned.  It sounds like an ideal arrangement.  The quality of those insights, however, is determined by the sources ingested into the lake and how that ingestion occurs.

User queries against the data lake are often validated against the source operational systems.  The problem is that operational systems change throughout the day, with the volume of change depending on workload, while the data lake may only be refreshed nightly or in batches.  Sometimes, due to complexity or service fees, only key aspects are refreshed regularly while other portions are only accessed on demand.  Understanding how your data lake is populated can save you time when validating results.
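As a minimal sketch of that validation step, the check below separates a mismatch that is explained by a stale lake from one that deserves investigation.  The function name, the timestamps and the sample values are hypothetical and assume you can obtain both the last lake refresh time and the source's last-modified time.

```python
from datetime import datetime


def explain_mismatch(lake_value, source_value, last_lake_refresh, source_last_modified):
    """Classify a difference between a lake query result and the source system."""
    if lake_value == source_value:
        return "match"
    # The source changed after the lake was last loaded: the lake is simply stale.
    if source_last_modified > last_lake_refresh:
        return "stale: source changed after last refresh"
    # The lake was loaded after the last source change, so the values should agree.
    return "investigate: possible ingestion or data quality issue"


# Hypothetical example: nightly refresh at 02:00, source updated mid-morning.
refresh = datetime(2018, 3, 8, 2, 0)
modified = datetime(2018, 3, 8, 10, 30)
result = explain_mismatch(1200, 1250, refresh, modified)
print(result)
```

Only the last outcome suggests a real problem in the ingestion path; the "stale" outcome just means the comparison should be repeated after the next refresh.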

Analytics processes usually begin with data provisioning, and the at-rest data stored in the lake is an excellent place to start.  Augmenting this data with warehouse content improves accuracy by leveraging the organization’s efforts to date in cleansing, standardizing and storing historical changes to key entities.  Data warehouse content is often extracted nightly to the data lake for this reason.  Uniting data in motion, available from the enterprise bus or from streaming feeds, with data lake contents usually requires additional components that sit beside the lake.

Whenever data from multiple source systems or from multiple departments is combined with the intent of providing a consolidated view, data quality issues are to be expected.  Traditionally, organizations were empowered to identify perceived errors in the data they consume and sought to remedy these errors by correcting the source when possible or the transformation process when feasible.  This is not to suggest that a data quality program is required to sustain operational reporting in a data lake.  The problem is, however, further aggravated when uniting data in motion available from the enterprise bus or from streaming feeds.

Unexpected query results from the unified data are a common complaint.  Tracing the origin and identifying the root cause can be a complicated undertaking.  Organizations can simplify this process by ensuring alignment behind a common information model rooted at the attribute level, a common dictionary if you will.  By focusing efforts on improving the alignment of each individual attribute, consensus can be gained between the contributing systems and technology delivery channels.  Further agility can be expected by avoiding structural dependencies that may create multiple editions or versions of each attribute.
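To illustrate what a common dictionary at the attribute level might look like, here is a small sketch.  The source names, field names and helper functions are hypothetical; the point is only that once every source is mapped to the same attribute names, disagreements can be traced one attribute at a time.

```python
# Hypothetical common dictionary: attribute name -> field name in each source system.
ATTRIBUTE_DICTIONARY = {
    "client_id": {"crm": "CustomerNo", "warehouse": "cust_key"},
    "email": {"crm": "EmailAddr", "warehouse": "email_address"},
}


def to_common_attributes(record, source):
    """Rename a source record's fields to the shared attribute names."""
    common = {}
    for attribute, fields in ATTRIBUTE_DICTIONARY.items():
        field = fields.get(source)
        if field is not None and field in record:
            common[attribute] = record[field]
    return common


def conflicting_attributes(a, b):
    """Return the attributes on which two aligned records disagree."""
    return sorted(k for k in a.keys() & b.keys() if a[k] != b[k])


crm = to_common_attributes({"CustomerNo": 42, "EmailAddr": "a@example.com"}, "crm")
dwh = to_common_attributes({"cust_key": 42, "email_address": "a@old.example.com"}, "warehouse")
conflicts = conflicting_attributes(crm, dwh)
print(conflicts)
```

Because each attribute has exactly one definition in the dictionary, there is no second edition of "email" hiding inside a source-specific structure.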

NexJ DAi provides an attribute model that natively unites data at rest and in motion and presents results using consistent terminology, helping firms to better provision results for both user query and analytic efforts.  With NexJ DAi, integration occurs at the attribute level and, once resolved, an attribute can be assigned to many views.  Attributes can integrate content from streaming web services and messages or from data at rest such as databases or files.  Attributes can also define a computation, which centralizes calculations for review and lets them be modified and tailored to address specific needs, all leveraging the attributes already defined.  The NexJ DAi engine natively publishes attributes to their associated views, so changes in operational systems, updates from streaming services, and the latest messages are reflected.
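The general idea of an attribute model can be sketched in a few lines.  This is a toy illustration only and is not NexJ DAi's API: the class, method names and sample data are all hypothetical.  It shows attributes defined once over data at rest and in motion, a computed attribute built from them, and two views that pick up source changes when read.

```python
class AttributeModel:
    """Toy attribute registry: each attribute is defined once and reused by many views."""

    def __init__(self):
        self._resolvers = {}  # attribute name -> function(model) returning current value
        self._views = {}      # view name -> list of attribute names

    def define(self, name, resolver):
        self._resolvers[name] = resolver

    def define_view(self, view, attributes):
        self._views[view] = list(attributes)

    def value(self, name):
        # Resolved at read time, so the latest source value is reflected.
        return self._resolvers[name](self)

    def render(self, view):
        return {name: self.value(name) for name in self._views[view]}


# Hypothetical sources: a row at rest and the latest message from a stream.
database_row = {"balance": 1000.0}
latest_message = {"pending": 50.0}

model = AttributeModel()
model.define("balance", lambda m: database_row["balance"])
model.define("pending", lambda m: latest_message["pending"])
# Computed attribute, expressed in terms of attributes already defined.
model.define("available", lambda m: m.value("balance") - m.value("pending"))
model.define_view("advisor", ["balance", "available"])
model.define_view("client", ["available"])

before = model.render("client")
latest_message["pending"] = 200.0  # a new message arrives
after = model.render("client")
print(before, after)
```

The calculation for "available" lives in one place, so changing it changes every view that uses it.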

Takeaways To Date

Key Aspect | Response | NexJ DAi
Hadoop Ecosystem | HDFS low-cost commodity storage; Hive in-place query | NexJ DAi integrates with Hadoop data lakes as a potential source system
Hadoop Ecosystem | Spark in-memory query | NexJ DAi provisions semantic view data consumable through a Spark Adapter
Hadoop Data Lake | Batch load | NexJ DAi provisions the most up-to-date data
Hadoop Ecosystem | Separate technology for streaming | NexJ DAi addresses both data at rest and in motion


How does your organization use a Data Lake?  What 360-degree data views power your analytics?  We welcome your thoughts, value your insights and action your feedback: share below!


Author: Matthew Bogart

Vice President, Marketing

Matthew is responsible for building awareness for NexJ and demand for its solutions. He regularly engages with analysts, key industry stakeholders, and thought leaders to stay abreast of technology innovation and financial services industry trends and challenges.

Matthew will be sharing his insight and perspective on the enterprise customer management market and the issues affecting the financial services industry today in regular contributions to the NexJ blog. He encourages readers to take part in the conversation or reach out to him directly with their observations on market trends and issues.
