7.1.3: Validating Data Quality and Readiness

This lesson shows how to verify that your prepared data is accurate, complete, and ready for LLM-powered interrogation and analysis.

Key Objectives

Understand why data validation is essential before analysis.
Learn to use quick checks and simple summary statistics to spot errors, inconsistencies, and gaps.
Practice ethical review: double-checking privacy compliance and appropriate data access.

Things to think about

The data you extract from an RM Compare session will be in the format described and expected. However, any preparation work you completed in 7.1.2 is worth double-checking. Here are some suggestions.

Column and Key Consistency Check
1. Ensure every tab with a shared key (“Item name”, “Unique Pupil Identifier”) uses matching spelling, case, and format, enabling reliable lookups and joins.
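As an illustration only, the key check can be sketched in Python using the standard library. It assumes each tab has been loaded as a list of dictionaries (for example with csv.DictReader). The function names and the sample rows are hypothetical and are not part of RM Compare.

```python
def normalise_key(value):
    """Collapse whitespace and ignore case so near-identical keys compare equal."""
    return " ".join(str(value).split()).casefold()


def key_mismatches(main_rows, meta_rows, key):
    """Report keys in main_rows that do not match meta_rows exactly.

    Returns two sorted lists: keys that match only after normalisation
    (likely spelling, case, or spacing drift) and keys with no match at all.
    """
    exact = {row[key] for row in meta_rows}
    normalised = {normalise_key(value) for value in exact}
    fixable, missing = set(), set()
    for row in main_rows:
        value = row[key]
        if value in exact:
            continue  # exact match, nothing to report
        if normalise_key(value) in normalised:
            fixable.add(value)
        else:
            missing.add(value)
    return sorted(fixable), sorted(missing)


# Hypothetical sample rows, not real RM Compare output.
results = [
    {"Item name": "Essay 01"},
    {"Item name": "essay 02 "},  # wrong case and a trailing space
    {"Item name": "Essay 09"},   # no counterpart in the metadata tab
]
metadata = [{"Item name": "Essay 01"}, {"Item name": "Essay 02"}]

fixable, missing = key_mismatches(results, metadata, "Item name")
print("Match only after normalising:", fixable)
print("No match at all:", missing)
```

Keys in the first list can usually be repaired by tidying case and spacing. Keys in the second list need a manual look before any join is trusted.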
Structural Validation
1. Confirm that each tab contains the expected columns and that no crucial fields are missing.
2. Verify there are no duplicate entries or unintentional blanks.
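The two structural checks above could be automated along the following lines. This is a minimal sketch, again assuming each tab is a list of dictionaries. The column names "Score" and "Rank" and the sample rows are made up for the example.

```python
from collections import Counter


def is_blank(value):
    """Treat None and whitespace-only cells as blank."""
    return value is None or str(value).strip() == ""


def validate_structure(rows, expected_columns, key):
    """Return the structural problems found in one tab's rows."""
    present = set(rows[0].keys()) if rows else set()
    missing_columns = sorted(set(expected_columns) - present)

    # Count how often each non-blank key value appears.
    counts = Counter(row.get(key) for row in rows if not is_blank(row.get(key)))
    duplicate_keys = sorted(k for k, n in counts.items() if n > 1)

    # Record (row number, column) for every blank cell in an expected column.
    blank_cells = [
        (index, column)
        for index, row in enumerate(rows, start=1)
        for column in expected_columns
        if column in row and is_blank(row[column])
    ]
    return {
        "missing_columns": missing_columns,
        "duplicate_keys": duplicate_keys,
        "blank_cells": blank_cells,
    }


# Hypothetical sample rows and column names.
rows = [
    {"Item name": "Essay 01", "Score": "61.2"},
    {"Item name": "Essay 02", "Score": ""},
    {"Item name": "Essay 01", "Score": "58.0"},
]
report = validate_structure(rows, ["Item name", "Score", "Rank"], "Item name")
print(report)
```

An empty list under every heading of the report suggests the tab is structurally sound. Anything else is a prompt to go back to the spreadsheet.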
Sanity Testing With Examples
1. Use simple summary statistics (e.g., row counts, mean scores, value ranges) to detect unexpected results that may signal data issues.
2. Run a VLOOKUP or quick join between the main data and the metadata to confirm that records link correctly.
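For readers working outside a spreadsheet, the sanity tests above might look like this in Python. It is a sketch under assumptions: the "Score" and "Class" columns and the sample values are invented, and the join is a plain dictionary lookup standing in for VLOOKUP.

```python
from statistics import mean


def summarise_scores(rows, column):
    """Row count plus mean, minimum, and maximum of a numeric column."""
    values = [float(row[column]) for row in rows if str(row[column]).strip()]
    summary = {"rows": len(rows), "scored": len(values)}
    if values:  # avoid errors on an empty column
        summary.update(mean=mean(values), min=min(values), max=max(values))
    return summary


def join_on_key(main_rows, meta_rows, key):
    """Attach metadata to each main row and collect keys with no match."""
    lookup = {row[key]: row for row in meta_rows}
    joined, unmatched = [], []
    for row in main_rows:
        match = lookup.get(row[key])
        if match is None:
            unmatched.append(row[key])
        else:
            joined.append({**match, **row})
    return joined, unmatched


# Hypothetical sample rows and column names.
results = [
    {"Item name": "Essay 01", "Score": "40"},
    {"Item name": "Essay 02", "Score": "60"},
    {"Item name": "Essay 03", "Score": "80"},
]
metadata = [
    {"Item name": "Essay 01", "Class": "9A"},
    {"Item name": "Essay 02", "Class": "9B"},
]

summary = summarise_scores(results, "Score")
joined, unmatched = join_on_key(results, metadata, "Item name")
print(summary)
print("Rows joined:", len(joined), "Unmatched keys:", unmatched)
```

A row count that differs from the number of items in the session, a mean outside the plausible range, or any unmatched key is worth investigating before the data goes to an LLM.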
Final Privacy, Anonymisation, and Access Review
1. Reconfirm that all personal information is anonymised or pseudonymised where appropriate.
2. Ensure you are following any school or institutional guidelines for secure data handling.
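A quick automated scan can support, but not replace, the manual privacy review above. The sketch below flags column headings that suggest personal data and cells that look like email addresses. The list of suspect headings, the pattern, and the sample row are all assumptions chosen for illustration, and a clean result does not prove the data is anonymised.

```python
import re

# Deliberately simple pattern: something@something.something
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# Hypothetical headings that often hold personal data; adjust to your own policy.
SUSPECT_COLUMNS = {"name", "first name", "surname", "email", "date of birth"}


def privacy_flags(rows):
    """Return human-readable warnings about possible personal data."""
    flags = []
    columns = list(rows[0].keys()) if rows else []
    for column in columns:
        if column.strip().casefold() in SUSPECT_COLUMNS:
            flags.append(f"column '{column}' may hold personal data")
    for index, row in enumerate(rows, start=1):
        for column, value in row.items():
            if EMAIL_PATTERN.search(str(value)):
                flags.append(f"row {index}, column '{column}' looks like an email address")
    return flags


# Hypothetical sample row.
rows = [
    {
        "Unique Pupil Identifier": "P-0001",
        "Surname": "",
        "Notes": "contact a.teacher@example.org",
    }
]
flags = privacy_flags(rows)
for flag in flags:
    print(flag)
```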

Reflection Prompt

Have you ever made an assumption based on incomplete or misaligned data?
How might a simple validation step help you avoid costly reporting or analysis mistakes?

What this means for you:

After this lesson, you should feel confident that your data is accurate, privacy-safe, and ready for complex LLM questioning, giving you a reliable foundation for all subsequent analysis tasks within Module 7. It also models strong professional practice that benefits any data-driven educational project.
