7.1.3: Validating Data Quality and Readiness

This lesson shows how to verify that your prepared data is accurate, consistent, and ready for LLM-powered interrogation and analysis.

Key Objectives

Things to think about

The data you extract from an RM Compare session will already be in the documented, expected format. Any work you completed on it in lesson 7.1.2, however, is worth double-checking before you go further. Here are some suggestions.

  1. Column and Key Consistency Check
    1. Ensure every tab that shares a key (“Item name”, “Unique Pupil Identifier”) uses identical spelling, case, and format for its key values, with no stray leading or trailing spaces, so that lookups and joins match reliably.
  2. Structural Validation
    1. Confirm that each tab contains the expected columns and that no crucial fields are missing.
    2. Verify that there are no duplicate entries or unintended blank cells.
  3. Sanity Testing With Examples
    1. Use simple summary statistics (e.g., row counts, mean scores, value ranges) to detect unexpected results that may signal data issues.
    2. Run a VLOOKUP or quick join between the main data and the metadata to confirm that every record links correctly.
  4. Final Privacy, Anonymisation, and Access Review
    1. Reconfirm that all personal information is anonymised or pseudonymised where appropriate.
    2. Ensure you are following any school or institutional guidelines for secure data handling.
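If you would rather script the key consistency check in item 1 than inspect it by eye, the Python sketch below shows one way to do it. It assumes each tab has been saved as CSV and loaded as a list of dictionaries, one per row (the shape Python's `csv.DictReader` produces); `normalise_key` and `key_mismatches` are hypothetical helper names, not part of RM Compare. It lists key values that would only match the other tab once case and stray spaces are ignored, which are exactly the values an exact VLOOKUP or join silently misses.

```python
from __future__ import annotations


def normalise_key(value: str) -> str:
    """Trim and collapse whitespace and ignore case, so near-identical keys compare equal."""
    return " ".join(value.split()).casefold()


def key_mismatches(tab_a: list[dict], tab_b: list[dict], key: str) -> list[str]:
    """Return key values in tab_a that match tab_b only after normalisation.

    An exact lookup (VLOOKUP, or a join on the raw text) would miss these,
    typically because of case differences or stray spaces.
    """
    exact_b = {row[key] for row in tab_b}
    normalised_b = {normalise_key(v) for v in exact_b}
    return sorted(
        row[key]
        for row in tab_a
        if row[key] not in exact_b and normalise_key(row[key]) in normalised_b
    )


if __name__ == "__main__":
    # Illustrative rows only; not real RM Compare output.
    items = [{"Item name": "Essay 01"}, {"Item name": "essay 02 "}]
    metadata = [{"Item name": "Essay 01"}, {"Item name": "Essay 02"}]
    print(key_mismatches(items, metadata, "Item name"))  # ['essay 02 ']
```

Any value this returns is best corrected at source, so that both tabs carry one agreed spelling rather than relying on normalisation later.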
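The structural checks in item 2 can also be scripted. This minimal Python sketch assumes a tab loaded as a list of row dictionaries (for example via `csv.DictReader`) and a list of the columns you expect it to contain. `validate_structure` is a hypothetical helper, and the column "Class" in the example is a placeholder, not a real RM Compare field. It reports missing columns, blank cells in expected columns, and repeated values in a column that should be unique within the tab.

```python
from __future__ import annotations

from collections import Counter


def validate_structure(rows: list[dict], expected_columns: list[str], key: str) -> list[str]:
    """Report missing columns, blank cells and duplicate keys in one tab.

    Assumes `key` should be unique within this tab (e.g. one row per pupil).
    """
    problems = []
    present = set(rows[0]) if rows else set()
    missing = [c for c in expected_columns if c not in present]
    if missing:
        problems.append(f"missing columns: {missing}")

    # start=2 so the numbers match spreadsheet rows, assuming a single header row.
    for row_number, row in enumerate(rows, start=2):
        blanks = [c for c in expected_columns if c in row and not str(row[c] or "").strip()]
        if blanks:
            problems.append(f"row {row_number}: blank {blanks}")

    # Count non-empty key values; any value seen more than once is a duplicate.
    counts = Counter(row[key] for row in rows if row.get(key))
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    if duplicates:
        problems.append(f"duplicate {key}: {duplicates}")
    return problems


if __name__ == "__main__":
    # Illustrative rows only; "Class" is a hypothetical expected column.
    pupils = [
        {"Unique Pupil Identifier": "P001", "Item name": "Essay 01"},
        {"Unique Pupil Identifier": "P002", "Item name": ""},
        {"Unique Pupil Identifier": "P001", "Item name": "Essay 03"},
    ]
    expected = ["Unique Pupil Identifier", "Item name", "Class"]
    for problem in validate_structure(pupils, expected, "Unique Pupil Identifier"):
        print(problem)
    # missing columns: ['Class']
    # row 3: blank ['Item name']
    # duplicate Unique Pupil Identifier: ['P001']
```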
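For item 3, a few summary figures and a list of unmatched keys usually surface problems quickly. The Python sketch below assumes each tab has been loaded as a list of row dictionaries; "Score" is a placeholder for whatever your score column is actually called, and `summarise` and `unmatched_keys` are hypothetical helper names. `unmatched_keys` is the scripted equivalent of scanning for `#N/A` results after an exact-match VLOOKUP.

```python
from __future__ import annotations

from statistics import mean


def summarise(rows: list[dict], column: str) -> dict:
    """Row count plus mean, min and max of a numeric column.

    Cells that cannot be read as numbers are counted rather than silently
    dropped, because a non-zero count is itself a warning sign.
    """
    values, non_numeric = [], 0
    for row in rows:
        try:
            values.append(float(row[column]))
        except (KeyError, TypeError, ValueError):
            non_numeric += 1
    return {
        "rows": len(rows),
        "numeric": len(values),
        "non_numeric": non_numeric,
        "mean": mean(values) if values else None,
        "min": min(values, default=None),
        "max": max(values, default=None),
    }


def unmatched_keys(main_rows: list[dict], metadata_rows: list[dict], key: str) -> list[str]:
    """Keys in the main data with no metadata row (where VLOOKUP would return #N/A)."""
    lookup = {row[key] for row in metadata_rows}
    return sorted({row[key] for row in main_rows if row[key] not in lookup})


if __name__ == "__main__":
    # Illustrative rows only; "Score" stands in for your export's score column.
    results = [
        {"Item name": "Essay 01", "Score": "2.5"},
        {"Item name": "Essay 02", "Score": "-0.5"},
        {"Item name": "Essay 04", "Score": "n/a"},
    ]
    metadata = [{"Item name": "Essay 01"}, {"Item name": "Essay 02"}, {"Item name": "Essay 03"}]
    print(summarise(results, "Score"))
    # {'rows': 3, 'numeric': 2, 'non_numeric': 1, 'mean': 1.0, 'min': -0.5, 'max': 2.5}
    print(unmatched_keys(results, metadata, "Item name"))  # ['Essay 04']
```

Comparing these figures with what you expect (how many pupils took part, the rough range of scores) is usually enough to spot a truncated or mis-joined tab.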
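For item 4, a script cannot decide what counts as personal data, but it can help with two mechanical steps: spotting column headers that look like personal information, and replacing identifiers with consistent pseudonyms. The Python sketch below uses a keyed hash (HMAC-SHA256 from the standard library) for pseudonymisation; this is one common technique, not an RM Compare feature. The `SENSITIVE_HINTS` list and the helper names are hypothetical, and the sketch complements rather than replaces the institutional guidelines in item 4.

```python
from __future__ import annotations

import hashlib
import hmac

# Hypothetical header fragments that often indicate direct personal data.
# Extend this list to match your own export and your institution's guidance.
SENSITIVE_HINTS = ("name", "email", "date of birth", "dob", "address")


def flag_sensitive_columns(rows: list[dict], allowed: tuple[str, ...] = ("Item name",)) -> list[str]:
    """Return headers that look like personal data and are not explicitly allowed."""
    headers = list(rows[0]) if rows else []
    return [
        h for h in headers
        if h not in allowed and any(hint in h.casefold() for hint in SENSITIVE_HINTS)
    ]


def pseudonymise(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256), shortened to 16 hex characters.

    The same value and key always give the same pseudonym, so lookups and joins
    still work, but the pseudonym cannot practically be reversed without the key.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


if __name__ == "__main__":
    # Illustrative row only; not real pupil data.
    rows = [{"Item name": "Essay 01", "Pupil name": "A. Smith", "Unique Pupil Identifier": "P001"}]
    print(flag_sensitive_columns(rows))  # ['Pupil name']
    print(pseudonymise("P001", b"store-this-key-securely"))
```

Use the same secret key for every tab so that pseudonymised keys still line up across tabs, and keep that key out of anything you share with an LLM.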

Reflection Prompt

What this means for you:

After this lesson, you can be confident that your data is accurate, privacy-safe, and ready for complex LLM questioning, giving you a reliable foundation for all subsequent analysis tasks in Module 7. These checks also model good professional practice for any data-driven educational project.