Using Data to Tell Your Story


With a few exceptions, staff from all colleges and universities use data to complete regulatory compliance and accreditation work on a regular basis. Much of the time these tasks are routine and, some might say, mundane. Once programs are approved, staff typically only need to submit an annual report to a state department of education or accrediting body unless the institution wants to make major changes such as adding new programs or satellite campuses, changing to a different educational model, and so on.

Then, typically every 7-10 years, a program or institution must reaffirm its program approval or accreditation. That process is much more complex than the work completed on an annual basis.

Regardless of whether an institution is simply completing its annual work or reaffirming its accreditation, all strategic decisions must be informed or guided by data. Many institutions seem to struggle in this area, but there are some helpful practices based on my experiences over the years:

Tips for Using Data to Tell Your Story

  • Know exactly what question(s) you are expecting to answer from your assessment data or other pieces of evidence. If you don’t know the question(s), how can you be sure you are providing the information accreditors are looking for?
  • Be selective when it comes to which assessments you will use. Choose a set of key assessments that will inform your decision making over time, and then make strategic decisions based on data trend lines. In other words, avoid the “kitchen sink” approach when it comes to assessments and pieces of evidence in general. Less is more, as long as you choose your sources carefully.
  • Make sure the assessments you use for accreditation purposes are of high quality. If they are proprietary instruments, that’s a plus because the legwork of determining elements such as validity and reliability has already been done for you. If you have created one or more instruments in-house, you must ensure their quality in order to yield accurate, consistent results over time. I talked about validity and reliability in previous articles. If you don’t make sure you are using high-quality assessments, you can’t draw conclusions about their data with any confidence. As a result, you can’t really use those instruments as part of your continuous program improvement process.
  • Take the time to analyze your data and try to “wring out” all those little nuggets of information they can provide. At a minimum, be sure to provide basic statistical information (e.g., N, mean, median, mode, standard deviation, range); a brief sketch of these calculations appears after this list. What story are those data trying to tell you within the context of one or more regulatory standards?
  • Present the data in different ways. For example, disaggregate them by program or by satellite campus as well as aggregate them for the whole program or institution.
  • Include charts and graphs that will help explain the data visually. For example, portraying data trends through line graphs or bar graphs can be helpful for comparing a program’s licensure exam performance against counterparts from across the state, or for comparing satellite campuses with the main campus.
  • Write a narrative that “tells a story” based on key assessment data. Use these data as supporting pieces of evidence in a self-study report. Narratives should fully answer what’s being asked in a standard, but they should be written clearly and concisely. In other words, provide enough information, but don’t provide more than what’s being asked for.
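To make the analysis and disaggregation tips above concrete, here is a minimal Python sketch using entirely hypothetical licensure-exam scores and campus labels. It only illustrates the arithmetic behind the basic statistics and a simple per-campus breakdown, not any particular institution’s data or tooling.

```python
# Minimal sketch: basic descriptive statistics for hypothetical licensure-exam
# scores, first aggregated for the whole program and then disaggregated by
# campus. All names and numbers below are made up for illustration.
import statistics
from collections import defaultdict

scores = [
    ("Main", 178), ("Main", 165), ("Main", 182),
    ("Satellite", 171), ("Satellite", 159), ("Satellite", 168),
]

def summarize(values):
    """Return the basic statistics mentioned above: N, mean, median, mode, SD, range."""
    return {
        "N": len(values),
        "mean": round(statistics.mean(values), 1),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "std_dev": round(statistics.stdev(values), 1),
        "range": max(values) - min(values),
    }

# Aggregate view: the whole program as one group.
print("Whole program:", summarize([score for _, score in scores]))

# Disaggregated view: one summary per campus.
by_campus = defaultdict(list)
for campus, score in scores:
    by_campus[campus].append(score)
for campus, values in by_campus.items():
    print(f"{campus}:", summarize(values))
```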

Let’s face it: Compliance and accreditation work can be tricky and quite complex. But using data from high-quality assessments can be incredibly helpful in “telling your story” to state agencies and accrediting bodies.

###

About the Author: A former public school teacher and college administrator, Dr. Roberta Ross-Fisher provides consultative support to colleges and universities in quality assurance, accreditation, educator preparation and competency-based education. Specialty: Council for the Accreditation of Educator Preparation (CAEP).  She can be reached at: Roberta@globaleducationalconsulting.com

Top Photo Credit: Markus Winkler on Unsplash 

Inter-rater Reliability Ensures Consistency


In a previous article, we focused on determining content validity using the Lawshe method when gauging the quality of an assessment that’s been developed “in-house.” As a reminder, content validity pertains to how well each item measures what it’s intended to measure and the Lawshe method determines the extent to which each item is necessary and appropriate for the intended group of test takers. In this piece, we’ll zero in on inter-rater reliability.

Internally Created Assessments Often Lack Quality Control

Many colleges and universities use a combination of assessments to measure their success. This is particularly true when it comes to accreditation and the process of continuous program improvement. Some of these assessments are proprietary, meaning that they were created externally—typically by a state department of education or an assessment development company. Other assessments are internally created, meaning that they were created by faculty and staff inside the institution. Proprietary assessments have been tested for quality control relative to quality indicators such as validity and reliability. However, it’s far less common for institutional staff to confirm these elements in the assessments that are created in-house. In many cases, a department head determines they need an additional data source, so they tap a few faculty members on the shoulder to quickly create something they think will suffice. After a quick review, the instrument is approved and goes “live” without piloting or additional quality control checks.

Skipping these important quality control methods can wreak havoc later on, when an institution attempts to pull data and use them for accreditation or other regulatory purposes. Just as a car will only run well when its tank is filled with the right kind of fuel, data are only as good as the assessment itself. Without a reliable instrument that will yield consistent results over multiple administrations, it’s nearly impossible to draw conclusions and make programmatic decisions with confidence.

Inter-rater Reliability

One quality indicator that’s often overlooked is inter-rater reliability. In a nutshell, this is a fancy way of saying that an assessment will yield consistent results over multiple administrations by multiple evaluators. We most often see this used in conjunction with a performance-based assessment such as a rubric, where faculty or clinical supervisors go into the field to observe and evaluate the performance of a teacher candidate, a nursing student, counseling student, and so on. A rubric could also be used to evaluate a student’s professional dispositions at key intervals in a program, course projects, and the like.

In most instances, a program is large enough to have more than one clinical supervisor or faculty member in a given course who observes and evaluates student performance. When that happens, it’s extremely important that each evaluator rates student performance through a common lens. If, for example, one evaluator rates student performance quite high or quite low in key areas, it can skew data dramatically. Not only is this grading inconsistency unfair to students, but it’s also highly problematic for institutions that are trying to make data-informed decisions as part of their continuous program improvement model. Thus, we must determine inter-rater reliability.


Using Percent Paired Agreement to Determine Inter-rater Reliability

One common way to determine inter-rater reliability is through the percent paired agreement method. It’s actually the simplest way to say with confidence that supervisors or faculty members who evaluate student performance based on the same instrument will rate them similarly and consistently over time. Here are the basic steps involved in determining inter-rater reliability using the percent paired agreement method:

Define the behavior or performance to be assessed: The first step is to define precisely what behavior or performance is to be assessed. For example, if the assessment is of a student’s writing ability, assessors must agree on what aspects of writing to evaluate, such as grammar, structure, and coherence, as well as any specific emphasis or weight that should be given to particular criteria categories. This is often already decided when the rubric is being created.

Select the raters: Next, select the clinical supervisors or faculty members who will assess the behavior or performance. It is important to choose evaluators who are trained in the assessment process and who have sufficient knowledge and experience to assess the behavior or performance accurately. Having two raters for each item is ideal—hence the name paired agreement.

Assign samples to each rater for review: Assign a sample of rubrics to each evaluator for independent evaluation. The sample size should be large enough to yield statistically meaningful results. For example, if there are 100 students in a given class, it may be helpful to pull work samples from 10% of them for this exercise. The samples should either be random or representative of all levels of performance (high, medium, low).

Compare results: Compare the results of each evaluator’s ratings of the same performance indicators using a simple coding system. For each item where raters agree, code it with a 1. For each item where raters disagree, code it with a 0. This is called an exact paired agreement, which I recommend over an adjacent paired agreement. In my opinion, the more precise we can be the better.

Calculate the inter-rater reliability score: Divide the number of agreements between the two raters by the total number of items, then multiply by 100 to express the result as a percentage. For example, if two raters independently score 10 items and agree on 8 of them, their inter-rater reliability is 80%, meaning the two raters were consistent in their scoring 80% of the time. A high score indicates a high level of agreement between the raters, while a low score indicates a low level of agreement.

Interpret the results: Finally, interpret the results to determine whether the assessment is reliable within the context of paired agreement. Of course, 100% is optimal, but the goal should be to achieve a paired agreement of 80% or higher for each item. If the inter-rater reliability score is high, it indicates that the data harvested from that assessment are likely to be reliable and consistent over multiple administrations. If the score is low, it suggests that those items on the assessment need to be revised, or that additional evaluator training is necessary to ensure greater consistency.
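To make the arithmetic concrete, here is a minimal Python sketch of the exact paired agreement calculation described above. The rater names, rubric items, and scores are hypothetical; real data would come from the sampled student work that each pair of evaluators scored independently.

```python
# Minimal sketch of exact percent paired agreement for two hypothetical raters.
# Each list holds one rater's scores for a rubric item across the sampled work
# products (e.g., rubric levels 1-4). All values are made up for illustration.
rater_a = {"Item 1": [3, 4, 2, 3, 4], "Item 2": [2, 2, 3, 4, 3]}
rater_b = {"Item 1": [3, 4, 2, 2, 4], "Item 2": [2, 3, 3, 4, 3]}

THRESHOLD = 80.0  # the 80% rule of thumb mentioned above

for item in rater_a:
    pairs = list(zip(rater_a[item], rater_b[item]))
    # Code each pair 1 for exact agreement, 0 for disagreement, then sum.
    agreements = sum(1 for a, b in pairs if a == b)
    percent = agreements / len(pairs) * 100
    action = "acceptable" if percent >= THRESHOLD else "revise item or retrain raters"
    print(f"{item}: {agreements}/{len(pairs)} agreements = {percent:.0f}% ({action})")
```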

Like the Lawshe method for content validity, the percent paired agreement method for determining inter-rater reliability is straightforward and practical. By following these steps, higher education faculty and staff can use the data from internally created assessments with confidence as part of their continuous program improvement efforts.

###

About the Author: A former public school teacher and college administrator, Dr. Roberta Ross-Fisher provides consultative support to colleges and universities in quality assurance, accreditation, educator preparation and competency-based education. Specialty: Council for the Accreditation of Educator Preparation (CAEP).  She can be reached at: Roberta@globaleducationalconsulting.com

 

Content Validity: One Indicator of Assessment Quality


Updated on April 13, 2023 to include additional CVR calculation options from Dr. Gideon Weinstein. Used with permission. 

In this piece, we will focus on one important indicator of assessment quality: Content Validity.

Proprietary vs. Internal Assessments

As part of their programmatic or institutional effectiveness plan, many colleges and universities use a combination of assessments to measure their success. Some of these assessments are proprietary, meaning that they were created externally—typically by a state department of education or an assessment development company. Other assessments are considered to be internal, meaning that they were created by faculty and staff inside the institution. Proprietary assessments have been tested for quality control relative to validity and reliability. In other words:

  • At face value, does the assessment measure what it’s intended to measure? (Validity)
  • Will the results of the assessment be consistent over multiple administrations? (Reliability)

Unfortunately, however, most colleges and universities fail to confirm these elements in the assessments that they create. This often yields less reliable results, and thus the data are far less usable than they could be. It’s much better to take the time to develop assessments carefully and thoughtfully to ensure their quality. This includes checking them for content validity. One common way to determine content validity is through the Lawshe method.

Using the Lawshe Method to Determine Content Validity

The Lawshe method is a widely used approach to determine content validity. To use this method, you need a panel of experts who are knowledgeable about the content you are assessing. Here are the basic steps involved in determining content validity using the Lawshe method:

  • Determine the panel of experts: Identify a group of experts who are knowledgeable about the content you are assessing. The experts should have relevant expertise and experience to provide informed judgments about the items or questions in your assessment. Depending on the focus of the assessment, this could be faculty who teach specific content, or external subject matter experts (SMEs) such as P-12 school partners, healthcare providers, business specialists, IT specialists, and so on.
  • Define the content domain: Clearly define the content domain of your assessment. This could be a set of skills, knowledge, or abilities that you want to measure. In other words, you would identify specific observable or measurable competencies, behaviors, attitudes, and so on that will eventually become questions on the assessment. If these are not clearly defined, the entire assessment will be negatively impacted.
  • Generate a list of items: Create a list of items or questions that you want to include in your assessment. This list should be comprehensive and cover all aspects of the content domain you are assessing. It’s important to make sure you cover all the competencies, behaviors, and attitudes you identified when defining the content domain.
  • Have experts rate the items: Provide the list of items to the panel of experts and ask them to rate each item for its relevance to the content domain you defined in step 2. The experts should use a rating scale (e.g., 1-5) to indicate the relevance of each item. So, if it’s an assessment to be used with teacher candidates, your experts would likely be P-12 teachers, principals, educator preparation faculty members, and the like.
  • Calculate the Content Validity Ratio (CVR): The CVR is a statistical measure that determines the extent to which the items in your assessment are relevant to the content domain. To calculate the CVR, use the formula: CVR = (ne – N/2) / (N/2), where ne is the number of experts who rated the item as essential, and N is the total number of experts. The CVR ranges from -1 to 1, with higher values indicating greater content validity. Note to those who may have a math allergy: At first glance this may seem complicated, but in reality it is quite easy to calculate; a brief worked sketch appears below, after this list.
  • Determine the acceptable CVR: Determine the acceptable CVR based on the number of experts in your panel. There is no universally accepted CVR value, but the closer the CVR is to 1, the higher the overall content validity of a test. A good rule of thumb is to aim for a CVR of .80.
  • Eliminate or revise low CVR items: Items with a CVR below the acceptable threshold should be eliminated or revised to improve their relevance to the content domain. Items with a CVR above the acceptable threshold are considered to have content validity.

As an alternative to the steps outlined above, the CVR computation with a 0.80 rule of thumb for quality can be replaced with another method, according to Dr. Gideon Weinstein, mathematics expert and experienced educator. His suggestion: just compute the percentage of experts who consider the item to be essential (ne/N), and the rule of thumb is 90%. Weinstein went on to explain that “50% is the same as CVR = 0, with 100% and 0% scoring +1 and -1. Unless there is a compelling reason that makes a -1 to 1 scale a necessity, then it is easier to say, ‘seek 90% and anything below 50% is bad.’”
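Here is a minimal Python sketch of both calculations, assuming a hypothetical panel of 10 experts and made-up counts of “essential” ratings per item. It simply illustrates the arithmetic in the Lawshe formula above and in Dr. Weinstein’s alternative; it is not a prescribed tool.

```python
# Minimal sketch: Lawshe CVR and the simpler percent-essential alternative.
# The panel size and "essential" counts below are hypothetical.

def cvr(n_essential: int, n_experts: int) -> float:
    """Lawshe Content Validity Ratio: CVR = (ne - N/2) / (N/2)."""
    return (n_essential - n_experts / 2) / (n_experts / 2)

def percent_essential(n_essential: int, n_experts: int) -> float:
    """Weinstein's alternative: the share of experts who rated the item essential."""
    return n_essential / n_experts * 100

panel_size = 10
essential_counts = {"Item 1": 10, "Item 2": 9, "Item 3": 6}

for item, ne in essential_counts.items():
    # Rules of thumb from the article: aim for CVR >= .80, or >= 90% essential.
    print(f"{item}: CVR = {cvr(ne, panel_size):+.2f}, "
          f"essential = {percent_essential(ne, panel_size):.0f}%")
```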

Use Results with Confidence

By using the Lawshe method for content validity, college faculty and staff can ensure that the items in their internally created assessments measure what they are intended to measure. When coupled with other quality indicators such as inter-rater reliability, assessment data can be analyzed and interpreted with much greater confidence and thus can contribute to continuous program improvement in a much deeper way.

###

About the Author: A former public school teacher and college administrator, Dr. Roberta Ross-Fisher provides consultative support to colleges and universities in quality assurance, accreditation, educator preparation and competency-based education. Specialty: Council for the Accreditation of Educator Preparation (CAEP).  She can be reached at: Roberta@globaleducationalconsulting.com

 

Top Photo Credit: Unseen Studio on Unsplash 

 

Countdown: Assessing Your Way to Success

We’re just a couple of days away from an all-day workshop I’m conducting on behalf of the Network for Strong Communities entitled, “Assessing Your Way to Success: How to Use Measurable Outcomes to Achieve Your Goals”.

The June 14th workshop is designed for non-profit organizations representing a variety of sectors (healthcare, social services, education, faith-based, etc.) that are committed to tackling problems and meeting the needs of those they serve.

Designing new programs and initiatives is something all non-profits do—but it’s important to give those efforts every chance of success. This workshop will provide many tools that can help!

This will be a fast-paced, action-packed day with lots of hands-on activities and FUN! Please consider joining us as we tackle topics like:

  • The basics of designing effective programs/initiatives
  • Determining success through measurable outcomes
  • The role of high-quality assessments to accurately gauge success
  • Building a strong program evaluation model
  • Assessment basics
  • Putting all the tools to work
  • Making data-driven decisions to inform strategic planning
  • Individualized consultation time: Let’s get started building your new program/initiative!

 

 Looking forward to seeing you there! If you live outside the St. Louis metro area, reach out to me and I can come to your location.

–rrf

 

About the Author

Dr. Roberta Ross-Fisher is a national leader in educator preparation, accreditation, online learning, and academic quality assurance. An accomplished presenter, writer, and educator, she currently supports higher education, P-12 schools, and non-profit agencies in areas such as competency-based education, new program design, gap analysis, quality assurance, program evaluation, leadership, outcomes-based assessment, and accreditation through her company, Global Educational Consulting, LLC. She also writes about academic excellence and can be contacted for consultations through her blog site (www.robertarossfisher.com). 


What’s Under the Hood: Major Components of Competency-Based Educational Programs

This is the second installment in a series of blog posts on the topic of competency-based education. In the first blog, I provided a basic overview of what competency-based education is, why I started using it with my own students, and other terms it’s frequently known by. Feel free to reach out to me if you have additional questions or need support implementing CBE in your school.

Regardless of whether you work in a P-12 school or at a higher education institution, there are six major pillars that anchor a solid competency-based education program:

  • Curriculum
  • Instruction
  • Assessment
  • Faculty Training & Support
  • Parent/Caregiver Orientation & Support (for P-12 Schools)
  • Student Orientation & Support (for all learner levels)

 

A strong, healthy CBE program must be built on these pillars, which makes preparation, planning, and collaboration extremely important. All six should be tied directly to the school’s mission and vision, and they should all be connected to each other to avoid a disjointed program.

I recommend using a backwards design model when developing your own competency-based education program—in other words, create a well-defined “picture” of what you want to accomplish—what is your final goal? What does success look like in your school? How would that be defined? Once you and your team know what you want to accomplish, you can start working backward from there and build out each of those six components.

Installment #3 of this series will focus on developing curriculum in a competency-based education program.

 

–rrf

 

Dr. Roberta Ross-Fisher is a national leader in educator preparation, accreditation and academic quality assurance. She currently supports higher education and P-12 schools in areas such as competency-based education, teacher licensure, distance learning, and accreditation through her company, Global Educational Consulting, LLC.