Automated Marking of Tests/Quizzes

I’ve been dabbling in the automatic marking of tests and quizzes for several years now. By this, I mean a web-based set of questions that a student completes on a specific topic, which automatically grades the answers as correct/incorrect (sometimes giving partial credit) and returns the mark (sometimes with feedback) to the student at a specific time. Before you think ‘oh, this is good, no marking’, I’d warn you about the setup burden.

What kinds of assignments make really good auto-tests? I have used them for the following:

  • pre-laboratory exercises that give practice at calculations, some safety aspects, and identifying products/balancing equations (Blackboard Test/Pool)

  • online safety quiz  (Blackboard Test/Pool)

  • assessed tutorial with a very tight deadline before the exam  (Blackboard Test/Pool)

  • referencing and academic conduct test  (Blackboard Test/Pool)

  • diagnostic test (new for 2017! Google Form Test)

The technology has limitations, particularly related to the type of questions you can ask. I find the following types useful:

  • multiple choice questions

  • calculated numeric [with the caveat that Blackboard can’t deal with 5.5 and 5,5, units, or number of decimal places]

  • fill in the blank or short answer [with the caveat that students often can’t spell (even when given a list of words to select their answer from), and sometimes deducing the syntax of the required answer is tricky]

  • matching pairs [really good for reactant/product equation matching in transition metal redox chemistry]

I also like the ability to write a pool of questions and a system that allows each student to be asked a number of questions from the pool. If every question is from a different pool, this reduces the scope for collusion. An example of a good pool question stem for a calculated numeric question:

Calculate the mass of copper sulfate required to prepare a [Y] molar aqueous solution in a [X] mL volumetric flask. 

You can see how simple it is to vary X and Y within the realms of possibility, generate all the correct answers in Excel, and make a pool of questions.
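The answer generation is just one formula applied over the X/Y grid. Here’s a minimal sketch of that calculation in Python rather than Excel – note the molar mass here assumes *anhydrous* copper(II) sulfate, which is my assumption for illustration; swap in 249.69 g/mol if your lab uses the pentahydrate:

```python
# Sketch: generate the answer key for a pool of calculated-numeric questions.
# Assumes anhydrous copper(II) sulfate; use 249.69 g/mol for the pentahydrate.
MOLAR_MASS_CUSO4 = 159.61  # g/mol

def mass_required(concentration_m, volume_ml):
    """Mass (g) of CuSO4 for a [Y] molar solution in a [X] mL flask."""
    return concentration_m * (volume_ml / 1000) * MOLAR_MASS_CUSO4

# Vary X and Y within sensible ranges to build the pool.
for conc in (0.05, 0.10, 0.25):
    for vol in (100, 250):
        print(f"{conc} M in {vol} mL: {mass_required(conc, vol):.3f} g")
```

Each printed line becomes one pool question plus its correct answer.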

The setup burden is how long it takes to create the initial test. As a rough guide, I’d estimate it to take at least twice as long as it would take to mark manually! So for a pre-lab done by 50 students, taking me 10 hours to mark, I’d expect to spend about 20 hours developing the online version. I do not recommend doing the online test thing unless you know you can use it for at least 2 years – one reason for doing it is to reduce the marking load, and you don’t really start to make gains until the 3rd year of running. On the other hand, it’s a convenient way to ship time from semester (marking time) into quieter times of the year (prep time). I estimate that each test requires 1–2 hours of tweaking and set-up each year, usually after reviewing the analytics from Blackboard, weeding out poorer questions, adding a couple of new ones…that sort of thing.

Why do I do this? Well, each of the assignments I’ve outlined is reason enough in itself, but some have transitioned from paper-based to online (pre-labs, diagnostic test) and some would not exist if they could not be online (safety, referencing, academic conduct, assessed tutorial). So sometimes there is no reduction in marking time for me, because I wouldn’t offer the assignment in an alternative manner. Technology facilitates the use of formative tests to aid learning, so I use it.

This year I’m expanding my range of formative tests by transferring my 1st year spectroscopy ‘drill’ questions into an online format. When teaching things like the equations of light, basic NMR, IR etc., I recognize the value in doing lots of examples. I also recognize the value in those examples stepping up in difficulty every so often – I’ve been calling these steps ‘levels’.

For example, using the equation E = hν

Level 1 – calculation of E in J, with ν in Hz

Level 2 – calculation of E in kJ/mol

Level 3 – calculation of E in kJ/mol with ν in ‘insert standard prefix here’ Hz

Level 4 – calculation of ν with energy in J

Level 5 – …
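Each level is a small variation on the same formula, so generating pool answers is straightforward. A minimal sketch (standard values of Planck’s and Avogadro’s constants; the function names are mine, not anything from the quiz platform):

```python
# Sketch of the first few 'levels' of E = hv drills.
H = 6.626e-34    # Planck constant, J s
N_A = 6.022e23   # Avogadro constant, 1/mol

def energy_J(freq_hz):
    """Level 1: E in J, with frequency in Hz."""
    return H * freq_hz

def energy_kJ_per_mol(freq_hz):
    """Level 2: E in kJ/mol."""
    return H * freq_hz * N_A / 1000

def energy_kJ_per_mol_prefixed(freq, prefix):
    """Level 3: frequency given with an SI prefix, e.g. MHz or GHz."""
    scale = {"k": 1e3, "M": 1e6, "G": 1e9, "T": 1e12}[prefix]
    return energy_kJ_per_mol(freq * scale)

def frequency_Hz(energy_j):
    """Level 4: rearranged for frequency, energy in J."""
    return energy_j / H
```

Feeding each function a grid of frequencies gives the answer key for that level’s pool.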

You get the idea anyway.  I read a paper on this a few years back, about stepping up calculations in small steps.  So I’m making question pools for each level, bundling a few levels together into a quiz then setting a minimum score requirement to gain access to the next levels. Students will do quiz 1 and if their mark is high enough (80%+) they get access to quiz 2. If it isn’t, they’ll get access to a couple of worked examples and the chance to re-do quiz 1 to get the mark.
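The adaptive release rule amounts to a single threshold check. As a sketch (the 80% threshold is the one from the post; everything else here is illustrative, not how any particular VLE implements it):

```python
# Sketch of the adaptive release rule: score 80%+ on quiz 1 to unlock quiz 2;
# otherwise get worked examples and another attempt at quiz 1.
PASS_MARK = 0.80

def next_step(score):
    """score is a fraction in [0, 1] for the current quiz."""
    if score >= PASS_MARK:
        return "unlock next quiz"
    return "show worked examples, allow re-attempt"

print(next_step(0.85))
print(next_step(0.55))
```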

I’m aware that this type of drill enforces a purely algorithmic approach, but if my students can’t do these bits, they are going to run into a whole lot of problems at higher levels. When setting exam questions, I balance the question between the algorithmic problem solving stuff like performing a calculation, and the ‘explain your answer’ part where they need to demonstrate a degree of understanding. We can argue over the appropriate balance between those sections but I think the algorithmic stuff should be 40 – 60% of the marks available (depending on the level of the paper) and the balance should be the explanation stuff, or higher level problem solving such as unseen, unfamiliar problem types. With this balance I’m saying ‘well you can probably pass the exam if you can do the basics but you need to show more to get a great mark’.  I also assume that intended learning outcomes define a passing mark (40%) or a low 2:2 mark (50%), rather than a 100% mark.

The experience of setting up and running a diagnostic test through Google Form Tests requires a post of its own, so I’ll come back to that.


Reflections on Variety in Chemistry Education/Physics Higher Education Conference 2017

This year’s conference started on Thursday 24th August 2017. I did not attend the lab events on Wednesday because I think a 2-day conference is sufficient – I get seriously ‘conferenced out’ after a while and I’d rather focus on the talks. I did arrange to arrive in time for Wednesday dinner, though, to avoid the early trains and very long day that arriving on Thursday would have meant.

This was also the first time I attended a conference seriously aware that I had limited energy and that I needed to take care not to overdo things. There were a few challenges in that regard: simple things like getting between venues, accommodation, dining rooms, and bus stops were harder, as was standing for several hours in a poster session. It’s quite eye-opening when you notice that you struggle to do things you previously took for granted. The only advice I have for people organising conferences is to be really explicit in delegate information before arrival about distances between things and requirements to take buses, and to give serious thought to how much seating is available during sessions such as posters and lunches.

I enjoyed many of the sessions, and a lot of the talks. I found several talks highly frustrating, particularly those that could easily play into the unconscious bias held by many. These included the keynote on gender differences in the physics force concept inventory, and the closing keynote on perspectives either side of the A-level/university chemistry transition. Here’s the thing: unconscious bias afflicts us all, both as recipients and as people who hold it. One particularly insidious example is the bias that those who teach in HE often have towards those who teach in secondary. I’ve heard a lot of academics disparage teachers of chemistry at A-level, particularly around ‘teaching to the test’. I do not doubt that many HE teachers view incoming students through a deficit lens – they focus on what the students can’t do rather than respecting the enormous hard work, effort and learning that has taken place in secondary (and further) education. And let’s be clear, I don’t mean all HE teachers, but I’d rather we focus on what our incoming students can do, acknowledge the diversity inherent in the secondary and further education sectors, and recognise the enormous pressure on both teachers and students. I disliked that the keynote could feed those attitudes by drawing greater attention to the issues.

My (annual?) issue with the gender thing. Males and females performed differently on a test that was developed to see how much students understand about some physics stuff. It wasn’t designed to investigate gender, and if it shows a difference in male/female performance, we should absolutely investigate and be concerned. But we have to be really careful how we do this and make sure that we’re using the right research tools and asking the right research questions. We also have to be really careful to explore gender sensitively which means considering all aspects of gender, rather than treating it as a male/female binary.


Moving on from that, there was an interesting talk on teaching tips for making a more inclusive teaching environment. I think that’s something we all need a reminder of from time to time but I would have liked a few more concrete suggestions on things that work. PDF format is variable and not all versions work for screenreaders…OK then, which way of generating a PDF is best? That sort of thing. And it would be really good to share ideas about how to convince all those who teach in HE to adopt practices that make teaching as inclusive as possible.

The poster session didn’t really work for me – there were two rooms, with many of those in the first room unaware of the second. There were also very few seats, which made it very, very tiring. I had several very good discussions with people over my posters but never really made it round them all. Fortunately the majority of the posters were shared electronically beforehand, which allowed me to review them at my own pace. There were still lots of people I wanted to talk to in the poster session but never found – and by the time lunch was served, I was out of energy.

There is also some question over what the Variety in Chemistry Education conference is for – what type of chemistry education work it should showcase. To me, the strength of Variety is in the variety: I go to see great chem ed ideas presented, but I also enjoy presentations that steer more strongly towards chemistry education research. I like the mix and I wouldn’t want to see that change. I do, however, have one caveat: I dislike intensely anything presented that has not been sensibly evaluated. And to evaluate a teaching innovation, one must carry out the activity with students. I’m OK with the evaluation being done purely from the perspective of the teacher, and I’m OK with the evaluation being done from all perspectives. I’m not OK with unevaluated activities being presented. I suppose I should comment on the role of Variety in seeking collaborations, or putting a flag up to say you’re doing a certain thing and would anyone like to help. That’s what oral bytes are for, nothing longer. I would call on future organisers of this conference, and those who scrutinise the abstracts, to ensure that evidence of evaluation is clear.

I’ll also note that I have no intention of attending the pre-conference activities. 2 days of conference and dinner the night before is sufficient for me, and I don’t need to attend more.

We also had a chat on Twitter the other night about whether we should facilitate the attendance of undergraduates who’ve done chemistry education final year projects. I think it would be good to see undergrads attending Variety and if their work is of sufficient standard to justify submitting an abstract for a presentation, then great stuff, what an opportunity! I would not, however, reduce the ‘standard’ expected of that abstract or subsequent presentation. So I’m not in favour of giving undergraduates (or any other group) more chance of getting a presentation slot just because of what they are.


This post has been several drafts in the making, I’m still not happy that I’m articulating what I want to say very clearly, but right now I’m done with it and am hitting publish! Time to move on!


Poster: Alternative Assessments #ViCEPHEC17

This is my poster for #ViCEPHEC17 on a range of assessments through 3 years of the chemistry curriculum. It was a surprisingly difficult poster to put together as there was so much I wanted to include. In the end I focussed on the general skills developed rather than the specifics of each assignment.

One of the highlights of these assessments is getting students to create ‘What Am I?’ puzzles. This started as a part of this blog where I’d post the chemical components of everyday items and readers had to guess the item. Since running this as an assessment, I’ve hidden those blog posts. The students have to decide on an everyday item, investigate its chemical composition, draw the compounds using ChemDraw and present them as an appealing single-page graphic. The purpose of the first page is to allow the reader (that’d be me) to guess the item. I’m getting very good at identifying various body sprays by chemical structure. On the 2nd page they should indicate what the item is, and briefly outline the purpose of each chemical in that item, finishing with a reference list. Should you wish to play, here is the What Am I? Variety in Chemistry Education Edition:

Below I’ve listed the key chemicals found in a common thing, which may be slightly UK-centric.  I’ve drawn the chemical structures of principal components where simple and appropriate; given the E number or CAS number (however tempting Sigma-Aldrich catalogue numbers would be) if no simple chemical structure exists for an additive; and given the chemical formulae or name if neither of the above make sense.  See if you can guess what this is!  If you guess on Twitter (@kjhaxton), please DM your guess so others can play.


[Image: structures of chemicals contained in a common item]


And should you wish to find more of these puzzles, try:

With respect to the other assessments I mention, I have presented on some of the elements before so there are a couple of slide decks for further info.

For more information on the 3rd year infographics:

For more information on some of the other assessments

And on the 1st year screencast presentations


Poster: NMR Diagnostic Test #ViCEPHEC17

One purpose of this blog post is to provide additional materials for my poster at Variety in Chemistry Education 2017.

This is the third of three posters presented this year on our NMR diagnostic test project. It’s a subset of a larger project but the NMR results have been quite interesting and I’ve also been fortunate enough to have a couple of students work on the project.

For those attending ViCEPHEC17 who fancy having a go at some spectroscopy diagnostic test questions, here’s a link to a shortened online version. All submissions are anonymous and there are only 5 questions!

If you have any feedback on the test, you can comment on this post, or get in touch at Variety or by email. Using GoogleForms is new this year for this type of test.

The ViCEPHEC17 Poster is here:


For the RSC Twitter Poster competition, this poster was submitted which outlines some of the key initial findings:

For Methods in Chemistry Education Research, the poster focussed on the research methods we were using to investigate this project:

Poster: Methods of investigating alternative conceptions in NMR

Better Ed Tech Solutions

It’s course prep season here. It’s fun. Sort of. I’m looking forward to term starting. Kind of. I’m doing everything the same as last year. Not really. I’m frustrated that there’s no ed tech tools that do what I need them to. Totally.

Screencast Presentations: After last academic year’s wee ‘oh shit I broke the VLE and this time it can’t be fixed’ moment, I’m looking for a legitimate alternative for the 1st year screencast presentation assignment.

Here’s the workflow:

Student: – create and submit screencast; complete electronic self-assessment form

Me: – allocate or initiate peer assessment, each student views and marks 4 screencasts and completes electronic peer-assessment form on each.

Student: – watches 4 screencasts, completes electronic peer-assessment form after each, completes 2nd self-assessment form.

Me: – screams in dismay as I have to compile (for a class of 100): 2 x self-assessment (200 items), 4 x peer-assessment (400 items), moderate the grades, arrive at the final grade, and release marks and peer-assessment feedback to submitting students.

Now to be fair to me, I’ve got the grade compilation bit down to around 2 hours’ work for a class of 100. I’m fairly whizzy with Excel (mmmmmm….data). But I think it should be possible to do this automatically, which is what the thing that I broke on the VLE did (well, not exactly, but the shortcomings were far outweighed by the benefits). The hard-labour version of this is Google Forms for the self- and peer-assessment bits, which generate multiple spreadsheets, then the Excel-whizzing to bang it all together. [If you’re ever doing this, the “IF” function is your friend to ensure that you’re putting the right grades together.]
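The Excel-whizzing boils down to joining several sheets on a student identifier and weighting the components. A minimal Python sketch of the same join – the IDs, column layout and 25/75 weighting here are entirely hypothetical, for illustration, not my actual scheme:

```python
# Sketch: compile one final grade per student from self- and peer-assessment
# form exports. All data and weights below are made up for illustration.
from statistics import mean

self_scores = {"stu01": 62, "stu02": 75}             # one self-assessment each
peer_scores = {"stu01": [60, 70, 65, 58],            # four peer markers each
               "stu02": [72, 80, 78, 74]}

def final_grade(student_id):
    # The Excel "IF" trick: only combine grades whose IDs actually match.
    if student_id not in self_scores or student_id not in peer_scores:
        raise KeyError(f"missing submission for {student_id}")
    return round(0.25 * self_scores[student_id]
                 + 0.75 * mean(peer_scores[student_id]), 1)

for sid in self_scores:
    print(sid, final_grade(sid))
```

One sheet per form export, one dictionary per sheet, and the ID check replaces the nested IFs.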

Voting for Peer Instruction: Another problem this year is that I’m not hiring a supply chain of lorries just to get me and my teaching stuff to class. That means no carting personal response devices around. I have peer instruction questions and I’d like to do PI with students. I tried Socrative last year but the BYOD element of this was…frustrating.

Firstly, I couldn’t work out whether Socrative could do those fancy graphs that the Turning Technologies stuff does – the ones you can display in-presentation to the class to show the spread of responses for an MCQ without revealing the correct answer. I couldn’t find it.

Secondly (and this is a common complaint from me), most of my pre-existing MCQs weren’t in a format that could be easily imported or exported, and indeed multiplatform compatibility was pretty much non-existent. Now if you think I object to 2 hours bashing grades in Excel, it’s nothing compared to the 2 hours copy-pasting questions between different programmes that might do what I need.

Thirdly, there aren’t enough power points in lecture theatres for students BYODing laptops in a 2-hour class. Wifi held up well, batteries did not.

[In the interests of fairness, the personal response devices can be a pain when no one’s checked and replaced the batteries for a while…but at least I know which door to knock on about that. And there’s the weight.]


Reflective Diaries: last year I ran a reflective diary exercise for the first time. It worked fairly well and I used VLE blogs for the purpose. The interface for grading is basic, and only allows grading of the overall blog rather than each post (unless I missed something). There’s also something quite irritating about the month-by-month view and its impact on the total number of posts submitted. This year I’m setting up a Google Form (set as a test) for each reflection topic and am wondering whether to run this through the VLE with links to each form (generating one spreadsheet of outcomes per reflection) or to brave Google Classroom, which I believe will dump all the grades into one spreadsheet and save me some Excel-whizzing. The plus side of Google Classroom definitely fits the ‘mmm, shiny new play time’ category. I can see some features that really appeal to me for the type of module I’m running. The negative side is that I will be making it all up as I go along and really have little idea beyond that which I can figure out or google. Actually I can see some really nice features in Google Classroom that would work well with the module generally, so it might be worth the investment of time.


So, how about you lot? Trying anything new this coming academic year?



Colourful Chemistry Or Not

I have been reading about how colour vision deficiency (colour blindness) can cause issues for people. I was very interested in this article (and several others) about the difficulties in reading certain graphics and maps when red-blue colour scales are used. For example, temperature anomaly charts are often red-white-blue, with red being hotter than average and blue being colder than average. Some forms of colour vision deficiency will stop a reader accurately differentiating these scales. I think the chart- and map-making disciplines may be ahead of chemistry in thinking about this – most of the articles I can find relate to earth and environmental sciences.

Colour blindness affects about 7% of males and 0.4% of females, and it differs in severity and type. A 7% prevalence means that, in most classes, at least one student probably has an issue with ‘seeing’ colour, and this may cause problems. Many people are not aware that they are colour blind. Possible issues for chemistry teaching may be:

  • litmus paper/pH paper/universal indicator

  • red-blue shading on graphics (molecular orbital diagrams, where colours are used for emphasis, charts etc)

  • transition metal chemistry where observing colour changes and noting down those observations are key (and often assessed)
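The back-of-envelope claim that those prevalence figures imply at least one affected student per class can be checked quickly. A sketch, assuming (my assumption, for illustration) a class of 50 with an even gender split:

```python
# Quick check: probability that at least one student in a class has a colour
# vision deficiency, given ~7% prevalence in males and ~0.4% in females.
# Class size and gender split below are illustrative assumptions.
def p_at_least_one(n_male, n_female, p_male=0.07, p_female=0.004):
    p_none = (1 - p_male) ** n_male * (1 - p_female) ** n_female
    return 1 - p_none

# A class of 50 with an even split:
print(f"{p_at_least_one(25, 25):.0%}")
```

For a class of 50 this comes out at roughly 85%, so planning teaching as if someone in the room is affected is a safe default.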

If students are working in pairs, the lab partner may be able to help out with some tasks. But what if the task in question is part of a practical exam? Generally I have avoided things that require too much recording of colour for practical exams, and where recording colour changes would be appropriate in the observations, it’s only been worth a tiny quantity of marks. In terms of other teaching, we can ensure that our diagrams and graphics are created using appropriate colour schemes where we have the option – we don’t have a choice if they are textbook graphics. I’ve been moving towards purple-orange, and have never used red-blue as it’s always felt a little jingoistic (hah!).

Concepts, Calculations and Confusion

I’ve been trying to straighten out a few things in my head with regard to conceptual understanding of topics.  Prompted by numerous yet isolated things, I’ve been trying to get to the bottom of several frustrations I have when it comes to assessing learning.

  • Some can’t work out how to answer the question when it isn’t familiar (and likely rehearsed).
  • Some can answer the majority of a question, particularly something that relies on rehearsed procedures, but never get the ‘explain your answer’ bit. Or even seem to notice that it is part of the question.
  • Model answers seem to be very very popular.
  • It is significantly harder to assess the underlying concepts than to test their application. You have to identify the concepts.

It is a muddled list, but the items seem to be interconnected in some way that I can’t yet untangle.

Actually I’m going around in circles and getting tied up in algorithmic problem solving versus genuine problem solving…I’m off to read more in the hope that the circles become more spiral-like with some sense of an end in sight.

Slow Creep and Revolution: Bringing about actual change

The more I go to education conferences and away days, read journals and otherwise engage in the ephemeral activity of scholarship, the more cynical I start to become. There seem to be a couple of processes at work, both of which can be positive and appropriate responses to specific circumstances, but neither of which actually brings something new to teaching and learning. On one hand there is the slow creep of a good teaching innovation, winding its way through the communities, passed on through conferences, word of mouth and the odd publication (or blog post?). On the other hand there is the endless cycle of revolution, where things fall in and out of fashion, never truly solving the issues they were tasked to address but rather amplifying the bits they don’t address until those become the focus of a revolution to bring back the thing before and change something else.

A perfect example of slow creep is lecture recordings. As more and more institutions take up the technology, and more and more individuals give it a go, there are lots of entry-level studies being carried out. The general questions these ask are whether the recorded lectures helped students learn or just made them feel better about everything. The unspoken questions tend to be about specific activities in specific courses and places – did my students in my class like my recordings? Perhaps the question we should be asking revolves more around whether lecture recordings are an adequate crutch to justify the continued existence of traditional lectures. And then there is the argument that lecture recordings are some kind of gateway activity: do it and the world of flipped teaching will open up before you. Yes, and don’t forget to tell the students how to generate a decent set of notes for revision while you try the new shiny style. We probably don’t need more presentations reporting that student feedback was really positive about lecture capture, that access stats are a bit iffy (because it could have been a few students constantly reloading the page out of mischief, or some computer glitch) but show the recordings were used a lot in revision periods, and that we can’t really decide whether there is any impact on performance because other stuff changed too. We do need people to critically consider recorded lectures and the behaviours they provoke in staff and students, and figure out whether this is something we want more of or not.

As for revolution (and the reason for this post), I stumbled across a publication by a couple of colleagues charting our change from assessed problems to seen class tests. We’ve now moved to unseen but heavily cued class tests (e.g. I email the cohort and tell them I’ll test them on topics A, B and C but not D–G, or that they can expect questions ‘like’ questions 2–6 in problem sheet 2), but there is a subtle push back towards submitting problem sheets, and sooner or later someone’s going to pipe up that if we’re asking the students to submit them anyway, why not give some marks. Well, the reason we got rid of them 10 years ago was apparently the level of collusion noted in the submissions. The fine line between collaborative learning and good old-fashioned collusion strikes again.

What does new look like in teaching? Not learning and teaching, just the bit I can do, the teaching bit. I have no idea.




Miraculous Coalescence of Ability, Circumstances, and Fate

There’s got to be a better term than the title of this post. Perhaps the Germans have a word for it. And I’m sure you all know it well: that ill-defined point in the future where your ability, your circumstances and a healthy dose of luck will conspire to enable you to complete something to a previously unachieved standard. Consider the student who believes that this time their revision, ability and luck with the exam questions will result in A-grades instead of C-grades. Consider the academic who believes that this next vacation (aided by an appropriate productivity hashtag on Twitter) will result in all of those partially conceived manuscripts lining themselves up and exiting stage left into the editors’ submission box. Consider Del-Boy Trotter standing by Rodney…this time next year we’ll be millionaires.

Unfortunately January and February are the collision of realism with dark skies for those of us in the Northern hemisphere, compounded by assessment periods in many institutions. The brutal truth is that these miraculous periods rarely appear and we generally get or do what we always got or did. It’s often made worse by the notion that as tasks get progressively harder and we find ourselves working longer/harder/smarter, that’s often only to stand still, or worse, lose a little ground. I’m sure we’ve all shared that feeling of indignation and disappointment that comes when we feel we’ve worked harder/better/longer than ever before only to achieve roughly what we’ve always achieved. As we’re now in the summer period, this is just as relevant, except with fewer clothes, beaches and ice cream. How many of us think ‘this will be the summer of doing all the things’?


We need to discuss Failure Bragging

It’s nearly resit season, and even when it’s not, how many times do we hear a student leave an assessment bragging about how badly they’ve failed it? Of course, we don’t actually consider it bragging; I’m not sure what we do consider it. Is it a veiled form of apology for considering one’s efforts ‘not good enough’? Is it a rebuke to the assessors for setting something perceptibly challenging rather than predictable and achievable? Is it a socially acceptable norm to avoid seeming like one of the smart ones?

It is bragging, and it occupies the same category as apologising: something we have to stop.

How many times have we had work submitted and a student apologise, saying something like ‘it isn’t very good’? And how many times have we wanted to express our exasperation and simply ask them why they submitted it? I’m very much in favour of things being good enough: good enough work for the circumstances in which I find myself at the moment; good enough for the effort and time I felt this task required; good enough because I don’t know how to do it any better at the moment. There’s a really good essay on apologising for perceived deficiencies in one’s effort here:

The essay is about apologising for one’s efforts, and it relates to failure bragging because both are behaviours designed to provoke the same response: reassurance. Reassurance, or perhaps ego fluffing; it depends how kind I’m feeling which way I regard it. How many times do we respond to a student’s exclamation of ‘totally failed that’ with reassurances that they are smart, that they probably haven’t failed, that the world doesn’t end even if they have, and that they should focus on the next thing?

And both are behaviours that the person expressing the sentiment has within their power to control. The only failure is when you don’t learn and improve for the next time. The only apology needed is when you have genuinely wronged someone. Apology is weakened by overuse (particularly amongst the British), and genuine failure is something far more serious than feeling rubbish after an assessment.

So what is the alternative? Success bragging is equally irritating, but perhaps a more considered approach is better. How about admitting the assessment was difficult and you’re going to wait and see what the mark is? How about admitting you found working out what to cook for your fancy dinner tricky, rather than apologising for the rubbish you’re serving? How about sparing us the hand-wringing, reassurance-seeking remarks and just asking for support if you feel you need it?