Standards: Proof and verification
Modeling doesn’t have to be limited to quantitative areas. Models can be qualitative. A good subject to illustrate this is the concept of a “professional standard”. This is a widely discussed topic, but I’m hoping to approach it through a relevant class of theoretical models, proof systems, and to use concepts from them to discuss professional standards and a particular legal standard potentially applicable to your professional work.
A great deal of computer science literature deals with “proofs”, not necessarily in the sense of proofs of theorems but simply of validation or attestation of a solution. How do you prove that the solution that comes from your “black box” is correct? This is a foundational question for computer science, given that the original goal of computing was to perform calculations that could not easily be carried out by humans.
The two main mechanisms for proving validity are independent verification, reproducing the result using something other than the black box, and the use of “witnesses”, pieces of information that make it computationally easier to verify the result. A prime factor of a composite number is a witness to the result that the number is composite. A candidate solution of a maximization problem is a witness to a lower bound on the maximum value.
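As a toy sketch of the witness idea (my own illustration, not from any particular source), checking a claimed factor of a number is cheap even when finding one would be hard:

```python
def verify_composite(n: int, witness: int) -> bool:
    """A nontrivial factor of n is a witness that n is composite."""
    return 1 < witness < n and n % witness == 0

# Verifying the witness is easy even though factoring can be hard.
assert verify_composite(91, 7)        # 91 = 7 * 13, so 7 is a valid witness
assert not verify_composite(97, 5)    # 5 does not divide 97
```

The asymmetry is the point: the verifier never has to reproduce the search that produced the witness, only to check it.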
Examples of independent verification should be familiar to most risk management professionals. These include manually walking through a calculation using a calculator, matching results with a spreadsheet, and external replication audits. Comparing results against estimates based on a simpler model also constitutes a form of independent verification.
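The simpler-model comparison can be sketched as a tolerance check. This is a minimal illustration of my own; the function name and the 10% tolerance are arbitrary assumptions, not an established procedure:

```python
def independently_verified(black_box_result: float,
                           simple_estimate: float,
                           tolerance: float = 0.10) -> bool:
    """Accept the black-box result only if it falls within a relative
    tolerance of an estimate produced by a simpler, independent model."""
    return abs(black_box_result - simple_estimate) <= tolerance * abs(simple_estimate)

assert independently_verified(104.0, 100.0)       # within 10% of the estimate
assert not independently_verified(150.0, 100.0)   # too far from the estimate
```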
One practical example of witness data is a checksum. If a sequence of numbers is followed by a number asserted to be the sum of the sequence, and reevaluating the sum produces a different result, this indicates that either the sequence has been altered or the checksum did not relate to that sequence. For most real-world data, it is also improbable that two different sequences produce the same checksum.
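The sum-style checksum described above can be written in a few lines (a deliberately simple sketch; real systems use hash functions rather than plain sums):

```python
def verify_checksum(values: list[int], claimed_sum: int) -> bool:
    """Recompute the sum and compare it to the transmitted checksum."""
    return sum(values) == claimed_sum

record = [120, 45, 300]
assert verify_checksum(record, 465)       # checksum matches the data
assert not verify_checksum(record, 470)   # altered data or an unrelated checksum
```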
Another practical example of a process with witnesses would be double-entry bookkeeping. If (1) all known transactions are reflected, (2) the various balance equations on the accounts are maintained, and (3) the beginning and ending balances can be externally verified, this establishes that the accounting system documents the use of funds over the reporting period.
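A minimal sketch of the balance-equation idea in double-entry bookkeeping, using invented account names and a deliberately simplified transaction format:

```python
# Each transaction posts an equal debit and credit, so the ledger's
# balance equation acts as a witness to its internal consistency.
transactions = [
    {"debit": ("cash", 100), "credit": ("revenue", 100)},
    {"debit": ("supplies", 40), "credit": ("cash", 40)},
]

def account_balances(txns: list[dict]) -> dict:
    """Accumulate signed balances: debits positive, credits negative."""
    totals: dict[str, int] = {}
    for t in txns:
        d_acct, d_amt = t["debit"]
        c_acct, c_amt = t["credit"]
        totals[d_acct] = totals.get(d_acct, 0) + d_amt
        totals[c_acct] = totals.get(c_acct, 0) - c_amt
    return totals

# If every entry is balanced, the accounts must net to zero.
assert sum(account_balances(transactions).values()) == 0
```

The zero-sum check does not prove every transaction is recorded correctly, only that the recorded entries are mutually consistent, which is exactly the limited role a witness plays.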
A more general mechanism for proofs is an interactive proof system. In an interactive proof system, you have one party serving the role of a “prover”, tasked with providing information to establish the truth of a result and a “verifier”, tasked with the responsibility of making the decision about whether the result is true. This is a very general framework and the details will change depending on who we assign these roles to and what kinds of capabilities they have.
Most of the literature on these systems gives very specific names to very specifically defined frameworks. I’m going to borrow some terminology from the Merlin-Arthur framework. This framework has mainly been used in relation to some specific results concerning a form of proof called a “zero knowledge” proof, which has applications in cryptography. However, even though the literature for this particular framework is narrow, it’s a colorful example which can help in discussing the concepts with a lay audience.
In this framework, our “prover” is Merlin and our “verifier” is Arthur. Merlin is a wizard and he can be treated as having unbounded knowledge and computational resources. He has the capability to answer any question, including follow-up questions asking him to show his work or identify his sources of information. However, Merlin is not necessarily reliable. We know that he’s very smart but we don’t know if he has an error rate in his answers or if he might act with ulterior motives.
Arthur, as “verifier”, is a decision maker with bounded knowledge and computational resources. Merlin is not vested with any legal, social, or institutional authority, whereas Arthur’s authority, and his responsibility for decisions made under it, are defining attributes.
Merlin and Arthur might be viewed as being a pair of computer programs, or they might be a computer program and a user, or they might even be an entry level analyst and a supervisor. All of this relates to a fundamental issue of how someone can safely rely on work that is provided by someone else.
Balance of power between Arthur and Merlin
One question that is investigated under this framework is: what kind of problems can Arthur solve in a reliable manner using Merlin’s potentially unreliable assistance? It turns out that the class of problems that can be solved in this format is very large. It roughly translates into all problems where it is practically possible to show all of the work involved in solving the problem.
For these problems, it is always possible, in principle, for Arthur to interactively ask Merlin for supporting information and work out a process to either deterministically or statistically audit this information for errors, even if Merlin knows the procedure that Arthur will use to check the work and crafts his response to try to avoid detection. It does not matter if the problem is too hard for Arthur to solve on his own.
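A toy illustration of this kind of statistical audit, entirely my own construction: suppose Merlin “shows his work” as per-batch subtotals supporting a claimed grand total, and Arthur checks the cheap aggregate equation plus a random sample of the underlying batches:

```python
import random

def audit(data_batches, claimed_subtotals, claimed_total, samples=2):
    """Arthur's cheap audit of Merlin's claimed grand total.

    First check that the subtotals add up to the claimed total, then
    recompute a random sample of subtotals from the raw data so that
    fabricated entries risk detection on any spot check.
    """
    if sum(claimed_subtotals) != claimed_total:
        return False
    for i in random.sample(range(len(data_batches)), samples):
        if sum(data_batches[i]) != claimed_subtotals[i]:
            return False
    return True

batches = [[1, 2], [3, 4], [5, 6]]
assert audit(batches, [3, 7, 11], 21)        # honest work passes
assert not audit(batches, [3, 7, 10], 21)    # inconsistent subtotals fail
# A fabricated subtotal that preserves the total survives only if it
# escapes every spot check, which becomes unlikely as samples grow.
```

Arthur never redoes all of Merlin’s work; he verifies a cheap global constraint and samples the detail, which is the essence of a statistical audit.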
In the theoretical framework, it does not matter much how many rounds of interaction there are between Arthur and Merlin, but this result is based on the premise that Arthur can formulate questions that ask about everything that he may need to know depending on where his line of questioning takes him. That assumption could be weakened with a compensating requirement of more rounds of questioning, but still with an upper bound on the rounds required for the problem.
Broadly, this all means that a thoughtful and careful verifier can make use of an unreliable prover’s work, up to the limits of the verifier’s capability to vet that work, and can elect not to rely on it when presented with unverifiable work.
Starting points for further reading:
- https://en.wikipedia.org/wiki/Formal_language#Formal_theories,_systems_and_proofs — Establishes background for formal treatment of proofs
- https://en.wikipedia.org/wiki/Interactive_proof_system — Compares range of frameworks that model one or more verifiers making a decision based on information supplied by other agents
- https://en.wikipedia.org/wiki/Probabilistically_checkable_proof — One of the more sophisticated proof frameworks which establishes some relationships between approximability and checkability and sets the expectation that statistical methods of verification can generally be derived from deterministic verification processes
- https://plato.stanford.edu/entries/epistemic-game/ — This topic can also be viewed in terms of game theory with prover and verifier interactions as moves in a game
- https://en.wikipedia.org/wiki/Agent-based_model — For a discussion of some concrete models on related topics
Professional standards can be viewed as documenting the rules which will be used by the appointed verifiers of professional accountability. Imagine if Arthur codified his methodology for interrogating Merlin into a list of constraints to ask Merlin to comply with when answering questions. These would be devised in a manner so that compliance is easy to verify and so that they make Merlin’s conduct more transparent.
This gives a clear explanation of the professional rubrics of qualification statements, disclosures, and documentation. These are specific responses to the requirements of the standards. Even if fully detailed evidence is not included in the work product, these statements assert that the professional is capable of providing additional supporting detail. Lying about qualifications, altering disclosures to omit specific considerations, or falsifying documentation all establish willful intent to deceive.
Unlike Arthur’s task of vetting Merlin’s work, professional standards are not necessarily crafted to make violations easy to detect. Some elements are better understood as making the most egregious violations easier to punish when they are detected, by establishing intent.
Professional standards are crafted to provide a very specific form of verification. Not all professional work products are subjected to full scrutiny, but when standards are followed, all work products include items which would assist in verification. There are also multiple verifiers that are considered: clients, regulators, in-company supervisors, external auditors, and professional counseling and discipline boards.
There is also an external standard that may sometimes be applicable to professional work. This standard is admissibility of expert testimony in legal trials in the US. I will now invoke the standard disclaimer used in all internet posts about law: I am not a lawyer. Everything I discuss is purely based on a cursory review of some literature on this topic and represents only my understanding and not legal advice.
The Daubert standard comes from the case of Daubert v. Merrell Dow Pharmaceuticals in 1993. The significance of this case is that the decision gives a standard for whether or not a judge should permit expert testimony as evidence. Expert testimony in court is a particularly persuasive form of evidence. If a plaintiff’s or defendant’s case depends on expert evidence, the party that has the evidence excluded is likely to lose the case.
Prior to the Supreme Court’s decision on this case, the dominant standard for admissibility of expert testimony was the Frye standard dating back to 1923 when it was necessary to evaluate the admissibility of polygraph evidence. The Frye standard requires that expert evidence be based on scientific principles that “have gained general acceptance in the particular field in which it belongs”. My understanding of the sense of “general acceptance” is that you could prove that the Frye standard was met by showing that other experts in a field would agree that a principle is correct or widely used.
The intention of a “general acceptance” standard is to limit expert testimony to uncontroversial findings. In a sense, if the principle is generally accepted, then picking any expert at random out of the field should yield roughly the same answer about what it is, how it is used, and even the status of any opposing theories. If an expert is making unique or unprecedented claims and could not be replaced with a random expert, the evidence fails this standard.
The Daubert standard did not radically overturn this state of affairs, but it did add some additional nuance. The Daubert decision enumerated a set of “illustrative factors” to be evaluated by the judge to determine the admissibility of expert evidence. These illustrative factors are:
1. Whether the theory or technique employed by the expert is generally accepted in the scientific community;
2. Whether it has been subjected to peer review and publication;
3. Whether it can be and has been tested;
4. Whether the known or potential rate of error is acceptable; and
5. Whether the research was conducted independent of the particular litigation or dependent on an intention to provide the proposed testimony.
General acceptance is included as a component of these factors and can still be viewed as the most significant component of the standard. However, the degree to which all of the factors are met is considered and failing to meet a single factor may not disqualify evidence if the other factors are deemed to be strong enough. At the same time, a single factor being met may be overridden by considerations from other factors. From Frye to Daubert, the framing of the standard changed to emphasize the judge’s role as a verifier of specific considerations rather than limiting it to ensuring that the expert witness is representative of a population of potential expert witnesses.
The Daubert standard is not necessarily applied consistently but it can be recognized as an ideal of an empirical standard. Can you convince a judge, a person with a professional education in a different field, that your work is grounded in credible facts and should be treated as relevant to a matter that turns on those facts?
Professional standards v. Legal standards
Most risk management professionals would prefer to view outcomes as the most important external verifier of our work, but sometimes the verifier may very well be a judge.
This consideration should warrant a moment of reflection. There may be models that I personally would like to develop and apply in my work but where I would have difficulty demonstrating compliance with the Daubert standard even if I could demonstrate compliance with professional standards. Disclosure of experimental methods and applicable caveats may protect me within the profession, but it would not satisfy these illustrative factors. If I can’t point to publications, empirical evidence, and exposure of the specific model I’m using to peer review and validation, I would not be able to have my work used in a court case. What I think is the most predictive model may not be the same as the model that can be most credibly defended.
This may not affect anyone’s day-to-day work, but it should definitely be treated as a constraint on how new ideas are generated and used within a profession.