The word “forensic” comes from the Latin word “forensis” meaning “of or before the forum.” In ancient Rome, an accused criminal and the accusing victim would present their cases before a group in a public forum. In this very general sense it wasn’t unlike the modern U.S. legal system where plaintiffs and defendants present their cases in a public forum. Of course the rules and procedures of the presentation differ from those days. Also, parties are typically represented by lawyers trained in the intricacies of all of these rules and procedures.
After deliberation, one party would be declared a winner. The party with the best presentation skills, regardless of innocence or guilt, would often prevail. The modern system relies on attorneys representing the parties to make the arguments rather than the parties themselves, under the assumption that lawyers, trained in law and skilled at presenting complex information, will each present their client’s case in the best possible manner and that ultimately a just outcome will occur. I don’t want to say that the truth will prevail, not only because that’s a cliché, but because there’s often some amount of truth in the arguments of both parties. Rather, more often than not, justice will be served.
I believe that this model works very well. Not perfectly, but very well. With regard to highly technical cases, however, the odds of justice being served go down because the issues become difficult for a judge or jury to grasp. Technical experts can throw around technical jargon, sometimes without realizing it and other times to purposely cause confusion. This is the reason that I believed, when I started out examining code for intellectual property litigation, that two things were required to improve the analysis of software for the legal system.
- A standard method of quantifying software comparisons.
- A standard methodology for using this quantification to reach a conclusion that is usable in a court of law.
The Need for Software Forensics
Some years ago, when I had just begun developing the software forensics algorithms described in my book, The Software IP Detective’s Handbook, I was contacted by a party in a software copyright dispute in Europe. One software company had been accused of copying source code from another company. A group of software engineers had left one company to work for the other company; that’s the most common reason that software is stolen or accused of being stolen.
The plaintiff hired a well-known computer science professor from the Royal Institute of Technology in Stockholm, Sweden, to compare the source code. This respected professor, who had taught computer science for many years, reviewed both sets of source code and wrote up his report. His conclusion could be boiled down to this: “I have spent twenty years in the field of computer science and have reviewed many lines of source code. In my experience, I have not seen many examples of code written in this way. Thus it is my opinion that any similarities in the code are due to the fact that code was copied from one program to another.”
The defendant responded by hiring another well-known computer science professor. This person was the head of the computer science department at the very same Royal Institute of Technology, the first professor’s boss. This professor compared the source code from the two parties, and her conclusion was essentially this: “I have spent twenty years in the field of computer science and have reviewed many lines of source code. In my experience, I have seen many examples of code written in this way. Thus it is my opinion that any similarities in the code are due to the fact that these are simply common ways of writing code.”
The defendant then looked for another approach and came across my research and my CodeMatch software (which is a part of CodeSuite, a collection of patented computer code analysis tools) for comparing two sets of source code for signs of copying. They hired me. I ran a CodeMatch comparison and then followed my standard procedure. CodeMatch revealed a fairly high correlation between the two programs’ source code. However, there were no common comments or strings, there were no common instruction sequences, and when I filtered out common statements and identifier names I was left with only a single identifier name that correlated. Because the identifier name combined standard terms in the industry, and both programs were written by the same programmers, I concluded that no copying had actually occurred.
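The filtering step can be illustrated with a very rough sketch. This is not the CodeMatch algorithm, which is not described here; it is a toy example that assumes identifiers can be pulled out with a simple regular expression and that a hand-picked list (the hypothetical COMMON_NAMES below) stands in for "common" names. The two code fragments and the name txBufferFifo are invented for illustration.

```python
import re

# Simplified pattern for C-like identifiers (assumption: no attempt is
# made to skip comments or string literals).
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Hypothetical list of keywords and everyday names to filter out.
COMMON_NAMES = {"int", "for", "if", "return", "i", "j",
                "count", "index", "result", "temp"}

def identifiers(source):
    """Return the set of identifier-like tokens found in the source text."""
    return set(IDENTIFIER.findall(source))

def shared_uncommon_identifiers(source_a, source_b, common=COMMON_NAMES):
    """Identifiers present in both sources after removing common names."""
    return (identifiers(source_a) & identifiers(source_b)) - common

# Two invented fragments that do the same job with different local names.
program_a = """
int count = 0;
for (int i = 0; i < n; i++) { count += txBufferFifo[i]; }
return count;
"""
program_b = """
int result = 0;
int j;
for (j = 0; j < len; j++) { result += txBufferFifo[j]; }
return result;
"""

remaining = sorted(shared_uncommon_identifiers(program_a, program_b))
print(remaining)  # ['txBufferFifo']
```

Whatever survives the filter still has to be explained by the examiner. A single shared name built from standard industry terms, as in the case above, is weak evidence by itself.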
After writing up my expert report, what struck me was how much a truly standardized, quantified, scientific method was needed in this area of software forensics, and I made it my goal to bring as much credibility to this field as there is in the field of DNA analysis, another very complex process that is well defined and accepted in modern courts.
What is Software Forensics?
Software forensics is the examination of software to produce results for use in court. It should not be confused with digital forensics, also called computer forensics, which consists of tools and techniques for recovering and examining digitally stored computer files. Digital forensics requires significant understanding of low-level file structures but little understanding of the content of the files, whereas software forensics is concerned with the content of executable software files, whether those files contain binary object code or readable text source code.
The objective of software forensics is to find evidence for a legal proceeding by examining the literal expression and the functionality of software. Software forensics requires a knowledge of the software, often including things such as the programming language in which it’s written, its functionality, the system on which it’s intended to run, the devices that the software controls, and the processor that’s executing the code.
Whereas a digital forensics examiner may attempt to locate files or sections of files that are identical, a software forensics examiner must look at code that has similar functionality even though the exact representation might be different. In patent and trade secret cases, functionality is key, and two programs that implement a patent or trade secret may have been written entirely independently and look very different. In copyright cases, and some trade secret cases, software source code may have been copied but, because of the normal development process or through attempts to hide the copying, may end up looking very different. Digital forensic processes will not find functionally similar programs. Software forensic processes will. Digital forensic processes will not find code that has been even minimally modified. Software forensic processes will find code that has been significantly modified.
In recent years I’ve been frequently disturbed by the poor job done by some experts on the opposing side of cases I’ve worked on. Sometimes the experts don’t seem to have spent enough time on the analysis, maybe because of their client’s cost constraints. Other times, the experts don’t actually have the qualifications to perform the analysis. For example, I’ve been across from experts who use hashing to “determine” that a file wasn’t copied because the files have different hashes. If you’re familiar with hashes, you know that changing even a single space inside a source code file will result in a completely different hash. While hashing is a great way to find exact copies, it can’t be used to make any statement about copyright infringement.
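The point about hashes is easy to demonstrate. The sketch below uses SHA-256 from Python's standard hashlib module; the line of source code is invented for illustration, and the whitespace normalization at the end is only a crude stand-in for a real comparison, not a forensic method.

```python
import hashlib

def sha256_hex(text):
    """Return the SHA-256 digest of the text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def differing_hex_digits(a, b):
    """Count the positions at which two equal-length hex strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def normalize_whitespace(text):
    """Collapse every run of whitespace into a single space."""
    return " ".join(text.split())

# Invented example: the second line has one extra space after the '*'.
original = "int total = price * quantity;\n"
modified = "int total = price *  quantity;\n"

hash_original = sha256_hex(original)
hash_modified = sha256_hex(modified)

print(hash_original)
print(hash_modified)
print(differing_hex_digits(hash_original, hash_modified),
      "of", len(hash_original), "hex digits differ")

# The hashes disagree, yet the code is the same apart from spacing.
print(normalize_whitespace(original) == normalize_whitespace(modified))  # True
```

Matching hashes show that two files are identical. Different hashes show only that the files are not byte-for-byte identical, which says nothing about whether one was derived from the other.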
Most disturbing is when an expert makes a statement that’s unquestionably false and the only plausible explanation is that the expert knowingly lied to support the client. In one case an expert justified scrubbing all data from all company disks (overwriting the data so that it can’t be retrieved) the weekend after that expert’s client received a subpoena to turn over all computer hard drives. The expert claimed this was a regular procedure at the company. Another time an experienced programmer, the author of several programming textbooks, claimed that she could determine that trade secrets were implemented in certain source code files simply by looking at the file paths and file names. Yet another time a very experienced expert, after hours at deposition trying to explain a concept that was simply and obviously wrong, finally admitted that the lawyers had written his expert report for him. Although I was often successful, working with the attorneys for my client, in discrediting the results of such “disingenuous” opposing experts, there were times when the judge simply didn’t understand the issues well enough to differentiate the other expert’s opinions from mine.
Is there a way to ensure that experts actually know the areas about which they opine, and a way to encourage them to give honest testimony and strongly discourage them from giving false testimony? I believe there is, but each potential solution carries with it potential problems.
TO BE CONTINUED… In part 2 of this series Bob Zeidman will discuss ways that the legal system could require experts to honestly opine rather than merely giving whatever testimony the client needs.