Did Oracle Overlook the Smoking Gun in its Case against Google?

The decisions in the recent intellectual property lawsuit of Oracle v. Google [i] have drawn the attention of software developers and intellectual property lawyers alike. As we read about the verdict’s potential to shape future copyright case law, our team here at Zeidman Consulting also wondered whether all the facts in the copyright portion of the case had been uncovered. We decided to pursue these questions using the advanced tools for detecting copyright infringement created by our sister company, Software Analysis and Forensic Engineering (SAFE Corporation) and the thorough processes that we have developed. What started off as simple curiosity turned into an interesting research and analysis project to determine if we could uncover evidence of copyright infringement that Oracle’s experts had missed. Our two-week effort turned up some very surprising results–significant amounts of apparently copied code that was not brought up at the trial.

The Oracle v. Google Lawsuit

The lawsuit began with Oracle accusing Google’s mobile operating system, Android, of violating both patents and copyrights that Oracle holds based on its Java programming language. Specifically, Oracle initially accused Google of infringing seven of Oracle’s patents [ii], though five were later thrown out [iii], and also accused Google of copying 37 Java language application program interfaces (APIs) [iv] and other Java source code into Android source code. This article focuses on the copyright portion of the case, leaving the patent infringement claims to a future article (if we have the time).

Oracle claimed that by violating Java copyrights Android reaped the benefits of the large Java programming community as well as a faster time to market. Google’s own chief Java architect Joshua Bloch, who had worked at Sun for eight years before moving to Google, admitted that Google probably copied Java source code. When a company copies software functionality from source code and wants to avoid copyright infringement, it must implement a formal development process called a clean room that separates the developers who have access to the original source code from those who are attempting to duplicate its functionality independently. Google claimed that a clean room was implemented, and Google CEO Larry Page claimed that the company would take any copying or pasting of code “very seriously,” but Bloch said that Google had him work on Android code development despite having also worked on Java while at Sun [v].

Google countered that APIs are not creative and that any similarity in the APIs is necessary for the sake of conciseness and interoperability within the industry. Google also argued that the direct source code copying that was found was so small and easy to reproduce that it added “no or little value to Android.” The jury and judge essentially agreed with Google. The verdict absolved Google of any wrongdoing regarding the patent infringement allegations but was more nuanced when addressing the copyright violations. Judge William Alsup initially instructed the jury to consider any API copying to be a violation of copyright [vi] and the jury, under these directions, returned a guilty verdict regarding Google’s violation of Oracle’s Java copyrights. Judge Alsup then took the bite out of the verdict when he ruled that the structure, sequence, and organization of the Java APIs were not copyrightable. In the final ruling the only element Google was found guilty of copying was a nine-line method named RangeCheck(). Our task was to analyze the code and determine if this method really comprised the only implementation of non-API source code that matched between Android and Java.

Our Challenge

The Java and Android projects are very large code bases. The version of Java at issue during the trial, and the version we analyzed, was J2SE 5.0 [vii]. It contains 127 megabytes of source code in 12,262 files [viii]. The version of Android at issue in the case and in our analysis was Android 2.2, also known as Frozen Yogurt or Froyo, and it contains 127 megabytes of source code in 17,062 files [ix].

The sheer number of files is an enormous amount of information for a human to analyze, and the dataset becomes truly enormous when considering that a thorough analysis involves comparing each file from the Java code base to each file in the Android code base. This amounts to 209,214,244 comparisons, which is impossible to analyze completely by hand for anything short of an army of software developers. Thorough analysis in situations like this is exactly what the CodeSuite tool, designed by our sister company SAFE Corporation, is intended for. One of the tools within CodeSuite is CodeMatch, and it is designed to compare thousands of source code files in multiple directories and determine which files are the most highly correlated, or most similar [x]. This automates the comparison process and allows the source code examiner to very quickly narrow down the files of interest. The CodeMatch algorithm recognizes statements, comments, identifiers, and instruction sequences in all major programming languages, including Java, and compares these programming constructs in an asynchronous fashion to identify similarity between source code files. The output of CodeMatch is a database which contains the correlation score of file pairs that fall within certain user-defined parameters. Following our process for finding copyright infringement, the database is iteratively filtered using the CodeSuite tool, and false positives are ruled out, exposing any source code files with matching source code [xi].
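To give a feel for the scale of an every-file-against-every-file comparison, here is a minimal Java sketch. It is not the CodeMatch algorithm, whose internals are not described in this article. The class name PairwiseSketch and the line-overlap score are assumptions made up for illustration: the score is simply the share of distinct, non-blank, trimmed lines that two files have in common, measured against the smaller file.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; this is NOT the CodeMatch algorithm.
class PairwiseSketch {

    // Number of file pairs when every file in one code base
    // is compared with every file in the other.
    static long pairCount(long filesA, long filesB) {
        return filesA * filesB;
    }

    // Hypothetical similarity score: fraction of distinct non-blank lines
    // (after trimming) shared by both files, relative to the smaller file.
    static double lineOverlap(List<String> a, List<String> b) {
        Set<String> sa = normalize(a);
        Set<String> sb = normalize(b);
        if (sa.isEmpty() || sb.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(sa);
        common.retainAll(sb);
        return (double) common.size() / Math.min(sa.size(), sb.size());
    }

    // Trim each line and drop blank ones so indentation does not affect the score.
    private static Set<String> normalize(List<String> lines) {
        Set<String> out = new HashSet<>();
        for (String line : lines) {
            String t = line.trim();
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 12,262 Java files against 17,062 Android files.
        System.out.println(pairCount(12262, 17062)); // prints 209214244
    }
}
```

A score like this would only be a first-pass ranking. The real work lies in ruling out the false positives among the highly ranked pairs.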

History

To gain some perspective on these two code bases a brief overview of their histories may be useful. Java was first developed at Sun Microsystems in the early 1990s and released to the public in 1995. It is a hardware-independent, object-oriented programming language that runs via a virtual machine supporting its “write once, run anywhere” promise. The majority of Java is licensed under various open source licenses, such as the GPL (general public license) and the LGPL (lesser general public license) [xii]. Sun was purchased by Oracle in January 2010, which transferred all Java rights to Oracle. Downloading the Java source code was trivial; Oracle hosts it on a webpage titled Java SE 5.0 Downloads [xiii]. We chose the Linux version of J2SE 5.0 as Android is based on Linux.

Android source code was initially developed by Android Inc., a company purchased by Google in 2005. The first version of Android was released in 2007. It is a Linux-based operating system for mobile computing devices. It incorporates open source licensing under the Apache License and the GPL [xiv]. The Android interface and applications created for Android are written in a customized version of Java, which is what caused Oracle to take exception. We downloaded the Android source code from an Android forum hosted by Google [xv].

Our Findings

Because of the large number of files, we ran the MP version of CodeMatch that runs on multiple processes on a multicore computer. We chose to run 10 processes on an 8-core machine, and it took just under 34 hours to complete. We took the results and performed our standard filtering to eliminate correlation due to reasons other than copying: common algorithms, automatically generated code, common identifier names, common author, and third party code.
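One of the filtering steps named above, setting aside automatically generated code, can be pictured with a short Java sketch. This is not SAFE Corporation's filtering process, which is not detailed here. The class name, the marker string, and the idea of scanning only the first few header lines are all assumptions chosen for illustration; the marker is modeled on tool headers of the form "Generated By:JJTree: Do not edit this line."

```java
import java.util.List;
import java.util.Locale;

// Illustrative sketch only; not the filtering process used in the analysis.
class GeneratedCodeFilterSketch {

    // Hypothetical marker, modeled on headers such as
    // "/* Generated By:JJTree: Do not edit this line. ... */".
    private static final String MARKER = "generated by:";

    // Returns true if any of the first headerLines lines contains the marker.
    static boolean looksGenerated(List<String> lines, int headerLines) {
        int limit = Math.min(headerLines, lines.size());
        for (int i = 0; i < limit; i++) {
            String lower = lines.get(i).toLowerCase(Locale.ROOT);
            if (lower.contains(MARKER)) {
                return true;
            }
        }
        return false;
    }
}
```

A file pair in which both files carry such a marker may correlate simply because the same generator produced both, so an examiner would want to review those pairs separately before treating the match as evidence of copying.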

Our results revealed at least nine file pairs showing significant literal copying. Eight of these nine file pairs have the same file name in each code base, while the ninth pair has nearly the same name. These files and their paths are listed below.

Java files:

jdk1.5.0\src\com\sun\jmx\snmp\IPAcl\ParseException.java
jdk1.5.0\src\com\sun\jmx\snmp\IPAcl\TokenMgrError.java
jdk1.5.0\src\com\sun\jmx\snmp\IPAcl\SimpleNode.java
jdk1.5.0\src\com\sun\jmx\snmp\IPAcl\JJTParserState.java
jdk1.5.0\src\java\util\concurrent\ArrayBlockingQueue.java
jdk1.5.0\src\java\util\concurrent\AbstractExecutorService.java
jdk1.5.0\src\java\util\concurrent\ConcurrentHashMap.java
jdk1.5.0\src\java\util\concurrent\LinkedBlockingQueue.java
jdk1.5.0\src\java\util\concurrent\PriorityBlockingQueue.java

Android files:

android-2.2-froyo\org\apache\james\mime4j\field\address\parser\ParseException.java
android-2.2-froyo\org\apache\james\mime4j\field\address\parser\TokenMgrError.java
android-2.2-froyo\org\apache\james\mime4j\field\address\parser\SimpleNode.java
android-2.2-froyo\org\apache\james\mime4j\field\address\parser\JJTAddressListParserState.java
android-2.2-froyo\java\util\concurrent\ArrayBlockingQueue.java
android-2.2-froyo\java\util\concurrent\AbstractExecutorService.java
android-2.2-froyo\java\util\concurrent\ConcurrentHashMap.java
android-2.2-froyo\java\util\concurrent\LinkedBlockingQueue.java
android-2.2-froyo\java\util\concurrent\PriorityBlockingQueue.java

Matching functional code

In the Java version of the file ParseException.java, the method public String getMessage() is almost identical to the method of the same name in the Android version of the file ParseException.java. The following snippet from the method shows an example of this identical code.

String retval = "Encountered \"";
Token tok = currentToken.next;
for (int i = 0; i < maxSize; i++) {
    if (i != 0) retval += " ";
    if (tok.kind == 0) {
        retval += tokenImage[0];
        break;
    }
    retval += add_escapes(tok.image);
    tok = tok.next;
}

This kind of identical functional code between Java and Android is also seen in the file TokenMgrError.java. The following is a code snippet from an identical switch() statement within the method protected static final String addEscapes(String str).

default:
    if ((ch = str.charAt(i)) < 0x20 || ch > 0x7e) {
        String s = "0000" + Integer.toString(ch, 16);
        retval.append("\\u" + s.substring(s.length() - 4, s.length()));
    } else {
        retval.append(ch);
    }
    continue;

Matching Comments

The similarities between the files grew even more stark as we continued analyzing them. Another similarity between the code bases was the presence of identical comments, including identical misspellings within those comments, which in our experience is a sure sign of copying. For example, a snippet of a comment in ParseException.java shows an identical comment with the word ‘following’ misspelled in both files:

* this object has been created due to a parse error, the token
* followng this token will (therefore) be the first error token.

A comment snippet taken from TokenMgrError.java shows an identical comment and the words ‘lexical’ and ‘occurred’ misspelled in both Java and Android versions.

*    EOFSeen     : indicates if EOF caused the lexicl error
*    curLexState : lexical state in which this error occured
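The idea of hunting for shared misspellings in comments can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the method CodeMatch uses: the class name is made up, comment lines are recognized by a crude prefix test, and the dictionary of correctly spelled words is a hypothetical input that the caller must supply.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only; not the CodeMatch comment-matching algorithm.
class CommentMatchSketch {

    // Collects the lowercase words found on lines that look like comments
    // (lines starting with "//", "/*", or "*" after trimming).
    static Set<String> commentWords(List<String> lines) {
        Set<String> words = new TreeSet<>();
        for (String line : lines) {
            String t = line.trim();
            if (t.startsWith("//") || t.startsWith("/*") || t.startsWith("*")) {
                for (String w : t.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
                    if (!w.isEmpty()) {
                        words.add(w);
                    }
                }
            }
        }
        return words;
    }

    // Words that appear in the comments of both files but not in the
    // supplied dictionary; these are candidate shared misspellings.
    static Set<String> sharedUnknownWords(List<String> a, List<String> b,
                                          Set<String> dictionary) {
        Set<String> shared = commentWords(a);
        shared.retainAll(commentWords(b));
        shared.removeAll(dictionary);
        return shared;
    }
}
```

Given the two comment snippets above and an ordinary English word list, a check like this would surface words such as "followng" and "lexicl" as candidates for an examiner to review by hand.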

Matching Whitespace

Another sign pointing to copying is that both ParseException.java and TokenMgrError.java have identical whitespace sequences including spaces, tabs, and new lines.
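A whitespace comparison of this kind can be illustrated with a small Java sketch. It makes several assumptions of its own and is not drawn from CodeMatch: the class name is hypothetical, only spaces, tabs, and newlines are tracked, and carriage returns are ignored so that Windows and Unix line endings compare as equal.

```java
// Illustrative sketch only; not the CodeMatch whitespace analysis.
class WhitespaceSketch {

    // Reduces a source file to the sequence of its whitespace characters:
    // 's' for space, 't' for tab, 'n' for newline. Everything else,
    // including '\r', is dropped.
    static String signature(String source) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == ' ') {
                sb.append('s');
            } else if (c == '\t') {
                sb.append('t');
            } else if (c == '\n') {
                sb.append('n');
            }
        }
        return sb.toString();
    }

    // True when both files have exactly the same whitespace sequence.
    static boolean sameWhitespace(String a, String b) {
        return signature(a).equals(signature(b));
    }
}
```

Two programmers writing the same logic independently will rarely choose the same mix of tabs, spaces, and blank lines, which is why an identical signature is treated as a clue. Files run through the same formatter or produced by the same generator would also match, so the clue needs to be weighed with the others.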

Other Clues

In addition to matching code, whitespace, comments, and misspellings, several of the Android files contain the following comment inserted among code that is otherwise identical to the Java file of the same name. An example follows as found in the Android version of the file ArrayBlockingQueue.java:

// BEGIN android-note
// removed link to collections framework docs
// END android-note

Another piece of evidence that points to copying between Java and Android is that several of these nine file pairs seem to have been written by the same person because an identical comment stating the author is found in several of the file pairs. The following comment appears in both Java and Android versions of the file ConcurrentHashMap.java:

* @since 1.5
* @author Doug Lea
* @param <K> the type of keys maintained by this map
* @param <V> the type of mapped values

Another comment appears in several of the Java files but not in the Android versions of the same file. The following comment is shown as it appears in the Java version of the file LinkedBlockingQueue.java:

* <p>This class is a member of the
* <a href="{@docRoot}/../guide/collections/index.html">
* Java Collections Framework</a>.

Changed Copyright Notices

Perhaps most interesting of all is that the two sets of files have comments showing different copyright notices and claim that the code is subject to different licenses. All nine of the files found in the Java source code have the same copyright notice, a short comment at the beginning of each file that reads:

* Copyright 2004 Sun Microsystems, Inc. All rights reserved.
* SUN PROPRIETARY/CONFIDENTIAL. Use is subject to license terms.

The Android files have a few different comment headers, only some of which are copyright notices. They include:

*  Copyright 2004 the mime4j project
*
*  Licensed under the Apache License, Version 2.0 (the “License”);

and

/* Generated By:JJTree: Do not edit this line. SimpleNode.java */

and

/* Generated By:JJTree: Do not edit this line. /Users/jason/Projects/apache-mime4j-0.3/target/generated-sources/jjtree/org/apache/james/mime4j/field/address/parser/JJTAddressListParserState.java */

and

* Written by Doug Lea with assistance from members of JCP JSR-166
* Expert Group and released to the public domain, as explained at
* http://creativecommons.org/licenses/publicdomain

The Java versions of the files ParseException.java and TokenMgrError.java contain the Sun copyright as shown above, but the Android versions carry the following copyright and license notice:

*  Copyright 2004 the mime4j project
*
*  Licensed under the Apache License, Version 2.0 (the “License”);

The Apache James Mime4j project is an open source project developed by the Apache James team that “delivers a rich set of open source modules and libraries, written in Java, related to Internet mail communication which build into an advanced enterprise mail server [xvi].” Based on this copyright notice appearing in Android’s source code and software developer comments on the internet [xvii], it appears that Google is using the Mime4j project in Android’s email client. In addition, the licenses conflict as the Java file uses the GNU General Public License and the Android files use the Apache License. Both copyrights are dated 2004, but the earliest release of software for the open source Mime4j project is actually May 3, 2005 [xviii]. While we can’t be sure that there was no development before this date, and we can’t determine the date that Sun began development, obviously only one of these copyright notices can be correct and the code can be covered by only one license.

Moving on to the code with Doug Lea listed as the author, the files AbstractExecutorService.java, ArrayBlockingQueue.java, ConcurrentHashMap.java, LinkedBlockingQueue.java, and PriorityBlockingQueue.java, we found that Doug Lea used to serve on the Executive Committee of the JCP (Java Community Process). He also chaired JSR 166, which was a Java Specification Request group under the JCP, and this position is referenced along with his name in the above authorship comment. In addition, Doug Lea announced his intention not to run for that position again on October 22, 2010 [xix], only a couple of months after Oracle served Google with its infringement lawsuit [xx].

Finally, in reference to the Android files without any copyright comments, the files SimpleNode.java and JJTParserState.java, if a file has a copyright notice it is not acceptable to merely remove that notice and use the material. This may state the obvious, but a copyright remains in effect whether the notice is shown on the material or not.

These files that are all almost identical cannot be legally distributed with different copyright notices or licenses unless the copyright was transferred from one party to another, and we find no evidence that this was the case. One release must be a violation of the other’s valid copyright—one must have been copied and the other the original. Similarly, we are unaware of a way that software can be released under one license and later released under a different license.

Conclusion

In the copyright portion of the case Oracle claimed that Google had copied 37 of Oracle’s Java APIs, eight decompiled files, and nine lines of source code [xxi]. In his ruling the judge dismissed the APIs and decompiled files as evidence of copying when he wrote, “So long as the specific code used to implement a method is different, anyone is free under the Copyright Act to write his or her own code to carry out exactly the same function or specification of any methods used in the Java API” [xxii]. The judge found the nine lines of identical code in the RangeCheck() function to be “so innocuous and overblown by Oracle that the actual facts, as found herein by the judge, will be set forth below for the benefit of the court of appeals” [xxii]. In the judge’s opinion Oracle did not possess compelling enough evidence that Google had copied Java when creating Android, and the little copying that was discovered was found to be insignificant. In our opinion that evidence would have become much more compelling had Oracle discovered the source code similarities we have uncovered.

We extensively compared 12,262 Java files to 17,062 Android files using CodeMatch and found 9 files in each of the code bases that matched 9 files in the other code base almost entirely. In fact, 3,110 lines in these files were identical. These files had different copyright notices and claimed to be available through different licenses. We can determine that only one entity owned the copyright and once released through one license they could not be subsequently released through a different license. Therefore one set of files has the correct copyright license and the other does not. One set of files are the original ones and the other set of files are copies.

Did Oracle lose its case because it missed these examples of copying? Quite possibly. Not knowing all of the details of the case, there could be issues that we’re not aware of. However, from the facts we have found, it appears that Oracle did in fact blow it and missed 3,110 smoking guns.

To download the correlated Java files and Android files that were found and to see the reports that were generated by CodeMatch, go to http://www.safe-corp.biz/Oracle-Google.

Acknowledgements

We would like to acknowledge Steve Wu of Cooke Kobrick & Wu LLP for his help locating trial documents for this case.


Endnotes

[i] Oracle v. Google

[ii] http://news.cnet.com/8301-30684_3-20013546-265.html

[iii] http://news.cnet.com/8301-1001_3-57417144-92/android-java-and-the-tech-behind-oracle-v-google-faq

[iv] http://news.cnet.com/8301-1035_3-57418976-94/oracle-and-google-continue-sparring-over-apis

[v] http://www.theverge.com/2012/4/19/2961128/google-chief-java-architect-likely-i-copied-sun-code-in-android

[vi] http://www.economist.com/blogs/babbage/2012/06/oracle-v-google

[vii] http://opensource.com/law/12/6/oracle-v-google-and-api-copyrightability

[viii] http://www.oracle.com/technetwork/java/javasebusiness/downloads/java-archive-downloads-javase5-419410.html#jdk-1.5.0-oth-JPR

[ix] http://code.google.com/p/android/issues/detail?id=979

[x] http://www.safe-corp.biz/CodeMatch_algorithms.htm

[xi] http://www.safe-corp.biz/company_process.htm

[xii] http://en.wikipedia.org/wiki/Java_(programming_language)

[xiii] http://www.oracle.com/technetwork/java/javasebusiness/downloads/java-archive-downloads-javase5-419410.html#jdk-1.5.0-oth-JPR

[xiv] http://en.wikipedia.org/wiki/Android_(operating_system)

[xv] http://code.google.com/p/android/issues/detail?id=979

[xvi] http://james.apache.org/

[xvii] http://therning.org/niklas/2009/03/mime4j-used-by-androids-e-mail-client/

[xviii] https://github.com/apache/james-mime4j/commits/trunk?page=21

[xix] http://en.wikipedia.org/wiki/Doug_Lea

[xx] http://news.cnet.com/8301-30684_3-20013546-265.html

[xxi] http://www.wired.com/wiredenterprise/2012/05/oracle-v-google-no-steak-only-parsley/

[xxii] http://www.courthousenews.com/2012/06/01/Gcopyright.pdf

The Author

Evan Kovanis & Bob Zeidman

Warning & Disclaimer: The pages, articles and comments on IPWatchdog.com do not constitute legal advice, nor do they create any attorney-client relationship. The articles published express the personal opinion and views of the author and should not be attributed to the author’s employer, clients or the sponsors of IPWatchdog.com.

Discuss this

There are currently 41 comments.

  1. Parched June 26, 2012 11:40 am

    What if Oracle was the party who changed the copyright notices? Did they intentionally not present this evidence because of their unclean hands?

  2. Tennille Christensen June 26, 2012 12:23 pm

    You wrote:

    “While we can’t be sure that there was no development before this date, and we can’t determine the date that Sun began development, obviously only one of these copyright notices can be correct and the code can be covered by only one license. ”

    “These files that are all almost identical cannot be legally distributed with different copyright notices or licenses unless the copyright was transferred from one party to another, and we find no evidence that this was the case.”

    I disagree – source code can be covered by as many different licenses as the copyright holder chooses to release it under.

  3. michael June 26, 2012 1:24 pm

    9 files out of 12,262 = smoking gun?

    sorry, but this rather proves the contrary: Google did not copy code to a relevant extent

  4. orcmid June 26, 2012 1:59 pm

    “[W]e are unaware of a way that software can be released under one license and later released under a different license.”

    This can happen if a license provides for it or if the copyright holder makes additional licenses on the same work. For example, the Sun/Oracle OpenOffice.org code base was available under the LGPL license. Since Sun/Oracle held the copyright, they were also able to make private license arrangements (e.g., for IBM Lotus Symphony using the code under a non-copyleft license), and to make additional public license grants (i.e., the grant of OpenOffice.org code to the Apache Software Foundation (ASF) which is releasing its version under the Apache License version 2.0).

    With regard to Apache James and its mime4j component, it is possible that a software grant was made that provides for release under a different license than the one used in the Java code base. I have no personal knowledge of the particular state of affairs beyond confidence that the ASF is determined with regard to assuring the provenance of code that is contributed to its projects. E.g., if the code was contributed by the same author to both projects, that is possible so long as Sun did not require an exclusive transfer of copyright for its use in Java. I suspect that the Apache James project will find this analysis interesting. Note that Apache James mime4j is offered under the Apache License 2.0 by that project.

  5. Gene Quinn June 26, 2012 2:07 pm

    michael-

    Perhaps you didn’t really comprehend what you read. I would love for you to explain why the misspellings are not a smoking gun. Is it your contention that those comments are exact down to 3 misspellings as the result of coincidence?

    Of course there was copying here. My guess is that you are just either that ignorant or you didn’t actually read the article, choosing to voice your ill conceived comments without proper consideration of the facts.

    -Gene

  6. Carlos Woelz June 26, 2012 2:39 pm

    Gene,

    Please be more respectful of people trying to contribute to the debate.

    There is a lot of code in Java that was contributed to Sun by others. The contribution is NOT exclusive, they can give it to others, including Apache. One example is Mr. Bloch’s contribution to Sun while he was already at Google. It was accepted by Sun and granted graciously by Mr. Bloch, but he kept the rights to his own code.

    So to prove that Google copied from Oracle it is not enough that the code is the same. Not all of Java is exclusively owned by Oracle. Google may have gotten the rights, as it did in Mr. Bloch’s case, from the original authors.

    Carlos

  7. Gene Quinn June 26, 2012 4:30 pm

    Carlos-

    I am happy to be respectful of thoughtful comments, but the comment of michael was hardly that. What is said is simply flat wrong. He turned a blind eye to the clear copying and wants to pretend that this means that Google didn’t copy or that their copying was allowable under some unidentified theory. Of course there is copying here, and if he wants to think that 9 out of 12,262 proves nothing then that is his right to be incorrect. If this type of copying of comments and misspellings is found that is clear evidence of copying, it is that simple. This together with other facts can and does easily sway litigations.

    You are offering one reason to suggest that even if there was copying that it might not have been infringement. That is fine. We can all hypothesize and guess, but to say that there was no copying or not legally recognizable copying is simply not true.

    -Gene

  8. Bob Zeidman June 26, 2012 4:43 pm

    To Michael’s statement, copyright law does not consider percentages, so the number of lines is irrelevant. What is relevant is the significance of the lines. How creative was the effort? How difficult would it have been to write those lines independently? How valuable is the function those lines perform? You can’t copy a chapter of a book and claim it’s not copyright infringement due to percentages.

    To the others, you can hypothesize that the code was released under multiple licenses, but show me some evidence. We could find none. In any case, both licenses are so broad that there is no carve-out that would allow a different license. You can’t give one group of people a license to your code and then give that same group a different license with different requirements. At best that’s a legal mess. In any case, both licenses, to my knowledge, require that the license be stated in the code comments, and thus the later code would have to have two licenses stated in the comments, which it does not.

    -Bob

  9. Bob Zeidman June 26, 2012 4:49 pm

    One other comment about copied code. When a murder investigator examines a crime scene and finds a fingerprint, he doesn’t conclude that the finger committed the crime. The fingerprint is a significant clue, it’s not the whole story. When we find lines of code that have been copied, we conclude that more code was likely to have been copied but is difficult to find because of changes due to 1) normal development and/or 2) attempts to hide it.

    -Bob

  10. Simon Linder June 26, 2012 5:48 pm

    The analysis of potential copying is interesting, but this smoking gun argument was mostly irrelevant to Oracle’s case. Oracle convinced the jury that Google directly copied code in two of the three alleged instances mentioned in the special jury verdict form, and Judge Alsup overturned the jury verdict in the last instance as a matter of law because Google did not counter Oracle’s expert testimony on it. However, most of the billion dollars of damages was related to the structure, sequence, and organization (SSO) of the API packages, not the code itself. Oracle counsel did its job because the jury also found that Google copied the SSO of some of the Java API packages. The jury could not decide if copying the SSO would be a fair use, so even if the judge ruled the SSO copyrightable, there would be a new trial.
    Of course there was copying here. Dr. Bloch admitted he put in the rangecheck() code into Android and took it out before Android’s commercial release because he knew the code was from Java. On the other hand, Oracle stipulated that they would waive the damages from direct copying if the judge decided that the SSO of the API packages were not copyrightable. Oracle tried its best to convince Alsup that they were entitled to anything close to the billion dollars of damages based on the direct copying. Judge Alsup knows how to program, and he voiced his annoyance when a superstar lawyer like David Boies tried to present some directly copied code as more than a coincidence and a product of an insignificant amount of time and effort.
    That is not an exaggeration. Alsup told Boies that the directly copied code was “so simple” and that the copying “was an accident.” Boies argued that Oracle should receive part of the profits from Android device sales because the direct copying helped get Android to market more rapidly. Judge Alsup responded, “You’re one of the best lawyers in America. I don’t know how you could make that argument.” Alsup called out Boies again when Alsup found out Oracle wanted to ask witnesses about the relationship between the infringing code and Google’s profits; apparently Oracle didn’t cover that in the discovery phase. Source: http://bit.ly/Qe2NzS
    Sometime later Oracle and Google agreed to forgo the damages phase of the trial if Alsup ruled the SSO not copyrightable. In the end Oracle got no damages from patent or copyright infringement and is probably hoping that an appeals court will overturn the decision on the copyrightability of the SSO of the API packages.

  11. Curious Coder June 26, 2012 5:56 pm

    The article’s authors state that they “performed our standard filtering to eliminate correlation due to reasons other than copying: common algorithms, automatically generated code.” They then go on to flag four files under jdk1.5.0\src\com\sun\jmx\snmp\IPAcl\*, two of which clearly state at the start of the file that they are “Generated By:JavaCC: Do not edit this line”, and the other two which clearly state they are “Generated By:JJTree: Do not edit this line.”

    Further, the misspelled word “followng” in ParserException.java that appears in the Sun (Oracle) and cited Android file (actually Mime4J file, a third-party project not of Google’s authorship), is actually present in the original JavaCC source code from which these files are clearly derived (line 61 in JavaCC file ParserException.java, as available from http://java.net/projects/javacc/downloads/download/javacc-5.0src.zip; in fact, note that the entire comment block in which this misspelling is found in the Sun (Oracle) and mime4j files is identical to lines 59-61 in JavaCC file ParserException.java).

    Can the authors explain these analytical anomalies?

  12. Andrew June 26, 2012 6:23 pm

    I think both of these are examples of dual licensing.

    JSR 166 – which I was pretty interested in at the time is an example of Sun working with the community to produce the best result for software – it was all done in the open with multiple contributions – the best way to write software (http://www.jcp.org/en/resources/guide/166-casestudy).

    The source code is still publicly available (http://gee.cs.oswego.edu/dl/concurrency-interest/index.html). For example, here is ConcurrentHashMap (http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/main/java/util/concurrent/ConcurrentHashMap.java?view=markup)

    It is terrible that one of the better contributions to Java, using one of the better processes, using the best resources in concurrent software might be now used to extract money from Google by Oracle.

    It does seem like an example of one culture (Sun’s, being open and community driven) against another (Oracle’s and its lawyers’, extracting the last dollar of value, to hell with how it affects people).

  13. orcmid June 26, 2012 6:49 pm

    @Bob,

    Only a court will determine whether the mime4j code was copied or had a common origin, and whether a copyright owner offered different, non-exclusive licenses. While it is puzzling when the same code is found under different licenses, it is neither unusual nor illegal, assuming that it was done by someone having the right to do so.

    I repeat, there is code that is in Oracle’s LGPL code base for OpenOffice.org that occurs, without change except for license statements, in the imported code base for Apache OpenOffice. The Apache James mime4j code appears to have been imported under an appropriate software grant agreement or by a committer who authored the code and had the right to contribute it. Contributions to Apache are non-exclusive and do not transfer copyright, so the copyright holder could still go off and license it or derivatives yet another way.

    Different non-exclusive licensings of the same work are possible and not uncommon. There is no basis to assume that either occurrence is the result of an infringement; it takes more evidence than what is claimed in this article as far as the mime4j code goes. I don’t think it is useful to second-guess Oracle in choosing what weapons to put forward in its battle with Google.

  14. Bob Zeidman June 26, 2012 7:13 pm

    orcmid,

    You actually have it backwards. This is what my company does–determine whether copying occurred. It is not up to a lawyer to determine whether code was copied (they don’t have the expertise to make that determination). That is up to an expert like myself (though experts may disagree). It is then up to the lawyers to argue whether that copying constituted copyright infringement and it is up to a judge or jury to ultimately determine whether copyright infringement took place.

    Licenses do not transfer copyrights, they grant rights from the copyright holder. Therefore, there should not be two different copyright notices on the code, even if there was some strange reason to release the code under two different licenses. Copyrights can be transferred, but not just by putting your copyright notice on it. There has to be an agreement. So far you and the others who defend Google have produced nothing to support your conjectures.

    -Bob

  15. David Wilkins June 26, 2012 7:22 pm

    Reading the article, I decided to see what happens on putting a line of the claimed copied code into Google. This led me straight here:

    http://www.w3.org/WAI/Resources/Tablin/source/Tablin/src/org/w3c/wai/tablin/parser/html4/

    This location contains files called ParseException.java and TokenMgrError.java

    These two files on the website of the World Wide Web Consortium contain the matching code from the Java and Android codebase identified in the article, together with the mis-spellings in the comments.

    Further poking around on the http://www.w3.org website reveals versions going back to 1999, at least.

    The code was apparently developed as part of a W3C open development project called the Web Accessibility Initiative. See

    http://www.w3.org/WAI/

    The files on the W3C website have no copyright notices embedded in them. There is absolutely nothing to suggest that Sun had anything to do with the development of these files.

  16. orcmid June 26, 2012 8:07 pm

    Bob,

    I agree that you find where code is the same. Establishing which is the copy, or whether both have a common origin, requires facts not in evidence in the inspection.

    I agree what licenses do. However, Sun commonly required non-exclusive copyright transfers from non-employee contributors. I have no idea how that figures in the case of the mime4j code, and one would need to know the ultimate provenance to know. In addition, ASF does not require copyright, but only a license, and copyright notices are preserved on contributed code unless the contributor removes their own notices or permits their removal. Of course, where more than one copyright holder exists, it is more complicated to pursue a claim of infringement.

    Furthermore, if it happens that the provenance of the Apache James mime4j code is unclean, that does not make Google an infringer in relying on the Apache license. Something would need to be done, but it is not of any use to Oracle in claiming a Google infringement.

  17. David Wilkins June 26, 2012 8:11 pm

    A small amount of Googling landed me here:

    http://gee.cs.oswego.edu/dl/concurrency-interest/index.html

    Concurrency JSR-166 Interest Site
    Maintained by Doug Lea

    To join a mailing list discussing this JSR, go to: http://altair.cs.oswego.edu/mailman/listinfo/concurrency-interest. (Archived postings may also be found at MarkMail’s searchable archives.)

    While JSR166 has completed and is a now final approved JCP spec, the expert group remains involved in incremental improvements and changes to the java.util.concurrent package and related classes and packages.

    […]

    Related information

    For the official JSR166 proposal see the JCP web site.

    The initial contents of JSR166 were released as part of J2SE 5.0 (“Tiger”) , mostly in new package java.util.concurrent.

    Sources for all classes originated by the JSR166 group are released to the public domain, as described at http://creativecommons.org/licenses/publicdomain. This includes all code in java.util.concurrent and its subpackages (except CopyOnWriteArrayList), as well as java.util classes Deque, NavigableMap, NavigableSet, Queue, AbstractQueue, PriorityQueue, and ArrayDeque. Additionally, the JSR166 effort included modifications of openjdk versions of a few other java.util classes including TreeMap, AbstractMap, LinkedList, and Collections, which carry GPL+Classpath exception licenses.

    Many of our internal functionality and performance tests are also in the CVS repository, at http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/test/loops/. Other (overlapping) sets of tests designed for use in openjdk testing may be found in http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/test/tck/ and http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/test/jtreg/. Some are poorly documented, and some target only specific functionality or performance issues so may not be useful indicators. (A unix “runscript” is available to run most in “loops”). Still they may be useful for tracking changes, comparing VMs and VM options, or exploring alternative implementations.

    The book Java Concurrency in Practice by Brian Goetz, with Tim Peierls, Joshua Bloch, Joseph Bowbeer, David Holmes, Doug Lea. Addison-Wesley, 2006 describes use of java.util.concurrent.

    Slides from the JavaOne Collections Connection BoF (PDF) (PPT) describe Tiger and Mustang collections features, including concurrent collections.

    There is a case study report about the JCP process used in JSR166.

    There is a technical paper on the JSR166 synchronizer framework appearing in the 2004 PODC CSJP Workshop.

    Expert group member Brian Goetz has written several on-line articles about java.util.concurrent features, as listed at his publications page.

    There are some out-of-date but mostly still accurate slides on highlights of JSR-166.

    The initial contributing members of the JSR166 expert group are Josh Bloch, Joe Bowbeer, Brian Goetz, David Holmes, Doug Lea (spec lead), and Tim Peierls. Martin Buchholz and Bill Scherer joined as members in maintenance and extension efforts. Others contributing notable guidance and expertise include Dave Dice (Sun), Cliff Click (Azul), Steve Dever (Sun), and Bill Pugh (UMd). Thanks also to the many other people contributing ideas, reviewing APIs and code, and testing out pre-releases.

    Doug Lea

  18. Stephen Wu June 27, 2012 1:21 am

    orcmid wrote, “Furthermore, if it happens that the provenance of the Apache James mime4j code is unclean, that does not make Google an infringer in relying on the Apache license. Something would need to be done, but it is not of any use to Oracle in claiming a Google infringement.”

    Not so.

    Innocent infringement could have taken place if Google copied from Apache in good faith where Apache made unauthorized copies from Sun. Nonetheless, innocent infringement is not a defense to a copyright infringement claim. It merely acts as a reason to reduce statutory damages (where those are recovered). 17 USC 504(c)(2). Innocence does not absolve the innocent infringer of all liability, and under the above scenario Google would be an infringer.

    I don’t know if the Apache mime4j is unclean, but if it were, Google’s copying from Apache, even under an Apache license, still permits Oracle to make use of Google’s conduct to assert a claim.

  19. Robert White June 27, 2012 3:48 am

    Way to find Sun putting their copyright on someone else’s work that was released into the public domain there guys…

    I’ll let someone else pull the actual trigger on embarrassing you because I like to watch. 😎

    Smoking gun indeed…

  20. Mark June 27, 2012 4:46 am

    You may be using a footgun here.

    There are some indications that the files concerned were in the public domain in 2003 and Sun changed the copyright notices to their own in 2004, so it would be Sun/Oracle doing the misappropriation if anyone!

    There are many perfectly lawful reasons why these files may be almost the same, yet you seem to have gone all tin foil hat instead of doing the research into the origin of the files. Doug Lea appears to have been a member of some sort of Java working group and may not have been a Sun employee, so copyright/licensing of his code would have been his to decide.

    All in all it seems Oracle are/were well advised to steer clear of racking up a probably futile legal bill over these files.

  21. Gene Quinn June 27, 2012 12:57 pm

    Just to echo Stephen… there is no such thing as an innocent infringer defense to copyright infringement. Copyright infringement is strict liability. If you copy you are liable, period.

    If you are found to infringe then there may be a fair use defense that could be asserted, but that is only relevant when copyright infringement has already been established.

    -Gene

  22. Wang-Lo June 27, 2012 2:29 pm

    Bob:

    You have examined two code bases and found 3110 matching lines.

    If you are going to comment on the possible implications of this salient fact, it is your responsibility as an expert, an educated person, and a scientist to behave honestly. You must at least explain that one base could have been copied from the other, or the other from the one, or both from another, unexamined, code base.

    Instead you chose to imply only that one code base must have been copied from the other, and further to imply that Oracle, and its hired experts, are not as smart as you are, because they failed to jump to the conclusion that the direction of the copying was from Java to Android.

    Although SAFE and its CodeSuite product remain on my qualified vendor list, you can be sure that I will recommend against using Zeidman Consulting in any capacity. My clients need to hear both accurate facts and unbiased interpretations of those facts. One out of two is not sufficient.

    “A” consultancies attract “A” clients. “B” consultancies attract sensational publicity, contentious comments, and “C” clients.

    -Wang-Lo.
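The point in comment 22, that matching lines alone cannot show which code base was copied from which, can be illustrated with a minimal sketch. This is not how CodeSuite or CodeMatch works; the function name `matching_lines` and the two snippets are hypothetical and exist only for illustration.

```python
def matching_lines(a, b):
    """Return the non-blank, whitespace-stripped lines present in both texts."""
    def normalize(text):
        return {line.strip() for line in text.splitlines() if line.strip()}
    return normalize(a) & normalize(b)

# Hypothetical snippets standing in for two code bases.
base_one = "int count = 0;\n// the followng tokens\nreturn count;\n"
base_two = "// the followng tokens\nreturn count;\nint total = 1;\n"

shared_ab = matching_lines(base_one, base_two)
shared_ba = matching_lines(base_two, base_one)

print(sorted(shared_ab))
print(shared_ab == shared_ba)  # the comparison is symmetric
```

Because the result is identical whichever base is treated as the "original", the match is equally consistent with copying in either direction or with both bases deriving from a third source. Provenance has to come from other evidence.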

  23. Bob Zeidman June 27, 2012 3:44 pm

    Wang-Lo,

    If you read the article you can see how we came to our conclusions about which code base was copied from the other. It is all based on the evidence. You have supplied only conjecture about what might have happened, but no one here has actually produced facts showing an alternative. There are certainly other explanations–two independent coders may have independently written the exact same 3,110 lines of code in the exact order. Or both sets of code may have been randomly written by a supreme being. However, a court requires reasonable conclusions. While I have left the door open to other explanations, the only reasonable alternative explanation is that Sun put its copyright on code that it didn’t have the right to claim. However, given that there is no proof of this, my conclusion stands.

    As an educated, honest scientist, do you believe that the speed of light is a constant? And yet, you have probably not measured it yourself. Nor have you measured every single photon in the universe to be sure that there is not one that travels faster. However, as a rational scientist, you gather the facts and come to the most reasonable conclusion possible until another fact is discovered that contradicts your belief.

    -Bob

  24. David Wilkins June 27, 2012 4:40 pm

    Bob Zeidman wrote: “but no one here has actually produced facts showing an alternative”.

    What do you mean by this?

    Files ParseException.java and TokenMgrError.java were proved to have been developed as part of an open source project by the World Wide Web Consortium in the late 90’s. The W3C releases most of its stuff under a liberal open-source licence (the W3C Document licence), and any code is copyright of the contributors. The files certainly predate any 2004 Java copyright, and there appears to be absolutely no evidence on the W3C website that the files in question were contributed by Sun, or that Sun made any contribution whatsoever to the relevant W3C project.

    In the face of the evidence that this is old code developed as part of an open source project within the W3C, why would it be considered plausible that Sun, and afterwards Oracle, would own the copyright to this code?

    And the other fact is that Doug Lea on his university website specifically stated that:

    “Sources for all classes originated by the JSR166 group are released to the public domain,”

    Moreover Doug Lea is a professor at the State University of New York at Oswego. No doubt copyright in code developed by employees of Sun or Oracle would now belong to Oracle. But why is it considered plausible that Oracle would own the copyright on code that was not authored by Sun employees but was contributed by academics and others? Why would the developers of the java.util.concurrent codebase not have the authority to release the code that they themselves wrote into the public domain?

  25. Bob Zeidman June 27, 2012 4:50 pm

    Mark,

    I like the “footgun” reference–I hadn’t heard that term before.

    If Sun did inappropriately put its copyright on files that Doug Lea created, then Sun screwed up. However, don’t you think Google would have pointed that out in its expert report? It would have shown unclean hands and thrown doubt on Sun’s ownership of all Java files. This all means that very likely one or both parties missed these significant files.

    -Bob

  26. David Wilkins June 27, 2012 5:10 pm

    Oracle identified 51 Java APIs that had been implemented in Android. However, it transpired that 14 of those APIs had not been authored by a Sun employee, and therefore Oracle did not hold copyright in them. Therefore the issue considered at trial was the infringement of the remaining 37 APIs. These facts were known to those following the trial, and are referenced in the second paragraph of the following Groklaw article:

    http://www.groklaw.net/articlebasic.php?story=2012042619534619

  27. Rocky Hinten June 27, 2012 5:52 pm

    Bob,
    You made this comment in response to Wang-Lo: “If you read the article you can see how we came to our conclusions about which code base was copied from the other. it is all based on the evidence.”
    Actually you do make an unfounded conclusion, which Wang-Lo was trying to point out, that is not supported by evidence. You have a series of four sentences where you are trying to progress in a logical chain, each building on the one before, but there is a point where you make an assumption that is not fact-based.

    1. “These files had different copyright notices and claimed to be available through different licenses.”
    –Right. They each had different notices.
    2. “We can determine that only one entity owned the copyright and once released through one license they could not be subsequently released through a different license.”
    –This is not certain. As several people have tried to point out to you, it is not too unusual for the copyright holder to release material under multiple licenses. This isn’t the real logic problem yet, though.
    3. “Therefore one set of files has the correct copyright license and the other does not.”
    –Or they could both have the wrong copyright notice. You can’t assume one is correct at this point.
    4. “One set of files are the original ones and the other set of files are copies.”
    –Here’s the biggest leap. Where is your factual basis for declaring that one of these is “the original”? They could have both been copies from a different original source. (For which others have actually done the research and found the original–Doug Lea, released in the public domain.)

    I did think it was funny the projecting you did toward Wang-Lo with “You have supplied only conjecture about what might have happened.” Uh, no. He just pointed out your conjecture and made none of his own.

  28. Robert White June 27, 2012 5:58 pm

    Before claiming the files were missed, I suggest you go re-read the transcripts very carefully. You did read the trial transcripts before you posted this “missing” and “overlooked” information, didn’t you?

    I suspect not…

    But again, being mean and liking to watch, I will just strongly hint that actual lawyers and experts -just- -might- have known about these files and even mentioned them in deposition and/or trial…

    I’ll let you find what foot-bullets there may be all by yourself. 😎

    *popcorn*

  29. Sidney Markowitz June 27, 2012 6:11 pm

    In addition to the reference in the Groklaw article David Wilkins linked to in comment 26, the files written by Doug Lea in java.util.concurrent are mentioned by Google in their re-cross of Oracle’s witness Dr. Reinhold as summarized by Groklaw’s reporter at the trial that day:

    http://www.groklaw.net/article.php?story=20120427122707710

  30. Wang-Lo June 27, 2012 6:33 pm

    If you had followed SCO vs. IBM you would have encountered the term “footgun” many times.

    You also would have observed the fate of a litigious egotist who discovers evidence of identical lines in two different code bases and, like you, immediately assumes that one particular party illegally copied those lines from the other.

    -Wang-Lo.

  31. Sidney Markowitz June 27, 2012 6:35 pm

    Bob (in reply to comment 25),

    Oracle dropped their claims for the 14 APIs, which include the five copied files by Doug Lea that you found, once they discovered that Sun did not own the copyrights. Nobody overlooked those files, other than Oracle overlooking the correct copyright when they first asserted their claims. But they caught Sun’s error and dropped them from the case before it got to the point of Google being able to take advantage of the error.

    Furthermore, if you read the detailed reports of the trial, especially the cross-examination of Oracle’s experts, it looks like they ran software similar to the forensic service that you are touting. That’s how they found the eight files from a test directory and the nine lines of code and comments repeated in two files that were admitted by Google to have been directly copied out of the thousands of files and millions of lines of code in Java and Android.

    The only difference between what they did and your heroic efforts to process all those files using all those computers appears to be that they noticed (though somewhat late for the Doug Lea files) that Sun did not own the copyright for everything that matched, and did not include in the case the files they did not own. I guess Oracle has software people who can process all that many files through all that many computers as well as your company can.

  32. David Wilkins June 27, 2012 8:03 pm

    From Week 2, day 6 of Oracle v. Google. Groklaw article here:

    http://www.groklaw.net/articlebasic.php?story=20120423121100882

    Judge: Sometime around 2008 I’m assuming, Google identified 51. (Yes) And some of the 51 are in the public domain.

    Google: Testimony of Bob Lee. Wrote java.util.concurrent.* Those packets were dedicated to the public.

    Judge: Of the 37, then, 4 are in the basic programming language?

    —-

    Next from cross-examination of Oracle’s expert witness, Mark Reinhold on Day 10 of Oracle v. Google.

    http://www.groklaw.net/articlebasic.php?story=20120427122707710

    Dr. Reinhold: Yes. [Google gets him to say that a stub implementation has about 10,000 lines, out of about 5M lines of code.]

    Dr. Reinhold: Yes, 10,000 very expensive lines of code.

    Google: In addition to the 37 accused packages there are 14 other packages that are in Android too.

    Google: java.util.concurrent written by Doug Lea?

    Dr. Reinhold: Yes.

    Google: Donated it to the public?

    Dr. Reinhold: Yes.

    Google: When you looked at the source code to see if anyone else owned it. You mentioned some. But there are some others too though?

    Dr. Reinhold: Not sure.

    Google: What about java.awt.font?

    Dr. Reinhold: Yes sir, but my understanding is that’s not in Android.

    ============================================

    Thus for java.util.concurrent.*

    What about JJTParserState.java ?

    This seems to be part of JJTree, which in turn is part of JavaCC.

    It seems that JavaCC was created at Sun in the ’90s, but the developers set up their own company. JavaCC became an open source project.

    BSD Licence.

    Wikipedia: http://en.wikipedia.org/wiki/JavaCC

    JavaCC Home Page: http://java.net/projects/javacc

  33. bjd June 27, 2012 8:34 pm

    To Mr. Zeidman:

    In post no.9 you talk about a crime scene being examined, and fingerprints.
    That is a tendentious remark which fatally colors your slant in this story.

    Because there is no crime scene here [yet] — we only have two fellows
    calling out CRIME!

    I know it’s subtle, but to me, and IANAL, it is so ‘quite different’ from a ‘crime
    scene being examined’ for fingerprints that, again, this fatally colors you.

    bjd

  34. Sidney Markowitz June 27, 2012 8:46 pm

    Bob,

    About one of the points in comment 27, quoting you as saying

    “We can determine that only one entity owned the copyright and once released through one license they could not be subsequently released through a different license.”

    Actually the Sun Contributor Agreement under which people contributed code to Java says that joint ownership of the copyright is given to Sun, i.e., two entities end up owning the copyright, each with independent full rights. Thus Sun was in their rights to replace the copyright notice and distribute under a different license, while Google would have been within their rights to use the copy they got from the other copyright holder under the license granted by that holder.

    Sun’s FAQ on the Sun Contributor Agreement links to a PDF of the agreement, which says in part: “With respect to any worldwide copyrights, or copyright applications and registrations, in your contribution: you hereby assign to us joint ownership,”

    http://sosc-dr.sun.com/software/opensource/contributor_agreement.jsp

    From the FAQ

    Q:
    What can Sun do with my contribution?

    A:
    Sun may exercise all rights that a copyright holder has, as well as the rights you grant in the SCA to use any patents you have in your contributions. As the SCA provides for joint copyright ownership, you may exercise the same rights as Sun in your contributions.

  35. Carlos Woelz June 27, 2012 11:20 pm

    I believe the case was already well discussed, and deserves a follow up, either correcting the flashy headline or pointing to code that really belongs to Sun and was copied.

    The copyright notices are not proof of ownership, as the Oracle v. Google trial showed.

    I guess that just because Google is aggressive in its copyright theories, and makes money out of other people’s content (legally or not), it does not follow that it is so careless with other people’s code. On the contrary, the evidence is that it is clean of Oracle code, impressive for a project this size. For proprietary code we never know, as we cannot audit the code like Bob just did.

  36. Robert White June 28, 2012 3:32 am

    So when does someone put an update on the article that says “whoops, nothing to see here. Sorry guys.”

  37. Gene Quinn June 28, 2012 9:27 am

    bjd-

    I know the Internet is prone to extreme over exaggeration and a healthy amount of paranoia, but you need to grow up. If you can’t participate in a meaningful debate then go elsewhere.

    Everyone who is being honest with themselves knows exactly what Bob was talking about at comment 9. The fact that you are trying to make it more than what it is to suit your own predetermined notions is a YOU problem.

    Experts need to approach a task in a certain manner. They are asked questions. They analyze the information and provide answers. That was what Bob was clearly pointing out. He did not say there was a crime. He was trying to explain the approach in a way that anyone in the Law & Order generation can and should understand. The fact that you chose to take this somewhere else speaks volumes about your integrity, intellectual honesty and bias.

    When you are living in a glass house don’t throw stones.

    -Gene

  38. Wang-Lo June 28, 2012 3:21 pm

    Carlos, Robert:

    There won’t be any update, or correction, or retraction, or apology.

    The article was written as an excuse for a sensational headline to sell a press release to undiscriminating MarketWire subscribers. Now that it has served its purpose it will be abandoned by its authors. It’s yesterday’s news. The MarketWire audience has no interest in a followup story containing the truth.

    “Did Oracle Overlook the Smoking Gun in its Case against Google?” was featured by Bloomberg, Technology Digital, Business Review USA, and others.

    “Ignorant Blowhards Publicly Humiliated After Misinterpreting Their Own Analysis” just wouldn’t attract as many eyeballs.

    -Wang-Lo.

  39. Sidney Markowitz June 28, 2012 5:12 pm

    (reply to comment 37)

    Gene,

    To continue in the analogy Bob uses in comment 9, he is touting his company’s fingerprint-finding service by showing how his technology has found two fingerprints at the scene that were not used as evidence at trial, with a headline “Did the plaintiff miss this important evidence?”. But the trial testimony shows that the search for fingerprints used technology every bit as effective, that one of those two fingerprints was actually in the original claims and was dropped early because it was not relevant, and the other one is not relevant for the same reason and was never introduced.

    Bob’s article is an ad for Codematch. It attempts to show how his technology manages a task that is far beyond the ability of humans to do by hand. But it sensationalizes a result that is not sensational. What would take an army of software developers a huge amount of time to do by hand can be accomplished by Codematch running 10 processes on an 8 core computer for 34 hours. Or maybe Google could have thrown a few thousand machines at the problem for a week, using their existing search technology to do it “inefficiently” without having to write any new special code. Or Oracle could have thrown the files into a big database (not a huge one for enterprise scale databases) and used a few hundred machines doing database queries for a week to get the same result. My point is that even if Codematch does its job 1000 times faster than Google or Oracle could do it using the technology that each of them is known for, that just requires them to throw a few hundred 8-core computers at it for 100 hours and job done and they continue with their preparation for the case. And most importantly, the trial testimony shows that they did have all the information that Bob is making such a big deal of.

    Bob may know his code matching software, but he wrote a sensationalist article without checking how his technological results related to the details of the court case, without doing the further checking of the results that is necessary to understand the meaning of the results, and without digging into the legal background necessary to understand the results (e.g., “there can only be one copyright owner” when the Sun Contributor Agreement says it creates two independent copyright owners on every contributed file).

    But he did get some PR for his Codematch software. Hey, it was mentioned in Bloomberg. I guess he did something right.
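The back-of-the-envelope estimate in comment 39 can be checked with a short calculation. The 34 hours and the 1000x slowdown come from the comment; the value of 340 machines is an assumed stand-in for "a few hundred", and the sketch assumes the work parallelizes perfectly across machines.

```python
# Figures from comment 39; the 1000x slowdown is the commenter's hypothetical.
codematch_hours = 34   # one 8-core computer, as stated in the comment
slowdown = 1000        # assumed: general-purpose tooling is 1000x slower
machines = 340         # assumed value for "a few hundred" 8-core computers

# Total work if done with the slower tooling, in single-machine hours.
total_machine_hours = codematch_hours * slowdown

# Elapsed time if that work is spread evenly over all machines.
wall_clock_hours = total_machine_hours / machines

print(total_machine_hours, wall_clock_hours)
```

Under these assumptions the slower approach needs 34,000 machine-hours, which about 340 machines would finish in roughly 100 hours, consistent with the figure quoted in the comment.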

  40. Rocky Hinten June 29, 2012 1:13 am

    Good point, Sidney. Codematch may be great software, but I’m not a fan of the ill-informed quasi-legal opinions that the authors are drawing from the code comparison results.