The Day One Project recently released over 100 proposals for the Biden-Harris administration to use as roadmaps in crafting science and technology policy. One of those proposals, a Transition Document for the United States Patent and Trademark Office (USPTO), recommends an important and specific step forward for the growing policy agenda on diversity in U.S. innovation.
The USPTO should undertake a pilot program for mandatory collection of demographic data from patent and trademark applicants. This recommendation is a conscious break from past public commentary, which has often urged data collection on a purely voluntary basis.
Studying USPTO applicant diversity is a genuine policy challenge with competing constraints. Although the favorable political consensus to address it is relatively recent, its origins go back a decade to the 2011 America Invents Act (AIA). On balance, the benefits of mandatory data collection are substantial, and privacy concerns can be resolved without squandering those benefits by settling for voluntary data collection. The discussion here focuses on patent data, but the arguments also translate readily to the trademark system.
Growing Interest in Diversity in Innovation
There is now significant interest in systematic research on demographic diversity across the U.S. innovation system. The USPTO-driven National Council for Expanding American Innovation is both a reflection of this interest and a focal point for advancing it, especially after the agency’s public outreach under the SUCCESS Act. In addition to the USPTO’s final report to Congress in October 2019, the Chief Economist of the USPTO has also published the companion paper Progress and Potential, first in February 2019 and updated in July 2020 with an empirical profile of women’s participation in U.S. patenting.
Meanwhile, Senators Mazie Hirono (D-HI), Thom Tillis (R-NC), and Chris Coons (D-DE) recently asked the USPTO for details about gender disparity in the patent bar as reported in a 2020 scholarly article by Mary Hannon as well as prior studies from 2011 and from 2014. The USPTO’s response specifically suggested the possibility of further empirical study.
Finally, beyond innovators and their representatives in the bar, empirical research on diversity has also extended to the USPTO’s own examiner corps. For example, a 2019 working paper by Deepak Hegde and Manav Raj has shown statistically significant differences between male and female patent examiners as to diligence, average work quality, and quantity of output—as well as in their respective likelihoods of promotion and work preferences. In each of these contexts, diversity is of growing policy interest, and gender diversity is especially salient.
The Problem of Reliable, Replicable Data
What these and various other studies share is a common set of difficulties in ensuring that the underlying data is reliable, replicable, and capable of supporting statistically sound inferences. For example, the 2014 study on gender diversity in the patent bar began with the entire public register of attorneys and agents admitted to the USPTO, but gender was only estimated from first names, not confirmed. Moreover, although the data dictionary used for inferring gender was a publicly available report from the Census Bureau, that report was published in 1995—based on data that is now four decennial census periods old.
The 2019 patent examiner study by Hegde and Raj improved on this considerably by relying on a professional vendor with a database of over 1.2 million unique personal and family names. This makes for higher-quality estimation, but the vendor’s proprietary algorithm creates a barrier to replicability, which is especially important for policy making.
Even the USPTO’s response to Senators Hirono, Tillis, and Coons was candid about the limited quality of the agency’s statistics, noting that they “do not show the complete gender data” but do provide “some insight into the possible gender breakdown” of the patent bar. The analysis was based on the use of honorific prefixes among patent bar applicants, with “Mr.” and “Ms.” taken to correspond to men and women, respectively. One problem is that the relevant USPTO form offers not two choices but five: “Mr.,” “Ms.,” “Mrs.,” “Miss,” and “Dr.” In particular, the final option, “Dr.,” is not likely to be evenly distributed between men and women across different technology areas. Moreover, all of this information is optionally self-reported, creating further potential for non-random bias in the data.
The Call for Mandatory Data Collection
The USPTO is well aware of these data reliability issues. Back in March 2012, the agency published a methodology for studying the diversity of applicants as required by Section 29 of the AIA. That methodology included matching public patent data with confidential Census Bureau data to determine demographic attributes including race, gender, veteran status, age, economic status, education, geography, and much else.
However, as later reported in a June 2015 memorandum, even this data-matching effort was “only partially successful.” The relatively basic information in USPTO data (name, town, and state) was not enough to disambiguate inventors, especially those with common names in large population centers. Even with full and detailed demographic data at the Census Bureau, the USPTO’s own sparse data collection allowed only a modest match rate: just 64.3% of U.S.-resident inventors listed in USPTO patent data could be matched to Census Bureau data.
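To see why a key as sparse as name, town, and state leaves many inventors unmatched, consider a minimal sketch in Python. It is an illustration under stated assumptions, not the USPTO's or the Census Bureau's actual matching procedure. The records, the function name `match_rate`, and the rule that a match counts only when exactly one reference record shares the key are all hypothetical.

```python
from collections import defaultdict

def match_rate(inventors, reference_records):
    """Share of inventors whose (name, town, state) key points to exactly
    one reference record. Hypothetical rule, for illustration only."""
    candidates = defaultdict(list)
    for rec in reference_records:
        key = (rec["name"], rec["town"], rec["state"])
        candidates[key].append(rec)

    matched = 0
    for inv in inventors:
        key = (inv["name"], inv["town"], inv["state"])
        # Zero candidates: no match. Two or more: ambiguous, so no match.
        if len(candidates.get(key, [])) == 1:
            matched += 1
    return matched / len(inventors) if inventors else 0.0

# Entirely made-up records, not real data.
inventors = [
    {"name": "John Smith", "town": "New York", "state": "NY"},
    {"name": "Ada Example", "town": "Boise", "state": "ID"},
    {"name": "Lee Sample", "town": "Austin", "state": "TX"},
]
reference_records = [
    {"name": "John Smith", "town": "New York", "state": "NY", "age": 41},
    {"name": "John Smith", "town": "New York", "state": "NY", "age": 67},
    {"name": "Ada Example", "town": "Boise", "state": "ID", "age": 35},
]

rate = match_rate(inventors, reference_records)
print(f"Match rate: {rate:.1%}")
```

In this toy data, the common name in a large city is ambiguous and a third inventor has no reference record at all, so only one of three inventors is matched. Collecting even one additional identifying field from applicants would shrink the ambiguous group.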
That memorandum highlighted the likely risk of statistical bias in voluntarily self-reported data. It recognized strong support among public commentators, especially the influential view of the American Intellectual Property Law Association (AIPLA), for voluntary data collection—but also noted that none of the respondents who voiced concern about respondent privacy addressed the problem of data quality.
Importantly, the memorandum concluded that “for the USPTO to study patent applicant diversity further, there must first be a resolution to the tension under current law between the statistical rigor of mandatory surveys and the public support and existing authority for voluntary surveys.”
Statistical Rigor amid Privacy Concerns
The tension that the USPTO identified can be summed up as follows. Demographic data is hardly worth collecting at all unless it can be collected in a way that is statistically representative and useful for analysis and policy making. Thus, the problem with voluntary approaches is that they would almost surely suffer from selection effects, including self-selection among respondents, whose magnitudes and directions would be difficult to estimate or correct. The risks of selection and other biases in the data are especially problematic given the already low-quality match (64.3%) of USPTO data with Census Bureau data.
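The self-selection problem can be made concrete with a short calculation, sketched here in Python. The function name `observed_share` and every number in it are hypothetical, chosen only to illustrate the mechanism. None is an estimate of actual applicant demographics or response behavior.

```python
def observed_share(true_share, rate_group, rate_others):
    """Share of a group among voluntary respondents, given the group's true
    population share and the response rates of the group and of everyone
    else. All inputs are illustrative assumptions."""
    responders_group = true_share * rate_group
    responders_others = (1 - true_share) * rate_others
    return responders_group / (responders_group + responders_others)

TRUE_SHARE = 0.20  # hypothetical true share of the group among applicants

# Same true share, three different (unobserved) response patterns.
equal_rates = observed_share(TRUE_SHARE, 0.5, 0.5)
group_more_willing = observed_share(TRUE_SHARE, 0.6, 0.4)
group_less_willing = observed_share(TRUE_SHARE, 0.4, 0.6)

for label, share in [
    ("equal response rates", equal_rates),
    ("group responds more often", group_more_willing),
    ("group responds less often", group_less_willing),
]:
    print(f"{label}: observed share = {share:.3f}")
```

Only when both groups respond at the same rate does the survey recover the true share. Otherwise the observed share can overstate or understate it, and because the response rates themselves are unobserved in a voluntary survey, neither the size nor the direction of the error can be read off the data.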
Meanwhile, demographic data cannot be collected in violation of the Paperwork Reduction Act, the Privacy Act, or the Census Bureau’s own confidentiality obligations. Indeed, though the USPTO’s own patent data is publicly available, once it is matched with Census information, even the resulting data is prohibited from release because it includes commingled confidential data. Thus, any effort at mandatory data collection must not run afoul of these legal constraints.
Privacy Safeguards and Authorizing Legislation
Between these competing constraints, the law is more moveable than the demands of statistical rigor. Accordingly, the USPTO’s focus for collecting demographic data should be twofold.
First, the agency should protect the data from unauthorized disclosure outside the agency and from undue influence inside the agency on patent examination or other processes. In particular, the USPTO should not allow the availability of demographic information about patent applicants to enable bias, whether conscious or unconscious, on the part of patent examiners.
Second, the agency should ensure that its legal authority to collect demographic information is clear and specific. The USPTO’s response to Senators Hirono, Tillis, and Coons was quite sensible in this regard, expressly connecting the collection of more comprehensive data to a need for relevant authorizing legislation—and offering technical assistance on such legislation.
The Flip Side of Privacy Concerns
Finally, there is another, largely unexamined, dimension to the concerns for USPTO applicant privacy. Recent research cautions that unconscious bias may already exist in patent examination to varying degrees across technology centers and art units. A 2012 report of the National Women’s Business Council [Part I; Part II] shows nearly identical trends for the patents filed versus patents granted for both women and men.
However, despite this trend, a peer-reviewed 2018 paper by Kyle Jensen and coauthors showed that patent examiners tend to favor male inventors and judge applications with a female name more harshly: applicants with common names from which female inventors can easily be identified were 8.2% less likely to be granted a patent, whereas those with uncommon names that are harder to guess were only 2.8% less likely. This suggests that there may already be some unconscious bias at work. If so, such bias is likely to be rooted in demographic inferences that may be drawn from inventor information that is already available.
It does not necessarily follow that collecting more information and better information about applicants will cause even greater gender bias or disparity. To the contrary, by restricting access to any new demographic information that the USPTO collects—especially keeping it firewalled from the examination process—the agency could minimize the day-to-day effect of that information on examiner operations. The USPTO’s more systematic and complete collection of demographic information may even aid in identifying and mitigating existing bias.
Conclusion: A USPTO Pilot Program
This, in sum, is the analytical and historical case for the pilot program proposed in the Day One Project Transition Document. Indeed, as law professor and former Obama administration adviser Colleen Chien has aptly observed, the USPTO is “an innovation agency that generates its own fees” and thus “has a less politicized mandate as well as a strong culture of piloting.” A soundly constructed, statistically informative pilot program would do much to guide the USPTO’s decision making in the difficult balance between gathering useful data and respecting privacy values.