Driving AI innovation through a National Research Cloud
A newly released whitepaper provides a roadmap for the creation of a National Research Cloud (NRC) that would fuel artificial intelligence-based research.
In “Building a National AI Research Resource: A Blueprint for the National Research Cloud,” the Stanford Institute for Human-Centered Artificial Intelligence (HAI) and the Stanford Law School’s Policy Lab, which studied the feasibility of an NRC, make recommendations for how an NRC could play out.
First, the paper, released Oct. 6, recommends the use of a dual investment strategy that makes the most of both public computing infrastructure and services from commercial cloud providers. In the short term, “the compute model of the NRC can be quickly launched by subsidizing and negotiating cloud computing for AI researchers with existing vendors, expanding on existing initiatives like the [National Science Foundation’s] CloudBank project,” which provides subsidized access to existing cloud resources.
NRC should also invest in a pilot test of public infrastructure to assess its ability to provide comparable resources in the long run, much as Energy Department national laboratories operate now: They own supercomputing facilities that researchers get approval to use.
Second, the paper recommends that eligibility to access and use NRC be limited, at least initially, to academic and nonprofit AI researchers, specifically those who are considered principal investigators (PIs) at U.S. colleges and universities, and to what the paper terms “Affiliated Government Agencies.” Those are organizations that will contribute previously unreleased, high-value datasets to NRC in return for subsidized compute resources.
Third, the researchers recommend a default base-level access for computing that should cover most PIs’ needs and a custom grant process for access to additional compute beyond that base.
The whitepaper’s data access model is based on four recommendations. The first is that NRC focus on government data. Second, it recommends “a tiered access model: by default, researchers will gain access to government data that is already public; researchers can then apply through a streamlined process to gain access at higher security levels on a project-specific basis.”
One challenge to sharing data is the Privacy Act of 1974, which requires that there not be a central repository of government data, putting it somewhat at odds with an NRC. The researchers took that into consideration.
“There’s no question here about removing or eliminating the Privacy Act,” Jennifer King, privacy and data policy fellow at the Stanford HAI, said during a webinar announcing the blueprint. Instead, the researchers came up with ways the two could coexist. For instance, the act allows for an exemption for statistical research, which makes up the bulk of AI study, and agencies are expected to protect data with privacy treatments beyond anonymization, including differential privacy, homomorphic encryption or synthetic datasets.
In terms of where to locate NRC, the whitepaper recommends that it be a Federally Funded Research and Development Center (FFRDC) to start, moving to a public/private partnership in the long run.
Overall, the paper identified three primary themes. They are complementarity between compute and data, rebalancing AI research toward long-term non-commercial research and coordinating short- and long-term approaches to creating the NRC.
The idea for an NRC came from Stanford HAI’s founders, who helped usher it into legislation with the National AI Research Resource Task Force Act, part of the National Defense Authorization Act. The act created a task force, set up in June, to study and plan for the implementation of a “National Artificial Intelligence Research Resource” (NAIRR), also known as NRC, the paper states.
The need for an NRC stems from an imbalance between commercial and noncommercial AI research that threatens to “undermine the historical innovation ecosystem where basic, fundamental and noncommercial research have laid the foundations for applications that may be decades away, not yet marketable or promote the public interest,” the blueprint states.
“We need to understand there are inherent differences between academic research and commercialized research,” Russell Wald, director of policy for the Stanford HAI, said during the webinar. “With longer time horizons and no profit constraints, basic scientific research has given way to breakthroughs such as GPS, the internet and CRISPR [DNA sequences]. Examples such as this have led to an eventual commercialization of these discoveries and greater downstream benefits to society.”
For example, after Landsat imagery went from costing about $600 per file to being freely available to the public in 2008, it generated a productivity savings resulting in annual economic benefits of $3 billion to $4 billion, Wald said.
He added that 82% of algorithms used today came from federally funded nonprofit and university efforts, but that is waning because the “innovation ecosystem” is threatened by the high cost of computing power, limited access to the raw data used to train AI models and “brain drain” of AI researchers at universities.
“NRC will generate distinct positive externalities by integrating compute and data, the two bottlenecks for high-quality AI research,” according to the report. “Specifically, the NRC will provide affordable access to high-end computational resources, large-scale government datasets in a secure cloud environment, and the necessary expertise to benefit from this resource through a close partnership between academia, government, and industry. By expanding access to these critical resources in AI research, the NRC will support basic scientific AI research, the democratization of AI innovation, and the promotion of U.S. leadership in AI.”
Stephanie Kanowitz is a freelance writer based in northern Virginia.