Like electricity, nuclear power, or biotechnology before it, data will shape the next age of innovation and invention. With a bright new IDEA, Rensselaer is positioned at the forefront of this technological revolution.
Our lives are overflowing with data. We leave behind a trail of digital breadcrumbs when we send emails, post photos to the Web, use credit cards, or play video games. Our homes are outfitted with sensors that measure room temperature and the use of gas, water, and electricity. Video cameras, card readers, and keypads adorn our businesses and workplaces. Satellites above us collect data on the weather, and sensors in the ground gather information about earthquakes. Industry and academia are also awash in data. On the Rensselaer campus, students and faculty in every department are generating vast libraries of digital information from their course work and research.
Many metaphors have been employed to convey the importance and value of data. Some liken it to a natural resource and call it “the new oil,” while others declare it a currency and call it “the new gold.”
Professor James Hendler, a world leader in Web science research who advises the White House on Web and data policy, insists that data is more than a fuel to be spent or a commodity to be traded. These views, he says, do not do justice to the transformative power of data.
Hendler sees data instead as the stage upon which this century’s biggest technological and scientific innovations will play out. Chemists in the late 1700s started organizing their investigations, which led to a scientific revolution enabled by chemical forces. Similar disruptive periods occurred in the wake of the organization of thinking surrounding electrical forces, nuclear forces, and biological forces. Data, Hendler says, is the force that will enable the next great era of rapid, radical scientific and technological progress.
“Data, or more technically the information derived from data, is the force of the future. Our global society creates data, consumes data, and uses data on a daily basis. But we’re not even close to being able to deal with the sheer volume and variety of data being generated. Organizing this data in such a way that it can be found, accessed, and used more effectively by more people is the challenge that will dominate scientific and industrial progress for at least the next decades,” he says. “We need to learn how to harness the force of data, or we will be left behind as individuals, as academics, and as a nation.”
It is in response to this grand challenge that President Shirley Ann Jackson announced on June 13, 2013, the creation of the Rensselaer Institute for Data Exploration and Applications. Known as the Rensselaer IDEA, the new research institute connects and fortifies the wealth of data-related research taking place on campus—with a specific focus on high performance computing, cognitive computing, Web, network and data sciences, predictive analytics, and immersive technologies—and links this work to actionable applications at the interface of engineering and the physical, life, and social sciences.
“The goal of the Rensselaer IDEA is to access and aggregate a global storehouse of social, cultural, financial, scientific, and engineering information—and then to make it available in a form in which any person, anywhere on Earth, can ask important questions and contribute to emergent hypotheses,” President Jackson said.
The Rensselaer IDEA, led by Hendler, cuts across the university’s five schools and more than a dozen academic departments. The new research institute represents an investment of more than $60 million. It also represents an opportunity for Rensselaer to capitalize on its unique platforms and intellectual capital to bolster its sterling reputation as a global leader in data-related innovation and in moving research breakthroughs from the laboratory to the public sphere.
“Here at Rensselaer, we will address the hard problems, which we are uniquely qualified to address because of our strengths in engineering, science, design, management and entrepreneurship, and the humanities, arts and social sciences,” says President Jackson. “We will continue to leverage our interdisciplinary approaches to problem solving and educating students, using the new tools and technologies of this data-driven, Web-enabled, supercomputer-powered, globally interconnected world.”
“We are looking to fundamentally change the life cycle of scientific inquiry by infusing it and informing it with leading-edge data techniques,” Hendler says. “This holds the potential to touch every discipline across campus, and in a way that is unique and very special to Rensselaer.”
Seeking the Tree Instead of the Forest
Rensselaer faculty researchers and students, past and present, have never been afraid to tackle the biggest, most complex challenges of the day. This audacity, paired with the Institute’s culture of excellence and interdisciplinarity in education and research, is a key factor that enables a bold, strategic endeavor at the scale of the Rensselaer IDEA to take root and grow.
The new institute builds on key Rensselaer strengths: Faculty in all five schools are already pursuing data-driven research. Educational innovation is ensuring students start using data and analytics in introductory courses taken within their first few semesters on campus. Research centers and constellations are developing critical capabilities and intellectual leadership in the use of data in science and engineering. And most important, arguably, are the low walls between engineering and other disciplines on campus.
The situation has evolved, Hendler says, to the point where barriers to collaboration are no longer internal but external to Rensselaer. The university has no cultural or procedural impediments to transitioning scientific discoveries into the realm of engineering, where the breakthrough is refined, documented, and moved forward toward a product or an application.
The challenge, he says, is that very few governmental agencies or private foundations are willing or able to fund this type of long-term project that spans both basic and applied research across traditional disciplinary bounds.
This promotes what Hendler calls “the data forest.” He characterizes the situation as a forest of slender trees drawing upon a shared underground aquifer of data. Each tree represents an individual research project, with the green treetop symbolizing the data-driven end application. (Such an application could be a piece of novel software for modeling metal fatigue at the nanoscale, or a student-created app for wirelessly monitoring home water usage, or anything in between.) The challenge is that not all researchers are data scientists or computer programmers, Hendler says, nor is it reasonable to expect them to be. This often results in custom-built applications that are valuable to the intended end users, but which are precariously balanced upon project-specific programming techniques that Hendler equates to thin, brittle tree trunks.
Even though the underlying data is good, and the program works, the nature of the solution results in an application that is expensive to maintain, difficult to validate, and oftentimes impossible to share with others. Eventually, Hendler says, these brittle data applications diminish and crumble under the weight of their own complexity and narrow focus.
The Rensselaer IDEA is in the process of reshaping this forest of delicate saplings into one formidable, spectacular redwood. Here’s how the scene changes: Instead of being separate tiny top-heavy trees, the data-driven applications would be branches upon the big tree. And the trunk of the tree would be strong, healthy, and robust enough to support its many branches indefinitely into the future.
Consider this: An astrophysicist’s data, on the surface, looks considerably different from a network scientist’s data, which in turn looks significantly different from air flow and friction data collected by a mechanical engineer. “But from a computer’s point of view,” Hendler says, “the data probably doesn’t look very different.” And while the tools used to collect this data—a telescope, Twitter, a wind tunnel—are remarkably dissimilar, researchers often seek to do very similar things with the information they capture.
This is the opportunity of the Rensselaer IDEA: to create infrastructure, processes, and common tools for manipulating, mining, and sharing data that can be used by faculty members or students from any discipline. This common “trunk” will greatly reduce the time researchers sink into developing one-off, dead-end applications, freeing them to spend that time on other aspects of their projects. Data collected by these applications will no longer go into a dusty digital folder never to be seen again. Rather, this data will be accessible to other researchers and thus could help to spark unplanned, serendipitous collaborations and innovation.
“Think about a civil engineer who, before building a bridge, had to learn to forge metal, pour concrete, and cable suspension trusses. Most of the time is going to be spent creating tools, instead of doing what she’s actually really good at—designing the bridge,” Hendler says. “Researchers working with data today often face a similar situation. They are experts in biotechnology, supply-chain management, or sociology, but much of their students’ time is spent in front of a computer screen trying to use specialized programming techniques to construct a functional application.
“What we want to do is create common tools that are accessible and easier to use, whether you’re a chemist, a mechanical engineer, an architect, or an artist. Just think of all the extra time our faculty and students will have to ponder and extract value from that data, instead of thinking about how to collect and manage it. We will also make it easy for people to share their data, and have access to data sets from across campus, even if it was created in a different department, research center, or school.”
Simply put: The Rensselaer IDEA tends the aquifer and the tree trunk, leaving faculty researchers and students to do what they do best—grow their branches by applying leading-edge science, engineering, scholarship, and technology to the common purposes of life.
Along with the collaborative, low-walls thinking that pervades campus, the Rensselaer IDEA draws upon, and would not be possible without, four unique intellectual and physical platforms: the Center for Computational Innovations (CCI) supercomputing center, the Curtis R. Priem Experimental Media and Performing Arts Center (EMPAC), the Center for Materials, Devices, and Integrated Systems (cMDIS), and the Center for Biotechnology and Interdisciplinary Studies (CBIS).
Each platform was envisioned and realized by President Jackson, and each contributes critical capabilities to what she calls the “computational ecosystem” coalescing at Rensselaer, which culminated in the launch of the Rensselaer IDEA. The four platforms, Hendler says, largely make up the trunk of the tree.
Ethics, Educating the Leaders of Tomorrow, and the Big IDEA
Hendler describes the Rensselaer IDEA as the “integratory fabric” that weaves together CCI, CBIS, EMPAC, cMDIS, and other data-related research taking place across campus. Researchers and students who seek to harness the capabilities of these platforms (a social scientist, say, who wants to run simulations on the AMOS supercomputer at CCI or visualize data at EMPAC) but who may lack direction on how to make it happen will know the Rensselaer IDEA should be their first destination.
“It doesn’t make sense for each individual to try and quickly become experts in high performance computing, data visualization, or cognitive computing, in order to use these world-class, unique research platforms at Rensselaer,” he says. “We want them to focus instead on the applications they’re building—those branches at the top of the tree—and use our infrastructure for the other stuff. Rensselaer is the only place in the world where these powerful platforms all exist together, and we want to make it easy for students and faculty to exploit the interconnections between CCI, CBIS, cMDIS, and EMPAC to spectacular, world-changing results.”
Rensselaer Vice President for Research Jonathan Dordick says the Rensselaer IDEA is important for strengthening the position of Institute researchers to win grants in the current environment of shrinking federal budgets and heightened competition for funding from government, industry, and foundations. It also helps differentiate Rensselaer as the university seeks to attract the best and brightest students, faculty, and researchers from across the world. Most importantly, he says, the Rensselaer IDEA provides a fundamental framework to use data science and data analytics to address some of society’s most difficult, yet important, problems.
The Rensselaer IDEA has a particular emphasis on helping elevate the leadership of Rensselaer in seven emerging areas of research, Dordick says: health-care analytics, business analytics and intelligence, built and natural environments, virtual and augmented environments, cybersecurity applications, basic research in physical and engineering sciences, and public policy.
The educational aspect of the Rensselaer IDEA is also critical, says Kristin Bennett, a professor in the Department of Mathematical Sciences, who recently received a $550,000 grant from the National Science Foundation to integrate data analytics throughout the undergraduate mathematics curricula and engage freshman and sophomore students in data analytics research. “Our goal is to create the next generation of deep analytical thinkers who can leverage data to solve the most pressing business and societal needs,” she says.
Working on data-related course work and research at CCI, CBIS, and EMPAC will be extremely valuable experience for students, undergraduates in particular, as they graduate from Rensselaer and enter the workforce. A recent study by Gartner Inc. estimates that 1.9 million so-called “Big Data” jobs will be created in the United States by 2015. This workforce, which is in high demand today, will require professionals who understand how to ask the right questions to find insights in data; harness data-exploration, data-crunching, and data-visualization technologies; and, working as part of multidisciplinary teams, effectively communicate what they find, Bennett says.
In light of the increasing pervasiveness of data and the highly publicized, persistent, and widespread concerns over data security, Hendler says the Rensselaer IDEA is working with the School of Humanities, Arts, and Social Sciences to develop a framework for addressing digital ethics. The goal is to challenge students with ideas about balancing security against profit or progress, and to help them become acutely sensitive to the personal, professional, and policy-level implications of the privacy issues that arise when dealing with potentially sensitive data.
Overall, the Rensselaer IDEA is empowering students and faculty with the ability to make intelligent use of the best tools available to inform their decision-making and help produce a range of discoveries and applications to change the world for the better.
“Rensselaer has an innovation mindset that is unique to the very best universities in the world,” Hendler says. “With the Rensselaer IDEA, we now have a data-forward mindset that is also rare in academia. We are ahead of the curve, and in a very strong position to be a leading force in helping to shape the innovations of the data revolution.”