Special Issue: The Institute for Data Sciences and Engineering

navigating big data

As I embark on my second stint as Interim Dean of the School (I first served in this capacity in 1994–95), I cannot help but compare the School then and now. While technology, engineering disciplines, and applied sciences have advanced tremendously in scope and importance during the intervening years, the School has made even greater strides, with our faculty leading research initiatives that are both highly interdisciplinary and genuinely trailblazing.

Interim Dean of the School: Donald Goldfarb
Executive Director of Communications: Margaret R. Kelly
Special Issue Editor: Kathleen R. McKeown
Contributing Editor: Patricia J. Culligan
Editor: Melanie A. Farmer
Writers: Keren Bergman, Raimondo Betti, Kartik Chandran, Shih-Fu Chang, Michael Collins, Patricia J. Culligan, Garud Iyengar, Angelos D. Keromytis, Andrew Laine, Kathleen R. McKeown, Vijay Modi, Chris Wiggins, Gil Zussman
Contributing Illustrators: Andrew Bannecker/Bernstein & Andriulli; Nomoco/Bernstein & Andriulli; Jeffrey Fisher
Design and Art Direction: University Publications
Contributors: Doneliza Joaquin, David Simpson

Columbia Engineering is published twice a year by:
Columbia University in the City of New York
The Fu Foundation School of Engineering and Applied Science
500 West 120th Street, MC 4714
New York, NY 10027

Comments, suggestions, or address changes may be mailed to:
Columbia University
The Fu Foundation School of Engineering and Applied Science
Room 510, MC 4714
500 West 120th Street
New York, NY 10027

Phone: 212-851-5993
Fax: 212-864-0104
Email: [email protected]

Read more about Columbia Engineering at www.engineering.columbia.edu
For more on the Institute for Data Sciences and Engineering: engineering.columbia.edu/idse
Find Us Also On: engineering.columbia.edu/Facebook | twitter.com/CUSEAS | youtube.com/ColumbiaSEAS

In late July, our School received significant support from New York City to create the Institute for Data Sciences and Engineering, announced jointly by President Lee C. Bollinger and Mayor Michael R. Bloomberg in a press conference held in the new Northwest Corner Building (see full story on page 40). The Institute, housed in Engineering and led by our faculty, calls for an interdisciplinary approach that fosters increased collaborative research with seven other Columbia units. During the press conference, President Bollinger credited Engineering faculty for their pioneering research, which shaped the core of the University’s proposal for the new Institute, and acknowledged how essential new research space is to the continuing growth of the School. The City’s $15 million seed money will allow the School to jumpstart creation of 44,000 square feet of new space on Columbia’s campus by 2016 and add 30 new faculty members within the same time period. Space has been the only constraint that has prevented Columbia Engineering from being in the very top echelon of engineering schools. This agreement with the City will enable us to expand our research program, faculty, and students at a pace that otherwise would not be possible. The Institute is paving the way for our School’s future development. To provide an insight into the diversity of research that the Institute encompasses, this special issue of Columbia Engineering magazine is devoted to the work that will go on within the Institute’s five centers—smart cities, new media, health analytics, financial analytics, and cybersecurity—and core research initiatives. Written by some of the professors who will be spearheading each of these areas, this issue has been edited by Institute Director Kathleen R. McKeown, Henry and Gertrude Rothschild Professor of Computer Science, with Institute Associate Director and Professor Patricia J. 
Culligan of the Department of Civil Engineering and Engineering Mechanics as special contributing editor. I hope the following articles help you understand more about the research initiatives that are the foundation of the Institute and the new paths that will be sparked by the work of these and other faculty members who are part of Columbia’s newest academic/research/entrepreneurial endeavor. Indeed, this is a very exciting time for the School, and, as we approach our 150th anniversary, we look forward to the momentum that the Institute will provide as we create the next chapter in the School’s history.

Donald Goldfarb Interim Dean and Avanessians Professor of Industrial Engineering and Operations Research

Fall 2012 | Volume 54, No. 1

Contents

Features

5  Addressing the Data Deluge
Data Explosion: The proliferation of data is essentially everywhere and its ever-increasing impact on our daily lives has created an urgent need to manage, collect, and make sense of it all. Our new Institute for Data Sciences and Engineering fills this critical need for the acquisition, synthesis, and analysis of “big data” in five key areas: smart cities, new media, health analytics, financial analytics, and cybersecurity. Faculty—not only from the Engineering School but also from myriad disciplines across the University—are working together to chip away at the challenges posed by this explosion of data. This special issue of Columbia Engineering magazine takes a closer look at some of the research behind the Data Sciences Institute and explores the current and future state of a range of significant topics. From wastewater treatment to the structural monitoring of bridges to rethinking cloud computing security, find out how the Engineering School faculty and their collaborators campus-wide are addressing the problems and challenges tied to this ever-increasing inundation of data.

2 | columbia engineering

By Kathleen R. McKeown and Patricia J. Culligan
“If properly harnessed, data have the potential to generate knowledge that can drive innovation, create jobs, foster economic growth, transform decision-making, and develop solutions to problems of societal concern, worldwide.”

8  Big City, Bigger Infrastructure Challenges
By Raimondo Betti
In the future, a city’s infrastructure could have the ability to monitor its own health through a complex network of sensors that, in real time, will be able to provide an estimate of its structural integrity and, if necessary, initiate corrective actions.

11  Shifting from Resource Removal to Resource Recovery
By Kartik Chandran
The foundation for future smart, sustainable cities requires a novel approach to wastewater treatment and the development of new models that “recover,” rather than simply remove, essential resources from waste.

14  Smarter Cities, Smarter Retrofits
By Vijay Modi
How smart devices can help future cities become more resourceful and shrink their environmental footprint.

17  Picture This
By Shih-Fu Chang
The use of digital video is growing at an unprecedented rate. However, such massive growth creates not only a wealth of opportunity but also a whole set of new and challenging problems.

19  More Than Enough Words to Process
By Michael Collins
The vast amount of linguistic data now in electronic form poses significant challenges in how researchers manage and access this data through the use of natural language processing technologies.

23  Imaging Informatics: Integrating Multimodal and Longitudinal Data for Understanding and Treatment of Disease
By Andrew Laine
A revolutionary system will help doctors manage large amounts and many different kinds of patient information in order to improve their clinical decisions in treating diseases such as cancer, heart disease, diabetes, and chronic neurological diseases like Alzheimer’s and Parkinson’s.

26  Applying Big Data Approaches to Biological Problems
By Chris Wiggins
We can apply so-called big data approaches not only to biological examples but also to health data and health records. These approaches offer the possibility of, for example, revealing unknown lethal drug-to-drug interactions or forecasting future patient health problems. Such models could have consequences for both public health policies and individual patient care.

29  Data-Driven World Fuels Change in Portfolio Selection, Risk Management
By Garud Iyengar
With the cost of IT infrastructure down, more and more financial transactions are being captured electronically. The quality of financial data is improving, and with that, the quantity of data is increasing. This data revolution is forcing a change in the way many of the problems in this field are being addressed.

33  Whither Cloud Computing Security?
By Angelos D. Keromytis
Cloud infrastructures represent a tempting and highly lucrative target for attackers. Therefore, the security expectations for cloud computing infrastructures are (or should be) arguably higher than those for traditional computing.

37  Smarter Optical Networks for a Congested Internet
By Keren Bergman and Gil Zussman
Columbia University was one of the main contributors to the development of the Internet, and since the 1980s, it has retained a leading position in the area of networking. Faculty members in the School’s Departments of Electrical Engineering and of Computer Science continue this tradition by tackling the challenges imposed by the ever-increasing amount of Internet traffic and energy consumption.

News and Alumni

40  Big Data in the Big Apple
Columbia, NYC Launch New Data Sciences Institute

42  Commencement 2012
Xerox CEO Ursula Burns MS’82, the School’s Class Day keynote speaker, urges graduates to help make the world a better place.

44  Reconnecting at Reunion

64  At Orientation, Meet an Astronaut

Departments

46  Class Notes
57  Program Notes
60  In Memoriam


Data | Columbia Engineering

Addressing the Data Deluge
By Kathleen R. McKeown and Patricia J. Culligan

As computational technology has advanced, so has the abundance of data being generated, collected, and stored in systems around the globe. Along with these technological changes, society is undergoing a dramatic transition from a “data poor” to a “data rich” environment in both scientific and business applications. The sheer abundance, complexity, and variety of the data being produced are challenging scientists and industry alike. This so-called data deluge is arising from the growth of online resources as well as direct collection enablers, including smart sensors, handheld data entry devices, and satellite technology. Smart phones, social media sites, and monitored online consumer behavior also provide new sources of data. While the data deluge continues to raise concerns about personal privacy, the possibilities to create value through the intelligent use and mining of data are enormous. If properly harnessed, data have the potential to generate knowledge that can drive innovation, create jobs, foster economic growth, transform decision-making, and develop solutions to problems of societal concern, worldwide.

Columbia’s new Institute for Data Sciences and Engineering will zero in on just that—big data and its full potential. Funded in part through an award from New York City’s Economic Development Corporation (NYCEDC), the Institute will enable engineering and applied science researchers to obtain the education, resources, and collaborations necessary to translate a data-rich environment into informational discoveries that offer tremendous potential in innovation, commercial enterprise, and workforce development. The new Institute will comprise five core centers of study and entrepreneurship, focused on New Media, Smart Cities, Health Analytics, Cybersecurity, and Financial Analytics.

A 2011 McKinsey report conveys the massive scale of big data in its assessment of the economy. Among the leading indicators are five billion mobile phones in use in 2010, and 15 out of 17 major sectors in the United States having more data stored per company than the U.S. Library of Congress. The report also states that more than 30 million networked sensor nodes are now present in the transportation, automotive, industrial, utilities, and retail sectors, while over half a billion people worldwide are using smart phones. McKinsey’s report “expects big data to rapidly become a key determinant of competition across sectors,” noting that this is exactly where the workforce will experience a gap in the coming years, with demand exceeding supply by “140,000 to 190,000 positions.” It is exactly such a diversity of challenges and opportunities that our Institute targets.

The late 1990s featured Silicon Valley Internet start-ups, Boston biotech start-ups, and Washington, D.C., “beltway bandits” supporting defense and intelligence agency needs. These companies grew organically from the needs, talent, and culture of the local environment. We envision a similar organic growth of start-ups in New York City, addressing the needs and interests of our environment. In New York, the media capital of the world, companies struggle to shift to a new digital paradigm, advertising and marketing turn online, and youth turns to new forms of social media. These changes set the stage for a focus on innovation in our New Media Center. In the New Media section of this special issue of the magazine, Professor Shih-Fu Chang of the Department of Electrical Engineering highlights cutting-edge research at Columbia in the analysis and creation of visual media, such as video, while Professor Michael Collins of Computer Science discusses advances in the analysis and creation of online language, as is carried out, for example, in machine translation. These faculty members highlight their own work as well as that of the many other faculty within the Engineering School who work on the analysis of a wide variety of media, including text, speech, image, video, and social media.

New York also faces challenges posed by an aging infrastructure, a need to improve its energy efficiency, and the potential to use data-enabled technology to help its concentrated population live more efficiently. These and other issues set the stage for a focus on data-enabled innovation in our Smart Cities Center. In the Smart Cities section, Professor Raimondo Betti of the Department of Civil Engineering and Engineering Mechanics describes research that uses advanced sensing to monitor the health of infrastructure in New York and other cities, including vital civil infrastructure such as bridges, while Professor Vijay Modi of the Department of Mechanical Engineering writes about the role of big data in increasing urban energy efficiency. Professor Kartik Chandran writes about
technology that can aid in clean water supplies. Faculty from Electrical Engineering, Earth and Environmental Engineering, Civil Engineering, Computer Science, and Mechanical Engineering, as well as researchers from the Center for Computational Learning Systems, address a wide range of problems in this area.

With a diverse population in need of health care and preventive medical interventions, national health care costs have skyrocketed. New York City’s hospitals, including those of the Columbia University Medical Center, are the most advanced in the world in their use of online patient data. Growing demand for effective health care, combined with our local talent base in this area, sets the stage for a focus on innovation in our new Health Analytics Center. In the Health Analytics section, Professor Andrew Laine of the Department of Biomedical Engineering describes research on the analysis of large data sets resulting from medical imaging, which helps to improve patient care, while Professor Chris Wiggins of Applied Physics and Applied Mathematics writes about the need for interdisciplinary research on the human genome, an endeavor that promises to provide new understanding of diseases that were previously difficult to prevent and treat. Faculty from the Morningside campus often collaborate with faculty from the Columbia University Medical campus, where biomedical informatics researchers work with clinical data to improve health care, and Professor Andrea Califano’s group works on problems in systems biology involving large genomic datasets.

With the almost immeasurable reams of data generated every minute of every day, worldwide, comes a commensurate need to keep data secure and private for its lifetime—for both institutions and individuals that rely on and generate that information. Greater research, technology, and business development in the sphere of security is critical, and we are forming a new Cybersecurity Center as a key part of the new Institute for Data Sciences and Engineering. In the Cybersecurity section, Professor Angelos Keromytis of the Department of Computer Science explains the new security challenges that arise when computation takes place in the “cloud,” a new way of supporting large data applications that is rapidly being embraced by data users around the globe. Other faculty within the Computer Science Department work on problems ranging from cryptographic theory to policy to algorithms that ensure secure systems.

And finally, as New York is the finance capital of the world, it demands technology experts and innovative new approaches to data comprehension, capture, curation, and management—with tremendous opportunities for both entrepreneurship and workforce development. The demands of the finance sector also require particular expertise and drain talent from other industries and sectors. As such, our new Financial Analytics Center will cultivate a larger talent pool and workforce, as well as the technology and applications necessary to further advance this critical sector of the New York City business community. In the Finance section, Professor Garud Iyengar of the Department of Industrial Engineering and Operations Research (IEOR) discusses new methods for analyzing data that can help with financial risk management, an important approach to avoiding the problems we have seen in the financial industry in the last few years. Other faculty within IEOR and the Computer Science Department work on related problems, as do faculty within the Columbia Business School.

To support and amplify the work of the five Institute centers, which all lie at the heart of New York City’s innovation economy, the Institute will also conduct core research on problems that cut across the data sciences and engineering. The research will focus on formal and mathematical models for data processing, as well as on issues concerning the engineering of large-scale data collection, aggregation, transmission, and processing systems. In the section on core research, Professors Keren Bergman and Gil Zussman of the Department of Electrical Engineering discuss Columbia University’s historic and ongoing contributions to the development of the Internet, highlighting new interdisciplinary research in the field of intelligent optical devices that has the potential to completely transform the network services of the future. Core research within the Institute will also focus on problems in machine learning and data analytics, collaborating with faculty across all centers to apply new techniques to the problems they are addressing.

A key focus of the Data Sciences Institute will be translational research and interaction with industry. As Brynjolfsson, Hitt, and Kim (2011) found in their survey of 179 large corporations, companies that have adopted a “data-driven” decision-making process had 5 to 6 percent greater productivity than companies that followed a more traditional “intuition and experience” approach. Borrowing from the medical field’s translational paradigm “from bench to bedside,” the new Institute will address the continuum from “data to innovation” through a program that spans from basic scientific research through to solutions and technology transfer. New educational models and products will be built to attract and train a diverse cadre of students with the talents to exploit the value of a data-rich society.

The Institute for Data Sciences and Engineering will be led by The Fu Foundation School of Engineering and Applied Science in close collaboration with seven other schools within the University: Columbia Business School, the Graduate School of Arts and Sciences, the Mailman School of Public Health, the College of Physicians and Surgeons, Columbia Journalism School, the School of International and Public Affairs, and the Graduate School of Architecture, Planning and Preservation. Through interaction with a coalition of industry and community partners, start-ups, and the NYCEDC, the Institute for Data Sciences and Engineering will form an innovation hub that can help harness the power of our data-rich society through novel research and enterprises that have local, national, and global impact.

Kathleen R. McKeown (top) is the inaugural director of Columbia’s Institute for Data Sciences and Engineering and is also the Henry and Gertrude Rothschild Professor of Computer Science at the Engineering School. A leading scholar and researcher in the field of natural language processing, McKeown focuses her research on big data; her interests include text summarization, question answering, natural language generation, multimedia explanation, digital libraries, and multilingual applications.

Institute Associate Director Patricia J. Culligan (bottom), professor of civil engineering, is a leader in the field of water resources and urban sustainability. She has worked extensively with The Earth Institute’s Urban Design Lab at Columbia University to explore novel, interdisciplinary solutions to the modern-day challenges of urbanization, with a particular emphasis on the City of New York.

Read more about the Institute’s leadership on page 41.



Smart Cities Research conducted by the Institute’s Smart Cities Center will develop and monitor green infrastructure and buildings, improve the power supply through smart grid technology, detect and counteract problems with aging urban infrastructure, calculate and communicate optimal transportation routes under congested traffic conditions, and deploy sensing devices to facilitate everyday activities in a crowded urban environment.
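One capability listed above, calculating optimal transportation routes under congested traffic conditions, reduces at its core to shortest-path search over a road graph whose edge weights reflect current congestion. The sketch below is a generic illustration using Dijkstra's algorithm; the road network and travel times are invented for the example and are not taken from any Center project.

```python
import heapq

def best_route(graph, source, target):
    """Dijkstra's algorithm over congestion-weighted travel times.

    `graph` maps a node to a list of (neighbor, minutes) pairs, where
    `minutes` already reflects current congestion on that road segment.
    Returns (total_minutes, path) for the fastest route found.
    """
    queue = [(0.0, source, [source])]  # (cost so far, node, path taken)
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for neighbor, minutes in graph.get(node, []):
            if neighbor not in seen:
                heapq.heappush(queue, (cost + minutes, neighbor, path + [neighbor]))
    return float("inf"), []

# Hypothetical micro-network: congestion makes the direct road A->B (10 min)
# slower than the two-hop detour through C (3 + 4 = 7 min).
roads = {
    "A": [("B", 10.0), ("C", 3.0)],
    "C": [("B", 4.0)],
}
cost, path = best_route(roads, "A", "B")
```

In a live system, the per-segment travel times would be refreshed continuously from the sensing devices the Center plans to deploy, so the same search runs over an always-current picture of congestion.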

Big City, Bigger Infrastructure Challenges
By Raimondo Betti

There is no doubt that the infrastructure of current and future large cities is a critical issue in our society. The importance of infrastructure to both the fabric of society and its economy is nowhere more apparent than in our urban centers. Its fragility as it ages is exemplified by incidents like the recent collapse of the I-35W Bridge in Minneapolis, but the problems are generally pervasive and less evident. One key problem: money. Budgets set aside for infrastructure maintenance and monitoring are slim, but the costs tied to them, steep. Public (federal and state) expenditures on infrastructure grew slowly (1.7 percent per year) from 1956 to 2004, and slightly more (just 2.1 percent) in recent years. The American Society of Civil Engineers (ASCE) estimates that upgrading the nation’s infrastructure system will cost $2.2 trillion over a five-year period. The Federal Highway Administration reports that the costs resulting from the loss of a critical bridge or tunnel could exceed $10 billion. ASCE estimates that Americans spend $54 billion each year on vehicle damage repairs caused by poor road conditions. Our water systems are also failing. According to the Environmental Protection Agency (EPA), there are 240,000 water main breaks per year in the nation with an estimated waste of some six billion gallons of drinking water each day. The crucial role of infrastructure as an indicator of the health of a society was eloquently described by the Civil Infrastructure System Task Group
of the National Science Foundation in its 1993 report: “The rise and fall of a civilization ultimately is linked to its ability to feed and shelter its people and to defend itself. These capabilities depend on the vitality of its infrastructure—the underlying, nearly imperceptible foundation of a society’s wealth and quality of life. A civilization that stops investing in its infrastructure takes the first step toward decline.” Today this call to action is even timelier, and advanced sensing technologies, whether for roads, highways, bridges, or water systems, are providing us with a new and improved way of addressing this need.

Increasingly, the infrastructure of the future is being envisioned as having the ability to monitor its own health through a complex network of sensors that, in real time, will be able to provide an estimate of its structural integrity and, if necessary, activate corrective actions. For example, modern bridges are built with sensor networks that can reach up to 10,000 sensors, monitoring, in real time, accelerations, deformations, tilting, temperature, wind speed, humidity levels, and more. This new trend is especially applicable, and needed, for aging infrastructure, for which continuous monitoring becomes essential for safe operation. In other engineering fields—mechanical, aerospace, and electrical engineering—such a diagnostic philosophy is common, but only recently has it found consideration in civil infrastructure applications. There are similarities with other applications to be copied here, but there are also many profound differences and challenges that must be addressed.

A key area of the research thrusts in structural monitoring will be a thorough investigation of different types of sensors for a variety of infrastructure applications (e.g., flow meters to detect leaks and blockages in pipelines, or motion sensors for the structural integrity of bridges and buildings). Of course, existing sensor technologies, where appropriate, will have first consideration, but where needed, the development of new sensors will be pursued. For example, a monitoring system for the main cables of suspension bridges, developed at Columbia University, integrates many types of sensors for corrosion, temperature, pH, and other characteristics needed to assess the condition and remaining strength of main cables, which, to date, are not readily inspected. Some of these sensors already have successful records in other applications but are not necessarily transferable to suspension bridges. The cross-disciplinary character of sensor development and the handling of large amounts of data are ideally suited to the broad-based data center focused on Smart Cities that the Engineering School intends to create.

Expeditious collection and processing of the large amounts of data require new methods of communication that will also be a subject of research. Instead of installing a new communication infrastructure, the infrastructure itself will be used to transfer information. We will investigate ad hoc wireless networks, communications over the power network, and intermittently connected networking techniques. Since large urban areas will have large numbers of sensors connected in complex networks, special attention will be given to networking techniques that process sensor readings during collection to reduce the required bandwidth. In addition, the power requirements for monitoring the widely distributed transportation, water distribution, and sewer systems of large urban areas can be mitigated by focusing on the use of low-bandwidth communications to perform infrequent “meter reading” or to summon an intermittently connected networking collection device (with mobile radio and storage nodes for water and sewer lines, or with passing trains in the subway system). The use of sensor information in assessment methodologies to evaluate the structural health of the aging infrastructure system, even when the information is less than complete for the numerical models of the specific system, will enable quick responses in keeping with the management models that will be developed.

Our Civil Engineering and Engineering Mechanics (CEEM) Department is uniquely positioned in the area of Structural Health Monitoring (SHM). Keeping its strength on the mechanics end of the spectrum, the CEEM Department has created a group in SHM comprising leading experts in research areas critical to our goal of remedying aging infrastructures. My own research and that of Professor Andrew Smyth focuses on structural health monitoring and damage assessment of different structural systems, such as bridges and buildings, while Professor George Deodatis brings to the group his expertise on uncertainty quantification. Professor Maria Feng is working on the development of innovative technologies for novel fiber optic and vision-based sensors as well as on microwave imaging technology. Professor Richard Longman, a world-renowned expert in control theory, complements the CEEM SHM team with his work on vibration-based system identification.

This strong SHM group will continue to concentrate its research initiatives, critical to our goal of remedying aging infrastructures, in the areas of evaluation of existing sensor technologies and development of new sensors; collection, processing, and communication of large amounts of data; data and sensor fusion; power requirements for large sensor networks; data interpretation and system identification; structural health monitoring and damage assessment of different infrastructure types (e.g., bridges, buildings, pipelines); and quick-response assessment. As part of the Institute’s program, we will be training a new generation of civil engineers in both the use of such tools and the development of new ones to ensure that our infrastructure, and our civilization, continues to advance.

Raimondo Betti specializes in the areas of structural dynamics and earthquake engineering. His research interests range from health monitoring of structures to analyzing corrosion in high-strength bridge wires. For the past five years, Betti has worked on the development of a state-of-the-art corrosion monitoring system to be used in the main cables of suspension bridges. Betti, who joined Columbia Engineering in 1991, is professor and chair of the School’s Department of Civil Engineering and Engineering Mechanics.
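The bandwidth-saving idea described in the article, processing sensor readings during collection so that only summaries travel over the network, can be sketched in a few lines of Python. This is purely an illustrative sketch: the window size, sample rate, and choice of statistics are assumptions for the example, not details of any Columbia monitoring system.

```python
import math

def summarize_window(samples):
    """Reduce a window of raw accelerometer readings to a small summary.

    Instead of transmitting every sample, a sensor node sends only a few
    statistics per window: one way to trade cheap on-node computation for
    scarce network bandwidth, as described in the article.
    """
    n = len(samples)
    mean = sum(samples) / n
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    return {"n": n, "mean": mean, "peak": peak, "rms": rms}

# Illustrative numbers: a 100 Hz accelerometer buffered for 10 seconds
# yields 1,000 raw samples per window, but only 4 values leave the node.
window = [0.01 * math.sin(2 * math.pi * 1.5 * t / 100.0) for t in range(1000)]
summary = summarize_window(window)
reduction = len(window) / len(summary)  # 250x fewer values transmitted
```

A real deployment would add timestamps, sensor IDs, and an alarm path that transmits the raw window whenever a statistic crosses a threshold, so engineers can inspect the full signal only when something looks wrong.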

Shifting from Resource Removal to Resource Recovery
By Kartik Chandran

B

y many accounts, about 70 to 80 percent of the world’s population will reside in cities or urban metropolitan regions by 2050. Such a localized migration will impose severe stresses on water, food, energy, and other resources unless adequately managed. Today, the model of resource utilization is that of one-time use followed by removal. For instance, water is treated using high degrees of energy and resource inputs for potable, domestic, industrial, and agricultural purposes. Worse, still, we use treated water as a convenient medium to flush away the waste products that we generate as a society. On average, in the United States, we use 100 gallons per person per day. Used water or wastewater is simply discarded into receiving water bodies with or without further treatment. Conventional wastewater treatment is increasingly required across the world. Its implementation demands more energy and resources. Ironically, in many cases, treated wastewater is much cleaner than the water bodies into which it is discharged. Thus, if we just take the “engineered”
water cycle, what we have achieved is possibly the same water, but with the input of copious energy, resources, and money . . . twice. Our society follows a similar model for yet another important resource—nitrogen, an essential nutrient that drives crucial cellular metabolic processes, including plant growth. The fixation of atmospheric nitrogen (N2) in the form of ammonia via the Haber-Bosch process was one of the most exciting developments of the previous century. However, the Haber-Bosch process remains rather energy intensive. Moreover, the overapplication of nitrogenous fertilizers around the globe has resulted in severe negative impacts on the water environment (through runoff of nitrogen). It has also affected the atmospheric environment through the emission of nitrous oxide (which is about 300 times more potent than CO2 as a greenhouse gas over a 150-year timeframe and is also an ozone-depleting substance) and nitric oxide (another ozone-depleting substance).
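The scale of this engineered water cycle follows directly from the per-capita figure above. A one-line back-of-envelope calculation, where the population value is an outside assumption (roughly New York City's), not a number from this article:

```python
# Back-of-envelope urban water throughput, using the figure of
# 100 gallons per person per day. The population is an assumption
# (approximate NYC population), not a number from the article.
GALLONS_PER_PERSON_PER_DAY = 100
city_population = 8_400_000

daily_gallons = GALLONS_PER_PERSON_PER_DAY * city_population
print(f"{daily_gallons / 1e6:.0f} million gallons per day")  # 840 million gallons per day
```

That order of magnitude, hundreds of millions of gallons per day for a large city, is what any recovery-oriented treatment scheme has to handle.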

­­ columbia engineering | 11

Sewage offers enormous potential for the recovery of resources such as smart soils, chemicals, synthetic nutrients, fertilizers, bioplastics, and alcohols, as pictured above, clockwise. Resource recovery, as opposed to resource removal, is a far superior model for future sustainable cities.

Human beings also discharge fixed nitrogen as “waste,” especially from cities, where the discharge load could be as high as 200 tonnes per day, as it is in New York City, for instance. The approach to solving this nitrogen problem has once again been to devote inordinate amounts of energy and resources to “remove” the nitrogen by converting it into benign N2 gas. Once again, we end up with N2, but by sinking in resources, energy, and money . . . twice. Therefore, the same redundant model of (1) investing resources to produce water, food, energy and (2) reinvesting resources to “remove” wastewater, food waste, or other waste is repeated over and over again. If we don’t change this model, there simply won’t be enough resources to sustain the projected human population on this planet and certainly not 80 percent of this population in highly clustered cities. Therefore, we need to switch to what I call “resource recovery,” which can be a foundation for future sustainable and smart cities. The shift from “removal” to “recovery”
has been widely welcomed by the traditional wastewater treatment industry worldwide. One excellent example of this shift is a wastewater treatment plant in Strass, Austria, which has gone from being a net power consumer to a net power producer, by combining anaerobic digestion (for carbon and energy recovery) with a novel, cost-effective and energy-efficient process for anaerobic nitrogen removal. Closer to home, researchers at Columbia have been leading similar efforts. For instance, Professor Nickolas Themelis, director of the Earth Engineering Center, has been pioneering work related to waste to energy using physical and chemical technologies. Indeed, the Waste to Energy Research and Technology Center, which Themelis founded, has been actively engaged in energy recovery from waste around the world. Researchers in my laboratories at Columbia Engineering are working more toward developing advanced biological technologies, which can provide an even more flexible platform and foundation for
energy, resource, and water recovery. For instance, we work with researchers in Strass, Austria; at DC Water; at the Hampton Roads Sanitation District in Virginia; and at the New York City Department of Environmental Protection to bring such technologies to the United States for the first time—at full scale in communities where, on average, many hundreds of millions of gallons of wastewater are produced and treated. In our project in Ghana, we are converting fecal sludge into biodiesel and methane. Currently, about two billion people across the world do not have access to sanitation, very simply because they cannot afford to build or operate massive, energy- and resource-intensive, centralized wastewater treatment systems. The development of technologies focused on resource recovery (of chemicals, energy, and fuels), such as the one in Ghana, gives such populations access to sanitation. Further, by monetizing the products, we can feed the funds generated back into the local populations for additional societal improvements. Another example of such technologies, based on microbial fermentation, was developed and implemented at pilot scale in New York City as early as 2001. These types of recovery technologies do not have to follow the centralized infrastructure model. Rather, they perform much more efficiently when used in a distributed, decentralized fashion. Several high-rises in Manhattan, for example, already recycle and reuse water. The technologies developed by my group add yet another dimension to this by recovering not only water but also chemicals and biofuels. For instance, on the ninth floor of the
Mudd Engineering Building, we have been converting food waste from the Carleton Cafeteria into chemicals since 2009. These chemicals include precursors for pharmaceuticals as well as biodiesel and microbially produced oil. However, the problem of resource recovery and sustainable cities cannot be solved by engineers alone. The involvement of other disciplines, such as public health and architecture, is crucial. So is the involvement of policy makers, who can drive regulations based not on antiquated “removal” practices but on “recovery” practices. In this framework, the work of Dean Linda Fried of Columbia’s Mailman School of Public Health on urban demographics, and that of Dean Mark Wigley of the Graduate School of Architecture, Planning and Preservation on demographics-targeted cities, is especially significant. The role of art and artists in rejuvenating local communities, as championed by Columbia’s School of the Arts Dean Carol Becker, is crucial as well. Finally, the worldwide efforts of The Earth Institute at Columbia, led by Professor Jeffrey Sachs, to thread together these different aspects to create a better planet and society are extremely important on this front. Columbia University and the Engineering School are especially well positioned to make significant breakthroughs in the field of engineered resource recovery and in reshaping and redesigning future cities that are smart and sustainable. To achieve this effectively, we will need to work together with our colleagues University-wide.


Kartik Chandran’s work on the global nitrogen cycle and engineered wastewater treatment has been widely recognized. Chandran, who is associate professor of earth and environmental engineering at Columbia Engineering, sits on the board of trustees of the Water Environment Federation. He received his PhD in environmental engineering from the University of Connecticut in 1999, and in 2009 he won an NSF CAREER award for his research on the link between engineered wastewater treatment and climate change. In addition to these accolades, he is the recipient of the WERF Paul Busch Award for water quality research. In 2011, Chandran received a grant from the Bill and Melinda Gates Foundation to develop a revolutionary new model in water and sanitation in Africa. His research has been profiled in Discovery News, Science, U.S. News & World Report, and on Deutsche Welle Radio and NY1 News, among other media outlets.


Data Columbia Engineering

NEW YORK CITY OF LIGHTS

Energy consumption at tax lot level

Smarter Cities, Smarter Retrofits By Vijay Modi

In decades to come, urban areas are poised to become both engines of growth and engines of change as we recognize the importance of efficient resource use and quality of life. To this end, a data-driven system is more likely to be adopted to make these necessary improvements, taking into account human behavior and the physical environment. When it comes to resource use, urban environments are efficient by design but are also vulnerable to the supply chains that connect cities to the world. High population densities and coastal locations also make them vulnerable to risks from extreme climatic events. Increasing resilience and reducing the environmental footprint will require a multipronged approach, with lower consumption, more efficient use, reuse of resources, and integration of the energy, materials, water, sanitation, and transport systems. A data-driven balancing act will be required if we are to accomplish this transformation while adding to the quality of life and retaining the vibrant economic landscape of the city. It is difficult to dramatically reshape the physical infrastructure of a city; change will therefore require both information and intelligence, achieved through smarter retrofits and through social and economic incentives that drive our behavior. In New York City, buildings use nearly 75 percent of all the energy consumed in the city. So clearly, the built environment is of significant interest as we seek ways to reduce energy use. We need to look both within and outside of buildings for a system-level transformation of the energy system. We can talk about the “transformation of the energy system,” but the word “system” tends to become more real when people can appreciate the need for different physical scales to interact with each other.
These scales can vary, from the interconnected power grid of the eastern United States to your city or metropolitan area, your neighborhood block, or even your building, right down to the home appliances you use. Information flow is essential for such interaction. We can learn much by looking at the behavioral aspects of energy conservation at the scale of an individual’s home. Without severe economic penalties, we, as residents, are unlikely to monitor the thermostat setting in different rooms of the house, customize it to our workday, schedules, and extracurricular activities, or change it daily in response to the weather outside. In a data-driven world, we can learn from our occupancy pattern in each room, or from the lights and appliances we turn on and off at different times of the day.


A smart device might begin to discern our very specific needs and, accordingly, adjust the settings in response to individual comfort levels and the weather, thus allowing us to conserve energy without compromising comfort. In fact, the more data a smart device can gather, the less an individual will need to be involved in actively making adjustments, and when the device makes the decisions, energy use will be cut. This idea could apply to adjusting the light and heat a window lets through. It also can apply to something as simple as knowing when the next bus or train will arrive at your desired stop, even though you are still at home. Perhaps a smart device could allow you to view specific and customized public transportation information right from your watch, so that every time you check the time, you would know the exact moment you need to leave your current location in order to catch the next bus. You may or may not want the transit system to know where you are, but there are other ways in which a data-driven system could be useful. One that allows buses to move along the road at a predictable speed could lead to better utilization of the fleet, as well as provide a vastly improved customer experience. This ultimately would encourage greater use of public transportation, another energy saver. Moving from energy conservation at the individual level to the scale of multiple buildings, a block, or a neighborhood creates opportunities for significant energy savings. For example, large college campuses are beginning to exploit this potential by deploying cogeneration plants (utilizing the heat rejected from gas-fired or municipal solid waste–fired power production) or by cost-effectively utilizing geothermal heat. Identifying opportunities for a cluster of buildings to interact with each other requires an understanding of the specific uses to which energy is put in your building and the buildings around you.
An electricity-generating power plant in your neighborhood, feeding excess power to the grid and fully utilizing the waste heat while allowing additional heat to come from the utility, requires real-time coordination. A group of buildings with a power demand large enough to justify an efficient but decentralized local natural gas–fired power plant, and able to utilize nearly all of its waste heat, could end up significantly reducing both energy demand and emissions while increasing resiliency in the electric grid. During the daytime, such a system could also replace the gas-fired plant with rooftop solar-photovoltaic power, obviating the immediate need

(Not to scale)

ESTIMATED ANNUAL ENERGY CONSUMPTION The map provides an estimate of building energy consumption (“delivered” energy as opposed to “primary” energy) throughout New York City. The estimate is specific to the weather of New York, to the particular function of each building, and to its built-up area.

(Legend scale: kWh/m² of block area, from N/A to 5,000)

A 100-watt lightbulb turned on for ten hours uses one kilowatt-hour (kWh) of energy. Data sources: New York City Department of City Planning; New York City Office of Long-Term Planning; Residential Energy Consumption Survey (RECS); Commercial Buildings Energy Consumption Survey (CBECS). Authors: Professor Vijay Modi, Bianca Howard, Shaky Sherpa. ©2012 Modi Research Group, Columbia University

The annual building energy consumption was estimated using ZIP-code-level usage data on electricity, natural gas, fuel oil, and steam consumption for the year 2009, together with building information obtained from MapPLUTO (a NYC Department of City Planning geographic database). Combining these two data sources through statistical regression, we were able to estimate annual energy usage intensities. Energy usage intensity (EUI) is annual energy consumption divided by total building floor area. These are “delivered” energy intensities, not “primary” energy intensities. The distinction is critical, since the “primary” energy used to produce electricity can vary with the type of power plant. Finally, for visualization purposes only, the energy use was normalized by the block or tax lot land area. This map was developed as part of an NSF IGERT-funded research project in the School of Engineering and Applied Science at Columbia University.
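The regression step described above can be sketched as a least-squares fit: given each ZIP code's total delivered energy and its floor area broken out by building function, solve for per-function intensities. All numbers below are invented for illustration; the actual Modi group model is more detailed.

```python
import numpy as np

# Rows: ZIP codes; columns: floor area (m^2) by building function
# (residential, office, retail). Numbers invented for illustration.
floor_area = np.array([
    [120_000,  40_000, 10_000],
    [ 80_000,  90_000, 15_000],
    [ 50_000,  20_000, 30_000],
    [200_000,  10_000,  5_000],
], dtype=float)

# Observed annual "delivered" energy per ZIP (kWh), generated here
# from assumed true intensities of 150, 220, and 300 kWh/m^2/year.
zip_energy = floor_area @ np.array([150.0, 220.0, 300.0])

# Estimate energy use intensity (kWh per m^2 per year) per function.
eui, *_ = np.linalg.lstsq(floor_area, zip_energy, rcond=None)
print(np.round(eui))  # recovers the intensities used to generate the data
```

With real, noisy utility data the fit is approximate rather than exact, and multiplying each estimated intensity by a building's floor area gives the per-building estimates shown on the map.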

This map represents an estimate of the energy consumption of the buildings on each lot or block in New York City. Buildings with larger energy consumption are shown in dark reds; those with lower consumption (not necessarily more efficient) appear in lighter shades of yellow. Values range from less than 50 kilowatt-hours (kWh) to more than 5,000 kWh per square meter of block/lot area. (A 100-watt light bulb turned on for 10 hours uses 1 kWh of energy.)

to store solar power. Smart meters can allow such coordination to occur. Considering a system as massive as the eastern U.S. power grid, an occasional shortfall in power could be managed by allowing smart meters to shut off less critical appliances. Or the shortfall could be made up by drawing power from the batteries of plug-in hybrid cars that were previously charged from the grid at times when clean energy was available. Perhaps the biggest contribution of a data-driven city will be that data will allow us to experiment at a smaller scale with different changes, learn how those changes are being accepted, and learn their consequences, all before recommending large-scale changes that would have a significant impact on a city and its inhabitants. At Columbia Engineering, we are working on creating the tools to understand the potential scale of such opportunities—pinpointing the low-hanging fruit that can allow us to make initial investments in pilot projects, and figuring out how to provide the information that will allow many different players to get involved while reducing design and deployment costs. Our efforts encompass the chain of technologies from sensors to monitoring platforms, from software and hardware to meter and manage at the decentralized level, to the design, control, and operational logic for large-scale smart grid efforts. As economist Edward Glaeser recently pointed out, while it may appear paradoxical that electronic communications are making cities more, rather than less, important, it may actually be fortuitous that this is so, since electronic communications will be the underpinnings that ensure the future vitality of the city.
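The shortfall-management idea mentioned above, smart meters curtailing less critical loads first, reduces to a simple priority rule. The device names, demands, and priorities below are hypothetical.

```python
def shed_loads(loads, shortfall_kw):
    """Turn off the lowest-priority loads (priority 1 = most critical)
    until at least shortfall_kw of demand has been shed.
    Returns the names of the curtailed loads, least critical first."""
    shed, remaining = [], shortfall_kw
    # Least critical first: highest priority number is shed first.
    for name, kw, priority in sorted(loads, key=lambda l: -l[2]):
        if remaining <= 0:
            break
        shed.append(name)
        remaining -= kw
    return shed

# Hypothetical household loads: (name, demand in kW, priority).
loads = [
    ("refrigerator", 0.2, 1),
    ("ev_charger",   7.2, 4),
    ("water_heater", 4.5, 3),
    ("hvac",         3.0, 2),
]
print(shed_loads(loads, shortfall_kw=8.0))  # ['ev_charger', 'water_heater']
```

Real demand-response programs add pricing signals, forecasts, and opt-in constraints, but the core trade-off, comfort-critical loads last, looks much like this.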

Vijay Modi is a professor of mechanical engineering at Columbia and a faculty member at the University’s Earth Institute. He earlier led the UN Millennium Project effort on the role of energy and energy services in reaching the Millennium Development Goals. Modi is a leading expert and scholar in energy sources and conversion, heat/mass transfer, and fluid mechanics. He currently works on, among other projects, low-cost smart micro-grids, the food-energy-water nexus in Indian agriculture, design and planning of energy infrastructure, and energy technologies for sustainable development. His laboratory has developed monitoring, data capture, and visualization tools such as Sharedsolar, Network Planner, and Formhub.



New Media The New Media Center will address the automated customization and targeting of online advertising, the creation of new forms of smart media to augment traditional publishing and journalism, the acquisition and analysis of data from social media, and the extraction of useful information from online multimedia, including text, speech, video, and images.

Picture This By Shih-Fu Chang

Thanks to the pervasive adoption of digital video devices and the convergence of distribution channels, the use of digital video is growing at an unprecedented rate. Online video site YouTube reports an extraordinary trend of video traffic growth—60 hours of video are uploaded every minute, and more than four billion videos are viewed every day. To compare with established media: more video is uploaded to YouTube in one month than the three major U.S. television networks created in 60 years combined. Major growth in digital video use can also be found in other areas such as surveillance, advertising, consumer media, education, and science. However, such massive growth creates not only a wealth of opportunity but also a whole set of new and challenging problems. How can we develop robust automatic techniques to extract useful structures and information from video to support intuitive search and browsing interfaces? How can the rich information of multiple modalities (audio, visual, text) and metadata associated with the video be combined to detect semantic information and topics in the video? How do we optimize the presentation of video programs for different users based on their personal interests, tasks, and device platforms? Finally, how can we track the evolution of video-related memes on social networks and understand their impact on community sentiment and social trends?

These opportunities have stimulated exciting development and research in both industry and academia. Large-scale sharing and streaming of videos have become possible in commercial services such as YouTube and Netflix. Content identification on the scale of tens of millions of items is deployed commercially for copyright infringement prevention and video program recognition. Augmented TV with content augmentation, interactive control, and personalization has been identified as a major trend that will influence business strategies and consumer experience in the next few years. In academia, digital video research has also attracted a large number of groups across the world and spurred large funding programs, including digital libraries in the 1990s and, recently, several large programs from government agencies like the Defense Advanced Research Projects Agency (DARPA) and the Intelligence Advanced Research Projects Activity (IARPA).

Columbia researchers have led this field for several decades. Starting in the late 1980s, Columbia actively led the development of video coding and retrieval standards, such as MPEG-2, MPEG-4, and MPEG-7. Columbia Engineering Professor Dimitris Anastassiou’s invention in video deinterlacing formed the foundation of Columbia’s contribution as the only university member of the consortium that developed and organized the patent licenses used in almost every digital video product and application today, such as DVDs and Internet video. My research group has developed some of the earliest video search engines that allow intuitive access at the semantic concept level and advanced content-based matching functionalities. Working with cross-disciplinary collaborators, we have demonstrated several top-performing prototype systems for searching videos in news, consumer media, biomedicine, and aerial surveillance. Columbia Engineering Professor John Kender’s group developed advanced indexing techniques and search interfaces that allow users to browse lecture videos via different modalities (visual, text, diagrams, and faces).

Shih-Fu Chang is the Richard Dicker Professor of Telecommunications, professor of electrical engineering, and professor of computer science. He has made significant contributions to multimedia search and analysis, visual communications, and media forensics. Chang is an IEEE Fellow and a Fellow of the American Association for the Advancement of Science and has served as editor in chief of IEEE Signal Processing Magazine (2006–2008) and chair of Columbia Engineering’s Department of Electrical Engineering (2007–2010). In 2011, he received the ACM SIGMM Technical Achievement Award, and in July 2012, he was appointed senior vice dean of Columbia Engineering.
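Content identification at that scale generally rests on compact per-frame fingerprints that can be matched against a large index. The difference hash below is a deliberately simplified, illustrative stand-in, not the method any commercial service or Columbia system actually uses.

```python
def dhash(pixels):
    """64-bit difference hash of an 8x9 grayscale frame: each bit
    records whether a pixel is brighter than its right neighbor."""
    bits = 0
    for row in pixels:                      # 8 rows of 9 values
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming(a, b):
    """Number of differing bits; small distances mean near-duplicates."""
    return bin(a ^ b).count("1")

# Two nearly identical frames (e.g., an original vs. a re-encoded copy
# with small pixel-level perturbations).
frame = [[(r * 9 + c) % 17 for c in range(9)] for r in range(8)]
copy = [[v + (1 if (r + c) % 7 == 0 else 0) for c, v in enumerate(row)]
        for r, row in enumerate(frame)]
print(hamming(dhash(frame), dhash(copy)))  # 0: fingerprints match anyway
```

Because the hash only keeps the *ordering* of neighboring pixels, small brightness changes leave it untouched, which is exactly the robustness a copy-detection index needs.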
Together with our Engineering colleague Professor Dan Ellis, who specializes in audio and music recognition, we are currently working with industry partners, including IBM and Raytheon BBN Technologies, to develop next-generation systems that can detect high-level events (such as complex human activities and social events) and generate intuitive recountings of the multimodal evidence found in digital video clips. In addition, my group and Columbia Engineering Professor Steven Feiner’s group are collaborating with industry partners in using digital video and virtual reality technologies to develop new applications such as digital signage, mobile search, and semantically aware augmented reality. Our research team at Columbia is privileged to have the opportunity to collaborate with colleagues from many other disciplines, including professors at the Columbia School of Journalism, Columbia Business School, the Biomedical Informatics Department, the Department of Psychology, and Teachers College. For example, we are working with the Journalism School and collaborators from Stanford University (Bernd Girod) to develop an intelligent, personalized TV system that will provide real-time personalized TV news services in a style similar to Pandora’s system for personalized radio. We are also working with Professor Paul Sajda of Biomedical Engineering to develop new image retrieval systems using hands-free brain-machine interfaces and novel ways of fusing computer vision and human vision capabilities. Beyond these topics, many long-term challenges and opportunities also arise at this revolutionary stage of digital video. The introduction of depth sensors and 3-D capturing devices, like Kinect and 3-D cameras, will fundamentally transform the user experience and the representation and processing of digital video. The increasing use of digital video in social media networks will provide valuable insight for understanding the semantic content of digital video, as well as for discovering the role of video in shaping sociopolitical trends. In the age of citizen journalism and distributed crowd sensing, digital video provides a new tool for sensing user activities, the real-world environment, and natural phenomena.
Success in extracting information and discovering knowledge from the rich multimodal sensor data in digital videos will be important for many applications such as smart living, emergency response, and social study. We are ready to tackle these challenges and continue to lead innovation in this field by building on the prior success in research and also creating new initiatives that skillfully combine the strengths of many disciplines across the University.


More Than Enough Words to Process By Michael Collins

Language is pervasive. It constitutes one of the most complex forms of human behavior and offers a rich problem domain for computational and data-driven approaches. Natural language processing (NLP) deals with the interactions between computers and human languages, often using machine learning to approach problems in text or speech. The vast amount of linguistic data now in electronic form poses significant challenges (and opportunities) in how we manage and access this data through the use of NLP technologies. Key NLP problems include how to search and browse language data effectively; how to extract useful information from language data, turning unstructured text into structured (database) representations that support a wide range of queries; how to automatically translate between languages; how to summarize single documents or collections of documents; and how to develop interfaces that allow us to search or browse speech data. Researchers at Columbia have made seminal contributions to a wide range of NLP problems. A key example of an NLP problem is parsing, which seeks to identify the underlying syntactic structure of sentences.

Parsing is central to NLP and underpins many natural language applications—for example, identifying the main verb of a sentence, identifying the arguments to the verb such as its subject and object, and identifying other relationships between words in a sentence. Syntactic structures and the parsing problem are critical ideas in both NLP and theoretical linguistics. The idea of syntactic structures, in fact, stems from Noam Chomsky’s PhD thesis work and has been a central focus of linguistics ever since. The parsing problem is relevant to a vast range of applications in NLP. It is a very challenging one because of the complexity of natural language syntax and the extraordinarily high level of ambiguity exhibited by natural languages. Dramatic progress in the accuracy of natural language parsers has been made over the past couple of decades. They are now in widespread use in NLP applications. The key reason for this success has been a shift to statistical models, which integrate techniques from machine learning together with linguistically detailed grammars. In the mid-1990s, I developed one of the earliest high-performing statistical parsers and have since created a
series of models with increasing levels of accuracy. My colleague Owen Rambow, a research scientist with the Center for Computational Learning Systems (CCLS), has made fundamental contributions in the area of tree adjoining grammar (TAG), a syntactic formalism that blends linguistic precision with powerful computational properties. Fellow CCLS research scientists Nizar Habash and Mona Diab are leaders in the area of Arabic language processing. Arabic is a challenging language from a parsing standpoint due to its rich morphology. Another key problem we are addressing in NLP is machine translation—automatic translation between human languages. This application area has also seen recent dramatic progress as a result of data-driven and statistical methods. Significant work in the early 1990s, carried out by researchers at IBM, framed translation as a statistical problem. In this approach, millions of existing translations between a pair of languages are used to leverage a translation model, for example, through automatically learned bilingual lexicons. There was a resurgence of interest in statistical machine translation in the late 1990s, and this led to second-generation statistical machine translation systems, a prime example being Google translate. While extremely successful, many of these systems make essentially no use of syntactic information in the source or target languages, with the result that translations from these systems are often ungrammatical or fail to preserve the meaning of the source language text. Our research in machine translation has focused primarily on integrating richer linguistic representations within statistical machine translation systems. This is exemplified in my recent work using syntactic information to model the differences in word order between different languages. Diab and Habash use rich morphological and semantic models for translation with Arabic as the source or target language in their research. 
Kathleen McKeown, the Henry and Gertrude Rothschild Professor of Computer Science, is developing post-editing systems in her research that make much richer use of contextual information than previous systems. As a result, a third generation of statistical machine translation systems is emerging. These systems make much deeper use of syntactic, semantic, and morphological information and give more accurate and fluent translations. Given the deluge of text and speech data in electronic form—for example, on the web, on mobile devices, and on social networks—automatic summarization of text is another key NLP application area. McKeown’s group has performed extensive research on summarization. As one example, McKeown developed the Newsblaster system, which takes a group of several news articles on the same event and produces a concise summary of those articles. Newsblaster uses several stages of linguistic processing: first, articles on the same event are identified through a clustering approach; next, the system identifies sentences across different articles that overlap in terms of content; finally, the system generates a summary that contains the most salient pieces of information. In addition to work on text, we have considerable expertise in the area of spoken
language processing. Professor Julia Hirschberg has worked in a wide range of areas concerning speech, including speech synthesis; search and summarization of speech data; and the study of intonation in speech. Hirschberg and Rebecca Passonneau, senior research scientist, have worked extensively in dialog systems, which allow a user to interact with a computer through speech. The underlying theme of all this research is the use of computational methods, in conjunction with detailed linguistic representations, and rich statistical models. There are close connections to many other research areas within and outside Columbia Engineering, in particular, machine learning and statistics. Indeed, the vast amount of text and speech data in electronic form has led to both challenges and opportunities for data-driven approaches for NLP, and the researchers at Columbia are at the forefront of research in this area.
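The three Newsblaster stages described earlier (clustering articles, finding content overlap across them, selecting salient sentences) can be caricatured with a crude word-overlap score. This sketch only gestures at the real system; the scoring rule and example articles are invented.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def summarize(articles, k=2):
    """Toy multi-document summarizer: score each sentence by how much
    of its vocabulary recurs in the *other* articles (a crude stand-in
    for Newsblaster's overlap detection), then keep the top k."""
    doc_words = [set(tokenize(a)) for a in articles]
    scored = []
    for d, article in enumerate(articles):
        for sent in re.split(r"(?<=[.!?])\s+", article.strip()):
            words = set(tokenize(sent))
            overlap = sum(len(words & other)
                          for j, other in enumerate(doc_words) if j != d)
            scored.append((overlap, sent))
    return [s for _, s in sorted(scored, reverse=True)[:k]]

articles = [
    "The storm hit the coast on Monday. Officials ordered evacuations.",
    "Evacuations were ordered as the storm hit the coast. Schools closed.",
]
for sentence in summarize(articles, k=1):
    print(sentence)
```

The sentence repeated across sources wins, which is the intuition behind overlap-based salience; the real system layers proper clustering, sentence fusion, and generation on top of this.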

Michael J. Collins, the Vikram S. Pandit Professor of Computer Science, is one of the world’s leading researchers in statistical natural language processing (NLP). He has developed parsers that have obtained unprecedented accuracy levels and have revolutionized the field of NLP. Before joining Columbia Engineering, Collins was a faculty member in the Department of Electrical Engineering and Computer Science at MIT.


­­ columbia engineering | 21

Data Columbia Engineering

Health Analytics

The Health Analytics Center will build upon the work of Columbia researchers drawn from the fields of medicine, biology, computer science, applied mathematics, and statistics. These researchers are using patient data, genomic databases, and public health records to improve patient care and to achieve greater efficiencies in public and private health care systems.

Imaging Informatics: Integrating Multimodal and Longitudinal Data for Understanding and Treatment of Disease By Andrew Laine

The rapid increase in medical imaging data poses new challenges to data analysis and management. We want to understand the information encoded in these large image data sets, and getting there requires new ways of analyzing the data. The hope is that by looking back at the outcomes of many "similar" patients, and at the effectiveness of prior treatments and therapies, we can better understand disease processes and discover new relationships and correlates that lead to a better quality of life for patients at reduced cost. In simple terms, we are building computer systems that will help doctors manage large amounts and many different kinds of patient information in order to improve their clinical decisions in treating diseases such
as cancer, heart disease, diabetes, and chronic neurological diseases like Alzheimer’s and Parkinson’s. A resulting system will analyze and display a patient’s history in text and image records, and will use these to forecast possible treatment outcomes, based on comparisons to its knowledge of similar patients and treatments. Such revolutionary tools will enable physicians to explore alternatives and share their experiences with other doctors, leading to better treatment while reducing costly mistakes. A standard 3-D medical image volume is large if compared to text documents, but the computing power available in normal desktop computers is usually sufficient for the isolated analysis of a single volume. The challenges arise when data from multiple imaging modalities are incorporated into simultaneous analysis, which is often necessary in state-of-the-art
approaches. If one of these modalities is a functional modality such as functional magnetic resonance imaging (fMRI), there is an image volume for every point in time at which an image was acquired. The step to "big data" is taken when the analysis method requires the incorporation of images from multiple subjects. A typical picture archival and communications system (PACS) database, used commonly in hospitals to manage image data, often includes images from thousands of patients. In longitudinal studies, each one of these patients can have numerous imaging studies over a long period of time. This data integration problem motivates the application of data mining approaches to images. Increasing interest among researchers at Columbia has also been directed toward combining genetic data with such imaging studies. The analysis of chronic lower respiratory disease (CLRD) is a perfect example of the data processing requirements we are facing. CLRD comprises multiple separate diseases and is the third leading cause of death in the United States. However, CLRD is currently poorly understood, partly because the capacity and sophistication available for data analysis have not been sufficient to deal with the complexity and size of all the available data. A full lung CT provides a precise evaluation of lung tissue, but current methods of analysis extract only a few simple values from each image using coarse approaches, lacking adaptivity and throwing out valuable data in the process. In collaboration with Dr. Graham Barr of the Department of Medicine at Columbia University Medical Center, Elsa Angelini PhD'02, and Yrjö Häme, Fulbright Scholar and PhD candidate in biomedical engineering, we are developing new approaches for studying disease patterns in the lung.
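One example of the "few simple values" such coarse approaches extract is the low-attenuation area percentage (LAA%): the fraction of lung voxels whose attenuation falls below a Hounsfield-unit threshold, commonly -950 HU. The sketch below computes it on a small synthetic volume; the numbers and the `laa_percent` helper are invented for illustration, not the group's actual pipeline:

```python
import numpy as np

def laa_percent(ct_volume, lung_mask, threshold_hu=-950):
    """Percentage of lung voxels below threshold_hu, a standard coarse
    emphysema index (LAA%)."""
    lung_voxels = ct_volume[lung_mask]
    if lung_voxels.size == 0:
        return 0.0
    return 100.0 * float(np.mean(lung_voxels < threshold_hu))

# Synthetic toy volume in Hounsfield units (not real patient data):
rng = np.random.default_rng(0)
vol = rng.normal(-850.0, 60.0, size=(8, 8, 8))  # healthy-ish lung tissue
vol[:2] = -1000.0                               # an emphysema-like region
mask = np.ones(vol.shape, dtype=bool)
score = laa_percent(vol, mask)  # a bit over 25%: the -1000 HU region plus noise tail
```

A single number like this discards all spatial and longitudinal structure, which is exactly the limitation the newer, adaptive approaches described here aim to overcome.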
The Emphysema and Cancer Action Program (EMCAP) data set at Columbia University has longitudinal data of around 500 subjects with pulmonary CT and MR perfusion images as well as genetic data. The total
number of image volumes for each imaging modality is around 1,200. If a system could analyze all these data simultaneously and extract relevant patterns using machine-learning approaches, our understanding of CLRD would certainly increase. We would then have a direct impact on understanding, preventing, and treating these diseases. With George Hripcsak, chair of Biomedical Informatics at Columbia, we plan to design, develop, and evaluate an investigative informatics platform. This platform will enable interactive exploration of multimodal longitudinal patient data to support improved clinical decision making in brain tumor patient management. We will do this by integrating the resources of a strong multidisciplinary team of clinicians, computer scientists, and informaticians across academia and industry. The current clinical practice of neuro-oncology, for example, requires physicians to approach brain tumor patient management as an investigative task: they create mental models of the patient, form hypotheses, draw observations from information embedded in various sources of patient data, compare disease progression and the efficacy of various treatments among patients, and adjust the course of care based on the outcomes. It is time-consuming to access and integrate all these pieces of information in patient records. This becomes more of a problem when correlating the temporal progression of various factors and biomarkers obtained from patients' clinical studies. In addition, the similarities in disease progression among different patients, and their relationships to outcomes and clinical decisions, remain difficult to discern piecemeal. Hence, there exists a gap between the heterogeneous data comprising patient records and the decision-enabling information and evidence required in managing patients.
We believe that providing physicians with an informatics platform that allows them to explore multimodal longitudinal intra- and inter-patient records, form different on-demand observations of patient information and images, and test various hypotheses about the proper course of action for a given patient can help bridge this gap. The new platform would provide tools for content analytics and temporal pattern mining; interactive exploration and visualization of information; composing global (inter-patient trends) and local (patient-specific) observations of patient information; and capturing and sharing insight and knowledge about preferred views for a condition in the form of templates. Such an investigative modeling and data-guided exploration informatics platform will allow clinicians to derive insights and examine evidence in a dynamic fashion for better decision making and enhanced patient outcomes. In the long term, we plan to evolve and expand the informatics platform to create an "information marketplace," where analytic developers can provide their specific analytic tools, and expert clinicians can provide their insight and knowledge, in the form of preferred composite views for patient observation or hypothesis testing, as a service to the clinical community—knowledge as a service.

Functional imaging in the brain is another ongoing project, which we are tackling in collaboration with faculty in the Departments of Radiology and Neuroscience and the New York State Psychiatric Institute, including Chaitan Divgi, Yaakov Stern, Ramin Parsey, and John Mann. Molecular brain imaging studies typically comprise MRI and positron emission tomography (PET) images, and over 20 PET images are acquired over time in a typical session. Each volumetric image contains over one million voxels. When co-registered with high-resolution MRI, this expands to over eight million voxels per image. Storage-wise this may not be a lot, but the processing requirements are substantial. Each voxel of the PET scan must be motion corrected, followed by kinetic analysis performed on each voxel's time series to extract many important physiological parameters (e.g., the binding affinity of a particular radioligand). Even when executed on fast computers, these sophisticated models can take up to a day to process a single brain scan. Additionally, PET data are almost always combined with MRI analysis.
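To give a flavor of per-voxel kinetic analysis, the sketch below fits a mono-exponential washout to every voxel's time-activity curve at once via log-linear least squares. Real PET kinetic models are compartmental and far more involved; this simplified stand-in, with invented numbers, only illustrates why the per-voxel workload is substantial (millions of voxels, each with its own time series):

```python
import numpy as np

def fit_washout(tac, times):
    """Fit C(t) = A * exp(-k * t) to every voxel's time-activity curve
    at once, via linear regression on log C(t). `tac` has shape
    (n_voxels, n_times); returns per-voxel arrays (A, k)."""
    logc = np.log(np.clip(tac, 1e-9, None))
    X = np.column_stack([np.ones_like(times), -times])  # log C = log A - k*t
    coef, *_ = np.linalg.lstsq(X, logc.T, rcond=None)
    return np.exp(coef[0]), coef[1]

# Three synthetic voxels sharing A = 100, k = 0.1 (invented numbers).
times = np.array([1.0, 2.0, 4.0, 8.0, 16.0])  # minutes after injection
tac = 100.0 * np.exp(-0.1 * times)[None, :].repeat(3, axis=0)
A, k = fit_washout(tac, times)  # recovers A ≈ 100, k ≈ 0.1 per voxel
```

Vectorizing the fit over all voxels, as here, is what makes whole-brain analysis feasible at all; scale the voxel axis to eight million and the day-long runtimes described above become easy to believe.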
Anatomical slices of a CT lung study (left); computer-based classification of most likely areas of lung disease based on longitudinal analysis of imaging data (right)

The MRI data, although a single-image volume, needs to be segmented, de-skulled, warped to a spatial template, and expanded to a surface representation—processes that take up to two days to complete on current high-speed computer systems. Today, databases and PACS in hospitals hold tens of thousands of MRI and PET scans. One notable example on an even
larger scale is ADNI (Alzheimer's Disease Neuroimaging Initiative), which curates brain imaging data from Alzheimer's disease studies across the United States. These data are accessible to researchers. In fact, it is not uncommon to have 400 to 500 subjects included in a single study—the amount of data typically collected by midsized research laboratories over a 10-year period. Such large sample sizes allow us to use data-driven methods, or data mining. However, to leverage the full power of data-mining methods it is necessary to load and store huge data sets in computer memory that can be accessed by large clusters of machines to perform computations on the spatio-temporal data described earlier. These computational requirements grow exponentially if multimodal information is introduced into the analysis, such as combining PET with structural (MRI) and/or functional (fMRI) imaging, genomic data, and so forth. Image analysis groups within the School's new Institute for Data Sciences and Engineering that wish to compete in this field will need access to IT staff, computer programmers, and technicians devoted to maintaining the infrastructure to process complex streams of multimodal data. A focus of Arthur Mikhno's dissertation research in the Department of Biomedical Engineering, as a fellow of the National

Institutes of Health, is to create a noninvasive PET analysis method. As it turns out, this entails segmenting the brain vasculature of each patient, which requires data enhancement to improve resolution and a new motion-correction and reconstruction routine. Although only recently introduced, these methods have shown archival data to be useful for longitudinal studies. To validate our method we are working with multiple radioligands and different patient populations. Altogether, our project will need to utilize over 800 PET data sets. Furthermore, each PET data set is paired with over 60 variables from each patient's medical record. Thus, to optimize our model we will need to perform feature selection and parameter optimization in the context of modeling hundreds of scans—yet another example of how the explosion of data is fundamentally changing the way we analyze, store, and examine medical images, and integrate patient data. Indeed, it is exciting that the recent commitment between the School and New York City could provide the infrastructure needed for us to perform high-quality retrospective research. If the resources are available, new data analysis methods can be validated on hundreds, if not thousands, of different data sets to ensure they are robust, reducing the number of studies and the time required to validate new methodologies.
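The feature selection step mentioned above can be illustrated with a greedy forward-selection toy built on ordinary least squares. The data, the variable count, and the `forward_select` helper are all synthetic stand-ins, not the study's actual methods or variables:

```python
import numpy as np

def forward_select(X, y, n_features):
    """Greedy forward feature selection with ordinary least squares:
    at each step, add the column that most reduces the residual sum
    of squares. A toy stand-in for the feature selection step above."""
    chosen, remaining = [], list(range(X.shape[1]))
    for _ in range(n_features):
        best_j, best_rss = None, np.inf
        for j in remaining:
            cols = chosen + [j]
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = np.sum((y - X[:, cols] @ beta) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        chosen.append(best_j)
        remaining.remove(best_j)
    return chosen

# Synthetic stand-in: 60 hypothetical per-patient variables, of which
# only columns 5 and 17 actually drive the outcome.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 60))
y = 3.0 * X[:, 5] - 2.0 * X[:, 17] + 0.1 * rng.normal(size=200)
picked = forward_select(X, y, 2)  # identifies the two informative columns
```

With hundreds of scans and dozens of record variables per patient, even this naive loop implies thousands of model fits, which is why the parameter-optimization step drives the computational budget.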

Andrew Laine is chair of Columbia Engineering's Department of Biomedical Engineering, Percy K. and Vida L.W. Hudson Professor of Biomedical Engineering, and professor of radiology. For the last 14 years, he has directed the Heffner Biomedical Imaging Laboratory at Columbia, leading numerous research projects focusing on quantitative image analysis, cardiac functional imaging, ultrasound and MRI, retinal imaging, intravascular imaging, and bio-signal processing. Laine has held several roles in professional societies, including vice president of publications and chair of the Technical Committee on Biomedical Imaging and Image Processing for the IEEE Engineering in Medicine & Biology Society (EMBS), and program chair for the 2006 and 2011 IEEE EMBS annual conferences.



Applying Big Data Approaches to Biological Problems By Chris Wiggins

It’s an exciting time for data in New York City.

I find myself having conversations here with people from increasingly diverse fields, both at Columbia and in local start-ups, about how their work is becoming "data-informed" or "data-driven," and about the challenges posed by applied computational statistics or "big data." In discussions with New York City journalists, physicists, or even former students now working in advertising or social media analytics, I have been struck by how many of the technical challenges and lessons learned are reminiscent of those faced in the health and biology communities over the last 15 years, when these fields experienced their own data-driven revolutions. We were wrestling with many of the problems now faced by people in other fields of research or industry. It was around then, as I was working on my PhD thesis, that sequencing technologies became sufficient to reveal the entire genomes of simple organisms and, not long thereafter, the first draft of the human genome. This advance in sequencing technologies made possible the "high throughput" quantification of, for example: the dynamic activity of all the genes in an organism; the set of all protein-to-protein interactions in an organism; or even statistical comparative genomics revealing how small differences in genotype correlate with disease or other phenotypes. These advances required formation of multidisciplinary collaborations, multi-departmental initiatives, advances in technologies for dealing with massive datasets, and advances in statistical and mathematical methods for making sense of copious natural data. This shift wasn't just a series of technological advances in biological research. The more important change was a realization that research
in which data vastly outstrip our ability to posit models is qualitatively different. Much of science for the last three centuries advanced by deriving simple models from first principles—models whose predictions could then be compared with novel experiments. In modeling complex systems for which the underlying models are not yet known but for which data are abundant, however, as in systems biology or social network analysis, one may turn this process on its head by using the data to learn not only the parameters of a single model but also which among many, or an infinite number of, competing models is favored by the data. Just over a half-decade ago, the computer scientist Jim Gray described this as a "fourth paradigm" of science, after the experimental, theoretical, and computational paradigms (Markoff, The New York Times, 2009). Gray predicted that every sector of human endeavor would soon emulate biology's example of identifying data-driven research and modeling as a distinct field. In the years since then we've seen just that; examples include data-driven social sciences (often leveraging the massive data now available through social networks) and even data-driven astronomy. I have personally enjoyed seeing many Columbia Engineering students, trained in applications of big data to biology, go on to develop and apply data-driven models in these fields. As one example, a recent Engineering PhD student spent a summer as a hackNY fellow, applying machine learning methods at a data-driven dating website, OKCupid—a New York City–based start-up. He is now applying similar methods to population genetics as a postdoctoral researcher at the University of Chicago. These students, often with job titles like data scientist, are able to translate to
other fields, or even to the "real world" of industry and technology-driven start-ups, the methods needed in biology and health for making sense of abundant natural data. In my research group, our work balances engineering goals, e.g., developing models that can make accurate quantitative predictions, with natural science goals—meaning building models that are interpretable to our biology and clinical collaborators, and that suggest to them the novel experiments most likely to reveal the workings of natural systems. For example, we've developed machine learning methods for modeling the expression of genes—the "on-off" state of the tens of thousands of individual processes human cells execute—by combining sequence data with microarray expression data. These models reveal which genes control which other genes, via what important sequence elements. We've analyzed large biological protein networks and shown how statistical signatures reveal what evolutionary laws can give rise to such graphs. In collaboration with faculty at Columbia's Department of Chemistry and New York University's medical school, we've developed hierarchical Bayesian inference methods that can automate the analysis of thousands of time series data from single molecules. These techniques can identify the best model from models of varying complexity, along with the kinetic and biophysical parameters of interest to the chemist and clinician. Our current projects include, in collaboration with experts in pathogenic viral genomics at Columbia's College of Physicians and Surgeons, using machine learning methods to reveal whether a novel viral sequence may be carcinogenic or may lead to a pandemic. This research requires an abundant corpus of training data as well as close collaboration with the domain experts to ensure that the models exploit—and are interpretable in light of—the decades of bench work that has revealed what we now know of viral pathogenic mechanisms.
Throughout, our goals balance building models that are not only predictive but interpretable, e.g., revealing which sequence elements convey carcinogenicity or permit pandemic transmissibility.
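The recurring theme of choosing among models of varying complexity can be illustrated with a simple information-criterion sketch. BIC over polynomial degrees is used here purely as a generic stand-in for the hierarchical Bayesian machinery described above; the data and helper are invented:

```python
import numpy as np

def bic_polynomial_select(x, y, max_degree=6):
    """Pick a polynomial degree by the Bayesian Information Criterion,
    BIC = n*log(RSS/n) + k*log(n); smaller is better."""
    n = len(x)
    best_deg, best_bic = 0, np.inf
    for deg in range(max_degree + 1):
        coeffs = np.polyfit(x, y, deg)
        rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
        bic = n * np.log(rss / n) + (deg + 1) * np.log(n)
        if bic < best_bic:
            best_deg, best_bic = deg, bic
    return best_deg

rng = np.random.default_rng(2)
x = np.linspace(-1.0, 1.0, 200)
y = 1.0 - 2.0 * x + 0.5 * x**2 + 0.05 * rng.normal(size=200)  # quadratic ground truth
deg = bic_polynomial_select(x, y)  # should recover a low degree near 2
```

The penalty term is what lets the data, rather than the modeler, decide how much complexity is warranted, which is exactly the "turn the process on its head" idea described earlier.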

More generally, we can apply so-called big data approaches not only to biological examples as above but also to health data and health records. These approaches offer the possibility of, for example, revealing unknown lethal drug-to-drug interactions or forecasting future patient health problems; such models could have consequences for both public health policies and individual patient care. As one example, the Heritage Health Prize is a $3 million challenge ending in April 2013 “to identify patients who will be admitted to a hospital within the next year, using historical claims data.” Researchers at Columbia, both here at the Engineering School and at Columbia’s Medical Center, are building the technologies needed for answering such big questions from big data. In 2011, the McKinsey Global Institute estimated that between 140,000 and 190,000 additional data scientists will need to be trained by 2018 in order to meet the increased demand in academia and industry in the United States alone. The multidisciplinary skills required for data science applied to such fields as health and biology will include: the computational skills needed to work with large data sets usually shared online; the ability to format these data in a way amenable to mathematical modeling; the curiosity to explore these data to identify what features our models may be built on; the technical skills that apply, extend, and validate statistical and machine learning methods; and most importantly, the ability to visualize, interpret, and communicate the resulting insights in a way that advances science. As the mathematician Richard Hamming said, “The purpose of computing is insight, not numbers.” More than a decade ago the statistician William Cleveland, then at Bell Labs, coined the term “data science” for this multidisciplinary set of skills (Cleveland, International Statistical Review, 2001) and envisioned a future in which these skills would be needed for more and more fields of technology. 
The term has had a more recent explosion in usage as a rapidly growing number of fields—both in academia and in industry—are realizing precisely this future.


Chris Wiggins is associate professor of applied mathematics at Columbia Engineering, a founding member of Columbia’s Center for Computational Biology and Bioinformatics, and cofounder of hackNY.org, a nonprofit organization that connects students with career options in New York City–based start-ups. His research centers on applications of machine learning, statistical inference, and stochastic modeling in biology.




Financial Analytics

The Financial Analytics Center will bring together experts in finance theory, machine learning, statistics, signal processing, operations, and natural language processing, and will support collaborations with students as well as with the financial industry. The result will be entrepreneurial ventures with the potential to define finance and financial engineering for the 21st century.

Data-Driven World Fuels Change in Portfolio Selection, Risk Management By Garud Iyengar

The current financial crisis has highlighted the need for a better understanding of portfolio selection and financial risk management. It is important that the wealth management industry update the models and methods it uses to invest. Many new products are now available, and the risk inherent in these products is not always apparent. And, as we saw in the recent economic crisis, a concentrated investment in certain products, e.g., credit default swaps (CDS), can spill over to cause a systemic failure of the entire financial system.

With the reduction in the cost of the information technology infrastructure, more and more financial transactions are being captured electronically. The quality of financial data is improving, and with that, the quantity of data is increasing. This data revolution is forcing a change in the way many of the problems in this field are addressed. In the past, high-quality data for calibrating models was hard to come by; the present challenge is developing methods that can adequately understand the trends in the large amount of available data to arrive at good decisions.


Analyzing financial data can be quite complex. The underlying factors driving prices and volatility can change, and consequently the data can become stale. A number of faculty members at the Engineering School are involved both in developing data-driven models for financial risk management that can exploit the available data and in creating new optimization methods that can compute optimal decisions in these new models.

One of the oldest methods for portfolio selection is the mean-variance portfolio selection model proposed by Harry Markowitz, the 1990 Nobel laureate in economics, where the goal is to compute a portfolio that has the highest mean return for a given level of risk, measured by the variance of the portfolio return. To calibrate this model, the mean return on each asset and the covariance of all pairs of assets need to be estimated. These estimates always have errors—and the optimization procedure inflates these errors. To get an intuitive feel for this phenomenon, consider two independent assets with identical return distributions. Suppose the statistical estimation procedure underestimates the mean of the first asset and overestimates the mean of the second. The portfolio selection step would then overweight the second asset; consequently, the difference between the expected return and the realized return would be worse than the initial estimation error. This phenomenon is so well known in the field that mean variance is said to lead to "error-maximizing, investment-irrelevant portfolios." The equally weighted portfolio
would have performed much better in this case. The availability of a large amount of data could, in principle, do away with this problem, because the statistical errors would then be very small. Unfortunately, financial data become stale, and therefore the error can never be reduced below a threshold. Moreover, shifts in the factors driving the market result in estimates that are extremely error prone. With Interim Dean and Avanessians Professor Donald Goldfarb, I have devised a robust portfolio selection strategy that computes the portfolio assuming the worst-case behavior of the parameters in the statistically estimated confidence region. We show that computing the optimal portfolio in this model is no harder than computing the optimal portfolio for the mean-variance model. But the performance of the portfolios on real market data is significantly superior, and the portfolios perform very well through times when the factors driving the market change drastically. We've extended this model to active portfolios and have shown that the policy consistently beats the benchmark. Returning to the two-asset example above, this new methodology yields an equally weighted portfolio with very high probability.

Meanwhile, variance, though still a good measure of variability, has recently come under severe criticism as a measure of risk because it does not adequately capture the risk associated with very low probability events with heavy losses. Examples of such occurrences include so-called tail events or "black swan" events, like a possible Greek default or a simultaneous drop in the equity market and in interest rates. Risk measures such as Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR) are better able to represent this risk. The VaR of a portfolio is typically set equal to the loss level such that the chance of observing losses larger than the VaR level is at most 5 percent or 1 percent.
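Both of these tail measures are straightforward to estimate from historical loss data. A minimal sketch follows, with an invented toy sample; CVaR here is computed as the average loss at or beyond the VaR level, a common empirical approximation:

```python
import numpy as np

def var_cvar(losses, alpha=0.95):
    """Historical VaR and CVaR at level alpha. `losses` are positive
    numbers (money lost); VaR is the empirical alpha-quantile, and
    CVaR is the average loss at or beyond that level."""
    losses = np.sort(np.asarray(losses, dtype=float))
    idx = int(np.ceil(alpha * losses.size)) - 1
    var = losses[idx]
    cvar = losses[losses >= var].mean()
    return var, cvar

# Toy loss sample: 1, 2, ..., 100 (invented numbers).
losses = np.arange(1.0, 101.0)
var95, cvar95 = var_cvar(losses, alpha=0.95)  # VaR = 95.0, CVaR = 97.5
```

The difference between the two numbers is the point of CVaR: VaR says where the tail begins, while CVaR says how bad the tail is on average once it is entered.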
CVaR is the average loss of the portfolio in the event that the losses exceed the VaR level. The CVaR is a special case of a more general measure called the spectral risk measure. It is known that the mean-spectral risk portfolio selection problem can be formulated as a linear program—an optimization problem that can be solved efficiently in theory. But in practice these programs are very large: a 100-asset portfolio selection problem can easily blow up to a linear program with 150,000 decision variables that is very hard to solve in practice. Now, with the increased availability of historical data for different market conditions and the continued unstable financial outlook, portfolio managers want to add risk measures with respect to many different scenarios for market outcomes. State-of-the-art commercial linear program solvers are unable to solve such portfolio selection problems. In collaboration with faculty at Penn State and the Department of Statistics at Columbia, we have developed a new iterative algorithm that efficiently computes solutions to very large-scale instances of the mean-spectral risk portfolio selection problem. Moreover, at each step, our algorithm computes the solution to a very simple mean-variance problem; therefore, using this algorithm, portfolio managers can solve mean-spectral risk portfolio selection problems using existing mean-variance technology.

Since the beginning of the current financial crisis, the measurement and management of the risk of systemic failure has become the central problem in risk management of business and engineering systems. A number of different ad hoc risk measures have been proposed for measuring systemic risk. Some of these measures model the economy as a portfolio of firms and then apply risk measures designed for a single firm; others define systemic risk as the insurance premium required to protect against future bailouts. With Professor Ciamac Moallemi of Columbia Business School, I have proposed an axiomatic framework for defining systemic risk.
We show that all admissible systemic risk measures can be decomposed as a risk measure across scenarios and an aggregation function that aggregates firm outcomes in each scenario. All the systemic risk measures in the literature are special cases of admissible measures. We prove that risk measures appropriate for quantifying
the systemic risk in applications as diverse as transportation networks, electricity networks, and supply chain networks can all be represented in this framework. We hope to extend this research on several fronts. On the methodological front, we want to understand how a systemic risk function can be used to incentivize firms to hold positions that reduce systemic risk. On the empirical front, we're working on identifying systemic risk factors that can explain the systemic risk in the economy. The factor identification procedure is a very large-scale convex optimization problem that uses signal-processing techniques originally introduced in the context of compressing images and videos. We expect that these factors can be used to stress test the economy, just as factors are currently used to stress test firms. It's crucial that the research we conduct now focus on creating new models to handle large-scale data and on developing algorithmic approaches that can scale up to work with such data sets. Columbia is particularly well poised to make fundamental contributions in this area. At the Engineering School, my colleagues Dean Goldfarb and Professor Daniel Bienstock are experts in developing models and algorithms for financial applications; Columbia Business School Professors Paul Glasserman, Mark Broadie, and Moallemi are experts in risk management. In addition, the Department of Statistics is making a concentrated effort to hire faculty whose main research interest is developing new algorithms for data-driven decision models.
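The two-asset estimation-error phenomenon described earlier is easy to simulate: estimate means and covariances from a finite sample of two identical independent assets, plug the estimates into the classic unconstrained mean-variance rule, and watch the weights tilt away from the correct 50/50 split. All numbers below are illustrative, and the closed-form rule stands in for a full optimizer:

```python
import numpy as np

rng = np.random.default_rng(3)
mu_true = np.array([0.10, 0.10])   # two identical, independent assets
sigma = 0.2
n_obs = 60                         # a modest estimation window

# Estimate means/covariances from a finite sample, then apply the
# classic unconstrained mean-variance rule w ~ inv(Cov) @ mu,
# normalized so the weights sum to one.
returns = rng.normal(mu_true, sigma, size=(n_obs, 2))
mu_hat = returns.mean(axis=0)
cov_hat = np.cov(returns.T)
w = np.linalg.solve(cov_hat, mu_hat)
w = w / w.sum()

# The correct allocation here is 50/50; estimation noise tilts the
# "optimal" weights toward whichever mean happened to be overestimated.
tilt = abs(w[0] - 0.5)
```

Repeating this simulation across many samples shows the tilt is systematic, not a fluke of one draw, which is the intuition behind the robust (worst-case-over-confidence-region) approach described above.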

Garud Iyengar is a professor in Columbia Engineering's Department of Industrial Engineering and Operations Research. He specializes in convex optimization, robust optimization, queuing networks, combinatorial optimization, mathematical and computational finance, and communication and information theory. He has published in numerous journals, including IEEE Transactions on Information Theory, Mathematics of Operations Research, and IEEE Transactions on Communication Theory.



Cybersecurity

The Cybersecurity Center will be dedicated to developing the capacity for keeping data secure and private throughout its lifetime, a core focus of the Institute for Data Sciences and Engineering. This center will bring together and build upon the research of the Departments of Computer Science and Electrical Engineering, and the work of the Columbia Business School, among other Columbia schools and departments.

Whither Cloud Computing Security?
By Angelos D. Keromytis

Cloud computing is rapidly becoming the new paradigm for deploying software services to business, government, and individuals, whether in public clouds, such as Amazon EC2, a web service that provides resizable compute capacity in the cloud, or in private enterprise clouds. An increasing number of critical applications are deployed and operated in such computational environments. Many of the qualitative improvements in other fields and applications—from social networking to scientific computation and mobile computing—stem, to a large extent, from the ability to quickly and efficiently deploy and scale up new services that are responsive to varying workloads.

Given the concentration of services and data, often from a large number of different entities, into a single logical (and sometimes physical) location, i.e., a data center, cloud infrastructures represent a tempting and highly lucrative target for attackers. Therefore, the security expectations from cloud computing infrastructures are (or should be) arguably higher than those in traditional computing. While there is probably some basis to the expectation that unified and concentrated administration and management will lead to better overall security, the current state of affairs regarding the (in)security of enterprise networks and systems does not inspire great confidence. In other words, given existing security mechanisms and practices, it is likely that any gains from homogeneity and higher professional standards will not be sufficient to protect the cloud infrastructure, and the applications running on it, from even more motivated adversaries. We need to rethink the security of cloud computing.

In our work at the Network Security Lab, we are introducing fresh principles and mechanisms for protecting these new computing infrastructures and applications. These new primitives should leverage the strengths inherent in this form of computing to improve security. A key characteristic of cloud computing is the overprovisioning of resources to comfortably accommodate the highest expected workload, with some margin of error. This leads to "dormant" resources, which sit unused most of the time. The argument has been made that such resources can be used, at least some of the time, to protect the cloud infrastructure and all applications on it. However, if spread across all applications, the resulting improvement in security (e.g., from enabling a security mechanism whose overhead is compensated by increased resource usage) will be incremental and small.

In our DARPA-funded MEERKATS research, we argue for a mission-oriented cloud computing security architecture that focuses resources to improve the resiliency of components critical to the current application; learns and adapts to past, current, and anticipated threats; and inherently presents an unpredictable target through continuous "motion" and mutation of services and data, and through the use of deception. The two high-level challenges to our envisioned architecture are the lack of efficient and effective mechanisms for instantiating several of the architecture's core elements, and the complexity of integrating and operating an architecture in which "everything changes." To realize our vision, we need to investigate, develop, and evaluate a number of individual components, and to integrate them into a coherent architecture. Our group at the Engineering School, which includes Computer Science Professors Salvatore Stolfo, Junfeng Yang, Roxana Geambasu, and Simha Sethumadhavan, is not starting this effort from scratch.

We are building on prior and concurrent work in the areas of software hardening and deception. In the Air Force–funded MINESTRONE project, we are developing a system for protecting software against a large class of software vulnerabilities using a combination of static analysis, dynamic instrumentation, and runtime diversification. In the DARPA-funded SPARCHS project, we are designing systems with novel, integrated security mechanisms in all layers of the software and hardware stack. Finally, in the DARPA-funded ADAMS project, we are developing active deception techniques for identifying attackers who have already succeeded in penetrating an organization's defenses, as well as malicious insiders. The next few years will see a panoply of technologies coming out of these research projects. If our effort in developing MEERKATS is successful, we will create a "moving target" defense mechanism for the cloud that leverages the cloud's inherently distributed nature to improve the resilience of services and data to threats and attacks. The ability of MEERKATS to explicitly control the tradeoff between resilience/security and resource consumption is fundamental to the adoption of security mechanisms. We believe that the inherent availability of fungible resources in the cloud, and the ability of MEERKATS to deploy them both strategically and tactically as the situation warrants (e.g., in anticipation of, or in response to, an attack against a specific service or collection of data), will result in a more secure environment than current systems and services, which make, at best, small-scale tradeoffs between security and resource consumption.
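The "moving target" idea at the heart of this line of work can be illustrated with a deliberately simple toy model. This is not MEERKATS code; every name and parameter below is hypothetical. An attacker who must dwell on the service's host for several consecutive rounds to finish an exploit fares far worse when the service keeps migrating.

```python
import random

def attacker_succeeds(hosts, dwell_needed, rounds, migrate, rng):
    """Toy moving-target model (illustrative only, not MEERKATS itself).

    The service sits on one of `hosts` machines. The attacker probes a
    random machine until it finds the service, then keeps returning to
    that machine; it must find the service `dwell_needed` rounds in a
    row to finish an exploit. Migration re-randomizes the service's
    location every round, breaking up the attacker's progress.
    """
    location = rng.randrange(hosts)
    last_hit = None
    streak = 0
    for _ in range(rounds):
        guess = last_hit if last_hit is not None else rng.randrange(hosts)
        if guess == location:
            streak += 1
            last_hit = guess
        else:
            streak = 0
            last_hit = None
        if streak >= dwell_needed:
            return True
        if migrate:
            location = rng.randrange(hosts)  # "motion": move the target
    return False

def success_rate(migrate, trials=2000):
    rng = random.Random(0)
    wins = sum(
        attacker_succeeds(hosts=8, dwell_needed=3, rounds=200,
                          migrate=migrate, rng=rng)
        for _ in range(trials)
    )
    return wins / trials

print("static placement:", success_rate(migrate=False))
print("moving target   :", success_rate(migrate=True))
```

With a static placement the attacker almost always wins (finding the service once is enough); with migration, each round of dwell time must be "re-won," so the compromise rate drops sharply. The cost, of course, is the extra resource consumption that migration implies, which is exactly the tradeoff discussed above.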

Angelos D. Keromytis is an associate professor of computer science and director of Columbia's Network Security Lab. He is an expert in systems security, network security, and cryptography. Currently, Keromytis is working on software hardening, system self-healing, network denial of service, information accountability, and privacy. He has served as an associate editor of ACM Transactions on Information and System Security (TISSEC).

The barbarians are no longer at the gates. They are inside the doors and there are not enough guards to repel them.




Core Research

To support and amplify the work of the five centers, the new Data Sciences Institute will conduct core research on problems that cut across the data sciences and engineering. This research will focus on formal and mathematical models for data processing, as well as on issues concerning the engineering of large-scale data-processing systems.

Smarter Optical Networks for a Congested Internet
By Keren Bergman and Gil Zussman

The Internet is a crucial worldwide infrastructure that connects over two billion people, offers more than seven billion web pages, transports roughly 30 exabytes of data a month, and serves over a billion mobile broadband users. Emerging network services will enable transformative applications such as 3-D holographic video for telepresence in education and telemedicine. However, realizing the future Internet requires overcoming significant technological obstacles, including rapid growth in Internet traffic and energy consumption as well as the need to support diverse applications and traffic requirements.

Internet traffic continues to grow at an exponential rate, doubling roughly every year and a half, driven by an increasing number of users, bandwidth-intensive applications such as video-on-demand, and numerous mobile and wireless platforms. Moreover, the Internet and cellular networks already account for about 1 percent of global carbon emissions, and their share is steadily increasing.
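As a quick sanity check on what that growth rate implies (a hypothetical helper, not a figure from the article): doubling every year and a half compounds to roughly two orders of magnitude per decade.

```python
def growth(years, doubling_period=1.5):
    """Traffic multiplier implied by doubling every `doubling_period` years."""
    return 2 ** (years / doubling_period)

# Doubling every 18 months compounds fast: ~100x in a decade.
for y in (3, 6, 10):
    print(y, "years ->", round(growth(y)), "x")
```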



Columbia University was one of the main contributors to the development of the Internet, and since the 1980s, Columbia has retained a leading position in the area of networking. Currently, several faculty members in the Engineering School's Departments of Electrical Engineering and of Computer Science continue this tradition by tackling challenges such as traffic growth, heterogeneous networks, mobility, quality-of-service requirements, and energy consumption constraints. Specific areas of research include data center networking (Professors Keren Bergman, Vishal Misra, and Dan Rubenstein), wireless networking (Professors Augustin Chaintreau, Nicholas Maxemchuk, Vishal Misra, Dan Rubenstein, Henning Schulzrinne, Xiaodong Wang, and Gil Zussman), optical networking (Bergman), social networking (Chaintreau), the Internet of Things and cyber-physical systems (Maxemchuk, Schulzrinne, and Zussman), smart grid (Professors Javad Lavaei, Maxemchuk, and Zussman), and future Internet protocols (Misra and Schulzrinne). The work we do in this field is highly interdisciplinary; consider, for instance, our joint work on access and aggregation networks, both optical and wireless.

While the Internet core supports very high data rates by using high-capacity links, routers, and switches, there are major bottlenecks between the core and the access/aggregation networks (i.e., the networks covering metropolitan areas). We are both members of the NSF-funded Center for Integrated Access Networks (CIAN), a 10-university consortium led by the University of Arizona. The Center's vision is to create transformative technologies for optical aggregation networks, where any application requiring any resource can be seamlessly and efficiently aggregated and interfaced with existing and future core networks at low cost and with high energy efficiency.

Recent advances in the field of optical communications provide new capabilities to optical elements. Instead of functioning as a simple bit pipe, modern devices can continuously make optical measurements of the quality of the data flowing through the links (e.g., measure the bit error rate or the optical signal-to-noise ratio). Such measurements can be made directly in the optical domain, without having to convert the signal to the electrical domain. In addition to measurement capabilities, new devices can be dynamically programmed by a network management layer and dynamically configured based on the needs of the network.

Our work focuses on leveraging these novel capabilities, and the new devices being developed by the Center's researchers, to develop the CIAN-box: an information aggregation node that uses real-time optical performance measurements and energy consumption monitoring to enable application- and impairment-aware switching, regeneration, and adaptive coding. Our groups are developing the CIAN-box hardware as well as the software and algorithms that will leverage its capabilities. Traditional networking algorithms operate disjointedly in different layers of the networking protocol stack (for example, network applications are designed separately from the routing algorithms, and the routing algorithms do not consider the type of physical medium they are using). However, because the CIAN-box can react to measurements of the optical link and adapt to traffic characteristics, there is a need for network management algorithms that span the various layers of the protocol stack.
In recent years, cross-layering has gained popularity in the wireless domain (for example, a cell phone that routes packets through a Wi-Fi network rather than a cellular network based on the channel quality of both). Bringing these ideas from the wireless to the optical domain and leveraging the CIAN-box hardware components has the potential to significantly improve performance and to turn optical networks from "dumb pipes" into intelligent networks.

A few industrial collaborations build on the emerging CIAN-box capabilities. For example, Columbia is a member of the GreenTouch industry-academia consortium, whose objective is to reduce the energy consumption of telecommunications networks and to build a sustainable Internet. Within this consortium, we are collaborating with the group of Dr. Dan Kilper at Alcatel-Lucent on a project considering the new capability to dynamically add and remove wavelengths. While this capability has the potential for significant energy savings (by turning off electrical and optical equipment when it is not needed), modifying the network on the fly can cause interference to other wavelengths sharing the same fiber. Hence, our groups are developing algorithms and techniques for dynamically adjusting transmission and amplification power so that interference is automatically mitigated and the network can rapidly adapt to required changes. Another collaboration, with the group of Dr. Peter Magill at AT&T Research, focuses on the placement of nodes (e.g., future CIAN-boxes) that can provide services such as optical signal regeneration and dynamic network reconfiguration. Within this project, several routing-constrained location problems are being considered, and algorithms that have the potential to reduce the operator's operational and deployment costs are being developed.

Finally, optical networks are increasingly used to support cellular communications. Since smartphone usage is growing rapidly, the increase in bandwidth demands at the edge of the network is putting a strain on optical backhaul networks, resulting, among other things, in high energy usage by cellular providers. In collaboration with Schulzrinne, who is Julian Clarence Levi Professor of Mathematical Methods and Computer Science and professor of electrical engineering, a prototype of the CIAN-box has been integrated with a WiMAX (4G) base station deployed at Columbia as part of the NSF Global Environment for Network Innovations (GENI) project. This optical-wireless integration aims to demonstrate dynamic switching in the optical domain based on information about the quality of the wireless channel.

Indeed, the Internet is fast moving and constantly evolving, but so are the ways in which we are tackling the challenges stemming from its ever-increasing reach and the numerous new applications it supports. By driving the design of new optical devices, by jointly developing hardware and networking algorithms, and by using cross-layering, it is possible to intelligently optimize the performance of the optical aggregation networks that will carry most of the wireline- and wireless-originated traffic on the Internet.
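The cross-layer principle described above (a physical-layer measurement steering a network-layer decision) can be sketched in a few lines. This is an illustrative toy, not CIAN-box software; the function name, topology, and BER figures are hypothetical.

```python
import heapq
from math import log

def best_path(links, src, dst):
    """Cross-layer route choice (illustrative sketch only).

    `links` maps (u, v) -> measured bit error rate (BER) on that optical
    link. Instead of hop count, we pick the route whose end-to-end
    success probability prod(1 - BER) is highest, i.e. we minimize
    sum(-log(1 - BER)) with Dijkstra's algorithm.
    """
    graph = {}
    for (u, v), ber in links.items():
        w = -log(1.0 - ber)  # additive cost from multiplicative reliability
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return path[::-1]

# The direct fiber is degraded (high measured BER), so the route
# detours through two clean links instead.
links = {
    ("A", "D"): 1e-2,  # direct but noisy
    ("A", "C"): 1e-6,
    ("C", "D"): 1e-6,
}
print(best_path(links, "A", "D"))  # -> ['A', 'C', 'D']
```

In a real system the BER values would be refreshed continuously from the optical monitors, so the preferred route changes as link quality changes, which is precisely the measurement-driven adaptivity the CIAN-box is meant to enable.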


Keren Bergman is Charles Batchelor Professor of Electrical Engineering and chair of Columbia Engineering's Department of Electrical Engineering. She directs the Lightwave Research Laboratory and leads multiple research programs on optical interconnection networks for advanced computing systems, data centers, optical packet-switched routers, and nanophotonic networks-on-chip. She is a fellow of the Institute of Electrical and Electronics Engineers (IEEE) and of the Optical Society of America (OSA), and she currently serves as co-editor-in-chief of the IEEE/OSA Journal of Optical Communications and Networking.

Gil Zussman received his PhD in electrical engineering from the Technion in 2004. Before joining Columbia Engineering, he was a postdoctoral associate at MIT. He is currently an assistant professor of electrical engineering at Columbia, and his research focuses on wireless networks. Zussman has served on the editorial boards of IEEE Transactions on Wireless Communications and of Ad Hoc Networks, and as the Technical Program Committee chair of IFIP Performance 2011. He is a corecipient of four Best Paper Awards and was a member of the team that won first place in the 2009 Vodafone Americas Foundation Wireless Innovation competition. He is a recipient of the Fulbright Fellowship, the Marie Curie Outgoing International Fellowship, the Defense Threat Reduction Agency (DTRA) Young Investigator Award, and the National Science Foundation CAREER Award.


News

Big Data in the Big Apple

Bring on the data deluge.


Photo: Eileen Barroso

Columbia University, in partnership with New York City, has launched the Institute for Data Sciences and Engineering to meet the explosive need for the acquisition and analysis of "big data." At a press conference held July 30 at Columbia's Northwest Corner Building, Mayor Michael R. Bloomberg detailed the new partnership alongside Columbia President Lee C. Bollinger, Columbia Engineering Interim Dean Donald Goldfarb, Engineering Professor and Institute Director Kathleen R. McKeown, Manhattan Borough President Scott Stringer, and other prominent City and elected officials. This was the latest step in the City's Applied Sciences NYC Initiative, announced in 2010, which aims to increase New York's capacity for applied sciences and spur commercialization and economic growth. The Columbia partnership, said Mayor Bloomberg, is expected to generate nearly $4 billion in overall economic impact and create more than 4,500 jobs over the next three decades.

In conjunction with this announcement, Dean Goldfarb appointed Engineering Professors Kathleen R. McKeown and Patricia J. Culligan as the Institute's inaugural director and associate director, respectively. As part of the deal, the City will provide Columbia with $15 million in financial assistance to help develop the Institute, and the University will contribute $80 million in private investments. The agreement also includes the creation of 44,000 square feet of new space on Columbia's campus by 2016 and the addition of 75 new faculty—in engineering and other disciplines—over the next decade and a half.

Interim Dean Donald Goldfarb with Institute Director Kathleen R. McKeown, Associate Director Patricia J. Culligan, and New York City Mayor Michael R. Bloomberg

"This is probably the most exciting moment that I can think of in the School's 150-year history, and the future has never looked brighter," said Dean Goldfarb. "Mayor Bloomberg's announcement and the City's support for the Data Sciences Institute means that the only constraint that prevents Columbia Engineering from being in the very, very top echelon of engineering schools in overall rankings—that is, our small size relative to our peers—will essentially be overcome."

President Bollinger also stressed this point in his remarks at the press conference and credited the Engineering faculty for its pioneering research, which shaped the core of the University's proposal for the new Institute. "Their acknowledged excellence has lifted the school to near the top tier over the past decade—an ascent that has been limited only by the amount of space available to do their work and a much smaller scale than our leading peers," said Bollinger. "Today the mayor and his EDC team are making it possible for us to take rapid new steps beyond the past in terms of new and improved research space and expanding the number of our faculty and students."

Columbia's Institute for Data Sciences and Engineering will focus on five key sectors: smart cities, new media, health analytics, financial analytics, and cybersecurity. The proposal for the Institute called for a rich, interdisciplinary approach, with Engineering faculty working closely—as many already do—with departments in the Arts and Sciences, Columbia University Medical Center, and the University's professional schools.

"There's every reason to believe that the Institute will produce a flood of innovations in these areas, and we expect the return on the City's investment in the Institute to be substantial," said Mayor Bloomberg, who called the agreement a "historic partnership" and said it is "by far, the largest and most far-reaching economic development effort City government has undertaken in modern memory."

Columbia will begin development of the first of two phases of the Institute immediately, creating 44,000 square feet of new applied sciences and engineering facilities in existing buildings by August 2016. In addition, Columbia will hire 30 new faculty members as part of the first phase and ultimately expects to expand the Institute faculty to 75 by 2030. As part of phase two, Columbia may expand the Institute's use of the Audubon building at the University's Medical Center in Washington Heights and, at the same time, create a 10,000-square-foot bio-research incubator in the building.

Columbia joined other prominent universities that have recently reached agreements with the City as part of its comprehensive Applied Sciences Initiative. In April, the City announced its partnership with an NYU-led consortium to build the Center for Urban Science and Progress in Downtown Brooklyn, and last December, the City created its first partnership with Cornell University and the Technion to develop a tech campus on Roosevelt Island.
These agreements will provide a major boost to the City’s economy over the next several decades. According to an economic impact analysis conducted by the New York City Economic Development Corporation (EDC), Columbia’s Institute, in particular, is expected to generate $3.9 billion in overall economic activity over the next three decades, including 4,223 permanent jobs and 285 construction jobs.

About the Institute's Leadership

KATHLEEN R. MCKEOWN
Director
Henry and Gertrude Rothschild Professor of Computer Science

A leading scholar and researcher in the field of natural language processing, McKeown focuses her research on big data, and her interests include text summarization, question answering, natural language generation, multimedia explanation, digital libraries, and multilingual applications. Her research group's Columbia Newsblaster, which has been live since 2001, is an online system that automatically tracks the day's news and demonstrates the group's new technologies for multi-document summarization, clustering, and text categorization, among others. Currently, she leads a large research project involving prediction of technology emergence from a large collection of journal articles. McKeown joined Columbia in 1982, immediately after earning her PhD from the University of Pennsylvania. In 1989, she became the first woman professor in the School to receive tenure and, later, the first woman to serve as a department chair (1998–2003). McKeown has received numerous honors and awards for her research and teaching. She received the National Science Foundation Presidential Young Investigator Award in 1985 and also is the recipient of a National Science Foundation Faculty Award for Women. She was selected as an AAAI fellow, a fellow of the Association for Computational Linguistics, and an Association for Computing Machinery (ACM) fellow. In 2010, she won both the Columbia Great Teacher Award and the Anita Borg Woman of Vision Award for Innovation. McKeown served as a board member of the Computing Research Association and as secretary of the board. She was president of the Association for Computational Linguistics in 1992, vice president in 1991, and secretary-treasurer from 1995 to 1997. She was also a member of the Executive Council of the Association for Artificial Intelligence and the co–program chair of their annual conference in 1991.

PATRICIA J. CULLIGAN
Associate Director
Professor of Civil Engineering

A leader in the field of water resources and urban sustainability, Culligan has worked extensively with The Earth Institute's Urban Design Lab at Columbia University to explore novel, interdisciplinary solutions to the modern-day challenges of urbanization, with a particular emphasis on the City of New York. Culligan is the director of a joint interdisciplinary PhD program between Columbia Engineering and the Graduate School of Architecture, Planning and Preservation that focuses on designs for future cities, including digital city scenarios. Her research group is active in investigating the opportunities for green infrastructure, social networks, and advanced measurement and sensing technologies to improve urban water, energy, and environmental management. Culligan received her MPhil and PhD from the University of Cambridge and was on the faculty at MIT before joining Columbia in 2003. She has received numerous awards for her contributions in engineering research and education, including the National Science Foundation's CAREER Award, the Egerton Career Development Chair, MIT's Arthur C. Smith Award for contributions to undergraduate life, Columbia Engineering School Alumni Association's Distinguished Faculty Award, and Columbia's Presidential Teaching Award. Culligan serves on the National Academies Nuclear and Radiation Studies Board and on the Board on Earth Sciences and Resources' Committee on Geological and Geotechnical Engineering. In 2011, she was elected to the Board of Governors of the American Society of Civil Engineers' Geo-Institute. She is the author or coauthor of six books, two book chapters, and more than 70 refereed scientific publications and 110 technical articles.


CONGRATULATIONS, GRADUATES!

Xerox CEO Ursula Burns MS'82 was this year's Class Day speaker. Burns reflected on her challenging road to Columbia, watching her mother struggle financially while remaining determined to put education first for Burns and her two siblings. "My mother made whatever sacrifices were necessary to see to it that we had an opportunity to get a good education and then she insisted that we take advantage of that opportunity. … All of [you] have that same opportunity. Don't ever take it for granted."

“The most valuable lessons we can take away from our experiences is that as Columbians, we’ve learned to work really, really incredibly hard for what we believe in.” —Judy Kim, Senior Class President

Annual faculty awards were presented to:

Gerard Ateshian, professor of mechanical engineering, Distinguished Faculty Teaching Award
Luis Gravano, associate professor of computer science, Distinguished Faculty Teaching Award
Siu-Wai Chan, professor of applied physics and applied mathematics, Avanessians Diversity Award
Aaron Kyle, lecturer in biomedical engineering design, Edward and Carole Kim Award for Faculty Involvement
Kristin Myers, assistant professor of mechanical engineering, Rodriguez Family Junior Faculty Development Award

"When your life's journey ends, I promise you that you won't care very much about the money you made or the status you've achieved if you haven't made the world a better place along the way." —Ursula Burns MS'82

More than 800 students were honored at Class Day on May 14. And for the first time, the School held a unified Class Day ceremony, recognizing all graduating students, including those who earned doctorates from the Graduate School of Arts and Sciences in engineering disciplines. Alumni celebrating the 50th anniversary of their graduation from a master's, professional, or doctorate program were recognized with an honorary certificate.

Pictured on left: Tanju Koseoglu MS'60 (left) and Leonard Krawitz MS'47, members of the inaugural class of Jubilee participants


Reconnecting at Reunion 2012

Columbia Engineering enjoyed record-breaking attendance at its annual reunion weekend, held May 31 through June 3. With a slew of events that included dinners, cocktail receptions, social outings, and academic lectures, the jam-packed weekend gave alumni—from class years ending in 2 and 7—plenty of opportunities to forge new memories with old classmates and professors.

The weekend kicked off with a welcoming dinner in Low Rotunda, at which the Columbia Engineering Alumni Association Awards were given. The Pupin Medal, Egleston Medal, and Samuel Johnson Medal were presented to three pioneers in science and engineering: Nobel Laureate and University Trustee Emeritus Dr. Harold Varmus '64SIPA, '66PS, '90HON; former Samsung Electronics President Jae-Un Chung BS'64, MS'69; and Bernard Roth MS'58, PhD'63, Adams Professor of Engineering and cofounder of the Institute of Design at Stanford University.

At the Golden Lions Dinner, held Friday at the Russian Tea Room, perhaps one of the School's most devoted alums, Bernard Queneau '32, was recognized for celebrating the 80th anniversary of his graduation from Columbia. In his speech to the Golden Lions—fellow alumni who have celebrated the 50th anniversary of their graduation—Queneau talked about his time at the School in the 1930s and about his chosen field of metallurgy. He shared his career highlights: volunteering with the U.S. Navy Reserve during World War II and then building his professional career at U.S. Steel for some 30 years, ending up as general manager of quality assurance for the steel giant. He said, "The reason for the overly long story of my life is that we are still in a steel age as well as the World Wide Web of the Internet." Queneau also expressed his gratitude to Columbia University for "the outstanding education" he received.

At the Dean's Day Luncheon for all Engineering alumni, the Class of 1962 was inducted into the Golden Lions Society and the Class of 1987 into the Silver Lions Society. In his remarks, former dean Feniosky Peña-Mora discussed the School's positive momentum. He talked about faculty recruitment, the jump in admissions figures for both the undergraduate and graduate programs, and the ongoing negotiations with the City of New York to create an Institute for Data Sciences and Engineering.

This year, the Magill Lecture in Science, Technology, and the Arts was given by Jeffrey Brock '91GSAPP of Moneo Brock Studio, lead designer of the Northwest Corner Building. He focused on the architectural design of the newly erected building as well as on parts of its construction. In addition to the Magill Lecture, Reunion attendees heard from Columbia Engineering faculty at other science and engineering lectures held over the weekend. Associate Professor of Computer Science Eitan Grinspun discussed how cinema and Hollywood use computers to animate physics, and Keren Bergman, Charles Batchelor Professor of Electrical Engineering, spoke on the topic "How Our Future Computers Will Run on Light." Engineering professors Klaus Lackner and Ken Shepard, and alumnus and NASA astronaut Mike Massimino BS'84, also delivered lectures on climate change, the biological sciences, and space exploration. Other activities over Reunion Weekend included a party for young alumni on the USS Intrepid, a tour of the Museum of Modern Art, the Chelsea Art Gallery Crawl, a tour of the new Northwest Corner Building, and cocktails and dancing on Low Plaza.

Left to right: Johnson medalist Jae-Un Chung, Egleston medalist Bernard Roth, and Pupin medalist Harold Varmus

Left to right: Lou Shrier BS'59 and his wife, Diane, with Clyde Smith BS'62 and his wife, Marvalena, at the Russian Tea Room

Top row, left to right: Bernard Queneau '32 and wife, Esther; Class of '62; bottom row, left to right: Jeffrey Brock '91GSAPP; Engineering Golden Lions

Photos by Diane Bondareff and Michael DiVito

Janneth Marcelo BS’92, who returned to campus for her 20th Reunion, enjoyed several events over the weekend with her husband, son, and twin girls. “Camp Columbia was so much fun (for the kids) that one of my daughters cried when we picked them up,” said Marcelo, who now lives and works in Washington, D.C. “I enjoyed the Saturday lunch and lecture. Seeing design diagrams of the Northwest Corner Building’s trusses brought back memories of solid mechanics class! And, it was great to catch up with old friends, especially Ami Dave BS’92, my Engineering Student Council running mate during our junior year. . . . I introduced my family to Koronet Pizza and hauled real New York City bagels home. It was a great time.”

Save the Date! Reunion Weekend May 30–June 2, 2013

Alan Czemerinski ’15 (left) with his father, Ariel Czemerinski MS’90

columbia engineering | 45


Program Notes

Materials Science Engineering

Darren Su with his wife on a trip to France

After graduating, Di-Shi (Darren) Su MS’00 joined TSMC (Taiwan Semiconductor Manufacturing Company) as a process integration engineer in wafer process development. He successfully qualified the first copper line in the 0.13µm poly/gate process. Since 2006, Darren has been a manager of foundry execution at LSI Corporation. He now has more than 12 years of experience in the semiconductor field and wafer fabrication, with an emphasis on wafer process/yield improvement, reliability and SPICE evaluations, product-based performance optimization, and customer quality solutions. Darren got married in 2006 and currently lives in Hsinchu (northern Taiwan) with his wife, Kris Chen, and their two kids.

While completing his MS degree, Kyle Teamey MS’12 was also running a start-up, Liquid Light. His company is developing a technology for converting carbon dioxide to industrial chemicals. PhD candidate Theodore Kramer MS’08, MPhil’11 is one of Liquid Light’s employees.

Mechanical Engineering

Vito Agosta PhD’59 is now in his 90th year. He is professor emeritus at what is now the Polytechnic Institute of NYU, where he held dual appointments in mechanical and aerospace engineering between 1951 and 1986. During that period, he founded a rocket laboratory and did both analytical and experimental research on solid and liquid propellant rocket engines for both government agencies and industry. Vito also founded Propulsion Sciences Incorporated and had a sole-source contract with the Applied Physics Laboratory from 1960 to 1978 on the development of the gas dynamic equations for two-phase reacting flows. These were applied to the development of a hypersonic vehicle with supersonic combustion. He also worked with JPL and participated in the design and operation of the reaction control engines for the LEM Moon Module. Vito prepared for retirement by studying the combustion of waste and alternate fuels in existing engines and boilers. Recently, FAST System Corp. was formed by some of Vito’s former students at Polytechnic, led by the corporation’s president, Mordechai Schlam, to exploit Vito’s energy patents and patent applications.

Kenneth Chen, with his wife and two sons.

Kenneth Chen MS’97, MPhil’03, PhD’07 worked as a consulting engineer at Syska Hennessy Group in New York City after graduation until June 2011. A month later, Kenneth joined Dell as a datacenter global solution architect. His engineering practice focuses mainly on datacenter engineering solutions in the U.S. and Asia. Kenneth and his wife, Maggie, married in 2006 and currently live in Brooklyn. They have two sons, Kayden, just over a year old, and Marcus, three months old.

Rimas Gulbinas MS’11 recently started his PhD studies at Virginia Tech, where he is investigating how peer networks in commercial settings behave when presented with energy consumption information. The goal is to reduce energy consumption at the workplace by influencing behavior and spreading awareness.

Alexander Potulicki MS’10 recently moved back to NYC and accepted a position at ConEdison as a systems engineer.

Ben Spitz MS’92 recently received his Certified Energy Manager credential from the Association of Energy Engineers.

In Memoriam

FACULTY

Elmer L. Gaden BS’44, MS’47, PhD’49, former professor and chair

of Chemical Engineering at Columbia Engineering and an alumnus of the School, died March 10 of congestive heart failure in Charlottesville, Va. Widely known in the field as the “father of biochemical engineering,” Gaden was 88 at the time of his death. Gaden began his research in biochemical engineering at the Engineering School, where he received three degrees in chemical engineering. His groundbreaking dissertation focused on providing the optimal amount of oxygen to allow greater fermentation energy for penicillin mold to grow and multiply more rapidly. This research formed the basis for mass production of a wide range of antibiotics, beginning with penicillin, and it was this work for which Gaden earned in 2009 the prestigious Fritz J. and Dolores H. Russ Prize, which was established jointly by the National Academy of Engineering (NAE) and Ohio University and is bestowed biennially. Gaden’s interest in harnessing biological processes to produce chemicals led him to publish widely and to found the international research journal Biotechnology and Bioengineering, which he edited for 25 years. “He was the first to develop and organize biotechnology as an engineering practice,” said Professor Sanat Kumar, chair of the Department of Chemical Engineering. “He had a very big presence at the School; he was a big influence on

the department and is the reason why we have an extremely strong biochemical presence.” Kumar met Gaden five years ago when the late professor attended an event at the School to launch a lectureship series in his name. The Elmer L. Gaden Lectureship is hosted by the Chemical Engineering Department and brings to campus leading researchers and scientists as guest speakers each fall.

A Brooklyn native, Gaden served in the U.S. Navy during World War II and spent one year as a researcher at Pfizer. The majority of his career, however, was spent in academia. He was a professor at Columbia Engineering for 25 years (from 1949 to 1974), during which time he was a teacher, researcher, department chair, and founder of the program in biochemical engineering. In 1974, he was elected to the National Academy of Engineering. That same year, Gaden was named dean of the College of Engineering, Mathematics and Business Administration at the University of Vermont. In 1979, he joined the engineering faculty at the University of Virginia as the Wills Johnson Professor of Chemical Engineering, where he remained until his retirement in 1994.

Gaden received numerous other honors and awards throughout his impressive career. In addition to the Russ Prize, considered by many the equivalent of the Nobel Prize for engineering, Gaden received the Egleston Medal for distinguished engineering achievement from Columbia in 1986, an honorary doctorate in 1987 from Rensselaer Polytechnic Institute, and, in 1988, the Founders Award from the American Institute of Chemical Engineers. He also was honored with Columbia’s Great Teacher Award for outstanding teaching.

He is survived by Jennifer, his wife of 48 years, his daughter, Barbara, and sons David and Paul. He also is survived by two grandchildren.

David L. Waltz, director of the Center for Computational Learning Systems (CCLS) at Columbia Engineering and a prominent computer scientist, passed away March 22 at a hospital in Princeton, N.J. He was 68. Waltz joined Columbia in 2003 as director of CCLS, an interdisciplinary research center established to focus on leading-edge machine learning and data mining research. CCLS colleague Roger Anderson, a senior research scholar, said he is “terribly saddened for Dave’s passing, and proud to have served under his vision, integrity, and strength of leadership.”

Waltz received his PhD from the Massachusetts Institute of Technology, where his thesis on computer vision originated the field of constraint propagation. According to the story “Early Warning for Seizures” in the Fall 2009 issue of Columbia Engineering News, he is also well known as the originator, along with former colleague Craig Stanfill, of the memory-based reasoning branch of Case-Based Reasoning. Prior to joining Columbia, Waltz was president of the NEC Research Institute in Princeton and from 1984 to 1993 served as director of Advanced Information Systems at Thinking Machines Corporation and as a professor of computer science at Brandeis University. He had also been professor of electrical and computer engineering at the University of Illinois (CSL and ECE Department) for 11 years.

Waltz served as president of AAAI (American Association for Artificial Intelligence) from 1997 to 1999 and was a fellow of AAAI and ACM (Association for Computing Machinery), a senior member of IEEE (Institute of Electrical and Electronics Engineers), and former chairman of ACM SIGART (Special Interest Group on Artificial Intelligence). Waltz served on several boards, including the Army Research Lab Technical Advisory Board and the Advisory Board of the Florida Institute for Human and Machine Cognition, the Technical Advisory Board of 4C (Cork Constraint Computation Center, Ireland), and more recently external advisory boards for Rutgers University, Carnegie Mellon University, Brown University, and EPFL (Ecole Polytechnique Fédérale de Lausanne). He was also on the Advisory Board for IEEE Intelligent Systems, the Computing Community Consortium Board of the CRA (Computing Research Association), and the NSF Computer Science Advisory Board.

ALUMNI

1932

Paul E. Queneau ME’33, ’31CC

died peacefully in Hanover, N.H., on March 31, 2012. He was 101. Paul was a decorated war veteran who fought at Normandy in World War II, held 36 U.S. patents in metallurgical and chemical engineering, earned his doctorate at age 60 from Delft University of Technology in the Netherlands, and explored the Perry River region of the Arctic in 1949. Born in Philadelphia, Paul and his family followed his father’s engineering career around the world. After graduating from Columbia Engineering and successfully persevering through the Great Depression, Paul joined International Nickel’s (INCO) alloy plant in Huntington, W.Va. In 1939, he married Joan Hodges. Paul went on to graduate from the Army Engineer School. He was deployed to Europe as part of the Corps of Engineers and spent several years battling from the Normandy beachhead to the Rhine River. He was awarded the Bronze Star, the Army Commendation Medal, and the ETO Ribbon with five battle stars. In 1945, he returned to the Army Reserve as a lieutenant colonel.

Paul’s career at INCO spanned 35 years; he retired as INCO’s vice president, technical assistant to the president, and assistant to the chairman. During that time, he and Joseph R. Boldt wrote The Winning of Nickel, still considered one of the bibles on nickel recovery and processing. During retirement, Paul earned his doctorate, then joined the faculty of Dartmouth College’s Thayer School of Engineering in 1971, where he taught for the next quarter century. He invented a number of successful industrial processes; his patents focused on extraction of nickel, copper, cobalt, and lead from their ores and concentrates. Paul was elected to the National Academy of Engineering and was a fellow and past president of The Minerals, Metals & Materials Society (TMS). He received an Evans Fellowship from Columbia University and later was awarded Columbia’s Egleston Medal. Both avid lovers of nature, he and wife, Joan, bought a farm near Cornish, N.H., where they spent their free time building ponds, making maple syrup, raising cattle, and living out his boyhood dream of being a farmer. Paul was preceded in death by his loving wife. He was the loving father of Paul B. Queneau and Josie Queneau, and devoted grandparent to six grandchildren and seven great-grandchildren. He is survived by his brother and fellow SEAS alumnus, Bernard R. Queneau, who celebrated his 100th birthday this past July.

1936

1943

Edward Buyer passed away on February 4, 2012, in Sykesville, Md., at the age of 90. Ed was an accomplished sailor, athlete, and swimmer. He graduated as the valedictorian of New Rochelle High School and flew with the 493rd Bombardment Group, Tenth Air Force, in India and Burma after graduating from Columbia. He received his MS from the Polytechnic Institute of Brooklyn and worked as an electrical engineer who helped pioneer the development of electronic reconnaissance. Ed’s wife, Marilyn, died six years ago. They had three children and seven grandchildren.

Robert William Schubert MS’48

passed away on January 5, 2012, at the age of 88 at his home in Rye, N.Y. Robert was an engineer, executive, and entrepreneur. He was a World War II veteran, naval commander, and an amateur sailor. Born in the Bronx, Robert was an honors graduate of DeWitt Clinton High School before earning his degrees in mechanical engineering from Columbia. In World War II, Robert was assigned to the LSM-441 in the Pacific Theater. He captained the first non–Red Cross ship into Nagasaki, Japan, after the detonation of the second atomic bomb. He joined Watson Laboratories as a chief mechanical engineer on the Naval Ordnance Research Computer, the fastest computer in existence in 1954. Subsequently, Robert joined IBM, where he built a 30-year career. Robert and his wife, Joan, married in 1948. They raised three children. In 1985, he married Rita, and they raised one child.

Elmer L. Knoedler PhD’52, 100,

of Davidson, N.C., died on April 4, 2012. Born in Gloucester, N.J., Elmer was preceded in death in 2003 by his wife, Mabel Dyer Knoedler, whom he married on January 15, 1966. He became a partner and senior field engineer with Shepherd T. Powell and Associates in Baltimore, where he remained until 1982. He was a past member and chairman of the Baltimore American Institute of Chemical Engineers. He was also a life member of the American Society of Mechanical Engineers. Elmer is survived by a stepson, three grandchildren, and seven great-grandchildren.

1945

Donal J. Lonergan Sr., 87, of Salisbury Township, Pa., died peacefully at Lehigh Valley Hospital on March 14, 2012. Donal spent his childhood in the Bronx, where he attended the Bronx High School of Science. He served in the U.S. Navy Seabees. His professional career took him across the nation as a civil engineer for Lehigh Structural Steel. Donal pursued varied interests, including membership in the Metropolitan Opera and working on archeological digs throughout the Middle East. He is survived by his wife of 60 years, Margaret, three



daughters, three sons, 21 grandchildren, and five great-grandchildren.

Sheldon E. Isakoff MS’47, PhD’52

passed away on January 29, 2012, at his home in Chadds Ford, Pa. Sheldon spent his professional career at DuPont, rising from research engineer when he was hired in 1951 to director of the engineering research and development division when he retired in 1990. His work at DuPont culminated in many patented developments, including the EFT Dacron and nylon processes, Mylar and Cronar process improvements, and the first Lycra plant in the world. He was a member of the National Academy of Engineering and served as president of the American Institute of Chemical Engineers (AIChE). He played a significant role in the governance of the Chemical Heritage Foundation (CHF) for nearly three decades. Sheldon had been an ardent supporter of Columbia Engineering since his student days. In 1996, he established the Sheldon E. Isakoff Scholarship in the School’s Department of Chemical Engineering. He was awarded the Alumni Association’s Egleston Medal for Distinguished Engineering Achievement in 1993.

1950

Wallace K. Grubman-Graham

passed away in Concord, N.H., on January 6, 2012. Wallace was chair and CEO of National Starch and Chemical Company and subsequently a director of Unilever PLC. In 1998, he established the Wallace K. Grubman-Graham Scholarship to support a SEAS student in chemical engineering. Wallace and his wife of 61 years, Ruth, lived for many years in London and Surrey, England. They most recently lived in Maine. Wallace is survived by his wife, two sons, and five grandchildren.

Walter Mitton ’49CC, 88, passed away on February 27, 2012, after a brief illness. With World War II in progress, Walter was drafted into the U.S. Army after attending only one semester at Columbia. He was in action with his unit on the front lines for an extended period, including the Battle of the Bulge. After the war, Walter resumed his studies and then began


his first engineering job with Curtiss Wright Corp. in aircraft engine design. He continued to study at Columbia part time for his master’s degree in mechanical engineering, where he met and married Virginia Bassford. Walter worked at Convair Astronautics as an engineer in rocket propulsion design. In 1970, he bought a cabinet shop and ran it successfully for more than 20 years. Walter was predeceased by his beloved wife. He is survived by his two children and his sisters.

1951

Klara Salamon Samuels passed

away on August 23, 2012, after a brief battle with multiple myeloma. Her friend and fellow classmate at SEAS, Elna Loscher Robbins, writes, “Klara achieved the ultimate revenge on the perpetrators of the Holocaust. She had a productive and happy personal and professional life.” As a young teenager, Klara learned English while incarcerated in the Bergen-Belsen concentration camp. She prepared for college without going to high school but was admitted to both New York University and Barnard. Klara and Elna transferred to the Engineering School at the same time and were among the first women to graduate from SEAS. Klara enjoyed a long and happy marriage to New York City native Bertram Samuels, who is also deceased. They raised two sons; both went on to become Presidential Scholars. Klara retired after a 30-year career teaching high school chemistry and physics and wrote about her life experience in a book titled God Does Play Dice, published in 1999. She gave many lectures about her life, especially about her experiences during the Holocaust. She will be missed by her family and friends. The Samuels Family created a tribute page for Klara on the Multiple Myeloma Research Foundation (MMRF) website. Charitable donations may be made to MMRF in her honor. Beno Sternlicht MS’54, 84, of

Niskayuna, N.Y., passed away May 6, 2012. Born in Nowy Sacz, Poland, Beno came to Schenectady, N.Y., from Europe in the late 1940s after escaping the Holocaust with $100 in his pocket. Most of Beno’s family did not survive the war. He and his father escaped from Nazi-held Poland, traveling through Russia, Turkey, Palestine, and Iraq before settling in India. He earned a series of degrees, including a bachelor’s in electrical engineering from Union College, and a master’s degree in applied mechanics and a PhD in energy conversion at Columbia. Following his education, he founded several companies. He had been a manager at the General Electric Company, heavily involved in guiding and financing start-up companies, leaving in 1961 to cofound Mechanical Technology Inc. (MTI), which manufactures testing and measuring instruments. Beno retired from MTI’s board of directors in 2005. He also founded Volunteers in Technical Assistance, a nonprofit organization that has provided technological and engineering assistance to developing countries for four decades. He received the Machine Design Award from the American Society of Mechanical Engineers in 1966. He held several patents and was an adviser in the Carter and Reagan administrations and chairman of the NASA Committee on Space Power and Propulsion from 1972 to 1975. He is survived by his wife of 37 years and two sons.

1952

Walter G. Berghahn ’51CC passed

away on February 7, 2012. Born in Yonkers, N.Y., to first-generation immigrant parents from Germany, Walter excelled at every level of schooling, finishing at Columbia. He married his childhood sweetheart, Martha Buckley. Walter worked for General Electric in the 1950s and 1960s, contributing to the Polaris Missile Program as well as the early space program, where he was involved in the production of the first space helmet prototype. In 1967, he joined Bristol Myers, where he attained numerous patents for specialized packaging for pharmaceuticals. Walter retired in 1986. Walter was predeceased by his wife in 2006 and by both of his brothers. He is survived by four children and six grandchildren.

Clark Loring Poland (MS, Industrial Engineering and Operations Research), 88, died on January 4,

2012, in Charlton, Mass. He grew up in Oradell, N.J., where he met his wife of 57 years, Harriet Desmond, who died in 2006. His three children and five grandchildren survive him. A veteran of World War II, Clark was a member of an Army mortar platoon in Company D, 387th Infantry Regiment. He achieved the rank of staff sergeant and participated in both the European and Japanese theaters of war. Clark was awarded the Purple Heart and Good Conduct Medals. He had a successful career in manufacturing, including many years at General Foods Corp. and The American Can Co. In retirement, he ran his own company as a distributor of Danish papermaking machinery.

1957

David E. Boyer ’56CC of West Caldwell, N.J., died peacefully on July 8, 2012. He was 78. Dave was employed by Foster Wheeler Energy Corporation for 42 years as a civil engineer and as a project manager. He served as a member of the Caldwell–West Caldwell Board of Education and the West Caldwell Planning Board. David was ordained as an elder in the Presbyterian Church when he was 20 years old. For many years, he attended the Caldwell United Methodist Church, where he was a member of the choir. He is survived by his wife of 48 years, Doreen, four children, and three grandchildren.

1962

Frank J. Affinito, a longtime resident

of Ridgefield, Conn., died on June 2, 2012, after a long illness. Born in the Bronx, Frank enlisted and proudly served in the U.S. Navy from 1952 to 1956. After receiving his degree from Columbia, Frank earned a master’s degree in electrical engineering from the University of Connecticut. Frank was employed by the Dunlap Corporation and then worked at the IBM Thomas J. Watson Research Center. Among his many professional achievements, he was involved in critical patent work regarding the computer “mouse” in the mid- to late-1980s. He is survived by two sons. His wife of 32 years, Marion Maass, died in 1991.

1968

Uriel Domb (MS, Operations Research) passed away at his home

on March 12, 2012. Uriel was the beloved husband of Elizabeth; father of Sharon Domb and Stephen Manly, and Ilana, Gabrielle, Arielle, and Michael Domb. He was the special “saba” to Jordyn and Eden Manly and brother to Daniel Domb.

1979

William D. Kennedy (MS, Mechanical Engineering) died peacefully at

Princeton Medical Center on June 23, 2012, at the age of 69. During a 46-year career with Parsons Brinckerhoff, William, a vice president and senior engineering manager, participated in the development of tunnel ventilation systems for public transit systems around the world. William and a small group of colleagues were part of a joint venture team that developed the Subway Environmental Design Handbook under contract to the U.S. Department of Transportation in the early 1970s. As part of that project, William led the development of the Subway Environment Simulation (SES) software program. In the 1980s, William and his colleagues developed the concept of platform screen doors for the Singapore mass rapid transit system that prevented heat from the subway tunnels from entering station platforms. In the 1990s, William contributed to the development of SOLVENT, a three-dimensional computational fluid dynamics (CFD) fire-ventilation program for road tunnels. Most recently, William contributed to the development of ventilation systems for projects such as the extension of the No. 7 subway line in New York, the Purple Line subway in Los Angeles, the Delhi Metro, and rail and road tunnels in Istanbul. William had been a member of the advisory board to the Columbia University Department of Mechanical Engineering since 2007. He is survived by his parents, his wife, Patricia, two daughters, and four grandchildren.

1980

Michael (Mick) Lawler MS’81, 55,

of Gouverneur, N.Y., passed away on July 16, 2012. Michael graduated in

1975 from East Syracuse Minoa High School, where he played football and lacrosse. While a student at Columbia, he was a defensive lineman on the football team and played in the first college football game held at the New York Giants’ Meadowlands Stadium. He went on to earn an MBA from Clarkson University in 1985. In 1981, Michael and his wife, Gale, moved to Gouverneur, N.Y., where he was the assistant mine superintendent of the St. Joe Minerals zinc mine in Balmat, eventually becoming president and CEO. He traveled all over the world on business and was often asked to speak at conferences and meetings related to the mining industry. Michael is survived by his wife, two children, six siblings, and several nephews and nieces.

2002

Peng Wang MS’02, MPhil’04, PhD’04, 36, died in a car accident on

February 6, 2012, on his way to work in Rhode Island. Peng was assistant professor of chemical engineering and pharmaceutical sciences at the University of Rhode Island (URI). He worked as part of a research group that focuses on the thermodynamics of mixing mechanisms of polymer drug mixtures. A promising young researcher, Peng worked with Columbia Engineering Professor Jeffrey Koberstein on the modification of polymer surfaces while at Columbia, and following a brief stint in industry, was a postdoctoral associate at both the University of Michigan and the New Jersey Institute of Technology. He is survived by his wife, Ran Luo, and their daughter, Carolyn.

Other Deaths Reported

We also have learned of the passing of the following alumni, faculty, and friends of the School:

Jeane R. Clark MS’32
Francis J. McAdam BS’36
Robert V. Close BS’37, PhD’38
Bertram Coren BS’38, PhD’39, ’37CC
Richard F. Marzari BS’40, ’39CC
Dante Bove BS’42, ’41CC
Leon N. Canick BS’42, MS’47

Richard Y. LeVine BS’42, PhD’43
Stanley A. Balter BS’43, MS’59
Roland A. Kozlik BS’43, ’43CC
Robert J. Ullmer BS’46
John Kazan BS’47
Norman K. Trozzi BS’47, MS’48
Isak Arditi MS’48
Harold A. Golle BS’48, MS’49
Eric Jenett BS’48, MS’49, ’45CC
Ken Knoernschild BS’48
Harrison B. Rhodes BS’48, MS’50, EngScD’60
Calvin H. Soldan BS’48, MS’49
Robert Schrage BS’48, MS’48, PhD’50, ’46CC
Howard J. Baker BS’49, ’49CC
William Beauchemin BS’49
John M. French MS’49
Stanley L. Johnson BS’49, ’48CC, ’55GSAS
Charles A. Kroetz BS’49
Elmo Miller BS’49
Raymond C. Daley MS’50
Harry L. Davis BS’50
Jay C. Fernandes BS’50, ’49CC
Walter R. Meth BS’50
Emil J. Schonheinz BS’50
Hugo Landerer MS’51
James T. McQueen MS’52
Marius Charlet BS’53
Ludomir T. Lazarz BS’54
Alan P. Lowenstern BS’54
Guido A. Moglia MS’54
Eugene N. Montelone MS’54
George J. Pastor BS’55
John J. Gaffney MS’56
Thomas M. Kiely MS’56
Francis H. Sullivan MS’56
Lester N. Trachtman BS’56, ’55CC
Daniel P. Bennett MS’57
Thomas Bergel BS’57
Alberto Calderaro MS’57
Houchang Handjani MS’57
Robert Podell BS’57
Roderick A. Maclennan BS’58, ’57CC
Alfred Weiss MS’59
Richard Will MS’59, ’66GSAS
Walter H. Bridges BS’60
Enno Koehn MS’60
James McDonagh MS’61
Charles E. O’Neill PhD’61
Edwin H. Taylor MS’61
Eugene S. Rocks MS’62
Robert E. Weiblen MS’62
Michael E. Zelkin BS’62
John F. Walsh MS’63
Frank J. Lupo PhD’64
Jim H. Harris MS’65
Richard R. Fyfe MS’66, EngScD’69
Jay C. Jeffes BS’67
Richard Lucek PhD’68
Robert Zincone MS’68
Kevin J. Brady EngScD’70

Wayne H. Stayton BS’70
Piara L. Qusba MS’71
Michael Vorkas BS’71
Marcel Didier Paul Desbois BS’80, MS’81, ’77CC
Stephen S. Moss BS’81, MS’85
Dumitru Nicolici MS’81
William J. Devlin MPhil’82, PhD’86, ’81GSAS
Chun-Sheng Li MS’91, MBA’92BUS
Ercan Alemdar BS’02
Jacquelin L. Craig, friend
Huston Ellis Mount, friend


We Have Liftoff!

Top row, left to right: Interim Dean Goldfarb dons beanie; first-years in Havemeyer Hall; bottom row, left to right: Mary Byers ’13; Mike Massimino BS’84, via Skype

What could be more inspiring to a first-year engineering student than receiving advice directly from a full-fledged NASA astronaut? That’s the treat first-years were in for when Michael Massimino BS’84, via Skype, addressed them during their first orientation session on August 28 in Havemeyer Hall. More than 300 members of the Class of 2016 listened intently as Massimino shared funny childhood anecdotes about his dream to one day become an astronaut. He even showed a photo of himself from one Halloween as a kid dressed in a homemade astronaut costume (revamped by his mom from an old elephant costume). Massimino’s road to NASA had its fair share of challenges, including struggling, at times, with his course load at SEAS and, later, receiving three rejection letters from NASA before being selected. To the entering class he said, “Always try to maintain a positive attitude. Whenever it gets difficult, don’t get discouraged. Fight through it and remember, there are plenty of people here who will help you at Columbia.”

Senior Mary Byers also shared a few words of wisdom. As someone who started off with an interest in mechanical engineering, but ultimately chose industrial engineering and operations research as a major, Byers encouraged her peers to take the opportunity to explore various fields, be open, and “find something to be crazy about.”

The program kicked off with a message from Interim Dean Donald Goldfarb, who also talked briefly about his own career trajectory and his varied background in chemical engineering, computer science, and industrial engineering and operations research. Dean Goldfarb urged students to make connections and to network with upperclassmen to get a better sense of the different courses and fields of study the School has to offer. David Vallancourt, a senior lecturer in Electrical Engineering and an alumnus of the School, echoed this point and also stressed that college isn’t a race. “You are here to learn,” he said, and “be present.”

At the end of his official welcome, and as a tradition, Dean Goldfarb led the new students in donning what are called the “first-year beanies.” In a centuries-old tradition, students wore the blue beanies as a symbol of their distinctive position on campus as first-years.

You Can Change Lives that Change the World

What Is Giving Day?

On October 24, for 24 hours, the entire Columbia community will join together to celebrate and give back to Columbia.

Why Give to Columbia?

Think of it as giving through Columbia, rather than giving to Columbia. Your donations can support a wide breadth of world-changing initiatives, including scientific research, student scholarships, art programs, community organizations, athletics programs, and much more. By giving through Columbia, you enable the research, students, and programs that make up Columbia’s powerful and diverse community.

Why Give on Giving Day?

On Giving Day, your donation will go further through generous matching opportunities. The largest is a $250,000 challenge, portions of which will be awarded to schools and programs based on a percentage of total dollars raised. In addition, five cash prizes, starting at $50,000, will be awarded to the top five schools with the greatest percentage of alumni contributing on Giving Day. Finally, your donation could be selected to win one of six $5,000 prizes that will go to the school or program to which you donated.

How to Get Involved

Give: On October 24, make a donation to your school or program of choice at givingday.columbia.edu.

Join: Join us for live discussions with President Lee C. Bollinger, Nobel Laureate Eric Kandel, and other Columbians making an impact. Tweet your questions during these live events and include #columbiagivingday to participate in the conversation.

Share: Share the news of your gift and tell your network about Columbia’s contributions to the world. Spread the Giving Day message, motivate fellow alumni, and raise the bar for participation at your school and across Columbia.

Your donation makes a difference. givingday.columbia.edu