Academic Publications

S. Chen, Z. Qin, Z. Wilson, B. Calaci, M. Rose, R. Evans, S. Abraham, D. Metzler, S. Tata, M. Colagrosso (2020)

S. Chen, Z. Qin, Z. Wilson, B. Calaci, M. Rose, R. Evans, S. Abraham, D. Metzler, S. Tata, and M. Colagrosso. Improving Recommendation Quality in Google Drive. In 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2020.


  @inproceedings{49272,
    title = {Improving Recommendation Quality 
      in Google Drive},
    author = {Suming Jeremiah Chen and Zhen Qin 
      and Zachary Teal Wilson and Brian Lee
      Calaci and Michael Richard Rose and Ryan
      Lee Evans and Sean Robert Abraham and
      Don Metzler and Sandeep Tata and Mike
      Colagrosso},
    year = {2020}}

Quick Access is a machine-learned system in Google Drive that predicts which files a user wants to open. Adding Quick Access recommendations to the Drive homepage cut the amount of time that users spend locating their files in half. Aggregated over the ~1 billion users of Drive, the time saved adds up to ~1000 work weeks every day. In this paper, we discuss both the challenges of iteratively improving the quality of a personal recommendation system and the variety of approaches that we took in order to improve this feature. We explored different deep network architectures, novel modeling techniques, additional data sources, and the effects of latency and biases in the UX. We share both pitfalls and successes in our attempts to improve this product, and also discuss how we scaled and managed the complexity of the system. We believe that these insights will be especially useful to those who are working with private corpora as well as those who are building a large-scale production recommendation system.

W. Kong, M. Bendersky, M. Najork, B. Vargo, M. Colagrosso (2020)

W. Kong, M. Bendersky, M. Najork, B. Vargo, and M. Colagrosso. Learning to Cluster Documents into Workspaces Using Large Scale Activity Logs. In 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2020.


  @inproceedings{49265,
    title = {Learning to Cluster Documents 
      into Workspaces Using Large Scale
      Activity Logs},
    author = {Weize Kong and Mike Bendersky 
      and Marc Najork and Brandon Vargo 
      and Mike Colagrosso},
    year = {2020},
    booktitle = {Proceedings of the 26th
      ACM SIGKDD Conference on Knowledge
      Discovery and Data Mining (KDD ’20)},
    pages = {2416--2424}}
    
    

Google Drive is widely used for managing personal and work-related documents in the cloud. To help users organize their documents in Google Drive, we develop a new feature to allow users to create a set of working files for ongoing easy access, called workspace. A workspace is a cluster of documents, but unlike a typical document cluster, it contains documents that are not only topically coherent, but are also useful in the ongoing user tasks. To alleviate the burden of creating workspaces manually, we automatically cluster documents into suggested workspaces. We go beyond the textual similarity-based unsupervised clustering paradigm and instead directly learn from users’ activity for document clustering. More specifically, we extract co-access signals (i.e., whether a user accessed two documents around the same time) to measure document relatedness. We then use a neural document similarity model that incorporates text, metadata, as well as co-access features. Since human labels are often difficult or expensive to collect, we extract weak labels based on co-access data at large scale for model training. Our offline and online experiments based on Google Drive show that (a) co-access features are very effective for document clustering; (b) our weakly supervised clustering achieves comparable or even better performance compared to the models trained with human labels; and (c) the weakly supervised method leads to better workspace suggestions that the users accept more often in the production system than baseline approaches.
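The co-access signal described in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical log of `(user_id, doc_id, unix_timestamp)` tuples and a hypothetical 10-minute window; the abstract does not say how "around the same time" was defined in the paper, and the function name `co_access_counts` is made up for illustration.

```python
from collections import Counter
from itertools import combinations


def co_access_counts(events, window_seconds=600):
    """Count how often a single user accessed both documents of a pair
    within `window_seconds` of each other.

    events: iterable of (user_id, doc_id, unix_timestamp) tuples.
    This schema and the window size are assumptions for illustration.
    """
    # Group accesses by user so that pairs are only formed within one user's history.
    by_user = {}
    for user, doc, ts in events:
        by_user.setdefault(user, []).append((ts, doc))

    counts = Counter()
    for accesses in by_user.values():
        accesses.sort()  # chronological order, so t2 >= t1 below
        for (t1, d1), (t2, d2) in combinations(accesses, 2):
            if d1 != d2 and t2 - t1 <= window_seconds:
                # Sort the pair so (a, b) and (b, a) share one key.
                counts[tuple(sorted((d1, d2)))] += 1
    return counts


events = [
    ("u1", "spec.doc", 0),
    ("u1", "budget.xls", 120),
    ("u1", "photo.jpg", 5000),
    ("u2", "spec.doc", 40),
    ("u2", "budget.xls", 90),
]
counts = co_access_counts(events)
print(counts)
```

Counts like these could serve either as an input feature or, after thresholding, as a weak label for a pair of documents; which thresholds the paper used is not stated in the abstract.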

S. Tata, V. Panait, S. Chen, M. Colagrosso (2019)

S. Tata, V. Panait, S. Chen, and M. Colagrosso. ItemSuggest: A Data Management Platform for Machine Learned Ranking Services. In CIDR, 2019.


  @inproceedings{47850,
    title = {ItemSuggest: A Data Management
      Platform for Machine Learned Ranking
      Services},
    author = {Sandeep Tata and Vlad Panait
      and Suming Jeremiah Chen and Mike
      Colagrosso},
    year = {2019},
    booktitle = {CIDR}}
        

Machine Learning (ML) is a critical component of several novel applications and intelligent features in existing applications. Recent advances in deep learning have fundamentally advanced the state-of-the-art in several areas of research and made it easier to apply ML to a wide variety of problems. However, applied ML projects in industry, where the objective is to build and improve a production feature that uses ML, continue to be complicated and are often bottlenecked by data management challenges. In this paper, we describe the design and implementation of a machine learning platform for building learned ranking services that leverages key ideas from data management. The platform allows engineers to focus on application-specific modeling and simplifies key tasks of 1) gathering training data, 2) cleaning, validating, and monitoring data quality, 3) training and evaluating models, 4) feature lifecycle management, and 5) infrastructure for A/B tests. We describe key design choices anchored around the core idea of optimizing for experiment velocity. We describe lessons learned from applications built on this platform that have been in production serving hundreds of millions of users for over a year. Finally, we identify two key components of the platform where data management research can have a major impact. We believe such platforms have the potential to accelerate and simplify ML applications the same way data warehouses radically simplified complex reporting applications.

S. Tata, A. Popescul, M. Najork, M. Colagrosso, J. Gibbons, A. Green, A. Mah, M. Smith, D. Garg, C. Meyer, R. Kan (2017)

S. Tata, A. Popescul, M. Najork, M. Colagrosso, J. Gibbons, A. Green, A. Mah, M. Smith, D. Garg, C. Meyer, and R. Kan. Quick Access: Building a Smart Experience for Google Drive. In Proc. of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017.


  @inproceedings{46184,
    title = {Quick Access: Building a Smart
    Experience for Google Drive},
    author = {Sandeep Tata and Alexandrin
      Popescul and Marc Najork and Mike
      Colagrosso and Julian Gibbons and
      Alan Green and Alexandre Mah and
      Michael James Smith and Divanshu Garg
      and Cayden Meyer and Reuben Kan},
    year = {2017},
    booktitle = {Proc. of the 23rd ACM SIGKDD
      International Conference on Knowledge
      Discovery and Data Mining},
    pages = {1643--1651}}
      

Google Drive is a cloud storage and collaboration service used by hundreds of millions of users around the world. Quick Access is a new feature in Google Drive that surfaces the relevant documents to the user on the home page. We describe the development of a machine-learned service behind this feature. Our metrics show that this feature cuts the time it takes for users to locate their documents in half. The development of this product feature is an illustration of a number of more general challenges and constraints associated with machine learning product deployment such as dealing with private corpora and protecting user privacy, working with data services that are not designed with machine-learning in mind and may be owned and operated by different teams with different constraints, and evolving product definitions which inform the metric being optimized. We believe that the lessons learned from this experience will be useful to practitioners tackling a wide range of applied machine-learning problems.

C. Doerr, M. Colagrosso, D. Grunwald, and D. Sicker (2008)

C. Doerr, M. Colagrosso, D. Grunwald, and D. Sicker. Scalability of Cognitive Radio Control Algorithms. In IEEE International Symposium on Wireless Pervasive Computing (ISWPC), pages 685–692, Santorini, Greece, 2008.

  
@inproceedings{Doerr08,
  Address = {Santorini, Greece},
  Author = {C. Doerr and M. Colagrosso and
    D. Grunwald and D. Sicker},
  Booktitle = {Proceedings of the IEEE
    International Symposium on Wireless
    Pervasive Computing (ISWPC)},
  Pages = {685--692},
  Title = {Scalability of Cognitive Radio
    Control Algorithms},
  Year = {2008}}
  

Cognitive radios with their intelligent mechanisms of avoiding interference and their ability to automatically discover and utilize unoccupied spectrum have frequently been proposed for pervasive wireless deployments.

In recent years, a variety of control algorithms managing such cognitive radios have been developed, following many different types of strategies and implementing the whole range from centralized control to distributed control principles. As future wireless deployments are expected to be very dense, the ability of a cognitive radio control algorithm to scale well to large network sizes becomes crucial.

In this paper, we review and discuss a wide variety of previously proposed algorithms for managing cognitive radio networks. For each of these algorithms, we theoretically analyze performance in pervasive, dense deployments and compare their ability to scale.

S. Kurkowski, T. Camp, and M. Colagrosso (2008)

S. Kurkowski, T. Camp, and M. Colagrosso. A visualization and analysis tool for NS-2 wireless simulations: iNSpect. ACM's Mobile Computing and Communications Review, 2008.

  
@article{Kurkowski08a,
  Author = {S. Kurkowski and T. Camp 
    and M. Colagrosso},
  Journal = {ACM's Mobile Computing and 
    Communications Review},
  Title = {A Visualization and Analysis Tool 
    for Wireless Simulations: {iNSpect}},
  Year = {2008}}
  

Simulation is an important tool for wireless ad hoc network research. As simulation complexity increases, tools are needed to analyze volumes of output. In this paper, we discuss a new visualization and analysis tool for wireless simulations, iNSpect. Visual analysis is important for at least three areas of simulation research: (1) verifying the accuracy of a mobility model and the node topologies used to drive the simulation; (2) improving confidence in the correctness of a simulator's models; and (3) analysis of simulation results. We have used iNSpect to improve our research in all of these areas. iNSpect works with any simulator or testbed that can output in iNSpect's format, including NS-2. We have made iNSpect publicly available in order to improve the accuracy of simulation research in our community.

K. Stone and M. Colagrosso (2007)

K. Stone and M. Colagrosso. Efficient duty cycling through prediction and sampling in wireless sensor networks. Wireless Communications and Mobile Computing. Special issue on Cognitive Radio, Software Defined Radio and Adaptive Wireless Systems, vol. 7, num. 9 (2007) 1087-1102.

    
@article{Stone07a,
  Author = {K. Stone and M. Colagrosso},
  Journal = {Wireless Communications and
    Mobile Computing},
  Title = {Efficient duty cycling through
    prediction and sampling in wireless
    sensor networks},
  Volume = {7},
  Number = {9},
  Pages = {1087--1102},
  Year = {2007}}
    

We present BoostMAC, a CSMA-based MAC layer protocol for wireless sensor networks that provides an adjustable interface to achieve ultra-low power operation. To reach low power operation, we adaptively set two radio parameters. First, BoostMAC implements a preamble sampling scheme that allows a mote to dynamically set the length of its duty cycles. Second, we apply machine learning such that a sender can predict its destination’s channel polling time and set each outgoing packet’s preamble length accordingly. Our two improvements require no extra communication overhead, and each mote in the network implements simple, local models. Energy conservation is one of the most fundamental problems in sensor networks, and by optimizing low level operation at the MAC layer, sensor network applications realize an increase in overall performance.

Static configuration at the MAC layer inhibits optimal energy conservation within habitat monitoring applications. We implemented BoostMAC in TOSSIM, and we compare its performance to B-MAC, a well-known low power MAC protocol with static behavior. We found that BoostMAC saves energy relative to B-MAC in bursty networks without affecting latency.

M. Colagrosso, W. Simmons, and M. Graham (2006)

M. Colagrosso, W. Simmons, and M. Graham. Demo abstract: Simple sensor syndication. In SenSys '06: Proceedings of the 4th international conference on Embedded networked sensor systems, pages 377–378, Boulder, CO, 2006.

   
@inproceedings{Colagrosso06b,
  Author = {M. Colagrosso and W. Simmons 
    and M. Graham},
  Booktitle = {SenSys '06: Proceedings of the
    4th international conference on Embedded
    networked sensor systems},
  Title = {Demo Abstract: Simple
    Sensor Syndication},
  Pages = {377--378},
  Year = {2006}}
   

We create a publish/subscribe programming model for wireless sensor networks using RSS feeds, and we call it Simple Sensor Syndication (SSS). This project establishes a new way to make WSNs more interactive, through which a scientist can get interesting sensor data delivered online.

With SSS, scientists using a sensor network specify events that they are interested in, such as when the temperature gets too cold or when the lights go out. They specify these events by writing short sections of code in Python that serve as function detectors. The sensor network evaluates these function detectors periodically over regions of interest, and it records these events for the scientist. Finally, the network generates an RSS feed that is updated every time an event occurs.

We are inspired by related work that makes sensor networks easier to program by using a spreadsheet approach. Like that work, we aim to simplify the process of programming sensor networks, and our approach is to provide high-level Python constructs for regions and events that scientists can create and then manipulate through a web interface. We argue that RSS feeds are lightweight and appropriate for delivering interesting events that occur on the time-scale of minutes or hours over the web.
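As a rough illustration of the workflow described above, the sketch below pairs a short Python detector function with a generated RSS feed. It is not the SSS implementation: the reading format, the `too_cold` detector, the `build_feed` helper, and the example.com link are all hypothetical, and only the standard-library `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET


def too_cold(reading, threshold_c=0.0):
    """Hypothetical event detector: true when a reading falls below the threshold."""
    return reading["temperature_c"] < threshold_c


def build_feed(readings, detector, title="Sensor events"):
    """Evaluate `detector` over the readings and emit one RSS item per detected event."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = "http://example.com/sensors"  # placeholder URL
    ET.SubElement(channel, "description").text = "Events detected by the sensor network"
    for reading in readings:
        if detector(reading):
            item = ET.SubElement(channel, "item")
            ET.SubElement(item, "title").text = f"{detector.__name__} in {reading['region']}"
            ET.SubElement(item, "description").text = (
                f"temperature_c={reading['temperature_c']}"
            )
    return ET.tostring(rss, encoding="unicode")


# Hypothetical readings, one per region of interest.
readings = [
    {"region": "greenhouse", "temperature_c": 4.5},
    {"region": "north-field", "temperature_c": -2.0},
]
feed = build_feed(readings, too_cold)
print(feed)
```

In a deployment along the lines the abstract describes, the detector would be evaluated periodically and the feed regenerated whenever a new event is recorded; the scheduling is omitted here.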

M. Colagrosso (2006)

M. Colagrosso. Intelligent broadcasting in mobile ad hoc networks: Three classes of adaptive protocols. EURASIP Journal on Wireless Communications and Networking, 2006.

    
@article{Colagrosso06a,
  Author = {M. Colagrosso},
  Journal = {EURASIP Journal on Wireless
    Communications and Networking},
  Title = {Intelligent Broadcasting in Mobile
    Ad Hoc Networks: Three Classes of
    Adaptive Protocols},
  Year = {2006}}
    

Because adaptability greatly improves the performance of a broadcast protocol, we identify three ways in which machine learning can be applied to broadcasting in a mobile ad hoc network (MANET). We chose broadcasting because it functions as a foundation of MANET communication. Unicast, multicast, and geocast protocols utilize broadcasting as a building block, providing important control and route establishment functionality. Therefore, any improvements to the process of broadcasting can be immediately realized by higher-level MANET functionality and applications. While efficient broadcast protocols have been proposed, no single broadcasting protocol works well in all possible MANET conditions. Furthermore, protocols tend to fail catastrophically in severe network environments.

Our three classes of adaptive protocols are Pure Machine Learning, Intra-Protocol Learning, and Inter-Protocol Learning. In the pure machine learning approach, we exhibit a new approach to the design of a broadcast protocol: the decision of whether to rebroadcast a packet is cast as a classification problem. Each mobile node (MN) builds a classifier and trains it on data collected from the network environment.

K. Hellman and M. Colagrosso (2006)

K. Hellman and M. Colagrosso. Investigating a wireless sensor network optimal lifetime solution for linear topologies. Journal of Interconnection Networks, 7(1):91-99, 2006.

    
@article{Hellman06a,
  Author = {K. Hellman and M. Colagrosso},
  Journal = {Journal of Interconnection 
    Networks},
  Title = {Investigating a wireless sensor 
    network optimal lifetime solution for
    linear topologies},
  Volume = {7},
  Number = {1},
  Pages = {91--99},
  Year = {2006}}
    

We investigate a known optimal lifetime solution for a linear wireless sensor network through simulation, and propose alternative solutions where a known optimal solution does not exist. The network is heterogeneous in the sensors’ energy distribution and also in the amount of data each sensor must communicate. As a basis for comparison, we analyze the lifetime of a network using a simple, nearest-neighbor routing algorithm, and an analytic solution to the optimal lifetime of networks meeting certain constraints. Alternative solutions considered range from those requiring global knowledge of the network to solutions using only next-neighbor knowledge. We compare the performance of all the routing algorithms in simulation.

W. Hereman, M. Colagrosso, R. Sayers, A. Ringler, B. Deconinck, M. Nivala, and M.S. Hickman (2005)

W. Hereman, M. Colagrosso, R. Sayers, A. Ringler, B. Deconinck, M. Nivala, and M.S. Hickman. Differential Equations with Symbolic Computation, chapter Continuous and Discrete Homotopy Operators with Applications in Integrability Testing, pages 249–285. Birkhäuser Verlag, Basel, 2005.

    
@inbook{Hereman05a,
  Address = {Basel},
  Author = {W. Hereman and M. Colagrosso and 
    R. Sayers and A. Ringler and B. Deconinck
    and M. Nivala and M.S. Hickman},
  Chapter = {Continuous and Discrete Homotopy
    Operators with Applications in 
    Integrability Testing},
  Editor = {D. Wang and Z. Zheng},
  Pages = {249--285},
  Publisher = {Birkh{\"a}user Verlag},
  Title = {Differential Equations with 
    Symbolic Computation},
  Year = {2005}}
    

We introduce calculus-based formulas for the continuous Euler and homotopy operators. The 1D continuous homotopy operator automates integration by parts on the jet space. Its 3D generalization allows one to invert the total divergence operator. As a practical application, we show how the operators can be used to symbolically compute local conservation laws of nonlinear systems of partial differential equations in multi-dimensions.

Analogous to the continuous case, we also present concrete formulas for the discrete Euler and homotopy operators. We use the discrete homotopy operator to algorithmically invert the forward difference operator. We apply the discrete operator to compute fluxes of differential-difference equations in (1+1) dimensions.

Our calculus-based approach allows for a straightforward implementation of the operators in major computer algebra systems, such as Mathematica and Maple. The symbolic algorithms for integration and summation by parts are illustrated with elementary examples. The algorithms to compute conservation laws are illustrated with nonlinear PDEs and their discretizations arising in fluid dynamics and mathematical physics.

R. Parker, W. Hoff, V. Norton, J.Y. Lee, and M. Colagrosso (2005)

R. Parker, W. Hoff, V. Norton, J.Y. Lee, and M. Colagrosso. Activity identification and visualization. In Proceedings of the Fifth International Workshop on Pattern Recognition in Information Systems (PRIS), pages 124–133, Miami, Florida, 2005.

    
@inproceedings{Parker05a,
  Author = {R. Parker and W. Hoff and
    V. Norton and J.Y. Lee
    and M. Colagrosso},
  Booktitle = {Proceedings of the Fifth
    International Workshop on Pattern
    Recognition in Information 
    Systems (PRIS)},
  Address = {Miami, Florida},
  Pages = {124--133},
  Title = {Activity Identification
    and Visualization},
  Year = {2005}}
    

Understanding activity from observing the motion of agents is simple for people to do, yet the procedure is difficult to codify. It is impossible to enumerate all possible motion patterns which could occur, or to dictate the explicit behavioral meaning of each motion. We develop visualization tools to assist a human user in labeling detected behaviors and identifying useful attributes. We also apply machine learning to the classification of motion into motion and behavioral labels. Issues include feature selection and classifier performance.

K. Hellman and M. Colagrosso (2005)

K. Hellman and M. Colagrosso. Increasing sensor network lifetime by identifying and leveraging nodes with excess energy in heterogeneous networks. In Proceedings of the 8th International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN 05), pages 542–546, Las Vegas, Nevada, 2005.

    
@inproceedings{Hellman05a,
  Author = {K. Hellman and M. Colagrosso},
  Booktitle = {Proceedings of the 8th 
    International Symposium on Parallel
    Architectures, Algorithms and
    Networks (ISPAN 05)},
  Address = {Las Vegas, Nevada},
  Pages = {542--546},
  Title = {Increasing Sensor Network Lifetime
    by Identifying and Leveraging Nodes with
    Excess Energy In Heterogeneous Networks},
  Year = {2005}}
    

We propose and evaluate wireless sensor routing algorithms designed to extend the lifetime of a heterogeneous wireless sensor network. The network is heterogeneous in the sensors’ energy distribution and also in the amount of data each sensor must communicate. As a basis for comparison, we analyze the lifetime of a network using a simple, nearest-neighbor routing algorithm and an analytic solution to the optimal lifetime of networks meeting certain constraints. We compare the performance of all the routing algorithms in simulation.

N. Bauer, M. Colagrosso, and T. Camp (2005)

N. Bauer, M. Colagrosso, and T. Camp. An agile approach to distributed information dissemination in mobile ad hoc networks. In Proceedings of the IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM), pages 131–141, Taormina, Sicily, 2005.

    
@inproceedings{Bauer05b,
  Author = {N. Bauer and M. Colagrosso 
    and T. Camp},
  Booktitle = {Proceedings of the IEEE
    International Symposium on a World 
    of Wireless, Mobile and Multimedia 
    Networks (WoWMoM)},
  Address = {Taormina, Sicily},
  Pages = {131--141},
  Title = {An agile approach to distributed
    information dissemination in mobile
    ad hoc networks},
  Year = {2005}}
    

In order to ease the challenging task of information dissemination in a MANET, we employ a legend: a data structure passed around a network to share information with all the mobile nodes. Our motivating application of the legend is sharing location information. Previous research shows that a simplistic legend performs better than other location services in the literature. To realize the full potential of legend-based location services, we propose three methods for the legend to traverse a network and compare their performance in simulation. We also evaluate several improvements to the traversal methods, and describe our way of making the legend transmission reliable. The result is a simple, lightweight location service that makes efficient use of network resources.

S. Kurkowski, T. Camp, N. Mushell, and M. Colagrosso (2005)

S. Kurkowski, T. Camp, N. Mushell, and M. Colagrosso. A visualization and analysis tool for NS-2 wireless simulations: iNSpect. In Proceedings of the IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), pages 503–506, Atlanta, Georgia, 2005.

    
@inproceedings{Kurkowski05b,
  Address = {Atlanta, Georgia},
  Author = {S. Kurkowski and T. Camp and 
       N. Mushell and M. Colagrosso},
  Booktitle = {Proceedings of the IEEE
    International Symposium on Modeling,
    Analysis, and Simulation of Computer
    and Telecommunication Systems (MASCOTS)},
  Pages = {503--506},
  Title = {A visualization and analysis tool
    for {NS-2} wireless simulations:
    {iNSpect}},
  Year = {2005}}
    

The Network Simulator 2 (NS-2) is a popular and powerful simulation environment, and the number of NS-2 users has increased greatly in recent years. Although it was originally designed for wired networks, NS-2 has been extended to work with wireless networks, including wireless LANs, mobile ad hoc networks (MANETs), and sensor networks; however, the Network Animator (NAM) for NS-2 has not been extended for wireless visualization. In this paper, we discuss a new visualization and analysis tool for use with NS-2 wireless simulations. Visual analysis of a wireless environment is important for three areas of NS-2 based simulation research: (1) validating the accuracy of a mobility model’s output and/or the node topology files used to drive the simulation; (2) validation of new versions of the NS-2 simulator itself; and (3) analysis of the results of NS-2 simulations. Our iNSpect program handles all three of these areas quickly and accurately. We’ve made our iNSpect program available for other researchers in order to improve the accuracy of their simulations.

M. Colagrosso and M. Mozer (2005)

M. Colagrosso and M. Mozer. Theories of access consciousness. In Advances in Neural Information Processing Systems 17, pages 289–296, Vancouver, Canada, 2005.

    
@inproceedings{Colagrosso05a,
   Author = {M. Colagrosso and M. Mozer},
   Booktitle = {Advances in Neural Information 
       Processing Systems 17},
   Address = {Vancouver, Canada},
   Pages = {289--296},
   Title = {Theories of access consciousness},
   Year = {2005}}
    

Theories of access consciousness address how it is that some mental states but not others are available for evaluation, choice behavior, and verbal report. Farah, O’Reilly, and Vecera (1994) argue that quality of representation is critical; Dehaene, Sergent, and Changeux (2003) argue that the ability to communicate representations is critical. We present a probabilistic information transmission or PIT model that suggests both of these conditions are essential for access consciousness. Having successfully modeled data from the repetition priming literature in the past, we use the PIT model to account for data from two experiments on subliminal priming, showing that the model produces priming even in the absence of accessibility and reportability of internal states. The model provides a mechanistic basis for understanding the dissociation of priming and awareness.

M. Colagrosso (2005)

M. Colagrosso. A classification approach to broadcasting in a mobile ad hoc network. In Proceedings of 40th IEEE International Conference on Communications (ICC 2005), volume 2, pages 1112–1117, Seoul, South Korea, 2005.

    
@inproceedings{Colagrosso05b,
  Address = {Seoul, South Korea},
  Author = {M. Colagrosso},
  Booktitle = {Proceedings of 40th IEEE 
    International Conference on
    Communications (ICC 2005)},
  Pages = {1112--1117},
  Title = {A Classification Approach to 
    Broadcasting in a Mobile Ad Hoc Network},
  Volume = {2},
  Year = {2005}}
    

We present a new broadcast protocol using Bayesian probabilistic classifiers, and we demonstrate its use in a mobile ad hoc network (MANET). Broadcasting functions as a foundation of MANET communication. Unicast, multicast, and geocast protocols utilize broadcasting as a building block, providing important control and route establishment functionality. Therefore, any improvements to the process of broadcasting can be immediately realized by MANET applications. While efficient broadcast protocols have been proposed, no single broadcasting protocol works well in all possible MANET conditions. Furthermore, every protocol fails catastrophically in severe network environments. We exhibit a new approach to the design of a broadcast protocol: the decision of whether to rebroadcast a packet is cast as a classification problem. Each mobile node (MN) builds a classifier and trains it on data collected from the network environment. Given an input vector describing a broadcast packet and current network conditions, the classifier returns an output label of “Rebroadcast” or “Drop.” Because each MN adapts to changing network conditions, the result is a more robust communication protocol and more efficient use of network resources. We show that our protocol, compared to those tested, is the most efficient under a range of network conditions.
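To make the idea of casting the rebroadcast decision as classification concrete, here is a minimal sketch of a categorical naive Bayes classifier with add-one smoothing. It is one simple kind of Bayesian probabilistic classifier, not necessarily the one used in the paper. The two input features (neighbor density and number of duplicate copies heard) and the training rows are invented for illustration; the abstract does not list the actual input vector.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
        self.values = defaultdict(set)              # feature index -> distinct values seen
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.feature_counts[(label, i)][value] += 1
                self.values[i].add(value)
        return self

    def predict(self, row):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            # Log prior plus the sum of smoothed log likelihoods per feature.
            score = math.log(n / total)
            for i, value in enumerate(row):
                seen = self.feature_counts[(label, i)][value]
                score += math.log((seen + 1) / (n + len(self.values[i])))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: (neighbor_density, copies_heard) -> decision.
rows = [
    ("sparse", "few"), ("sparse", "many"), ("dense", "few"),
    ("dense", "many"), ("dense", "many"), ("sparse", "few"),
]
labels = ["Rebroadcast", "Rebroadcast", "Rebroadcast", "Drop", "Drop", "Rebroadcast"]

model = NaiveBayes().fit(rows, labels)
print(model.predict(("dense", "many")))
print(model.predict(("sparse", "few")))
```

In the setting the abstract describes, each mobile node would train its own model on data it collects locally, so the same packet could be rebroadcast by one node and dropped by another.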

N. Bauer, M. Colagrosso, and T. Camp (2005)

N. Bauer, M. Colagrosso, and T. Camp. Efficient implementations of all-to-all broadcasting in mobile ad hoc networks. Pervasive and Mobile Computing, 1(3):311–342, 2005.

    
@article{Bauer05a,
  Author = {N. Bauer and M. Colagrosso
    and T. Camp},
  Journal = {Pervasive and Mobile Computing},
  Number = {3},
  Pages = {311--342},
  Title = {Efficient implementations of 
    all-to-all broadcasting in mobile 
    ad hoc networks},
  Volume = {1},
  Year = {2005}}
    

In order to ease the challenging task of information dissemination in a MANET, we employ a legend: a data structure passed around a network to share information with all the mobile nodes. Our motivating application of the legend is sharing location information. Previous research shows that a simplistic legend performs better than other location services in the literature. To realize the full potential of legend-based location services, we propose three methods for the legend to traverse a network and compare their performance in simulation. Two of our proposed methods are novel, and the third is an improvement on an existing method. We also evaluate several general improvements to the traversal methods, and describe our way of making the legend transmission reliable. The result is a simple, lightweight location service that makes efficient use of network resources. Beyond using the legend as a location service, we discuss several implementation aspects of providing an efficient all-to-all broadcast operation, including legend reliability, preventing duplicate legends, using a legend in dynamic networks, and working with non-synchronized clocks. We also provide pseudocode for our legend traversal methods to aid implementation.

S. Kurkowski, T. Camp, and M. Colagrosso (2005)

S. Kurkowski, T. Camp, and M. Colagrosso. MANET simulation studies: The Incredibles. ACM’s Mobile Computing and Communications Review, 9(4):50–61, 2005.

    
@article{Kurkowski05a,
  Author = {S. Kurkowski and T. Camp 
    and M. Colagrosso},
  Journal = {ACM's Mobile Computing and
    Communications Review},
  Number = {4},
  Pages = {50--61},
  Title = {{MANET} simulation studies:
    The Incredibles},
  Volume = {9},
  Year = {2005}}        
    

Simulation is the research tool of choice for a majority of the mobile ad hoc network (MANET) community. However, while the use of simulation has increased, the credibility of the simulation results has decreased. To determine the state of MANET simulation studies, we surveyed the 2000–2005 proceedings of the ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc). From our survey, we found significant shortfalls. We present the results of our survey in this paper. We then summarize common simulation study pitfalls found in our survey. Finally, we discuss the tools available that aid the development of rigorous simulation studies. We offer these results to the community with the hope of improving the credibility of MANET simulation-based studies.

M. Colagrosso, N. Enochs, and T. Camp (2004)

M. Colagrosso, N. Enochs, and T. Camp. Improvements to location-aided routing through directional count restrictions. In Proceedings of the International Conference on Wireless Networks (ICWN), pages 924–929, Las Vegas, Nevada, 2004.

    
@inproceedings{Colagrosso04a,
  Author = {M. Colagrosso and N. Enochs
    and T. Camp},
  Booktitle = {Proceedings of the International
    Conference on Wireless Networks (ICWN)},
  Address = {Las Vegas, Nevada},
  Pages = {924--929},
  Title = {Improvements to Location-Aided
    Routing through Directional Count 
    Restrictions},
  Year = {2004}}
    

We present an effective way to improve the quality of unicast routes determined by the LAR Box method. Our method uses the location information acquired by LAR itself to determine relay nodes in the forwarding zone. The improvements are two-fold: (1) shorter, more direct routes are discovered; and (2) less overhead is needed to find those routes. Because more direct routes last longer, our improvements yield a higher delivery ratio and lower end-to-end delays. The higher the network density, the greater the effect of our improvements.
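For readers unfamiliar with the baseline, the sketch below illustrates the rectangular zone test as LAR Box is commonly described: the zone is the smallest axis-aligned rectangle containing the source and a circle around the destination's last known position, and only nodes inside it forward the route request. It does not implement the directional count restrictions proposed in the paper. The function names and the `radius` parameter (typically the destination's speed times the elapsed time) are assumptions for illustration.

```python
from typing import Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def request_zone(source: Point, dest_last_known: Point, radius: float) -> Box:
    """Smallest axis-aligned rectangle covering the source and the circle of
    the given radius around the destination's last known position."""
    sx, sy = source
    dx, dy = dest_last_known
    return (min(sx, dx - radius), min(sy, dy - radius),
            max(sx, dx + radius), max(sy, dy + radius))


def in_zone(node: Point, zone: Box) -> bool:
    """True if the node lies inside the zone and may forward the request."""
    x, y = node
    x_min, y_min, x_max, y_max = zone
    return x_min <= x <= x_max and y_min <= y <= y_max
```

A node would call `in_zone` with its own position when a route request arrives and drop the request if the result is false.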

J. Lackey and M. Colagrosso (2004)

J. Lackey and M. Colagrosso. Supervised segmentation of visible human data with image analogies. In Proceedings of the International Conference on Machine Learning, Models, Technologies and Applications (MLMTA), pages 843–847, Las Vegas, Nevada, 2004.

    
@inproceedings{Lackey04a,
  Author = {J. Lackey and M. Colagrosso},
  Booktitle = {Proceedings of the 
    International Conference on Machine Learning,
    Models, Technologies and 
    Applications (MLMTA)},
  Address = {Las Vegas, Nevada},
  Pages = {843--847},
  Title = {Supervised Segmentation of Visible
    Human Data with Image Analogies},
  Year = {2004}}
    

We present a new application of the Image Analogies algorithm to be used for image segmentation. Our approach requires supervised training data, so we apply it to the domain of labeling human anatomical data. In the Visible Human Project, expert anatomists are overwhelmed with high-resolution images to analyze. We propose that the anatomist can work in conjunction with our approach, letting the machine segment 80% of the images, and requiring that the expert segment only every fifth image.

M. Mozer, M. Colagrosso, and D. Huber (2003)

M. Mozer, M. Colagrosso, and D. Huber. Mechanisms of long-term repetition priming and skill refinement: A probabilistic pathway model. In Proceedings of the Twenty Fifth Annual Conference of the Cognitive Science Society, Hillsdale, NJ, 2003. Erlbaum Associates.

    
@inproceedings{Mozer03a,
  Author = {M. Mozer and M. Colagrosso 
    and D. Huber},
  Booktitle = {Proceedings of the Twenty 
    Fifth Annual Conference of the Cognitive
    Science Society},
  Publisher = {Erlbaum Associates},
  Address = {Hillsdale, NJ},
  Title = {Mechanisms of long-term repetition
    priming and skill refinement: A 
    probabilistic pathway model},
  Year = {2003}}
    

We address an omnipresent and pervasive form of human learning: skill refinement, the improvement in performance of a cognitive or motor skill with practice. A simple and well-studied example of skill refinement is the psychological phenomenon of long-term repetition priming: Participants asked to identify briefly presented words are more accurate if they recently viewed the word. We simulate various phenomena of repetition priming using a probabilistic model that characterizes the time course of information transmission through processing pathways. The model suggests two distinct mechanisms of adaptation with experience, one that updates prior probabilities of pathway outputs, and one that increases the instantaneous probability of information transmission through a pathway. These two mechanisms loosely correspond to bias and sensitivity effects that have been observed in experimental studies of priming. The mechanisms are extremely sensible from a rational perspective, and can also explain phenomena of skill learning, such as the power law of practice. Although other models of these phenomena have been proposed, we argue for the probabilistic pathway model on grounds of parsimony and the elegant computational perspective it offers.

M. Mozer, R. Dodier, M. Colagrosso, C. Guerra-Salcedo, and R. Wolniewicz (2002)

M. Mozer, R. Dodier, M. Colagrosso, C. Guerra-Salcedo, and R. Wolniewicz. Prodding the ROC curve: Constrained optimization of classifier performance. In T. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 1409–1415, Cambridge, MA, 2002. MIT Press.

    
@inproceedings{Mozer02b,
  Author = {M. Mozer and R. Dodier and
    M. Colagrosso and C. Guerra-Salcedo 
    and R. Wolniewicz},
  Booktitle = {Advances in Neural Information 
    Processing Systems 14},
  Editor = {T. Dietterich and S. Becker 
    and Z. Ghahramani},
  Pages = {1409--1415},
  Publisher = {MIT Press},
  Address = {Cambridge, MA},
  Title = {Prodding the {ROC} curve: 
    Constrained optimization of
    classifier performance},
  Year = {2002}}        
    

When designing a two-alternative classifier, one ordinarily aims to maximize the classifier’s ability to discriminate between members of the two classes. We describe a situation in a real-world business application of machine-learning prediction in which an additional constraint is placed on the nature of the solution: that the classifier achieve a specified correct acceptance or correct rejection rate (i.e., that it achieve a fixed accuracy on members of one class or the other). Our domain is predicting churn in the telecommunications industry. Churn refers to customers who switch from one service provider to another. We propose four algorithms for training a classifier subject to this domain constraint, and present results showing that each algorithm yields a reliable improvement in performance. Although the improvement is modest in magnitude, it is nonetheless impressive given the difficulty of the problem and the financial return that it achieves for the service provider.
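The constraint itself can be illustrated with a simple post-hoc step: given scores from an already trained classifier, pick the decision threshold that meets a target correct acceptance rate and then measure the resulting correct rejection rate. This is only a baseline sketch of the constraint, not one of the four training algorithms the paper proposes. The function names are hypothetical, and the sketch assumes that higher scores indicate the class to be accepted.

```python
import math
from typing import Sequence


def threshold_for_acceptance_rate(pos_scores: Sequence[float], target: float) -> float:
    """Highest threshold t such that at least `target` of positives score >= t."""
    if not pos_scores or not 0.0 < target <= 1.0:
        raise ValueError("need at least one positive score and 0 < target <= 1")
    ranked = sorted(pos_scores, reverse=True)
    k = math.ceil(target * len(ranked))  # number of positives that must be accepted
    return ranked[k - 1]


def correct_rejection_rate(neg_scores: Sequence[float], threshold: float) -> float:
    """Fraction of negatives that score below the threshold."""
    return sum(s < threshold for s in neg_scores) / len(neg_scores)
```

With the acceptance rate pinned this way, the remaining freedom lies in how the classifier is trained, which is where a constraint-aware training procedure can improve the rejection rate over a classifier trained without the constraint in mind.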

M. Mozer, M. Colagrosso, and D. Huber (2002)

M. Mozer, M. Colagrosso, and D. Huber. A rational analysis of cognitive control in a speeded discrimination task. In T. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 51–57, Cambridge, MA, 2002. MIT Press.

    
@inproceedings{Mozer02a,
  Author = {M. Mozer and M. Colagrosso 
    and D. Huber},
  Booktitle = {Advances in Neural Information 
    Processing Systems 14},
  Editor = {T. Dietterich and S. Becker 
    and Z. Ghahramani},
  Address = {Cambridge, MA},
  Pages = {51--57},
  Publisher = {MIT Press},
  Title = {A rational analysis of cognitive
    control in a speeded discrimination task},
  Year = {2002}}
    

We are interested in the mechanisms by which individuals monitor and adjust their performance of simple cognitive tasks. We model a speeded discrimination task in which individuals are asked to classify a sequence of stimuli (Jones & Braver, 2001). Response conflict arises when one stimulus class is infrequent relative to another, resulting in more errors and slower reaction times for the infrequent class. How do control processes modulate behavior based on the relative class frequencies? We explain performance from a rational perspective that casts the goal of individuals as minimizing a cost that depends both on error rate and reaction time. With two additional assumptions of rationality—that class prior probabilities are accurately estimated and that inference is optimal subject to limitations on rate of information transmission—we obtain a good fit to overall RT and error data, as well as trial-by-trial variations in performance.

W. Hereman, Ü. Göktas, M. Colagrosso, and A. Miller (1998)

W. Hereman, Ü. Göktas, M. Colagrosso, and A. Miller. Algorithmic integrability tests for nonlinear differential and lattice equations. Computer Physics Communications, 115:428–446, 1998.

  
@article{Hereman98a,
  Author = {W. Hereman and {\"U}. G{\"o}ktas
    and M. Colagrosso and A. Miller},
  Journal = {Computer Physics
    Communications},
  Pages = {428--446},
  Title = {Algorithmic integrability tests
    for nonlinear differential and
    lattice equations},
  Volume = {115},
  Year = {1998}}
  

Three symbolic algorithms for testing the integrability of polynomial systems of partial differential and differential-difference equations are presented. The first algorithm is the well-known Painlevé test, which is applicable to polynomial systems of ordinary and partial differential equations. The second and third algorithms allow one to explicitly compute polynomial conserved densities and higher-order symmetries of nonlinear evolution and lattice equations.

The first algorithm is implemented in the symbolic syntax of both Macsyma and Mathematica. The second and third algorithms are available in Mathematica. The codes can be used for computer-aided integrability testing of nonlinear differential and lattice equations as they occur in various branches of the sciences and engineering. Applied to systems with parameters, the codes can determine the conditions on the parameters so that the systems pass the Painlevé test, or admit a sequence of conserved densities or higher-order symmetries.