ftp.cs.arizona.edu

Text File | 2009-06-04 | 146KB | 3,023 lines

%A Fowler, Joe %A Kobourov, Stephen %A Estrella-Balderrama, Alejandro %T Colored Simultaneous Geometric Embeddings and Universal Pointsets %D 5/14/09 %Z Mon, 05 Jan 09 00:00:00 GMT %R TR09-02 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2009/TR09-02.pdf %X Universal pointsets can be used for visualizing multiple relationships on the same set of objects or for visualizing dynamic graph processes. Using the same point in the plane to represent the same object helps preserve the viewer's mental map. Small universal pointsets are highly desirable but often do not exist because of the restriction that a given object must be mapped to a fixed point in the plane. In colored simultaneous embeddings this restriction is relaxed, by allowing a given object to map to a subset of points in the plane. Specifically, consider a set of graphs on the same set of n vertices partitioned into k colors. Finding a corresponding set of k-colored points in the plane in which each vertex is mapped to a point of the same color so as to allow a straight-line plane drawing of each graph is the problem of colored simultaneous geometric embedding. For trees, we show that there exist small universal pointsets (1) for 3-colored caterpillars of size n, (2) for 3-colored radius-2 stars of size n+3, and (3) for 2-colored spiders of size n. For outerplanar graphs, we show that these same universal pointsets also suffice for (1) 3-colored K3-caterpillars, (2) 3-colored K3-stars, and (3) 2-colored fans, respectively. We also show that there exist (i) a 2-colored planar graph and pseudo-forest, (ii) three 3-colored outerplanar graphs, (iii) four 4-colored pseudo-forests, and (iv) three 5-colored pseudo-forests without simultaneous embeddings. %K keywords %Y %A Perkins, David N. 
%T Predicting Secondary Structure of Proteins by Linear and Dynamic Programming %D April 29, 2009 %Z Mon, 05 Jan 09 00:00:00 GMT %R TR09-01 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2009/TR09-01.pdf %X Proteins are sequences of amino acids that fold into secondary and tertiary structure, which plays an important role in their function. As biologists have yet to discover the rules that govern how a protein folds in nature from its underlying sequence, this thesis tries a new approach to secondary structure prediction using dynamic programming on the input protein sequence. The sequence is broken into short words, where each word has a probability of folding into the three different types of secondary structure. By combining word probabilities with an abstraction called contexts, which model a run of the same secondary structure type up to a bounded length, the optimal prediction for an entire sequence can be computed via dynamic programming. The structure probabilities for words are learned from a training set of sequences with known secondary structure using linear programming. The combined approach to prediction using linear and dynamic programming achieves high accuracy on protein sequences whose words were observed in the training set, but is far less accurate on sequences with unobserved words not seen in the training set. The challenge for future work lies in interpolating probabilities for unobserved words to achieve improved generalization. 
%K keywords %Y %A Huang, Huilong %T Efficient Routing in Wireless Ad Hoc Networks %D August 12, 2008 %Z Mon, 03 Jan 08 00:00:00 GMT %R TR08-05 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2008/TR08-05.pdf %X abstract (see TR) %K dissertation %Y %A Cappos, Justin %T Stork: Secure Package Management for VM Environments %D May 6, 2008 %Z Mon, 03 Jan 08 00:00:00 GMT %R TR08-04 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2008/TR08-04.pdf %X Package managers are a common tool for installing, removing, and updating software on modern computer systems. Unfortunately existing package managers have two major problems. First, inadequate security leads to vulnerability to attack. There are nine feasible attacks against modern package managers, many of which are enabled by flaws in the underlying security architecture. Second, in Virtual Machine (VM) environments such as Xen, VMWare, and VServers, different VMs on the same physical machine are treated as separate systems by package managers leading to redundant package downloads and installations. This dissertation focuses on the design, development, and evaluation of a package manager called Stork that does not have these problems. Stork provides a security architecture that prevents the attacks other package managers are vulnerable to. Stork also is efficient in VM environments and reduces redundant package management actions. Stork is a real system that has been in use for four years and has managed half a million VM instantiations. 
%K dissertation, Stork %Y %A Collberg, Christian %A Nagra, Jasvir %A Snavely, Will %T biànliǎn: Remote Tamper-Resistance with Continuous Replacement %D March 31, 2008 %Z Mon, 03 Jan 08 00:00:00 GMT %R TR08-03 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2008/TR08-03.pdf %X In this paper we describe biànliǎn, a system for producing tamper-resistant client programs running over the Internet. The basic idea is to use continuous replacement of program blocks running on the client site to make it difficult for an adversary to analyze and modify the program. There are many potential applications, for example protecting client programs in networked computer games from being tampered with. We show the basic design of a continuous replacement system for Java, including a number of obfuscating and tamperproofing transformations. In achieving tamper-resistance biànliǎn incorporates two novel ideas. First, it maintains an incorrect pool of code on the client such that no snapshot that an adversary takes contains the complete correct program. In addition, the type of incorrect code introduced ensures that on different input, executing the same trace of instructions will produce incorrect results. Second, biànliǎn achieves scalability through a distributed protection scheme which allows the server to offload the tamper-detection of one client to another client. %K biànliǎn, tamperproofing, tamper-detection %Y %A Fowler, Joe %A Junger, Michael %A Kobourov, Stephen %A Schulz, Michael %T Characterizing Simultaneous Embedding with Fixed Edges %D February 8, 2008 %Z Mon, 01 Jan 08 00:00:00 GMT %R TR08-01 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2008/TR08-01.ps.Z %X A set of planar graphs share a simultaneous embedding if they can be drawn on the same vertex set V in the plane without crossings between edges of the same graph. 
Fixed edges are common edges between graphs that share the same Jordan curve in the simultaneous drawings. While any number of planar graphs have a simultaneous embedding without fixed edges, determining which graphs always share a simultaneous embedding with fixed edges (SEFE) has been open. We partially close this problem by giving a necessary condition to determine when pairs of graphs have a SEFE. As a direct application, we are able to determine for the set of planar graphs P and for the set of outerplanar graphs O (all vertices lie on an outerface), what the proper subsets of P and O are that always have a SEFE with all of P and O, respectively. In both cases, we provide algorithms to compute the simultaneous drawings. Finally, we provide a polynomial time decision algorithm for deciding when a specific pair of outerplanar graphs has a SEFE. Whether the existence of a SEFE for two planar graphs can similarly be decided in polynomial time remains an open problem. %K SEFE, vertex, Jordan curve %Y %A Cappos, Justin %A Donnelly, Austin %A Mortier, Richard %A Narayanan, Dushyanth %A Rowstron, Antony %T Cost-aware view materialization for highly distributed datasets %D Thursday, September 13, 2007 %Z Mon, 03 Jan 05 00:00:00 GMT %R TR07-05 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2007/TR07-05.pdf %X Querying large datasets distributed over thousands of endsystems is a challenge for existing distributed querying infrastructures. High data availability requires either replicating or centralizing the dataset but both require infeasibly high network bandwidth. In-situ querying provides low bandwidth overheads but requires users to tolerate low data availability. This paper advocates partial data replication, increasing the availability of a subset of the data through centralization and/or in-network (peer-to-peer) replication. 
This is analogous to materializing views in centralized databases, but where materialized views in centralized databases trade view update overheads for query overheads, in the distributed case they trade bandwidth usage for availability. Given an example workload, state-of-the-art tools for centralized databases are able to determine a set of materialized views that will improve performance. Key to this is the ability to estimate view maintenance costs with different hypothetical materialized views. This paper describes estimation of view maintenance costs in a highly distributed database. We present metrics that capture the cost of different materializations, and show that we can estimate these metrics accurately, efficiently, and scalably on a real distributed dataset. %K keywords %Y %A Cappos, Justin %A Baker, Scott %A Plichta, Jeremy %A Nyugen, Duy %A Hardies, Jason %A Borgard, Matt %A Johnston, Jeffry %A Hartman, John H. %T Stork: Package Management for Distributed VM Environments %D Tuesday, May 15, 2007 %Z Tues, 02 Jan 07 00:00:00 GMT %R TR07-02 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2007/TR07-02.ps.Z %X In virtual machine environments each application is often run in its own virtual machine (VM), isolating it from other applications running on the same physical machine. Contention for memory, disk space, and network bandwidth among virtual machines, coupled with an inability to share due to the isolation virtual machines provide, leads to heavy resource utilization. Additionally, VMs increase management overhead as each is essentially a separate system. Stork is a package management tool for virtual machine environments that is designed to alleviate these problems. Stork securely and efficiently downloads packages to physical machines and shares packages between VMs. 
Disk space and memory requirements are reduced because shared files, such as libraries and binaries, only require one persistent copy per physical machine. Experiments show that Stork reduces the disk space required to install additional copies of a package by over an order of magnitude, and memory by about 50%. Stork downloads each package once per physical machine no matter how many VMs install it. The transfer protocols used during download improve elapsed time by 7X and reduce repository traffic by an order of magnitude. Stork users can manage groups of VMs with the ease of managing a single machine -- even groups that consist of machines distributed around the world. Stork is a real service that has run on PlanetLab for over 4 years and has managed thousands of VMs. %K keywords %Y %A Cappos, Justin %A Hartman, John H. %T San Fermin: Aggregating Large Data Sets using Dynamic Binomial Trees %D Monday, May 14, 2007 %Z Tues, 02 Jan 07 00:00:00 GMT %R TR07-01 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2007 %X This paper presents San Fermin, a system for aggregating large data sets from the nodes of large-scale distributed systems. Each San Fermin node individually computes the aggregated result by dynamically creating its own binomial tree as it aggregates data. Nodes that fall behind abort their aggregations, thereby reducing overhead. Having each node create its own binomial tree makes San Fermin highly resilient to failures, and ensures that the internal nodes of the tree have high capacity, reducing completion time without overwhelming nodes. Compared to existing solutions San Fermin handles large data sets better, has higher completeness when nodes fail, computes the aggregated result faster, and has better scalability. 
We analyze the completion time, completeness, and overhead of San Fermin versus existing solutions using analytical models, simulation, and experimentation with a prototype deployed on PlanetLab. Our evaluation shows that San Fermin is scalable both in the number of nodes and in the size of the data being aggregated. With 10% node failures during aggregation, San Fermin still returns the answer from over 97% of the nodes and in most cases does so faster than the underlying DHT recovers from failures. %K keywords %A Fowler, J. Joseph %A Kobourov, Stephen G. %T Characterization of Unlabeled Level Planar Graphs %D Monday, December 4, 2006 %Z Mon, 03 Jan 06 00:00:00 GMT %R TR06-04 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2006/TR06-04.ps.Z %X abstract %K keywords %Y %A Zhang, Beichuan %A Kambhampati, Vamsi %A Massey, Daniel %A Oliveira, Ricardo %A Pei, Dan %A Wang, Lan %A Zhang, Lixia %T A Secure and Scalable Internet Routing Architecture (SIRA) %D Tuesday, April 4, 2006 %Z Mon, 03 Jan 06 00:00:00 GMT %R TR06-01 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2006/TR06-01.ps.Z %X Today's Internet routing architecture faces many challenges, ranging from scaling problems, security threats, poor fault diagnosis to inadequate support for traffic engineering and customer multihoming. By analyzing these challenges and learning lessons from previously proposed solutions, we gain two fundamental insights for designing a secure and scalable routing architecture: cutting a clear boundary between customer networks and transit providers, and embedding essential information in the address structure. We propose the Secure and Scalable Internet Routing Architecture (SIRA), a clean-slate design that separates provider networks from customer networks and embeds organization and location information in address structure. 
The resulting system provides dramatic improvements in scalability, security, fault diagnosis, and multihoming and traffic engineering support. We also identify new design issues raised by SIRA and sketch out straw-man solutions. %K keyword %Y %A Rao, Praveen %A Moon, Bongki %T psiX: Hierarchical Distributed Index for Efficiently Locating XML Data in Peer-to-Peer Networks %D Monday, March 13, 2006 %Z Mon, 03 Jan 05 00:00:00 GMT %R TR05-10 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/ %X This report has been superseded by a UMKC-TR %K keywords %Y %A Barnard, Kobus %A Fan, Quanfu %A Swaminathan, Ranjini %A Hoogs, Anthony %A Collins, Roderic %A Rondot, Pascale %A Kaufhold, John %T Evaluation of localized semantics: data, methodology, and experiments %D Monday, October 31, 2005 %Z Mon, 03 Jan 05 00:00:00 GMT %R TR05-08 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2005/TR05-08.ps %X We present a new data set encoding localized semantics for 1014 images and a methodology for using this kind of data for recognition evaluation. This methodology establishes protocols for mapping algorithm specific localization (e.g., segmentations) to our data, handling synonyms, scoring matches at different levels of specificity, dealing with vocabularies with sense ambiguity (the usual case), and handling ground truth regions with multiple labels. Given these protocols, we develop two evaluation approaches. The first measures the range of semantics that an algorithm can recognize, and the second measures the frequency that an algorithm recognizes semantics correctly. The data, the image labeling tool, and programs implementing our evaluation strategy are all available on-line (kobus.ca//research/data/IJCV). We apply this infrastructure to evaluate four algorithms which learn to label image regions from weakly labeled data. 
The algorithms tested include two variants of multiple instance learning, and two generative multi-modal mixture models. These experiments are on a significantly larger scale than previously reported, especially in the case of the multiple instance learning. More specifically, we used training data sets up to 37,000 images and training vocabularies of up to 650 words. We found that image level word prediction, which is a cheaper evaluation alternative, does not correlate well with region labeling performance, thus validating the need for region level analysis. We also found that for the measures sensitive to occurrence statistics, we needed to provide the multiple instance learning methods with an appropriate prior for good performance. With that modification used when appropriate, we found that the EMDD multiple instance learning method gave the best overall performance over three tasks, with one of the generative multi-mixture models giving the best performance on one of them. %K keywords %Y %A Kobourov, Stephen %A Iyer, Anand %A Efrat, Alon %A Erten, Cesim %A Forrester, David %T A Force-Directed Approach to Sensor Localization %D Friday, July 15, 2005 %Z Mon, 03 Jan 05 00:00:00 GMT %R TR05-07 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2005/TR05-07.ps %X abstract %K keywords %Y %A Kobourov, Stephen %A Cappos, Justin %T Outerplanar Graphs and Trees on Tracks %D Friday, July 15, 2005 %Z Mon, 03 Jan 05 00:00:00 GMT %R TR05-06 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2005/TR05-06.ps %X abstract %K keywords %Y %A Kobourov, Stephen %A Landis, Matthew %T Morphing Planar Graphs in Spherical Space %D Friday, July 15, 2005 %Z Mon, 03 Jan 05 00:00:00 GMT %R TR05-05 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2005/TR05-05.ps %X abstract %K keywords %Y %A Collberg, Christian %A Kobourov, Stephen %A Hutcheson, C. 
%A Trimble, J. %A Stepp, Michael %T Monitoring Java Programs Using Music %D Wednesday, June 15, 2005 %Z Mon, 03 Jan 05 00:00:00 GMT %R TR05-04 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2005/TR05-04.ps.Z %X We present jmusic, a tool to analyze the run-time behavior of a Java program. Unlike other profiling systems, this one presents its results to the user as music. Using audio in addition to more standard visual methods of presentation allows more attempt at presenting profiling information as computer-generated music. The tool is specification-driven, allowing users complete control over what profiling information to present, and what type of (MIDI) music to generate for different situations. We discuss the system design and implementation, as well as present several examples. %K jmusic, run-time, Java %Y %A Rajagopalan, Mohan %A Debray, Saumya %A Hiltunen, Matti %A Schlichting, Richard %T Automatic Operating System Specialization via Binary Rewriting %D Monday, May 9, 2005 %Z Mon, 03 Jan 05 00:00:00 GMT %R TR05-03 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2005/TR05-03.ps.Z %X This paper explores the application of binary rewriting techniques to customization of operating system kernels. Specifically, this paper describes a new binary rewriting system, Charon, and its application to the synthesis of application-specific operating systems. Compiler techniques are used to analyze and transform the kernel based on holistic knowledge of the system. Preliminary experiments have been promising and argue persuasively that more opportunities for automation should be explored. 
%K keywords %Y %A Cappos, Justin %A Hartman, John %T A Resource Allocation Framework for Global Service-Oriented Networks %D Thursday, February 24, 2005 %Z Mon, 03 Jan 05 00:00:00 GMT %R TR05-02 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2005/TR05-02.ps %X We study the problem of allocating resources on global networks where there is no central administrative control. We describe a framework that abstractly describes a number of components that are necessary in an auction system to provide users with a secure trading environment. We propose solutions for specific issues relating to auction granularity, cost/value effective bidding, bids on large resource sets, currency control, and computationally effective auction resolution. We then describe the application of our framework to PlanetLab and how the components would be implemented on this system. %K keywords %Y %A Rajagopalan, Mohan %A Baker, Scott %A Debray, Saumya K. %A Hiltunen, Matti %A Schlichting, Richard D. %A Hartman, John %T System Call Signatures and Hidden Fingerprints %D Wednesday, December 22, 2004 %Z Wed, 03 Jan 04 00:00:00 GMT %R TR04-15 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2004 %X Remote code injection attacks against computer systems are occurring at an alarming frequency. A crucial aspect of such attacks is that in order to do any real damage, the injected attack code has to execute system calls, and therefore can be foiled by suitably hardening the system call interface. Most current proposals for doing so, however, suffer from various shortcomings, such as relying on special compilers or libraries, or incurring huge runtime overheads, or being vulnerable to mimicry attacks. 
This paper describes a systematic approach to defending against remote code injection attacks that uses two complementary techniques: cryptographic signatures to protect system calls themselves, and compiler-based techniques to hide code fingerprints that could be exploited for mimicry attacks. Experiments indicate that our approach is effective against a wide variety of attacks at modest cost. %K keywords %Y %A Barnard, Kobus %A Johnson, Matthew %T Word Sense Disambiguation with Pictures %D Monday, August 9, 2004 %Z Wed, 03 Jan 04 00:00:00 GMT %R TR04-12 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2004/ %X We introduce using images for word sense disambiguation, either alone, or in conjunction with traditional text-based methods. The approach is based on a recently developed method for automatically annotating images by using a statistical model for the joint probability for image regions and words. The model itself is learned from a data base of images with associated text. To use the model for word sense disambiguation, we constrain the predicted words to be possible senses for the word under consideration. When word prediction is constrained to a narrow set of choices (such as possible senses), it can be quite reliable. We report on experiments using the resulting sense probabilities as is, as well as augmenting a state-of-the-art text-based word sense disambiguation algorithm. In order to evaluate our approach, we developed a new corpus, ImCor, which consists of a substantive portion of the Corel image data set associated with disambiguated text drawn from the SemCor corpus. Our experiments using this corpus suggest that visual information can be very useful in disambiguating word senses. They also illustrate that associated non-textual information such as image data can help ground language meaning. Please note: This TR has been superseded by the AIJ version (http://kobus.ca/research/publications/AIJ-WSD). 
%K keywords %Y %A Collberg, Christian %A Myles, Ginger %A Stepp, Michael %T An Empirical Study of Java Bytecode Programs %D Thursday, August 5, 2004 %Z Wed, 03 Jan 04 00:00:00 GMT %R TR04-11 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2004/TR04-11.ps %X We present a study of the static structure of real Java bytecode programs. A total of 1132 Java jar-files were collected from the Internet and analyzed. In addition to simple counts (number of methods per class, number of bytecode instructions per method, etc.), structural metrics such as the complexity of control-flow and inheritance graphs were computed. We believe this study will be valuable in the design of future programming languages and virtual machine instruction sets, as well as in the efficient implementation of compilers and other language processors. %K keywords %Y %A Collberg, Christian %A Proebsting, Todd A. %T AlgoVista - A Tool for Classifying Algorithmic Problems and Combinatorial Structures %D Wed, April 28, 2004 %Z Wed, 03 Jan 04 00:00:00 GMT %R TR04-09 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2004 %X AlgoVista is a web-based search engine that assists programmers to classify algorithmic problems and combinatorial structures. AlgoVista is not keyword based but rather requires users to provide --- in a very simple query language --- input==>output samples that give a rough description of the behavior of their needed algorithm. AlgoVista also allows algorithm designers to advertise their results in a forum accessible to programmers and theoreticians alike. AlgoVista's search mechanism is based on a novel application of program checking, a technique developed as an alternative to program verification and testing. The current AlgoVista database consists of over 300 problem descriptions including problems on graphs, trees, matrices, vectors, sets, numbers, and geometric objects. 
AlgoVista can be searched textually as well as visually. A user creates a visual query by simply drawing it on the canvas of a web browser. Visual queries are parsed into their textual counter-part by an algorithm that relies on user input to resolve ambiguities. AlgoVista operates at http://algovista.cs.arizona.edu. %K keywords %Y %A Collberg, Christian %A Thomborson, Clark %A Townsend, Gregg M. %T Dynamic Graph-Based Software Watermarking %D Wed, April 28, 2004 %Z Wed, 03 Jan 04 00:00:00 GMT %R TR04-08 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2004/TR04-08.ps %X Watermarking embeds a secret message into a cover message. In media watermarking the secret is usually a copyright notice and the cover a digital image. Watermarking an object discourages intellectual property theft, or when such theft has occurred, allows us to prove ownership. The Software Watermarking problem can be described as follows. Embed a structure $W$ into a program $P$ such that: $W$ can be reliably located and extracted from $P$ even after $P$ has been subjected to code transformations such as translation, optimization and obfuscation; $W$ is stealthy; $W$ has a high data rate; embedding $W$ into $P$ does not adversely affect the performance of $P$; and $W$ has a mathematical property that allows us to argue that its presence in $P$ is the result of deliberate actions. In this paper we describe a software watermarking technique in which a dynamic graph watermark is stored in the execution state of a program. Because of the hardness of pointer alias analysis such watermarks are difficult to attack automatically. 
%K keywords %Y %A Sahoo, Tapas Ranjan %A Collberg, Christian %T Software Watermarking in the Frequency Domain: Implementation, Analysis, and Attacks %D Wednesday, March 3, 2004 %Z Wed, 03 Jan 04 00:00:00 GMT %R TR04-07 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2004/TR04-07.ps.Z %X In this paper we analyze the SHKQ software watermarking algorithm, originally due to Stern, Hachez, Koeune and Quisquater. The algorithm has been implemented within the SandMark framework, a system designed to allow effective study of software protection algorithms (such as code obfuscation, software watermarking, and code tamper-proofing) targeting Java bytecode. The SHKQ algorithm embeds a watermark in a program using a spread spectrum technique. The idea is to spread the watermark over the entire application by modifying instruction frequencies. Spreading the watermark over the code provides a high level of stealth and some manner of resilience against attack. In this paper we describe the implementation of the SHKQ algorithm, in particular the issues that arise when targeting Java bytecodes. We then present an empirical examination of the robustness of the watermark against a wide variety of attacks. We conclude that SHKQ, while stealthy, is easily attacked by simple distortive transformations. %K keywords %Y %A Collberg, Christian %A Huntwork, Andrew %A Carter, Edward %A Townsend, Gregg %T Graph Theoretic Software Watermarks: Implementation, Analysis, and Attacks %D Wednesday, March 3, 2004 %Z Wed, 03 Jan 04 00:00:00 GMT %R TR04-06 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2004/TR04-06.ps.Z %X This paper presents an implementation of the novel watermarking method proposed by Venkatesan, Vazirani, and Sinha in their recent paper "A Graph Theoretic Approach to Software Watermarking". 
An executable program is marked by the addition of code for which the topology of the control-flow graph encodes a watermark. We discuss issues that were identified during construction of an actual implementation that operates on Java bytecode. We measure the size and time overhead of watermarking, and evaluate the algorithm against a variety of attacks. %K keywords %Y %A Collberg, Christian %A Myles, Ginger %A Stepp, Michael %T Cheating Cheating Detectors %D Wednesday, March 3, 2004 %Z Wed, 03 Jan 04 00:00:00 GMT %R TR04-05 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2004/TR04-05.ps.Z %X In this paper we present a new cheating technique that is successful at defeating cheating detectors and could become popular with students. The idea is to use obfuscating code transformations (such as those found in the SandMark tool) to apply a sequence of minor code transformations to a copied programming assignment. The purpose is to produce a copy that will defeat detection. We show that this technique is successful in defeating common plagiarism detectors such as Moss. This paper is offered as a cautionary tale to the Computer Science teaching community. With the advent of powerful code transformation tools it will become necessary to develop correspondingly more powerful cheating detectors, or to revert to manually testing for plagiarism. %K keywords %Y %A Rao, Praveen %A Moon, Bongki %T Approximate Tree Pattern Counts over (Streaming) Labeled Trees %D Thursday, July 7, 2005 %Z Wed, 03 Jan 04 00:00:00 GMT %R TR04-04 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2004/TR04-04.ps.Z %X There has been a growing interest in developing online approximation algorithms for data streams that use only limited amounts of memory. 
With XML emerging as a standard for data interchange on the Internet, we believe that the problem of counting twig patterns over streaming XML is worth addressing. In this paper, we address the problem of approximately counting twig patterns over a stream of XML document trees that are looked at only once and appear in a fixed order. We propose a novel approximation algorithm called k-TwigSketch that computes a synopsis of the stream in one pass using a limited amount of memory. k-TwigSketch provides approximate answers for twig counts with provably strong error guarantees. Furthermore, we propose an effective optimization called SHUFFLE that can be applied to k-TwigSketch to reduce the memory requirements for estimating twig counts with a particular relative error guarantee. Experimental results on real and synthetic datasets demonstrate that k-TwigSketch can estimate twig counts within 10% relative errors with high confidence by using small amounts of memory. Further, we observed that the proposed optimization SHUFFLE can achieve significant memory reduction. %K updated %Y %A Heffner, Kelly %A Collberg, Christian %T The Obfuscation Executive %D Friday, February 27, 2004 %Z Wed, 03 Jan 04 00:00:00 GMT %R TR04-03 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2004/TR04-03.ps.Z %X Code obfuscations are semantics-preserving code transformations used to protect a program from reverse engineering. There is generally no expectation of complete, long-term, protection. Rather, there is a trade-off between the protection afforded by an obfuscation (i.e. the amount of resources an adversary has to expend to overcome the layer of confusion added by the transformation) and the resulting performance overhead. An obfuscation tool will generally apply a series of obfuscation algorithms to the same application. 
While each individual obfuscation may add a trivial amount of confusion, the layering of and interaction between the different transformations can result in a highly obfuscated application. In this paper we examine the problems that arise when constructing an Obfuscation Executive. This is the main loop in charge of a) selecting the part of the application to be obfuscated next, b) choosing the best transformation to apply to this part, c) evaluating how much confusion and overhead has been added to the application, and d) deciding when the obfuscation process should terminate. %K keywords %Y %A Hartman, John %A Collberg, Christian %A Babu, Sridivya %A Udupa, Sharath K. %T Slinky: Static Linking Reloaded %D Friday, February 27, 2004 %Z Wed, 03 Jan 04 00:00:00 GMT %R TR04-02 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2004/TR04-02.ps.Z %X Static linking has many advantages over dynamic linking. It is simple to understand, implement, and use. It ensures that an executable is self-contained and does not require a particular set of libraries during execution. As a consequence, the executable image that was tested by the developer is exactly the same as the one executed by the user, diminishing the risk that the user's environment will affect correct behavior. The major disadvantages of static linking are increases in the memory required to run an executable, network bandwidth to transfer it, and disk space to store it. In this paper we describe the Slinky system that uses digest-based sharing to combine the simplicity of static linking with the space savings of dynamic linking: Slinky executables are completely self-contained and minimal performance and disk-space penalties are incurred if two executables use the same library. We have developed a Slinky prototype that consists of tools for adding digests to executables, and a slight modification of the Linux kernel to use those digests to share code pages. 
Results show that our unoptimized prototype has a performance decrease of at most 4% and a space increase of 40% relative to dynamic linking. %K keywords %Y %A Rajagopalan, Mohan %A Debray, Saumya K. %A Hiltunen, Matti A. %A Schlichting, Richard D. %T Reducing the energy cost of application/OS Interactions %D Tuesday, October 21, 2003 %Z Wed, 03 Jan 04 00:00:00 GMT %R TR03-19 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2003/TR03-19.ps %X Software approaches to power optimization have traditionally had two distinct foci, in relative isolation, targeting either individual applications (compilation-based techniques) or global (operating system) policies. Dynamic interactions between the application and operating system through system calls, which can potentially have a large impact on overall performance and power consumption, remain largely unoptimized due to the partitioning of concerns. This paper discusses the energy implications of a new system call clustering optimization technique for reducing application/OS interaction costs that is based on a novel multi-call mechanism. Preliminary results on common utility programs such as the mpeg_play video decoder have been promising. %K keywords %Y %A Kollipara, Siva %T Applying Network Processors to Header Compression %D Monday, October 20, 2003 %Z Wed, 03 Jan 00 00:00:00 GMT %R TR03-17 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2003/TR03-17.ps.Z %X Network Processors (NPs) are highly specialized processors optimized to handle protocol processing, possibly including many other services such as QoS, statistics, and compression/decompression, at the highest speeds possible. Many vendors, such as Intel and IBM, have released powerful silicon for this market. NPs represent the Holy Grail, the panacea for network bottlenecks. 
Even though the architectural design of these various NPs varies significantly, their architectural principles, goals, ideas, and requirements are much the same: to achieve network processing at the highest speeds possible with low latency. The programmability and parallel nature of these processor chips make them an ideal choice when high performance and the ability to adapt quickly to new network standards are required. This report briefly describes the need for header compression in the Internet today and provides an analysis of the evolution of CRTP header compression design, using the IXP2X00. This was a particularly difficult challenge because the order of packets is important. Reordering is not admissible since the compression (decompression) of each valid packet makes changes to the context, which is used for the compression (decompression) of the next packet. We also have time constraints to consider based on bus width, bus clock speed, processor clock speed, pipelining, network data rate (keeping in mind the link-layer and physical-layer overhead) and packet processing: say, some "x" bus ops/pkt and "y" CPU ops/pkt. At this rate, for each packet, buffers must be allocated, the packet must be received, stored in DRAM, header verified, classified, evaluated as to whether it should be dropped, compressed (decompressed), context updated, previous header stripped, new compressed (decompressed) header prepended, statistic counters updated, multi-field and destination lookups performed, enqueued for transmit, transmitted, and buffers freed. This report chronicles the issues encountered and solutions devised to achieve header compression. The necessary concepts of context/functional pipeline, critical sections, signaling and ring buffers are defined with references. The CRTP pipe stages are described. 
%K keywords %Y %A Kollipara, Siva %T Venti FS: A Hash Based File System %D Monday, October 20, 2003 %Z Wed, 03 Jan 00 00:00:00 GMT %R TR03-16 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2003/TR03-16.ps.Z %X Venti, a network storage system, is a write-once archival repository. It provides a block-level interface and uses unique hashes (SHA-1) to identify these blocks. These hashes can be used to identify duplicate blocks, eliminating redundant copies and reducing storage consumption. A direct mapping between hashes and disk addresses is not possible; therefore, an indexer is used to map the hashes to Venti disk addresses. Venti can be used as a building block for constructing a variety of storage applications such as logical backup, physical backup and snapshot file systems. Most file systems to date use fixed-size blocks. Fixed-size blocks limit the amount of duplication that can be detected. However, by breaking files into variable-sized blocks based on the identification of anchor or break points, more duplication can be found and cross-file similarities can be exploited efficiently. We have built a file system on top of Venti and investigated the implications of several design decisions: 1. Use of different searching algorithms (B-Tree / Binary Tree) in the indexer; 2. Variable-sized blocks instead of fixed-size blocks. We present some performance results that measure the average execution time for various operations of the file system and also analyze the impact of the design decisions made. %K keywords %Y %A Kollipara, Siva %T SENSE: A Toolkit for Stick-e Frameworks %D Monday, October 20, 2003 %Z Wed, 03 Jan 00 00:00:00 GMT %R TR03-15 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2003/TR03-15.ps.Z %X Context-aware applications are gaining popularity and are hot research topics. 
As context-aware devices hit the market these kinds of applications will become common. This paper presents one such application that takes advantage of context-awareness: the Electronic Post-It Note. This application presents information to the user as he enters a given context. Some E-Post-It Note software already exists, but it is too application- or field-oriented and thus lacks the breadth that would make this a "killer" application. This project aims at providing a generic E-Post-It Note application that can support various end-user devices, enhanced/enriched contexts and even a combination of contexts! The increase in the number of handheld devices and active research in the field of Context Aware Computing has led to a growing interest in the development of platforms and toolkits for context-aware applications. In this paper we present our experience in designing and implementing SENSE: an e-note toolkit for context-aware applications. SENSE provides a platform to develop applications like campus guides that are both passively and actively context-aware. %K keywords %Y %A Arango, Jesus %T Virtual IP Machines: A System Framework for Emulating Multiple IP Hosts %D Friday, October 10, 2003 %Z Wed, 03 Jan 00 00:00:00 GMT %R TR03-14 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2003 %X This paper proposes a framework for emulating multiple IP hosts (virtual IP machines) inside a single system kernel. By augmenting and restructuring certain system components, the framework can provide an emulation and testing environment where a single system is capable of transparently representing multiple IP hosts comprising a virtual network. The main advantage of the framework is that existing protocols, applications and network configuration utilities may execute under multiple virtual IP machines without being modified or recompiled. 
The framework significantly reduces the equipment and spatial resources required by test beds and laboratory environments to appropriately conduct experiments and emulations. This paper describes the architecture of the framework. We also describe our implementation in the Linux kernel and analyze the performance and scalability of the framework based on the results obtained from experiments conducted on our implementation. %K keywords %Y %A Arango, Jesus %A Degermark, Mikael %A Efrat, Alon %T An Efficient Flooding Algorithm for Mobile Ad-hoc Networks %D Friday, October 10, 2003 %Z Wed, 03 Jan 03 00:00:00 GMT %R TR03-13 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2003/ %X We present a bandwidth-efficient flooding algorithm intended for use in wireless ad-hoc networks, and demonstrate its viability by simulation. Flooding algorithms solve the problem of delivering a message to all nodes in a network. With standard flooding algorithms, each node broadcasts received messages being flooded in order to ensure delivery to all nodes in the network. Our simulations show that the new algorithm is able to achieve a significant reduction in the number of broadcasts required to reach all nodes while still being able to maintain high coverage and low latencies. The number of broadcast messages required decreases with increasing node density, with an overhead reduction of 57% at a node density of 6.56. The new algorithm is simple to implement and does not require any additional protocol messages. %K keywords %Y %A Visvanathan, Srinivas %A Hartman, John H. 
%T Exploiting Trust in Peer-to-Peer Systems %D Tuesday, August 5, 2003 %Z Wed, 03 Jan 00 00:00:00 GMT %R TR03-09 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2003/TR03-09.ps.Z %X Second-generation peer-to-peer systems have shown a lot of promise, with many desirable properties such as scalability, self-configuration, and automatic load balancing. However, the open nature of these systems also makes them vulnerable to Byzantine attacks. These systems are designed to be fault tolerant in the presence of a few malicious nodes. We believe, however, that even a handful of evil nodes could disrupt the system by exploiting certain weaknesses that we have identified. In these systems, a single malicious node can present multiple identities to the system. This allows it to emulate multiple nodes simultaneously and also gives it limited control over its placement in the overlay networks that are integral to the system. We devised attacks based on these weaknesses to disrupt lookup and storage operations in the peer-to-peer storage systems CFS and PAST. We show that it is possible to exploit these weaknesses to attack these systems and have evaluated the feasibility of these attacks. %K keywords %Y %A Kollipara, Siva %T SFSRO LITE - A Self-Certifying Read-Write File System %D Thursday, June 26, 2003 %Z Wed, 03 Jan 03 00:00:00 GMT %R TR03-07 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2003 %X This report presents a read-write version of SFSRO. SFSRO [2], a self-certifying read-only file system, is a content distribution system providing secure, scalable read-only access to data. Since the data exists as a database, the read-only file system structure is too tightly coupled with the data [2]. Any change in the data will cause the whole file system to be updated and its hierarchy modified. 
To get around this problem, SFSRO creates a completely new database for each version of the file system and incrementally updates the replica file servers. In this report, we present some modifications and extensions to SFSRO. We aim to achieve a file system with SFSRO functionality and guarantees, but without its drawbacks and limitations - suitably called SFSRO Lite (SFSROL). We plan to build a read-write version of SFSRO that uses a 'floating inode table' [5] and supports variable sized blocks. We have currently finished making necessary changes to incorporate the floating inode table. Further research in achieving a read-write version is in progress. Measurements of an implementation show that SFSROL is approximately 0-2% slower than SFSRO. %K keywords %Y %A Rao, Praveen %A Moon, Bongki %T PRIX: Indexing And Querying XML Using Prufer Sequences %D Wednesday, June 25, 2003 %Z Wed, 03 Jan 03 00:00:00 GMT %R TR03-06 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2003/TR03-06.ps.Z %X We propose a new way of indexing XML documents and processing twig patterns in an XML database. Every XML document in the database can be transformed into a sequence of labels by Prufer's method that constructs a one-to-one correspondence between trees and sequences. During query processing, a twig pattern is also transformed into its Prufer sequence. By performing subsequence matching on the set of sequences in the database, and performing a series of refinement phases that we have developed, we can find all the occurrences of a twig pattern in the database. Our approach allows holistic processing of a twig pattern without breaking the twig into root-to-leaf paths and processing these paths individually. Furthermore, we show in the paper that all correct answers are found without any false dismissals or false alarms. Experimental results demonstrate the performance benefits of our proposed techniques. 
%K keywords %Y %A Rajagopalan, Mohan %A Debray, Saumya K. %A Hiltunen, Matti A. %A Schlichting, Richard D. %T Profile Directed Optimization of Event Based Programs %D Friday, June 6, 2003 %Z Wed, 03 Jan 03 00:00:00 GMT %R TR03-05 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2003 %X Events are used as a fundamental abstraction in programs ranging from graphical user interfaces (GUIs) to systems for building customized network protocols. While providing a flexible structuring and execution paradigm, events have the potentially serious drawback of extra execution overhead due to the indirection between modules that raise events and those that handle them. This paper describes an approach to addressing this issue using static optimization techniques. This approach, which exploits the underlying predictability often exhibited by event-based programs, is based on first profiling the program to identify commonly occurring event sequences. A variety of techniques that use the resulting profile information are then applied to the program to reduce the overheads associated with such mechanisms as indirect function calls and argument marshaling. In addition to describing the overall approach, experimental results are given that demonstrate the effectiveness of the techniques. These results are from event-based programs written for X Windows, a system for building GUIs, and Cactus, a system for constructing highly configurable distributed services and network protocols. %K keywords %Y %A Erten, Cesim %A Harding, Philip J. %A Kobourov, Stephen G. %A Wampler, Kevin %A Yee, Gary %T Exploring the Computing Literature Using Temporal Graph Visualization %D Tuesday, June 3, 2003 %Z Wed, 03 Jan 03 00:00:00 GMT %R TR03-04 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2003 %X What are the hottest computer science research topics today? 
Which research areas are experiencing steady decline? How many co-authors are typical for a research paper today and 20 years ago? Who are the most prolific writers? In this paper, we attempt to address these questions as well as study collaboration patterns, research communities, interactions between related research specialties, and the evolution of these characteristics through time. For our analysis we use data from the Association for Computing Machinery's Digital Library of Scientific Literature (ACM Portal) which contains over a hundred thousand research papers and authors. We use a novel technique for visualization of large graphs that evolve through time. Given a dynamic graph, the layout algorithm produces two-dimensional representations of each time-slice, while preserving the mental map of the graph from one slice to the next. A combined view of all the time-slices can also be displayed and explored. Graphs with tens of thousands of vertices and edges, resulting from specific queries to our local copy of the ACM database, are generated and displayed in seconds. The images in this paper are produced by a graph layout tool which uses the dynamic graph layout algorithm. %K keywords %Y %A Kobourov, Stephen %A Collberg, Christian %T Self-Plagiarism in Computer Science %D Tuesday, May 13, 2003 %Z Wed, 03 Jan 00 00:00:00 GMT %R TR03-03 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2003 %X abstract %K keywords %Y %A Krishnaswamy, Arvind %A Gupta, Rajiv %T Instruction Coalescing for 16-bit Code %D Thursday, January 16, 2003 %Z Mon, 06 Jan 03 00:00:00 GMT %R TR03-01 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2003/ %X In the embedded domain, memory usage and energy consumption are critical constraints. 
Embedded processors such as the ARM and MIPS provide a 16-bit instruction set (called Thumb in the case of the ARM CPU family) in addition to the 32-bit instruction set to address these concerns. Using 16-bit instructions one can achieve code size reduction and I-cache energy savings at the cost of performance. This paper presents a novel approach that enhances the performance of 16-bit Thumb code. We have observed that throughout Thumb code there exist Thumb instruction pairs that are equivalent to a single ARM instruction. We have developed enhancements to the processor microarchitecture and the Thumb instruction set to exploit this property. We enhance the Thumb instruction set by incorporating Augmenting eXtensions (AX). A Thumb instruction pair that can be combined into a single ARM instruction is replaced by an AXThumb instruction pair by the compiler. The AX instruction is coalesced with the immediately following Thumb instruction to generate a single ARM instruction at decode time. The enhanced microarchitecture ensures that coalescing does not introduce pipeline delays or increase cycle time, thereby reducing both instruction counts and cycle counts. Using AX instructions and coalescing hardware we are also able to support efficient predicated execution in 16-bit mode. %K keywords %Y %A Trebisky, Tom %T The Skidoo Real-Time Operating System %D Monday, November 25, 2002 %Z Mon, 03 Jan 05 00:00:00 GMT %R TR02-05 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2002/TR02-05.ps.Z %X Embedded systems have needs that are not adequately met by conventional operating systems. Skidoo is a new operating system especially tailored to support embedded systems. Independently scheduled threads are provided that synchronize using semaphores and condition variables. Threads share a common address space and communicate using shared variables. 
Fully preemptive scheduling meets the needs of hard real-time applications. %K keywords %Y %A Baker, Scott %A Hartman, John %T The Mirage NFS Router %D Monday, November 25, 2003 %Z Wed, 03 Jan 02 00:00:00 GMT %R TR02-04 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2002 %X Mirage is a system that aggregates multiple NFS servers into a single, virtual NFS file server. It is interposed between the NFS clients and servers, making the clients believe that they are communicating with a single, large server. Mirage is an NFS router because it routes an NFS request from a client to the proper NFS server, and routes the reply back to the proper client. Mirage also prevents DoS attacks on the NFS protocol, ensuring that all clients receive a fair share of the servers' resources. Mirage is designed to run on an IP router, providing virtualized NFS file service without affecting other network traffic. Experiments with a Mirage prototype show that Mirage effectively virtualizes an NFS server using unmodified clients and servers, and it ensures that legitimate clients receive a fair share of the NFS server even during a DoS attack. Mirage imposes an overhead of only 7% on a realistic NFS workload. %K Mirage, router, NFS %Y %A Moon, Bongki %A Shin, Hyoseop %A Lee, Sukho %T Adaptive and Incremental Processing for Distance Join Queries %D Thursday, August 29, 2002 %Z Wed, 03 Jan 02 00:00:00 GMT %R TR02-03 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2002 %X A spatial distance join is a relatively new type of operation introduced for spatial and multimedia database applications. Additional requirements for ranking and stopping cardinality are often combined with the spatial distance join in on-line query processing or internet search environments. These requirements pose new challenges as well as opportunities for more efficient processing of spatial distance join queries. 
In this paper, we first present an efficient k-distance join algorithm that uses spatial indexes such as R-trees. Bi-directional node expansion and plane-sweeping techniques are used for fast pruning of distant pairs, and the plane-sweeping is further optimized by novel strategies for selecting a sweeping axis and direction. Furthermore, we propose adaptive multi-stage algorithms for k-distance join and incremental distance join operations. Our performance study shows that the proposed adaptive multi-stage algorithms outperform previous work by up to an order of magnitude for both k-distance join and incremental distance join queries, under various operational conditions. %K spatial, cardinality, k-distance %Y %A Al-Bin-Ali, Fahd %T Design and Implementation of an Inter-cell Management System: The Sabino System %D Tuesday, July 2, 2002 %Z Wed, 03 Jan 02 00:00:00 GMT %R TR02-02 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2002 %X Wireless networks based on IEEE 802.11 are becoming increasingly widely deployed. However, 802.11 systems fail to make optimal use of the available network resources because of a lack of coordination between different system components, in particular, the absence of any form of coordination between access points and mobile nodes. Consequently, wireless network installations may not make optimal allocation of mobile nodes to access points, leading to imbalances in the load distribution and potentially reducing the network bandwidth available to users. This thesis presents the Sabino system: a lightweight management architecture that coordinates the actions of access points and mobile nodes to ensure, where possible, a more balanced load distribution and hence better utilization of system resources. 
Sabino as an architectural solution is applicable for many other applications and future wireless deployments in which mobile nodes have different Quality-of-Service requirements and are serviced by different network technologies. Key aspects of the Sabino system include easy and incremental deployment, flexible architectural topology, minimal overhead and effective management of resources. This thesis presents evidence of some of the problems in existing 802.11 systems and it describes the design, the implementation and the evaluation of the Sabino system as an effective solution to these problems. %K keywords %Y %A Hartman, John %T Activating Storage Systems with Agents %D Thursday, June 13, 2002 %Z Wed, 03 Jan 02 00:00:00 GMT %R TR02-01 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2002 %X Swarm is a scalable, modular storage system that allows high-level services to influence low-level storage functions such as data layout, metadata management, and crash recovery via agents. An agent is a program that is attached to data in the storage system and invoked when particular events occur during the data's lifetime. For example, when Swarm needs to write data to disk, agents attached to the data are invoked to determine a layout policy. Agents can be persistent, so that they remain attached to the data they manage until the data are deleted; this allows agents to continue to affect how the data are handled long after the application or storage service that created the data has terminated. Swarm and its agent mechanism are implemented as a Linux kernel module. In this paper, we present Swarm's agent architecture, describe the types of agents that Swarm supports and the infrastructure used to support them, and discuss their performance overhead and security implications. We describe how several storage services and applications use agents, and the benefits they derive from doing so. 
%K keywords %Y %A Cappos, Justin %A Homer, Patrick %T DsCats: Animating Data Structures for CS 2 and CS 3 Courses %D Wednesday, March 14, 2001 %Z Mon, 03 Jan 01 00:00:00 GMT %R TR01-02 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2001/TR01-02.ps.Z %X A new data structure animation tool called DsCats is available for classroom use. This tool supports educator presentations, student experimentation, and programming assignments. It implements a user-centered approach supporting a wide range of detail levels, the ability to jump to any point in the animation, and on-the-fly variations in the data structure during animations. The tool is written in Java with modularity and expandability in mind. %K keywords %Y %A Collberg, Christian %T A Fuzzy Visual Query Language for a Domain-Specific Web Search Engine %D Tuesday, March 13, 2001 %Z Tue, 02 Jan 01 00:00:00 GMT %R TR01-01 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2001 %X Algovista is a web-based search engine that helps programmers find algorithms and implementations that solve specific problems. Algovista is not keyword based but rather requires users to provide - in a very simple textual language - input/output samples that describe the behavior of their needed algorithm. Unfortunately, even this simple language has proven too challenging for casual users. To overcome this problem and make Algovista more accessible to novice programmers, we are designing and prototyping a visual language for creating Algovista queries. Since web users do not have the patience to learn fancy query languages (be they textual or visual), our goal is to make this language and its implementation natural enough to require virtually no explanation or user training. 
%K Algovista, web-based, algorithm, language %Y %A Gupta, Neelam %A Bass, Len %T Designing Software to Reduce Cost of Testing %D Thursday, July 6, 2000 %Z Wed, 03 Jan 00 00:00:00 GMT %R TR00-06 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2000/ %X Software testing is an important and expensive component of the software development life cycle. The testing community has always treated the design of the software to be tested as an input over which they have no control. In this paper, we propose a new approach to reduce the cost of integration testing by influencing the design of the system to be tested. We consider the simple pipe and filter architecture style and analyse its testability for integration testing. Our analysis shows that the size of the test suite required for integration testing is a linear function of the number of modules in the pipe and filter architecture style. In contrast, the size of the test suite required for a general design, with arbitrary communication among its modules, is an exponential function of the number of modules in the design. This illustrates that the cost of the testing stage can be significantly reduced by appropriate selection of the architecture style during the design stage. %K software architecture style, software testing, pipe and filter %Y %A Hiltunen, Matti %A Jaiprakash, Sumita %A Schlichting, Richard %A Ugarte, Carlos %T Fine-Grain Configurability for Secure Communication %D Friday, June 16, 2000 %Z Wed, 03 Jan 00 00:00:00 GMT %R TR00-05 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2000/ %X Current solutions for providing communication security in network applications allow customization of certain security attributes and techniques, but in limited ways and without the benefit of a single unifying framework. 
Here, the design of a highly-customizable extensible service called SecComm is described in which attributes such as authenticity, privacy, integrity, and non-repudiation can be customized in arbitrary ways. With SecComm, applications can open secure communication connections in which only those attributes selected from among a wide range of possibilities are enforced, and are enforced using the strength or technique desired. SecComm has been implemented using Cactus, a system for building configurable communication services. In Cactus, different properties and techniques are implemented as software modules called micro-protocols that interact using an event-driven execution paradigm. This non-hierarchical design approach has a high degree of flexibility, yet provides enough structure and control that it is easy to build collections of micro-protocols realizing a large number of diverse properties. This paper gives an overview of the design and implementation of SecComm, and gives initial performance figures for a prototype implementation running on a cluster of Pentiums using the Mach MK 7.3 operating system. %K keywords %Y %A Debray, Saumya %A Evans, William %A Muth, Robert %A De Sutter, Bjorn %T Compiler Techniques for Code Compaction %D Wednesday, March 15, 2000 %Z Wed, 03 Jan 00 00:00:00 GMT %R TR00-04 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2000/ %X In recent years there has been an increasing trend towards the incorporation of computers into a variety of devices where the amount of memory available is limited. This makes it desirable to try to reduce the size of applications where possible. This paper explores the use of compiler techniques to accomplish code compaction to yield smaller executables. 
The main contribution of this paper is to show that careful, aggressive, interprocedural optimization, together with procedural abstraction of repeated code fragments, can yield significantly better reductions in code size than previous approaches, which have generally focused on abstraction of repeated instruction sequences. We also show how ``equivalent'' code fragments can be detected and factored out using conventional compiler techniques, and without having to resort to purely linear treatments of code sequences as in suffix-tree-based approaches, thereby setting up a framework for code compaction that can be more flexible in its treatment of what code fragments are considered equivalent. Our ideas have been implemented in the form of a binary-rewriting tool that reduces the size of executables by about 30% on the average. %K keywords %Y %A Collberg, Christian %A Thomborson, Clark %T Watermarking, Tamper-Proofing, and Obfuscation -- Tools for Software Protection %D Thursday, Feb 10, 2000 %Z Wed, 03 Jan 00 00:00:00 GMT %R TR00-03 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2000/ %X abstract %K keywords %Y %A Collberg, Christian %A Davey, Sean %A Proebsting, Todd %T Language-Agnostic Program for Rendering for Presentation, Debugging and Visualization %D Thursday, Feb 10, 2000 %Z Wed, 03 Jan 00 00:00:00 GMT %R TR00-02 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2000/ %X abstract %K keywords %Y %A Collberg, Christian %A Proebsting, Todd %T AlgoVista - A Search Engine for Computer Scientists %D Thursday, January 27, 2000 %Z Mon, 03 Jan 00 00:00:00 GMT %R TR00-1 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/2000/TR %X We describe AlgoVista, a web-based search engine designed to allow applied computer scientists to classify problems and find algorithms and implementations that solve these problems. 
Unlike other search engines, AlgoVista is not keyword based. Rather, users provide a set of input==>output samples that describe the behavior of the problem they wish to classify. This type of query-by-example requires no knowledge of specialized terminology, only an ability to formalize the problem. The search mechanism of AlgoVista is based on a novel application of program checking, a technique developed as an alternative to program verification and testing. %K AlgoVista, web-based, keyword %Y %A Hyoseop Shin %A Bongki Moon %A Sukho Lee %T Adaptive Multi-Stage Distance Join Processing %D Monday, October 18, 1999 %Z Wed, 08 Jan 99 00:00:00 GMT %R TR99-14 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1999/TR99-14.ps %X A spatial distance join is a relatively new type of operation introduced for spatial and multimedia database applications. Additional requirements for ranking and stopping cardinality are often combined with the spatial distance join in on-line query processing or internet search environments. These requirements pose new challenges as well as opportunities for more efficient processing of spatial distance join queries. In this paper, we first present an efficient $k$-distance join algorithm that uses spatial indexes such as R-trees. Bi-directional node expansion and plane-sweeping techniques are used for fast pruning of distant pairs, and the plane-sweeping is further optimized by novel strategies for selecting a sweeping axis and direction. Furthermore, we propose adaptive multi-stage algorithms for $k$-distance join and incremental distance join operations. Our performance study shows that the proposed adaptive multi-stage algorithms outperform previous work by up to an order of magnitude for both $k$-distance join and incremental distance join queries, under various operational conditions. 
%K keywords %Y %A Muth, Robert %A Debray, Saumya %T On the Complexity of Flow-Sensitive Dataflow Analyses %D Wednesday, August 4, 1999 %Z Wed, 08 Jan 99 00:00:00 GMT %R TR99-12 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1999/TR99-12.ps %X This paper attempts to address the question of why certain dataflow analysis problems can be solved efficiently, but not others. We focus on flow-sensitive analyses, and give a simple and general result that shows that analyses that require the use of relational attributes for precision must be PSPACE-hard in general. We then show that if the language constructs are slightly strengthened to allow a computation to maintain a very limited summary of what happens along an execution path, inter-procedural analyses become EXPTIME-hard. We discuss applications of our results to a variety of analyses discussed in the literature. Our work elucidates the reasons behind the complexity results given by a number of authors, improves on a number of such complexity results, and exposes conceptual commonalities underlying such results that are not readily apparent otherwise. %K PSPACE-hard, EXPTIME-hard %Y %A Verkhedkar, Sameer A. %T A Highly Customizable System Monitoring and Control Tool %D Monday, July 19, 1999 %Z Wed, 08 Jan 99 00:00:00 GMT %R TR99-11 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1999/TR99-11.ps %X The management of a large collection of machines connected by a network and the software running on them is a difficult task. For example, processes may fail causing service disruptions, or users may exceed their quota of system resources causing shortages. If the site has a large number of machines, keeping track of all such problems can be very cumbersome. This thesis presents a system monitoring tool that addresses these problems. 
The tool monitors different aspects of the system state, such as processes running on each machine and their resource usage, and reports the information to the administrator via a graphical user interface (GUI). The tool can also be used for controlling remote machines through the GUI by starting or stopping processes, and for automatically restarting failed processes. Since the monitoring requirements differ for every system, the monitoring tool is highly customizable and extensible at a fine-grain level. This is achieved by implementing the tool as a collection of modules called micro-protocols that can be configured in various combinations to provide customized variants of the monitoring service. The implementation is based on Cactus, a framework for developing middleware offering fine-grain customizability for Quality of Service (QoS) attributes related to dependability, real-time and security in distributed systems. This thesis describes the design and implementation of an architecture for system administration based on a highly customizable system monitoring tool, and validates the Cactus approach for developing highly customizable software at the application layer. %K thesis %Y %A Moon, Bongki %A Jagadish, H.V. %A Faloutsos, Christos %A Saltz, Joel H. %T Analysis of the Clustering Properties of Hilbert Space-filling Curve %D May 26, 1999 %Z Wed, 08 Jan 99 00:00:00 GMT %R TR99-10 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1999/TR99-10.ps %X Several schemes for linear mapping of multidimensional space have been proposed for many applications such as access methods for spatio-temporal databases, image compression and so on. In all these applications, one of the most desired properties from such linear mappings is clustering, which means the locality between objects in the multidimensional space is preserved in the linear space. 
It is widely believed that the Hilbert space-filling curve achieves the best clustering [1, 16]. In this paper we provide closed-form formulas of the number of clusters required by a given query region of an arbitrary shape (e.g., polygons and polyhedra) for the Hilbert space-filling curve. Both the asymptotic solution for a general case and the exact solution for a special case generalize the previous work [16], and they agree with the empirical results that the number of clusters depends on the hyper-surface area of the query region and not on its hyper-volume. We have also shown that the Hilbert curve achieves better clustering than the z curve. From the practical point of view, the formulas given in this paper provide a simple measure which can be used to predict the required disk access behaviors and hence the total access time. %K Hilbert, clustering, linear space %Y %A Hiltunen, Matti %A Jaiprakash, Sumita %A Schlichting, Richard %T Exploiting Fine-Grain Configurability for Secure Communication %D Wednesday, April 28, 1999 %Z Wed, 08 Jan 99 00:00:00 GMT %R TR99-08 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1999/TR99-08.ps %X Current protocols such as IPSec and TLS that provide communication security for network applications allow customization of certain security attributes and techniques, but in limited ways and without the benefit of a single unifying framework. Here, the design of a highly-customizable extensible service called SecComm is described in which attributes such as authenticity, privacy, integrity, and non-repudiation can be customized in arbitrary ways. With SecComm, applications can open secure communication connections in which only those attributes selected from among a wide range of possibilities are enforced, and are enforced using the strength or technique desired. 
SecComm is being implemented using the Mach MK 7.3 operating system and Cactus, a system for building configurable communication services. In Cactus, different properties and techniques are implemented as software modules called micro-protocols that interact using an event-driven execution paradigm. This design approach has a high degree of flexibility, yet provides enough structure and control that it is easy to build collections of micro-protocols realizing a large number of diverse properties. %K fine-grain, SecComm, Cactus %Y %A Debray, Saumya %A Evans, William %A Muth, Robert %T Compiler Techniques for Code Compression %D Friday, April 23, 1999 %Z Wed, 08 Jan 99 00:00:00 GMT %R TR99-07 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/TR99-07.ps %X In recent years there has been an increasing trend towards the incorporation of computers into a variety of devices where the amount of memory available is limited. This makes it desirable to try to reduce the size of applications where possible. This paper explores the use of compiler techniques to accomplish code compression to yield smaller executables. The main contribution of this paper is that, by showing how "equivalent" code fragments can be detected and factored out without having to resort to purely linear treatments of code sequences as in suffix-tree-based approaches, it sets up a framework for code compression that can be more flexible in its treatment of what code fragments are considered equivalent. Our ideas have been implemented in the form of a binary-rewriting tool that is able to achieve significantly better compression than previous approaches. %K keywords %Y %A Hartman, John H. 
%A Murdock, Ian %A Spalink, Tammo %T The Swarm Scalable Storage System %D Thursday, April 1, 1999 %Z Wed, 08 Jan 99 00:00:00 GMT %R TR99-06 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1999/TR99-06.ps %X Swarm is a storage system that provides scalable, reliable, and cost-effective data storage. Swarm is based on storage servers, rather than file servers; the storage servers are optimized for cost-performance and aggregated to provide high-performance data access. Swarm uses a striped log abstraction to store data on the storage servers. This abstraction simplifies storage allocation, improves file access performance, balances server loads, provides fault-tolerance through computed redundancy, and simplifies crash recovery. We have developed a Swarm prototype using a cluster of Linux-based personal computers as the storage servers and clients; the clients access the servers via the Swarm-based Sting file system. Our performance measurements show that a single Swarm client can write to two storage servers at 3.0 MB/s, while four clients can write to eight servers at 16.0 MB/s. %K Swarm, scalable, storage %Y %A Chiu, Wanda %A Hartman, John H. %T Building Caches using Multi-Threaded State Machines %D Monday, February 15, 1999 %Z Wed, 08 Jan 99 00:00:00 GMT %R TR99-05 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1999/TR99-05.ps %X Designing a client-side cache for a distributed file system is complicated by concurrent accesses by applications, network communication with the server, and complex relationships between the data in the cache. Despite these difficulties, caches are usually built using threads or finite state machines as the programming model, even though both are inadequate for the task. 
We have created an infrastructure for cache development based on multi-threaded state machines, which attempts to capitalize on the benefits of both programming models. The state machine allows the global state of the cache to be carefully controlled, allowing interactions between concurrent cache operations to be reasoned about and verified. Threads allow individual operations to maintain their own local state, and propagate that state across transitions of the state machine. We created a prototype of this infrastructure and used it to implement a write-through file block cache that provides multiple-reader/single-writer access to its blocks; although this is a relatively simple cache protocol, it is much easier to implement using multi-threaded state machines than using either threads or finite state machines alone. %K state machines, caches, multi-threaded %Y %A Hiltunen, Matti A. %A Immanuel, Vijaykumar %A Schlichting, Richard D. %T Supporting Customized Failure Models for Distributed Software %D Thursday, February 11, 1999 %Z Wed, 08 Jan 99 00:00:00 GMT %R TR99-04 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1999/TR99-04.ps %X The cost of employing software fault-tolerance techniques in distributed systems is strongly related to the type of failures to be tolerated. For example, in terms of the amount of redundancy required and execution time, tolerating a processor crash is much cheaper than tolerating arbitrary (or Byzantine) failures. The tradeoff, of course, is that making stronger assumptions about failures lessens the degree of fault coverage provided by the system. This paper describes an approach to constructing configurable services for distributed systems that allows easy customization of the type of failures to tolerate. 
For example, using our approach, it is possible to configure custom services across a spectrum of possibilities, from a very efficient but unreliable server group that does not tolerate any failures, to a less efficient but reliable group that tolerates crash, omission, timing, or arbitrary failures. The approach is based on building configurable services as collections of software modules called micro-protocols. Each micro-protocol implements a different semantic property or property variant, and interacts with other micro-protocols using an event-driven model provided by a runtime system. The net result is an enhanced ability to explicitly manage the tradeoff between the level of reliability and cost. In addition to facilitating the choice of failure model, our approach allows service properties such as message ordering and delivery atomicity to be customized for each application. %K keywords %Y %A Miller, Susan J. %A Myers, Gene %T The FAKtory DNA Sequence Fragment Assembly System %D Wednesday, February 3, 1999 %Z Wed, 08 Jan 99 00:00:00 GMT %R TR99-03 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1999/TR99-03.ps %X FAKtory is a software environment for supporting DNA sequencing projects. FAKtory runs on machines configured with the UNIX operating system. It is called FAKtory because it employs our FAKII Fragment Assembly Kernel that is a C-library of routines incorporating our best algorithms for solving the shotgun assembly problem. The system is highly configurable and can perform fragment clipping, prescreening, and tagging functions, shotgun assembly with or without constraints, and sequence finishing. An extensible input/output mechanism permits FAKtory to be inserted into any informatics environment. 
%K keywords %Y %A Myers, Gene %T The GAF Data Exchange Format %D Wednesday, February 3, 1999 %Z Wed, 08 Jan 99 00:00:00 GMT %R TR99-02 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1999/TR99-02.ps %X GAF is a General Assembly Format for exchanging data in a sequencing project. It provides constructs for specifying sequences, quality numbers, assemblies, multi-alignments, overlaps between fragments, and constraints between them. It is designed for coding the information at any point in the assembly process, so, for example, a GAF file can specify just a collection of fragments, a collection of fragments and assigned quality numbers, a collection of fragments and their overlaps, an assembly of some fragments, assemblies of cooperative software processes handling different aspects of an overall sequencing informatics pipeline. A very simple translation (using a simple awk script) can convert the CAF format introduced by the Sanger group into a subset of the GAF format. %K keywords %Y %A Miller, Susan J. %A Jain, Mudita %A Anson, Eric %A Myers, Gene %T Interface for the FAKII Fragment Assembly Kernel %D Wednesday, February 3, 1999 %Z Wed, 08 Jan 99 00:00:00 GMT %R TR99-01 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1999/TR99-01.ps %X This document describes the C programming language interface to our FAKII Fragment Assembly Kernel library. Inputs to the Fragment Assembly Kernel are (1) DNA fragment sequences from potentially inaccurate sequencing experiments, and (2) optional constraints on fragment assembly such as known fragment overlaps or relative fragment orientation. Fragment sequence version control is supported. The Fragment Assembly Kernel produces the most probable reconstructions of the original DNA sequence from the fragments, subject to any specified constraints. Each fragment assembly includes multiple sequence alignment and consensus sequences. 
Multiple sequence alignment editing capabilities are provided to allow manual correction of sequence errors. %K keywords %Y %A Debray, Saumya %A Muth, Robert %A Watterson, Scott %T Link-time Improvement of Scheme Programs %D Wednesday, December 16, 1998 %Z Wed, 08 Jan 98 00:00:00 GMT %R TR98-16 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1998/TR98-16.ps.Z %X Optimizing compilers typically limit the scope of their analyses and optimizations to individual modules. This has two drawbacks: first, library code cannot be optimized together with their callers, which implies that reusing code through libraries incurs a penalty; and second, the results of analysis and optimization cannot be propagated from an application module written in one language to a module written in another. A possible solution is to carry out (additional) program optimization at link time. This paper describes our experiences with such optimization using two different optimizing Scheme compilers, and several benchmark programs, via alto, a link-time optimizer we have developed for the DEC Alpha architecture. Experiments indicate that significant performance improvements are possible via link-time optimization even when the input programs have already been subjected to high levels of compile-time optimization. 
%K optimizing compilers, link-time %Y %A Soo, Michael %T Constructing a Temporal Database Management System %D Tuesday, December 15, 1998 %Z Wed, 08 Jan 98 00:00:00 GMT %R TR98-15 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/ %X abstract %K dissertation, not available online %Y %A Muth, Robert %A Debray, Saumya %A Watterson, Scott %A de Bosschere, Koen %T alto: A Link-Time Optimizer for the DEC Alpha %D Wednesday, December 9, 1998 %Z Wed, 08 Jan 98 00:00:00 GMT %R TR98-14 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1998/TR98-14.ps.Z %X Traditional optimizing compilers are limited in the scope of their optimizations by the fact that only a single function, or possibly a single module, is available for analysis and optimization. In particular, this means that library routines cannot be optimized to specific calling contexts. Other optimization opportunities, exploiting information not available before link time such as addresses of variables and the final code layout, are often ignored because linkers are traditionally unsophisticated. A possible solution is to carry out whole-program optimization at link time. This paper describes {\tt alto}, a link-time optimizer for the DEC Alpha architecture. It is able to realize significant performance improvements even for programs compiled with a good optimizing compiler with a high level of optimization. The resulting code is considerably faster than that obtained using the OM link-time optimizer, even when the latter is used in conjunction with profile-guided and inter-file compile-time optimizations. 
%K supersedes TR96-15 %Y %A Spalink, Tammo %A Hartman, John %A Gibson, Garth %T The Effect of Mobile Code on File Service %D Monday, November 16, 1998 %Z Wed, 08 Jan 98 00:00:00 GMT %R TR98-12 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1998/TR98-12.ps.Z/ %X Mobile code promises to improve the functionality and performance of applications, but may have a detrimental effect on overall system performance. In this paper we consider the effect of moving an application from a client to a file server, both on the application and the server. Under what circumstances does application performance improve, and does it come at the expense of other (non-mobile) background applications using the same server? We use a trace-driven simulation to measure the effect of mobile code, allowing system parameters such as the size of the server memory and server speed relative to client speed to be varied. We found that several factors influence the benefit of mobile code. Server memory does not appear to be a significant problem; relatively small server caches have a high hit rate even when shared with mobile code. The relative CPU performance of the client and server has a bigger effect on system performance: mobile code should not be run on the server if its CPU is a bottleneck. %K mobile code %Y %A Moon, Bongki %A Vega Lopez, Ines Fernando %A Immanuel, Vijaykumar %T Scalable Algorithms for Large-Scale Temporal Aggregation %D Monday, November 2, 1998 %Z Wed, 08 Jan 98 00:00:00 GMT %R TR98-11 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1998/TR98-11.ps.Z/ %X The ability to model time-varying natures is essential to many database applications such as data warehousing and mining. However, the temporal aspects provide many unique characteristics and challenges for query processing and optimization. 
Among the challenges is computing temporal aggregates, which is complicated by having to compute temporal grouping. In this paper, we introduce a variety of temporal aggregation algorithms that overcome major drawbacks of previous work. First, for small-scale aggregations, both the worst-case and average-case processing time have been improved significantly. Second, for large-scale aggregations, the proposed algorithms can deal with a database that is substantially larger than the size of available memory. Third, the parallel algorithm designed on a shared-nothing architecture achieves scalable performance by delivering nearly linear scale-up and speed-up. The contributions made in this paper are particularly important because the rate of increase in database size and response time requirements has out-paced advancements in processor and mass storage technology. %K large-scale temporal aggregation %Y %A Peterson, Larry L. %A Spatscheck, Oliver %T Defending Against Denial of Service Attacks in Scout %D Thursday, September 24, 1998 %Z Wed, 08 Jan 98 00:00:00 GMT %R TR98-10 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1998/TR98-10.ps.Z/ %X We describe a two-dimensional architecture for defending against denial of service attacks. In one dimension, the architecture accounts for all resources consumed by each I/O path in the system; this accounting mechanism is implemented as an extension to the path object in the Scout operating system. In the second dimension, the various modules that define each path can be configured in separate protection domains; we implement hardware enforced protection domains, although other implementations are possible. 
The resulting system---which we call Escort---is the first example of a system that simultaneously does end-to-end resource accounting (thereby protecting against denial of service attacks) and supports multiple protection domains (thereby allowing untrusted modules to be isolated from each other). The paper describes the Escort architecture and its implementation in Scout, and reports a collection of experiments that measure the costs and benefits of using Escort to protect a web server from denial of service attacks. %K Scout, Escort, Defending %Y %A Gendrano, Jose Alvin %A Huang, Bruce C. %A Rodrigue, Jim M. %A Moon, Bongki %A Snodgrass, Richard T. %T Parallel Algorithms for Computing Temporal Aggregates %D Monday, August 31, 1998 %Z Wed, 08 Jan 98 00:00:00 GMT %R TR98-09 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1998/TR98-09.ps.Z/ %X The ability to model the temporal dimension is essential to many applications. Furthermore, the rate of increase in database size and response time requirements has out-paced advancements in processor and mass storage technology, leading to the need for parallel temporal database management systems. In this paper, we introduce a variety of parallel temporal aggregation algorithms for a shared-nothing architecture based on the sequential Aggregation Tree algorithm. Via an empirical study, we found that the number of processing nodes, the partitioning of the data, the placement of results, and the degree of data reduction effected by the aggregation impacted the performance of the algorithms in different ways. We designed the Time Division Merge algorithm to produce distributed result placement, as differentiated from the centralized result strategies used by the other proposed algorithms. 
For centralized results and high data reduction, we found that the Pairwise Merge algorithm was preferred regardless of the number of processing nodes, but for low data reduction it was only preferred up to 32 nodes, while a variant of Time Division Merge was best for larger configurations. %K keywords %Y %A Baker, Scott M. %A Moon, Bongki %T Scalable Web Server Design for Distributed Data Management %D Wednesday, August 19, 1998 %Z Wed, 08 Jan 98 00:00:00 GMT %R TR98-08 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1998/TR98-08.ps.Z/ %X Traditional techniques for a distributed web server design rely on manipulation of central resources, such as routers or DNS services, to distribute requests designated for a single IP address to multiple web servers. The goal of the Distributed Cooperative Web Server (DCWS) system development is to explore application-level techniques for distributing web content. We achieve this by dynamically manipulating the hyperlinks stored within the web documents themselves. The DCWS system effectively eliminates the bottleneck of centralized resources, while balancing the load among distributed web servers. DCWS servers may be located in different networks, or even different continents and still balance load effectively. DCWS system design is fully compatible with existing HTTP protocol semantics and existing web client software products. %K DCWS, HTTP %Y %A Datta, Anindya %A Moon, Bongki %A Ramamritham, Krithi %A Thomas, Helen %A Viguier, Igor %T Have Your Data and Index It, too %D Monday, August 17, 1998 %Z Wed, 08 Jan 98 00:00:00 GMT %R TR98-07 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1998/TR98-07.ps.Z/ %X Two possible strategies may be utilized to enhance the efficiency of processing OLAP queries: (a) precomputation strategies (e.g., view materialization, realizing data cubes), and (b) ad-hoc strategies. 
While a significant amount of work has been done in developing precomputation strategies, it is generally recognized that it is difficult to materialize the answers to all possible queries. Thus, ad-hoc querying must be supported in data warehouses. This realization has sparked an interest in exploring indexing strategies suitable for OLAP queries. There appears to have been relatively little work done in ad-hoc query support for data warehouses~\cite{og:95,oq:97,sarawagi:97,kr:98}. In this paper we propose {\em DataIndexes} as a new paradigm for storing the base data. {\em An attractive feature of DataIndexes is that they serve as indexes as well as the store of base data.} Thus, DataIndexes actually define a physical design strategy for a data warehouse where the indexing, for all intents and purposes, comes for ``free''. We also present two efficient algorithms for performing star-joins with DataIndexes. In addition, we present a mathematical analysis of all the indexes presented by O'Neil and Quass as well as our DataIndexes and present analytical expressions categorizing the cost of query evaluation using these structures for range selections and star-joins, two common classes of queries in OLAP. These aid in performing an analysis yielding precise ``break-even'' points for comparing these indexing alternatives. Overall, it turns out that DataIndexes are very attractive in a wide variety of cases in terms of enhancing the performance of range and star-join queries in data warehouses. %K data, index, OLAP %Y %A Hiltunen, Matti %A Schlichting, Richard D. %A Han, Xiaonan %A Cardozo, Melvin M. 
%A Das, Rajsekhar %T Real-Time Dependable Channels: Customizing QoS Attributes for Distributed Systems %D Wednesday, July 1, 1998 %Z Wed, 08 Jan 98 00:00:00 GMT %R TR98-06 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1998/TR98-06.ps.Z/ %X Communication services that provide enhanced Quality of Service (QoS) guarantees related to dependability and real time are important for many applications in distributed systems. This paper presents real-time dependable (RTD) channels, a communication-oriented abstraction that can be configured to meet the QoS requirements of a variety of distributed applications. This customization ability is based on using CactusRT, a system that supports the construction of middleware services out of software modules called micro-protocols. Each micro-protocol implements a different semantic property or property variant, and interacts with other micro-protocols using an event-driven model supported by the CactusRT runtime system. In addition to RTD channels, CactusRT and its implementation are described. This prototype executes on a cluster of Pentium PCs running the OpenGroup/RI MK 7.3 Mach real-time operating system and CORDS, a system for building network protocols based on the x-kernel. 
%K keywords %A Sarkar, Prasenjit %T Title %D Wednesday, July 1, 1998 %Z Wed, 08 Jan 98 00:00:00 GMT %R TR98-05 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1998/TR98-05.ps.Z/ %X abstract %K dissertation %Y %A Yeatts, Andrey %T A Graphical User Interface Design Paradigm Based on Production Rules %D Tuesday, June 9, 1998 %Z Wed, 08 Jan 97 00:00:00 GMT %R TR98-04 %I The Department of Computer Science, University of Arizona %U %X abstract %K PhD dissertation %Y not available online %A Peterson, Larry %T Title %D Friday, March 13, 1998 %Z Wed, 08 Jan 98 00:00:00 GMT %R TR98-03 %I The Department of Computer Science, University of Arizona %U %X abstract %K keywords %Y %A Sundaram, Rajesh %T Design and Implementation of the Swarm Storage Server %D Tuesday, March 10, 1998 %Z Wed, 08 Jan 98 00:00:00 GMT %R TR98-02 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1998/TR98-02.ps.Z/ %X The Swarm storage system uses log-based striping to achieve high performance. Clients collect application writes in a log and stripe the log across multiple storage servers to aggregate server bandwidth. The Swarm storage server has been designed to meet various requirements of the Swarm storage system. These include high performance for data-intensive operations, rapid crash recovery, security support and atomic interface routines. The design of the Swarm storage server is simple enough to allow its implementation as a network appliance. %K Swarm, log-based, atomic interface routines %Y %A Spatscheck, Oliver %A Hansen, Jorgen S. %A Hartman, John H. %A Peterson, Larry %T Optimizing TCP Forwarder Performance %D Monday, February 9, 1998 %Z Mon, 09 Feb 98 00:00:00 GMT %R TR98-01 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1998/TR98-01.ps.Z/ %X A TCP forwarder is a network node that establishes and forwards data between a pair of TCP connections. 
For example, a firewall that places a proxy between a TCP connection to an external host and a TCP connection to an internal host---for the purpose of implementing access control to a resource on the internal host---is a TCP forwarder. Once the proxy approves the access, it simply forwards data from one connection to the other. We use the term {\it TCP forwarding} to describe indirect TCP communication via a proxy in general. This paper characterizes the behavior of TCP forwarding, and illustrates the role TCP forwarding plays in common network services like firewalls and HTTP proxies. We introduce an optimization technique, called {\it connection splicing}, that can be applied to a TCP forwarder, and report the results of a performance study designed to evaluate its impact. Connection splicing has the effect of improving the performance of TCP forwarding by a factor of two to four, making it competitive with the performance of an IP router running on the same hardware. %K TCP, network, node, internal host, firewalls %Y %A Mosberger, David %T Message Library Design Notes %D Tuesday, November 25, 1997 %Z Wed, 08 Jan 97 00:00:00 GMT %R TR97-19 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1997/TR97-19.ps.Z %X This document describes the current implementation of the x-kernel message library. The focus is on its data structures and the underlying principles. This document does not describe the message library's interface or how it is used. Please refer to the x-kernel Programmer's Manual [1] and the x-kernel Tutorial [2] for that purpose. %K x-kernel %Y %A Mosberger, David %T Map Library Design Notes %D Tuesday, November 25, 1997 %Z Wed, 08 Jan 97 00:00:00 GMT %R TR97-18 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1997/TR97-18.ps.Z %X This document describes the current implementation of the x-kernel map library. 
The focus is on its underlying data structures and algorithms. This document does not describe the map library's interface or how it is used. Please refer to the x-kernel Programmer's Manual [1] and the x-kernel Tutorial [2] for that purpose. %K x-kernel %Y %A Spatscheck, Oliver %A Peterson, Larry %T Escort: A Path-Based OS Security Architecture %D Monday, November 17, 1997 %Z Wed, 08 Jan 97 00:00:00 GMT %R TR97-17 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1997/TR97-17.ps.Z %X Escort is the security architecture for Scout, a configurable operating system designed for network appliances. Scout is unique in that it is designed around {\it paths}---a communication-centric abstraction that encapsulates information flows through the system---rather than the more traditional processes and servers. Scout uses paths to make end-to-end resource allocation decisions. Escort extends this idea to isolate these information flows, as well as to provide end-to-end accountability. This paper introduces the Escort security architecture, shows how it can be used to enforce common security policies, and evaluates its design according to several well-established criteria. %K Escort, Scout %Y %A Hartman, John H. %A Peterson, Larry P. %A Bavier, Andy %A Bigot, Peter %A Bridges, Patrick %A Montz, Brady %A Piltz, Rob %A Proebsting, Todd %A Spatscheck, Oliver %T Joust: A Platform for Communication-Oriented Liquid Software %D Monday, November 17, 1997 %Z Wed, 08 Jan 97 00:00:00 GMT %R TR97-16 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1997/TR97-16.ps.Z %X Joust is a software platform for liquid software---code that flows easily from machine to machine. Liquid software makes it easier to maintain, debug, update, and customize networked systems. 
One of the most interesting applications of liquid software is to interject it into the nodes of a network, allowing network functionality, such as routing, to be customized. Additional features, such as special-purpose congestion control and filtering algorithms, are also easily added. The challenge is to develop a communication-oriented platform for liquid software, one in which the focus is the efficient transfer of data, not high-performance computation. To this end we have designed and implemented Joust, which consists of a complete re-implementation of the Java virtual machine (including both the runtime system and a just-in-time compiler), running on the Scout operating system (a configurable, communication-oriented OS). The result is a configurable, high-performance platform for running communication-oriented liquid software. We present the results of implementing three different liquid software applications on Joust, including a prototype architecture for active networks. %K Joust, Liquid Software, Java %Y %A Bavier, Andy %A Montz, Brady %A Peterson, Larry %T Predicting MPEG Execution Times %D Monday, November 3, 1997 %Z Wed, 08 Jan 97 00:00:00 GMT %R TR97-15 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1997/TR97-15.ps.Z %X This paper reports on a set of experiments that measure the amount of CPU processing needed to decode MPEG-compressed video in software. These experiments were designed to discover indicators that could be used to predict how many cycles are required to decode a given frame. Such predictors can be used to do more accurate CPU scheduling. We found that by considering both frame type and size, it is possible to construct a linear model of MPEG decoding with $R^2$ values of 0.97 and higher. Moreover, this model can be used to predict decoding times at both the frame and packet level that are almost always accurate to within 25% of the actual decode times. 
This is a surprising result given the large variability in MPEG decoding times, and suggests that it is feasible to design systems that make quality of service guarantees for MPEG-encoded video, rather than less variable encodings, such as JPEG. %K keywords %Y %A Lambright, H. Dan %T Automated Verification of Mobile Code %D October 24, 1997 %Z Wed, 08 Jan 97 00:00:00 GMT %R TR97-14 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1997/TR97-14.ps.Z %X In this thesis, we introduce a new technique to automate the verification of mobile code. Using dataflow analysis techniques, the verifier can check whether data passed between trusted software components is illegally modified by untrusted mobile code. We show that this analysis is powerful enough to make significant guarantees about whether the program will access system resources safely. Furthermore, by rendering the verification transparent to the user, the security system is not vulnerable to human error, or dependent on the user's technical abilities. Other verification techniques do not share these advantages. We describe what requirements enable this analysis, explore its limitations, and present prototype software that implements the idea. %K dissertation %Y %A Debray, Saumya %A Muth, Robert %A Weippert, Matthew %T Alias Analysis of Executable Code %D July 18, 1997 %Z Wed, 08 Jan 97 00:00:00 GMT %R TR97-13 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1997/TR97-13.ps.Z %X Recent years have seen increasing interest in systems that reason about and manipulate executable code. Such systems can generally benefit from information about aliasing. Unfortunately, most existing alias analyses are formulated in terms of high-level language features, and are unable to cope with features, such as pointer arithmetic, that pervade executable programs. 
This paper describes a simple algorithm that can be used to obtain aliasing information for executable code. In order to be practical, the algorithm is careful to keep its memory requirements low, sacrificing precision where necessary to achieve this goal. Experimental results indicate that it is nevertheless able to provide a reasonable amount of information about memory references across a variety of benchmark programs. %K keywords %Y %A Bhatti, Nina T. %A Hiltunen, Matti A. %A Schlichting, Richard D. %A Chiu, Wanda %T Coyote: A System for Constructing Fine-Grain Configurable Communication Services %D July 7, 1997 %Z Wed, 08 Jan 97 00:00:00 GMT %R TR97-12 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1997/TR97-12.ps.Z %X Communication-oriented abstractions such as atomic multicast, group RPC, and protocols for location-independent mobile computing can simplify the development of complex applications built on distributed systems. This paper describes Coyote, a system that supports the construction of highly modular and configurable versions of such abstractions. Coyote extends the notion of protocol objects and hierarchical composition found in existing systems with support for finer-grain objects called micro-protocols that implement individual semantic properties of the target service. A customized service is constructed by selecting micro-protocols based on their semantic guarantees and configuring them together with a standard runtime system to form a composite protocol implementing the service. Micro-protocols within a composite protocol can share data and are executed using an event-driven paradigm that enhances configurability. The overall approach is described and illustrated with examples of services that have been constructed using Coyote, including atomic multicast, group RPC, membership, and mobile computing protocols. 
A prototype implementation based on extending {\it x}-kernel version 3.2 running on Mach MK82 with support for micro-protocols is also presented, together with performance results from a suite of micro-protocols from which over 60 variants of group RPC can be constructed. %K keywords %Y %A Jain, Mudita %T Algorithms for Physical Mapping Using Unique Probes %D June 9, 1997 %Z Wed, 08 Jan 97 00:00:00 GMT %R TR97-11 %I The Department of Computer Science, University of Arizona %U %X DNA molecules are sequences of characters over a four-letter alphabet. Determining the text of the DNA sequence contained in human cells is the goal of the Human Genome Project. The structure of a DNA sequence is reconstructed from a set of shorter fragments sampled from it at unknown locations, as it is usually too long to be determined directly. We consider the problem when the fragments are very long, and each fragment has a fingerprint consisting of the presence of two or three pre-selected, smaller sequences called probes within it. These probes have a unique location along the original DNA sequence. The fingerprints contain false negative and false positive errors, and the fragments may be chimeric. A physical map of a DNA sequence is a reconstruction of the order of the probes and fragments along it. In short, given a collection of fragments, with fingerprints for each fragment taken from a collection of probes, and parameters that bound the rates of false negatives, false positives, and chimeras in the input data, the problem is to find the most likely probe ordering. Physical mapping is NP-complete when the input data contains errors. To construct physical maps we first determine neighbourhoods of probes and clones that are highly likely to be adjacent on the original DNA sequence. We then use a new, versatile integer programming formulation of the problem, to derive heuristics for ordering probes within neighbourhoods. 
This formulation provides a single, uniform representation for diverse data such as end-clone probes and in-situ hybridization, and provides a natural medium for the integration of previously constructed maps with newer data. We also present an ordering heuristic based upon end-clone data. Finally, we connect these local permutations into a larger, more global probe permutation. For this we use heuristics that have at their core previously mapped data. All heuristics are implemented and evaluated by comparing the computed probe orderings to the original probe orderings for simulated data. %K dissertation; not available online %Y %A Han, Xiaonan %A Hiltunen, Matti A. %A Schlichting, Richard D. %T Supporting Configurable Real-Time Communication Services %D June 5, 1997 %Z Wed, 08 Jan 97 00:00:00 GMT %R TR97-10 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1997/TR97-10.ps.Z %X abstract %K keywords %Y %A Evans, William %A Kirkpatrick, David %A Townsend, Gregg %T Right Triangular Irregular Networks %D May 30, 1997 %Z Wed, 08 Jan 97 00:00:00 GMT %R TR97-09 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1997/TR97-09.ps.Z %X We describe a hierarchical data structure for representing a digital terrain (height field) which contains approximations of the terrain at different levels of detail. The approximations are based on triangulations of the underlying two-dimensional space using right-angled triangles. The methods we discuss allow the approximation to precisely represent the surface in certain areas while coarsely approximating the surface in others. Thus, for example, the area close to an observer may be represented with greater detail than areas which lie outside their field of view. We discuss the application of this hierarchical data structure to the problem of interactive terrain visualization. 
We point out some of the advantages of this method in terms of memory usage and speed. %K keywords %Y %A Coffman, Jr., E.G. %A Downey, Peter J. %A Winkler, Peter %T A Note on Packing Rectangles in Groups %D May 15, 1997 %Z Wed, 08 Jan 97 00:00:00 GMT %R TR97-07 %I Department of Computer Science, The University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1997/TR97-07.ps.Z %X Rectangles from a list of length n are packed into a unit width strip. The rectangles have dimensions independently chosen from a uniform distribution on [0, 1], and the packing objective is to minimize the expected height of the packing of n items. The packing algorithms of interest must operate on-line, as well as adhere to a constraint reminiscent of the Tetris game: rectangles arrive from the top and must be moved at arrival without overlap within the strip to reach their final placement. This paper assumes no rotation of rectangles. The Group Packing algorithm GP_3 packs rectangles densely in groups of 3 at a time, starting a new level at the highest point reached by any rectangle in the group. The GP_3 algorithm achieves an asymptotic expected height of (0.38541 ... ) n. This is slightly worse than the bound (0.38134 ... ) n achieved by Next Fit Level (NFL) packing. %K two-dimensional bin-packing, strip packing, on-line algorithms, lower bounds %Y %A Mosberger-Tang, David %T SCOUT: A Path-based Operating System %D May 13, 1997 %Z Wed, 08 Jan 97 00:00:00 GMT %R TR97-06 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1997/TR97-06.ps.Z %X Scout is a new operating system architecture that is designed specifically to accommodate the needs of communication-centric systems. An important class of such systems is formed by information appliances, which, broadly speaking, are devices whose primary task is to facilitate communication. 
Appliances are typically relatively small, special-purpose, and often mobile devices such as remote controls, personal information managers, network-attached disks, cameras, displays, or dedicated file-servers. Scout has a modular structure that is complemented by a new abstraction called the path. The modular structure enables the efficient building of systems that are tailored precisely to the requirements of a particular appliance. Paths address issues related to the performance and quality with which a communication service is rendered. A path can be visualized as a vertical slice through a layered system or viewed abstractly as a bidirectional flow of data. As such, a path typically traverses multiple modules in a Scout system. This means that paths provide additional context to the modules that process data that is being communicated through the system. This context often makes it possible to implement data processing more efficiently or to improve the quality with which resource management, such as CPU scheduling or memory allocation, is realized. This dissertation develops the path abstraction from first principles and then introduces the various aspects of the Scout architecture. Aside from the path abstraction, Scout uses a novel approach for network packet classification. With the Scout architecture defined, two studies are presented that provide an in-depth look at how to use Scout and its path abstraction. The first study employs the path abstraction to reduce processing latency in the networking subsystem. Evaluating these path optimizations also provides important insights on the performance behavior of networking subsystems on modern RISC machines. The second study employs the path abstraction to improve resource management for an information appliance that involves a networked TV displaying MPEG encoded video. 
%K dissertation %Y %A Gopal, Burra %T Integrating Content-Based Access Mechanisms with Hierarchical File Systems %D April 18, 1997 %Z Wed, 08 Jan 97 00:00:00 GMT %R TR97-05 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1997/TR97-05.ps.Z %X We describe a new file system that provides, at the same time, both name and content based access to files. To make this possible, we introduce the concept of a semantic directory. Every semantic directory has a query associated with it. When a user creates a semantic directory, the file system automatically creates a set of pointers to the files in the file system that satisfy the query associated with the directory. This set of pointers is called the query-result of the directory. To access the files that satisfy the query, users just need to de-reference the appropriate pointers. Users can also create files and sub-directories within semantic directories in the usual way. Hence, users can organize files in a hierarchy and access them by specifying path names, and at the same time, retrieve files by asking queries that describe their content. Our file system also provides facilities for query-refinement and customization. When a user creates a new semantic sub-directory within a semantic directory, the file system ensures that the query-result of the sub-directory is a subset of the query-result of its parent. Hence, users can create a hierarchy of semantic directories to refine their queries. Users can also edit the set of pointers in a semantic directory, and thereby modify its query-result without modifying its query or the files in the file system. In this way, users can customize the results of queries according to their personal tastes, and use customized results to refine queries in the future. That is, users do not have to depend solely on the query language to achieve these objectives. 
Our file system has many other features, including semantic mount-points that allow users to access information in other file systems by content. The file system does not depend on the query language used for content-based access. Hence, it is possible to integrate any content-based access mechanism into our file system. %K dissertation %Y %A Coffman, E.G., Jr. %A Downey, Peter %A Winkler, Peter %T Packing Rectangles in a Strip %D April 8, 1997 %Z Wed, 08 Jan 97 00:00:00 GMT %R TR97-04 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1997/TR97-04.ps.Z %X Rectangles with dimensions independently chosen from a uniform distribution on [0, 1] are packed on-line into a unit width strip under a constraint like that of the Tetris game: rectangles arrive from the top and must be moved inside the strip to reach their place; once placed, they cannot be moved again. Cargo loading applications impose similar constraints. This paper assumes that rectangles must be moved without rotation. For n rectangles, the resulting packing height is shown to have an asymptotic expected value of at least (0.31382733 ... )n under any on-line packing algorithm. An on-line algorithm is presented that achieves an asymptotic expected height of (0.36976421 ... )n. This algorithm improves the bound achieved in Next Fit Level (NFL) packing, by compressing the items packed on two successive levels of an NFL packing via on-line movement admissible under the Tetris-like constraint. %K two-dimensional bin-packing, strip packing, on-line algorithms, lower bounds %Y %A Brakmo, Lawrence %T End-To-End Congestion Detection and Avoidance in Wide Area Networks %D 03/10/97 %Z Wed, 08 Jan 97 00:00:00 GMT %R TR97-03 %I The Department of Computer Science, University of Arizona %U %X As human dependence on wide area networks like the Internet increases, so does contention for the network's resources. 
This contention has noticeably affected the performance of these networks, reducing their usability. This dissertation addresses this problem in two ways. First, it describes TCP Vegas, a new implementation of TCP that is distinguished from current TCP implementations by containing a new congestion detection and avoidance mechanism. This mechanism was designed to work in currently available wide area networks and achieves between 37% and 71% better throughput on the Internet, with one-fifth to one-half the losses, as compared to the current implementation of TCP. Second, it describes x-Sim, a network simulator based on the x-kernel, that is able to simulate the topologies and traffic patterns of large scale networks. The usefulness of the simulator to analyze and debug network components is illustrated throughout this dissertation. %K dissertation; not available online %Y %A Orman, Hilarie %T The Oakley Key Determination Protocol %D February 17, 1997 %Z Wed, 08 Jan 97 00:00:00 GMT %R TR97-02 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1997/TR97-02.ps.Z %X This document describes a protocol, named OAKLEY, by which two authenticated parties can agree on secure and secret keying material. The basic mechanism is the Diffie-Hellman key exchange algorithm. The OAKLEY protocol supports Perfect Forward Secrecy, compatibility with the ISAKMP protocol for managing security associations, user-defined abstract group structures for use with the Diffie-Hellman algorithm, key updates, and incorporation of keys distributed via out-of-band mechanisms. %K keywords %Y %A Proebsting, Todd A. %A Townsend, Gregg %A Bridges, Patrick %A Hartman, John H. %A Newsham, Tim %A Watterson, Scott A. 
%T Toba: Java For Applications: A Way Ahead of Time (WAT) Compiler %D January 8, 1997 %Z Wed, 08 Jan 97 00:00:00 GMT %R TR97-01 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1997/TR97-01.ps.Z %X Toba is a system for generating efficient standalone Java applications. Toba includes a Java-bytecode-to-C compiler, a garbage collector, a threads package, and Java API support. Toba-compiled Java applications execute 1.5--10 times faster than interpreted and Just-In-Time compiled applications. %K keywords %Y %A Bhatti, Nina %T A System for Constructing Configurable High-level Protocols %D Wednesday, December 4, 1996 %Z Wed, 23 Oct 96 00:00:00 GMT %R TR96-22 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1996/TR96-22.ps.Z %X Distributed applications often require sophisticated communication services such as multicast, membership, group RPC (GRPC), transactions, or support for mobility. These services form a large portion of the supporting software for distributed applications, yet the specific requirements of the service vary from application to application. Constructing communication services that are useful for multiple diverse applications while still being manageable and efficient is a major challenge. This dissertation focuses on improving the construction of complex communication services. The contributions of the dissertation are a new model for the construction of such services and the design and implementation of a supporting network subsystem. In this model, a communication service is decomposed into distinct micro-protocols, each implementing a specific semantic property. Micro-protocols have well-defined interfaces that use events to coordinate actions and communicate state changes, which results in a highly modular and configurable implementation. This model augments, rather than replaces, the conventional hierarchical protocol model. 
In this implementation, a conventional {\it x}-kernel protocol is replaced with a composite protocol in which micro-protocol objects are linked with a standard runtime system that externally presents the standard {\it x}-kernel interface. Internally, the runtime system provides common message services, enforces a uniform interface between micro-protocols, detects and generates events, and synchronously or asynchronously executes event handlers. The viability of the approach is demonstrated by performance tests for several different configurations of a suite of micro-protocols for a group RPC service. The micro-protocols in this suite implement multiple semantic properties of procedure call termination, message ordering, reliability, collation of responses, call semantics, membership, and failure. The tests were conducted while running within the {\it x}-kernel as a user level task on the Mach operating system. Additional micro-protocols for mobile computing applications validate the generality of the model. We designed micro-protocols for quality of service (QoS), transmitting and renegotiating QoS parameters during handoffs, as well as for mobility management, covering cell detection, handoff, and disconnection. This suite of micro-protocols can be configured to accommodate a range of different service requirements or even to mimic existing mobile architectures such as those found in the Crosspoint, PARC TAB, InfoPad, or DataMan projects. %K dissertation %Y %A Kagedal, Andreas %A Debray, Saumya %T A Practical Approach to Structure Reuse of Arrays in Single Assignment Languages %D December 2, 1996 %Z Wed, 23 Oct 96 00:00:00 GMT %R TR96-21 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1996/TR96-21.ps.Z %X Array updates in single assignment languages generally require some copying of the array, and thus tend to be more expensive than in imperative languages. 
As a result, programs in single assignment languages sometimes suffer from a performance handicap compared to those in imperative languages. Traditional attempts to address this problem have typically involved either complex compile-time analyses, which tend to be slow and fragile; or new language constructs, which do not always interface with already existing code. In this paper, we propose a new approach to this problem, based on a simple and straightforward program transformation, that we believe addresses the shortcomings of both of these approaches: it is easy to understand, efficiently implemented, does not require new language constructs, and yet is applicable to most commonly encountered programs. %K keywords %Y %A Debray, Saumya %A Proebsting, Todd %T Title %D December 2, 1996 %Z Wed, 23 Oct 96 00:00:00 GMT %R TR96-20 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1996/TR96-20.ps.Z %X abstract %K keywords %Y %A Debray, Saumya %T Resource-Bounded Partial Evaluation %D November 18, 1996 %Z Wed, 23 Oct 96 00:00:00 GMT %R TR96-19 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1996/TR96-19.ps.Z %X Most partial evaluators do not take the availability of machine-level resources, such as registers or cache, into consideration when making their specialization decisions. The resulting resource contention can lead to severe performance degradation---causing, in extreme cases, the specialized code to run slower than the unspecialized code. In this paper we consider how resource availability considerations can be incorporated within a partial evaluator. We develop an abstract formulation of the problem, show that optimal resource-bounded partial evaluation is NP-complete, and discuss simple heuristics that can be used to address the problem in practice. 
%K keywords %Y %A Muth, Robert %A Debray, Saumya %T On the complexity of Function Pointer May-Alias Analysis %D October 25, 1996 %Z Wed, 23 Oct 96 00:00:00 GMT %R TR96-18 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1996/TR96-18.ps.Z %X This paper considers the complexity of interprocedural function pointer may-alias analysis, i.e., determining the set of functions that a function pointer (in a language such as C) can point to at a point in a program. This information is necessary, for example, in order to construct the control flow graphs of programs that use function pointers, which in turn is fundamental for most dataflow analyses and optimizations. We show that the general problem is complete for deterministic exponential time. We then consider two natural simplifications to the basic (precise) analysis and examine their complexity. The approach described can be used to readily obtain similar complexity results for related analyses such as reachability and recursiveness. %K keywords %Y %A Tong, Bo-Ming %A Leung, Ho-Fung %T Load Balancing in a Distributed-Memory Or-Parallel System %D Date issued %Z Wed, 23 Oct 96 00:00:00 GMT %R TR96-17 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1996/TR96-17.ps.Z %X We consider or-parallel logic programming implementations on parallel machines with no shared-memory. Traditional implementation techniques as employed in Aurora and Muse are not applicable. In our or-parallel execution model, all processors perform identical work initially. At each choice point, processors are divided evenly among alternatives of the choice point. Backtracking is employed if there are not enough processors for such a division. As execution proceeds, the division of processors among alternatives becomes uneven. 
In this paper, we present two different methods of load balancing called equalization and apportion, aimed at improving the degree of parallelism. Equalization and apportion reallocate all processors to the or-parallel branches by copying heaps through an interprocessor communication network. %K keywords %Y %A Pagels, Michael A. %T Chimaera: A High-Bandwidth Network Interface Supporting Cooperative Tasks %D Date issued %Z Wed, 23 Oct 96 00:00:00 GMT %R TR96-16 %I The Department of Computer Science, University of Arizona %U %X abstract %K not available online %Y %A DeBosschere, Koen %A Debray, Saumya %T alto: A Link-Time Optimizer for the DEC Alpha %D Date issued %Z Wed, 23 Oct 96 00:00:00 GMT %R TR96-15 %I The Department of Computer Science, University of Arizona %U %X abstract %K see TR98-14 %Y %A Ogurtsov, Nick %A Orman, Hilarie %A Schroeppel, Richard %A O'Malley, Sean %A Spatscheck, Oliver %T Covert Channel Elimination Protocols %D Date issued %Z Wed, 23 Oct 96 00:00:00 GMT %R TR96-14 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1996/TR96-14.ps.Z %X With the increasing growth of electronic communications, it is becoming important to provide a mechanism for enforcing various security policies on network communications. This paper discusses our implementation of several previously proposed protocols that enforce the Bell-LaPadula security model. We also introduce a new protocol called "Quantized Pump" that offers several advantages, and present experimental results to support our claims. %K keywords %Y %A Afjeh, Abdollah A. %A Homer, Patrick T. %A Lewandowski, Henry %A Reed, John A. %A Schlichting, Richard D.
%T Implementing Monitoring and Zooming in a Distributed Jet Engine Simulation %D Date issued %Z Wed, 23 Oct 96 00:00:00 GMT %R TR96-13 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1996/TR96-13.ps.Z %X The NASA Numerical Propulsion System Simulation (NPSS) project is exploring the use of computer simulation to facilitate the design of new jet engines. Several key issues raised in this research are being examined in an NPSS-related research project: zooming, monitoring and control, and support for heterogeneity. The design and implementation of a simulation executive that addresses each of these issues is described. In this work, the strategy of zooming, which allows codes that model at different levels of fidelity to be integrated within a single simulation, is applied to the fan component of a turbofan propulsion system. A prototype monitoring and control system supports experimentation with expert system techniques for active control of the simulation. An interconnection system provides a transparent means of connecting the heterogeneous systems that comprise the prototype. %K keywords %Y %A Hiltunen, Matti A. %T Configurable Fault-Tolerant Distributed Services %D Date issued %Z Wed, 23 Oct 96 00:00:00 GMT %R TR96-12 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1996/TR96-12.ps.Z %X Fault tolerance---that is, the ability of a system to continue providing its specified service despite failures---is becoming more important as computers are increasingly used in application areas such as process control, air-traffic control, and banking. Distributed systems, consisting of computers connected by a network, are an important platform for many fault-tolerant systems. Unfortunately, it is difficult to construct fault-tolerant distributed software, so communication services such as multicast, RPC, membership, and transactions have been proposed as simplifying abstractions. 
However, although numerous versions of these services have been defined, no single implementation provides a perfect match for all applications and all execution environments. This dissertation presents an approach to constructing highly configurable fault-tolerant services. A new model is proposed where a service is composed out of micro-protocol objects, each of which implements an individual semantic property of the overall service. This makes it easy to construct different customized versions of a service with properties tailored to the specifics of an application. The model allows micro-protocols to cooperate using user-definable events and shared variables, making the model more flexible than existing approaches. Three prototype implementations of the model are also described. In addition, a new approach is introduced for specifying abstract properties of services using temporal logic over message ordering graphs, which are abstract representations of collections of messages on each site. Furthermore, the problem of which combinations of properties or corresponding micro-protocols are feasible is addressed by defining relations that identify those combinations that result in a functioning service. Dependency and configuration graphs are presented as tools for constructing operational configurations. This new approach is used to develop configurable membership and group RPC services. Furthermore, the system diagnosis problem is contrasted with membership, and new membership and system diagnosis algorithms are derived based on the observations. Finally, the dissertation presents an application of the event-driven model to adaptive systems that dynamically change their behavior as a result of changes in the execution environment or user requirements. %K keywords %Y %A Hartman, John %A Manber, Udi %A Peterson, Larry L. 
%A Proebsting, Todd %T Liquid Software: A New Paradigm for Networked Systems %D Date issued %Z Wed, 23 Oct 96 00:00:00 GMT %R TR96-11 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1996/TR96-11.ps.Z %X This paper introduces the idea of dynamically moving functionality in a network---between clients and servers, and between hosts at the edge of the network and nodes inside the network. At the heart of moving functionality is the ability to support mobile code---code that is not tied to any single machine, but instead can easily move from one machine to another. Mobile code has been studied mostly for application-level code. This paper explores its use for all facets of the network, and in a much more general way. Issues of efficiency, interface design, security, and resource allocation, among others, are addressed. We use the term liquid software to describe the complete picture---liquid software is an entire infrastructure for dynamically moving functionality throughout a network. We expect liquid software to enable new paradigms, such as active networks that allow users and applications to customize the network by interjecting code into it. %K keywords %Y %A Freeh, Vincent W. %A Andrews, Gregory R. %T Dynamically Controlling False Sharing in a Distributed Shared Memory %D Date issued %Z Wed, 23 Oct 96 00:00:00 GMT %R TR96-10 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1996/TR96-10.ps.Z %X Distributed shared memory (DSM) alleviates the need to program message passing explicitly on a distributed-memory machine. In order to reduce memory latency, a DSM replicates copies of data. This paper examines several current approaches to controlling thrashing caused by false sharing in a DSM. Then it introduces a novel memory consistency protocol, writer-owns, which detects and eliminates false sharing at run time.
In iterative computations, where the data is accessed similarly every iteration, the writer-owns protocol can have tremendous benefits because the overhead of eliminating false sharing is only incurred once. Performance results show that the writer-owns protocol is competitive with and often better than existing approaches. %K keywords %Y %A Downey, Peter J. %T The Price of Synchrony %D Date issued %Z Wed, 23 Oct 96 00:00:00 GMT %R TR96-09 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1996/TR96-09.ps.Z %X abstract %K keywords %Y %A Bigot, Peter A. %T pC*: Efficient and Portable Runtime Support for Data-Parallel Languages %D Date issued %Z Wed, 23 Oct 96 00:00:00 GMT %R TR96-08 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1996/TR96-08.ps.Z %X A variety of historically-proven computer languages have recently been extended to support parallel computation in a \emph{data-parallel} framework. The performance capabilities of modern microprocessors have made the ``cluster-of-workstations'' model of parallel computing more attractive, by permitting organizations to network together workstations to solve problems in concert, without the need to buy specialized and expensive supercomputers or mainframes. For the most part, research on these extended languages has focused on compile-time analyses which detect data dependencies and use user-provided hints to distribute data and encode the necessary communication operations between nodes in a multiprocessor system. These analyses have shown their value when the necessary hints are provided, but require more information at compile-time than may be available in large-scale real-world programs. This dissertation focuses on elements important to an efficient and portable implementation of runtime support for data-parallel languages, with almost no reliance on compile-time information.
We consider issues ranging from data distribution and global/local address conversion, through a communication framework intended to support modern networked computers, and optimizations for a variety of communications patterns common to data-parallel programs. The discussion is grounded in a complete implementation of a data-parallel language, C*, on stock workstations connected with standard network hardware. The performance of the resulting system is evaluated on a set of eight benchmark programs by comparing it to optimized sequential solutions to the same problems, and to the reference implementation of C* on the Connection Machine CM5 supercomputer. Our implementation, denoted pC* for ``portable C*'', generally performs within a factor of four of the optimized sequential algorithms. In addition, the optimizations developed in this dissertation permit a cluster of twelve workstations connected with Ethernet to outperform a sixty-four node CM5 in absolute performance on three of the eight benchmarks. Though we specifically address the issues of runtime support for C*, the material in this dissertation applies equally well to a variety of other parallel systems, especially the data-parallel features of Fortran 90 and High Performance Fortran. %K keywords %Y %A Myers, Eugene %T Title %D Date issued %Z Wed, 23 Oct 96 00:00:00 GMT %R TR96-07 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1996/TR96-07.ps.Z %X abstract %K keywords %Y %A Suzuki, Masato %A Katayama, Takuya %A Schlichting, Richard %T FTAG: A Functional and Attribute Based Model for Writing Fault-Tolerant Software %D Date issued %Z Wed, 23 Oct 96 00:00:00 GMT %R TR96-06 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1996/TR96-06.ps.Z %X Programs constructed using techniques that allow software or operational faults to be tolerated are typically written using an imperative computational model. 
Here, an alternative is described in which such programs are written using a functional and attribute based model called FTAG (Fault-Tolerant Attribute Grammars). The basic model is introduced first, followed by a description of mechanisms that allow a variety of standard fault-tolerance techniques to be realized in a straightforward way. Techniques that can be accommodated include replication and checkpointing to tolerate operational faults, and recovery blocks and N-version programming to tolerate software faults. Several examples are given to illustrate these techniques, including a replicated name server and a fault-tolerant sort that uses recovery blocks. A formal description of FTAG that precisely specifies the semantics of the model is also presented. Finally, a software architecture describing how FTAG can be implemented in a computer system containing multiple processors is given. %K keywords %Y %A Mosberger, David %A Peterson, Larry L. %T Making Paths Explicit in the Scout Operating System %D Date issued %Z Wed, 23 Oct 96 00:00:00 GMT %R TR96-05 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1996/TR96-05.ps.Z %X This paper makes a case for paths as an explicit abstraction in operating system design. Paths provide a unifying infrastructure for several OS mechanisms that have been introduced in the last several years, including fbufs, integrated layer processing, packet classifiers, code specialization, and migrating threads. This paper articulates the potential advantages of a path-based OS structure, describes the specific path architecture implemented in the Scout OS, and demonstrates the advantages in a particular application domain---receiving, decoding, and displaying MPEG-compressed video. 
%K keywords %Y %A Larson, Susan %A Jain, Mudita %A Anson, Eric %A Myers, Eugene %T An Interface for a Fragment Assembly Kernel %D Date issued %Z Wed, 23 Oct 96 00:00:00 GMT %R TR96-04a %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1996/TR96-04a.ps.Z %X This document describes the C programming language interface to our Fragment Assembly Kernel library. Inputs to the Fragment Assembly Kernel are (1) DNA fragment sequences from potentially inaccurate sequencing experiments, and (2) optional constraints on fragment assembly such as known fragment overlaps or relative fragment orientation. Fragment sequence version control is supported. The Fragment Assembly Kernel produces the most probable reconstructions of the original DNA sequence from the fragments, subject to any specified constraints. Each fragment assembly includes multiple sequence alignment and consensus sequences. Multiple sequence alignment editing capabilities are provided to allow manual correction of sequence errors. %K keywords %Y %A Mosberger, David %A Peterson, Larry L. %A Bridges, Patrick G. %A O'Malley, Sean %T Analysis of Techniques to Improve Protocol Processing Latency %D Date issued %Z Wed, 23 Oct 96 00:00:00 GMT %R TR96-03 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1996/TR96-03.ps.Z %X This paper describes several techniques designed to improve protocol latency, and reports on their effectiveness when measured on a modern RISC machine employing the DEC Alpha processor. We found that the memory system---which has long been known to dominate network throughput---is also a key factor in protocol latency. In particular, improving instruction cache effectiveness can greatly reduce protocol processing overheads. An important metric in this context is the memory cycles per instruction (mCPI), which is the average number of cycles that an instruction stalls waiting for a memory access to complete.
The techniques presented in this paper reduce the mCPI by up to a factor of 5.8. In analyzing the effectiveness of the techniques, we also present a detailed study of the protocol processing behavior of two protocol stacks---TCP/IP and RPC---on a modern RISC processor. %K keywords %Y %A Author %T Title %D Date issued %Z Wed, 23 Oct 96 00:00:00 GMT %R TR96-02 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports %X abstract %K keywords %Y %A Author %T Title %D Date issued %Z Wed, 23 Oct 96 00:00:00 GMT %R TR96-01 %I The Department of Computer Science, University of Arizona %U ftp://ftp.cs.arizona.edu/reports/1996 %X abstract %K keywords %Y