dc.contributor.author |
Shmeiss, Zeinab Hasan |
dc.date.accessioned |
2018-10-11T11:37:00Z |
dc.date.available |
2018-10-11T11:37:00Z |
dc.date.issued |
2018 |
dc.date.submitted |
2018 |
dc.identifier.other |
b21047510 |
dc.identifier.uri |
http://hdl.handle.net/10938/21387 |
dc.description |
Thesis. M.S. American University of Beirut. Department of Computer Science, 2018. T:6716. Advisor: Dr. Mohamad Jaber, Assistant Professor, Computer Science; Committee members: Dr. Paul Attie, Professor, Computer Science; Dr. Mohamed Nassar, Assistant Professor, Computer Science. |
dc.description |
Includes bibliographical references (leaves 67-69) |
dc.description.abstract |
Spark is the leading platform for distributed large-scale data processing. It is designed with two main features: (1) an in-memory data engine that makes it significantly faster than other systems (e.g., Hadoop MapReduce), and (2) a distributed programming model with an extensible, easy-to-use API supported by Scala, Java, R, and Python. Despite these features, writing efficient and complex Spark applications is still error-prone and time-consuming, and requires a clear and deep understanding of the inner workings of Spark. For instance, (1) Spark does not support composition of independently developed Spark applications; (2) it lacks automatic persisting (caching) of distributed datasets for reuse across several operations; and (3) the same task can be implemented in several different ways, with significantly different execution times. The contribution of the thesis is twofold. First, we propose a component-based framework for composing independently developed Spark applications. The framework takes as input a set of sub-Spark applications embedded with input-output interfaces for exchanging datasets, and a configuration file defining the dependencies between these interfaces. It then automatically merges them into a single monolithic Spark application. We support our framework with several automatic persisting strategies to optimize the execution of the produced Spark application. Second, we present TaBOS, a transformation-based optimizer for Spark programs. TaBOS takes a Spark program and generates a state space of semantically equivalent programs by applying a set of rewrite rules. A single rewrite rule replaces a fragment of the program with a new one, with the aim of improving performance while preserving the program's semantics. From the generated state space, TaBOS selects one optimal program based on a predefined strategy. We introduce several selection strategies (e.g., applying the maximum number of transformations, selecting the program with the minimum number of heavy operations, and prune-search techniques) for identifying an optimal program. |
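[Editor's illustration] The rewrite-rule idea in the abstract can be sketched with a toy model (pure Python, not the thesis implementation; the operation names, the single rule, and the cost measure below are illustrative assumptions): a program is a sequence of operations, a rule replaces a fragment with a semantically equivalent one, the state space is the closure under rule application, and a selection strategy picks the program with the fewest heavy (shuffle-like) operations.

```python
# Toy sketch of a transformation-based optimizer in the style the
# abstract describes. Programs are tuples of operation names; "heavy"
# stands in for shuffle-inducing Spark operations.
HEAVY = {"groupByKey"}

# One illustrative rewrite rule: a groupByKey followed by a per-key sum
# is equivalent to the cheaper reduceByKey (a well-known Spark idiom).
RULES = [
    (("groupByKey", "mapValues_sum"), ("reduceByKey_add",)),
]

def apply_rules(program):
    """Yield every program reachable by a single rule application."""
    for pattern, replacement in RULES:
        n = len(pattern)
        for i in range(len(program) - n + 1):
            if program[i:i + n] == pattern:
                yield program[:i] + replacement + program[i + n:]

def state_space(program):
    """Exhaustively apply rules, collecting all equivalent programs."""
    seen, frontier = {program}, [program]
    while frontier:
        p = frontier.pop()
        for q in apply_rules(p):
            if q not in seen:
                seen.add(q)
                frontier.append(q)
    return seen

def heavy_ops(program):
    """Cost measure for the 'minimum heavy operations' strategy."""
    return sum(op in HEAVY for op in program)

prog = ("map", "groupByKey", "mapValues_sum", "collect")
space = state_space(prog)
best = min(space, key=heavy_ops)  # selection strategy: fewest shuffles
```

Here `best` is `("map", "reduceByKey_add", "collect")`: the shuffle-heavy fragment has been rewritten away while the overall result of the computation is preserved.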
dc.format.extent |
1 online resource (x, 69 leaves) : illustrations |
dc.language.iso |
eng |
dc.subject.classification |
T:006716 |
dc.subject.lcsh |
SPARK (Computer program language); Big data; Software engineering; Electronic data processing -- Distributed processing. |
dc.title |
Component and transformation based frameworks for building and optimizing Spark programs |
dc.type |
Thesis |
dc.contributor.department |
Faculty of Arts and Sciences. Department of Computer Science |
dc.contributor.institution |
American University of Beirut. |