Publications

  • Journal Papers

  • “DHkmeans-ℓdiversity: distributed hierarchical K-means for satisfaction of the ℓ-diversity privacy model using Apache Spark”

    Farough Ashkouti, Keyhan Khamforoosh, Amir Sheikhahmadi, Hana Khamfroush
    The Journal of Supercomputing • 2021

    Abstract

    One of the main steps in the data lifecycle is to publish it for data analysts to discover hidden patterns. However, data publishing may lead to unwanted disclosure of personal information and cause privacy problems. Data anonymization techniques preserve privacy models to prevent the disclosure of individuals’ private information in published data. In this paper, a distributed in-memory method is proposed on the Apache Spark framework to preserve the ℓ-diversity privacy model. This method anonymizes large-scale data in a three-phase process, which includes seed selection, data clustering for ℓ-diversity, and a finalizing phase. In this method, a hierarchical k-means-based data clustering algorithm has been designed for data anonymization. One of the major challenges of anonymization methods is to establish a better trade-off between data utility and privacy. Therefore, for calculating the distance between records and forming more cohesive ℓ-diverse clusters, the authors have designed two distance functions, one Manhattan-based and one Euclidean-based, to satisfy the requirements of the ℓ-diversity model. Given the 100-fold speed of Spark compared to MapReduce, the proposed method is presented using in-memory RDD programming in Apache Spark, to address the runtime, scalability, and performance issues in large-scale data anonymization that exist in the previous MapReduce-based algorithms. Our method provides general knowledge to use parallel in-memory computation of Spark in big data anonymization. In experiments, this method has obtained lower information loss and loses about 1% to 2% on the accuracy and F-measure criteria; therefore, it establishes a better trade-off than the state-of-the-art MapReduce-based Mondrian methods.

    Citation (BibTeX)

    @article{ashkouti2021dhkmeans,
      title={DHkmeans-ℓdiversity: distributed hierarchical K-means for satisfaction of the ℓ-diversity privacy model using Apache Spark},
      author={Ashkouti, Farough and Khamforoosh, Keyhan and Sheikhahmadi, Amir and Khamfroush, Hana},
      journal={The Journal of Supercomputing},
      pages={1--35},
      year={2021},
      publisher={Springer}
    }

  • “Service Placement and Request Scheduling for Data-Intensive Applications in Edge Clouds”

    Vajihe Farhadi, Fidan Mehmeti, Tom La Porta, Ting He, Hana Khamfroush, Shiqiang Wang, Kevin Chan
    IEEE/ACM Transactions on Networking • 2021

    Abstract

    Mobile edge computing provides the opportunity for wireless users to exploit the power of cloud computing without a large communication delay. To serve data-intensive applications (e.g., video analytics, machine learning tasks) from the edge, we need, in addition to computation resources, storage resources for storing server code and data as well as network bandwidth for receiving user-provided data. Moreover, due to time-varying demands, the code and data placement needs to be adjusted over time, which raises concerns of system stability and operation cost. In this paper, we address these issues by proposing a two-time-scale framework that jointly optimizes service (code and data) placement and request scheduling, while considering storage, communication, computation, and budget constraints. First, by analyzing the hardness of various cases, we completely characterize the complexity of our problem. Next, we develop a polynomial-time service placement algorithm by formulating our problem as a set function optimization, which attains a constant-factor approximation under certain conditions. Furthermore, we develop a polynomial-time request scheduling algorithm by computing the maximum flow in a carefully constructed auxiliary graph, which satisfies hard resource constraints and is provably optimal in the special case where requests have homogeneous resource demands. Extensive synthetic and trace-driven simulations show that the proposed algorithms achieve 90% of the optimal performance.

  • “Behavioral information diffusion for opinion maximization in online social networks”

    Nathaniel Hudson, Hana Khamfroush
    IEEE Transactions on Network Science & Engineering • 2020

    Abstract

    Online social networks provide a platform to diffuse information and influence people's opinions. Conventional models for information diffusion do not take into account the specifics of each user’s personality, behavior, and opinion. This work adopts the “Big Five” model from the social sciences to ascribe each user node with a personality. We propose a behavioral independent cascade (BIC) model that considers the personalities and opinions of user nodes when computing propagation probabilities for diffusion. We use this model to study the opinion maximization (OM) problem and prove it is NP-hard under our BIC model. Under the BIC model, we show that the objective function of the proposed OM problem is not submodular. We then propose an algorithm to solve the OM problem in linear time based on a state-of-the-art influence maximization (IM) algorithm. We run extensive simulations under four cases where initial opinion is distributed in polarized/non-polarized and community/non-community cases. We find that when communities are polarized, activating a large number of nodes is ineffective towards maximizing opinion. Further, we find that our proposed algorithm outperforms state-of-the-art IM algorithms in terms of maximizing opinion in uniform opinion distribution, despite activating fewer nodes to be spreaders.

  • “On Fundamental Bounds on Failure Identifiability by Boolean Network Tomography”

    Novella Bartolini, Ting He, Viviana Arrigoni, Annalisa Massini, Federico Trombetti, Hana Khamfroush
    IEEE/ACM Transactions on Networking • 2020

    Abstract

    Boolean network tomography is a powerful tool to infer the state (working/failed) of individual nodes from path-level measurements obtained by edge-nodes. We consider the problem of optimizing the capability of identifying network failures through the design of monitoring schemes. Finding an optimal solution is NP-hard and a large body of work has been devoted to heuristic approaches providing lower bounds. Unlike previous works, we provide upper bounds on the maximum number of identifiable nodes, given the number of monitoring paths and different constraints on the network topology, the routing scheme, and the maximum path length. These upper bounds represent a fundamental limit on identifiability of failures via Boolean network tomography. Our analysis provides insights on how to design topologies and related monitoring schemes to achieve the maximum identifiability under various network settings. Through analysis and experiments we demonstrate the tightness of the bounds and efficacy of the design insights for engineered as well as real networks.

  • “PicSys: Energy-Efficient Fast Image Search on Distributed Mobile Networks”

    Noor Felemban, Fidan Mehmeti, Hana Khamfroush, Zongqing Lu, Swati Rallapalli, Kevin S. Chan, Tom La Porta
    IEEE Transactions on Mobile Computing • 2019

    Abstract

    Mobile devices collect a large amount of visual data that are useful for many applications. Searching for an object of interest over a network of mobile devices can aid human analysts in a variety of situations. However, processing the information on these devices is a challenge owing to the high computational complexity of the state-of-the-art computer vision algorithms that primarily rely on Convolutional Neural Networks (CNNs). Thus, this paper builds PicSys, a system that enables answering visual search queries on a mobile network. The objective of the system is to minimize the maximum completion time over all devices while taking into account the energy consumption of mobile devices as well. First, PicSys carefully divides the computation into multiple filtering stages, such that only a small percentage of images need to run the entire CNN pipeline. Splitting such CNN computation into multiple stages requires understanding the intermediate CNN features and systematically trading off accuracy for the computation speed. Second, PicSys determines where to run each of the stages of the multi-stage pipeline to fully utilize the available resources. Finally, through extensive experimentation, system implementation, and simulation, we show that PicSys performance is close to optimal and significantly outperforms other standard algorithms.

  • “Influence spread in two-layer interdependent networks: designed single-layer or random two-layer initial spreaders?”

    Hana Khamfroush, Nathaniel Hudson, Samuel Iloo, Mahshid Rahnamay-Naeini
    Springer Applied Network Science • 2019

    Abstract

    Influence spread in multi-layer interdependent networks (M-IDN) has been studied in the last few years; however, prior works mostly focused on the spread that is initiated in a single layer of an M-IDN. In real world scenarios, influence spread can happen concurrently among many or all components making up the topology of an M-IDN. This paper investigates the effectiveness of different influence spread strategies in M-IDNs by providing a comprehensive analysis of the time evolution of influence propagation given different initial spreader strategies. For this study we consider a two-layer interdependent network and a general probabilistic threshold influence spread model to evaluate the evolution of influence spread over time. For a given coupling scenario, we tested multiple interdependent topologies, composed of layers A and B, against four cases of initial spreader selection: (1) random initial spreaders in A, (2) random initial spreaders in both A and B, (3) targeted initial spreaders using degree centrality in A, and (4) targeted initial spreaders using degree centrality in both A and B. Our results indicate that the effectiveness of influence spread highly depends on network topologies, the way they are coupled, and our knowledge of the network structure — thus an initial spread starting in only A can be as effective as initial spread starting in both A and B concurrently. Similarly, random initial spread in multiple layers of an interdependent system can be more severe than a comparable initial spread in a single layer. Our results can be easily extended to different types of event propagation in multi-layer interdependent networks such as information/misinformation propagation in online social networks, disease propagation in offline social networks, and failure/attack propagation in cyber-physical systems.

  • “Critical Component Analysis in Cascading Failures for Power Grids Using Community Structures in Interaction Graphs”

    Upama Nakarmi, Mahshid Rahnamay-Naeini, Hana Khamfroush
    IEEE Transactions on Network Science & Engineering • 2019

    Abstract

    Cascading phenomena have been studied extensively in various networks. Particularly, it has been shown that the community structures in networks impact their cascade processes. However, the role of community structures in cascading failures in power grids has not been studied heretofore. In this paper, cascading failures in power grids are studied using interaction graphs. Key evidence has been provided that the community structures in interaction graphs bear critical information about the cascade process and the role of system components in cascading failures in power grids. Furthermore, a centrality measure based on the community structures is proposed to identify critical components of the system, whose protection can help in containing failures within a community and prevent the propagation of failures to large sections of the power grid. Various criticality evaluation techniques, including data-driven, epidemic-simulation-based, power-system-simulation-based, and graph-based, have been used to verify the importance of the identified critical components in the cascade process and compare them with those identified by traditional centrality measures. Moreover, it has been shown that the loading level of the power grid impacts the interaction graph and, consequently, the community structure and criticality of the components in the cascade process.

  • “On progressive network recovery from massive failures under uncertainty”

    Diman Zad Tootaghaj, Novella Bartolini, Hana Khamfroush, Thomas La Porta
    IEEE Transactions on Network & Service Management • 2018

  • “Mitigation and recovery from cascading failures in interdependent networks under uncertainty”

    Diman Zad Tootaghaj, Novella Bartolini, Hana Khamfroush, Ting He, Nilanjan Ray Chaudhuri, Thomas La Porta
    IEEE Transactions on Control of Network Systems • 2018

    Abstract

    The interdependence of multiple networks makes today's infrastructures more vulnerable to failures. Prior works mainly focused on robust network design and recovery strategies after failures, given complete knowledge of failure location. Nevertheless, in real-world scenarios, the location of failures might be unknown or only partially known. In this paper, we focus on cascading failures involving the power grid and its communication network with imprecision in failure assessment. We consider a model where functionality of the power grid and its failure assessment rely on the operation of a monitoring system and vice versa. We address ongoing cascading failures with a twofold approach: first, once a cascading failure is detected, we limit further propagation by redispatching generation and shedding loads; and second, we formulate a recovery plan to maximize the total amount of load served during the recovery intervention. We performed extensive simulations on real network topologies showing the effectiveness of the proposed approach in terms of number of disrupted power lines and total served load.

  • “On Propagation of Phenomena in Interdependent Networks”

    Hana Khamfroush, Novella Bartolini, Thomas La Porta, Ananthram Swami, Justin Dillman
    IEEE Transactions on Network Science & Engineering • 2016

    Abstract

    When multiple networks are interconnected because of mutual service interdependence, propagation of phenomena across the networks is likely to occur. Depending on the type of networks and phenomenon, the propagation may be a desired effect, such as the spread of information or consensus in a social network, or an unwanted one, such as the propagation of a virus or a cascade of failures in a communication or service network. In this paper, we propose a general analytic model that captures multiple types of dependency and of interaction among nodes of interdependent networks, that may cause the propagation of phenomena. The above model is used to evaluate the effects of different diffusion models in a wide range of network topologies, including different models of random graphs and real networks. We propose a new centrality metric and compare it to more traditional approaches to assess the impact of individual network nodes in the propagation. We propose guidelines to design networks in which the diffusion is either a desired phenomenon or an unwanted one, and consequently must be fostered or prevented, respectively. We performed extensive simulations to extend our study to large networks and to show the benefits of the proposed design solutions.

  • “Network Coding for Hop-by-Hop Communication Enhancement in Multi-hop Networks”

    Peyman Pahlevani, Hana Khamfroush, Daniel E. Lucani, Morten V. Pedersen, Frank H. P. Fitzek
    Elsevier Computer Networks • 2016

    Abstract

    In our recent study, we introduced the PlayNCool protocol that increases the throughput of the wireless networks by enabling a helper node to strengthen the communication link between two neighboring nodes and using random linear network coding. This paper focuses on design and implementation advantages of the PlayNCool protocol in a real environment of wireless mesh networks. We provide a detailed protocol to implement PlayNCool that is independent from the other protocols in the current computer network stack. PlayNCool performance is evaluated using NS–3 simulations and real-life measurements using Aalborg University’s Raspberry Pi test-bed. Our results show that selecting the best policy to activate the helper node is a key to guarantee the performance of PlayNCool protocol. We also study the effect of neighbor nodes in the performance of PlayNCool. Using a helper in presence of active neighbors is useful even if the channel from helper to destination is not better than the channel between sender and destination. PlayNCool increases the gain of end-to-end communication by two-fold or more while maintaining compatibility to standard wireless ad-hoc routing protocols.

  • “On Optimal Policies for Network Coded Cooperation: Theory and Implementation”

    Hana Khamfroush, Daniel E. Lucani, Peyman Pahlevani, João Barros
    IEEE Journal on Selected Areas in Communications (JSAC) • 2015

  • “Network-Coded Cooperation Over Time-Varying Channels”

    Hana Khamfroush, Daniel E. Lucani, João Barros, Peyman Pahlevani
    IEEE Transactions on Communications • 2014

  • 13 papers

© 2021 Hana Khamfroush