Sorting

Pages: 13 (3038 words) Published: 20 September 2011
Advances in Engineering Software 42 (2011) 50–54

Advances in Engineering Software
journal homepage: www.elsevier.com/locate/advengsoft

Fast sort of floating-point data for data engineering
Changsoo Kim, Sungroh Yoon, Dongseung Kim ⇑
School of Electrical Engineering, Korea University, Seoul 136-713, Republic of Korea

Abstract
This paper describes a novel external sort algorithm that speeds up the sorting of floating-point numbers. The algorithm decreases computation time significantly by applying integer arithmetic to floating-point data in the IEEE-754 standard or similar formats. We conducted experiments with synthetic data on a 32-processor Linux cluster; for the internal sort alone, gigabyte-scale sorting achieved approximately fivefold speedups. Furthermore, the sorting achieved twofold or greater improvements over the typical parallel sort method, network of workstations (NOW)-sort. Moreover, the performance of the sorting scheme is independent of the computing platform. Thus, our sorting method can be successfully applied to binary search, data mining, numerical simulations, and graphics. © 2010 Elsevier Ltd. All rights reserved.

Article history: Received 13 May 2010; received in revised form 19 September 2010; accepted 26 October 2010.

Keywords: Parallel sort; Floating-point arithmetic; External sort; Engineering simulation; Workstation cluster; Message passing interface

⇑ Corresponding author. Tel.: +82 232903232; fax: +8229288909.
E-mail address: dkimku@korea.ac.kr (D. Kim).
0965-9978/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.advengsoft.2010.10.017

1. Introduction

Sorting is a fundamental operation widely used in many applications such as data searching, job scheduling, database management, and engineering simulations [1]. Numerous high-performance sorting algorithms have been developed to decrease the time required [2–5], including one algorithm that employs fast graphics processing units [6]. Sorting integer keys is simple and flexible, but sorting real numbers needs floating-point arithmetic for comparison, which usually takes longer than integer sorting. If floating-point data can be sorted by integer arithmetic, the execution time can be significantly shortened and the overall task can become considerably faster.

Internal sort refers to ordering data that fit in main memory, while external sort orders large-scale data that are often stored on disks. Hence, external sort demands multiple iterations of retrieving data from the disk, ordering them in main memory, and writing the result back to the disk. Slow disk access and the large amount of computation make the process time consuming.

Such computations can be sped up by parallel external sort algorithms such as the well-known network of workstations (NOW)-sort [7], which runs on networked workstations. NOW-sort consists of two phases. The first phase sets approximate boundary values (pivots) and relocates all data to processors based on them. The second phase then performs local sorting in parallel; this part of the sorting is fast since data exchange is no longer required after the first phase. However, load balance must be maintained by distributing the data evenly among the processors.

In this research, we develop a sort algorithm that outperforms NOW-sort by avoiding floating-point arithmetic. Further, the performance gains of previous methods such as parallel sort by regular sampling (PSRS) [8] and partitioned parallel radix sort [3] are not comparable to ours. In addition, the improvement achieved by our algorithm is independent of processor architecture and computer hardware.
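
The two-phase structure described above for NOW-sort can be illustrated with a minimal single-process sketch. It is not the implementation from [7]: the function names `choose_pivots` and `two_phase_sort`, the random-sampling pivot choice, and the use of Python lists to stand in for processors are all assumptions made for illustration.

```python
import bisect
import random

def choose_pivots(data, num_procs, sample_size=64, seed=0):
    """Pick num_procs - 1 approximate boundary values (assumed: random sampling)."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(data, min(sample_size, len(data))))
    # Evenly spaced elements of the sorted sample serve as approximate pivots.
    return [sample[(i * len(sample)) // num_procs] for i in range(1, num_procs)]

def two_phase_sort(data, num_procs):
    """Return one sorted bucket per simulated processor, in ascending key order."""
    buckets = [[] for _ in range(num_procs)]
    if not data:
        return buckets
    pivots = choose_pivots(data, num_procs)
    # Phase 1: relocate every key to the bucket whose pivot range contains it.
    for x in data:
        buckets[bisect.bisect_right(pivots, x)].append(x)
    # Phase 2: sort each bucket independently; no further data exchange is needed.
    for bucket in buckets:
        bucket.sort()
    return buckets
```

Concatenating the buckets in order yields the fully sorted sequence. How evenly the keys spread over the buckets depends on how well the pivots approximate the true quantiles, which is the load-balance concern raised in the text.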

2. Comparison by integer translation

As mentioned previously, many sorting algorithms use comparisons to...
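
The idea named in this section's title, comparing floating-point values through integers, can be sketched with a widely used bit-level mapping for IEEE-754 numbers. The authors' own translation is not shown in this excerpt, so the following is an assumption rather than their method. The function names are hypothetical, the sketch uses 64-bit doubles because that is what Python's `float` provides, and NaN values are assumed to be absent.

```python
import struct

SIGN_BIT = 0x8000000000000000
ALL_BITS = 0xFFFFFFFFFFFFFFFF

def float_to_key(x):
    """Map a double to an unsigned 64-bit integer whose order matches float order."""
    # Reinterpret the 8 bytes of the double as an unsigned integer.
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    if bits & SIGN_BIT:
        # Negative value: flip every bit so larger magnitudes map to smaller keys.
        return bits ^ ALL_BITS
    # Non-negative value: set the sign bit so it ranks above all negatives.
    return bits | SIGN_BIT

def sort_floats_by_integer_keys(values):
    """Sort floats using only integer comparisons on the translated keys."""
    return sorted(values, key=float_to_key)
```

The mapping works because, for non-negative IEEE-754 values, the raw bit pattern already increases with the numeric value, while for negative values it decreases. Under this mapping `-0.0` is ordered just below `+0.0`.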