### Citations

324 | Static scheduling algorithms for allocating directed task graphs to multiprocessors
- Kwok, Ahmad
- 1999
Citation Context ...urations to activities. The critical paths in a task graph determine the time to execute the whole graph. If t(a) = t(d) = 1 and t(b) = t(c) = 2, the series-parallel graph on the left will take 4 time units, since its longest critical path runs from b to c, whereas the N-graph on the right will take 3 time units to execute. [Footnote 4: via the MATLAB Parallel Computing Toolbox, http://www.mathworks.com/. Footnote 5: from version 7, http://www.wolfram.co.uk/] Some dependencies in a task graph are inherent in the algorithm. Additional dependencies can be imposed by static scheduling, which maps the tasks to processors [6]. Dependencies can also be implicit in the programming constructs used to express the algorithm. The left graph above is a series-parallelised version of the N-graph, obtained by adding an arc from activity b to activity c. (There are two other minimal ways to series-parallelise it.) By using the series-parallel (SP) graph with its extra dependencies instead of the non-series-parallel (NSP) N-graph, we observe a slowdown of 4/3. This is in fact the least slowdown possible for this task graph and workload. Slowdown is defined as the ratio of the slower SP graph to the faster NSP g...
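The 4/3 slowdown in this context can be checked with a short longest-path computation. The arc sets below are a reconstruction from the durations quoted above, not taken verbatim from the source: the N-graph is assumed to have arcs a → c, a → d, b → d, and the series-parallelised version adds b → c.

```python
# Sketch: makespan of a task graph = length of its critical (longest) path,
# assuming unlimited processors so only precedence constraints matter.
def makespan(durations, arcs):
    preds = {v: [] for v in durations}
    for u, v in arcs:
        preds[v].append(u)
    memo = {}
    def finish(v):  # earliest possible finish time of activity v
        if v not in memo:
            memo[v] = durations[v] + max((finish(u) for u in preds[v]), default=0)
        return memo[v]
    return max(finish(v) for v in durations)

t = {"a": 1, "b": 2, "c": 2, "d": 1}        # t(a) = t(d) = 1, t(b) = t(c) = 2
nsp = [("a", "c"), ("a", "d"), ("b", "d")]  # assumed N-graph arcs
sp = nsp + [("b", "c")]                     # series-parallelisation: add b -> c

print(makespan(t, nsp), makespan(t, sp))    # 3 4 — slowdown 4/3
```

The critical path of the SP graph is b → c with length 4, while no path in the N-graph exceeds 3, matching the times quoted in the context.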

98 | A view of the parallel computing landscape
- Asanovic, Bodik, et al.
Citation Context ...titute of Quantitative Finance 3 LFCS, School of Informatics, University of Edinburgh Abstract. Standards bodies and commercial software vendors have defined parallel constructs to harness the parallelism in computations. Using the task graph model of parallel program execution, we show how common programming constructs that impose series-parallel task dependencies can lead to unbounded slowdown compared to the inherent parallelism in the algorithm. We describe various ways in which this slowdown can be avoided. Inexpensive multicore processors have brought parallelism to the desktop computer [2] and users would like to take advantage of this parallelism for faster program execution. Standards for multiple-processor programming such as OpenCL [7] and commercial numerical software such as Matlab and Mathematica include language constructs for parallelism. Our position is that these constructs may limit the amount of parallelism, causing slowdown, but we also argue that there are ways to avoid this unnecessary loss in performance. With the projected progression from multicore computing (2-8 cores) to manycore computing (hundreds of cores) [10], we believe that parallel computing syste...

65 | Perfect pipelining: A new loop parallelization technique.
- Aiken, Nicolau
- 1988
Citation Context ...sidering the options for dealing with this. General techniques: One option is to choose a parallel language with more expressive constructs (ensuring that it does not lose parallelism unnecessarily during compilation). The disadvantage is that programming then becomes more complex and introduces the possibility of unexpected behaviour such as deadlocks. Alternatively, it would be possible for numerical software such as MATLAB to introduce a parallelisation phase in which sequential code is assessed for data dependencies, which could then be used to introduce as much parallelism as possible [1,8]. Limited knowledge of workload: If it is known that the variance in the duration of tasks is negligible, then using inherently SP constructs is unlikely to lead to large slowdown. For a simple technique for series-parallelisation that gives level-constrained (LC) graphs, the slowdown is bounded by the ratio of the longest task to the shortest task [9]. If the variance is low, this ratio will be close to 1, and hence there will be very little slowdown. Full knowledge of workload: If workloads can be fully assigned to tasks in advance of execution, then careful hand-coding is a good approach, especially when the resulti...

20 | Sensitivity analysis for automatic parallelization on multi-cores
- Rus, Pennings, et al.
- 2007
Citation Context ...sidering the options for dealing with this. General techniques: One option is to choose a parallel language with more expressive constructs (ensuring that it does not lose parallelism unnecessarily during compilation). The disadvantage is that programming then becomes more complex and introduces the possibility of unexpected behaviour such as deadlocks. Alternatively, it would be possible for numerical software such as MATLAB to introduce a parallelisation phase in which sequential code is assessed for data dependencies, which could then be used to introduce as much parallelism as possible [1,8]. Limited knowledge of workload: If it is known that the variance in the duration of tasks is negligible, then using inherently SP constructs is unlikely to lead to large slowdown. For a simple technique for series-parallelisation that gives level-constrained (LC) graphs, the slowdown is bounded by the ratio of the longest task to the shortest task [9]. If the variance is low, this ratio will be close to 1, and hence there will be very little slowdown. Full knowledge of workload: If workloads can be fully assigned to tasks in advance of execution, then careful hand-coding is a good approach, especially when the resulti...
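The level-constrained bound mentioned in the context is simply the ratio of the longest to the shortest task duration, so a quick check on a low-variance workload (durations invented for illustration, not from the source) shows why LC series-parallelisation costs little when task durations are nearly uniform:

```python
# Sketch: the LC slowdown bound from [9] is max duration / min duration.
durations = [1.00, 1.05, 0.98, 1.02, 0.99]  # hypothetical low-variance tasks

lc_bound = max(durations) / min(durations)
print(round(lc_bound, 3))  # 1.071 — at most ~7% slowdown for this workload
```

With high-variance workloads (say, one task 10x longer than another) the same bound becomes 10, which is why the technique is only recommended when variance is known to be small.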

13 | Scheduling UET-UCT series-parallel graphs on two processors.
- Finta, Liu, et al.
- 1996
Citation Context ...mer, between the concept and the keyboard. We focus on a specific structure on the dependencies between program tasks which some constructs impose. This structure is called series-parallel and can be expressed most easily as the set of task graphs generated by the language P ::= seq(P, P) | par(P, P) | a, where a is a task or activity representing some amount of program code (possibly as small as a single arithmetic operation) to be executed on one processor. Series-parallel task graphs are not only easy to express, but also have a modular structure which can be exploited for efficient scheduling [3,11]. We can represent the tasks in a parallel program, and the dependencies between tasks, in a task graph (also known as an activity network). A task graph is a directed graph where each node is labelled with a distinct activity name, and associated with each activity is a positive real number, its duration, describing how long the activity will take to execute. Arcs in the task graph capture dependencies between tasks (also known as precedence constraints). The left graph in the figure is series-parallel and can be written as seq(par(a, b), par(c, d)). The right graph is not series-parallel a b...
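The grammar P ::= seq(P, P) | par(P, P) | a is directly executable. A minimal sketch (our own encoding, not the authors' code) illustrates the modular structure the context mentions: the makespan of an SP term is computed compositionally, summing durations for seq and taking the maximum for par.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Act:            # a: a single task with a duration
    name: str
    dur: float

@dataclass
class Seq:            # seq(P, P): run left, then right
    left: "Term"
    right: "Term"

@dataclass
class Par:            # par(P, P): run both sides independently
    left: "Term"
    right: "Term"

Term = Union[Act, Seq, Par]

def makespan(p: Term) -> float:
    """Execution time of an SP term, computed structurally."""
    if isinstance(p, Act):
        return p.dur
    if isinstance(p, Seq):
        return makespan(p.left) + makespan(p.right)
    return max(makespan(p.left), makespan(p.right))

# The SP graph from the figure: seq(par(a, b), par(c, d))
g = Seq(Par(Act("a", 1), Act("b", 2)), Par(Act("c", 2), Act("d", 1)))
print(makespan(g))  # 4 — matches the 4 time units quoted for the SP graph
```

This compositionality is exactly what NSP graphs like the N-graph lack: they have no seq/par decomposition, which is why schedulers that exploit the SP structure cannot handle them directly.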

12 | The importance of synchronization structure in parallel program optimization.
- Gemund
- 1997
Citation Context ...e first, workloads are not used; that is, it is assumed that they are not known a priori [4]. In the second, workload information is used [9]. Note that in the case of the parallel-for construct there is no explicit transformation as such, but there is an NSP version given by the inherent data dependencies of the algorithm, so this can be considered a transformation without workload information. For the case when workloads are not used in the series-parallelisation transformation or algorithm, it has been hypothesised that, except for pathological workloads, the slowdown is bounded by two [12]. This can be expressed logically as ∃Y ∀G ∀t : T(Y(G), t)/T(G, t) ≤ 2, where Y is an algorithm to convert an NSP graph to an SP graph without considering t, and T(G, t) is the shortest possible time to execute G with workload t (obtained from the critical path of G with the largest execution time). Less formally, the hypothesis is that any task graph can be series-parallelised with slowdown at most 2, even if the details of the workload are ignored in obtaining the SP version. In our opinion, this statement is false. We have shown that for every algorithm, it is possible to find a graph a...
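Restated in display form, the hypothesis from [12] and the negation the authors claim to establish can be written as follows. Note that, with slowdown defined as the SP time over the NSP time, the ratio reads T(Y(G), t)/T(G, t); since Y(G) only adds dependencies, T(Y(G), t) ≥ T(G, t), so the inequality is the non-trivial direction.

```latex
% Hypothesis: some workload-oblivious series-parallelisation Y
% achieves slowdown at most 2 on every graph and workload.
\exists Y \;\forall G \;\forall t :\quad
  \frac{T(Y(G), t)}{T(G, t)} \le 2

% Claimed refutation: for every such Y there is a graph and
% workload on which the slowdown exceeds 2.
\forall Y \;\exists G \;\exists t :\quad
  \frac{T(Y(G), t)}{T(G, t)} > 2
```

The refutation is the logical negation of the hypothesis, with the existential over Y turned into a universal and vice versa, matching the sentence "for every algorithm, it is possible to find a graph" in the context.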

9 | The new landscape of parallel computer architecture.
- Shalf
- 2007
Citation Context ...rought parallelism to the desktop computer [2] and users would like to take advantage of this parallelism for faster program execution. Standards for multiple-processor programming such as OpenCL [7] and commercial numerical software such as Matlab and Mathematica include language constructs for parallelism. Our position is that these constructs may limit the amount of parallelism, causing slowdown, but we also argue that there are ways to avoid this unnecessary loss in performance. With the projected progression from multicore computing (2-8 cores) to manycore computing (hundreds of cores) [10], we believe that parallel computing systems should avoid slowdown at the point of expressing the intention of the programmer, between the concept and the keyboard. We focus on a specific structure on the dependencies between program tasks which some constructs impose. This structure is called series-parallel and can be expressed most easily as the set of task graphs generated by the language P ::= seq(P, P) | par(P, P) | a, where a is a task or activity representing some amount of program code (possibly as small as a single arithmetic operation) to be executed on one processor. Series-parallel ...

6 | On the loss of parallelism by imposing synchronization structure.
- Escribano, Payo, et al.
- 1997
Citation Context ...[array residue: tasks arranged in a t × m grid, rows a1,1 a1,2 ... a1,m down to at,1 at,2 ... at,m] In the NSP graph, there is no requirement that all tasks in one row must finish before the next row starts, so for example if a1,1, a1,2, a1,3 have completed then a2,2 can start regardless of whether a1,4 has finished. This is not the case in the SP graph, which is a series-parallelisation of the NSP graph. We now consider transforming an NSP graph to an SP graph. There are two classes of approaches to series-parallelising a task graph. In the first, workloads are not used; that is, it is assumed that they are not known a priori [4]. In the second, workload information is used [9]. Note that in the case of the parallel-for construct there is no explicit transformation as such, but there is an NSP version given by the inherent data dependencies of the algorithm, so this can be considered a transformation without workload information. For the case when workloads are not used in the series-parallelisation transformation or algorithm, it has been hypothesised that, except for pathological workloads, the slowdown is bounded by two [12]. This can be expressed logically as ∃Y ∀G ∀t : T(Y(G), t)/T(G, t) ≤ 2, where Y is an a...
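The dependency pattern in this grid example can be made concrete. The three-predecessor neighbourhood below is inferred from the a2,2 example in the context (a hypothetical encoding for illustration, not the authors' definition):

```python
def grid_preds(i, j, m):
    """Predecessors of task a[i,j] in the assumed NSP mesh: each task in
    row i > 1 depends on the up-to-three adjacent tasks in row i - 1."""
    if i == 1:
        return set()  # first row has no predecessors
    return {(i - 1, k) for k in (j - 1, j, j + 1) if 1 <= k <= m}

# a2,2 needs only a1,1, a1,2, a1,3 — it can start before a1,4 finishes
print(sorted(grid_preds(2, 2, 5)))  # [(1, 1), (1, 2), (1, 3)]
```

The SP version of this grid would instead insert a barrier between rows, forcing every a2,j to wait for the slowest a1,k; the difference between the two dependency sets is exactly where the slowdown comes from.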

2 | Performance implications of synchronization structure in parallel programming.
- Gonzalez-Escribano, Gemund, et al.
- 2009

1 | Bounds on series-parallel slowdown,
- Salamon, Galpin
- 2009
Citation Context ...[array residue: last grid row at,1 at,2 at,3 ... at,m] In the NSP graph, there is no requirement that all tasks in one row must finish before the next row starts, so for example if a1,1, a1,2, a1,3 have completed then a2,2 can start regardless of whether a1,4 has finished. This is not the case in the SP graph, which is a series-parallelisation of the NSP graph. We now consider transforming an NSP graph to an SP graph. There are two classes of approaches to series-parallelising a task graph. In the first, workloads are not used; that is, it is assumed that they are not known a priori [4]. In the second, workload information is used [9]. Note that in the case of the parallel-for construct there is no explicit transformation as such, but there is an NSP version given by the inherent data dependencies of the algorithm, so this can be considered a transformation without workload information. For the case when workloads are not used in the series-parallelisation transformation or algorithm, it has been hypothesised that, except for pathological workloads, the slowdown is bounded by two [12]. This can be expressed logically as ∃Y ∀G ∀t : T(Y(G), t)/T(G, t) ≤ 2, where Y is an algorithm to convert an NSP graph to an SP graph t...