Commit e91dce5

Author: Mathieu Lobet
Commit message: few more corrections
1 parent d67b6c7, commit e91dce5

7 files changed

+124
-98
lines changed

courses/01_beginners/main.tex

Lines changed: 46 additions & 20 deletions
@@ -7,6 +7,7 @@
77
\usepackage[font=Times, timeinterval=1, timeduration=15]{tdclock}
88
\usepackage{fontspec}
99
\usepackage{pifont}
10+
\usepackage{hyperref}
1011

1112
% Set the main font
1213
% \setmainfont{Clear Sans Light}
@@ -216,7 +217,8 @@ \section{Introduction}
216217

217218
\begin{itemize}
218219
\item A supercomputer is a distributed memory system composed of many compute nodes packed into racks and linked by a high-speed network
219-
\item Compute nodes are composed of one or more CPUs and one or more GPUs
220+
\item Compute nodes are composed of one or more CPUs and one or more accelerators
221+
\item The most common accelerators today are GPGPUs (General-Purpose Graphics Processing Units)
220222
\end{itemize}
221223

222224
\begin{center}
@@ -232,7 +234,7 @@ \section{Introduction}
232234
Architecture description
233235

234236
\begin{itemize}
235-
\item CPUs are designed for general purpose, from sequential task to parallel computing
237+
\item CPUs are designed for general-purpose work, from sequential tasks to parallel computing. They also run the operating system.
236238
\item Tens to hundreds of cores in the biggest processors
237239
\item SIMD (Single Instruction Multiple Data) units for accelerating arithmetic operations
238240
\end{itemize}
@@ -251,7 +253,7 @@ \section{Introduction}
251253

252254
\small
253255
\begin{itemize}
254-
\item GPGPUs (General Purpose Graphic Processing Units) are designed to achieve massive parallelism of simple kernels
256+
\item GPUs are designed to achieve massive parallelism of simple kernels
255257
\item hundreds of computing units, thousands of threads
256258
\item Large SIMT vector unit (Single Instruction Multiple Threads) per computing unit
257259
\end{itemize}
@@ -278,6 +280,25 @@ \section{Introduction}
278280

279281
% _____________________________________________________________________________
280282

283+
\begin{frame}
284+
\frametitle{Host / Device model}
285+
286+
\begin{itemize}
287+
\item The program is always started first on the CPU.
288+
\item The CPU is often referred to as the \highlight{Host}
289+
\item The CPU orchestrates when kernels are launched on the GPU and when memory transfers are performed
290+
\item The GPU side waits for kernels to execute
291+
\item The GPU is often referred to as the \highlight{Device}
292+
\end{itemize}
293+
294+
\begin{alertblock}{}
295+
Today's GPUs cannot work standalone
296+
\end{alertblock}
297+
298+
\end{frame}
299+
300+
% _____________________________________________________________________________
301+
281302
\begin{frame}
282303
\frametitle{SIMD versus SIMT}
283304

@@ -602,7 +623,7 @@ \section{Basic concepts of Kokkos}
602623
\includegraphics[width=0.4\textwidth]{../../images/sleeping_otter.png}
603624
\end{center}
604625

605-
Go to the folder \texttt{01\_compiling\_kokkos} and follow the instructions in the README file
626+
Go to the folder \href{https://github.com/CExA-project/cexa-kokkos-tutorials/tree/main/exercises/01_compiling_kokkos}{01\_compiling\_kokkos} and follow the instructions in the README file
606627

607628
Goal of this Exercise:
608629
\begin{itemize}
@@ -694,7 +715,7 @@ \section{Basic concepts of Kokkos}
694715

695716
\begin{itemize}
696717
\item Kokkos propagates its compilation flags to your program
697-
\item If Kokkos not detected in your paths, use the \texttt{Kokkos\_ROOT} CMake variable to specify the path to Kokkos
718+
\item If Kokkos is not detected in your paths, use the \texttt{Kokkos\_ROOT} CMake variable to specify the path to Kokkos
698719
\end{itemize}
699720

700721
\small
@@ -715,6 +736,8 @@ \section{Basic concepts of Kokkos}
715736
\includegraphics[width=0.4\textwidth]{../../images/sleeping_otter.png}
716737
\end{center}
717738
739+
Go to the folder \href{https://github.com/CExA-project/cexa-kokkos-tutorials/tree/main/exercises/02_first_program}{02\_first\_program} and follow the instructions in the README file
740+
718741
Goal of this Exercise:
719742
720743
\begin{itemize}
@@ -739,7 +762,7 @@ \section{Basic concepts of Kokkos}
739762
\begin{itemize}
740763
\item No need to allocate or deallocate memory by hand
741764
\item Vendor-specific memory allocation is hidden
742-
\item unified semantic and portable memory management (CPU and GPU)
765+
\item Unified semantics and portable memory management (CPU and GPU)
743766
\item Advanced capability (abstracted layout, subarray, multidimensionality, etc.)
744767
\end{itemize}
745768
@@ -840,7 +863,7 @@ \section{Basic concepts of Kokkos}
840863
841864
\begin{itemize}
842865
\item A View lives in a specific memory space (Host or Device) not both
843-
\item If Kokkos is compiled with \textbf{a CPU backend only}, the View data is allocated in the \textbf{Host memory} by defaults
866+
\item If Kokkos is compiled with \highlight{a CPU backend only}, the View data is allocated in the \highlight{Host memory} by default
844867
\end{itemize}
845868
846869
\centering
@@ -854,8 +877,8 @@ \section{Basic concepts of Kokkos}
854877
\frametitle{Where does the data reside?}
855878
856879
\begin{itemize}
857-
\item If Kokkos is compiled with a \textbf{GPU backend}, the View data is allocated in the \textbf{Device memory} by default
858-
\item We will later how to allocate and copy data between the Host and the Device
880+
\item If Kokkos is compiled with a \highlight{GPU backend}, the View data is allocated in the \highlight{Device memory} by default
881+
\item We will later see how to allocate and copy data between the Host and the Device
859882
\end{itemize}
860883
861884
\centering
@@ -876,7 +899,7 @@ \section{Basic concepts of Kokkos}
876899
\item Allocations only happen when explicitly specified
877900
\item Copy construction and assignment are shallow, so you pass Views by value, not by reference (Python-like).
878901
\item Reference counting is used for automatic deallocation (like shared pointers)
879-
\item Metadata (rank, extend, etc) is however always accessible on the Host
902+
\item Metadata (rank, extent, etc) is however always accessible on the Host
880903
\end{itemize}
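As a rough analogy for the shallow-copy and reference-counting behavior listed above, the sketch below uses \texttt{std::shared\_ptr} in plain C++. The names \texttt{ToyView}, \texttt{set\_first} and \texttt{shallow\_copy\_demo} are hypothetical and exist only for illustration; this is not how Kokkos implements a View, only a model of the semantics described in the list.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a View: a small handle sharing ownership of its data.
struct ToyView {
  std::shared_ptr<std::vector<double>> data;

  // Allocation happens only here, when explicitly requested.
  explicit ToyView(std::size_t n)
      : data(std::make_shared<std::vector<double>>(n, 0.0)) {}

  // Element access goes through the shared allocation.
  double& operator()(std::size_t i) const { return (*data)[i]; }

  // Number of handles currently referring to the same allocation.
  long use_count() const { return data.use_count(); }
};

// Takes the handle by value: only the handle is copied, never the data.
inline void set_first(ToyView v, double value) { v(0) = value; }

// Returns true if a write through a copied handle is visible through the original.
inline bool shallow_copy_demo() {
  ToyView a(4);
  ToyView b = a;            // shallow copy: a and b alias the same allocation
  assert(a.use_count() == 2);
  set_first(b, 42.0);       // the by-value parameter is a third, temporary handle
  return a(0) == 42.0 && a.use_count() == 2;
}
```

Calling \texttt{shallow\_copy\_demo()} exercises the three points of the list: one explicit allocation, copies that alias the same data, and a reference count that drops again when a handle goes out of scope.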
881904
882905
\begin{alertblock}{}
@@ -934,7 +957,7 @@ \section{Basic concepts of Kokkos}
934957
\end{itemize}
935958
936959
\begin{block}{}
937-
Go to the exercise \texttt{02\_basic\_view} and follow the instructions in the README file
960+
Go to the exercise \href{https://github.com/CExA-project/cexa-kokkos-tutorials/tree/main/exercises/03_basic_view}{03\_basic\_view} and follow the instructions in the README file
938961
\end{block}
939962
940963
\end{frame}
@@ -945,7 +968,7 @@ \section{Basic concepts of Kokkos}
945968
\frametitle{Understand the notion of memory space}
946969
947970
\begin{itemize}
948-
\item Kokkos provides an abstraction of where the data lives: the memory space
971+
\item Kokkos provides an abstraction of where the data lives: \textbf{the memory space}
949972
\item A View is always associated with a defined memory space (Host or Device for instance) at compilation time
950973
\item Default behavior: View data is allocated in the Host memory space if no GPU backend is available, else in the Device memory space
951974
\end{itemize}
@@ -957,7 +980,7 @@ \section{Basic concepts of Kokkos}
957980
\frametitle{Understand the notion of memory space}
958981
959982
\begin{itemize}
960-
\item Problem: how to deal with data residing in different memory spaces?
983+
\item \textbf{Problem:} how to deal with data residing in different memory spaces?
961984
\end{itemize}
962985
963986
\begin{center}
@@ -997,7 +1020,7 @@ \section{Basic concepts of Kokkos}
9971020
\frametitle{Mirror Views presentation}
9981021
9991022
\begin{itemize}
1000-
\item Solution: we need linked host and device view to access the data on both sides
1023+
\item \textbf{Solution:} we need linked host and device views to access the data on both sides
10011024
\item Kokkos provides the notion of \textbf{mirror views}
10021025
\item A mirror view is a view that is a copy of another view but in a different memory space
10031026
\item There is a specific function to create a mirror view called \texttt{create\_mirror}
@@ -1011,7 +1034,7 @@ \section{Basic concepts of Kokkos}
10111034
\end{minted}
10121035
10131036
\begin{itemize}
1014-
\item Mirror views automatically inherit the properties of the original view (extent, layout, etc)
1037+
\item A mirror view automatically inherits the properties of the original view (extent, layout, etc)
10151038
\end{itemize}
10161039
10171040
\end{frame}
@@ -1126,7 +1149,7 @@ \section{Basic concepts of Kokkos}
11261149
\end{itemize}
11271150
11281151
\begin{block}{}
1129-
Go to the exercise \texttt{03\_deep\_copy} and follow the instructions in the README file
1152+
Go to the exercise \href{https://github.com/CExA-project/cexa-kokkos-tutorials/tree/main/exercises/04_deep_copy}{04\_deep\_copy} and follow the instructions in the README file
11301153
\end{block}
11311154
11321155
\end{frame}
@@ -1215,6 +1238,7 @@ \section{Basic concepts of Kokkos}
12151238
});
12161239
\end{minted}
12171240
1241+
\normalsize
12181242
\begin{itemize}
12191243
\item The previous OpenMP version only works on CPUs (need \texttt{target} directive for GPUs)
12201244
\item The same Kokkos version works on CPUs and GPUs depending on the compile backend
@@ -1254,6 +1278,7 @@ \section{Basic concepts of Kokkos}
12541278
\item If Kokkos is compiled with a GPU backend, the loop is executed on GPU by default
12551279
\item Else, the loop is executed on the CPU
12561280
\item Non-Kokkos C++ code is always executed on the Host
1281+
\item Kokkos loops can be given a name for debugging purposes
12571282
\end{itemize}
12581283
12591284
\end{frame}
@@ -1273,13 +1298,14 @@ \section{Basic concepts of Kokkos}
12731298
// Kokkos parallel loop
12741299
Kokkos::parallel_for("my_loop", N, KOKKOS_LAMBDA(int i) {...});
12751300
1276-
// Host code
1301+
// Host code executed during the parallel loop
1302+
// if Kokkos is compiled with a GPU backend
12771303
for (int i = 0; i < N; i++) { ... }
12781304
\end{minted}
12791305
12801306
\normalsize
12811307
1282-
Example: if Kokkos uses a GPU backend, the parallel loop is executed asynchronously on the GPU:
1308+
12831309
\begin{itemize}
12841310
\item If Kokkos uses a GPU backend, the parallel loop is executed asynchronously on the GPU
12851311
\item \textbf{Problem:} what if I need the results of the parallel loop in the host code?
@@ -1329,7 +1355,7 @@ \section{Basic concepts of Kokkos}
13291355
\end{itemize}
13301356
13311357
\begin{block}{}
1332-
Go to the exercise \texttt{05\_parallel\_loop} and follow the instructions in the README file
1358+
Go to the exercise \href{https://github.com/CExA-project/cexa-kokkos-tutorials/tree/main/exercises/05_parallel_loop}{05\_parallel\_loop} and follow the instructions in the README file
13331359
\end{block}
13341360
13351361
\end{frame}
@@ -1354,7 +1380,7 @@ \section{Basic concepts of Kokkos}
13541380
13551381
\begin{itemize}
13561382
\item \texttt{ExecutionSpace} is an optional template parameter that specifies the execution space; by default it is \texttt{DefaultExecutionSpace}.
1357-
\item \texttt{start\_index} and \texttt{end\_inde<x} are the start and end indexes of the loop
1383+
\item \texttt{start\_index} and \texttt{end\_index} are the beginning and end of the loop
13581384
\end{itemize}
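To make the role of \texttt{start\_index} and \texttt{end\_index} concrete, here is a sequential plain C++ sketch of a range loop. It assumes the usual half-open convention of \texttt{Kokkos::RangePolicy}, where \texttt{end\_index} itself is excluded. The names \texttt{toy\_range\_for} and \texttt{visited\_indices} are hypothetical; the sketch does not use Kokkos and runs nothing in parallel.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sequential stand-in for a range-based parallel loop:
// calls body(i) for every i in the half-open interval [start_index, end_index).
template <class Body>
void toy_range_for(std::size_t start_index, std::size_t end_index, Body body) {
  for (std::size_t i = start_index; i < end_index; ++i) {
    body(i);
  }
}

// Collects the visited indices so the loop bounds are easy to inspect.
inline std::vector<std::size_t> visited_indices(std::size_t start_index,
                                                std::size_t end_index) {
  std::vector<std::size_t> out;
  toy_range_for(start_index, end_index,
                [&out](std::size_t i) { out.push_back(i); });
  return out;
}
```

With these definitions, \texttt{visited\_indices(2, 5)} visits 2, 3 and 4, and an empty range such as \texttt{visited\_indices(3, 3)} visits nothing. In a real parallel loop the iterations may run in any order, which this sequential sketch does not model.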
13591385
13601386
\small

images/cpu_vs_gpu.png

9.64 KB
