Make mdspan size_type controllable| Document #: | P2553R1 | 
| Date: | 2022-03-15 | 
| Project: | Programming Language C++ LEWG | 
| Reply-to: | Christian Trott <crtrott@sandia.gov> Damien Lebrun-Grandie <lebrungrandt@ornl.gov> Mark Hoemmen <mhoemmen@stellarscience.com> Daniel Sunderland <dansunderland@gmail.com> | 
unsigned_integral
conceptextents constructor to account for
signed typessize_typeunsigned_integralsize_type for
extentsP0009 explicitly sets the size_type of
extents to size_t, which is then used by
layout mappings and mdspan. While this matches
span whose extent function returns
size_t, this behavior has significant performance impact on
various architectures where 64-bit integer throughput is significantly
lower than 32-bit integer computation throughput.
While that problem is present for span it is a much
bigger issue for mdspan, since every data access into a
multi-dimensional array is accompanied by an offset calculation which
typically costs two integer operations per dimension.
On current GPUs, which are an essential hardware component in
machines used for high performance computing, machine learning and
graphics (arguably the core constituency for mdspan), the
ratio between 32-bit and 64-bit integer throughput is often 2x-4x.
To gauge the impact, we investigated a simple stencil code benchmark,
which is hosted in the mdspan
reference implementation. That benchmark is using CUDA and compares
a variant with raw pointers and explicit index calculation against a
version which uses mdspan.
The mdspan variant does in each CUDA thread the
following code:
for(size_t i = blockIdx.x+d; i < s.extent(0)-d; i += gridDim.x) {
  for(size_t j = threadIdx.z+d; j < s.extent(1)-d; j += blockDim.z) {
    for(size_t k = threadIdx.y+d; k < s.extent(2)-d; k += blockDim.y) {
      value_type sum_local = 0;
      for(size_t di = i-d; di < i+d+1; di++) {
      for(size_t dj = j-d; dj < j+d+1; dj++) {
      for(size_t dk = k-d; dk < k+d+1; dk++) {
        sum_local += s(di, dj, dk);
      }}}
      o(i,j,k) = sum_local;
    }
  }
}The raw pointer variant looks like this:
for(size_t i = blockIdx.x+d; i < x-d; i += gridDim.x) {
  for(size_t j = threadIdx.z+d; j < y-d; j += blockDim.z) {
    for(size_t k = threadIdx.y+d; k < z-d; k += blockDim.y) {
      value_type sum_local = 0;
      for(size_t di = i-d; di < i+d+1; di++) {
      for(size_t dj = j-d; dj < j+d+1; dj++) {
      for(size_t dk = k-d; dk < k+d+1; dk++) {
        sum_local += data[dk + dj*z + di*z*y];
      }}}
      data_o[k + j*z + i*z*y] = sum_local;
    }
  }
}Running the raw pointer variant with unsigned vs
size_t as the loop indices, results in a timing of 31ms vs
56ms. The same is observed for the mdspan variant when
switching in the mdspan implementation the
size_type from size_t to
unsigned. The 31ms result can also be obtained when leaving
size_type as size_t but casting
extents.extent(r) to the user provided index type inside
the layout mappings index calculation operator
while using unsigned as the loop index type in the
algorithm.
One way to address this issue would be for mappings to do all their
internal calculations with the common_type of the
user-provided indices. That includes casting
extents.extent(i). However the drawback of this approach is
that it is hard to identify overflows, which depend on layout as
well.
// The following is ok, extents converts to size_t, required_span_size returns size_t too
mdspan<double,dextents<3>,layout_right> r(ptr,2000,2000,2000); 
mdspan<double,dextents<3>,layout_left>  l(ptr,2000,2000,2000);
...
r(1,1,1000) = 5; // ok
r(1000,1,1) = 5; // overflow
l(1,1,1000) = 5; // overflow
l(1000,1,1) = 5; // okIn particular, in situations where allocations and
mdspan creation happens in another code, location this
could be an issue.
extents
on size_typeIn order to make overflow a better controllable artifact, and avoid
accidental overflows we can make the index type part of the type. The
natural place for this is extents. Every index calculation
related piece of the mdspan proposal gets its
size_type from extents, specifically both
layout mappings, and mdspan itself is required to get its
public size_type type member from
extents_type::size_type. Furthermore, extents
defines the complete iteration space for mdspan. Note, that
a mapping might require a larger integer range that the product of
extents (e.g. layout_stride::required_span_size can return
a number larger than the product of its extents).
size_typeInstead of modifying extents, we could introduce a new
type basic_extents which is templated on the size type and
the extents, but otherwise is identical to extents: When we
can make anything in the mdspan proposal which accepts
extents also accept basic_extents.
Potentially, extents could be just a template alias to
basic_extents:
template<size_t ... Extents>
using extents = basic_extents<size_t, Extents...>;Unfortunately that means that the following type of code would not work:
template<class T, class L, class A, size_t ... Extents>
void foo(mdspan<T,extents<Extents...>,L,A> a) {...{However we believe the common use case would be to template on the extents object itself, mitigating this issue:
template<class T, class E, class L, class A>
void foo(mdspan<T,E,L,A> a) {...}size_typeWe could also template the layout policy class on
size_type, and use that type for the offset calculation,
casting extents::extent explicitly on use. However this
still means that the size of the object is larger (i.e. we would still
store 64bit extents, instead of 32bit) and that additional casts will
happen.
All in all we prefer the option of making extents
require the additional argument (2.2.2), with the next best thing being
the introduction basic_extents and making
extents an alias to basic_extents with
size_t as the size_type. If LEWG would prefer
the second option, the wording is largely the same with the following
changes at the end:
Rename extents to basic_extents
throughout P0009 and
Add an alias in [mdspan.syn]:
template<size_t ... Extents>
using extents = basic_extents<size_t, Extents...>;LEWG would need to decide whether to make dextents have
a size_type template parameter or not.
In principle we could add a second extents type later, though it may
break code such as the one shown before (in the sense that it wouldn’t
generally work for every instance of mdspan anymore):
template<class T, class L, class A, size_t ... Extents>
void foo(mdspan<T,extents<Extents...>,L,A> a) {...{The proposed changes are relative to the working draft of the standard as of P0009 R16.
Replace:
template<size_t... Extents>
  class extents;
template<size_t Rank>
  using dextents = see below;with:
template<class SizeT, size_t... Extents>
  class extents;
template<class SizeT, size_t Rank>
  using dextents = see below;template<size_t... Extents>
class extents {
public:
  using size_type = size_t;
  using rank_type = size_t;with
template<class SizeT, size_t... Extents>
class extents {
public:
  using size_type = SizeT;
  using rank_type = size_t;  static constexpr size_type static_extent(rank_type) noexcept;with
  static constexpr size_t static_extent(rank_type) noexcept;  template<size_t... OtherExtents>
    explicit(see below)
    constexpr extents(const extents<OtherExtents...>&) noexcept;with:
  template<class OtherSizeT, size_t... OtherExtents>
    explicit(see below)
    constexpr extents(const extents<OtherSizeT, OtherExtents...>&) noexcept;  // [mdspan.extents.cmp], extents comparison operators
  template<size_t... OtherExtents>
    friend constexpr bool operator==(const extents&, const extents<OtherExtents...>&) noexcept;with:
  // [mdspan.extents.cmp], extents comparison operators
  template<class OtherSizeT, size_t... OtherExtents>
    friend constexpr bool operator==(const extents&, const extents<OtherSizeT, OtherExtents...>&) noexcept;  template<class... Integrals>
  static constexpr bool are-valid-extents(Integrals... values) noexcept; // exposition only
  template<class OtherSizeT, size_t... OtherExtents>
  static constexpr bool are-valid-extents(extents<OtherSizeT, OtherExtents>) noexcept; // exposition only
  template<class OtherSizeT, size_t N>
  static constexpr bool are-valid-extents(array<OtherSizeT, N>) noexcept; // exposition onlyIf SizeT is not an integral type other than
bool, then the program is ill-formed.
template<class... Integrals>
static constexpr bool are-valid-extents(Integrals... values) noexcept; // exposition onlyReturns:
true if sizeof...(Integrals) == 0 is
true, otherwise
true if
((values > 0) && ...) is true and
each element of values is a representable value of
size_type, otherwise
false.
template<class OtherSizeT, size_t... OtherExtents>
static constexpr bool are-valid-extents(extents<OtherSizeT, OtherExtents> e) noexcept; // exposition onlyReturns:
true if sizeof...(OtherExtents) == 0 is
true, otherwise
true if e.extent(r) > 0 is
true for all rank index r of e
and e.extent(r) is a representable value of
size_type for all rank index r of
e, otherwise
false.
template<class OtherSizeT, size_t N>
static constexpr bool are-valid-extents(array<OtherSizeT, N> arr) noexcept; // exposition onlyReturns:
true if N == 0 is true,
otherwise
true if for all r in the range of [0,N) arr[r] > 0 is
true and arr[r] is a representable value of
size_type, otherwise
false.
template<size_t... OtherExtents>
  explicit((((Extents!=dynamic_extent) && (OtherExtents==dynamic_extent)) || ... ))
  constexpr extents(const extents<OtherExtents...>& other) noexcept;to:
template<class OtherSizeT, size_t... OtherExtents>
  explicit(see below)
  constexpr extents(const extents<OtherSizeT, OtherExtents...>& other) noexcept;Change Paragraph 2 to:
Preconditions:
other.extent(r) equals Er for each
r for which Er is a static
extent, and
are-valid-extents(other) is
true.
Add new paragraph 4:
Remarks: The expression inside explicit is equivalent to:
  (((Extents!=dynamic_extent) && (OtherExtents==dynamic_extent)) || ... ) ||
  (numeric_limits<size_type>::max() < numeric_limits<OtherSizeT>::max())Change Paragrpah 6 to:
Preconditions:
If sizeof...(SizeTypes) != rank_dynamic() is
true, exts_arr[r] equals Er for each
r for which Er is a static
extent, and
are-valid-extents(exts...) is
true.
Change Paragrpah 9 to:
Preconditions:
If N != rank_dynamic() is true,
exts[r] equals Er for each
r for which Er is a static
extent, and
are-valid-extents(exts) is
true.
Change paragraph 12 to:
Remarks: The deduced type is
dextents<size_t, sizeof...(Integrals)>.
Replace the following:
  static constexpr size_type static_extent(rank_type i) const noexcept;with:
  static constexpr size_t static_extent(rank_type i) const noexcept;Replace the following:
  template<size_t... OtherExtents>
    friend constexpr bool operator==(const extents& lhs,
                                     const extents<OtherExtents...>& rhs) noexcept;with:
  template<class OtherSizeT, size_t... OtherExtents>
    friend constexpr bool operator==(const extents& lhs,
                                     const extents<OtherSizeT, OtherExtents...>& rhs) noexcept;Returns: true if
lhs.rank() == rhs.rank() is true and
lhs.extents(r) equals rhs.extents(r) for every
rank index r of rhs, otherwise
false.
Replace section with:
  template <class SizeT, size_t Rank>
    using dextents = see below;1
Result: A type E that is a specialization of
extents such that
E::rank() == Rank && E::rank() == E::rank_dynamic()
is true and E::size_type denotes
SizeT.
Preconditions: other.required_span_size() is a
representable value of size_type ([basic.fundamental]).
Preconditions: other.required_span_size() is a
representable value of size_type ([basic.fundamental]).
Preconditions:
If extents_type::rank() > 0 is true
then for all r in the range
[0,
extents_type::rank()),
other.stride(r) equals
extents().fwd-prod-of-extents(r),
and
other.required_span_size() is a representable value
of size_type ([basic.fundamental]).
Preconditions: other.required_span_size() is a
representable value of size_type ([basic.fundamental]).
Preconditions: other.required_span_size() is a
representable value of size_type ([basic.fundamental]).
Preconditions:
If extents_type::rank() > 0 is true
then for all r in the range
[0,
extents_type::rank()),
other.stride(r) equals
extents().rev-prod-of-extents(r+1),
and
other.required_span_size() is a representable value
of size_type ([basic.fundamental]).
add in paragraph 6 additiona precondition:
other.stride(r) > 0 is true for all
rank index r of extents(),In the synopsis replace:
  static constexpr size_type static_extent(size_t r) { return Extents::static_extent(r); }with:
  static constexpr size_t static_extent(size_t r) { return Extents::static_extent(r); }In the synopsis replace:
  constexpr size_type size() const;with
  constexpr size_t size() const;In the synopsis replace:
  template <class ElementType, class SizeType, size_t N>
  mdspan(ElementType*, const array<SizeType, N>&)
    -> mdspan<ElementType, dextents<N>>;
  template <class ElementType, size_t... ExtentsPack>
  mdspan(ElementType*, const extents<ExtentsPack...>&)
    -> mdspan<ElementType, extents<ExtentsPack...>>;with:
  template <class ElementType, class SizeType, size_t N>
  mdspan(ElementType*, const array<SizeType, N>&)
    -> mdspan<ElementType, dextents<size_t, N>>;
  template <class ElementType, class SizeT, size_t... ExtentsPack>
    mdspan(ElementType*, const extents<SizeT, ExtentsPack...>&)
    -> mdspan<ElementType, extents<SizeT, ExtentsPack...>>;Change code after paragraph 19 to:
  template <class ElementType, class SizeType, size_t N>
  mdspan(ElementType*, const array<SizeType, N>&)
    -> mdspan<ElementType, dextents<size_t, N>>;Add a sub-bullet in 11.1:
typename SubExtents::size_type is
typename Extents::size_typeThere is an mdspan implementation available at https://github.com/kokkos/mdspan/.
mdspan