CCore -> PtrLen

PtrLen

Files CCore/inc/gadget/PtrLen.h CCore/src/gadget/PtrLen.cpp

Range idiom and loops

This file contains main tools to work with ranges of objects. Range is a continuous part of an array. There is an object in the range with the lower address, its address is a base address of the range. "Address" means a pointer of the correspondent type. If this address is p, then the range consists of objects with addresses { p, p+1, ..., p+n-1 }, where n is the number of elements of the range. A range can be empty. In this case it has the base address, which may not refer to an existing object, but the number of elements is 0. A range of length 1 can also be formed from a single object. Finally, empty range may have the base address equals null.

Sometimes Range is considered more generally using idiom of iterator, but we focus on the basic type of ranges, because of its practical importance. Basic ranges are the most efficient and convenient way of representation of a set of similar data, so it must be used whenever possible. The main advantages of basic ranges are: best loops, best sequential and random object access, best cache locality.

The best loop in the programming is the loop n-times. This is because such loops can be unrolled, pipelined and optimized for small loop count. So compilers usually try to reorganize every loop into this kind of loop, if possible.

Range...()

Range...() is four basic functions to work with ranges.


template <class T,class S>
void RangeCopy(T *restrict dst,const S *restrict src,ulen count);

template <class T>
void RangeSet(T *dst,T src,ulen count);

template <class T>
void RangeSet_null(T *dst,ulen count);

template <class T,class S>
ulen RangeMatch(const T *dst,const S *src,ulen count);

RangeCopy() does copy one range to disjoint another range with the possible implicit type conversion. It performs assignments *dst=*src;. Here we use the non-standard extension to C++ — keyword restrict. It tells compiler, that dst and src ranges has no common elements, and enables loop optimization, based on this assumption. restrict, in fact, is a macro in the target PlatformBase.h header, so target can define this word according compiler abilities.

RangeSet() sets elements of the given range to the given value. The src argument is not a reference. It performs assignments *dst=src;.

RangeSet_null() sets elements of the given range to the null value T(). It performs assignments *dst=T();.

The interface and implementation of these functions are selected to be more appropriate, when the type T is a Small Data Type. They do they job in the ascending order, but you should not assume any specific order of execution.

RangeMatch() compares elements of two ranges. It returns the number of the first equals pairs. In other words, if the RangeMatch() returned len, then dst[0]==src[0], ..., dst[len-1]==src[len-1], but len==count or dst[len]!=src[len]. To compare elements the operator != is used.

PtrLen

PtrLen is a range representation type. It contains the range base pointer and the range length.


template <class T>
struct PtrLen
 {
  T *ptr;
  ulen len;

PtrLen is widely used in CCore to pass ranges as arguments and to do native operations with ranges. This type has a bunch of methods to do so.

You can use constructors to create a PtrLen, but more often you should use the function Range().


  // constructors
  
  PtrLen() noexcept : ptr(0),len(0) {}
  
  PtrLen(NothingType) : ptr(0),len(0) {}
  
  PtrLen(T *ptr_,ulen len_) : ptr(ptr_),len(len_) {}

  operator PtrLen<const T>() const { return {ptr,len}; }

You can also cast to the const version.

Cursor methods can be used to iterated through range elements. When you use PtrLen as a cursor, it advances in the ascending order. I.e. after (ptr,len) the (ptr+1,len-1) follows.


  // cursor
  
  ulen operator + () const { return len; } // get length
                                           // range is not empty
  
  bool operator ! () const { return !len;  } // range is empty
  
  T & operator * () const { return *ptr; } // first element
  
  T * operator -> () const { return ptr; } // pointer to the first element
  
  void operator ++ () { ptr++; len--; } // advance 
  
  void operator -- () { ptr--; len++; } // back
  
  PtrLen<T> operator += (ulen delta); // advance on delta
   
  PtrLen<T> takeup(ulen delta); // advance up to delta elements
  
  T & take(); // take first element and advance

  PtrLen<T> getFinal() const { return {ptr+len,0}; }

Here is a typical usage:


for(auto r=Range(....); +r ;++r) r->method();


for(auto r=Range(....); +r ;++r) func(*r);


for(auto r=Range(....); +r ;) func(r.take());


for(auto r=Range(....); r.fit(10) ;)
  {
   auto prefix=(r+=10); // first 10 element subrange

   ....
  }


for(auto r=Range(....);;)
  {
   auto prefix=r.takeup(10); // first <=10 element subrange

   if( !prefix ) break; // r was empty

   ....
  }

operator += advances on the given number of elements and returns the range of these elements. You must be sure the number does not exceed the range length.

takeup() is similar, but it checks the range length and advances up to the given number of elements.

Loop


for(auto r=Range(....); +r ;++r) { .... /* (*r) */ }

,in fact, is the same as

{
 T *ptr=....;
 ulen len=....;

 for(; len ;ptr++,len--) { .... /* (*ptr) */ }
}

So it is count-down n-times loop. There is another popular way of a range iteration:


T *ptr=....;
T *lim=....;

for(; ptr<lim ;ptr++) { .... /* (*ptr) */ }

Compilers, however, often turn this loop into the previous one. It leads to:


T *ptr=....;
T *lim=....;

ulen len=lim-ptr; // there is a hidden division operation here

for(; len ;ptr++,len--) { .... /* (*ptr) */ }

Many embedded CPU has no hardware division. So it leads to an expensive hidden operation, that may kill performance for short tight loops.

Reverse iteration can be done with the method getFinal():


auto r=Range(....); // (ptr,len)

for(auto fin=r.getFinal(); fin.len<r.len ;) 
  {
   --fin;

   func(fin); // iterate (ptr+len-1,1), (ptr+len-2,2), ..., (ptr,len) 
  }

Methods fit() can be used to check the length of the range.


  // fit
   
  bool fit(ulen length) const { return length<=len; }
  
  bool fit(ulen off,ulen length) const { return off<=len && length<=(len-off) ; }

The first checks is there at least the length elements. The second can be used to check if the subrange with the given off(set) from the begin and length can be selected.

You can select a desired part of the range using parts methods:


  // parts
  
  PtrLen<T> prefix(ulen length) const; // prefix of the given length
  
  PtrLen<T> prefix(PtrLen<T> suffix) const; // prefix, complementary to the given suffix
  
  PtrLen<T> suffix(ulen length) const; // suffix of the given length
  
  PtrLen<T> part(ulen off,ulen length) const; // inner part with the given off(set) and length
  
  PtrLen<T> part(ulen off) const; // suffix with the given off(set)
  
  PtrLen<T> inner(ulen off,ulen endoff) const; // inner part with the given off(set) and offset from the end (endoff)

You can pick an element of the range based on its index:

   
  // index access
  
  T & operator [] (ulen index) const;
   
  T & at(ulen index) const; // with the index guard
  
  T & back(ulen index) const; // reverse order, index 1 means the last element

The following set of methods performs basic range operations:


  // methods
  
  template <class S>
  void copyTo(S dst[/* len */]) const;
  
  template <class S>
  void copyFrom(const S src[/* len */]) const;
   
  void copy(const T src[/* len */]) const;
   
  void set(T src) const;
  
  void set_null() const;
   
  template <class S>
  ulen match(const S src[/* len */]) const;
   
  template <class S>
  bool equal(const S src[/* len */]) const;
  
  template <class S>
  bool equal(PtrLen<S> src) const; 

  template <class S>
  bool hasPrefix(PtrLen<S> src) const;
  
  template <class S>
  bool hasSuffix(PtrLen<S> src) const;

The last method is required to support range-based for with the PtrLen:


  // begin()/end() support
  
  bool operator != (PtrLen<T> end) const { return len!=end.len; }

StrLen

StrLen is a derived class from the PtrLen<const char>.


struct StrLen : PtrLen<const char>
 {
  // constructors
  
  StrLen() noexcept {}

  StrLen(NothingType) {}

  // implicit conversions
 
  StrLen(const char *zstr); // zstr!=0 is a pointer to a zero-terminated string

  StrLen(PtrLen<const char> str) : PtrLen<const char>(str) {}
 
  StrLen(PtrLen<char> str) : PtrLen<const char>(str.ptr,str.len) {}

  // given string
  
  StrLen(const char *str,ulen len) : PtrLen<const char>(str,len) {}
 };

StrLen is widely used in CCore to pass string arguments and to work with strings. It is not assumed, that the string is zero-terminated. Instead, we specify the string length.

StrLen has a several implicit constructors, so it can be silently converted from different string-like types.

The implicit encoding string length with zero termination was a language design mistake. To perform string operations you have to know its length. So each time you will have to calculate it. It may lead to quadratic time algorithms instead of linear time if you are not steady enough. But the most goodless thing is you cannot select a substring from a zero-terminated string: it is not a zero-terminated! So you have to allocate memory and do copy! In CCore we avoid it converting zero-terminated strings into StrLen. So we calculate zero-terminated string length only one time and then use it everywhere.

Range() and Range_const()

Range() and Range_const() is a family of Creator functions for the range creation.


template <class T>
PtrLen<T> Range(PtrLen<T> a);
 
template <class T,ulen Len>
PtrLen<T> Range(T (&buf)[Len]);
 
template <class T>
PtrLen<T> Range(T *ptr,ulen len);
 
template <class T>
PtrLen<T> Range(T *ptr,T *lim);

And the correspondent const-variants.


template <class T>
PtrLen<const T> Range_const(PtrLen<T> a);
 
template <class T,ulen Len>
PtrLen<const T> Range_const(T (&buf)[Len]);
 
template <class T>
PtrLen<const T> Range_const(T *ptr,ulen len);
 
template <class T>
PtrLen<const T> Range_const(T *ptr,T *lim);

You can create a range from a PtrLen, from an array, from a base pointer and length and from a base pointer and a limit pointer. Limit is a pointer to the past-the-end element. It may not refer to an actual object. The advantage of using the function Range() is an implicit type inference. Consider:


int buf[10];

auto r=Range(buf);

PtrLen<int> r1(buf,10);

Finally, there exists two generic functions Range() and Range_const():


template <class S>
auto Range(S &&src);
 
template <class S>
auto Range_const(const S &src);

They are applicable to objects of types with the Range Access Interface:


class C
 {
  public:

   // range access
   
   T * getPtr(); // called from Range()
   
   const T * getPtr() const; // called from Range() with constant argument

   const T * getPtr_const() const; // called from Range_const()
   
   ulen getLen() const;
 };

Range and range-based for

CCore defines in its namespace global functions begin() and end(). They are applicable to objects of types with the Range Access Interface and to PtrLen type. So you can use range-based for:


PtrLen<T> r=....;

for(auto obj : r ) { .... }

This loop is essentially the same as:


PtrLen<T> r=....;

for(; +r ;++r) { .... } // *r instead of obj

Or:


DynArray<T> data;

for(auto obj : data ) { .... }

Instead of:


DynArray<T> data;

for(auto r=Range(data); +r ;++r) { .... } // *r instead of obj

Sometimes you need to do something more complicated than merely run through a range. In such cases use range loops instead of range-based for:


DynArray<T> data;

auto r=Range(data);

if( +r )
  {
   .... // handle the first element

   for(++r; +r ;++r)
     {
      .... // do with the rest
     }
  }
else
  {
   .... // empty case
  }

Mutate...()

Mutate...() is a three functions family to cast a pointer or a range to the byte pointer or the byte range.


template <class T>
T * MutatePtr(void *ptr);

template <class T>
T * MutatePtr(const void *ptr);

template <class T,class S>
PtrLen<T> Mutate(PtrLen<S> a);

The destined type T must be (const) char or unsigned char. Mutate() creates a PtrLen over all bytes, occupied by elements of the source range. C++ standard allows aliasing of objects with these types. Mutate...() should be used with caution in special cases for the sake of performance or to do some low-level processing.