Here's a short one on string literals in C++. Ask yourself: what is their type? It is 'array of n const char', correct! So, we might think:
char* literal = "Hello World!";
would be invalid in C++. Surprisingly, it is not: even Comeau online compiles it successfully, without so much as a warning.
The C++ standard, however, tries to protect you. It hints that the above is wrong by marking it as a deprecated feature, one that was probably allowed only to keep backward compatibility with C.
Here is what the standard says as part of section [2.13.4/2]:
[quote]
A string literal that does not begin with u, U, or L is an ordinary string literal, also referred to as a narrow string literal. An ordinary string literal has type “array of n const char”, where n is the size of the string as defined below; it has static storage duration (3.7) and is initialized with the given characters.
[/quote]
So, by that rule alone, the following would definitely have been invalid in C++:
char* literal = "Hello World!";
"Hello World!" is an array of 13 [spooky :-)] constant characters: 12 visible characters plus the terminating '\0'. 'literal' points to the first element of that array, and since that element is const, the pointer should not be declared as a plain char*. It has to be of type 'pointer to const char'.
But, as mentioned above, this works in C, so for backward compatibility the standard defines a special implicit conversion alongside the array-to-pointer conversion: a string literal can be converted to an rvalue of type "pointer to char". The standard mentions this in section [4.2/2]:
[quote]
A string literal (2.13.4) with no prefix, with a u prefix, with a U prefix, or with an L prefix can be converted to an rvalue of type “pointer to char”, “pointer to char16_t”, “pointer to char32_t”, or “pointer to wchar_t”, respectively. In any case, the result is a pointer to the first element of the array. This conversion is considered only when there is an explicit appropriate pointer target type, and not when there is a general need to convert from an lvalue to an rvalue. [ Note: this conversion is deprecated. See Annex D. —end note ]
[/quote]
The good news is the note above, which is reiterated in Annex D, section [D.4/1]:
[quote]
The implicit conversion from const to non-const qualification for string literals (4.2) is deprecated.
[/quote]
So, the best approach is to keep the good habit of declaring a pointer to a string literal as a pointer to const. :-)
[The quotes above are from the C++ standard draft with document number N2315=07-0175, dated 2007-06-25.]
Showing posts with label string. Show all posts
Tuesday, July 28, 2009
String literals in C++
Posted by abnegator at 7/28/2009 09:29:00 PM
1 comment
Labels: backward compatibility, C, C++, const, function pointers, literals, rvalue, standard conversions, string, undefined behaviour
Friday, August 15, 2008
Permutations in C++
A small, simple sample that illustrates how to get the permutations of the characters of a string in C++ using std::next_permutation from the standard header <algorithm>.
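[code]
#include <algorithm>
#include <iostream>
#include <iterator>   // std::ostream_iterator
#include <string>
#include <vector>

int main()
{
    // std::next_permutation steps through permutations in lexicographic order,
    // so the input must start out sorted ("ABC" is) for the loop to visit all of them.
    std::string input = "ABC";
    std::vector<std::string> perms;
    perms.push_back(input);

    std::string::iterator itBegin = input.begin();
    std::string::iterator itEnd = input.end();
    // Rearranges input in place; returns false once it wraps back to sorted order.
    while (std::next_permutation(itBegin, itEnd))
    {
        perms.push_back(std::string(itBegin, itEnd));
    }

    // Print one permutation per line.
    std::copy(perms.begin(), perms.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
}
[/code]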
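The sample above starts from "ABC", which is already in sorted order. For arbitrary input, std::next_permutation only reaches the permutations that come after the starting arrangement, so a common approach is to sort first. Here is a minimal sketch of that idea; the helper name `all_permutations` is an assumption made for this example and is not part of the standard library.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: returns every distinct permutation of s in
// lexicographic order. The string is taken by value so the caller's
// copy is left untouched.
std::vector<std::string> all_permutations(std::string s)
{
    // Start from the smallest arrangement so that none are skipped.
    std::sort(s.begin(), s.end());

    std::vector<std::string> perms;
    do
    {
        perms.push_back(s);
    } while (std::next_permutation(s.begin(), s.end()));
    return perms;
}
```

Calling `all_permutations("CAB")` yields the same six strings as the sample, beginning with "ABC". Repeated characters are handled too: equal arrangements are generated only once.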
Posted by abnegator at 8/15/2008 10:04:00 PM
1 comment
Labels: algorithm, C++, next_permutation, ostream_iterator, permutations, std::copy, string, vector
Wednesday, April 16, 2008
Case insensitive string comparison
While looking for a case-insensitive comparison function, I came across this nice article by Matt Austern: Case Insensitive String Comparison. I decided to try writing a sample that does it. Below is the result. I am not sure it is perfect and free of issues, but it is the best I could do, and at least it is better than ignoring the locale completely! Yes, ignoring it would work most of the time because it is rarely needed, but what would you do if the need did come up?
I tested the code with VS 2005. I couldn't test it with gcc: my gcc setup had no support for the German locale, the only sample strings I had were the German ones from the article above, and I was too lazy to hunt for samples in another language. If you find an issue with words from some other language for which your compiler provides locale support, please point it out.
Code:
#include <algorithm>
#include <functional>
#include <iostream>
#include <iterator>   // std::back_inserter
#include <locale>
#include <string>
//used boost::bind due to buggy bind2nd
//#include<tr1/bind.hpp>
#include <boost/bind.hpp>
//using namespace std;
//using namespace std::tr1;
//using namespace std::tr1::placeholders;
using namespace boost;

// Strict weak ordering that compares two strings after lower-casing
// both of them with the two-argument std::tolower for a given locale.
struct CaseInsensitiveCompare
{
    // const, so the functor can also be used through a const reference
    bool operator()(const std::string& lhs, const std::string& rhs) const
    {
        std::string lhs_lower;
        std::string rhs_lower;
        //std::transform(lhs.begin(), lhs.end(), std::back_inserter(lhs_lower), std::bind2nd(std::ptr_fun(std::tolower<char>), loc));
        //std::transform(rhs.begin(), rhs.end(), std::back_inserter(rhs_lower), std::bind2nd(std::ptr_fun(std::tolower<char>), loc));
        // Build lower-cased copies, then compare the copies.
        std::transform(lhs.begin(), lhs.end(), std::back_inserter(lhs_lower), bind(std::tolower<char>, _1, loc));
        std::transform(rhs.begin(), rhs.end(), std::back_inserter(rhs_lower), bind(std::tolower<char>, _1, loc));
        return lhs_lower < rhs_lower;
    }
    CaseInsensitiveCompare(const std::locale& loc_): loc(loc_){}
private:
    std::locale loc;
};

int main()
{
    // \334 and \374 are the octal codes for U-umlaut and u-umlaut in Latin-1
    std::string lhs = "GEW\334RZTRAMINER";
    std::string rhs = "gew\374rztraminer";
    std::cout << "lhs : " << lhs << std::endl;
    std::cout << "rhs : " << rhs << std::endl;
    // "German_germany" is the locale name as spelled on Windows (VS 2005);
    // other platforms use different names
    CaseInsensitiveCompare cis((std::locale("German_germany")));
    //CaseInsensitiveCompare cis((std::locale()));
    std::cout << "compare result : " << cis(lhs,rhs) << std::endl;
}
One obvious improvement/alternative is to avoid copying the strings and instead apply tolower/toupper character by character while comparing. This has a second advantage: the comparison stops at the first mismatching character, so there is no need to convert both strings in full to a common case.
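A rough sketch of that per-character alternative follows, using std::lexicographical_compare from <algorithm> so that no lower-cased copies are built. The names `CharLessNoCase` and `less_no_case` are invented for this example, and the default of `std::locale::classic()` is only an assumption to keep the sketch portable; pass whichever locale you actually need.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical helper: orders two characters after lower-casing each of
// them with the two-argument std::tolower from <locale>.
struct CharLessNoCase
{
    explicit CharLessNoCase(const std::locale& loc) : loc_(loc) {}

    bool operator()(char a, char b) const
    {
        // Compare as unsigned char so that characters above 127 are ordered
        // the same way std::string's own operator< orders them.
        return static_cast<unsigned char>(std::tolower(a, loc_)) <
               static_cast<unsigned char>(std::tolower(b, loc_));
    }

private:
    std::locale loc_;
};

// Hypothetical helper: case-insensitive "less than" without copying.
// std::lexicographical_compare stops at the first position where the
// two strings differ.
inline bool less_no_case(const std::string& lhs, const std::string& rhs,
                         const std::locale& loc = std::locale::classic())
{
    return std::lexicographical_compare(lhs.begin(), lhs.end(),
                                        rhs.begin(), rhs.end(),
                                        CharLessNoCase(loc));
}
```

Like the copying version, this only converts one char at a time, so it shares the same limits for encodings in which a single character spans several bytes.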
Herb Sutter, in one of his GotW articles, writes about a case-insensitive string class and also shows the basic problems it has. Simply put, it doesn't work with iostreams (cout/cerr etc.). Here: Strings: A case insensitive string class.
Posted by abnegator at 4/16/2008 07:56:00 PM
0 comments
Labels: back_inserter, bind, bind2nd, boost, C++, case insensitive comparison, locale, ptr_fun, string, tolower, toupper, tr1, transform
