==== Regular Expressions: Additional note ====

A question came up in the Regular Expressions class regarding character
classes and ordering.  It is interesting to note that the locale
(language) settings on a UNIX or Linux system affect how character
classes like ''[a-z]'' and ''[A-Z]'' are interpreted.

You can check your locale settings by running the ''locale'' command.
If its output includes ''LC_COLLATE=C'', then character classes will
behave as I suggested in the class: ranges follow the order of the ASCII
character table ([[http://www.ascii-code.com/]]).
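
A minimal sketch of that check, filtering the ''locale'' report down to the collation line.  The variable name ''collate_line'' is only for illustration.  Note that ''locale'' may show the value in double quotes (for example ''LC_COLLATE="en_US.UTF-8"'') when it is inherited from ''LANG'' or ''LC_ALL'' rather than set directly.

```shell
# Keep only the collation line from the full locale report.
collate_line=$(locale | grep '^LC_COLLATE=')
echo "$collate_line"
```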

However, if ''LC_COLLATE'' is set to almost anything else, the behavior
may change.  For example, if ''LC_COLLATE'' is ''en_US.UTF-8'' (the default
on the Mills and Farber clusters), ranges follow the collation (sorting)
order of your language.  This often means the order ''aAbBcC ... yYzZ''.

As an example, I have a directory with files starting with every letter
of the alphabet, capital and lowercase.

With ''en_US.UTF-8'' collation, the following command lists all files starting with ''R'', ''s'', and ''S'':
    ls [R-S]*

By contrast, the following command lists only files starting with ''R'' and ''s'':
    ls [R-s]*

In neither case were files starting with a lower-case ''r'' listed, because ''r'' sorts before ''R'' in this collation order and so falls outside both ranges.
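
For comparison, here is a sketch of the same experiment with collation forced to ''C'', where ranges follow ASCII order.  It assumes a POSIX-style shell with ''mktemp'' available; the scratch directory and the file names (''R_upper.txt'', ''r_lower.txt'', and so on) are made up for illustration.  The locale is exported inside a subshell because the shell itself expands the pattern, so the one-command prefix form ''LC_ALL=C ls [R-S]*'' would not change which files match.

```shell
# Scratch directory with one file per letter, upper and lower case.
# The suffixes keep names distinct even on case-insensitive filesystems.
demo_dir=$(mktemp -d)
for letter in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
    touch "$demo_dir/${letter}_lower.txt"
done
for letter in A B C D E F G H I J K L M N O P Q R S T U V W X Y Z; do
    touch "$demo_dir/${letter}_upper.txt"
done

# In the C locale, [R-S] covers only the two capital letters R and S.
c_upper_range=$(cd "$demo_dir" && export LC_ALL=C && ls [R-S]*)

# In the C locale, [R-s] covers R through Z, a few punctuation
# characters, and a through s: 9 + 19 = 28 of the files created above.
c_mixed_count=$( (cd "$demo_dir" && export LC_ALL=C && ls [R-s]*) | wc -l)

echo "$c_upper_range"
echo "$c_mixed_count"

rm -rf "$demo_dir"
```

Running the same patterns without the ''export LC_ALL=C'' step shows what your own login locale does with them.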

This problem is further exacerbated by the fact that the behavior can
vary a little from system to system.  As a result, most discussions
on the Internet conclude that character classes like ''[a-z]'' and
''[A-Z]'' should only be used when they would not be affected by these
differences.
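
One commonly suggested alternative, which was not covered in the class, is to use the POSIX named classes ''[[:upper:]]'' and ''[[:lower:]]'' instead of letter ranges, since they select by character type rather than by collation position.  The sketch below assumes a shell whose pattern matching supports these classes (bash does); the directory and file names are again hypothetical.

```shell
# Scratch directory with one file per letter, upper and lower case.
demo_dir=$(mktemp -d)
for letter in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
    touch "$demo_dir/${letter}_lower.txt"
done
for letter in A B C D E F G H I J K L M N O P Q R S T U V W X Y Z; do
    touch "$demo_dir/${letter}_upper.txt"
done

# Count the files whose first character is an upper-case letter,
# then those whose first character is a lower-case letter.
upper_count=$( (cd "$demo_dir" && ls [[:upper:]]*) | wc -l)
lower_count=$( (cd "$demo_dir" && ls [[:lower:]]*) | wc -l)

echo "$upper_count"
echo "$lower_count"

rm -rf "$demo_dir"
```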