Alphanumerical lists are sortable by alphabet and number, obviously, but if you have a list where each entry begins with a different punctuation mark (or any other kind of non-alphanumeric character), is there a similar standardised ordering method for them?
I imagine, for example, that a comma will come before whatever this is: ¦
I just tested an A-Z sort in Google Sheets where each cell was a different punctuation mark, and it seemed to rearrange what I’d entered into some sort of order, but is this order shared universally? Is there a global Unicode-compliant ordering method everyone uses?
Cheers!
There is a Unicode Technical Standard for this, called the Unicode Collation Algorithm. Whether everyone uses it, I can’t say. As it says on the linked page:
Conformance to the Unicode Standard does not imply conformance to any UTS.
So in other words it’s possible to conform to the Unicode Standard without adhering to the Unicode Collation Algorithm.
whatever this is: ¦
That is the pipe symbol, or vertical bar. When it has a gap in the middle it may be known as the broken pipe symbol or broken bar. It’s considered the same symbol with or without the gap. Early terminals displayed it with a gap to make it distinguishable from lower-case L characters.
The vertical bar (pipe) and broken bar are not the same symbol. Wikipedia has a whole section about it (“Solid vertical bar versus broken bar”). Only the pipe character can be used for pipes in Linux/Windows/Mac terminals.
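If you want to check this yourself, here is a minimal sketch using Python's standard `unicodedata` module. The variable name `names` is just for illustration; the point is that the two characters have separate code points and separate official names.

```python
import unicodedata

# Look up the code point and official Unicode name of each character.
names = {}
for ch in ("|", "¦"):
    names[ch] = unicodedata.name(ch)
    print(f"U+{ord(ch):04X} {names[ch]}")
```

The first is in the ASCII range (U+007C); the second (U+00A6) is not, which is why only the first works as a shell pipe.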
This is the technically correct answer, and like lots of things is waaaaay more complicated than you’d expect.
ASCII numbers?
If your input is limited to ASCII, sure.
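For what it's worth, "sort by the character's number" is what a plain sort does in many languages. A rough sketch in Python, where the default string sort compares code points (the list `marks` is made up for the example):

```python
# A made-up handful of punctuation marks, some ASCII and some not.
marks = ["¦", ",", "~", "!", "€", "|", "£"]

# Python's default sort compares strings by Unicode code point.
by_code_point = sorted(marks)

for ch in by_code_point:
    print(f"U+{ord(ch):04X} {ch}")
```

In this ordering the comma (U+002C) does land before the broken bar (U+00A6), but only because of where each character happens to sit in the code charts, not because of any collation rule.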
But ASCII is only a 7-bit standard, and only supports those characters needed by American English computer users in the 1960s. Lots of characters you might see in “plain text” are not part of ASCII, including all accented characters, all non-Latin alphabets, and many common symbols and punctuation marks such as these: £€¢©™°
(Yes, you could get accented characters in the pre-Unicode days using 8-bit “extended ASCII”, e.g. IBM/Windows code pages. However, those are not really ASCII and they will break if the text is interpreted as the wrong code page.)
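To illustrate the code page problem, here is a small sketch in Python. It takes the single byte that Windows-1252 uses for "é" and decodes it three ways; the variable names are only for the example.

```python
# "é" is a single byte (0xE9) in the Windows-1252 code page.
raw = "\u00e9".encode("cp1252")

as_cp1252 = raw.decode("cp1252")  # the intended character
as_cp437 = raw.decode("cp437")    # same byte read as the old IBM PC code page

# Strict ASCII rejects the byte outright, since it is above 0x7F.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(raw, as_cp1252, as_cp437, ascii_ok)
```

The same byte comes out as a different character under code page 437, and is not valid ASCII at all.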
Unicode collation is the Right Thing today.
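A quick sketch of why raw code point order is not enough for real text (Python again, with a made-up word list). This only shows the problem; it does not implement the Unicode Collation Algorithm, for which you would need a proper collation library such as ICU.

```python
# Made-up word list: one lowercase, one uppercase, one accented initial.
words = ["apple", "Zebra", "\u00c9clair"]

# Plain code point sort: "Z" (U+005A) < "a" (U+0061) < "É" (U+00C9).
by_code_point = sorted(words)
print(by_code_point)
```

A reader would expect apple, Éclair, Zebra, which is the kind of ordering a collation-aware sort is designed to produce.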
That’s the best standard I can think of.