Adding characters to your code


1st Dec, 2020


A character entity reference is an SGML (Standard Generalized Markup Language) construct that references a character in the document character set. HTML defines several sets of character entity references, each serving a different purpose:

  • Character entity references for ISO-8859-1 (Latin-1) characters.
  • Character entity references for symbols, mathematical symbols, and Greek letters.
  • Character entity references for markup-significant and internationalization characters.


Adding character entities to your website can make your pages easier to understand and more intuitive for your users. You need to use a character entity reference when:

  1.  Your editor does not support Unicode characters.

Unicode is a character standard in which each character is assigned a unique number (a code point) between U+0000 and U+10FFFF. Unicode text can be encoded with 8-bit, 16-bit, or 32-bit code units (UTF-8, UTF-16, and UTF-32).

For example, to see a character's Unicode code point in Microsoft Word, select the character and press Alt+X, and Word will display the code point for that character.

 The character code for w is 0077.

A common Unicode encoding is UTF-8, which uses 8-bit code units and represents each character with one to four bytes. It is often used in Linux environments to encode non-ASCII characters so that they display properly, for example when output is written to a text file.
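To make code points and UTF-8 a little more concrete, here is a minimal Python sketch. Python is used only for illustration and is not part of the original article, and the helper names `code_point` and `utf8_length` are hypothetical. It prints the code point of w in the same four-digit form shown above and reports how many bytes UTF-8 needs for a few characters.

```python
# Look up Unicode code points and UTF-8 byte lengths for a few characters.

def code_point(ch: str) -> str:
    """Return the code point of a single character in U+XXXX form."""
    return f"U+{ord(ch):04X}"


def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 uses to encode the character."""
    return len(ch.encode("utf-8"))


# "\u2014" is the em dash; the other characters are typed directly.
for ch in ["w", "©", "\u2014", "中"]:
    print(ch, code_point(ch), utf8_length(ch), "byte(s) in UTF-8")
```

Plain ASCII letters such as w take a single byte, while the copyright sign takes two and the em dash takes three, which is why UTF-8 is described as a variable-width encoding.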

So if your editor does not support the Unicode character for the word or symbol you want to use, you can write it with a character entity reference instead, as in the example below:

¾ can be written as &frac34;
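As a quick check, the named reference &frac34; and its two numeric forms can be decoded with Python's standard `html` module. This is only an illustrative sketch; a browser performs the same decoding when it renders the page.

```python
from html import unescape

# Three spellings of the same character reference for U+00BE (three quarters):
# a named reference, a decimal numeric reference, and a hexadecimal one.
references = ["&frac34;", "&#190;", "&#xBE;"]

decoded = [unescape(ref) for ref in references]
print(decoded)  # ['¾', '¾', '¾']
```

All three forms resolve to the same character, so the choice between them is mostly about which one is easiest to read in the source.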

  2. Your keyboard does not support the character you need to type. For example, many keyboards have no key for the em dash (—) or the copyright symbol (©).

You can type the following, and each reference will display as the expected character:


Copyright = &copy;

Em dash = &mdash;
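If you have the character but not its entity name, one way to look the name up is Python's `html.entities.codepoint2name` table, which covers the classic HTML 4 names. The sketch below is an assumption-laden illustration, and `entity_for` is a hypothetical helper, not something the article defines. It falls back to a numeric reference when no name exists.

```python
from html.entities import codepoint2name


def entity_for(ch: str) -> str:
    """Return a named reference if the table has one, else a numeric reference."""
    name = codepoint2name.get(ord(ch))
    return f"&{name};" if name else f"&#{ord(ch)};"


print(entity_for("©"))       # &copy;
print(entity_for("\u2014"))  # &mdash; (the em dash)
print(entity_for("中"))      # &#20013; (no named reference in the table)
```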


  3. Your editor does not support Unicode (very common some years ago, but probably not today), or you want to make explicit in the source code what is happening. For example, the &nbsp; code is clearer than the corresponding white space character. Another reason to keep &nbsp; is that it lets you display multiple consecutive spaces on an HTML page: a run of ordinary spaces is always collapsed to a single space when the page is rendered.
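The difference between ordinary spaces and &nbsp; can be sketched in Python. The function below only roughly mimics how a browser collapses runs of ordinary whitespace; it is a simplified assumption for illustration, not the real rendering algorithm, and `collapse_ascii_whitespace` is a hypothetical name.

```python
import re
from html import unescape


def collapse_ascii_whitespace(text: str) -> str:
    """Replace each run of ordinary whitespace with one space.

    The no-break space (U+00A0) is deliberately not in the character class.
    """
    return re.sub(r"[ \t\n\r\f]+", " ", text)


plain = "a     b"                          # five ordinary spaces
padded = unescape("a&nbsp;&nbsp;&nbsp;b")  # three no-break spaces

print(repr(collapse_ascii_whitespace(plain)))   # 'a b'
print(repr(collapse_ascii_whitespace(padded)))  # 'a\xa0\xa0\xa0b'
```

The ordinary spaces shrink to one, while the three no-break spaces survive, which matches what you see when a page with &nbsp; is rendered.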


Character entity references are not always the better choice, however:

  1. For scripts whose characters are easier to recognize visually than as codes (such as Chinese, Arabic, or Japanese), go ahead and type the characters directly in UTF-8, because using character entities would make the code difficult to read.
  2. If character set conversion to and from UTF-8 were not so unreliable (you always stumble over some characters and some tools that do not convert properly), standardizing on UTF-8 would be the way to go.
  3. If your pages are correctly encoded in UTF-8, you should have no need for HTML entities; just use the characters you want directly. The markup-significant characters < and & are the exception and must still be written as &lt; and &amp;.
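The trade-off between UTF-8 and entities can be seen by encoding the same text both ways. This is a small sketch using Python's built-in `xmlcharrefreplace` error handler, which swaps every non-ASCII character for a numeric character reference; the sample text is made up for illustration.

```python
text = "© 中文"

# UTF-8 stores the characters themselves; decoding the bytes is lossless.
as_utf8 = text.encode("utf-8")

# If the output must stay ASCII-only, every non-ASCII character is replaced
# by a numeric character reference instead.
as_ascii = text.encode("ascii", errors="xmlcharrefreplace")

print(as_utf8.decode("utf-8"))   # © 中文
print(as_ascii.decode("ascii"))  # &#169; &#20013;&#25991;
```

The ASCII version still displays correctly in a browser, but the source is much harder to read, which is the reason given above for typing such characters directly when the page is UTF-8.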


As shown above, it is important to know when to use character entities and when to use UTF-8 on your website. As a developer, whenever you hard-code text in your pages, you can type the character entity reference and it will display as the required character on the website.

You do not need to memorize the character entity references; you can download the list from here and refer to it as you code.

Also, if you use a WYSIWYG editor on your website, such as CKEditor, you will find that the editor inserts the entities for you while you simply type in UTF-8.


Example code.




This will show the character O when you run the code.
