CompSci497
Advanced XML Programming with XmlPL
Lecture Notes
Friday, January 19th, 2007
Contents
- A brief history: SGML, HTML, XML & XHTML
  - SGML
  - HTML
  - XML
  - XHTML
- Data description languages, what are they good for?
- An introduction to XML
  - Pros
  - Cons
  - Basic Syntax
  - XML Extensions
1) A brief history: SGML, HTML, XML & XHTML
1.1) SGML
- Standard Generalized Markup Language
- Grew out of IBM's GML, developed in the 1960s; standardized as ISO 8879 in 1986
- Created for long-term, machine-readable document storage.
- Complicated syntax
1.2) HTML
- Hyper-Text Markup Language
- First version appeared in 1993
- Originally an application of SGML; later deviated from the
SGML standard.
- HTML is a standard maintained by the W3C (w3.org). Specifications
can be found on its website.
1.3) XML
- First working draft published in 1996 by the W3C; XML 1.0 became
a W3C Recommendation in 1998
- Also derived from SGML
- Designed as a simplified subset of SGML, not an application of it
1.4) XHTML
- Published as a Recommendation by the W3C in 2000
- XHTML is intended to replace HTML as the new Web standard
- XHTML is a true application of XML
- XHTML is stricter in what it allows and is therefore easier
to parse.
2) Data description languages, what are they good for?
- Sometimes called metalanguages or markup languages
- Focus on the description of data rather than on the description
of computation.
- DDLs are often terribly verbose for describing computation,
but it is easy to "make up" new languages with them.
- Data is easy to parse, unlike the native data structures of
languages such as C or Java
- Data is easily interchangeable between various machines,
operating systems and programming languages
- Data longevity.
- Other languages: JSON, YAML, S-expressions
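
As a small illustration of the interchange point above, the sketch below writes one record in both XML and JSON and reads each back with Python's standard library. The record and its field names (code, title) are made up for this example and are not part of the course material.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, written once as XML and once as JSON.
XML_TEXT = '<course code="CompSci497"><title>Advanced XML Programming</title></course>'
JSON_TEXT = '{"code": "CompSci497", "title": "Advanced XML Programming"}'


def from_xml(text):
    """Parse the XML form into a plain dictionary."""
    root = ET.fromstring(text)
    return {"code": root.get("code"), "title": root.findtext("title")}


def from_json(text):
    """Parse the JSON form into a plain dictionary."""
    return json.loads(text)


if __name__ == "__main__":
    # Both notations carry the same data, so the dictionaries compare equal.
    print(from_xml(XML_TEXT) == from_json(JSON_TEXT))
```

Neither parser needs to know which machine or language produced the text, which is the point of a data description language.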
3) An introduction to XML.
3.1) Pros
- Both human and machine readable
- Fairly simple syntax (if you stay away from DTDs)
- Easy to parse.
- Widely supported
- Hierarchical
- Easy to describe common data structures such as
lists, trees and graphs.
- Unicode support
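
To make the "hierarchical" and "trees" points concrete, here is a minimal sketch that stores a small folder tree as nested elements and walks it recursively. The folder and file element names are invented for illustration, and Python's xml.etree.ElementTree stands in for whatever parser the course uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a small tree of folders and files.
DOC = """
<folder name="root">
  <file name="a.txt"/>
  <folder name="src">
    <file name="main.c"/>
    <file name="util.c"/>
  </folder>
</folder>
"""


def paths(element, prefix=""):
    """Return the full path of every <file> at or below element, in document order."""
    here = prefix + "/" + element.get("name")
    if element.tag == "file":
        return [here]
    result = []
    for child in element:
        result.extend(paths(child, here))
    return result


if __name__ == "__main__":
    for path in paths(ET.fromstring(DOC)):
        print(path)
```

Lists and trees map directly onto element nesting. A general graph usually needs an extra convention, such as id attributes that other elements refer to, because nesting alone can only express a tree.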
3.2) Cons
- Can be verbose
- Some features left over from SGML are overly complicated.
- Can be difficult to map to non-hierarchical, relational
structures such as databases.
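
The mapping problem can be sketched as follows: nested elements have to be flattened into separate tables, and the parent/child relationship that nesting expressed for free must be rebuilt with an explicit key. The orders document and its names are hypothetical, and this is only one possible flattening.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document; element and attribute names are made up.
DOC = """
<orders>
  <order id="1" customer="Ada">
    <item sku="pen" qty="2"/>
    <item sku="ink" qty="1"/>
  </order>
  <order id="2" customer="Bob">
    <item sku="pad" qty="5"/>
  </order>
</orders>
"""


def to_rows(root):
    """Flatten nested orders into two row lists linked by the order id."""
    orders, items = [], []
    for order in root.findall("order"):
        order_id = int(order.get("id"))
        orders.append((order_id, order.get("customer")))
        for item in order.findall("item"):
            # Nesting is lost in a flat table, so each item row repeats its parent's key.
            items.append((order_id, item.get("sku"), int(item.get("qty"))))
    return orders, items


if __name__ == "__main__":
    order_rows, item_rows = to_rows(ET.fromstring(DOC))
    print(order_rows)
    print(item_rows)
```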
3.3) Basic Syntax
- <name attribute="value">content</name>
- One root element
- An element's children can be elements, text, comments and
processing instructions
- <!-- Comment -->
- Short form element: <element/>
- Special characters: &, <, >, ' and "
- Entity references: &amp; (&), &lt; (<), &gt; (>), etc.
- Numerical character references: &#169; (©)
- Full details: the W3C's XML specification
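
The syntax rules above can be tried out with a short sketch. The note document below is invented for illustration; it combines an attribute, a comment, entity references, a numerical character reference and a short-form element, and is parsed with Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical document exercising the basic syntax rules.
DOC = """<note lang="en">
  <!-- Comments are dropped by ElementTree's default parser -->
  <to>Tom &amp; Jerry</to>
  <body>1 &lt; 2, &#169; 2007</body>
  <sent/>
</note>"""

root = ET.fromstring(DOC)              # exactly one root element: <note>

lang = root.get("lang")                # attribute value
recipient = root.findtext("to")        # &amp; is decoded to &
body = root.findtext("body")           # &lt; and &#169; are decoded to < and the copyright sign
child_tags = [child.tag for child in root]
sent_text = root.find("sent").text     # a short-form element has no text content

if __name__ == "__main__":
    print(lang, recipient, body, child_tags, sent_text)
```

The parser hands back the decoded characters, so application code never sees the entity references themselves.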
3.4) XML Extensions
- Namespaces
- XPath
- XPointer
- XML Schema
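
As a brief taste of the first two extensions, the sketch below declares a namespace and selects elements with a path expression. The namespace URI, the library vocabulary and the book titles are all hypothetical. Note that Python's xml.etree.ElementTree implements only a limited subset of XPath, so this is an approximation of what a full XPath engine offers.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and vocabulary, for illustration only.
DOC = """
<lib:library xmlns:lib="http://example.org/library">
  <lib:book year="1996"><lib:title>SGML Basics</lib:title></lib:book>
  <lib:book year="2000"><lib:title>XHTML Basics</lib:title></lib:book>
</lib:library>
"""

# The prefix used in queries is local to this mapping; only the URI has to match.
NS = {"l": "http://example.org/library"}


def titles_from(root, year):
    """Return the titles of books whose year attribute equals the given year."""
    query = f"l:book[@year='{year}']/l:title"
    return [title.text for title in root.findall(query, NS)]


if __name__ == "__main__":
    print(titles_from(ET.fromstring(DOC), "2000"))
```

The query uses the prefix l while the document uses lib; both resolve to the same URI, which is what actually identifies the namespace.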