IDevResource.com - XML Channel - What are Web Services?

What are Web Services?

Author: Richard Anderson ([email protected])
Date Submitted: February 25th 2000
Level of Difficulty: Beginner
Subjects Covered: XML, HTTP, HTML, COM, CORBA
Pre-required Reading: None

Introduction

A lot of people are starting to get interested in XML-based RPC protocols. Whilst these are great technologies, people sometimes don't appreciate the basic principles behind them. In this article I attempt to show how simple the concepts are, and why the whole idea of 'web services' is such a good one.

The 5th Generation Web

The web as we know it today is mostly interactive. We browse to a web site, find the information we are looking for, and typically either remember it or print it out. Whilst the web is useful in this respect, it is not really reaching its full potential.

Once I've found the information I'm looking for on the web, it isn't easy for me to save that information and manipulate the specific elements I'm interested in. For example, Amazon.com list most of the Wrox Press books that I have co-authored. As an author I like to go to Amazon and see how the books I've been involved with are selling. I also like to check how my friends' books are selling. To do this I have to manually go to each page and view the ranking and reviews. This is a real pain, as I don't care about all the flashy web graphics and plugs for other books; I just want to see the two or three elements of information I'm interested in. In SQL terms, I'd like to write something like this:

SELECT ranking, reviews FROM WWW.AMAZON.COM WHERE publisher = 'Wrox Press' AND author = 'ME'

Now as it happens, an author friend of mine wrote a tool for parsing each of the Amazon.com book pages which breaks out the ranking/reviews information by using regular expressions. Whilst not overly difficult to do once you've figured out the structure of the HTML (assuming you understand regular expressions too), it's a real hassle to do, and breaks as soon as Amazon change their web design. It would be much better for Amazon to expose functions like GetAllRankingForBookXYZ() and let me access that function and the returned data. I'd typically want to import the information into a database, or display it on my web site.
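To make the fragility concrete, here is a minimal Python sketch of that kind of regular-expression scraper. The HTML fragments and the pattern are invented for illustration (they are not Amazon's real markup, and this is not my friend's actual tool); the point is only that a pattern tuned to one page layout silently stops matching after a redesign.

```python
import re

# Hypothetical page fragments: this markup is invented for illustration
# and is not Amazon's real HTML.
OLD_PAGE = '<td><b>Amazon.com Sales Rank:</b> 81</td>'
NEW_PAGE = '<div class="rank"><span>Sales Rank</span><em>#81</em></div>'

# Pattern tuned to the old layout only.
RANK_PATTERN = re.compile(r"Sales Rank:</b>\s*([\d,]+)")


def scrape_ranking(html):
    """Return the sales rank as an int, or None if the pattern does not match."""
    match = RANK_PATTERN.search(html)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


print(scrape_ranking(OLD_PAGE))  # matches the old layout
print(scrape_ranking(NEW_PAGE))  # same data, new layout: no match
```

The data in both fragments is identical; only the presentation changed, and that alone is enough to break the scraper.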

Sticking with the Amazon.com example, given they are already presenting the information in an HTML page, it's safe to assume the basic data access infrastructure is there, so all we need is for Amazon to provide us with a way of accessing such data without all the HTML noise. Before running away and thinking DCOM or CORBA could solve this problem (assuming Amazon know what they are), we have to consider a few obvious traits of a typical web client.

Firstly, we have to remember that as a client I could be running any operating system, and conversely that Amazon could of course be running any server platform. Secondly, the application I use to access the data, if it existed, could be written in any number of programming languages. We therefore need to ensure the communication protocol used between the two sides is platform neutral and can work in any language. COM and CORBA (CORBA is, in very simple terms, the OMG's counterpart to COM/DCOM, most often seen on Unix) won't cut it, as they don't work well together. Worse still, they are both complicated, especially so when firewalls and multiple languages are involved. Both COM and CORBA can work with firewalls, but most people never get that far.

Another obvious requirement is that whatever we do must be fairly secure. Amazon obviously don't want to expose functions for deleting or updating book information to Joe Public. If we stick to using HTTP as the transport protocol, we can safely assume that whatever security they use for HTML pages would suffice for anything else we decided to transport over HTTP. Finally, and probably most importantly, we don't want lots and lots of committees to have to agree on new standards or extensions to protocols like HTTP to implement the functions we need. That approach would be too slow and painful, especially if you consider how many clients and servers there are today on the net!

As it turns out, the solution we need for Amazon to expose the information we want, in a fashion that can be easily consumed, is very simple: XML. To be more precise, what we need are XML pages, or XML over HTTP.

When you surf to a URL, what is typically returned is an HTML document. However, a web server can equally well return XML. So if I typed in the following URL:

https://www.amazon.com/querybooks.asp?publisher=wrox&author=me

the server could return an XML document such as:

<Books>
  <Book>
    <Title>ProXML</Title>
    <Ranking>81</Ranking>
  </Book>
  ..
</Books>

The XML output can be generated just like any HTML page using dynamic pages, maybe by using ASP and ADO, and possibly a few XML tools like MSXML. As this solution uses existing technologies like HTTP and XML, all of the existing web infrastructure will work as is, since all we have done is return an XML document rather than HTML. That approach doesn't break anything and doesn't pose any great security risks. As XML is easy to parse, I can now easily write an application to load this file and process it. The returned XML files don't contain any interactive or visual elements such as graphics or hyperlinks to other pages, so the format doesn't need to change if Amazon decide to redesign their site. XML is of course extensible, so Amazon can happily add new elements without breaking my application.
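As a rough illustration of how little client code this takes, here is a minimal Python sketch using only the standard library. It assumes the hypothetical querybooks.asp endpoint and the Books/Book/Title/Ranking layout from the example; no such service actually exists, so the sketch parses a canned response rather than making a network call. The function names are my own.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint from the example URL; it is not a real service.
BASE_URL = "https://www.amazon.com/querybooks.asp"

# Canned response in the example's format, standing in for the HTTP reply.
SAMPLE_RESPONSE = """
<Books>
  <Book>
    <Title>ProXML</Title>
    <Ranking>81</Ranking>
  </Book>
</Books>
"""


def build_query_url(publisher, author):
    """Build the request URL with the query-string parameters."""
    return BASE_URL + "?" + urlencode({"publisher": publisher, "author": author})


def parse_rankings(xml_text):
    """Map each book title to its ranking, ignoring any other elements."""
    root = ET.fromstring(xml_text.strip())
    rankings = {}
    for book in root.findall("Book"):
        title = book.findtext("Title")
        ranking = book.findtext("Ranking")
        if title is not None and ranking is not None:
            rankings[title] = int(ranking)
    return rankings


print(build_query_url("wrox", "me"))
print(parse_rankings(SAMPLE_RESPONSE))
```

Because the parser asks only for the elements it cares about, an extra element added to each Book later on would simply be ignored rather than breaking the client.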

What we've achieved by using XML over HTTP is the creation of a web service. Using this simple technique, sites like Amazon can expose all sorts of functions to clients. Furthermore, other sites can easily consume this data and expose it on their own sites. For example, Wrox could automatically display the Amazon ranking and reviews on their site. For this solution to work, all we need is a way for Amazon to describe the web services they expose, and the format of the XML documents we can expect to receive as output. The former could be done using plain old HTML, the latter using XML schema or plain old HTML.

What I've defined as a web service here is what I call a first generation web service. XML is returned from the server to the client, but the client doesn't use XML to talk to the server. In a 2nd generation web service, XML is also used by the client to talk to the server. This gives the client more flexibility, because the input data passed to the web service can be arbitrarily complex given that XML is hierarchical. By using XML as the request format, we can now consider more advanced web services, potentially object based. This 2nd generation web service is the arena in which SOAP (Simple Object Access Protocol) lives.

SOAP describes the format of an XML payload passed over HTTP between two XML endpoints. An XML request is sent, and an XML response is returned. Effectively SOAP is an XML Remote Procedure Call protocol, and indeed has roots in XML-RPC (see https://www.xmlrpc.com/ ).
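To give a feel for what an XML request looks like, here is a small Python sketch that uses the standard library's XML-RPC support to encode a call as XML and decode it again. Note that this is XML-RPC, not SOAP (a SOAP envelope is laid out differently), and the method name GetAllRankingForBook and its parameters are hypothetical, echoing the Amazon example rather than any real API.

```python
import xmlrpc.client


def encode_request(method_name, query):
    """Serialise a method call with one structured parameter as XML-RPC."""
    # params must be a tuple; here it holds a single dict (an XML-RPC struct).
    return xmlrpc.client.dumps((query,), methodname=method_name)


def decode_request(request_xml):
    """Recover the method name and its first parameter from the XML payload."""
    params, method_name = xmlrpc.client.loads(request_xml)
    return method_name, params[0]


# Hypothetical method and arguments, for illustration only.
request_xml = encode_request(
    "GetAllRankingForBook",
    {"publisher": "Wrox Press", "author": "ME"},
)
print(request_xml)                  # the XML body a client would POST over HTTP
print(decode_request(request_xml))  # what the server side would recover
```

The structured parameter is the interesting part: because the request itself is XML, the client can pass nested data that would be awkward to squeeze into a query string.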

To Be Continued
