IDevResource.com - What COM is all about

What COM is all about

Level of Difficulty: Beginner
Languages covered: All
Pre-required reading: None

As a software developer you cannot avoid COM these days. It is everywhere. DirectDraw is based on COM; Windows scripting is based on COM; ASP is based on COM; and when you insert a Visio drawing into a Word document you are using COM. (Heck, I am writing this in Word and just dragged a sentence to another place in the document, and that action itself uses COM.) You may have heard about a newfangled thing called Windows 2000? Well, if you want to write an application that will be internet ready, using distributed transactions and message queuing, then you'll use COM.

So the chips are down: if you are developing for any of the Windows platforms and you want to get on, you must know about COM. Admit it: sometime during your next appraisal you will be asked about the future of software development, and Windows 2000 and COM+ will be on everyone's lips, so now is a good opportunity to find out what those three letters really stand for.

OK. COM stands for the Component Object Model. The plus in COM+ means that the version of COM in Windows 2000 will be the newest version available. In fact, people are already calling it COM+ 1.0 expecting COM+ 2.0 to appear sometime in the future - it would be nice if Microsoft rationalised their versioning. You may also have heard about DCOM or Distributed COM. People talk about DCOM objects as if they are something special, something exciting, as if they add something to COM objects. They do not. A DCOM object is a COM object. Period. I will come back to that term later in this article.

You may also have heard about OLE objects and ActiveX objects. They are simply COM objects. The term OLE is not as trendy as it was a few years ago, but basically COM first appeared as OLE 2.0 on 16-bit Windows 3.1, where it was used to allow you to take objects like Excel tables and link and embed them in Word documents (and vice versa). Microsoft played around with the term for a while, and at one point even tried to convince people that OLE was not an acronym at all. They said that OLE was a word in its own right, and they even took pains to say that it was not just Ole and that it was such an important technology THAT IT SHOULD ONLY BE TALKED ABOUT IN UPPERCASE!

Tosh. Everyone knew that OLE stood for Object Linking and Embedding. These days people generally use the term when talking about the desktop technologies of compound documents (the generic term for documents that can contain objects from other applications) or the OLE controls that you see used on VB forms (the scriptable 'widgets' that are used to show grids or calendars).

What about ActiveX? Well, a few years ago Microsoft suddenly woke up to the fact that the internet was a good thing and so they needed to come up with some gee-whiz term to encompass their internet technologies. The people in shiny suits in the marketing department came up with the term ActiveX. If you are the sort of person that wears a shiny suit with a kipper tie and works in marketing, then go ahead, wave your arms around and talk about ActiveX objects. They are just COM objects, and the quiet, knowing developers that work with you will understand that it is just your way of drawing attention to yourself.

Wind the clock a bit further forward and you come to MTS components and to the more up to date term of COM+ components. Again, these are just COM objects, albeit COM objects that 'live' in a special environment. As the name suggests, a component that runs under Microsoft Transaction Server can be run under a transaction. This is a great boon for the bankers among you, but it is also vitally important for any distributed application.

A transaction groups together the COM objects that are involved in performing some task. These may live in many places on the network, and if one object throws an error you will want to ensure that the other objects in the transaction know about this. (Imagine a relay race where the second runner gives up and goes home; someone has to tell the third and fourth runners, otherwise they will remain standing on the track waiting for the baton.) MTS transactions ensure that either all of the work in the transaction is performed correctly or, if just one object reports an error, all of the work in the transaction is undone.

COM+ components take this a step further by allowing the component to have access to other facilities, including Microsoft Message Queue Server. But again, the actual object itself is a COM object. You write it the same way as you would write any other COM object; it just has access to different facilities.
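The all-or-nothing behaviour can be sketched in a few lines of Python. This is only a toy model to show the idea: the Step class, run_transaction and the "debit" and "credit" names are made up for illustration and have nothing to do with the real MTS programming interface.

```python
class Step:
    """A hypothetical participant in a transaction (not an MTS type)."""

    def __init__(self, name, fails=False):
        self.name = name
        self.fails = fails
        self.done = False

    def do(self):
        if self.fails:
            raise RuntimeError(f"{self.name} failed")
        self.done = True

    def undo(self):
        self.done = False


def run_transaction(steps):
    """Run every step; if one raises, undo the completed steps in reverse order."""
    completed = []
    try:
        for step in steps:
            step.do()
            completed.append(step)
    except RuntimeError:
        for step in reversed(completed):
            step.undo()
        return False
    return True


debit = Step("debit")
credit = Step("credit", fails=True)   # the runner who goes home
committed = run_transaction([debit, credit])
print(committed, debit.done, credit.done)
```

Because the second step fails, the work already done by the first step is rolled back, so neither step ends up marked as done.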

So, I have established that in spite of all the different terms that have appeared over the last few years, all of the object models that have appeared from Microsoft have been COM. But what really is COM? In the most simplistic terms COM is merely a code maintenance utility. Windows uses dynamic link libraries (DLLs) to dynamically load code as and when it is needed. This way code can be shared between several applications (they just load the same DLL) and your application can be efficient with memory because when it no longer needs a DLL it can unload it, and Windows will remove it from memory.

Although this sounds easy it is fraught with problems, which can be summarised as three:

You have to locate and load the DLL
You have to obtain the code in the DLL
You have to unload the DLL

Windows provides a function, called LoadLibrary, to load DLLs. This function is flexible but it is this flexibility that causes the problems. You have to tell LoadLibrary where to find the DLL and you can either pass a full path or just give the name of the DLL. In the second case Windows uses a search algorithm to find the DLL. When it finds a DLL with the name you suggest it will load it.

Generally, you should not hard code a path in your code because you cannot guarantee that the path will exist on machines other than your own. This means that most developers allowed Windows to locate the DLL. The difficulty then was making sure that your DLL was picked up before any other with the same name. Most developers took one of two approaches. Either they put all of their DLLs in the Windows System folder (alongside the system DLLs) or they put them in the application's current folder.

The immediate problem with the first was that the System folder got filled with lots of DLLs, many of which were private to specific applications. Often, if an application was uninstalled its DLLs would remain, and your free disk space got gradually smaller and smaller as more and more DLLs ate it.

More concerning was that if an application needed a specific version of one of the system DLLs then installing it in the System folder meant that it would overwrite an existing DLL, possibly preventing other applications from working. The classic case of this was the 3D effect controls in CTL3D32.dll which appeared to be updated weekly by Microsoft and every time you installed a new application it would install a different version than the one required by your existing, working applications.

The other solution was to put the DLLs in an application's folder. This meant that these DLLs were essentially private to the application and this led to the problem of multiple versions of the same DLL in lots of folders on a disk. Moreover, if these folders were in the search path then this meant that LoadLibrary called by another application could pick up the wrong version.
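To see why a name-only search is fragile, here is a small Python simulation. It does not reproduce the real Windows search algorithm; find_library, the folder names and widgets.dll are all invented for the example, and the "DLLs" are just text files holding a version string.

```python
import os
import tempfile


def find_library(name, search_path):
    """Return the first file called `name` found in `search_path`, or None."""
    for folder in search_path:
        candidate = os.path.join(folder, name)
        if os.path.isfile(candidate):
            return candidate
    return None


with tempfile.TemporaryDirectory() as root:
    system_dir = os.path.join(root, "System")
    app_dir = os.path.join(root, "MyApp")
    # Two different versions of a file with the same name.
    for folder, version in ((system_dir, "1.0"), (app_dir, "2.0")):
        os.makedirs(folder)
        with open(os.path.join(folder, "widgets.dll"), "w") as f:
            f.write(version)

    def version_found(search_path):
        with open(find_library("widgets.dll", search_path)) as f:
            return f.read()

    app_first = version_found([app_dir, system_dir])
    system_first = version_found([system_dir, app_dir])

print(app_first, system_first)
```

The same name resolves to a different version depending only on the order of the folders searched, which is exactly the situation an application cannot control on someone else's machine.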

This problem with Windows DLLs has often and aptly been described as DLL Hell.

Once you had loaded the correct version of the DLL, your next task would be to obtain the code in the DLL. The code is obtained as exported functions and you obtained these by calling a function called GetProcAddress. The problem with this function was that it was generic, and could return a pointer to any function. However, functions are not generic: different functions take different parameters and numbers of parameters and return different types of data. To get access to a function your code needed a pointer to that specific type of function.

Since GetProcAddress returned an untyped pointer that could be to any function, you had to convert it to one that you could use to call the function you required. C programmers call this casting. This process was tedious, and it only worked well if the functions were C functions and were called in a specific way. As a consequence it was error prone. If you got the function's description slightly wrong then your code would come crashing down. However, although it is bad in Win32, in Windows 3.1 it was even worse, but I won't go into that because I want to forget those bad old days.

DLL functions can be exported out of a DLL using a name, which offers some possibility of uniqueness, or they can also be exported using a number. Whichever is used, there is the possibility of the wrong function being obtained from the DLL and because C-style casting respects no pointer (ask a C programmer about the fun involved in debugging C casts) it means that code can happily call a function pointer wrongly cast to another. This often resulted in catastrophic failure where the problem showed itself, not by a mere error code, but by the entire application dying.

Once you have loaded a DLL and got access to its code then there is one more issue to face. DLLs take up memory and so once you have finished using one you must unload it. This allows the DLL to release any resources it may have loaded and free up the memory the DLL took up. Failing to unload DLLs results in applications using more and more memory.

So how does COM help? COM DLLs are called COM servers, and before an application can use one the DLL must be registered with the system, identifying the code in the server using unique IDs. This is a once-only registration. After that, applications can use code in the DLL by referring to the unique IDs; they use neither the DLL name nor its path. COM will locate the right DLL using a specified ID and load it for the application; it will go through all the tedious process of getting access to the code in the DLL and it will unload the DLL when you no longer need it. Real dynamic linking without the pain. Wonderful!
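As a rough picture of that bookkeeping, here is a toy loader in Python. Everything in it is an assumption made for illustration: the ServerLoader class, the made-up ID and the path are not part of COM, and the simple usage count is only a stand-in for however COM actually decides that a DLL is no longer needed.

```python
import uuid


class ServerLoader:
    """Toy model: find a server by unique ID, load it once, unload when unused."""

    def __init__(self):
        self.registry = {}   # unique ID -> path, written once at registration
        self.loaded = {}     # path -> number of clients currently using it

    def register(self, class_id, path):
        self.registry[class_id] = path

    def acquire(self, class_id):
        path = self.registry[class_id]        # the caller never supplies a path
        self.loaded[path] = self.loaded.get(path, 0) + 1
        return path

    def release(self, class_id):
        path = self.registry[class_id]
        self.loaded[path] -= 1
        if self.loaded[path] == 0:
            del self.loaded[path]             # last user gone: "unload" the DLL


CALENDAR_ID = uuid.UUID("11111111-2222-3333-4444-555555555555")  # made-up ID
CALENDAR_PATH = r"C:\Anywhere\calendar.dll"                      # made-up path

loader = ServerLoader()
loader.register(CALENDAR_ID, CALENDAR_PATH)   # the once-only registration

loader.acquire(CALENDAR_ID)
loader.acquire(CALENDAR_ID)
loader.release(CALENDAR_ID)
still_loaded = CALENDAR_PATH in loader.loaded      # one user remains
loader.release(CALENDAR_ID)
unloaded = CALENDAR_PATH not in loader.loaded      # nobody left, so it is gone
print(still_loaded, unloaded)
```

The point to notice is that the client code only ever mentions CALENDAR_ID; the path appears once, at registration.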

Since you register the DLL's location with COM it means that the DLL can be anywhere on the local machine, and since the DLL name is unimportant you can use any name that you like. But this is a rather rosy view, so what happens if one DLL overwrites another one? Well, the code in a DLL knows about the unique IDs that you have registered and when COM loads the DLL it asks the DLL for the required code. If the DLL does not recognise the ID then it returns an error, which COM will pass back to the application. This means that the application will not have the facilities that it wanted and possibly it will signal an error to the user. However, there is no way that incorrect code can be loaded and run by accident, so the possibility of a catastrophic failure is reduced.
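The overwritten-DLL case can be modelled like this. Again it is a hedged sketch rather than the real mechanism: ToyServer, create_instance, the error string and the IDs are hypothetical names chosen for the example.

```python
CLASS_NOT_AVAILABLE = "class not available"  # stand-in for a real COM error code


class ToyServer:
    """Toy DLL that knows which unique IDs it implements."""

    def __init__(self, factories):
        self.factories = factories   # unique ID -> callable that builds an object

    def get_class_object(self, class_id):
        factory = self.factories.get(class_id)
        if factory is None:
            return None, CLASS_NOT_AVAILABLE   # refuse rather than guess
        return factory(), None


def create_instance(server, class_id):
    """Ask the server for the code behind class_id and pass any error back."""
    obj, error = server.get_class_object(class_id)
    if error is not None:
        raise LookupError(f"{class_id}: {error}")
    return obj


GRID_ID = "{hypothetical-grid-id}"

right_dll = ToyServer({GRID_ID: lambda: "grid control"})
wrong_dll = ToyServer({"{some-other-id}": lambda: "unrelated code"})  # the overwriter

print(create_instance(right_dll, GRID_ID))
try:
    create_instance(wrong_dll, GRID_ID)
except LookupError as err:
    print("refused:", err)
```

The application that ends up talking to the wrong DLL gets a clean error it can report, instead of silently running code it never asked for.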

When you register a DLL with COM you can do so in such a way that allows the code in the DLL to be used by applications on another machine. People often call this DCOM, but strictly speaking they are not correct. The COM specification has always had the facility for remotely accessing code; however, it was only when the network protocol became available in NT4 that this facility was realised.

This protocol is correctly called the DCOM Wire Protocol and, incidentally, it uses Microsoft RPC (Remote Procedure Call). This technology has existed on Windows since the first version of NT, and it has existed on other machines in a form called DCE RPC (Distributed Computing Environment RPC, designed by the Open Group) for even longer. Because it has been around for so long it has been well tested and hence it is very reliable. Do not believe the doom mongers when they say that DCOM has not been tested; RPC has been tested far more than any of the other distribution technologies currently available.

To register your DLL to allow it to be accessed remotely you need to specify a surrogate process. There are several ways you can do this, but the most convenient is to use MTS on NT4 and Windows 9x, or COM+ on Windows 2000. As I mentioned earlier, these objects are still COM objects; it is just that MTS (or COM+) wraps the object and provides it with some extra facilities.

For both MTS and COM+ you do not have to worry about the registration required, because both supply an application that will do it for you. Indeed, with a single DLL you can use these applications to specify that the code can be accessed remotely, by other processes on the local machine, or loaded within the process that calls it. The DLL is written in the same way in all cases and the application that uses the code calls it in the same way irrespective of where the code is located. This is called location transparency and is a very powerful feature of COM.
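Location transparency is easier to appreciate with a small model. In the Python sketch below, the Calculator class, the RemoteProxy class and the use of JSON as the "wire format" are all assumptions made for illustration; real COM uses its own proxies and the DCOM Wire Protocol, not anything shown here.

```python
import json


class Calculator:
    """The component: written once, with no knowledge of where it will run."""

    def add(self, a, b):
        return a + b


class RemoteProxy:
    """Pretends the target is elsewhere by passing every call through a message."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, method):
        def call(*args):
            request = json.dumps({"method": method, "args": args})   # "send"
            message = json.loads(request)                            # "receive"
            result = getattr(self._target, message["method"])(*message["args"])
            return json.loads(json.dumps(result))                    # reply
        return call


def total(calc):
    """Client code: identical whether calc is local or behind a proxy."""
    return calc.add(2, 3)


local_result = total(Calculator())
remote_result = total(RemoteProxy(Calculator()))
print(local_result, remote_result)
```

Neither Calculator nor total changes between the two calls; only the thing handed to the client differs, which is the essence of the feature.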

Summary

So, what is COM? COM is a facility to allow you to provide code in a DLL and get the system to manage that DLL. COM will ensure that the DLL is loaded only when needed and will unload it when your application no longer needs it. Further, COM will locate the DLL, on the local machine, or on another machine, and will load it in an appropriate process.

What most people see when they use COM objects is not COM but the facilities provided by the DLL. For example, if you use a calendar control then COM will find it for you and load it. You then have to write values to the control or call methods to get it to work for you. If you find those methods difficult to use, it is not COM's fault because that code is not COM: COM is all about loading DLLs and accessing code rather than the actual implementation of the code.

In this article I have deliberately avoided talking about what COM objects are or how to access them. That is the subject of my next article, Components and Interface Programming.

What do you think of this article?

Have your say about the article. You can make your point about the article by mailing [email protected]. Richard will be checking out the discussion and responding to your points.

You can also write a review. We will publish the best ones here on this article. Send your review to [email protected].