C#

.NET : Processes, Threads and Multi-threading


So I’ve been digging around in my Evernote to publish something this evening that might be of actual use.  My notes tend to be lots of small tidbits that I’ll tag into Evernote whilst I’m working on projects, and whilst they’re great on their own as golden nuggets of information that I’ll always have access to, they fail to be useful for a blog article.

I did come across some notes from a couple of years ago around threading, specifically threading in the .NET framework…. So as I’ve little time this week to dedicate to writing some new content, I thought I’d cheat a bit and upload these notes, which should go some way in introducing you to threading and processes on the Windows / .NET platform. Apologies for spelling errors in advance as well as the poor formatting of this article. I’ve literally not got the time to make it look any prettier.

So, what is a Windows process?

An application is made up of data and instructions. A process is an instance of a running application. It has its own ‘process address space’. So a process is a boundary for threads and data.
A .NET managed process can have a subdivision, called an AppDomain.

Right, so what is a thread?

A thread is an independent path of execution of instructions within a process.
For unmanaged threads (that is, non-.NET), threads can access any part of the process address space.
For managed threads (.NET CLR managed), threads only have access to the AppDomain within the process, not the entire process. This is more secure.

A program with multiple simultaneous paths of execution (concurrent threads) is said to be ‘multi-threaded’. Imagine some string (a thread of string) that goes in and out of methods (needle heads) from a single starting point (main). That string can break off (split) into another piece of string that goes in and out of other methods (needle heads) at the very same time as its sibling thread.

When a process has no more active threads (i.e. they’re all in a dead state because all the instructions within the thread have been processed by the CPU already), the process exits (windows ends it).

So think about when you manually end a process via Task Manager: you are forcing the currently executing threads, and those scheduled to execute, into a ‘dead’ state. Thus the process is killed as it has no more code instructions to execute.
Once a thread is started, the IsAlive property is set to true. Checking this will confirm whether a thread is active or not.
Each thread created by the CLR is assigned its own memory ‘stack’ so that local variables are kept separate.

Thread Scheduling

A CPU, although it appears to process a billion things at the same time, can only process a single instruction at a time. The order in which instructions are processed by the CPU is determined by thread priority. If a thread has a high priority, the CPU will execute the instructions in sequential order inside that thread before any other thread of a lower priority. This requires that thread execution is scheduled, according to priority.  If threads have the same priority, however, an equal amount of time is dedicated to each (through time slicing, usually 20ms per thread).  This might leave low priority threads out in the cold if the CPU is being highly utilized, so to avoid starving low priority threads entirely, Windows dedicates a slice of time to processing their instructions, but that time is a lot less than that given to the higher priority threads.  The .NET CLR actually lets the Windows operating system thread scheduler take care of managing all the time slicing for threads.

Windows uses pre-emptive scheduling. All that means is that when a thread is scheduled to execute on the CPU, Windows can (if it needs to) unschedule that thread.
Other operating systems may use non-pre-emptive scheduling, meaning the OS cannot unschedule a thread until it has finished.

Thread States

Multi-threading, in terms of programming, is the co-ordination of multiple threads within the same application, and the management of those threads across the different thread states.

A thread can be in…
  • Ready state – The thread tells the OS that it is ready to be scheduled. Even if a thread is resumed, it must go to ‘Ready’ state to tell the OS that it is ready to be put in the queue for the CPU.
  • Running state – Is currently using the CPU to execute its instructions.
  • Dead state – The CPU has completed the execution of instructions within the thread.
  • Sleep state – The thread goes to sleep for a period of time. On waking, it is put in Ready state so it can be scheduled for continued execution.
  • Suspended state – The thread has stopped. It can suspend itself or be suspended by another thread, but it cannot resume itself. A thread can be suspended indefinitely.
  • Blocked state – The thread is held up by the execution of another thread within the same memory space. Once the blocking thread goes into dead state (completes), the blocked thread will resume.
  • Waiting state – A thread will release its resources and wait to be moved into a ready state.

Why use multiple threads?

Using multiple threads allows your applications to remain responsive to the end user whilst doing background work. For example, you may have a Windows application that requires the user to continue working in the UI whilst an I/O operation is performed in the background (loading data from a network connection into the process address space, for example). Using multi-threading also gives you control over which parts of your applications (which threads) get priority CPU processing. Keeping the user happy whilst performing non-critical operations on background threads can make or break an application. These less critical, low priority threads are usually called ‘Background’ or ‘Worker’ threads.

Example

If you create a simple Windows form and drag another window over it, your form will repaint itself to the Windows UI.

If you then create a button on that form which, when clicked, puts the thread to sleep for 5 seconds (5000ms), the window you drag over your form would stay visible on the form even after you dragged it off. The reason is that the UI thread was held up by the 5 second sleep, so the form had to wait to repaint itself to the screen until the thread resumed.

Implementing multi-threading, i.e. performing the sleep (or any long-running work) on a background thread and leaving the first thread free to repaint the window, would keep users happy.
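As a rough sketch (assuming a standard WinForms project with two buttons wired up in the designer; the form and handler names are just illustrative), the difference looks something like this:

using System;
using System.Threading;
using System.Windows.Forms;

public partial class MainForm : Form
{
    //Blocking version: the UI thread sleeps, so the form cannot repaint for 5 seconds.
    private void blockingButton_Click(object sender, EventArgs e)
    {
        Thread.Sleep(5000);
    }

    //Multi-threaded version: the sleep (a stand-in for real work) happens on a background
    //thread, leaving the UI thread free to repaint the form.
    private void backgroundButton_Click(object sender, EventArgs e)
    {
        Thread worker = new Thread(() => Thread.Sleep(5000));
        worker.IsBackground = true;
        worker.Start();
    }
}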

Multi-threading on single core / single processor 

Making your app multi-threaded can affect performance on machines that have a single CPU.  The reason for this is that the more threads you use, the more time slicing the CPU has to perform to ensure all threads get equal time being processed.  The overhead involved in the scheduler switching between multiple threads to allow for processing time slices extends the processing time. There is additional ‘scheduler admin’ involved.

If you have a multi-core system however, let’s say 4 CPU cores, this becomes less of a problem, because threads can be processed physically at the same time across the CPU cores, so far less switching between threads is involved.


Using multiple threads makes code a little harder to read and testing / debugging becomes more difficult because threads could be running at the same time, so they’re hard to monitor.

CLR Threads

A thread in the CLR is represented as a System.Threading.Thread object.  When you start an application, the entry point of the application is the start of a single thread that all new applications will have. To start running code on new threads, you must create a new instance of the System.Threading.Thread object, passing in the address of the method that should be the first point of code execution. This in turn tells the CLR to create a new thread within the process space.

To access the properties of the currently executing thread, you can use the ‘CurrentThread’ static property of the thread class: Thread.CurrentThread.property
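A quick sketch (assuming this runs on the application’s main thread):

Thread.CurrentThread.Name = "Main";               //Name the currently executing thread
Console.WriteLine(Thread.CurrentThread.Name);     //"Main"
Console.WriteLine(Thread.CurrentThread.IsAlive);  //True
Console.WriteLine(Thread.CurrentThread.Priority); //Normal (the default)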

You can point a starting thread at both a method that has parameters or a parameterless method. Below are examples of how to start these 2 method types:

  • ThreadStart – new Thread(Method);
  • ParameterizedThreadStart – new Thread(Method); where Method takes a single object parameter (e.g. void Method(object stateArg))
ThreadStart example

Thread backgroundThread = new Thread(ThisMethodWillExecuteOnSecondThread);
backgroundThread.Name = "A name is useful for identifying the thread during debugging!";
backgroundThread.Priority = ThreadPriority.Lowest;
backgroundThread.Start(); //Thread put in ready state and waits for the scheduler to put it into a running state.

public void ThisMethodWillExecuteOnSecondThread()
{
    //Do Something
}

ParameterizedThreadStart example

Thread background = new Thread(ThisMethodWillExecuteOnSecondThread);
background.Start("string");

public void ThisMethodWillExecuteOnSecondThread(object stateArg)
{
    string value = stateArg as string;
    //Do Something
}

Thread lifetime

When the thread is started, the life of the thread is dependent on a few things:
  • When the method called at the start point of the thread returns (completes).
  • When the thread object has its Interrupt or Abort methods invoked (which essentially inject an exception into the thread) from another thread that has handled an outside exception (an asynchronous exception).
  • When an unhandled exception occurs within the thread (a synchronous exception).
Synchronous exception = from within.
Asynchronous exception = from outside.

Thread Shutdown

Whilst threads will end in the scenarios listed above, you may wish to control when a thread ends and have the parent thread regain control before it leaves the current method.
The example below shows the Main() thread starting off a secondary thread to take care of looping. It uses a volatile field (whose value will always be re-read rather than cached) to tell the secondary thread to finish its looping.
The main thread then tells the secondary thread to rejoin the main thread’s execution by calling Join(), which blocks the main thread until the secondary thread completes.
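The original code snippet didn’t survive my notes, so here’s a minimal sketch of the pattern described above (the class, field and method names are just illustrative):

using System;
using System.Threading;

class ThreadShutdownExample
{
    //volatile ensures the worker always reads the latest value rather than a cached copy.
    private static volatile bool keepLooping = true;

    static void Main()
    {
        Thread worker = new Thread(DoWork);
        worker.Start();

        Thread.Sleep(2000);  //Main thread gets on with other work for a couple of seconds...

        keepLooping = false; //Tell the worker to finish its loop.
        worker.Join();       //Block the main thread until the worker has completed.

        Console.WriteLine("Worker finished; main thread regains control.");
    }

    static void DoWork()
    {
        while (keepLooping)
        {
            //Repeated background work goes here.
            Thread.Sleep(100);
        }
    }
}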

Background vs Foreground Threads

Foreground Thread – A foreground thread, if still running, will keep the application alive until the thread ends. A foreground thread has its IsBackground property set to false (the default value).
Background Thread – A background thread will be terminated when there are no more foreground threads executing. They are seen as unimportant, throwaway threads (IsBackground = true).

Thread Pools

Threads in the CLR are a pooled resource. That is, they can be borrowed from a pool of available threads, used, and then returned to the pool.
Threads in the pool automatically have their IsBackground property set to true, meaning they are not seen as important by the CLR; when the foreground threads end, a pool thread will end whether its work is complete or not. Threads in the pool work on a FIFO queue basis: the first work item added to the queue is the first to be given a thread from the pool, and when that thread finishes executing it is returned to the pool. Thread pool threads are useful for non-critical background checking / monitoring that does not need to hold up the application.

//Creating a new thread from the pool

ThreadPool.QueueUserWorkItem(MethodName, methodArgument); //This will be destroyed if the foreground thread ends.


Entity Framework : Unable to load the specified metadata resource.


So I’ve been working on a code base for a financial system today and have performed a good bit of code re-factoring involving the re-organization of classes into and out of my namespace structure. After deploying my code to the test server, I ran into a problem with a few of my domain model objects that load their state from the database using an Entity Framework model (edmx). The problem, as per the title of this post, threw me a little as I hadn’t changed the properties of my edmx, nor had I changed the connection string.

I should add at this point that I dynamically pass my entity connection string to my entity model constructor at runtime so that I have more control.  In this instance, the code I’m using to build that connection string had not changed either, to my knowledge, and the model names had remained the same.  After a few minutes of head scratching, it became clear that my connection string had become invalid because I had renamed the namespace that housed my entity models, and I had to update the code that took care of building the entity connection string.

Here’s my code. The namespace my models used to be in was called ‘DataModels’. Then as I added some more DAL classes to that namespace, I decided to call it just ‘Data’. That needed updating:

[Image: the code that builds my entity connection string]
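For reference, the gist of that code is below; this is a minimal sketch rather than my exact production code, and the provider connection string is illustrative:

using System.Data.EntityClient; //EntityConnectionStringBuilder lives here (System.Data.Entity assembly)

public static string BuildEntityConnectionString(string providerConnectionString)
{
    EntityConnectionStringBuilder builder = new EntityConnectionStringBuilder();
    builder.Provider = "System.Data.SqlClient";
    builder.ProviderConnectionString = providerConnectionString;

    //This is the part that had to change when the namespace moved from 'DataModels' to 'Data'.
    builder.Metadata = "res://*/Data.MetastormModel.csdl|"
                     + "res://*/Data.MetastormModel.ssdl|"
                     + "res://*/Data.MetastormModel.msl";

    return builder.ToString();
}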

Hopefully my moment of ‘numptiness’ can benefit you if this exception is thrown in your code after a re-factoring exercise. It’s very likely a change in the connection string / connection string’s metadata. The 3 resource addresses in the connection’s metadata relate to the storage model (ssdl), conceptual model (csdl) and mappings (msl), although they are not terribly clear in my image above, so here’s my model’s metadata in full:

res://*/Data.MetastormModel.csdl|res://*/Data.MetastormModel.ssdl|res://*/Data.MetastormModel.msl

Entity Framework : A short introduction


Introduction

The Entity Framework is an object relational mapping tool that is well integrated into the .NET development environment. It allows for the conceptual modelling of a physical data store that can be used with your applications. Data sources are represented as objects, so it’s much easier to incorporate data entities into your logic.

The problem with older data access technologies is that they did nothing to bridge the gap between the relational, set-based format of the data and the strongly typed, object oriented logic that needs to access that data. Supporting large data reads, for example, required considerable code written into the logic of the application to load the data in and persist it back to the data store; this affected performance, caused issues with the data type differences and generally contributed to poor design. The Entity Framework is a bridge between the data tier and the application logic.

Entities are represented as partial classes; this is to allow you to extend your entity by creating another partial class with the same class name.

Differences between object orientated logic and relational data stores

Data Type Differences

– Nvarchar(20) has a limited size, but a .NET string allows up to 2GB of data
– Sql_variant only loosely maps to a .NET object
– Binary data in the database doesn’t quite map to a byte array in .NET
– Date and time formats can become an issue when mapping between the DB and .NET

Relationship Differences

– The database uses primary / foreign key relationships. These relationships are stored in a master database table.
– .NET uses object references (linking to a parent object from the child objects in a child object collection)

Inheritance Differences

– .NET supports single object inheritance (have a base object with shared features and then derive from that base object when creating similar objects)
– Makes code simpler and easier to debug
– Relational databases do not support inheritance; tables cannot be inherited by other tables
– Modelling a database that attempts to emulate some kind of table hierarchy (inheritance) can cause issues and introduce very complex entity models in .NET

Identity / Equality Differences

– In the database, the primary key constraint (unique column) is the identifier of a row object
– When comparing objects in .NET that have loaded the same data from the same table, those objects are NOT equal, even though they hold the same data; by default a reference type is only EQUAL to another when both variables REFERENCE the same object (equality by reference). So loading the same row from the database twice will still not produce ‘equal’ objects in the logic (see the short example below).
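A quick sketch of that last point, using a made-up Order class rather than a real entity:

using System;

class Order
{
    public int Id { get; set; }
    public string CustomerName { get; set; }
}

class EqualityExample
{
    static void Main()
    {
        //Imagine both of these were populated from the same database row.
        Order first = new Order { Id = 1, CustomerName = "Acme Ltd" };
        Order second = new Order { Id = 1, CustomerName = "Acme Ltd" };

        Console.WriteLine(first == second);                      //False - different objects on the heap
        Console.WriteLine(first.Equals(second));                 //False - default Equals() is reference based
        Console.WriteLine(object.ReferenceEquals(first, first)); //True - same reference
    }
}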

Handling these differences : Object Relational Mapping

You can write a lot of complex code to try and iron out these object/relational differences when working with data from your logic, or you can use a framework such as the Entity Framework to act as an intermediate layer that focuses on resolving the issues highlighted. This intermediary is called an Object Relational Mapper. That means the relational database object is MAPPED to an OOP object, and that mapping solves the differences highlighted. It is a back and forth mapping, meaning your object-based changes can be written back to the database.

Microsoft’s answer to handle these differences

Microsoft’s primary ORM tool, now supported within the framework, is the Entity Framework. This is Microsoft’s primary data access strategy, released in .NET Framework 3.5. Microsoft are investing a lot of cash into the development of the Entity Framework, so it is the recommended data access method.
LINQ to SQL is a similar technology developed by Microsoft and also benefits from the LINQ syntax, but Microsoft decided to take Entity Framework forward and leave LINQ to SQL as is, with no further development planned.

Benefits of using ORM

– More productive. Less complex code writing required to manage the differences between objects and relational data
– Better application design. Adding complex code can complicate the design. Introducing a mapping layer, maintains the n-tier application stack
– Code can be re-used (the model can be saved off as its own project that is used by many applications or parts of the same application).
– Much more maintainable than custom code.
– Lots of different ORM tools (NHibernate was an older ORM tool that could be used with .NET, based on a port from the java Hibernate ORM)

Note that a small performance hit is incurred to execute the mappings, but the other benefits of using an ORM far outweigh the cons.

Generic Data Objects vs. Entity Objects

ADO.NET has long used generic data objects such as SqlConnection, SqlCommand, SqlDataAdapter, SqlDataReader, DataSet etc. Together these objects allow a means for getting data from a data store and the writing back of data changes to that data store. The DataAdapter object monitors the DataTable for changes and writes those changes back. Whilst these objects do a great job, there are a few issues with using generic data objects like these:

Generic Data Objects

– Tight coupling between application and database because the app code needs to know the structure of the data in the database. So changes to the database can cause problems and code updates may be needed.
– Older DataTables contain columns with loose typing and you would have to convert the column value to a strong type using the Convert() method (although typed data sets resolve this to an extent)
– Extracting related data (data tables in a data set) whilst possible introduced some complexities.
– Unnecessary looping of in-memory data is needed to find values (unless LINQ to DataSets is used)
– Generally more code needs to be written to extract bits of data at different times from DataSets.

Entity Objects

– The ORM takes care of the strong typing from relational DB type to object type. This means you don’t have to be concerned with running expensive conversions on the returned data.
– The compiler checks the types at compile time
– More productive to work with as you have to write less code in order to work with the objects. Intellisense provides a much quicker way of accessing the columns in your data objects when writing LINQ for example.
– You are programming against a model, rather than custom code specific to the application
– You’re no longer working directly with the data store in your application logic, just normal objects, therefore decoupling the two.
– Is a layer of data abstraction.

Entity Framework Benefits

So what are the benefits of using the Entity Framework? Plenty.

1) Its APIs are now fully integrated into Visual Studio, so they’re available to add to your projects
2) It uses LINQ as its query language, which again is integrated into the language
3) It’s independent of the data store
4) It abstracts the data layer from the logic layer

Note, you might not need to use the Entity Framework for more basic data requirements. Simple reads of data from a few simple tables will not require you to setup entity models. Entity Framework is useful when you are working with more than a few tables within an application.

Entity Data Model

The entity data model is the structure of the business data, but reshaped into objects usable by the programming language. It describes the structure of the entity data objects that make up the model and also the relationships between these entity objects.

It is made up of 3 main chunks of XML in a single data model file, with the extension .edmx:

– Conceptual model (the object orientated structure)
– Storage model (the physical database storage structure)
– Mapping data (the mappings between the two models above)

Creating a Model

1) Create a VS project targeted at .NET 3.5
2) Add a new folder to that project called ‘DataModels’ (good to keep your models separate from your code)
3) Add a new item to that folder, specifically an ADO.NET Data Entity Model, this will start the model wizard.

First you can select whether you want a blank model or to read from an existing databases schema, that’s what we’ll do. Next, you specify the connection string to use to connect to the data store. This then uses that connection string to build an entity framework connection string (contains the normal ADO.NET connection string, but also some other metadata). The ‘metastorm’ connection string I’m using here is pulled from the App.Config file in my project.

At the bottom of this screen, the MetastormEntities is not just the name of the connection string that will be saved to App.Config, it’s the name of the entity model, so should reflect the name of the database you are pulling your entities from.

Next, the database specified by the connection string is read and its schema is presented to you so you are able to select the database objects you wish to model. Tables, Views or Stored Procedures (incl. Functions):

I select 3 of my local tables, MRPOPS, MRPIPS and MRQUERY and click finish. This creates my .edmx model called MetastormEntities (as specified earlier):

By right clicking each entity you can view the mapping from the physical table to the conceptual entity, including the data type conversions. You can edit the names of the local properties that map to the database column names :

You may notice that I have renamed the entities as follows:

MRPOPS is now called Order (a singular entity) with an EntitySet name of Orders
MRPIPS is now called Invoice with an EntitySet name of Invoices
MRQUERY is now called Query with an EntitySet name of Queries

This will make working with the tables easier in code and leaves the actual table names intact.

Looking into the EDMX Model

As previously mentioned, the .edmx model is actually an XML file that is rendered graphically by Visual Studio to show the conceptual model. In order to view the 3 parts of the .edmx file (physical storage, conceptual model and mappings), right-click the model, select ‘Open With…’ and choose the XML Editor. This is a collapsed view of my model file that shows the 3 parts (with the edmx namespace):

The runtime here contains the metadata about the data, including the mappings.
The designer section is used by visual studio in order to display the entities properly graphically.

Looking into the Model Designer cs

The designer.cs file contains the partial classes that represent each entity as well as the model itself:

Using Our MetastormModel.edmx

In our project, add a new class and make sure we add a reference to System.Data.Entity in that class. Ensure that System.Data.Entity.dll has been added to your project references; it should have been added automatically, but it’s best to check. Because we have our class in the same project as our model, we do not need to copy our App.Config out into another project that might need our model.

We can now instantiate our model and start querying our tables (note the looping in the below example is unnecessary, but it illustrates how to access each order in the query results):

//Create an instance of the model
MetastormEntities metastorm = new MetastormEntities();

//Query the orders entity for doc generation errors
var SelectedOrders = metastorm.Orders.Where(p => p.txtDocGenResult.StartsWith("E"));

foreach (Order order in SelectedOrders)
{
    return "Order Folder with ID " + order.EFOLDERID + " has had a document failure!";
}

Entity Framework Architecture

Metastorm 9 Designer Performance


Ok, so reading back over my last few posts, I appreciate that I’m on a bit of a ‘moaning’ streak about Metastorm 9.  I think before I talk about my experience with the BPM Designer IDE, I need to offer up a disclaimer of sorts.  I appreciate that version 9 of the product, and the Designer in particular, have had quite an overhaul and a platform transplant (now .NET based).  The way we now design processes does feel more solid and reliable, and the use of C# in designing business services / logic and the ability to deal with data as objects is far, let me repeat, FAR better than in 7.6.  So the effort put into version 9 to release a piece of software that is admittedly more developer friendly (less analyst friendly) but focused on a more maintainable design structure is appreciated.

The area I think that has suffered with the release of 9 is the performance. Especially of the Designer. In short, the Metastorm BPM 9 Designer is SLOW.

Performance Vs Visual Studio

So I’ll reveal my machine stats before I start.  This machine is only a development machine and runs 32-bit Windows 7 because of the problems encountered with Designer on 64-bit.  The processor is a 2.26GHz dual core and the machine runs on 3GB of RAM.  Not the fastest, but as I say, it’s for development only.  I run Visual Studio 2008 alongside the Metastorm Designer and, regardless of machine performance, Visual Studio, being a far bulkier product, is lightning fast compared to the BPM Designer.  I can literally build (syntax check and compile) C# code into a managed module (wrapped by an assembly) in the time it takes me to select one of the input parameters of a custom visual script activity.  That is NOT good.

Load Time

So, the first and most widely reported performance issue with the Designer is the start-up.  If you’re running SR3 then this is no longer such an issue compared to the initial release of 9, and even SR1, which had terrible start-up times. It is still an issue however.  An obvious factor is the start page, which reads in the homepage from the Metastorm Community Central site (content and images).  This should not be a major issue, as reading in a page with minimal graphics on even the most basic broadband connection is quick.  An ASP.NET server will accept an HTML page request, build an executable environment on the server with the page/required objects, then execute the code and respond to the client with the output HTML page in a matter of milliseconds, so I don’t believe this is a major hold-up, but it does add to the load time.  You can however turn this off via the firewall if you’re not interested in what the community is talking about.

As far as tests go, the Metastorm Designer alone loads on this development machine in 22 seconds.  This is the time at which the Designer is in a usable state and the ‘timer’ has stopped spinning.  For Visual Studio 2008, I’m ready to work in 9 seconds, SQL Server Management Studio in 7 seconds and Excel within 3 seconds.  So whatever you can say about Microsoft products, they get the job done and are quick with it.  I found that running the Designer by clicking on a solution file within Windows Explorer adds approximately another 20 seconds to the loading time, running at an average of 40 seconds across 3 separate tests.  Obviously the loading of the file and the validation of the .NET code adds overhead here, but this is still uber slow.

General Operational Performance

In terms of general operation, I’ve never used an IDE or any design application (including any version of the Adobe Creative Suite) that has to think as much about every action that takes place.  Whatever design action I perform, there is a short lag in the Designer responding to what I’ve done.  This can sometimes be only a second, but I don’t expect to have to wait for everything I do, especially if there is a deadline looming for unit testing or a client playback.  It feels very much like I am asking the Designer to perform a task and it is deciding whether to grant me the honour.   Some examples:

– Sometimes selecting an item from the Toolbox and dragging onto the design surface (forms generally) can be completely ignored because the Designer is busy with something else.

– Creating roles is painfully slow.  Adding a name then tabbing out requires about 3 seconds of thinking time. What is being done here, a low-level call to the OS? (I’m being sarcastic.)

– Real-time code validation on server-side scripts seems to take about 5 seconds.  When you change your code because of a syntax error, which you have to rely on your spidey C# senses to spot instead of Visual Studio’s neat ‘Error List’ box, the Designer takes a good 5 to 8 seconds before re-validating your code and confirming all is well.  Why?

– Loading up the visual script for an on complete event takes about 7 seconds.

The worst for me however, and it requires a paragraph all of its own, is the absolute drag that is filling in the parameters / arguments of a custom visual script activity that has been promoted from a server-side method.  Just after dropping the visual script activity, which now appears in the visual scripts toolbox, you can complete the visual script arguments, which essentially make up the method signature.  As I click into each argument to fill in the data to be passed at run time, it literally takes about 4 seconds to place the text cursor into the field.  Then I have to click the small expression builder icon, which takes another few seconds.  This stuff should be instant, people; I’m not asking for any computation to be done, just Focus().

Overall, I’m happier to be designing Metastorm processes this new way, but it feels like such an uphill struggle to design what I could have designed in 7 in half the time.  I still believe that, longer term, you will spend less time maintaining the application because of its greater stability, but trying to get a design off the ground by building the fundamental layers is just tedious.  There is talk that 9.1 will address many of the performance issues; let’s hope so, because this application really is the slowest IDE I have ever used.

In terms of productivity, I could go as far as saying that building an ASP.NET application that utilized Windows Workflow would be a quicker development task that would be easier to debug and test.  The question arises – Why invest in a tool like Metastorm when you are not really gaining the benefits of having an integrated UI, Business Logic and Workflow environment?

Thoughts on this topic are welcome. Last rant for a while, honest 😉

Metastorm 9 : Development Productivity


Ok, so I’ve seen several forum posts and had a discussion with a fellow BPM consultant this week about Metastorm 9 and the two main ‘gripes’ that people have, namely the reduction in developer productivity compared to version 7, and the fact that version 9 now really requires you to be a developer and not just a business analyst. I wanted to provide some thoughts on these two:

Slow down in developer productivity

My colleague and several other developers in the community have commented on how many more button pushes are required to put together a functional process in version 9.  Compared to version 7, yes, you do have to click ‘into’ the product a few more times to apply some logic to a process; take the stage and action on start and on complete event handlers.  In version 7, these were two large free text areas in the ‘do this’ tab of any stage or action.  You could click on a stage and start typing (for those developers that knew the version 7 function syntax instinctively).  In contrast, version 9 requires a click on a stage, then a click on an event handler button, then the selection of a visual script activity (for example, an assignment) and then the use of the expression builder to make that assignment.  So, a couple of extra steps.  I think what is being missed here however is the focus on re-use and overall maintainability.

Any Metastorm developer worth their salt has worked on a large project that has taken more than 6 months to plan, write and test due to either large processes or a number of smaller but more complex processes with many rules and alternative flows.  I have not long finished a 2 year project and, if I could show you the amount of on start and on complete code that is used and, more importantly, repeated, you would tell me that Metastorm was the wrong development environment to use. Whilst I agree, some companies run their entire operational application suite off of custom Metastorm applications, this client being one of them.  Now I can see that, with all of their systems, maintainability using version 7 has become a nightmare; there are several full-time developers bug fixing as their day jobs. All because of the line-by-line syntax where no OOP patterns and principles can be used (unless you write the entire thing in JScript.NET).  Important code support features like code dependency (knowing what may break if you change a variable value), automatic refactoring etc. that make life so much easier in many development environments just don’t exist in version 7.  Finally, I should mention the use of objects to represent business entities and the ability to loop over them.  If you’re working with data, you’re looping it to run some row-by-row processing, which again required extra programming in version 7 to accomplish (e.g. writing a static server-side method that takes a SQL statement and returns a dataset/datatable for enumeration).

My main point here is that yes there are a few more clicks in version 9 in accomplishing some goals, but the ability to reuse server-side code for assignments, create visual toolbox items for common process activities (thus avoiding writing code against a version 7 common stage with a bunch of conditionals attached as to only execute in some situations) and completely re-use visual scripts supports true OOP abstraction. You can edit the smallest part of a process to make a change and have everything else that depends on that code or visual activity be changed too.

When it comes to using version 9, you have to understand good design practices such as cohesion, low coupling and separation of concerns, as well as OOP principles and patterns.  You have to look to the long term and understand that the OOP approach to designing processes in version 9 will mean longer term maintainability.  Anyone that has delivered a version 9 project (and I have) will notice that a good design is far more maintainable and will require less man power going forward than a version 7 procedure.  The design stage of a project actually becomes a lot easier too.

What is a few more clicks compared to putting together a solid, consistent, sustainable and maintainable design? (And don’t get me started on the ability to debug in 9 and not in 7… how much time are we saving here?) The short term productivity does suffer, but only slightly, and if you were to analyse the amount of time spent on a project long term, you might be surprised by the results.

You have to be a developer to create processes in version 9

Version 9 certainly has shifted its focus on to the developer.  You really have to approach the design of a process from an OOP perspective and use many of the techniques used in the design of a .NET application, for example, to properly plan and design a Metastorm 9 solution.  If you cannot code C# or understand basic .NET concepts such as the heap, the stack, value and reference types, type conversion, type modifiers and commonly used namespaces such as System, System.Text, System.Data and System.IO, then you might as well just close the application and take your lunch break.

In a way, I agree with this point but understand the move.  A custom Metastorm 7 syntax was never going to work long term in a world where open standards are king and compatibility is of major importance.  A Metastorm application at the end of the day is an application; it will execute in a business environment, handle possibly business-critical data and need to be up most of the time.  Therefore it needs qualified designers (on paper or by experience, the latter generally being the most important) to design, create and adequately test the application.  There is nothing stopping a business analyst putting process actions and stages together using the ‘classic’ or basic process designer in version 9, and when they do so, it should be a must that the analyst collaborates with an experienced process designer who has native technical knowledge but also understands how the functional requirements will translate into technical requirements, advising the best solutions to some of the most common process problems and bringing some sense to some of the more ‘out there’ ideas.

Some compatibility with a widely used modelling tool such as Visio would certainly reel the business analyst community closer, but in the age of BPM being an umbrella for many other technologies including Enterprise Systems Integration and Enterprise Content Management, the design of BPM systems is now at least 70% a technical field and I think the new version 9 product adequately represents that. If a client is going for a simpler process, then maybe Metastorm 9 is not the tool and they should be opting for a free open source alternative like bonita open solution.  I like to think that Metastorm 9 is not as ‘Fisher Price’ as it once was and is now a mature pure play BPM product.

The C# using statement explained


A colleague emailed me this week enquiring about the usage of the C# using statement for some code refactoring he was working through.  Now, I don’t mean the ‘using’ directive used to import namespace references at the top of your code; I’m talking about the statement used within your methods.  First a bit of history, then theory.  The statement itself has been around since the release of C#.NET; I have an old Microsoft Press .NET text book from 2002 and it mentions using this statement, so it’s certainly nothing new.  To understand the purpose of the using statement, you need to be familiar with the two ways unmanaged resources are deallocated.

Garbage Collection

The .NET framework is pretty good at looking after your code for you during execution.  When your applications create objects, some memory space (on the managed heap) is put aside, and when objects are finished with and references to them no longer exist, the CLR will give you a hand and destroy those objects for you, without you having to worry about memory management like in the old days.  It’s called Garbage Collection and it’s been around for years (Java has done it for longer).

Finalize()

When the CLR destroys your objects, it looks for code in a destructor method in your object and then implicitly calls the Finalize() method.  You cannot call the Finalize() method directly, so you should read up on implementing destructors if I’m losing you here.  As a developer you release any unmanaged resources in your destructor method so that the garbage collection process can do this as it implicitly calls the Finalize() method.  Think of creating a destructor as a ‘catch all’ backup.  What I mean is, you are essentially relying on the CLR to invoke the Finalize() method and release your unmanaged resources; it’s implicit and you don’t know when it will happen.  The alternative (and best) approach when working with unmanaged resources is to explicitly release the resource there and then, once you have finished with it in your code.  This way the deallocation of unmanaged resources is immediate.  This is where the Dispose() method comes into play.

Dispose()

If you think about .NET objects that work with external resources such as databases, files, network streams etc., you may notice that they implement the System.IDisposable interface. This interface defines the Dispose() method, which explicitly deallocates the memory for the unmanaged resource it is designed to work with.  Take the System.Data.SqlClient.SqlConnection object for example; this works with an external resource that is out of the reach of the garbage collection process (and so no Finalize() method is involved).  The IDisposable interface is very basic in that it contains only one void method that takes no arguments… you guessed it, Dispose().
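For reference, the interface itself is as small as it sounds:

//The System.IDisposable interface in its entirety:
public interface IDisposable
{
    void Dispose();
}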

The Using Statement

Microsoft have a great, to-the-point explanation of what the using statement does: ‘Defines a scope, outside of which an object or objects will be disposed’. It’s right on the money.  You create objects within the using statement and, once the using block is complete, they are disposed of.  The using statement implicitly calls the Dispose() method of the objects being used. I guess common sense is telling you here that you can only use the ‘using’ statement with objects that implement IDisposable!

The code example below demonstrates the use of the statement with the SqlConnection object we mentioned above.  Once this method has returned the data requested by the passed-in SQL query, the using statement calls the Dispose() method on the SqlConnection object (I should point out at this point that you can pass more than one object to the using statement):
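The original code image hasn’t survived, so here’s a minimal sketch of the kind of method described; the connection string is illustrative and you would substitute your own:

using System.Data;
using System.Data.SqlClient;

public static DataTable GetData(string sqlQuery)
{
    //Hypothetical connection string - substitute your own.
    string connectionString = "Data Source=(local);Initial Catalog=MyDatabase;Integrated Security=True";

    DataTable results = new DataTable();

    //Both objects are disposed (and the connection closed) when the using blocks end,
    //even if an exception is thrown inside them.
    using (SqlConnection connection = new SqlConnection(connectionString))
    using (SqlDataAdapter adapter = new SqlDataAdapter(sqlQuery, connection))
    {
        adapter.Fill(results); //Fill opens and closes the connection for us.
    }

    return results;
}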

Tip – When the compiler turns the using statement into MSIL it creates Try and Finally blocks for the code.

Metastorm 9 : Server Side Scripting


Metastorm BPM 9, like with earlier versions of the product provides the facility to extend BPM applications and integrate processes with other systems via scripts that execute on the BPM server itself.  Up to the release of version 9, server side scripting could be performed using JScript, VBScript or JScript.NET, the latter of which allowed integration with the .NET framework.  Version 9 now allows the developer to code in C#, the most popular of the .NET languages (C# is used in all examples here).  This article looks to provide a very brief introduction to server side scripting in Metastorm BPM 9 and will cover how to create server side scripts, access your process data and how to execute a server side script from within a process. Knowledge of C# is required.

Object Model

Before delving into the coding, it’s important to get a grasp on the core Metastorm namespaces and understand what namespaces are imported into each server side script and why. Each new server side script is created with a default set of ‘using’ statements (‘imports’ in JScript.NET) that are required for the script to execute correctly.  To add a server side script to your project, right-click the project name and select New > Server Script.  A new script file will be added to your project called ServerScript1 and the script will open for editing.  Take a look at the using statements at the top (yours will exclude my comments in green).

Metastorm.Runtime.Models

For the sake of clarity, we are referring to the Metastorm project inventory items as project objects in this article; not only because ‘object’ is a standard programming label/identifier for ‘things’ you code against (abstract representations of real world entities), but because at runtime your project objects are seen as just that: .NET objects of classes in the Metastorm.Runtime.Models namespace. The class represents the process definition that you deploy to the BPM server; the objects represent the process instances.

One of the biggest fundamental and most obvious improvements in version 9 over version 7 is the object orientated design approach. Your data, regardless of its source (web services, database, LDAP etc) are represented as named business objects.  Your processes have their own processData and processContext objects. Forms also have local data objects, all of which are available to use in your server side code.  The introduction of looping constructs is no doubt a result of this new object orientated approach to process design.

Before jumping into the details of server side scripting in version 9 and how we can utilize and manipulate our Metastorm objects using C#, I want to touch on the object model for a project. Essentially each project represents a namespace inside of the Metastorm.Runtime.Models namespace; for example, I’m using a project called RH_RootCases so the namespace would be Metastorm.Runtime.Models.RH_RootCases.  This namespace contains the following (on the basis that you have added these to your project) and lists them by name (e.g. Form1, Process1 etc.):

Forms

The most important property of the form class is probably the ‘Fields’ collection which can be accessed to return the fields and their values for use in your scripts. The following illustrates a few basic field and form business object related operations:

Connections

The connection represents the named object that connects to your data sources and contains the configuration information required for the connection. The configuration information is dependent on the type of connection you create, e.g. data link properties for database connections, URI and method/parameter specifics for web services.  The below is a simple example of how to access a database connection’s properties using the default Metastorm connection object:

Processes

Each of your processes has its own process data object. This object contains the values of your process variables which can be accessed via server side code.  As well as the process data, there is also process context data available which contains values such as the folder ID, folder name, process instance creation date and the user name of the originator.  The below illustrates some simple assignments from data found in the Process1 object (the default name for your first process):

As well as forms, connections and processes, you can access your other server side scripts, which means you can create a very logical server side object model of your own, allowing you to maintain OOP design principles and, to some extent, patterns.  One of my own personal problems with Metastorm 7 was its line-by-line syntax.  Very large procedures would have lots of random Metastorm functions and assignments attached to stages and actions, and there wasn’t really any true OOP-style organization or ability to abstract out changeable code into its own objects.  With 9 this has changed, and so, via the same Metastorm.Runtime.Models namespace, you can access each of your server side script files by name.  Note however that you can set your own namespaces for server scripts, meaning that they will not be available via Metastorm.Runtime.Models but via the namespace you set, which may for example be in line with a client standard namespace model, for example ‘company.application.logicalgrouping.scriptname’.  If this is the case, just remember to use this namespace model to access your other server side scripts.

Custom Business Objects can also be accessed by your server scripts.

Getting Access to Objects

Now we know where to find our project objects, we need to ensure that our server side script can access the methods and properties of those objects.  Like the old ‘ework’ object (eWork.Engine.ScriptObject) used in v7, which allowed access to integration wizard functions on the server side by passing it in as an argument, we need to do exactly the same thing here.  The business objects you wish to work with inside of a server side method should be passed to the method as arguments, and the type of the object being passed should be declared, for example:

Now you can use the expression builder to assign the value of the script return.  For the first action in my process, I’ve added an assignment activity called ‘getCase’ which will assign the return value to a process variable called ‘intCaseNo’.  You can see from the below expression builder window that I call the script and then pass an instance of the RootCase object:

In order for the expression builder to be able to see and invoke your methods, the ‘Promote’ attribute must be used in the script to promote the method being called.  This is accomplished by placing the attribute declaration just above the method name: [Promote(PromotionTargets.ExpressionBuilder)].  Our previous script examples illustrate this.  As well as exposing the script methods to the expression builder, you should ensure that the method is static so that no object instantiation is required and the method can be called directly on the class (a class itself can only be declared static when all of its members are static).

So, server side scripting is a relatively straight forward process in Metastorm BPM 9.  A good knowledge of C# syntax is obviously a pre-requisite.

Encoding and Decoding to Base64


I did a bit of work today that included encoding and decoding a string to Base64. It’s nothing overly complicated for those already familiar with .NET and encoding schemes, but having not posted for a few days, I thought I’d share.  Base64 is an encoding scheme that represents binary data in a textual format using the standard ASCII character set. Metastorm BPM, for example, returns a Base64 encoded string when it pulls attachments from its eAttachment table (stored as a SQL Image data type) using the GetAttachment() method.  Anyway, I’m rambling, here’s the code in C#:
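The original snippet was an image, so here’s a minimal equivalent using the standard Convert and Encoding classes:

using System;
using System.Text;

class Base64Example
{
    static void Main()
    {
        string original = "Hello, Base64!";

        //Encode: convert the string to bytes, then to a Base64 string.
        byte[] bytes = Encoding.UTF8.GetBytes(original);
        string encoded = Convert.ToBase64String(bytes);
        Console.WriteLine(encoded); //SGVsbG8sIEJhc2U2NCE=

        //Decode: convert the Base64 string back to bytes, then back to a string.
        byte[] decodedBytes = Convert.FromBase64String(encoded);
        string decoded = Encoding.UTF8.GetString(decodedBytes);
        Console.WriteLine(decoded); //Hello, Base64!
    }
}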

.NET : StringBuilder()


Namespace : System.Text

Called StringBuffer in Java, this type is a great way to dynamically build up a string without the need to use the + operator, which concatenates strings together.  Breaking this down, a String is the in-memory name for some text, up to 2,147,483,647 characters to be precise.  Adding strings together with the + operator is perfectly do-able in .NET but is not advised as it’s bad for performance.  Because strings are immutable (meaning they cannot be changed once created), a new string object has to be created for each concatenation that occurs, to represent the resulting string.

String myFirstString = "my name is";  //This creates a string object on the heap and points to it from the stack

String mySecondString = myFirstString + " Scott"; //A new string object is created

One of the most common uses of string concatenation for me is the dynamic building of a SQL string that uses both literal text (the SQL query) and references to variables being used in the query.  The best way to build such a string is to use the StringBuilder class.  Ensure you are ‘using’ the System.Text namespace in your project.

The reason the StringBuilder class is so useful is because it dynamically creates and destroys a char array on the fly as you append string values to the object. The object has a property called .Capacity, which represents the max size of the object’s current char array; by default this is 16 characters.  As you create your StringBuilder object, if you have a rough idea of how long your string is going to be, you can set this property using the constructor.  When you append additional string values onto the StringBuilder object, internally the object checks what capacity is currently set; if the appended string does not exceed the current capacity of the array, then the current array is updated.  If the appended string does exceed the current array capacity, then the StringBuilder object creates a new array with a higher capacity (double the current one), copies your current char array into the new one and then destroys the old.  You can use the StringBuilder’s .Length property to check the number of characters currently in the builder.

String tableFQN = "Database.dbo.Customers";

StringBuilder sb = new StringBuilder("SELECT * FROM "); //Creates the sb object (note the trailing space so the table name isn't fused onto FROM)

sb.Append(tableFQN); //Exceeds the default 16 character capacity, so a new larger char array is created internally and the old one destroyed

return sb.ToString(); //returns "SELECT * FROM Database.dbo.Customers" as a System.String object

When you are ready to use the value of the object, you must then call the .ToString() method to convert the char array into a string object.  Just remember that strings are immutable whereas the StringBuilder is mutable (can be changed).

Metastorm BPM : Dynamically determine the Metastorm database name


At the moment, both of my Metastorm BPM clients have multiple Metastorm servers running on different physical servers that operate as independent Development, QA (or Test) and Live systems.  Now, this isn’t so rare; most companies that implement Metastorm BPM environments have this type of setup.  You will generally also have three Metastorm databases running for these three servers and, if you’re like both my current clients, all three of those Metastorm databases run on the same SQL Server instance and as such are named differently.   If you have, for example, MetastormDev, MetastormQA and MetastormLive databases set up, then accessing the data in these from one single procedure file isn’t a big hassle, as the procedure just references a DSN (data source name) that is set up during installation.  This DSN is called Metastorm by default, so provided the procedure references that name in its grids, SQL executions etc., all is well.

When you are creating server side scripts you can still reference this DSN name by using the built in ework.SelectSQL or ework.ExecSQL functions via the script designer. You would create a SQL statement as you normally would in Metastorm Designer and then specify the DSN properties in the second parameter like this :

var sql : String = "SELECT myColumn FROM myTable";

ework.SelectSQL(sql, "DSN=Metastorm;UID=;PWD=") //Uses Windows authentication hence no SQL Server password

The above illustrates a simple select of a single column, but take the scenario of selecting multiple columns of a database that you want to apply a JScript.NET for loop to and loop each record performing logic such as raising a flag or checking a secondary data source.  SelectSQL is no good for this unless you want to start breaking the data down into sub strings and use arrays.

It is far easier to use ADO.NET in your server side script to load the SQL data into a dataset and loop that dataset, referring to the columns by name in your for loop statements.  I always much prefer the use of an in memory dataset for column/row manipulation.  The issue that arises using this method however is that we will need to specify a database connection string to pass to the SqlDataAdapter class constructor when you are filling your data set.  But what is the database name?  Our DSN becomes of little use to us now.

The answer, or at least the method I normally use, is to read the Metastorm registry keys for the database value on the server you are running the procedure on.  Specifically we want to grab the local machine’s sub-key value at this location: HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\Metastorm\Database.  You can write a short JScript.NET method that reads this key and returns the local server’s Metastorm database name.

For those who would want to implement this method in C# (i.e. for supporting utility assemblies that you can place in the Metastorm engine’s dotnetbin server folder), it looks roughly like the sketch below.
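The original screenshot is missing here, so this is a rough reconstruction using the standard Microsoft.Win32 registry classes rather than my exact production code:

using Microsoft.Win32;

public static string GetMetastormDatabaseName()
{
    //Reads HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\Metastorm and returns its 'Database' value.
    using (RegistryKey key = Registry.LocalMachine.OpenSubKey(@"SOFTWARE\ODBC\ODBC.INI\Metastorm"))
    {
        if (key == null)
        {
            return string.Empty; //DSN not found on this server
        }

        object value = key.GetValue("Database");
        return value == null ? string.Empty : value.ToString();
    }
}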

You can call these methods from anywhere in your server side code and have a string return naming the database being used by the current Metastorm server.
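As a quick usage sketch (the table and column names here are purely illustrative, as is the connection string), the database name returned above can be dropped straight into an ADO.NET connection string, the results loaded into a DataTable and each row looped by column name:

//Requires: using System.Data; and using System.Data.SqlClient;
string databaseName = GetMetastormDatabaseName();
string connectionString = "Data Source=(local);Initial Catalog=" + databaseName + ";Integrated Security=True";

DataTable results = new DataTable();

using (SqlConnection connection = new SqlConnection(connectionString))
using (SqlDataAdapter adapter = new SqlDataAdapter("SELECT myColumn, myOtherColumn FROM myTable", connection))
{
    adapter.Fill(results);
}

foreach (DataRow row in results.Rows)
{
    string first = row["myColumn"].ToString();
    string second = row["myOtherColumn"].ToString();
    //Raise a flag, check a secondary data source, etc.
}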