Archive for April, 2008

CFThread and dividing up work

CFThread is a wonderful addition to ColdFusion 8. It lets you perform parallel actions within your code. However, parallel programming is a complex beast under the best of circumstances.

One of the early things to realize about CF8’s threading support is that it appears to make a deep copy of the local variables (à la Duplicate()) when you start the thread. This gives you a lot of thread safety: you can access and change even variables defined outside the thread space, and you really have a copy of that variable local to your thread. You don’t have to worry about other threads changing the value while you’re using it, and you don’t even have to worry about goofing it up for the parent page (the page thread).

Dividing Up the Work
A common use for threading is to divide up a lot of work so that it can be done in parallel. For example, let’s say you have a script which aggregates RSS feeds from 1,000 external sites. All that HTTP work is mostly idle time, since you spend most of it waiting for responses. That makes it a good candidate for parallelizing.

With 1,000 requests to make, it’s not a good idea to just create 1,000 threads – you’ll tie up a lot of TCP connections on your server, plus you’ll use up a lot of memory (remember, each thread is going to operate in its own memory space). So let’s decide somewhat arbitrarily that we’re only going to run 10 requests in parallel.

// How many feeds will each thread handle?
eachThreadDoesCount = ceiling(ArrayLen(feedURLs) / parallelThreads);

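<!--- Inside each thread (firstIndex and lastIndex are illustrative names) --->
<cfset firstIndex = (threadID - 1) * eachThreadDoesCount + 1>
<cfset lastIndex = min(threadID * eachThreadDoesCount, ArrayLen(feedURLs))>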

Looks pretty easy, eh? There’s a critical error there, though, and it is not obvious even from Adobe’s documentation: the thread body reads the loop variable threadID. This will end up with the earliest URLs getting fetched numerous times, and the later URLs not getting fetched at all.

The reason for this has to do with how ColdFusion initiates its threads under the hood. To me, it looks like when a cfthread tag is encountered, it actually spawns an initial thread to do the local variable copying. All threads which are started before this initial thread finishes the copying will get an identical set of local variables, including the threadID variable used in the loop. I’m not certain that’s actually what’s going on under the hood, but the behavior is consistent with it. In any event, you cannot rely on variables changed by the parent page (the “page thread” in Adobe’s documentation) appearing in your cfthreads, even if that change was made before your specific thread was launched but after the initial thread was launched.

It seems bizarre, but let me give you a piece of sample code which demonstrates it. This code is non-deterministic for me: sometimes I see a “correct” result, but then I immediately refresh it and get a different result.
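A minimal sketch of such a test might look like the following. The thread names (worker1 through worker20) and the seenLoopID key are assumptions for illustration; the worker count and the loopID variable follow the description in this post. Each worker simply records the value of loopID that it sees, and the page prints those values once all the workers have finished.

```cfml
<!--- Launch 20 workers; each one records the loopID value it sees --->
<cfset workerNames = "">
<cfloop from="1" to="20" index="loopID">
    <cfset workerNames = ListAppend(workerNames, "worker#loopID#")>
    <cfthread action="run" name="worker#loopID#">
        <!--- Reads the page's loop variable directly: this is the risky part --->
        <cfset thread.seenLoopID = loopID>
    </cfthread>
</cfloop>

<!--- Wait for every worker to finish before reading the results --->
<cfthread action="join" name="#workerNames#" />

<cfoutput>
<cfloop list="#workerNames#" index="workerName">
    #workerName#: loopID = #cfthread[workerName].seenLoopID#<br>
</cfloop>
</cfoutput>
```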


Here you should see each worker thread having a unique loopID. Instead, for example, my code shows Worker 1 has a loopID of 1, Worker 2 = 2, Worker 3 = 3, Worker 4 = 4, but Workers 5 through 20 have a loopID of 5. If I refresh, it’s different. I’ve seen all 20 workers starting with a loopID of 1, and I’ve seen the first few having a loopID of 1, then the next few having a loopID of 5, then the next few having a loopID of 8, etc.

So how do you safely determine which worker you are so you know which of the set of work you should be doing? The answer, as Ray posted in a comment (and contrary to my more elaborate work-around involving locking), is to use the attributes scope:


Basically the idea is you can pass additional custom attributes to cfthread, and these show up as values in a structure named “attributes” available only within the scope of your thread.
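Applied to the feed example, the approach could be sketched as below. The attribute name workerID, the thread names, the example.com feed URLs, and the firstIndex/lastIndex variables are all placeholders rather than anything from the original code. The key point is that the thread reads attributes.workerID, which is fixed when the thread is created, instead of the page's loop variable.

```cfml
<!--- Hypothetical feed list; in practice this would hold your real URLs --->
<cfset feedURLs = ["http://example.com/feed1.xml", "http://example.com/feed2.xml", "http://example.com/feed3.xml"]>
<cfset parallelThreads = 10>
<cfset eachThreadDoesCount = ceiling(ArrayLen(feedURLs) / parallelThreads)>

<cfloop from="1" to="#parallelThreads#" index="threadID">
    <!--- workerID is a custom attribute; it is copied into the thread at launch --->
    <cfthread action="run" name="feedWorker#threadID#" workerID="#threadID#">
        <cfset firstIndex = (attributes.workerID - 1) * eachThreadDoesCount + 1>
        <cfset lastIndex = min(attributes.workerID * eachThreadDoesCount, ArrayLen(feedURLs))>
        <!--- When firstIndex exceeds lastIndex this loop simply does not run --->
        <cfloop from="#firstIndex#" to="#lastIndex#" index="i">
            <cfhttp url="#feedURLs[i]#" method="get" timeout="30" result="response" />
            <!--- Parse or store response.fileContent here --->
        </cfloop>
    </cfthread>
</cfloop>
```

With only three placeholder feeds and ten workers, workers 1 through 3 each take one feed and the remaining workers have nothing to do.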


2 Comments

ColdFusion 8.0.1 – Nested Array/Struct Shorthand

As you probably know, ColdFusion 8 gave us a long-needed shorthand for creating arrays and structures:
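For reference, a minimal sketch of the flat shorthand (the variable names and values are made up for illustration):

```cfml
<!--- Array literal: equivalent to ArrayNew(1) plus three ArrayAppend calls --->
<cfset colors = ["red", "green", "blue"]>

<!--- Struct literal: equivalent to StructNew() plus two key assignments --->
<cfset settings = {host = "localhost", port = 8500}>

<cfoutput>#colors[2]# / #settings.host#:#settings.port#</cfoutput>
```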


Unfortunately you couldn’t nest those constructs. With the 8.0.1 updater though, you now can:
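A small sketch of what nesting looks like, assuming the 8.0.1 updater is installed. The config structure, its keys, and the paths and URLs in it are hypothetical values chosen for illustration.

```cfml
<!--- Structs and arrays nested inside a single struct literal --->
<cfset config = {
    dataSource = "myDSN",
    paths = {uploads = "/var/www/uploads", logs = "/var/log/myapp"},
    feedURLs = ["http://example.com/a.xml", "http://example.com/b.xml"]
}>

<cfoutput>
    Uploads go to #config.paths.uploads#<br>
    Second feed: #config.feedURLs[2]#
</cfoutput>
```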

This is fantastic when you’re trying to make configurable code – a chunk of code whose basic function is tweakable by updating a few settings variables at the top instead of hard-coding them in throughout the code (for example – URLs, filesystem paths, data sources, etc).

No Comments

Follow up to Real Time Command Execution Feedback Post

With ColdFusion 8.0.1, Adobe has introduced errorVariable and errorFile as attributes of the cfexecute tag. You can only use one of the two in the same tag.

Once the tag has completed execution, this gives some insight into whether there was an error. Previously there was no way for CF to report the command’s errors to a file or to the browser.
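As a rough sketch of how errorVariable might be used: the command, its argument, and the variable names commandOutput and commandErrors below are placeholders, and the cfparam lines are only a defensive assumption in case either variable is left undefined.

```cfml
<!--- Run a command that is expected to fail and capture its error output --->
<cfexecute name="/bin/ls"
           arguments="/no/such/directory"
           variable="commandOutput"
           errorVariable="commandErrors"
           timeout="10" />

<!--- Make sure both variables exist before testing them --->
<cfparam name="commandOutput" default="">
<cfparam name="commandErrors" default="">

<cfoutput>
    <cfif Len(Trim(commandErrors))>
        <p>Command reported an error: #HTMLEditFormat(commandErrors)#</p>
    <cfelse>
        <pre>#HTMLEditFormat(commandOutput)#</pre>
    </cfif>
</cfoutput>
```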

No Comments

Speed Limiter

Once in a while you need to simulate what the user experience will be at a certain connection speed. I’ve added up data sizes and times and done some math to estimate the numbers, but that ignores TCP and HTTP header overhead.

Fortunately there exists a simple tool for Windows to let you simulate any connection speed to any TCP port from any TCP client. Notably, web browsers are TCP clients which connect to TCP ports.


The tool is called Speed Limiter, and it’s freeware (an English download page can be found here). It includes some special functions related to HTTP, but it can be used for any protocol.


No Comments

ColdFusion Preserve POST variable name case

ColdFusion provides access to POSTed data via the FORM structure. Unfortunately ColdFusion always upper-cases the names of these variables. Recently a chunk of code I was working on needed to know the original case of these keys. At first I worked on passing a hidden form field with the original field name text, but this bothered me as way too much of a work-around.

This chunk of code will give you a structure called FormContent where the case of the field names is preserved.
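One way to do this, shown here only as a sketch and not necessarily the original listing, is to parse the raw request body yourself. It assumes a standard application/x-www-form-urlencoded POST (not a multipart upload), and the helper variable names rawBody, pair, fieldName, and fieldValue are invented for the example. Setting keys with bracket notation is what keeps their case.

```cfml
<cfset FormContent = StructNew()>
<!--- The raw body looks like: firstName=Ada&lastName=Lovelace --->
<cfset rawBody = ToString(GetHttpRequestData().content)>

<cfloop list="#rawBody#" index="pair" delimiters="&">
    <cfset fieldName = URLDecode(ListFirst(pair, "="))>
    <cfif ListLen(pair, "=") GT 1>
        <cfset fieldValue = URLDecode(ListRest(pair, "="))>
    <cfelse>
        <!--- Field was submitted with an empty value --->
        <cfset fieldValue = "">
    </cfif>
    <!--- Bracket notation stores the key with its original case --->
    <cfset FormContent[fieldName] = fieldValue>
</cfloop>

<cfdump var="#FormContent#">
```

This sketch keeps only the last value when a field name repeats, whereas the FORM scope joins repeated values into a comma-delimited list.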

Unfortunately this code will not distinguish between two fields with the same name but different case. To do that, you’ll want to use a case-sensitive StructNew() alternative. I recommend CreateObject("java", "java.util.LinkedHashMap").init(). This has the added value of preserving the order that the fields appeared in the calling form when you iterate over it for output. It has the disadvantage that you’ll have to properly match case of the keys when you retrieve them in your code.
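A short sketch of how the LinkedHashMap behaves from CFML. The field names and values are made up; put(), get(), and keySet().iterator() are the standard java.util.Map methods.

```cfml
<cfset FormContent = CreateObject("java", "java.util.LinkedHashMap").init()>

<!--- Two keys that differ only by case are stored as separate entries --->
<cfset FormContent.put("firstName", "Ada")>
<cfset FormContent.put("FIRSTNAME", "Grace")>

<!--- Iteration follows insertion order --->
<cfset keyIterator = FormContent.keySet().iterator()>
<cfoutput>
<cfloop condition="keyIterator.hasNext()">
    <cfset fieldName = keyIterator.next()>
    #HTMLEditFormat(fieldName)# = #HTMLEditFormat(FormContent.get(fieldName))#<br>
</cfloop>
</cfoutput>
```

Lookups must use the exact case: FormContent.get("firstName") and FormContent.get("FIRSTNAME") return different values here.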

4 Comments

ColdFusion REMatchAll

This ColdFusion method offers functionality similar to PHP’s preg_match_all function. It searches for the supplied regular expression in the supplied text. The return value is an array with one entry for each time the pattern matches the string. The array entries are structs with a numbered element for each parenthesized sub-expression within the match, and a zero-entry for the whole match.

It’s probably easier to see example data.


Screenshot of the dump result from a ReMatchAll call

As you can see, it returns every match, position, and full text of the match, as well as each parenthesized subexpression. The example pattern basically matches {foo|bar|baz}, {foo|bar}, or {foo}, and returns the alphabetical sub-components as the sub-expressions.

Here is the code.
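What follows is a sketch written from the description above, not the original listing. The function name REMatchAllSketch and the pos, len, and text keys are assumptions; the idea is to call REFind repeatedly with returnsubexpressions enabled and to advance past each match.

```cfml
<cffunction name="REMatchAllSketch" returntype="array" output="false">
    <cfargument name="pattern" type="string" required="true">
    <cfargument name="text" type="string" required="true">
    <cfset var results = ArrayNew(1)>
    <cfset var found = "">
    <cfset var match = "">
    <cfset var entry = "">
    <cfset var startAt = 1>
    <cfset var i = 0>

    <cfloop condition="startAt LTE Len(arguments.text)">
        <!--- found.pos[1]/found.len[1] describe the whole match; later slots are sub-expressions --->
        <cfset found = REFind(arguments.pattern, arguments.text, startAt, true)>
        <cfif found.pos[1] EQ 0>
            <cfbreak>
        </cfif>

        <cfset match = StructNew()>
        <cfloop from="1" to="#ArrayLen(found.pos)#" index="i">
            <cfset entry = StructNew()>
            <cfset entry.pos = found.pos[i]>
            <cfset entry.len = found.len[i]>
            <cfset entry.text = "">
            <cfif found.pos[i] GT 0 AND found.len[i] GT 0>
                <cfset entry.text = Mid(arguments.text, found.pos[i], found.len[i])>
            </cfif>
            <!--- Key 0 is the whole match, keys 1..n are the sub-expressions --->
            <cfset match[i - 1] = entry>
        </cfloop>
        <cfset ArrayAppend(results, match)>

        <!--- Move past this match; Max() avoids looping forever on an empty match --->
        <cfset startAt = found.pos[1] + Max(found.len[1], 1)>
    </cfloop>

    <cfreturn results>
</cffunction>

<cfset matches = REMatchAllSketch("\{([a-z]+)\}", "{foo} and {bar}")>
<cfdump var="#matches#">
```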


No Comments