B and E Blog

Just a couple of opinionated nerds talking about ColdFusion, PHP, and technology in general.


CFThread and dividing up work

April 23rd, 2008 · 2 Comments · ColdFusion, Programming

CFThread is a wonderful addition to ColdFusion 8. It lets you perform parallel actions within your code. However, parallel programming is a complex beast under the best of circumstances.

One of the early things to realize about CF8’s threading support is that it makes a deep copy of the local variables (à la Duplicate()) when you start the thread. This gives you a lot of thread safety: you can read and change even variables defined outside the thread, because what you are working with is a copy of that variable local to your thread. You don’t have to worry about other threads changing the value while you’re using it, and you don’t even have to worry about goofing it up for the parent page (the page thread).

Dividing Up the Work
A common use for threading is to divide up a lot of work so that it can be done in parallel. For example, let’s say you have a script which aggregates RSS feeds from 1,000 external sites. All that HTTP work is pretty quiet: you spend most of the time waiting for responses. That makes it a good candidate for parallelizing.

With 1,000 requests to make, it’s not a good idea to just create 1,000 threads - you’ll tie up a lot of TCP connections on your server, plus you’ll use up a lot of memory (remember, each thread is going to operate in its own memory space). So let’s decide somewhat arbitrarily that we’re only going to run 10 requests in parallel.

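    <cfscript>
        // Let's assume that feedURLs is an array with the URLs of 1,000 RSS feeds we're going to work with.
        feedURLs = [...];
        // The number of requests to run in parallel.
        parallelThreads = 10;
        // How many feeds will each thread handle?
        eachThreadDoesCount = ceiling(ArrayLen(feedURLs) / parallelThreads);
    </cfscript>

    <!--- firstFeed, lastFeed and the thread name are placeholder names. --->
    <cfloop from="1" to="#parallelThreads#" index="threadID">
        <cfthread action="run" name="worker#threadID#">
            <cfset firstFeed = (threadID - 1) * eachThreadDoesCount + 1>
            <cfset lastFeed = min(threadID * eachThreadDoesCount, ArrayLen(feedURLs))>
            <!--- Fetch feedURLs[firstFeed] through feedURLs[lastFeed] here. --->
        </cfthread>
    </cfloop>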

Looks pretty easy, eh? There’s a critical error in there, though, which is not obvious even from Adobe’s documentation: the use of threadID inside the thread body. This will end up with the earliest URLs getting fetched numerous times, and the later URLs not getting fetched at all.

The reason for this has to do in some way with how ColdFusion initiates its threads under the hood. To me, it looks like when the <cfthread> tag is encountered, it actually spawns an initial thread to do the local variable copying. All threads which are started before this initial thread finishes the copying will get an identical set of local variables, and this includes the threadID variable used in the loop. I’m not certain that’s actually what’s going on under the hood, but the behavior is consistent with it. In any event, you cannot rely on variables changed by the parent page (the “page thread” in Adobe’s documentation) appearing in your cfthreads, even if that change was made before your specific thread was launched but after the initial thread was launched.

It seems bizarre, but let me give you a piece of sample code which demonstrates it. This code is non-deterministic for me - that is to say, sometimes I see a “correct” result, but then I immediately refresh it and get a different result.
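A minimal sketch of such a test might look like the following. It assumes 20 workers named worker1 through worker20, and the names workerNames and seenLoopID are hypothetical, chosen only for illustration. Each worker records the value of loopID it sees when it actually runs, and the page prints those values once every worker has finished.

```cfml
<!--- Sketch only: workerNames and seenLoopID are illustrative names. --->
<cfset workerNames = "">

<cfloop from="1" to="20" index="loopID">
    <cfset workerNames = ListAppend(workerNames, "worker#loopID#")>
    <cfthread action="run" name="worker#loopID#">
        <!--- Record whatever value of loopID this thread sees when it runs. --->
        <cfset thread.seenLoopID = loopID>
    </cfthread>
</cfloop>

<!--- Wait for every worker to finish before reading the results. --->
<cfthread action="join" name="#workerNames#" />

<cfoutput>
    <cfloop list="#workerNames#" index="workerName">
        #workerName# has a loopID of #cfthread[workerName].seenLoopID#<br>
    </cfloop>
</cfoutput>
```

The thread name is evaluated by the page when the thread is created, so it is always unique. Only the loopID read inside the thread body is subject to the timing problem.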

Here you should see each worker thread having a unique loopID. Instead, for example, my code shows Worker 1 has a loopID of 1, Worker 2 = 2, Worker 3 = 3, Worker 4 = 4, but Workers 5 through 20 have a loopID of 5. If I refresh, it’s different. I’ve seen all 20 workers starting with a loopID of 1, and I’ve seen the first few having a loopID of 1, then the next few having a loopID of 5, then the next few having a loopID of 8, etc.

So how do you safely determine which worker you are, so you know which part of the work you should be doing? The answer, as Ray posted in a comment (and contrary to my more elaborate work-around involving locking), is to use the attributes scope.

Basically the idea is that you can pass additional custom attributes to <cfthread>, and these show up as values in a structure named “attributes” that is available only within the scope of your thread.
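Applied to the feed example, that could be sketched roughly as follows. The attribute name workerID, the variables firstFeed, lastFeed and feedResponse, and the thread names are all assumptions made for illustration. feedURLs, parallelThreads and eachThreadDoesCount are the values set up earlier, and they are only read here, never changed.

```cfml
<cfloop from="1" to="#parallelThreads#" index="threadID">
    <!--- workerID is a custom attribute; its value is fixed when the thread is created. --->
    <cfthread action="run" name="worker#threadID#" workerID="#threadID#">
        <!--- Use attributes.workerID, not threadID, to work out this worker's share. --->
        <cfset firstFeed = (attributes.workerID - 1) * eachThreadDoesCount + 1>
        <cfset lastFeed = min(attributes.workerID * eachThreadDoesCount, ArrayLen(feedURLs))>

        <cfloop from="#firstFeed#" to="#lastFeed#" index="i">
            <cfhttp url="#feedURLs[i]#" method="get" result="feedResponse">
            <!--- Parse or store feedResponse.fileContent here. --->
        </cfloop>
    </cfthread>
</cfloop>
```

With 1,000 feeds and 10 workers, eachThreadDoesCount is 100, so worker 1 handles feeds 1 through 100, worker 2 handles 101 through 200, and so on. The min() call keeps the last worker from running past the end of the array when the feeds do not divide evenly.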


2 responses so far ↓

  • 1 Raymond // Apr 23, 2008 at 2:58 pm

    I posted this on my blog, but why don’t you just pass the values to the thread via Attributes?

  • 2 eric stevens // Apr 23, 2008 at 3:17 pm

    Thanks for the tip Ray, I guess I’m still uninitiated with cfthread. The Adobe documentation on cfthread doesn’t make this clear, but Ray refers to the ability to pass any additional attributes you want to <cfthread>, which appear within the thread’s scope inside a structure named attributes.

    Thanks for the tip Ray, I’ll update my post, that’s certainly easier!
