Posts Tagged ColdFusion

CF.Objective() So Far

So far I’ve been to two really good sessions at CF.Objective(). The first I was dubious about, “Indiana Jones and the Server of Doom,” but I actually learned some things about low-level memory management within ColdFusion, and I can definitely say I’ve got something new to check out on production boxes when I get back to the office. I’ll post more on that later.

Also the session from Adobe where they were highlighting server administration in the upcoming ColdFusion 9 was fantastic. It’s almost like they took a list of the things which cost the most time when maintaining servers today, and created all-new functionality to make managing this much easier. I’m very excited by this, things that are being manually configured on many, many instances today will be able to be rapidly and widely deployed when ColdFusion 9 (Centaur) comes out.

As with all conferences there are going to be times when there’s not much directed at your skill level. This is because there are people of all walks here, from new developers to seasoned experts. I fall somewhere in between, and so I’m doing pretty well on finding good sessions overall.

Next up: Advanced Subversion Techniques (”Subversion for Smarties”).  Here’s hoping it’s not review =)

, ,

2 Comments

ColdFusion Ordered Struct

As most readers probably already know, in ColdFusion, structs are associatively keyed storage structures similar to an array but where you get to use a string to key an entry rather than only a sequential number.

PHP only has array() which acts both like ColdFusion’s array and struct both. You can numerically key arrays or associatively key them, or both. One of the reasons it can get away with this is that it preserves the insert order. So if you do:


The output will be the values in the same order they were put into the structure. If you do the same thing in ColdFusion, you’ll get it back in a seemingly random order (or depending on the version some times you’ll get back in alphabetical order):

Instead if you need to preserve insert order, you can use a similar Java object from ColdFusion:

You can treat this like a struct in every way, including <cfdump>ing it (though cfdump will not show you the insert order for some reason). As you iterate over it, the contents will always come back in the same order they were inserted.

It’s important to note that LinkedHashMap keys are also case-sensitive while ColdFusion Struct keys are case insensitive. This may cause undesired results as you might have two keys that you believe are the same but differ in case; this may cause collisions when working with other objects that are not case sensitive.

, ,

2 Comments

CF.Objective() Here I Come

Heading off to Minneapolis tomorrow morning for CF.Objective().  This is the first conference I’ve been to in a while.  Hoping we get to hear some about the next version of ColdFusion and the Bolt IDE (I’ve played with it some; I can’t say a lot, but I can say that it’s got some fanstastic features).

No Comments

ColdFusion Including Sub-Applications

Ben Nadel has an interesting question on his blog about including sub-applications from within an existing CF application, and having the relevant sub-level Application.cfc fire off.

This is doable in a fairly simple manner but which relies on a barely-documented feature of ColdFusion, and the fact that the sub-level Application.cfc fires is completely undocumented, and may even be unintentional!

Here’s how you can do it, but because we’re wandering into a pretty hazy gray area here, I wouldn’t go using this unless you don’t have much other choice.

Application.cfc:


Then you can do the below and if it has an Application.cfc, that application.cfc will be invoked.

The caveat though is that CGI scope will still contain the variables from the source page – for example, CGI.SCRIPT_NAME will still be the script name in the URL. As a result, context-sensitive functions like ExpandPath() will operate relative to the root file being called – meaning you might not get the results you’re expecting.

Also the code above will only work 1 sub-application level deep; you’d have to tweak it if you wanted a sub-application within a sub-application, but by that point, zounds, what are you doing man?!?

The good news is that even if the sub-application executes a <cfabort>, execution will return to the calling page, so that sub-app can’t abort your own page.

, ,

2 Comments

Relative CFLoop Performance for Various Loop Structures

Introduction

Jim over at Ben Nadel’s blog made the assertion that looping a list is faster than looping a struct.  It’s an interesting assertion that looping a list would be faster than looping an array.  I did a test of my own to find out.

Setup

Starting with objects with 1 entry populated, I increased the number of populated entries by 1,000 until I reached 100,001 entries in each of an Array, a List, and a Struct.

I also compared the following syntaxes:
<cfloop array=”myArray” index=”x”>
<cfloop from=”1″ to=”#ArrayLen(myArray)#” index=”x”>
<cfloop collection=”#myStruct#” item=”x”>
<cfloop from=”1″ to=”#StructCount(myStruct)#” index=”x”>
<cfloop list=”myList” index=”x”>
<cfloop from=”1″ to=”#ListLen(myList)#” index=”x”>

For each type of test, I did a <cfsavecontent> to cfoutput the relevant entries from each object without causing a ton of data to be sent back to the browser.  The actual <cfloop> and output was tight (all on one line to minimize the total content going to cfsavecontent).

I ran each incremental test 10 times to get an average duration.  I did this because my in my earliest tests it was obvious that garbage collection was causing significant variance (shorter tests often took longer than their longer brethren).  Even still it’s obvious garbage collection played a bigger factor in efficiency than the actual test results, making them somewhat difficult to decipher meaningfully.

Execution

Early on, I dropped the final test (which involved ListGetAt() for the output portion) since while the others were still taking a few milliseconds, this test was already taking multiple seconds.

It should be no surprise that 1 entry of each took no measurable time (getTickCount() had not incremented).

At 10,001 entries, <cfloop array> and <cfloop list> took 5 and 4 milliseconds respectively, but at 11,001 (the next level up), it was 3 and 7 milliseconds respectively – neck and neck, looking in favor of <cfloop array>.  <cfloop ArrayLen()> was taking 9 and 6 seconds, <cfloop collection> 21 and 17 seconds, and <cfloop StructCount()> waas taking 21 and 14.  At this level we could say <cfloop array> and <cfloop list> were close, while the others were not.

At 50,001 entries, <cfloop array> and <cfloop list> are still neck and neck.  They reported identical times of 11.00 milliseconds.  <cfloop ArrayLen()> weighed in at 56.10ms, <cfloop collection> at 126.2ms, and <cfloop StructCount()> at 91.2ms.

By the final test of 100,001 entries, <cfloop array> and <cfloop list> had finally started to show a real difference.  <cfloop array> pulled ahead as the clear winner, with at least a 10ms margin on all but one of the the 10 tests leading up to 100,001.

Cfloop Performance Comparison
(Click for full version)

So what causes such a performance difference?

<cfloop ListLen()> was the clear loser by a wide margin.  ListGetAt() is horribly inefficient.  Each time you call it, it has to consume the string from the start, counting delimiters until it reaches position-1, then continue consuming until it reaches position.  This is no surprise at all.

<cfloop collection> was the clear next-to-last-place loser.  The most likely reason for this to me is that internally it needs to loop over some other type of object (in this case, most likely an array of struct keys), then needs to do a lookup within the actual structure itself (a HashMap of some form).

<cfloop StructCount()> came next.  This was a bit of an non-real world scenario.  We didn’t have to loop over a collection of struct keys, we knew what the struct keys were and were able to perform the loop itself in a faster fashion than looking up keys.  But to output the value, we still needed to do HashMap lookups on the keys themselves.

<cfloop ArrayLen()> won the bronze.  Seems to me that this is because we have to do a positional lookup to output the value for each entry.  That shouldn’t be huge because internally this should be represented as a vector of pointers to the actual entries, and since each vector has a fixed memory size, we take ArrayMemoryStartPoint + position * sizeof(vector).  Well, would be nice if that’s the case, but it’s not quite that simple since these arrays are dynamically sized.  We actually have to do a bit of work with a table of memory references, blah blah blah.  Point is we have to do a lookup (albeit cheaper than a struct lookup) to find the correct entry.

<cfloop list> took the silver.  This makes sense, the iterator involved in the cfloop can remember where it left off, and each time through the loop only has to consume up to the next delimiter.  It knows what the index of the previous delimiter is, and so the only lookup it has to do at all is collecting the characters from offset A to offset B (which it could actually have kept track of while it was consuming).

<cfloop array> took the gold, but not by a large margin.  This also makes sense because it has the same sort of performance gain that <cfloop list> has (already knowing its current offset for the next time through the loop), and all it has to do is return a pointer to the next element in the vector.  The vector iterator handles this stuff at a lower level and is able to make the best use of its own internal structures to make this as fast as possible.

Conclusion

<cfloop array> and <cfloop list> are pretty interchangeable even though the former wins the contest.

All of this being said, we are still looking at loop overhead of less than 300 milliseconds even for the slowest (other than <cfloop ListLen()>) loop type for 100,000 records.  This is incredibly trivial overhead for that amount of data.  So in reality which type you use probably depends more on what structures you’re already using – trying to convert to a different data container for your loop will probably cost you as many or more cycles than using the native loop type associated with wherever your data currently is.

, , , ,

6 Comments

BOM – Is it part of the data?

This is a post in response to a comment at Ben Nadel’s blog by PaulH which I think is an interesting and important discussion, but sufficiently off-topic to the blog entry at hand that I didn’t want to completely derail the on-topic discussion.

Whereas initially BOM (Byte Order Marker U+FEFF) was intended to indicate the order of the bytes when dealing with systems which may have had little-endian or big-endian CPU’s, and so written characters in a byte order for which they are most suited while potentially interoperating with systems with opposite endianness. Their use in recent years has mostly been as a hint to byte stream -> character array string parsers as to the encoding of the string. For example, you can identify UTF-8 encoded files with a leading BOM because they will start with 0xEF 0xBB 0xBF (Unicode U+FEFF). UTF-16BE will start with 0xFE 0xFF, UTF-16LE will start with 0xFF 0xEF, and so on.

At a conceptual sense, byte order markers are not considered to be part of the data. In fact, U+FEFF is a Zero Width Non-Breaking Space, and is considered obviated within data because of its use as a BOM. You’re supposed to use U+2060 WORD JOINER instead of U+FEFF ZWNBSP. The fact that U+FEFF is a zero-width character is part of why it was chosen as BOM – if a system reads the encoding correctly but doesn’t know how to deal with BOM, it will be interpreted instead as an invisible character – exactly what we’re seeing with CFHTTP.

The problem in this case is that no character, not even zero-width characters are permitted before the processing instruction in an XML document. ColdFusion’s cfhttp function preserves the BOM as part of the data, while it’s xmlParse() function fails to handle it correctly. This is an inconsistency, and I suspect you may be able to start a holy war over which feature has the bug.

I Googled around for a while to see if I could find a source that said definitively whether a BOM is considered part of the Unicode data, or simply a hint which is intended to be dropped as part of the Unicode decoding (eg, we convert multiple bytes into single characters in the case of characters greater than U+00EF, likewise we consume the hint as a means of informing us how the file is encoded, and nothing else). This latter case has always been my understanding of it, and indeed this behavior is reinforced by many systems, including ColdFusion itself in some arenas (eg, reading a file with cffile that contains a leading BOM – the BOM will be discarded). Unfortunately other systems do seem to retain BOM, but it’s impossible to say for those systems whether this was a design decision or failure to address BOM at all as they would look the same to an outside observer.

I couldn’t find much in the way of a definitive statement for or against BOM being retained as part of the character array. The closest I could find is something you alluded to – XML 1.0 specifies that BOM is not considered to be part of the data, and should be used only to identify the endianness of the data being passed to it, and otherwise ignored. That wording is a bit ambiguous since in its original context, it’s talking about how to handle BOM at the start of a byte stream.

As to the applicability of this part of the XML standard to ColdFusion’s XML parser – ColdFusion’s XML parser isn’t dealing with a byte stream, it’s dealing with a character array. By the point we call xmlParse(), we have gotten past our need for the BOM (if we’ve already parsed the byte stream as characters, BOM can no longer affect what we do with those characters), so the XML standard on dealing with BOM no longer applies.

So this all comes down to: Is BOM part of the data or just metadata? Conceptually it’s part of the metadata, whereas the question is should it be preserved in the character array. I land firmly in the camp of it being purely metadata, and with it being desirable to discard it as part of character parsing. ColdFusion treats it as metadata in some instances and as part of the data in other instances, this inconsistency is where we see the original error.

Some software even provides an option as to whether or not BOM should be preserved as part of the data: http://www.webhostingsearch.com/blogs/richard/bom-byte-order-mark-in-biztalk-output/

In any event, CF should be consistent with how it handles BOM, and if it considers it as part of the data, then its string functions should consistently handle it as such (ie, xmlParse() should ignore it). If it considers it to be metadata, then it should always be discarded when parsing the byte stream into a character array. The fact that in some cases it discards BOM when parsing characters says to me that the design decision by Macrodobeia was to discard it when parsing characters, since someone had to have written code to this effect, but that it wasn’t applied consistently throughout.

, , ,

No Comments

CFThread and dividing up work

CFThread is a wonderful addition to ColdFusion 8. It lets you perform parallel actions within your code. However, parallel programming is a complex beast under the best of circumstances.

One of the early things to realize in CF8’s threading support is that it makes a deep copy of the local variables (ala Duplicate()) when you start the thread. In this way you have a lot of thread safety, you can access and change even variables defined outside the thread space, and you really have a copy of that variable local to your thread. You don’t have to worry about other threads changing the value while you’re using it, and you don’t even have to worry about goofing it up for the parent page (the page thread).

Dividing Up the Work
A common use for threading is to divide up a lot of work so that it can be done in parallel. For example, let’s say you have a script which aggregates RSS feeds from 1,000 external sites. All that HTTP stuff is pretty quiet work, you spend a lot of time waiting for responses and the like. It’s a good candidate for parallelizing the work.

With 1,000 requests to make, it’s not a good idea to just create 1,000 threads – you’ll tie up a lot of TCP connections on your server, plus you’ll use up a lot of memory (remember, each thread is going to operate in its own memory space). So let’s decide somewhat arbitrarily that we’re only going to run 10 requests in parallel.

// How many feeds will each thread handle?
eachThreadDoesCount = ceiling(ArrayLen(feedURLs) / parallelThreads);

threadID – 1) * eachThreadDoesCount + 1>
threadID * eachThreadDoesCount, ArrayLen(feedURLs))>

Looks pretty easy, eh? There’s a critical error there though which is not obvious even by Adobe’s documentation. I’ve bolded it for you. This will end up with the earliest URLs getting fetched numerous times, and the later URLs not getting fetched at all.

The reason for this has to do in some way with how ColdFusion initiates its threads under the hood. To me, it looks like when tag is encountered, it actually spawns an initial thread to do the local variable copying. All threads which are started before this initial thread finishes the copying will get an identical set of local variables – this includes the threadID variable used in the loop. I’m not certain if that’s actually what’s going on under the hood, but the behavior is similar as if that is the case. In any event, you cannot rely on variables changed by the parent page (”page thread” by Adobe’s documentation) appearing in your cfthreads, even if that change was made before your specific thread was launched, but after the initial thread was launched.

It seems bizarre, but let me give you a piece of sample code which demonstrates it. This code is non-deterministic for me – that is to say some times I see a “correct” result, but then I immediately refresh it and get a different result.


Here you should see each worker thread having a unique loopID. Instead, for example, my code shows Worker 1 has a loopID of 1, Worker 2 = 2, Worker 3 = 3, Worker 4 = 4, but Workers 5 through 20 have a loopID of 5. If I refresh, it’s different. I’ve seen all 20 workers starting with a loopID of 1, and I’ve seen the first fiew having a loopID of 1, then the next few having a loopID of 5, then the next few having a loopID of 8, etc.

So how do you safely determine which worker you are so you know which of the set of work you should be doing? The answer, as Ray posted in a comment (and contrary to my more elaborate work-around involving locking), is to use the attributes scope:


Basically the idea is you can pass additional custom attributes to and these show up as values in a structure named “attributes” available only within the scope of your thread.

,

2 Comments

ColdFusion 8.0.1 – Nested Array/Struct Shorthand

As you probably know, ColdFusion 8 gave us a long-needed shorthand for creating arrays and structures:


Unfortunately you couldn’t nest those constructs. With the 8.0.1 updater though, you now can:

This is fantastic when you’re trying to make configurable code – a chunk of code whose basic function is tweakable by updating a few settings variables at the top instead of hard-coding them in throughout the code (for example – URLs, filesystem paths, data sources, etc).

No Comments

Follow up to Real Time Command Execution Feedback Post

With ColdFuison 8.0.1 Adobe has introduced errorVariable and/or errorFile to the attributes of the tag. You can only use one of the two in the same tag.

This will give some insight once the tag has completed execution if there is an error. Before there was no way for CF to report errors to a file or to the browser.

No Comments

ColdFusion Preserve POST variable name case

ColdFusion provides access to POSTed data via the FORM structure. Unfortunately ColdFusion always upper-cases the names of these variables. Recently a chunk of code I was working on needed to know the original case of these keys. At first I worked on passing a hidden form field with the original field name text, but this bothered me as way too much of a work-around.

This chunk of code will give you a structure called FormContent where the case of the field names is preserved.

Unfortunately this code will not distinguish between two fields with the same name but different case. To do that, you’ll want to use a case-sensitive StructNew() alternative. I recommend CreateObject(”java”, “java.util.LinkedHashMap”).init(). This has the added value of preserving the order that the fields appeared in the calling form when you iterate over it for output. It has the disadvantage that you’ll have to properly match case of the keys when you retrieve them in your code.

4 Comments