Archive for category Programming

Merging GET & POST Data Leads to Sloppy Programming

Under typical circumstances when writing web applications, there are two ways that you can pass user data to the webserver: GET and POST, which line up with the HTTP verbs of the same names. Of course there are quite a few other verbs, but these two are used more than any others. It is common practice for some developers or framework authors to merge these two data sources into a single common structure and work purely out of that merged space. It’s my belief that this leads to sloppy programming practices and can have security implications for your code.

GET data is part of the URL, and is visible in the browser. It can also be bookmarked by the browser, and it can be retrieved from browser history. POST data is “invisible” to the user, and can’t be bookmarked or recalled from browser history.

To me, GET/URL ($_GET['key'] in PHP, URL.key in ColdFusion) is where you pass application variables such as categoryID, productID, fuseaction, etc. These are properties which are decided and created by the application. User input, on the other hand, should be submitted via POST ($_POST['key'] in PHP, FORM.key in ColdFusion).

There are a few situations where you want to pass user input on the URL, for example if you want to allow a user to bookmark, use browser history, or send another user a URL to a specific result. I do this a lot for reporting and searching functions. But unless you’re specifically attempting to provide this as a feature, user data should always be submitted via POST, especially if that page is not a read-only page.

If the page performs any action based on user input other than querying or formatting output, user data absolutely belongs in a POST. Consider this link (which I don’t advise you to click unless you either don’t read Slashdot or don’t mind a mild annoyance): http://slashdot.org/my/logout . This is a security function; it’s much easier to craft a URL and to trick someone into following that link (such as with a TinyURL link on Twitter or in a website comment) than to trick them into posting a form.

Merging your data from GET and POST removes your ability to control and only accept sensitive parameters from one scope or the other unless you break the paradigm of the merged scopes. It also means that as a developer you don’t have to think at all about where your data comes from. That’s a dangerous practice; you should always think very carefully about the path data has traversed before you interact with it. Some data can be trusted, while some data cannot be trusted.

Rather than defaulting to a vague data source for the convenience of the programmer, you should specifically choose the cases in which you want to accept input from multiple locations, and handle those explicitly. Your default mode of operation should always be restrictive, opening up only as much as necessary to support your application’s requirements.
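
As a rough illustration of that restrictive default, here is a minimal Java sketch. The RequestScopes class and its method names are hypothetical and not part of any framework; the sketch only shows the idea of keeping URL and form parameters in separate maps so the caller has to name the scope it expects.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: keeps URL (GET) and form (POST) parameters apart
// instead of merging them into one structure.
final class RequestScopes {
    private final Map<String, String> url;
    private final Map<String, String> form;

    RequestScopes(Map<String, String> url, Map<String, String> form) {
        this.url = Collections.unmodifiableMap(new HashMap<>(url));
        this.form = Collections.unmodifiableMap(new HashMap<>(form));
    }

    // Application variables (categoryID, productID, ...) are read from the URL only.
    String fromUrl(String key) {
        return url.get(key);
    }

    // User input that triggers an action is read from the form scope only.
    String fromForm(String key) {
        return form.get(key);
    }

    // Explicit opt-in for cases such as search or reporting where either scope
    // is acceptable. The form value wins when both are present.
    String fromEither(String key) {
        String value = form.get(key);
        return value != null ? value : url.get(key);
    }
}
```

With this shape, a crafted link that puts an action parameter on the URL is simply not seen by code that reads it with fromForm(), while a search page can still opt in to both scopes with fromEither().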


ColdFusion: XSS Vulnerability in SerializeJSON()

There is a minor vulnerability in ColdFusion’s SerializeJSON() method. ColdFusion fails to escape object keys correctly.

Here is a typical example of the expected way to use SerializeJSON():


The output of this is:

The bug is that object keys are not properly escaped, so if you have an object such as a Struct with a specially designed key, you can inject javascript code where it was not intended:


The result of which is:

As you can see, the generated javascript will be parsed by the browser successfully – except where we only intended to communicate data, we instead executed a function.

So what’s the danger?
The problem is that some users may wish to give easy access to GET and POST (URL/FORM) variables to their client-side javascript – maybe some of these parameters affect how you output data for example. If you just SerializeJSON() the URL or FORM variable, then you may unintentionally be allowing user-supplied data to execute in your page context, which can result in cookie stealing, malicious script injection, and other nasty things like that, by the user including javascript code as the name of a URL argument. Actually exploiting that takes some creativity due to ColdFusion automatically upper-casing URL keys, but that’s a one-time exercise which I’ll leave for the reader (sorry script kiddies).

SerializeJSON() should be safe: it should only create a JavaScript object which represents the data, and should not allow for script injection. The fix for Adobe would be incredibly simple; all they would have to do is escape object keys the same way they already escape output strings.
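
To make the suggested fix concrete, here is a minimal Java sketch of a flat-map serializer that runs keys through the same escaping routine as values. It is not ColdFusion's implementation; the JsonKeyEscaping class, its methods, and the choice to also encode angle brackets are assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical serializer for a flat map of strings.
final class JsonKeyEscaping {
    // Escapes a string for use inside a double-quoted JSON string literal.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    // Control characters and angle brackets become \\uXXXX escapes.
                    if (c < 0x20 || c == '<' || c == '>') {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    // Keys pass through the same escape() as values, so a key cannot
    // close its own quotes and continue as script.
    static String serialize(Map<String, String> data) {
        StringBuilder out = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : data.entrySet()) {
            if (!first) out.append(',');
            first = false;
            out.append('"').append(escape(e.getKey())).append("\":\"")
               .append(escape(e.getValue())).append('"');
        }
        return out.append('}').toString();
    }
}
```

A key that contains a double quote followed by a function call stays inside its string literal once the quote is escaped, which is the behavior the post argues SerializeJSON() should have.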


ColdFusion: SerializeJSON() Recursion Error

In ColdFusion 8, Adobe introduced a new function called SerializeJSON(), which takes a single object of just about any type and returns a JSON representation of that object and its properties.  This can include objects which are not native ColdFusion types such as a Java object, and it does a respectable job of figuring out values for this object to include in the JSON by automatically and recursively including the values of any zero-parameter methods which start with ‘get’ such as ‘getName(),’ ‘getAddress(),’ etc.

In theory this works well because for essentially no additional effort, you can send a JavaScript representation of this class down to the browser and interact with it there.

There are a number of shortcomings to this approach though, not the least of which is the inability to filter what properties of a Java object are exposed to the browser when serializing it.  Sometimes objects have sensitive data stored in them which you do not want to expose.  For example if you had a page which details a user’s profile, and you store the properties of this user in a Java bean, you cannot simply pass the same copy of the user object down to the browser through SerializeJSON() as you might be tempted, as this may contain sensitive values such as getPasswordHash() or getAdministrativeNotes() (something I use in  to keep private notes on users especially for when a user has a history of abusive behavior).

However there’s an outright bug in the serialization routine which is essentially unrecoverable if you encounter it.  We first discovered it at work when working with a Java enum data type; having a getter or property whose value was a Java enum will cause you to get a stack overflow exception.  For example:


If you create an instance of this object, you can see the error:

This actually fails at a level that a normal try/catch cannot recover from. The only way to see the actual error is to handle it from an Application.cfc onError() method. Here is the error:

The problem is that enums are, for the most part, Java classes which have public static final fields for each of the possible enum values. ColdFusion is attempting to serialize a public static final field which is of the same class as the object itself, and this ends up creating a circular reference as far as ColdFusion is concerned. It’s not actually a circular reference in the traditional sense; static class properties are not subject to reference counters and garbage collection – they are part of the permanent generation in Java. The “circular” reference is also created by the compiler and is an endemic aspect of enums.

As already stated, the problem is not limited to Java enums; that’s just the first place we noticed it. Common Java design patterns like the Singleton pattern will raise this error as well, if the singleton accessor begins with ‘get’ or the static instance is public:

What can be done to fix it? From an implementation perspective you would have to change your object interface to avoid the problem as it exists today in ColdFusion 8. Essentially you need to be sure that there are no getters or public static properties which may back-reference to the owning class. If you require such, you must put them behind a getter instead of a public static property, and you must also make sure that getter doesn’t start with ‘get’. Essentially you have to kludge the crap out of your Java objects.

From Adobe’s perspective they need to detect a recursion loop and avoid it. Alternatively they could provide a way to override the JSON output for a Java object (e.g. if they test for the existence of a toJSON() method and use its output instead of constructing the object properties themselves). Ideally they would do both. Although this latter approach would offload much of the work to individual developers, it would also give the developer a way to filter the properties which are output as part of the JSON representation of an object. The developer could even create flags on the object that allow them to select from several sets of specific properties as necessary (for example, setJSONPrivilege(PUBLIC | PRIVATE) ).
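
The recursion-detection idea can be sketched in Java as follows. This is not how ColdFusion walks objects internally; the Shade enum and the GetterWalker class are hypothetical, and the sketch uses a getter-based back-reference rather than the static-field case described above. It follows zero-argument public get methods and keeps an identity-based set of objects on the current path, so a back-reference is reported instead of followed.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical enum whose getter returns another constant of the same type,
// enough to send a naive getter-walking serializer into endless recursion.
enum Shade {
    LIGHT, DARK;
    public Shade getOpposite() { return this == LIGHT ? DARK : LIGHT; }
}

final class GetterWalker {
    static String describe(Object value) {
        Set<Object> onPath = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        return describe(value, onPath);
    }

    private static String describe(Object value, Set<Object> onPath) {
        if (value == null) return "null";
        if (value instanceof CharSequence || value instanceof Number || value instanceof Boolean) {
            return "\"" + value + "\"";
        }
        // add() returns false when this exact object is already being serialized.
        if (!onPath.add(value)) return "\"<cycle>\"";
        StringBuilder out = new StringBuilder("{");
        for (Method m : value.getClass().getDeclaredMethods()) {
            boolean getter = Modifier.isPublic(m.getModifiers())
                    && !Modifier.isStatic(m.getModifiers())
                    && m.getParameterCount() == 0
                    && m.getName().startsWith("get");
            if (!getter) continue;
            if (out.length() > 1) out.append(',');
            Object child;
            try {
                child = m.invoke(value);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
            out.append('"').append(m.getName().substring(3)).append("\":")
               .append(describe(child, onPath));
        }
        onPath.remove(value); // leaving this object: it is no longer on the path
        return out.append('}').toString();
    }
}
```

Without the onPath check, describe(Shade.LIGHT) would bounce between LIGHT and DARK until the stack overflowed. With it, the walk stops at the first repeated object.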


ColdFusion: Using Java Beans

A while back we were working on a huge new website in ColdFusion which was a rearchitecture of an extremely mature but very worn out code base. One of the biggest things we wanted to do was adopt a substantially more object oriented approach to development as the original site was started in the ColdFusion 4.5 days.

However we very quickly ran into the problem that many ColdFusion developers have faced (in fact there was even a session on this exact subject at CF.Objective() this year). ColdFusion objects have a substantial overhead to instantiation. Java programmers create hundreds or thousands of disposable objects for even fairly simple tasks. They create objects for things they don’t even realize they’re creating objects for (boxed Integers and Booleans, Strings, and the character arrays that back each String).

ColdFusion developers can’t be so cavalier with their object creation though. Of course under the surface of ColdFusion is its Java runtime, and so there are plenty of Java objects created under the hood, but when it comes to ColdFusion components, you really need to limit how many you create.

If you try a traditional bean approach to programming in ColdFusion (one data container object for each distinct thing you’re working with – eg one for each query row in a result set), you’ll discover that your application quickly crawls to a halt under any serious load. ColdFusion simply cannot afford the overhead of creating so many components.

There are two main approaches to solving this problem. One is aggressive use of caching of components, and sharing components between users. This is not really a bean approach, at best it could be considered “inspired by,” and it raises quite a few complications of its own, including increased memory footprint, and dangers introduced when one user is modifying a shared object while other users are using it.

The other (and the one I prefer, but it’s a bit less convenient) is to create your beans in Java. Here is a simple bean:

The advantage of this approach is that you can create tons and tons of disposable copies of this object with very little performance penalty. In fact it should be no more expensive than creating a Struct of properties – but you get the benefit of type safety and guaranteed properties. You can also add convenience functions like:


This way you can:

and get “A 1999 Chevy Cavalier. This was not my first car.” as your output.
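
A minimal sketch of such a bean in Java might look like the following. The Car class, its properties, and the describe() convenience method are assumed names chosen to match the output quoted above, not the original listing. The class is left package-private here to keep the sketch in one file; a bean loaded from ColdFusion would normally be declared public in its own file.

```java
// Hypothetical bean: a plain data container with typed, guaranteed properties.
final class Car {
    private int year;
    private String make;
    private String model;
    private boolean firstCar;

    public int getYear() { return year; }
    public void setYear(int year) { this.year = year; }

    public String getMake() { return make; }
    public void setMake(String make) { this.make = make; }

    public String getModel() { return model; }
    public void setModel(String model) { this.model = model; }

    public boolean isFirstCar() { return firstCar; }
    public void setFirstCar(boolean firstCar) { this.firstCar = firstCar; }

    // Convenience function built on top of the plain properties.
    public String describe() {
        return "A " + year + " " + make + " " + model + ". This was "
                + (firstCar ? "" : "not ") + "my first car.";
    }
}
```

Setting the year to 1999, the make to "Chevy", the model to "Cavalier", and firstCar to false makes describe() return the sentence quoted above.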

Coming soon:

  • How to build and bundle your beans
  • Using the Java enum construct for enumerable properties


One Man’s View is Another Man’s Data

I think it’s common for a developer to get the idea in his or her head that developing under an MVC (Model View Controller) paradigm is ultra cut and dry: There is one Model, one View, and one Controller for a given task. Within a given layer of the software stack this may often (or even always) be true.

However: all software produces something (in its view) which is consumed by the next higher tier of the software stack as data, and thus is only part of the model for that next stack. All software produces something to be consumed by something else. What is to you a view is to someone else data.

So it’s important to avoid thinking that there’s something special about your particular view (no matter where you are in the stack) – as if it’s somehow magical and the view. You’re never the end of the line, it doesn’t matter how far down the line you really are. The end of the line is the brain of the human who will ultimately consume this data. Even then they’re not the end of the line, they process that data, make decisions based on it, and produce some output of their own – whether it’s using it to produce new data for the next action within the application, taking that data elsewhere as an input to a different process, or storing it to memory for later use as data for a future process.

Let’s look at an example from web development. The PHP/ColdFusion/ASP.NET/Ruby/FOTM developers I know tend to think of themselves as the end of the line (and I have too to a substantial extent). They’re producing something to be consumed (as they see it) by the end user. This is the guy who gets to make the HCI (Human-Computer Interaction) interface that either creates a positive experience for the user, or a negative one. Everything past him just follows the instructions he produces. Bold this, outline that, send this data there. What’s harder to see is that he’s just following the instructions he was given by the user and producing data in a format that other software later in the stack requires of him.

What the web developer calls data is actually just a view provided by lower down in the stack (a database typically, which in turn gets a view of stuff from the filesystem which it calls data, etc). A web developer’s view (HTML typically) is just data to the web browser. The web browser’s view (graphical representation) is just data to the display driver. The display driver’s view (bits of light on a computer monitor) is just data to the user.

The software stack starts and ends with the user. If anything is sacrosanct, the ultimate MVC, it’s the user. But as I’ve already said, the user ends up just starting the cycle again, maybe she uses that data to feed the computer again, or maybe she uses that data for another purpose.

So don’t get caught up in “HTML is the view, XML/SOAP is data,” it’s all data to someone, and therefore it’s all a view to you. Don’t think there’s something special about one way to structure data vs another way to structure data. Finally don’t create different channels depending on what consumes your data. Use the same data channels (Model/Controller) and provide a different View. That is after all what the purpose of a View is all about – consumer agnosticism.
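
One way to picture “same channels, different View” is the small Java sketch below. The Product class and the two view functions are hypothetical, and escaping of special characters is left out to keep it short; the point is only that the model is fetched once and the consumer decides which view is applied.

```java
import java.util.function.Function;

// Hypothetical model object; the fields are illustrative.
final class Product {
    final String name;
    final int priceCents;

    Product(String name, int priceCents) {
        this.name = name;
        this.priceCents = priceCents;
    }
}

final class ProductViews {
    // Each view is just a function from the same model to one consumer's format.
    // Escaping of special characters is omitted to keep the sketch short.
    static final Function<Product, String> HTML =
            p -> "<li>" + p.name + " (" + p.priceCents + " cents)</li>";

    static final Function<Product, String> XML =
            p -> "<product name=\"" + p.name + "\" priceCents=\"" + p.priceCents + "\"/>";

    // The model and the code that loads it are identical for every consumer.
    static String render(Product product, Function<Product, String> view) {
        return view.apply(product);
    }
}
```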


CF.Objective() So Far

So far I’ve been to two really good sessions at CF.Objective(). The first I was dubious about, “Indiana Jones and the Server of Doom,” but I actually learned some things about low-level memory management within ColdFusion, and I can definitely say I’ve got something new to check out on production boxes when I get back to the office. I’ll post more on that later.

Also the session from Adobe where they were highlighting server administration in the upcoming ColdFusion 9 was fantastic. It’s almost like they took a list of the things which cost the most time when maintaining servers today, and created all-new functionality to make managing this much easier. I’m very excited by this, things that are being manually configured on many, many instances today will be able to be rapidly and widely deployed when ColdFusion 9 (Centaur) comes out.

As with all conferences there are going to be times when there’s not much directed at your skill level. This is because there are people of all walks here, from new developers to seasoned experts. I fall somewhere in between, and so I’m doing pretty well on finding good sessions overall.

Next up: Advanced Subversion Techniques (“Subversion for Smarties”).  Here’s hoping it’s not review =)


ColdFusion Ordered Struct

As most readers probably already know, in ColdFusion, structs are associatively keyed storage structures similar to an array but where you get to use a string to key an entry rather than only a sequential number.

PHP only has array(), which acts like both ColdFusion’s array and its struct. You can numerically key arrays or associatively key them, or both. One of the reasons it can get away with this is that it preserves the insert order. So if you do:


The output will be the values in the same order they were put into the structure. If you do the same thing in ColdFusion, you’ll get them back in a seemingly random order (or, depending on the version, sometimes in alphabetical order):

Instead if you need to preserve insert order, you can use a similar Java object from ColdFusion:

You can treat this like a struct in every way, including <cfdump>ing it (though cfdump will not show you the insert order for some reason). As you iterate over it, the contents will always come back in the same order they were inserted.

It’s important to note that LinkedHashMap keys are also case-sensitive while ColdFusion Struct keys are case insensitive. This may cause undesired results as you might have two keys that you believe are the same but differ in case; this may cause collisions when working with other objects that are not case sensitive.
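
Both properties are easy to see from the Java side. The sketch below uses java.util.LinkedHashMap directly; the OrderedStructDemo class and the sample keys are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OrderedStructDemo {
    // Builds a map whose iteration order is the insertion order.
    static Map<String, String> build() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("zebra", "1");
        m.put("apple", "2");
        m.put("Apple", "3"); // a distinct key: LinkedHashMap is case-sensitive
        m.put("mango", "4");
        return m;
    }

    // Copies the keys out in iteration order.
    static List<String> keysInOrder(Map<String, String> m) {
        return new ArrayList<>(m.keySet());
    }
}
```

Iterating the result yields zebra, apple, Apple, mango in that order, and "apple" and "Apple" remain two separate entries, which is where the collisions mentioned above can come from once the data is handed to something case-insensitive.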


CF.Objective() Here I Come

Heading off to Minneapolis tomorrow morning for CF.Objective().  This is the first conference I’ve been to in a while.  Hoping we get to hear some about the next version of ColdFusion and the Bolt IDE (I’ve played with it some; I can’t say a lot, but I can say that it’s got some fantastic features).


ColdFusion Including Sub-Applications

Ben Nadel has an interesting question on his blog about including sub-applications from within an existing CF application, and having the relevant sub-level Application.cfc fire off.

This is doable in a fairly simple manner, but it relies on a barely-documented feature of ColdFusion; the fact that the sub-level Application.cfc fires is completely undocumented, and may even be unintentional!

Here’s how you can do it, but because we’re wandering into a pretty hazy gray area here, I wouldn’t go using this unless you don’t have much other choice.

Application.cfc:


Then you can do the below and if it has an Application.cfc, that application.cfc will be invoked.

The caveat though is that CGI scope will still contain the variables from the source page – for example, CGI.SCRIPT_NAME will still be the script name in the URL. As a result, context-sensitive functions like ExpandPath() will operate relative to the root file being called – meaning you might not get the results you’re expecting.

Also the code above will only work 1 sub-application level deep; you’d have to tweak it if you wanted a sub-application within a sub-application, but by that point, zounds, what are you doing man?!?

The good news is that even if the sub-application executes a <cfabort>, execution will return to the calling page, so that sub-app can’t abort your own page.


Relative CFLoop Performance for Various Loop Structures

Introduction

Jim over at Ben Nadel’s blog made the assertion that looping a list is faster than looping a struct.  It’s an interesting assertion that looping a list would be faster than looping an array.  I did a test of my own to find out.

Setup

Starting with objects with 1 entry populated, I increased the number of populated entries by 1,000 until I reached 100,001 entries in each of an Array, a List, and a Struct.

I also compared the following syntaxes:
<cfloop array="#myArray#" index="x">
<cfloop from="1" to="#ArrayLen(myArray)#" index="x">
<cfloop collection="#myStruct#" item="x">
<cfloop from="1" to="#StructCount(myStruct)#" index="x">
<cfloop list="#myList#" index="x">
<cfloop from="1" to="#ListLen(myList)#" index="x">

For each type of test, I did a <cfsavecontent> to cfoutput the relevant entries from each object without causing a ton of data to be sent back to the browser.  The actual <cfloop> and output was tight (all on one line to minimize the total content going to cfsavecontent).

I ran each incremental test 10 times to get an average duration.  I did this because in my earliest tests it was obvious that garbage collection was causing significant variance (shorter tests often took longer than their longer brethren).  Even so, it’s obvious garbage collection had a bigger effect on the measured times than the loops themselves, making the results somewhat difficult to decipher meaningfully.

Execution

Early on, I dropped the final test (which involved ListGetAt() for the output portion) since while the others were still taking a few milliseconds, this test was already taking multiple seconds.

It should be no surprise that 1 entry of each took no measurable time (getTickCount() had not incremented).

At 10,001 entries, <cfloop array> and <cfloop list> took 5 and 4 milliseconds respectively, but at 11,001 (the next level up), it was 3 and 7 milliseconds respectively – neck and neck, leaning in favor of <cfloop array>.  <cfloop ArrayLen()> was taking 9 and 6 milliseconds, <cfloop collection> 21 and 17 milliseconds, and <cfloop StructCount()> was taking 21 and 14.  At this level we could say <cfloop array> and <cfloop list> were close, while the others were not.

At 50,001 entries, <cfloop array> and <cfloop list> are still neck and neck.  They reported identical times of 11.00 milliseconds.  <cfloop ArrayLen()> weighed in at 56.10ms, <cfloop collection> at 126.2ms, and <cfloop StructCount()> at 91.2ms.

By the final test of 100,001 entries, <cfloop array> and <cfloop list> had finally started to show a real difference.  <cfloop array> pulled ahead as the clear winner, with at least a 10ms margin on all but one of the 10 tests leading up to 100,001.

Cfloop Performance Comparison

So what causes such a performance difference?

<cfloop ListLen()> was the clear loser by a wide margin.  ListGetAt() is horribly inefficient.  Each time you call it, it has to consume the string from the start, counting delimiters until it reaches position-1, then continue consuming until it reaches position.  This is no surprise at all.

<cfloop collection> was the clear next-to-last-place loser.  The most likely reason for this to me is that internally it needs to loop over some other type of object (in this case, most likely an array of struct keys), then needs to do a lookup within the actual structure itself (a HashMap of some form).

<cfloop StructCount()> came next.  This was a bit of a non-real-world scenario.  We didn’t have to loop over a collection of struct keys; we knew what the struct keys were and were able to perform the loop itself in a faster fashion than looking up keys.  But to output the value, we still needed to do HashMap lookups on the keys themselves.

<cfloop ArrayLen()> won the bronze.  Seems to me that this is because we have to do a positional lookup to output the value for each entry.  That shouldn’t be huge because internally this should be represented as a vector of pointers to the actual entries, and since each pointer has a fixed memory size, we take ArrayMemoryStartPoint + position * sizeof(pointer).  Well, it would be nice if that were the case, but it’s not quite that simple since these arrays are dynamically sized.  We actually have to do a bit of work with a table of memory references, blah blah blah.  Point is we have to do a lookup (albeit cheaper than a struct lookup) to find the correct entry.

<cfloop list> took the silver.  This makes sense, the iterator involved in the cfloop can remember where it left off, and each time through the loop only has to consume up to the next delimiter.  It knows what the index of the previous delimiter is, and so the only lookup it has to do at all is collecting the characters from offset A to offset B (which it could actually have kept track of while it was consuming).
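
The gap between positional lookup and an iterator that remembers its offset can be sketched in Java. This is not ColdFusion's implementation of ListGetAt() or <cfloop list>; the ListScanCost class is hypothetical and, unlike ColdFusion lists, it does not skip empty elements. Each method counts the characters it examines so the two strategies can be compared.

```java
import java.util.ArrayList;
import java.util.List;

final class ListScanCost {
    // Positional lookup in the style of ListGetAt(): rescans from the start of the
    // string on every call. 'position' is 1-based; visited[0] accumulates the
    // number of characters examined.
    static String getAt(String list, int position, char delim, long[] visited) {
        int current = 1;
        int start = 0;
        for (int i = 0; i <= list.length(); i++) {
            if (i < list.length()) visited[0]++;
            if (i == list.length() || list.charAt(i) == delim) {
                if (current == position) return list.substring(start, i);
                current++;
                start = i + 1;
            }
        }
        throw new IndexOutOfBoundsException("position " + position);
    }

    // Iterator style: a single pass that remembers where the previous element ended.
    static List<String> iterate(String list, char delim, long[] visited) {
        List<String> out = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= list.length(); i++) {
            if (i < list.length()) visited[0]++;
            if (i == list.length() || list.charAt(i) == delim) {
                out.add(list.substring(start, i));
                start = i + 1;
            }
        }
        return out;
    }
}
```

For the three-element list "a,b,c", fetching positions 1 through 3 with getAt() examines 2 + 4 + 5 = 11 characters, while iterate() examines 5. The positional total grows roughly with the square of the list length, which is consistent with the ListLen()/ListGetAt() test taking seconds while the others took milliseconds.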

<cfloop array> took the gold, but not by a large margin.  This also makes sense because it has the same sort of performance gain that <cfloop list> has (already knowing its current offset for the next time through the loop), and all it has to do is return a pointer to the next element in the vector.  The vector iterator handles this stuff at a lower level and is able to make the best use of its own internal structures to make this as fast as possible.

Conclusion

<cfloop array> and <cfloop list> are pretty interchangeable even though the former wins the contest.

All of this being said, we are still looking at loop overhead of less than 300 milliseconds even for the slowest (other than <cfloop ListLen()>) loop type for 100,000 records.  This is incredibly trivial overhead for that amount of data.  So in reality which type you use probably depends more on what structures you’re already using – trying to convert to a different data container for your loop will probably cost you as many or more cycles than using the native loop type associated with wherever your data currently is.
