2009/11/21

Segmentation fault on Ubuntu 9.10 Server under Windows 7 x64 Virtual PC

I have been using Ubuntu since version 8.10 Intrepid Ibex and I was anxiously awaiting the release of Ubuntu 9.10 Karmic Koala a few weeks ago. In previous versions of Ubuntu it was a nightmare to get it running under Microsoft virtual environments (e.g. Virtual Server, Virtual PC 2004, Virtual PC 2007 and so on). Problems with screen resolutions, bouncing mouse cursors and skewing clocks were common and somewhat hard to solve for the novice Linux user I was at the time.

The fact is that I tried the Ubuntu 9.10 Server beta, some weeks before the final version was released, under Virtual PC 2007 on Windows Vista Ultimate x64 on my desktop computer. When the bare-bones LAMP server was installed, I logged in and installed gnome-desktop, crossing my fingers. I was gladly surprised that everything worked fine right after the reboot: no screen flickering, no bouncing mouse, all OK. Great. It was still the beta, but it was a promising start.

When the final version of Ubuntu Server 9.10 was released, I downloaded the ISO and tried to install it on my laptop: Windows 7 x64 with Virtual PC, the one shipped with Windows 7, not the Virtual PC 2007 that you must use on Vista. All my expectations fell helplessly into the mud.

Everything seemed to be fine when the installer told me to reboot the system for the first time:

Installation is complete

I rebooted the virtual machine and … oops… segmentation fault. What? I rebooted once again, and got the same error: segmentation fault. Sometimes the virtual machine window simply closed; if not, the console showed me the same error: Segmentation fault, and garbage all over the screen.

Segmentation fault 1

Segmentation fault 2

Segmentation fault 3

There was no way I could run Ubuntu 9.10 Server under Virtual PC on Windows 7 x64. I tried various install configurations (LAMP, DNS, nothing at all) with different RAM sizes, and I even tried changing some settings in the guest BIOS, without any luck. In all cases, when the machine booted, I got the segmentation fault error.

I read some documentation about segmentation faults in general and found that they happen when the code being executed tries to read or write a memory location that it should not, or an invalid memory address.

It sounded to me like something dealing with Data Execution Prevention, or DEP. You can find those settings in Windows 7 under System Properties –> Advanced Options –> Performance settings –> Data Execution Prevention.

I tried to disable Data Execution Prevention for %windir%\system32\vpc.exe (the executable file for Virtual PC), but since it is a 64-bit system I got an error message: You cannot set DEP attributes on 64 bit executables. No luck this way either.

According to Microsoft about Data Execution Prevention:

32-bit versions of Windows Server 2003 with Service Pack 1 utilize the no-execute page-protection (NX) processor feature as defined by AMD or the Execute Disable bit (XD) feature as defined by Intel. In order to use these processor features, the processor must be running in Physical Address Extension (PAE) mode. The 64-bit versions of Windows use the NX or XD processor feature on 64-bit extensions processors and certain values of the access rights page table entry (PTE) field on IPF processors.

An XD processor feature? Umhhh, my BIOS (the laptop, the physical one) had such a thing… My laptop is a Dell Vostro 1700 and it has a setting called CPU XD Support. Why not try to disable it? I rebooted to check that setting and saw that it was Enabled (by default). Just to do one more test, I disabled it and restarted.

CPU XD (Execute Disable) Support

I then started the Ubuntu 9.10 Server virtual machine and… it worked!!! I was even able to install gnome-desktop, and everything worked just as it had with Windows Vista on my desktop computer.

But is it safe to disable such a feature for the whole system, just to be able to try and play with Ubuntu in a VM sometimes? I suppose not. So I rebooted and set the value back to Enabled (the default).

There must be something wrong with either Ubuntu or Virtual PC. Maybe Ubuntu is trying to execute certain memory addresses that are code for the guest, but data for the host. I don't know.

At least I have found a workaround for the problem. Whenever I want to test something in Ubuntu, it costs me a reboot, a change of the CPU XD Support value in the BIOS and a restart… ah, and another reboot afterwards to change it back to the safe value.

If someone else finds a better workaround for this problem, I am willing to hear about it!

2009/10/30

Windows 7: Disable builtin DHCP server for “Internal network” in Virtual PC

I recently installed Windows 7 and I had been waiting for the final release of XP Mode and Virtual PC, which took place last October 22nd. I previously had (on Windows Vista, using Virtual PC 2007) a virtual domain composed of virtual machines such as:

  • server2003: a domain controller and DHCP server, with fixed IP address, connected to the “internal network” of Virtual PC.
  • isa2006: with two interfaces (dual homed), one connected to the physical host network adapter (for connecting to the internet), the other one connected to the “internal network”. Both IPs are manually set.
  • sql2008: the database server for the tests with this virtual domain, IP address assigned dynamically through DHCP.
  • vs2008xp: a Windows XP with Visual Studio 2008, belonging to the domain for testing and developing, IP configured through DHCP (that should be handled by server2003).

With such a testing environment, all traffic that should go to/from the internet passes through isa2006. If isa2006 is not running (for instance), the virtual domain is isolated and the virtual machines can only see each other (members of the domain).

This was the scenario that I had configured on my old Vista using Virtual PC 2007, and I wanted to reuse the .vhd files so that I would not need to rebuild the playground from scratch again.

It was quite simple: I just recreated every single virtual machine using the wizard, and when asked for the hard disk, I selected the existing one instead of creating an empty one. Then, when each machine was first started, I reinstalled the Virtual Machine Additions (now called Integration Components), and after a couple of restarts everything seemed to be working… but it only seemed.

Then I realized that sql2008 and vs2008xp (both configured to use dynamic IPs through DHCP) could not browse the internet, nor ping any other server in the domain. They were using the “Internal network”, but their IP addresses were not assigned by the DHCP running in server2003, since they were not in the expected range/mask.

After Googling for a while I learned that Virtual PC has its own builtin DHCP server and it seems it is (incorrectly) enabled for the “Internal network”. Fortunately there is a fix for it:

  1. Turn off or hibernate all your running Virtual Machines.
  2. From the Task manager, kill vpc.exe if it does not exit on its own.
  3. Edit "%localappdata%\microsoft\Windows Virtual PC\options.xml"
  4. Search for the “Internal network” section and, inside its <dhcp> section, disable it: <enabled type="boolean">false</enabled>, then save the file (a fragment is sketched after this list). You can keep a backup of the original xml file just in case.
  5. Turn on your VMs and verify everything runs as expected.
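For reference, the fragment of options.xml you end up with looks roughly like this. Only the <dhcp> and <enabled> elements and the network name come from the steps above; the surrounding element and attribute names are illustrative and may differ in your file:

<virtual_network name="Internal Network">
    <dhcp>
        <enabled type="boolean">false</enabled>
    </dhcp>
</virtual_network>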

2009/10/11

URL Canonicalization with 301 redirects for ASP.NET

There are lots of pages talking about the benefits of canonicalization (c14n for short). It is commonly agreed that it is just a set of rules for having our pages indexed in the most standardized, simplified and optimal way possible. This allows us to consolidate our PageRank instead of having it spread among all the possible ways of writing the URL for a particular page. In this post we will cover some canonicalization cases and their implementation for an IIS server running ASP.NET.

These different cases include:

  • Secure versus non secure versions of a page: Are http://www.example.com and https://www.example.com the same?
  • Upper and lowercase characters in the URL: Are ~/Default.aspx, ~/default.aspx and ~/DeFaUlT.aspx the same page?
  • www versus non-www domain: Do http://example.com and http://www.example.com return the same contents?
  • Parameters in the QueryString: Should ~/page.aspx?a=123&b=987 and ~/page.aspx?b=987&a=123 be considered the same? Are we handling bogus parameters? What happens if someone links us with a parameter that is not expected/used such as ~/page.aspx?useless=33 ?
  • Percent encoding: Do ~/page.aspx?p=d%0Fa and ~/page.aspx?p=d%0fa return the same page?

If your answer is yes in all cases, you should keep on reading. If you answer yes only in some cases, this post will be interesting for you anyway; you can skip the points that do not apply to your scenario by just commenting out some lines of code, or modify them to match your needs. A sample VS2008 website project with full VB source code is available for download.

In our sample code we will be following these assumptions:

  • We prefer the non-secure version over the secure version, except for some particular (secure) paths: If we receive an https request from a non-authenticated user for a page that should not be served as secure, we will do a 301 redirect to the same requested URL but without the secure ‘s’.
  • We will prefer lowercase for all the URLs: If we receive a request that contains any uppercase character (parameter names and their values are not considered), we will do a permanent 301 redirect to the lowercase variant of the URL being requested.
  • www vs. non-www should be handled by creating a new website in IIS for the non-www version and placing there a 301 redirect to the www version. This case is not covered by our code in ASP.NET since it only needs some IIS configuration work.
  • The parameters must be alphabetically ordered: If we receive a request for ~/page.aspx?b=987&a=123, we will do a permanent redirect to ~/page.aspx?a=123&b=987, since alphabetically a comes before b. Regarding lower and uppercase variants either in the name of a parameter or in its value, we will consider them as different pages; in other words, no redirect will be done if the name of a QueryString parameter is found in upper/mixed/lowercase. The same applies to the values of those parameters: ~/page.aspx?a=3T, ~/page.aspx?A=3T and ~/page.aspx?a=3t will be considered different pages, and no redirect will be done. In pages that accept parameters, extra coding must be done to check that no parameters other than the allowed ones are used.
  • We will prefer percent-encoded characters in their uppercase variant; for that reason %2f, for instance, will be redirected to %2F wherever it appears in the value of any parameter. This way we follow RFC 3986, which states:
    Although host is case-insensitive, producers and normalizers should use lowercase for registered names and hexadecimal addresses for the sake of uniformity, while only using uppercase letters for percent-encodings.
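To make the rules above concrete, here is a minimal sketch of how they could be wired into Global.asax. It is not the code of the downloadable project: the /secure/ path prefix is an assumption for illustration, and the percent-encoding rule (%2f to %2F) is left out for brevity.

Sub Application_BeginRequest(ByVal sender As Object, ByVal e As EventArgs)
    Dim url As Uri = Request.Url
    Dim path As String = url.AbsolutePath
    Dim lowerPath As String = path.ToLowerInvariant()

    ' Alphabetically sort the raw name=value pairs; encoding is preserved and
    ' parameter names and values keep their original case, as described above.
    Dim rawQuery As String = url.Query.TrimStart("?"c)
    Dim sortedQuery As String = rawQuery
    If rawQuery.Length > 0 Then
        Dim pairs As String() = rawQuery.Split("&"c)
        Array.Sort(pairs, StringComparer.Ordinal)
        sortedQuery = String.Join("&", pairs)
    End If

    ' Prefer http, except for paths that must stay secure (illustrative prefix).
    Dim mustBeSecure As Boolean = lowerPath.StartsWith("/secure/")
    Dim scheme As String = If(mustBeSecure, "https", "http")

    Dim canonical As String = scheme & "://" & url.Host & lowerPath
    If sortedQuery.Length > 0 Then canonical &= "?" & sortedQuery

    Dim requested As String = url.Scheme & "://" & url.Host & path
    If rawQuery.Length > 0 Then requested &= "?" & rawQuery

    If Not String.Equals(canonical, requested, StringComparison.Ordinal) Then
        Response.Clear()
        Response.StatusCode = 301                    ' Moved Permanently
        Response.AddHeader("Location", canonical)
        Response.End()
    End If
End Sub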

<link rel="canonical" …>

In February 2009 Google announced through their Google Webmaster Central Blog a way for you to explicitly declare your preferred canonical version of every page (see Specify your canonical). By simply adding a <link> tag inside the <head> section of your pages, you can tell spiders the way you prefer them to index your content: the canonical, preferred way. This helps to concentrate the GoogleJuice on that particular canonical URL from any other URL version or variation pointing to it in this way (the link rel=canonical way). This very same method was later adopted by Ask.com, Microsoft Live Search and Yahoo!, so it can be considered a de facto standard.

We will adopt this relatively new feature in our sample code. Most of the time we will be using permanent 301 redirects, but there may be cases where you do not want to do a redirect and simply return the requested page as is (with no redirection), returning the canonical URL as a hint for search engines. Whenever we receive a request for a page that includes bogus parameters in the query string, we will handle the request as a normal one but discard the useless parameters when calculating the link rel=canonical version of the page.

In particular, if you are using Google AdWords, your landing pages will be hit with an additional parameter called gclid that is used for AdWords auto-tagging. We do not want to handle those requests differently, nor treat them as errors in any way. We will only discard the unknown parameters when creating the rel=canonical URL for any request.
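As a sketch of the tag generation itself (the class name and the knownParams list are illustrative, and the page is assumed to have <head runat="server">), a page can emit its own canonical link and silently drop anything it does not recognize, gclid included:

Imports System.Collections.Generic
Imports System.Web.UI.HtmlControls

Partial Class _default
    Inherits System.Web.UI.Page

    Protected Sub Page_Load(ByVal sender As Object, ByVal e As EventArgs) Handles Me.Load
        ' Only the parameters this page really uses take part in the canonical URL.
        Dim knownParams As String() = {"a", "b"}
        Dim parts As New List(Of String)
        For Each key As String In knownParams
            If Request.QueryString(key) IsNot Nothing Then
                parts.Add(key & "=" & Server.UrlEncode(Request.QueryString(key)))
            End If
        Next

        Dim canonical As String = "http://" & Request.Url.Host & _
                                  Request.Url.AbsolutePath.ToLowerInvariant()
        If parts.Count > 0 Then canonical &= "?" & String.Join("&", parts.ToArray())

        ' Add <link rel="canonical" href="..." /> to the <head> section.
        Dim canonicalLink As New HtmlLink()
        canonicalLink.Href = canonical
        canonicalLink.Attributes("rel") = "canonical"
        Page.Header.Controls.Add(canonicalLink)
    End Sub
End Class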

Related links.

Internet Information Services IIS optimization
Are upper- and lower-case URLs the same page?
Google Indexing my Home Page as https://. 
http:// and https:// - Duplicate Content? 
SEO advice: url canonicalization

Q: What about https and http versions? I have a site is indexed for https, in place of http. I am sure this too is a form of canonical URIs and how do you suggest we go about it?
A: Google can crawl https just fine, but I might lean toward doing a 301 redirect to the http version (assuming that e.g. the browser doesn’t support cookies, which Googlebot doesn’t).

Specify your canonical

Keywords.

canonicalization, seo, optimization, link, rel, canonical, c14n, asp.net, http vs. https, uppercase vs. lowercase

2009/08/30

Automatic generation of META tags for ASP.NET

Some of the well known tags commonly used in SEO are the three following meta tags:  meta title tag, meta keywords tag and meta description tag:

<meta name="title" content="title goes here" /> 
<meta name="keywords" content="keywords, for, the, page, go, here"/> 
<meta name="description" content="Here you will find a textual description of the page" />

A lot has been written about the benefits of using them, and almost as much saying that they are no longer considered by search engines. Anyway, no matter whether they are used or not in the calculation of SERPs (Search Engine Results Pages), nobody disputes the benefits of having them correctly set on all your pages. At least meta description tags are somehow considered by Google, since Google Webmaster Tools warns you about pages with duplicate meta descriptions:

Differentiate the descriptions for different pages. Using identical or similar descriptions on every page of a site isn't very helpful when individual pages appear in the web results. In these cases we're less likely to display the boilerplate text. Wherever possible, create descriptions that accurately describe the specific page. [...]

Download the VB project code

The question is not “should I use meta tags in my pages?”; the real questions (and here comes the problem) are “how can I manage to create individual meta descriptions for all my pages?” and “how can I automate the process of creating meta keywords?”. That would be too much work (or too much technical work) for you (or your users, if they create content on their own).

For instance, consider a CMS (Content Management System) in which users are prompted for some fields to create a new entry. In the simplest form, the CMS can ask the user to enter a title and content for the new entry. In advanced-user mode, the CMS could also ask the user to suggest some keywords, but the user will probably enter just two, three or four words (if any). The CMS needs a way to automatically guess and suggest a default set of meta keywords based on the content before the new entry is finally saved. Those could be checked, and eventually completed, by the user, and then accepted. Meta title and meta description are much easier, but they will also be covered in our code.

In our sample VB project we will not suggest keywords for the user to confirm; we will just calculate them on the fly and set them without user intervention. We will use a dummy VirtualPathProvider that overrides the GetFile function in order to retrieve the virtualPath file from the real file system, so it is not a real VirtualPathProvider in the full sense, just a wrapper to take control of the files being served to ASP.NET before they are actually compiled. A VirtualPathProvider is commonly used to seamlessly integrate path URLs with databases or any other source of data rather than the file system itself. Our custom class inheriting from VirtualPathProvider will be called FileWrapperPathProvider. In our case it will not use the full potential of VirtualPathProviders, since we will only retrieve the data from the file system, make minor changes to the source code on the fly and return it to be compiled. This introduces a bit of overhead and some extra CPU cycles before the compilation of the pages, but that will only happen once, until the file needs to be compiled again (because the underlying file has changed, for instance).

Our FileWrapperPathProvider.GetFile function will return a FileWrapperVirtualFile whenever the virtualPath requested meets the conditions of the IsPathVirtual function: the file extension is .aspx or .aspx.vb and the path of the requested URL follows the scheme ~/xx/, that is to say, under a folder of two characters (for the language: ~/en/, ~/de/, ~/fr/, ~/es/, …). Otherwise, it will return a VirtualFile handled by the previously registered VirtualPathProvider; i.e. none, or the file system itself, without any change.

We have chosen to use a VirtualPathProvider wrapper around the real file system just to show what kind of things can be done with that class. If your data is on a database instead of static files, you will probably be using your own VirtualPathProvider, and in that case it will work by virtualizing the path being requested and retrieving the file contents from the database instead of the filesystem. Whichever the case, you can adapt it to your scenario in order to make use of the idea that we will illustrate in this post.

The idea is somewhat twisted or cumbersome:

  1. Parse the code-behind file for the page being requested (the .aspx.vb file) and, using regular expressions (regex), replace the base class so that the page no longer inherits from System.Web.UI.Page but from System_Web_UI_ProxyPage instead (a custom class of our own). This proxy page class declares public MetaTitle, MetaDescription and MetaKeywords properties and links them to the underlying meta title, meta description and meta keywords declared inside the head tag of the masterpage. When a page inherits from our System_Web_UI_ProxyPage, it exposes those 3 properties so they can be easily set. See System_Web_UI_ProxyPage.OnLoad in our sample project.
  2. Read and parse the .aspx file linked to the former .aspx.vb file (the same name without the .vb) and call the JAGBarcelo.MetasGen.GuessMetasFromString method, which does the main job with the file contents. See the FileWrapperVirtualFile.Open function in the sample project.
  3. Besides changing the base class to our own, we add some lines to create (or extend) the Page_Init method in that .aspx.vb file. In those few lines of code added on the fly we set the three properties exposed by the System_Web_UI_ProxyPage class that we have just calculated.
  4. Return the Stream as output of the VirtualFile.Open function with the modified contents so that it can be compiled by the ASP.NET engine, based on the underlying real file, using the previously calculated meta title, meta keywords and meta description (see the sketch after this list). Note that this is done in memory; the actual file system is not written to at any time. The real files (.aspx.vb and .aspx) are read, parsed, and the virtual contents are created on the fly and handed to ASP.NET. You need to be really careful, since you can run into compile-time errors in places that will be hard to understand, because the file system version of the files is the base content, but not the actual content being compiled.
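The core of steps 1 and 3 is just text manipulation over the code-behind source. This is a simplified sketch of the idea (not the FileWrapperVirtualFile code itself); it assumes the code-behind contains a single class ending in End Class, has no Page_Init of its own, and that the meta values are single-line strings:

Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions

Public Class CodeBehindRewriter
    Public Shared Function OpenModifiedCodeBehind(ByVal physicalPath As String, _
            ByVal metaTitle As String, ByVal metaKeywords As String, _
            ByVal metaDescription As String) As Stream

        Dim source As String = File.ReadAllText(physicalPath)

        ' Step 1: inherit from the proxy page instead of System.Web.UI.Page.
        source = Regex.Replace(source, _
            "Inherits\s+System\.Web\.UI\.Page", _
            "Inherits System_Web_UI_ProxyPage")

        ' Step 3: inject a Page_Init that sets the three proxy properties.
        Dim injected As String = _
            "    Protected Sub Page_Init(ByVal sender As Object, ByVal e As EventArgs) Handles Me.Init" & vbCrLf & _
            "        Me.MetaTitle = """ & metaTitle.Replace("""", """""") & """" & vbCrLf & _
            "        Me.MetaKeywords = """ & metaKeywords.Replace("""", """""") & """" & vbCrLf & _
            "        Me.MetaDescription = """ & metaDescription.Replace("""", """""") & """" & vbCrLf & _
            "    End Sub" & vbCrLf & _
            "End Class"
        source = source.Replace("End Class", injected)

        ' The file on disk is never written to; ASP.NET compiles this in-memory copy.
        Return New MemoryStream(Encoding.UTF8.GetBytes(source))
    End Function
End Class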

The way we calculate the metas in JAGBarcelo.MetasGen.GuessMetasFromString is:

  1. Select the proper set of noise words depending on the language of the text.
  2. Look for the content inside the whole text. It must be inside ContentPlaceHolders (we will assume you are using masterpages), and we will look for the particular ContentPlaceHolder that contains the main body/contents of the page. Change the LookForThisContentPlaceHolder constant inside the MetasGen.vb file in order to customise it for your own masterpage's ContentPlaceHolder names.
  3. Calculate the meta title as the text within the first <h1> tags right after the searched ContentPlaceHolder.
  4. Iterate through the rest of the content, counting word occurrences and two-word phrases occurrences, discarding noise words for the given language.
  5. Calculate the keywords, creating a string that will be filled with the most frequent single-word occurrences (up to 190 characters), and two-word most frequent occurrences (up to 250 characters in total).
  6. Calculate the description, concatenating previously parsed content to create a string of between 200 and 394 characters. Those two figures are not randomly chosen: Google Webmaster Tools warns you when any of your pages has a meta description shorter than 200 or longer than 394 characters (based on my experience).
  7. Return the calculated title, keywords and description in the proper ByRef parameters.

A good thing about this approach, using a VirtualFile, is that you can apply it to your already existing website easily. No matter how many pages your site has (hundreds, thousands, …), this code adds meta title, meta keywords and meta description to all your pages automatically, transparently, without user intervention, with very little modification (if any) to your already existing pages, and it scales well.

Counting word occurrences.

We iterate through the words within the text under consideration (the ContentPlaceHolder) and store their occurrences in a Hashtable (ht1 for single words and ht2 for two-word phrases); the loop is sketched below. All words are considered in their lowercase variant. A word must have more than two characters to be taken into account and must not start with a number. If it passes this fast initial test, it is checked against a noise-word list. If it is not a noise word, it is checked against the existing values in the proper Hashtable and included (ht1.Add(word, 1)), or its value incremented (ht1(word) = ht1(word) + 1) if it was already there.
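Here is the counting loop in sketch form. The variable text and the IsNoiseWord function are assumptions standing for the ContentPlaceHolder contents and the noise word check described in this post:

Dim separators As Char() = {" "c, ","c, "."c, ";"c, ":"c, _
                            ControlChars.Tab, ControlChars.Cr, ControlChars.Lf}
Dim ht1 As New Hashtable()   ' single-word occurrences
Dim ht2 As New Hashtable()   ' two-word phrase occurrences
Dim previous As String = Nothing

For Each raw As String In text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
    Dim word As String = raw.ToLowerInvariant()

    ' Skip short words, words starting with a digit, and noise words.
    If word.Length <= 2 OrElse Char.IsDigit(word(0)) OrElse IsNoiseWord(word) Then
        previous = Nothing
        Continue For
    End If

    If ht1.ContainsKey(word) Then
        ht1(word) = CInt(ht1(word)) + 1
    Else
        ht1.Add(word, 1)
    End If

    If previous IsNot Nothing Then
        Dim phrase As String = previous & " " & word
        If ht2.ContainsKey(phrase) Then
            ht2(phrase) = CInt(ht2(phrase)) + 1
        Else
            ht2.Add(phrase, 1)
        End If
    End If
    previous = word
Next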

Regarding the noise words, we first considered some of the word frequency lists available out there, but then we thought about using verb conjugations as well. So we first created MostCommonWordsEN, an array based on simple frequency lists, and then we also created MostCommonVerbsEN, based on another frequency list that considered only verbs. Finally we created MostCommonConjugatedVerbsEN, where we stored all the conjugations of the most common English verbs. When checking a word against these word lists we only use MostCommonWordsXX and MostCommonConjugatedVerbsXX (where XX is one of EN, ES, FR, DE, IT). Yes, we did the same for other languages such as Spanish, French, German and Italian, whose conjugations are much more complex than the -ed, -ing and -s terminations. For the automatic generation of all possible conjugations of the given verbs (in their infinitive form) we used http://www.verbix.com/

Calculating meta title.

It will be the text between the first <h1> and </h1> heading tags right after the main ContentPlaceHolder of the page.

Calculating meta description.

Most of the time, a description of what a whole text is about is (or at least should be) within its first paragraphs. Based on that assumption, we try to parse and concatenate the text within paragraphs (<p></p> tags) after the first <h1> tag. Based on our experience, when the meta description tag is longer than 394 characters, Google Webmaster Tools complains about it being too long. With that in mind, we try to concatenate html-cleaned text from the first paragraphs of the text to create the meta description tag, ensuring it is not longer than 394 characters. Once we know the way our meta descriptions are automatically created, all we need to do is create our pages starting with an <h1> header tag followed by one or more paragraphs (<p></p>) that will be the source for the meta description of the page. This will be suitable for most scenarios. In other cases, you should modify the way you create your pages or update the code to match your needs.

Calculating meta keywords.

Given the noise word lists for a language, calculating the occurrences of keywords (single words) and key phrases (two words) within the text is straightforward. We just iterate through the text, check against noise words, and add a new keyword or increment its frequency if the given keyword is already in the Hashtable. At the end of the iteration, we sort the Hashtables by descending frequency (using a custom class inheriting from System.Collections.IComparer). The final keyword list is a combination of the most frequent single keywords (ht1), up to 190 characters, and the most common two-word key phrases (ht2), until completing a maximum of 250 characters. All of them are comma-separated values in lowercase.

Summary.

Having meta tags correctly set is a must; however, it is sometimes difficult to set them manually on every page, let alone think of all the possible keyword combinations. Too frequently only a few words are added, and this is where automatic keyword handling can help. If you think this might be your case, please download our sample VB project and give it a try (and a few debug traces too). I will be waiting for your comments.

Links.

Internet Information Services IIS optimization

2009/08/21

Fixing “Padding is invalid and cannot be removed” when requesting WebResource.axd

If you are using ASP.NET in your website and have a look at your Application EventLog you will probably see warning entries like this:

CryptographicException: Padding is invalid and cannot be removed.

Event Type: Warning
Event Source: ASP.NET 2.0.50727.0
Event Category: Web Event
Event ID: 1309
Date:  21/08/2009
Time:  13:08:48
User:  N/A
Computer: WEBSERVER
Description:
  Event code: 3005
  Event message: An unhandled exception has occurred.
  Event time: 21/08/2009 13:08:48
  Event time (UTC): 21/08/2009 11:08:48
  Event ID: 1cc59501bae34562a1e486c16f2e799f
  Event sequence: 11912
  Event occurrence: 1
  Event detail code: 0
  Application information:
    Application domain: /LM/W3SVC/1/ROOT-1-128952696565995867
    Trust level: Full
    Application Virtual Path: /
    Application Path: C:\Inetpub\webs\www.test-domain.com\
   Machine name: WEBSERVER
  Process information:
    Process ID: 3920
    Process name: w3wp.exe
    Account name: TEST-DOMAIN\IWAM_WEBSERVER
  Exception information:
    Exception type: CryptographicException
    Exception message: Padding is invalid and cannot be removed.
  Request information:
    Request URL: http://www.test-domain.com/WebResource.axd?d=pFeBotgPWN6u7M4UfAnWTw2&t=633687432177195930
    Request path: /WebResource.axd
    User host address: 127.0.0.1
    User:
     Is authenticated: False
    Authentication Type:
     Thread account name: TEST-DOMAIN\IWAM_WEBSERVER
  Thread information:
    Thread ID: 12
    Thread account name: TEST-DOMAIN\IWAM_WEBSERVER
    Is impersonating: False
    Stack trace:
       at System.Security.Cryptography.RijndaelManagedTransform.DecryptData(Byte[] inputBuffer, Int32 inputOffset, Int32 inputCount, Byte[]& outputBuffer, Int32 outputOffset, PaddingMode paddingMode, Boolean fLast)
       at System.Security.Cryptography.RijndaelManagedTransform.TransformFinalBlock(Byte[] inputBuffer, Int32 inputOffset, Int32 inputCount)
       at System.Security.Cryptography.CryptoStream.FlushFinalBlock()
       at System.Web.Configuration.MachineKeySection.EncryptOrDecryptData(Boolean fEncrypt, Byte[] buf, Byte[] modifier, Int32 start, Int32 length, IVType ivType, Boolean useValidationSymAlgo)
       at System.Web.UI.Page.DecryptStringWithIV(String s, IVType ivType)
       at System.Web.Handlers.AssemblyResourceLoader.System.Web.IHttpHandler.ProcessRequest(HttpContext context)
       at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
       at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
Custom event details:
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Depending on how busy your web server is, you may see them appear from time to time or as often as every few minutes, filling your EventLog and ranging from a slight annoyance to a real problem (depending on how hypochondriac you are).

In fact, they are just warnings that can be ignored in most cases, but they can be a real problem when they bury other events and the noise does not let you see what matters. If there are many of them and you want to get rid of them (or most of them, at least), keep on reading.

You might check your IIS log around the times the warnings appear and (if you also log the user-agent) you will probably see that most of the time the URL is NOT requested by a real user, but by a spider engine doing its crawl (googlebot, msnbot, yahoo, tahoma, or any other). You can double check by doing a reverse DNS lookup of the offending IP address with ping -a aaa.bbb.ccc.ddd, and you will see that the IP resolves to something like *.googlebot.com, *.search.msn.com, *.crawl.yahoo.net or *.ask.com. This should give you a hint about what to do…

WebResource.axd is just an httpHandler that wraps several resources within the same DLL. It is in charge of returning anything from little .gif files serving the arrows of the asp:Menu control, to the .js files governing the behavior of the menu itself. Even if your website does not use the asp:Menu control, you are probably still using WebResource.axd for the javascript dealing with the postback of your form or anything else.

Why does this exception happen?

If you look in detail at the parameters following the WebResource.axd request you will notice two of them. The first one, d, refers to a particular resource embedded in the httpHandler DLL. It is a fixed value as long as the source DLL is not updated or recompiled. The second parameter, t, is a timestamp that changes whenever the web application (AppPool) is recompiled (a changed/updated DLL, an update to web.config, and so on) and depends on the machineKey of the web site. If web.config does not explicitly declare a fixed machineKey, the t parameter will change from time to time (restarts, application pool recycles, etc).

In fact these CryptographicException warnings are well known in web farm configurations. In that case, all the servers belonging to the same farm must have the same machineKey, because if a page (the .aspx container page) served by one particular server of the farm includes a value of the t parameter and the subsequent request for that resource URL is handled by another server of the farm, the exception arises and the user cannot download the resource. And in this case we would be talking about real browsers with real users behind them, not spider engines.

Furthermore, if you have implemented a conditional GET in your webserver, this exception is more likely to happen, since a user can come back to your site, request a page that has not changed, be returned a 304 Not Modified, and still request the resources included in that page, which might be invalid due to the change of t.

The solution: two steps.

As you can imagine, the first thing you can do is set a fixed machineKey in your web.config file. Even if you are not running a cluster or a web farm, it will help you minimize the occurrences of the warning Padding is invalid and cannot be removed.

For this you can use a machineKey generator, or generate your own keys if you know how to do it (random chars will not work).
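If you prefer to generate them yourself, random bytes encoded as hexadecimal strings of the right length do the job. A minimal sketch (the 64/32 byte lengths follow the common recommendation for SHA1 validation and AES decryption keys; adjust them to your needs):

Imports System.Security.Cryptography
Imports System.Text

Module MachineKeyGenerator
    Sub Main()
        Console.WriteLine("validationKey='{0}'", RandomHex(64))
        Console.WriteLine("decryptionKey='{0}'", RandomHex(32))
    End Sub

    Private Function RandomHex(ByVal byteCount As Integer) As String
        Dim bytes(byteCount - 1) As Byte
        Dim rng As New RNGCryptoServiceProvider()
        rng.GetBytes(bytes)

        Dim sb As New StringBuilder(byteCount * 2)
        For Each b As Byte In bytes
            sb.Append(b.ToString("X2"))   ' two hex chars per byte
        Next
        Return sb.ToString()
    End Function
End Module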

<system.web>
  <machineKey
    validationKey='A06BDCF2F6CF.A.VERY.LONG.44F13E76184945A7C477601'
    decryptionKey='99079B21C2F3644.A.BIT.SHORTER.BB81C7E9D58378'
    validation='SHA1'/>
</system.web>

The second (and easier) step is to prevent WebResource.axd URLs from being requested as much as possible, in particular by search engine crawlers or bots, since those resources should not be indexed nor cached in any way by them. Those URLs are not real content to be indexed. If you just add the following lines to your robots.txt you will see how the frequency of the CryptographicException is reduced drastically. If you also change the machineKey to a static value, you will get rid of them almost completely.

User-agent: *
Disallow: /WebResource.axd

As I said, you will get rid of this warning almost completely. There might be search engines not following your robots.txt policies, users visiting you from a Google cached page version, etc., so you cannot get rid of these warning messages for good, but it is enough for them not to be a problem anymore.

Summary.

Summing up, this event appears when there is a big time difference (lapse) between the page that contains the resource being served and the resource itself being requested. During that lapse, the application pool might have been recycled or recompiled, the server restarted, etc., thus changing the value of t and rendering the older t value useless (the cryptographic checks fail).

Links.

Internet Information Services IIS optimization

Keywords.

WebResource.axd, CryptographicException, padding, invalid, removed, machineKey, exception, warning, IIS

2009/03/29

Conditional GET and ETag implementation for ASP.NET

This post continues the series of Internet Information Services IIS optimization. See the link if you want to follow the whole series.

You can download the VB project code for this article. Another way of optimizing your web site is setting it up to support conditional GET, that is, implementing the logic for handling requests whose headers specify If-None-Match (ETag) and/or If-Modified-Since values. This is not something easy, since ASP.NET does not offer support for it directly, nor does it have primitives/methods/functions for it, and, by default, it always returns 200 OK no matter what the request headers say (apart from errors, such as 404, and so on).

The idea behind this is quite simple; let’s suppose a dialog between a browser (B) and a web server (WS):

B: Hi, can you give me a copy of ~/document.aspx?
WS: Of course. Here you are: 200Kb of code. Thanks for coming, 200 OK.
B: Hi again, can you give me a copy of ~/another-document.aspx?
WS: Yes, we’re here to serve. Here you are: 160Kb. Thanks for coming, 200 OK.
(Now the user clicks on a link that points to ~/document.aspx or goes back in his browsing history)
B: Sorry for disturbing you again, can I have another copy of ~/document.aspx
WS: No problem at all. Here you are: 200Kb of code (the same as before). Thanks for coming, 200 OK.

Stupid, isn’t it? The way for enhancing the dialogue and avoid unnecessary traffic is having a richer vocabulary (If-None-Match & If-Modified-Since). Here is the same dialogue with these improvements:

B: Hi can you give me a copy of ~/document.aspx?
WS: Of course. Here you are: 200Kb of code. ISBN is 555111222 (ETag) and this is the 2009 edition (Last-Modified). Thanks for coming, 200 OK.
B: Hi again, can you give me a copy of ~/another-document.aspx?
WS: Yes, we are here to serve. Here you are: 160Kb. ISBN is 555111333 (ETag) and it is the 2007 edition (Last-Modified). Thanks for coming, 200 OK.
(Now the time passes and the user goes back to ~/document.aspx, maybe it was in his favorites, or arrived to the same file after browsing for a while)
B: Hi again, I already have a copy of ~/document.aspx, ISBN is 555111222 (If-None-Match), dated 2009 (If-Modified-Since). Is there any update for it?
WS: Let me check… No, you are up to date, 0Kb transferred, 304 Not modified.

It sounds more logical. It takes a little more dialogue (negotiation) prior to the transaction, but if the conditions are met, these extra words save time and money (bandwidth) for both parties.

Most browsers nowadays support such a negotiation, but the web server must support it too in order to get the benefits. Unfortunately IIS only supports conditional GET natively for static files. If you want to use it for dynamic content (ASP.NET files) as well, you need to add support for it programmatically. That is what we are going to show here.

 

Calculating Last-Modified response header.

To begin with, the server needs to know when a page was last modified. This is very easy for static content: a simple mapping between the web page being requested and the file in the underlying file system and you are done. The calculation of this date for .ASPX files is a little more complicated. You need to consider all the dependencies of the content being served and take the most recent date among them. For instance, let’s suppose the browser requests a page at ~/default.aspx and this file is based on a masterpage called ~/MasterPage.master, which has a menu inside that grabs its contents from the file ~/web.sitemap. In the simplest scenario (no content retrieved from a database, no user controls), ~/default.aspx will contain static content. In this case, the Last-Modified value will be the most recent last modification time of these files:
  • ~/default.aspx
  • ~/default.aspx.vb (Optionally, depending on whether your pages have code behind that modifies the output or not)
  • ~/MasterPage.master
  • ~/MasterPage.master.vb (Optionally)
  • ~/web.sitemap

The last modification time is retrieved using System.IO.File.GetLastWriteTime. In case the content is retrieved from a database, you must have a column storing the last modification time (when the content was last written) in order to use this functionality.
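A minimal sketch of that calculation for the example above (the file list is the one from the bullet list; adapt it to your own dependencies):

Imports System.IO
Imports System.Web

Public Class LastModifiedHelper
    ' Returns the most recent write time among the page and its dependencies.
    Public Shared Function GetLastModified(ByVal server As HttpServerUtility) As DateTime
        Dim dependencies As String() = { _
            server.MapPath("~/default.aspx"), _
            server.MapPath("~/default.aspx.vb"), _
            server.MapPath("~/MasterPage.master"), _
            server.MapPath("~/MasterPage.master.vb"), _
            server.MapPath("~/web.sitemap")}

        Dim lastModified As DateTime = DateTime.MinValue
        For Each filePath As String In dependencies
            If File.Exists(filePath) Then
                Dim writeTime As DateTime = File.GetLastWriteTime(filePath)
                If writeTime > lastModified Then lastModified = writeTime
            End If
        Next
        Return lastModified
    End Function
End Class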

 

Calculating ETag response header.

The second key of the dialogue is the ETag value. It is simply a hash of the final contents being served. If you have any way (with a low CPU footprint) of calculating a hash from certain textual input, it can be used. In our implementation we used CRC32, but any other will work the same way. We calculate the ETag value by making a CRC32 checksum of every dependent content plus the last modification dates of those dependencies. In our simplest case, the concatenation of all these strings:
  • ~/default.aspx last write time
  • ~/default.aspx.vb last write time (not likely, but optionally necessary)
  • ~/MasterPage.master last write time
  • ~/MasterPage.master.vb last write time (Optionally)
  • ~/web.sitemap last write time
  • ~/default.aspx contents
  • ~/default.aspx.vb contents (Optionally, but not likely, to speed up calculations)
  • ~/MasterPage.master contents
  • ~/MasterPage.master.vb (Optionally)
  • ~/web.sitemap contents

And then a CRC32 of the whole. If your content is really dynamically generated (from a database, or by code), you will need to use it as well, like any other dependency, and include it in the list above.
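As a sketch of the idea (this is not the CRC32.vb code from the project; MD5 is used here only as a readily available stand-in, since any stable, cheap hash serves the purpose):

Imports System.IO
Imports System.Security.Cryptography
Imports System.Text

Public Class ETagHelper
    ' Concatenate last write times and contents of the dependencies and hash the result.
    Public Shared Function ComputeETag(ByVal ParamArray physicalPaths() As String) As String
        Dim sb As New StringBuilder()
        For Each filePath As String In physicalPaths
            sb.Append(File.GetLastWriteTime(filePath).Ticks)
            sb.Append(File.ReadAllText(filePath))
        Next

        Dim hash As Byte() = MD5.Create().ComputeHash(Encoding.UTF8.GetBytes(sb.ToString()))
        ' ETag values travel between double quotes.
        Return """" & BitConverter.ToString(hash).Replace("-", "") & """"
    End Function
End Class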
It might seem too much of a burden, too much CPU usage, but, as with everything, it really depends on the website:

  • High volume, high CPU usage: This scenario might not cope with the extra CPU needed. See Note*.
  • High volume, low CPU usage: You can safely spend CPU cycles in order to save some bandwidth. Implementing conditional GETs is a must.
  • Low volume, high CPU usage: What kind of web server is it? Definitely not a public web server as we know them.
  • Low volume, low CPU usage: Implementing conditional GETs will give your website the impression of being served faster.

Note*: Consider this question: is your CPU usage so high partly because the same contents are requested over and over by the same users? If the answer is yes (or maybe), the extra CPU usage spent on allowing client-side caching and conditional GETs will, globally viewed, lower your overall CPU usage and also the bandwidth being used. Give this idea a try and decide for yourself afterwards.

 

Returning them in the response.

Once we have calculated both the Last-Modified & Etag values, we need to return them with the response of the page. This is done using the following lines of code:
Response.Cache.SetLastModified(LastModifiedValue.ToUniversalTime)    
Response.Cache.SetETag(ETagValue)

 

Looking for the values in request’s headers.

Now that our pages’ responses are generated with Last-Modified and ETag headers, we need to check for those values in the requests too. The names of those parameters, when sent in request headers, differ from the original names:

  • Response header Last-Modified corresponds to request header If-Modified-Since
  • Response header ETag corresponds to request header If-None-Match

The logic for deciding if we should return 200 OK or 304 Not modified is as follows:

  • If both values (If-Modified-Since & If-None-Match) were provided in the request and both match, return 304 and no content (0 bytes)
  • If any of them do NOT match, return 200 and the complete page
  • If only one of them was specified (If-Modified-Since or If-None-Match), it decides.
  • If none were provided, always return 200 and the complete page.

In order to return 304 and no content for the page, the code to be used is:
Response.Clear()
Response.StatusCode = CInt(System.Net.HttpStatusCode.NotModified)   ' 304
Response.SuppressContent = True
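Putting the pieces together, the decision could be wired up from the page's code-behind once LastModifiedValue and ETagValue have been calculated. This is a sketch, not the ConditionalGET method of HttpSnippets.vb:

Private Sub HandleConditionalGet(ByVal lastModifiedValue As DateTime, ByVal eTagValue As String)
    Dim ifModifiedSince As String = Request.Headers("If-Modified-Since")
    Dim ifNoneMatch As String = Request.Headers("If-None-Match")

    Dim dateMatches As Boolean = False
    If ifModifiedSince IsNot Nothing Then
        Dim since As DateTime
        If DateTime.TryParse(ifModifiedSince, Nothing, _
                System.Globalization.DateTimeStyles.AssumeUniversal Or _
                System.Globalization.DateTimeStyles.AdjustToUniversal, since) Then
            ' HTTP dates have one-second resolution.
            dateMatches = (lastModifiedValue.ToUniversalTime() <= since.AddSeconds(1))
        End If
    End If
    Dim etagMatches As Boolean = (ifNoneMatch IsNot Nothing AndAlso ifNoneMatch = eTagValue)

    ' Return 304 only when every header the client actually sent matches.
    Dim notModified As Boolean = (ifModifiedSince IsNot Nothing OrElse ifNoneMatch IsNot Nothing)
    If ifModifiedSince IsNot Nothing Then notModified = notModified AndAlso dateMatches
    If ifNoneMatch IsNot Nothing Then notModified = notModified AndAlso etagMatches

    If notModified Then
        Response.Clear()
        Response.StatusCode = 304                    ' Not Modified
        Response.SuppressContent = True
    Else
        Response.Cache.SetLastModified(lastModifiedValue.ToUniversalTime())
        Response.Cache.SetETag(eTagValue)
    End If
End Sub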

 

Test 1: Requests to ~/default.aspx

With the ideas in place, we have reused the VB project from the previous post, ASP.NET menu control optimization, and added support for conditional GETs to it. In the sample VB project for this post there are 2 new files under App_Code: one called CRC32.vb, which implements a CRC32 checksum algorithm, and another named HttpSnippets.vb, which implements a method called ConditionalGET that does most of the job explained in this post. We have used Fiddler2 to debug two requests made to ~/default.aspx.

The first one, shown in the left column (red arrow), is made without the browser having any cached information about it. As you can see the browser makes the request without providing any If-Modified-Since or If-None-Match headers. The response given by the server sets the ETag and Last-Modified values for the browser to use in the future in case it supports them.

The second request, shown in the right column (green arrow), is made by the same browser some seconds later. The browser already has information about the page being requested and provides it along with the request: the If-Modified-Since and If-None-Match headers. The result from the server in this case is different. Instead of returning 200 OK and the whole page, it returns 304 Not Modified, and the size of the body is 0. You are saving bandwidth at the cost of some CPU cycles and a few more bytes in the negotiation (headers).

 

Test 2: Requests to ~/default-optimized.aspx

Continuing with the ASP.NET menu control optimization project, we also added support for conditional GET to our ~/default-optimized.aspx page, which saves the menu in an external client-side cacheable page in order to reduce (even more) the size of the pages being transferred.

In this case the first column (red arrow) belongs to the request of ~/default-optimized.aspx. As you can see the size of the page being transferred completely is 3785 bytes (in the previous example it was 18358 bytes). This reduction is solely due to the ASP.NET menu control optimization. For more info about this check the previous article. Regarding the conditional GET, the first request does not know anything about the page and no data is provided in the request. The response includes ETag and Last-Modified values.

The second request of interest is at the right column (green arrow) and belongs to the same browser requesting the same file some seconds later. This time, information about the page is provided by the browser with the headers (If-Modified-Since and if-None-Match values). The server then checks them and decides that the content has not changed, returning 304 Not Modified and a body length of 0 bytes.

It seems that the ASP.NET Development Server (“Cassini”), the web server used for debugging with VS2008, does not handle static files very well. As you can see, menu.css and some other static files under ~/resources/ are transferred completely with every request. No ETag or Last-Modified values are returned for them automatically. This does not happen in real production environments with IIS, which handles static files correctly (calculating ETag and Last-Modified values) to avoid transferring static files unnecessarily.

 

Resources and links.

Internet Information Services IIS optimization
For live websites (on the public internet) you can easily test whether they support conditional GETs using the HTTP compression and HTTP conditional GET test tool.
Another valuable resource is Fiddler2.
The VB website project source sample is available for you to download.

2009/03/28

VIEWSTATE size minimization

This post continues the series of Internet Information Services IIS optimization. See the link if you want to follow the whole series.

According to Microsoft, in the .NET Framework Class Library documentation for the Control.ViewState property:
A server control's view state is the accumulation of all its property values. In order to preserve these values across HTTP requests, ASP.NET server controls use this property […]
That means that the bigger the contents of the control, the bigger its ViewState property must be.
What is it used for? When server technologies such as ASP, ASP.NET, PHP and so on are used, a high-level and powerful language runs on the server side. These languages have advanced server controls (such as grids, treeviews, etc.) and they can do validations of any kind (on database access, etc.). The final goal of this high-level language is transforming the ‘server page’ into a final page that a browser can understand (HTML + Javascript). If, on the one hand, you have server controls that are rendered into HTML when they are output to the browser, what happens when the user does a postback/callback and sends the page back to the server? Here is where the ViewState plays its role, helping to recreate the page objects at the server, in the OOP sense (<asp:TextBox ID=...), based on the HTML controls (<input name="...).

Wouldn’t it be easier to forget about all this and handle it the traditional way? For recreating simple controls such as a text box or a select box, it could be feasible to fetch the values right from the HTML, without using the ViewState, but imagine trying to recreate a GridView from only the HTML, having to strip out the formatting. Besides, without the ViewState we could not send to the server certain events such as a change of selection in a DropDownList (the previously selected element is saved in the ViewState).

OK, we will need the ViewState after all, but is there any way of minimizing it? Yes. As Microsoft states:
View state is enabled for all server controls by default, but there are circumstances in which you will want to disable it. For more information, see Performance Overview.
If you read Performance Overview, you will be advised to:
Save server control view state only when it is required. View state enables server controls to repopulate property values on a round trip without requiring you to write code […]
Take that into consideration when writing your master pages, since most of the controls in the master page will be static (or at most, written only once by the server) and probably not needed at all again in case of a postback or callback (unless, for instance, a DropDownList for changing the language of the site is placed in the master page).

When can we disable the view state? Basically, when we use data that will be read-only, or that will not be needed again by the server in case of a postback/callback, for instance a grid that does not have associated server-side events for sorting or selection.

There are several ways for controlling the viewstate:
  • On a page-by-page basis: If you have a particular page in which you know you will not need the viewstate, you can disable it completely in the page declaration:
    <%@ Page Title="Home" Language="C#" MasterPageFile="~/MasterPage.master" AutoEventWireup="true" CodeFile="default.aspx.cs" Inherits="_default" EnableViewState="false" %>
    However, doing so might render your masterpage controls (if any) unusable for that particular page. For instance, if you have a DropDownList control in your masterpage for changing/selecting the language of the website and you disable the viewstate for several individual pages of your site, the language DropDownList may no longer behave correctly on those pages.
  • In the master page declaration: In a similar way as you do for a single page, you can do it also in the masterpage. The result will be that, unless you override this option for single page (explicitly declaring single pages as having it), all pages using a master page declared this way will not have ViewState (and if they do, it will not contain any info about controls from the masterpage):
    <%@ Master Language="C#" AutoEventWireup="true" CodeFile="MasterPage.master.cs" Inherits="MasterPage" EnableViewState="false" %>
  • On a control-by-control basis: A more flexible (due to its granularity) way of controlling the view state is enabling/disabling it control by control:
    <asp:TextBox ID="TextBox1" runat="server" EnableViewState="false" ></asp:TextBox> 
    This will probably be the easiest method and the one that interferes the least with the rest of the website; besides, its effects (on the size of the viewstate and on functionality) can be easily checked and easily reverted if something does not work.
Most of the controls in a masterpage will fall into the category of light controls (see Viewstate Optimization Strategies in ASP.NET), which means that including or excluding them from the view state makes very little difference (their footprint is very small). Even so, you should make sure you set the EnableViewState="false" attribute for them, just in case.

One of the ASP.NET controls that makes the View State grow heavily is the asp:Menu control. As I showed in my previous post ASP.NET menu control optimization, moving it out of the masterpage and placing it in a separate standalone client-side cacheable file can work wonders. However, if you do not implement such a suggestion, you can at least disable the view state for the menu control. The menu control will still be rendered within every page, but the size of the View State will be significantly smaller without further effort. For one of our customers, simply adding EnableViewState="false" to the menu control definition reduced the size of their homepage (for example) from 150Kb to around 109Kb. Since the menu was in the masterpage, the reduction was similar for all the pages of their site.
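For reference, assuming the menu is declared in the master page, the change is just the extra attribute (the ID and the rest of the attributes are whatever you already have):

<asp:Menu ID="Menu1" runat="server" EnableViewState="false" ...>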

Links.

Internet Information Services IIS optimization

2009/03/25

ASP.NET menu control size reduction (a graphical proof)

I have prepared a graphical proof of the kind of optimization I suggested a couple of days ago in my post ASP.NET menu control optimization. I just saved the following requests into plain txt files:

  • ~/default.aspx: The original menu sample page without any optimization
  • ~/default-optimized.aspx & ~/resources/menu-js.aspx: The optimized version that splits the menu related html out into an external client-cacheable file (that is requested only once).

Then I opened those .txt files with MS Word, reduced the font size to 6,5 for all of them (to keep the number of pages reasonable), and did some highlighting:

  • Green: The useful real contents of the page.
  • Red: The __VIEWSTATE variable.
  • Blue: Menu related code.
  • Light blue (in menu-js.aspx): Parsed & modified menu related code converted to a javascript string to be written by browser directly.

~/default.aspx

The original menu page output

As you can see, in the original non-optimal version the page is mostly filled with content related to the menu and the __VIEWSTATE variable. The worst part of this original implementation is that, in all the pages, 70%-80% of the contents are the same. The client's browser is downloading mostly the same contents over and over. My idea consists in factoring that common content out of the pages and placing it in a single, separate cacheable page.

~/default-optimized.aspx

In the optimized version, menu related code is reduced drastically (blue). Content with white background is masterpage related code (formatting).

~/resources/menu-js.aspx

Most of the menu-related code is moved to an external file that is requested at the end of the masterpage. That file is client-side cached and thus requested only once per session. The result is that only pages with optimal contents are downloaded afterwards.

If you like this approach and want to see the whole post explaining the idea in detail, with source code project and all the stuff, see my previous post ASP.NET menu control optimization.

Links.

Internet Information Services IIS optimization

2009/03/23

IE8 breaks <asp:Menu> control

There has been a lot of controversy since the public release of IE8 last March 19th (and even before, while in beta), because it follows the standards and because it does not properly render <asp:Menu> controls under certain conditions (because the <asp:Menu> control developers did not follow the standards).

If you look at this error feedback in Microsoft Connect regarding the ASP.NET menu control not working in IE8 beta 2, they closed it as ‘By design’, which means that IE8 behaves as it should and that the source of the problem is not IE8, but the ASP.NET engine. As they say, Microsoft will be releasing a KB fix for ASP.NET that addresses this problem sooner or later.

In the meantime there are some workarounds, as explained in ASP.NET Menu and IE8 rendering white issue:

  1. Overriding the z-index property.
  2. Using CSS Friendly Control Adapters.
  3. Add the IE7 META tag to the project.

Up to here, nothing new under the sun, I am just introducing some facts already on the public web… The bad news in this post is that my solution for ASP.NET menu control optimization, posted yesterday, needs an update because it uses <asp:Menu> behind the scenes and shows the same behavior in IE8. The good news is that, since the menu is taken from an external file, only 2 files need to be updated, and not the masterpage, nor all the pages of the whole website, etc.

~/resources/menu.css should be updated to include .IE8Fix { z-index: 100; } and ~/resources/menu.aspx should be updated so that DynamicMenuStyle includes the attribute CssClass="IE8Fix", as sketched below.
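In practice the two changes look roughly like this (the ID and the rest of the attributes of the menu declaration are whatever you already have in menu.aspx):

In ~/resources/menu.css:

.IE8Fix { z-index: 100; }

In ~/resources/menu.aspx:

<asp:Menu ID="Menu1" runat="server" ...>
    <DynamicMenuStyle CssClass="IE8Fix" />
</asp:Menu>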

Those little changes are already implemented in the project available for download, so you do not need to worry about this problem and can concentrate on optimizing.

2009/03/22

ASP.NET menu control optimization

This is the first of my posts regarding Internet Information Services IIS optimization. See the link if you want to follow the whole series.

One of the controls that our website uses the most is the <asp:Menu> control. It is used in the masterpage so that, in the end, it is used on every page of the site, along with breadcrumbs. I have prepared sample VS2008 website projects in VB and C# where you can see the facts and follow the steps for yourself. In this sample project, the masterpage sets up several ContentPlaceHolders arranged for a multicolumn webpage. One row at the top contains the logo, breadcrumbs and menu of the website, a second row with 2 columns contains the left content and main content, and a third row at the end contains the footer with fixed text for all website pages. Of course, if you want to do it right, you should not use <table> elements for the layout of the content, you should use <div> elements and CSS styles, but that is out of the scope of this post. Here we will only cover and explain a way to optimize your pages that use <asp:Menu> controls.

The layout of the master page is shown using ~/default.aspx in the following image:

Using Fiddler2, the http debugging proxy, we see this file is 18214 bytes (17,78 Kb) when browsed with IE7 (see the User-Agent string). I strongly recommend Fiddler2 if you want to optimize or debug your web server. It has a lot of useful features, one of the most interesting being the Timeline, which shows graphically how your server performs overall (considering all the requests for pages, css files, images, scripts, etc.), with time on the X axis. In this case we will just prepare a request using the Request Builder and see the results using the Inspector tab:

fiddler request for non-optimal page

Further analysis of the received page yields these values:

  • CPH (ContentPlaceHolders, 2): 1,12 Kb, 6,30%
  • VS (__VIEWSTATE): 2,81 Kb, 15,80%
  • M (menu contents, scripts & related styles): 11,80 Kb, 66,37%
  • T-CPH-VS-M (the rest, due to layout / master page): 2,05 Kb, 11,53%
  • T (TOTAL): 17,78 Kb, 100,00%

As you can see, most of the content of the page is menu-related code. Furthermore, if the menu does not change between subsequent requests from the same visitor (which is very likely), we are sending out the same content again and again, since the menu sits in our masterpage and the same menu-related markup is rendered for the browser on every page. What a waste of bandwidth (and probably money too, if you pay your ISP by traffic) and of your visitors’ time. The fact that bandwidth keeps getting broader nowadays is no reason to waste it absurdly.

Besides, if you can read html and look through the generated file, you will see that the html code for the menu sits near the top, exactly where we placed the <asp:Menu> control in the masterpage. What would happen if we could delay the load of the menu while giving priority to the real contents of the page? I mean, postpone loading the menu until the contents are shown in the visitor’s browser, and only then load the menu. That would increase the responsiveness of the website; the page would not seem stalled while loading a big menu before the actual contents. The users could start reading the contents and, in the meantime, perhaps without even noticing, the menu would appear in its place.

In subsequent requests, since the menu is already loaded, the visitor would not need to re-download those 11,80 Kb (in our case) of menu-related html. In our example, the 17,78 Kb page could be reduced to 1,12 + 2,81 + 2,05 = 5,98 Kb. The sample page would be about 66% smaller, just by stripping the menu-related html out of the page and placing it into another one. This can be reduced even more by minimizing the size of the __VIEWSTATE variable, but that will be another post.

The main things to be replaced.

If you read through the html generated for the menu, you will find several distinct pieces of code:

  • The <style> block used by the menu, in our example:
    <style type="text/css">
    .ctl00_Menu1_0 { background-color:white;visibility:hidden;display:none;position:absolute;left:0px;top:0px; } 
    .ctl00_Menu1_1 { color:Black;text-decoration:none; } 
    .ctl00_Menu1_2 { color:Black; } 
    .ctl00_Menu1_3 { } 
    .ctl00_Menu1_4 { background-color:Transparent;border-color:Transparent;padding:0px 5px 0px 5px; } 
    .ctl00_Menu1_5 { background-color:White;border-color:Transparent; } 
    .ctl00_Menu1_6 { color:Black; } 
    .ctl00_Menu1_7 { background-color:White;border-color:White;border-width:1px;border-style:solid;padding:0px 5px 0px 5px; } 
    .ctl00_Menu1_8 { background-color:White;border-color:#BBBBBB;border-width:1px;border-style:solid; } 
    .ctl00_Menu1_9 { color:White; } 
    .ctl00_Menu1_10 { color:White;background-color:#BBBBBB;border-color:Transparent; } 
    .ctl00_Menu1_11 { color:White; } 
    .ctl00_Menu1_12 { color:White;background-color:#BBBBBB;border-color:Transparent;border-width:1px;border-style:solid; } 
    </style>
  • Two calls to WebResource.axd for retrieving scripts:
    <script src="/www.mytestsite.com/WebResource.axd?d=Fg4XkH9c9OdEq6bmF8mMjg2&amp;t=633691223257795724" 
      type="text/javascript"></script>
    <script src="/www.mytestsite.com/WebResource.axd?d=-JPtlwQvfdzq429NBDEh_w2&amp;t=633691223257795724"
      type="text/javascript"></script>
  • The actual markup for the menu, which is rendered using tables (when the browser is IE7) and starts with the string: <a href="#ctl00_Menu1_SkipLink"><img alt...
  • Near the end of the page there is another script related to the menu, where a data object is initialized with the styles and values defined for it. You will find something similar to:
    <script type="text/javascript"> 
    //<![CDATA[ var ctl00_Menu1_Data = new Object(); 
    ctl00_Menu1_Data.disappearAfter = 5000; 
    ctl00_Menu1_Data.horizontalOffset = 0; 
    ctl00_Menu1_Data.verticalOffset = 0; 
    ctl00_Menu1_Data.hoverClass = 'ctl00_Menu1_12'; 
    ctl00_Menu1_Data.hoverHyperLinkClass = 'ctl00_Menu1_11'; 
    ctl00_Menu1_Data.staticHoverClass = 'ctl00_Menu1_10'; 
    ctl00_Menu1_Data.staticHoverHyperLinkClass = 'ctl00_Menu1_9'; 
    //]]> </script>

The problem is that the ASP.NET menu control renders differently depending on the User-Agent (browser), so we cannot take these values as fixed constants and create static files from them. However, we can still do something else: create a simple page containing only the menu (between searchable placeholders), self-request this menu-only file on behalf of the browser making the real request, parse (using a regex) and transform the result to create a script file, cache it on the server side too (varying per User-Agent), and return it to the browser (if there is not a valid cached version already stored).

The steps.

1. Create a standalone menu.aspx file for showing the menu only.

We need to create a ~/resources/ directory under the root of the site (any other name will do the job, as long as it is explicitly excluded from being crawled in robots.txt). As you may have guessed, modify your robots.txt and insert:

User-agent: *
Disallow: /resources/ 
Disallow: /WebResource.axd

We will create a simple aspx file (not masterpage based) called ~/resources/menu.aspx and insert the <asp:SiteMapDataSource> and <asp:Menu> inside the <form> tag, just as they were in the masterpage (copy & paste). This way we keep the format and properties of the menu but get rid of everything else; this page renders just the menu, nothing more. Then surround the <asp:Menu> control with comments that we will use later, when parsing the page, to identify exactly where the menu starts and ends (something like <!-- MENU STARTS HERE --> and <!-- MENU ENDS HERE --> will do the job).

menu.aspx
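As an orientation, ~/resources/menu.aspx could look roughly like this (a sketch: the control IDs and the DisappearAfter value follow the names seen in the rendered output above, while the rest of the attributes are placeholders; copy your own controls from the masterpage):

<%@ Page Language="C#" AutoEventWireup="true" %>
<html>
<head runat="server"><title>menu</title></head>
<body>
  <form id="form1" runat="server">
    <asp:SiteMapDataSource ID="SiteMapDataSource1" runat="server" />
    <!-- MENU STARTS HERE -->
    <asp:Menu ID="Menu1" runat="server" DataSourceID="SiteMapDataSource1"
              DisappearAfter="5000">
    </asp:Menu>
    <!-- MENU ENDS HERE -->
  </form>
</body>
</html>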

2. Create menu-js.aspx that will be called by the masterpage.

Then we need to create another web form (not masterpage based) that we will call ~/resources/menu-js.aspx. This .aspx file only has the <%@ Page ... %> directive, no content at all at design time. The content is generated by the code-behind, which parses the menu.aspx page above and is responsible for caching the result and sending the menu to the client’s browser rendered as a javascript file. The content of this javascript file sent to the client’s browser is simply:

var placement = document.getElementById("aspmenu"); 
placement.innerHTML = *** ALL THE MENU CONTENTS ***

This way the menu is rendered after the page has already been loaded and shown in the client’s browser, because the call to this menu-js.aspx is near the end of the page. This method works in the latest versions of IE, Firefox, Safari, Opera & Chrome, provided that javascript is enabled. In text-only browsers (Lynx and similar), or if javascript is disabled, this method degrades gracefully: no menu is shown, but the overall appearance provided by the masterpage stays intact.
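The downloadable project contains the full code-behind (VB and C#); the following C# sketch only illustrates the idea, using Server.Execute to self-request menu.aspx (the class name resources_menu_js is hypothetical, and the real project may fetch, parse and cache the page differently):

// ~/resources/menu-js.aspx.cs -- illustrative sketch, not the exact project code
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Web.UI;

public partial class resources_menu_js : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Render ~/resources/menu.aspx server-side; it runs inside the current
        // request, so the adaptive rendering sees the real browser's User-Agent.
        StringWriter writer = new StringWriter();
        Server.Execute("~/resources/menu.aspx", writer);
        string html = writer.ToString();

        // Keep only the fragment between the marker comments added in step 1.
        Match match = Regex.Match(html,
            "<!-- MENU STARTS HERE -->(.*?)<!-- MENU ENDS HERE -->",
            RegexOptions.Singleline);
        string menuHtml = match.Success ? match.Groups[1].Value : string.Empty;

        // Escape the fragment so it fits into a single javascript string literal.
        string literal = menuHtml.Replace("\\", "\\\\").Replace("\"", "\\\"")
                                 .Replace("\r", "").Replace("\n", " ");

        // Emit the javascript that fills the <div id="aspmenu"> placeholder.
        // The page itself has no markup, so nothing else is sent after this.
        Response.Clear();
        Response.ContentType = "text/javascript";
        Response.Write("var placement = document.getElementById(\"aspmenu\");\n");
        Response.Write("placement.innerHTML = \"" + literal + "\";\n");

        // Client-side caching is added here as well (see the note near the end of the post).
    }
}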

3. Create the stylesheet for the menu.

We need to create a ~/resources/menu.css file with all the styles that were defined by the original <asp:Menu> control, the ones named ctl00_Menu1_xx shown before.
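Note that, since menu.aspx is not inside a masterpage, the control is most likely rendered with class names like Menu1_xx instead of ctl00_Menu1_xx (which is why the script in step 4.4 below refers to Menu1_12 and friends); check the rendered names in your own project. The stylesheet then simply carries those rules, for example:

/* ~/resources/menu.css: the styles the original <asp:Menu> used to emit inline */
.Menu1_0  { background-color: white; visibility: hidden; display: none; position: absolute; left: 0px; top: 0px; }
.Menu1_1  { color: Black; text-decoration: none; }
.Menu1_4  { background-color: Transparent; border-color: Transparent; padding: 0px 5px 0px 5px; }
/* ... and so on, down to ... */
.Menu1_12 { color: White; background-color: #BBBBBB; border-color: Transparent; border-width: 1px; border-style: solid; }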

4. Changes in the masterpage.

4.1. Link to the css file from step 3.

You need to include a link to the css file from step 3 in the masterpage (see the MasterPage-Optimized.master file in the downloadable project; the line is <link href="~/resources/menu.css" rel="stylesheet" type="text/css" />).

4.2. Replace <asp:Menu> by identified <div>.

You must also include an empty <div> tag with id="aspmenu" in the place where the original <asp:Menu> was:

<div id="aspmenu" title="Menu"></div> 

This aspmenu div is the placeholder where the javascript will insert the real contents of the menu after the page has loaded; see step 2, document.getElementById("aspmenu").

4.3. Changes after the <form> tag.

Right after the <form> tag, include a literal control <asp:Literal ID="ltWebResourceMenu" runat="server" EnableViewState="false" />. In the code-behind, this literal will be set to <script> tags that load the files menu-webresource-axd-a.js & menu-webresource-axd-b.js that we will prepare in step 5, as sketched below.
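A minimal C# sketch of that code-behind (the paths assume the scripts were saved under ~/resources/ as in step 5; the downloadable project may build these tags differently):

// MasterPage code-behind: load the two static scripts instead of WebResource.axd
protected void Page_Load(object sender, EventArgs e)
{
    ltWebResourceMenu.Text =
        "<script src=\"" + ResolveUrl("~/resources/menu-webresource-axd-a.js") + "\" type=\"text/javascript\"></script>" +
        "<script src=\"" + ResolveUrl("~/resources/menu-webresource-axd-b.js") + "\" type=\"text/javascript\"></script>";
}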

4.4. Changes near the end of the masterpage.

The script that used to be near the end of a non-optimized page now needs to be hard-coded into the masterpage. Thus, right before the </body> tag, we need to write:

<script type="text/javascript"> 
//<![CDATA[ var Menu1_Data = new Object(); 
Menu1_Data.disappearAfter = 5000; 
Menu1_Data.horizontalOffset = 0; 
Menu1_Data.verticalOffset = 0; 
Menu1_Data.hoverClass = 'Menu1_12'; 
Menu1_Data.hoverHyperLinkClass = 'Menu1_11'; 
Menu1_Data.staticHoverClass = 'Menu1_10'; 
Menu1_Data.staticHoverHyperLinkClass = 'Menu1_9'; 
//]]> 
</script> 
</form> 
<asp:Literal ID="ltMenuScript" runat="server" EnableViewState="false" />
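The second literal, ltMenuScript, is what finally triggers the deferred load of the menu: in the code-behind it is set to a <script> tag pointing at the menu-js.aspx page from step 2 (again, just a sketch of the idea):

// MasterPage code-behind: request the generated javascript after the page body,
// so the menu is injected into <div id="aspmenu"> once the contents are visible.
ltMenuScript.Text =
    "<script src=\"" + ResolveUrl("~/resources/menu-js.aspx") + "\" type=\"text/javascript\"></script>";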

5. Save WebResource.axd used resources as static files under ~/resources/.

In the original non-optimized ~/default.aspx you have probably noticed some lines requesting a file called WebResource.axd with 2 parameters (d & t). In our case the menu uses some resources that we will grab and save as static files:

Original code                                                        Static filename                Description
<img src="/www…com/WebResource.axd?d=p51493b-…                       menu-arrow.gif                 a right arrow
<script src="/www…com/WebResource.axd?d=Fg4XkH9c9O…                  menu-webresource-axd-a.js      20,3 Kb javascript file
<script src="/www…com/WebResource.axd?d=-JPtlwQvfdz…                 menu-webresource-axd-b.js      32,4 Kb javascript file
<img alt="Skip navigation links" … src="/…/WebResource.axd?d=vlTL…   menu-webresource-axd-1x1.gif   blank gif

6. Modify menu.aspx to use those static files.

Now that we have saved those resources as static files, we need to modify the menu.aspx created in step 1 to use these files instead of calls to WebResource.axd. This can be done using the IDE, but the result (in code) should be similar to adding these attributes to the <asp:Menu> control:

DynamicPopOutImageUrl="~/images/menu-arrow.gif" 
ScrollDownImageUrl="~/images/menu-scroll-down.gif" 
ScrollUpImageUrl="~/images/menu-scroll-up.gif" 
StaticPopOutImageUrl="~/images/menu-arrow.gif"

7. Test the whole thing.

I think I have not left any step out. In any case, you have the whole projects (in VB and C#, around 28 Kb each zip file) to download, so you can see the idea working for yourself. Finally, using Fiddler2 to request the page ~/default-optimized.aspx, we get the following results:

fiddler request for the optimized-menu page

The file size is 3821 bytes (the original was 18214): that is a 79% reduction in size! Even better than we expected, because the size of the __VIEWSTATE has been reduced too (since the <asp:Menu> control no longer lives in the page). Of course, menu-js.aspx still needs to be downloaded, and its size is 9153 bytes (in our example), but thanks to client-side caching this file only needs to be downloaded once an hour (Response.Cache.SetExpires(DateTime.Now.AddMinutes(60))).
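For completeness, that client-side caching comes down to a few lines in the code-behind of menu-js.aspx (a sketch; only the one-hour expiration is taken from the text above, the other two calls are reasonable companion settings):

// menu-js.aspx code-behind: let the browser keep the generated javascript for an
// hour, and vary the cached copy per User-Agent since the menu renders differently.
Response.Cache.SetCacheability(HttpCacheability.Public);
Response.Cache.SetExpires(DateTime.Now.AddMinutes(60));
Response.Cache.VaryByHeaders["User-Agent"] = true;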

Another advantage of having the menu rendered in a different file is that the PageRank of any of your pages is no longer diluted among all the other pages through the links in the menu. This way your pages have far fewer outgoing links: only those in the masterpage (which you can easily set to rel="nofollow") and the real links inside your content. No more outgoing links from every page to every other page just because of the menu.

An alternative to my approach for improving the performance (and compatibility) of the <asp:Menu> control is the use of CSS Friendly Control Adapters. However, in that case the menu is still rendered inside the page (not in a separate request). Their improvement roughly halves the html used to render the menu, by using CSS and <ul> tags instead of <table> tags. Although that is an improvement (mostly in compatibility), the gain we achieve with our approach is much bigger, since we strip all menu-related html out of every page and place it in another file; thanks to client-side caching, that file is only requested once per client/connection. Maybe a hybrid solution would be the best: using CSS Friendly Control Adapters and placing the menu-related html in a different page, but that has not been done yet. For now you will have to make up your mind for one or the other; you cannot have the best of both in a single solution.

I hope you find this article useful, and I would be glad to hear your comments about this approach to the problem.

2009-03-23 Update: I have just installed IE8 and checked the well-known issue of dropdown menus appearing as blank boxes. Unfortunately, my solution for <asp:Menu> optimization showed the same behavior, but it has already been fixed in both projects (VB and C#). For more info see my post IE8 breaks asp menu control.

2009/03/21

Internet Information Services IIS optimization

It has been a long time since my last post. For the last 8 months I have been working on IIS-based web pages, ASP.NET 3.5, master pages, and so on.

When I thought that most of the work was almost done (masterpage design, CSS/HTML editing, linking between pages, and so on), I faced the other side of the problem: SEO, page size optimization, download times, conditional GETs, meta tags (title, keywords, description), page compression (gzip, deflate). The biggest part of the iceberg was under the water; I had a lot to learn, and a lot of lines to code.

Now all those things are in place and running, so I want to share what I have learnt with the community, in a series of posts that will cover:

  • ASP.NET menu control optimization; to reduce the page size and increase download speed; desirable to have in place before using conditional GETs.
  • __VIEWSTATE size minimization; in our case it simply doubled the size of the page. A proper optimization can make the page half the size (or less).
  • Conditional GET and ETag implementation for ASP.NET; generation of ETag and Last-Modified headers, when and how to return 304 – Not modified with no content (saves bandwidth and increases responsiveness of your site).
  • Solve the CryptographicException: Padding is invalid and cannot be removed when requesting WebResource.axd; this problem is somewhat common but you will fill your EventLog with these errors if you start using conditional GETs.
  • Automatic generation of meta tags: meta title, meta description, meta keywords; this way the editing of pages will be much simpler and faster.
  • URL canonicalization with 301 redirects for ASP.NET; solve problems of http/https, www/non-www, upper/lower case, dupe content indexing among others.
  • Serve different versions of robots.txt: depending on whether it is requested via http or https, you can serve different contents.
  • Enforce robots.txt directives; detect badly-behaved bots that do not follow the rules in robots.txt, ban them for some months and prevent them from wasting our valuable bandwidth.
  • Distinguish a genuine Googlebot crawl from someone pretending to be Googlebot (or any other well-known bot), in order to ban the pretenders for a while.
  • Set up honey-pots that are excluded in robots.txt and ban anyone visiting those forbidden URLs; very effective against screen-scrapers, offline explorers, and so on.

Since we use Google Webmaster Tools and Google Analytics for all our websites, we had the opportunity to check the consequences of every change. For instance, here is the graph that shows the decrease in the number of Kb downloaded per day when we enabled http compression and put conditional GETs in place. Note how the number of crawled pages stays more or less the same during the period, while the Kb downloaded per day slides down past mid-January (the peaks match several masterpage updates).