
Learn more about the SharePoint BLOB cache, page output cache, and object cache.

Microsoft SharePoint Server 2010 can be used to build a variety of business solutions, from collaboration portals and records archives to Internet sites. Whichever option you choose, you will care about the solution's speed, and understanding how its caches work is well worth the effort. The main task of a cache is to display your portal to end users faster. But every coin has two sides, so you need to know both the advantages and the disadvantages of each type of cache.

In this article we will talk about three types of cache. Each of them offers unique functionality that can help your SharePoint Server perform better. However, caching is not a panacea: each type of cache has its own trade-offs, and it is far from certain that every type will suit your specific scenario. Thoughtlessly enabling a cache without configuring it correctly will most likely not produce the expected performance improvement.

Any SharePoint Server installation consists of an instance of Microsoft SQL Server and at least one Web Front-End (WFE) server. When users request data from SharePoint Server (for example, a page or document), the WFE server retrieves all the necessary data from SQL Server and uses it to process the user's request. Although this ensures that the user receives the most up-to-date information, it also increases traffic between the SQL and WFE servers, which in turn slows the response for the end user.

The SharePoint Server caches run on the Web Front-End servers. Each type of cache stores a local copy of data so that, whenever possible, clients are served from the local cache, reducing both the amount of data transferred from the SQL server and the load on the WFE's own processors.

BLOB cache.

The BLOB cache reduces the load on SQL Server by storing the contents of the requested files (mostly parts of the page like JavaScript, CSS, and images) on the WFE server's hard drives. When a new request comes in for a file that has already been cached, the BLOB cache returns the file from disk instead of calling SQL Server.

When you develop SharePoint websites, there are several places to store page content. Files can be stored on the WFE server's file system (usually in the _layouts directory) or in a SharePoint library. Files stored in the _layouts directory can be read from disk fairly quickly, but if they need to be updated, the administrator must change them on each WFE server. Storing files in a SharePoint library has its benefits: not only farm administrators but also users can add and update content. But since everything stored in a library lives in SQL Server, retrieving it is slower. So storing a file in SharePoint and using the BLOB cache gives you both fast access to the content and the possibility of centralized management.

But there are also nuances. When a new file is added to the cache, the BLOB cache makes five times more requests to the SQL server than it would with the BLOB cache disabled. These additional calls retrieve permission information and other metadata to ensure safe and reliable operation of the cache. In addition, to avoid returning out-of-date content to the client, the BLOB cache removes files from the cache if there is a possibility that they have become outdated. Naturally, such a file will then be re-cached, which again generates calls to SQL Server.

In addition to reducing hits to the SQL server, the BLOB cache helps reduce page reload time by adding control headers to the HTTP response for the files it serves. These headers tell the user's browser to store these files in the browser's cache. When the browser needs one of the cached files, it can use that cache instead of going to SharePoint Server. This leads to a significant reduction in HTTP requests and page load time.

As already mentioned, the BLOB cache is especially useful when caching large multimedia files. SharePoint itself is optimized for working with small files: it reads files in chunks of FileReadChunkSize (100 KB) per request, and files up to LargeFileChunkSize (5 MB) are served directly from SQL Server with low latency and without disk buffering. Files larger than 5 MB are buffered on the WFE server's disk before being returned to the user; this saves memory but adds latency. The BLOB cache can reduce latency in this situation: once a file is in the BLOB cache, it is returned just as quickly as if it were hosted directly on IIS.
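To make the thresholds concrete, here is a small Python sketch. The constant values mirror the FileReadChunkSize and LargeFileChunkSize defaults quoted above; the function names are ours for illustration, not a SharePoint API:

```python
# Rough illustration of how SharePoint's chunked reads scale with file size.
import math

FILE_READ_CHUNK = 100 * 1024        # FileReadChunkSize: 100 KB per read from SQL Server
LARGE_FILE_CHUNK = 5 * 1024 * 1024  # LargeFileChunkSize: above this, the WFE buffers to disk

def sql_reads_needed(file_size_bytes: int) -> int:
    """Number of FileReadChunkSize-sized reads needed to pull a file from SQL."""
    return math.ceil(file_size_bytes / FILE_READ_CHUNK)

def buffered_on_disk(file_size_bytes: int) -> bool:
    """Files above LargeFileChunkSize are buffered on the WFE disk first."""
    return file_size_bytes > LARGE_FILE_CHUNK

print(sql_reads_needed(5 * 1024 * 1024))    # a 5 MB file takes 52 chunked reads
print(buffered_on_disk(500 * 1024 * 1024))  # a 500 MB video is disk-buffered: True
```

Once such a file sits in the BLOB cache, none of these reads recur on subsequent requests.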

Another advantage of the BLOB cache is that it allows an HTTP request for a portion of a file instead of the entire file. For example, if the browser only needs 1 MB of a 10 MB file, it can make a request and get just that 1 MB from the cache. When the BLOB cache is disabled, SharePoint Server ignores such requests (known in the English documentation as HTTP range requests) and returns the requested file in full. In other words, the BLOB cache improves network performance by minimizing network load.
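Conceptually, serving a range request means returning only the requested byte window plus a Content-Range header. A simplified Python sketch of that idea (single-range "bytes=start-end" headers only; this is an illustration, not SharePoint's actual implementation):

```python
# Serve only the requested byte window of a cached file, as an HTTP
# 206 Partial Content response would.
def serve_range(data: bytes, range_header: str):
    unit, _, spec = range_header.partition("=")
    assert unit == "bytes"
    start_s, _, end_s = spec.partition("-")
    start = int(start_s)
    end = int(end_s) if end_s else len(data) - 1  # open-ended range: to EOF
    body = data[start:end + 1]
    content_range = f"bytes {start}-{end}/{len(data)}"
    return 206, content_range, body  # 206 = Partial Content

file = bytes(10 * 1024 * 1024)  # pretend this is a cached 10 MB file
status, header, body = serve_range(file, "bytes=0-1048575")
print(status, header, len(body))  # 206 bytes 0-1048575/10485760 1048576
```

The client receives 1 MB instead of 10 MB, which is exactly the saving described above.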

Client media players benefit the most from such partial HTTP range requests. Whether it is Windows Media Player or a Silverlight player embedded in a web page, when you move the video's seek slider, the BLOB cache returns the required part of the file without the client downloading it completely.

Logical architecture and layout.

The BLOB cache runs on every WFE server in the farm. More precisely, each web application and each virtual server has its own BLOB cache. Here, a virtual server means an IIS Web Site, but in SharePoint Server each web application is, as a rule, associated with one virtual server. Only one BLOB cache instance can run on a virtual server at a time. This means that the BLOB cache cannot be used with a Web Garden. (A web garden is an application pool that uses more than one worker process, that is, more than one w3wp.exe process, to handle requests.)

If the SharePoint web application is extended, which is usually done to use different authentication methods for one portal, the second virtual server is handled by its own instance of the BLOB cache. The BLOB cache is therefore enabled for each zone separately. For example, data requested by internal users may be cached while data requested by external users (via an external URL) is not. And although the content provided to external and internal users is identical, having two instances of the cache cannot be avoided.

Cache filling mechanism.

Files with certain extensions end up in the BLOB cache as users request them. The extension list is customizable and can be configured for specific tasks. The first time a file is retrieved through the BLOB cache, small files may see a slightly longer delay than a typical SharePoint retrieval. Large files, on the other hand, are served faster thanks to the optimizations the BLOB cache performs: caching begins as soon as the first bytes are read from SQL Server, and data is returned to the client while the rest continues to load from the database server. Naturally, this applies only to the first request, since on subsequent requests the data is served directly from the BLOB cache.

The BLOB cache can handle multiple requests for a single file by making the data in the cache available to all of them, even if the file has not yet been completely retrieved from SQL Server. For example, suppose a link to a 500 MB video report stored on SharePoint Server is e-mailed to company employees. If a large number of users click the link at the same time with the cache disabled, many queries (one per user) will be made to SQL Server, and it is not hard to guess how that affects performance. With the cache enabled, the video is retrieved from SQL once by each WFE server, and even before it is fully cached it is used to serve all requests. The conclusion suggests itself: the BLOB cache is essential for serving large files from a SharePoint server.

Data storage and disk cache size.

Since you should never edit any of the cache files manually, understanding how the BLOB cache stores data on disk is useful mainly from a theoretical point of view. The BLOB cache stores its files on disk in a structure that mirrors the structure of your portal. For example, a file on the portal with the URL http://contoso/sites/publishing/documents/somefile.jpg will be stored on disk at approximately the following path: c:\BlobCache\14\11111111\AB25499AF39572\sites\publishing\documents\somefile-1238DEF8097AB.jpg. The path contains random string fragments to prevent a newer version of a file from overwriting the old one, because the old file may still be in use at that moment. The name of the host where the file is located is replaced with a unique string, which prevents caching conflicts between two files with addresses like http://contoso/images/logo.jpg and http://northwinds/images/logo.jpg.

Windows limits file paths to 260 characters. Since the BLOB cache adds extra unique strings to the cache file paths, it is quite possible that writing a file to disk will exceed this limit. You should therefore avoid excessively long URLs on your SharePoint portal: to ensure files can be cached normally, keep portal links under about 160 characters.

In addition to disk space, the BLOB cache requires a small amount of RAM to maintain an index of files on disk. Each index entry uses about 800 bytes of memory. In most cases, the memory consumed by the BLOB cache will be a small portion of the total memory consumed by SharePoint. However, if the BLOB cache needs to store hundreds of thousands of files, then the memory requirements will need to be planned with the above in mind.
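The arithmetic is easy to check. A back-of-envelope Python estimate at the quoted ~800 bytes of RAM per index entry (the function name is ours, for illustration):

```python
# Estimate BLOB cache index memory at ~800 bytes per cached file.
BYTES_PER_ENTRY = 800

def index_memory_mb(cached_files: int) -> float:
    """Approximate index footprint in megabytes."""
    return cached_files * BYTES_PER_ENTRY / (1024 * 1024)

print(round(index_memory_mb(10_000), 1))   # ~7.6 MB: negligible
print(round(index_memory_mb(500_000), 1))  # ~381.5 MB: worth planning for
```

At tens of thousands of files the index is trivial; at hundreds of thousands it becomes a real line item in capacity planning, as the paragraph above warns.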

BLOB cache persistence when restarting the application pool.

The BLOB cache is the only persistent cache, meaning that it survives restarts or shutdowns of the IIS application pool. This is possible because the index is periodically written to disk. A serialized index is approximately one-third the size of the in-memory index, and like all I/O operations, its size affects how long serialization and deserialization take. A very large BLOB cache can contain hundreds of thousands of elements, so serializing them into the index can take more than a minute. While serialization is in progress, new items cannot be added to the cache; if requests arrive for files that are not yet cached, the clients must wait until serialization completes. If the index is extremely large (millions of objects), the serialization time may exceed the client request timeout and the request will be discarded.

Cache checking mechanism.

The BLOB cache cleans up outdated cached files by polling SharePoint Server for changes. The default polling interval is five seconds, but this parameter can be configured. The file is actually deleted somewhat later (this interval is also configurable), after any HTTP sessions using it have ended. Outdated and deleted files are not re-added to the cache automatically; they are added the next time a user requests them. When content on a SharePoint site changes, the BLOB cache can therefore churn quite rapidly. The following table shows file operations and their impact on the BLOB cache.

The maximum size of the BLOB cache is also configurable, to avoid wasting free disk space. When the total size of the cached files exceeds the established limit, the BLOB cache removes the least-used files until the total drops to 70% of the allowed size. This process is called compaction. Compaction is quite an "expensive" process in terms of performance, because removed files may have to be cached again. Running compaction periodically gets rid of "unpopular" files and frees space for more frequently used ones, but frequent compaction simply indicates a lack of cache space. You can see how often this operation runs using the "Total number of cache compactions" counter in the SharePoint Disk-Based Cache group. Providing extra space for a frequently compacting cache is a good decision; ideally, the cache should be large enough to hold all popular content.

Another way to delete cached files is to reset the cache. When the cache is reset, a new folder is created while the old cache remains, which allows existing requests against the old cache to complete; the old cache is deleted later, after a configurable interval. The cache can be reset for several reasons: the index cannot be deserialized correctly at startup, the user policy for the web application has changed, or the content database cannot be read. The cache can also be flushed manually by calling the Microsoft.SharePoint.Publishing.PublishingCache.FlushBlobCache() function from PowerShell.
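For reference, such a manual flush might look like the following PowerShell sketch, run in the SharePoint 2010 Management Shell on a farm server (the web application URL is an example; substitute your own):

```powershell
# Flush the BLOB cache for one web application (example URL).
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
$webApp = Get-SPWebApplication "http://contoso"
[Microsoft.SharePoint.Publishing.PublishingCache]::FlushBlobCache($webApp)
```

After the flush, files are re-cached on demand as users request them, so expect a temporary spike in SQL Server traffic.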

Authentication and BLOB cache.

The BLOB cache is optimized for returning files anonymously. When a file available to anonymous users is requested, the BLOB cache returns it before authentication is even attempted.

The benefits of this operating principle can be obtained in two cases.

1. Anonymous access to the site is allowed

2. Frequently requested files are stored in libraries that have the AllowEveryoneViewItems option enabled.

When a portal is created from the Publishing Portal template, two libraries are created with AllowEveryoneViewItems already set: the "Images" and "Site Collection Images" libraries. In any case, even if anonymous access is not used, the BLOB cache still works, but the WFE server has to contact the SQL server to check user permissions (the ACL).

To be continued….

MCT/MVP Ilya Rud

Based on the document “SharePoint Server Caches Overview”

If, after a configuration update, your forms are displayed incorrectly, reports stop working, and error windows pop up, the problem can most likely be solved by clearing the cache. We'll tell you how.

What is cache?

The 1C:Enterprise program is built to constantly optimize the speed of its operations. For this purpose, a "cache" is created on the user's computer to store frequently used information, for example: the position and size of windows, user service data, selection settings, fonts, etc.

Caching reduces the number of calls to the server and thereby speeds up the program. This mechanism saves time, but it also brings a number of problems.


How to clear cache?

There are several ways to clear the cache.

1. Launching the 1C database using the “/ClearCache” parameter

This method is very simple. In the infobase selection window, select the infobase whose cache you want to clear and click the "Edit" button.

In the last window of the infobase editing dialog, set the launch parameter "/ClearCache". Click "Finish" and launch the infobase.

As a result of these steps, the client-server request cache is cleared. This means that if the problem lies in the local metadata cache, this method will not help. It is also important to understand that the temporary files folder is "unlinked" from the infobase but not deleted from your computer.
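The same parameter can also be passed on the command line when launching 1C directly. A sketch (the installation path and infobase name are examples; substitute your own):

```
"C:\Program Files\1cv8\common\1cestart.exe" ENTERPRISE /IBName "Accounting" /ClearCache
```

This does exactly what the dialog steps above do: the base starts once with the client-server request cache cleared.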

2. Clearing the 1C cache manually

To delete the cache files manually, you need to find the folders where the cache is stored. In Windows 7 and later, the temporary files are stored at:

  • C:\Users\Username\AppData\Roaming\1C and C:\Users\Username\AppData\Local\1C, in folders whose names start with "1cv8".
  • In Windows XP, in the user's folder at Local Settings\Application Data\1C\.
  • If the AppData folder is not visible, then you need to configure the visibility of hidden folders.

The figure below shows what the cache files look like: folders with long, unintelligible names. In our case there is only one such folder.

To clear the cache, you need to delete these folders.

Important! You can delete folders only when the processes of working with 1C:Enterprise are completed.
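If you prefer to check before deleting anything, here is a cautious Python sketch that only lists candidate cache folders under the paths above (names starting with "1cv8"); actually deleting them is left to you, once all 1C processes are closed:

```python
# List 1C cache folders (names starting with "1cv8") under the Roaming and
# Local profile paths described above. Nothing is deleted.
import os

def find_1c_cache_dirs(profile_dir: str) -> list[str]:
    found = []
    for sub in ("AppData/Roaming/1C", "AppData/Local/1C"):
        base = os.path.join(profile_dir, *sub.split("/"))
        if not os.path.isdir(base):
            continue  # path absent on this machine
        for name in os.listdir(base):
            path = os.path.join(base, name)
            if name.lower().startswith("1cv8") and os.path.isdir(path):
                found.append(path)
    return found

for path in find_1c_cache_dirs(os.path.expanduser("~")):
    print(path)
```

Review the printed list, close all 1C:Enterprise processes, and then remove the folders by hand (or extend the script to do so).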

3. Clearing the cache in 1C on a server or user’s PC using ready-made scripts

On the Internet you can find ready-made scripts for cleaning temporary 1C files. The use of such scripts can lead to unpredictable consequences, so it is recommended only for system administrators and technical support staff.

This method can clear the 1C cache on both the client and the server. To do this, you will need access to the corresponding server folders.

4. Additional

If, after using the above methods, an error such as "Invalid data storage format" still occurs, it is recommended to stop the server and manually clear the reg_1541/SNCCNTX folder. It is located on the computer of the central 1C:Enterprise server in the directory <cluster working directory>\<infobase identifier>.


Be careful, not everything in this folder can be cleaned. I will list what can be cleaned:

  • 1CV8Reg.lst – the cluster registry (it stores the list of registered infobases, working servers and processes, the correspondence between the cluster and the additional manager, and the list of administrators).
  • srvribrg.lst – the list of clusters (registered clusters and central server administrators).
  • 1cv8ftxt – full-text search data, located on the central 1C server in <cluster working directory>\<infobase identifier>.
  • 1Cv8Log – the database registration log (*.lgp and *.lgf files).

It is important to keep in mind that after clearing the cache, the launch of 1C will slow down a little.

We have already seen that caching and RAM play a key role in the scalability and performance of a site.

The site can store data to speed up processing of subsequent requests at four levels:

  • client;
  • network;
  • server;
  • application level.

Different pages on a website often share the same resources, and the user should be able to reuse them during navigation. Images, scripts, and styles can be cached for months, while the document page itself can be cached in the client browser for minutes.

Client level cache

HTTP headers determine whether a response can be cached and how long the data will be retained. The following Cache-Control header, for example, specifies that the response may be cached for 7 days. The browser will re-request the data if the storage period expires or if the user deliberately refreshes the page.

A request and response that can be cached for 604800 seconds.
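Since the original screenshot is not reproduced here, a tiny Python sketch of the arithmetic behind that number and the resulting header:

```python
# 604800 seconds is exactly seven days; this is the value a browser reads
# out of the max-age directive below.
seconds = 7 * 24 * 60 * 60
print(seconds)                                 # 604800
print(f"Cache-Control: public, max-age={seconds}")
```

Until those 604800 seconds elapse, the browser may serve the response from its cache without contacting the server at all.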

The response may also include a Last-Modified or Etag header. These headers are needed to check whether the data can be reused. A 304 response status indicates that the content has not changed and a re-upload is not required. Note the paired Last-Modified and If-Modified-Since headers, as well as the dates below:

A response with a “Last-Modified” header followed by a request using it.

The ETag header is used together with If-None-Match in a similar way to detect changes in content, if any.
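A simplified Python model of this revalidation exchange (the ETag values are invented for illustration):

```python
# Model of a conditional GET with ETag / If-None-Match: when the client's
# stored validator still matches, the server answers 304 with no body;
# otherwise it sends 200 and the full content.
def conditional_get(current_etag, body, if_none_match):
    if if_none_match == current_etag:
        return 304, b""   # Not Modified: the client reuses its cached copy
    return 200, body      # changed (or first visit): full response

status, payload = conditional_get('"v2"', b"<html>...</html>", '"v2"')
print(status, len(payload))  # 304 0
status, payload = conditional_get('"v2"', b"<html>...</html>", '"v1"')
print(status, len(payload))  # 200 16
```

The 304 path is what saves bandwidth: only headers travel over the wire, never the body.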

A site with well-thought-out HTTP headers will feel more responsive to its users, and the browser will save time and bandwidth.

Network level cache

Clients requesting the same content from the proxy server.

Multiple clients requesting the same content at the same time.

This simple yet powerful mechanism prevents the application from being swamped when a large number of requests arrive just as content expires.

Last but not least, a proxy server can improve the application's fault tolerance. The proxy_cache_use_stale directive provides flags for delivering expired content when the application returns an error status or when communication between the proxy server and the application does not work as expected.
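As an illustration, a hypothetical nginx fragment combining these ideas (the cache zone name `app_cache` and upstream name `app_backend` are our assumptions, not from the original article):

```nginx
location / {
    proxy_cache app_cache;
    # Serve a stale cached copy while the backend errors out or times out.
    proxy_cache_use_stale error timeout http_500 http_502 http_503 http_504;
    # Collapse concurrent misses into a single upstream fetch.
    proxy_cache_lock on;
    proxy_pass http://app_backend;
}
```

With this in place, a brief backend outage degrades into slightly stale responses instead of user-visible errors.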

Another important consideration when using cache stores is the race condition that occurs when different instances of an application access uncached data at the same time. The Rails caching API, for example, includes a race_condition_ttl option to minimize this effect.

Anticipating race conditions for caches with multiple application instances is challenging. The optimal solution in this case is to update the cache data outside the application thread and use the cached data in the application itself. In a microservice architecture, you can secure the communication between the application and the service using nginx, as described above.

Conclusion

We hope this article helps you understand and choose the best strategy for your application. HTTP headers are the simplest thing you can and should configure to optimize your application's caching. Use the other strategies when you run into specific performance problems, but remember that premature optimization is the root of all evil.

04/17/1999 Phil Keppeler

IP cache servers are expected to be in high demand in the enterprise market in 1999. Below we will look at the latest offers from manufacturers. In contrast to the bandwidth of global networks, memory has become much cheaper.


According to IDC research, the overall price level for wide-area networks will remain the same or, at best, decrease slightly. Meanwhile, the cost of memory falls by 31.4-39.8% annually.

Given these facts, IP caching becomes attractive as a way to optimize bandwidth utilization and improve network efficiency. Keeping frequently accessed files closer to end users reduces an enterprise's bandwidth requirements for wide-area network or Internet connections and, as a result, eliminates or delays the need for costly upgrades. It also improves end-user productivity, because objects are delivered at LAN speeds.

The Internet community knew about the benefits of caching long before the Internet became the commercial phenomenon it is today. Typically, file archives for Internet services such as FTP, Gopher, and newsgroups were mirrored around the world to keep popular files as close to users as possible. With the advent of HTTP, mirroring became ineffective because of the sheer volume, time sensitivity, and unpredictable nature of the requested content.

IP cache servers are to HTTP what mirroring was to the archive protocols. All cache servers are based on essentially the same principle: they intercept requests for objects from the browser to the Web server and store the objects received from the server on their hard drives before passing them to the browser. On subsequent requests for the same object from other browsers, the cache server then returns a copy of the object from its storage instead of passing the request on to the Web server for the original object. Ideally, having a cache server fulfill requests for objects saves both time and bandwidth. (A more detailed description of caching technologies can be found in the article "Small cache is expensive" in LAN No. 3 of this year.)
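The principle can be sketched in a few lines of Python (a toy model, not a real proxy: `fetch_from_origin` stands in for the HTTP round trip to the Web server):

```python
# Toy model of a cache server: serve a stored copy on a hit, and fetch
# from the origin server only on a miss.
class CacheServer:
    def __init__(self, fetch_from_origin):
        self.fetch = fetch_from_origin
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self.store:
            self.hits += 1
            return self.store[url]  # served from local storage
        self.misses += 1
        obj = self.fetch(url)       # only misses reach the origin server
        self.store[url] = obj
        return obj

origin_calls = []
cache = CacheServer(lambda url: origin_calls.append(url) or b"<page>")
cache.get("http://example.com/")
cache.get("http://example.com/")  # second request is a cache hit
print(cache.hits, cache.misses, len(origin_calls))  # 1 1 1
```

Two browser requests, one origin fetch: that difference, multiplied over thousands of users, is the bandwidth saving the article describes.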

Under pressure from both consumers and content providers, Internet providers have become the primary users of IP caching. Faster connections such as Digital Subscriber Line (DSL), ISDN, and cable modems offer hope that the once weakest link in the data chain, the standard telephone modem with its maximum data rate of 56 Kbit/s, will be eliminated. As Internet connections speed up, the volume of objects copied will increase proportionally, which will increase traffic on the Internet backbone. At the same time, content providers are moving to more complex and high-volume data formats, such as streaming audio/video and Java applets.

As a result of this squeeze from both sides, Internet providers are forced to look for more effective ways of using their infrastructure to meet user requirements. IP caching was, and remains, an important part of their solution.

Although many Internet providers recognize the benefits of IP caching, enterprises have yet to implement the technology on a large scale. According to a February 1998 Collaborative Research report, about 80% of Internet providers in the United States have announced plans to implement caching within the next six months. On the other hand, only 56% of companies planned to start using caching within the same period. However, as experts predict, in 1999 caching will be in high demand in the corporate market. According to Collaborative Research, enterprise caching technology investment is expected to quickly outpace that of Internet providers, growing from $85 million in 1998 to over $1 billion in 2000 (see table).

World market for caching products, 1998-2002 (millions of dollars)
Market segment       1998   1999   2000    2001    2002
Corporate users        85    421   1,113   2,108   3,157
Internet providers    103    214     376     481     576
Other                  19     63     149     259     373
Total                 207    698   1,638   2,848   4,106
Source: Collaborative Research, 1998

This trend has not gone unnoticed by manufacturers, and they are actively reorienting their products toward corporate clients. Having initially produced high-end products for Internet providers, manufacturers began to include offerings in their product lines at a relatively low price and with a level of performance sufficient for companies. In addition, a dozen new vendors have announced or released cache servers based on industry-standard hardware and software (for example, Intel-based servers with the free Squid caching software) with the goal of offering products as cheaply as possible.

A PROXY AS A CACHE?

The first cache servers were usually implemented on the basis of proxy servers. As such, they acted as object brokers for a group of users, accepting all requests and passing them on to the destination on the Internet. As a common access point for all users, proxy servers have proven to be extremely attractive for implementing a variety of additional services: content filtering, user identification, event logging, and object caching. Together with a firewall, the proxy server made it possible to create a secure connection to the Internet.

One of the first cache-enabled proxies was the Harvest Cache software server, the result of a joint project funded in 1994-1996 by the Advanced Research Projects Agency (ARPA), the National Science Foundation (NSF), and NASA. Since then, at least a dozen products have been marketed as "caching proxies." Notably, Netscape Communications, Microsoft, and Novell all have cache-enabled proxy servers that are tightly integrated with their other enterprise tools. In addition to caching, their products offer a wide variety of proxy functions such as user authentication, content filtering, virus scanning, security, and event logging. Microsoft's Proxy Server runs on Windows NT 4.0; Netscape's Proxy Server on most varieties of UNIX as well as Windows NT; and Novell's BorderManager FastCache on IntranetWare, NetWare 4.11, and NetWare 5.

Another widely used caching proxy is Squid, a free extension of Harvest Cache developed by the National Laboratory for Applied Network Research (NLANR). Perhaps because it emerged as the product of a collective effort in an environment where standardized, accepted software is welcomed and widely used, Squid established itself in the Internet provider market and continues to have a relatively strong installed base.

Configurations with caching proxies have two main disadvantages. First, because each user's browser must be configured to go through the proxy, a server failure causes all users to lose their connection to the Internet. Second, configuring every user's browser with the proxy information can be labor-intensive in large enterprises and is essentially an impossible task for an Internet provider.

To avoid these issues with proxy configurations, you can implement transparent caching on your network by installing a policy-enabled router or Layer 4 switch to forward traffic to a cache server or group of servers. These devices intercept all HTTP traffic on port 80 and redirect it to the cache. The cache executes the HTTP requests and returns the objects to the browser. A truly transparent caching solution must support scalability by balancing the load across multiple cache servers, as well as failover to backup servers if one or all cache servers become unavailable. Examples of Layer 4 switching devices include ACEdirector from Alteon Networks and ServerIron from Foundry Networks.

InfoLibria's DynaCache cache server takes a different approach, providing transparency without a separate switch or router. This is achieved with the DynaLink Redirector (DLR), a dedicated Layer 4 switch that interfaces with DynaCache. The DLR, an integral part of the company's caching strategy, sits on the network and forwards only cache misses to the Internet. According to the company, this strategy can reduce the load on the router by two-thirds.

SOFTWARE VS HARDWARE

In 1997, in a report entitled "Why Caching Matters," Forrester Research predicted that Internet service providers and businesses would migrate from software cache servers to dedicated caching devices. Likewise, Dataquest stated in a July 1998 report that dedicated devices would dominate the caching product market.

It is therefore not surprising that over half a dozen manufacturers released caching devices in 1998. They claim that their products offer better performance than their software counterparts because the operating system and caching server are tightly integrated with each other and optimized for caching. They also claim that their products are easier to set up and configure and provide more secure platforms because they are less likely to create security holes due to administrative or configuration errors. Typically, software caches, such as the caching proxy discussed above, are designed with a proxy focus in mind, while hardware caches are designed solely to support heavy-duty caching. Despite this, many caching devices can be used in proxy configurations.

Network Appliance was one of the first to offer a dedicated caching appliance, adapting its NetCache software into a hardware product. Network Appliance acquired the NetCache software (and Peter Danzig, one of the chief architects of the Harvest project, into the bargain) along with a small start-up company, Internet Middleware.

Other caching devices introduced in 1998 include Cisco Systems' Cache Engine, CacheFlow's CacheFlow, and InfoLibria's DynaCache. While not strictly a dedicated device, Sun Microsystems' Netra Proxy comes preconfigured on an UltraSPARC II computer. It contains Sun's caching software and is optimized for these functions.

More recently, relatively inexpensive caching devices have appeared on the market. They are based on standardized hardware and software and are preconfigured server devices designed to make caching simpler and more affordable. This approach may be attractive to small companies or even large corporations that want to take advantage of the benefits of workgroup caching but are hesitant due to the high cost and complexity of available solutions. The price of these products hovers around $2,000, while the above-mentioned solutions cost at least $7,000.

Three examples of low-cost caching devices are Packetstorm Technologies' WebSpeed and Cobalt Networks' CacheQube and CacheRaQ. WebSpeed retails for between $2,100 and $7,100 depending on cache size. It uses Intel processors, the free Linux operating system, and the Squid caching software. The company is betting that customers will appreciate a low-cost, preconfigured device that they can install in their networks with minimal effort. Cobalt Networks' CacheQube and rack-mounted CacheRaQ scale through DRAM capacity and disk space, as well as by clustering several devices. The CacheQube costs $1,899, and the CacheRaQ costs $2,299 or $2,799 depending on configuration.

In an attempt to counter experts' predictions that dedicated caching appliances will dominate the market, Inktomi has released Traffic Server, which the company positions as a high-performance caching solution aimed primarily at Internet service providers and large enterprises. Unlike other software caches, which emphasize mediation and protection features as much as caching, Traffic Server focuses squarely on caching. At $30,000 per CPU, it is also priced as a carrier-grade product.

COMPATIBILITY AND STANDARDS

Dating back to early caching research in the Harvest Project, the Internet Cache Protocol (ICP) defines how multiple IP cache servers can share information about the freshness of Web objects and how they can retrieve objects from other caches (as opposed to retrieving them from the origin Web server). With ICP, administrators can configure a cache to query other ICP-capable cache servers to see whether they have more recent information about a Web object. For example, a local cache might ask an upstream cache whether it holds a newer copy of a file and, if not, whether it has recently checked the file's age against the origin server. Even if the upstream cache has no newer version of the file, it may have recently verified that the file has not been modified on the origin server. Depending on the update algorithm, the local cache can then either retrieve a newer version of the object from the origin server or use its local copy instead (see figure).

Polling an upstream cache introduces additional latency; in many cases, however, the time savings are significant, since the request does not have to travel all the way to the server holding the original object. In addition, serving objects from ICP-connected servers located close to the requester reduces the load on the Internet backbone, freeing up bandwidth for the Internet community as a whole. Almost all caching solutions today support ICP.
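The queries themselves are small UDP datagrams whose layout is fixed by RFC 2186. As a rough sketch of what a cache puts on the wire when it polls a peer (the URL here is purely illustrative), the following builds an ICP_OP_QUERY packet and reads the opcode of a peer's reply:

```python
import struct

# ICP opcodes and version number from RFC 2186
ICP_OP_QUERY, ICP_OP_HIT, ICP_OP_MISS = 1, 2, 3
ICP_VERSION = 2

def build_icp_query(url: str, request_number: int) -> bytes:
    """Build an ICP_OP_QUERY packet: a 20-byte header, then a 4-byte
    requester host address and the null-terminated URL."""
    payload = struct.pack("!I", 0) + url.encode("ascii") + b"\x00"
    length = 20 + len(payload)  # total message length, header included
    header = struct.pack(
        "!BBHIIII",
        ICP_OP_QUERY,    # opcode
        ICP_VERSION,     # protocol version
        length,          # message length
        request_number,  # lets the sender match replies to queries
        0, 0, 0,         # options, option data, sender host address
    )
    return header + payload

def reply_opcode(reply: bytes) -> int:
    """First byte of a peer's reply: ICP_OP_HIT or ICP_OP_MISS."""
    return reply[0]
```

A cache would send such a datagram (conventionally on UDP port 3130) to each configured peer, fetch from a peer that answers ICP_OP_HIT, and fall back to the origin server on misses or a timeout.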

Similar to ICP, the Cache Array Routing Protocol (CARP) is a protocol for distributing the caching load across a local array of servers. It was developed by Microsoft and submitted to the World Wide Web Consortium (W3C) as a proposed Internet standard. In addition to Microsoft, about a dozen other vendors, including Packetstorm Technologies and Sun, have announced support for CARP.

To enable the Cache Engine to communicate with its routers, Cisco developed the Web Cache Communication Protocol (WCCP). Using WCCP, a router running Cisco IOS intercepts HTTP requests coming from browsers and redirects them to a cache server or appliance. WCCP supports scalability by distributing requests across multiple cache servers based on their availability.

In November 1998, Cisco began licensing WCCP to other caching product manufacturers. Inktomi and Network Appliance have announced plans to include WCCP support in future releases of their products.

MARKET INDICATORS

Although there is some controversy regarding the numbers, the market for Internet caching products is expected to grow significantly over the next four years. Collaborative Research projects the market to grow from $206 million in 1998 to over $4 billion in 2002.

Given these numbers, it is not surprising that major software and hardware manufacturers and developers are trying to use their position to penetrate the caching market. For example, with a large installed base of server operating systems, Novell is relying on tight integration of BorderManager with its other products to attract attention from corporate customers.

Like Novell, Microsoft and Sun are vying for dominance in the Internet software and server market. Both have large installed bases of Web servers and position their products, with their accompanying arsenals of intermediary capabilities, as essential components of an integrated Web application environment. Cisco, for its part, has a large installed base of networking devices, and the Cache Engine's tight integration with its other networking components can help drive widespread adoption.

PAY FOR WHAT YOU NEED

When you decide to implement caching on your network, you have a choice of products from free to those costing $100,000 or more. In general, the more expensive the product, the more powerful it is.

At the lower end of the price scale, where software cache servers until recently dominated, you can now also find about a dozen caching appliances. If you use a free product such as Squid, which is available in both source and precompiled form, you will need a computer on which to install it; to avoid unnecessary expense, you can repurpose existing equipment for caching duty.

Netscape, Microsoft, and Novell offer powerful software cache servers with a wide range of mediation features. Their products cost around $1,000 per CPU. As with Squid, the overall cost of the solution can be reduced by using existing hardware; otherwise, the cost of purchasing equipment must be added to the budget.

Phil Keppeler is a Web developer at a design and programming firm. He can be contacted at: [email protected].

Products Reviewed

Microsoft

Netscape Communications

Alteon Networks (ACEdirector)

The National Laboratory for Advanced Network Research (NLANR) maintains a caching FAQ at http://ircache.nlanr.net/Cache/FAQ/ircache-faq-9.html .

Brian D. Davison, Ph.D., of Rutgers University maintains a page of caching resources on his server at http://www.cs.rutgers.edu/~davison/Web-caching/. It contains caching news, a list and comparison table of caching intermediaries, a bibliography, and more.

If you would like to learn more about the Harvest Project, relevant links to research findings, meeting transcripts, and frequently asked questions are available at: http://www.harvest.transarc.com .

CacheNow is an ongoing campaign to promote large-scale caching to address bandwidth shortages and overcome Internet infrastructure limitations. Information about it is available at http://vancouver-Webpages.com/CacheNow/ .




A web cache sits between one or more web servers and one or more clients, watching requests as they pass by and saving copies of the responses, such as HTML pages, images, and files (collectively known as representations; "content" hereafter), for itself. Then, if another request arrives for the same URL, the cache can use the stored response instead of asking the server again.

There are two main reasons why web caches are used:

1. Reduced latency: because the response is served from a cache (which is "closer" to the client), it takes less time to retrieve and display the content. This makes the Web seem more responsive.

2. Reduced network traffic: reusing content lessens the amount of data transferred to the client. This in turn saves money if the client pays for traffic, and keeps bandwidth requirements lower and more flexible.

Types of web caches

Browser cache
If you examine the settings window of any modern web browser (for example, Internet Explorer, Safari, or Mozilla), you will probably notice a cache setting. It lets you set aside an area of your computer's hard drive to store previously viewed content. The browser cache works according to fairly simple rules: it simply checks whether the data is "fresh", usually once per session (that is, once in the current browser session).

This cache is especially useful when the user presses the back button or clicks on a link to see the page they were just viewing. Also, if you use the same navigation images on your site, they will be fetched from the browser cache almost instantly.

Proxy cache
A proxy cache works on the same principle, but on a much larger scale. Proxies serve hundreds or thousands of users; large corporations and Internet service providers often set them up on their firewalls or deploy them as standalone devices (intermediaries).

Since proxies are part of neither the client nor the origin server, but still sit on the network, requests must somehow be routed to them. One way is to use the browser's settings to manually tell it which proxy to contact; another is to use an interception proxy, in which case the network forwards web requests to the proxy without the client needing to configure it, or even know of its existence.
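The first approach, explicitly pointing the client at the proxy, can be sketched in Python; the proxy host below is hypothetical, standing in for what a browser user would type into the connection settings:

```python
import urllib.request

# Hypothetical proxy address; a browser user would enter the same
# host and port in the browser's connection settings instead.
proxy = urllib.request.ProxyHandler({"http": "http://proxy.example.com:3128"})
opener = urllib.request.build_opener(proxy)

# opener.open("http://example.com/") would now send the full request
# to proxy.example.com rather than to example.com directly.
```

An interception proxy needs none of this: a router or switch transparently redirects port 80 traffic to the proxy, which is exactly why clients can remain unaware of it.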

Proxy caches are a form of shared cache: instead of serving a single person, they serve a large number of users, and are therefore very good at reducing latency and network traffic, mainly because popular content is requested many times.

Gateway Cache
Also known as "reverse proxy caches" or "surrogate caches," gateways are intermediaries too, but instead of being deployed by system administrators to save bandwidth, they are typically deployed by webmasters to make their sites more scalable, reliable, and efficient.

Requests can be forwarded to gateways by a number of methods, but typically some form of load balancer is used.

Content delivery networks (CDNs) distribute gateways throughout the Internet (or some part of it) and serve cached content to interested websites. Speedera and Akamai are examples of CDNs.

This tutorial primarily focuses on browser caches and proxies, but some of the information is also relevant for those interested in gateways.

Why should I use it

Caching is one of the most misunderstood technologies on the Internet. Webmasters, in particular, fear losing control of their site because proxies can “hide” their users, making it difficult to monitor traffic.

Unfortunately for them, even if web caches did not exist, there are too many variables on the Internet to guarantee that site owners would get an accurate picture of how users interact with a site. If this is a big problem for you, this guide will show you how to get the statistics you need without making your site cache-unfriendly.

Another problem is that the cache may store content that is out of date or expired.

On the other hand, if you design your website responsibly, a cache can deliver faster loading and keep the load on your server and Internet connection within acceptable limits. The difference can be dramatic: a site that does not cache may take several seconds to load, while one that takes advantage of caching can seem instantaneous. Users will appreciate the fast loading time and may visit more often.

Think of it this way: many large Internet companies spend millions of dollars setting up farms of servers around the world to replicate content in order to make data access as fast as possible for their users. The cache does the same for you and is much closer to the end user.

CDNs, from this perspective, are an interesting development because, unlike many proxy caches, their gateways are aligned to the interests of the website being cached. However, even when you use a CDN, you still have to consider that there will be a proxy and subsequent caching in the browser.

To summarize: proxies and browser caches will be used whether you like it or not. Remember, if you do not configure your site to cache correctly, it will be cached according to whatever default settings the caches apply.

How web cache works

All types of caches have a particular set of rules they use to decide when to serve content from the cache, if it is available. Some of these rules are set by the protocols (HTTP 1.0 and HTTP 1.1), and some by the cache's administrator (the browser user or the proxy administrator).

Generally speaking, these are the most common rules (don't worry if you don't understand the details; they will be explained below):

  1. If the response headers tell the cache not to save them, it won't save them.
  2. If the request is authenticated or secure (that is, HTTPS), it will not be cached by shared caches.
  3. Cached content is considered “fresh” (that is, can be sent to the client without checking from the origin server) if:
    • It has an expiration time or other header that controls the lifetime and has not expired yet.
    • The cache has recently checked the content, and it was last modified quite a long time ago.
    Fresh content is taken directly from the cache, without checking from the server.
  4. If the content is stale, the origin server will be asked to validate it, or to tell the cache whether the copy it has is still usable.
  5. Under certain circumstances (for example, when it is disconnected from the network), a cache can serve stale responses without checking with the origin server.
If the response does not have a validator (ETag or Last-Modified header) and does not contain any explicit freshness information, the content will usually (but not always) be considered uncacheable.
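Under stated assumptions (only the Cache-Control: no-store and max-age directives, the Age header, and Expires are considered; real caches implement far more of HTTP), the freshness side of these rules can be sketched as:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def is_fresh(headers: dict, now: datetime) -> bool:
    """Simplified freshness check: True means the cached response may
    be served without contacting the origin server."""
    cache_control = headers.get("Cache-Control", "")
    if "no-store" in cache_control:           # rule 1: told not to save it
        return False
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name == "max-age" and value.isdigit():
            age = int(headers.get("Age", "0"))
            return int(value) > age           # rule 3: explicit lifetime
    expires = headers.get("Expires")
    if expires:
        try:
            return parsedate_to_datetime(expires) > now  # rule 3: expiry date
        except (TypeError, ValueError):
            return False
    return False  # no freshness info: revalidate with the origin (rule 4)
```

A response carrying `Cache-Control: max-age=3600` with `Age: 10` would still be fresh; one with no freshness headers at all falls through to revalidation.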

Freshness and validation are the most important mechanisms by which a cache works with content. Fresh content is available instantly from the cache, while validated content avoids sending the entire response over again if it has not changed.
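Validation itself rides on two request headers. As a sketch, a cache revalidating a stale entry could build them from the headers it stored alongside the response:

```python
def validation_headers(cached_response_headers: dict) -> dict:
    """Headers for a conditional request; the origin server answers
    304 Not Modified if the stored copy is still good."""
    conditional = {}
    if "ETag" in cached_response_headers:
        conditional["If-None-Match"] = cached_response_headers["ETag"]
    if "Last-Modified" in cached_response_headers:
        conditional["If-Modified-Since"] = cached_response_headers["Last-Modified"]
    return conditional
```

On a 304 the cache serves its stored body and refreshes the entry's freshness lifetime; only on a full 200 response does the entire representation travel over the network again.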
