Good wiki pages have good titles, and the title usually ends up in the URL. That's all well and good, but if you're creating pages on a corporate wiki hidden from the outside world, your titles may be giving away too much information.
A few days ago one of my colleagues noticed this post on Twitter. His initial thought was that all our candid discussions about Erlang on Twitter had given us away. But this wasn't the case.
For those non-techies out there, let me explain what a referrer is.
When you click a link on a web page, the destination site is told which page sent it the traffic. The source page is the referrer (or "referer", as it is misspelled in the HTTP spec). There are several useful applications of referrer information that I won't discuss here. Naturally, Wikipedia has a good article on the subject: http://en.wikipedia.org/wiki/Referer
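To make this concrete, here is what a leaked referrer looks like from the destination site's point of view (all names here are made up for illustration). When someone clicks an external link on an internal wiki page, the request that arrives carries the wiki page's URL:

```http
GET /products HTTP/1.1
Host: www.example.com
Referer: http://wiki.office.example.com/wiki/Some_Sensitive_Project_Name
```

That Referer line ends up verbatim in the destination's access logs, internal title and all.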
What would have been worse is if we had a page called wiki.office.aol.com/wiki/Technology_Acquisitions_In_Process and someone at company X noticed that referrer in their access logs. If company X were a public company, that person might run off and buy a pile of stock based on this simple observation.
Now, most wikis are public, and people wouldn't create pages like this in the public space. But more and more companies have internal wikis, and it is becoming common to discuss things on those pages that should not be shared with the public. Rather than having to worry about your page titles giving away too much information, I have created a MediaWiki extension that prevents referrer information from being sent to external sites.
Information on the extension can be found here.
Handles are the key to having applications update a user's profile page without the user taking action. Some notes on how to do this on Facebook are here. Since Bebo has implemented most of the Facebook API this should be easy, but I think some things (like infinite session keys) are still missing. I'm hoping someone will leave me a pointer proving me wrong on this.
By placing a handle on a user's page you can (in theory) push updates to a user's profile page without them interacting with the application. For example, if you had a "Washington Capitals Fans" application that pushed news to the profile of every user who had the application, all those pages would share the same handle. Having a single handle means a single update of the handle content updates all profiles using the application.
In other cases you might want a unique handle per profile. For example, if your application injected an RSS feed of the user’s choice onto their profile.
Here is a simple example of setting a handle on a user’s page. I use the user’s UID as the handle in this case.
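The original code sample has gone missing from this post, so here is a minimal reconstruction of the idea. It assumes the Bebo PHP library mirrors the Facebook client (the `Facebook` class, `profile_setFBML`, and the `fb:ref` FBML tag are all assumptions based on that parity):

```php
<?php
// Build the profile FBML: an fb:ref tag whose handle is the user's UID.
// The handle content itself gets filled in later by a separate API call.
function profile_fbml_for($uid) {
    return '<fb:ref handle="' . $uid . '" />';
}

// Hypothetical wiring in the app's canvas page, assuming the
// Facebook-style client that Bebo's library is ported from:
//   $bebo = new Facebook($api_key, $secret);
//   $uid  = $bebo->require_login();
//   $bebo->api_client->profile_setFBML(profile_fbml_for($uid), $uid);
?>
```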
I then write an update.php file that injects the date onto the user's profile.
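That update.php is also missing from the post, so here is a sketch of what it would look like, assuming the Bebo library mirrors Facebook's `fbml.setRefHandle` call (the method name and client class are assumptions based on that parity):

```php
<?php
// update.php -- push fresh content into a handle without touching the
// profile directly. Every profile referencing this handle gets the update.
function handle_content() {
    return '<p>Last updated: ' . date('Y-m-d H:i:s') . '</p>';
}

// Hypothetical wiring, assuming the Facebook-style client:
//   $bebo = new Facebook($api_key, $secret);
//   $uid  = $_GET['uid'];   // the handle is the UID in this example
//   $bebo->api_client->fbml_setRefHandle($uid, handle_content());
?>
```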
That's all fine and good, but the update wasn't automatic; the user had to hit the update.php file. In a real application we would have a database of all the users and could periodically update the handles, assuming there is an infinite session concept in the Bebo API. Since I could not figure out how to implement an infinite session, here is my hack.
I place a hidden image on the user’s page that calls the update function.
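The hidden image is just a 1x1 img tag in the profile markup that points back at the refresh script (the file name matches the post; the UID parameter and exact markup are a reconstruction):

```html
<img src="http://bebo.tonycode.com/apps/sample/refresh.php?uid=12345"
     width="1" height="1" />
```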
Now when a user views their profile, the hidden image calls refresh.php and the handle content is updated. Unfortunately, the update happens after the page is rendered, so you'll need to reload the page to see the change. It should also be noted that anytime someone views your page they will call update.php; if they also have the application installed, your handle will be updated.
Assuming infinite session keys are unavailable, I would leverage app users to update their peers. In your database of users, keep track of the last time each was updated. Anytime update.php is called, update all handles that haven't been updated within a specific period of time. As the user base grows this should allow for a fairly even distribution of updates. It might even beat the cron jobs that infinite-session-key users rely on.
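The "update stale peers" idea above can be sketched with the staleness check pulled out as a plain function (the record shape, table, and helper names are hypothetical):

```php
<?php
// Pick the users whose handles haven't been refreshed recently.
// $users is an array of ['uid' => ..., 'last_updated' => unix timestamp].
function stale_users($users, $now, $max_age_seconds) {
    $stale = array();
    foreach ($users as $u) {
        if ($now - $u['last_updated'] >= $max_age_seconds) {
            $stale[] = $u['uid'];
        }
    }
    return $stale;
}

// In update.php (hypothetical): refresh every stale handle, not just
// the caller's, so the load spreads across whoever happens to visit.
//   foreach (stale_users($all_users, time(), 3600) as $uid) {
//       $bebo->api_client->fbml_setRefHandle($uid, handle_content_for($uid));
//   }
?>
```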
Last week I spent some time learning how to build Bebo applications. Their API is just like the Facebook API, but it is not complete and the documentation has some holes in it. If you haven't built a Facebook application, the entire process can seem awkward.
The Bebo Developer Page makes it sound like a simple three-step process:
- Get the library
- Install the app
- Start Creating.
If you don't know how to write web applications in PHP, Java, or Ruby, stop now. You need to know at least one of these languages to continue.
I downloaded the PHP library and put it on my host. The next step was to install the developer application. Why? Why do I need to install an app to build apps? It seems weird at first, but this application is used to manage the keys, locations, and settings for your apps.
Once you have the app, launch it and click "Create new app" (way off on the right). Give your application a name, a URL, and a callback URL.
What is the callback URL? It is the URL that is hit when a user goes to the application URL. http://apps.bebo.com/yourappname is just a proxy that adds headers and parameters to the request before it hits your application file. The Bebo library uses this information.
For example, I created an app called sample with a URL of http://apps.bebo.com/sample. My callback URL is http://bebo.tonycode.com/apps/sample
Hitting http://apps.bebo.com/sample results in a call to http://bebo.tonycode.com/apps/sample where I have the following index.php
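That index.php has since disappeared from the post, so here is a reconstruction, again assuming Bebo's library is a straight port of the Facebook PHP client (`friends_get`, `users_getInfo`, and the class name are assumptions based on that parity):

```php
<?php
// index.php -- canvas page that lists the current user's friends.
// Turn the users.getInfo result into simple HTML paragraphs.
function render_names($infos) {
    $out = '';
    foreach ($infos as $info) {
        $out .= '<p>' . $info['name'] . '</p>';
    }
    return $out;
}

// Hypothetical wiring, assuming the Facebook-style client:
//   require_once 'facebook.php';
//   $bebo    = new Facebook($api_key, $secret);
//   $uid     = $bebo->require_login();
//   $friends = $bebo->api_client->friends_get();
//   $infos   = $bebo->api_client->users_getInfo($friends, array('name'));
//   echo render_names($infos);
?>
```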
This application simply prints out all your friends.
Add the application to your page by going to the developer application, viewing the app, and clicking the profile page for the app. Now go to your profile page and voilà… the app is there, but your friends are not.
Huh? How do I get the content to show up on my profile page? If it doesn't show on the profile page the app seems pretty pointless, right?
To add content to the profile page you use profile_setFBML. Last I checked, the return codes for this function are not defined. It seems to return a 1 when it works and an array when it fails. Nice. :-\
Here is the code updated to put friends on profile too.
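The updated code is also missing; the change is essentially one extra call pushing the same markup to the profile, plus a check on that loosely defined return value. A sketch (variable names are assumed from your canvas page code):

```php
<?php
// profile_setFBML appears to return 1 on success and an array on
// failure, so wrap the check in a small helper.
function setfbml_succeeded($result) {
    return $result === 1 || $result === '1';
}

// Hypothetical wiring after rendering the friends list on the canvas page:
//   $markup = render_names($infos);
//   echo $markup;                                          // canvas page
//   $result = $bebo->api_client->profile_setFBML($markup, $uid);
//   if (!setfbml_succeeded($result)) {
//       error_log('profile_setFBML failed');
//   }
?>
```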
Weeks go by, you add new friends to your profile, and you notice that the list of friends in your custom application isn't changing. profile_setFBML is only called when you visit the application page. Well, that stinks, you say. How can I make it update each time the profile is visited? I'll get to that in my next post.
- Documentation is at http://developer.bebo.com/documentation.html
- The downloaded libraries have example code. Start there.
- You may need to look at two different sections to piece together how a function works. For example, user.getInfo has a fields parameter
- Look at the SNQL users table to find which fields are available
After I blogged about building a sitemap for Rails I contacted Mike Clark and asked him if he thought it would make a good Recipe for his upcoming book, Advanced Rails Recipes. He thought it was a good fit and it is currently in the Beta version of the book.
I wrote up my notes in the Recipe format, then Mike basically rewrote it for Rails 2.0 and added some additional content. Thanks Mike! I almost feel bad being cited as the author, since after editing it is drastically different from the original.
The core concepts are still there, but some thoughts were dropped since Recipes should be short. So here are some elaborations.
The Ping Protocol
There is a warning in the book about excessively pinging Google to have them read your sitemap. I would recommend letting search engines crawl your sitemaps at their own speed. The ping example in the book is a nice overview of when to use an Observer and also provides complete coverage of how to submit sitemaps. Please use ping sparingly, if at all.
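For reference, the ping itself is just a GET request to the search engine's ping endpoint with your sitemap URL escaped into the query string. A sketch of building that URL in Ruby (the Google endpoint shown is the one documented at the time of writing; treat it as an assumption and check the current docs):

```ruby
require 'cgi'

# Build the ping URL for a sitemap. The endpoint is Google's documented
# sitemap ping service (an assumption -- verify before relying on it).
def sitemap_ping_url(sitemap_url)
  "http://www.google.com/webmasters/tools/ping?sitemap=" + CGI.escape(sitemap_url)
end

# To actually ping (sparingly!):
#   require 'net/http'
#   Net::HTTP.get(URI.parse(sitemap_ping_url("http://example.com/sitemap.xml")))
```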
Sitemaps with over 50,000 entries
I work on sites where we use sitemap index files because we submit well over 50,000 URLs to the search engines. I didn't provide an example of how to build these in Rails because I'm not sure they provide any value to the typical site.
My theory is that if you build a sitemap with the 50,000 most recently updated pages, you will give the search engines all they need. If a page isn't updated for a while and falls off the list, is that really a problem? If the page was worth anything, someone external would be linking to it before it fell off the list. Now, if your site is creating millions of pages a day, this may not be the case.
If your pages are islands (no links to them) and you're afraid they won't be found unless they are all in the sitemap, I would suggest building the sitemap via a rake task that is kicked off by a cron job. This also gives you an opportunity to gzip the files. I'll try to write up some example code for this when I find some free time.
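In the meantime, here is one way that rake task could look. This is a sketch, not the Recipe's code: the model name (Page), the URL scheme, and the file paths are all placeholders, and the XML is built by hand to keep the example dependency-free.

```ruby
require 'zlib'

# Build a sitemap <urlset> document from an array of [url, lastmod] pairs.
def sitemap_xml(entries)
  body = entries.map do |url, lastmod|
    "  <url><loc>#{url}</loc><lastmod>#{lastmod}</lastmod></url>"
  end.join("\n")
  "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" \
    "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n" \
    "#{body}\n</urlset>"
end

# Write the sitemap gzipped, ready to be served as sitemap.xml.gz.
def write_gzipped_sitemap(path, entries)
  Zlib::GzipWriter.open(path) { |gz| gz.write(sitemap_xml(entries)) }
end

# In a Rails rake task (hypothetical model and columns), run from cron:
#   task :sitemap => :environment do
#     entries = Page.find(:all, :order => 'updated_at DESC', :limit => 50_000).
#       map { |p| ["http://example.com/pages/#{p.id}",
#                  p.updated_at.strftime('%Y-%m-%d')] }
#     write_gzipped_sitemap("#{RAILS_ROOT}/public/sitemap.xml.gz", entries)
#   end
```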
Do I really need a sitemap?
If your site has navigation to all its pages, then a sitemap will probably not benefit you. I suggest checking which pages the search engines have in their index, and if key content is missing, then pursue a sitemap. Even if they are finding all your pages, a sitemap certainly can't hurt.
Just in case you didn't know how to find the pages Google knows about on your site, simply type site:yourdomain.com into the Google or Yahoo search box.
Example results for my site are here
Recently I implemented a sitemap for a Rails application, but once the search engines started hitting my site, I realized that I wasn't giving them all the information they desired.
- Yahoo was doing simple HTTP HEAD requests on my sitemap and going away
- 18.104.22.168 - - [11/Dec/2007:08:35:49 -0800] "HEAD /sitemap.xml HTTP/1.0" 200 334 "-" "Yahoo! Slurp/Site Explorer"
- Google read my sitemap infrequently but loved to do HEAD requests on my index page
- 22.214.171.124 - - [17/Dec/2007:07:11:24 -0800] "HEAD / HTTP/1.1" 200 372 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
For those of you unfamiliar with HEAD requests, they are requests for a page that don't actually return the page, just metadata like size and date. It's a nice way for search engines and caches to decide whether there are any changes they need to pull. For some examples of HTTP requests, check out the wiki page I wrote on making HTTP requests via telnet.
Now, what do you think Google and Yahoo were looking for? My assumption is they were looking for the Last-Modified header in the response to see if they should bother requesting the sitemap. They could compare the size of the response against their last visit, but I doubt they want to track that information.
The Last-Modified date is the ideal field to check.
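For the curious, the exchange the crawlers are hoping for looks like this (hypothetical host and dates): an earlier response carried a Last-Modified header, the crawler sends that date back as If-Modified-Since, and an unchanged resource answers 304 with no body at all:

```http
HEAD /sitemap.xml HTTP/1.1
Host: www.example.com
If-Modified-Since: Wed, 09 Jan 2008 20:22:31 GMT

HTTP/1.1 304 Not Modified
```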
This is all fine and good, but guess what: Rails doesn't set that header for you.
telnet ficlets.com 80
Trying 126.96.36.199...
Connected to www.ficlets.com.
Escape character is '^]'.
HEAD / HTTP/1.0
Host: www.ficlets.com

HTTP/1.1 200 OK
Date: Wed, 09 Jan 2008 20:22:31 GMT
Server: Mongrel 1.0.1
Status: 200 OK
Cache-Control: no-cache
Content-Type: text/html; charset=utf-8
Content-Length: 16355
Set-Cookie: _session_id=s322clipped; path=/
Vary: Accept-Encoding
Connection: close
The good news is this is easy to add! If you look at my wiki example you’ll see that I set the header with the time of the latest entry in the sitemap.
headers["Last-Modified"] = @entries.updated_at.httpdate
Now, it isn't Rails' job to add this header for you, but it would be nice if the scaffolding added it. The standard show/1 actions pull a record from a database, and the action knows the updated_at value.
I have added this header to my show actions, since I am just pulling a record from a database and I have the modified time!
headers["Last-Modified"] = @business.updated_at.httpdate
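Going one step further (this part is not from the original post): once you send Last-Modified, you can also honor If-Modified-Since and answer 304 without rendering anything. A sketch with the comparison pulled out as a plain method; the controller wiring is hypothetical Rails 2-era code:

```ruby
require 'time'

# True when the client's If-Modified-Since header is at least as new as
# the record's updated_at, i.e. the client's cached copy is still fresh.
def not_modified?(if_modified_since, updated_at)
  return false if if_modified_since.nil?
  Time.httpdate(if_modified_since) >= updated_at
rescue ArgumentError
  false # unparsable header: treat the resource as modified
end

# In the controller (hypothetical):
#   def show
#     @business = Business.find(params[:id])
#     if not_modified?(request.env['HTTP_IF_MODIFIED_SINCE'], @business.updated_at)
#       head :not_modified
#     else
#       headers["Last-Modified"] = @business.updated_at.httpdate
#     end
#   end
```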
That’s all. I’m looking forward to any comments and suggestions people may have.
I occasionally find myself setting the wrong date on my camera and ending up with hundreds of pictures whose dates need editing. In the past I made these changes with programs that forced me to edit the dates one picture at a time. I want to set the year ahead or the hour back on hundreds of pictures all at once.
JHead to the rescue! Just today I discovered that I needed to set the time back on 100 pictures I had taken since daylight saving time ended. With jhead I just stuck the pictures with bad dates in a folder and ran
jhead -ta-1 *.jpg
Then I changed the actual timestamps on the files to match the EXIF data with
jhead -ft *.jpg
Now that my pictures had the correct hour, I was hoping to see my pictures and my daughter's in order in Picasa. For example, the picture of Maia taking a picture of a boat should be followed by the picture of the boat.
It wasn't! It turns out her camera was 5 minutes ahead of mine. With jhead this is not an issue.
jhead -ta-0:05 HP*.jpg
jhead -ft HP*.jpg
This moved the time on all her HP pictures back 5 minutes.
Now the pictures are in order and I'm happy. Yeah, I'm just a little anal.
Here is my picture
Here is what Maia got
More jhead notes are on my wiki