Formatting Google News in your own way

Live demo for this article: http://xsltdb.com (see the right pane).

Google provides news with images, titles, descriptions, etc. in the form of an RSS feed. However, the news descriptions come with ugly HTML and a bunch of links that you probably do not want to see on your site.

Here I’ll show how you can use XsltDb to get clean news with high-quality markup. Note that the news items are not stored in the database; they are kept in the ASP.NET cache.

First we need to obtain the appropriate news RSS link. Go to news.google.com, search for news, and copy the RSS link (at the bottom of the page).

In the XsltDb configuration, create a variable for the link:

<xsl:variable name="url">
<![CDATA[http://news.google.com/news?pz=1&cf=all&ned=us&hl=en&q=barack+obama&cf=all&output=rss]]>
</xsl:variable>
Now we can download the news to our server:

<xsl:variable name="news" select="mdo:node-set(string($url), 120)"/>
mdo:node-set can download XML data from an HTTP URL. If the second parameter is specified, it puts the XML in the ASP.NET cache for the specified number of seconds, so the line of code above re-queries the URL at most once every 2 minutes. After downloading the news, we can iterate through the items as follows:

<xsl:for-each select="$news//item">
<!-- Output news record here -->
</xsl:for-each>
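If the feed returns more items than you want to show, a positional predicate can trim the list. The following is a minimal sketch, assuming each item carries the standard RSS title and link elements; the limit of 5 is an arbitrary value chosen for illustration.

```xml
<!-- Sketch: show only the first 5 items of the feed (5 is an arbitrary limit). -->
<!-- The parentheses make position() count across the whole list of items. -->
<xsl:for-each select="($news//item)[position() &lt;= 5]">
  <div>
    <a href="{link}"><xsl:value-of select="title"/></a>
  </div>
</xsl:for-each>
```

Without the parentheses, the predicate would apply per parent element rather than to the combined list, which gives the same result only when all items share one parent.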
Each item of this RSS feed contains a description tag whose content is actually XHTML, so we can use XPath to analyze it. However, the XHTML can contain HTML entities that are not supported by XML, so we need to replace “&” with “&amp;” before parsing and reverse the replacement before outputting text to the page. The forward replacement can be done with the following line of code:

  <xsl:variable name="desc" select="mdo:node-set(mdo:replace(description, '&amp;', '&amp;amp;'))"/>
mdo:node-set can also convert an XML string into an XML DOM, so we can query it using XPath.
Now we can output the news on the page. First, output the image if the feed item has one:
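To see why the replacement is needed, consider a hypothetical description value (made up for illustration, not taken from a real feed). An XML parser rejects the first form because &nbsp; is not one of XML's predefined entities, while the second form is well-formed and keeps the entity as literal text.

```xml
<!-- Hypothetical description text before the replacement: NOT well-formed XML -->
<!--   <div class="lh"><b>Obama&nbsp;speaks</b></div> -->

<!-- The same hypothetical text after replacing & with &amp;: well-formed XML -->
<div class="lh"><b>Obama&amp;nbsp;speaks</b></div>
```

After parsing, the text node contains the characters of the entity reference rather than a non-breaking space, which is why the replacement has to be reversed before the text is written to the page.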

  <xsl:if test="$desc//img[@src!='']">
    <a href="{link}"> <!-- display image as hyperlink -->
      <img src="{$desc//img[@src!='']/@src}" style="float:left;padding-right:5px;"/>
    </a>
  </xsl:if>
Second, extract the news header, the details, and the news origin.
Exploring the description tag, you will find that the header can be extracted with something like the following:

<xsl:copy-of select="$desc//div[@class='lh']/a/b//text()"/> <!--plain text -->
or

<xsl:copy-of select="$desc//div[@class='lh']/a/b/node()"/> <!--with tags -->
The news details can be extracted as follows:

<xsl:copy-of select="$desc//div[@class='lh']/font[position()=2]/node()" />
The news origin can be obtained with the following code:

<xsl:copy-of select="$desc//div[@class='lh']/font[position()=1]//text()" />
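The predicate font[position()=2] can also be written in XPath's abbreviated form, font[2]. Below is a minimal sketch of the same three lookups using the shorthand, assuming the description keeps the div class="lh" layout described above. It uses xsl:value-of instead of xsl:copy-of, so only the text of the first matching node is emitted and any nested tags are dropped.

```xml
<!-- Header: string value of the first matching b element -->
<xsl:value-of select="$desc//div[@class='lh']/a/b"/>

<!-- Details: second font child, with surrounding whitespace collapsed -->
<xsl:value-of select="normalize-space($desc//div[@class='lh']/font[2])"/>

<!-- Origin: first font child, with surrounding whitespace collapsed -->
<xsl:value-of select="normalize-space($desc//div[@class='lh']/font[1])"/>
```

This sketch does not reverse the & replacement, so entities escaped earlier still appear as literal text; the mdo:replace step used in the full listing is still needed before output.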
Putting it all together, we get a simple configuration that can query Google and display the news with your own template:

<xsl:variable name="url">
<![CDATA[http://news.google.com/news?pz=1&cf=all&ned=us&hl=en&q=barack+obama&cf=all&output=rss]]>
</xsl:variable>

<xsl:variable name="news" select="mdo:node-set(string($url), 120)"/>

<div style="width:300px;text-align:justify;font-size:11px;">
<xsl:for-each select="$news//item">
  <!-- Get description as XML -->
  <xsl:variable name="desc" select="mdo:node-set(mdo:replace(description, '&amp;', '&amp;amp;'))"/>
  
  <!-- If there is an image, output it. -->
  <xsl:if test="$desc//img[@src!='']">
    <a href="{link}"> <!-- display image as hyperlink -->
      <img src="{$desc//img[@src!='']/@src}" style="float:left;padding-right:5px;"/>
    </a>
  </xsl:if>

  <!-- Get header text from description -->
  <xsl:variable name="header">
    <xsl:copy-of select="$desc//div[@class='lh']/a/b//text()"/>
  </xsl:variable>
  <div>
    <a href="{link}">{h{mdo:replace(mdo:text($header), '&amp;amp;', '&amp;')}}</a>
  </div>

  <!-- Output details of news -->
  <xsl:variable name="details">
    <xsl:copy-of select="$desc//div[@class='lh']/font[position()=2]/node()" />
  </xsl:variable>
  <div style="padding-top:5px;">
    {h{mdo:replace(mdo:text($details), '&amp;amp;', '&amp;')}}
  </div>

  <!-- Output the news origin -->
  <xsl:variable name="origin">
    <xsl:copy-of select="$desc//div[@class='lh']/font[position()=1]//text()" />
  </xsl:variable>
  <div style="padding-top:5px;text-align:right;color:grey;">
    {h{mdo:replace(mdo:text($origin), '&amp;amp;', '&amp;')}}
  </div>

  <div style="clear:both;padding-bottom:20px;"/>
  
</xsl:for-each>
</div>
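
If the download fails or the search matches nothing, the loop above outputs an empty block. Here is a minimal sketch of a fallback, assuming $news then contains no item elements (how mdo:node-set behaves on a failed download is not covered here, so treat that as an assumption); the message text is a placeholder.

```xml
<xsl:choose>
  <xsl:when test="$news//item">
    <!-- Place the xsl:for-each loop from the listing above here -->
  </xsl:when>
  <xsl:otherwise>
    <!-- Placeholder message shown when the feed has no items -->
    <div style="color:grey;">No news available right now.</div>
  </xsl:otherwise>
</xsl:choose>
```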

Last edited Sep 12, 2010 at 7:39 AM by findy, version 26

Comments

findy Sep 13, 2010 at 1:56 PM 
I tried to find a good balance between complexity and flexibility. Many other tools are simple to learn and use, but their users get a limited programming model while the tool's author has to work hard on top-level functions.

JohnPlex Sep 12, 2010 at 5:37 PM 
This worked perfectly for me. Why do other tools make doing this so complex?! ;^}