Get the main body text from site/s

Hi, just wondering: does anyone know how to get PHP to extract the text of a story from a web page?

As we all know, a web developer can place text anywhere on the page, but I only want to get the area that has the most text.

Hi Russell,

There is no easy way to do this, I'm afraid.

What may be possible, if the page is valid XHTML, is to use Magic Parser to read the entire document at HTML/BODY/ and then scan the resulting record for the longest element; for example:

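<?php
  require("MagicParser.php");

  // Longest value found so far and its length; kept global so they
  // persist if the handler is called for more than one record
  $story = "";
  $longest = 0;

  // Called by Magic Parser for each record
  function myRecordHandler($record)
  {
    global $story;
    global $longest;
    foreach($record as $k => $v)
    {
      if (strlen($v) > $longest)
      {
        $story = $v;
        $longest = strlen($story);
      }
    }
  }

  $url = "http://www.example.com/";

  MagicParser_parse($url,"myRecordHandler","xml|HTML/BODY/");

  print $story;
?>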
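
If it helps, the "pick the longest value" step can be tried on its own without fetching a page. This is only a sketch: longestValue() is a hypothetical helper and not part of Magic Parser, and the sample record and its key names are made up for illustration rather than taken from real parser output. It also collapses runs of whitespace before comparing, which is an extra assumption so that padding in the markup does not inflate a value's length.

```php
<?php
// Hypothetical helper (not part of Magic Parser): return the longest
// string value found in a flat record array, after collapsing whitespace.
function longestValue(array $record): string
{
    $longest = "";
    foreach ($record as $value) {
        // Collapse runs of whitespace so layout padding is not counted.
        $text = trim(preg_replace('/\s+/', ' ', (string) $value));
        if (strlen($text) > strlen($longest)) {
            $longest = $text;
        }
    }
    return $longest;
}

// Made-up sample record; the key names are illustrative only.
$record = array(
    "TITLE" => "Example story",
    "NAV"   => "Home | News | Contact",
    "STORY" => "This is the main body text of the story, which is by far the longest value.",
);

print longestValue($record);
```

Note that strlen() counts bytes, so for pages with a lot of multi-byte text mb_strlen() may give a fairer comparison.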