Extract Links From An HTML File With PHP

Use the following function to extract all of the links from an HTML string.

function linkExtractor($html)
{
    $linkArray = array();
    // Match every anchor tag, capturing the href value and the link text.
    if (preg_match_all('/<a\s+.*?href=[\"\']?([^\"\' >]*)[\"\']?[^>]*>(.*?)<\/a>/i', $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            // $match[1] is the link location, $match[2] is the link text.
            array_push($linkArray, array($match[1], $match[2]));
        }
    }
    return $linkArray;
}

To use it, read a web page or file into a string and pass that string to the function. The following example reads a web page using the PHP cURL functions and then passes the result to linkExtractor() to retrieve the links.

$url = 'http://www.hashbangcode.com';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
$html = curl_exec($ch);
curl_close($ch);
// Pass the downloaded HTML to linkExtractor() and print the resulting array.
echo '<pre>' . print_r(linkExtractor($html), true) . '</pre>';
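
If you don't need the extra control that cURL provides, the same string can be fetched with file_get_contents(), assuming that allow_url_fopen is enabled in your PHP configuration. A minimal sketch:

$html = file_get_contents('http://www.hashbangcode.com');
echo '<pre>' . print_r(linkExtractor($html), true) . '</pre>';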

The function returns an array, with each element being an array that contains the link location and the text of the link.
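
For example, a page containing the links <a href="/about">About us</a> and <a href="http://www.example.com/">Example</a> would produce output along these lines (hypothetical page content, so your own results will differ):

Array
(
    [0] => Array
        (
            [0] => /about
            [1] => About us
        )

    [1] => Array
        (
            [0] => http://www.example.com/
            [1] => Example
        )

)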

Comments

Cool Scripts.
Hi Philip, I have a similar script:

PRINT("<ul>\n");
WHILE (!FEOF($page)) {
    $line = FGETS($page, 255);
    WHILE (EREGI("HREF=\"[^\"]*\"", $line, $match)) {
        PRINT("<li>");
        PRINT($match[0]);
        PRINT("</li>\n");
        $replace = EREG_REPLACE("\?", "\?", $match[0]);
        $line = EREG_REPLACE($replace, "", $line);
    }
}
PRINT("</ul>\n");
FCLOSE($page);
?>
How do I get only the links with a .zip extension, without the href=" at the start and the closing " at the end of each line?
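
One possible way to do that with the linkExtractor() function above is to filter the returned array on the file extension; the href=" prefix and the trailing " are not a problem there, because the function only captures the URL itself. A rough, untested sketch:

$links = linkExtractor($html);
$zipLinks = array();
foreach ($links as $link) {
    // $link[0] holds the URL; keep it only if it ends in .zip.
    if (preg_match('/\.zip$/i', $link[0])) {
        $zipLinks[] = $link[0];
    }
}
print_r($zipLinks);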
