Regular Expression in PHP – Find Link Texts

A short article for those who want to experiment with regular expressions in PHP. Regular expressions, also referred to as regex or regexp, provide a concise and flexible means of matching strings of text, such as particular characters, words, or patterns of characters. Many text editors, utilities, and programming languages use them to search and manipulate text based on patterns.
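As a quick warm-up, here is a minimal sketch of matching a pattern of characters with preg_match. The sample text and the date pattern are made up for illustration and are not part of the link example that follows.

```php
<?php
// Illustrative only: the sample text and pattern are assumptions.
$text = 'Invoice sent on 2024-05-17 to the client.';

// \d{4}-\d{2}-\d{2} reads as "four digits, dash, two digits, dash, two digits".
// preg_match returns 1 on the first match and fills $m with the matched text.
if (preg_match('/\d{4}-\d{2}-\d{2}/', $text, $m)) {
    echo 'Found date: ', $m[0], "\n";
}
```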

Let's try a small example: finding the URL in the href attribute and the link text of every <a> tag in an HTML string.

This is the HTML we have:

<html>
<body>
<a href="http://www.google.com">Google</a>
<a href="http://www.yahoo.com">Yahoo</a>
</body>
</html>

We want to extract the href value and the link text from the HTML above, so the expected output looks like this:

http://www.google.com – Google
http://www.yahoo.com – Yahoo

Here is the Regular Expression for this.

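preg_match_all("/<a\s[^>]*href=\"(.*?)\"[^>]*>(.*?)<\/a>/i", $yourhtmlstring, $matches, PREG_SET_ORDER);
PREG_SET_ORDER orders the results so that $matches[0] is an array holding the first set of matches (the full match followed by each captured group), $matches[1] holds the second set, and so on. The lazy quantifiers (.*?) keep each capture as short as possible, so two links on the same line are matched separately, and the i flag makes the match case-insensitive.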
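To make the ordering flag concrete, here is a small sketch comparing PREG_SET_ORDER with the default PREG_PATTERN_ORDER. The variable names $bySet and $byPattern are chosen for illustration only, and the pattern is written as a single-quoted string for readability.

```php
<?php
// Sample input: the same two links as in the HTML above.
$html = '<a href="http://www.google.com">Google</a>' . "\n"
      . '<a href="http://www.yahoo.com">Yahoo</a>';

// Group 1 = href value, group 2 = link text.
$pattern = '/<a\s[^>]*href="(.*?)"[^>]*>(.*?)<\/a>/i';

// PREG_SET_ORDER: one sub-array per match,
// each holding [full match, href, text].
preg_match_all($pattern, $html, $bySet, PREG_SET_ORDER);

// PREG_PATTERN_ORDER (the default): one sub-array per capture group,
// so $byPattern[1] holds every href and $byPattern[2] every link text.
preg_match_all($pattern, $html, $byPattern, PREG_PATTERN_ORDER);

echo $bySet[0][1], "\n";      // href of the first link
echo $byPattern[1][1], "\n";  // href of the second link
```

PREG_SET_ORDER tends to be the more convenient choice when each match is processed as a unit, as in the loop used later in this article.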

All matches are returned in the $matches array. Within each set, index 0 is the full matched text, index 1 is the href value, and index 2 is the link text:

Array
(
    [0] => Array
        (
            [0] => <a href="http://www.google.com">Google</a>
            [1] => http://www.google.com
            [2] => Google
        )
    [1] => Array
        (
            [0] => <a href="http://www.yahoo.com">Yahoo</a>
            [1] => http://www.yahoo.com
            [2] => Yahoo
        )
)
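One way to turn that array into the expected "URL – text" lines is sketched below. The helper name extract_links is hypothetical (it is not a PHP built-in), and the input deliberately puts both links on one line to show that the lazy quantifiers keep them apart.

```php
<?php
/**
 * Hypothetical helper (an assumption for this sketch): returns a list of
 * ['url' => ..., 'text' => ...] pairs, one for each <a> tag found.
 */
function extract_links(string $html): array
{
    // Lazy quantifiers (.*?) stop at the first closing quote and the
    // first </a>, so two links on one line are matched separately.
    // The "s" flag lets "." span newlines inside the link text.
    $pattern = '/<a\s[^>]*href="(.*?)"[^>]*>(.*?)<\/a>/is';

    $links = [];
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $links[] = ['url' => $match[1], 'text' => $match[2]];
        }
    }
    return $links;
}

// Both links on a single line.
$html = '<a href="http://www.google.com">Google</a> <a href="http://www.yahoo.com">Yahoo</a>';

foreach (extract_links($html) as $link) {
    echo $link['url'], ' – ', $link['text'], "\n";
}
```

This sketch only handles double-quoted href values, matching the example HTML in this article.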

A simple script for you to try:

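<?php
// Run the extraction only when the form has been submitted.
if (isset($_POST['data'])) {
	// Group 1 captures the href value, group 2 the link text.
	// stripslashes() is only needed on old PHP versions with magic quotes
	// enabled (removed in PHP 5.4), so the input is used as-is here.
	preg_match_all("/<a\s[^>]*href=\"(.*?)\"[^>]*>(.*?)<\/a>/i", $_POST['data'], $matches, PREG_SET_ORDER);
	foreach ($matches as $match) {
		// Escape both values so pasted markup is displayed as text, not rendered.
		echo htmlentities($match[2]).' : '.htmlentities($match[1])."<br/>";
	}
}
?>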
<br/>
<br/>
<form action="" enctype="multipart/form-data" method="post">
<textarea name="data" rows="10" cols="100"></textarea><br>
<input type="submit" name="submit"/>
</form>

When run, this script shows a text area where you can paste HTML code containing <a> tags. Submit the form to see the extracted links.

More articles on regular expressions coming soon!

Enjoy!

6 Comments

  1. suraj says:

    it is not working for me

  2. Aneeska says:

    Hi Suraj,

    There was an error in the code as some characters appeared as html entities.

    I have updated the code above.

    Try and let me know.

    Regards,
    Anees

  3. suraj says:

    Thanks for your kind reply.
    This is what I wanted. Thank you… ;)

  4. Aneeska says:

    Great to know Suraj. Have fun!
