Extract Content using PHP and Preview like Facebook

by Vincy. Last modified on July 9th, 2022.

When a user shares a URL, it is helpful if the client automatically extracts the page title and an image and shows them as a preview. Social media websites like Facebook and LinkedIn have this feature: they extract the title and meta information of shared links.

In this tutorial, we are going to extract the page title, meta description and images from the URL shared by the user.

In this example, I used PHP and cURL to extract content and images from the given URL. When the user enters a link in the input text box, an AJAX call is sent to a PHP page, which runs the cURL request to extract the remote data.

In a previous tutorial, we saw how to extract content from a URL and parse it using the Simple HTML DOM Parser. In this example, after getting the remote data, the PHP code generates JSON with the title, description and image data and returns it as the response to the AJAX call, which shows the preview in the browser.


jQuery AJAX call to initiate CURL

The following code shows the jQuery AJAX script that requests remote data for the given input URL. The handler is bound to the keyup event of the input field, so when the user pastes or types a URL, the AJAX request is sent to the get-data.php file.

This file extracts the remote content via cURL and returns it as the AJAX response.

After the data is received, it is previewed in the browser. If the page contains more than one image, the other images can be viewed by clicking the previous and next navigation shown below the image preview.

<script type="text/javascript">
$(document).ready(function() {
    // Shared by the AJAX callback and the previous/next click handlers
    var image_src = [];
    var total_images = -1;
    var current_image_position = -1;

    $('#remote-url').on("keyup", function() {
        $("#output").html("");
        $("#loader").show();

        var remote_url = $(this).val();
        var image_html = '';

        $.ajax({
            url: "get-data.php",
            type: "POST",
            data: {'url': remote_url},
            dataType: "json",
            success: function(data, status) {
                image_src = data.image_src || [];
                // Index of the last image; -1 when the page has no images
                total_images = image_src.length - 1;
                current_image_position = total_images;

                if (total_images >= 0) {
                    image_html = '<div class="image-preview" id="image-preview"><img src="'+image_src[current_image_position]+'"></div>'+
                        '<div class="prev-next-navigation"><span class="prev-img" id="prev-img"> </span><span class="next-img" id="next-img"> </span> </div>';
                }

                var content_html = '<div class="text-data"><a class="page-title" href="'+remote_url+'" target="_blank">'+data.title+'</a><div>'+data.body+'</div></div>';
                var responseHTML = image_html + content_html;

                $("#output").html(responseHTML).show();
                $("#loader").hide();
            },
            error: function() {
                $("#loader").hide();
                alert("Problem in extracting data from the remote URL");
            }
        });
    });

    // Show the previous image, if there is one
    $("body").on("click", "#prev-img", function(e) {
        if (current_image_position > 0) {
            current_image_position--;
            $("#image-preview").html('<img src="'+image_src[current_image_position]+'">');
        }
    });

    // Show the next image, if there is one
    $("body").on("click", "#next-img", function(e) {
        if (current_image_position < total_images) {
            current_image_position++;
            $("#image-preview").html('<img src="'+image_src[current_image_position]+'">');
        }
    });
});
</script>
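
The script above inserts the remote page's title and description into the preview as raw HTML. Since that text comes from a third-party page, it may be safer to escape it first. The following is only a minimal sketch of that idea: escapeHtml and buildContentHtml are hypothetical helper names that are not part of the original script, and the markup simply mirrors the text-data block built in the success callback.

```javascript
// Hypothetical helper (not in the original script): escape text before
// placing it inside HTML markup or a double-quoted attribute value.
function escapeHtml(value) {
    return String(value)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#39;");
}

// Hypothetical helper: build the text part of the preview from the shared
// URL and the title and body fields of the JSON returned by get-data.php.
function buildContentHtml(remoteUrl, data) {
    return '<div class="text-data">' +
        '<a class="page-title" href="' + escapeHtml(remoteUrl) + '" target="_blank">' +
        escapeHtml(data.title) + '</a>' +
        '<div>' + escapeHtml(data.body) + '</div>' +
        '</div>';
}
```

In the success callback, the text part of the preview could then be produced with buildContentHtml(remote_url, data) instead of concatenating data.title and data.body directly. Escaping only neutralizes markup; it does not check what kind of URL is used in the link.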

PHP CURL Request to Extract Title and Meta From URL

The following PHP code shows how to get the page title and other meta details using PHP cURL. It initializes a cURL handle and sets the URL to be accessed on that handle. With CURLOPT_RETURNTRANSFER enabled, curl_exec() returns the HTML content of the remote page.

After getting the HTML content, we parse it by the title, meta and img tag names to get the page title, description and image URLs, respectively. Only image src values that are absolute URLs pass the FILTER_VALIDATE_URL check, so relative image paths are skipped. This data is encoded as a JSON object and returned as the AJAX response.

<?php
if (isset($_POST["url"]) && filter_var($_POST["url"], FILTER_VALIDATE_URL)) {

    // Extract HTML using cURL
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $_POST["url"]);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

    $data = curl_exec($ch);
    curl_close($ch);

    // Defaults used when the page cannot be fetched or a tag is missing
    $title = "";
    $body = "";
    $image_src = array();

    if (is_string($data) && $data !== "") {
        // Load HTML to DOM Object; @ suppresses warnings about malformed markup
        $dom = new DOMDocument();
        @$dom->loadHTML($data);

        // Parse DOM to get Title
        $nodes = $dom->getElementsByTagName('title');
        if ($nodes->length > 0) {
            $title = $nodes->item(0)->nodeValue;
        }

        // Parse DOM to get Meta Description
        $metas = $dom->getElementsByTagName('meta');
        foreach ($metas as $meta) {
            if ($meta->getAttribute('name') == 'description') {
                $body = $meta->getAttribute('content');
            }
        }

        // Parse DOM to get Images; only absolute URLs are kept
        $images = $dom->getElementsByTagName('img');
        foreach ($images as $image) {
            $src = $image->getAttribute('src');

            if (filter_var($src, FILTER_VALIDATE_URL)) {
                $image_src[] = $src;
            }
        }
    }

    $output = array(
        'title' => $title,
        'image_src' => $image_src,
        'body' => $body
    );
    echo json_encode($output);
}
?>
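
On the client side, the JSON response is expected to carry the title, body and image_src fields. If a field were missing or null (for example, for a page without a title tag or without usable images), the preview code would have to cope with that. The following is a rough sketch of one way to coerce the response into a predictable shape before rendering; normalizePreview is a hypothetical helper name and is not part of the original code.

```javascript
// Hypothetical helper (not part of the original tutorial): coerce the parsed
// JSON from get-data.php into a predictable shape before rendering it.
function normalizePreview(data) {
    var source = (data && typeof data === "object") ? data : {};
    var images = Array.isArray(source.image_src) ? source.image_src : [];
    return {
        // Fall back to an empty string when a text field is missing or not a string
        title: typeof source.title === "string" ? source.title : "",
        body: typeof source.body === "string" ? source.body : "",
        // Keep only non-empty string entries from the image list
        image_src: images.filter(function (src) {
            return typeof src === "string" && src !== "";
        })
    };
}
```

The success callback could pass its data argument through normalizePreview first and then read the title, body and image_src properties of the result.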


Vincy
Written by Vincy, a web developer with 15+ years of experience and a Masters degree in Computer Science. She specializes in building modern, lightweight websites using PHP, JavaScript, React, and related technologies. Phppot helps you in mastering web development through over a decade of publishing quality tutorials.
