This tutorial shows how to display a webpage preview by extracting the metadata from a link using PHP. It uses PHP cURL to fetch the page and reads the title, description, and image (if any) from its Open Graph metadata.
The preview is built in three steps: fetch the site HTML, extract the Open Graph metadata from it, and generate the preview HTML.
The getSiteHTMLViaCURL() function prepares a PHP cURL request to the webpage link. It returns the site HTML from the cURL response.
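<?php
$link = 'http://example.com/';
$siteHTML = getSiteHTMLViaCURL($link);

// Fetch the HTML of the given URL; returns false when the request fails
function getSiteHTMLViaCURL($url)
{
    $ch = curl_init($url);
    // Return the response as a string instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $siteHTML = curl_exec($ch);
    if (curl_errno($ch)) {
        // Release the handle on the error path as well
        curl_close($ch);
        return false;
    }
    curl_close($ch);
    return $siteHTML;
}
?>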
The getMetadataFromHTML() function prepares a regex pattern to filter the Open Graph metadata from the site HTML. It generates an array of the metadata that matches the pattern.
Regex patterns are generally used to check that data has an expected format. We have already used regex to validate email data and to measure password strength.
<?php
function getMetadataFromHTML($siteHTML)
{
    // Get the webpage metadata
    $metadataArray = [];
    if (preg_match_all('/<meta property="og:([^"]+)"\s*content="([^"]*)"/i', $siteHTML, $matches)) {
        $metadataArray = array_combine($matches[1], $matches[2]);
    }
    return $metadataArray;
}
$metadataArray = getMetadataFromHTML($siteHTML);
?>
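To get a feel for what the pattern used by getMetadataFromHTML() does and does not match, the sketch below ports the same regex to JavaScript, the language used for the client-side example later in this tutorial. The function name matchOpenGraph and the sample HTML are made up for illustration. The pattern expects property to come before content and both values to be in double quotes, so a tag written the other way round is skipped.

```javascript
// Same pattern as in getMetadataFromHTML(), with the g flag to find all tags
const ogPattern = /<meta property="og:([^"]+)"\s*content="([^"]*)"/gi;

// Hypothetical helper: returns an object keyed by the part after "og:"
function matchOpenGraph(html) {
    const metadata = {};
    for (const match of html.matchAll(ogPattern)) {
        metadata[match[1]] = match[2];
    }
    return metadata;
}

// Made-up sample HTML; the last tag has its attributes in reverse order
const sampleHTML =
    '<meta property="og:title" content="Example Domain">' +
    '<meta property="og:description" content="An example page">' +
    '<meta content="https://example.com/logo.png" property="og:image">';

// Logs the title and description only; the reversed og:image tag is missed
console.log(matchOpenGraph(sampleHTML));
```

If the pages you preview write their tags in varying order, an HTML parser is likely a safer choice than a single regex.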
The generatePreviewHTML() function below uses the og:title, og:description, and og:image metadata to form the preview HTML.
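<?php
function generatePreviewHTML($metadataArray)
{
    // Escape values from the remote page; the last argument (false)
    // avoids double-encoding entities already present in the source
    $escape = function ($value) {
        return htmlspecialchars($value, ENT_QUOTES, 'UTF-8', false);
    };
    // Generate HTML with webpage link preview
    $previewHTML = '<div class="site-preview">';
    $previewHTML .= '<h1>' . $escape($metadataArray['title'] ?? '') . '</h1>';
    $previewHTML .= '<p>' . $escape($metadataArray['description'] ?? '') . '</p>';
    // Include the image only when og:image was found
    if (isset($metadataArray['image'])) {
        $previewHTML .= '<p><img src="' . $escape($metadataArray['image']) . '" alt="Site Logo"></p>';
    }
    $previewHTML .= '</div>';
    return $previewHTML;
}
?>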
<!DOCTYPE html>
<html>
<head>
<title>How to get metadata from webpage link to show preview in PHP</title>
<link rel="stylesheet" type="text/css" href="style.css" />
</head>
<body>
<div class="phppot-container">
<?php
echo generatePreviewHTML($metadataArray);
?>
</div>
</body>
</html>
This example shows how to scrape a URL using JavaScript. It runs on the client side, in the browser. A client-side script can use JavaScript fetch() or XMLHttpRequest to request a page from a different domain over the network.
The page is served by your server from domain A, and the script scrapes a URL on domain B. In this scenario, the CORS policy comes into the picture: by default, the browser does not let the script read the cross-origin response.
To lift this restriction, the called domain (B) must send an Access-Control-Allow-Origin response header that either names the calling origin (domain A) or is the wildcard *.
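As a rough sketch of what domain B has to do: the browser permits the cross-origin read only when the response carries an Access-Control-Allow-Origin header that matches the requesting origin or is the wildcard *. The helper below picks that header from an allow-list. The name corsHeadersFor, the allow-list, and the domain-a.example origin are assumptions for illustration, not part of any particular server framework.

```javascript
// Hypothetical helper: choose the CORS headers domain B should send.
// allowedOrigins is an assumed allow-list; '*' means any origin may read.
function corsHeadersFor(requestOrigin, allowedOrigins) {
    if (allowedOrigins.includes('*')) {
        return { 'Access-Control-Allow-Origin': '*' };
    }
    if (requestOrigin && allowedOrigins.includes(requestOrigin)) {
        // Vary tells caches that the response depends on the Origin header
        return { 'Access-Control-Allow-Origin': requestOrigin, 'Vary': 'Origin' };
    }
    // No header: the browser keeps blocking cross-origin reads
    return {};
}

console.log(corsHeadersFor('https://domain-a.example', ['https://domain-a.example']));
```

The server on domain B would add the returned headers to its response; how that is done depends on the server software in use.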
The get-metadata-via-ajax.html code below helps to test your API endpoints during development. You can write a client API, then allow CORS and test it.
get-metadata-via-ajax.html
<html>
<head>
<meta charset="UTF-8">
<title>Fetch HTML with CORS</title>
</head>
<body>
<script>
// Example AJAX code accesses remote data using XMLHttpRequest
var url = 'http://example.com';
var xhr = new XMLHttpRequest();
xhr.onreadystatechange = function () {
    if (xhr.readyState === XMLHttpRequest.DONE) {
        if (xhr.status === 200) {
            var response = xhr.responseText;
            // Parse the meta tags from the response using DOMParser
            var parser = new DOMParser();
            var doc = parser.parseFromString(response, 'text/html');
            var metaTags = doc.querySelectorAll('meta');
            // Display the site preview data from the meta tags
            metaTags.forEach(function (tag) {
                // Open Graph tags normally use "property"; fall back to "name"
                var key = tag.getAttribute('property') || tag.getAttribute('name');
                var content = tag.getAttribute('content');
                var element = null;
                if (key === 'og:title') {
                    element = document.createElement('h1');
                    element.textContent = content;
                } else if (key === 'og:description') {
                    element = document.createElement('p');
                    element.textContent = content;
                } else if (key === 'og:image') {
                    element = document.createElement('img');
                    element.src = content;
                }
                // Append instead of document.write(), which replaces the
                // page when it is called after loading has finished
                if (element) {
                    document.body.appendChild(element);
                }
            });
        } else {
            console.error('Request failed with status:', xhr.status);
        }
    }
};
xhr.open('GET', url, true);
xhr.send();
</script>
</body>
</html>
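The filtering done inside the forEach loop above can also be pulled out into a small function that returns the preview fields as an object, much like the metadata array in the PHP example. This is a minimal sketch: collectOpenGraph and fakeTag are hypothetical names, and fakeTag only stands in for real meta elements so that the snippet runs outside a browser.

```javascript
// Hypothetical helper: collect og:* values from a list of <meta> elements.
// Accepts anything iterable whose items have getAttribute(), for example
// the NodeList returned by doc.querySelectorAll('meta').
function collectOpenGraph(metaTags) {
    const preview = {};
    for (const tag of metaTags) {
        // Open Graph tags normally use "property"; some pages use "name"
        const key = tag.getAttribute('property') || tag.getAttribute('name');
        if (key && key.startsWith('og:')) {
            // Store under the part after "og:", e.g. "title"
            preview[key.slice(3)] = tag.getAttribute('content') || '';
        }
    }
    return preview;
}

// Stand-in for a DOM element, used only for this demonstration
function fakeTag(attributes) {
    return {
        getAttribute: (name) => (name in attributes ? attributes[name] : null)
    };
}

const demoTags = [
    fakeTag({ charset: 'UTF-8' }),
    fakeTag({ property: 'og:title', content: 'Example Domain' }),
    fakeTag({ name: 'og:description', content: 'An example page' })
];

console.log(collectOpenGraph(demoTags));
```

In the browser, the call would be collectOpenGraph(doc.querySelectorAll('meta')), and the returned object can then be rendered in whatever markup the page needs.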
For testing purposes, you can also use a browser extension that disables CORS checks to access the cross-site resource from the remote URL.
Using file_get_contents() to get the metadata
This PHP code is the easiest way of getting metadata from the HTML of a webpage link. However, it requires the php.ini directive allow_url_fopen to be set to 1 so that file_get_contents() can open URLs. Enabling this flag can be a security risk in a production environment, so this method, too, is best suited to a development environment.
file_get_contents.php
<?php
$link = 'http://example.com';
// Get HTML using PHP core function; it returns false on failure
$html = file_get_contents($link);
// Extract the Open Graph metadata tags using regular expression
$ogMetaTags = [];
if ($html !== false && preg_match_all('/<meta property="og:([^"]+)"\s*content="([^"]*)"/i', $html, $matches)) {
    $ogMetaTags = array_combine($matches[1], $matches[2]);
}
?>
<!DOCTYPE html>
<html>
<head>
<title>How to get metadata from webpage link to show preview in PHP</title>
<link rel="stylesheet" type="text/css" href="style.css" />
</head>
<body>
<div class="phppot-container">
<!-- Create webpage link preview with open graph meta tags -->
<div class="site-preview">
<h2><?php echo htmlspecialchars($ogMetaTags['title'] ?? '', ENT_QUOTES, 'UTF-8', false); ?></h2>
<!-- Check if 'og:image' is set before including it in the HTML -->
<?php
if (isset($ogMetaTags['image'])) {
?>
<p><img src="<?php echo htmlspecialchars($ogMetaTags['image'], ENT_QUOTES, 'UTF-8', false); ?>" alt="Site Logo" width="100%"></p>
<?php
}
?>
<p><?php echo htmlspecialchars($ogMetaTags['description'] ?? '', ENT_QUOTES, 'UTF-8', false); ?></p>
</div>
</div>
</body>
</html>
APIs are available that fetch the webpage metadata and other data from a link. To use one, you need to create an access key from the provider's dashboard.
When you send the encoded webpage link along with the key, the API endpoint returns the metadata to the client.
Most of these APIs are paid services, but provide free trials so you can evaluate them first.
The example code below shows how to use a third-party API to get the metadata for a webpage preview.
// Example usage with a website URL
var siteUrl = 'http://example.com/';
// Specify the third-party API URL with the key
// E.g. `https://opengraph.io/api/1.1/site/${encodeURIComponent(siteUrl)}`
// E.g. `https://apiv2.ahrefs.com?token=${apiKey}&target=${encodeURIComponent(siteUrl)}`
fetch('API URL with access key')
.then(response => response.json())
.then(data => {
console.log(data);
})
.catch(error => {
console.error('Error fetching site metadata:', error);
});
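How the encoded link and the key end up in the request URL depends on the provider. As a rough sketch, assuming a hypothetical endpoint https://api.example.com/metadata that takes key and url query parameters (neither reflects a real service's API), the URL could be assembled like this before it is passed to fetch():

```javascript
// Hypothetical helper: the endpoint and the "key"/"url" parameter names are
// assumptions for illustration; check your provider's documentation.
function buildMetadataApiUrl(endpoint, apiKey, siteUrl) {
    // encodeURIComponent keeps characters such as ":" and "/" in the target
    // link from being read as part of the API URL itself
    return endpoint +
        '?key=' + encodeURIComponent(apiKey) +
        '&url=' + encodeURIComponent(siteUrl);
}

const apiUrl = buildMetadataApiUrl(
    'https://api.example.com/metadata',
    'YOUR_API_KEY',
    'http://example.com/'
);

// https://api.example.com/metadata?key=YOUR_API_KEY&url=http%3A%2F%2Fexample.com%2F
console.log(apiUrl);
```

The resulting string would take the place of the 'API URL with access key' placeholder in the fetch() call.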