PHP Classes

PHP Image Crawler: Crawl Web site pages to find images in the pages

Ratings: Not yet rated by the users
Unique User Downloads: Total: 93
Download Rankings: All time: 9,906, This week: 524 (up)
Version: image-crawler 1.0.0
License: MIT/X Consortium ...
PHP version: 5
Categories: Graphics, Searching, Web services
Description 


This package can crawl Web site pages to find images in the pages.

It provides a script that can be run from the command line. The script starts a robot that retrieves the Web page at a given URL and follows links to other Web pages in the same site.
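
The crawl described above can be sketched roughly as follows. This is only an illustration, not the package's implementation: the function names `sameSiteLinks` and `crawl`, the `$fetch` callable, and the in-memory `$pages` map are all hypothetical, and the sketch resolves only absolute and root-relative links.

```php
<?php
// Illustrative sketch only: these functions are hypothetical and are not
// part of the image-crawler package.

/**
 * Returns absolute URLs of links in $html that point to the same host as
 * $baseUrl. Only absolute and root-relative links are handled.
 */
function sameSiteLinks(string $html, string $baseUrl): array
{
    if ($html === '') {
        return [];
    }
    $scheme = parse_url($baseUrl, PHP_URL_SCHEME);
    $host = parse_url($baseUrl, PHP_URL_HOST);

    // Collect parser warnings internally; real-world HTML is often malformed.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if (strpos($href, '/') === 0 && strpos($href, '//') !== 0) {
            // Root-relative link: resolve it against the base URL.
            $href = $scheme . '://' . $host . $href;
        }
        if (parse_url($href, PHP_URL_SCHEME) !== null
            && parse_url($href, PHP_URL_HOST) === $host) {
            $links[$href] = true; // array keys de-duplicate the links
        }
    }
    return array_keys($links);
}

/**
 * Breadth-first crawl that starts at $startUrl, stays on the same host,
 * and stops following links beyond $maxLevel. $fetch returns the HTML of
 * a URL, or null when the page cannot be retrieved.
 */
function crawl(string $startUrl, callable $fetch, int $maxLevel = PHP_INT_MAX): array
{
    $seen = [];
    $fetched = [];
    $queue = [[$startUrl, 0]];
    while ($queue !== []) {
        [$url, $level] = array_shift($queue);
        if (isset($seen[$url]) || $level > $maxLevel) {
            continue;
        }
        $seen[$url] = true;
        $html = $fetch($url);
        if ($html === null) {
            continue;
        }
        $fetched[] = $url;
        foreach (sameSiteLinks($html, $url) as $link) {
            $queue[] = [$link, $level + 1];
        }
    }
    return $fetched;
}

// Hypothetical in-memory "site" used instead of real HTTP requests.
$pages = [
    'https://example.com/'  => '<a href="/a">A</a><a href="https://other.test/x">X</a>',
    'https://example.com/a' => '<a href="https://example.com/b">B</a><a href="/">Home</a>',
    'https://example.com/b' => '<p>no links</p>',
];
$fetch = function (string $url) use ($pages) {
    return $pages[$url] ?? null;
};

print_r(crawl('https://example.com/', $fetch));
```

Passing the fetcher as a callable keeps the traversal logic independent of how pages are downloaded, so the sketch can be exercised without network access.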

The package can return the number of image tags that it finds in the retrieved pages and save a report to a text file.
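
As a rough sketch of the counting and reporting step, assuming the page HTML is already available as strings: the helpers `countImgTags` and `buildReport`, the tab-separated report layout, and the output file name are hypothetical and do not reflect the package's actual report format.

```php
<?php
// Illustrative sketch only: hypothetical helpers, not the package's
// ImgCountHandler or its report classes.

/** Counts the <img> tags in an HTML string. */
function countImgTags(string $html): int
{
    if ($html === '') {
        return 0;
    }
    // Collect parser warnings internally; real-world HTML is often malformed.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc->getElementsByTagName('img')->length;
}

/** Builds a plain-text report with one "URL<TAB>count" line per page. */
function buildReport(array $pages): string
{
    $lines = [];
    foreach ($pages as $url => $html) {
        $lines[] = sprintf("%s\t%d", $url, countImgTags($html));
    }
    return implode("\n", $lines) . "\n";
}

// Hypothetical sample pages keyed by URL.
$pages = [
    'https://example.com/'  => '<img src="a.png"><p><img src="b.png"></p>',
    'https://example.com/a' => '<p>no images here</p>',
];

// Save the report to a text file in the system temporary directory.
$file = sys_get_temp_dir() . '/img-report-example.txt';
if (file_put_contents($file, buildReport($pages)) === false) {
    echo "File ", $file, " can't be saved.\n";
} else {
    echo "File ", $file, " saved.\n";
}
```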

Author: Igor Dyshlenko
Innovation award nominee: 1x

 

Example

#!/usr/bin/env php
<?php
if (!file_exists(__DIR__ . '/vendor/autoload.php')) {
    echo "\nThe crawler utility is not installed. Run \"composer install\" or \"composer update\" to install it.\n\n";
    exit(1);
}

error_reporting(E_ERROR);

include __DIR__ . '/vendor/autoload.php';

use App\Console\ArgumentHolder;
use App\ContentLoader;
use App\ImgCountHandler;
use Domain\Site;

const DEFAULT_TIMEOUT = 60,
      DEFAULT_LEVEL = PHP_INT_MAX,
      SITE_INDEX = 0;

$consoleArguments = new ArgumentHolder();

$url = $consoleArguments->getParameter(SITE_INDEX);

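// FILTER_VALIDATE_URL already requires a scheme and a host. The explicit
// FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED flags were
// deprecated in PHP 7.3 and removed in PHP 8.0, so they are omitted here.
$paramsError =
    (filter_var($url, FILTER_VALIDATE_URL) === false) ||
    (($parsed = parse_url($url)) === false);
// Only http and https URLs are accepted.
$paramsError = $paramsError || !in_array($parsed['scheme'] ?? '', ['http', 'https'], true);
if ($paramsError) {
    echo "\nIncorrect URL ", $url, "\n\nUse the utility as follows.\n\n";
}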

if ($paramsError || $url === null || $consoleArguments->getOption('h') !== null) {
    // Look for help.txt next to this script rather than in the current working directory.
    if ($text = file_get_contents(__DIR__ . '/help.txt')) {
        echo $text;
    } else {
        echo "File \"help.txt\" not found.\n";
    }
    exit(1);
}
$start = microtime(true);
set_time_limit($consoleArguments->getOption('t') ?? DEFAULT_TIMEOUT);

$loader = ContentLoader::getInstance();
$site = new Site($url);
$handler = new ImgCountHandler($site, $url, $loader, [], $consoleArguments->getOption('l') ?? DEFAULT_LEVEL);

$report = $handler->handle($url);
$fullFilename = ($consoleArguments->getOption('d') ?? '.') . '/' . $report->getDefaultFilename();

if (file_put_contents($fullFilename, $report->getContent()) === false) {
    echo "\n\nFile ", $fullFilename, " can't be saved.\n";
    exit(1);
}

echo "\nFile ", $fullFilename, " saved.\n", sprintf("Full runtime = %.3f sec.\n", microtime(true) - $start);
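
To see which inputs the URL check in the script accepts, the same test can be isolated in a small function. This is a sketch under assumptions: `isHttpUrl` is a hypothetical name that does not appear in the package, and the sketch relies on `FILTER_VALIDATE_URL` alone, which always requires a scheme and a host.

```php
<?php
// Illustrative sketch only: isHttpUrl() is a hypothetical helper that
// mirrors the URL check in the script; it is not part of the package.
function isHttpUrl($url): bool
{
    // Reject non-strings and anything that is not a syntactically valid URL.
    if (!is_string($url) || filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // Accept only the http and https schemes.
    $scheme = parse_url($url, PHP_URL_SCHEME);
    return in_array($scheme, ['http', 'https'], true);
}

var_dump(isHttpUrl('https://example.com/gallery')); // bool(true)
var_dump(isHttpUrl('ftp://example.com/'));          // bool(false)
var_dump(isHttpUrl('example.com'));                 // bool(false)
var_dump(isHttpUrl(null));                          // bool(false)
```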


Files (19)

File                            Role      Description
App (4 files, 1 directory)
Domain (4 files)
Infrastructure (1 directory)
tests (4 files)
composer.json                   Data      Auxiliary data
composer.lock                   Data      Auxiliary data
crawler                         Example   Example script
help.txt                        Doc.      Documentation
phpunit.xml                     Data      Auxiliary data

Version Control: 100%
Unique User Downloads: Total: 93, This week: 0
Download Rankings: All time: 9,906, This week: 524 (up)