Convert And Replace Embedded base64 Images In WordPress

In a recent case, I had a WordPress installation with hundreds of articles, each containing several embedded base64 images. For SEO reasons, among others, those base64 images had to be converted to physical image files and replaced in the articles. At that volume, embedding images as base64 loses its raison d’être. I wrote some scripts to resolve the issue. The following code is not clean or tidy, but it serves its purpose.

My solution consists of two parts. In the first part, we fetch the affected articles and extract the information about the embedded base64 images that the second part needs. In the second part, we upload media attachments to WordPress and replace the embedded base64 code with the uploaded media files. Let’s start this shit!

Part 1: Prepare and collect information about the embedded base64 images

<?php

// Load the WordPress framework, a bunch of functions and more
require_once( dirname(__FILE__).'/initialize.php');

// Get all published articles by the post_type 'post'
$posts_array = get_posts( array(
        'post_type'        => 'post',
        'post_status'      => 'publish',
        'posts_per_page'   => -1,
    ) );

// Go ahead if at least one article exists
if( count( $posts_array ) > 0 )
{
    // Loop over the whole article list
    foreach( $posts_array as $post )
    {
        // Go ahead if the post_content contains more than 1500 characters
        // INFO: A cheap pre-filter; shorter content is unlikely to contain a base64 image
        if( strlen( $post->post_content ) > 1500 )
        {
            // Collect and prepare the base64 information of each article for further processing
            $basesf_array = parse_code( $post->post_content, $post->ID, $mime_types_map );

            // Go ahead if we collect at least one embedded base64 image in the article
            if( $basesf_array )
            {
                // Loop over the articles in which embedded base64 images were found
                foreach( $basesf_array as $post_id => $basesf )
                {
                    // Loop over the base64 images found in a single article
                    foreach( $basesf as $baseimage )
                    {
                        // Define a fictional filename for the media attachment
                        // TODO: Collect the alt or title attributes for a better filename
                        $filename = sanitize_title( substr( $baseimage['base64'], 0, 15 ) ).'.'.$baseimage['extension'];

                        // For the forthcoming upload procedure, we have to create a physical file by decoding the base64 image
                        if( @file_put_contents( get_temp_dir().'/'.$filename, base64_decode( $baseimage['base64'] )) )
                        {
                            $curl_post_array =
                                array(
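                                    // CURLFile replaces the '@' prefix, which was deprecated in PHP 5.5 and removed in PHP 7
                                    'file'         => new CURLFile( get_temp_dir().'/'.$filename ),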
                                    'filebasename' => md5( $baseimage['full'] ),
                                    'post_id'      => $post->ID,
                                    'base64full'   => $baseimage['full']
                                );

                            // Make the curl http post request for part 'base64-fix-part-two.php'
                            curl_post( $public_wp_path.'base64-fix-part-two.php', $curl_post_array );
                        }
                    }
                }
            }
        }
    }
}
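
To see what parse_code() pulls out of each src attribute, here is a minimal, standalone sketch of splitting a data URI into its MIME type and base64 payload. The helper name split_data_uri() and the sample payload are made up for illustration, and the pattern is a simplified variant, not the exact one used in the scripts.

```php
<?php

// Hypothetical helper (not part of the scripts in this article):
// split a data URI into its MIME type and base64 payload.
function split_data_uri( $src )
{
    // '#' is the delimiter, so the slashes in the pattern need no escaping
    if( preg_match( '#^data:([-\w]+/[-\w.+]+);base64,([0-9a-zA-Z+/=]+)$#', $src, $m ) )
    {
        return array( 'mime-type' => $m[1], 'base64' => $m[2] );
    }

    // Not a base64 data URI (e.g. a regular image URL)
    return null;
}

// Made-up payload: the bytes "hello" encoded as base64
$src   = 'data:image/png;base64,' . base64_encode( 'hello' );
$parts = split_data_uri( $src );

echo $parts['mime-type'], "\n";                // image/png
echo base64_decode( $parts['base64'] ), "\n";  // hello
```

A regular image URL returns null here, which is why images that are already physical files are left untouched.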

Part 2: Upload new media files and replace the base64 code lines with the uploaded media files

<?php

// Load the WordPress framework, a bunch of functions and more
require_once( dirname(__FILE__).'/initialize.php');

if( !function_exists( 'wp_handle_upload' ) )
{
    require_once( ABSPATH . 'wp-admin/includes/file.php' );
    require_once( ABSPATH . 'wp-admin/includes/image.php' );
}

// Get the uploaded file entry (an array with 'name', 'tmp_name', ...) from the form data
$uploadedfile = $_FILES['file'];

// Get the post_id
$post_id      = (int) $_POST['post_id'];

// Get the full base64 image code for replacing the 'post_content'
$base64full   = $_POST['base64full'];

// Get the filebasename to check whether the media attachment already exists in the WordPress media library
$filebasename = $_POST['filebasename'];

// Set a fixed 'user_id' for the post updates (1 = Admin)
$user_id = 1;

// Check whether a media with the 'filebasename' already exists and get the 'attachment_id'
$attach_id = get_attachment( $filebasename );

// If no attachment with the 'filebasename' was found, upload the media file
if( !$attach_id )
{
    // Init the media upload to ./wp-content/uploads/.. or wherever you defined your upload folder
    $movefile = wp_handle_upload( $uploadedfile, array( 'test_form' => false, 'test_upload' => false ) );

    // Yay, the upload process was successful (on failure the array contains an 'error' key). Go ahead...
    if( $movefile && !isset( $movefile['error'] ) )
    {
        // Get an array of path information
        $wp_upload_dir = wp_upload_dir();

        // Delete the temporary upload file if it is still around
        // ($uploadedfile is the $_FILES entry, so the path lives in 'tmp_name')
        @unlink( $uploadedfile['tmp_name'] );

        // Define the post data for the attachment
        $attachment =
            array(
                'guid'           => $wp_upload_dir['url'] . '/' . basename( $movefile['file'] ),
                'post_mime_type' => $movefile['type'],
                'post_title'     => preg_replace('/\.[^.]+$/', '', basename($movefile['file'])),
                'post_content'   => $filebasename,
                'post_status'    => 'inherit'
            );

        // Insert an attachment into the media library by the uploaded file
        $attach_id = wp_insert_attachment( $attachment, $movefile['file'], $post_id );
        // Do some more stuff with the uploaded attachment
        $attach_data = wp_generate_attachment_metadata( $attach_id, $movefile['file'] );
        // Update the attachment meta data
        $attachment_metadata =  wp_update_attachment_metadata( $attach_id,  $attach_data );
    }
}

// Get the attachment media url by the 'ID' (post id)
$attachment_url = wp_get_attachment_url( $attach_id );

// Replace the embedded base64 code in the 'post_content' with the 'attachment_url'
// (skipped if no attachment exists, otherwise the image would be replaced by an empty string)
if( $attachment_url )
{
    post_content_replacement( $post_id, $user_id, $base64full, $attachment_url );
}
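
The replacement step can be tried in isolation, without WordPress. The following is only a sketch: the post content, the payload, and the attachment URL are made up, and the URL stands in for whatever wp_get_attachment_url() would return. It also shows why the md5 hash of the full data URI works as the 'filebasename': identical images always produce the same key.

```php
<?php

// Made-up post content with one embedded image
$data_uri = 'data:image/gif;base64,' . base64_encode( 'not a real gif' );
$content  = '<p>Intro</p><img src="' . $data_uri . '" alt="demo">';

// Same input, same key: identical images map to one attachment
$key = md5( $data_uri );

// Hypothetical URL standing in for the result of wp_get_attachment_url()
$attachment_url = 'https://example.com/wp-content/uploads/demo.gif';

// Swap the complete data URI for the URL; the rest of the <img> tag stays as it is
$updated = str_replace( $data_uri, $attachment_url, $content );

echo $updated, "\n";
// <p>Intro</p><img src="https://example.com/wp-content/uploads/demo.gif" alt="demo">
```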

And last but not least, the missing functions…

<?php

/**
 * Check whether a media file with the 'filebasename' exists and return the ID (post id)
 *
 * @param  string $filebasename The filebasename for the database query
 * @return int Return the ID or 0 if no attachment was found
 */
function get_attachment( $filebasename )
{
    global $wpdb;

    // prepare() quotes the %s placeholder itself; get_var() returns null if nothing matched
    $id = $wpdb->get_var( $wpdb->prepare( 'SELECT ID FROM '.$wpdb->posts.' WHERE post_content = %s LIMIT 1', $filebasename ) );
    return (int) $id;
}

/**
 * Replace the embedded base64 code in the 'post_content' with the 'attachment_url'
 *
 * @param int    $post_id The ID (post_id)
 * @param int    $user_id The user_id which we need for the 'post_author' binding
 * @param string $base64 The raw base64 code for the replacement
 * @param string $attachment_url The media url for the replacement
 */
function post_content_replacement( $post_id, $user_id, $base64, $attachment_url )
{
    global $wpdb;

    // Get a single article by the ID (post_id)
    $post = get_post( $post_id );

    // Nothing to do if the article does not exist
    if( !$post )
    {
        return;
    }

    // Replace the base64 code with the media url and save the result in the '$post_content' variable
    $post_content = str_replace( $base64, $attachment_url, $post->post_content );

    // Update the 'post_content' of the article
    $wpdb->query( $wpdb->prepare( 'UPDATE '.$wpdb->posts.' SET post_content = %s WHERE ID = %d LIMIT 1', $post_content, $post_id ) );

    // OR update the article with wp_update_post() and let WordPress create a revision of the latest version
    // In that case you have to update the post_author manually afterwards!!!
    /*
        wp_update_post( array( 'ID' => $post_id, 'post_content' => $post_content ));

        // Update the 'post_author' of that previously created article
        $wpdb->update( $wpdb->posts, array( 'post_author' => $user_id ), array( 'ID' => $wpdb->insert_id ) );
    */
}

/**
 * Make a simple curl http post request
 *
 * @param string $url The url of the calling script
 * @param array $post_array The post_array contains the params and values
 * @return string|false Returns the response body on success or false on failure
 */
function curl_post( $url, $post_array )
{
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $post_array);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    $result = curl_exec( $ch );
    curl_close( $ch );

    return $result;
}

/**
 * Get the file extension by the array of mapped mime-types
 *
 * @param array  $mime_types_map mime-type and extension mapping
 * @param string $mime_content_type the mime-type from embedded base64 code
 * @return string|null Return the matched file extension, or null if none matches
 */
function get_file_extension( $mime_types_map, $mime_content_type )
{
    foreach( $mime_types_map as $ext => $mime_type)
    {
        if( $mime_type === $mime_content_type )
        {
            return $ext;
        }
    }
}

/**
 * Create array structure based on the base64 matches
 *
 * @param array $matches The matches from the previous regex
 * @param array $mime_types_map An array with mapped mime-types and file extensions
 * @return array Return an array with base64 information
 */
function prepare_matches( $matches, $mime_types_map )
{
    return
        array(
            'extension' => get_file_extension( $mime_types_map, $matches[1][0] ),
            'mime-type' => $matches[1][0],
            'base64'    => $matches[2][0],
            'full'      => $matches[0][0],
        );
}

/**
 * Parse the post_content and collect all embedded base64 images
 *
 * @param string $content The post_content
 * @param integer $post_id The post_id (needed for the array structure)
 * @param array $mime_types_map An array with mime-types and extensions
 * @return array Return an array with the collected base64 information
 */
function parse_code( $content, $post_id, $mime_types_map )
{
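    // Instantiate DOMDocument (a static loadHTML() call is an error since PHP 8)
    // and keep libxml quiet about the sloppy markup typical of post_content
    $dom = new DOMDocument();
    libxml_use_internal_errors( true );
    $dom->loadHTML( $content );
    libxml_clear_errors();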

    $tags = $dom->getElementsByTagName('img');

    $basesf_array[$post_id] = array();

    if( $tags )
    {
        foreach( $tags as $tag )
        {
            // '#' is used as the delimiter because the patterns contain literal slashes
            preg_match_all( '#data:([-/\w]+);base64,([0-9a-zA-Z+/=]{20,})#i', $tag->getAttribute('src'), $matches );

            if( ( isset($matches[1][0]) && preg_match('#([-\w]+/[-\w]+)#i', $matches[1][0]) ) && isset($matches[2][0]) )
            {
                $basesf_array[$post_id][] = prepare_matches( $matches, $mime_types_map );
            }
        }
    }

    if( empty($basesf_array[$post_id]) )
    {
        unset( $basesf_array[$post_id] );
    }

    return $basesf_array;
}
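
The listings rely on initialize.php, $mime_types_map and $public_wp_path, none of which are shown in this article. As a rough sketch of what the map could look like, assuming the extension => MIME type shape that get_file_extension() iterates over (the entries below are my own picks, not necessarily the ones in the repository):

```php
<?php

// Assumed shape: file extension => MIME type, as get_file_extension() iterates it.
// The entries are illustrative; extend the map to whatever your articles contain.
$mime_types_map = array(
    'png'  => 'image/png',
    'jpg'  => 'image/jpeg',
    'gif'  => 'image/gif',
    'webp' => 'image/webp',
);

// array_search() does the same lookup as the foreach loop in get_file_extension(),
// but returns false instead of null when nothing matches
$ext = array_search( 'image/jpeg', $mime_types_map, true );

echo $ext, "\n"; // jpg
```

A MIME type that is missing from the map yields no extension, so the temporary file would end in a bare dot; it is worth checking the map against the image types that actually occur in your content.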

Download the complete scripts from my GitHub repository.
