Happy

Enumerate "Data" Big Idea from College Board

Review the big ideas and vocabulary below, and talk about them with a partner ...

  • "Data compression is the reduction of the number of bits needed to represent data"
  • "Data compression is used to save transmission time and storage space."
  • "Lossy compression can reduce the data further, but the original data cannot be fully recovered"
  • "Lossless compression lets you restore and recover the original data exactly"

The Image Lab Project contains a plethora of College Board Unit 2 data concepts. Working with Images provides many opportunities for compression and analyzing size.
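The definitions above can be tried out with a minimal sketch. It assumes Python's standard zlib module as the lossless compressor and a made-up, highly repetitive sample string; neither is part of the Image Lab itself.

```python
import zlib

# Hypothetical sample data: repetitive content compresses well
original = b"blue sky " * 200

compressed = zlib.compress(original)    # lossless compression (DEFLATE)
restored = zlib.decompress(compressed)  # exact reconstruction of the input

print("original bits:  ", len(original) * 8)
print("compressed bits:", len(compressed) * 8)
print("restored equals original:", restored == original)
```

Fewer bits are needed after compression, and because the algorithm is lossless the restored bytes match the original exactly.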

Image Files and Size

Here are some image files. Download these files and load them into the images directory under _notebooks in your Blog.

  • Clouds Impression

Describe some of the meta data and considerations when managing Image files. Describe how these relate to Data Compression ...

  • File Type, PNG and JPG are two types used in this lab
  • Size, height and width, number of pixels
  • Visual perception, lossy compression
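To see why height, width, and number of pixels matter for size, here is a rough sketch of the uncompressed size of an image. The helper name raw_size_bytes is hypothetical, and the calculation assumes one byte (8 bits) per channel. The 1200 x 1200 RGBA and 320 x 320 figures are the ones printed for smile.png later in this blog.

```python
def raw_size_bytes(width, height, channels):
    """Uncompressed size, assuming one byte per channel per pixel."""
    return width * height * channels

original = raw_size_bytes(1200, 1200, 4)  # RGBA has 4 channels
scaled = raw_size_bytes(320, 320, 4)

print("original raw bytes:", original)
print("scaled raw bytes:  ", scaled)
print("reduction factor:  ", original / scaled)
```

The file on disk is usually much smaller than the raw size, because PNG and JPG both compress the pixel data before saving.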

Python Libraries and Concepts used for Jupyter and Files/Directories

Introduction to displaying images in Jupyter notebook

IPython

IPython supports visualization of data in Jupyter notebooks. Visualization is specific to the notebook view; for the web, the visualization needs to be converted to HTML.

pathlib

File paths are written differently on Windows than on Mac and Linux. This can cause problems in a project as you work and deploy on different Operating Systems (OS's). pathlib is a solution to this problem.
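A small sketch of the difference, using the "pure" path classes from pathlib so that both styles can be shown on any machine. The file name smile.png matches the image used in this blog; the rest is illustration only.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# The same join written once, rendered for each family of operating systems
posix = PurePosixPath("images") / "smile.png"
windows = PureWindowsPath("images") / "smile.png"
print(posix)    # forward slash separator
print(windows)  # backslash separator

# Path picks the right flavor for the OS the code is running on
local = Path("images") / "smile.png"
print(local.name, local.suffix)
```

Because the join is done with the / operator instead of a hand-typed separator, the same line of code works wherever the notebook runs.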

Provide what you observed, struggled with, or learned while playing with this code.

  • What are commands you use in terminal to access files?
    • cd, code, vi
  • What are the commands you use in Windows terminal to access files?
    • cd, dir
  • What are some of the major differences?
    • Bash is a Unix shell that is standard on Linux. It can do a lot more things and is more flexible than the Windows command prompt.
  • Why is path a big deal when working with images?
    • The code needs the correct path to find each image file, so that the images display correctly
  • How does the meta data source and label relate to Unit 5 topics?
    • The meta data is like the meta data inside HTML web pages
  • Look up IPython, describe why this is interesting in Jupyter Notebooks for both Pandas and Images?
    • It is an interactive tool that can show images
from IPython.display import Image, display
from pathlib import Path  # https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f

# prepares a series of images
def image_data(path=Path("images/"), images=None):  # path of static images is defaulted
    if images is None:  # default image
        images = [
            {'source': "Internet", 'label': "Smile Emoji", 'file': "smile.png"},
        ]
    for image in images:
        # File to open
        image['filename'] = path / image['file']  # file with path
    return images

def image_display(images):
    for image in images:  
        display(Image(filename=image['filename']))


# Run this as standalone tester to see sample data printed in Jupyter terminal
if __name__ == "__main__":
    # print parameter supplied image
    Smile_Emoji = image_data(images=[{'source': "Internet", 'label': "Smile Emoji", 'file': "smile.png"}])
    image_display(Smile_Emoji)
    
    # display default images from image_data()
    default_images = image_data()
    image_display(default_images)
    

Reading and Encoding Images (2 implementations follow)

PIL (Python Image Library)

Pillow or PIL provides the ability to work with images in Python. Geeks for Geeks shows some ideas on working with images.

base64

Image formats (JPG, PNG) are binary file formats, which are difficult to embed directly in text-based content sent over HTTP. Base64 takes binary data three bytes (24 bits) at a time and encodes each group as four 6-bit Base64 digits, each written as one text character. Base64 is therefore used to transport and embed binary images in textual assets such as HTML and CSS.

  • How is Base64 similar or different to Binary and Hexadecimal?
  • Translate first 3 letters of your name to Base64.
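Here is a sketch of the translation exercise. The three letters "Man" are a stand-in sample, not anyone's name; swap in your own letters. It prints the same bytes in binary, hexadecimal, and Base64 so the three can be compared.

```python
import base64

text = "Man"  # hypothetical three-letter sample
raw = text.encode("ascii")  # 3 bytes = 24 bits

# Binary: 8 bits per byte
bits = "".join(f"{byte:08b}" for byte in raw)

# Base64 regroups the same 24 bits into four 6-bit digits
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
values = [int(group, 2) for group in groups]
encoded = base64.b64encode(raw).decode("ascii")

print("binary:      ", bits)
print("hexadecimal: ", raw.hex())  # 4 bits per hex digit
print("6-bit groups:", groups)
print("digit values:", values)
print("base64:      ", encoded)
```

All three are ways of writing the same bits as text. Binary uses 1 bit per character, hexadecimal 4 bits per character, and Base64 6 bits per character, which is why Base64 is the most compact of the three.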

numpy

Numpy is described as "The fundamental package for scientific computing with Python". In the Image Lab, a Numpy array is created from the image data in order to simplify access and change to the RGB values of the pixels, converting pixels to grey scale.

io, BytesIO

Input and Output (I/O) is a fundamental of all computer programming. I/O buffering is a technique used to optimize I/O operations by holding data in memory until it can be processed. With large quantities of data, the buffer is whatever has been queued but not yet processed, for example the frames of input a server currently has waiting. In this example, there is a very large picture that lags.

  • Where have you been a consumer of buffering?
  • From your consumer experience, what effects have you experienced from buffering?
  • How do these effects apply to images?
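As a small illustration of an in-memory buffer, the sketch below writes a few chunks of bytes into a BytesIO object and reads them back as one value. The chunks are the 8-byte PNG file signature split into pieces; the way it is split is arbitrary and only for demonstration.

```python
from io import BytesIO

# The PNG file signature, delivered in three pieces
chunks = [b"\x89PNG", b"\r\n", b"\x1a\n"]

with BytesIO() as buffer:
    for chunk in chunks:
        buffer.write(chunk)    # data accumulates in memory
    data = buffer.getvalue()   # everything written so far, as bytes

print(len(data), data)
```

The image code in this blog uses the same idea: the image is saved into a BytesIO buffer instead of a file on disk, and the buffered bytes are then Base64 encoded.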

Data Structures, Imperative Programming Style, and working with Images

Introduction to creating meta data and manipulating images. Look at each procedure and explain the purpose and results of this program. Add any insights or challenges as you explored this program.

  • Does this code seem like a series of steps are being performed?
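    • Yes, it first gets the images from the path in the images dict, then scales them and makes them grey scale
  • Describe Grey Scale algorithm in English or Pseudo code?
    • The red, green, and blue values of each pixel are averaged, and all three are set to that average, which turns the pixel a shade of grey
  • Describe scale image? What is before and after on pixels in three images?
    • They were either a lot bigger or smaller, and were scaled to a width of 320 pixels with the height adjusted in proportion
  • Is scale image a type of compression? If so, line it up with College Board terms described?
    • Yes, scaling an image down can be a form of compression. It is lossy, because pixels are discarded and the original cannot be rebuilt from the scaled copy. Separately, the file format (JPG or PNG) determines whether the saved file uses lossy or lossless compression.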
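The grey scale answer above can be sketched in plain Python, without PIL or numpy. The helper name grey_pixel and the sample pixels are made up for illustration, and this is the plain average without the extra color offsets used in the code in this blog.

```python
def grey_pixel(pixel):
    """Average R, G, B and use that value for all three channels."""
    r, g, b = pixel[:3]
    average = (r + g + b) // 3  # integer division keeps values whole
    if len(pixel) > 3:
        return (average, average, average, pixel[3])  # keep alpha as is
    return (average, average, average)

# Hypothetical pixels: pure red, a dark RGBA pixel, and white
pixels = [(255, 0, 0), (10, 20, 30, 128), (255, 255, 255)]
grey = [grey_pixel(p) for p in pixels]
print(grey)
```

When red, green, and blue are equal the pixel has no color cast, only brightness, which is why the result looks grey.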
from IPython.display import HTML, display
from pathlib import Path  
from PIL import Image as pilImage 
from PIL import ImageFilter  # used for the Gaussian blur
from io import BytesIO
import base64
import numpy as np

def image_data(path=Path("images/"), images=None): 
    if images is None: 
        images = [
            {'source': "Internet", 'label': "Smile Emoji", 'file': "smile.png"},
        ]
    for image in images:
        image['filename'] = path / image['file']  
    return images

def scale_image(img):
    baseWidth = 320
    scalePercent = (baseWidth/float(img.size[0]))
    scaleHeight = int((float(img.size[1])*float(scalePercent)))
    scale = (baseWidth, scaleHeight)
    return img.resize(scale)

def image_to_base64(img, format):
    with BytesIO() as buffer:
        img.save(buffer, format)
        return base64.b64encode(buffer.getvalue()).decode()

def image_management(image):   
    img = pilImage.open(image['filename'])
    
    image['format'] = img.format
    image['mode'] = img.mode
    image['size'] = img.size
    img = scale_image(img)
    image['pil'] = img
    image['scaled_size'] = img.size
    image['html'] = '<img src="data:image/png;base64,%s">' % image_to_base64(image['pil'], image['format'])

def image_management_add_html_grey(image):
    img = image['pil']  # note: putdata() below changes this image in place
    format = image['format']
    
    img_data = img.getdata()
    image['data'] = np.array(img_data)  # PIL pixel data to numpy array
    image['gray_data'] = []  # list of converted pixels

    for pixel in image['data']:
        average = (pixel[0] + pixel[1] + pixel[2]) // 3  # average RGB using integer division
        if len(pixel) > 3:
            # RGBA (PNG): the offsets tint the grey toward red and blue, alpha is kept
            image['gray_data'].append((average+60, average, average+150, pixel[3])) 
        else:
            image['gray_data'].append((average, average, average))
        
    img.putdata(image['gray_data'])
    image['html_grey'] = '<img src="data:image/png;base64,%s">' % image_to_base64(img, format)

def image_management_add_blur_text(image):
    img = image['pil']
    format = image['format']

    # Add Gaussian blur
    img_blur = img.filter(ImageFilter.GaussianBlur(radius=10))
    image['pil_blur_text'] = img_blur
    image['html_blur_text'] = '<img src="data:image/png;base64,%s">' % image_to_base64(img_blur, format)


if __name__ == "__main__":
    # Build the list of image dictionaries (defaults to smile.png)
    images = image_data()
    
    # Display meta data, scaled view, grey scale view, and blurred view for each image
    for image in images:
        # Load and scale the image
        image_management(image)
        
        # Display meta data and original image
        print("---- meta data -----")
        print(image['label'])
        print(image['source'])
        print(image['format'])
        print(image['mode'])
        print("Original size: ", image['size'])
        print("Scaled size: ", image['scaled_size'])
        
        print("-- original image --")
        display(HTML(image['html']))
        
        # Convert the image to grayscale and display it
        print("---  image ----")
        image_management_add_html_grey(image)
        display(HTML(image['html_grey']))
        
        # Blur the image with a Gaussian blur
        print("--- blurred with text ---")
        image_management_add_blur_text(image)
        display(HTML(image['html_blur_text']))
---- meta data -----
Smile Emoji
Internet
PNG
RGBA
Original size:  (1200, 1200)
Scaled size:  (320, 320)
-- original image --
---  image ----
--- blurred with text ---
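The scaled size in the output follows from the arithmetic in scale_image. Below is a sketch of just that calculation without PIL; the function name scaled_size is hypothetical, and the 640 x 480 input is a made-up second example.

```python
def scaled_size(size, base_width=320):
    """Width becomes base_width; height is scaled by the same ratio."""
    width, height = size
    scale_percent = base_width / float(width)
    return (base_width, int(float(height) * scale_percent))

before = (1200, 1200)  # original size reported for smile.png
after = scaled_size(before)

print("scaled size:", after)
print("pixels before:", before[0] * before[1])
print("pixels after: ", after[0] * after[1])
print("other example:", scaled_size((640, 480)))
```

The pixel count drops from 1,440,000 to 102,400, and the discarded pixels cannot be rebuilt from the scaled copy.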

Data Structures and OOP

Most data structures classes require Object Oriented Programming (OOP). Since this class is lined up with a college course, OOP will be talked about often. Functionality in the remainder of this Blog is the same as the prior implementation. Highlight some of the key differences you see between imperative and OOP styles.

  • Read imperative and object-oriented programming on Wikipedia
  • Consider how data is organized in the two examples, in relation to procedures
  • Look at Parameters in Imperative and Self in OOP
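A stripped-down sketch of the two styles side by side. The names describe and ImageInfo are hypothetical and much smaller than the real code in this blog; they only show where the data lives in each style.

```python
# Imperative style: data lives in a dict and is passed in as a parameter
def describe(image):
    return f"{image['label']} ({image['source']})"

# OOP style: data lives on the object and methods reach it through self
class ImageInfo:
    def __init__(self, label, source):
        self._label = label
        self._source = source

    @property
    def label(self):  # read-only access to the stored label
        return self._label

    def describe(self):
        return f"{self._label} ({self._source})"

image = {'label': "Smile Emoji", 'source': "Internet"}
obj = ImageInfo("Smile Emoji", "Internet")

print(describe(image))  # the function is handed the data
print(obj.describe())   # the object already holds the data
```

Both lines print the same text. The difference is that the imperative function needs the dictionary as a parameter every time, while the object carries its data with it.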

Additionally, review all the imports in these three demos. Create a definition of their purpose, specifically these ...

  • PIL
  • numpy
  • base64
from IPython.display import HTML, display
from pathlib import Path  # https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f
from PIL import Image as pilImage # as pilImage is used to avoid conflicts
from io import BytesIO
import base64
import numpy as np


class Image_Data:

    def __init__(self, source, label, file, path, baseWidth=320):
        self._source = source    # variables with self prefix become part of the object, 
        self._label = label
        self._file = file
        self._filename = path / file  # file with path
        self._baseWidth = baseWidth

        # Open image and scale to needs
        self._img = pilImage.open(self._filename)
        self._format = self._img.format
        self._mode = self._img.mode
        self._originalSize = self._img.size
        self.scale_image()
        self._html = self.image_to_html(self._img)
        self._html_grey = self.image_to_html_grey()


    @property
    def source(self):
        return self._source  
    
    @property
    def label(self):
        return self._label 
    
    @property
    def file(self):
        return self._file   
    
    @property
    def filename(self):
        return self._filename   
    
    @property
    def img(self):
        return self._img
             
    @property
    def format(self):
        return self._format
    
    @property
    def mode(self):
        return self._mode
    
    @property
    def originalSize(self):
        return self._originalSize
    
    @property
    def size(self):
        return self._img.size
    
    @property
    def html(self):
        return self._html
    
    @property
    def html_grey(self):
        return self._html_grey
        
    # Large image scaled to baseWidth of 320
    def scale_image(self):
        scalePercent = (self._baseWidth/float(self._img.size[0]))
        scaleHeight = int((float(self._img.size[1])*float(scalePercent)))
        scale = (self._baseWidth, scaleHeight)
        self._img = self._img.resize(scale)
    
    # PIL image converted to base64
    def image_to_html(self, img):
        with BytesIO() as buffer:
            img.save(buffer, self._format)
            return '<img src="data:image/png;base64,%s">' % base64.b64encode(buffer.getvalue()).decode()
            
    # Create Grey Scale Base64 representation of Image
    def image_to_html_grey(self):
        img_grey = self._img.copy()  # copy so putdata() does not change the scaled image
        pixels = np.array(self._img.getdata()) # PIL image to numpy array
        
        grey_data = [] # list of converted pixels
        # 'pixels' holds the RGB or RGBA values; each pixel is averaged and appended to grey_data
        for pixel in pixels:
            # create gray scale of image, ref: https://www.geeksforgeeks.org/convert-a-numpy-array-to-an-image/
            average = (pixel[0] + pixel[1] + pixel[2]) // 3  # average pixel values and use // for integer division
            if len(pixel) > 3:
                grey_data.append((average + 200, average, average, pixel[3])) # RGBA (PNG): + 200 tints the grey toward red, alpha is kept
            else:
                grey_data.append((average + 200, average, average))
            # end for loop for pixels
            
        img_grey.putdata(grey_data)
        return self.image_to_html(img_grey)

        
# prepares a series of images, provides expectation for required contents
def image_data(path=Path("images/"), images=None):  # path of static images is defaulted
    if images is None:  # default image
        images = [
            {'source': "Internet", 'label': "Smile_Emoji", 'file': "smile.png"},
        ]
    return path, images

# turns data into objects
def image_objects():        
    id_Objects = []
    path, images = image_data()
    for image in images:
        id_Objects.append(Image_Data(source=image['source'], 
                                  label=image['label'],
                                  file=image['file'],
                                  path=path,
                                  ))
    return id_Objects

# Jupyter Notebook Visualization of Images
if __name__ == "__main__":
    for ido in image_objects(): # ido is an Imaged Data Object
        
        print("---- meta data -----")
        print(ido.label)
        print(ido.source)
        print(ido.file)
        print(ido.format)
        print(ido.mode)
        print("Original size: ", ido.originalSize)
        print("Scaled size: ", ido.size)
        
        print("-- scaled image --")
        display(HTML(ido.html))
        
        print("--- grey image ---")
        display(HTML(ido.html_grey))
        
    print()

Hacks Responses

AP Prep

Practice Problems

1Q: Which of the following is an advantage of a lossless compression algorithm over a lossy compression algorithm?

  • (A) A lossless compression algorithm can guarantee that compressed information is kept secure, while a lossy compression algorithm cannot.

  • (B) A lossless compression algorithm can guarantee reconstruction of original data, while a lossy compression algorithm cannot.

  • (C) A lossless compression algorithm typically allows for faster transmission speeds than does a lossy compression algorithm.

  • (D) A lossless compression algorithm typically provides a greater reduction in the number of bits stored or transmitted than does a lossy compression algorithm.

  • 1A: B

2Q: A user wants to save a data file on an online storage site. The user wants to reduce the size of the file, if possible, and wants to be able to completely restore the file to its original version. Which of the following actions best supports the user’s needs?

  • (A) Compressing the file using a lossless compression algorithm before uploading it

  • (B) Compressing the file using a lossy compression algorithm before uploading it

  • (C) Compressing the file using both lossy and lossless compression algorithms before uploading it

  • (D) Uploading the original file without using any compression algorithm

  • 2A: A

3Q: A programmer is developing software for a social media platform. The programmer is planning to use compression when users send attachments to other users. Which of the following is a true statement about the use of compression?

  • (A) Lossless compression of video files will generally save more space than lossy compression of video files.

  • (B) Lossless compression of an image file will generally result in a file that is equal in size to the original file.

  • (C) Lossy compression of an image file generally provides a greater reduction in transmission time than lossless compression does.

  • (D) Sound clips compressed with lossy compression for storage on the platform can be restored to their original quality when they are played.

  • 3A: C

Lossy data

  • The JPEG format (.jpg, .jpeg) is a great example of lossy compression
  • Lossy means some of the original data is discarded to make the file smaller, so the change is not reversible

Lossless data

  • As the name suggests, lossless means the data can be compressed and restored without any quality being lost
  • Great examples of lossless formats are .png, .gif, and .raw
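To make "not reversible" concrete, here is a toy sketch of a lossy step. It is an assumption for illustration and not how JPEG actually works: it simply keeps the top 4 bits of each 8-bit channel value. The helper names quantize and restore and the sample values are made up.

```python
def quantize(value, bits=4):
    """Keep only the top `bits` bits of an 8-bit value (lossy)."""
    return value >> (8 - bits)

def restore(quantized, bits=4):
    """Best-effort reconstruction; the dropped low bits are gone."""
    return quantized << (8 - bits)

channel = [200, 201, 202, 37]              # hypothetical 8-bit channel values
stored = [quantize(v) for v in channel]    # 4 bits each instead of 8
approx = [restore(q) for q in stored]

print("original:", channel)
print("stored:  ", stored)
print("restored:", approx)
print("exact match:", approx == channel)
```

Half the bits are saved, but 200, 201, and 202 all come back as 192, so the original values cannot be recovered. A lossless method would have to bring back every value exactly.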

Programming Paradigms

The imperative style was chosen for these hacks. The code and its output are identical to the listing under "Data Structures, Imperative Programming Style, and working with Images" above, which scales the image, applies the tinted grey scale, and adds a Gaussian blur with PIL's ImageFilter.

Hacks

Early Seed award

  • Add this Blog to your own Blogging site.
  • In the Blog add a Happy Face image.
  • Have Happy Face Image open when Tech Talk starts, running on localhost. Don't tell anyone. Show to Teacher.

AP Prep

  • In the Blog add notes and observations on each code cell that requests an answer.
  • In the Blog add College Board practice problems for 2.3
  • Choose 2 images, one that will more likely result in lossy data compression and one that is more likely to result in lossless data compression. Explain.

Project Addition

  • If your project has images in it, try to implement an image change that has a purpose. (Ex. An item that has been sold out could become gray scale)

Pick a programming paradigm and solve some of the following ...

  • Numpy, manipulating pixels. As opposed to Grey Scale treatment, pick a couple of other types like red scale, green scale, or blue scale. We want you to be manipulating pixels in the image.
  • Binary and Hexadecimal reports. Convert and produce pixels in binary and Hexadecimal and display.
  • Compression and Sizing of images. Look for insights into compression Lossy and Lossless. Look at PIL library and see if there are other things that can be done.
  • There are many effects you can do as well with PIL. Blur the image or write Meta Data on screen, aka Title, Author and Image size.
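As a starting point for the pixel hacks above, here is a sketch in plain Python that works on pixel tuples directly. The helper names red_scale and pixel_report and the sample pixels are hypothetical; with PIL, the tuples would come from the image data instead.

```python
def red_scale(pixel):
    """Keep the red channel, zero out green and blue, keep alpha if present."""
    return (pixel[0], 0, 0) + tuple(pixel[3:])

def pixel_report(pixel):
    """Return the RGB part of a pixel as hexadecimal and binary strings."""
    rgb = pixel[:3]
    return {
        'hex': "#" + "".join(f"{c:02x}" for c in rgb),
        'binary': " ".join(f"{c:08b}" for c in rgb),
    }

# Hypothetical sample pixels: one RGB, one RGBA
pixels = [(255, 204, 0), (18, 52, 86, 255)]
for p in pixels:
    print(p, pixel_report(p), red_scale(p))
```

Green scale and blue scale follow the same pattern by keeping a different channel.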