Wilcox Development Solutions Blog

Remarkable Handwriting with Vertex AI

April 19, 2024

Some Background: The Remarkable tablet

The Remarkable is an e-ink powered device that leans into writing as an experience. I often refer to it as the “infinity notebook”: in the UI you create notebooks which have pages, which you write on. Instead of a stack of notebooks I have my Remarkable.

Important to my purchase was the ability to get information out of the Remarkable. The Remarkable app, on desktops and iOS, lets you export Remarkable notebooks or individual pages to PDF, PNG, and SVG. In fact, that last one enables easy migration of handwritten drawings to OmniGraffle.

So far, for the most part, the features of the Remarkable supplement that focus on writing. You can - and I do - enter text into the Remarkable via an onscreen or (proprietary) keyboard, and it can do handwriting recognition and conversion to text.

For my handwriting the conversion to text is OK, but not great. As such I rarely use it.

An example:

My handwriting

Remarkable’s Handwriting recognition output:

Output from the Remarkable's handwriting OCR

Not great.

Using GenAI for better handwriting recognition

The other day I was exploring Vertex AI on Google Cloud.

the Vertex AI homepage

Huh, I can upload media. I tried it with my sample page:

What Vertex AI did with the handwriting recognition. It's MUCH better

If I adjust the temperature - the setting that controls how random, or "creative", the output is - I can (sometimes) get the AI to extrapolate from the information given. Most of the time I don’t want this, but it is a knob I could tweak.
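As a rough sketch of what turning that knob looks like in code, the helper below builds a generation config with a low temperature by default, for transcription that stays close to the page. The function name `ocr_generation_config` is hypothetical, and the dictionary keys are assumed to match the ones in the generated script later in this post.

```python
def ocr_generation_config(temperature=0.0):
    """Build a generation config dict for transcription-style prompts.

    Hypothetical helper: the keys mirror the config in the script later
    in this post; the default temperature of 0.0 is an assumption about
    what suits OCR, not a documented recommendation.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    return {
        "max_output_tokens": 8192,
        "temperature": temperature,
        "top_p": 0.95,
    }


# Low temperature: stay close to what is written on the page.
faithful = ocr_generation_config()

# Higher temperature: give the model more room to extrapolate.
creative = ocr_generation_config(temperature=1.0)
```

Either dict could then be passed as the `generation_config` argument when sending the prompt.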

Next Step

Then I asked myself: if Google can do this, can I write a shell script that runs this prompt with an arbitrary file, from my desktop?

After some flailing around I noticed the Get Code button.

The Get Code button.

This reproduces your prompt in code, including what libraries you need to install.

The get code pane

This also includes any attachments, saved in your code as base64 encoded data.
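To make the base64 part concrete, here is a minimal sketch using only Python's standard library. The helper names `encode_attachment` and `decode_attachment` are made up for illustration, and the exact shape of the generated code is not reproduced here; the point is only that base64 turns raw file bytes into plain text that can live in a source file, and back again.

```python
import base64


def encode_attachment(raw: bytes) -> str:
    """Turn raw file bytes into a base64 string that can sit in source code."""
    return base64.b64encode(raw).decode("ascii")


def decode_attachment(encoded: str) -> bytes:
    """Recover the original bytes from a base64 string."""
    return base64.b64decode(encoded)


# Stand-in bytes for illustration; a real attachment would be a whole PDF.
sample = b"%PDF-1.4 example bytes"
encoded = encode_attachment(sample)
restored = decode_attachment(encoded)
```

Reading the file from disk at run time, as the final script does, avoids carrying a large encoded string around in the code.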

Doing it yourself, the final script

With some minor modifications I get the following script:


import sys
import vertexai
from vertexai.generative_models import GenerativeModel, Part
import vertexai.preview.generative_models as generative_models

PROJECT = "FILL ME IN HERE"  # your Google Cloud project ID

# The prompt sent along with the document.
PROMPT = """please use OCR to process this handwritten document into Markdown format."""

generation_config = {
    "max_output_tokens": 8192,
    "temperature": 1,
    "top_p": 0.95,
}

safety_settings = {
    generative_models.HarmCategory.HARM_CATEGORY_HATE_SPEECH: generative_models.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    generative_models.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: generative_models.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    generative_models.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: generative_models.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    generative_models.HarmCategory.HARM_CATEGORY_HARASSMENT: generative_models.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
}

def generate_content(file_path):
  # Read the exported PDF as raw bytes and wrap it as a message part.
  with open(file_path, "rb") as f:
    document = Part.from_data(
        mime_type="application/pdf",
        data=f.read())

  vertexai.init(project=PROJECT, location="us-central1")
  model = GenerativeModel(
    "gemini-1.5-pro-preview-0409",
  )
  chat = model.start_chat()
  result = chat.send_message(
      [document, PROMPT],
      generation_config=generation_config,
      safety_settings=safety_settings
  )

  # Print the text of the first candidate response.
  print(result.candidates[0].content.parts[0].text)

if __name__ == "__main__":
  # Expect exactly one argument: the path to the PDF to transcribe.
  if len(sys.argv) != 2:
    sys.exit(f"usage: {sys.argv[0]} FILE.pdf")
  generate_content(sys.argv[1])

Which works!
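The script hard-codes `application/pdf`, but the Remarkable app also exports PNG. A small sketch of picking the MIME type from the file extension follows. The name `mime_type_for` and the table of supported types are assumptions for illustration. SVG is left out because this post does not confirm whether the model accepts it.

```python
from pathlib import Path

# Assumed mapping for illustration: the export formats tried in this post.
SUPPORTED_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".png": "image/png",
}


def mime_type_for(file_path: str) -> str:
    """Return the MIME type for a supported export, based on its extension."""
    suffix = Path(file_path).suffix.lower()
    try:
        return SUPPORTED_MIME_TYPES[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}") from None
```

The result could be passed as the `mime_type` argument to `Part.from_data` in place of the fixed string.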

Conclusion

The Google Vision API offers more direct ways to work with known kinds of documents, or with a data set I provide myself. For example, the Vision API has templates for bills, bank statements, and forms. If I had more specific needs (translating a handwritten form 100 times a day, for example) I might look into those APIs. For now, letting generative AI figure it out with a prompt and a given PDF file works great!



Written by Ryan Wilcox Chief Developer, Wilcox Development Solutions... and other things