
Showing posts with label Tutorial. Show all posts

Monday, 13 January 2025

Setting Up OCR for Windows and Linux: A Comprehensive Guide

Automating repetitive tasks like extracting text from images can save valuable time (unless you don't value it, in which case it will save some invaluable time). This process, known as Optical Character Recognition (OCR), converts text in images into editable text. Here, I’ll walk you through setting up custom OCR solutions for both Windows (which we all have) and Linux (used mostly at the office), complete with keyboard shortcuts for seamless integration.

Most of my personal OCR use is for Urdu, but professionally I need it for English as well. Google Lens is a fine option, except that you end up repeating the same clicks and key presses just to copy some text from an image. And to be honest, I kind of feel bad even for giant corps like Google when I unnecessarily use their 'precious' resources.

Why not use a browser extension, you ask? Because it's limited to the browser, and we need text from other apps as well. You could argue that one can take a screenshot of that app, then go to the browser and run OCR there. But if you've already opened a browser and gone to the trouble of taking a screenshot, why not just use Google Lens instead of an extension? You get the point.


Background

OCR technology is invaluable for tasks such as digitizing printed documents, extracting text from screenshots, or processing scanned images. By automating OCR, you can:

  • Instantly access extracted text.
  • Improve productivity.
  • Simplify your workflow.

This guide provides a step-by-step walkthrough for setting up OCR on Windows and Linux, ensuring a smooth and user-friendly experience.


Introduction

Why Automate OCR?

Manual text extraction is time-consuming and error-prone. Automating the process ensures:

  • Faster access to text data.
  • Minimal effort for repetitive tasks.
  • A consistent and reliable workflow.

How It Works

We’ll create scripts for Windows and Linux that:

  1. Capture an image or utilize an existing one.
  2. Perform OCR using Tesseract (an open-source OCR engine).
  3. Copy the extracted text directly to the clipboard.
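For comparison, the same three steps can be sketched in Python as one script. This is a minimal sketch and not part of the setup described in this guide. It assumes the `tesseract` binary is on your PATH and accepts `stdin`/`stdout` as file names, as the Linux script in this guide relies on. The helper names `tesseract_cmd`, `ocr_image` and `copy_to_clipboard` are hypothetical, and the clipboard step only covers Linux via `xclip`.

```python
import shutil
import subprocess


def tesseract_cmd(langs):
    """Build a Tesseract command that reads an image from stdin and prints
    the recognized text to stdout; languages are joined with '+'."""
    return ["tesseract", "stdin", "stdout", "-l", "+".join(langs)]


def ocr_image(image_bytes, langs=("eng",)):
    """Step 2: run Tesseract on raw image bytes and return the text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    result = subprocess.run(
        tesseract_cmd(langs), input=image_bytes, capture_output=True, check=True
    )
    return result.stdout.decode("utf-8")


def copy_to_clipboard(text):
    """Step 3, Linux only: hand the text to xclip (assumed to be installed)."""
    subprocess.run(
        ["xclip", "-selection", "clipboard"], input=text.encode("utf-8"), check=True
    )


# Usage (step 1 here is simply reading an existing image file):
# with open("screenshot.png", "rb") as f:
#     copy_to_clipboard(ocr_image(f.read(), langs=("ara", "eng")))
```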

Setup

Prerequisites

Before getting started, ensure you have the following:

  1. Tesseract OCR

    • Download and install from Tesseract’s official page.
    • Install the language packs you need (e.g., eng for English, ara for Arabic, urd for Urdu) and select them with -l, e.g. -l eng for English or -l ara+eng for Arabic and English.
  2. Clipboard Utilities

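    • Windows: Use nircmd (a free NirSoft command-line tool) to save an image from the clipboard to a file. The built-in clip command copies text back to the clipboard.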
    • Linux: Install xclip for clipboard management.
  3. Screenshot Tools

    • Windows: Use built-in snipping tools or third-party software.
    • Linux: Install flameshot for advanced screenshot functionality.

Procedure

For Windows

1. Create the OCR Script

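Create a batch file named sstoocr.bat and save it in a convenient folder, with nircmd.exe inside a nircmd subfolder next to it:

@echo off
:: Work from the script's own folder so the relative paths below resolve,
:: even when the script is launched from a desktop shortcut
cd /d "%~dp0"

:: Save the image on the clipboard to a file; "start /wait" makes the batch
:: file wait until nircmd has finished, so Tesseract never reads a missing file
start "" /wait nircmd\nircmd.exe clipboard saveimage screenshot.png

:: Run Tesseract OCR on the image (writes output.txt)
tesseract screenshot.png output -l ara+eng

:: Copy extracted text to clipboard
type output.txt | clip

:: Optionally, clean up
:: del screenshot.png
:: del output.txt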

2. Assign a Shortcut

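  1. Right-click the script and select Create Shortcut.
  2. Move the shortcut to your desktop (Windows only picks up shortcut keys from shortcuts on the desktop or in the Start menu).
  3. Right-click the shortcut, go to Properties, and under the Shortcut tab click the Shortcut key field and press O. Windows fills in Ctrl + Alt + O.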

3. Use the Script

  1. Copy an image to the clipboard, or take a screenshot straight to the clipboard (e.g., with Win + Shift + S).
  2. Press Ctrl + Alt + O.
  3. The extracted text will automatically be copied to your clipboard.

For Linux

1. Create the OCR Script

Create a shell script named flameshot_ocr.sh:

#!/bin/bash
flameshot gui --raw | tesseract -l eng stdin stdout | xclip -selection clipboard

Make the script executable:

chmod +x flameshot_ocr.sh

2. Assign a Shortcut

  1. Open your desktop environment’s keyboard settings.
  2. Add a custom shortcut:
    • Command: /path/to/flameshot_ocr.sh
    • Shortcut: Ctrl + Shift + O

3. Use the Script

  1. Press Ctrl + Shift + O to open the Flameshot GUI.
  2. Select the area to capture and confirm the selection (e.g., press Enter).
  3. The text will be extracted and copied to your clipboard.

Conclusion

By following this guide, you can set up a streamlined OCR solution for both Windows and Linux. With a simple keyboard shortcut, you’ll have quick access to extracted text directly on your clipboard, saving time and effort.

Feel free to customize these scripts to better suit your needs. Happy automating, and may your workflows become ever more efficient!

Saturday, 3 October 2020

Magic of Browser Bookmarks - Automate Simple Tasks using JavaScript

Automation using Bookmarklet

As I promised in #LearnedToday, I'm going to show you how much you can achieve with this little bookmark feature in browsers.

Ever wondered how to easily remove citations from a Wikipedia page? 

What are bookmarks?

Browser bookmarks save links to pages you want to visit again, or to pages you find useful and want to keep for later.

Instead of creating a text file "Imp Links" and saving all the links there (I've done it a lot), you could use the browser's bookmark feature.

The shortcut to bookmark a webpage in most browsers is Ctrl+D.

What more can they do?

In short, they can run JavaScript on a page. So instead of opening the browser console to run a couple of lines of code, you can create a bookmark and click that instead.

Example?

Whenever I needed to copy something from Wikipedia, I had to deal with their references/citations. You must have seen them: numbers in square brackets like [1], or notes like [citation needed]. I needed to remove all of those.

Initially, I used to remove them manually in MS Word with Find and Replace. I don't remember exactly how now, and it doesn't matter anyway.

Finally, I came to know about these browser bookmarklets, and then a simple regex was enough to do the work for me.

Now I have a simple bookmark. I go to any Wikipedia page, select the text I need, and click the bookmark. Voilà! The citations are removed.
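The bookmarklet itself runs JavaScript inside the page, but the regex idea behind it is language-neutral. Below is a minimal sketch of the same kind of find-and-replace in Python, applied to copied text. The pattern is an assumption about which markers to strip: numbers like [1], single-letter notes like [a], [note 3], and tags ending in "needed" such as [citation needed]. It is not the exact regex the bookmarklet uses.

```python
import re

# Assumed set of markers to strip: numeric references like [1], single-letter
# notes like [a], "[note 3]", and maintenance tags ending in "needed"
# such as [citation needed]. Other bracketed text is left alone.
CITATION = re.compile(r"\[(?:\d+|[a-z]|note \d+|[a-z ]+ needed)\]")


def strip_citations(text):
    """Remove Wikipedia-style citation markers from copied text."""
    return CITATION.sub("", text)


sample = "The river is long.[1][2] It floods often.[citation needed] See [map]."
print(strip_citations(sample))
# The river is long. It floods often. See [map].
```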

How to create a bookmarklet?

Go to the Bookmark Manager

1. Click the three vertical dots in the upper-right corner > Bookmarks > Bookmark manager

Or use the Chrome shortcut: Ctrl+Shift+O

Or type in the address bar: chrome://bookmarks/

2. Click the three vertical dots in the upper-right corner of the Bookmark Manager (tooltip: Organize)

3. Add new bookmark

4. It will show a popup with two fields: Name and URL. 

5. Give it an appropriate name. In the URL field, paste the JavaScript code you want to execute, prefixed with javascript: (e.g., javascript:alert('Hello');).

6. Click Save. 


You have your bookmarklet ready. 

Show/Hide Bookmarks bar with ctrl+shift+b. Clicking on the name of your bookmark will run the underlying code. 
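The URL field of a bookmarklet is just JavaScript behind a `javascript:` prefix. Here is a sketch of what a bookmarklet generator might do. The helper `make_bookmarklet` is hypothetical, not any particular tool's code. It wraps the code in an immediately invoked function, so its variables don't leak into the page and the whole expression evaluates to undefined. It then percent-encodes the result so it can be pasted as a URL.

```python
from urllib.parse import quote, unquote

PREFIX = "javascript:"


def make_bookmarklet(js_code):
    """Turn plain JavaScript into a bookmarklet URL (hypothetical helper).

    The code is wrapped in an immediately invoked function expression so its
    variables stay out of the page's global scope and the URL evaluates to
    undefined (if it evaluated to a string, the browser could replace the
    page with that string). The result is then percent-encoded.
    """
    wrapped = "(function(){" + js_code + "})();"
    return PREFIX + quote(wrapped)


def read_bookmarklet(url):
    """Decode a bookmarklet URL back into readable JavaScript."""
    return unquote(url[len(PREFIX):])


url = make_bookmarklet("alert(document.title);")
print(url)
print(read_bookmarklet(url))  # (function(){alert(document.title);})();
```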

Any easier way to do this?

If you don't want to go through all those steps, there's a simple tool called Bookmarkleter. Paste your JS code and it generates a link that you can drag and drop onto the bookmarks bar.

For example, drag and drop the following link to your bookmarks bar. This will allow you to change fonts on any website. 

Which bookmarklets am I using?

  1. Citation Remover: Removes citations from a Wikipedia page. Drag&drop this link to the bookmarks bar: Citation Remover
  2. Set Font: If a website uses a bad font, use this. I use Urdu a lot, and Urdu without a Nastaleeq font looks ugly, so I apply any font available on my system to the page. Payami Nastaleeq is my default.
  3. Calci: A tiny calculator which returns results of simple arithmetic operations.
  4. StyleStripper: Strips all CSS styles from a webpage. Helpful when I just want to copy something from a page without loading all of its styling. It also works on most sites that disable copying using JavaScript: click StyleStripper and you can copy the text.

Misc. bookmarklets I created

QuoraSkip: Skip Quora-enforced 'login' popup by removing added elements and blur overlay.

To those who requested, don't complain now. (Abuzar :D) I have shared it finally. More such tips will follow. Keep visiting! And I know you will. :wink:

Rab Raakha! (May God protect you!)

Friday, 2 October 2020

PDF to Single Image - A Tutorial by 17 Year Old Me

Back in the days when I had a small Nokia phone, I wanted to do EVERYTHING on that tiny device. It wasn't actually mine, but because I was going to college, I was more "in need" of it than my sister.

Nokia C1-01, the phone I had during my engineering studies
Source: gsmarena.com [1]

The one on the right, with the maroon border. That was it.

Anyway, with a screen of 144x160 px, I wanted to read the PDFs stored on our desktop and laptop: lots of books, in almost every genre I was interested in. Interestingly enough, the same neatly arranged folders have been copied over to every computer I have used since, so I still have all those books, plus whatever was added later.

Initially, the idea to "read PDFs on the phone" was for the Quran, so that I could read it in the Indo-Pak Naskh font. I did have a Quran app on it, with the full text and a super-fast search engine, but its font wasn't good enough for long recitation (tilaawat). In fact, even after getting an Android phone, I've been searching for something as fast as that app. I was a fan of the guy who built it. I just looked him up: he goes by the name Raza Mahi. His "Mahi Dictionary" was awesome too. Java .jar apps are a thing of the past now, but he has moved on as well and builds similar apps for Android. Good for him. I've linked his website in the references. [2]

So where was I? Yes. As I had difficulty reading the Quran in that app, I picked a PDF copy of the Quran that had the Arabic text in one column and its Urdu translation side by side. I cropped out the translation (making the text narrow enough to fit on my phone) and then started thinking about a way to achieve the result.

Necessity is the mother of invention, they say, so I came up with two methods (discussed in the booklet below). I'll attach the Quran files too, for the record. Wow! Time flies. Seems like yesterday to me.

Later on when I converted many books to 'single image' using the same method, I compiled a short tutorial in the form of a booklet. I've left the whole text as is, without any correction in grammar or sentence structure, because

  1. It's a reminder of my journey (read the booklet and see for yourself how writing styles change)
  2. It's cute. ;)
Here's the summary of the two methods discussed in the booklet:

Method 1: Microsoft Office OneNote + MS Paint
Method 2: PDF to Images + IrfanView
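The booklet has the actual steps. To make the idea of a "single image" concrete, here is a small sketch of the geometry involved when page images are joined top to bottom. Each page is scaled to a target width, by default the 144 px screen width mentioned above, with its aspect ratio kept, and the pages are then stacked. The function name and rounding are assumptions, and this is not how IrfanView or OneNote work internally.

```python
SCREEN_WIDTH = 144  # pixels, the phone screen width mentioned in this post


def stack_layout(page_sizes, target_width=SCREEN_WIDTH):
    """Plan a single tall image from page images stacked top to bottom.

    page_sizes: list of (width, height) in pixels, one per page.
    Returns (y_offsets, (canvas_width, canvas_height)), where y_offsets[i]
    is the row at which page i starts in the combined image.
    """
    y_offsets = []
    y = 0
    for width, height in page_sizes:
        # Scale the page so its width matches the screen, keeping the aspect ratio.
        scaled_height = round(height * target_width / width)
        y_offsets.append(y)
        y += scaled_height
    return y_offsets, (target_width, y)


# Three cropped pages of 720 x 1000 pixels each.
offsets, canvas = stack_layout([(720, 1000)] * 3)
print(offsets, canvas)  # [0, 200, 400] (144, 600)
```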

Read the booklet to learn how to use them. And remember, it's an OLD tutorial.

DOWNLOADS

PDF to Single Image Tutorial (Booklet) : Read online or download

https://archive.org/details/PDFToSingleImageShakes.Ahmad

IrfanView: I came to know later on that this was a very popular image-manipulation tool back then, and still is. Its first release was in June 1996, and it's more powerful than ever now. Check its Wikipedia page. [3]

https://www.irfanview.com

PDF to Images Converter: I still use it. Small size, works smoothly.

https://www.weenysoft.com/free-pdf-to-image-converter.html


Enjoy!


Reference

[1] Specifications of Nokia C1-01 via gsmarena [link]

[2] Raza Mahi Team - Old Apps [link]

[3] IrfanView on Wikipedia [link]

Sunday, 5 April 2020

LTE - Types, Features and Working

WHAT IS IT?

Assuming this is a new term for you and you have no idea what it is, “what on earth does this mean?” is the first thing you should ask. Let’s start with the full form: LTE stands for Long-Term Evolution.
OK. But evolution of what? I don’t know either. According to some sources, the name was part of the marketing, meant to appeal to customers. Alright, enough of the intro; here’s a simple explanation borrowed from Wikipedia:
Long-Term Evolution (LTE) is a standard for wireless broadband communication for mobile devices and data terminals.
You still don’t get it, do you? Remember 2G and 3G technologies? LTE is the next stepping stone in that journey. It builds on the 3G UMTS technology, and much of the LTE standard addresses upgrading 3G UMTS to what will eventually be 4G.
What’s the major difference between LTE and the third generation (3G)? Well, much of the work is aimed at simplifying the system’s architecture. But is it 4G? We’ll discuss this at the end of this blog. For now, let’s jump to its classification.

TYPES

LTE comes in two variants, which differ in two major respects:
  • How data is uploaded and downloaded
  • Which frequency spectrum the network is deployed in
So, based on these two factors, we have two types of LTE:
  1. Long-Term Evolution Time-Division Duplex (LTE-TDD)
  2. Long-Term Evolution Frequency-Division Duplex (LTE-FDD)
Before going further, let’s cover some basics of GSM and CDMA so that you know what these “divisions” are. Afterwards, you’ll be able to digest this easily.

GSM, CDMA and LTE

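Any mobile network has to solve two problems: (1) letting many phones share the same radio spectrum at once, and (2) actually encoding the data onto the radio signal. GSM and CDMA are two different ways to accomplish these two things. LTE is a newer one.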
GSM solves (1) with something called TDMA (time-division multiple access). When you’re on a phone call, your phone is given a set of time slots in which it either sends or receives data. These slots are exclusive to your phone and different from those of other phones in the cell, so there’s no interference. This way, multiple phones can talk to the cell tower (seemingly) at once; the bursts of time are so short that you don’t notice them.
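Here is a toy model of that scheduling idea. A GSM carrier divides each TDMA frame into 8 time slots of roughly 577 microseconds, and each active phone gets its own slot, so only one phone on that carrier transmits in any given slot. The helper names are hypothetical. Real GSM scheduling, with control channels and frequency hopping, is far more involved.

```python
SLOTS_PER_FRAME = 8      # a GSM TDMA frame has 8 time slots
SLOT_MICROSECONDS = 577  # each slot lasts roughly 577 microseconds


def assign_slots(phones):
    """Give each phone in the cell an exclusive slot number (0..7)."""
    if len(phones) > SLOTS_PER_FRAME:
        raise ValueError("more phones than slots on this carrier")
    return {phone: slot for slot, phone in enumerate(phones)}


def who_transmits(schedule, slot_counter):
    """Return the phone allowed to use the given slot, or None if it is idle.

    slot_counter keeps counting across frames, so it wraps with modulo.
    """
    slot = slot_counter % SLOTS_PER_FRAME
    for phone, assigned in schedule.items():
        if assigned == slot:
            return phone
    return None


schedule = assign_slots(["phone A", "phone B", "phone C"])
timeline = [who_transmits(schedule, n) for n in range(10)]
print(timeline)
# ['phone A', 'phone B', 'phone C', None, None, None, None, None, 'phone A', 'phone B']
```

A whole frame of 8 slots lasts only about 4.6 ms, which is why the gaps between your phone's turns go unnoticed.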
CDMA deals with (1) in a completely different way. It divides the channel by codes/signals (code-division multiple access). This is a little hard to explain without some math, but there’s a notion called orthogonality: if two signals are orthogonal, you can pull one signal out without getting interference from the other. Every user is assigned a different code/signal, and these are (approximately) orthogonal to each other. This is a more advanced technique and is generally considered advantageous because there isn’t as much waste (TDMA, for example, needs little bits of extra time between users to make sure there’s no overlap).
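To see orthogonality at work, here is a small sketch with two users and two orthogonal spreading codes of length 4. The codes are rows of a Walsh-Hadamard matrix, chosen for illustration. Both spread signals are simply added together, as if they shared the air. Each receiver recovers its own bits by correlating with its own code, and the other user's contribution sums to zero. Real CDMA uses much longer codes and must cope with noise and timing, which this ignores.

```python
# Two orthogonal spreading codes (rows of a 4x4 Walsh-Hadamard matrix):
# multiplying them chip by chip and summing gives 0.
CODE_A = [1, 1, 1, 1]
CODE_B = [1, -1, 1, -1]


def spread(bits, code):
    """Map each bit (0/1) to -1/+1 and multiply it by the user's code chips."""
    chips = []
    for bit in bits:
        symbol = 1 if bit else -1
        chips.extend(symbol * c for c in code)
    return chips


def despread(signal, code):
    """Correlate each code-length chunk with the code; the sign gives the bit."""
    n = len(code)
    bits = []
    for i in range(0, len(signal), n):
        corr = sum(s * c for s, c in zip(signal[i:i + n], code))
        bits.append(1 if corr > 0 else 0)
    return bits


bits_a = [1, 0, 1]
bits_b = [0, 0, 1]
# Both users transmit at the same time; on air their signals add up.
channel = [a + b for a, b in zip(spread(bits_a, CODE_A), spread(bits_b, CODE_B))]
print(despread(channel, CODE_A))  # [1, 0, 1]
print(despread(channel, CODE_B))  # [0, 0, 1]
```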
The way (2) is accomplished is also very different. In fact, there are many different ways it is done even within GSM or CDMA. How data is sent depends a lot on the quality of the radio signal, among other factors. That’s a whole other topic. But the options for GSM and CDMA differ.
3G and 4G are kind of marketing terms that come from "3rd generation" and "4th generation". They refer to families of standards, but not specific methods to accomplish (1) or (2).
Now that you know the basics, let’s get back to the types of LTE.

LTE-TDD and LTE-FDD

LTE-TDD uses a single frequency band, alternating between uploading and downloading data over time, while LTE-FDD uses paired frequencies, one to upload and one to download data.
Despite these differences in how the two types of LTE handle data transmission, LTE-TDD and LTE-FDD share 90 percent of their core technology. This makes it possible for the same chipsets and networks to use both versions of LTE.
Several companies, including Samsung and Qualcomm, produce dual-mode chips or mobile devices.
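A rough way to picture the difference: on a TDD carrier, time is sliced between the two directions, while FDD dedicates one frequency to each direction. The sketch uses an LTE radio frame of 10 subframes of 1 ms each and an illustrative TDD pattern. LTE-TDD defines several such uplink/downlink patterns, and this one is just an example. The helper names are hypothetical.

```python
# An LTE radio frame is 10 ms long and split into 10 subframes of 1 ms.
# Illustrative TDD pattern: D = downlink, U = uplink, S = special subframe
# (it contains the guard period used to switch from downlink to uplink).
TDD_PATTERN = "DSUUDDSUUD"


def tdd_direction(subframe):
    """On a single TDD carrier, the direction depends on *when* you look."""
    return TDD_PATTERN[subframe % len(TDD_PATTERN)]


def fdd_direction(carrier):
    """With FDD, the direction depends on *which* paired frequency is used."""
    return {"uplink carrier": "U", "downlink carrier": "D"}[carrier]


def tdd_time_share(pattern=TDD_PATTERN):
    """Fraction of subframes spent on downlink and uplink (special ones excluded)."""
    n = len(pattern)
    return pattern.count("D") / n, pattern.count("U") / n


print([tdd_direction(i) for i in range(10)])
print(tdd_time_share())
```

Putting more D subframes into the pattern is how a TDD network can favor downloads. With FDD, each direction simply keeps its own frequency all the time.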

FEATURES


  • Peak download rates up to 299.6 Mbit/s and upload rates up to 75.4 Mbit/s
  • Cost effective
  • Low data transfer latencies
  • Lower latencies for handover and connection setup time 
  • Higher network throughput
  • Improved support for mobility, exemplified by support for terminals moving at up to 350 km/h
  • Orthogonal frequency-division multiple access for the downlink, Single-carrier FDMA for the uplink to conserve power
  • Support for inter-operation and co-existence with legacy standards (GSM/GPRS or W-CDMA-based UMTS)
  • Uplink and downlink carrier aggregation
  • Packet-switched radio interface
  • It’s because of these features that most carriers supporting GSM networks can be expected to upgrade their networks to LTE at some stage


MADE OF?

What is LTE made of? That is, what does its working backbone consist of? Mostly things we have already discussed above. For the concepts you might not find familiar, I’ve linked resources so that you can get an idea of what they are.
  • OFDM (Orthogonal Frequency Division Multiplexing) for the downlink
  • SC-FDMA (Single Carrier FDMA) for the uplink
  • MIMO (Multiple Input Multiple Output)
  • E-UTRAN (for the network)
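Here is a small sketch of the OFDM idea behind the downlink. The transmitter puts one symbol on each subcarrier and combines them with an inverse discrete Fourier transform. The receiver applies the forward transform and gets each symbol back, because the subcarriers are orthogonal and don't interfere. Real LTE uses FFTs over hundreds of subcarriers plus a cyclic prefix. This toy uses 4 subcarriers, a plain O(n²) DFT and illustrative QPSK-style symbols. In OFDMA, different users are simply given different subsets of these subcarriers.

```python
import cmath


def idft(symbols):
    """Transmitter: one symbol per subcarrier, summed into time samples
    (inverse DFT, normalized by 1/n)."""
    n = len(symbols)
    return [
        sum(symbols[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def dft(samples):
    """Receiver: project the time samples back onto each subcarrier (forward DFT)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Illustrative QPSK-style symbols, one per subcarrier.
tx = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
time_signal = idft(tx)  # what goes over the air
rx = dft(time_signal)   # matches tx up to floating-point rounding
print([complex(round(z.real, 6), round(z.imag, 6)) for z in rx])
```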

VOICE CALLS IN LTE

One of the major problems in designing LTE was how to handle voice calls. LTE was primarily meant for (internet) data transfer, so carrying voice in a way that integrates with telecom operators was an issue.
With the adoption of LTE, carriers had to re-engineer their voice call networks. The reason is that the LTE standard supports only packet switching, with its all-IP network, whereas voice calls in GSM, UMTS and CDMA2000 are circuit-switched.
Three different approaches sprang up to handle this:

1] Voice over LTE (VoLTE)

VoLTE carries voice calls as IP packets over the LTE network itself, so it supports voice and data at the same time without either hampering the other. Traditional LTE networks, on the other hand, may not support voice and data together, or may degrade the quality of the voice call.

2] Circuit-Switched Fallback (CSFB)

LTE provides only data services. When a voice call is made, the phone falls back to the circuit-switched domain (the 2G/3G network).
Advantage: Operators can provide the service quickly.
Disadvantage: Longer call setup delay.

3] Simultaneous Voice and LTE (SVLTE)

The handset works in LTE and circuit-switched modes simultaneously.
LTE mode provides data services, while circuit-switched mode provides the voice service. This solution is based solely on the handset and has no special requirements on the network.
Disadvantage: The phone becomes more expensive and consumes more power.

IS IT 4G?

Now the controversy (not a big one, I know… but still, it is there.)
Contrary to popular belief, LTE in its original form was not always considered 4G. The ITU (International Telecommunication Union) determines what counts as 4G, and it had initially defined the requirements a technology had to meet (IMT-Advanced). LTE couldn’t meet those requirements.
Therefore, LTE is sometimes called 3.95G.
LTE-Advanced did make the cut. But businesses and telecom operators allegedly “influenced” the ITU to update its standards so that they could advertise their services as 4G to attract users.
As a result, there is a slight disagreement between businesspeople and technophiles on the definition of 4G. Technophiles consider the original ITU guidelines the standard for 4G.

CONCLUSION

To solve “how to get many people to share a piece of spectrum”, LTE uses OFDMA, which increases throughput.
Hope you get at least the gist of what’s been explained in this blog. If not, jump over to the pages linked in the article, or post a comment if you’re reading this on ShakesVision.

SHAKEEB AHMAD
April 05, 2020