(Very) Basic Intro to Hash Functions (SHA-256, MD5, etc)
Why Use A Hash Function?
Hash functions are used all over the internet in order to securely store passwords, find duplicate records, quickly store and retrieve data, and more. For example, Qvault uses hashes to extend master passwords into private encryption keys.
Scrambles Data Deterministically
Think of a Rubix cube.
Determinism is important for securely storing a password. For instance, let’s pretend my password is “iLoveBitcoin”
I can use a hash function to scramble it:
iLoveBitcoin → “2f5sfsdfs5s1fsfsdf98ss4f84sfs6d5fs2d1fdf15”
Now, if anyone were to see the scrambled version, they wouldn’t know my original password! This is important because it means that as a website developer, I only need to store the hash (scrambled data) of my user’s password to be able to verify them. When the user signs up, I hash the password and store it in my database. When the user logs in, I just hash what they typed in and compare the two hashes. Because a given input always produces the same hash, this works every time.
[It is a huge breach of security when websites store passwords in plain-text. If someone hacks one such site, they will find all the emails and passwords and can try those combinations on other websites. ](https://twitter.com/intent/tweet?url=https%3A%2F%2Fwp.me%2Fpb7Qei-9C&text=It is a huge breach of security when websites store passwords in plain-text. If someone hacks one such site%2C they will find all the emails and passwords and can try those combinations on other websites.&via=wagslane&related=wagslane) ////plain-text 纯文本
No Matter the Input, the Output is the Same Size
If I hash a single word the output will be a certain size (in the case of SHA-256, a particular hashing function, the size is 256 bits). If I hash a book, the output will be the same size.
This is another important feature because it can save us computing time. A classic example is using a hash as a key in a data map. A data map is a simple structure used in computer science to store data.///用转换后的哈希值作为数据的标签
When a program stores data in a map, a key and value are given to the map. When a program wants to access the value, it can give the appropriate key to the map and receive the corresponding value. Data maps are good because they can find data instantly. The key is used as an address that the computer can find immediately, instead of taking hours searching through millions of records.
Because keys are like addresses, they can’t be too large. If I want to store books in a data map I can hash the contents of the book and use the hash as a key. As a programmer, I can simply use the hash to look up the contents of the book instead of trying to sort through thousands of records by title, author, etc.
How Do They Work?
Here is the real challenge of writing this article. I’m going to keep it extremely simple and omit the actual implementation details while giving you a basic idea of what the computer actually does when it hashes some data.
Let’s walk through an example algorithm I’m making up on the fly for this demonstration, LANEHASH:
- We start with some data to hash
iLoveBitcoin
- I convert the letters and numbers into 1’s and 0’s (All data in computers are stored in 1’s and 0’s, different patterns of 1’s and 0’s represent different letters)
iLoveBitcoin→ 100010100000101111
- At this point we go through various predetermined steps to transform our data. The steps can be anything, the important thing is that whenever we use LANEHASH we need to use the same steps so that our algorithm is deterministic.
- We move the first four bits from the left side to the right
100010100000101111 → 101000001011111000
- We separate every other bit
101000001011111000 → 110011110 & 000001100*
- We convert those two parts into base 10 numbers. Base 10 is the “normal” number system that we all learned in school. (all binary data really just number, you can look up how it converts binary to base 10 easily online elsewhere)
110011110 → 414
000001100→ 12
- We multiply the two numbers together
414 *12 = 4968
- We square that number
4968 ^ 2 = 24681024
- We convert that number back to binary
24681024 →1011110001001101001000000
- We chop 9 bits off the right side to get exactly 16 bits
1011110001001101001000000 → 1011110001001101
- We convert that binary data back to English
1011110001001101 → “8sj209dsns02k2”
As you can see, if you start with the same word at the beginning, you will always get the same output at the end. However, if you even change one letter, the outcome will be drastically changed.