Encode and Decode Strings
要点:题的特点:不是压缩,而是encode为字节流。所以需要找delimiter来分割每个word,但是delimiter可能是字符本身,所以可以用数字记录每个word的长度。这样一般比delimiter简单并且节省空间。接着可能会想到如果前后单词有数字怎么办?所以需要一个特殊字符来分割数字和词本身,而这个字符要在数字之后:不用担心前面单词以数字结尾,前一个单词通过长度信息已经成功decode。
- ''.join('%d:' % len(s)+s for s in strs)意思:(‘%d:’% len(s) + s) 这个对strs做list comprehension产生新list,所以每个都有len(s):在开头,new style是’’.join(‘{}:{}’.format(len(s), s) for s in strs
错误点:
- 要用find(‘:’, i),也就是有boundary的方法来得到下一个:的位置
- 这题证明list.append => join 比str+=快不少
- 实际上,多string/int混合最好用formatter
# Design an algorithm to encode a list of strings to a string. The encoded string is then sent over the network and is decoded back to the original list of strings.
# Machine 1 (sender) has the function:
# string encode(vector<string> strs) {
# // ... your code
# return encoded_string;
# }
# Machine 2 (receiver) has the function:
# vector<string> decode(string s) {
# //... your code
# return strs;
# }
# So Machine 1 does:
# string encoded_string = encode(strs);
# and Machine 2 does:
# vector<string> strs2 = decode(encoded_string);
# strs2 in Machine 2 should be the same as strs in Machine 1.
# Implement the encode and decode methods.
# Note:
# The string may contain any possible characters out of 256 valid ascii characters. Your algorithm should be generalized enough to work on any possible characters.
# Do not use class member/global/static variables to store states. Your encode and decode algorithms should be stateless.
# Do not rely on any library method such as eval or serialize methods. You should implement your own encode/decode algorithm.
# Hide Company Tags Google
# Hide Tags String
# Hide Similar Problems (E) Count and Say (H) Serialize and Deserialize Binary Tree
class Codec:
def encode(self, strs):
"""Encodes a list of strings to a single string.
:type strs: List[str]
:rtype: str
"""
return ''.join("%d:" % len(s) + s for s in strs)
def decode(self, s):
"""Decodes a single string to a list of strings.
:type s: str
:rtype: List[str]
"""
i = 0
ret = []
while i<len(s):
j = s.find(':', i) # error 1: should have i as find's left boundary, inclusive
i = j+int(s[i:j])+1
ret.append(s[j+1:i])
return ret
sol = Codec()
assert sol.decode(sol.encode(['3e5d', 'd:4rf']))==['3e5d', 'd:4rf']