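I. Overview
Superset uses Flask-Cache for caching. Flask-Cache supports several cache backends: redis, memcached, simplecache (in-memory), and the local filesystem. If you plan to use memcached, you need a memcached server as the backend; if you plan to use redis, you need to install the Python redis client (python-redis). Redis is the recommended cache backend.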
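As a rough illustration of the backends listed above, the sketch below shows what a CACHE_CONFIG dictionary might look like for each one. The key names are standard Flask-Cache settings; the directory path and server addresses are placeholder assumptions, not values from this setup.

```python
# Candidate CACHE_CONFIG values for superset_config.py, one per backend.
# Paths and addresses below are placeholders (assumptions); adjust as needed.

SIMPLE_CACHE = {
    'CACHE_TYPE': 'simple',                # in-process memory, lost on restart
}

FILESYSTEM_CACHE = {
    'CACHE_TYPE': 'filesystem',
    'CACHE_DIR': '/tmp/superset_cache',    # hypothetical directory
}

MEMCACHED_CACHE = {
    'CACHE_TYPE': 'memcached',
    'CACHE_MEMCACHED_SERVERS': ['127.0.0.1:11211'],   # hypothetical server
}

REDIS_CACHE = {
    'CACHE_TYPE': 'redis',
    'CACHE_REDIS_URL': 'redis://127.0.0.1:6379/0',    # hypothetical server
}

# Pick exactly one of the dictionaries above.
CACHE_CONFIG = REDIS_CACHE
```

Only the dictionary assigned to CACHE_CONFIG takes effect; the others are shown for comparison.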
II. Installing Redis
1. Download the source from the official site: https://redis.io/download
After downloading, extract it into the /data directory:
#tar xf redis-4.0.2.tar.gz -C /data
#cd /data/redis-4.0.2
#make install -j 4
#mkdir {bin,data}
#find src/ -type f -perm -111 | xargs -I {} cp {} bin/
#vim /etc/profile
export PATH=$PATH:/data/redis-4.0.2/bin
#source /etc/profile
#cat redis.conf
bind 10.10.2.34
protected-mode no
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize yes
supervised no
pidfile /var/run/redis_6379.pid
loglevel notice
logfile "redis.log"
databases 16
always-show-logo yes
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /data/redis-4.0.2/data
slave-serve-stale-data yes
slave-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
slave-priority 100
# LRU (Least Recently Used) is a commonly used cache eviction policy
maxmemory 3g
maxmemory-policy allkeys-lru
maxmemory-samples 5
lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
slave-lazy-flush no
appendonly no
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble no
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes
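To make the maxmemory-policy allkeys-lru setting in the configuration above more concrete, here is a minimal sketch of least-recently-used eviction. It is a toy, exact LRU bounded by entry count; Redis itself bounds by memory (maxmemory) and approximates LRU by sampling a few keys (maxmemory-samples), so this illustrates the idea only, not Redis's implementation. The LRUCache class is hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Toy exact-LRU cache bounded by entry count (Redis bounds by memory)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)          # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # evict least recently used

    def keys(self):
        return list(self._items)


cache = LRUCache(capacity=2)
cache.set('a', 1)
cache.set('b', 2)
cache.get('a')        # 'a' is now the most recently used key
cache.set('c', 3)     # over capacity: 'b' is evicted
print(cache.keys())   # ['a', 'c']
```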
2. Start Redis
#redis-server /data/redis-4.0.2/redis.conf
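Once the server is started, a quick reachability check can be useful. The sketch below sends an inline PING over a plain socket and looks for the +PONG reply. It assumes the server requires no password, as in the configuration above; redis_ping is a hypothetical helper name, not part of any Redis client library.

```python
import socket


def redis_ping(host, port=6379, timeout=2.0):
    """Return True if the server answers an inline PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")      # Redis accepts inline commands
            reply = sock.recv(64)
    except OSError:
        # Connection refused, timeout, unresolved host, and so on.
        return False
    return reply.startswith(b"+PONG")
```

For the server in this article the call would be redis_ping('10.10.2.34'), which should return True when Redis is listening on port 6379.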
III. Configuring Redis caching for Superset
1. Superset
Add the following to superset_config.py:
CACHE_DEFAULT_TIMEOUT = 60*60*6
CACHE_CONFIG = {
    'CACHE_TYPE': 'redis',
    'CACHE_REDIS_HOST': 'spark-worker',
    'CACHE_REDIS_PORT': '6379',
    'CACHE_REDIS_URL': 'redis://spark-worker:6379'
}
Edit the hosts file (/etc/hosts) and add the entry: 10.10.2.34 spark-worker
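To confirm the hosts entry reads the way you expect, a small parser like the one below can help. It is only a sketch: lookup_host is a hypothetical helper, and the sample text stands in for the real /etc/hosts.

```python
def lookup_host(hosts_text, name):
    """Return the first IP mapped to `name` in hosts-file text, or None."""
    for line in hosts_text.splitlines():
        line = line.split('#', 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        if name in names:
            return ip
    return None


sample = """
127.0.0.1   localhost
10.10.2.34  spark-worker   # Redis server from this article
"""
print(lookup_host(sample, 'spark-worker'))   # 10.10.2.34
```

On the Superset host you could pass open('/etc/hosts').read() instead of the sample string.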
After making these changes, restart Superset for them to take effect. Open a dashboard in the web UI, then check Redis for cached keys:
#redis-cli -h spark-worker -p 6379
>KEYS *
1) "flask_cache_cd696b3707087317077fe46bd306804a"
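The key above is a flask_cache_ prefix followed by 32 hexadecimal characters, which is the length of an MD5 digest. The sketch below shows one way such a key could be derived from request parameters, along with what the CACHE_DEFAULT_TIMEOUT value works out to. This is an illustration under assumptions, not Superset's actual key derivation; make_cache_key and the parameter names are hypothetical.

```python
import hashlib
import json

CACHE_DEFAULT_TIMEOUT = 60 * 60 * 6   # 21600 seconds, i.e. 6 hours
KEY_PREFIX = 'flask_cache_'           # prefix seen in the KEYS output above


def make_cache_key(params):
    """Hypothetical helper: prefix + MD5 of the canonical JSON of `params`."""
    canonical = json.dumps(params, sort_keys=True)
    digest = hashlib.md5(canonical.encode('utf-8')).hexdigest()
    return KEY_PREFIX + digest


key = make_cache_key({'slice_id': 1, 'since': '7 days ago'})
print(len(key) - len(KEY_PREFIX))      # 32 hex characters, like the key above
print(CACHE_DEFAULT_TIMEOUT // 3600)   # 6
```

Sorting the JSON keys means the same parameters always map to the same cache key regardless of dictionary order.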
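2. SQL Lab Celery setup
SQL Lab is a powerful database query tool integrated into Superset. It supports every SQLAlchemy-compatible database and, by default, runs queries within the web request. With large data volumes, however, long-running queries can exceed the web request timeout and fail. It is therefore worth configuring an asynchronous backend for Superset.
Superset's asynchronous backend consists of:
- one or more Superset workers (Celery workers);
- a Celery broker (message queue), with Redis or RabbitMQ recommended;
- a results backend that stores query results.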
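The three components above can be sketched with the standard library alone. In this toy version a queue.Queue stands in for the broker, a thread for the worker, and a dictionary for the results backend. It is not how Celery is implemented; it only shows why the web request can return immediately. All names here are hypothetical.

```python
import queue
import threading
import uuid

broker = queue.Queue()     # stands in for the Celery broker (e.g. Redis)
results_backend = {}       # stands in for the results backend


def submit_query(sql):
    """Web side: enqueue the query and return immediately with a task id."""
    task_id = str(uuid.uuid4())
    broker.put((task_id, sql))
    return task_id


def worker():
    """Worker side: take tasks off the queue and store their results."""
    while True:
        task_id, sql = broker.get()
        # A real worker would run the SQL here; this sketch fakes a result.
        results_backend[task_id] = 'rows for: ' + sql
        broker.task_done()


threading.Thread(target=worker, daemon=True).start()

task_id = submit_query('SELECT 1')   # returns at once, no request timeout
broker.join()                        # demo only: wait for the worker
print(results_backend[task_id])      # rows for: SELECT 1
```

In the real setup the web UI polls for the result by task id rather than blocking as broker.join() does here.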
Edit superset_config.py and add the following:
class CeleryConfig(object):
    BROKER_URL = 'redis://spark-worker:6379/0'
    CELERY_IMPORTS = ('superset.sql_lab',)
    CELERY_RESULT_BACKEND = 'redis://spark-worker:6379/0'
    # CELERY_ANNOTATIONS = {'tasks.add':{'rate_limit':'10/s'}}
CELERY_CONFIG = CeleryConfig
from werkzeug.contrib.cache import RedisCache
RESULTS_BACKEND = RedisCache(
    host='spark-worker', port=6379, key_prefix='superset_results')
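The broker and result backend URLs above end in /0, which selects Redis database 0. As a small sketch of how such a URL breaks down, the hypothetical helper below splits it into host, port, and database number using only the standard library.

```python
from urllib.parse import urlparse


def parse_redis_url(url):
    """Split a redis:// URL into host, port and database number."""
    parts = urlparse(url)
    if parts.scheme != 'redis':
        raise ValueError('not a redis:// URL: ' + url)
    db = parts.path.lstrip('/')
    return {
        'host': parts.hostname,
        'port': parts.port or 6379,    # Redis default port
        'db': int(db) if db else 0,    # database 0 when no path is given
    }


print(parse_redis_url('redis://spark-worker:6379/0'))
# {'host': 'spark-worker', 'port': 6379, 'db': 0}
```

Since the cache URL in this article has no database suffix, it most likely lands in database 0 as well, so cache entries and Celery keys would share one keyspace.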
Restart Superset (I installed Superset with Docker; see my other article on installing Superset):
#docker restart superset
#docker exec -it superset /bin/bash
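Once inside the container, start the Superset worker:
#nohup superset worker >>/dev/null 2>&1 &
In the web UI, edit the database's settings and check Allow Run Async.
Then run a SQL query in SQL Lab and check Redis again to see whether the results were cached: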
>KEYS *
1) "celery-task-meta-a006a611-4da6-41fe-a825-0da3e7b31060"
2) "celery-task-meta-e9b90a39-15a5-4d51-bf2b-3c461eede835"
3) "_kombu.binding.celery.pidbox"
4) "celery-task-meta-d02fd720-340d-4863-ab7c-e518984a494a"
5) "celery-task-meta-b65e3dee-96fe-43c7-b4c4-cd9035f14c2f"
6) "celery-task-meta-ffe2e759-5bae-46b0-912d-9b1db0f36086"
7) "_kombu.binding.celery"
8) "celery-task-meta-5480d794-3b28-444f-8f32-54d6fbe92fc9"
9) "flask_cache_cd696b3707087317077fe46bd306804a"
10) "unacked_mutex"
11) "celery-task-meta-140a3eaf-a121-4a39-8fc5-73d7a9e198c2"
12) "_kombu.binding.celeryev"
13) "celery-task-meta-e9dd5359-17c9-40a3-b3a2-f2dc977fd34c"
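The listing above mixes keys written by several components. The sketch below groups key names by prefix to show where each probably comes from. The prefix-to-origin mapping is my reading of the key names and of the key_prefix configured earlier, not something reported by Redis itself; classify is a hypothetical helper.

```python
from collections import Counter

# Prefix -> likely origin (an interpretation of the key names, see above).
KEY_FAMILIES = [
    ('celery-task-meta-', 'Celery task result'),
    ('_kombu.binding.', 'Kombu (Celery broker) queue binding'),
    ('flask_cache_', 'Flask-Cache entry'),
    ('superset_results', 'SQL Lab results backend'),
]


def classify(key):
    """Return the family whose prefix matches `key`, or 'other'."""
    for prefix, family in KEY_FAMILIES:
        if key.startswith(prefix):
            return family
    return 'other'


keys = [
    'celery-task-meta-a006a611-4da6-41fe-a825-0da3e7b31060',
    '_kombu.binding.celery.pidbox',
    '_kombu.binding.celery',
    'flask_cache_cd696b3707087317077fe46bd306804a',
    'unacked_mutex',
]
print(Counter(classify(k) for k in keys))
```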
The complete configuration file:
#cat superset_config.py
#---------------------------------------------------------
# Superset specific config
#---------------------------------------------------------
ROW_LIMIT = 5000
SUPERSET_WORKERS = 4
SUPERSET_WEBSERVER_TIMEOUT = 3000
SUPERSET_WEBSERVER_PORT = 8088
#---------------------------------------------------------
#---------------------------------------------------------
# Flask App Builder configuration
#---------------------------------------------------------
# Your App secret key
SECRET_KEY = '21thisismyscretkey12eyyh'
# The SQLAlchemy connection string to your database backend
# This connection defines the path to the database that stores your
# superset metadata (slices, connections, tables, dashboards, ...).
# Note that the connection information to connect to the datasources
# you want to explore are managed directly in the web UI
#SQLALCHEMY_DATABASE_URI = 'sqlite:////data/superset.db'
SQLALCHEMY_DATABASE_URI = 'sqlite:////home/superset/superset.db'
# Flask-WTF flag for CSRF
WTF_CSRF_ENABLED = True
# Add endpoints that need to be exempt from CSRF protection
WTF_CSRF_EXEMPT_LIST = []
# Set this API key to enable Mapbox visualizations
MAPBOX_API_KEY = ''
CACHE_DEFAULT_TIMEOUT = 60*60*6
CACHE_CONFIG = {
    'CACHE_TYPE': 'redis',
    'CACHE_REDIS_HOST': 'spark-worker',
    'CACHE_REDIS_PORT': '6379',
    'CACHE_REDIS_URL': 'redis://spark-worker:6379'
}
class CeleryConfig(object):
    BROKER_URL = 'redis://spark-worker:6379/0'
    CELERY_IMPORTS = ('superset.sql_lab',)
    CELERY_RESULT_BACKEND = 'redis://spark-worker:6379/0'
    # CELERY_ANNOTATIONS = {'tasks.add':{'rate_limit':'10/s'}}
CELERY_CONFIG = CeleryConfig
from werkzeug.contrib.cache import RedisCache
RESULTS_BACKEND = RedisCache(
    host='spark-worker', port=6379, key_prefix='superset_results')