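Python task scheduling module
APScheduler is a task scheduling framework for Python that is very convenient to use. It supports jobs triggered at a specific date and time, at fixed intervals, and on crontab-style schedules. Jobs can be persisted, and the application can be run as a daemon.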
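APScheduler has four kinds of components:
- Triggers
Triggers contain the scheduling logic. Each job has its own trigger, which determines when the job should run next. Beyond their initial configuration, triggers are completely stateless.
- Job stores
Job stores hold the scheduled jobs. The default job store simply keeps jobs in memory; others persist them to a database. A job's data is serialized when it is saved to a persistent job store and deserialized when it is loaded. Job stores must not be shared between schedulers.
- Executors
Executors handle the running of jobs, typically by submitting the job's designated callable to a thread or process pool. When the job finishes, the executor notifies the scheduler.
- Schedulers
Schedulers bind the other components together. You usually have only one scheduler running in an application. Application developers do not normally deal with job stores, executors, or triggers directly; the scheduler provides the interface for handling them. Configuring job stores and executors is done through the scheduler, as is adding, modifying, and removing jobs.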
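The two most commonly used schedulers
- BlockingScheduler
Use when the scheduler is the only thing running in your process.
- BackgroundScheduler
Use when you are not running any other framework and want the scheduler to run in the background inside your application.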
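Configuring the scheduler
APScheduler provides several ways to configure the scheduler. You can pass a configuration dictionary, or pass the options as keyword arguments. You can also create the scheduler first and then configure it and add jobs afterwards, which gives you more flexibility across different environments.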
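- Example
The following is a simple example that uses BlockingScheduler with the default job store and the default executor (MemoryJobStore and ThreadPoolExecutor respectively, with a maximum of 10 threads in the pool). Once configured, call start() to start the scheduler.
```
from apscheduler.schedulers.blocking import BlockingScheduler

def my_job():
    print('hello world')

sched = BlockingScheduler()
sched.add_job(my_job, 'interval', seconds=5)
sched.start()
```
Five seconds after the program starts, the first hello world is printed.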
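The next example uses a more elaborate configuration with two job stores and two executors. In this configuration the job uses the mongo job store, so its data is written to MongoDB.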
```
from pymongo import MongoClient
from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.jobstores.mongodb import MongoDBJobStore
from apscheduler.jobstores.memory import MemoryJobStore
from apscheduler.executors.pool import ThreadPoolExecutor, ProcessPoolExecutor


def my_job():
    print('hello world')


host = '127.0.0.1'
port = 27017
client = MongoClient(host, port)

# Two job stores: a persistent MongoDB store and the in-memory default.
jobstores = {
    'mongo': MongoDBJobStore(collection='job', database='test', client=client),
    'default': MemoryJobStore()
}
# Two executors: a thread pool (the default) and a process pool.
executors = {
    'default': ThreadPoolExecutor(10),
    'processpool': ProcessPoolExecutor(3)
}
# Defaults applied to every job that does not override them.
job_defaults = {
    'coalesce': False,
    'max_instances': 3
}

scheduler = BlockingScheduler(jobstores=jobstores, executors=executors, job_defaults=job_defaults)
# jobstore='mongo' sends the job to MongoDB; without it the job would go to 'default'.
scheduler.add_job(my_job, 'interval', seconds=5, jobstore='mongo')

try:
    scheduler.start()
except (KeyboardInterrupt, SystemExit):
    # start() blocks until the process is interrupted; release the connection on exit.
    client.close()
```
Querying MongoDB shows the stored job:
{ "_id" : "55ca54ee4bb744f8a5ab08cc4319bc24", "next_run_time" : 1434017278.797, "job_state" : new BinData(0, "gAJ9cQEoVQRhcmdzcQIpVQhleGVjdXRvcnEDVQdkZWZhdWx0cQRVDW1heF9pbnN0YW5jZXNxBUsDVQRmdW5jcQZVD19fbWFpbl9fOm15X2pvYnEHVQJpZHEIVSA1NWNhNTRlZTRiYjc0NGY4YTVhYjA4Y2M0MzE5YmMyNHEJVQ1uZXh0X3J1bl90aW1lcQpjZGF0ZXRpbWUKZGF0ZXRpbWUKcQtVCgffBgsSBzoMKUhjcHl0egpfcApxDChVDUFzaWEvU2hhbmdoYWlxDU2AcEsAVQNDU1RxDnRScQ+GUnEQVQRuYW1lcRFVBm15X2pvYnESVRJtaXNmaXJlX2dyYWNlX3RpbWVxE0sBVQd0cmlnZ2VycRRjYXBzY2hlZHVsZXIudHJpZ2dlcnMuaW50ZXJ2YWwKSW50ZXJ2YWxUcmlnZ2VyCnEVKYFxFn1xF1UPaW50ZXJ2YWxfbGVuZ3RocRhHQBQAAAAAAABzfXEZKFUIdGltZXpvbmVxGmgMKGgNTehxSwBVA0xNVHEbdFJxHFUIaW50ZXJ2YWxxHWNkYXRldGltZQp0aW1lZGVsdGEKcR5LAEsFSwCHUnEfVQpzdGFydF9kYXRlcSBoC1UKB98GCxIHIQwpSGgPhlJxIVUIZW5kX2RhdGVxIk51hmJVCGNvYWxlc2NlcSOJVQd2ZXJzaW9ucSRLAVUGa3dhcmdzcSV9cSZ1Lg==") }
Working with jobs
- Adding jobs
The examples above add jobs with add_job(). The other way is to decorate a function with the scheduled_job() decorator.
```
@sched.scheduled_job('cron', id='my_job_id', day='last sun')
def some_decorated_task():
    print("I am printed at 00:00:00 on the last Sunday of every month!")
```
- Removing jobs
Call remove() on the Job instance returned by add_job():
```
job = scheduler.add_job(myfunc, 'interval', minutes=2)
job.remove()
```
The same, using an explicit job ID:
```
scheduler.add_job(myfunc, 'interval', minutes=2, id='my_job_id')
scheduler.remove_job('my_job_id')
```
- Pausing and resuming jobs
  - To pause a job:
    - `apscheduler.job.Job.pause()`
    - `apscheduler.schedulers.base.BaseScheduler.pause_job()`
  - To resume a job:
    - `apscheduler.job.Job.resume()`
    - `apscheduler.schedulers.base.BaseScheduler.resume_job()`
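- Getting the list of jobs
To get the list of scheduled jobs, call get_jobs(), which returns all Job instances. Alternatively, print_jobs() prints a formatted list of all jobs.
- Modifying jobs
A job's attributes can be changed with apscheduler.job.Job.modify() or apscheduler.schedulers.base.BaseScheduler.modify_job().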
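- Shutting down the scheduler
By default the scheduler shuts down its job stores and executors and waits until all currently running jobs have finished. If you do not want to wait, set the wait option to False.
```
scheduler.shutdown()
scheduler.shutdown(wait=False)
```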
Controlling when jobs run
The second argument of add_job() is the trigger, which controls how the job is scheduled. It can be date, interval, or cron.
- cron scheduling
Parameters:
  - year (int|str): 4-digit year
  - month (int|str): month (1-12)
  - day (int|str): day of the month (1-31)
  - week (int|str): ISO week (1-53)
  - day_of_week (int|str): number or name of weekday (0-6 or mon, tue, wed, thu, fri, sat, sun)
  - hour (int|str): hour (0-23)
  - minute (int|str): minute (0-59)
  - second (int|str): second (0-59)
  - start_date (datetime|str): earliest possible date/time to trigger on (inclusive)
  - end_date (datetime|str): latest possible date/time to trigger on (inclusive)
  - timezone (datetime.tzinfo|str): time zone to use for the date/time calculations (defaults to the scheduler's time zone)

Similar to Linux crontab, each field accepts the following expressions:

| Expression | Field | Description |
| --- | --- | --- |
| `*` | any | Fire on every value |
| `*/a` | any | Fire every a values, starting from the minimum |
| `a-b` | any | Fire on any value within the a-b range (a must be smaller than b) |
| `a-b/c` | any | Fire every c values within the a-b range |
| `xth y` | day | Fire on the x-th occurrence of weekday y within the month |
| `last x` | day | Fire on the last occurrence of weekday x within the month |
| `last` | day | Fire on the last day within the month |
| `x,y,z` | any | Fire on any matching expression; can combine any number of any of the above expressions |

```
# Schedules job_function to be run on the third Friday
# of June, July, August, November and December at 00:00, 01:00, 02:00 and 03:00
sched.add_job(job_function, 'cron', month='6-8,11-12', day='3rd fri', hour='0-3')

# Runs from Monday to Friday at 5:30 (am) until 2014-05-30 00:00:00
sched.add_job(job_function, 'cron', day_of_week='mon-fri', hour=5, minute=30, end_date='2014-05-30')
```
- interval scheduling
Parameters:
  - weeks (int): number of weeks to wait
  - days (int): number of days to wait
  - hours (int): number of hours to wait
  - minutes (int): number of minutes to wait
  - seconds (int): number of seconds to wait
  - start_date (datetime|str): starting point for the interval calculation
  - end_date (datetime|str): latest possible date/time to trigger on
  - timezone (datetime.tzinfo|str): time zone to use for the date/time calculations

```
# Schedule job_function to be called every two hours
sched.add_job(job_function, 'interval', hours=2)
```
- date scheduling
This is the most basic form of scheduling: the job runs exactly once. Parameters:
  - run_date (datetime|str): the date/time to run the job at
  - timezone (datetime.tzinfo|str): time zone for run_date if it doesn't have one already

```
from datetime import date, datetime

# The job will be executed on November 6th, 2009
sched.add_job(my_job, 'date', run_date=date(2009, 11, 6), args=['text'])

# The job will be executed on November 6th, 2009 at 16:30:05
sched.add_job(my_job, 'date', run_date=datetime(2009, 11, 6, 16, 30, 5), args=['text'])
```