• Installing IPython from source and integrating Spark with IPython


    1. Installing IPython

    Download IPython from https://pypi.python.org/packages/source/i/ipython/ipython-2.2.0.tar.gz#md5=b91d3724f655a8e16d022772f696cfd5, then unpack and install it:

    cd /app/softwares/ipython
    tar -zxvf ipython-2.2.0.tar.gz
    cd ipython-2.2.0
    python2.7 setup.py install
    ln -s /usr/local/python2.7/bin/ipython /usr/bin/ipython
    

    2. Configuring the IPython notebook

    ipython profile create nbserver
    cd ~/.ipython/profile_nbserver/
    
    openssl req -x509 -nodes -days 365 -newkey rsa:1024 -keyout mycert.pem -out mycert.pem
    

    Fill in the certificate details at the prompts:

    Country Name (2 letter code) [XX]:CN
    State or Province Name (full name) []:Guangdong
    Locality Name (eg, city) [Default City]:Shenzhen
    Organization Name (eg, company) [Default Company Ltd]:*
    Organizational Unit Name (eg, section) []:ShuJuPingTaiBu
    Common Name (eg, your name or your server's hostname) []:*
    Email Address []:*
    

    Generate a hashed password:

    python2.7 -c "import IPython;print IPython.lib.passwd()"
    
    Enter password:
    Verify password:
    sha1:5ba5d1a5aa4f:6edaa277f374497b1d026b799b473b3ef7f8c636
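The string printed above has the form algorithm:salt:digest. The following is a minimal sketch of how such a string could be built and checked, assuming the digest is the hash of the password bytes followed by the salt bytes (this reflects my understanding of IPython.lib.passwd and should be treated as an assumption). The names make_hash and check_password are illustrative and are not part of IPython.

```python
import binascii
import hashlib
import os


def make_hash(password, salt=None, algorithm="sha1"):
    """Build an 'algorithm:salt:digest' string (illustrative helper)."""
    if salt is None:
        # 6 random bytes give 12 hex characters, like the salt shown above
        salt = binascii.hexlify(os.urandom(6)).decode("ascii")
    h = hashlib.new(algorithm)
    # Assumption: digest = hash(password bytes + salt bytes)
    h.update(password.encode("utf-8") + salt.encode("ascii"))
    return ":".join((algorithm, salt, h.hexdigest()))


def check_password(hashed, password):
    """Return True if password matches an 'algorithm:salt:digest' string."""
    try:
        algorithm, salt, _digest = hashed.split(":", 2)
    except ValueError:
        # Not in the expected three-part format
        return False
    return make_hash(password, salt, algorithm) == hashed
```

Because the salt is stored inside the string, the same password produces a different string on every run, and only the full string needs to be pasted into the configuration file.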

    Edit the notebook configuration file in the profile directory (~/.ipython/profile_nbserver/) and add the settings below:

    vi ipython_notebook_config.py

    # This starts plotting support always with matplotlib
    c.IPKernelApp.pylab = 'inline'
    
    # You must give the path to the certificate file.
    
    # If using a Linux VM:
    c.NotebookApp.certfile = u'/root/.ipython/profile_nbserver/mycert.pem'
    
    # Create your own password as indicated above
    c.NotebookApp.password = u'sha1:5ba5d1a5aa4f:6edaa277f374497b1d026b799b473b3ef7f8c636'
    
    # Network and browser details. We use a fixed port (9999) so it matches
    # our Windows Azure setup, where we've allowed traffic on that port
    
    c.NotebookApp.ip = '*'
    c.NotebookApp.port = 9999
    c.NotebookApp.open_browser = False
    

    Start the IPython notebook server:

     ipython notebook --profile=nbserver
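Once the server has been started, it can be useful to confirm that something is actually listening on the configured port before opening a browser. A minimal sketch using only the standard library follows; port_is_open is an illustrative name, and 9999 is the port from the configuration above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect_ex returns 0 on success instead of raising on refusal
        return sock.connect_ex((host, port)) == 0
    except socket.error:
        # e.g. the host name could not be resolved
        return False
    finally:
        sock.close()
```

For example, port_is_open("127.0.0.1", 9999) run on the server itself should return True while the notebook server is up. This only checks the TCP port; it does not validate the certificate or the password.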
    

    pyzmq requires ZeroMQ, so install ZeroMQ first.

    Download ZeroMQ from http://download.zeromq.org/zeromq-4.0.4.tar.gz, then build and install it:

    ./configure
    make && make install
    

    Download pyzmq from https://pypi.python.org/packages/source/p/pyzmq/pyzmq-14.3.1.tar.gz#md5=7196b4a6fbf98022f17ffa924be3d68d and install it:

    ln -s /usr/local/lib/libzmq.so.3 /usr/local/include/
    python2.7 setup.py install --zmq=/usr/local/

    Install Jinja2, which requires distribute.

    Download Jinja2: https://pypi.python.org/packages/source/J/Jinja2/Jinja2-2.7.3.tar.gz

    python2.7 setup.py install
    

    Download distribute: https://pypi.python.org/packages/source/d/distribute/distribute-0.7.3.zip#md5=c6c59594a7b180af57af8a0cc0cf5b4a

    python2.7 setup.py install
    

    Install MarkupSafe: https://pypi.python.org/packages/source/M/MarkupSafe/MarkupSafe-0.23.tar.gz

    python2.7 setup.py install
    

    Install tornado, which requires backports.ssl_match_hostname and certifi:

    https://pypi.python.org/packages/source/t/tornado/tornado-4.0.2.tar.gz
    https://pypi.python.org/packages/source/b/backports.ssl_match_hostname/backports.ssl_match_hostname-3.4.0.2.tar.gz
    https://pypi.python.org/packages/source/c/certifi/certifi-14.05.14.tar.gz


    Install sqlite3

    Reference: http://blog.csdn.net/gl1987807/article/details/7253021
    Install sqlite-devel.x86_64:

    yum install sqlite-devel.x86_64

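    If Python still reports that the sqlite3 module does not exist after sqlite-devel is installed, see http://stackoverflow.com/questions/1210664/no-module-named-sqlite3 for the fix:
    recompile Python 2.7.5, then copy the rebuilt _sqlite3.so into the installed Python: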

    cp /app/softwares/python/Python-2.7.5/build/lib.linux-x86_64-2.7/_sqlite3.so /usr/local/python2.7/lib/python2.7/sqlite3/
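After the rebuilt module has been copied, a quick check can confirm that sqlite3 now imports and works. The sketch below uses an in-memory database so nothing is written to disk; sqlite3_works is an illustrative name. If the module is still missing, the import line itself raises ImportError.

```python
import sqlite3


def sqlite3_works():
    """Run a trivial round trip against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (42)")
        row = conn.execute("SELECT x FROM t").fetchone()
        return row == (42,)
    finally:
        conn.close()
```

Running this with python2.7 (the interpreter used throughout this post) checks the same Python installation that IPython is installed into.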
    

    Install MathJax: https://github.com/mathjax/MathJax/archive/2.4.0.tar.gz

    cd /app/softwares/ipython
    python2.7 -m IPython.external.mathjax MathJax-2.4.0.tar.gz
    

    Test the IPython notebook, following the examples at http://www.cnblogs.com/cbscan/p/3545084.html:

    from IPython.display import Latex
    Latex(r"$\sqrt{x^2+y^2}$")
    
    Out[1]:
    $\sqrt{x^2+y^2}$
    
    %load_ext sympyprinting
    from sympy import *
    x, y = symbols("x,y")
    sqrt(x**2+y**2)
    
    ImportError: No module named sympy 

    Download and install sympy: https://pypi.python.org/packages/source/s/sympy/sympy-0.7.5.tar.gz

    from sympy import init_printing
    init_printing()
    from sympy import *
    x, y = symbols("x,y")
    sqrt(x**2+y**2)
    
    Out[7]:
    $$\sqrt{x^{2} + y^{2}}$$
    
    %pylab inline
    
    plot(random.randn(100));
    
    ImportError: No module named matplotlib
    

    Download and install matplotlib: https://pypi.python.org/packages/source/m/matplotlib/matplotlib-1.4.0.tar.gz#md5=1daf7f2123d94745feac1a30b210940c

    Install a newer version of freetype: http://download.savannah.gnu.org/releases/freetype/freetype-2.5.3.tar.gz

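    Install a newer version of numpy, followed by the other packages listed below (mock, nose, pyparsing, python-dateutil, six):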
    https://pypi.python.org/packages/source/n/numpy/numpy-1.9.0.tar.gz#md5=510cee1c6a131e0a9eb759aa2cc62609

    https://pypi.python.org/packages/source/m/mock/mock-1.0.1.tar.gz#md5=c3971991738caa55ec7c356bbc154ee2

    https://pypi.python.org/packages/source/n/nose/nose-1.3.4.tar.gz#md5=6ed7169887580ddc9a8e16048d38274d

    https://pypi.python.org/packages/source/p/pyparsing/pyparsing-2.0.2.tar.gz#md5=b170c5d153d190df1a536988d88e95c1

    https://pypi.python.org/packages/source/p/python-dateutil/python-dateutil-2.2.tar.gz#md5=c1f654d0ff7e33999380a8ba9783fd5c

    https://pypi.python.org/packages/source/s/six/six-1.8.0.tar.gz#md5=1626eb24cc889110c38f7e786ec69885
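With this many packages installed by hand, it is easy to miss one. The following sketch lists which modules still fail to import. The REQUIRED list is my reading of the packages named in this section, written with their import names (which can differ from the package names, e.g. pyzmq is imported as zmq); adjust it as needed. missing_modules is an illustrative name.

```python
import importlib

# Import names for the packages installed in this section (an assumption
# based on the download links above; edit to match your installation).
REQUIRED = ["zmq", "jinja2", "markupsafe", "tornado", "certifi", "sqlite3",
            "sympy", "numpy", "matplotlib", "pyparsing", "dateutil", "six"]


def missing_modules(names):
    """Return the names from the list that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling missing_modules(REQUIRED) under python2.7 returns an empty list when everything is in place; any name it returns points at a package that still needs to be installed or rebuilt.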


    3. Integrating Spark into the IPython notebook

    Add the following to /etc/profile:

    export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
    export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH
    export PYSPARK_PYTHON=python2.7
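If the notebook server was started from a shell that has not picked up the /etc/profile changes, the same two paths can be added from inside Python instead. This is a sketch under stated assumptions: spark_python_paths and add_spark_to_sys_path are illustrative names, and the Spark location must be passed in explicitly (for example /app/spark, the path used later in this post).

```python
import os
import sys


def spark_python_paths(spark_home, py4j_zip="py4j-0.8.2.1-src.zip"):
    """Return the two entries that the PYTHONPATH exports above add."""
    return [
        os.path.join(spark_home, "python"),
        os.path.join(spark_home, "python", "lib", py4j_zip),
    ]


def add_spark_to_sys_path(spark_home):
    """Prepend the Spark Python paths to sys.path if not already present."""
    for path in reversed(spark_python_paths(spark_home)):
        if path not in sys.path:
            sys.path.insert(0, path)
```

After add_spark_to_sys_path("/app/spark"), an import of pyspark should resolve in the same way as with the exported PYTHONPATH. The py4j zip name is tied to the Spark release, so it has to match the file actually present under python/lib.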
    

    Test in Python:

    >>> from pyspark import SparkConf, SparkContext
    >>> conf = SparkConf().setMaster("spark://ip:19002").setAppName("pyspark")
    >>> sc = SparkContext(conf = conf)
    >>> data = [1, 2, 3, 4, 5]
    >>> distData = sc.parallelize(data, 1)
    >>> distData
    ParallelCollectionRDD[0] at parallelize at PythonRDD.scala:315
    >>> distData.count()
    >>> distData.first()
    

    Alternatively, the following command imports the Spark modules and initializes a SparkContext:

    execfile("/app/spark/python/pyspark/shell.py")
    

    After that, sc can be used directly, for example:

    file = sc.textFile("/tmp/test_spark/input")
    data = file.flatMap(lambda line: line.split(" "))
    data.collect()
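The flatMap call above splits every line into words and returns one flat collection of words instead of one list per line. The plain-Python sketch below illustrates that difference without needing a Spark cluster; flat_map is an illustrative helper and the sample lines are made up.

```python
from itertools import chain


def flat_map(func, items):
    """Apply func to each item and concatenate the resulting sequences."""
    return list(chain.from_iterable(func(item) for item in items))


lines = ["hello spark", "hello ipython notebook"]  # made-up sample input

# map-style: one list of words per input line
mapped = [line.split(" ") for line in lines]
# flatMap-style: a single flat list of words
flattened = flat_map(lambda line: line.split(" "), lines)

print(mapped)     # [['hello', 'spark'], ['hello', 'ipython', 'notebook']]
print(flattened)  # ['hello', 'spark', 'hello', 'ipython', 'notebook']
```

In Spark the same transformation is evaluated lazily and in parallel across partitions, and collect() is what brings the resulting words back to the driver.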

      

  • Original article: https://www.cnblogs.com/Cherise/p/4351022.html