Apache Spark 1.6.1 has been released. Apache Spark is an open-source cluster computing environment similar to Hadoop, but with some useful differences that make Spark better suited to certain workloads: Spark enables in-memory distributed datasets, which, besides supporting interactive queries, can also optimize iterative workloads.
Spark is implemented in the Scala language and uses Scala as its application framework. Unlike Hadoop, Spark and Scala are tightly integrated, so Scala can manipulate distributed datasets as easily as local collection objects.
Although Spark was created to support iterative jobs on distributed datasets, it is in fact a complement to Hadoop and can run in parallel on the Hadoop file system; this is enabled through a third-party cluster framework named Mesos. Spark was developed by the AMP Lab (Algorithms, Machines, and People Lab) at UC Berkeley and can be used to build large-scale, low-latency data analytics applications.
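To illustrate the point above about working with distributed datasets in the same style as local Scala collections, here is a minimal sketch (not part of the announcement) of a word count over an RDD, assuming a local-mode `SparkContext` against the Spark 1.6-era API:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical minimal example: the same flatMap/map combinators used on
// local Scala collections apply to a distributed RDD.
object WordCountSketch {
  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark in-process with one worker thread per core.
    val conf = new SparkConf().setAppName("sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val lines = sc.parallelize(Seq("spark and scala", "scala and hadoop"))
    val counts = lines
      .flatMap(_.split(" "))     // split each line into words
      .map(word => (word, 1))    // pair each word with a count of 1
      .reduceByKey(_ + _)        // sum counts per word across partitions

    counts.collect().foreach(println)
    sc.stop()
  }
}
```

The same chain of combinators on a `Seq` would run entirely in local memory; on an RDD it is distributed across the cluster, which is the integration the paragraph above describes.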
New Features
[SPARK-10359] - Enumerate Spark's dependencies in a file and diff against it for new pull requests
Bug Fixes
[SPARK-7615] - MLLIB Word2Vec wordVectors divided by Euclidean Norm equals to zero
[SPARK-9844] - File appender race condition during SparkWorker shutdown
[SPARK-10524] - Decision tree binary classification with ordered categorical features: incorrect centroid
[SPARK-10847] - Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure
[SPARK-11394] - PostgreDialect cannot handle BYTE types
[SPARK-11624] - Spark SQL CLI will set sessionstate twice
[SPARK-11972] - [Spark SQL] the value of 'hiveconf' parameter in CLI can't be got after enter spark-sql session
[SPARK-12006] - GaussianMixture.train crashes if an initial model is not None
[SPARK-12010] - Spark JDBC requires support for column-name-free INSERT syntax
[SPARK-12016] - word2vec load model can't use findSynonyms to get words
[SPARK-12026] - ChiSqTest gets slower and slower over time when number of features is large
[SPARK-12268] - pyspark shell uses execfile which breaks python3 compatibility
[SPARK-12300] - Fix schema inferance on local collections
[SPARK-12316] - Stack overflow with endless call of `Delegation token thread` when application end.
[SPARK-12327] - lint-r checks fail with commented code
Full release notes: https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12334009&styleName=Html&projectId=12315420&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED|a0202c18e71ce446af35a0775298cc3f2be9d54f|lin
Download: http://spark.apache.org/downloads.html