How to create the right DataFrame for classification in Spark ML?

I am trying to run a random forest classification using the Spark ML API,
but I am having issues creating the right DataFrame input for the pipeline.
Here is sample data:
age,hours_per_week,education,sex,salaryRange
38,40,"hs-grad","male","A"
28,40,"bachelors","female","A"
52,45,"hs-grad","male","B"
31,50,"masters","female","B"
42,40,"bachelors","male","B"
age and hours_per_week are integers while other features including label salaryRange are categorical (String)
Loading this CSV file (let's call it sample.csv) can be done with the spark-csv library like this:
val data = sqlContext.csvFile("/home/dusan/sample.csv")
By default all columns are imported as strings, so we need to change "age" and "hours_per_week" to Int:
val toInt = udf[Int, String]( _.toInt)
val dataFixed = data.withColumn("age", toInt(data("age"))).withColumn("hours_per_week", toInt(data("hours_per_week")))
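As an aside, the same conversion can be done without a udf by casting the columns directly; this is a minimal sketch using `Column.cast`, which is part of the standard Spark SQL DataFrame API:

```scala
// Alternative sketch: cast the string columns instead of applying a udf.
// Column.cast("int") converts the string column to IntegerType.
val dataFixed = data
  .withColumn("age", data("age").cast("int"))
  .withColumn("hours_per_week", data("hours_per_week").cast("int"))
```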
Just to check how schema looks now:
scala> dataFixed.printSchema
root
|-- age: integer (nullable = true)
|-- hours_per_week: integer (nullable = true)
|-- education: string (nullable = true)
|-- sex: string (nullable = true)
|-- salaryRange: string (nullable = true)
Then let's set up the cross-validator and pipeline:
val rf = new RandomForestClassifier()
val pipeline = new Pipeline().setStages(Array(rf))
val cv = new CrossValidator().setNumFolds(10).setEstimator(pipeline).setEvaluator(new BinaryClassificationEvaluator)
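Unrelated to the error below: CrossValidator also expects a parameter grid via setEstimatorParamMaps before fitting. A minimal sketch using ParamGridBuilder (the grid values here are purely illustrative):

```scala
import org.apache.spark.ml.tuning.ParamGridBuilder

// Illustrative grid over two RandomForestClassifier depths;
// CrossValidator tries each combination during fitting.
val paramGrid = new ParamGridBuilder()
  .addGrid(rf.maxDepth, Array(3, 5))
  .build()
cv.setEstimatorParamMaps(paramGrid)
```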
Error shows up when running this line:
val cmModel = cv.fit(dataFixed)
java.lang.IllegalArgumentException: Field "features" does not exist.
It is possible to set the label column and feature column in RandomForestClassifier, but I have 4 columns as predictors (features), not only one.
How should I organize my DataFrame so that it has the label and features columns organized correctly?
For your convenience, here is the full code:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.CrossValidator
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
import org.apache.spark.mllib.linalg.{Vector, Vectors}
object SampleClassification {
def main(args: Array[String]): Unit = {
//set spark context
val conf = new SparkConf().setAppName("Simple Application").setMaster("local");
val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
import com.databricks.spark.csv._
//load data by using databricks "Spark CSV Library"
val data = sqlContext.csvFile("/home/dusan/sample.csv")
//by default all columns are imported as string so we need to change "age" and "hours_per_week" to Int
val toInt = udf[Int, String]( _.toInt)
val dataFixed = data.withColumn("age", toInt(data("age"))).withColumn("hours_per_week",toInt(data("hours_per_week")))
val rf = new RandomForestClassifier()
val pipeline = new Pipeline().setStages(Array(rf))
val cv = new CrossValidator().setNumFolds(10).setEstimator(pipeline).setEvaluator(new BinaryClassificationEvaluator)
// this fails with error
//java.lang.IllegalArgumentException: Field "features" does not exist.
val cmModel = cv.fit(dataFixed)
  }
}
Thanks for help!
Solution: You simply need to make sure that you have a "features" column in your DataFrame of type VectorUDT, as shown below:
scala> val df2 = dataFixed.withColumnRenamed("age", "features")
df2: org.apache.spark.sql.DataFrame = [features: int, hours_per_week: int, education: string, sex: string, salaryRange: string]
scala> val cmModel = cv.fit(df2)
java.lang.IllegalArgumentException: requirement failed: Column features must be of type org.apache.spark.mllib.linalg.VectorUDT@1eef but was actually IntegerType.
at scala.Predef$.require(Predef.scala:233)
at org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:37)
at org.apache.spark.ml.PredictorParams$class.validateAndTransformSchema(Predictor.scala:50)
at org.apache.spark.ml.Predictor.validateAndTransformSchema(Predictor.scala:71)
at org.apache.spark.ml.Predictor.transformSchema(Predictor.scala:118)
at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:164)
at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:164)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:108)
at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:164)
at org.apache.spark.ml.tuning.CrossValidator.transformSchema(CrossValidator.scala:142)
at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:59)
at org.apache.spark.ml.tuning.CrossValidator.fit(CrossValidator.scala:107)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:67)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:72)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:74)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:76)
Essentially there need to be two fields in your DataFrame: "features" for the feature vector and "label" for the instance labels. The label must be of type Double.
To create a "features" field of Vector type, first create a udf as shown below:
val toVec4 = udf[Vector, Int, Int, String, String] { (a, b, c, d) =>
  val e3 = c match {
    case "hs-grad"   => 0
    case "bachelors" => 1
    case "masters"   => 2
  }
  val e4 = d match { case "male" => 0 case "female" => 1 }
  Vectors.dense(a, b, e3, e4)
}
Now to also encode the "label" field, create another udf as shown below:
val encodeLabel = udf[Double, String]( _ match { case "A" => 0.0 case "B" => 1.0 } )
Now we transform the original DataFrame using these two udfs:
val df = dataFixed.withColumn(
  "features",
  toVec4(
    dataFixed("age"),
    dataFixed("hours_per_week"),
    dataFixed("education"),
    dataFixed("sex")
  )
).withColumn("label", encodeLabel(dataFixed("salaryRange"))).select("features", "label")
Note that there can be extra columns/fields present in the DataFrame, but in this case I have selected only features and label:
scala> df.show()
+-------------------+-----+
|           features|label|
+-------------------+-----+
|[38.0,40.0,0.0,0.0]|  0.0|
|[28.0,40.0,1.0,1.0]|  0.0|
|[52.0,45.0,0.0,0.0]|  1.0|
|[31.0,50.0,2.0,1.0]|  1.0|
|[42.0,40.0,1.0,0.0]|  1.0|
+-------------------+-----+
Now it's up to you to set the correct parameters for your learning algorithm to make it work.
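A more idiomatic alternative, assuming your Spark version ships the ml.feature package (1.4+), is to let StringIndexer encode the categorical columns and VectorAssembler build the "features" vector as pipeline stages, instead of hand-written udfs. A hedged sketch (the intermediate column names eduIdx and sexIdx are my own choices):

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}

// Index each categorical column to a numeric one, index the label,
// then assemble all predictors into a single "features" vector column.
val eduIndexer   = new StringIndexer().setInputCol("education").setOutputCol("eduIdx")
val sexIndexer   = new StringIndexer().setInputCol("sex").setOutputCol("sexIdx")
val labelIndexer = new StringIndexer().setInputCol("salaryRange").setOutputCol("label")
val assembler = new VectorAssembler()
  .setInputCols(Array("age", "hours_per_week", "eduIdx", "sexIdx"))
  .setOutputCol("features")
val pipeline = new Pipeline()
  .setStages(Array(eduIndexer, sexIndexer, labelIndexer, assembler, rf))
```

With this layout the encoding travels with the pipeline, so it is applied consistently inside each cross-validation fold.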