scala - value toDF is not a member of org.apache.spark.rdd.RDD
I've read about this issue in other posts, but I still don't know what I'm doing wrong. In principle, adding these two lines:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
should have done the trick, but the error persists.
This is my build.sbt:
name := "pickacustomer" version := "1.0" scalaversion := "2.11.7" librarydependencies ++= seq("com.databricks" %% "spark-avro" % "2.0.1", "org.apache.spark" %% "spark-sql" % "1.6.0", "org.apache.spark" %% "spark-core" % "1.6.0")
and my Scala code is:
import scala.collection.mutable.Map
import scala.collection.immutable.Vector
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql._

object Foo {

  def reshuffle_rdd(rawtext: RDD[String]): RDD[Map[String, (Vector[(Double, Double, String)], Map[String, Double])]] = {...}

  def do_prediction(shuffled: RDD[Map[String, (Vector[(Double, Double, String)], Map[String, Double])]],
                    prediction: (Vector[(Double, Double, String)] => Map[String, Double])): RDD[Map[String, Double]] = {...}

  def get_match_rate_from_results(results: RDD[Map[String, Double]]): Map[String, Double] = {...}

  def retrieve_duid(element: Map[String, (Vector[(Double, Double, String)], Map[String, Double])]): Double = {...}

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName(this.getClass.getSimpleName)
    if (!conf.getOption("spark.master").isDefined) conf.setMaster("local")

    val sc = new SparkContext(conf)

    // this should do the trick
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._

    val path_file = "/mnt/fast_export_file_clean.csv"
    val rawtext = sc.textFile(path_file)
    val shuffled = reshuffle_rdd(rawtext)

    // predict as a function of the last seen uid
    val results = do_prediction(shuffled.filter(x => retrieve_duid(x) > 1), predict_as_last_uid)
    results.cache()

    case class Summary(ismatch: Double, t_to_last: Double, nflips: Double, d_uid: Double, truth: Double, guess: Double)

    val summary = results.map(x => Summary(x("match"), x("t_to_last"), x("nflips"), x("d_uid"), x("truth"), x("guess")))

    // problematic line
    val sum_df = summary.toDF()
  }
}
I get:
value toDF is not a member of org.apache.spark.rdd.RDD[Summary]
A bit lost now. Any ideas?
Move the case class outside of main:
object Foo {

  case class Summary(ismatch: Double, t_to_last: Double, nflips: Double, d_uid: Double, truth: Double, guess: Double)

  def main(args: Array[String]) {
    ...
  }
}
It is something about the scoping that prevents Spark from being able to handle the automatic derivation of the schema for Summary.
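For reference, here is a minimal self-contained sketch of the working layout (Spark 1.6 API; the sample values and the object name Demo are made up, and the question's prediction logic is omitted):

import org.apache.spark.{SparkConf, SparkContext}

object Demo {

  // Top-level case class: the compiler can materialize a TypeTag for it,
  // so sqlContext.implicits._ can derive the DataFrame schema.
  case class Summary(ismatch: Double, t_to_last: Double, nflips: Double,
                     d_uid: Double, truth: Double, guess: Double)

  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local"))
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._

    // one made-up sample row, just to show toDF() compiling and running
    val summary = sc.parallelize(Seq(Summary(1.0, 0.5, 2.0, 3.0, 1.0, 1.0)))
    val sum_df = summary.toDF()
    sum_df.show()
  }
}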
FYI, I got a different error than you with sbt:
No TypeTag available for Summary
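As an aside: if moving the case class out were ever not an option, a sketch of an alternative (not what I recommend above, and untested against your exact code) is to bypass TypeTag-based derivation entirely and hand createDataFrame an explicit StructType; results and sqlContext below are the values from your main:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, DoubleType}

// explicit schema: no case class and no TypeTag needed
val schema = StructType(Seq(
  StructField("ismatch", DoubleType, nullable = false),
  StructField("t_to_last", DoubleType, nullable = false),
  StructField("nflips", DoubleType, nullable = false),
  StructField("d_uid", DoubleType, nullable = false),
  StructField("truth", DoubleType, nullable = false),
  StructField("guess", DoubleType, nullable = false)
))

// build Rows from the results RDD (one Map[String, Double] per element)
val rows = results.map(x => Row(x("match"), x("t_to_last"), x("nflips"), x("d_uid"), x("truth"), x("guess")))
val sum_df = sqlContext.createDataFrame(rows, schema)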