join - Matching two DataFrames in Scala
I have 2 RDDs in Scala which I converted to DataFrames, so I now have 2 DataFrames. The first one, prodUniqueDF, has two columns named prodid and uid, holding the master data of the products:

scala> prodUniqueDF.printSchema
root
 |-- prodid: string (nullable = true)
 |-- uid: long (nullable = false)
The second one, ratingsDF, has columns named prodid, custid, and ratings:

scala> ratingsDF.printSchema
root
 |-- prodid: string (nullable = true)
 |-- custid: string (nullable = true)
 |-- ratings: integer (nullable = false)
I want to join the above two and replace ratingsDF.prodid with prodUniqueDF.uid in ratingsDF.
To do this, I first registered them as temp tables:

prodUniqueDF.registerTempTable("prodUniqueDF")
ratingsDF.registerTempTable("ratingsDF")
and ran this code:

val testSql = sql("select prodUniqueDF.uid, ratingsDF.custid, ratingsDF.ratings from prodUniqueDF, ratingsDF where prodUniqueDF.prodid = ratingsDF.prodid")
But this error comes:

org.apache.spark.sql.AnalysisException: Table not found: prodUniqueDF; line 1 pos 66

Please help! How can I achieve the join? Is there a method to map the RDDs instead?
The joining of the DataFrames can be achieved like this; the format is

dataFrameA.join(dataFrameB)

By default it takes an inner join, but you can also specify the type of join you want, and there are APIs for that. You can look here for more information:

http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.sql.DataFrame
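For example, joining the two DataFrames from the question on their shared prodid column could look like the sketch below. This assumes the Spark 1.x DataFrame API (the question's use of registerTempTable suggests Spark before 2.0); joinedDF is an illustrative name, not from the original post.

// Inner join on the shared prodid column, using an explicit join
// expression so there is no ambiguity about which side's column is used.
val joinedDF = ratingsDF.join(prodUniqueDF, ratingsDF("prodid") === prodUniqueDF("prodid"), "inner")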
For replacing the values in an existing column you can take the help of the withColumn method of the API.

It would be like this:

val newDF = dfA.withColumn("newColumnName", dfB("columnName")).drop("columnName").withColumnRenamed("newColumnName", "columnName")
I think this might do the trick!
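Applied to the joined DataFrame from the earlier sketch, that same pattern would go roughly as follows. Again a sketch rather than the poster's exact code: replacedDF and newProdId are illustrative names, and the Column-based drop used here to disambiguate the two prodid columns is available from Spark 1.4.1 onwards.

// Copy uid in under a temporary name, drop both original prodid
// columns and the now-redundant uid, then rename the copy.
val replacedDF = joinedDF
  .withColumn("newProdId", prodUniqueDF("uid"))
  .drop(ratingsDF("prodid"))
  .drop(prodUniqueDF("prodid"))
  .drop("uid")
  .withColumnRenamed("newProdId", "prodid")

A more compact equivalent is a select with an alias, e.g. joinedDF.select(prodUniqueDF("uid").as("prodid"), ratingsDF("custid"), ratingsDF("ratings")).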