join - Matching two DataFrames in Scala


I have two RDDs in Scala that I converted to DataFrames, so I now have two DataFrames. The first, produniquedf, has two columns named prodid and uid and holds the product master data:

scala> produniquedf.printSchema
root
 |-- prodid: string (nullable = true)
 |-- uid: long (nullable = false)

The second, ratingsdf, has columns named prodid, custid, and ratings:

scala> ratingsdf.printSchema
root
 |-- prodid: string (nullable = true)
 |-- custid: string (nullable = true)
 |-- ratings: integer (nullable = false)

I want to join these two and replace ratingsdf.prodid with produniquedf.uid in ratingsdf.

To do this, I first registered them as temp tables:

produniquedf.registerTempTable("produniquedf")
ratingsdf.registerTempTable("ratingsdf")

and ran this code:

val testsql = sql("select produniquedf.uid, ratingsdf.custid, ratingsdf.ratings from produniquedf, ratingsdf where produniquedf.prodid = ratingsdf.prodid")

but this error comes up:

org.apache.spark.sql.AnalysisException: Table not found: produniquedf; line 1 pos 66

Please help! How can I achieve the join? Is there another method to map the RDDs instead?

The joining of DataFrames can easily be achieved. The format is

 dataframea.join(dataframeb) 

By default it performs an inner join, but you can also specify the type of join you want; there are APIs for that. You can look here for more information:

http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.sql.dataframe
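As a rough sketch of the join API mentioned above, the following passes the key column and the join type explicitly. It assumes Spark 2.x or later (where SparkSession exists) and a local master; the sample rows and the names products and ratings are made up for illustration. Without a join condition, join has no key to match rows on, so passing the column name is what ties the two sides together.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.x+ running locally; in spark-shell a session named `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("join-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data shaped like the two DataFrames in the question.
val products = Seq(("p1", 1L), ("p2", 2L)).toDF("prodid", "uid")
val ratings = Seq(("p1", "c1", 5), ("p3", "c2", 3)).toDF("prodid", "custid", "ratings")

// Inner join on the shared key column: only prodid values present on both sides survive.
val inner = ratings.join(products, "prodid")

// Same key, but keep every rating even when no product matches (uid is null in that row).
val leftOuter = ratings.join(products, Seq("prodid"), "left_outer")

inner.show()
leftOuter.show()
```

Other join type strings such as "right_outer" and "full_outer" can be passed in the same position.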

For replacing values in an existing column you can make use of the withColumn method from the API.

It would be something like this:

 val newdf = dfa.withColumn("newcolumnname", dfb("columnname")).drop("columnname").withColumnRenamed("newcolumnname", "columnname")

I think this might do the trick!
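Putting the join and the column replacement together for the DataFrames in the question, one possible approach is to join first and then select uid in place of prodid, since withColumn can only refer to columns of the DataFrame it is called on. This is a minimal sketch assuming Spark 2.x or later; the sample rows are invented, and createOrReplaceTempView is used as the newer counterpart of registerTempTable.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.x+ running locally; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("replace-prodid").getOrCreate()
import spark.implicits._

// Hypothetical sample rows with the same columns as in the question.
val produniquedf = Seq(("p1", 1L), ("p2", 2L)).toDF("prodid", "uid")
val ratingsdf = Seq(("p1", "c1", 5), ("p2", "c2", 3), ("p1", "c3", 4))
  .toDF("prodid", "custid", "ratings")

// DataFrame API: join on prodid, then keep uid instead of prodid.
val replaced = ratingsdf.join(produniquedf, "prodid").select("uid", "custid", "ratings")

// SQL route: register both views on the same session that runs the query.
produniquedf.createOrReplaceTempView("produniquedf")
ratingsdf.createOrReplaceTempView("ratingsdf")
val testsql = spark.sql(
  """select p.uid, r.custid, r.ratings
    |from produniquedf p join ratingsdf r on p.prodid = r.prodid""".stripMargin)

replaced.show()
testsql.show()
```

Both results have the columns uid, custid, and ratings. If the SQL route reports "Table not found", one thing worth checking is whether the query runs through the same session or context the tables were registered on; whether that was the cause in the question is not certain.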

