hadoop - Able to create Hive table using JSON objects but not all objects are being captured into respective attribute -
i able create hive table using json objects not objects being captured respective attribute. few absorbed 1 attribute. how break out remaining attributes respective "columns".
most of below fields put table screen_name, name , location grouped number of other objects retweeted_status , user. retweeted_status , user appear long json string needs broken out table. need perform activity below, , if so, how that? appreciated. thanks,
here script,
create table bank_f001 (json string); load data local inpath 'banking-1-26-16.txt' table bank_f001; create table bank_f001_table select get_json_object(bank_f001.json, '$.text') text, get_json_object(bank_f001.json, '$.in_reply_to_user_id') in_reply_to_user_id, get_json_object(bank_f001.json, '$.id') id, get_json_object(bank_f001.json, '$.favorite_count') favorite_count, get_json_object(bank_f001.json, '$.coordinates') coordinates, get_json_object(bank_f001.json, '$.id_str') id_str, get_json_object(bank_f001.json, '$.location') location, get_json_object(bank_f001.json, '$.lang') lang, get_json_object(bank_f001.json, '$.indices') indices, get_json_object(bank_f001.json, '$.type') type, get_json_object(bank_f001.json, '$.hashtags') hashtags, get_json_object(bank_f001.json, '$.user_mentions') user_mentions, get_json_object(bank_f001.json, '$.screen_name') screen_name, get_json_object(bank_f001.json, '$.name') name, get_json_object(bank_f001.json, '$.in_reply_to_screen_name') in_reply_to_screen_name, get_json_object(bank_f001.json, '$.retweet_count') retweet_count, get_json_object(bank_f001.json, '$.favorited') favorited, get_json_object(bank_f001.json, '$.retweeted_status') retweeted_status, get_json_object(bank_f001.json, '$.user') user, get_json_object(bank_f001.json, '$.followers_count') followers_count, get_json_object(bank_f001.json, '$.statuses_count') statuses_count, get_json_object(bank_f001.json, '$.description') description, get_json_object(bank_f001.json, '$.geo_enabled') geo_enabled, get_json_object(bank_f001.json, '$.favourites_count') favourites_count, get_json_object(bank_f001.json, '$.created_at') created_at, get_json_object(bank_f001.json, '$.time_zone') time_zone, get_json_object(bank_f001.json, '$.listed_count') listed_count, get_json_object(bank_f001.json, '$.in_reply_to_user_id_str') in_reply_to_user_id_str bank_f001;
i figured out... in case you're interested... :) bunch of attributes nested in user attribute object. used following code break out... ".user.location"
create table bank_f001_table select get_json_object(bank_f001.json, '$.text') text, get_json_object(bank_f001.json, '$.in_reply_to_user_id') in_reply_to_user_id, get_json_object(bank_f001.json, '$.id') id, get_json_object(bank_f001.json, '$.favorite_count') favorite_count, get_json_object(bank_f001.json, '$.coordinates') coordinates, get_json_object(bank_f001.json, '$.id_str') id_str, get_json_object(bank_f001.json, '$.user.location') location, get_json_object(bank_f001.json, '$.lang') lang, get_json_object(bank_f001.json, '$.indices') indices, get_json_object(bank_f001.json, '$.type') type, get_json_object(bank_f001.json, '$.hashtags') hashtags, get_json_object(bank_f001.json, '$.user_mentions') user_mentions, get_json_object(bank_f001.json, '$.user.screen_name') screen_name, get_json_object(bank_f001.json, '$.user.name') name, get_json_object(bank_f001.json, '$.in_reply_to_screen_name') in_reply_to_screen_name, get_json_object(bank_f001.json, '$.retweet_count') retweet_count, get_json_object(bank_f001.json, '$.favorited') favorited, get_json_object(bank_f001.json, '$.retweeted_status') retweeted_status, get_json_object(bank_f001.json, '$.user') user, get_json_object(bank_f001.json, '$.followers_count') followers_count, get_json_object(bank_f001.json, '$.statuses_count') statuses_count, get_json_object(bank_f001.json, '$.description') description, get_json_object(bank_f001.json, '$.geo_enabled') geo_enabled, get_json_object(bank_f001.json, '$.favourites_count') favourites_count, get_json_object(bank_f001.json, '$.created_at') created_at, get_json_object(bank_f001.json, '$.time_zone') time_zone, get_json_object(bank_f001.json, '$.listed_count') listed_count, get_json_object(bank_f001.json, '$.in_reply_to_user_id_str') in_reply_to_user_id_str bank_f001;
Comments
Post a Comment