PySpark 'tzinfo' error when using the Cassandra connector
I'm reading from Cassandra using
a = sc.cassandraTable("my_keyspace", "my_table").select("timestamp", "value")
and want to convert it to a DataFrame:
a.toDF()
The schema is correctly inferred:
DataFrame[timestamp: timestamp, value: double]
but when I materialize the DataFrame I get the following error:
Py4JJavaError: An error occurred while calling o89372.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 285.0 failed 4 times, most recent failure: Lost task 0.3 in stage 285.0 (TID 5243, kepler8.cern.ch): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
    process()
  File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/opt/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/types.py", line 541, in toInternal
    return tuple(f.toInternal(v) for f, v in zip(self.fields, obj))
  File "/opt/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/types.py", line 541, in <genexpr>
    return tuple(f.toInternal(v) for f, v in zip(self.fields, obj))
  File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/types.py", line 435, in toInternal
    return self.dataType.toInternal(obj)
  File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/types.py", line 190, in toInternal
    seconds = (calendar.timegm(dt.utctimetuple()) if dt.tzinfo
AttributeError: 'str' object has no attribute 'tzinfo'
which sounds like a string is being handed to pyspark.sql.types.TimestampType, where a datetime is expected.
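Indeed, the last frame of the traceback is TimestampType.toInternal, and I can reproduce the same AttributeError in isolation, with nothing connector- or table-specific involved (a minimal check):

from pyspark.sql.types import TimestampType

# toInternal() expects a datetime.datetime; a plain string has no .tzinfo
TimestampType().toInternal("2016-01-01 00:00:00")
# AttributeError: 'str' object has no attribute 'tzinfo'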
How can I debug this further?
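The first thing I plan to check is what Python type the connector actually returns for the timestamp column, before any DataFrame conversion runs (a sketch; I'm assuming the connector's rows can be indexed by position):

for row in a.take(5):
    # If this prints str instead of datetime.datetime, the connector
    # is returning raw strings for the timestamp column
    print(type(row[0]), repr(row[0]))

If it really is a string, a possible workaround while debugging would be to parse it myself before calling toDF() (the timestamp format below is only a guess about my data):

from datetime import datetime

def fix_ts(row):
    ts, value = row[0], row[1]
    if isinstance(ts, str):  # or basestring on Python 2
        ts = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")  # assumed format
    return (ts, value)

df = a.map(fix_ts).toDF(["timestamp", "value"])
df.show()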