Earlier this week, LinkedIn announced it was open-sourcing AvroTensorDataset, which is a “TensorFlow dataset for reading, parsing, and processing Avro data.” Apache Avro is the primary storage format that LinkedIn uses for its training data.
According to LinkedIn, it was experiencing bottlenecks in its machine learning workloads that were caused by the need to read multiple terabytes of input data. AvroTensorDataset can speed up data preprocessing by several orders of magnitude, according to the company.
The tool was built internally at LinkedIn, and the company wanted to open-source the project so that others could see the same large performance boosts in their training workloads. It has already been in production at LinkedIn for over a year.
LinkedIn says that with this tool it has improved processing speed by 162x compared to existing solutions and has reduced overall training time by 66%.
“ATDSDataset is LinkedIn’s solution to efficiently read Avro data into TensorFlow. Through multiple performance optimizations, we were able to speed up I/O throughput by orders of magnitude over existing Avro reader solutions. Our team at LinkedIn worked closely with the TensorFlow I/O community to open-source this feature, and we hope that by open-sourcing it, the TensorFlow community will also benefit from these performance improvements,” Jonathan Hung, staff software engineer at LinkedIn, wrote in a blog post.