Original title: Hladkost funkcí naučených neuronovými sítěmi
Translated title: Smoothness of Functions Learned by Neural Networks
Authors: Volhejn, Václav ; Musil, Tomáš (advisor) ; Straka, Milan (referee)
Document type: Bachelor's theses
Year: 2020
Language: eng
Abstract: Modern neural networks can easily fit their training set perfectly. Surprisingly, they generalize well despite being "overfit" in this way, defying the bias-variance trade-off. A prevalent explanation is that stochastic gradient descent has an implicit bias which leads it to learn functions that are simple, and these simple functions generalize well. However, the specifics of this implicit bias are not well understood. In this work, we explore the hypothesis that SGD is implicitly biased towards learning functions that are smooth. We propose several measures to formalize the intuitive notion of smoothness, and conduct experiments to determine whether these measures are implicitly being optimized for. We exclude the possibility that smoothness measures based on first derivatives (the gradient) are being implicitly optimized for. Measures based on second derivatives (the Hessian), on the other hand, show promising results.
Keywords: generalization; machine learning; neural networks; smoothness; hladkost; neuronové sítě; strojové učení; zobecňování

Institution: Charles University Faculties (theses)
Document availability information: Available in the Charles University Digital Repository.
Original record: http://hdl.handle.net/20.500.11956/119446

Permalink: http://www.nusl.cz/ntk/nusl-415933


 Record created 2020-08-02, last modified 2022-03-04

