Recent advances in machine learning have extended to computationally creative systems, and growing interest in machine-generated artifacts has driven the development of creative models. Earlier methods, however, mostly worked within a single domain, so cross-modal learning, and in particular the direct mapping between modalities, remains relatively unexplored. This work proposes a novel methodology for generating symbolic music from images by directly mapping their features. The method adopts the image captioning approach to map the two domains’ features, using a CNN encoder and a deep stacked LSTM decoder as its base models. The generated music is evaluated quantitatively with a custom genre classification model and BLEU score calculations, and qualitatively through a melody listening test with human evaluators. The results indicate that the proposed method is effective for music generation.
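The abstract names BLEU as one of the quantitative measures. As a rough illustration of how BLEU can be applied to symbolic music, the sketch below scores a generated token sequence against a single reference. It is a minimal sketch under stated assumptions, not the authors' evaluation code: the note-name tokens (such as "C4"), the function names, the single-reference setup, and the unsmoothed scoring are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of the candidate against one reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs
    # in the reference.
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / total


def bleu(candidate, reference, max_n=4):
    """Unsmoothed single-reference BLEU with uniform n-gram weights."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(candidate) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - len(reference) / len(candidate))
    return brevity_penalty * math.exp(log_mean)


# Hypothetical melodies encoded as note-name tokens (an assumption).
reference_melody = ["C4", "E4", "G4", "C5"]
generated_melody = ["C4", "E4", "G4", "E4"]
score = bleu(generated_melody, reference_melody, max_n=2)
```

Here the unigram precision is 3/4 and the bigram precision is 2/3, so the score is their geometric mean. Without smoothing, any n-gram order with no matches drives the score to zero, which is why short melodies are often scored with a small `max_n`.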
Title (dc.title) | Image to Music: Cross-Modal Melody Generation Through Image Captioning |
Author [Primary] (dc.creator.author) | Kaplan, Alper |
Author Department (dc.creator.department) | Yeditepe University Graduate School of Social Sciences |
Author Department (dc.creator.department) | Yeditepe University Graduate School of Social Sciences Cognitive Science Department |
Year of Publication (dc.date.issued) | 2023 |
Publication Type [Academic] (dc.type) | preprint |
Publication Type [Medium] (dc.format) | application/pdf |
Subject Headings [General] (dc.subject) | Music Generation |
Subject Headings [General] (dc.subject) | Melody Generation |
Subject Headings [General] (dc.subject) | Cross-Domain Learning |
Subject Headings [General] (dc.subject) | Image Captioning |
Subject Headings [General] (dc.subject) | Machine Learning |
Subject Headings [General] (dc.subject) | Deep Learning |
Publisher (dc.publisher) | Yeditepe University Academic and Open Access Information System |
Language (dc.language.iso) | eng |
Date Accessioned (dc.date.accessioned) | 2023-12-28 |
Open Access Date (dc.date.available) | 2023-12-28 |
Rights (dc.rights) | Yeditepe University Academic and Open Access Information System |
Access Rights (dc.rights.access) | Open Access |
Copyright (dc.rights.holder) | Unless otherwise stated, copyrights belong to Yeditepe University. Usage permissions are specified in the Open Access System and follow the "InC-NC/1.0" and "by-nc-nd/4.0" terms as stated. |
Copyright URL (dc.rights.uri) | http://creativecommons.org/licenses/by-nc-nd/4.0 |
Copyright URL (dc.rights.uri) | https://rightsstatements.org/page/InC-NC/1.0/?language=en |
Description [General] (dc.description) | Final published version |
Description [Note] (dc.description.note) | Note: This preprint reports new research that has not been certified by peer review and should not be used as established information without consulting multiple experts in the field. |
Collection Information (dc.description.collectioninformation) | This item is part of the preprint collection made available through the Yeditepe University library. For questions, contact openaccess@yeditepe.edu.tr |
Author [Contributor] (dc.contributor.author) | Goularas, Dionysis |
Author [Contributor] Institution (dc.contributor.institution) | Yeditepe University Graduate School of Natural and Applied Sciences |
Author [Contributor] Institution (dc.contributor.institution) | Yeditepe University Graduate School of Natural and Applied Sciences Computer Engineering Department |
Contributing Author ORCID (dc.contributor.authorOrcid) | 0000-0002-4802-2802 |