Image to Music: Cross-Modal Melody Generation Through Image Captioning

  • Kaplan, Alper
  • preprint
  • Publication date: 2023
  • Publisher: Yeditepe University Academic and Open Access Information System

Recent advances in machine learning have extended to computationally creative systems, and growing interest in machine-generated artifacts has driven the development of creative models. Earlier methods, however, mostly worked within a single domain, so cross-modal learning, and in particular the direct mapping between modalities in creative models, remains relatively unexplored. This work proposes a novel methodology for generating symbolic music from images by directly mapping their features. Because the method follows the image captioning approach to map features between the two domains, its base models are a CNN encoder and a deep stacked LSTM decoder. The generated music is evaluated quantitatively with a custom genre classification model and BLEU score calculations, and qualitatively with a melody listening test with human evaluators. The results show that the proposed method works well for music generation.
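To illustrate how a BLEU score can be applied to symbolic music rather than text, here is a minimal sketch that treats a melody as a sequence of note tokens and compares a generated sequence against a single reference. The token format (`pitch_duration`, such as `C4_q`), the function names, and the choice of unsmoothed sentence-level BLEU with uniform weights up to 4-grams are assumptions made for illustration; the abstract does not state which encoding or BLEU variant the work uses.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with uniform weights and a single reference."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing: a zero precision at any order gives 0
        log_precisions.append(math.log(overlap / total))
    # The brevity penalty lowers the score of candidates shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical note tokens (pitch_duration); this encoding is an assumption.
reference = ["C4_q", "E4_q", "G4_q", "E4_q", "C4_h", "D4_q", "F4_q", "A4_h"]
generated = ["C4_q", "E4_q", "G4_q", "E4_q", "C4_h", "D4_q", "G4_q", "A4_h"]

score = bleu(generated, reference)
print(f"BLEU-4: {score:.3f}")
```

In this sketch a single substituted note lowers the precision at every n-gram order, so the score reflects how much of the local melodic context matches the reference and not only which individual notes appear.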

Detailed View
Title
(dc.title)
Image to Music: Cross-Modal Melody Generation Through Image Captioning

(dc.creator.author)
Kaplan, Alper

(dc.creator.department)
Yeditepe University Graduate School of Social Sciences

(dc.creator.department)
Yeditepe University Graduate School of Social Sciences Cognitive Science Department
Publication date
(dc.date.issued)
2023

(dc.type)
preprint

(dc.format)
application/pdf

(dc.subject)
Music Generation

(dc.subject)
Melody Generation

(dc.subject)
Cross-Domain Learning

(dc.subject)
Image Captioning

(dc.subject)
Machine Learning

(dc.subject)
Deep Learning

Publisher
(dc.publisher)
Yeditepe University Academic and Open Access Information System
Language
(dc.language.iso)
eng

Record entry date
(dc.date.accessioned)
2023-12-28
Open access date
(dc.date.available)
2023-12-28
Rights
(dc.rights)
Yeditepe University Academic and Open Access Information System

(dc.rights.access)
Open Access

(dc.rights.holder)
Unless otherwise stated, copyrights belong to Yeditepe University. Usage permissions are specified in the Open Access System, and "InC-NC/1.0" and "by-nc-nd/4.0" are as stated.

(dc.rights.uri)
http://creativecommons.org/licenses/by-nc-nd/4.0

(dc.rights.uri)
https://rightsstatements.org/page/InC-NC/1.0/?language=en

(dc.description)
Final published version

(dc.description.note)
Note: This preprint reports new research that has not been certified by peer review and should not be used as established information without consulting multiple experts in the field.

(dc.description.collectioninformation)
This item is part of the preprint collection made available through Yeditepe University library. For your questions, our contact address is openaccess@yeditepe.edu.tr

(dc.contributor.author)
Goularas, Dionysis

(dc.contributor.institution)
Yeditepe University Graduate School of Natural and Applied Sciences

(dc.contributor.institution)
Yeditepe University Graduate School of Natural and Applied Sciences Computer Engineering Department

(dc.contributor.authorOrcid)
0000-0002-4802-2802