Respair committed on
Commit e5762f6 · verified · 1 parent: 834bd9b

Upload folder using huggingface_hub
cotlet/1.ipynb ADDED
The diff for this file is too large to render.

cotlet/2.ipynb ADDED
The diff for this file is too large to render.

cotlet/3.ipynb ADDED
The diff for this file is too large to render.

cotlet/__pycache__/phon.cpython-311.pyc ADDED
Binary file (5.55 kB).

cotlet/__pycache__/utils.cpython-311.pyc ADDED
Binary file (24.8 kB).

cotlet/cell_output.log ADDED
@@ -0,0 +1,3 @@
+ Logging started. Output will be saved to /home/austin/disk2/llmvcs/tt/cotlet/cell_output.log every 5 seconds.
+ Finding audio files...
+ Finding audio files...

cotlet/hallucinate.csv ADDED
@@ -0,0 +1,48 @@
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/shinichiro_miki/Shinichiro_Miki__01/Shinichiro_Miki__01_chunk1929.wav|kojomi oniːtɕaɴ da ka kambarɯsaɴ da ka no eikʲoɯ o ɯkesɯgi dʑa neː no ka? omae wa! aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː.|4
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/kamiya_hiroshi/Kamiya_Hiroshi_02/Kamiya_Hiroshi_02_chunk2670.wav|ɴ, ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa , ɯwa .|5
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sakurai_takahiro/Sakurai_Takahiro_02/Sakurai_Takahiro_02_chunk290.wav|ɯɯ ɯɯ ɯɯ ɯɯ.|1
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sawashiro_miyuki/Sawashiro_Miyuki_03/Sawashiro_Miyuki_03_chunk2123.wav|eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː eː.|2
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/shinichiro_miki/Shinichiro_Miki__02/Shinichiro_Miki__02_chunk1211.wav|ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ... ɕi no bɯ...|4
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sakurai_takahiro/Sakurai_Takahiro_02/Sakurai_Takahiro_02_chunk37.wav|ʔɴ?ʔɴ? ɯwaː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː jɯkɯɯɯɯɯ kɯɯɯɯ kɯɯɯɯ kɯɯɯ kɯɯɯ kɯɯɯ kɯɯɯ kɯɯɯ kɯɯ kɯɯɯ kɯɯ kɯɯ kɯɯ kɯɯ kɯɯ kɯ kɯ.|1
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/kamiya_hiroshi/Kamiya_Hiroshi_02/Kamiya_Hiroshi_02_chunk2676.wav|ɯ, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa, ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯwa,ɯ.|5
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/shinichiro_miki/Shinichiro_Miki_03/Shinichiro_Miki_03_chunk258.wav|aɽi enai aɽi enai aɽi enai aɽi enai aɽi enai aɽi enai aɽi enai aɽi enai aɽi enai.|4
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/shinichiro_miki/Shinichiro_Miki__01/Shinichiro_Miki__01_chunk1916.wav|doɯ... doɯ... doɯ... doɯ...|4
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sawashiro_miyuki/Sawashiro_Miyuki_03/Sawashiro_Miyuki_03_chunk1723.wav|aite no kʲotae o ɽijoɯ sɯrɯ sempoɯnʲaɴ. kʲɯɯsoɯ neko o kamɯ, to iɯ kotowaza ga arɯkaɽainʲaɴ. neko ga toɽa o kandaʔte okaɕikɯ wa niai daɽoɯ. soɽe ni...ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯ.|2
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sawashiro_miyuki/Sawashiro_Miyuki_03/Sawashiro_Miyuki_03_chunk1975.wav|aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː itai... itai... itai... atsɯi... itai... atsɯi... atsɯi... atsɯi... atsɯi...|2
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/shinichiro_miki/Shinichiro_Miki_03/Shinichiro_Miki_03_chunk204.wav|kʲaː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː a.|4
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sakurai_takahiro/Sakurai_Takahiro_02/Sakurai_Takahiro_02_chunk634.wav|ʔte, oi oi oi oi oi oi oi!|1
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/shinichiro_miki/Shinichiro_Miki__02/Shinichiro_Miki__02_chunk1373.wav|ʔɴ...te... ɯwa!! ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw ɯw .|4
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sawashiro_miyuki/Sawashiro_Miyuki_03/Sawashiro_Miyuki_03_chunk491.wav|tada, “çinoɯma ɕi no eɯ ma ɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ.|2
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/chiwa_saito/Chiwa_Saito_01/Chiwa_Saito_01_chunk119.wav|ɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ, ɯɯɴ.|3
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/kamiya_hiroshi/Kamiya_Hiroshi_01/Kamiya_Hiroshi_01_chunk1491.wav|ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ, ɴʔ,.|5
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/kamiya_hiroshi/Kamiya_Hiroshi_01/Kamiya_Hiroshi_01_chunk914.wav|nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, nɯʔ, ɯɯʔ, ɯɯʔ, ɯɯʔ, ɯɯʔ, ɯɯʔ, ɯɯʔ, ɯɯʔ, ɯɯʔ, ɯɯʔ, ɯɯʔ, ɯɯʔ, ɯɯʔ, ɯɯʔ, ɯɯʔ, ɯɯʔ, ɯɯʔ,.|5
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sawashiro_miyuki/Sawashiro_Miyuki_03/Sawashiro_Miyuki_03_chunk1801.wav|ɯ o?nʲa aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː a.|2
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sakurai_takahiro/Sakurai_Takahiro_02/Sakurai_Takahiro_02_chunk57.wav|gʲaː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː.|1
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/shinichiro_miki/Shinichiro_Miki_03/Shinichiro_Miki_03_chunk2136.wav|ɯ!! ɯ!! ɯ!!|4
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/shinichiro_miki/Shinichiro_Miki__02/Shinichiro_Miki__02_chunk398.wav|do, do, do, do, do, do, do, do, do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do,do!|4
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sakurai_takahiro/Sakurai_Takahiro_02/Sakurai_Takahiro_02_chunk2832.wav|nani o saɽeta... nani o saɽeta... nani o saɽeta... nani o saɽeta... nani o saɽeta... nani o saɽeta... nani o saɽeta...|1
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sakamoto_maya/Sakamoto_Maya_03/Sakamoto_Maya_03_chunk544.wav|ʔɴ, maː na. ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɽ ɯɯɯ, maː na.|6
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sawashiro_miyuki/Sawashiro_Miyuki_01/Sawashiro_Miyuki_01_chunk1450.wav|aɽaɽagikɯɴ ni taiɕite na no ka, oɕinosaɴ ni taiɕite na no ka, arɯiha wataɕi ni taiɕite na no ka...|2
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sakamoto_maya/Sakamoto_Maya_02/Sakamoto_Maya_02_chunk311.wav|tona ni ɕɯɽeba naɽakɯte ɕɯɽeba naɽakɯte ɕɯɽeba naɽakɯte ɕɯɽeba naɽakɯte ɕɯɽeba naɽakɯte ɕɯɽeba naɽakɯte ɕɯɽeba naɽakɯte ɕɯɽeba naɽakɯte ɕɯɽeba naɽakɯte ɕɯɽeba naɽakɯte ɕɯɽeba naɽakɯte ɕɯɽeba naɽakɯte ɕɯɽeba naɽakɯte ɕɯɽeba naɽakɯte ɕɯɽeba naɽakɯte ɕɯɽeba naɽakɯte ɕɯɽeba naɽakɯte ɕɯɽeba naɽakɯte ɕɯɽeba naɽakɯte ɕɯɽeba naɽakɯte ɕɯɽeba naɽakɯte ɕɯɽeba naɽakɯte ɯ.|6
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sakurai_takahiro/Sakurai_Takahiro_01/Sakurai_Takahiro_01_chunk6.wav|itɕi. ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯ.|1
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/shinichiro_miki/Shinichiro_Miki__01/Shinichiro_Miki__01_chunk1524.wav|kʲaː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː.|4
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sawashiro_miyuki/Sawashiro_Miyuki_02/Sawashiro_Miyuki_02_chunk1106.wav|koɽe ga kazokɯ no kaiwa na no ka, to ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɕɯɯkaɴ ɯɯkaɴ ɯɯkaɴ ɯɯkaɴ ɯɯkaɴ ɯɯkaɴ ɯɯkaɴ ɯɯkaɴ ɯɯkaɴ ɯɯkaɴ ɯɯkaɴ ɯɯkaɴ ɯɯkaɴ ɯɯkaɴ ɯɯkaɴ ɯkaɴ ɯkaɴ ɯkaɴ ɯkaɴ ɯkaɴ ɯka.|2
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/shinichiro_miki/Shinichiro_Miki_03/Shinichiro_Miki_03_chunk771.wav|aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː.|4
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/chiwa_saito/Chiwa_Saito_03/Chiwa_Saito_03_chunk344.wav|eː to, neko, bokɯ ga ima kaɽa iɯ bɯɴɕoɯ o fɯkɯɕoɯ ɕiɽo. naname ɯ ɯɯ ɽiɴ do no naɽabi de ɕɯɯnɯɯ ɕɯɯ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ ɽoɴ .|3
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/shinichiro_miki/Shinichiro_Miki__01/Shinichiro_Miki__01_chunk456.wav|waɽaʔte waɽaʔte waɽaʔte waɽaʔte waɽaʔte.|4
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sawashiro_miyuki/Sawashiro_Miyuki_02/Sawashiro_Miyuki_02_chunk1018.wav|tonikakɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯ.|2
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sawashiro_miyuki/Sawashiro_Miyuki_03/Sawashiro_Miyuki_03_chunk1526.wav|haʔhaʔhaʔhaʔhaʔhaʔhaʔhaʔhaʔhaʔhaʔhaʔhaʔhaʔhaʔhaʔhaʔhaʔhaʔhaʔhaw wa haʔhaʔhaʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔhaw waʔtɽi desɯ ne.|2
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sakurai_takahiro/Sakurai_Takahiro_03/Sakurai_Takahiro_03_chunk1489.wav|ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ.|1
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sawashiro_miyuki/Sawashiro_Miyuki_03/Sawashiro_Miyuki_03_chunk1933.wav|aɽaɽagikɯɴ! aɽaɽagikɯɴ! aɽaɽagikɯɴ!|2
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sawashiro_miyuki/Sawashiro_Miyuki_03/Sawashiro_Miyuki_03_chunk1735.wav|nʲa aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː.|2
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/shinichiro_miki/Shinichiro_Miki_03/Shinichiro_Miki_03_chunk354.wav|nani mo iʔteneː jo, nani mo iʔteneːʔte, nani mo naː.|4
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/horie_yui/Horie_Yui_01/Horie_Yui_01_chunk1622.wav|«haʔhaʔhaʔhaʔ, itai, itai, haʔhaʔ, itai, itai, itai...».|0
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sakamoto_maya/Sakamoto_Maya_03/Sakamoto_Maya_03_chunk1924.wav|nemaki gawaɽi no jɯkata wa, ɕɯɯrʲɯɯ ɕɯɯrʲɯɯ ɕɯɯrʲɯɯ ɕɯɯrʲɯɯ ɕɯɯrʲɯɯ ɕɯɯrʲɯɯ ɕɯɯrʲɯɯ ɕɯɯrʲɯɯ ɕɯɯrʲɯɯ ɕɯɯrʲɯɯ ɕɯɯrʲɯɯ ɕɯɯrʲɯɯ ɕɯɯrʲɯɯ ɕɯɯrʲɯɯ ɕɯɯrʲɯɯ ɕɯɯrʲɯɯ ɕɯɯrʲɯɯ ɕɯɯrʲɯɯ ɕɯɯrʲɯɯ.|6
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/shinichiro_miki/Shinichiro_Miki_03/Shinichiro_Miki_03_chunk2177.wav|kojomi oniːtɕaɴ kojomi oniːtɕaɴ kojomi oniːtɕaɴ kojomi oniːtɕaɴ kojomi oniːtɕaɴ kojomi oniːtɕaɴ kojomi oniːtɕaɴ.|4
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/horie_yui/Horie_Yui_01/Horie_Yui_01_chunk814.wav|kondo tsɯkiçitɕaɴ no hoɯ ni, ɽ ɽ ɽ ɽ ɽi ɽi ɽi ɽi ɽi ɽi ɽi ɽi ɽi ɽi ɽi ɽi ɽi no doɯkoɯ o kiːte okoɯ.|0
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/shinichiro_miki/Shinichiro_Miki_03/Shinichiro_Miki_03_chunk1105.wav|ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɕɯɯ ɯɯ ɕɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ .|4
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sawashiro_miyuki/Sawashiro_Miyuki_03/Sawashiro_Miyuki_03_chunk2136.wav|i!!? aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aː aɯ oɯ kɯɯɯɯ kɯɯɯ kɯɯɯ kɯɯɯ kɯɯɯ kɯɯɯ kɯɯ kɯɯɯ kɯɯ kɯɯɯ kɯɯ kɯɯɯ kɯɯ kɯɯ kɯɯ kɯɯ kɯɯ kɯɯ kɯɯ kɯɯ kɯɯ kɯɯ kɯɯ kɯɯ kɯ k.|2
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sawashiro_miyuki/Sawashiro_Miyuki_03/Sawashiro_Miyuki_03_chunk2134.wav|a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, aɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯɯ.|2
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sakamoto_maya/Sakamoto_Maya_02/Sakamoto_Maya_02_chunk877.wav|ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ ɯɯ.|6
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/kamiya_hiroshi/Kamiya_Hiroshi_01/Kamiya_Hiroshi_01_chunk1765.wav|na na ga tsɯ na na nitɕi.|5
+ /home/austin/disk2/llmvcs/tt/stylekan/Data/moe_res/monogatari/monogatari_voices/monogatari_split/sawashiro_miyuki/Sawashiro_Miyuki_03/Sawashiro_Miyuki_03_chunk1792.wav|a tɕi, a tɕi, a tɕi, a tɕi, a tɕi, a tɕi, a tɕi, a tɕi!|2
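Each row of `hallucinate.csv` appears to follow a `wav_path|phonemes|N` layout. The third field looks like a per-speaker index: every Sawashiro Miyuki clip is `2` and every Shinichiro Miki clip is `4`. That reading is an assumption; the commit does not state it. Many rows end in one token or a short phrase repeated dozens of times. The sketch below parses such rows and flags loops by how few distinct tokens they contain. `parse_row`, `tokens`, `distinct_ratio`, `looks_like_loop` and the `0.3` and `8`-token thresholds are hypothetical, illustrative choices.

```python
from typing import NamedTuple

# Punctuation stripped from token edges before comparing tokens (an assumed set).
PUNCT = ",.!?…«»“”\"'"


class Row(NamedTuple):
    wav_path: str
    phonemes: str
    speaker_id: int  # assumption: third column is a per-speaker index


def parse_row(line: str) -> Row:
    # Split from the right, so only the last two "|" act as separators.
    wav_path, phonemes, speaker = line.rstrip("\n").rsplit("|", 2)
    return Row(wav_path, phonemes, int(speaker))


def tokens(phonemes: str) -> list[str]:
    # Commas also separate tokens ("ɯwa,ɯwa"); edge punctuation is dropped.
    raw = phonemes.replace(",", " ").split()
    stripped = (t.strip(PUNCT) for t in raw)
    return [t for t in stripped if t]


def distinct_ratio(phonemes: str) -> float:
    # 1.0 means every token differs; values near 0 mean a short loop repeated.
    toks = tokens(phonemes)
    return len(set(toks)) / len(toks) if toks else 1.0


def looks_like_loop(phonemes: str, max_ratio: float = 0.3, min_tokens: int = 8) -> bool:
    toks = tokens(phonemes)
    return len(toks) >= min_tokens and len(set(toks)) / len(toks) <= max_ratio
```

The check works on whitespace-delimited tokens. Short loops such as `ɯɯ ɯɯ ɯɯ ɯɯ.` fall under the minimum length, so they are not flagged. The thresholds would need tuning against real data.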
cotlet/phon.py ADDED
@@ -0,0 +1,158 @@
+ from cotlet.utils import *
+ import cutlet
+
+ katsu = cutlet.Cutlet(ensure_ascii=False)
+ katsu.use_foreign_spelling = False
+
+ def process_japanese_text(ml):
+     # Check for small characters and replace them
+     if any(char in ml for char in "ぁぃぅぇぉ"):
+
+         ml = ml.replace("ぁ", "あ")
+         ml = ml.replace("ぃ", "い")
+         ml = ml.replace("ぅ", "う")
+         ml = ml.replace("ぇ", "え")
+         ml = ml.replace("ぉ", "お")
+
+     # Romaji conversion uses the module-level Cutlet instance `katsu`
+
+     # Convert to romaji and apply transformations
+     # output = katsu.romaji(ml, capitalize=False).lower()
+
+     output = katsu.romaji(apply_transformations(alphabetreading(ml)), capitalize=False).lower()
+
+     # Replace specific romaji sequences
+     if 'j' in output:
+         output = output.replace('j', "dʑ")
+     if 'tt' in output:
+         output = output.replace('tt', "ʔt")
+     if 't t' in output:
+         output = output.replace('t t', "ʔt")
+     if ' ʔt' in output:
+         output = output.replace(' ʔt', "ʔt")
+     if 'ssh' in output:
+         output = output.replace('ssh', "ɕɕ")
+
+     # Convert romaji to IPA
+     output = Roma2IPA(convert_numbers_in_string(output))
+
+
+     output = hira2ipa(output)
+
+     # Apply additional transformations
+     output = replace_chars_2(output)
+     output = replace_repeated_chars(replace_tashdid_2(output))
+     output = nasal_mapper(output)
+
+     # Final adjustments
+     if " ɴ" in output:
+         output = output.replace(" ɴ", "ɴ")
+
+     if ' neɽitai ' in output:
+         output = output.replace(' neɽitai ', "naɽitai")
+
+     if 'harɯdʑisama' in output:
+         output = output.replace('harɯdʑisama', "arɯdʑisama")
+
+
+     if "ki ni ɕinai" in output:
+         output = re.sub(r'(?<!\s)ki ni ɕinai', r' ki ni ɕinai', output)
+
+     if 'ʔt' in output:
+         output = re.sub(r'(?<!\s)ʔt', r'ʔt', output)  # NOTE: no-op as written (the replacement equals the match)
+
+     if 'de aɽoɯ' in output:
+         output = re.sub(r'(?<!\s)de aɽoɯ', r' de aɽoɯ', output)
+
+
+     return output.lstrip()
+
+ # def replace_repeating_patterns(text):
+ #     def replace_repeats(match):
+ #         pattern = match.group(1)
+ #         if len(match.group(0)) // len(pattern) >= 3:
+ #             return pattern + "~~~"
+ #         return match.group(0)
+
+ #     # Pattern for space-separated repeats
+ #     pattern1 = r'((?:\S+\s+){1,5}?)(?:\1){2,}'
+ #     # Pattern for continuous repeats without spaces
+ #     pattern2 = r'(.+?)\1{2,}'
+
+ #     text = re.sub(pattern1, replace_repeats, text)
+ #     text = re.sub(pattern2, replace_repeats, text)
+ #     return text
+
+
+ def replace_repeating_a(output):
+     # Define patterns and their replacements
+     patterns = [
+         (r'(aː)\s*\1+\s*', r'\1~'),  # "aː aː" / "aːaː" plus trailing whitespace -> "aː~"
+         (r'(aːa)\s*aː', r'\1~'),  # "aːa aː" -> "aːa~"
+         (r'aːa', r'aː~'),  # "aːa" -> "aː~"
+         (r'naː\s*aː', r'naː~'),  # "naː aː" -> "naː~"
+         (r'(oː)\s*\1+\s*', r'\1~'),  # "oː oː" / "oːoː" plus trailing whitespace -> "oː~"
+         (r'(oːo)\s*oː', r'\1~'),  # "oːo oː" -> "oːo~"
+         (r'oːo', r'oː~'),  # "oːo" -> "oː~"
+         (r'(eː)\s*\1+\s*', r'\1~'),
+         (r'(e)\s*\1+\s*', r'\1~'),
+         (r'(eːe)\s*eː', r'\1~'),
+         (r'eːe', r'eː~'),
+         (r'neː\s*eː', r'neː~'),
+     ]
+
+
+     # Apply each pattern to the output
+     for pattern, replacement in patterns:
+         output = re.sub(pattern, replacement, output)
+
+     return output
+
+ def phonemize(text):
+
+     if "っ" in text:
+         text = text.replace("っ", "ʔ")
+
+     output = post_fix(process_japanese_text(text))
+     # output = text
+
+     if " ɴ" in output:
+         output = output.replace(" ɴ", "ɴ")
+     if "y" in output:
+         output = output.replace("y", "j")
+     if "ɯa" in output:
+         output = output.replace("ɯa", "wa")
+
+     if "a aː" in output:
+         output = output.replace("a aː", "a~")
+     if "a a" in output:
+         output = output.replace("a a", "a~")
+
+
+
+
+
+     output = replace_repeating_a(output)
+     output = re.sub(r'\s+~', '~', output)
+
+     if "oː~o oː~ o" in output:
+         output = output.replace("oː~o oː~ o", "oː~~~~~~")
+     if "aː~aː" in output:
+         output = output.replace("aː~aː", "aː~~~")
+     if "oɴ naː" in output:
+         output = output.replace("oɴ naː", "onnaː")
+     if "aː~~ aː" in output:
+         output = output.replace("aː~~ aː", "aː~~~~")
+     if "oː~o" in output:
+         output = output.replace("oː~o", "oː~~")
+     if "oː~~o o" in output:
+         output = output.replace("oː~~o o", "oː~~~~")  # yeah I'm too tired to learn regex how did you know
+
+     output = random_space_fix(output)
+     output = random_sym_fix(output)  # fix symbols followed by whitespace, e.g. "miku& sakura" -> "miku ando sakura"
+     output = random_sym_fix_no_space(output)  # same, for symbols without whitespace, e.g. "miku&sakura" -> "miku ando sakura"
+
+     return output.lstrip()
+
+ # def process_row(row):
+ #     return {'phonemes': [phonemize(word) for word in row['phonemes']]}

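`replace_repeating_a` collapses repeats one regex match at a time. `(aː)\s*\1+\s*` turns `aː aː aː` into `aː~aː`, which `phonemize` later rewrites through its `"aː~aː"` special case. The trailing `\s*` is also consumed, so `aː aː kore` becomes `aː~kore` and the word boundary is lost.

A single regex can instead collapse a whole run of any repeated whitespace-delimited token while leaving the following space and punctuation alone. Treat this as a hedged alternative only. `collapse_runs`, its `min_repeats` default of 3 and the `~` marker are illustrative assumptions and are not part of this commit. The sketch also handles single-token repeats only, not phrase loops such as `ɕi no bɯ... ɕi no bɯ...`.

```python
import re

# A whitespace-delimited token followed by one or more whitespace-separated
# copies of itself. The lookahead stops a copy from matching only the prefix
# of a longer word, while still allowing trailing punctuation after the run.
_RUN = re.compile(r"(?<!\S)(\S+)(?:\s+\1)+(?![^\s.,!?…])")


def collapse_runs(text: str, min_repeats: int = 3, marker: str = "~") -> str:
    """Replace a run of >= min_repeats identical tokens with '<token><marker>'."""
    def repl(m: re.Match) -> str:
        count = len(m.group(0).split())  # number of copies in this run
        return m.group(1) + marker if count >= min_repeats else m.group(0)

    return _RUN.sub(repl, text)
```

For example, `collapse_runs("ʔte, oi oi oi oi oi oi oi!")` keeps the leading word and the final `!` and yields `ʔte, oi~!`.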
cotlet/sanity_check.py ADDED
@@ -0,0 +1,46 @@
+ import csv
+ import wave
+ import os
+ from tqdm import tqdm
+ def verify_wav_file(file_path):
+     try:
+         with wave.open(file_path, 'rb') as wav_file:
+             # Try to read some basic properties
+             channels = wav_file.getnchannels()
+             sample_width = wav_file.getsampwidth()
+             framerate = wav_file.getframerate()
+             frames = wav_file.getnframes()
+
+         # If we got here, the header parsed, so the file is likely valid
+         return True
+     except Exception as e:
+         print(f"Error processing {file_path}: {e}")
+         return False
+
+ def main():
+     csv_path = "/home/austin/disk1/stts-zs_cleaning/data/filename.csv"
+     total_files = 0
+     valid_files = 0
+
+     with open(csv_path, 'r', encoding='utf-8', newline='') as csv_file:
+         csv_reader = csv.reader(csv_file, delimiter='|')
+         for row in tqdm(csv_reader, desc="Verifying files", unit="file"):
+             if row:  # Check if the row is not empty
+                 wav_path = row[0]
+                 total_files += 1
+
+                 if os.path.exists(wav_path):
+                     if verify_wav_file(wav_path):
+                         valid_files += 1
+                     else:
+                         print(f"File is corrupted or invalid: {wav_path}")
+                 else:
+                     print(f"File does not exist: {wav_path}")
+
+     print("\nVerification completed.")
+     print(f"Total files checked: {total_files}")
+     print(f"Valid files: {valid_files}")
+     print(f"Invalid or missing files: {total_files - valid_files}")
+
+ if __name__ == "__main__":
+     main()

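`verify_wav_file` only opens the file and reads header fields. A file whose header is intact but whose sample data was cut short, for example by an interrupted copy, therefore still counts as valid. The sketch below also reads every frame and compares the byte count with what the header promises.

`check_wav` is a hypothetical helper that relies only on the standard-library `wave` module. That module handles uncompressed PCM WAV files, so non-PCM encodings such as IEEE float would be reported as unreadable here, just as the original check rejects them.

```python
import wave


def check_wav(path: str) -> tuple[bool, str]:
    """Return (ok, reason) after parsing the header and reading every frame."""
    try:
        with wave.open(path, "rb") as wav:
            nframes = wav.getnframes()
            frame_bytes = wav.getnchannels() * wav.getsampwidth()
            if nframes == 0:
                return False, "no audio frames"
            # readframes() returns fewer bytes, without raising, when the
            # data chunk is shorter than the header claims.
            data = wav.readframes(nframes)
    except Exception as exc:  # e.g. wave.Error, EOFError, struct.error, OSError
        return False, f"unreadable: {exc}"
    expected = nframes * frame_bytes
    if len(data) < expected:
        return False, f"truncated: {len(data)} of {expected} bytes"
    return True, "ok"
```

In `main`, `check_wav(wav_path)[0]` could stand in for `verify_wav_file(wav_path)`. The returned reason string could be printed in place of the generic "corrupted or invalid" message. Reading every frame loads each clip fully into memory, which should be acceptable for short chunks like these.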
cotlet/utils.py ADDED
@@ -0,0 +1,1003 @@
+
+ import re
+ import cutlet
+
+
+ formal_to_informal = {
+
+
+
+     'ワタクシ': 'わたし',
+     'チカコ': 'しゅうこ',
+     "タノヒト": "ほかのひと",
+
+     # Add more mappings as needed
+ }
+
+ formal_to_informal2 = {
+
+     "たのひと": "ほかのひと",
+     "すうは": "かずは",
+
+
+     # Add more mappings as needed
+ }
+
+ formal_to_informal3 = {
+
+     "%": "%",
+     "@": "あっとさいん",
+     "$": "どる",
+     "#": "はっしゅたぐ",
+     "$": "どる",  # duplicate key (same value as above); has no effect
+     "#": "はっしゅたぐ",  # duplicate key (same value as above); has no effect
+     "何が": "なにが",
+
+     "何も": "なにも",
+     "何か": "なにか",
+     # "奏": "かなで",
+     "何は": "なにが",
+     "お父様": "おとうさま",
+     "お兄様": "おにいさま",
+     "何を": "なにを",
+     "良い": "いい",
+     "李衣菜": "りいな",
+     "志希": "しき",
+     "種": "たね",
+     "方々": "かたがた",
+     "颯": "はやて",
+     "茄子さん": "かこさん",
+     "茄子ちゃん": "かこちゃん",
+     "涼ちゃん": "りょうちゃん",
+     "涼さん": "りょうさん",
+     "紗枝": "さえ",
+     "文香": "ふみか",
+     "私": "わたし",
+     "周子": "しゅうこ",
+     "イェ": "いえ",
+     "可憐": "かれん",
+     "加蓮": "かれん",
+     "・": ".",
+     "方の": "かたの",
+     "気に": "きに",
+     "唯さん": "ゆいさん",
+     "唯ちゃん": "ゆいちゃん",
+     "聖ちゃん": "ひじりちゃん",
+     "他の": "ほかの",
+     "他に": "ほかに",
+     "一生懸命": "いっしょうけんめい",
+     "楓さん": "かえでさん",
+     "楓ちゃん": "かえでちゃん",
+     "内から": "ないから",
+     "の下で": "のしたで",
+
+ }
+
+
+ mapper = dict([
+
+     ("仕方", "しかた"),
+     ("明日", "あした"),
+     ('私', "わたし"),
+     ("従妹", "いとこ"),
+
+     ("1人", "ひとり"),
+     ("2人", "ふたり"),
+
+     ("一期", "いちご"),
+     ("一会", "いちえ"),
+
+     ("♪", "!"),
+     ("?", "?"),
+
+     ("どんな方", "どんなかた"),
+     ("ふたり暮らし", "ふたりぐらし"),
+
+     ("新年", "しんねん"),
+     ("来年", "らいねん"),
+     ("去年", "きょねん"),
+     ("壮年", "そうねん"),
+     ("今年", "ことし"),
+
+     ("昨年", "さくねん"),
+     ("本年", "ほんねん"),
+     ("平年", "へいねん"),
+     ("閏年", "うるうどし"),
+     ("初年", "しょねん"),
+     ("少年", "しょうねん"),
+     ("多年", "たねん"),
+     ("青年", "せいねん"),
+     ("中年", "ちゅうねん"),
+     ("老年", "ろうねん"),
+     ("成年", "せいねん"),
+     ("幼年", "ようねん"),
+     ("前年", "ぜんねん"),
+     ("元年", "がんねん"),
+     ("経年", "けいねん"),
+     ("当年", "とうねん"),
+
+     ("明年", "みょうねん"),
+     ("歳年", "さいねん"),
+     ("数年", "すうねん"),
+     ("半年", "はんとし"),
+     ("後年", "こうねん"),
+     ("実年", "じつねん"),
+     ("年年", "ねんねん"),
+     ("連年", "れんねん"),
+     ("暦年", "れきねん"),
+     ("各年", "かくねん"),
+     ("全年", "ぜんねん"),
+
+     ("年を", "としを"),
+     ("年が", "としが"),
+     ("年も", "としも"),
+     ("年は", "としは"),
+
+
+     ("奏ちゃん", "かなでちゃん"),
+     ("負けず嫌い", "まけずぎらい"),
+     ("貴方", "あなた"),
+     ("貴女", "あなた"),
+     ("貴男", "あなた"),
+
+     ("その節", "そのせつ"),
+
+     ("何し", "なにし"),
+     ("何する", "なにする"),
+
+     ("心さん", "しんさん"),
+     ("心ちゃん", "しんちゃん"),
+
+     ("乃々", "のの"),
+
+     ("身体の", "からだの"),
+     ("身体が", "からだが"),
+     ("身体を", "からだを"),
+     ("身体は", "からだは"),
+     ("身体に", "からだに"),
+     ("正念場", "しょうねんば"),
+     ("言う", "いう"),
+
+
+     ("一回", "いっかい"),
+     ("一曲", "いっきょく"),
+     ("一日", "いちにち"),
+     ("一言", "ひとこと"),
+     ("一杯", "いっぱい"),
+
+
+     ("方が", "ほうが"),
+     ("縦輪城", "じゅうりんしろ"),
+     ("深息", "しんそく"),
+     ("家人", "かじん"),
+     ("お返し", "おかえし"),
+     ("化物語", "ばけものがたり"),
+     ("阿良々木暦", "あららぎこよみ"),
+     ("何より", "なにより")
+
+
+ ])
+
+
+ # Merge all dictionaries into one (later dictionaries win on duplicate keys)
+ all_transformations = {**formal_to_informal, **formal_to_informal2, **formal_to_informal3, **mapper}
+
+ def apply_transformations(text, transformations=all_transformations):
+     for key, value in transformations.items():
+         text = text.replace(key, value)
+     return text
+
+
+ def number_to_japanese(num):
+     if not isinstance(num, int) or num < 0 or num > 9999:
+         return "Invalid input"
+
+     digits = ["", "いち", "に", "さん", "よん", "ご", "ろく", "なな", "はち", "きゅう"]
+     tens = ["", "じゅう", "にじゅう", "さんじゅう", "よんじゅう", "ごじゅう", "ろくじゅう", "ななじゅう", "はちじゅう", "きゅうじゅう"]
+     hundreds = ["", "ひゃく", "にひゃく", "さんびゃく", "よんひゃく", "ごひゃく", "ろっぴゃく", "ななひゃく", "はっぴゃく", "きゅうひゃく"]
+     thousands = ["", "せん", "にせん", "さんぜん", "よんせん", "ごせん", "ろくせん", "ななせん", "はっせん", "きゅうせん"]
+
+     if num == 0:
+         return "ゼロ"
+
+     result = ""
+     if num >= 1000:
+         result += thousands[num // 1000]
+         num %= 1000
+     if num >= 100:
+         result += hundreds[num // 100]
+         num %= 100
+     if num >= 10:
+         result += tens[num // 10]
+         num %= 10
+     if num > 0:
+         result += digits[num]
+
+     return result
+
+ def convert_numbers_in_string(input_string):
+     # Regular expression to find numbers in the string
+     number_pattern = re.compile(r'\d+')
+
+     # Function to replace numbers with their Japanese pronunciation
+     def replace_with_japanese(match):
+         num = int(match.group())
+         return number_to_japanese(num) if num <= 9999 else match.group()  # keep out-of-range numbers instead of inserting "Invalid input"
+
+     # Replace all occurrences of numbers in the string
+     converted_string = number_pattern.sub(replace_with_japanese, input_string)
+     return converted_string
+
+
+
+ roma_mapper = dict([
+
+     ################################
+
+     ("my", "mʲ"),
+     ("by", "bʲ"),
+     ("ny", "nʲ"),
+     ("ry", "rʲ"),
+     ("si", "sʲ"),
+     ("ky", "kʲ"),
+     ("gy", "gʲ"),
+     ("dy", "dʲ"),
+     ("di", "dʲ"),
+     ("fi", "fʲ"),
+     ("fy", "fʲ"),
+     ("ch", "tɕ"),
+     ("sh", "ɕ"),
+
+     ################################
+
+     ("a", "a"),
+     ("i", "i"),
+     ("u", "ɯ"),
+     ("e", "e"),
+     ("o", "o"),
+     ("ka", "ka"),
+     ("ki", "ki"),
+     ("ku", "kɯ"),
+     ("ke", "ke"),
+     ("ko", "ko"),
+     ("sa", "sa"),
+     ("shi", "ɕi"),
+     ("su", "sɯ"),
+     ("se", "se"),
+     ("so", "so"),
+     ("ta", "ta"),
+     ("chi", "tɕi"),
+     ("tsu", "tsɯ"),
+     ("te", "te"),
+     ("to", "to"),
+     ("na", "na"),
+     ("ni", "ni"),
+     ("nu", "nɯ"),
+     ("ne", "ne"),
+     ("no", "no"),
+     ("ha", "ha"),
+     ("hi", "çi"),
+     ("fu", "ɸɯ"),
+     ("he", "he"),
+     ("ho", "ho"),
+     ("ma", "ma"),
+     ("mi", "mi"),
+     ("mu", "mɯ"),
+     ("me", "me"),
+     ("mo", "mo"),
+     ("ra", "ɽa"),
+     ("ri", "ɽi"),
+     ("ru", "ɽɯ"),
+     ("re", "ɽe"),
+     ("ro", "ɽo"),
+     ("ga", "ga"),
+     ("gi", "gi"),
+     ("gu", "gɯ"),
+     ("ge", "ge"),
+     ("go", "go"),
+     ("za", "za"),
+     ("ji", "dʑi"),
+     ("zu", "zɯ"),
+     ("ze", "ze"),
+     ("zo", "zo"),
+     ("da", "da"),
+
+
+     ("zu", "zɯ"),
+     ("de", "de"),
+     ("do", "do"),
+     ("ba", "ba"),
+     ("bi", "bi"),
+     ("bu", "bɯ"),
+     ("be", "be"),
+     ("bo", "bo"),
+     ("pa", "pa"),
+     ("pi", "pi"),
+     ("pu", "pɯ"),
+     ("pe", "pe"),
+     ("po", "po"),
+     ("ya", "ja"),
+     ("yu", "jɯ"),
+     ("yo", "jo"),
+     ("wa", "wa"),
+
+
+
+
+     ("a", "a"),
+     ("i", "i"),
+     ("u", "ɯ"),
+     ("e", "e"),
+     ("o", "o"),
+     ("wa", "wa"),
+     ("o", "o"),
334
+
335
+
336
+ ("wo","o")])
337
+
338
+ nasal_sound = dict([
339
+ # before m, p, b
340
+ ("ɴm","mm"),
341
+ ("ɴb", "mb"),
342
+ ("ɴp", "mp"),
343
+
344
+ # before k, g
345
+ ("ɴk","ŋk"),
346
+ ("ɴg", "ŋg"),
347
+
348
+ # before t, d, n, s, z, ɽ
349
+ ("ɴt","nt"),
350
+ ("ɴd", "nd"),
351
+ ("ɴn","nn"),
352
+ ("ɴs", "ns"),
353
+ ("ɴz","nz"),
354
+ ("ɴɽ", "nɽ"),
355
+
356
+ ("ɴɲ", "ɲɲ"),
357
+
358
+ ])
359
+
360
+ def Roma2IPA(text):
361
+ orig = text
362
+
363
+ for k, v in roma_mapper.items():
364
+ text = text.replace(k, v)
365
+
366
+ return text
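`Roma2IPA` applies `roma_mapper` as a series of `str.replace` calls in dictionary order. The single-vowel rule `("u","ɯ")` is inserted before `("fu","ɸɯ")`, `("ru","ɽɯ")` and `("yu","jɯ")`, so those longer rules never match: "fu" is already "fɯ" by the time they are tried.

The sketch below instead makes a single left-to-right pass that always prefers the longest key. This is the same approach `replace_chars_2` uses further down. The mapping is a hypothetical four-entry subset, not the full table.

```python
import re


def sequential_replace(text: str, mapping: dict) -> str:
    # What Roma2IPA does: one str.replace per key, in insertion order.
    for key, value in mapping.items():
        text = text.replace(key, value)
    return text


def longest_match_replace(text: str, mapping: dict) -> str:
    # One pass; at each position the longest matching key wins, and the
    # replacement output is never rescanned by later rules.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


# Hypothetical subset, listed in the same relative order as roma_mapper.
subset = {"u": "ɯ", "fu": "ɸɯ", "ru": "ɽɯ", "yu": "jɯ"}

print(sequential_replace("furu", subset))     # fɯrɯ
print(longest_match_replace("furu", subset))  # ɸɯɽɯ
```

Calling `replace_chars_2(text, mapping=roma_mapper)` should give the same longest-match behaviour with the existing helper. Output for every affected syllable would change, though.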
367
+
368
+ def nasal_mapper(text):
369
+ orig = text
370
+
371
+
372
+ for k, v in nasal_sound.items():
373
+ text = text.replace(k, v)
374
+
375
+ return text
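`nasal_sound` spells out moraic ɴ before each following consonant one pair at a time. The same table can be written as three place-of-articulation classes plus the ɲ case. A lookahead leaves the following consonant in place. The sketch below assumes the same consonant inventory as `nasal_sound`. `NASAL_RULES` and `assimilate_nasal` are hypothetical names.

```python
import re

# (pattern, replacement) pairs mirroring the nasal_sound table.
NASAL_RULES = [
    (re.compile(r"ɴ(?=[mbp])"), "m"),     # before bilabials
    (re.compile(r"ɴ(?=[kg])"), "ŋ"),      # before velars
    (re.compile(r"ɴ(?=[tdnszɽ])"), "n"),  # before alveolars
    (re.compile(r"ɴ(?=ɲ)"), "ɲ"),         # before the palatal nasal
]


def assimilate_nasal(text: str) -> str:
    # The lookahead only inspects the next character, so it is not consumed.
    for pattern, replacement in NASAL_RULES:
        text = pattern.sub(replacement, text)
    return text


print(assimilate_nasal("saɴpo geɴki hoɴda hoɴ"))  # sampo geŋki honda hoɴ
```

With the rules grouped this way, covering another consonant means editing one character class rather than adding a new pair for it.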
376
+
377
+ def alphabetreading(text):
378
+ alphabet_dict = {"A": "エイ",
379
+ "B": "ビー",
380
+ "C": "シー",
381
+ "D": "ディー",
382
+ "E": "イー",
383
+ "F": "エフ",
384
+ "G": "ジー",
385
+ "H": "エイチ",
386
+ "I":"アイ",
387
+ "J":"ジェイ",
388
+ "K":"ケイ",
389
+ "L":"エル",
390
+ "M":"エム",
391
+ "N":"エヌ",
392
+ "O":"オー",
393
+ "P":"ピー",
394
+ "Q":"キュー",
395
+ "R":"アール",
396
+ "S":"エス",
397
+ "T":"ティー",
398
+ "U":"ユー",
399
+ "V":"ヴィー",
400
+ "W":"ダブリュー",
401
+ "X":"エックス",
402
+ "Y":"ワイ",
403
+ "Z":"ゼッド"}
404
+ text = text.upper()
405
+ text_ret = ""
406
+ for t in text:
407
+ if t in alphabet_dict:
408
+ text_ret += alphabet_dict[t]
409
+ else:
410
+ text_ret += t
411
+ return text_ret
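`alphabetreading` spells out every Latin letter one character at a time. The same lookup can be written with `str.maketrans` and `str.translate`, which apply a per-character table in a single call. The sketch below reuses the readings above; `LETTER_READINGS` and `alphabet_reading` are hypothetical names. Like the original, it spells whole words letter by letter, so it suits acronyms rather than English words.

```python
# Same letter readings as alphabet_dict in alphabetreading.
LETTER_READINGS = {
    "A": "エイ", "B": "ビー", "C": "シー", "D": "ディー", "E": "イー",
    "F": "エフ", "G": "ジー", "H": "エイチ", "I": "アイ", "J": "ジェイ",
    "K": "ケイ", "L": "エル", "M": "エム", "N": "エヌ", "O": "オー",
    "P": "ピー", "Q": "キュー", "R": "アール", "S": "エス", "T": "ティー",
    "U": "ユー", "V": "ヴィー", "W": "ダブリュー", "X": "エックス",
    "Y": "ワイ", "Z": "ゼッド",
}
# str.maketrans accepts a dict mapping single characters to strings.
LETTER_TABLE = str.maketrans(LETTER_READINGS)


def alphabet_reading(text: str) -> str:
    # Upper-case first, as alphabetreading does, then map every letter at once.
    return text.upper().translate(LETTER_TABLE)


print(alphabet_reading("DJ"))  # ディージェイ
```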
412
+
413
+
414
+ roma_mapper_plus_2 = {
415
+
416
+ "bjo":'bʲo',
417
+ "rjo":"rʲo",
418
+ "kjo":"kʲo",
419
+ "kyu":"kʲɯ",
420
+
421
+ }
422
+
423
+ def replace_repeated_chars(input_string):
424
+ result = []
425
+ i = 0
426
+ while i < len(input_string):
427
+ if i + 1 < len(input_string) and input_string[i] == input_string[i + 1] and input_string[i] in 'aiueo':
428
+ result.append(input_string[i] + 'ː')
429
+ i += 2
430
+ else:
431
+ result.append(input_string[i])
432
+ i += 1
433
+ return ''.join(result)
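`replace_repeated_chars` merges exactly one pair of identical vowels, so "aaa" becomes "aːa". Its vowel set is `'aiueo'`, so a doubled ɯ (the IPA form `Roma2IPA` produces for u) is left as "ɯɯ". The regex sketch below collapses a whole run of the same vowel, ɯ included, into one long vowel; `lengthen_vowels` is a hypothetical name. Collapsing a run drops the extra mora in sequences such as "aaa". Keep the original pairwise behaviour if those morae matter.

```python
import re

# A vowel followed by one or more copies of itself.
VOWEL_RUN = re.compile(r"([aiueoɯ])\1+")


def lengthen_vowels(text: str) -> str:
    # Replace the whole run with the vowel plus the length mark ː.
    return VOWEL_RUN.sub(r"\1ː", text)


print(lengthen_vowels("kaaa sɯɯdʑi"))  # kaː sɯːdʑi
```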
434
+
435
+
436
+ def replace_chars_2(text, mapping=roma_mapper_plus_2):
437
+
438
+
439
+ sorted_keys = sorted(mapping.keys(), key=len, reverse=True)
440
+
441
+ pattern = '|'.join(re.escape(key) for key in sorted_keys)
442
+
443
+
444
+ def replace(match):
445
+ key = match.group(0)
446
+ return mapping.get(key, key)
447
+
448
+ return re.sub(pattern, replace, text)
449
+
450
+
451
+ def replace_tashdid_2(s):
452
+ vowels = r'aiueoɯ0123456789.?!_。؟?!...@@##$$%%^^&&**()()_+=[「」]></\`~~―ー∺"'
453
+ result = []
454
+
455
+ i = 0
456
+ while i < len(s):
457
+ if i < len(s) - 2 and s[i].lower() == s[i + 2].lower() and s[i].lower() not in vowels and s[i + 1] == ' ':
458
+ result.append('ʔ')
459
+ result.append(s[i + 2])
460
+ i += 3
461
+ elif i < len(s) - 1 and s[i].lower() == s[i + 1].lower() and s[i].lower() not in vowels:
462
+ result.append('ʔ')
463
+ result.append(s[i + 1])
464
+ i += 2
465
+ else:
466
+ result.append(s[i])
467
+ i += 1
468
+
469
+ return ''.join(result)
470
+
471
+ def replace_tashdid(input_string):
472
+ result = []
473
+ i = 0
474
+ while i < len(input_string):
475
+ if i + 1 < len(input_string) and input_string[i] == input_string[i + 1] and input_string[i] not in 'aiueo':
476
+ result.append('ʔ')
477
+ result.append(input_string[i])
478
+ i += 2 # Skip the next character as it is already processed
479
+ else:
480
+ result.append(input_string[i])
481
+ i += 1
482
+ return ''.join(result)
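Both `replace_tashdid` and `replace_tashdid_2` mark any doubled non-vowel character as geminated. Doubled nasals, spaces and punctuation therefore get a ʔ too: "konna" becomes "koʔna" with `replace_tashdid`. Whether such input actually reaches these functions depends on the call order, which this section does not show.

Sokuon (っ) mostly precedes obstruents. A sketch can therefore restrict the rule to an explicit consonant class while keeping `replace_tashdid_2`'s optional-space case. The `GEMINABLE` class is an assumption to adjust, and `mark_gemination` is a hypothetical name.

```python
import re

# Assumed set of consonants that can be geminated (an obstruent class);
# nasals are excluded because "nn"/"mm" normally come from ん, not っ.
GEMINABLE = "kgsztdpbhfɸɕç"
GEMINATE = re.compile(f"([{GEMINABLE}]) ?\\1")


def mark_gemination(text: str) -> str:
    # "kk" or "k k" -> "ʔk": keep one consonant and prefix a glottal stop.
    return GEMINATE.sub(r"ʔ\1", text)


print(mark_gemination("katta"))  # kaʔta
print(mark_gemination("k ka"))   # ʔka
print(mark_gemination("konna"))  # konna
```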
483
+
484
+ def hira2ipa(text, roma_mapper=roma_mapper):
485
+ keys_set = set(roma_mapper.keys())
486
+ special_rule = ("n", "ɴ")
487
+
488
+ transformed_text = []
489
+ i = 0
490
+
491
+ while i < len(text):
492
+ if text[i] == special_rule[0]:
493
+ if i + 1 == len(text) or text[i + 1] not in keys_set:
494
+ transformed_text.append(special_rule[1])
495
+ else:
496
+ transformed_text.append(text[i])
497
+ else:
498
+ transformed_text.append(text[i])
499
+
500
+ i += 1
501
+
502
+ return ''.join(transformed_text)
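Despite its name, `hira2ipa` only decides whether a romaji n is the moraic nasal. It writes ɴ unless the next character is itself a key of `roma_mapper`. The only one-character keys are a, i, u, e and o, so "n" before "y" also becomes "ɴ". That breaks the "ny" → "nʲ" rule if that rule runs afterwards. The same happens to "n" before "ɯ", and `spaces` contains a ("ɴɯ","nɯ") rule that undoes it.

The lookahead sketch below states the vowel and glide set explicitly; `mark_moraic_n` is a hypothetical name. Romaji such as "konya" stays ambiguous (こんや vs こにゃ) without an apostrophe.

```python
import re

# n is an onset only before a vowel (including IPA ɯ), y, or the palatal
# mark ʲ; everywhere else, including the end of the string, it becomes ɴ.
MORAIC_N = re.compile(r"n(?![aiueoɯyʲ])")


def mark_moraic_n(text: str) -> str:
    return MORAIC_N.sub("ɴ", text)


print(mark_moraic_n("hon"))   # hoɴ
print(mark_moraic_n("onna"))  # oɴna
print(mark_moraic_n("nɯno"))  # nɯno
```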
503
+
504
+ k_mapper = dict([
505
+ ("ゔぁ","ba"),
506
+ ("ゔぃ","bi"),
507
+ ("ゔぇ","be"),
508
+ ("ゔぉ","bo"),
509
+ ("ゔゃ","bʲa"),
510
+ ("ゔゅ","bʲɯ"),
511
+ ("ゔゃ","bʲa"),
512
+ ("ゔょ","bʲo"),
513
+
514
+ ("ゔ","bɯ"),
515
+
516
+ ("あぁ"," aː"),
517
+ ("いぃ"," iː"),
518
+ ("いぇ"," je"),
519
+ ("いゃ"," ja"),
520
+ ("うぅ"," ɯː"),
521
+ ("えぇ"," eː"),
522
+ ("おぉ"," oː"),
523
+ ("かぁ"," kaː"),
524
+ ("きぃ"," kiː"),
525
+ ("くぅ","kɯː"),
526
+ ("くゃ","kʲa"),
527
+ ("くゅ","kʲɯ"),
528
+ ("くょ","kʲo"),
529
+ ("けぇ","keː"),
530
+ ("こぉ","koː"),
531
+ ("がぁ","gaː"),
532
+ ("ぎぃ","giː"),
533
+ ("ぐぅ","gɯː"),
534
+ ("ぐゃ","gʲa"),
535
+ ("ぐゅ","gʲɯ"),
536
+ ("ぐょ","gʲo"),
537
+ ("げぇ","geː"),
538
+ ("ごぉ","goː"),
539
+ ("さぁ","saː"),
540
+ ("しぃ","ɕiː"),
541
+ ("すぅ","sɯː"),
542
+ ("すゃ","sʲa"),
543
+ ("すゅ","sʲɯ"),
544
+ ("すょ","sʲo"),
545
+ ("せぇ","seː"),
546
+ ("そぉ","soː"),
547
+ ("ざぁ","zaː"),
548
+ ("じぃ","dʑiː"),
549
+ ("ずぅ","zɯː"),
550
+ ("ずゃ","zʲa"),
551
+ ("ずゅ","zʲɯ"),
552
+ ("ずょ","zʲo"),
553
+ ("ぜぇ","zeː"),
554
+ ("ぞぉ","zoː"),
555
+ ("たぁ","taː"),
556
+ ("ちぃ","tɕiː"),
557
+ ("つぁ","tsa"),
558
+ ("つぃ","tsi"),
559
+ ("つぅ","tsɯː"),
560
+ ("つゃ","tɕa"),
561
+ ("つゅ","tɕɯ"),
562
+ ("つょ","tɕo"),
563
+ ("つぇ","tse"),
564
+ ("つぉ","tso"),
565
+ ("てぇ","teː"),
566
+ ("とぉ","toː"),
567
+ ("だぁ","daː"),
568
+ ("ぢぃ","dʑiː"),
569
+ ("づぅ","dɯː"),
570
+ ("づゃ","zʲa"),
571
+ ("づゅ","zʲɯ"),
572
+ ("づょ","zʲo"),
573
+ ("でぇ","deː"),
574
+ ("どぉ","doː"),
575
+ ("なぁ","naː"),
576
+ ("にぃ","niː"),
577
+ ("ぬぅ","nɯː"),
578
+ ("ぬゃ","nʲa"),
579
+ ("ぬゅ","nʲɯ"),
580
+ ("ぬょ","nʲo"),
581
+ ("ねぇ","neː"),
582
+ ("のぉ","noː"),
583
+ ("はぁ","haː"),
584
+ ("ひぃ","çiː"),
585
+ ("ふぅ","ɸɯː"),
586
+ ("ふゃ","ɸʲa"),
587
+ ("ふゅ","ɸʲɯ"),
588
+ ("ふょ","ɸʲo"),
589
+ ("へぇ","heː"),
590
+ ("ほぉ","hoː"),
591
+ ("ばぁ","baː"),
592
+ ("びぃ","biː"),
593
+ ("ぶぅ","bɯː"),
594
+ ("ぶゃ","bʲa"),
595
+ ("ぶゅ","bʲɯ"),
596
+ ("ぶょ","bʲo"),
597
+ ("べぇ","beː"),
598
+ ("ぼぉ","boː"),
599
+ ("ぱぁ","paː"),
600
+ ("ぴぃ","piː"),
601
+ ("ぷぅ","pɯː"),
602
+ ("ぷゃ","pʲa"),
603
+ ("ぷゅ","pʲɯ"),
604
+ ("ぷょ","pʲo"),
605
+ ("ぺぇ","peː"),
606
+ ("ぽぉ","poː"),
607
+ ("まぁ","maː"),
608
+ ("みぃ","miː"),
609
+ ("むぅ","mɯː"),
610
+ ("むゃ","mʲa"),
611
+ ("むゅ","mʲɯ"),
612
+ ("むょ","mʲo"),
613
+ ("めぇ","meː"),
614
+ ("もぉ","moː"),
615
+ ("やぁ","jaː"),
616
+ ("ゆぅ","jɯː"),
617
+ ("ゆゃ","jaː"),
618
+ ("ゆゅ","jɯː"),
619
+ ("ゆょ","joː"),
620
+ ("よぉ","joː"),
621
+ ("らぁ","ɽaː"),
622
+ ("りぃ","ɽiː"),
623
+ ("るぅ","ɽɯː"),
624
+ ("るゃ","ɽʲa"),
625
+ ("るゅ","ɽʲɯ"),
626
+ ("るょ","ɽʲo"),
627
+ ("れぇ","ɽeː"),
628
+ ("ろぉ","ɽoː"),
629
+ ("わぁ","ɯaː"),
630
+ ("をぉ","oː"),
631
+
632
+ ("う゛","bɯ"),
633
+ ("でぃ","di"),
634
+ ("でぇ","deː"),
635
+ ("でゃ","dʲa"),
636
+ ("でゅ","dʲɯ"),
637
+ ("でょ","dʲo"),
638
+ ("てぃ","ti"),
639
+ ("てぇ","teː"),
640
+ ("てゃ","tʲa"),
641
+ ("てゅ","tʲɯ"),
642
+ ("てょ","tʲo"),
643
+ ("すぃ","si"),
644
+ ("ずぁ","zɯa"),
645
+ ("ずぃ","zi"),
646
+ ("ずぅ","zɯ"),
647
+ ("ずゃ","zʲa"),
648
+ ("ずゅ","zʲɯ"),
649
+ ("ずょ","zʲo"),
650
+ ("ずぇ","ze"),
651
+ ("ずぉ","zo"),
652
+ ("きゃ","kʲa"),
653
+ ("きゅ","kʲɯ"),
654
+ ("きょ","kʲo"),
655
+ ("しゃ","ɕʲa"),
656
+ ("しゅ","ɕʲɯ"),
657
+ ("しぇ","ɕʲe"),
658
+ ("しょ","ɕʲo"),
659
+ ("ちゃ","tɕa"),
660
+ ("ちゅ","tɕɯ"),
661
+ ("ちぇ","tɕe"),
662
+ ("ちょ","tɕo"),
663
+ ("とぅ","tɯ"),
664
+ ("とゃ","tʲa"),
665
+ ("とゅ","tʲɯ"),
666
+ ("とょ","tʲo"),
667
+ ("どぁ","doa"),
668
+ ("どぅ","dɯ"),
669
+ ("どゃ","dʲa"),
670
+ ("どゅ","dʲɯ"),
671
+ ("どょ","dʲo"),
672
+ ("どぉ","doː"),
673
+ ("にゃ","nʲa"),
674
+ ("にゅ","nʲɯ"),
675
+ ("にょ","nʲo"),
676
+ ("ひゃ","çʲa"),
677
+ ("ひゅ","çʲɯ"),
678
+ ("ひょ","çʲo"),
679
+ ("みゃ","mʲa"),
680
+ ("みゅ","mʲɯ"),
681
+ ("みょ","mʲo"),
682
+ ("りゃ","ɽʲa"),
683
+ ("りぇ","ɽʲe"),
684
+ ("りゅ","ɽʲɯ"),
685
+ ("りょ","ɽʲo"),
686
+ ("ぎゃ","gʲa"),
687
+ ("ぎゅ","gʲɯ"),
688
+ ("ぎょ","gʲo"),
689
+ ("ぢぇ","dʑe"),
690
+ ("ぢゃ","dʑa"),
691
+ ("ぢゅ","dʑɯ"),
692
+ ("ぢょ","dʑo"),
693
+ ("じぇ","dʑe"),
694
+ ("じゃ","dʑa"),
695
+ ("じゅ","dʑɯ"),
696
+ ("じょ","dʑo"),
697
+ ("びゃ","bʲa"),
698
+ ("びゅ","bʲɯ"),
699
+ ("びょ","bʲo"),
700
+ ("ぴゃ","pʲa"),
701
+ ("ぴゅ","pʲɯ"),
702
+ ("ぴょ","pʲo"),
703
+ ("うぁ","ɯa"),
704
+ ("うぃ","ɯi"),
705
+ ("うぇ","ɯe"),
706
+ ("うぉ","ɯo"),
707
+ ("うゃ","ɯʲa"),
708
+ ("うゅ","ɯʲɯ"),
709
+ ("うょ","ɯʲo"),
710
+ ("ふぁ","ɸa"),
711
+ ("ふぃ","ɸi"),
712
+ ("ふぅ","ɸɯ"),
713
+ ("ふゃ","ɸʲa"),
714
+ ("ふゅ","ɸʲɯ"),
715
+ ("ふょ","ɸʲo"),
716
+ ("ふぇ","ɸe"),
717
+ ("ふぉ","ɸo"),
718
+
719
+ ("あ"," a"),
720
+ ("い"," i"),
721
+ ("う","ɯ"),
722
+ ("え"," e"),
723
+ ("お"," o"),
724
+ ("か"," ka"),
725
+ ("き"," ki"),
726
+ ("く"," kɯ"),
727
+ ("け"," ke"),
728
+ ("こ"," ko"),
729
+ ("さ"," sa"),
730
+ ("し"," ɕi"),
731
+ ("す"," sɯ"),
732
+ ("せ"," se"),
733
+ ("そ"," so"),
734
+ ("た"," ta"),
735
+ ("ち"," tɕi"),
736
+ ("つ"," tsɯ"),
737
+ ("て"," te"),
738
+ ("と"," to"),
739
+ ("な"," na"),
740
+ ("に"," ni"),
741
+ ("ぬ"," nɯ"),
742
+ ("ね"," ne"),
743
+ ("の"," no"),
744
+ ("は"," ha"),
745
+ ("ひ"," çi"),
746
+ ("ふ"," ɸɯ"),
747
+ ("へ"," he"),
748
+ ("ほ"," ho"),
749
+ ("ま"," ma"),
750
+ ("み"," mi"),
751
+ ("む"," mɯ"),
752
+ ("め"," me"),
753
+ ("も"," mo"),
754
+ ("ら"," ɽa"),
755
+ ("り"," ɽi"),
756
+ ("る"," ɽɯ"),
757
+ ("れ"," ɽe"),
758
+ ("ろ"," ɽo"),
759
+ ("が"," ga"),
760
+ ("ぎ"," gi"),
761
+ ("ぐ"," gɯ"),
762
+ ("げ"," ge"),
763
+ ("ご"," go"),
764
+ ("ざ"," za"),
765
+ ("じ"," dʑi"),
766
+ ("ず"," zɯ"),
767
+ ("ぜ"," ze"),
768
+ ("ぞ"," zo"),
769
+ ("だ"," da"),
770
+ ("ぢ"," dʑi"),
771
+ ("づ"," zɯ"),
772
+ ("で"," de"),
773
+ ("ど"," do"),
774
+ ("ば"," ba"),
775
+ ("び"," bi"),
776
+ ("ぶ"," bɯ"),
777
+ ("べ"," be"),
778
+ ("ぼ"," bo"),
779
+ ("ぱ"," pa"),
780
+ ("ぴ"," pi"),
781
+ ("ぷ"," pɯ"),
782
+ ("ぺ"," pe"),
783
+ ("ぽ"," po"),
784
+ ("や"," ja"),
785
+ ("ゆ"," jɯ"),
786
+ ("よ"," jo"),
787
+ ("わ"," wa"),
788
+ ("ゐ"," i"),
789
+ ("ゑ"," e"),
790
+ ("ん"," ɴ"),
791
+ ("っ"," ʔ"),
792
+ ("ー"," ː"),
793
+
794
+ ("ぁ"," a"),
795
+ ("ぃ"," i"),
796
+ ("ぅ"," ɯ"),
797
+ ("ぇ"," e"),
798
+ ("ぉ"," o"),
799
+ ("ゎ"," ɯa"),
800
+ ("ぉ"," o"),
801
+ ("っ"," ʔ"),
802
+
803
+ ("を","o")
804
+
805
+ ])
806
+
807
+
808
+ def post_fix(text):
809
+ orig = text
810
+
811
+ for k, v in k_mapper.items():
812
+ text = text.replace(k, v)
813
+
814
+ return text
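`k_mapper`, like the other tables here, is built with `dict([...])`. A key listed twice therefore keeps its first position but silently takes the last value: `dict([("っ", " ʔ"), ("っ", "?")])["っ"]` is "?". The sketch below reports keys listed more than once with conflicting values. It assumes the pair list is kept in a variable before being passed to `dict`; `find_conflicting_keys` is a hypothetical helper.

```python
from collections import defaultdict


def find_conflicting_keys(pairs):
    """Return {key: [values...]} for keys that appear with different values."""
    seen = defaultdict(list)
    for key, value in pairs:
        seen[key].append(value)
    return {k: v for k, v in seen.items() if len(set(v)) > 1}


pairs = [("っ", " ʔ"), ("ぁ", " a"), ("ぁ", " a"), ("っ", "?")]
print(dict(pairs)["っ"])             # ? (last value wins)
print(find_conflicting_keys(pairs))  # {'っ': [' ʔ', '?']}
```

Exact duplicates such as the repeated ("ぁ"," a") are harmless and are not reported.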
815
+
816
+
817
+
818
+
819
+ sym_ws = dict([
820
+
821
+ ("$ ","dorɯ"),
822
+ ("$ ","dorɯ"),
823
+
824
+ ("〇 ","marɯ"),
825
+ ("¥ ","eɴ"),
826
+
827
+ ("# ","haʔɕɯ tagɯ"),
828
+ ("# ","haʔɕɯ tagɯ"),
829
+
830
+ ("& ","ando"),
831
+ ("& ","ando"),
832
+
833
+ ("% ","paːsento"),
834
+ ("% ","paːsento"),
835
+
836
+ ("@ ","aʔto saiɴ"),
837
+ ("@ ","aʔto saiɴ")
838
+
839
+
840
+
841
+ ])
842
+
843
+ def random_sym_fix(text): # with space
844
+ orig = text
845
+
846
+ for k, v in sym_ws.items():
847
+ text = text.replace(k, f" {v} ")
848
+
849
+ return text
850
+
851
+
852
+ sym_ns = dict([
853
+
854
+ ("$","dorɯ"),
855
+ ("$","dorɯ"),
856
+
857
+ ("〇","marɯ"),
858
+ ("¥","eɴ"),
859
+
860
+ ("#","haʔɕɯ tagɯ"),
861
+ ("#","haʔɕɯ tagɯ"),
862
+
863
+ ("&","ando"),
864
+ ("&","ando"),
865
+
866
+ ("%","paːsento"),
867
+ ("%","paːsento"),
868
+
869
+ ("@","aʔto saiɴ"),
870
+ ("@","aʔto saiɴ"),
871
+
872
+ ("~","—"),
873
+ ("kʲɯɯdʑɯɯkʲɯɯ.kʲɯɯdʑɯɯ","kʲɯɯdʑɯɯ kʲɯɯ teɴ kʲɯɯdʑɯɯ")
874
+
875
+
876
+
877
+
878
+
879
+ ])
880
+
881
+ def random_sym_fix_no_space(text):
882
+ orig = text
883
+
884
+ for k, v in sym_ns.items():
885
+ text = text.replace(k, f" {v} ")
886
+
887
+ return text
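`random_sym_fix` and `random_sym_fix_no_space` pad every replacement with spaces. When a space already sits next to the symbol, the result contains two spaces in a row. This matters if gemination marking runs afterwards. In `replace_tashdid`, two adjacent spaces pass the `input_string[i] == input_string[i + 1]` check, because a space is not in `'aiueo'`, so "  " turns into "ʔ ".

The sketch below pads and then collapses runs of spaces. `replace_symbols` is a hypothetical name, and the one-entry table stands in for `sym_ns`.

```python
import re


def replace_symbols(text: str, table: dict) -> str:
    # Pad each spelled-out symbol with spaces, as random_sym_fix_no_space does...
    for symbol, reading in table.items():
        text = text.replace(symbol, f" {reading} ")
    # ...then collapse the doubled spaces that padding can create.
    return re.sub(r" {2,}", " ", text).strip()


print(replace_symbols("50% off", {"%": "paːsento"}))  # 50 paːsento off
```

The final `.strip()` also trims the padding at the start and end of the string. Drop it if leading or trailing spaces are meaningful to later steps.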
888
+
889
+
890
+ spaces = dict([
891
+
892
+ ("ɯ ɴ","ɯɴ"),
893
+ ("na ɴ ","naɴ "),
894
+ (" mina ", " miɴna "),
895
+ ("ko ɴ ni tɕi ha","konnitɕiwa"),
896
+ ("ha i","hai"),
897
+ ("boɯtɕama","boʔtɕama"),
898
+ ("i eːi","ieːi"),
899
+ ("taiɕɯtsɯdʑoɯ","taiɕitsɯdʑoɯ"),
900
+ ("soɴna ka ze ni","soɴna fɯɯ ni"),
901
+ (" i e ","ke "),
902
+ ("�",""),
903
+ ("×"," batsɯ "),
904
+ ("se ka ɯndo","sekaɯndo"),
905
+ ("i i","iː"),
906
+ ("i tɕi","itɕi"),
907
+ ("ka i","kai"),
908
+ ("naɴ ga","nani ga"),
909
+ ("i eː i","ieːi"),
910
+
911
+ ("naɴ koɽe","nani koɽe"),
912
+ ("naɴ soɽe","nani soɽe"),
913
+ (" ɕeɴ "," seɴ "),
914
+
915
+ # ("konna","koɴna"),
916
+ # ("sonna"," soɴna "),
917
+ # ("anna","aɴna"),
918
+ # ("nn","ɴn"),
919
+
920
+ ("en ","eɴ "),
921
+ ("in ","iɴ "),
922
+ ("an ","aɴ "),
923
+ ("on ","oɴ "),
924
+ ("ɯn ","ɯɴ "),
925
+ # ("nd","ɴd"),
926
+
927
+ ("koɴd o","kondo"),
928
+ ("ko ɴ d o","kondo"),
929
+ ("ko ɴ do","kondo"),
930
+
931
+ ("oanitɕaɴ","oniːtɕaɴ"),
932
+ ("oanisaɴ","oniːsaɴ"),
933
+ ("oanisama","oniːsama"),
934
+ ("hoːmɯrɯɴɯ","hoːmɯrɯːmɯ"),
935
+ ("so ɴ na ","sonna"),
936
+ (" sonna "," sonna "),
937
+ (" konna "," konna "),
938
+ ("ko ɴ na ","konna"),
939
+ (" ko to "," koto "),
940
+ ("edʑdʑi","eʔtɕi"),
941
+ (" edʑdʑ "," eʔtɕi "),
942
+ (" dʑdʑ "," dʑiːdʑiː "),
943
+ ("secɯnd","sekaɯndo"),
944
+
945
+ ("ɴɯ","nɯ"),
946
+ ("ɴe","ne"),
947
+ ("ɴo","no"),
948
+ ("ɴa","na"),
949
+ ("ɴi","ni"),
950
+ ("ɴʲ","nʲ"),
951
+
952
+ ("hotond o","hotondo"),
953
+ ("hakoɴd e","hakoɴde"),
954
+ ("gakɯtɕi ɽi","gaʔtɕiɽi "),
955
+
956
+ (" ʔ","ʔ"),
957
+ ("ʔ ","ʔ"),
958
+
959
+ ("--","~ː"),
960
+ ("- ","ː"),
961
+ ("-","ː"),
962
+ ("~","—"),
963
+ ("、",","),
964
+
965
+ (" ː","ː"),
966
+ ('ka nade',"kanade"),
967
+
968
+ ("ohahasaɴ","okaːsaɴ"),
969
+ (" "," "),
970
+ ("viː","bɯiː"),
971
+ ("ːː","ː—"),
972
+
973
+ ("d ʑ","dʑ"),
974
+ ("d a","da"),
975
+ ("d e","de"),
976
+ ("d o","do"),
977
+ ("d ɯ","dɯ"),
978
+
979
+ ("niːɕiki","ni iɕiki"),
980
+ ("anitɕaɴ","niːtɕaɴ"),
981
+ ("daiːtɕi","dai itɕi"),
982
+
983
+ ("naɴ sono","nani sono"),
984
+ ("naɴ kono","nani kono"),
985
+ ("naɴ ano","nani ano"), # Cutlet please fix your shit
986
+ (" niːtaɽa"," ni itaɽa"),
987
+ ("doɽamaɕiːd","doɽama ɕiːdʲi"),
988
+ ("aɴ ta","anta"),
989
+ ("aɴta","anta"),
990
+ ("naniːʔteɴ","nani iʔteɴ"),
991
+ ("niːkite","ni ikite")
992
+
993
+ ])
994
+
995
+
996
+
997
+ def random_space_fix(text):
998
+ orig = text
999
+
1000
+ for k, v in spaces.items():
1001
+ text = text.replace(k, v)
1002
+
1003
+ return text
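`random_space_fix` applies `spaces` in insertion order. A rule whose key contains an earlier key can only fire on text the earlier rule left untouched. For instance, if ("-","ː") runs before ("--","~ː"), every hyphen is gone by the time the second rule is tried.

The sketch below lists such potentially shadowed pairs so the table can be reordered, for example with `find_shadowed_rules(list(spaces.items()))`. `find_shadowed_rules` is a hypothetical helper. A flagged rule can still fire if an intermediate replacement recreates its key.

```python
def find_shadowed_rules(pairs):
    """List (earlier_key, later_key) where earlier_key occurs inside later_key.

    The later rule can then only match text that the earlier rule left
    untouched, or that some other replacement produced in between.
    """
    keys = [key for key, _ in pairs]
    shadowed = []
    for i, earlier in enumerate(keys):
        for later in keys[i + 1:]:
            if earlier != later and earlier in later:
                shadowed.append((earlier, later))
    return shadowed


rules = [("-", "ː"), ("- ", "ː"), ("--", "~ː")]
print(find_shadowed_rules(rules))  # [('-', '- '), ('-', '--')]
```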