I don't mean to give negative feedback, as I don't consider myself a full-blown expert in Python/ML. However, as someone with passing experience, it fails out of the box for me, with and without the typically required 16 kHz sample-rate audio files (of various codecs/formats).
Was really hoping it would be a quick, brilliant solution to something I'm working on now. Perhaps I'll dig in and invest in it, but I'm not sure I have the luxury right now to do the exploratory work... Hope someone else has better luck than I!
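For anyone wanting to rule out the input format first, here is a minimal sketch that reads a WAV header with Python's standard `wave` module and reports the sample rate. The 16 kHz target is taken from the comment above, not from the project's documentation, and the helper names (`describe_wav`, `is_expected_rate`) are made up for illustration. It only handles uncompressed PCM WAV files; other codecs would need a different reader.

```python
import wave

# Assumption: 16 kHz is the expected rate, as mentioned in the comment above.
EXPECTED_RATE_HZ = 16_000


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


def is_expected_rate(path, expected=EXPECTED_RATE_HZ):
    """True if the file's sample rate matches the expected rate."""
    rate, _, _ = describe_wav(path)
    return rate == expected
```

Calling `describe_wav("sample.wav")` on each test file (the filename is a placeholder) shows quickly whether a file needs resampling before it is handed to the model.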
I would recommend being more specific, then. Did you have trouble installing it? Did it give you an error? Was there no output? Was the output wrong? Is it not working on your files, but working on example files? Is it solving a different problem than the one you have?
Installing was okay, but it was not running on any of the sample files I had. This is the output I got:
UserWarning: You are using a softmax over axis 3 of a tensor of shape (1, 8, 1, 1). This axis has size 1. The softmax operation will always return the value 1, which is likely not what you intended. Did you mean to use a sigmoid instead?
warnings.warn(
I know this isn't the right place for this (the right place would be raising an issue on GitHub), but because you asked, I posted...
Moonshine author here: The warning is from the Keras library and is benign. If you didn't get any other output, it was probably because the model thought there was no speech (not saying there really was no speech). We uploaded an ONNX version that is considerably faster than the Torch/JAX/TF versions and is usable with less package bloat. I hope you will give it another shot.
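If the benign warning is just noise in the logs, one way to hide it is a filter from Python's standard `warnings` module. This is a minimal sketch, not something the project ships: the message prefix is copied from the warning quoted earlier in the thread, and the function name `silence_softmax_warning` is made up for illustration.

```python
import warnings

# Prefix of the warning text quoted in the thread. The `message` argument is a
# regex matched against the start of the warning message.
SOFTMAX_WARNING = r"You are using a softmax over axis"


def silence_softmax_warning():
    """Ignore only the softmax-over-size-1-axis UserWarning, nothing else."""
    warnings.filterwarnings(
        "ignore",
        message=SOFTMAX_WARNING,
        category=UserWarning,
    )
```

Call `silence_softmax_warning()` once before running the model. Other warnings still come through, so a real problem would not be hidden by the filter.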