Hi GleamingGiraffe20 , still getting those errors?
Were you able to replicate the issue with the task?
Can you send me the logs with and without? (you can send the logs in DM if you prefer)
Yes, when I comment out the storage manager there's no error.
When you are not using the StorageManager you don’t get the OSError: [Errno 9] Bad file descriptor
errors?
The error doesn't appear when not using the storage manager.
The files were created, and the one I needed was uploaded to storage.
TimelyPenguin76 good morning,
From the CLI. Yes, I see it.
Hi GleamingGiraffe20 ,
Without adding Task.init
, I’m getting some OSError: [Errno 9] Bad file descriptor
error, do you get those too?
Do you run your script from CLI or IDE (pycharm maybe?)?
https://github.com/ThilinaRajapakse/simpletransformers#minimal-start-for-multilabel-classification that's what I'm using (just with DistilBERT)
Can you point me to a specific example?
For 'TRAINS Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start'
message - the iteration reporting is automatically detected if you are using tensorboard
, matplotlib
, or explicitly with trains.Logger
Assuming there were no reports, the monitoring falls back to reporting every 30 seconds.
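If it helps, explicit reporting looks roughly like this — a minimal sketch assuming the trains package's Logger.report_scalar API; the project/task names and the report_metrics helper are made up for illustration:

```python
# Hedged sketch: explicit iteration reporting so the TRAINS monitor can
# detect iterations. report_metrics works with any object exposing
# report_scalar(title, series, value, iteration).
def report_metrics(logger, metrics):
    # metrics: list of (iteration, value) pairs; passing an explicit
    # iteration number is what the monitor looks for.
    for iteration, value in metrics:
        logger.report_scalar(title="loss", series="train",
                             value=value, iteration=iteration)
    return len(metrics)

# Usage with trains (assumed API, needs the trains package installed):
# from trains import Task
# task = Task.init(project_name="demo", task_name="explicit-reporting")
# report_metrics(task.get_logger(), [(0, 1.0), (1, 0.5)])
```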
Thanks for the examples, will try to reproduce it now.
You can probably replicate this yourself. I'm using the Simple Transformers library for quick benchmarking.
Just run one of his examples (I'm using multilabel classification): https://github.com/ThilinaRajapakse/simpletransformers but change the output_dir to something else. When you don't track (no Task.init) this works; if you track, this gets stuck.
Locally I had no issues finding, loading, etc.; I didn't try to load it into memory remotely. The thing is, since this is stuck at training (with the message ‘TRAINS Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start’) I will probably have no idea whether it was loaded or not.
How do you load the file? Can you find this file manually?
This specific issue doesn't concern the output URI (although I did a run both with it and without) as I'm trying to load a config file that's being saved locally using manager.upload_file.
manager.upload_file(local_file="/mnt/data/also_file.ext", remote_url="s3://MyBucket/MyFolder")
Hi GleamingGiraffe20 ,
The example in the documentation is missing a filename at the end of the remote URL (an error I got locally when I tried to upload).
In https://allegro.ai/docs/examples/examples_storagehelper/#uploading-a-file example, the filename is /mnt/data/also_file.ext
, did I miss the example you talked about? If so, can you send a link to it?
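For what it's worth, appending the filename to the remote folder can be sketched like this — the paths are the ones from the documentation example, and the upload_file call itself is commented out since it needs the trains StorageManager:

```python
import os

# Hedged sketch: build a remote URL with an explicit target filename,
# instead of pointing upload_file at a bare folder.
local_file = "/mnt/data/also_file.ext"
remote_folder = "s3://MyBucket/MyFolder"
remote_url = remote_folder.rstrip("/") + "/" + os.path.basename(local_file)
# manager.upload_file(local_file=local_file, remote_url=remote_url)
print(remote_url)  # s3://MyBucket/MyFolder/also_file.ext
```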
When using a trains task to track my run AND changing my script's output directory, I get: ‘TRAINS Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start’ - and that’s it, it’s stuck on running with no error. When running the same script without the trains task it works.
changing my script's output directory
can you send a small example of that? Did you change the output_uri
?
SuccessfulKoala55 Conclusions:
The example in the documentation is missing a filename at the end of the remote URL (an error I got locally when I tried to upload). When using a trains task to track my run AND changing my script's output directory, I get: 'TRAINS Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start' - and that's it, it's stuck on running with no error. When running the same script without the trains task it works.
I agree - I just want to make sure I understand the exact scenario when it does 🙂
I'll check.
I do however expect to see an error message when something isn't working, this just got stuck.
When this code is running on your machine, does it work?
I'm not specifying a filename under remote URL, just like in the example.
manager.upload_file(local_file=str(output_dir/"config.json"), remote_url=remote_url+'bucket/')
This is my specific upload. I wanted to make sure the example in the documentation is accurate.
Remember the script is running on the remote machine as well, and the upload_file
function will always try to upload the file. It's meant as a utility function you can use for uploading files, but it does not care if you're running locally or remotely
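Since upload_file always tries to upload regardless of where the script runs, one option is a small guard around it — a sketch assuming only that the manager exposes upload_file(local_file=..., remote_url=...); safe_upload is a hypothetical helper name, not part of the library:

```python
import os

def safe_upload(manager, local_file, remote_folder):
    # Skip the upload when the local file doesn't exist (e.g. on a remote
    # machine that never created it), instead of failing inside upload_file.
    if not os.path.isfile(local_file):
        return None
    # Give the remote object an explicit filename.
    remote_url = remote_folder.rstrip("/") + "/" + os.path.basename(local_file)
    manager.upload_file(local_file=local_file, remote_url=remote_url)
    return remote_url
```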
GleamingGiraffe20 when you run on a remote machine, is there a file in /mnt/data/also_file.ext
?
SuccessfulKoala55 Is this example correct:
https://allegro.ai/docs/examples/examples_storagehelper/#uploading-a-file
manager.upload_file(local_file="/mnt/data/also_file.ext", remote_url="s3://MyBucket/MyFolder")
No error.
I didn't check the contents on the remote machine. However, when you run it locally it creates a bunch of files (text, model, etc.).