No error.
I didn't check the contents on the remote machine. However, when you run it locally it creates a bunch of files (text, model, etc.).
GleamingGiraffe20 when you run on a remote machine, is there a file in /mnt/data/also_file.ext?
Yes, when I comment out the storage manager there's no error.
Can you send me the logs with and without? (you can send the logs in DM if you prefer)
SuccessfulKoala55 Is this example correct:
https://allegro.ai/docs/examples/examples_storagehelper/#uploading-a-file
manager.upload_file(local_file="/mnt/data/also_file.ext", remote_url="s3://MyBucket/MyFolder")
When this code is running on your machine, does it work?
Hi GleamingGiraffe20 ,
The example in the documentation is missing a filename at the end of the remote URL (an error I got locally when I tried to upload).
In the https://allegro.ai/docs/examples/examples_storagehelper/#uploading-a-file example, the filename is /mnt/data/also_file.ext. Did I miss the example you talked about? If so, can you send a link to it?
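For reference, a minimal sketch of an upload where the remote URL includes the target filename (the import path, bucket, and paths here are assumptions/placeholders, not taken from the docs):

from trains.storage import StorageManager  # assumed import path

manager = StorageManager()
# Note the remote URL ends with the destination filename, not just the folder
manager.upload_file(
    local_file="/mnt/data/also_file.ext",
    remote_url="s3://MyBucket/MyFolder/also_file.ext",
)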
When using a trains task to track my run AND changing my script's output directory, I get: 'TRAINS Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start' - and that's it, it's stuck on running with no error. When running the same script without the trains task it works.
changing my script's output directory
Can you send a small example of that? Did you change the output_uri?
Were you able to replicate the issue with task?
Remember the script is running on the remote machine as well, and the upload_file function will always try to upload the file. It's meant as a utility function you can use for uploading files, but it does not care if you're running locally or remotely.
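If you only want the upload to happen on the local run, something like this sketch might work (assuming Task.running_locally() is available in your trains version; project name and paths are placeholders):

from trains import Task
from trains.storage import StorageManager  # assumed import path

task = Task.init(project_name="examples", task_name="guarded upload")

# Skip the upload when the script is executed remotely by an agent
# (running_locally() is an assumption about the available trains API)
if task.running_locally():
    StorageManager().upload_file(
        local_file="/mnt/data/also_file.ext",
        remote_url="s3://MyBucket/MyFolder/also_file.ext",
    )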
manager.upload_file(local_file=str(output_dir/"config.json"), remote_url=remote_url+'bucket/')
This is my specific upload. I wanted to make sure the example in the documentation is accurate.
When you are not using the StorageManager, you don't get the OSError: [Errno 9] Bad file descriptor errors?
Hi GleamingGiraffe20 ,
Without adding Task.init, I'm getting some OSError: [Errno 9] Bad file descriptor errors. Do you get those too?
Do you run your script from CLI or IDE (pycharm maybe?)?
The error doesn't appear when not using the storage manager.
Can you point me to a specific example?
https://github.com/ThilinaRajapakse/simpletransformers#minimal-start-for-multilabel-classification that's what I'm using (just with DistilBERT)
I'm not specifying a filename under remote URL, just like in the example.
This specific issue doesn't concern the output URI (although I did runs both with and without it), as I'm trying to load a config file that's saved locally and uploaded using manager.upload_file.
I agree - I just want to make sure I understand the exact scenario when it does 🙂
The files were created, and the one I needed was uploaded to storage.
How do you load the file? Can you find this file manually?
manager.upload_file(local_file="/mnt/data/also_file.ext", remote_url="s3://MyBucket/MyFolder")
TimelyPenguin76 good morning,
From the CLI. Yes, I see it.
You can probably replicate this yourself. I'm using the simpletransformers library for quick benchmarking.
Just run one of his examples (I'm using multilabel classification): https://github.com/ThilinaRajapakse/simpletransformers but change the output_dir to something else. When you don't track it works (no Task.init); when you track it, it gets stuck.
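A rough repro sketch based on the simpletransformers minimal multilabel start (the data, label count, and output_dir here are placeholders, not my exact setup):

import pandas as pd
from trains import Task
from simpletransformers.classification import MultiLabelClassificationModel

# Tracking enabled; removing this line lets the run finish normally
task = Task.init(project_name="benchmarks", task_name="multilabel repro")

train_data = [
    ["Example sentence one", [1, 0, 1]],
    ["Example sentence two", [0, 1, 0]],
]
train_df = pd.DataFrame(train_data, columns=["text", "labels"])

# A non-default output_dir is what triggers the hang in my runs
model = MultiLabelClassificationModel(
    "distilbert",
    "distilbert-base-uncased",
    num_labels=3,
    args={"output_dir": "/mnt/data/outputs"},
    use_cuda=False,  # set for CPU-only runs
)
model.train_model(train_df)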
SuccessfulKoala55 Conclusions:
The example in the documentation is missing a filename at the end of the remote URL (an error I got locally when I tried to upload). When using a trains task to track my run AND changing my script's output directory, I get: 'TRAINS Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start' - and that's it, it's stuck on running with no error. When running the same script without the trains task it works.
Hi GleamingGiraffe20 , still getting those errors?
For the 'TRAINS Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start' message - iteration reporting is automatically detected if you are using tensorboard, matplotlib, or explicitly with trains.Logger. Assuming there were no reports, the monitoring falls back to reporting every 30 seconds.
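For example, a minimal sketch of explicit reporting through the trains Logger (the project/task names and the metric are placeholders):

from trains import Task

task = Task.init(project_name="examples", task_name="explicit reporting")
logger = task.get_logger()

# Explicit scalar reports give the monitor an iteration signal
for iteration in range(100):
    loss = 1.0 / (iteration + 1)  # placeholder metric
    logger.report_scalar(title="train", series="loss", value=loss, iteration=iteration)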
Thanks for the examples, will try to reproduce it now.
I'll check.
I do, however, expect to see an error message when something isn't working; this just got stuck.
Locally I had no issues finding it, loading it, etc.; I didn't try to load it into memory remotely. The thing is, since this is stuck at training (with the message 'TRAINS Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start'), I will probably have no idea whether it was loaded or not.