Remember the script is running on the remote machine as well, and the upload_file function will always try to upload the file. It's meant as a utility function you can use for uploading files, but it does not care whether you're running locally or remotely.
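Since upload_file itself does not check where the script runs, one option is to guard the call yourself. The sketch below is only an illustration: upload_if_local, fake_upload and the running_locally flag are hypothetical names, not part of trains. In a real script you might pass manager.upload_file as the upload callable and, assuming your trains version provides it, the result of Task.running_locally() as the flag.

```python
def upload_if_local(upload, running_locally, local_file, remote_url):
    """Call `upload` only when the script is running locally.

    `upload` is any callable accepting local_file= and remote_url=
    (for example manager.upload_file); `running_locally` is a bool the
    caller supplies. Both names are assumptions made for this sketch.
    """
    if not running_locally:
        # Skip the upload on the remote machine and signal that nothing happened.
        return None
    return upload(local_file=local_file, remote_url=remote_url)


# Stand-in for the real upload function so the sketch runs without a bucket.
def fake_upload(local_file, remote_url):
    return remote_url


local_result = upload_if_local(
    fake_upload, True, "/mnt/data/also_file.ext", "s3://MyBucket/MyFolder"
)
remote_result = upload_if_local(
    fake_upload, False, "/mnt/data/also_file.ext", "s3://MyBucket/MyFolder"
)
print(local_result, remote_result)
```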
SuccessfulKoala55 Is this example correct:
https://allegro.ai/docs/examples/examples_storagehelper/#uploading-a-file
manager.upload_file(local_file="/mnt/data/also_file.ext", remote_url="s3://MyBucket/MyFolder")
manager.upload_file(local_file=str(output_dir/"config.json"), remote_url=remote_url+'bucket/')
This is my specific upload. I wanted to make sure the example in the documentation is accurate.
manager.upload_file(local_file="/mnt/data/also_file.ext", remote_url="s3://MyBucket/MyFolder")
When you are not using the StorageManager you don’t get the OSError: [Errno 9] Bad file descriptor errors?
Can you send me the logs with and without? (you can send the logs in DM if you prefer)
The files were created, and the one I needed was uploaded to storage.
TimelyPenguin76 good morning,
From the CLI. Yes, I see it.
Yes, when I comment out the storage manager there is no error.
The error doesn't appear when not using the storage manager.
Hi GleamingGiraffe20 , still getting those errors?
Can you point me to a specific example?
You can probably replicate this yourself. I'm using simple transformers library for quick benchmarking.
Just run one of its examples (I'm using multilabel classification): https://github.com/ThilinaRajapakse/simpletransformers but change the output_dir to something else. When you don't track (no Task.init), this works; when you track, it gets stuck.
Locally I had no issues finding or loading the file; I didn't try to load it into memory remotely. The thing is, since this is stuck at training (with msg ‘TRAINS Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start’) I will probably have no idea whether it was loaded or not.
Hi GleamingGiraffe20 ,
Without adding Task.init, I’m getting some OSError: [Errno 9] Bad file descriptor errors, do you get those too?
Do you run your script from the CLI or an IDE (PyCharm maybe)?
This specific issue doesn't concern the output URI (although I did a run both with it and without) as I'm trying to load a config file that's being saved locally using manager.upload_file.
https://github.com/ThilinaRajapakse/simpletransformers#minimal-start-for-multilabel-classification that's what I'm using (just with DistilBERT)
How do you load the file? Can you find this file manually?
Hi GleamingGiraffe20 ,
The example in the documentation is missing a filename at the end of remote URL (an error I got locally when I tried to upload).
In the https://allegro.ai/docs/examples/examples_storagehelper/#uploading-a-file example, the filename is /mnt/data/also_file.ext. Did I miss the example you talked about? If so, can you send a link to it?
When using a trains task to track my run AND changing my script's output directory, I get: ‘TRAINS Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start’ - and that’s it, it’s stuck on running with no error. When running the same script without the trains task it works.
changing my script's output directory
Can you send a small example of that? Did you change the output_uri?
For the 'TRAINS Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start' message: the iteration reporting is automatically detected if you are using tensorboard, matplotlib, or explicitly with trains.Logger.
Assuming there were no reports, the monitoring falls back to reporting every 30 seconds.
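To illustrate the "explicitly with trains.Logger" option, here is a minimal sketch of reporting one scalar per iteration. report_training_loss and StubLogger are hypothetical names used only so the snippet runs without a trains server; in a real run you would pass the task's logger (e.g. Task.current_task().get_logger()), whose report_scalar takes title, series, value and iteration. This only addresses the monitor message; whether it has anything to do with the run getting stuck is not established here.

```python
def report_training_loss(logger, losses):
    """Report one loss value per iteration through logger.report_scalar."""
    for iteration, loss in enumerate(losses):
        logger.report_scalar(
            title="train", series="loss", value=loss, iteration=iteration
        )
    return len(losses)


class StubLogger:
    """Stand-in with the same report_scalar keywords, for a dry run only."""

    def __init__(self):
        self.calls = []

    def report_scalar(self, title, series, value, iteration):
        # Record the call instead of sending it to a server.
        self.calls.append((title, series, value, iteration))


stub = StubLogger()
reported = report_training_loss(stub, [0.9, 0.7, 0.5])
print(reported, stub.calls[-1])
```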
Thanks for the examples, will try to reproduce it now.
SuccessfulKoala55 Conclusions:
The example in the documentation is missing a filename at the end of remote URL (an error I got locally when I tried to upload). When using a trains task to track my run AND changing my script's output directory, I get: 'TRAINS Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start' - and that's it, it's stuck on running with no error. When running the same script without the trains task it works.
I'll check.
I do however expect to see an error message when something isn't working, this just got stuck.
I agree - I just want to make sure I understand the exact scenario when it does 🙂
When this code is running on your machine, does it work?
No error.
I didn't check the contents on the remote machine. However, when you run it locally it creates a bunch of files (text, model etc.)
GleamingGiraffe20 when you run on a remote machine, is there a file in /mnt/data/also_file.ext?
Were you able to replicate the issue with task?
I'm not specifying a filename under remote URL, just like in the example.
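If, as reported in this thread, the upload only works when the remote URL ends with the target filename, a small helper can build that URL from the local path. This is a sketch under that assumption; remote_url_with_filename is a hypothetical helper, not a trains function.

```python
from pathlib import PurePosixPath


def remote_url_with_filename(remote_base, local_file):
    """Append the local file's name to a remote folder URL."""
    # Strip trailing slashes first so the result never contains "folder//file".
    return remote_base.rstrip("/") + "/" + PurePosixPath(local_file).name


url = remote_url_with_filename("s3://MyBucket/MyFolder", "/mnt/data/also_file.ext")
print(url)  # s3://MyBucket/MyFolder/also_file.ext

# The upload from the thread would then become (not executed here):
# manager.upload_file(local_file="/mnt/data/also_file.ext", remote_url=url)
```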