How is this different from argparse btw?
Not different, just a dedicated section 🙂 Maybe we should do that automatically; the only "downside" is you will have to name the Dataset when getting it (so it will have an entry name in the Dataset section), wdyt?
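For context, a minimal sketch of what getting a named Dataset would look like (the project/name values here are illustrative):
```python
from clearml import Dataset

# Named datasets are fetched by project + name instead of by ID
ds = Dataset.get(dataset_project="examples", dataset_name="my_dataset")
```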
Hi @<1523701868901961728:profile|ReassuredTiger98>
Could you send the full log? Also, what's the clearml-agent version?
I can but that is not a configuration we would want to run with in production
Agreed, I just want to isolate the issue. I think this is the underlying python interface missing some configuration or environment variables
DilapidatedDucks58 trains-agent adds the artifactory URL as --extra-index-url. Are you sure you are getting the correct torch version in the container? The torch html is not an artifactory html, it is a list of links, and I just want to make sure you are getting the correct version, because otherwise it can default to the CPU version, which we don't want 🙂 Anyhow, you can use the direct link in the "installed packages" and just put there https://download.pytorch.org/whl/nightly/cu101...
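For illustration, the "installed packages" section follows pip requirements syntax, so a direct-link entry can be written like this (the wheel URL is truncated here, as above):
```
torch @ https://download.pytorch.org/whl/nightly/cu101...
```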
Hi BroadMole98
A bit hacky but doable 🙂
```python
task = Task.get_task(task_id='aabbcc')
task.get_logger().report_scalar(...)
```
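For completeness, a minimal sketch of reporting a scalar to an existing task (the task ID and title/series/value arguments are illustrative):
```python
from clearml import Task

# Re-attach to an existing task by ID and report a scalar metric to it
task = Task.get_task(task_id='aabbcc')  # 'aabbcc' is a placeholder ID
logger = task.get_logger()
logger.report_scalar(title="loss", series="val", value=0.12, iteration=10)
```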
MelancholyChicken65 what's the clearml-serving version you are using? (I believe this issue was fixed in 1.2)
We were able to find a stable, free, open source, multiplatform way to do this
You mean to move the data from the gdrive to object storage? Or to just mount the gdrive?
but then the error occurs, after the training and the validation were successfully completed
It seems it is failing on the last eval? Could it be the test set is missing? Is it the same dataset? Can you verify the file is there? (Notice I see a mix of / and \ in the file name, which is odd: Windows uses \ and Linux/macOS use /, you should never have a mix.)
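As an aside, one way to avoid mixed separators is to build paths with pathlib instead of string concatenation (the file names here are illustrative):
```python
from pathlib import Path

# pathlib renders the path with the OS-native separator
sample = Path("dataset") / "val" / "image_001.png"
print(sample)  # dataset/val/image_001.png on Linux/macOS, dataset\val\image_001.png on Windows
```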
EnviousStarfish54 a fix is already available in the latest RC.
Could you verify it solves your issue as well?
```bash
pip install trains==0.16.2rc0
```
I just set the git credentials in the clearml.conf and it works out of the box
git has issues with passing the user/token from the main repo to the submodules, hence my surprise that it is working out-of-the-box.
Do notice that if you are using an ssh-key this is a non-issue.
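For reference, a common workaround when token passing to submodules fails is rewriting the URLs git uses; this is just one approach, not necessarily what clearml-agent does, and USER/TOKEN are placeholders:
```bash
# Rewrite all https://github.com/ URLs (including submodules) to embed credentials
git config --global url."https://USER:TOKEN@github.com/".insteadOf "https://github.com/"
```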
Nope, no .netrc defined anywhere, ...
If this is the case, can you try to add the following to your "extra_vm_bash_script":
```bash
echo machine example.com > ~/.netrc && echo log...
```
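For context, a complete .netrc entry normally has three lines; the truncated command above presumably builds something like this (example.com, USER, and TOKEN are placeholders):
```
machine example.com
login USER
password TOKEN
```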
I see..
Generally speaking, if that is the case, I would think it might be better to use the docker mode; it offers a much more stable environment, regardless of the host machine running the agent. Notice there is no need to use custom containers, as the agent will basically run the venv process, only inside a container, allowing you to reuse off-the-shelf containers.
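For example, a minimal spin-up of the agent in docker mode (the queue name and default image here are illustrative):
```bash
# run the agent in docker mode, pulling jobs from the "default" queue
clearml-agent daemon --queue default --docker nvidia/cuda:11.0.3-base-ubuntu20.04
```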
If you were to add this, where would you put it? I can use a modified version of clearml-agent
Yep, that would b...
Hi BoredPigeon26
What do you mean by "reuse the task"? Is this manual execution (i.e. from code)?
How about archiving the old version?
You can also force Task.init to always create a new Task (which preserves the previous run alongside the new execution)
Basically what's the specific use case ?
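For instance, a minimal sketch of forcing a new Task on every run (the project/task names are illustrative):
```python
from clearml import Task

# reuse_last_task_id=False makes every execution create a fresh Task,
# so previous runs are kept instead of being overwritten
task = Task.init(
    project_name="examples",
    task_name="my experiment",
    reuse_last_task_id=False,
)
```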
Hi, I was expecting to see the container rather than the actual physical machine.
It is the container; it should tunnel directly into it (or that's how it should be).
SSH port 10022
However, I have not yet found a flexible solution other than ssh-agent forwarding.
And is it working?
For example, HPO with early stopping: it would mark the Task as aborted. Make sense?
Hi ConfusedPig65
Any Keras model will be automatically uploaded if you pass an upload URI to Task.init:
```python
task = Task.init('examples', 'keras upload test', output_uri=" ")
```
(You can also pass s3://bucket/folder to output_uri, or change the default output_uri in the clearml.conf file.)
After this line any Keras model will be automatically uploaded (you will see it under the Artifacts tab)
Accessing models from executed tasks:
```python
trains_task = Task.get_task('task_uid_here')
last_check...
```
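A fuller sketch of what the truncated snippet above typically continues into (the variable names are illustrative):
```python
from clearml import Task

trains_task = Task.get_task('task_uid_here')
# 'output' models are the checkpoints the task stored while running
last_checkpoint = trains_task.models['output'][-1]
local_path = last_checkpoint.get_local_copy()  # downloads the model file locally
```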
Well it should work; make sure you see the Task "holds" all the information needed (under the execution tab): repo / uncommitted changes / python packages etc.
Then configure your agent (choose pip/conda/poetry as the package manager), and spin it up (by default in venv/conda mode, or in docker mode).
Should work 🙂
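For reference, the package manager is selected in the agent section of clearml.conf, roughly like this (a sketch; only the type line matters here):
```
agent {
    package_manager {
        # supported values: pip / conda / poetry
        type: pip
    }
}
```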
Thanks @<1569496075083976704:profile|SweetShells3> ! let me see if I can reproduce the issue
Okay I found it, this is due to the fact that the newer versions are sending the events/images in a subprocess (it used to be a thread).
The creation of the object is done in the main process, updating the file index (in a round-robin manner), but the check itself happens in the subprocess, which is not "aware" of the used indexes (i.e. it is always 0, hence when exceeding the history size, it skips it).
JitteryCoyote63 any chance the trains-agent-1 is running in services mode?
Which means it will spin more than a single experiment at once
In the UI, find the task (just search for the Task ID, it will find it), then right-click it and select "reset".
@<1562610699555835904:profile|VirtuousHedgehong97>
source_url="s3:...",
This means your data is already on an S3 bucket; it will not "upload" it, it will just register it.
If you want to upload files, they should be local; then when you call upload you can specify the target S3 bucket, and the data will be stored in a unique folder in the bucket.
Does that make sense?
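A minimal sketch of the two flows, assuming the clearml Dataset API (the bucket names and paths are placeholders):
```python
from clearml import Dataset

ds = Dataset.create(dataset_project="examples", dataset_name="my_dataset")

# Data already on S3: register the links only, nothing is copied
ds.add_external_files(source_url="s3://my-bucket/data/")

# Or, for local data: add the files, then upload them to a target bucket
ds.add_files(path="/path/to/local/data")
ds.upload(output_url="s3://my-bucket/datasets")

ds.finalize()
```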
The one it is trying to execute, i.e. what the Task shows as the Script Path.
I have a question regarding running the code on a remote machine: each time I run the code, I see in the ClearML server console that it starts downloading all the libraries I used in the code, and when I run another script the same thing happens. Why does it have to download all the libraries again, many times?
I'm assuming you are referring to the installation; the downloaded python packages are cached.
You can turn on full caching by uncommenting the following line:
https://github.com/alleg...
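For reference, the full-caching toggle lives in the agent section of clearml.conf; uncommenting the path line is what enables it (the values shown are the defaults as I recall):
```
agent {
    venvs_cache: {
        max_entries: 10
        free_space_threshold_gb: 2.0
        # uncomment this line to enable virtual environment caching
        # path: ~/.clearml/venvs-cache
    }
}
```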
BTW: if you make the right column the baseline (i.e. move it to the left), you will get what you probably expected.
In any case, do you have any suggestion of how I could at least hack tqdm to make it behave? Thanks
I think I know what the issue is: it seems tqdm is using an ANSI escape sequence rather than a plain CR; this is the 1b 5b 41 (ESC [ A, i.e. cursor-up) sequence I see in the binary log.
Let me see if I can hack something for you to test 🙂
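In the meantime, a rough sketch of the kind of hack I have in mind: stripping ANSI CSI sequences (like the ESC [ A above) from console lines before they are logged. This is illustrative, not what ClearML actually ships:
```python
import re

# Matches ANSI CSI escape sequences, e.g. '\x1b[A' (ESC [ A, cursor up)
ANSI_CSI_RE = re.compile(r'\x1b\[[0-9;]*[A-Za-z]')

def strip_ansi(line: str) -> str:
    """Remove ANSI control sequences so tqdm output logs as plain text."""
    return ANSI_CSI_RE.sub('', line)
```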