
Oh, this is only in the SaaS server ...
(I'm sorry I was not clear on that)
Hmm, and you are getting an empty list for this one:
server_info['url'] = f"http://{server_info['hostname']}:{server_info['port']}/"
So the only difference is how I log into the machine to start clearml
The only difference that I can think of is the OS environment in the two login types:
Can you run export
in the two cases and check the diff between them?
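For example, a minimal sketch (filenames are just placeholders) that dumps the environment from Python so the two sessions can be diffed afterwards:

```python
# Run this once in each login type, changing the output filename each time.
import os

with open("env_session_a.txt", "w") as f:
    for key in sorted(os.environ):
        f.write(f"{key}={os.environ[key]}\n")

# Then compare the two dumps, e.g.:
#   diff env_session_a.txt env_session_b.txt
```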
And command is a list instead of a single str
"command list", you mean the command
argument ?
Hi FloppyDeer99
Since this thread is a bit old, I might have missed something 🙂
Are we saying the links are not working in the UI?
(Notice the links themselves are generated by the clearml package, so if there was a bug, and I'm still not sure there was, then old links will remain invalid until manually fixed.) Can you verify that the latest clearml generates working links?
JitteryCoyote63 virtualenv v20 is supported; pip v21 needs the latest trains/trains-agent RC.
Hi EnthusiasticCoyote38
But once one process finished, it changed the task status to completed. Maybe you know a safe way to deal with such a situation? Or maybe the best way is to check the task status before uploading the object?
Well, you can actually forcefully set the state of the Task to running, then add artifacts, then close it?
Would that work?
```
my_other_task.reload()
my_other_task.mark_started(force=True)
my_other_task.upload_artifact(...)
my_other_task.flush(wait_for_uploads=True)
my_othe...
```
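For reference, a minimal end-to-end sketch of that flow, assuming my_other_task is an existing Task fetched by ID; the artifact name/object and the final closing call are my assumptions, not part of the original snippet:

```python
from clearml import Task

my_other_task = Task.get_task(task_id="<existing-task-id>")  # placeholder ID
my_other_task.reload()
my_other_task.mark_started(force=True)        # force the Task back to "running"
my_other_task.upload_artifact("my_artifact", artifact_object={"foo": "bar"})
my_other_task.flush(wait_for_uploads=True)    # make sure the upload finished
my_other_task.mark_completed()                # assumption: close it again when done
```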
ReassuredTiger98 I ❤ the DAG in ASCII!!!
port = task_carla_server.get_parameter("General/port")
This looks great! It will achieve exactly what you are after.
BTW: when you are done you can do: task_carla_server.mark_aborted(force=True)
And it will shut down the Carla Task 🙂
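Putting the two together, a small sketch (the Task ID here is a placeholder):

```python
from clearml import Task

# Fetch the Task that runs the Carla server
task_carla_server = Task.get_task(task_id="<carla-server-task-id>")

# Read the port it registered as a parameter
port = task_carla_server.get_parameter("General/port")
print(f"Carla server port: {port}")

# ... use the server ...

# When done, shut the Carla Task down
task_carla_server.mark_aborted(force=True)
```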
SparklingElephant70, let me make sure I understand: the idea is to make sure the pipeline will launch a specific commit/branch, and that you can control it? Also, are you using the pipeline add_step
function, or are you decorating a function with PipelineDecorator?
WickedGoat98 this is awesome! Let me know how I could help 🙂
BTW: I checked regarding the plot comparison; this is a backend issue due to the size of the plot. I was told a fix will be deployed in a day or two.
For artifacts already registered, it simply returns the entry; for artifacts that are not there yet, it contacts the server to retrieve them.
This is the current state.
Downloading the artifacts is done only when actually calling get()/get_local_copy()
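For illustration, a small sketch of that behaviour (Task ID and artifact name are placeholders):

```python
from clearml import Task

task = Task.get_task(task_id="<task-id>")

# Accessing the artifact entry does not download anything yet
artifact = task.artifacts["my_artifact"]

# The actual download happens only here
local_path = artifact.get_local_copy()  # fetch the file locally
obj = artifact.get()                    # or deserialize it into a Python object
print(local_path)
```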
clearml_agent: ERROR: Can not run task without repository or literal script in
script.diff
This is odd ...
OutrageousSheep60 when you launch clearml-session
it tells you the session ID (which is also a Task ID). Can you look for it in the UI and check there is something in the repo/uncommitted-changes section?
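If it's easier, the same check can be done from Python; this is just a hedged sketch, with the session/Task ID as a placeholder:

```python
from clearml import Task

session_task = Task.get_task(task_id="<session-task-id>")  # ID printed by clearml-session
task_dict = session_task.export_task()

print("repository:", task_dict["script"].get("repository"))
print("has uncommitted changes:", bool(task_dict["script"].get("diff")))
```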
Quick update: Nexus supports direct HTTP upload, which means that, as CostlyOstrich36 mentioned, just pointing to the Nexus HTTP upload endpoint would work: output_uri="http://<nexus>:<port>/repository/something/"
See docs:
https://support.sonatype.com/hc/en-us/articles/115006744008-How-can-I-programmatically-upload-files-into-Nexus-3-
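In code, that would look roughly like this (the endpoint below is a placeholder for your actual Nexus repository URL):

```python
from clearml import Task

task = Task.init(
    project_name="examples",      # placeholder names
    task_name="nexus upload",
    output_uri="http://<nexus>:<port>/repository/something/",
)
```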
Hi DeliciousBluewhale87
So basically no webhooks; the idea is that you have a full API to query everything in the system and launch tasks based on any logic. You can check the Slack monitoring example, it is basically doing that. Wdyt?
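As an illustration (not the actual Slack monitoring example), querying recently completed tasks and running custom logic on them could look like this; the project name and filter are assumptions:

```python
from clearml import Task

completed = Task.get_tasks(
    project_name="my_project",                 # placeholder project
    task_filter={"status": ["completed"]},     # only finished tasks
)
for t in completed:
    # e.g. send a notification, launch a follow-up task, etc.
    print(t.id, t.name, t.get_last_iteration())
```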
It seems like the configuration is cached in a way even when you change the CLI parameters.
@<1523704461418041344:profile|EnormousCormorant39> nice!
Yes, the configuration is cached so that after you set it once you can just call clearml-session again without all the arguments.
What was the actual issue ? Should we add something to the printout?
Yes 🙂 documentation is being worked on ... Anyhow, we will be uploading a new documentation site soon (hopefully in a week or so), putting it all on GitHub so it will be easier for the community to edit and add more.
YummyMoth34
It tried to upload all events and then killed the experiment
Could you send a log?
Also, what's the trains package version?
SubstantialElk6
Notice that if you are using a manual setup, the default is "secure: false"; you have to change it to "secure: true":
https://github.com/allegroai/clearml-agent/blob/176b4a4cdec9c4303a946a82e22a579ae22c3355/docs/clearml.conf#L251
HighOtter69
By default, if you are continuing an experiment it will start from the last iteration of the previous run. You can reset it with: task.set_initial_iteration(0)
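A minimal sketch of how that fits together, assuming you are continuing a previous run (project/task names are placeholders):

```python
from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="continued run",
    continue_last_task=True,      # continue the previous experiment
)
task.set_initial_iteration(0)     # reset the reported iteration offset
```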
Looks like at the end of the day we removed
proxy_set_header Host $host;
and used the FQDN for the proxy_pass line
And did that solve the issue?
Hi LazyLeopard18 ,
So long story short, yes it does.
Longer version: to really accomplish full federated learning with control over data at "compute points" you need some data abstraction layer. Without a data abstraction layer, federated learning is just averaging derivatives from different locations, which can be easily done with any distributed learning framework, such as Horovod, PyTorch Distributed, or TF Distributed.
If what you are after is, can I launch multiple experiments with the sam...
DilapidatedDucks58 use a full link, without the package name: git+
Maybe WackyRabbit7's approach is better, as you will get a new object (instead of the runtime copy that is being used)
Thanks JitteryCoyote63 , once we have a reproducible example the fix should be very quick to push (with these things reproducing it is the challenge)
RoughTiger69 I think you need the latest version (1.3.0+ with UI support)
If you are using an older version, you need to specify that you are continuing an execution (Change the "Configuration/Args/continue_pipeline" to True)
EDIT: clearml 1.3.x will work with clearml-server 1.2
was consistent, whereas for some reason this old virtualenv decided to use python2.7 otherwise
Yes,
This sounds like a virtualenv bug. I think it will not hurt to do both (obviously we have the information).
Thank you!!! ๐
Hi UnevenDolphin73
Took a long time to figure out that there was a specific Python version with a specific virtualenv that was old ...
NICE!
Then the task requested to use Python 3.7, and that old virtualenv version was broken.
Yes, if the Task is using a specific python version it will first try to find this one (i.e. which python3.7) and then use it to create the new venv
As a result -> Could the agent maybe also output the virtualenv version used ...
One way to circumvent this btw would be to also add/use the --python flag for virtualenv
Notice that when creating the venv, the cmd that is used is basically pythonx.y -m virtualenv ...
By definition this will create a new venv based on the python that runs virtualenv.
With all that said, it might be that there is a bug in virtualenv and in some cases it does not adhere to this restriction.
Simple file transfer test gives me approximately 1 GBit/s transfer rate between the server and the agent, which is to be expected from the 1Gbit/s network.
Ohhh I missed that. What is the speed you get for uploading the artifacts to the server? (You can test it with simple toy artifact upload code.)
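For example, a toy upload-speed test along those lines (file size, names, and project are arbitrary choices):

```python
import os
import time
from clearml import Task

task = Task.init(project_name="debug", task_name="upload speed test")

# Create a ~100 MB file of random bytes
with open("blob.bin", "wb") as f:
    f.write(os.urandom(100 * 1024 * 1024))

start = time.time()
task.upload_artifact("blob", artifact_object="blob.bin", wait_on_upload=True)
elapsed = time.time() - start
print(f"Uploaded 100 MB in {elapsed:.1f}s -> {100 / elapsed:.1f} MB/s")
```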