Sure, ReassuredTiger98 just add them after the docker image in the "Base Docker image" section under the Execution tab. The same applies when setting it from code.
Example:
nvcr.io/nvidia/tensorflow:20.11-tf2-py3 -v /mnt/data:/mnt/data
You can also always force extra docker run arguments by changing the clearml.conf on the agent itself:
https://github.com/allegroai/clearml-agent/blob/822984301889327ae1a703ffdc56470ad006a951/docs/clearml.conf#L121
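For reference, a minimal sketch of that clearml.conf section on the agent machine (the key name follows the linked reference clearml.conf; the mount path is just an example):

```
agent {
    # extra arguments appended to every "docker run" the agent issues
    extra_docker_arguments: ["-v", "/mnt/data:/mnt/data"]
}
```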
BoredGoat1 where exactly do you think that happens ?
https://github.com/allegroai/trains/blob/master/trains/utilities/gpu/gpustat.py#L316
?
https://github.com/allegroai/trains/blob/master/trains/utilities/gpu/gpustat.py#L202
Hi @<1720249416255803392:profile|IdealMole15>
I'm assuming you mean on a remote machine with clearml-agent running ?
If you do, then you either use clearml-task to create a Task (Job) and specify the container and script, or click "Create New Experiment" in the UI, fill in the git repo / script, and specify the docker image.
Make sense?
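For the clearml-task route, a minimal sketch might look like this (project name, repo, image, and queue are placeholders; check clearml-task --help for the exact flags on your version):

```
clearml-task --project examples --name remote-job \
  --repo https://github.com/user/repo.git --script train.py \
  --docker nvcr.io/nvidia/tensorflow:20.11-tf2-py3 \
  --queue default
```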
Hi @<1631102016807768064:profile|ZanySealion18>
ClearML (remote execution) sometimes doesn't "pick-up" GPU. After I rerun the task it picks it up.
What do you mean by "does not pick up"? Is it that the container is up but not executed with --gpus, so there is no GPU access?
of what task? i'm running lots of them and benchmarking
If you are skipping every installation it should be the same
because if you set CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
it will not install anything at all.
This is why it's odd to me...
wdyt?
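For reference, setting that on the agent might look like this (a sketch; the queue name is a placeholder and the daemon flags are the standard clearml-agent ones):

```
export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
clearml-agent daemon --queue default --docker
```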
- Could we add a comparison feature directly from the search results (Dashboard view -> search -> highlight some experiments for comparison)?
Totally forgot about the global search feature. Hmm, I'm not sure the webapp is in the correct "state" for that, i.e. I think the selection only works in "table view", which is the "all experiments" flat table.
- Could we add a filter on the project name in the "All Experiments" project?
You mean "filter by project" ?
Could we ad...
JitteryCoyote63 no I think this is all controlled from the python side.
Let me check something
I'll make sure they get back to you
Hi @<1572395184505753600:profile|GleamingSeagull15>
Is there an official place to report bugs and add feature requests for the app.clear.ml website?
GitHub issues is usually the place, or the
Assuming GitHub, but just making sure you don't have another PM tool you'd rather use.
Really appreciate you asking! It is always hard to keep track 🙏
CheerfulGorilla72 my guess is the Slack token does not have credentials for the private channel, could that be ?
when you are running the n+1 epoch you get the 2*n+1 reported
RipeGoose2 like twice the gap, i.e. internally it adds an offset of the last iteration... is this easily reproducible ?
It's dead simple to install:
pip install trains-agent
then you can simply do:
trains-agent execute --id myexperimentid
PompousBeetle71 If this is argparse and the type is defined, the trains-agent will pass the equivalent in the same type; with str that amounts to '' . Make sense ?
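As a small illustration of that behavior with plain argparse (no trains-agent needed): when the declared type is str, a value cleared in the UI comes back as an empty string rather than None, so it is worth guarding for '' explicitly:

```python
import argparse

parser = argparse.ArgumentParser()
# The type is declared, so values handed back are cast to str.
parser.add_argument("--model-name", type=str, default=None)

# A cleared value arrives as an empty string, not as None:
args = parser.parse_args(["--model-name", ""])
print(args.model_name == "")  # True: '' rather than None
```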
(also could you make sure all posts regarding the same question are put in the thread of the first post to the channel?)
I'm assuming the reason it fails is that the docker network is only available for the specific docker compose. This means when you spin up another docker compose they do not share the same names. Just replace with the host name or IP and it should work. Notice this has nothing to do with clearml or serving; these are docker network configurations.
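An alternative to switching to host names/IPs, sketched below, is to share the first stack's network by declaring it external in the second compose file (the network and service names here are placeholders, not from the original setup):

```yaml
# docker-compose.yml of the second stack
networks:
  clearml-backend:
    external: true

services:
  my-service:
    image: my-image:latest
    networks:
      - clearml-backend
```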
hmm I assume the reason is the cookie / storage changed?
So the thing is, clearml automatically detects the last iteration of the previous run; my assumption is you also add it, hence the double shift.
SourOx12 could that be it?
Hi ArrogantBlackbird16
but it returns a task handle even after the Task has been closed.
It should not ... That is a good point!
Let's fix that 🙂
It is for storing the predictions a trained model makes, so two different models do create slightly different images
That actually makes sense.
So how would you create exactly the same file (i.e. why do you need to manually control the upload folder, wouldn't creating a new unique folder suffice ?)
You can see in the log that it tries to download an artifact from a specific IP/URL. Is that link a valid one?
(this seems like the main cause of the error, first line in the screenshot)
Also, for a single parameter you can use:
cloned_task.set_parameter(name="Args/artifact_name", value="test-artifact", description="my help text that will appear in the UI next to the value")
This way, you are not overwriting the other parameters, you are adding to them.
(Similar to update_parameters, only for a single parameter)
Yes, the left side is the location of the file on the host machine, the right side is the location of the file inside the docker. in our case it is the same location
SmarmySeaurchin8 could you test with the latest RC?
pip install clearml==0.17.5rc2
Wait, even without the pipeline decorator this function creates the warning?
@<1523707653782507520:profile|MelancholyElk85>
What's the clearml version you are using ?
Just making sure... base_task_id has to point to a Task that is in "draft" mode, for the pipeline to use it
Set it on the PID of the agent process itself (i.e. the clearml-agent python process)
MelancholyElk85 if you are manually adding models with OutputModel, then when you call update_weights(...) the upload will start in the background (if the process ends, it will wait until the upload is completed). You can also specify auto_delete_file, which will delete the local copy once the upload completes.
Damn, JitteryCoyote63 seems like a bug in the backend, it will not allow you to change the task type to the new types 😞
Hi JuicyFox94
I think you are correct, this bug will explain the entire thing.
Basically what happens is that remote_execute stops the local run before the configuration is set on the Task. Then, running remotely, the code pulls the configuration, sees that it is empty, and does nothing.
Let me see if I can reproduce it...