BoredHedgehog47
is this ( https://clearml.slack.com/archives/CTK20V944/p1665426268897429?thread_ts=1665422655.799449&cid=CTK20V944 ) the same issue (or solution) ?
Hi VexedCat68
Could it be the python version is not the same? (this is the only reason not to find a specific python package version)
How does deferred_init affect the process?
It defers all the networking and related setup to the background (usually the part that might slow down the Task initialization process)
Also, is there a way of specifying a blacklist instead of a whitelist of features?
BurlyPig26 you can whitelist per framework and file name, for example:
task = Task.init(..., auto_connect_frameworks={'pytorch': '*.pt', 'tensorflow': ['*.h5', '*.hdf5']})
What am I missing?
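To make the per-framework whitelist concrete, here is a minimal sketch of how glob patterns like the ones above select file names. It uses Python's standard fnmatch module; is_whitelisted is a hypothetical helper, and ClearML's own matching logic may differ in details. It only covers frameworks that are listed in the dict.

```python
from fnmatch import fnmatch

# Same shape as the auto_connect_frameworks argument above.
rules = {"pytorch": "*.pt", "tensorflow": ["*.h5", "*.hdf5"]}


def is_whitelisted(framework, filename, rules):
    """Hypothetical helper: True if filename matches a pattern listed for framework."""
    patterns = rules[framework]  # frameworks not listed are out of scope for this sketch
    if isinstance(patterns, str):
        patterns = [patterns]  # a single pattern may be given as a plain string
    return any(fnmatch(filename, pattern) for pattern in patterns)


print(is_whitelisted("pytorch", "model.pt", rules))
print(is_whitelisted("pytorch", "model.ckpt", rules))
```

With these rules a PyTorch file named model.pt matches, while model.ckpt does not.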
Yes, let's assume we have a task with id aabbcc
On two different machines you can do the following:
trains-agent execute --docker --id aabbcc
This means you manually spin up two simultaneous copies of the same experiment. Once they are up and running, will your code be able to make the connection between them (i.e. OpenMPI, torch distributed, etc.)?
ReassuredTiger98 no, but I might be missing something.
How do you mean project-specific?
What's the error you are getting ?
(open the browser's web developer tools, see if you get something in the console log)
The agents are docker containers, how do I modify the startup script so it creates a queue?
Hmm actually not sure about that, might not be part of the helm chart.
So maybe the easiest is:
from clearml.backend_api.session.client import APIClient
c = APIClient()
c.queues.create(name="new_queue")
Yep
Also maybe worth changing the entry point of the agent docker to always create a queue if it is missing?
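A rough sketch of the "create the queue only if it is missing" idea, which could be called from an agent startup script. ensure_queue is a hypothetical helper, and it assumes queues.get_all() returns objects with a .name attribute; please check that against your clearml version.

```python
def ensure_queue(client, name):
    """Create the queue `name` unless one with that exact name already exists.

    Returns True if a queue was created, False if it was already there.
    """
    # Assumption: get_all() returns objects exposing a `.name` attribute.
    existing = client.queues.get_all()
    if any(queue.name == name for queue in existing):
        return False
    client.queues.create(name=name)
    return True


# Usage against a real server (needs valid credentials in clearml.conf):
#   from clearml.backend_api.session.client import APIClient
#   ensure_queue(APIClient(), "new_queue")
```

Passing the client in as an argument keeps the helper easy to try against a stub before pointing it at a live server.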
(I mean new logs, while we are here did it report any progress)
Hi ZippyAlligator65
You can configure it in clearml.conf, see here:
https://github.com/allegroai/clearml-agent/blob/ebb955187dea384f574a52d059c02e16a49aeead/clearml_agent/backend_api/config/default/agent.conf#L202
Hi HelpfulDeer76
I mean that the task was being monitored on the demo ClearML server created by Allegro
Yes that is consistent with what I would expect to have happened
Basically if you are running it as a k8s job, you can just configure the following environment variables:
CLEARML_WEB_HOST:
CLEARML_API_HOST:
CLEARML_FILES_HOST:
CLEARML_API_ACCESS_KEY: <clearml access>
CLEARML_API_SECRET_KEY: <clearml secret>
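If it helps, a small sanity check you could run at the start of the job to catch a missing variable early. This is only a sketch: missing_clearml_env is a hypothetical helper, and treating all five variables as mandatory is an assumption on my side.

```python
import os

# The variables listed above; treating all of them as required is an assumption.
EXPECTED_VARS = (
    "CLEARML_WEB_HOST",
    "CLEARML_API_HOST",
    "CLEARML_FILES_HOST",
    "CLEARML_API_ACCESS_KEY",
    "CLEARML_API_SECRET_KEY",
)


def missing_clearml_env(env=None):
    """Return the names of expected variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in EXPECTED_VARS if not env.get(name)]


missing = missing_clearml_env()
if missing:
    print("Missing ClearML settings:", ", ".join(missing))
```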
The function
a delete request with a raise_on_errors=False flag.
Are you saying we should expose raise_on_errors on the _delete_artifacts() function itself?
If so, sure, that seems logical to me. Any chance you want to PR it? (please just make sure the default value is still False so we keep backwards compatibility)
wdyt?
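Just to illustrate the backwards-compatibility point, a minimal sketch of exposing such a flag with a False default. This is not the actual clearml code: delete_artifacts and delete_one are hypothetical names used only for the example.

```python
def delete_artifacts(urls, delete_one, raise_on_errors=False):
    """Hypothetical helper: try to delete each URL and return the ones that failed.

    With raise_on_errors=False (the default) failures are collected and returned,
    so existing callers see no change in behavior.
    With raise_on_errors=True the first failure is re-raised to the caller.
    """
    failed = []
    for url in urls:
        try:
            delete_one(url)
        except Exception:
            if raise_on_errors:
                raise
            failed.append(url)
    return failed
```

Callers that never pass the new argument keep the old "best effort" behavior, which is why the default has to stay False.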
delete logged images and texts though
logged images are also stored there?
Hi @<1697056701116583936:profile|JealousArcticwolf24>
You have ClearML Datasets
It will version, catalog, and store metadata for your datasets.
Each version only stores the delta from the parent version, but the delta is at file granularity, not "block" granularity
Notice that under the hood of course it uses storage solutions to store and cache the underlying immutable copy of the data. What's your use case?
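To show what "file granularity" means in practice, here is a toy sketch that compares two versions described as path-to-hash mappings. It is not how ClearML implements it internally; file_level_delta and the hash values are made up for the example.

```python
def file_level_delta(parent, child):
    """Compare two {relative_path: content_hash} mappings.

    Returns (added, modified, removed) as sorted lists of paths.
    """
    added = sorted(path for path in child if path not in parent)
    removed = sorted(path for path in parent if path not in child)
    modified = sorted(
        path for path in child if path in parent and child[path] != parent[path]
    )
    return added, modified, removed


parent = {"a.csv": "h1", "b.csv": "h2", "c.csv": "h3"}
child = {"a.csv": "h1", "b.csv": "h2-changed", "d.csv": "h4"}
added, modified, removed = file_level_delta(parent, child)
print(added, modified, removed)
```

In this picture a file that changed at all counts as modified as a whole, which is the difference from a block-level delta.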
Hi HandsomeCrow5 hmm interesting use case,
we have seen html reports as artifacts, then you can press "download" and it should open in another tab, what would you expect on "debug samples" ?
Hi GreasyLeopard35
I try to resume a stopped or aborted parameter optimization experiment,
How are you continuing the HPO? Are you running everything locally? Is this with an agent? Are you seeing the '[0, 0]' value in the configuration when launching the HPO or when continuing it?
Hi @<1573119962950668288:profile|ObliviousSealion5>
Hello, I don't really like the idea of providing my own github credentials to the ClearML agent. We have a local ClearML deployment.
If you own the agent, that should not be an issue, no?
forward my SSH credentials using ssh -A and then starting the clearml agent?
When you are running the agent and you force git cloning with SSH, it will automatically map the .ssh folder into the container for git to use
Ba...
Sure thing!
BTW: not sure if it helps, but the SaaS version integrates with Genesis Cloud. I know they provide cheap GPUs, might be worth checking
Hi @<1643423185791619072:profile|DashingCentipede5>
Notice that you called "start_locally": it tries to run the code locally inside your Jupyter notebook, and it assumes everything, including the code, already exists. Is that your case?
hi @<1546303293918023680:profile|MiniatureRobin9>
I can still see the metrics in Grafana. I
It will not delete it from Grafana, it means it's no longer collected. Makes sense?
I think it's supposed to be out early Nov
Back to the error:
clearml_agent: ERROR: Failed getting token (error 401 from
): Unauthorized (invalid credentials) (failed to locate provided credentials)
See here:
https://github.com/allegroai/clearml-server/blob/3f2b96266bc51bfce680bd759c7fa9d635ae36d3/docker/docker-compose.yml#L131
You need to provide an access key so it can actually "talk" to the server next to it.
Also, on the ClearML dashboard, I can see the clearml-agent log:
Is the clearml-agent running in docker mode ?
@<1615519322766053376:profile|DrainedOctopus19> if your code is a single file (which was stored on the clearml server), then it is stored on the Task:
from clearml import Task
task = Task.get_task("task UID here")
# this should be your entire code
print(task.data.script.diff)
Seems like the network inside the running code cannot access localhost (even though you have --network=host). Could you test it with the machine's IP?
(Actually the best practice is to add a name to the machine (in your hosts file), so that if later you move the server, all the links will be valid)
Give me a minute, I'll check something
Wait, who is creating this file? I thought you removed it in the uncommitted changes