Funny enough I'm running into a new issue now.
Sorry, my bad, I should have known 🙂 yes, it probably should be packages=["clearml==1.1.6"]
BTW: do you have any imports inside the pipeline function itself? If you do not, then there is no need to pass "packages" at all; it will just add clearml.
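For reference, a minimal sketch of what that could look like with the decorator API (the component name and body here are illustrative, not your code):
```python
# Minimal sketch, assuming the PipelineDecorator API; names are illustrative.
from clearml.automation.controller import PipelineDecorator

# Pin clearml explicitly for this step's environment
@PipelineDecorator.component(packages=["clearml==1.1.6"])
def step_one(value):
    # no imports inside the function -> "packages" could be dropped entirely
    return value * 2
```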
ShinyWhale52 any time 🙂
Feel free to followup with more questions
Hi FiercePenguin76
By default clearml will list only the packages you import, and not derivative packages.
This means that if you import package X and it imports package Y, only package X will be listed.
The way it should work is by statically analyzing the entire repository, but if you import a local package from a different local folder, and that folder is not in the same repo, it will not get listed (obviously if you install the external local package, it will be...
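If the analysis misses a package (a derivative one, or an external local one), a workaround is to list it manually before Task.init; a sketch, with a placeholder package name:
```python
# Sketch: manually adding a requirement the static analysis cannot see.
# "my_local_package" is a placeholder name.
from clearml import Task

Task.add_requirements("my_local_package")  # must be called before Task.init()
task = Task.init(project_name="examples", task_name="requirements demo")
```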
BTW: @<1673501397007470592:profile|RelievedDuck3> we just released 1.3.1 with better debugging; it prints the full exception stack on failure to the clearml Serving Session Task.
I suggest you pull the latest image, re-run the docker compose, and check what you have on the serving session Task in the UI.
This is odd... can you post the entire trigger code?
Also, what's the clearml version?
Maybe this is part of the paid version, but would be cool if each user (in the web UI) could define their own secrets,
Very cool (and actually how it works), but at the end someone needs to pay for salaries 🙂
The S3 bucket credentials are defined on the agent, as the bucket is also running locally on the same machine - but I would love for the code to download and apply the file automatically!
I have an idea here, why not use the "docker bash script" argument for that ?...
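Something along these lines in the agent's clearml.conf (the copy command, bucket, and target path are placeholders, just to show the idea):
```
agent {
    # Sketch: commands executed inside the docker before the task starts;
    # the S3 path and target file are placeholders.
    extra_docker_shell_script: [
        "aws s3 cp s3://my-bucket/my-config-file /tmp/my-config-file",
    ]
}
```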
Hi Team, I'm currently trying to install ClearML-Server on a Powerpc server with RedHat7.
You are a brave man LividCrab90!
Are there Dockerfiles for the ClearML-Server stack somewhere?
The main issue is replacing the DB containers, do you have elastic/mongo/redis for powerpc ?
Okay, this is more complicated but possible.
The idea is to write a glue layer (service) that pulls from the (i.e. UI) queue,
sets the slurm job,
and puts it in a pending queue (so you know the job is waiting in the slurm scheduler).
There is a template here:
https://github.com/allegroai/trains-agent/blob/master/trains_agent/glue/k8s.py
I would love to help you set up a slurm glue in a similar manner.
what do you think?
Okay, I'll make sure we change the default image to the runtime flavor of nvidia/cuda
AbruptHedgehog21 could it be the console log itself is huge ?
I just assumed it should only be triggered by dataset-related things, but after a lot of experimenting I realized it's also triggered by tasks...
VexedCat68 I think you are correct, and it should only be triggered by "Dataset" Tasks. That said, maybe there is a bug, in which case, if there are no additional filters, it will get triggered on any change in the project. That would explain how adding the tags filter solved the issue.
wdyt?
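For reference, this is roughly what I have in mind for scoping the trigger with tags (project and tag names are placeholders, assuming the TriggerScheduler API):
```python
# Sketch: a dataset trigger scoped by tags, so changes to plain tasks won't fire it.
from clearml.automation import TriggerScheduler

def on_dataset_change(task_id):
    # task_id is the ID of the Dataset task that fired the trigger
    print("dataset changed:", task_id)

trigger = TriggerScheduler(pooling_frequency_minutes=3.0)
trigger.add_dataset_trigger(
    schedule_function=on_dataset_change,
    trigger_project="my_datasets",    # placeholder project name
    trigger_on_tags=["production"],   # the tags filter
)
trigger.start()
```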
Hi @<1523706645840924672:profile|VirtuousFish83>
Hello, is it possible to disable lazy loading?
You mean in the UI for loading the console ?
The logs can be huge 10s and 100s of MB...
We have the same issue for hyperparameters even with only ~100 keys,
100+ parameters, that is quite a lot.
So are you saying the search in the UI only filters the lazily loaded elements and not the entire param list?
Hi @<1523704757024198656:profile|MysteriousWalrus11>
"parents": [
"step_two",
"step_four"
],
Seems like step 5 depends on steps 2+4. How did you create it? What did the console say?
Could it be you're not actually passing any output from step 3? How is it dependent on it?
So you want to have two Tasks and connect the two ?
Maybe the best approach is to make the current task the parent of the Dataset Task: dataset._task.set_parent(Task.current_task())
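Expanded a bit, a sketch (project/dataset names are placeholders; note that _task is the Dataset's internal backing task):
```python
# Sketch: make the currently running task the parent of a Dataset's task.
from clearml import Task, Dataset

task = Task.init(project_name="examples", task_name="creates dataset")  # placeholder names
dataset = Dataset.create(dataset_name="my_dataset", dataset_project="examples")
dataset._task.set_parent(Task.current_task())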
MagnificentPig49 quick update: the front-end guys updated me that with the next trains-server update they will have the web client code available on the repository, ETA probably mid-May or so :)
Hi AstonishingRabbit13
now I'm training yolov5 and I want to save all the info (model and metrics) with clearml to my bucket...
The easiest thing (assuming you are running YOLOv5 with python train.py) is to add the following env variable:
CLEARML_DEFAULT_OUTPUT_URI=" " python train.py
Notice that you need to pass your GS credentials here:
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/docs/clearml.conf#L113
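i.e. something like this section in your clearml.conf (the project name and json path are placeholders):
```
sdk {
    google.storage {
        # Sketch: GS credentials; values below are placeholders.
        project: "my-gcp-project"
        credentials_json: "/path/to/credentials.json"
    }
}
```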
What I want is to manually provide a name to each series, equal to the subject name (Subject 1, Subject 2, etc.)
They appear as they are reported to TB. I think this is a PyTorchLightning thing... If you look at the TB produced, you will get the same naming scheme, no?!
Sure thing 🙂
BTW: ReassuredTiger98 this is definitely an interesting use case, and I think you can actually write some code to solve it if you like.
Basically let's follow up on your setup:
Machine X: agents listening to queues A and B_machine_a (notice we have two agents here)
Machine Y: agent listening to queue B_machine_b
Now we (the users) will push our jobs into queues A and B.
Now we have a service that does the following:
see if we have a job in queue B
check if machine Y is working...
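To make it concrete, a very rough sketch of such a service (queue names are placeholders, and I'm assuming the APIClient exposes queues.get_next_task the way the k8s glue uses it):
```python
# Very rough sketch of a glue service: pull from the shared queue,
# then route the job to a per-machine queue (placeholder logic).
import time
from clearml import Task
from clearml.backend_api.session.client import APIClient

client = APIClient()
queue_b = client.queues.get_all(name="B")[0].id  # "B" is a placeholder queue name

while True:
    result = client.queues.get_next_task(queue=queue_b)
    entry = getattr(result, "entry", None)
    if entry:
        task = Task.get_task(task_id=entry.task)
        # here: check the slurm scheduler / whether machine Y is free, then route
        Task.enqueue(task, queue_name="B_machine_b")
    time.sleep(30)
```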
Hi MelancholyElk85
I have a strong deja vu feeling. Credentials are OK. How do I solve this? If you need the full log, how do I share it without sharing private information? I'm fed up with this shit
Is this coming from the agent ?
That sounds like an issue with "working dir"; check the "Execution" "Working Directory" field.
'.' means the root of the git repository
'subfolder' means run the script from the subfolder, etc. Also make sure that the script path is adjusted accordingly.
btw: Trains should have filled in all the correct paths... If you have time, get the latest trains (0.14.3) and run again to see if the problem persists; we should probably fix that bug 🙂
There is a version coming out next week, the one after it (probably 2/3 weeks later) will have this feature
Hi DrabCockroach54
Do we know if gpu_0_mem_usage and gpu_0_mem_used_gb both show current GPU usage?
The first is the percentage used (memory % used at any specific moment) and the second is the memory used in GiB; both refer to the video memory.
How to know from this how much GPU is reserved for the task if this task is in progress?
What do you mean by how much is reserved ? Are you running with an agent?
I gather there's a distinction between the two, with app.clear.ml being the public cloud-based SaaS version
My apologies SmallDeer34 , this is all some legacy domain stuff
actually " http://app.pro.clear.ml ," is not used any longer (although up), and will be removed in the future
SaaS free/pro is the same domain ( http://app.clear.ml ), same accounts, the only difference is whether you added a credit card, other than that it is the same domain and access.
does that make sense ?
(for example, if I somehow start the execution of an agent task in a specific docker container?)
You mean to specify the container from code? Or to make sure the agent can access a private docker container registry? Or is it for a private PyPI package repository?
SarcasticSquirrel56 when the process dies (i.e. killed), it does not have time to update the state; the server watchdog will then set the state to aborted after X amount of time of inactivity (default is 2 hours).
When I'm setting up my Pipeline, I can't go "here are some brand new tasks, please run them",
I think this is the main point. Can you create those Tasks via Task.create and get what you want? If so, then sure you can do that:
```python
def create_step_task(a_node):
    task = Task.create(...)
    return task

pipe.add_step(
    name="stage_process",
    parents=["stage_data"],
    base_task_factory=create_step_task,
)
```
wdyt?
As for the node, the confusing bit is that this is text from the docs...
Would this be best if it were executed in the Triton execution environment?
It seems the issue is unrelated to the Triton ...
Could I use the clearml-agent build command and the Triton serving engine task ID to create a docker container that I could then use interactively to run these tests?
Yep, that should do it 🙂
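Something along these lines (the task ID and target image name are placeholders):
```bash
# Sketch: build a docker image from the serving Task, then enter it interactively.
clearml-agent build --id <serving_task_id> --docker --target new-docker-image
docker run -it new-docker-image /bin/bash
```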
I would start simple; no need to get the docker itself, it seems like a clearml credentials issue?!
The package detection is done when running the code on your laptop, and this is when it first logs the packages and versions. Following that, what do you have on your laptop? OS/Conda/Python?