self.task.upload_artifact('trend_step', self.trend_step + 1)
Out of curiosity, why would every request generate an artifact? Wouldn't it be better to report it as part of the statistics?
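For example, a minimal sketch of the "report it as statistics" alternative (assuming self.task is the Task object from your snippet; the title/series names are just placeholders):
```python
# Instead of uploading an artifact per request, report the value as a scalar
# so it accumulates in the experiment's Scalars tab (names are illustrative).
logger = self.task.get_logger()
logger.report_scalar(
    title="trend",
    series="step",
    value=self.trend_step,
    iteration=self.trend_step,
)
```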
What would be the size / type of the matrix X
(i.e. np.size / np.dtype) ?
TightElk12 I think this message belongs to a diff thread ;)
TRAINS_WORKER_NAME=first_agent trains-agent --gpus 0
and TRAINS_WORKER_NAME=second_agent trains-agent --gpus 0
Yes the one you create manually is not really of the same "type" as the one you create online, this is why you do not see it there 😞
not sure what is the "right way" 🙂
But I do pkill -f "trains-agent --gpus 0"
This will kill a process that started "trains-agent --gpus 0" Notice it matches the cmd pattern so it has to match the way you executed the agent. You can check it with ps -Af | grep trains-agent
The Task status is changed to "completed" only after all artifact uploads have completed.
JitteryCoyote63 that seems like the correct behavior for your scenario
I did nothing to generate a command-line. Just cloned the experiment and enqueued it. Used the server GUI.
Who/What created the initial experiment ?
I noticed that if I run the initial experiment by "python -m folder_name.script_name"
"-m module" as script entry is used to launch entry points like python modules (which is translated to "python -m script")
Why isn't the entry point just the python script?
The command line arguments are passed as arguments on the Args section of t...
Pretty confusing that neither
services
StickyLizard47 basically this is how a services queue agent should be spun up:
https://github.com/allegroai/clearml-server/blob/9b108740da21f25407bd2c59583ca1c86f8e1faa/docker/docker-compose.yml#L123
When spinning on a k8s cluster, this is a bit more complicated, as it needs to work with the clearml-k8s-glue.
See here how to spin it on k8s
https://github.com/allegroai/clearml-agent/tree/master/docker/k8s-glue
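For reference, the agent-services container in the compose file linked above essentially runs the agent in services mode, something along the lines of (flags from memory, check the linked file for the exact invocation):
clearml-agent daemon --queue services --services-mode --docker --cpu-only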
How does ClearML select reference branch? Could it be that ClearML only checks "origin" branch?
Yes 😞 I think we can quickly fix that, I'm just trying to figure out if there are downsides to running "git ls-remote --get-url" without "origin"
Interesting... TrickyRaccoon92 could it be the validation phase was creating a new Tensorboard file ?
I think I found something, let me test my theory
The agent is installing the "Installed Packages" section of the Task (think of it as a requirements.txt)
And again, what do you have there? Is it the outcome of the Task.init auto populating it?
Hmm that is odd, could it be you are changing the sys.path ?
(What I'm assuming is happening is that it detects the packages in the PYTHONPATH and for some reason the order is different so it finds the "system" package before the "venv" package, hence the incorrect version)
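A quick way to check which copy of the package Python actually resolves in each environment (numpy is only an example here):
```python
import sys
import numpy  # replace with the package whose version looks wrong

print(sys.path)            # import order: earlier entries win
print(numpy.__version__)   # the version that actually got imported
print(numpy.__file__)      # where it came from (system site-packages vs. venv)
```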
EmbarrassedSpider34 I can update that an RC should be out later today with a fix 🙂
Hi ThoughtfulBadger56
Just add --stop
to the clearml-agent
(the exact same command as you used to spin it, just add --stop at the end and it will stop it, or just do clearml-agent daemon --stop and it will iteratively close them)
CrookedWalrus33 can you post the clearml.conf you have on the agent machine?
Hi @<1547028116780617728:profile|TimelyRabbit96>
Trying to do model inference on a video, so first step in
Preprocess
class is to extract frames.
Basically this depends on the RestAPI; usually you would be sending a link to the data to be processed and getting the result back synchronously.
What you should have is a custom endpoint doing the extraction, sending raw data into another endpoint doing the model inference; basically think "pipeline" endpoints:
[None](https://github.com/allegro...
Omg that's a lot of submodules!
It has nothing to do with what the task sees; if you are inside a git repo you will have to clone it on the remote machine. Let me check in the code, maybe there is a workaround
I see, actually what you should do is a fully custom endpoint,
- preprocessing -> download the video
- processing -> extract frames and send them to Triton with gRPC (see below how)
- post processing, return a human readable answer
Regarding the processing itself, what you need is to take this function (copy paste):
None
have it as internal `_process...
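Roughly, the extract-frames-and-send-over-gRPC part could look like this (a sketch only: the model name, input/output tensor names, shape handling and Triton address are placeholders, not the actual serving configuration):
```python
import cv2
import numpy as np
import tritonclient.grpc as grpcclient

# connect to the Triton gRPC endpoint (address is a placeholder)
client = grpcclient.InferenceServerClient(url="localhost:8001")

cap = cv2.VideoCapture("input_video.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # assume the model expects a single float32 image in NCHW layout
    blob = frame.astype(np.float32).transpose(2, 0, 1)[None, ...]
    infer_input = grpcclient.InferInput("INPUT__0", list(blob.shape), "FP32")
    infer_input.set_data_from_numpy(blob)
    result = client.infer(model_name="my_model", inputs=[infer_input])
    output = result.as_numpy("OUTPUT__0")
    # ... post-process `output` into a human readable answer ...
cap.release()
```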
Is gpu_0_utilization also in % then?
Correct 🙂
I was trying to find what the min and max values are for the above metrics.
Oh that makes sense, notice that you can get the values over time, so you can track the usage over the experiment lifetime (you can of course see it in the Scalar tab of the experiment)
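If you want to pull those values programmatically, something along these lines should work (a sketch with a recent clearml SDK; the ":monitor:gpu" title and "gpu_0_utilization" series names are my assumption of how the resource-monitoring scalars are stored):
```python
from clearml import Task

task = Task.get_task(task_id="<your_task_id>")
scalars = task.get_reported_scalars()
gpu_util = scalars.get(":monitor:gpu", {}).get("gpu_0_utilization", {})
print(gpu_util.get("x", []))  # iterations / timestamps
print(gpu_util.get("y", []))  # utilization values over the experiment lifetime (in %)
```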
Hi RipeGoose2
when I'm using the set_credentials approach does it mean the trains.conf is redundant? if
Yes, this means there is no need for trains.conf; all the important stuff (i.e. server + credentials) you provide from code.
BTW: When you execute the same code (i.e. code with the set_credentials call), the agent's configuration will override what you have there, so you will be able to run the Task later either on-prem or in the cloud without needing to change the code itself 🙂
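For illustration, a minimal sketch (host/key/secret values are placeholders; the same idea applies to the clearml SDK):
```python
from trains import Task

# provide server + credentials from code, no trains.conf needed
Task.set_credentials(
    api_host="https://api.your-server.example",
    web_host="https://app.your-server.example",
    files_host="https://files.your-server.example",
    key="YOUR_ACCESS_KEY",
    secret="YOUR_SECRET_KEY",
)
task = Task.init(project_name="examples", task_name="no conf file needed")
```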
Let me check the API reference
https://clear.ml/docs/latest/docs/references/api/endpoints#post-tasksget_all
So not straight query, but maybe:
https://clear.ml/docs/latest/docs/references/api/endpoints#post-tasksget_all_exall
section might do the trick.
SuccessfulKoala55 any chance you have an idea on what to pass there ?
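For reference, one way to play with it from Python (a sketch; the filter fields below are just common Task fields, not necessarily the exact query you are after):
```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
# equivalent to POST /tasks.get_all with a basic filter
tasks = client.tasks.get_all(
    status=["completed"],
    order_by=["-last_update"],
    page=0,
    page_size=100,
)
for t in tasks:
    print(t.id, t.name)
```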
@<1560074028276781056:profile|HealthyDove84> if you want you can PR a fix, it should be very simple basically:
None
    elif np_dtype == str:
        return "STRING"
    elif np_dtype == np.object_ or np_dtype.type == np.bytes_:
        return "BYTES"
    return None
We are working on 1.3.0 so this is right in time
Thank you so much @<1572395184505753600:profile|GleamingSeagull15> !
looks like your
faq.clear.ml
site is missing from your main site's sitemap files,
Thank you for noticing! I'll check with the webdevs
Also missing the
robots
meta tag on that site,
🙏
Last tip is to add a link on the
faq.clear.ml
site back to
clear.ml
for search index relevancy ( connects the two sites as being related in content...
JealousParrot68 yes this seems like a correct description.
The main diff between 1 & 2 is what the actual data is: if this is training/testing data, then a Dataset would make sense; if this is part of a preprocessing pipeline, then artifacts make more sense (notice we added pipeline step caching on the artifacts, so you can reuse steps if they have the same parameters/code, which means you can clone a pipeline and rerun it without repeating unnecessary data processing).
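As a rough illustration of the step-caching idea with the current SDK (the pipeline/step/function names here are made up):
```python
from clearml import PipelineController

def preprocess(raw_path):
    # ... heavy preprocessing that produces intermediate data ...
    return raw_path

pipe = PipelineController(name="example pipeline", project="examples", version="1.0")
pipe.add_function_step(
    name="preprocess",
    function=preprocess,
    function_kwargs=dict(raw_path="/data/raw"),
    cache_executed_step=True,  # reuse the step if code + parameters are unchanged
)
pipe.start_locally(run_pipeline_steps_locally=True)
```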
Yes I was thinking a separate branch.
The main issue with telling git to skip submodules is that it will be easily forgotten and will break stuff. BTW the git repo itself is cached, so the second time there is no actual pull. Lastly, it's not clear where one could pass a git argument per task. Wdyt?
JitteryCoyote63
I agree that its name is not search-engine friendly,
LOL 😄
It was an internal joke, the guys decided to call it "trains" cause, you know, it trains...
It was unstoppable, we should probably do a line of merch with AI 🚆 😉
Anyhow, this one definitely backfired...