And when running `get`, the files on the parent dataset will be available as links.
BTW: if you call get_mutable_copy() the files will be copied, so you can work on them directly (if you need)
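For reference, a minimal sketch of that flow with the `Dataset` API (the dataset name/project here are hypothetical placeholders; in recent clearml versions the mutable call is `get_mutable_local_copy()`):
```py
from clearml import Dataset

# hypothetical dataset name/project, replace with your own
dataset = Dataset.get(dataset_name="my_dataset", dataset_project="my_project")

# read-only cached copy: files from the parent dataset appear as links
read_only_folder = dataset.get_local_copy()

# mutable copy: files are actually copied, so you can modify them directly
mutable_folder = dataset.get_mutable_local_copy(target_folder="/tmp/my_dataset_copy")
```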
Correct 🙂
I have to admit mounting it to a different drive is a good reason to bring this feature back, the reasoning was it means the agent needs to make sure it manages them (e.g. multiple agents running on the same machine)
Maybe there is a setting in docker to move the space used to a different location?
Not that I know of...
I can simply increase the storage of the first disk, no problem with that
probably the easiest 🙂
But as you described, it looks like an edge case, so I don't mind 🙂
JitteryCoyote63 I think that with 0.17.2 we stopped mounting the venv build to the host machine. Which means it is all stored inside the docker.
Yes (mine isn't and it is working 🙂)
Oh, yes, that might be (the threshold is 3 minutes if there are no reports) but you can change that: `task.set_resource_monitor_iteration_timeout(seconds_from_start=10)`
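A minimal usage sketch (the project/task names are placeholders), lowering the threshold from the default 3 minutes to 10 seconds:
```py
from clearml import Task

task = Task.init(project_name="examples", task_name="resource monitor demo")
# report machine stats even if no scalars were reported in the first 10 seconds
# (default threshold is 3 minutes)
task.set_resource_monitor_iteration_timeout(seconds_from_start=10)
```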
Hi DangerousDragonfly8
is it possible to somehow extract the information about the experiment/task whose status has changed?
From the docstring of `add_task_trigger`:
```py
def schedule_function(task_id):
    pass
```
This means you are getting the Task ID that caused the trigger, now you can get all the info that you need with `Task.get_task(task_id)`
```py
def schedule_function(task_id):
    the_task = Task.get_task(task_id)
    # now we have all the info on the Task tha...
```
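Putting it together, a hedged sketch of wiring that callback into a trigger (the project name and trigger status here are assumptions):
```py
from clearml import Task
from clearml.automation import TriggerScheduler

def schedule_function(task_id):
    # called with the ID of the Task that fired the trigger
    the_task = Task.get_task(task_id)
    print(the_task.name, the_task.status)

scheduler = TriggerScheduler()
scheduler.add_task_trigger(
    schedule_function=schedule_function,
    trigger_project="examples",       # hypothetical project to watch
    trigger_on_status=["completed"],  # fire when a Task completes
)
scheduler.start()
```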
I just set the git credentials in the `clearml.conf` and it works out of the box
git has issues with passing the user/token from the main repo to the submodules, hence my surprise that it is working out-of-the-box.
Do notice that if you are using an ssh-key this is a non-issue.
Nope, no `.netrc` defined anywhere, ...
If this is the case, can you try to add the following to your `extra_vm_bash_script`:
```
echo machine example.com > ~/.netrc && echo log...
```
Yes that's the part that is supposed to only pull the GPU usage for your process (and sub processes) instead of globally on the entire system
containing the Extension module
Not sure I follow, what is the Extension module? What were you running manually that is not just `pip install /opt/keras-hannd`?
I can't seem to figure out what the names should be from the pytorch example - where did INPUT__0 come from
This is actually the layer name in the model:
https://github.com/allegroai/clearml-serving/blob/4b52103636bc7430d4a6666ee85fd126fcb49e2e/examples/pytorch/train_pytorch_mnist.py#L24
Which is just the default name Pytorch gives the layer
https://discuss.pytorch.org/t/how-to-get-layer-names-in-a-network/134238
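For example, a quick way to list the default names PyTorch assigns to layers (a standalone sketch, not taken from the linked example):
```py
import torch.nn as nn

model = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
# named_modules() yields (name, module) pairs; these default names are
# what the serving configuration has to reference
for name, module in model.named_modules():
    print(repr(name), module.__class__.__name__)
```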
it appears I need to convert it into TorchScript?
Yes, this ...
The main reason we need the above mentioned functionality is because there are some experiments that need to run for a long time. Let's say weeks.
Good point!
We need to temporarily pause (kill or something else) the running HPO task and reassign the resource for other needs.
Oh I see now....
Later, when the more important experiments have been completed, we can continue the HPO task from the same state.
Quick question: when you say the HPO Task, you mean the HPO controller logic Task...
okay that makes sense, if this is the case I would just use `clearml-agent execute --id <task_id here>` to continue the training Task.
Do notice you have to reload your last checkpoint from the Task's models/artifacts to continue 🙂
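A rough sketch of that reload step (the task ID is a placeholder, and this assumes the checkpoint was registered as an output model on the Task):
```py
import torch
from clearml import Task

# placeholder ID of the training Task being continued
task = Task.get_task(task_id="aabb11")

# grab the last checkpoint registered as an output model on the Task
last_model = task.models["output"][-1]
checkpoint_path = last_model.get_local_copy()
state_dict = torch.load(checkpoint_path)
```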
Last question: what is the HPO optimization algorithm? Is it just grid/random search, or Optuna/BOHB? If it is the latter, how do you make it "continue"?
should reload the reported scalars
Exactly (notice it also understands when the last scalars were reported, so it should automatically increase the iteration count, i.e. you will not accidentally overwrite previously reported scalars)
and the task needs to reload last checkpoints only, right?
Correct 🙂
We didn't figure out the best way of continuing for both the grid and optuna. Can you suggest something?
That is a good point, not sure if we have a GH issue for that, but wo...
Hi ElegantCoyote26
is there a way to get a Task's docker container id/name?
you mean like `Task.get_task("task_id_here").get_base_docker()`?
Now a Task's results page also has a plot for this, but I guess it's at the machine level and not the task level?
This is actually on the container level, meaning checked from inside the container. It should be what you are looking for
Correct (basically pip freeze results)
If you have a requirements file then you can specify it: `Task.force_requirements_env_freeze(requirements_file='requirements.txt')`
If you just want the `pip freeze` output to be shown in your "Installed Packages" section then use: `Task.force_requirements_env_freeze()`
Notice that in both cases you should call the function before you call `Task.init()`
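i.e. the call order matters, something like this (project/task names are placeholders):
```py
from clearml import Task

# must come before Task.init()
Task.force_requirements_env_freeze(requirements_file='requirements.txt')

task = Task.init(project_name="examples", task_name="freeze demo")
```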
btw, what do you mean by "Packages will be installed from projects requirements file" ?
SmarmyDolphin68, all looks okay to me...
Could you verify you still get the plot on debug samples as image with the latest trains RC: `pip install trains==0.16.4rc0`
If this is the case, PyTorch really messed things up; this means they removed packages.
Let me check something
Hi JitteryCoyote63 ,
The easiest would probably be to list the experiment folder, and delete its content.
I might be missing a few things but the general gist should be:
```py
from trains.storage import StorageHelper

h = StorageHelper.get('s3://my_bucket')
files = h.list(prefix='s3://my_bucket/task_project/task_name.task_id')
for f in files:
    h.delete(f)
```
Obviously you should have the right credentials 🙂
Could it be it was never allocated to begin with ?
is it planned to add a multicursor in the future?
CheerfulGorilla72 can you expand? What do you mean by multicursor?
Is Task.current_task() creating a task?
Hmm it should not, it should return a Task instance if one was already created.
That said, I remember there was a bug (not sure if it was in a released version or an RC) that caused it to create a new Task if there isn't an existing one. Could that be the case ?
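A quick sanity check of the expected behavior (project/task names are placeholders):
```py
from clearml import Task

assert Task.current_task() is None  # no Task created yet in this process

task = Task.init(project_name="examples", task_name="current task demo")
assert Task.current_task() is task  # returns the existing Task instance
```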
is it also possible to somehow propagate ssh keys to the agent pod? Not sure how to approach that
I would use the k8s secrets manager to do that (there is a way to mount secret files into a pod, and SSH keys are fairly standard to handle this way)
Hi MelancholyChicken65
I'm assuming you need the ssh protocol, not an https user/token; set this one to true 🙂 `force_git_ssh_protocol: true`
https://github.com/allegroai/clearml-agent/blob/76c533a2e8e8e3403bfd25c94ba8000ae98857c1/docs/clearml.conf#L39
You can however pass a specific Task ID and it will reuse it: `reuse_last_task_id="aabb11"`, would that help?
Hmm I'm sorry, it might be `continue_last_task`, can you try: `Task.init(..., continue_last_task="aabb11")`
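i.e. a hedged sketch (the ID and names are placeholders):
```py
from clearml import Task

# resume reporting into an existing Task instead of creating a new one
task = Task.init(
    project_name="examples",
    task_name="demo",
    continue_last_task="aabb11",  # Task ID from the previous run
)
```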
Hi PanickyMoth78
```py
torch.save(net.state_dict(), PATH)  # auto-uploads to GCS

# get all the models from the Task
output_models = Task.current_task().models["output"]

# get the last one
last_model = output_models[-1]

# set meta-data
last_model.set_metadata(key="my key", value="my value", type="str")
```
Hi FierceHamster54
Thanks for bringing it up 🙂
... in terms of secret management/key-value stores
Currently the open-source version does not include Vault support (i.e. secret management); this was added to the enterprise version a few versions ago, and as far as I understand it is a per user/project/company granularity feature (i.e. company-wide vaults merging with project vaults merging with user-specific vaults).
Is this what you are looking for or am I missing something ?