We are here if you need further help 🙂
SmarmySeaurchin8 just so that I don't miss anything.
One machine, two trains-agents, each one connected to a different trains-server, correct?
From the trains-agent --help:
trains-agent --config-file /home/user/my_trains_server1.conf daemon
trains-agent --config-file /home/user/my_trains_server2.conf daemon
Check the examples on the GitHub page, I think this is what you are looking for 🙂
https://github.com/allegroai/trains-agent#running-the-trains-agent
Hi ItchyJellyfish73
This seems aligned with the scenario you are describing; it looks like the API server is overloaded with simultaneous connections.
Add an additional apiserver instance to the docker-compose and nginx as a load balancer:
https://github.com/allegroai/clearml-server/blob/09ab2af34cbf9a38f317e15d17454a2eb4c7efd0/docker/docker-compose.yml#L4
`
apiserver:
  command:
  - apiserver
  container_name: clearml-apiserver
  image: allegroai/clearml:latest
  restart: unless-sto...
Seems the apiserver is out of connections, this is odd...
SuccessfulKoala55 do you have an idea ?
This one should work:
` path = task.connect_configuration(path, name=name)
if task.running_locally():
    my_params = read_from_path(path)
    my_params = change_params(my_params)  # change some stuff
    # store back the change; my_params is assumed to be the content of the param file (text)
    task.set_configuration_object(name=name, config_text=my_params) `
Still I wonder if it is normal behavior that clearml exits the experiments with status "completed" and not with failure
Well, that depends on the process exit code; if for some reason (not sure why) the process exits with return code 0, it means everything was okay.
I assume the "Detected an exited process, so exiting main" message is an internal print of your code; I guess it just leaves the process with exit code 0.
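If you do want the run to show up as failed in that situation, one option (a sketch; the detection condition and project/task names are placeholders) is to mark the task explicitly or exit with a non-zero code:
`
import sys
from clearml import Task

task = Task.init(project_name="examples", task_name="exit status demo")  # placeholder names

worker_exited_unexpectedly = True  # stand-in for your own detection logic

if worker_exited_unexpectedly:
    # either mark the task as failed explicitly ...
    task.mark_failed(status_reason="Detected an exited process")
    # ... or simply exit with a non-zero return code so the run is not marked "completed"
    sys.exit(1)
`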
Woot woot
ChubbyLouse32 when you get it working please PR it, this is very very cool!
(I'll be happy to help 🙂)
Okay, found the issue. To disable SSL verification globally, add the following env variable:
CLEARML_API_HOST_VERIFY_CERT=0
(I will make sure we fix the actual issue with the config file)
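For example, the same can be set from Python before ClearML is initialized (a minimal sketch; project/task names are placeholders, you can also just export the variable in the shell or in the agent's environment):
`
import os

# disable SSL certificate verification for the ClearML API client,
# equivalent to exporting CLEARML_API_HOST_VERIFY_CERT=0 in the shell
os.environ["CLEARML_API_HOST_VERIFY_CERT"] = "0"

from clearml import Task

task = Task.init(project_name="examples", task_name="no cert verify")
`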
Hmm, let me check, there is a chance the level is dropped when manually reporting (it might be reserved for internal critical reports). Regardless, I can't see any reason not to allow controlling it.
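For reference, manual reporting with an explicit level looks roughly like this (a sketch; project/task names are placeholders):
`
import logging
from clearml import Task

task = Task.init(project_name="examples", task_name="log level check")
logger = task.get_logger()

# report console text with an explicit severity level
logger.report_text("this one as DEBUG", level=logging.DEBUG)
logger.report_text("and this one as WARNING", level=logging.WARNING)
`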
I'm not sure if this was solved, but I am encountering a similar issue.
Yep, it was solved (I think v1.7+)
With spawn and forkserver (which is used in the script above) ClearML is not able to automatically capture PyTorch scalars and artifacts.
The "trick" is to have Task.init before you spawn your code, then (since your code will not start from the same state), you should call Task.current_task(), which would basically make sure everything is...
Hi ClumsyElephant70
What's the clearml version you are using?
(The first error is a by-product of a Python process.Event created before a forkserver is created, some internal Python issue. I thought it was solved, let me take a look at the code you attached)
What sort of data would be stored in the venvs-build folder?
ClumsyElephant70 temporary (lifetime of the task execution) virtual environment, including the code etc. It is deleted and recreated for every new task launched (or restored from cache, if venvs_cache is enabled)
Hi, I would like to understand how I can set the pip cache location for my agent,
ClumsyElephant70 by default the pip cache (and all other cache folders) are mounted back into the host itself under ~/.clearml/
I'm assuming the idea is a shared cache; if this is the case, do:
docker_pip_cache = ~/my_shared_nfs/pip-cache
https://github.com/allegroai/clearml-agent/blob/e3e6a1dda81bee2dd20a64d09746568e415f1823/docs/clearml.conf#L139
TenseOstrich47 it's based on a free "index", so the first index not in use will be captured, but if you remove agents then the order will change (e.g. you take down worker #1, the next worker you spin will be #1 because it is not taken).
ClumsyElephant70
Can you manually run the same command? ['python3.6', '-m', 'virtualenv', '/home/user/.clearml/venvs-builds/3.6']
Basically: python3.6 -m virtualenv /home/user/.clearml/venvs-builds/3.6
I'm sorry, I mean if the queue name is not provided to the agent, the agent will look for the queue with the "default" tag. If you are specifying the queue name, there is no need to add the tag.
Is it working now?
okay that's good, that means the agent could run it.
Now it is a matter of matching the TF version with CUDA (and there is no easy solution for that). Basically I think that what you need is "nvidia/cuda:10.2-cudnn7-runtime-ubuntu16.04"
Seems like settings on the clearml-server disappeared (specifically default queue tag?!)
It seems like you are correct, everything should just work. Are you still getting the error? What's the clearml agent version?
in the docker-compose file. Still strange...
hmm yes it is... If you have an idea on what went wrong let me know, we would love to fix it
Is this still an issue? (if you provide a queue name, the default tag is not used, so no error should be printed)
is there a way for me to get a link to the task execution? I want to write a message to slack, containing the URL so collaborators can click and see the progress
WackyRabbit7 Nice!
Basically you can use this one: task.get_output_log_web_page()
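For example (a sketch; the webhook URL and project/task names are placeholders):
`
import requests
from clearml import Task

task = Task.init(project_name="examples", task_name="notify collaborators")
url = task.get_output_log_web_page()  # direct link to the task's page in the web UI

# hypothetical Slack incoming-webhook URL, replace with your own
SLACK_WEBHOOK = "https://hooks.slack.com/services/T000/B000/XXXX"
requests.post(SLACK_WEBHOOK, json={"text": f"Experiment running, follow progress here: {url}"})
`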
PreciousParrot26 I think this is really a matter of the CI process having very limited resources. Just to be clear, you are correct and the steps themselves are not executed inside the CI environment, but it seems that even running the pipeline logic is somehow "too much" for the limited resources... Makes sense?
OSError: [Errno 28] No space left on device
Hi PreciousParrot26
I think this says it all 🙂 there is no more storage left to run all those subprocesses
btw:
I am curious about why a ThreadPool of 16 threads is gathered,
This is the maximum number of simultaneous jobs it will try to launch (it will launch more after the launching is done; notice this is only the launching, not the actual execution), but this is just a way to limit it.
controller_object.start_locally(). Only the PipelineController should be running locally, right?
Correct, do notice that if you are using the Pipeline decorator and calling run_locally(), the actual pipeline steps are also executed locally.
which of the two are you using (Tasks as steps, or functions as steps with decorator)?
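To illustrate the decorator case (a sketch; names and step logic are made up), calling PipelineDecorator.run_locally() before invoking the pipeline function runs the controller logic and every step in the local environment:
`
from clearml.automation.controller import PipelineDecorator


@PipelineDecorator.component(return_values=["doubled"])
def step_one(x):
    return x * 2


@PipelineDecorator.pipeline(name="demo pipeline", project="examples", version="0.1")
def my_pipeline():
    print(step_one(21))


if __name__ == "__main__":
    # run the pipeline logic *and* all the steps locally (instead of enqueuing them)
    PipelineDecorator.run_locally()
    my_pipeline()
`
With the PipelineController object, start_locally() runs only the controller logic locally and by default still enqueues the steps (if I remember correctly there is a run_pipeline_steps_locally argument to run both locally).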
Good point!
I'll make sure we do 🙂
When using the UI with regex to search for experiments, due to the greedy nature of the search, it consistently pops up the "ERROR Fetch Experiments failed" window when starting to use groups in regex (that is, parentheses of any kind).
Hmm, that is a good point (i.e. it would only actually search on Enter)
Could it be updated so that if an invalid regex pattern is given, it simply highlights the search bar in red (or similar) rather than stop us while writing the search pattern?
...